KEEP DENSE_RANK versus ROW_NUMBER()

Post author By Laurent Schneider
Post date September 18, 2006
10 Comments on KEEP DENSE_RANK versus ROW_NUMBER()

I often see questions like

How do you get the row of each department with the highest salary

In case you only want 1 row, you have two modern solutions :
Analytics, which is trend, and KEEP, which is not very known

The old fashion would be something like where s in (select max())

Ok, let’s start with analytics

SQL> select ename,deptno,sal 2 from (select ename,deptno,sal, 3 row_number() over (partition by deptno 4 order by sal desc,empno) r from emp) 5 where r=1; ENAME DEPTNO SAL ---------- ---------- ---------- KING 10 5000 SCOTT 20 3000 BLAKE 30 2850

and the KEEP method, which is a special aggregation

SQL> select max(ename) keep (dense_rank first 2 order by sal desc,empno) ename, 3 deptno,max(sal) sal 4 from emp group by deptno; ENAME DEPTNO SAL ---------- ---------- ---------- KING 10 5000 SCOTT 20 3000 BLAKE 30 2850

the second one should be more performant

SQL> select count(*) from emp2; COUNT(*) ---------- 917504


SQL> select max(ename) keep (dense_rank first

  2  order by sal desc,empno) ename,

  3  deptno,max(sal) sal

  4  from emp group by deptno;

ENAME          DEPTNO        SAL

---------- ---------- ----------

KING               10       5000

SCOTT              20       3000

BLAKE              30       2850
Elapsed: 00:00:01.00

SQL> select ename,deptno,sal

  2  from (select ename,deptno,sal,

  3  row_number() over (partition by deptno

  4  order by sal desc,empno) r from emp)

  5  where r=1;

ENAME          DEPTNO        SAL

---------- ---------- ----------

KING               10       5000

SCOTT              20       3000

BLAKE              30       2850

Elapsed: 00:00:01.43

10 thoughts on “KEEP DENSE_RANK versus ROW_NUMBER()”

Nicolas Gasparotto September 18, 2006 at 23:06

That’s why your post here

😉
Mennan September 25, 2006 at 16:08

Can it be possible to use KEEP to find out duplicates that have more than one key?
I found duplicates with two methods and tried to find out with KEEP. My work is below:

Connected to Oracle Database 10g Enterprise Edition Release 10.2.0.1.0
Connected as HR

SQL>
SQL> drop table dup;

Table dropped

SQL> create table dup(i number, x varchar2(8), d date );

Table created

SQL> BEGIN
2 INSERT INTO dup VALUES (1, ‘X’, trunc(SYSDATE + 5));
3 INSERT INTO dup VALUES (1, ‘X’, trunc(SYSDATE));
4 INSERT INTO dup VALUES (2, ‘X’, trunc(SYSDATE));
5 INSERT INTO dup VALUES (2, ‘C’, trunc(SYSDATE – 1));
6 INSERT INTO dup VALUES (2, ‘Y’, trunc(SYSDATE – 3));
7 INSERT INTO dup VALUES (2, ‘Y’, trunc(SYSDATE + 3));
8 INSERT INTO dup VALUES (1, ‘Z’, trunc(SYSDATE));
9 INSERT INTO dup VALUES (1, ‘X’, trunc(SYSDATE + 1));
10 INSERT INTO dup VALUES (1, ‘X’, trunc(SYSDATE));
11 INSERT INTO dup VALUES (4, ‘Z’, trunc(SYSDATE + 1));
12 INSERT INTO dup VALUES (4, ‘Z’, trunc(SYSDATE + 1));
13 INSERT INTO dup VALUES (4, ‘Z’, trunc(SYSDATE + 2));
14 INSERT INTO dup VALUES (4, ‘Z’, trunc(SYSDATE + 1));
15 END;
16 /

PL/SQL procedure successfully completed

SQL> commit;

Commit complete

SQL> SELECT * FROM dup;

I X D
———- ——– ———–
1 X 30.09.2006
1 X 25.09.2006
2 X 25.09.2006
2 C 24.09.2006
2 Y 22.09.2006
2 Y 28.09.2006
1 Z 25.09.2006
1 X 26.09.2006
1 X 25.09.2006
4 Z 26.09.2006
4 Z 26.09.2006
4 Z 27.09.2006
4 Z 26.09.2006

13 rows selected

SQL> —
SQL> SELECT i, x, d, COUNT(*)
2 FROM dup
3 GROUP BY i, x, d
4 HAVING COUNT(*) > 1;

I X D COUNT(*)
———- ——– ———– ———-
4 Z 26.09.2006 3
1 X 25.09.2006 2

SQL> —
SQL> SELECT dd.i, dd.x, dd.d
2 FROM (SELECT i,
3 x,
4 d,
5 ROWID,
6 row_number() over(PARTITION BY i, x, d ORDER BY ROWID) rn
7 FROM dup) dd
8 WHERE dd.rn != 1;

I X D
———- ——– ———–
1 X 25.09.2006
4 Z 26.09.2006
4 Z 26.09.2006

SQL> —
SQL> SELECT *
2 FROM dup a
3 WHERE a.ROWID NOT IN (SELECT MAX(ROWID)
4 FROM dup b
5 WHERE a.i = b.i
6 AND a.x = b.x
7 AND a.d = b.d);

I X D
———- ——– ———–
1 X 25.09.2006
4 Z 26.09.2006
4 Z 26.09.2006

SQL>
Laurent Schneider September 25, 2006 at 16:56

well, group by will suppress rows, so you will not get it without subquery.

evtl
select i,x,d,rowid from dup minus
select i,x,d,max(rowid) keep (dense_rank first order by rownum) from dup group by i,x,d

I X D ROWID
———- ——– ——— ——————
1 X 25-SEP-06 AAAE5oAAEAAAAA6AAI
4 Z 26-SEP-06 AAAE5oAAEAAAAA6AAK
4 Z 26-SEP-06 AAAE5oAAEAAAAA6AAM

but actually the keep is not needed!

and rownumber is better
Laurent Schneider September 25, 2006 at 17:07

in case you cannot use rowid, I could imagine using rownum like that :

select i,x,d,rownum from dup minus
select i,x,d,max(rownum) from dup group by i,x,d;

I X D ROWNUM
———- ——– ——— ———-
1 X 25-SEP-06 2
4 Z 26-SEP-06 10
4 Z 26-SEP-06 11
Mennan September 26, 2006 at 08:28

Yes you are right keep is not needed. As i see without subquery it is not possible. I will do a simple performance test will post the results as soon as possible.

I could not undersstand why you said “in case you cannot use rowid”. Using rowid is giving is exact solution, isn’t it?

Thanks for replies…
Regards, Mennan
Laurent Schneider September 26, 2006 at 10:17

if you want to select from a view, or a subquery, you may want to use rownum

SQL> with t as (select 1 x from dual union all select 2 from dual union all select 2 from dual)
2 select x, rownum from t minus select x,max(rownum) from t group by x;
X ROWNUM
———- ———-
2 2

SQL> with t as (select 1 x from dual union all select 2 from dual union all select 2 from dual)
2 select x, rowid from t minus select x,max(rowid) from t group by x;
select x, rowid from t minus select x,max(rowid) from t group by x
*
ERROR at line 2:
ORA-01446: cannot select ROWID from, or sample, a view with DISTINCT, GROUP BY, etc.

about performance, the minus + max will be terrible, I do not want to see it 😉

go for rownumber+rowid
Mennan September 26, 2006 at 10:33

Thanks, i got it
Anonymous October 6, 2006 at 08:02

When I look at Explain Plan this doesn’t do what I expect and does not look as efficient as I expect.

SQL> explain plan for select min(component_id) keep (dense_rank first order by component_id nulls last) waof
2 , max(component_id) keep (dense_rank first order by component_id nulls last) waof2
3 from i_component
4 /

已解释。

SQL> @C:oraclexeapporacleproduct10.2.0serverRDBMSADMINutlxpls.sql

PLAN_TABLE_OUTPUT
————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————
Plan hash value: 1142152098

———————————————————————————–
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
———————————————————————————–
| 0 | SELECT STATEMENT | | 1 | 4 | 1 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
| 2 | INDEX FULL SCAN| I_COMPONENT_PK | 391 | 1564 | 1 (0)| 00:00:01 |
———————————————————————————–

已选择9行。

Why doesn’t it just get the first row from the primary key rather than doing the Index Full Scan?
Jeff Moss March 12, 2010 at 23:43

Nice post…useful stuff.

A bit more evidence on the following post on my blog and a bit more discussion…

http://www.oramoss.com/blog/?p=157

Cheers
Jeff
Pingback: KEEP DENSE_RANK versus ROW_NUMBER – further details – Oramoss Ltd.

Comments are closed.