KEEP DENSE_RANK versus ROW_NUMBER()

I often see questions like

How do you get the row of each department with the highest salary

In case you only want 1 row, you have two modern solutions :
Analytics, which is trend, and KEEP, which is not very known

The old fashion would be something like where s in (select max())

Ok, let’s start with analytics


SQL> select ename,deptno,sal
2 from (select ename,deptno,sal,
3 row_number() over (partition by deptno
4 order by sal desc,empno) r from emp)
5 where r=1;
ENAME DEPTNO SAL
---------- ---------- ----------
KING 10 5000
SCOTT 20 3000
BLAKE 30 2850

and the KEEP method, which is a special aggregation


SQL> select max(ename) keep (dense_rank first
2 order by sal desc,empno) ename,
3 deptno,max(sal) sal
4 from emp group by deptno;
ENAME DEPTNO SAL
---------- ---------- ----------
KING 10 5000
SCOTT 20 3000
BLAKE 30 2850

the second one should be more performant


SQL> select count(*) from emp2;
COUNT(*)
----------
917504

SQL> select max(ename) keep (dense_rank first
2 order by sal desc,empno) ename,
3 deptno,max(sal) sal
4 from emp group by deptno;
ENAME DEPTNO SAL
---------- ---------- ----------
KING 10 5000
SCOTT 20 3000
BLAKE 30 2850

Elapsed: 00:00:01.00
SQL> select ename,deptno,sal
2 from (select ename,deptno,sal,
3 row_number() over (partition by deptno
4 order by sal desc,empno) r from emp)
5 where r=1;
ENAME DEPTNO SAL
---------- ---------- ----------
KING 10 5000
SCOTT 20 3000
BLAKE 30 2850

Elapsed: 00:00:01.43

10 thoughts on “KEEP DENSE_RANK versus ROW_NUMBER()

  1. Mennan

    Can it be possible to use KEEP to find out duplicates that have more than one key?
    I found duplicates with two methods and tried to find out with KEEP. My work is below:

    Connected to Oracle Database 10g Enterprise Edition Release 10.2.0.1.0
    Connected as HR

    SQL>
    SQL> drop table dup;

    Table dropped

    SQL> create table dup(i number, x varchar2(8), d date );

    Table created

    SQL> BEGIN
    2 INSERT INTO dup VALUES (1, ‘X’, trunc(SYSDATE + 5));
    3 INSERT INTO dup VALUES (1, ‘X’, trunc(SYSDATE));
    4 INSERT INTO dup VALUES (2, ‘X’, trunc(SYSDATE));
    5 INSERT INTO dup VALUES (2, ‘C’, trunc(SYSDATE – 1));
    6 INSERT INTO dup VALUES (2, ‘Y’, trunc(SYSDATE – 3));
    7 INSERT INTO dup VALUES (2, ‘Y’, trunc(SYSDATE + 3));
    8 INSERT INTO dup VALUES (1, ‘Z’, trunc(SYSDATE));
    9 INSERT INTO dup VALUES (1, ‘X’, trunc(SYSDATE + 1));
    10 INSERT INTO dup VALUES (1, ‘X’, trunc(SYSDATE));
    11 INSERT INTO dup VALUES (4, ‘Z’, trunc(SYSDATE + 1));
    12 INSERT INTO dup VALUES (4, ‘Z’, trunc(SYSDATE + 1));
    13 INSERT INTO dup VALUES (4, ‘Z’, trunc(SYSDATE + 2));
    14 INSERT INTO dup VALUES (4, ‘Z’, trunc(SYSDATE + 1));
    15 END;
    16 /

    PL/SQL procedure successfully completed

    SQL> commit;

    Commit complete

    SQL> SELECT * FROM dup;

    I X D
    ———- ——– ———–
    1 X 30.09.2006
    1 X 25.09.2006
    2 X 25.09.2006
    2 C 24.09.2006
    2 Y 22.09.2006
    2 Y 28.09.2006
    1 Z 25.09.2006
    1 X 26.09.2006
    1 X 25.09.2006
    4 Z 26.09.2006
    4 Z 26.09.2006
    4 Z 27.09.2006
    4 Z 26.09.2006

    13 rows selected

    SQL> —
    SQL> SELECT i, x, d, COUNT(*)
    2 FROM dup
    3 GROUP BY i, x, d
    4 HAVING COUNT(*) > 1;

    I X D COUNT(*)
    ———- ——– ———– ———-
    4 Z 26.09.2006 3
    1 X 25.09.2006 2

    SQL> —
    SQL> SELECT dd.i, dd.x, dd.d
    2 FROM (SELECT i,
    3 x,
    4 d,
    5 ROWID,
    6 row_number() over(PARTITION BY i, x, d ORDER BY ROWID) rn
    7 FROM dup) dd
    8 WHERE dd.rn != 1;

    I X D
    ———- ——– ———–
    1 X 25.09.2006
    4 Z 26.09.2006
    4 Z 26.09.2006

    SQL> —
    SQL> SELECT *
    2 FROM dup a
    3 WHERE a.ROWID NOT IN (SELECT MAX(ROWID)
    4 FROM dup b
    5 WHERE a.i = b.i
    6 AND a.x = b.x
    7 AND a.d = b.d);

    I X D
    ———- ——– ———–
    1 X 25.09.2006
    4 Z 26.09.2006
    4 Z 26.09.2006

    SQL>

  2. Laurent Schneider

    well, group by will suppress rows, so you will not get it without subquery.

    evtl
    select i,x,d,rowid from dup minus
    select i,x,d,max(rowid) keep (dense_rank first order by rownum) from dup group by i,x,d

    I X D ROWID
    ———- ——– ——— ——————
    1 X 25-SEP-06 AAAE5oAAEAAAAA6AAI
    4 Z 26-SEP-06 AAAE5oAAEAAAAA6AAK
    4 Z 26-SEP-06 AAAE5oAAEAAAAA6AAM

    but actually the keep is not needed!

    and rownumber is better

  3. Laurent Schneider

    in case you cannot use rowid, I could imagine using rownum like that :

    select i,x,d,rownum from dup minus
    select i,x,d,max(rownum) from dup group by i,x,d;

    I X D ROWNUM
    ———- ——– ——— ———-
    1 X 25-SEP-06 2
    4 Z 26-SEP-06 10
    4 Z 26-SEP-06 11

  4. Mennan

    Yes you are right keep is not needed. As i see without subquery it is not possible. I will do a simple performance test will post the results as soon as possible.

    I could not undersstand why you said “in case you cannot use rowid”. Using rowid is giving is exact solution, isn’t it?

    Thanks for replies…
    Regards, Mennan

  5. Laurent Schneider

    if you want to select from a view, or a subquery, you may want to use rownum

    SQL> with t as (select 1 x from dual union all select 2 from dual union all select 2 from dual)
    2 select x, rownum from t minus select x,max(rownum) from t group by x;
    X ROWNUM
    ———- ———-
    2 2

    SQL> with t as (select 1 x from dual union all select 2 from dual union all select 2 from dual)
    2 select x, rowid from t minus select x,max(rowid) from t group by x;
    select x, rowid from t minus select x,max(rowid) from t group by x
    *
    ERROR at line 2:
    ORA-01446: cannot select ROWID from, or sample, a view with DISTINCT, GROUP BY, etc.

    about performance, the minus + max will be terrible, I do not want to see it 😉

    go for rownumber+rowid

  6. Anonymous

    When I look at Explain Plan this doesn’t do what I expect and does not look as efficient as I expect.


    SQL> explain plan for select min(component_id) keep (dense_rank first order by component_id nulls last) waof
    2 , max(component_id) keep (dense_rank first order by component_id nulls last) waof2
    3 from i_component
    4 /

    已解释。

    SQL> @C:oraclexeapporacleproduct10.2.0serverRDBMSADMINutlxpls.sql

    PLAN_TABLE_OUTPUT
    ————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————
    Plan hash value: 1142152098

    ———————————————————————————–
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    ———————————————————————————–
    | 0 | SELECT STATEMENT | | 1 | 4 | 1 (0)| 00:00:01 |
    | 1 | SORT AGGREGATE | | 1 | 4 | | |
    | 2 | INDEX FULL SCAN| I_COMPONENT_PK | 391 | 1564 | 1 (0)| 00:00:01 |
    ———————————————————————————–

    已选择9行。

    Why doesn’t it just get the first row from the primary key rather than doing the Index Full Scan?

  7. Pingback: KEEP DENSE_RANK versus ROW_NUMBER – further details – Oramoss Ltd.

Comments are closed.