KEEP DENSE_RANK versus ROW_NUMBER()

I often see questions like

How do you get the row of each department with the highest salary

In case you only want 1 row, you have two modern solutions :
Analytics, which is trend, and KEEP, which is not very known

The old fashion would be something like where s in (select max())

Ok, let’s start with analytics

SQL> select ename,deptno,sal
  2  from (select ename,deptno,sal,
  3  row_number() over (partition by deptno
  4  order by sal desc,empno) r from emp)
  5  where r=1;
ENAME          DEPTNO        SAL
---------- ---------- ----------
KING               10       5000
SCOTT              20       3000
BLAKE              30       2850

and the KEEP method, which is a special aggregation

SQL> select max(ename) keep (dense_rank first
  2  order by sal desc,empno) ename,
  3  deptno,max(sal) sal
  4  from emp group by deptno;
ENAME          DEPTNO        SAL
---------- ---------- ----------
KING               10       5000
SCOTT              20       3000
BLAKE              30       2850

the second one should be more performant

SQL> select count(*) from emp2;
  COUNT(*)
----------
    917504

SQL> select max(ename) keep (dense_rank first
  2  order by sal desc,empno) ename,
  3  deptno,max(sal) sal
  4  from emp group by deptno;
ENAME          DEPTNO        SAL
---------- ---------- ----------
KING               10       5000
SCOTT              20       3000
BLAKE              30       2850

Elapsed: 00:00:01.00
SQL> select ename,deptno,sal
  2  from (select ename,deptno,sal,
  3  row_number() over (partition by deptno
  4  order by sal desc,empno) r from emp)
  5  where r=1;
ENAME          DEPTNO        SAL
---------- ---------- ----------
KING               10       5000
SCOTT              20       3000
BLAKE              30       2850

Elapsed: 00:00:01.43

Published by

Laurent Schneider

Oracle Certified Master

9 thoughts on “KEEP DENSE_RANK versus ROW_NUMBER()”

  1. Can it be possible to use KEEP to find out duplicates that have more than one key?
    I found duplicates with two methods and tried to find out with KEEP. My work is below:

    Connected to Oracle Database 10g Enterprise Edition Release 10.2.0.1.0
    Connected as HR

    SQL>
    SQL> drop table dup;

    Table dropped

    SQL> create table dup(i number, x varchar2(8), d date );

    Table created

    SQL> BEGIN
    2 INSERT INTO dup VALUES (1, ‘X’, trunc(SYSDATE + 5));
    3 INSERT INTO dup VALUES (1, ‘X’, trunc(SYSDATE));
    4 INSERT INTO dup VALUES (2, ‘X’, trunc(SYSDATE));
    5 INSERT INTO dup VALUES (2, ‘C’, trunc(SYSDATE – 1));
    6 INSERT INTO dup VALUES (2, ‘Y’, trunc(SYSDATE – 3));
    7 INSERT INTO dup VALUES (2, ‘Y’, trunc(SYSDATE + 3));
    8 INSERT INTO dup VALUES (1, ‘Z’, trunc(SYSDATE));
    9 INSERT INTO dup VALUES (1, ‘X’, trunc(SYSDATE + 1));
    10 INSERT INTO dup VALUES (1, ‘X’, trunc(SYSDATE));
    11 INSERT INTO dup VALUES (4, ‘Z’, trunc(SYSDATE + 1));
    12 INSERT INTO dup VALUES (4, ‘Z’, trunc(SYSDATE + 1));
    13 INSERT INTO dup VALUES (4, ‘Z’, trunc(SYSDATE + 2));
    14 INSERT INTO dup VALUES (4, ‘Z’, trunc(SYSDATE + 1));
    15 END;
    16 /

    PL/SQL procedure successfully completed

    SQL> commit;

    Commit complete

    SQL> SELECT * FROM dup;

    I X D
    ———- ——– ———–
    1 X 30.09.2006
    1 X 25.09.2006
    2 X 25.09.2006
    2 C 24.09.2006
    2 Y 22.09.2006
    2 Y 28.09.2006
    1 Z 25.09.2006
    1 X 26.09.2006
    1 X 25.09.2006
    4 Z 26.09.2006
    4 Z 26.09.2006
    4 Z 27.09.2006
    4 Z 26.09.2006

    13 rows selected

    SQL> —
    SQL> SELECT i, x, d, COUNT(*)
    2 FROM dup
    3 GROUP BY i, x, d
    4 HAVING COUNT(*) > 1;

    I X D COUNT(*)
    ———- ——– ———– ———-
    4 Z 26.09.2006 3
    1 X 25.09.2006 2

    SQL> —
    SQL> SELECT dd.i, dd.x, dd.d
    2 FROM (SELECT i,
    3 x,
    4 d,
    5 ROWID,
    6 row_number() over(PARTITION BY i, x, d ORDER BY ROWID) rn
    7 FROM dup) dd
    8 WHERE dd.rn != 1;

    I X D
    ———- ——– ———–
    1 X 25.09.2006
    4 Z 26.09.2006
    4 Z 26.09.2006

    SQL> —
    SQL> SELECT *
    2 FROM dup a
    3 WHERE a.ROWID NOT IN (SELECT MAX(ROWID)
    4 FROM dup b
    5 WHERE a.i = b.i
    6 AND a.x = b.x
    7 AND a.d = b.d);

    I X D
    ———- ——– ———–
    1 X 25.09.2006
    4 Z 26.09.2006
    4 Z 26.09.2006

    SQL>

  2. well, group by will suppress rows, so you will not get it without subquery.

    evtl
    select i,x,d,rowid from dup minus
    select i,x,d,max(rowid) keep (dense_rank first order by rownum) from dup group by i,x,d

    I X D ROWID
    ———- ——– ——— ——————
    1 X 25-SEP-06 AAAE5oAAEAAAAA6AAI
    4 Z 26-SEP-06 AAAE5oAAEAAAAA6AAK
    4 Z 26-SEP-06 AAAE5oAAEAAAAA6AAM

    but actually the keep is not needed!

    and rownumber is better

  3. in case you cannot use rowid, I could imagine using rownum like that :

    select i,x,d,rownum from dup minus
    select i,x,d,max(rownum) from dup group by i,x,d;

    I X D ROWNUM
    ———- ——– ——— ———-
    1 X 25-SEP-06 2
    4 Z 26-SEP-06 10
    4 Z 26-SEP-06 11

  4. Yes you are right keep is not needed. As i see without subquery it is not possible. I will do a simple performance test will post the results as soon as possible.

    I could not undersstand why you said “in case you cannot use rowid”. Using rowid is giving is exact solution, isn’t it?

    Thanks for replies…
    Regards, Mennan

  5. if you want to select from a view, or a subquery, you may want to use rownum

    SQL> with t as (select 1 x from dual union all select 2 from dual union all select 2 from dual)
    2 select x, rownum from t minus select x,max(rownum) from t group by x;
    X ROWNUM
    ———- ———-
    2 2

    SQL> with t as (select 1 x from dual union all select 2 from dual union all select 2 from dual)
    2 select x, rowid from t minus select x,max(rowid) from t group by x;
    select x, rowid from t minus select x,max(rowid) from t group by x
    *
    ERROR at line 2:
    ORA-01446: cannot select ROWID from, or sample, a view with DISTINCT, GROUP BY, etc.

    about performance, the minus + max will be terrible, I do not want to see it ;-)

    go for rownumber+rowid

  6. When I look at Explain Plan this doesn’t do what I expect and does not look as efficient as I expect.


    SQL> explain plan for select min(component_id) keep (dense_rank first order by component_id nulls last) waof
    2 , max(component_id) keep (dense_rank first order by component_id nulls last) waof2
    3 from i_component
    4 /

    已解释。

    SQL> @C:oraclexeapporacleproduct10.2.0serverRDBMSADMINutlxpls.sql

    PLAN_TABLE_OUTPUT
    ————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————
    Plan hash value: 1142152098

    ———————————————————————————–
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    ———————————————————————————–
    | 0 | SELECT STATEMENT | | 1 | 4 | 1 (0)| 00:00:01 |
    | 1 | SORT AGGREGATE | | 1 | 4 | | |
    | 2 | INDEX FULL SCAN| I_COMPONENT_PK | 391 | 1564 | 1 (0)| 00:00:01 |
    ———————————————————————————–

    已选择9行。

    Why doesn’t it just get the first row from the primary key rather than doing the Index Full Scan?

Leave a Reply

Your email address will not be published.


*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>