I often see questions like

*How do you get the row of each department with the highest salary*

In case **you only want 1 row**, you have two modern solutions :

Analytics, which is trend, and KEEP, which is not very known

The old fashion would be something like *where s in (select max())*

Ok, let’s start with analytics

SQL> select ename,deptno,sal 2 from (select ename,deptno,sal, 3 row_number() over (partition by deptno 4 order by sal desc,empno) r from emp) 5 where r=1; ENAME DEPTNO SAL ---------- ---------- ---------- KING 10 5000 SCOTT 20 3000 BLAKE 30 2850

and the KEEP method, which is a special aggregation

SQL> select max(ename) keep (dense_rank first 2 order by sal desc,empno) ename, 3 deptno,max(sal) sal 4 from emp group by deptno; ENAME DEPTNO SAL ---------- ---------- ---------- KING 10 5000 SCOTT 20 3000 BLAKE 30 2850

the second one should be more performant

SQL> select count(*) from emp2; COUNT(*) ---------- 917504 SQL> select max(ename) keep (dense_rank first 2 order by sal desc,empno) ename, 3 deptno,max(sal) sal 4 from emp group by deptno; ENAME DEPTNO SAL ---------- ---------- ---------- KING 10 5000 SCOTT 20 3000 BLAKE 30 2850 Elapsed: 00:00:01.00 SQL> select ename,deptno,sal 2 from (select ename,deptno,sal, 3 row_number() over (partition by deptno 4 order by sal desc,empno) r from emp) 5 where r=1; ENAME DEPTNO SAL ---------- ---------- ---------- KING 10 5000 SCOTT 20 3000 BLAKE 30 2850 Elapsed: 00:00:01.43

That’s why your post

hereCan it be possible to use KEEP to find out duplicates that have more than one key?

I found duplicates with two methods and tried to find out with KEEP. My work is below:

Connected to Oracle Database 10g Enterprise Edition Release 10.2.0.1.0

Connected as HR

SQL>

SQL> drop table dup;

Table dropped

SQL> create table dup(i number, x varchar2(8), d date );

Table created

SQL> BEGIN

2 INSERT INTO dup VALUES (1, ‘X’, trunc(SYSDATE + 5));

3 INSERT INTO dup VALUES (1, ‘X’, trunc(SYSDATE));

4 INSERT INTO dup VALUES (2, ‘X’, trunc(SYSDATE));

5 INSERT INTO dup VALUES (2, ‘C’, trunc(SYSDATE – 1));

6 INSERT INTO dup VALUES (2, ‘Y’, trunc(SYSDATE – 3));

7 INSERT INTO dup VALUES (2, ‘Y’, trunc(SYSDATE + 3));

8 INSERT INTO dup VALUES (1, ‘Z’, trunc(SYSDATE));

9 INSERT INTO dup VALUES (1, ‘X’, trunc(SYSDATE + 1));

10 INSERT INTO dup VALUES (1, ‘X’, trunc(SYSDATE));

11 INSERT INTO dup VALUES (4, ‘Z’, trunc(SYSDATE + 1));

12 INSERT INTO dup VALUES (4, ‘Z’, trunc(SYSDATE + 1));

13 INSERT INTO dup VALUES (4, ‘Z’, trunc(SYSDATE + 2));

14 INSERT INTO dup VALUES (4, ‘Z’, trunc(SYSDATE + 1));

15 END;

16 /

PL/SQL procedure successfully completed

SQL> commit;

Commit complete

SQL> SELECT * FROM dup;

I X D

———- ——– ———–

1 X 30.09.2006

1 X 25.09.2006

2 X 25.09.2006

2 C 24.09.2006

2 Y 22.09.2006

2 Y 28.09.2006

1 Z 25.09.2006

1 X 26.09.2006

1 X 25.09.2006

4 Z 26.09.2006

4 Z 26.09.2006

4 Z 27.09.2006

4 Z 26.09.2006

13 rows selected

SQL> —

SQL> SELECT i, x, d, COUNT(*)

2 FROM dup

3 GROUP BY i, x, d

4 HAVING COUNT(*) > 1;

I X D COUNT(*)

———- ——– ———– ———-

4 Z 26.09.2006 3

1 X 25.09.2006 2

SQL> —

SQL> SELECT dd.i, dd.x, dd.d

2 FROM (SELECT i,

3 x,

4 d,

5 ROWID,

6 row_number() over(PARTITION BY i, x, d ORDER BY ROWID) rn

7 FROM dup) dd

8 WHERE dd.rn != 1;

I X D

———- ——– ———–

1 X 25.09.2006

4 Z 26.09.2006

4 Z 26.09.2006

SQL> —

SQL> SELECT *

2 FROM dup a

3 WHERE a.ROWID NOT IN (SELECT MAX(ROWID)

4 FROM dup b

5 WHERE a.i = b.i

6 AND a.x = b.x

7 AND a.d = b.d);

I X D

———- ——– ———–

1 X 25.09.2006

4 Z 26.09.2006

4 Z 26.09.2006

SQL>

well, group by will suppress rows, so you will not get it without subquery.

evtl

select i,x,d,rowid from dup minus

select i,x,d,max(rowid) keep (dense_rank first order by rownum) from dup group by i,x,d

I X D ROWID

———- ——– ——— ——————

1 X 25-SEP-06 AAAE5oAAEAAAAA6AAI

4 Z 26-SEP-06 AAAE5oAAEAAAAA6AAK

4 Z 26-SEP-06 AAAE5oAAEAAAAA6AAM

but actually the keep is not needed!

and rownumber is better

in case you cannot use rowid, I could imagine using rownum like that :

select i,x,d,rownum from dup minus

select i,x,d,max(rownum) from dup group by i,x,d;

I X D ROWNUM

———- ——– ——— ———-

1 X 25-SEP-06 2

4 Z 26-SEP-06 10

4 Z 26-SEP-06 11

Yes you are right keep is not needed. As i see without subquery it is not possible. I will do a simple performance test will post the results as soon as possible.

I could not undersstand why you said “in case you cannot use rowid”. Using rowid is giving is exact solution, isn’t it?

Thanks for replies…

Regards, Mennan

if you want to select from a view, or a subquery, you may want to use rownum

SQL> with t as (select 1 x from dual union all select 2 from dual union all select 2 from dual)

2 select x, rownum from t minus select x,max(rownum) from t group by x;

X ROWNUM

———- ———-

2 2

SQL> with t as (select 1 x from dual union all select 2 from dual union all select 2 from dual)

2 select x, rowid from t minus select x,max(rowid) from t group by x;

select x, rowid from t minus select x,max(rowid) from t group by x

*

ERROR at line 2:

ORA-01446: cannot select ROWID from, or sample, a view with DISTINCT, GROUP BY, etc.

about performance, the minus + max will be terrible, I do not want to see it

go for rownumber+rowid

Thanks, i got it

When I look at Explain Plan this doesn’t do what I expect and does not look as efficient as I expect.

SQL> explain plan for select min(component_id) keep (dense_rank first order by component_id nulls last) waof

2 , max(component_id) keep (dense_rank first order by component_id nulls last) waof2

3 from i_component

4 /

已解释。

SQL> @C:oraclexeapporacleproduct10.2.0serverRDBMSADMINutlxpls.sql

PLAN_TABLE_OUTPUT

————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————

Plan hash value: 1142152098

———————————————————————————–

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

———————————————————————————–

| 0 | SELECT STATEMENT | | 1 | 4 | 1 (0)| 00:00:01 |

| 1 | SORT AGGREGATE | | 1 | 4 | | |

| 2 | INDEX FULL SCAN| I_COMPONENT_PK | 391 | 1564 | 1 (0)| 00:00:01 |

———————————————————————————–

已选择9行。Why doesn’t it just get the first row from the primary key rather than doing the Index Full Scan?

Nice post…useful stuff.

A bit more evidence on the following post on my blog and a bit more discussion…

http://www.oramoss.com/blog/?p=157

Cheers

Jeff