When I first saw analytics appearing in Oracle last century, I did not realised they were going to change my way of writting basic SQL queries.

Some (Variance, deviance) are truely mathematical and still reserved for statistical analysis.

Here I will try to describe ROW_NUMBER :

Back in Oracle 7, I remember to have written a lot of reports using *in max* subqueries, like in

select deptno,ename,sal from emp where (deptno,sal) in (select deptno,max(sal) from emp group by deptno);

With analytics, I can rewrite it with

select deptno,ename,sal

from (

select emp.*, rank() over (

partition by deptno

order by sal desc) r

from emp

) where r=1;

If I want to get exactly one row per deptno, I could then write something like

select deptno,ename,sal

from (

select emp.*, row_number() over (

partition by deptno

order by sal desc) r

from emp

) where r=1;

or better

select deptno,ename,sal

from (

select emp.*, row_number() over (

partition by deptno

order by sal desc, empno) r

from emp

) where r=1;

row_number will select only one row. I prefer the second, because empno is a primary key and the result will be constant over time. In the first solution, I cannot determine which row will be returned, and Oracle may choose one on the morning, and another one in the afternoon, depending on the execution plan (new index/new stats/different load/…).

Note that I can also select the 2nd biggest salary (r=2), or the top 5 (r<=5) It is also very performant, because you are doing only one full table scan instead of two