Hierarchical queries

The typical hierarchical query is you want to select your boss, and the boss of your boss, etc.
It could look like

select prior ename ename, ename mgr
from emp
connect by prior mgr=empno
start with ename=’SCOTT’;

SCOTT
SCOTT JONES
JONES KING

I start with Scott and the hierarchy is built. I can use the pseudo column LEVEL in hierarchical queries.
One of the biggest problem in hierarchical queries is ORA-01436: CONNECT BY loop in user data.
If you are your own boss, or if you are the boss of your boss, then you have built a cycle. It is probably not wished to have this relation, but it cannot be enforced by a constraint and before 10g, it was difficult to detect. In 10g, you have a new clause, CONNECT BY NOCYCLE, which detect cycles and give flag.

Here I want to present an alternative way of using hierarchies.
Situation:
I have five boxes of five different sizes. Tiny up to 5 liters. Small up to 10 liters. Medium up to 15 liters. Big up to 20 liters. Hudge up to 25 liters.
I have 3 fluids, and I want to find the smallest boxes.
Yeah! very easy, you do select fluids.volume, min(box.capacity) from fluids, box where capacity>=volume group by volume, do not you?
Ok, but I do not want to mix the fluids! So I need 3 different containers.
This means, I will start with the first product, find the smallest box, go to the second, find the smallest free box, and so on.
Hmm… It seems an impossible task with analytics, least, min, keep, lag, lead, ???
Ok, I am going to build a hierarchy, based on volume > prior volume and capacity > prior capacity.
Then I do a min with the sys_connect_by_path function.

select max(sys_connect_by_path(volume,’/’)) volume_path,
ltrim(min(lpad(sys_connect_by_path(capacity,’/’),999))) capacity_path
from box,
(select row_number() over (order by volume) r, count(*) over () c, volume from fluids)
where volume
connect by capacity > prior capacity and r > prior r;

/7/11/14 /10/15/20

There is also one more CONNECT I would like to briefly mention, is the connect without prior.
select level from dual connect by level<11;
It is a special way of creating pivot tables, but it is fairly dangerous, and could make your session / database hang, depending on your oracle version.

Oracle analytics in basic sql queries

When I first saw analytics appearing in Oracle last century, I did not realised they were going to change my way of writting basic SQL queries.

Some (Variance, deviance) are truely mathematical and still reserved for statistical analysis.

Here I will try to describe ROW_NUMBER :

Back in Oracle 7, I remember to have written a lot of reports using in max subqueries, like in

select deptno,ename,sal from emp where (deptno,sal) in (select deptno,max(sal) from emp group by deptno);

With analytics, I can rewrite it with

select deptno,ename,sal
from (
select emp.*, rank() over (
partition by deptno
order by sal desc) r
from emp
) where r=1;

If I want to get exactly one row per deptno, I could then write something like

select deptno,ename,sal
from (
select emp.*, row_number() over (
partition by deptno
order by sal desc) r
from emp
) where r=1;

or better

select deptno,ename,sal
from (
select emp.*, row_number() over (
partition by deptno
order by sal desc, empno) r
from emp
) where r=1;

row_number will select only one row. I prefer the second, because empno is a primary key and the result will be constant over time. In the first solution, I cannot determine which row will be returned, and Oracle may choose one on the morning, and another one in the afternoon, depending on the execution plan (new index/new stats/different load/…).

Note that I can also select the 2nd biggest salary (r=2), or the top 5 (r<=5) It is also very performant, because you are doing only one full table scan instead of two