Category Archives: sql

stragg in 10gR2

well, you all know string aggregation
http://www.oracle-base.com/articles/10g/StringAggregationTechniques.php

here is a suggestion with xquery in 10gR2

select deptno,
  replace( replace( replace(
    XMLQUERY('for $cc in ora:view("emp") let $ename:=$cc/ROW/ENAME/text() where $cc/ROW/DEPTNO/number()=$deptno/d/number() return <e>{$ename}</e>'
    passing by value xmltype('<d>'||deptno||'</d>') as "deptno"
    returning content
  ),'</e><e>', ','),'<e>'),'</e>') enames
from dept
/

DEPTNO ENAMES
------ ------------------------------------------
    10 CLARK,KING,MILLER
    20 SMITH,JONES,SCOTT,ADAMS,FORD
    30 ALLEN,WARD,MARTIN,BLAKE,TURNER,JAMES
    40

dynamic number of columns

probably one of the most frequently asked questions: you want to have a table like

            ACCOUNTING OPERATIONS RESEARCH SALES
ANALYST              0          0        2     0
CLERK                1          0        2     1
MANAGER              0          0        1     1
PRESIDENT            1          0        0     0
SALESMAN             0          0        0     4

but the number and name of columns must be dynamic.

typically, I answer: this is not possible in plain SQL, you need PL/SQL.

well, it is not particularly beautiful, but I have a suggestion

step one:
generate the select

spool crosstab.sql
select text from (
  select 1 i, 'select job' text from dual
  union all
  select 2+rownum, ', count(decode(deptno,'||deptno||',deptno)) '||dname from
    (select deptno,dname from dept order by dname)
  union all
  select 1e125, ' from emp group by job order by job;' from dual
)
order by i
/
spool off

step two:
execute the generated query
@crosstab
JOB       ACCOUNTING OPERATIONS   RESEARCH      SALES
--------- ---------- ---------- ---------- ----------
ANALYST            0          0          2          0
CLERK              1          0          2          1
MANAGER            1          0          1          1
PRESIDENT          1          0          0          0
SALESMAN           0          0          0          4

done!

pivot table

big mission today: explain the mess with pivot tables.

let’s look at the following query
SQL> select extract(year from hiredate) year, count(*) from emp group by extract(year from hiredate) order by year
YEAR COU
----- ---
1980   1
1981  10
1982   1
1987   2

how do we get the years in which nobody was hired to show up with a count of 0?

1) with a table of years
it is fast and easy to understand, but it requires creating a table

create table year(year number);
insert into year values (1980);
insert into year values (1981);
insert into year values (1982);
insert into year values (1983);
insert into year values (1984);
insert into year values (1985);
insert into year values (1986);
insert into year values (1987);
insert into year values (1988);
insert into year values (1989);

SQL> select year, count(hiredate) COU from emp right join year on (extract(year from hiredate)=year) group by year order by year
YEAR COU
----- ---
1980   1
1981  10
1982   1
1983   0
1984   0
1985   0
1986   0
1987   2
1988   0
1989   0

2) I have a few artifices to select from dual
A. union
SQL> select year, count(hiredate) COU from emp right join
(select 1980 year from dual
union all select 1981 from dual
union all select 1982 from dual
union all select 1983 from dual
union all select 1984 from dual
union all select 1985 from dual
union all select 1986 from dual
union all select 1987 from dual
union all select 1988 from dual
union all select 1989 from dual)
on (extract(year from hiredate)=year) group by year order by year;

ok for 10 rows, but not for 1000!

B. cube
a trick, which may not work in all versions
SQL> select year, count(hiredate) COU from emp right join
(select 1980+rownum year from
(select null from dual group by cube (null,null,null,null))
where rownum <= 10)
on (extract(year from hiredate)=year) group by year order by year;

not ok if the number of rows to be generated is volatile, sometimes 1, sometimes 1000000

C. connect by
an even dirtier trick, with no guarantee that it will work

SQL> select year, count(hiredate) COU from emp right join
(select 1980+level year from dual connect by level <= 10)
on (extract(year from hiredate)=year) group by year order by year;

3) use rownum and all_objects
SQL> select year, count(hiredate) COU from emp right join
(select 1980+rownum year from all_objects where rownum <= 10)
on (extract(year from hiredate)=year) group by year order by year;

not scalable, you do not know how many rows are in all_objects (500,5000,50000?), and not performant at all, because all_objects is a complex view

4) PL/SQL pipelined table
perfectly scalable, quite fast, but requires creating a few objects

SQL> create type t_year as TABLE OF number;
/

Type created.

SQL> create or replace function f_year(n1 number, n2 number) return t_year pipelined is
begin for i in n1..n2 loop pipe row(i); end loop; return; end;
/

SQL> select COLUMN_VALUE year, count(hiredate)
from emp right join
table(f_year((select min(extract(year from hiredate)) from emp), (select max(extract(year from hiredate)) from emp)))
on ( COLUMN_VALUE = extract(year from hiredate) )
group by COLUMN_VALUE
YEAR COU
----- ---
1980   1
1981  10
1982   1
1983   0
1984   0
1985   0
1986   0
1987   2

I like the last one, because you can return 1 or 1000000000 rows, the query will not change. It will be fast for 1 row, and slow but will still work for 1000000000 rows.

pivot table part 2
pivot table part 3

group by does not sort

A user posted an interesting question today: why is GROUP BY behaving so strangely?

I summarize his example

SQL> SELECT NUM FROM (select 400 num from dual union select 310 from dual union select 220000 from dual) group by num,null;
    NUM
-------
    400
 220000
    310

Well, group by is “sorting”, but how? This seems crazy. Oracle uses whatever sort algorithm it wants. It can sort ascending, sort descending, hash-sort, or use any other internal algorithm.

Here, 220000 comes before 310 because its internal representation is shorter in bytes.

Have a look

SELECT num,dump(num) FROM (select 400 num from dual union select 310 from dual union select 220000 from dual) group by num,null;
       NUM DUMP(NUM)
---------- -------------------------
       400 Typ=2 Len=2: 194,5
    220000 Typ=2 Len=2: 195,23
       310 Typ=2 Len=3: 194,4,11

Well, if you need to sort, use ORDER BY, and read Tom’s blog.
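For instance, adding an ORDER BY to the query above gives a guaranteed ascending order:

select num
from (select 400 num from dual union select 310 from dual union select 220000 from dual)
group by num
order by num;

    NUM
-------
    310
    400
 220000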

TO_CHAR(..., 'D')

How do I get MONDAY=1, TUESDAY=2, WEDNESDAY=3, ...?

With to_char()

alter session set nls_territory=germany;
select to_char(sysdate,'DAY D') from dual;
TUESDAY 2

With decode()

select decode(to_char(sysdate, 'FMDAY', 'NLS_DATE_LANGUAGE=american'), 'MONDAY', '1', 'TUESDAY', '2', '...') from dual;

With mod()
As a reference, I take Monday, Jan 1st, 1000.
select mod(trunc(sysdate)-date '1000-01-01',7)+1 from dual;
2

How do I truncate a date to the current Monday?
trunc(date, 'D') or, here is my solution with 1000-01-01:
select trunc((sysdate-date '1000-01-01')/7)*7+date '1000-01-01' from dual;

thanks to Jan-Marcel’s idea, I found this one:
trunc(date,'IW') for the current Monday and date-trunc(date,'IW')+1 for the day number

deterministic

If I create a function for a materialized view with query rewrite or for a function-based index, I must declare it DETERMINISTIC.

f(x) = x*2 is deterministic: for a given x, f(x) will always be the same; f(5) will always be 10.

f(y) = sysdate+y is non-deterministic. For a given y, the return values may vary.

OK, we cannot use sysdate in mviews. What else should I say?

Well, some cases are less obvious!

Some limit cases:
to_date('2000','YYYY') is non-deterministic; it returns the 1st of the current month in year 2000, e.g. 2000-07-01 or 2000-08-01
to_char(date '2000-01-01','DAY') is non-deterministic; it can deliver SATURDAY or SAMSTAG
to_char(date '2000-01-01','DAY','NLS_DATE_LANGUAGE=american') is deterministic
to_char(date '2000-01-01','D') is non-deterministic; it can deliver 6, 7, 1 or 2, depending on your territory (in Bangladesh, the week starts on Friday)
timestamp '2005-10-30 02:30:00 Europe/Zurich' is by default non-deterministic: if you try it in summer you will get the offset +02:00, and in winter +01:00, because the period 02:00-02:59 on Oct 30th exists with both offsets. These are called time boundaries. I can make it deterministic by setting ERROR_ON_OVERLAP_TIME in the session, in which case values on the boundary are rejected.

I can always define a function as deterministic, at my own risk…

For example, if I have t(id,gender) = {1,male; 2,female}, I could have a function

f(id)=select gender from t where t.id=f.id;

and I could define it as deterministic. However, if I later decide to change that mapping, I must be prepared to get corrupted data, wrong results, inconsistencies and even ORA-600 errors.
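Here is a rough sketch of that scenario (all names are made up, not from the original example):

create table t (id number primary key, gender varchar2(10));
insert into t values (1, 'male');
insert into t values (2, 'female');

create or replace function f(p_id number) return varchar2 deterministic is
  retval t.gender%type;
begin
  -- a lookup declared deterministic: I promise Oracle the mapping never changes
  select gender into retval from t where id = p_id;
  return retval;
end;
/

create table t2 (gender_id number);
-- accepted only because f is declared deterministic
create index t2_i on t2(f(gender_id));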

How old are you?

I just came back from holiday and I am quite busy at the moment. Here is a tiny formula to get the age:

trunc((to_char(sysdate,'YYYYMMDD')-to_char(birthdate,'YYYYMMDD'))/10000)

it is much safer than add_months, because add_months does some adjustment at the end of the month, and I would never accept having to wait until Feb 29th, 2008 (28-02-1990 + 18*12 months) to turn 18 if I were born on Feb 28th, 1990.
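A quick way to check that claim (expected results in the comments):

select add_months(date '1990-02-28', 18*12) from dual;
-- 29.02.2008, because Feb 28th is the last day of the month

select trunc((to_char(date '2008-02-28','YYYYMMDD')
             -to_char(date '1990-02-28','YYYYMMDD'))/10000) age
from dual;
-- 18 already on Feb 28th, 2008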

exotic constraints

Today I read a post on metalink where the user wanted a unique constraint for not-null values…

Sounds easy, because Oracle never indexes NULLs in a b-tree index.

If I have only one column, I simply index it, it will work.

SQL> create table t66 ( n number);

Table created.

SQL> create unique index i66 on t66(n);

Index created.

SQL> insert into t66 values (null);

1 row created.

SQL> insert into t66 values (null);

1 row created.

The nulls are not indexed. In that post, the user has a two-column key and does not want to enforce the constraint when one of the columns is null. No problem, we can use a function-based index (FBI) to enforce this.

create unique index i on t(decode(col2,null,null,col1), decode(col1,null,null,col2));

so the index will contain entries only where both columns are not null.
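A small demonstration of this index, with made-up names in the style of the example above:

create table t67 (col1 number, col2 number);
create unique index i67 on t67(decode(col2,null,null,col1), decode(col1,null,null,col2));

insert into t67 values (1, null);   -- ok, both index expressions are null, nothing is indexed
insert into t67 values (1, null);   -- ok again, the constraint does not apply
insert into t67 values (1, 2);      -- ok, first fully non-null pair
insert into t67 values (1, 2);      -- fails with ORA-00001: unique constraint violated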

Yesterday a user on forums.oracle.com wanted to have a not-different constraint, that is, accepting an entry [p;r] only if [p;s] does not exist. It is quite hard to solve. I have read an interesting solution using an ON COMMIT REFRESH materialized view with aggregates and constraints.

A long time ago, one user wanted a constraint “table should contain only one row”.

create unique index i on t(col*0);
would ensure at most one row, with col NOT NULL

I think I can do better.

Imagine the user wants always exactly one row in STATUS(code number)

create table STATUS_MAXONE(code number, n number default 0 primary key);
create view STATUS as select code from STATUS_MAXONE;
insert into STATUS values (null);
create table STATUS_MINONE(n number references STATUS_MAXONE(n));
insert into STATUS_MINONE values (0);

Now the user can update the view STATUS, but neither delete nor insert…

Sometimes you can also have circular foreign key constraints: for example, a PERSON can only marry someone who exists in PERSON, that is a PERSON.SPOUSE => PERSON.ID relation. This is also quite special… Actually, a person can only marry his spouse, which means if I am your SPOUSE, you are my SPOUSE! But there is no way to reference a column that may be null (we must reference a primary key).
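A minimal sketch of such a self-referencing foreign key (the reciprocity itself is not enforced):

create table person (
  id     number primary key,
  spouse number references person(id)  -- the spouse must exist in person, but may be null
);

insert into person values (1, null);
insert into person values (2, 1);
update person set spouse = 2 where id = 1;  -- nothing forces this reciprocal update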

Using a unique function-based index has the limitation of FBIs: the function must be deterministic. The same applies to triggers. Complex constraints need additional tables to enforce your business rules.

Like the [p;r] case above, accepting [p;r] and [q;s] but not [p;s]:
we could simply have a table containing the (p;s) pairs with p as primary key…

unexpected results !

It makes you cry! It makes you claim you have found a bug! but it is working as specified!

1) subquery refers to a column of the main query
select * from emp where ename in (select ename from dept where deptno=10);

the query does not complain that the column does not exist in dept. It is perfectly legal to specify a non-prefixed column of the main query in the subquery. This could be like

select * from tab where 'foo' in (select 'foo' from dict);

so the “in” clause is always true

2) “not in” does not deliver result

select sysdate from dual where 400 not in (select comm from emp);

this is because 400!=NULL is UNKNOWN. Check three-valued logic!

it could be rewritten with a not exists, or in 10g with LNNVL

select sysdate from dual where lnnvl(400 in (select comm from emp));
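To see the three-valued logic at work with a literal NULL: the comparison is UNKNOWN, and so is its negation, so neither query returns a row.

select 'got a row' from dual where 400 != null;
select 'got a row' from dual where not (400 != null);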

3) rows are not delivered in the correct order
it is a wrong assumption to think that the rows will be delivered in a specific order if you do not specify order by.

4) table not found


SQL> select * from tab;
TNAME                          TABTYPE  CLUSTERID
------------------------------ ------- ----------
GreatTable                     TABLE

SQL> desc GreatTable
ERROR:
ORA-04043: object GreatTable does not exist

Whoever it was, someone created a case-sensitive table name.


SQL> desc "GreatTable"
 Name                    Null?    Type
 ----------------------- -------- ----------------
 MagicColumn                      NUMBER(38,2)

The time has gone

Sure, we know DATE. Oracle always stores date and time in an internal format. It could be understood as a number of seconds since Jan 1st, -4712. It must be smaller than Jan 1st, 10000. Let’s say we do not care about Y10K nowadays…

It works well and it is easy to add days and fractions of days. A common notation I use is date 'YYYY-MM-DD'. This bypasses the NLS settings.

I add one day to 28-Feb-1900

select date '1900-02-28' + 1 from dual;
01.03.1900

Cool, it works! Try it in Excel and meet the Y1900 bug :-)

The first problem I am aware of is zero dates inserted by OCI in 8i, which look like 0000-00-00 but evaluate to somewhere around 10101 BC!

The second one, more serious, is the Y0 bug. There is no year 0 in our era, because the zero was not known when our calendar was created. But Oracle has a year 0.

Proof:
select date '0001-01-01' - 1 from dual;
31.12.0000

And it is full of bugs!

SQL> select to_char(date '0000-01-01', 'DD/MM/YYYY') from dual;
00/00/0000

SQL> select to_char(date '0000-01-01', 'DD/MM/YYYYSP') from dual;
00/00/000000000000000000000000000000000000000000

SQL> select to_char(date '0000-01-01', 'DD/MM/YYYY JSP') from dual;
01/01/0000 ONE MILLION SEVEN HUNDRED TWENTY-ONE THOUSAND FIFTY-EIGHT

Ok, what about timestamp?
Basically, a timestamp is a date + a fraction of a second + optionally a timezone offset.

Here again, I like to use the ANSI notation TIMESTAMP 'YYYY-MM-DD HH24:MI:SS.FF'

I try to stick to timezone regions; they handle summer time better than timezone offsets.

SQL> select timestamp '2005-10-30 01:30:00.00 Europe/Zurich' + to_dsinterval('0 2:0:0') from dual;
30.10.2005 02:30:00.00 EUROPE/ZURICH

Timezone conversion is done with AT

SQL> select current_timestamp at time zone 'US/Central' from dual;
14.06.2005 09:25:11.77 US/CENTRAL

Timestamps do not accept addition of numbers. The only thing you can add is an interval, but take care: it is no longer a pure “addition”, because the associativity and commutativity properties are not retained.

(1Mon + ts) + 1Day != 1Mon + (ts + 1Day)
ts + 1Mon + 1Day != ts + 1Day + 1Mon

The + seems to be a “group operator” there, not the mathematical addition.
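A small illustration around a month boundary; the display format depends on your NLS settings, the expected values are in the comments:

select (timestamp '2005-02-28 00:00:00' + interval '1' month) + interval '1' day from dual;
-- 29.03.2005 00:00:00
select (timestamp '2005-02-28 00:00:00' + interval '1' day) + interval '1' month from dual;
-- 01.04.2005 00:00:00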

Anyway, if you want to know when a baby born on 2000-02-29 will have his 18th birthday, you should rather ask a lawyer than a DBA :-)

order by to_number ?

to_number is prone to generating ORA-01722: invalid number.

When I deal with integers, I prefer lpad.

ex: sort pseudo-numeric
select * from t order by lpad(col,20);
1
2
10

It performs well and does not generate ORA-01722. And ORA-01722 does not necessarily mean that I have bad data:

select name,value from v$parameter where name like '%pool_size' order by to_number(value);
This works on my db, but it is bad coding, because to_number could well be executed for other parameters. There is no guarantee that Oracle executes the function only for rows where the WHERE clause is true.

It is also more flexible
ex: first alphabetic values alphabetically, then numbers numerically

order by translate(col,'~0123456789','~'), lpad(col,20)
a
aa
z
2
3
11

In 10g, regular expressions will ease complex sorts.
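For example, the alpha-then-numeric sort above could be written with regular expressions; a sketch, assuming col contains either plain digits or plain strings:

select col from t
order by regexp_instr(col, '^[0-9]+$'),   -- 0 for non-numeric values, so they come first
         case when regexp_like(col, '^[0-9]+$')
              then lpad(col, 20)          -- numbers compared right-aligned
              else col
         end;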


sys_connect_by_path

sys_connect_by_path is the only function provided by Oracle to get the hierarchy path in one field, and it only concatenates.

I just found a way of doing a sum along the path:

let’s imagine I want the sum of the salaries of my whole hierarchy.

select ename, sys_connect_by_path(ename,'/') hierarchy, length(replace(sys_connect_by_path(lpad(' ',sal/10),'/'),'/'))*10 sal
from emp
connect by mgr=prior empno
start with mgr is null ;

ENAME HIERARCHY SAL
---------- ------------------------------ -------
KING /KING 5000
JONES /KING/JONES 7970
SCOTT /KING/JONES/SCOTT 10970
ADAMS /KING/JONES/SCOTT/ADAMS 12070
FORD /KING/JONES/FORD 10970
SMITH /KING/JONES/FORD/SMITH 11770
BLAKE /KING/BLAKE 7850
ALLEN /KING/BLAKE/ALLEN 9450
WARD /KING/BLAKE/WARD 9100
MARTIN /KING/BLAKE/MARTIN 9100
TURNER /KING/BLAKE/TURNER 9350
JAMES /KING/BLAKE/JAMES 8800
CLARK /KING/CLARK 7450
MILLER /KING/CLARK/MILLER 8750

The LPAD transforms the number into a string of that many spaces, then LENGTH measures the concatenated string once the separators are removed. A string is limited to 4000 characters; that is why I divided the salary by 10.

Let’s make it more flexible with PL/SQL

create or replace function eval(expr varchar2) return number is retval number;
begin execute immediate 'select '||expr||' from dual' into retval; return retval; end;
/

select ename, sys_connect_by_path(ename,'/') hierarchy, eval(sys_connect_by_path(sal,'+')) sal
from emp
connect by mgr=prior empno
start with mgr is null
/

Just using + to add. Simple, isn’t it?

Recursive SQL

One of the most common school exercises about recursion is the factorial. Guess what, I am going to do it in SQL with hierarchies!

I use the following property of ln (for positive numbers):
x1*...*xn = exp(ln(x1)+...+ln(xn))

Ok, here it is

SQL> select n, (select exp(sum(ln(level))) from dual connect by level<=n) "N!" from t1;

N N!
- ---
4 24
6 720

One example about hierarchies

This morning I received a question from a friend, which I solved using hierarchies:

> Let’s assume a couple of persons have bought some cakes together and they want to eat them:
>
> Create table cake_owners
> (owner# number,
> cake# number,
> constraint cake_pk primary key (owner#,cake#)
> using index);
>
> insert into cake_owners values (1,100);
> insert into cake_owners values (1,200);
> insert into cake_owners values (2,200);
> insert into cake_owners values (2,300);
> insert into cake_owners values (3,300);
> —–
> insert into cake_owners values (4,500);
> —–
> insert into cake_owners values (6,600);
> insert into cake_owners values (7,600);
> —–
> commit;
>
> So owner 1 owns cake 100 and a part of cake 200. Owner 2 owns a part of cake 200 and a part of cake 300, where the rest is owned by 3.
> Owner 4 owns cake 500 alone and cake 600 is owned by two persons, 6 and 7.
>
> Now I want to seat at one table all owners who share parts of their cakes, so that all cakes can be eaten completely without anyone leaving the table.
> The table must be as small as possible and I want to know how many tables are needed and how big each one must be, or who is sitting at it.
> Of course a person can sit only at one table.
>
> In this much simplified example I need 3 tables; the biggest one needs 3 chairs.

This is typically solved with PL/SQL, but with plain SQL I need a hierarchy: (cake = prior cake and owner != prior owner) or (cake != prior cake and owner = prior owner). This is going to loop, so with 10g I will use NOCYCLE. With CONNECT_BY_ROOT and COUNT, I will find out the table master with the most guests.

select dense_rank() over (order by rootowner) tableno, owner#
from (
 select owner#,
  row_number() over
   (partition by owner# order by owner_c, rootowner) r,
  rootowner
 from (
  select
   rootowner,
   count(distinct owner#) over
    (partition by rootowner) owner_c,
   owner#
  from (
   select
    owner#, cake#,
    connect_by_root owner# rootowner
   from
    cake_owners
   connect by nocycle
    (owner# != prior owner# and cake# = prior cake#)
    or
    (owner# = prior owner# and cake# != prior cake#)
   )
  )
 ) where r=1
order by tableno, owner#;

TABLENO OWNER#
------- ------
1       1
1       2
1       3
2       4
3       6
3       7

Hierarchical queries

The typical hierarchical query: you want to select your boss, the boss of your boss, and so on.
It could look like

select prior ename ename, ename mgr
from emp
connect by prior mgr=empno
start with ename='SCOTT';

SCOTT
SCOTT JONES
JONES KING

I start with Scott and the hierarchy is built. I can use the pseudo column LEVEL in hierarchical queries.
One of the biggest problems in hierarchical queries is ORA-01436: CONNECT BY loop in user data.
If you are your own boss, or if you are the boss of your boss, then you have built a cycle. Such a relation is probably not wanted, but it cannot be enforced by a constraint, and before 10g it was difficult to detect. In 10g, you have a new clause, CONNECT BY NOCYCLE, which detects cycles and flags them.
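A minimal sketch of NOCYCLE; the flag is the CONNECT_BY_ISCYCLE pseudocolumn, which returns 1 when the current row has a child that is also one of its ancestors:

select ename, level, connect_by_iscycle cycle_flag
from emp
connect by nocycle prior empno = mgr
start with ename = 'KING';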

Here I want to present an alternative way of using hierarchies.
Situation:
I have five boxes of five different sizes. Tiny up to 5 liters. Small up to 10 liters. Medium up to 15 liters. Big up to 20 liters. Huge up to 25 liters.
I have 3 fluids, and I want to find the smallest boxes.
Yeah! Very easy: you do select fluids.volume, min(box.capacity) from fluids, box where capacity>=volume group by volume, don’t you?
Ok, but I do not want to mix the fluids! So I need 3 different containers.
This means, I will start with the first product, find the smallest box, go to the second, find the smallest free box, and so on.
Hmm… It seems an impossible task with analytics: least, min, keep, lag, lead, ...?
Ok, I am going to build a hierarchy, based on volume > prior volume and capacity > prior capacity.
Then I do a min with the sys_connect_by_path function.

select max(sys_connect_by_path(volume,'/')) volume_path,
ltrim(min(lpad(sys_connect_by_path(capacity,'/'),999))) capacity_path
from box,
(select row_number() over (order by volume) r, count(*) over () c, volume from fluids)
where volume <= capacity
connect by capacity > prior capacity and r > prior r;

/7/11/14 /10/15/20

There is one more CONNECT BY variant I would like to briefly mention: the connect without PRIOR.
select level from dual connect by level<11;
It is a special way of creating pivot tables, but it is fairly dangerous and could make your session or database hang, depending on your Oracle version.

Oracle analytics in basic sql queries

When I first saw analytics appear in Oracle last century, I did not realise they were going to change my way of writing basic SQL queries.

Some (variance, deviance) are truly mathematical and still reserved for statistical analysis.

Here I will try to describe ROW_NUMBER:

Back in Oracle 7, I remember having written a lot of reports using IN (SELECT MAX ...) subqueries, like

select deptno,ename,sal from emp where (deptno,sal) in (select deptno,max(sal) from emp group by deptno);

With analytics, I can rewrite it with

select deptno,ename,sal from (select emp.*, rank() over (partition by deptno order by sal desc) r from emp) where r=1;

If I want to get exactly one row per deptno, I could then write something like

select deptno,ename,sal from (select emp.*, row_number() over (partition by deptno order by sal desc) r from emp) where r=1;

or better

select deptno,ename,sal from (select emp.*, row_number() over (partition by deptno order by sal desc, empno) r from emp) where r=1;

ROW_NUMBER will select only one row per department. I prefer the second, because empno is a primary key and the result will be constant over time. In the first solution, I cannot determine which row will be returned, and Oracle may choose one in the morning and another one in the afternoon, depending on the execution plan (new index/new stats/different load/…).

Note that I can also select the 2nd biggest salary (r=2), or the top 5 (r<=5).

It is also very performant, because you are doing only one full table scan instead of two.