row generators performance

I wrote a few generators, and listed some existant in
http://laurentschneider.blogspot.com/2005/08/pivot-table.html

I decided to test them

note that this is not a “good” test, it is simply an overview. I am doing a count(*), another operation may be better in one or worst in another one.

all_objects is so slow (6 seconds for 50K rows) that I did not include it in my test

generate 1000 rows : all method are fast, except xquery…

generate 1000000 rows : model and cube never ends, xquery is very slow, connect and union are slow, plsql is ok, simple heap table is very fast (!)

I would like to point out that “generating rows” is rarely a business requirement. If you need to outer join with every day of the year, than create a table and insert every day of the year in it. It will be very fast, it is maintenable, extensible, supported, and self-documented…

Ok, here is my test

create type t_number as TABLE OF number;
/

create function f_number(n number) return t_number pipelined is
begin for i in 1..n loop pipe row(i); end loop; return; end;
/

create table t1000 as select 0 n from xmltable(‘for $i in 1 to 1000 return $i’ );
create table t1000000 as select 0 n from xmltable(‘for $i in 1 to 1000000 return $i’ );

set timi on feedb off echo off head off

prompt model 1000
select count(*)
from (
select null
from dual
model
dimension by (1 h)
measures (1 c)
rules( c[FOR h FROM 1 to 1000 INCREMENT 1] = 1)
)
/

prompt xquery 1000
select count(*)
from
xmltable(‘for $i in 1 to 1000 return $i’ )
/

prompt xquery 1000000
select count(*)
from
xmltable(‘for $i in 1 to 1000000 return $i’ )
/

prompt union 2*2*… 1K
with s as (select null from dual union all select null from dual)
select count(*)
from s,s,s,s,s,s,s,s,s,s
/

prompt union 1M
with s as (select null from dual union all select null from dual)
select count(*)
from
s,s,s,s,s,s,s,s,s,s,
s,s,s,s,s,s,s,s,s,s
/

prompt cube 1K
select count(*)
from (
select 1,2,3,4,5,6,7,8,9,10
from dual
group by cube(1,2,3,4,5,6,7,8,9,10)
)
/

prompt table 1000
select count(*)
from t1000
/

prompt table 1000000
select count(*)
from t1000000
/

set termout off
create type t_number as TABLE OF number;
/

create function f_number(n number) return t_number pipelined is
begin for i in 1..n loop pipe row(i); end loop; return; end;
/
set termout on

prompt plsql 1000
select count(*)
from table(f_number(1000))
/

prompt plsql 1000000
select count(*)
from table(f_number(1000000))
/

prompt all_objects
select count(*)
from all_objects
/

prompt connect 1000
select count(*)
from (
select level
from dual
connect by level

6 thoughts on “row generators performance

  1. Laurent Schneider

    I have used them many times to answer user questions, but myself, I have never used them in my productive scripts as far as I can remember…
    Thanks for your question 😉

  2. Thomas Kyte

    filling in the gaps!

    ops$tkyte@ORA10GR2> select * from inv;

    DT ITEM QTY
    ——— ———- ———-
    01-JAN-05 item1 55
    08-JAN-05 item1 20
    15-JAN-05 item1 30
    01-JAN-05 item2 155
    04-JAN-05 item2 120

    but we need to do time series analysis so we need to have every day filled in (and qty carried down..)

    ops$tkyte@ORA10GR2> with dates as
    2 (select to_date(’01-jan-2005′)+level-1 dt
    3 from dual connect by level

  3. Laurent Schneider

    thanks tom for your comment.
    personnaly, I prefer

    create table dates (dt date);
    exec for i in 0..3000 loop insert into dates values(date ‘2000-01-01’+i); end loop

    but this is a little bit Oracle6 dba style 😉

    style a row generator is very powerfull, but I use it more in my answers than in my scripts!

  4. Bob B

    It is nice to join to a premade table, but in some cases, that becomes a maintenance nightmare. Suppose you want to let users run reports on the last 36 months of data. A 36 row table for each month works fine, but now you need to maintain the information in the table so that it always contains all available dates. Generating the rows, you only have to validate that the reporting dates are within a range of TRUNC( SYSDATE, ‘MON’ ).

  5. Laurent Schneider

    however, with predefined tables, you can use materialized views, dimensions, hierarchies, partitioning, parallelism, etc…

    the “dimension” table containing the months could help, but maybe you just need to having a look at “moving sum”, a window clause of the analytics which let you defines relative values (range between 1080 preceding and current row)

    have a look at the datawarehouse guide about the “time” dimension example.

Comments are closed.