random statements

I read the ultimate excuse database and checked if I can use model to do generate random statements:


with t as (
select '%E and %E are travelling to %L.' fmt
from dual union all
select 'About %N employees live in %L.'
from dual)
select str from t
model
reference
dept on (
select loc,count(*) over () c,rownum r
from dept)
dimension by (r) measures (loc,c)
reference
emp on (
select ename,count(*) over () c, rownum r
from emp)
dimension by (r) measures (ename, c)
partition by (fmt)
dimension by (1 x)
measures (cast(fmt as varchar2(4000)) str)
rules
iterate (1000)
until str[1] not like '%/%%' escape '/'
(str[1]=substr(str[1],1,
decode(instr(str[1],'%'),0,4000,
instr(str[1],'%')-1))||
decode(substr(str[1],nullif(
instr(str[1],'%'),0)+1,1),
'N',to_char(trunc(
dbms_random.value(0,1+emp.c[1]))),
'E',initcap(emp.ename[trunc(
dbms_random.value(1,1+emp.c[1]))]),
'L',initcap(dept.loc[trunc(
dbms_random.value(1,1+dept.c[1]))]))||
substr(str[1],nullif(
instr(str[1],'%'),0)+2))
/
STR
----------------------------------------
James and Ward are travelling to Dallas.
About 1 employees live in Chicago.
/
STR
--------------------------------------------
King and Blake are travelling to New York.
About 3 employees live in Dallas.

10 thoughts on “random statements

  1. Laurent Schneider Post author

    the algorythm is straightforward.

    In the format string '%E / %N / %L' I replace the %E, %N and %L by a random employee, a random number of employees (from 0 to count) and a random location, respectively.

    So I look until there is no more % in the string, and one by one I replace each string containing a % with [begin][substitution][end], where substitution is a value of a reference model where the dimension is a random value (pointing to a random row number).

    I apologize for the lack of readability of the substr and instr due to the very short line of the wordpress template, which make proper indenting hard to implement

    The point of using model and not REGEXP_REPLACE is to have %E replaced by different values (accidentally it will be the same) within the same string.

    Here I avoid using PL/SQL, which is the natural way to translate T-SQL, in order to demonstrate the power of MODEL.

    Those who attended my unconference at OpenWorld know what it is about 😎

  2. Brian Ballsun-Stanton

    It’s waaaay easier to do random tuples.

    select foo
    from bar
    where rownum = 1
    order by DBMS_RANDOM.value();

    Reading over the random excuse generator more closely, you folks are looking for the functional equivalent of good, random, data. (Since I make around a gig of that a quarter for RIT… I’m good at this)

    The trick to good random data that doesn’t require a lot of PL is the ugly cartesian product.

    Select regexReplacementThatImTooTiredToLookUp(prefixText || midText || postText)
    from pre, mid, post
    where rownum = 1
    order by DBMS_RANDOM.value();

    This is how, for example, one creates 10,000 bottles of wine from two 50 entry lists ganked from the wikipedia.

  3. Laurent Schneider Post author

    select foo
    from bar
    where rownum = 1
    order by DBMS_RANDOM.value();

    this does not generate random values. you need subquery because the order by is executed after rownum=1

    Select regexp_replace('%E and %E travel to %L',
    '%E',(select ename from (
    select ename from emp order by dbms_random.value(0,1))
    where rownum=1)))
    from t

    is also not possible, because all %E will be replaced by the same string…

    Wow, 10000 bottle of wine, that is quite a lot 😎

  4. Brian Ballsun-Stanton

    Whoops. Forgot the subquery. Yea, the wrapper is important. (I was rather tired last night, which isn’t a proper excuse. )

    For employee name… lets see here

    select text from (
    select a.Name ||’ and ‘|| b.name || ‘ travel to ‘ || location as text,
    rownum as rnum
    from name a, name b, location
    where a.name != b.name
    order by DBMS_RANDOM.value() ) c
    where rnum = 1;

    should work. However, I’ll actually be testing it later today, since my previous post was full of errors…

    10,000 bottles of wine, 15,000 cases, and 100,000 orders over 3 years. It makes for a fairly compact data warehouse. (Versus last year’s which, one of our students computed, had a business one third the size of wal-mart’s.

  5. Laurent Schneider Post author

    Brian, the objective is of course to select the table only once and to later substitute, according to the various %. The string ‘%E and %E travel to %L’ could be a bind variable, for example

  6. Brian Ballsun-Stanton

    Hrmm, that makes it more interesting. Are we allowed to select against a precomputed materialized view?

    (The replace is necessary insofar as one can select and populate random phrase. But the dual name thing requires two hits to the employees table, doesn’t it?)

    Also, could you explain your model code? There are an impressive amount of parens.

  7. Laurent Schneider Post author

    well, two is only an example, but you could have 200 %E if you want

    About model :

    I look (ITERATE) until the is no more % in the string (UNTIL LIKE).

    Each loop, I replace the first % (the second SUBSTR) with a value. i check what kind of % it is (the second DECODE), if it is a %E, i then use the reference model (REFERENCE EMP ON) and a random value (DBMS_RANDOM) as a dimension (ROWNUM R).

    And then I loop for next value. It means, my str[1]= is executed as many time as you have % in the string.

    I hope it puts some light 😎

    10’000 bottles over 3 years, it is about 9 bottles a day, you must be very thirsty 😆

  8. Brian Ballsun-Stanton

    On a completely random tangent, I love your human-checking. It’s quite a bit better than a bloody image.

    100,000 cases over 3 years 🙂 Since the date dimension is by weekday only, that’s *quite a few* cases per day. Aren’t all DBAs alcoholics? 🙂

    With respect to your iterate function, it’s a neat way to do it, but I’m not sure why you chose to do it that way. Was it a “look what I can do” hack? I’d be interested in seeing the explain plan output from this. If your reference tables are only full-table-scanned once, this might be a really shiny way for me to generate data.

  9. Laurent Schneider Post author

    why that way
    because you have no subquery and no plsql
    because Model is the most elegant solution I found

    Well, 100000 over 3 years, that’s enough for a whole DBA team :mrgreen:

  10. Pingback: I’m just a simple DBA on a complex production system

Comments are closed.