random statements

I read the ultimate excuse database and checked if I can use model to do generate random statements:

with t as ( select '%E and %E are travelling to %L.' fmt from dual union all select 'About %N employees live in %L.' from dual) select str from t model reference dept on ( select loc,count(*) over () c,rownum r from dept) dimension by (r) measures (loc,c) reference emp on ( select ename,count(*) over () c, rownum r from emp) dimension by (r) measures (ename, c) partition by (fmt) dimension by (1 x) measures (cast(fmt as varchar2(4000)) str) rules iterate (1000) until str[1] not like '%/%%' escape '/' (str[1]=substr(str[1],1, decode(instr(str[1],'%'),0,4000, instr(str[1],'%')-1))|| decode(substr(str[1],nullif( instr(str[1],'%'),0)+1,1), 'N',to_char(trunc( dbms_random.value(0,1+emp.c[1]))), 'E',initcap(emp.ename[trunc( dbms_random.value(1,1+emp.c[1]))]), 'L',initcap(dept.loc[trunc( dbms_random.value(1,1+dept.c[1]))]))|| substr(str[1],nullif( instr(str[1],'%'),0)+2)) / STR ---------------------------------------- James and Ward are travelling to Dallas. About 1 employees live in Chicago. / STR -------------------------------------------- King and Blake are travelling to New York. About 3 employees live in Dallas.

10 thoughts on “random statements”

Laurent Schneider Post authorNovember 23, 2007 at 15:13

the algorythm is straightforward.

In the format string '%E / %N / %L' I replace the %E, %N and %L by a random employee, a random number of employees (from 0 to count) and a random location, respectively.

So I look until there is no more % in the string, and one by one I replace each string containing a % with [begin][substitution][end], where substitution is a value of a reference model where the dimension is a random value (pointing to a random row number).

I apologize for the lack of readability of the substr and instr due to the very short line of the wordpress template, which make proper indenting hard to implement

The point of using model and not REGEXP_REPLACE is to have %E replaced by different values (accidentally it will be the same) within the same string.

Here I avoid using PL/SQL, which is the natural way to translate T-SQL, in order to demonstrate the power of MODEL.

Those who attended my unconference at OpenWorld know what it is about 😎
Brian Ballsun-Stanton November 24, 2007 at 04:20

It’s waaaay easier to do random tuples.

select foo
from bar
where rownum = 1
order by DBMS_RANDOM.value();

Reading over the random excuse generator more closely, you folks are looking for the functional equivalent of good, random, data. (Since I make around a gig of that a quarter for RIT… I’m good at this)

The trick to good random data that doesn’t require a lot of PL is the ugly cartesian product.

Select regexReplacementThatImTooTiredToLookUp(prefixText || midText || postText)
from pre, mid, post
where rownum = 1
order by DBMS_RANDOM.value();

This is how, for example, one creates 10,000 bottles of wine from two 50 entry lists ganked from the wikipedia.
Laurent Schneider Post authorNovember 24, 2007 at 10:18

select foo from bar where rownum = 1 order by DBMS_RANDOM.value();
this does not generate random values. you need subquery because the order by is executed after rownum=1
Select regexp_replace('%E and %E travel to %L', '%E',(select ename from ( select ename from emp order by dbms_random.value(0,1)) where rownum=1))) from t
is also not possible, because all %E will be replaced by the same string…

Wow, 10000 bottle of wine, that is quite a lot 😎
Brian Ballsun-Stanton November 24, 2007 at 19:08

Whoops. Forgot the subquery. Yea, the wrapper is important. (I was rather tired last night, which isn’t a proper excuse. )

For employee name… lets see here

select text from (
select a.Name ||’ and ‘|| b.name || ‘ travel to ‘ || location as text,
rownum as rnum
from name a, name b, location
where a.name != b.name
order by DBMS_RANDOM.value() ) c
where rnum = 1;

should work. However, I’ll actually be testing it later today, since my previous post was full of errors…

10,000 bottles of wine, 15,000 cases, and 100,000 orders over 3 years. It makes for a fairly compact data warehouse. (Versus last year’s which, one of our students computed, had a business one third the size of wal-mart’s.
Laurent Schneider Post authorNovember 24, 2007 at 20:34

Brian, the objective is of course to select the table only once and to later substitute, according to the various %. The string ‘%E and %E travel to %L’ could be a bind variable, for example
Brian Ballsun-Stanton November 24, 2007 at 20:42

Hrmm, that makes it more interesting. Are we allowed to select against a precomputed materialized view?

(The replace is necessary insofar as one can select and populate random phrase. But the dual name thing requires two hits to the employees table, doesn’t it?)

Also, could you explain your model code? There are an impressive amount of parens.
Laurent Schneider Post authorNovember 24, 2007 at 21:23

well, two is only an example, but you could have 200 %E if you want

About model :

I look (ITERATE) until the is no more % in the string (UNTIL LIKE).

Each loop, I replace the first % (the second SUBSTR) with a value. i check what kind of % it is (the second DECODE), if it is a %E, i then use the reference model (REFERENCE EMP ON) and a random value (DBMS_RANDOM) as a dimension (ROWNUM R).

And then I loop for next value. It means, my str[1]= is executed as many time as you have % in the string.

I hope it puts some light 😎

10’000 bottles over 3 years, it is about 9 bottles a day, you must be very thirsty 😆
Brian Ballsun-Stanton November 24, 2007 at 22:23

On a completely random tangent, I love your human-checking. It’s quite a bit better than a bloody image.

100,000 cases over 3 years 🙂 Since the date dimension is by weekday only, that’s *quite a few* cases per day. Aren’t all DBAs alcoholics? 🙂

With respect to your iterate function, it’s a neat way to do it, but I’m not sure why you chose to do it that way. Was it a “look what I can do” hack? I’d be interested in seeing the explain plan output from this. If your reference tables are only full-table-scanned once, this might be a really shiny way for me to generate data.
Laurent Schneider Post authorNovember 24, 2007 at 22:44

why that way
because you have no subquery and no plsql
because Model is the most elegant solution I found

Well, 100000 over 3 years, that’s enough for a whole DBA team
Pingback: I’m just a simple DBA on a complex production system

Comments are closed.