CSV part 4, fast!!

I got some comments that my other CSV solutions were slow at exporting gigabytes of data.

One more try.

Thanks to the feedback, here is a new version.

It can generate very large files in a few minutes instead of hours.

It uses BULK COLLECT and UTL_FILE to boost performance.

CREATE TYPE collist IS TABLE OF VARCHAR2 (4000)
/

CREATE OR REPLACE PROCEDURE bulk_csv (directory_name   VARCHAR2,
                                      file_name        VARCHAR2,
                                      query            VARCHAR2)
   AUTHID CURRENT_USER
IS
   -- $Id$
   fh             UTL_FILE.file_type;
   stmt           VARCHAR2 (32767) := NULL;
   header         VARCHAR2 (32767) := NULL;
   curid          NUMBER;
   desctab        DBMS_SQL.DESC_TAB;
   colcnt         NUMBER;
   namevar        VARCHAR2 (32767);

   TYPE cola IS TABLE OF collist
                   INDEX BY BINARY_INTEGER;

   res            cola;
   rcur           SYS_REFCURSOR;
   current_line   VARCHAR2 (32767);
   next_line      VARCHAR2 (32767);

BEGIN
   curid := DBMS_SQL.open_cursor;
   DBMS_SQL.parse (curid, query, DBMS_SQL.NATIVE);
   DBMS_SQL.DESCRIBE_COLUMNS (curid, colcnt, desctab);

   FOR i IN 1 .. colcnt
   LOOP
      DBMS_SQL.DEFINE_COLUMN (curid,
                              i,
                              namevar,
                              32767);
   END LOOP;

   IF DBMS_SQL.execute (curid) = 0
   THEN
      FOR i IN 1 .. colcnt
      LOOP
         IF (i > 1)
         THEN
            header := header || ';';
            stmt := stmt || ',';
         END IF;

         header := header || desctab (i).col_name;
         stmt :=
               stmt
            || CASE
                  WHEN desctab (i).col_type IN
                          (DBMS_SQL.Varchar2_Type,
                           DBMS_SQL.Char_Type)
                  THEN
                     '"'||desctab (i).col_name || '"'
                  WHEN desctab (i).col_type IN
                          (DBMS_SQL.Number_Type,
                           DBMS_SQL.Date_Type,
                           DBMS_SQL.Binary_Float_Type,
                            DBMS_SQL.Binary_Double_Type,
                           DBMS_SQL.Timestamp_Type,
                           DBMS_SQL.Timestamp_With_TZ_Type,
                           DBMS_SQL.Interval_Year_to_Month_Type,
                           DBMS_SQL.Interval_Day_To_Second_Type,
                           DBMS_SQL.Timestamp_With_Local_TZ_type)
                  THEN
                     'to_char("' || desctab (i).col_name || '")'
                   WHEN desctab (i).col_type = DBMS_SQL.Raw_Type
                   THEN
                      'rawtohex("' || desctab (i).col_name || '")'
                   WHEN desctab (i).col_type = DBMS_SQL.Rowid_Type
                   THEN
                      '''unsupported datatype : ROWID'''
                   WHEN desctab (i).col_type = DBMS_SQL.Long_Type
                   THEN
                      '''unsupported datatype : LONG'''
                   WHEN desctab (i).col_type = DBMS_SQL.Long_Raw_Type
                   THEN
                      '''unsupported datatype : LONG RAW'''
                   WHEN desctab (i).col_type = DBMS_SQL.User_Defined_Type
                   THEN
                      '''unsupported datatype : User Defined Type'''
                   WHEN desctab (i).col_type = DBMS_SQL.MLSLabel_Type
                   THEN
                      '''unsupported datatype : MLSLABEL'''
                   WHEN desctab (i).col_type = DBMS_SQL.Ref_Type
                   THEN
                      '''unsupported datatype : REF'''
                   WHEN desctab (i).col_type = DBMS_SQL.Clob_Type
                   THEN
                      '''unsupported datatype : CLOB'''
                   WHEN desctab (i).col_type = DBMS_SQL.Blob_Type
                   THEN
                      '''unsupported datatype : BLOB'''
                   WHEN desctab (i).col_type = DBMS_SQL.Bfile_Type
                   THEN
                      '''unsupported datatype : BFILE'''
                   WHEN desctab (i).col_type = DBMS_SQL.Urowid_Type
                   THEN
                      '''unsupported datatype : UROWID'''
                   ELSE
                      '''unsupported datatype : ' || desctab (i).col_type || ''''
                END;
      END LOOP;

      stmt := 'select collist(' || stmt || ') from (' || query || ')';

      fh :=
         UTL_FILE.fopen (directory_name,
                         file_name,
                         'W',
                         32767);

      BEGIN
         OPEN rcur FOR stmt;
      EXCEPTION
         WHEN OTHERS
         THEN
            -- show the generated statement to ease debugging, then re-raise
            DBMS_OUTPUT.put_line (stmt);
            RAISE;
      END;
      -- start the output with the header line, written only once
      current_line := header;

      LOOP
         FETCH rcur
         BULK COLLECT INTO res
         LIMIT 10000;

         next_line := NULL;

         FOR f IN 1 .. res.COUNT
         LOOP
            FOR g IN 1 .. res (f).COUNT
            LOOP
               IF (g > 1)
               THEN
                  next_line := next_line || ';';
               END IF;

               -- UTL_FILE.put_line cannot write more than 32767 bytes at
               -- once; flush the buffered lines before hitting the limit
               IF (  NVL (LENGTH (current_line), 0)
                   + NVL (LENGTH (next_line), 0)
                   + NVL (LENGTH (res (f) (g)), 0)
                   + 5 > 32767)
               THEN
                  UTL_FILE.put_line (fh, current_line);
                  current_line := NULL;
               END IF;

               IF (NVL(LENGTH (next_line),0) + NVL(LENGTH (res (f) (g)),0) + 5 > 32767)
               THEN
                  UTL_FILE.put_line (fh, next_line);
                  next_line := NULL;
               END IF;

               next_line := next_line || res (f) (g);
            END LOOP;

            current_line :=
                  CASE
                     WHEN current_line IS NOT NULL
                     THEN
                        current_line || CHR (10)
                  END
               || next_line;
            next_line := NULL;
         END LOOP;

         IF current_line IS NOT NULL
         THEN
            UTL_FILE.put_line (fh, current_line);
            current_line := NULL;
         END IF;

         EXIT WHEN rcur%NOTFOUND;
      END LOOP;

      CLOSE rcur;

      UTL_FILE.fclose (fh);
   END IF;

   DBMS_SQL.CLOSE_CURSOR (curid);
END;
/

CREATE OR REPLACE DIRECTORY tmp AS '/tmp';

EXEC bulk_csv('TMP','emp.csv','SELECT * FROM EMP ORDER BY ENAME')


EMPNO;ENAME;JOB;MGR;HIREDATE;SAL;COMM;DEPTNO
7876;ADAMS;CLERK;7788;1987-05-23 00:00:00;1100;;20
7499;ALLEN;SALESMAN;7698;1981-02-20 00:00:00;1600;30;30
7698;BLAKE;MANAGER;7839;1981-05-01 00:00:00;2850;;30
7782;CLARK;MANAGER;7839;1981-06-09 00:00:00;2450;;10
7902;FORD;ANALYST;7566;1981-12-03 00:00:00;3000;;20
7900;JAMES;CLERK;7698;1981-12-03 00:00:00;950;;30
7566;JONES;MANAGER;7839;1981-04-02 00:00:00;2975;;20
7839;KING;PRESIDENT;;1981-11-17 00:00:00;5000;;10
7654;MARTIN;SALESMAN;7698;1981-09-28 00:00:00;1250;140;30
7934;MILLER;CLERK;7782;1982-01-23 00:00:00;1300;;10
7788;SCOTT;ANALYST;7566;1987-04-19 00:00:00;3000;;20
7369;SMITH;CLERK;7902;1980-12-17 00:00:00;800;;20
7844;TURNER;SALESMAN;7698;1981-09-08 00:00:00;1500;0;30
7521;WARD;SALESMAN;7698;1981-02-22 00:00:00;1250;50;30

On materialized view constraints

Oracle is pretty strong at enforcing constraints.

Table for this blog post:
create table t(x number primary key, y number);

For instance, if you alter table t add check (y<1000); then Y will never reach 1000, right?

SQL> insert into t values (1,2000);
insert into t values (1,2000)
Error at line 1
ORA-02290: check constraint (SCOTT.SYS_C0029609) violated

I believe this code to be unbreakable. If you have only SELECT and INSERT privilege on the table, you cannot bypass the constraint.

Let’s imagine a more complex constraint: CHECK (sum(y) < 1000).

SQL> alter table t add check (sum(y) < 1000);
alter table t add check (sum(y) < 1000)
Error at line 1
ORA-00934: group function is not allowed here

OK, clear enough: we cannot enforce this aggregate constraint with a CHECK condition.

We could use a BEFORE trigger that raises an exception:

CREATE TRIGGER tr
   BEFORE INSERT OR UPDATE
   ON T
   FOR EACH ROW
   WHEN (NEW.Y > 0)
DECLARE
   s   NUMBER;
BEGIN
   SELECT SUM (y) INTO s FROM t;

   IF (s + :new.y >= 1000)
   THEN
      raise_application_error (-20001, 'SUM(Y) would exceed 1000');
   END IF;
END;
/

Now the trigger computes the sum and raises an exception whenever the limit would be exceeded.

SQL> insert into t values (2, 600);

1 row created.

SQL> insert into t values (3, 600);
insert into t values (3, 600)
            *
ERROR at line 1:
ORA-20001: SUM(Y) would exceed 1000
ORA-06512: at "SCOTT.TR", line 8
ORA-04088: error during execution of trigger 'SCOTT.TR'

SQL> drop trigger tr;

Trigger dropped.

SQL> truncate table t;

Table truncated.

But I am not good with triggers; triggers are only as good as their developers, and they have dark sides like mutating-table errors and the like.

As Tom Kyte mentioned in the comments, the code above is also not effective when more than one user updates the table at the same time, as sketched below.
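A minimal sketch of the race, assuming two concurrent sessions (the values are just for illustration). Each session's trigger query only sees committed rows, so both inserts pass the check:

-- session 1 (does not commit yet)
insert into t values (10, 600);
-- the trigger sees SUM(Y) = 0 and allows the insert

-- session 2, before session 1 commits
insert into t values (11, 600);
-- this session cannot see the uncommitted row of session 1,
-- so its trigger also sees SUM(Y) = 0 and allows the insert

-- both sessions commit: SUM(Y) = 1200 and no error was raised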

Another popular approach is to create a fast-refreshable-on-commit mview with a constraint.

Let’s see how this works.


create materialized view log on t with rowid, primary key (y) including new values;

create materialized view mv
refresh fast 
on commit 
as select sum(y) sum from t;

alter table mv add check (sum < 1000);

The constraint is on the mview, so once you commit (and only at commit time), Oracle will try to refresh the mview.

SQL> insert into t values (4, 600);

1 row created.

SQL> commit;

Commit complete.

SQL> insert into t values (5, 600);

1 row created.

SQL> commit;
commit
*
ERROR at line 1:
ORA-12008: error in materialized view refresh path
ORA-02290: check constraint (SCOTT.SYS_C0029631) violated

SQL> select * from t;

         X          Y
---------- ----------
         4        600

So far so good. The mechanism rolls back the transaction in case of an ORA-12008, a bit like a DEFERRABLE constraint (see the sketch below).
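For comparison, a minimal sketch of a deferred constraint, which is also only checked at commit time (the constraint name is made up for the example):

alter table t add constraint y_max
   check (y < 1000) initially deferred deferrable;

insert into t values (6, 2000);  -- accepted for now, checked later
commit;                          -- fails here: ORA-02091 with ORA-02290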

But how safe is this after all? Oracle does not enforce anything on the table, it just fails on refresh…

Any operation that does not fulfill the materialized view fast refresh prerequisites will also break the data integrity.

SQL> delete from t;

1 row deleted.

SQL> commit;

Commit complete.

SQL> alter session enable parallel dml;

Session altered.

SQL> insert /*+PARALLEL*/ into t select 100+rownum, rownum*100 from dual connect by level<20;

19 rows created.

SQL> commit;

Commit complete.

SQL> select sum(y) from t;

    SUM(Y)
----------
     19000

SQL> select staleness from user_mviews;

STALENESS
-------------------
UNUSABLE

Your data integrity is gone. By “breaking” the mview, with only the SELECT, INSERT and ALTER SESSION privileges, you can now insert any data.

This is documented under the FAST clause:

For both conventional DML changes and for direct-path INSERT operations, other conditions may restrict the eligibility of a materialized view for fast refresh.

Other operations, like TRUNCATE, may also prevent you from inserting fresh data:


SQL> alter materialized view mv compile;

Materialized view altered.

SQL> exec dbms_mview.refresh('MV','COMPLETE');

PL/SQL procedure successfully completed.

SQL> select * from mv;

       SUM
----------

SQL> insert into t values(1,1);

1 row created.

SQL> commit;

Commit complete.

SQL> select * from mv;

       SUM
----------
         1

SQL> truncate table t;

Table truncated.

SQL> insert into t values(1,1);

1 row created.

SQL> commit;
commit
*
ERROR at line 1:
ORA-32321: REFRESH FAST of "SCOTT"."MV" unsupported after detail table
TRUNCATE

On using Toad against a database

I got this question once again today in a comment on a previous post.

What’s wrong with using Toad against a database?

The worst-case scenario:
– some non-technical staff clicking around in your production database with read-write access :(

The best-case scenario:
– nobody has access to your database :)

Here is a short list of ways to protect your data (example grants after the list):
– Give the right privileges to the right people: the DBA role to the DBA, CREATE TABLE/CREATE INDEX to the developer, INSERT/UPDATE/DELETE to the application
– Restrict access to your database server: use a firewall and allow only the DBA workstations and the application servers to reach the production environment
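A minimal sketch of such grants (the user, schema and table names are hypothetical):

-- the developer may connect and create his own tables
grant create session, create table to dev_user;

-- the application account only gets DML on the application tables
grant insert, update, delete on app_owner.orders to app_user;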

What if the end-user PC needs access to the Production database with a powerful user? This often happens in the real world. A fat client is installed on the PC, the password is somehow hardcoded, and the privileges granted to the hardcoded user are utterly generous…

It is not a bad practice in this case to block access to the database server from Toad/SQLPLUS and the like. This will do very little against a determined garage hacker, but it will prevent your sales and marketing colleagues from deleting data, locking tables and degrading performance. This could be done with a logon trigger like the sketch below or, my preference, with administrative measures like information, auditing and sanctions.
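A minimal sketch of such a logon trigger, assuming the trigger owner can select from v$session. The program name is reported by the client itself and is trivially spoofed, so this is a deterrent, not a security boundary (and users with the ADMINISTER DATABASE TRIGGER privilege are not blocked by a failing logon trigger):

CREATE OR REPLACE TRIGGER block_client_tools
   AFTER LOGON ON DATABASE
DECLARE
   prog   v$session.program%TYPE;
BEGIN
   -- the program name is set by the client and can be faked
   SELECT program
     INTO prog
     FROM v$session
    WHERE sid = SYS_CONTEXT ('USERENV', 'SID');

   IF UPPER (prog) LIKE '%TOAD%'
   THEN
      raise_application_error (-20000, 'This client tool is not allowed');
   END IF;
END;
/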