import into UTF8 database

A common error when you import single-byte characters (e.g. iso8859p1 or mswin1252) into multi-bytes databases (e.g. utf8) is ORA-12899: value too large for column.

The root cause is the default semantics in a database being BYTE


SQL> select VALUE, ISDEFAULT
from v$parameter
where NAME='nls_length_semantics'
VALUE ISDEFAULT
------- ---------
BYTE TRUE

It means, one char equals one byte. But after conversion, one char is larger than one byte and does not fit any longer.

single-byte

SQL> select VALUE
from nls_database_parameters
where parameter='NLS_CHARACTERSET';
VALUE
-------------
WE8MSWIN1252
SQL> create table t(x char(1));
Table created.
SQL> insert into t values ('é');
1 row created.
SQL> commit;
Commit complete.
$ expdp scott/tiger dumpfile=t.dmp tables=t
. . exported "SCOTT"."T" 1 rows

multi-byte

SQL> select VALUE
from nls_database_parameters
where parameter='NLS_CHARACTERSET';
VALUE
-----------
UTF8
$ impdp scott/tiger dumpfile=t.dmp
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type
TABLE_EXPORT/TABLE/TABLE_DATA
ORA-02374: conversion error loading table
"SCOTT"."T"
ORA-12899: value too large for column X
(actual: 2, maximum: 1)
ORA-02372: data for row: X : 0X'E9'
. . imported "SCOTT"."T" 0 out of 1 rows

How do I import my data?

1) import the metadata

$ impdp scott/tiger dumpfile=t.dmp content=metadata_only
Processing object type TABLE_EXPORT/TABLE/TABLE

2) change the char_used of the column(s) from (B)yte to (C)har

SQL> select
column_name, char_used, data_length, data_type
from user_tab_columns
where table_name='T' and char_used='B';
COLUMN_NAME C DATA_LENGTH DATA_TYPE
------------ - ----------- ---------
X B 1 CHAR
SQL> alter table t modify x char(1 char);
Table altered.

3) import the data

$ impdp scott/tiger dumpfile=t.dmp content=data_only
Processing object type
TABLE_EXPORT/TABLE/TABLE_DATA
. . imported "SCOTT"."T" 1 rows

4) check

SQL> select x, length(x), lengthb(x) from t;
X LENGTH(X) LENGTHB(X)
- ---------- ----------
é 1 2

My column has now a length of one char and two bytes.

4 thoughts on “import into UTF8 database

Comments are closed.