Transporting Oracle CHARs over a dblink

Oracle Logo At work, we've got two Oracle databases and a Java web application. One of the tasks of the web application is to copy tables over from one database to the other. Last week we ran into a funny problem which turns out to be an Oracle bug.

I've posted a question about this last week on stackoverflow.com but it seems that not many people are affected by this particular problem, or at least not many people know the answer to this particular little problem. We set out to tackle this and we did.

**The Setup ** We have two databases, and want to transfer a complete table from the remote database to the local database, using a database link and a "create table as select". For both databases, we can find the version and used character set with a few simple queries:

select * from v$version where banner like 'Oracle%';

select value from NLS_DATABASE_PARAMETERS where parameter='NLS_CHARACTERSET';

The results in our setup are:

  **Version** **Character set**
**Remote:** Oracle 10g Release 10.2.0.3.0 WE8ISO8859P1
**Local:** Oracle 9i Release 9.2.0.6.0 AL32UTF8
As you see, we transfer data between different versions of Oracle, and different character sets. Add the CHAR datatype into the mix, and an annoying little bug pops up.

**The Test Data ** On the remote database, we create a table and fill it with some fake data, like so:

create table foo    (vcharbyte varchar2(1 byte),     vcharchar varchar2(1 char),     charbyte  char(1 byte),     charchar  char(1 char));

insert into foo     values ('A', 'B', 'C', 'D');

Let's check how the datatypes of our new table look:

select column_name, data_type, data_length as bytes, char_length as chars from user_tab_columns where table_name like 'FOO';

COLUMN_NAME DATA_TYPE BYTES CHARS
VCHARBYTE VARCHAR2 1 1
VCHARCHAR VARCHAR2 1 1
CHARBYTE CHAR 1 1
CHARCHAR CHAR 1 1
Everything looks fine. In the remote database, the data types are 1 byte in size, and 1 character in length, as expected.

**Copying the table over ** Log on to the local database. Setup a link to the remote database and create a local table with a "create table as select" statement:

create database link foo_link connect to foodb identified by foodb using 'foo_tns_entry';

create table bar as select * from foo@foo_link;

Everthing seems to be ok, and selecting the new data on the "bar" table seems to work. But there's a snag. When using JDBC to connect to the database, you might notice that the CHARBYTE column is padded with 2 spaces. Why? First we look at what happened to our data types:

select column_name, data_type, data_length as bytes, char_length as chars from user_tab_columns where table_name like 'BAR'

COLUMN_NAME DATA_TYPE BYTES CHARS
CHARBYTE CHAR **3** **3**
CHARCHAR CHAR 4 1
VCHARBYTE VARCHAR2 **3** **3**
VCHARCHAR VARCHAR2 4 1
You may notice something funny going on here. As mentioned earlier, the remote database uses the WE8ISO8859P1 character set, which uses 1 byte to store a character. The local database uses the AL32UTF8 character set, which uses 3 to 4 bytes to store a character. So in order to copy characters over Oracle reserves 4 bytes per character (see CHARCHAR and VCHARCHAR columns).

For the data types we defined the size in bytes in, Oracle reserves 3 bytes for some reason. This is strange, because the AL32UTF8 character set has some 4 byte characters in it which never can be stored in those columns. Suppose Oracle says "but you defined length in bytes", it still doesn't make sense. Why would I not choose to do conversion, but still reserve 3 times as much space for each byte? But wait, that's not all.

Lets look at the length of the datatypes in characters (which makes much more sense). When we define the length of the original column in characters, oracle converts the data properly, as you see for the "CHARCHAR" and "VCHARCHAR" columns. But, for the columns we defined size in bytes, Oracle screws up. Not only the the number of bytes gets scaled up, but the number of characters also gets scaled up, and your content is padded with spaces. It seems as if Oracle has given us a 3 character column with the WE8ISO8859P1 character set in a AL32UTF8 database, but I did not do tests to prove this.

Oracle decided to handle the datatypes differently depending on how you defined it's length. Although I find that very obscure, the only difference in behaviour I would expect is that if I define size in bytes, it should stay that size in bytes in the new character set, because I was "planning to store bytes".

So defining the size of character-based types in characters results in predictable behaviour. Defining the size of character-based types in bytes makes Oracle go loopy.

Conclusion: Stay clear of the CHAR datatype, unless you really have to. If you want to use the CHAR type, make sure you always specify the size in characters, not in bytes. Not only will it prevent you from having this problem, it will also make your database scripts more concise and readable.

_P.S. Make sure you clean up both databases and the link: _drop table bar; drop database link foo_link;

And on the remote database __ drop table foo;