H2 CSVREAD issue with null chars embedded in the data

534 views
Skip to first unread message

rvanamb

unread,
Sep 29, 2010, 4:59:10 PM9/29/10
to H2 Database
First I want to say as someone new to the H2 database that I think you
did an outstanding job on both the quality and documentation of the
software.

I was able to install it and get up and running quickly. Performance
seems good.


The problem I am seeing is when I try to import csv data that contains
null chars in it. I recognize that I can fix the routine that writes
out the data to either drop chars less that 32 or replace them with
spaces but I am hoping that the code for CSVREAD could be changed to
treat the null char/s as an empty char value that is not null.

So a csv data null value would be || (put into the db as null) and |
char(0)| or |char(0)char(0) ... | would be put into the db as '' when
importing into a varchar column.

As a side note MS SQL Server handles the embedded nulls (imported as a
bulk import) by assigning an empty value of '', which is of course not
the same as NULL. It would be nice to be compatible with the CSV logic
that bulk import uses.






Here is an example (names, etc changed somewhat to protect the
client). The value of the ZIP data is char(0) which I don't think
should be entered into the db as null.

/* Table Drop and Create
--------------------------------------------------------------------
*/
DROP TABLE FOO.foo;
CREATE TABLE FOO.foo(
STREET varchar(25) NOT NULL,
STREETAD varchar(14) NOT NULL,
CITY varchar(24) NOT NULL,
ST varchar(2) NOT NULL,
ZIP varchar(5) NOT NULL,
PHONE varchar(11) NOT NULL
);

/* Import Data from flat file
--------------------------------------------------------------------
*/
INSERT INTO FOO.foo (STREET,STREETAD,CITY,ST,ZIP,PHONE) SELECT * FROM
CSVREAD('C:\temp\testdata\foo_foo.dat', 'STREET|STREETAD|CITY|ST|ZIP|
PHONE', 'UTF-8', '|', '\n');



then I get the following error

NULL not allowed for column "ZIP"; SQL statement:

Here is a hex dump of part of the csv data with nulls in it

20202020 20202020 7C202020 20207C4D | |M 000070
41207C00 00000000 00007C30 31202020 A |.......|01 000080

Thomas Mueller

unread,
Oct 4, 2010, 1:59:21 PM10/4/10
to h2-da...@googlegroups.com
Hi,

MS SQL Server compatibility would be nice, but for me it's quite
strange to automatically convert an empty string to NULL, plus char(0)
to an empty string. How to encode char(0) then? Why not use a special
token (such as 'NULL') for NULL?

What about converting char(0) to an empty string in the query? Example:

create alias convert0 as $$
String convert0(String s) {
return s.length() == 0 ? null : s.charAt(0) == 0 ? "" : s;
}
$$

INSERT INTO FOO.foo (STREET,STREETAD,CITY,ST,ZIP,PHONE)
SELECT convert0(street), streetad, city, st, zip, phone


FROM CSVREAD('C:\temp\testdata\foo_foo.dat', 'STREET|STREETAD|CITY|ST|ZIP|
PHONE', 'UTF-8', '|', '\n');

Regards,
Thomas

Reply all
Reply to author
Forward
0 new messages