CSVREAD: unable to convert double-quoted string to NULL

406 views
Skip to first unread message

spacewiz

unread,
Aug 6, 2012, 7:17:22 PM8/6/12
to h2-da...@googlegroups.com, oleg....@datapop.com
Hello,

I saw a number of posts describing similar problems, however was not able to fine a solution...

My program receives a tab-separated file from a 3rd party (which I cannot modify) , which sometimes has double-quotes surrounding values.
I've attached a sample file to this email. I'm using CSVREAD to import the file as an H2 database table for further downstream processing.
The problem is that the downstream logic depends on empty strings (including "") being converted to NULL values.

No matter what I try - I'm not able to convert empty strings represented by two double-quotes "" between tabs to NULL values in database table :(

I tried many things including passing ||CHAR(34)||CHAR(34)|| to nullString parameter, but nothing works...
Looks like a bug?
-------
DROP TABLE IF EXISTS sample_table;

CREATE MEMORY TABLE IF NOT EXISTS sample_tab;le 
AS SELECT * FROM CSVREAD('sample_tsv_with_double_quotes.csv',null,
'caseSensitiveColumnNames=true charset=UTF-8 nullString="" fieldSeparator= ');

SELECT * FROM sample_table ;
-------
(sample input file is attached)

Any help is appreciated!

Thank You!
Oleg
sample_tsv_with_double_quotes.csv

Thomas Mueller

unread,
Aug 10, 2012, 4:19:10 PM8/10/12
to h2-da...@googlegroups.com
Hi,

The nullString is only used for non-delimited values. I will document this. Example:

A,B,C
1,NULL,X
2,,Y
"3","NULL","Z"

If the nullString is NULL, Column B of row 1 is null. Column B of row 3 is the literal "NULL". I don't plan to change this behavior.

If you want to convert quoted empty strings to NULL, you need to convert it yourself. It doesn't make sense to add such a feature to H2 in my view, as it would make things more complicated for a very unusual use case.

Regards,
Thomas

--
You received this message because you are subscribed to the Google Groups "H2 Database" group.
To view this discussion on the web visit https://groups.google.com/d/msg/h2-database/-/LmV7tuExSOkJ.
To post to this group, send email to h2-da...@googlegroups.com.
To unsubscribe from this group, send email to h2-database...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/h2-database?hl=en.

spacewiz

unread,
Aug 14, 2012, 4:40:29 PM8/14/12
to h2-da...@googlegroups.com
Thomas,

The use case is not that unusual, for the company I have to work with standard Google AdWords bulksheets, and this is the format they use.
for nullString based on available documentation my assumption was that whatever string I pass to it - if the value of the field in the input file is that value - it will be converted to NULL in the database table.
I played with converting "" to NULLNULL as a pre-processing step, and specifying nullString=NULLNULL, and that worked, but nullString="" or nullString="""" or nullString=CHAR(34)||CHAR(34) did not work.
I just don't see why it works for some characters, but not double-quotes...

Anyway, in my case pre-processing and modifying input files in not practical, so my solution was a custom AFTER INSERT trigger implemented in Java that went though every inserted value and for empty strings converted them to NULL. The result is an insert performance hit, but it's not bad for files with a few tens of thousands of records.. still will have to test what it will be for larger files, which could contain a few millions of rows..

Thanks,
Oleg
Reply all
Reply to author
Forward
0 new messages