Typed version of jexp's batch import?


James Thornton

Aug 7, 2012, 10:50:18 PM
to ne...@googlegroups.com
Michael -

Is there a version of the batch importer (https://github.com/jexp/batch-import) that handles typed data? The version on GitHub looks like it only handles strings.

Thanks.

- James

Michael Hunger

Aug 8, 2012, 2:13:15 AM
to ne...@googlegroups.com
James,

You're right, but it should be pretty simple to add.

I'm thinking of specifying the type either in the column name, like age:int, or in a second line below the header.
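A minimal sketch of the first option (type in the column name) could look like the following. The TypedHeader class and the set of type names (int, long, double, boolean) are assumptions made for illustration and are not taken from the batch-import code:

```java
// Hypothetical helper, not part of batch-import: the class name and the
// supported type names are assumptions for illustration only.
class TypedHeader {
    final String name;
    final String type; // "string" when the header cell declares no type

    TypedHeader(String name, String type) {
        this.name = name;
        this.type = type;
    }

    // Splits a header cell such as "age:int" into a name and a type.
    static TypedHeader parse(String cell) {
        int idx = cell.lastIndexOf(':');
        if (idx < 0) {
            return new TypedHeader(cell, "string");
        }
        return new TypedHeader(cell.substring(0, idx), cell.substring(idx + 1));
    }

    // Converts a raw cell value according to the declared type;
    // unknown or missing types leave the value as a string.
    Object convert(String raw) {
        switch (type) {
            case "int":     return Integer.valueOf(raw);
            case "long":    return Long.valueOf(raw);
            case "double":  return Double.valueOf(raw);
            case "boolean": return Boolean.valueOf(raw);
            default:        return raw;
        }
    }
}
```

With this sketch, a column headed age:int would yield Integer values, while a column headed plainly name would keep its values as strings.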

Another option I can think of is to treat the cell contents as JSON and let the JSON parser take care of the types, and then run the importer with -json.

What do you think?

Michael

Friso van Vollenhoven

Aug 8, 2012, 4:49:53 AM
to ne...@googlegroups.com
Here's my solution to this:

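import java.util.regex.Pattern;

// Candidate patterns, matched against each raw property value
private static final Pattern fpNumber = Pattern.compile("^-?\\d+\\.\\d+$");
private static final Pattern intNumber = Pattern.compile("^-?\\d+$");
private static final Pattern hexNumber = Pattern.compile("^0x[0-9A-Fa-f]+$");
// etc...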

Then match each property value against the different patterns to figure out what it is. This is computationally a bit expensive, though. You could, however, sample the data: e.g. only run the matchers against the first N records and then assume the same type for the rest of the set.
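The sampling idea might be sketched roughly as follows. The TypeSampler class, its method names, the returned type labels, and the rule of falling back to "string" when sampled values disagree are all assumptions for illustration, not code from the importer:

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of pattern-based type inference with sampling.
class TypeSampler {
    private static final Pattern FP = Pattern.compile("^-?\\d+\\.\\d+$");
    private static final Pattern INT = Pattern.compile("^-?\\d+$");
    private static final Pattern HEX = Pattern.compile("^0x[0-9A-Fa-f]+$");

    // Classifies a single raw value by trying each pattern in turn.
    static String classify(String value) {
        if (INT.matcher(value).matches()) return "int";
        if (FP.matcher(value).matches()) return "float";
        if (HEX.matcher(value).matches()) return "hex";
        return "string";
    }

    // Infers a column type from at most sampleSize leading values.
    // Assumption: if two sampled values disagree, treat the column as strings.
    static String inferType(List<String> column, int sampleSize) {
        String inferred = null;
        int limit = Math.min(sampleSize, column.size());
        for (int i = 0; i < limit; i++) {
            String t = classify(column.get(i));
            if (inferred == null) {
                inferred = t;
            } else if (!inferred.equals(t)) {
                return "string";
            }
        }
        return inferred == null ? "string" : inferred;
    }
}
```

Note that values beyond the sample are never inspected, so a column whose first N values look numeric is assumed numeric even if a later row is not.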


Friso



Michael Hunger

Aug 8, 2012, 4:58:42 AM
to ne...@googlegroups.com
Good idea, sampling makes sense.

And if you assume wrongly, the conversion will tell you :)
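To illustrate how a wrong type assumption surfaces at conversion time, here is a small sketch. The CheckedConversion class and the row-number reporting are assumptions for illustration; the only library behavior relied on is that Long.parseLong throws NumberFormatException for non-numeric input:

```java
// Hypothetical sketch: a wrongly assumed numeric type fails loudly on conversion.
class CheckedConversion {
    // Parses a value as a long; reports the offending row if the
    // assumed type turns out to be wrong for that value.
    static long toLong(String raw, int row) {
        try {
            return Long.parseLong(raw);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                "Row " + row + ": expected an integer but got '" + raw + "'", e);
        }
    }
}
```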

Michael

Michael Hunger

Aug 9, 2012, 10:23:57 PM
to ne...@googlegroups.com
Ok, James, I added the type support.

I went with explicit type information in the header row, though, because sampling can lead to properties that should stay strings being converted to numbers.
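As an illustration of how pattern-based guessing can convert a property that should stay a string, consider an identifier with a leading zero. The LeadingZeroPitfall class and the postal-code style value are hypothetical examples, not data from this thread:

```java
import java.util.regex.Pattern;

// Hypothetical sketch: digit-only strings are silently turned into numbers.
class LeadingZeroPitfall {
    private static final Pattern INT = Pattern.compile("^-?\\d+$");

    // Pattern-based guess: anything that looks like digits becomes an Integer.
    static Object guess(String raw) {
        return INT.matcher(raw).matches() ? (Object) Integer.valueOf(raw) : raw;
    }
}
```

Here guess("02134") returns the Integer 2134, so the leading zero of the original string is lost. Declaring the column type in the header avoids this.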

Michael

James Thornton

Aug 10, 2012, 12:14:44 AM
to ne...@googlegroups.com


On Thursday, August 9, 2012 9:23:57 PM UTC-5, Michael Hunger wrote:
Ok, James, I added the type support.

I went with explicit type information in the header row, though, because sampling can lead to properties that should stay strings being converted to numbers.


Ha! That's exactly what I recommended this morning to the person who asked me about it, but you may have beaten him to it :)

Thanks Michael, I'll let him know!

- James
 

Amergin

Aug 15, 2012, 4:09:30 AM
to ne...@googlegroups.com
First of all, many thanks to Michael for adding the support for primitive types in the importer.

Would it be possible to add typing support for index properties as well? It seems currently there is no such support for index property values.

Also, once numeric properties for indexes are supported, it'd be great to have support for numeric range searches from an index as explained in http://docs.neo4j.org/chunked/stable/indexing-lucene-extras.html