problem adding a gzipped ntriples file


Bill Roberts

Aug 3, 2013, 11:39:54 AM
to sta...@clarkparsia.com
(Stardog 1.2.3 on Mac OSX 10.8.4)

Having read in the docs that it is allowed and possibly preferred to add compressed files, I was trying to do this:

stardog data add test -g "http://graph1" sl.nt.gz

and get the error below.  The same file loads fine when not gzipped.  Am I doing something wrong?

Many thanks

Bill

[SEVERE com.clarkparsia.stardog.snarl.client.ClientHandler.handleError - Aug 3, 2013 04:36:50.768] Unhandled exception caught on client
com.clarkparsia.stardog.snarl.shared.IdAwareException: java.io.IOException: org.openrdf.rio.RDFParseException: Expected '<' or '_', found:  [line 1]
at com.clarkparsia.stardog.snarl.shared.BigPacketEncoder.writeRequested(BigPacketEncoder.java:235)
at org.jboss.netty.channel.Channels.write(Channels.java:712)
at org.jboss.netty.channel.Channels.write(Channels.java:679)
at org.jboss.netty.channel.AbstractChannel.write(AbstractChannel.java:246)
at com.clarkparsia.stardog.snarl.client.ClientHandler.send(ClientHandler.java:80)
at com.clarkparsia.stardog.snarl.client.ClientHandler.send(ClientHandler.java:104)
at com.clarkparsia.stardog.snarl.client.AbstractClient.execute(AbstractClient.java:199)
at com.clarkparsia.stardog.snarl.client.SNARLClientImpl.add(SNARLClientImpl.java:270)
at com.clarkparsia.stardog.snarl.client.SNARLConnection.change(SNARLConnection.java:269)
at com.clarkparsia.stardog.snarl.client.SNARLConnection.applyChanges(SNARLConnection.java:245)
at com.clarkparsia.stardog.api.impl.AbstractConnection.pushOutstanding(AbstractConnection.java:272)
at com.clarkparsia.stardog.api.impl.AbstractConnection.commit(AbstractConnection.java:190)
at com.clarkparsia.stardog.snarl.client.SNARLConnection.commit(SNARLConnection.java:193)
at com.clarkparsia.stardog.cli.impl.Add.execute(Add.java:115)
at com.clarkparsia.stardog.cli.impl.ConnectionCommand.call(ConnectionCommand.java:82)
at com.clarkparsia.stardog.cli.CLIBase.execute(CLIBase.java:54)
at com.clarkparsia.stardog.cli.CLI.main(CLI.java:89)
Caused by: java.io.IOException: org.openrdf.rio.RDFParseException: Expected '<' or '_', found:  [line 1]
at com.clarkparsia.stardog.rdf.io.StreamStatementIteration.computeNext(StreamStatementIteration.java:138)
at com.clarkparsia.stardog.rdf.io.StreamStatementIteration.computeNext(StreamStatementIteration.java:45)
at com.clarkparsia.common.iterations.AbstractIteration.tryToComputeNext(AbstractIteration.java:104)
at com.clarkparsia.common.iterations.AbstractIteration.hasNext(AbstractIteration.java:91)
at com.clarkparsia.stardog.snarl.shared.BigPacketEncoder.writeRequested(BigPacketEncoder.java:106)
... 16 more
Caused by: org.openrdf.rio.RDFParseException: Expected '<' or '_', found:  [line 1]
at org.openrdf.rio.helpers.RDFParserBase.reportFatalError(RDFParserBase.java:691)
at org.openrdf.rio.ntriples.NTriplesParser.reportFatalError(NTriplesParser.java:590)
at org.openrdf.rio.ntriples.NTriplesParser.parseSubject(NTriplesParser.java:341)
at org.openrdf.rio.ntriples.NTriplesParser.parseTriple(NTriplesParser.java:274)
at org.openrdf.rio.ntriples.NTriplesParser.parse(NTriplesParser.java:187)
at org.openrdf.rio.ntriples.NTriplesParser.parse(NTriplesParser.java:129)
at com.clarkparsia.stardog.rdf.io.StreamStatementIteration$ParseService.run(StreamStatementIteration.java:206)
at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:52)
at java.lang.Thread.run(Thread.java:680)

Expected '<' or '_', found:  [line 1]

Evren Sirin

Aug 5, 2013, 12:54:55 PM
to Stardog
Hi Bill,

Compressed files can only be used during bulk loading at database creation time right now (db create command). Unfortunately we are not producing a meaningful error message here.

We are thinking of adding compressed file support for data add/remove commands in the next release but until then you will need to use uncompressed files after db creation.
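
As a stopgap, the file can be decompressed before it is handed to data add. The sketch below is only an illustration in Python, not part of Stardog: the helper names `is_gzipped` and `decompress_for_add` are made up for this example. It also hints at why the parser complains on line 1: a gzip stream starts with the bytes 0x1f 0x8b, so presumably the N-Triples parser is seeing raw compressed bytes instead of '<' or '_'.

```python
import gzip
import shutil
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzipped(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def decompress_for_add(path):
    """Write an uncompressed copy next to a gzipped file and return its path.

    Hypothetical helper: files that are not gzipped are returned unchanged.
    """
    src = Path(path)
    if not is_gzipped(src):
        return src
    # sl.nt.gz -> sl.nt; append a suffix if the name does not end in .gz
    if src.suffix == ".gz":
        dst = src.with_suffix("")
    else:
        dst = src.with_name(src.name + ".out")
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # streams, so large files are fine
    return dst
```

Calling `decompress_for_add("sl.nt.gz")` leaves the original in place and writes sl.nt beside it, which can then be passed to the same data add command as in the original post.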

Best,
Evren



--
-- --
You received this message because you are subscribed to the C&P "Stardog" group.
To post to this group, send email to sta...@clarkparsia.com
To unsubscribe from this group, send email to
stardog+u...@clarkparsia.com
For more options, visit this group at
http://groups.google.com/a/clarkparsia.com/group/stardog?hl=en
 
 

Bill Roberts

Aug 5, 2013, 1:00:29 PM
to sta...@clarkparsia.com
Thanks Evren - no problem. Looks like I just misunderstood the documentation.

Zachary Whitley

Aug 5, 2013, 1:10:07 PM
to sta...@clarkparsia.com
I thought you could only use compressed files on bulk load as well, but when I looked at the Stardog docs, they say "If a file name passed to create or add commands (through CLI or API) will be interpreted to be a gzip file..." [1]

[1] http://www.stardog.com/docs/admin/#compressed

After having looked through the documentation I have a couple of quick questions about bulk loading.

The docs suggest that you use multiple files to take advantage of multicore architectures and parallel index creation. Would you still get the parallel load with multiple files in a zip file? If you did, would this make multiple files in a zip file a better alternative to a single .gz file? Can you pass multiple .gz files?






Evren Sirin

Aug 5, 2013, 1:31:42 PM
to Stardog
On Mon, Aug 5, 2013 at 1:10 PM, Zachary Whitley <zachary...@gmail.com> wrote:
> I thought you could only use compressed files on bulk load as well, but when I looked at the Stardog docs, they say "If a file name passed to create or add commands (through CLI or API) will be interpreted to be a gzip file..." [1]

That looks like an error in the documentation.

> [1] http://www.stardog.com/docs/admin/#compressed
>
> After having looked through the documentation I have a couple of quick questions about bulk loading.
>
> The docs suggest that you use multiple files to take advantage of multicore architectures and parallel index creation. Would you still get the parallel load with multiple files in a zip file?

No, all the files inside the same zip file are loaded in the same thread.

> If you did, would this make multiple files in a zip file a better alternative to a single .gz file? Can you pass multiple .gz files?

You can pass multiple files to the db create command. The input files can be any combination of zipped, gzipped, and uncompressed files. If you are loading multiple .gz files, multiple threads will be used, but only one thread will be used for a single zip file containing multiple files. How much performance improvement you would see depends on the machine and the dataset.
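
If the data currently sits in one big file, one way to end up with several .gz inputs is to split it first. The following is a rough Python sketch, not a Stardog tool: the function name `split_ntriples` and the part-NNN.nt.gz naming are assumptions for illustration. It relies on N-Triples holding one triple per line, so lines can be dealt out round-robin without cutting a statement in half. One caveat: blank node labels are scoped to a file, so if the data uses labels like _:b1 across many triples, check how they behave once those triples land in different files.

```python
import gzip
from pathlib import Path


def split_ntriples(src, out_dir, parts=4):
    """Distribute the lines of an N-Triples file over `parts` gzipped files.

    Hypothetical helper: `src` may be plain or gzipped (detected by the
    .gz extension). Returns the list of files written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = [out_dir / f"part-{i:03d}.nt.gz" for i in range(parts)]
    outputs = [gzip.open(p, "wb") for p in paths]
    try:
        opener = gzip.open if str(src).endswith(".gz") else open
        with opener(src, "rb") as fin:
            # Deal lines out round-robin so the parts end up similar in size
            for n, line in enumerate(fin):
                outputs[n % parts].write(line)
    finally:
        for out in outputs:
            out.close()
    return paths
```

The returned files can then all be passed to the db create command together. Whether this is actually faster than loading the single file will, as noted above, depend on the machine and the dataset.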

Best,
Evren

Kendall Clark

Aug 5, 2013, 1:35:14 PM
to stardog
I committed some cleanups of this part of the docs to clarify the situation (which may be changing before the 2.0 release).

Cheers,
Kendall