Problem with big data source!


Reihaneh Amini

Jan 3, 2017, 4:59:33 PM
to Silk
Hi All,

I have an N-Triples file that is 27 GB. I cannot upload it in the Workbench!
Does anyone have a suggestion?
This is just one of my sources, and my target source is as large as this file.



Jindřich Mynarz

Jan 3, 2017, 5:12:49 PM
to silk-di...@googlegroups.com
Hi,

you can load your data into an RDF store and access it from Silk via a SPARQL endpoint. Among the open-source offerings on the RDF store market, Virtuoso (https://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtBulkRDFLoader) is one example capable of handling data of this size.
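Once the data is in an RDF store, Silk reads it through the store's SPARQL endpoint. A minimal sketch of querying such an endpoint over HTTP with only the standard library; the endpoint URL below is the Virtuoso default and an assumption, so adjust it to your installation:

```python
import urllib.parse
import urllib.request

ENDPOINT = "http://localhost:8890/sparql"  # assumed Virtuoso default; adjust as needed

def sparql_url(query, endpoint=ENDPOINT):
    """Build the GET URL for a SPARQL query, asking for JSON results."""
    params = urllib.parse.urlencode({
        "query": query,
        "format": "application/sparql-results+json",
    })
    return f"{endpoint}?{params}"

def run(query, endpoint=ENDPOINT):
    """Execute the query (requires a running SPARQL endpoint)."""
    with urllib.request.urlopen(sparql_url(query, endpoint)) as resp:
        return resp.read().decode("utf-8")
```

For example, `run("SELECT * WHERE { ?s ?p ?o } LIMIT 10")` returns the first ten triples as JSON once the store is up.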

You can also reduce your dataset to only the resources that you link (perhaps instances of a given class) and the properties that you use in your linkage rule. Depending on your dataset, this may lead to a significant decrease in size.
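Because N-Triples is line-oriented (one complete triple per line), this reduction can be done with a simple streaming pass. A sketch that keeps only triples whose predicate appears in the linkage rule; the predicate URIs in the whitelist are examples, not your actual rule:

```python
# Example predicates to keep; replace with the properties your linkage rule uses.
KEEP = {
    "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>",
    "<http://www.w3.org/2000/01/rdf-schema#label>",
}

def filter_nt(src, dst):
    """Copy only lines whose predicate (second whitespace-delimited token,
    always a space-free IRI in N-Triples) is in KEEP."""
    with open(src, encoding="utf-8") as fin, \
         open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            parts = line.split(None, 2)  # subject, predicate, rest
            if len(parts) == 3 and parts[1] in KEEP:
                fout.write(line)
```

The same pattern extends to keeping only instances of a given class: first collect the subjects of matching `rdf:type` triples in one pass, then keep their triples in a second pass.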

As a last resort, you can split your source file into smaller chunks and load each individually.
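Splitting is safe for N-Triples precisely because every line is a self-contained triple, so a file can be cut at any line boundary. A minimal sketch, with the chunk size as an assumed placeholder to tune against what the Workbench can load:

```python
CHUNK_TRIPLES = 1_000_000  # assumed; adjust to what your Workbench instance handles

def split_nt(src, prefix, chunk_size=CHUNK_TRIPLES):
    """Split src into prefix-000.nt, prefix-001.nt, ... with at most
    chunk_size lines each. Returns the list of chunk paths."""
    out, n, part, paths = None, 0, 0, []
    with open(src, encoding="utf-8") as fin:
        for line in fin:
            if out is None or n >= chunk_size:
                if out:
                    out.close()
                path = f"{prefix}-{part:03d}.nt"
                out = open(path, "w", encoding="utf-8")
                paths.append(path)
                part += 1
                n = 0
            out.write(line)
            n += 1
    if out:
        out.close()
    return paths
```

Note that this would not work for formats with multi-line statements or shared prefixes, such as Turtle.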

Best,

Jindřich

-- 
Jindřich Mynarz

--
You received this message because you are subscribed to the Google Groups "Silk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to silk-discussion+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reihaneh Amini

Jan 3, 2017, 9:53:20 PM
to Silk
Thanks for the response!
Do you know the maximum size of data that I can upload there?
Is it fine if I use the .nt format?

_Reihan

Robert Isele

Jan 4, 2017, 4:42:02 AM
to silk-discussion
Dear Reihan,

the RDF dataset in Silk currently loads all data into an in-memory Jena Model and is thus limited by the available memory. I just improved the plugin description to make this clear.
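A rough way to check whether a file will fit is to count its triples and multiply by an assumed per-triple in-memory cost. The 200 bytes/triple figure below is purely illustrative, not a measured Jena number; real overhead varies with term sharing and JVM settings:

```python
BYTES_PER_TRIPLE = 200  # assumed, illustrative average; not a measured Jena figure

def estimate_heap_gb(nt_path):
    """Stream an N-Triples file, count triples (one per non-empty,
    non-comment line), and return an order-of-magnitude heap estimate in GB."""
    triples = 0
    with open(nt_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#"):
                triples += 1
    return triples * BYTES_PER_TRIPLE / 1e9
```

If the estimate exceeds the heap you can give the JVM, loading via an RDF store is the safer route.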

As Jindřich wrote, the preferred way of handling large RDF datasets is to load them into an RDF store, such as Virtuoso.

For large datasets, there is also commercial support for processing them on a Spark Cluster. Please contact me in case you are interested in that.

Kind regards,
Robert


