Note that you can load files into the default graph and named graphs
at database creation time using the API; you don't need to separate
loading into two steps. The function that maps files to named graphs
cannot return null (which is why you got the NPE earlier), but you can
use the special named graph <tag:stardog:api:context:default>
(Contexts.DEFAULT in the API) to accomplish this. In your first
example it would look like this:
Map<Path, Resource> paths = Maps.newHashMap();
paths.put(Paths.get("/tmp/data1.zip"), Contexts.DEFAULT);
paths.put(Paths.get("/tmp/data2.zip"),
          Values.iri("http://example.com/ns/databases/data2"));
paths.put(Paths.get("/tmp/data3.zip"),
          Values.iri("http://example.com/ns/databases/data3"));

// admin is an AdminConnection obtained elsewhere
Connection conn = admin
    .disk(dbName)
    .set(SearchOptions.SEARCHABLE, true)
    .create(path -> paths.get(path), paths.keySet().toArray(new Path[0]));
>
> 2. I had 500,000 files to upload in default graph. But this is not a problem
> either from CLI or code and 20 million triples are uploaded in 3 minutes.
>
> 3. For named graph I had to upload 90000+ files. This is rather slow
> compared to default graph creation and this is only available from code not
> CLI.
The CLI allows adding multiple files into an arbitrary named graph via
"data add --named-graph". It does not allow adding different files
into different named graphs in one transaction. If getting all the
data in one transaction is not a requirement, which seems to be the
case here, you can run a separate CLI command for each named graph.
You can also pass a directory to the data add command so that all the
files under that directory will be loaded into the named graph you
specify, as in the example below.
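For example, something along these lines (the database name "myDb" and
the directory paths are placeholders; check the CLI help for the exact
flag syntax in your version):

stardog data add myDb --named-graph http://example.com/ns/databases/data2 /tmp/data2/
stardog data add myDb --named-graph http://example.com/ns/databases/data3 /tmp/data3/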
>
> 4. I found out that a single connection could not take all this data and
> that is why I was getting GC overhead limit reached error. Eventually I had
> to split files in batches and upload limited number of files in single
> connection.
The client does not push updates to the server until commit is
called. Triples added using Adder.statement or Adder.graph accumulate
in memory on the client side, but files added through IO would not be
parsed at all if the serverSide option is used; only the file paths
would be kept in memory. In your case, having 90K+ files might still
take up considerable memory, so you'd need to increase the memory
available to the client (e.g. via -Xmx).
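For illustration, here is a minimal sketch of that batching approach.
It assumes an already-open Connection named conn and a collection of
Paths named files; the IO method names (io, format, file) follow the
pattern mentioned above but are assumptions that may differ between
versions:

// Sketch only: commit in batches so client-side state stays bounded.
// BATCH_SIZE is a placeholder; tune it to your client memory.
int BATCH_SIZE = 10000;
int count = 0;
conn.begin();
for (Path file : files) {
    // exact IO method names here are assumptions, not a fixed API
    conn.add().io().format(RDFFormat.TURTLE).file(file);
    if (++count % BATCH_SIZE == 0) {
        conn.commit();  // push this batch to the server
        conn.begin();   // start the next batch
    }
}
conn.commit();          // final partial batch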
>
> 5. The ConnectionConfiguration + IO code does not work with zip files. Here
> is the error I got,
>
> error: Exception in thread "main" com.complexible.stardog.StardogException:
> java.lang.NullPointerException
> error: at
> com.complexible.stardog.protocols.client.SPECClientUtil.toStardogException(SPECClientUtil.java:87)
> error: at
> com.complexible.stardog.protocols.client.SPECClientUtil.toStardogException(SPECClientUtil.java:35)
> error: at
> com.complexible.stardog.api.impl.SPECConnection.applyChanges(SPECConnection.java:322)
> error: at
> com.complexible.stardog.api.impl.AbstractConnection.pushOutstanding(AbstractConnection.java:286)
> error: at
> com.complexible.stardog.api.impl.AbstractConnection.commit(AbstractConnection.java:195)
This is a bug. We've created a ticket (#2849) for this issue.
>
> 6. I'll try passing a directory to create method, is it supposed to be
> faster?
It might be faster in some cases (multiple threads would be used on
the server side when parsing multiple files), but I was suggesting it
mostly for convenience: you wouldn't need to create a zip file if one
doesn't already exist. See the example below.
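For instance, the map in the first example above could point at a
directory instead of a zip file (the directory path here is
hypothetical):

paths.put(Paths.get("/tmp/data3-dir"),
          Values.iri("http://example.com/ns/databases/data3"));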
>
> Suggestion: Creation of default graph is very easy in Stardog. Creation of
> named graphs should be equally easy.
The API already allows this, but we'll update the documentation to
explain how it works in more detail.
Best,
Evren
>
> -Ajay
>
> On Thursday, February 25, 2016 at 4:22:31 PM UTC+5:30, Ajay Kamble wrote:
>>
>> We are using [stardog-admin db create ...] to create graph with bulk load.
>>
>> We also want to create some named graphs. Is it possible to do this in the
>> same command [stardog-admin db create ...] or we need to execute separate
>> commands to create named graphs? Both default and named graphs have large
>> number of triples so we need bulk load for both.
>>
>> -Regards
>> Ajay
>