Create default and named graphs in the same command?


Ajay Kamble

Feb 25, 2016, 5:52:31 AM
to sta...@clarkparsia.com
We are using [stardog-admin db create ...] to create the graph with a bulk load.

We also want to create some named graphs. Is it possible to do this in the same command [stardog-admin db create ...], or do we need to execute separate commands to create the named graphs? Both the default and the named graphs have a large number of triples, so we need bulk load for both.

-Regards
Ajay

Michael Grove

Feb 25, 2016, 6:34:20 AM
to stardog
On Thu, Feb 25, 2016 at 5:52 AM, Ajay Kamble <ajay.ri...@gmail.com> wrote:
> We are using [stardog-admin db create ...] to create graph with bulk load.
>
> We also want to create some named graphs. Is it possible to do this in the same command [stardog-admin db create ...] or we need to execute separate commands to create named graphs?

Yes, it's possible, but only programmatically via the SNARL API. We'll be exposing that capability to the CLI in a future release.

Cheers,

Mike

--
-- --
You received this message because you are subscribed to the C&P "Stardog" group.
To post to this group, send email to sta...@clarkparsia.com
To unsubscribe from this group, send email to
stardog+u...@clarkparsia.com
For more options, visit this group at
http://groups.google.com/a/clarkparsia.com/group/stardog?hl=en

Zachary Whitley

Feb 25, 2016, 6:51:05 AM
to sta...@clarkparsia.com
On Feb 25, 2016, at 6:34 AM, Michael Grove <mi...@stardog.com> wrote:
> Yes, it's possible, but only programmatically via the SNARL API. We'll be exposing that capability to the CLI in a future release.

Would this be for specifying the named graph for bulk load on db creation?

---
You received this message because you are subscribed to the Google Groups "Stardog" group.
To unsubscribe from this group and stop receiving emails from it, send an email to stardog+u...@clarkparsia.com.

Ajay Kamble

Feb 25, 2016, 6:52:05 AM
to Stardog
I checked the SNARL API in the documentation.

The Java examples that I found add one file at a time. We have a lot of data, and we have it in a zip file.

Is this possible with the SNARL API?

Michael Grove

Feb 25, 2016, 7:27:17 AM
to stardog
On Thu, Feb 25, 2016 at 6:52 AM, Ajay Kamble <ajay.ri...@gmail.com> wrote:
> I checked SNARL API in documentation.
>
> The Java examples that I found are adding 1 file at a time. We have lot of data and we have a zip file.
>
> Is it possible in SNARL API?

Yes, via the DatabaseBuilder [1], which you can get from an AdminConnection to create a database.

Cheers,

Mike

Ajay Kamble

Feb 26, 2016, 4:47:56 AM
to Stardog
Is there any example of this?

I am going through the documentation and stardog-examples, but it is taking a lot of time to figure it out.

-Ajay

Zachary Whitley

Feb 26, 2016, 9:34:38 AM
to Stardog
I might be able to provide some help. You mentioned that you have a lot of data in a zip file. Can you share your data? How large is the zip file? How many files are in the zip file? What format are the files in? What platform are you working on (Windows/Linux)?


Evren Sirin

Feb 26, 2016, 10:29:31 AM
to Stardog
There are two examples in the following gist:

https://gist.github.com/evren/7601614

The first example shows how to use the API to load multiple files into
different named graphs during database creation. The second example
shows how to do the same for an existing database.

Best,
Evren

Ajay Kamble

Feb 26, 2016, 11:29:42 AM
to Stardog
It is not possible for me to share data.

Here is the command that I was using earlier:

./stardog-admin db create -n myDB -o search.enabled=true -v -- /tmp/data1.zip /tmp/data2.zip /tmp/data3.zip

Now I want to load data1.zip into the default graph, and data2.zip and data3.zip into named graphs. Here is the Scala code that I have written so far:

val adminConnection = AdminConnectionConfiguration
      .toServer("http://localhost:5820")
      .credentials("admin", "admin")
      .connect()

adminConnection.builder(
  Metadata
    .create()
    .set(DatabaseOptions.NAME, "myDB"))
  .set[java.lang.Boolean](SearchOptions.SEARCHABLE, true)
  .create(Paths.get("/tmp/data1.zip"))

val paths = Map(
  Paths.get("/tmp/data2.zip") -> Values.iri("http://example.com/ns/databases/data2"),
  Paths.get("/tmp/data3.zip") -> Values.iri("http://example.com/ns/databases/data3")
)

adminConnection.builder(
  Metadata
    .create()
    .set(DatabaseOptions.NAME, "myDB"))
  .set[java.lang.Boolean](SearchOptions.SEARCHABLE, true)
  .create(???)

adminConnection.close()

Creating the default graph seems straightforward, but I do not know what to do for the named graphs. Does it take a map from path to namespace?


Ajay Kamble

Feb 28, 2016, 10:37:31 AM
to Stardog
Any ideas?

Is it not possible to create a named graph from a zip file? What is missing or wrong in the code snippet from my earlier post?


Zachary Whitley

Feb 28, 2016, 5:38:53 PM
to sta...@clarkparsia.com
I'm on my phone, so I can't put together a detailed answer right now, but have you looked at the example that Evren put together?


I'd imagine it's exactly what you're looking for. 

If you're not comfortable with that, you could transform your files into TriG or N-Quads, which do support named graphs. If they're already in Turtle and there aren't too many files, you could probably do it by hand, but you didn't say how many files you've got.
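Zach's conversion suggestion can be sketched in Java, the language of the API snippets in this thread. The names `TurtleToTrig`, `isDirective`, and `wrap` are hypothetical, and the approach is deliberately naive: it assumes each `@prefix`/`@base` (or SPARQL-style `PREFIX`/`BASE`) directive sits on its own line, hoists those directives to the top level (TriG only allows directives outside graph blocks), and wraps every other line in a block for the target named graph.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Naive Turtle -> TriG conversion for a single file (hypothetical helper).
class TurtleToTrig {

    // True for Turtle-style (@prefix/@base) or SPARQL-style (PREFIX/BASE) directives.
    static boolean isDirective(String line) {
        String t = line.trim();
        String upper = t.toUpperCase(Locale.ROOT);
        return t.startsWith("@prefix") || t.startsWith("@base")
            || upper.startsWith("PREFIX ") || upper.startsWith("BASE ");
    }

    // Hoists directives to the top level, then wraps everything else in <graphIri> { ... }.
    static String wrap(String turtle, String graphIri) {
        List<String> directives = new ArrayList<>();
        List<String> body = new ArrayList<>();
        for (String line : turtle.split("\\R", -1)) {
            if (isDirective(line)) {
                directives.add(line);
            } else {
                body.add(line);
            }
        }
        StringBuilder out = new StringBuilder();
        for (String d : directives) {
            out.append(d).append('\n');
        }
        out.append('<').append(graphIri).append("> {\n");
        for (String b : body) {
            out.append(b).append('\n'); // copied verbatim so multi-line literals are untouched
        }
        out.append("}\n");
        return out.toString();
    }
}
```

Because the graph name then travels inside the data, a converted file would not need a path-to-graph mapping at load time. Continuation lines of multi-line string literals that happen to begin with `@prefix` or `PREFIX ` would be misclassified, so a real RDF parser is the safer choice for arbitrary input.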
--

Zachary Whitley

Feb 28, 2016, 9:14:42 PM
to sta...@clarkparsia.com
To answer your specific question: it takes a function that maps paths to resources. [1] I'm not sure whether it will take a compressed file as a path. The documentation doesn't say anything specific, but it consistently refers to "files", so I'm not positive. Give it a try; if it doesn't work, you'll find out fairly quickly when it barfs. In that case you'll have to uncompress the files to a temp directory first. I'd be interested to know what the answer is. On the one hand, if you can pass a zip file, it's convenient; on the other hand, you're stuck loading an entire zip file into a single named graph.
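If `create` turns out not to accept a zip archive, the fallback described above (uncompressing to a temporary directory first) can be sketched with the JDK's `java.util.zip` package alone. `ZipExtractor` and `extractToTempDir` are hypothetical names; the sketch also rejects entries whose names would escape the target directory (the "zip slip" problem).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: unpack a zip archive into a new temporary directory.
class ZipExtractor {

    static Path extractToTempDir(Path zip) throws IOException {
        Path target = Files.createTempDirectory("stardog-load-").toAbsolutePath().normalize();
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(zip))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                Path out = target.resolve(entry.getName()).normalize();
                // Reject "zip slip" entries such as ../evil.ttl that would land outside target.
                if (!out.startsWith(target)) {
                    throw new IOException("Entry escapes target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    // Copies only the current entry: the stream reports EOF at the entry's end.
                    Files.copy(in, out, StandardCopyOption.REPLACE_EXISTING);
                }
                in.closeEntry();
            }
        }
        return target;
    }
}
```

The returned directory could then be handed to the loader in place of the archive (later in this thread Evren notes that `create` accepts directories). This assumes there is enough disk space for the uncompressed data.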


Ajay Kamble

Feb 29, 2016, 12:35:57 AM
to Stardog
I'll try the given example, but I wanted to know whether it is possible to work directly with zip files.

I have 90,000 Turtle files from which I want to create a named graph. I do not know which approach is better: passing a single zip file or iterating through the files one by one.

-Ajay


Ajay Kamble

Feb 29, 2016, 3:39:24 AM
to Stardog
I am struggling to get the code working. As I mentioned, I need to create the default graph as well as some named graphs.

The examples at [https://gist.github.com/evren/7601614] show how to add all files to different named graphs.

If I create the default graph first and then try to create the named graphs, I get the error 'database already exists'.

When I try to create all the graphs in a single create call, I get a different error:

Exception in thread "main" java.lang.NullPointerException
  at com.complexible.common.rdf.model.StardogIRI.<init>(StardogIRI.java:30)
  at com.complexible.common.rdf.model.StardogValueFactory.createIRI(StardogValueFactory.java:195)
  at com.complexible.common.rdf.model.Values.iri(Values.java:46)

I relied on the fact that if the Path -> Resource function returns null, the file is added to the default graph, but instead I got a NullPointerException.

The biggest problem is that, rather than using zip files, I have to iterate through all the files.

-Ajay

Ajay Kamble

Feb 29, 2016, 5:28:36 AM
to Stardog
I managed to fix the NullPointerException, but then hit com.complexible.stardog.StardogException: GC overhead limit exceeded.

This approach is a problem because, rather than creating graphs from a few zip files, I need to go through each file iteratively. In any case, the program died after 45 minutes.

-Ajay

Evren Sirin

Feb 29, 2016, 10:44:24 AM
to Stardog
On Mon, Feb 29, 2016 at 5:28 AM, Ajay Kamble <ajay.ri...@gmail.com> wrote:
> I managed to fix NullPointerException, but hit
> com.complexible.stardog.StardogException: GC overhead limit exceeded.

Did this happen on the client side or the server side? How many triples
in total are you trying to add? What memory settings are you using?

>
> This option creates problem as rather than creating graphs from few zip
> files, I need to go through each file iteratively. Anyways the program died
> after 45 minutes.

Using a zip file should work, but keep in mind that you can also pass a
directory to the create function, and all the files in that directory
will be loaded into the named graph you specify for the directory. Also,
db create is faster than data add, but unless you are loading billions
of triples you won't see a huge performance difference.
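Since a directory can be mapped to a named graph, one convenient layout is one subdirectory per graph. The sketch below assumes that layout and uses the hypothetical names `DirectoryGraphs` and `graphPerSubdirectory` to build the directory-to-graph map with plain JDK calls. Graph IRIs are plain strings so the code runs without Stardog on the classpath; with the SNARL API they would be wrapped as resources, for example with `Values.iri(...)` as in the Scala snippet earlier in this thread. The base IRI is borrowed from that snippet, and directory names are assumed to be IRI-safe.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical helper: one named graph per immediate subdirectory of a root folder.
class DirectoryGraphs {

    // Maps root/<name> to baseIri + <name>; regular files directly under root are ignored.
    static Map<Path, String> graphPerSubdirectory(Path root, String baseIri) throws IOException {
        Map<Path, String> graphs = new TreeMap<>();
        try (Stream<Path> children = Files.list(root)) {
            children.filter(Files::isDirectory)
                    .forEach(dir -> graphs.put(dir, baseIri + dir.getFileName()));
        }
        return graphs;
    }
}
```

For a root containing `data2/` and `data3/` and a base of `http://example.com/ns/databases/`, this yields one entry per subdirectory; the keys are the directories to pass to the loader and the values are the graph names the mapping function should return for them.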

Best,
Evren


Ajay Kamble

Mar 1, 2016, 4:16:26 AM
to Stardog
I managed to get this working. Here is my feedback (better ways might be available, but I am not aware of any beyond what was suggested in this thread):

1. Here is the code that worked for me: https://gist.github.com/kambleajay/2da5af5c5b0cbbd94a8b

2. I had 500,000 files to upload into the default graph. This is not a problem from either the CLI or code: 20 million triples were uploaded in 3 minutes.

3. For the named graphs I had to upload 90,000+ files. This is rather slow compared to default graph creation, and it is only possible from code, not the CLI.

4. I found out that a single connection could not take all this data, which is why I was getting the GC overhead limit exceeded error. Eventually I had to split the files into batches and upload a limited number of files per connection.

5. The ConnectionConfiguration + IO code does not work with zip files. Here is the error I got:

error: Exception in thread "main" com.complexible.stardog.StardogException: java.lang.NullPointerException
error:  at com.complexible.stardog.protocols.client.SPECClientUtil.toStardogException(SPECClientUtil.java:87)
error:  at com.complexible.stardog.protocols.client.SPECClientUtil.toStardogException(SPECClientUtil.java:35)
error:  at com.complexible.stardog.api.impl.SPECConnection.applyChanges(SPECConnection.java:322)
error:  at com.complexible.stardog.api.impl.AbstractConnection.pushOutstanding(AbstractConnection.java:286)
error:  at com.complexible.stardog.api.impl.AbstractConnection.commit(AbstractConnection.java:195)

6. I'll try passing a directory to the create method. Is it supposed to be faster?

Suggestion: creating the default graph is very easy in Stardog. Creating named graphs should be equally easy.

-Ajay

Evren Sirin

Mar 1, 2016, 2:15:16 PM
to Stardog
On Tue, Mar 1, 2016 at 4:16 AM, Ajay Kamble <ajay.ri...@gmail.com> wrote:
> I managed to get this working. Here is my feedback (better ways might be
> available but I am not aware of those, apart from what was suggested in this
> post),
>
> 1. Here is the code that worked for me:
> https://gist.github.com/kambleajay/2da5af5c5b0cbbd94a8b

Note that you can actually load files into the default graph and named
graphs at database creation time using the API; you don't need to
separate loading into two steps. The function that maps files to named
graphs cannot return null (which is why you got the NPE before), but
you can use the special named graph <tag:stardog:api:context:default>
(Contexts.DEFAULT in the API) to accomplish this. For your first
example, it would look like this:

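// Map each input file to the graph it should be loaded into.
Map<Path, Resource> paths = new HashMap<>();
paths.put(Paths.get("/tmp/data1.zip"), Contexts.DEFAULT); // default graph
paths.put(Paths.get("/tmp/data2.zip"),
    Values.iri("http://example.com/ns/databases/data2"));
paths.put(Paths.get("/tmp/data3.zip"),
    Values.iri("http://example.com/ns/databases/data3"));

// Create the database and bulk load every file in a single call.
// The mapping function must never return null; use Contexts.DEFAULT instead.
Connection conn = admin
    .disk(dbName)
    .set(SearchOptions.SEARCHABLE, true)
    .create(path -> paths.get(path), paths.keySet().toArray(new Path[0]));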
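Because the mapping function must not return null, one defensive option is to make the fallback explicit. In the sketch below (hypothetical names `GraphMapping` and `withDefaultGraph`; plain strings stand in for Stardog `Resource` values so it runs with only the JDK), any path missing from the map resolves to the special default-graph IRI quoted above instead of null.

```java
import java.nio.file.Path;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper: a path -> graph mapping that never yields null.
class GraphMapping {

    // IRI of the special default-graph context mentioned in this thread.
    static final String DEFAULT_CONTEXT = "tag:stardog:api:context:default";

    // Unmapped paths fall back to the default graph rather than returning null.
    static Function<Path, String> withDefaultGraph(Map<Path, String> graphs) {
        return path -> graphs.getOrDefault(path, DEFAULT_CONTEXT);
    }
}
```

Falling back silently can hide a typo, since `Path` keys are matched by their exact form (`/tmp/./data2.zip` is not equal to `/tmp/data2.zip`); throwing an exception for unmapped paths is the stricter alternative.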

>
> 2. I had 500,000 files to upload in default graph. But this is not a problem
> either from CLI or code and 20 million triples are uploaded in 3 minutes.
>
> 3. For named graph I had to upload 90000+ files. This is rather slow
> compared to default graph creation and this is only available from code not
> CLI.

The CLI allows adding multiple files into an arbitrary named graph via
"data add --named-graph". It does not allow adding different files
into different named graphs in one transaction. If getting all the
data in one transaction is not a requirement, which seems to be the
case here, you can run a separate CLI command for each named graph. You
can pass a directory to the data add command, so all the files under that
directory will be loaded into the named graph you specify.
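The one-command-per-graph approach can be scripted. The sketch below only assembles the command lines (hypothetical names `DataAddCommands` and `build`). `data add --named-graph` comes from the message above, while the executable name `stardog` and the argument order (database name, then the directory) are assumptions to check against the CLI's own help.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: one "data add --named-graph" invocation per directory.
class DataAddCommands {

    // dirToGraph maps a directory to the named graph its files should go into.
    // Assumed form: stardog data add --named-graph <graph> <database> <directory>
    static List<List<String>> build(String database, Map<String, String> dirToGraph) {
        List<List<String>> commands = new ArrayList<>();
        for (Map.Entry<String, String> e : dirToGraph.entrySet()) {
            commands.add(List.of("stardog", "data", "add",
                "--named-graph", e.getValue(), database, e.getKey()));
        }
        return commands;
    }
}
```

Each command could be run in turn with `new ProcessBuilder(cmd).inheritIO().start().waitFor()`, stopping at the first non-zero exit code. Each command is its own transaction, which is the trade-off described above.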

>
> 4. I found out that a single connection could not take all this data and
> that is why I was getting GC overhead limit reached error. Eventually I had
> to split files in batches and upload limited number of files in single
> connection.

The client does not push updates to the server until commit is
called. Triples added using Adder.statement or Adder.graph
accumulate in memory on the client side. Files added through IO,
however, are not parsed on the client at all if the serverSide option
is used; only the file paths are kept in memory. In your case, having
90K+ files might take up considerable memory, so you'd need to
increase the memory available to the client.
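Ajay's workaround (splitting the files into batches and loading a limited number per connection) keeps that client-side backlog bounded. A minimal, JDK-only sketch of the batching part follows; `Batches`, `partition`, and `loadInBatches` are hypothetical names, and the `loadBatch` callback stands in for whatever per-batch transaction code is used (for example, begin, add the batch's files, commit), which is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper for loading a large file list in bounded chunks.
class Batches {

    // Splits items into consecutive sublists of at most `size` elements.
    static <T> List<List<T>> partition(List<T> items, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < items.size(); i += size) {
            out.add(new ArrayList<>(items.subList(i, Math.min(i + size, items.size()))));
        }
        return out;
    }

    // Hands each batch to loadBatch (e.g. one transaction per batch); returns the batch count.
    static <T> int loadInBatches(List<T> files, int size, Consumer<List<T>> loadBatch) {
        List<List<T>> batches = partition(files, size);
        batches.forEach(loadBatch);
        return batches.size();
    }
}
```

The batch size trades client memory against the number of transactions; as with the one-command-per-graph CLI route, the load as a whole is no longer atomic.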

>
> 5. The ConnectionConfiguration + IO code does not work with zip files. Here
> is the error I got,
>
> error: Exception in thread "main" com.complexible.stardog.StardogException:
> java.lang.NullPointerException
> error: at
> com.complexible.stardog.protocols.client.SPECClientUtil.toStardogException(SPECClientUtil.java:87)
> error: at
> com.complexible.stardog.protocols.client.SPECClientUtil.toStardogException(SPECClientUtil.java:35)
> error: at
> com.complexible.stardog.api.impl.SPECConnection.applyChanges(SPECConnection.java:322)
> error: at
> com.complexible.stardog.api.impl.AbstractConnection.pushOutstanding(AbstractConnection.java:286)
> error: at
> com.complexible.stardog.api.impl.AbstractConnection.commit(AbstractConnection.java:195)

This is a bug. We've created a ticket (#2849) for this issue.

>
> 6. I'll try passing a directory to create method, is it supposed to be
> faster?

It might be faster in some cases (multiple threads would be used on
the server side when parsing multiple files), but I was suggesting that
mostly for convenience; you wouldn't need to create a zip file if one
doesn't already exist.

>
> Suggestion: Creation of default graph is very easy in Stardog. Creation of
> named graphs should be equally easy.

The API allows you to do this, but we'll update the documentation to
explain how it works in more detail.

Best,
Evren


Zachary Whitley

Mar 1, 2016, 2:41:48 PM
to Stardog
On Tue, Mar 1, 2016 at 2:14 PM, Evren Sirin <ev...@complexible.com> wrote:
> The function that maps files to named
> graphs cannot returns null (which is why you get the NPE before) but
> you can use the special named graph <tag:stardog:api:context:default>
> (Contexts. DEFAULT in the API) to accomplish this.

Does the javadoc need to be updated for this? I'm guessing the confusion came from this sentence in the javadocs: "If the mapping returns a null value then those triples will be loaded into the default graph without a context." http://docs.stardog.com/java/snarl/com/complexible/stardog/api/admin/DatabaseBuilder.html#create-java.util.function.Function-java.nio.file.Path...-

Ajay Kamble

Mar 2, 2016, 3:19:40 AM
to Stardog
Thank you, Evren, for the reply. All of this information is helpful (and I was not aware of it)!

I tried a couple of the suggestions:

1. Using the data add command with a directory as the target: this option did not work for me, as the command appeared to sit idle in the middle of execution. I am attaching a thread dump.

2. The code approach, where all the data is bulk loaded in a single API call with the file-to-graph configuration in a map: this option worked really well, and all the data was loaded in 4 minutes.

-Ajay

data-add.log

Pavel Klinov

unread,
Mar 2, 2016, 3:27:34 AM3/2/16
to sta...@clarkparsia.com
Hi Ajay,

This is a client-side dump showing that the client is waiting for the server's response. It'd be helpful if you could send the server thread dump so we can understand what is going on.

Cheers,
Pavel

--

Ajay Kamble

Mar 2, 2016, 5:41:46 AM
to Stardog
Here it is.

Both processes are running on my local machine.

-Ajay

data-add.log

Michael Grove

Mar 3, 2016, 7:17:57 AM
to stardog
On Wed, Mar 2, 2016 at 5:41 AM, Ajay Kamble <ajay.ri...@gmail.com> wrote:
> Here it is.
>
> Both processes are running on my local machine.

Are there any exceptions in the stardog.log file? The server stack trace does not look stuck; it's opening a new connection in the trace, presumably to handle data coming in from the client.

There's not a lot of progress printed to the client beyond which file is being uploaded; maybe it just looked like it was stuck?

Cheers,

Mike 