Graph Retrieval


Tze-John Tang

Jun 2, 2014, 11:28:08 AM
to sta...@clarkparsia.com
What is the recommended approach to retrieving a graph of a specific depth from a starting entity? I am currently using the getter API: I get the graph where my starting entity is the subject, and then make the same call where the starting entity is the object. For each of those, I iterate over the results, and if the value is a URI, I call the getter API again using that URI as the subject/object, and so on. Obviously this starts performing slowly after a couple of levels.

Thanks,

-tj

Mike Grove

Jun 3, 2014, 9:40:07 AM
to stardog
On Mon, Jun 2, 2014 at 11:28 AM, Tze-John Tang <tzejoh...@gmail.com> wrote:
> What is the recommended approach to retrieving a graph of a specific depth from a starting entity? I am currently using the getter API: I get the graph where my starting entity is the subject, and then make the same call where the starting entity is the object. For each of those, I iterate over the results, and if the value is a URI, I call the getter API again using that URI as the subject/object, and so on. Obviously this starts performing slowly after a couple of levels.

Are you trying to retrieve a graph of arbitrary depth?

How much data are you pulling back after "a couple levels"?

What are you planning on doing with that data?  Is it something you could leave in place and just query as-is instead?

What you're doing is not unreasonable, but there may be a better approach given what you're trying to accomplish.

Cheers,

Mike
 

> Thanks,
>
> -tj

-- 
You received this message because you are subscribed to the C&P "Stardog" group.
To post to this group, send email to sta...@clarkparsia.com
To unsubscribe from this group, send email to
stardog+u...@clarkparsia.com
For more options, visit this group at
http://groups.google.com/a/clarkparsia.com/group/stardog?hl=en

Tze-John Tang

Jun 3, 2014, 11:31:46 AM
to sta...@clarkparsia.com
> Are you trying to retrieve a graph of arbitrary depth?

Yes. Not sure yet what depth. Currently it is implemented as recursive calls using the getter API.

> How much data are you pulling back after "a couple levels"?

The actual data is not that large: at depth 3 we are looking at 137 unique subjects, about 1000 triples. The graph fetch takes about 20s due to the recursive getter calls.

> What are you planning on doing with that data?  Is it something you could leave in place and just query as-is instead?

We don't know exactly what we will do with the data yet, but it will feed some kind of data-browsing interface.

Mike Grove

Jun 3, 2014, 12:11:58 PM
to stardog
On Tue, Jun 3, 2014 at 11:31 AM, Tze-John Tang <tzejoh...@gmail.com> wrote:
>> Are you trying to retrieve a graph of arbitrary depth?
>
> Yes. Not sure of what depth. Currently I implemented it as recursive calls using the getter API.
>
>> How much data are you pulling back after "a couple levels"?
>
> The actual data is not that large... at depth 3 we are looking at 137 unique subjects. About 1000 triples. The graph fetch takes about 20s, due to the recursive getter calls.

That seems unusually slow.  I am doing a construct to pull back 1000 triples from a 100M triple database and it completes in well under a second.  What does the code look like that is doing the calls?

Cheers,

Mike
 

>> What are you planning on doing with that data?  Is it something you could leave in place and just query as-is instead?
>
> Yeah, we don't know what we are exactly doing with the data, but something related to some data browsing interface.

Tze-John Tang

Jun 3, 2014, 3:19:34 PM
to sta...@clarkparsia.com
Each method is called once at the top level, and recursively calls itself until the target depth is reached, e.g. depth = 3.

protected Graph fetchSubjectGraph(URI entityUri, int depth) throws StardogException {
    Connection conn = null;
    try {
        conn = connPool.obtain();

        Graph fullGraph = new GraphBuilder().graph();
        Graph subjectGraph = conn.get()
                                 .subject(entityUri)
                                 .context(valueFactory.createURI(IStardogConstants.ALL_GRAPHS_CONTEXT))
                                 .graph();
        fullGraph.addAll(subjectGraph);

        depth--;
        if (depth > 0) {
            for (Statement stmt : subjectGraph) {
                if (stmt.getObject() instanceof URI) {
                    fullGraph.addAll(fetchSubjectGraph((URI) stmt.getObject(), depth));
                }
            }
        }

        return fullGraph;
    } finally {
        try {
            if (conn != null) {
                connPool.release(conn);
            }
        } catch (StardogException excep) {
            logger.warn("Exception during connection pool release.", excep);
        }
    }
}
 
 
protected Graph fetchObjectGraph(URI entityUri, int depth) throws StardogException {
    Connection conn = null;
    try {
        conn = connPool.obtain();

        Graph fullGraph = new GraphBuilder().graph();
        Graph objectGraph = conn.get()
                                .context(valueFactory.createURI(IStardogConstants.ALL_GRAPHS_CONTEXT))
                                .object(entityUri)
                                .graph();
        fullGraph.addAll(objectGraph);

        depth--;
        if (depth > 0) {
            for (Statement stmt : objectGraph) {
                if (stmt.getSubject() instanceof URI) {
                    // recurse on this statement's subject
                    fullGraph.addAll(fetchObjectGraph((URI) stmt.getSubject(), depth));
                }
            }
        }

        return fullGraph;
    } finally {
        try {
            if (conn != null) {
                connPool.release(conn);
            }
        } catch (StardogException excep) {
            logger.warn("Exception during connection pool release.", excep);
        }
    }
}

Mike Grove

Jun 3, 2014, 3:44:32 PM
to stardog
The first thing that jumps out at me is an issue with how you're using the connection pool.  If you are using the pool configuration from the example code in the distribution, the initial pool size is 10.

As written, your code can require a new connection for each fetch.  You said there are 137 unique subjects (I'm not sure whether that includes objects), so given an initial pool size of 10, this code could create 127 new connections.  To some degree that depends on the shape of the graph you're walking over, but because you don't release the Connection back to the pool before you recurse further, once the pool is exhausted each further call will require a new Connection.

So part of what you would be timing is how long it takes to create, potentially, a hundred or more connections.

You may consider a refactor: pass in the Getter from a Connection created at the top level.  You can reuse it at each level by resetting its state as you go and setting the appropriate fields.  That way you avoid creating connections, managing the pool, and the (admittedly minimal) overhead of creating a bunch of objects.

This would also be fairly straightforward to write as an iterative algorithm rather than recursive.  That might perform better with larger datasets or larger depths.
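The two suggestions above (one shared top-level Connection, iterative rather than recursive) can be sketched roughly as follows. This is a minimal, self-contained illustration, not Stardog code: the String node type and the fetchNeighbors function are hypothetical stand-ins for URIs and a shared Getter call against the database.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

public class IterativeWalk {

    // Depth-limited breadth-first walk. fetchNeighbors stands in for a single
    // shared Getter created from one top-level Connection; the visited set
    // ensures each node is fetched at most once, whereas the recursive version
    // can refetch the same URI many times when the graph has shared nodes.
    public static Set<String> collect(String start, int depth,
                                      Function<String, List<String>> fetchNeighbors) {
        Set<String> visited = new HashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(start);
        visited.add(start);
        for (int level = 0; level < depth && !frontier.isEmpty(); level++) {
            Deque<String> next = new ArrayDeque<>();
            while (!frontier.isEmpty()) {
                for (String neighbor : fetchNeighbors.apply(frontier.poll())) {
                    if (visited.add(neighbor)) {
                        next.add(neighbor);
                    }
                }
            }
            frontier = next;  // descend one level
        }
        return visited;
    }

    public static void main(String[] args) {
        // Toy in-memory graph standing in for the database.
        Map<String, List<String>> graph = Map.of(
                "a", List.of("b", "c"),
                "b", List.of("c", "d"),
                "c", List.of("d"),
                "d", List.of("a"));
        Set<String> reached = collect("a", 2, n -> graph.getOrDefault(n, List.of()));
        System.out.println(reached); // every node within 2 hops of "a"
    }
}
```

The visited set also caps the number of fetches at the number of unique nodes (137 here), independent of depth, which the recursive version does not guarantee.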

Cheers,

Mike



Tze-John Tang

Jun 3, 2014, 4:09:16 PM
to sta...@clarkparsia.com
Thanks Mike.

Even with the change to pass the Getter in, it is still taking the same amount of time. Essentially each call to Getter....statement() results in a separate fetch, and it all adds up.  In the logs, each call produces:

15:05:27.986 ["http-bio-8080"-exec-10] DEBUG c.c.c.p.client.rpc.DefaultRPCClient - RPC CALL: 7237782d-a316-4c77-b820-8996370a10b7 QueryRequest
15:05:27.990 ["http-bio-8080"-exec-10] DEBUG c.complexible.stardog.api.Connection - Pushing outstanding changes to index
15:05:27.990 ["http-bio-8080"-exec-10] DEBUG c.complexible.stardog.api.Connection - Index synch complete

Mike Grove

Jun 4, 2014, 7:18:51 AM
to stardog
On Tue, Jun 3, 2014 at 4:09 PM, Tze-John Tang <tzejoh...@gmail.com> wrote:
> Thanks Mike.
>
> Even with the change to pass the Getter in, it is still taking the same amount of time. Essentially each call to the Getter....statement() results in some fetch, and it is adding up.  In the logs, each call results in:

Can you provide a minimal example that demonstrates the performance problem?

Thanks.

Mike
 

> 15:05:27.986 ["http-bio-8080"-exec-10] DEBUG c.c.c.p.client.rpc.DefaultRPCClient - RPC CALL: 7237782d-a316-4c77-b820-8996370a10b7 QueryRequest
> 15:05:27.990 ["http-bio-8080"-exec-10] DEBUG c.complexible.stardog.api.Connection - Pushing outstanding changes to index
> 15:05:27.990 ["http-bio-8080"-exec-10] DEBUG c.complexible.stardog.api.Connection - Index synch complete
