stardog database size - diff Directory


Mark James

Jun 16, 2016, 9:37:54 AM
to Stardog
Hi guys,
I have a quick query regarding db size and the diff directory.

I have a db with under 10 million triples that is currently at 69GB. 68 of those GB are in the diff directory for this db. Initially I thought I had versioning on by accident, but I've confirmed it isn't.

What's causing the huge size? And what can I do to compact it down?

The docs suggest this kind of size would be expected for 1 billion triple dbs. We're a long way off that with this db...

This is on a 4.0.5 instance, though it was running for a few days on 4.1 prior to a downgrade.

cheers
Mark


Zachary Whitley

Jun 16, 2016, 9:45:52 AM
to Stardog
I'm guessing that "diff" refers to the differential indexes described here: http://docs.stardog.com/#_differential_indexes

Did you change any of the default options?


Mark James

Jun 16, 2016, 9:51:34 AM
to sta...@clarkparsia.com
That would make sense. I did leave the defaults though.

index.differential.enable.limit | 1000000
index.differential.merge.limit  | 10000

I can't see any mention in the docs of another setting that would enable a cleanup of the differential index post-merge.
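
For reference, I pulled those values from the database metadata with something like the following (sketch only; `myDb` stands in for the actual database name, and I may be misremembering the exact CLI invocation):

    # dump all database options and filter for the differential index settings
    stardog-admin metadata get myDb | grep differential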


--
Mark James | Smarta Systems Pty Ltd
+61 433 922 944 | mja...@smarta.io | smarta.io

Zachary Whitley

Jun 16, 2016, 9:57:36 AM
to Stardog
I'm guessing it will always be "cleaned up" after a merge. Anything interesting in stardog.log? Are you running Linux or Windows?

Mark James

Jun 16, 2016, 10:09:32 AM
to sta...@clarkparsia.com
We're running on Ubuntu. Stardog itself is running in a Docker container with the db added as a named partition, so perhaps Docker is complicating something.

There are some errors around dodgy queries in the log file, but nothing that jumps out as related to this.

I'm not sure if I've seen this one before (but I don't think it's relevant?):
WARN  2016-06-16 00:20:09,717 [StardogServer.WorkerGroup-2] io.netty.channel.DefaultChannelPipeline:warn(151): An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
java.io.IOException: java.nio.channels.ClosedChannelException
        at com.complexible.stardog.protocols.http.server.HttpMessageEncoder$SendResponseChannelOutputStream.assertNoError(HttpMessageEncoder.java:452) ~[stardog-protocols-http-server-4.0.5.jar:?]
        at com.complexible.stardog.protocols.http.server.HttpMessageEncoder$SendResponseChannelOutputStream.close(HttpMessageEncoder.java:364) ~[stardog-protocols-http-server-4.0.5.jar:?]
        at com.complexible.stardog.protocols.http.server.HttpMessageEncoder.write(HttpMessageEncoder.java:171) [stardog-protocols-http-server-4.0.5.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:633) [netty-all-4.0.32.Final.jar:4.0.32.Final]
        at io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32) [netty-all-4.0.32.Final.jar:4.0.32.Final]
        at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:908) [netty-all-4.0.32.Final.jar:4.0.32.Final]
        at io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:960) [netty-all-4.0.32.Final.jar:4.0.32.Final]
        at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:893) [netty-all-4.0.32.Final.jar:4.0.32.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358) [netty-all-4.0.32.Final.jar:4.0.32.Final]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) [netty-all-4.0.32.Final.jar:4.0.32.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112) [netty-all-4.0.32.Final.jar:4.0.32.Final]
        at java.lang.Thread.run(Thread.java:745) [?:1.8.0_72-internal]
Caused by: java.nio.channels.ClosedChannelException

Evren Sirin

Jun 16, 2016, 1:10:30 PM
to Stardog
The differential index is cleaned up logically after a merge, but the physical files are reused the next time the differential index is needed. However, if you have open connections/queries, those files cannot be reclaimed, since the open connections/queries need their snapshot to stay around. So if you had many concurrent writes updating the diff index and read connections that were left open for a very long time, you might end up with the diff index growing very large. There was also a bug, fixed in 4.1.1, where removals could cause the diff index to exceed index.differential.merge.limit, but that wouldn't explain the size you are seeing.

The solution now would be to first run `db optimize` to make sure the diff index is merged into the main index, then offline the database (or shut down the server), delete the diff directory, and online the database (or start the server).
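
Roughly, that would be something like the following (a sketch only; `myDb` is a placeholder for your database name, and the diff path assumes the default layout where the database's files live under STARDOG_HOME -- adjust for your Docker volume):

    stardog-admin db optimize myDb
    stardog-admin db offline myDb
    # remove the diff directory while the database is offline
    rm -rf "$STARDOG_HOME/myDb/diff"
    stardog-admin db online myDb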

If you can send us the keys.N files in the database and diff directories, we can also take a look. Any other information about what was going on when this happened would be helpful too.

Best,
Evren