Reproducible corruption


Håvard Ottestad

Feb 7, 2017, 2:14:27 PM
to Stardog
Hi,

I've made a reproducible example that corrupts a Stardog database.

Check out the code from https://github.com/hmottestad/corruptDatabse and run:

docker-compose up


You should see this on screen:

.....
stardog_1  | 
stardog_1  | Creating testSesame database
stardog_1  | Successfully created database 'testSesame'.
stardog_1  | 


Now run the main method in the Main class. It will start adding data to Stardog. Wait until you see "50000" printed on screen.

Go back to your terminal and press ctrl+c (once is enough!). This sends an INT signal to Stardog, and the Java program will crash. This is fine.

Bring Stardog back up with:

docker-compose up

This time you should see the following on screen:

.....
stardog_1  | Creating testSesame database
stardog_1  | com.complexible.stardog.server.DatabaseExistsException: Database already exists: testSesame
stardog_1  |    at com.complexible.stardog.StardogKernel.assertDbNotExists(StardogKernel.java:2157)
stardog_1  |    at com.complexible.stardog.StardogKernel.createDatabase(StardogKernel.java:1039)
stardog_1  |    at com.complexible.stardog.protocols.server.admin.AdminServerFunction.create(AdminServerFunction.java:700)
stardog_1  |    at com.complexible.stardog.protocols.server.admin.AdminServerFunction.handleMessage(AdminServerFunction.java:148)
stardog_1  |    at com.complexible.common.protocols.server.rpc.ServerHandler.lambda$handleMessage$1(ServerHandler.java:337)
stardog_1  |    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
stardog_1  |    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
stardog_1  |    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
stardog_1  |    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
stardog_1  |    at java.lang.Thread.run(Thread.java:745)
stardog_1  | Database already exists: testSesame

If you don't see this, it means that the database is corrupted!

If you do see it, the database survived, so run the Java program again. Wait for it to print 50000, or even a bit longer. Then press ctrl+c in the terminal and Stardog should stop. Bring it back up again and repeat. When you see the following output from Stardog instead, it's time to stop:

.....
stardog_1  | 
stardog_1  | Creating testSesame database
stardog_1  | Successfully created database 'testSesame'.
stardog_1  | 

To see what has happened, we need access to the stardog.log file.

To get it, run the following command:

docker-compose up -d; docker-compose exec stardog bash

This will drop you into bash inside the container. Now run:

clear; tail -n100 ../../../data/stardog.log 

In the logs you should see:

WARN  2017-02-07 20:01:52,080 [main] com.complexible.stardog.db.DatabaseFactoryImpl:read(137): Database testSesame is invalid and the repair failed: Cannot read header: 252792 != 9768495
INFO  2017-02-07 20:01:52,082 [main] com.complexible.stardog.StardogKernel:initDatabases(1944): Database testSesame will not be available because there was an error initializing the database: Cannot read header: 252792 != 9768495
INFO  2017-02-07 20:01:52,726 [main] com.complexible.stardog.StardogKernel:handleUnusableIndex(1999): Moving irreparable database testSesame to /home/stardog/data/.unusable/testSesame
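
To avoid eyeballing the log on every iteration, the check could be automated. This is only a minimal sketch: the class and method names are made up for illustration, and the pattern is taken from the WARN line quoted above, so it assumes the wording stays the same across Stardog versions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CorruptionLogScan {
    // Pattern derived from the WARN line in stardog.log quoted above (assumption:
    // other Stardog versions may word this message differently).
    private static final Pattern INVALID_DB =
            Pattern.compile("Database (\\S+) is invalid and the repair failed");

    // Returns the name of the first database reported as irreparable, if any.
    public static Optional<String> firstCorruptDatabase(List<String> logLines) {
        for (String line : logLines) {
            Matcher m = INVALID_DB.matcher(line);
            if (m.find()) {
                return Optional.of(m.group(1));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to stardog.log, supplied by the caller.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        Optional<String> corrupt = firstCorruptDatabase(lines);
        System.out.println(corrupt.isPresent()
                ? "Corrupt database: " + corrupt.get()
                : "No corruption reported");
    }
}
```

Run it with the path to stardog.log as the only argument, after copying the log out of the container or from inside it.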


Let me know when you have checked out the code so I can delete the repo.

If you figure out what is happening here I would be very grateful. We sometimes restart our Stardog container, and three times now we have seen it get corrupted beyond repair like this. In production we would very rarely restart Stardog, but you never know when there is going to be a power outage. I've also seen this kind of corruption with read-only queries.

Cheers,
Håvard M. Ottestad



Håvard Ottestad

Feb 7, 2017, 2:33:50 PM
to Stardog
To run the code you will need to add a license at docker/stardog/stardog-license-key.bin.

Stephen Nowell

Feb 7, 2017, 2:44:49 PM
to sta...@clarkparsia.com
Hi Håvard,

I have cloned this repository, so you can delete it. If we need to share it internally I can make a copy. I'll take a look into this and see what I can discover.

Cheers,
Stephen

Håvard Ottestad

Feb 7, 2017, 2:52:42 PM
to Stardog
Thank you Stephen.

Håvard Ottestad

Feb 7, 2017, 5:38:37 PM
to Stardog
Corruption is much rarer when I force kill, e.g. by pressing ctrl+c twice in a row (ctrl-c ... ctrl-c).

I know Docker will first send a SIGTERM signal and then, 10 seconds later, send a SIGKILL signal. Maybe the SIGKILL kills the cleanup process that Stardog initiates when receiving a SIGINT?
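
The hypothesis can be tried in isolation with a plain JVM program. This is a minimal sketch and not Stardog's actual shutdown code: the class, the `flushed` flag and the 15-second cleanup are hypothetical stand-ins. The JVM runs shutdown hooks on SIGINT and SIGTERM, but SIGKILL cannot be caught, so a hook that is still running when SIGKILL arrives is cut off midway.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ShutdownHookSketch {
    // Hypothetical flag standing in for "all pending writes reached disk".
    public static final AtomicBoolean flushed = new AtomicBoolean(false);

    // Hypothetical cleanup that needs `millis` to finish, e.g. flushing indexes.
    public static Runnable slowCleanup(long millis) {
        return () -> {
            try {
                Thread.sleep(millis);
                flushed.set(true);
                System.out.println("Cleanup finished");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
    }

    // Registers the cleanup as a JVM shutdown hook and returns the hook thread.
    public static Thread registerCleanupHook(Runnable cleanup) {
        Thread hook = new Thread(cleanup, "cleanup-hook");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) throws InterruptedException {
        // 15 s is deliberately longer than the 10 s delay mentioned above.
        registerCleanupHook(slowCleanup(15_000));
        System.out.println("Running; send SIGTERM, then SIGKILL within 15 s");
        Thread.sleep(Long.MAX_VALUE);
    }
}
```

If "Cleanup finished" is never printed after a SIGTERM followed by a SIGKILL, the hook was killed before it completed, which would match the behaviour described above.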

Håvard