Very slow shutting down of Stardog

2 views
Skip to first unread message

Ron Michael Zettlemoyer

unread,
Jul 20, 2012, 4:27:51 PM7/20/12
to sta...@clarkparsia.com
I've noticed in the recent versions that sometimes when I hit Control-C to shutdown Stardog (in Windows) it takes a little while to shutdown.  In 1.0.0 and 1.0.1 I saw it take maybe 15-30 seconds to finally shutdown.  Today I waited about 3 minutes for 1.0.2 to shutdown; although I think the time is more related to the load and size of the data.  I started getting impatient so I tried Control-Break which started giving me errors (see below).  It did finally stop at about 4 minutes.

The errors may mean nothing and just be a result of the Control-Break, but I thought I'd attach them in case they are useful.


nitor entry [0x0000000010f6f000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.lang.Shutdown.exit(Unknown Source)
        - waiting to lock <0x00000007807c5220> (a java.lang.Class for java.lang.
Shutdown)
        at java.lang.Terminator$1.handle(Unknown Source)
        at sun.misc.Signal$1.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

"SIGINT handler" daemon prio=10 tid=0x000000000d3da000 nid=0x2140 waiting for mo
nitor entry [0x0000000010daf000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.lang.Shutdown.exit(Unknown Source)
        - waiting to lock <0x00000007807c5220> (a java.lang.Class for java.lang.
Shutdown)
        at java.lang.Terminator$1.handle(Unknown Source)
        at sun.misc.Signal$1.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

"SIGINT handler" daemon prio=10 tid=0x000000000d3d7800 nid=0x2730 waiting for mo
nitor entry [0x000000001080f000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.lang.Shutdown.exit(Unknown Source)
        - waiting to lock <0x00000007807c5220> (a java.lang.Class for java.lang.
Shutdown)
        at java.lang.Terminator$1.handle(Unknown Source)
        at sun.misc.Signal$1.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

"SIGINT handler" daemon prio=10 tid=0x000000000d3d6000 nid=0x1060 waiting for mo
nitor entry [0x0000000010c5f000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.lang.Shutdown.exit(Unknown Source)
        - waiting to lock <0x00000007807c5220> (a java.lang.Class for java.lang.
Shutdown)
        at java.lang.Terminator$1.handle(Unknown Source)
        at sun.misc.Signal$1.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

"SIGINT handler" daemon prio=10 tid=0x000000000d3e4000 nid=0x1be0 waiting for mo
nitor entry [0x0000000010abf000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.lang.Shutdown.exit(Unknown Source)
        - waiting to lock <0x00000007807c5220> (a java.lang.Class for java.lang.
Shutdown)
        at java.lang.Terminator$1.handle(Unknown Source)
        at sun.misc.Signal$1.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

"SIGINT handler" daemon prio=10 tid=0x00000000198e7000 nid=0x1860 waiting for mo
nitor entry [0x000000001091f000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.lang.Shutdown.exit(Unknown Source)
        - waiting to lock <0x00000007807c5220> (a java.lang.Class for java.lang.
Shutdown)
        at java.lang.Terminator$1.handle(Unknown Source)
        at sun.misc.Signal$1.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

"SIGINT handler" daemon prio=10 tid=0x00000000198e6000 nid=0x2680 waiting for mo
nitor entry [0x00000000106cf000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.lang.Shutdown.exit(Unknown Source)
        - waiting to lock <0x00000007807c5220> (a java.lang.Class for java.lang.
Shutdown)
        at java.lang.Terminator$1.handle(Unknown Source)
        at sun.misc.Signal$1.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

"SIGINT handler" daemon prio=10 tid=0x00000000198e4800 nid=0x254c waiting for mo
nitor entry [0x000000000dcbf000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.lang.Shutdown.exit(Unknown Source)
        - waiting to lock <0x00000007807c5220> (a java.lang.Class for java.lang.
Shutdown)
        at java.lang.Terminator$1.handle(Unknown Source)
        at sun.misc.Signal$1.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

"SIGINT handler" daemon prio=10 tid=0x00000000198e9800 nid=0x1184 waiting for mo
nitor entry [0x000000001057f000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.lang.Shutdown.exit(Unknown Source)
        - waiting to lock <0x00000007807c5220> (a java.lang.Class for java.lang.
Shutdown)
        at java.lang.Terminator$1.handle(Unknown Source)
        at sun.misc.Signal$1.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

"SIGINT handler" daemon prio=10 tid=0x00000000198e4000 nid=0x4a4 waiting for mon
itor entry [0x000000001025e000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.lang.Shutdown.exit(Unknown Source)
        - waiting to lock <0x00000007807c5220> (a java.lang.Class for java.lang.
Shutdown)
        at java.lang.Terminator$1.handle(Unknown Source)
        at sun.misc.Signal$1.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

"SIGINT handler" daemon prio=10 tid=0x00000000198e7800 nid=0x16f0 waiting for mo
nitor entry [0x00000000100de000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.lang.Shutdown.exit(Unknown Source)
        - waiting to lock <0x00000007807c5220> (a java.lang.Class for java.lang.
Shutdown)
        at java.lang.Terminator$1.handle(Unknown Source)
        at sun.misc.Signal$1.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

"SIGINT handler" daemon prio=10 tid=0x00000000198e8800 nid=0x1fb8 waiting for mo
nitor entry [0x000000000e74f000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.lang.Shutdown.exit(Unknown Source)
        - waiting to lock <0x00000007807c5220> (a java.lang.Class for java.lang.
Shutdown)
        at java.lang.Terminator$1.handle(Unknown Source)
        at sun.misc.Signal$1.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

"SIGINT handler" daemon prio=10 tid=0x00000000198e9000 nid=0x1430 waiting for mo
nitor entry [0x000000000e57f000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.lang.Shutdown.exit(Unknown Source)
        - waiting to lock <0x00000007807c5220> (a java.lang.Class for java.lang.
Shutdown)
        at java.lang.Terminator$1.handle(Unknown Source)
        at sun.misc.Signal$1.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

"SIGINT handler" daemon prio=10 tid=0x00000000198e2800 nid=0x206c waiting for mo
nitor entry [0x000000000e3ff000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.lang.Shutdown.exit(Unknown Source)
        - waiting to lock <0x00000007807c5220> (a java.lang.Class for java.lang.
Shutdown)
        at java.lang.Terminator$1.handle(Unknown Source)
        at sun.misc.Signal$1.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

"SIGINT handler" daemon prio=10 tid=0x00000000198e5800 nid=0x1d28 waiting for mo
nitor entry [0x000000000e1cf000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.lang.Shutdown.exit(Unknown Source)
        - waiting to lock <0x00000007807c5220> (a java.lang.Class for java.lang.
Shutdown)
        at java.lang.Terminator$1.handle(Unknown Source)
        at sun.misc.Signal$1.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

"SIGINT handler" daemon prio=10 tid=0x000000000d3db800 nid=0x27c8 runnable [0x00
0000001a75e000]
   java.lang.Thread.State: RUNNABLE
        at java.io.WinNTFileSystem.delete0(Native Method)
        at java.io.Win32FileSystem.delete(Unknown Source)
        at java.io.File.delete(Unknown Source)
        at java.io.DeleteOnExitHook.runHooks(Unknown Source)
        at java.io.DeleteOnExitHook$1.run(Unknown Source)
        at java.lang.Shutdown.runHooks(Unknown Source)
        at java.lang.Shutdown.sequence(Unknown Source)
        at java.lang.Shutdown.exit(Unknown Source)
        - locked <0x00000007807c5220> (a java.lang.Class for java.lang.Shutdown)

        at java.lang.Terminator$1.handle(Unknown Source)
        at sun.misc.Signal$1.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

"464084054@qtp-1521428112-1062" prio=6 tid=0x00000000198e3000 nid=0x2788 in Obje
ct.wait() [0x00000000103bf000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.j
ava:625)
        - locked <0x00000007f304f670> (a org.mortbay.thread.QueuedThreadPool$Poo
lThread)

"Thread-7" daemon prio=6 tid=0x000000000d3e3000 nid=0x2574 waiting on condition
[0x00000000146cf000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x0000000780426620> (a java.util.concurrent.lock
s.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject
.awaitNanos(Unknown Source)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.tak
e(Unknown Source)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.tak
e(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

"DestroyJavaVM" prio=6 tid=0x000000000d3e2800 nid=0x2248 waiting on condition [0
x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"511626597@qtp-1521428112-3 - Acceptor0 SelectChann...@0.0.0.0:8989" pri
o=6 tid=0x000000000d3e2000 nid=0x16e0 runnable [0x00000000144de000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.WindowsSelectorImpl$SubSelector.poll0(Native Method)
        at sun.nio.ch.WindowsSelectorImpl$SubSelector.poll(Unknown Source)
        at sun.nio.ch.WindowsSelectorImpl$SubSelector.access$400(Unknown Source)

        at sun.nio.ch.WindowsSelectorImpl.doSelect(Unknown Source)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(Unknown Source)
        - locked <0x000000078078c7e0> (a sun.nio.ch.Util$2)
        - locked <0x000000078078c7d0> (a java.util.Collections$UnmodifiableSet)
        - locked <0x000000078018f808> (a sun.nio.ch.WindowsSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(Unknown Source)
        at org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager
.java:459)
        at org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:192)

        at org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConn
ector.java:124)
        at org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.ja
va:707)
        at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.j
ava:582)

"com.google.inject.internal.util.$Finalizer" daemon prio=6 tid=0x000000000d2f400
0 nid=0x257c in Object.wait() [0x000000000df8f000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x0000000780540800> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(Unknown Source)
        - locked <0x0000000780540800> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(Unknown Source)
        at com.google.inject.internal.util.$Finalizer.run(Finalizer.java:114)

"Service Thread" daemon prio=6 tid=0x000000000b2b9800 nid=0x14d4 runnable [0x000
0000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" daemon prio=10 tid=0x000000000b2ad800 nid=0x144 waiting on
condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" daemon prio=10 tid=0x000000000b2a8000 nid=0x23d0 waiting on
 condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Attach Listener" daemon prio=10 tid=0x000000000b29f000 nid=0x123c runnable [0x0
000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x000000000b29e000 nid=0x235c waiting on
condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=8 tid=0x000000000b21a000 nid=0x1128 in Object.wait() [0x
000000000c25f000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.ref.ReferenceQueue.remove(Unknown Source)
        - locked <0x0000000780028f20> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(Unknown Source)
        at java.lang.ref.Finalizer$FinalizerThread.run(Unknown Source)

"Reference Handler" daemon prio=10 tid=0x000000000b211000 nid=0x1628 in Object.w
ait() [0x000000000c0ef000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:503)
        at java.lang.ref.Reference$ReferenceHandler.run(Unknown Source)
        - locked <0x0000000780028c98> (a java.lang.ref.Reference$Lock)

"VM Thread" prio=10 tid=0x000000000b207000 nid=0x1048 runnable

"GC task thread#0 (ParallelGC)" prio=6 tid=0x00000000020a7000 nid=0x14d0 runnabl
e

"GC task thread#1 (ParallelGC)" prio=6 tid=0x00000000020a8800 nid=0x1d40 runnabl
e

"VM Periodic Task Thread" prio=10 tid=0x000000000b2c2000 nid=0x8ac waiting on co
ndition

JNI global references: 201

Heap
 PSYoungGen      total 590400K, used 342538K [0x00000007d5560000, 0x000000080000
0000, 0x0000000800000000)
  eden space 486144K, 70% used [0x00000007d5560000,0x00000007ea332860,0x00000007
f3020000)
  from space 104256K, 0% used [0x00000007f3020000,0x00000007f30d0000,0x00000007f
95f0000)
  to   space 99712K, 0% used [0x00000007f9ea0000,0x00000007f9ea0000,0x0000000800
000000)
 ParOldGen       total 1398144K, used 407198K [0x0000000780000000, 0x00000007d55
60000, 0x00000007d5560000)
  object space 1398144K, 29% used [0x0000000780000000,0x0000000798da7ab0,0x00000
007d5560000)
 PSPermGen       total 67072K, used 34118K [0x000000077ae00000, 0x000000077ef800
00, 0x0000000780000000)
  object space 67072K, 50% used [0x000000077ae00000,0x000000077cf51a20,0x0000000
77ef80000)

                                                                             


Kendall Clark

unread,
Jul 20, 2012, 4:30:01 PM7/20/12
to sta...@clarkparsia.com
What kind of database? 
--
-- --
You received this message because you are subscribed to the C&P "Stardog" group.
To post to this group, send email to sta...@clarkparsia.com
To unsubscribe from this group, send email to
stardog+u...@clarkparsia.com
For more options, visit this group at
http://groups.google.com/a/clarkparsia.com/group/stardog?hl=en
 
 


--
Cheers,
Kendall

Ron Michael Zettlemoyer

unread,
Jul 20, 2012, 5:02:20 PM7/20/12
to sta...@clarkparsia.com
Ummm, a Stardog database?  :)  I suppose I might have a better answer if I RTFM carefully, I remember there being some options for creating a database...  does it help if I say I just created the database using no special options or anything like that?

Kendall Clark

unread,
Jul 20, 2012, 5:50:53 PM7/20/12
to sta...@clarkparsia.com
I meant what creation options you used, which you've answered. Thx.


On Friday, July 20, 2012, Ron Michael Zettlemoyer wrote:
Ummm, a Stardog database?  :)  I suppose I might have a better answer if I RTFM carefully, I remember there being some options for creating a database...  does it help if I say I just created the database using no special options or anything like that?

--
-- --
You received this message because you are subscribed to the C&P "Stardog" group.
To post to this group, send email to sta...@clarkparsia.com
To unsubscribe from this group, send email to
stardog+u...@clarkparsia.com
For more options, visit this group at
http://groups.google.com/a/clarkparsia.com/group/stardog?hl=en
 
 


--
Cheers,
Kendall

Mike Grove

unread,
Jul 23, 2012, 1:10:15 PM7/23/12
to sta...@clarkparsia.com
We'll try and reproduce this on a Windows machine, but we routinely
create and use databases that contain 100's of millions or billions of
statements when doing testing and we've not see that type of sluggish
shutdown behavior. But we're not using Windows, so might be some
weird OS specific thing we're just not allowing for.

One of the things I can see in that trace is that it's cleaning up
temp files created by Stardog that it marked as deleteOnExit. I don't
know how long into things that thread dump is from, but it's odd that
that would take any significant amount of time.

For our testing, can you give me a better idea of the setup and what
you're up to?

Did you run the server w/ the standard parameters? If not, what did
you change? Are there other databases online in the server, and if
so, how many and how big are each of them? How many triples were in
the database you recently created? Do you see the slowdown if you
just start the server and then immediately shut it down, or do you
have to have created a database to get it to happen? Do you have to
create multiple databases to trigger the slowdown, or does it happen
with the first one?

Which Windows? Which Java?

Thanks.

Mike

Ron Michael Zettlemoyer

unread,
Jul 23, 2012, 1:48:16 PM7/23/12
to sta...@clarkparsia.com
The Stardog server was run with standard parameters.  There are a few other databases on the server, but I'm pretty sure none of them were used at all when this occurred.  The others are all pretty small too: 10k triples at most.

I have never encountered the slowdown if I start the server and then stop it.  It only seems to happen after I've done a lot of reads or writes - at which point the java process is consuming about 2GB of RAM.

I am not creating new databases at this point, I'm really just working in one and have an app that loads and syncs data in there (over HTTP using dotNetRDF) with data from a few external sources, so it's doing a lot of little reads and writes.  There's also a web app that reads and displays the data; that activity is pretty minimal and doesn't seem to cause the slowdowns.

Server is 64-bit Windows 2008 R2 Datacenter, service pack 1.  Java version is 1.7.0_02.  Machine is a large AWS instance with two cores and 7.5GB of RAM.

If it'd help I'd be happy to give you access to the server or copy of any data or files you want.  

Kendall Clark

unread,
Jul 23, 2012, 2:37:22 PM7/23/12
to sta...@clarkparsia.com
How much RAM are you giving the JVM on the server side? Standard amount (i.e., you aren't tweaking it)?

But, also, I/O on EC2 is notoriously awful and spotty, especially on smaller instances. If it's EC2 I/O, there's obviously nothing we can do about it...

Can you reproduce on an actual computer?

Cheers,
Kendall



Ron Michael Zettlemoyer

unread,
Jul 23, 2012, 3:05:56 PM7/23/12
to sta...@clarkparsia.com
Default memory, no tweaks at all.

It's not I/O.  I know EC2 I/O is a bit slow but it's not *this* slow.  :)  And this is a Large instance so it is getting "high" I/O performance.

But to make doubly sure...

1- I just hit Control-C in Stardog on my AWS server and am able to see that it's not waiting on any disk resources as it shuts down.  It's not doing much of anything really.  It took about 4 minutes to finally shut down.  As I was watching, 30-45 seconds could pass where it did no disk activity at all.

2- As you suggested I did try to see how it performed locally.  I created a blank database and ran my syncing app to start loading in all the external data into the new database.  I didn't let it run all the way (it can take about an hour when starting from scratch) but I let it run for about 10 minutes and it loaded about 12k triples.  I then hit Control-C and it took about 30 seconds for Stardog to shut down.


Kendall Clark

unread,
Jul 23, 2012, 3:21:59 PM7/23/12
to sta...@clarkparsia.com
On Mon, Jul 23, 2012 at 3:05 PM, Ron Michael Zettlemoyer <ron.zet...@fynydd.com> wrote:
1- I just hit Control-C in Stardog on my AWS server and am able to see that it's not waiting on any disk resources as it shuts down.  It's not doing much of anything really.  It took about 4 minutes to finally shut down.  As I was watching, 30-45 seconds could pass where it did no disk activity at all.

I think you have to attach a debugger to know for sure what's really happening. Did you do that? If so, can we get the output?
 
2- As you suggested I did try to see how it performed locally.  I created a blank database and ran my syncing app to start loading in all the external data into the new database.  I didn't let it run all the way (it can take about an hour when starting from scratch) but I let it run for about 10 minutes and it loaded about 12k triples.  I then hit Control-C and it took about 30 seconds for Stardog to shut down.

Can you use stardog-admin server stop instead of ctrl-c? It's generally a better choice, anyway. 

This *could* be a JVM bug on yr platform; could you try a Java 6 JVM instead of Java 7?

Cheers,
Kendall

Ron Michael Zettlemoyer

unread,
Jul 25, 2012, 10:40:03 AM7/25/12
to sta...@clarkparsia.com
Whew, this is getting to be a lot of work.  :)

It kinda feels like there is some memory leak...  if my syncing app runs long enough (doing a lot of little reads and writes from Stardog) the java.exe process keeps on using more and more memory up to the limit set in the batch file.  Then it takes a while to clean up and shut down.

I didn't attach a debugger; I just used Window's resource monitor to watch the I/O activity of java.exe.  I don't really do java much but is there some special Java debugger or debug mode I could use that might provide more useful information for you?

FWIW I just restarted Stardog using Java 6.  I'll let it run for a day or two and see if that makes any difference.

BTW, I found a bug in stardog-admin.bat:

if exist %JAVA_HOME%\bin\java.exe set JAVA="%JAVA_HOME%\bin\java"

should be:

if exist "%JAVA_HOME%\bin\java.exe" set JAVA="%JAVA_HOME%\bin\java"

Without the quotes the exist check will fail if JAVA_HOME contains a space, which it most likely will as it's defaultly installed in c:\program files\java.



Ron Michael Zettlemoyer

unread,
Jul 27, 2012, 9:34:09 AM7/27/12
to
I can confirm that switching to Java 6 does not make a difference, using Control-C as usual to shutdown.  I'll let it run again for a while and try the stardog shutdown command.



Kendall Clark

unread,
Jul 27, 2012, 9:34:02 AM7/27/12
to sta...@clarkparsia.com
Thanks, Ron.

Evren put some changes into 1.0.3 that may address this issue. (They address an issue that's similar to yrs; let's hope it's the same one.)

Cheers,
Kendall



On Fri, Jul 27, 2012 at 9:30 AM, Ron Michael Zettlemoyer <ron.zet...@fynydd.com> wrote:
I can confirm that switching to Java 6 does not made a difference, using Control-C as usual to shutdown.  I'll let it run again for a while and try the stardog shutdown command.

Ron Michael Zettlemoyer

unread,
Jul 31, 2012, 1:55:55 PM7/31/12
to sta...@clarkparsia.com
Update: Shutting down Stardog with the shutdown command as opposed to Control-C made no difference.  :(  I quickly got a message back that the server successfully received the shutdown request, but it's still taking forever to actually shutdown.

Once it's down I'll be installing Stardog 1.0.3 and we'll see how that works.


Kendall Clark

unread,
Jul 31, 2012, 2:02:43 PM7/31/12
to sta...@clarkparsia.com
Thanks, Ron.

Let us know how 1.0.3 does.

Cheers,
Kendall



On Tue, Jul 31, 2012 at 1:55 PM, Ron Michael Zettlemoyer <ron.zet...@fynydd.com> wrote:
Update: Shutting down Stardog with the shutdown command as opposed to Control-C made no difference.  :(  I quickly got a message back that the server successfully received the shutdown request, but it's still taking forever to actually shutdown.

Once it's down I'll be installing Stardog 1.0.3 and we'll see how that works.

Ron Michael Zettlemoyer

unread,
Jul 31, 2012, 2:22:20 PM7/31/12
to sta...@clarkparsia.com
Funny, when I restarted with Stardog 1.0.3 I got an error that the statistics were missing and that they probably had to be rebuilt.  To double-check I went back to my original copy of things with Stardog 1.0.2 and restarted that and got the same error.  So sometime during the past few days the statistics got corrupted?

Is there a command to rebuild them?  I couldn't find one so I just exported my data, dropped and re-created the database and loaded the old data.  

Kendall Clark

unread,
Aug 1, 2012, 12:18:08 PM8/1/12
to sta...@clarkparsia.com
On Tue, Jul 31, 2012 at 2:22 PM, Ron Michael Zettlemoyer <ron.zet...@fynydd.com> wrote:
Funny, when I restarted with Stardog 1.0.3 I got an error that the statistics were missing and that they probably had to be rebuilt.  To double-check I went back to my original copy of things with Stardog 1.0.2 and restarted that and got the same error.  So sometime during the past few days the statistics got corrupted?

This is interesting. We haven't seen that happen.

Can you explain the upgrade sequence you performed?
 
Is there a command to rebuild them?  I couldn't find one so I just exported my data, dropped and re-created the database and loaded the old data.  

Nothing explicit; what you did (wipe & reload) will recreate them.

We need to figure out why they got corrupted in the first place...

Cheers,
Kendall 

Ron Michael Zettlemoyer

unread,
Aug 2, 2012, 9:22:42 AM8/2/12
to sta...@clarkparsia.com
That's the thing, after I did the upgrade to 1.0.3 and noticed the statistics error I switched over to a backup I had made of everything before the upgrade. I took everything back to how it was right after I last shut down Stardog 1.0.2.  But even there I found the statistics error.  So, the upgrade was not a factor; sometime between my last restart of Stardog (last Thursday or Friday probably) and the final shutdown of 1.0.2 on Tuesday, the statistics file got corrupted.  It did take 1-2 minutes to shutdown so maybe something failed during the shutdown that messed up the file.

That probably doesn't give you much to go on.  But like I said, I do have a complete backup of Stardog and the data files as it was right after that last shutdown of 1.0.2.  I'd be happy to give that to you if you think it'd be useful.

BTW, I'm still seeing slow shutdowns of Stardog after it's run for a while even with 1.0.3.  :(

And a note about the statistics error: it didn't tell me which database had a corrupt statistics file.  At first I thought maybe they were all corrupt, but after recreating my one big database from a previous export, the error disappeared.  So it only affected one database, which makes sense as it is the only one that is highly active in capturing new data.  The other two are lightly used and mostly doing reads.

Ron Michael Zettlemoyer

unread,
Aug 28, 2012, 10:43:28 AM8/28/12
to sta...@clarkparsia.com
Update: 

After running Stardog 1.0.4 for 7 days I shut it down in preparation of installing 1.0.5. It took 14 minutes to shutdown. I did copy a bunch of stack traces while it was doing that. They are attached. Maybe it'll help shed some more light on the problem.

Luckily it shut down properly and I was able to upgrade and restart without any issue.

Kendall suggested that I try to reproduce this on a laptop which I could bring down to you to diagnose in person. But I was thinking: this is all running in an EC2 instance. I could easily clone it and give you full access to the copy. A physical machine might be a little easier to deal with, and I'm not opposed to the laptop idea, but I can give you a EC2 cloned instance much faster than I can set up a laptop with everything.
stardogTrace.zip

Mike Grove

unread,
Aug 28, 2012, 12:54:01 PM8/28/12
to sta...@clarkparsia.com
On Tue, Aug 28, 2012 at 10:43 AM, Ron Michael Zettlemoyer <ron.zet...@fynydd.com> wrote:
Update: 

After running Stardog 1.0.4 for 7 days I shut it down in preparation of installing 1.0.5. It took 14 minutes to shutdown. I did copy a bunch of stack traces while it was doing that. They are attached. Maybe it'll help shed some more light on the problem.

Luckily it shut down properly and I was able to upgrade and restart without any issue.

That's good to hear.

Thanks for sending this along.  We were finally able to reproduce locally enough that we've got a solid idea of what's going on.  So I dont think we'll need access to the EC2 instance quite yet, but I appreciate the offer.

I'll continue to look into a resolution for this now that we can see what's going on.

Cheers,

Mike


Kendall suggested that I try to reproduce this on a laptop which I could bring down to you to diagnose in person. But I was thinking: this is all running in an EC2 instance. I could easily clone it and give you full access to the copy. A physical machine might be a little easier to deal with, and I'm not opposed to the laptop idea, but I can give you a EC2 cloned instance much faster than I can set up a laptop with everything.




On Thursday, August 2, 2012 9:22:42 AM UTC-4, Ron Michael Zettlemoyer wrote:
That's the thing, after I did the upgrade to 1.0.3 and noticed the statistics error I switched over to a backup I had made of everything before the upgrade. I took everything back to how it was right after I last shut down Stardog 1.0.2.  But even there I found the statistics error.  So, the upgrade was not a factor; sometime between my last restart of Stardog (last Thursday or Friday probably) and the final shutdown of 1.0.2 on Tuesday, the statistics file got corrupted.  It did take 1-2 minutes to shutdown so maybe something failed during the shutdown that messed up the file.

That probably doesn't give you much to go on.  But like I said, I do have a complete backup of Stardog and the data files as it was right after that last shutdown of 1.0.2.  I'd be happy to give that to you if you think it'd be useful.

BTW, I'm still seeing slow shutdowns of Stardog after it's run for a while even with 1.0.3.  :(

And a note about the statistics error: it didn't tell me which database had a corrupt statistics file.  At first I thought maybe they were all corrupt, but after recreating my one big database from a previous export, the error disappeared.  So it only affected one database, which makes sense as it is the only one that is highly active in capturing new data.  The other two are lightly used and mostly doing reads.

Reply all
Reply to author
Forward
0 new messages