Neo4j 2.0 to 2.1 Upgrade Fails When Neo4j Running as a Windows Service


Jim Salmons

May 31, 2014, 5:09:30 PM
to ne...@googlegroups.com
This issue has surfaced a number of times in various flavors, particularly when an explicit upgrade (one that is not automatic and requires "allow_store_upgrade=true") is involved. The culprit seems to be this:
  • If you run Neo4j as a Windows Service (having installed via Neo4jInstaller.bat and using the recommended start/stop 'sc' commands, etc.), there is no way to do a clean shutdown of a database.
(As long as no upgrade is involved, Neo4j running as a Windows Service can apparently start, stop, and restart databases with no problem, although the message logs reveal that a non-clean shutdown has been silently detected and addressed on restart.)

In the past, it has been suggested that this issue came from "jumping the gun", that is, not letting the Neo4j-Server instance shut down completely before restarting the Neo4j-Server Windows Service. But that is not the case.

To test the basic situation, I did the following, with the same results on both 2.0 and 2.1:
  1. Create an empty graph.db directory in my Neo4j data directory, then start the Neo4j-Server Windows Service instance.
  2. Observe the fresh database being made. Make a copy of the messages.log (called messages_onCreation.log).
  3. Stop the Neo4j-Server Windows Service... wait, wait... wait (longer than necessary)
  4. Compare the two messages logs... the same, no stopping messages.
  5. Delete the messages.log and start the Neo4j-Server Windows Service.
  6. Observe the database directory during restart. Make a copy of the new messages.log (called messages_on2ndStart.log).
With no activity other than to create it, stop it, and then start it back up, the messages.log shows multiple messages about detecting a non-clean shutdown. (The number of non-clean detection messages in the log depends on the Neo4j version being run.)
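The comparison in step 4 can also be done mechanically. Here is a minimal sketch in Python, assuming both files are plain text and that the live messages.log only ever has lines appended after the copy was taken. The function name is made up for illustration.

```python
from pathlib import Path


def lines_added_since(copy_path, live_path):
    """Return the lines in live_path that come after the content saved in copy_path.

    Assumes the live log is append-only, so the saved copy should be a
    prefix of it. Raises ValueError if that does not hold (for example,
    if the log was deleted or rotated in between).
    """
    before = Path(copy_path).read_text(encoding="utf-8", errors="replace").splitlines()
    after = Path(live_path).read_text(encoding="utf-8", errors="replace").splitlines()
    if after[:len(before)] != before:
        raise ValueError("live log does not start with the saved copy")
    return after[len(before):]
```

Calling `lines_added_since("messages_onCreation.log", "messages.log")` after stopping the service returns an empty list when nothing was written during the stop, which is what step 4 observed.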

In the past, the recommendation was to use the deprecated Gremlin shell to do g.shutdown() then exit. But this doesn't seem to be (readily) available.

QUESTION 1: Does anyone know of a way to cleanly shut down a Neo4j database running under a Windows Service configuration?

QUESTION 2: Has anyone running Neo4j-Server as a Windows Service successfully upgraded a 2.0 DB to 2.1? If so, how?

I'm hoping to get a quick helpful reply or additional insights here before posting a question to S/O. Once I fully understand this issue, if it still appears that the fundamental problem is the Windows Service not cleanly shutting down, I'll enter an issue to this effect on the Neo4j GitHub Issue queue.

Thanks,
--Jim--

Chris Vest

Jun 1, 2014, 1:45:18 PM
to ne...@googlegroups.com
You can, after you’ve shut down the Neo4j Windows service, use neo4j-shell with the `-path <dir>` option to start a non-service Neo4j instance and then do a clean shutdown with the `exit` command. You should then be able to upgrade.
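As a rough illustration of that sequence, here is a Python sketch. It is not an official procedure. The service name "Neo4j-Server", the shell launcher path, and the database directory are all placeholders to adjust for a given install. It also assumes that neo4j-shell reads commands from standard input when input is piped to it, and it uses a fixed pause as a crude stand-in for confirming that the service has really stopped.

```python
import subprocess
import time


def clean_shutdown_commands(service_name, shell_cmd, db_dir):
    """Build the two commands: stop the service, then open the store with the shell."""
    stop_service = ["sc", "stop", service_name]
    open_store = [shell_cmd, "-path", db_dir]
    return stop_service, open_store


def clean_shutdown(service_name, shell_cmd, db_dir, pause_seconds=60,
                   run=subprocess.run, sleep=time.sleep):
    """Stop the Windows service, wait, then start and cleanly exit a shell instance.

    `run` and `sleep` are injectable so the sequence can be exercised
    without Windows or Neo4j being present.
    """
    stop_service, open_store = clean_shutdown_commands(service_name, shell_cmd, db_dir)
    # sc returns a non-zero exit code if the service is already stopped, so do not fail on it.
    run(stop_service, check=False)
    # `sc stop` only requests the stop; give the service time to finish (assumed long enough).
    sleep(pause_seconds)
    # Send "exit" on stdin so the shell shuts the non-service instance down cleanly.
    run(open_store, input="exit\n", text=True, check=True)
```

A call might look like `clean_shutdown("Neo4j-Server", r"bin\neo4j-shell.bat", r"data\graph.db")`, with all three arguments being guesses about a typical layout rather than values confirmed in this thread.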

I don’t know why there is this problem with shutting down the Windows service.

--
Chris Vest
System Engineer, Neo Technology
[ skype: mr.chrisvest, twitter: chvest ]



Jim Salmons

Jun 1, 2014, 5:01:07 PM
to ne...@googlegroups.com
Thanks for the pointer to that technique, Chris. None of my databases ATM are so critical that it is a show-stopper for me, but I figure it is good to surface the issue in the event that others with critical needs get bitten.

Now that I've looked at the situation a bit, too, it appears that a possible 'fix' would be to silently call the shell and do the exit before asking 'sc' to stop the Windows Service. I'm going to look at incorporating that approach into my Neo4jCP (mini Control Panel, https://github.com/Jim-Salmons/neo4jcp) and if that looks promising, we might want to see if a similar approach in the Neo4jInstaller.bat file would avoid this situation.

ITMT, you might want to consider a heads-up note somewhere in the 2.0-to-2.1 upgrade notes that points this situation out and provides the recommended work-around.

Thanks again,
--Jim--

Michael Hunger

Jun 1, 2014, 8:58:18 PM
to ne...@googlegroups.com
I think we should rather find and fix the root cause of why stopping the service via sc results in an unclean shutdown.

Michael

Jim Salmons

Jun 2, 2014, 5:57:27 PM
to ne...@googlegroups.com
Absolutely agree. 

I will gladly cooperate on any investigation or shake-down of any solution we might develop. My suspicion is that 'sc' just asks the service to stop at an outer "black box level" where the service itself may not get an 'inner signal' of the request to stop. If this is the case, since the recommended procedure is to run the supplied batch file rather than directly issue the service stop request, it may be as simple as incorporating a silent run of the shell and 'exit' before telling 'sc' to stop the service.

If OTOH the service does get such stop notification, it just may be that an exit call is missing in this case before the service stops.

I'll look into the basic behavior of Windows services to see if there is any info that might shed some light on this.

--Jim--

Michael Hunger

Jun 2, 2014, 8:54:19 PM
to ne...@googlegroups.com
It looks as if the shutdown timeout is just too short. Especially if there are still hanging transactions, Neo4j takes more than 20s to stop, and I read that the default timeout is 30s. I also saw that there is some registry setting for the timeout that is applied by sc.
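One way to check the timeout theory is to measure how long the service actually takes to reach the stopped state after a stop request. The Python sketch below does not change any timeout. It only polls `sc query` and times the transition. It assumes the English `sc query` output layout with a `STATE : <number> <NAME>` line, and the function names are made up for illustration.

```python
import re
import subprocess
import time

# Matches a line such as "        STATE              : 3  STOP_PENDING"
_STATE_RE = re.compile(r"STATE\s*:\s*\d+\s+(\w+)")


def service_state(sc_query_output):
    """Extract the state name (RUNNING, STOP_PENDING, STOPPED, ...) from `sc query` text."""
    match = _STATE_RE.search(sc_query_output)
    return match.group(1) if match else None


def query_service(service_name):
    """Run `sc query <name>` and return its text output."""
    result = subprocess.run(["sc", "query", service_name],
                            capture_output=True, text=True)
    return result.stdout


def seconds_until_stopped(service_name, limit=300.0, poll=1.0,
                          query=query_service, clock=time.monotonic, sleep=time.sleep):
    """Poll until the service reports STOPPED.

    Returns the elapsed seconds, or None if `limit` seconds pass first.
    """
    start = clock()
    while clock() - start < limit:
        if service_state(query(service_name)) == "STOPPED":
            return clock() - start
        sleep(poll)
    return None
```

Running `seconds_until_stopped("Neo4j-Server")` right after issuing the stop request would give a number to compare against the 20s and 30s figures mentioned above. The service name here is again an assumption.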
