openHAB becoming very unstable

297 views
Skip to first unread message

Ben Jones

unread,
Apr 25, 2014, 8:22:14 PM4/25/14
to ope...@googlegroups.com
I am starting to notice openHAB becoming very unstable.

Whenever I startup openHAB it loads all the item files no problem, but then when it gets to starting the various 'Refresh Services' it quite often stalls and only starts 1-2 before moving to loading the rule files. At this point it only ever loads 6 of about 15 rule files and seems to lock up.

Waiting doesn't seem to help.

Sometimes it will successfully start all the refresh services, maybe 1/10 attempts, and then all rule files are loaded. But on average about 3-4 of those rule files will fail to load with a FileNotFoundException. I then have to touch those files to force them to be refreshed and then everything seems to be ok.

It is close to being unusable, as any config change that requires a restart takes me an hour or so to get openHAB back up and running.

I am running the latest 1.5.0-SNAPSHOT but have only upgraded this morning - it was doing the same thing on 1.4.0.

The other thing I have noticed is when shutting down openHAB it can take up to 5mins for the java process to die. What is taking so long to shutdown?

There is nothing in the logs, so I am at a bit of a loss to work out what is going on here.

Anyone got any tips?!


Kai Kreuzer

unread,
Apr 26, 2014, 4:24:38 AM4/26/14
to ope...@googlegroups.com
Ben,

I assume the problem must be due to some binding that you use. I am running 1.4.0 on my production system at I have no troubles with startup or shutdown whatsoever. So although it is probably tedious, try to remove bindings in order to find out with which the problems start.

Best regards,
Kai


--
You received this message because you are subscribed to the Google Groups "openhab" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openhab+u...@googlegroups.com.
To post to this group, send email to ope...@googlegroups.com.
Visit this group at http://groups.google.com/group/openhab.
For more options, visit https://groups.google.com/d/optout.

Brian Crosby

unread,
Apr 26, 2014, 8:33:15 AM4/26/14
to ope...@googlegroups.com
Ben,

I have noticed this same behavior when using the latest snapshots of OH and the Zwave Binding. Unloading the zwave binding and using a 1.4 build seems to work (2 days so far).

Chris Jackson

unread,
Apr 26, 2014, 3:26:12 PM4/26/14
to ope...@googlegroups.com
Hi Brian,
As mentioned previous, for the version you were using (622 I think?) there were very few changes from V1.5, so it’s difficult to see what’s causing this. Of course, the latest code has now changed a lot, but I think most people find it working well (it had quite a lot of testing before it was merged).

In any case, as I said, I’m happy to take a look at any logs if you want to send them - without the logs it’s very difficult to say what might be causing the problem, and therefore it’s not going to get fixed. The other option is to systematically revert back to some over versions and try to find out where it stopped working. Either way, I’m happy to help, but I need more feedback than “it doesn’t work”.

Cheers
Chris

Ben Jones

unread,
Apr 26, 2014, 5:32:06 PM4/26/14
to ope...@googlegroups.com
Hey guys,

Thanks for the replies. After following Kai's advice I think I have narrowed this issue down to the MQTT binding. I first tried removing the ZWave binding and things were still unstable - i.e. locking up on startup.

Once I got rid of MQTT everything starts up ok. I am still running the MQTT IO transport, and the MQTTITUDE binding, but as soon as I add the MQTT binding back in I get lock ups during start up.

Now I have 26 MQTT item bindings, both in/out bound. I will need to do some more research but it would be good to know if anyone else, with a lot of MQTT bindings is having any issues...

Cheers,
Ben

Brian Crosby

unread,
Apr 28, 2014, 8:50:55 AM4/28/14
to ope...@googlegroups.com
Ben

Good information. I have MQTT running in my dev build as well. Let me see if i have the same outcome. Stand by...


--
You received this message because you are subscribed to a topic in the Google Groups "openhab" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/openhab/XjLkDAPMt58/unsubscribe.
To unsubscribe from this group and all its topics, send an email to openhab+u...@googlegroups.com.

To post to this group, send email to ope...@googlegroups.com.
Visit this group at http://groups.google.com/group/openhab.
For more options, visit https://groups.google.com/d/optout.



--
Brian M Crosby
(703) 286-9128 

Ben Jones

unread,
Apr 28, 2014, 3:53:38 PM4/28/14
to ope...@googlegroups.com
Any luck Brian?

Ben Jones

unread,
Apr 28, 2014, 11:49:54 PM4/28/14
to ope...@googlegroups.com
Greg and I are trying to track down the issue with the Squeezebox action binding, but I think it might be related to a more core part of openHAB. I am just not sure what or where!

After a restart, and waiting the 5-6 mins for it to complete after stalling, I noticed this time my rrd4j persistence bundle hadn't started. I tried starting it manually from the OSGI console and got the following error;

java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.String
        at org.eclipse.equinox.console.commands.EquinoxCommandsConverter.convert(EquinoxCommandsConverter.java:61)
        at org.apache.felix.gogo.runtime.CommandProcessorImpl.convert(CommandProcessorImpl.java:254)
        at org.apache.felix.gogo.runtime.CommandSessionImpl.convert(CommandSessionImpl.java:374)
        at org.apache.felix.gogo.runtime.Reflective.coerce(Reflective.java:389)
        at org.apache.felix.gogo.runtime.Reflective.coerce(Reflective.java:310)
        at org.apache.felix.gogo.runtime.Reflective.invoke(Reflective.java:110)
        at org.apache.felix.gogo.runtime.CommandProxy.execute(CommandProxy.java:82)
        at org.apache.felix.gogo.runtime.Closure.executeCmd(Closure.java:477)
        at org.apache.felix.gogo.runtime.Closure.executeStatement(Closure.java:403)
        at org.apache.felix.gogo.runtime.Pipe.run(Pipe.java:108)
        at org.apache.felix.gogo.runtime.Closure.execute(Closure.java:183)
        at org.apache.felix.gogo.runtime.Closure.execute(Closure.java:120)
        at org.apache.felix.gogo.runtime.CommandSessionImpl.execute(CommandSessionImpl.java:89)
        at org.apache.felix.gogo.shell.Console.run(Console.java:62)
        at org.apache.felix.gogo.shell.Shell.console(Shell.java:203)
        at org.apache.felix.gogo.shell.Shell.gosh(Shell.java:128)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.felix.gogo.runtime.Reflective.invoke(Reflective.java:137)
        at org.apache.felix.gogo.runtime.CommandProxy.execute(CommandProxy.java:82)
        at org.apache.felix.gogo.runtime.Closure.executeCmd(Closure.java:477)
        at org.apache.felix.gogo.runtime.Closure.executeStatement(Closure.java:403)
        at org.apache.felix.gogo.runtime.Pipe.run(Pipe.java:108)
        at org.apache.felix.gogo.runtime.Closure.execute(Closure.java:183)
        at org.apache.felix.gogo.r

Greg has noticed this same error when trying to start the Squeezebox action bundle on his setup.

Anyone have any clue what could be the issue here?!

Ben Jones

unread,
Apr 28, 2014, 11:52:48 PM4/28/14
to ope...@googlegroups.com
I should add that after this the rrd4j bundle did actually start and seems to be working ok.

Ben Jones

unread,
Apr 29, 2014, 12:31:20 AM4/29/14
to ope...@googlegroups.com
Things are really not very healthly here. Tried reverting to JDK6 but no difference, back on JDK7 now.

Just did another restart and noticed this in my MQTT logs...

2014-04-29 16:19:49 DEBUG o.o.i.t.mqtt.MqttService[:117]- Starting MQTT Service...
2014-04-29 16:19:49 INFO  o.o.i.t.mqtt.MqttService[:102]- MQTT Service initialization completed.
2014-04-29 16:19:49 INFO  o.o.i.t.m.i.MqttBrokerConnection[:106]- Starting MQTT broker connection 'mosquitto'
2014-04-29 16:19:49 DEBUG o.o.i.t.m.i.MqttBrokerConnection[:277]- Creating new client for 'tcp://127.0.0.1:1883' using id 'openhab' and file store '/tmp/mosquitto'
2014-04-29 16:20:06 DEBUG o.o.b.m.i.MqttitudeActivator[:33]- Mqttitude binding has been started.
2014-04-29 16:20:06 DEBUG o.o.b.m.i.MqttitudeBinding[:177]- Registering Mqttitude consumer for /owntracks/renee (on mosquitto)
2014-04-29 16:20:06 DEBUG o.o.i.t.m.i.MqttBrokerConnection[:440]- Starting message consumer for broker 'mosquitto' on topic '/owntracks/renee'
2014-04-29 16:20:06 DEBUG o.o.b.m.i.MqttitudeBinding[:177]- Registering Mqttitude consumer for /owntracks/ben (on mosquitto)
2014-04-29 16:20:06 DEBUG o.o.i.t.m.i.MqttBrokerConnection[:440]- Starting message consumer for broker 'mosquitto' on topic '/owntracks/ben'
2014-04-29 16:20:06 DEBUG o.o.b.m.i.MqttitudeBinding[:84]- Activating Mqttitude binding
2014-04-29 16:20:07 DEBUG o.o.b.m.internal.MqttActivator[:34]- MQTT binding has been started.
2014-04-29 16:20:07 DEBUG o.o.b.m.i.MqttItemConfig[:71]- Loaded MQTT config for item 'Alarm_MasterTime' : 1 subscribers, 0 publishers
2014-04-29 16:20:07 DEBUG o.o.i.t.m.i.MqttBrokerConnection[:440]- Starting message consumer for broker 'mosquitto' on topic '/openhab/alarm/master'
2014-04-29 16:20:07 ERROR o.o.i.t.m.i.MqttBrokerConnection[:497]- MQTT connection to broker was lost
org.eclipse.paho.client.mqttv3.MqttException: MqttException
        at org.eclipse.paho.client.mqttv3.internal.CommsCallback.run(CommsCallback.java:171)
        at java.lang.Thread.run(Thread.java:701)
Caused by: java.util.ConcurrentModificationException: null
        at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:782)
        at java.util.ArrayList$Itr.next(ArrayList.java:754)
        at org.openhab.io.transport.mqtt.internal.MqttBrokerConnection.messageArrived(MqttBrokerConnection.java:518)
        at org.eclipse.paho.client.mqttv3.internal.CommsCallback.handleMessage(CommsCallback.java:336)
        at org.eclipse.paho.client.mqttv3.internal.CommsCallback.run(CommsCallback.java:148)
        at java.lang.Thread.run(Thread.java:701)
2014-04-29 16:20:07 INFO  o.o.i.t.m.i.MqttBrokerConnection[:500]- Starting connection helper to periodically try restore connection to broker 'mosquitto'
2014-04-29 16:20:17 INFO  o.o.i.t.m.i.MqttBrokerConnection[:106]- Starting MQTT broker connection 'mosquitto'
2014-04-29 16:20:17 DEBUG o.o.i.t.m.i.MqttBrokerConnection[:440]- Starting message consumer for broker 'mosquitto' on topic '/owntracks/renee'
2014-04-29 16:20:17 DEBUG o.o.i.t.m.i.MqttBrokerConnection[:440]- Starting message consumer for broker 'mosquitto' on topic '/owntracks/ben'
2014-04-29 16:20:17 DEBUG o.o.i.t.m.i.MqttBrokerConnection[:440]- Starting message consumer for broker 'mosquitto' on topic '/openhab/alarm/master'
2014-04-29 16:20:17 DEBUG o.o.b.m.i.MqttitudeConsumer[:173]- PresenceBen_MqttHome has entered region home

The MQTT binding was reporting itself as running in OSGI console, but it was not responding to any messages. I had to manually stop/start the bundle in the OSGI console and it has burst into life.

I really do think this is something to do with the MQTT binding but I can't for the life of me figure out what it is.


Ben Jones

unread,
Apr 29, 2014, 1:51:05 AM4/29/14
to ope...@googlegroups.com
I am starting to think this is some kind of deadlock or race condition with the MQTT IO bundle. There is no thread synchronisation when accessing the list of consumers and producers. I think with my config containing many MQTT bindings plus MQTTITUDE bindings it might be causing a problem when starting up. 

I am going to make some changes to the MQTT IO bundle and add some synchronisation and see if that helps.

Stand by, might be an hour or two as I have to cook dinner right now!

Ben Jones

unread,
Apr 29, 2014, 6:10:26 AM4/29/14
to ope...@googlegroups.com
Ok - I have added some synchronisation into the MQTT IO bundle and I am seeing a vast improvement in system stability. Everything starts up immediately now, with the only remaining issue being the FileNotFoundExceptions when loading the rules. Aside from that everything is now humming along. I am therefore almost certain the issue causing stalling during startup was in the MQTT IO bundle.

I have attached a new version of the bundle - you will need to remove the version from your /server/plugins folder and copy this version into /addons.

Let me know if this helps anyone else out there that is suffering from startup/stability problems. 

If it looks like a solution I will create a pull request so it can be added to the CloudBees build asap.
org.openhab.io.transport.mqtt-1.5.0-SNAPSHOT.jar

dan cunningham

unread,
Apr 29, 2014, 11:39:23 AM4/29/14
to ope...@googlegroups.com
Nice work Ben!  I'll give it a try soon.  Are all your changes pushed to your github repo? 


--
You received this message because you are subscribed to the Google Groups "openhab" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openhab+u...@googlegroups.com.

Ben Jones

unread,
Apr 29, 2014, 3:48:51 PM4/29/14
to ope...@googlegroups.com
Yep - under the mqtt-concurrency-issue branch.
Reply all
Reply to author
Forward
0 new messages