Broken bus or deadlock?

300 views
Skip to first unread message

Austin Hendrix

unread,
Mar 16, 2015, 3:25:58 AM3/16/15
to home-assi...@googlegroups.com
When I leave home-assistant running for more than a few hours, it eventually gets hung up somewhere and stops processing events. There’s a lot of data in the logs, but this bit seems to be most relevant:

INFO:homeassistant.components.device_tracker.nmap_tracker:nmap scan successful
INFO:homeassistant:Bus:Handling <Event state_changed[L]: entity_id=device_tracke
r.william, new_state=<state device_tracker.william=not_home; friendly_name=Willi
am @ 20:37:17 15-03-2015>, old_state=<state device_tracker.william=home; friendl
y_name=William @ 20:32:53 15-03-2015>>
INFO:homeassistant:Bus:Handling <Event state_changed[L]: entity_id=group.all_dev
ices, new_state=<state group.all_devices=not_home; entity_id=('device_tracker.bi
nney', 'device_tracker.austin', 'device_tracker.william', 'device_tracker.katrin
a'), friendly_name=all devices, auto=True @ 20:37:17 15-03-2015>, old_state=<sta
te group.all_devices=home; entity_id=('device_tracker.binney', 'device_tracker.a
ustin', 'device_tracker.william', 'device_tracker.katrina'), friendly_name=all d
evices, auto=True @ 20:32:53 15-03-2015>>
INFO:homeassistant.components.device_sun_light_trigger:Everyone has left but the
re are lights on. Turning them off
INFO:homeassistant.components.device_sun_light_trigger:Everyone has left but the
re are lights on. Turning them off
INFO:homeassistant:Bus:Handling <Event call_service[L]: domain=light, entity_id=
['light.project_room', 'light.project_desk'], service_call_id=140459272728192-23
, service=turn_off>
INFO:homeassistant:Bus:Handling <Event call_service[L]: domain=light, entity_id=
['light.living_room_1', 'light.living_room_2', 'light.entry'], service_call_id=1
40459272728192-24, service=turn_off>
INFO:homeassistant:Bus:Handling <Event state_changed[L]: entity_id=light.project
_room, new_state=<state light.project_room=off; friendly_name=Project Room @ 20:
37:18 15-03-2015>, old_state=<state light.project_room=on; friendly_name=Project
Room, brightness=144, xy_color=[0.5113, 0.4145] @ 20:32:53 15-03-2015>>
INFO:homeassistant:Bus:Handling <Event state_changed[L]: entity_id=light.project
_desk, new_state=<state light.project_desk=off; friendly_name=Project desk @ 20:
37:18 15-03-2015>, old_state=<state light.project_desk=on; friendly_name=Project
desk, brightness=144, xy_color=[0.5119, 0.4147] @ 20:32:53 15-03-2015>>
INFO:homeassistant:Bus:Handling <Event service_executed[L]: service_call_id=1404
59272728192-23>
INFO:homeassistant:Bus:Handling <Event state_changed[L]: entity_id=group.project
_group, new_state=<state group.project_group=off; entity_id=('light.project_room
', 'light.project_desk'), friendly_name=project_group, auto=False @ 20:37:18 15-
03-2015>, old_state=<state group.project_group=on; entity_id=('light.project_roo
m', 'light.project_desk'), friendly_name=project_group, auto=False @ 20:32:53 15
-03-2015>>
INFO:homeassistant:Bus:Handling <Event state_changed[L]: entity_id=light.living_
room_1, new_state=<state light.living_room_1=off; friendly_name=Living Room 1 @
20:37:18 15-03-2015>, old_state=<state light.living_room_1=on; friendly_name=Liv
ing Room 1, brightness=144, xy_color=[0.5113, 0.4145] @ 20:32:53 15-03-2015>>
INFO:homeassistant:Bus:Handling <Event state_changed[L]: entity_id=light.living_
room_2, new_state=<state light.living_room_2=off; friendly_name=Living Room 2 @
20:37:18 15-03-2015>, old_state=<state light.living_room_2=on; friendly_name=Liv
ing Room 2, brightness=144, xy_color=[0.5119, 0.4147] @ 20:32:53 15-03-2015>>
INFO:homeassistant:Bus:Handling <Event service_executed[L]: service_call_id=1404
59272728192-24>
INFO:homeassistant.components.device_tracker.nmap_tracker:Scanning
INFO:homeassistant.components.device_tracker.nmap_tracker:No MAC address found f
or 192.168.1.4
INFO:homeassistant.components.device_tracker.nmap_tracker:nmap scan successful
INFO:homeassistant.components.light:Updating light states
INFO:homeassistant:Bus:Handling <Event state_changed[L]: entity_id=light.entry,
new_state=<state light.entry=off; friendly_name=Entry @ 20:37:31 15-03-2015>, old_state=<state light.entry=on; friendly_name=Entry, brightness=144, xy_color=[0.5119, 0.4147] @ 20:33:01 15-03-2015>>
INFO:homeassistant:Bus:Handling <Event state_changed[L]: entity_id=group.all_lights, new_state=<state group.all_lights=off; entity_id=('light.entry', 'light.bed_right', 'light.living_room_2', 'light.project_desk', 'light.dead', 'light.bed', 'light.reading_lamp', 'light.bed_left', 'light.project_room', 'light.living_room_1'), friendly_name=all lights, auto=True @ 20:37:31 15-03-2015>, old_state=<state group.all_lights=on; entity_id=('light.entry', 'light.bed_right', 'light.living_room_2', 'light.project_desk', 'light.dead', 'light.bed', 'light.reading_lamp', 'light.bed_left', 'light.project_room', 'light.living_room_1'), friendly_name=all lights, auto=True @ 20:32:53 15-03-2015>>
INFO:homeassistant:Bus:Handling <Event state_changed[L]: entity_id=group.living_room, new_state=<state group.living_room=off; entity_id=('light.living_room_1', 'light.living_room_2', 'light.entry'), friendly_name=living_room, auto=False @ 20:38:39 15-03-2015>, old_state=<state group.living_room=on; entity_id=('light.living_room_1', 'light.living_room_2', 'light.entry'), friendly_name=living_room, auto=False @ 20:32:53 15-03-2015>>
INFO:homeassistant.components.device_tracker.nmap_tracker:Scanning
INFO:homeassistant.components.light:Updating light states
INFO:homeassistant.components.light:Updating light states
INFO:homeassistant.components.api:Found broken event stream to 192.168.1.112, cleaning up
ERROR:homeassistant:BusHandler:Exception doing job
Traceback (most recent call last):
File "/home/hendrix/home-assistant/homeassistant/__init__.py", line 275, in job_handler
func(arg)
File "/home/hendrix/home-assistant/homeassistant/components/api.py", line 120, in forward_events
write_message(json.dumps(event, cls=rem.JSONEncoder))
File "/home/hendrix/home-assistant/homeassistant/components/api.py", line 104, in write_message
wfile.write(msg.encode("UTF-8"))
File "/usr/lib/python3.4/socket.py", line 388, in write
self._checkClosed()
ValueError: I/O operation on closed file.
INFO:homeassistant.components.light:Updating light states
WARNING:homeassistant:WorkerPool:All 9 threads are busy and 1729 jobs pending
WARNING:homeassistant:WorkerPool:Current job from 20:38:39 15-03-2015: (<function DeviceTracker.__init__.<locals>.update_device_state at 0x7fbf33616c80>, <Event time_changed[L]: now=20:38:36 15-03-2015>)
WARNING:homeassistant:WorkerPool:Current job from 20:38:39 15-03-2015: (<function DeviceTracker.__init__.<locals>.update_device_state at 0x7fbf33616c80>, <Event time_changed[L]: now=20:38:24 15-03-2015>)
WARNING:homeassistant:WorkerPool:Current job from 20:38:39 15-03-2015: (<function DeviceTracker.__init__.<locals>.update_device_state at 0x7fbf33616c80>, <Event time_changed[L]: now=20:38:12 15-03-2015>)
WARNING:homeassistant:WorkerPool:Current job from 20:38:39 15-03-2015: (<function DeviceTracker.__init__.<locals>.update_device_state at 0x7fbf33616c80>, <Event time_changed[L]: now=20:38:00 15-03-2015>)
WARNING:homeassistant:WorkerPool:Current job from 20:38:39 15-03-2015: (<function DeviceTracker.__init__.<locals>.update_device_state at 0x7fbf33616c80>, <Event time_changed[L]: now=20:37:48 15-03-2015>)
WARNING:homeassistant:WorkerPool:Current job from 20:38:39 15-03-2015: (<function DeviceTracker.__init__.<locals>.update_device_state at 0x7fbf33616c80>, <Event time_changed[L]: now=20:37:36 15-03-2015>)
WARNING:homeassistant:WorkerPool:Current job from 20:38:48 15-03-2015: (<function DeviceTracker.__init__.<locals>.update_device_state at 0x7fbf33616c80>, <Event time_changed[L]: now=20:38:48 15-03-2015>)
WARNING:homeassistant:WorkerPool:Current job from 20:39:00 15-03-2015: (<function DeviceTracker.__init__.<locals>.update_device_state at 0x7fbf33616c80>, <Event time_changed[L]: now=20:39:00 15-03-2015>)
WARNING:homeassistant:WorkerPool:Current job from 20:39:12 15-03-2015: (<function DeviceTracker.__init__.<locals>.update_device_state at 0x7fbf33616c80>, <Event time_changed[L]: now=20:39:12 15-03-2015>)



My analysis is that there is a bunch of activity, triggered when all of my roommates leave, which triggers a deadlock, crash or broken pipe somewhere, and the event queue stops processing events.

This could be a deadlock because the socket for the event bus is closed, or it could be a deadlock in the DeviceTracker or the nmap device tracker.

Has anyone else seen an issue like this?

Thanks,
-Austin

signature.asc

Paulus Schoutsen

unread,
Mar 16, 2015, 4:58:31 PM3/16/15
to home-assi...@googlegroups.com
This used to be a bug in the homeassistant.util.Throttle code. It would stack up calls that are Throttled instead of returning. This bug should have been fixed jan 5.

I see in the log the message
INFO:homeassistant.components.api:Found broken event stream to 192.168.1.112, cleaning up
So I'm assuming that you are using a recent version that includes this fix. 

It looks like all tasks that are queued up are calls for the device tracker. So I think it should be related to util.Throttle, device_tracker component or the nmap scanner platform.

The nmap tracker does have it's own lock but Throttle should stop it from ever having this code run twice. 




--
You received this message because you are subscribed to the Google Groups "Home Assistant Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to home-assistant-...@googlegroups.com.
To post to this group, send email to home-assi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/home-assistant-dev/33380BAB-A64B-4C2E-8209-88DE53D7A66D%40gmail.com.
For more options, visit https://groups.google.com/d/optout.



--
It's nice to be important but it's more important to be nice.

James Cole

unread,
Mar 22, 2015, 10:40:44 PM3/22/15
to home-assi...@googlegroups.com
There's definitely still something going on with nmap tracker, if I leave it running overnight it seems to bank up a ton of requests and uses a lot of resources.  I'm not at home so I can't post my logs at the moment but it definitely looks like something to do with throttling.  The issue  only starts becoming apparent after running for 6 hours or so.
To unsubscribe from this group and stop receiving emails from it, send an email to home-assistant-dev+unsub...@googlegroups.com.

Andrew Thigpen

unread,
Mar 25, 2015, 9:30:21 AM3/25/15
to home-assi...@googlegroups.com
I can also confirm that this is an issue.  I've been trying to track down the cause, but unfortunately I can't reliably reproduce it yet.  It's happened twice now in the past few weeks where I've arrived home at the end of the day and realized my lights never turned off when I left in the morning.  I'm also using the nmap tracker.

Magnus Hacker

unread,
Mar 25, 2015, 11:49:45 AM3/25/15
to home-assi...@googlegroups.com
I noticed problems yesterday when I started using the nmap scanner, but I have a different error - an exception was raised somewhere inside the python threading library.

I can check later today for the exact error message.

Magnus Hacker

unread,
Mar 25, 2015, 2:17:49 PM3/25/15
to home-assi...@googlegroups.com
INFO:homeassistant.components.device_tracker.nmap_tracker:Scanning
Exception ignored in: <module 'threading' from '/usr/lib/python3.4/threading.py'>
Traceback (most recent call last):
  File "/usr/lib/python3.4/threading.py", line 1353, in _after_fork
    thread._stop()
TypeError: 'Event' object is not callable

This is with python 3.4.2 from raspbian jessie. I'm guessing this is a raspbian problem rather than a HA problem.

Andrew Thigpen

unread,
Mar 25, 2015, 2:33:39 PM3/25/15
to home-assi...@googlegroups.com

I think you're running an older version of the code.  I fixed this particular issue a while back: https://github.com/balloob/home-assistant/pull/35

Unfortunately, I don't think it's related to the deadlock.

-Andrew

--
You received this message because you are subscribed to a topic in the Google Groups "Home Assistant Dev" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/home-assistant-dev/hKcH32MzQvs/unsubscribe.
To unsubscribe from this group and all its topics, send an email to home-assistant-...@googlegroups.com.

To post to this group, send email to home-assi...@googlegroups.com.

Paulus Schoutsen

unread,
Mar 26, 2015, 1:36:04 AM3/26/15
to home-assi...@googlegroups.com
I think the offending line that blocks is the lock.acquire() on line 160 of device_tracker/__init__.py (the beginning of the update_devices call).

To prevent the problem from happening we can change that line to be:
if not lock.acquire(False):
  return

This code will not wait if we are updating but returns. This is a fix that will help prevent this problem and future similar problems. And we should probably add it.

It, however, does not fix the problem that the nmap_tracker gets stuck. So our solution would avoid the bus to get broken but keep the device tracker in a broken state.

nmap has a bunch of options we can set that might help in it getting stuck. Most interesting category is TIMING AND PERFORMANCE and especially this option:
  --host-timeout <time>: Give up on target after this long

Anyone has some more experience with nmap?

Paulus

--
You received this message because you are subscribed to the Google Groups "Home Assistant Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to home-assistant-...@googlegroups.com.

To post to this group, send email to home-assi...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Paulus Schoutsen

unread,
Mar 26, 2015, 1:56:27 AM3/26/15
to home-assi...@googlegroups.com
Converted my suggestions to some code:

https://github.com/balloob/home-assistant/pull/73

Magnus Hacker

unread,
Mar 26, 2015, 5:15:33 PM3/26/15
to home-assi...@googlegroups.com
Actually, I'm still getting that exception. I verified that I have the updated __init__.py:

pi@raspen ~/home-assistant/homeassistant $ grep _stop __init__.py
    def block_till_stopped(self):
        # Wait till all responses to homeassistant_stop are done
        self._stop_event = threading.Event()
                                  lambda event: self._stop_event.set())
        while not self._stop_event.isSet():

Is there a cache that I need to clear after running scripts/update?


On Wednesday, March 25, 2015 at 7:33:39 PM UTC+1, Andrew Thigpen wrote:

I think you're running an older version of the code.  I fixed this particular issue a while back: https://github.com/balloob/home-assistant/pull/35

Unfortunately, I don't think it's related to the deadlock.

-Andrew

On Mar 25, 2015 1:17 PM, "Magnus Hacker" <magnus...@gmail.com> wrote:
INFO:homeassistant.components.device_tracker.nmap_tracker:Scanning
Exception ignored in: <module 'threading' from '/usr/lib/python3.4/threading.py'>
Traceback (most recent call last):
  File "/usr/lib/python3.4/threading.py", line 1353, in _after_fork
    thread._stop()
TypeError: 'Event' object is not callable

This is with python 3.4.2 from raspbian jessie. I'm guessing this is a raspbian problem rather than a HA problem.

On Wednesday, March 25, 2015 at 4:49:45 PM UTC+1, Magnus Hacker wrote:
I noticed problems yesterday when I started using the nmap scanner, but I have a different error - an exception was raised somewhere inside the python threading library.

I can check later today for the exact error message.

--
You received this message because you are subscribed to a topic in the Google Groups "Home Assistant Dev" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/home-assistant-dev/hKcH32MzQvs/unsubscribe.
To unsubscribe from this group and all its topics, send an email to home-assistant-dev+unsub...@googlegroups.com.

Paulus Schoutsen

unread,
Mar 26, 2015, 5:31:08 PM3/26/15
to home-assi...@googlegroups.com
Python 3 caches compiled versions in the folder __pycache__. Removing those should force Python 3 to compile new versions. 




--
You received this message because you are subscribed to the Google Groups "Home Assistant Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to home-assistant-...@googlegroups.com.

To post to this group, send email to home-assi...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Paulus Schoutsen

unread,
Mar 26, 2015, 11:44:32 PM3/26/15
to home-assi...@googlegroups.com
I've merged the fixes into the dev branch. Please report back if it helps

Paulus

Andrew Thigpen

unread,
Mar 27, 2015, 9:22:38 AM3/27/15
to home-assi...@googlegroups.com

Thanks!  I'll try it out this weekend.

-Andrew

To unsubscribe from this group and all its topics, send an email to home-assistant-...@googlegroups.com.

To post to this group, send email to home-assi...@googlegroups.com.

Austin Hendrix

unread,
Mar 29, 2015, 2:49:34 PM3/29/15
to home-assi...@googlegroups.com
I gave the updated version a try this weekend. It looks like the error messages are gone and the event queue is no longer stacking up events, but the underlying issue is still there.

It looks like the nmap device tracker gets stuck sometimes; I’ll try to find a bit of time to debug it further.

-Austin


For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups "Home Assistant Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to home-assistant-...@googlegroups.com.
To post to this group, send email to home-assi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/home-assistant-dev/5245ada3-ab6b-4b10-8cc6-1b30cb7130c8%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.
-- 
It's nice to be important but it's more important to be nice.



-- 
It's nice to be important but it's more important to be nice.

-- 
You received this message because you are subscribed to a topic in the Google Groups "Home Assistant Dev" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/home-assistant-dev/hKcH32MzQvs/unsubscribe.
To unsubscribe from this group and all its topics, send an email to home-assistant-...@googlegroups.com.
To post to this group, send email to home-assi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/home-assistant-dev/CAKRfqu0Ae0fmXA%3D25hM5-UuB-FNE1qg6v2uGbmA%2BoOXh1Tn2yA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups "Home Assistant Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email tohome-assistant-...@googlegroups.com.
To post to this group, send email to home-assi...@googlegroups.com.
signature.asc

Andrew Thigpen

unread,
Apr 30, 2015, 10:26:19 PM4/30/15
to home-assi...@googlegroups.com
Ok, this just happened to my HA instance again, and I've collected as much info as I can.  Unfortunately, I don't have a debug version of python installed w/ symbols so I wasn't able to get an exact cause.  However, I think I can at least narrow it down.

I noticed that there were two instances of HA when listing the processes:
hass     12785     1  4 Apr27 ?        03:07:07 python3 -m homeassistant -c /srv/hass-config
hass      
5740 12785  0 09:40 ?        00:00:00 python3 -m homeassistant -c /srv/hass-config

One is forked from the other...so I guessed that was the deadlocked nmap process.  Attaching with gdb and inspecting the Python frames as best I could, it looks to be deadlocked somewhere in the run method in libnmap/process.py:237.  The last Python frame in the stack is in multiprocessing/process.py:228, the _bootstrap method of the BaseProcess class.  I don't know which line caused the deadlock.

I checked the libnmap project on Github for known issues and found this issue where the multiprocess library was removed.  Maybe that would resolve this?  I'm going to try upgrading to the latest to see if that works, but I thought I'd come back here and share what I found in case anyone else is still experiencing the same problem.

-Andrew
To unsubscribe from this group and all its topics, send an email to home-assistant-dev+unsubscri...@googlegroups.com.

-- 
You received this message because you are subscribed to the Google Groups "Home Assistant Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to home-assistant-dev+unsub...@googlegroups.com.

To post to this group, send email to home-assi...@googlegroups.com.
-- 
It's nice to be important but it's more important to be nice.



-- 
It's nice to be important but it's more important to be nice.

-- 
You received this message because you are subscribed to a topic in the Google Groups "Home Assistant Dev" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/home-assistant-dev/hKcH32MzQvs/unsubscribe.
To unsubscribe from this group and all its topics, send an email to home-assistant-dev+unsub...@googlegroups.com.
To post to this group, send email to home-assi...@googlegroups.com.

-- 
You received this message because you are subscribed to the Google Groups "Home Assistant Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email tohome-assistant-dev+unsub...@googlegroups.com.

Edwin Smulders

unread,
May 21, 2015, 1:17:46 PM5/21/15
to home-assi...@googlegroups.com
I posted a bug report to github before I found this thread. I encountered this problem on a fresh install of HA.
I tried python-libnmap from git and it does seem to fix it, however it currently contains a bytes to str conversion error in python 3.4. I'm sending a pull request to python-libnmap for that.

Edwin Smulders

unread,
May 21, 2015, 1:24:53 PM5/21/15
to home-assi...@googlegroups.com
I'm afraid I spoke just a bit too soon; my process listing shows nmap as [defunct] and HA hangs after that. It might require a bit more work.

Edwin Smulders

unread,
May 22, 2015, 4:35:02 AM5/22/15
to home-assi...@googlegroups.com
I made a pull request to https://github.com/savon-noir/python-libnmap for the problems. I hope they fix/include it soon and release a new version.
Reply all
Reply to author
Forward
0 new messages