Ry,
Ok. I think I now understand the issue quite a bit better.
The timer.remove(socket) method does not reset the socket's own
next/prev pointers, so if client.destroy() [which in turn calls
timer.unenroll(socket)] is called afterwards, the socket is removed
from the list a second time (b/c socket.next is still set).
If this happens immediately (i.e. destroy() is called from within the
timeout callback), the double remove is a no-op b/c the next/prev
pointers are still in the same place as before. However, if the
client.destroy() is deferred (using setTimeout etc.) then the list order
can change in the meantime: socket activity moves active sockets to the
end of the list. So the list may have been shuffled arbitrarily by the
time timer.unenroll() is called, and the second pass of the removal
logic then corrupts the linked list. The nature of the corruption
depends on the shuffling that occurred; it could be a loop, or it
could orphan part of the list while still looking valid, etc.
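To make the scenario above concrete, here is a minimal sketch of the sequence. It assumes a circular list with a sentinel head; the helper names (init, append, staleRemove, walk, demo) are made up for illustration and are not the actual 2.x timer code.

```javascript
// Hypothetical stand-in for the timer's idle list: a circular doubly
// linked list with a sentinel head. Illustrative only, not the 2.x source.
function init(list) {
  list._idleNext = list;
  list._idlePrev = list;
}

// Link item in just before the sentinel, i.e. at the end of the list.
function append(list, item) {
  item._idleNext = list;
  item._idlePrev = list._idlePrev;
  list._idlePrev._idleNext = item;
  list._idlePrev = item;
}

// Unlinks the neighbours but leaves the item's own pointers stale.
function staleRemove(item) {
  item._idleNext._idlePrev = item._idlePrev;
  item._idlePrev._idleNext = item._idleNext;
}

// Walk forward from the head, giving up after `limit` entries.
function walk(list, limit) {
  var out = [];
  for (var n = list._idleNext; n !== list && out.length < limit; n = n._idleNext) {
    out.push(n.name);
  }
  return out;
}

function demo(deferred) {
  var list = {}, a = { name: 'a' }, b = { name: 'b' }, c = { name: 'c' };
  init(list);
  append(list, a);
  append(list, b);
  append(list, c);
  staleRemove(b);      // timeout fires: first removal of b
  if (deferred) {
    staleRemove(a);    // activity on a moves it to the end of the list
    append(list, a);
  }
  staleRemove(b);      // destroy() -> unenroll(): second removal of b
  return walk(list, 6);
}

console.log(demo(false)); // [ 'a', 'c' ]  (double remove is a no-op)
console.log(demo(true));  // never gets back to the head; stops at the limit
```

In the deferred case the second removal rewires c and a to point at each other, so a forward walk loops between them and never returns to the head.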
The solution (I think) is to change the logic of remove() to include:
socket._idlePrev = socket._idleNext = socket; (or null if you prefer)
and to call remove from unenroll and other places where removal is the
intention, or to add such state reset logic to all places that are
inlining the remove() code.
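A rough sketch of what that could look like, using the self-pointing reset. Again the helpers (init, append, walk, demoFixed) are hypothetical, and the list is assumed to be circular with a sentinel head; this is not a patch against the real file.

```javascript
// Hypothetical circular doubly linked list with a sentinel head.
function init(list) {
  list._idleNext = list;
  list._idlePrev = list;
}

// Link item in just before the sentinel, i.e. at the end of the list.
function append(list, item) {
  item._idleNext = list;
  item._idlePrev = list._idlePrev;
  list._idlePrev._idleNext = item;
  list._idlePrev = item;
}

// remove() that also resets the item's own pointers. A repeat call then
// only rewrites the item itself and never touches the live list.
// Assumes the item has been linked (or self-linked) at least once.
function remove(item) {
  item._idleNext._idlePrev = item._idlePrev;
  item._idlePrev._idleNext = item._idleNext;
  item._idlePrev = item._idleNext = item;
}

// Walk forward from the head, giving up after `limit` entries.
function walk(list, limit) {
  var out = [];
  for (var n = list._idleNext; n !== list && out.length < limit; n = n._idleNext) {
    out.push(n.name);
  }
  return out;
}

function demoFixed() {
  var list = {}, a = { name: 'a' }, b = { name: 'b' }, c = { name: 'c' };
  init(list);
  append(list, a);
  append(list, b);
  append(list, c);
  remove(b);           // timeout fires: b removed and reset
  remove(a);           // activity on a moves it to the end of the list
  append(list, a);
  remove(b);           // deferred destroy() -> unenroll(): now harmless
  return walk(list, 6);
}

console.log(demoFixed()); // [ 'c', 'a' ]
```

If null is used instead of the self-pointer, remove() would also need to check for null before dereferencing next/prev.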
This is a pretty major issue. From reading the code I know it has been
addressed in 3.6, but I think it should also be fixed in the 2.x branch
in preparation for 2.7 etc.
The fix is fairly simple and does not require the full _linkedlist
module lift (tho that would be cool too).
Agree?
-w