I wrote a simple tcludp application, which listens for udp packets on
two udp ports.
All works fine.
But when I close one of the listening ports (because no more packets
are expected there), the other listening port stops receiving
packets as well.
I use ActiveTcl8.5.3.0.286404-win32-ix86-threaded.exe and
udp108-win32.zip with Windows XP.
I think this is a bug. I googled for it, but found no info.
Is this a (known) bug?
Vera
Try updating tcludp. I have version 1.0.9 here and don't see the bug.
-Alex
Hi again,
I looked for 1.0.9, but the latest version on SourceForge is 1.0.8.
Where can I get version 1.0.9 (Windows)?
--
Vera
On the ActiveState teapot:
$ teacup search udp
entity  name version                   platform
------- ---- ------------------------- ---------------------
package udp  0.0.0.2007.11.22.23.21.11 source
package udp  0.0.0.2008.07.05.23.11.56 source
package udp  0.0.0.2008.07.06.23.12.31 source
package udp  1.0.9                     aix-powerpc
package udp  1.0.9                     hpux-parisc
package udp  1.0.9                     linux-glibc2.2-ix86
package udp  1.0.9                     linux-glibc2.3-ix86
package udp  1.0.9                     linux-glibc2.3-x86_64
package udp  1.0.9                     macosx-universal
package udp  1.0.9                     solaris2.8-sparc
package udp  1.0.9                     solaris2.10-ix86
package udp  1.0.9                     win32-ix86
------- ---- ------------------------- ---------------------
12 entities found
-Alex
I tried tcludp 1.0.9 as suggested by Alexandre, but it has the bug, too.
(Thank you, Alexandre, for the hint on where to find tcludp 1.0.9.)
I attach my test scripts.
Here is the output of udpDoubleReceiver.tcl when I start two senders
with a time offset between them.
-----------------------------------------------------
package udp has version: 1.0.9
Listening on udp port: 1300
Listening on udp port: 1310
AWAIT_EXIT = 2
recv at 1300: 27 {to 127.0.0.1 1300: PACKET 1}
recv at 1300: 27 {to 127.0.0.1 1300: PACKET 2}
recv at 1300: 27 {to 127.0.0.1 1300: PACKET 3}
recv at 1300: 27 {to 127.0.0.1 1300: PACKET 4}
recv at 1310: 27 {to 127.0.0.1 1310: PACKET 1}
recv at 1300: 27 {to 127.0.0.1 1300: PACKET 5}
recv at 1310: 27 {to 127.0.0.1 1310: PACKET 2}
recv at 1300: 27 {to 127.0.0.1 1300: PACKET 6}
recv at 1310: 27 {to 127.0.0.1 1310: PACKET 3}
recv at 1300: 27 {to 127.0.0.1 1300: PACKET 7}
recv at 1310: 27 {to 127.0.0.1 1310: PACKET 4}
recv at 1300: 27 {to 127.0.0.1 1300: PACKET 8}
recv at 1310: 27 {to 127.0.0.1 1310: PACKET 5}
recv at 1300: 27 {to 127.0.0.1 1300: PACKET 9}
recv at 1310: 27 {to 127.0.0.1 1310: PACKET 6}
recv at 1300: 4 {exit}
udp socket at port 1300 closed
AWAIT_EXIT = 1
^C
------------------------------------------------
As you can see, after packet 6 reception also stopped on the second
port, 1310. Packets 7 to 9 and the "exit" packet are lost.
BTW, when I only disable receiving instead of closing (script
udpDoubleReceiver_disable.tcl), everything works fine.
------------------------------------------------
C:\user\vera\Projekte\test>tclsh85 udpDoubleReceiver_disable.tcl
package udp has version: 1.0.9
Listening on udp port: 1300
Listening on udp port: 1310
AWAIT_EXIT = 2
recv at 1310: 27 {to 127.0.0.1 1310: PACKET 1}
recv at 1310: 27 {to 127.0.0.1 1310: PACKET 2}
recv at 1310: 27 {to 127.0.0.1 1310: PACKET 3}
recv at 1310: 27 {to 127.0.0.1 1310: PACKET 4}
recv at 1310: 27 {to 127.0.0.1 1310: PACKET 5}
recv at 1300: 27 {to 127.0.0.1 1300: PACKET 1}
recv at 1310: 27 {to 127.0.0.1 1310: PACKET 6}
recv at 1300: 27 {to 127.0.0.1 1300: PACKET 2}
recv at 1310: 27 {to 127.0.0.1 1310: PACKET 7}
recv at 1300: 27 {to 127.0.0.1 1300: PACKET 3}
recv at 1310: 27 {to 127.0.0.1 1310: PACKET 8}
recv at 1300: 27 {to 127.0.0.1 1300: PACKET 4}
recv at 1310: 27 {to 127.0.0.1 1310: PACKET 9}
recv at 1300: 27 {to 127.0.0.1 1300: PACKET 5}
recv at 1310: 4 {exit}
AWAIT_EXIT = 1
recv at 1310: 0 {}
recv at 1300: 27 {to 127.0.0.1 1300: PACKET 6}
recv at 1300: 27 {to 127.0.0.1 1300: PACKET 7}
recv at 1300: 27 {to 127.0.0.1 1300: PACKET 8}
recv at 1300: 27 {to 127.0.0.1 1300: PACKET 9}
recv at 1300: 4 {exit}
AWAIT_EXIT = 0
udp socket at port 1300 closed
udp socket at port 1310 closed
------------------------------------------------
Any suggestions?
Did anybody compile tcludp with VisualStudio2008 Express?
--
Vera
try:
while true {
    vwait AWAIT_EXIT
    puts "AWAIT_EXIT = $AWAIT_EXIT"
}
# you will drop out of the loop when
# no more events are possible
udp_close ...
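A minimal sketch of the counting loop suggested above, in pure Tcl. Timers stand in for the two UDP listeners (an assumption made only so the snippet runs without the udp package), and `listener_done` is a hypothetical helper, not something from the attached scripts. Instead of `while true`, the loop ends once the counter reaches zero, so it does not depend on [vwait] failing when no event sources are left.

```tcl
# Pure-Tcl sketch: two timers play the role of the two UDP listeners.
set AWAIT_EXIT 2            ;# number of listeners still open

proc listener_done {name} {
    # Hypothetical helper: called when one listener has seen its "exit".
    global AWAIT_EXIT
    puts "$name closed"
    incr AWAIT_EXIT -1      ;# writing the variable wakes up [vwait AWAIT_EXIT]
}

after 50  {listener_done port1300}
after 100 {listener_done port1310}

# Re-enter the event loop until every listener has reported in.
while {$AWAIT_EXIT > 0} {
    vwait AWAIT_EXIT
    puts "AWAIT_EXIT = $AWAIT_EXIT"
}
```

In the real receiver the `incr AWAIT_EXIT -1` would sit in the fileevent handler, right after the socket is closed.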
OK. First, I cannot reproduce the bug here (XP SP2, ActiveTcl 8.5.3.0,
tcludp 1.0.9 from the teapot). Second, what happens is that your
fileevent handler blocks (since the "udp socket...closed" line doesn't
appear). This in turn explains why the event loop is no longer
responsive afterwards.
Now what is blocking the handler ? It may be [fconfigure] or [close].
Try sprinkling a few [puts] to investigate. But I'm still worried that
the bug is so elusive while I'm supposed to have the exact same
config ! The next step would be to compile Tcl with -g and attach to the
running process once it is locked...
-Alex
Hi Uwe,
your suggestion does not work either. Here is the output:
-------------------------------------------------------------------
C:\user\vera\Projekte\tcludp\test>tclsh85 udpDoubleReceiver.tcl
load debug version: 1
./udp1.0.9/udp1091g.dll
package udp has version: 1.0.9.1
Listening on udp port: 1300
Listening on udp port: 1310
AWAIT_EXIT = 2
recv at 1300: 27 {to 127.0.0.1 1300: PACKET 1}
recv at 1300: 27 {to 127.0.0.1 1300: PACKET 2}
recv at 1300: 4 {exit}
udp socket at port 1300 closed
AWAIT_EXIT = 1
^C
-------------------------------------------------------------------
Meanwhile I built a debug version with Visual Studio 2008.
Using makefile.vc was rather complicated, so I set up a native
VS2008 project and called the build 1.0.9.1.
So far I am still trying to understand what happens.
I hope to find the cause by debugging.
BTW, I have installed XP SP3. Does it affect sockets?
--
Vera
I have installed XP SP3. Does it affect sockets?
Meanwhile I built a debug version with Visual Studio 2008.
Using makefile.vc was rather complicated, so I set up a native
VS2008 project and called the build 1.0.9.1. But this did not affect the bug :-(.
So far I am still trying to understand what happens.
My guess, too, is that closing one port blocks, disables, or removes the
fileevent handler for the remaining port, for whatever reason.
I hope to find the cause by debugging.
BTW, the "udp socket...closed" line did appear the first time
close was called, followed by "AWAIT_EXIT = 1". So the Tcl event loop is
still working and is waiting in the second "vwait AWAIT_EXIT".
--
Vera
Oops you're right, I misread the logs.
Just to be sure the event loop is okay, can you add a periodic task:
proc periodic {} {
    puts PERIODIC
    after 1000 periodic
}
periodic
Also, do you have access to an XP SP2 ? Can you try there ?
While the whole stuff does look like a bug in TclUDP's event source,
it's hard to link it to a Windows service pack...
-Alex
Are you sure it's not blocking in the [read] in your fileevent handler?
You haven't configured either channel as non-blocking and you don't
check for eof before or after reading. Closing the client channel may
trigger a readable fileevent on the server (it does for TCP sockets, not
sure if/why it would do the same for UDP), so you should always be
prepared for eof there, otherwise your [read] will block indefinitely
(as the channel is in blocking mode).
My advice would be to [fconfigure $srv -blocking 0] in udp_listen and
then to change your udpEventHandler to have:
set pkt [read $sock]
...
if {[eof $sock] || $pkt eq "exit"} {
    udp_close $sock
    ...
}
Note that the [fileevent $sock readable] does nothing -- if you want to
remove the handler then you need [fileevent $sock readable {}], but this
is redundant as [close] will do that anyway.
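Put together, the advice above might look like the following sketch. It assumes the receiver follows the udp_listen/udpEventHandler pattern from the TclUDP documentation; the AWAIT_EXIT bookkeeping and the plain [close] in place of udp_close are assumptions, since the attached scripts are not shown in the thread.

```tcl
package require udp

set AWAIT_EXIT 0    ;# assumed counter of open listeners

proc udpEventHandler {sock} {
    global AWAIT_EXIT
    set pkt  [read $sock]
    set port [fconfigure $sock -myport]
    puts "recv at $port: [string length $pkt] {$pkt}"
    # Be prepared for eof as well as for the logical "exit" marker.
    if {[eof $sock] || $pkt eq "exit"} {
        close $sock             ;# [close] also removes the readable handler
        puts "udp socket at port $port closed"
        incr AWAIT_EXIT -1
    }
}

proc udp_listen {port} {
    global AWAIT_EXIT
    set srv [udp_open $port]
    # Non-blocking, so that [read] cannot stall the event loop.
    fconfigure $srv -blocking 0 -buffering none -translation binary
    fileevent $srv readable [list udpEventHandler $srv]
    puts "Listening on udp port: [fconfigure $srv -myport]"
    incr AWAIT_EXIT
    return $srv
}
```

A main script would then call `udp_listen 1300` and `udp_listen 1310` and run `vwait AWAIT_EXIT` in a loop until the counter is back at zero.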
-- Neil
There's no notion of EOF in UDP.
Moreover, Tcl-UDP is documented to produce short-reads even in
blocking mode (see example in the package's documentation).
So *if* it is blocking in [read] in the second handler, it is a bug.
-Alex
Meanwhile I am one step further.
After the first port is closed, the socket thread blocks in
WaitForSingleObject(waitSockRead, INFINITE);
because it has been notified of a pending (empty?) packet after "exit"
(port 1300 has statePtr->packetNum > 0, but the packet has not been read yet).
The worker thread waits for
statePtr->packetNum > 0
before it signals the event with
SetEvent(waitSockRead);
But after the close, port 1300 is removed from the list, so NO port
with "statePtr->packetNum > 0" is left; the worker thread therefore never
signals the event, and the socket thread waits forever.
Unfortunately, only the socket thread sets "statePtr->packetNum".
So far, so complicated.
Any idea where the pending packet after "exit" can come from?
Or how to resolve the dilemma?
Tomorrow I will try XP SP2 with little hope.
--
Vera
It looks like you're much more acquainted with the code in TclUDP than
I am :-}
Two possible next steps:
(1) you help me catch up by setting the proper compile flags so that
UDPTRACE() logs something
(2) you go to the extension's maintainer with what looks like a clean
bug report
> Tomorrow I will try XP SP2 with little hope.
Little hope ? No ! Bringing the bug in sync between the two of us can
only help ;-)
-Alex
Update : now I do reproduce the behavior :-)
The reason I overlooked it first, is that the sender does take an
active part in exhibiting the bug...
(So far I had tried only your receiver, doing the send by hand).
As it turns out, the sequence
puts -nonewline $s exit;flush $s;close $s
when executed at full speed (ie *not* in three separate interactive
lines), produces the bug.
This is interesting. The dependency on speed of sequence indicates
some kind of race condition.
Now what exactly goes over the wire in that case ? Unfortunately
Windows is unable to snoop on its loopback interface, so Wireshark is
no help here. Can you check with two separate PCs, over a true network
interface ?
If I were to guess, I'd say something like a zero-length UDP packet...
but who knows ?
-Alex
Update 2: I think I have a better characterization of the behavior
now.
It is indeed linked with zero-sized UDP packets.
I have found a deterministic way of producing them (as witnessed by
Wireshark when sending over the LAN):
% set s [udp_open]
% fconfigure $s -remote {somemachine 3000} -buffering none
% puts -nonewline $s titi
% puts -nonewline $s titi
% puts -nonewline $s titi
% close $s
This sends 3 normal UDP packets + 1 zero-sized.
Notice that all this is at interactive speed (no race condition), and
that three 'titi' are needed (two are not enough). Now combined with
your receiver, sending an "exit" packet then a zero-sized one, this
locks the receiving code, somewhere inside the intricate tcludp
machinery.
Several observations:
(1) The speed of the sending sequence has no effect (contrary to my
first impression)
(2) The zero-sized packet wreaks havoc in tcludp only if it comes on
a closed socket (your case)
(3) Tcludp is not supposed to send zero-sized UDP packets
(4) However it should handle them gracefully, because some other
source can send them.
So I think we have at least two bugs, one on the write side (producing
zero-sized packets) and one on the read side (locking an internal
thread when these beasts come at the wrong time).
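To look at the write-side half on a single machine without Wireshark, a small loopback script along these lines could log the length of every datagram that arrives. This is only a sketch: port 3000 is reused from the example above, `on_datagram` is a hypothetical name, and the 500 ms wait is an arbitrary choice.

```tcl
package require udp

set lengths {}
proc on_datagram {sock} {
    # Hypothetical handler: record the size of each read, including 0.
    global lengths
    lappend lengths [string length [read $sock]]
}

# Receiver on the loopback interface.
set rx [udp_open 3000]
fconfigure $rx -blocking 0 -buffering none -translation binary
fileevent $rx readable [list on_datagram $rx]

# Sender: the same three writes and close as in the interactive session.
set tx [udp_open]
fconfigure $tx -remote {127.0.0.1 3000} -buffering none -translation binary
puts -nonewline $tx titi
puts -nonewline $tx titi
puts -nonewline $tx titi
close $tx

# Give the event loop some time to deliver whatever arrived.
after 500 {set done 1}
vwait done
puts "datagram lengths seen: $lengths"
close $rx
```

If the zero-sized packet is really sent on [close], one would expect a trailing 0 in the list on an affected build; a build without the write-side bug should report only the three data packets.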
I think you should forward this to the maintainer.
-Alex
Neil,
I tried [fconfigure $srv -blocking 0], but it does not make any difference.
And thank you for the correction, I accidentally deleted the {} in
fileevent $sock readable {}. But no difference either.
The "exit" packet is meant to simulate an EOF at the logical layer, because IMHO
UDP (unlike TCP?) has no EOF or other notification that the sender
has closed the sending port.
You did observe an empty packet on tcp, too? Interesting, such an empty
packet seems to cause the trouble. See Alexandre's postings from yesterday.
Is there an unknown-feature/bug on tcl's close?
--
Vera
Congratulations Alex, great work.
Can you please send me your method for NOT triggering the bug?
The handcrafted sender you used at first?
Since you can reproduce the bug, I no longer need to search for a PC not
updated to XP SP3.
--
Vera
Alex,
I can reproduce the behavior, 2 packets will work, 3 not.
Here is the working output:
----------------------------------------------------------------
C:\user\vera\Projekte\tcludp\test>tclsh85 udpDoubleReceiver.tcl
package udp has version: 1.0.9
Listening on udp port: 1300
Listening on udp port: 1310
AWAIT_EXIT = 2
recv at 1300: 4 {titi}
recv at 1300: 4 {exit}
udp socket at port 1300 closed
AWAIT_EXIT = 1
recv at 1310: 3 {bla}
recv at 1310: 4 {exit}
udp socket at port 1310 closed
AWAIT_EXIT = 0
C:\user\vera\Projekte\tcludp\test>
----------------------------------------------------------------
and here is the failing one:
----------------------------------------------------------------
C:\user\vera\Projekte\tcludp\test>tclsh85 udpDoubleReceiver.tcl
package udp has version: 1.0.9
Listening on udp port: 1300
Listening on udp port: 1310
AWAIT_EXIT = 2
recv at 1300: 4 {titi}
recv at 1300: 4 {tata}
recv at 1300: 4 {exit}
udp socket at port 1300 closed
AWAIT_EXIT = 1
^CTerminate batch job (Y/N)? y
----------------------------------------------------------------
Maybe this helps me find the bug.
--
Vera
Oh, you mean a workaround, assuming the maintainers don't fix it
quickly ?
Well, avoiding [close] is certainly a method, but its applicability
depends on external factors like the use of dynamic vs. fixed ports,
the app's lifecycle, etc.
> Since you can reproduce the bug, I no longer need to search for a PC not
> updated to XP SP3.
Indeed :-)
Instead you should chase tcludp's author or maintainer community !
-Alex
Maybe. But if I were in your seat I would just forward the above
analysis (zero-length UDP) to the author. Fixing two bugs in an open-
source package is great fun, but the amount of reverse-engineering
required upfront makes me shy. Especially if the author is still
around. Is he ?
-Alex
Correct. EOF is a stream-oriented concept (TCP, pipes, files,
devices). UDP packets are just independent datagrams.
> You did observe an empty packet on tcp, too?
No. "Empty" packets do occur in TCP, but they are normal, since they
carry acknowledge and flow control information, and they certainly
don't bubble up to the client program.
Moreover, TCP sockets are built into Tcl, while UDP is in an
extension. Different numbers of users, and of maintainers. That
explains why one is more polished than the other ;-)
> Is there an unknown-feature/bug on tcl's close?
The problem is in TclUDP's channeltype-dependent closeProc and
fileevent handling code. Not in the generic channel layer's [close].
That's why I keep suggesting that you turn to the author ;-)
-Alex
Hi Alex,
I don't need a workaround any more, since I have already successfully
finished the project in which the bug originally showed up :-). My
workaround - leave the ports open and disable reading - was sufficient.
But hey, hunting for the bug is too much fun. I have some time these
days, and it is better than housekeeping.
(Better to find bugs in software than in bed!)
I have already found one bug in the source, the oldest C bug ever (the
buffer does not always have room for the terminating null byte).
And I found one race condition:
With "after 10", i.e. 10 ms between the "exit" packet and the close on the
SENDER, it works, even for more than 2 packets. Even 1 ms seems to be
enough. Strange.
If I needed a workaround, I would prefer that one ;-).
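For reference, the two workarounds mentioned so far could be sketched roughly as follows. Both helper names (`udp_disable`, `send_and_close`) are hypothetical, and the 10 ms default simply mirrors the "after 10" above; this is not taken from the attached scripts.

```tcl
package require udp

# Receiver side: stop listening on a port without closing the channel.
proc udp_disable {sock} {
    # Hypothetical helper: an empty script unregisters the readable handler.
    fileevent $sock readable {}
}

# Sender side: leave a short gap between the last datagram and [close].
proc send_and_close {host port msg {gap_ms 10}} {
    # Hypothetical helper wrapping the open/send/close sequence.
    set s [udp_open]
    fconfigure $s -remote [list $host $port] -buffering none -translation binary
    puts -nonewline $s $msg
    after $gap_ms               ;# synchronous sleep before closing
    close $s
}
```

A sender would call, for example, `send_and_close 127.0.0.1 1300 exit`. Both helpers only avoid the trigger; they leave the underlying behaviour in tcludp untouched.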
BTW, NOT closing the port on the SENDER also works - second workaround!
Can you confirm this observation?
This is UDP, so how can a closed sender port affect the receiver?
I thought that was TCP stuff.
Now I have also tried Neil's suggestion, and with the eof check, it works!
I can't believe it. I need no "exit" packet, since closing the sending port
triggers eof. Try it out!
----------------------------------------------------------------------
C:\user\vera\Projekte\tcludp\test>tclsh85 udpDoubleReceiver.tcl
package udp has version: 1.0.9
fconfigure sock1884: -blocking 0 -buffering none -buffersize 4096
-encoding binary -eofchar {{} {}} -translation {lf lf} -myport 1300
-remote {{} 0} -peer {{} 0} -mcastgroups {} -mcastloop 1 -broadcast 0
-ttl 32
Listening on udp port: 1300
fconfigure sock1836: -blocking 0 -buffering none -buffersize 4096
-encoding binary -eofchar {{} {}} -translation {lf lf} -myport 1310
-remote {{} 0} -peer {{} 0} -mcastgroups {} -mcastloop 1 -broadcast 0
-ttl 32
Listening on udp port: 1310
AWAIT_EXIT = 2
recv at 1300: 27 {to 127.0.0.1 1300: PACKET 1}
recv at 1300: 27 {to 127.0.0.1 1300: PACKET 2}
recv at 1300: 27 {to 127.0.0.1 1300: PACKET 3}
recv at 1300: 27 {to 127.0.0.1 1300: PACKET 4}
recv at 1300: 0 {}
udp socket at port 1300 closed
AWAIT_EXIT = 1
recv at 1310: 27 {to 127.0.0.1 1310: PACKET 1}
recv at 1310: 27 {to 127.0.0.1 1310: PACKET 2}
recv at 1310: 27 {to 127.0.0.1 1310: PACKET 3}
recv at 1310: 27 {to 127.0.0.1 1310: PACKET 4}
recv at 1310: 0 {}
udp socket at port 1310 closed
AWAIT_EXIT = 0
C:\user\vera\Projekte\tcludp\test>
----------------------------------------------------------------------
Here you can see the empty packets in all their beauty.
Configured as non-blocking, no "exit" packet, just closing the socket.
Now we have at least 4 ways to work around the problem.
But the last three leave me perplexed.
--
Vera
Oh, so should I assume you won't forward the bugreport ?..
> And I found one race condition:
> With "after 10", i.e. 10 ms between the "exit" packet and the close on the
> SENDER, it works, even for more than 2 packets. Even 1 ms seems to be
> enough. Strange.
> If I needed a workaround, I would prefer that one ;-).
Clearly all this is a race condition, a lack of proper sync between
the various threads. Otherwise a zero-UDP coming after the close
wouldn't do any harm !
> BTW, NOT closing the port on the SENDER also works - second workaround!
> Can you confirm this observation?
> This is UDP, so how can a closed sender port affect the receiver?
Because, as already explained, the origin of all this is the sender
sending zero-sized UDPs.
That's how causality jumps over the canyon.
And again we have two bugs: (1) sending them is a bug and (2) choking
on receiving them is also a bug.
> Now I have also tried Neil's suggestion, and with the eof check, it works!
> I can't believe it. I need no "exit" packet, since closing the sending port
> triggers eof. Try it out!
Yes, and it's an accidental consequence of the bugs above. A kind of
"invented UDP EOF" based on zero-sized packets. Except it is nowhere
specified in the standard ! (and not robust at that, since the zero-
sized packets are not reliably sent).
> Now we have at least 4 ways to work around the problem.
> But the last three leave me perplexed.
Don't think about the millions of workarounds, think about the bugs
and how to fix them for good !
-Alex
Update.
Meanwhile I tried the whole suggestion, and it works, even without an
"exit" packet. Strange.
I could not believe that closing a port on the SENDER affects the RECEIVER's
behavior with UDP.
I thought that was TCP.
Thank you, Neil.
--
Vera
Ok, I did so. It is
https://sourceforge.net/tracker2/?func=detail&aid=2221831&group_id=75201&atid=543222
It is my first bug report on SourceForge. I hope I did it right.
--
Vera
Right -- but this is UDP being shoe-horned into Tcl's stream-oriented
channel abstraction. I wouldn't assume that EOF isn't generated
somewhere -- e.g. testing here (Mac OS X 10.4.11 Intel, Tcl 8.6a4,
TclUDP 1.0.9) shows that [eof $s] on server does indeed return true
after closing the remote socket. I can reliably reproduce this here so
long as at least 1 byte was sent over the channel to the server. Doesn't
seem to be any timing dependency that I can see. Indeed the behaviour of
(Tcl) UDP sockets seems to be identical to TCP sockets in this respect:
the only difference being that TCP sockets also generate EOF when no
data has been transmitted.
>> You did observe an empty packet on tcp, too?
Not an empty packet. Tcl's channel abstraction will generate a
[fileevent readable] event when it detects EOF on a channel, so that you
can cleanup. In this case, there is nothing to read on the channel
(typically). This is why you'd typically structure a read-event callback as:
set msg [read $sock]
if {[eof $sock]} { ... cleanup ... }
Tcl will keep generating readable events on the socket until you close it.
>
> No. "Empty" packets do occur in TCP, but they are normal, since they
> carry acknowledge and flow control information, and they certainly
> don't bubble up to the client program.
Are you sure any actual packet is being transmitted here?
>
> Moreover, TCP sockets are built into Tcl, while UDP is in an
> extension. Different numbers of users, and of maintainers. That
> explains why one is more polished than the other ;-)
>
>> Is there an unknown-feature/bug on tcl's close?
>
> The problem is in TclUDP's channeltype-dependent closeProc and
> fileevent handling code. Not in the generic channel layer's [close].
> That's why I keep suggesting that you turn to the author ;-)
The maintainer is Pat Thoyts I believe.
-- Neil
Yes, it is a side-effect of the zero-sized packet. All very well, but
not part of RFC 768. So it is a pathological EOF.
> Indeed the behaviour of
> (Tcl) UDP sockets seems to be identical to TCP sockets in this respect:
> the only difference being that TCP sockets also generate EOF when no
> data has been transmitted.
Again, that's just a property of one buggy implementation, not one of
the generic UDP concept. For example, at C level if you just do
send...send..close(), you will never receive any EOF nor even zero-
sized packet on the other side.
> Are you sure any actual packet is being transmitted here?
Wireshark is not cheating. The zero-sized packet is directly
correlated with that mutant EOF.
> The maintainer is Pat Thoyts I believe.
Thanks for the info Neil. Will try to contact him.
-Alex