event loop code sometimes stops execution in TclOO method (concurrent variable access?)

EL

Jul 19, 2021, 3:22:05 AM

Hello,

I am seeing somewhat strange behavior in my file event code. Sometimes, but
not always, the event code just *stops* executing at a spurious place,
without raising an error. There is also no bgerror (which would be written
to stderr, since there is no special handling). I suspect it has to do with
concurrent access to a variable in the object, but I am not sure.

A rather simplified code example is below. For this code, a server runs
on localhost:1234 and sends messages at random time intervals to its
connected clients. Tcl version is 8.6.11 and the platform is Linux.

The issue is hard to reproduce, but it happens.
Has anyone else run into this? About the possibility of concurrent
access problems to variables: can concurrent variable access happen
at all in event loop code? If so, how is it guarded in the Tcl C code,
via some kind of mutex?

Any ideas welcome...

$ rlwrap tclsh8.6
% oo::class create test {
    variable Sock
    variable X

    constructor {} {
        my variable Sock
        my variable X
    }

    method connect {} {
        set Sock [socket -async localhost 1234]
        fconfigure $Sock -translation auto -buffering line
        fileevent $Sock readable [list [self] handleSocket $Sock]
    }

    method handleSocket {sock} {
        set message ""
        try {
            gets $sock message
            if {[eof $sock]} {
                close $sock
            }
            # some processing that involves setting/getting X
            my ProcessMsg $message
        } trap {} {err status} {
            puts "handleSocket raised Error: $err $status $message"
        }
    }

    method ProcessMsg {msg} {
        # some processing that involves setting/getting the value of
        # variable X
    }
}

% test create t
% t connect
% vwait forever

--
EL

Gerald Lester

Jul 19, 2021, 12:31:18 PM

You can not count on a socket always having a full line. You have the
socket as blocking. Thus, if the line is split across blocks, you will
block (i.e. stop working) at the gets.
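Gerald's point can be seen with a small experiment. This is a sketch that assumes Tcl 8.6 (for [chan pipe], which stands in for the socket here); the variable names are made up for the demo. With the read end in non-blocking mode, a partial line makes [gets] return -1 and leaves the bytes buffered. A blocking channel would instead sit inside [gets] until the newline arrived, and the event loop would be stuck with it.

```tcl
# Sketch (Tcl 8.6+): a pipe stands in for the socket.
lassign [chan pipe] rd wr
chan configure $rd -blocking 0 -translation lf
chan configure $wr -buffering none -translation lf

# Send only part of a line: no newline yet.
puts -nonewline $wr "hello "
set n1 [gets $rd line1]      ;# -1: no complete line available
set blocked [fblocked $rd]   ;# 1: the -1 came from missing data, not EOF

# Send the rest of the line, including the newline.
puts $wr "world"
set n2 [gets $rd line2]      ;# the buffered "hello " is still there

puts "first: $n1 (fblocked=$blocked), second: $n2 {$line2}"
close $wr
close $rd
```

This should print `first: -1 (fblocked=1), second: 11 {hello world}`: nothing was lost by the first, unsuccessful [gets].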

--
+----------------------------------------------------------------------+
| Gerald W. Lester, President, KNG Consulting LLC |
| Email: Gerald...@kng-consulting.net |
+----------------------------------------------------------------------+

EL

Jul 19, 2021, 2:02:04 PM

On 19.07.2021 18:31, Gerald Lester wrote:

> You can not count on a socket always having a full line. You have the
> socket as blocking. Thus, if the line is split across blocks, you will
> block (i.e. stop working) at the gets.

Thanks for pointing that out. Hm, I am not sure whether I understand the
man pages correctly, so let me try to explain:

[gets] will always try to get a line from the socket, with the eol
character determined by the -translation flag. But if for some reason
there is no eol character in the socket input, e.g. because the line is
longer than the socket buffer, [gets] will block and wait for more
input. If no input arrives, is the whole program blocked? Even the
event loop code that is executing at that moment?

I will try with [fconfigure $sock -blocking 0] ... it apparently does
no harm in my code anyway.

Thanks again,

--
EL

Rich

Jul 19, 2021, 3:24:52 PM

EL <e...@noreply.spam> wrote:
> On 19.07.2021 18:31, Gerald Lester wrote:
>
>> You can not count on a socket always having a full line. You have the
>> socket as blocking. Thus, if the line is split across blocks, you will
>> block (i.e. stop working) at the gets.
>
> Thanks for pointing that out. Hm, not sure whether I understand it
> correctly from the man pages, so I try to explain:
>
> [gets] will always try to get a line from the socket, with the eol
> character determined by the -translation flag. But if for some reason
> there is no eol character in the socket input, i.e. because the line is
> longer than the socket buffer, [gets] will block and wait for more
> input. If no input arrives, the whole program is blocked? Even the
> current stack of the event loop code, that is executed in that moment?

An individual Tcl interpreter is single threaded. When the interpreter
is running your Tcl code, it is not running the event loop.
Conversely, the event loop only runs when your Tcl code is not running.
So when you block in a [gets] call, you also block the event loop from
being executed as well.
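Rich's description can be checked directly. In this sketch (variable names are hypothetical), a timer that is already due cannot fire while a Tcl loop is busy; it only runs once the event loop gets control, here via [update]. The same holds for fileevent callbacks: within one interpreter, a handler can only be interleaved with another one if it enters the event loop itself (for example with [update] or [vwait]), which also bears on the original question about concurrent variable access.

```tcl
set fired 0
after 0 {set ::fired 1}

# Busy-wait about 50 ms without ever entering the event loop.
set start [clock milliseconds]
while {[clock milliseconds] - $start < 50} {}
set beforeUpdate $fired   ;# still 0: the due timer could not run

update                    ;# give the event loop a chance
set afterUpdate $fired    ;# now 1

puts "before update: $beforeUpdate, after update: $afterUpdate"
```

Expected output: `before update: 0, after update: 1`.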

> I will try with [fconfigure $sock -blocking 0] ... it does apparently do
> no harm in my code anyway.

You may have to handle gets returning to your code by signaling that
only a "partial line" was available -- so you will likely need to make
some code changes. Read the [gets] man page portions that address
non-blocking mode.

Robert Heller

Jul 19, 2021, 4:11:24 PM

At Mon, 19 Jul 2021 19:24:48 -0000 (UTC) Rich <ri...@example.invalid> wrote:

>
> EL <e...@noreply.spam> wrote:
> > On 19.07.2021 18:31, Gerald Lester wrote:
> >
> >> You can not count on a socket always having a full line. You have the
> >> socket as blocking. Thus, if the line is split across blocks, you will
> >> block (i.e. stop working) at the gets.
> >
> > Thanks for pointing that out. Hm, not sure whether I understand it
> > correctly from the man pages, so I try to explain:
> >
> > [gets] will always try to get a line from the socket, with the eol
> > character determined by the -translation flag. But if for some reason
> > there is no eol character in the socket input, i.e. because the line is
> > longer than the socket buffer, [gets] will block and wait for more
> > input. If no input arrives, the whole program is blocked? Even the
> > current stack of the event loop code, that is executed in that moment?
>
> An individual Tcl interpreter is single threaded. When the interpreter
> is running your Tcl code, it is not running the event loop.
> Conversely, the event loop only runs when your Tcl code is not running.
> So when you block in a [gets] call, you also block the event loop from
> being executed as well.
>
> > I will try with [fconfigure $sock -blocking 0] ... it does apparently do
> > no harm in my code anyway.
>
> You may have to handle gets returning to your code by signaling that
> only a "partial line" was available -- so you will likely need to make
> some code changes. Read the [gets] man page portions that address
> non-blocking mode.

With sockets and line oriented I/O, you need several things:

If the sending side is also Tcl, you may need to add a flush call.

You need ALL of the proper fconfigure settings. This is what works reliably
for one of my applications:

fconfigure $socket_ -blocking 0 -buffering line -translation lf

I don't know what the default fconfigure settings are for sockets. It might NOT
be -buffering line -- I always set the buffering mode, just to be sure.
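Rather than guessing, the defaults can be queried with [chan configure] (the same as [fconfigure]). A minimal sketch, assuming a throwaway server on the loopback interface with an OS-chosen port (port 0); its callback only closes the connection and never actually runs here, because the script does not enter the event loop. Per the fconfigure man page, -buffering defaults to full except for terminal-like devices and the standard channels, so a fresh socket should report -blocking 1 and -buffering full.

```tcl
# Throwaway loopback server on an OS-chosen port (port 0).
set srv [socket -server {apply {{chan addr port} {close $chan}}} -myaddr 127.0.0.1 0]
set port [lindex [chan configure $srv -sockname] 2]

# A fresh client socket with nothing configured yet.
set sock [socket 127.0.0.1 $port]
set settings [dict create]
foreach opt {-blocking -buffering -translation} {
    dict set settings $opt [chan configure $sock $opt]
    puts "$opt = [dict get $settings $opt]"
}

close $sock
close $srv
```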

This is the "client" side, where the "server" is a C++ program that implements
a simple CLI over a TCP/IP socket. My Tcl program (the client) uses
"fileevent $socket_ readable ..." on the socket and implements a write queue
on the write side -- in my case, every write by the client is answered by
the server, e.g. every write is one line and results in a one-line response,
either an answer or an ack message. The fileevent proc does exactly one gets.
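Robert's client design (one request line, one reply line, a write queue, and exactly one [gets] per readable event) could look roughly like this. It is only a sketch: the C++ server is replaced by a hypothetical loopback server that answers each line with "ack: <line>", and all proc names (acceptCmd, serverRead, enqueue, sendNext, clientRead) are made up for the illustration.

```tcl
# Hypothetical stand-in for the C++ server: answers each line with "ack: <line>".
proc acceptCmd {chan addr port} {
    chan configure $chan -blocking 0 -buffering line -translation lf
    chan event $chan readable [list serverRead $chan]
}
proc serverRead {chan} {
    if {[gets $chan line] >= 0} {
        puts $chan "ack: $line"
    } elseif {[eof $chan]} {
        close $chan
    }
}

# Client: a write queue; the next line is sent only after the reply to the
# previous one has been read. The readable handler does exactly one gets.
set queue {}
set waiting 0
set replies {}

proc enqueue {sock line} {
    lappend ::queue $line
    sendNext $sock
}
proc sendNext {sock} {
    if {$::waiting || [llength $::queue] == 0} {
        return
    }
    set ::queue [lassign $::queue line]
    puts $sock $line
    set ::waiting 1
}
proc clientRead {sock} {
    if {[gets $sock reply] < 0} {
        if {[eof $sock]} {
            close $sock
            set ::done eof
        }
        return
    }
    lappend ::replies $reply
    set ::waiting 0
    if {[llength $::queue] == 0} {
        set ::done 1
    } else {
        sendNext $sock
    }
}

set srv [socket -server acceptCmd -myaddr 127.0.0.1 0]
set port [lindex [chan configure $srv -sockname] 2]
set sock [socket 127.0.0.1 $port]
chan configure $sock -blocking 0 -buffering line -translation lf
chan event $sock readable [list clientRead $sock]

foreach cmd {status version help} {
    enqueue $sock $cmd
}
set timeout [after 5000 {set ::done timeout}]
vwait ::done
after cancel $timeout
catch {close $sock}   ;# may already be closed if the server hung up
close $srv
puts [join $replies \n]
```

The replies should come back in order: `ack: status`, `ack: version`, `ack: help`.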

--
Robert Heller -- Cell: 413-658-7953 GV: 978-633-5364
Deepwoods Software -- Custom Software Services
http://www.deepsoft.com/ -- Linux Administration Services
hel...@deepsoft.com -- Webhosting Services

EL

Jul 19, 2021, 4:44:05 PM

On 19.07.2021 21:24, Rich wrote:

> An individual Tcl interpreter is single threaded. When the interpreter
> is running your Tcl code, it is not running the event loop.
> Conversely, the event loop only runs when your Tcl code is not running.
> So when you block in a [gets] call, you also block the event loop from
> being executed as well.

Okay.. now that part makes sense to me.

> You may have to handle gets returning to your code by signaling that
> only a "partial line" was available -- so you will likely need to make
> some code changes. Read the [gets] man page portions that address
> non-blocking mode.

I suspected it, but I don't fully get it, I am afraid. From the man page
(http://tcl.tk/man/tcl8.6/TclCmd/gets.htm) I read:

"If channelId is in non-blocking mode and there is not a full line of
input available, the command returns an empty string and does not
consume any input. If varName is specified and an empty string is
returned in varName because of end-of-file or because of insufficient
data in non-blocking mode, then the return count is -1"

With non-blocking mode this means that [gets] returns -1 (with a varName
specified). But it returns -1 as well on EOF. To distinguish the cases,
I can use [eof] and [fblocked]. So far, so good.
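The two -1 cases can be told apart exactly as described. A sketch, again assuming Tcl 8.6's [chan pipe] in place of a socket; `classify` is a hypothetical helper. Closing the write end produces end-of-file on the read end.

```tcl
lassign [chan pipe] rd wr
chan configure $rd -blocking 0 -translation lf
chan configure $wr -buffering none -translation lf

# Hypothetical helper: report what a single [gets] call meant.
proc classify {chan} {
    if {[gets $chan line] >= 0} {
        return "line: $line"
    }
    if {[eof $chan]} {
        return "eof"
    }
    if {[fblocked $chan]} {
        return "blocked"
    }
    return "unexpected"
}

puts -nonewline $wr "abc"
set r1 [classify $rd]   ;# partial line, nothing consumed
puts $wr "def"
set r2 [classify $rd]   ;# the full line "abcdef"
close $wr               ;# writer gone: end of file for the reader
set r3 [classify $rd]
close $rd
puts "$r1 / $r2 / $r3"
```

Expected output: `blocked / line: abcdef / eof`.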

But what can I do when [gets] doesn't get a full line of input and the
socket is consequently [fblocked]? Wait for more data to come? How can
this be achieved without blocking the event loop?
Actually the server is not under my control and I have no
influence on what it sends or when... I can only react. I could mock up
a server and test this case. But how can I provoke it?

Actually the only way I can think of is something like this:

method connect {} {
    set Sock [socket -async localhost 1234]

    fconfigure $Sock -translation auto -buffering line -blocking 0

    fileevent $Sock readable [list [self] handleSocket $Sock]
}

method handleSocket {sock} {
    set message ""
    try {
        gets $sock message

        if {[eof $sock]} {
            # check EOF first: at EOF [fblocked] returns 0, so testing it
            # first would process an empty message and never close
            close $sock
        } elseif {![fblocked $sock]} {
            # process message only when the sock is not blocked
            my ProcessMsg $message
        }
    } trap {} {err status} {
        puts "handleSocket raised Error: $err $status $message"
    }
}

This way the message is processed only if the socket is not blocked (in which
case it should be a complete line), and I hope that it becomes unblocked at some point.
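One way to provoke the split-line case is a mock server that writes half a line, waits, and then writes the rest. The sketch below assumes a loopback server on an OS-chosen port, connects synchronously instead of with -async for brevity, and uses hypothetical names (serve, finish, client, ::received, ::partials, ::done). The handler tests [eof] before [fblocked], because after [gets] hits end-of-file [fblocked] is 0.

```tcl
# Mock server: writes one line in two pieces, 200 ms apart, then closes.
proc serve {chan addr port} {
    chan configure $chan -translation lf -buffering none
    puts -nonewline $chan "first half, "
    after 200 [list finish $chan]
}
proc finish {chan} {
    puts $chan "second half"
    close $chan
}
set srv [socket -server serve -myaddr 127.0.0.1 0]
set port [lindex [chan configure $srv -sockname] 2]

set received {}
set partials 0

oo::class create client {
    variable Sock
    constructor {port} {
        set Sock [socket 127.0.0.1 $port]
        chan configure $Sock -blocking 0 -translation lf
        chan event $Sock readable [list [self] handleSocket]
    }
    method handleSocket {} {
        gets $Sock message
        if {[eof $Sock]} {
            close $Sock
            set ::done 1
        } elseif {[fblocked $Sock]} {
            # Only part of the line is here; Tcl keeps it buffered and
            # the readable event fires again when more data arrives.
            incr ::partials
        } else {
            lappend ::received $message
        }
    }
}

client create c $port
set timeout [after 5000 {set ::done timeout}]
vwait ::done
after cancel $timeout
close $srv
puts "received: $received (partial reads: $partials)"
```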

--
EL

heinrichmartin

Jul 20, 2021, 3:59:08 AM

On Monday, July 19, 2021 at 10:44:05 PM UTC+2, EL wrote:
> On 19.07.2021 21:24, Rich wrote:
> > An individual Tcl interpreter is single threaded. When the interpreter
> > is running your Tcl code, it is not running the event loop.
> > Conversely, the event loop only runs when your Tcl code is not running.
> > So when you block in a [gets] call, you also block the event loop from
> > being executed as well.
> Okay.. now that part makes sense to me.

Just to save you from a pitfall that I learned about the hard way:

On Tuesday, February 17, 2015 at 5:11:14 PM UTC+1, Rich wrote:
> heinrichmartin wrote:
>
> > So Tcl does not have "the event loop", but stacked event loops, and
> > update does not enter "the event loop", but "the current event loop",
> > which was new to me ...
>
> > I can see two improvements to vwait's doc now:
>
> > - This command enters the Tcl event loop to process events, ...
> > + This command enters a newly created Tcl event loop to process events, ...
>
> > - In some cases the vwait command may not return immediately after
> > - varName is set. This happens if the event handler that sets varName
> > - does not complete immediately.
> > + vwait does not return immediately after varName is set, but only
> > + after all active event handlers have completed.
>
> Both are already documented, just not adjacent to those two sections,
> but a bit further within the man page:
>
> man vwait:
>
> ...
> To be clear, multiple vwait calls will nest and will not happen in
> parallel. The outermost call to vwait will not return until all
> the inner ones do.
> ...

Not even [update] in a callback can unblock the [vwait] that allowed that callback to run.
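The nesting described in the quoted man page can be seen in a few lines. A sketch with made-up variable names: the timer callback sets ::a, which is what the outer [vwait] waits for, but then enters a nested [vwait] of its own; the outer one only returns after the callback has finished.

```tcl
set log {}
after 10 {
    set ::a 1                   ;# what the outer [vwait ::a] is waiting for
    lappend ::log "a set"
    after 50 {set ::b 1}
    vwait ::b                   ;# nested event loop inside the callback
    lappend ::log "inner vwait returned"
}
vwait ::a
lappend log "outer vwait returned"
puts [join $log \n]
```

Expected output, in this order: `a set`, `inner vwait returned`, `outer vwait returned`.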

luocl

Jul 20, 2021, 4:26:11 AM

On 7/20/21 3:24 AM, Rich wrote:
>
> An individual Tcl interpreter is single threaded. When the interpreter
> is running your Tcl code, it is not running the event loop.
> Conversely, the event loop only runs when your Tcl code is not running.
> So when you block in a [gets] call, you also block the event loop from
> being executed as well.
>
>> I will try with [fconfigure $sock -blocking 0] ... it does apparently do
>> no harm in my code anyway.
>
> You may have to handle gets returning to your code by signaling that
> only a "partial line" was available -- so you will likely need to make
> some code changes. Read the [gets] man page portions that address
> non-blocking mode.
>
>

I have encountered this problem before. I think the coroutine package of
Tcllib (coroutine::util) was made for this: a [coroutine::util gets] call
can wait for a full line while the event loop keeps going.
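For comparison, the same idea can be sketched with core Tcl 8.6 coroutines alone, without Tcllib. This is not Tcllib's implementation; `coGets` and `reader` are hypothetical names, and a [chan pipe] fed from timers stands in for the server. While no complete line is available, `coGets` registers the coroutine as the readable handler and yields, so the event loop keeps running.

```tcl
# Hypothetical helper: a "blocking-looking" gets for use inside a coroutine.
proc coGets {chan varName} {
    upvar 1 $varName line
    while {[gets $chan line] < 0} {
        if {[eof $chan]} {
            return -1
        }
        # Partial line or no data yet: resume this coroutine on the next
        # readable event and hand control back to the event loop.
        chan event $chan readable [list [info coroutine]]
        yield
        chan event $chan readable {}
    }
    return [string length $line]
}

proc reader {chan} {
    chan configure $chan -blocking 0 -translation lf
    while {[coGets $chan line] >= 0} {
        lappend ::lines $line
    }
    close $chan
    set ::finished 1
}

set lines {}
lassign [chan pipe] rd wr
chan configure $wr -buffering none -translation lf
coroutine readerCo reader $rd

# Feed the pipe in fragments from timers; the event loop keeps running.
puts -nonewline $wr "alpha "
after 50  [list puts $wr "beta"]
after 100 [list puts -nonewline $wr "gam"]
after 150 [list puts $wr "ma"]
after 200 [list close $wr]
set timeout [after 5000 {set ::finished timeout}]
vwait ::finished
after cancel $timeout
puts $lines
```

Expected output: `{alpha beta} gamma`. The reader code reads top to bottom like blocking I/O, yet never stalls the event loop.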

EL

Jul 20, 2021, 5:22:05 AM

On 20.07.2021 09:59, heinrichmartin wrote:

> Just to prevent you from a pitfall that I had learned the hard way:
>
> On Tuesday, February 17, 2015 at 5:11:14 PM UTC+1, Rich wrote:
>> heinrichmartin wrote:
>>
>>> So Tcl does not have "the event loop", but stacked event loops, and
>>> update does not enter "the event loop", but "the current event loop",
>>> which was new to me ...
>>
>>> I can see two improvements to vwait's doc now:
>>
>>> - This command enters the Tcl event loop to process events, ...
>>> + This command enters a newly created Tcl event loop to process events, ...
>>
>>> - In some cases the vwait command may not return immediately after
>>> - varName is set. This happens if the event handler that sets varName
>>> - does not complete immediately.
>>> + vwait does not return immediately after varName is set, but only
>>> + after all active event handlers have completed.
>>
>> Both are already documented, just not adjacent to those two sections,
>> but a bit further within the man page:
>>
>> man vwait:
>>
>> ...
>> To be clear, multiple vwait calls will nest and will not happen in
>> parallel. The outermost call to vwait will not return until all
>> the inner ones do.
>> ...
>
> Not even [update] in a callback can unblock [vwait] that allowed that callback.
>

Interesting indeed. Thanks a lot :)

--
EL

Emiliano Gavilán

Jul 20, 2021, 11:18:31 AM

This is the minimal code I use for line-based, async protocols (no TclOO, but
the idea is the same).

proc cback {fd} {
    if {[chan gets $fd data] < 0} {
        if {[chan eof $fd]} {
            chan close $fd
        }
        return
    }
    # now process $data
}

set fd [open $whatever] ; # or [socket $whatever]
chan configure $fd -blocking 0 ; # and other channel options
chan event $fd readable [list [namespace which cback] $fd]
# vwait if not already running the event loop

No need to call [fblocked]; the Tcl I/O subsystem takes care of the details.
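Emiliano's callback can be exercised without a server. In this sketch, [chan pipe] (Tcl 8.6) stands in for the socket, and ::got and ::done are hypothetical additions for collecting lines and stopping [vwait]. Three complete lines plus a fragment arrive in one chunk: Tcl calls the handler again for each buffered complete line, and with only the fragment left the handler has nothing to do until the rest of the line arrives.

```tcl
# Emiliano's callback, plus hypothetical ::got / ::done bookkeeping.
proc cback {fd} {
    if {[chan gets $fd data] < 0} {
        if {[chan eof $fd]} {
            chan close $fd
            set ::done 1
        }
        return
    }
    lappend ::got $data
}

lassign [chan pipe] rd wr
chan configure $rd -blocking 0 -translation lf
chan configure $wr -buffering none -translation lf
chan event $rd readable [list [namespace which cback] $rd]

set got {}
# Three lines and a fragment in a single write; the rest comes later.
puts -nonewline $wr "one\ntwo\nthree\nfo"
after 50 [list puts $wr "ur"]
after 100 [list chan close $wr]
set timeout [after 5000 {set ::done timeout}]
vwait ::done
after cancel $timeout
puts $got
```

Expected output: `one two three four`.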

Regards
Emiliano

EL

Jul 20, 2021, 1:22:05 PM

On 20.07.2021 17:18, Emiliano Gavilán wrote:

> This is the minimal code I use for line based, async protocols (no TclOO, but
> the idea is the same).
>
> proc cback {fd} {
> if {[chan gets $fd data] < 0} {
> if {[chan eof $fd]} {
> chan close $fd
> }
> return
> }
> # now process $data
> }
>
> set fd [open $whatever] ; # or [socket $whatever]
> chan configure $fd -blocking 0 ; # and other channel options
> chan event $fd readable [list [namespace which cback] $fd]
> # vwait if not already running the event loop
>
>
> No need to call [fblocked], Tcl I/O subsystem takes care of the details.

Ok, that is essentially the same as my code with [fblocked], since the
[gets] command returns -1 in both cases: eof and fblocked. You filter
out the eof case with [eof] and implicitly assume that the other case is
fblocked. I evaluate the fblocked case explicitly as well as the eof case.

The basic idea of both ways is to _skip_ the processing of the data
entirely, when the channel is blocked. The callback will be executed
again when the next chunk of data is available on the channel, and then
it might be unblocked again. So both ways will work.

I think I have a much clearer idea now about this topic :).

--
EL