
Checking failures for external command execution


Markus Elfring

May 10, 2019, 2:50:49 PM
Hello,

I am trying to run a Python script (which works already) by the TCL command “exec”.
Unfortunately, I observe a test result like the following.

elfring@Sonne:~/Projekte/TCL> tclsh test-statistic-server4.tcl

exec failure NONE:

errorinfo:
Using Python version:
3.7.2 …

while executing
"exec -keepnewline /usr/bin/python3 /home/elfring/Projekte/Python/socket-send_json_data.py --server_id [lindex ${server} 0] …"


Why does this data processing approach fail?

Regards,
Markus

Rich

May 10, 2019, 3:07:39 PM
Markus Elfring <Markus....@web.de> wrote:
> Hello,
>
> I am trying to run a Python script (which works already) by the TCL command “exec”.
> Unfortunately, I observe a test result like the following.
>
> elfring@Sonne:~/Projekte/TCL> tclsh test-statistic-server4.tcl
> …
> exec failure NONE:
> …
> errorinfo:
> Using Python version:
> 3.7.2 …
>
> while executing
> "exec -keepnewline /usr/bin/python3 /home/elfring/Projekte/Python/socket-send_json_data.py --server_id [lindex ${server} 0] …"
>
>
> Why does this data processing approach fail?

Most likely this reason from the 'exec' man page:

-ignorestderr
Stops the exec command from treating the output of messages to the
pipeline's standard error channel as an error case.

...

If any of the commands writes to its standard error file and that
standard error is not redirected and -ignorestderr is not specified,
then exec will return an error; the error message will include the
pipeline's standard output, followed by messages about abnormal
terminations (if any), followed by the standard error output.

Markus....@web.de

May 10, 2019, 3:28:05 PM
> Most likely this reason from the 'exec' man page:
>
> -ignorestderr
> Stops the exec command from treating the output of messages to the
> pipeline's standard error channel as an error case.

Thanks for the reminder.

Would you like to help also with any remaining software development challenges around the safe application of condition variables in TCL scripts?

Regards,
Markus

Robert Heller

May 10, 2019, 3:46:58 PM
At Fri, 10 May 2019 11:50:46 -0700 (PDT) Markus Elfring <Markus....@web.de> wrote:

>
> Hello,
>
> I am trying to run a Python script (which works already) by the TCL command “exec”.
> Unfortunately, I observe a test result like the following.
>
> elfring@Sonne:~/Projekte/TCL> tclsh test-statistic-server4.tcl
> …
> exec failure NONE:
> …
> errorinfo:
> Using Python version:
> 3.7.2 …
>
> while executing
> "exec -keepnewline /usr/bin/python3 /home/elfring/Projekte/Python/socket-send_json_data.py --server_id [lindex ${server} 0] …"
>
>
> Why does this data processing approach fail?

Another possible issue would be if the Python script returns an "error" exit code
(exits with something other than 0). Often people get lazy and are not
careful about what they pass to exit(). Most of the time this is in C/C++
programs that don't bother with an explicit return value from the main()
program. I suspect the Python interpreter is itself careful, but a script
writer might explicitly call exit with some random value...
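
A small sketch of checking the child's exit status (again with a placeholder script path): when the child exits non-zero, Tcl records CHILDSTATUS, the process id and the status in errorCode:

    if {[catch {exec /usr/bin/python3 some_script.py} result]} {
        lassign $::errorCode class pid status
        if {$class eq "CHILDSTATUS"} {
            puts stderr "child process $pid exited with status $status"
        } else {
            puts stderr "exec failed: $result"
        }
    }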

>
> Regards,
> Markus
>

--
Robert Heller -- 978-544-6933
Deepwoods Software -- Custom Software Services
http://www.deepsoft.com/ -- Linux Administration Services
hel...@deepsoft.com -- Webhosting Services

walto...@gmail.com

May 11, 2019, 10:52:41 PM
On Friday, May 10, 2019 at 2:28:05 PM UTC-5, Markus...@web.de wrote:
> Would you like to help also with any remaining software development challenges around the safe application of condition variables in TCL scripts?
>
> Regards,
> Markus

You may also want to look into using the 'open' command.

% set f [open "| date"]
file6
% read $f
Sun May 12 02:49:53 UTC 2019
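
A variant of the same idea that also surfaces the command's exit status; closing a command pipeline raises an error if the command failed, so the close can be wrapped in catch:

    set f [open "| date" r]
    set output [read $f]
    if {[catch {close $f} err]} {
        puts stderr "pipeline failed: $err"
    }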

Markus....@web.de

May 12, 2019, 3:23:38 AM
> You may also want to look into using the 'open' command.

I became more interested in the clarification of unexpected software behaviour
also for related topics.
https://sourceforge.net/p/tcl/mailman/message/36663310/

Such a published data processing approach contains implementation details
which will need further development considerations.
Would you like to help with finding better solutions around the safe application of condition variables?

walto...@gmail.com

May 12, 2019, 4:44:37 AM
I dunno - maybe you're dealing with a buggy python script. I'd consider just communicating with the server using Tcl or exec'ing curl.

Markus Elfring

May 12, 2019, 5:33:51 AM
> I dunno - maybe you're dealing with a buggy python script.

I find my script “~/Projekte/Python/socket-send_json_data.py” sufficient
for software test purposes at the moment.
It sends a bit of data which should be properly handled by a companion process.


> I'd consider just communicating with the server using Tcl or exec'ing curl.

I am fiddling again with development challenges around inter-process communication.
How would you ensure in the service implementation that the desired record sets
were completely received for a test run?

Corresponding test example:
https://sourceforge.net/p/tcl/mailman/message/36663310/

Regards,
Markus

Ralf Fassel

May 13, 2019, 4:35:06 AM
* Markus Elfring <Markus....@web.de>
| I am fiddling again with development challenges around inter-process communication.
| How would you ensure in the service implementation that the desired record sets
| were completely received for a test run?
>
| Corresponding test example:
| https://sourceforge.net/p/tcl/mailman/message/36663310/

The Activestate example is simple enough and basically sends back
everything it receives, so I would start by logging the received content
on the server side and see whether there is something missing there.
If there is already something missing there, investigate why (client
not flushing the channel before closing).

If this seems ok, start inserting the parts of your script
(i.e. condition variables) which seem to have a problem until it breaks.
Should give some more clues...

R'

Markus Elfring

May 13, 2019, 6:55:20 AM
> …, so I would start by logging the received content
> on the server side and see whether there is something missing there.

Did you notice that my test example contains extra information output already?


> If there is already something missing there, investigate why
> (client not flushing the channel before closing).

My test data provider “~/Projekte/Python/socket-send_json_data.py”
should close the connection after each transmission.


> If this seems ok, start inserting the parts of your script
> (i.e. condition variables) which seem to have a problem until it breaks.
> Should give some more clues...

* Do you spot any questionable details in the shown data processing approach?

* Would you like to help in checking the involved system dependencies any more?

Regards,
Markus

Ralf Fassel

May 13, 2019, 12:38:22 PM
* Markus Elfring <Markus....@web.de>
| > …, so I would start by logging the received content
| > on the server side and see whether there is something missing there.
>
| Did you notice that my test example contains extra information output already?

No. The only point where your code reads from the connection (proc
handle_connection) just appends what was read to a global TSV variable
in a questionable way (why 'lappend' and not plain 'append'?).

At this place, I would dump what was read to some log file (probably
with clock millisecond timestamps on a per-thread basis), or at least
to stderr, plus some more strategic messages on critical parts: server
socket established, connection accepted from client, connection ended by
client etc.

When trying to debug with millisecond timestamps, note that I found on
Windows that the ms-timestamps from different processes may not be
chronologically "in order": the timestamp of an event in one process
that triggers a response in a second process might be later than the
timestamp of that response (which is logically not possible, but can
occur on multi-processor machines; there is even a TCL ticket for this
stating that it is not easily fixable).
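
A tiny sketch of such logging (the log file path is made up; thread::id is provided by the Thread package your script already loads):

    proc log {msg} {
        set f [open /tmp/server-debug.log a]
        puts $f "[clock milliseconds] [thread::id] $msg"
        close $f
    }
    log "server socket established"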

Good luck!
R'

Markus Elfring

May 13, 2019, 1:12:34 PM
> | Did you notice that my test example contains extra information output already?
>
> No.

My script “test-statistic-server5.tcl” contains 14 calls of the command “puts”.
Would you dare to try it out a bit?


> The only point where your code reads from the connection
> (proc handle_connection) just appends what was read to a global TSV variable
> in a questionable way (why 'lappend' and not plain 'append'?).

I would like to append list elements for each connection instead of working with
unwanted string concatenations for the desired data inputs.


> At this place, I would dump what was read to some log file

The concrete data (within received contents) are not really relevant for
this test case at the moment.
I try to check the number of connections which were handled by the service.
I hope that such a system check will become easier and consistent finally.

Regards,
Markus

Ralf Fassel

May 15, 2019, 6:57:35 AM
* Markus Elfring <Markus....@web.de>
| > The only point where your code reads from the connection
| > (proc handle_connection) just appends what was read to a global TSV variable
| > in a questionable way (why 'lappend' and not plain 'append'?).
>
| I would like to append list elements for each connection instead of
| working with unwanted string concatenations for the desired data inputs.

Ok, I had missed the fact that you're using blocking read to gather all
of the data, so all data of one connection come in one big chunk, while
the thread is blocked in the read.

Note that this will fail if the remote side sends more data than fits in
memory. Do you have control over the remote side? If so, the
"read-all-data-in-one-chunk" approach might be ok. If the remote side
might be 'hostile', you need some other approach in order to avoid
DOS-attacks.

With sockets, I always use fileevent-based non-blocking I/O and read the
input either line-based or in smaller chunks.
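
A minimal sketch of that style, assuming $sock is an already accepted channel:

    proc on_readable {chan} {
        if {[gets $chan line] >= 0} {
            puts "received: $line"      ;# handle one line at a time
        } elseif {[eof $chan]} {
            close $chan
        }
    }
    fconfigure $sock -blocking 0 -buffering line
    fileevent $sock readable [list on_readable $sock]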

| > At this place, I would dump what was read to some log file
>
| The concrete data (within received contents) are not really relevant
| for this test case at the moment. I try to check the number of
| connections which were handled by the service. I hope that such a
| system check will become easier and consistent finally.

Ah, ok. I had thought your error was about missed contents from the
connections.

You're starting three threads sequentially, each thread creates a new
socket, then you start a 'remote' client, then you wait for that thread
to finish, and then you start the next (procs repeat_command and
perform_command).

So I would expect three connections in total, but only one at a time?

Note that you are all doing blocking and sequentially in
perform_command, I wonder whether this is the intention? Shouldn't the
3 threads run in parallel (at least I thought this is what threading is
all about)?

A final remark about the style and structure of your test-example code:
I know you had started with the Activestate example, but that is just an
_example_, showing the basic principles by having the thread-procs as
inline code. But it should be obvious that as the thread-code becomes
more complex as in your case, inlined thread-code is not the way to go...

So first of all I would start by putting all of the thread procs in a
TCL-source file, format that nicely and let the threads 'source' that
file instead of having all of the code inline. This inline-code is a
nightmare to read and maintain...

My EUR 0.01
R'

Markus Elfring

May 15, 2019, 8:15:29 AM
> …, so all data of one connection come in one big chunk, …

This should happen here so far.


> Do you have control over the remote side?

Yes. - I constructed also the test data provider “socket-send_json_data.py”.


> You're starting three threads sequentially, each thread creates a new socket,

Yes.


> then you start a 'remote' client,

The test data provider is executed as a child process after a server socket
was established.


> then you wait for that thread to finish,
> and then you start the next (procs repeat_command and perform_command).

This implementation detail is clear, isn't it?


> So I would expect three connections in total,

This expectation is inappropriate because the Python script should send
a bit of test data in six connections during a loop iteration within
the listening service process.


> but only one at a time?

An additional thread will be created for each connection.


> Note that you are all doing blocking

Occasionally, yes.


> and sequentially in perform_command,

I suggest to take another look at the involved data exchange synchronisation.


> I wonder whether this is the intention?

This is intended for the shown test approach.


> Shouldn't the 3 threads run in parallel

You refer also to the number of chosen loop iterations.


> (at least I thought this is what threading is all about)?

The network input connections should be handled in a concurrent way here.


> So first of all I would start by putting all of the thread procs in a
> TCL-source file, format that nicely and let the threads 'source' that
> file instead of having all of the code inline.

The splitting of the source code into additional script files will become
more interesting later for other software developments.

Regards,
Markus

Ralf Fassel

May 15, 2019, 12:12:18 PM
This:

proc finished {} ...

thread::mutex lock $sm

try \
{
if [tsv::exists ${context} finished] \
{
while {[set lw [llength ${workers}]] != [set lf [tsv::llength ${context} finished]]} \
{
puts stderr "[thread::id] finish cond wait\n$lw workers: ${workers}\n$lf finished"
thread::cond wait $sc $sm
}
} \
else \
{
puts stderr {Variable "finished" was not set so far!}
thread::cond wait $sc $sm
}
}
sounds fishy to me.

Basically a cond_wait should *always* be

lock
while test condition-protected-by-lock
cond-wait

to protect against spurious wakeups and missed notifications, and in
this case checking whether all the work has actually been done.

If you enter the 'else' in the above code, the thread waits only once
and then exits. This is especially wrong if actually multiple
connections are started.

IMHO you should skip the
if [tsv::exists ${context} finished]
and initialize the 'finished' tsv-variable to an empty list, and then
unconditionally do the lock-check-wait.

I would guess that this is the main source of problems, if you actually
see the {Variable "finished" was not set so far!} message.
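
A sketch of how that could look with the Thread package, reusing the names from your script (sm, sc, workers, ${context}): the 'finished' element exists before any worker runs, and both the waiting and the updating happen under the same mutex:

    # before any worker thread is started:
    tsv::set ${context} finished {}

    # waiting side:
    thread::mutex lock $sm
    while {[llength ${workers}] != [tsv::llength ${context} finished]} {
        thread::cond wait $sc $sm
    }
    thread::mutex unlock $sm

    # each worker, when its connection is done:
    thread::mutex lock $sm
    tsv::lappend ${context} finished {X}
    thread::cond notify $sc
    thread::mutex unlock $sm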



Then:
a notification should always be:
lock
set condition-protected-by-lock
notify

The 'finished' subvar seems to be additionally protected by mutex sm,
but the I/O-handling thread sets it while mutex {hcm} is locked:

finally \
{
close ${channel}
global hcc hcm sc sm context
thread::mutex lock ${hcm}
*here*
tsv::lappend ${context} finished {X}

and only later locks sm and does the notify. This might sometimes mess
up the sequence of events (not sure, depends on how the other involved
variables are set).

Note that for simple notifications, the lock is not necessary, in fact,
it is even hindering the flow of events:

thread 1
lock mutex
while check predicate
cond-wait var mutex

later:
thread 2
lock mutex
notify var

Now thread 1 wakes up and tries to reclaim the mutex to come out of the
cond-wait, but since the mutex is still locked by thread 2, thread 1 has
to wait until thread 2 releases the mutex, and only then the cond-wait
can return.



Then:
try \
{
tsv::set ${context} workers_listen 0

foreach w ${workers} \
{
thread::cond notify ${hcc}
}
} \

The 'foreach' seems unnecessary to me, since:

thread::cond notify cond
Wakes up all threads waiting on the condition variable cond.




Also note that the thread(n) manpage has this advice:

thread::transfer id channel
[...]
Due to the internal Tcl core implementation and the restriction on
transferring shared channels, one has to take extra measures when
transferring socket channels created by accepting the connection out
of the socket command's callback procedures:

socket -server _Accept 2200
proc _Accept {s ipaddr port} {
after idle [list Accept $s $ipaddr $port]
}
proc Accept {s ipaddr port} {
set tid [thread::create]
thread::transfer $tid $s
}

In your code, you create the thread in the callback and only transfer the
channel asynchronously, I don't know if this matters.


BTW, is there a specific reason why this pattern:

proc finish_unlock {} \
{
global hcm
thread::mutex unlock ${hcm}
}
finish_unlock

proc perform_command_unlock {} \
{
thread::mutex unlock [tsv::get [uplevel {set x}] pcm]
}
perform_command_unlock

repeatedly appears (defining short one-liner procs which are called
only once)? Why not simply call the code directly? IMHO this just
makes the code more complex than necessary...


HTH
R'

Markus Elfring

May 15, 2019, 2:42:17 PM
> sounds fishy to me.

Why?

This code part can indicate also an implementation detail where
I am occasionally struggling with unexpected software behaviour.


> Basically a cond_wait should *always* be
>
> lock
> while test condition-protected-by-lock
> cond-wait

> If you enter the 'else' in the above code, the thread waits only once
> and then exits.

Thanks for this reminder.


> IMHO you should skip the
> if [tsv::exists ${context} finished]
> and initialize the 'finished' tsv-variable to an empty list,
> and then unconditionally do the lock-check-wait.

This might look convenient. But I would like to distinguish the existence
of this variable from storing an empty list there.


> I would guess that this is the main source of problems, if you actually
> see the {Variable "finished" was not set so far!} message.

I noticed such a special case during a test run and added therefore
a corresponding case distinction.


> Then:
> a notification should always be:
> lock
> set condition-protected-by-lock
> notify

Where do you miss such actions in my application of condition variables?


> The 'finished' subvar seems to be additionally protected by mutex sm,

This is the mutex for the server thread configuration.


> but the I/O-handling thread sets it while mutex {hcm} is locked:

The mutexes protect accesses to different parts in the code.


> and only later locks sm and does the notify.

Notifications are performed at three places with different condition variables.


> Note that for simple notifications, the lock is not necessary, …

I got another impression for my test example.
I find some lock necessary then so that the notifications will be attempted
only if it was determined that other threads will be listening.


> The 'foreach' seems unnecessary to me, since:

Thanks for another reminder.


> BTW, is there a specific reason why this pattern:

> proc perform_command_unlock {} \
> {
> thread::mutex unlock [tsv::get [uplevel {set x}] pcm]
> }
> perform_command_unlock
>
> repeatedly appears (defining short one-liner procs which are called
> only once)?

Yes!


> Why not simply call the code directly?

I find the TCL backtraces easier to follow if an unlock attempt would fail.
A unique procedure name is displayed then instead of an unsafe line number
for the affected place.

Regards,
Markus

Ralf Fassel

May 16, 2019, 4:14:38 AM
* Markus Elfring <Markus....@web.de>
| > sounds fishy to me.
>
| Why?

Well, because of the text I wrote in the next paragraph, basically the
missing cond-wait-loop in case of the finished subvariable
not-yet-existing.

| > IMHO you should skip the
| > if [tsv::exists ${context} finished]
| > and initialize the 'finished' tsv-variable to an empty list,
| > and then unconditionally do the lock-check-wait.
>
| This might look convenient. But I would like to distinguish the
| existence of this variable from storing an empty list there.

Ok, but you need to do the same looping in both cases, which you
currently don't.

| > Then:
| > a notification should always be:
| > lock
| > set condition-protected-by-lock
| > notify
>
| Where do you miss such actions in my application of condition variables?

In the setting of the 'finished' subvar, as described in the next
paragraph in the cited message.


| > Note that for simple notifications, the lock is not necessary, …
>
| I got another impression for my test example.
| I find some lock necessary then so that the notifications will be attempted
| only if it was determined that other threads will be listening.

Well, your impression is wrong ;-) The notification will happen
regardless of whether someone waits for it or not. There are many
texts regarding this, e.g.
https://stackoverflow.com/questions/17101922/do-i-have-to-acquire-lock-before-calling-condition-variable-notify-one
(This is about C++, but the principle is the same, and also the same if
you call notify-one or notify-all (which TCL seems to do)).

Again: the notification will happen regardless of whether someone waits
for it or not. This is why you always additionally need the predicate
with the condition variable.

Thread-1
lock
while not predicate
cond-wait (i.e. unlock until notification, then continue locked)
continue locked

Thread-2
lock
set predicate true
unlock
notify

Now consider this: Thread 1 locks and sees the "not predicate", so it
is about to enter the cond-wait. Now the OS scheduler suspends Thread
1, right before Thread 1 enters the cond-wait, so Thread 1 is not yet
waiting on the cond var, but still holds the lock.

Concurrently thread 2 runs, sets the predicate without lock and
notifies, before Thread 1 is scheduled to run again.

Since Thread 1 has not entered the cond-wait (remember, it still is
suspended by the OS), it misses the notification.

If Thread 2 had acquired the lock before it had set the predicate, it
would have been blocked until Thread 1 had actually entered the
cond-wait. Thus the notification is not missed.

| > BTW, is there a specific reason why this pattern:
| …
| > proc perform_command_unlock {} \
| > {
| > thread::mutex unlock [tsv::get [uplevel {set x}] pcm]
| > }
| > perform_command_unlock
| >
| > repeatedly appears (defining short one-liner procs which are called
| > only once)?
>
| Yes!
>
>
| > Why not simply call the code directly?
>
| I find the TCL backtraces easier to follow if an unlock attempt would fail.
| A unique procedure name is displayed then instead of an unsafe line number
| for the affected place.

Ok, point taken.

HTH
R'

Markus Elfring

May 16, 2019, 8:00:15 AM
> The notification will happen regardless of whether someone waits for it or not.

You are generally right for this technical aspect. But additional lock scopes
can result in desirable effects.
How do you think about information around scheduling behaviour as an answer
to the question “When to unlock for using pthread_cond_signal()?”?
https://danluu.com/threads-faq/#Q233

Did you get any interesting impressions from a test run of the
script “test-statistic-server5.tcl”?

Regards,
Markus

Ralf Fassel

May 16, 2019, 12:11:15 PM
* Markus Elfring <Markus....@web.de>
| > The notification will happen regardless of whether someone waits for it or not.
>
| You are generally right for this technical aspect. But additional lock scopes
| can result in desirable effects.
| How do you think about information around scheduling behaviour as an answer
| to the question “When to unlock for using pthread_cond_signal()?”?
| https://danluu.com/threads-faq/#Q233

This FAQ basically repeats what I intended to say.
If I understand correctly, you are referring with your question to

However, your scheduling behavior may be "more predictable" if you signal a
condition variable while holding the mutex. That may reduce some of the
causes of "spurious wakeups", by ensuring that the waiter has a slightly
better chance to get onto the mutex waiter list before you release the
mutex. (That may reduce the chance that some other thread will get the
mutex, and access to the predicate, first... though there are no
guarantees.)

? In this case: note the many 'may's in that paragraph, and
especially the last sentence.

| Did you get any interesting impressions from a test run of the
| script “test-statistic-server5.tcl”?

I did not run that script, so no.

R'

Markus Elfring

May 16, 2019, 2:47:52 PM
> I did not run that script, so no.

Would you ever get into the mood to try it out practically?

Regards,
Markus

Ralf Fassel

May 18, 2019, 5:56:51 AM
* Markus Elfring <Markus....@web.de>
| > I did not run that script, so no.
>
| Would you ever get into the mood to try it out practically?

Let's see...

You originally asked about a problem with a multi-threaded server script
and unexpected software behaviour.

Though you did not explicitly state these problems here, they were
mentioned in a discussion on Sourceforge, to which you provided a link.

The problems listed in this discussion on Sourceforge are the classical
problems of multi-threaded programs: blocking in conditional-waits, and
missed events.

You also made your script available on Sourceforge.

So in order to investigate the problem with the script, one had to go to
sourceforge first, find the link to your script and download it from
there. I did that. The script was undocumented, uses cryptic variable
names, weird indentation and programming style.

Then I pointed out two flaws in the script which can lead to exactly
these problems you observed: unlocked setting of the predicate used for
one of the conditional variables, and improper conditional-waiting
without checking for spurious wakeup. You did not provide a new version
of the script with these issues addressed.

Even if I nevertheless wanted to run the script, it would fail since I
don't have access to the .py script which your script invokes to
generate the input.

Considering all of this, do you understand that I have a problem with
your request cited above?

R'

Markus Elfring

May 18, 2019, 7:24:31 AM
> You originally asked about a problem with a multi-threaded server script
> and unexpected software behaviour.

The clarification of corresponding implementation details is evolving.


> Though you did not explicitly state these problems here, they were
> mentioned in a discussion on Sourceforge, to which you provided a link.

I omitted to repeat information for this forum (or newsgroup) because it is
already available by the archive of the mentioned mailing list “Tcl-Threads”.
(Related data were published also in other information systems.)


> The problems listed in this discussion on Sourceforge are the classical
> problems of multi-threaded programs: blocking in conditional-waits,
> and missed events.

This is reasonable to some degree, isn't it?


> So in order to investigate the problem with the script, one had to go to
> sourceforge first, find the link to your script and download it from there.

This is generally possible.


> The script was undocumented, uses cryptic variable names, weird indentation
> and programming style.

I chose my coding style for this programming language.


> Then I pointed out two flaws in the script which can lead to exactly
> these problems you observed:

There are opportunities for further development considerations.


> unlocked setting of the predicate used for one of the conditional variables,

Where did you notice this detail?


> and improper conditional-waiting without checking for spurious wakeup.

I am still looking for a final solution in the mentioned special case
according to the presented data processing approach.


> You did not provide a new version of the script with these issues addressed.

An adjusted variant will eventually be published later.


> Even if I nevertheless wanted to run the script, it would fail since I
> don't have access to the .js script which your script invokes to
> generate the input.

I omitted the auxiliary script “socket-send_json_data.py” also so far.
You can choose other script variants which would send a bit of data over
a few test TCP connections after they get informed over the command parameters
“server_id” and “server_port”.


> Considering all of this, do you understand that I have a problem with
> your request cited above?

Partly, yes.

Can the common understanding become better for remaining development challenges?

Regards,
Markus

Rich

May 18, 2019, 3:45:19 PM
Markus Elfring <Markus....@web.de> wrote:
>> The script was undocumented, uses cryptic variable names, weird indentation
>> and programming style.
>
> I chose my coding style for this programming language.

You are certainly free to choose whatever coding style you want.

But keep in mind that when asking for *free* help from other volunteers,
it can be a distraction to those others, volunteering their time for
*free*, to have to work past non-standard coding styles before they
even get a chance to recognize the bug for which you are asking for
*free* assistance.


Ralf Fassel

May 20, 2019, 8:51:05 AM
* Markus Elfring <Markus....@web.de>
| > Then I pointed out two flaws in the script which can lead to exactly
| > these problems you observed:
>
| There are opportunities for further development considerations.

I don't understand that remark. IMHO flaws in a program need to get
fixed now, not considered later.

| > unlocked setting of the predicate used for one of the conditional
| > variables,
>
| Where did you notice this detail?

From: Ralf Fassel <ral...@gmx.de>
Newsgroups: comp.lang.tcl
Subject: Re: Checking network input processing by TCL for a multi-threaded server
Date: Wed, 15 May 2019 18:12:14 +0200
Message-ID: <ygaftpf...@akutech.de>

| > and improper conditional-waiting without checking for spurious wakeup.
>
| I am still looking for a final solution in the mentioned special case
| according to the presented data processing approach.

Just stick to the pattern required for setting and reading the
lock-protected variables (same Message as above).

| I omitted the auxiliary script “socket-send_json_data.py” also so far.
| You can choose other script variants which would send a bit of data
| over a few test TCP connections after they get informed over the
| command parameters “server_id” and “server_port”.

For the errors of missed connections to be observed, it is an important
feature of the client side that it opens _multiple_ concurrent
connections to the server, not only one. So rather than have your
readers guess what might be required, provide the *complete* test case
with which you have problems.

Since this is comp.lang.tcl, I would expect the client side also to be
in TCL. I don't speak python and will not run an arbitrary script from
the net in a language which I can't verify.

| Can the common understanding become better for remaining development
| challenges?

As always: it depends :-)

R'

Markus Elfring

May 20, 2019, 9:50:27 AM
> | > unlocked setting of the predicate used for one of the conditional
> | > variables,
> >
> | Where did you notice this detail?

> Newsgroups: comp.lang.tcl
> Subject: Re: Checking network input processing by TCL for a multi-threaded server
> Date: Wed, 15 May 2019 18:12:14 +0200
> Message-ID: <ygaftpf...@akutech.de>

You pointed out here that a loop was missing for the safe waiting on
a condition change.
https://groups.google.com/d/msg/comp.lang.tcl/q9XtvAT6T6o/495lUOb7AgAJ

The clarification for required lock scopes is still evolving, isn't it?


> Just stick to the pattern required for setting and reading the
> lock-protected variables (same Message as above).

I see that all involved status variables are modified under different lock scopes
in my test example.
Do you find my view on this technical aspect appropriate?


> For the errors of missed connections to be observed,

I might not really miss them. I wonder more about the distribution of
the connection counts according to test runs.


> it is an important feature of the client side that it opens
> _multiple_ concurrent connections to the server, not only one.

I observe enough software surprises already by using a single-threaded Python
script for sending a bit of test data in a simple loop of six iterations.


> So rather that have your readers guess what might be required,
> provide the *complete* test case with which you have problems.

The test data consumer implementation (the listening threads) points
some development challenges out.


> Since this is comp.lang.tcl, I would expect the client side also to be
> in TCL.

I find that my test data provider is working good enough for the shown test case.
Do you insist that the desired software clarification will only succeed together
with a TCL script variant for this data processing task?

Regards,
Markus
Message has been deleted

Ralf Fassel

May 21, 2019, 6:06:02 AM
* Markus Elfring <Markus....@web.de>
| You pointed out here that a loop was missing for the safe waiting on
| a condition change.
| https://groups.google.com/d/msg/comp.lang.tcl/q9XtvAT6T6o/495lUOb7AgAJ
>
| The clarification for required lock scopes is still evolving, isn't it?

I'm sorry, I'm not a native English speaker, so sometimes I can't
understand whether your comments are meant ironic, sarcastic, or should
express an understanding.

Required lock scopes in threading are a well-defined subject, there is
nothing to clarify or evolve here.

In the simplest form, Mutexes, Condition-Variables and Predicates always
come in couples of three, and the most basic usage is:

Thread-1:
take lock(mutex-1)
set predicate-variable-1
release lock(mutex-1) (*)
notify condition-variable-1 (*)

(the last two marked (*) may appear in reversed order according to
taste).

Thread-2:
take lock(mutex-1)
while (not predicate-variable-1)
condition-wait(condition-variable-1, mutex-1)

This way everything should run as expected.

Note especially that locking mutex-2 and then changing
predicate-variable-1 will NOT work.

| > Just stick to the pattern required for setting and reading the
| > lock-protected variables (same Message as above).
>
| I see that all involved status variables are modified under different lock scopes
| in my test example.
| Do you find my view on this technical aspect appropriate?

They are modified under different lock scopes, that is correct, but
especially the finished subvariable is modified while the corresponding
mutex is NOT locked.

proc handle_connection {channel} \
{
try \
--<snip-snip>--
finally \
{
close ${channel}
global hcc hcm sc sm context
thread::mutex lock ${hcm}
=> tsv::lappend ${context} finished {X}

Here. Note that "${context} finished" is set while mutex HCM is locked.

But the other thread does:

proc finish {} \
{
global context hcc hcm lso sc sm workers
close $lso
(1) thread::mutex lock ${hcm}
try \
{
--<snip-snip>--
} \
finally \
{
proc finish_unlock {} \
{
global hcm
thread::mutex unlock ${hcm}
}
(2) finish_unlock
}

(3) thread::mutex lock $sm

try \
{
(4a) if [tsv::exists ${context} finished] \
{
(4b) while {[set lw [llength ${workers}]] != [set lf [tsv::llength ${context} finished]]} \
{
puts stderr "[thread::id] finish cond wait\n$lw workers: ${workers}\n$lf finished"
(5a) thread::cond wait $sc $sm
}
} \
else \
{
puts stderr {Variable "finished" was not set so far!}
(5b) thread::cond wait $sc $sm
}

At (1) HCM is locked, and at (2) released. At (3) Mutex SM is locked,
and at (4a) and (4b) "${context} finished" is checked, and the condition
wait is entered at (5a/b). But since the other thread does not lock Mutex
SM before changing the "${context} finished" variable and notifying the
condition variable, there is a chance that these two steps happen
between (4a/b) and (5a/b), and thus the notification is lost (and your
second thread blocked).

Note that there is a similar error regarding Mutex HCM and the
"${context} workers_listen" variable elsewhere in the code (set in #167
while HCM is not locked), though I have not verified whether this
actually might be protected by another mutex locked.

Additionally the code block around 4/5 also contains the other error
that in (5b) the necessary loop to cope with spurious wakeups and checking
for the correct end condition is missing.

I really do hope this clarifies matters now.

| > For the errors of missed connections to be observed,
>
| I might not really miss them. I wonder more about the distribution of
| the connection counts according to test runs.

I do not understand that comment. If the counts differ between runs,
doesn't that mean that some connections were missed?

| > it is an important feature of the client side that it opens
| > _multiple_ concurrent connections to the server, not only one.
>
| I observe enough software surprises already by using a single-threaded Python
| script for sending a bit of test data in a simple loop of six iterations.

In that case I wonder how you think that adding threads to the picture
will give you less surprises... As twogm6 pointed out, do you really
need multi-threading here? Or at least reduce the threading by one
level, have the main thread accept the connections and hand over the
processing to worker threads.

| > Since this is comp.lang.tcl, I would expect the client side also to be
| > in TCL.
>
| I find that my test data provider is working good enough for the shown
| test case. Do you insist that the desired software clarification will
| only succeed together with a TCL script variant for this data
| processing task?

No, I don't insist on that. But I do insist on understanding the code I
should download from the net and run on my computer. If that code is
written in a language I don't understand, the chances of me running that
code are much smaller than if that code was written in a language I _do_
understand and can verify.

HTH
R'

Rich

May 21, 2019, 7:03:16 AM
Ralf Fassel <ral...@gmx.de> wrote:
> * Markus Elfring <Markus....@web.de>
> | You pointed out here that a loop was missing for the safe waiting on
> | a condition change.
> | https://groups.google.com/d/msg/comp.lang.tcl/q9XtvAT6T6o/495lUOb7AgAJ
>>
> | The clarification for required lock scopes is still evolving, isn't it?
>
> I'm sorry, I'm not a native English speaker, so sometimes I can't
> understand whether your comments are meant ironic, sarcastic, or should
> express an understanding.

Native English speaker here, it is not you Ralf. Through much of this
thread I've also failed at determining what Markus may actually mean by
comments such as the one you quote above. In fact, I've felt, many
times, that Markus is a non-native English speaker from much of the
non-idiomatic phrasing that has been used.

Markus Elfring

May 21, 2019, 8:40:13 AM
> Required lock scopes in threading are a well-defined subject,

In principle.


> there is nothing to clarify or evolve here.

Different development opinions can occur for the selection of lock scope sizes.


> take lock(mutex-1)
> set predicate-variable-1
> release lock(mutex-1) (*)
> notify condition-variable-1 (*)

Can it be nicer to perform the change notification after the shown program
status update while the mutex is still locked?


> They are modified under different lock scopes, that is correct,
> but especially the finished subvariable is modified while the corresponding
> mutex is NOT locked.

The quoted script code expresses that an additional lock is applied.


> global hcc hcm sc sm context
> thread::mutex lock ${hcm}
> => tsv::lappend ${context} finished {X}
>
> Here. Note that "${context} finished" is set while mutex HCM is locked.

Yes. - I interpret this place as intended actions.

You might find it more appropriate to move this modification to a subsequent
lock scope.
I do not observe better test results then.


> | > For the errors of missed connections to be observed,
> >
> | I might not really miss them. I wonder more about the distribution of
> | the connection counts according to test runs.
>
> I do not understand that comment. If the counts differ between runs,
> doesn't that mean that some connections were missed?

The accounting for the discussed connection management is obviously questionable.


> In that case I wonder how you think that adding threads to the picture
> will give you less surprises... As twogm6 pointed out, do you really
> need multi-threading here?

I am trying the data processing approach “separate thread per connection” out.
https://www.activestate.com/blog/concurreny-tcl-weaving-threads/


> But I do insist on understanding the code I should download from the net
> and run on my computer.

This view is generally fine.


> If that code is written in a language I don't understand,
> the chances of me running that code are much smaller

This is also reasonable.


> than if that code was written in a language I _do_ understand and can verify.

The development challenges increase when there is more than a single (programming)
language involved.
(Do you find any of my clarification requests interesting for related software
areas in other information systems?)

The amount of shown source code (additional try-finally blocks) might distract
also from the originally intended program control flow.
I imagine that scoped locks (or lock guards) can help more here together
with object-oriented programming.

Regards,
Markus

Ralf Fassel

May 21, 2019, 9:40:23 AM
* Markus Elfring <Markus....@web.de>
| > take lock(mutex-1)
| > set predicate-variable-1
| > release lock(mutex-1) (*)
| > notify condition-variable-1 (*)
>
| Can it be nicer to perform the change notification after the shown
| program status update while the mutex is still locked?

Nicer? Don't know. Possible? Yes.
Did you notice the two lines immediately following the above citation in
my original message:
(the last two marked (*) may appear in reversed order according to
taste).
?

| > They are modified under different lock scopes, that is correct, but
| > especially the finished subvariable is modified while the
| > corresponding mutex is NOT locked.
>
| The quoted script code expresses that an additional lock is applied.

'Additional' to what? The modifying thread holds no other locks at that
place. You cannot lock just *any* Mutex, you need to lock the
*specific* Mutex required for the condition-variable/predicate-pair you
plan to modify.

| > global hcc hcm sc sm context
| > thread::mutex lock ${hcm}
| > => tsv::lappend ${context} finished {X}
| >
| > Here. Note that "${context} finished" is set while mutex HCM is locked.
>
| Yes. - I interpret this place as intended actions.
>
| You might find it more appropriate to move this modification to a
| subsequent lock scope.

I'm completely puzzled here. I do not find it 'appropriate', I find it
absolutely _necessary_ for this code to have a chance to work at all.

Do you think that

lock mutex-hcm
set predicate-sm

is ok (it is not)?

Or do you think that the above code has SM locked while the tsv::lappend
is running? If the latter, please show the sequence of events that
leads to SM locked at that point, since I don't see that happening.

And just to be clear: the code needs to have Mutex SM locked for the
"tsv::lappend ${context} finished", since the other thread does

thread::mutex lock $sm
if [tsv::exists ${context} finished] \
...
thread::cond wait $sc $sm

| I do not observe better test results then.

You need to fix *all* errors before trying to get better results.

So far I've shown two, which you need to fix, and mentioned a possible
third, see previous message regarding
Note that there is a similar error regarding Mutex HCM and the
"${context} workers_listen" variable elsewhere in the code (set in #167
while HCM is not locked)
, which you should investigate.

R'

Ralf Fassel

May 21, 2019, 9:45:08 AM
* Rich <ri...@example.invalid>
| Ralf Fassel <ral...@gmx.de> wrote:
| > I'm sorry, I'm not a native english speaker, so sometimes I can't
| > understand whether your comments are meant ironic, sarcastic, or
| > should express an understanding.
>
| Native English speaker here, it is not you Ralf. [...]

Thanks for the confirmation, Rich :-)

R'

Markus Elfring

May 21, 2019, 10:52:00 AM
> 'Additional' to what?

The shown TCL thread shared variable is protected already by an internal mutex
of the software module “tsv”.


> The modifying thread holds no other locks at that place.

I get another impression from the command “thread::mutex lock ${hcm}”.


> You cannot lock just *any* Mutex,

It can be possible. - But a proper usage of synchronisation primitives
should be achieved finally.


> you need to lock the *specific* Mutex required for the condition-variable
> /predicate-pair you plan to modify.

This is usually desirable.


> | > global hcc hcm sc sm context
> | > thread::mutex lock ${hcm}
> | > => tsv::lappend ${context} finished {X}
> | >
> | > Here. Note that "${context} finished" is set while mutex HCM is locked.
> >
> | Yes. - I interpret this place as intended actions.
> >
> | You might find it more appropriate to move this modification to a
> | subsequent lock scope.
>
> I'm completely puzzled here. I do not find it 'appropriate', I find it
> absolutely _necessary_ for this code to have a chance to work at all.

Your expectation is generally fine.


> Do you think that
>
> lock mutex-hcm
> set predicate-sm
>
> is ok (it is not)?

My view is influenced also by the “tsv” programming support.

Regards,
Markus

Ralf Fassel

May 21, 2019, 11:45:09 AM
* Markus Elfring <Markus....@web.de>
| > 'Additional' to what?
>
| The shown TCL thread shared variable is protected already by an
| internal mutex of the software module “tsv”.
>
| > The modifying thread holds no other locks at that place.
>
| I get another impression from the command “thread::mutex lock ${hcm}”.
--<snip-snip>--
| > Do you think that
| >
| > lock mutex-hcm
| > set predicate-sm
| >
| > is ok (it is not)?
>
| My view is influenced also by the “tsv” programming support.

Ah, ok, now I see where this is going...

You seem to think that the internal locking done by the tsv-variables
relieves you of the task of locking mutexes before setting the variables
and condition-wait on them.

It is true that you don't need any additional locks for the plain
set/read operation on the tsv-variable itself, i.e. to do

thread 1
tsv::set x y z

thread 2
tsv::get x y

These operations indeed require no *additional* locks (they do use
locking behind the scenes).

BUT.

We are talking *synchronization of threads* here, which *does* require
additional locking. Not for the setting/reading of the related tsv
variables, but for the correct sequence of events (lock-set-notify
vs. lock-check-wait).

The internal tsv-locks only prohibit concurrent access to the tsv
variable, but this is only true for that one line of tsv-code. In the
next line the tsv-variable may already have been changed by the other
thread.

I.e. if you do

Thread 1
lock mutex
while [tsv get x y]
*critical phase*
condition-wait csv mutex

Thread 2
# no lock on mutex
tsv set x y 0
notify csv

thread 2 may set the tsv and notify the csv right between the while and
the condition-wait in thread 1 (denoted *critical phase*). Thread 1 then
enters the condition-wait where it should not (since the predicate no
longer is true and the notification has already occurred).

Clearer now?

R'

Markus Elfring

May 21, 2019, 12:25:23 PM
> You seem to think that the internal locking done by the tsv-variables
> relieves you of the task of locking mutexes before setting the variables

Partly, yes.

This programming interface helps to keep thread shared variables
consistent by default.


> and condition-wait on them.

But the application of condition variables requires the usage of
additional mutexes also within TCL scripts.


> We are talking *synchronization of threads* here, which *does* require
> additional locking. Not for the setting/reading of the related tsv
> variables, but for the correct sequence of events (lock-set-notify
> vs. lock-check-wait).

I agree to this information.


> Clearer now?

It seems that we agree on the general TSV functionality.

I hope that such a consensus can help also for the explanation of remaining
variations in test results.

Regards,
Markus

Markus Elfring

May 22, 2019, 2:47:04 AM
> proc finished {} ...

> sounds fishy to me.

Would you like to clarify further possible software adjustments around a script
like “test-statistic-server8.tcl”?
https://sourceforge.net/p/tcl/mailman/message/36673739/

Will any more test variants become interesting for the safe application of
condition variables together with TCL thread shared variables?

Do you know a test data provider which you would find easier to reuse for
the checking of network connections?

Regards,
Markus

Ralf Fassel

May 22, 2019, 5:18:48 AM
* Markus Elfring <Markus....@web.de>
Since I have interest in the topic, I created a small test data provider
in TCL (see end of message, snip 1). You can adjust the timing of
events either globally in the default argument of proc waitabit, or
individually when calling it.

In the default version it processes everything as quickly as possible
and shows another race condition in your code (the new version
test-statistic-server8.tcl, with corrected lock scopes):

Thread 1
perform_command
start thread 2
wait for thread 2 to open the server socket and provide connection details
(1) execute the data sender (client.tcl/socket-send_json_data.py)
(2) call 'finish' on thread 2

Thread 2
open server socket
notify thread 1 of connection details
wait for connections

when a connection comes in:
call handle_input
start thread 3
(3) after 0 [list hand_over $t $so]

Thread 3:
define proc handle_connection
wait for events
The next event on thread 3 should be the transfer of the channel by
thread 2 and the call of handle_connection in proc hand_over.


Now if (1) finishes very quickly (since we have next to no delay in the
data sender), we call finish on Thread 2 at (2) very quickly, too.

Thread 2 will handle this 'finish' request when it enters the event
loop, which it does after scheduling the call to hand_over at (3) when
handling the first socket request.

Now we have several events pending in Thread 2:
- two more incoming socket requests
- one thread::send call to "proc finish"
- one "after 0"-call to hand_over from the first socket request.

Which of these is handled first depends on the internal handling of the
event queue in TCL (what is checked first: sockets? TCL thread events?
plain 'after' events?).

*If* finish is entered first (which it does on my computer), thread 2
closes the server port and then blocks in the cond-waiting. It does not
handle the pending socket calls any more.


IMHO you should add some debugging at start and end of procs, so you can
follow the control flow, and especially see when some functions are not
called as expected.


I find all of this very educational in the sense that it confirms my
willingness to stay away from threading as long as I can :-), especially
considering the non-threaded server script attached as "snip 2" after
the client, which handles all of this in less complicated and
error-prone code...

HTH
R'



# ---------------- snip 1 ------------------
# client.tcl - client side for threaded server test
puts stderr "client.tcl STARTED $argv"

# Timing of events: adjust the delay between steps either globally
# here or in the individual calls to waitabit.
# The default setting -1 tries to do everything as quickly as possible.
proc waitabit {{delay -1}} {
if {$delay >= 0} {
after $delay {set ::continue 1}
vwait ::continue
}
}

# open some connections
proc connect {host port {howmany 3}} {
for {set i 0} {$i < $howmany} {incr i} {
lappend fdlist [socket $host $port]
waitabit
}
puts stderr "client.tcl Openend $howmany connections: $fdlist"
return $fdlist
}

# send something to all connections in FDLIST
proc sendstuff fdlist {
foreach fd $fdlist {
puts $fd "this is from $fd"
waitabit
}
puts stderr "client.tcl Content sent to $fdlist"
return $fdlist
}

# close down all connections in FDLIST
proc disconnect fdlist {
foreach fd $fdlist {
close $fd
waitabit
}
puts stderr "client.tcl Connections closed: $fdlist"
return $fdlist
}

disconnect [sendstuff [connect [lindex $argv 0] [lindex $argv 1]]]

puts stderr "client.tcl EXIT"
exit 0
# End of file
# ---------------- snip 1 ------------------

# ---------------- snip 2 ------------------
#!/usr/bin/tclsh --
proc debug {msg {flag 1}} {
if {$flag} { puts stderr $msg }
}

# when to finish
set shutdown_timer ""
proc check_shutdown {} {
after cancel $::shutdown_timer
foreach {conn active} [array get ::connections] {
if {$active ne 0} {
# connection still open, check again in 1 second
set ::shutdown_timer [after 1000 check_shutdown]
return
}
}
debug "check_shutdown: no connections pending any more, shutdown"
set ::forever "is now"
}

proc start_server {} {
set ::server_socket [socket -server accept -myaddr localhost 0]
debug "start_server: [chan configure $::server_socket]"
exec tclsh client.tcl {*}[lrange [chan configure $::server_socket -sockname] 1 2] 2>@stderr
}

proc accept {sock host port} {
fconfigure $sock -blocking 0
fileevent $sock readable [list handle_input $sock]
set ::connections($sock) $sock
list_connections "accepted connection from $host/$port => $sock"
check_shutdown
}

proc handle_input {sock} {
lappend ::contents($sock) [gets $sock]
if {[eof $sock]} {
close $sock
set ::connections($sock) 0
list_connections "EOF on $sock"
}
}

proc list_connections {msg} {
debug "==============\n$msg"
debug "[array size ::connections] connections"
foreach {conn active} [array get ::connections] {
debug "$conn => $active"
}
debug "=============="
}

proc summary {} {
parray ::contents
}

start_server
vwait forever
summary

# End of file
# ---------------- snip 2 ------------------

Markus Elfring

May 22, 2019, 7:40:26 AM
> Since I have interest in the topic, I created a small test data provider
> in TCL (see end of message, snip 1).

Thanks for your contribution.


> In the default version it processes everything as quickly as possible

This should usually happen.


> and shows another race condition in your code

I suggest to take another look at such a view.


> (the new version test-statistic-server8.tcl, with corrected lock scopes):
>
> Thread 1
> perform_command
> start thread 2
> wait for thread 2 to open the server socket and provide connection details
> (1) execute the data sender (client.tcl/socket-send_json_data.py)
> (2) call 'finish' on thread 2
>
> Thread 2
> open server socket
> notify thread 1 of connection details
> wait for connections
>
> when a connection comes in:
> call handle_input
> start thread 3
> (3) after 0 [list hand_over $t $so]
>
> Thread 3:
> define proc handle_connection
> wait for events
> The next event on thread 3 should be the transfer of the channel by
> thread 2 and the call of handle_connection in proc hand_over.

This is also a helpful control flow description.


> Now if (1) finishes very quickly (since we have next to no delay in the
> data sender), we call finish on Thread 2 at (2) very quickly, too.

The execution of the test data provider (as a child process) will take
a moment. The procedure “finish” is called by the socket management thread
only after this external command was executed by the main thread, isn't it?


> Thread 2 will handle this 'finish' request when it enters the event loop,

It should wait on a corresponding termination condition after test data
were received.


> which it does after scheduling the call to hand_over at (3) when
> handling the first socket request.

Data import should usually happen here.


> Now we have several events pending in Thread 2:
> - two more incoming socket requests
> - one thread::send call to "proc finish"
> - one "after 0"-call to hand_over from the first socket request.

My expectation was that the needed worker threads were created (in the meantime).


> Which of these is handled first depends on the internal handling of the
> event queue in TCL

I find it relevant that the thread handles will be completely stored in
the variable “workers” for a test run.


> (what is checked first: sockets? TCL thread events? plain 'after' events?).
>
> *If* finish is entered first (which it does on my computer),

This is possible.


> thread 2 closes the server port

I wonder about this information.


> and then blocks in the cond-waiting.

This is expected.


> It does not handle the pending socket calls any more.

I would find this detail unexpected.
Shouldn't data from the test TCP connections still be processed appropriately?

Regards,
Markus

Ralf Fassel

unread,
May 22, 2019, 9:40:58 AM5/22/19
to
* Markus Elfring <Markus....@web.de>
| > Now if (1) finishes very quickly (since we have next to no delay in the
| > data sender), we call finish on Thread 2 at (2) very quickly, too.
>
| The execution of the test data provider (as a child process) will take
| a moment. The procedure “finish” is called by the socket management thread
| only after this external command was executed by the main thread, isn't it?

Yes. But this only means that the external command has finished, not
that all the processing of the output of that command has already taken
place in the other threads. Especially with small one-liner output like
in my client.tcl, I expect lots of buffering in the TCP stack of the OS.

| > Thread 2 will handle this 'finish' request when it enters the event loop,
>
| It should wait on a corresponding termination condition after test
| data were received.

Well, "it should". Making it actually do so is the challenge in multi-threading :-)

| > Now we have several events pending in Thread 2:
| > - two more incoming socket requests
| > - one thread::send call to "proc finish"
| > - one "after 0"-call to hand_over from the first socket request.
>
| My expectation was that the needed worker threads were created (in the meantime).

Verify that expectation by adding stderr output in the thread creation
code, right at the beginning and immediately before the end of procs.
You will see that this expectation is not always fulfilled.
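
For illustration, one way to do that without editing every proc body is an
execution trace that logs entry and exit on stderr together with the thread
id. This is just a debugging sketch; the proc names below are the ones from
the discussed script and are assumptions on my side (thread::id is available
because the script already loads the Thread package):

proc log_call {name args} {
    # the trace callback receives the operation (enter/leave) as its last argument
    puts stderr "[thread::id] $name: [lindex $args end]"
}
# e.g. in each thread, after the procs have been defined:
foreach p {handle_connection hand_over finish} {
    trace add execution $p enter [list log_call $p]
    trace add execution $p leave [list log_call $p]
}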

| > (what is checked first: sockets? TCL thread events? plain 'after' events?).
| >
| > *If* finish is entered first (which it does on my computer),
>
| This is possible.

Ok...

| > thread 2 closes the server port
>
| I wonder about this information.

proc finish {} \
{
global context hcc hcm lso sc sm workers
=> close $lso

Here.

| > It does not handle the pending socket calls any more.
>
| I would find this detail unexpected.
| Should data from test TCP connections still be appropriately processed?

How? The incoming sockets are handled in thread 2, but thread 2 now is
blocked in 'finish' waiting for

while {[set lw [llength ${workers}]] != [set lf [tsv::get ${context} finished]]} \

As I said before, add debugging output to the start and end of the procs
to see the sequence (or in this case missing sequence) of events.



Here are test runs on my computer with various delays in client.tcl:

delay -1, delay 0 (i.e. as quick as possible on the client side)
% tclsh8.6 test-statistic-server8.tcl
1. run
waiting: 127.0.0.1 localhost 42918 ::1 localhost 42918
client.tcl STARTED localhost 42918
client.tcl Openend 3 connections: sock4 sock5 sock6
client.tcl Content sent to sock4 sock5 sock6
client.tcl Connections closed: sock4 sock5 sock6
client.tcl EXIT
tid0x7f6de7c5a700 finish cond wait
1 workers: tid0x7f6de6bb8700
0 finished
[blocks]


"delay 1" (i.e. wait a teeny weeny bit)
% tclsh8.6 test-statistic-server8.tcl
1. run
waiting: 127.0.0.1 localhost 41902 ::1 localhost 41902
client.tcl STARTED localhost 41902
client.tcl Openend 3 connections: sock4 sock5 sock6
client.tcl Content sent to sock4 sock5 sock6
tid0x7f735b7ae700 handle_connection cond wait
client.tcl Connections closed: sock4 sock5 sock6
client.tcl EXIT
tid0x7f735af5d700 handle_connection cond wait
tid0x7f7360b8d700 finish cond wait
3 workers: tid0x7f735b7ae700 tid0x7f735af5d700 tid0x7f735a70c700
0 finished
tid0x7f735b7ae700 handle_connection: cond notify
tid0x7f7360b8d700 finish cond wait
3 workers: tid0x7f735b7ae700 tid0x7f735af5d700 tid0x7f735a70c700
1 finished
tid0x7f735af5d700 handle_connection: cond notify
tid0x7f7360b8d700 finish cond wait
3 workers: tid0x7f735b7ae700 tid0x7f735af5d700 tid0x7f735a70c700
2 finished
[blocks]

"delay 10" (slooooow, give everyone time to do their job)
% tclsh8.6 test-statistic-server8.tcl
1. run
waiting: 127.0.0.1 localhost 42383 ::1 localhost 42383
client.tcl STARTED localhost 42383
client.tcl Openend 3 connections: sock4 sock5 sock6
client.tcl Content sent to sock4 sock5 sock6
tid0x7f01df00b700 handle_connection cond wait
tid0x7f01de7ba700 handle_connection cond wait
tid0x7f01ddf69700 handle_connection cond wait
client.tcl Connections closed: sock4 sock5 sock6
client.tcl EXIT
tid0x7f01e00ad700 finish cond wait
3 workers: tid0x7f01df00b700 tid0x7f01de7ba700 tid0x7f01ddf69700
0 finished
tid0x7f01de7ba700 handle_connection: cond notify
tid0x7f01e00ad700 finish cond wait
3 workers: tid0x7f01df00b700 tid0x7f01de7ba700 tid0x7f01ddf69700
1 finished
tid0x7f01ddf69700 handle_connection: cond notify
tid0x7f01e00ad700 finish cond wait
3 workers: tid0x7f01df00b700 tid0x7f01de7ba700 tid0x7f01ddf69700
2 finished
tid0x7f01df00b700 handle_connection: cond notify
2. run
waiting: 127.0.0.1 localhost 41083 ::1 localhost 41083
client.tcl STARTED localhost 41083
client.tcl Openend 3 connections: sock4 sock5 sock6
client.tcl Content sent to sock4 sock5 sock6
tid0x7f01e00ad700 handle_connection cond wait
tid0x7f01df00b700 handle_connection cond wait
tid0x7f01ddf69700 handle_connection cond wait
client.tcl Connections closed: sock4 sock5 sock6
client.tcl EXIT
tid0x7f01dd718700 finish cond wait
3 workers: tid0x7f01e00ad700 tid0x7f01df00b700 tid0x7f01ddf69700
0 finished
tid0x7f01e00ad700 handle_connection: cond notify
tid0x7f01dd718700 finish cond wait
3 workers: tid0x7f01e00ad700 tid0x7f01df00b700 tid0x7f01ddf69700
1 finished
tid0x7f01df00b700 handle_connection: cond notify
tid0x7f01dd718700 finish cond wait
3 workers: tid0x7f01e00ad700 tid0x7f01df00b700 tid0x7f01ddf69700
2 finished
tid0x7f01ddf69700 handle_connection: cond notify
3. run
waiting: 127.0.0.1 localhost 45478 ::1 localhost 45478
client.tcl STARTED localhost 45478
client.tcl Openend 3 connections: sock4 sock5 sock6
client.tcl Content sent to sock4 sock5 sock6
tid0x7f01dd718700 handle_connection cond wait
tid0x7f01ddf69700 handle_connection cond wait
tid0x7f01df00b700 handle_connection cond wait
client.tcl Connections closed: sock4 sock5 sock6
client.tcl EXIT
tid0x7f01de7ba700 finish cond wait
3 workers: tid0x7f01dd718700 tid0x7f01ddf69700 tid0x7f01df00b700
0 finished
tid0x7f01ddf69700 handle_connection: cond notify
tid0x7f01de7ba700 finish cond wait
3 workers: tid0x7f01dd718700 tid0x7f01ddf69700 tid0x7f01df00b700
1 finished
tid0x7f01df00b700 handle_connection: cond notify
tid0x7f01de7ba700 finish cond wait
3 workers: tid0x7f01dd718700 tid0x7f01ddf69700 tid0x7f01df00b700
2 finished
tid0x7f01dd718700 handle_connection: cond notify
incidence|"available records"|"return code"|"command output"
3|3|0|

Seems to work most of the time, but sometimes I also get
2|1|0|
1|1|0|
or similar, i.e. even the slow sender is not reliable, but I think this
is what your original question was about.

Fix the fast processing which now blocks, then the slow one should be
fixed, too.

<joke>
All of this will be much easier when TIP 131 is finally implemented.
https://core.tcl.tk/tips/doc/trunk/tip/131.md
Don't hold your breath, though...
</joke>

HTH
R'

Markus Elfring

unread,
May 22, 2019, 12:15:08 PM5/22/19
to
> But this only means that the external command has finished,
> not that all the processing of the output of that command already has taken
> place in the other threads.

I was looking for more software support in this direction.


> | It should wait on a corresponding termination condition after test
> | data were received.
>
> Well, "it should". Making it actually do so is the challenge in multi-threading :-)

I agree with this view.


> | Should data from test TCP connections still be appropriately processed?
>
> How? The incoming sockets are handled in thread 2, but thread 2 now is
> blocked in 'finish' waiting for
>
> while {[set lw [llength ${workers}]] != [set lf [tsv::get ${context} finished]]} \

It seems that such a termination condition was too optimistic and
therefore incomplete.

* I can try moving the connection management code to another TCL thread
(so that this task will also be separately executed).

* How can it be determined via TCL programming interfaces whether any received
  test data are still waiting in network channel queues? (A minimal sketch of
  what I have in mind follows below.)
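
A minimal sketch of what I have in mind (my own guess, using [chan pending];
I am not sure it covers the OS-level queues):

proc input_pending {sock} {
    # [chan pending input] reports only bytes already buffered inside TCL's
    # channel, apparently not data still queued in the OS TCP stack.
    expr {[chan pending input $sock] > 0}
}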


> Seems to work most of the time, but sometimes I also get
> 2|1|0|
> 1|1|0|
> or similar, i.e. even the slow sender is not reliable, but I think this
> is what your original question was about.

Yes.

Better understanding is evolving for this software situation.

Regards,
Markus

Ralf Fassel

unread,
May 22, 2019, 12:38:45 PM5/22/19
to
* Markus Elfring <Markus....@web.de>
| > The incoming sockets are handled in thread 2, but thread 2 now is
| > blocked in 'finish' waiting for
| >
| > while {[set lw [llength ${workers}]] != [set lf [tsv::get ${context} finished]]} \
>
| It seems that such a termination condition was too optimistic and
| therefore incomplete.
>
| * I can try out to move the connection management code to another TCL thread
| (so that this task will also be separately executed).
>
| * How should be determined by TCL programming interfaces if any received test
| data are still in network channel waiting queues?

I would think that involving yet another thread would not change the
underlying problem, which is IMHO:

- after thread 1 has successfully exec'd the client, the server needs to
wait for all of the incoming data to be processed.

- But in general you don't know how much data will come in, how many
connections will be established etc. (In this specific case you know
how many there will be, but in general you don't know.)

- So you can't say for sure "this is the last connection to come in, now
we can close down the server and wait for the workers (thread 3s) to
do their job".

- so if you call 'finish' too early, you will miss some connections; if
  you call it too late, you wait unnecessarily long.

E.g. is a 5-second wait enough? One should think that after 5 seconds
all of the processing should be done, until one day the machine is
heavily loaded and your threads 2 and 3 don't run for several seconds.

So I guess that the best you can do here is to say: I will wait for at
most N seconds for answers, and everything after that will be discarded.

E.g. if I insert

puts stderr "[thread::id] waiting for 1 second before calling $s {finish}..."
after 1000
puts stderr "[thread::id] now calling $s {finish}"

right before

thread::send $s {finish}

in proc perform_command, I always (well... :-) get

3|3|0|

Or refine the closing down of the socket in thread 2, e.g. in "proc
finish" set up a timer to shut down thread 2 only if no new connection
comes in for a certain time. This is the approach I had chosen in the
non-threaded example.
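
A rough sketch of that pattern, in the style of the check_shutdown timer from
the non-threaded example (variable and proc names are placeholders; in the
threaded version do_shutdown would have to release thread 2's wait instead of
setting a plain vwait variable):

set ::idle_timer {}

proc arm_idle_timer {ms} {
    # (re)start the idle period whenever a new connection comes in
    after cancel $::idle_timer
    set ::idle_timer [after $ms do_shutdown]
}

proc do_shutdown {} {
    # no new connection arrived within the idle period:
    # close the listener and finish
    close $::listen_socket
    set ::forever "is now"
}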

HTH
R'

Markus Elfring

unread,
May 22, 2019, 2:33:42 PM5/22/19
to
> I would think that involving yet another thread would not change

The extra thread would not execute a condition-wait loop for finishing a test
run, so more TCL event processing could probably proceed as usual.


> the underlying problem, which is IMHO:
>
> - after thread 1 has succesfully exec'd the client, the server needs to
> wait for all of the incoming data to be processed.

This is intended for the shown test example.


> - But in general you don't know how much data will come in, how many
> connections will be established etc.

The amount of available data can vary. An upper limit is occasionally known.


> - So you can't say for sure "this is the last connection to come in, now
> we can close down the server and wait for the workers (thread 3s) to
> do their job".

It was attempted here because the end of the test data provider is known
in the discussed test combination.

It might be better to switch to another network communication interface.
The tolerance for response delays can also vary.

Regards,
Markus