In short: I ran into serious problems while developing a multithreaded
Tcl/Tk application in C. After spending (literally!) weeks on debugging
and testing, I believe it comes down to Tcl/Tk bugs in multithreaded
mode. I tried to find something on the net about such problems, but
without success :-( I'd be very grateful if someone could confirm
that, or perhaps point out what I'm doing wrong. Here's a detailed
description:
I have a Win32 C++ application (a SCADA/HMI program) which may contain
several (from zero to literally hundreds) independent Tcl/Tk
"modules". Module is my own term here; it boils down to a separate
thread with its own Tcl interpreter. I also create a "container
window" in the module's thread and initialize Tk in this interpreter (via
Tk_Init) so that the "container window" becomes the parent window for this
Tk instance's "main window" (this is done by setting the global "argv"
variable to "-use container_id"). After initialization, the thread loads
a script from a file into the interpreter and enters a fairly standard
event loop, processing both Windows messages and Tcl events. Some of the
messages (defined by RegisterWindowMessage) are used to carry
commands from other threads: when I receive such a message I call some
Tcl function in the interpreter, with the usual Tcl_Eval( "functionName" )
call. This code in turn does some Tcl work, like setting the GUI up or
displaying information for the user; sometimes it calls back into C++
code by means of our own commands (created by Tcl_CreateObjCommand), etc.
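The "commands carried by messages to a module thread" arrangement described above can be sketched in portable C++. This is only an illustration under stated assumptions: the class name Module, its post/stop methods and the handler callback are hypothetical, a std::queue stands in for the Win32 message queue, and the handler stands in for the Tcl_Eval call on the module's own interpreter.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

// Hypothetical stand-in for one "module": a thread with a private inbox.
// In the real application the inbox would be the Win32 message queue and
// the handler would evaluate a Tcl function in the module's interpreter.
class Module {
public:
    using Handler = std::function<void(const std::string&)>;

    explicit Module(Handler handler)
        : handler_(std::move(handler)), worker_([this] { run(); }) {}

    ~Module() { stop(); }

    // Called from any thread: queue a command name for the module thread.
    void post(std::string command) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            inbox_.push(std::move(command));
        }
        ready_.notify_one();
    }

    // Let the module finish the commands already queued, then join it.
    void stop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        ready_.notify_one();
        if (worker_.joinable()) worker_.join();
    }

private:
    // The module's event loop: wait for a command, run it, repeat.
    void run() {
        for (;;) {
            std::string command;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return stopping_ || !inbox_.empty(); });
                if (inbox_.empty()) return;  // stopping and nothing left to do
                command = std::move(inbox_.front());
                inbox_.pop();
            }
            handler_(command);  // always executed on the module's own thread
        }
    }

    Handler handler_;
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::string> inbox_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so it starts after the other members
};
```

Other threads would call something like module.post("functionName"); the handler then runs only on the module's thread, so the interpreter is never touched from outside the thread that created it.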
All in all, it boils down to one (sometimes quite big) Win32 process
containing many threads, each with a Tk-enabled Tcl interpreter.
Interpreters react to events, such as GUI commands, timer
events or other application-specific triggers. They process data,
display it, you get the point. Our company has been using this
functionality for some time now (three or four years, I guess), so the
code is pretty stable and well-tested. We started with quite simple and
small configurations (like, say, five scripts with some simple
functionality), and recently moved on to really big ones (more than
100 scripts with varied and sophisticated functionality, TclHttpd
among them). During that time I found and fixed many bugs in our code;
I also tried to treat Tcl/Tk nicely and carefully, so as not to run into
problems. So far so good...
Recently, we found out that some of our configurations _sometimes_
crash on _some_ machines when starting up. A really nasty bug,
because it appears rarely. After months of narrowing the case down,
we found a setup and a machine that allowed for debugging. What I
found out is this: when there are multiple (ten or more) Tcl/Tk
threads starting and creating their GUIs (using TkTable), Tcl code
called by Tcl_Eval( "functionName" ) _sometimes_ throws unknown
exceptions (the kind you can catch only with a catch( ... ) clause). What
is important is that either no exception is thrown, or all interpreters
throw it, which would suggest that this is due to some conflicting
global state. Moreover, the state of each interpreter after throwing such
an exception is unstable, and if I try to call Tcl_DeleteInterp( interpreter ),
it will call "abort" internally, thereby ending the whole process.
So, my guess is as follows: multiple Tk runtimes using TkTable, even
if assigned to independent interpreters in independent threads using
independent windows and event loops, are not completely safe.
Sometimes (quite rarely) something bad happens (this would suggest a
race condition) and Tk code crashes in one thread, causing the other
threads to crash, too :-(
My questions are:
- What do you think of this interpretation? Am I right?
- If I'm right, what can I do to remedy the problem? I'm using Tcl/Tk
8.4.13; I have read that Tk is being improved (in terms of
thread-awareness) in the current beta (8.5?). Should I upgrade to the
latest beta release or perhaps to the latest "stable" one?
- I'm using TkTable 2.9; is there a newer (unofficial beta?)
version available?
I'm really looking forward to help from someone savvy. I have used Tcl/Tk
from C++ code for some ten years, and never encountered a problem I
couldn't fix on my own...till now. Any help will be appreciated.
Best regards
Michal
Well, you will have to somehow narrow down the conditions to reproduce
the problem, but I recently wasted an entire day on a nasty bug. It
turned out the shared library found first by the process wasn't
compiled with thread support. This doesn't cause an immediate crash,
only one under certain conditions. Anyway, when I hear threads I now
think: do everything from scratch in a private directory. Even if you
don't get a crash, you may end up with corruption of important data.
Anyway, once again: with threads, don't trust any existing build.
There was also a recent change in build flags, where -nostartfiles
gets left off, but this usually causes the build to fail (and maybe
only on x86_64).
Basically there is a crash bug in Tcl Threads that can come up
occasionally when a mutex or condition/signal is freed.
vasiljevic has made a patch that I think should be in 8.5b2 but I
haven't taken a look at it yet. I'm pretty confident that my patch
works and has fewer performance implications (see the file history on
that bug report).
Also pay attention to tom.rmadilo's advice - ensure that you compile
everything from scratch and with the same flags, especially if you're
running under Windows (don't mix libc/libcmt/msvcrt, etc, and don't
mix release and debug libraries).
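One way to double-check the runtime-library point is to have each binary you compile yourself report which C runtime it was built against. A minimal sketch, assuming MSVC: crtFlavor is a hypothetical helper name (not part of Tcl, Tk or Win32), and it relies only on the _MSC_VER, _MT, _DLL and _DEBUG macros that MSVC predefines from the /MT, /MTd, /MD and /MDd options.

```cpp
#include <string>

// Describe the C runtime flavour this translation unit was compiled against.
// On any compiler other than MSVC the function simply says so.
std::string crtFlavor() {
#if defined(_MSC_VER)
  #if defined(_DLL)
    std::string flavor = "multithreaded DLL";     // /MD or /MDd
  #elif defined(_MT)
    std::string flavor = "multithreaded static";  // /MT or /MTd
  #else
    std::string flavor = "single-threaded";       // neither _MT nor _DLL set
  #endif
  #if defined(_DEBUG)
    flavor += " (debug)";
  #else
    flavor += " (release)";
  #endif
    return flavor;
#else
    return "not built with MSVC";
#endif
}
```

Compiling the same function into the application and into each extension, and logging the result at startup, makes a mismatch visible. It only covers binaries it is compiled into, so it complements rather than replaces rebuilding everything from scratch.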
Regards,
Twylite
To me, the recommended idiom in multi-threaded Tk is to have (N-1)
totally non-Tk threads communicating with 1 thread hosting Tk. From
your description it seems the threads are individually issuing
synchronous Tk commands; if it is the case, it would be wise to remove
[package require Tk] from the N-1 and delegate all GUI-reconfig
commands to the central one.
-Alex
Alex labels it "wise"; I judge it "necessary". It's essentially
infeasible to have multiple threads issuing Tk commands.
First of all, thank you for your time and patience :-)
@tom.rmadilo
I'm pretty sure that I've compiled everything with the same
([debug] multithreaded DLL) settings. Also, I know the typical symptoms
of mixing libraries compiled with different settings, and I believe this
is not the case. Of course I'm compiling everything myself, including
Tcl, Tk, my application and TkTable.
@Twylite
Well, I probably didn't make myself clear enough. I'm not using the Tcl
Thread package at all. I'm creating a new Win32 thread in C++ code and
then building the windows, creating the Tcl interpreter, enabling Tk,
etc. in this very thread.
@Alex
I know this idiom as well. But in essence it applies to one
application using multiple threads and one common GUI. That is not
the case here, as the GUIs are supposed to be completely independent:
different threads, event loops, windows, interpreters, etc. For the
user's convenience they're visually "contained" in one top-level window,
plus they all sit in one process, and that's all. This is not a typical
multithreaded application with one common GUI; it's more like an
"operating system" for many threads running Tcl/Tk code, with separate
GUIs for each of them.
@Cameron
"Infeasible" you say. Well, then I'm glad that I didn't know this when
I started programming :-). Because it has worked for us for some four
years now, flawlessly, unless we use Tk _with_ TkTable.
Based upon what I see in the sources, the Tcl and Tk authors have made
great progress when it comes to making Tcl/Tk thread-safe (with careful
use of threads, that is). Things that used to be in global variables and
such were moved to thread-local storage. Certain problems (such as
global variables in Tk's canvas implementation) were resolved not so
long ago. I suspect a similar (but more subtle) problem in the Tk _or_
TkTable implementation...can anybody confirm this?
Regards
Michal
I re-read your post and then the discussion related to the issue you
pointed out. Sorry, I guess I'm just tired (this was a long day). I hope
that you're right. Anyway, I plan to apply the patch described and repeat
my tests. Thanks for the valuable suggestion; I hope it will help
(fingers crossed :-) ).
Michal
What's the purpose of having independent event loops ?
The windowing system's events (mouse, keyboard, repaints) get
serialized and routed anyway.
So as long as nothing blocks any of the event loops, there could very
well be just a central one with no visible or programmatic
difference...
> For users convenience they're visually "contained" in one top-level window, plus
> they all sit in one process - and that's all.
If they are in one toplevel, you must be using the -use flag.
Then, you could also do this from separate processes.
So why insist on all these being threads, if they are so independent ?
> This is not typical
> multithreaded application with one common GUI, it's more like
> "operating system" for many threads running Tcl/Tk code, with separate
> GUIs for each of them.
Yes, but for a couple of years now "operating systems" have been using
address space boundaries between the individual components, calling
them "processes". Starting to see why ?
-Alex
Well, in Win32 each thread with its own windows has to have its own
message (event) loop. And that makes sense, if you ask me, because it
allows separate threads to have independent GUIs.
> > For users convenience they're visually "contained" in one top-level window, plus
> > they all sit in one process - and that's all.
>
> If they are in one toplevel, you must be using the -use flag.
> Then, you could also do this from separate processes.
> So why insist on all these being threads, if they are so independent ?
One-word answer: performance. Communication between processes in the
system (at least in Win32, but I believe this is the case in most OSes)
is _slow_. If you want to have many (on the order of hundreds) connected
modules working together and exchanging information, you're better off
using threads. The process-based predecessor of our current system was
really slow, and that was with coarser task granularity.
>
> > This is not typical
> > multithreaded application with one common GUI, it's more like
> > "operating system" for many threads running Tcl/Tk code, with separate
> > GUIs for each of them.
>
> Yes, but for a couple of years now "operating systems" have been using
> address space boundaries between the individual components, calling
> them "processes". Starting to see why ?
Yeah, and people made concurrent systems using processes, and found
out that processes are heavy and slow when it comes to interoperating,
so "lightweight processes", a.k.a. "threads" were invented. Let me
tell you: they have their purposes, too :-)
Best regards from snowy Krakow, Poland
Michal
Yes, but in my experience the sets of constraints leading to the two
alternatives are pretty much exclusive:
- fine-grained intermodule communication and/or task switching demand
threads, to avoid respectively I/O and address-space change (MMU
reprogramming) overheads.
- more independent things and/or slow event streams like GUI are
perfectly content with processes.
It is clear you know what you're doing; but just for my culture I'd be
curious to know how you get to be doing fine-grained intertask
transfers and switches between independent GUI apps.
-Alex
The system (called ANT Insight; the webpage, www.ant-iss.com, is in
Polish, I'm afraid :-( ) is intended to be a Swiss Army knife for
industrial automation monitoring. Usually, everything (from simple
electric meters to big production machines) in modern industrial
plants (power plants, automobile factories, breweries, etc.) has its
own electronic interface. Trouble is, they usually adhere to different
standards (MODBUS, M-BUS, ProfiBus, to name only a few) or to no
standard at all (it seems to be, for example, the compressor
manufacturers' delight to come up with a new and completely different
communication protocol). Of course the protocol is only one of the
problems; you also have different interconnects (RS-232, RS-485, TCP/IP,
TCP/UDP, etc.), various databases, and so on.
So, should one want to write code that handles _all_ possible cases,
one would get lost among thousands of specific combinations, e.g.
MODBUS over TCP/IP being something completely different from both MODBUS
over RS-485 and, say, MODBUS over a GPRS modem. And we're quite a small
company. So we sliced the functionality up into the smallest chunks
possible (we call them modules); for example we have a "MODBUS driver",
which doesn't care where bytes come from and where they go, it just sends
them "out", or gets them "in". We have several modules for various types
of connections, like "serial" or "tcp" or "udp", or...you get the point.
Now where does Tcl/Tk come in? Well, one cannot write C drivers for all
the strange protocols or different visualization needs that one can
encounter in today's industry. So we have "popular", high-performance
code written in C, and a generic "Tcl/Tk" module for handling special
cases. And please note that in a big factory you can get "many special
cases", for example, one hundred modules of type "WeirdMeter1", and
several tens of types "WeirdMeter2" and "WeirdMeter3". Or perhaps someone
wants to have a special graphic report (this is where Tk comes in handy)
or control. Now, the person who does "integration" (configuration of the
software in the field) is usually an electrical engineer, and not a
software specialist like you. You said "idiom for thread programming". He
doesn't know what a thread is! It's pretty easy for him to write a small,
isolated module and then make one hundred instances of such a module.
It'd be pretty hard for him to create a big module handling everything,
using I/O multiplexing or worker threads to divide attention among
all peers. Now he can safely block one thread (waiting for I/O, for
example) and all the other modules (Tcl/Tk interpreters in different
threads) keep running independently.
And that's why we did it. Based on our experience (we had some successes
where companies using "serious" software by General Electric failed),
it kind of works :-) Hope that answers your question.
Regards
Michal
Thanks.
> [...] Now, person which does "integration" (configuration of
> software in the field) is usually electric engineer, and not software
> specialist like you. You said "idiom for thread programming". He
> doesn't know what thread is! It's pretty easy for him to write small,
> isolated module and then make one hundred instances of such module.
> It'd be pretty hard for him to create big module handling everything,
This is a very valid argument for building small independent things,
so it answers the question "why not a single GUI thread". But it
doesn't at *all* address the "processes vs threads" issue. The same
electric engineer will be using the very same primitives whether in a
multithreaded program or not (I assume you encapsulated all intertask
IO code).
So, again, what kind of fine-grained hi-speed intertask IO do you
have ?
-Alex
If the factory could afford another computer, I would set one up that
didn't run any Tk/GUI, but just received data and pushed it out as tcp
to the gui application using either a very simple protocol, or if
signals come in from many locations, a messaging protocol like HTTP
POST. HTTP POST could have an advantage, but either way you establish
an address for each monitored GUI, and a thread which runs the module
code handles the data.
Personally I just hope that real time control isn't taking place
inside the GUI application. That is the first module I would break out
of any application.
There are so many modules that you don't really have to have "hi
speed" IO. When we started development, a 1 GHz PIII CPU (if I
remember well) was considered top-notch. I ran a simple test on such a
machine: two C processes, A calling B using Win32 RPC mechanisms. B's
method returned immediately after being called. I got some 4-5
thousand calls per second, and that used up 100% CPU. Now let us
assume that we have 100 MODBUS drivers in separate processes, 100 Tcl
windows in separate processes controlling them, and 100, say, TCP/IP or
virtual serial port drivers. And that we want to read information
(within one second) from the MODBUS devices, where the whole "read"
process needs an exchange of, say, 10 messages with their respective
replies. That way you get (100+100+100)*10=3000 RPC calls, and some 70%
of CPU time is spent switching processes. Not good. Our software does
such a task and you hardly see _any_ CPU usage at all, simply because it
isn't wasting cycles.
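The back-of-the-envelope estimate above can be written out explicitly. This is just the arithmetic from the post, not a measurement; the function names rpcCallsPerSecond and cpuShare are made up for the illustration, and the 4000 to 5000 calls per second saturation figure is the rough number quoted for the old PIII test.

```cpp
// Messages exchanged per second if every module lives in its own process.
// The inputs are the round figures quoted in the post.
constexpr int rpcCallsPerSecond(int modbusDrivers, int tclWindows,
                                int transportDrivers, int messagesPerRead) {
    return (modbusDrivers + tclWindows + transportDrivers) * messagesPerRead;
}

// Fraction of one CPU consumed, given the call rate that saturates it.
constexpr double cpuShare(int callsPerSecond, int saturatingCallsPerSecond) {
    return static_cast<double>(callsPerSecond) / saturatingCallsPerSecond;
}

// 100 drivers + 100 windows + 100 transports, 10 messages per read.
static_assert(rpcCallsPerSecond(100, 100, 100, 10) == 3000,
              "matches the (100+100+100)*10 figure in the text");
```

With saturation at 4000 calls per second, cpuShare(3000, 4000) gives 0.75; at 5000 calls per second it gives about 0.6. The "some 70%" in the post sits inside that range.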
Having said that, I'm sure one could do that "smarter" using
processes, for example by sending commands in batches, creating not "one
process per driver" but rather "one process per driver _type_",
buffering, using better RPC mechanisms, etc. I'm not trying to say that
my way is the only way to go. It's just the one I came up with, made
work, and like, after all ;-). Yes, it has its drawbacks. Other methods
have drawbacks, too. It's a matter of taste...what appeals more to
whom :-)
@tom.rmadilo
The factory could afford another computer (or ten, should they need them),
but as a system grows bigger it becomes more complex and error-prone.
Personally I like "lightweight" solutions and have found that they work
for me. Most of the time, that is ;-)
Real-time control usually isn't taking place on a PC. That's
what PLCs are for. They're rugged, fail-safe, simple (=reliable) and
real-time. But aside from control, in any modern plant there is more
and more to do in the "monitoring" department. For example, collecting
information for "traceability" purposes or for accounting and such.
That can be one huge SQL database, believe me (and databases
burn the most CPU cycles in my company's typical scenario).
Do 100 event loops serving 100 toplevels (even embedded in one larger
true toplevel) really make sense ?
I can well understand that hundreds of components may have to talk to
other hundreds of components. I just can't believe that the same may
simultaneously have something meaningful to say to or to hear from an
individual GUI. That's why I keep thinking your best bet would be to:
- keep your hundreds of threads all talking to each other through
efficient semaphore-based ITC
- have a single GUI thread (or just a few if you insist, but not
hundreds)
- encapsulate your formerly direct Tk calls from the individual
modules so that the "electric engineers" don't ever know that in fact
they are delegating.
-Alex
When you remove tktable from the equation, the problem is no longer
repeatable, and/or completely absent?
> My questions are:
> - What do you think of such interpretation? Am I right?
If your interpretation is right, it would be a bug in Tk and/or Tktable.
That is certainly possible.
> - If I'm right, what I can do to remedy the problem? I'm using Tcl/Tk
> 8.4.13, I have read that Tk is being improved (speaking in the terms
> of thread-awareness) in current beta (8.5?). Should I upgrade to
> latest beta release or perhaps to latest "stable" one?
You should at least upgrade to the latest stable and try and repeat the
problem. MT bugs do crop up and get squashed between releases. I see a
few MT related issues addressed since 8.4.13 specifically.
> - I'm using (2.9 version) of TkTable, is there newer (unofficial beta
> version?) available?
There is no newer code for tktable that would address this issue. There
are no obvious uses of globals in the tktable C code, so I don't know
why it would be the problem, though it might call core code that
triggers the issue.
BTW, as to using Tk in multiple independent threads ... it *should* now
work. This was not the case until 8.4.something, and several subtle
bugs had to be addressed along the way to get there.
Jeff
> When you remove tktable from the equation, the problem is no longer
> repeatable, and/or completely absent?
Absent. Repeatability of the problem is quite poor: you have to reload
the configuration some ten or twenty times for the bug to occur. Mind
you that loading and starting one configuration takes some five minutes
(this is a pretty obsolete machine). When we used better (faster)
computers, especially SMP and/or multicore ones, we weren't able to
reproduce the problem at all.
Having said that, it's possible that it's not TkTable, but the amount of
(script) work it takes to set it up correctly. These modules are
completely TkTable-based, so removing TkTable from the equation
leaves them doing pretty much nothing :-(
> You should at least upgrade to the latest stable and try and repeat the
> problem. MT bugs do crop up and get squashed between releases. I see a
> few MT related issues addressed since 8.4.13 specifically.
Did that. Unfortunately, the problem persists. I have the impression that
crashes are even rarer than before, but I managed to crash the version
using 8.4.16 twice during the last two days (it took me some time to
switch to this version). It all ends up with a panic called from
DeleteInterpProc, with the message "DeleteInterpProc called with active
evals". And I know for sure that it isn't my code that throws that
"fatal" exception (whose handler, in turn, calls DeleteInterp).
> BTW, as to using Tk in multiple independent threads ... it *should* now
> work. This was not the case until 8.4.something, and several subtle
> bugs had to be addressed along the way to get there.
Yeah, since I have used Tcl/Tk in MT more and more over the years, I've
seen how much work went into this. Basically, whenever I found the
"next" bug, it was already described as fixed, and all there was to it
(for me) was to download the next stable version. I really appreciate
the hard work of everyone involved (this is especially hard, as
MT-related bugs are the nastiest and hardest to track down that I know).
I'm going to try the newest beta then, and if that fails...well, I guess
I'm in real trouble :-( Or perhaps I'll have to look for where in Tcl
this exception comes from...we'll see.
@Alex
> I just can't believe that the same may simultaneously have something meaningful to say to or to hear from an individual GUI
Well, in Tcl an event loop does not imply "GUI". You have to have an
event loop in order to use "fileevent" or "after", for example. Besides,
what's wrong with having an event loop for each thread? I usually think
of an event loop (especially when built using the "Command" design
pattern) as a way of communicating with a thread: simple, elegant and
easy to extend.
Having said that, the design you pointed out is probably less "brute"
than mine, more elegant, and should I ever be doing this again, I would
consider using a similar approach. However, for now I have an application
that took some fifteen man-years of work and, besides bugs (that are
being worked on and eliminated gradually), just works. I doubt that the
best approach to a (nasty, I agree) threading bug would be to restart
everything from scratch (or even to rewrite any major part of the
system). Especially since I don't think I did anything fundamentally
wrong; there's no (as pilots say) no-go item on my checklist.
Well, I happen to know :-)
Of course there's nothing wrong with having many threads with evloops
for fileevent/after's, since that's what I suggested you to do in the
first place...
But earlier in the description you stated that you had many threads
all doing Tk calls on their own. *That* I believe is more dangerous:
even though it's supposed to work now, multi-threaded Tk is
intrinsically a nontrivial sync issue. Subtle bugs *might* remain, and
you *might* have stumbled upon one of them....
> Having said that, design you pointed out is probably less "brute" then
> mine, elegant, and should I ever be doing this once again, I consider
> using similar approach. However, for now I have application that took
> some fifteen man-years of work, and besides bugs (that are being
> worked on and eliminated gradually) - just works. I doubt that best
> approach to (nasty, I agree) threading bug would be to re-start
> everything from scratch (or even to re-write any major part of
> system). Especially that I don't think I did something fundamentally
> wrong, there's no (as pilots say) no-go item in my checklist.
You're the ultimate judge -- but do you realize how painless
"delegating" may be ?
proc delegate args {::thread::send $::gui $args}
# just add "delegate" in front of all your individual Tk commands
-Alex