Tcl script fails on some platforms

hrp...@cam.ac.uk

Jul 31, 2007, 4:32:50 AM

Hi

I've taken over looking after a relatively big TclTk application
(using [incr Tcl] & Tk + several other packages; it has around 60,000
lines and interfaces with a Fortran/C application with ~300,000 lines,
hence the small sample of code here - if you want more, just ask...),
and have come across a problem that only seems to affect a very small
proportion of users - and I'm stumped. I have had a look around the
web trying to find pointers but have so far been unsuccessful - so I
was hoping someone here could give me an idea of where to start
looking to get to the bottom of this problem.

The following code snippet works for me on any platform I care to try
it on and continues through to the rest of the script. When I say "any
platform", I've been running on Linux RHEL 4, SUSE 9.3, FC6, Ubuntu
7, Mac OS X 10.3 & 10.4 (both Intel and PPC in the latter case),
Windows XP, DEC Alpha Tru64 and possibly a couple of others that I've
forgotten. I have one user for whom it fails when running on FC6.

The installed versions of TclTk depend a bit on which platform/OS, but
are all 8.4.10 - 8.4.14, mostly the ActiveTcl distribution, but the
Tru64 version was built locally.

Does anyone have any ideas as to where I should start looking for an
answer to this? Has anyone come across anything similar?

Anyway, in file 1 I have (with various "puts" included to try to
track the problem) -

    # get a filename and location (as full path) from the user
    set l_image_file [.addImages get]
    puts "this is the image filename: $l_image_file"
    # If the user picked a file
    if {$l_image_file != ""} {
        puts "Okay, the image filename wasn't blank"
        # add the image to the current session
        $::session addImage $l_image_file
        puts "but the code doesn't seem to have reached here"
    }
} ;# end of the enclosing method body (earlier lines not shown)

In file 2 (where the ::session is defined and implemented) I have -

body Session::addImage { a_image_file } {
    puts "check that this is actually being called"
    # Check image isn't already present
    set l_image [Image::getImageByPath $a_image_file]
    # ... (rest of the method not shown)
}

This all looks relatively straightforward to me, and indeed, when I
run it on the boxes available to me I get the following output -
------------------------
this is the image filename: /nfs/ls1/harrygrp0/harry/test/camillo/hg_001.mar1600
Okay, the image filename wasn't blank
check that this is actually being called
but the code doesn't seem to have reached here
============
;-) note that the message in the last line is wrong since the code has
quite clearly "reached here"!

My user gets
-----------------------

this is the image filename: /home/webice/imosflm_test/hg_001.mar1600
Okay, the image filename wasn't blank
invalid command name ""
invalid command name ""
while executing
"$::session addImage $l_image_file"
(object "::.c" method "::Controller::addImages" body line 11)
invoked from within
"::.c addImages"
(in namespace inscope "::Controller" script line 1)
invoked from within
"namespace inscope ::Controller {::.c addImages}"
("uplevel" body line 1)
invoked from within
"uplevel \#0 $itk_option(-command)"
(object "::.c.tbf.images.aitb" method "::Toolbutton::execute" body line 42)
invoked from within
"::.c.tbf.images.aitb execute"
(in namespace inscope "::Toolbutton" script line 1)
invoked from within
"namespace inscope ::Toolbutton {::.c.tbf.images.aitb execute}"
invoked from within
".c.tbf.images.aitb.button invoke"
("uplevel" body line 1)
invoked from within
"uplevel #0 [list $w invoke]"
(procedure "tk::ButtonUp" line 22)
invoked from within
"tk::ButtonUp .c.tbf.images.aitb.button"
(command bound to event)
============

Mark Janssen

Jul 31, 2007, 4:50:35 AM

It seems that for your user the ::session variable is empty. I would
start by puts-ing that session variable and, if it is indeed empty,
trying to figure out why it is.
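A minimal sketch of that check in plain Tcl, assuming (as in the original post) that the object name lives in the global ::session. The proc name requireSession is hypothetical; it turns a missing or empty value into an explicit message instead of the cryptic 'invalid command name ""'.

```tcl
# Hypothetical helper (not part of the original application): return the
# session object's name, or raise a descriptive error if ::session is
# missing or empty. [info exists] and the eq operator exist in Tcl 8.4.
proc requireSession {} {
    if {![info exists ::session]} {
        error "::session has never been set"
    }
    if {$::session eq ""} {
        error "::session exists but is the empty string"
    }
    return $::session
}

# Usage at the failing call site would then be:
#   [requireSession] addImage $l_image_file
set ::session ""
if {[catch {requireSession} msg]} {
    puts "diagnostic: $msg"
}
```

The two cases are kept separate because they point to different causes: a variable that was never assigned versus one that was explicitly set to "".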

Mark

hrp...@cam.ac.uk

Jul 31, 2007, 5:06:00 AM

Sorry, I thought I'd done that in the lines -

puts "this is the image filename: $l_image_file"
# If the user picked a file
if {$l_image_file != ""} {
puts "Okay, the image filename wasn't blank"
# add the image to the current session
$::session addImage $l_image_file

addImage is surely the method, and l_image_file is the variable???

Mark Janssen

Jul 31, 2007, 7:05:14 AM

You are not setting the ::session variable here, you are referencing
it ($::session).
At this point the error indicates that $::session is "", so the line is
substituted to:

    "" addImage $l_image_file

giving the 'invalid command name ""' error.
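The substitution can be reproduced in a plain tclsh without [incr Tcl]. This sketch uses a made-up file name and also shows that a ::session which was never set would give a different error ("can't read ..."), so the message in the user's traceback means the variable exists but holds the empty string.

```tcl
# Reproduce the two ways "$::session addImage ..." can fail.
# The file name below is made up for illustration.
set l_image_file /tmp/example.mar1600

# Case 1: ::session exists but is empty, so the command word becomes "".
set ::session ""
catch {$::session addImage $l_image_file} msg1
puts "empty: $msg1"

# Case 2: ::session does not exist at all, so reading it fails first.
unset ::session
catch {$::session addImage $l_image_file} msg2
puts "unset: $msg2"
```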

Maybe you meant ::Session::addImage instead? I did notice that this is
defined as:

body Session::addImage ...

Is body some procedure you wrote yourself?

Mark

Bryan Oakley

Jul 31, 2007, 7:28:14 AM

"body" is an incr tcl command. "body Session::addImage" defines the
method "addImage" for any object of class "Session"

But yes, $::session is empty for whatever reason. He needs to figure out
either a) why session is never given a real value, or b) what is setting
it to the empty string.
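One way to answer b) without sprinkling puts everywhere is a variable write trace. This is a sketch, not code from the original application; the proc name logSessionWrite and the list ::sessionWrites are hypothetical. It records each new value of ::session together with the command that assigned it.

```tcl
# Debugging aid: log every write to ::session along with the command
# (proc or method call) that performed it.
set ::sessionWrites {}

proc logSessionWrite {name1 name2 op} {
    # A write trace fires after the new value is stored, so reading
    # ::session here gives the value that was just assigned.
    set lvl [info level]
    if {$lvl > 1} {
        # The frame just below this callback is the one doing the write.
        set where [info level [expr {$lvl - 1}]]
    } else {
        set where "global level"
    }
    lappend ::sessionWrites [list $::session $where]
    puts "::session <- '$::session' (from: $where)"
}

# Install the trace early, before the application assigns ::session.
# Tracing a variable that does not exist yet is allowed.
trace add variable ::session write logSessionWrite
```

Any line where the logged value is '' shows which command cleared the variable.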

hrp...@cam.ac.uk

Jul 31, 2007, 8:01:36 AM

Ah, okay, I think I'm beginning to understand. So if I include the
line

puts "\$::session is set to $::session"

here -

    puts "Okay, the image filename wasn't blank"
    # add the image to the current session
    puts "\$::session is set to $::session"   ;# <- the new line
    $::session addImage $l_image_file
    puts "but the code doesn't seem to have reached here"

On my box (and all others that I have access to!) I get -

Okay, the image filename wasn't blank

$::session is set to ::Controller::session0

check that this is actually being called
but the code doesn't seem to have reached here

So it should be straightforward to send this to my users and let them
see what the result is.

What confuses me (and worries me slightly) is that I have around 750
users and fewer than 0.5% report this problem. If it's a problem in
the code I'd expect to see a higher error rate, but if it's a problem
with their installation of either my code or their copy of TclTk I'd
also expect a higher error rate. By the way, the only user who has
reported this and has tried on a second machine has the code working
fine on the other machine.

Debugging by remote control is *hard*!

Many thanks...

Cameron Laird

Jul 31, 2007, 9:22:55 AM

In article <1185883296.2...@k79g2000hse.googlegroups.com>,
<hrp...@cam.ac.uk> wrote:
.
.

.
>What confuses me (and worries me slightly) is that I have around 750
>users and fewer than 0.5% report this problem. If it's a problem in
>the code I'd expect to
>see a higher error-rate, but if it's a problem with their installation
>of either my code or their copy of TclTk I'd also expect a higher
>error-rate. By the way, the only user who has reported this and has
>tried on a second machine has the code working fine on the other
>machine.
>
>Debugging by remote control is *hard*!

.
.
.
One of the glories of Tcl is that we typically conclude that
remote diagnosis is difficult; the contrast is with people
working in Java, C++, ..., where practitioners most often know
that it's impossible.

Gerald W. Lester

Jul 31, 2007, 10:13:53 AM

Cameron Laird wrote:
>... .

> One of the glories of Tcl is that we typically conclude that
> remote diagnosis is difficult; the contrast is with people
> working in Java, C++, ..., where practitioners most often know
> that it's impossible.

QOTW??

--
+--------------------------------+---------------------------------------+
| Gerald W. Lester |
|"The man who fights for his ideals is the man who is alive." - Cervantes|
+------------------------------------------------------------------------+

hrp...@cam.ac.uk

Jul 31, 2007, 1:19:06 PM

Okay, I think I've got at least a partial answer to this problem.

The interface between my Tcl script and the Fortran/C program is via
TCP/IP sockets. If I force the IP address of the server side (in Tcl)
to be localhost or 127.0.0.1, I get this error and can see that
$::session has not been set (or is set to something that doesn't
print on the screen).

If I just let the Tcl socket stuff use the real IP address of the
machine, I don't get the problem.

Now I have to see if my user (in a different time zone) can tell me
more about his machine...

Cameron Laird

Jul 31, 2007, 2:53:54 PM

In article <1185902346....@k79g2000hse.googlegroups.com>,

This raises a suspicion that some sort of exception from [socket]
has been ignored.
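If that is the cause, the fix is to make socket failures loud. A sketch under that assumption; openBackend and its arguments are hypothetical names, not taken from the application. Any error from [socket] is re-raised with context instead of being caught and discarded.

```tcl
# Hypothetical connection helper: a failure from [socket] becomes an
# explicit error rather than being silently swallowed, which could
# otherwise leave later state (such as ::session) empty.
proc openBackend {host port} {
    if {[catch {socket $host $port} result]} {
        return -code error "could not connect to $host:$port: $result"
    }
    fconfigure $result -buffering line
    return $result
}

# A caller that cannot continue without the connection should let the
# error propagate (or report it to the user) instead of carrying on:
#   set chan [openBackend 127.0.0.1 $port]
```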

Robert Heller

Jul 31, 2007, 3:56:06 PM

Sounds like it is a machine that lacks an Ethernet connection...

If the Fortran/C program is in fact running on the local machine, then
the interface code needs to be able to work over the loopback device
(localhost or 127.0.0.1) -- in fact it should prefer to use the
loopback, since that is more secure -- it avoids running a 'public'
service that could be hacked into.

You might want to have a really close look at the code where the sockets
are set up and see why using the loopback device causes problems.
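For comparison while reviewing that code, here is a minimal loopback round trip in plain Tcl. It is a sketch, not the application's protocol: the proc names onAccept and onRequest are hypothetical, and the server simply echoes one line. The server listens only on 127.0.0.1 (-myaddr), the OS picks a free port (port 0, read back with fconfigure -sockname), and the client connects to the same loopback address.

```tcl
# Server side: accept connections on the loopback interface only.
proc onAccept {chan addr port} {
    fconfigure $chan -buffering line
    fileevent $chan readable [list onRequest $chan]
}

# Echo each line back; close the connection at end of file.
proc onRequest {chan} {
    if {[gets $chan line] < 0} {
        close $chan
        return
    }
    puts $chan "echo: $line"
}

set server [socket -server onAccept -myaddr 127.0.0.1 0]
set port [lindex [fconfigure $server -sockname] 2]

# Client side: connect over loopback and send one request.
set client [socket 127.0.0.1 $port]
fconfigure $client -buffering line
puts $client "hello"

# Run the event loop until the reply arrives, so the server can answer.
fileevent $client readable {set ::reply [gets $::client]}
vwait ::reply
puts "client got: $::reply"

close $client
close $server
```

If a round trip like this works on an affected machine but the application does not, the difference is more likely in how the application's own sockets are set up than in loopback itself.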


--
Robert Heller -- Get the Deepwoods Software FireFox Toolbar!
Deepwoods Software -- Linux Installation and Administration
http://www.deepsoft.com/ -- Web Hosting, with CGI and Database
hel...@deepsoft.com -- Contract Programming: C/C++, Tcl/Tk

hrp...@cam.ac.uk

Jul 31, 2007, 4:30:30 PM

Hi

this is exactly why I was looking at implementing the loopback (I have
a whole site of potential users who can't use the software, and I
suspected it's because of their very secure firewall & proxy setup, so
I was trying to avoid socket connections that might look like they
were not local). I would prefer loopback in any case because
everything does happen on a local machine.

However, I know that my machine has an Ethernet connection, and this
was the machine where I managed to simulate the failure by specifying
loopback in the Tcl socket creation. So I guess I'm back to looking at
the socket creation code (which actually looks *very* similar to that
in Welch's book - I think it was cribbed from there!).

Thanks for this - I think I'm starting to make some progress in
understanding this.

Jeff Hobbs

Jul 31, 2007, 7:41:41 PM

to hrp...@cam.ac.uk

hrp...@cam.ac.uk wrote:
> Debugging by remote control is *hard*!

If you have open sockets, then you can remote debug using the Tcl Dev Kit:
http://www.activestate.com/Products/tcl_dev_kit/

Another tool to try, since you have Tk running, is tkcon. Starting and
running in tkcon can provide a much richer introspection experience once
the error is tripped.

Jeff

hrp...@cam.ac.uk

Aug 1, 2007, 4:39:10 AM

Hi

I think I've traced this now. It seems that the problem only existed
on FC6 (I haven't checked against earlier releases of FC...); the
/etc/hosts file on FC6 seems to require the local machine's IP address
as well as the localhost (i.e. loopback) addresses - otherwise it
seems to default to assigning localhost to one or both of the sockets
created by my applications (i.e. either or both of the server/client
pair).

On most machines anything other than the loopback addresses seems to
be optional.

Once my user put a line in with his local machine's IP address, all
was hunky-dory, and the TclTk app communicated with the Fortran/C
app.
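A quick check that a user could run to see whether the machine's own hostname resolves to a local address Tcl can bind to. This is a sketch with hypothetical proc names. It is an assumption, not something established in this thread, that a failure here corresponds to the missing /etc/hosts entry described above.

```tcl
# Accept-callback placeholder: close any connection immediately.
proc ignoreConn {chan addr port} { close $chan }

# Try to open a listening socket on the address that [info hostname]
# resolves to. Returns {ok "host -> address"} or {error message}.
proc checkHostnameBinding {} {
    set host [info hostname]
    if {[catch {socket -server ignoreConn -myaddr $host 0} srv]} {
        return [list error "cannot bind to '$host': $srv"]
    }
    set addr [lindex [fconfigure $srv -sockname] 0]
    close $srv
    return [list ok "$host -> $addr"]
}

puts [checkHostnameBinding]
```

An "error" result, or an "ok" result that shows a loopback address, would suggest checking how the hostname is listed in /etc/hosts.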

I don't pretend to understand this, or even know enough about TCP/IP
to know if the FC6 form of the /etc/hosts file is actually the correct
version or whether it's just an option that has been adopted as
useful, but the important thing from my point of view is that I can
avoid the problem.

Thanks to everyone for your suggestions - they led me along paths I
would not have considered otherwise.