I want to release a new version of Websh, but I still have a strange
problem when testing Websh on Windows:
Sometimes, when I run the test suite to test Websh running within
Apache, my tests work, and sometimes they don't. (Even running exactly the
same binaries does not reliably yield the same result.)
Apache sometimes crashes, leaving the following message in the error log:
Tcl_AsyncDelete: async handler deleted by the wrong thread
Funny thing: I don't use any Tcl Async functions in Websh at all. Is it
an extension that comes with the ActiveTcl distribution that interferes?
(A user already reported this behavior to me in January, but I had no
answer then, and I still don't.) Note that the Tcl test code reported to
fail has nothing to do with the actual error: it's the server that is
supposed to deliver the page requested with http::geturl that panics.
Also note that the error always seems to be triggered by the same test
(pool-1.2). Sometimes I get a Server Error 500 page and a subsequent
request works again; other times the error only shows up when the test
suite tries to kill Apache, because Apache has already terminated.
It never seems to happen with Apache 1.3 (any Tcl version), but it
happens with combinations of Apache 2.0.63 or 2.2.11 and Tcl 8.5.7 or
8.6.0 (b1).
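The panic message describes a thread-ownership rule: an async handler may only be deleted by the thread that created it. Below is a minimal C model of that rule, not Tcl's actual implementation; the names `async_handler`, `handler_create`, `handler_delete` and the integer thread ids are hypothetical stand-ins. Because the check concerns whichever code created the handler, a panic like this can occur even in a program that never calls the Async API directly.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical thread identifier; real code would use the platform's
   thread id (in Tcl, the value returned by Tcl_GetCurrentThread()). */
typedef unsigned long thread_id;

/* Hypothetical handler record that remembers its creating thread. */
typedef struct async_handler {
    thread_id owner;
    int (*proc)(void *client_data);
    void *client_data;
} async_handler;

/* Creates a handler owned by `creator`; returns NULL if out of memory. */
async_handler *handler_create(thread_id creator,
                              int (*proc)(void *), void *client_data)
{
    async_handler *h = malloc(sizeof *h);
    if (h != NULL) {
        h->owner = creator;
        h->proc = proc;
        h->client_data = client_data;
    }
    return h;
}

/* Deletes `h` on behalf of thread `caller`. Returns 0 on success, or -1
   (leaving the handler intact) if `caller` did not create it. A real
   runtime would treat this as a fatal error (a panic) instead. */
int handler_delete(async_handler *h, thread_id caller)
{
    assert(h != NULL);
    if (h->owner != caller) {
        fprintf(stderr, "async handler deleted by the wrong thread\n");
        return -1;
    }
    free(h);
    return 0;
}
```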
To reproduce:
- Have a recent ActiveTcl version installed (e.g. 8.5.7 or 8.6)
- Have a recent Apache httpd installed (e.g. 2.0.63 or 2.2.11)
- Have Visual Studio ready
- Checkout Websh:
svn co http://svn.apache.org/repos/asf/tcl/websh/trunk websh
- cd to websh\src\win and run
nmake TCL_PREFIX="C:/Program Files/ActiveTcl/8.6.0" TCL_VERSION=86 HTTPD_PREFIX="C:/Program Files/Apache Software Foundation/Apache2.2.11" apachetest
where TCL_PREFIX and HTTPD_PREFIX point to the Tcl and Apache
installations, respectively, and TCL_VERSION matches the Tcl version
installed under TCL_PREFIX.
See the attachments for two consecutive runs of nmake apachetest, one
that works and one that fails, along with the relevant error log produced
by the tests.
Thanks for any hints
Ronnie
--
Ronnie Brunner | ronnie....@netcetera.ch
phone +41-44-247 79 79 | fax +41-44-247 70 75
Netcetera AG | 8040 Zürich | Switzerland | http://netcetera.ch
I'm not sure, but this is reminiscent of the multi-threaded exit issues
we've been fixing recently with George's patch.
The principle is now that when exiting, only minimal finalization work
is done, because all N-1 other threads are in unknown states.
Could you test again with HEAD?
-Alex
Does that apply to both the 8.6 HEAD and the HEAD of the 8.5 branch? (The
problem happens with both versions.)
On Windows, I generally stick to http://www.tcl.tk/doc/howto/compile.html:
"Before trying to compile Tcl you should do the following things:
* Try ActiveTcl. ActiveState provides easy-to-install binaries of Tcl
for many platforms."
Anyway ;-), I gave it a try. But checking out HEAD and trying to compile
for Windows, I get:
H:\projects\tcl\win>nmake -f makefile.vc all
[...]
Microsoft (R) Program Maintenance Utility Version 6.00.8168.0
Copyright (C) Microsoft Corp 1988-1998. All rights reserved.
===============================================================================
*** Compiler has 'Optimizations'
*** Compiler has 'Pentium 0x0f fix'
*** Linker has 'Win98 alignment problem'
*** Intermediate directory will be '.\Release\itcl_Dynamic'
*** Output directory will be '.\Release'
*** Suffix for binaries will be ''
*** Optional defines are '-DTCL_CFGVAL_ENCODING=\"cp1252\" -DSTDC_HEADERS -DTCL_CFG_OPTIMIZED'
*** Compiler version 6. Target machine is IX86
*** Compiler options ' -QI0f -Ot -Oi -Op -Gs -YX -GZ -W3'
*** Link options ''
cl -nologo -c -QI0f -W3 -D _CRT_SECURE_NO_DEPRECATE -D _CRT_NONSTDC_NO_DEPRECATE -Fp.\Release\itcl_Dynamic\ -O2 -Ot -Oi -Op -Gs -YX -MD -I..\win -I..\generic -DBUILD_itcl -DTCL_THREADS=1 -I"H:\projects\tcl\win\..\generic" -I"H:\projects\tcl\win\..\win" -DTCL_CFGVAL_ENCODING=\"cp1252\" -DSTDC_HEADERS -DTCL_CFG_OPTIMIZED -DUSE_TCL_STUBS -DUSE_TCLOO_STUBS -Fo.\Release\itcl_Dynamic\ @C:\DOCUME~1\ronnie\LOCALS~1\Temp\nma05216.
itclBase.c
..\generic\itclBase.c(85) : error C2026: string too big, trailing characters truncated
itclBuiltin.c
..\generic\itclBuiltin.c(85) : error C2026: string too big, trailing characters truncated
itclParse.c
..\generic\itclParse.c(100) : error C2026: string too big, trailing characters truncated
NMAKE : fatal error U1077: 'cl' : return code '0x2'
Stop.
And since I don't really fluently speak Windows, the simple answer to
your question is: No, unfortunately I can't.
Ronnie
That appears to be the new itcl extension. The Microsoft compiler has a
limit for the length of string literals, and for some reason it appears that
itcl's code exceeds that. gcc in MinGW would probably work, but that may
not be ideal. Please file a bug.
I believe there is also a way to disable the building of itcl, but I do not
know how that is done with nmake.
-GPS
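One common way around a compiler's string-literal length limit is to keep long embedded text as an array of shorter pieces and join them at run time. A minimal sketch, assuming the text can be split at arbitrary points; `script_chunks`, `join_chunks` and the placeholder script are hypothetical, and this is not how itcl itself is written. Splitting into adjacent literals (`"abc" "def"`) can help with a per-literal limit, but compilers may also cap the concatenated result; joining at run time sidesteps both.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Long embedded text split into short pieces, each well below the
   compiler's per-literal limit. The contents are placeholders. */
static const char *const script_chunks[] = {
    "proc hello {} {\n",
    "    puts \"hello\"\n",
    "}\n",
};

/* Concatenates `count` chunks into one malloc'd, NUL-terminated string.
   The caller must free() the result; returns NULL on allocation failure. */
char *join_chunks(const char *const *chunks, size_t count)
{
    assert(chunks != NULL || count == 0);

    size_t total = 1; /* room for the terminating NUL */
    for (size_t i = 0; i < count; i++) {
        total += strlen(chunks[i]);
    }

    char *out = malloc(total);
    if (out == NULL) {
        return NULL;
    }

    char *p = out;
    for (size_t i = 0; i < count; i++) {
        size_t n = strlen(chunks[i]);
        memcpy(p, chunks[i], n);
        p += n;
    }
    *p = '\0';
    return out;
}
```

A caller would typically pass `sizeof script_chunks / sizeof script_chunks[0]` as the count and hand the joined string to whatever consumes the script.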
The bug is/was present in all versions, and is only fixed in 8.6 HEAD.
> And since I don't really fluently speak Windows, the simple answer to
> your question is: No, unfortunately I can't.
Dunno, I don't use the MS compiler, I use mingw.
As for prebuilt binaries of the bleeding edge, normally I turn to
http://www.patthoyts.tk/tclkit/win32-ix86/ , but it seems Pat has not
been updating them since Apr 23...
(If anybody has another link for nightly cvs builds, speak up.)
Now if you ping me at my gmail address, I'll send you the binaries
when I'm back home.
-Alex
Done. It doesn't solve my problem, though, but since Alex promised me
binaries to test against, I hope I'll get there...
Ronnie
Thanks for sending me the binaries Alex. Unfortunately they didn't help
at all: I still get crashes, but very different ones.
What I did (just to make sure I didn't apply your binaries strangely):
- Copied your bins over an existing ActiveState 8.6.0 installation
- Modified init.tcl so that it no longer defines try (which is apparently
now built into your binary)
- Used this installation to run my tests against
What I get is that Websh doesn't even start up correctly. I get an
internal Websh error that certain files on the file system cannot be
accessed, i.e., the following error message is triggered:
if (Tcl_Access(id, R_OK) != 0 ||
    Tcl_Stat(id, &statPtr) != TCL_OK) {
    Tcl_MutexUnlock(&(conf->mainInterpLock));
    ap_log_error(APLOG_MARK, APLOG_ERR, 0, r->server,
                 "cannot access or stat webInterpClass file '%s'", id);
    Tcl_DecrRefCount(idObj);
    return NULL;
}
mtime = statPtr.st_mtime;
The file named by id exists, and this has always worked so far. (I didn't
post the full context, as I don't think it actually matters...)
Now either something in the file system handling was changed in the
HEAD binary I got, or my problems have nothing to do with either
Tcl_AsyncDelete or the file system and we are only seeing side effects of
memory being overwritten, e.g. through use of a disposed Tcl_Obj.
BTW: I have no errors on Solaris or Linux with the same tests. It seems
to be Windows-specific.
Any other ideas appreciated.
TIA
Sorry to ask Googlable questions... but I'm unfamiliar with your
Websh. Is it a pure-script app or does it depend on extensions? Does it
involve embedding? Is there a simple way for me to reproduce this setup
(and bug) on my local sandbox?
-Alex
> Alexandre Ferrieux wrote:
>> Now if you ping me at my gmail address, I'll send you the binaries
>> when I'm back home.
>
> Thanks for sending me the binaries Alex. Unfortunately they didn't help
> at all: I still get crashes, but very different ones.
>
> What I did (just to make sure I didn't apply your binaries strangely):
> - Copied your bins over an existing ActiveState 8.6.0 installation
> - Modified init.tcl to not implement try (which obviously is now part of
> your binary)
> - Use this installation to run my tests against
>
> What I get is that Websh doesn't even start up correctly. I get an
> internal Websh error that certain files on the file system cannot be
> accessed, i.e., the following error message is triggered:
>
> if (Tcl_Access(id, R_OK) != 0 ||
>     Tcl_Stat(id, &statPtr) != TCL_OK) {
>     Tcl_MutexUnlock(&(conf->mainInterpLock));
>     ap_log_error(APLOG_MARK, APLOG_ERR, 0, r->server,
>                  "cannot access or stat webInterpClass file '%s'", id);
>     Tcl_DecrRefCount(idObj);
>     return NULL;
> }
> mtime = statPtr.st_mtime;
>
> the file in "id" is there and it always worked so far. (I didn't post
> the full context as I don't think it actually matters...)
Tcl may be built with flags that bring in a 64-bit struct stat
definition. This can cause problems: if your code is built with a
different definition of struct stat than the one the Tcl stat functions
use, the field offsets, and therefore the values you read from the struct,
will be wrong. It can also corrupt other local variables on the stack, or
cause faults.
On Linux this problem is especially bothersome for me, because Tcl
extensions and programs that use a compiled Tcl library must be built with
HAVE_STRUCT_STAT64 if they directly access the Tcl_StatBuf/stat struct.
This means that user code must be built the same way libtcl was built, so
that the definitions match.
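A minimal C sketch of why such a mismatch is dangerous, assuming two hypothetical stat-like structs (`stat_narrow`, `stat_wide`) that differ only in the width of the size field; they stand in for the real system and Tcl types and do not reproduce their actual layouts. Widening one field shifts every field after it, so code compiled against one layout reads the wrong bytes from a buffer filled using the other.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Two hypothetical layouts of a stat-like buffer that differ only in the
   width of the size field. These are not the real system types. */
struct stat_narrow {
    uint32_t dev;
    uint16_t ino;
    uint16_t mode;
    int16_t  nlink;
    int16_t  uid;
    int16_t  gid;
    uint32_t rdev;
    int32_t  size;   /* 32-bit file size */
    int32_t  atime;
    int32_t  mtime;
    int32_t  ctime;
};

struct stat_wide {
    uint32_t dev;
    uint16_t ino;
    uint16_t mode;
    int16_t  nlink;
    int16_t  uid;
    int16_t  gid;
    uint32_t rdev;
    int64_t  size;   /* 64-bit file size: every later field moves */
    int32_t  atime;
    int32_t  mtime;
    int32_t  ctime;
};

/* Models a library that fills in the wide layout while the caller was
   compiled against the narrow one. Returns the mtime the caller reads. */
int32_t mtime_seen_by_mismatched_caller(int64_t size, int32_t atime,
                                        int32_t mtime)
{
    struct stat_wide filled;
    struct stat_narrow seen;

    memset(&filled, 0, sizeof filled);
    filled.size = size;
    filled.atime = atime;
    filled.mtime = mtime;

    /* Copy only the bytes the caller allocated, which keeps this demo
       well-defined. A real library would write sizeof(struct stat_wide)
       bytes into the smaller buffer and overrun whatever follows it. */
    assert(sizeof seen < sizeof filled);
    memcpy(&seen, &filled, sizeof seen);
    return seen.mtime; /* read at the narrow layout's offset */
}
```

Calling it with distinct size, atime and mtime values shows that the caller does not get back the mtime the "library" stored.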
Tcl can not use the migration macros that were intended to help programs
transition transparently to 64-bit struct stat definitions, because Tcl uses
fts.h.
Thus sayeth the glibc fts.h:
"/* The fts interface is incompatible with the LFS interface which
transparently uses the 64-bit file access functions. */"
There is another interface that is in POSIX, but it seems there has been
some confusion, because some manuals and systems suggest fts, and others
suggest ftw() or nftw(). Madness has ensued, and the mess grows.
So, Tcl uses stat64() directly in some cases.
Some systems broke their ABI years ago, and made their defined struct stat
64-bit where needed in the struct.
> Now either something with the file system handling was changed in the
> head binary I got or my problems have nothing to do with either
> Tcl_AsyncDelete or the file system, but we only see some side effects of
> overwriting memory through use of a disposed Tcl_Obj or something.
Any Tcl_Obj pointer/variable stored or spilled to the stack may be
corrupted by the struct stat misuse.
> BTW: I have no errors on Solaris or Linux with the same tests. It seems
> Win specific.
It could be the struct stat issue.
-GPS
Yes George, that seems likely: I built the tcl86.dll for Ronnie on a
32-bit XP.
However I don't have a 64-bit machine at hand; somebody else should
step in.
As a side note: shouldn't we design some mechanism within the Stub
system so that at least the mismatch is detected at runtime-link time
and Tcl refuses to proceed, instead of crashing at some random later
time?
-Alex
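One possible shape for such a check, sketched in C under the assumption that the library exports a function that each extension calls once at load time with the `sizeof` it was compiled with. All names here (`shared_stat_buf`, `check_stat_abi`, `CHECK_STAT_ABI`) are hypothetical and not part of Tcl's stubs mechanism. A size comparison catches a widened field like the one discussed above, but not every layout change (two swapped fields keep the same size), so it is a cheap guard rather than a proof of compatibility.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical buffer type shared between a library and its extensions. */
typedef struct shared_stat_buf {
    long long size;
    long long mtime;
} shared_stat_buf;

/* Library side: the size this library was compiled with. */
static const size_t library_stat_size = sizeof(shared_stat_buf);

/* Library side: called by an extension at load time with the size the
   extension was compiled with. Returns 0 if they agree, or -1 after
   printing a message, so the extension can refuse to load instead of
   corrupting memory later. */
int check_stat_abi(size_t extension_stat_size)
{
    if (extension_stat_size != library_stat_size) {
        fprintf(stderr,
                "stat buffer mismatch: library uses %lu bytes, "
                "extension uses %lu bytes\n",
                (unsigned long) library_stat_size,
                (unsigned long) extension_stat_size);
        return -1;
    }
    return 0;
}

/* Extension side: sizeof is evaluated with the extension's own headers. */
#define CHECK_STAT_ABI() check_stat_abi(sizeof(shared_stat_buf))
```

In this single-file sketch both sides share one definition, so `CHECK_STAT_ABI()` succeeds; passing a different size simulates an extension built against another layout.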
> On Jul 16, 9:05 pm, GPS <georg...@xmission.com> wrote:
>>
>> > BTW: I have no errors on Solaris or Linux with the same tests. It seems
>> > Win specific.
>>
>> It could be the struct stat issue.
>
> Yes George, that seems likely: I built the tcl86.dll for Ronnie on a
> 32-bit XP.
> However I don't have a 64-bit machine at hand; somebody else should
> step in.
A 32-bit XP can still use 64-bit integers in a struct stat. gcc and other
compilers support the C99 standard long long type, which is at least 64
bits wide. I believe that the Microsoft compiler also supports a 64-bit
type, and I seem to recall it uses a non-standard name.
In some cases when people throw around the term 64-bit, they mean 64-bit
pointers.
With a file system, it's possible for a program with a 32-bit pointer
size to use 64-bit file offsets. Disks limited to only 32 bits of
addressing would be much smaller (think 2^32 vs. 2^64 bytes): ~4GB of disk
space is not enough for most people, and that seek range might not be
adequate.
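A minimal sketch of the point above: a 32-bit program can still hold file offsets past 4 GB in a 64-bit integer type. The `_MSC_VER < 1600` test assumes that older Microsoft compilers lack `<stdint.h>` and spell the type `__int64`; `wide_int`, `offset_past_4gib` and `fits_in_32_bits` are hypothetical names (Tcl itself provides `Tcl_WideInt` for 64-bit values such as seek offsets).

```c
#include <assert.h>

#if defined(_MSC_VER) && _MSC_VER < 1600
typedef __int64 wide_int;   /* older MSVC: non-standard 64-bit type */
#else
#include <stdint.h>
typedef int64_t wide_int;   /* C99 exact-width 64-bit type */
#endif

/* One byte past 4 GiB (2^32 bytes). */
wide_int offset_past_4gib(void)
{
    return ((wide_int) 1 << 32) + 1;
}

/* Returns 1 if `offset` is representable as an unsigned 32-bit offset. */
int fits_in_32_bits(wide_int offset)
{
    return offset >= 0 && offset <= (wide_int) 0xFFFFFFFFu;
}
```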
The ZFS file system, as you may have read about, supports sizes of 2^128. 2
to the power of 128 is a lot of bytes!
A 32-bit processor generally handles 64-bit integers with multiple
instructions, propagating the carry from the low half to the high half.
The native ALU register size of an IA-32/x86 is still 32 bits. There are
also the SSE and MMX features, which brought some larger registers along
with alignment restrictions.
Some IA-32/x86 processors may support 36-bit physical addresses (see: PAE),
but the pointer and register size remains the same.
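The carry-based approach can be sketched in portable C: a 64-bit value is kept as two 32-bit halves and added using 32-bit operations only, roughly what a 32-bit x86 compiler emits as an `ADD` on the low halves followed by an `ADC` (add with carry) on the high halves. The names are illustrative; `split64` and `join64` exist only to check the result against native 64-bit arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit value held as two 32-bit halves, as a 32-bit x86 keeps it
   in a pair of registers. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} u64_pair;

/* Adds two 64-bit values using only 32-bit additions: add the low
   halves, detect the carry out, then add the high halves plus the carry. */
u64_pair add64(u64_pair a, u64_pair b)
{
    u64_pair r;
    r.lo = a.lo + b.lo;             /* wraps modulo 2^32 */
    uint32_t carry = r.lo < a.lo;   /* 1 if the low addition overflowed */
    r.hi = a.hi + b.hi + carry;     /* wraps modulo 2^32, like ADC */
    return r;
}

/* Helpers for checking against native 64-bit arithmetic. */
u64_pair split64(uint64_t v)
{
    u64_pair p = { (uint32_t) v, (uint32_t) (v >> 32) };
    return p;
}

uint64_t join64(u64_pair p)
{
    return ((uint64_t) p.hi << 32) | p.lo;
}
```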
These days, in most new processors the x86 instructions run in legacy
mode, and the way forward is EM64T, which became Intel(TM) 64 (it may be
called something different by now). It is very similar to x86-64 or
AMD64, although there are some important differences.
IA-64/Itanium seems to be diminishing in some form, but who knows what the
future may bring?
> As a side note: shouldn't we design some mechanism within the Stub
> system so that at least the mismatch be detected at runtime-link time
> and prevent Tcl from proceeding, instead of crashing at some random
> later time ?
Donal knows the particulars of the current solution. From what I
understood, the solution eventually chosen was an API that treats the
Tcl_StatBuf pointer as an opaque pointer, with access functions in libtcl.
I'm not certain whether that TIP passed.
-GPS
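The opaque-pointer approach described above can be sketched as follows: callers only ever see an incomplete type and go through allocation and accessor functions, so only the library depends on the buffer's layout and can change it without breaking already-compiled callers. All names (`file_info`, `file_info_alloc`, `file_info_size`, and so on) are hypothetical, `file_info_fill` stands in for the library's real stat call, and none of this is Tcl's actual API.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- What callers see (header): the layout is hidden. ---- */
typedef struct file_info file_info;
file_info *file_info_alloc(void);
void file_info_free(file_info *info);
int file_info_fill(file_info *info, long long size, long long mtime);
long long file_info_size(const file_info *info);
long long file_info_mtime(const file_info *info);

/* ---- Library implementation: the only code that knows the layout. ---- */
struct file_info {
    long long size;
    long long mtime;
};

file_info *file_info_alloc(void)
{
    return calloc(1, sizeof(file_info)); /* size known only here */
}

void file_info_free(file_info *info)
{
    free(info);
}

/* Hypothetical stand-in for a real stat call that fills the buffer. */
int file_info_fill(file_info *info, long long size, long long mtime)
{
    assert(info != NULL);
    info->size = size;
    info->mtime = mtime;
    return 0;
}

long long file_info_size(const file_info *info)
{
    return info->size;
}

long long file_info_mtime(const file_info *info)
{
    return info->mtime;
}
```

The cost is an extra allocation and a function call per field, which buys freedom to widen fields (for example to 64-bit sizes) inside the library alone.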
Well, my XP is 32-bit as well... so I think we should not have a problem
here, right? Maybe Alex just needs to send me his tclstub86.lib and
tcl86.lib, because I'm using the ActiveState versions, which might
reflect a different setup... (But to be honest: I don't have much of a
clue what I'm talking about right now... ;-)
Ronnie
Websh is a (mostly) C extension and does not depend on any other extension.
The main target is the Tcl extension libwebsh.(so|dll), which contains all
the functionality and provides the websh package.
It comes with two flavors of integration:
websh(.exe): basically a Tclsh that links to libwebsh.so
mod_websh.so: an Apache module that puts Websh into Apache
From the original post:
> - Have a recent ActiveTcl version installed (e.g. 8.5.7 or 8.6)
> - Have a recent Apache httpd installed (e.g. 2.0.63 or 2.2.11)
> - Have Visual Studio ready
> - Checkout Websh:
> svn co http://svn.apache.org/repos/asf/tcl/websh/trunk websh
> - cd to websh\src\win and run
With the MS compiler, you'd use
nmake TCL_PREFIX="<path to your tcl>" TCL_VERSION=86 HTTPD_PREFIX="<path to your httpd>" apachetest
to run the tests. Using MinGW, you probably need the Unix version,
right? This would roughly be
cd src/unix
autoconf
./configure --with-tcl=<path to tcl> --with-httpdinclude=<path to http includes> --enable-gcc --enable-threads
make apachetest
hth
Ronnie
Aha! So you also need my stub.lib files, I guess. Sorry, I was under
the assumption that it was pure script; otherwise I'd have sent them
earlier. Will do that shortly.
-Alex
Thanks for sending me all the lib (.a) files. Unfortunately, once again I
didn't get any further. My compiler doesn't seem to like your
libtcl86.a file (the libtclstub86.a seems to work).
For completeness, here is the output of my link command (note: renaming
the files to .lib gets rid of the D4024 warnings but doesn't change
anything else). The resulting targets all crash almost immediately.
I guess I'll just have to wait for the next release of either Tcl 8.5
or Tcl 8.6 that includes your fix and test then, or until I can download
a complete HEAD build of Tcl...
H:\projects\websh\src\win>nmake TCL_PREFIX="C:/Program Files/ActiveTcl/8.6.0" TCL_VERSION=86 HTTPD_PREFIX="C:/Program Files/Apache Software Foundation/Apache2.0.63" all
Microsoft (R) Program Maintenance Utility Version 6.00.8168.0
Copyright (C) Microsoft Corp 1988-1998. All rights reserved.
cl /D"WIN32" /D"VERSION=\"3.6.0b5\"" /D"_MBCS" /W3 /EHsc /O2 /Ob1 /LD /MD /Gy -o mod_websh3.6.0b5.so apchannel.obj interpool.obj logtoap.obj mod_websh.obj modwebsh_ap.obj request_ap.obj response_ap.obj websh3.6.0b5.lib "C:/Program Files/ActiveTcl/8.6.0/lib/libtcl86.a" "C:/Program Files/ActiveTcl/8.6.0/lib/libtclstub86.a" "C:/Program Files/Apache Software Foundation/Apache2.0.63/lib/libhttpd.lib" "C:/Program Files/Apache Software Foundation/Apache2.0.63/lib/libapr.lib" "C:/Program Files/Apache Software Foundation/Apache2.0.63/lib/libaprutil.lib" kernel32.lib user32.lib advapi32.lib ws2_32.lib odbc32.lib /link /dll /nodefaultlib:msvcrt.lib /subsystem:windows
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 12.00.8168 for 80x86
Copyright (C) Microsoft Corp 1984-1998. All rights reserved.
Command line warning D4024 : unrecognized source file type 'C:/Program Files/ActiveTcl/8.6.0/lib/libtcl86.a', object file assumed
Command line warning D4024 : unrecognized source file type 'C:/Program Files/ActiveTcl/8.6.0/lib/libtclstub86.a', object file assumed
Microsoft (R) Incremental Linker Version 6.00.8447
Copyright (C) Microsoft Corp 1992-1998. All rights reserved.
/out:apchannel.dll
/dll
/implib:mod_websh3.6.0b5.lib
/out:mod_websh3.6.0b5.so
/dll
/nodefaultlib:msvcrt.lib
/subsystem:windows
apchannel.obj
interpool.obj
logtoap.obj
mod_websh.obj
modwebsh_ap.obj
request_ap.obj
response_ap.obj
websh3.6.0b5.lib
"C:/Program Files/ActiveTcl/8.6.0/lib/libtcl86.a"
"C:/Program Files/ActiveTcl/8.6.0/lib/libtclstub86.a"
"C:/Program Files/Apache Software Foundation/Apache2.0.63/lib/libhttpd.lib"
"C:/Program Files/Apache Software Foundation/Apache2.0.63/lib/libapr.lib"
"C:/Program Files/Apache Software Foundation/Apache2.0.63/lib/libaprutil.lib"
kernel32.lib
user32.lib
advapi32.lib
ws2_32.lib
odbc32.lib
libtcl86.a(d000693.o) : warning LNK4078: multiple ".text" sections found with different attributes (E0000020)
libtcl86.a(d000693.o) : warning LNK4078: multiple ".text" sections found with different attributes (E0000020)
Creating library mod_websh3.6.0b5.lib and object mod_websh3.6.0b5.exp
LINK : warning LNK4089: all references to "d000000.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000085.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000200.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000205.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000216.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000264.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000279.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000288.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000291.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000325.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000333.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000334.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000335.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000357.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000360.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000361.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000363.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000365.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000430.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000434.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000438.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000446.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000451.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000454.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000493.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000500.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000502.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000507.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000515.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000531.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000533.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000541.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000543.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000554.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000557.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000564.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000593.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000594.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000595.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000596.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000623.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000624.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000625.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000627.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000629.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000632.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000649.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000662.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000680.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000693.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000698.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000710.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000744.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000747.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000759.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000769.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000808.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000847.o" discarded by /OPT:REF
LINK : warning LNK4089: all references to "d000895.o" discarded by /OPT:REF
H:\projects\websh\src\win>
Well, all this cross-arch linking is a kamikaze dive...
What I don't understand is how you can compile your extension or
embedding while not being able to compile the Tcl HEAD.
Considering the amount of time and sweat we have spent on this, I
think it is time to recognize that the extra effort to compile Tcl is
the best investment you can make at this point...
-Alex
I gave it another shot and just disabled itcl (which caused the "error
C2026: string too big") by renaming its makefile... and I was able to
build Tcl HEAD.
So, long thread short outcome: Apache doesn't report the same error
anymore but it still hangs during my tests.
So the fix you made in Tcl HEAD had some effect, but there is still
something fishy in my tests (or rather, in Websh or Tcl). Unfortunately,
I'm not really sure how to proceed, because my debugging skills on Windows
are very limited.
Thanks so far, further hints appreciated
Ronnie
The next thing to do would then be to open a bug ticket at SF, but one
key accelerator to the whole process would be to isolate the problem
as much as possible from:
(1) the apache module context
(2) the rest of your app
Ideally, you'd come up with a small Tcl script reproducing the bug on
tclsh.exe, possibly including some [load your.dll].
If none of this is possible, the next most useful thing would be to
analyse the resulting deadlock by attaching to the process with a
debugger and reporting the stacks of all threads.
-Alex