Trying to port mksh to Coherent


Roy

Apr 25, 2012, 11:10:25 PM
Hi all,

I tried to port mksh to Coherent, and I hit some issues:
- there is no gettimeofday(), so I used time() instead
- there is no termios.h, so I tried to use termio.h instead (for
tcgetattr() and tcsetattr())
- there is no tcgetpgrp() or tcsetpgrp(), so I disabled job control
- there is no lstat() or readlink(), so I used stat() and made the
readlink() call fail
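
A minimal sketch of the time() substitution mentioned in the list above, not the code from the actual port. The names compat_timeval and compat_gettimeofday are hypothetical, and the struct is declared locally on the assumption that the target has no <sys/time.h>. The result only has one-second resolution.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for struct timeval on a system without <sys/time.h>. */
struct compat_timeval {
    long tv_sec;   /* whole seconds since the epoch */
    long tv_usec;  /* always 0 here: time() only has one-second resolution */
};

/* Hypothetical replacement for gettimeofday(), built on time(). */
int compat_gettimeofday(struct compat_timeval *tv)
{
    time_t now;

    if (tv == NULL)
        return -1;
    now = time(NULL);
    if (now == (time_t)-1)
        return -1;
    tv->tv_sec = (long)now;
    tv->tv_usec = 0;
    return 0;
}
```

Anything that measures sub-second intervals with this will see them as zero.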

- the Build.sh script cannot generate a list of signals because it
seems that sed fails to print the tab character (0x9). I tried GNU sed 2.05
too, but the problem remains (both 2.03 and 2.05 work on other platforms,
so I think it is a bug in the Coherent 4.2.10 kernel)
- to work around the above issue, I removed the sed line and used
gawk to get the list. It successfully got some of the signals, and then
I got "Build.sh: line 1824: Unable to preserve redirection state when
redirecting builtin" and "grep: (standard input): bad file number"

- Signals are broken. When built with -DMKSH_UNEMPLOYED (to disable job
control) but without -DMKSH_NOPROSPECTOFWORK, the build produces a binary,
but when you run an external command like "ls", it never returns to the
shell prompt. (The bash-1.13 Coherent port is also affected: it
fails to get the correct exit status from gcc in Build.sh.) With
-DMKSH_NOPROSPECTOFWORK it works. (Note that this option produces a
shell that does not support standard Korn Shell scripts.)

Result binary + source (built with -DMKSH_NOPROSPECTOFWORK):
http://roy.orz.hm/soft/mksh-coh.tgz

Best regards,
Roy

andrzej Popielewicz

Apr 26, 2012, 2:13:14 AM
On 2012-04-26 05:10, Roy wrote:
> Hi all,
>
> I tried to port mksh to Coherent, and I hit some issues:
> - there is no gettimeofday() and I used time() instead

gettimeofday() can be found in libsocket.a, which is available in the
Coherent Internet archives.

> - there is no termios.h and I tried to use termio.h instead (for

termios.h and the termios library can be found in the Coherent Internet archives.

> tcgetattr() and tcsetattr())
> - there is no tcgetpgrp() and tcsetpgrp() so I disabled job control

Some of the above can be found in the Coherent archives.

A good starting point is www.tuhs.org


In general, you will find a lot of the missing functions in glibc. Download
the glibc sources and try to port the missing functions. It is one of my
favourite methods when I port something: if something is missing, I look
in glibc first.


> - there is no lstat() and readlink(), I used stat() and failing the
> readlink() line

That is to be expected: Coherent does not support symbolic links.
>

> Best regards,
> Roy

Andrzej

andrzej Popielewicz

Apr 26, 2012, 2:28:16 AM
On 2012-04-26 05:10, Roy wrote:
> Hi all,
>
> I tried to port mksh to Coherent, and I hit some issues:
> - there is no gettimeofday() and I used time() instead
> - there is no termios.h and I tried to use termio.h instead (for
> tcgetattr() and tcsetattr())
> - there is no tcgetpgrp() and tcsetpgrp() so I disabled job control
> - there is no lstat() and readlink(), I used stat() and failing the
> readlink() line
>


Try this in your browser:

http://gopher.floodgap.com/gopher/gw?gopher://telefisk.org:70/1

It is a gopher server that hosts, among other things, a Coherent archive.

Andrzej


andrzej Popielewicz

Apr 26, 2012, 2:39:17 AM
On 2012-04-26 05:10, Roy wrote:
> Hi all,
>
> I tried to port mksh to Coherent, and I hit some issues:
> - there is no gettimeofday() and I used time() instead
> - there is no termios.h and I tried to use termio.h instead (for

http://gopher.floodgap.com/gopher/gw?gopher://telefisk.org:70/1/coherent/sources/

The termios material is in the directory above.

Andrzej

Roy

Apr 26, 2012, 4:08:26 AM
On Apr 26, 2:13 pm, andrzej Popielewicz <va...@icpnet.pl> wrote:
> On 2012-04-26 05:10, Roy wrote:
>
> > Hi all,
>
> > I tried to port mksh to Coherent, and I hit some issues:
> > - there is no gettimeofday() and I used time() instead
>
> gettimeofday can be found in libsocket.a , available in coherent
> internet archives

I can't get libsocket to compile on 4.2.10; it complains that NOFILE is
not defined.

>
> > - there is no termios.h and I tried to use termio.h instead (for
>
> termios.h and termios library can be found in coherent internet archives
>
> > tcgetattr() and tcsetattr())
> > - there is no tcgetpgrp() and tcsetpgrp() so I disabled job control
>
> some of the above can be found in coherent archives
>
> Good starting point is www.tuhs.org
>
> In general , You will find a lot of missing functions in glibc. Download
> glibc  sources and try to port missing functions. It is one of my
> favourite methods when I port something. If something is missing I look
> in glibc first.
>

I tried that; the main problem is the signal one. Both bash-1.13.5
and mksh are affected.

andrzej Popielewicz

Apr 26, 2012, 4:30:24 AM
On 2012-04-26 10:08, Roy wrote:
> On Apr 26, 2:13 pm, andrzej Popielewicz<va...@icpnet.pl> wrote:

>> gettimeofday can be found in libsocket.a , available in coherent
>> internet archives
>
> I can't get libsocket compiling in 4.2.10, it complains there is no
> NOFILE defined.
>

As far as I remember, I also had problems compiling libsocket.
In the end it compiled, but it did not work like the original compiled
version, so I am using the original libsocket.a.
NOFILE should be defined somewhere in /usr/include. If not, find it
in the glibc include files or define it yourself. It is the maximum
number of open files; I suspect a value on the order of 64 (?). It
must be somewhere in /usr/include, perhaps under a slightly different name.
The location of include files in Coherent does not have to be the same as in
the glibc world.
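
A sketch of the "define it yourself" route for NOFILE. The value 64 is only the guess made in this post, not a documented Coherent limit, and the OPEN_MAX fallback assumes the target's <limits.h> might provide that macro.

```c
#include <limits.h>

/* Fallback only: used when no system header has defined NOFILE already. */
#ifndef NOFILE
#  ifdef OPEN_MAX
#    define NOFILE OPEN_MAX   /* assumption: <limits.h> may provide this */
#  else
#    define NOFILE 64         /* guessed value, not taken from Coherent headers */
#  endif
#endif
```

Because of the #ifndef guard, a real definition found later in /usr/include wins as long as that header is included first.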


>
> Tried, the main problem is the signal one. I can see that bash-1.13.5
> and mksh are affected.

There is also the original bash version in the archives. Did you try it?
That does not mean the signal system in Coherent is perfect; many features
are missing.
As far as I know, the option you use in mksh can help track down signal
bugs. Try to locate where the bugs happen and what their nature is.

Andrzej

Roy

Apr 26, 2012, 7:12:58 AM
On Apr 26, 4:30 pm, andrzej Popielewicz <va...@icpnet.pl> wrote:
> On 2012-04-26 10:08, Roy wrote:
>
> > On Apr 26, 2:13 pm, andrzej Popielewicz<va...@icpnet.pl>  wrote:
> >> gettimeofday can be found in libsocket.a , available in coherent
> >> internet archives
>
> > I can't get libsocket compiling in 4.2.10, it complains there is no
> > NOFILE defined.
>
> As far as I remember I also had problems with compiling libsocket.
> Finally it compiled but did not work as original compiled version. So I
> am using original libsocket.a .
> NOFILE should be defined somewhere in the /usr/include .If not find it
> in include files of glibc or define it Yourself.It is the maksimum
> number of opened files,I suspect number about of order of 64 (?)  it
> must be somewhere in /usr/include, perhaps with slightly changed name.
> The location of include files in Coherent does not have to be same as in
> glibc world.
>
>
>
> > Tried, the main problem is the signal one. I can see that bash-1.13.5
> > and mksh are affected.
>
> There is also original bash version in the archives.Did You try this ?

I used bash-1.13.5.tgz from gopher://telefisk.org/1/coherent/sources32/system/
That bash fails in the same way as bash on BeOS 5.0 (Build.sh exits
after "if the compiler does not fail correctly... no")

Andrzej Popielewicz

Apr 26, 2012, 3:27:45 PM
>> On 2012-04-26 10:08, Roy wrote:
> I used the bash-1.13.5.tgz in gopher://telefisk.org/1/coherent/sources32/system/

You can try the same version of bash as ported by me; of course, there is
no guarantee that it helps.

http://www.landibase.com/coherent.html


Andrzej

Roy

Apr 26, 2012, 9:22:42 PM
Both bash 1 and bash 2 from you exit after the same message when running
Build.sh.

>
> Andrzej

Roy

Apr 26, 2012, 9:46:34 PM
On Apr 26, 11:10 am, Roy <roy...@gmail.com> wrote:
> - Signals are broken (build with -DMKSH_UNEMPLOYED(for disabling job
> controls) but not -DMKSH_NOPROSPECTOFWORK, it will generate binary,
> but when you run external command like "ls", it will never return to
> shell prompt. actually bash-1.13 coherent port is also affected which
> fails to get correct exit status from gcc in Build.sh) but with -
> DMKSH_NOPROSPECTOFWORK it works. (Note that this option produces a
> shell not supporting standard Korn Shell scripts.)
>

ps -alx shows that mksh is idle, waiting in sigsuspend(), so that is
the problem. "that is a hint that mksh never gets the SIGCHLD, see also
http://dev.haiku-os.org/ticket/5567, Haiku had the same problem, but
fixed it" said tg.

andrzej Popielewicz

Apr 27, 2012, 1:56:39 AM
On 2012-04-27 03:46, Roy wrote:
> In ps -alx, it shows that mksh is idle-waiting sigsuspend, so that's

OK, I hope that is precise enough to try to fix it. I will look at it.
Thanks for the valuable signal.

Andrzej

Roy

Apr 27, 2012, 2:22:47 AM
If you can fix the kernel, please make it use the "HLT" instruction in its
idle loop so that it won't keep the core at full load.

>
> Andrzej

Roy

Apr 27, 2012, 11:55:41 AM
On Apr 27, 1:56 pm, andrzej Popielewicz <va...@icpnet.pl> wrote:
I wonder whether sigsuspend() should be the same as sigprocmask(SIG_SETMASK,
&set, &oset); pause(); sigprocmask(SIG_SETMASK, &oset, NULL); ?
I don't know why sigsuspend() is broken in BeOS 5.0 and Coherent.
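
For illustration, the three-call sequence from the question can be wrapped as below. This is a sketch, not the code from mksh; suspend_compat and noop_handler are hypothetical names. It is not strictly equivalent to sigsuspend(): a signal delivered between the first sigprocmask() and pause() is handled before pause() starts, and pause() then sleeps until the next signal.

```c
#define _POSIX_C_SOURCE 200809L  /* only needed for strict -std= builds on modern systems */
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical empty handler: a caught signal is what makes pause() return. */
void noop_handler(int signo)
{
    (void)signo;
}

/* Hypothetical helper: approximates sigsuspend(mask) with three system calls. */
int suspend_compat(const sigset_t *mask)
{
    sigset_t saved;
    int rv, saved_errno;

    if (sigprocmask(SIG_SETMASK, mask, &saved) == -1)
        return -1;
    /* Race window: a signal arriving here is handled before pause() starts. */
    rv = pause();              /* returns -1 with errno EINTR after a handler ran */
    saved_errno = errno;
    sigprocmask(SIG_SETMASK, &saved, NULL);
    errno = saved_errno;       /* report the errno from pause(), not the restore */
    return rv;
}
```

Like sigsuspend(), the helper returns -1 with errno set to EINTR once a handler has run.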

Roy

Andrzej Popielewicz

Apr 29, 2012, 1:14:26 PM
Roy wrote:

> I wonder if sigsuspend() should be same as sigprocmask(SIG_SETMASK,


In Coherent it should have the same result.

But returning to the main problem: I assume that you suspect that
sigsuspend is broken. Well, as far as I can judge, signal
handling in Coherent is quite compatible with the SYSV and POSIX
standards.
You mentioned a similar problem in another OS, which was fixed. That fix
concerns threads; Coherent does not have threads, and
"translating" the fix to processes is not obvious, if it makes
sense at all.

I have analysed your package. I could build mksh using gcc-2.8.1 and
gcc-3.2.3, and I did not notice any problems with sigsuspend. I did NOT
use your options -DMKSH_NOPROSPECTOFWORK or -DMKSH_UNEMPLOYED; I removed
them from your mk.sh. I added LDFLAGS="-L/usr/lib -lsocket
-lcoh -L/lib/ndp -lc -lm", where libcoh.a is my own library of utility
functions and libsocket.a is the original from MWC. I also removed
the option -DUSE_TERMIO.

Unfortunately, mksh built this way does not work. I observe the message

internal error, roque pointer 8C8 !!!

I have never seen such a message, although I have ported hundreds of packages.

BTW, the same message is produced if I use your option -D_NOPROSPECTS...

In conclusion, I do not think the problem is related to sigsuspend; of
course, I may be wrong.

BTW, I have supplied the missing functions such as lstat, readlink,
tcgetpgrp, etc. in the form of wrappers: for example, my lstat uses stat,
and readlink returns -1 with errno set to EINVAL. This method is often
used, because you do not have to change the original source code.
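
The wrapper approach described above could look roughly like this. It is a sketch, not the contents of libcoh.a; the coh_ prefix is hypothetical and only keeps the example from clashing with a libc that already has these functions. In a real port the wrappers would carry the standard names.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Without symbolic links, lstat() and stat() describe the same file. */
int coh_lstat(const char *path, struct stat *st)
{
    return stat(path, st);
}

/* No path can be a symbolic link, so always fail the way POSIX readlink()
 * does for a non-link: return -1 with errno set to EINVAL. */
int coh_readlink(const char *path, char *buf, size_t bufsize)
{
    (void)path;
    (void)buf;
    (void)bufsize;
    errno = EINVAL;
    return -1;
}
```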


Andrzej



Andrzej Popielewicz

Apr 29, 2012, 1:42:43 PM
Roy wrote:
> On Apr 27, 1:56 pm, andrzej Popielewicz <va...@icpnet.pl> wrote:
>> On 2012-04-27 03:46, Roy wrote:

> I wonder if sigsuspend() should be same as sigprocmask(SIG_SETMASK,

Read this Linux man page; it applies to Coherent as well:

http://linux.die.net/man/2/sigsuspend

The essential point is that sigsuspend waits for a signal for which a
signal handler was defined.

Notice also the possible role of sigprocmask.

If you look at the source code of mksh, when the option -DMKSH_NOPROSPECTOFWORK
is NOT used, then both sigprocmask and sigsuspend are used.

Andrzej

Andrzej Popielewicz

Apr 29, 2012, 1:52:59 PM
Andrzej Popielewicz wrote:

> internal error, roque pointer 8C8 !!!
>


I would suggest that mksh has internal problems with memory access.
Something tries to write to a memory range allocated by malloc, and
this write is in some way incorrect; for example, it exceeds the
allowed range of memory addresses.
In other words, I suspect one should look for memory corruption, a buffer
overflow, etc. Sometimes the last byte of a character string, the '\0', is
written to a memory address that was not allocated because the allocated
storage was one byte too short, just to mention one example.
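
To make the "one byte too short" example concrete, here is a small sketch of a string copy that allocates correctly, with the buggy variant noted in a comment. dup_string is a hypothetical helper, not something taken from mksh.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy a string into freshly allocated storage. */
char *dup_string(const char *src)
{
    size_t len = strlen(src);
    /* The bug described above would be malloc(len): no room for the '\0'. */
    char *copy = malloc(len + 1);

    if (copy == NULL)
        return NULL;
    memcpy(copy, src, len + 1);   /* copies the terminating '\0' as well */
    return copy;
}
```

With malloc(len) instead, the final byte would land just past the block, which is the kind of write an allocator's consistency checks tend to notice only much later.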

Andrzej

Roy

Apr 29, 2012, 10:16:14 PM
On Apr 30, 1:14 am, Andrzej Popielewicz <va...@icpnet.pl> wrote:
> Roy wrote:
>
> > I wonder if sigsuspend() should be same as sigprocmask(SIG_SETMASK,
>
> In Coherent it should have the same result.
>
> But returning to the main problem . I assume that You suspect that
> sigsuspend is broken. Well, so far as I can estimate this signal
> handling in Coherent is quite well compatible with SYSV and Posix
> standards.
> You mentioned similar problem in other OS, which was fixed. This fixed
> problem concerns threads and Coherent does not have threads and
> "translating" this fix to processes is not quite obvious if at all
> making sense.
>
> I have analysed Your package. I could build mksh using gcc-2.8.1 and
> gcc-3.2.3 and I did not notice any problems with sigsuspend. I did NOT
> use Your option D_NOPROSPECTFORWORK nor D_NOEMPLOYED. I have removed
> this option from Your mk.sh. I have added LDFLAGS="-L/usr/lib -lsocket
> -lcoh -L/lib/ndp -lc -lm" , where libcoh.a is my own library of utility
> functions and libsocket.a is original from mwc. I have have removed
> option -DUSE_TERMIO.

Please release gcc-2.8.1 and gcc-3.2.3 binary packages for testing.
I'm using gcc-2.5.6 from gnu[1234].c.c++.dd, with ld.pre11 as /bin/ld.
-DMKSH_UNEMPLOYED is needed if the OS doesn't support the tc[gs]etpgrp()
calls.

>
> Unfortunalety mksh built in such a way does not work. I observe the message
>
> internal error, roque pointer 8C8 !!!
>
> Never seen such message, although I have ported hundreds of packages.
>
> BTW , the same meesage is produced if I use Your option -D_NOPROSPECTS...
>

That message is not found in the mksh source, so it should come from the
kernel or libc, and I haven't seen it with my binary (although I'm using your
4.2.10ap 128MB-enabled kernel).
Build log: http://roy.orz.hm/soft/mksh-logs/build-coherent4210-no_sigsuspend_fixsignallist.log
(I hacked Build.sh to get correct signal list output)
Test log: http://roy.orz.hm/soft/mksh-logs/test-coherent4210-no_sigsuspend_fake-ls-s.log
(I created a wrapper for /bin/ln that fakes symbolic links, so the cd-pe
test has to be disabled because of a recursive directory loop)

> Concluding I do not think the problem is related to sigsuspend, of
> course I may be wrong.
>
> BTW , the missing functions like lstat, readlink . tcgpgrp, etc I have
> supplied in the form of wrappers, for example my lstat uses stat ,
> readlink returns -1 and errno is set to EINVAL etc. This method is oft
> used, because You do not have to change the original source code.

"Failing" tc[gs]etpgrp() stubs cannot fool mksh into having job control.
pdksh in Coherent is compiled without job control.

>
> Andrzej

Roy

Apr 29, 2012, 10:45:32 PM
On Apr 30, 1:14 am, Andrzej Popielewicz <va...@icpnet.pl> wrote:
Also, you may re-download mksh-coh.tgz, which is based on the author's
internal version that has an #ifdef for using sigprocmask()+pause()
+sigprocmask() instead of sigsuspend(); this works on Coherent,
Syllable Desktop and BeOS 5.0.

The original code calls sigsuspend(&sm_default), and sm_default is set by
sigemptyset(). That means sigsuspend() should return when any signal
is raised (a child exit should raise SIGCLD), but sigsuspend() doesn't
return, whereas pause() does. "so it(mksh)'s definitely a check tool
for kernel bugs" said tg (author of mksh).
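
The behaviour described above can be checked outside mksh with a small stand-alone test. This is a sketch, not code from mksh: wait_child_with_sigsuspend and on_sigchld are hypothetical names, and it assumes sigaction(), fork() and waitpid() are available on the target. On a kernel with the problem described here, the call would be expected to hang in sigsuspend().

```c
#define _POSIX_C_SOURCE 200809L  /* only needed for strict -std= builds on modern systems */
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigchld = 0;

static void on_sigchld(int signo)
{
    (void)signo;
    got_sigchld = 1;
}

/* Fork a child that exits with `code`, wait for SIGCHLD in sigsuspend() with
 * an empty mask, and return the child's exit status, or -1 on error. */
int wait_child_with_sigsuspend(int code)
{
    struct sigaction sa;
    sigset_t block, empty, saved;
    pid_t pid;
    int status;

    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGCHLD, &sa, NULL) == -1)
        return -1;

    sigemptyset(&empty);
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    /* Block SIGCHLD so it cannot be delivered before sigsuspend() is entered. */
    if (sigprocmask(SIG_BLOCK, &block, &saved) == -1)
        return -1;

    got_sigchld = 0;
    pid = fork();
    if (pid == -1) {
        sigprocmask(SIG_SETMASK, &saved, NULL);
        return -1;
    }
    if (pid == 0)
        _exit(code);

    while (!got_sigchld)
        sigsuspend(&empty);   /* unblocks everything and waits, atomically */

    sigprocmask(SIG_SETMASK, &saved, NULL);
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```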

Roy

Roy

Apr 29, 2012, 10:49:28 PM
> Build log:http://roy.orz.hm/soft/mksh-logs/build-coherent4210-no_sigsuspend_fix...
> (I hacked Build.sh for having correct signal list output)
> Test log:http://roy.orz.hm/soft/mksh-logs/test-coherent4210-no_sigsuspend_fake...
> (I created a wrapper for /bin/ln faking symbolic links, therefore cd-
> pe test have to be disabled because of recursive directory loop)

the test log should be: http://roy.orz.hm/soft/mksh-logs/test-coherent4210-no_sigsuspend_fake-ln-s.log

Roy

Apr 29, 2012, 11:54:29 PM
Forgot to mention: Build.sh in this package requires GNU grep under the name
"ggrep", because Coherent grep doesn't seem to work with the search
patterns that Build.sh uses.

andrzej Popielewicz

Apr 30, 2012, 2:08:45 AM
On 2012-04-30 05:54, Roy wrote:
> forgot to mention: Build.sh in this package requires GNU grep in name

I am using GNU grep.

Andrzej

andrzej Popielewicz

Apr 30, 2012, 2:37:43 AM
On 2012-04-30 04:16, Roy wrote:
> please release gcc-2.8.1 and gcc-3.2.3 binary packages for testing.
> I'm using gcc-2.5.6 from gnu[1234].c.c++.dd, with ld.pre11 as /bin/ld

As I have said many times, I will release almost everything, for example
as a CD or DVD image, once Robert is ready with his OpenCoherent
license. On the other hand, I can imagine thousands of questions, as I
can see in this small case.
I have about 20 GB, if not more, of different versions of different packages,
and I am not going to make any selection; I will just release everything.
Another problem: there is no guarantee it will work in standard Coherent. I
would have to test it first, and that all costs time and hardware. And I am
certainly not planning to test whether it works in an emulator, etc. It is
not so easy.

> -DMKSH_UNEMPLOYED is needed if OS doesn't support tc[gs]etpgrp()

I understand. But you can write your own dummy routines, as I mentioned
in my post. They will return -1.
The code will then link, and you can always add real functionality to
these functions later.
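
Such dummy routines might look like the sketch below. The coh_ prefix is hypothetical (a real port would use the standard names tcgetpgrp and tcsetpgrp), and ENOSYS is an assumption: if the target's <errno.h> lacks it, another error code has to be picked.

```c
#include <errno.h>
#include <sys/types.h>

/* Stub: the foreground process group cannot be queried on this system. */
pid_t coh_tcgetpgrp(int fd)
{
    (void)fd;
    errno = ENOSYS;   /* assumption: ENOSYS exists on the target */
    return (pid_t)-1;
}

/* Stub: the foreground process group cannot be changed on this system. */
int coh_tcsetpgrp(int fd, pid_t pgrp)
{
    (void)fd;
    (void)pgrp;
    errno = ENOSYS;
    return -1;
}
```

Stubs like these only make the program link; a caller that checks the return value will still see job control as unavailable.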


> not found in mksh source, this should be message from kernel/libc.
> and I haven't seen this in my binary. (although I'm using your
> 4.2.10ap 128MB-enabled kernel)

Yes, of course; I did not claim it is a message produced by mksh. But
the message suggests the nature of the problem the system has with running
this program: the program does something that is not acceptable
to the system.

But in general the phenomenon is not new; I have had such cases.
A program linked but did not work: for example, it crashed or even said
"cannot execute". Sometimes a solution was found and sometimes not. I am
not a die-hard porter: if something does not work, I leave it and take up
another project; after some time I come back, and perhaps a solution is found.

Interestingly, if I use gcc-4.4.6, mksh does not produce this
message but the program loops, in the sense that it consumes CPU time,
as observed on another terminal, and the system is very slow. But gcc-4.4.6
is rather experimental, and I have more confidence in gcc-2.8.1 and
gcc-3.2.3.

> (I created a wrapper for /bin/ln faking symbolic links, therefore cd-
> pe test have to be disabled because of recursive directory loop)

I also use this trick on my system. Sometimes "ln" is simply a "cp".

>
> "failing" tc[gs]etpgrp() cannot fool mksh for having jobs control.
> pdksh in coherent has no jobs control compiled.

Yes. But I have also implemented these functions. In any case, with
real or dummy functions the effect is the same.


Andrzej


andrzej Popielewicz

Apr 30, 2012, 2:55:27 AM
On 2012-04-30 04:45, Roy wrote:
> On Apr 30, 1:14 am, Andrzej Popielewicz<va...@icpnet.pl> wrote:
>> Roy wrote:
>
> and you may redownload mksh-coh.tgz which is based on author's
> internal version having #ifdef for using sigprocmask()+pause()
> +sigprocmask() instead of sigsuspend(), which works in Coherent/
> Syllable Desktop/BeOS 5.0.
>

OK, I will check it.

> in original code, sigsuspend(&sm_default) and sm_default is set by
> sigemptyset(), that means sigsuspend() should return when any signal
> is raised (child exit should raise SIGCLD), but sigsuspend() doesn't
> return and pause() does return. "so it(mksh)'s definitely a check tool
> for kernel bugs" said by tg(author of mksh).

Not so fast. If something does not work, it does not mean the kernel is bad.
In 99% of such cases it is the program that is wrong.

If sigsuspend does not return on a SIGCHLD signal, it means
no signal handler was defined for SIGCHLD. You can write a simple
program, with and without a signal handler defined. If the child sends a
SIGCHLD signal, waitpid will always "detect" it, meaning it returns -1
with an EINTR error. I have tested this, so I assume the Coherent kernel
behaves correctly, at least as far as I understand it now.

So you have to check whether, where you use sigsuspend, the
signal handler for SIGCHLD is active; I mean explicitly defined, not
SIG_IGN or SIG_DFL.
The default behaviour of the Coherent kernel is to ignore SIGCHLD, but only
when the action is SIG_IGN or SIG_DFL. If a signal handler is explicitly
defined with a nontrivial action, the signal is not ignored and waitpid
detects it.

Andrzej

andrzej Popielewicz

Apr 30, 2012, 3:55:26 AM
On 2012-04-30 04:45, Roy wrote:
> On Apr 30, 1:14 am, Andrzej Popielewicz<va...@icpnet.pl> wrote:
>> Roy wrote:


> and you may redownload mksh-coh.tgz which is based on author's
> internal version having #ifdef for using sigprocmask()+pause()
> +sigprocmask() instead of sigsuspend(), which works in Coherent/
> Syllable Desktop/BeOS 5.0.

So you are saying that in this case mksh behaves OK?

As I said, I will check it. Logically, this construction is handled by
the Coherent kernel in the same way as sigsuspend, although there are some
technical differences.
If this case really works, I will try to fix sigsuspend
accordingly, simply by removing those differences.
An obvious difference is that sigsuspend is more "atomic": it is one system
call, whereas the construction above consists of three system calls. This
difference cannot be removed.

Andrzej

andrzej Popielewicz

Apr 30, 2012, 3:05:27 AM
On 2012-04-30 08:55, andrzej Popielewicz wrote:
> On 2012-04-30 04:45, Roy wrote:

> You can write simple
> program, with and without signal handler defined.If child sends SIGCHLD
> signal, waitpid always will "detect" this, it means will return with -1
> and EINTR error. I have tested it, so I assume Coherent kernel behaves
> correctly , at least as I can understand this now.
>

Of course, waitpid will detect the SIGCHLD signal if a signal handler is
explicitly defined, and will NOT detect it if no signal handler is
defined. By a defined handler I mean not only the signal or
sigaction call, but also the function body of the handler.
The handler body can be empty; it is enough that it exists and is
not NULL.
Sometimes one calls wait or waitpid in the handler's body.


Andrzej


Roy

Apr 30, 2012, 4:15:46 AM
On Apr 30, 3:55 pm, andrzej Popielewicz <va...@icpnet.pl> wrote:
> On 2012-04-30 04:45, Roy wrote:
>
> > On Apr 30, 1:14 am, Andrzej Popielewicz<va...@icpnet.pl>  wrote:
> >> Roy wrote:
> > and you may redownload mksh-coh.tgz which is based on author's
> > internal version having #ifdef for using sigprocmask()+pause()
> > +sigprocmask() instead of sigsuspend(), which works in Coherent/
> > Syllable Desktop/BeOS 5.0.
>
> So You suggest, that in this case mksh behaves OK ?

Yes. It definitely works, although the EXIT trap still produces a bit too
much output (which doesn't happen on other OSes, for example Syllable
Desktop and BeOS).
FAIL check.t15:regression-61
Description:
Check if EXIT trap is executed for sub shells.
unexpected stdout - got too much output
wanted:
start
A
A last
B
C
C last
sub exit
parent last
parent exit
got:
start
A
A last
B
C
C last
sub exit
parent last
parent exit
parent exit

Roy

Apr 30, 2012, 4:10:37 AM
On Apr 30, 2:55 pm, andrzej Popielewicz <va...@icpnet.pl> wrote:
> On 2012-04-30 04:45, Roy wrote:
>
> > On Apr 30, 1:14 am, Andrzej Popielewicz<va...@icpnet.pl>  wrote:
> >> Roy wrote:
>
> > and you may redownload mksh-coh.tgz which is based on author's
> > internal version having #ifdef for using sigprocmask()+pause()
> > +sigprocmask() instead of sigsuspend(), which works in Coherent/
> > Syllable Desktop/BeOS 5.0.
>
> OK, I will check it.
>
> > in original code, sigsuspend(&sm_default) and sm_default is set by
> > sigemptyset(), that means sigsuspend() should return when any signal
> > is raised (child exit should raise SIGCLD), but sigsuspend() doesn't
> > return and pause() does return. "so it(mksh)'s definitely a check tool
> > for kernel bugs" said by tg(author of mksh).
>
> Not so fast. If something does not work it does not mean kernel is bad.
> In 99% such cases usually there is program which is wrong.
>
> If sigsuspend is not returning in the case of SIGCHLD signal, it means
> there were no signal handler defined for SIGCHLD. You can write simple
> program, with and without signal handler defined.If child sends SIGCHLD
> signal, waitpid always will "detect" this, it means will return with -1
> and EINTR error. I have tested it, so I assume Coherent kernel behaves
> correctly , at least as I can understand this now.

You may have a look at jobs.c, which processes the signals. In j_init():
#ifndef MKSH_NOPROSPECTOFWORK
        (void)sigemptyset(&sm_default);
        sigprocmask(SIG_SETMASK, &sm_default, NULL);

        (void)sigemptyset(&sm_sigchld);
        (void)sigaddset(&sm_sigchld, SIGCHLD);

        setsig(&sigtraps[SIGCHLD], j_sigchld,
            SS_RESTORE_ORIG|SS_FORCE|SS_SHTRAP);
#else
        /* Make sure SIGCHLD isn't ignored - can do odd things under SYSV */
        setsig(&sigtraps[SIGCHLD], SIG_DFL, SS_RESTORE_ORIG|SS_FORCE);
#endif
So, as far as I understand, it does set j_sigchld() as the SIGCHLD
signal handler.

andrzej Popielewicz

Apr 30, 2012, 4:46:27 AM
On 2012-04-30 10:10, Roy wrote:
> #else
> /* Make sure SIGCHLD isn't ignored - can do odd things under SYSV */
> setsig(&sigtraps[SIGCHLD], SIG_DFL, SS_RESTORE_ORIG|SS_FORCE);
> #endif


So if NOPROSPECTOFWORK is defined, the action is set to SIG_DFL, which
means there is no signal handler.
In Coherent, if the action is SIG_IGN or SIG_DFL, the signal is
ignored/discarded.

So add an empty signal handler and replace SIG_DFL with this handler.
That should work.
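
As a sketch of that suggestion, written with plain sigaction() rather than mksh's own setsig() helper (whose exact semantics are not shown in this thread): empty_sigchld_handler and install_empty_sigchld_handler are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L  /* only needed for strict -std= builds on modern systems */
#include <signal.h>
#include <stddef.h>

/* Empty on purpose: its existence is what matters, not its body. */
static void empty_sigchld_handler(int signo)
{
    (void)signo;
}

/* Hypothetical helper: replace the SIG_DFL action for SIGCHLD with the empty
 * handler. Returns 0 on success, -1 on failure (as sigaction() does). */
int install_empty_sigchld_handler(void)
{
    struct sigaction sa;

    sa.sa_handler = empty_sigchld_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGCHLD, &sa, NULL);
}
```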


Andrzej

andrzej Popielewicz

Apr 30, 2012, 4:48:34 AM
On 2012-04-30 10:15, Roy wrote:

> Yes. It definitely works. Although it still have a bit flow over of

OK, so I will check this new version in the evening.
I will also try to use the old gcc-2.5.6.

Andrzej

Roy

Apr 30, 2012, 8:49:51 AM
"NOPROSPECTOFWORK means absolutely no job handling and discarding the
signal is the desired action here" said tg.

>
> Andrzej

Andrzej Popielewicz

Apr 30, 2012, 1:00:58 PM
Roy wrote:

> "NOPROSPECTOFWORK means absolutely no job handling and discarding the
> signal is the desired action here" said tg.

OK. So now I understand that the problems with sigsuspend occur when
you do NOT use the NOPROSPECTOFWORK option. In that case the signal handler
is set, and of course it is strange that the SIGCHLD signal is not
observed/caught.
I can imagine that, for example, the signal mask is somehow incorrect
so that SIGCHLD is blocked, but that would require
deeper analysis of the code.

One would have to analyse what the author wanted to do and whether his code
is appropriate to achieve that goal on the given OS. It is a rather trivial
remark, but it must be done.

First of all I will try to reproduce the problem using gcc-2.5.6,
because newer compilers give different results. I have met cases,
and it is well known, where the behaviour of a program depends on
the compiler you use. And with a system as restrictive as Coherent one
has to be prepared for surprises. It is even worse for kernel-related stuff.

Andrzej



Andrzej Popielewicz

Apr 30, 2012, 2:44:43 PM
Roy wrote:

>
> Yes. It definitely works. Although it still have a bit flow over of

Good news.
First of all, your mksh works on my system.

Then I wanted to reproduce your results. I could build a working mksh
using your mk.sh and gcc-2.5.6 from MWC.

More: I could build a working mksh without setting the option
-DMKSH_UNEMPLOYED in mk.sh. mksh built this way complained at
startup that job control will not be fully functional, but
everything works fine; I mean both options -i and -c work fine.

And if you omit -DMKSH_UNEMPLOYED, what happens? If it does not work,
it would mean we have different environments, which is obviously
true; for example, a quite different kernel.
You have one option, for example: you could try to debug mksh with gdb.


Compiled and linked with the "better" gcc, it still does not work, as before.
You could try the following: analyse mksh with valgrind on
Linux. The results with the "better" gcc show or suggest that something may be
wrong in mksh, and valgrind should detect a possible buffer overflow or
memory leak.

In the meantime I will try to figure out why your
sigsuspend->sigprocmask replacement was necessary; perhaps something
really could be improved in the kernel.

I will add a link to your site with mksh on my site.
Thanks for contributing to Coherent.

Andrzej

Andrzej Popielewicz

Apr 30, 2012, 5:04:59 PM
Andrzej Popielewicz wrote:
> Roy wrote:
>
>

Roy, thank you very much. You have contributed to a Coherent kernel fix.

The better gcc was right. There was a buffer overflow; I have found it, in
the kernel itself. So now your mk.sh produces a working mksh without
D_NO_SIGSUSPEND. sigsuspend works fine in Coherent.

BTW, I was wrong to say that sigsuspend = sigprocmask + pause
+ sigprocmask. That is not true; sigsuspend is much more sophisticated. It
must act according to its POSIX specification.

In any case, the version of mksh with sigsuspend works fine, and your version
with your trick without sigsuspend is also usable in standard Coherent.
For mksh this fix is not important, because there is no job control
anyway, so your version of mksh is enough.

But it is more important than that: many other applications, all those
using sigsuspend, could benefit from this fix.




Andrzej

Roy

Apr 30, 2012, 7:50:29 PM
On May 1, 2:44 am, Andrzej Popielewicz <va...@icpnet.pl> wrote:
> Roy wrote:
>
>
>
> > Yes. It definitely works. Although it still have a bit flow over of
>
> Good news.
> First of all, Your mksh works in my system.
>
> Then I wanted to reproduce Your results. I could build working mksh
> using Your mk.sh and gcc-2.5.6 from mwc.
>
> More : I could build working mksh without setting the option
> D_UNEMPLOYED in mk.sh. mksh built in such a way made complaints at the
> start that a job a job control will not be fully functional but
> everything works fine, I mean both options -i and -c work fine.

So you got working job control? Are "command &", bg, fg, jobs and the
builtin kill working?
example: http://linuxcommand.org/lts0080.php

Roy

Apr 30, 2012, 8:44:46 PM
On May 1, 2:44 am, Andrzej Popielewicz <va...@icpnet.pl> wrote:
> Roy wrote:
>
>
>
> > Yes. It definitely works. Although it still have a bit flow over of
>
> Good news.
> First of all, your mksh works on my system.
>
> Then I wanted to reproduce your results. I could build a working mksh
> using your mk.sh and gcc-2.5.6 from mwc.
>
> What is more, I could build a working mksh without setting the
> -DMKSH_UNEMPLOYED option in mk.sh. Built that way, mksh complained at
> startup that job control would not be fully functional, but
> everything works fine; I mean both the -i and -c options work.
>
> And if you omit -DMKSH_UNEMPLOYED, what happens? If it does not work,
> it would mean we have different environments, which is obviously
> true: a quite different kernel, for example.
> You do have one option, though: you could try to debug mksh with gdb.

With gcc 2.5.6 + ld.pre11 and the -g option, the resulting binary
always dumps core.

Andrzej Popielewicz

May 1, 2012, 4:13:29 AM
Roy wrote:

> So you got working job control? Do "command &", bg, fg, jobs and the
> builtin kill all work?
> Example: http://linuxcommand.org/lts0080.php

Partially, yes. jobs works in the sense that it shows the list of
background jobs. I started a simple looping script in the background,
not with bg but with &.
kill works.

At startup I see the message about setpgid failing and job control not
being fully functional. Perhaps one should look around the setpgid
call.
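
One way to check setpgid separately from mksh is a tiny probe like the
one below. It is only a sketch under my own assumptions, not code from
mksh; the function name setpgid_probe is made up. It forks a child,
lets the child call setpgid(0, 0), and reports whether the child really
became the leader of its own process group.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical probe: returns 1 if a forked child could move itself into
 * its own process group, 0 if setpgid failed or had no effect, and -1 if
 * fork or waitpid failed.
 */
int setpgid_probe(void)
{
    pid_t pid;
    int status;

    pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* Child: try to become leader of a new process group. */
        if (setpgid(0, 0) == -1)
            _exit(1);
        /* Verify that the group id now equals our own pid. */
        _exit(getpgrp() == getpid() ? 0 : 2);
    }
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    if (!WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

If a main() that prints the result of setpgid_probe() shows 0 on Coherent
while it shows 1 elsewhere, the problem would be in the kernel or libc
rather than in the shell.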

Andrzej

Andrzej Popielewicz

May 1, 2012, 4:18:41 AM
Roy wrote:

>> You do have one option, though: you could try to debug mksh with gdb.
>
> With gcc 2.5.6 + ld.pre11 and the -g option, the resulting binary
> always dumps core.
>

Do the following. Replace /u1/gnu/bin/gcc with gcc in mk.sh.
Then, in the shell:
export PATH=/u1/gnu/bin:$PATH

This way the ld from /u1/gnu/bin will be used, which is GNU ld-2.2.
Perhaps that helps.
I built mksh using this method.
I use the original mwc ld only when building the kernel.

Andrzej

Thorsten Glaser

May 4, 2012, 3:37:22 PM
Andrzej Popielewicz wrote:

>> internal error, rogue pointer 8C8 !!!

> I would suggest that mksh has internal problems with memory access.
> Something tries to write to a memory range allocated by malloc, and this
> write operation is somehow incorrect; for example, it exceeds the allowed
> range of memory addresses.

Exactly. Something passes bogus pointers to one of the allocation
routines (aresize or afree).

> In other words, one should look for memory corruption, buffer overflows,
> etc., I suspect. Sometimes the terminating '\0' byte of a character string
> is written to a memory address that was not allocated, because the
> allocated storage was one byte too short, just to mention one example.

On Debian, mksh runs just fine under valgrind (given it is
compiled with the Build.sh -g option). I believe mksh itself
does not have any unchecked memory accesses that are not
introduced by the compiler, libraries or system.

If you add -DDEBUG to the compilation of lalloc.c (just that
one file) you would get more information and a core dump by
means of calling abort() on that place.

bye,
//mirabilos
--
FWIW, I'm quite impressed with mksh interactively. I thought it was much
*much* more bare bones. But it turns out it beats the living hell out of
ksh93 in that respect. I'd even consider it for my daily use if I hadn't
wasted half my life on my zsh setup. :-) -- Frank Terbeck in #!/bin/mksh

Andrzej Popielewicz

May 5, 2012, 8:25:34 AM
Thorsten Glaser wrote:
> Exactly. Something passes bogus pointers to one of the allocation
> routines (aresize or afree).

If you are suggesting it could be the system, well, that cannot be excluded.
>
> On Debian, mksh runs just fine under valgrind (given it is

That is valuable information, because porting valgrind to Coherent would
be rather difficult.

> If you add -DDEBUG to the compilation of lalloc.c (just that

I will check it, because I am interested in why the better compiler
produces failing code. It makes mksh quite an interesting stress test,
although the phenomenon is not entirely new: most programs compiled by
these better compilers work fine, but some do not.

The gcc project does not support Coherent as such, and I simply do not
have the time and patience to backport any patches that are published,
and gcc-4.4.6 is the last version that supports the COFF format.


In the meantime, the primary problem was solved by fixing the kernel.


Andrzej

Andrzej Popielewicz

May 5, 2012, 11:34:01 AM
Roy wrote:

> not found in mksh source, this should be message from kernel/libc.

It is certainly not a message from libc or the kernel. It is produced by
lalloc.c, in the findptr function.

Andrzej

Andrzej Popielewicz

May 5, 2012, 11:59:23 AM
Thorsten Glaser wrote:

> If you add -DDEBUG to the compilation of lalloc.c (just that
> one file) you would get more information and a core dump by
> means of calling abort() on that place.

I followed your advice and used DEBUG in lalloc.

The message produced now is not much more informative; it says "rogue
pointer 8c8 at ap 0". After adding some fprintfs I found that the
error occurs in findptr, which is itself called by afree.
It means that ap itself in findptr is not NULL but ap->next is.

afree fails for the same reason: ap->next is NULL.
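
To make the failure mode concrete, here is a simplified sketch of an
allocator that keeps its blocks on a singly linked list and has to find
a block before freeing it. It is NOT the code from lalloc.c; the structs
and the names area_alloc, find_link and area_free are made up for
illustration. The lookup walks the list, and reaching a NULL next
pointer without a match is what a "rogue pointer" report would
correspond to in this model.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical header stored in front of every payload. */
struct item {
    struct item *next;
};

/* Hypothetical allocation area: just the head of the item list. */
struct area {
    struct item *next;
};

/* Allocate size bytes and link the new item at the head of the area. */
void *area_alloc(struct area *ap, size_t size)
{
    struct item *it = malloc(sizeof(*it) + size);

    if (it == NULL)
        return NULL;
    it->next = ap->next;
    ap->next = it;
    return it + 1;      /* payload starts right after the header */
}

/*
 * Walk the list and return the address of the link that points at the
 * item owning ptr. Returns NULL when the walk hits the end of the list,
 * i.e. ptr was never handed out by this area (a "rogue pointer").
 */
static struct item **find_link(struct area *ap, const void *ptr)
{
    struct item **link = &ap->next;

    while (*link != NULL) {
        if ((const void *)(*link + 1) == ptr)
            return link;
        link = &(*link)->next;
    }
    return NULL;
}

/* Unlink and free the item owning ptr; -1 signals a rogue pointer. */
int area_free(struct area *ap, void *ptr)
{
    struct item **link = find_link(ap, ptr);
    struct item *it;

    if (link == NULL)
        return -1;
    it = *link;
    *link = it->next;
    free(it);
    return 0;
}
```

In this model a lookup fails either because the pointer is bogus or
because a next field in the list was overwritten, which cuts the list
short. Both would look the same from inside the lookup.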

BTW, the error occurs after many remalloc calls in subsequent aresize
calls have succeeded.
I feel it will be a rather difficult task to find out why the code
created with the newer compiler has problems with memory allocation and
freeing. Perhaps it depends on the headers.

Anyway thanks for comments.

Andrzej

Andrzej Popielewicz

May 5, 2012, 2:31:26 PM
> Thorsten Glaser wrote:

> On Debian, mksh runs just fine under valgrind (given it is

Thorsten, do not worry. Valgrind is right.

I have just found the solution to the problem of why the better compiler
produces bad code.
The culprit is NOT the compiler but the linker, I mean the ld command.
The gcc-2.5.6 distribution from mwc comes with GNU ld-2.2. Usually I use
ld-2.4. I have ported higher versions of ld, but they have problems.
After switching to ld-2.2, both gcc-2.8.1 and gcc-3.2.3 also produce a
working mksh.

Interestingly, in most cases programs built with ld-2.4 worked fine.
This is an interesting and, I think, valuable lesson for Coherent users.
If I use gcc-2.5.6 with ld-2.4, it also produces the "rogue pointer 8c8"
error.

Thanks anyway.

Andrzej

andrzej Popielewicz

May 8, 2012, 2:45:10 AM
On 2012-05-05 20:31, Andrzej Popielewicz wrote:
> > Thorsten Glaser wrote:


This does not mean ld-2.2 is the only reliable linker on Coherent.
I have just fixed ld-2.14, built with gcc-2.8.1.

And now mksh also works fine when built with gcc-3.2.3 and ld-2.14!

Andrzej
