We have seen something similar where QM is not notified of loss of the
network connection and the process hangs inside a Linux library call where
we cannot see the logout request.
Although we need a better solution for this, you should be able to kill the
QM processes from Linux rather than doing a complete restart. Our cleanup
mechanism will then recover the licences within five minutes. You can speed
this up by running
qm -cleanup
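For anyone who wants to script this, here is a rough Python sketch of the sequence described above: stop the hung processes, then run the cleanup once. It is only an illustration under assumptions. The function names (process_exists, stop_process, stop_and_cleanup) and the grace period are made up for the example, and you have to supply the pids of the hung QM processes yourself. It tries SIGTERM first and only falls back to SIGKILL if the process is still there.

```python
import os
import signal
import subprocess
import time


def process_exists(pid):
    """Return True if a process with this pid is still in the process table."""
    try:
        os.kill(pid, 0)  # signal 0 sends nothing; it only checks the pid
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def stop_process(pid, grace=5.0):
    """Send SIGTERM, wait up to `grace` seconds, then fall back to SIGKILL."""
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return  # already gone
    deadline = time.monotonic() + grace
    while time.monotonic() < deadline:
        if not process_exists(pid):
            return
        time.sleep(0.1)
    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        pass  # exited between the last check and the kill


def stop_and_cleanup(pids, cleanup_cmd=("qm", "-cleanup")):
    """Stop each listed process, then run the cleanup command once.

    The default command is the one quoted in this thread; the pids are
    whatever hung QM processes you have identified (an assumption here).
    """
    for pid in pids:
        stop_process(pid)
    return subprocess.run(list(cleanup_cmd), check=False).returncode
```

Note that a process stuck in an uninterruptible kernel call may not go away even after SIGKILL until that call returns, so treat this as a convenience rather than a guaranteed fix.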
I have forwarded your email to one of our dealers who has identified a
problem in Linux ssh that might explain this.
Martin Phillips
Ladybridge Systems Ltd
17b Coldstream Lane, Hardingstone, Northampton, NN4 6DB
+44-(0)1604-709200
We are running Gentoo base system version 1.6.14.
--
Cedric Fontaine
http://www.terroirsquebec.com
So I should just kill -9 the qm process on Linux and then run qm -cleanup?
I'm sorry Martin, I really don't expect you to be resolving Linux
issues, but I do see a great deal of irony in all of this.
T
I sometimes get this sort of finger pointing. It often happens on
Windows systems, and another MV database that I use. Nice to see the
same thing happening with Linux. Don't want the FOSS people missing
out! ;-)
Anyway, I've found an effective way to stop the finger pointing is to
ask the person doing the pointing for EVIDENCE that the bug is where
they say it is.
So, Martin. Do you have proof that the bug is in the Linux networking code?
Ashley Chapman
> I'm sorry Martin, I really don't expect you to be resolving
> Linux issues, but I do see a great deal of irony in all of this.
I agree that it is not our job but, in this particular instance, one of our
dealers has identified and fixed a problem that sounds like it could be the
same issue. I have asked him to communicate directly with Cedric (or perhaps
via this list) and he has agreed to do so as soon as time permits.
Re Ashley's comment...
> So, Martin. Do you have proof that the bug is in the Linux
> networking code?
We have seen two network connection problems that appear to be in Linux. The
one that fits closest to Cedric's problem is where we hang inside a kernel
function (as shown by strace) and never return to QM. This makes it
difficult for us to catch the error.
The other one involves poll() or select() saying "yes, there is data waiting
to be read" and read() saying "no there isn't", resulting in a loop trying
to recover the non-existent data. We have worked around this one inside QM.
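To illustrate the second case, here is a minimal Python sketch of one common defence. It is not the workaround that is inside QM, just the general idea. The Linux select(2) manual page notes that a socket reported as ready for reading may still block on the following read, so the usual advice is to make the socket non-blocking and treat an empty-handed read as a spurious wakeup. The name read_when_ready and the max_wakeups cap are invented for this example.

```python
import select
import socket


def read_when_ready(sock, timeout=1.0, max_wakeups=5, bufsize=4096):
    """Return the bytes read, b"" on end of file, or None if no data arrived.

    The socket is made non-blocking so that a "ready" report from select()
    that turns out to have no data behind it raises BlockingIOError
    instead of hanging inside recv().
    """
    sock.setblocking(False)
    for _attempt in range(max_wakeups):
        readable, _, _ = select.select([sock], [], [], timeout)
        if not readable:
            return None  # timed out with nothing to read
        try:
            return sock.recv(bufsize)
        except BlockingIOError:
            # select() said "readable" but recv() found nothing.
            # Go back and wait again rather than spinning on recv().
            continue
    return None  # gave up after max_wakeups spurious reports
```

Capping the retries means a descriptor that keeps reporting phantom data cannot trap the caller in an endless loop; the caller sees None and can decide whether to drop the connection.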
Just a thought...
If there's a suspected problem in the Linux internals, then presumably
this problem does not exist for QM on the Windows or BSD platforms.
If that's the case, perhaps you can consider using QM on top of
FreeBSD. Unless you are tightly tied to Gentoo.
Ashley
In other words, Martin, it may be better to be less eager to fix
this on your own, even if you can.
T
Any sign of the requested diagnostics?
We have tried repeatedly to reproduce this here but have so far failed. It
is tough to diagnose the cause without an example to look at.
Martin Phillips
This sounds very much like the Linux problem that one of our users tracked
down. I will ask him again to reply.
> What are the command lines for core dump or strace ?
Depending on your system, you should be able to force a core dump with
kill -4 (SIGILL) but it doesn't seem to work on all systems.
Where supported, the strace command is
strace -p 1234
where 1234 is the pid of the process that you want to trace.