I have been discussing this matter on the Gnatlist, but we have not
reached any definitive conclusion, so I want to ask here too.
I have a multitasking program; it has service tasks which are usually
accepting rendezvous, and periodic tasks that do some processing and
sleep for some time.
The problem is that I've detected that some (many) of these tasks are
being stopped for unusually long times, after the program has been
running for a while. It seems that I issue a delay 1.0 but they sleep
for much longer. I haven't found any deadlock or critical race condition
anywhere, and the CPU is idle.
Since the program is fairly big and complex, I've distilled the
following test case to rule out problems in my code:
http://www.mycgiserver.com/~mosteo/dead-noio.zip
This is fairly simple but exhibits the same behavior. It has a queue for
trace messages, a task reading them, a task producing them, and other
dummy tasks that seem to make the problem more likely to appear.
The program runs fine for some time, and suddenly traces cease to
appear. Some arbitrary time later, usually less than half an hour (but
sometimes hours, sometimes "never"), everything resumes normally, but it
is evident that the task I'm using as reference (called heartbeat in my
sample) has slept for much longer than it should. I detect that
condition and report it.
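For readers without the archive, the kind of check described above can be
sketched roughly as follows. This is only a minimal illustration, not the
code in the zip: the names Heartbeat_Check, Period and Tolerance, the
threshold value and the five-beat limit are all assumptions made for the
example.

```ada
with Ada.Calendar; use Ada.Calendar;
with Ada.Text_IO;

procedure Heartbeat_Check is
   Period    : constant Duration := 1.0;
   Tolerance : constant Duration := 0.5;  --  hypothetical threshold

   task Heartbeat;

   task body Heartbeat is
      Before  : Time;
      Elapsed : Duration;
   begin
      --  Five beats only, so that the sketch terminates.
      for Beat in 1 .. 5 loop
         Before := Clock;
         delay Period;
         Elapsed := Clock - Before;

         --  Report any sleep that lasted noticeably longer than requested.
         if Elapsed > Period + Tolerance then
            Ada.Text_IO.Put_Line
              ("Overslept:" & Duration'Image (Elapsed));
         end if;
      end loop;
   end Heartbeat;

begin
   null;  --  The main procedure waits here until Heartbeat terminates.
end Heartbeat_Check;
```

On a healthy system this should print nothing; a line is printed only when
a delay 1.0 takes more than 1.5 seconds as measured by Ada.Calendar.Clock.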
I have the following test combinations; in each one I launch eight
simultaneous instances of the test:
A) AMD 2000+ WinXP SP1 Gnat 3.15p. Usually some of the instances
exhibit the strange stop in less than half an hour.
B) Same computer as A) but with Linux, Gnat 3.15p native threading:
Everything OK (several hours uptime).
C) Same computer as A) but using ObjectAda free edition: Everything OK
(several days uptime).
D) PIII 1000 WinXP Gnat 3.15p. It has happened only once in several
days.
People from the gnatlist have run similar versions of this test program;
a Win2000 user has reported the freezing after 90 minutes, while users
of other platforms have not reported any problem for very long uptimes.
I would be very grateful if you could review my program and test it. I
would like to know:
a) Whether there is a justification for abnormally long delays in my
code, because of some misunderstanding or bug in my tasks.
b) Whether this is XP-specific or at least Windows-specific, or even
whether it happens to anyone apart from me.
c) Whether it could be a problem in Gnat for Windows itself.
d) How I could work around it, if possible.
I launched the eight instances just before starting this post. They're
now all frozen and, what's stranger, I've seen them freeze within 30
seconds of each other (I was keeping an eye on them) and resume at
nearly the same time. The CPU was idle all the time. This is driving
me crazy. Any light or hope in this matter would make me very happy. I
usually have uptimes of several days and no other program is showing
strange behavior. I've tested my RAM too.
Thanks in advance,
--
-------------------------
Jano
402450.at.cepsz.unizar.es
-------------------------
I had the same problem. It was caused by wrong results from the
QueryPerformanceCounter Win32 API call on some hardware configurations.
GNAT for Win32 measures time using QueryPerformanceCounter, and on some
hardware configurations this counter is sometimes wrong. See the article
in the MSDN
http://support.microsoft.com/default.aspx?scid=kb%3Ben-us%3B274323
It describes the counter leaping forward, and contains a C program to
detect it.
I encountered the counter leaping backward, and wrote an Ada program to
detect it.
See the sources to understand how to use it.
A very short description:
Press '1' and the counter is read 10000 times, checking that each
reading is bigger than the previous one;
if it is not, the message
[prev_counter] > [next_counter] appears.
Press '2' and the same check runs 20000 times, and so on up to '9'.
------------------------------------
with System.OS_Interface;
with Ada.Text_IO;

--  Checks that successive QueryPerformanceCounter readings never decrease.
procedure Time is
   use Ada.Text_IO;
   use System.OS_Interface;

   Perf_Freq                  : aliased LARGE_INTEGER;
   Curr_Counter, Prev_Counter : aliased LARGE_INTEGER;
   Char                       : Character;
   Pass                       : Natural := 0;  --  Checks left in this batch
   Max_Diff                   : LARGE_INTEGER := 0.0;
   Diff                       : LARGE_INTEGER;

   procedure Check (Item : BOOL);
   pragma Inline (Check);

   --  Raise Program_Error if a Win32 call reported failure
   procedure Check (Item : BOOL) is
   begin
      if not Item then
         raise Program_Error;
      end if;
   end Check;

begin
   Check (QueryPerformanceFrequency (Perf_Freq'Access));
   Put_Line (LARGE_INTEGER'Image (Perf_Freq));
   Check (QueryPerformanceCounter (Prev_Counter'Access));

   loop
      Check (QueryPerformanceCounter (Curr_Counter'Access));
      Diff := Curr_Counter - Prev_Counter;

      if Diff < 0.0 then
         --  The counter went backward: report both readings
         Put_Line (LARGE_INTEGER'Image (Prev_Counter)
                   & " >" & LARGE_INTEGER'Image (Curr_Counter));
      elsif Diff > Max_Diff then
         Max_Diff := Diff;
      end if;

      Prev_Counter := Curr_Counter;

      if Pass = 0 then
         --  No batch running: wait for a command key
         Get_Immediate (Char);

         case Char is
            when '0' =>
               Put_Line (LARGE_INTEGER'Image (Curr_Counter));
            when 'f' | 'F' =>
               Check (QueryPerformanceFrequency (Perf_Freq'Access));
               Put_Line (LARGE_INTEGER'Image (Perf_Freq));
            when '1' .. '9' =>
               --  Start a batch of N * 10000 checks
               Pass := Integer'Value ("" & Char) * 10000;
               Max_Diff := 0.0;
               Check (QueryPerformanceCounter (Prev_Counter'Access));
            when 'q' | 'Q' => exit;
            when others => Put (Char);
         end case;
      else
         Pass := Pass - 1;

         if Pass = 0 then
            --  Batch finished: print last reading and largest step seen
            Put_Line (LARGE_INTEGER'Image (Curr_Counter)
                      & LARGE_INTEGER'Image (Max_Diff));
         end if;
      end if;
   end loop;
end Time;
-----------------------------------
> I had the same problem. It was caused by wrong results from the QueryPerformanceCounter Win32 API call on some hardware configurations. GNAT for Win32 measures time using QueryPerformanceCounter, and on some hardware configurations this counter is sometimes wrong. See the article in the MSDN
Finally some certain facts! You don't know how grateful I feel now. I
was desperate and questioning my most basic understanding of tasking.
Your test program has indeed shown that my PC is affected.
Now the question :D Is there a solution?
It is not clear to me from the Microsoft article whether a motherboard
driver update can help. I will try that as a first attempt.
I suppose that my only other alternative is to alter my Gnat (ucks!) Any
hints for that?
Out of curiosity, is this corrected in Gnat 3.16?
> Now the question :D Is there a solution?
Change your PC :)
Pascal.
--
--|------------------------------------------------------
--| Pascal Obry Team-Ada Member
--| 45, rue Gabriel Peri - 78114 Magny Les Hameaux FRANCE
--|------------------------------------------------------
--| http://perso.wanadoo.fr/pascal.obry
--| "The best way to travel is by means of imagination"
--|
--| gpg --keyserver wwwkeys.pgp.net --recv-key C1082595
See also
www.adapower.com/articles/gnatclockfix.html
That, IIRC, was a fix to Gnat 3.14p. They released 3.15p with a
different correction, which had a bug in it (see comp.lang.ada
"Re: ANN: GNAT 3.15p binary for OS/2 available")
> See also www.adapower.com/articles/gnatclockfix.html
> That, IIRC, was a fix to Gnat 3.14p. They released 3.15p with a
> different correction, which had a bug in it (see comp.lang.ada
> "Re: ANN: GNAT 3.15p binary for OS/2 available")
Mmm. Judging by the dates, it seems that the patch is for 3.14p.
However, does anyone know if it could work with 3.15p?
In any case, I'm clueless about what I must do with the replacement
file. Do I need to download the sources and recompile? I see that the
sources are in my binary installation, but I don't think that replacing
the file there has achieved anything.
Thanks,
Glad to help.
> Your test program has indeed shown that my PC is affected.
>
> Now the question :D Is there a solution?
I think there is no way to completely work around a hardware error with
any software solution, so the best way is to change the computer.
> I don't understand clearly from the Microsoft article if a motherboard
> driver update can help. I will try that as first attempt.
I don't think so.
> I suppose that my only other alternative is to alter my Gnat (ucks!) Any
> hints for that?
>
> Out of curiosity, is that corrected in Gnat 3.16?
It is not a GNAT bug, so GNAT should not need fixing. Maybe somebody
could provide a GNAT RTL for Win32 not based on QueryPerformanceCounter?
It would help, but the time precision would decrease.
> I think there is no way to completely work around a hardware error with
> any software solution, so the best way is to change the computer.
Argh! It's only some months old!
> > I don't understand clearly from the Microsoft article if a motherboard
> > driver update can help. I will try that as first attempt.
>
> I don't think so.
They didn't :(
> It is not a GNAT bug, so GNAT should not need fixing. Maybe somebody
> could provide a GNAT RTL for Win32 not based on QueryPerformanceCounter?
> It would help, but the time precision would decrease.
I see that we are talking about two different problems? The MSKB article
talks about leaps forward, but your test program shows leaps back. In
reality, more than leaps, it seems that some value is mangled, because
the incorrect values are usually very small (< 5000).
Would the patch everybody is talking about correct both problems, or
only the leap forward? Because if the latter, I'm on my own, I suppose.
And that motherboard got good reviews... argh... and it is not an
unusual one... argh again. Oh well.
> And you need special compile option(s) to recompile
> system components. It's doable, but somewhat tedious.
Is that procedure somewhere in the documentation? I've looked (not very
thoroughly) but haven't found it.
> I'll try to
> post my correction, but in the meantime perhaps some kind soul, in
> spirit of Free Software, will send you s-osprim.adb from 3.16, which
> hopefully has only the single change.
0:-)
> You originally mentioned discussing this on gnatlist, where at least
> one other user tested and found the same behavior. I presume that was
> also using Gnat 3.15p?
I can't find their post right now and I am in a hurry... I'll dig in the
archives later. However, I don't remember any mention of that, so I
suppose it was 3.15p.
>I think there is no way to completely work around a hardware error with
>any software solution, so the best way is to change the computer.
Not sure if you mean that, but "working around hardware errors with
software" is part of my daily job. ;)
Vinzent.
SORRY SORRY SORRY my fault, I was misinterpreting the results of
Anisikov's test. I have no backward leaps. So the patch discussed
may be my solution.
Many thanks for your advice, I'm going to try it right now. Furthermore,
I was misinterpreting the results of the test (I did not read the source
carefully enough the first time). So my problem is the one described in
the knowledge base and the one which the patch addresses :) :) :)
Go and complain at the store and ask to get the hardware replaced. If
that fails, complain to the manufacturer.
--
Preben Randhol http://www.pvv.org/~randhol/
This line appears as the last line before end System.OS_Primitives,
in package elaboration, in 5wosprim.adb, in GCC 3.3.
A leap backward would look like
123412341234 > 123412341234
in the console of the Ada test after pressing one of the keys '1' to '9'.
A leap forward would be detected by the C test in the MSDN link I gave you.
In 3.16 and 5.00a there is still a problem in GNAT related to delays.
We found that applications would sometimes wait for a very long time.
This is caused by the waiting logic inside s-prtaop.adb.
If you check the loops in the routines that wait with a timeout, you will
find that on each iteration
the routines call the monotonic clock twice, which is wrong.
This bug is fixed in the wavefront of 5.01.
Wiljan
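To illustrate the point about reading the clock only once per iteration,
here is a rough, generic sketch of a wait-with-timeout loop. It is not
the GNAT runtime code and does not reproduce the fix mentioned above; the
names Timed_Wait_Sketch, Timeout and Slice and their values are
assumptions made for the example. The idea is that the exit test and the
computation of the remaining time both use the same clock reading, so
they cannot disagree.

```ada
with Ada.Real_Time; use Ada.Real_Time;
with Ada.Text_IO;

procedure Timed_Wait_Sketch is
   --  Hypothetical values, chosen only for the illustration.
   Timeout : constant Time_Span := Milliseconds (200);
   Slice   : constant Time_Span := Milliseconds (50);

   Deadline  : constant Time := Clock + Timeout;
   Now       : Time;
   Remaining : Time_Span;
begin
   loop
      --  Single clock read per iteration; every decision below uses it.
      Now := Clock;
      exit when Now >= Deadline;

      --  Sleep for the remaining time, but never longer than one slice.
      Remaining := Deadline - Now;
      if Remaining > Slice then
         Remaining := Slice;
      end if;

      delay To_Duration (Remaining);
   end loop;

   Ada.Text_IO.Put_Line ("Timeout reached");
end Timed_Wait_Sketch;
```

If the clock were read a second time to compute Remaining, a jump between
the two reads could make the test and the sleep length inconsistent.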
Same code as in 3.15p. And I think I have missed the . Sorry.
Yes, the world is not ideal. And some computer subsystems (hardware or software) have errors.
I think we should fix errors in the place where they are; this is better than working around them. We would have more stable systems this way.
> Yes. That's the initial synchronization. But it also needs to
> appear when there's a re-synchronization, ie, when Base_Clock changes.
> Unless of course 5wosprim.adb has a different solution to the problem.
I've applied the change you suggested and recompiled the runtime. After
more than 8 hours of uptime, when before my program rarely lasted for
more than half an hour without hanging, it seems I'm done with that
pesky problem.
Many thanks!