I have a CE application which runs pretty well on a number of embedded PCs
(eBox 2300sx).
I am experiencing seemingly random failures, however - some appearing days
after the application is started. These put up a dialog with the above
message (xxx is my app name).
Similar messages have also reported the failure as being in 'udevice.exe'.
My app is a Managed C# CF3.5 app and makes extensive use of a TCP Socket and
a serial port (Com2).
My app has a full set of exception handlers - with some wrapping threads,
and one wrapping the whole app - but none of these are triggered when this
dialog pops up.
1. How do I identify what is causing this dialog?
2. How do I prevent it from occurring? There is no user - it's a remotely
located device with no screen, keyboard or mouse and MUST run 24x7 completely
unattended. There is a UI but for 'diagnostic' purposes only - plug in a
screen and see what the device is doing.
I would make it a full 'windows service' if this were a PC - but it's CE :-O.
My app is run from within a command file so that IF it were to crash out -
it just gets restarted - this all works until the OS puts up this dialog box
:-O - then one of my threads stops (serial port) until the OK button is
pressed.
This is making me wonder if I really should be using CE for this project at
all :-((.
I cannot really 'instrument' this any further as adding KITL prevents me
from using the serial and ethernet ports - on which my app is based. It also
completely changes the footprint of the device and thus may even mask the
issue anyway ??.
It 'looks' to me like an exception outside of my own code - maybe in the
'managed wrappers' ??.
This one has been driving me nuts for many weeks now as I cannot get to the
bottom of it - especially as the error message is SO unhelpful!.
NB I replaced the MS SerialPort with OpenNetCF Serial and this seems to have
made the occurrence rarer - I thought it had gone away - but yesterday it
came back again after running perfectly for a few days :-O.
Finally - it does not appear to happen on ALL boxes - although this may well
be a 'red herring' - ie there could be other code changes introduced whilst
trying to instrument the problem to locate it ;-O.
I am even now getting to the stage of questioning whether CE is the right
choice here - if it can put up a show-stopper like this. My eBox has 128M RAM
so maybe I should be thinking about Embedded XP or some other OS ?? - ie one
which has better support for 'services'.
Thanks hopefully.
Regards
Graham
Kind regards,
Rob
www.robtso.nl
Thanks for the feedback.
I am building my own NK.BIN file - although I wish I did not have to do
that :-O - it's yet another layer of complexity I could do without ;-)) - but
I guess that it does give me greater control over the final product....
I am running a search right now - although I 'suspect' that it might be in
the CF code. I just saw my Com1 port - I had left it connected, and it shows
a lot of entries like :
Exception 'Access Violation' (14): Thread-Id=05170002(pth=87270854), Proc-Id=05160002(pprc=87270564) 'EmsAppCE.exe', VM-active=05160002(pprc=87270564) 'EmsAppCE.exe'
PC=40bf7e9c(mscoree3_5.dll+0x00047e9c) RA=40bf8256(mscoree3_5.dll+0x00048256) SP=0004ee68, BVA=555555a8
Exception 'Raised Exception' (-1): Thread-Id=05170002(pth=87270854), Proc-Id=00400002(pprc=812a7800) 'NK.EXE', VM-active=05160002(pprc=87270564) 'EmsAppCE.exe'
PC=c00338b1(k.coredll.dll+0x000138b1) RA=c0033901(k.coredll.dll+0x00013901) SP=d1ecefc8, BVA=ffffffff
EmsAppCE.exe is my managed code app - but these exceptions are down in
mscoree and k.coredll :-O. Without runtime access to a debugger, I don't know
how to find where this is in my code ;-O.
This has made me start to look even harder at my code though - especially my
use of Queue<string> across multiple threads - and even my exception logging
mechanism - may need to beef up the thread-safe functionality here a little
as well ;-).
I am also adding an application-level 'UnhandledException' handler - as I
originally thought that just a try-catch around the Form.Run would have done
that - but I guess not - if the exception is from another managed thread
etc....
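A minimal sketch of such an application-level handler, assuming it is installed once at startup before any worker threads are created. `CrashLog`, `Describe` and `LastReport` are hypothetical names used only for illustration; `AppDomain.CurrentDomain.UnhandledException` is the real event. Note that this event is a notification rather than a catch block, so the process normally still terminates afterwards, and it only sees managed exceptions: a native access violation inside mscoree would probably never reach it.

```csharp
using System;

// Hypothetical helper: records the last unhandled managed exception.
public static class CrashLog
{
    // Last report produced by the handler (a real app would write it to disk).
    public static string LastReport = "";

    // Formats whatever object the runtime hands to the handler.
    public static string Describe(object exceptionObject, bool isTerminating)
    {
        Exception ex = exceptionObject as Exception;
        string detail = ex != null ? ex.ToString() : Convert.ToString(exceptionObject);
        return "Unhandled (terminating=" + isTerminating + "): " + detail;
    }

    // Call once, early in Main, before starting worker threads.
    public static void Install()
    {
        AppDomain.CurrentDomain.UnhandledException += OnUnhandled;
    }

    private static void OnUnhandled(object sender, UnhandledExceptionEventArgs e)
    {
        // Keep this short: the process is usually about to exit.
        LastReport = Describe(e.ExceptionObject, e.IsTerminating);
    }
}
```

Because the handler fires for exceptions on any managed thread, it covers the case that a try-catch around the form's message loop misses.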
It's strange because I originally saw this type of failure back in January
but it seemed to be coming from my Socket Layer code :-O. Recently it has
stopped my Serial port thread - last time it stopped the whole app :-O.
Thanks for the feedback though - helps keep me on the right track (and sane
I hope!) ;-).
Trouble is that I hate 'guesswork' (like try this and see if it fixes it) -
I'd much rather be able to understand exactly what is going wrong - then fix
it. Gives far more long-term confidence in my code :-O.
Regards
Graham
I'm betting this is not a thread issue, but a compaction issue. You've got
some buffer that was passed to native code, then that buffer gets moved by
the GC and the native code tries to use the old buffer address.
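To illustrate the compaction scenario, here is a hedged sketch of pinning a managed buffer so the GC cannot move it while native code holds its address. `PinnedBufferDemo` and `FillViaPointer` are hypothetical names, and the `Marshal.WriteByte` loop is only a stand-in for a native routine writing through a raw pointer. Ordinary P/Invoke calls pin array arguments for the duration of the call; the risk arises when native code keeps the pointer after the call returns (for example, overlapped I/O).

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical demo: keeps a byte[] at a fixed address while it is in use.
public static class PinnedBufferDemo
{
    public static void FillViaPointer(byte[] buffer, byte value)
    {
        // Pinned handles stop the GC from relocating the array.
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            IntPtr address = handle.AddrOfPinnedObject();
            // Stand-in for native code writing through the raw address.
            for (int i = 0; i < buffer.Length; i++)
            {
                Marshal.WriteByte(address, i, value);
            }
        }
        finally
        {
            // Always release the pin, or the array can never be moved or freed.
            handle.Free();
        }
    }
}
```

The handle has to stay allocated for as long as the native side may touch the buffer, not just for the call that hands the pointer over.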
--
Chris Tacke, Embedded MVP
OpenNETCF Consulting
Giving back to the embedded community
http://community.OpenNETCF.com
This has generated some feedback here - sorry I was away doing something
else - having made some tweaks and letting it run.
To answer a few of the points.
Unmanaged code - the only stuff I am aware of is in the OpenNetCf_Serial
stuff - and I added that AFTER I started having these issues.
Thread communications are all (bar 1 which I will come to) done using simple
typed string queues. All declared as public static (as I read somewhere that
these were thread-safe in CF).
public static Queue<string> command = new Queue<string>();
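A caveat on the declaration above: the .NET documentation's thread-safety statement for `Queue<T>` covers the type's public static members only. Instance members such as `Enqueue` and `Dequeue` are not guaranteed to be thread-safe, and making the field `static` does not change that. A minimal sketch of a lock-guarded wrapper follows; `LockedQueue` and `TryDequeue` are hypothetical names, not something from the original application.

```csharp
using System.Collections.Generic;

// Hypothetical wrapper: serialises every access to the inner queue.
public sealed class LockedQueue<T>
{
    private readonly Queue<T> items = new Queue<T>();
    private readonly object gate = new object();

    public void Enqueue(T item)
    {
        lock (gate) { items.Enqueue(item); }
    }

    // Returns false instead of throwing when the queue is empty, so the
    // "check Count then Dequeue" race between threads cannot happen.
    public bool TryDequeue(out T item)
    {
        lock (gate)
        {
            if (items.Count == 0)
            {
                item = default(T);
                return false;
            }
            item = items.Dequeue();
            return true;
        }
    }

    public int Count
    {
        get { lock (gate) { return items.Count; } }
    }
}
```

A producer thread would call `Enqueue` and the consumer would poll `TryDequeue`; neither touches the inner `Queue<T>` directly.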
There is (or was) one probable issue I found in my code. I have a static
Audit task, which can write to a disk file, and actually does quite often.
This was actually doing the writing directly from the calls - ie in each
thread! It just uses a System.IO stream. SO - I have now decoupled this by
just adding the text to be written into yet another static Queue - then I
have one task (in my main UI timer event) which actually does the writing. My
'guess' is that threads 'could' have been competing for the log file -
especially as it gets larger (it's a new file every day) and I just append the
next log entry to it - BUT the latest crash was early on Sunday AM - at 01:54
- when the file would have been quite small anyway.
So - it's running with this change made now....so we'll see how that goes I
guess.
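The decoupling described here could be sketched roughly as below, assuming any thread may add entries and exactly one caller (such as the UI timer event) does the writing. `AuditLog`, `Add` and `Flush` are hypothetical names; the pending list is swapped out under the lock so that file I/O happens outside it.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical single-writer audit log.
public static class AuditLog
{
    private static readonly object gate = new object();
    private static List<string> pending = new List<string>();

    // Safe to call from any thread; never touches the file.
    public static void Add(string line)
    {
        lock (gate) { pending.Add(line); }
    }

    // Called by the one writer only. Returns the number of lines written.
    public static int Flush(TextWriter writer)
    {
        List<string> batch;
        lock (gate)
        {
            if (pending.Count == 0) return 0;
            batch = pending;               // take the whole batch...
            pending = new List<string>();  // ...and leave an empty list behind
        }
        foreach (string line in batch)
        {
            writer.WriteLine(line);        // slow I/O runs without the lock held
        }
        writer.Flush();
        return batch.Count;
    }
}
```

Holding the lock only for the swap keeps worker threads from stalling behind a slow flash write.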
I never found any reference anywhere under WINCE600 of 'must shut down'
which is the tail of the error message.
Wait a mo... just had a further thought - if I were to create a string in
one thread, then Enqueue that to 2 queues - does it get queued by reference
or by value? It couldn't then be deleted by one thread whilst still in
the second queue, could it? I would hope not, but could code around that
just in case I guess.
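On the reference-or-value question: `string` is a reference type, so both queues hold a reference to the same immutable object, and the garbage collector only reclaims objects that are no longer reachable from anywhere. A small sketch that demonstrates this (`SharedStringDemo` is a hypothetical name):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo: one string instance shared by two queues.
public static class SharedStringDemo
{
    public static bool SameInstanceSurvives()
    {
        Queue<string> first = new Queue<string>();
        Queue<string> second = new Queue<string>();

        string message = new string('x', 3);
        first.Enqueue(message);
        second.Enqueue(message);

        string a = first.Dequeue();   // first consumer takes its reference
        message = null;               // producer drops its own reference
        GC.Collect();                 // the second queue still keeps it alive

        string b = second.Dequeue();
        return object.ReferenceEquals(a, b) && b == "xxx";
    }
}
```

Dequeuing from one queue only removes that queue's reference; there is no way for managed code to "delete" the string out from under the other queue.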
Thanks guys for your ongoing help with this one....
Graham
--
Dean Ramsier - eMVP
BSQUARE Corporation
Thanks for the response - had a look at the compiler option you mention and
it refers to errors in buffer management code - functions like strcpy etc.
Well - as I use purely managed code I would 'expect' that the .Net wrappers
would ensure this does not occur :-O.
It did, however, make me look at my code even more critically - and I have two
areas (both in the serial interface) where I do an 'Array.Copy' to move
incoming data into my own buffers. In both instances I do NOT adequately
check the returned data length against my own remaining buffer space :-O - so
this 'could' well be the root of my problem. I will be beefing up that code
at the very least - even though .Net should NOT be allowing me to do that -
or, if it does happen, it should exit gracefully and throw me an
exception I can (and would!) catch.
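A hedged sketch of the kind of length check described above: the incoming byte count is clamped to the space left before calling `Array.Copy`. `ReceiveBuffer` and `Append` are hypothetical names. For what it's worth, `Array.Copy` is documented to throw an `ArgumentException` rather than overrun when the destination is too small, so an unchecked copy here would be expected to surface as a catchable managed exception.

```csharp
using System;

// Hypothetical fixed-size receive buffer for serial data.
public sealed class ReceiveBuffer
{
    private readonly byte[] data;
    private int used;

    public ReceiveBuffer(int capacity)
    {
        data = new byte[capacity];
    }

    public int Used { get { return used; } }

    // Copies at most 'count' bytes from 'source', limited to the room left.
    // Returns how many bytes were actually stored.
    public int Append(byte[] source, int count)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (count < 0 || count > source.Length)
            throw new ArgumentOutOfRangeException("count");

        int room = data.Length - used;
        int toCopy = Math.Min(count, room);
        if (toCopy > 0)
        {
            Array.Copy(source, 0, data, used, toCopy);
            used += toCopy;
        }
        return toCopy;
    }
}
```

The caller can compare the return value with `count` to detect and log dropped bytes instead of silently losing them.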
Pity that MS doesn't seem to be monitoring this thread - as we are
potentially exposing an underlying problem in the .Net CF 3.5 layers here
:-O. I thought that such errors should not be able to occur in 'managed code'.
Thanks again....
Regards
Graham
GrahamS is right in that there is little or no information given in this
message. It is not a managed issue, else as Chris points out we would get a
managed exception. There's something going on in conmanclient2 or perhaps
mscoree that is operating outside of managed code.
I'd love to hear if anyone has any hints about how to proceed.
Thanks,
Wil S
Make sure you get a QFE, so everybody reaps the benefits.
Good luck,
Michel Verhagen, eMVP
Check out my blog: http://GuruCE.com/blog
GuruCE
Microsoft Embedded Partner
http://GuruCE.com
Consultancy, training and development services.
- Wil S
--
Bruce Eitman (eMVP)
Senior Engineer
Bruce.Eitman AT EuroTech DOT com
My BLOG http://geekswithblogs.net/bruceeitman
EuroTech Inc.
www.EuroTech.com
Just got a notification that this topic had come back to life and thought I
should add my two penn'th.
The issue arises in my code on 'some' systems but not others, and my view is
that it is not a PB or DevStudio problem :-O. Mine arises on 'release build'
code running completely stand-alone. I 'suspect' it's somewhere in mscoree or
below that the error is occurring.
I have 'gone quiet' on this topic purely because I am in the process of
completely re-architecting my (managed) application and it has not got to
the CE stage yet ;-).
Hope this helps....
Regards
Graham
I suspect that this must be a problem with the JIT compiler or .NET support
code in some way, since this would be the first execution of f(), and it is
getting JITted. I suspect that my environment is such that there is
additional memory load, etc., in the failure case that is somehow aggravating
the JIT compiler.
Does this sound like a JITter issue?
- Wil
Paul T.
Now I've opened another case with MS as we're seeing the "serious error"
dialog outside the managed debugger situation...
It seems that there is a problem in CF 3.5. I am working on catching
this message during the application lifecycle. If a memory dump is
available, I think it will be clear what the issue is.
The second issue I noticed is that sometimes Remote Process Viewer just
stops working. It connects to the device but shows only one record
with address 0000000 - looks like something VERY buggy.
The strangest part is that when I replace the toolhelp API DLL from the
release directory - it works again!
But the files are different in size... something very very strange.
Besides, I noticed that using toolhelp.dll (for toolhelp API) can
cause some very weird things. For example,
One more failure:
On debug output i have:
Exception 'Data Abort' (4): Thread-Id=06bd0052(pth=84270d4c), Proc-Id=06bc004e(pprc=83d2b780) 'GUI.exe', VM-active=06bc004e(pprc=83d2b780) 'GUI.exe'
PC=4002c384(coredll.dll+0x0001c384) RA=40d11660(mscoree3_5.dll+0x000b1660) SP=0007fad0, BVA=00000064
GUI.exe is our managed application.
From the Dr.Watson dump I have:
STACK_TEXT:
0007fad0 40d1165c : 00000060 06bd0052 0000001c ffffc808 : coredll!LineTo+0x40 [d:\yamafp-1\private\winceos\coreos\core\thunks\twingdi.cpp @ 958]
0007fad8 40d82324 : 00000060 06bd0052 0000001c ffffc808 : mscoree3_5!PALCriticalSection_Enter+0x4
0007fadc 40d0cb7c : 00000060 06bd0052 0000001c ffffc808 : netcfagl3_5!GUI_Init+0x2dbf4
0007fae8 40d5d930 : 00000060 06bd0052 0000001c ffffc808 : mscoree3_5!NSLHandle_Release+0x3c
0007faf4 40d54228 : 00000060 06bd0052 0000001c ffffc808 : netcfagl3_5!GUI_Init+0x9200
0007fb18 40c94a84 : 00000060 06bd0052 0000001c ffffc808 : netcfagl3_5!DllMain+0xac
0007fb1c 40c9e034 : 00000060 06bd0052 0000001c ffffc808 : mscoree3_5!EE_SetTerminateProc+0x34
0007fb20 40c9495c : 00000060 06bd0052 0000001c ffffc808 : mscoree3_5!EE_GetCurrentAppDomainCompatVersion+0xbd4
0007fb30 40d2d59c : 00000060 06bd0052 0000001c ffffc808 : mscoree3_5!PALProcess_CurrentTerminate+0x1a14
0007fb3c 40d2c594 : 00000060 06bd0052 0000001c ffffc808 : mscoree3_5!DllGetClassObjectInternal+0x3e4
0007fb4c 40c498d4 : 00000060 06bd0052 0000001c ffffc808 : mscoree3_5!PALHost_LaunchApp+0x214
0007fda0 40c49a80 : 00000060 06bd0052 0000001c ffffc808 : mscoree!PALHost_LaunchApp+0x64
0007fdb8 4003c124 : 00000060 06bd0052 0000001c ffffc808 : mscoree!CorExeMain+0x174
0007fdf0 00000000 : 00000060 06bd0052 0000001c ffffc808 : coredll!_InternalCxxFrameHandler+0x388
It seems that our app caused this error in MS dll. The problem is how
to know the managed stack that caused the crash:(
Any ideas?
Thanks
Igor, I'm not sure what you are replying or referring to, but considering
the title of your posting you may be wondering where that message box comes
from. The answer is simple, it is generated by a built-in check that tries
to detect overflows of arrays on the stack.
I recently had the same and it turned out that sometimes two threads were
accessing the same piece of data concurrently.
Uli
--
Sator Laser GmbH
Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932
Hi
Great to have an answer.
But maybe there is a way to know where it came from? In a large
app it's almost impossible to find it by looking over the
code... Especially when I cannot reproduce this message.
Thanks anyway:)
Igor
The thread linked below is what I found and what helped me most. The problem
is that it is hard to produce the error, even when using the example
the compiler docs give.
Anyway, what I did was to start another thread that watches all others. If one
thread runs into this error, it will be stopped, which can be detected
because all others are still running. This is a bit different from e.g.
a "normal" access violation, where the OS terminates the application. Here,
the check is built into the application which otherwise keeps on running.
Once I had an idea where the crash happened, I also saw patterns that
triggered it and eventually could resolve it. There was some guessing
involved though.
Good luck!
Uli
Hi Ulrich
I didn't understand what you mean by "watching other threads".
How does it work? Can you give me a simple example here?
Thanks
I just updated the current file and line in a global variable and checked
that variable regularly in a different thread. If it doesn't change for
some time, I can assume the thread is blocked at that position.
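A rough C# sketch of that idea, assuming each worker calls a marker method at checkpoints and a separate watchdog thread polls it. `Heartbeat`, `Mark` and `StalledAt` are hypothetical names, and the position string is whatever the worker chooses to report (for example, a file and line).

```csharp
using System;

// Hypothetical progress marker shared between a worker and a watchdog.
public sealed class Heartbeat
{
    private readonly object gate = new object();
    private string position = "(not started)";
    private int lastTick = Environment.TickCount;

    // Worker thread: record where it is and when it got there.
    public void Mark(string where)
    {
        lock (gate)
        {
            position = where;
            lastTick = Environment.TickCount;
        }
    }

    // Watchdog thread: null while the worker is making progress,
    // otherwise the last position it reported.
    public string StalledAt(int timeoutMs)
    {
        return StalledAt(timeoutMs, Environment.TickCount);
    }

    // Overload taking the current tick explicitly, which is handy for testing.
    public string StalledAt(int timeoutMs, int nowTick)
    {
        lock (gate)
        {
            // Unchecked subtraction stays correct across TickCount wrap-around.
            int elapsed = unchecked(nowTick - lastTick);
            return elapsed > timeoutMs ? position : null;
        }
    }
}
```

The watchdog could log the returned position to disk, which gives a starting point even when no debugger can be attached to the device.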
Uli
OK thanks:)