ArangoDB crashing intermittently on Windows

Julian May

Jan 15, 2016, 5:13:28 AM
to ArangoDB
Hey

We've had crashes in our development environment that we had overlooked, but they persist in user tests, so we need to act on it now.
There is no apparent pattern to when and why it crashes; it even crashes when it isn't being queried at all. It happens maybe once or twice a week per instance/machine.
All instances are set up using the xcopy recipe and a downloaded zip file.
On my machine (Windows 8.1 Pro), it happened on 15 Jan, 12 Jan and 29 Dec.

On Windows 8.1 Pro, running ArangoDB v. 2.7.3, we get this error in the application log:

Log Name:      Application
Source:        Windows Error Reporting
Date:          1/15/2016 10:37:13 AM
Event ID:      1001
Task Category: None
Level:         Information
Keywords:      Classic
User:          N/A
Computer:      Developer02.Spectra.local
Description:
Fault bucket 120371643980, type 4
Event Name: APPCRASH
Response: Not available
Cab Id: 120337341409

Problem signature:
P1: arangod.exe
P2: 2.7.2.15
P3: 565dddd9
P4: arangod.exe
P5: 2.7.2.15
P6: 565dddd9
P7: c0000005
P8: 00000000004e2b6b
P9: 
P10: 

Attached files:
C:\Users\jm\AppData\Local\Temp\WERD06A.tmp.WERInternalMetadata.xml
C:\Users\jm\AppData\Local\Temp\WERDC13.tmp.appcompat.txt
C:\Users\jm\AppData\Local\Temp\WERDC33.tmp.dmp
C:\Users\jm\AppData\Local\Temp\WERDC63.tmp.WERDataCollectionFailure.txt

These files may be available here:
C:\Users\jm\AppData\Local\Microsoft\Windows\WER\ReportArchive\AppCrash_arangod.exe_7d9a406342a0a766c1a28b2f168a387a781588_b710e840_cab_4299c680   (ATTACHED "Report.wer")

Analysis symbol: 
Rechecking for solution: 0
Report Id: 56bf477a-bb6a-11e5-bedc-f0921ce75827
Report Status: 9
Hashed bucket: ca6eba7952347f35e69f4f8270d2a429
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Windows Error Reporting" />
    <EventID Qualifiers="0">1001</EventID>
    <Level>4</Level>
    <Task>0</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2016-01-15T09:37:13.000000000Z" />
    <EventRecordID>136181</EventRecordID>
    <Channel>Application</Channel>
    <Computer>Developer02.Spectra.local</Computer>
    <Security />
  </System>
  <EventData>
    <Data>120371643980</Data>
    <Data>4</Data>
    <Data>APPCRASH</Data>
    <Data>Not available</Data>
    <Data>120337341409</Data>
    <Data>arangod.exe</Data>
    <Data>2.7.2.15</Data>
    <Data>565dddd9</Data>
    <Data>arangod.exe</Data>
    <Data>2.7.2.15</Data>
    <Data>565dddd9</Data>
    <Data>c0000005</Data>
    <Data>00000000004e2b6b</Data>
    <Data>
    </Data>
    <Data>
    </Data>
    <Data>
C:\Users\jm\AppData\Local\Temp\WERD06A.tmp.WERInternalMetadata.xml
C:\Users\jm\AppData\Local\Temp\WERDC13.tmp.appcompat.txt
C:\Users\jm\AppData\Local\Temp\WERDC33.tmp.dmp
C:\Users\jm\AppData\Local\Temp\WERDC63.tmp.WERDataCollectionFailure.txt</Data>
    <Data>C:\Users\jm\AppData\Local\Microsoft\Windows\WER\ReportArchive\AppCrash_arangod.exe_7d9a406342a0a766c1a28b2f168a387a781588_b710e840_cab_4299c680</Data>
    <Data>
    </Data>
    <Data>0</Data>
    <Data>56bf477a-bb6a-11e5-bedc-f0921ce75827</Data>
    <Data>9</Data>
    <Data>ca6eba7952347f35e69f4f8270d2a429</Data>
  </EventData>
</Event>
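The problem signature above is easier to read once the P-fields are labelled. A minimal Python sketch, assuming the usual labels Windows shows for APPCRASH reports (P1 application name through P8 exception offset) and assuming P3/P6 are PE link timestamps written as hex seconds since 1970; `APPCRASH_FIELDS` and `decode_signature` are hypothetical names for illustration:

```python
from datetime import datetime, timezone

# Assumed labels for the P1..P8 fields of an APPCRASH problem signature
# (P9/P10 are empty in this report and ignored here).
APPCRASH_FIELDS = {
    "P1": "application name",
    "P2": "application version",
    "P3": "application timestamp",
    "P4": "fault module name",
    "P5": "fault module version",
    "P6": "fault module timestamp",
    "P7": "exception code",
    "P8": "exception offset",
}


def decode_signature(sig: dict[str, str]) -> dict[str, object]:
    """Label the P1..P8 values and turn hex PE timestamps into UTC datetimes."""
    out: dict[str, object] = {}
    for key, name in APPCRASH_FIELDS.items():
        value = sig.get(key, "")
        if name.endswith("timestamp") and value:
            # Assumption: seconds since 1970-01-01 UTC, written in hex.
            out[name] = datetime.fromtimestamp(int(value, 16), tz=timezone.utc)
        else:
            out[name] = value
    return out


# Values copied from the problem signature in this report.
signature = {"P1": "arangod.exe", "P2": "2.7.2.15", "P3": "565dddd9",
             "P4": "arangod.exe", "P5": "2.7.2.15", "P6": "565dddd9",
             "P7": "c0000005", "P8": "00000000004e2b6b"}

for name, value in decode_signature(signature).items():
    print(f"{name:24} {value}")
```

With these values P3/P6 decode to 2015-12-01 17:50:17 UTC (the link time recorded in the binary), and P4 names arangod.exe itself as the faulting module, whereas the Windows 7 crash below faults inside ntdll.dll.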


On Windows 7 (SP1) running ArangoDB v. 2.7.2, we get this:

Log Name:      Application
Source:        Application Error
Date:          15-01-2016 09:31:32
Event ID:      1000
Task Category: (100)
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      PBCPC01.Spectra.local
Description:
Faulting application name: arangod.exe, version: 2.7.1.12, time stamp: 0x56409886
Faulting module name: ntdll.dll, version: 6.1.7601.19045, time stamp: 0x56259295
Exception code: 0xc0000005
Fault offset: 0x000000000004fba7
Faulting process id: 0x3a8c
Faulting application start time: 0x01d14eafcd49eb72
Faulting application path: C:\Temp\XPC Core\Dependencies\ArangoDB-2.7.1-win64\bin\arangod.exe
Faulting module path: C:\Windows\SYSTEM32\ntdll.dll
Report Id: 60fa6c8b-bb62-11e5-b92f-954412d44514
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Application Error" />
    <EventID Qualifiers="0">1000</EventID>
    <Level>2</Level>
    <Task>100</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2016-01-15T08:31:32.000000000Z" />
    <EventRecordID>166990</EventRecordID>
    <Channel>Application</Channel>
    <Computer>PBCPC01.Spectra.local</Computer>
    <Security />
  </System>
  <EventData>
    <Data>arangod.exe</Data>
    <Data>2.7.1.12</Data>
    <Data>56409886</Data>
    <Data>ntdll.dll</Data>
    <Data>6.1.7601.19045</Data>
    <Data>56259295</Data>
    <Data>c0000005</Data>
    <Data>000000000004fba7</Data>
    <Data>3a8c</Data>
    <Data>01d14eafcd49eb72</Data>
    <Data>C:\Temp\XPC Core\Dependencies\ArangoDB-2.7.1-win64\bin\arangod.exe</Data>
    <Data>C:\Windows\SYSTEM32\ntdll.dll</Data>
    <Data>60fa6c8b-bb62-11e5-b92f-954412d44514</Data>
  </EventData>
</Event>
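"Faulting application start time" is a Windows FILETIME (100-nanosecond ticks since 1601-01-01 UTC), so comparing it with the event's TimeCreated shows how long the instance ran before it crashed. A minimal sketch; `filetime_to_utc` is a hypothetical helper, and it truncates to whole seconds:

```python
from datetime import datetime, timezone

# 100-ns intervals between the FILETIME epoch (1601-01-01) and the Unix epoch (1970-01-01).
EPOCH_DIFF_100NS = 116_444_736_000_000_000


def filetime_to_utc(filetime_hex: str) -> datetime:
    """Convert a hex FILETIME value to a UTC datetime (whole seconds only)."""
    ticks = int(filetime_hex, 16)
    seconds = (ticks - EPOCH_DIFF_100NS) // 10_000_000
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


started = filetime_to_utc("01d14eafcd49eb72")                  # Faulting application start time
crashed = datetime.fromisoformat("2016-01-15T08:31:32+00:00")  # TimeCreated SystemTime, trimmed
uptime = crashed - started
print(started, uptime)
```

For this Windows 7 crash it gives a start time of 2016-01-14 09:41:54 UTC, so the instance ran for just under 23 hours before crashing.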



Let me know how I can help trace the origin of this bug.
I have ~17 MB of event logs with ArangoDB errors like the one below on my dev machine. Their timestamps don't correlate with the crashes, but might they be a clue for you?


Log Name:      Application
Source:        ArangoDB
Date:          1/15/2016 9:29:46 AM
Event ID:      256
Task Category: (3)
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      Developer02.Spectra.local
Description:
The description for Event ID 256 from source ArangoDB cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

If the event originated on another computer, the display information had to be saved with the event.

The following information was included with the event: 

Cannot find manifest file "c:\Code\XpectraDependencies\ArangoDB-2.7.2-win64\var\lib\arangodb-apps\_db\Availability\av\APP\manifest.json"
C:\b\ArangoDB-2.7.2\lib\V8\v8-utils.cpp
JS_Log
1839

Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="ArangoDB" />
    <EventID Qualifiers="49154">256</EventID>
    <Level>2</Level>
    <Task>3</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2016-01-15T08:29:46.000000000Z" />
    <EventRecordID>136139</EventRecordID>
    <Channel>Application</Channel>
    <Computer>Developer02.Spectra.local</Computer>
    <Security />
  </System>
  <EventData>
    <Data>Cannot find manifest file "c:\Code\XpectraDependencies\ArangoDB-2.7.2-win64\var\lib\arangodb-apps\_db\Availability\av\APP\manifest.json"</Data>
    <Data>C:\b\ArangoDB-2.7.2\lib\V8\v8-utils.cpp</Data>
    <Data>JS_Log</Data>
    <Data>1839</Data>
  </EventData>
</Event>
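To check whether any of those ArangoDB error events line up with a crash, each exported event can be parsed and its TimeCreated compared against the crash time. A rough Python sketch, assuming the events are available as XML in the standard event namespace; `parse_event`, `errors_before` and the chosen windows are illustrative, not part of ArangoDB or the Windows tooling:

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta

NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}


def parse_event(xml_text: str) -> tuple[str, int, datetime, list[str]]:
    """Return (provider, event id, creation time in UTC, data fields) for one <Event>."""
    root = ET.fromstring(xml_text)
    system = root.find("e:System", NS)
    provider = system.find("e:Provider", NS).get("Name")
    event_id = int(system.find("e:EventID", NS).text)
    # SystemTime looks like 2016-01-15T08:29:46.000000000Z; keep whole seconds only.
    stamp = system.find("e:TimeCreated", NS).get("SystemTime")[:19]
    created = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S")
    data = [(d.text or "").strip() for d in root.iter("{%s}Data" % NS["e"])]
    return provider, event_id, created, data


def errors_before(crash: datetime, errors: list[datetime], window: timedelta) -> list[datetime]:
    """Return the error timestamps that fall within `window` before the crash."""
    return [t for t in errors if crash - window <= t <= crash]


# Trimmed copy of the ArangoDB event above (raw string because of the backslashes).
sample = r"""<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="ArangoDB" />
    <EventID Qualifiers="49154">256</EventID>
    <TimeCreated SystemTime="2016-01-15T08:29:46.000000000Z" />
    <Computer>Developer02.Spectra.local</Computer>
  </System>
  <EventData>
    <Data>Cannot find manifest file "c:\Code\XpectraDependencies\ArangoDB-2.7.2-win64\var\lib\arangodb-apps\_db\Availability\av\APP\manifest.json"</Data>
    <Data>C:\b\ArangoDB-2.7.2\lib\V8\v8-utils.cpp</Data>
    <Data>JS_Log</Data>
    <Data>1839</Data>
  </EventData>
</Event>"""

provider, event_id, created, data = parse_event(sample)
crash = datetime(2016, 1, 15, 9, 37, 13)  # TimeCreated of the WER APPCRASH event above (UTC)
print(provider, event_id, created, errors_before(crash, [created], timedelta(hours=2)))
```

With the two Developer02 events in this post, the manifest error at 08:29:46 UTC is about 67 minutes before the WER report at 09:37:13 UTC, so it only shows up with a window wider than that.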

Report.wer

Wilfried Gösgens

Jan 15, 2016, 7:15:02 AM
to ArangoDB
Hi Julian,

Regarding the 0x...5: that's a segmentation violation. Could you by any chance try to capture a dump of this situation?
Have a look at:
https://github.com/arangodb/arangodb/blob/devel/README_maintainers.md#windows-debugging
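Both crash reports carry exception code c0000005, which Windows defines as STATUS_ACCESS_VIOLATION (an invalid memory access, the Windows counterpart of a segfault). A small Python sketch that splits an NTSTATUS value into its documented bit fields (2-bit severity, customer flag, 12-bit facility, 16-bit code); `decode_ntstatus` is a hypothetical helper:

```python
def decode_ntstatus(code: int) -> dict[str, int]:
    """Split a 32-bit NTSTATUS value into its bit fields."""
    return {
        "severity": (code >> 30) & 0x3,    # bits 30-31: 0 success, 1 info, 2 warning, 3 error
        "customer": (code >> 29) & 0x1,    # bit 29: 1 = customer-defined, 0 = Microsoft-defined
        "facility": (code >> 16) & 0xFFF,  # bits 16-27
        "code": code & 0xFFFF,             # bits 0-15
    }


STATUS_ACCESS_VIOLATION = 0xC0000005
print(decode_ntstatus(STATUS_ACCESS_VIOLATION))
```

For 0xC0000005 this yields severity 3 (error), a Microsoft-defined status, facility 0 and code 5.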

Have you tried the 2.8 beta yet, to see whether it shows the same problems?

Regarding the error messages, these seem to originate from a Foxx route that is missing its manifest file. You should be able to fix this by removing and reinstalling the Foxx service for this route, adding the manifest file.

Cheers,
Willi


Julian May

Jan 15, 2016, 7:51:25 AM
to ArangoDB
Hey Willi

Nope, I haven't been monitoring the instances with procdump, nor have I tried the 2.8 beta.
I'll give the 2.8 beta a go, and start monitoring it with procdump if the problem persists.

The error messages regarding the manifest make perfect sense then: I've been fiddling about and forgetting to close my editor when removing services under development, which left folders/files in a bad state. Disregard those, then :)

Thanks!

/Julian

Julian May

Jan 16, 2016, 12:25:56 PM
to ArangoDB

Julian May

Jan 17, 2016, 7:37:25 AM
to ArangoDB
I really hope we can resolve this problem soon; it's a show-stopper for our use of ArangoDB, since we don't have any non-Windows environments :,(

Wilfried Gösgens

Jan 18, 2016, 10:58:27 AM
to ArangoDB
OK, from the analysis it seems that something's wrong with the heap.

A call to `new` inside AQL fails with a segmentation fault.
Since we run the Linux build under Valgrind on our CI and it finds nothing wrong there, this seems to be Windows-specific.

As suggested in http://stackoverflow.com/questions/1010106/how-to-debug-heap-corruption-errors, I compiled with these flags; performance is terribly slow.
The test server has now been starting for 20 minutes, so while Valgrind may not be fast, this is even slower. We'll have to wait and see whether it brings something to light.

Julian May

Jan 18, 2016, 11:09:15 AM
to ArangoDB
Alright. Jan Stücke helped me with the "trust/buy-in from the organisation" side of the issue, so it's less 'panicky' at this point :)
Is there a GitHub issue for this that I could monitor?
Thanks!

/Julian

Wilfried Gösgens

Jan 27, 2016, 5:06:35 AM
to ArangoDB
Hi,
we've been trying hard to reproduce this (and several other problems) on Windows, and failed.
Can you check whether this still persists with the 2.8 final release, and if so, is there a chance we could get your data directory for a detailed analysis?
For the legal details (if any), contact us at hackers at arangodb dot com so we can fix this.

Cheers,
Willi

Wilfried Gösgens

Jan 27, 2016, 2:35:29 PM
to ArangoDB
Hi,
to make myself clearer: in case 2.8.0 final still shows these issues (we have done some fixes, so there's hope they're gone), we would like to reproduce this on a local development system so we can inspect it further.
Since we haven't succeeded in doing so, a copy of your current database directory and a set of your queries to execute (or the Foxx services to run) would be great.
We'd certainly be willing to sign an NDA about keeping your data private.

Cheers,
Willi

Julian May

Jan 28, 2016, 12:52:11 AM
to ArangoDB
Hey Willi
We'll update to 2.8.0 all around and pay close attention over the next week. There's no issue sending a copy of the directory should the problem persist, but I have faith :)
If we don't see further crashes, by Friday next week I'll consider this resolved, otherwise I'll privately email you a copy of what was running on the crashed instance.

Thanks a ton for the effort!
/Julian

Julian May

Feb 5, 2016, 1:39:33 AM
to ArangoDB
No crashes! Resolved :D Thanks, and have a great carnival weekend!

/Julian
