ArangoDB crashes after importing data


torst...@gmx.de

Dec 23, 2014, 5:43:26 PM
to aran...@googlegroups.com
Hi,

I am evaluating ArangoDB for our new production system. We are planning to move from Oracle to NoSQL.
My first test was to upload some data to one newly created collection.
I started 10,000 batch uploads, each with a chunk of 2,259 records. After 3,146 batch uploads it crashed, which means 7,106,814 records.

One record looks like:
HEADER: type_id;account_name;owner;first_name;last_name;last_login;account_type;owner;entitlements
DATA: 147-1;merodach.kyrion@company.com;merodach.kyrion@company.com;merodach;kyrion;27.09.2014 00:35:28;personal;merodach.kyrion@company.com;[8553] - DBA_ADVISOR_TEMPLATES : SELECT /->/ App_Store.app

The HTTP connection was dropped, and it is not possible to start ArangoDB with this database again. I have not found any log file which could give me answers to the problem. I have repeated this test twice and it crashed every time. On the first test I did not count how many batch uploads were done.
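
For illustration, one such batch upload is a plain HTTP POST; a minimal sketch, assuming ArangoDB's bulk import endpoint /_api/import, a server on localhost:8529, and a made-up collection name "accounts":

# batch.json holds one JSON document per line (hence type=documents)
curl -X POST --data-binary @batch.json \
     "http://localhost:8529/_api/import?type=documents&collection=accounts"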

I tested this on a MacBook Pro with 16 GB of RAM.

Any ideas where to search for the problem? The log file in the data directory shows only some INFO messages.



Frank Celler

Dec 27, 2014, 10:56:53 AM
to aran...@googlegroups.com
Hello Torsten,

we are trying to reproduce the problem, but need some more information:

- Which ArangoDB version are you using?
- Which MacOS version?
- Did you install the homebrew or the APP version?
- Did you have any resource limits in place (i.e., what is the output of "ulimit -a")?
- Do you have a core dump? (See the note below on enabling them.)
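
As an aside: core dumps are disabled by default on Mac OS X ("core file size 0" in a default ulimit output). A rough sketch of how they could be enabled before reproducing the crash; the arangod invocation is illustrative, and /cores is the Mac OS X default location:

# allow core files in the current shell, then start the server from it
ulimit -c unlimited
arangod
# after a crash, Mac OS X writes core files to /cores/core.<pid>
ls -l /cores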

Thanks
  Frank




torst...@gmx.de

Dec 27, 2014, 12:22:38 PM
to aran...@googlegroups.com
Hello Frank,

sorry for my incomplete description:

It was ArangoDB version 2.3.3, downloaded as the app version for Mac OS X. I have Yosemite (10.10.1) installed.
I have no core dump because I do not know where to find it. The log which was in the database directory did not show anything.
If you are interested, I could send you the test program and the test data I have used.

I have not done any configuration of ArangoDB. I just downloaded it, started it and then created one collection for the import.

user$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 2560
pipe size            (512 bytes, -p) 1
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 709
virtual memory          (kbytes, -v) unlimited


Thanks, Torsten



Frank Celler

Dec 28, 2014, 4:59:11 AM
to aran...@googlegroups.com
I'll try to find a Mac with 10.10 (I'm currently still on 10.9).

Frank Celler

Dec 28, 2014, 6:37:28 AM
to aran...@googlegroups.com
Hi Torsten,

my colleague has 10.10. Can you send me a link to the test program and data to hackers (at) arangodb.org?

Thanks a lot
  Frank



florian

Dec 30, 2014, 11:17:45 AM
to aran...@googlegroups.com
Hi Torsten,

we can now say that the batch method is working properly, though it might not be the best way for one-time bulk imports (see my answer on Stack Overflow).
We are now investigating further why/if ArangoDB might crash when importing your data.
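
For reference, a one-time bulk import would normally go through the arangoimp command-line tool rather than batched HTTP uploads. A rough sketch for semicolon-separated data like the sample above; the file and collection names are made up:

# arangoimp ships with ArangoDB; --separator handles the ";"-delimited fields
arangoimp --file accounts.csv --type csv --separator ";" \
          --collection accounts --create-collection true \
          --server.endpoint tcp://localhost:8529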

cheers

Florian

Torsten Link

Dec 30, 2014, 2:33:32 PM
to aran...@googlegroups.com
Hello,

this is not a one-time import. This test was just meant to see how the database behaves.
Per day I have to import about 4 million records, and this will increase to 8 million per day over the next years. I have to import about 4,000 files, and during the import of each file a lot of checks have to be performed.
That is the reason why an upload like the one mentioned on Stack Overflow will not work.
Currently I get about one million errors or warnings while checking the files.



Kind regards

Torsten Link


Tel.:  +49 173 9111194
Fax: +49 6181 258108
mail: torst...@gmx.de


Frank Celler

Jan 5, 2015, 6:09:48 AM
to aran...@googlegroups.com
Hello Torsten,

First the good news: we successfully used your program under Linux and imported some ten million documents. There is, however, a catch: one needs to adjust the number of file descriptors. With modern Linux kernels this is not a problem. The reason is that ArangoDB needs a descriptor for each open datafile. In principle you could also increase the maximal size of the datafiles; ArangoDB will then use a smaller number of larger files.
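
For illustration, both adjustments could look roughly like this; the numbers are arbitrary, and --database.maximal-journal-size is the arangod option that controls the datafile size:

# raise the per-process open-file limit before starting the server
ulimit -n 65536
# and/or use fewer, larger datafiles (size in bytes; 256 MB here)
arangod --database.maximal-journal-size 268435456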

The neutral news: if you are going to import 8 million per day, you will definitely need to shard your data. This holds for any NoSQL solution. Another interesting question is what kind of queries you are going to execute; depending on your access pattern, you might need additional indexes.
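
As an example, an additional index can be created through the HTTP API (or via arangosh); a sketch using the index endpoint, with a made-up collection and attribute:

# create a hash index on "account_name" in collection "accounts"
curl -X POST "http://localhost:8529/_api/index?collection=accounts" \
     -d '{"type": "hash", "fields": ["account_name"]}'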

The bad news: the file descriptor limit on the Mac is very low by default (256), and I have not found out how to raise it for an app. So you might need to either install ArangoDB via Homebrew or at least use the command-line version. In these cases you can raise the limit using ulimit and some Mac magic (http://superuser.com/questions/302754/increase-the-maximum-number-of-open-file-descriptors-in-snow-leopard).
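
A rough sketch of that "Mac magic", following the recipe behind the link above; the limits themselves are illustrative:

# raise the system-wide ceiling (needs root), then the shell limit,
# and start arangod from that same shell
sudo launchctl limit maxfiles 65536 65536
ulimit -n 65536
arangod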

Frohes neues Jahr / Happy New Year.

  Frank

Torsten Link

Jan 5, 2015, 1:51:36 PM
to aran...@googlegroups.com
Hello Frank,

Happy New Year to you too.

Thank you for the fast investigation. The target environment will be Linux; only the development and test environments are Macs, so I can live with that.
Currently I do not really know how I will access the data. I come from Oracle and am just starting with my first experiments in NoSQL.
So I first have to learn how to structure the data. Maybe I will go with the graph approach, but I am not sure at the moment.

Kind regards

Torsten Link


Tel.:  +49 173 9111194
Fax: +49 6181 258108
mail: torst...@gmx.de


Frank Celler

Jan 7, 2015, 3:49:29 AM
to aran...@googlegroups.com
Hello Torsten,

if there are any questions about the data model, please feel free to ask either here or directly at hackers (at) arangodb.org.

Kind regards
  Frank