Maximum document size?

William Hayes

Feb 15, 2017, 11:17:47 PM
to ArangoDB
Hopefully a quick question. It looks like the maximum document size (i.e. document value) is determined by the data journal size, which I think is 32 MB but is configurable. Do I understand this correctly? I will have a lot of small files, with occasional files that are very large, potentially up to a few hundred MB. Does the VelocyPack lib affect the size of the JSON documents in the collections?

Thanks!

Jan

Feb 16, 2017, 3:34:31 AM
to ArangoDB
Hi,

the JSON data sent to ArangoDB is internally stored in the VelocyPack format. So yes, VelocyPack does affect the size of the documents.
Normally, VelocyPack objects can be stored as compactly as JSON, or even more compactly, as shown in the table [here](https://github.com/arangodb/velocypack/blob/master/Performance.md).
The theoretical maximum size of a document in VelocyPack is a few exabytes.

In ArangoDB, there are additional practical limits for a document's size:
- every document is stored in the write-ahead log (WAL) first, so it must fit into a WAL journal file. The default journal size is 32 MB and can be adjusted using --wal.logfile-size. When a bigger document arrives that does not fit into the current journal, a new journal will be created that can hold it. Journals can grow beyond the --wal.logfile-size threshold if the option --wal.allow-oversize-entries is set to true (which it is by default).
- if documents are processed with JavaScript, there are additional limits: the maximum string length there is 256 MB, if I am not mistaken. The ArangoShell (arangosh) and some other ArangoDB functionality use JavaScript, and these parts may want to JSON-stringify documents. This effectively caps the maximum document size at 256 MB whenever these parts are used.
- the ArangoDB client tools (e.g. the ArangoShell) all use a configurable maximum network packet size, which is 128 MB by default (--server.max-packet-size option). This may need to be increased to allow bigger documents in network traffic; this option and the WAL options are shown in the startup sketch after this list.
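
For reference, here is a minimal sketch of how these settings could be raised when starting the server and the shell (the byte values are illustrative only, not recommendations; 268435456 is 256 MB):

arangod --wal.logfile-size 268435456 --wal.allow-oversize-entries true
arangosh --server.max-packet-size 268435456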
 
So yes, it will work with documents bigger than 32 MB. Here's an example with a document that's approximately 177 MB in size (note that I had already adjusted the max packet size value for this):

> // build a document with 5 million short string attributes (~177 MB as JSON)
> var doc = {}; for (i = 0; i < 5000000; ++i) doc["testdata" + i] = "testdata" + i;
testdata4999999

> JSON.stringify(doc).length
177777781

> db._create("biggie");
[ArangoCollection 27417865702610887, "biggie" (type document, status loaded)]

> db.biggie.insert(doc);
{
  "_id" : "biggie/27417865702610899",
  "_key" : "27417865702610899",
  "_rev" : "_UiV8Jzq---"
}
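
As a quick sanity check (a hedged follow-up, not part of the session above), you could fetch the document back and re-measure its serialized size:

> JSON.stringify(db.biggie.any()).length

The result will be slightly larger than the 177777781 bytes measured above, because the stored document also carries the _id, _key and _rev system attributes.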

Processing time normally depends on the document size, so bigger documents will in most cases take longer to process than smaller ones.
So in practice, big documents should be used only in exceptional cases, when they are unavoidable.

Best regards
J