Virtual Memory


Michael Roterman

Jun 19, 2014, 2:05:59 PM
to aran...@googlegroups.com
Hello,

I am trying to understand what the limits of ArangoDB are and what the ideal setup is. From what I have understood, ArangoDB stores all collection data in virtual memory, and ideally you want this to fit in RAM. If a collection grows and no longer fits in RAM, it will be swapped to disk.

So my first question: if my db grows, will I need to adjust the swap partition/file to accommodate it?

Since ArangoDB also syncs the data to disk, does this mean that the data will always be located both in RAM and on disk? So if I have a db that's 1.5GB and my RAM is 1GB, will I need at least 0.5GB of swap and 1.5GB of regular disk space?

I am a bit confused about how ArangoDB uses virtual memory. Right now I have 7 collections that are practically empty. I have 1GB of RAM and 1GB of swap.
The admin reports that ArangoDB is using 4.5GB of virtual memory. How is this possible if the swap is only 1GB? It's currently using 80MB of RAM. Shouldn't this be 224MB (7 x 32MB) if the journal size is 32MB for each collection?

What is the recommended journal size relative to collection size? Can it be adjusted dynamically as the collection grows?

What kind of performance can be expected if swap is used heavily and the disk is an SSD? Would the performance then be similar to that of a more traditional db such as MySQL?

Thanks,
Michael

Jan Steemann

Jun 24, 2014, 12:46:24 PM
to aran...@googlegroups.com
Hi Michael,

sorry for the delay in answering this.
Someone reposted the question here, and I answered it there:

http://stackoverflow.com/questions/24380071/memory-usage-of-arangodb

I hope this helps.
Best regards
Jan

Michael Roterman

Jun 25, 2014, 9:35:41 AM
to aran...@googlegroups.com, j.ste...@triagens.de
Hi Jan,

Thank you very much for letting me know and writing a very comprehensive reply. Really appreciate the work you guys do with ArangoDB! 

I am not very familiar with the inner workings of virtual memory and memory-mapped files. What was news to me from your reply is that it's mostly the frequently used collections that should fit in memory. I was under the impression that it was recommended to fit all collections in memory. Sometimes you want to store data that takes a lot of space but is rarely accessed, e.g. geo data that is only read once on signup. I was contemplating keeping heavy, rarely used collections in a different db.

In my specific use case I am using Ubuntu. I am not very familiar with how swap works: are the memory-mapped files stored in the swap partition, or are they stored elsewhere on the main disk? When I set up a new cloud instance I will need to specify the swap partition size. If the files are stored in the swap partition, I will need to size it and resize it depending on my collections.

Regarding the V8 threads: are these used mostly for Foxx? I was under the impression that the ArangoDB core is written in C++. If I am using ArangoDB purely as a storage solution, should I minimize the number of V8 threads to reduce the memory footprint?

Thanks!
Michael 

Jan Steemann

Jun 26, 2014, 2:42:08 AM
to Michael Roterman, aran...@googlegroups.com
Hi Michael,

if the collection datafiles you're working with are too big to fit into
main memory simultaneously, the operating system will start writing some
or all of their pages back to disk. This shouldn't affect the operating
system's swapfile. The virtual memory of collection datafiles is backed
by the datafiles in the ArangoDB database directory, and those files are
the place where the OS will write the data back to.
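
To illustrate the file-backed behaviour described above, here is a minimal Python sketch of a memory-mapped file. It is not ArangoDB code: the function name `write_through_mapping`, the file name, and the 32MB size are assumptions chosen for illustration only. It shows that changes made through the mapping end up in the mapped file itself rather than in the swapfile, and that the mapping adds to the process's virtual memory regardless of how much of it is resident in RAM.

```python
import mmap
import os
import tempfile


def write_through_mapping(path: str, size: int, payload: bytes) -> bytes:
    """Map a file into memory, modify it via the mapping, read it back from disk."""
    # Create a file of the requested size. It is mostly empty, yet once mapped
    # the whole size counts towards the process's virtual memory.
    with open(path, "wb") as f:
        f.truncate(size)

    with open(path, "r+b") as f:
        # Default access gives a shared, writable, file-backed mapping.
        with mmap.mmap(f.fileno(), size) as mapped:
            mapped[:len(payload)] = payload  # modify pages in memory
            mapped.flush()                   # ask the OS to write dirty pages to the file

    # Read through the normal file API to confirm the file holds the data.
    with open(path, "rb") as f:
        return f.read(len(payload))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        datafile = os.path.join(tmp, "datafile.bin")  # hypothetical file name
        print(write_through_mapping(datafile, 32 * 1024 * 1024, b"hello"))
```

Because the pages are backed by the file, the OS can drop or write them back at any time without touching swap, which is why a process can report far more virtual memory than RAM plus swap combined.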

Indexes in ArangoDB are somewhat different. Indexes have no file-backed
storage (they're constructed from the collection datafiles on-the-fly).
The memory set aside for indexes may therefore go into the operating
system's swapfile if the OS runs out of memory and starts to swap.

Whether or not the OS will start swapping at all depends on a lot of
factors and configuration.
I suggest running a small test with your expected workload and observing
the output of vmstat (http://linux.die.net/man/8/vmstat) during that. It
will tell you whether and how much your system is swapping. The most
interesting columns in your case should be:
- si: Amount of memory swapped in from disk (/s).
- so: Amount of memory swapped to disk (/s).
- bi: Blocks received from a block device (blocks/s).
- bo: Blocks sent to a block device (blocks/s).

If you see values > 0 in si/so, your system is swapping. This may be due
to ArangoDB or other processes because they're all competing for the
same limited resources.
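
The si/so check above could be automated with a small script. This is only a sketch: the helper names `parse_vmstat`, `is_swapping` and `sample_vmstat` are made up for this example, and it assumes the usual vmstat layout in which a header line naming the columns (including `si` and `so`) is followed by rows of integers.

```python
import subprocess


def parse_vmstat(text: str) -> list:
    """Turn vmstat's text output into a list of {column name: value} dicts."""
    rows = []
    columns = None
    for line in text.splitlines():
        fields = line.split()
        if "si" in fields and "so" in fields:
            # Header line that names the columns (r, b, swpd, ..., si, so, bi, bo, ...).
            columns = fields
        elif columns and len(fields) == len(columns) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(columns, map(int, fields))))
    return rows


def is_swapping(rows: list, skip_first: bool = True) -> bool:
    """Return True if any sample shows swap-in or swap-out activity."""
    # vmstat's first row reports averages since boot, so it is skipped by default.
    samples = rows[1:] if skip_first and len(rows) > 1 else rows
    return any(r["si"] > 0 or r["so"] > 0 for r in samples)


def sample_vmstat(interval: int = 1, count: int = 5) -> str:
    """Run `vmstat <interval> <count>` and return its output (requires vmstat)."""
    result = subprocess.run(
        ["vmstat", str(interval), str(count)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On a machine with vmstat installed, `is_swapping(parse_vmstat(sample_vmstat()))` gives a quick yes/no answer while the test workload is running. It does not say which process is responsible for the swapping.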

Re V8: the V8 threads are used for Foxx applications but also for some
of the built-in REST handlers (e.g. create/drop collection) and also AQL
queries. So you cannot deactivate the V8 functionality altogether but
need to have at least a few V8 threads available for these operations.

Best regards
Jan

Michael Roterman

Jun 26, 2014, 11:28:44 AM
to aran...@googlegroups.com, m...@oceanicdesigns.com, j.ste...@triagens.de
Hi Jan,

Your answer makes it clear. So when creating the swap partition I will only need to consider the indexes in a worst-case scenario.
I will try vmstat when I have more data in the db.

Regarding V8: if I am not using Foxx applications, what do you consider a good default minimum number of threads?

Appreciate your help!

Thanks,
Michael