Document size in the database


Davide

Mar 4, 2015, 9:08:04 AM
to baa...@googlegroups.com
What is the actual size of documents in the DB?
I was running some tests and wanted to check the DB size after creating documents, so I created them in batches of 100 at a time.
I tried with docs containing a JSON object with 3 properties, each holding a string, for a total of about 100 characters (100 bytes as a text document).
For every 100 of these docs, the dashboard showed the DB size increasing by 500 KB. That means 5 KB per document. Is that correct?

Then I tried the same with much bigger JSON content, I think around 10 properties and 1.5 KB, and the DB size increased by 800 KB for every 100 of these bigger docs.
That means 8 KB per doc.
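
To make the arithmetic explicit, here is a minimal Python sketch of the two measurements above. It assumes 1 KB = 1024 bytes and takes the payload sizes as the approximate figures quoted in the post; the helper name `per_doc_overhead` is only for illustration.

```python
def per_doc_overhead(db_growth_kb, batch_size, payload_bytes):
    """Return (bytes of DB growth per document, bytes beyond the payload)."""
    per_doc_bytes = db_growth_kb * 1024 / batch_size
    return per_doc_bytes, per_doc_bytes - payload_bytes

# Small docs: ~100 bytes of JSON, 500 KB of growth per 100 docs.
small = per_doc_overhead(500, 100, 100)
# Bigger docs: ~1.5 KB (1536 bytes) of JSON, 800 KB of growth per 100 docs.
large = per_doc_overhead(800, 100, 1536)

print(small)
print(large)
```

In both cases the growth beyond the payload is several kilobytes per document, so most of the reported increase comes from something other than the JSON content.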

Are my tests right?

If this is right, my problem is disk space management: the DB would get very big pretty quickly.
Is there a way to have smaller docs?

Zs

Mar 4, 2015, 11:15:14 AM
to baa...@googlegroups.com
I had similar results: huge overhead. What was strange is that after a server restart the DB size dropped dramatically to what is probably the correct size. Did you check this?

Davide

Mar 4, 2015, 12:29:27 PM
to baa...@googlegroups.com
Nope, I didn't test it, but it's interesting... I will restart the server now ;)

Davide

Mar 4, 2015, 1:14:13 PM
to baa...@googlegroups.com
Wow, you were right!
Actually, I've just realized that there are 2 values on the dashboard:
- DB size
- collection size.

The collection size always grows by the expected amount, while the DB size increases by 5 KB or more for every vertex (doc) created, and by 10 KB for every edge (link) created.

Rebooting doesn't always bring the DB size down right away, but eventually it does, and it becomes the right size again.
Judging by this behavior, it seems like some cache is not cleared until a reboot happens.


Can anybody on the BaasBox team confirm or disprove this?
And, if so, is there a way to flush the cache periodically without rebooting the server?



giastfader

Mar 4, 2015, 7:21:07 PM
Hi guys.
Here is how things work.

The DB size is the total size of all files stored in the db/baasbox folder.
These files ARE the database, and this location holds not only data files but also index files and other files used by the DB system.

BaasBox uses OrientDB as its persistence engine, so the first thing to keep in mind is how this database works.
OrientDB creates WAL (Write-Ahead Logging) files to manage the durability of the database.
By default these files can grow up to 4 GB, and they are used after a crash to try to fix database inconsistencies.
BaasBox, since version 0.9.2, sets this limit to 300 MB, and it can be overridden via the command line (-Dstorage.wal.maxSize parameter).
This simply means that the total DB size is affected by these files, which are, as you noted, flushed and deleted when the server is stopped.
Furthermore, BaasBox defines indexes on some fields, and these indexes are stored in several files in this same folder.
The size of these indexes is not shown in the collection size section of the console.
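
If you want to see where the space in that folder actually goes, a rough Python sketch like the one below can help. It simply walks a folder and sums file sizes per file extension; it does not assume any particular OrientDB file naming, and the `db/baasbox` path is taken from the description above and may differ in your installation.

```python
import os
from collections import defaultdict

def size_by_extension(folder):
    """Sum the size in bytes of all files under `folder`, grouped by extension."""
    totals = defaultdict(int)
    for root, _dirs, files in os.walk(folder):
        for name in files:
            ext = os.path.splitext(name)[1] or "(none)"
            totals[ext] += os.path.getsize(os.path.join(root, name))
    return dict(totals)

def report(folder):
    """Print the per-extension totals, largest first, followed by the grand total."""
    totals = size_by_extension(folder)
    for ext, size in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{ext:10s} {size / 1024:12.1f} KB")
    print(f"{'total':10s} {sum(totals.values()) / 1024:12.1f} KB")

if __name__ == "__main__" and os.path.isdir("db/baasbox"):
    # Path relative to the BaasBox installation directory (an assumption).
    report("db/baasbox")
```

Running it before and after a server restart should show which group of files shrinks, and the grand total is the number to compare with the DB size on the dashboard.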

In addition to this, BaasBox adds its own data to each inserted document.
For every document, BaasBox creates another private data structure that is used to manage links among documents and files.
These private data are not shown in the collection section either.
For those who are curious and know OrientDB, these data are stored in a class called _BB_NodeVertex that extends V, while the links are stored directly in the E class.

As far as the size of a single document is concerned, you may have noticed that documents are usually bigger than you expected. This happens because, in each document, BaasBox stores:
  • the ID of the document
  • the username that created the record
  • the creation date
  • data related to the permissions on that record (each BaasBox collection is an OrientDB class that extends the ORestricted class)
  • an embedded document used to store audit information: who made the latest update and when
For each Link, BaasBox stores:
  • the ID of the Link
  • the username that created the link
  • the creation date

Zs

Mar 5, 2015, 2:08:02 AM
to baa...@googlegroups.com
Thanks for clarifying this. I guess OrientDB rotates the WAL files when they reach the limit (300 MB by default with BaasBox). Even 4 GB is fine for me if it's the maximum, doesn't affect performance, and helps ensure consistency more reliably. Should I set it to 4 GB or just use BaasBox's default?

Davide

Mar 5, 2015, 7:25:12 AM
to baa...@googlegroups.com
Thanks giastfader for the very clear answer!!
