DETS table auto_save behaviour


Nicolas Martyanoff

May 25, 2021, 2:43:21 AM
to erlang-q...@erlang.org

Hi,

I am unsure about the behaviour of DETS regarding saving.

The documentation indicates:

all operations performed by Dets are disk operations

Which seems to hint that every single insertion ends up on disk. Good.

But then:

{auto_save, auto_save()} - The autosave interval. If the interval is
an integer Time, the table is flushed to disk whenever it is not
accessed for Time milliseconds. A table that has been flushed
requires no reparation when reopened after an uncontrolled emulator
halt.

This is ambiguous: does it mean that entries are buffered in memory
and only written to disk during the auto save operation (so that some
operations are not actually disk operations), or does it mean that DETS
always writes to disk without sync-ing (using fsync or equivalent), and
synchronization occurs during the auto save operation?

In any case, am I correct in assuming that DETS does not offer any way
to guarantee that entries are actually written on disk, meaning that an
application crash would lead to a loss of every entry written since the
last auto_save operation?

I was hoping to use DETS as a local persistent buffer in case data
cannot be written to a remote database, but it seems impossible to
guarantee that every entry is being sync-ed to disk.

Thank you in advance.

Regards,

--
Nicolas Martyanoff
http://snowsyn.net
kha...@gmail.com
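
For readers following the question above: the dets module documents a sync/1 function that asks the table to write all pending updates to disk, which can be called after each insert instead of waiting for the auto_save interval. The following is a minimal sketch only. The module name, table name, and the 5000 ms interval are assumptions for illustration, and whether the data reaches stable storage still depends on the OS and hardware.

```erlang
-module(dets_buffer).
-export([open/1, put_synced/2, demo/1]).

%% Open (or create) a set table backed by File.
%% auto_save is in milliseconds; 5000 is an arbitrary illustrative value.
open(File) ->
    dets:open_file(dets_buffer, [{file, File}, {type, set}, {auto_save, 5000}]).

%% Insert one object, then ask DETS to write all pending updates to disk.
put_synced(Tab, Obj) ->
    ok = dets:insert(Tab, Obj),
    ok = dets:sync(Tab).

%% Insert a single entry, read it back, and close the table properly.
demo(File) ->
    {ok, Tab} = open(File),
    ok = put_synced(Tab, {1, <<"payload">>}),
    Result = dets:lookup(Tab, 1),
    ok = dets:close(Tab),
    Result.
```

Calling sync after every insert trades throughput for a smaller window of possible loss, so it is worth measuring before relying on it.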

Frank Muller

May 25, 2021, 3:01:33 AM
to Nicolas Martyanoff, erlang-q...@erlang.org
This question always puzzled me. 
Does Mnesia rely on the same assumptions?

/Frank

<kha...@gmail.com> wrote:

Mikael Pettersson

May 26, 2021, 5:10:52 PM
to Nicolas Martyanoff, erlang-questions
On Tue, May 25, 2021 at 8:43 AM Nicolas Martyanoff <kha...@gmail.com> wrote:
> I was hoping to use DETS as a local persistent buffer in case data
> cannot be written to a remote database, but it seems impossible to
> guarantee that every entry is being sync-ed to disk.

I'm not too familiar with the internals of DETS, but basically data
goes straight to/from disk while meta-data about allocated and free
areas of the file are cached in memory. I don't know if writes are
sync or not. In our experience, DETS files are somewhat fragile, plus
they have a hard 2GB size limitation which made them extremely awkward
for our use case (large mnesia tables). That's part of the reason we
migrated most of our mnesia tables to eleveldb.

If I had to have a standalone (not mnesia) local persistent store I'd
probably go with eleveldb (or one of its spinoffs) if I needed lookups
by key, or a disk_log if I just needed a FIFO buffer. disk_log allows
you to choose how sync or async your writes are. _I_ wouldn't use
DETS.
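
As an illustration of the disk_log suggestion above, here is a minimal sketch of a halt log used as a FIFO buffer. The module and log names are made up for the example. It uses disk_log:log/2 for a synchronous append, disk_log:alog/2 for an asynchronous one, and disk_log:sync/1 to ask that logged terms be written to disk.

```erlang
-module(log_buffer).
-export([demo/1]).

demo(File) ->
    %% A halt log is a single file; internal format stores Erlang terms.
    {ok, Log} = disk_log:open([{name, log_buffer}, {file, File},
                               {type, halt}, {format, internal}]),
    %% log/2 waits for the log process to handle the append and returns its result.
    ok = disk_log:log(Log, {event, 1}),
    %% alog/2 returns immediately without waiting for the append.
    _ = disk_log:alog(Log, {event, 2}),
    %% sync/1 asks disk_log to make sure the log contents are written to disk.
    ok = disk_log:sync(Log),
    %% Read the logged terms back from the start of the log.
    {_Cont, Terms} = disk_log:chunk(Log, start),
    ok = disk_log:close(Log),
    Terms.
```

Choosing between log/2 plus sync/1 and alog/2 is the "how sync or async" knob mentioned above; the right mix depends on how much data the application can afford to lose.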

Nicolas Martyanoff

May 27, 2021, 1:39:37 AM
to Mikael Pettersson, erlang-questions
Mikael Pettersson <mikpe...@gmail.com> writes:
> I'm not too familiar with the internals of DETS, but basically data
> goes straight to/from disk while meta-data about allocated and free
> areas of the file are cached in memory. I don't know if writes are
> sync or not. In our experience, DETS files are somewhat fragile, plus
> they have a hard 2GB size limitation which made them extremely awkward
> for our use case (large mnesia tables). That's part of the reason we
> migrated most of our mnesia tables to eleveldb.
I was already wary of DETS due to the size limitation (the fact that the
limitation is still here in 2021 shows that nobody is interested in
maintaining the module), but you are confirming my first impression.

> If I had to have a standalone (not mnesia) local persistent store I'd
> probably go with eleveldb (or one of its spinoffs) if I needed lookups
> by key, or a disk_log if I just needed a FIFO buffer. disk_log allows
> you to choose how sync or async your writes are. _I_ wouldn't use
> DETS.
I also just realized that it does not support ordered_set. I'll probably
end up with sqlite3.

Thank you for the information!

Ulf Wiger

May 27, 2021, 1:47:45 AM
to Mikael Pettersson, erlang-questions
It's always tricky with open files during some abrupt crashes. OS-level file system caching means that not all written data may have been physically written to disk.

To detect this, dets has a flag indicating whether the file was properly closed. As I understand it, the 'auto-save' does the same thing as when the file is closed, except the file stays open.

BR,
Ulf W
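
The "properly closed" check described above can be surfaced to the application through the documented {repair, false} option of dets:open_file/2, which returns {error, {needs_repair, FileName}} instead of repairing automatically. A minimal sketch, with a hypothetical module name, table name, and error atom:

```erlang
-module(dets_reopen).
-export([open_strict/1]).

%% Open File without automatic repair, so that a table that was not
%% properly closed is reported to the caller rather than repaired silently.
open_strict(File) ->
    case dets:open_file(dets_reopen, [{file, File}, {repair, false}]) of
        {ok, Tab} ->
            {ok, Tab};
        {error, {needs_repair, _FileName}} ->
            %% not_properly_closed is an illustrative name, not a DETS term.
            {error, not_properly_closed};
        {error, Reason} ->
            {error, Reason}
    end.
```

The caller can then decide whether to reopen with the default repair behaviour or to treat the file as suspect.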

Frank Muller

May 27, 2021, 2:52:17 AM
to Ulf Wiger, erlang-questions
How about Mnesia and persistence to disk?

Ulf Wiger

May 27, 2021, 10:12:23 AM
to Frank Muller, erlang-questions
Mnesia has a WAL (Write-Ahead Log), in which it writes data safely. It then writes to dets (if that's the chosen table type).

At startup, dets files are repaired if they don't appear to have been properly closed. Then the transaction log is applied, making sure that the database is consistent.

Repairs of dets files have been known to take time in the past, but I think OTP has optimized it, Klarna optimized the mnesia end of it, and both computers and disks are insanely faster now.

I'd say that the most glaring issue with disc_only_copies in mnesia is not even the 2 GB limit, but the fact that if you get there, dets will simply discard the update, and mnesia won't even notice. That is, your application must ensure that you never exceed the dets limit.
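
One way an application might guard against the size limit is to check the file size before each insert. The sketch below does this for a plain DETS table using dets:info(Tab, file_size); the module name, the threshold, and the error tuples are assumptions, and a mnesia-based application would need an equivalent check of its own. The check and the insert are not atomic, so this only helps when a single process owns the writes.

```erlang
-module(dets_guard).
-export([insert_checked/2]).

%% Hypothetical safety margin, chosen to stay well below the 2 GB limit.
-define(MAX_BYTES, (1800 * 1024 * 1024)).

%% Insert Obj only while the table file is below the safety margin.
insert_checked(Tab, Obj) ->
    case dets:info(Tab, file_size) of
        Size when is_integer(Size), Size < ?MAX_BYTES ->
            dets:insert(Tab, Obj);
        Size when is_integer(Size) ->
            {error, {table_too_large, Size}};
        undefined ->
            %% dets:info/2 returns undefined when the table is not open.
            {error, not_open}
    end.
```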

Most people use disc_copies for persistence, since they have better performance and better reliability than disc_only_copies. The downside is that the table must also fit in RAM. A different approach would be to use a backend plugin. There are three alternatives to choose from, as far as I know: leveldb, leveled, and rocksdb. There may be issues building leveldb on newer OTP versions. Leveled is (almost) entirely erlang-based, so it wins hands-down on build time. Rocksdb should be the fastest, although the difference isn't dramatic.

BR,
Ulf W
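
For reference, a disc_copies table like the one described above can be set up roughly as follows. This is a sketch under assumptions: the module name, the table name buffer, its attributes, and the 5000 ms wait are all illustrative, and the schema is created on the local node only.

```erlang
-module(mnesia_buffer).
-export([setup/0, store/2]).

%% Create a disk schema (if missing), start mnesia, and create a
%% disc_copies table that is kept in RAM and logged to disk.
setup() ->
    %% Returns an error tuple if the schema already exists; ignored here.
    _ = mnesia:create_schema([node()]),
    ok = mnesia:start(),
    case mnesia:create_table(buffer, [{disc_copies, [node()]},
                                      {attributes, [key, value]}]) of
        {atomic, ok} -> ok;
        {aborted, {already_exists, buffer}} -> ok
    end,
    ok = mnesia:wait_for_tables([buffer], 5000).

%% Write one record inside a transaction.
store(Key, Value) ->
    mnesia:transaction(fun() -> mnesia:write({buffer, Key, Value}) end).
```

Switching the table to disc_only_copies would only change the storage option in create_table, but then the DETS size limit discussed above applies.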


Frank Muller

May 27, 2021, 10:31:29 AM
to Ulf Wiger, erlang-questions
Thanks for the info Ulf. 

Could you please point me to the WAL source code?
Curious to know how it’s implemented.  

Ulf Wiger

May 27, 2021, 12:07:46 PM
to Frank Muller, erlang-questions
The logic is spread out, but a starting point is where the actual commit is logged.


But there are several different places where stuff happens. Check also the mnesia_tm:do_commit()

and the mnesia_dumper.erl module (which reads the commit log and disperses the data into the
different tables, both at startup, and periodically, to avoid having the commit log grow too large.)

BR,
Ulf

Frank Muller

May 27, 2021, 12:40:05 PM
to Ulf Wiger, erlang-questions
Awesome, thanks!
I thought the WAL was implemented in C. 