[erlang-questions] mnesia - secondary indexes vs separate tables

37 views
Skip to first unread message

John Doe

unread,
May 28, 2013, 8:26:01 AM5/28/13
to erlang-q...@erlang.org
Hello,

I'm trying to do some kind of poor man's composite keys with mnesia for some running statistical data. What should be faster to read and write - one table with a few indexes or separate tables with primary key only?

There will be about 1-5 millions of records, approx the equal number of reads, inserts and deletes. 
The queries will be mostly "check if at least one record exists where given indexed field value = ...", "insert ...", and "delete all records where indexed field value=xxx". Total of 4 fields, so either one table with 4 indexed fields, or 4 tables with key_field + counter.
The table(s) will be disc_only_copy, as RAM matters.

Motiejus Jakštys

unread,
May 28, 2013, 9:14:57 AM5/28/13
to John Doe, erlang-q...@erlang.org
On Tue, May 28, 2013 at 3:26 PM, John Doe <donped...@gmail.com> wrote:
> Hello,
>
> I'm trying to do some kind of poor man's composite keys with mnesia for some
> running statistical data. What should be faster to read and write - one
> table with a few indexes or separate tables with primary key only?

Some folks did some benchmarks with mnesia with a secondary index, and
write performance was terrible (~40 times slowdown for inserts?). But
I could not find it in the archives. Who was it?

> There will be about 1-5 millions of records, approx the equal number of
> reads, inserts and deletes.
> The queries will be mostly "check if at least one record exists where given
> indexed field value = ...", "insert ...", and "delete all records where
> indexed field value=xxx". Total of 4 fields, so either one table with 4
> indexed fields, or 4 tables with key_field + counter.
> The table(s) will be disc_only_copy, as RAM matters.

On the other hand, it does not seem like too much work to measure
both. And you will have a definite answer.

--
Motiejus Jakštys
_______________________________________________
erlang-questions mailing list
erlang-q...@erlang.org
http://erlang.org/mailman/listinfo/erlang-questions

Ulf Wiger

unread,
May 28, 2013, 11:19:02 AM5/28/13
to Motiejus Jakštys, erlang-q...@erlang.org

On 28 May 2013, at 15:14, Motiejus Jakštys wrote:

> Some folks did some benchmarks with mnesia with a secondary index, and
> write performance was terrible (~40 times slowdown for inserts?). But
> I could not find it in the archives. Who was it?

I believe it was Scott Fritchie, but it was for the special case where an inordinate number of secondary keys were of the same value (e.g. indexing a boolean attribute). Since index tables are 'bag' tables, this can indeed result in terrible performance.

BR,
Ulf W

Ulf Wiger, Co-founder & Developer Advocate, Feuerlabs Inc.
http://feuerlabs.com

Ignas Vyšniauskas

unread,
May 28, 2013, 6:05:47 PM5/28/13
to Ulf Wiger, erlang-q...@erlang.org
05/28/2013 05:19 PM, Ulf Wiger wrote:
> On 28 May 2013, at 15:14, Motiejus Jakštys wrote:
>
>> Some folks did some benchmarks with mnesia with a secondary index,
>> and write performance was terrible (~40 times slowdown for
>> inserts?). But I could not find it in the archives. Who was it?
>
> I believe it was Scott Fritchie, but it was for the special case
> where an inordinate number of secondary keys were of the same value
> (e.g. indexing a boolean attribute). Since index tables are 'bag'
> tables, this can indeed result in terrible performance.

Indeed it was Scott, here:
http://erlang.2086793.n4.nabble.com/Mnesia-and-additional-indexes-a-cautionary-tale-td2088434.html

And I have done some benchmarks which confirmed crappy insert
performance with secondary indices, but it was not very thorough,
because I bumped into a more severe problem, see below.

On 05/28/2013 02:26 PM, John Doe wrote:
> The table(s) will be disc_only_copy, as RAM matters.

Last time I checked (R14 I think) mnesia never stored the secondary
indices, so it would recreate them every time you restarted the node,
and given that you have an order of 10^6 record this will have either a
significant or a very significant impact on your start time, so you
might want check if that's still the case and decide whether it's a
problem for you.

--
Ignas

Reply all
Reply to author
Forward
0 new messages