From: Brian Acton <ac...@whatsapp.com>
Date: Fri, 2 Apr 2010 11:22:52 -0700
Local: Fri, Apr 2 2010 2:22 pm
Subject: Re: [erlang-questions] Mnesia deadlock with large volume of dirty operations?

I'm sorry. I neglected to tell you what I had done on the previous day.

On the previous day, I had attempted to delete some old records using this
methodology:

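                %% Runs inside a transaction; TimeStamp is an expiry cutoff
                %% such as the one computed in the Select fun below.
                mnesia:transaction(fun() ->
                    %% Write lock on the whole table for the whole transaction
                    mnesia:write_lock_table(offline_msg),
                    mnesia:foldl(
                      fun(Rec, _Acc) ->
                              case Rec#offline_msg.expire of
                                  never ->
                                      ok;
                                  TS when TS < TimeStamp ->
                                      mnesia:delete_object(Rec);
                                  _ ->
                                      ok
                              end
                      end, ok, offline_msg)
                end)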

This delete finished on the first node but then locked up all the other
nodes on a table lock. The cluster blew up, and my 24/7 service went into an
hour of recovery downtime.

So, to recap:

day 1 - transaction start, table lock, delete objects - finished in about 2 minutes
day 2 - dirty select, dirty delete objects - finished in about 2 minutes

In both cases, the cluster blew up and became unusable for at least 20-30
minutes. After 20-30 minutes, we initiated recovery protocols.

Should I try:

day 3 - transaction start, no table lock, delete objects

Is the table lock too coarse-grained? Considering that the cluster has blown
up twice, I'm obviously a little scared to try another variation...

--b
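A minimal sketch of a middle ground between day 1 and day 2: delete in many small transactions instead of one table-locked transaction or one dirty operation per record. As Dan explains below, a transaction sends its operations to each node as one message, whereas dirty deletes send one message per operation per node (192593 × 6 = 1,155,558 messages here). The module `offline_msg_cleanup`, `delete_in_batches/3`, the batch size, and the pause between batches are all hypothetical; the records are assumed to come from a selection such as the `Select` fun in the original post. Each `mnesia:delete_object/1` still takes a write lock on its record, so batches should stay small.

```erlang
%% Hypothetical module: delete records in bounded transactions.
-module(offline_msg_cleanup).
-export([chunk/2, delete_in_batches/3]).

%% Split List into consecutive sublists of at most N elements (linear time).
chunk([], _N) ->
    [];
chunk(List, N) when is_integer(N), N > 0 ->
    {Batch, Rest} = take(N, List, []),
    [Batch | chunk(Rest, N)].

take(0, Rest, Acc) -> {lists:reverse(Acc), Rest};
take(_K, [], Acc) -> {lists:reverse(Acc), []};
take(K, [H | T], Acc) -> take(K - 1, T, [H | Acc]).

%% Delete Records in transactions of at most BatchSize objects each,
%% sleeping PauseMs between batches as a crude throttle.
%% Returns {ok, Deleted}, or {error, Reason, Deleted} at the first abort.
delete_in_batches(Records, BatchSize, PauseMs) ->
    delete_batches(chunk(Records, BatchSize), PauseMs, 0).

delete_batches([], _PauseMs, Deleted) ->
    {ok, Deleted};
delete_batches([Batch | Rest], PauseMs, Deleted) ->
    Txn = fun() -> lists:foreach(fun mnesia:delete_object/1, Batch) end,
    case mnesia:transaction(Txn) of
        {atomic, ok} ->
            timer:sleep(PauseMs),
            delete_batches(Rest, PauseMs, Deleted + length(Batch));
        {aborted, Reason} ->
            {error, Reason, Deleted}
    end.
```

For example, `offline_msg_cleanup:delete_in_batches(Select(30), 500, 1000)` would remove the selected records 500 at a time with a one-second pause; the right sizes for a given cluster would have to be found by testing.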

On Fri, Apr 2, 2010 at 5:47 AM, Ovidiu Deac <ovidiud...@gmail.com> wrote:
> To me it sounds like another example of premature optimization which
> went wrong? :)

> On Fri, Apr 2, 2010 at 10:19 AM, Dan Gudmundsson <d...@erlang.org> wrote:
> > When you are using dirty, every operation is sent separately to all
> > nodes, i.e. 192593*6 messages. A transaction could actually have been
> > faster in this case, with one (large) message containing all ops sent
> > to each node.

> > What you get is an overloaded mnesia_tm (very long msg queues),
> > which does the actual writing of the data on the other (participating)
> > mnesia nodes.

> > So transactions will be blocked waiting on mnesia_tm to process those
> > 200000 messages on the other nodes.

> > /Dan
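Dan's diagnosis is a backlog in the `mnesia_tm` message queue on the receiving nodes. A minimal sketch, assuming Erlang distribution is up, for watching that backlog from one node: it looks up the registered `mnesia_tm` process on each node over `rpc` and reads its `message_queue_len`. The module `tm_backlog`, its functions, and the threshold passed to `drained/2` are hypothetical.

```erlang
%% Hypothetical helper: report the mnesia_tm message queue length per node.
-module(tm_backlog).
-export([queue_lengths/1, drained/2]).

%% Returns [{Node, Len | undefined}] for each node in Nodes.
queue_lengths(Nodes) ->
    [{Node, queue_length(Node)} || Node <- Nodes].

queue_length(Node) ->
    case rpc:call(Node, erlang, whereis, [mnesia_tm]) of
        Pid when is_pid(Pid) ->
            %% process_info/2 only accepts local pids, so run it remotely too.
            case rpc:call(Node, erlang, process_info, [Pid, message_queue_len]) of
                {message_queue_len, Len} -> Len;
                _ -> undefined
            end;
        _ ->
            undefined   % mnesia not running there, or node unreachable
    end.

%% True when every queue length is known and at or below Limit.
%% A node that could not be inspected counts as not drained.
drained(Lengths, Limit) ->
    lists:all(fun({_Node, undefined}) -> false;
                 ({_Node, Len}) -> Len =< Limit
              end, Lengths).
```

A cleanup job could call `tm_backlog:queue_lengths(mnesia:system_info(running_db_nodes))` between batches and wait until `drained/2` returns true before issuing more deletes.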

> > On Fri, Apr 2, 2010 at 1:11 AM, Brian Acton <ac...@whatsapp.com> wrote:
> >> Hi guys,

> >> I am running R13B04 SMP on FreeBSD 7.3. I have a cluster of 7 nodes
> >> running mnesia.

> >> I have a table of 1196143 records using about 1.504GB of storage. It's a
> >> reasonably hot table doing a fair number of insert operations at any
> >> given time.

> >> Since there was a 2GB limit in mnesia, I decided I should do some
> >> cleanup on the system, and specifically on this table.

> >> Trying to avoid major problems with Mnesia, transaction load, and
> >> deadlock, I decided to do dirty_select and dirty_delete_object
> >> individually on the records.

> >> I started slow, deleting first 10, then 100, then 1000, then 10000, then
> >> 100,000 records. My goal was to delete 192593 records total.

> >> The first five deletions went through nicely and caused minimal to no
> >> impact.

> >> Unfortunately, the very last delete blew up the system. My delete
> >> command completed successfully, but on the other nodes it caused mnesia
> >> to get stuck on pending transactions, caused my message queues to fill
> >> up, and basically brought down the whole system. We saw some "Mnesia is
> >> overloaded" messages in our logs on these nodes, but not a ton of them.

> >> Does anyone have any clues about what went wrong? I am attaching my code
> >> below for your review.

> >> --b

> >> Mnesia configuration tunables:

> >>      -mnesia no_table_loaders 20
> >>      -mnesia dc_dump_limit 40
> >>      -mnesia dump_log_write_threshold 10000
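For reference, the three tunables above are mnesia application parameters, so they can also be set in an application config file instead of on the command line. A sketch, assuming the node is started from a release that reads a `sys.config`:

```erlang
%% sys.config: same settings as the -mnesia command-line flags above
[{mnesia, [{no_table_loaders, 20},
           {dc_dump_limit, 40},
           {dump_log_write_threshold, 10000}]}].
```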

> >> Example error message:

> >> ** WARNING ** Mnesia is overloaded: {mnesia_tm, message_queue_len, [387,842]}

> >> Sample code:

> >> Select = fun(Days) ->
> >>         %% Cutoff = now minus Days days, as a {MegaSecs, Secs, MicroSecs} tuple
> >>         {MegaSecs, Secs, _MicroSecs} = now(),
> >>         T = MegaSecs * 1000000 + Secs - 86400 * Days,
> >>         TimeStamp = {T div 1000000, T rem 1000000, 0},
> >>         %% All records whose 3rd element sorts before the cutoff
> >>         mnesia:dirty_select(offline_msg,
> >>                             [{'$1',
> >>                               [{'<', {element, 3, '$1'}, {TimeStamp}}],
> >>                               ['$1']}])
> >>     end.

> >> Count = fun(Days) -> length(Select(Days)) end.

> >> Delete = fun(Days, Total) ->
> >>         C = Select(Days),
> >>         D = lists:sublist(C, Total),
> >>         %% One dirty delete (and one replication message per node) per record
> >>         lists:foreach(fun(Rec) ->
> >>                               ok = mnesia:dirty_delete_object(Rec)
> >>                       end,
> >>                       D),
> >>         length(D)
> >>     end.
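The cutoff arithmetic and the match spec in the `Select` fun can be pulled out as pure functions and tested without a running mnesia. One thing worth checking, under an assumption: if element 3 of the record is the `expire` field from the day-1 code (the record definition isn't shown), it can hold the atom `never`. Erlang's term order places atoms before tuples, so `never < TimeStamp` is true and the dirty_select would also match never-expiring records. The sketch below adds an `is_tuple` guard; the module and function names are hypothetical.

```erlang
%% Hypothetical module: cutoff computation and match spec as pure functions.
-module(expire_spec).
-export([cutoff/2, expired_spec/1]).

%% Subtract Days whole days from an erlang:now()-style timestamp.
cutoff({MegaSecs, Secs, _MicroSecs}, Days) ->
    T = MegaSecs * 1000000 + Secs - 86400 * Days,
    {T div 1000000, T rem 1000000, 0}.

%% Match records whose 3rd element is a timestamp tuple older than Cutoff.
%% ASSUMPTION: element 3 is the expire field, which may be the atom never;
%% the is_tuple guard keeps those records out, since atoms sort before tuples.
expired_spec(Cutoff) ->
    [{'$1',
      [{is_tuple, {element, 3, '$1'}},
       {'<', {element, 3, '$1'}, {const, Cutoff}}],
      ['$1']}].
```

With this, the select call would become `mnesia:dirty_select(offline_msg, expire_spec:expired_spec(expire_spec:cutoff(now(), Days)))`.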

> > ________________________________________________________________
> > erlang-questions (at) erlang.org mailing list.
> > See http://www.erlang.org/faq.html
> > To unsubscribe; mailto:erlang-questions-unsubscr...@erlang.org


 