Database deadlocks / concurrency issues with Ecto 2 async tests

626 views
Skip to first unread message

Gleb Arshinov

unread,
May 10, 2016, 4:42:30 PM5/10/16
to elixi...@googlegroups.com
We are updating our little app to work with Ecto 2 RC. Running tests
in parallel we see some pretty advanced database concurrency issues.
I think working around these will be a common problem, and it should
be documented.

Imagine you have 2 DB transactions which update Posts. If they update
the same rows in parallel in different order they can deadlock.
E.g.:

Transaction 1: Transaction 2
begin
begin
update posts where id = 1
update posts
where id = 2
update posts
where id = 1
update posts where id = 2 -- deadlock

We saw a few of these kind of deadlocks in our tests.

In production for this app this is not a problem. We don't worry
about concurrency for it. Actions are not wrapped in transactions.
In theory this can result in non-atomic action failure and data
corruption. In practice given lack of concurrency in current use it
would take 10 years to see a failure, and it would be harmless when it
happened because constraints would keep data consistent. So we
haven't bothered yet.

However, the tests do wrap these actions into a transaction. And
with Ecto 2 the tests are highly concurrent, and we do see deadlocks
in the tests. Our solution is to fix the actions and tests to
properly synchronize concurrency using Postgres row locking. And in
one case where it's too hard, and the issue is with test setup not the
action itself to keep async off for that test.

I think this may be a common problem when developers start using Ecto
2. Tests for actions that modify multiple rows are likely to
deadlock. I anticipate this will create 2 problems - support and
marketing.

I think most web developers have no idea about Postgres concurrency controls
and will not know what to do when they run into deadlocks. Would be
nice add a page to
documentation including:
* understanding row locking
(http://www.postgresql.org/docs/current/static/explicit-locking.html)
* understanding and debugging deadlocks
* preventing deadlocks using consistent update ordering
* preventing deadlocks by explicitly locking a parent record
E.g. something like this:
https://www.brightbox.com/blog/2014/11/13/preventing-mysql-deadlocks/

From the marketing point view Ecto 2 seems to make a claim that it
will run existing unmodified DB tests in parallel (if they are
properly written). When I read it I was confused, since I didn't
think that was possible. I was actually thinking of MVCC visibility,
but the real problem turned out to be deadlocks.

I think what Ecto 2 async tests do is even better than currently
advertised. Not only do the tests run faster, but they let you test
that you code is correct in a highly concurrent environment. But it
is more work. This should be acknowledged IMO.

Best regards,

Gleb

PS It is ironic that Ecto 2 will make developers deal with lock-based
style of concurrency. So by using a BEAM language that lets you avoid
lock-based code, they will actually need to learn more about
lock-based concurrency than they would with Rails.

José Valim

unread,
May 10, 2016, 5:46:32 PM5/10/16
to elixi...@googlegroups.com
Thank you Gleb, you are 100% correct! I will make sure to improve the documentation.

There are ways to avoid such issues which is by making sure the tests work on different data. But that may not be possible if you are triggering commands like update_all or delete_all. You can solve it by adding row-level locking or by simply making the test not async and using the shared sandboxed connection:

    Ecto.Adapters.SQL.Sandbox.mode({:shared, self()})


José Valim
Skype: jv.ptec
Founder and Director of R&D


--
You received this message because you are subscribed to the Google Groups "elixir-ecto" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elixir-ecto...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elixir-ecto/CACZNi589uEhed4ee8Yesh1230%3DMbVr_R1U_MQ5_58_r0GbWKZg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

José Valim

unread,
Jun 11, 2016, 4:35:18 AM6/11/16
to elixi...@googlegroups.com
Hey Gleb, I have been writing documentation for this example that you have posted and I have a question. Why do you have two different transactions working on the same data? For example, transaction 2 was not supposed to see the data created by transaction 1.

Or is the issue the fact you are issuing update_all commands and those will walk through the whole the table possibly accessing (but skipping) data that was created by another transaction and not committed yet?



José Valim
Skype: jv.ptec
Founder and Director of R&D

Gleb Arshinov

unread,
Jun 11, 2016, 12:19:04 PM6/11/16
to elixi...@googlegroups.com
Fixtures. There is some common data in the database before
transactions start. Does this make sense?

If we did factories and started with empty DB we wouldn't have these
deadlock problems. But I think we'd get different ones, e.g.
deadlocks on inserting data into 2 tables with unique indexes.
Haven't tried this, but it might be actually harder to deal with.

Best regards,

Gleb
> https://groups.google.com/d/msgid/elixir-ecto/CAGnRm4KRObj8HwrP%3DhP0FLg5hZtoihY1RL0BDz95BqgkCKR14g%40mail.gmail.com.

José Valim

unread,
Jun 11, 2016, 12:30:57 PM6/11/16
to elixi...@googlegroups.com
Perfect. So the advice I added to the docs was to make sure to have distinct data sets. The unique indexes is relatively easy to solve by using "System.unique_integer([:positive])" to get a different value per run.
To view this discussion on the web visit https://groups.google.com/d/msgid/elixir-ecto/CACZNi58o0Aw9XyL9Ont2PuFMgWpeGXo9mmeBc1MynOyLrQWw%3DA%40mail.gmail.com.

For more options, visit https://groups.google.com/d/optout.


--

Gleb Arshinov

unread,
Jun 11, 2016, 1:06:20 PM6/11/16
to elixi...@googlegroups.com
Makes sense.

W/ factories I was thinking about unique constraint on
columns other than id:

create table users (username text unique);
begin; begin;
insert into users values ('Jose');
insert into users values ('Gleb');
insert into users values ('Gleb');
insert into users values ('Jose');

Hard to know how much this actually comes up without trying
in a real test suite.

Gleb

On Sat, Jun 11, 2016 at 12:30 PM, José Valim
> https://groups.google.com/d/msgid/elixir-ecto/CAGnRm4KeHjHQ0JJ0UjxXgQZfg8bumdr3yVZzKx%3DgjpAXs03_%3DQ%40mail.gmail.com.
Reply all
Reply to author
Forward
0 new messages