We are updating our little app to work with Ecto 2 RC. Running tests
in parallel we see some pretty advanced database concurrency issues.
I think working around these will be a common problem, and it should
be documented.
Imagine you have 2 DB transactions which update Posts. If they update
the same rows in parallel in different order they can deadlock.
E.g.:
Transaction 1: Transaction 2
begin
begin
update posts where id = 1
update posts
where id = 2
update posts
where id = 1
update posts where id = 2 -- deadlock
We saw a few of these kind of deadlocks in our tests.
In production for this app this is not a problem. We don't worry
about concurrency for it. Actions are not wrapped in transactions.
In theory this can result in non-atomic action failure and data
corruption. In practice given lack of concurrency in current use it
would take 10 years to see a failure, and it would be harmless when it
happened because constraints would keep data consistent. So we
haven't bothered yet.
However, the tests do wrap these actions into a transaction. And
with Ecto 2 the tests are highly concurrent, and we do see deadlocks
in the tests. Our solution is to fix the actions and tests to
properly synchronize concurrency using Postgres row locking. And in
one case where it's too hard, and the issue is with test setup not the
action itself to keep async off for that test.
I think this may be a common problem when developers start using Ecto
2. Tests for actions that modify multiple rows are likely to
deadlock. I anticipate this will create 2 problems - support and
marketing.
I think most web developers have no idea about Postgres concurrency controls
and will not know what to do when they run into deadlocks. Would be
nice add a page to
documentation including:
* understanding row locking
(
http://www.postgresql.org/docs/current/static/explicit-locking.html)
* understanding and debugging deadlocks
* preventing deadlocks using consistent update ordering
* preventing deadlocks by explicitly locking a parent record
E.g. something like this:
https://www.brightbox.com/blog/2014/11/13/preventing-mysql-deadlocks/
From the marketing point view Ecto 2 seems to make a claim that it
will run existing unmodified DB tests in parallel (if they are
properly written). When I read it I was confused, since I didn't
think that was possible. I was actually thinking of MVCC visibility,
but the real problem turned out to be deadlocks.
I think what Ecto 2 async tests do is even better than currently
advertised. Not only do the tests run faster, but they let you test
that you code is correct in a highly concurrent environment. But it
is more work. This should be acknowledged IMO.
Best regards,
Gleb
PS It is ironic that Ecto 2 will make developers deal with lock-based
style of concurrency. So by using a BEAM language that lets you avoid
lock-based code, they will actually need to learn more about
lock-based concurrency than they would with Rails.