Hi Andres. That helps a bit. (It's good to be able to pinpoint Citus
as the source of these errors.) However, I'm not sure I'm any closer
to a solution.
From the information in your message and the contents of the error
message I saw, it sounds like two different processes were trying to
acquire locks on the same two shards and deadlocked each other. But
this raises a bunch of additional questions for me:
* When does Citus lock a shard? I'm not clear on how exactly I'm
managing to get a deadlock, as I'm not using explicit transactions (I'm
using JDBC auto-commit). So it seems surprising that I would ever be
locking anything for any significant period of time, let alone having
multiple processes trying to lock the same things at the same time.
* Why does Citus lock an entire shard? Wouldn't locking at the row
level be less restrictive?
* Is there any way to turn off (or relax) this shard-level locking?
* When I received this error, I was also experiencing another issue:
I had "max_connections" set too low on my worker machines, so the
master was getting errors when it was blocked from connecting to the
workers. Any chance that might have been part of the cause of this
issue? (I.e., the master acquires a lock, tries to update a worker, and
fails; another process does the same, and the result is a deadlock.)
Thanks,
DR