Deadlock when deleting while importing with nested set disabled


Raphaël Barman

Apr 26, 2022, 11:04:31 AM
to AtoM Users
Hi,

I am running an AtoM instance where a lot of imports are happening (>4k per day). Each import is composed of a single file and a single information object.
To speed up the imports, they are run with nested set updates disabled (--skip-nested-set-build).

Concurrently, we have an ongoing quality assurance process that verifies the imports. In some cases, information objects need to be deleted, and users delete them through the web interface.

Unfortunately, this causes the import process to fail with a deadlock when a deletion happens at the same time:
Job 15351419 "arFileImportJob": Exception: Unable to execute INSERT statement. [wrapped: SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction]

The deadlock happens when the import tries to insert a new row while the deletion process is updating the nested set:

---TRANSACTION 115421865, ACTIVE 23 sec updating or deleting
mysql tables in use 4, locked 4
16705 lock struct(s), heap size 2056400, 3144412 row lock(s), undo log entries 3138625
MySQL thread id 2179216, OS thread handle 139633645201152, query id 268489601 10.77.232.101 atom Updating
UPDATE information_object
      SET information_object.RGT = information_object.RGT - '2'
      WHERE information_object.RGT >= '2345187'
Trx read view will not see trx with id >= 115421841, sees < 115421841
---TRANSACTION 421110496260760, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421110496256504, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 115421953, ACTIVE 8 sec inserting
mysql tables in use 4, locked 4
LOCK WAIT 4 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 516, OS thread handle 139635830523648, query id 268489714 10.77.232.101 atom Update
INSERT INTO information_object (`ID`,`LEVEL_OF_DESCRIPTION_ID`,`COLLECTION_TYPE_ID`,`PARENT_ID`,`SOURCE_STANDARD`,`SOURCE_CULTURE`) VALUES (15351273,2061,128,1653,'http://www.loc.gov/ead/ead.xsd','fr')
------- TRX HAS BEEN WAITING 8 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 4309 page no 37475 n bits 248 index PRIMARY of table `atom`.`information_object` trx id 115421953 lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0

 0: len 8; hex 73757072656d756d; asc supremum;;

 
I don't really know if this counts as a bug. For us, it would be convenient if the two processes could happen at once.

I have thought of the following ways of fixing this issue:
  1. Having the deletion process from the UI trigger a new job that performs the deletion. The two processes would then happen sequentially (as we have only one worker).
  2. Adding a retry mechanism to the import job in case of a MySQL deadlock.
  3. Tweaking the MySQL configuration to help avoid such a deadlock.

Would anyone have any pointers on how to fix this?

Cheers,
Raphaël

José Raddaoui

Apr 28, 2022, 10:54:34 AM
to AtoM Users
Hi Raphaël,

Thanks for the detailed report and research. I wouldn't consider it a bug, but rather an implication of using a nested set implementation to handle hierarchies in a relational database. One of our goals after AtoM 2.6 is to move away from that implementation, now that MySQL 8 gives us the option to use CTE queries to handle hierarchies. If you are interested, you can find more information in the following two tickets (they link to many more):


Some of those issues were addressed in AtoM 2.6 and, while that won't fully fix this particular problem, it will improve the situation if you're not on that version already. About further improvements and your suggestions:

  1. Having the deletion process from the UI trigger a new job that performs the deletion. The two processes would then happen sequentially (as we have only one worker).
This would be the ideal solution in my opinion, as even without the deadlocks it's an operation that can take a lot of time on big datasets. The new job could also be added to this list to avoid parallel execution when there are multiple workers. As a similar alternative that doesn't require code changes: if you are running those imports from the CLI and you can get the slugs of the descriptions instead of deleting them from the GUI, you could use the CLI task to delete the descriptions sequentially.
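For example, a small wrapper script along these lines could run the deletions one at a time (a rough sketch only: the slug file, install path and exact task name are assumptions, so please check them against the CLI tasks available in your AtoM version):

#!/usr/bin/env python3
# Sketch: delete descriptions sequentially by calling the AtoM CLI per slug.
# The install path, slug file and task name below are assumptions.
import subprocess

ATOM_ROOT = "/usr/share/nginx/atom"      # assumed AtoM install path
SLUG_FILE = "slugs-to-delete.txt"        # hypothetical file, one slug per line

with open(SLUG_FILE) as fh:
    slugs = [line.strip() for line in fh if line.strip()]

for slug in slugs:
    # Each deletion runs on its own, so deletions never overlap with each other
    # (and you can schedule the script so it doesn't overlap with the imports).
    subprocess.run(
        ["php", "symfony", "tools:delete-description", slug],
        cwd=ATOM_ROOT,
        check=True,
    )
    print("Deleted", slug)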

  2. Adding a retry mechanism to the import job in case of a MySQL deadlock.
We recently added something like that for the web request cycle, but decided not to do it in jobs, as they can be really long-running processes and are not always idempotent. But are you running those >4k imports as jobs from the GUI, or via the CLI task from a custom script? For the latter, I guess you could implement the retry (and clean-up, if needed) in that script when it hits this error.
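For instance, something roughly like this could wrap the import command and retry on a deadlock (just a sketch: the import command is a placeholder for whatever your script actually runs, and any clean-up needed before retrying is left out on purpose):

#!/usr/bin/env python3
# Sketch: retry an AtoM CLI import when MySQL reports a deadlock (error 1213).
# The import command below is a placeholder; add clean-up before retrying if
# your imports are not idempotent.
import subprocess
import time

ATOM_ROOT = "/usr/share/nginx/atom"   # assumed AtoM install path
MAX_RETRIES = 3

def run_import(path):
    cmd = ["php", "symfony", "import:bulk", path]   # placeholder command
    for attempt in range(1, MAX_RETRIES + 1):
        result = subprocess.run(cmd, cwd=ATOM_ROOT, capture_output=True, text=True)
        if result.returncode == 0:
            return
        output = result.stdout + result.stderr
        # MySQL deadlocks surface as SQLSTATE 40001 / "Deadlock found" messages.
        if "Deadlock found" in output:
            print("Deadlock on attempt %d, retrying..." % attempt)
            time.sleep(5 * attempt)   # simple backoff before the next try
            continue
        raise RuntimeError("Import failed:\n" + output)
    raise RuntimeError("Import still deadlocking after %d attempts" % MAX_RETRIES)

run_import("import/example.xml")   # hypothetical file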

  3. Tweaking the MySQL configuration to help avoid such a deadlock.
Buried in all those tickets you may have found a couple of analysis tasks that relate to this:


My initial impression was that changing the isolation level could cause data loss, but removing the indexes from the lft and rgt columns could potentially solve this particular problem, as they may be the main reason for the lock. With the changes in 2.6, I don't think removing those indexes would be a major performance hit, and you could do it temporarily and add them back later. You could also try increasing the innodb_lock_wait_timeout value to see if waiting a little longer helps, but the lock time will probably grow as the data grows.
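As a very rough illustration of that last part (the statements are only examples: run SHOW INDEX FROM information_object first to see the real index names, and please test anything like this on a copy of the database):

#!/usr/bin/env python3
# Illustration only: inspect the indexes on information_object and raise the
# lock wait timeout. Index names vary, so the DROP INDEX line is left commented
# out with a placeholder.
import mysql.connector

conn = mysql.connector.connect(host="127.0.0.1", user="atom",
                               password="change-me", database="atom")
cur = conn.cursor()

# List the existing indexes, including the ones on the lft/rgt columns.
cur.execute("SHOW INDEX FROM information_object")
for row in cur.fetchall():
    print(row)

# Give blocked transactions longer before they error out (the default is 50
# seconds; this needs the SUPER or SYSTEM_VARIABLES_ADMIN privilege).
cur.execute("SET GLOBAL innodb_lock_wait_timeout = 120")

# Temporarily dropping a nested set index would look like this, using the
# real index name reported by SHOW INDEX above:
# cur.execute("ALTER TABLE information_object DROP INDEX <index_name>")

cur.close()
conn.close()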


Best regards,
Radda.