Hi,
I am running an AtoM instance where a lot of imports are happening (>4k per day). Each import is composed of a single file and a single information object.
To speed up the imports, they are run with the nested set updating disabled (
--skip-nested-set-build).
Concurrently, we have an ongoing quality assurance process that verifies the importations. In some cases, information objects need to be deleted. The user uses the web interface to delete the information object.
Unfortunately, this process makes the import process fail with a deadlock when there is a deletion happening at the same time:
Job 15351419 "arFileImportJob": Exception: Unable to execute INSERT statement. [wrapped: SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction]Deadlock is happening when the import tries to insert a new row and the deletion process is trying to update the nested set:
---TRANSACTION 115421865, ACTIVE 23 sec updating or deleting
mysql tables in use 4, locked 4
16705 lock struct(s), heap size 2056400, 3144412 row lock(s), undo log entries 3138625
MySQL thread id 2179216, OS thread handle 139633645201152, query id 268489601 10.77.232.101 atom Updating
UPDATE information_object
SET information_object.RGT = information_object.RGT - '2'
WHERE information_object.RGT >= '2345187'
Trx read view will not see trx with id >= 115421841, sees < 115421841
---TRANSACTION 421110496260760, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421110496256504, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 115421953, ACTIVE 8 sec inserting
mysql tables in use 4, locked 4
LOCK WAIT 4 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 516, OS thread handle 139635830523648, query id 268489714 10.77.232.101 atom Update
INSERT INTO information_object (`ID`,`LEVEL_OF_DESCRIPTION_ID`,`COLLECTION_TYPE_ID`,`PARENT_ID`,`SOURCE_STANDARD`,`SOURCE_CULTURE`) VALUES (15351273,2061,128,1653,'http://www.loc.gov/ead/ead.xsd','fr')
------- TRX HAS BEEN WAITING 8 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 4309 page no 37475 n bits 248 index PRIMARY of table `atom`.`information_object` trx id 115421953 lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
0: len 8; hex 73757072656d756d; asc supremum;; I don't really know if that's a bug. For us, it would be convenient if the two processes can happen at once.
I have thought of the following ways of fixing this issue
- Having the deletion process from the UI triggers a new job that performs the deletion. Then, the two processes will happen sequentially (as we have only one worker).
- Having the import job has a retry mechanism in case of MySQL deadlock
- Tweaking MySQL configuration to help avoid such a deadlock
Would anyone have any pointer on how to fix that?
Cheers,
Raphaël