SQLSTATE[HY000]: General error: 3636 Recursive query aborted

375 views
Skip to first unread message

Roger Rutishauser

unread,
Aug 21, 2023, 10:18:41 AM8/21/23
to AtoM Users
Hi,

I have a AtoM installation where some archival descriptions yield a 500 error. 

in the nginx error.log:
[wrapped: SQLSTATE[HY000]: General error: 3636 Recursive query aborted after 10001 iterations. Try increasing @@cte_max_recursion_depth to a larger value.]

I rebuilt the nested set, increased the cte_max_recursion_depth to even 100000 but it did not help. Also, I checked if all descriptions have obj_id, pub_status_id and slug (as described here https://www.accesstomemory.org/en/docs/2.6/admin-manual/maintenance/troubleshooting/#troubleshooting-data-corruption), which is the case.

AtoM 2.6.4 with PHP 7.2 and Percona MySQL 8.0 is installed

Any ideas what else I could try?

Thanks!

Dan Gillean

unread,
Aug 21, 2023, 1:08:45 PM8/21/23
to ica-ato...@googlegroups.com
Hi Roger, 

I've only seen this error in the user forum once before (that I recall), and that had to do more with the size of an import, so I don't think that is directly relevant. However - were these descriptions created via import?

One of our developers (Radda) suggested that one possibility is that there is an accidental loop in the parent relations somewhere in one (or more) of your archival hierarchies. This could possibly be the result of a bad CSV import. 

If this is the case, it will be a difficult thing to find and resolve. If you have a recent database backup, then the easiest thing would probably be to roll back to before this became an issue. If that is not possible, then I would suggest you try either a) running full export from the command-line (since the descendants query does not currently use CTE, this will hopefully complete ok) and then use the CSV to look for issues in the legacyId and ParentId columns - or else by using SQL to examine different information_object tables and check for strange object relations that way. 

Otherwise, we can't think of why that error would just crop up unexpectedly - what exactly are you typically doing when you encounter the error? 

In any case.... good luck, and let us know how it goes!

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory
he / him


--
You received this message because you are subscribed to the Google Groups "AtoM Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ica-atom-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ica-atom-users/f0434e8d-2a61-4e3d-ac44-215ca7ea2a89n%40googlegroups.com.

Roger Rutishauser

unread,
Aug 28, 2023, 12:12:33 PM8/28/23
to AtoM Users
Hi Dan,

Thanks for your reply. 
Some have been created by CSV import (GUI and CLI) and some have been created in the GUI. 
I haven't found a loop in the parent relations so far.

I checked the lft and rgt and found out that there are some descriptions that share the same lft with others. Most lft exist in 2, some in 3, and one even in 258 descriptions!
I guess that's what the problem is?
select io.id, io.lft, io.rgt, io18.title from information_object io 
left join information_object_i18n as io18 on io.id = io18.id 
where io.lft in (
  select lft from information_object 
  group by lft having count(lft) > 1
order by io.lft;

Am I right in assuming that if I set the lft and rgt values to NULL and rebuild the search index, the descriptions will be lost?
What else could I do, e.g. setting the lft and rgt to the next, non-existing, value? It will probably destroy the hierarchy, right?

How would you guys proceed if you had multiple descriptions with the same lft or rgt?

Best regards, Roger

Dan Gillean

unread,
Aug 29, 2023, 8:32:46 AM8/29/23
to ica-ato...@googlegroups.com
Hi Roger, 

As always, make some backups first, but: you did say that you've tried running the build-nested-set task, right?
And this didn't fix issues with the nested set values (i.e. the lft and rgt ones)? If not, was there any output from the task, or does it appear to complete successfully?

I am still awaiting further developer feedback, but some initial thoughts: 

First, I still think trying to run a bulk CSV export from the command-line could be a good idea - even if the task fails, the failure message could give us useful insight - and if it does manage to export, then it should be much easier to find issues in a CSV, especially if you use a tool like OpenRefine to find duplicate values. 

If you want to keep exploring via SQL, one of our developers found some threads you might explore where others have needed to identify circular references in their databases. Perhaps you might be able to adapt something from these?
I also don't personally know enough about MySQL to tell you exactly where to find this one, but MySQL has its own error logs. You can enable those and then check them for more specific information? See: 
Otherwise, if you have a backup, then ... sure, why not try deleting some values from the duplicated nested set fields and see if it helps? I would suggest trying something more conservative at first, like deleting just one of the duplicate LFT or RGT values from a record whenever you find them  - hopefully this might prompt the task to populate these. If not, you can try deleting all of the values and see what effect that has when running the nested set task afterwards. Do this ONLY if you have another backup (in case the outcome is unexpected), but: I don't think you would "lose" your descriptions - they would still have entries in the database, and the parentId is stored in a different field if I recall correctly - so if that is the case, then the hierarchy would likely be maintained but siblings would be out of order. 

Finally, can you tell us a bit more about this overall? Did you encounter the 500 error just navigating to the view page of these descriptions? If yes, from the search results, or while navigating the treeview, or...? Anything else you tried that had any further useful context or output, etc?

I hope we can get to the bottom of this! 

Cheers, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory
he / him

Roger Rutishauser

unread,
Sep 1, 2023, 3:50:34 AM9/1/23
to AtoM Users
Hi Dan!

Thanks for your help. I have not yet tried the queries that you sent me, I will try them later, but here's a summary of what happens and of what I did so far:

The 500 error appears when I click on certain descriptions, no matter if the base page was the fullwidth treeview on top of a description, or the full hierarchy page, or search result.
If I click on a bad description in the full-width hierarchy it won't do nothing visually. Neither URL nor content changes, but you see the 500 in the nginx error.

Let's say  I was in the detail-view of the description Item C11.2 and I have the following hierarchy (example) on top of the page.
Fonds A
-- Fonds B
-- Fonds C
---- Sub-Fonds C1
------ Sub-Fonds C11
-------- Item C11.1
-------- Item C11.2 (active)
-------- Item C11.3
Clicking on some of the Items C11.1-3 and on Fonds B and Fonds C it would work
Clicking on Sub-Fonds C11, Sub-Fonds C1, Fonds A it would give 500 in nginx error log, but nothing happens in the browser.

By the way: We're talking about the test environment, which has 40'903 archival descriptions.

If I enter the URL to the bad description directly, i get the 500 error screen. This is also the case when I click on the description in the search result list.
Displays:
Oops! An Error Occurred
Sorry, something went wrong.
The server returned a 500 Internal Server Error.
Try again a little later or ask in the discussion group.
Back to previous page.


the nginx error.log shows:
2023/08/30 16:17:02 [error] 13882#13882: *1765 FastCGI sent in stderr: "PHP message:  [wrapped: SQLSTATE[HY000]: General error: 3636 Recursive query aborted after 1000001 iterations. Try increasing @@cte_max_recursion_depth to a larger value.]" while reading response header from upstream, client: 10.15.24.134, server: xyz.CENSORED.org, request: "GET /archive-a HTTP/1.1", upstream: "fastcgi://unix:/var/run/php-fpm.atom.sock:", host: "xyz.CENSORED.org", referrer: "https://xyz.CENSORED.org/browse/hierarchy"


Then I rebuild the nested set, it workded fine, but did not have any effect. I also updated the elasticserach index.
root@A15:/usr/share/nginx/atom# sudo -u www-data php symfony propel:build-nested-set
>> propel    Build nested set for information_object...
>> propel    Build nested set for term...
>> propel    Build nested set for menu...
>> propel    Done!



Then I tried to export from the CLI:

Variant 1:
root@A15:/usr/share/nginx/atom# php symfony csv:export full-export.csv
Exporting as ISAD.


   [wrapped: SQLSTATE[HY000]: General error: 3636 Recursive query aborted after 1000001 iterations. Try increasing @@cte_max_recursion_depth to a larger value.]


Variant 2:
root@A15:/usr/share/nginx/atom# php symfony csv:export --single-slug="archive-a" archive-a.csv
Exporting as ISAD.
...............[many more dots clipped]..............................

   [wrapped: SQLSTATE[HY000]: General error: 3636 Recursive query aborted after 1000001 iterations. Try increasing @@cte_max_recursion_depth to a larger value.]


Just to make sure it works with another slug... it does.
root@A15:/usr/share/nginx/atom# php symfony csv:export --single-slug="archive-b" archive-b.csv
Exporting as ISAD.
...............[many more dots clipped]..............................
Export complete (2445 descriptions exported).



Investigating further, I realized that about 350 archival descriptions don't have a parent_id (value is NULL) which as far as I know should not be.
So I set the parent_id to 1 for those, leaving lft and rgt untouched. I thought I could move them back to the correct place later. Strangely, although the parent_id was NULL, they appeared on the correct place within the hierarchy in AtoM (why?).
Then I built the nested set once again, but after a while (like 10 minutes or so) it crashed, displaying killed in the shell. I ran the build-nested-set command with nohup in the background and saw the process in htop consuming lots of memory until it crashed.
I also ran sudo php symfony search:populate. It is currently running, but I doubt it will work out fine. Indexing runs much slower than in the normal case, before I set the parent_id to 1 and built the nested set.

Of course, in between all the steps I restartet memcache, php, mysql etc., cleared the cache, just to be on the safe side.

So you see the data is a huge mess and I don't have any clue what to do next. Also, I would like to understand what could lead to the mess - my guess is that it's due to imports of malformed CSV and/or unfinished import jobs.

A solution would be to drop the database and import the one from the production system, but the problem is that it has even 600+ archival descriptions with parent_id = NULL! The customer said that there are also some descriptions that yield a 500 error when clicked on.
I would like to know how to cleanse the data. 
And we're working on upgrading to 2.7.3 so that the CSV validator can be used!

Looking forward to your thoughts on this.
Roger

Roger Rutishauser

unread,
Sep 1, 2023, 6:17:29 AM9/1/23
to AtoM Users
follow-up:
Customer says on PROD environment there are no 500 errors. Still there are, as mentioned, over 600 descriptions without a parent_id. Do I need to do anything?

Dan Gillean

unread,
Sep 1, 2023, 9:07:12 AM9/1/23
to ica-ato...@googlegroups.com
Hi Roger, 

That.... sounds messy! I'm glad to learn that this is a test instance, and that while there may be some issues, things are overall better in the production system. 

I will need to get back to you next week when the best developer for me to ask such thorny questions will be back in office. In the meantime, my initial thoughts: 
  • If things are better in prod, then I strongly suggest we focus on getting a copy of the prod database fixed so you can upgrade them and use the validator going forward, rather than getting lost in the weeds figuring out exactly where the recursive loop is in this test instance
  • AtoM does have hidden "root" objects for each entity type, so yes - as far as I know there shouldn't be any user-created descriptions without a parent ID value. For archival descriptions, the root object ID is 1 I believe. When descriptions have a parent_id value of 1, they are top-level descriptions in AtoM. 
  • In light of this, you COULD take a copy of the production instance, load it into your test environment, and try updating those missing parent_id values to 1, then running all the typical maintenance tasks (rebuild nested set, clear cache, restart PHP-FPM, repopulate search index) and see how things go. If those descriptions are no longer in the correct place, then ideally they could be manually moved via the GUI afterwards, assuming there are no other issues! 
Otherwise, if things are more complicated than that, hang tight and I will see if I can get you some further suggestions next week. 

Cheers, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory
he / him

Dan Gillean

unread,
Sep 5, 2023, 11:59:49 AM9/5/23
to ica-ato...@googlegroups.com
Hi again Roger, 

Radda has shared an experimental WORK IN PROGRESS tool that may help your situation. Our long-term hope is that the Maintainers will pick this up, finalize it, and include it as a formally supported command-line task in an upcoming AtoM release. In the meantime, it's still an experimental internal tool we use sometimes to help with client issues. If you make sure you have backups, aren't working directly on a production system, and use it at your own risk, then it may end up resolving the missing parent_id issues for you. 

There is a gist here: 
To use the script, you can download it, save it in your root AtoM installation directory, and then run it with our generic tools:run command: 
  • php symfony tools:run 01_wip_atom_integrity_tool.php
According to the internal docs, at the moment the script does the following:
  • It asks if you have created a backup of the database.
  • It adds missing object rows for all the resources extending QubitObject (except descriptions).
  • It regenerates slugs to use them in the CSV report.
  • It adds missing parent ids to terms, making them child of the root term.
  • It checks the following description issues, showing a count of the affected ones and their descendants:
    •  Descriptions without object row.
    •  Descriptions without parent id.
    •  Descriptions without parent (the parent id exists but it points to a non existing description).
    •  Descriptions without publication status.
    •  Affected descriptions.
    •  Affected descriptions (including descendants).
  • If there are affected descriptions, it asks what would you like to do:
    • Generate CSV with affected descriptions and descendants.
    • Try to fix affected descriptions (and generate CSV).
    • Delete affected descriptions and descendants (and generate CSV).
  • It generates a CSV file (in all cases) with the affected descriptions and, if selected, deletes the descriptions (including their descendants) or tries to fix them.
  • It shows the generated CSV filename and progress of the process (each 100 descriptions).
  • It rebuilds the nested set.
  • Finally, it tells you to rebuild the search index.
Note that for missing parent ID values, when the script attempts a repair it will set these as top-level descriptions. 

Hope this helps! 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory
he / him

Roger Rutishauser

unread,
Nov 3, 2023, 8:14:39 AM11/3/23
to AtoM Users
Hi Dan

I have just been able to test the script. It works fine, and it showed me the 600 or so elements without parent and parent id. I was able to fix them with 2) and have the CSV that I can give to the customer.
Perfect! GREAT!
Thank you very much for your valuable support.

Best regards
Roger

Dan Gillean

unread,
Nov 3, 2023, 8:49:02 AM11/3/23
to ica-ato...@googlegroups.com
So glad to hear it, Roger! 

Thank you for the update - this is useful feedback that I will pass on to the Maintainers, so we can ensure that adding support for this as a command-line task is on their backlog for a future release. 

Cheers, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory
he / him

Reply all
Reply to author
Forward
0 new messages