OAI harvesting into Summon - XML response incomplete


lib-s...@unbc.ca

Jan 18, 2018, 1:23:11 PM
to AtoM Users
Hello,

We're having an issue with our repository not being fully ingested into Summon. Here's the response their development team sent us:

Around page 291 (cursor=29000) (https://search.nbca.unbc.ca/;oai?verb=ListRecords&resumptionToken=eyJmcm9tIjoiIiwidW50aWwiOiIiLCJjdXJzb3IiOjI5MDAwLCJtZXRhZGF0YVByZWZpeCI6Im9haV9kYyIsInNldCI6IiJ9), the XML response is incomplete and invalid. It looks like it has been cut off before the last few results and closing tags. This happened on multiple OAI harvesting tools besides ours, so it appears to be an issue with the server. 
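The resumptionToken in that URL appears, on inspection, to be base64-encoded JSON, which is where the "cursor=29000" figure comes from. The Python sketch below decodes it. Treating the token as base64 JSON is an assumption drawn from this single example, not documented AtoM behaviour, so other AtoM versions may encode it differently.

```python
import base64
import json


def decode_resumption_token(token: str) -> dict:
    """Decode an AtoM-style OAI-PMH resumptionToken.

    Assumption: the token is base64-encoded JSON, as the example token
    in this thread appears to be.
    """
    # Restore any stripped '=' padding so the length is a multiple of 4.
    padded = token + "=" * (-len(token) % 4)
    # urlsafe_b64decode also accepts the standard '+' and '/' characters.
    raw = base64.urlsafe_b64decode(padded)
    return json.loads(raw.decode("utf-8"))


if __name__ == "__main__":
    token = (
        "eyJmcm9tIjoiIiwidW50aWwiOiIiLCJjdXJzb3IiOjI5MDAwLCJtZXRhZGF0"
        "YVByZWZpeCI6Im9haV9kYyIsInNldCI6IiJ9"
    )
    # Shows cursor 29000 and metadataPrefix oai_dc for the token above.
    print(decode_resumption_token(token))
```

Decoding the token tells you which page (cursor offset) a harvester was on when it hit a bad response, which is useful when reporting the problem.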

I searched this AtoM Users group, as well as the bug tracker, and found this: https://projects.artefactual.com/issues/10344

That seems to match our issue, except that, if I'm interpreting it correctly, the issue was fixed in v2.4.0, and we're already running 2.4.0.

I also found the "Harvesting from AtoM to Primo" thread, but I don't think that's the same issue we're experiencing.

Any ideas/suggestions?

Thanks,

David

David at Artefactual

Jan 18, 2018, 1:29:05 PM
to AtoM Users
Hi David,

I would guess there's an error generating the Dublin Core document that is halting the script and truncating the response from the server.  I'd suggest checking your web server error log and looking for 500 errors from PHP, or putting your AtoM instance into debug mode and seeing if there are error messages in the output.

--

David Juhasz
Director, AtoM Technical Services
Artefactual Systems Inc.
www.artefactual.com

lib-s...@unbc.ca

Jan 18, 2018, 2:12:50 PM
to AtoM Users
Here's what showed up in the nginx error.log when I tried loading that URL again:

"PHP message: PHP Fatal error:  Call to a member function getOaiIdentifier() on a non-object in /usr/share/nginx/atom/plugins/arOaiPlugin/modules/arOaiPlugin/templates/_listRecords.xml.php on line 11"
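Checking the log by hand works, but if the error recurs it can help to pull every PHP fatal error (message, file, line) out of the log at once. The Python sketch below does that with a regular expression that targets the message shape quoted above. The log path in the `__main__` block is an assumption (a common nginx default on Debian/Ubuntu); adjust it for your server.

```python
import re
from typing import Iterable, List, NamedTuple


class PhpFatal(NamedTuple):
    message: str
    path: str
    line: int


# Matches e.g. "PHP Fatal error:  <message> in /path/file.php on line 11"
FATAL_RE = re.compile(
    r"PHP Fatal error:\s+(?P<message>.+?) in (?P<path>\S+\.php) on line (?P<line>\d+)"
)


def find_php_fatals(log_lines: Iterable[str]) -> List[PhpFatal]:
    """Return every PHP fatal error found in the given log lines."""
    found = []
    for text in log_lines:
        match = FATAL_RE.search(text)
        if match:
            found.append(PhpFatal(match["message"], match["path"], int(match["line"])))
    return found


if __name__ == "__main__":
    # Assumed default nginx log location; adjust for your installation.
    with open("/var/log/nginx/error.log", encoding="utf-8", errors="replace") as log:
        for fatal in find_php_fatals(log):
            print(f"{fatal.path}:{fatal.line}: {fatal.message}")
```

Here the template line in `_listRecords.xml.php` is calling `getOaiIdentifier()` on something that is not an object, which points at a record the OAI module could not load rather than at the web server itself.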

Dan Gillean

Jan 18, 2018, 5:22:39 PM
to ICA-AtoM Users
Hi David, 

It's possible that you're receiving this error because AtoM is unable to find the collection root, which suggests that the nested set might have become corrupted. AtoM uses a nested set model to manage hierarchical data in the relational database. Sometimes long-running operations (moves, large deletions, or any operation that times out mid-process) can corrupt it, leading to strange behaviors. Fortunately, there's a command-line task that rebuilds the nested set. Run the following from your root AtoM directory (generally /usr/share/nginx/atom if you have followed our recommended installation instructions):

php symfony propel:build-nested-set
Let us know if that resolves the issue! 
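As a conceptual illustration only (not AtoM's actual code), a nested set gives every node a pair of numbers, lft and rgt, such that each descendant's pair falls inside its ancestor's. Rebuilding the set means walking the tree from its parent links and renumbering every node. The Python sketch below shows that idea on a hypothetical in-memory tree; the node names and dict input are made up for the example.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def rebuild_nested_set(parents: Dict[str, Optional[str]]) -> Dict[str, Tuple[int, int]]:
    """Assign (lft, rgt) to every node reachable from a root.

    Input is a hypothetical child -> parent mapping (None marks a root).
    """
    children: Dict[str, List[str]] = defaultdict(list)
    roots = []
    for node, parent in parents.items():
        if parent is None:
            roots.append(node)
        else:
            children[parent].append(node)

    values: Dict[str, Tuple[int, int]] = {}
    counter = 1

    def visit(node: str) -> None:
        nonlocal counter
        lft = counter
        counter += 1
        for child in sorted(children[node]):  # sorted for repeatable output
            visit(child)
        values[node] = (lft, counter)
        counter += 1

    for root in sorted(roots):
        visit(root)
    return values


if __name__ == "__main__":
    tree = {"fonds": None, "series-a": "fonds", "file-1": "series-a", "series-b": "fonds"}
    for node, (lft, rgt) in sorted(rebuild_nested_set(tree).items(), key=lambda kv: kv[1]):
        print(f"{node:10} lft={lft} rgt={rgt}")
```

In this sketch a node whose parent does not exist is never reached and gets no values, which is one way a damaged hierarchy can surface. The recursion is fine for a demonstration but would need to be iterative for very deep trees.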

Regards, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory

--
You received this message because you are subscribed to the Google Groups "AtoM Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ica-atom-users+unsubscribe@googlegroups.com.
To post to this group, send email to ica-atom-users@googlegroups.com.
Visit this group at https://groups.google.com/group/ica-atom-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/ica-atom-users/7abcda44-bd52-4411-b2f6-1f72bd99eb8b%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.


Dan Gillean

Jan 18, 2018, 6:14:58 PM
to ICA-AtoM Users
Hi David, 

Drat, I was hoping we had an easy solution on our hands :)

You could also try regenerating the slugs, and possibly also reindexing and restarting services. If there's a missing slug somewhere in your data, it could cause similar issues. Try: 
For all this info, and for restarting services, you might want to check out these slides: 
Generally, I find that these tasks tend to resolve about 80%+ of reported AtoM issues, so I tend to recommend them first. It's worth a shot. 

If that doesn't help... then I'm running short on ideas. You could try doing some database checks to ensure there's no corrupted data in there, just in case. See these threads: 
And let me know whether you find any evidence of data corruption (e.g. missing publication status values or missing object rows, the most common cases). If you don't, it's possible you're encountering a bug we haven't yet seen. In any case, I'll take the thread to our developers and see if they have further suggestions.
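The usual shape of such a check is a LEFT JOIN that finds descriptions whose companion rows are missing. The self-contained Python/SQLite sketch below shows that query pattern. The three-table schema is a heavily simplified, hypothetical stand-in loosely modelled on AtoM's MySQL tables (information_object, object, status), so treat it as an illustration of the technique rather than a query to run against a production database.

```python
import sqlite3

# Hypothetical, heavily simplified stand-in for the relevant AtoM tables.
SCHEMA = """
CREATE TABLE object (id INTEGER PRIMARY KEY, class_name TEXT);
CREATE TABLE information_object (id INTEGER PRIMARY KEY, oai_local_identifier INTEGER);
CREATE TABLE status (id INTEGER PRIMARY KEY, object_id INTEGER);
"""

# Descriptions missing their object row, their status row, or both.
ORPHAN_QUERY = """
SELECT io.id,
       o.id IS NULL AS missing_object,
       s.id IS NULL AS missing_status
FROM information_object AS io
LEFT JOIN object AS o ON o.id = io.id
LEFT JOIN status AS s ON s.object_id = io.id
WHERE o.id IS NULL OR s.id IS NULL
ORDER BY io.id
"""


def find_broken_descriptions(conn: sqlite3.Connection) -> list:
    return conn.execute(ORPHAN_QUERY).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO object VALUES (?, ?)", [(1, "QubitInformationObject"), (2, "QubitInformationObject")])
    conn.executemany("INSERT INTO information_object VALUES (?, ?)", [(1, 101), (2, 102), (3, 103)])
    conn.executemany("INSERT INTO status VALUES (?, ?)", [(10, 1)])
    for row in find_broken_descriptions(conn):
        print(row)  # (id, missing_object, missing_status)
```

An empty result, as David reports further down, rules out these two common forms of corruption but not every form.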

Cheers, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory

On Thu, Jan 18, 2018 at 5:32 PM, <lib-s...@unbc.ca> wrote:
Hi Dan,

I ran the 'php symfony propel:build-nested-set' command. It completed successfully, but the OAI output has not changed.

Thanks,

David


Dan Gillean

Jan 19, 2018, 1:11:02 PM
to ICA-AtoM Users
Hi David, 

Thanks for the update - sorry to hear those solutions didn't work. I'll take this thread to our development team to see if they have further suggestions. 

Regards, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory

On Fri, Jan 19, 2018 at 12:32 PM, <lib-s...@unbc.ca> wrote:
Hi Dan,

I regenerated the slugs, reindexed (php symfony search:populate) and restarted services. Actually, I restarted the entire server, since server OS updates were scheduled for last night. It's still exhibiting the same behaviour.

I ran the SQL query from your first link, to check for broken information objects. All rows had values in all columns.

David


Dan Gillean

Jan 24, 2018, 11:46:20 AM
to ICA-AtoM Users
Hello again David, 

Unfortunately, our team doesn't have further suggestions at the moment - they believe it's possible this is being caused by some data corruption in your database that we haven't uncovered yet, but without a detailed review of your installation environment and a database dump (which is beyond what we can provide via the user forum), there's not much else we can suggest at this time. It's also possible you have uncovered a new bug in the OAI module, but in our initial testing we've been unable to reproduce it thus far. 

If this is a critical issue to UNBC and you would like support in resolving it from Artefactual, please feel free to contact me off-list and we can discuss a short time-and-materials contract to review and resolve the issue. 

Regards, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory

lib-s...@unbc.ca

Jan 25, 2018, 5:55:46 PM
to AtoM Users
I think I've resolved the issue. I figured I should share what I found and did, in case anyone else runs into the same problem.

I started by looking at the incomplete XML being returned in the browser. It started with <record>, then <header>, and inside that was <identifier>, <datestamp> and <setSpec>. <setSpec> was an open tag with no contents, and that was the end of the file. Proceeding on the assumption that there was something wrong with this record, I decided to search for it in the database, using this simple query:

SELECT *
FROM information_object
WHERE information_object.oai_local_identifier = <xxxxxx>

The <xxxxxx> was the identifier number from the incomplete XML. This gave me one row result.
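Reading the last header out of a truncated page, as described above, can be scripted. Because a truncated page will not parse, the Python sketch below first confirms the page is not well-formed, then falls back to a regular expression that grabs the last complete `<identifier>` before the cut-off. It assumes the OAI header identifiers are unprefixed `<identifier>` elements and that the number used in the SQL query is the trailing run of digits in that identifier; both match how the identifier is used in this thread but are not verified against AtoM's identifier format. The sample identifiers in the test are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Header identifiers only; <dc:identifier> in the metadata does not match.
IDENTIFIER_RE = re.compile(r"<identifier>([^<]+)</identifier>")


def is_well_formed(xml_text: str) -> bool:
    """True if the page parses as XML, False if it is truncated or broken."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


def last_identifier(xml_text: str) -> Optional[str]:
    """Return the last complete <identifier> value in a (possibly truncated) page."""
    matches = IDENTIFIER_RE.findall(xml_text)
    return matches[-1] if matches else None


def local_number(identifier: str) -> Optional[int]:
    """Assumption: the local identifier is the trailing run of digits."""
    match = re.search(r"(\d+)$", identifier)
    return int(match.group(1)) if match else None
```

Saving the broken page to a file and passing its text to `last_identifier` gives the value to plug into the `oai_local_identifier` query above.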

I decided to try the Delete a Description command from the documentation. To do that, I needed the slug, so I ran another SQL query:

SELECT *
FROM slug
WHERE object_id = <xxxxxx>

This time the <xxxxxx> was the id value from the first query result. This again gave me one row result, including the slug value I needed.
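The two lookups above can be folded into a single join that goes straight from the OAI local identifier to the slug needed by `tools:delete-description`. The Python/SQLite sketch below demonstrates the join on a hypothetical minimal schema containing only the columns named in this thread (information_object.id, information_object.oai_local_identifier, slug.object_id, slug.slug). The SELECT itself is plain SQL; against a real AtoM database you would run it in MySQL with the `?` replaced by the identifier.

```python
import sqlite3
from typing import Optional

# Hypothetical minimal schema: only the columns named in the thread.
SCHEMA = """
CREATE TABLE information_object (id INTEGER PRIMARY KEY, oai_local_identifier INTEGER);
CREATE TABLE slug (id INTEGER PRIMARY KEY, object_id INTEGER, slug TEXT);
"""

# One query instead of two: OAI local identifier -> object id -> slug.
SLUG_FOR_OAI_ID = """
SELECT io.id, s.slug
FROM information_object AS io
JOIN slug AS s ON s.object_id = io.id
WHERE io.oai_local_identifier = ?
"""


def slug_for_oai_id(conn: sqlite3.Connection, oai_local_id: int) -> Optional[tuple]:
    """Return (object id, slug), or None if no description has that identifier."""
    return conn.execute(SLUG_FOR_OAI_ID, (oai_local_id,)).fetchone()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO information_object VALUES (42, 1234)")
    conn.execute("INSERT INTO slug VALUES (1, 42, 'example-description')")
    print(slug_for_oai_id(conn, 1234))  # (42, 'example-description')
```

An inner join is used deliberately: if a description has no slug row at all, the query returns nothing, which is itself a sign of the kind of incomplete data discussed in this thread.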

In a PuTTY shell prompt, I ran the command sudo php symfony tools:delete-description <slug> as per the documentation I linked to above, replacing <slug> with the slug value from my second SQL query.

Then I refreshed the browser, and this time it loaded the full page of complete XML. I loaded the next page, using the resumptionToken from the bottom of the page, and the next page gave incomplete XML. I repeated the procedure outlined above. In the end, I had to delete four Descriptions using this procedure. All four were within the same Fonds. After that I received several results pages of correct XML and it looked like it had moved to the next fonds, so I've re-triggered my little PowerShell OAI scraper to see if it can make it to the end successfully (If anyone is interested in that, message me).
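The PowerShell scraper isn't shown in the thread. As a rough equivalent, the Python sketch below walks the ListRecords pages by following each resumptionToken and stops with an error on the first page that is not well-formed XML, which is how the truncated pages showed up here. The metadata prefix default, the 120-second timeout and the use of the thread's endpoint in the `__main__` block are assumptions; only the standard library is used.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Tuple

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def http_fetch(url: str) -> bytes:
    """Fetch one page; the 120-second timeout is an arbitrary choice."""
    with urllib.request.urlopen(url, timeout=120) as resp:
        return resp.read()


def harvest(
    base_url: str,
    fetch: Callable[[str], bytes] = http_fetch,
    metadata_prefix: str = "oai_dc",
) -> Iterator[Tuple[int, str, int]]:
    """Yield (page number, URL, record count) for each ListRecords page.

    Raises RuntimeError on the first page that is not well-formed XML.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    page = 1
    while True:
        url = f"{base_url}?{urllib.parse.urlencode(params)}"
        body = fetch(url)
        try:
            root = ET.fromstring(body)
        except ET.ParseError as exc:
            raise RuntimeError(f"page {page} is not well-formed XML: {url}") from exc
        yield page, url, len(root.findall(f".//{OAI_NS}record"))
        token = root.find(f".//{OAI_NS}resumptionToken")
        # OAI-PMH marks the last page with a missing or empty resumptionToken.
        if token is None or not (token.text or "").strip():
            return
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
        page += 1


if __name__ == "__main__":
    for page, url, count in harvest("https://search.nbca.unbc.ca/;oai"):
        print(f"page {page}: {count} records")
```

Passing `fetch` as a parameter keeps the paging logic testable without a live server, and the URL in the error message is the one to open in a browser (or hand to the identifier-extraction step) when a page breaks.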

One thing I noticed that might be relevant, or maybe a red herring, is that the slugs in the database for the items I deleted did not exactly match the URLs of the corresponding items. For example, the db entry for one description had 'personal-records-6', but the URL ended with 'personal-records-7'. Another one was 'lakehead-university' in the db, but the URL ended with 'lakehead-university-1'.
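One way to follow up on that mismatch is to list slugs that differ only by a trailing "-N" counter, as 'personal-records-6' / 'personal-records-7' and 'lakehead-university' / 'lakehead-university-1' do here. The Python sketch below groups a list of slug values (for example, exported from the slug table's slug column) that way. It assumes numeric suffixes are de-duplication counters, so slugs that legitimately end in a number, or repeated titles that are genuinely separate descriptions, will also be grouped; the output is only a shortlist to check by hand.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# A trailing "-<digits>" suffix, e.g. the "-7" in "personal-records-7".
SUFFIX_RE = re.compile(r"-\d+$")


def near_duplicate_slugs(slugs: Iterable[str]) -> Dict[str, List[str]]:
    """Group slugs by their base (suffix removed); keep only groups of 2+."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for slug in slugs:
        groups[SUFFIX_RE.sub("", slug)].append(slug)
    return {base: sorted(members) for base, members in groups.items() if len(members) > 1}
```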

I hope nobody else runs into this problem, but if anyone does, I hope this helps. Now I just have to hope the problem doesn't return when the archivist re-enters the descriptions I deleted.

David

lib-s...@unbc.ca

Jan 25, 2018, 7:30:25 PM
to AtoM Users
Sorry, one more quick thing. Before the archivist re-created any of the records, I decided I should rebuild the search index and clear the cache. I did that, and the four deleted descriptions came back, except with slightly different OAI identifier numbers. I re-ran the OAI scrape with my PowerShell script, and it succeeded and included the four "deleted" descriptions.

I'm happy now, everything's there and working, but I am curious how/why those deleted records came back. I definitely checked that they were gone from the database with SQL queries after deleting them with php symfony tools:delete-description <slug>, but somehow they came back on their own. Very weird.

David

Dan Gillean

Jan 26, 2018, 10:21:51 AM
to ICA-AtoM Users
Hi David, 

Very interesting! I am not quite sure what to make of this myself. It does sound like it was data corruption preventing you from proceeding. The fact that the slugs you found didn't exactly match the slugs you were seeing in the browser, combined with the fact that deleting the near matches resulted in the original records returning, suggests to me that the particular form of data corruption at play here was some kind of incomplete duplicates in the object table. I'm not sure exactly how or why these incomplete rows were created, but I'm glad that you were able to track them down and delete them successfully. Thank you for updating the thread and sharing your process! 

Data corruption can happen a number of ways, but most often when a long-running process times out mid-task, aborting it while in an incomplete state. Perhaps a previous operation (an import, or a move, or something?) had left these ghost records - it's also possible they had been there for a while without causing issues until now. In any case, I'm happy you've worked it out. 

Regards, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory
