API skip parameter fails at 10000 value?


Paul Collins

May 12, 2021, 12:50:02 PM
to AtoM Users
Hi. I am performing a BrowseInformationObjects API call that returns very many results, and thus the 'skip' parameter is used to start a new offset for each successive batch.

I am wondering why I suddenly get a 400 Bad Request response for this API operation, when it works for every skip value lower than 10000:


(This happened on two attempted runs, which suggests that there is something special about the 10000 value.)

I may be able to obtain the data another way (by connecting to the command line and running a MySQL query), but it is less convenient because I will have to study the database schema. We also have an automated system that processes all updated records since a specified date, and (while it's unlikely) after a sudden large data import we might conceivably hit this limit.

The docs do not appear to mention any limit on the use of skip:

Paul C.

Dan Gillean

May 13, 2021, 12:06:49 PM
to ICA-AtoM Users
Hi Paul, 

Interesting discovery. I suspect that what you are in fact encountering is Elasticsearch's default pagination limit, which is set at 10,000. This was introduced in Elasticsearch 2, and as we learned, can also affect large search and browse result sets, per this older issue that describes the problem a bit more: 
The consequences of this for AtoM users as described on the issue ticket: 

This means that for users with 10,000+ records in an AtoM installation, trying to navigate to any page above 1,000 in the search/browse results (when the results per page setting is set to the default value of 10) will lead to an Elasticsearch error.
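
The arithmetic behind that limit can be sketched as follows. This is an illustration only, not AtoM or Elasticsearch code: it assumes the Elasticsearch rule that a request is rejected once the offset plus the page size exceeds the result window (10,000 by default), and the function names are made up for the example.

```python
# Sketch of the result-window arithmetic. All names here are illustrative.
RESULT_WINDOW = 10_000  # Elasticsearch's default pagination limit


def last_reachable_page(per_page: int, window: int = RESULT_WINDOW) -> int:
    """Highest page number whose results fit entirely inside the window."""
    return window // per_page


def within_window(skip: int, limit: int, window: int = RESULT_WINDOW) -> bool:
    """True if a request for `limit` results starting at offset `skip` fits."""
    return skip + limit <= window


print(last_reachable_page(10))     # 1000
print(within_window(9_990, 10))    # True
print(within_window(10_000, 10))   # False
```

With 10 results per page this gives the 1,000-page ceiling quoted above, and it is consistent with an API call failing as soon as skip reaches 10000.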
 
We wanted to avoid increasing this value as the new default, because doing so also increases the amount of memory required for AtoM to run smoothly. Consequently, in AtoM 2.5 and later, the workaround we provided at the time was to add the ability to change the sort direction of results from ascending to descending, and vice versa. That way, if a user was looking for results past the first 1,000 pages of results, they could reverse the sort direction and see the remaining results from the other side (i.e. as the first results in the new sort order, instead of the last). See: 
Nevertheless, this was still a workaround, and may not meet the needs of larger institutions. In the upcoming 2.7 release, we've therefore made it easier to change this value via configuration file, as described here: 
Previously this value was hardcoded in a few places, making it more difficult to change in earlier versions. If you have a development environment, you could possibly try applying the relevant pull request as a patch, but be sure to back up any data first! 
Keep in mind that you should also only make this change if you have a good amount of RAM assigned to your installation - increasing this pagination value will increase the cost of queries! 

Regards, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory
he / him



Paul Collins

May 25, 2021, 2:03:23 PM
to AtoM Users
Thanks Dan.

The ascending/descending thing to get 20K results instead of 10K is obviously a hack :)
In this case I discovered a way to pull the data out of the AtoM site as a CSV download (a "job") instead of using the API.
I would, though, like to suggest that you mention the 10K limitation in your documentation, as it might trip somebody up if they are not aware.

Dan Gillean

May 26, 2021, 9:08:20 AM
to ICA-AtoM Users
Hi Paul, 

Yes, I'll be revising the docs to add information about the new configurable setting, so I'll be sure to include a warning about this in both Search/browse and API documentation, with a link to the relevant config file information. Thanks for the feedback! 


Ailish McCarthy

Mar 15, 2022, 11:04:53 AM
to AtoM Users
Hi Paul - would you be interested in sharing more info about the job you created to download the CSV? I'm having a similar issue on my side. Thanks :)

Paul Collins

Mar 15, 2022, 11:57:06 AM
to AtoM Users
Hi Ailish,

You can probably just create a one-off CSV download "job", as documented here:
https://www.accesstomemory.org/en/docs/2.6/user-manual/administer/manage-jobs/

Aside from that: our internal tool that pulls AtoM data, to incorporate it into our general Web site search, operates something like this:

1. Initially, fetch all the "BIORs" (BrowseInformationObject: basic details of a record).
2. Every night thereafter, process a subset of the BIORs (we do a few thousand per night): for each one, retrieve its "RIOR" (ReadInformationObject: full details of a record), and store it against the BIOR locally.
3. When every BIOR has a RIOR, we've finished the batch, and can rewrite our own Web search index with the latest AtoM data. The next night, start again from step 1.

The reason for this rather download-intensive process is that I'm not aware of a way for AtoM to notify us about records being deleted. Otherwise our search would gradually accumulate a lot of records that no longer actually exist in AtoM.
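
The three steps above can be sketched roughly as below. This is a minimal sketch, not our actual tool: `browse_all` and `read_one` are hypothetical callables standing in for the BrowseInformationObjects and ReadInformationObject calls, and keying records by a "slug" field is an assumption for the example.

```python
from typing import Callable, Dict, Iterable, Optional

Batch = Dict[str, Optional[dict]]


def start_batch(browse_all: Callable[[], Iterable[dict]]) -> Batch:
    """Step 1: note every record from the browse results, with no details yet."""
    # Assumption: each browse result carries a unique "slug" key.
    return {bior["slug"]: None for bior in browse_all()}


def nightly_step(batch: Batch, read_one: Callable[[str], dict], per_night: int) -> bool:
    """Step 2: fetch full details for up to `per_night` pending records.

    Returns True once every record has its details (step 3), meaning the
    local search index can be rewritten from `batch`.
    """
    pending = [slug for slug, rior in batch.items() if rior is None]
    for slug in pending[:per_night]:
        batch[slug] = read_one(slug)
    return all(rior is not None for rior in batch.values())
```

Because each batch starts from a fresh browse, a record deleted in AtoM simply never appears in the next batch, so it drops out when the index is rewritten.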

Paul C.

Ailish McCarthy

Mar 22, 2022, 7:13:23 AM
to AtoM Users

Thanks so much for the response @Paul! I'm going to look into this.

Ailish McCarthy

Apr 5, 2022, 7:49:42 AM
to AtoM Users
Hi Dan - I can't find anything in the documentation that explains how to reverse the sort order via the API? Can you point me in the right direction?

Thanks 
Ailish

Dan Gillean

Apr 5, 2022, 8:56:25 AM
to ICA-AtoM Users
Hi Ailish, 

I haven't tested this yet, but I'm pretty sure that the API uses most of the exact same URI parameters as those used when searching and browsing in the user interface, so try adding sortDir=desc or sortDir=asc to affect the search order?

Let me know if it works! 

Cheers, 


Ailish McCarthy

Apr 5, 2022, 3:32:32 PM
to AtoM Users
Thanks Dan - I was looking for the particular query string, but that seems to give me the same results regardless of the direction I pass in. I've also tried different variations of &direction= and &sortDirection= with asc/ascending/desc/descending. Any other ideas?

Dan Gillean

Apr 6, 2022, 9:04:17 AM
to ICA-AtoM Users
Hi Ailish, 

One of our developers pointed me to this block of code:
So, it looks like you can use sort= to set the sort method used. Per lines 83-86, the default sort order is by last modified, in descending order, but other sort options available include:
  • alphabetic - alphabetic by title
  • identifier - sort by reference code
  • date - sort by the start date of the first date of creation added
Additionally, you can try using reverse=1 to flip the current sort direction, per lines 89-92. From what I can tell, so long as reverse is set, it should work - however it's not clear what if anything it's expecting other than being set, so if reverse=1 doesn't work, try reverse=true or similar. We use 0 and 1 for false/true in other URI parameters so that's my first suggestion. 

Let us know if that works! 


Ailish McCarthy

unread,
Apr 6, 2022, 9:46:09 AM4/6/22
to AtoM Users
That's it! reverse with 1/0 is the ticket - cheers Dan and team :) 
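
For anyone landing here later, the two-pass approach (forward, then reverse=1) can be sketched like this. It is a rough sketch under assumptions, not a tested client: `skip`, `sort` and `reverse` come from this thread, while the `/api/informationobjects` path, the `limit` parameter and the example host are assumptions to check against the API docs for your version. It also assumes the sort order is stable and that no records change between the two passes, otherwise records could be missed or duplicated.

```python
from urllib.parse import urlencode

WINDOW = 10_000  # default Elasticsearch result window


def plan_requests(total, batch, window=WINDOW):
    """Return (skip, limit, reverse) tuples covering up to 2 * window records."""
    if total > 2 * window:
        raise ValueError("more records than two passes can reach")
    forward = min(total, window)   # records read in the normal direction
    backward = total - forward     # remainder read from the other end
    plan = []
    for count, reverse in ((forward, 0), (backward, 1)):
        for skip in range(0, count, batch):
            plan.append((skip, min(batch, count - skip), reverse))
    return plan


def browse_url(base, skip, limit, reverse, sort="identifier"):
    """Build a browse URL; the endpoint path and `limit` are assumptions."""
    query = urlencode({"sort": sort, "reverse": reverse, "skip": skip, "limit": limit})
    return f"{base}/api/informationobjects?{query}"


# Small example with a window of 20 so the plan is easy to read.
for skip, limit, reverse in plan_requests(25, 10, window=20):
    print(browse_url("https://atom.example.org", skip, limit, reverse))
```

Every planned request keeps skip plus limit inside the window, so neither pass should trigger the 400 response. Beyond twice the window, raising the limit in the configuration or using a CSV export job remain the options mentioned earlier in the thread.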