API skip parameter fails at 10000 value?


Paul Collins

May 12, 2021, 12:50:02 PM
to AtoM Users
Hi. I am performing a BrowseInformationObjects API call that returns very many results, and thus the 'skip' parameter is used to start a new offset for each successive batch.
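For readers following along, the batching pattern described above can be sketched roughly as follows. This is a minimal sketch, not AtoM client code: `fetch_page` is a hypothetical callable that would wrap the actual API request and return the list of results for a given `skip` and `limit`; how it talks to the server is left out because the thread does not show the request.

```python
from typing import Callable, Iterator, List


def iter_batches(
    fetch_page: Callable[[int, int], List[dict]],
    limit: int = 10,
) -> Iterator[List[dict]]:
    """Yield successive batches, advancing `skip` by the size of each batch.

    `fetch_page(skip, limit)` is a hypothetical helper (an assumption, not
    part of any library) that returns one batch of results as a list.
    """
    skip = 0
    while True:
        batch = fetch_page(skip, limit)
        if not batch:
            # An empty batch means there is nothing left to read.
            break
        yield batch
        skip += len(batch)
```

Passing the fetch function in as a parameter keeps the offset logic separate from the HTTP details, so the loop can be exercised against an in-memory list before pointing it at a live site.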

I am wondering why I suddenly get a 400 Bad Request response for this API operation, when it works for every skip value lower than 10000:


(This happened on two attempted runs, which suggests that there is something special about the 10000 value.)

I may be able to obtain the data another way (by connecting to the command line and running a MySQL query), but it is less convenient because I will have to study the database schema. We also have an automated system that processes all updated records since a specified date, and (while it's unlikely) after a sudden large data import we might conceivably hit this limit.

The docs do not appear to mention any limit on the use of skip:

Paul C.

Dan Gillean

May 13, 2021, 12:06:49 PM
to ICA-AtoM Users
Hi Paul, 

Interesting discovery. I suspect that what you are in fact encountering is Elasticsearch's default pagination limit, which is set at 10,000. This was introduced in Elasticsearch 2, and as we learned, can also affect large search and browse result sets, per this older issue that describes the problem a bit more: 
The consequences of this for AtoM users as described on the issue ticket: 

This means that for users with 10,000+ records in an AtoM installation, trying to navigate to any page above 1,000 in the search/browse results (when the results per page setting is set to the default value of 10) will lead to an Elasticsearch error.
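The arithmetic behind the "page 1,000" figure can be checked with a short sketch. It assumes that Elasticsearch rejects a request once the offset plus the page size exceeds the window (10,000 by default, as cited above); the function names are illustrative only.

```python
MAX_RESULT_WINDOW = 10_000  # default pagination limit cited in this thread


def last_reachable_page(per_page: int, window: int = MAX_RESULT_WINDOW) -> int:
    """Highest page number whose results all fall inside the window."""
    return window // per_page


def fits_in_window(skip: int, limit: int, window: int = MAX_RESULT_WINDOW) -> bool:
    """True if a request with this offset and size stays within the window.

    Assumption: the request is rejected when skip + limit > window.
    """
    return skip + limit <= window
```

With 10 results per page, `last_reachable_page(10)` gives 1000, and a request with a skip of 10000 falls outside the window for any positive limit, which is consistent with the failure reported at the start of the thread.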
 
We wanted to avoid increasing this value as the new default, because doing so also increases the amount of memory required for AtoM to run smoothly. Consequently, in AtoM 2.5 and later, the workaround we provided at the time was to add the ability to change the sort direction of results from ascending to descending, and vice versa. That way, if a user was looking for results past the first 1,000 pages of results, they could reverse the sort direction and see the remaining results from the other side (i.e. as the first results in the new sort order, instead of the last). See: 
Nevertheless, this was still a workaround, and may not meet the needs of larger institutions. In the upcoming 2.7 release, we've therefore made it easier to change this value via configuration file, as described here: 
Previously this value was hardcoded in a few places, making it more difficult to change in earlier versions. If you have a development environment, you could possibly try applying the relevant pull request as a patch, but be sure to back up any data first! 
Keep in mind that you should also only make this change if you have a good amount of RAM assigned to your installation - increasing this pagination value will increase the cost of queries! 
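As an illustration of the sort-reversal workaround mentioned above, here is a minimal sketch of how a position in the ascending result order could be mapped to a sort direction and an offset that stays inside the window. The function name, the `"asc"`/`"desc"` labels, and the 0-based positions are assumptions made for this example; they are not AtoM parameters.

```python
from typing import Optional, Tuple


def plan_request(
    position: int, total: int, window: int = 10_000
) -> Optional[Tuple[str, int]]:
    """Return (direction, offset) for reaching a 0-based position, or None.

    Positions inside the first `window` results are read in ascending order;
    positions inside the last `window` results are read in descending order.
    Anything in between cannot be reached by reversing the sort alone.
    """
    if not 0 <= position < total:
        raise ValueError("position must be within the result set")
    if position < window:
        return ("asc", position)
    from_end = total - 1 - position
    if from_end < window:
        return ("desc", from_end)
    return None
```

This makes the limitation of the workaround explicit: it covers at most twice the window size, so result sets larger than that still have an unreachable middle section.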

Regards, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory
he / him



Paul Collins

May 25, 2021, 2:03:23 PM
to AtoM Users
Thanks Dan.

The ascending/descending thing to get 20K results instead of 10K is obviously a hack :)
In this case I discovered a way to pull the data out of the AtoM site as a CSV download (a "job") instead of using the API.
I would, though, like to suggest that you mention the 10K limitation in your documentation, as it might trip somebody up if they are not aware.

Dan Gillean

May 26, 2021, 9:08:20 AM
to ICA-AtoM Users
Hi Paul, 

Yes, I'll be revising the docs to add information about the new configurable setting, so I'll be sure to include a warning about this in both Search/browse and API documentation, with a link to the relevant config file information. Thanks for the feedback! 

