Internal Server error since Walden launch


Ivo Bleylevens

Nov 7, 2025, 3:03:58 AM
to OpenAlex Community
Dear all,

I sent this message to the OpenAlex support team before, because I was already seeing this problem when Walden was still in beta (I was using data-version=2 back then; the problem did not occur without data-version=2). But now that Walden has launched, I still see the problem below.

Why am I getting an internal server error when I try to fetch the next page using the cursor of this URL?
The current (successful) page is:
https://api.openalex.org/works?mailto=[EMAIL]&filter=authorships.institutions.lineage:i2800191616|i34352273|i4210161616|i4210132221|i209991678|i2802557195|i183941352|f4320321016|f4320322712|f4320335608|f4320335549&per_page=100&cursor=IlsnLUluZmluaXR5JywgMTAsICdodHRwczovL29wZW5hbGV4Lm9yZy9XMjAwODgyMTc2NyddIg==

Then, for me, the next cursor is IlsnLUluZmluaXR5JywgOSwgJ2h0dHBzOi8vb3BlbmFsZXgub3JnL1cyMTA5NjExOTkxJ10i and the URL becomes:
https://api.openalex.org/works?mailto=[EMAIL]&filter=authorships.institutions.lineage:i2800191616|i34352273|i4210161616|i4210132221|i209991678|i2802557195|i183941352|f4320321016|f4320322712|f4320335608|f4320335549&per_page=100&cursor=IlsnLUluZmluaXR5JywgOSwgJ2h0dHBzOi8vb3BlbmFsZXgub3JnL1cyMTA5NjExOTkxJ10i

but this yields an Internal Server Error:

Screenshot 2025-11-07 090207.jpg

This happens after downloading the first through the 887th page successfully; the 888th page gives this error.

I am using exactly the same script before and after the launch of Walden, but now I get this error.

Does anyone have a solution for this? At the moment I cannot download the whole dataset I need.

Kind regards,
Ivo

Ivo Bleylevens

Nov 7, 2025, 3:05:24 AM
to OpenAlex Community
Oh, and be aware that the links above only work temporarily. The cursors change, and the page where this occurs also changes. On October 14th this error occurred when going from page 596 to 597, with different cursors of course.

On Friday, November 7, 2025 at 09:03:58 UTC+1, Ivo Bleylevens wrote:

Samuel Mok

Nov 7, 2025, 4:18:22 AM
to Ivo Bleylevens, OpenAlex Community
I'm pretty certain this is an issue that has cropped up before on the mailing list: the response is larger than the limit (IIRC 5 MB max?), so the API returns an error. Normally you can fix this by limiting the number of results per page to reduce the response size.
If you cut the number of items returned per query to 50, it works with the new cursor (60 and above still break!). However, the next cursor breaks even with only 10 items per page; there the limit is 7, so it seems the next item is the culprit! If we try to grab that item by itself by setting the page size to 1, it still breaks, however... so either this is a broken record, or it's over the limit all by itself. Let's figure that out by reducing the number of fields the API returns.

If we let the API return only the OpenAlex ID, it works! We find the work with ID https://openalex.org/W1993957098. The front-end also seems unable to load it (since it probably uses the same API as a backend, that makes sense!). If we also grab the DOI, we can see it's this article that's breaking the API for some reason:

R. Stockbrügger, G. Coremans, F. Creed, M. Dapoigny, S.A. Müller-Lissner, F. Pace, A. Smout, P.J. Whorwell; Psychosocial Background and Intervention in the Irritable Bowel Syndrome. Digestion 1 April 1999; 60 (2): 175–186. https://doi.org/10.1159/000007644

If you search for the title in the frontend it finds it immediately, but opening the query also breaks the frontend. 


Anyway, we now have a new cursor, so we can skip this item. Let's remove the 'select' parameter, return the number of items per page to 1, and hey presto, we can continue!
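For anyone who wants to script that probing step: here is a minimal sketch of building such a probe URL. The helper name `probe_url` is mine; `filter`, `per-page`, `cursor`, and `select` are the OpenAlex query parameters used above:

```python
from urllib.parse import urlencode

BASE = "https://api.openalex.org/works"

def probe_url(filter_expr, cursor, per_page=1, select="id"):
    """Build a minimal 'probe' request: one item per page and only the id
    field, so an oversized or broken record can be located without tripping
    the response-size limit."""
    params = {
        "filter": filter_expr,
        "per-page": per_page,
        "cursor": cursor,
        "select": select,
    }
    return f"{BASE}?{urlencode(params)}"

# Example probe against the cursor that failed above:
print(probe_url("authorships.institutions.lineage:i2800191616",
                "IlsnLUluZmluaXR5JywgOSwgJ2h0dHBzOi8vb3BlbmFsZXgub3JnL1cyMTA5NjExOTkxJ10i"))
```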

I added a simple retry loop to my retrieval scripts to automate this process when I encounter errors. The basic version solves almost all cases (retry the query with the page size cut in half each time until it works); until now, I had never encountered a *single* item that breaks the API limit by itself!

Cheers,
Samuel

--
You received this message because you are subscribed to the Google Groups "OpenAlex Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openalex-commun...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/openalex-community/933edd98-7da9-4bca-a1b8-bb65ab74a343n%40googlegroups.com.

Ivo Bleylevens

Nov 7, 2025, 4:39:59 AM
to OpenAlex Community
Hi Samuel,
this is golden !! Thanks for looking into it.

But that also makes me realize I had misunderstood the next-page token. I thought the token gave me a link to the next page with 100 items, but instead the token points to the next item, no matter what the page size is? So your approach is (I think there is a typo in your last hyperlink? 1 instead of 100, that confused me a bit): whenever an error occurs, use the next token with a page size of 1, which can move me a step forward if the response is too massive; and when it still breaks with a page size of 1, retrieve only the OpenAlex ID to fetch just the next token, and from there continue with larger page sizes if they are successful?

Thanks again !


On Friday, November 7, 2025 at 10:18:22 UTC+1, sam...@gmail.com wrote:

Samuel Mok

Nov 7, 2025, 11:28:46 AM
to Ivo Bleylevens, OpenAlex Community
Yes, exactly right: the token basically points to the starting point, and the API then parses the rest of the query to determine the full contents of the response. Otherwise the cursor would need to be a lot longer to encode all the possible settings! And you understood my heuristic almost perfectly: when I encounter an error, I try again with the page size halved; if that fails, halve it again, and so on. If I get to a page size of 1 and it still fails, I try with only the ID. If that also fails, we abort; otherwise we can continue!

If we immediately used page size 1 instead of halving, fixing the problem would take too much time, since it's possible that the 99th item is the one causing issues: that would mean 99 failed requests and retries before catching the offender.
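For reference, the halving heuristic can be sketched like this. `fetch` here is an injected stand-in for the real HTTP call (it should return the parsed page, or raise on a server error), not part of any library:

```python
def fetch_with_backoff(fetch, cursor, per_page=100):
    """Retry a failing page with the page size halved each time; at size 1,
    fall back to requesting only the OpenAlex id; give up if even that fails.
    `fetch(cursor, per_page, select)` is a caller-supplied stand-in for the
    actual HTTP request."""
    size = per_page
    while True:
        try:
            return fetch(cursor, size, None)
        except RuntimeError:
            if size == 1:
                break  # even a single full record fails: too big or broken
            size = max(1, size // 2)
    try:
        # Last resort: id-only probe, just to recover the next cursor.
        return fetch(cursor, 1, "id")
    except RuntimeError:
        return None  # genuinely broken record; the caller must skip it
```

With a page size of 100 and an offending response around the 7-item mark, this takes the path 100 → 50 → 25 → 12 → 6 and succeeds on the fifth attempt, rather than walking forward one item at a time.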

Cheers,
Samuel 


Ed Summers

Nov 10, 2025, 7:37:59 AM
to Samuel Mok, Ivo Bleylevens, OpenAlex Community
Thanks for the discussion about this continuing issue. I ran into it again recently too. Here’s the problematic record I found:

https://openalex.org/works/W3200281942

Presumably these are the result of record editing by some kind of backend automated process, and not via the API itself?

In case it’s helpful I created a Python implementation of Samuel’s heuristic for identifying problematic records:

https://gist.github.com/edsu/3a30bb66cf15165a5e7078d22c7a3082

You give it the API URL that is causing the 500 error, and it will print out the problematic record ID (or IDs) and the total records found.

```
$ python3 openalexbug.py "https://api.openalex.org/works?filter=author.id:https://openalex.org/A5003671931&cursor=&per-page=200"

problem record: https://openalex.org/W3200281942
121 records
```

If you have uv installed you can run it right from the gist URL like this:

```
$ uv run https://gist.github.com/edsu/3a30bb66cf15165a5e7078d22c7a3082/raw/1656107df08ed82ab37652d1dfc56b812e380f7d/openalexbug.py "https://api.openalex.org/works?filter=author.id:https://openalex.org/A5003671931&cursor=&per-page=200"
```

//Ed
> This happens when downloading the first till the 887th page successfully but the 888th page gives this error.
>
> I am using exactly the same script before and after the launch of Walden but now I get this error.
>
> Anyone has a solution for this because at this moment I cannot download my whole needed dataset.
>
> Kind regards,
> Ivo

Samuel Mok

Nov 10, 2025, 8:01:55 AM
to Ed Summers, Ivo Bleylevens, OpenAlex Community
That's great, thanks Ed! Users should note, though, that the script doesn't include a rate limiter or an email for the polite pool; if you use it to retrieve a large number of items, it could report false positives for troublesome OpenAlex items because of rate-limiting errors rather than the API size limit.

Cheers,
Samuel 

Ed Summers

Nov 10, 2025, 8:05:39 AM
to Samuel Mok, Ivo Bleylevens, OpenAlex Community

> On Nov 10, 2025, at 8:01 AM, Samuel Mok <sam...@gmail.com> wrote:
>
> That's great, thanks Ed! Although users should note that that script doesn't include a rate limiter, nor an email to use the polite pool; so if you're using it to retrieve a large amount of items it could give false positives for troublesome openalex items because of rate limiting errors instead of the api size limit.

I added a sleep which should help. Of course you can easily add an email address if you want. But please don’t use mine :-)

Samuel, do you have any sense of how to fix the underlying cause of these errors?

Casey Meyer

Nov 10, 2025, 8:35:59 AM
to Ed Summers, Samuel Mok, Ivo Bleylevens, OpenAlex Community
Fantastic troubleshooting, Samuel, and a good discussion on how to avoid these errors. I looked into the problem IDs and found they had malformed abstract inverted indexes. So I added some exception handling that will return a null abstract_inverted_index instead of an error page. We also have a ticket to look into why these are malformed and fix that. But for now this should reduce some of the errors you are seeing. Let us know if you find more, and sorry for the issues.
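For client code that reconstructs abstracts, it's worth guarding for the now-null field. A minimal sketch, assuming the usual inverted-index shape (each word mapped to a list of its token positions):

```python
def reconstruct_abstract(work):
    """Rebuild the plain-text abstract from an OpenAlex work record's
    abstract_inverted_index. Since the server-side fix, malformed indexes
    come back as null, so return None in that case instead of crashing."""
    inv = work.get("abstract_inverted_index")
    if not inv:
        return None  # missing or nulled-out index
    positions = {}
    for word, idxs in inv.items():
        for i in idxs:
            positions[i] = word
    # Join the words back in token order.
    return " ".join(positions[i] for i in sorted(positions))
```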

Casey



--
Casey Meyer, CTO
OurResearch
We build tools to make scholarly research more open, connected, and reusable—for everyone.

Ed Summers

Nov 10, 2025, 8:40:36 AM
to Casey Meyer, Samuel Mok, Ivo Bleylevens, OpenAlex Community

> On Nov 10, 2025, at 8:35 AM, Casey Meyer <ca...@ourresearch.org> wrote:
>
> I looked into the problem IDs and found they had malformed abstract inverted indexes. So I added some exception handling that will return null abstract_inverted_index instead of an error page. We also have a ticket to look into why these are malformed and fix that. But for now this should reduce some of the errors you are seeing. Let us know if you find more and sorry for the issues.

Many thanks for the quick fix!

//Ed

Ivo Bleylevens

Nov 10, 2025, 9:50:00 AM
to OpenAlex Community

Thanks all! And great, Casey, if it is solved (then I don't have to implement a fix)! Or do we have to wait for some release?
I started my job... so I'll know tonight whether the bug still exists.

Bye !

On Monday, November 10, 2025 at 14:40:36 UTC+1, e...@pobox.com wrote:

Casey Meyer

Nov 10, 2025, 6:13:07 PM
to Ivo Bleylevens, OpenAlex Community
Hi Ivo. You don't have to wait for a release or anything. It should be fixed already. Yes, let us know if you're still seeing the bug!

Casey
