Walden author IDs in works no longer match author IDs in authors


James Iremonger

Nov 14, 2025, 11:25:32 AM
to OpenAlex Community
Hello,

We are using your new Walden snapshot, which we have imported into BigQuery. However, since the Walden update we have found that the author IDs nested within the works objects do not match the author IDs in the authors objects.

Previously, when we joined works onto authors, 98.9% of the author IDs in works matched an author ID in authors. Since Walden, only 30% of the author IDs in works are found in authors.
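A rough way to reproduce this match-rate check outside BigQuery is sketched below. It is a minimal Python sketch, assuming each work follows the OpenAlex works schema with author IDs under `authorships[].author.id`, and that the IDs from the authors table have already been loaded into a set; `author_match_rate` is a hypothetical helper, not part of any OpenAlex tooling. It counts authorship references rather than distinct IDs, so it approximates the row-level match rate of a JOIN.

```python
from collections.abc import Iterable, Mapping
from typing import Any


def author_match_rate(works: Iterable[Mapping[str, Any]], author_ids: set[str]) -> float:
    """Fraction of authorship references in `works` whose author ID is in `author_ids`.

    Assumes the OpenAlex works layout: work["authorships"][i]["author"]["id"].
    Authorships without an author ID are skipped, not counted as misses.
    """
    referenced = 0
    matched = 0
    for work in works:
        for authorship in work.get("authorships") or []:
            author_id = (authorship.get("author") or {}).get("id")
            if not author_id:
                continue
            referenced += 1
            if author_id in author_ids:
                matched += 1
    # Avoid dividing by zero when no work carries any author ID.
    return matched / referenced if referenced else 0.0


# Usage with toy data: one of the two referenced IDs exists in authors.
works = [{
    "id": "https://openalex.org/W1",
    "authorships": [
        {"author": {"id": "https://openalex.org/A5000000001"}},
        {"author": {"id": "https://openalex.org/A2170486304"}},
    ],
}]
authors = {"https://openalex.org/A5000000001"}
print(author_match_rate(works, authors))  # 0.5
```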

To check that it’s not an issue with our pipeline, I downloaded the most recent file, works/updated_date=2025-11-11/part_0000.gz. The third works entry shows the issue, but the same is true for the majority of the works IDs.

works id = https://openalex.org/W4401252543
Author IDs from the works snapshot (.gz):
https://openalex.org/A2170486304
https://openalex.org/A2613686676
https://openalex.org/A815211345
https://openalex.org/A3093809309
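Pulling the author IDs for a single entry, as was done with the third record above, can be scripted roughly as follows. This is a minimal sketch, assuming the snapshot part file is gzip-compressed JSON Lines with one work per line and the same `authorships[].author.id` layout; `author_ids_for_record` is a hypothetical helper, and the local path is simply wherever the download was saved.

```python
import gzip
import json
from itertools import islice


def author_ids_for_record(path: str, index: int) -> tuple[str, list[str]]:
    """Return (work_id, author_ids) for the record at zero-based `index`.

    Assumes a gzip-compressed JSON Lines file, one OpenAlex work per line.
    """
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        # Skip ahead to the requested line without loading the whole file.
        line = next(islice(fh, index, None), None)
    if line is None:
        raise IndexError(f"{path} has fewer than {index + 1} records")
    work = json.loads(line)
    author_ids = [
        authorship["author"]["id"]
        for authorship in work.get("authorships") or []
        if (authorship.get("author") or {}).get("id")
    ]
    return work["id"], author_ids


# The third entry is zero-based index 2; the local filename is an assumption.
# work_id, ids = author_ids_for_record("part_0000.gz", 2)
```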

In the browser, publication https://openalex.org/W4401252543 lists the 4 authors, and their links take you straight to their author pages with no issues:
Neyran ALTINKAYA, Erdoğan Kavlak, Fatma Eser Özgencil, Soner Çağatay

The first author ID from the dump does take you to Erdoğan Kavlak’s page, https://openalex.org/A2170486304. But the remaining 3 author links go nowhere: each opens an OpenAlex page that shows the loading animation, but no details ever appear.

Tijmen Altena

Nov 17, 2025, 6:08:54 AM
to OpenAlex Community

Hi James,

Walden has re-introduced a number of author IDs < A5000000000. They are, I believe, old and deprecated IDs that should not be in Walden. I think there's a fix pending here:
https://github.com/ourresearch/openalex-walden/commit/2691055c0adff1237bfa182ed89b64a90a0fc9d6
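The "< A5000000000" pattern described here suggests a quick screen for suspect rows while waiting for the fix. The sketch below assumes that cutoff, which the post itself offers as a belief rather than a confirmed rule, and flags any author ID whose numeric part falls under it; `is_legacy_author_id` and `LEGACY_CUTOFF` are hypothetical names.

```python
import re

# Assumed threshold from this thread: author IDs below A5000000000 are
# believed (not confirmed) to be old, deprecated IDs.
LEGACY_CUTOFF = 5_000_000_000
_AUTHOR_ID = re.compile(r"A(\d+)$")


def is_legacy_author_id(author_id: str) -> bool:
    """True if the numeric part of an OpenAlex author ID is below LEGACY_CUTOFF.

    Accepts either the full URL form or the bare "A123..." form.
    """
    match = _AUTHOR_ID.search(author_id)
    if match is None:
        raise ValueError(f"not an author ID: {author_id!r}")
    return int(match.group(1)) < LEGACY_CUTOFF


ids = [
    "https://openalex.org/A2170486304",
    "https://openalex.org/A2613686676",
    "https://openalex.org/A815211345",
    "https://openalex.org/A3093809309",
]
print([is_legacy_author_id(i) for i in ids])  # [True, True, True, True]
```

All four IDs from the first post fall below the cutoff, including A2170486304, which still resolved in the browser, so this flags rows worth checking rather than proving an ID is dead.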

But I'm not sure when that's fully through the pipe and whether that commit captures all of the problems. We're not doing anything with Walden until these entries are out of the dump, and I do hope the team keeps maintaining the first version of the data until that time. 

Best,

Tijmen



On Friday, 14 November 2025 at 17:25:32 UTC+1, ja...@visfo.health wrote: