⚠️ Errors and inconsistencies in OpenAlex API documentation and API responses


Samuel Mok

Nov 15, 2025, 5:26:48 AM
to OpenAlex Community

TL;DR: there are quite a few undocumented fields returned by the API, and fields whose structure differs from the docs. I made a quick notebook with Python dataclasses to test these issues, which you can run yourself from your browser here as an app (as shown in the screenshot), or here as a notebook with editable source code.
Screenshot 2025-11-15 112111.png
Now the actual message:

While developing a Python client for the OpenAlex API, I have encountered several discrepancies between the official documentation and the actual API responses. I am documenting these issues here in the hope that they can be addressed to improve the API's reliability and usability.

The python file defining these dataclasses can be found in my github repo: https://github.com/utsmok/aletheca/blob/main/src/aletheca/entities.py. In short, I created a lot of nested dataclasses to represent the various entities and objects in the OpenAlex API, and used the dacite library to parse API JSON responses into these dataclasses, using the strict mode. This throws an error if there are any extra undeclared fields, missing non-optional fields, or if the data does not match the expected type[s].
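To illustrate the kind of check that strict parsing performs, here is a minimal stdlib-only sketch. It is not dacite's implementation and only covers unexpected and missing top-level keys (no type checking, no nesting). The `DehydratedAuthor` class, the `strict_check` helper, and the sample response are hypothetical and exist only for this illustration.

```python
from dataclasses import MISSING, dataclass, fields
from typing import Optional


@dataclass
class DehydratedAuthor:
    # Hypothetical, trimmed-down entity used only for this illustration.
    id: str
    display_name: str
    orcid: Optional[str] = None


def strict_check(cls, data: dict) -> list[str]:
    """Return the problems found when matching `data` against dataclass `cls`."""
    declared = {f.name: f for f in fields(cls)}
    problems = []
    # Extra keys: present in the response but not declared on the dataclass.
    for key in sorted(data.keys() - declared.keys()):
        problems.append(f"unexpected field: {key}")
    # Missing keys: declared without any default but absent from the response.
    for name, f in declared.items():
        has_default = f.default is not MISSING or f.default_factory is not MISSING
        if name not in data and not has_default:
            problems.append(f"missing field: {name}")
    return problems


# Made-up response: one undeclared key and one required key missing.
response = {"id": "placeholder-id", "score": 0.99}
for problem in strict_check(DehydratedAuthor, response):
    print(problem)
```

A real strict parser would additionally recurse into nested objects and validate value types, which is where most of the mismatches below were caught.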

The data was retrieved from 8 endpoints (works, authors, institutions, sources, publishers, concepts, funders, and topics) using the ?sample=n parameter, and no other filters or parameters. Most sample sets were of size 50. A marimo notebook was made to run the queries and analyze the results, which can be found in the repo as well, here: https://github.com/utsmok/aletheca/blob/main/marimo_checks/check_entity_dataclasses.py, or run directly in the cloud using marimo's services: https://marimo.app/gh/utsmok/aletheca/main?entrypoint=marimo_checks%2Fcheck_entity_dataclasses.py
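For reference, the requests have roughly the following shape. This is a sketch that only builds the URLs and sends nothing; `sample_url` is a hypothetical helper name, and the base URL `https://api.openalex.org` is assumed to be the public API host.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.openalex.org"  # assumed public API host
ENDPOINTS = [
    "works", "authors", "institutions", "sources",
    "publishers", "concepts", "funders", "topics",
]


def sample_url(endpoint: str, n: int = 50) -> str:
    """Build a URL asking one endpoint for `n` randomly sampled entities."""
    return f"{BASE_URL}/{endpoint}?{urlencode({'sample': n})}"


for endpoint in ENDPOINTS:
    print(sample_url(endpoint))
```

Pagination and authentication parameters are deliberately left out, matching the "no other filters or parameters" setup described above.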

Now, on to the issues:

Issues shared between multiple entities
  • id: I encountered dehydrated entities which had a null value for the id field, something that should not happen according to the documentation. Example: Found in authorships for https://openalex.org/W7104996979

  • score and relevance_score: These fields appear in API responses where they should not: e.g., when using ?sample=10. The values were very high, close to 1.0. Elastic backend error?

  • created_date and updated_date: One or both of these fields were sometimes missing for core entities. The created date should of course always be present, and the updated date can always fall back to the created date, so neither should ever be empty. Example: https://openalex.org/P4361727468

  • topics, topic_share: These fields were found in multiple entities (Source, Institution) but are not documented anywhere for those endpoints.

  • nullable and/or missing fields of type list or bool: Several fields documented as lists or booleans were found to be null or missing instead of being empty lists or false. It would be very helpful if the API was consistent about this: either always return the field with a default value ([] or false, for example), or always omit it when there is no value. Right now it's a crapshoot -- we not only have to check if the value is empty or null before e.g. iterating over a list, but also check if the field is even present at all!

    Examples:
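Until the API settles on one convention, client code can paper over the difference. The following is a minimal sketch, assuming that a missing key and an explicit null should both be treated as "no value"; `get_list` and `get_bool` are hypothetical helper names, and the `work` dict is made up for illustration.

```python
def get_list(entity: dict, key: str) -> list:
    """Return entity[key] as a list, treating a missing key or null as []."""
    value = entity.get(key)
    return value if value is not None else []


def get_bool(entity: dict, key: str) -> bool:
    """Return entity[key] as a bool, treating a missing key or null as False."""
    return bool(entity.get(key))


# Made-up record showing all three cases: null, missing, and present.
work = {"has_fulltext": None, "locations": [{"id": "placeholder"}]}

print(get_bool(work, "has_fulltext"))   # null is treated as False
print(get_list(work, "funders"))        # missing key is treated as []
print(len(get_list(work, "locations"))) # present list is passed through
```

The downside of this approach is that it hides the distinction between "unknown" and "empty", which is exactly the information a consistent API convention would make explicit.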

Work
  • funders: The works.funders field, which appears to be a list of DehydratedFunder objects (? -- aren't documented either!), is not documented.

  • institutions: A top-level institutions field (list of DehydratedInstitution?) is present but not documented.

  • has_content: The has_content object ({pdf: bool, grobid_xml: bool}) is not documented.

  • institution_assertions, is_xpac (bool), awards are also present and undocumented, but I haven't properly investigated them yet.

  • has_fulltext: was found to be null instead of false. Example: https://openalex.org/W3028709719

  • indexed_in: The documentation lists only a few valid values; datacite is returned by the API but is not among them.

    Work-Location Object (e.g. best_oa_location, primary_location, locations)
Source
  • is_indexed_in_scopus, oa_flip_year, is_high_oa_rate, is_ojs, is_in_scielo, is_high_oa_rate_since_year, is_in_doaj_since_year, oa_works_count, last_publication_year, first_publication_year are all undocumented fields that I found in the Source entity.
Institution
  • type_id is an undocumented field that I found in the Institution entity.
  • type: The API returns funder as a value for this field, but this is not included in the docs.
  • associated_institutions.relationship: The value successor is returned but is not documented.
  • The description key within the international object is not documented: see https://docs.openalex.org/api-entities/institutions/institution-object#international -- it should only have display_name as a key, holding a dict; but there is a second key description with another dict.
Publisher
Concept

I know these will be deprecated 'soon' -- but as they are still in the API after the refresh and the 'soon' message has been there for a long time, I figured it's still worth reporting these issues. I did notice a lot of problems with x_concepts and other concept-related fields in the responses, so maybe it's a good time to drop them now?

  • image_url & image_thumbnail_url are present but not documented.
  • related_concepts: Can be null, should probably be []. Also, dehydrated concepts within this list can have a null wikidata value, which should never happen according to the docs. (Example: https://openalex.org/C65148998)
  • ancestors: Can be null even for concepts not at level 0, which should not happen according to the docs. (Example: https://openalex.org/C94727143)

Ivo Bleylevens

Nov 27, 2025, 7:00:34 AM
to OpenAlex Community
Interesting post, Samuel! While I see that some of the issues you found have been solved by now, I also found that some fields disappeared from the Works endpoint after the launch of Walden.

This is what I saw in the Works endpoint, and it would be nice to know when it is a good moment to adapt our software to be future-proof. Is this what the JSON responses will look like from now on?

Cheers,
Ivo


primary_location: ID is new field
primary_location: is_indexed_in_scopus is gone
primary_location: raw_type is new
primary_location: raw_source_name is new
type_crossref is gone
institution_assertions has become institution
has_fulltext is gone
fulltext_origin is gone
datasets is gone
versions is gone
awards is new field
funders is new field
has_content is new field with pdf and grobid_xml
abstract_inverted_index_v3 ?
cited_by_api_url is gone
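A list like the one above can be produced by comparing the top-level keys of a stored older response with a current one. This is a minimal sketch; `diff_keys` is a hypothetical helper, and both dicts are heavily trimmed, made-up stand-ins that only reuse field names from the list.

```python
def diff_keys(old: dict, new: dict) -> dict[str, list[str]]:
    """Report which top-level keys disappeared and which were added."""
    return {
        "gone": sorted(old.keys() - new.keys()),
        "new": sorted(new.keys() - old.keys()),
    }


# Made-up, trimmed records; values are placeholders.
old_work = {"id": "placeholder", "type_crossref": None, "versions": [], "datasets": []}
new_work = {"id": "placeholder", "awards": [], "funders": [], "has_content": {}}

changes = diff_keys(old_work, new_work)
print("gone:", changes["gone"])
print("new:", changes["new"])
```

Because fields are sometimes omitted rather than returned as null, it is safer to take the union of keys over many sampled records on each side before diffing.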




Laura Wrubel

Dec 5, 2025, 1:16:29 PM
to OpenAlex Community
Wondering if there has been any follow-up about changes to entities? It would be helpful to know when the API documentation will reflect the new data structure so that we can update our code. 

We noticed that grants disappeared from Work entities and seems to have been replaced by awards. But there is no documentation of awards. In fact, the Funders documentation still points to grants: https://docs.openalex.org/api-entities/funders

Laura

Jason Priem

Dec 6, 2025, 6:30:51 PM
to Laura Wrubel, OpenAlex Community
Thanks for the reminder, Laura, I'll make sure to get that documentation updated! Apologies for the lag!
j


wordslikethis

Dec 15, 2025, 11:48:45 AM
to OpenAlex Community
I've updated the docs, and I think all those are now covered. Sorry for the omissions.
j

Laura Wrubel

Dec 15, 2025, 4:32:29 PM
to OpenAlex Community
Thanks for updating the Work entity fields. Much appreciated!

Samuel Mok

Dec 16, 2025, 7:31:27 AM
to wordslikethis, OpenAlex Community
Hi Jason,

Thanks for updating the docs! According to my script, there are still quite a few fields that aren't covered in the documentation, though. I made a more convenient table to list them this time; see below. Each link in the table has the anchor for the specific field, so it should go directly to the correct info if it's on the page. I also wanted to see what was changed in the docs (and/or propose direct changes to the docs), so I visited the repo for your docs. However, the repository doesn't seem to match up with the live pages, as the last commit is a few weeks old and the actual docs were updated 2 days ago. Am I looking in the wrong place? Or is the repo not synced with the production docs?

Hope this helps! 
Cheers, Samuel

Undocumented fields

| Field Name | Description / Notes | Link | Status |
| --- | --- | --- | --- |
| institution_assertions | | Work Object | ⚠️ Missing |
| institutions | List of dehydrated institutions. | Work Object | ⚠️ Missing |
| datasets | | Work Object | ⚠️ Missing |
| versions | List of version URLs. | Work Object | ⚠️ Missing |
| has_fulltext | Boolean | Work Object | ⚠️ Missing |
| cited_by_percentile_year | | Work Object | ✅ Documented |
| is_xpac | Boolean flag. | Work Object | ✅ Documented |
| license_id | Distinct from license string. | Location Object | ✅ Documented |
| raw_source_name | Raw string used to match source. | Location Object | ✅ Documented |
| id | Location object ID. | Location Object | ✅ Documented |
| raw_type | Raw type string. | Location Object | ✅ Documented |
| topics | | Author Object | ⚠️ Missing |
| topic_share | | Author Object | ⚠️ Missing |
| topics | | Source Object | ⚠️ Missing |
| topic_share | | Source Object | ⚠️ Missing |
| is_indexed_in_scopus | Boolean flag. | Source Object | ⚠️ Missing |
| oa_flip_year | Integer year. | Source Object | ⚠️ Missing |
| is_high_oa_rate | Boolean flag. | Source Object | ⚠️ Missing |
| is_ojs | Boolean flag. | Source Object | ⚠️ Missing |
| is_in_scielo | Boolean flag. | Source Object | ⚠️ Missing |
| is_high_oa_rate_since_year | Integer year. | Source Object | ⚠️ Missing |
| is_in_doaj_since_year | Integer year. | Source Object | ⚠️ Missing |
| oa_works_count | Integer count. | Source Object | ⚠️ Missing |
| last_publication_year | Integer year. | Source Object | ⚠️ Missing |
| first_publication_year | Integer year. | Source Object | ⚠️ Missing |
| host_organization_lineage_names | List of names. | Source Object | ⚠️ Missing |
| raw_type | Raw string type. | Source Object | ⚠️ Missing |
| topics | | Institution Object | ⚠️ Missing |
| topic_share | | Institution Object | ⚠️ Missing |
| type_id | String ID. | Institution Object | ⚠️ Missing |
| homepage_url | URL string. | Publisher Object | ⚠️ Missing |
| oa_works_count | Found in counts_by_year for non-Work entities. | e.g. Author Object#counts_by_year | ⚠️ Missing |
| description | Localized description dict (additional top-level key after display_name). | e.g. Concept Object#international | ⚠️ Missing |

Value and/or subfields mismatches

| Field Name | Issue Details | Link | Status |
| --- | --- | --- | --- |
| indexed_in | API returns "datacite"; docs do not list this value. | Work Object#indexed_in | ⚠️ Missing value |
| type | API returns "funder"; docs do not list this value. | Institution Object#type | ⚠️ Missing value |
| associated_institutions | relationship returns "successor"; docs do not list this value. | Institution Object#associated_institutions | ⚠️ Missing value |
| parent_publisher | API returns an object (dict); docs say it is a String (ID). | Publisher Object#parent_publisher | ⚠️ Mismatched type |
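For the parent_publisher type mismatch, a client can accept both shapes until the docs and the API agree. This is a minimal sketch, assuming the returned object carries its identifier under an `id` key (an assumption, not confirmed here); `parent_publisher_id` is a hypothetical helper name and the values are placeholders.

```python
from typing import Optional, Union


def parent_publisher_id(value: Union[str, dict, None]) -> Optional[str]:
    """Extract the parent publisher ID from either documented or observed shape."""
    if value is None:
        return None
    if isinstance(value, str):
        # Documented shape: the field is the ID string itself.
        return value
    # Observed shape: an object; assumed to hold the ID under "id".
    return value.get("id")


print(parent_publisher_id("placeholder-id"))
print(parent_publisher_id({"id": "placeholder-id", "display_name": "Example"}))
print(parent_publisher_id(None))
```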
