September snapshot released

Casey Meyer

Sep 22, 2023, 10:27:17 AM
to OpenAlex users
Good morning,

Happy Friday! I'm excited to announce that the latest OpenAlex snapshot is available to download. Changes in this snapshot are:

- added raw author name to authorships objects in works (https://docs.openalex.org/api-entities/works/work-object/authorship-object#raw_author_name)
- institution lineage (parent institution IDs) is available in works, authors, and institutions (https://docs.openalex.org/api-entities/institutions/institution-object#lineage)
- sustainable development goals are assigned to 215 million works (https://docs.openalex.org/api-entities/works/work-object#sustainable_development_goals)
- improved institution matching for 1.1 million works
- countries distinct count available in works
- added ~700 new sources
- matched primary source for 248,647 old works
- abstract inverted index is now the correct object in the snapshot (extra InvertedIndex key removed)
- updated_dates are in full ISO format
- documentation scripts updated for current snapshot
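
For anyone who wants to spot-check the new fields after downloading, here is a minimal sketch in Python. It assumes works are stored one JSON object per line, that lineage appears on each institution inside an authorship, and that the record looks roughly like the hypothetical, trimmed-down sample below (the IDs and names are made up). `summarize_work` is an illustrative helper, not part of any OpenAlex tooling.

```python
import json

# Hypothetical, heavily trimmed work record; real records carry many more fields.
sample_line = json.dumps({
    "id": "https://openalex.org/W0000000000",
    "countries_distinct_count": 1,
    "authorships": [
        {
            "raw_author_name": "J. Doe",
            "institutions": [
                {
                    "id": "https://openalex.org/I0000000002",
                    "lineage": [
                        "https://openalex.org/I0000000002",
                        "https://openalex.org/I0000000001",
                    ],
                }
            ],
        }
    ],
})


def summarize_work(line):
    """Pull the fields mentioned in this release out of one JSON line."""
    work = json.loads(line)
    authorships = work.get("authorships") or []
    raw_names = [a.get("raw_author_name") for a in authorships]
    # Collect every institution ID found in any lineage list, without duplicates.
    lineage_ids = sorted({
        inst_id
        for a in authorships
        for inst in (a.get("institutions") or [])
        for inst_id in (inst.get("lineage") or [])
    })
    return {
        "id": work.get("id"),
        "raw_author_names": raw_names,
        "lineage_ids": lineage_ids,
        "countries_distinct_count": work.get("countries_distinct_count"),
    }


summary = summarize_work(sample_line)
print(summary)
```

The `or []` guards are there because a field may be missing or null in a given record; adjust them to whatever you actually see in the files.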

Some key notes when processing the snapshot:

1. Abstract inverted index can be accessed just like it is in the API. The extra key "InvertedIndex" that was incorrectly present in the last snapshot has been removed.
2. I tried my best to ensure updated_date is in timestamp format, but I don't think I was able to fix them all. Please let us know if you find an updated_date in the regular short-date format by creating a ticket at https://openalex.org/help
3. We're on track to assign sustainable development goals to all works (20 million left to go) in the next week. Keep in mind that we apply a threshold criterion, so some works may not have SDGs because of low prediction scores.
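
As a rough illustration of notes 1 and 2, the Python sketch below rebuilds abstract text from an inverted index and flags updated_date values that are still in short-date form. It assumes the inverted index maps each word to a list of integer positions and that a short date looks like YYYY-MM-DD; `rebuild_abstract` and `is_short_date` are hypothetical helper names, and the sample values are made up.

```python
from datetime import datetime


def rebuild_abstract(inverted_index):
    """Turn {word: [positions]} back into a plain-text abstract."""
    if not inverted_index:
        return ""
    words_by_position = {}
    for word, positions in inverted_index.items():
        for position in positions:
            words_by_position[position] = word
    return " ".join(words_by_position[p] for p in sorted(words_by_position))


def is_short_date(value):
    """Return True if value is a bare YYYY-MM-DD date with no time part."""
    if not isinstance(value, str):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        # Anything with a time component (or otherwise malformed) lands here.
        return False
    return True


abstract = rebuild_abstract({"Hello": [0], "world": [1, 3], "big": [2]})
short = is_short_date("2023-09-22")
full = is_short_date("2023-09-22T10:27:17.000000")
```

Running `is_short_date` over the updated_date of each record should make any leftover short dates easy to collect before opening a ticket.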

Thanks,
Casey