Some observations on ROR data

Steve Canham

Feb 18, 2025, 5:34:22 PM
to ROR Technical Forum
Hi
As a quick follow-up to my post on importing the ROR data dump into a Postgres database:

1) I've substantially updated the code, re-organised the documentation and renamed the system to 'imp_ror' (rather than ror1). If anyone is interested in looking at it, it can now be found at https://github.com/steve-canham/imp_ror/tree/master.

2) I've attached a brief report describing some of the inconsistencies I found in the data and also attached a spreadsheet with lists of the organisations concerned.
The issues are not major, but as ROR becomes more important and more widely used, I feel it's important to keep the data as clean and as consistent as possible.
Blame my background working in clinical trials, where cleaning data can become obsessive.
I hope it's useful!

As ever, please do not hesitate to get back to me with any comments or requests for further information.
Cheers
Steve
ROR Name issues.docx
ROR name data.xlsx

Amanda French

Feb 18, 2025, 6:06:24 PM
to ROR Technical Forum, steveca...@gmail.com
Thanks, Steve! Hope others who use Postgres can benefit from your work!

I'll ask our curatorial team to take a look at the issues you've kindly identified in these documents! I think the "missing predecessor links" and "missing successor links" items are almost certainly not issues, though: "Predecessors" by design may optionally have "Successors" but are not required to, and vice versa. In that respect those relationship types are unlike "Parent", "Child", and "Related", which do require a corresponding relationship, at least for active records. See https://ror.readme.io/docs/data-structure#relationships for more about that. Additionally, I think that in inactive records, the relationships are often (or perhaps always? can't recall) deliberately removed. For fuller discussions of the logic behind these relationships and status types, see the community feedback documents at https://ror.readme.io/docs/feedback-docs on "Handling inactive organizations in ROR" from Summer 2022.
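To make the distinction above concrete, here is a minimal sketch (not ROR's own tooling, and not the method used in imp_ror) of a reciprocity check over dump records. It assumes ROR schema v2 field names: each record has `id`, `status`, and a `relationships` list of `{"type": ..., "id": ...}` entries. The function name `missing_reciprocals` is hypothetical. Only `parent`, `child` and `related` links between active records are expected to be mirrored; `predecessor` and `successor` links are skipped, since they are not required to have a counterpart.

```python
from typing import Iterable

# Types that must be mirrored by the target record (for active records).
# predecessor/successor are deliberately absent: they need no counterpart.
RECIPROCAL = {"parent": "child", "child": "parent", "related": "related"}


def missing_reciprocals(records: Iterable[dict]) -> list[tuple[str, str, str]]:
    """Return (source_id, rel_type, target_id) for unmirrored reciprocal links.

    Assumption: ROR schema v2 layout ('id', 'status', 'relationships' with
    'type' and 'id'); type values are lower-cased to tolerate v1 casing.
    """
    active = {r["id"]: r for r in records if r.get("status") == "active"}
    # Every (source, type, target) triple declared by an active record
    links = {
        (r["id"], rel["type"].lower(), rel["id"])
        for r in active.values()
        for rel in r.get("relationships", [])
    }
    problems = []
    for src, rel_type, tgt in sorted(links):
        expected = RECIPROCAL.get(rel_type)
        if expected is None or tgt not in active:
            # predecessor/successor, or the target is inactive/absent
            continue
        if (tgt, expected, src) not in links:
            problems.append((src, rel_type, tgt))
    return problems
```

In practice the input would be the JSON array from the data dump, loaded with `json.load`; anything this returns is a genuine gap, whereas a missing predecessor/successor counterpart would not be flagged at all.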

However, the work you've done to identify missing or duplicated name types looks very useful, and it's interesting to hear that we have one record with a name in the deprecated Serbo-Croat language code 'sh'! We'll take a hard look at those.
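As an illustration of the kind of name-level checks discussed here, the sketch below flags three things in a single record: not exactly one `ror_display` name, the same name string carrying the same type more than once, and a deprecated language code. It assumes the ROR schema v2 `names` layout (`value`, `types`, `lang`); the function `name_issues` and the `DEPRECATED_LANGS` set are hypothetical, and these are not necessarily the checks used in the attached report. Only `'sh'`, the code mentioned in this thread, is included.

```python
from collections import Counter

# Deprecated language codes to flag; 'sh' (Serbo-Croat) is the one raised
# in this thread. Extend the set as needed.
DEPRECATED_LANGS = {"sh"}


def name_issues(record: dict) -> list[str]:
    """Flag simple name-level inconsistencies in one ROR record.

    Assumption: ROR schema v2, where 'names' is a list of dicts with
    'value', 'types' (e.g. 'ror_display', 'label', 'alias', 'acronym')
    and 'lang' (possibly None).
    """
    names = record.get("names", [])
    issues = []

    # Each record should have exactly one display name
    display = [n for n in names if "ror_display" in n.get("types", [])]
    if len(display) != 1:
        issues.append(f"expected 1 ror_display name, found {len(display)}")

    # Count (name string, type) pairs to spot duplicated name types
    pairs = Counter((n.get("value"), t) for n in names for t in n.get("types", []))
    for (value, name_type), count in pairs.items():
        if count > 1:
            issues.append(f"name {value!r} has type {name_type!r} {count} times")

    # Language codes that should no longer be used
    for n in names:
        if n.get("lang") in DEPRECATED_LANGS:
            issues.append(f"name {n.get('value')!r} uses deprecated lang {n['lang']!r}")
    return issues
```

Running this over every record in the dump and keeping the non-empty results gives a simple per-organisation list, similar in spirit to the attached spreadsheet.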

Cheers,

Amanda