Example of affiliation anomaly (Oldham council) and example use of LLM attached to MCP-connection to local OpenAlex instance


Tobias Jeppsson

Mar 21, 2026, 4:25:35 PM
to OpenAlex Community

Hi,

I wanted to share a specific type of affiliation mismatch that the OpenAlex team might want to look at, as well as an example of how an LLM (in this case the Claude desktop app) can interact with our local instance of OpenAlex when exposed through MCP. Many of you have probably looked into similar things, but it might still serve as an interesting example.

As for the specific mismatch, it seems that “Oldham Council” is for some reason being matched to a large number of records based on raw affiliation strings such as “Independent - affiliation not provided to SSRN” (in these cases all publications come from “SSRN Electronic Journal”). The issue came up when Claude automatically flagged it as a probable anomaly during a test run in which I tried out different queries and mined some of the results. When I look into the specific ROR that was flagged, it seems we have ~400,000 work-author connections to it in our database, so the issue is relatively large.
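For readers who want to run a similar check on their own local copy, a minimal sketch could look like the one below. It uses Python's built-in sqlite3 purely as a stand-in for a DuckDB database, and the table name `authorship_affiliations`, its columns, and the demo rows are all hypothetical (the actual Swemetrics schema is not shown in this thread). Only the ROR ID and the SSRN placeholder string are taken from the messages here.

```python
import sqlite3

# Hypothetical table: one row per work-author-affiliation link.
SCHEMA = """
CREATE TABLE authorship_affiliations (
    work_id TEXT,
    raw_affiliation_string TEXT,
    ror_id TEXT
)
"""

# Count links per raw affiliation string for one ROR ID, largest first.
QUERY = """
SELECT raw_affiliation_string, COUNT(*) AS n_links
FROM authorship_affiliations
WHERE ror_id = ?
GROUP BY raw_affiliation_string
ORDER BY n_links DESC, raw_affiliation_string ASC
LIMIT ?
"""


def build_demo_db():
    """Create an in-memory database with a few invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    placeholder = "Independent - affiliation not provided to SSRN"
    rows = [
        ("demo-1", placeholder, "05cxzzq07"),
        ("demo-2", placeholder, "05cxzzq07"),
        ("demo-3", placeholder, "05cxzzq07"),
        ("demo-4", "Oldham Council, Oldham, UK", "05cxzzq07"),
        ("demo-5", "Some University", "00demo000"),  # invented ROR ID
    ]
    conn.executemany(
        "INSERT INTO authorship_affiliations VALUES (?, ?, ?)", rows
    )
    return conn


def top_raw_strings(conn, ror_id, limit=10):
    """Return (raw_affiliation_string, n_links) tuples for one ROR ID."""
    return conn.execute(QUERY, (ror_id, limit)).fetchall()


if __name__ == "__main__":
    for raw, n_links in top_raw_strings(build_demo_db(), "05cxzzq07"):
        print(f"{n_links:>3}  {raw}")
```

If a single placeholder-like string dominates the output for an institution, as it does in the invented rows here, that is a hint that the mapping rather than the institution explains the volume.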

See e.g. these selected parts of the responses to the question “Which are the main organizations globally that perform research in the same topic in the same time period?” and its follow-ups (the question was itself a follow-up to a query on the topic “Metal and Thin Film Mechanics” related to the Swedish research funder Vinnova).

Claude: “Note that "Oldham Council" (GB, rank 3 with 1,230 works) looks like an OpenAlex data artefact — a local UK council is unlikely to be a major thin film mechanics research institution. Let me flag that and also check where the top Swedish and other notable Western institutions land:
[…]
Oldham Council (rank 3) is almost certainly an OpenAlex affiliation disambiguation error — likely a misclassified UK address — and should be disregarded.”

User:

“Look online at Oldham council, to see what research organizations or companies that can be found there, which could explain the presence within "Metal and Thin Film Mechanics".”


Claude:

" […] The mystery is solved. The raw affiliation string is "Independent — affiliation not provided to SSRN" in every single case. This is a classic OpenAlex affiliation disambiguation error: when authors submit preprints to SSRN without providing an institutional affiliation, OpenAlex's parsing algorithm has apparently mis-mapped the placeholder string "Independent — affiliation not provided to SSRN" to the ROR entity "Oldham Council" (ROR ID 05cxzzq07). There is no research being done by the actual Oldham Metropolitan Borough Council.”

Claude
“[…]
The SSRN string does appear exactly as I stated initially — but it is specifically a SSRN preprint server artefact, not a general "Independent consultant" catch-all. The venue is consistently SSRN Electronic Journal across all cases. There are no city, postal code or address fields — the raw affiliation string contains only the placeholder text. In multi-affiliation cases (e.g. work_id 4304774962), one author slot maps to "Oldham Council" via the SSRN string while the other affiliation slots correctly resolve to real institutions (CNRS labs in France, Max-Planck in Germany).

My earlier sample showing "Independent Consultant, Rabat Morocco" etc. was from a different, broader set of records also mapped to ROR 05cxzzq07 — so there are actually two patterns:

  1. SSRN preprints where the author didn't declare an affiliation → "Independent - affiliation not provided to SSRN" → Oldham Council
  2. Other preprint/journal submissions where authors described themselves as independent consultants/researchers → also incorrectly mapped to the same ROR

Both are disambiguation errors, but the SSRN one is by far the most systematic and largest in volume.”
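The two patterns described in the response above could be separated with a very simple rule-based pass over the raw affiliation strings. The following is only a sketch: the category labels and the regular expressions are my own assumptions, not anything OpenAlex or Claude actually uses. The first pattern matches only the tail of the placeholder phrase, so it does not depend on which dash character follows "Independent".

```python
import re
from collections import Counter

# Assumed patterns; adjust to whatever strings appear in your own data.
SSRN_PLACEHOLDER = re.compile(r"affiliation not provided to ssrn", re.IGNORECASE)
INDEPENDENT = re.compile(r"\bindependent\b", re.IGNORECASE)


def classify_raw_affiliation(raw):
    """Assign a raw affiliation string to one of three illustrative buckets."""
    if SSRN_PLACEHOLDER.search(raw):
        return "ssrn_placeholder"
    if INDEPENDENT.search(raw):
        return "independent_self_description"
    return "other"


def tally_patterns(raw_strings):
    """Count how many strings fall into each bucket."""
    return Counter(classify_raw_affiliation(raw) for raw in raw_strings)


if __name__ == "__main__":
    sample = [
        "Independent - affiliation not provided to SSRN",
        "Independent Consultant, Rabat Morocco",
        "Oldham Council",
    ]
    print(tally_patterns(sample))
```

Strings that land in the "other" bucket would still need manual inspection, since some of them may be legitimate affiliations.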


I have attached parts of the conversation transcript for reference (https://claude.ai/public/artifacts/f03c32e7-de4e-4035-99ca-f10310b45c43, generated and styled by Claude), which may also be interesting for some of you as an illustration of the possibilities and of how it was prompted. The transcript is basically complete, but omits some of the “thinking sections” and pre-SQL-query notes by Claude.

This work was done at the KTH Royal Institute of Technology Library against a local Swedish instance of OpenAlex, a DuckDB database built on Parquet files (with the placeholder name Swemetrics, see https://swemetrics.se/). At our end the MCP connection is a recent development and in very early testing, but clearly powerful. The same goes for Swemetrics as a whole, which is at an early and fluid stage of development and is being built in collaboration with Karolinska Institutet University Library. The backend and MCP implementation were mainly done by my colleague Markus Skyttner in the last couple of weeks.

Kind regards

Tobias Jeppsson

(sorry for possible double posting; seems like this was attached to a previous thread by mistake)

Gabor Schubert

5:28 AM
to OpenAlex Community

Hi Tobias,

This is really interesting. Your experiment demonstrates that LLM-based services/applications can identify and flag such errors.

Interestingly, I found a few publications (although not about thin films) that were actually written by someone with real ties to Oldham Council, for example https://openalex.org/works/W2321618443 and https://openalex.org/works/W4381283478. These are written by Steve Rogowski (ironically an independent scholar according to his Google Scholar page: https://scholar.google.com/citations?user=zsgwOScAAAAJ&hl=en&oi=ao) in the field of social work, who sometimes uses "Oldham Council" as his affiliation in his publications. According to his biography, he was indeed active as a social worker at Oldham Council (https://essl.leeds.ac.uk/law/news/article/2359/from-leeds-law-to-social-justice-the-inspiring-journey-of-dr-steve-rogowski).

I found a few other large-scale mismatches with a simple "visual check": currently the two top sources among journal-type sources in OpenAlex are two journals in niche fields (https://openalex.org/works?group_by=publication_year,open_access.is_oa,primary_topic.id,authorships.institutions.lineage,type,primary_location.source.type,primary_location.source.id&page=1&filter=primary_location.source.type:journal): "Medical Entomology and Zoology" (https://openalex.org/works?filter=primary_location.source.id:s4210197683) with almost 2 million publications, and "SHILAP Revista de lepidopterología" (https://openalex.org/works?page=1&filter=primary_location.source.id:s112646816) with more than 1.3 million publications.
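The same "visual check" can be approximated programmatically through the OpenAlex API's `group_by` parameter. The sketch below only builds the request URL and inspects an already-parsed response; it makes no network call. The response shape (a `group_by` list of objects with `key`, `key_display_name` and `count`) is how I understand the API to answer grouped queries, and the threshold, source IDs, names and counts in the demo are invented for illustration.

```python
from urllib.parse import urlencode

API_BASE = "https://api.openalex.org/works"


def journal_source_counts_url():
    """Build a works query grouped by source, restricted to journal sources."""
    params = {
        "filter": "primary_location.source.type:journal",
        "group_by": "primary_location.source.id",
    }
    # Keep ':' readable in the filter value instead of percent-encoding it.
    return f"{API_BASE}?{urlencode(params, safe=':')}"


def oversized_sources(response, max_plausible):
    """Return (display_name, count) for groups above a plausibility threshold."""
    return [
        (group["key_display_name"], group["count"])
        for group in response.get("group_by", [])
        if group["count"] > max_plausible
    ]


if __name__ == "__main__":
    print(journal_source_counts_url())
    # Invented response fragment, shaped like a grouped API answer.
    demo_response = {
        "group_by": [
            {"key": "https://openalex.org/S0000000001",
             "key_display_name": "Demo Niche Journal", "count": 1500000},
            {"key": "https://openalex.org/S0000000002",
             "key_display_name": "Demo Regular Journal", "count": 40000},
        ]
    }
    print(oversized_sources(demo_response, max_plausible=500000))
```

What counts as "implausibly large" depends on the kind of source, so the threshold is a judgment call and would need to be tuned per use case.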

These obviously stem from source-matching errors, which mean that around 3 million items (0.6% of the total 480M records in OpenAlex) have erroneous sources.

"Medical Entomology and Zoology" is the journal of the Japan Society of Medical Entomology and Zoology, with probably fewer than 10 thousand articles (according to Crossref): https://api.crossref.org/works?filter=issn:0424-7086. The error was most probably caused by mismatched sources when items from the Japanese CiNii database (https://en.wikipedia.org/wiki/CiNii) were ingested by OpenAlex.
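The Crossref comparison above can be turned into a small sanity check. The sketch below builds a Crossref query that asks only for the total (`rows=0`), reads `total-results` from an already-parsed response, and computes how many times larger the OpenAlex count is. The function names are my own and no request is actually sent. With the rough figures from this message (almost 2 million items in OpenAlex against fewer than 10 thousand in Crossref) the ratio would be on the order of 200 or more.

```python
from urllib.parse import urlencode

CROSSREF_WORKS = "https://api.crossref.org/works"


def crossref_count_url(issn):
    """Build a Crossref works query that returns only the total for an ISSN."""
    params = {"filter": f"issn:{issn}", "rows": 0}
    return f"{CROSSREF_WORKS}?{urlencode(params, safe=':')}"


def total_results(crossref_response):
    """Extract the item count from a parsed Crossref works response."""
    return crossref_response["message"]["total-results"]


def inflation_ratio(openalex_count, crossref_count):
    """How many times larger the OpenAlex count is than the Crossref count."""
    if crossref_count <= 0:
        raise ValueError("crossref_count must be positive")
    return openalex_count / crossref_count


if __name__ == "__main__":
    print(crossref_count_url("0424-7086"))
    # Illustrative round numbers, not measured values.
    print(inflation_ratio(2_000_000, 10_000))
```

A ratio close to 1 would be expected for a correctly matched journal with good Crossref coverage; journals that do not register DOIs with Crossref would need a different reference count.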

"SHILAP Revista de lepidopterología" is the journal of the "Sociedad Hispano-Luso-Americana de Lepidopterología" and is about moths and butterflies. It is highly unlikely that such a journal has published so many items, and it is also suspicious that more than 100k of them are articles on the topic of Military technology and strategies. This error is most likely caused by mismatched sources when fetching data from DOAJ.

In the case of "Oldham Council" it is obvious that almost all the matched items are erroneous, but in other cases it might not be this straightforward. In the case of the two "mega-journals" above there are several thousand legitimate articles that were actually published in these journals, but they are hidden among the hundreds of thousands of erroneous items.

Best regards,
Gabor Schubert
Stockholm University
