Dear all,
I hope this message finds you well.
I am using OpenAlex as an indispensable tool in my research, allowing me to access a vast amount of academic data efficiently and for free. Its interface and functionality have been extremely useful in finding and connecting relevant information, which has greatly contributed to the progress of my work. However, I have encountered a specific issue that has hindered the experience: I am noticing inconsistencies in the reference counts for some works in OpenAlex compared to the number of references in the original PDFs and the journal websites. Here are a few examples:
The work https://openalex.org/works/W3036130366 indicates 53 references, but both the original PDF and the journal's website show a different count.
For the work https://openalex.org/works/W3094482919, OpenAlex reports 43 references, but the PLOS ONE website and the PDF list 38.
Similarly, https://openalex.org/works/W3022039950 has 52 references in OpenAlex, but 49 in the original PDF and on the journal's website.
Finally, https://openalex.org/works/W3096640152 has 30 references recorded in OpenAlex, while the PDF and the journal's site show 29.
I have noticed this discrepancy occurring in thousands of cases. I would like to request help in understanding the reasons for these inconsistencies. Additionally, how might this impact the citation counts of the referenced works? For example, if there are five extra references, could these five works potentially receive citations, even though they were not cited in the original references of the citing papers?
Thank you for your attention, and I look forward to your response.
Sincerely, Silva
--
You received this message because you are subscribed to the Google Groups "OpenAlex Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openalex-commun...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/openalex-community/31b0dd45-837b-4ed1-a315-02abc0cbc3b2n%40googlegroups.com.
> For the work https://openalex.org/works/W3094482919, OpenAlex reports 43 references, but the PLOS ONE website and the PDF list 38.
Here, there are a number of cases with multiple OpenAlex IDs for the same referenced work, for several reasons:
- different editions of the same book (from different years)
(https://openalex.org/W1587026990 and https://openalex.org/W4254687493)
- different records for the same software package on CRAN (one with a DOI, one without)
(https://openalex.org/W2171216257 and https://openalex.org/W4399542805)
- subsequent versions of the same F1000Research publication (which each get a different DOI)
(e.g. https://openalex.org/W2609499779 and https://openalex.org/W4235332646)
- records for a preprint and a review of that preprint (on preLights), while the original reference is to the preprint. In this case, the second record has some of the metadata of the preprint itself.
(https://openalex.org/W2986870495 and https://openalex.org/W4234222970)
In addition, some of the original references were not matched with an OpenAlex record (e.g. references to websites, rather than scholarly outputs), and the OpenAlex reference list includes one generic record for 'Deleted Work' (https://openalex.org/W4285719527).
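For anyone who wants to run a first check on their own records, here is a minimal Python sketch. It assumes the work record returned by https://api.openalex.org/works/<ID> carries its reference list in a `referenced_works` field holding OpenAlex ID URLs; the function names are made up for illustration. It only counts listed references, exact repeats, and the generic 'Deleted Work' record. It will not catch the cases above where the same referenced work appears under two different OpenAlex IDs, since that needs a comparison of titles or DOIs.

```python
import json
import urllib.request

# Generic 'Deleted Work' record mentioned in this thread.
DELETED_WORK_ID = "https://openalex.org/W4285719527"


def fetch_openalex_work(work_id):
    """Fetch one work record from the OpenAlex API (needs network access)."""
    url = f"https://api.openalex.org/works/{work_id}"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def summarize_references(work):
    """Count listed references, distinct IDs, and 'Deleted Work' placeholders.

    Assumes `work` is a dict with a `referenced_works` list of OpenAlex ID URLs.
    """
    refs = work.get("referenced_works") or []
    return {
        "listed": len(refs),
        "unique": len(set(refs)),
        "deleted_placeholders": sum(1 for ref in refs if ref == DELETED_WORK_ID),
    }
```

Calling `summarize_references(fetch_openalex_work("W3094482919"))` would then give the numbers for the second example, which can be set against the count on the journal website.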
Lots going on!
Some of these are known challenges, e.g. how to handle different preprint versions and citations to each version vs. all versions together.
Some of them are more straightforward errors in matching the OpenAlex record to the actual reference.
It should be noted, too, that discrepancies between reference counts are common when comparing databases, especially where not all references from e.g. the PDF are included. Sometimes this occurs because the referenced items are not part of the database itself (e.g. references to websites, reports, etc.), or because the references (rightly or wrongly) could not be matched to a DOI or PMID when the database only has records with these identifiers (as in Dimensions).
For instance, the first example has the expected 44 references in Crossref, of which 42 have a DOI included in the metadata, and only these 42 appear in Dimensions. The second example has the expected 38 references in Crossref, of which 29 have a DOI included in the metadata, and 33 in Dimensions.
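The Crossref side of this comparison can be reproduced with a short Python sketch. It assumes the Crossref REST API response for https://api.crossref.org/works/<DOI> wraps the record in a `message` object whose `reference` list carries a `DOI` key on entries that have one; the function names are hypothetical, and the DOI has to be supplied by the user since none is quoted in this thread.

```python
import json
import urllib.parse
import urllib.request


def fetch_crossref_message(doi):
    """Fetch the Crossref record for a DOI (needs network access)."""
    url = "https://api.crossref.org/works/" + urllib.parse.quote(doi)
    with urllib.request.urlopen(url) as response:
        return json.load(response)["message"]


def crossref_reference_counts(message):
    """Count all deposited references and those that carry a DOI.

    Assumes `message` is the `message` object of a Crossref works response.
    """
    refs = message.get("reference") or []
    with_doi = sum(1 for ref in refs if ref.get("DOI"))
    return {"total": len(refs), "with_doi": with_doi}
```

If the assumptions hold, `total` should correspond to the "expected" count and `with_doi` to the subset that a DOI-only database could match at best.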
So overall, there's definitely room for improvement on many fronts. On the other hand, there are inherent limitations, which are one reason not to rely too heavily on any citation count as a 'ground truth'.
Hope this helps, this was fun to untangle, so thanks for the 'puzzle'!
kind regards, Bianca
I still have some questions regarding the OpenAlex metrics, and I would like the community's help to clarify them:
How does OpenAlex handle citations for both a preprint and the published article when both are available? Crossref treats the preprint and its corresponding article as distinct citable objects, each with its own DOI. As a result, the citations are counted separately, with no merging or consolidation between the preprint and the peer-reviewed article. Here are some clarifications from Crossref on the matter:
https://community.crossref.org/t/avoiding-duplicate-doi/11965
https://archive.org/details/gmail-crossref-re-citations
Does OpenAlex transfer the citations from a preprint to the article published in a peer-reviewed journal? Or does it treat them as separate citable documents due to their distinct DOIs? Furthermore, is it correct to attribute a citation to something that wasn’t explicitly cited but has a related earlier version?
Thank you in advance to anyone who can respond.