Thoughts on efficiently calculating number of self-citations by journal year?


brydon brancart

Aug 17, 2023, 2:46:24 PM
to OpenAlex users
Hi,

I'm curious to hear if anyone has thoughts on efficiently removing self-citations when calculating the impact factor of a journal.

My current pipeline is:

1) Get venue
2) Get all work IDs for that source by year
3) Hit the cites API for all citations of each work and record the citing source and citation counts
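
For reference, the three steps above could be sketched roughly as follows. This is only an illustration, not the exact code in use: it assumes the `primary_location.source.id`, `publication_year` and `cites` filters plus cursor paging (`cursor=*`, `meta.next_cursor`) of the OpenAlex works endpoint, and the helper names (`fetch_json`, `iter_works`, `citing_sources`) are made up for this sketch.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter

API = "https://api.openalex.org/works"

def fetch_json(url):
    """Default fetcher: one GET request, parsed as JSON."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def iter_works(filter_str, fetch=fetch_json):
    """Yield every work matching filter_str, following cursor paging (assumed)."""
    cursor = "*"
    while cursor:
        query = urllib.parse.urlencode(
            {"filter": filter_str, "per-page": 200, "cursor": cursor})
        page = fetch(f"{API}?{query}")
        yield from page["results"]
        cursor = page["meta"].get("next_cursor")  # None once the last page is reached

def short_id(openalex_url):
    """'https://openalex.org/W123' -> 'W123'."""
    return openalex_url.rsplit("/", 1)[-1]

def citing_sources(source_id, year, fetch=fetch_json):
    """Steps 2 and 3: count citing works per citing source for one journal year."""
    counts = Counter()
    journal_filter = f"primary_location.source.id:{source_id},publication_year:{year}"
    for work in iter_works(journal_filter, fetch):
        # One paged query per work: this inner loop is the slow part.
        for citing in iter_works(f"cites:{short_id(work['id'])}", fetch):
            source = (citing.get("primary_location") or {}).get("source") or {}
            counts[source.get("id")] += 1  # key is None when no source is known
    return counts
```

The self-citation count is then the entry of the returned counter whose key is the journal's own source ID. The request count grows with the number of works, which is where the time goes.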

This works in practice but is very slow on my machine. I'm curious whether others have approached this problem in a different way before.

Thanks,

Brydon

MB Paulus

Aug 21, 2023, 8:25:50 AM
to OpenAlex users
Hi Brydon,

Whenever we thought about these types of questions, the (hypothetical) solution was to work with the snapshot instead of the API.
Removing self-citations in a graph-type data structure should be straightforward; the only issue I see is cases in which the author disambiguation didn't work well.
Curious if there are any solutions to this using the API instead.
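
To make the graph idea concrete, here is a minimal sketch of a single pass over citation edges. It assumes the snapshot has already been loaded into two in-memory structures, both hypothetical names: `work_info` mapping each work ID to its (source ID, publication year), and `citations` as (citing work, cited work) pairs, for example derived from each work's `referenced_works` list.

```python
from collections import Counter

def self_citation_counts(work_info, citations):
    """Count total and same-journal citations per (source, year of the cited work).

    work_info: dict of work id -> (source id, publication year)
    citations: iterable of (citing work id, cited work id) pairs
    """
    total = Counter()
    self_cites = Counter()
    for citing, cited in citations:
        if cited not in work_info:
            continue  # cited work is outside the loaded data
        cited_source, cited_year = work_info[cited]
        key = (cited_source, cited_year)
        total[key] += 1
        # A journal self-citation: the citing work appeared in the same source.
        citing_source = work_info.get(citing, (None, None))[0]
        if citing_source == cited_source:
            self_cites[key] += 1
    return total, self_cites

# Tiny made-up example: W3 and W4 both cite W1; only W3 is in the same journal.
work_info = {"W1": ("S1", 2021), "W3": ("S1", 2022), "W4": ("S2", 2022)}
total, self_cites = self_citation_counts(work_info, [("W3", "W1"), ("W4", "W1")])
```

Since this compares sources rather than authors, it does not depend on author disambiguation; that issue would only show up for author-level self-citations.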

Best,
Max

On Thursday, August 17, 2023 at 20:46:24 UTC+2, brydonp...@gmail.com wrote:

Dave Middleton

Aug 24, 2023, 10:13:56 AM
to OpenAlex users
Hi Brydon,

Your approach seems to be valid.

You can get an overview of works and all citations from a specific journal per year as follows:

For this journal example, OpenAlex may not go beyond a certain publication year.

To get the self-citations in works in this journal:
Get all OA-IDs for works published in this journal in the year 2022:

Then loop over each work to get the citations and filter them on the same journal:
https://api.openalex.org/works?filter=cites:W1234567890,locations.source.issn:1097-0266

Note the number of citations >> number of self-citations for this example.
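
If only the number of self-citations is needed, one possible shortcut is to read the total from the response instead of paging through the results. The sketch below builds the filter URL shown above and assumes the response carries the total number of matches in `meta.count`; `per-page=1` is added just to keep the payload small, and the function names are made up for illustration.

```python
import json
import urllib.request

def self_citation_url(work_id, issn):
    """Build the cites + same-journal filter URL for one work."""
    return ("https://api.openalex.org/works"
            f"?filter=cites:{work_id},locations.source.issn:{issn}&per-page=1")

def count_self_citations(work_id, issn):
    """One request per work; returns the total match count (assumed in meta.count)."""
    with urllib.request.urlopen(self_citation_url(work_id, issn)) as resp:
        return json.load(resp)["meta"]["count"]

# Placeholder work ID and the ISSN from the example above; no request is made here.
url = self_citation_url("W1234567890", "1097-0266")
```

This still needs one request per work, so it cuts the paging but not the loop itself.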

Even in the case of finding self-citations by an author e.g. A12134567890, you have to loop over all works by that author:
Unless there is a simpler way to get self-citations from the REST API, this may be it.

Cheers,

Dave