The return of Citationgraph Bot


James Hare

Feb 20, 2021, 3:00:54 PM
to wikicite...@wikimedia.org
In 2016 I started a project to populate Wikidata with citation relationship data, i.e., instances of one work citing another. This started as a batch script and in 2018 evolved into a bot, Citationgraph bot, which pulled citation data from PubMed Central and mapped it to entries on Wikidata. Subsequently I developed Citationgraph bot 2, which does the same using data from Crossref.
At the end of 2018, the two bots went down and stayed down for over two years. As the corpus of academic paper data grew on Wikidata, I could no longer provide sufficient technical resources for the bot.
As of this month, both Citationgraph bot and Citationgraph bot 2 are back online. The original Citationgraph bot still uses the PubMed Central API, while Citationgraph bot 2 has been updated to rely on a Crossref data dump from March 2020.
This data lets us identify the most significant papers among the tens of millions. And if a given paper is cited on Wikipedia, a built-out citation graph on Wikidata lets you follow the chain of provenance, from Wikipedia article to review article to underlying original sources. It could also help in recommending sources for use in Wikipedia articles.
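
To make the two uses above concrete, here is a minimal sketch in Python. It is not the bot's code: the work identifiers and the `CITATIONS` list are made up for illustration, and real data would use Wikidata item IDs linked by the "cites work" property (P2860). Counting incoming citations gives a rough ranking of papers, and a breadth-first walk over outgoing citations follows the chain of provenance from a starting work.

```python
from collections import Counter, defaultdict, deque

# Hypothetical (citing, cited) pairs, for illustration only.
CITATIONS = [
    ("review_A", "paper_1"),
    ("review_A", "paper_2"),
    ("review_B", "paper_1"),
    ("paper_2", "paper_1"),
    ("paper_1", "paper_0"),
]


def citation_counts(edges):
    """Count how many times each work is cited (its in-degree)."""
    return Counter(cited for _, cited in edges)


def provenance(edges, start):
    """Return every work reachable from `start` by following citations,
    in breadth-first order (closest sources first)."""
    outgoing = defaultdict(list)
    for citing, cited in edges:
        outgoing[citing].append(cited)

    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        work = queue.popleft()
        for cited in outgoing[work]:
            if cited not in seen:
                seen.add(cited)
                order.append(cited)
                queue.append(cited)
    return order
```

With the sample data, `citation_counts(CITATIONS).most_common(1)` returns `[("paper_1", 3)]`, and `provenance(CITATIONS, "review_A")` returns `["paper_1", "paper_2", "paper_0"]`. Raw citation counts are only one possible measure of significance; the sketch uses them because they are the simplest.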

Liam Wyatt

Feb 20, 2021, 3:09:25 PM
to James Hare, wikicite...@wikimedia.org
Congratulations, James!
Please do keep us all updated on the progress of the bot, and of the underlying goals.

As a notice to anyone, and in this specific instance to James:
As you probably know, I run the @wikicite Twitter account, which is quite well followed. If there is something anyone would like to share that would be of interest to that public audience, like this news here, please contact me and I'm very happy to help promote your work. (For what it's worth) :-)

--

Liam Wyatt [Wittylama]

WikiCite Program Manager & Okapi Community Liaison
Wikimedia Foundation

Tom Morris

Feb 21, 2021, 12:45:27 PM
to James Hare, wikicite...@wikimedia.org
That sounds great.

On Sat, Feb 20, 2021 at 3:00 PM James Hare <james...@gmail.com> wrote:
At the end of 2018, the two bots went down and stayed down for over two years. As the corpus of academic paper data grew on Wikidata, I could no longer provide sufficient technical resources for the bot.
As of this month, both Citationgraph bot and Citationgraph bot 2 are back online.

Out of curiosity does "technical resources" mean computer time or people time? What changed?
 
The original Citationgraph bot still uses the PubMed Central API, while Citationgraph bot 2 has been updated to rely on a Crossref data dump from March 2020.

Any plans to update to using the Jan 2021 dump and/or the Crossref API for more recent info?

Tom

James Hare

Feb 21, 2021, 12:56:40 PM
to Tom Morris, wikicite...@wikimedia.org
On Sun, Feb 21, 2021 at 9:45 AM Tom Morris <tfmo...@gmail.com> wrote:

Out of curiosity does "technical resources" mean computer time or people time? What changed?

In short, I had two choices: re-engineer the bot to use resources more efficiently, or use more powerful hardware. After years of not being able to do either, I relaunched the bot on my newly assembled workstation.
 
 
The original Citationgraph bot still uses the PubMed Central API, while Citationgraph bot 2 has been updated to rely on a Crossref data dump from March 2020.

Any plans to update to using the Jan 2021 dump and/or the Crossref API for more recent info?

Thank you for the link! I will incorporate this dump into the bot promptly.

In general, I prefer working with data dumps over APIs. This work involves a tremendous amount of data, and serving it piecemeal over an API would take a long time and/or risk bringing down their servers.
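
As a rough illustration of the dump-based approach, the sketch below streams a local file one record at a time, so memory use stays flat and no remote server is queried. It is an assumption-laden example, not the bot's actual code: the file is assumed to be gzipped JSON Lines (one record per line), which may not match the layout of a given Crossref dump, and `iter_citation_pairs` is a hypothetical helper. The `DOI` and `reference` field names follow the shape of Crossref work records.

```python
import gzip
import json


def iter_citation_pairs(path):
    """Yield (citing_doi, cited_doi) pairs from a gzipped JSON Lines file.

    Assumed layout: one JSON record per line, each with a "DOI" field and
    an optional "reference" list whose entries may carry their own "DOI".
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            citing = record.get("DOI")
            if not citing:
                continue  # nothing to attach citations to
            for ref in record.get("reference", []):
                cited = ref.get("DOI")
                if cited:
                    # DOIs are case-insensitive, so normalise to lower case.
                    yield citing.lower(), cited.lower()
```

Because the function is a generator, a caller can process pairs as they arrive, for example `for citing, cited in iter_citation_pairs("dump.jsonl.gz"): ...`, where the file name is a placeholder.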
 

Tom