Canonical Citations


Peter Gerdes

Mar 6, 2017, 9:08:01 AM
to zotero-dev
For a variety of needs (merging duplicates, intelligent location of PDFs, etc.) I created my own Zotero command-line client in Python (it's available on GitHub at TruePath/zoterosync, but it's probably not quite ready for anyone but me to use), and I'm quite satisfied with the results as far as they go.

However, a key desire of mine is to eventually have some kind of public canonical citation database that allows community editing and updating.  Ideally I would do this without creating an entirely new web server, but looking through the Zotero groups I was struck by the fact that there aren't any truly large-scale public groups engaging in this kind of sharing.  So, two questions:
  1. First, is there any technical limitation that would stop me from implementing a world-editable list of citations (say, in mathematical logic, to start small) using Zotero's groups feature, possibly with browser extensions or apps that use the API?  If not, why hasn't anything like this been done?
  2. Second, if the above is possible, how feasible would it be to extend or modify Zotero so that an item in a public group can include md5sums for every full-text version of the paper encountered?  As it is now, the md5 property is only available on imported files, which is a bit of a hurdle for a public group, but I imagine that could be worked around.

I suspect there is a huge usability cusp when you get to the point where PDFs can simply be thrown in a directory and the software automatically compares them against a publicly available list of hashes, then classifies and sorts them, asking for input only when it comes across an unknown document.  I'm intent on making this happen, but I'm not sure whether I should pursue it by modifying or extending Zotero or by building a new service and integrating it via a REST API.
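
The directory workflow described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `known_hashes` stands in for the publicly available hash list (here just a dict from MD5 hex digest to a citation label), and the function names are hypothetical, not part of Zotero or zoterosync.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def classify_pdfs(directory, known_hashes):
    """Split the PDFs in `directory` into recognised and unknown files.

    `known_hashes` is an assumed stand-in for the public list:
    a dict mapping MD5 hex digest -> citation.
    """
    matched, unknown = {}, []
    for path in sorted(Path(directory).glob("*.pdf")):
        citation = known_hashes.get(file_md5(path))
        if citation is None:
            unknown.append(path)  # only these would need user input
        else:
            matched[path] = citation
    return matched, unknown
```

Only the files returned in `unknown` would require a prompt; everything in `matched` could be filed automatically.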

I suspect I'm not the first person to bring this kind of idea up so maybe someone can point me to a prior discussion.

Marielle

Mar 7, 2017, 9:14:04 AM
to zoter...@googlegroups.com
You might be interested to know that there is interest in putting that
sort of thing in Wikidata:
https://meta.wikimedia.org/wiki/WikiCite_2016#Building_a_central_repository_of_citations_in_Wikidata

Wikidata is well suited for this kind of data and is human editable
already; see, for example, https://www.wikidata.org/wiki/Q1895685

The rub is getting things in and taking things out programmatically.
There's an API so it is definitely possible but no one has quite done
it yet :). There have been a number of projects to try to get
citations in and out though; I am not sure of the status of them.

At Wikimedia we have a publicly accessible API for getting citations
(https://en.wikipedia.org/api/rest_v1/#!/Citation/getCitation) which
uses Zotero in the backend, but this is not yet connected to Wikidata.
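
For readers who want to try the citation API, a minimal sketch follows. It assumes the endpoint path `/data/citation/{format}/{query}` from the REST API documentation linked above, which may have changed since; the function names and the User-Agent string are hypothetical. `fetch_citation` performs a network request and only makes sense for the JSON formats such as `zotero` or `mediawiki`.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumed base path of the citation endpoint; check the current API docs.
BASE = "https://en.wikipedia.org/api/rest_v1/data/citation"


def citation_url(query, fmt="zotero"):
    """Build the request URL; the query (URL, DOI, ...) is percent-encoded."""
    return f"{BASE}/{quote(fmt, safe='')}/{quote(query, safe='')}"


def fetch_citation(query, fmt="zotero", user_agent="citation-sketch/0.1 (example)"):
    """Fetch and decode a JSON citation (requires network access)."""
    request = Request(citation_url(query, fmt), headers={"User-Agent": user_agent})
    with urlopen(request, timeout=30) as response:
        return json.load(response)
```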

Cheers,
Marielle
> --
> You received this message because you are subscribed to the Google Groups
> "zotero-dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to zotero-dev+...@googlegroups.com.
> To post to this group, send email to zoter...@googlegroups.com.
> Visit this group at https://groups.google.com/group/zotero-dev.
> For more options, visit https://groups.google.com/d/optout.

Sebastian Karcher

Mar 7, 2017, 9:33:31 AM
to zoter...@googlegroups.com
Zotero has also expressed interest in (and has taken some initial steps
toward) creating such a database just by aggregating user-generated citations.
Mendeley, of course, has already done that, and I think their database
is open via API.
I don't think Zotero groups are going to work well for this, though,
given the necessary size of such a collection. The Zotero software will
simply collapse under a group with even 500k items.

Wikidata -- which would allow for more curation, both manual and
automated -- seems better suited as a platform for something like this.


Dan Stillman

Mar 7, 2017, 6:09:53 PM
to zoter...@googlegroups.com
On 3/7/17 9:33 AM, Sebastian Karcher wrote:
> Zotero has also expressed interest (and has taken some initial steps) to
> create such a database just by aggregating user-generated citations.


Yes, we'll have more on this soon.

Peter Gerdes

Mar 7, 2017, 10:43:19 PM
to zoter...@googlegroups.com
Ideally, any such system should be implemented on top of a citation storage system like Zotero, one with access to a large, frequently updated pool of user citations, so that machine learning tools could generate best guesses at canonical citations even for relatively rare new articles. That data could then be exposed for human editing and approved to truly canonical status, and other Zotero users could set their citations to track changes in the canonical version.
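
As a rough illustration of generating a "best guess" from many user citations, here is a sketch that uses a plain per-field plurality vote. To be clear, this is a simple stand-in, not a machine learning method, and the field names and function name are assumptions made for the example.

```python
from collections import Counter


def best_guess_citation(submissions, fields=("title", "author", "year", "journal")):
    """Pick the most common value per field across user-submitted citations.

    `submissions` is a list of dicts with string values. Each chosen value
    is returned with the share of submissions that agreed with it, which a
    human editor could use to decide what still needs review.
    """
    canonical = {}
    for name in fields:
        values = [s[name].strip() for s in submissions if s.get(name, "").strip()]
        if not values:
            continue  # nobody supplied this field
        value, votes = Counter(values).most_common(1)[0]
        canonical[name] = {"value": value, "support": votes / len(values)}
    return canonical
```

Fields with low support would be natural candidates for the human editing step before a citation is approved as canonical.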

Most importantly, if you want a fulltext_filehash -> citation mapping, you need access to a large number of full-text hashes.  I think this aspect is the most important part of such a server because (among other things) it’s this data that allows someone with an unorganized pile of PDFs sitting on their computer to immediately and effortlessly transform it into a useful library.
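
One way to picture the fulltext_filehash -> citation mapping is a two-way index in which one citation can own many file hashes (one per full-text version seen) while each hash points to exactly one citation. The class and method names below are hypothetical, a sketch rather than a proposed schema.

```python
class CitationHashIndex:
    """Many file hashes per citation; each hash maps to one citation."""

    def __init__(self):
        self._citation_by_hash = {}
        self._hashes_by_citation = {}

    def register(self, citation_key, file_hash):
        """Record that a file with this hash is a version of the citation."""
        file_hash = file_hash.lower()
        existing = self._citation_by_hash.get(file_hash)
        if existing is not None and existing != citation_key:
            # Conflicting claims need human review rather than silent overwrite.
            raise ValueError(f"{file_hash} already maps to {existing}")
        self._citation_by_hash[file_hash] = citation_key
        self._hashes_by_citation.setdefault(citation_key, set()).add(file_hash)

    def lookup(self, file_hash):
        """Return the citation key for a hash, or None if it is unknown."""
        return self._citation_by_hash.get(file_hash.lower())

    def hashes_for(self, citation_key):
        """Return every known full-text hash for a citation."""
        return frozenset(self._hashes_by_citation.get(citation_key, ()))
```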

Of course, one would need an option to let users choose whether to share all or some of their citations (minus personalized fields) anonymously… maybe require sharing in order to track canonical citations.  Is there any chance of adding this kind of optional data sharing/mining to Zotero?  If so, I’d be glad to help implement the system.

One very interesting aspect that I think hasn’t been fully appreciated yet is that a database associating SHA-1 hashes with citations might make BitTorrent a plausible source of full-text documents.  If someone writes a library management program that shares all full-text documents over BitTorrent with DHT/magnet links, then any potential consumer can build the magnet link for the desired document just from the hash.  This could enable truly distributed storage and other nice features, but it means there might be resistance from the publishing houses (though realistically I suspect they know individual users are unlikely to pay up, so as long as this mechanism remains imperfectly reliable, university libraries will still have to subscribe and no one will be too upset).
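
One caveat worth making concrete: in BitTorrent (v1) a magnet link carries the info-hash, which is the SHA-1 of the bencoded "info" dictionary rather than the SHA-1 of the file itself. The sketch below assumes every client builds single-file torrents with the same naming rule and the same piece length (both values here are hypothetical conventions), so that the info-hash is reproducible and could be stored in the database next to the citation.

```python
import hashlib
from urllib.parse import quote

PIECE_LENGTH = 256 * 1024  # assumed convention; all clients must agree on it


def bencode(value):
    """Encode ints, bytes, str, lists and dicts in BitTorrent's bencoding."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = sorted((k.encode("utf-8"), v) for k, v in value.items())
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def info_hash(data, name):
    """Compute the v1 info-hash of a single-file torrent for `data`."""
    pieces = b"".join(
        hashlib.sha1(data[i:i + PIECE_LENGTH]).digest()
        for i in range(0, len(data), PIECE_LENGTH)
    )
    info = {
        "length": len(data),
        "name": name,
        "piece length": PIECE_LENGTH,
        "pieces": pieces,
    }
    return hashlib.sha1(bencode(info)).hexdigest()


def magnet_link(btih_hex, display_name=""):
    """Build a magnet URI from a hex info-hash."""
    link = "magnet:?xt=urn:btih:" + btih_hex
    if display_name:
        link += "&dn=" + quote(display_name)
    return link
```

Under those assumptions a consumer who looks up a citation would only need the stored info-hash to call `magnet_link`; the file contents are needed once, by whoever first registers the document.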
