citation plugins


Peter Krautzberger

Oct 15, 2011, 12:30:27 AM
to Wordpress for Scientists
Hello.

I was hoping for some advice and discussion regarding citation related
plugins.

Since this has gotten a little longer: I will first describe the
problem and then add some questions.

Over at boolesrings.org we have had some problems this week. At
Booles' Rings we're experimenting with wordpress for academic
homepages (of mathematicians). We're essentially trying to find out
what is useful and/or necessary for an academic web presence via
wordpress.

Obviously, citations are important for documenting our own work and
writing about other people's work.

Since we're all mathematicians, there's the strong need for bibtex
import which is why papercite is popular -- it makes the move from
BibTeX to wordpress very easy. Unfortunately, papercite is very buggy
and we would like to replace it.

We're faced with the question: What do we need a citation plugin to
do?

Practically speaking,

a) bibtex import (but no dependence/sync)
* We have to start somewhere and that's where most people (in
mathematics) come from.
b) personal IDs for shortcode use
* we're human and we like to write ThatFamousPaper instead of cryptic
ids
(I think mathematical writing is very different from scientific
writing in this respect -- papers can be holy objects...)
c) a GUI to look up/search for new citations
* Sometimes, you barely remember the paper's title.
* DOIs are cumbersome to look up anyway
* Searching multiple sources (google scholar, mendeley,
mathscinet, pubmed) would be nice while writing a post
* Maybe even links-to-citation functionality when quoting online
sources (blogs, mathoverflow etc)
d) Reversibility
* the citation in html (in a post) should include some form of
metadata that can be processed automatically (pingbacks, aggregation,
citation counts etc)
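
Requirements (a) and (b) above can be illustrated with a minimal sketch in Python: read one simple BibTeX entry and keep the author's personal key so it can be reused as a shortcode id. This is only an assumption about what a one-off import could look like, not how papercite or any other plugin works. The function name `parse_entry` and the sample entry are hypothetical, and the parser only handles flat `{...}` or `"..."` field values (no nested braces, no bare numbers).

```python
import re

# Matches the entry head, e.g. "@article{ThatFamousPaper,".
ENTRY_HEAD = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")
# Matches "name = {value}" or "name = "value"" without nested braces.
FIELD = re.compile(r"(\w+)\s*=\s*(?:\{([^{}]*)\}|\"([^\"]*)\")")


def parse_entry(text):
    """Parse one simple BibTeX entry into a dict (hypothetical helper)."""
    head = ENTRY_HEAD.search(text)
    if head is None:
        raise ValueError("no BibTeX entry found")
    fields = {}
    for match in FIELD.finditer(text, head.end()):
        value = match.group(2) if match.group(2) is not None else match.group(3)
        # Collapse line breaks and repeated spaces inside a value.
        fields[match.group(1).lower()] = " ".join(value.split())
    return {"type": head.group(1).lower(), "key": head.group(2), "fields": fields}


SAMPLE = """
@article{ThatFamousPaper,
  author = {Jane Doe},
  title = "An Example Title",
  year = {2011}
}
"""

entry = parse_entry(SAMPLE)
```

The personal key (`entry["key"]`) survives the import, so it stays available as a human-friendly id once the entry is stored in the database.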


QUESTION 1: Do we have such a plugin?

1b) What plugins have which functionality?

* Kcite is excellent when you have the DOI (well, depends on the DOI
actually)
* bibtex-importer does a great job using links giving a local search
GUI -- but shouldn't citations be pages or a taxonomy?
* papercite offers the familiarity of keeping on as we do in LaTeX
* wpcitulike, bibliplugin seem to offer good external reference
sources
* zotpress seems to have almost everything, but requires zotero
* teachpress and scholarpress have too much overhead
1c) is there a plugin that uses Mendeley's api?


QUESTION 2: How do we want citations to work?

Ok, this is in hopes for a discussion. My amateur thoughts.

* reference management should be done by professionals not through
personally hacked bibtex files (we mathematicians have a bad habit...)
* references should be stored professionally, i.e., in the wp-database
or in a professional outside tool (mendeley, zotero, citeulike) (take
papercite as a terrible example relying on some random bibtex file
somewhere)
* even if an outside tool is used, actually referenced citations
should always be stored in the database.
* citations should be hardcoded into the post (when I review a
preprint, I don't want the reference to change to the published
version later)

Well, this has become more of a blog post... I guess I'll cross post
it at boolesrings.org/krautzberger...

In any case, I hope I made a little bit of sense. Any help is greatly
appreciated!

Best,
Peter.

Phillip Lord

Oct 17, 2011, 6:05:36 AM
to wordpress-fo...@googlegroups.com
Peter Krautzberger <p.kraut...@googlemail.com> writes:
> Practically speaking,

> b) personal IDs for shortcode use
> c) a GUI to look up/search for new citations

b and c to some extent overlap. If you have a GUI which displays
citations sanely, the shortcodes underneath don't matter.

Something like kcite, based around CSL for rendering, should be able to
display the references while editing in whatever environment you want
to use.

While I realise that this is not a tool chain with a wide-user appeal, I
use emacs and its bibtex tools to insert cites; it does the lookup, so
I don't have to.

> d) Reversibility
> * the citation in html (in a post) should include some form of
> metadata that can be processed automatically (pingbacks, aggregation,
> citation counts etc)

Processed by whom? The blog engine that hosts, or any reader?


> QUESTION 1: Do we have such a plugin?
>
> 1b) What plugins have which functionality?
>
> * Kcite is excellent when you have the DOI (well, depends on the DOI
> actually)

Or a pubmed ID -- I realise this might not be a big difference for a
mathematician, but it is. Kcite is limited to these two IDs at the
moment, because these are the ones I did first. Adding more sources is
entirely possible and something that I would like to do. I suspect that
the first on the list will be adding the ability to put arbitrary
metadata into the post.

> QUESTION 2: How do we want citations to work?
>
> Ok, this is in hopes for a discussion. My amateur thoughts.
>
> * reference management should be done by professionals not through
> personally hacked bibtex files (we mathematicians have a bad habit...)

This depends on how professional the professionals turn out to be. If
their metadata is correct it's great. If it is wrong, different issue.


> * references should be stored professionally, i.e., in the wp-database
> or in a professional outside tool (mendeley, zotero, citeulike) (take
> papercite as a terrible example relying on some random bibtex file
> somewhere)
> * even if an outside tool is used, actually referenced citations
> should always be stored in the database.
>
> * citations should be hardcoded into the post (when I review a
> preprint, I don't want the reference to change to the published
> version later)


When you say "actually referenced citations", there are two ways of
interpreting this. Either the citation or the metadata for that
citation. Kcite, for example, stores the former, but not the latter.

When you say "store" here, I presume you mean "for all eternity". Kcite
now caches the metadata, but this is just for performance reasons.

The difference is important. As I say, metadata providers do get things
wrong or leave them incomplete. If you cite a paper, and the metadata is
wrong, do you want to pick up the changes or live with the inaccurate
metadata for ever?

In terms of the underlying post or article changing, I don't think that
this is a feature of the citation engine, though. This is why
knowledgeblog displays all versions, imposes a (social) limitation that
once an article is finalized, no changes with semantic import should
occur, and we have made efforts to get our system archived so you can
turn the clock back independently of the blog owner. For academic
purposes, I think "unchanging" is more important than "up to date".

Phil


Peter Krautzberger

Oct 19, 2011, 12:03:23 AM
to Wordpress for Scientists
Phil, first of all a big thank you for your message. I've come to
realize that I was mixing technical with broader issues. Reading
(again) some of Martin's older blogposts on the issues I realize that
my original message was a little ignorant.

Let me try to react to your input while clearing up some of my
mistakes.

### regarding b)

This is really a small technical issue, not a fundamental one. I was
coming from the direction of Booles' Rings where we try to make the
transition to wordpress easy. I also use tools with automatic bibtex
lookup, but a lot of people are fond of their personal \cite{} codes
and feel they're important for their work flow.

With kcite you can use pubmed or DOIs, with papercite you can use your
own bibtex shortcodes -- I think it shouldn't be too hard to
accommodate both interchangeably. But this is not very important. I
was wondering: how does your emacs toolchain translate to wordpress --
is there a tool that turns \cite into kcite calls?


### regarding d)

This is more fundamental, I think.

But first: You asked "processed by whom?" I guess "by anyone" is my
answer. That is, if somebody has the html, they should find a unique
identifier to reconstruct what was cited (and possibly why).

Originally, I thought of this with content integrity in mind (see
below), but then aggregation also seemed to benefit naturally from
this.

### regarding kcite's future features.

A wish list from a mathematician's point of view.

arXiv and MathReviews are critical, zentralblatt also comes to mind.
MR is paywalled, however the look-up is free. So at least it's only
bad for the author...

Papercite uses bib2tpl which offers nice output options. Is there any
chance kcite might get similar output? Also, teachpress has a really
nice reference management interface.

### regarding "professional reference management tools"

This is, as you pointed out, a messy subject. Somehow, I feel that
Google Scholar, Mendeley etc give me the kind of output I'd expect
these days when I see a button like "All 5 Versions". I don't know
what's right here.

Most services focus exclusively on plain journal data whereas I could
easily hope for a future where a preprint, a published version, an
expository blogpost, slides from a talk, a conference poster, a
youtube video, all constitute part of a single reference.

Of course, this is too much to hope for, but I can dream...

### regarding "actually referenced citations" etc

This part is very important to me but was a bit of a mess -- my
apologies. It connects to the larger issue of content integrity.

What I meant with "actually referenced" was simply the problem that
some plugins (kcite, zotpress) use outside services. This is fine, but
seems to me to risk the integrity of the content.

So I wrote "hardcoded" for "guaranteeing content integrity", as
compared to dynamically generated content from citation plugins such
as papercite that freshly convert a shortcode to generate the actual
html whenever the page is displayed.

For me "content integrity", i.e., the expectation that a post's html
cannot change unless somebody explicitly edits the post, seems vital
-- this is nothing special about citation plugins.

This integrity must include deactivation/deprecation of the (citation)
plugin! (The markdown-on-save plugin is a good example for this).

Finally, this would also greatly help with integrity in terms of
export: anybody with the html of a post should be able to get all
abstract metadata back out of it; also, tools like Martin's epub
plugin could benefit greatly from this.

Best,
Peter.




Phillip Lord

Oct 19, 2011, 9:09:35 AM
to wordpress-fo...@googlegroups.com

Peter Krautzberger <p.kraut...@googlemail.com> writes:
> Let me try to react to your input while clearing up some of my
> mistakes.
>
> ### regarding b)
>
> This is really a small technical issue, not a fundamental one. I was
> coming from the direction of Booles' Rings where we try to make the
> transition to wordpress easy. I also use tools with automatic bibtex
> lookup, but a lot of people are fond of their personal \cite{} codes
> and feel they're important for their work flow.
>
> With kcite you can use pubmed or DOIs, with papercite you can use your
> own bibtex shortcodes -- I think it shouldn't be too hard to
> accommodate both interchangeably. But this is not very important. I
> was wondering: how does your emacs toolchain translate to wordpress --
> is there a tool that turns \cite into kcite calls?

For emacs, I just use one of the standard Emacs tool chains, which is
reftex. I've just added more output styles so that, instead of \cite, it
puts in [cite]DOI[/cite] for kcite. Reftex gives me a GUI (should that
be TUI as it's all text?) for insertion, and then drops the DOI in
place.
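
For readers who do not use emacs, the same idea can be sketched as a small Python filter that rewrites `\cite{key}` into the kcite shortcode. This is not the reftex setup described above, only an illustration under stated assumptions: the closing tag is taken to be the usual WordPress `[/cite]` form, and the key-to-DOI table `KEY_TO_DOI` (with its placeholder DOI) is hypothetical and would in practice come from the author's own .bib file.

```python
import re

# Hypothetical mapping from personal BibTeX keys to DOIs (placeholder DOI).
KEY_TO_DOI = {
    "ThatFamousPaper": "10.1000/example.doi",
}

# Matches \cite{key} and \cite{key1, key2}.
CITE_PATTERN = re.compile(r"\\cite\{([^}]*)\}")


def cite_to_shortcode(text, key_to_doi):
    """Rewrite LaTeX cite commands as one [cite]DOI[/cite] per key.

    Keys without a known DOI are written back as a LaTeX cite command,
    so nothing is silently dropped.
    """
    def replace(match):
        parts = []
        for key in match.group(1).split(","):
            key = key.strip()
            doi = key_to_doi.get(key)
            if doi is None:
                parts.append("\\cite{%s}" % key)
            else:
                parts.append("[cite]%s[/cite]" % doi)
        return "".join(parts)

    return CITE_PATTERN.sub(replace, text)


converted = cite_to_shortcode(r"See \cite{ThatFamousPaper}.", KEY_TO_DOI)
```

Leaving unknown keys untouched keeps the author's source readable and makes missing DOIs easy to spot before publishing.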

Another solution would be to use my latextowordpress tool. Now, I
haven't gone full cycle on this yet, but it should work; what would be
needed is a bibtex stylefile which doesn't produce a bibliography, and
produces in-text citations using the shortcode syntax above.

We've used the same technique with word and zotero. We defined a style
in CSL, so you insert the citation using a GUI, then it gets translated
into kcite form at publication. See,
http://blog.fuzzierlogic.com/archives/529, for more info.

All of this sounds fairly complex. And at a coding level it does require
some plugging together. But, I would argue, this places the load on the
shoulders of the developer. The authors should just be able to use
latex, word, whatever the hell they want. I don't believe in retraining
people to use wordpress; I believe in coding wordpress to suit people.
For the authors, this should be seamless.

> ### regarding d)
>
> This is more fundamental, I think.
>
> But first: You asked "processed by whom?" I guess "by anyone" is my
> answer. That is, if somebody has the html, they should find a unique
> identifier to reconstruct what was cited (and possibly why).


Okay. I think that this is straight-forward. Kcite does this by dropping
the DOI or pubmed ID in place, into the HTML. The reader also gets to
see a visual form of this.
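
To make "by anyone" concrete, here is a minimal Python sketch of how a reader with nothing but the HTML could recover the cited identifiers. It assumes citations appear as links to a DOI resolver (`doi.org` or `dx.doi.org`); the markup kcite actually emits may differ, and the names `DoiCollector` and `extract_dois` are made up for this example.

```python
import re
from html.parser import HTMLParser

# Assumption: each citation is a link to a DOI resolver.
DOI_HREF = re.compile(r"^https?://(?:dx\.)?doi\.org/(10\.\d{4,9}/\S+)$")


class DoiCollector(HTMLParser):
    """Collects DOIs from the href attributes of <a> tags."""

    def __init__(self):
        super().__init__()
        self.dois = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                match = DOI_HREF.match(value)
                # Keep first-seen order and skip repeated citations.
                if match and match.group(1) not in self.dois:
                    self.dois.append(match.group(1))


def extract_dois(html):
    """Return the DOIs linked from an HTML fragment, without duplicates."""
    collector = DoiCollector()
    collector.feed(html)
    collector.close()
    return collector.dois


POST_HTML = (
    '<p>See <a href="http://dx.doi.org/10.1000/example.doi">[1]</a> '
    'and <a href="http://example.org/">a blog</a>.</p>'
)
found = extract_dois(POST_HTML)
```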


> ### regarding kcite's future features.
>
> A wish list from a mathematician's point of view.
>
> arXiv and MathReviews are critical, zentralblatt also comes to mind.
> MR is paywalled, however the look-up is free. So at least it's only
> bad for the author...

Kcite is (clunkily) pluggable at the moment, but this needs to get
better. Ultimately, I don't want to spend my life writing a data
integration tool, though. A "random ID to JSON" service would be
fantastic, and enable the data integration to happen elsewhere.
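
The first step of such a "random ID to JSON" service would be working out what kind of identifier it has been handed. The Python sketch below covers only that dispatch step and makes no network calls. The patterns are rough assumptions (a DOI starts with `10.`, a PubMed ID is a short run of digits, arXiv ids look like `1110.1234` or `math/0211159`), and the function name `classify_identifier` is invented for the example.

```python
import json
import re

# Order matters: the digits-only PubMed pattern is the loosest, so it goes last.
PATTERNS = [
    ("doi", re.compile(r"^10\.\d{4,9}/\S+$")),
    ("arxiv", re.compile(r"^(?:\d{4}\.\d{4,5}|[a-z-]+(?:\.[A-Z]{2})?/\d{7})(?:v\d+)?$")),
    ("pubmed", re.compile(r"^\d{1,8}$")),
]


def classify_identifier(identifier):
    """Guess which source an identifier belongs to (rough heuristics only)."""
    cleaned = identifier.strip()
    for source, pattern in PATTERNS:
        if pattern.match(cleaned):
            return {"source": source, "id": cleaned}
    return {"source": "unknown", "id": cleaned}


as_json = json.dumps(classify_identifier("12345678"), sort_keys=True)
```

A real service would follow this with a lookup against the matching provider and return the full record as JSON; that part is deliberately left out here.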

>
> Papercite uses bib2tpl which offers nice output options. Is there any
> chance kcite might get similar output? Also, teachpress has a really
> nice reference management interface.
>


Not used bib2tpl, but kcite is getting citeproc-js added (rendering will
be client side). This should offer plenty of output options.


> ### regarding "professional reference management tools"
>
> This is, as you pointed out, a messy subject. Somehow, I feel that
> Google Scholar, Mendeley etc give me the kind of output I'd expect
> these days when I see a button like "All 5 Versions". I don't know
> what's right here.
>
> Most services focus exclusively on plain journal data whereas I could
> easily hope for a future where a preprint, a published version, an
> expository blogpost, slides from a talk, a conference poster, a
> youtube video, all constitute part of a single reference.
>
> Of course, this is too much to hope for, but I can dream...

Inserting a reference that just does a Google lookup might achieve
something like this!


> ### regarding "actually referenced citations" etc
>
> This part is very important to me but was a bit of a mess -- my
> apologies. It connects to the larger issue of content integrity.
>
> What I meant with "actually referenced" was simply the problem that
> some plugins (kcite, zotpress) use outside services. This is fine, but
> seems to me to risk the integrity of the content.

It depends what you mean by "the content". The underlying identifier is
accessible with kcite, so you can always get back to this. Yes, it is
true that, if crossref change their metadata, the kcite generated
bibliography will change. But then perhaps this means that it is now
correct.

> So I wrote "hardcoded" for "guaranteeing content integrity", as
> compared to dynamically generated content from citation plugins such
> as papercite that freshly convert a shortcode to generate the actual
> html whenever the page is displayed.
>
> For me "content integrity", i.e., the expectation that a post's html
> cannot change unless somebody explicitly edits the post, seems vital
> -- this is nothing special about citation plugins.

Again, this is more complex. We need to be clear about what can change and
what cannot, rather than saying nothing can change. For instance, what
if someone adds a comment to a post? What if the theme changes? Should
authors be able to add a supplementary erratum? What if the journal gets
sued and has to remove content?

It would be nice to have a simple rule ("Nothing must change"). But the
world isn't simple.

> Finally, this would also greatly help with integrity in terms of
> export: anybody with the html of a post should be able to get all
> abstract metadata back out of it; also, tools like Martin's epub
> plugin could benefit greatly from this.

I agree totally with this. Actually, it's one of the advantages of having
client-side rendering. All of the information is in the post or
accessible from the web.

Phil

Peter Krautzberger

Nov 24, 2011, 11:23:57 PM
to Wordpress for Scientists
[Oh dear, I never replied?]

@Phil, thanks for your insightful replies.

What I meant with "no change": I just want to make sure that things do
not change by automated actions without the author or editor
interacting. I really like the markdown-on-save plugin as an example.

But I'm not too zealous about this...

Thanks again,
Peter.

Phillip Lord

Nov 25, 2011, 7:53:46 AM
to wordpress-fo...@googlegroups.com

I think that automated changes should only happen at the presentation
layer. So, what the reader sees might change, but there should be some
"author source" which is exactly and only what the author typed.
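
A minimal Python sketch of that split, under stated assumptions: the author source is a string containing `[cite]...[/cite]` shortcodes, its SHA-256 digest stands in for "exactly and only what the author typed", and rendering is a pure function of the source plus whatever metadata is currently available. The names `source_digest` and `render`, and the sample label, are hypothetical.

```python
import hashlib
import re

SHORTCODE = re.compile(r"\[cite\](.*?)\[/cite\]")


def source_digest(author_source):
    """Fingerprint of the author source; changes only if the text is edited."""
    return hashlib.sha256(author_source.encode("utf-8")).hexdigest()


def render(author_source, metadata):
    """Presentation layer: replace each shortcode with a display label.

    Falls back to the bare identifier when no metadata is known for it.
    """
    def replace(match):
        identifier = match.group(1)
        return "(%s)" % metadata.get(identifier, identifier)

    return SHORTCODE.sub(replace, author_source)


source = "As shown in [cite]10.1000/example.doi[/cite]."
digest_before = source_digest(source)

# The same source rendered with and without (hypothetical) provider metadata.
plain = render(source, {})
labelled = render(source, {"10.1000/example.doi": "Doe 2011"})
digest_after = source_digest(source)
```

What the reader sees differs between `plain` and `labelled`, while the digest of the author source stays the same, which is the property argued for above.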

Phil

Peter Krautzberger

Nov 25, 2011, 5:17:47 PM
to Wordpress for Scientists
That makes a lot of sense to me. It would keep the reader updated but
allow for correction if the content becomes compromised.
