serials series journals


Matthew Person

Nov 17, 2009, 5:42:16 PM
to taxo...@googlegroups.com
Greetings:
At the BHL Staff meeting at the Smithsonian NHM 2 weeks ago,
we were invited to join this group to keep up on wider developments related to our work;
I noticed conversations this week in this group discussing definitions of serials, journals, and series.

Just to let you know: there are lots of BHL librarians out there who deal with the above
relationships every day and would be pleased to contribute to such discussions if you
need that perspective in your projects.

Cordially,
Matt Person
MBL:WHOI Library

--
>)))'>
         >))'>
                      >)'>
Matthew Person
Tech Services Coordinator
MBLWHOI Library  
www.mblwhoilibrary.org
Woods Hole, Massachusetts, USA
              >>}}}'>
>>}}}'>
MBLWHOI is a partner in the
Biodiversity Heritage Library
www.biodiversitylibrary.org


                     <'(<

Stephen Thorpe

Nov 17, 2009, 5:45:56 PM
to Matthew Person, taxo...@googlegroups.com
Hi Matt: very true, but I think these people want to reinvent the wheel for themselves! This issue is trivial anyway, relative to the major issues in biodiversity informatics, foremost among which is the global decline in funding basic taxonomic research. Let's not forget the bigger picture ...
 

From: Matthew Person [mpe...@mbl.edu]
Sent: Wednesday, 18 November 2009 11:42 a.m.
To: taxo...@googlegroups.com
Subject: [TaxonLit] serials series journals

--

You received this message because you are subscribed to the Google Groups "Taxonomic Literature" group.
To post to this group, send email to taxo...@googlegroups.com.
To unsubscribe from this group, send email to taxonlit+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/taxonlit?hl=.

Richard Pyle

Nov 17, 2009, 6:32:23 PM
to taxo...@googlegroups.com
Many thanks, Matt! Help is always appreciated, of course!



Contrary to Stephen's assertions, "these people" have tried desperately (for years) to specifically *not* re-invent any wheels. Most of us had always assumed that solutions to meet our (biodiversity community) needs for literature would emerge from the Library community. Alas, they have not. There are several reasons, I think, for this disparity between the needs of the biodiversity community and the existing products of the library community, but I won't dwell on those now. Suffice it to say that most of the post-TDWG threads on this list deal with those specific areas of disparity.



So ... the core group who spent several sessions and a couple of dinner conversations discussing a literature exchange schema for this (biodiversity) community at the recent TDWG meeting (which included Anna Weitzman [SI], Chris Lyal [NHM], Chris Freeland [BHL], Cathy Norton [BHL], Guido Sautter [Plazi], Paul Kirk [CABI/Index Fungorum], Dauvit King, and several others) decided to stick as closely as possible to an existing standard (which Chris Freeland recommended as the EndNote standard), and extend it to meet the specific needs of the biodiversity community.



I'll leave it to Chris F. and/or Cathy to comment more on their own perspectives on this "gap" between the library community & the biodiversity community. But, obviously, it would be very, very helpful to see how the library community has dealt with such issues as reference granularity below the "Article" level, formatting of certain words and characters within titles, precision dating, and parsing of journal/series titles (with multiple, simultaneously legitimate renderings); and other aspects of specific interest to taxonomists and the biodiversity community in general.



Aloha,

Rich



Richard L. Pyle, PhD

Database Coordinator for Natural Sciences

and Associate Zoologist in Ichthyology

Department of Natural Sciences, Bishop Museum

1525 Bernice St., Honolulu, HI 96817

Ph: (808)848-4115, Fax: (808)847-8252

email: deep...@bishopmuseum.org

http://hbs.bishopmuseum.org/staff/pylerichard.html

 

 

 

 


Stephen Thorpe

Nov 17, 2009, 7:02:33 PM
to Richard Pyle, taxo...@googlegroups.com
Aloha Rich, et al.

>Contrary to Stephen's assertions, "these people" have tried desperately (for years) to specifically *not* re-invent any wheels
Actually, there is only one assertion here, not plural! :)

> (biodiversity community) needs for literature
I guess I just don't understand what those "needs" are, and why they seem to differ so very dramatically from traditional methods of literature information exchange, viz. simply citing references in the traditional way. I just don't see this issue as one that we should be wasting much time on at this point in time. The priority should be more on making biodiversity information available to as wide an audience as possible, as simply as possible, and in a pragmatically useful form. This equates to more time actually making the info available, and less time obsessing about minor details of formatting. But maybe I'm wrong ...

All I want from a literature citation is preferably a link to a PDF, or failing that a DOI or other link to an abstract. This is my approach on Wikispecies ...

Stephen

________________________________________
From: Richard Pyle [deep...@bishopmuseum.org]
Sent: Wednesday, 18 November 2009 12:32 p.m.
To: taxo...@googlegroups.com
Subject: RE: [TaxonLit] serials series journals

Chris Freeland

Nov 18, 2009, 1:02:26 AM
to Stephen Thorpe, Richard Pyle, taxo...@googlegroups.com
Stephen, "simply citing literature in the traditional way" for taxonomy requires accurate citation information to reach the goal (and no one will argue here) for pragmatic & useful data.  There is often a disconnect between the way the library community has catalogued a resource vs. the way it has been cited by scholars, making the simple goal of connecting a citation to a PDF a non-trivial exercise, given variabilities in series, serial name changes over time, etc.  It's not an issue of formatting, but one of accurate resource description when trying to match up a citation in a scholar's bibliography to online content.

Chris

Stephen Thorpe

Nov 18, 2009, 1:19:54 AM
to Chris Freeland, Richard Pyle, taxo...@googlegroups.com
I see, but I deal with taxonomic literature literally all the time, and I would say that any such problems of "disconnect" are the exception rather than the rule. The best way of matching a cited reference to online content is simply to link it to that online content. I still don't see a real problem here ...
 

From: Chris Freeland [cfree...@gmail.com]
Sent: Wednesday, 18 November 2009 7:02 p.m.
To: Stephen Thorpe
Cc: Richard Pyle; taxo...@googlegroups.com
Subject: Re: [TaxonLit] serials series journals

Roderic Page

Nov 18, 2009, 1:49:24 AM
to Stephen Thorpe, Chris Freeland, Richard Pyle, taxo...@googlegroups.com
Dear Stephen,

In many ways I agree with your sentiments: I want a simple link between name and publication (ideally an identifier such as a DOI that links to the article and metadata about that article), and am appalled by how little traction this idea has within our community (no major taxonomic database makes use of DOIs, for example). For a lot of literature, especially more recently published papers, or content being digitised by commercial publishers or organisations such as JSTOR, making these links is relatively straightforward.

However, for literature being scanned en masse by BHL, for example, this is a non-trivial exercise, and I think this is where the current discussion is relevant. For example, the Wikispecies page http://species.wikimedia.org/wiki/Scutocyamus has the citation:

Lincoln, R.J.; Hurley, D.E. 1974: Scutocyamus parvus, a new genus and species of whale-louse (Amphipoda: Cyamidae) ectoparasitic on the North Atlantic white-beaked dolphin. Bulletin of the British Museum (Natural History), zoology, 27(2): 59-64.

How do I find a link/identifier for this publication? Turns out, this paper has been scanned by BHL; you can see it here: http://biodiversitylibrary.org/page/2261319#91. For an alternative display of this article, see http://iphylo.org/~rpage/bhl/viewer.php?PageID=2261319 . Finding the Lincoln and Hurley article in BHL is, at present, not a task that can be done easily by computer; a person has to go hunting for it manually.

The trick is how do I make going from the text citation "Lincoln, R.J.; Hurley, D.E. 1974: Scutocyamus parvus ..." to http://biodiversitylibrary.org/page/2261319#91 easy and ideally automated? One approach is to parse the citation into its parts (journal, volume, etc.) and use an OpenURL resolver to find the reference. This resolver would need to have a database of articles, and be able to understand that "Bulletin of the British Museum (Natural History), zoology" is the same as "Bull. Brit. Mus. (Nat. Hist.) Zool." (for example). I think this discussion about journal titles is about providing the data to help make this possible.
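
To make the parse-then-resolve idea concrete, here is a minimal sketch in Python. It is an illustration only, not how bioGUID or BHL actually do it: the regular expression handles just the one citation layout used in the Wikispecies example, the resolver address (resolver.example.org) is hypothetical, and the query keys (genre, aulast, atitle, title, volume, issue, spage, epage, date) are assumed to follow the older OpenURL key/value convention.

```python
import re
from urllib.parse import urlencode

# Hypothetical resolver address; not a real service.
RESOLVER = "https://resolver.example.org/openurl"

# Matches only citations shaped like the Wikispecies example:
# "Authors YEAR: Article title. Journal, VOL(ISSUE): START-END."
CITATION = re.compile(
    r"^(?P<authors>.+?)\s+(?P<date>\d{4}):\s+"
    r"(?P<atitle>.+?)\.\s+"
    r"(?P<title>.+),\s+"
    r"(?P<volume>\d+)\((?P<issue>\d+)\):\s+"
    r"(?P<spage>\d+)-(?P<epage>\d+)\.?$"
)

def parse_citation(text):
    """Split a citation string into named parts, or return None."""
    match = CITATION.match(text.strip())
    if match is None:
        return None
    parts = match.groupdict()
    # The first author's surname is the text before the first comma.
    parts["aulast"] = parts.pop("authors").split(";")[0].split(",")[0].strip()
    return parts

def openurl_query(parts):
    """Build an OpenURL-style key/value query from the parsed parts."""
    params = {"genre": "article", **parts}
    return RESOLVER + "?" + urlencode(params)

citation = (
    "Lincoln, R.J.; Hurley, D.E. 1974: Scutocyamus parvus, a new genus and "
    "species of whale-louse (Amphipoda: Cyamidae) ectoparasitic on the North "
    "Atlantic white-beaked dolphin. Bulletin of the British Museum "
    "(Natural History), zoology, 27(2): 59-64."
)
parts = parse_citation(citation)
url = openurl_query(parts)
```

The parsing step is the easy part. The resolver behind the URL still has to recognise that the journal title in the query and its abbreviated forms are the same serial, which is why the journal-title data matters.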

Regards

Rod



---------------------------------------------------------
Roderic Page
Professor of Taxonomy
DEEB, FBLS
Graham Kerr Building
University of Glasgow
Glasgow G12 8QQ, UK







Richard Pyle

Nov 18, 2009, 10:53:39 AM
to taxo...@googlegroups.com

I'm in full agreement with both Chris and Rod on these points. When DOIs
exist, they should absolutely be used. Where they don't exist, I think it
would be wonderful if they could somehow be assigned.

I think one of the peculiarities of what I keep referring to as "our
community" (which includes taxonomy) is that literature remains relevant for
centuries. The vast, vast majority of citations we want to resolve into
page images do not have DOIs assigned (yet). Moreover, many of the objects
we wish to be able to cite (e.g., so-called "microcitations" and/or
individual taxonomic treatments within an article or other
traditionally-cited unit of documentation; one-off documents such as field
notes and such; historical newspaper articles; unpublished manuscripts;
etc.) are not really conducive to identification/resolution via DOIs.
Therefore, I see a need for a system of assigning reusable GUIDs to this
broader scope and higher granularity of documents. (Similar issues apply to
ISSNs and ISBNs for journals and books.)

Also, I think Rod's point in his last paragraph is spot-on. In GNA-land, we
arrived at an architecture with two main domains: GNI for "names as text
strings", and GNUB for "names as metadata-rich curated data objects". Chris
Freeland and I independently saw how this same basic architecture could work
very well for literature citations as well. That is, having a sort of "GRI"
(Global References Index), which we referred to as a "Dirty Bucket", into
which we can freely dump text-string citations in whatever form they
currently exist; and then build parsing and fuzzy-matching algorithms and
services to reconcile these text-string citations against what we called a
"Clean Bucket" -- a sort-of "BRCB" (Biodiversity Reference Citation Bank;
analogous to GNUB), which would issue the broad-scope, high-granularity
GUIDs. That way, when I dump the contents of my database (which may include
the journal "Bulletin of the British Museum (Natural History), zoology")
into the GRI, and someone else dumps another database (which may include the
journal "Bull. Brit. Mus. (Nat. Hist.) Zool."), and yet another contributed
database from another source (which may include the journal "Bull Brit Mus
Nat Hist, Zool"), these can all be reconciled against each other, and
cross-linked to the GUID for a single "clean" citation record in the "BRCB".
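
As a rough sketch of the dirty-bucket/clean-bucket reconciliation described above, the Python below links the three renderings of the British Museum bulletin to one clean record. Everything in it is an assumption made for illustration: the matching rule (drop a few stop words, then require each word to be a prefix of its counterpart), the function names, and the identifier "brcb:journal-1" are all hypothetical, and a real service would need far more robust fuzzy matching.

```python
import re

# Small, assumed stop-word list; real title matching would need more care.
STOPWORDS = {"of", "the", "and", "for"}

def tokens(title):
    """Lower-case the title and keep only its significant words."""
    words = re.findall(r"[a-z]+", title.lower())
    return [w for w in words if w not in STOPWORDS]

def same_journal(a, b):
    """True if every word in one title abbreviates its counterpart."""
    ta, tb = tokens(a), tokens(b)
    if len(ta) != len(tb):
        return False
    return all(x.startswith(y) or y.startswith(x) for x, y in zip(ta, tb))

def reconcile(dirty_bucket, clean_bucket):
    """Map each raw text string to the identifier of a clean record, or None."""
    links = {}
    for text in dirty_bucket:
        links[text] = next(
            (guid for guid, title in clean_bucket.items()
             if same_journal(text, title)),
            None,
        )
    return links

# Hypothetical identifier standing in for a GUID issued by the clean bucket.
clean_bucket = {
    "brcb:journal-1": "Bulletin of the British Museum (Natural History), zoology",
}
dirty_bucket = [
    "Bulletin of the British Museum (Natural History), zoology",
    "Bull. Brit. Mus. (Nat. Hist.) Zool.",
    "Bull Brit Mus Nat Hist, Zool",
    "Proc. Linn. Soc. New South Wales, Ser. 2",
]
links = reconcile(dirty_bucket, clean_bucket)
```

A prefix rule like this is deliberately naive: it would wrongly merge unrelated titles whose words happen to share prefixes, so its output is better treated as a list of candidates for a curator to confirm than as a final answer.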

Speaking from experience, manually reconciling these sorts of things can be
incredibly tedious and time-consuming. The example in the above paragraph
is unusual in how clean it is, and how easily and confidently the different
renderings of the same journal can be reconciled. More typical examples include things
like:

Proceedings of the Linnean Society of New South Wales (Series 2)
Proceedings of the Linnean Society of New South Wales, Second Series
Proceedings of the Linnean Society of New South Wales. Linnean Society of
New South Wales
Proc. Linn. Soc. New South Wales, Ser. 2

Are these the same? Probably (except not sure about the third one).

The services of the sort developed by Rod for bioGUID
(http://bioguid.info/services/) are *exactly* the sort of thing we need --
but what's missing is a global repository for citations-as-text-strings
(dirty bucket) to serve as the raw material for variants in how citations
are formatted; and a well-curated GUID-issuing "master list" (clean bucket)
to anchor the morass of text-strings to (and to allow reconciliation among
datasets).

Aloha,
Rich



Stephen Thorpe

Nov 18, 2009, 3:08:39 PM
to Roderic Page, Chris Freeland, Richard Pyle, taxo...@googlegroups.com
On the subject of DOIs, perhaps someone can answer a question that is perplexing me greatly? Wikispecies has a DOI template which works fine in most cases. However, some publishers are regularly citing DOIs for articles which don't work! For example, see the following Wikispecies page:
I have cited DOIs, but the links don't work, so I have also had to link to the abstract pages using URLs
 
Stephen
 

From: Roderic Page [r.p...@bio.gla.ac.uk]
Sent: Wednesday, 18 November 2009 7:49 p.m.
To: Stephen Thorpe
Cc: Chris Freeland; Richard Pyle; taxo...@googlegroups.com

Roderic Page

Nov 18, 2009, 4:45:10 PM
to Stephen Thorpe, Chris Freeland, Richard Pyle, taxo...@googlegroups.com
Dear Stephen,

Some DOIs are broken, and some publishers break them more than others. You can report broken DOIs to CrossRef (I've done this for the two on the page you gave) and they're usually pretty good at getting them fixed. The positive spin to put on this is that, unlike biodiversity projects, CrossRef have the infrastructure and motivation to fix these, whereas some of our major projects have literally millions of identifiers that are broken (Catalogue of Life 2009, I'm looking at you).

Regards

Rod



Stephen Thorpe

Nov 18, 2009, 4:49:45 PM
to Roderic Page, Chris Freeland, Richard Pyle, taxo...@googlegroups.com
Thanks for that. I suspect that these DOIs were never registered, and that some publishers list DOIs long before attempting to register them, which I consider to be a "bad habit"!
 

From: Roderic Page [r.p...@bio.gla.ac.uk]
Sent: Thursday, 19 November 2009 10:45 a.m.

Stephen Thorpe

Nov 18, 2009, 5:13:25 PM
to Roderic Page, Chris Freeland, Richard Pyle, taxo...@googlegroups.com
The really tricky issues regarding the citation of journal articles involve:
 
(1) disparity between nominal and actual date of publication
 
(2) a journal can be in volumes only, volumes and issues (each issue starting again at p.1, or not), numbers, or simply marked by (nominal) year
 
These issues probably cause the most confusion, and determining actual publication dates is probably the hardest problem of all (being based on external evidence which may be evaluated differently by different people)
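
One way to picture what a citation record would have to carry to cope with points (1) and (2) is sketched below in Python. The class and field names are hypothetical and the example values are placeholders, not a real reference; the point is only that nominal and actual dates are stored separately, with the evidence noted, and that volume, issue and page restarts are all optional.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class ArticleCitation:
    journal: str
    nominal_year: int                     # year printed on the issue
    actual_date: Optional[date] = None    # date established from external evidence
    date_evidence: str = ""               # what supports actual_date, and whose judgement
    volume: Optional[str] = None          # None for journals marked by year only
    issue: Optional[str] = None           # None for journals in volumes only
    pages_restart_each_issue: bool = False
    start_page: Optional[int] = None

    def effective_year(self) -> int:
        """Prefer the actual publication year when one has been established."""
        return self.actual_date.year if self.actual_date else self.nominal_year

    def locator(self) -> str:
        """Shortest pointer that still identifies the first page unambiguously."""
        if self.volume is None:
            where = str(self.nominal_year)
        elif self.issue is not None and self.pages_restart_each_issue:
            # The issue number is needed because page 1 occurs in every issue.
            where = f"{self.volume}({self.issue})"
        else:
            where = self.volume
        return f"{where}: {self.start_page}"

# Placeholder values, not a real reference.
late_issue = ArticleCitation(
    journal="Hypothetical Journal",
    nominal_year=1899,
    actual_date=date(1900, 1, 15),
    date_evidence="placeholder note on the external evidence",
    volume="12",
    issue="3",
    pages_restart_each_issue=True,
    start_page=1,
)
```

Because different people may weigh the external evidence differently, the sketch keeps the evidence next to the date instead of overwriting the nominal year.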
 
Also, there is the distinctly 21st century problem of "bogus" journals, claiming to be published in print as well as electronic, but who knows???
 
Stephen
 

From: Roderic Page [r.p...@bio.gla.ac.uk]

Sent: Thursday, 19 November 2009 10:45 a.m.

Kehan Harman

Nov 19, 2009, 5:53:31 AM
to Stephen Thorpe, Roderic Page, Chris Freeland, Richard Pyle, taxo...@googlegroups.com
I think we definitely shouldn't ignore the excellent work present in
both BPH and TL2 for botanical publications - I don't know how the
respective authors have done it but they have dug up the history of
pretty much every publication of taxonomic relevance and found the
actual date published.
Cheers,
Kehan


On Wed, Nov 18, 2009 at 10:13 PM, Stephen Thorpe
kehan...@gmail.com
http://kehan.wordpress.com
skype: kehanharman
msn: kehan...@gmail.com

Richard Pyle

Nov 19, 2009, 10:10:09 AM
to taxo...@googlegroups.com

Yes! Definitely! Speaking of which, someone at TDWG (Chris?) mentioned
that these are now available electronically. How do we get access?

Aloha,
Rich

D.J.King

Nov 19, 2009, 10:25:41 AM
to Richard Pyle, taxo...@googlegroups.com
Rich, try http://asaweb.huh.harvard.edu:8080/databases/publication_index.html for a straightforward search for a publication.

There's a big EU bid being developed (you might have spotted various groups of people at TDWG talking about it) that will include bibliographic aggregators with de-duplication etc in work package 7. Let's hope the money is forthcoming...

Cheers,
Dauvit.

Chris Freeland

Nov 19, 2009, 4:05:27 PM
to Richard Pyle, <taxonlit@googlegroups.com>
Both TL2 & BPH are available online, but both are somewhat closed at
this point. BHL has started negotiations with the publishers &
editors of TL2, and it's moving forward, but with no definite timeline
yet.

Both of these are botany-centric (though not exclusively). Are Index
Animalium & ZooRecord our best place for non-plant series lists?
Maybe we should start a page on the Google Group to list these
incredibly important indices & work out strategies to get them into
indexable forms. I'd do this myself, but I'm at ORD in Chicago
en route to STL after 17 hrs of travel from Prague for BHL-E mtgs.

Chris Freeland

On Nov 19, 2009, at 9:10 AM, "Richard Pyle"

Stephen Thorpe

Dec 18, 2009, 8:45:20 PM
to psch...@univ-montp2.fr, taxo...@googlegroups.com
Well, life would be easier if anybody who discovers a link to a reference would take 5 min. to add it to the relevant Wikispecies page(s) ...

________________________________________
From: "Peter A. Schäfer" [Peter....@univ-montp2.fr]
Sent: Friday, 18 December 2009 10:09 p.m.
To: Stephen Thorpe; taxo...@googlegroups.com


Subject: Re: [TaxonLit] serials series journals

Hi,
another nice example of difficult open access:
I have searched for some time for the following protologue
> Polygonum flagelliforme Loisel. (1827) Mémoires Soc. Linn. Paris vol. 6 page 409

without success, but for several months/years the relevant article has been
on-line at http://bibdigital.rjb.csic.es/ing/index.php

unfortunately not in the list of periodic publications but under titles:
> Nouvelle notice Sur les plantes à ajouter à la Flore de France... ,
which of course is not mentioned on IPNI

and fortunately under authors:
>
> Loiseleur-Deslongchamps, Jean-Louis-Auguste
> Nouvelle notice Sur les plantes à ajouter à la Flore de France..., 1827

I am also lucky as this is not a reprint with different pagination but
apparently an extract from the journal with normal pagination (396-432)
and my protologue starts at the bottom of 409 and continues on 410.

I guess the explanation for these different citations is reprints.
Formerly taxonomists would get reprints in their speciality and would
cite them as "Author, title" and not necessarily mention the journal but
modern standards are "Author, journal".

Good luck in linking all that together!
Best wishes
Peter
Peter A. Schäfer (MPU)

Dean Pentcheff

Dec 19, 2009, 10:30:25 AM
to taxonlit
This kind of (pardon me) reference fussiness is exactly the type of
thing that motivates me to think:

1. Bibliographic references are really more than just pointers to the
original pages. They contain information about those original
publications that goes beyond just a way to get the page.

2. Any definitive bibliographic system for taxonomy will need to
retain the information about all changes to the reference content,
including the person who made the change and why it was made.

In this instance, just correcting the page numbers wouldn't help. The
next researcher who (unknowingly) was looking at an original of the
reprint would justifiably "correct" the page numbers back again,
unless there was a change record containing this explanation for the
corrected page numbers.
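
A minimal Python sketch of what such a change record might look like follows. The class names, field names and the "before" page range are hypothetical, chosen only to illustrate the page-number case: every update stores the old value, the new value, who made it and why, so a later "correction" back to the reprint pagination can be checked against the explanation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Change:
    field_name: str
    old: str
    new: str
    who: str
    why: str
    when: datetime

@dataclass
class Reference:
    data: dict
    history: list = field(default_factory=list)

    def update(self, field_name, new, who, why):
        """Change one field and keep a record of the change."""
        old = self.data.get(field_name, "")
        self.history.append(
            Change(field_name, old, new, who, why, datetime.now(timezone.utc))
        )
        self.data[field_name] = new

# The "1-37" starting value is a made-up reprint pagination for illustration.
ref = Reference({
    "author": "Loiseleur-Deslongchamps, Jean-Louis-Auguste",
    "year": "1827",
    "pages": "1-37",
})
ref.update(
    "pages",
    "396-432",
    who="hypothetical curator",
    why="previous pagination came from a reprint; journal pagination is 396-432",
)
```

The next person who sees only the reprint can then read `ref.history` before deciding whether the stored pages are wrong.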

Why haven't I added links to relevant references at Wikispecies?
Because I have thousands of them. Unless someone were to pay me for
the 5 minutes apiece it would take to make the entries, I can't do it.
Even if someone _did_ offer to pay me to do it, I'd be damned
reluctant to do it because I've already done that linking work once.
My time would be better spent working toward a Grand Unified Taxonomic
Reference & Paper system into which I could deposit my
references+PDFs, along with everyone else's.

That's a core problem I think we're trying to solve here. Trying to
create a major component for taxonomic work (covering references and
papers) so that no one has to do that kind of crappy clerical work
more than once, and even better, only one poor bugger per reference
has to do it at all. No more "everyone makes their own reference
collection" and no more "oh, won't you just take a few minutes of your
time [times 3000] to update just one more slightly-different online
repository with your valuable data?".

One central place for papers and their metadata, then everyone gets to
feed from that.

-Dean
--
Dean Pentcheff
pent...@gmail.com

2009/12/18 Stephen Thorpe <s.th...@auckland.ac.nz>:

Richard Pyle

Dec 19, 2009, 11:13:48 AM
to Dean Pentcheff, taxonlit
> One central place for papers and their metadata, then
> everyone gets to feed from that.

This is *exactly* my vision as well! Right now, we're starting with the
metadata (essentially the Card Catalog of Biodiversity literature +
literature-like stuff). I'd love to see the PDF repository built around it,
but the BHL page images of pre-copyright literature are a HUGE step in the
right direction -- which is why it's so important that big initiatives that
rely on literature citations (I'm thinking right now of GNUB and BHL -- but
it would be great to include wikispecies and any number of other
taxon-literature initiatives) are built upon the same infrastructure (rather
than built as separate but cross-linked databases), and thus are
automatically cross-linked.

Rich


Dean Pentcheff

Dec 19, 2009, 11:52:57 AM
to Richard Pyle, taxonlit
Yes. And I think this is the right way to go about it -- start with
the bibliographic metadata, then worry about the PDFs. The metadata
repository entries can point anywhere to the PDF blobs. The hard part
is getting the bibliographic metadata in order and in one place.

-Dean
--
Dean Pentcheff
pent...@gmail.com

2009/12/19 Richard Pyle <deep...@bishopmuseum.org>:

Stephen Thorpe

Dec 19, 2009, 6:18:19 PM
to Dean Pentcheff, taxonlit
With reference to this bit:

> Why haven't I added links to relevant references at Wikispecies?
> Because I have thousands of them. Unless someone were to pay me for
> the 5 minutes apiece it would take to make the entries, I can't do it.
> Even if someone _did_ offer to pay me to do it, I'd be damned
> reluctant to do it because I've already done that linking work once.
> My time would be better spent working toward a Grand Unified Taxonomic
> Reference & Paper system into which I could deposit my
> references+PDFs, along with everyone else's.
>
> That's a core problem I think we're trying to solve here. Trying to
> create a major component for taxonomic work (covering references and
> papers) so that no one has to do that kind of crappy clerical work
> more than once, and even better, only one poor bugger per reference
> has to do it at all. No more "everyone makes their own reference
> collection" and no more "oh, won't you just take a few minutes of your
> time [times 3000] to update just one more slightly-different online
> repository with your valuable data?".

There is a fallacy of reasoning here! Say you have 3000 references. Sure, if YOU were to add all 3000 refs to Wikispecies, it would be a major undertaking, but presumably at least hundreds of other people have overlap with you on those 3000 refs? If 300 people each put 10 refs on Wikispecies, it would take approx 0.5-1 hr of their time for each person, and we would have 3000 refs added! Not having time to add all 3000 refs isn't really a good reason for not adding any! Wikispecies is the closest existing thing to a GUTR & P system. Make max use of what we already have ... Besides, with Wikispecies, you build up the taxonomic database simultaneously with the reference library, which saves time in the long run, and makes what is there more useful as soon as it is added ...

Stephen

________________________________________
From: taxo...@googlegroups.com [taxo...@googlegroups.com] On Behalf Of Dean Pentcheff [pent...@gmail.com]
Sent: Sunday, 20 December 2009 4:30 a.m.
To: taxonlit

Dean Pentcheff

Dec 19, 2009, 11:10:09 PM
to Stephen Thorpe, taxonlit
[Sigh. Apologies in advance. Slight rant follows.]

OK. Let me try again.

I am currently the curator of the reference list for the
genus-and-above taxa of the Decapoda. It took me, with the active
collaboration of numerous highly knowledgeable systematists, years to
amass and error-check that list.

Yes, it overlaps with the reference lists of many other workers. Many
of them donated their reference lists to the project. We spent
unreasonable amounts of time resolving the trivial but important
differences between all those overlapping versions of the same
references. We (between us all) spent equally unreasonable amounts of
time finding the originals of those references to verify the
bibliographic information.

There is probably only a handful of references in the curated list
that are identical to what was submitted to us originally by
well-intentioned donors.

But here's the point: the list is now of very high quality. It is not
a dog's breakfast of some verified and some
fourth-generation-reference-list copies from someone's master's
thesis.

Am I about to claim that we now have the definitively correct version
of every reference? Absolutely not. But will I claim that we have the
best-verified and most likely to be correct collection of references
for that particular taxonomic slice of the pie? Yes.

That's why decapod biologists now use our list. That's why decapod
biologists now invest time in sending us corrections to the list.
Because they know that the reference list they download from us is of
higher quality than the one they could develop on their own.

Much though I'd love to claim that's because of the generosity of the
community in contributing to this wonderful central resource, that's
untrue. It's because the NSF, courtesy of the Decapoda AToL, paid us
to spend the time to do it. I don't have a salaried job where I could
choose to spend part of my time contributing to the community. I
worked as a consultant to (a) do the decapod literature; and (b) pay
my rent.

Now. You are asking me to find each taxon on Wikispecies for which we
have an authority reference and then patch in the bibliographic data.
You estimated five minutes per taxon. I think that's a reasonable
estimate. Some of them will be quick cut-and-pastes, but inevitably
some will end up being time-consuming investigations of possible
synonymies, corrections, confirmations, etc. Average five minutes per
update sounds reasonable.

When I've finished doing that, Wikispecies will be blessed with a
snapshot of today's version of our bibliographic work.

Do you think that when I get a correction to one of those references
I'll correct our database, then think "Oh. Right. Must trot off to
Wikispecies and fix that up, too."? No.

Before you suggest it: No, I also don't think it's a good idea for me
to go to Wikispecies and put in a link to our reference database or
website for each taxon or reference. I'm sitting on a stupid little
home made reference database running on a shoebox in a museum lab.
What happens when I quit? What happens every week when the museum
network goes wonky?

I will spend time on a centralized, curatable, change-tracking
taxonomic reference database because then I can put our references
there and quit being a half-assed database manager.

We need a centralized resource that can absorb and maintain the
scholarship and curation that it takes to make a professional quality
taxonomic bibliographic database. Until that time, you (and anyone
else) are welcome to download from or link to the work we do, but don't
ask us to plug it, piece by piece, into a system where we cannot
coherently maintain it. And honestly, I wouldn't advise any long-term
links to our resource: we're working as hard as we can to get someone
else to host it.

But to host it where it can be coherently maintained. That's critical
to get community buy-in. Wikipedia and Wikispecies are extraordinarily
wonderful tools whose strongest aspect is their ability to hoover in
unstructured contributions from all over the place. Where they're not
well suited is to curate a rigidly systematic and focused set of data
(e.g. references and associated papers) curated and maintained by
taxonomic fanatics.

-Dean
--
Dean Pentcheff
pent...@gmail.com

2009/12/19 Stephen Thorpe <s.th...@auckland.ac.nz>:

Stephen Thorpe

Dec 19, 2009, 11:19:29 PM
to Dean Pentcheff, taxonlit
Well, I'm glad you got that out of your system! My slight rant in return is about people who rant about things but put the critical bit right at the end, in an all too quick and short sentence, without sufficiently fleshing it out! Namely: 'Where they're [Wikispecies, etc.] not well suited is to curate a rigidly systematic and focused set of data (e.g. references and associated papers) curated and maintained by taxonomic fanatics'.
Why are they not well suited? They can't host the PDFs (though, possibly PDFs could be uploaded to the Wikimedia Commons, being "media files"), but otherwise I don't see a problem ...

Stephen

Dean Pentcheff

unread,
Dec 20, 2009, 9:58:48 AM12/20/09
to Stephen Thorpe, taxonlit
Hosting the PDFs isn't a problem, I think. Because, at base, they're
binary blobs, their hosting is pretty simple.

The reason I don't think the present Wiki* resources are well suited
to hosting a bibliographic database for taxonomy is that Wiki* systems
(in their current form) are openly-structured, loosely specified data
containers. Don't get me wrong: that's not a problem, that's their
strength. That's why they're so wildly successful at receiving and
presenting information in very flexible ways.

There's an intrinsic tradeoff between tightly specified systems that
enable complete and efficient implementation of well-specified
functionality vs. permissively specified systems that permit a broad
variety of uses.

Because the data content and functionality of bibliographic systems
are so constrained and specifiable, they are on the tight end of the
spectrum, whereas Wiki* lives on the permissive end of the spectrum.

We will want to be able to do things with a bibliographic system that
just won't be possible with references embedded in Wiki* pages:

"Show me all the publications by Mary Rathbun between 1913 and 1916,
but exclude anything published in the Proceedings of the United States
National Museum."

"Give me a reference list (in Endnote format) of all of the references
tagged "decapoda" that were verified for correctness by Sammy De
Grave."

"Show me all the references from the two journals "Bulletin du Muséum
national d’Histoire naturelle" and "Bulletin du Muséum d’Histoire
naturelle", sorted by year, so that I can correct any that are
misattributed before and after the name changed in 1907"

It's not fair to ask a Wiki* system to enable those sorts of queries.
If it could, it would have such a tightly constrained data model that
it would lose the flexibility that is the hallmark of a wiki.

But it is fair to ask a bibliographic database to implement that
functionality. We're in the process of constraining the data model as
tightly as possible (but no tighter!) so that we get the best
functionality in the very limited data domain of taxonomic
bibliographic references.
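
As a rough illustration of the first query above (the Rathbun one), here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, journal names other than the Proceedings, and every row are hypothetical placeholders invented for this example; none of them comes from the actual schema under discussion, and none describes a real publication.

```python
import sqlite3

# Hypothetical schema, for illustration only. A real bibliographic
# schema would normalise authors and journals into their own tables.
SCHEMA = """
CREATE TABLE reference (
    id      INTEGER PRIMARY KEY,
    author  TEXT NOT NULL,
    year    INTEGER NOT NULL,
    journal TEXT NOT NULL,
    title   TEXT NOT NULL
);
"""

PUSNM = "Proceedings of the United States National Museum"

# Placeholder rows: none of these records describes a real publication.
ROWS = [
    (1, "Rathbun, M. J.", 1913, PUSNM, "Placeholder title A"),
    (2, "Rathbun, M. J.", 1914, "Hypothetical Journal X", "Placeholder title B"),
    (3, "Rathbun, M. J.", 1916, "Hypothetical Journal Y", "Placeholder title C"),
    (4, "Rathbun, M. J.", 1918, "Hypothetical Journal X", "Placeholder title D"),
    (5, "Another, A.", 1915, "Hypothetical Journal X", "Placeholder title E"),
]


def build_db():
    """Create an in-memory database holding the placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO reference VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def refs_by_author_excluding_journal(conn, author, first_year, last_year, excluded):
    """Return (id, year, journal, title) rows for one author within an
    inclusive year range, leaving out a single journal."""
    sql = """
        SELECT id, year, journal, title
        FROM reference
        WHERE author = ?
          AND year BETWEEN ? AND ?
          AND journal <> ?
        ORDER BY year, id
    """
    return conn.execute(sql, (author, first_year, last_year, excluded)).fetchall()


conn = build_db()
hits = refs_by_author_excluding_journal(conn, "Rathbun, M. J.", 1913, 1916, PUSNM)
for row in hits:
    print(row)
```

The point of the sketch is only that such a query is a few lines once author, year, and journal are separate, constrained fields. The same question asked of free-text references embedded in wiki pages would need every page to be parsed first.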

Another example -- none of the query examples I gave included anything
to do with a species or taxon. That's deliberate. I fully agree that
what we're building is not a taxon-to-reference database. We don't
want that sort of additional information -- other systems are being
built for that.

What we want is a highly specified, highly functional bibliographic
repository to which other systems can point. If some system has a
taxon name list, then that other system would do well to point to an
entity in the bibliographic system we're developing for its authority
reference.
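
A minimal sketch of that "pointing" arrangement, assuming the bibliographic system hands out stable identifiers. The identifier format ("ref:0001"), the field names, and all records are hypothetical and made up for this example.

```python
# Hypothetical bibliographic repository: identifier -> reference record.
# The records are placeholders, not real publications.
BIBLIO = {
    "ref:0001": {"author": "Author, A.", "year": 1900, "title": "Placeholder title"},
}

# Hypothetical name list kept by some OTHER system. It stores only a
# pointer to the authority reference, never a copy of the citation.
NAME_LIST = [
    {"name": "Examplegenus exemplaris", "authority_ref": "ref:0001"},
    {"name": "Examplegenus alter", "authority_ref": "ref:9999"},
]


def resolve_authority(name_record, biblio):
    """Look up the reference a name record points to (None if missing)."""
    return biblio.get(name_record["authority_ref"])


def dangling_links(name_list, biblio):
    """List the names whose pointer does not resolve in the repository."""
    return [n["name"] for n in name_list if n["authority_ref"] not in biblio]


for record in NAME_LIST:
    print(record["name"], "->", resolve_authority(record, BIBLIO))
print("Unresolved:", dangling_links(NAME_LIST, BIBLIO))
```

Because the name list holds only the pointer, a correction made once in the bibliographic repository is seen by every system that links to it.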

Rod Page

unread,
Dec 20, 2009, 12:31:47 PM12/20/09
to Taxonomic Literature
A couple of quick thoughts.

The first is that Wikis need not be unstructured, indeed Wikipedia has
considerable structure built in via templates, and Semantic Mediawiki
http://semantic-mediawiki.org/wiki/Semantic_MediaWiki has considerable
structure, and a query language built in as well. One strength
Semantic Mediawiki has is that the database schema in effect becomes
part of the wiki, and hence can be edited and modified as new
functionality becomes important.

There are practical issues (Semantic Mediawiki is something of a
hack), but once one goes down the route of openly editable, versioned,
queryable data, then one pretty much ends up developing something like
Semantic Mediawiki. So I think the choice boils down to the economics
of developing something from scratch, versus repurposing existing
software. Other wiki-style options include the software underlying
http://openlibrary.org/.

I also think we should look closely at other existing tools for
storing bibliographic metadata. Projects such as Zotero
(http://www.zotero.org) and Mendeley (http://www.mendely.com) have a lot of
traction (and money), and are likely to outlast and outperform
anything our community puts together in this space. If everybody
dumped their bibliographies into a central space such as Zotero or
Mendeley, and shared it, we'd have a massive data set to play with.

Lastly, personally I want names + literature, if for no other reason
than this is one of the ultimate goals, and taxonomic databases are a
major source of bibliographic metadata. I want to be able to browse
literature collections by author, taxon, and geography, and I'd
suggest projects like "CiteBank" (whatever that is) would get a lot
more enthusiastic support if it made such tools available.
Bibliographies by themselves are pretty lifeless, and actually not
much use. It's the links between the publications, their authors, and
the subjects they deal with that make them come alive.

Regards

Rod

Richard Pyle

unread,
Dec 20, 2009, 2:22:20 PM12/20/09
to Rod Page, Taxonomic Literature
> I also think we should look closely at other existing tools
> for storing bibliographic metadata. Projects such as Zotero (http://
> www.zotero.org) and Mendeley (http://www.mendely.com) have a
> lot of traction (and money), and are likely to outlast and
> outperform anything our community puts together in this
> space. If everybody dumped their bibliographies into a
> central space such as Zotero or Mendeley, and shared it, we'd
> have a massive data set to play with.

But would we have the freedom & access to implement the algorithms and
cross-linking that we wish to do for our community data? And are the Zotero
identifiers reliably persistent? And would any institution be able to
maintain a local mirror copy of the entire database, with features
to maintain it?

> Lastly, personally I want names + literature, if for no other
> reason than this is one of the ultimate goals, and taxonomic
> databases are a major source of bibliographic metadata. I
> want to be able to browse literature collections by author,
> taxon, and geography, and I'd suggest projects like
> "CiteBank" (whatever that is) would get a lot more
> enthusiastic support if it made such tools available.

99% of my motivation in pushing CiteBank is as a cross-link for GNUB. In
other words, I'm in this game for *exactly* the reasons you state above --
to cross-link to taxon names.

Rich


Stephen Thorpe

unread,
Dec 20, 2009, 4:42:47 PM12/20/09
to Rod Page, Taxonomic Literature
I think you are getting your Wikis a little mixed up, Rod. Wikispecies is more structured than Wikipedia, due to templates...

Rod Page

unread,
Dec 20, 2009, 5:50:55 PM12/20/09
to Taxonomic Literature
Both Wikispecies and Wikipedia use templates. By virtue of its much
larger subject coverage, Wikipedia has a much richer set of templates
than Wikispecies. Wikispecies makes more use of templates to structure
taxonomic hierarchy than does Wikipedia.

Rod

Rod Page

unread,
Dec 20, 2009, 5:55:36 PM12/20/09
to Taxonomic Literature
So long as we can get the bibliographic data out, then we can do whatever
we want with it. My point is that Zotero and Mendeley have finely
crafted interfaces, big user communities, and are under active
development. If part of the goal is to capture bibliographic metadata,
then we could do a lot worse than encourage people to make use of
these tools, while we concentrate on developing tools to merge, clean,
and identify references.

Rod

Stephen Thorpe

unread,
Dec 20, 2009, 6:23:28 PM12/20/09
to Dean Pentcheff, taxonlit
Well Dean, I guess we have a difference in focus. My focus is on using Wikispecies as a classified repository of taxonomic information, references, and images. So, the basic idea is that if you want to know the current "state of play" of a taxon, you look it up on Wikispecies (or Google it and then click on the Wikispecies link). So, it is taxon focused. I don't give a monkey's about doing bibliographic searches like finding all the references published in such and such a journal by whoever, though author pages can handle some of that in a primitive way.

I wonder if the funders of these projects fully appreciate that the core taxonomic information can be handled rather well by the already existing and free Wikispecies infrastructure, and that they are paying out millions just to be able to do a few fancy searches of no primary taxonomic significance ... no doubt you will disagree! :)

Stephen

Dean Pentcheff

unread,
Dec 20, 2009, 6:49:23 PM12/20/09
to Stephen Thorpe, taxonlit
(Replying to Stephen's post, but really to a string of posts.)

I think we're all ultimately interested in the linked relationship
between taxa, taxon names, authority metainformation, and publication
text. None of those alone (except the taxa themselves, rooting around
happily in the mud) is interesting at all.

What we're trying to figure out is how best to do that. I tend to
agree with a viewpoint that we want targeted databases for each of the
components (names, references, documents, etc.) and infrastructure to
link them. The possible linkages are too diverse and too difficult to
specify in advance. If taxon names are part of a bibliographic
database, then we have to pick a name database schema (that's likely
to conflict with everyone else's schema).

It's much more powerful and future-proof to keep the homogeneous data
types together with their kin, and permit diversification and
innovation in the linking.

I do plead ignorance on the possibilities for Semantic Mediawiki. But
your argument, Rod, is on target -- once we've committed to a
versioned, schema-based database, all the rest is implementation
chaff. Important, difficult implementation chaff, but not fundamental
to the problem. The existence of the letters "W", "I", "K", and one
more "I" in the name of a possible platform shouldn't cause a rash.

Maybe we could hook into Zotero or other systems. But what I'd like to
see first is our list of needs and desires (as encoded in a schema or
textual description -- don't care which). THEN we can see what
existing platforms might work to host those needs, or decide to build
one ourselves.

As to the difference of taxon vs. bibliographic focus: yes, there are
numerous different perspectives out there! Practicing taxonomists,
though, tend to live down in the nuts and bolts of the data, below the
level of looking for taxa. They already know everything there is to
know about a taxon (yeah, exaggerated for impact), so they're looking
to comprehensively identify and hoover up the very low-level details.
Like complete, checked bibliographic information for authority
references (ideally with a direct link to the original work).

-Dean
--
Dean Pentcheff
pent...@gmail.com

Stephen Thorpe

unread,
Dec 20, 2009, 7:13:37 PM12/20/09
to Dean Pentcheff, taxonlit
>Practicing taxonomists, though, tend to live down in the nuts and bolts of the data, below the level of looking for taxa. They already know everything there is to know about a taxon (yeah, exaggerated for impact), so they're looking to comprehensively identify and hoover up the very low-level details. Like complete, checked bibliographic information for authority references (ideally with a direct link to the original work).

So much to rant about in reply to one sentence!

>below the level of looking for taxa

You've strawmanned me! I wasn't suggesting at all that the primary purpose of the Wikispecies pages was simply "looking for taxa"! I said it was for seeing the current "state of play" for a taxon (which may be a family, a genus, a species, etc.) My experience of taxonomists is that they have to find out and follow the state of play - it doesn't just pop into their heads by magic. There are plenty of cases around of taxonomists "missing things". So, imagine, if you will, being able to instantly Google a solid treasure trove of current info on any given taxon. Besides, I don't see why Wikispecies can't provide solid bibliographic information to allow taxonomists to "hoover up" low level details - actually it is rather good for that, and it can also handle links to the original work when such exists ...

For example, I was just now making a start at tidying up Wikispecies Decapoda, and I immediately find a nomenclatural tangle, see:
http://species.wikimedia.org/wiki/Anaglyptus_Milne-Edwards
It may be more complex, with the beetle genus possibly having Anaclyptus as the correct original spelling, but Anaglyptus is currently in use, and forms the basis of tribe Anaglyptini ... A mess to be sorted out, but highlighted by trying to synthesise taxonomic info across the board on Wikispecies ...

Stephen
