Master Journal Title List

Richard Pyle

Nov 14, 2009, 8:40:30 PM
to taxo...@googlegroups.com

Hi All,

I'm currently on hour-34 of my long journey home. I spent much of last
night in Marseille airport (and two batteries' worth of time on the plane to
Houston) cleaning up my own list of journal titles. What I currently lack,
and what I think would be EXTREMELY useful, is a list of "clean" (i.e.,
validated) full journal titles (plus one or more standard abbreviations),
that we could use to cross-link our own (often messy & unverified) journal
titles to. I've worked on this before, and I know that certain nomenclator
databases have pretty clean lists of group-specific journal titles +
abbreviations (e.g., Catalog of Fishes, Hymenoptera Name Server, Hexacorallia
database, etc.). Chris (F.) mentioned that BHL has some sort of list, and
that no such list for our community exists out in library-land. Cathy
indicated that we might be able to extract something from the Library of
Congress or some other Library resource. Rod -- were you able to compile a
list of clean journal titles in your bioguid work on this?

From my perspective, this should be the first content-related priority
(after we get a draft data standard & data model hammered out).

One more thing: at TDWG I mentioned ISO 833 (a PDF of which I just
uploaded to the site:
http://groups.google.com/group/taxonlit/web/ISO833.pdf). This might be
useful for normalizing our standard abbreviations (even if it is no longer a
valid ISO standard). Anybody want to take on the task of OCRing it (or
finding an electronic copy online)?

Aloha,
Rich

Richard L. Pyle, PhD
Database Coordinator for Natural Sciences
and Associate Zoologist in Ichthyology
Department of Natural Sciences, Bishop Museum
1525 Bernice St., Honolulu, HI 96817
Ph: (808)848-4115, Fax: (808)847-8252
email: deep...@bishopmuseum.org
http://hbs.bishopmuseum.org/staff/pylerichard.html

Richard Pyle

Nov 14, 2009, 8:43:56 PM
to taxo...@googlegroups.com

Just a quick follow-up: I forgot to mention this site, which has some
relevant links:

http://library.humboldt.edu/infoservices/FTitleAbbr.htm

Rich

Rod Page

Nov 15, 2009, 5:05:37 AM
to Taxonomic Literature

Wouldn't it actually be more useful to have a list of journal names and
alternatives, all linked together? Journals can have more than one
name in use. My approach in bioGUID is to gather alternative names,
link them by the journal ISSN, and use approximate string matching to
handle cases where a citation uses a journal name that isn't an exact
match to one that I have in my database.
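
To make the idea concrete, here is a minimal sketch of ISSN-keyed lookup with
approximate matching. It is not the bioGUID code: the function names, the
sample titles, the placeholder ISSNs, and the 0.8 similarity cutoff are all
assumptions chosen for illustration, and Python's standard difflib stands in
for whatever string-matching method a real service would use.

```python
import difflib


def normalize(name):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(kept.split())


def build_index(names_by_issn):
    """Map every normalized name variant to the ISSN it is filed under."""
    index = {}
    for issn, names in names_by_issn.items():
        for name in names:
            index[normalize(name)] = issn
    return index


def lookup_issn(cited_name, index, cutoff=0.8):
    """Return the ISSN for a cited journal name, or None if nothing is close."""
    key = normalize(cited_name)
    if key in index:  # exact match after normalization
        return index[key]
    # Otherwise fall back to the single closest known variant, if any.
    close = difflib.get_close_matches(key, list(index), n=1, cutoff=cutoff)
    return index[close[0]] if close else None


# Hypothetical sample data: the ISSNs are placeholders, not real identifiers.
SAMPLE = {
    "0000-0001": ["Fishery Bulletin", "Fish. Bull."],
    "0000-0002": ["Bulletin of the Museum of Comparative Zoology",
                  "Bull. Mus. Comp. Zool."],
}

if __name__ == "__main__":
    index = build_index(SAMPLE)
    for cited in ["Fishery Buletin", "Bull Mus Comp Zool", "Nature"]:
        print(cited, "->", lookup_issn(cited, index))
```

The cutoff trades false matches against misses, so it would need tuning
against real citation data.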

I think the ability to match names is what we really want; a clean
definitive list is simply a by-product of supporting that.

I could make a dump of what I have in bioGUID available. You can also
access individual journals like this:

http://bioguid.info/issn/index.php?issn=0454-6296

http://bioguid.info/issn/index.php?issn=0372-1426

I also have a journal lookup service that uses approximate string
matching at http://bioguid.info/services/

A clean list by itself isn't of much use given that people don't
always use these. What you want is a list of what people do use, plus
tools to cluster these (and variants you haven't seen yet) into sets
that refer to the same journal.

Regards

Rod

Richard Pyle

Nov 15, 2009, 7:38:43 AM
to Rod Page, Taxonomic Literature

> Wouldn't it actually be more useful to have a list of journal
> names and alternatives, all linked together?

Yes, that's what I intended to convey, but I see now I didn't quite put it
that way in my email. Obviously, the goal is to build the links. I have
found that having one "clean" form of the title is the easiest way to
achieve this, so that you have "n" variants linking to one "master", rather
than "n" variants cross-linked to each other. Obviously, we're going to want
to accommodate multiple simultaneously legitimate representations of journal
titles, so I completely agree with that. I was thinking more in terms of
building a foundation to reconcile all the illegitimate variants
(misspellings, missing words or punctuation, truncated titles, etc.)
against a "clean" variation that accommodates multiple legitimate
representations (different languages, different character sets,
abbreviations, legitimate alternate titles, etc.).
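
As a rough illustration of the "n variants linking to one master" structure,
here is a small sketch. The class names, field names, and sample strings are
hypothetical and not taken from any existing database; the point is only that
each variant stores a single link to a master record, and that a master can
carry several legitimate forms.

```python
from dataclasses import dataclass, field


@dataclass
class MasterTitle:
    """One "clean" record, plus any other legitimate forms of the title."""
    master_id: int
    clean_title: str
    legitimate_forms: set = field(default_factory=set)


class TitleRegistry:
    """Every string, legitimate or not, points at exactly one master record."""

    def __init__(self):
        self.masters = {}
        self.variant_to_master = {}

    def add_master(self, master_id, clean_title, legitimate_forms=()):
        self.masters[master_id] = MasterTitle(
            master_id, clean_title, set(legitimate_forms))
        self.variant_to_master[clean_title] = master_id
        for form in legitimate_forms:
            self.variant_to_master[form] = master_id

    def link_variant(self, variant, master_id):
        """Attach a messy variant (misspelling, truncation, ...) to a master."""
        if master_id not in self.masters:
            raise KeyError("unknown master id: %r" % (master_id,))
        self.variant_to_master[variant] = master_id

    def same_journal(self, a, b):
        """Two strings refer to the same journal if they share a master."""
        master_a = self.variant_to_master.get(a)
        return master_a is not None and master_a == self.variant_to_master.get(b)


if __name__ == "__main__":
    registry = TitleRegistry()
    registry.add_master(1, "Fishery Bulletin", ["Fish. Bull."])
    registry.link_variant("Fishery Buletin", 1)  # misspelled variant
    print(registry.same_journal("Fishery Buletin", "Fish. Bull."))
```

With this layout, adding a new variant costs one link, whereas cross-linking
"n" variants to each other would need a link for every pair.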

> Journals can
> have more than one name in use. My approach in bioGUID is to
> gather alternative names, link them by the journal ISSN,

Yes, the ISSN is handy, but it only applies to journals with ISSNs. For
example, I'm currently working with a set of 4700 journal titles from
Zoological Record, of which over 500 lack ISSNs. I have another master list
of 157,000 titles of periodicals, which represent 103,000 unique titles, of
which 31,000 lack ISSNs.

> I think the ability to match names is what we really want; a
> clean definitive list is simply a by-product of supporting that.

In one sense it's a byproduct, but in another sense it's also a tool to help
us achieve the reconciliation/matching.

> I could make a dump of what I have in bioGUID available. You
> can also access individual journals like this:
>
> http://bioguid.info/issn/index.php?issn=0454-6296
>
> http://bioguid.info/issn/index.php?issn=0372-1426
>
> I also have a journal lookup service that uses approximate
> string matching at http://bioguid.info/services/

Cool! Thanks! I'd definitely like to compare what you have to what I have
-- especially in terms of matching ISSNs for those journals that have them.

> A clean list by itself isn't of much use given that people
> don't always use these. What you want is a list of what
> people do use, plus tools to cluster these (and variants you
> haven't seen yet) into sets that refer to the same journal.

Unfortunately, in my experience, most of what people use is a hodge-podge of
abbreviations, full titles (some with periods, some without), and many,
many spelling errors, truncations and duplicates. I see the one "clean"
version of the title as the most effective mechanism I have found for
consolidating both within a messy set of journals and between two sets
(whether they are messy or clean). The problem I keep encountering has more
to do with illegitimate alternative representations of titles than with
legitimate ones.

I definitely look forward to seeing a dump of what you have.

Rich


Rod Page

Nov 15, 2009, 1:41:53 PM
to Taxonomic Literature

I've uploaded a dump of my list of journal names and ISSNs, edited
slightly to remove some obscure journals that had encoding issues.
This file is an amalgam of lists from PubMed, CrossRef, JSTOR, and
manual additions I've made.

The file is tab-delimited text (in UTF-8 encoding) with columns
corresponding to:

title issn language_code comment

language_code by default is English, but in some cases I've manually
edited it to match the language of the journal title.

In many cases there is more than one name for a journal; this is
intentional (in most cases). If you group by ISSN you can create a
list of alternative names for a journal.
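
For anyone who wants to try the grouping, here is a minimal sketch in Python.
It assumes the four columns described above, in that order, with no header
row (that last point is an assumption; skip the first row if the file has
one). The function name and the sample rows, including the placeholder ISSN,
are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def names_by_issn(tsv_file):
    """Group journal titles by ISSN from rows of title, issn, language_code, comment."""
    groups = defaultdict(list)
    # QUOTE_NONE: treat quote characters in titles as ordinary text.
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        title, issn = row[0].strip(), row[1].strip()
        # Rows without an ISSN cannot be grouped this way and are skipped.
        if issn and title not in groups[issn]:
            groups[issn].append(title)
    return dict(groups)


# Hypothetical rows; "0000-0001" is a placeholder, not a real ISSN.
SAMPLE_ROWS = (
    "Fishery Bulletin\t0000-0001\ten\t\n"
    "Fish. Bull.\t0000-0001\ten\t\n"
    "A Journal Without An ISSN\t\ten\tno ISSN\n"
)

if __name__ == "__main__":
    for issn, titles in names_by_issn(io.StringIO(SAMPLE_ROWS)).items():
        print(issn, titles)
```

For a real file, pass `open(path, encoding="utf-8", newline="")` instead of
the in-memory sample. Titles that lack an ISSN would need a different key.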

Regards

Rod

Dean Pentcheff

Dec 14, 2009, 8:10:44 PM
to Taxonomic Literature

On journal titles... something that will help immensely is a format or
protocol to incorporate date ranges for each title. A simple example
we were able to use to confirm or correct a number of entries:
1971-now Fishery Bulletin
1941-1970 Fishery Bulletin of the Fish and Wildlife Service
1912-1940 Bulletin of the United States Fish Commission
1904-1911 Bulletin of the Bureau of Fisheries
1881-1903 Bulletin of the United States Fish Commission

If the journal title list also included the start and end dates between
which each title is valid, and possibly (though this adds
considerably to the complexity) the "continued-as" linkage
information, lots of bogus journal names could be semi-automatically
cleaned up.
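
A minimal sketch of how such date ranges could drive a semi-automatic check,
using the five entries listed above exactly as given. The tuple layout, the
function names, and the use of None for "now" are assumptions made for
illustration, not a proposed format.

```python
# (start_year, end_year, title); end_year None means the title is still in use.
TITLE_HISTORY = [
    (1971, None, "Fishery Bulletin"),
    (1941, 1970, "Fishery Bulletin of the Fish and Wildlife Service"),
    (1912, 1940, "Bulletin of the United States Fish Commission"),
    (1904, 1911, "Bulletin of the Bureau of Fisheries"),
    (1881, 1903, "Bulletin of the United States Fish Commission"),
]


def titles_valid_in(year, history=TITLE_HISTORY):
    """Return the titles whose date range includes the given year."""
    return [title for start, end, title in history
            if start <= year and (end is None or year <= end)]


def check_citation(title, year, history=TITLE_HISTORY):
    """Return (ok, expected), where expected lists the titles valid that year."""
    expected = titles_valid_in(year, history)
    return (title in expected, expected)


if __name__ == "__main__":
    # A 1950 citation that uses the modern title gets flagged for review.
    ok, expected = check_citation("Fishery Bulletin", 1950)
    print(ok, expected)
```

A mismatch does not prove the citation is wrong, only that the title and
year disagree with the list, so flagged entries would still need review.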

-Dean
--
Dean Pentcheff
pent...@gmail.com