ISFDB on your desktop

Ahasuerus

Sep 14, 2006, 8:32:22 PM
As some of you know, the database and the scripts behind the ISFDB
(http://www.isfdb.org) have been available for download for quite some
time. In theory, it has been possible to build a functional copy of the
ISFDB on another computer for about 10 years, but, unfortunately, it
used to be a daunting task and few people found it worth their time.

As of 2006, the ISFDB has been redesigned (the "ISFDB2" project) and
moved to MySQL. The Python scripts that drive the ISFDB application are
still being beta tested, but we expect to re-enable user submissions
later this year. In the meantime, if you know basic SQL commands, you
can download the MySQL database engine and the ISFDB backup file (11 MB
compressed), which should enable you to run all kinds of searches and
extract many kinds of data that are not accessible using the standard
ISFDB front end. The ISFDB download page
(http://isfdb.tamu.edu/wiki/index.php/ISFDB_Downloads) has updated
instructions explaining how to get started. ISFDB's table layout is
fairly straightforward, but there is a diagram at
http://www.isfdb.org/isfdb_schema.png in case you get lost.
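
To give a feel for the kind of search described above, here is a minimal
sketch. It uses Python's built-in sqlite3 module as a stand-in so that it
runs without a MySQL server. The table name "authors", the column
"author_canonical", the sample rows and the helper find_authors are all
assumptions made for illustration; check the schema diagram for the real
names before running similar SQL against your MySQL copy.

```python
import sqlite3

# Stand-in for a local copy of the database: an in-memory SQLite table.
# Table and column names are assumptions, and the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE authors (author_id INTEGER PRIMARY KEY, author_canonical TEXT)"
)
conn.executemany(
    "INSERT INTO authors (author_canonical) VALUES (?)",
    [("Ann Example",), ("John Smith Jr",), ("Mary Smithson",)],
)


def find_authors(conn, pattern):
    """Return canonical names matching an SQL LIKE pattern, sorted by name."""
    rows = conn.execute(
        "SELECT author_canonical FROM authors "
        "WHERE author_canonical LIKE ? ORDER BY author_canonical",
        (pattern,),
    )
    return [name for (name,) in rows]


for name in find_authors(conn, "%Smith%"):
    print(name)
```

The SELECT statement is plain SQL, so the same idea should carry over to
the MySQL command line client once the real table and column names are
substituted.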

A special note for those who dabble in programming: The data in the
ISFDB database varies in quality. Some of it was originally added by
webbots (see http://isfdb.tamu.edu/wiki/index.php/User:Dissembler) and
needs to be cleaned up. Other records were entered by humans at a time
when data validation was non-existent and updates were applied without
further checking -- with predictable results.

We have a few data cleanup projects going at this time -- see
http://isfdb.tamu.edu/wiki/index.php/Bibliographic_Projects_in_Progress
. It's an ongoing effort and everybody is invited to participate.
Although user submissions are currently disabled within the ISFDB
proper, the ISFDB *Wiki*
(http://isfdb.tamu.edu/wiki/index.php/Main_Page) is wide open except
for a few protected pages. If you have a copy of the ISFDB database on
your desktop, you can contribute to this effort by writing scripts that
search the ISFDB for suspected bad data and posting the resulting "hit
lists" to the Wiki.
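
As a rough illustration of the last step, the sketch below turns a list
of suspect records into text that can be pasted into a Wiki page. The
"== heading ==" and "* item" forms are standard MediaWiki markup; the
function name format_hit_list, the (record id, name) layout and the
sample data are assumptions, not an agreed format for the cleanup
pages.

```python
def format_hit_list(title, hits):
    """Render hits as MediaWiki text: one heading, then one bullet per record.

    `hits` is a list of (record_id, name) pairs; this layout is hypothetical.
    """
    lines = ["== %s ==" % title]
    # Sort by name so the list stays stable between runs of the same script.
    for record_id, name in sorted(hits, key=lambda hit: hit[1]):
        lines.append("* %s (record %d)" % (name, record_id))
    return "\n".join(lines)


# Made-up sample data, for illustration only.
print(format_hit_list("Suspect suffixes", [(2, "John Smith Jr"), (1, "Ann Lee III")]))
```

Keeping the output sorted makes it easier to compare a fresh hit list
with one posted earlier.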

We have put together a sample script that searches for bad suffixes to
serve as an example of the type of cleanup tools we are looking for --
see the first sub-project at
http://isfdb.tamu.edu/wiki/index.php/ISFDB:Author_Names_Cleanup . We
have created a stub for another sub-project at the bottom of the same
Wiki page; see it for details. Naturally, everybody should feel free
to propose or simply start another data cleanup project.
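
For those who want a starting point, here is a minimal sketch of what a
suffix check could look like. It is not the sample script from the Wiki
page. It assumes, purely for illustration, that the accepted form of a
suffix is ", Jr.", ", Sr.", ", II", ", III" or ", IV", and it flags any
name that ends in something suffix-like written another way. The
names has_bad_suffix and find_bad_suffixes and the sample names are
made up.

```python
import re

# Hypothetical convention: a comma, a space, then the suffix, with a period
# after Jr and Sr only. Adjust to whatever the project actually agrees on.
ACCEPTED = re.compile(r", (Jr\.|Sr\.|II|III|IV)$")
# Anything at the end of a name that looks like a suffix, however punctuated.
SUFFIX_LIKE = re.compile(r"[ ,]+(jr|sr|ii|iii|iv)\.?$", re.IGNORECASE)


def has_bad_suffix(name):
    """True if the name ends in a suffix that is not in the accepted form."""
    name = name.strip()
    return bool(SUFFIX_LIKE.search(name)) and not ACCEPTED.search(name)


def find_bad_suffixes(names):
    """Return the names that fail the suffix check, in their original order."""
    return [name for name in names if has_bad_suffix(name)]


# Made-up sample names, for illustration only.
sample = ["John Smith, Jr.", "John Smith Jr", "Ann Lee III", "Mary Smithson"]
for name in find_bad_suffixes(sample):
    print(name)
```

A check like this will produce false positives, which is why the output
is meant to be a hit list for human review rather than an automatic fix.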

A note re: tool selection. We realize that everybody has preferred
languages and tools and some of them can be quite obscure. (No, I
wasn't referring to Dave Langford specifically, why do you ask?) At the
very least we ask you to post the scripts that you used to identify
possible bad data the way it's currently done at
http://isfdb.tamu.edu/wiki/index.php/ISFDB:Author_Names_Cleanup . If
your tool of choice is obscure, please specify where it can be found so
that other people can re-run your script later on. Well-known languages
and tools like MySQL, Perl, Python, etc. are preferred since they make
joint development easier, but we'll take what we can get. Please note
http://isfdb.tamu.edu/wiki/index.php/ISFDB:General_disclaimer as it
relates to intellectual property issues.

Stop by and take a look around some time :)

Happy data munching,
--
Ahasuerus (since Al von Ruff is very busy at the moment)