Post / MARC import question


Shannon

Jan 21, 2009, 4:20:05 PM
to Scriblio
We have just installed WordPress 2.7 + Scriblio 2.6v00 on our LAMP
server. We hope to transition our Community Information Database
(library.minlib.net:81) out of our III module and into Scriblio.
Records are currently stored as MARC, so I have been attempting to
import a small test batch with the Scriblio Importer. The file of
records uploads (I can see the file on my server and in my Media
Library), but nothing seems to be posted. Furthermore, when I write a
post by hand it 'posts' in the sense that it creates a record, but all
values that I assign to the record are empty. I have read the threads
on this group and some FAQs from the WordPress Codex, and I'm stumped.

I created .htaccess just as the Scriblio wiki describes.

I'm using field 907 for the unique id.

wp_scrib_harvest does exist.

wp_posts has a row for each attempt at importing, which seems to
indicate that the file is an attachment, but those posts are not
visible on the actual site. There are also rows for each of my
handwritten test posts, but the Scriblio fields (i.e. title,
description) have the value 'closed.' I'm not sure what that means.

I created a page called Catalog, whose author is 'cataloger'. My
cataloger has been given the role of Editor.

I'm uploading my MARC files to wp-content/plugins/scriblio/import.

When I upload a .mrc file, its extension becomes .import. Is this
significant?

Is there a setting that I have overlooked? Do my MARC files need to be
imported to a different location?

Thanks,
Shannon

Casey Bisson

Jan 26, 2009, 10:06:28 PM
to scri...@googlegroups.com

Shannon,

Interesting you should mention this use. I've got a few questions for
you, but let me try to address your question first:

Regarding the MARC importer: There's a bug in the version you've got
(my fault). If you can execute MySQL queries using a tool like
phpMyAdmin, you can probably create the missing table with a command like
this:

CREATE TABLE wp_scrib_harvest (
  source_id varchar(50) NOT NULL default '',
  harvest_date timestamp NOT NULL default '0000-00-00 00:00:00',
  imported tinyint(1) default '0',
  content longtext NOT NULL,
  PRIMARY KEY (source_id),
  KEY imported (imported)
);

On the other hand, I'm getting closer to releasing a new version of
Scrib that offers not only better WordPress 2.7 compatibility but
also a rich metadata editor. You can try it out now: just download the
"2.7b02" or "development version" here:

http://wordpress.org/extend/plugins/scriblio/download/

The folks at Colby-Sawyer College are already using the new version
for their institutional archives site:

http://archives.colby-sawyer.edu/

What you can't do, however, is use the MARC importer with the 2.7
version yet. You _can_ catalog a new book or photo from scratch or
import them from III (using the III importer), but I haven't updated
the MARC file importer yet.

That said, here are my questions:

How many records do you have?
I'd never thought of trying to do this with MARC. Are these cataloging
rules that you've developed, or are others doing this too?

Is this record a reasonable exemplar for the others?
http://library.minlib.net:81/record=1000012
http://library.minlib.net:81/search/.b1000012/.b1000012/1%2C1%2C1%2CB/marc~b1000012

I'm actually working on a similar project. I'm basing it on Scriblio,
but building a metadata form specifically for it and with the idea
that representatives of each organization would self-maintain their
record(s).

I'd like to talk with you more about what you have in mind.

--Casey

Shannon

Jan 29, 2009, 5:37:59 PM
to Scriblio
Hi Casey,
It's good to know that the MARC importer doesn't work with WP2.7. I
have turned my attention to the III importer. I made a small change
to your code on line 104:

$prefs['sourceinnopac'] = ereg_replace( '[^a-z|A-Z|0-9|-|:|\.]', '',
$_POST['scrib_iii-sourceinnopac'] );

I added the ':' because our database uses a port
(library.minlib.net:81), and the colon was being stripped from the
hostname when I set the connection information.
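
The effect of a character whitelist like the one above can be sketched outside PHP. The Python below is only an illustration under assumptions: `sanitize_host` is a hypothetical name, not a Scriblio function, and the pattern is a tidied variant of the idea rather than the plugin's actual code. Inside a bracket expression `|` is a literal character, not an alternation, and a hyphen sitting between two other characters can be read as a range, so this sketch lists the allowed characters directly and puts the hyphen last.

```python
import re

# Hypothetical helper, not part of Scriblio: keep only characters that can
# appear in a hostname with an optional port (letters, digits, '.', ':', '-').
# The hyphen is placed last in the class so it is taken literally.
_DISALLOWED = re.compile(r"[^A-Za-z0-9.:-]")


def sanitize_host(raw: str) -> str:
    """Strip every character that is not in the whitelist."""
    return _DISALLOWED.sub("", raw)


if __name__ == "__main__":
    # The colon and port survive, so a host such as library.minlib.net:81
    # passes through unchanged.
    print(sanitize_host("library.minlib.net:81"))
    # Spaces, slashes and other stray characters are dropped.
    print(sanitize_host(" lola.plymouth.edu/ "))
```

Whether the real importer should also accept a scheme such as https is a separate design question that this sketch does not address.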

I still have not successfully imported records from our database, but
I was able to pull a test set from lola.plymouth.edu. The importer
seems to be scanning our database, but does not import any records in
the end. Again, I'm stumped.

I have replied to your specific questions about our project off list.

Thanks,
Shannon

Casey Bisson

Feb 1, 2009, 9:18:54 PM
to scri...@googlegroups.com

Well done finding that bit of code. Your use case has convinced me
to change the expected input to accommodate port numbers as well as
https schemes.

Do you mind if I test against your Innopac?

Shannon Astolfi

Feb 2, 2009, 9:35:14 AM
to scri...@googlegroups.com
Glad that was helpful. 

Absolutely feel free to test against our database.  I have some thoughts that might help, based on the tests we've been doing.  Our situation is a little convoluted, so bear with me.

Hypothesis #1: The importer requires a 245 field to capture the record.

The last time I posted to this group, the Scriblio III Importer could not see any records, and therefore was not harvesting any.  Our CID records don't have 245 title fields; the organization name is in a 110.  I added a 245 field to one record, and the importer detected it and listed one harvested record for publication.

I've been trying to read and understand the Scriblio code - is this snippet related to my 245 theory? (From Scriblio proper, not the importer.)
$postdata['post_title'] = $wpdb->escape(str_replace('\"', '"', $bibr['title'][0]['a']));
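
A minimal sketch of the 245 theory, written in Python rather than Scriblio's PHP. It assumes a record has already been parsed into a dictionary keyed by MARC tag, each tag holding a list of subfield dictionaries; that layout and the function name `pick_title` are illustrative assumptions, not Scriblio's data structures. It shows how a reader that looks only at 245 $a comes up empty for a record whose name lives in 110, and what a fallback could look like.

```python
from typing import Dict, List, Optional, Tuple

# Assumed record layout, for illustration only: {tag: [{subfield_code: value}]}
Record = Dict[str, List[Dict[str, str]]]


def pick_title(record: Record, tags: Tuple[str, ...] = ("245", "110")) -> Optional[str]:
    """Return subfield 'a' from the first listed tag that has one, else None."""
    for tag in tags:
        for field in record.get(tag, []):
            value = field.get("a", "").strip()
            if value:
                return value
    return None


if __name__ == "__main__":
    # A made-up community information record with no 245 field.
    cid_record = {"110": [{"a": "Example Community Organization"}]}

    # Looking at 245 only finds nothing to use as a post title.
    print(pick_title(cid_record, tags=("245",)))
    # Allowing 110 as a fallback recovers the organization name.
    print(pick_title(cid_record))
```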


Hypothesis #2: III's CID module stores data differently than a bibliographic module.

This theory is difficult to prove, but here's the story: Even when I was able to harvest a record from the CID, the content wasn't getting parsed.  After attempting to publish the record, I looked at the row created in our MySQL database, and the record's tags and content appear as raw text in the 'content' field.  I moved some CID records to our III bibliographic test database, pointed the Scriblio importer at them, and was able to harvest and publish records. (Hooray!)

Hypothesis #3: The CID MARC fields are special.

When the CID records successfully posted, only some of the fields appeared.  I think it has to do with the unique MARC encoding standard [1].  This is where I can use some help understanding the code.  It seems like I would want to customize the taxonomies, but that affects display and search, not MARC field parsing, right?  So if I want Scriblio to recognize fields 271 and 311, am I dealing with the iii_parse_record function in the importer or the parse_content function in Scriblio proper? Or both/neither?  Sorry if these are basic questions - I'm still just learning here.
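
One way to picture the question: a parser that only knows bibliographic tags silently drops everything else. The Python sketch below is not Scriblio code and does not reproduce `iii_parse_record` or `parse_content`. It assumes a record already parsed into a dictionary keyed by MARC tag, and the two maps are hypothetical; the labels given to 271 and 311 are placeholders, not the names the Community Information format defines.

```python
from typing import Dict, List

# Assumed record layout, for illustration only: {tag: [{subfield_code: value}]}
Record = Dict[str, List[Dict[str, str]]]

# Hypothetical tag-to-key maps. The labels are placeholders; the real meaning
# of each tag should come from the Community Information format itself.
BIB_ONLY_MAP = {"245": "title"}
CID_MAP = {**BIB_ONLY_MAP, "110": "name", "271": "field_271", "311": "field_311"}


def extract(record: Record, field_map: Dict[str, str]) -> Dict[str, List[str]]:
    """Collect subfield 'a' values for mapped tags; unmapped tags are skipped."""
    out: Dict[str, List[str]] = {}
    for tag, fields in record.items():
        key = field_map.get(tag)
        if key is None:
            continue  # unrecognized tags disappear here
        values = [f["a"] for f in fields if f.get("a")]
        if values:
            out.setdefault(key, []).extend(values)
    return out


if __name__ == "__main__":
    record = {
        "110": [{"a": "Example Organization"}],
        "271": [{"a": "sample 271 value"}],
        "311": [{"a": "sample 311 value"}],
    }
    # With a bibliographic-only map, none of the record's fields are kept.
    print(extract(record, BIB_ONLY_MAP))
    # With the extended map, the 110, 271 and 311 values are kept.
    print(extract(record, CID_MAP))
```

In this picture the tag map belongs to the parsing step, while taxonomies only decide what happens to values the parser has already kept.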

Thanks Casey et al. - I really appreciate the help!

Shannon

[1] http://www.loc.gov/marc/community/ciintro.html

Mark A. Matienzo

Feb 2, 2009, 11:02:16 AM
to scri...@googlegroups.com
Hi Shannon,

I believe all three of your hypotheses are correct. MARC Community
Information records in fact follow a different standard from MARC 21
bibliographic data; see
<http://www.loc.gov/marc/community/eccihome.html> for more
information.

Mark

--
Mark A. Matienzo
Applications Developer, Digital Experience Group
The New York Public Library

Casey Bisson

Feb 2, 2009, 11:21:44 AM
to scri...@googlegroups.com

Mark, Thanks for pointing out the data dictionary for this.

I'm at a conference today[1], but I'll definitely need to look into
this more thoroughly.

[1] http://maisonbisson.com/blog/post/13441/wordcamp-higher-ed-northeast/

--Casey

Casey Bisson

Feb 2, 2009, 11:26:06 AM
to scri...@googlegroups.com

Shannon,

Your theories are spot on, and I'm excited about where this is going.

I'll have more soon,

--Casey

Casey Bisson

Feb 12, 2009, 9:23:59 PM
to scri...@googlegroups.com

There's nothing there yet, but these conversations have convinced me to build an importer specific to the task. The first part is to get the CID plugin working, then make the III importer smart about it, and when that happens it will live here:


