
sending marc records into a script that uses MARC::Batch


John E Guillory

May 29, 2014, 12:08:00 PM
to perl...@perl.org

Hello,

Two questions please:

 

1.      I’ve written a script that opens a MARC file for reading using this syntax:

 

$file = $ARGV[0];

$batch = MARC::Batch->new('USMARC',$file);

 

It then loops through the records using this syntax:

while ( $record = $batch->next()) {
         # ... check positions 6 and 7 of the leader and position 23 of the 008, and make some changes
}

 

This works great. However, instead of accessing the file this way, I want to pipe the output of a previously run MARC dump command directly into this script.

I understand that this can be done using this syntax:    while ($line = <STDIN>) { … }, but I don’t understand how to use STDIN with “MARC::Batch->new('USMARC',$file);”. This does not work:    $batch = MARC::Batch->new('USMARC',<STDIN>);

 

2.      My current script successfully reads and processes a MARC file of over 5 GB, but it exits entirely on record 160,585 with this error from MARC::Batch: “Can't call method "as_string" on an undefined value at ./marc_batch.pl”. The MARC::Batch documentation says that to continue processing even when errors are encountered, one should call strict_off() and then print or report warnings at the bottom of the script. I don’t think my particular error is handled by the strict_off() setting. Does anybody know what causes the “Can't call method "as_string"” error, or how to fix it? The full script is below; it’s pretty short, thanks to MARC::Batch.

 

Thanks for any insights!

 

 

use MARC::Batch;

$file = $ARGV[0];
chomp($file);

$batch = MARC::Batch->new('USMARC',$file);
$batch->strict_off();    # otherwise script exits when encounters errors

open(OUT,'>new_marc');

while ( $record = $batch->next()) {
    $leader                = $record->leader();
    $leader_pos_6          = substr($leader,6,1);
    $leader_pos_7          = substr($leader,7,1);

    $field                 = $record->field('008');
    $field_008             = $field->as_string();
    $field_008_position_23 = substr($field_008,23,1);

    if ( ($leader_pos_6 eq "a") && ($leader_pos_7 eq "m") && ($field_008_position_23 eq "o") || ($field_008_position_23 eq "s") ) {

        $control_num        = $record->field('001');
        $control_num        = $control_num->as_string();

        print "008 position 23: $field_008_position_23 \n";
        print "OLD leader: $leader \n";
        $old_leader = $leader;
        substr($leader,6,1) = 'm';
        print "NEW leader: $leader \n";

        print OUT $record->as_usmarc();
        print "$control_num|$old_leader|$leader|$field_008\n";

    } else {  # not a match so just print this one unchanged…
        print OUT $record->as_usmarc();
    }
}

# handles errors:
if (@warnings = $batch->warnings()) {
    print "\n Warnings detected: \n", @warnings;
}

close(OUT);
close(LOG);

 

 

 

John Guillory

Louisiana Library Network

225.578.3758

 

Timothy Prettyman

May 29, 2014, 12:23:14 PM
to John E Guillory, perl...@perl.org
For your first question, instead of:

 $batch = MARC::Batch->new('USMARC',<STDIN>);

use:

 $batch = MARC::Batch->new('USMARC',STDIN);

For your second, the error is likely caused when a field you're using as_string() on doesn't exist in the record.  

So, you could do something like the following:

$field                 = $record->field('008');
$field or do {                           # check that the field exists
   print "no 008 field for record\n";    # this record has no 008
   next;                                 # skip this record (or whatever)
};

$field_008             = $field->as_string();


Hope this helps

-Tim

Timothy Prettyman
LIT/Library Systems
University of Michigan
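
A minimal sketch of the STDIN suggestion above, assuming the script should read a named file when one is given and standard input otherwise. The helper input_source is hypothetical (it is not part of MARC::Batch), and passing a reference to the STDIN glob (\*STDIN) rather than the bareword is an assumption made here so the code also compiles under "use strict". The MARC::Batch lines are left as comments because they need the MARC::Record distribution from CPAN.

```perl
use strict;
use warnings;

# Hypothetical helper: return the filename from the command line if there
# is one, otherwise a reference to the STDIN filehandle.
sub input_source {
    my ($file) = @_;
    return $file if defined $file && length $file;
    return \*STDIN;
}

# Intended use (assumes MARC::Batch is installed):
#
#   use MARC::Batch;
#   my $batch = MARC::Batch->new('USMARC', input_source($ARGV[0]));
#   $batch->strict_off();
#   while (my $record = $batch->next()) {
#       # ... process $record as before ...
#   }
```

With something like this in place, the script could be run either with a filename argument or at the end of a pipe, for example: some_marc_dump_command | ./marc_batch.pl (the command name is a placeholder for whatever produces the MARC stream).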

John E Guillory

May 29, 2014, 5:13:25 PM
to Timothy Prettyman, perl...@perl.org

Thanks Timothy for your help.

 

When processing about 5 million records I would expect some crazy records. The new script (incorporating Timothy’s suggestions) exited prematurely on record 85,877 with: “Warnings detected: Entirely empty subfield found in tag 260”. I know 260 is the publication field, but it’s not “required”. I’m deliberately printing warnings, but again the script exited prematurely.

 

Thanks for assistance.

John

Robin Sheat

May 29, 2014, 7:12:32 PM
to perl...@perl.org
John E Guillory wrote on Thu 29-05-2014 at 21:13 [+0000]:
> “Warnings detected: Entirely empty subfield found in tag 260”

An entirely empty subfield is an illegally formatted thing, at least
according to the rules of MARC::Record/MARC::Field, and so I assume the
MARC format itself. So it's not that it's a required field or anything
like that, it's that the USMARC is incorrectly formatted, so the parser
throws an exception with 'die'.

To catch the exception rather than having your program terminate, you
need to wrap the call that's failing in an 'eval' block, and check for
errors after it, handling them appropriately. You might be lucky and the
file is OK and the parser can continue, however you might be unlucky and
this corrupt record causes the parser to get confused and it can't find
the start of the next record.

See 'perldoc -f eval' for more information on using it for
error/exception handling.

--
Robin Sheat
Catalyst IT Ltd.
+64 4 803 2204
GPG: 5FA7 4B49 1E4D CAA4 4C38 8505 77F5 B724 F871 3BDF
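
The eval approach described above could be sketched roughly as follows. The helper next_record_safely is hypothetical (it is not part of MARC::Batch); it only assumes that the batch object has a next() method that may die on a malformed record, which is the behaviour described in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: fetch the next record, trapping any exception.
# Returns ($record, undef) on success, (undef, undef) at end of input,
# and (undef, $error) if next() died.
sub next_record_safely {
    my ($batch) = @_;
    my $record;
    my $ok = eval { $record = $batch->next(); 1 };
    return (undef, $@ || 'unknown error') unless $ok;
    return ($record, undef);
}

# Intended use:
#
#   while (1) {
#       my ($record, $error) = next_record_safely($batch);
#       if (defined $error) {
#           warn "skipping unreadable record: $error";
#           next;
#       }
#       last unless defined $record;
#       # ... process $record ...
#   }
```

If the parser cannot find the start of the next record after a failure, a loop like this could keep reporting errors, so it may be worth stopping after some number of consecutive failures.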

Stefano Bargioni

May 30, 2014, 5:20:35 AM
to perl4lib
If I'm not wrong, 
$batch->strict_off();
will keep your loop from printing warnings and from stopping the processing of records.
HTH. Stefano

Timothy Prettyman

May 30, 2014, 12:39:18 PM
to perl4lib
I think you have to check for warnings as you read each record, so try moving your error-handling code right after the $batch->next() call. But Robin's suggestion is good advice, and is probably a more robust way to handle the crud that can show up in a file of MARC records.

-Tim
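
Checking for warnings after each record, as suggested above, might look something like this sketch. The helper collect_warnings is hypothetical; it only assumes that the batch object's warnings() method returns the warnings accumulated so far, as used in the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: return the warnings currently held by the batch,
# each prefixed with the number of the record that was just read.
sub collect_warnings {
    my ($batch, $record_number) = @_;
    return map { "record $record_number: $_" } $batch->warnings();
}

# Intended use:
#
#   $batch->strict_off();
#   my $count = 0;
#   while (my $record = $batch->next()) {
#       $count++;
#       print STDERR "$_\n" for collect_warnings($batch, $count);
#       # ... process $record ...
#   }
```

Tagging each warning with a record number makes it easier to find the offending record in a file of several million.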

John E Guillory

Jun 3, 2014, 8:04:16 AM
to perl4lib

Thanks for your ideas. I will try your suggestions.
