line-feed in mrc file


Hannah

Oct 25, 2012, 9:05:20 AM
to solrma...@googlegroups.com
Hello all,

I have a problem: are line feeds allowed between records?

Normally we get .mrc files without line feeds, and my Marc4J programs and the SolrMarc indexer have never had problems reading them.

Now we get MARC files from another provider, and there is a line feed between every record.
My programs can't read the file and SolrMarc can't index the records:


ERROR [main] (MarcImporter.java:258) - Error reading record: unable to parse record length
org.marc4j.MarcException: unable to parse record length
        at org.marc4j.MarcPermissiveStreamReader.parseRecordLength(MarcPermissiveStreamReader.java:1264)
        at org.marc4j.MarcPermissiveStreamReader.next(MarcPermissiveStreamReader.java:274)
        at org.solrmarc.marc.MarcImporter.importRecords(MarcImporter.java:253)
        at org.solrmarc.marc.MarcImporter.handleAll(MarcImporter.java:534)

The same file without line feeds causes no problems.

Greetings,
Hannah

Simon Spero

Oct 25, 2012, 3:44:41 PM
to solrma...@googlegroups.com
CR and LF are not allowed, which is good, as III inserts them in inappropriate places.

Simon



Robert Haschart

Oct 25, 2012, 5:06:18 PM
to solrma...@googlegroups.com
CR and LF are not allowed according to the spec. However, the marc4j MarcPermissiveStreamReader, which reads binary-encoded MARC records, is designed to handle a wide variety of ill-formed, not-to-spec MARC records, specifically including ones that have carriage returns or line feeds inserted between the records.

Looking at your error message, though, it is clear that the version of marc4j you are using contains MarcPermissiveStreamReader.java version 1.9, and the code to handle the intervening CRs or LFs isn't present until MarcPermissiveStreamReader.java version 1.10.

Retrieving a more recent version of SolrMarc, which contains a more recent version of marc4j, should help handle this problem.

Looking back over your messages to the list in August, it seems you have already been wanting to upgrade your SolrMarc for other reasons.

I will make it a priority over the next few days to assist you in upgrading your version of SolrMarc.

-Bob Haschart

Hannah

Oct 26, 2012, 2:42:16 AM
to solrma...@googlegroups.com
Hi,
many thanks for the answers!


I have a program that makes 'readable' MARC files using Marc4j.
The code is short:
#############
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStreamWriter;

import org.marc4j.MarcReader;
import org.marc4j.MarcStreamReader;
import org.marc4j.marc.Record;

public class MakeMrcReadable {

    public static void main(String[] args) throws IOException {
        // try-with-resources closes both streams, even if reading fails
        try (InputStream in = new FileInputStream("/data/marcdaten/rds/exdat_ubt_test5.mrc");
             BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(
                     new FileOutputStream("/data/marcdaten/exdat_ubt_test5.readable")))) {

            // Strict reader: expects each record to start directly with its leader
            MarcReader reader = new MarcStreamReader(in);
            while (reader.hasNext()) {
                Record record = reader.next();
                // System.out.println(record.toString());
                bw.write(record.toString());
            }
        }
    }
}
#############

I downloaded marc4j-2.5.1.beta.jar and now I get the following error:

LEADER 00226cu  a2200121ui 4500
001 04792582
003 DE-21
004 20070987
005 20111221000000.0
008 110621||||||||||||||||ger|||||||
852   $aW
852  1$a290$cZA 10093
938   $zL

Exception in thread "main" org.marc4j.MarcException: unable to parse record length
        at org.marc4j.MarcStreamReader.parseRecordLength(MarcStreamReader.java:351)
        at org.marc4j.MarcStreamReader.next(MarcStreamReader.java:138)
        at marc4opac.MakeMrcReadable.start(MakeMrcReadable.java:63)
        at marc4opac.Starter.main(Starter.java:63)
Caused by: java.lang.NumberFormatException: For input string: "
0022"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Integer.parseInt(Integer.java:470)
        at java.lang.Integer.parseInt(Integer.java:514)
        at org.marc4j.MarcStreamReader.parseRecordLength(MarcStreamReader.java:349)
        ... 3 more
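
The "Caused by" line shows what goes wrong: a strict reader takes the first five bytes of each record as the record length, and with a line feed in front of the leader those five bytes are a line feed plus "0022" instead of "00226". The following stand-alone illustration in plain Java reproduces that effect. It is not marc4j's actual code; the class and method names are made up for the example.

```java
import java.nio.charset.StandardCharsets;

// Illustration only: mimics how a strict reader interprets the first five
// bytes of a record as its length. Not taken from marc4j.
class RecordLengthDemo {

    // Parse the 5-digit record length from the start of a leader.
    static int parseRecordLength(byte[] data) {
        String digits = new String(data, 0, 5, StandardCharsets.US_ASCII);
        // Throws NumberFormatException if the five bytes are not all digits.
        return Integer.parseInt(digits);
    }

    public static void main(String[] args) {
        byte[] clean = "00226cu  a2200121ui 4500".getBytes(StandardCharsets.US_ASCII);
        byte[] withLf = "\n00226cu  a2200121ui 4500".getBytes(StandardCharsets.US_ASCII);

        System.out.println("Clean leader, record length: " + parseRecordLength(clean));

        try {
            parseRecordLength(withLf);
        } catch (NumberFormatException e) {
            // The leading line feed has shifted the length field by one byte.
            System.out.println("Leader with LF cannot be parsed: " + e.getMessage());
        }
    }
}
```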

...

Robert,

We have two installations. One is for our production catalogue, where we use an old SolrMarc.
This installation will be replaced next year, so we don't want to change anything.
The other is a new test installation with VuFind and the newest SolrMarc and Solr (we managed it :-) ), which will go into production next year.
You are right: when I index the data with line feeds in the VuFind installation, there are no errors.


But my own programs for loading MARC data into the database, or for making MARC files 'readable', don't work.

But if, as you say, "CR and LF are not allowed according to the spec", I can tell this to our provider, so that he has to give us valid MARC21 files.

Hannah Ullrich

Robert Haschart

Oct 26, 2012, 11:06:30 AM
to solrma...@googlegroups.com
Hannah,

If you change your code sample to use the MarcPermissiveStreamReader, as in the version below, your program should work with the data you have. You need to pass a few additional parameters to the constructor:

public MarcPermissiveStreamReader(InputStream input, boolean permissive, boolean convertToUTF8, String defaultEncoding)

permissive -- specifies whether the reader should be much more forgiving of data errors.  Note that even if this value is false, the reader can handle the CR LF issue you are seeing.

convertToUTF8 -- specifies whether the record that is read in should be converted to UTF8 character encoding as it is read in.  Note that if you specify true, and the record is already in UTF8, the character encoding will not be changed.  Specifying false should leave the character encoding unchanged.

defaultEncoding -- specifies what encoding the records are expected to be in if they are not in UTF8. Possible values are "MARC8", "UTF8", "Unimarc" and "ISO-8859-1".



import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStreamWriter;

import org.marc4j.MarcPermissiveStreamReader;
import org.marc4j.MarcReader;
import org.marc4j.marc.Record;

public class MakeMrcReadable {

    public static void main(String[] args) throws IOException {
        boolean permissive = true;
        boolean convertToUTF8 = true;
        String defaultEncoding = "MARC8";

        // try-with-resources closes both streams, even if reading fails
        try (InputStream in = new FileInputStream("/data/marcdaten/rds/exdat_ubt_test5.mrc");
             BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(
                     new FileOutputStream("/data/marcdaten/exdat_ubt_test5.readable")))) {

            // The permissive reader tolerates CR/LF between records
            MarcReader reader = new MarcPermissiveStreamReader(in, permissive, convertToUTF8, defaultEncoding);
            while (reader.hasNext()) {
                Record record = reader.next();
                // System.out.println(record.toString());
                bw.write(record.toString());
            }
        }
    }
}
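
If a program has to keep using the strict MarcStreamReader, another option is to strip the stray bytes before parsing. The following is a minimal sketch using only the JDK, not a marc4j feature. It assumes that CR/LF bytes occur only between records (never inside one) and that the record lengths in the leaders are correct. The class name MarcLineFeedStripper and its strip method are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of marc4j: copies binary MARC records and
// drops any CR/LF bytes found between them.
class MarcLineFeedStripper {

    // Returns the number of records copied from in to out.
    static int strip(InputStream in, OutputStream out) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int records = 0;
        while (true) {
            // Skip CR/LF bytes in front of the next record.
            int first = data.read();
            while (first == '\r' || first == '\n') {
                first = data.read();
            }
            if (first == -1) {
                return records; // end of input
            }
            // The first five bytes of the leader hold the record length in ASCII digits.
            byte[] lengthBytes = new byte[5];
            lengthBytes[0] = (byte) first;
            data.readFully(lengthBytes, 1, 4);
            int length;
            try {
                length = Integer.parseInt(new String(lengthBytes, StandardCharsets.US_ASCII));
            } catch (NumberFormatException e) {
                throw new IOException("Record length is not numeric", e);
            }
            if (length < 24) { // a record is at least as long as its 24-byte leader
                throw new IOException("Implausible record length: " + length);
            }
            // Copy the rest of the record unchanged.
            byte[] rest = new byte[length - 5];
            data.readFully(rest);
            out.write(lengthBytes);
            out.write(rest);
            records++;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: MarcLineFeedStripper <input.mrc> <output.mrc>");
            return;
        }
        try (InputStream in = new BufferedInputStream(Files.newInputStream(Paths.get(args[0])));
             OutputStream out = new BufferedOutputStream(Files.newOutputStream(Paths.get(args[1])))) {
            System.out.println(strip(in, out) + " records copied");
        }
    }
}
```

The cleaned output file can then be read with the strict reader as before. If the provider's files also contain line breaks inside records, this sketch will fail on the length check or copy damaged records, so the permissive reader remains the more robust choice.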

Hannah

Oct 27, 2012, 11:02:41 AM
to solrma...@googlegroups.com
Robert,

Many thanks! It works :-)

After importing icu4j-4_8_1_1.jar, my program can read the file with LFs.

Have a nice weekend,
Hannah