Non-numeric MARC tags

Ross Singer

Jun 10, 2009, 9:49:44 PM
to solrmarc...@googlegroups.com
Is there any reason why SolrMarc is restricted to only numeric field
codes? Is the use case restricted to MARC21?

From http://www.loc.gov/marc/96principl.html#six:

6. Variable Fields and Tags

6.1. The data in a MARC record is organized into fields, each
identified by a three-character tag.
6.2. According to ANSI Z39.2, the tag must consist of alphabetic or
numeric ASCII graphic characters, i.e., decimal integers 0-9 or
letters A-Z (uppercase or lowercase, but not both). The MARC 21
formats have used only numeric tags.
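A minimal sketch of the tag rule quoted above, in Java to match the code later in this thread. The class and method names are hypothetical, and the sketch assumes the "uppercase or lowercase, but not both" clause applies within a single tag; the standard may intend it more broadly (per format or per implementation).

```java
// Sketch: check a tag against the ANSI Z39.2 syntax quoted above.
// Assumption: "uppercase or lowercase, but not both" is enforced per tag here.
public class TagSyntax {

    /** True if the tag is exactly three ASCII digits/letters, with no mixed-case letters. */
    public static boolean isValidZ392Tag(String tag) {
        if (tag == null) {
            return false;
        }
        // Digits plus uppercase letters only, or digits plus lowercase letters only.
        return tag.matches("[0-9A-Z]{3}") || tag.matches("[0-9a-z]{3}");
    }

    /** True if the tag uses only digits, as MARC 21 tags do. */
    public static boolean isMarc21StyleTag(String tag) {
        return tag != null && tag.matches("[0-9]{3}");
    }

    public static void main(String[] args) {
        String[] tags = {"245", "JGL", "jgl", "Jgl", "24", "J-L"};
        for (String t : tags) {
            System.out.println(t + " valid=" + isValidZ392Tag(t) + " marc21=" + isMarc21StyleTag(t));
        }
    }
}
```

Under this reading, "JGL" and "jgl" are legal Z39.2 tags but not MARC 21-style tags, while "Jgl" is rejected because it mixes cases.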

I know our MARC export from Alto can contain alpha tags and I think
Aleph does this too.

I had hoped to add information from Jangle (URIs, etc.) in tags like
these (JAN or JGL, etc.) -- instead I've hacked a 004 tag for
Blacklight and co-opted the 001 for VuFind, but all this feels dirty.
Not that JGL is great or anything, but it provides a pretty simple way
to hack some extra semantics into a MARC record without having to add
any Java.

Alternatively, if there's no interest in this, any recommendations on
how I can do something similar?

Thanks,
-Ross.

Robert Haschart

Jun 11, 2009, 12:42:01 PM
to solrmarc...@googlegroups.com
With a relatively minor change to the Solrmarc code, it can be made to happily handle alphabetic tags.
The only code that assumes the tag is numeric is the code that determines whether a tag is for a
control field or a data field (where a tag value <= 9 implies a control field).

I added a function to test for a tag being a control field (in SolrIndexer.java): 

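    protected static boolean isControlField(String fieldTag)
    {
        // Tags 000-009 count as control fields (000 is the Leader, which callers
        // handle separately); anything else, including "JGL", is a data field.
        return fieldTag.matches("00[0-9]");
    }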
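To illustrate why the numeric conversion was the only obstacle, here is a small standalone sketch comparing `isControlField()` with an integer-parsing check similar to the old `new Integer(fldTag)` line. The class name `ControlFieldCheck` and the method `isControlFieldNumeric` are hypothetical.

```java
// Sketch contrasting the old numeric test with the regex test from the post.
// ControlFieldCheck and isControlFieldNumeric are hypothetical names.
public class ControlFieldCheck {

    /** Same test as the isControlField() method above: tags 000-009. */
    public static boolean isControlField(String fieldTag) {
        return fieldTag.matches("00[0-9]");
    }

    /** Approximation of the old approach: parse the tag as an integer. */
    public static boolean isControlFieldNumeric(String fieldTag) {
        int iTag = Integer.parseInt(fieldTag); // throws NumberFormatException for "JGL"
        return iTag <= 9;
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"001", "008", "245", "JGL"}) {
            System.out.print(tag + ": regex says control=" + isControlField(tag));
            try {
                System.out.println(", numeric says control=" + isControlFieldNumeric(tag));
            } catch (NumberFormatException e) {
                System.out.println(", numeric check throws NumberFormatException");
            }
        }
    }
}
```

A side effect worth noting: the integer version also accepts malformed tags such as "9" or "-1" as control fields, while the regex only accepts exactly three characters.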

and then changed the two functions that used to convert a passed-in field tag to an Integer so that they use this
new function. Here is one of them:

    @SuppressWarnings("unchecked")
    protected static Set<String> getSubfieldDataAsSet(Record record, String fldTag, String subfldsStr, String separator)
    {
        Set<String> resultSet = new LinkedHashSet<String>();

        // Process Leader
        if (fldTag.equals("000"))
        {
            resultSet.add(record.getLeader().toString());
            return resultSet;
        }
       
        // Loop through Data and Control Fields.
        // The tag is no longer converted to an Integer, so alphabetic tags work too.
        List<VariableField> varFlds = record.getVariableFields(fldTag);
        for (VariableField vf : varFlds)
        {
            if (!isControlField(fldTag) && subfldsStr != null)
            {
                // DataField
                DataField dfield = (DataField) vf;

                if (subfldsStr.length() > 1 || separator != null)
                {
                    // Allow automatic concatenation of grouped subfields
                    StringBuffer buffer = new StringBuffer("");
                    List<Subfield> subFlds = dfield.getSubfields();
                    for (Subfield sf : subFlds)
                    {
                        if (subfldsStr.indexOf(sf.getCode()) != -1)
                        {
                            if (buffer.length() > 0) 
                                buffer.append(separator != null ? separator : " ");
                            buffer.append(sf.getData().trim());
                        }
                    }                       
                    if (buffer.length() > 0)
                        resultSet.add(buffer.toString());
                }
                else
                {
                    // get all instances of the single subfield
                    List<Subfield> subFlds = dfield.getSubfields(subfldsStr.charAt(0));
                    for (Subfield sf : subFlds)                        
                    {
                        resultSet.add(sf.getData().trim());
                    }
                }
            }
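            else if (vf instanceof ControlField)
            {
                // Control Field (the instanceof check avoids a ClassCastException
                // when a data field is requested with no subfield codes)
                resultSet.add(((ControlField) vf).getData().trim());
            }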
        }
        return resultSet;
    }
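For readers without marc4j at hand, the data-field branch of `getSubfieldDataAsSet()` above can be sketched without any library. `SimpleSubfield` and `extract` are hypothetical stand-ins for marc4j's `Subfield` and the method above, and the "JGL" field contents are made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Standalone sketch of the data-field branch of getSubfieldDataAsSet(),
// using a hypothetical SimpleSubfield class instead of marc4j's Subfield.
public class SubfieldJoinSketch {

    /** Hypothetical stand-in for a MARC subfield: a one-character code plus data. */
    public static class SimpleSubfield {
        final char code;
        final String data;

        public SimpleSubfield(char code, String data) {
            this.code = code;
            this.data = data;
        }
    }

    /**
     * Mirrors the two cases in the post for one data field: several codes (or a
     * separator) joins matching subfields into one value; a single code with no
     * separator returns each matching subfield as its own value.
     */
    public static Set<String> extract(List<SimpleSubfield> subfields, String subfldsStr, String separator) {
        Set<String> result = new LinkedHashSet<>();
        if (subfldsStr.length() > 1 || separator != null) {
            StringBuilder buffer = new StringBuilder();
            for (SimpleSubfield sf : subfields) {
                if (subfldsStr.indexOf(sf.code) != -1) {
                    if (buffer.length() > 0) {
                        buffer.append(separator != null ? separator : " ");
                    }
                    buffer.append(sf.data.trim());
                }
            }
            if (buffer.length() > 0) {
                result.add(buffer.toString());
            }
        } else {
            char wanted = subfldsStr.charAt(0);
            for (SimpleSubfield sf : subfields) {
                if (sf.code == wanted) {
                    result.add(sf.data.trim());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // A made-up "JGL" field carrying two URIs and a label.
        List<SimpleSubfield> jgl = Arrays.asList(
            new SimpleSubfield('u', "http://example.org/resource/1"),
            new SimpleSubfield('u', "http://example.org/resource/2"),
            new SimpleSubfield('a', "Jangle resource"));
        System.out.println(extract(jgl, "u", null));   // each $u as a separate value
        System.out.println(extract(jgl, "ua", " | ")); // $u and $a joined into one value
    }
}
```

Nothing in this logic depends on the tag itself, which is why only the control/data decision needed to change to support alphabetic tags.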


I'll probably check this change into the trunk in the near future.

-Bob

Ross Singer

Jun 12, 2009, 10:39:54 AM
to solrmarc...@googlegroups.com
Bob, this would be very cool.

So, if I'm reading this correctly -- Control fields would remain
numeric, but data fields could be alpha or numeric?

This works fine for me -- I just want to confirm.

Thanks!
-Ross.

Robert Haschart

Jun 12, 2009, 11:13:08 AM
to solrmarc...@googlegroups.com
Ross,

That's correct. Since Control fields do not and cannot have subfields, they have to be handled differently from Data fields. As far as I am aware, what defines a Control field is a tag that is a three-digit numeric string with a value less than 10.
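A quick sketch checking that this definition (a three-digit numeric tag with a value less than 10) selects exactly the same tags as the `00[0-9]` regex in `isControlField()`. The class and method names are hypothetical; the check simply enumerates 000 through 999.

```java
// Sketch: check that "three-digit numeric tag with value < 10" and the
// "00[0-9]" regex pick out the same tags. Note that 000 is the Leader, which
// getSubfieldDataAsSet() handles separately before this test is reached.
public class ControlFieldDefinitionCheck {

    /** The wording above: a three-digit numeric string whose value is less than 10. */
    public static boolean byNumericValue(String tag) {
        return tag.matches("[0-9]{3}") && Integer.parseInt(tag) < 10;
    }

    /** The regex used by isControlField() in the earlier message. */
    public static boolean byRegex(String tag) {
        return tag.matches("00[0-9]");
    }

    /** Counts three-digit tags 000-999 on which the two definitions disagree. */
    public static int countDisagreements() {
        int mismatches = 0;
        for (int i = 0; i < 1000; i++) {
            // Zero-pad to three digits without relying on locale-specific formatting.
            String tag = (i < 10 ? "00" : i < 100 ? "0" : "") + i;
            if (byNumericValue(tag) != byRegex(tag)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("disagreements over 000-999: " + countDisagreements());
        // Alphabetic tags fail both tests, so they are treated as data fields.
        System.out.println("JGL control? " + byNumericValue("JGL") + " / " + byRegex("JGL"));
    }
}
```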

    -Bob Haschart