Converting MARC fields with Catmandu - repeated subfields being squished together.

Robin Sheat

Jun 5, 2014, 11:11:24 PM
to perl4lib
I'm using Catmandu to JSON-ise MARC records for storage in
Elasticsearch, and seem to have run into something that I can't
readily see how to fix (without getting down and dirty with fixers).

I have a record that has this:

["650"," ","0","a","Time","v","Pictorial works","v","Juvenile literature.","9","15531"]

and a mapping:

marc_map('650v', 'subject.$append')

This works well enough in most cases, however when the subfield is
doubled up, I end up with:

"subject":["Time","Pictorial worksJuvenile literature."]

The $append doesn't seem to apply in this case. This only seems to
happen to repeats within a field; other 650$v subfields are in their own
strings, though they suffer the same problem.

Is this a bug in Catmandu-MARC? I've tried reading the marc_map.pl file,
but the lack of internal documentation, and the nature of what it's
doing make it not the easiest thing to understand.

--
Robin Sheat
Catalyst IT Ltd.
+64 4 803 2204
GPG: 5FA7 4B49 1E4D CAA4 4C38 8505 77F5 B724 F871 3BDF

Patrick Hochstenbach

Jun 6, 2014, 12:53:21 AM
to Robin Sheat, perl4lib
Hi Robin

By default all repeated subfields are joined with an empty string (no separator). You can change this with the 'join' option:

marc_map('650v','subject',join:'%%%')

gives you:

"subject","Pictorial works%%%Juvenile"

Or, if you have many 650 fields they are all joined into one string:

"subject","Pictorial works%%%Juvenile%%%foo%%%bar%%%test"

With the split_field command you can turn this back into an array:

split_field('subject','%%%')

gives you:

"subject",["Pictorial works","Juvenile","foo","bar","test"]
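For illustration only, here is the join-then-split round trip sketched in plain Python. This is not Catmandu code: the helper names `join_subfields` and `split_field` are hypothetical stand-ins for what the mapping and the split step do to the values, and it assumes the separator never occurs in the data.

```python
# Plain-Python sketch of the join-then-split idea; not the Catmandu API.
SEPARATOR = "%%%"  # assumption: this string never appears in a subfield value


def join_subfields(values, separator=""):
    """Concatenate subfield values into one string (empty separator by default)."""
    return separator.join(values)


def split_field(value, separator):
    """Turn a joined string back into a list of values."""
    return value.split(separator)


subfields = ["Pictorial works", "Juvenile literature."]

squished = join_subfields(subfields)           # values run together, no separator
joined = join_subfields(subfields, SEPARATOR)  # values kept apart by the separator
restored = split_field(joined, SEPARATOR)      # original list recovered

print(squished)
print(joined)
print(restored)
```

The round trip only recovers the original values if the separator is a string that cannot appear inside a subfield, which is why something unusual like '%%%' is a reasonable choice.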

Cheers
Patrick

PS. Indeed, marc_map.pl is a bit cryptic. We are compiling Perl scripts to make execution much faster. The developers are now working out how to refactor this compilation out so that the Fix packages are easier to read.
________________________________________
From: Robin Sheat [ro...@catalyst.net.nz]
Sent: Friday, June 06, 2014 5:11 AM
To: perl4lib
Subject: Converting MARC fields with Catmandu - repeated subfields being squished together.

Patrick Hochstenbach

Jun 6, 2014, 1:02:08 AM
to Robin Sheat, perl4lib
Btw, I've updated the Fixes cheat sheet on our wiki to reflect your question :)

https://github.com/LibreCat/Catmandu/wiki/Fixes-Cheat-Sheet

Robin Sheat

Jun 8, 2014, 10:50:11 PM
to perl4lib
Patrick Hochstenbach wrote on Fri 06-06-2014 at 06:53 [+0200]:
> By default all repeated subfields get joined by empty space, you can
> set this with the 'join' option:
>
> marc_map('650v','subject',join:'%%%')

This doesn't work:

$ cat test.fixes
marc_map('650','subject',join:'###');
remove_field('record');

(The remove_field is just there to make the results easier to see.)

In the MARC record I'm experimenting with:

650 0 _aCounting
_vPictorial works
_vJuvenile literature.
650 0 _aEnglish language
_xAlphabet
_vPictorial works
_vJuvenile literature.
_914467
650 0 _aTime
_vPictorial works
_vJuvenile literature.
_915531
650 0 _aChildren's stories, English
_vPictorial works.

$ catmandu convert MARC --fix test.fixes < test.marc
can't load fix marc_map('650','subject',join:'###');
remove_field('record');
: Not enough arguments for join or string at (eval 85) line 1, near "join:"
syntax error at (eval 85) line 1, near "join:"

This is followed by a stack trace. The same happens when I attempt to use
split:1, or pretty much anything else after the first two parameters.

Robin Sheat

Jun 8, 2014, 10:58:32 PM
to perl4lib
Robin Sheat wrote on Mon 09-06-2014 at 14:50 [+1200]:
> $ cat test.fixes
> marc_map('650','subject',join:'###');
> remove_field('record');

Ah, I found that I need to change the syntax a bit:

marc_map('650','subject', -split => 1);

gives me:

{"subject":[["Counting","Pictorial works","Juvenile literature."],
["English language","Alphabet","Pictorial works","Juvenile literature.","14467"],
["Time","Pictorial works","Juvenile literature.","15531"],
["Children's stories, English","Pictorial works."]],"_id":"5567128"}

which is closer. Is there an easy way to flatten those arrays?

Otherwise I can go with the join followed by the split, but this way seems cleaner.

Actually, I wonder if nested arrays would work even better for my
purposes. I guess I should test it...
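For illustration only, this is what "flattening" the nested arrays means, sketched in plain Python (not a Catmandu fix). The sample data is a shortened copy of the output above, and the variable names are made up for the example.

```python
from itertools import chain

# One inner list per 650 field, as in the output above (shortened).
nested = [
    ["Counting", "Pictorial works", "Juvenile literature."],
    ["Children's stories, English", "Pictorial works."],
]

# Flatten exactly one level: every subfield value ends up in a single list.
flat = list(chain.from_iterable(nested))

print(flat)
```

Note that flattening discards the information about which subfields belonged to the same 650 field, which the nested form keeps.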

Patrick Hochstenbach

Jun 10, 2014, 1:08:12 AM
to Robin Sheat, perl4lib
Hi Robin

Sure

join_field("subject.*"," ");
join_field("subject","<br>");

The first join concatenates the subfields within each field. The second join concatenates all the fields.

In the new Catmandu version we are enhancing the language a bit; that's why I may have written my previous examples in the new syntax.
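As a rough illustration of the two-step join, here is the same idea in plain Python (not Catmandu code). The sample data and variable names are assumptions made for the example.

```python
# Nested subject data: one inner list per 650 field (sample values only).
subject = [
    ["Counting", "Pictorial works", "Juvenile literature."],
    ["Children's stories, English", "Pictorial works."],
]

# Step 1: join the subfields inside each field with a space.
per_field = [" ".join(subfields) for subfields in subject]

# Step 2: join the resulting per-field strings with "<br>".
combined = "<br>".join(per_field)

print(per_field)
print(combined)
```

The order matters: the inner lists have to be collapsed to strings first, because the outer join needs a list of strings to work on.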

Greetings from ELAG2014 in Bath!

Patrick
________________________________________
From: Robin Sheat [ro...@catalyst.net.nz]
Sent: Monday, June 09, 2014 4:58 AM
To: perl4lib
Subject: Re: Converting MARC fields with Catmandu - repeated subfields being squished together.

Robin Sheat

Jun 10, 2014, 1:32:19 AM
to perl4lib
Patrick Hochstenbach wrote on Tue 10-06-2014 at 07:08 [+0200]:
> Sure
>
> join_field("subject.*"," ");
> join_field("subject","<br>");
>
> The first join is for concatenating all the subfields. The second join
> is for all the field.

Thanks.

I actually found out that Elasticsearch is perfectly happy with nested
arrays, and they're causing no problems at all, so I've just
left it as it is and it's working great.

Robin Sheat

Jun 10, 2014, 10:44:22 PM
to perl4lib
I'm attempting to write a fixer that says "if this field hasn't already
been set, set it to '0'", so I have:

unless exists('onloan')
add_field('onloan', '0')
end

This causes the error:

can't load fix unless exists('onloan')
add_field('onloan', '0')
end
: coercion for "_fixer" (constructor argument: "fix") failed: Bareword found where operator expected at (eval 85) line 2, near ")
add_field"

It may be worth noting that the following example from the cheat sheet
also fails:

unless exists('oogly')
upcase('foo') # foo => 'bar'
end

with the same form of error. (Also, that comment is probably incorrect.)
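For reference, the behaviour the `unless exists('onloan')` block above is meant to have can be sketched in plain Python. This is only an illustration of the intended semantics, not Catmandu code, and `ensure_default` is a hypothetical helper name.

```python
def ensure_default(record, key, default):
    """Set record[key] to default only when the key is not already present."""
    if key not in record:
        record[key] = default
    return record


print(ensure_default({"_id": "5567128"}, "onloan", "0"))
print(ensure_default({"_id": "5567128", "onloan": "1"}, "onloan", "0"))
```

An existing value is left untouched, even if it is something other than '0'.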

A related question:

When I have a fixer like this that is three lines long, and I'm passing
in an array of fixers to the Catmandu::Fix constructor, what do I do
with the multiline ones? Should it be three strings in the array, should
it be one string all on one line, or should it be one string with embedded
newlines?

When I try the final option, I get the error:

Unsuccessful stat on filename containing newline at /usr/share/perl5/Catmandu/Fix/Loader.pm line 33.

which is a standard Perl warning, issued when a file operation fails on a
filename that contains a \n.

Patrick Hochstenbach

Jun 11, 2014, 1:44:47 AM
to Robin Sheat, perl4lib
Hi Robin

You might just need to upgrade your Catmandu ("cpan Catmandu") to get this fixed. You are probably using 0.9* syntax features with an older version of Catmandu.

As for your second question: yes, in Catmandu 0.9* all of these combinations will work:

my $fixer = Catmandu->fixer('do_this()', 'do_that()');
my $fixer = Catmandu->fixer(['do_this()', 'do_that()']);
my $fixer = Catmandu->fixer(<<EOF);
do_this()
do_that()
EOF
my $fixer = Catmandu->fixer('fix.txt'); # where fix.txt is a file containing the fixes
my $fixer = Catmandu->fixer('name'); # where 'name' is a section in a catmandu.yml configuration file

You might want to join our librecat-dev mailing list, where we regularly post updates and discuss technical issues:

http://mail.librecat.org/mailman/listinfo/librecat-dev

Greetings from ELAG2014 in Bath
Patrick
________________________________________
From: Robin Sheat [ro...@catalyst.net.nz]
Sent: Wednesday, June 11, 2014 4:44 AM
To: perl4lib
Subject: Converting MARC fields with Catmandu - 'unless exists' failing.

Robin Sheat

Jun 12, 2014, 2:26:27 AM
to perl...@perl.org
Patrick Hochstenbach wrote on Wed 11-06-2014 at 07:44 [+0200]:
> You might just need to upgrade your Catmandu ("cpan Catmandu") to get this fixed. You are probably using 0.9* syntax features with an older version of Catmandu.

Ah, turns out I was on an 0.8 version, from around a month ago.

Unfortunately, building the new version is going to be difficult as it
requires YAML::XS, and that has some /weird/ build conditions, like
needing to be in a git repo and on a specific branch. I'll see how I go.

> You might want to join our librecat-dev mailing list, where we regularly post updates and discuss technical issues:
>
> http://mail.librecat.org/mailman/listinfo/librecat-dev

Cheers, I'll do that.