
Re: Fwd: some transformations on file

Marc Chantreux

Jan 20, 2013, 12:13:46 PM
to samuel desseaux, perl...@perl.org
hello,

Just 2 notes about your attached content:

* please don't do that on a mailing list: it's unsolicited content.
provide download urls instead.
* those are not marc files, so the examples given below won't work
until you have translated them to iso2709.

On Sun, Jan 20, 2013 at 03:54:18PM +0100, samuel desseaux wrote:
> Hi,
>
> I work on files for our library and i need some help.
>
> I have one file with all biblio records and one with items. A biblio record
> can have one or more than one item.
>
> First operation: i want to compare the two files and the identifier is the
> field 001. I want to have the results in two separate files
>
> 1st: all the items which have the same 001 field like in the biblio record
>
> 2nd: all the items which have not the same 001 field like in the biblio
> record

not tested but here is a good base:

use Modern::Perl;
use autodie;
use MARC::MIR;

my %biblio;
my %report;
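# one report file per outcome: do.matches.txt and dont.matches.txt
open $report{$_}, '>', "$_.matches.txt" for qw< do dont >;

# remember the id (field 001) of every biblio record
marawk { $biblio{(record_id)} = 1 } 'biblio.mrc';
# write each item id to the report matching its outcome
marawk {
    my $id = record_id;
    my $as = $biblio{$id} ? 'do' : 'dont';
    say { $report{$as} } $id;
} 'items.mrc';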
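
The matching logic itself can be tried without any MARC file. The following is only a minimal sketch in core Perl: the identifier lists are made-up sample data standing in for the 001 values of the two files, and the variable names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample 001 values, standing in for the real files.
my @biblio_ids = qw(1029 3884 1650);
my @item_ids   = qw(1029 9999 3884 1650 7777);

# Build a lookup set from the biblio identifiers.
my %is_biblio = map { $_ => 1 } @biblio_ids;

# Partition the item identifiers by membership in that set.
my (@matched, @unmatched);
for my $id (@item_ids) {
    if ($is_biblio{$id}) { push @matched,   $id }
    else                 { push @unmatched, $id }
}

print "matched: @matched\n";
print "unmatched: @unmatched\n";
```

The hash is used as a set, so each item is checked in constant time no matter how many biblio records there are.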


> Second operation: In my item files, all items of a same biblio record have
> the same 001 field but they are all separated. I'd like to join all the
> items under only one 001 field

a) be careful: it will load the whole file into memory
b) not tested :)

use Modern::Perl;
use autodie;
use MARC::MIR;

my %items_for;
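# group the item records by their id (field 001)
marawk { push @{ $items_for{(record_id)} }, $_ } 'items.mrc';

# write the records back, one group after the other
open my $fh, '>', 'sorted.items.mrc';
for my $items (values %items_for) {
    print $fh to_iso2709 for @$items;
}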
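
One thing to keep in mind is that `values` returns the groups in no particular order. If the output should follow the order in which each 001 first appears in the item file, a second array can record that order. This is a minimal sketch in core Perl with made-up data; the item labels and variable names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical items in file order: [001 value, label].
my @items = (
    [ '1029', 'item-a' ],
    [ '1650', 'item-b' ],
    [ '1029', 'item-c' ],
    [ '3884', 'item-d' ],
);

my (%items_for, @order);
for my $item (@items) {
    my ($id, $label) = @$item;
    # Remember each id the first time it is seen.
    push @order, $id unless exists $items_for{$id};
    push @{ $items_for{$id} }, $label;
}

# Emit the groups in first-seen order rather than hash order.
my @lines = map { "$_: @{ $items_for{$_} }" } @order;
print "$_\n" for @lines;
```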

> After, with the new file, i want to merge with biblio record and if i find
> 2 identical 001, i attached the items on the biblio record

i don't get it. you want to merge item records and biblio record?

> Third operation: how can i correct some data bad encoded. It's due to the
> old database which doesn't respect UTF8.

i see no problem in the provided content.

regards
marc

Marc Chantreux

Jan 20, 2013, 12:14:20 PM
to samuel desseaux, perl...@perl.org
hello,

can you send me a link to the marc files again?

regards,
marc

--
Marc Chantreux
Université de Strasbourg, Direction Informatique
14 Rue René Descartes,
67084 STRASBOURG CEDEX
☎: 03.68.85.57.40
http://unistra.fr
"Don't believe everything you read on the Internet"
-- Abraham Lincoln

Marc Chantreux

Jan 20, 2013, 12:16:36 PM
to perl...@perl.org
oops! sorry about it: bad destination

samuel desseaux

Jan 20, 2013, 12:43:38 PM
to perl...@perl.org
* if it's a better solution, i will put my files (converted to iso2709) on
dropbox.

* the goal is to join items properly with biblio records. As we have two
separate files, it's a bit hard. With MarcEdit, if i merge these two files,
it's limited: marcedit doesn't understand that one biblio record can have
more than one item :-). I won't say any more about my library and its
exotic old ils, which i've replaced with koha.

Paul Hoffman

Jan 20, 2013, 8:10:14 PM
to perl...@perl.org
On Sun, Jan 20, 2013 at 06:43:38PM +0100, samuel desseaux wrote:
> *the goal is to join properly items with biblio records.

Let's assume that you have these two files:

(B) Three MARC bibliographic records

1. 001 = 1029
2. 001 = 3884
3. 001 = 1650
(etc.)

(I) Seven MARC item records

1. 001 = 1029
2. 001 = 1650
3. 001 = 1029
4. 001 = 3884
5. 001 = 3884
6. 001 = 1650
7. 001 = 1650

Do you want to produce a *new* file of three records, like this?

1. I1 + I3
2. I4 + I5
3. I2 + I6 + I7

Is this really what you want to have in the end?

> As we have two separate files, it's a bit hard. With MarcEdit, if i
> merge these two files, it's limited: marcedit doesn't understand that
> one biblio record can have more than one item :-). I won't say any
> more about my library and its exotic old ils, which i've replaced with koha.

It sounds as though what you *really* want in the end is a *single* file
of three MARC records, like this:

B1 + I1 + I3
B2 + I4 + I5
B3 + I2 + I6 + I7

Is that right? Here's a rough start in Perl:

-------->8-------->8-------->8-------->8-------->8-------->8-------->8--------
use MARC::File::USMARC;
my ($file, %records);
# $bib_records_file and $item_records_file hold the paths to the two input files
$file = MARC::File::USMARC->in($bib_records_file);
while (my $bib_marc = read_next_record_from($file)) {
    my $sysnum = sysnum($bib_marc);
    $records{$sysnum} = [ $bib_marc ];
}
$file->close;
$file = MARC::File::USMARC->in($item_records_file);
while (my $item_marc = read_next_record_from($file)) {
    my $sysnum = sysnum($item_marc);
    push @{ $records{$sysnum} }, $item_marc;
}
$file->close;
# for real output, serialize each record first
# (e.g. with as_usmarc if these are MARC::Record objects)
print @$_ for values %records;
-------->8-------->8-------->8-------->8-------->8-------->8-------->8--------

Let us know if you need help writing read_next_record_from() or
sysnum().
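
To check the grouping logic on its own, here is a minimal sketch in core Perl that replays the example above with labels in place of real MARC records. The labels B1..B3 and I1..I7 and the variable names are only illustrative, and skipping items that have no bibliographic record is an assumption, not something settled in this thread.

```perl
use strict;
use warnings;

# The 001 values from the example above, in file order.
my @bib_ids  = qw(1029 3884 1650);                      # B1..B3
my @item_ids = qw(1029 1650 1029 3884 3884 1650 1650);  # I1..I7

my %records;
# Start each group with its bibliographic record.
for my $n (1 .. @bib_ids) {
    $records{ $bib_ids[$n - 1] } = ["B$n"];
}
# Append each item to the group with the same 001.
# Assumption: items without a bibliographic record are skipped.
for my $n (1 .. @item_ids) {
    my $group = $records{ $item_ids[$n - 1] } or next;
    push @$group, "I$n";
}

# Print the groups in bibliographic file order.
my @merged = map { join ' + ', @{ $records{$_} } } @bib_ids;
print "$_\n" for @merged;
```

Iterating over `@bib_ids` instead of `values %records` keeps the output in the order of the bibliographic file.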

Paul.

--
Paul Hoffman <nku...@nkuitse.com>

samuel desseaux

Jan 21, 2013, 8:26:30 AM
to perl...@perl.org
Hi Paul,

yes, it's exactly the way i try to follow.

I've my algorithm but it's a bit hard, (for the moment but i hope to have
more time to learn perl) to write a good code.


samuel

