Bibliothèque nationale de France (BnF) translator

ziche

Oct 1, 2010, 9:51:20 AM
to zotero-dev
I submitted a site translator (Bnf.js) for the French National Library
catalogue (http://catalogue.bnf.fr) which uses the UNIMARC data they
provide. Data gets run through the MARC translator and some specific
postprocessing is done. No extensive testing has been done so far, and
I am new to Zotero and not exactly a JavaScript guru either, so any
feedback would be welcome.

ziche

Oct 3, 2010, 7:38:03 AM
to zotero-dev
Added support for documents available in digital form. The appropriate
Gallica (see http://gallica.bnf.fr) links will be stored in the item's
url property.

http://groups.google.com/group/zotero-dev/web/BnF.js

Avram Lyon

Oct 3, 2010, 8:57:42 AM
to zoter...@googlegroups.com
> On Oct 1, 3:51 pm, ziche <zi...@noos.fr> wrote:
>> I submitted a site translator (Bnf.js) for the French National Library
>> catalogue (http://catalogue.bnf.fr) which uses the UNIMARC data they
>> provide. Data gets run through the MARC translator and some specific
>> postprocessing is done. No extensive testing has been done so far, and
>> I am new to Zotero and not exactly a JavaScript guru either, so any
>> feedback would be welcome.

This looks great and works well in my testing. I will commit it to the
repository soon.

I do have a question about your MARC changes, however -- are these due
to shortcomings in MARC and UNIMARC support that should be changed in
MARC.js? It looks like you have some expertise in MARC, so your
suggestions for improvements to the main MARC translator would be
welcome as well.

2010/10/3 ziche <zi...@noos.fr>:


> Added support for documents available in digital form. The appropriate
> Gallica (see http://gallica.bnf.fr) links will be stored in the item's
> url property.

Would it be possible to attach the original images from Gallica as
attachments to the Zotero item? Also, support for gallica.bnf.fr
search result and item pages would be nice to add in the future.

My suggestions above are just suggestions for future improvements --
the translator as it stands is very useful already and I look forward
to getting it to users.

- Avram

ziche

Oct 3, 2010, 12:30:44 PM
to zotero-dev
I am glad to hear you find this useful. There is nothing wrong with
the MARC translator, though it could use a makeover. Two kinds of
difficulties arise:

1. Mappings from UNIMARC terminology to Zotero properties,
itemTypes, creatorTypes etc. are non-trivial. No global mechanism will
cover the particularities of a given catalog (I am mapping French
publication type designations to Zotero item types - this is hardly
portable). This is why the MARC.js record.translate() method cannot be
much more than a reasonable starting point. We could, however, include
things like my mapping of UNIMARC relator codes to Zotero creatorTypes
in the next MARC translator.

2. The MARC translator currently does not cover or expose all the
specifics of UNIMARC records. There is no comfortable way to access
multiple subfields for a single identifier (meaning that you might get
"London: SomePublisher" instead of "London, New York, Tokyo:
SomePublisher" when using high-level record methods). Another (though
rather esoteric) missing feature is support for embedded fields
(potentially used in the 4## tags). These are shortcomings that could
be fixed in a newer version. I will let you know if I can find the
time to do this.
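
A minimal sketch of the two points above, assuming a simplified record
representation (a plain array of fields with subfield pairs) rather
than the actual MARC.js record object. The helper names and the sample
data are hypothetical, and the relator table is an illustrative subset,
not the mapping used in BnF.js. It relies on UNIMARC field 210 ($a
place, $c publisher) and on subfield $4 carrying the relator code.

```javascript
// Hypothetical, simplified record: NOT the MARC.js record object.
const record = [
  { tag: "210", subfields: [["a", "London"], ["a", "New York"],
                            ["a", "Tokyo"], ["c", "SomePublisher"]] },
  { tag: "700", subfields: [["a", "Doe"], ["b", "Jane"], ["4", "070"]] },
  { tag: "702", subfields: [["a", "Roe"], ["b", "Richard"], ["4", "340"]] },
];

// Illustrative subset of UNIMARC relator codes mapped to Zotero creatorTypes.
const RELATOR_TO_CREATOR_TYPE = {
  "070": "author",
  "340": "editor",
  "730": "translator",
};

// Return every value of a subfield code within one field, not just the first.
function getSubfields(field, code) {
  return field.subfields.filter(([c]) => c === code).map(([, value]) => value);
}

// Build "Place1, Place2: Publisher" from field 210, keeping repeated $a values.
function formatPublication(rec) {
  const field = rec.find((f) => f.tag === "210");
  if (!field) return "";
  const places = getSubfields(field, "a").join(", ");
  const publishers = getSubfields(field, "c").join(", ");
  return [places, publishers].filter(Boolean).join(": ");
}

// Map subfield $4 to a creatorType, falling back to "contributor".
function creatorTypeFor(field) {
  const [code] = getSubfields(field, "4");
  return RELATOR_TO_CREATOR_TYPE[code] || "contributor";
}

console.log(formatPublication(record));
console.log(record.filter((f) => f.tag.startsWith("7")).map(creatorTypeFor));
```

Falling back to "contributor" for unknown codes is one possible design
choice; a catalog-specific translator could extend or override the
table instead.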

Concerning Gallica: there is an existing translator for Gallica content
(Gallica.js, though it appears to be broken for search result pages).
It would indeed be possible to use my BnF-MARC-processor instead of
its current screen scraping, but I do not wish to interfere with
somebody else's work. When you suggest attaching Gallica images to
Zotero items, I suppose you are not thinking of actual snapshots
(~60kB for a single page of text), but of adding one attachment per
page, linking to the corresponding image URL?

Best, Florian


Avram Lyon

Oct 3, 2010, 1:08:45 PM
to zoter...@googlegroups.com
Florian,

2010/10/3 ziche <zi...@noos.fr>:


> 1. mappings from the UNIMARC terminology to Zotero properties,

[..]


> much more than a reasonable starting point. We could, however, include
> things like my mapping of UNIMARC relator codes to Zotero creatorTypes
> in the next MARC translator.
>
> 2. the MARC translator currently does not cover or expose all the
> specifics of UNIMARC records. There is no comfortable way to access
> multiple subfields for a single identifier (meaning that you might get
> "London: SomePublisher" instead of "London, New York, Tokyo:
> SomePublisher" when using high level record methods). Another (though
> rather esoteric) missing feature is support for embedded fields
> (potentially used in the 4## tags). These are shortcomings that could
> be fixed in a newer version. I will let you know if I can find the
> time to do this.

These are precisely the kinds of changes that would be good to address
at some point. As you note correctly, the MARC translator is rarely
sufficient on its own for a specific library, but it forms the core of
MARC processing for almost all the Zotero library translators, meaning
that improvements to its core functionality will immediately improve
Zotero's performance with thousands of library catalogs. Of course,
that also means that if we make a mistake in modifying MARC.js, we
will break thousands of catalogs. Oh well. :)

If you find the time to work on MARC/UNIMARC, your efforts would be
very welcome. Fortunately, it works sufficiently well already, so
improvements are not urgent.

> Concerning Gallica: there exists a translator for Gallica contents
> (Gallica.js, though it appears to be broken for search result pages).
> It would indeed be possible to use my BnF-MARC-processor instead of
> its current screen scraping, but I do not wish to interfere with
> somebody else's work. When you suggest to attach Gallica images to
> Zotero items: I suppose you are not thinking about actual snapshots
> (~60kB for a single page of text), but about adding one attachment per
> page, linking to the corresponding image URL?

If the Gallica site is similar enough to the rest of the BnF catalogs
to make including it fairly straightforward, please feel free to be
bold and handle it from the main BnF translator. Sylvain Machefert,
the author of the Gallica translator, is active on the mailing list
(he posted in September). I imagine he would be amenable to merging
them, so long as the functionality is not lost.

My suggestion for adding images was actually that you attach a PDF of
the digital object to the item
(http://www.pacifier.com/~tpope/Accessing_Manuscripts.htm#Downloading_with_Gallica),
as is done with many article databases, and as in the recently
committed Papers Past and National Archives of Australia translators.
In the case of Gallica, there is some risk that the document will be
overly large, so perhaps downloading shouldn't be automatic, but such
downloading can be incredibly useful. It appears that every, or at
least many, Gallica items have a download link, and it should be
possible to fetch the PDF automatically. Whether you decide to
implement this has to do with whether you see Gallica as similar to
Google Books, where downloading is not automatic, or to the National
Archives of Australia, where it is. For smaller items, automatic
downloading would be great. For larger ones, it might not be.

Regards and thanks for your contributions,

Avram

Avram Lyon

Oct 3, 2010, 2:58:04 PM
to zotero-dev
The BnF translator and a patch to the Gallica translator have been
committed to SVN (https://www.zotero.org/trac/changeset/6984). Thanks
to Florian for the quick work on the Gallica translator.

- Avram


ziche

Oct 3, 2010, 3:43:30 PM
to zotero-dev

ziche

Oct 3, 2010, 3:58:19 PM
to zotero-dev
Thanks for the link to the Galilei site - a fine example of reverse
engineering. As far as I could see, the
http://visualiseur.bnf.fr/Visualiseur?Destination=Gallica&O=NUMM-94893&I=40
links (pointing to pages with PDF download links) are still
operational, but no longer linked from the public Gallica interface
(at least I could not get there from http://gallica.bnf.fr/document?O=N094893).
So it might be risky to rely on these URLs for downloads. Those PDF
downloads are of a peculiar kind, anyway: the BnF server will create
PDFs on request and store them for 2 days on a public FTP server (30MB
of Galilei's Opera omnia are still waiting for me). Immediate download
would be our only option, then. Doing all this synchronously would
make the saving of a single Zotero item impossibly slow, and it is
beyond me to do it the asynchronous way. I updated BnF.js, however,
to handle the Galilei sample
(http://catalogue.bnf.fr/servlet/biblio?idNoeud=1&ID=30476052&SN1=0&SN2=0&host=catalogue)
with its multiple visualization URLs.

http://groups.google.com/group/zotero-dev/web/BnF.js

Thanks for your suggestions! Best, Florian

Avram Lyon

Oct 3, 2010, 4:15:56 PM
to zotero-dev
2010/10/3 ziche <zi...@noos.fr>:

> Thanks for the link to the Galilei site - a fine example of reverse
> engineering. As far as I could see, the
> http://visualiseur.bnf.fr/Visualiseur?Destination=Gallica&O=NUMM-94893&I=40
> links (pointing to pages with pdf download links) are still
> operational, but not linked from the public Gallica interface any more
> (at least I could not get there from http://gallica.bnf.fr/document?O=N094893).
> So it might be risky to rely on these URLs for downloads. Those PDF

Click on the box with an arrow pointing out of it, next to the (i)
information button at the top of the document. It is exposed, in a way
somewhat similar to some of the PDF download pages for journal
archives.

> [..] downloads are of a peculiar kind, anyway: the BnF server will create


> PDFs on request and store them for 2 days on a public FTP server (30MB
> of Galilei's Opera omnia are still waiting for me). Immediate download
> would be our only option, then. Doing all this synchronously would

> make the saving of a single Zotero item impossibly slow [..]

Item saving in Zotero is asynchronous already -- Firefox and Zotero
should not block while the item is being saved. The question here is
whether users _want_ to be saving the full-text documents in their
personal collections. I don't know the answer to that question.

Avram

ziche

Oct 3, 2010, 5:29:25 PM
to zotero-dev
You're quite right - I mistook the download button for some kind of
fullscreen-mode button. Still, the link to the generated PDF will be
known only after the download form has been submitted and the BnF
server has finished creating the file - as far as my understanding
goes, this delay would be part of the Zotero item saving process, even
if the download of the PDF runs asynchronously. Probably there is no
easy way to handle this - the material accessible through Gallica is
simply too inhomogeneous (from scanned medieval documents to ebooks),
so there
can be no general download policy. I suppose we just leave it to the
users, hoping they are able to tell a download from a fullscreen
button.

Best, Florian


sylvain

Oct 4, 2010, 3:10:57 AM
to zotero-dev
Hi,
I'm very happy to see that someone has developed a translator for BnF
(and the code seems better than what I would have written!).
There is no problem with adapting the Gallica translator to rely on
this one instead of the screen-scraping method.

So Florian, just do what you think is best. What I still haven't
understood is how you get to the UNIMARC display from a Gallica web
page. Take this one, for example
(http://gallica.bnf.fr/ark:/12148/btv1b1200153m): if you look at the
record, there's a "relation" field whose URL
(http://catalogue.bnf.fr/ark:/12148/cb384925524) is the detailed
view in the catalog. Is this how you want to do it? Are you sure this
"relation" field is always a link to the record?

And thanks again for the BnF translator!

Sylvain

ziche

Oct 4, 2010, 6:12:44 AM
to zotero-dev
I am glad you like the BnF translator. I did not dive into the Gallica
pages to any considerable depth, but from what I saw I assumed the
"relation" field would indeed always link to the corresponding BnF
entry - or at least: if it links to the BnF, it will link to the
correct entry. We could still screen-scrape if the relation is
missing or points anywhere other than catalogue.bnf.fr. I'll let you
know when I've got a viable patch for Gallica.js making use of the
UNIMARC data.
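
A minimal sketch of that fallback decision, assuming an environment
that provides the standard URL class (a current browser or Node.js,
which may differ from the translator sandbox). The function names and
the strategy labels are hypothetical and are not taken from Gallica.js
or BnF.js.

```javascript
// Hypothetical helper: does the Gallica "relation" value point at a
// BnF catalogue record?
function isCatalogueRelation(relation) {
  if (!relation) return false; // field missing or empty
  try {
    return new URL(relation).hostname === "catalogue.bnf.fr";
  } catch (e) {
    return false; // not a parseable absolute URL
  }
}

// Use the UNIMARC route only when the relation is usable,
// otherwise keep the existing screen scraping.
function chooseStrategy(relation) {
  return isCatalogueRelation(relation) ? "unimarc" : "screen-scrape";
}

console.log(chooseStrategy("http://catalogue.bnf.fr/ark:/12148/cb384925524"));
console.log(chooseStrategy("http://gallica.bnf.fr/ark:/12148/btv1b1200153m"));
console.log(chooseStrategy(undefined));
```

Comparing the parsed hostname rather than searching the string avoids
matching a URL that merely mentions catalogue.bnf.fr in its path or
query.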

Best, Florian


sylvain

Oct 4, 2010, 6:57:44 AM
to zotero-dev
Fine. One more piece of information about the MARC.js translator: I
made the changes that made it UNIMARC-compatible. If you found mistakes
while writing the BnF translator, don't hesitate to update it. I only
wrote a basic translator covering the main fields, and much of the
UNIMARC format is not handled by it.

And if you have questions about what I did with the Gallica
translator, don't hesitate to contact me by mail (and French works
just as well ;))

Sylvain


ziche

Oct 7, 2010, 4:22:38 PM
to zotero-dev
Uploaded a BnF.js version fixing a silly typo ("contibutor" instead of
"contributor"...).

Avram Lyon

Oct 7, 2010, 4:39:10 PM
to zoter...@googlegroups.com
Fixed in trunk: https://www.zotero.org/trac/changeset/7021
