MARC and UNIMARC


ziche

unread,
Oct 7, 2010, 5:04:27 PM10/7/10
to zotero-dev, smach...@gmail.com
Following a suggestion by Avram, I had a deeper look into Marc.js and
the possibilities of enhancing MARC support (with Sylvain's kind
support). This concerns UNIMARC in particular, but I am trying to
address Marc-21, too. I have uploaded some discussion materials to
http://zotero-dev.googlegroups.com/web/Marc2-suggestions.zip. As Avram
pointed out, there is nothing to be gained and much to be lost by
hurrying things in such a central place as the MARC engine is—this is
just meant as a starting point.
The ZIP file contains
- an importer Marc2.js, capable of importing binary Unimarc- and
Marc-21 files. My import tests have been very limited, though.
Marc2.js also exposes a "Marc" namespace which is the entry point for
translators using the module. This namespace attempts to model MARC
records with leaders, fields and subfields, and exposes a number of
handy constants and helper classes that clients can use to convert
between Zotero items, MARC records and external MARC representations
(textual or binary).
- the JSDoc-generated docs for Marc2.js (obviously insufficient)
- an example translator Library Catalog (Dynix)2.js using Marc2.js. I
tested it against http://www.babord.univ-bordeaux.fr. The example
intends to demonstrate how Marc translators could become a little
cleaner and maybe easier to maintain.
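[For readers who have not opened the ZIP: the record model described above can be pictured as something like the following self-contained sketch. The names (MarcRecord, addField, getSubfield) are illustrative only and are not the actual Marc2.js API.]

```javascript
// Illustrative sketch only: a bare-bones MARC record model with a leader,
// repeatable fields, and subfields, in the spirit of the Marc2.js design
// described above. All names here are hypothetical, not the real API.
function MarcRecord(leader) {
    this.leader = leader || ''; // a real record carries a 24-character leader
    this.fields = [];           // fields are repeatable by design
}

MarcRecord.prototype.addField = function (tag, ind1, ind2, subfields) {
    // subfields: array of [code, value] pairs, e.g. [['a', 'Title proper']]
    this.fields.push({ tag: tag, ind1: ind1, ind2: ind2, subfields: subfields });
    return this;
};

MarcRecord.prototype.getFields = function (tag) {
    return this.fields.filter(function (f) { return f.tag === tag; });
};

MarcRecord.prototype.getSubfield = function (tag, code) {
    // First matching subfield value, or null if absent
    var fields = this.getFields(tag);
    for (var i = 0; i < fields.length; i++) {
        for (var j = 0; j < fields[i].subfields.length; j++) {
            if (fields[i].subfields[j][0] === code) {
                return fields[i].subfields[j][1];
            }
        }
    }
    return null;
};

// Example: a UNIMARC-style title statement (field 200, $a and $f)
var rec = new MarcRecord();
rec.addField('200', '1', ' ', [['a', 'Les Misérables'], ['f', 'Victor Hugo']]);
```

[A site translator built on such a model would read values by tag and subfield code instead of parsing raw catalog text, which is what makes the Dynix example shorter.]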

There is nothing revolutionary in this, and the current MARC support
works fine, thanks to all the guys who contributed to it. I just
thought we might be able to exploit the richness and granularity of
MARC information a little further, and to ease the creation of new
MARC based site translators. Let me know about your views on the
subject.

Avram Lyon

unread,
Oct 7, 2010, 5:22:26 PM10/7/10
to zoter...@googlegroups.com
2010/10/8 ziche <zi...@noos.fr>:

> Following a suggestion by Avram, I had a deeper look into Marc.js and
> the possibilities of enhancing MARC support (with Sylvain's kind
> support). This concerns UNIMARC in particular, but I am trying to
> address Marc-21, too. I have uploaded some discussion materials to
> http://zotero-dev.googlegroups.com/web/Marc2-suggestions.zip. As Avram
> pointed out, there is nothing to be gained and much to be lost by
> hurrying things in such a central place as the MARC engine is—this is
> just meant as a starting point.

I should note that I have also been muddling about in MARC.js in the
past few days, as I try to get Marc-21's multilingual data support to
feed into the multilingual frameworks of Frank's Multilingual Zotero
project. Any major new development on the core translators like MARC
should probably take into account the coming possibility of storing
translations and transliterations of titles, authors, and other data.

I'll take a look at the materials soon and comment as I can -- it
would be good if you took a look at what Frank has produced
(http://gsl-nagoya-u.net/http/pub/zotero-multilingual-overview.html)
and perhaps experimented with the test XPI or the SVN branch
(https://www.zotero.org/trac/browser/extension/branches/trunk-multilingual).
You probably shouldn't try too hard to make a translator work with
this experimental branch right now, since it is liable to change
before it merges to the Zotero trunk, but it's still reason enough to
think about how you can gather the multiple representations of
bibliographic data from MARC.

Regards,

Avram

ziche

unread,
Oct 7, 2010, 5:58:11 PM10/7/10
to zotero-dev
Hi Avram,

thank you for the hint concerning multilingual Zotero. I have
installed the multilingual version just now and will be running some
tests. By the way, the BnF's Unimarc test data
(http://www.bnf.fr/documents/autorites_unimarc_iso5426.not) is almost entirely
multilingual (they put all the scripts they have in there). Marc2.js
imports it into multilingual Zotero just fine, but up to now without
making use of the multilingual features (title versions etc. are
concatenated; compare with the UNIMARC code attached as note to each
imported item). It will make a useful test case for me.

Best, Florian

Avram Lyon

unread,
Oct 8, 2010, 12:59:03 AM10/8/10
to zotero-dev
Florian,

2010/10/8 ziche <zi...@noos.fr>:
> [..] I have uploaded some discussion materials to
> http://zotero-dev.googlegroups.com/web/Marc2-suggestions.zip. [..]

I'm having trouble downloading the zip file. In light of the announced
coming closure of the Files sections on Google Groups, could you post
this somewhere else -- maybe as separate files on Github or Bitbucket?
That way you'll get revision histories as a nice bonus as well, and
people will be able to reliably access the files.

Group admins: Can you change the group description to reflect that the
Files section is no longer recommended for code submissions?

- Avram

ziche

unread,
Oct 8, 2010, 2:33:26 AM10/8/10
to zotero-dev
Strange things happen to the file links. The plain link
http://zotero-dev.googlegroups.com/web/Marc2-suggestions.zip (without
any parameters) seems to work for me, however. I'm looking at github.

ziche

unread,
Oct 8, 2010, 3:09:18 AM10/8/10
to zotero-dev
I put it on github:
http://github.com/zomark/zotero-marc

Frank Bennett

unread,
Oct 8, 2010, 10:55:04 PM10/8/10
to zotero-dev
Florian, Avram,

It's great to see work on the multilingual support so soon. Avram
pointed out an issue with language tagging in MARC -- it uses ISO
639-2 three-character language tags, many of which don't appear in the
IANA Language Subtag Registry that multilingual Zotero looks to when
validating tags.

I've added a conversion table to the multilingual branch, and set up
the validator to call it. Such tags should now just normalize
transparently to their ISO 639-1 equivalents, where one exists
(untested, but I'm pretty sure it will work when fired in anger).
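[In sketch form, the normalization Frank describes amounts to a lookup from ISO 639-2 three-letter codes to their ISO 639-1 two-letter equivalents, with unmapped tags passed through to the validator. The excerpt below is a tiny illustrative table; the branch keeps the full mapping in the isoTagMap database table mentioned later in this thread.]

```javascript
// Illustrative excerpt of an ISO 639-2 (three-letter, bibliographic) to
// ISO 639-1 (two-letter) normalization table. The multilingual branch
// stores the full mapping in sqlite (isoTagMap) and queries it from the
// validator; this in-memory object is just for demonstration.
var ISO6392_TO_6391 = {
    eng: 'en', fre: 'fr', ger: 'de', rus: 'ru',
    chi: 'zh', jpn: 'ja', ara: 'ar', heb: 'he'
};

function normalizeLangTag(tag) {
    // Map 639-2 codes where a 639-1 equivalent exists; return anything
    // else unchanged, to be accepted or rejected by the tag validator.
    return ISO6392_TO_6391[tag] || tag;
}
```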

The upshot of this is that you should be able to feed MARC tags
directly to the setMultiField() function or to the servantLang
property currently used on creators, and have everything work normally
-- the tag should turn up in the Preferences menu under the normalized
tagname after the item is viewed in the item info panel.

Re possible changes to the existing code, I think the setMultiField()
function will prove stable, but the use of servantLang on creator
objects is probably going to change. If all goes well, I should have
the changes in place in the next month or so.

Frank

Frank Bennett

unread,
Oct 9, 2010, 1:31:13 AM10/9/10
to zotero-dev
On Oct 9, 11:55 am, Frank Bennett <biercena...@gmail.com> wrote:
> [...]

Have done a quick check with the new code in place, and it seems to
work. If the language is not known, it is registered in the
Preferences nickname list with the original tag as the nickname value,
and the translated (IANA) value as the actual tag. The nickname can
be edited, and the edit will stick across subsequent imports with the
same tag value. Looks about right all 'round.

Frank Bennett

unread,
Oct 9, 2010, 5:30:01 AM10/9/10
to zotero-dev
Had a chance to do a more careful check just now, and found that the
three-character tag was being saved to the database, although the
display layer used the correct IANA tag value. That's been fixed now
(in SVN and XPI both), but if you have run any data with MARC language
tags through the client that was up for the past few hours, you may
need to open the database with an editor, and delete the offending tag
entries from the "zlsTags" table by hand.

Other than _that_, it does look about right all 'round. Now.

ziche

unread,
Oct 9, 2010, 11:47:26 AM10/9/10
to zotero-dev
Frank, Avram,

I had a chance to play with Unimarc and multilingual Zotero. The
latest Marc2.js version on http://github.com/zomark/zotero-marc
attempts to import multilingual titles (I did not get any further)
from binary UNIMARC (my test case is still
http://www.bnf.fr/documents/bibliographiq_unimarc_iso5426.not). The
only issue was with the ZlsValidator code, lines 85 sqq.:
    if (this.remnant.length == 1) {
        var sql = 'SELECT iana FROM isoTagMap WHERE iso=?';
        var res = Zotero.DB.valueQuery(sql, [this.remnant[0]]);
        if (res) {
            this.remnant[0] = res;
        }
    }
would not run for my tests, because I feed lang-script pairs into the
setMultiField method, not language codes only, so remnant.length will
be > 1 (Unimarc has some rather rudimentary support for variant
scripts). Commenting out the "if" condition did the trick.
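[The failure mode is easy to reproduce: splitting a bare language code on "-" leaves one subtag, while a lang-script pair such as "zh-Hans" leaves two, so the length == 1 guard skips the ISO lookup entirely. A minimal sketch of the alternative behavior, with a hypothetical helper rather than the actual ZlsValidator code:]

```javascript
// Sketch (hypothetical helper, not actual Zotero code): normalize the
// primary language subtag whether or not script/region subtags follow,
// instead of guarding on remnant.length == 1.
function normalizeFirstSubtag(tag, isoMap) {
    var remnant = tag.split('-');   // 'zh-Hans' -> ['zh', 'Hans']
    if (isoMap[remnant[0]]) {
        remnant[0] = isoMap[remnant[0]];
    }
    return remnant.join('-');
}
```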

The remaining problem is, of course, that there are too many
different ways in MARC/UNIMARC to encode multilingual content: Unimarc
knows parallel titles in subfields of 200, parallel titles in 510
fields, and the BnF does it still another way (multiple 200 tags, in
breach of an explicit "Non repeatable" in the specs).

Best, Florian

ziche

unread,
Oct 9, 2010, 4:04:54 PM10/9/10
to zotero-dev
Frank,
I still have trouble with the ISO language codes. I am using the
latest SVN sources, but still get three-character ISO codes in the
database (I tried to figure out where the ZlsValidator would kick in
during the item saving process, but did not find such a place). Once
the ISO codes are in the database (e.g. zh-Hans), I have two
alternatives: either I comment out the multilingual.js line 85
condition (see my previous post), in which case the languages get
created in the prefs, but the item info GUI is messed up (it contains
just the item type and the title) - or I leave multilingual.js as it
is, in which case the item displays like a plain single-language item,
and no languages are added in the prefs. Such behavior is of course to
be expected with a mismatch between the database and display layers
concerning the language codes - but what am I missing? I am using
Marc2.js to import items from the BnF test set (see previous post) -
in Marc2.js you may want to edit the Marc.IO.setMaxImportRecords(10);
line to alter the size of the test set.
Best, Florian




Frank Bennett

unread,
Oct 9, 2010, 4:47:19 PM10/9/10
to zotero-dev
On Oct 10, 5:04 am, ziche <zi...@noos.fr> wrote:
> I still have trouble with the ISO language codes. [...]

Florian,

Sorry that it's not working as advertised. I'll take a look this
morning. As you saw in the code, I had blindly assumed that Unimarc
would use only single primary tags, without variants. I'll drop that
assumption.

For initial registrations, the validator is run from
xpcom/data/cachedLanguages.js. But the code is pretty green; I'll dig
into it with your translator code and report back.

Frank

Frank Bennett

unread,
Oct 9, 2010, 5:33:30 PM10/9/10
to zotero-dev
On Oct 10, 5:47 am, Frank Bennett <biercena...@gmail.com> wrote:
> > I am using Marc2.js to import items from the
> > Bnf test set (see previous post) - in Marc2.js you may want to edit
> > the Marc.IO.setMaxImportRecords(10); line to alter the size of the
> > test set.

I have downloaded the test data (both the bibliographiq and autorites
files), but I'm unable to get the import translator to run against
them. Here's what I've done:

- Downloaded the current MARC2.js source from the github link:
http://github.com/zomark/zotero-marc
- Copied MARC2.js into the Zotero translators directory
- Opened the gear menu and selected Import
- Selected one of the *.not files from the directory listing
- Clicked Open (or whatever)

The browser then throws up a popup asking me whether I want to open
the file with an editor or save it to disk. I click "Cancel", and
the import progress bar appears and runs indefinitely. At that
point, I think JS has crashed, leaving the progress bar graphic
spinning without effect.

Should I be following a different procedure?

Frank

ziche

unread,
Oct 9, 2010, 6:12:19 PM10/9/10
to zotero-dev
This appears to be an older Zotero issue. You have to select
"UTF-8" (or anything but "auto-detect") in Prefs | Export | Import
character encoding.

In the meantime, I fixed a couple of things in Marc2.js, and added
some homemade language code validation in Item.setMultiField, just to
see what happens. I get the correct codes in the database, and on the
first visualization of each item the dependent entries are in place.
It's in subsequent clicks on the same items that they disappear and I
get an "Invalid language tag" message on the console (from
itembox.xml).

Best, Florian

Frank Bennett

unread,
Oct 9, 2010, 6:42:49 PM10/9/10
to zotero-dev
Okay, a little progress here:

- I was dropping the translator in the wrong location (SVN rather than
the running instance). Oops.
- I've set the import encoding in prefs, and that has made the
download attempt go away.
- There was a bug affecting display of language nicknames in ML
Zotero. Fix has been checked in.
- The MARC.js translator had to be moved out of the way, as it seemed
to block invocation of MARC2.js.

With those fixes in place, the translator runs, but I get corrupted
entries (authors where titles belong, random garbage in other
fields). I'll try again a bit later with a refreshed copy of
MARC2.js, but meanwhile it might be worth updating ML Zotero from SVN,
to see if the fix I just put in has any effect.

Frank

Frank Bennett

unread,
Oct 9, 2010, 7:03:36 PM10/9/10
to zotero-dev
A couple of corrections to my previous message.

With the github file of size 95.208 kb (latest download), and with
MARC.js moved out of the way, I'm getting empty entries against the
sample file autorites_unimarc_iso5426.not. The other sample file
(bibliographiq_unimarc_iso5426.not) fails with an invalid-file
complaint.

With MARC.js in place, I still get an error against
bibliographiq_unimarc_iso5426.not, and no entries, but in that
configuration, the autorites_unimarc_iso5426.not file processes
completely (all entries), which I take to mean that the MARC.js
translator is trumping MARC2.js. The items in this case contain
random garbage in their fields.

Will investigate further later today.

Frank

Frank Bennett

unread,
Oct 9, 2010, 8:40:19 PM10/9/10
to zotero-dev
The translator (commit # 86f9d9d0eb84055abc07) is currently
interpreting the autorites_unimarc_iso5426.not file as marc21, so the
multilingual conditional is not even being reached. A trace placed in
hasField() shows that only one field type (001) is being seen, and
only one per record. The hasField test used for unimarc
discrimination therefore turns up false, and it falls back to
marc21 (which produces no output, because the content fields are
apparently not being parsed out).

I've done a fresh grab of the source file with wget, and it's a clean
download. The only thing I can figure (assuming we are running the
same translator code) is that some difference in our respective
platforms has an impact on the way binary files are parsed in Firefox
JS. Maybe. I'm at a loss. If others can test the translator on other
platforms, maybe a pattern will emerge.

Here is the full link to the file used for testing:

http://www.bnf.fr/documents/autorites_unimarc_iso5426.not

Here is the link I used for the translator:

http://github.com/zomark/zotero-marc/blob/master/MARC2.js

Here are the details of my system here:

Linux (Ubuntu 9.10, I think)
Firefox 3.6.11
No extensions other than Zotero and Zotero OpenOffice Integration
Multilingual Zotero SVN
Import encoding set to UTF8
MARC.js file removed from Zotero translators directory

(Note to anyone who picks this up: On Linux at least, the backup files
with a tilde extension created by some editors will often load instead
of the most recent file version, so they may need to be deleted by
hand at each testing iteration.)

Hope this helps,
Frank

Frank Bennett

unread,
Oct 9, 2010, 11:48:54 PM10/9/10
to zotero-dev
Found it. The sample file autorites_unimarc_iso5426.not appears to be
written in Adobe Standard Encoding:

http://www.ascii.ca/adobestd.htm

Frank

ziche

unread,
Oct 10, 2010, 5:13:53 AM10/10/10
to zotero-dev
Thanks for all that testing, Frank. I think we are getting somewhere.

- the authorities test set is not meant to be imported by Zotero, I
posted this link by mistake. It's the bibliographic items test set.
- if you rename the test set to .marc2, you should be able to leave
Marc.js in place and have Marc2.js handle the import anyway. I put the
file on github: http://github.com/zomark/zotero-marc/blob/master/sampledata/bibliographiq_unimarc_utf8.marc2
- your CachedLanguages patch resolved the problem I had yesterday with
"Invalid language tag" messages
- I do get garbage, too, unless I take care to run the ZlsValidator
before committing language codes to the database. I patched item.js
for this purpose:
http://github.com/zomark/zotero-marc/blob/master/chrome-patches/content/zotero/xpcom/data/item.js
but there might be better places.
In the same context, the ISO/IANA lookup has to be prepared to handle
lang-Script codes: http://github.com/zomark/zotero-marc/blob/master/chrome-patches/content/zotero/xpcom/multilingual.js.
Both patches seem reasonable to me, but I am just beginning to
understand the multilingual stuff.
- I uploaded a screenshot of how the result should look:
http://github.com/zomark/zotero-marc/blob/master/screenshots/MultilingualItem01.png

Thanks for everything, Florian

东东爸

unread,
Oct 10, 2010, 5:45:53 AM10/10/10
to zoter...@googlegroups.com
On Sun, Oct 10, 2010 at 5:13 PM, ziche <zi...@noos.fr> wrote:
- I uploaded a screenshot of how the result should look like:
http://github.com/zomark/zotero-marc/blob/master/screenshots/MultilingualItem01.png

Sorry for asking an unrelated question: how do you make the title display in two different languages, as in the screenshot above? I noticed that the second line of the title carries a language tag, "ar-Arab", before the Arabic title; how is that done?

Thanks in advance.

--
Best Regards!

Ace Strong

ziche

unread,
Oct 10, 2010, 7:20:42 AM10/10/10
to zotero-dev
It's not part of the current Zotero distribution, but a feature of
Frank's multilingual extensions, which will eventually be merged into
official Zotero. Have a look at http://gsl-nagoya-u.net/http/pub/zotero-multilingual-overview.html
for the details and installation links.


东东爸

unread,
Oct 10, 2010, 7:46:52 AM10/10/10
to zoter...@googlegroups.com
Thank you for the prompt reply, ziche.





--
Best Regards!

Ace Strong

==================================================
Nanjing University of Aeronautics and Astronautics.
College of Civil Aviation
TAO Cheng
E-mail: aces...@gmail.com ;aces...@nuaa.edu.cn
==================================================

Frank Bennett

unread,
Oct 10, 2010, 8:46:45 AM10/10/10
to zotero-dev
On Oct 10, 6:13 pm, ziche <zi...@noos.fr> wrote:
> Thanks for all that testing, Frank. I think we are getting somewhere.
> [...]

By golly, it works. Content comes in from the test file without a
hitch.

In the branch, I've adopted your patch to remove the one-subtag
condition, in multilingual.js / ZlsValidator().

On the patch to item.js, I see what you meant about the need to
normalize tags in the save as well as the UI preference now -- sorry,
it's been busy here with the start of term, and programming has been
getting the short end of my attention span. The patch is fine, but
I've moved the code into translate.js, where it will run only when
importing items. When setMultiField() is run from the UI, it will
always be fed pre-validated tags from the Preferences list, so we can
save the overhead of validation there. I've also set things up so
that fields with invalid tags will be passed over after they are logged.
I put the fix in for creators as well as ordinary fields, so when you
get those going, you should be able to feed them the raw tags from
unimarc as well.

It does indeed look like we're getting somewhere!

Frank

ziche

unread,
Oct 10, 2010, 2:25:23 PM10/10/10
to zotero-dev
Thanks for the patches. I think translate.js, line 1631 should be
    var langTag = data[j].servantLang;
instead of
    var langTag = data[j].lang;
otherwise everything seems to work fine.

I have done some excursions into the Marc-21 world, and uploaded a
modified LibraryOfCongress translator to
http://github.com/zomark/zotero-marc/blob/master/Library%20Catalog%20(Voyager-Marc2).js
- you will need to provide a current Marc2.js and to disable "Library
Catalog (Voyager).js" to try it out. It handles Marc-21 linked fields
(tag 880) and my test item was
http://catalog.loc.gov/cgi-bin/Pwebrecon.cgi?DB=local&CMD=lccn%222009363874%22&v3=1&CNT=10.
Thanks to Simon who wrote the original translator in such a clean
fashion that the changes were trivial.

(You must be in a better position to find suitable multi-script test
material where you are - in the LoC catalog I had a hard time locating
a single Russian classic with Cyrillic title or author data. This, and
the list of scripts supported by the official Marc-21 standard (a
total of 6, with a single entry for "Chinese, Japanese, Korean"),
makes me think the occident is doomed, and rightly so: but this has
been discussed on another thread, hasn't it.)

Greetings, Florian

Avram Lyon

unread,
Oct 10, 2010, 2:44:08 PM10/10/10
to zotero-dev
2010/10/10 ziche <zi...@noos.fr>:

> (You must be in a better position to find suitable multi-script test
> material where you are - in the LOC catalog I had a hard time locating
> a single Russian classic with cyrillic title or author data. This, and
> the list of scripts supported by the official Marc-21 standard (a
> total of 6, with a single  entry for "Chinese, Japanese, Korean"),
> makes me think the occident is doomed, and rightly so: but this has
> been discussed on another thread, hasn't it).

This frustrated me as well. To be fair, the language variant system is
more flexible than the script variant system. My map was:
switch (tag) {
    case "(3": return "Arab";
    case "(B": return "Latn";
    case "$1": return "Qabk"; // MARC has a single tag for CJK. That won't do...
    case "(N": return "Cyrl";
    case "(S": return "Grek";
    case "(2": return "Hebr";
    default:   return "Zyyy"; // Return something technically valid
}
I thought about doing detection to distinguish CJK, but that's likely
to cause even more issues. My solution here is to use a private-use
script tag. I've thought some about private-use script and variant
tags -- they're not currently supported by the code (right?), but
perhaps we could approve certain ones as they prove necessary (like
this one), and submit truly necessary and useful ones to IETF for
approval. Thus, we'd enrich our registry a little to support user
needs, but then we could migrate those tags to the final approved ones
if/when they pass the gauntlet of ietf-languages.

I'm behind on the progress you and Frank have made with the
translators, but I'm excited and happy to see that this next important
piece of the multilingual project is coming together so cleanly and
quickly. I'll provide more feedback as soon as I get a chance to make
a running install with all the new test translators.

I should also note that some other catalogs make wider use of the
multi-script support in MARC-21 -- most recent Russian publications in
the UCLA catalog include Cyrillic (ru-Cyrl) in addition to LoC
romanization (ru-Latn-alalc97); see http://catalog.library.ucla.edu
(persistent links are not a strong point of the catalog). It also uses
Voyager, so the translator should work unchanged.

Regards to the list,

Avram

ziche

unread,
Oct 10, 2010, 3:29:16 PM10/10/10
to zotero-dev
Hi Avram
I can see you have advanced further into the script/variant issues
than I have. Private use tags are probably a good idea to cope with
the MARC limitations (btw, the Unimarc script list reads

ba: "Latn", //Latin
ca: "Cyrl", //Cyrillic
da: "Jpan", //Japanese - script unspecified
db: "Hani", //Japanese - kanji
dc: "Kana", //Japanese - kana
ea: "Hans", //Chinese
fa: "Arab", //Arab
ga: "Grek", //Greek
ha: "Hebr", //Hebrew
ia: "Thai", //Thai
ja: "Deva", //Devanagari
ka: "Kore", //Korean
la: "Taml", //Tamil
ma: "Geor", //Georgian
mb: "Armn", //Armenian
zz: "Xyyy"

To say this is better than MARC-21 seems to be an exaggeration).

I had a look at UCLA and found, for example, a resource with "LC
control number: 2007222933" (it can be searched as "keyword
anywhere"). It turns out their parallel titles, places or edition
statements do not carry any language/script information at all...
Maybe we should translate these cases simply to dependent entries with
a private tag, and the multilingual UI should allow the user to
manually switch the language/script tag for an entry, or even for all
"private tag" entries of an item at the same time.

Best, Florian

ziche

unread,
Oct 10, 2010, 4:14:44 PM10/10/10
to zotero-dev
On second thought: while it may not be applicable to the CJK
problem, some kind of "script sniffing" method (based on, let's say,
http://www.unicode.org/charts/index.html#scripts) could solve problems
like the untagged Cyrillic in the UCLA catalog. This could be set up
pretty easily with a lookup table in sqlite. Or ... does anybody know
whether Google Translate's language auto-detect feature is readily
available as a Web service - so that we could get around all those
lazy catalogers and ethnocentric standard authors altogether? It must
be getting late...
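[Such script sniffing is indeed straightforward with Unicode script data; modern JavaScript even exposes it through Unicode property escapes in regular expressions (not available in 2010-era Firefox, so take this as a present-day sketch rather than anything proposed for the translator).]

```javascript
// Sketch: guess a script tag for untagged text by checking Unicode
// script properties in priority order (first match wins). Good enough
// to tag untagged Cyrillic parallel titles; it cannot disambiguate the
// CJK languages, which share Han characters.
function sniffScript(text) {
    var scripts = [
        ['Cyrl', /\p{Script=Cyrillic}/u],
        ['Arab', /\p{Script=Arabic}/u],
        ['Grek', /\p{Script=Greek}/u],
        ['Hebr', /\p{Script=Hebrew}/u],
        ['Hani', /\p{Script=Han}/u],
        ['Latn', /\p{Script=Latin}/u]
    ];
    for (var i = 0; i < scripts.length; i++) {
        if (scripts[i][1].test(text)) {
            return scripts[i][0];
        }
    }
    return 'Zyyy'; // script undetermined
}
```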
Best, Florian

ziche

unread,
Oct 10, 2010, 4:48:31 PM10/10/10
to zotero-dev
I'd sure like to try this out: http://code.google.com/apis/ajaxlanguage/documentation/#Translation.
Let's say we commit dependent entries with some kind of private-use
language/script code, meaning the metadata were insufficient to
determine the language and/or script - we could then run an
asynchronous query against the Google API. Any objections? :)

Frank Bennett

unread,
Oct 10, 2010, 8:51:48 PM10/10/10
to zotero-dev
On Oct 11, 3:44 am, Avram Lyon <ajl...@gmail.com> wrote:
> 2010/10/10 ziche <zi...@noos.fr>:
>
> > (You must be in a better position to find suitable multi-script test
> > material where you are - in the LOC catalog I had a hard time locating
> > a single Russian classic with cyrillic title or author data. This, and
> > the list of scripts supported by the official Marc-21 standard (a
> > total of 6, with a single  entry for "Chinese, Japanese, Korean"),
> > makes me think the occident is doomed, and rightly so: but this has
> > been discussed on another thread, hasn't it).

Yikes. Now that's what I call an own goal. But if the primary tag is
correct, the RFC recommends leaving out the script tag for the
most-common case, and that applies here. I think we can escape this one
without data loss by just being lazy.

>
> This frustrated me as well. To be fair, the language variant system is
> more flexible than the script variant system. My map was:
>         switch (tag) {
>                 case "(3": return "Arab";
>                 case "(B": return "Latn";
>                 case "$1": return "Qabk";       // MARC has a single
> tag for CJK. That won't do...
>                 case "(N": return "Cyrl";
>                 case "(S": return "Grek";
>                 case "(2": return "Hebr";
>                 default : return "Zyyy";        // Return something
> technically valid

I'll look at adjusting the validator machinery to not drop the field
entry on the floor when only script validation fails, so we can see
how that works out.

>
> I thought about doing detection to distinguish CJK, but that's likely
> to cause even more issues.

That might work pretty well for titles, but names are composed only of
Han characters, which are shared between all three.

> My solution here is to use a private-use
> script tag. I've thought some about private-use script  and variant
> tags -- they're not currently supported by the code (right?), but
> perhaps we could approve certain ones as they prove necessary (like
> this one), and submit truly necessary and useful ones to IETF for
> approval. Thus, we'd enrich our registry a little to support user
> needs, but then we could migrate those tags to the final approved ones
> if/when they pass the gauntlet of ietf-languages.
>
> I'm behind on the progress you and Frank have made with the
> translators, but I'm excited and happy to see that this next important
> piece of the multilingual project is coming together so cleanly and
> quickly. I'll provide more feedback as soon as I get a chance to make
> a running install with all the new test translators.
>
> I should also note that some other catalogs make wider use of the
> multi-script support in MARC-21-- most recent Russian publications in
> the UCLA catalog include Cyrillic (ru-Cyrl) in addition to LoC
> romanization (ru-Latn-alalc97); see http://catalog.library.ucla.edu

Frank Bennett

unread,
Oct 10, 2010, 8:54:40 PM10/10/10
to zotero-dev
On Oct 11, 3:25 am, ziche <zi...@noos.fr> wrote:
> Thanks for the patches. I think translate.js, line 1631 should be
> var langTag = data[j].servantLang;
> instead of
> var langTag = data[j].lang;
> otherwise everything seems to work fine.

Thanks. Fixed and checked in.

>
> I have done some excursions into the Marc-21 world, and uploaded a
> modified LibraryOfCongress translator to http://github.com/zomark/zotero-marc/blob/master/Library%20Catalog%20...
> - you will need to provide a current Marc2.js and to disable "Library
> Catalog (Voyager).js" to try it out. It handles Marc-21 linked fields
> (tag 880) and my test item was http://catalog.loc.gov/cgi-bin/Pwebrecon.cgi?DB=local&CMD=lccn%222009....
> Thanks to Simon who wrote the original translator in such a clean
> fashion that the changes were trivial.

The LoC link has a timed-out session ID in it, which doesn't affect
display of the page, but blocks acquisition of the MARC record. With
fresh access to the page, it works as advertised, complete with
creators. Nice!

ziche

unread,
Oct 11, 2010, 3:48:38 AM10/11/10
to zotero-dev
As could be expected, the Google Detect Language API is full of Big
Brother stuff (technically and legally), and the websphere is already
full of people complaining that it will falsely detect Serbian as
Croatian (imagine). So back to something simpler. Unicode.org has a
complete mapping of code points to scripts
(http://www.unicode.org/Public/UNIDATA/Scripts.txt); if we pack this
into a data table, we could at least do some kind of automatic script
detection of our own (though it won't resolve all the problems of the
Far East languages).
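The idea can be sketched quickly. The parser below is only an illustrative, minimal reading of the Scripts.txt line format (a range or a single code point, followed by a script name), not part of any existing Zotero code:

```javascript
// Sketch: parse lines of Unicode's Scripts.txt into [from, to, script]
// triples that could be packed into a data table. Format per line:
//   0041..005A    ; Latin # L& [26] ...   (range)
//   00AA          ; Latin # Lo ...        (single code point)
function parseScriptsTxt(text) {
    var ranges = [];
    var re = /^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(\w+)/;
    text.split("\n").forEach(function (line) {
        var m = re.exec(line);
        if (!m) return;   // comment or blank line
        ranges.push([parseInt(m[1], 16),
                     parseInt(m[2] || m[1], 16),   // single point: from == to
                     m[3]]);
    });
    return ranges;
}
```

A table generated this way could then be queried per code point, and the scan limited to the first few characters of a field.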

Frank Bennett

unread,
Oct 11, 2010, 4:45:13 AM10/11/10
to zotero-dev
As I wrote earlier, I'm certain that for CJK languages in the original
script, the only tag needed is the language itself. There are
phonetic variants internal to Japanese (katakana and hiragana), but
it's probably safe to assume that a service that lumps CJ&K together
isn't going to provide those anyway.

Does that help?

Frank

Avram Lyon

unread,
Oct 11, 2010, 5:20:35 AM10/11/10
to zoter...@googlegroups.com
[Note to zotero-dev people: If the multilingual branch traffic is
getting out of hand, we can move it to a different mailing list or to
private discussion.]

2010/10/10 ziche <zi...@noos.fr>:


> I had a look at UCLA and found, for example, a resource with "LC
> control number: 2007222933" (it can be searched as "keyword
> anywhere"). It turns out their parallel titles, places or edition
> statements do not carry any language/script information at all...
> Maybe we should translate these cases simply to dependent entries with
> a private tag, and the multilingual UI should allow the user to
> manually switch the language/script tag for an entry, or even for all
> "private tag" entries of an item at the same time.

UCLA does show some of the data quality issues that are endemic to
library catalogs. In cases where alternate languages and scripts are
present in the input but not tagged, I think it would be best to
assign them the unknown language/script tags as appropriate.

Another record at UCLA (ISBN 9785170545421) provides "Cyrl" alternate
data for the title and other information, but the language is not
specified in the record. I think that we should save the primary data
(fields 100, 245, etc) as the parent record without marking its
language, and save the Cyrl-tagged alternate data in the 880 fields,
using the tag "und-Cyrl".

In general, I don't think that most script-sniffing is necessary,
since the language tag is most important, and few if any sniffing
algorithms can determine that with any reliability. We have ways to
mark data as missing using the existing language tag system-- we can
leverage that.

My suggestion of a private use tag for the CJK catch-all used in
MARC-21 was intended to find a way to maintain the data we have
gleaned from the MARC record-- we don't want to say "Zyyy"
(undetermined) since we in fact know that the script is one of Hans,
Hani, Kana, Jpan, Kore.

In short, I support saving all language variants with their own
language tags if known, and with "und" if not known. If a script
variant is provided and the script is not specified (Florian has found
one of these), then we should save the data as "-Zyyy". If we also
don't know the language, I suppose the data will have to be saved as
"und-Zyyy". In Florian's example, the data in the 880 fields would
have to be saved as und-Zyyy, since it is not declared to be Cyrillic
and the language of the metadata is not specified anywhere.
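The tagging policy described here condenses into a small helper; buildVariantTag is an invented name, and this is only a sketch of the rules as stated above:

```javascript
// Sketch of the tagging policy discussed above. "und" and "Zyyy" are the
// registered subtags for undetermined language and undetermined script.
// hasScriptVariant marks the case where alternate-script data exists but
// the script itself is not identified in the record.
function buildVariantTag(lang, script, hasScriptVariant) {
    var tag = lang || "und";          // unknown language -> "und"
    if (script) {
        tag += "-" + script;          // e.g. "ru-Cyrl", "und-Cyrl"
    } else if (hasScriptVariant) {
        tag += "-Zyyy";               // variant present, script unknown
    }
    return tag;
}
```

So an untagged Cyrillic 880 field with no language comes out as "und-Cyrl", and Florian's example as "und-Zyyy".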

This is the default behavior for the MARC translator that I am describing
-- if we know that a specific catalog has a systematically
non-standard way of handling language variants, we can then override
the process for that catalog.

The introduction of und and Zyyy for data raises the possibility of
multiple language variants that are all tagged with these unknown
tags. Do we want to allow such a beast? Will that make the system far
too complicated?

A final thought is that not only CiNii could benefit from Zotero users
contributing improved metadata -- multilingual metadata is hard to do
and universally weak. A path for improved metadata back to library
OPACs would be quite valuable as well.

Regards,

Avram

ziche

unread,
Oct 11, 2010, 6:46:02 AM10/11/10
to zotero-dev
Hi Avram,

> UCLA does show some of the data quality issues that are endemic to
> library catalogs. In cases where alternate languages and scripts are
> present in the input but not tagged, I think it would be best to
> assign them the unknown language/script tags as appropriate.
>
> Another record at UCLA (ISBN 9785170545421) provides "Cyrl" alternate
> data for the title and other information, but the language is not
> specified in the record. I think that we should save the primary data
> (fields 100, 245, etc) as the parent record without marking its
> language, and save the Cyrl-tagged alternate data in the 880 fields,
> using the tag "und-Cyrl".
>
> In general, I don't think that most script-sniffing is necessary,
> since the language tag is most important, and few if any sniffing
> algorithms can determine that with any reliability. We have ways to
> mark data as missing using the existing language tag system-- we can
> leverage that.

Actually, in your example the language is coded into the 008 field,
and the latest Voyager/Marc2 translators will handle this record
correctly. I still think my former sample (declared Russian,
undeclared Cyrillic), as well as many BnF records I have seen, could
benefit from script sniffing. I uploaded a proposal for a database
table to http://github.com/zomark/zotero-marc/blob/master/sql/unicodeScriptMap.sql.
This contains a normalized form of the scripts list at Unicode.org,
and allows for script queries like "select script from
unicodeScriptMap where fromCode<=myCodePoint and toCode>=myCodePoint".
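Assuming the table stays sorted by fromCode, the same lookup can be mirrored in JavaScript with a binary search over the ranges, which is essentially what the indexed SELECT does. The entries below are a tiny illustrative subset, and the function name is invented:

```javascript
// Illustrative subset of unicodeScriptMap; must remain sorted by "from".
var scriptMap = [
    { from: 0x0041, to: 0x005A, script: "Latn" },
    { from: 0x0061, to: 0x007A, script: "Latn" },
    { from: 0x0400, to: 0x04FF, script: "Cyrl" },
    { from: 0x4E00, to: 0x9FFF, script: "Hani" }
];

// Binary search for the range containing the code point, O(log n).
function scriptForCodePoint(cp) {
    var lo = 0, hi = scriptMap.length - 1;
    while (lo <= hi) {
        var mid = (lo + hi) >> 1;
        var r = scriptMap[mid];
        if (cp < r.from) hi = mid - 1;
        else if (cp > r.to) lo = mid + 1;
        else return r.script;
    }
    return null;   // code point not covered by any range
}
```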

I fully agree with your point about private script tags for those
cases where we know more than nothing, but less than what is necessary
to assign an IANA tag.

Concerning language tags: could we add the ZlsValidator logic to the
translate code section where the "language" field is actually saved?
Marc translators will deliver things like "eng" or "eng fre", but the
validator is capable of normalizing these tags.

Best, Florian

Avram Lyon

unread,
Oct 11, 2010, 7:02:46 AM10/11/10
to zoter...@googlegroups.com
2010/10/11 ziche <zi...@noos.fr>:

> Actually, in your example the language is coded into the 008 field,
> and the latest Voyager/Marc2 translators will handle this record

Oops. My sight-reading of MARC isn't very good, as you can tell.

> correctly. I still think my former sample (declared Russian,
> undeclared Cyrillic), as well as many BnF records I have seen, could
> benefit from script sniffing. I uploaded a proposal for a database

> table [..]

Now that I see how this would work, it seems a lot more reasonable
than I had feared. This sort of limited lookup would be good for those
cases when we have code 880 alternate graphic representations and we
don't know how to tag the alternate content. We could use this for all
unspecified scripts, since many US catalogs have primary data
exclusively in Latn-- we don't want to end up with "ru" and "ru-Cyrl",
where the former is actually referring to "ru-Latn", just because I
wanted to be cautious about sniffing.

- Avram

Frank Bennett

unread,
Oct 11, 2010, 7:36:30 AM10/11/10
to zotero-dev
On Oct 11, 7:46 pm, ziche <zi...@noos.fr> wrote:

[...]

> Concerning language tags: could we add the ZlsValidator logic to the
> translate code section where the "language" field is actually saved?
> Marc translators will deliver things like "eng" or "eng fre", but the
> validator is capable of normalizing these tags.
>
> Best, Florian

The idea would be to leave the field unconstrained in Zotero, but for
translator-acquired records, to populate it with a normalized field,
if possible? For that, the validator can be made available for use
inside translators. In that case, I should change the way it is set
up, I think, since it's currently a single instance, and translators
can run async. An uninstantiated pointer can be opened to translators,
and they can instantiate it in the context where it will be used.

Does that sound about right?

Frank

Frank Bennett

unread,
Oct 11, 2010, 7:46:12 AM10/11/10
to zotero-dev
On Oct 11, 7:46 pm, ziche <zi...@noos.fr> wrote:

> [...] I uploaded a proposal for a database
> table to http://github.com/zomark/zotero-marc/blob/master/sql/unicodeScriptMap....
> This contains a normalized form of the scripts list at Unicode.org,
> and allows for script queries like "select script from
> unicodeScriptMap where fromCode<=myCodePoint and toCode>=myCodePoint".

Florian,

I guess the database table is derived from this unicode.org source?

http://www.unicode.org/Public/UNIDATA/Scripts.txt

If so, do you have the code used to generate it? The zls.py script
used to generate the existing tables (the IANA registry, plus the
639-2 tags convertible to 639-1/IANA) pulls the underlying data over
the wire from canonical sources, this could be stitched into the same
script for one-shot updates of the language tables.

Frank

ziche

unread,
Oct 11, 2010, 8:22:34 AM10/11/10
to zotero-dev
> The idea would be to leave the field unconstrained in Zotero, but for
> translator-acquired records, to populate it with a normalized field,
> if possible? For that, the validator can be made available for use
> inside translators. In that case, I should change the way it is set
> up, I think, since it's currently a single instance, and translators
> can run async. An uninstantiated pointer can be opened to translators,
> and they can instantiate it in the context where it will be used.

Actually, I thought you could fit it into the
Zotero.Translate.prototype._itemDone method, as you did with other
language validations - just checking for the "language" field being
set, and running it through the validator before handing it over to
newItem. But I might be missing something here.
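A minimal sketch of that normalization step, with a stub standing in for the validator's real lookup tables (the function name and the tiny map are illustrative only, not the real ZlsValidator API):

```javascript
// Hypothetical sketch: normalize a MARC language statement such as
// "eng" or "eng fre" into IANA tags before the item is saved. The map
// below stands in for the validator's ISO 639-2 -> IANA tables.
var ISO6392_TO_IANA = { eng: "en", fre: "fr", ger: "de", rus: "ru" };

function normalizeLanguageField(value) {
    return value.split(/[,;\s]+/)          // "eng fre" -> ["eng", "fre"]
        .map(function (code) {
            // fall back to the raw code when normalization fails
            return ISO6392_TO_IANA[code.toLowerCase()] || code;
        })
        .join(", ");
}
```

In _itemDone this would run only when item.language is set, leaving manually entered values untouched.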

Florian

ziche

unread,
Oct 11, 2010, 8:30:23 AM10/11/10
to zotero-dev
> Florian,
>
> I guess the database table is derived from this unicode.org source?
>
> http://www.unicode.org/Public/UNIDATA/Scripts.txt
>
> If so, do you have the code used to generate it?  The zls.py script
> used to generate the existing tables (the IANA registry, plus the
> 639-2 tags convertible to 639-1/IANA) pulls the underlying data over
> the wire from canonical sources, this could be stitched into the same
> script for one-shot updates of the language tables.

I was afraid you would be asking that... no, I did it the
quick-and-dirty, non-repeatable way: loaded the Scripts.txt, cleaned
it up with regexps, and ran it through a quick-and-dirty Java program
to collapse ranges wherever possible. I am basically illiterate when
it comes to Python, Perl and the like.

I submitted a modified zls.sql including the table:
http://github.com/zomark/zotero-marc/blob/master/sql/zls.sql, and
tried out script sniffing by modifying Marc2.js, translate.js and
multilingual.js (the changes have been submitted). Our sample record
("2007222933" at http://catalog.library.ucla.edu) will translate fine
with this.

Best, Florian

Frank Bennett

unread,
Oct 11, 2010, 9:09:43 AM10/11/10
to zotero-dev
Avram said earlier that he felt it was important to continue to allow
arbitrary content in the Language field. I'm not sure whether the
concern is about legacy data or use cases that require additional or
other content than language tags.

Avram?

>
> Florian

Frank Bennett

unread,
Oct 11, 2010, 9:20:56 AM10/11/10
to zotero-dev
On Oct 11, 9:30 pm, ziche <zi...@noos.fr> wrote:
> > Florian,
>
> > I guess the database table is derived from this unicode.org source?
>
> >http://www.unicode.org/Public/UNIDATA/Scripts.txt
>
> > If so, do you have the code used to generate it?  The zls.py script
> > used to generate the existing tables (the IANA registry, plus the
> > 639-2 tags convertible to 639-1/IANA) pulls the underlying data over
> > the wire from canonical sources, this could be stitched into the same
> > script for one-shot updates of the language tables.
>
> I was afraid you would be asking that... no, I did it the quick-and-
> dirty non-repeatable way: loaded the Scripts.txt, cleaned it up with
> regexps, run it to a quick-and-dirty Java program to collapse ranges,
> wherever possible. I am basically illiterate when it comes to Python,
> Perl and the like.

I'll see what I can do about scripting it.

>
> I submitted a modified zls.sql including the table: http://github.com/zomark/zotero-marc/blob/master/sql/zls.sql, and
> tried out script sniffing by modifying Marc2.js, translate.js and
> multilingual.js (the changes have been submitted). Our sample record
> ("2007222933" athttp://catalog.library.ucla.edu) will translate fine
> with this.
>
> Best, Florian

Frank Bennett

unread,
Oct 11, 2010, 9:43:27 AM10/11/10
to zotero-dev
On Oct 11, 9:30 pm, ziche <zi...@noos.fr> wrote:
[...]
> I submitted a modified zls.sql including the table: http://github.com/zomark/zotero-marc/blob/master/sql/zls.sql, and
> tried out script sniffing by modifying Marc2.js, translate.js and
> multilingual.js (the changes have been submitted). Our sample record
> ("2007222933" athttp://catalog.library.ucla.edu) will translate fine
> with this.

Florian,

I'm not great at optimizations, but looking at the patch to
multilingual.js, it seems like you could use the table content to
generate a set of regexps in cachedLanguages.js, and rely on those for
the sniffing, rather than issuing an SQL lookup for each character. I
see that the function returns as soon as it finds a match, but a
regexp might be faster, particularly with a long string that fails to
match throughout.

Frank

Avram Lyon

unread,
Oct 11, 2010, 9:56:13 AM10/11/10
to zoter...@googlegroups.com
2010/10/11 Frank Bennett <bierc...@gmail.com>:

> Avram said earlier that he felt it was important to continue to allow
> arbitrary content in the Language field.  I'm not sure whether the
> concern is about legacy data or use cases that require additional or
> other content than language tags.

When I started populating the Language field with language tags in
anticipation of the field contents mattering for the Multilingual
branch, I realized that I was being limited in ways of expressing the
language of the underlying real item (book, article, document) that I
didn't really appreciate. Consider a book in multiple languages, or
translated from one to another. These are cases that we could solve in
a rather complex manner with language tags, but the language of the
underlying item isn't necessarily needed for the processor to handle
the metadata (itself language-tagged) in a useful way.

There are cases, unfortunately, when a machine readable representation
of the language of the underlying item would be necessary; I'm
thinking in particular of Russian bibliographic practice where a
bibliography may be split by language. Nonetheless, we can drive such
citation generation on the basis of trusting the "master" version of a
multilingual field to be the language that we are privileging and
likely the language of the document for citation purposes.

MARC does try to provide information on whether an item is a
translation, and the source and target languages, but that kind of
information is not necessary for Zotero to maintain unless we're
trying to make it a fully MARC-interoperable library catalog. Which I
hope is not our goal at this point.

If there's a compelling case to make Language tag-driven, maybe I can
go along with it, but I don't yet see what we gain, and we lose the
ease-of-use of the free-form tag.

Perhaps Dan or someone can let us know how frequently the language tag
is actually used by Zotero users? Or does the Zotero team consider
that to be data that shouldn't be aggregated and disclosed from the
synced databases?

Regards,

Avram

ziche

unread,
Oct 11, 2010, 10:20:33 AM10/11/10
to zotero-dev
> I'm not great at optimizations, but looking at the patch to
> multilingual.js, it seems like you could use the table content to
> generate a set of regexps in cachedLanguages.js, and rely on those for
> the sniffing, rather than issuing an SQL lookup for each character.  I
> see that the function returns as soon as it finds a match, but a
> regexp might be faster, particularly with a long string that fails to
> match throughout.

I am not sure about that... of course we could build a regexp for
each script, but then for N existing scripts, in the worst case, we
would be running N regexp matches. With the current method, we have a
max of M (indexed) SELECTs for a string of length M, but it is likely
we get a match on the first query. We could even reasonably limit the
number of SELECTs, saying: if you didn't detect a script after the
first 10 characters, give up.

Frank Bennett

unread,
Oct 11, 2010, 10:37:33 AM10/11/10
to zotero-dev
Got it. Hmm. We might gain by combining the two facilities -- use a
composite regexp to pull a conforming character out of the string, and
then a single SQL call to test it. Either the regexp fails, or both
succeed.
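That combination might look like the following sketch, where lookupScript simulates the single indexed SELECT against unicodeScriptMap (all names and ranges here are illustrative):

```javascript
// One composite regexp pulls the first character belonging to any known
// script out of the string; only that character needs the range lookup.
var KNOWN_SCRIPT_CHARS =
    /[\u0041-\u005A\u0061-\u007A\u0400-\u04FF\u0590-\u05FF]/;

// Stand-in for:
//   SELECT script FROM unicodeScriptMap WHERE fromCode <= ? AND toCode >= ?
function lookupScript(cp) {
    if (cp >= 0x0041 && cp <= 0x007A) return "Latn";
    if (cp >= 0x0400 && cp <= 0x04FF) return "Cyrl";
    if (cp >= 0x0590 && cp <= 0x05FF) return "Hebr";
    return null;
}

function sniff(str) {
    var m = KNOWN_SCRIPT_CHARS.exec(str);
    // either the regexp fails, or both the regexp and the lookup succeed
    return m ? lookupScript(m[0].charCodeAt(0)) : null;
}
```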

ziche

unread,
Oct 11, 2010, 3:51:42 PM10/11/10
to zotero-dev
The latest Marc2.js version on github
(http://github.com/zomark/zotero-marc) will import the entire BnF test
set (to be found on github in the sampledata folder), with
multilingual data for titles, places and publishers. While this is fun
and gives some idea of what is possible (Arabic, Hebrew, Chinese,
Korean and Cyrillic entries in there), it brought up a question Frank
alluded to: how much do the translators need to know about language
and script tags, i.e. do they need access to the ZlsValidator, or a
subset of its functions?

My current answer would be yes: consider a MARC translator (but the
same could be true for some other kind of input) encountering two
parallel statements for the place of publication, without any
information about the underlying script. If both statements were
written in the same script, we would assume that the book publisher is
based at two different places, and we would generate a single
item.place property "Place1, Place2". If, on the other hand, the two
statements have different scripts, like Москва and Moskva, we would
rather generate a main and a servant language entry. Currently,
however, the translator has no chance to distinguish these cases,
because the entire script validation and detection functionality is
outside the sandbox. So, Frank's proposal to expose a part of the
Validator methods to the sandboxed code (just as it happens with some
of the Zotero.Utilities) should be followed, I think.
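As a sketch of that decision, assuming some sniffing facility were exposed to the sandbox (mergePlaces and sniffScript are invented names):

```javascript
// Two parallel place statements in the same script are treated as two
// publication places; different scripts become a main entry plus a
// servant-language entry, as described above.
function mergePlaces(placeA, placeB, sniffScript) {
    var scriptA = sniffScript(placeA);
    var scriptB = sniffScript(placeB);
    if (scriptA === scriptB) {
        return { place: placeA + ", " + placeB };   // "Place1, Place2"
    }
    return {
        place: placeA,
        servant: { place: placeB, script: scriptB }
    };
}
```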

Best, Florian

Avram Lyon

unread,
Oct 23, 2010, 11:51:14 AM10/23/10
to zotero-dev
I just discovered that Russian libraries use a dialect of Unimarc that
they call RUSMARC; I ran into it while exploring a translator request
here: http://forums.zotero.org/discussion/2814/4/which-site-translators-would-you-like-to-see-take-2/#Comment_73592

I'll be exploring how much it will take to support it using the
current (and proposed) MARC translators, but it reminded me that we
might want to explore some of these issues -- are there other MARC
dialects out there that we don't know about?

The RUSMARC format is discussed here: http://www.rba.ru/rusmarc/ (in
Russian, short English explanation at
http://www.rba.ru/rusmarc/rusmarc_e.html , examples (in HTML!) at
http://www.rba.ru/rusmarc/soft/examples.htm)

Best,

Avram

Frank Bennett

unread,
Oct 23, 2010, 5:58:55 PM10/23/10
to zotero-dev
Some time back, I misremembered the name of the Japanese variant as "JMARC":

http://forums.zotero.org/discussion/6939/add-japanese-ndl-to-list/

It turns out I wasn't entirely mistaken: there _is_ a variant, called
JAPAN/MARC. Here is a page with links to the specs (all in Japanese,
unfortunately):

http://www.ndl.go.jp/jp/library/data/jm.html

Frank

On Oct 24, 12:51 am, Avram Lyon <ajl...@gmail.com> wrote:
> I just discovered that Russian libraries use a dialect of Unimarc that
> they call RUSMARC; I ran into it while exploring a translator request
> here: http://forums.zotero.org/discussion/2814/4/which-site-translators-wou...

东东爸

unread,
Oct 23, 2010, 9:58:23 PM10/23/10
to zoter...@googlegroups.com
I found two Chinese MARC variants, called CNMARC and CMARC, used in mainland China and Taiwan respectively.

Here are some useful links to the specs (some in Chinese, unfortunately):

CNMARC:

CMARC:
>
> Best,
>
> Avram

--
You received this message because you are subscribed to the Google Groups "zotero-dev" group.
To post to this group, send email to zoter...@googlegroups.com.
To unsubscribe from this group, send email to zotero-dev+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/zotero-dev?hl=en.




--
Best Regards!

Ace Strong

Frank Bennett

unread,
Oct 24, 2010, 8:16:01 PM10/24/10
to zotero-dev
Oh, boy. Korea has one too, called KORMARC: http://www.nl.go.kr/kormarc/index.html


On Oct 24, 10:58 am, 东东爸 <acestr...@gmail.com> wrote:
> I find two Chinese MARC variant, called CNMARC and CMARC, used by china
> mainland and china Taiwan respectively.
>
> Here are some useful links to specs(some in chinese, unfortunately):
>
> *CNMARC:* http://lib.jlu.edu.cn/tgw/%E4%B8%AD%E6%96%87%E5%9B%BE%E4%B9%A6CNMARC%... http://www.libsoft.cn/download/Help/CNMARC.doc
>
> *CMARC:* http://archive.ifla.org/IV/ifla72/papers/77-Mao_Hsu-en.pdf http://www.lins.fju.edu.tw/mao/works/CMARC%2BUNICODE.pdf
>
> On Sun, Oct 24, 2010 at 5:58 AM, Frank Bennett <biercena...@gmail.com> wrote:
>
>
>
> > Some time back, I misremembered the name Japanese variant as "JMARC":
>
> >http://forums.zotero.org/discussion/6939/add-japanese-ndl-to-list/
>
> > It turns out I wasn't entirely mistaken: there _is_ a variant, called
> > JAPAN/MARC.  Here is a page with links to the specs (all in Japanese,
> > unfortunately):
>
> >http://www.ndl.go.jp/jp/library/data/jm.html
>
> > Frank
>
> > On Oct 24, 12:51 am, Avram Lyon <ajl...@gmail.com> wrote:
> > > I just discovered that Russian libraries use a dialect of Unimarc that
> > > they call RUSMARC; I ran into it while exploring a translator request
> > > here:
> >http://forums.zotero.org/discussion/2814/4/which-site-translators-wou...
>
> > > I'll be exploring how much it will take to support it using the
> > > current (and proposed) MARC translators, but it reminded me that we
> > > might want to explore some of these issues -- are there other MARC
> > > dialects out there that we don't know about?
>
> > > The RUSMARC format is discussed here: http://www.rba.ru/rusmarc/ (in
> > > Russian, short English explanation at http://
> > > Best,
>
> > > Avram
>

ziche

unread,
Nov 2, 2010, 4:21:20 AM11/2/10
to zotero-dev
Thanks for all these MARCs. I spent a couple of days in Greece and
have not been available to look into this (no, I did not locate any
signs of GREEK/MARC, but you made me feel sure it must exist). It
might be helpful to locate library sites that actually use and provide
these formats, so we can test their integration into Zotero's MARC
framework.

Best, Florian

On Oct 25, 1:16 am, Frank Bennett <biercena...@gmail.com> wrote:
> Oh, boy. Korea has one too, called KORMARC: http://www.nl.go.kr/kormarc/index.html
>
> On Oct 24, 10:58 am, 东东爸 <acestr...@gmail.com> wrote:
>
> > I find two Chinese MARC variant, called CNMARC and CMARC, used by china
> > mainland and china Taiwan respectively.
>
> > Here are some useful links to specs(some in chinese, unfortunately):
>
> > *CNMARC:* http://lib.jlu.edu.cn/tgw/%E4%B8%AD%E6%96%87%E5%9B%BE%E4%B9%A6CNMARC%...
>
> > *CMARC:* http://archive.ifla.org/IV/ifla72/papers/77-Mao_Hsu-en.pdf http://www....

Avram Lyon

unread,
Nov 2, 2010, 1:05:30 PM11/2/10
to zoter...@googlegroups.com
2010/11/2 ziche <zi...@noos.fr>:

> Thanks for all these MARCs. I spent a couple of days in Greece and
> have not been available for looking into this (no, I did not locate
> any signs of GREEK/MARC, but you made me feel sure it must exist). It
> might be helpful if we can locate library sites actually using and
> providing these formats for testing integration in Zotero's MARC
> framework.

RUSMARC is used by the National Library of Russia:
http://www.nlr.ru/ , catalogs at http://www.nlr.ru:8101/poisk/index.html
No direct link. Example data at
http://github.com/ajlyon/zotero-bits/raw/master/RUSMARC.sample1

- Avram
