But by all means post the code, and we'll figure out how best to
integrate it into Zotero.
- Avram
2010/11/20 jonathan.morgan <jonathan....@gmail.com>:
> --
> You received this message because you are subscribed to the Google Groups "zotero-dev" group.
> To post to this group, send email to zoter...@googlegroups.com.
> To unsubscribe from this group, send email to zotero-dev+...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/zotero-dev?hl=en.
Avram
2010/11/21 jonathan.morgan <jonathan....@gmail.com>:
Just a random thought:
It occurs to me it might be helpful to publicly document a mapping
table for these different formats. The authoritative source would
probably have to be the C source code of bibutils, which has content
like:
static lookups article[] = {
    { "AU", "AUTHOR",       PERSON,   LEVEL_MAIN },
    { "A1", "AUTHOR",       PERSON,   LEVEL_MAIN },
    { "A2", "AUTHOR",       PERSON,   LEVEL_HOST },
    { "A3", "AUTHOR",       PERSON,   LEVEL_SERIES },
    { "ED", "EDITOR",       PERSON,   LEVEL_HOST },
    { "PY", "PARTYEAR",     DATE,     LEVEL_MAIN },
    { "Y1", "PARTYEAR",     DATE,     LEVEL_MAIN },
    { "Y2", "PARTMONTH",    SIMPLE,   LEVEL_MAIN },
    { "SN", "SERIALNUMBER", SERIALNO, LEVEL_HOST },
    { "TI", "TITLE",        TITLE,    LEVEL_MAIN },
    { "T1", "TITLE",        TITLE,    LEVEL_MAIN },
    { "T2", "SHORTTITLE",   SIMPLE,   LEVEL_HOST },
    { "T3", "TITLE",        SIMPLE,   LEVEL_SERIES },
    { "JO", "TITLE",        SIMPLE,   LEVEL_HOST },  /* JOURNAL */
    ......
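To turn an excerpt like the one above into a plain table, a small script could pull the four fields out of each `{ "TAG", "FIELD", KIND, LEVEL }` entry. This is a minimal sketch, assuming every entry sits on one line in exactly this shape; `parse_lookups` is a hypothetical helper and not part of bibutils.

```python
import re

# Matches one bibutils-style lookup entry, e.g.
#   { "AU", "AUTHOR", PERSON, LEVEL_MAIN },
# Assumption: each entry is on a single line, as in the excerpt above.
ENTRY_RE = re.compile(
    r'\{\s*"(?P<tag>[A-Z0-9]{2})"\s*,\s*"(?P<field>[A-Z]+)"\s*,'
    r'\s*(?P<kind>[A-Z_]+)\s*,\s*(?P<level>[A-Z_]+)\s*\}'
)


def parse_lookups(source: str) -> list[tuple[str, str, str, str]]:
    """Return (RIS tag, internal field, field kind, level) for each entry found."""
    return [
        (m["tag"], m["field"], m["kind"], m["level"])
        for m in ENTRY_RE.finditer(source)
    ]


if __name__ == "__main__":
    excerpt = '''
    { "AU", "AUTHOR", PERSON, LEVEL_MAIN },
    { "ED", "EDITOR", PERSON, LEVEL_HOST },
    { "SN", "SERIALNUMBER", SERIALNO,LEVEL_HOST },
    { "JO", "TITLE", SIMPLE, LEVEL_HOST }, /* JOURNAL */
    '''
    # Print one tab-separated row per entry.
    for row in parse_lookups(excerpt):
        print("\t".join(row))
```

The regex tolerates missing spaces (as in the `SERIALNO,LEVEL_HOST` entry) and ignores trailing comments; anything spread over several lines would need a real C parser.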
A couple of little details (I haven't looked at the rest, and haven't
really followed the conversation):
First, GitHub will automatically render Markdown documents if you
give them an appropriate extension (IIRC, both "md" and "markdown"
should work).
Second, an Excel file is not a very convenient place to keep
documentation over the longer term. FWIW, every project wiki on GH is
also its own repo of Markdown files, which can be a good place to put
documentation in some cases.
Bruce
That said, you should certainly use whatever technology makes the job
of understanding the morass of EndNote exports easier. When you've
worked it out, it'd be great if you could document it. That
documentation should ideally be in a format that is easy to share
and maintain, lest you get stuck maintaining the RIS export
documentation in perpetuity; something based on plain text would fit
that bill.
I look forward to your help explaining EndNote's exports, and to
improving interoperability.
Best wishes,
Avram
2010/11/23 jonathan.morgan <jonathan....@gmail.com>:
On Mon, Nov 29, 2010 at 10:22 AM, jonathan.morgan
<jonathan....@gmail.com> wrote:
> Understood. I'll definitely document once I understand better what is
> going on (and I think it will be interesting to have it all laid out
> there for people). You and Bruce both have actually been quite
> welcoming and helpful, and I just wanted you to know I appreciate it,
> also!
Avram's interpretation of my comments was correct, and I have the
same take as he does (if Excel is useful for you now, no problem).
At some point I guess I'm just shooting for extracting the mappings
from bibutils' source into something human-readable (a nice table),
and then working from that. After all, the hard work is the logic
of the mappings, not the parsing code.
BTW, there are two classes of issues with Endnote RIS export:
1) bugs (the infamous KW bug where multiple keywords get a single tag)
2) mappings
Bruce
If you can't or don't want to do this yourself, could you please mark it as a TODO in the appropriate place in the code you write?
This isn't so much for me personally. But ...
> In trying to figure out authors, it has proven
> helpful for me to be able to look at a list of all EndNote reference
> types, the RIS tags used in export, and the EndNote fields to which
> they map, sorted by reference type, then by RIS tag; and then to be
> able to look at the same list, resorted by RIS tag, then by EndNote
> reference type, so I can see the way EndNote uses a given tag across
> reference types; and then to be able to look at the same list, sorted
> by EndNote field, then by RIS tag, to detect variations in export of a
> given EndNote field across reference types. I could make these views
> into HTML (though since they are really the same data and there are
> 1793 rows, they'd probably get out of sync pretty quickly if the
> output style changes).
>
> I also have a tab in the spreadsheet where each RIS tag has a series
> of columns indicating whether it is in the true spec (old and outdated
> though it may be), used by EndNote, and then a description, link to
> the page where it is specified in the RefMan spec, etc. I could make
> this HTML as well.
>
> Part of the usefulness is the sortability, but you could always keep a
> copy of the spreadsheet around for that.
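Since the three views described above are the same rows sorted by different keys, they could be generated from a single source table instead of being maintained separately, which sidesteps the out-of-sync problem. A minimal sketch; the rows below are illustrative, taken only from examples mentioned in this thread (not from the actual spreadsheet), and `view` is a hypothetical helper.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    ref_type: str       # EndNote reference type
    ris_tag: str        # RIS tag used on export
    endnote_field: str  # EndNote field carried by that tag


# Illustrative rows only, drawn from examples discussed in this thread.
ROWS = [
    Mapping("Book Section", "A3", "Series Editor"),
    Mapping("Edited Book", "AU", "Editor"),
    Mapping("Book", "A2", "Series Editor"),
    Mapping("Edited Book", "A2", "Series Editor"),
    Mapping("Conference Proceedings", "A3", "Series Editor"),
]

# Each view is just a different sort key over the same rows.
VIEWS = {
    "by_type": lambda m: (m.ref_type, m.ris_tag),
    "by_tag": lambda m: (m.ris_tag, m.ref_type),
    "by_field": lambda m: (m.endnote_field, m.ris_tag),
}


def view(rows, name):
    """Return the rows sorted for the named view."""
    return sorted(rows, key=VIEWS[name])


if __name__ == "__main__":
    for m in view(ROWS, "by_tag"):
        print(f"{m.ris_tag}\t{m.ref_type}\t{m.endnote_field}")
```

Adding a new view is one more entry in `VIEWS`; the underlying data stays in one place.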
>
> A few questions:
> - in the EDBOOK type exported from EndNote, the AU field is used for
> editors, not for authors,
Is that because EndNote doesn't support authors for that type?
> but "EDBOOK" maps to "book" just the same as
> "BOOK" does (and before I rewrote the mappings, it just mapped to
> generic since "BOOK" took up the one slot for mapping to "book" in
> zotero). To address this, I started storing the original reference
> type in the item along with the zotero item type, and then for an
> author, in the section for books, I check if EDBOOK, and if EDBOOK, I
> make AU create editors instead of authors. Does this sound about
> right?
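The approach described above (keep the original RIS type next to the Zotero item type, and let it decide what AU means) could be sketched as a pair of lookup tables. This is a minimal sketch: `TYPE_MAP`, `CREATOR_OVERRIDES`, `import_item_type` and `creator_type` are hypothetical names, only the EDBOOK case from this message is filled in, and the "document" fallback for unmapped types is an assumption standing in for the generic type.

```python
# Hypothetical mapping tables; only the cases discussed in this thread are shown.
TYPE_MAP = {
    "BOOK": "book",
    "EDBOOK": "book",  # EndNote's edited book still becomes a Zotero "book"
}

# (original RIS type, RIS tag) -> Zotero creator type, overriding the default.
CREATOR_OVERRIDES = {
    ("EDBOOK", "AU"): "editor",  # EndNote puts an EDBOOK's editors in AU
}

DEFAULT_CREATOR_TYPES = {"AU": "author", "A1": "author", "ED": "editor"}


def import_item_type(ris_type: str) -> dict:
    """Keep the original RIS type alongside the Zotero item type."""
    # Assumption: unmapped types fall back to a generic "document" item.
    return {"itemType": TYPE_MAP.get(ris_type, "document"), "risType": ris_type}


def creator_type(item: dict, tag: str) -> str:
    """Pick the Zotero creator type for a tag, honoring the original RIS type."""
    default = DEFAULT_CREATOR_TYPES.get(tag, "contributor")
    return CREATOR_OVERRIDES.get((item["risType"], tag), default)


if __name__ == "__main__":
    item = import_item_type("EDBOOK")
    print(item["itemType"], creator_type(item, "AU"))
```

Keeping the special case in `CREATOR_OVERRIDES` rather than in an `if` inside the book section means other dialect quirks can be added as data.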
Well, if you look at the bibutils mappings I posted earlier:
{ "A2", "AUTHOR", PERSON, LEVEL_HOST },
{ "A3", "AUTHOR", PERSON, LEVEL_SERIES },
{ "ED", "EDITOR", PERSON, LEVEL_HOST },
So bibutils maps the "ED" tag to its internal "EDITOR" variable. But
this is for an article, and I don't see anything for an "EDBOOK" type. Is
that actually a standard RIS type?
In any case, I would probably interpret that as a bug in EndNote and
map the AU tag to editor.
> - Authors are different for book sections, too - EndNote uses the A3
> tag (series author, standard part of Refman spec) to hold the series
> editor in these reference types: Conference Proceedings, Book Section,
> Audiovisual Material, Serial, Electronic Book Section; but uses A2 for
> series editors in these types: Book, Computer Program, Edited Book,
> Report, Map, Web Page, Online Multimedia, Classical Work.
So to correct you a bit, EndNote uses the A3 tag to hold a variety of
non-primary contributors. In other words, they're abusing the spec a
bit in places, I think. The "publisher" one for reports seems odd.
Dan will have to comment on this.
Bruce
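The A2/A3 split reported in the quoted question could be kept as data rather than as branches in the importer. A minimal sketch that encodes only the two lists from that message (EndNote reference type names as written there); `series_editor_tag` is a hypothetical helper, and since A3 also carries other non-primary contributors in some types, the table would need checking against real exports.

```python
from typing import Optional

# Reference types reported to put the series editor in A3 vs. A2 on export.
A3_SERIES_EDITOR = {
    "Conference Proceedings", "Book Section", "Audiovisual Material",
    "Serial", "Electronic Book Section",
}
A2_SERIES_EDITOR = {
    "Book", "Computer Program", "Edited Book", "Report", "Map",
    "Web Page", "Online Multimedia", "Classical Work",
}


def series_editor_tag(endnote_type: str) -> Optional[str]:
    """Return the RIS tag reported to hold the series editor, or None if unknown."""
    if endnote_type in A3_SERIES_EDITOR:
        return "A3"
    if endnote_type in A2_SERIES_EDITOR:
        return "A2"
    return None


if __name__ == "__main__":
    for t in ("Book Section", "Report", "Journal Article"):
        print(t, series_editor_tag(t))
```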
I think that a simple tab-delimited plain-text table would be great;
it could be easily maintained and distributed, and it can be imported
into Excel rather easily, or manipulated into mappings or re-sorted
using basic command-line tools or text editors.
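As a small illustration of how easy such a table is to process, Python's standard `csv` module can write and read tab-delimited rows directly. A minimal sketch; the column names and sample rows are illustrative assumptions, not an agreed format.

```python
import csv
import io

# Hypothetical column layout for the shared mapping table.
COLUMNS = ["endnote_type", "ris_tag", "endnote_field"]


def to_tsv(rows: list[dict]) -> str:
    """Serialize mapping rows as a tab-delimited table with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def from_tsv(text: str) -> list[dict]:
    """Parse a tab-delimited table back into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


if __name__ == "__main__":
    rows = [
        {"endnote_type": "Edited Book", "ris_tag": "AU", "endnote_field": "Editor"},
        {"endnote_type": "Book Section", "ris_tag": "A3", "endnote_field": "Series Editor"},
    ]
    text = to_tsv(rows)
    print(text, end="")
    # Re-sort by RIS tag, much as one might with `sort` on the command line.
    for row in sorted(from_tsv(text), key=lambda r: r["ris_tag"]):
        print(row)
```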
>> I am almost at the point where I can bust my references out of EndNote
>> (you can assign a tag to multiple records easily in Zotero! I can
>> even get groups migrated in an hour or so!), so please let me know if
>> there is a chance this code could be integrated, and if so, how I can
>> help. If not, I'll probably just post on the Zotero forums to
>> let people know they are free to play with it if they want to, and I'll
>> help if I can, but I don't want to implement and maintain a forked
>> version of the RIS importer. If I'm going to stay involved, I'd
>> rather help make the trunk version work better within the way
>> you want it implemented, or help work on a better interoperability
>> dialect. I also think the importers would benefit from a framework
>> at least similar to the one in this file, which separates overall
>> control flow from mapping logic, instead of a chain of if-then-else
>> statements where the control flow has the mapping logic nested
>> inside. That would make them easier to maintain reliably over time:
>> it is risky to couple the import control flow so tightly to the
>> processing of the mappings. Better to split the two out, so that
>> changes to mappings don't also involve changes to control flow that
>> need to be tested.
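The separation proposed above, with control flow in one place and mappings in a table, might look roughly like this. A minimal sketch, not the structure of the file under discussion or of Zotero's RIS translator: `TAG_HANDLERS`, `import_record` and the handler factories are hypothetical, and only a few tags are covered. Changing a mapping means editing the table; the loop in `import_record` stays the same.

```python
from typing import Callable

# A handler receives the item being built and one tag's value.
Handler = Callable[[dict, str], None]


def set_field(name: str) -> Handler:
    """Handler factory: store the value in a named item field."""
    def handler(item: dict, value: str) -> None:
        item[name] = value
    return handler


def add_creator(creator_type: str) -> Handler:
    """Handler factory: append a creator of the given type."""
    def handler(item: dict, value: str) -> None:
        item.setdefault("creators", []).append({"name": value, "creatorType": creator_type})
    return handler


def add_tag(item: dict, value: str) -> None:
    item.setdefault("tags", []).append(value)


# Mapping logic lives here, as data; illustrative entries only.
TAG_HANDLERS = {
    "TI": set_field("title"),
    "T1": set_field("title"),
    "AU": add_creator("author"),
    "ED": add_creator("editor"),
    "KW": add_tag,
}


def import_record(lines: list[tuple[str, str]]) -> dict:
    """Generic control flow: look up each tag and delegate to its handler."""
    item: dict = {}
    for tag, value in lines:
        handler = TAG_HANDLERS.get(tag)
        if handler is None:
            # Keep unmapped data rather than silently dropping it.
            item.setdefault("unmapped", []).append((tag, value))
        else:
            handler(item, value)
    return item
```

A dialect would then be little more than a different (or partially overridden) `TAG_HANDLERS` table fed to the same loop.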
I think it should be possible to make this into a framework with
support for the core standard as well as the Endnote dialect; we're
hopefully moving in this direction with the MARC translator, which
runs into rather similar issues of data providers using and abusing
the spec in different ways (as well as multiple specs!).
It should be possible to set the RIS dialect (and thus the logic and
mappings) by using translator options, so users can specify what kind
of RIS they have in the import/export dialogs. Smart dialect sniffing,
if possible, would be great too (the MARC translator currently
attempts to sniff which MARC spec is being used).
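One way to combine an explicit translator option with sniffing: honor the user's choice when one is given, otherwise guess from the data. A minimal sketch; the dialect names, `choose_dialect`, `looks_like_endnote` and the single heuristic used (an untagged continuation line right after a KW line, i.e. the EndNote keyword bug mentioned in this thread) are all assumptions, not Zotero's actual translator options or the MARC translator's sniffing logic.

```python
import re
from typing import Optional

# A tagged RIS line: two-character tag, whitespace, hyphen.
TAG_LINE = re.compile(r"^[A-Z][A-Z0-9]\s+-")


def looks_like_endnote(text: str) -> bool:
    """Heuristic: a non-blank, untagged line directly following a KW line."""
    previous_tag = None
    for line in text.splitlines():
        if TAG_LINE.match(line):
            previous_tag = line[:2]
        elif line.strip() and previous_tag == "KW":
            return True
    return False


def choose_dialect(text: str, option: Optional[str] = None) -> str:
    """An explicit option wins; otherwise sniff, falling back to the standard."""
    if option in ("standard", "endnote"):
        return option
    return "endnote" if looks_like_endnote(text) else "standard"
```

A real sniffer would want several independent signals; a single heuristic like this one only catches files that actually contain multi-keyword records.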
I can't speak for Dan and the core team, but I would be very glad to
see your reworked code make it into the trunk if it can make
translator maintenance more pleasant and allow us to correctly import
more dialects of RIS. Out of respect for specifications, I'd want
Zotero's RIS export (at least in its default setting) to adhere to the
spec as we understand it, but added flexibility in import and
customization would be wonderful.
Regards,
Avram
...
> - made it so keywords are split on semicolons as well as newlines.
> It seems like there is some confusion about the RefMan standard: some
> fields expect semicolon-delimited lists on one line, while others expect
> multiple tags, one item per tag (like keywords). I had references in
> EndNote whose keywords were on multiple lines, with each line holding
> several keywords separated by semicolons (not sure if it was originally
> one line and EndNote split it into more lines, or if it came like that
> from the external database).
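The splitting rule described above (break a keyword value on newlines and on semicolons, then drop empty pieces) can be sketched in a few lines. `split_keywords` is a hypothetical helper; it assumes a semicolon never legitimately appears inside a single keyword.

```python
import re


def split_keywords(value: str) -> list[str]:
    """Split a keyword value on newlines and semicolons, trimming whitespace."""
    parts = re.split(r"[;\r\n]+", value)
    # Drop empty pieces left by trailing or doubled delimiters.
    return [p.strip() for p in parts if p.strip()]


if __name__ == "__main__":
    print(split_keywords("one kw; two kw\nthree kw;"))
```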
For the longest time (years and years), EndNote had a bug where it
output multiple keywords as:
    KW  - one kw
    two kw
    three kw
I'm pretty certain correct behavior with RIS is ...
    KW  - one kw
    KW  - two kw
    KW  - three kw
... and that ...
    KW  - one kw; two kw; three kw
... would also be a bug of sorts (if less onerous than the first one).
Bruce
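An importer that accepts all three forms shown above could fold untagged continuation lines under KW into extra keywords and split a single KW value on semicolons. A minimal sketch, assuming the `TAG  - value` line layout (matched leniently, so one or two spaces before the hyphen both work) and treating continuation lines as new keywords only under KW; `parse_keywords` is hypothetical.

```python
import re

# Tagged RIS line: two-character tag, whitespace, hyphen, optional space, value.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])\s+-\s?(.*)$")


def parse_keywords(ris_text: str) -> list[str]:
    """Collect keywords from KW lines, tolerating EndNote's continuation-line bug."""
    keywords = []
    current_tag = None
    for line in ris_text.splitlines():
        m = TAG_LINE.match(line)
        if m:
            current_tag, value = m.group(1), m.group(2)
        elif current_tag == "KW" and line.strip():
            value = line  # untagged line after KW: another keyword (EndNote bug)
        else:
            continue  # continuation of some other tag, or a blank line
        if current_tag == "KW":
            # Also split the "one kw; two kw" form into separate keywords.
            keywords.extend(k.strip() for k in value.split(";") if k.strip())
    return keywords
```

All three inputs in Bruce's example should come out as the same three keywords, while continuation lines of other tags (e.g. a wrapped abstract) are left alone.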