Handling of HTML in notes during export with RIS/MODS


skornblith

May 27, 2011, 4:10:16 PM
to zotero-dev
Zotero handles notes internally as HTML, but, at the moment, it also
exports notes as HTML. While this is fine for RDF, it really shouldn't
be happening for BibTeX (and it's not anymore, as of a few minutes
ago), and we're not sure whether this is really what we want for RIS
and MODS export. There are several options:

1) Export notes as HTML.
2) Export notes as plain text. This breaks round-tripping of rich text
formatting.
3) Replace <p> and <br/> tags with newlines, but leave other HTML
formatting intact, so that HTML is only present if there was rich text
in the note, and rich text formatting still round-trips.
4) Show an option to determine whether notes should be exported as
HTML or plain text on export.
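
For illustration, option (3) could look roughly like the Python sketch below. This is not Zotero code (the translators are JavaScript); the function name and the regular expressions are assumptions, and HTML entities are deliberately left untouched.

```python
import re

# Tags that option (3) would turn into newlines (assumed set: <p>, <br>).
_BR = re.compile(r"<br\s*/?>", re.IGNORECASE)
_P_OPEN = re.compile(r"<p(\s[^>]*)?>", re.IGNORECASE)
_P_CLOSE = re.compile(r"</p\s*>", re.IGNORECASE)


def note_to_export_text(note_html: str) -> str:
    """Replace <p> and <br/> with newlines; keep all other markup as-is."""
    text = _BR.sub("\n", note_html)
    text = _P_OPEN.sub("", text)       # opening <p ...> carries no content
    text = _P_CLOSE.sub("\n", text)    # closing </p> ends the paragraph
    return text.strip("\n")


print(note_to_export_text("<p>First <b>bold</b> line<br/>second</p><p>Next</p>"))
```

A note without rich text comes out as plain text, while inline tags such as <b> survive and can be read back on import.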

With options (1) and (4), we could determine whether a note was HTML
on import by checking for the presence of a <p> tag. With option (3),
plain text notes would be imported properly as long as they don't
contain a < or >.
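
A rough Python sketch of the two import heuristics, to make the trade-off concrete. The function names are hypothetical and the logic is only one possible reading of the description above.

```python
import html
import re

# Options (1) and (4): a note counts as HTML if it contains a <p> tag.
P_TAG = re.compile(r"<p[\s>]", re.IGNORECASE)


def plain_to_html(text: str) -> str:
    """Escape plain text and wrap each line in a paragraph."""
    return "".join(
        f"<p>{html.escape(line, quote=False)}</p>" for line in text.split("\n")
    )


def import_note_html_or_plain(text: str) -> str:
    """Options (1)/(4): keep HTML notes, convert plain text notes."""
    if P_TAG.search(text):
        return text
    return plain_to_html(text)


def import_note_mixed(text: str) -> str:
    """Option (3): newlines become paragraphs, remaining markup is trusted."""
    return "".join(f"<p>{line}</p>" for line in text.split("\n"))


print(import_note_html_or_plain("a < b"))  # the '<' is escaped
print(import_note_mixed("a < b"))          # the '<' is passed through as markup
```

The second call shows the caveat: under option (3) a literal < or > in a plain text note is indistinguishable from markup.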

Any opinions?

Bruce D'Arcus

May 27, 2011, 4:56:27 PM
to zoter...@googlegroups.com
On Fri, May 27, 2011 at 4:10 PM, skornblith <si...@simonster.com> wrote:

> Zotero handles notes internally as HTML, but, at the moment, it also
> exports notes as HTML. While this is fine for RDF, it really shouldn't
> be happening for BibTeX (and it's not anymore, as of a few minutes
> ago), and we're not sure whether this is really what we want for RIS
> and MODS export.

MODS and RIS are kind of apples and oranges with respect to HTML.

MODS is an XML language that should allow embedding HTML (or XHTML) in
its notes element, but I'm not sure if it does (?). If it doesn't,
then you need to strip the markup entirely. In any case, this isn't
really up to us, but to how the MODS schema is defined.

RIS is just a plain text format without any validation infrastructure
around it. So we can do whatever we want.

> There are several options:
>
> 1) Export notes as HTML.
> 2) Export notes as plain text. This breaks round-tripping of rich text
> formatting.
> 3) Replace <p> and <br/> tags with newlines, but leave other HTML
> formatting intact, so that HTML is only present if there was rich text
> in the note, and rich text formatting still round-trips.
> 4) Show an option to determine whether notes should be exported as
> HTML or plain text on export.
>
> With options (1) and (4), we could determine whether a note was HTML
> on import by checking for the presence of a <p> tag. With option (3),
> plain text notes would be imported properly as long as they don't
> contain a < or >.
>
> Any opinions?

I'd prefer #4, where the preference could also include #3 as an
option: full HTML, plain + inline HTML, and plain.

Bruce

Dan Stillman

May 27, 2011, 4:59:20 PM
to zoter...@googlegroups.com
On 5/27/11 4:56 PM, Bruce D'Arcus wrote:
> On Fri, May 27, 2011 at 4:10 PM, skornblith <si...@simonster.com> wrote:
>> Zotero handles notes internally as HTML, but, at the moment, it also
>> exports notes as HTML. While this is fine for RDF, it really shouldn't
>> be happening for BibTeX (and it's not anymore, as of a few minutes
>> ago), and we're not sure whether this is really what we want for RIS
>> and MODS export.
> MODS and RIS are kind of apples and oranges with respect to HTML.
>
> MODS is an XML language that should allow embedding HTML (or XHTML) in
> its notes element, but I'm not sure if it does (?). If it doesn't,
> then you need to strip the markup entirely. In any case, this isn't
> really up to us, but to how the MODS schema is defined.

It's not up to us whether we include raw HTML in MODS. But if it doesn't
allow it, it's still up to us whether we include escaped HTML.
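
To make the raw versus escaped distinction concrete, here is a small Python sketch using the standard library's xml.etree.ElementTree. The bare `note` element (no MODS namespace) and the function names are simplifications for illustration only; whether the MODS schema accepts child elements inside a note is exactly the open question.

```python
import xml.etree.ElementTree as ET


def escaped_note(note_html: str) -> str:
    """Store the HTML as text; the serializer escapes <, > and &."""
    note = ET.Element("note")
    note.text = note_html
    return ET.tostring(note, encoding="unicode")


def raw_note(note_xhtml: str) -> str:
    """Embed the markup as real child elements (needs well-formed XHTML
    with a single root element)."""
    note = ET.Element("note")
    note.append(ET.fromstring(note_xhtml))
    return ET.tostring(note, encoding="unicode")


print(escaped_note("<p>Hi</p>"))  # <note>&lt;p&gt;Hi&lt;/p&gt;</note>
print(raw_note("<p>Hi</p>"))      # <note><p>Hi</p></note>
```

Escaped HTML always yields a valid document because the note is just a string to an XML consumer; raw markup is only valid if the schema permits it.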

Bruce D'Arcus

May 27, 2011, 5:01:53 PM
to zoter...@googlegroups.com

I'm in the "escaped markup is evil" camp, so that's where I stand.

I'll post a note on the MODS list about this.

Bruce

Richard Karnesky

May 27, 2011, 6:15:18 PM
to zotero-dev
You had asked a few years ago:
http://listserv.loc.gov/cgi-bin/wa?A2=ind0407&L=MODS&P=R273&I=-3

The extension element would allow rich text notes. I'd agree with the
example: give plain text notes in MODS (for use by other clients), but
also include the rich text as an extension. This is obviously some
amount of duplication/size bloat in exported files, but it seems the
advantages outweigh this.
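
A sketch of what that duplication could look like, in Python with xml.etree.ElementTree. The MODS namespace, `note`, and `extension` are real; the `richNote` element, its namespace, the `div` wrapper, and the helper names are made up for illustration, and the tag stripping is deliberately naive.

```python
import html
import re
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
EXT_NS = "http://example.org/rich-note"  # hypothetical extension namespace
ET.register_namespace("mods", MODS_NS)


def html_to_plain(note_html: str) -> str:
    """Naive conversion: paragraph/line breaks to newlines, drop other tags."""
    text = re.sub(r"<br\s*/?>|</p\s*>", "\n", note_html, flags=re.IGNORECASE)
    text = re.sub(r"<[^>]+>", "", text)
    return html.unescape(text).strip()


def build_mods_note(note_html: str) -> ET.Element:
    """Plain text in <note> for other clients, rich text in <extension>."""
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    note = ET.SubElement(mods, f"{{{MODS_NS}}}note")
    note.text = html_to_plain(note_html)
    ext = ET.SubElement(mods, f"{{{MODS_NS}}}extension")
    rich = ET.SubElement(ext, f"{{{EXT_NS}}}richNote")  # hypothetical element
    # Assumes the note is well-formed XHTML (no named entities like &nbsp;).
    rich.append(ET.fromstring(f"<div>{note_html}</div>"))
    return mods


record = build_mods_note("<p>Read <b>chapter 2</b></p>")
print(ET.tostring(record, encoding="unicode"))
```

Clients that ignore the extension still get a readable note, and an importer that knows the extension can restore the formatting.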

--Rick

Bruce D'Arcus

May 27, 2011, 6:26:37 PM
to zoter...@googlegroups.com


On May 27, 2011 6:15 PM, "Richard Karnesky" <karn...@gmail.com> wrote:
>
> You had asked a few years ago:
> http://listserv.loc.gov/cgi-bin/wa?A2=ind0407&L=MODS&P=R273&I=-3
 

Doh!  How embarrassing.

> The extension element would allow rich text notes. I'd agree with the
> example: give plain text notes in MODS (for use by other clients), but
> also include the rich text as an extension.  This is obviously some
> amount of duplication/size bloat in exported files, but it seems the
> advantages outweigh this.

Yes.

Bruce

> --Rick

Otto Mättas

Sep 28, 2021, 4:59:15 PM
to zotero-dev
Hello devs!

I am digging up this old discussion because reading through it did not bring closure for me.
Namely, we still have to deal with <p></p> tags around notes on RIS export.

We are building a systematic review tool (https://github.com/asreview/asreview) that uses the Notes field (from a RIS file) to label records. We want a circular workflow, with Zotero serving as the reference manager and ASReview as the review tool. To achieve this, we currently have to work around the issue by post-processing the RIS file that Zotero exports.
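
A minimal Python sketch of that kind of workaround, for illustration only and not the actual ASReview code. It assumes notes arrive on single `N1` lines in the usual `TAG  - value` layout; the function name is made up, and notes that continue over several lines are not handled.

```python
import html
import re

NOTE_LINE = re.compile(r"^N1  - (.*)$")  # assumed: notes are exported as N1
TAG = re.compile(r"<[^>]+>")


def strip_note_html(ris_text: str) -> str:
    """Remove HTML tags from N1 lines and decode entities; leave the rest."""
    cleaned = []
    for line in ris_text.splitlines():
        match = NOTE_LINE.match(line)
        if match:
            body = TAG.sub("", match.group(1))
            line = f"N1  - {html.unescape(body).strip()}"
        cleaned.append(line)
    return "\n".join(cleaned)


sample = "TY  - JOUR\nN1  - <p>relevant</p>\nER  - "
print(strip_note_html(sample))
```

Any < or > that was part of the note text itself would survive as an entity in the export and be decoded here, so labels such as "n > 30" should come through intact.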

Would you consider looking at this issue once more to see whether it is possible to export notes in a RIS file as plain text? That looks to be the current standard, including for some of your commercial competitors. Thanks in advance, and let me know if I can help you help us.

Seize the day,
Otto