Unicode normalization


ldd

Dec 11, 2007, 8:48:23 AM
to zotero-dev
Zotero needs to normalize its data to a consistent Unicode
normalization, preferably NFC. I've detected the problem by doing
this:

1. Exported a bibliography from RefWorks in BibTeX format.

2. Visually inspected the resulting file. Accents were represented
consistently everywhere (as they would be after NFC normalization).

3. Imported the BibTeX file into Zotero. Everything appeared normal
when I inspected the records.

4. Imported an entry from the Library of Congress into Zotero.

5. Exported all entries to a BibTeX file in Unicode format. (See my
post here:

http://groups.google.com/group/zotero-dev/browse_frm/thread/61a52b5012bccd3c
)

6. Upon inspecting the exported file, I see that all entries are
exported properly except for the one I imported from the Library of
Congress. In that entry, every accented character appears in its
decomposed form (a base letter followed by a combining mark).

Deductions:
1. Given that all entries except the LoC entry are consistently
and correctly encoded, the problem is in neither the BibTeX import nor
the BibTeX export.
2. Since the only faulty entry is the LoC entry, it is likely either
that the LoC itself encodes accented characters in decomposed form or
that the import filter used for LoC entries produces decomposed
Unicode data, which is then passed to the rest of Zotero.
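
To illustrate the difference between the two encodings, here is a minimal
sketch. It assumes a JavaScript engine that provides
String.prototype.normalize (the method linked later in this thread); the
helper name isNFC is made up for this example and is not part of Zotero.

```javascript
// Two encodings of "é" that render identically on screen.
const composed = "\u00E9";     // single precomposed code point (NFC)
const decomposed = "e\u0301";  // "e" followed by COMBINING ACUTE ACCENT (NFD)

// Hypothetical helper: true when the string is already in NFC.
function isNFC(str) {
  return str === str.normalize("NFC");
}

console.log(composed === decomposed);                  // false
console.log(isNFC(composed), isNFC(decomposed));       // true false
console.log(decomposed.normalize("NFC") === composed); // true
```

A plain string comparison treats the two forms as different, which is why
mixed normalization can also affect searching and sorting, not just export.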

Suggested fix: After a filter is used to import data into Zotero, the
core Zotero code should perform a final pass on the data to normalize
it to NFC.
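
A rough sketch of what such a final pass could look like, assuming
String.prototype.normalize is available. The function name normalizeToNFC
and the item shape (title, creators, year) are invented for illustration
and do not reflect Zotero's actual data model or import code.

```javascript
// Hypothetical helper: recursively normalize every string value to NFC.
// Non-string values (numbers, booleans, null) are returned unchanged.
function normalizeToNFC(value) {
  if (typeof value === "string") {
    return value.normalize("NFC");
  }
  if (Array.isArray(value)) {
    return value.map(normalizeToNFC);
  }
  if (value !== null && typeof value === "object") {
    const out = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = normalizeToNFC(inner);
    }
    return out;
  }
  return value;
}

// Example with an invented item containing decomposed accents.
const imported = {
  title: "Cafe\u0301 society",
  creators: [{ lastName: "Be\u0301ranger" }],
  year: 2007,
};
const cleaned = normalizeToNFC(imported);
console.log(cleaned.title === "Caf\u00E9 society"); // true
```

Running the pass once, after any import filter has finished, would mean
individual filters do not each have to handle normalization themselves.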

Dan Stillman

Dec 11, 2007, 3:24:23 PM
to zoter...@googlegroups.com
On 12/11/07 8:48 AM, ldd wrote:
> Zotero needs to normalize its data to a consistent Unicode
> normalization, preferably NFC. I've detected the problem by doing
> this:
>
> ...
>
>

I've created a ticket for this: https://www.zotero.org/trac/ticket/865

Could you attach to that ticket an example BibTeX file and the URL for
(or description of, if they're not static) a LoC entry that exhibits
this behavior?

If you need an SVN/Trac account, you can create one here:
https://www.zotero.org/dev/trac_access

Thanks.

Rick Karnesky

Sep 9, 2014, 3:59:37 PM
to zoter...@googlegroups.com
I'm bumping this old thread because I'm not sure where the issue is currently tracked, and to draw people's attention to String.prototype.normalize, which is now available in Firefox 31: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/normalize
Older versions can potentially use unorm: https://github.com/walling/unorm

So... do these seem compelling enough to start tackling this issue? If so, where should the normalization be done? I suspect that implementing it on a translator-by-translator basis is less than ideal.
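
As a sketch of how the two options could be combined: use the built-in method where it exists and fall back to a supplied implementation otherwise. The function toNFC and its `fallback` parameter are hypothetical; passing something like unorm's nfc function there is an assumption on my part and untested against Zotero.

```javascript
// Hypothetical wrapper: prefer the native method, else use a fallback.
// `fallback` is any function (str) => NFC-normalized string, for example
// the nfc function from the unorm package (an assumption, not verified here).
function toNFC(str, fallback) {
  if (typeof String.prototype.normalize === "function") {
    return str.normalize("NFC");
  }
  if (typeof fallback === "function") {
    return fallback(str);
  }
  throw new Error("No Unicode normalization implementation available");
}

console.log(toNFC("e\u0301") === "\u00E9"); // true where normalize() exists
```

Keeping this in a single shared function would let the normalization happen in one place rather than in each translator.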

--Rick

Aurimas Vinckevicius

Sep 9, 2014, 4:12:06 PM
to zoter...@googlegroups.com

