[Bug] Inconsistent return value for selected text depending on method


Ben Jaeger

Aug 23, 2024, 7:19:26 PM
to zotero-dev
[Copied from a Zotero forums post]

First discovered by a user in an issue (https://github.com/ImperialSquid/zotero-zotts/issues/87) on my plugin.

There are multiple ways to get the selected text in a paper. However, some return text that differs from the original: not visually, it seems, but in its Unicode encoding, which matters for TTS and for any downstream code doing some form of text matching.

The user provided this PDF for testing: https://github.com/user-attachments/files/16676094/A.pdf

If you select the Å character,
reader._iframeWindow.getSelection().getRangeAt(0).toString()
returns the correct U+00C5 (LATIN CAPITAL LETTER A WITH RING ABOVE) character

However,
reader._lastView._selectionPopup.annotation.text
returns two characters, U+0041 (LATIN CAPITAL LETTER A) and U+030A (COMBINING RING ABOVE)

(reader is just shorthand for the internal reader of the currently selected reader tab)
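
To see the difference concretely, it can help to dump the code points of each string. The following is only a sketch: the two constants are stand-ins for the return values described above rather than live Zotero calls, and codePoints is a hypothetical helper name.

```javascript
// Hypothetical helper: list a string's code points as "U+XXXX" labels.
function codePoints(text) {
  // Array.from iterates a string by code point, not by UTF-16 code unit.
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// Stand-ins for the two values reported above (assumed, not fetched from Zotero).
const fromSelection = "\u00C5";   // single precomposed character
const fromAnnotation = "A\u030A"; // base letter plus combining ring

console.log(codePoints(fromSelection));        // [ 'U+00C5' ]
console.log(codePoints(fromAnnotation));       // [ 'U+0041', 'U+030A' ]
console.log(fromSelection === fromAnnotation); // false
```

Both strings render as the same glyph, but a plain === comparison treats them as different.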

I haven't done any digging into why this occurs, nor have I checked what other ways there are of getting text out of Zotero to see whether it occurs elsewhere. (The original user reported .getDisplayTitle() on library items and the .text and .comment fields on sidebar annotations in the reader as being fine, so those appear safe.)

Ben Jaeger

Aug 24, 2024, 12:09:34 AM
to zotero-dev
My mistake. After doing some more reading about Unicode and finding String.prototype.normalize() on MDN, I see that Zotero normalises to NFC across the board, whereas NFD is what was most useful to me.

So in reality, the annotation.text version isn't true to the original text, but it is the encoding Zotero uses consistently, whereas the _iframeWindow.getSelection() version is more of a hacky workaround, since it avoids Zotero's normalisation.
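
For downstream code that needs one stable form regardless of which accessor supplied the text, one option is to normalise explicitly before comparing strings or passing them to TTS. A minimal sketch using the standard String.prototype.normalize(); the sameText helper and the sample strings are illustrative assumptions, not part of Zotero:

```javascript
// Hypothetical helper: compare two strings after normalising both
// to the same Unicode normalisation form (NFC by default).
function sameText(a, b, form = "NFC") {
  return a.normalize(form) === b.normalize(form);
}

const composed = "\u00C5";    // U+00C5 as a single code point
const decomposed = "A\u030A"; // U+0041 followed by U+030A

// NFD splits the precomposed character; NFC recombines the pair.
console.log(composed.normalize("NFD") === decomposed); // true
console.log(decomposed.normalize("NFC") === composed); // true

// Once both sides share a form, text matching behaves as expected.
console.log(sameText(composed, decomposed)); // true
```

Which form to pick depends on what the consumer expects; the point is only that both sides of a comparison should use the same one.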

Somewhat jumping the gun on my part to raise it as an issue when a bit more digging shows it's all intentional, sorry all!