FileReader object seems to default to UTF-8 encoding when "iso-8859-2" is used.

Jared Crain

Nov 13, 2013, 5:58:03 PM
to chromiu...@chromium.org
Hi, All.

I am trying to understand some unexpected behavior in our web app: content that is encoded as encoding A, but read in as if it were encoding B, displays correctly anyway, as though the declared encoding had been ignored. I would expect it to fail to render correctly.

My specific situation: we allow users to read a file into our app, and this is implemented using the FileReader object's readAsText function. Additionally, we allow the user to declare a specific encoding for the incoming file. The bug happens when a user imports a file that is actually encoded as UTF-8 but tells the system that it is iso-8859-2 (Central European ISO). When I call readAsText, I pass the string "iso-8859-2" as the second parameter. I would expect the content to fail to render correctly, but instead the result property of the FileReader shows content that looks just fine (as though the FileReader had read the file knowing it was UTF-8). I see the same behavior in Chrome and Chromium.

I know that the FileReader defaults to UTF-8 when no encoding is passed, and I believe it also defaults to UTF-8 when it doesn't recognize the encoding value passed as a parameter. Can anyone tell me whether Chrome or Chromium currently supports iso-8859-2, or where I might be able to find that information? Alternatively, is there a way to identify which encoding the FileReader object is using to read the content once readAsText is called?

Thanks!

Joshua Bell

Nov 14, 2013, 11:52:44 AM
to Jared Crain, Chromium HTML5
Does the UTF-8 encoded file happen to contain a byte-order mark (BOM) at the start? In UTF-8 that would be the 3-byte sequence 0xEF 0xBB 0xBF.

I'm not intimately familiar with this API, but many encoding paths treat an explicitly passed encoding label as a "suggestion" and any BOM in the data as definitive; this is done when loading pages off the web as it is required for Web compatibility. (Servers very frequently pass incorrect encodings in headers.)
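One way to test the BOM theory is to look at the first bytes of the file before decoding it. The following is a minimal sketch, not part of the FileReader API: `sniffBom` and `effectiveEncoding` are hypothetical helper names, and the rule that a BOM wins over the declared label is the assumption described in the reply above, not something confirmed here for Chrome.

```typescript
// Hypothetical helpers for illustration; neither is a browser API.
type BomEncoding = "utf-8" | "utf-16le" | "utf-16be" | null;

// Return the encoding implied by a leading byte-order mark, or null if none.
function sniffBom(bytes: Uint8Array): BomEncoding {
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return "utf-8";
  }
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
    return "utf-16le";
  }
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
    return "utf-16be";
  }
  return null;
}

// Assumption: a BOM, when present, overrides the label the caller declared.
function effectiveEncoding(bytes: Uint8Array, declaredLabel: string): string {
  const fromBom = sniffBom(bytes);
  return fromBom !== null ? fromBom : declaredLabel;
}
```

In a browser, the bytes could come from reading `file.slice(0, 3)` with `readAsArrayBuffer` and wrapping the result in a `Uint8Array`. If `effectiveEncoding(bytes, "iso-8859-2")` comes back as "utf-8", the BOM would explain the behavior reported in the original post.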


Åsmund Wego

May 10, 2017, 7:41:40 AM
to Chromium HTML5, jaredca...@gmail.com
What is the content of your file? UTF-8 and iso-8859-2 differ only outside the ASCII range. If all the characters are in the ASCII range (0-127), every byte means the same thing in both encodings, so a UTF-8 file read as iso-8859-2 gives exactly the same result. You will only see errors in the loaded content if it contains characters outside the ASCII range.
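To check whether a given file falls into this case, it is enough to test whether every byte is below 0x80. A minimal sketch follows; `isAsciiOnly` is a hypothetical helper name, and the byte arrays are hand-written UTF-8 encodings used only as sample input.

```typescript
// Hypothetical helper: true when every byte is in the ASCII range (0-127).
// Such content decodes to the same text under UTF-8 and iso-8859-2.
function isAsciiOnly(bytes: Uint8Array): boolean {
  return bytes.every((b) => b < 0x80);
}

// "hello" encoded as UTF-8: all bytes are ASCII.
const plain = new Uint8Array([0x68, 0x65, 0x6c, 0x6c, 0x6f]);

// "é" encoded as UTF-8 is the two bytes 0xC3 0xA9. Read as iso-8859-2,
// those bytes are two separate characters ("ĂŠ"), so the text is garbled.
const accented = new Uint8Array([0xc3, 0xa9]);

const plainIsSafe = isAsciiOnly(plain);
const accentedIsSafe = isAsciiOnly(accented);
```

If `isAsciiOnly` returns true for the imported file, the declared encoding cannot make a visible difference, which would also explain why the content looks fine.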