Issue 102 in chmsee: Encoding problem with TOC and Index

chm...@googlecode.com

Dec 15, 2010, 4:14:30 AM
to chm...@googlegroups.com
Status: New
Owner: ----
Labels: Type-Defect Priority-Medium

New issue 102 by dmitriy.trt: Encoding problem with TOC and Index
http://code.google.com/p/chmsee/issues/detail?id=102

Chmsee doesn't use the correct encoding for the TOC and index, whether I
choose "Auto" or set the correct encoding by hand. The file encoding is
Windows-1251.

You can download a CHM file for testing from here:
http://www.lenininc.com/soft/webdes_ru.chm

Attachments:
Выделение-8b0.png 81.1 KB
Выделение-5f5.png 87.2 KB

chm...@googlecode.com

Dec 23, 2010, 3:28:26 AM
to chm...@googlegroups.com
Updates:
Status: Accepted
Owner: jungleji

Comment #1 on issue 102 by jungleji: Encoding problem with TOC and Index
http://code.google.com/p/chmsee/issues/detail?id=102

The TOC and index contents are generated from the .hhc and .hhk files.
In this webdes_ru.chm file, they are Contents.hhc and Index.hhk, located
in the extracted ../bookshelf/2e47ef.../ directory.

I examined the strings in these two files and found that they are
composed of character entities,
e.g. the first string:

"Ñîäåðæàíèå",

after decoding, it becomes

"Ñîäåðæàíèå"

I tried converting the result using the "WINDOWS-1251" encoding again,
but it stayed in the same form.

Do you have any experience dealing with this kind of encoding?
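
One possible way to recover the text, sketched in Python under an assumption: each character in the decoded string stands for a single Windows-1251 byte that was written out as a Latin-1 character (or as an entity with the same numeric value). If that holds, mapping the characters back to bytes and decoding those bytes as Windows-1251 should give readable Russian. The function name repair_mojibake and its source_encoding parameter are made up for this illustration and are not part of chmsee.

```python
import html


def repair_mojibake(raw: str, source_encoding: str = "cp1251") -> str:
    """Recover text whose bytes were shown one byte per Latin-1 character.

    Hypothetical helper for illustration; not chmsee code.
    """
    # Step 1: turn character entities such as "&#209;" into real characters.
    text = html.unescape(raw)
    # Step 2: every character stands for one original byte, so map each
    # character back to that byte. This raises UnicodeEncodeError if a
    # character is above U+00FF, i.e. if the assumption does not hold.
    raw_bytes = text.encode("latin-1")
    # Step 3: decode the bytes with the encoding the CHM was authored in.
    return raw_bytes.decode(source_encoding)


print(repair_mojibake("Ñîäåðæàíèå"))  # prints: Содержание
```

Under the same assumption, this may also explain why encoding the result as Windows-1251 again changes nothing: the string first has to be turned back into its original bytes, and only then decoded.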


chm...@googlecode.com

Sep 21, 2012, 5:13:35 AM
to chm...@googlegroups.com

Comment #2 on issue 102 by gpse...@gmail.com: Encoding problem with TOC and
Index
http://code.google.com/p/chmsee/issues/detail?id=102

I have the same problem.
In addition, if the CHM file's contents are built from files with
non-ASCII names, they cannot be opened, and this message appears:
Can not find link target file
at "/home/pseudo/.cache/chmsee/bookshelf/995653064118f2c2e6a06ff6e373c31e/ñîäåðæàíèå.htm"
Real file name:
/home/pseudo/.cache/chmsee/bookshelf/995653064118f2c2e6a06ff6e373c31e/содержание.htm
It seems chmsee interprets all filenames and titles as ISO-8859-1 encoded. It
would be great to interpret them using the current locale's encoding instead.

chmsee 1.3.0-2ubuntu2

Distributor ID: Ubuntu
Description: Ubuntu 12.04.1 LTS
Release: 12.04
Codename: precise

locale uk_UA.UTF-8.

chm...@googlecode.com

Sep 22, 2012, 3:13:25 AM
to chm...@googlegroups.com

Comment #3 on issue 102 by jungl...@gmail.com: Encoding problem with TOC
and Index
http://code.google.com/p/chmsee/issues/detail?id=102

Hi gpseudo,

Thank you for reminding me about this issue. I just checked with the latest
chmsee (v1.99.14), and the bug is still there. I will try to fix it later by
converting the strings according to the locale.

chm...@googlecode.com

Sep 25, 2012, 8:39:53 AM
to chm...@googlegroups.com

Comment #4 on issue 102 by jungl...@gmail.com: Encoding problem with TOC
and Index
http://code.google.com/p/chmsee/issues/detail?id=102

I added a conversion based on the locale from the CHM file.
Now the TOC looks better, but the index still has some garbled entries.

The modification is already committed; you can get it from:

git://github.com/jungleji/chmsee.git
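
To illustrate the general idea of a conversion based on the locale from the CHM file, here is a minimal Python sketch. It is not the committed chmsee code. It assumes the Windows language ID (LCID) has already been read from the CHM file by some other means, and it uses a small lookup table to pick a codec. The table LCID_TO_ENCODING, the function decode_chm_string, and the cp1252 fallback are all hypothetical choices for this example.

```python
# Hypothetical lookup table: Windows language ID (LCID) -> Python codec name.
# Only a handful of languages are listed here for illustration.
LCID_TO_ENCODING = {
    0x0409: "cp1252",  # English (United States)
    0x0411: "cp932",   # Japanese
    0x0419: "cp1251",  # Russian
    0x0422: "cp1251",  # Ukrainian
    0x0804: "gbk",     # Chinese (Simplified, PRC)
}


def decode_chm_string(raw_bytes: bytes, lcid: int, fallback: str = "cp1252") -> str:
    """Decode a TOC/index string using the encoding implied by the LCID.

    Hypothetical helper for illustration; not chmsee code.
    """
    # Unknown language IDs fall back to a default codec (an assumption).
    encoding = LCID_TO_ENCODING.get(lcid, fallback)
    # errors="replace" keeps going on undecodable bytes instead of raising.
    return raw_bytes.decode(encoding, errors="replace")


# Example: a Windows-1251 title from a Russian CHM (LCID 0x0419).
title_bytes = "Содержание".encode("cp1251")
print(decode_chm_string(title_bytes, 0x0419))  # prints: Содержание
```

A table lookup keeps the mapping easy to extend, but it only helps when the strings arrive as raw bytes. Strings stored as character entities would need to be turned back into bytes first.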

Attachments:
sample_toc.png 141 KB
sample_index.png 145 KB
