ERR011: Unable to parse HTML element

Peng Yu

Nov 13, 2011, 3:39:21 PM

to chm2pdf

Hi,

~$ chm2pdf --version
/usr/bin/chm2pdf version 0.9.1
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

chm2pdf --color --continuous

I used the above command to convert the following CHM file, but I got
ERR011 errors. Does anybody know what is wrong?

https://rapidshare.com/files/2363027281/SSHDG_Barrett2ed.chm

ERR011: Unable to parse HTML element on line 24!
ERR011: Unable to parse HTML element on line 2!
ERR011: Unable to read image file "/tmp/tmpD8Xixf/SSHDG_Barrett2ed/"!
ERR011: Unable to parse HTML element on line 29!
ERR011: Unable to parse HTML element on line 2!
ERR011: Unable to parse HTML element on line 1!
PAGES: 700
BYTES: 5992956
Something wrong happened when launching htmldoc.
exit value: 1536
Check if output exists or if it is good.
Done.

Regards,
Peng

Reto

Nov 14, 2011, 5:04:33 AM

to chm2pdf

Hi Peng!

As far as I know, you should use at least the --book or --webpage option!
Have you tried this?

Ciao
Reto

Peng Yu

Nov 14, 2011, 9:51:05 AM

to chm...@googlegroups.com

> As far as I know, you should use at least --book or --webpage option!
> Have you tried this?

No. It doesn't help.

~$ ./chm2pdf.sh /tmp/SSHDG_Barrett2ed.chm

ERR011: Unable to parse HTML element on line 24!
ERR011: Unable to parse HTML element on line 2!

ERR011: Unable to read image file "/tmp/tmpiGcdid/SSHDG_Barrett2ed/"!

ERR011: Unable to parse HTML element on line 29!
ERR011: Unable to parse HTML element on line 2!
ERR011: Unable to parse HTML element on line 1!

ERR002: Error: no pages generated! (did you remember to use webpage mode?

Something wrong happened when launching htmldoc.

exit value: 1792

Check if output exists or if it is good.
Done.

~$ cat chm2pdf.sh
#!/usr/bin/env bash

# Require exactly one argument: the CHM file to convert.
if [ $# -ne 1 ]
then
  echo "usage: $(basename "$0") <chm_file>" >&2
  exit 1
fi

chm2pdf --color --book "$1"

--
Regards,
Peng
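
A note on the "exit value" lines in the two logs above: 1536 and 1792 look like raw wait statuses of the kind os.system() returns on Unix, where the child's exit code sits in the high byte. That chm2pdf prints the raw status is an assumption, not something confirmed in this thread. Under that assumption, a minimal Python sketch to decode them (decode_wait_status is a made-up helper name):

```python
def decode_wait_status(status):
    """Split a 16-bit Unix wait status into (exit_code, signal_number).

    Assumption: the "exit value" printed by chm2pdf is a raw wait status,
    as returned by os.system() on Unix.
    """
    exit_code = (status >> 8) & 0xFF   # high byte: exit code of the child
    signal_number = status & 0x7F      # low 7 bits: terminating signal, 0 if none
    return exit_code, signal_number


if __name__ == "__main__":
    for status in (1536, 1792):        # values taken from the two logs above
        print(status, decode_wait_status(status))
```

Decoded this way the values are 6 and 7, which happens to match the number of ERR lines in each log (six in the first run, seven in the second).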

Reto

Nov 14, 2011, 4:35:47 PM

to chm2pdf

I get the same result here.
If you look in the /tmp/chm2pdf/orig/ directory after the extraction,
you'll see that some files (like sshtdg2-CHP-10-SECT-6.html) have zero
length!
Others (like sshtdg2-CHP-1.html) start in the middle of some HTML,
without the normal header!
So the extraction of the CHM content has already gone wrong.
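
The check described above can be automated. The following is only a rough Python sketch: find_suspect_html is a hypothetical helper, and "no normal header" is approximated by testing whether the file begins with a doctype, an <html> tag, an XML declaration or a comment.

```python
import os


def find_suspect_html(root):
    """Return (empty, headerless) lists of HTML paths found under root.

    "headerless" is a heuristic: after an optional BOM and whitespace,
    the file does not begin with a doctype, <html>, <?xml or a comment.
    """
    empty, headerless = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.lower().endswith((".html", ".htm")):
                continue
            path = os.path.join(dirpath, name)
            if os.path.getsize(path) == 0:
                empty.append(path)
                continue
            with open(path, "rb") as handle:
                head = handle.read(512).lstrip(b"\xef\xbb\xbf \t\r\n").lower()
            if not head.startswith((b"<!doctype", b"<html", b"<?xml", b"<!--")):
                headerless.append(path)
    return empty, headerless


if __name__ == "__main__":
    # /tmp/chm2pdf/orig/ is the extraction directory mentioned above.
    empty, headerless = find_suspect_html("/tmp/chm2pdf/orig/")
    for path in empty:
        print("zero length:", path)
    for path in headerless:
        print("no HTML header:", path)
```

A file flagged as headerless is not necessarily broken (some CHM pages are fragments by design), so treat the second list as a hint only.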

You could try decompiling the CHM with HTML Help Workshop 4.74.8702.0
on Windows, copying the result to /tmp/chm2pdf/orig/,
and then running chm2pdf with the --dontextract option.

Regards,
Reto

Reto

Nov 14, 2011, 4:46:29 PM

to chm2pdf

Hi everybody!

I was able to eliminate one of my "ERR011: Unable to parse HTML element
on line xx!" errors.

My CHM file contained some JavaScript, and chm2pdf makes no attempt to
delete JavaScript (some other unwanted stuff is deleted before
everything is passed to the htmldoc part).

I am no regex expert, so the following may not be a good solution,
but at least in my case one ERR011 is gone!

# Delete javascript (<script type='text/javascript'>...</script>)
page = re.sub('(?i)<script type=("|\')text/javascript("|\')'
              '(.*?)>(.*?)</script>', '', page,
              flags=re.DOTALL | re.MULTILINE)
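
For anyone who wants to try the idea outside chm2pdf, here is a self-contained sketch. It uses a broader pattern than the one above (any <script> element, whatever its attributes), so it is a variant and not the code in chm2pdf; strip_scripts and SCRIPT_BLOCK are names made up for this example.

```python
import re

# Matches any <script ...>...</script> block, whatever its attributes.
# Broader variant of the pattern above, not the code used in chm2pdf.
SCRIPT_BLOCK = re.compile(r"<script\b[^>]*>.*?</script\s*>",
                          re.IGNORECASE | re.DOTALL)


def strip_scripts(page):
    """Return page with every <script> element removed."""
    return SCRIPT_BLOCK.sub("", page)


if __name__ == "__main__":
    sample = (
        "<html><head><script type='text/javascript'>\n"
        "var x = 1;\n"
        "</script></head><body><p>Hello</p>"
        '<SCRIPT src="a.js"></SCRIPT></body></html>'
    )
    print(strip_scripts(sample))
```

Like any regex applied to HTML this is approximate: an attribute value containing ">" would confuse it, so it is a workaround and not a real parser.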

Reto

Nov 15, 2011, 9:57:14 AM

to chm2pdf

When I decompile with HTML Help Workshop, some chapters are missing.
The original CHM file seems corrupt: I am not able to open it on
Windows XP, so no wonder chm2pdf gets nervous on this one!

Peng Yu

Nov 15, 2011, 10:10:37 AM

to chm...@googlegroups.com

2011/11/15 Reto <reto....@gmail.com>:

>
> Decompiling with HTML Workshop there are some chapters missing.
> The original CHM file seems corrupt: I am not able to open it on
> windows xp.... so no wonder chm2pdf will get nervous on this one!

I have changed the file name. XP is known to have problems opening
renamed CHM files, but the file opens fine on a Mac.

--
Regards,
Peng

Reto

Nov 15, 2011, 4:14:48 PM

to chm2pdf

This page describes two conversion methods:
http://www.64bitjungle.com/ubuntu/viewing-chm-files-and-converting-chm-to-html-or-pdf-files-in-linux/

Conversion Method 1: CHM -> HTML (-> PDF)
Firstly, it is possible to simply decompile the CHM file into
component HTML files, which can be opened in any web browser. These
HTML files may then optionally be transformed into a PDF document. In
order to do this, two main packages (with dependencies) need to be
installed, chmlib and htmldoc:

sudo apt-get install libchm-bin htmldoc

The first part of the process calls upon chmlib to essentially
decompile the CHM file, and save the new files to a specified
directory, for example:

extract_chmLib my_chm_book.chm htmloutputdir

This will pull apart the CHM file, and store all the new HTML files
within the “htmloutputdir” directory (within a sub-directory called
“final”). If desired, htmldoc can be called upon to convert the HTML
files into a single PDF document. Running "htmldoc &" from the Terminal
opens up the htmldoc GUI, from which the HTML files can be selected for
input, the output formatted, and the PDF document generated. The htmldoc
website has extensive documentation which covers this process, but for
converting to PDF, I find the next method much easier!

Conversion Method 2: CHM -> PDF with chm2pdf
...well, this you already tried.

Or you may combine the two: extract with method 1, then continue with
chm2pdf using the --dontextract option.
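
The combined approach could be scripted roughly as below. This is a sketch under assumptions, not a tested recipe: it assumes chm2pdf looks for already-extracted files in /tmp/chm2pdf/orig/ (the directory mentioned earlier in this thread) and that it still expects the CHM file name as an argument when --dontextract is given. build_commands and run_workflow are names invented for the example.

```python
import subprocess


def build_commands(chm_file, extract_dir="/tmp/chm2pdf/orig/"):
    """Return the two command lines for the extract-then-convert workflow.

    Assumptions: chm2pdf reads pre-extracted files from extract_dir and
    still takes the CHM file name when --dontextract is used.
    """
    extract = ["extract_chmLib", chm_file, extract_dir]
    convert = ["chm2pdf", "--book", "--dontextract", chm_file]
    return [extract, convert]


def run_workflow(chm_file):
    """Run both steps in order, stopping at the first failure."""
    for command in build_commands(chm_file):
        print("running:", " ".join(command))
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Dry run: only print the commands; call run_workflow() to execute them.
    for command in build_commands("my_chm_book.chm"):
        print(" ".join(command))
```

If the extracted files end up in a sub-directory instead of directly in the target directory, they would have to be moved before the second step.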
