Multiple pages problem

Reto

Nov 6, 2011, 1:30:06 PM
to chm2pdf
Hi everyone!

I have a CHM file that I generate myself and want to distribute as a
PDF (so that I can password-protect it), which is how I found the chm2pdf
script.
I am not familiar with Linux, so I first tried to install it under
Windows, but then decided it would be easier to install Ubuntu on my PC. I
got CHM2PDF 0.9.1.1ubuntu5 from the Software Center, and that worked
flawlessly.

I am experimenting with the conversion of the CHM file and have run into
several kinds of problems:
- I get some error codes, but the output file is there. How can I find
out where the error is?
ERR011: Unable to parse HTML element on line 105!
ERR011: Unable to parse HTML element on line 49!
PAGES: 389
BYTES: 3614179
Something wrong happened when launching htmldoc.
exit value: 512
Check if output exists or if it is good.
Done
- Not all images are rendered in the PDF, or they are rendered badly.
Some of this may be my fault, as they are GIFs with transparency; if I
convert them to plain black and white, they do appear in the PDF. As I
am able to regenerate the CHM, this is not a big problem. But for others I
don't understand why they are not rendered (could the problem be that
the image is e.g. 1511x90 but I embed it with width="755"
height="47"?). I also tried changing the format (GIF, PNG), but got the
same problem. Next I'll try changing the resolution.
- Some pages appear more than once.
- I tried the --verbose option but didn't find out much more. I just
saw that the duplicate pages are already there when the links in the
HTML files are being corrected.
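
On the `exit value: 512` in the first item: a number like this is what Python's `os.system` returns on Linux, where the child's exit code sits in the high byte of the status word. The sketch below assumes that this is how the script obtains the number (an assumption, not checked against the chm2pdf source); `decode_status` is a hypothetical helper written for Python 3.

```python
def decode_status(status):
    """Split a 16-bit wait status, as returned by os.system on Linux,
    into (exit_code, signal_number)."""
    exit_code = (status >> 8) & 0xFF   # high byte: the program's exit code
    signal_number = status & 0x7F      # low 7 bits: terminating signal, 0 if none
    return exit_code, signal_number


# Assumption: 512 is the raw status reported for the htmldoc run.
exit_code, signal_number = decode_status(512)
print(exit_code, signal_number)
```

Under that assumption, htmldoc exited on its own with a non-zero code rather than being killed by a signal. What the code means is something only the HTMLDOC documentation can answer.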

Has anyone had similar problems, or suggestions on what to do next?
Kind regards
Reto

Reto

Nov 10, 2011, 12:53:35 PM
to chm2pdf
Hi again!

I tried a simpler file, with just 4 pages and some cross-links between
them.
I can send it by email to anyone who is willing to help me
(no reply so far... is anyone here?)

As you can see, the four pages are correctly extracted (p1.htm to
p4.htm),
but 11 pages are passed to htmldoc. So this definitely seems to be an
error in the chm2pdf script.

As for the image problem, I was not able to reproduce it with this
simplified file; only the BMP is not rendered well in the PDF.

Here is the output with the --verbose option:

Example.chm:
--> /#IDXHDR
--> /#ITBITS
--> /#IVB
--> /#STRINGS
--> /#SYSTEM
--> /#TOPICS
--> /#URLSTR
--> /#URLTBL
--> /$FIftiMain
--> /$OBJINST
--> /$WWAssociativeLinks/Property
--> /$WWKeywordLinks/Property
--> /doc/Images/Param_COMP_htm_5b06edf4.bmp
--> /doc/Images/Param_COMP_htm_5b06edf4.GIF
--> /doc/Images/Param_COMP_htm_5b06edf4.PNG
--> /doc/Images/param_MS.png
--> /doc/Index.hhk
--> /doc/P1.htm
--> /doc/P2.htm
--> /doc/P3.htm
--> /doc/P4.htm
--> /toc.hhc
Correcting /tmp/tmpr5QJmW/Example/doc/P1.htm
Correcting /tmp/tmpr5QJmW/Example/doc/P2.htm
Correcting /tmp/tmpr5QJmW/Example/doc/P2.htm
Correcting /tmp/tmpr5QJmW/Example/doc/P2.htm
Correcting /tmp/tmpr5QJmW/Example/doc/P3.htm
Correcting /tmp/tmpr5QJmW/Example/doc/P3.htm
Correcting /tmp/tmpr5QJmW/Example/doc/P3.htm
Correcting /tmp/tmpr5QJmW/Example/doc/P3.htm
Correcting /tmp/tmpr5QJmW/Example/doc/P4.htm
Correcting /tmp/tmpr5QJmW/Example/doc/P4.htm
Correcting /tmp/tmpr5QJmW/Example/doc/P4.htm
############### 1st pass ###############
match P1\.htm and replace it with temp0001_html
match P2\.htm and replace it with temp0002_html
match P2\.htm and replace it with temp0002_html
match P2\.htm and replace it with temp0002_html
match P3\.htm and replace it with temp0005_html
match P3\.htm and replace it with temp0005_html
match P3\.htm and replace it with temp0005_html
match P3\.htm and replace it with temp0005_html
match P4\.htm and replace it with temp0009_html
match P4\.htm and replace it with temp0009_html
match P4\.htm and replace it with temp0009_html

############### 2nd pass ###############
match temp0001_html and replace it with temp0001.html
match temp0002_html and replace it with temp0002.html
match temp0003_html and replace it with temp0003.html
match temp0004_html and replace it with temp0004.html
match temp0005_html and replace it with temp0005.html
match temp0006_html and replace it with temp0006.html
match temp0007_html and replace it with temp0007.html
match temp0008_html and replace it with temp0008.html
match temp0009_html and replace it with temp0009.html
match temp0010_html and replace it with temp0010.html
match temp0011_html and replace it with temp0011.html

htmldoc --webpage --duplex --format 'pdf14' --jpeg='100' --linkcolor 'blue'
--header 'c C' --size 'a4' --no-duplex --linkstyle 'plain' --embedfonts
--bodyfont times --footer 'c C'
"/tmp/tmpz5hkxw/Example/temp0001.html" "/tmp/tmpz5hkxw/Example/temp0002.html"
"/tmp/tmpz5hkxw/Example/temp0003.html" "/tmp/tmpz5hkxw/Example/temp0004.html"
"/tmp/tmpz5hkxw/Example/temp0005.html" "/tmp/tmpz5hkxw/Example/temp0006.html"
"/tmp/tmpz5hkxw/Example/temp0007.html" "/tmp/tmpz5hkxw/Example/temp0008.html"
"/tmp/tmpz5hkxw/Example/temp0009.html" "/tmp/tmpz5hkxw/Example/temp0010.html"
"/tmp/tmpz5hkxw/Example/temp0011.html" -f example.pdf > /dev/null
PAGES: 15
BYTES: 211921
Written file example.pdf
Done.
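
The duplication can be tallied directly from the log above. A minimal Python 3 sketch, assuming the verbose output has been saved as text and that each processed page produces one `Correcting <path>` line as shown; `count_corrected` is a hypothetical helper, not part of chm2pdf:

```python
from collections import Counter


def count_corrected(log_text):
    """Count how often each file appears in 'Correcting <path>' lines."""
    counts = Counter()
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith("Correcting "):
            path = line[len("Correcting "):]
            # Keep only the file name so the temp directory does not matter.
            counts[path.rsplit("/", 1)[-1]] += 1
    return counts


# Sample taken from the verbose output (shortened).
sample = """\
Correcting /tmp/tmpr5QJmW/Example/doc/P1.htm
Correcting /tmp/tmpr5QJmW/Example/doc/P2.htm
Correcting /tmp/tmpr5QJmW/Example/doc/P2.htm
Correcting /tmp/tmpr5QJmW/Example/doc/P2.htm
"""
for name, n in count_corrected(sample).items():
    print(name, n)
```

In the full log, P1.htm is listed once, P2.htm three times, P3.htm four times and P4.htm three times, which adds up to the 11 files passed to htmldoc.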

Max

Nov 11, 2011, 11:32:19 AM
to chm2pdf
Hi, Reto!
It's a pity, but it seems that the project and the group are abandoned:
some people continue to post bugs here, but there has been no response
from the developers for a long time, and there is a lot of spam here.

Reto

Nov 11, 2011, 6:08:07 PM
to chm2pdf
Hi Max!

Thanks for your reply.
It really is a pity, as the project is very interesting!
So I decided to see whether I can learn enough Python to find the
problem.

I see in the script that something is done to avoid duplicates in the
list of image URLs (in class ImageCatcher), but nothing similar is
done in class PageLister. Could this be the cause of the duplicate
pages?

I added this to the PageLister class:
# Avoid duplicates in the list of URLs.
if not self.pages.count('/'+value):
    self.pages.append('/'+value)

The duplicate pages now seem to be fixed, but the last page is now
missing and the order seems garbled. Too late for today...

Reto

Nov 12, 2011, 9:09:03 AM
to chm2pdf
Update:

This should be added to the PageLister class, as it fixes the
duplicate-page problem in files with a lot of internal cross-links:
>                 # Avoid duplicates in the list of URLs.
>                 if not self.pages.count('/'+value):
>                     self.pages.append('/'+value)
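
A standalone Python 3 sketch of the same check, for trying it outside the script. `add_page` and the sample values are hypothetical; only the `'/'+value` form and the `pages` list come from the snippet above. Because an entry is appended only the first time it is seen, the list keeps first-occurrence order:

```python
def add_page(pages, value):
    """Append '/' + value to pages unless that URL is already listed."""
    url = '/' + value
    if url not in pages:  # same effect as: not pages.count(url)
        pages.append(url)


pages = []
# Hypothetical sequence of link targets, with repeats caused by cross-links.
for value in ['doc/P1.htm', 'doc/P2.htm', 'doc/P2.htm', 'doc/P3.htm', 'doc/P2.htm']:
    add_page(pages, value)
print(pages)
```

For a few hundred pages the linear scan of the list is unlikely to matter; a companion set would only be worth adding for much longer lists.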

The last page was missing because in my "demo" the last page starts
with an h2 and not an h1!
With the --book option, the output is correct (and the last page is
indented by one level, as it should be in the topics).
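
To spot such pages before converting, one can look at the first heading of each extracted HTML file. A rough Python 3 sketch using the standard `html.parser` module; `first_heading` is a hypothetical helper, and the sample markup is invented for illustration:

```python
from html.parser import HTMLParser


class _FirstHeading(HTMLParser):
    """Remember the first h1..h6 start tag encountered."""

    def __init__(self):
        super().__init__()
        self.tag = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if self.tag is None and tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
            self.tag = tag


def first_heading(html_text):
    """Return the name of the first heading tag, or None if there is none."""
    parser = _FirstHeading()
    parser.feed(html_text)
    return parser.tag


# Invented sample pages.
print(first_heading("<html><body><h1>Intro</h1><h2>Part</h2></body></html>"))
print(first_heading("<html><body><H2>Details</H2><p>text</p></body></html>"))
```

Any page for which this returns something other than `h1` is a candidate for the behaviour described above.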

As for the image problem, everything in the tmp/chm2pdf/work/ folder is
fine, so this seems to be an issue with HTMLDOC rather than chm2pdf.

I still need to check whether the order of the pages in the big file is
correct. In the demo it is.

Ciao!
Reto