table background color removed

Reto

Nov 12, 2011, 5:08:44 PM
to chm2pdf
Hi again!

I know stylesheets are not supported, so I am reworking the source of
the CHM by hand.

I have a table in which some cells need a background color.

If I write e.g. <td bgcolor="#00ff00">, I end up with <td bgcolor="">
in the work directory, and there is no background color in the PDF.

If I write <td bgcolor=lime>, the color does appear in the PDF.

I looked through the script, but I cannot see where or why the
"#00ff00" gets removed.

Any ideas?

Reto

Nov 13, 2011, 11:34:09 AM
to chm2pdf
It seems that only the first #00ff00 after a link is removed! A regex
that is too greedy, maybe?

Original table:
<table>
<tr>
<td bgcolor="#00ff00">row 1, col1</td>
<td bgcolor="#00ff00">row 1, col2</td>
<td bgcolor="#00ff00">row 1, col3</td>
</tr>
<tr>
<td bgcolor="#00ff00">row 2, col1</td>
<td bgcolor="#00ff00">row 2, col2 <a href="P1.htm"> here a link to page 1</a></td>
<td bgcolor="#00ff00">row 2, col3</td>
</tr>
<tr>
<td bgcolor="#00ff00">row 3, col1</td>
<td bgcolor="#00ff00">row 3, col2</td>
<td bgcolor="#00ff00">row 3, col3</td>
</tr>
</table>

Same table in the work directory:
<table>
<tr>
<td bgcolor="">row 1, col1</td>
<td bgcolor="#00ff00">row 1, col2</td>
<td bgcolor="#00ff00">row 1, col3</td>
</tr>
<tr>
<td bgcolor="#00ff00">row 2, col1</td>
<td bgcolor="#00ff00">row 2, col2 <a href="temp0001.html"> here a link to page 1</a></td>
<td bgcolor="">row 2, col3</td>
</tr>
<tr>
<td bgcolor="#00ff00">row 3, col1</td>
<td bgcolor="#00ff00">row 3, col2</td>
<td bgcolor="#00ff00">row 3, col3</td>
</tr>
</table>

Reto

Nov 13, 2011, 12:38:08 PM
to chm2pdf
Can anyone help me with regular expressions?

I think this one is too greedy:
# Replace links of the form "somefile.html#894" with "somefile0206.html"
# The following will match anchors like '<a href="temp0206.html#894"' and will store the 'temp0206.html' in backreference 1.
# The replace string will then replace it with '<a href="temp0206.html"', i.e. it will take away the '#894' part.
# This is because the numbers after the '#' are often wrong or non-existent. It is better to link to an existing
# chapter than to a non-existent part of an existing chapter.
page = re.sub('(?i)<a href="([^#]*)#[^"]*"', '<a href="\\1"', page)

because ([^#]*) matches everything up to the next #, even if that # is
outside the link!
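
A minimal sketch that seems to reproduce the effect. The pattern is the one from the script; the two-cell sample is a made-up stand-in for the real CHM page, and the syntax is Python 3:

```python
import re

# Pattern copied from the script: [^#]* may run past the closing quote
# of the href, because a negated class also matches '"', '>' and newlines.
ORIGINAL = r'(?i)<a href="([^#]*)#[^"]*"'

# Hypothetical sample: a link WITHOUT a '#', followed by a colored cell.
sample = ('<td bgcolor="#00ff00">col2 <a href="P1.htm">link</a></td>\n'
          '<td bgcolor="#00ff00">col3</td>')

result = re.sub(ORIGINAL, r'<a href="\1"', sample)

# The match starts at '<a href="' and ends at the quote that closes the
# NEXT bgcolor value, so that color is dropped while the first one survives.
print(result)
```

Under these assumptions the cell after the link comes out as <td bgcolor="">, which matches what shows up in the work directory.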

How about this? Is this the way to go? At least it seems to work!
page = re.sub('(?i)<a href="([^(#|")]*)#[^"]*"', '<a href="\\1"', page)
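
One thing worth checking, as a hedged sketch: inside a character class, (, | and ) are literal characters rather than grouping or alternation, so [^(#|")] also stops at parentheses and pipes. The TIGHTER pattern below is an assumed simplification, not something taken from the chm2pdf script, and the file names are made up for illustration:

```python
import re

POSTED = r'(?i)<a href="([^(#|")]*)#[^"]*"'   # pattern as posted above
TIGHTER = r'(?i)<a href="([^#"]*)#[^"]*"'     # assumption: exclude only '#' and '"'

def strip_fragments(pattern, page):
    """Drop the '#...' part of every href that the pattern matches."""
    return re.sub(pattern, r'<a href="\1"', page)

# A link without '#' followed by a colored cell: both patterns leave it alone.
table = '<a href="P1.htm">link</a> <td bgcolor="#00ff00">cell</td>'
table_posted = strip_fragments(POSTED, table)
table_tighter = strip_fragments(TIGHTER, table)

# A normal link with a fragment: both patterns strip the '#894'.
link = '<a href="temp0206.html#894">'
link_posted = strip_fragments(POSTED, link)
link_tighter = strip_fragments(TIGHTER, link)

# A hypothetical file name containing parentheses: only TIGHTER strips it.
paren = '<a href="notes(1).html#894">'
paren_posted = strip_fragments(POSTED, paren)
paren_tighter = strip_fragments(TIGHTER, paren)

for name in ('table', 'link', 'paren'):
    print(name, '|', globals()[name + '_posted'], '|', globals()[name + '_tighter'])
```

Both variants stop at the closing quote of the href, which is what keeps the bgcolor values intact; the difference only shows up for hrefs that contain (, | or ).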