Another broken link issue solved!


Reto

Nov 14, 2011, 5:09:58 PM
to chm2pdf
Hi again....

I found another broken-link issue:

I have this link <a href="HW_overview.htm">Overview of HW</a>
which becomes <a href="HW_temp0003.html">Overview of HW</a>
because the pattern overview\.htm matches inside the longer file name and
is replaced with temp0003_html. The substitution that should apply is
HW_overview\.htm, replaced with temp0081_html.

Again a regex problem! In the "#Substitutions in 1st pass" section I fixed
this by prepending the opening quote (") to both the pattern and the
replacement:

page = re.sub('(?i)"'+match_string, '"'+replace_string, page)
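A minimal sketch of why the leading quote helps, using the link from above. The substitution list, the helper name rewrite_links, and the replacement names temp0003.html / temp0081.html are assumptions for illustration only, not chm2pdf's actual data structures:

```python
import re

page = '<a href="HW_overview.htm">Overview of HW</a>'

# Hypothetical (pattern, replacement) pairs; the real script builds its own.
substitutions = [
    (r'overview\.htm', 'temp0003.html'),
    (r'HW_overview\.htm', 'temp0081.html'),
]

def rewrite_links(page, substitutions, anchored):
    """Apply each substitution in order, optionally anchored on the opening quote."""
    prefix = '"' if anchored else ''
    for match_string, replace_string in substitutions:
        page = re.sub('(?i)' + prefix + match_string,
                      prefix + replace_string, page)
    return page

# Without the quote, the shorter pattern matches inside "HW_overview.htm".
unanchored = rewrite_links(page, substitutions, anchored=False)

# With the quote, only a file name that starts right after href=" can match.
anchored = rewrite_links(page, substitutions, anchored=True)

print(unanchored)
print(anchored)
```

The quote only anchors the start of the file name, so this assumes every href value is written with double quotes and without a leading path.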

Reto

Nov 14, 2011, 5:43:36 PM
to chm2pdf
I also modified the following substitution, because my file has links like
<a href="#X1">X1</a>, and otherwise nothing of the link would be left:

# Replace links of the form "somefile.html#894" with "somefile0206.html"
# The following will match anchors like '<a href="temp0206.html#894"'
# and will store the 'temp0206.html' in backreference 1.
# The replace string will then replace it with '<a href="temp0206.html"',
# i.e. it will take away the '#894' part.
# This is because the numbers after the '#' are often wrong or
# non-existent. It is better to link to an existing
# chapter than to a non-existent part of an existing chapter.
page = re.sub('(?i)<a href="([^#|"]+)#[^"]*"', '<a href="\\1"', page)

In my opinion it is better to leave such a link intact, since it
points inside the same file!
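A small sketch of the behaviour described above. The first pattern is the one from this post. The second one, with * instead of +, is only an assumed earlier form used for comparison and may not match what the script originally contained:

```python
import re

# Pattern from the post: requires at least one character before the '#'.
strip_fragment = re.compile(r'(?i)<a href="([^#|"]+)#[^"]*"')

# Assumed earlier form (hypothetical): also matches an empty file part.
strip_fragment_loose = re.compile(r'(?i)<a href="([^#|"]*)#[^"]*"')

replacement = r'<a href="\1"'

cross_file = '<a href="temp0206.html#894">Chapter</a>'
same_file = '<a href="#X1">X1</a>'

# Cross-file link: the '#894' part is dropped, the file name is kept.
cross_result = strip_fragment.sub(replacement, cross_file)

# Same-file anchor: no file part before '#', so the link is left alone.
same_result = strip_fragment.sub(replacement, same_file)

# With the looser pattern the same-file anchor loses its whole target.
loose_result = strip_fragment_loose.sub(replacement, same_file)

print(cross_result)
print(same_result)
print(loose_result)
```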

Reto

Nov 14, 2011, 6:38:49 PM
to chm2pdf
As activity in this group is low, I cross-posted my solutions to
https://launchpad.net/ubuntu/+source/chm2pdf/, which seems more active!