My goal is to take some HTML and change instances of
http://mydomain.com/ to file:///home/me/foo/
Yes, I know about wget, but I can't seem to get it to
work very well with Zope (too much duplication) and so
I want to take the zopemir.py script (to fetch the
content) and extend it to fix up the HTML.
Now: In general, does this seem like something
suited to HTMLParser.HTMLParser? Or maybe to
htmllib.HTMLParser? Or would I be better off just
using ''.replace() ?
I experimented a bit with the 2 parsers, and I can get
either one to modify the required a and img tags, but
once I do, I am not quite sure how to reconstruct the
full lines. ie, if I have:
Here is some text with a <a href="location">link</a>.
I can get the parser to return a tag with the corrected
location, but how do I get it to return the whole corrected
line?
This recipe from the Cookbook seems to point the way,
but I wonder if maybe this is more than I need:
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/135005
I am pretty sure I could just go forward with .replace(:o)
Any hints appreciated.
I have written a web-spider program in python and in
it required a class which does a job like what you wants.
Basically the requirement is to map internet urls to directory
files given a base directory.
Myself and a friend have come up with an implementation
of it. It is free but we have not yet documented or informed
this group about it. You can see the code in my webpage
http://members.fortunecity.com/anandpillai. The code is
available as a link somewhere in the page. The module name is
WebHttpUrlPath.py.
Tell me if you find it useful :-)
Best Regards
Anand B Pillai
Lee Harr <mis...@frontiernet.net> wrote in message news:<slrnbc1md3...@localhost.my.domain>...