But I found that the Windows API "URLDownloadToFile" only downloads the
web page itself, not the image files. Since most web pages use image
URLs like "../img/img1.jpg", the images no longer resolve once the page
is saved locally.
Is there an easy way to download a whole web page including the images?
Or some other way to address the problem above?
Thanks!
Alex
I imagine minimally parsing the HTML would work -- really not too hard to
write, as bogus images won't be found. It would pay off to exclude images
beyond the current root (as in "adverts").
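
A minimal sketch of that idea, in Python for brevity rather than the
Win32 code the question is about. It collects every <img> "src", resolves
it against the page URL, and keeps only images on the same host. The names
ImageCollector and local_images are hypothetical, and reading "beyond the
current root" as "on a different host" is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ImageCollector(HTMLParser):
    """Collects the src attribute of every <img> tag it sees."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names for us.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def local_images(html, page_url):
    """Return absolute image URLs hosted on the same host as page_url.

    Assumption: "same host" stands in for "within the current root".
    """
    collector = ImageCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    result = []
    for src in collector.sources:
        absolute = urljoin(page_url, src)  # resolves "../img/img1.jpg"
        if urlparse(absolute).netloc == host:
            result.append(absolute)
    return result
```

Each URL returned would then be fetched separately and saved next to the
local copy of the page.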
[Jongware]
Look at the "WALKALL" sample program in MSDN.
Also, try groups "microsoft.public.inetsdk.programming.mshtml_hosting"
and "microsoft.public.inetsdk.programming.webbrowser_ctl" for questions
on this subject.
Norm
--
To reply, change domain to an adult feline.
I tried using a regular expression to parse the file downloaded by
URLDownloadToFile: find every "src=..." and replace relative paths like
"../img/img1.jpg" with the full path on the web site. It works. I guess
MSHTML would be more powerful but also more complex.
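
A rough sketch of that approach, in Python for brevity; it is not the
original code. The pattern and the name absolutize_src are assumptions,
and only quoted src="..." attributes are handled.

```python
import re
from urllib.parse import urljoin

# Matches src="..." or src='...' (quoted values only), case-insensitively.
SRC_PATTERN = re.compile(r'(\bsrc\s*=\s*)(["\'])(.*?)\2', re.IGNORECASE)


def absolutize_src(html, page_url):
    """Rewrite every quoted src attribute to an absolute URL."""

    def fix(match):
        prefix, quote, url = match.groups()
        # urljoin leaves already-absolute URLs unchanged.
        return prefix + quote + urljoin(page_url, url) + quote

    return SRC_PATTERN.sub(fix, html)
```

With the links rewritten this way, the saved page still fetches its
images from the web site rather than from local copies.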
Anyway, thanks very much!
Alex
> Alex wrote:
>
>> Hi, all
>> I try to make a application which need cache the web content in
>> local.
>>
>> But I found the windows API "UrlDownloadToFile" only download
>> the web page self, but not include the image files. Since most
>> of web-pages use the image url like "../img/img1.jpg", when this
>> url mapped into local, it do doesn't work.
>>
>> Is there a easy way to download whole webpage including the images?
>> Or some ways to address the problem above.
>
> Look at the "WALKALL" sample program in MSDN.
>
> Also, try groups "microsoft.public.inetsdk.programming.mshtml_hosting"
> and "microsoft.public.inetsdk.programming.webbrowser_ctl" for
> questions on this subject.
If you have access to Linux source code, look up "wget", which makes
a local copy of a web page. It should take relatively little tweaking
to port the code and make it do what you want.
--
/~\  cgi...@kltpzyxm.invalid (Charlie Gibbs)
\ /  I'm really at ac.dekanfrus if you read it the right way.
 X   Top-posted messages will probably be ignored. See RFC1855.
/ \  HTML will DEFINITELY be ignored. Join the ASCII ribbon campaign!