How to download a webpage including its images (like IE's "Save As")

Alex

Jun 29, 2006, 5:06:54 AM
Hi, all
I am trying to write an application that needs to cache web content locally.

But I found that the Windows API "UrlDownloadToFile" only downloads the web
page itself, not the image files it references. Since most web pages use
relative image URLs like "../img/img1.jpg", those references no longer
resolve once the page is saved to a local file.

Is there an easy way to download a whole web page including its images? Or
some other way to address the problem above?

Thanks!
Alex

[Jongware]

Jun 29, 2006, 6:55:23 AM
"Alex" <zaoyan...@gmail.com> wrote in message
news:1151572014.8...@d56g2000cwd.googlegroups.com...

I imagine minimally parsing the HTML would work; it's really not too hard to
write, since any bogus image references it picks up simply won't be found. It
would pay off to exclude images outside the current root (such as adverts).

[Jongware]
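Jongware's idea of minimally parsing the HTML can be sketched in Python with the standard library's `html.parser` and `urllib.parse.urljoin`. This is a minimal sketch, not code from the thread: the names `ImageCollector` and `find_image_urls` are hypothetical, only `<img src>` is inspected (CSS backgrounds and scripts are ignored), and "the current root" is assumed to mean the page's own host.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class ImageCollector(HTMLParser):
    """Collects the src attribute of every <img> tag it sees."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <IMG SRC=...> matches too.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def find_image_urls(html, page_url):
    """Return absolute, de-duplicated URLs of images on the same host as page_url."""
    collector = ImageCollector()
    collector.feed(html)
    collector.close()

    page_host = urlsplit(page_url).netloc.lower()
    urls = []
    for src in collector.sources:
        absolute = urljoin(page_url, src)  # "../img/img1.jpg" -> full URL
        # Assumption: "beyond the current root" means hosted on another server (e.g. ads).
        if urlsplit(absolute).netloc.lower() != page_host:
            continue
        if absolute not in urls:
            urls.append(absolute)
    return urls


if __name__ == "__main__":
    page = "http://www.example.com/docs/page.html"
    html = ('<p><img src="../img/img1.jpg">'
            '<IMG SRC="logo.png">'
            '<img src="http://ads.example.net/banner.gif"></p>')
    for url in find_image_urls(html, page):
        print(url)
    # http://www.example.com/img/img1.jpg
    # http://www.example.com/docs/logo.png
```

Each returned URL could then be downloaded individually, for example with UrlDownloadToFile itself.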


Norman Bullen

Jun 29, 2006, 9:04:31 PM
Alex wrote:

Look at the "WALKALL" sample program in MSDN.

Also, try groups "microsoft.public.inetsdk.programming.mshtml_hosting"
and "microsoft.public.inetsdk.programming.webbrowser_ctl" for questions
on this subject.

Norm

--
To reply, change domain to an adult feline.

Alex

Jun 30, 2006, 11:40:03 AM
I used a regular expression to parse the file downloaded by
UrlDownloadToFile: find every "src=..." attribute and replace relative
paths like "../img/img1.jpg" with the full URL on the web site. It works.
I guess MSHTML would be more powerful, but also more complex.

Anyway, thanks very much!

Alex
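Alex's regular-expression approach might look roughly like the following Python sketch. The `SRC_ATTR` pattern and the `absolutize_src` name are illustrative assumptions, not Alex's actual code. Note that the rewritten page still loads its images from the web site, so it only works while online, and a regex will also match `src=` inside comments or scripts, which is one reason a real HTML parser such as MSHTML is more robust.

```python
import re
from urllib.parse import urljoin

# src= followed by a double-quoted, single-quoted or unquoted value.
# A rough heuristic: it also fires inside comments and <script> blocks.
SRC_ATTR = re.compile(r"""(\bsrc\s*=\s*)(?:"([^"]*)"|'([^']*)'|([^\s>]+))""",
                      re.IGNORECASE)


def absolutize_src(html, page_url):
    """Rewrite every src attribute value to an absolute URL."""

    def replace(match):
        prefix, double_q, single_q, bare = match.groups()
        # urljoin resolves relative paths and leaves absolute URLs unchanged.
        if double_q is not None:
            return '%s"%s"' % (prefix, urljoin(page_url, double_q))
        if single_q is not None:
            return "%s'%s'" % (prefix, urljoin(page_url, single_q))
        # Unquoted values are written back with double quotes.
        return '%s"%s"' % (prefix, urljoin(page_url, bare))

    return SRC_ATTR.sub(replace, html)


page = "http://www.example.com/docs/page.html"
print(absolutize_src('<img src="../img/img1.jpg">', page))
# <img src="http://www.example.com/img/img1.jpg">
```

The original quoting style is preserved where there was one, so a single-quoted value containing a double quote is not broken by the rewrite.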

Charlie Gibbs

Jul 2, 2006, 2:33:40 PM
In article <zM_og.177$PE1...@newsread2.news.pas.earthlink.net>,
no...@BlackKittenAssociates.com.INVALID (Norman Bullen) writes:

> Alex wrote:
>
>> Hi, all
>> I am trying to write an application that needs to cache web
>> content locally.
>>
>> But I found that the Windows API "UrlDownloadToFile" only downloads
>> the web page itself, not the image files it references. Since most
>> web pages use relative image URLs like "../img/img1.jpg", those
>> references no longer resolve once the page is saved to a local file.
>>
>> Is there an easy way to download a whole web page including its
>> images? Or some other way to address the problem above?
>
> Look at the "WALKALL" sample program in MSDN.
>
> Also, try groups "microsoft.public.inetsdk.programming.mshtml_hosting"
> and "microsoft.public.inetsdk.programming.webbrowser_ctl" for
> questions on this subject.

If you have access to Linux source code, look up "wget", which makes
a local copy of a web page. It should take relatively little tweaking
to port the code and make it do what you want.

--
/~\ cgi...@kltpzyxm.invalid (Charlie Gibbs)
\ / I'm really at ac.dekanfrus if you read it the right way.
X Top-posted messages will probably be ignored. See RFC1855.
/ \ HTML will DEFINITELY be ignored. Join the ASCII ribbon campaign!
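Fully offline caching needs one step more than rewriting to absolute URLs: fetch each image and point the page at the local copy, the same idea as wget making a local copy of a page. The Python sketch below illustrates that idea under stated assumptions; it is not derived from wget's source. The function `save_page_with_images`, the `fetch` parameter, and the `page.html` / `images/imgN.ext` output layout are hypothetical choices, only quoted `<img src>` values are handled, and download errors are simply allowed to propagate.

```python
import os
import re
from urllib.parse import urljoin, urlsplit
from urllib.request import urlopen

# Heuristic: an <img ...> tag whose src value is in single or double quotes.
IMG_SRC = re.compile(r"""(<img\b[^>]*?\bsrc\s*=\s*)(["'])(.*?)\2""",
                     re.IGNORECASE | re.DOTALL)


def default_fetch(url):
    """Download url over the network and return the raw bytes."""
    with urlopen(url) as response:
        return response.read()


def save_page_with_images(page_url, html, out_dir, fetch=default_fetch):
    """Save every quoted <img src> locally and rewrite the HTML to use it.

    Writes out_dir/page.html and out_dir/images/imgN.ext; returns the new HTML.
    """
    image_dir = os.path.join(out_dir, "images")
    os.makedirs(image_dir, exist_ok=True)
    saved = {}  # absolute image URL -> path relative to page.html

    def replace(match):
        prefix, quote, src = match.groups()
        absolute = urljoin(page_url, src)  # resolve "../img/img1.jpg" etc.
        if absolute not in saved:
            # Number the files so same-named images from different folders don't clash.
            ext = os.path.splitext(urlsplit(absolute).path)[1]
            name = "img%d%s" % (len(saved) + 1, ext)
            with open(os.path.join(image_dir, name), "wb") as f:
                f.write(fetch(absolute))
            saved[absolute] = "images/" + name
        return prefix + quote + saved[absolute] + quote

    rewritten = IMG_SRC.sub(replace, html)
    with open(os.path.join(out_dir, "page.html"), "w", encoding="utf-8") as f:
        f.write(rewritten)
    return rewritten
```

Passing a fake `fetch` makes the function testable without network access; in real use the default downloads each image with `urllib.request.urlopen`. Each distinct image is downloaded only once, even if the page references it several times.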