How to get a site from Internet Archive

Kotitihaere

Mar 21, 2011, 9:44:23 PM

to VisualWget Help

Hi,

I am trying to get a backup of my crashed site from the Internet
Archive
(http://replay.waybackmachine.org/20100727170119/http://www.maori.org.nz/papa_panui/).

I use the above URL but it only downloads the first page and is not
following any of the links.

I tried to run .wget -wr ia as a command, but I get an "invalid
--execute command" error.

I have everything set at default - I have not changed any of the
advanced settings - will do that when I can get it to download more
than just the first page!

What do I have to do to get it to follow the links and download from
there?

Any help will be most appreciated - thank you for your time :D

Khomsan Ph.

Mar 22, 2011, 3:41:36 AM

to VisualWget Help

Hi Kotitihaere,

You can use the recursive and level options in the advanced options.
Just go to the advanced options,
tick recursive, and maybe also tick level and set the number of levels.

There are other options, such as convert-links;
you can see all available options in the wget manual:
http://www.gnu.org/software/wget/manual/wget.html#Recursive-Retrieval-Options
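
For anyone using plain command-line wget rather than the VisualWget GUI, the options above might look roughly like the sketch below. The URL and the depth of 2 are placeholder assumptions, not values from this thread. The script only prints the command so it can be reviewed before running.

```shell
#!/bin/sh
# Hypothetical placeholder values; replace with your own.
URL="http://www.example.com/"
DEPTH=2

# --recursive follows links found in downloaded pages,
# --level caps how many links deep the crawl goes,
# --convert-links rewrites links in saved pages for local browsing.
CMD="wget --recursive --level=$DEPTH --convert-links $URL"

# Print the command for review; paste it into a terminal to run it.
echo "$CMD"
```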

Best Regards,
Khomsan

Khomsan Ph.

Mar 22, 2011, 4:37:48 AM

to VisualWget Help

You may also need to untick the resuming options.
The resuming options are "continue" and "timestamping".

Kotitihaere

Mar 22, 2011, 9:19:00 PM

to VisualWget Help

thank you both for your suggestions

In advanced I have nothing set in
Basic Startup
Logging and Input file
download
https
ftp

In http I have directory-prefix set for where to download the files

In Recursive Retrieval I have recursive ticked and level number set to
10

In Recursive Accept/Reject I have a reject list and no parent

it is still only downloading the first page and the robots.txt

If I untick recursive in Recursive Retrieval, I only get the
index.html

I am thinking I need to add in the -wr ia command on the command line
then put in the URL without the waybackmachine part - can someone tell
me how to use the --execute=command syntax

Any help is much appreciated :D

Khomsan Ph.

Mar 23, 2011, 6:50:22 AM

to VisualWget Help

Hello Kotitihaere,

I have tried downloading from the URL that you provided.
It seems that robots.txt is blocking wget from crawling,
because wget respects robots.txt. The good news is that this is configurable:
you can turn off wget's robots support by using the execute command.

1. In the basic startup options, there is an execute command option.
2. Tick execute command and enter the following: robots=off
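
On the wget command line, the same setting can be sketched as below. The execute option takes a .wgetrc-style setting such as robots=off rather than another wget flag, which may be why an earlier attempt in this thread produced an "invalid --execute command" error. The URL is a placeholder assumption.

```shell
#!/bin/sh
# Hypothetical placeholder URL.
URL="http://www.example.com/"

# --execute (short form: -e) runs a .wgetrc-style command at startup.
# The value is the bare setting "robots=off", not a wget flag.
ROBOTS_OPT="--execute robots=off"
CMD="wget $ROBOTS_OPT --recursive $URL"

# Print the command for review; paste it into a terminal to run it.
echo "$CMD"
```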

Best Regards,
Khomsan

Kotitihaere

Mar 25, 2011, 6:30:08 AM

to VisualWget Help

thank you for that - I have tried it and still not had any luck *deep
sigh*

Khomsan Ph.

Mar 25, 2011, 7:35:27 AM

to VisualWget Help

Umm... it's weird, because I have already tried it and it works.
I just removed --continue and --timestamping,
and added --recursive and --execute robots=off.
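
Put together as a single command line, that combination might look like the sketch below. The URL is the one from the first post. The level of 10 and no-parent come from settings mentioned earlier in the thread. The output folder name is a hypothetical placeholder. This is an illustration, not the exact command VisualWget generates.

```shell
#!/bin/sh
# URL taken from the first post in this thread.
URL="http://replay.waybackmachine.org/20100727170119/http://www.maori.org.nz/papa_panui/"
# Hypothetical output folder; change to wherever the files should go.
OUT_DIR="./site-backup"

# No --continue or --timestamping; recursion on; robots.txt ignored.
CMD="wget --recursive --level=10 --no-parent --execute robots=off --directory-prefix=$OUT_DIR $URL"

# Print the command for review; paste it into a terminal to run it.
echo "$CMD"
```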

Kotitihaere

Mar 26, 2011, 1:19:55 AM

to VisualWget Help

Oh you rock! :D as soon as I set the recursive it worked :D

Thank you SOOO much for all your help - it is soo appreciated :D

Khomsan Ph.

Mar 26, 2011, 4:06:30 AM

to VisualWget Help

No problems.

daniel king

Feb 1, 2014, 5:18:06 PM

to visualwge...@googlegroups.com

Late reply (three years, I know), but I used http://waybackdownloads.com. They downloaded the full site with correct linking, and it only took them about 2 hours.