HTTrack

Gina Heffernan

Sep 19, 2023, 10:29:23 PM

to State-Coord@Google

I've downloaded a LOT of sites with this program but I think I'm just getting too old to deal with it.

I cannot for the life of me figure out which options to use to avoid downloading off-site pages. Right now, I have about 50 Wiki pages and 100 Texas Handbook pages for one county. When using it for RootsWeb sites, I get neighboring counties. I've experimented with using a 0 and a 1 for external pages, and that seems to affect the target site: if I use a 0, I don't get subfolders on the target site; if I use a 1, I get tons of off-site pages. Can someone maybe send me screenshots of the options they use?

Gina Heffernan,

State Coordinator, TXGenWeb
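
One possible way to keep a crawl on the target site, offered as a sketch and not as something anyone in this thread reported using: HTTrack accepts scan rules (the Scan Rules tab under Set options in the Windows version), where a leading "-" refuses matching links and a leading "+" allows them, with later rules taking priority over earlier ones. The Python helper below only builds two such rules from a start URL. The function name is made up for illustration, and the rules have not been tested against the sites discussed here.

```python
from urllib.parse import urlsplit


def scan_rules_for(start_url):
    """Build two HTTrack-style scan rules that allow only the start URL's folder.

    Hypothetical helper: it produces text to paste into the Scan Rules box,
    it does not run HTTrack itself.
    """
    parts = urlsplit(start_url)
    # Scan rules are written without the scheme, e.g. host/~account/*
    base = parts.netloc + parts.path
    if not base.endswith("/"):
        base += "/"
    return [
        "-*",              # first refuse every link...
        "+" + base + "*",  # ...then allow only pages under the start folder
    ]


if __name__ == "__main__":
    for rule in scan_rules_for("https://sites.rootsweb.com/~txgray/"):
        print(rule)
```

A rule set this strict would also refuse images or stylesheets hosted elsewhere, so it may need loosening if pages come out incomplete.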

Martha A C Graham

Sep 19, 2023, 11:15:29 PM

to state...@googlegroups.com

Hi Gina,
I used Gray, TX as a sample for HTTrack, with 5 images:
Step 1: Project name and path. The path is the place where you want to store the downloaded files; I use an external hard drive (Drive G in the image). Click 'Next' and change nothing else.
Step 2: Enter the URL in the slot and click 'Next'. Change nothing else.
Step 3: Click 'Finish'. Do not change anything.
Step 4: The software does the download.
When it is finished, go to the location where you saved the files.
Step 5: You will see a whole list of files. The only one that is important is the one that says 'Sites.rootsweb.com'; these are the main files, and everything else can be deleted.

Martha

Httrack-Step-1.png

Httrack-Step-2.png

Httrack-Step-3.png

Httrack-Step-4.png

Httrack-Step-5.png
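
Step 5 above can also be checked with a few lines of Python before deleting anything. This is only a sketch: the function name and the default folder name are assumptions for illustration, and it lists entries without removing them.

```python
from pathlib import Path


def split_mirror_entries(project_dir, site_folder="sites.rootsweb.com"):
    """Return (path of the site folder or None, sorted names of everything else)."""
    root = Path(project_dir)
    site_path = None
    others = []
    for entry in root.iterdir():
        # Compare case-insensitively, since the name may be shown capitalised.
        if entry.is_dir() and entry.name.lower() == site_folder.lower():
            site_path = entry
        else:
            others.append(entry.name)
    return site_path, sorted(others)
```

Calling it on the project folder chosen in Step 1 returns the folder to keep and the names of the leftover entries, which can then be reviewed by hand.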

Gina Heffernan

Sep 20, 2023, 3:29:31 PM

to state...@googlegroups.com

Thanks! I was changing the Options in image #2. I'll leave that alone next time and see what happens. I had almost 2000 Wikipedia files by the time I gave up yesterday - and I wasn't downloading from Wiki. I don't mind having to delete files but waiting for that many to download is a pain.

I'll work on the obit pages in steps and send them before I put them in my template.

--
You received this message because you are subscribed to the Google Groups "State Coordinators" group.
To unsubscribe from this group and stop receiving emails from it, send an email to state-coord...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/state-coord/12834d17-04bb-4940-882d-7cece6338a3c%40gmail.com.

Timothy Stowell

Sep 20, 2023, 5:22:19 PM

to state...@googlegroups.com

Depending on how long a site has been on Rootsweb, I would also do a pull for two other URLs for the same county:

http://www.rootsweb.com/~txgray/

http://www.rootsweb.ancestry.com/~txgray/

If they return anything, the total number of files will likely vary with how long the site existed at that URL.
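
As a small illustration of the three URL forms mentioned above, the following Python sketch builds them from an account name such as txgray. The function name is hypothetical; the three patterns are taken from the URLs quoted in this thread.

```python
def rootsweb_url_variants(account):
    """Return the three URL forms quoted in this thread for one account name."""
    return [
        f"https://sites.rootsweb.com/~{account}/",
        f"http://www.rootsweb.com/~{account}/",
        f"http://www.rootsweb.ancestry.com/~{account}/",
    ]


if __name__ == "__main__":
    for url in rootsweb_url_variants("txgray"):
        print(url)
```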

Two cases in point - a county in New York - using only HTTrack:

A-sites.rootsweb - 872 files, 68 folders - 41.22 MB
B-rootsweb.ancestry - 4974 files, 162 folders - 352.89 MB
C-rootsweb.com - 5216 files, 300 folders - 153 MB
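
Totals like the ones above can be reproduced for any downloaded folder with a short Python sketch. How the figures in this message were actually obtained is not stated, so this is just one way to get comparable numbers; the function name is made up for illustration.

```python
import os


def summarize_tree(root):
    """Count files and folders under root and total the file sizes in MB."""
    n_files = 0
    n_dirs = 0
    total_bytes = 0
    for dirpath, dirnames, filenames in os.walk(root):
        n_dirs += len(dirnames)  # subfolders found at this level
        for name in filenames:
            n_files += 1
            total_bytes += os.path.getsize(os.path.join(dirpath, name))
    return n_files, n_dirs, total_bytes / (1024 * 1024)
```

The root folder itself is not counted as a folder, and sizes use 1 MB = 1,048,576 bytes, so results may differ slightly from what a file manager reports.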

Another county - using Wayback and HTTrack:

Wayback
https://sites.rootsweb.com/~nyqueens/ 16 files
http://www.rootsweb.ancestry.com/~nyqueens/ 24 files
http://www.rootsweb.com/~nyqueens/ 26 files

https://sites.rootsweb.com/~nyqueen2/ 75 files
http://www.rootsweb.ancestry.com/~nyqueen2/ 1,926 Files, 46 Folders
http://www.rootsweb.com/~nyqueen2/ 530 files

HTTRACK (current sites)

https://sites.rootsweb.com/~nyqueens/ 23 files
https://sites.rootsweb.com/~nyqueen2/ 3,066 Files, 54 Folders

=====================================================

So in the case of Queens, whose sites were created by different people over the years for NYGenWeb, you have this mass of files, and the first thought is: what kind of mess is this to deal with? Speaking from experience...

However, there is an interesting piece of software one might consider: http://meldmerge.org/

Alas, for me, I will either end up downloading and uploading en masse, or take what I have access to folder by folder and figure it out as I go.
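
For a quick first look before opening a graphical tool such as Meld, two downloaded folders can be compared with Python's standard filecmp module. This is a stand-in sketch and not how Meld works; the function name is hypothetical, and it looks only at the top level of each folder.

```python
import filecmp


def compare_mirrors(left, right):
    """Compare the top level of two folders and group the entry names."""
    cmp = filecmp.dircmp(left, right)
    return {
        "only_left": sorted(cmp.left_only),    # present only in the first folder
        "only_right": sorted(cmp.right_only),  # present only in the second folder
        "differ": sorted(cmp.diff_files),      # same name, different content
        "same": sorted(cmp.same_files),        # same name, same content
    }
```

Subfolders that exist on both sides are not descended into here; dircmp exposes them through its subdirs attribute if a deeper comparison is wanted.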

I plan, as Joy said some time ago, to scrape Rootsweb for as much as I can for each county I maintain.

Best of luck to you all.

Tim Stowell

Nancy_Janyszeski

Sep 22, 2023, 2:44:48 PM

to state...@googlegroups.com

I have tried on several occasions to download httrack without success.

Is there something I am missing?

https://www.httrack.com/page/2/en/index.html

~ Nancy

Nancy Janyszeski

http://www.PAGenweb.org

~ Bucks County PAGenWeb
~ Montgomery County PAGenWeb
~ Northampton County PAGenWeb

Bucks County History

Martha A C Graham

Sep 22, 2023, 2:54:35 PM

to state...@googlegroups.com

Hi Nancy