Paul,
Awesome! I just installed Gedcom 1.17, modified my script, and kicked
off another run. It reads 1345 html pages, each with 30 individuals and
their related family groups, and builds a gedcom file from all of that
with over 40000 people in it.
It takes 1.5GB of memory and about 1.5 hours to process it all and
generate the gedcom file. I think the only thing that I have left to do
with this is to add the source definition and then link the source to
all the individuals, family groups and notes.
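As a rough sketch of that remaining step, here is what adding one source
record and citing it from every individual, family group and note could
look like at the raw gedcom line level. It is plain Python, not my Perl
script and not Gedcom.pm's API; the function name add_source, the xref
@S1@ and the title text are made up for illustration. It assumes GEDCOM
5.5 style records, where a level-0 SOUR record is cited with a level-1
"SOUR @xref@" line.

```python
def add_source(lines, xref="@S1@", title="Example source title"):
    """Return new GEDCOM lines with one source record added and a
    level-1 citation appended to every INDI, FAM and NOTE record."""
    out = []
    cite_pending = False  # True while inside a record that needs a citation

    for line in lines:
        tokens = line.split()
        if tokens and tokens[0] == "0":
            # A new level-0 record begins, so finish the previous one.
            if cite_pending:
                out.append(f"1 SOUR {xref}")
                cite_pending = False
            # The tag follows the xref when there is one: "0 @I1@ INDI".
            has_xref = len(tokens) > 2 and tokens[1].startswith("@")
            tag = tokens[2] if has_xref else tokens[1]
            if tag == "TRLR":
                # Put the source record just before the trailer.
                # (A file without a TRLR line gets no source record.)
                out.append(f"0 {xref} SOUR")
                out.append(f"1 TITL {title}")
            cite_pending = tag in ("INDI", "FAM", "NOTE")
        out.append(line)
    return out


sample = [
    "0 HEAD",
    "0 @I1@ INDI",
    "1 NAME John /Smith/",
    "0 @F1@ FAM",
    "1 HUSB @I1@",
    "0 TRLR",
]
result = add_source(sample)
```

Appending the citation at the end of each record keeps the existing
lines in their original order, which makes the output easy to diff
against the input.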
The script is designed to parse a "Second Site" generated web site [1],
but I'm sure Second Site has additional bells and whistles that the site
I'm grabbing does not use, and support for those could still be added.
I think it would be great to create another program that can read a
gedcom file and generate a "Second Site"-like set of html pages. I have
a genealogy website that I built for my data [2][3], but it is all
dynamically generated pages from a database. I use Gedcom.pm to load my
database. But generating linked static pages that can be put on a DVD
for distribution has a lot of value. So I might tackle this in the near
future.
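To make the idea concrete, here is a minimal sketch of turning gedcom
records into linked static pages. It is plain Python with a tiny
hand-rolled parser, not Gedcom.pm, and it does not try to reproduce
Second Site's actual layout. All of the names (parse_gedcom, page_name,
display_name, build_pages) are hypothetical, and only level-0 and
level-1 lines are read. The links are relative file names, so the pages
would work straight off a disc.

```python
import html


def parse_gedcom(lines):
    """Group level-1 fields under their level-0 record, keyed by xref."""
    records = {}
    current = None
    for line in lines:
        tokens = line.split(" ", 2)
        if tokens[0] == "0":
            current = None
            if len(tokens) == 3 and tokens[1].startswith("@"):
                current = {"type": tokens[2].split(" ")[0], "fields": []}
                records[tokens[1]] = current
        elif tokens[0] == "1" and current is not None and len(tokens) > 1:
            value = tokens[2] if len(tokens) > 2 else ""
            current["fields"].append((tokens[1], value))
    return records


def page_name(xref):
    """Map an xref such as @I1@ to a relative file name such as I1.html."""
    return xref.strip("@") + ".html"


def display_name(record):
    """Return the first NAME value with the surname slashes removed."""
    for tag, value in record["fields"]:
        if tag == "NAME":
            return value.replace("/", "").strip()
    return "Unknown"


def build_pages(records):
    """Return {file name: html text}, one page per individual, linking
    to the other members of each family where the person is a spouse."""
    pages = {}
    for xref, record in records.items():
        if record["type"] != "INDI":
            continue
        items = []
        for tag, fam_id in record["fields"]:
            if tag != "FAMS" or fam_id not in records:
                continue
            for role, other in records[fam_id]["fields"]:
                if role in ("HUSB", "WIFE", "CHIL") and other != xref \
                        and other in records:
                    href = page_name(other)
                    label = html.escape(display_name(records[other]))
                    items.append(
                        "<li>" + role + ': <a href="' + href + '">'
                        + label + "</a></li>")
        name = html.escape(display_name(record))
        pages[page_name(xref)] = (
            "<html><head><title>" + name + "</title></head><body><h1>"
            + name + "</h1><ul>" + "".join(items) + "</ul></body></html>")
    return pages


sample = [
    "0 HEAD",
    "0 @I1@ INDI",
    "1 NAME John /Smith/",
    "1 FAMS @F1@",
    "0 @I2@ INDI",
    "1 NAME Mary /Jones/",
    "1 FAMS @F1@",
    "0 @F1@ FAM",
    "1 HUSB @I1@",
    "1 WIFE @I2@",
    "0 TRLR",
]
pages = build_pages(parse_gedcom(sample))
```

Writing each entry of pages to a file of the same name in one directory
would give a browsable set of pages. A real version would also need
parent links (FAMC), dates, places, notes and an index.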
Thanks,
-Steve
[1] http://www.nantuckethistoricalassociation.net/bgr/BGR-o/index.htm
[2] http://swoodbridge.com/family/Woodbridge/
[3] http://swoodbridge.com/family/WoodbridgeRecord/