Tom,
Like you, I don’t use a database, but just use in-memory data structures to hold my data.
I wrote the first version of my program over 30 years ago (wow, it’s been that long!) in Fortran and stored everything in arrays. I converted that to Turbo Pascal and (again like you) wrote my own B*-tree to hold my data. Both the Fortran and Pascal compilers optimize well and produce very fast code. I used Fortran in the 1970s for the chess program I developed, and in doing so I learned proper optimization techniques.
Turbo Pascal turned into Borland Delphi (an object-oriented version of Pascal with a tremendous IDE that had a great debugger). Embarcadero bought Delphi from Borland and they continue to develop and market it (now up to Version 12.2).
Behold has always been just a GEDCOM reader that produces a report from the data. It has always been fast because of the internal in-RAM data structures, but I could never make it quite as fast for very large files (millions of people) as GENViewer by MudCreek Software. Their secret trick for speed was using memory mapping for the GEDCOM file and they had direct access to it without even physically reading the whole thing. They made a pass and indexed what they needed. By the way, GENViewer is still available for free at http://www.mudcreeksoftware.com/
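To illustrate the memory-mapping idea, here is a minimal Python sketch. It is not GENViewer's actual code (which I have never seen); the function names index_gedcom and read_record are made up for the example, and it assumes a GEDCOM file where each record starts with a level-0 line such as "0 @I1@ INDI". One pass records the byte offset of each record, and later lookups slice the mapped file directly without loading all of it.

```python
import mmap
import re

# Matches the first line of a level-0 record that carries a cross-reference
# id, e.g. "0 @I1@ INDI". Works on bytes so it can scan the mapping directly.
RECORD_START = re.compile(rb"^0 (@[^@]+@) (\w+)", re.MULTILINE)


def index_gedcom(path):
    """One pass over the mapped file: xref id -> byte offset of its record."""
    offsets = {}
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for match in RECORD_START.finditer(mm):
                offsets[match.group(1).decode("ascii")] = match.start()
    return offsets


def read_record(path, offsets, xref):
    """Return the text of one record by slicing the mapping at its offset."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            start = offsets[xref]
            # The record ends where the next level-0 line begins.
            end = mm.find(b"\n0 ", start)
            if end == -1:
                end = len(mm)
            # Assumes UTF-8; real GEDCOM files may use other encodings.
            return mm[start:end].decode("utf-8")
```

A real reader would keep the mapping open between lookups rather than reopening the file each time; the sketch reopens it only to stay short.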
About 10 years ago, I was working to turn Behold into a full-fledged genealogy editor. I knew then I’d need a database to do that. I considered SQLite, which can run databases in RAM, but overall it was too clunky and wasn’t fast enough for me. So I ultimately decided to go with Embarcadero’s InterBase. It took several years to convert my internal data structures into database-compatible tables, and the result was the description I gave earlier. I got rid of my B*-trees (sad to see them go) and replaced them internally for now with Delphi’s generic structures, such as TList and TDictionary, many of which are hash-table based and so were even faster than my B*-tree.
But then what happened was a change to genealogy itself. About 8 years ago, I chose MyHeritage to be where I’d keep and update my primary genealogy file. Their databases of billions of records, automatic record matching, and matches to other trees were a game changer. I knew I could never implement that myself in Behold. So even though I was ready to, I never did convert Behold to use InterBase.
But I’m still continuing to develop Behold as a GEDCOM reader. I can download my GEDCOM any time I want from MyHeritage and use Behold to view and make use of the information better.
With regard to your questions:
I would use a database, not a B-tree or other data structures, for a genealogy data editing program. You really can’t get much faster or simpler than an indexed database that is in RAM.
I would never iterate. A hash table will save you time. And that is just what a database index is.
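As a small Python sketch of the difference (the record layout and the function names find_by_scan and build_index are made up for illustration), compare scanning a list for an id with building a hash-based index once and looking ids up directly:

```python
# Hypothetical in-memory records keyed by a GEDCOM-style id.
people = [
    {"id": "@I1@", "name": "John Smith"},
    {"id": "@I2@", "name": "Mary Jones"},
    {"id": "@I3@", "name": "Ann Smith"},
]


def find_by_scan(records, xref):
    """Iterate over every record until the id matches: O(n) per lookup."""
    for person in records:
        if person["id"] == xref:
            return person
    return None


def build_index(records):
    """Build a hash table (a Python dict) once: id -> record."""
    return {person["id"]: person for person in records}


index = build_index(people)
# Each lookup is now a single hash probe instead of a scan of the list.
found = index.get("@I3@")
```

The index costs one pass to build and has to be kept up to date when records are added or deleted, which is the bookkeeping a database does for you.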
Re: “making coding simpler” – Well, simply doing database calls is the simplest. If writing the code in your language of choice is difficult or gives complex routines, you could look for a higher-level language with object-oriented extensions to write it in. But then, you and I are old horses, and new tricks don’t go well with us.
There’s my 2 cents worth.
Louis
I have developed my own structure for dates and written about it here: https://www.beholdgenealogy.com/blog/?p=1640 which also explains how to sort dates.
With regard to indexing dates, I never had a need for it. But it would be tricky to index because you can't use exact lookups or hash tables. My initial thought would be to store the dates sorted in a binary tree or B-tree, look for the first value >= the minimum date in your range, and then look for the last value <= the maximum date in your range. The dates from that first value to the last value will have at least some values falling in your desired range.
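A rough Python sketch of that range lookup follows. It swaps in a sorted list with binary search (the standard bisect module) for the binary tree or B-tree, and it assumes simple exact dates stored as (year, month, day) tuples, not my own date structure from the blog post. The name dates_in_range is made up for the example.

```python
import bisect


def dates_in_range(sorted_dates, lo, hi):
    """Return the dates d with lo <= d <= hi from an already sorted list."""
    # Index of the first value >= lo.
    first = bisect.bisect_left(sorted_dates, lo)
    # Index one past the last value <= hi.
    last = bisect.bisect_right(sorted_dates, hi)
    return sorted_dates[first:last]


# Tuples compare element by element, so sorting gives chronological order.
dates = sorted([
    (1850, 3, 14),
    (1799, 12, 1),
    (1901, 7, 4),
    (1875, 1, 30),
    (1850, 11, 2),
])

hits = dates_in_range(dates, (1850, 1, 1), (1880, 12, 31))
```

With exact dates every value in the slice is in range. With approximate dates or date ranges, the slice would only be a candidate set that still needs checking one by one.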
Louis