Thomas Wetmore
Sep 16, 2024, 9:29:21 PM
to root...@googlegroups.com
Thanks to Luther and Tom for responding.
I think the idea of embedding semantics in Gedcom keys (cross references) is probably a non-starter. As Luther points out, Gedcom7 removes restrictions on the length of keys. (I thought the Gedcom7 standard had forgotten to put the restriction in!) So with Gedcom7 you could go crazy with the put-semantics-in-the-keys idea. You could make each key describe the relationship between its person and someone from a small set of "central" persons. It is a small set because not every person in a database has to be related to every other person. A database holds partitions, where every person in a partition is related to every other person in that partition, but not related to anyone in any other partition. Pick one person from each partition as its "central" person, and then make the key of every person in the partition the relationship between that person and the central person. Crazy but doable. The central person could be selected using the idea I suggested in the last email, by picking the person with the most ancestors and descendants in the partition. I'm not suggesting this is a good idea, mind you, just something that is possible.
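For anyone who wants to experiment, here is a minimal sketch of the partition and "central person" idea in Python. It is an illustration under stated assumptions, not how any existing program does it: the `parents` dictionary, the function names, and the plain `I1`-style keys are all hypothetical, and only parent/child links are followed, so a childless spouse would land in a separate partition unless spouse links were added too.

```python
from collections import deque

def build_links(parents):
    """Derive a children map and a symmetric link map from child -> parents."""
    children = {}
    for child, ps in parents.items():
        for p in ps:
            children.setdefault(p, set()).add(child)
    everyone = sorted(set(parents) | set(children))
    links = {k: set(parents.get(k, ())) | children.get(k, set())
             for k in everyone}
    return children, links

def partitions(links):
    """Group person keys into partitions (connected components) by BFS."""
    seen, result = set(), []
    for start in links:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            key = queue.popleft()
            component.append(key)
            for other in links.get(key, ()):
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        result.append(sorted(component))
    return result

def reachable(start, step):
    """Count distinct persons reached by repeatedly following `step`."""
    seen, stack = set(), list(step.get(start, ()))
    while stack:
        key = stack.pop()
        if key not in seen:
            seen.add(key)
            stack.extend(step.get(key, ()))
    return len(seen)

def central_person(partition, parents, children):
    """Pick the person with the most ancestors plus descendants.

    Ties go to the first key in the (sorted) partition.
    """
    return max(partition,
               key=lambda k: reachable(k, parents) + reachable(k, children))

# Hypothetical data: child key -> set of parent keys.
parents = {
    "I1": set(), "I2": set(), "I3": {"I1", "I2"},
    "I4": {"I3"}, "I7": {"I3"},
    "I5": set(), "I6": {"I5"},
}
children, links = build_links(parents)
parts = partitions(links)
central = central_person(parts[0], parents, children)
```

With this data there are two partitions, and the first one picks `I3`, who has two ancestors and two descendants.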
But what is behind my wondering about this? It boils down to sorting and iterating. How do you want the persons in your database to be sorted, listed, and iterated? Of course, many kinds of lists of persons are needed in many contexts.
Obviously you will want to sort by name, and that's probably the most important. One might think (or at least old timers like me who started programming in the 60s might think) that sorting a large database of persons by name requires lots of processing power. It kind of does if the records aren't somehow pre-sorted by name, but with modern processors you would need a massive database before the sorting time became noticeable. My database of 15,000 persons sorts by name in a couple of milliseconds. Sorting by name requires comparing names, and for arbitrary names following Gedcom rules this comparison operation is non-trivial. But still it's all done in a couple of milliseconds.
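To show what comparing Gedcom names involves, here is a small Python sketch. It assumes the usual Gedcom convention that the surname sits between slashes, and it sorts by surname first, then by the remaining name parts, ignoring case. The function name `name_sort_key` is made up for this example, and real name comparison has more cases than this handles (missing surnames, prefixes, multiple NAME lines), so treat it as a simplification.

```python
def name_sort_key(gedcom_name):
    """Build a sort key from a Gedcom NAME value such as 'Thomas /Wetmore/'.

    Assumes at most one /surname/ section; everything outside the slashes
    is treated as given names or suffixes.
    """
    before, slash, rest = gedcom_name.partition("/")
    if slash:
        surname, _, after = rest.partition("/")
    else:
        surname, after = "", ""
    # Collapse whitespace in the non-surname parts.
    given = " ".join((before + " " + after).split())
    return (surname.strip().casefold(), given.casefold())

names = [
    "Thomas Trask /Wetmore/",
    "Anna /Van Der Merwe/ Jr.",
    "/Wetmore/ Luther",
    "Mary /abbott/",
]
by_name = sorted(names, key=name_sort_key)
```

Computing the key once per record and letting the sort compare the keys keeps the expensive parsing out of the comparison loop.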
Are there other ways to sort a list of records? By key is one obvious way. As another example, say your program needs to iterate through every person in your database, and order does not matter. There is no need to have persons in name or key order. What order would you use? Most (all?) databases have a "natural" order. If you have SQL as the backing store you would likely iterate through the "person table" in its "index" order. In my current in-RAM database, records are kept in hash tables that map keys to Gedcom "trees". The hash table is ordinary, so it has buckets and the buckets have entries, each entry being a map from a key to a Gedcom root "node". Natural order in this case means iterating through the buckets and then their entries. Because this order is determined by a hash function, it is effectively pseudo-random.
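The "natural order" of such a hash table can be sketched in a few lines of Python. This is only an illustration of the bucket-then-entry iteration described above: the class name `RecordIndex`, the bucket count, and the toy string hash are assumptions for the example, and a string stands in for the Gedcom root node.

```python
class RecordIndex:
    """Toy bucketed hash table mapping record keys to root nodes."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # Simple deterministic string hash (an assumption for this sketch).
        h = 0
        for ch in key:
            h = (h * 31 + ord(ch)) & 0xFFFFFFFF
        return self.buckets[h % len(self.buckets)]

    def put(self, key, root):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, root)  # replace an existing entry
                return
        bucket.append((key, root))

    def get(self, key):
        for k, root in self._bucket(key):
            if k == key:
                return root
        return None

    def __iter__(self):
        # Natural order: walk the buckets, then the entries in each bucket.
        for bucket in self.buckets:
            for key, root in bucket:
                yield key, root

index = RecordIndex()
for n in range(1, 6):
    index.put(f"@I{n}@", f"root node of person {n}")

natural_order = [key for key, _ in index]
```

Every record is visited exactly once, but the order depends on where the hash function happens to place each key, not on insertion, name, or key order.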
My LifeLines program has a programming subsystem. One of its built-in datatypes is a Person Sequence (INDISEQ in the reference manual). Programs can build up these sequences in many ways using built-in operations. For example, you can get the sequence of all spouses of the persons in a sequence. You can get the sequence of all siblings, or all children, or all parents, or all ancestors, or all descendants of the persons in a sequence. You can take the union, intersection, and difference of sequences. These sequences can be sorted by name, by key, or by user-assigned properties. Allowing a user-defined property shows I've been worrying about sorting persons for a long time.
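The flavor of these sequence operations can be sketched in Python. This is not the LifeLines implementation or its report-language API: the class `PersonSequence` and its method names are hypothetical, each element is just a key paired with a user-assigned value, and key order is taken to mean the numeric part of the key.

```python
class PersonSequence:
    """Ordered collection of (key, value) pairs with unique keys."""

    def __init__(self, entries=()):
        self.entries = {}
        for key, value in entries:
            self.entries.setdefault(key, value)  # first value for a key wins

    def keys(self):
        return list(self.entries)

    def union(self, other):
        return PersonSequence(list(self.entries.items())
                              + list(other.entries.items()))

    def intersect(self, other):
        return PersonSequence((k, v) for k, v in self.entries.items()
                              if k in other.entries)

    def difference(self, other):
        return PersonSequence((k, v) for k, v in self.entries.items()
                              if k not in other.entries)

    def key_sort(self):
        # Sort by the digits in the key, e.g. "@I12@" -> 12.
        def number(item):
            return int("".join(ch for ch in item[0] if ch.isdigit()))
        return PersonSequence(sorted(self.entries.items(), key=number))

    def value_sort(self):
        # Sort by the user-assigned property attached to each person.
        return PersonSequence(sorted(self.entries.items(),
                                     key=lambda item: item[1]))

a = PersonSequence([("@I3@", 30), ("@I1@", 10)])
b = PersonSequence([("@I1@", 99), ("@I2@", 20)])

both = a.union(b)            # keys in a or b
shared = a.intersect(b)      # keys in both
only_a = a.difference(b)     # keys in a but not b
```

Sorting by name would plug in a name-based key function in the same way `value_sort` uses the user-assigned value.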
Tom Wetmore