Thomas Wetmore
unread,Oct 2, 2024, 12:56:28 PM10/2/24Sign in to reply to author
Sign in to forward
You do not have permission to delete messages in this group
Either email addresses are anonymous for this group or you need the view member email addresses permission to view the original message
to root...@googlegroups.com
Group,
I've been experimenting with different ways to index and key persons in my DeadEnds software. I've started to use ChatGPT as a sounding board to get software development advice. Every times I use it I am more convinced of the revolutionary changes that AI is bringing to software developers.
I was thinking about the problem of finding the "connectedness" of each person in a database, which I define as the total number of ancestors and descendants each person has in the database. The person with the highest connectedness is, in a sense, the key, or base, or most important, or some such superlative, person in the database.
The connectededness of each person can easily be done using a simple recursive function. But if done that way there would be incredible duplication of effort. I was outlining some algorithms to do it efficiently, when I decided to ask ChatGPT for advice. You might be interested in the ensuing interaction.
First, here is the question I asked:
-------------------------------------
I have genealogical software that builds an in-Ram database of persons and families by reading a Gedcom file and transforming each INDI record and FAM record in the file in to person and family objec in the database. I want to partition the database into sets of persons and families that are all interconnected through parent/child relationships where each set is distinct from the others. Within each set I'd like to find the person who is the most connected, that is, has the most ancestors and descendents. I would actually like to find the "connectedness" of every person, that is, the combined total of ancestors and descendents each has. An algorithm could simply look and each person and compute the number independently for that person. But many calculations would be redundant, as the same ancestors and descendents would be computed for persons many times as they are found in different relationships with the person currently being worked on. Is there a way of doing the work to reduce this redundancy.
And here is the ChatGPT response:
-----------------------------------
To reduce redundancy when calculating the “connectedness” (total ancestors and descendants) of each person in your genealogical software, you can use a memoization approach or build a directed acyclic graph (DAG) representation of family relationships and employ dynamic programming techniques. Here’s a strategy that might help:
1. Graph Representation of Family Relations
• Represent the database as a graph where each person is a node, and parent/child relationships are edges.
• For each person, you can have two lists: one for ancestors and one for descendants. These relationships can be bidirectional in terms of traversal.
2. Memoization
• As you traverse the graph to compute the connectedness for each person, you store the results for ancestors and descendants for every node (person). When you encounter a person whose connectedness has already been calculated, you simply retrieve the value rather than recalculating.
• This can significantly reduce redundant calculations, as you avoid recalculating ancestor/descendant counts for persons already visited.
3. Recursive Traversal with Caching
• You can recursively compute the number of ancestors and descendants for each person. During this process, maintain a cache (in a dictionary or hashmap) that stores these counts for every person.
• For example:
• For a given person P, start by counting the ancestors by recursively checking their parents.
• While doing this, store the results for each ancestor in the cache so that if P shares a common ancestor with another person, you don’t recompute the count.
• Similarly, do this for descendants by checking each person’s children recursively.
4. Partitioning into Sets
• You can partition the database into distinct sets by running a depth-first search (DFS) or breadth-first search (BFS) on the graph. Each connected component (i.e., a group of persons and families connected through parent/child relationships) forms a distinct set.
• After partitioning, you can calculate the connectedness for each person within a set. This will also help reduce the search space.
5. Dynamic Programming for Connectedness
• Use dynamic programming to propagate the connectedness information upwards or downwards through the graph, avoiding redundant computations.
• For example, once you know how many descendants a person has, propagate that information to their ancestors so that when calculating the connectedness of those ancestors, the descendant count can be reused.
Basic Algorithm Outline:
1. Graph Construction:
• Build the parent/child relationships graph for persons and families.
2. Partitioning:
• Use DFS or BFS to identify distinct sets of interconnected persons.
3. Memoized Calculation:
• For each person in the set, use a recursive algorithm to calculate their ancestors and descendants, using memoization to store previously calculated values.
4. Connectedness Calculation:
• For each person, the connectedness is simply the sum of the number of ancestors and descendants.
This approach ensures that the number of ancestor/descendant computations is minimized, avoiding redundant recalculations, especially when traversing the same branches of the genealogical tree multiple times. Would you like help implementing any part of this in code?
--------------------------
Back to me. First I didnt' know whether ChatGPT would even understand my question. But this answer shows that not only was the problem fully "understood" (don't now what word should really be used), but the suggestions are really quite wonderful.
Best,
Tom Wetmore