Genealogical Connectedness and ChatGPT

5 views
Skip to first unread message

Thomas Wetmore

unread,
Oct 2, 2024, 12:56:28 PM10/2/24
to root...@googlegroups.com
Group,

I've been experimenting with different ways to index and key persons in my DeadEnds software. I've started to use ChatGPT as a sounding board to get software development advice. Every times I use it I am more convinced of the revolutionary changes that AI is bringing to software developers.

I was thinking about the problem of finding the "connectedness" of each person in a database, which I define as the total number of ancestors and descendants each person has in the database. The person with the highest connectedness is, in a sense, the key, or base, or most important, or some such superlative, person in the database.

The connectededness of each person can easily be done using a simple recursive function. But if done that way there would be incredible duplication of effort. I was outlining some algorithms to do it efficiently, when I decided to ask ChatGPT for advice. You might be interested in the ensuing interaction.

First, here is the question I asked:
-------------------------------------

I have genealogical software that builds an in-Ram database of persons and families by reading a Gedcom file and transforming each INDI record and FAM record in the file in to person and family objec in the database. I want to partition the database into sets of persons and families that are all interconnected through parent/child relationships where each set is distinct from the others. Within each set I'd like to find the person who is the most connected, that is, has the most ancestors and descendents. I would actually like to find the "connectedness" of every person, that is, the combined total of ancestors and descendents each has. An algorithm could simply look and each person and compute the number independently for that person. But many calculations would be redundant, as the same ancestors and descendents would be computed for persons many times as they are found in different relationships with the person currently being worked on. Is there a way of doing the work to reduce this redundancy.

And here is the ChatGPT response:
-----------------------------------
To reduce redundancy when calculating the “connectedness” (total ancestors and descendants) of each person in your genealogical software, you can use a memoization approach or build a directed acyclic graph (DAG) representation of family relationships and employ dynamic programming techniques. Here’s a strategy that might help:

1. Graph Representation of Family Relations

• Represent the database as a graph where each person is a node, and parent/child relationships are edges.
• For each person, you can have two lists: one for ancestors and one for descendants. These relationships can be bidirectional in terms of traversal.

2. Memoization

• As you traverse the graph to compute the connectedness for each person, you store the results for ancestors and descendants for every node (person). When you encounter a person whose connectedness has already been calculated, you simply retrieve the value rather than recalculating.
• This can significantly reduce redundant calculations, as you avoid recalculating ancestor/descendant counts for persons already visited.

3. Recursive Traversal with Caching

• You can recursively compute the number of ancestors and descendants for each person. During this process, maintain a cache (in a dictionary or hashmap) that stores these counts for every person.
• For example:
• For a given person P, start by counting the ancestors by recursively checking their parents.
• While doing this, store the results for each ancestor in the cache so that if P shares a common ancestor with another person, you don’t recompute the count.
• Similarly, do this for descendants by checking each person’s children recursively.

4. Partitioning into Sets

• You can partition the database into distinct sets by running a depth-first search (DFS) or breadth-first search (BFS) on the graph. Each connected component (i.e., a group of persons and families connected through parent/child relationships) forms a distinct set.
• After partitioning, you can calculate the connectedness for each person within a set. This will also help reduce the search space.

5. Dynamic Programming for Connectedness

• Use dynamic programming to propagate the connectedness information upwards or downwards through the graph, avoiding redundant computations.
• For example, once you know how many descendants a person has, propagate that information to their ancestors so that when calculating the connectedness of those ancestors, the descendant count can be reused.

Basic Algorithm Outline:

1. Graph Construction:
• Build the parent/child relationships graph for persons and families.
2. Partitioning:
• Use DFS or BFS to identify distinct sets of interconnected persons.
3. Memoized Calculation:
• For each person in the set, use a recursive algorithm to calculate their ancestors and descendants, using memoization to store previously calculated values.
4. Connectedness Calculation:
• For each person, the connectedness is simply the sum of the number of ancestors and descendants.

This approach ensures that the number of ancestor/descendant computations is minimized, avoiding redundant recalculations, especially when traversing the same branches of the genealogical tree multiple times. Would you like help implementing any part of this in code?
--------------------------

Back to me. First I didnt' know whether ChatGPT would even understand my question. But this answer shows that not only was the problem fully "understood" (don't now what word should really be used), but the suggestions are really quite wonderful.

Best,

Tom Wetmore



Andrew Hatchett

unread,
Oct 2, 2024, 2:21:52 PM10/2/24
to root...@googlegroups.com
Well Done Thomas!
As an experiment, I pasted your question in Perplexity.AI... and it replied as follows...

Answer
To efficiently partition the genealogical database and calculate connectedness scores, you can use a combination of graph algorithms and memoization techniques. Here's an approach to solve this problem:
  1. Build a graph representation:
    Convert the person and family objects into a graph where each person is a node and parent-child relationships are edges.
  2. Find connected components:
    Use a depth-first search (DFS) or breadth-first search (BFS) algorithm to identify all connected components in the graph. Each component represents a distinct set of interconnected persons.
  3. Calculate connectedness scores:
    For each connected component:
a) Choose an arbitrary starting node.
b) Perform a DFS traversal, keeping track of the depth for each node.
c) Use memoization to store the number of descendants for each node.
d) Perform a second DFS traversal to calculate the number of ancestors and total connectedness score.Here's a Python-like pseudocode implementation:
python
def find_connected_components(graph): components = [] visited = set() for node in graph: if node not in visited: component = [] dfs(node, visited, component) components.append(component) return components def dfs(node, visited, component): visited.add(node) component.append(node) for child in node.children: if child not in visited: dfs(child, visited, component) def calculate_connectedness(component): start_node = component[0] depth_map = {} descendant_count = {} def dfs_depth(node, depth): depth_map[node] = depth count = 1 for child in node.children: if child in component: count += dfs_depth(child, depth + 1) descendant_count[node] = count return count dfs_depth(start_node, 0) def dfs_connectedness(node): ancestors = depth_map[node] descendants = descendant_count[node] - 1 connectedness = ancestors + descendants node.connectedness = connectedness for child in node.children: if child in component: dfs_connectedness(child) dfs_connectedness(start_node) return max(component, key=lambda x: x.connectedness) # Main algorithm graph = build_graph_from_gedcom() components = find_connected_components(graph) for component in components: most_connected = calculate_connectedness(component) print(f"Most connected person in component: {most_connected.name}") print(f"Connectedness score: {most_connected.connectedness}")
This approach has several advantages:
  1. It partitions the database into distinct sets of interconnected persons.
  2. It calculates the connectedness score for each person efficiently by using memoization to avoid redundant calculations.
  3. It finds the most connected person in each component.
The time complexity of this algorithm is O(N + E), where N is the number of persons and E is the number of parent-child relationships. This is more efficient than calculating connectedness independently for each person, which would have a time complexity of O(N^2) in the worst case.To further optimize the algorithm, you could:
  1. Use an iterative approach instead of recursive for very large datasets to avoid stack overflow.
  2. Implement parallel processing for handling multiple components simultaneously.
  3. Use a more memory-efficient graph representation if dealing with extremely large datasets.
This solution provides an efficient way to partition your genealogical database and calculate connectedness scores while minimizing redundant calculations.


A Google Top Contributor/Product Expert Since 2015
ANDREW G. HATCHETT  #AGH3rd  |   #GOOGLEMAIL   
"Do The Needful"      #G4GN24      SERP: #agh100224 






--

---
You received this message because you are subscribed to the Google Groups "rootsdev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rootsdev+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/rootsdev/025A0FA7-E284-4FD7-9DDA-43ABA0D86167%40gmail.com.

Stephen Woodbridge

unread,
Oct 2, 2024, 4:38:03 PM10/2/24
to root...@googlegroups.com
I’ve been playing with ChatGPT and asked it to analyze the Gedcom 5.5 and 5.5.5 specs and generate NodeJS code to read and parse a Gedcom file into JavaScript objects. 

The code it generated was very VERY good and modular. I asked it to refactor the code to do some do some other things. I’m very impressed by the code it generates.  

-Steve W

Sent from my iPhone

On Oct 2, 2024, at 2:21 PM, Andrew Hatchett <agh...@gmail.com> wrote:



Thomas Wetmore

unread,
Oct 2, 2024, 8:10:21 PM10/2/24
to root...@googlegroups.com


> On Oct 2, 2024, at 2:21 PM, Andrew Hatchett <agh...@gmail.com> wrote:

> As an experiment, I pasted your question in Perplexity.AI... and it replied as follows...
>
> Answer
> To efficiently partition the genealogical database and calculate connectedness scores, you can use a combination of graph algorithms and memoization techniques. Here's an approach to solve this problem:
> • Build a graph representation:
> Convert the person and family objects into a graph where each person is a node and parent-child relationships are edges.
> • Find connected components:
> Use a depth-first search (DFS) or breadth-first search (BFS) algorithm to identify all connected components in the graph. Each component represents a distinct set of interconnected persons.
> • Calculate connectedness scores:

...

Andrew,

Great to see another answer from a different LLM, both basically the same. I have had good luck getting high quality code out of ChatGPT3/4. I try to compose my questions carefully, but the program seems to be able to read my mind and know what I'm trying to ask when I do a poor job of asking.

Tom Wetmore

Marshall Lake

unread,
Oct 3, 2024, 1:25:58 PM10/3/24
to Digest recipients

[some good stuff snipped]

Several times, during the development of my project, my old brain gets
stalled midst a problem. When I can't find help googling I pose the
problem to Bard/Gemini and it seems to always be helpful.

--
Marshall Lake -- marsha...@gmail.com -- http://www.mlake.net
Reply all
Reply to author
Forward
0 new messages