On Oct 5, 2024, at 11:13 AM, Enno Borgsteede <enno...@gmail.com> wrote:Hello Tom,
Have you ever asked ChatGPT to help optimize some of your source code?
I'm asking, because we have an algorithm in Gramps that can calculate and display paths between people, which is more than 100 times slower than a similar algorithm in Geneweb. I have a tree with more than 600,000 people, which includes a Karel de Grote GEDCOM that I once found on the web, and Geneweb can compute and display a path between any two people in that tree in 1 or 2 seconds. I see that on Geneanet, and also when I run Geneweb here.
Algorithms like these are quite trivial, in the sense that you start with one person, and then follow direct connections until you reach the other one, and store some sort of part string while doing that. And if may phone can do that in a reasonable time, close to real time, I see no reason why Gramps shouldn't be able to do the same, And if an algorithm is designed so that it never processes the same person twice, its processing time can be quite predictable, and increase more or less linear with the size of the tree, or even better.
The performance of our algorithm seems to be nothing like that, and the duration of the calculation looks more like it's exponential.
The source code is more than 3000 lines, so I doubt whether ChatGPT is able to process that, but I may be able to find the main loop, and feed that to the AI.
Do you or anyone else no other AIs that may be able to do this? I often use Visual Studio Code, but never tried how that works with AI.
Regards,
Enno
I would give it a try. So far ChatGPT has handled everything I've given it. It once told me I could send it a zip file of my full code base and it would comment on it for me. By the way, the cousins script is just 267 lines of code. It finds the closest relationship between and two persons (there can be many as you know) and prints out that relationship in a way that shows all the steps.
I'll have another look at our code, but finding cousins is easy, and our Deep Connections Gramplet goes further than that, just like Geneanet does for these random persons:

I hope that this picture will make it, but if it does not, I can show a similar path from Gramps, as text.
Enno
Op 05-10-2024 om 17:50 schreef Thomas Wetmore:
I would give it a try. So far ChatGPT has handled everything I've given it. It once told me I could send it a zip file of my full code base and it would comment on it for me. By the way, the cousins script is just 267 lines of code. It finds the closest relationship between and two persons (there can be many as you know) and prints out that relationship in a way that shows all the steps.I'll have another look at our code, but finding cousins is easy, and our Deep Connections Gramplet goes further than that, just like Geneanet does for these random persons:
<1s60IWyMsuyc2w6H.png>
Hello Tom,
Thank you. I knew that the code could be made simpler, and ChatGPT also suggested that it may process the same persons multiple times, and some extra logging proves that this is indeed the case, although I haven't figured out the best solution yet, to make it fit in Gramps.
ChatGPT gave me a handful of comments, including this:
and1. Optimize the Queue Operations
- Current Issue: The
self.queueis being used as a list, and you are usingpop(0), which removes the first element. In Python, popping from the front of a list is an O(n) operation because it requires shifting all elements.- Solution: Use a deque (from the
collectionsmodule), which provides an O(1) time complexity for both append and pop operations at both ends.
and another one:2. Reduce Duplicate Calculations and Cache More Data
- Current Issue: You're likely recalculating relationships between the same people multiple times.
- Solution: Use a more comprehensive caching mechanism. You already have
self.cache, but consider expanding it to store intermediate results like relationships between people, so that any repeated queries can be avoided entirely.
And these are all quite to the point.3. Optimize Relationship Search Algorithm
If
self.get_relativesorrelationship_calc.get_one_relationshipperforms complex operations, see if those methods themselves can be optimized. Specifically:
- If they query the database, check for indexing or ways to limit the number of queries.
- If these methods have repetitive calculations, caching or memoization might be helpful.
Until now, my experience with ChatGPT was not very good, because it didn't seem to have specific knowledge of GEDCOM, nor about about specific buildings in my neighborhood, or other things that I see as facts.
Some time ago, I asked it to generate code to remove lines with specific tags from a GEDCOM file, and that time, it produced code that removed all lines that had that tag somewhere in that line, and didn't seem to know that the tag is the second token in a line, and is preceded by a number.
Things may be better now.
Regards,
Enno
Hello Tom,
The LifeLines language is customized to handle genealogical data and write genealogical reports. I'm sure the LL code is shorter than Gramps's because of its large number of genealogical operations. If I add up the lines of C code behind the LL interpreter and the "built-ins" that implement the operations the story is different. The LL language API does, IMHO, hit a sweet spot for implementing genealogical algorithms.
As much as a I rave about ChatGPT it does not bat 100%.
Hehe, I know. I commented, because what I see here confirms my earlier experience which suggests that it's quite superficial. It is able to find simple ways for improvement, like replacing a list with a queue, or caching some data, but it does not see the main problem, which is in the missing breadcrumbs. And that part is quite easy to see too, once you know where to look. And that seems to be a natural consequence of the fact that it's a language model, not true AI. It may look impressive, and sometimes it is, but people were also impressed by Eliza.
There are many more articles about this, like this one:
https://link.springer.com/article/10.1007/s10676-024-09775-5
And at the same time, I found that it can produce very nice code, when I just give it the actual challenge, to which it replied that we can use a Breadth-First Search (BFS) and gave me a piece of code that's as compact as yours:
from collections import dequeIt's in Dutch, but I bet that you get the idea. I left out the test class and test code, but it gave me that too, and it has a queue for the people that need to be processed, and a set for the breadcrumbs, so it's about the most efficient that I can get.
def kortste_pad(start, doel):
# Als de start en doel hetzelfde zijn
if start == doel:
return [start.naam]
# BFS setup
queue = deque([(start, [start.naam])])
bezocht = set([start])
while queue:
huidige_persoon, pad = queue.popleft()
# Zoek door de kinderen, ouders en echtgenoten
for verwant in huidige_persoon.kinderen + huidige_persoon.ouders + huidige_persoon.echtgenoten:
if verwant not in bezocht:
if verwant == doel:
return pad + [verwant.naam]
queue.append((verwant, pad + [verwant.naam]))
bezocht.add(verwant)
return None # Geen pad gevonden
The only problem for me is that Gramps policies don't allow AI generated code, so I need to keep this secret somehow.
I hereby admit that I can't do this much better myself.
Cheers,
Enno
The only problem for me is that Gramps policies don't allow AI generated code, so I need to keep this secret somehow.
I hereby admit that I can't do this much better myself.
On Oct 7, 2024, at 2:56 PM, Enno Borgsteede <enno...@gmail.com> wrote:That's exactly what our code tries to do too, with the exception that it has a problem with the breadcrumbs, which is quite hard to find, because it's a bit of a mess. And maybe it's not that hard, but fellow developers seem to get scared by its messy looks, and me too.
Note that I did not test what I just sent, but just showed what look like good Python. It came with these comments:
Key Changes and Considerations:
- Input and Output: I replaced
getindimsgwith Python’sinput()function and simplified the print statements.- Table/Lists: In Python, dictionaries (
links,rels) are used for the mappings, and adequefor the queue (klist).- Database simulation: Functions like
get_father(),get_families(),save_key()are placeholders simulating how the data is accessed.- Control flow: The while loops and conditionals from the original code have been adapted using Python's idiomatic control structures.
This version assumes you have some data source or database that these helper functions (
get_father(), etc.) can interact with. You can replace these with actual data retrieval logic as needed.