$0.02


Marshall Lake

Apr 16, 2024, 6:54:39 PM
to root...@googlegroups.com

I read with interest both the post, Przemek, and the message, Tom.

My background is in assembly language on various hardware and C on Linux
(and in the 1980s, UNIX on minicomputers).

Recently I began developing genealogy software for which I have a large
vision. Some of the requisites: cross-platform support (Linux, Apple,
Windows); networking (collaboration, being able to sync data among public
venues like wikitree, geni, familysearch, etc. and private databases) while
keeping the personal DB private; and every item of data being supported
with sources/citations.

With cross-platform in mind (and much deliberation) I decided to write the
code in HTML and JavaScript. I'm a dinosaur, and with my background in
assembler and C, JavaScript is a beast! But, it's fun (as well as
frustrating). Plus, I'm learning JavaScript on the fly which adds to the
frustration. (I know I'm missing out on some of the conciseness of
JavaScript. To an experienced JS programmer I'm sure my code is
terrible.)

Anyway, I just thought I would chime in with my two cents.


--
Marshall Lake -- marsha...@gmail.com -- http://www.mlake.net

Thomas Wetmore

Apr 16, 2024, 9:27:34 PM
to root...@googlegroups.com
Marshall,

It is nice to see activity on this list. You have ambitious goals, and I hope you will work to see where they take you.

In my opinion the holy grail of genealogical software is a system capable of taking records from any kind of source, hundreds, thousands, whatever of them, and then inferring the actual persons mentioned in those records. Putting this into GEDCOM terms, I imagine each of the records, the raw data, as being in the form of a few interconnected GEDCOM records holding the persons, dates, places and events mentioned in each source. 

A database would start out as a collection of all these "evidence" GEDCOM records, which then get collectively and maybe iteratively "merged" together into a final set of "real person" GEDCOM records. Person records become the collection of all evidence you have discovered about them.
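The evidence-to-conclusion merge described above can be sketched in a few lines. This is a minimal illustration using plain Python dicts to stand in for GEDCOM records; the field names ("name", "events", "source") and the merge rule are assumptions made for the sketch, not a real GEDCOM structure or API.

```python
# Minimal sketch of the evidence -> conclusion merge, with dicts
# standing in for GEDCOM records.

def merge_evidence(evidence_records):
    """Fold several 'evidence' person records into one 'conclusion'
    person that keeps every source-level claim."""
    conclusion = {"names": set(), "events": [], "sources": []}
    for rec in evidence_records:
        conclusion["names"].add(rec["name"])
        conclusion["events"].extend(rec.get("events", []))
        conclusion["sources"].append(rec["source"])
    return conclusion

# Two evidence records believed to describe the same person:
baptism = {"name": "John Smith", "source": "parish register, 1842",
           "events": [("BIRT", "12 MAR 1842")]}
census = {"name": "Jno. Smith", "source": "1850 census",
          "events": [("RESI", "1850")]}

person = merge_evidence([baptism, census])
```

A real merge would also need to record *why* two evidence records were judged to be the same person, so the conclusion can be revisited or undone later.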

This model is similar to the one used by FamilySearch and Ancestry.com. They have billions of source records. They are not in GEDCOM format but in some proprietary structured format, and when you use these systems you do merge together source records as you "build" your ancestors.

I have tried to follow this method in my own database, adding records taken from sources, resulting in several "mini" records that refer to the same person, and then merging them using my program's join commands.

I have corresponded with someone not on this list who is thinking about how AI could guide the merging.

I started writing software in the late 60s, FORTRAN through the 70s, C in the 80s and 90s, then Java, Objective-C, C++, and now Swift and Go. I am an old timer (74 years old) which means I have written mostly CLI programs, as networks and servers and clients didn't exist until late in my career.

Best,

Tom Wetmore

Przemek Więch

Apr 17, 2024, 8:29:48 AM
to root...@googlegroups.com
Thanks Tom and Marshall for the responses.
 
> In my opinion the holy grail of genealogical software is a system capable of taking records from any kind of source, hundreds, thousands, whatever of them, and then inferring the actual persons mentioned in those records. Putting this into GEDCOM terms, I imagine each of the records, the raw data, as being in the form of a few interconnected GEDCOM records holding the persons, dates, places and events mentioned in each source.
>
> A database would start out as a collection of all these "evidence" GEDCOM records, which then get collectively and maybe iteratively "merged" together into a final set of "real person" GEDCOM records. Person records become the collection of all evidence you have discovered about them.

I really like this description. It aligns well with my thinking about genealogy systems.

I don't think we need another fully-featured genealogy application, as there are a number of them already and they're good enough. What I think is missing is integration between the various information sources. It's easy to take Gramps or any other app and input your data there. However, the current workflow is manually copying information from somewhere. What is missing is automation that would take data from online databases and match and merge it into your own database. It shouldn't matter what family tree software you use, as long as it is possible to programmatically interact with its data, whether it's LifeLines or GenealogyJ, which process GEDCOM files directly, or Gramps with its own Gramps XML data format.

I imagine building a pipeline of separate pieces of software that do these tasks, each task being a totally separate program:
- extract data from an online database
- convert data between different formats, e.g. GEDCOM <-> Gramps XML
- match data to data in my database (automatic or interactive) and output a file with the matches
- merge the data into my database (automatic or interactive)

Once the process works end to end, family tree programs can integrate with it.
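The four stages above can be sketched as stand-alone functions. In a real pipeline each stage would be a separate program exchanging files (GEDCOM, Gramps XML, a match list); every function name and record shape below is invented for illustration only.

```python
# Hypothetical sketch of the four pipeline stages as plain functions.

def extract(online_db):
    """Stage 1: pull raw records from an online database."""
    return list(online_db)

def convert(records):
    """Stage 2: normalize records into a common shape."""
    return [{"name": r["nm"], "birth": r["b"]} for r in records]

def match(records, mydb):
    """Stage 3: flag which new records already exist in my database."""
    known_names = {p["name"] for p in mydb}
    return [(r, r["name"] in known_names) for r in records]

def merge(matches, mydb):
    """Stage 4: fold unmatched records into my database."""
    for rec, already_known in matches:
        if not already_known:
            mydb.append(rec)
    return mydb

# End-to-end run on toy data:
online = [{"nm": "Ann Lake", "b": "1901"}, {"nm": "Tom Lake", "b": "1899"}]
mydb = [{"name": "Ann Lake", "birth": "1901"}]
result = merge(match(convert(extract(online)), mydb), mydb)
```

Keeping each stage a separate program, as proposed, means any family tree application can plug in at any point as long as it reads and writes the intermediate files.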

Another piece I'm missing is building a database of sources that I haven't yet matched to my database, where I don't know if or how they match. For example, I'd like to keep a list of all people with a certain last name who were born in a certain village. At some point it's possible that I'll connect them to my family tree. This is the part described by Tom as "a system capable of taking records from any kind of source (...) and then inferring the actual persons". In theory, this can already be done by adding disconnected individuals in family tree software, but I don't see a lot of value in doing it that way.
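That unmatched pool can be sketched as a simple queryable list of evidence records kept outside the tree proper. The record shape and all the values here are invented for illustration.

```python
# Sketch of the unmatched-sources pool: evidence records kept outside
# the family tree, queryable until they can be connected.

pool = [
    {"surname": "Kowalski", "village": "Lipnica", "born": 1876},
    {"surname": "Kowalski", "village": "Lipnica", "born": 1880},
    {"surname": "Nowak",    "village": "Lipnica", "born": 1879},
]

def candidates(pool, surname, village):
    """All unmatched records for one family name in one village."""
    return [r for r in pool
            if r["surname"] == surname and r["village"] == village]

hits = candidates(pool, "Kowalski", "Lipnica")
```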

— Przemek


Marshall Lake

Apr 17, 2024, 11:30:43 AM
to root...@googlegroups.com

Our resumes are similar in many respects, Tom.

And you have ambitious goals, as well. Building a system that can handle
records from any type of source may indeed be ... as you say ... the "holy
grail" of genealogical software.

Godspeed in your endeavor.

Enno Borgsteede

Apr 17, 2024, 11:47:32 AM
to root...@googlegroups.com

Hello Przemek,

> I imagine building a pipeline of separate pieces of software that do these tasks, each task being a totally separate program:
> - extract data from an online database
> - convert data between different formats, e.g. GEDCOM <-> Gramps XML
> - match data to data in my database (automatic or interactive) and output a file with the matches
> - merge the data into my database (automatic or interactive)
I see what you mean, but there are already programs that support part of this, like Clooz and Evidentia, and I don't use them, for a couple of reasons. The first is that they're a pain to use, and don't really support the research process as I see it, which is largely influenced by things that Tom wrote in the good old days of Better GEDCOM. The second is that they're not open source.

> Another piece I'm missing is building a database of sources that I haven't matched to my database yet and I don't know if and how they match. For example, I'd like to keep a list of all people with a certain last name who were born in a certain village. At some point it's possible that I'll connect them to my family tree. This is the part described by Tom as "a system capable of taking records from any kind of source (...) and then inferring the actual persons". In theory, this can already be done by adding disconnected individuals in family tree software but I don't see a lot of value in doing it this way.

And that's exactly what I want, because it makes our software as easy to work with as Ancestry or FamilySearch, and that's what it should be.

This should be quite easy to implement in a Gramps fork, and can be way better than the forms Gramplets, which are awful in a couple of ways, one of which is the way they store data, in source attributes.

In this model, you can have 'evidence' persons who are not connected to families, as in the normal conclusion-based GEDCOM model, but rather to sources/citations, which are objects that Gramps already has. One may also connect event and location records to the citation, and those are also already there.

At first glance this suggests that the most important additions are some extra relations between objects, like between 'evidence' persons and 'conclusion' persons. Tom has already suggested that there may be other ways to implement that, where one person can be derived from another one, or more, and this can be repeated in a sort of 'conclusion tree' (my term).
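Such a 'conclusion tree' might look roughly like this: each person record may be derived from one or more earlier records, repeatedly, and one can always walk back to the original evidence. The class is an illustrative assumption, not Gramps' actual object model.

```python
# Rough sketch of a 'conclusion tree' of derived person records.

class Person:
    def __init__(self, label, derived_from=None):
        self.label = label
        self.derived_from = derived_from or []  # records this one was built from

    def evidence_leaves(self):
        """Walk down the tree to the original evidence records."""
        if not self.derived_from:
            return [self]
        leaves = []
        for src in self.derived_from:
            leaves.extend(src.evidence_leaves())
        return leaves

e1 = Person("J. Smith (baptism record)")
e2 = Person("John Smith (census record)")
draft = Person("John Smith (first merge)", derived_from=[e1, e2])
final = Person("John Smith (conclusion)", derived_from=[draft])
```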

There is one other thing though, at least for Gramps: any 'evidence' person can have associated facts, like a birth date and place for a spouse in a marriage record. In Gramps, these can either be stored in attributes (yuck!) or in the global event and location tables, which can be shared with other objects, which is a bit over the top. Looking back at what Tom wrote then, it's probably much easier to store all such facts inside the person object, as we already do with attributes, meaning that you store dates and locations inside facts in the 'evidence' person exactly as they are written in the source, without drawing conclusions, like what the location (place name) actually means on a map.

This is quite easy to implement too, because the person object is flexible enough to be expanded with a list of facts, just like it has generic attributes already. And in fact some fellow developers have already suggested that we expand attributes with dates and locations.
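That facts list could be sketched as follows, with each fact kept exactly as written in the source, and no conclusions drawn (no date normalization, no map lookup). The class and field names are assumptions, not Gramps' real schema.

```python
# Sketch of source-level 'facts' stored inside the evidence person.

class EvidencePerson:
    def __init__(self, name):
        self.name = name
        self.facts = []  # (fact type, value as written in the source)

    def add_fact(self, kind, value_as_written):
        self.facts.append((kind, value_as_written))

# A spouse mentioned in a marriage record:
spouse = EvidencePerson("Maria Nowak")
spouse.add_fact("birth date", "abt 1851")   # uninterpreted date string
spouse.add_fact("birth place", "Lipnica")   # raw place name, no geocoding
```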

Regards,

Enno


Marshall Lake

Apr 18, 2024, 5:00:09 PM
to Digest recipients

> What I think is missing is the integration between the various
> information sources. It's easy to take Gramps or any other app and input
> your data there. However, the current workflow is manually copying
> information from somewhere. What is missing is automation that would
> take data from online databases and match and merge them into your own
> database.

> Another piece I'm missing is building a database of sources that I haven't
> matched to my database yet and I don't know if and how they match.

These are both good points, and I plan to address both with the software
I'm developing. Additionally, ....

> And the 2nd is, that they're not open source.

... the software will be open source.

As I mentioned earlier, it's a large project with a big vision, and
hopefully I can get it completed while I'm still above ground.

At some point I will add the project to GitHub, but I'm not there yet.

Stephen Woodbridge

Apr 19, 2024, 8:05:26 AM
to root...@googlegroups.com

paul...@gmail.com

Apr 20, 2024, 9:27:51 AM
to root...@googlegroups.com

Stephen Woodbridge, I had to smile at your github remark about the 3-second load time.

My first experience of databases in the 70s was with System 2000 – hierarchical!
From: root...@googlegroups.com <root...@googlegroups.com> On Behalf Of Stephen Woodbridge
Sent: Wednesday, April 17, 2024 2:53 AM
To: root...@googlegroups.com
Subject: Re: [rootsdev] $0.02
You might want to look at this:

woodbri/family-tree-nodejs: Application for loading GEDCOM files and serving them on the web as navigable family trees.
github.com

Marshall Lake

Apr 20, 2024, 3:52:38 PM
to root...@googlegroups.com

> Stephen Woodbridge <stephenwo...@gmail.com>: Apr 16 09:52PM -0400
>
> You might want to look at this:
> woodbri/family-tree-nodejs: Application for loading GEDCOM files and serving them on the web as navigable family trees.
> github.com


Thanks, Steve. That is useful.