Any thoughts on LevelGraph graph database and GEDCOM?


stephenwo...@gmail.com

Jan 13, 2019, 3:01:29 PM
to rootsdev
Hi,

Has anyone played with LevelGraph graph database and GEDCOM files?

I'm interested in trying this out for my GEDCOM data for doing things like computing relationships between two people in the GEDCOM. But I'm an old relational DB guy and I'm having trouble figuring out how to get started. I'm interested in doing this in node.js and found:

https://github.com/levelgraph/levelgraph
https://github.com/MidnightLightning/gedjson

I think I saw another package that will load JSON-LD into a LevelGraph database.

Then I'm getting lost on how to make queries, I think in part because I need to better understand how the above packages actually load the data, so that I know what data I have to work with.
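
For readers in the same position, it may help to see the triple model in miniature. The sketch below is plain JavaScript and is not LevelGraph's actual API: the `match` helper, the `name` and `childOf` predicates, and the sample ids are all invented for illustration. How gedjson or a JSON-LD loader really names its predicates would have to be checked against the loaded data.

```javascript
// A toy in-memory triple store. Every fact is {subject, predicate, object},
// the same basic shape a triple store works with.
// The identifiers and predicate names below are assumptions for this sketch.
const triples = [
  { subject: "@I1@", predicate: "name", object: "John Smith" },
  { subject: "@I2@", predicate: "name", object: "Mary Jones" },
  { subject: "@I3@", predicate: "name", object: "Ann Smith" },
  { subject: "@I3@", predicate: "childOf", object: "@I1@" },
  { subject: "@I3@", predicate: "childOf", object: "@I2@" },
];

// Return every triple whose fields equal the fields given in the pattern.
// Fields left out of the pattern act as wildcards.
function match(pattern) {
  return triples.filter((t) =>
    Object.keys(pattern).every((key) => t[key] === pattern[key])
  );
}

// "Who are the parents of @I3@?" is a pattern with the object left open.
const parentIds = match({ subject: "@I3@", predicate: "childOf" }).map(
  (t) => t.object
);

// A second lookup joins each parent id to its name.
const parentNames = parentIds.map(
  (id) => match({ subject: id, predicate: "name" })[0].object
);

console.log(parentNames); // [ 'John Smith', 'Mary Jones' ]
```

The point of the sketch is that a query is just a partially filled triple, and multi-step questions chain the results of one pattern into the next. Dumping a handful of loaded triples first is probably the quickest way to learn which predicates a loader actually produced.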

Anyway, I would be interested in what others might have tried and your input on the best direction for tackling this problem.

-Stephen Woodbridge

Thomas Wetmore

Jan 14, 2019, 9:51:29 AM
to root...@googlegroups.com
Stephen,

(Hi!) I did research on graph databases for genealogy a number of years ago. I believe they are natural for genealogical databases, and not only for the obvious reasons of relationships between persons. There are also relationships between persons and records; records and sources; records and events; dates and places and events; and so on. The one-to-one relationships between GEDCOM nodes, JSON nodes, genealogical concepts, and software objects simply cry out for unification in software.

But that's as far as I've gone. For the past number of years I've been "doing genealogy" not "doing genealogy software."

I have been thinking of rolling another program someday. The most interesting question about a database for me now is whether we need a "real database" at all. Processors are so fast now, and RAM is so massive and cheap, that a GEDCOM file with tens or hundreds of thousands of persons could be read into memory at the start of any new program run, and a full in-core database inflated in seconds at the most. That database should be, in my opinion, a custom graph database, internal only, with no external storage at all. Rewrite the GEDCOM, or whatever final file format is used, at the end of the run, and it sits there as a flat file until the next run of the program.
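
As a rough illustration of the read-it-all-into-memory approach, here is a minimal sketch in plain JavaScript. It assumes only the basic GEDCOM line shape (`LEVEL [@XREF@] TAG [VALUE]`) and the standard INDI, FAM, NAME, HUSB, WIFE, and CHIL tags. The function names `parseGedcom` and `buildGraph` and the sample data are made up for the example, and continuation lines, character sets, and nested substructures are deliberately ignored.

```javascript
// Sample GEDCOM fragment; the people in it are invented.
const sample = [
  "0 @I1@ INDI",
  "1 NAME John /Smith/",
  "1 FAMS @F1@",
  "0 @I2@ INDI",
  "1 NAME Mary /Jones/",
  "1 FAMS @F1@",
  "0 @I3@ INDI",
  "1 NAME Ann /Smith/",
  "1 FAMC @F1@",
  "0 @F1@ FAM",
  "1 HUSB @I1@",
  "1 WIFE @I2@",
  "1 CHIL @I3@",
].join("\n");

// Parse GEDCOM text into records keyed by xref id.
function parseGedcom(text) {
  const records = new Map(); // xref id -> { type, fields: [[tag, value], ...] }
  let current = null;
  for (const line of text.split(/\r?\n/)) {
    const m = /^(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s(.*))?$/.exec(line.trim());
    if (!m) continue; // skip blank or malformed lines
    const [, level, xref, tag, value = ""] = m;
    if (level === "0") {
      // A level-0 line starts a new record; only records with an xref are kept.
      current = xref ? { type: tag, fields: [] } : null;
      if (current) records.set(xref, current);
    } else if (level === "1" && current) {
      current.fields.push([tag, value]);
    }
  }
  return records;
}

// Turn family records into direct parent/child links between individuals.
function buildGraph(records) {
  const people = new Map(); // id -> { name, parents: [], children: [] }
  for (const [id, rec] of records) {
    if (rec.type !== "INDI") continue;
    const nameField = rec.fields.find(([tag]) => tag === "NAME");
    const name = nameField ? nameField[1].replace(/\//g, "").trim() : "";
    people.set(id, { name, parents: [], children: [] });
  }
  for (const rec of records.values()) {
    if (rec.type !== "FAM") continue;
    const ids = (tags) =>
      rec.fields.filter(([tag]) => tags.includes(tag)).map(([, v]) => v);
    for (const child of ids(["CHIL"])) {
      for (const parent of ids(["HUSB", "WIFE"])) {
        if (!people.has(child) || !people.has(parent)) continue;
        people.get(child).parents.push(parent);
        people.get(parent).children.push(child);
      }
    }
  }
  return people;
}

const people = buildGraph(parseGedcom(sample));
console.log(people.get("@I3@").parents); // [ '@I1@', '@I2@' ]
```

Once the `people` map exists, navigation is a matter of following the `parents` and `children` arrays, with no storage layer involved. A real reader would also need to handle deeper levels and the rest of the GEDCOM grammar.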

There's a Mac genealogy program, GEDitCOM (I don't know if it still exists), that reads the GEDCOM file at the beginning and goes from there. The first versions of that program go almost as far back as LifeLines, and I've been impressed by the program.

Rootsdev is a pretty bleak place now. I get the monthly automated message and that's about it. I have my theories about why the near absence of interest in these topics now, but that's a long tangent from here.

Maybe your questions about graph databases will stimulate some interaction.

Best,

Tom Wetmore

Dallan Quass

Jan 14, 2019, 10:24:46 AM
to root...@googlegroups.com
I completely agree with Tom about reading the entire GEDCOM into memory when you first open a tree. Most GEDCOMs are less than 100k people, so they comfortably fit in memory. Keeping the tree in memory also makes search, sort, and navigation fast. You can go a long way with a memory-based data structure. 

--

---
You received this message because you are subscribed to the Google Groups "rootsdev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rootsdev+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Stephen Woodbridge

Jan 14, 2019, 11:22:32 AM
to root...@googlegroups.com
I just spent most of last night reading through the list archives. There
are a lot of interesting discussions and it is too bad the list seems to
be somewhat moribund these days.

John Clark's "The Tree Problem" series was very interesting, but it
seems his links to code on github.com/genealogysystems are gone. I also
found his post on Trepo very interesting. And the findarecord.com domain
seems to be gone also.

Regarding GEDCOM in memory, most of my development is based on building
user-facing tools on top of other existing tools, rather than building
the raw tools and databases themselves. So I'm not opposed to reading the
GEDCOM into memory, but I don't want to write the tools to do that
myself, especially if one already exists. My family tree webserver
evolved out of one of the reports I wrote for LifeLines, which turned a
GEDCOM file into a PostScript book. It has had many incarnations over
time, with the latest being a rewrite of the PHP MySQL|PostgreSQL
instance into node.js using SQLite as the backing store. In my case,
since I'm only serving an existing GEDCOM file and not editing it, SQLite
works fine until I want to ask, "How is person X related to Y?" There
might be some monster recursive SQL query to figure that out, but I also
write code to learn about new tools, hence wanting to look at using a
graph database like LevelGraph.

Are you guys aware of a tool (or tools) that can be used to read a GEDCOM
and build a graph? Preferably in node.js :)

-Stephen Woodbridge
> <mailto:stephenwo...@gmail.com> wrote:
> >
> > Hi,
> >
> > Has anyone played with LevelGraph graph database and GEDCOM files?
> >
> > I'm interested in trying this out for my GEDCOM data for doing
> things like computing relationships between two people in the
> GEDCOM. By I'm an old relational DB guy and I'm have trouble
> figuring out how to get started. I'm interested in doing this in
> node.js and found:
> >
> > https://github.com/levelgraph/levelgraph
> > https://github.com/MidnightLightning/gedjson
> >
> > I think I saw another package the will load JSON-LD into a
> levelgraph database
> >
> > Then I'm getting lost on how to make queries. I think in part
> because I need to better understand how the above packages
> actually load the data so I know what data I have to work with.
> >
> > Anyway, I would be interested in what others might have tried
> and your input and the best direction for tackling this problem.
> >
> > -Stephen Woodbridge
>

Enno Borgsteede

Jan 14, 2019, 2:55:46 PM
to root...@googlegroups.com
On 14-1-2019 at 17:22, Stephen Woodbridge wrote:
> Are you guys aware of a tool(s) that can be used read a GEDCOM and
> build a graph? Preferably in node.js :)

There's a large list of GEDCOM readers on
https://www.tamurajones.net/OpenSourceGEDCOMParsers.xhtml

You need to try them to see if there's anything you like.

Cheers,

Enno


Christopher Mosher

Jan 15, 2019, 7:42:51 PM
to rootsdev
I have done some experimenting with the Neo4j graph database with respect to storing genealogical data using the evidence/conclusion model.

I imagined a graph database containing *citations* (to some external *sources*), related to *persona* and their *events*, with *cross-references* (the conclusions, mostly of the "this persona is the *SAME AS* that persona" type). I also imagined a separate graph database dedicated to *places*, capable of holding complete hierarchical and historical information.
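
To make the evidence/conclusion idea concrete, here is a small sketch in plain JavaScript. It is not taken from the Java/Neo4j experiment described here: the persona records, the `sameAs` pairs, and the union-find grouping are assumptions chosen to show one way "SAME AS" conclusions could be resolved into individuals without touching the underlying evidence.

```javascript
// Evidence: personas extracted from individual records, each tied to a citation.
// All ids, names, and citations here are invented for illustration.
const personas = [
  { id: "p1", name: "Jon Smith", citation: "1850 census" },
  { id: "p2", name: "John Smith", citation: "1860 census" },
  { id: "p3", name: "J. Smith", citation: "marriage register" },
  { id: "p4", name: "Mary Jones", citation: "marriage register" },
];

// Conclusions: researcher assertions that two personas are the same person.
const sameAs = [
  ["p1", "p2"],
  ["p2", "p3"],
];

// Union-find: follow parent pointers to a representative for each group.
const parent = new Map(personas.map((p) => [p.id, p.id]));
function find(id) {
  while (parent.get(id) !== id) id = parent.get(id);
  return id;
}
for (const [a, b] of sameAs) parent.set(find(a), find(b));

// Collect personas under their representative to get concluded individuals.
const individuals = new Map();
for (const p of personas) {
  const root = find(p.id);
  if (!individuals.has(root)) individuals.set(root, []);
  individuals.get(root).push(p.id);
}

console.log([...individuals.values()]); // [ [ 'p1', 'p2', 'p3' ], [ 'p4' ] ]
```

Because the personas are never merged in place, withdrawing a conclusion is just a matter of deleting one `sameAs` pair and regrouping. In a graph database the same grouping would presumably be a traversal over SAME AS edges.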

I've done some coding in Java to experiment with this data model, mostly to see if it's plausible at all, and just to help get the details straight in my head. I chose a statically typed language (Java) over a dynamically typed one (JavaScript), but the same model could be represented in any language, really.

https://github.com/cmosher01/Genealdb

Regards,
Chris Mosher

Thomas Wetmore

Jan 15, 2019, 8:28:53 PM
to root...@googlegroups.com
Chris,

Great experiment. Sounds like a very good model. I did some evaluation of Neo4j and thought it would be a good choice. MongoDB was the other one I looked into.

There is an O'Reilly book, "Graph Databases," that Neo used to offer as a free digital copy on their website. It's a great introduction to graph databases and how they fit in the overall world of databases. One of the authors is the designer of Neo4j. What I especially appreciated was the wealth of information about implementation details, suitable for building your own fully in-RAM database as I mentioned recently.

And there's enough information to gain you a basic knowledge of how making queries works in graph databases.

Tom Wetmore




Ken Finnigan

Jan 15, 2019, 9:15:33 PM
to root...@googlegroups.com
This has been a great thread.

I've been investigating developing a genealogy application for a while, in particular over the last 6 months, and as part of that I've researched graph databases, mostly Neo4j. I also agree that a graph database is a very natural fit for genealogical data and its relationships.

Regards
Ken Finnigan

Stephen Woodbridge

Jan 15, 2019, 9:37:55 PM
to root...@googlegroups.com
Chris and Thomas,

Thanks, I will look at Neo4j and look for the "Graph Databases" book. At
the moment, I'm still playing around with LevelGraph, but there are not
many examples and very little documentation. That might be because it
is fairly immature, but I can't pass that judgement yet, because it's
more likely that my knowledge of how to apply it is what is
immature :).

I've gotten as far as loading my GEDCOM data into levelgraph and being
able to make trivial queries against nodes and edges. Next is to figure
out some more complex queries like list ancestors or descendants,
shortest path from X to Y to compute how people are related.
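
The two queries mentioned here can be sketched independently of any database. The following plain JavaScript is an illustration under assumptions, not LevelGraph code: the `parentsOf` map, the sample names, and the helper functions are all invented. Ancestors are found by walking parent links upward, and the relationship path by a breadth-first search over parent/child links in both directions.

```javascript
// child -> parents map; ids and shape are assumptions for this sketch.
const parentsOf = {
  ann: ["john", "mary"],
  bob: ["john", "mary"],
  carl: ["ann"],
  dora: ["bob"],
  john: ["george"],
};

// Ancestors: walk "parent" edges upward, visiting each person once.
function ancestors(id) {
  const seen = new Set();
  const stack = [...(parentsOf[id] || [])];
  while (stack.length > 0) {
    const person = stack.pop();
    if (seen.has(person)) continue;
    seen.add(person);
    stack.push(...(parentsOf[person] || []));
  }
  return [...seen].sort();
}

// Build an undirected neighbor list so a path may go up to a common
// ancestor and back down again.
const neighbors = new Map();
function link(a, b) {
  if (!neighbors.has(a)) neighbors.set(a, []);
  neighbors.get(a).push(b);
}
for (const [child, parents] of Object.entries(parentsOf)) {
  for (const p of parents) {
    link(child, p);
    link(p, child);
  }
}

// Breadth-first search returns one shortest chain of parent/child steps.
function shortestPath(from, to) {
  const previous = new Map([[from, null]]);
  const queue = [from];
  while (queue.length > 0) {
    const person = queue.shift();
    if (person === to) {
      const path = [];
      for (let at = to; at !== null; at = previous.get(at)) path.unshift(at);
      return path;
    }
    for (const next of neighbors.get(person) || []) {
      if (!previous.has(next)) {
        previous.set(next, person);
        queue.push(next);
      }
    }
  }
  return null; // no connection found
}

console.log(ancestors("carl")); // [ 'ann', 'george', 'john', 'mary' ]
console.log(shortestPath("carl", "dora")); // [ 'carl', 'ann', 'john', 'bob', 'dora' ]
```

In the sample data the path climbs two generations to a shared grandparent and descends two, which is how first cousins show up. An equally short path runs through the other grandparent, and the search returns only one of them. One caveat of treating the links as undirected is that a path can also pass down through a shared child, linking two spouses who are not blood relatives, so a real relationship calculator would need to interpret the direction of each step.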

My goals are much more modest than trying to write a genealogy program.
I use Family Tree Maker and have used PAF in the past and probably one
or two other programs, and I find these adequate for data entry but
generally hate them for display and presentation of data. For the last
30 years I have hosted my genealogy off my own servers, so my coding
has been around presentation, analysis of data à la LifeLines reports,
and publishing it on the web. I'm also moving more towards doing
genealogy research now that I have some time for that, but I still love
doing a little hacking and experimentation.

-Steve

On 1/15/2019 8:28 PM, Thomas Wetmore wrote:
> Chris,
>
> Great experiment. Sounds like a very good model. I did some evaluation
> of Neo4j and thought it would be a good choice. MongoDB was the other
> one I looked into.
>
> There is an O'Reilly book, "Graph Databases," that Neo used to provide
> a free digital copy of at their website. It's a great introduction to
> graph databases and how they fit in the overall world of databases.
> One of the authors is the designer of Neo4j. What was especially
> appreciated by me was lots of information about implementation
> details, suitable for building your own fully in-RAM database as I
> mentioned recently.
>
> And there's enough information to gain you a basic knowledge of how
> making queries works in graph databases.
>
> Tom Wetmore
>
>
>
>> On Jan 15, 2019, at 7:42 PM, Christopher Mosher <cmos...@gmail.com
>> <mailto:cmos...@gmail.com>> wrote:
>>
>> I have done some experimenting with Neo4j graph database with
>> respect to storing genealogical data using the
>> evidence/conclusion model.
>>
>>
>> I imagined a graph database containing *citations* (to some external
>> *sources*), related to *persona* and their *events*, with
>> *cross-references* (the conclusions, mostly of the "this persona is
>> the *SAME AS* that persona" type). I also imagined a separate graph
>> database dedicated to *places*, capable of holding complete
>> hierarchical and historical information.
>>
>> I've done some coding in Java to experiment with this data model,
>> mostly to see if it's plausible at all, and just help to get the
>> details straight in my head. I chose a static-typing (Java) over a
>> dynamic-typing (javascript), but the same model could be represented
>> in any language, really.
>>
>> https://github.com/cmosher01/Genealdb
>>
>> Regards,
>> Chris Mosher
>>


