John,
Your obvious grasp of no-SQL is VERY encouraging, as is the prospect of seeing more of the results of your research! Thanks for sharing your thoughts.
My overall concerns involve getting the genealogical data model to the level of excellence required for supporting genealogical (really historical) research. I've been involved with that issue for almost 30 years. My frustration with the GenTech model (I was a curmudgeon all the way back then too), then Better GEDCOM, and now FHISO (was there even a peep from them at RootsTech?), has led me to pretty much bow out. I bet I have a longer record of failure than anyone else in this discussion! I spell doom to these efforts, and you really don't want me around. But I hate all the resetting to zero that happens with any new effort. I am happily retired, writing genealogical software in the manner I think best, using the models I think best, and am more or less content to let things unfold, or not unfold, as they will. It's a crying shame, though, isn't it, that genealogical software is so icky? Does it have to be that way, or are we all just collectively too dumb to figure out how to do it right?
Clearly SQL and relational databases are flashpoint issues for me. Sends me up the wall. So, by the way, is XML, especially when someone claims that it is inherently better than, say, GEDCOM (syntax!!) for archiving genealogical data.
Version control is a bit of a flashpoint for me, pretty obviously, also. For me the proper model of a person is a single summary record with an associated set (or multi-tiered tree) of persona (evidence person) records that provide all the explicit evidence serving to prove the existence of that person. The "primary operation of genealogical research", as it affects the structure of a properly designed genealogical database, is to create new evidence records (properly linked to source records), associate them with the right summary person (individual), and update the summary information about that individual from the new data. The glomming of the new evidence record to the individual is an explicit conclusion made by the researcher. It may be so obvious as to not require anything other than the glomming operation, or it may require an actual conclusion statement. The properties that exist at the individual level must be an intelligent summarization of all the facts from the evidence records. Each of those "intelligent summarizations" is obviously a conclusion, in the classical inductive and deductive senses, based on the evidence.
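To make that shape concrete, here is a minimal sketch in Python. Every name in it (Source, Persona, Individual, glom) is invented purely for illustration; it is only meant to show the summary-record-plus-evidence-records structure and the glom-then-resummarize step, not to prescribe an implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Source:
    """A source record, e.g. the 1860 U.S. census as a whole."""
    id: str
    citation: str


@dataclass
class Persona:
    """An evidence person: one person as attested in one source."""
    id: str
    source: Source
    facts: dict = field(default_factory=dict)  # e.g. {"name": "Joe Shmoe", "age": "34"}


@dataclass
class Individual:
    """A conclusion (summary) person built up from personas."""
    id: str
    personas: list = field(default_factory=list)
    summary: dict = field(default_factory=dict)      # intelligent summarization of the evidence
    conclusions: list = field(default_factory=list)  # optional explicit conclusion statements


def glom(individual: Individual, persona: Persona, note: Optional[str] = None) -> None:
    """Attach an evidence person to an individual; the attachment itself is a conclusion."""
    individual.personas.append(persona)
    if note:
        individual.conclusions.append(note)
    # Re-summarize naively: let new evidence fill gaps in the summary.
    # A real tool would let the researcher weigh conflicting facts.
    for key, value in persona.facts.items():
        individual.summary.setdefault(key, value)
```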
If version control is to be used, I imagine it would primarily be used in the context of adding evidence persons to individuals, removing evidence persons from individuals, and partitioning the evidence persons from n individuals (n may be 1) into m individuals (m may be 1). These are certainly operations that it might be useful to track using version control.
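If version control were confined to those operations, it might amount to little more than an append-only log of them. Again a rough sketch with invented names, not a proposal:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Operation:
    """One entry in an append-only history of evidence rearrangements."""
    kind: str                   # "glom", "unglom", or "partition"
    persona_ids: list           # evidence records involved
    from_individuals: list
    to_individuals: list
    when: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    note: str = ""


history = []


def record(op: Operation) -> None:
    """Append-only: operations are never edited, only added."""
    history.append(op)


# Example: splitting the personas of one individual into two individuals.
record(Operation(kind="partition",
                 persona_ids=["p1", "p2", "p3"],
                 from_individuals=["I42"],
                 to_individuals=["I42", "I97"],
                 note="p3 turned out to be a different John Smith"))
```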
How important do you (all of you, not just John) think it is to be able to reconstruct the history of every glom, unglom, rearrange, split, lump operation performed? Do you really think it is useful to keep around the history of all your conclusions and conclusion un-windings? Frankly, and maybe this applies only to me, I would never bother to do it. In almost every single case imaginable, I will glom the right evidence person (persona, please) into the right conclusion person, modify the conclusion person's summary information to reflect the new evidence, and leave things alone until more evidence shows up to add. Sure, once in a while I will decide that I have glommed together evidence records from more than one real person into a single individual, and I will break the individual person into two individuals, but I can't imagine any productive reason that I would want to keep a formal trace of that partitioning operation. In the extremely rare event that I might decide I have to go back on one of these glomming, unglomming, partitioning operations, I will simply do it. Call me a Luddite.
I don't think it is all that hard to find the theoretically right way for this model of individual and evidence records in a personal database to work alongside a "master" online database that is based on exactly the same constructs. Of course, if the two databases don't share the same evidence records and source records, all bets are off. Right there is the first, almost insurmountable, problem of bringing the worlds together. If there isn't a lingua franca for evidence and source records that all share, a seamless system is impossible. If you truly want this world, that is the first problem to solve. Do you have a spare thirty years lying around? To do this there must be an agreed-upon standard for exchanging source and evidence records, agreed upon by the two big gorillas, all the yammering monkeys, and all the little lemurs. There must be an agreed-upon standard so that the 1860 census record for Joe Shmoe is the same, uniquely identifiable evidence record in every place it shows up, and, of course, the source record for the 1860 census as a whole must be identical wherever it shows up. Does anyone honestly think we can make this happen? Does anyone honestly think we don't have to solve this problem to get a generic solution to the online/master, offline/personal database problem?
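One conceivable way to get such shared, uniquely identifiable records (purely an illustration, not an existing standard) is to derive the identifier deterministically from the source and the normalized citation details, so that any two systems agreeing on those details compute the same id:

```python
import hashlib


def evidence_id(source_id: str, **citation_fields: str) -> str:
    """Derive a stable id from a source id plus normalized citation details.

    Two systems that agree on the source id and the citation fields will
    compute the same id for the same evidence record.
    """
    normalized = "|".join(
        f"{key}={value.strip().lower()}"
        for key, value in sorted(citation_fields.items())
    )
    digest = hashlib.sha256(f"{source_id}|{normalized}".encode("utf-8")).hexdigest()
    return digest[:16]


# The 1860 census entry for Joe Shmoe gets the same id wherever it is computed.
print(evidence_id("us-census-1860",
                  state="New Hampshire", county="Rockingham",
                  page="112", line="17", name="Joe Shmoe"))
```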
Tom
On Feb 13, 2014, at 12:59 PM, John Clark <socra...@gmail.com> wrote:
> Tom,
>
> I am actually advocating for using a No-SQL backend - a Property Graph. I also considered more traditional No-SQL backends such as couchdb or mongodb, but there are many advantages to be gained by using a Property Graph. Allow me to share some of my thoughts.
> * Property Graphs are inherently schema-less at the database level. They have mechanisms that allow the enforcement of a schema at the application level (Neo4j's Node Labels and Relationship types, Constraints, etc.). And while Node and Edge Properties have a type, Property X on Node 1 can have a different type than Property X on Node 2.
> * Graphs that implement Gremlin have a rich traversal query language that simplifies data retrieval. Queries like "What married people within 5 generations have 1 or fewer children associated with them?", "Given any 2 people in the graph, what is the most direct lineage between them?", or even "What birth events have fewer than 2 high confidence sources attached to them?" become straightforward to express. (Please note that the data must be modeled in a way that enables these types of queries. I am merely pointing out what one could do, not what has already been done.)
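To illustrate the kind of traversal being described, here is a small sketch using the gremlinpython client. The schema it assumes (Person and Birth labels, CHILD and EVIDENCED_BY edges, a confidence property) is entirely hypothetical, per the note above that the data must be modeled to enable such queries.

```python
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.graph_traversal import __
from gremlin_python.process.traversal import P
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

# Connect to a Gremlin Server (address and traversal source name are assumptions).
g = traversal().withRemote(
    DriverRemoteConnection("ws://localhost:8182/gremlin", "g"))

# People with one or fewer CHILD edges attached to them.
few_children = (g.V().hasLabel("Person")
                 .where(__.out("CHILD").count().is_(P.lte(1)))
                 .values("name").toList())

# Birth events with fewer than two high-confidence sources attached.
weak_births = (g.V().hasLabel("Birth")
                .where(__.out("EVIDENCED_BY")
                         .has("confidence", "high")
                         .count().is_(P.lt(2)))
                .toList())
```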
> In regards to your frustration with the restrictive nature of the current systems, I absolutely agree with you that there is a huge area for improvement. I will be sharing my thoughts on data models a little later, but your comment points out a very real truth.
>
> genealogical data is so sloppy, with so many special exceptions, that force-fitting everything you might wish to record into restrictively-formatted tables is essentially impossible.
>
> In a later part I will be discussing the dissonance between designing a data model around what I want to display to a user vs designing a data model around what actually happened.
>
> As to the focus on versioning, atomic updates, Git, etc., my thoughts are that we need a flexible system that natively deals with those concepts and that we can build on. Adding them after the fact leads to the same types of issues we have now with regard to collaboration, disagreements, and privacy.
>
> - John C.
>
> On Thu, Feb 13, 2014 at 10:48 AM, Thomas Wetmore <tt...@verizon.net> wrote:
> Ryan,
>
> > I think John's leading up to something that is at a layer above the relational vs. nosql vs. graph database debate. At least that seems to be the direction given the focus on atomic updates and versioning.
>
> I will be negative here. I think all the current stress on GIT, versioning, and atomic updates is wrong. The stress should be on the genealogical process, and on the model and the records required to support it. Then one can talk about any version control issues that may actually exist. I think complex version control in genealogical databases is so far over the top in terms of what is really needed now, and possibly forever, that I think it is preventing work that would get us somewhere. If I am wrong so be it, and I will be happy, as I always am, to admit my wrong-headedness. For me all this GIT stuff is just so much sophomoric exuberance that we will have to wait to get past.
>
> > I like non-relational databases, especially for this domain space. I'm confident that we'll all eventually get there. But I think it's terribly naive to think that our frustrations with "EVERY existing genealogical program and online system" is rooted in relational databases. For example, I have no doubt that after FamilySearch finishes their migration to Cassandra, you'll still be frustrated with what you can't record.
>
> I stand by my statement. I will, however, add the caveat you imply -- using a no-SQL database (e.g., Cassandra) will NOT guarantee that a genealogical system based on it will be flexible and not cause frustration. The use of relational databases PREVENTS the development of non-frustrating, flexible software implementations in the genealogical area; and, although the use of non-relational databases ENABLES non-frustrating, flexible software, it does not GUARANTEE it.
>
> Tom
--
I do agree that it would be very nice and handy to have a way to graphically see the evolution history of individuals and relationships as they have been built up over time, as evidence is added and occasionally removed. And it would be cool to see the splitting and lumping and rearranging of individuals over time. But would I really want to go back in history, as shown by that graph, to a certain point and then go off in a different direction; or would I just use that graph to help me understand the mistakes that have been made, and use it as a guide to help rearrange evidence to get things back in order?