Extra relationships in relations list


Jim Eggert

Aug 21, 2012, 11:04:00 PM
to GEDitCOM II Discussions
Running GEDitCOM II 1.7 build 4, I sometimes find extra relationships occur. For example, in finding the relationship between two of my ancestors who are in fact father and daughter, it shows them also as 2nd cousins once removed and other, even more distant relations. Now this can be correct if they are related also on the daughter's mother's side, but in this case the extra relationship is clearly shown through the father.

My guess is that the relationship pruning algorithm can be confused if there is some in-breeding in the ancestry.

Also note that if I ask for the relationships in the other order, with the daughter as the base person, I only get the one correct relationship. So the search algorithm isn't symmetric.

=Jim

PastedGraphic-3.png

John Nairn

Aug 24, 2012, 12:46:17 AM
to geditcom-ii...@googlegroups.com
Hi Jim,

The relationship calculation is complicated. Here are some replies:

On Aug 21, 2012, at 8:04 PM, Jim Eggert wrote:

> Running GEDitCOM II 1.7 build 4, I sometimes find extra relationships occur. For example, in finding the relationship between two of my ancestors who are in fact father and daughter, it shows them also as 2nd cousins once removed and other, even more distant relations. Now this can be correct if they are related also on the daughter's mother's side, but in this case the extra relationship is clearly shown through the father.

I am certain all relationships found are correct (you can always verify one by selecting it and clicking "Show Chart"). I have never seen one come through that did not correspond to a valid chart, but I am not 100% certain it will find all unique connections. I think it always finds the closest one, but it might miss higher-level (and usually less interesting) relationships.

>
> My guess is that the relationship pruning algorithm can be confused if there is some in-breeding in the ancestry.

It is not really getting confused by finding extra paths. An earlier version of GEDitCOM II found a whole lot more extra paths and I decided some of them were redundant (I forget why but I think if one chart was self-contained in another except for subsequent identical direct lines, only the smallest chart should be reported). I was able to prune them off, but multiple and different relationships are common in large trees.

>
> Also note that if I ask for the relationships in the other order, with the daughter as the base person, I only get the one correct relationship. So the search algorithm isn't symmetric.

The tree traversal is a combination of recursive ancestor and descendant tree traversals. It goes up from the target individual and then down from each unique ancestor. Going up is clear (each person has two parents), but going down may happen in a different order when started from different places if in-breeding is present. In-breeding means one couple will be reached by two paths. Searching for one person's relatives may find one spouse first, while searching the companion person's relatives may find the other one first. This change in order makes the search non-symmetric, at least in order and maybe in final results. I would think it will always find the simplest relationship and other ones too, but you found a case where it did not. I would need to see the connections to begin to diagnose why, but usually the missed ones are less interesting.

This problem is like finding a route on a map from place A to place B. Google Maps may find a different route from A to B than from B to A, especially if the route has highways or one-way roads. Entering branches from different points might be similar to hitting one-way routes in mapping.

In summary, as far as I know:

1. The most direct relationship is always found
2. Extra relationships are found; these are always correct and efforts were taken to prune them only to unique or interesting ones
3. From your findings, it seems possible that not all extra relationships are found

------------
John Nairn
http://www.geditcom.com
Genealogy Software for the Mac

Jim Eggert

Aug 24, 2012, 8:53:38 AM
to GEDitCOM II Discussions, John Nairn
I beg to differ. Quoted inline below is a simple GEDCOM file that shows the problem I see. If you ask for the relationship between the father and the son in this file, in one direction you get just the father-son relationship, in the other direction you get that plus a spurious 1st cousin once removed relationship.

The reason is apparently because the pruning algorithm improperly handles inbreeding in the father's ancestry (Grandpa and Grandma were siblings). If all "relationships" between father and son go only through the father, then the two are just father and son, and not cousins. The fact that the father has genetic heritage from the same ancestor in different ways doesn't change this.

I think the pruning algorithm needs to be improved. It should eliminate relationship paths with a common individual in the ascending side and the descending side unless that individual is the common person at the top of that relationship path.
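
The rule proposed above can be sketched as a small filter. This is only an illustration in Python, not GEDitCOM II code: the function name and the way a chart is represented (two lists of names, each running from one person up to the common ancestor) are assumptions made for the example. The names come from the example file quoted below.

```python
def jim_rule_keeps(up_line, down_line):
    """Return True if the chart survives the proposed pruning rule.

    Each line runs from one of the two people up to the common ancestor,
    so both lines end with the same individual (the top of the chart).
    The chart is rejected when any other individual sits on both lines.
    """
    assert up_line[-1] == down_line[-1], "lines must meet at the same ancestor"
    shared = set(up_line[:-1]) & set(down_line[:-1])
    return not shared


# The two charts between Father and Son, written out by hand.
direct = (["Father"], ["Son", "Father"])
cousin = (["Father", "Grandpa", "Great-grandpa"],
          ["Son", "Father", "Grandma", "Great-grandpa"])

print(jim_rule_keeps(*direct))  # True: Father is the top of this chart
print(jim_rule_keeps(*cousin))  # False: Father is on both lines below the top
```

Under this rule only the father-son chart remains, whichever person is taken as the base.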

=Jim

0 HEAD
0 @I1@ INDI
1 NAME /Son/
1 SEX M
1 FAMC @F1@
0 @I2@ INDI
1 NAME /Father/
1 SEX M
1 FAMS @F1@
1 FAMC @F2@
0 @I3@ INDI
1 NAME /Grandpa/
1 SEX M
1 FAMS @F2@
1 FAMC @F3@
0 @I4@ INDI
1 NAME /Grandma/
1 SEX F
1 FAMS @F2@
1 FAMC @F3@
0 @I5@ INDI
1 NAME /Great-grandpa/
1 SEX M
1 FAMS @F3@
0 @F1@ FAM
1 HUSB @I2@
1 REFN 24 AUG 2012 08:30:35
2 TYPE Creation Date
1 CHIL @I1@
0 @F2@ FAM
1 HUSB @I3@
1 WIFE @I4@
1 CHIL @I2@
0 @F3@ FAM
1 HUSB @I5@
1 CHIL @I3@
1 CHIL @I4@
0 TRLR

John Nairn

Aug 26, 2012, 3:11:48 PM
to geditcom-ii...@googlegroups.com
Hi Jim,

I'm not sure what you mean by differ. I said your results show that the calculation is not symmetric, and your compact example showed that too. One direction shows only father-son while the other shows both father-son and first cousin, once removed.

So the real questions are which is best and can it be made symmetric? In my opinion, the direction including the cousin relationship is better because it adds new information. It is finding a relationship in this family tree that is not in other family trees that do not have in-breeding. If only the father-son relationship is shown, you might miss that important family history. It reminds me of the movie Chinatown where Jack Nicholson confronts Faye Dunaway about a relationship and she admits "She's my sister and my daughter!" It was important to the movie to know both relationships and not eliminate one. The current code finds both and is symmetric in the Chinatown example. I think your idea of removing those with a common individual in the ascending path would change that and find only one relationship (because Faye Dunaway is in both ascending lines for the relationship from the daughter to her).

I think I could make it symmetric by the slightly inefficient method of finding relationships in both directions and keeping only unique ones. This approach would then report father-son AND cousin in your example in both directions. The other approach is to improve the algorithm by changing the pruning method and see if it can be made symmetric.
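
The "both directions, keep unique ones" idea can be sketched in a few lines of Python. Everything here is an assumption for illustration: a relationship is written as a pair (generations up from the first person, generations down to the second), `reported_search` is a stand-in that just replays the results described in this thread, and which direction returns the extra relationship is picked arbitrarily.

```python
def mirrored(pairs):
    """Swap (up, down) counts so they read from the other person's side."""
    return {(down, up) for up, down in pairs}


def symmetric_relationships(find, a, b):
    """Run a possibly one-sided search both ways and keep each result once."""
    return find(a, b) | mirrored(find(b, a))


# Stand-in for the real search, replaying the results reported in this thread.
REPORTED = {
    ("Father", "Son"): {(0, 1), (2, 3)},  # father-son plus 1st cousin once removed
    ("Son", "Father"): {(1, 0)},          # father-son only
}


def reported_search(a, b):
    return REPORTED[(a, b)]


print(sorted(symmetric_relationships(reported_search, "Father", "Son")))
# [(0, 1), (2, 3)]
print(sorted(symmetric_relationships(reported_search, "Son", "Father")))
# [(1, 0), (3, 2)]
```

The cost is one extra full search per query, which is the inefficiency mentioned above.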

In case you (or anyone else) want to take a look, below is pseudocode for the relationship calculation used by GEDitCOM II. It is simplified to assume no one has multiple spouses (i.e., multiple links to family records as a spouse). This algorithm can be applied to your example and gets the results you saw (but it could not be used for the Chinatown example because that father would be linked to two family records: one as parent of Faye Dunaway and one as an incestuous partner). The question is how to prune out relationships that one does not want. The method used in the algorithm is to set a flag on each family record as it is visited in the recursive traversal of all the genealogy linkages. If this "check" flag is removed, a search might find a large number of meaningless cousin relationships. When it is in place, it eliminates many of them. It sometimes eliminates too many (which, in my opinion, is what happens to the cousin relationship in your example in one direction). The question is where to prune differently in this algorithm. Writing and debugging such recursive code can be a challenging task. Adding a new special case for one problem often causes new problems.

The algorithm:

# NOTE: this simplified algorithm assumes no person has more than one spouse. To find
# step relationships (such as half siblings, step parents, etc.), many additional
# steps are needed (and are done in GEDitCOM II). This simplified version still has the
# asymmetric properties, and it does find the closest relationship and correct additional ones

# start search with toRecord (the source) looking for matches to fromRecord (the target)
toRecord = (record being viewed when relationship menu command selected)
fromRecord = (user-selected record for finding relationship)
hits = 0 (incremented by each successful FoundRelationship() call)
set all family record special flags to -2
DescendFromFams(family record with toRecord as spouse, 0, 0, null)
Display results in relationship panel
Stop

sub FoundRelationship(ancestGen, descendGen)
   if this relationship was already found
       return
   endif
   save ancestGen, descendGen, and lines for future display of results
   hits = hits+1
endsub 

# the recursive tree-traversal entry point
# ancestGen is ancestor generation number from toRecord (starting at zero)
# descendGen is descendant generation number from a starting family record (which was at zero)
# skipRecord means to skip that individual if it is a child in fam
sub DescendFromFams(fam, ancestGen, descendGen, skipRecord)
    # if individual has no spouse record, special handling then exit
    if no fam
        # this branch has ended if not root person
        if(ancestGen>0 or descendGen>0) return

        # But, if source toRecord has no families, restart from parents, but when descend skip toRecord
        DescendFromFams(parents family record, 1, 0, toRecord)
        return
    endif

    # read individuals in this family
    husband = husband in fam
    wife = wife in fam
    parentMatch = No

    # add them only if not source level (ancestGen>0) and if they are direct ancestors (descendGen=0)
    if(ancestGen>0 and descendGen==0)
        if husband exists and is same as fromRecord
            FoundRelationship(ancestGen, descendGen)
            parentMatch = Yes
        endif 
        if wife exists and is same as fromRecord
            FoundRelationship(ancestGen, descendGen)
            parentMatch = Yes
        endif
    endif

    # if a parent did match then done (can't match again by continued search in valid data)
    if parentMatch is Yes then return

    # Was this family checked before (-2 means never checked, -1 means yes and no matches, >=0 means checking now)
    # but I might be removing too many?
    # I think can only screen out descendGen>0 otherwise search may miss some ancestors
    if descendGen>0
        check = special flag on fam record
        if check=-2
            # beginning recursion from this family record's children
            set fam flag to current number of hits
        else if check=-1
            # was checked before and yielded nothing new, do not search it again
            return
        else
            # this means returned to same family under its own recursion, which is probably an error in
            # the data (such as an ancestor being their own descendant)
            log warning that family descends from itself, which might mean bad data
        endif
    endif

    # loop over the children in this family
    for each child in fam
        # skip if needed
        if child is same as skipRecord
           go on to next child in this loop
        endif

        # is child a match?
        if child is same as fromRecord
            FoundRelationship(ancestGen, descendGen+1)
        endif

        # continue down through this child's own family
        DescendFromFams(family record with child as spouse, ancestGen, descendGen+1, null)
    next child

    # now that all children checked recursively, see if any matches were found
    # if not, then set flag to -1 to skip this record if it comes back
    # otherwise set to -2 to allow additional matches (only when descendGen>0)
    if descendGen>0
        check = special flag on fam record
        if check = current number of hits
            set fam flag to -1
        else
            set fam flag to -2
        endif
    endif

    # Special case for direct ancestors (when descendGen==0). This section will
    # get their parents which will be the next level of ancestors
    if descendGen == 0
        if ancestGen == 0
            # Special case on first call for source toRecord. Get the parents of toRecord so can get
            # descendants of mother and father of main individual
            DescendFromFams(toRecord's parents family record, ancestGen+1, 0, toRecord)
        else
            # move up to next level of ancestors
            if husband exists
                DescendFromFams(husband's parents family record, ancestGen+1, 0, husband)
            endif
            if wife exists
                DescendFromFams(wife's parents family record, ancestGen+1, 0, wife)
            endif
        endif
    endif
endsub
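
For anyone who wants to experiment, here is a rough Python sketch of the pseudocode above, run against the small father/son example file from earlier in this thread. It is not GEDitCOM II's actual code. The `FAMS` table, the helper names, and the `stop_on_parent_match` switch are all inventions for this sketch; the switch simply turns the `if parentMatch is Yes then return` step on or off. The sketch also assumes the recursive descent runs for every child, matched or not, and that the wife's own record is the one skipped when moving up through her parents.

```python
# Jim's example file as family records: (husband, wife, children).
FAMS = {
    "F1": ("Father", None, ["Son"]),
    "F2": ("Grandpa", "Grandma", ["Father"]),
    "F3": ("Great-grandpa", None, ["Grandpa", "Grandma"]),
}


def spouse_family(person):
    """Family in which person is a spouse (at most one, per the simplification)."""
    return next((f for f, (h, w, _) in FAMS.items() if person in (h, w)), None)


def parent_family(person):
    """Family in which person is a child, or None."""
    return next((f for f, (_, _, kids) in FAMS.items() if person in kids), None)


def find_relationships(to_record, from_record, stop_on_parent_match=True):
    found = set()                  # (ancestGen, descendGen) pairs
    flags = {f: -2 for f in FAMS}  # -2 unchecked, -1 nothing new, >=0 in progress

    def descend(fam, anc, desc, skip):
        if fam is None:
            if anc == 0 and desc == 0:
                # source has no family of its own: restart from its parents
                descend(parent_family(to_record), 1, 0, to_record)
            return
        husband, wife, children = FAMS[fam]

        # direct ancestors of the source
        if anc > 0 and desc == 0 and from_record in (husband, wife):
            found.add((anc, desc))
            if stop_on_parent_match:
                return

        if desc > 0:
            if flags[fam] == -2:
                flags[fam] = len(found)  # remember the hit count on entry
            elif flags[fam] == -1:
                return
            # any other value means the family was re-entered (bad data);
            # the pseudocode only logs a warning, so nothing is done here

        for child in children:
            if child == skip:
                continue
            if child == from_record:
                found.add((anc, desc + 1))
            descend(spouse_family(child), anc, desc + 1, None)

        if desc > 0:
            flags[fam] = -1 if flags[fam] == len(found) else -2

        if desc == 0:
            if anc == 0:
                descend(parent_family(to_record), 1, 0, to_record)
            else:
                for parent in (husband, wife):
                    if parent is not None:
                        descend(parent_family(parent), anc + 1, 0, parent)

    descend(spouse_family(to_record), 0, 0, None)
    return found


print(sorted(find_relationships("Father", "Son")))  # [(0, 1), (2, 3)]
print(sorted(find_relationships("Son", "Father")))  # [(1, 0)]
print(sorted(find_relationships("Son", "Father", stop_on_parent_match=False)))
# [(1, 0), (3, 2)]
```

With the early return in place, the sketch reproduces the asymmetry: two relationships starting from Father, one starting from Son. With it switched off, both directions report the same two relationships, mirrored.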

---------------
John Nairn (1-541-737-4265, FAX:1-541-737-3385)
Professor and Richardson Chair
Web Page: http://www.cof.orst.edu/cof/wse/faculty/Nairn/
FEA/MPM Web Page: http://oregonstate.edu/~nairnj

John Nairn, Developer

Aug 26, 2012, 4:35:19 PM
to geditcom-ii...@googlegroups.com
Follow up on the relationship search algorithm:

I can make Jim's example symmetric by deleting the line in the previous post that reads

     if parentMatch is Yes then return

That example now shows both father-son and 1st cousin once-removed relationships, which I think is what genealogists should want (i.e., it gives useful information about that family's in-breeding lines). If you delete Grandma from that example, it correctly finds only the father-son relationship.

John

John Nairn, Developer

Aug 27, 2012, 12:45:32 AM
to geditcom-ii...@googlegroups.com, John Nairn
I looked more at the calculations and think I have an improved version. It now should be symmetric and it finds unique relationships only (and I think it finds all of them).

In my opinion, a unique relationship is defined as follows:

1. Start with family tree and two individuals and find all paths up ancestors from one individual and down through ancestors of the other individual. These paths meet at a common ancestor and are the paths you can see in GEDitCOM II by clicking "Show Chart" in the relationship panel.

2. Eliminate all relationships that have the same ancestor located at the same relative position from the bottom of the list. These are eliminated because that ancestor would also be the common ancestor in a simpler chart. The simpler chart is kept, but the extended ones are eliminated.

In Jim's example, the first cousin, once removed chart is not eliminated. The "Father" is in both lines, but at different relative positions from the bottom of the lists. One might argue that this is a different style that should be eliminated too, but it does not fit the elimination criteria in step 2 above. My current thought is that it should be included. I think such lines only appear near incestuous families and finding them might be useful information.
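
One reading of step 2 can be written as a short Python check. This is a sketch under assumptions, not the code in GEDitCOM II: each side of a chart is taken to be a list of names running from the person at the bottom to the common ancestor at the top, and `step2_eliminates` is a made-up name. The second test case uses a hypothetical extra child of Father that is not in Jim's file.

```python
def step2_eliminates(line_a, line_b):
    """True if some ancestor below the top sits at the same height in both lines.

    Each line is listed from the bottom (one of the two people) to the top
    (the common ancestor), so index i is the position counted from the bottom.
    """
    return any(x == y for x, y in zip(line_a[:-1], line_b[:-1]))


# Father to Son in Jim's example: Father is one step higher in Son's line.
father_son = (["Father", "Grandpa", "Great-grandpa"],
              ["Son", "Father", "Grandma", "Great-grandpa"])

# Hypothetical: Son and another child of Father, both traced to Great-grandpa.
son_sibling = (["Son", "Father", "Grandpa", "Great-grandpa"],
               ["Sibling", "Father", "Grandma", "Great-grandpa"])

print(step2_eliminates(*father_son))   # False: chart is kept
print(step2_eliminates(*son_sibling))  # True: Father is the apex of a simpler chart
```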

Jim Eggert

Aug 27, 2012, 9:26:52 PM
to geditcom-ii...@googlegroups.com
Just about anyone who has traced their family tree back far enough will likely find inbreeding. This causes pedigree collapse (German: Ahnenverlust, Implex, or Ahnenschwund). See
http://en.wikipedia.org/wiki/Pedigree_collapse

Your improved algorithm seems to be symmetric, but I still think it has a problem, if I understand it correctly. It seems to lay too much weight on the common ancestor being found in the same relative position in each ascending tree.

To show you why, take a look at the GEDCOM file quoted here, identical to the previous one except that a daughter has been added. In the present relationship algorithm in v1.7 build 4, if you ask for the relationship between the son and the daughter, you get two relationships no matter which direction you try. You get two relationships between the son and the father if you try in one direction, and only one if you try in the other direction.

With the new algorithm, I think you would get two relationships between the son and the father in either direction, but only one between the son and the daughter in either direction. How is the son multiply related to the father but only singly related to the daughter?
0 @I6@ INDI
1 NAME /Daughter/
1 SEX F
1 FAMC @F1@
0 @F1@ FAM
1 HUSB @I2@
1 CHIL @I1@
1 CHIL @I6@
0 @F2@ FAM
1 HUSB @I3@
1 WIFE @I4@
1 CHIL @I2@
0 @F3@ FAM
1 HUSB @I5@
1 CHIL @I3@
1 CHIL @I4@
0 TRLR
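
The counts I expect from the new algorithm can be checked by brute force. The Python sketch below is my own reading of the two-step definition, not GEDitCOM II code: it lists every pair of ancestor lines that meet at a common ancestor through different children, then drops pairs that have the same person at the same height below the top. The `FAMILIES` table is typed in by hand from the file above, and all function names are made up for this example. It assumes nobody is their own ancestor.

```python
# Family id -> (parents, children), typed in from the file above.
FAMILIES = {
    "F1": (["Father"], ["Son", "Daughter"]),
    "F2": (["Grandpa", "Grandma"], ["Father"]),
    "F3": (["Great-grandpa"], ["Grandpa", "Grandma"]),
}


def parents_of(person):
    return [p for parents, kids in FAMILIES.values() if person in kids
            for p in parents]


def lines_up(person):
    """Every ancestor line from person upward, listed bottom to top."""
    yield [person]
    for parent in parents_of(person):
        for rest in lines_up(parent):
            yield [person] + rest


def charts(a, b):
    """All pairs of lines that meet at a common ancestor via different children."""
    return [(la, lb) for la in lines_up(a) for lb in lines_up(b)
            if la[-1] == lb[-1]
            and not (len(la) > 1 and len(lb) > 1 and la[-2] == lb[-2])]


def unique_relationships(a, b):
    """Generation counts (up from a, up from b) left after the step 2 rule."""
    kept = [(la, lb) for la, lb in charts(a, b)
            if not any(x == y for x, y in zip(la[:-1], lb[:-1]))]
    return {(len(la) - 1, len(lb) - 1) for la, lb in kept}


print(sorted(unique_relationships("Son", "Father")))    # [(1, 0), (3, 2)]
print(sorted(unique_relationships("Father", "Son")))    # [(0, 1), (2, 3)]
print(sorted(unique_relationships("Son", "Daughter")))  # [(1, 1)]
```

So under this reading the son is doubly related to the father but only singly related to the daughter, in either direction.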


jimsne...@gmail.com

Apr 24, 2014, 5:39:10 PM
to geditcom-ii...@googlegroups.com
Hi Jim

When I click on the Relationship Chart under the View menu in the Default Format, the resulting chart has several extraneous items, e.g. "Experimental:", a plus-minus symbol before birth and death dates, etc. (The chart is attached.) I would like to remove these as you have in the chart you posted.

How do I get these to stop printing? How do I add other items, e.g., you have "Religion" in your chart? I know there is a way to edit formats but after searching through the Editor and the Help pages I haven't found the answers to these questions for the Relationship Chart yet. (I'm just starting to work with G-II again after a while away from G-I.)

Thanks,

Jim
Relationship Chart.pdf

Jim Eggert

Apr 25, 2014, 11:06:29 PM
to geditcom-ii...@googlegroups.com
You must have changed your tree view.  Open up any ancestor or descendant tree, click to put it in column mode, then edit the columns to include/exclude the information you want.  Then make a new relationship chart and it should look the same way.

=Jim

Jim Lewis

Apr 26, 2014, 10:50:51 AM
to geditcom-ii...@googlegroups.com, sup...@geditcom.com
Hi Jim

Thanks, but no go. I have only the default columns enabled: Name, portrait and life span. Nothing about "Experimental" etc. shows up in the View/Columns/ drop-down menu. So I am still getting a line for Experimental and a +- symbol in my Relationship Chart.

I must be missing something simple.

Jim


--
You received this message because you are subscribed to a topic in the Google Groups "GEDitCOM II Discussions" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/geditcom-ii-discussions/iv55x6_2Ysk/unsubscribe.
To unsubscribe from this group and all its topics, send an email to geditcom-ii-discu...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Nairn John

Apr 26, 2014, 4:49:50 PM
to Jim Lewis, geditcom-ii-discussions@googlegroups.com
As Jim mentioned, the content of cells that show up in a "Relationship Chart" is the same as what appears in the cells in ancestor trees. Thus if you set them, it will control that chart too.

But maybe you are looking at something else? First, I do not see a "Relationship Chart" under the View menu in the Default Format. The command for the chart Jim and I are talking about is the one you get by:

1. Open an individual (or choose one in the index window).
2. Choose the "Relationship..." menu command in the "View" menu (that command is not related to the "Default Format").
3. When the list comes up, choose another individual, click OK and see the relationship between those two individuals.
4. Finally, to see the "Relationship Chart" click on the "Show Chart" button.

If that is the chart you mean and you still see +- symbols, you may need to email me a screen capture of what you are seeing. A screen capture of a typical ancestor tree would help too (for comparison).

Regards,
John Nairn

Jim Lewis

Apr 30, 2014, 8:20:05 AM
to geditcom-ii...@googlegroups.com
John,

Here are the Individual Chart and the Relationship Chart and the Ancestor Chart. The problem is that the "+" symbol shows up in the Relationship Chart in front of Birth and Death.  Can you delete this symbol?

Thanks, Jim


Attachments:

Individual (Albert Lewis):
Albert Webber Lewis.pdf
Relationship Chart.pdf
Albert Webber Lewis Ancestor Chart.pdf

Nairn John

Apr 30, 2014, 10:33:25 AM
to geditcom-ii...@googlegroups.com
Hi Jim,

Thanks for the figures; I now know the answer. It is a bug in GEDitCOM II that I will fix for the next version (a beta is posted on the www.geditcom.com downloads page now and the final one is getting closer). The problem is the option in the tree column customization under "Column Name" using the "In Charts" pop-up menu. It seems the "Relationship Charts" do not correctly handle all those options. It looks like you selected "Omit Label, keep empty lines." When I choose that option, I get the same ± symbol in the chart. To get rid of them, you are currently limited to using "Use Label, omit empty lines" or "Omit Label, omit empty line." The first uses a label and the second omits it. The two options to "keep empty lines" currently are not working correctly in "Relationship Charts" (although they work for full trees). It will be fixed for the next release and maybe sooner in a beta posting.

Regards,
John Nairn


John Nairn

May 30, 2014, 3:07:39 PM
to geditcom-ii...@googlegroups.com
This issue with relationship chart not getting all chart labeling options correct is fixed in the just-posted beta version at


Please try it out if you had problems.

John Nairn
