> Attribute dates are still not sorted properly!
> When date range starts with a same year as a single year date, it
> comes first which is wrong.
> Like 1855 should be before 1855–1866. It's so simple. No fuzzy logic
> nor artificial intelligence required. Can we please have this already…
Actually, it is not so simple, but it is solvable, and GEDitCOM is supposed to be using the most rational approach. Given two date ranges, you can construct a probability distribution for date 1 being separated from date 2 by t days. For dates like 1855 and 1855-1866, that function crosses zero. In other words, 1855 might be after 1855 to 1866 (such as if the first turns out to be Oct 1855 while the second turns out to be Mar 1855). You can integrate this function to find the probability that t is less than zero or greater than zero. For 1855 and 1855-1866, fuzzy math methods say:
1. 1855 is after 1855-1866 with a probability of 1/24
2. 1855 is before 1855-1866 with a probability of 23/24
Working out all the details, date 1 is more than 50% likely to be before date 2 if the midpoint of date 1 is before the midpoint of date 2. GEDitCOM II uses this fuzzy logic in date sorts. It therefore should be sorting 1855 before 1855-1866, but I tried it and it didn't. I will have to look into what is happening.
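To check the 1/24 and 23/24 figures, here is a minimal Python sketch. It assumes each date is uniformly distributed over its interval and measures time in years from 1 JAN 1855, so 1855 is the interval [0, 1] and 1855-1866 is [0, 12]. The function name prob_before is made up for this illustration and is not anything in GEDitCOM II.

```python
from fractions import Fraction

def prob_before(a1, b1, a2, b2):
    """P(X < Y) for X uniform on [a1, b1] and Y uniform on [a2, b2]."""
    a1, b1, a2, b2 = (Fraction(v) for v in (a1, b1, a2, b2))

    def room_after(x):
        # Length of the part of [a2, b2] that lies after x.
        return max(Fraction(0), b2 - max(x, a2))

    # room_after is piecewise linear with kinks at a2 and b2, so the
    # trapezoid rule is exact on each piece between consecutive cut points.
    cuts = sorted({a1, b1} | {c for c in (a2, b2) if a1 < c < b1})
    area = sum((hi - lo) * (room_after(lo) + room_after(hi)) / 2
               for lo, hi in zip(cuts, cuts[1:]))
    return area / ((b1 - a1) * (b2 - a2))

# 1855 -> [0, 1]; 1855-1866 -> [0, 12] (years since 1 JAN 1855)
p_before = prob_before(0, 1, 0, 12)   # 1855 before 1855-1866
p_after = prob_before(0, 12, 0, 1)    # 1855 after 1855-1866
print(p_before, p_after)
```

Under these assumptions the two results are 23/24 and 1/24, and the larger probability goes with the interval whose midpoint is earlier.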
>
> Although the GEDCOM standard doesn't say it explicitly, it's
> semantically obvious the two date range expressions have different
> purpose and are not interchangeable:
> BET x AND y
> Is meant for single events like birth or death where the exact date is
> unknown and only an approximate date range can be estimated.
> FROM x TO y
> is meant for attributes that have duration like residence and
> occupation. Ie. people can live in a given residence various lengths
> of time and the beginning and ending dates of this stay is given with
> this expression. As I use only year accuracy for attributes it's
> sometimes a single year, because they move soon to another place, so a
> single year is enough and no range is needed.
> Why would there be two different date range expressions if there's no
> difference between them?
> QED
> Cheers
The GEDCOM standard is explicit, although GEDitCOM II does not really distinguish these two types of dates. Both are considered a date range for date calculations. Users can use them to indicate range or period as needed.
Here is how the standard describes it:
BET =Event happened some time between date 1 AND date 2. For example, bet 1904 and 1915 indicates that the event state (perhaps a single day) existed somewhere between 1904 and 1915 inclusive.
FROM =Indicates the beginning of a happening or state.
TO =Indicates the ending of a happening or state.
The BET/AND is called a "date range" and the FROM/TO is called a "date period."
John Nairn
=Jim
> If all you're interested in is being above the 50% point on the probability distribution (for date sorting purposes), and if you are using a uniform probability distribution for date ranges, then it is sufficient simply to compare the midpoints of the ranges. Whichever range has the greater midpoint, then a random date selected from that range with a uniform distribution is mathematically more likely to be later than one similarly selected from the other range.
>
> =Jim
>
And that is exactly how GEDitCOM II compares date ranges. The catch is for date periods indicated as FROM date1 TO date2. It is kind of like comparing apples and oranges because a date range means an event happened one day in that range while a date period means a state that existed for all those dates.
For example, if a family lived in a residence FROM 1900 TO 1920 and one child was born in 1908, was that child born before or after they lived in the house? The full answer is both. The child was born after they moved into the house and before they moved out. Computer sorting algorithms, however, frown on ambiguous answers, and some other decision is needed. If you use midpoints, the answer would be that 1908 is before FROM 1900 TO 1920, but intuitively, saying the "residence event" happened before the 1908 birth makes more sense (in my opinion). For that reason, GEDitCOM II compares date periods by using their start date rather than their midpoint.
If you really want a date range and not a date period, the date should be entered as BET date1 AND date2. The latter will be sorted using their midpoints.
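The rule described above (start date for periods, midpoint for ranges) can be sketched in a few lines of Python. This is only an illustration, not GEDitCOM II's actual code: the FuzzyDate class, the "range"/"period" labels, and sort_key are all assumed names, and a bare year is taken to run from 1 JAN to 31 DEC.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class FuzzyDate:
    label: str
    kind: str    # "range" (BET/AND or a bare date) or "period" (FROM/TO)
    start: date
    end: date

def sort_key(d: FuzzyDate) -> float:
    first, last = d.start.toordinal(), d.end.toordinal()
    if d.kind == "period":
        return float(first)        # periods sort by their start date
    return (first + last) / 2      # ranges sort by their midpoint

birth = FuzzyDate("1908", "range", date(1908, 1, 1), date(1908, 12, 31))
residence = FuzzyDate("FROM 1900 TO 1920", "period",
                      date(1900, 1, 1), date(1920, 12, 31))
estimate = FuzzyDate("BET 1900 AND 1920", "range",
                     date(1900, 1, 1), date(1920, 12, 31))

ordered = [d.label for d in sorted([birth, estimate, residence], key=sort_key)]
print(ordered)
```

With these inputs the residence period sorts ahead of the 1908 birth, while the BET/AND range over the same years sorts after it.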
John
You should be entering
Residence A: FROM 1855 TO 1855
Residence B: FROM 1855 TO 1855
Residence C: FROM 1855 TO 1866
These will sort almost correctly. As I wrote before, date periods are sorted by their first date. Here FROM 1855 TO 1855 and FROM 1855 TO 1866 have the same start date (1 JAN 1855) and therefore may sort in either direction (depending on how they were ordered at the start). To solve this, you said you know that this person lived in Residence B for a few months in 1855 before moving into Residence C. Therefore Residence C cannot be from 1 JAN 1855. To document this knowledge, you could change it to
Residence A: FROM 1855 TO 1855
Residence B: FROM 1855 TO 1855
Residence C: FROM MAR 1855 TO 1866
This will now sort as you want for these residence occupations. Residence B and C overlap. If you have more information on the dates, it should be entered into the residence events. Also note that Residence B must be entered with the strange FROM 1855 TO 1855 to have GEDitCOM II recognize it as a date period. This entry actually means FROM 1 JAN 1855 TO 31 DEC 1855 (and could be entered that way if you prefer). If you enter just 1855, it implies a date range and is identical to BET 1 JAN 1855 AND 31 DEC 1855. This date will sort by its midpoint and therefore will come after the start of Residence C.
A definition of fuzzy data is that it is not always clear how to sort; if it were, the data would not be fuzzy. Another issue with fuzzy data is that what is clear by human interpretation is not always clear to computer code. If you follow how GEDitCOM II is dealing with date periods and date ranges, however, you should always be able to get a sorting that makes sense.
Potentially GEDitCOM II could treat all residence dates as date periods, but it does not. All date fields can have a date range or a date period at the user's control. If you want residences to be a date period, they have to be entered that way. I will think about whether it makes sense to change that, but in general, I think it is better to give users control of data entry.
I will look at other options instead. For example, perhaps it only matters to the sort command and can be done in that code and would not be needed any place else. Also, distinguishing 1855 from 1855 to 1866 is difficult in the current sort method because it is based on one number. Accounting for start date and end date would require addition of secondary sorting criteria (unless the range can be encoded into a single number somehow for efficiency).
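As a rough sketch of those two options, the Python below breaks ties on the start date by end date, first with a two-part key and then with both dates packed into one integer. The names tuple_key, packed_key, and SHIFT are assumptions for this example only; nothing here reflects how GEDitCOM II stores its sort number.

```python
from datetime import date

# Multiplier larger than any day number Python's date can produce
# (date.max.toordinal() is 3652059), so the end date never spills
# into the start date's digits.
SHIFT = 10_000_000

def tuple_key(start: date, end: date):
    # Secondary criterion: compare start first, then end.
    return (start.toordinal(), end.toordinal())

def packed_key(start: date, end: date) -> int:
    # Same ordering as tuple_key, encoded as a single number.
    return start.toordinal() * SHIFT + end.toordinal()

periods = {
    "FROM 1855 TO 1866": (date(1855, 1, 1), date(1866, 12, 31)),
    "FROM 1855 TO 1855": (date(1855, 1, 1), date(1855, 12, 31)),
}

by_tuple = sorted(periods, key=lambda k: tuple_key(*periods[k]))
by_packed = sorted(periods, key=lambda k: packed_key(*periods[k]))
print(by_tuple, by_packed)
```

Either key puts FROM 1855 TO 1855 ahead of FROM 1855 TO 1866 regardless of the starting order. The packed form relies on integers wide enough to hold both day numbers.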
On nested dates such as FROM CAL 1777 TO BET 1784 AND 1786
These cannot be entered because GEDitCOM II has adopted all GEDCOM options for dates, and nested dates are not among them. You do, however, have some ways to record such a date:
1. Any date can be followed by a comment in parentheses such as
FROM 1777 TO 1786 (1777 was calculated, end date may have been 1784)
The comment is never used in date calculations, but will be there for your records.
2. To do the same thing in a hidden field, you can use the custom GEDitCOM II option to attach a memo to the date (or any) field:
a. Enter FROM 1777 to 1786 into the field
b. Control click on the field and choose "Attach/Edit Memo" from the pop up menu
c. Enter any line of text (such as "1777 was calculated, end date may have been 1784") and then type return or enter
Now when you hover the cursor over that field, the comment will appear in a pop-up window rather than the help string for date fields. You can control click again to change the memo.
If you are exporting GEDCOM data to share with others using different software, the first option might be better. The second option exports a custom tag right after that date. If you want to both use memos and export GEDCOMs to share in other software, you can run the script "Export Data/Move Memos to Notes" before exporting your data and all memos for a record will appear in notes for the record. They will not be next to the actual date, but will be available for reference.
Residences in GEDCOM files are also a schizophrenic type of event/attribute. Over the years of developing GEDitCOM/GEDitCOM II, I have included residences in either events or in attributes. The current version treats them as a unique type of event. The GEDCOM standard calls it an attribute, but unlike every other attribute, no text is allowed on the main RESI line to describe it as an attribute. The residence information is all in the subordinate detail for date, place, and address. Residences are therefore more like events because they also have no text on the first line, although an event can have the text "Y" to indicate that the event has occurred, and residences do not formally allow that option.
Attribute dates and places are fairly uncommon. About the only attributes I ever create with a date and/or place are residences (if called an attribute), occupation, and education. Dates could make sense for other attributes, but not very often (e.g., a conversion of religious affiliation).
BEF x and AFT x are another challenge in fuzzy dates. I used to set the start time for BEF x to a small number (i.e., the beginning of time) and the end date for AFT x to a large number in the future. This did not work well with midpoint sorting (they would always be first or last). Currently BEF and AFT are ignored and the date is sorted without using that information. The real problem is that BEF x and AFT x just do not provide computer code enough information. They need to be supplemented with some idea of how long before or after. For example, someone born BEF x, where x was from a baptism record, was probably born a few days or at most a month before that date. But someone who died AFT 1881 (when they appeared in the 1881 census) may have died many years after that date. You could comment the date:
AFT 1881 (maybe several years)
or make up a range
BET 1881 and 1891 (he was not in the 1891 census)
The first would sort by the middle of 1881; the latter would sort by the middle of 1886.
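For anyone who wants to check where such dates land, here is a small Python sketch of the midpoint arithmetic. It assumes a bare year runs from 1 JAN to 31 DEC. The helper after_as_range is purely hypothetical: it shows one way "AFT year" could be turned into a range if you supply your own guess for how many years to allow.

```python
from datetime import date

def midpoint(start: date, end: date) -> date:
    # Middle day of the interval, rounded down to a whole day.
    return date.fromordinal((start.toordinal() + end.toordinal()) // 2)

def after_as_range(year: int, allow_years: int):
    """Hypothetical: read 'AFT year' as a range ending allow_years later."""
    return date(year, 1, 1), date(year + allow_years, 12, 31)

# AFT 1881 with the AFT ignored: sorted like plain 1881.
plain_1881 = midpoint(date(1881, 1, 1), date(1881, 12, 31))

# AFT 1881 with a ten-year allowance: same as BET 1881 AND 1891.
census_gap = midpoint(*after_as_range(1881, 10))

print(plain_1881, census_gap)
```

Under these assumptions the first midpoint falls in mid-1881 and the second in mid-1886, so the made-up range pushes the death about five years later in a sort.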