Am a little surprised by the "month_names_in_" (Dutch, French, Gregorian, Hebrew, Julian) stuff, with the language name hard-coded.
Wikipedia says (http://en.wikipedia.org/wiki/Lists_of_languages) :
"According to SIL International, there are 6,309 spoken languages, as cataloged and described in the book Languages of the World (ISBN 0883128152). The International Organization for Standardization (ISO) assigns codes for most languages: for example, ISO 639-3 uses "eng" for English and "apk" for Plains Apache, one of the five Apache languages of North America."
Of course, one would only ever encounter a tiny fraction of those 6,309 languages. But (say) you want Spanish, German, Italian; this would require month_names_in_spanish, month_names_in_german, month_names_in_italian, which seems rather silly.
So, my very simple suggestion is that month_names_in_dutch(), month_names_in_french() [...] be replaced with month_names_in(language).
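A minimal sketch of what such a single entry point might look like. The %month_names table, its two-letter language keys and the exact behaviour of month_names_in() are illustrative assumptions here, not the module's actual data or API:

```perl
use strict;
use warnings;

# Hypothetical lookup table keyed by language code (assumption, not module data).
my %month_names = (
    nl => [qw(januari februari maart april mei juni juli
              augustus september oktober november december)],
    es => [qw(enero febrero marzo abril mayo junio julio
              agosto septiembre octubre noviembre diciembre)],
    it => [qw(gennaio febbraio marzo aprile maggio giugno luglio
              agosto settembre ottobre novembre dicembre)],
);

# One function for every language, instead of one function per language.
sub month_names_in {
    my ($language) = @_;
    my $names = $month_names{ lc($language) }
        or die "No month names registered for language '$language'\n";
    return [@$names];    # hand back a copy so callers cannot alter the table
}

print join(', ', @{ month_names_in('es') }[0 .. 2]), "\n";    # prints: enero, febrero, marzo
```

Adding a language then means adding one entry to the table rather than a new method.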
Mike
OK! I admit my code does not really check days per month for, say, any
of the Julian Calendars, as per
http://en.wikipedia.org/wiki/Roman_calendar.
In order to do that, the code would have to permit the user to specify
month names for their chosen (say) Julian calendar, and days per month,
and the same for any other calendar.
To that end, Mike Hamilton's suggestion of set_* and get_* sounds like a
better mechanism than my default.
Let me think about it. After all, I've released the /first/ version of
this code, not the /last/...
On Thu, 2011-09-15 at 15:43 +1000, Mike Hamilton wrote:
> Ron writes:
>
> > I think it'd be set_month_names($calendar, $array_ref_of_names), which would
> > store those names into the pre-existing hash of default month names per calendar.
>
> I'd prefer set_calendar(calendar) and set_month_names(language,month_names). Completely decouple the calendar from the month names.
I'll think about this. You've got me worried, if nothing else.
You're implying that you can think of a /realistic/ use case with 1
calendar and multiple languages.
Now I'm beginning to hope you're wrong :-)).
> > I'm astonished you think sub-classing is inelegant and clumsy
>
> Real-life example: I have French and German names in my tree, so I need French accents (acute, grave, etc) and, maybe one day, German umlauts. So I have to create new classes for French and German, and then some sort of multiple inheritance class FrenchGerman ?
No no. You've overlooked the precise details of my non-existent
implementation. Hahahahahaha.
The code will stockpile user options, be they calendar or language.
So, calls to set options can be endless in number. The code will just be
storing stuff in a hash.
Access to the hash is, as always, triggered by your usage of a
date_calendar_escape in the GEDCOM file itself.
By design, a single file can hence use any number of escapes.
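As a rough sketch of that idea (nothing like this exists in the code yet): the names set_month_names(), calendar_of() and month_names_for_date() are hypothetical, and the only thing taken from GEDCOM itself is the @#D...@ escape syntax and its month abbreviations.

```perl
use strict;
use warnings;

# Hypothetical store of user options: calendar name => array ref of month names.
my %month_names_for;

# Can be called any number of times; each call just adds to (or replaces in) the hash.
sub set_month_names {
    my ($calendar, $names) = @_;
    $month_names_for{ uc($calendar) } = [@$names];
    return;
}

# Extract the calendar name from a leading @#D...@ escape.
# A date without an escape is taken to be Gregorian.
sub calendar_of {
    my ($date) = @_;
    return $date =~ /^\@#D([^\@]+)\@/ ? uc($1) : 'GREGORIAN';
}

# The escape found in the date decides which stored names are used.
sub month_names_for_date {
    my ($date) = @_;
    return $month_names_for{ calendar_of($date) };
}

set_month_names('GREGORIAN', [qw(JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC)]);
set_month_names('FRENCH R',  [qw(VEND BRUM FRIM NIVO PLUV VENT GERM
                                 FLOR PRAI MESS THER FRUC COMP)]);

print calendar_of('@#DFRENCH R@ 12 VEND 3'), "\n";    # prints: FRENCH R
```

Whether the hash should be keyed by calendar alone, or by calendar and language together, is exactly the open question in this thread; the sketch keys by calendar only.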
> Non-real-life: tomorrow I receive a new tree in Swahili, which requires a FrenchGermanSwahili class. The day after, my long-lost Tibetan cousin emails with his tree. He has data in Tibetan, Thai, Manipuri, Konkani and Marathi, so do I create a new class named FrenchGermanSwahiliTibetanThaiManipuriKonkaniMarathi ?
>
> Yes, that's of course an extreme, contrived and absurd example.
>
> Please don't take any of my comments as being negative. DateTime::Format::Gedcom is already a worthwhile and useful module. GEDCOM parsers seem easy at first glance, but many have given up after facing the nitty-gritty bits, the "ifs and buts" and the edge cases !
I assuredly don't take them negatively.
I haven't given up yet, but I'm beginning to get your hint...
> Must sign off now, there are some angels dancing on the heads of pins that I have to count ...
What a coincidence. I computed that number earlier today. It's ...
Damn - there's not enough space in this margin for my proof, but it can
be approximated by $infinity/$zero.
On Thu, 2011-09-15 at 18:42 +0100, Mike Elston wrote:
> Hello all,
>
> I have now had a chance for a first look at DateTime::Format::Gedcom
>
> I applaud the intention, and we all owe a great debt to Ron for his
> work on Perl Gedcom.
>
> But in my humble opinion, this class shows a complete
> misunderstanding both of how dates are often presented in GEDCOM
> files (and especially in files that claim to be GEDCOM but may not be
> strictly so), and of the idea of a GEDCOM date.
[snip huge analysis of date issues]
It's true that the code is quite crude in its current handling of
dates.
Work on it can either continue or be abandoned. Do you think it
should be abandoned? If you think that, just say so :-).
Or perhaps should it not make any attempt to support anything other than
Gregorian dates, at least for some period of time? Is that worth doing?
I have plenty of time available to change how the code works, but only
if there is some point in doing so.
On Thu, 2011-09-15 at 12:45 +0200, Eugene van der Pijll wrote:
> Ron Savage schreef:
> > o The previous author of Gedcom::Date added some Dutch words to his
> > code, so I added the Dutch month names.
>
> I would have preferred the title of "other author"; I'm still keeping
Do you mean changing this text "Thanx to Eugene van der Pijll, the
author of the Gedcom::Date::* modules." to add 'other', or something
else? I'm quite happy to change what the comments say.
> open the option of continuing to work on Gedcom::Date later. (In fact,
> I've been working on a script to validate GEDCOM files, and that has
> given me a bit of inspiration about future improvements.)
OK.
> About the Dutch names: Gedcom::Date only uses these for output.
> When parsing GEDCOM strings, only the restricted set of month
> abbreviations in the GEDCOM standard are accepted.
OK - I'll cut the Dutch words out. But as you can see from other emails,
and your own experience, calendar/language support is complex.
> Some remarks about DT::F::Gedcom:
>
> * It doesn't follow the semantics for DateTime::Format::* modules:
> parse_datetime() doesn't return a DateTime object, and it doesn't have
> a format_datetime() function. This is understandable, because a GEDCOM
> date string does not always correspond to an exact date, but I wonder
> if it should be in this namespace.
>
> (This is the reason that Gedcom::Date is not in the DateTime
> namespace, even though it returns DateTime objects.)
I feel this is a difficult decision, for the reasons you specify.
I think the basic problem is that the GEDCOM doc was not designed to fit
into the DateTime namespace, but the concept of parsing dates does,
especially given I decided to return DateTime objects.
It would (also) make sense to call it Genealogy::Gedcom::Date.
Anyone care to comment either way? Renaming it would stop a waste of
energy arguing about this.
> * Not accepting years < 1000 is a bad thing, certainly if you accept
> dates in the French calendar. Single digit years are very common in
> that calendar. DateTime can handle 3-digit years, and even BC years; I
> would expect the same from any DT parser module.
As I said in another reply, I pre-process the candidate date, and then
pass it to DateTime::Format::Natural, but the latter does not always
accept years < 1000.
> * What is the value of the "one_ambiguous" flag if it is set by "1 JAN
> 2000" (especially when "1/1/2000" isn't ambiguous either, and
> "1/2/2000" is not allowed by the GEDCOM standard?)
I overlooked that case.
> * How does your module record the difference between "2000", "JAN 2000"
> and "1 JAN 2000"?
It doesn't.
Is it worth extending the code to return info about those distinctions?
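One possible shape for that extra information, as a minimal sketch only: parse_parts() and its 'precision' key are made-up names, and the pattern handles nothing beyond plain "[day] [month] year" strings with the standard three-letter month abbreviations.

```perl
use strict;
use warnings;

my @abbreviations = qw(JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC);
my %month_number;
@month_number{@abbreviations} = 1 .. 12;

# Return a hash ref holding only the components actually present in the
# string, plus a 'precision' label. Returns undef for anything else.
sub parse_parts {
    my ($text) = @_;
    my ($day, $month, $year) =
        $text =~ /^(?:(?:(\d{1,2})\s+)?([A-Z]{3})\s+)?(\d{1,4})$/
        or return;
    return if defined $month && !exists $month_number{$month};

    my %parts = (year => $year + 0);
    $parts{month}     = $month_number{$month} if defined $month;
    $parts{day}       = $day + 0              if defined $day;
    $parts{precision} = defined $day   ? 'day'
                      : defined $month ? 'month'
                      :                  'year';
    return \%parts;
}

for my $text ('2000', 'JAN 2000', '1 JAN 2000') {
    print "$text => ", parse_parts($text)->{precision}, "\n";
}
# prints:
# 2000 => year
# JAN 2000 => month
# 1 JAN 2000 => day
```

Because no day or month is ever invented, a caller can tell the three cases apart, and years below 1000 pass through unchanged.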
> * What is the benefit of using this module over Gedcom::Date? Or do you
> have future plans that cannot be done with Gedcom::Date? Not that I
> mind a bit of competition, of course.
It isn't necessarily superior.
It does aim to return more information per date, which is very important
to me.
Also, it helps me exercise my coding skills.
[ Good observations and points. ]
> A couple of years ago, I wrote a little tool to extract all exact
> dates of birth, marriage and death events from a GEDCOM file, and
> write them out as a calendar file (ical format, RFC 2445). I was using
> Paul Johnson's Gedcom package from CPAN. I started out using
> Date::Manip to parse dates (partly because the Gedcom package was
> already using it), but I ended up having to write my own parser. The
> problem was that Date::Manip's parser would fabricate days ("JAN 2000"
> would come back as "1 JAN 2000"). Paul Johnson's date normalisation
> routine suffered from the same problem, because it too relied on
> Date::Manip. A pity, because date normalisation would be very helpful
> when you're comparing information about an event from two GEDCOM
> files.
I'm afraid I've not been keeping up with the messages here recently. My
excuse is that I got behind at YAPC and have been really busy with work since
then. But be that as it may, let me comment on this particular aspect.
What Stephen says about Gedcom.pm is all true; I really punted on the date
handling. There are two reasons for that:
1. It's hard. I didn't want to write yet another date handling package.
Date::Manip is overkill in almost all respects and yet, as Stephen notes,
it is still insufficient for genealogical use. And I just didn't have the
heart to dive into its own code.
2. As Stephen also notes, I didn't feel that the GEDCOM specification's
description of dates was sufficient anyway. So even if I, or someone
else, were to fully implement it, I didn't think it would be a full
solution.
And then there's the question of what are you going to do with the
dates anyway? Full, complete dates are clear(*), but what about all
the other possibilities, either allowed by the GEDCOM spec or not.
Most tools would have no idea how to handle "Between May and July
1678", let alone something like "Easter Sunday in either 1783 or 1785".
So I thought to leave dates as basically free-form fields, with the
option to use Date::Manip to normalise them as far as possible, if
required.
(*) I say clear, but what about times and time zones, or calendar
changes? And no doubt there are other complexities. Rarely is
anything clear-cut in genealogy.
> Thanks for starting an unusually interesting discussion.
Agreed.
--
Paul Johnson - pa...@pjcj.net
http://www.pjcj.net
There are several things in the GEDCOM specification that are definitely
missing: phrases like "FROM BEF 1820 TO ABT 1825" for example. It would
be interesting to develop a more complete GEDCOM-like date grammar.
> And then there's the question of what are you going to do with the
> dates anyway?
There are several things that a GEDCOM Date module (or a program) would
want to do with a date:
* Validate
* Date math / checking, such as "is date B more than 16 years after date
A?"
* Text output, e.g. for report writing ("ABT APR 1820" => "around April 1820").
All of these are still useful even with the more interesting GEDCOM date
formats.
For example, using my own module:
    use Gedcom::Date;

    my $birth = Gedcom::Date->parse("BET JUL 1820 AND JUL 1825");
    my $marr  = Gedcom::Date->parse("BEF 1834");

    # Warn when the marriage falls before the sixteenth birthday.
    print "Too young at marriage\n"
        if $birth->clone->add( years => 16 ) > $marr;
You really want to be able to write such a validation rule, without
having to treat all different GEDCOM date formats in your code
explicitly.
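The text-output item in the list above can be sketched in the same spirit. This is a standalone illustration that does not use Gedcom::Date; describe() and both lookup tables are hypothetical, and only single-word qualifiers are handled.

```perl
use strict;
use warnings;

# Hypothetical English renderings of some GEDCOM date qualifiers.
my %qualifier = (
    ABT => 'around',
    BEF => 'before',
    AFT => 'after',
    EST => 'estimated',
    CAL => 'calculated',
);

my %month_name;
@month_name{qw(JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC)} =
    qw(January February March April May June July
       August September October November December);

# Replace each known token with its English form; leave everything else alone.
sub describe {
    my ($text) = @_;
    return join ' ',
        map { $qualifier{$_} // $month_name{$_} // $_ } split ' ', $text;
}

print describe('ABT APR 1820'), "\n";    # prints: around April 1820
```

Ranges such as "BET ... AND ..." need more than word-for-word substitution and are deliberately left out.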
> Most tools would have no idea how to handle "Between May and July
> 1678", let alone something like "Easter Sunday in either 1783 or 1785".
That most tools can't handle the first date is no reason not to try to
accept it in your own scripts. The second date is not really expressible
in GEDCOM, except as an (unparsable) date phrase.
> (*) I say clear, but what about times and time zones, or calendar
> changes? And no doubt there are other complexities. Rarely is
> anything clear-cut in genealogy.
GEDCOM was originally designed as an output scheme for genealogical
conclusions. Calendar changes should have been handled by the
genealogist or the program that created a GEDCOM file; when a date has
been outputted to a GEDCOM, it refers to a definite date in a known
calendar (either explicitly by a @#D...@ escape, or implicitly to the
Gregorian calendar).
Times are outside the scope of GEDCOM; the standard has defined no tags
for them.
So while times, time zones and calendar changes are problematical in
genealogy, they shouldn't be a problem when interpreting a valid GEDCOM
file.
Eugene
It remembers the parts of the date (dmy, or my, or just y) that are
known, and only uses those components. It fabricates the missing day or
month where necessary, but these are never returned to the outside
world.
> Do you support one of the GEDCOM concepts About, Calculated, Estimated or
> Interpreted?
Not at this moment. "1900" is equivalent to "BET 1 JAN 1900 AND 31 DEC
1900", and "BET 1897 AND 1903" is roughly equivalent to "ABT 1900", but I
haven't (yet) added a method to convert these GEDCOM date strings to
each other.
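The first of those equivalences could be sketched as below. This is not a Gedcom::Date method; to_range() is a hypothetical helper that assumes Gregorian dates and covers only the "year" and "month year" forms.

```perl
use strict;
use warnings;

my @abbreviations = qw(JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC);
my @days_in_month = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);

# Gregorian leap-year rule.
sub is_leap_year {
    my ($year) = @_;
    return ($year % 4 == 0 && $year % 100 != 0) || $year % 400 == 0 ? 1 : 0;
}

# Expand a partial date into the equivalent BET ... AND ... string.
# Returns undef for any form it does not recognise.
sub to_range {
    my ($text) = @_;
    if ($text =~ /^(\d{1,4})$/) {
        return "BET 1 JAN $1 AND 31 DEC $1";
    }
    if ($text =~ /^([A-Z]{3})\s+(\d{1,4})$/) {
        my ($month, $year) = ($1, $2);
        my ($index) = grep { $abbreviations[$_] eq $month } 0 .. $#abbreviations;
        return unless defined $index;
        my $last = $days_in_month[$index];
        $last = 29 if $index == 1 && is_leap_year($year);
        return "BET 1 $month $year AND $last $month $year";
    }
    return;
}

print to_range('1900'), "\n";        # prints: BET 1 JAN 1900 AND 31 DEC 1900
print to_range('FEB 1900'), "\n";    # prints: BET 1 FEB 1900 AND 28 FEB 1900
```

The "ABT" direction is left out on purpose, since how wide "about" should be is a judgment call rather than a calculation.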
Eugene
On Tue, 2011-09-20 at 20:40 +0200, Eugene van der Pijll wrote:
> Paul Johnson schreef:
> > 2. As Stephen also notes, I didn't feel that the GEDCOM specification's
> > description of dates was sufficient anyway. So even if I, or someone
> > else, were to fully implement it, I didn't think it would be a full
> > solution.
>
> There are several things in the GEDCOM specification that are definitely
> missing: phrases like "FROM BEF 1820 TO ABT 1825" for example. It would
> be interesting to develop a more complete GEDCOM-like date grammar.
There's always the problem of getting people to stick to any 'standard'.
> > And then there's the question of what are you going to do with the
> > dates anyway?
>
> There are several things that a GEDCOM Date module (or a program) would
> want to do with a date:
>
> * Validate
> * Date math / checking, such as "is date B more than 16 years after date
> A?"
> * Text output, e.g. for report writing ("ABT APR 1820" => "around April 1820").
>
> All of these are still useful even with the more interesting GEDCOM date
> formats.
>
> For example, using my own module:
>
> use Gedcom::Date;
>
> my $birth = Gedcom::Date->parse("BET JUL 1820 AND JUL 1825");
> my $marr = Gedcom::Date->parse("BEF 1834");
>
> print "Too young at marriage\n"
> if $birth->clone->add( years => 16 ) > $marr;
>
> You really want to be able to write such a validation rule, without
> having to treat all different GEDCOM date formats in your code
> explicitly.
>
> > Most tools would have no idea how to handle "Between May and July
> > 1678", let alone something like "Easter Sunday in either 1783 or 1785".
>
> That most tools can't handle the first date is no reason not to try to
> accept it in your own scripts. The second date is not really expressible
> in GEDCOM, except as an (unparsable) date phrase.
Seems to me we're talking about 2 different things:
o What researchers record, which is exported as a GEDCOM date.
o What syntax a parser provides to give programmers access to the date
data.
The more latitude the former have, the more complexity the latter needs.
> > (*) I say clear, but what about times and time zones, or calendar
> > changes? And no doubt there are other complexities. Rarely is
> > anything clear-cut in genealogy.
>
> GEDCOM was originally designed as an output scheme for genealogical
> conclusions. Calendar changes should have been handled by the
> genealogist or the program that created a GEDCOM file; when a date has
> been outputted to a GEDCOM, it refers to a definite date in a known
> calendar (either explicitly by a @#D...@ escape, or implicitly to the
> Gregorian calendar).
>
> Times are outside the scope of GEDCOM; the standard has defined no tags
> for them.
>
> So while times, time zones and calendar changes are problematical in
> genealogy, they shouldn't be a problem when interpreting a valid GEDCOM
> file.
>
> Eugene
>