On Wed, 3 Mar 2004, Zellyn Hunter wrote:
> For those of you not up to speed on how much date and time confusion there is, I remember this article by Dave Rolsky as being interesting:
You might also be interested in what Dave and others have done in an attempt to rationalize date and time handling for Perl 5. We have hashed and rehashed all the issues brought up here in the last few days on the datetime list. Including lengthy discussions about handling leap-seconds, TAI, etc. <plug>http://datetime.perl.org</plug>
> Interesting -- so the planet's finally gotten its act together and settled on a rotational speed, huh? Cool. :)
Nobody said anything about a planet. A clock should tell you how much time has gone by. If I get a time, sleep(1), and get another time, time2 should be about time1 + 1 second, regardless of leap seconds, what planet I'm on, etc. Translating a timestamp into a date is a separate issue from keeping track of time itself, and should be handled separately.
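A minimal sketch of the distinction being drawn here, written in Python rather than Perl because the Parrot interfaces under discussion did not exist yet (so this is an analogy, not Parrot code): `time.monotonic()` is documented as unaffected by system clock updates, so it measures elapsed time, while `time.time()` reports the wall clock, which can be stepped. The helper name `measure_elapsed` is illustrative.

```python
import time

def measure_elapsed(sleep_s):
    """Time a sleep with both clocks; returns (monotonic_elapsed, wall_clock_elapsed)."""
    mono_start, wall_start = time.monotonic(), time.time()
    time.sleep(sleep_s)
    # time.monotonic() never goes backwards and ignores system clock updates;
    # time.time() is wall-clock time and can jump if the clock is stepped.
    return time.monotonic() - mono_start, time.time() - wall_start
```

Calling `measure_elapsed(1.0)` should report roughly one second on the monotonic clock no matter what happens to the wall clock in between.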
__________________________________ Do you Yahoo!? Yahoo! Search - Find what you’re looking for faster http://search.yahoo.com
>> Interesting -- so the planet's finally gotten its act together and settled on a rotational speed, huh? Cool. :)
>Nobody said anything about a planet.
Actually, they did. UTC (which was the original reference) is defined such that noon is within .9 seconds of the sun being as directly overhead as it can be, and is thus directly tied to the behaviour of the planet.

-- Dan
--------------------------------------"it's like this"------------------- Dan Sugalski even samurai d...@sidhe.org have teddy bears and even teddy bears get drunk
> >> Interesting -- so the planet's finally gotten its act together and settled on a rotational speed, huh? Cool. :)
> >Nobody said anything about a planet.
> Actually, they did. UTC (which was the original reference) is defined such that noon is within .9 seconds of the sun being as directly overhead as it can be, and is thus directly tied to the behaviour of the planet.
My point remains. Don't use UTC, then :) That's why we want to ignore leap seconds.
On Thu, Mar 04, 2004 at 09:12:47AM -0500, Dan Sugalski wrote:
: At 7:30 PM -0800 3/3/04, TOGoS wrote:
: >> Interesting -- so the planet's finally gotten its act together and settled on a rotational speed, huh? Cool. :)
: >
: >Nobody said anything about a planet.
:
: Actually, they did. UTC (which was the original reference) is defined such that noon is within .9 seconds of the sun being as directly overhead as it can be, and is thus directly tied to the behaviour of the planet.
The basic problem is that there are times you need to know exactly how many seconds have passed between two timestamps, and times you need to know exactly how many days have passed between two timestamps, and the two are in conflict. However, I think the more basic problem of the two is the first one. If you ignore that problem, you get all sorts of breakage down in random routines that you don't want breakage in, and this will only get worse as computers get faster and try to synchronize more low-level things.
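To make the conflict concrete, here is a hedged Python sketch around the leap second inserted at the end of 31 December 2016, when TAI-UTC went from 36 s to 37 s. POSIX timestamps pretend every day has 86400 seconds, so they give the "days" answer; adding the TAI-UTC offset gives the "seconds" answer. The two-entry table is a deliberately partial assumption, valid only from 2015-07-01 on, and the helper names are illustrative.

```python
import calendar

# Deliberately partial TAI - UTC table (assumption): valid only from 2015-07-01,
# covering the leap second inserted as 2016-12-31 23:59:60 UTC.
PARTIAL_LEAPS = [
    (calendar.timegm((2015, 7, 1, 0, 0, 0)), 36),
    (calendar.timegm((2017, 1, 1, 0, 0, 0)), 37),
]

def tai_minus_utc(posix_ts):
    """Offset in effect at posix_ts according to the partial table above."""
    offset = None
    for start, value in PARTIAL_LEAPS:
        if posix_ts >= start:
            offset = value
    if offset is None:
        raise ValueError("timestamp predates the partial leap-second table")
    return offset

def elapsed_si_seconds(a, b):
    """Seconds that actually elapsed between POSIX timestamps a and b."""
    return (b + tai_minus_utc(b)) - (a + tai_minus_utc(a))

noon_before = calendar.timegm((2016, 12, 31, 12, 0, 0))
noon_after = calendar.timegm((2017, 1, 1, 12, 0, 0))
# POSIX difference: 86400, i.e. exactly one calendar day.
# elapsed_si_seconds(noon_before, noon_after): 86401, counting the leap second.
```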
In contrast, the other problem tends to manifest in user interfaces, where it can typically be solved by some snap-to-grid semantics at the minute, hour, and day boundaries, at least until the second timer and the Earth drift apart by 30 seconds or so. Which will take a while.
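The snap-to-grid idea can be sketched as rounding a timestamp to the nearest minute, hour, or day boundary. With a one-minute grid, a clock that is off by less than 30 seconds still lands on the right minute, which matches the drift budget mentioned above. This sketch works on integer POSIX seconds in UTC (local-time day boundaries would need a zone offset), and `snap_to_grid` is a hypothetical helper name.

```python
MINUTE, HOUR, DAY = 60, 3600, 86400

def snap_to_grid(posix_ts, grid_s):
    """Round an integer POSIX timestamp to the nearest multiple of grid_s seconds (UTC).

    Exact half-way values round up.
    """
    return (posix_ts + grid_s // 2) // grid_s * grid_s
```

For example, `snap_to_grid(1483228799, MINUTE)` (one second before 2017-01-01 00:00 UTC) snaps to `1483228800`, the midnight boundary.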
And anyone who needs greater UTC accuracy than that should probably be tracking TAI-UTC offsets anyway, or have some other way of resyncing their clock at least semiyearly.
So anyway, I don't care whether Parrot builds in support for complicated time systems like UTC. (Well, I do care some--it should probably be a library in any event.) But I do care that C<time - $^T> be accurate. (In fact, I'd like $^T to change to a floater too, on systems that can support it. It's way past time for hi-res timing to be the default, I think.)
On Thu, 4 Mar 2004, TOGoS wrote:
> > Actually, they did. UTC (which was the original reference) is defined such that noon is within .9 seconds of the sun being as directly overhead as it can be, and is thus directly tied to the behaviour of the planet.
UTC may stop tracking UT1 and stop having [new] leap-seconds. Even if this happens we'll still have to take leap-seconds into account for dates prior to 1999.
> My point remains. Don't use UTC, then :) That's why we want to ignore leap seconds.
I agree. That's why I'd like to see TAI be the default "seconds since some epoch".
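Deriving a TAI-based count from the usual UTC/POSIX count needs a table of leap seconds, which is also why leap seconds before 1999 stay relevant even if UTC stops adding new ones. The Python sketch below embeds the TAI-UTC table from 1972 through the leap second at the end of 2016 (it would need updating if more are ever announced; the pre-1972 era is out of scope). `posix_to_tai_count` is a hypothetical name: it adds the offset to POSIX seconds so that differences between results count leap seconds, but it is not a complete conversion to a TAI epoch.

```python
import bisect
import calendar

# (UTC date the offset took effect, TAI - UTC in seconds), 1972 through the
# leap second at the end of 2016. Later announcements would have to be appended.
_LEAP_TABLE = [
    ((1972, 1, 1), 10), ((1972, 7, 1), 11), ((1973, 1, 1), 12), ((1974, 1, 1), 13),
    ((1975, 1, 1), 14), ((1976, 1, 1), 15), ((1977, 1, 1), 16), ((1978, 1, 1), 17),
    ((1979, 1, 1), 18), ((1980, 1, 1), 19), ((1981, 7, 1), 20), ((1982, 7, 1), 21),
    ((1983, 7, 1), 22), ((1985, 7, 1), 23), ((1988, 1, 1), 24), ((1990, 1, 1), 25),
    ((1991, 1, 1), 26), ((1992, 7, 1), 27), ((1993, 7, 1), 28), ((1994, 7, 1), 29),
    ((1996, 1, 1), 30), ((1997, 7, 1), 31), ((1999, 1, 1), 32), ((2006, 1, 1), 33),
    ((2009, 1, 1), 34), ((2012, 7, 1), 35), ((2015, 7, 1), 36), ((2017, 1, 1), 37),
]
# POSIX timestamps of each change, for binary search.
_STARTS = [calendar.timegm(date + (0, 0, 0)) for date, _ in _LEAP_TABLE]
_OFFSETS = [offset for _, offset in _LEAP_TABLE]

def tai_minus_utc(posix_ts):
    """TAI - UTC (seconds) in effect at a POSIX timestamp, 1972 onward."""
    i = bisect.bisect_right(_STARTS, posix_ts) - 1
    if i < 0:
        raise ValueError("table starts at 1972-01-01")
    return _OFFSETS[i]

def posix_to_tai_count(posix_ts):
    """POSIX seconds plus TAI - UTC: differences between results count leap seconds."""
    return posix_ts + tai_minus_utc(posix_ts)
```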
>> I agree. That's why I'd like to see TAI be the default "seconds since some epoch".
> It seems like we want the "default" to be the simplest and fastest one we can support.
Exactly. Which is the count that the system C RTL returns when you ask it, normally claimed to be GMT or something very much like it. If the local system returns localtime, I can see adjusting to GMT or UTC, or whatever, as that ought to be a trivial transform. Past that libraries can deal with it.

-- Dan
> If the local system returns localtime, I can see adjusting to GMT or UTC, or whatever, as that ought to be a trivial transform.
Er, I'm not so sure about that. That means you'd have to be 100% sure that you can determine the local timezone without any ambiguity. That has not proven to be the case if the DateTime.pm project is any indication...
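The difficulty can be illustrated with a deliberately tiny, hard-coded zone: US Eastern time in 2004, when DST ran from 4 April 02:00 EST to 31 October 02:00 EDT. The Python sketch below uses hypothetical helper names, and its rules are an assumption standing in for a real time zone database. It returns every UTC instant that a naive local reading could mean: usually one, two during the repeated hour in autumn, and none during the skipped hour in spring.

```python
from datetime import datetime, timedelta, timezone

# Hard-coded 2004 US Eastern rules (illustrative assumption, not a tz database).
EST = timezone(timedelta(hours=-5), "EST")
EDT = timezone(timedelta(hours=-4), "EDT")
DST_START_UTC = datetime(2004, 4, 4, 7, 0, tzinfo=timezone.utc)   # 02:00 EST
DST_END_UTC = datetime(2004, 10, 31, 6, 0, tzinfo=timezone.utc)   # 02:00 EDT

def zone_at(instant_utc):
    """Offset actually in effect at a given UTC instant under the 2004 rules."""
    return EDT if DST_START_UTC <= instant_utc < DST_END_UTC else EST

def local_to_utc_candidates(wall):
    """All UTC instants whose Eastern wall-clock reading equals the naive `wall`."""
    results = []
    for tz in (EST, EDT):
        instant = wall.replace(tzinfo=tz).astimezone(timezone.utc)
        if zone_at(instant) is tz:  # keep only self-consistent interpretations
            results.append(instant)
    return results
```

A wall-clock reading of 01:30 on 31 October 2004 comes back with two candidates an hour apart, so "adjusting localtime to GMT" needs extra information (or a policy) to pick one.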
[Gordon == malic...@mac.com on Wed, 3 Mar 2004 16:48:45 -0500]
Gordon> gmclock(out Nx)
Gordon>     UTC clock in seconds since 0000 hrs Jan 1, 2000,
Gordon>     ignoring leap seconds.

Gordon> tolocal out Nx, out Iy, in Nz
Gordon>     x is set to z converted to the local time zone. y <- 1
Gordon>     if Daylight Savings Time was in effect at z; y <- 0
Gordon>     otherwise.

Gordon> splittime Px, Nx
Gordon>     Splits date up like Perl 5 gmtime. (But without
Gordon>     annoying y -= 1900 and m -= 1?)

Gordon> add_months(out Nx, in Ny, in Nz)
Gordon>     Sets x to y + z months.
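Gordon's `gmclock` and `splittime` can be approximated in Python as a sketch, assuming POSIX time underneath (which already ignores leap seconds) and Jan 1 2000 as the epoch. Index 5 of the returned list is the year, matching the `Py[5]` in his later fragment. `tolocal` is left out because it needs a time zone database, and since the proposal does not pin down the fields after the year, this sketch stops there.

```python
import calendar
import time

# Jan 1 2000 00:00 UTC expressed in POSIX seconds (946684800).
EPOCH_2000 = calendar.timegm((2000, 1, 1, 0, 0, 0))

def gmclock():
    """Seconds since 2000-01-01 00:00 UTC, ignoring leap seconds (POSIX time does too)."""
    return time.time() - EPOCH_2000

def splittime(secs):
    """Split seconds-since-2000 into [sec, min, hour, mday, mon, year].

    Months are 1-based and years are full years, i.e. without Perl 5's
    year - 1900 and month - 1 quirks.
    """
    t = time.gmtime(EPOCH_2000 + secs)
    return [t.tm_sec, t.tm_min, t.tm_hour, t.tm_mday, t.tm_mon, t.tm_year]
```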
This proposal was perhaps taken offline by the core team, but to avoid a Warnock, I'll pipe up and say I like this proposal as a core instruction set.
It'd be nice if the first item, "gmclock", wasn't defined in terms of "UTC". Regardless of the future fate of UTC leap seconds, any UTC-based clock would need to account for leap seconds going back, right? So it seems that GMT should be preferred in the core, with any UTC calculations being performed at the language or module level.
So, "gmclock" seems the right opcode (not "utcclock"), making the definition "GMT clock in seconds since 0000 hrs Jan 1, 2000".
Daylight savings time calculation seems appropriate to stay in the core, but perhaps additional opcodes are needed to set the clock's understanding of the time zone used for DST calculations?
Gordon> # years
Gordon> Nz = Py[5]
Gordon> Nz = Nz - 2000   # epoch based at year 2000
Gordon> Nz = Nz * 12     # = months per year
<inappropriate sarcasm>I look forward to the exciting advancement of changing from subtracting 1900 from all my dates to subtracting 2000 or dealing with negative numbers; that'll change everything and really advance the state of the art.</sarcasm>
Of course, as Gordon suggests, what I'll actually be doing is relying on some Perl 6 module to give me a nice (non-negative date) object interface. The Parrot instruction should probably be optimized for speed. Formatting can always be kludged together at the language level, but a module will need solid support from Parrot for good duration calculations. Gordon's proposal is tuned for duration calculations, which is ideal.
It'd be great, if possible, to get a statement here from the DateTime group sanctioning one or another particular Parrot clock interface as "efficient, complete, and preferred". I haven't poked over to see if they are chatting about this Parrot issue.
> Leave parsing and formatting entirely to libraries. Absolutely no need > for that in the instruction set.
well, I have a bit of a problem with that... As it was pointed out before, people have gone hogwild with the parsing and formatting routines, and it's a bloodbath of modules on CPAN with different methods for time parsing.
Which is a pity, because an augmented strftime and its reverse would handle most if not all of the parsing needs, ie:
strftime( out Sx, in Sy, in Nz )
where
Sx is an output date string
Sy is a format described by strftime (e.g. %y = 2-digit year, %Y = 4-digit year, etc.)
Nz is the number of seconds past a certain epoch (provided by gmclock, time)
r_strftime( out Nx, in Sy, in Sz )
where Nx is the number of seconds past a certain epoch
Sy is a format for a given date (provided by strftime)
Sz is the date to be converted to in seconds
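Python wraps the C library's `strftime` and ships its own `strptime`, so the proposed pair can be sketched on top of them. Assumptions: seconds are counted from the POSIX 1970 epoch rather than the proposed 2000 epoch, all times are UTC, and `strftime_op`/`r_strftime` are hypothetical names mirroring the proposal, not an existing Parrot API.

```python
import calendar
import time

def strftime_op(fmt, secs):
    """Format UTC POSIX seconds with a strftime pattern (the proposed strftime op)."""
    return time.strftime(fmt, time.gmtime(secs))

def r_strftime(fmt, text):
    """Reverse of strftime_op: parse `text` with `fmt` and return UTC POSIX seconds."""
    return calendar.timegm(time.strptime(text, fmt))
```

A round trip such as `r_strftime("%d-%b-%Y", strftime_op("%d-%b-%Y", secs))` drops the time of day, since the format carries only the date.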
Perhaps there could be an extra argument for locales - and maybe the op could be split in two (ie: for (r_)gm_fmtime, (r_)local_fmtime), but for me, in processing time values most of the time you know the format for dates - as long as they are machine generated. If they are not, you can always fall back to the very slow Date::Parse and cousins, or maybe string these ops along in a chain of '||' to query for multiple formats.
But how many times are you going to need to parse formats like '3 weeks from next Wednesday?'.
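The "chain of '||'" fallback mentioned above can be sketched as trying a list of formats in order and returning the first successful parse. The format list here is purely illustrative, and order matters: an input like 03/09/2004 is ambiguous between month-first and day-first layouts, so whichever format comes first wins.

```python
import calendar
import time

def parse_any(text, formats):
    """Return UTC POSIX seconds for the first format in `formats` that parses `text`."""
    for fmt in formats:
        try:
            return calendar.timegm(time.strptime(text, fmt))
        except ValueError:
            continue  # this format did not match; fall through to the next one
    raise ValueError(f"no known format matches {text!r}")

# Illustrative format list; order encodes which reading wins for ambiguous input.
FORMATS = ["%Y-%m-%d %H:%M:%S", "%d-%b-%Y", "%m/%d/%Y"]
```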
Whether or not this is a low-level module or an op is debatable, but standardizing on one way of formatting dates at the lowest possible level would make perl6 much, much cleaner. And right now, things are way too slow - any low-level module doing this would IMO need to do it in C anyway to get the fastest possible speed, which seems to argue for them being ops.
> > gmclock(out Nx)
> >     UTC clock in seconds since 0000 hrs Jan 1, 2000,
> >     ignoring leap seconds.

> > tolocal out Nx, out Iy, in Nz
> >     x is set to z converted to the local time zone. y <- 1
> >     if Daylight Savings Time was in effect at z; y <- 0
> >     otherwise.

> > splittime Px, Nx
> >     Splits date up like Perl 5 gmtime. (But without
> >     annoying y -= 1900 and m -= 1?)

> > add_months(out Nx, in Ny, in Nz)
> >     Sets x to y + z months.
> This proposal was perhaps taken offline by the core team, but to avoid a Warnock, I'll pipe up and say I like this proposal as a core instruction set.
Thanks for the anti-Warnock.
> It'd be nice if the first item, "gmclock" wasn't defined in terms of "UTC". Regardless of the future fate of UTC leap seconds, any UTC-based clock would need to account for leap seconds going back, right? So it seems that GMT should be preferred in the core, with any UTC calculations being performed at the language or module level.

> So, "gmclock" seems the right opcode (not "utcclock"), making the definition "GMT clock in seconds since 0000 hrs Jan 1, 2000".
Fair enough. I wasn't trying to load terms for once. In point of fact, this set of opcodes is very much agnostic to whether or not leap seconds are used.
The reason that add_months belongs as an opcode is because there's been no decision made re leap seconds, and in fact some anti-decisions have been proposed. Without some way to manipulate dates in accordance with the actual rules that get used (which might be platform dependent at this point), a portable program can't be written. Were this decision made solidly, then all of the ops except gmclock and possibly tolocal would be better written as library code.
Were leap seconds to fall in the middle of a month, then we would need to have an add_days op. Since that's not the case, the assumption that a day is 86400 seconds long is safe so long as you avoid adding enough days to cross a month boundary. Meanwhile, add_months encapsulates the leap-second worries AND lets IMC code conveniently manipulate dates without days-per-month tables and is-it-a-leap-year algorithms (which are required to convert add_months to add_days).
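A Python sketch of `add_months`, using the seconds-since-2000 representation, shows the days-per-month and leap-year knowledge that the op would hide. Clamping the day of month is an assumption (the proposal does not say what Jan 31 plus one month should be), and leap seconds are ignored, as POSIX time ignores them.

```python
import calendar
import time

EPOCH_2000 = calendar.timegm((2000, 1, 1, 0, 0, 0))  # 946684800

def add_months(secs, months):
    """x = y + z months, with y and x in seconds since 2000-01-01 00:00 UTC.

    Assumptions: leap seconds are ignored, the time of day is kept, and the
    day of month is clamped (Jan 31 + 1 month -> Feb 29 in 2000).
    """
    t = time.gmtime(EPOCH_2000 + secs)
    # Count months from year 0 so that divmod handles carries and negative shifts.
    total = t.tm_year * 12 + (t.tm_mon - 1) + months
    year, mon0 = divmod(total, 12)
    days_in_month = calendar.monthrange(year, mon0 + 1)[1]  # handles leap years
    day = min(t.tm_mday, days_in_month)
    return calendar.timegm((year, mon0 + 1, day, t.tm_hour, t.tm_min, t.tm_sec)) - EPOCH_2000
```

Adding a fixed 30 days instead would turn Jan 15 into Feb 14, which is exactly the kind of per-month bookkeeping the op is meant to absorb.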
> Daylight savings time calculation seems appropriate to stay in the core, but perhaps additional opcodes are need to set the clock's understanding of the time zone used for DST calculations?
Indeed; "the local time zone" is pretty narrow-minded and backwards-thinking. The problem is balancing weight with efficiency, and time zone databases are pretty heavy critters, which make me think "get thee OUT of the core!"...
> > # years
> > Nz = Py[5]
> > Nz = Nz - 2000   # epoch based at year 2000
> > Nz = Nz * 12     # = months per year
> <inappropriate sarcasm>I look forward to the exciting advancement of changing from subtracting 1900 from all my dates to subtracting 2000 or dealing with negative numbers; that'll change everything and really advance the state of the art.</sarcasm>
What you're complaining about is the elements of Perl's gmtime/localtime arrays. Actually, I suggested removing that quirk from splittime:
> > splittime Px, Nx
> >     Splits date up like Perl 5 gmtime. (But without
> >     annoying y -= 1900 and m -= 1?)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The Nz - 2000 is conversion *to* the epoch, not reinterpreting the value in the array.
Unless you think storing dates and times as the number of seconds since Jan 1 0000 is a good idea, perhaps. :)
I'd used Jan 1 2000 as an epoch since Larry suggested that's what he was planning for Perl 6 to use.
--
Gordon Henriksen IT Manager ICLUBcentral Inc. gor...@iclub.com
Edward S. Peschko wrote:
> Gordon Henriksen wrote:

> > Leave parsing and formatting entirely to libraries.
> > Absolutely no need for that in the instruction set.

> well, I have a bit of a problem with that... As it was pointed out before, people have gone hogwild with the parsing and formatting routines, and its a bloodbath of modules on CPAN with different methods for time parsing.
"Not an opcode" doesn't mean "balkanized." There is a parrot/stdlib directory.
> Perhaps there could be an extra argument for locales - and maybe the op could be split in two (ie: for (r_)gm_fmtime, (r_)local_fmtime), but for me, in processing time values most of the time you know the format for dates - as long as they are machine generated. If they are not, you can always fall back to the very slow Date::Parse and cousins, or maybe string these ops along in a chain of '||' to query for multiple formats.

> But how many times are you going to need to parse formats like '3 weeks from next Wednesday?'.
Uh. Yeah.
This sort of creeping featuritis is why date formatting and especially parsing do NOT belong as opcodes. It's too big a problem to solve in the core, and regardless of how rich the interface is, it'll never be quite rich enough to satisfy everyone.
Someone depends on formats like "3 weeks from next Wednesday" on a regular basis. You might not need to format dates from the Chinese lunar calendar, but you can bet that it's vital to someone. Eras? Well, sure. Fuzzy date parsing? (Guessing.) Oh, baby. Global time zone database? Bring it on. A kl-hw (Klingon Homeworld) locale? But of course!
There'll always be pressure to further enhance the feature, increasing parrot's *core* memory footprint. You can't skip loading the implementation of core parrot ops like you can avoid DateTime.pm.
So keep it to (un)loadable user code, and provide a class that solves 90% of the problem in the stdlib.
Also, trying to nail down a rich interface before seeing what Larry has in mind for Perl 6 date handling is roughly a waste of time....
On Tue, Mar 09, 2004 at 04:21:24PM -0500, Gordon Henriksen wrote:
> Edward S. Peschko wrote:

> > Gordon Henriksen wrote:
> > > Leave parsing and formatting entirely to libraries.
> > > Absolutely no need for that in the instruction set.

> > well, I have a bit of a problem with that... As it was pointed out before, people have gone hogwild with the parsing and formatting routines, and its a bloodbath of modules on CPAN with different methods for time parsing.

> "Not an opcode" doesn't mean "balkanized." There is a parrot/stdlib directory.
fair enough, but then where does the distinction lie? Why put gmtime, et al. in opcodes? As well as addmonth?
If you are optimising for simplicity, it surely makes sense to put these in the standard library.
If you are optimising for speed, then it makes sense to put them in opcodes.
It all comes down to how fast one is versus the other. If everything is going to be built in the standard library based off of bytecode then there will be a performance hit. How severe, I don't know, but definitely a performance hit.
And given the prevalence of string and date manipulation in everyday tasks, I would suggest not having that performance hit.
> > But how many times are you going to need to parse formats like '3 weeks from next Wednesday?'.

> Uh. Yeah.

> This sort of creeping featuritis is why date formatting and especially parsing do NOT belong as opcodes. It's too big a problem to solve in the
But I don't see the 'creeping featuritis' that you see. As well as the memory imprint problems that you see. Is parrot going to link with libc?
If so, you get a ton of functionality for (almost) free, just by doing an intelligent wrapper around strftime. As well as providing much of your localization support. And you get a natural boundary on what functionality should be put (in the core) versus what should be put in a module.
And any scheme that Larry would come up with would be implemented in terms of what would be in parrot.
The problem would then come down to filling in deficiencies of individual C89 implementations (which hopefully will lessen as time goes on), making an intelligent interface into strftime, and doing a reverse lookup. How I'd love to be able to do in my code:
where r_strftime converted back to floating point seconds and then strftime reconverted back into @date_formats..
or
    my $secs = gmclock();
    my @oracle_dates = map( strftime("%d-%b-%Y", addmonths($secs, $_)), (1..10) );
making 10 dates each spaced a month apart and formatting them for inclusion into oracle.
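A rough Python equivalent of the snippet above can be run today. This is a sketch under stated assumptions: `add_months` clamps the day of month, everything is in UTC, and `%b` yields English month abbreviations only under the default C locale. `oracle_dates` is a hypothetical helper name.

```python
import calendar
import time

def add_months(posix_secs, months):
    """Shift a UTC POSIX timestamp by whole calendar months, clamping the day of month."""
    t = time.gmtime(posix_secs)
    year, mon0 = divmod(t.tm_year * 12 + (t.tm_mon - 1) + months, 12)
    day = min(t.tm_mday, calendar.monthrange(year, mon0 + 1)[1])
    return calendar.timegm((year, mon0 + 1, day, t.tm_hour, t.tm_min, t.tm_sec))

def oracle_dates(start_secs, count=10):
    """Dates 1..count months after start_secs, in the %d-%b-%Y layout."""
    return [time.strftime("%d-%b-%Y", time.gmtime(add_months(start_secs, n)))
            for n in range(1, count + 1)]
```

Starting from 31 January 2000, the first three results are 29-Feb-2000, 31-Mar-2000 and 30-Apr-2000, because each month is added to the original date rather than to the previous result.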
Given how often this is done, I'd want to do it *fast*. As I see it, with an opcode this will be as fast as possible, approaching C speed.
So - if you can guarantee that there isn't going to be a lot of overhead by putting it in stdlib, then sure by all means it should be put there.
> Someone depends on formats like "3 weeks from next Wednesday" on a regular basis. You might not need to format dates from the Chinese lunar calendar but you can bet that it's vital to someone.
Look at strftime(3C) some time. I'd say that the problem has been to a large extent solved. Yes, it doesn't handle TAI or other epochs, which could be handled by an offset, and yes, there aren't certain
I would think that most of this can
> parsing? (Guessing.) Oh, baby. Global time zone database? Bring it on. A kl-hw (Klingon Homeworld) locale? But of course!
Most of the above is already there (well, maybe not klingon). Do you want to reinvent everything? What's not there could be implemented in terms of a wrapper around the opcode. Or augmented inside the opcode. It would be a judgement call as to which though.
> There'll always be pressure to further enhance the feature, increasing parrot's *core* memory footprint. You can't skip loading the implementation of core parrot ops like you can avoid DateTime.pm.
Resist the temptation then. Like I said, strftime provides quite a lot.
> So keep it to (un)loadable user code, and provide a class that solves 90% of the problem in the stdlib.
like I said, if there isn't a performance penalty for this, then go ahead. If there is a substantial performance penalty for every time I manipulate a date, you could easily make the argument 'so keep the core ops relatively small, which solve 90% of the problem, and put the rest into a wrapper inside stdlib.'
> Also, trying to nail down a rich interface before seeing what Larry has in mind for Perl 6 date handling is roughly a waste of time....
well, no, I don't think so.. At least if I was developing perl6 I'd want some input on it. The RFCs after all are 3 years old, and at some point there is going to have to be a merging between perl6-internals and perl6-language..
At 03:52 PM 3/9/2004 -0800, Edward S. Peschko wrote:
>On Tue, Mar 09, 2004 at 04:21:24PM -0500, Gordon Henriksen wrote:
>> "Not an opcode" doesn't mean "balkanized." There is a parrot/stdlib directory.

>fair enough, but then where does the distinction lie? Why put gmtime, et al. in opcodes? As well as addmonth?

>If you are optimising for simplicity and , it surely makes sense to put these in the standard library.
>If you are optimising for speed, then it makes sense to put them in opcodes.
I don't think optimising for "X" is the reason. Parrot should have concise, necessary, complete opcode primitives upon which anything else can be built.
Date parsing can be done all "with" opcodes, but please not "inside" opcodes.
If we cannot provide a decently performing VM that makes people want to write stuff in bytecode (or compiled to bytecode) we have failed anyway.
>It all comes down to how fast one is versus the other. If everything is going to be built in the standard library based off of bytecode then there will be a performance hit. How severe, I don't know, but definitely a performance hit.
This is subjective reasoning, though. If we want to talk performance hits, we should know where we stand. When bytecode isn't full of PMC thrashing, and actually uses the low level Ix and Sx registers where possible, the JIT works extremely well and the performance beats Perl5 by several orders of magnitude in many cases. Hopefully the "hints" in Perl6 will help us write better libraries, but some of it may have to be written in IMC or a slightly higher-level language (the equivalent of C, compilable to bytecode).
> > > But how many times are you going to need to parse formats like '3 weeks from next Wednesday?'.

> > This sort of creeping featuritis is why date formatting and especially parsing do NOT belong as opcodes. It's too big a problem to solve in the
I agree.
>But I don't see the 'creeping featuritis' that you see. As well as the memory imprint problems that you see. Is parrot going to link with libc?
Yes, Parrot will link with libc.
Regardless of what we implement as opcodes, there will never be a single solution that fits all, and it will never, ever end. Just look at Linux.
> I don't think optimising for "X" is the reason. Parrot should have concise, necessary, complete opcode primitives upon which anything else can be built.
how about opcode formatting primitives? That is what I would argue strftime and relatives are... generic low-level ways of turning strings into dates into numbers (seconds) and vice-versa.
That they are part of a standard low-level library (libc) that is widely available and can be described in terms of parrot primitives (strings, numbers), and the fact that this low-level library is well known, tested and supported on various platforms just seems to me to be a tremendous timesaver.
> Date parsing can be done all "with" opcodes, but please not "inside" opcodes.
> If we cannot provide a decently performing VM that makes people want to write stuff in bytecode (or compiled to bytecode) we have failed anyway.
I agree that facilitating easy bytecode writing is really important - but reimplementing a pretty standard wheel and having that pretty standard wheel be slower than what is available doesn't seem to be useful to me. Taking the standard wheel and augmenting it seems more useful IMO.
> This is subjective reasoning, though. If we want to talk performance hits, we should know where we stand. When bytecode isn't full of PMC thrashing, and actually uses the low level Ix and Sx registers where possible, the JIT works extremely well and the performance beats Perl5 by several orders of magnitude in many cases. Hopefully the "hints" in Perl6 will help us write better libraries, but some of it may have to be written in IMC or slightly higher level language (on the equivalent of C, compilable to bytecode).
I tell you what - why don't you base your decision on an empirical test? I'll write a C program around strftime that formats a given set of dates and times, as well as doing the reverse translation, and we can benchmark that program versus the bytecode that parrot offers in a stdlib on a list of tasks.
After some optimising iterations, we'll compare the results. And depending on the performance difference, you can decide whether or not it's worth the effort to put these primitives in the core.
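A scaled-down version of this experiment can be sketched in Python, using `time.strftime` (which calls the C library) as a stand-in for a native op and a pure-Python formatter as a stand-in for bytecode in a stdlib. This is only an analogy, not a Parrot measurement. The day-number conversion is a Python port of Howard Hinnant's well-known `civil_from_days` algorithm, and the helper names are hypothetical. Checking that both paths agree comes before comparing any timings.

```python
import time
import timeit

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def format_c(secs):
    """C-backed path: time.strftime calls the platform's strftime (stand-in for an op)."""
    return time.strftime("%d-%b-%Y", time.gmtime(secs))

def civil_from_days(days):
    """Pure-Python days-since-1970 -> (year, month, day), proleptic Gregorian."""
    z = days + 719468
    era = z // 146097                                  # 400-year eras
    doe = z - era * 146097                             # day of era, 0..146096
    yoe = (doe - doe // 1460 + doe // 36524 - doe // 146096) // 365
    doy = doe - (365 * yoe + yoe // 4 - yoe // 100)    # day of year, March-based
    mp = (5 * doy + 2) // 153                          # month index, March = 0
    day = doy - (153 * mp + 2) // 5 + 1
    month = mp + 3 if mp < 10 else mp - 9
    year = yoe + era * 400 + (1 if month <= 2 else 0)
    return year, month, day

def format_pure(secs):
    """Pure-Python path (stand-in for a bytecode stdlib routine)."""
    year, month, day = civil_from_days(int(secs) // 86400)
    return f"{day:02d}-{MONTHS[month - 1]}-{year:04d}"

def benchmark(n=100_000):
    """Return (c_seconds, pure_seconds) to format n daily timestamps with each path."""
    samples = range(0, n * 86400, 86400)
    t_c = timeit.timeit(lambda: [format_c(s) for s in samples], number=1)
    t_pure = timeit.timeit(lambda: [format_pure(s) for s in samples], number=1)
    return t_c, t_pure
```

The ratio returned by `benchmark()` depends heavily on the interpreter and machine, which is exactly why the thread argues for measuring rather than guessing.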
> >> This sort of creeping featuritis is why date formatting and especially parsing do NOT belong as opcodes. It's too big a problem to solve in the
> I agree.
And the more I think about it, the more I think this concern is misplaced. Putting it in an opcode will make it *harder* to do this type of thing, not easier. In perl5 land, backwards - and forwards - compatibility is a big concern. There will be a large tendency to stabilize any implementation that occurs here.
On Tue, Mar 09, 2004 at 09:53:47PM -0500, Gordon Henriksen wrote:
> Edward,

> Want to call strptime? Use NCI. No need for anything new in the core.
> That's WHY it's the CORE.
I think there is a misunderstanding here. I don't think that putting strftime by itself in the core - which probably has system dependencies and inconsistencies - is a good idea. Or, for that matter, that calling strftime directly is a good idea.
What I think is a good idea is a light wrapper around strftime, one that irons out the inconsistencies inherent in different platforms, and adds a reverse function which takes a format and turns it back into a number of seconds.
I don't know of any other function that does this, and I can tell you as a heavy user of perl in a production environment having a lightweight, fast, function that does this is very very useful. The faster the better.
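The NCI suggestion and the reverse function meet in a libc call: `strptime(3)` fills a `struct tm`, and `timegm(3)` turns it into UTC seconds. In this sketch Python's `ctypes` plays the role of NCI (Python's own `time.strptime` is written in Python rather than calling libc). Assumptions: a Unix-like libc that exports both functions (glibc and the BSDs do; Windows does not), the glibc/BSD `struct tm` layout, and a `time_t` that fits a C `long`. `r_strftime_via_libc` is a hypothetical name.

```python
import ctypes
import ctypes.util

class Tm(ctypes.Structure):
    # glibc/BSD layout of struct tm (assumption: libc has tm_gmtoff and tm_zone).
    _fields_ = [("tm_sec", ctypes.c_int), ("tm_min", ctypes.c_int),
                ("tm_hour", ctypes.c_int), ("tm_mday", ctypes.c_int),
                ("tm_mon", ctypes.c_int), ("tm_year", ctypes.c_int),
                ("tm_wday", ctypes.c_int), ("tm_yday", ctypes.c_int),
                ("tm_isdst", ctypes.c_int), ("tm_gmtoff", ctypes.c_long),
                ("tm_zone", ctypes.c_char_p)]

# Load the C library; CDLL(None) falls back to symbols already in the process.
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)
libc.strptime.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.POINTER(Tm)]
libc.strptime.restype = ctypes.c_void_p   # NULL (None) means the parse failed
libc.timegm.argtypes = [ctypes.POINTER(Tm)]
libc.timegm.restype = ctypes.c_long       # time_t is a long on common LP64 Unix systems

def r_strftime_via_libc(fmt, text):
    """Parse `text` with libc strptime(fmt) and return UTC seconds via timegm."""
    tm = Tm()  # ctypes zero-fills the struct; strptime only sets fields it parses
    if libc.strptime(text.encode(), fmt.encode(), ctypes.byref(tm)) is None:
        raise ValueError(f"{text!r} does not match {fmt!r}")
    return libc.timegm(ctypes.byref(tm))
```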
Anyways - like I said, I don't have anything against it being in a standard library, as long as it doesn't have too much overhead and is reasonably efficient.
But I'm doubtful that this can really be done vs having it as an op - hence the idea of an extensive benchmark.
Ed
( ps - I looked for a 'stdlib' directory in parrot-0.1.0.. is it not there? Are stdlibs going to be shared between languages that host parrot? )
Edward S. Peschko wrote:
> On Tue, Mar 09, 2004 at 09:53:47PM -0500, Gordon Henriksen wrote:

>>Edward,

>>Want to call strptime? Use NCI. No need for anything new in the core.
>>That's WHY it's the CORE.

> I think there is a misunderstanding here. I don't think that strftime by itself in the core - which probably has system dependencies and inconsistencies - is a good idea. Or for that matter, calling strftime directly is a good idea.
> ...
> But I'm doubtful that this can really be done vs having it as an op - hence the idea of an extensive benchmark.
On a platform with a halfway decent JIT, a pure-PASM implementation could be as fast as an op-based one, given liberal use of the non-PMC registers. Maybe faster--no need to transcode to ASCII, extract a C string, or hack around platform X's quirks. And I'd imagine that the Parrot standard library would be written to be as fast as possible.
Besides, how fast does your date handling really need to be? I mean, *really*? Are you formatting eleventy billion dates in a tight loop or something?
-- Brent "Dax" Royal-Gordon <br...@brentdax.com> Perl and Parrot hacker
Oceania has always been at war with Eastasia.
[And once again, I show my incompetence with Thunderbird...]
At 08:12 PM 3/9/2004 -0800, Brent "Dax" Royal-Gordon wrote:

>On a platform with a halfway decent JIT, a pure-PASM implementation
>could be as fast as an op-based one, given liberal use of the non-PMC

Agree.

>Besides, how fast does your date handling really need to be? I mean, *really*? Are you formatting eleventy billion dates in a tight loop or something?
I actually have Perl programs that parse many many millions of billing records per day (with 2 dates per record) for certain wireless companies.
Sometimes, if the customer wants to do an audit, we have to process over a month's worth, so we are bound by the actual execution time of the Perl script and the access time of a Sleepycat (Berkeley DB) database. The Perl programs must be able to scale with call/message volume, and right now the only thing we can do to improve it is put faster processors on it. (We have 8-way boxes with 64GB RAM, so 4GB hashes work just fine, but 1.5 hrs per day is still very finite when you have to baby sit scripts for a week to give customers the answers they want to know). Granted, I could rewrite this stuff in C, but we typically modify these things on very short notice and Perl gives us the flexibility to react quickly.
So, when we are discussing dates, I am one very interested party.
-Melvin
PS: Sorry I'm so vague about the numbers. The customers are very sensitive about those numbers and I could get in trouble, but let's say they are in the billions for a rather small time sample.
> On a platform with a halfway decent JIT, a pure-PASM implementation could be as fast as an op-based one, given liberal use of the non-PMC registers. Maybe faster--no need to transcode to ASCII, extract a C string, or hack around platform X's quirks. And I'd imagine that the Parrot standard library would be written to be as fast as possible.
ok, well cool. We'll see then.
> Besides, how fast does your date handling really need to be? I mean, *really*? Are you formatting eleventy billion dates in a tight loop or something?
yes.
It's a major issue/pain in database conversions, and in funnelling data from system to system - in my case, approximately 4 date conversions for each of 2.5 billion rows on different systems, some of them live production. Dates converted from one system-standard Oracle format to another Oracle format. From MySQL to DB2. DB2 to Oracle. Date comparisons between different formats. (And yes, I have rolled 'my own' wrapper for handling this stuff. I use it both from perl and from generated C to do the conversions. However, I'm sick of rolling my own stuff to 'supplement' what IMO should be native to perl itself.)
In any case, like I said, I don't have any problem with standard libraries handling this stuff; as long as whatever library is:
a) reasonably fast - ie: within 20% of C
b) just as robust as the standard system stuff
c) has a reverse formatting function, going back to seconds from dates
d) is standard - handles 95% of necessary date manipulations
e) has a tight memory footprint.
I think it's a waste to reimplement everything, though. There is a LOT THERE in libc which could be leveraged. And it would definitely be a good idea for parrot to make dates almost as low-level an item as strings, numbers, etc. - ie: native datatypes or close to native. A formatting op which tied numbers to strings would do that quite nicely.
> Sometimes, if the customer wants to do an audit, we have to process over a month's worth, so we are bound by the actual execution time of the Perl script and the access time of a Sleepycat (Berkeley DB) database. The Perl programs must be able to scale with call/message volume, and right now the only thing we can do to improve it is put faster processors on it. (We have 8-way boxes with 64GB RAM, so 4GB hashes work just fine, but 1.5 hrs per day is still very finite when you have to baby sit scripts for a week to give customers the answers they want to know). Granted, I could rewrite this stuff in C, but we typically modify these things on very short notice and Perl gives us the flexibility to react quickly.
well, you could opt for generated C (which is what I do). I generate function calls based on the content of the SQL that is to be executed (ie: an md5 hash-like thingy) and bound back to flat files and/or perl data structures. It's a pain in the ****, but it works, and we are IO bound rather than processor bound on a system with *very* high IO throughput. We also generate the C from perl, of course.
Anyways, I'd much rather have perl be able to handle everything, for the reasons you mentioned. It's just too bad that it's too slow for the task at hand, given the system that we were using. Hence the post.
> So, when we are discussing dates, I am one very interested party.
> -Melvin
> PS: Sorry I'm so vague about the numbers. The customers are very sensitive about those numbers and I could get in trouble, but lets say they are in the billions for a rather small time sample.
(oops. didn't think of that one. fortunately, I was thinking about it some more, and my estimates I gave in my last email weren't completely accurate.. ;-))
On Sat, 6 Mar 2004, Jared Rhine wrote:
> It'd be great, if possible, to get a statement here from the DateTime group sanctioning one or another particular Parrot clock interfaces as "efficient, complete, and preferred". I haven't poked over to see if they are chatting about this Parrot issue.
A few of us have already offered opinions...
What we would like for DateTime.pm is an epoch that is _absolutely_ fixed across all platforms. perl does not provide this, so we are dependent on the behavior of time(), gmtime(), and Time::Local to make up for it. I have already suggested that Parrot include an op that returns TAI, but I would happily settle for gettimeofday(2). Whatever is decided upon, Parrot should commit to making it consistent across _all_ platforms.
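A platform-independent epoch can be approximated by measuring where the platform's epoch falls. Python's documentation suggests `time.gmtime(0)` for finding the epoch, and `calendar.timegm` converts that date back to seconds since 1970. This sketch rebases `time()`-style values onto a fixed UTC epoch (2000-01-01 by default, following the earlier gmclock proposal). The helpers are hypothetical and do nothing about leap seconds, which would still need a TAI table.

```python
import calendar
import time

def platform_epoch_shift():
    """Seconds from 1970-01-01T00:00:00Z to this platform's time() epoch (0 on POSIX)."""
    return calendar.timegm(time.gmtime(0))

def to_fixed_epoch(platform_secs, fixed_epoch=(2000, 1, 1, 0, 0, 0)):
    """Re-express a time()-style value as seconds since a fixed UTC epoch."""
    unix_secs = platform_secs + platform_epoch_shift()
    return unix_secs - calendar.timegm(fixed_epoch)
```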