Localising CSV dates to system timezone - ok ?

150 views
Skip to first unread message

Simon Michael

unread,
Sep 27, 2022, 7:52:01 PM9/27/22
to hledger
The current date-format docs say:

Note that although you can parse date-times which include a time zone,
that time zone is ignored; it will not change the date that is parsed.
This means when reading CSV data with times not in your local time zone,
dates can be "off by one".

This bit me recently, so in my dev branch they now say:

To parse date-times containing a time zone, include `%Z` (or `%z`, `%EZ` or `%Ez`) in the pattern
(see the formatTime link above).

Note: when you use these, hledger localises CSV dates to your system time zone.
(By contrast, hledger versions before 1.28 always generated UTC dates from CSV,
which meant you sometimes got entries with dates different from the ones you remembered.)

This means that the result of CSV conversions can depend on your current system time zone
(if 1. the CSV contains zoned dates and 2. you use any of these Z codes to parse them).
If needed, you can force your conversions to always use a specific time zone
by overriding the system time zone when converting, eg (on unix):

$ TZ=UTC hledger print -f foo.csv # or TZ=UTC hledger import foo.csv

I know not everyone will see this, but if you're a hledger-flow-er, full fledged hledger-er,
or other CSV-firster (your CSV is your master copy), could you think about the above
and let me know if it sounds problematic / liable to break things in your setup ? 
Ie, because converting CSV might be less reproducible, say if you are travelling.

I could also add a --tz option to override the conversion timezone in a more
usable/platform-independent way. If it's needed and the right way to go.

Daniele Moro (gmail)

unread,
Sep 28, 2022, 5:59:35 AM9/28/22
to hle...@googlegroups.com
Not sure I get this: are you planning a new feature for parsing csv files, which could interpret time zones?
Where should this (additional) info be stored in the csv? In a separate column, or in the usual date/time column?
What happens if my csv contains dates and times, without information about time zones? Would this spit out an error?
(sorry, didn't read the dev-branch docs, so maybe point me to the answers there)
Thanks for your fantastic work on this project,
Dan

--
You received this message because you are subscribed to the Google Groups "hledger" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hledger+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/hledger/9A5439DD-93FA-4447-A974-58AB3E61428F%40joyful.com.

Andreas Pauley

unread,
Sep 28, 2022, 6:27:51 AM9/28/22
to Hledger Group
On Wed, Sep 28, 2022, at 01:51, Simon Michael wrote:

>> $ TZ=UTC hledger print -f foo.csv # or TZ=UTC hledger import foo.csv
>
> I know not everyone will see this, but if you're a hledger-flow-er, full fledged hledger-er,
> or other CSV-firster (your CSV is your master copy), could you think about the above
> and let me know if it sounds problematic / liable to break things in your setup ?
> Ie, because converting CSV might be less reproducible, say if you are travelling.
>
> I could also add a --tz option to override the conversion timezone in a more
> usable/platform-independent way. If it's needed and the right way to go.

Hi Simon,

I've had a look at the CSV files I get from institutions in different countries/timezones. Personally I don't have any example of timestamps that include timezones within the timestamp's field text, or even a separate timezone field like in the Paypal example. In my case the timezone is always implicit, and dictated by the institution. Some of the institutions include the timezone in the header name, e.g. "Timestamp (UTC)"

If I think a bit broader to include other people who do have timezones specified in their CSV files:
conceptually we have an input timezone from the CSV files and an output timezone for hledger transactions.
During each csv-to-hledger conversion we will encounter various different input timezones. Up to now the output timezone would have matched the input timezone (because the input timezone was ignored). I haven't thought much about it before, but it was never surprising to me that my US accounts generated US dates in hledger journals. It may be surprising if my US accounts start generating European dates that doesn't match the source dates. Apart from the dates that doesn't match up it would result in a nasty git diff on all previous journals. But personally I wouldn't mind too much, as long as it happens once and the output is stable after that. And I can make the output stable by fixing the TZ environment to what I want before running my command (the "TZ=UTC" trick you mentioned).

For me the nicest (future) solution would be the ability to specify input and output timezones in the rules files, as opposed to command-line options. The rules file is a nice place where I can make the implicit institution timezone explicit, and group it with all the other oddities for that CSV format.

Kind regards,
Andreas

Daniele Moro (gmail)

unread,
Sep 28, 2022, 8:00:19 AM9/28/22
to hle...@googlegroups.com
Hi Simon,
I agree with Andreas pointed solution (rules file).
Cheers,
Dan

--
You received this message because you are subscribed to the Google Groups "hledger" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hledger+u...@googlegroups.com.

Ryan

unread,
Sep 28, 2022, 9:04:00 AM9/28/22
to hledger
I haven't used CSV files with hledger but I deal a lot with time zone issues in software. My opinion is that you should be explicit about time zone, or at least provide a way to be explicit. Inferring time zone often leads to unexpected problems. In addition to inferring time zone in CSV files, it sounds like it's also inferred in regular hledger-formatted journal files. Is that correct? If so, it could cause the same reproducibility problem when traveling, or perhaps when people are sharing a journal file across time zones via git. It might be worth it to provide a way to specify the time zone in journal files too.

Simon Michael

unread,
Sep 29, 2022, 10:29:33 PM9/29/22
to hledger
Thanks for your feedback, all.

To recap,

- Current hledger ignores time zones in CSV dates, even if you parse them with %Z. Whatever date appears in the CSV, is the date that will appear in your journal, regardless of time zones.

- Proposed new behaviour is to 1. respect any time zones parsed with %Z and 2. localise dates to your current system time zone (or another time zone set with TZ if preferred).


On Sep 27, 2022, at 23:59, Daniele Moro (gmail) <dan....@gmail.com> wrote:

Not sure I get this: are you planning a new feature for parsing csv files, which could interpret time zones?

A small improvement to current functionality. Previously, if CSV dates contained a time zone, you could (and had to, unless the time zone was always the same) use %Z to parse it. But that information was discarded. This can cause off-by-one dates. (Example below.)

Where should this (additional) info be stored in the csv? In a separate column, or in the usual date/time column?

It could be either, or nowhere, depending on the CSV. Many CSVs don't include time zone information.

What happens if my csv contains dates and times, without information about time zones? Would this spit out an error?

No problem, then you don't need %Z in your date-format and none of this affects you.



On Sep 28, 2022, at 00:27, Andreas Pauley <adpg...@ml1.net> wrote:

I've had a look at the CSV files I get from institutions in different countries/timezones. Personally I don't have any example of timestamps that include timezones within the timestamp's field text, or even a separate timezone field like in the Paypal example. In my case the timezone is always implicit, and dictated by the institution. Some of the institutions include the timezone in the header name, e.g. "Timestamp (UTC)"

If I think a bit broader to include other people who do have timezones specified in their CSV files:
conceptually we have an input timezone from the CSV files and an output timezone for hledger transactions.
During each csv-to-hledger conversion we will encounter various different input timezones. Up to now the output timezone would have matched the input timezone (because the input timezone was ignored). I haven't thought much about it before, but it was never surprising to me that my US accounts generated US dates in hledger journals. It may be surprising if my US accounts start generating European dates that doesn't match the source dates. Apart from the dates that doesn't match up it would result in a nasty git diff on all previous journals. But personally I wouldn't mind too much, as long as it happens once and the output is stable after that. And I can make the output stable by fixing the TZ environment to what I want before running my command (the "TZ=UTC" trick you mentioned).

Here's the case that caused a problem for me: one of my CSVs gives all dates in 2022-09-19T18:38:39.000Z format (a standard format, Z means UTC). But I am currently in HST, so some of my converted transactions have wrong dates.

For me the nicest (future) solution would be the ability to specify input and output timezones in the rules files, as opposed to command-line options. The rules file is a nice place where I can make the implicit institution timezone explicit, and group it with all the other oddities for that CSV format.

I hadn't thought of adding an input timezone to CSVs which don't have one. I suppose that could be useful, indeed, though it seems none of us have needed it yet. The rules file would be the right place since it's specific to each CSV file.

The output timezone is not specific to CSV files, so I was thinking it should be configured at the app level, as we wouldn't want to have to configure it in every rules file.


On Sep 28, 2022, at 03:03, Ryan <ryan....@gmail.com> wrote:

I haven't used CSV files with hledger but I deal a lot with time zone issues in software. My opinion is that you should be explicit about time zone, or at least provide a way to be explicit. Inferring time zone often leads to unexpected problems. In addition to inferring time zone in CSV files, it sounds like it's also inferred in regular hledger-formatted journal files. Is that correct? If so, it could cause the same reproducibility problem when traveling, or perhaps when people are sharing a journal file across time zones via git. It might be worth it to provide a way to specify the time zone in journal files too.

This makes sense in general, and as a separate issue to think about: how time zone should affect journal files. Currently we avoid the whole issue of time in journals.

So then ? 

- I think the new behaviour I described is a good incremental improvement, not excessively disruptive, and worth merging
- I think we might want to add the ability to declare the data's timezone in CSV rules
- And, a more portable way (than setting TZ) to choose the target timezone for CSV conversion
- But we haven't yet seen real need for these so perhaps we shouldn't implement them until then



Simon Michael

unread,
Sep 30, 2022, 12:08:10 AM9/30/22
to hledger


On Sep 29, 2022, at 16:29, Simon Michael <si...@joyful.com> wrote:

- I think the new behaviour I described is a good incremental improvement, not excessively disruptive, and worth merging
- I think we might want to add the ability to declare the data's timezone in CSV rules
- And, a more portable way (than setting TZ) to choose the target timezone for CSV conversion
- But we haven't yet seen real need for these so perhaps we shouldn't implement them until then


Narrator: "And then, he ran the tests."

It needs a bit more work: my branch assumes all dates are UTC, causing all dates to be off by one.

Here's a new draft of how I think it should work (as next step; it doesn't allow configuring the input time zone yet):

Time zones

When parsing CSV date-times, here are the current options for handling time zones.
Pick one that seems best for your situation:

  1. The CSV does not contain time zone information.
    hledger will keep the dates unchanged.
    Result: the date that appears in the CSV, will also appear in your journal.

  2. The CSV contains date-times with a time zone which is always the same, and you ignore it.
    You can do this by adding the timezone text to the date format.
    Result: same as 1.

  3. The CSV contains date-times with a time zone, and you do time-zone-aware conversion.
    You'll need to add %Z (or %z%EZ or %Ez, see the formatTime link above) to the date format.
    In this case hledger will parse the CSV date-time to UTC internally,
    then localise that to your system time zone.
    Result: dates in your journal could differ from the CSV dates in some cases.
    The output from a CSV conversion can vary depending on your current system time zone.

  4. Like 3, but you specify the time zone rather than relying on the system time zone.
    You can use the TZ environment variable to override the
    system time zone temporarily (on unix at least). Eg:
  1. $ TZ=UTC hledger print -f foo.csv  # or TZ=UTC hledger import foo.csv
    
  1. Result: dates in your journal could differ from the CSV dates,
    unless you force the same time zone that appears in the CSV dates.
    The output from a CSV conversion will be reproducible independent of your
    current system time zone.

Simon Michael

unread,
Sep 30, 2022, 12:29:14 AM9/30/22
to hledger


On Sep 29, 2022, at 18:08, Simon Michael <si...@joyful.com> wrote:



On Sep 29, 2022, at 16:29, Simon Michael <si...@joyful.com> wrote:

- I think the new behaviour I described is a good incremental improvement, not excessively disruptive, and worth merging
- I think we might want to add the ability to declare the data's timezone in CSV rules
- And, a more portable way (than setting TZ) to choose the target timezone for CSV conversion
- But we haven't yet seen real need for these so perhaps we shouldn't implement them until then


Narrator: "And then, he ran the tests."

It needs a bit more work: my branch assumes all dates are UTC, causing all dates to be off by one.

Here's a new draft of how I think it should work (as next step; it doesn't allow configuring the input time zone yet):


Well, having come this far.. disregard the previous; here is draft #3, including the feature of declaring the input time zone:


Time zones

When parsing CSV date-times, here are the current options for handling time zones.
Pick one that seems best for your situation:

  1. The CSV contains dates or date-times, with no time zone information.
    hledger will assume the dates are in your local time zone.

  1. The CSV contains date-times with a time zone which is always the same,
  1. and you ignore it, by adding the literal timezone text to the date format.
    Result: same as 1.

  2. The CSV contains date-times, with no time zone information,
    but you configure a time zone with timezone in the CSV rules.
    hledger will interpret the date-times as in that time zone,
    and will localise them to your current system time zone.

  3. The CSV contains date-times with a time zone, and you use
    %Z (or %z%EZ%Ez, see the formatTime link above) in date-format.
    hledger will parse the date-times as written,
    and will localise them to your system time zone.

  4. Like 3 or 4, but you override the system time zone by setting
    the TZ environment variable, eg (on unix at least):

  1. $ TZ=UTC hledger print -f foo.csv  # or TZ=UTC hledger import foo.csv
    
  1. You will get a reproducible CSV conversion that does not depend on
    the current system time zone.


Simon Michael

unread,
Sep 30, 2022, 4:43:18 PM9/30/22
to hledger


On Sep 29, 2022, at 18:29, Simon Michael <si...@joyful.com> wrote:



On Sep 29, 2022, at 18:08, Simon Michael <si...@joyful.com> wrote:



On Sep 29, 2022, at 16:29, Simon Michael <si...@joyful.com> wrote:

- I think the new behaviour I described is a good incremental improvement, not excessively disruptive, and worth merging
- I think we might want to add the ability to declare the data's timezone in CSV rules
- And, a more portable way (than setting TZ) to choose the target timezone for CSV conversion
- But we haven't yet seen real need for these so perhaps we shouldn't implement them until then


Narrator: "And then, he ran the tests."

It needs a bit more work: my branch assumes all dates are UTC, causing all dates to be off by one.

Here's a new draft of how I think it should work (as next step; it doesn't allow configuring the input time zone yet):


Well, having come this far.. disregard the previous; here is draft #3, including the feature of declaring the input time zone:



Previously, CSV date-times with a different time zone from yours (with or without explicit timezones in the CSV) could give off-by-one dates, because the CSV timezone was ignored.

Now,

• you can use the timezone rule to indicate which other timezone a CSV is implicitly using

• CSV date-times with a timezone - whether declared by rule or parsed with %Z - are localised to the system time zone (or another set with the TZ environment variable).

Reply all
Reply to author
Forward
0 new messages