Is there any Beancount user that uses the hi_IN locale?


Daniele Nicolodi

Jun 29, 2020, 12:17:42 AM
to Beancount
Hello,

I have been working on the Beancount input file parser and I found that
it does not correctly support digit grouping in monetary values in the
hi_IN locale, despite trying to do so.

Please see issue https://github.com/beancount/beancount/issues/490

I haven't seen any bug reports regarding this, but I don't know whether this
is because no Beancount user uses the hi_IN locale, because there is no
expectation for this uncommon numerical format to be accepted, or because
it is not often used in practice (I don't think so, because I can find
examples of it in recent Indian newspaper articles).

Properly supporting the hi_IN monetary formatting is some extra work,
but it may be worth doing it if users would benefit.

Would anyone like to comment?

Thank you!

Best,
Dan

jitin

Jun 29, 2020, 10:37:19 AM
to Beancount
Hi,

I am from India and I can say that the hi_IN format - putting commas like 10,00,00,000.00 is very common in India, and is the one used by default here, especially when representing money or in accounting. Banks and institutions definitely report using this format. I personally strip the commas before getting values into Beancount, so this has never been an issue for me.

Regards,
Jitin

Martin Blais

Jun 29, 2020, 12:34:09 PM
to Beancount
I think Japan also has another system, IIRC blocks of four, I don't recall the details.

One of the things we might want to do for v3 is make it possible to specify the locale within Beancount itself (insulating it from its environment) and perhaps bring back the checks (simply removing the commas works). OTOH, right now we don't support decimal commas. Ideally we support all this in v3.
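
To make the idea of grouping checks concrete, here is a rough sketch of validating comma placement with regular expressions. This is not how Beancount does it; the pattern names WESTERN and INDIAN and the helper grouping_is_valid are made up for illustration, and only "," grouping with a "." decimal point is considered.

```python
import re

# Hypothetical patterns, not taken from the Beancount sources.
# Western grouping: one to three leading digits, then groups of exactly three.
WESTERN = re.compile(r"(\d{1,3}(,\d{3})+|\d+)(\.\d+)?")
# Indian grouping: groups of two digits, then one final group of three.
INDIAN = re.compile(r"(\d{1,2}(,\d{2})*,\d{3}|\d+)(\.\d+)?")


def grouping_is_valid(text: str, pattern: re.Pattern) -> bool:
    """Return True if the commas in `text` follow the given pattern.

    Numbers written without any commas are accepted by both patterns.
    """
    return pattern.fullmatch(text) is not None


print(grouping_is_valid("1,000,000.00", WESTERN))     # True
print(grouping_is_valid("10,00,00,000.00", INDIAN))   # True
print(grouping_is_valid("10,00,00,000.00", WESTERN))  # False
```

A check like this rejects misplaced commas instead of silently dropping them, at the cost of needing to know which convention the file uses.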




--
You received this message because you are subscribed to the Google Groups "Beancount" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beancount+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/beancount/29094e86-5f0c-490e-b3f2-b934ae49e015o%40googlegroups.com.

Daniele Nicolodi

Jun 29, 2020, 1:20:22 PM
to bean...@googlegroups.com
On 29/06/2020 10:33, Martin Blais wrote:
> I think Japan also has another system, IIRC blocks of four, I don't
> recall the details.

Not in the ja_JP locale installed on my system.

> One of the things we might want to do for v3 is make it possible to
> specify the locale within Beancount itself (insulating it from its
> environment) and perhaps bring back the checks (simply removing the
> commas works). OTOH, right now we don't support decimal commas. Ideally
> we support all this in v3.

This would be nice.

Cheers,
Dan

Daniele Nicolodi

Jun 29, 2020, 1:34:43 PM
to bean...@googlegroups.com
On 29/06/2020 08:37, jitin wrote:
> Hi,
>
> I am from India and I can say that the hi_IN format - putting commas
> like 10,00,00,000.00 is very common in India, and is the one used by
> default here, especially when representing money or in accounting. Banks
> and institutions definitely report using this format. I personally strip
> the commas before getting values into Beancount, so this has never been
> an issue for me.

Thanks Jitin,

have you been removing the commas because Beancount didn't support them
in the right way for your locale, or is your choice unrelated to this?

Thank you.

Best,
Dan

jitin

Jun 29, 2020, 2:54:08 PM
to Beancount
Hi Dan,

I just assumed that commas won't be supported, particularly for my locale. It felt the safest to remove them altogether.

Regards,
Jitin

Justus Pendleton

Jun 30, 2020, 10:16:06 AM
to Beancount
On Monday, June 29, 2020 at 11:34:09 PM UTC+7, Martin Blais wrote:
> One of the things we might want to do for v3 is make it possible to specify the locale within Beancount itself (insulating it from its environment) and perhaps bring back the checks (simply removing the commas works). OTOH, right now we don't support decimal commas. Ideally we support all this in v3.

Handling locales is tricky. The locale is (kinda sorta) tied to the currency. I use multiple currencies. One is USD but the other is VND, which is officially defined with periods as thousands separators and commas as decimal separators. Which one should I use in a given entry?

It might seem obvious -- "let the user pick one and then everything is in that" -- and that might work if all data was user-generated. But the reality is that much of our data comes from external sources and all of them will generate data in their native, expected format. My US bank will export CSVs one way and my Vietnamese bank will export CSVs in another. I'm not sure we want Beancount importers and price fetchers to have to parse the user file to look up their locale to know what format to emit things in. (And unless doing so was made super easy for third-party authors of importers and price fetchers, it seems likely that few would bother and would simply emit "raw" numbers, which would work for 99% of users but be completely broken for others.)

Allowing individual files to have their own locale would allow us to work around that maybe. My Yahoo price fetcher gets redirected to a file with an en_US header. My WellsFargo importer gets redirected to a file with an en_US header. My VietcomBank importer gets redirected to a file with a vi_VN header. Maybe that doesn't complicate the parser and importer too much? Still, it feels like a less than great solution, too, so I'm not thrilled with that proposal either.

Daniele Nicolodi

Jun 30, 2020, 4:23:27 PM
to bean...@googlegroups.com
Another question came to my mind: are stock shares supposed to be
formatted as monetary or regular numerical values?

For most locales these are the same, but for some there is a difference.
For example, according to the locale definition, in the hi_IN locale the
number one million is formatted as "10,00,000" if it is a monetary
value, but as "1,000,000" if it is just a number.

Thank you.

Cheers,
Dan

francois PEGORY

Jun 30, 2020, 5:32:09 PM
to Beancount
Hello everyone, I think it would be easier if:
1) a file parsed by the Beancount parser always uses the same "locale", meaning "." as the decimal separator everywhere, no thousands separator, and the currency after (and not before) the number;
2) foreign locales are handled by the specific importer, which exports a file in the common format whatever the initial format is.

This way, the same Beancount file could be used everywhere.

Currently, if I have some USD expenses and some EUR expenses, I can choose to have two files or one file. It doesn't matter.

Regards



jitin

Jun 30, 2020, 6:45:15 PM
to Beancount
Hi,

I'm not sure about other locales, but in practice, for hi_IN almost everything is expressed with that notation, not just monetary values. For example, check out this website, covid19india.org: all values there use that formatting. The rationale is that instead of expressing things in thousands, millions, or billions, they are more commonly expressed here in thousands, "lakhs" and "crores", where 1 lakh = 1,00,000 and 1 crore = 1,00,00,000. Hence the commas, which make it easy to read numbers in those units.
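
As an illustration of the lakh/crore grouping described above, here is a minimal sketch of a formatter. The function name format_indian is hypothetical, and the sketch only handles integers.

```python
def format_indian(value: int) -> str:
    """Group digits Indian-style: the last three digits, then pairs."""
    sign = "-" if value < 0 else ""
    digits = str(abs(value))
    if len(digits) <= 3:
        return sign + digits
    head, tail = digits[:-3], digits[-3:]
    pairs = []
    # Peel two digits at a time off the right end of the remaining head.
    while head:
        pairs.insert(0, head[-2:])
        head = head[:-2]
    return sign + ",".join(pairs + [tail])


print(format_indian(100_000))     # 1,00,000    (1 lakh)
print(format_indian(10_000_000))  # 1,00,00,000 (1 crore)
```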

Thanks,
Jitin

Daniele Nicolodi

Jun 30, 2020, 8:07:19 PM
to bean...@googlegroups.com
On 30/06/2020 08:16, Justus Pendleton wrote:
> On Monday, June 29, 2020 at 11:34:09 PM UTC+7, Martin Blais wrote:
>
> One of the things we might want to do for v3 is make it possible to
> specify the locale within Beancount itself (insulating it from its
> environment) and perhaps bring back the checks (simply removing the
> commas works). OTOH, right now we don't support decimal commas.
> Ideally we support all this in v3.
>
>
> Handling locales is tricky. The locale is (kinda sorta) tied to the
> currency. I use multiple currencies. One is USD but the other is VND,
> which is officially defined with periods as thousands separators and
> commas as decimal separators. Which one should I use in a given entry?

Individual practices may deviate from this; however, in "official" locale
definitions the monetary formats are tied to the locale language and zone,
and not to the currency. It is expected that the same constructs are used
consistently to represent numbers independently of the currency.

There is no definition for how to represent amounts in USD, but only a
definition for how to represent monetary amounts in American English,
also known as the "en_US" locale. For example, the Euro is used across
different locales and in many of them monetary amounts are spelled
differently.

> It might seem obvious -- "let the user pick one and then everything is
> in that" -- and that might work if all data was user-generated. But the
> reality is that much of our data comes from external sources and all of
> them will generate data in their native, expected format.

What is discussed here is the format accepted for amounts in the
Beancount input files. Facilities for parsing external data sources are
an issue that is only tangentially related.

> My US bank
> will export CSVs one way and my Vietnamese bank will export CSVs in
> another. I'm not sure we want beancount importers and price fetchers to
> have to parse the user file to lookup their locale to know what format
> to emit things in. (And unless doing so was made super easy for 3rd
> party authors of importers and price fetchers it seems likely that few
> would bother and would simply emit "raw" numbers which would work for
> 99% of users but be completely broken for others.)

I don't know what you mean by "raw" numbers. The Beancount import
mechanism only accepts data in decimal (as in instances of the Python
decimal.Decimal class) representation, not in string representation. It
is the responsibility of the import modules to perform the conversion from
strings to decimal. When the imported transactions or prices are
serialized into Beancount ledgers a choice must be made about which
format to use.

It is true that Beancount does not provide any fancy or locale aware way
to parse decimal numbers, but I don't think it should. Writing a naive
parser is trivial and there are dedicated libraries for more fancy
solutions.
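
As a rough idea of what such a naive parser could look like, here is a sketch in which the separators are passed explicitly rather than read from the environment. The name parse_amount and its parameters are hypothetical and not part of Beancount.

```python
from decimal import Decimal


def parse_amount(text: str, thousands: str = ",", decimal: str = ".") -> Decimal:
    """Convert a formatted number string to a Decimal.

    The separators are arguments because the string alone does not say
    which convention produced it.
    """
    # Drop the grouping separators, wherever they are placed.
    cleaned = text.strip().replace(thousands, "")
    # Normalize the decimal separator to the "." that Decimal expects.
    if decimal != ".":
        cleaned = cleaned.replace(decimal, ".")
    return Decimal(cleaned)


# Indian-style grouping with a decimal point.
print(parse_amount("10,00,00,000.00"))                            # 100000000.00
# Periods for grouping and a decimal comma.
print(parse_amount("1.234.567,89", thousands=".", decimal=","))   # 1234567.89
```

Note that this version discards the grouping separators without checking their positions, so it accepts any placement of them.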

> Allowing individual files to have their own locale would allow us to
> work around that maybe.

I don't understand which problem you are trying to work around.
Different locales for different Beancount input files can be supported
without too much complication (once a locale aware parser exists).

However, in most cases, if a user handles different currencies, at some
point they will need to have transactions with legs in different
currencies, and only one locale could be used for reporting, thus still
having some amounts written in the "wrong" locale for the currency.

It would also be very confusing: assume you have one file using the en_US
locale and one using the fr_FR locale. It would be very hard to know what an
amount like "1,000 JPY" means simply by looking at it. Is it one yen or
one thousand yen?

Cheers,
Dan

Daniele Nicolodi

Jun 30, 2020, 10:20:10 PM
to bean...@googlegroups.com
On 30/06/2020 16:45, jitin wrote:
> I'm not sure about other locales, but in practice, for hi_IN almost
> everything is expressed with that notation, not just monetary values.

Interesting. Does the locale definition on your OS do the right thing?
For example, what are the results you obtain for the Python code below?

import locale

# Raises locale.Error if the hi_IN locale is not installed.
locale.setlocale(locale.LC_ALL, locale.normalize('hi_IN'))
print(locale.format_string('%f', 1.0e6, grouping=True, monetary=False))
print(locale.format_string('%f', 1.0e6, grouping=True, monetary=True))

I get:

10,000,00.000000
10,00,000.000000

but it is unlikely that this is right, looks like a bug to me.

Thank you!

Cheers,
Dan

jitin

Jul 1, 2020, 2:37:13 PM
to Beancount
I get the same thing on my mac with 'hi_IN.ISCII-DEV'. The monetary=False one must be incorrect. As far as I can see locale.localeconv()["grouping"] gives [2,3,0], whereas locale.localeconv()["mon_grouping"] returns [3,2,0] which is the correct one.

On Ubuntu, with 'hi_IN.UTF-8', I get "mon_grouping" of  [3, 2, 0] (which is correct/expected), and "grouping" of [3, 0] which means 1,000,000,000, etc. 

But the grouping of [2,3,0] (1,00,000,000, etc.) definitely looks incorrect. 

The following on my mac gives me the same result:


export LC_ALL="hi_IN.ISCII-DEV"

locale -ck mon_grouping  # gives 3;2

locale -ck grouping  # gives 2;3


Looks like it's defined this way:

I'm not sure why that is so. 
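
To compare the grouping lists mentioned above, here is a small pure-Python sketch that applies a localeconv()-style list to a string of digits. The function apply_grouping is hypothetical; it treats a trailing 0 (or running out of entries) as "repeat the previous size" and does not special-case other terminators such as CHAR_MAX.

```python
def apply_grouping(digits: str, grouping: list, sep: str = ",") -> str:
    """Group `digits` from the right using a localeconv()-style size list."""
    groups = []
    rest = digits
    last = None
    sizes = iter(grouping)
    while rest:
        size = next(sizes, 0)  # an exhausted list is treated like a trailing 0
        if size == 0:
            size = last        # 0 means "repeat the previous group size"
        if size is None:       # no usable size at all: leave digits ungrouped
            break
        groups.insert(0, rest[-size:])
        rest = rest[:-size]
        last = size
    if rest:
        groups.insert(0, rest)
    return sep.join(groups)


print(apply_grouping("1000000", [3, 2, 0]))  # 10,00,000
print(apply_grouping("1000000", [2, 3, 0]))  # 10,000,00
print(apply_grouping("1000000", [3, 0]))     # 1,000,000
```

The [2, 3, 0] case reproduces the odd "10,000,00" output shown earlier in the thread, which is consistent with the two sizes simply being swapped in that definition.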

Regards,
Jitin

Daniele Nicolodi

Jul 1, 2020, 3:10:19 PM
to bean...@googlegroups.com
On 29/06/2020 11:20, Daniele Nicolodi wrote:
> On 29/06/2020 10:33, Martin Blais wrote:
>> I think Japan also has another system, IIRC blocks of four, I don't
>> recall the details.
>
> Not in the ja_JP locale installed on my system.

After determining that the locale definitions distributed with macOS may
not be that accurate, I checked in the glibc sources (which I believe
may be the most complete and widely used source I can find).

While it holds true that ja_JP uses the most common three-digit
grouping, Taiwan uses four-digit grouping, and the Indian (hi_IN)
system is also used in Bhutan (dz_BT) and Bangladesh (bn_BD).

Cheers,
Dan

jitin

Jul 1, 2020, 3:28:00 PM
to Beancount
The hi_IN grouping scheme is also used in Pakistan, Nepal and most probably Sri Lanka (the entire Indian subcontinent). 

I think the Apple definition is a bug. It appears to be based on the FreeBSD source, and FreeBSD seems to have since fixed the grouping from 2;3 to 3;2: https://github.com/freebsd/freebsd/blob/671598262f152160b6b4c61f03100fc4c432ee84/share/numericdef/hi_IN.ISCII-DEV.src

It seems that FreeBSD now uses the CLDR project for its locales.


Regards,
Jitin

Daniele Nicolodi

Jul 1, 2020, 4:16:47 PM
to bean...@googlegroups.com
Thanks Jitin. My results were also on macOS (10.14 and 10.15). I don't
know where macOS gets its definitions of the locales, but they are
clearly wrong.

I checked the glibc sources and they seem to have it right, as you
reported. Still, it seems that the hi_IN locale uses three-digit
grouping for non-monetary values, in contrast to what seems to be common
practice.

Looking into this has been an interesting learning experience.

Cheers,
Dan
