The idea is to make number formatting a little easier with the new
format() builtin in Py2.6 and Py3.0:
http://docs.python.org/library/string.html#formatspec
-------------------------------------------------------------
Motivation:
Provide a simple, non-locale aware way to format a number
with a thousands separator.
Adding thousands separators is one of the simplest ways to
improve the professional appearance and readability of
output exposed to end users.
In the finance world, output with commas is the norm. Finance
users and non-professional programmers find the locale approach
to be frustrating, arcane and non-obvious.
It is not the goal to replace locale or to accommodate every
possible convention. The goal is to make a common task easier
for many users.
Research so far:
Scanning the web, I've found that thousands separators are
usually one of COMMA, PERIOD, SPACE, or UNDERSCORE. The
COMMA is used when a PERIOD is the decimal separator.
James Knight observed that Indian/Pakistani numbering systems
group by hundreds. Ben Finney noted that Chinese group by
ten-thousands.
Visual Basic and its brethren (like MS Excel) use a completely
different style and have ultra-flexible custom format specifiers
like: "_($* #,##0_)".
Proposal I (from Nick Coghlan):
A comma will be added to the format() specifier mini-language:
[[fill]align][sign][#][0][minimumwidth][,][.precision][type]
The ',' option indicates that commas should be included in the
output as a thousands separator. As with locales which do not
use a period as the decimal point, locales which use a
different convention for digit separation will need to use the
locale module to obtain appropriate formatting.
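For readers who want to try the ',' option: it is what later shipped in Python 2.7 and 3.1, so the spec above can be exercised directly. The snippet below is only an illustrative sketch; the variable names are made up for the example.

```python
from decimal import Decimal

# The ',' flag sits between the width and the precision.
as_int = format(1234567, ",d")                       # int
as_float = format(1234567.891, ",.2f")               # float
as_decimal = format(Decimal("1234567.891"), ",.2f")  # Decimal

# All fields together:
# fill='*', align='>', sign='+', width=16, ',', precision=.2, type='f'
padded = format(1234567.891, "*>+16,.2f")

print(as_int, as_float, as_decimal, padded, sep="\n")
```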
The proposal works well with floats, ints, and decimals. It also
allows easy substitution for other separators. For example:
format(n, "6,f").replace(",", "_")
This technique is completely general but it is awkward in the one
case where the commas and periods need to be swapped.
format(n, "6,f").replace(",", "X").replace(".", ",").replace("X", ".")
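A small sketch of the swap, assuming a Python where the ',' option exists (2.7/3.1 or later). The helper name swap_separators is hypothetical; it uses str.translate so that no placeholder character is needed, and it is shown next to the three-replace form for comparison.

```python
def swap_separators(text):
    """Exchange ',' and '.' in an already formatted number string."""
    # str.maketrans builds a one-pass character map, so commas and periods
    # are swapped simultaneously and no temporary "X" is required.
    return text.translate(str.maketrans(",.", ".,"))


n = 1234567.891
us_style = format(n, ",.2f")
eu_style = swap_separators(us_style)

# The three-replace version from the post gives the same result.
eu_style_replace = us_style.replace(",", "X").replace(".", ",").replace("X", ".")

print(us_style, eu_style)
```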
Proposal II (to meet Antoine Pitrou's request):
Make both the thousands separator and decimal separator user
specifiable but not locale aware. For simplicity, limit the
choices to a comma, period, space, or underscore.
[[fill]align][sign][#][0][minimumwidth][T[tsep]][dsep precision][type]
Examples:
format(1234, "8.1f") --> '  1234.0'
format(1234, "8,1f") --> '  1234,0'
format(1234, "8T.,1f") --> ' 1.234,0'
format(1234, "8T .f") --> ' 1 234,0'
format(1234, "8d") --> '    1234'
format(1234, "8T,d") --> '   1,234'
This proposal meets most needs (except for people wanting
grouping for hundreds or ten-thousands), but it comes at the
expense of being a little more complicated to learn and remember.
Also, it makes it more challenging to write custom __format__
methods that follow the format specification mini-language.
For the locale module, just the "T" is necessary in a formatting
string since the tool already has procedures for figuring out the
actual separators from the local context.
Comments and suggestions are welcome but I draw the line at supporting
Mayan numbering conventions ;-)
Raymond
Here's a re-post (hopefully without the line wrapping problems
in the previous post).
Raymond
-------------------------------------------------------------
Motivation:
-----------
Provide a simple, non-locale aware way to format a number
with a thousands separator.
Adding thousands separators is one of the simplest ways to
improve the professional appearance and readability of output
exposed to end users.
In the finance world, output with commas is the norm. Finance
users and non-professional programmers find the locale
approach to be frustrating, arcane and non-obvious.
It is not the goal to replace locale or to accommodate every
possible convention. The goal is to make a common task easier
for many users.
Research so far:
----------------
Scanning the web, I've found that thousands separators are
usually one of COMMA, PERIOD, SPACE, or UNDERSCORE. The
COMMA is used when a PERIOD is the decimal separator.
James Knight observed that Indian/Pakistani numbering systems
group by hundreds. Ben Finney noted that Chinese group by
ten-thousands.
Visual Basic and its brethren (like MS Excel) use a completely
different style and have ultra-flexible custom format
specifiers like: "_($* #,##0_)".
Proposal I (from Nick Coghlan):
-------------------------------
A comma will be added to the format() specifier mini-language:
[[fill]align][sign][#][0][minimumwidth][,][.precision][type]
The ',' option indicates that commas should be included in the
output as a thousands separator. As with locales which do not
use a period as the decimal point, locales which use a
different convention for digit separation will need to use the
locale module to obtain appropriate formatting.
The proposal works well with floats, ints, and decimals.
It also allows easy substitution for other separators.
For example:
format(n, "6,f").replace(",", "_")
This technique is completely general but it is awkward in the
one case where the commas and periods need to be swapped:
format(n, "6,f").replace(",", "X").replace(".", ",").replace("X", ".")
Proposal II (to meet Antoine Pitrou's request):
-----------------------------------------------
Make both the thousands separator and decimal separator user
specifiable but not locale aware. For simplicity, limit the
choices to a comma, period, space, or underscore.
[[fill]align][sign][#][0][minimumwidth][T[tsep]][dsep precision][type]
Examples:
format(1234, "8.1f") --> '  1234.0'
format(1234, "8,1f") --> '  1234,0'
format(1234, "8T.,1f") --> ' 1.234,0'
format(1234, "8T .f") --> ' 1 234,0'
format(1234, "8d") --> '    1234'
format(1234, "8T,d") --> '   1,234'
This proposal meets most needs (except for people wanting
grouping for hundreds or ten-thousands), but it comes at the
expense of being a little more complicated to learn and
remember.
IIRC, some cultures use a non-uniform grouping, e.g. "123 456 78.9".
For that, there is also a grouping reserved in the locale (at least in
those of C++ IOStreams, that is). Further, and that seems to also be one of
your concerns, there are different ways to represent negative numbers,
e.g. "(123)" or "-456".
> Make both the thousands separator and decimal separator user
> specifiable but not locale aware. For simplicity, limit the
> choices to a comma, period, space, or underscore.
>
> [[fill]align][sign][#][0][minimumwidth][T[tsep]][dsep precision][type]
>
> Examples:
>
> format(1234, "8.1f") --> '  1234.0'
> format(1234, "8,1f") --> '  1234,0'
> format(1234, "8T.,1f") --> ' 1.234,0'
> format(1234, "8T .f") --> ' 1 234,0'
> format(1234, "8d") --> '    1234'
> format(1234, "8T,d") --> '   1,234'
How about this?
format(1234, "8.1", tsep=",")
--> ' 1,234.0'
format(1234, "8.1", tsep=".", dsep=",")
--> ' 1.234,0'
format(123456, tsep=" ", grouping=(3, 2,))
--> '1 234 56'
IOW, why not explicitly say what you want using keyword arguments with
defaults instead of inventing an IMHO cryptic, read-only mini-language?
Seriously, the problem I see with this proposal is that its aim to be as
short as possible actually makes the resulting format specifications
unreadable. Could you even guess what "8T.,1f" should mean if you had not
written this?
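To make the keyword idea concrete, here is a rough sketch of what such a helper might look like as a plain function. Everything here is an assumption for illustration: format_number is a hypothetical name, not an existing Python API; width handling is left out; and the grouping tuple is read left to right, so the last entry is the rightmost group and the first entry repeats leftward (the reading that reproduces '1 234 56' above).

```python
def format_number(value, precision=0, tsep="", dsep=".", grouping=(3,)):
    """Hypothetical keyword-driven formatter; not a real Python API."""
    text = f"{abs(value):.{precision}f}"
    int_part, _, frac_part = text.partition(".")

    # Assumed reading of `grouping`: sizes are listed left to right, so the
    # last entry is the rightmost group and the first entry repeats.
    # Entries are assumed to be positive integers.
    sizes = list(reversed(grouping))
    groups = []
    index = 0
    while int_part:
        size = sizes[min(index, len(sizes) - 1)]
        groups.append(int_part[-size:])
        int_part = int_part[:-size]
        index += 1

    result = tsep.join(reversed(groups))
    if frac_part:
        result += dsep + frac_part
    return ("-" if value < 0 else "") + result


print(format_number(1234, 1, tsep=","))
print(format_number(1234, 1, tsep=".", dsep=","))
print(format_number(123456, tsep=" ", grouping=(3, 2)))
```

A function like this sidesteps the mini-language entirely, at the cost of not composing with str.format() field specs.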
> This proposal meets most needs (except for people wanting
> grouping for hundreds or ten-thousands), but it comes at the
> expense of being a little more complicated to learn and
> remember.
Too expensive for my taste.
Uli
--
Sator Laser GmbH
Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932
That makes sense to me but I don't think that's the way the format()
builtin was implemented (see PEP 3101, which was implemented in Py2.6
and 3.0).
It is a simple pass-through to a __format__ method for each
formattable object. I don't see how keywords would fit in that
framework. What is proposed is similar to the locale module's existing
"n" specifier except that this lets you say exactly what you want
instead of deferring to the locale settings.
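A minimal sketch of that pass-through, to show why only a single spec string reaches the object. The Account class and its balance attribute are made up for the example.

```python
class Account:
    """Toy class; the name and attribute are illustrative only."""

    def __init__(self, balance):
        self.balance = balance

    def __format__(self, spec):
        # format(obj, spec) hands the raw spec string to this method and
        # nothing else, which is why extra keyword arguments have no
        # obvious place in the protocol.
        return "USD " + format(self.balance, spec or ",.2f")


acct = Account(1234.5)
print(format(acct, ",.2f"))
print("{0:,.1f}".format(acct))
```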
The mini-language seems to already be the way of things (just as it is
in many other languages including PHP, C, Fortran, and whatnot). I'm
just proposing an addition, "T,", so you can add commas as a thousands
separator.
Raymond
... and why not C (centum) for hundreds (can't have H(ollerith)) and W
for wan (the Chinese word for 10 thousand)?
As far as I am concerned the most simple version plus a way to swap
around commas and period is all that is needed. The rest can be done
using one replace (because the decimal separator is always one of two
options). This should cover everywhere but the far east. 80% of cases
for 20% of implementation complexity.
For example:
[[fill]align][sign][#][0][,|.][minimumwidth][.precision][type]
> format(1234, ".8.1f") --> ' 1.234,0'
> format(1234, ",8.1f") --> ' 1,234.0'
Thanks for the feedback.
FWIW, posted a cleaned-up version of the proposal at
http://www.python.org/dev/peps/pep-0378/
Raymond
It would be nice if the PEP included a comparison between the proposed
scheme and how it is done in other programs and languages. For
example, I think Common Lisp has a feature for formatting thousands.
Spreadsheets like Excel probably have something similar. Those
programs are pretty well evolved and probably address the important
real use cases by now. It might be best to follow an existing example
(with adjustments for Pythonification as necessary) to the extent
possible.
Good idea. I'm hoping that people will post those here.
In my quick research, it looks like many languages offer
nothing more than the usual C style % formatting and defer
the rest to a locale-aware module.
> For
> example, I think Common Lisp has a feature for formatting thousands.
Do you have more detail?
> Spreadsheets like Excel probably have something similar.
I addressed that in the PEP in the section on VB and relatives. Their
approach doesn't graft-on to our existing approach. They use format
specifiers like: "_($* #,##0_)".
Raymond
>IOW, why not explicitly say what you want using keyword arguments with
>defaults instead of inventing an IMHO cryptic, read-only mini-language?
>Seriously, the problem I see with this proposal is that its aim to be as
>short as possible actually makes the resulting format specifications
>unreadable. Could you even guess what "8T.,1f" should mean if you had not
>written this?
+1
Look back in history, and see how COBOL did it with the
PICTURE - dead easy and easily understandable.
Compared to that, even the C printf stuff and python's %
are incomprehensible.
- Hendrik
[[fill]align][sign][#][0][minimumwidth][,][.precision][L][type]
Examples:
Assuming the locale has:
decimal point: ","
grouping separator: "."
grouping spacing: 3
format(123456, "10.1f") --> '  123456.0'
format(123456, "10.1Lf") --> ' 123.456,0'
format(123456, "10,.1f") --> ' 123,456.0'
format(123456, "10,.1Lf") --> ' 123.456,0'
I found the Common Lisp spec for this and added it to the PEP.
Raymond
Hendrik van Rooyen's mention of Cobol's "picture" (aka PIC)
specifications might be added to the list. Cautionary tale: I once
had a similar idea and suggested including a bastardized version of
PIC in an extension language for something I worked on once. Another
programmer then coded a reasonable PIC subset and we shipped it.
Turned out that a number of our users were Cobol experts and once we
had anything like PIC, they expected the weirdest and most obscure
features (of which there were quite a few) of real Cobol PIC to work.
We ended up having to assign someone a fairly lengthy task of figuring
out the Cobol spec and implementing every last damn PIC feature. But
I digress.
> > example, I think Common Lisp has a feature for formatting thousands.
> Do you have more detail?
http://www.cs.cmu.edu/Groups/AI/html/cltl/clm/node200.html
gives as an example:
(format nil "The answer is ~:D." (expt 47 x))
=> "The answer is 229,345,007."
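For comparison, a rough Python counterpart of the ~:D directive, using the ',' option as it later shipped in Python 2.7/3.1. The value x = 5 is an assumption; the Lisp example does not say what x is, but 47 ** 5 matches the printed result.

```python
x = 5  # assumed; chosen because 47 ** 5 == 229345007
message = "The answer is {:,d}.".format(47 ** x)
print(message)
```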
Ah, cool, I simultaneously looked for it and posted about it.
Missing from this PEP:
output below the decimal point.
show results for something like:
format(12345.54321, "15,.5f") --> ' 12,345.543,21'
Explain the interaction on sizes and lengths (which numbers are digits,
which are length [I vote for length on overall, digits on precision]),
and what happens with length-4 -- I'd say explicitly 1000 is shown as
"1,000" despite style sheets that prefer "1000" and "10,000".
FWIW, I agree with pruebano, do the simplest easily usable thing, and
provide a way to swap the commas and periods. The rest can be ponied
in by string processing.
--Scott David Daniels
Scott....@Acm.Org
Seeing how many people complained about the proposal being unreadable
(although it tries to be simple by not including too many features), why
not go all the way to unreadability and teach people to always use some
sort of convenience function and never use the microlanguage except in
very simple cases (or extremely complex cases, in which case you might
actually be better served by writing your own formatting function)?
A hypothetical example using a convenience function and the
microlanguage could look like this:
>>> num = 213210.3242
>>> fmt = create_format(sep='-', decsep='@')
>>> print fmt
50|\/|3_v3ry_R34D4|3L3_C0D3
>>> '{0!{1}}'.format(num, fmt)
'213-210@3242'
LOL, it's like APL all over again ;-)
FWIW, the latest version of the proposal is dirt simple:
>>> format(1234567, 'd') # what we have now
'1234567'
>>> format(1234567, ',d') # proposed new option
'1,234,567'
>>> format(1234.5, '.2f') # what we have now
'1234.50'
>>> format(1234.5, ',.2f') # proposed new option
'1,234.50'
The proposal is roughly:
If you want commas in the output,
put a comma in the format string.
It's not rocket science.
What is rocket science is what you have to do now
to achieve the same effect. If someone finds the
above to be baffling, how the heck are they going
to do the same thing using the locale module?
Raymond
As far as I can see you're proposing an amendment to *encourage*
writing code that is not locale aware, with the amendment itself being
locale specific, which surely has to be a regressive move in the 21st
century. Frankly, I'd sooner see it made /harder/ to write code that
is not locale aware (warnings, like FxCop gives on .net code?) than
/easier/. Perhaps that's because I'm British, not American and I'm
sick of having date fields get the date wrong because the programmer
thinks the USA is the world. It makes me sympathetic to the problems
caused to others by programmers who think the English-speaking world
is the world.
By the way, to others who think that 123,456.7 and 123.456,7 are the
only conventions in common use in the West, no they're not. 123 456.7
is in common use in engineering, at least in Europe, precisely to
reduce (though not eliminate) problems caused by dot and comma
confusion.
--
Tim Rowe
would it break anything to also allow
>>> format(1234567, 'd') # what we have now
'1234567'
>>> format(1234567, '.d') # proposed new option
'1.234.567'
>>> format(1234.5, ',2f') # proposed new option
'1234,50'
>>> format(1234.5, '.,2f') # proposed new option
'1.234,50'
because that would support a moderate chunk of the non-english speaking
users and seems like a natural extension.
(i'm still not sure this is that great an idea - if you think using a
locale is "rocket science" then perhaps your excess energy would be better
spent making locale easier, rather than tweaking this behaviour for a
subset of users?)
andrew
What if you want to change the separator? Europeans usually
use periods instead of commas: one thousand = 1.000.
Yes, that's allowed too! The separators can be any one of COMMA,
SPACE, DOT, UNDERSCORE, or NON-BREAKING-SPACE.
That is supported also.
do you support just a fixed set of separators or anything?
how about this: (Switzerland)
12'000.99
or spacing:
12 000.99
--
Wolfgang
I lived in three different countries and in school used blank for
thousand separator to avoid confusion with the multiply operator. I
think this proposal is more for debugging big numbers and meant mostly
for programmers' eyes. We already use the dot instead of the comma as
the decimal separator in our programming languages, so one more
Americanism won't kill us.
I am leaning towards proposal 1 now just to avoid the thousand
variations that will be requested because of this, making the
implementation unnecessarily complex. I can always use the 3
replacement hack (conveniently documented in the pep).
+1 for Nick's proposal
If it were for the programmers' eyes then it would be in the code, not
in the formatted output. Debugging of big numbers can be done by
checking within code, so there's no need to let this escape to the
output.
And if it's for programmers' eyes then the statement "The COMMA is
used when a PERIOD is the decimal separator" is wrong, at least if it
means that the COMMA is the /only/ separator used when a PERIOD is the
decimal separator. Ada uses UNDERSCOREs, which can be placed almost
anywhere in a numeric literal and are ignored.
And if it's mostly for programmers' eyes, why does the motivation
state that "Adding thousands separators is one of the simplest ways to
improve the professional appearance and readability of output exposed
to end users"? The proposal is clearly for the presentation of numbers
to end users, and quite simply is an encouragement to sloppiness in
presenting those numbers. If "Finance users and non-professional
programmers find the locale approach to be frustrating, arcane and
non-obvious" then by all means propose a way of making it simpler and
clearer, but not a bodge that will increase the amount of bad software
in the world.
-1 for all of the proposals.
--
Tim Rowe
What if I want other separators?
How about this idea: give the format a "long" form, which is a bit
more verbose, flexible, and unambiguous, and keep the current proposal
as a "short" form, which is more concise.
The "long" format would be like this (this is much, much more featureful
than the current proposition, I think I might have crossed far beyond
the Mayan line):
[n|sign <signnegative>[[, <signzero>], <signpositive>] | ]
[w|min <minwidth>[, <align>[, <alignfill>]]]
[x|max <maxwidth>[, <overflowsign>[, <overflowalign>]]]
[s|sep [[...]<sep><sepwidth>]<sep><sepwidth> | ]
[dp|decpoint <decpoint> | ]
[ds|decsep <width><sep>[, <width><sep>[...]] | ]
[b|base <base-n>[, <charset>]]
[p|prec <prec> | ]
t|type <type>
The feel of "long" format
fmt_string: 'type f'
number: 876543213456.98765445
result: 876543213456.98765445
fmt_string: 'decpoint ^ | type f'
number: 876543213456.98765445
result: 876543213456^98765445
fmt_string: 'sep 2>1:3.4 | decpoint , | prec 3 | type f'
number: 876543213456.98765445
result: 87>65>4:321.3456,988
General Rules:
- every field except type is optional
- fields are separated by | (this may change); escape a literal | with ||
- every field starts with an identifier followed by mandatory whitespace
- subfields are separated by commas. Each identifier has a long and a
short form.
- Processing precedence is: type, base, prec, sep/decsep, decpoint, sign,
min, max
Specific rules:
- min and max determine the width: min sets the rule for when the
resulting string is shorter than minwidth, and max sets the rule for
when the resulting string is longer than maxwidth (basically trimming).
alignfill is the character or sequence of characters used to pad the
resulting string out to minwidth; overflowsign is the character added
when maxwidth is exceeded and trimming occurs
- sep is basically a separator given for each width. The regular
Latin number system would be represented as sep 3.3; the leftmost width
and separator are repeated.
- decsep works similarly to sep
- base is the number base; charset is the mapping of digits used to
represent the output number in that base.
PS: It is not designed for hand written, but is meant to be fairly readable
PPS: It is fairly modular too
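To give a feel for the general rules only (fields split on |, || as an escaped bar, identifier followed by whitespace, long and short identifiers), here is a rough sketch of the field-splitting step. It does not implement any of the formatting itself. The function name parse_long_format and the ALIASES table are hypothetical, with the short names taken from the grammar above.

```python
import re

# Short-to-long identifier aliases, as listed in the grammar above.
ALIASES = {"n": "sign", "w": "min", "x": "max", "s": "sep", "dp": "decpoint",
           "ds": "decsep", "b": "base", "p": "prec", "t": "type"}


def parse_long_format(spec):
    """Split a "long" format string into a {field name: raw value} dict."""
    # Split on a single '|' only; '||' is an escaped literal bar.
    fields = re.split(r"(?<!\|)\|(?!\|)", spec)
    parsed = {}
    for field in fields:
        field = field.strip()
        if not field:
            continue
        # Identifier, mandatory whitespace, then the rest is the value.
        name, _, value = field.partition(" ")
        name = ALIASES.get(name, name)
        parsed[name] = value.strip().replace("||", "|")
    if "type" not in parsed:
        raise ValueError("the 'type' field is mandatory")
    return parsed


print(parse_long_format("sep 2>1:3.4 | decpoint , | prec 3 | type f"))
```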
format(n, ',d').replace(",", yoursep)
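Spelled out as a tiny helper, this covers the Swiss apostrophe and the plain space asked about earlier. It is a sketch that assumes the ',' option is available (Python 2.7/3.1 or later); group_digits is a hypothetical name. Note that passing "." as the separator would collide with the decimal point, which is the one case that needs the swap trick instead.

```python
def group_digits(value, sep=",", precision=0):
    """Format with ',' grouping, then substitute the requested separator."""
    return format(value, f",.{precision}f").replace(",", sep)


print(group_digits(12000.99, "'", 2))
print(group_digits(12000.99, " ", 2))
print(group_digits(1234567, "_"))
```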
> How about this idea: make the format has "long" format, which is a bit
> more verbose, flexible, and unambiguous, and the current proposal a
> "short" format, which is more concise.
>
> The "long" format would be like this (this is much, much more featureful
> than the current proposition, I think I might have crossed far beyond
> the Mayan line):
I concur ;-)
Raymond
>> Look back in history, and see how COBOL did it with the
>> PICTURE - dead easy and easily understandable.
>> Compared to that, even the C printf stuff and python's %
>> are incomprehensible.
>>
>> - Hendrik
Yes. In COBOL, one writes
PICTURE $999,999,999.99
which is way ahead of most of the later approaches.
John Nagle
* Detail issues with the locale module.
* Summarize commentary to date.
-- Opposition to formatting strings in general
(preferring a convenience function or PICTURE clause)
-- Opposition to any non-locale aware approach
* Add APOSTROPHE and non-breaking SPACE to the list of separators.
* Add more links to external references
(Babel, Excel, ADA, CommonLisp, COBOL, C-Sharp).
* Clarify how proposal II is parsed.
Raymond
The string methods could perhaps retain their current behaviour as the
default and accept a parameter to make them locale-sensitive. The same
could be the case for 'format' so the format string has "." to represent
the decimal point and "," to represent the digit separator, and those
would be the default, but it could accept a flag ("L"?) to make it
locale-sensitive.
It occurs to me, at least for quantities of data, one of the most
useful aids to readability is scaling down the quantity and suffixing
it with K (kilo), M (mega), G (giga), etc. This is sometimes done
with K=1000 and sometimes with K=1024 (fancy pronunciation "kibi"
rather than kilo, officially abbreviated Ki). Possible formatting:
'%.3K' % 1234567 = 1.235K # K = 1000
'%.:3Ki' % 1234567 = 1.206K # K = 1024
The colon (two dots) signifies base two. The "i" is not part of the
format spec, it's just a literal character, to make the standard
abbreviation for kibi.
I meant 1.235M and 1.177M, of course.
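A minimal sketch of that scaling as an ordinary function rather than a new format code. The name scale_quantity, the suffix list, and the choice to append "i" for base 1024 are all assumptions made for the example.

```python
def scale_quantity(value, precision=3, base=1000):
    """Scale a quantity down and append K, M, G, ... (Ki, Mi, ... for 1024)."""
    suffixes = ["", "K", "M", "G", "T"]
    magnitude = float(value)
    index = 0
    # Divide until the magnitude drops below one unit of the next suffix.
    while abs(magnitude) >= base and index < len(suffixes) - 1:
        magnitude /= base
        index += 1
    unit = suffixes[index]
    if base == 1024 and unit:
        unit += "i"
    return f"{magnitude:.{precision}f}{unit}"


print(scale_quantity(1234567))             # K = 1000
print(scale_quantity(1234567, base=1024))  # K = 1024
```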
Raymond, I think there are several problems with the Motivations:
> The goal is to make a common task easier
> for many users.
Common task, for most people, means formatting numbers to the locale. We
should make converting numbers to locale easier to use, as easy as
calling a magic function that can convert the current object to the
locale representation or as simple as defining locale ID in the mini
language. This proposal, I believe, is for the _less_ common task of
formatting a number to a custom format not generally used anywhere else
in the world (like formatting a number to form an ipv6 address or
formatting a number to html/TeX code[1]).
[1] I know one mathematics textbook that uses a superscript minus for
negative numbers to distinguish it from the minus sign.
> In the finance world, output with commas is the norm.
I can't cite any source, but I am skeptical of that. And how about
the non-finance world? The scientific world? The pure math world?
> Provide a simple, non-locale aware way to format a number
> with a thousands separator.
As many have pointed out, locale is hard to use. This is an easier
approach, but it is a pity that it is not locale aware. If we want to
provide non-locale-aware formatting, we must make it flexible enough to
be the "Ultimate Formatter". Otherwise it will just be redundant to locale.
> Adding thousands separators is one of the simplest ways to
> improve the professional appearance and readability of
> output exposed to end users.
There are infinitely many approaches to numbers. One Singaporean textbook
uses a half-width space as the thousands separator. One Australian textbook
uses a superscript minus for negative numbers (which I believe would
require more than Unicode to represent, TeX or PDF perhaps). The
accounting world sometimes uses colors and parentheses to denote
negative numbers (this requires emitting codes for the layout program:
HTML, TeX, PDF).
Anything less powerful than my proposed "Crossing Mayan line" is just a
harder alternative for locale module.
No doubt that you're skeptical of anything you didn't
already know ;-) I'm a CPA, was a 15 year division controller
for a Fortune 500 company, and an auditor for an international
accounting firm. Believe me when I say it is the norm in finance.
Besides, it seems like you're arguing that thousands separators
aren't needed anywhere at all and have doubts about their
utility. Pick-up your pocket calculator and take a look.
Look at your paycheck or your bank statement. Check-out a
publishing style guide. They are somewhat basic. There's
a reason MS Excel and Lotus offered them from day one.
Python's format() style was taken directly from C-Sharp,
which offers both an "n" format that is locale sensitive
and a non-locale-sensitive variant that specifies a comma.
I'm suggesting that we also do both.
Random, made-up statistic: 99% of Python scripts are
not internationalized, have no need to be internationalized,
and have output intended to be used in the script writer's
immediate environment.
Another issue I have with locale is that you have to find
one that matches every specific need. Quick, which one gives
you non-breaking spaces for a thousands separator? If you
do find such a locale and it happens to be spelled the same
way on every platform, is it self-evident in your program
that it will in fact print with spaces, or has that become
an implicit, behind-the-scenes operation? If later you need
to print another number with a different separator, do you
have a way to make that happen without breaking the first piece
of code you wrote?
The locale module has plenty of issues for a programmer to
think about:
http://docs.python.org/library/locale.html#background-details-hints-tips-and-caveats
Besides, lots of people use Python who are not professional
programmers. We should not require them to enter the complicated
world of locale just to do a basic formatting task. When
I teach Python to pre-college students, there is no way I'm
adding locale to the list of things they need to learn to
become functional with the language.
Sorry for the long post, but I feel like you keep inventing
heavy solutions that don't fit well with what we already have.
This should be a simple problem -- when writing a number
format, how do I specify that I want character X as a thousands
separator? The answer to that question should be nothing
harder than, "add character X to the format string."
You're a very creative person, but I don't see Guido accepting
any idea that rejects what he has already chosen as the way
to format strings. He is no fan of the locale module's API,
but it is tightly bound to existing programs and POSIX
standards. That greatly limits the options for changing it.
I'm sure you can come-up with 500 ways of meeting this need
(almost none of which meld with Guido's choice to accept
PEP3101 for both 2.6 and 3.0). I'm offering a simple
extension to the existing framework that makes the above
tasks easy. C-Sharp makes essentially the same choice in its
design. There's no reason for you to have to use it
if you hate it.
Cheers,
Raymond
>
>
>
> Comments and suggestions are welcome but I draw the line at supporting
> Mayan numbering conventions ;-)
>
>
> Raymond
I have no reason to doubt that output with separators is nice, but I am
skeptical that all financial institutions in the world (not just US) use
commas for their separators.
> Python's format() style was taken directly from C-Sharp,
> which offers both an "n" format that is locale sensitive
> and a non-locale-sensitive variant that specifies a comma.
> I'm suggesting that we also do both.
I'm fine with that. But no commas; instead, user-definable separators.
> Random, made-up statistic: 99% of Python scripts are
> not internationalized, have no need to be internationalized,
> and have output intended to be used in the script writer's
> immediate environment.
Random, made-up statistic: 95% of which are scripts written for
personal/internal use.
> If you
> do find such a locale and it happens to be spelled the same
> way on every platform, is it self-evident in your program
> that it will in fact print with spaces, or has that become
> an implicit, behind-the-scenes operation? If later you need
> to print another number with a different separator, do you
> have a way to make that happen without breaking the first piece
> of code you wrote?
Yeah, all data in transmission should be in a locale-independent format;
it should only be turned into a locale-aware format just before it is
shown to the user. That way nothing will break.
Since you're an accountant, I am sure you know about Quicken files,
which store data in locale format, which IMHO is a very BAD design.
> Another issue I have with locale is that you have to find
> one that matches every specific need. Quick, which one gives
> you non-breaking spaces for a thousands separator?
That wasn't the issue. Most programs would either "use the environment's
locale and give user configuration to override the locale" or "I don't
care, the output is for personal/internal consumption" or "The data only
makes sense with certain formatting". I don't see a use case where the
programmer would really want to hardcode a locale AND want the output to
be exactly like what he sees in the user machine.
The first case ("use the environment's locale and give user
configuration to override the locale") is for internationalized
applications, and is served by locale. The locale module is currently
difficult to work with, so I believe we should provide a more accessible
way.
The second case ("I don't care, the output is for personal/internal
consumption"), is well served by python's default view.
The third case ("The data only makes sense with certain formatting") is
the one that will benefit the most from non-locale aware formatting. But
it would require a very powerful formatter. Such use cases include
formatting IP addresses, telephone numbers, ID card numbers, etc.
My proposition is: make the format specifier a simpler API for
locale-aware formatting, or make it capable of serving the third case. I
would rather prioritize the former.
You do know that we already have one, right?
That's what the existing "n" specifier does.
Raymond
> Yes. In COBOL, one writes
>
> PICTURE $999,999,999.99
>
> which is way ahead of most of the later approaches.
That was fixed width. For zero suppression:
PIC $$$$,$$$,$99.99
This will format 1000 as $1,000.00
For fixed width zero suppression:
PIC $ZZZ,ZZZ,Z99.99
gives a fixed width field - $ 1,000.00
with a fixed width font, this will line the column up,
so that the decimals are under each other.
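For comparison, a rough Python approximation of the two pictures, assuming the ',' option as it later shipped in Python 2.7/3.1. It is only a sketch: the widths 14 and 15 are counted from the PICTURE strings above, and real PIC semantics (for example the two mandatory integer digits implied by 99) are not reproduced.

```python
amount = 1000

# Roughly PIC $ZZZ,ZZZ,Z99.99: fixed '$', then a 14-character padded field.
fixed = "$" + format(amount, "14,.2f")

# Roughly PIC $$$$,$$$,$99.99: the '$' floats next to the first digit,
# and the whole result is right-aligned in 15 characters.
floating = format("$" + format(amount, ",.2f"), ">15")

print(repr(fixed))
print(repr(floating))
```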
- Hendrik
> No doubt that you're skeptical of anything you didn't
> already know ;-) I'm a CPA, was a 15 year division controller
> for a Fortune 500 company, and an auditor for an international
> accounting firm. Believe me when I say it is the norm in finance.
> Besides, it seems like you're arguing that thousands separators
> aren't needed anywhere at all and have doubts about their
> utility. Pick-up your pocket calculator and take a look.
> Look at your paycheck or your bank statement.
My current bank and my previous bank use 2 ways to write numbers:
1. a decimal comma, and a space (or half-space or any other appropriate
small whitespace) as a thousands separator
2. written full out in words (including the currency names)
Invoices (not from these banks) often use a point as the thousands
separator (although that's "wrong" according to some national
standards, it's probably okay according to accounting standards...).
The second formatting (full words) is a legal requirement on certain
financial & legal documents here (and I can imagine in other countries
too?). Anybody working on a PEP about implementing a 'w' (for "wordy"?)
formatting type? ;-)
--
JanC
8< -----------------------------------------------------------------
> ......... If "Finance users and non-professional
> programmers find the locale approach to be frustrating, arcane and
> non-obvious" then by all means propose a way of making it simpler and
> clearer, but not a bodge that will increase the amount of bad software
> in the world.
I do not follow the reasoning behind this.
It seems to be based on an assumption that the locale approach
is some sort of holy grail that solves these problems, and that
anybody who does not like or use it is automatically guilty of
writing crap code.
No account seems to be taken of the fact that the locale approach
is a global one that forces uniformity on everything done on a PC
or by a user.
So when you want to make a report in a format that would suit
what your foreign visitors are used to, do you have to change
your server's locale, and change it back again afterwards, or what ?
The locale approach has all the disadvantages of global variables.
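To make the "change it and change it back" pattern concrete, here is a minimal sketch using only the standard locale module. The helper names temporary_locale and decimal_point_under are hypothetical, and the only locale name it relies on is "C", since other names depend on what is installed on the machine. Note that locale.setlocale() changes state for the whole process, so this is not safe if other threads format numbers at the same time.

```python
import locale
from contextlib import contextmanager


@contextmanager
def temporary_locale(name, category=locale.LC_NUMERIC):
    # Calling setlocale() with no locale argument only queries the setting.
    saved = locale.setlocale(category)
    try:
        locale.setlocale(category, name)
        yield
    finally:
        # Put the process-wide setting back, even after an exception.
        locale.setlocale(category, saved)


def decimal_point_under(name):
    # Report the decimal point the given locale would use.
    with temporary_locale(name):
        return locale.localeconv()["decimal_point"]
```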
To make software usable by, or expandable to, different languages
and cultures is a tricky design problem - you have to, at the
minimum, do things like storing all your text, both for prompts and
errors, in some kind of database and refer to it by its key, everywhere.
You cannot simply assume, that because a number represents
a monetary value, that it is Yen, or Australian Dollar, or whatever -
you may have to convert it first, from its currency, to the currency
that you want to display it as, and only then can you worry about
the format that you want to display it in.
In all of this, as I see it, the locale approach addresses only a small
part, and solves very little.
Why is it still being defended and touted as if it were 42? *
- Hendrik
* the answer to life, the universe, and everything.
( - Douglas Adams' Hitchhiker books)
I went "tilt" like a slot machine long before I noticed...
:-)
- Hendrik
> "Tim Rowe" <dig....il.com> wrote:
>
> 8< -----------------------------------------------------------------
>
>> ......... If "Finance users and non-professional
>> programmers find the locale approach to be frustrating, arcane and
>> non-obvious" then by all means propose a way of making it simpler and
>> clearer, but not a bodge that will increase the amount of bad software
>> in the world.
>
> I do not follow the reasoning behind this.
>
> It seems to be based on an assumption that the locale approach
> is some sort of holy grail that solves these problems, and that
> anybody who does not like or use it is automatically guilty of
> writing crap code.
>
> No account seems to be taken of the fact that the locale approach
> is a global one that forces uniformity on everything done on a PC
> or by a user.
Like unicode, locales should make using your computer with your own
cultural settings a one-time configuration, and make using your
computer in another setting possible. By and large they do this.
Like unicode, locales fail in as much as they make cross-cultural
usage difficult. Unlike unicode, there is a lot of failure in the
standard locale library, which is almost entirely the fault of the
standard C locale library it uses.
Nobody's defending the implementation, as far as I've noticed
(which isn't saying much at the moment, but still...). A bit of
poking around in the cheese shop suggests that Babel
(http://www.babel.edgewall.org/) would be better, and Babel with
a context manager would be better yet.
On the other hand, we have a small addition to format strings.
Unfortunately it's a small addition that doesn't feel terribly
natural in a mini-language that already runs the risk of looking
like line noise when you pull the stops out. Not meaning the
term particularly unkindly, it is a bodge; it's quick and dirty,
syntactic saccharin rather than sugar for doing one particular
thing for one particular interest group, and which looks
deceptively like the right thing to do for everyone else.
That's a bad thing to do. If we ever do get round to fixing
localisation (i.e. making overriding bits of locales easy), it
becomes a feature that's automatically present that we have
to discourage normal programmers from using despite its
apparent usefulness.
Frankly, I'd much rather fix the locale system and extend
the format syntax to override the default locale. Perhaps
something like
financial = Locale(group_sep=",", grouping=[3])
print("my number is {0:10n:financial}".format(1234567))
It's hard to think of a way of extending "%" format strings
to cope with this that won't look utterly horrid, though!
--
Rhodri James *-* Wildebeeste Herder to the Masses
> No account seems to be taken of the fact that the locale approach
> is a global one that forces uniformity on everything done on a PC
> or by a user.
Not so. Under .NET, for instance, the global settings will give you a
default CultureInfo class, but you can create your own CultureInfo
classes for other cultures in your program and use them in place of
the default.
> So when you want to make a report in a format that would suit
> what your foreign visitors are used to, do you have to change
> your server's locale, and change it back again afterwards, or what ?
No, you create a local locale and use that.
There are essentially three possible levels I can see for this:
- programs that will only ever be used in one locale, known in
advance. They can have the locale hard-wired into the program. No
special support is needed for this. It's pretty easy to write a
function to format a number to a hard-wired locale. I've done it in
Pascal and FORTH and it was easy-peasy, so I can't imagine it's going
to be a big deal in Python. If it's such a big deal for accountants to
write this code, if they ask in this forum how to do it somebody will
almost certainly supply a function that takes a float and returns a
formatted string within a few minutes. It might even be you or me.
- Programs that may be used in any unchanging locale. The existing
locale support is built for this case.
- Programs that need to operate across locales. This can either be
managed by switching global locales (which you rightly deprecate) or
by managing alternate locales within the program.
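As an illustration of the first level, a hard-wired formatter of the kind described might look like the sketch below. The function name format_grouped and its parameters are made up for this example; it takes a number and returns a string grouped in threes, with the separators fixed by the caller rather than by any locale.

```python
def format_grouped(value, places=2, group_sep=",", decimal_point="."):
    # Format with a fixed number of decimals, then group the integer
    # part in threes from the right.
    sign = "-" if value < 0 else ""
    fixed = "%.*f" % (places, abs(value))
    whole, _, frac = fixed.partition(".")
    groups = []
    while len(whole) > 3:
        groups.insert(0, whole[-3:])
        whole = whole[:-3]
    groups.insert(0, whole)
    result = sign + group_sep.join(groups)
    if frac:
        result += decimal_point + frac
    return result


if __name__ == "__main__":
    print(format_grouped(1234567.891))
    print(format_grouped(1234567.891, group_sep=" ", decimal_point=","))
```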
> The locale approach has all the disadvantages of global variables.
No, it has all the advantages of global constants used as overridable
defaults for local variables.
> To make software usable by, or expandable to, different languages
> and cultures is a tricky design problem - you have to, at the
> minimum, do things like storing all your text, both for prompts and
> errors, in some kind of database and refer to it by its key, everywhere.
> You cannot simply assume, that because a number represents
> a monetary value, that it is Yen, or Australian Dollar, or whatever -
> you may have to convert it first, from its currency, to the currency
> that you want to display it as, and only then can you worry about
> the format that you want to display it in.
Nothing in the proposal being considered addresses any of that.
--
Tim Rowe
> Rhodri James wrote:
> [snip]
>> Frankly, I'd much rather fix the locale system and extend
>> the format syntax to override the default locale. Perhaps
>> something like
>> financial = Locale(group_sep=",", grouping=[3])
>> print("my number is {0:10n:financial}".format(1234567))
>> It's hard to think of a way of extending "%" format strings
>> to cope with this that won't look utterly horrid, though!
>>
> The problem with your example is that it magically looks for the locale
> name "financial" in the current namespace.
True, to an extent. The counter-argument of "Is it so much
more magical than '{keyword}' looking up the object in the
parameter list" suggests a less magical approach would be to
make the locale a parameter itself:
print("my number is {0:10n:{1}}".format(1234567, financial))
> Perhaps the name should be
> registered somewhere like this:
>
> locale.predefined["financial"] = Locale(group_sep=",", grouping=[3])
> print("my number is {0:10n:financial}".format(1234567))
I'm not sure that I don't think that *more* magical than my
first stab! Regardless of the exact syntax, do you think
that being able to specify an overriding locale object (and
let's wave our hands over what one of those is too) is the
right approach?
financial = Locale(group_sep=",", grouping=[3])
print("my number is {0:10n:{fin}}".format(1234567, fin=financial))
Then again, shouldn't that be:
fin = Locale(group_sep=",", grouping=[3])
print("my number is {0:{fin}}".format(1234567, fin=fin))
> The field name can be an integer or an identifier, so the locale could
> be too, provided that you know where to look it up!
>
> financial = Locale(group_sep=",", grouping=[3])
> print("my number is {0:10n:{fin}}".format(1234567, fin=financial))
>
> Then again, shouldn't that be:
>
> fin = Locale(group_sep=",", grouping=[3])
> print("my number is {0:{fin}}".format(1234567, fin=fin))
Except that loses you the format, since the locale itself is a collection
of parameters the format uses. The locale knows how to do groupings, but
not whether to do them, nor what the field width should be. Come to think
of it, it doesn't know whether to use the LC_NUMERIC grouping information
or the LC_MONETARY grouping information. Hmm.
I can't believe I'm even suggesting this, but how about:
print('my number is {fin.format("10d", {0}, True)}'.format(1235467,
fin=financial))
assuming the locale.format() method remains unchanged? That's horrible,
and I'm pretty sure it can't be right, but I'm too tired to think of
anything more sensible right now.
financial = Locale(group_sep=",", grouping=[3])
print("my number is {0:10n:fin}".format(1234567, fin=financial))
The format "10n" says whether to use separators or a decimal point; the
locale "fin" says what the separator and the decimal point look like.
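The division of labour described here can be sketched in plain Python. No such Locale class exists in the standard library; the class below, its format_number method and the decimal_point parameter are all hypothetical, with only the constructor arguments group_sep and grouping borrowed from the examples in this thread. The caller decides the width and the number of decimals; the object only knows what the separators look like and how digits are grouped.

```python
class Locale(object):
    """Hypothetical 'local locale': separators and grouping only."""

    def __init__(self, group_sep=",", decimal_point=".", grouping=(3,)):
        self.group_sep = group_sep
        self.decimal_point = decimal_point
        self.grouping = list(grouping)

    def group_digits(self, digits):
        # Apply the group sizes from the right; the last size repeats.
        sizes = list(self.grouping)
        parts = []
        while digits:
            size = sizes.pop(0) if len(sizes) > 1 else sizes[0]
            parts.insert(0, digits[-size:])
            digits = digits[:-size]
        return self.group_sep.join(parts)

    def format_number(self, value, width=0, places=0):
        # Width and precision come from the caller, not from the locale.
        sign = "-" if value < 0 else ""
        whole, _, frac = ("%.*f" % (places, abs(value))).partition(".")
        text = sign + self.group_digits(whole)
        if frac:
            text += self.decimal_point + frac
        return text.rjust(width)


financial = Locale(group_sep=",", grouping=[3])
print("my number is " + financial.format_number(1234567, width=10))
```

A grouping of [3, 2] gives the Indian-style grouping mentioned earlier in the thread, without touching the global locale.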
> It should probably(?) be:
>
> financial = Locale(group_sep=",", grouping=[3])
> print("my number is {0:10n:fin}".format(1234567, fin=financial))
>
> The format "10n" says whether to use separators or a decimal point; the
> locale "fin" says what the separator and the decimal point look like.
That works, and isn't an abomination on the face of the existing syntax.
Excellent.
I'm rather presuming that the "n" presentation type does grouping. I've
only got Python 2.5 here, so I can't check it out (no str.format() method
and "%n" isn't supported by "%" formatting). If it does, an "m" type to
do the same thing only with the LC_MONETARY group settings instead of the
LC_NUMERIC ones would be a good idea.
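For reference, the existing locale module already keeps the two sets of group settings apart, and locale.format_string() takes a monetary flag to choose between them. The sketch below only uses those existing calls; the helper names grouping_settings and format_both are illustrative, and the actual separators returned depend entirely on the locale in force.

```python
import locale


def grouping_settings():
    # localeconv() reports numeric and monetary conventions separately.
    conv = locale.localeconv()
    return {
        "numeric": (conv["thousands_sep"], conv["grouping"]),
        "monetary": (conv["mon_thousands_sep"], conv["mon_grouping"]),
    }


def format_both(value):
    # Same integer, grouped first by the LC_NUMERIC settings and then
    # by the LC_MONETARY settings of the current global locale.
    numeric = locale.format_string("%d", value, grouping=True)
    monetary = locale.format_string("%d", value, grouping=True, monetary=True)
    return numeric, monetary
```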
This would be my preferred solution to Raymond's original
comma-in-the-format-string proposal, by the way: add an "m" presentation
type as above, and tell people to override the LC_MONETARY group settings
in the global locale. It's clear that it's a bodge, and weaning users
onto local locales (!) wouldn't be so hard later on.
Anyway, time I stopped hypothesising about locales and started looking at
the actual code-base, methinks.
I would prefer the format to have a fixed default so that if you don't
specify the locale the result is predictable.
> I'm not against putting a comma in the format to indicate that grouping
> should be used just as a dot indicates that a decimal point should be
> used. The locale would say what characters would be used for them.
>
> I would prefer the format to have a fixed default so that if you don't
> specify the locale the result is predictable.
Shouldn't that be the global locale?
Yes, but the format type 'n' is currently defined as taking its cues
from the global locale, so in that sense format already is
locale-sensitive.
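The difference between the two behaviours can be seen side by side in a Python version that accepts the ',' option (2.7, or 3.1 and later). This is a small sketch with a hypothetical helper name; the "n" result depends on whatever the global LC_NUMERIC setting happens to be, while the "," result does not.

```python
import locale


def compare_n_and_comma(value):
    # "n" takes its separators from the global locale;
    # "," always groups with commas, whatever the locale says.
    return format(value, "n"), format(value, ",")


if __name__ == "__main__":
    # Under the default "C" locale the "n" form shows no grouping at all.
    locale.setlocale(locale.LC_NUMERIC, "C")
    print(compare_n_and_comma(1234567))
```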
> If anyone here is interested, here is a proposal I posted on the
> python-ideas list.
>
> The idea is to make numbering formatting a little easier with the
> new format() builtin
> in Py2.6 and Py3.0:
> http://docs.python.org/library/string.html#formatspec
>
[...]
> Comments and suggestions are welcome but I draw the line at
> supporting Mayan numbering conventions ;-)
Is that inclusive or exclusive?
--
rzed