
"comma-izing" numbers - a blast from the past!


Kenny McCormack

Dec 9, 2011, 9:03:41 AM
Today I had the need to display numbers "comma-ized" - which is a frequently
requested feature in this newsgroup. This arose in the context of
displaying Unix file sizes using the "ls" command (*). A bit of googling
turned up the following (GAWK) function, written back in 2005 by frequent
poster "Ed":

# Cleaned up a bit by me.
function comma(num) {
    if (num < 0)
        return "-" comma(-num)
    while (num != (num=gensub(/([0-9])([0-9][0-9][0-9])($|[,.])/,"\\1,\\2\\3","",num)));
    return num
}

# And the following program line to process the output of "ls"
{ sub($5" *"$6,$5 sprintf("%20s",comma($6)));print }

----------------------------------------------------------------------------
(*) Aside: ISTR that there was some way to do this directly in "ls", but I
couldn't find it in the man page.

--
Just for a change of pace, this sig is *not* an obscure reference to
comp.lang.c...
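A portable sketch of the same keep-rewriting-until-nothing-changes idea, for awks without `gensub()`. It assumes POSIX awk and integer input such as file sizes; because POSIX `sub()` has no capture groups, `match()` and `substr()` do the insertion instead. The names `COMMA_AWK`, `comma_fixpoint` and `ls_commas` are hypothetical, and `ls_commas` assumes the size is field 5, as in the `ls -l` listings quoted later in the thread. It is not the 2005 function itself.

```sh
#!/bin/sh
# Hypothetical POSIX-awk port of the "loop until the string stops changing"
# approach. Each pass finds four digits followed only by already-grouped
# ",ddd" runs up to the end of the string and puts a comma after the first
# of those four digits. A leading sign is never matched, so it is preserved.
COMMA_AWK='
function comma(num) {
    while (match(num, /[0-9][0-9][0-9][0-9](,[0-9][0-9][0-9])*$/))
        num = substr(num, 1, RSTART) "," substr(num, RSTART + 1)
    return num
}'

# Comma-ize each input line (assumed to be an optionally signed integer).
comma_fixpoint() {
    awk "$COMMA_AWK"'
    { print comma($0) }'
}

# Comma-ize field 5 of ls -l style lines; lines like "total 123" are skipped.
# Assigning to $5 makes awk rebuild the line, collapsing runs of blanks.
ls_commas() {
    awk "$COMMA_AWK"'
    NF >= 8 && $5 ~ /^[0-9]+$/ { $5 = comma($5) }
    { print }'
}

printf '%s\n' 999 1000 -1234567 | comma_fixpoint
printf '%s\n' '-rw-r--r-- 1 rugxulo rugxulo 578272 2007-05-01 20:29 tde51vs.zip' |
    ls_commas
# Real use: ls -l | ls_commas
```

With the sample data this prints `999`, `1,000`, `-1,234,567` and the listing line with `578,272` in the size column. Because reassigning `$5` collapses the original spacing, Kenny's line instead edits the record with `sub()` and pads with `sprintf()`; the sketch trades alignment for simplicity.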

pk

Dec 9, 2011, 9:06:51 AM
On Fri, 9 Dec 2011 14:03:41 +0000 (UTC), gaz...@shell.xmission.com (Kenny
McCormack) wrote:

> (*) Aside: ISTR that there was some way to do this directly in "ls", but I
> couldn't find it in the man page.

ls -l --block-size="'1"

with GNU ls.


Kenny McCormack

Dec 9, 2011, 9:12:39 AM
No.

--
But the Bush apologists hope that you won't remember all that. And they
also have a theory, which I've been hearing more and more - namely,
that President Obama, though not yet in office or even elected, caused the
2008 slump. You see, people were worried in advance about his future
policies, and that's what caused the economy to tank. Seriously.

(Paul Krugman - Addicted to Bush)

pk

Dec 9, 2011, 9:14:00 AM
On Fri, 9 Dec 2011 14:12:39 +0000 (UTC), gaz...@shell.xmission.com (Kenny
McCormack) wrote:

> In article <jbt4n4$bid$1...@speranza.aioe.org>, pk <p...@pk.invalid> wrote:
> >On Fri, 9 Dec 2011 14:03:41 +0000 (UTC), gaz...@shell.xmission.com
> >(Kenny McCormack) wrote:
> >
> >> (*) Aside: ISTR that there was some way to do this directly in "ls",
> >> but I couldn't find it in the man page.
> >
> >ls -l --block-size="'1"
> >
> >with GNU ls.
>
> No.

Tough.

Kenny McCormack

Dec 9, 2011, 9:17:15 AM
Thank you.

Anyway, can we get back on topic now? Thanks again.

--
"Remember when teachers, public employees, Planned Parenthood, NPR and PBS
crashed the stock market, wiped out half of our 401Ks, took trillions in
TARP money, spilled oil in the Gulf of Mexico, gave themselves billions in
bonuses, and paid no taxes? Yeah, me neither."

Janis Papanagnou

Dec 9, 2011, 10:15:41 AM
Am 09.12.2011 15:03, schrieb Kenny McCormack:
> Today I had the need to display numbers "comma-ized" - which is a frequently
> requested feature in this newsgroup. This arose in the context of
> displaying Unix file sizes using the "ls" command (*). A bit of googling
> turned up the following (GAWK) function, written back in 2005 by frequent
> poster "Ed":
>
> # Cleaned up a bit by me.
> function comma(num) {
> if (num< 0)
> return "-" comma(-num)
> while (num != (num=gensub(/([0-9])([0-9][0-9][0-9])($|[,.])/,"\\1,\\2\\3","",num)));
> return num
> }
>
> # And the following program line to process the output of "ls"
> { sub($5" *"$6,$5 sprintf("%20s",comma($6)));print }
>
> ----------------------------------------------------------------------------
> (*) Aside: ISTR that there was some way to do this directly in "ls", but I
> couldn't find it in the man page.
>

I didn't see any question in your posting. So, until further
clarification, I add ("blast from the past") my 2 cents

{ for (p="[0-9][0-9][0-9]$"; sub(p,",&"); p="[0-9][0-9][0-9],"p)
      ;
  sub(/^,/,"")
  print
}

(which does not depend on any GNU features).

A plain loop/substr()/length() based version would also be
straightforward, so I am wondering what it is that you actually
are asking for.

Janis
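The plain loop/`substr()`/`length()` version mentioned above might look like the following minimal sketch. It assumes integer input with an optional leading `+` or `-`; the wrapper name `comma_loop` is hypothetical. The extra parameters after `num` are the usual awk idiom for local variables.

```sh
#!/bin/sh
# Hypothetical "plain loop" version using only length() and substr():
# repeatedly move the last three digits (with a comma in front) from num
# to out until at most three digits are left.
comma_loop() {
    awk '
    function comma(num,    sign, out, n) {   # extra params act as locals
        sign = ""
        if (num ~ /^[+-]/) {                 # peel off a leading sign
            sign = substr(num, 1, 1)
            num = substr(num, 2)
        }
        out = ""
        n = length(num)
        while (n > 3) {
            out = "," substr(num, n - 2) out  # last three digits
            num = substr(num, 1, n - 3)
            n = length(num)
        }
        return sign num out
    }
    { print comma($0) }'
}

printf '%s\n' 7 1234 -1234567 +123456 | comma_loop
```

Run as shown, it prints `7`, `1,234`, `-1,234,567` and `+123,456`.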

Ed Morton

Dec 9, 2011, 3:03:56 PM
Kenny McCormack <gaz...@shell.xmission.com> wrote:

> Today I had the need to display numbers "comma-ized" - which is a frequently
> requested feature in this newsgroup. This arose in the context of
> displaying Unix file sizes using the "ls" command (*). A bit of googling
> turned up the following (GAWK) function, written back in 2005 by frequent
> poster "Ed":
>
> # Cleaned up a bit by me.
> function comma(num) {
> if (num < 0)
> return "-" comma(-num)
> while (num != (num=gensub(/([0-9])([0-9][0-9][0-9])($|[,.])/,"\\1,\\2\\3","",num)));
> return num
> }
>
> # And the following program line to process the output of "ls"
> { sub($5" *"$6,$5 sprintf("%20s",comma($6)));print }
>

Don't know if this is any better or worse, but here's a non-Gawk-specific
version without a loop or recursion:

function comma(num) {
    sign = (sub(/^-/,"",num) ? "-" : "")

    lgth = length(num)

    lgthTail = int(lgth/3) * 3 - (lgth%3 ? 0 : 3)
    lgthHead = lgth - lgthTail

    tail = substr(num,lgthHead+1)
    head = substr(num,1,lgthHead)

    gsub(/.../,",&",tail)

    return sign head tail
}

{ print comma($0) }

Regards

Ed.

Posted using www.webuse.net
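The length arithmetic in that function can be checked by hand. Writing $L$ for `lgth` (digits only, sign already removed), $t$ for `lgthTail` and $h$ for `lgthHead`, the tail is always a whole number of three-digit groups and the head keeps one to three digits, which is why a single `gsub(/.../,",&",tail)` suffices:

```latex
\[
  t \;=\; 3\left\lfloor \frac{L}{3} \right\rfloor
        - \begin{cases} 0, & L \bmod 3 \neq 0,\\ 3, & L \bmod 3 = 0, \end{cases}
  \;=\; 3\left\lfloor \frac{L-1}{3} \right\rfloor,
  \qquad
  h \;=\; L - t \in \{1, 2, 3\} \quad (L \ge 1).
\]
```

For example, `1234567` has $L = 7$, so $t = 6$ and $h = 1$: head `1`, tail `234567`, which the `gsub()` turns into `,234,567`. With $L = 6$ the head is `123` and the tail `456`; with $L = 3$ the tail is empty.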

Janis Papanagnou

Dec 9, 2011, 5:25:02 PM
On 09.12.2011 21:03, Ed Morton wrote:
> Kenny McCormack <gaz...@shell.xmission.com> wrote:
>
>> Today I had the need to display numbers "comma-ized" - which is a frequently
>> requested feature in this newsgroup. This arose in the context of
>> displaying Unix file sizes using the "ls" command (*). A bit of googling
>> turned up the following (GAWK) function, written back in 2005 by frequent
>> poster "Ed":
>>
>> # Cleaned up a bit by me.
>> function comma(num) {
>> if (num < 0)
>> return "-" comma(-num)
>> while (num != (num=gensub(/([0-9])([0-9][0-9][0-9])($|[,.])/,"\\1,\\2\\3","",num)));
>> return num
>> }
>>
>> # And the following program line to process the output of "ls"
>> { sub($5" *"$6,$5 sprintf("%20s",comma($6)));print }
>>
>
> Don't know if this is any better or worse, but here's a non-Gawk-specific
> version without a loop or recursion:

I like that function.

> function comma(num) {
> sign = (sub(/^-/,"",num) ? "-" : "")

Since it also handles a minus sign, some tweaking would be nice, though,
to correctly handle the + sign as well. Maybe something like

sign = substr(num,1,sub(/^[+-]/,"",num)) # a bit tricky, I admit

I wonder whether that expression is well defined or whether the evaluation
order of the substr() arguments may lead to problems, e.g. with other awks.

Janis

Rugxulo

Dec 9, 2011, 6:22:41 PM
Hi,

On Dec 9, 8:03 am, gaze...@shell.xmission.com (Kenny McCormack) wrote:
>
> Today I had the need to display numbers "comma-ized" - which is a frequently
> requested feature in this newsgroup.  This arose in the context of
> displaying Unix file sizes using the "ls" command (*).
>
> (*) Aside: ISTR that there was some way to do this directly in "ls", but I
> couldn't find it in the man page.

There are probably a bunch of ways to do it. Heck, I'm neither an AWK
nor a POSIX programmer, and I know of several ways:

# gawk --version | sed -e '1q'
GNU Awk 3.1.6
# cat commas.awk
{printf("%'ld\n",$5)}
# ls -l tde*
-rw-r--r-- 1 rugxulo rugxulo 578272 2007-05-01 20:29 tde51vs.zip
-rw-r--r-- 1 rugxulo rugxulo 386045 2009-08-12 14:56 tdelinux.gz
# ls -l tde* | awk -f commas.awk
578,272
386,045

Admittedly, I like the aforementioned AWK-only hardcoded version
better (even more portable), but since you did say "displaying Unix
file sizes", I can't help but suggest this.

http://pubs.opengroup.org/onlinepubs/009695399/functions/printf.html

P.S. There are a few alternatives in sed over at http://sed.sourceforge.net/#scripts,
but this way was the most obvious to me, at least.

Ed Morton

Dec 9, 2011, 10:01:17 PM
You could always split it into (untested and not thought about very much!):

signPos = sub(/^[+-]/,"",num)
..
return substr(num,signPos,1) head tail

Regards,

Ed.

>
> Janis
>
> >
> > lgth = length(num)
> >
> > lgthTail = int(lgth/3) * 3 - (lgth%3 ? 0 : 3)
> > lgthHead = lgth - lgthTail
> >
> > tail = substr(num,lgthHead+1)
> > head = substr(num,1,lgthHead)
> >
> > gsub(/.../,",&",tail)
> >
> > return sign head tail
> > }
> >
> > { print comma($0) }
> >
> > Regards
> >
> > Ed.
> >
> > Posted using www.webuse.net


Posted using www.webuse.net
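One caveat about the split version above: by the time its `return` runs, `sub()` has already removed the sign from `num`, so when a sign was present `substr(num,signPos,1)` yields the first digit rather than the sign. A minimal sketch that keeps a copy of the original string avoids that, and because each step is its own statement, the argument-evaluation-order question Janis raised does not come up. The extra parameters after `num` are the usual awk idiom for locals, so `sign`, `head` and the rest no longer become globals; the wrapper name `comma_signed` is hypothetical.

```sh
#!/bin/sh
# Ed's no-loop comma() with two hypothetical tweaks: the working variables
# are declared as extra parameters (awk's idiom for locals), and the sign is
# taken from a copy saved before sub() strips it from num.
comma_signed() {
    awk '
    function comma(num,    orig, signLen, sign, lgth, lgthTail, lgthHead, head, tail) {
        orig = num
        signLen = sub(/^[+-]/, "", num)   # 1 if a sign was removed, else 0
        sign = substr(orig, 1, signLen)   # empty string when signLen is 0
        lgth = length(num)
        lgthTail = int(lgth/3) * 3 - (lgth%3 ? 0 : 3)
        lgthHead = lgth - lgthTail
        tail = substr(num, lgthHead + 1)
        head = substr(num, 1, lgthHead)
        gsub(/.../, ",&", tail)           # tail length is a multiple of 3
        return sign head tail
    }
    { print comma($0) }'
}

printf '%s\n' 123 -1234 +1234567 | comma_signed
```

This prints `123`, `-1,234` and `+1,234,567`.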

Tim Menzies

Dec 9, 2011, 11:34:01 PM
On 2011-12-09, Janis Papanagnou <janis_pa...@hotmail.com> wrote:
> Am 09.12.2011 15:03, schrieb Kenny McCormack:
>> Today I had the need to display numbers "comma-ized" - which is a frequently
>> requested feature in this newsgroup.

> I didn't see any question in your posting. So, until further
> clarification, I add ("blast from the past") my 2 cents
>
> { for (p="[0-9][0-9][0-9]$"; sub(p,",&"); p="[0-9][0-9][0-9],"p)
> ;
> sub(/^,/,"")
> print
> }
>
> (which is not depending on the GNU features).

Janis,
That is a great solution. I've never used "for" except for numbers. You da man!

t

mic...@gortel.phys.ualberta.ca

Dec 10, 2011, 1:31:59 AM
Rugxulo <rug...@gmail.com> wrote:
> # cat commas.awk
> {printf("%'ld\n",$5)}
> # ls -l tde*
> -rw-r--r-- 1 rugxulo rugxulo 578272 2007-05-01 20:29 tde51vs.zip
> -rw-r--r-- 1 rugxulo rugxulo 386045 2009-08-12 14:56 tdelinux.gz
> # ls -l tde* | awk -f commas.awk
> 578,272
> 386,045

More precisely, this uses not a "comma" but the "thousands separator", which
is locale-dependent. In particular, if you run the above as
# ls -l tde* | LANG=C awk -f commas.awk
then the separator will be empty; with 'LANG=de_DE', for example,
you will see dots, while 'LANG=ru_RU' will have yet another effect.

Michal
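Michal's point can be tried without touching the session's own settings by pinning the locale for a single command. A minimal sketch, assuming an external `printf` utility that accepts the `'` flag, as the GNU coreutils one does (`env` makes sure the external command, not a shell builtin, runs); whether the `en_US.UTF-8` and `de_DE.UTF-8` locales exist depends on the system.

```sh
#!/bin/sh
n=1234567
# POSIX ("C") locale: no grouping character, so this prints 1234567.
env LC_ALL=C printf "%'d\n" "$n"
# These depend on installed locales; if one is missing, setlocale() fails
# and the output is the same as in the C locale.
env LC_ALL=en_US.UTF-8 printf "%'d\n" "$n"   # 1,234,567 when available
env LC_ALL=de_DE.UTF-8 printf "%'d\n" "$n"   # 1.234.567 when available
```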

Kenny McCormack

Dec 12, 2011, 6:16:04 AM
In article <jbuucv$t2v$1...@tabloid.srv.ualberta.ca>,
<mic...@gortel.phys.ualberta.ca> wrote:
>Rugxulo <rug...@gmail.com> wrote:
>> # cat commas.awk
>> {printf("%'ld\n",$5)}
>> # ls -l tde*
>> -rw-r--r-- 1 rugxulo rugxulo 578272 2007-05-01 20:29 tde51vs.zip
>> -rw-r--r-- 1 rugxulo rugxulo 386045 2009-08-12 14:56 tdelinux.gz
>> # ls -l tde* | awk -f commas.awk
>> 578,272
>> 386,045

Curiously, on both the systems I tested this on, I got:

$ LANG=en_US gawk 'BEGIN {printf "%\047ld\n",123456789}'
,123,456,789
$

I get a similar result with a 6 digit number as well.

Ed Morton

Dec 12, 2011, 3:19:22 PM
Kenny McCormack <gaz...@shell.xmission.com> wrote:

> In article <jbuucv$t2v$1...@tabloid.srv.ualberta.ca>,
> <mic...@gortel.phys.ualberta.ca> wrote:
> >Rugxulo <rug...@gmail.com> wrote:
> >> # cat commas.awk
> >> {printf("%'ld\n",$5)}
> >> # ls -l tde*
> >> -rw-r--r-- 1 rugxulo rugxulo 578272 2007-05-01 20:29 tde51vs.zip
> >> -rw-r--r-- 1 rugxulo rugxulo 386045 2009-08-12 14:56 tdelinux.gz
> >> # ls -l tde* | awk -f commas.awk
> >> 578,272
> >> 386,045
>
> Curiously, on both the systems I tested this on, I got:
>
> $ LANG=en_US gawk 'BEGIN {printf "%\047ld\n",123456789}'
> ,123,456,789
> $
>
> I get a similar result with a 6 digit number as well.
>

That is odd. On Solaris SunOS 5.10 I get:

$ LANG=en_US gawk 'BEGIN {printf "%\047ld\n",0123456789}'
123,456,789

$ LANG=en_US nawk 'BEGIN {printf "%\047ld\n",0123456789}'
123,456,789

$ LANG=en_US /usr/xpg4/bin/awk 'BEGIN {printf "%\047ld\n",0123456789}'
123,456,789

$ LANG=en_US gawk --posix 'BEGIN {printf "%\047ld\n",0123456789}'
gawk: cmd. line:1: fatal: `l' is not permitted in POSIX awk formats

$ gawk --version
GNU Awk 4.0.0

Notice that for better or worse it has the added side-effect of stripping
leading zeros. Probably for better in most cases I would think.

Ed.

Posted using www.webuse.net

mic...@gortel.phys.ualberta.ca

Dec 12, 2011, 7:59:52 PM
Kenny McCormack <gaz...@shell.xmission.com> wrote:
>
> Curiously, on both the systems I tested this on, I got:
>
> $ LANG=en_US gawk 'BEGIN {printf "%\047ld\n",123456789}'
> ,123,456,789
> $
>
> I get a similar result with a 6 digit number as well.

That is definitely not what I am seeing. I wonder what you will
get with
$ LANG=en_US ls -l --block-size="'1"
if your 'ls' allows such options. Presumably changing LANG to
'en_US.UTF-8', if recognized, does not make any difference?

Michal

Kenny McCormack

Dec 13, 2011, 4:54:22 AM
In article <201112122...@webuse.net>,
Ed Morton <morto...@gmail.com> wrote:
...
>That is odd. On Solaris SunOS 5.10 I get:
>
>$ LANG=en_US gawk 'BEGIN {printf "%\047ld\n",0123456789}'
>123,456,789

Interestingly enough, it does the right thing in gawk4, but not in gawk
3.1.4. I use gawk 3.1.4 for most of my day-to-day work, so that's why I got
the result that I got.

Strange - because intuitively, you'd think that this operation was just a
passthrough to the underlying printf (in your standard C library), so the
version of GAWK shouldn't matter...

--
> No, I haven't, that's why I'm asking questions. If you won't help me,
> why don't you just go find your lost manhood elsewhere.

CLC in a nutshell.

Aharon Robbins

Dec 14, 2011, 2:39:06 PM
In article <jc77ce$2kk$1...@news.xmission.com>,
Kenny McCormack <gaz...@shell.xmission.com> wrote:
>In article <201112122...@webuse.net>,
>Ed Morton <morto...@gmail.com> wrote:
>...
>>That is odd. On Solaris SunOS 5.10 I get:
>>
>>$ LANG=en_US gawk 'BEGIN {printf "%\047ld\n",0123456789}'
>>123,456,789
>
>Interestingly enough, it does the right thing in gawk4, but not in gawk
>3.1.4. I use gawk 3.1.4 for most of my day-to-day work, so that's why I got
>the result that I got.

It was a bug, now fixed.

And 3.1.4 is OOOOOOOOOOOLLLLLLLLLLLLLLLLDDDDDDDDDDDDDDDD. You should upgrade.

>Strange - because intuitively, you'd think that this operation was just a
>passthrough to the underlying printf (in your standard C library), so the
>version of GAWK shouldn't matter...

What if you're on a system where the underlying printf doesn't support
the apostrophe flag?

Gawk therefore does this itself, although as seen, for a while there
was a bug.

Gawk in fact only punts to the underlying printf for floating point
values.
--
Aharon (Arnold) Robbins arnold AT skeeve DOT com
P.O. Box 354 Home Phone: +972 8 979-0381
Nof Ayalon Cell Phone: +972 50 729-7545
D.N. Shimshon 99785 ISRAEL
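Doing the grouping in-house means handling the edge case visible in Kenny's `,123,456,789`: when the digit count is a multiple of three, no separator may precede the first digit. The sketch below is not gawk's code; it is a hypothetical awk routine that takes the separator as a parameter (as a locale would supply it) and guards that case explicitly. It assumes unsigned integer input, and the name `group_digits` is made up.

```sh
#!/bin/sh
# group_digits SEP: hypothetical helper that inserts SEP between groups of
# three digits of each input line (unsigned integers assumed).
group_digits() {
    awk -v sep="$1" '
    function group(num, s,    out, i, n) {
        n = length(num)
        out = ""
        for (i = 1; i <= n; i++) {
            # A separator goes before digit i when the digits from i to the
            # end form whole groups of three, but never before the first
            # digit (the i > 1 test), or 123456789 would become ,123,456,789.
            if (i > 1 && (n - i + 1) % 3 == 0)
                out = out s
            out = out substr(num, i, 1)
        }
        return out
    }
    { print group($0, sep) }'
}

printf '%s\n' 123456789 123456 1000 12 | group_digits ,
printf '%s\n' 1234567 | group_digits .
```

This prints `123,456,789`, `123,456`, `1,000`, `12` and then `1.234.567`.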

Ivan Shmakov

Dec 14, 2011, 10:31:10 PM
>>>>> michal <mic...@gortel.phys.ualberta.ca> writes:
>>>>> Kenny McCormack <gaz...@shell.xmission.com> wrote:

>> Curiously, on both the systems I tested this on, I got:

>> $ LANG=en_US gawk 'BEGIN {printf "%\047ld\n",123456789}'
>> ,123,456,789
>> $

>> I get a similar result with a 6 digit number as well.

> That is definitely not what I am seeing. I wonder what you will
> get with

> $ LANG=en_US ls -l --block-size="'1"

Please note that, as usual, LC_NUMERIC (if set) will override
LANG, and will be overridden in turn by LC_ALL. (I seem to
recall that certain GNU/Linux systems were setting LC_ALL
instead of LANG in their default shell profiles.)

> if your 'ls' allows such options. Presumably changing LANG to
> 'en_US.UTF-8', if recognized, is not making any difference?

I wonder if anything besides Coreutils' ls has such an option.

--cut: (coreutils) What information is listed --
Normally the size is printed as a byte count without punctuation,
but this can be overridden (*note Block size::). For example, `-h'
prints an abbreviated, human-readable count, and `--block-size="'1"'
prints a byte count with the thousands separator of the current
locale.
--cut: (coreutils) What information is listed --

--
FSF associate member #7257
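The override order described above (`LC_ALL`, then `LC_NUMERIC`, then `LANG`) can be observed with one command per case. A sketch assuming an external `printf` that supports the `'` flag, such as the coreutils one; both commands print `1234567`, because in each case the winning setting is the C locale, whatever `LANG` says and whether or not `en_US.UTF-8` is installed. Per POSIX, an empty `LC_ALL` counts as unset.

```sh
#!/bin/sh
n=1234567
# LC_ALL=C beats both LC_NUMERIC and LANG: prints 1234567, no grouping.
env LANG=en_US.UTF-8 LC_NUMERIC=en_US.UTF-8 LC_ALL=C printf "%'d\n" "$n"
# With LC_ALL empty (treated as unset), LC_NUMERIC=C beats LANG for numbers:
# also prints 1234567.
env LC_ALL= LANG=en_US.UTF-8 LC_NUMERIC=C printf "%'d\n" "$n"
```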

mic...@gortel.phys.ualberta.ca

Dec 15, 2011, 10:25:51 AM
Ivan Shmakov <onei...@gmail.com> wrote:
>>>>>> michal <mic...@gortel.phys.ualberta.ca> writes:
>
> > $ LANG=en_US ls -l --block-size="'1"
...
> I wonder if anything besides Coreutils' ls has such an option.

Yeah, at least for GNU stuff there is information in

$ info coreutils 'block size'

But this can be found in other places too. Outside of gawk :-), at
least in current bash:

$ printf "%'ld\n" 1234567
1,234,567

Moreover, 'man 3 printf' says this:

For some numeric conversions a radix character ("decimal point") or
thousands' grouping character is used. The actual character used
depends on the LC_NUMERIC part of the locale. The POSIX locale uses
'.' as radix character, and does not have a grouping character. Thus,

printf("%'.2f", 1234567.89);

results in "1234567.89" in the POSIX locale, in "1234567,89" in the
nl_NL locale, and in "1.234.567,89" in the da_DK locale.

and

The SUSv2 specifies one further flag character.

' For decimal conversion (i, d, u, f, F, g, G) the output is to be
grouped with thousands' grouping characters if the locale
information indicates any. Note that many versions of gcc(1) cannot
parse this option and will issue a warning. SUSv2 does not
include %'F.

So what works, how, and where depends on your OS, libraries,
versions, etc.

Michal

Kenny McCormack

Dec 22, 2011, 6:28:50 AM
In article <jcau0q$l88$1...@dont-email.me>,
Aharon Robbins <arn...@skeeve.com> wrote:
>In article <jc77ce$2kk$1...@news.xmission.com>,
>Kenny McCormack <gaz...@shell.xmission.com> wrote:
>>In article <201112122...@webuse.net>,
>>Ed Morton <morto...@gmail.com> wrote:
>>...
>>>That is odd. On Solaris SunOS 5.10 I get:
>>>
>>>$ LANG=en_US gawk 'BEGIN {printf "%\047ld\n",0123456789}'
>>>123,456,789
>>
>>Interestingly enough, it does the right thing in gawk4, but not in gawk
>>3.1.4. I use gawk 3.1.4 for most of my day-to-day work, so that's why I got
>>the result that I got.
>
>It was a bug, now fixed.

I see. Interesting.

>And 3.1.4 is OOOOOOOOOOOLLLLLLLLLLLLLLLLDDDDDDDDDDDDDDDD. You should upgrade.

As I'm sure you're aware, I have my reasons for sticking with this version.
Specifically, I added some special^Wnon-standard functionality and I can't
really retro-fit my changes to every new released version. FWIW, it
probably wouldn't be *that* hard for me to retro-fit, but I haven't actually
looked at the latest source code, so I don't really know what would be
involved.

In any case, the need to "upgrade" hasn't been pressing enough for me to
bother (yet). Incidentally, I *do* want to take a look at the new
multi-dimensional arrays stuff - I want to see if it really works as well in
the latest GAWK as it does in TAWK.

>>Strange - because intuitively, you'd think that this operation was just a
>>passthrough to the underlying printf (in your standard C library), so the
>>version of GAWK shouldn't matter...
>
>What if you're on a system where the underlying printf doesn't support
>the apostrophe flag?

Interesting. I suppose there is a tradeoff, between consistency (a prime
design goal for GAWK is that it do the same thing on all platforms) vs.
optimal functionality. I say this because in my view, it isn't always 100%
clear which is best, and I remember that in the past there has been an
issue. Specifically, with the strftime() function. I remember that at some
point in the distant past, it wasn't clear whether GAWK would/should use the
system's strftime() or use its own supplied version. Sometimes one was
better, sometimes the other. I remember having to sometimes tweak the build
scripts to make it use one or the other, depending on which one *I* thought
was better in the instant circumstances.

>Gawk therefore does this itself, although as seen, for a while there
>was a bug.

I see.

>Gawk in fact only punts to the underlying printf for floating point
>values.

Interesting.

--
Faced with the choice between changing one's mind and proving that there is
no need to do so, almost everyone gets busy on the proof.

- John Kenneth Galbraith -
