
Formatting a long decimal into Gb, Mb, or Kb (using String.format)


Arvin Portlock

Nov 6, 2007, 3:08:43 PM
I have a long decimal number indicating diskspace that looks like
"35648327680" and I would like to format it into a more easily read
format like "35 Gb." Precision isn't an issue. Approximation is fine
for this application.

I can do it with a switch statement and regular expressions but I
was wondering if there's a more elegant way using something like
String.format().

long size; // equals, e.g., 35648327680
String displaySize;
if (size > 1000000000) {
    displaySize = String.format ("??? Gb", size);
} else if (size > 1000000) {
    displaySize = String.format ("??? Mb", size);
} else if (size > 1000) {
    displaySize = String.format ("??? Kb", size);
} else {
    displaySize = String.format ("??? Bytes", size);
}

where the ??? question marks are where the right format should
go.

I've spent a couple of days on this with only frustration to show
for it. Thanks for any help you guys can give me.
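A sketch of one way to fill in the ??? placeholders, assuming the value is scaled by division before it reaches the format string: %.1f prints the scaled double with one decimal place, and %d prints the raw count. The class and method names are hypothetical, decimal multiples of 1000 are assumed, and Locale.ROOT is passed so the decimal separator is always a period.

```java
import java.util.Locale;

public class SizeFormat {
    // Hypothetical helper: chooses a decimal (powers of 1000) unit, divides,
    // and lets String.format render the result with one decimal place.
    static String displaySize(long size) {
        if (size >= 1_000_000_000L) {
            return String.format(Locale.ROOT, "%.1f GB", size / 1e9);
        } else if (size >= 1_000_000L) {
            return String.format(Locale.ROOT, "%.1f MB", size / 1e6);
        } else if (size >= 1_000L) {
            return String.format(Locale.ROOT, "%.1f kB", size / 1e3);
        } else {
            return String.format(Locale.ROOT, "%d bytes", size);
        }
    }

    public static void main(String[] args) {
        System.out.println(displaySize(35648327680L)); // 35.6 GB
        System.out.println(displaySize(512L));         // 512 bytes
    }
}
```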

Allen

Roedy Green

Nov 6, 2007, 5:11:24 PM
On Tue, 06 Nov 2007 12:08:43 -0800, Arvin Portlock <nom...@sorry.com>
wrote, quoted or indirectly quoted someone who said :

>I can do it with a switch statement and regular expressions but I
>was wondering if there's a more elegant way using something like
>String.format().

this is what I call the "band" problem, converting a value into an
integer band number.

You can create an array with the low or upper bound on each band, and
do a linear search or a binary search.
http://mindprod.com/jgloss/binarysearch.html
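Roedy's array-of-bounds idea can be sketched with java.util.Arrays.binarySearch, which returns -(insertionPoint) - 1 when the key is not in the array; the band is then the insertion point minus one. The bounds, labels and class name below are illustrative assumptions, not taken from his page.

```java
import java.util.Arrays;

public class Bands {
    // Lower bound of each band, in ascending order (decimal multiples assumed).
    static final long[] LOWER = {0L, 1_000L, 1_000_000L, 1_000_000_000L, 1_000_000_000_000L};
    static final String[] LABEL = {"bytes", "kB", "MB", "GB", "TB"};

    // Returns the index of the last band whose lower bound is <= value.
    // Assumes value >= 0; a negative value would yield -1.
    static int band(long value) {
        int i = Arrays.binarySearch(LOWER, value);
        // Exact hit: i is the band itself. Miss: i == -(insertionPoint) - 1,
        // so -i - 2 is the band just before the insertion point.
        return i >= 0 ? i : -i - 2;
    }

    public static void main(String[] args) {
        long size = 35648327680L;
        int b = band(size);
        long scaled = b == 0 ? size : size / LOWER[b];
        System.out.println(scaled + " " + LABEL[b]); // 35 GB
    }
}
```

Adding a band is then a matter of extending both arrays rather than adding another else-if.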

You can do a great nested if.

Other related problems -- taking a wavelength and classifying it as a
colour.

Converting percentages to letter grades.

Defragging disks into bands by frequency of use or last use time.

--
Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com

Arvin Portlock

Nov 6, 2007, 5:28:23 PM
Roedy Green wrote:

> On Tue, 06 Nov 2007 12:08:43 -0800, Arvin Portlock
> wrote, quoted or indirectly quoted someone who said :
>
> >I can do it with a switch statement and regular expressions but I
> >was wondering if there's a more elegant way using something like
> >String.format().
>
> this is what I call the "band" problem, converting a value into an
> integer band number.
>
> You can create an array with the low or upper bound on each band, and
> do a linear search or a binary search.
> http://mindprod.com/jgloss/binarysearch.html

This is a good idea. I like this better than nested if's. I also like
the phrase "band" problem. I tend to encounter a lot of these and I've
always relied on switches or if's. What I have now:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Scales by stripping the last 9, 6 or 3 digits of the decimal string
private String formatNumber (long number) {
    String formatted;
    if (number > 1000000000) {
        String re = "^(.*)\\d{9}$";
        Matcher m = Pattern.compile(re).matcher(Long.toString (number));
        if (m.find()) {
            formatted = m.group(1) + " Gb";
        } else {
            formatted = "0";
        }
    } else if (number > 1000000) {
        String re = "^(.*)\\d{6}$";
        Matcher m = Pattern.compile(re).matcher(Long.toString (number));
        if (m.find()) {
            formatted = m.group(1) + " Mb";
        } else {
            formatted = "0";
        }
    } else if (number > 1000) {
        String re = "^(.*)\\d{3}$";
        Matcher m = Pattern.compile(re).matcher(Long.toString (number));
        if (m.find()) {
            formatted = m.group(1) + " Kb";
        } else {
            formatted = "0";
        }
    } else {
        formatted = Long.toString (number) + " bytes";
    }
    return (formatted);
}

I know I can factor out a lot of duplicate code but I didn't want to
waste time on it if I could come up with something nicer.

Thanks.

Eric Sosman

Nov 6, 2007, 5:57:24 PM
Arvin Portlock wrote On 11/06/07 17:28,:

> Roedy Green wrote:
>
>>On Tue, 06 Nov 2007 12:08:43 -0800, Arvin Portlock
>>wrote, quoted or indirectly quoted someone who said :
>>
>>>I can do it with a switch statement and regular expressions but I
>>>was wondering if there's a more elegant way using something like
>>>String.format().
>>
>>this is what I call the "band" problem, converting a value into an
>>integer band number.
>>
>>You can create an array with the low or upper bound on each band, and
>>do a linear search or a binary search.
>>http://mindprod.com/jgloss/binarysearch.html
>
> This is a good idea. I like this better than nested if's. I also like
> the phrase "band" problem. I tend to encounter a lot of these and I've
> always relied on switches or if's. What I have now:
>
> private String formatNumber (long number) {
> String formatted;
> if (number > 1000000000) {
> String re = "^(.*)\\d{9}$";
> Matcher m = Pattern.compile(re).matcher(Long.toString (number));
> if (m.find()) {
> formatted = m.group(1) + " Gb";
> } else {
> formatted = "0";
> }
> } else if (number > 1000000) {
> [...]

Um, er, are you aware of the division operator, `/'?
You can use it like this:

    if (number > 1000000000)
        formatted = Long.toString(number / 1000000000)
                  + " GB";
    else if (number > 1000000)
        ...

If you follow Roedy's suggestion and use a table, the
strings "GB" "MB" and so on can come from the table.
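Putting the division operator together with Roedy's table, a minimal sketch (hypothetical names) walks parallel arrays of divisors and unit strings from largest to smallest and uses the first divisor the value reaches. It assumes decimal divisors and integer division, so results are truncated, which fits the original "approximation is fine" requirement.

```java
public class TableFormat {
    // Parallel tables: divisor and its unit symbol, largest first.
    static final long[] DIVISOR = {1_000_000_000L, 1_000_000L, 1_000L, 1L};
    static final String[] UNIT = {"GB", "MB", "kB", "bytes"};

    static String format(long number) {
        for (int i = 0; i < DIVISOR.length; i++) {
            if (number >= DIVISOR[i]) {
                return (number / DIVISOR[i]) + " " + UNIT[i];
            }
        }
        return number + " bytes"; // zero or negative values fall through
    }

    public static void main(String[] args) {
        System.out.println(format(35648327680L)); // 35 GB
        System.out.println(format(2_500_000L));   // 2 MB
    }
}
```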

Incidentally, please use "GB","MB" and not "Gb","Mb".
The lower-case "b" usually signifies bits, not bytes, as
in "It takes ~1 second to send 1MB over a 10Mb link."
(However, it must be admitted that computer folk are no
respecters of measurement standards, not even standards
that are the subject of international treaties. For
example, the prefix meaning "kilo" is "k" and not "K",
and it multiplies by 1000, not by 1024. (A scheme of
names and prefixes for binary multiples has been adopted
("1KiB" is "1 kibibyte," 1024 bytes), but does not seem
to have gained much traction.))
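To put a number on how far apart the decimal and binary meanings of each prefix drift, a short sketch (hypothetical class name) computes how much larger 1024^n is than 1000^n for each prefix from kilo to exa:

```java
import java.util.Locale;

public class PrefixGap {
    static final String[] PREFIX = {"k vs Ki", "M vs Mi", "G vs Gi", "T vs Ti", "P vs Pi", "E vs Ei"};

    // Percentage by which the binary multiple 1024^n exceeds the decimal 1000^n.
    static double gapPercent(int n) {
        return (Math.pow(1024, n) / Math.pow(1000, n) - 1) * 100;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= PREFIX.length; n++) {
            System.out.println(PREFIX[n - 1] + ": "
                    + String.format(Locale.ROOT, "%.1f%%", gapPercent(n)));
        }
    }
}
```

It prints gaps of 2.4%, 4.9%, 7.4%, 10.0%, 12.6% and 15.3%, so a discrepancy that is small at kilo becomes noticeable at disk sizes.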

--
Eric....@sun.com

Arvin Portlock

Nov 7, 2007, 1:05:26 PM
Eric Sosman wrote:

> Um, er, are you aware of the division operator, '/'?
> You can use it like this:
>
>     if (number > 1000000000)
>         formatted = Long.toString(number / 1000000000)
>                   + " GB";
> else if (number > 1000000)

Yeah, I feel pretty stupid. Some people ought to have their programming
license revoked:

private String formatNumber (long number) {
    String formatted;
    if (number > 1000000000) {
        formatted = Long.toString (number / 1000000000) + " GB";
    } else if (number > 1000000) {
        formatted = Long.toString (number / 1000000) + " MB";
    } else if (number > 1000) {
        formatted = Long.toString (number / 1000) + " KB";
    } else {
        formatted = Long.toString (number) + " bytes";
    }
    return (formatted);
}

> Incidentally, please use "GB","MB" and not "Gb","Mb".
> The lower-case "b" usually signifies bits, not bytes, as
> in "It takes ~1 second to send 1MB over a 10Mb link."
> (However, it must be admitted that computer folk are no
> respecters of measurement standards, not even standards
> that are the subject of international treaties. For
> example, the prefix meaning "kilo" is "k" and not "K",
> and it multiplies by 1000, not by 1024.

I didn't know that about big-B vs. little-b. I've changed
that. Thanks for being kind.

Piotr Kobzda

Nov 8, 2007, 9:28:45 AM

There is still a mistake in your implementation. As pointed out by
Eric, "K" means something different than "k".

A base for "K" is 1024, not 1000!


Consider also using the following approach for your purposes:

public enum StorageUnit {
    BYTE    ( "B", 1L),
    KILOBYTE("KB", 1L << 10),
    MEGABYTE("MB", 1L << 20),
    GIGABYTE("GB", 1L << 30),
    TERABYTE("TB", 1L << 40),
    PETABYTE("PB", 1L << 50),
    EXABYTE ("EB", 1L << 60);

    public static final StorageUnit BASE = BYTE;

    private final String symbol;
    private final long divider; // divider of BASE unit

    StorageUnit(String name, long divider) {
        this.symbol = name;
        this.divider = divider;
    }

    public static StorageUnit of(final long number) {
        // Work with non-positive values so Long.MIN_VALUE needs no special case
        final long n = number > 0 ? -number : number;
        if (n > -(1L << 10)) {
            return BYTE;
        } else if (n > -(1L << 20)) {
            return KILOBYTE;
        } else if (n > -(1L << 30)) {
            return MEGABYTE;
        } else if (n > -(1L << 40)) {
            return GIGABYTE;
        } else if (n > -(1L << 50)) {
            return TERABYTE;
        } else if (n > -(1L << 60)) {
            return PETABYTE;
        } else { // n >= Long.MIN_VALUE
            return EXABYTE;
        }
    }

    public String format(long number) {
        return nf.format((double)number / divider) + " " + symbol;
    }

    // NumberFormat is not thread-safe; this shared instance assumes
    // format() is not called from several threads at once.
    private static java.text.NumberFormat nf
        = java.text.NumberFormat.getInstance();
    static {
        nf.setGroupingUsed(false);
        nf.setMinimumFractionDigits(0);
        nf.setMaximumFractionDigits(1);
    }
}


Typical usage is as follows:

StorageUnit.of(number).format(number);
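One alternative to the if-chain in of(), swapped in here and not part of the enum above, derives the unit from the position of the highest set bit: each step of 1024 adds ten bits, so (63 - Long.numberOfLeadingZeros(n)) / 10 is the unit index for n >= 1024. A standalone sketch with hypothetical names; it assumes non-negative input and formats with String.format and an explicit Locale instead of a shared NumberFormat.

```java
import java.util.Locale;

public class BinaryUnits {
    static final String[] SYMBOL = {"B", "KB", "MB", "GB", "TB", "PB", "EB"};

    // 0 for < 1024, 1 for < 1024^2, and so on. The highest set bit of
    // 'bytes' is at position 63 - numberOfLeadingZeros(bytes).
    // Negative values fall into the bytes branch.
    static int unitIndex(long bytes) {
        if (bytes < 1024) return 0;
        return (63 - Long.numberOfLeadingZeros(bytes)) / 10;
    }

    static String format(long bytes) {
        int u = unitIndex(bytes);
        if (u == 0) return bytes + " B";
        double scaled = (double) bytes / (1L << (10 * u));
        return String.format(Locale.ROOT, "%.1f %s", scaled, SYMBOL[u]);
    }
}
```

BinaryUnits.format(35648327680L) gives "33.2 GB", the same figure the enum produces in an English locale. Since Long.MAX_VALUE is just under 2^63 bytes (8 EiB), EB is the largest symbol a long can reach.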


Note that there are still some units of information not handled here,
namely bits (and the units derived from them), zettabytes (ZB), and
yottabytes (YB). Handling them all together requires a somewhat more
advanced approach (probably something close to the JSR-275 proposals),
but I don't think you need to do that.


piotr

Lew

Nov 8, 2007, 9:51:06 AM
Piotr Kobzda wrote:
> There is still a mistake in your implementation. As pointed out by
> Eric, "K" means something different than "k".
> A base for "K" is 1024, not 1000!

"A" base, not "the" base. "K" also means 1000. "K" is not an SI prefix, and
it isn't precisely defined. It is, in fact, a natural-language term.

> The symbol "K" is often used informally to mean a multiple of (a) thousand,
> so one may talk of "a 40K salary" (40 000), or the Y2K problem.
> In these cases an uppercase K is often used, although using an
> uppercase K is never correct when writing under the rules of the SI.
> Also, it is often used as a prefix to designate the binary prefix
> kilo = 2^10 = 1024, although this is now non-standard.
<http://en.wikipedia.org/wiki/SI_prefix>

For the prefixes that use powers of 2, see,
<http://en.wikipedia.org/wiki/Binary_prefix>

--
Lew

Piotr Kobzda

Nov 8, 2007, 12:30:59 PM
Lew wrote:

[...]

> For the prefixes that use powers of 2, see,
> <http://en.wikipedia.org/wiki/Binary_prefix>

Thanks! I wasn't aware that there is a "new" standard for the prefixes.
I have always used the binary prefixes listed in the "Historical use"
column of the middle table there:
<http://en.wikipedia.org/wiki/Binary_prefix#Specific_units_of_IEC_60027-2_A.2>.

But now it seems that I have to change that...

Just one quote more:

"These SI prefixes refer strictly to powers of 10. They should not be
used to indicate powers of 2 (for example, one kilobit represents 1000
bits and not 1024 bits). The IEC has adopted prefixes for binary powers
in the international standard IEC 60027-2: 2005, third edition, Letter
symbols to be used in electrical technology – Part 2: Telecommunications
and electronics. The names and symbols for the prefixes corresponding to
2^10, 2^20, 2^30, 2^40, 2^50, and 2^60 are, respectively: kibi, Ki;
mebi, Mi; gibi, Gi; tebi, Ti; pebi, Pi; and exbi, Ei. Thus, for example,
one kibibyte would be written: 1 KiB = 2^10 B = 1024 B, where B denotes
a byte. Although these prefixes are not part of the SI, they should be
used in the field of information technology to avoid the incorrect usage
of the SI prefixes."

[3.1 SI prefixes; The International System of Units (SI) brochure, 8th
edition 2006, available at <http://www.bipm.org/en/si/si_brochure/>]
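Following the brochure's recommendation mostly means changing the symbols, not the arithmetic. A sketch, with hypothetical names, of a formatter that takes the step size and the symbol table as parameters, so the same code prints SI (powers of 1000) or IEC (powers of 1024) units:

```java
import java.util.Locale;

public class Prefixes {
    static final String[] SI  = {"B", "kB", "MB", "GB", "TB", "PB", "EB"};
    static final String[] IEC = {"B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"};

    // Divides by 'base' (1000 for SI, 1024 for IEC) until the value is
    // below one more step, then prints one decimal place.
    static String format(long bytes, int base, String[] symbols) {
        if (bytes < base) return bytes + " B";
        double value = bytes;
        int u = 0;
        while (value >= base && u < symbols.length - 1) {
            value /= base;
            u++;
        }
        return String.format(Locale.ROOT, "%.1f %s", value, symbols[u]);
    }
}
```

For the disk from the start of the thread, format(35648327680L, 1000, SI) gives "35.6 GB" and format(35648327680L, 1024, IEC) gives "33.2 GiB".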


Is anybody using the above standard recommendation every day?


piotr

Arvin Portlock

Nov 8, 2007, 12:48:19 PM
Piotr Kobzda wrote:

> There is still a mistake in your implementation. As pointed out by
> Eric, "K" means something different than "k".
>
> A base for "K" is 1024, not 1000!

I was wondering whether anybody would call me on this! Switching to
capital 'B' was an easy decision. For the letter prefixes though I
decided to go with common usage and consistency (GB, MB, kB???)
incorrect as it may be. Gi, Mi, etc., is simply too pedantic for
my audience, who are other programmers here at work who
understand the difference between powers of 10 and powers of 2, so I'm
unlikely to invite lawsuits for misleading information! Users first!

The following bit of code is wonderful! I never get to fiddle bits
in my programs and any chance to use "<<" turns me on. With a
little head scratching I actually understand it. But whoever
inherits my code after me would curse my name if I swapped this
in for the few lines of code I now have. Clever programmers. Bah!
I've only been programming Java for about a month now and your
complete implementation is dripping with Java-y goodness. It'll
be great to refer to for tips on good programming and best
practices. I've never ever seen a class with a signature beginning
with "public enum"!

Eric Sosman

Nov 8, 2007, 1:15:24 PM
Arvin Portlock wrote On 11/08/07 12:48,:

>
> I was wondering whether anybody would call me on this! Switching to
> capital 'B' was an easy decision. For the letter prefixes though I
> decided to go with common usage and consistency (GB, MB, kB???)
> incorrect as it may be. Gi, Mi, etc., is simply too pedantic for
> my audience, who are other programmers here at work who
> understand the difference between powers of 10 and powers of 2, so I'm
> unlikely to invite lawsuits for misleading information! Users first!

The "mutable mega" actually caused me some grief once
at a PPOE. Our product's installation said it needed at
least 40 megabytes of available disk space (I don't recall
the actual number, but let's just pretend it was 40), and
a test engineer set out to verify that the installation would
in fact succeed with that bare minimum. So he filled his
disk with garbage files until Windows said 40,000,000 bytes
remained unused, fired up our installation, watched it die,
and filed a bug report.

Of course, what happened was that our script checked
for at least 41943040 = 40*1024*1024 bytes and refused to
run if there wasn't enough ...
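The mismatch in that anecdote comes down to two readings of the same requirement. A hypothetical sketch (class and method names invented for illustration; the 40 MB figure is Eric's stand-in number) of the check written each way:

```java
public class InstallCheck {
    static final long REQUIRED_MB = 40;

    // Binary reading: 1 MB = 1024 * 1024 bytes (41943040 bytes for 40 MB).
    static boolean enoughBinary(long freeBytes) {
        return freeBytes >= REQUIRED_MB * 1024 * 1024;
    }

    // Decimal reading: 1 MB = 1,000,000 bytes (40000000 bytes for 40 MB).
    static boolean enoughDecimal(long freeBytes) {
        return freeBytes >= REQUIRED_MB * 1_000_000;
    }

    public static void main(String[] args) {
        long free = 40_000_000L;                 // what the tester left free
        System.out.println(enoughDecimal(free)); // true
        System.out.println(enoughBinary(free));  // false
    }
}
```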

Probably the worst abuse of this confusion referred to
the so-called "1.44 megabyte floppy." The actual capacity
of this device was 2880 512-byte blocks, or

1474560 B
1474+ kB
1.47+ MB
1440 KiB
1.41- MiB

You will note that "1.44" appears *nowhere* in this list
of expressions of the capacity! To get to 1.44, you must
express the capacity in a hybrid decimal-and-binary
unit of 1024000 bytes, a unit I suggested should be called
the "maybebyte" in honor of the floppy's reliability ...
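The list above can be checked with a few divisions of 2880 × 512 bytes. A small sketch with a made-up class name; the last line uses the 1,024,000-byte "maybebyte", the only unit that yields 1.44:

```java
import java.util.Locale;

public class Floppy {
    static final long CAPACITY = 2880L * 512; // 2880 blocks of 512 bytes = 1474560 bytes

    // Capacity expressed in a unit of the given size in bytes.
    static double in(long unitBytes) {
        return (double) CAPACITY / unitBytes;
    }

    public static void main(String[] args) {
        System.out.println(String.format(Locale.ROOT, "%.5f MB", in(1_000_000L)));          // 1.47456 MB
        System.out.println(String.format(Locale.ROOT, "%.0f KiB", in(1L << 10)));           // 1440 KiB
        System.out.println(String.format(Locale.ROOT, "%.5f MiB", in(1L << 20)));           // 1.40625 MiB
        System.out.println(String.format(Locale.ROOT, "%.2f maybebytes", in(1_024_000L)));  // 1.44 maybebytes
    }
}
```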

--
Eric.

Arvin Portlock

Nov 8, 2007, 1:34:29 PM
Eric Sosman wrote:

You know, this is actually a very good point and perhaps
not as pedantic as I first thought. What I'm doing is creating
a web database of our inventory of some 2000 PCs, consolidating
what had been a haphazard set of Excel spreadsheets into a
central, searchable site. Some of the information is collected
manually onsite (like who owns the PC, who uses it, and for
what purpose) while other information (CPU type and speed, RAM,
diskspace, MAC Address, etc.) is gathered by automated scans.

Diskspace is reported as a long integer and I have converted
it into something that's easy to grok at a glance with a nice
accompanying pie chart done entirely in Javascript. It now occurs
to me that our PC guys may need to know more than just at-a-glance
diskspace figures in case they're trying to figure out, as you
related, things like whether or not there is enough space to
install something. In cases where it's close they may want
something more precise.

I'm going to do two things:

1 - Add more precision to the at-a-glance figure. Piotr's code had
an interesting little bit that I did not understand but which, it
seems, could be useful for this:

    static {
        nf.setGroupingUsed(false);
        nf.setMinimumFractionDigits(0);
        nf.setMaximumFractionDigits(1);
    }

2 - Display the full long integer value as well.
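For item 1, the three NumberFormat calls turn off thousands separators and allow between zero and one digit after the decimal point, so a value is rounded to one decimal and a whole number prints without ".0". For item 2, %,d in String.format prints the full byte count with grouping. A sketch combining both; the class and method names are hypothetical, binary (1024-based) units are assumed, and Locale.US is passed because NumberFormat.getInstance() with no argument follows the default locale, where the decimal separator may be a comma.

```java
import java.text.NumberFormat;
import java.util.Locale;

public class DiskDisplay {
    // At-a-glance figure plus the exact value,
    // e.g. "33.2 GiB (35,648,327,680 bytes)".
    static String display(long bytes) {
        NumberFormat nf = NumberFormat.getInstance(Locale.US);
        nf.setGroupingUsed(false);      // "2000", not "2,000"
        nf.setMinimumFractionDigits(0); // allow "1" with no decimal part...
        nf.setMaximumFractionDigits(1); // ...but at most one decimal digit
        double gib = (double) bytes / (1L << 30);
        return nf.format(gib) + " GiB ("
                + String.format(Locale.US, "%,d", bytes) + " bytes)";
    }
}
```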

Thanks for your feedback. This is a very friendly newsgroup. Not
like that awful perl newsgroup I usually go to.

Arvin
