boolean[] x
implemented via array of integers. So if there is a space
consideration, then one may want to pack bits into the integers and
define boolean array as:
class BooleanArray {
private int[] data;
public BooleanArray( int size ) {
data = new int[size/Integer.SIZE+1];
for( int i = 0; i < data.length; i++ )
data[i] = 0;
}
public boolean get( int index ) {
int row = index/Integer.SIZE;
int col = index%Integer.SIZE;
return ((data[row]>>col)&1)==1;
}
public void set( int index, boolean val ) {
int row = index/Integer.SIZE;
int col = index%Integer.SIZE;
int bitmap = data[row];
if( !val ) if( (((bitmap>>col)&1)==1) )
data[row]=bitmap ^ (1<<col);
if( val ) if( (((bitmap>>col)&1)==0) )
data[row]=bitmap | (1<<col);
}
}
Now, I found this implementation to be more than 3 times slower than
native boolean[]. Is this expected behavior, or I'm missing something?
Or, perhaps there is a well known alternative packed boolean array
implementation which doesn't suffer the above space-time tradeoff?
BTW, why boolean is not the same as byte[] (or char[] -- whatever is
smaller?-)
> Now, I found this implementation to be more than 3 times slower than
> native boolean[]. Is this expected behavior, or I'm missing something?
> Or, perhaps there is a well known alternative packed boolean array
> implementation which doesn't suffer the above space-time tradeoff?
The space/performance tradeoff is probably the reason why boolean[] is
typically implemented with one byte per boolean, rather than one bit.
Processors commonly have efficient hardware support for extracting a
byte from a word and sticking it in a register.
If you want bit packed true/false, why not use java.util.BitSet? It has
methods similar to your get and set.
>
> BTW, why boolean is not the same as byte[] (or char[] -- whatever is
> smaller?-)
At the implementation level, the systems I've measured appear to use one
byte per element. At the language level, a Java boolean is not a number
at all, so it would not make sense for an array of them to be an array
of numbers.
Patricia
java.util.BitSet is actually slower than the OP submitted code. But the
OP code is far from optimal. The following snippit will speed up his
code by about 1/3. But it will still be 1/2 the speed of a boolean []
version. But as Patrica has pointed the speed of boolean [] is at the
cost of memory. boolean [] = new boolean[100000000] (that 100 million )
will take 100 Mbytes of heap. The OP's BooleanArray class and
java.util.BitSet will both take a little over 12 Mbytes. So a boolean []
may be suitable for small sets but not for large sets.
// code to speed up OP version
public void set (int i, boolean b) {
if(b) set(i);
else clear(i);
}
public void set (int i) {
data[i>>5]|=(1<<(i & 0x1f));
}
public void clear (int i) {
data[i>>5] &=~(1<<(i & 0x1f));
}
public boolean get (int i) {
return (data[i>>5] & (1<<(i & 0x1f)))!=0;
}
The code above is prone to ArrayIndexOutofBoundsException, as is a
boolean [] implementation. As an aside java.util.BitSet is an
extensible class that can throw an OutOfMemoryError at any time a
miscalculated index is entered.
Actually, any class implementation of an array will be slower than
primitive array. Consider this class
class BooleanBoolArray {
private boolean[] data;
public BooleanBoolArray(int size) {
data = new boolean[size];
}
public boolean get( int index ) {
return (data[index]);
}
public void set( int index, boolean val ) {
data[index] = val;
}
}
and following test
int size = 100000000;
long startTime;
boolean ba[];
BooleanArray cbba;
startTime = System.currentTimeMillis();
ba = new boolean[size];
for (int i = 0; i < size; i++) {
ba[i] = true;
if (!ba[i]) throw new Error("boolean array : index = " + i);
ba[i] = false;
if (ba[i]) throw new Error("boolean array : index = " + i);
}
println("boolean array : " + (System.currentTimeMillis() -
startTime) + " ms.");
ba = null;
startTime = System.currentTimeMillis();
cbba = new BooleanArray(size);
for (int i = 0; i < size; i++) {
cbba.set(i,true);
if (!cbba.get(i)) throw new Error("BooleanArray : index = " + i);
cbba.set(i,false);
if (cbba.get(i)) throw new Error("BooleanArray : index = " + i);
}
println("BooleanBoolArray : " + (System.currentTimeMillis() -
startTime) + " ms.");
On my machine the results are:
boolean array : 1843 ms.
BooleanArray : 4860 ms.
It seems that this kind of "boxing" is almost 3 times slower.
Tomek
Comparing various implementations I encountered unexpected results.
Consider this class and tests:
class IntTest {
public int i;
public static int j;
public int get(int i) {
return i;
}
public void set(int i) {
this.i = i;
}
}
and test
int size = 100000000;
long startTime;
int test;
IntTest intTest = new IntTest();
startTime = System.currentTimeMillis();
for (int i = 0; i < size; i++) {
intTest.i = i;
test = intTest.i;
}
println("intTest.i : " + (System.currentTimeMillis() - startTime) +
" ms.");
startTime = System.currentTimeMillis();
for (int i = 0; i < size; i++) {
IntTest.j = i;
test = IntTest.j;
}
println("IntTest.j : " + (System.currentTimeMillis() - startTime) +
" ms.");
startTime = System.currentTimeMillis();
for (int i = 0; i < size; i++) {
intTest.set(i);
test = intTest.get(i);
}
println("IntTest.set/get : " + (System.currentTimeMillis() -
startTime) + " ms.");
and the results are
intTest.i : 3593 ms.
IntTest.j : 3657 ms.
IntTest.set/get : 1656 ms.
Direct read/write to/from class field is 2 times slower than using
get/set methods. What a surprise (at least for me)!!! Can anyone explain
this?
Tomek
Ooops, my mistake
> public int get(int i) {
> return i;
> }
should be
public int get() {
return i;
}
> startTime = System.currentTimeMillis();
> for (int i = 0; i < size; i++) {
> intTest.set(i);
> test = intTest.get(i);
> }
> println("IntTest.set/get : " + (System.currentTimeMillis() -
>startTime) + " ms.");
should be
startTime = System.currentTimeMillis();
for (int i = 0; i < size; i++) {
intTest.set(i);
test = intTest.get();
}
println("IntTest.set/get : " + (System.currentTimeMillis() -
startTime) + " ms.");
Sorry
Tomek
> On my machine the results are:
>
> boolean array : 1843 ms.
> BooleanArray : 4860 ms.
>
> It seems that this kind of "boxing" is almost 3 times slower.
>
Tried your example on my machine and the difference was even more
pronounced:
boolean array : 641 ms.
BooleanBoolArray : 3658 ms.
However, switch to server VM (JDK 1.5.0_10 on WinXP):
boolean array : 172 ms.
BooleanArray : 235 ms.
Most of the time for the first run seems to be down to allocating the
memory, taking the array allocations out of the timings gives these
figures:
Client VM:
boolean array : 594 ms.
BooleanArray : 626 ms.
Server VM:
boolean array : 94 ms.
BooleanArray : 109 ms.
Dan.
--
Daniel Dyer
http://www.uncommons.org
I was unaware of BitSet! (I naively expected that google would point
me to all Boolean Array classes if asked for "Boolean Array").
BitSet is almost twice as fast as my homegrown class, although almost
twice as slow as native boolean[]. So the tradeoff is a little bit
slower.
It looks like I'm going to use boolean[] for small arrays and switch
to BitSet for larger ones. Nice application of polymorphism (which I
didn't find useful in my code for years).
> Direct read/write to/from class field is 2 times slower than using
> get/set methods. What a surprise (at least for me)!!! Can anyone explain
> this?
It's always difficult to guess (correctly) what the JIT will do. However, your
test would be better if it followed common practise for micro-benchmarks. Move
each test into a separate method. Run all the tests in turn inside a loop (an
infinite loop is as good as any). I'll append a modified version of the test
driver class to this post.
Using it, and using the -client JMV from JDK1.6.0 I see:
intTest.i : 390 ms.
IntTest.j : 271 ms.
IntTest.set/get : 210 ms.
intTest.i : 200 ms.
IntTest.j : 281 ms.
IntTest.set/get : 210 ms.
intTest.i : 200 ms.
IntTest.j : 271 ms.
IntTest.set/get : 210 ms.
intTest.i : 200 ms.
IntTest.j : 271 ms.
IntTest.set/get : 210 ms.
intTest.i : 200 ms.
IntTest.j : 271 ms.
IntTest.set/get : 210 ms.
intTest.i : 200 ms.
IntTest.j : 271 ms.
IntTest.set/get : 210 ms.
intTest.i : 210 ms.
IntTest.j : 271 ms.
IntTest.set/get : 200 ms.
intTest.i : 210 ms.
IntTest.j : 271 ms.
IntTest.set/get : 200 ms.
...
Notice how the first run through the loop is not representative. This
particular test settles down much more quickly than usual, but it does appear
that the JITed (and presumably inlined) code generated for the set/get pair is
more efficient than that generated for the direct field access. At a guess the
optimiser isn't tuned to work as hard on direct access from outside the object
itself (why should it be ? No one writes code like that in reality.)
Now, switching the -server JVM. I see:
intTest.i : 90 ms.
IntTest.j : 90 ms.
IntTest.set/get : 70 ms.
intTest.i : 101 ms.
IntTest.j : 50 ms.
IntTest.set/get : 40 ms.
intTest.i : 40 ms.
IntTest.j : 40 ms.
IntTest.set/get : 20 ms.
intTest.i : 20 ms.
IntTest.j : 20 ms.
IntTest.set/get : 20 ms.
intTest.i : 20 ms.
IntTest.j : 30 ms.
IntTest.set/get : 20 ms.
intTest.i : 20 ms.
IntTest.j : 20 ms.
IntTest.set/get : 20 ms.
...
Again notice how it takes a while to settle down, and that the initial figures
mean nothing. In this case it appears that the -server option has largely
eliminated the inner loop (to be honest I had expected that it would reduce it
to zero -- I'm not sure why it didn't manage that for this example, it usually
does unless you take special care). However, it clearly has done enough
optimisation to invalidate this micro-benchmark completely.
I haven't tried applying a similar benchmarkng approach to your (and the OPs)
original BooleanArray code myself, but unless/until you do try it, you won't
know what the performance of that code actually is.
-- chris
=======================
public class Test
{
static final int LOOPS = 100000000;
public static void
main(String[] args)
{
for (;;)
{
timeit1();
timeit2();
timeit3();
}
}
static void
timeit1()
{
long startTime;
int test;
IntTest intTest = new IntTest();
startTime = System.currentTimeMillis();
for (int i = 0; i < LOOPS; i++)
{
intTest.i = i;
test = intTest.i;
}
println("intTest.i : " + (System.currentTimeMillis() - startTime) + "
ms.");
}
static void
timeit2()
{
long startTime;
int test;
IntTest intTest = new IntTest();
startTime = System.currentTimeMillis();
for (int i = 0; i < LOOPS; i++)
{
IntTest.j = i;
test = IntTest.j;
}
println("IntTest.j : " + (System.currentTimeMillis() - startTime) + "
ms.");
}
static void
timeit3()
{
long startTime;
int test;
IntTest intTest = new IntTest();
startTime = System.currentTimeMillis();
for (int i = 0; i < LOOPS; i++)
{
intTest.set(i);
test = intTest.get();
}
println("IntTest.set/get : " + (System.currentTimeMillis() - startTime)
+ " ms.");
}
static void
println(String s)
{
System.out.println(s);
}
}=======================
BitSet should be faster on some operations than a boolean[].
Specifically, bulk operations (eg: AND these two bitsets together)
It is also much more memory effecient (at best 32 times, and worst 8
times)
It would also be faster for nextSetBit and nextClearBit than a naive
boolean[] approach.
> It's always difficult to guess (correctly) what the JIT will do.
And here's an example that demonstrates the futility of these
micro-optimisations:
I took Chris's benchmark class, the OP's BooleanArray and my own
functionally equivalent BitString
(<https://watchmaker.dev.java.net/source/browse/watchmaker/trunk/framework/src/java/main/org/uncommons/watchmaker/framework/types/BitString.java?rev=101&view=auto&content-type=text/vnd.viewcvs-markup>).
Running the test with 100 million get/set pairs for each class I got these
results from the client VM (after letting it settle down):
BitString : 4484ms.
BooleanArray : 2628ms.
So clearly the OP's implementation is better than mine. I tried exactly
the same test with the server VM:
BitString : 398ms.
BooleanArray : 443ms.
That's more like it. Obviously my code requires a more sophisticated VM
to bring out the best in it :)
So why the poorer performance in the first run? Well my implementation
validates the index passed to the get and set methods. Commenting this
out and re-running on the client VM, the difference in performance of the
two implementations isn't as significant as before:
BitString : 2859ms.
BooleanArray : 2628ms.
Right, so without the overhead of that validation code, my implementation
is going to be much faster than the OP's on the server VM...
BitString : 742ms.
BooleanArray : 443ms.
Not exactly. Less is more. I can only assume that the presence of the
validation allows the server JIT to make optimising assumptions that it
can't make in its absence.
I'll let you draw your own conclusions... (FWIW, these figures are all
from Apple's 5.0 JVM).
Dan.
--
Daniel Dyer
https://watchmaker.dev.java.net - Evolutionary Algorithm Framework for Java
[me:]
> > It's always difficult to guess (correctly) what the JIT will do.
>
> And here's an example that demonstrates the futility of these
> micro-optimisations: [...]
Interesting figures !
As another side-light, I wanted to take a look at the JITed code generated for
the benchmark I posted, but the muddle introduced by mixing up the loop itself
with the timing structure scrambled the picture more than I could decode (the
tools I'm using are incomplete, and my IA32 assember skills are even more so),
so I pulled the actual loops out into separate functions. E..g:
=========================
static void
timeit1()
{
long startTime;
int test;
IntTest intTest = new IntTest();
startTime = System.currentTimeMillis();
for (int i = 0; i < LOOPS; i++)
{
intTest.i = i;
test = intTest.i;
}
println("intTest.i : " + (System.currentTimeMillis() - startTime) + "
ms.");
}
=========================
turned into
=========================
static void
timeit1()
{
long startTime;
int test;
IntTest intTest = new IntTest();
startTime = System.currentTimeMillis();
dot1();
println("intTest.i : " + (System.currentTimeMillis() - startTime) + "
ms.");
}
static void
doit1()
{
int test;
IntTest intTest = new IntTest();
for (int i = 0; i < LOOPS; i++)
{
intTest.i = i;
test = intTest.i;
}
}
=========================
It's perhaps not too surprising that that changed how the JITer to optimised.
In this case reducing the timed "loop" from 20ms to 10ms -- and also generating
completely identical code for the version which used direct access to instance
fields and the version which used the getter/setter methods.
The point I want to make is that it's example of how better /factored/ code
(written to be readable by humans) can also be more /effcient/ code (since the
JIT has less muddle to contend with as it tries to work out what can be
optimised).
-- chris
> BitSet should be faster on some operations than a boolean[].
>
> Specifically, bulk operations (eg: AND these two bitsets together)
> It is also much more memory effecient (at best 32 times, and worst 8
> times)
>
> It would also be faster for nextSetBit and nextClearBit than a naive
> boolean[] approach.
But of course it depends on the mix of operations. I remember that during the
discussion of Eratosthenes's Sieve, a little while ago here[*], the question
came up of whether a BitSet or a boolean[] would be quicker. Despite the
advantages of compactness for the BitSet, I think that everyone who measured it
found the boolean[] version ran substantially faster (around x3 in my own
experiment -- which I didn't bother to post). That's mildly interesting
because it is (in a limited way) a real application of BitSet/boolean[] rather
than a necessarily rather contrived benchmark.
([*] I can't remember whether it was you or the other Daniel who posted code
and numbers to that thread -- maybe both...)
-- chris