benchmarks

5 views
Skip to first unread message

Dimiter Prodanov

unread,
Oct 30, 2010, 8:07:34 PM10/30/10
to ima...@googlegroups.com, Rasband Wayne
Hi everybody,

After the brainstorming at the conference I worked on some new
iteration patterns of PixLib.
The results and the code are attached.

My configuration is HP Compaq Intel CPU; Sun Java 6 on Ubuntu Lucid.

best regards,

Dimiter

=====================

array length 25000000
byte[]
-- MEMORY OVERHEAD --
Raw: 24999888 bytes
ImageJ: 14992 bytes
PixLib: 5808 bytes
Imglib (Array): 3936 bytes
Imglib (Cell): 61241184 bytes
Imglib (Planar): 24761304 bytes
Imglib (ImagePlus): 12032 bytes

-- TIME PERFORMANCE --
Iteration #1/3:
Raw: 23 ms 1.0 0.40350878
ImageJ: 57 ms 2.4782608 1.0
PixLib (Refl): 10151 ms 441.34784 178.08772
PixLib (Array): 239 ms 10.391304 4.1929827
Imglib (Array): 43 ms 1.8695652 0.75438595
Imglib (Cell): 326 ms 14.173913 5.7192984
Imglib (Planar): 616 ms 26.782608 10.807017
Imglib (ImagePlus): 992 ms 43.130436 17.40351
Iteration #2/3:
Raw: 17 ms 1.0 0.36956522
ImageJ: 46 ms 2.7058823 1.0
PixLib (Refl): 8757 ms 515.1177 190.36957
PixLib (Array): 222 ms 13.058824 4.826087
Imglib (Array): 941 ms 55.35294 20.456522
Imglib (Cell): 644 ms 37.882355 14.0
Imglib (Planar): 936 ms 55.058823 20.347826
Imglib (ImagePlus): 953 ms 56.058823 20.717392
Iteration #3/3:
Raw: 16 ms 1.0 0.4
ImageJ: 40 ms 2.5 1.0
PixLib (Refl): 8427 ms 526.6875 210.675
PixLib (Array): 249 ms 15.5625 6.225
Imglib (Array): 938 ms 58.625 23.45
Imglib (Cell): 940 ms 58.75 23.5
Imglib (Planar): 952 ms 59.5 23.8
Imglib (ImagePlus): 952 ms 59.5 23.8

system.png
invert-benchmark.zip

Stephan Saalfeld

unread,
Oct 30, 2010, 11:03:41 PM10/30/10
to ima...@googlegroups.com, Rasband Wayne
Hi Dimiter and all other benchmarkers,

thanks for adding this. I tested it and found that you can gain even a
bit more speed when removing line 49 from ByteForwardIterator, the JVM
will let that exception happen under exactly the conditions that you
test for :):

protected byte getSingle(int z) throws ArrayIndexOutOfBoundsException{
// if (z<0 || z>size) throw new ArrayIndexOutOfBoundsException(z);
return bpixels[z];
}

We found as well that, in this test, ImgLib suffers from bad JVM runtime
optimization when working with more than one branch through the
interface hierarchy (in the test, it's 4 branches, one for each
container, all containers are tested by the same generic method through
the Cursor interface). All other tests do not work through interfaces
which makes them not being affected by this. If we change the ImgLib
test to explicit classes such as how byte[], ByteProcessor and PixLib
are tested, then we get the following results:

array length 25000000
byte[]
-- MEMORY OVERHEAD --

Raw: 24998464 bytes
ImageJ: 23528 bytes
PixLib: 10248 bytes
Imglib (Array): 11472 bytes
Imglib (Cell): 80216080 bytes
Imglib (Planar): 24783400 bytes
Imglib (ImagePlus): 25152 bytes

-- TIME PERFORMANCE --
Iteration #1/3:

Raw: 26 ms 1.0 0.52
ImageJ: 50 ms 1.9230769 1.0
PixLib (Refl): 6600 ms 253.84616 132.0
PixLib (Array): 196 ms 7.5384617 3.92
Imglib (Array): 39 ms 1.5 0.78
Imglib (Cell): 268 ms 10.307693 5.36
Imglib (Planar): 154 ms 5.923077 3.08
Imglib (ImagePlus): 153 ms 5.8846154 3.06
Iteration #2/3:
Raw: 17 ms 1.0 0.41463414
ImageJ: 41 ms 2.4117646 1.0
PixLib (Refl): 5404 ms 317.88235 131.80487
PixLib (Array): 187 ms 11.0 4.5609756
Imglib (Array): 24 ms 1.4117647 0.58536583
Imglib (Cell): 255 ms 15.0 6.219512
Imglib (Planar): 144 ms 8.470589 3.512195
Imglib (ImagePlus): 143 ms 8.411765 3.487805
Iteration #3/3:
Raw: 17 ms 1.0 0.41463414
ImageJ: 41 ms 2.4117646 1.0
PixLib (Refl): 5334 ms 313.7647 130.09756
PixLib (Array): 182 ms 10.705882 4.4390244
Imglib (Array): 17 ms 1.0 0.41463414
Imglib (Cell): 178 ms 10.470589 4.3414636
Imglib (Planar): 157 ms 9.235294 3.8292682
Imglib (ImagePlus): 157 ms 9.235294 3.8292682


on a T61 with the same CPU, OS and Java as yours. We get the same
results for each container when tested individually through the generic
method (comment all three other container tests). Note that the speed
of the ArrayContainer ends up at the speed of raw byte[] iteration.
This is the only container that is limited to 2G pixels because it uses
a single array internally to store the pixels (such as byte[],
ByteProcessor and the ByteForwardIterator).

Find the extended test attached.

Best regards,
Stephan

InvertBenchmark.java

Dimiter Prodanov

unread,
Oct 31, 2010, 3:58:35 PM10/31/10
to ima...@googlegroups.com, Rasband Wayne
Thanks for the suggestion Stephan. I will include it in the new commit.
Do you think that it makes sense to specify the put methods as "synchronized"?
If I remove the synchronization condition then the put method is
optimized as byte[] array
and has similar performance (jar attached).

I think that this benchmark demonstrates the limitations in the
current implementation of the Generics patterns.

best regards,

Dimiter

===================================================

-- MEMORY OVERHEAD --
Raw: 15999888 bytes
ImageJ: 14712 bytes
PixLib: 5856 bytes
Imglib (Array): 4080 bytes
Imglib (Cell): 39176400 bytes
Imglib (Planar): 15866008 bytes
Imglib (ImagePlus): 10616 bytes

-- TIME PERFORMANCE --
Iteration #1/3:

Raw: 16 ms 1.0 0.8
ImageJ: 20 ms 1.25 1.0
PixLib (Refl): 6349 ms 396.8125 317.45
PixLib (Array: generic): 94 ms 5.875 4.7
PixLib (Array: byte): 36 ms 5.875 4.7
Imglib (Array): 34 ms 2.125 1.7
Imglib (Cell): 181 ms 11.3125 9.05
Imglib (Planar): 127 ms 7.9375 6.35
Imglib (ImagePlus): 131 ms 8.1875 6.55
Iteration #2/3:
Raw: 10 ms 1.0 0.90909094
ImageJ: 11 ms 1.1 1.0
PixLib (Refl): 5493 ms 549.3 499.36365
PixLib (Array: generic): 99 ms 9.9 9.0
PixLib (Array: byte): 12 ms 9.9 9.0
Imglib (Array): 13 ms 1.3 1.1818181
Imglib (Cell): 165 ms 16.5 15.0
Imglib (Planar): 114 ms 11.4 10.363636
Imglib (ImagePlus): 119 ms 11.9 10.818182
Iteration #3/3:
Raw: 11 ms 1.0 0.9166667
ImageJ: 12 ms 1.0909091 1.0
PixLib (Refl): 5551 ms 504.63635 462.58334
PixLib (Array: generic): 98 ms 8.909091 8.166667
PixLib (Array: byte): 13 ms 8.909091 8.166667
Imglib (Array): 12 ms 1.0909091 1.0
Imglib (Cell): 132 ms 12.0 11.0
Imglib (Planar): 138 ms 12.545455 11.5
Imglib (ImagePlus): 135 ms 12.272727 11.25

> --
> You received this message because you are subscribed to the Google Groups "ImageJX" group.
> To post to this group, send email to ima...@googlegroups.com.
> To unsubscribe from this group, send email to imagejx+u...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/imagejx?hl=en.
>
>

pixlib.jar
InvertBenchmark1.java

Curtis Rueden

unread,
Nov 1, 2010, 4:21:36 AM11/1/10
to ima...@googlegroups.com, Rasband Wayne
Hi everyone,

I committed the latest version of the benchmark to the imglib codebase:
  http://pacific.mpi-cbg.de/cgi-bin/gitweb.cgi?p=imglib.git;a=commit;h=c95f2b81aa7a3d09483aa5af9679c8a1d6e32928

The code lives in the imglib-ij project's test area. I added a dependency on pixlib for the test phase only (same as JUnit). Eclipse users may need to right-click the project and choose "Maven > Update Dependencies" for this to be properly updated.

Later today I will add something more computationally intensive, to verify that the overhead is linear with the number of pixels.

-Curtis

Stephan Saalfeld

unread,
Nov 1, 2010, 4:29:54 AM11/1/10
to ima...@googlegroups.com, Rasband Wayne
Hi Dimiter,

On Sun, 2010-10-31 at 20:58 +0100, Dimiter Prodanov wrote:
> Thanks for the suggestion Stephan. I will include it in the new commit.
> Do you think that it makes sense to specify the put methods as "synchronized"?
> If I remove the synchronization condition then the put method is
> optimized as byte[] array
> and has similar performance (jar attached).
>

Ah!---Nice! I haven't noticed that the put method was synchronized,
great that you spotted that as the reason for the lower performance. I
would never synchronize any highly accessed method by default. Each
multi-threaded application needs other blocks to be synchronized (or
not) which, then, is usually much more effective than synchronizing a
whole method even if it is not needed at all.

> I think that this benchmark demonstrates the limitations in the
> current implementation of the Generics patterns.
>

Actually not. In the first place, it shows that the current Hotspot JVM
is able to just-in-time compile and inline a class or interface based
hierarchy of virtual method calls down to the basic operations that it
performs and that there is no extra penalty for doing this through
interfaces defined as generic parameters. We have tested this before.
It is the most important argument against any resentments against
flexible interface-based libraries compared with unflexible `optimized'
special purpose code. The most interesting point is that the complexity
of the interface hierarchy does not matter which makes the separation of
data access and type as implemented in ImgLib possible and efficient to
no extra runtime-cost. No need to mix up things. Secondly, the test
shows that the optimization does not yet work if the runtime has to
handle several alternative paths through the interface hierarchy. This
is new to me, because we have never tested that. While this is the
case, we can still get optimal performance of a generic algorithm by
just replacing the interface types in the implementation by a set of
explicit class types that we will use. That is we will have to do the
job of a C++ like template compiler semi-manually. Still---the
algorithm code itself remains the same in all cases (and can be executed
for all non-specialized types) such that specialization can be done
automatically---it's just copy-and-rename and a pre-execution type
check. Note also, that this would have to be done for the frequently
called methods only, in ImgLib, that is Type and Cursors, in PixLib,
that is AbstractIterators.

Dimiter, in the PixLib repository, you have checked in your eclipse
project instead of the plain sources. What do you think about checking
in the sources only

http://pixlib.git.sourceforge.net/git/gitweb.cgi?p=pixlib/pixlib;a=tree;f=src

such that everybody can easily check that out into his personal IDE
project. Of course one can do that with some sym-link gymnastics but...

Thanks very much for your efforts.

Best,
Stephan

Grant B. Harris

unread,
Nov 1, 2010, 9:19:14 AM11/1/10
to ima...@googlegroups.com
Stephan,

Regarding a C++ like template compiler, the 'jeneral' project (http://code.google.com/p/jeneral) might be interesting. "Jeneral is a pure-Java class templates framework, that uses annotations processors... aims at bridging the gap between Java's poor type-erasure generics and more powerful abstraction syntaxes found in other languages, such as C#'s generics or C++'s templates."

- Grant


Johannes Schindelin

unread,
Nov 1, 2010, 11:35:13 AM11/1/10
to Curtis Rueden, ima...@googlegroups.com, Rasband Wayne
Hi,

On Mon, 1 Nov 2010, Curtis Rueden wrote:

> I committed the latest version of the benchmark to the imglib codebase:
>
> http://pacific.mpi-cbg.de/cgi-bin/gitweb.cgi?p=imglib.git;a=commit;h=c95f2b81aa7a3d09483aa5af9679c8a1d6e32928

In case you wondered -- like me -- how to run this beast:

mvn -Dexec.mainClass=tests.PerformanceBenchmark \
-Dexec.classpathScope=test exec:java

Do not forget the classpathScope; the benchmark is only visible in the
'test' scope.

Just for kicks (and because I claimed that the JDK7 JIT might be better),
I tested (with and) without Stephan's performance tweaks. Since I am only
interested in the long-run performance (see below for my tentative plan):

-- snip JDK6 with tweaks --
Raw: 10 ms, 1.0, 1.0
ImageJ: 10 ms, 1.0, 1.0
Imglib (Array): 13 ms, 1.3, 1.3
Imglib (Cell): 112 ms, 11.2, 11.2
Imglib (Planar): 100 ms, 10.0, 10.0
Imglib (ImagePlus): 101 ms, 10.1, 10.1
PixLib (Refl): 3347 ms, 334.7, 334.7
PixLib (Array: generic): 61 ms, 6.1, 6.1
PixLib (Array: byte): 11 ms, 1.1, 1.1
-- snap --

-- snip JDK7 with tweaks --
Raw: 10 ms, 1.0, 0.90909094
ImageJ: 11 ms, 1.1, 1.0
Imglib (Array): 12 ms, 1.2, 1.0909091
Imglib (Cell): 109 ms, 10.9, 9.909091
Imglib (Planar): 90 ms, 9.0, 8.181818
Imglib (ImagePlus): 93 ms, 9.3, 8.454545
PixLib (Refl): 3622 ms, 362.2, 329.27274
PixLib (Array: generic): 65 ms, 6.5, 5.909091
PixLib (Array: byte): 13 ms, 1.3, 1.1818181
-- snap --

-- snip JDK6 without tweaks --
Raw: 11 ms, 1.0, 0.9166667
ImageJ: 12 ms, 1.0909091, 1.0
Imglib (Array): 614 ms, 55.81818, 51.166668
Imglib (Cell): 598 ms, 54.363636, 49.833332
Imglib (Planar): 717 ms, 65.181816, 59.75
Imglib (ImagePlus): 679 ms, 61.727272, 56.583332
PixLib (Refl): 3301 ms, 300.0909, 275.08334
PixLib (Array: generic): 64 ms, 5.818182, 5.3333335
PixLib (Array: byte): 10 ms, 0.90909094, 0.8333333
-- snap --

-- snip JDK7 without tweaks --
Raw: 10 ms, 1.0, 0.90909094
ImageJ: 11 ms, 1.1, 1.0
Imglib (Array): 614 ms, 61.4, 55.81818
Imglib (Cell): 607 ms, 60.7, 55.18182
Imglib (Planar): 681 ms, 68.1, 61.909092
Imglib (ImagePlus): 674 ms, 67.4, 61.272728
PixLib (Refl): 3646 ms, 364.6, 331.45456
PixLib (Array: generic): 63 ms, 6.3, 5.7272725
PixLib (Array: byte): 11 ms, 1.1, 1.0
-- snap --

So while JDK7 performs better in general, it still has that bug where it
reverts to an utterly slow code path when you use the same methods with
different subclasses of Type and Cursor.

I could imagine, though, that it would not be too difficult to write a
"class visitor" based on ASM[*1*] to generate classes where each occurence
of, say, RealType in the bytecode is replaced with a given subclass.

For example, if you had a class

public class<T extends RealType<T>> InvertImage
implements Algorithm {
[...]
public boolean process() {
final Cursor<T> cursor = img.createCursor();
for (final T t : cursor)
t.set( 255 - t.get() );
cursor.close();
}
[...]
}

the ASM-based transformer should be able to produce a specialized class
via a call similar to this:

Algorithm<UnsignedByteType> algorithm =
transformer.get("my.package.InvertImageUnsignedByte",
InvertImage.class,
RealType.class, UnsignedByteType.class,
Cursor.class, ArrayCursor.class);

Of course, there could be a shortcut for this, too, which would take an
algorithm and an image as input and automatically transform the class (and
cache the result) according to the current image, and making up a class
name of the transformed class on the fly (i.e. something like append
"UnsignedByteArray" for the given example).

The provided functionality would definitely need to live by the paradigm
"make the easy simple", though.

That ASM-based code would definitely be an add-on, even if only for the
additional dependency (which is Fiji already three times, due to
Beanshell, JRuby and Jython having their own version, inside their own
package structure).

Ciao,
Dscho

Footnote [*1*]: For a tutorial, see http://asm.ow2.org/doc/tutorial.html
section "Bytecode Transformation". It really looks like assembler, but it
only has to be done once.


Curtis Rueden

unread,
Nov 2, 2010, 7:11:34 AM11/2/10
to ima...@googlegroups.com, Rasband Wayne
Hi everyone,


In case you wondered -- like me -- how to run this beast:

       mvn -Dexec.mainClass=tests.PerformanceBenchmark \
               -Dexec.classpathScope=test exec:java

Do not forget the classpathScope; the benchmark is only visible in the
'test' scope.

Thanks, this is a nice way to execute a Java class through Maven, making sure all dependencies are on the classpath.

Here is another way that launches the class directly:

mvn dependency:copy-dependencies
cd target
java -cp 'dependency/*':imglib-ij-2.0-SNAPSHOT.jar:test-classes -mx512m tests.PerformanceBenchmark 3000

Where "3000" means "test with a 3000x3000 image."

This works by copying all dependent JARs to a single convenient location (target/dependency) and then setting the classpath accordingly.

I have run the benchmark on various image sizes and am now generating some charts using Google Charts, so it's easy to update the data later as the code changes. I will reply back again when it's posted online.

-Curtis



--
Reply all
Reply to author
Forward
0 new messages