> Okay, recently Timo Nentwig complained that the Sun picture viewer is
> so horribly slow when images are scaled, and asked why it's not possible
> to scale images in realtime.
No, I complained that the image producing is so horribly slow. And it is indeed.
For an example, visit http://www.timo-nentwig.de and have a look at the provided
code of my Liquid applet. It is obsolete (in the meantime I wrote something
better) and unfortunately I lost the sources, so there is only the main
rendering code and you will have to reconstruct the remaining stuff yourself.
And you will notice that my code is very fast; the image producing (an optimized
MemoryImageSource) is the problem.
And then tell me what I can do about it to speed it up. I know a lot of people
who would be really interested... :->
[...]
>And then tell me what I can do about it to speed it up. I know a lot of people
>who would be really interested... :->
I didn't follow your complete thread. Did BufferedImage come up at
some point? You can set / get several pixels at a time with getRGB /
setRGB.
Regards,
Marco
--
Please reply in the newsgroup, not by email!
Java programming tips: http://jiu.sourceforge.net/javatips.html
Other Java pages: http://www.geocities.com/marcoschmidt.geo/java.html
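Marco's suggestion can be sketched like this (a minimal, headless sketch; the class and method names are mine). Bulk setRGB/getRGB move a whole block of pixels per call instead of one pixel per call, and for the TYPE_INT_* image types you can even borrow the backing int[] of the raster directly:

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferInt;
import java.util.Arrays;

public class BulkPixels {
    // Write a whole pixel block with one setRGB call, then read the
    // raster's backing array directly.
    static int roundTrip(int w, int h) {
        BufferedImage img =
            new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        int[] pixels = new int[w * h];
        Arrays.fill(pixels, 0x123456);
        img.setRGB(0, 0, w, h, pixels, 0, w);   // one bulk write
        // For TYPE_INT_* images the backing int[] is reachable directly,
        // avoiding per-call conversion entirely:
        int[] raw =
            ((DataBufferInt) img.getRaster().getDataBuffer()).getData();
        return raw[0] & 0xFFFFFF;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(roundTrip(4, 4)));
    }
}
```

Note that grabbing the raster's array directly may mark the image as "unmanaged" on some JVMs and disable acceleration for subsequent drawImage calls, so it is a trade-off, not a free win.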
Well, while method calls are unfortunately slow as well (you'll notice it in
this particular case), BufferedImage does image producing too and hence is slow
for the same reason. It is the image producing, and there is no way to avoid it.
I still have to do some investigation of VolatileImage... I have an idea, but
I'm not quite sure whether it works and whether it's faster...
What a surprise...
> Could you please talk in Java or general computer terms, so everyone
> understands you. Don't tell me you produce a new image for each frame?
It is a Java term...
I ran your "Liquid" Applet (Demo 1) on my 133 MHz "test" system and got 5.1
fps (about 196 ms per frame). I wrote a simple Applet that used animated
MemoryImageSource (200 by 200 pixels, same as your demo) and called
newPixels on every frame (same as your demo), and it got 32 fps (about 31 ms
per frame). The main difference between our Applets is that in your Applet
the integer array used in MemoryImageSource was modified between each frame.
Both Applets sent new pixel information to the ImageConsumers every frame.
My conclusion is that your Applet is spending about 165 ms (84% of the time)
updating the integer array and 31 ms (16% of the time) sending new pixels to
the ImageConsumers. Even though your code may be "very fast", the main
performance bottleneck in your Applet seems to be updating the values in the
integer array, and not in sending the new pixels to the Image Consumer (by a
5 to 1 margin!). I suspect that on other platforms (especially those with
faster CPUs), different ratios will be found.
Carl G.
My head is really beginning to hurt...
> But they aren't slow, because the JVM is allowed to inline methods
> automatically.
Buddy, believe me, I really did comprehensive investigations when I wrote my
applets 2 years ago: a setPixel() call is slower than x[y]=z. BTW, array
accesses are damn slow in Java as well. Methods are inlined only under certain
circumstances and conditions, and this makes it hard to program... and not all
compilers inline (under the same conditions), AFAIK. But this was 2 years
ago... and setPixels() cannot be static anyway...
> I ran your "Liquid" Applet (Demo 1) on my 133 MHz "test" system and
> got 5.1 fps (about 196 ms per frame). I wrote a simple Applet that
> used animated MemoryImageSource (200 by 200 pixels, same as your
> demo) and called newPixels on every frame (same as your demo), and it
> got 32 fps (about 31 ms per frame). The main difference between our
> Applets is that in your Applet the integer array used in
> MemoryImageSource was modified between each frame. Both Applets sent
Of course it is... it even alpha-blends the textures, i.e. 3 additional
multiplications per pixel and 2-3 additional array accesses per pixel.
BTW: image production is of course faster at 8bpp, but I assume you used a
32bpp color model...
> new pixel information to the ImageConsumers every frame. My
> conclusion is that your Applet is spending about 165 ms (84% of the
> time) updating the integer array and 31 ms (16% of the time) sending
> new pixels to the ImageConsumers. Even though your code may be "very
> fast", the main performance bottleneck in your Applet seems to be
> updating the values in the integer array, and not in sending the new
> pixels to the Image Consumer (by a 5 to 1 margin!). I suspect that
> on other platforms (especially those with faster CPUs), different
> ratios will be found.
Of course the rendering takes much longer than the image producing...
But 31 ms is *FAR* too much: 1000 ms / 31 ms ≈ 32 fps. Without ANY rendering at
all, and without yielding time to other tasks, the maximum frame rate is 32 fps!
My current code is BTW actually significantly faster...
> I wrote a test program as well and I get 268 fps when _not_ modifying
> the source array between two repaints (however, I still create a new
> image object before each repaint and always from the source array,
> otherwise it wouldn't be fair). So if that speed is possible in
> theory but I get fewer frames in practice, it must be caused by
> performing image modifications and not by re-creating a new image
> object for each frame.
And - TGOS - this issue is called image production. You can easily BLIT 268
frames per second (this is done by native code) but you cannot PRODUCE 268
images a second (this is not done by native code).
> The liquid demo produces about ~28 fps on my system. Graphical effects
> like this are very CPU hungry and no, they can't easily be accelerated
Yes, buddy, it is very CPU hungry...that's it.
> in hardware, as Timo demands all the time. Also I've never seen a game
IT IS THE IMAGE PRODUCTION; TRANSFORMING A ARRAY OF INTs TO AN IMAGE. DAMN HOW
WILL IT TAKE TILL YOU WILL UNDERSTAND IT?!
> using such realistic water effects in comparable quality, and neither
> Windows' GDI nor the DirectX APIs offer a function that would speed
> up these effects noticeably.
>
> The following demo
>
> http://rsb.info.nih.gov/plasma/
404
> produces 254 fps on my system.
>
>
> Timo, have you tried to run your fluid demo, performing all
> calculations and stuff, but just not painting the result to the
> screen (also not creating an image object of the array)? How many
No. I did this to see it. Otherwise it does not really make sense. But 2 years
ago I found out that the image production takes quite some time. And, buddy, if
my Liquid does not impress you have a look at Peter's obsolete 3D engine, maybe
this does impress you and he claims the image production as well. Not even the
Fullscreen API offers the possibility to access VRAM directly to speed
things up.
> frames a second can you calculate without painting them? If that's
More.
> only slightly more than you get when painting them, it doesn't look
> like painting is the bottleneck.
Yes, you are right, the rendering does OF COURSE take most of the time. But
nevertheless the image production is slow. In case you did not notice yet: this
effect has no purpose. It is a demo-style effect. I will never make money with
it. I made it just to be as fast and as good as possible. And that's why the
stuff you say is simply off topic in this case.
> IT IS THE IMAGE PRODUCTION; TRANSFORMING A ARRAY OF INTs TO AN IMAGE.
> DAMN HOW WILL IT TAKE TILL YOU WILL UNDERSTAND IT?!
Array accesses as well. And floating-point operations, multiplications and
divisions... roughly in this order.
>> using that realistic water effects in a comparable quality and
>> neither Window's GDI, nor their DirectX APIs offer a function that
>> would speed up these effects noticably.
What I did is only a simple effect. Realistic water is something completely
different, e.g.:
http://cgi3.tky.3web.ne.jp/~tkano/tlwater.shtml
IMHO there is a lot you have not seen yet, and there is even more you do not
know yet...
> IT IS THE IMAGE PRODUCTION; TRANSFORMING A ARRAY OF INTs TO AN IMAGE.
> DAMN HOW WILL IT TAKE TILL YOU WILL UNDERSTAND IT?!
...HOW *LONG* ...
> No. I did this to see it. Otherwise it does not really make sense.
> But 2 years ago I found out that the image production takes quite
> some time. And, buddy, if my Liquid does not impress you have a look
> at Peter's obsolete 3D engine, maybe this does impress you and he
> claims the image production as well. Not even the Fullscreen API
claims->complains
:)
Buddy, java.awt.image.ImageProducer. Have a look at the JDK sources...
>> You can easily BLIT 268 frames per second (this is done by native
>> code) but you cannot PRODUCE 268 images a second (this is not done
>> by native code).
>
> You don't have to.
> You only create a single one and update it for each frame.
And this is called image production...
> But let's look at this test:
>
> Toolkit tk = Toolkit.getDefaultToolkit();
> int[][] imageData =
> new int[250][200 * 200]; //pixels for 250 images
Uhh, bad idea, newbie, 2 dimensional arrays are even slower than 1 dimensional.
> timeInternal = System.currentTimeMillis();
> for (int i = 0; i < 250; i++) {
> g.drawImage(images[i], 5, 5, this);
This is called blitting.
Where do you actually modify the pixel array?
> If you just start this normally, you will get OutOfMemory errors,
> after all all the arrays and images take up quite a bit of memory. I
> set the JVM memory size to 100 MB on start-up and the whole image
> "creation" process took 50 ms. So the creation of a single object
> takes 0.2 ms. Drawing them all to screen took 880 ms, so painting a
> single one takes 3.52 ms, or IOW, you could paint about 284 to screen a
> second.
Great. But we are still not talking about blitting; we are still talking about
image production.
To those who are interested, this is IMHO actually the fastest way to do image
production in this particular case:
import java.awt.image.*;

public class FastImageProducer implements ImageProducer
{
    private ImageConsumer consumer;
    private int w, h;
    private ColorModel cm;
    private int[] pixel;
    private int hints, sfd;

    public FastImageProducer(int w, int h, ColorModel cm, int pixel[])
    {
        this.w = w;
        this.h = h;
        this.cm = cm;
        this.pixel = pixel;
        hints = ImageConsumer.TOPDOWNLEFTRIGHT
              | ImageConsumer.COMPLETESCANLINES
              | ImageConsumer.SINGLEPASS
              | ImageConsumer.SINGLEFRAME;
        sfd = ImageConsumer.SINGLEFRAMEDONE;
    }

    public synchronized void addConsumer(ImageConsumer consumer)
    {
        this.consumer = consumer;
    }

    public final void startProduction(ImageConsumer imageconsumer)
    {
        if (consumer != imageconsumer)
        {
            consumer = imageconsumer;
            consumer.setDimensions(w, h);
            consumer.setProperties(null);
            consumer.setColorModel(cm);
            consumer.setHints(hints);
        }
        consumer.setPixels(0, 0, w, h, cm, pixel, 0, w);
        consumer.imageComplete(sfd);
    }

    public void update()
    {
        if (consumer != null) startProduction(consumer);
    }

    public final boolean isConsumer(ImageConsumer imageconsumer)
    {
        return consumer == imageconsumer;
    }

    public final void requestTopDownLeftRightResend(ImageConsumer imageconsumer)
    {
    }

    public final void removeConsumer(ImageConsumer imageconsumer)
    {
    }
}
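For comparison, the standard library's MemoryImageSource in animated mode follows the same single-frame protocol: one reusable pixel array, re-sent per frame via newPixels(). A headless sketch of the pattern (class and method names are mine), using a PixelGrabber as a stand-in consumer so it runs without a display:

```java
import java.awt.image.MemoryImageSource;
import java.awt.image.PixelGrabber;
import java.util.Arrays;

public class AnimatedSourceDemo {
    // Push one "frame" from a reusable int[] through an animated
    // MemoryImageSource and read the first pixel back.
    static int firstPixel() {
        int w = 200, h = 200;
        int[] pixels = new int[w * h];
        MemoryImageSource source =
            new MemoryImageSource(w, h, pixels, 0, w);
        source.setAnimated(true);          // allow newPixels() re-sends
        Arrays.fill(pixels, 0xFFFF0000);   // opaque red, default RGB model
        source.newPixels();                // re-deliver the same array
        int[] grabbed = new int[w * h];
        PixelGrabber grabber =
            new PixelGrabber(source, 0, 0, w, h, grabbed, 0, w);
        try {
            grabber.grabPixels();          // returns after SINGLEFRAMEDONE
        } catch (InterruptedException e) {
            return -1;
        }
        return grabbed[0];
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(firstPixel()));
    }
}
```

In an applet you would create the image once with createImage(source) and call source.newPixels() plus repaint() each frame, which is essentially what the hand-rolled producer above does with update().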
:> The following demo
:>
:> http://rsb.info.nih.gov/plasma/
: 404
You may want to try that again. It seems relevant to me.
--
__________
|im |yler http://timtyler.org/ t...@tt1.org
: Of course the rendering takes much longer than the image producing...
: But 31 ms is *FAR* too much: 1000 ms / 31 ms ≈ 32 fps. Without ANY rendering
: at all and without yielding time to other tasks the maximum frame rate is
: 32 fps! My current code is BTW actually significantly faster...
http://rsb.info.nih.gov/plasma/ seems to be of a similar size.
32fps on that seems to be at about the level of a 233Mhz Pentium 2
running the old Netscape JVM - and that /is/ doing some rendering, as
well as producing the image.
Probably :->
>> Uhh, bad idea, newbie, 2 dimensional arrays are even slower than 1
>> dimensional.
>
> Fine for you, but I want 250 images, so I store them in an array.
> And don't call me newbie, Troll!
s.a. :->
404...
Maybe someone can mirror it?
There are severeal water effects but they all cheat in some way (e.g. don't
check the borders of the display area so wave go out on the right and come in at
the left - this does quite speed up the whole thing [durius.com] run in 8bpp
only or render a size of 2^n in order to you bit shifts instead of slow
multiplications). I did not see any faster code than my current one and it is
quite versatile as well...well, there's one cheating I do as well :)
BTW, there are severeal low-level issues to significantly speed Java code up;
unfortunately I never collected them :) If anybody know a good collection please
pass it to me...
E.g.,
    for (int a=0; a < b; a++)
is *significantly* slower than
    for (int a=b-1; --a>=0;)
In a nutshell: because the VM/RISC has an instruction to compare with zero,
AFAIK. But there is lots of stuff like this; it becomes somewhat like assembly
programming, and I don't think that should still be the case today, but in Java
it really does make a difference. The same applies e.g. to array accesses. They
are really _very_ slow, and that forces you to write "bad", kinda encoded
code... and IMHO such very-low-level optimizations are just not reasonable and
hence not acceptable (while in the case of array accesses I understand why they
can only, and hence must, be slower, but not in the case of issues like image
production... I simply cannot accept that the UI of applications like JBuilder
is that slow on a 1GHz Athlon with a TNT2 and 768 MByte; this must be
sufficient for a fast UI. Soon it will take an SGI server to control my
refrigerator...).
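Claims like this are easy to check with a tiny harness. A sketch (class and method names are mine) that sums the same n indices in both directions, so the work is identical and only the loop form differs. Note the down-counting loop starts at n, not n-1, so it visits the same values; on a modern HotSpot VM the two usually time about the same:

```java
public class LoopDirection {
    // Forward loop: classic 0..n-1 with a < comparison.
    static long sumForward(int n) {
        long s = 0;
        for (int a = 0; a < n; a++) s += a;
        return s;
    }

    // Down-counting loop: --a >= 0 compares against zero; starting at
    // n (not n-1) makes it cover the same indices n-1..0.
    static long sumBackward(int n) {
        long s = 0;
        for (int a = n; --a >= 0;) s += a;
        return s;
    }

    public static void main(String[] args) {
        int n = 10000000;
        long t0 = System.nanoTime();
        long f = sumForward(n);
        long t1 = System.nanoTime();
        long b = sumBackward(n);
        long t2 = System.nanoTime();
        System.out.println("forward:  " + (t1 - t0) / 1000000 + " ms (sum " + f + ")");
        System.out.println("backward: " + (t2 - t1) / 1000000 + " ms (sum " + b + ")");
    }
}
```

A single pass like this is a rough sketch, not a real benchmark; JIT warm-up and dead-code elimination can dominate, which is why the sums are printed.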
:> http://rsb.info.nih.gov/plasma/ seems to be of a similar size.
:>
:> 32fps on that seems to be at about the level of a 233Mhz Pentium 2
:> running the old Netscape JVM - and that /is/ doing some rendering, as
:> well as producing the image.
: 404...
: Maybe someone can mirror it?
http://web.archive.org/web/*/http://rsb.info.nih.gov/plasma/
>
> "Timo Nentwig" <timo.n...@web.de> wrote in comp.lang.java.programmer:
>
> > for (int a=0; a < b; a++)
> >
> > *significantly* slower than
> >
> > for (int a=b-1; --a>=0;)
>
> There's no difference for short for loops, but for very long ones, the
> second /first/ one is faster.
Using exactly the same class file on each machine, I have found that
for (int a=b-1; --a>=0;) is the fastest loop on my iMac running OS X by about
40%, but on all my Windows machines both tests are near enough the same.
I don't know why this would be, but it seems to me that the underlying
machine architecture has a significant effect.
Rob.
That seems reasonable. It may well depend on what native comparison
operators there are. I would expect that: --a!=0 might be even faster,
although I would still separate it into
(int a=b-1; a!=0; a--)
for clarity.
--
Jon Skeet - <sk...@pobox.com>
http://www.pobox.com/~skeet/
If replying to the group, please do not mail me too
It's not always faster...depends on processor and VM.
> I don't know why this would be but it seems to me that the underlying
> machine architecture has a significant effect.
As I said: in a nutshell, because the VM/RISC has an instruction to compare <a>
with zero. In the case of (a < b), <b> may have to be put into a processor
register first in order to compare it with <a>. And because of stack handling
and so on.
In German: search Google Groups for "nentwig transversive".
java.lang.ClassFormatError: Plasma (Extra bytes at the end of the class file)
        at java.lang.ClassLoader.defineClass0(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:509)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:123)
        at sun.applet.AppletClassLoader.findClass(AppletClassLoader.java:146)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.applet.AppletClassLoader.loadClass(AppletClassLoader.java:112)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:262)
        at sun.applet.AppletClassLoader.loadCode(AppletClassLoader.java:473)
        at sun.applet.AppletPanel.createApplet(AppletPanel.java:548)
        at sun.applet.AppletPanel.runLoader(AppletPanel.java:477)
        at sun.applet.AppletPanel.run(AppletPanel.java:290)
        at java.lang.Thread.run(Thread.java:536)
Well, I had to recompile the class... it runs at about 100 fps on my Athlon
1GHz:
java version "1.4.0"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.0-b92)
Java HotSpot(TM) Client VM (build 1.4.0-b92, mixed mode)
Not a very impressive applet/effect :-)
And it uses an IndexColorModel, which does make image production _much_
faster...
> My current code is BTW actually significantly faster...
Between 50 and 60 fps; Athlon 1GHz, Hotspot client.
What tool do you use to get assembly language from Java code?
-Mike
> You are probably speaking of method calls, don't you?
> But they aren't slow, because the JVM is allowed to inline methods
> automatically.
They are. Even in the case of _simple_ _static_ _final_ methods...
Athlon 1GHz
java version "1.4.0"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.0-b92)
Java HotSpot(TM) Client VM (build 1.4.0-b92, mixed mode)
jikes 1.15: 40 ms as opposed to 10 ms
javac -O: 40 ms as well, as opposed to 20 ms
(1 iteration; not proportional to increasing array size)
Test code:

class Red
{
    public static final int from(int[] p, int i)
    {
        return (p[i]&0xFF0000)>>16;
    }
}

public class Test
{
    int[] pixel;
    int px;

    public static void main(String[] args)
    {
        new Test();
    }

    public Test()
    {
        pixel = new int[1024*768];
        for (int i = 0; i < pixel.length; i++)
            pixel[i] = 0xffffffff;
        long
        t0 = System.currentTimeMillis();
        for (int i = 0; i < pixel.length; i++)
            px = Red.from(pixel, i);
        System.out.println(System.currentTimeMillis()-t0);
        t0 = System.currentTimeMillis();
        for (int i = 0; i < pixel.length; i++)
            px = (pixel[i]&0xFF0000)>>16;
        System.out.println(System.currentTimeMillis()-t0);
    }
}
Interesting. I get similar results, but only for large loops; in the case of
640x480 the second one is faster (not measurable, while ++ takes 10 ms)... but
not always... maybe because of some background activity on my system (?).
So in the end I have been "optimizing" my code to be slower? :-(
Well, I did not do much Java in the past :)
> And now JBuilder doesn't run slow at all on my PC.
But it does not run fast either...
Many thanks; I had no idea about this!
-Mike
> private static void test1() {
>     int i, j;
>     int t = 0;
>     int end = Integer.MAX_VALUE;
>     for (i = 0; i < 10; i++) {
>         for (j = 0; j < end; j++) {
>             t++;
>         }
>     }
> }
>
> private static void test2() {
>     int i, j;
>     int t = 0;
>     int end = Integer.MAX_VALUE - 1;
>     for (i = 10; i >= 0; i--) {
>         for (j = end; j >= 0; j--) {
>             t++;
>         }
>     }
> }
> }
BTW, did you notice that the second outer loop runs 11 times while the first
one runs only 10?
:->
Actually, modern microprocessors _always_ compare to zero: they kinda rephrase
(i<10) as (i-10<0), which I actually did manually yesterday, but it did not
speed up the code measurably.
So comparing to zero is actually _always_ _faster_. Is HotSpot to blame for
lacking the corresponding optimization (?)...
Hm, is there any inlining (byte-code) compiler? IMHO it would make sense for
setPixel(offset), getRed(pixel), crop(x,min,max) and stuff like this...
> 2) Why do you use a static method? If you think calling static methods
> of a class is always faster than calling instance methods, you are
> wrong.
Am I? When are they faster, and when are they not? But they are not slower,
are they? :)
> You are right, this should have been "i > 0" and not "i >= 0".
> Still the first one is 12% faster on my machine.
To be fair, code it as
    int i;
    for (i = 10-1; i != 0; i--)
But yes, it is indeed not any faster anymore...
> for (int i...)
>> for (int i...)
>> for (int i...)
>
> is slower than
>
> int i;
> for (i...)
>> for (i...)
>> for (i...)
Yes, but this never made any difference at all when I tested it; it only messes
up the code...
BTW, are all loops equally fast, or may reformulating a for loop into e.g.
    int i = j;
    while (something)
    {
        //code
        i++;
    }
be faster or slower? What about do-while?
> meaning you can always replace these two if you like. The extra
That's it. I _never_ use do-while at all (well, ok, in my published Liquid
code... hey, that was really the first time as far as I can remember ;) and
while... well, I use while very seldom as well... guess I only use it in run().
> Many inexperienced programmers don't know that for-loops are very
> flexible:
>
> for (int i = 0, j = getSize(); i < 10 && j > -100; i++, j--)
I knew! :)
Consider e.g.:
    int x, y;
    int offset = pixel.length - width - 1;
    double pythagoras;
    for (y = height; --y != 0; offset--)
    {
        for (x = width; --x != 0; offset--)
        {
            pythagoras = Math.sqrt(x*x+y*y);
            ...
            pixel[offset] = something; // 1 slow mul less! :)
        }
    }
Elegant and real-life, isn't it?
But many experienced programmers use conditions redundantly:
    public boolean isGreater(int a, int b)
    {
        return a > b;
    }
versus:
    public boolean isGreater(int a, int b)
    {
        if (a > b)
            return true;
        else
            return false;
    }
In general I agree, but... did you read Jon Skeet's comment on this?
> Also there is no proof that != is faster than >=, both is just a
Hm... well, as I said, in general I agree with the above - I don't like !=
either... ok, I'll change it back :)
> if (isGreater(a, b) == true) {
;)
--x > 0
;)
I apologize for the dumb question :) I never had anything to do with (byte
code) assembly, but are all byte code instructions equally fast? And in case
they are not (which I assume), is there any reference that lists the execution
speed of the various instructions?
Is e.g. iflt faster than if_icmplt?
Hey, don't laugh at me :)
> if (isGreater(a, b) == true) {
Just wish I could write
if(o)
instead of
if(o!=null)
:-(
I'm very glad I can't. Having the expression within an if *need* to be a
boolean means people no longer need to use the convention of:
if (5==x)
rather than
if (x==5)
just in case they miss an = sign out.
> I'm very glad I can't. Having the expression within an if *need* to be a
> boolean means people no longer need to use the convention of:
>
> if (5==x)
>
> rather than
>
> if (x==5)
Uh, I've never needed this (obfuscating) trick, because I've always
been lucky enough to use a compiler that is able to warn me about broken
code due to this problem.
regards
john
[x==5 vs 5==x]
> Uh, I've never needed this (obfuscating) trick, because I've always
> been lucky enough to use a compiler that is able to warn me about broken
> code due to this problem.
Likewise. You still see it recommended in various C books, however, and
lots of people still use it. With Java, almost no-one does because it's
*completely* unnecessary, and those who do can usually be persuaded of
its down-sides. I'd like to keep it that way.
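A minimal illustration of why the convention buys nothing in Java (class and method names are mine): a non-boolean assignment inside an if simply does not compile, and the only case that still slips through is assignment to a boolean itself, since that assignment expression has type boolean:

```java
public class IfIsBoolean {
    // In Java the controlling expression of an if must be boolean, so
    // "if (x = 5)" is a compile error, not a silent bug.
    static boolean assignmentInsideIf() {
        boolean flag = false;
        // Still compiles: the assignment expression has type boolean.
        if (flag = true) {
            return true; // always taken: assignment, not comparison
        }
        return false;
    }

    public static void main(String[] args) {
        int x = 4;
        // if (x = 5) { }  // rejected by javac: int cannot be converted to boolean
        if (x == 5) {
            System.out.println("never reached");
        }
        System.out.println("boolean assignment slipped through: "
                + assignmentInsideIf());
    }
}
```

So "5 == x" buys protection only for boolean variables in Java, which is a far narrower hazard than in C.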
>I apologize for the dumb question :) I never had anything to do with (byte
>code)assembly but are all byte code instructions equal fast? And in case they
>are not (what I assume) is there any reference that lists the execution speed of
>the several instructions?
It depends on the JVM. JITs will compile byte codes into different
machine code depending on context.
What you most need to know is that instance calls are slower than
static calls. Accessing local variables is faster than accessing
instance variables. Private or final calls are faster than public
ones.
Addition, subtraction and bit operations are fast; multiplication
and division are slow.
--
Available for tutoring, problem solving or contract
programming for $50 US per hour. The Java glossary is at
http://www.mindprod.com/jgloss.html
or http://64.251.89.39/jagg.html
-
canadian mind products, roedy green
Multiplication is often much faster than division. If you can manage to
avoid pipeline stalls it is sometimes almost as fast as addition.
int x = y/2;
int x = (int)(y*0.5f);
Unfortunately the float to int conversion may be slow(ish) on some machines.
>
> "Timo Nentwig" <timo.n...@web.de> wrote in comp.lang.java.programmer:
>
> >> Multiplication is often much faster than division. If you can manage
> >> to avoid pipeline stalls it is sometimes almost as fast as addition.
> >
> > int x = y/2;
> > int x = (int)(y*0.5f);
>
> No, certainly not faster.
Multiplication is not faster than division? Surely this is in error?
> In that case the following is faster:
>
> int x = y >> 1;
If y is floating point, as implied by the (int)(y*0.5f), then faster this
may be but useful it is not {:v)
--
--------------------------------------------
_ _
o o Jason Teagle
< ja...@teagster.co.uk
v
--------------------------------------------
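One caveat worth adding before replacing y/2 with y >> 1 for ints: the two agree only for non-negative values, because integer division truncates toward zero while an arithmetic right shift floors toward negative infinity. A small sketch (class and method names are mine):

```java
public class HalfOf {
    // Integer division truncates toward zero; >> floors toward negative
    // infinity, so the results differ for negative odd values.
    static boolean shiftMatchesDivide(int y) {
        return (y >> 1) == (y / 2);
    }

    public static void main(String[] args) {
        System.out.println(" 7/2 = " + (7 / 2) + "    7>>1 = " + (7 >> 1));
        System.out.println("-7/2 = " + (-7 / 2) + "  -7>>1 = " + (-7 >> 1));
        System.out.println("shift matches divide for -7? "
                + shiftMatchesDivide(-7));
    }
}
```

For 7 both give 3; for -7, division gives -3 while the shift gives -4. A JIT can apply the shift trick itself when it can prove the value is non-negative, so hand-replacing it is rarely worth the correctness risk.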
>What you most need to know is that instance calls are slower than
>static calls. Accessing local variables is faster that accessing
>instance variables. Private or final calls are faster than public
>ones.
>
>addition and subtraction and bit operations are fast. multiplication
>and division are slow.
Transcendental functions, e.g. Math.cos, Math.sin and Math.pow, are handled
by evaluating polynomials, so they are quite slow.
Shifting by a fixed number of bits is likely to be faster than
shifting by a variable number of bits.
If you look at the bytecode, you want your conditional jumps arranged
so that in the usual case they fall through.
^ and ~ are very quick, like + and -.
Not true. Never say never, AND never say ALWAYS unless you are absolutely
certain.
I seem to remember that integer multiplication on certain CDC machines was
actually done by the floating-point unit and was slightly slower than a
floating-point multiply.
>> Shifting by a fixed number of bits is likely to be faster than
>> shifting by a variable number of bits.
>
>How's that?
N >> M is slower than N >> 2. This is because some hardware has a
special instruction for doing shifts when you know the number of bits
in advance. It can embed it in the op code.
>
>> If you look at the bytecode, you want your conditional jumps arranged
>> so in the usual case the fall through.
>
>Not quite sure what you meant by that.
A conditional jump in assembler either jumps off to some part of the
program or falls through to the next instruction in sequence. Jumping
is painful because lookahead logic needs to be flushed. Some chips
avoid this pain by looking ahead in more than one direction.
>How can you arrange them so they fall through more often?
If you write assembler code, you keep your normal case in a straight line,
with jumps off to the side to deal with the strange cases. It is hard to do
that in Java, since the IF body is the fall-through, which usually is the
exceptional case.
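The cost of unpredictable jumps can be seen from plain Java, too. A sketch (class and method names are mine) that counts elements above a threshold, first on random data (branch taken about half the time, at random), then on the same data sorted (branch perfectly predictable):

```java
import java.util.Arrays;
import java.util.Random;

public class BranchLayout {
    // The if inside this loop is the branch being measured.
    static int countAbove(int[] data, int threshold) {
        int count = 0;
        for (int v : data) {
            if (v > threshold) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int[] data = new int[1 << 20];
        Random rnd = new Random(42);
        for (int i = 0; i < data.length; i++) data[i] = rnd.nextInt(256);

        long t0 = System.nanoTime();
        int unsorted = countAbove(data, 128);
        long t1 = System.nanoTime();

        Arrays.sort(data); // sorted data makes the branch predictable
        long t2 = System.nanoTime();
        int sorted = countAbove(data, 128);
        long t3 = System.nanoTime();

        System.out.println("unsorted: " + (t1 - t0) / 1000000
                + " ms, count=" + unsorted);
        System.out.println("sorted:   " + (t3 - t2) / 1000000
                + " ms, count=" + sorted);
    }
}
```

Both passes count the same elements, so any timing gap comes from branch prediction, not from the work. On hardware with a branch predictor the sorted pass is often noticeably faster, though a modern JIT may turn the branch into a conditional move and erase the difference.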