I'm currently evaluating how to port some of Xiph.org's media format
decoders to Java, and I came across NestedVM, which is giving some
encouraging results despite some problems on my side (mostly in the
MIPS binary compilation steps).
As decoding of media formats is somewhat time critical I tried to do
some performance evaluations. The intended purpose for the Java ports
would be player applets that decode low-bitrate and low-resolution
streaming media content. There is an Ogg Theora + Vorbis player applet
named Cortado, but the Theora decoder used is incomplete and may break
on future Theora bitstreams. If possible it seems desirable to derive
a Java decoder directly from the C reference implementation.
Ogg Vorbis: I successfully converted the encoder_example and
decoder_example from libvorbis to Java classes. There is already a
pure Java port of a Vorbis decoder, so this was tested for evaluation
reasons only.
Tests were done in the following Java environment:
java version "1.6.0_01"
Java(TM) SE Runtime Environment (build 1.6.0_01-b06)
Java HotSpot(TM) Client VM (build 1.6.0_01-b06, mixed mode, sharing)
http://savage747.sa.funpic.de/encoder_example-mips.zip (MIPS binary)
http://savage747.sa.funpic.de/encoder_example-java.zip
Usage: cat input.wav | java encoder_example - > output.ogg
On my 2 GHz AMD Athlon 64 the Java version is about 16 times slower
than native x86 code. This of course is pretty slow, but as the
encoder is doing heavy floating point math this was to be expected.
http://savage747.sa.funpic.de/decoder_example-mips.zip
http://savage747.sa.funpic.de/decoder_example-java.zip
Usage: cat input.ogg | java decoder_example - > output.pcm
The decoder is roughly 15 times slower than the native x86 code. Again
I assume the floating point math is to blame.
There's also an integer-only Vorbis decoder available (called
"Tremor"). I would expect it to perform better than the normal
libvorbis decoder (tested with decoder_example).
http://savage747.sa.funpic.de/ivorbisfile_example-mips.zip
http://savage747.sa.funpic.de/ivorbisfile_example-java.zip
Testing, however, showed that Tremor is even slower than the
floating point decoder. I don't know where the problem is - most
likely I'm making wrong assumptions about what code will perform well
with NestedVM.
The really interesting thing would be to see how NestedVM performs
on a Theora decoder. Using current Theora trunk
(http://svn.xiph.org/trunk/theora) I had no problems compiling the
libraries, but the example decoder programs didn't link:
mips-unknown-elf-gcc -Wall -O3 -fforce-addr -fomit-frame-pointer \
  -finline-functions -funroll-loops -O3 -mmemcpy -ffunction-sections \
  -fdata-sections -falign-functions=512 -fno-rename-registers \
  -fno-schedule-insns -fno-delayed-branch -freduce-all-givs \
  -o dump_video dump_video.o getopt.o getopt1.o -L/mnt/data/mips/lib \
  ../lib/.libs/libtheora.a /mnt/data/mips/lib/libogg.a
dump_video.o(.text.main+0xe08): In function `main':
: undefined reference to `ftime'
dump_video.o(.text.main+0xe18): In function `main':
: undefined reference to `ftime'
dump_video.o(.text.main+0xe20): In function `main':
: undefined reference to `ftime'
collect2: ld returned 1 exit status
make[2]: *** [dump_video] Error 1
make[2]: Leaving directory `/tmp/theora/examples'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/tmp/theora'
make: *** [all] Error 2
The Theora codec is purely integer based, so I assume NestedVM may
give usable performance. Even if the result is about 10 times slower
than native the result could still be usable in low-bitrate scenarios.
Does anybody have an idea how to resolve the build issue?
Thanks,
Maik Merten
http://linux.about.com/library/cmd/blcmdl3_ftime.htm
Interesting. Sounds like a pretty good use for NestedVM.
> dump_video.o(.text.main+0xe08): In function `main':
> : undefined reference to `ftime'
We must not implement ftime. According to my FreeBSD man page for
ftime:
"This interface is obsoleted by gettimeofday"
And we do implement ftime. You might want to see if there is some easy
way to get the decoder to use gettimeofday instead. If not, yell and I
can probably implement ftime too.
-Brian
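
The suggestion to switch the decoder to gettimeofday could be sketched
as a small compatibility shim. This is only an illustration, not code
from NestedVM or Theora: the names compat_timeb and compat_ftime are
hypothetical, and it assumes the toolchain's C library provides
gettimeofday (which the follow-up correction in this thread indicates).

```c
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical stand-in for struct timeb from <sys/timeb.h>;
 * only the two fields a simple timing loop typically needs. */
struct compat_timeb {
    long time;              /* whole seconds since the epoch */
    unsigned short millitm; /* milliseconds past 'time' (0..999) */
};

/* Hypothetical replacement for ftime(), built on gettimeofday().
 * Returns 0 on success and -1 on failure. */
static int compat_ftime(struct compat_timeb *tb)
{
    struct timeval tv;

    if (tb == NULL || gettimeofday(&tv, NULL) != 0)
        return -1;

    tb->time = (long)tv.tv_sec;
    /* tv_usec is in microseconds; convert to milliseconds. */
    tb->millitm = (unsigned short)(tv.tv_usec / 1000);
    return 0;
}
```

In the example decoder, each ftime call would then be replaced by
compat_ftime, with the struct timeb variable changed to match. This has
not been tested against the Theora sources.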
Actually this is the function ftime, not the command. We just don't
implement that function.
-Brian
Err.. make that gettimeofday.
-Brian
I got the example to compile now by stripping out the ftime calls in
the source code.
Toying around with the gcc compiler settings may squeeze a bit more
performance out of it. I assume that it's unlikely that NestedVM will
make some miracle performance jumps in the future? ;-)
Here are the generated files:
http://savage747.sa.funpic.de/dump_video-mips.zip
http://savage747.sa.funpic.de/dump_video-java.zip
Thanks for your help.
That is actually really good. The best we've seen is probably about 5
or 6 times slower.
> Toying around with the gcc compiler settings may squeeze a bit more
> performance out of it.
What gcc options are you using? Did you try the ones in the NestedVM
makefile? They have given the best possible performance for us.
I'm glad NestedVM has worked so well for you.
-Brian
The generated Makefiles contain
CFLAGS = -Wall -O3 -fforce-addr -fomit-frame-pointer \
  -finline-functions -funroll-loops -O3 -mmemcpy -ffunction-sections \
  -fdata-sections -falign-functions=512 -fno-rename-registers \
  -fno-schedule-insns -fno-delayed-branch -freduce-all-givs
I assume that optimization features leading to bigger binaries may
lead to more usage of the trampoline? It may be worth trying to omit
-finline-functions and -funroll-loops and let the JIT take care of
this ;)
> I'm glad NestedVM has worked so well for you.
I'm pretty impressed with NestedVM. So far everything I could compile
with the provided toolchain has worked flawlessly.