Generating Java ports of Xiph.org's Ogg codec libraries

maikmerten

Jul 2, 2007, 7:01:20 AM
to NestedVM
Hello,

I'm currently evaluating how to port some of Xiph.org's media format
decoders to Java, and I came across NestedVM, which is giving some
encouraging results despite a few problems on my side (mostly in the
MIPS binary compilation steps).

As decoding of media formats is somewhat time-critical, I tried to do
some performance evaluations. The intended purpose for the Java ports
would be player applets that decode low-bitrate and low-resolution
streaming media content. There is an Ogg Theora + Vorbis player applet
named Cortado, but the Theora decoder it uses is incomplete and may
break on future Theora bitstreams. If possible, it seems desirable to
derive a Java decoder directly from the C reference implementation.

Ogg Vorbis: I successfully converted the encoder_example and
decoder_example from libvorbis to Java classes. There is already a
pure Java port of a Vorbis decoder, so this was tested for evaluation
purposes only.

Tests were done in the following Java environment:

java version "1.6.0_01"
Java(TM) SE Runtime Environment (build 1.6.0_01-b06)
Java HotSpot(TM) Client VM (build 1.6.0_01-b06, mixed mode, sharing)


http://savage747.sa.funpic.de/encoder_example-mips.zip (MIPS binary)
http://savage747.sa.funpic.de/encoder_example-java.zip

Usage: cat input.wav | java encoder_example - > output.ogg

On my 2 GHz AMD Athlon 64 the Java version is about 16 times slower
than native x86 code. This of course is pretty slow, but as the
encoder is doing heavy floating point math this was to be expected.

http://savage747.sa.funpic.de/decoder_example-mips.zip
http://savage747.sa.funpic.de/decoder_example-java.zip

Usage: cat input.ogg | java decoder_example - > output.pcm

The decoder is roughly 15 times slower than the native x86 code. Again
I assume the floating point math is to blame.

There's also an integer-only Vorbis decoder available (called
"Tremor"). I would expect it to perform better than the normal
libvorbis decoder (tested with decoder_example).

http://savage747.sa.funpic.de/ivorbisfile_example-mips.zip
http://savage747.sa.funpic.de/ivorbisfile_example-java.zip

Testing, however, showed that Tremor is even slower than the
floating point decoder. I don't know where the problem is - most
likely I'm making wrong assumptions about what code will perform well
with NestedVM.

The really interesting thing would be to see how NestedVM would perform
on a Theora decoder. Using current Theora trunk
(http://svn.xiph.org/trunk/theora) I had no problems compiling the
libraries, but the example decoder programs didn't link:

mips-unknown-elf-gcc -Wall -O3 -fforce-addr -fomit-frame-pointer
-finline-functions -funroll-loops -O3 -mmemcpy -ffunction-sections
-fdata-sections -falign-functions=512 -fno-rename-registers
-fno-schedule-insns -fno-delayed-branch -freduce-all-givs -o dump_video
dump_video.o getopt.o getopt1.o -L/mnt/data/mips/lib
../lib/.libs/libtheora.a /mnt/data/mips/lib/libogg.a
dump_video.o(.text.main+0xe08): In function `main':
: undefined reference to `ftime'
dump_video.o(.text.main+0xe18): In function `main':
: undefined reference to `ftime'
dump_video.o(.text.main+0xe20): In function `main':
: undefined reference to `ftime'
collect2: ld returned 1 exit status
make[2]: *** [dump_video] Error 1
make[2]: Leaving directory `/tmp/theora/examples'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/tmp/theora'
make: *** [all] Error 2

The Theora codec is purely integer-based, so I assume NestedVM may
give usable performance. Even if it is about 10 times slower than
native, the result could still be usable in low-bitrate scenarios.

Does anybody have an idea how to resolve the build issue?

Thanks,

Maik Merten

David Aubin

Jul 2, 2007, 8:58:00 AM
to maikmerten, NestedVM
Looks like it can't find ftime. Check to see that it is in your path
of includes in the makefile.

http://linux.about.com/library/cmd/blcmdl3_ftime.htm

Brian Alliet

Jul 2, 2007, 9:59:50 AM
to maikmerten, NestedVM
On Mon, Jul 02, 2007 at 04:01:20AM -0700, maikmerten wrote:
> currently I'm evaluating how to port some of Xiph.org's media format
> decoders to Java and I came across NestedVM which is giving some

Interesting. Sounds like a pretty good use for NestedVM.

> dump_video.o(.text.main+0xe08): In function `main':
> : undefined reference to `ftime'

We must not implement ftime. According to my FreeBSD man page for
ftime:

"This interface is obsoleted by gettimeofday"

And we do implement ftime. You might want to see if there is some easy
way to get the decoder to use gettimeofday instead. If not yell and I
can probably implement ftime too.

-Brian

Brian Alliet

Jul 2, 2007, 10:00:28 AM
to David Aubin, maikmerten, NestedVM
On Mon, Jul 02, 2007 at 08:58:00AM -0400, David Aubin wrote:
> Looks like it can't find ftime. Check to see that it is in your path
> of includes in the makefile.

Actually this is the function ftime, not the command. We just don't
implement that function.

-Brian

Brian Alliet

Jul 2, 2007, 10:30:11 AM
to maikmerten, NestedVM
On Mon, Jul 02, 2007 at 09:59:50AM -0400, Brian Alliet wrote:
> And we do implement ftime. You might want to see if there is some easy

Err.. make that gettimeofday.

-Brian

maikmerten

Jul 2, 2007, 10:50:26 AM
to NestedVM
It seems that the lib/libc.a that is part of the NestedVM MIPS
toolchain actually has no ftime (huh? Is that even possible?).

I got the example to compile now by stripping out the ftime calls in
the source code.
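
For anyone who would rather keep the timing code than strip it, Brian's
suggestion of using gettimeofday can be sketched as a small replacement
function. This is only a sketch, assuming the NestedVM libc provides
gettimeofday as stated earlier in the thread; the names compat_timeb and
compat_ftime are made up for illustration and are not part of NestedVM
or Theora.

```c
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical stand-in for struct timeb (normally from <sys/timeb.h>). */
struct compat_timeb {
    long           time;      /* seconds since the epoch */
    unsigned short millitm;   /* milliseconds, 0..999 */
    short          timezone;  /* not filled in by this sketch */
    short          dstflag;   /* not filled in by this sketch */
};

/* Hypothetical ftime replacement built on gettimeofday.
 * Returns 0 on success and -1 on failure. */
static int compat_ftime(struct compat_timeb *tb)
{
    struct timeval tv;

    if (tb == NULL || gettimeofday(&tv, NULL) != 0)
        return -1;

    tb->time     = (long)tv.tv_sec;
    tb->millitm  = (unsigned short)(tv.tv_usec / 1000);
    tb->timezone = 0;
    tb->dstflag  = 0;
    return 0;
}
```

The ftime calls in the example source would then be changed to call
compat_ftime, with the variable's type adjusted to match. Only the
seconds and milliseconds fields are meaningful here, which should be
enough for elapsed-time measurements.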

maikmerten

Jul 2, 2007, 11:36:37 AM
to NestedVM
Okay, I have now done some testing. It seems that currently the Theora
decoder is about 6-7 times slower in NestedVM when compared to the
native x86 version (which also includes some MMX assembly). Personally
I think this is a nice result for a start, although it means that
it currently cannot decode content above 352x288 at full frame rate
(25 to 30 fps) on my system.

Toying around with the gcc compiler settings may squeeze a bit more
performance out of it. I assume that it's unlikely that NestedVM will
make some miracle performance jumps in the future? ;-)

Here are the generated files:

http://savage747.sa.funpic.de/dump_video-mips.zip
http://savage747.sa.funpic.de/dump_video-java.zip

Thanks for your help.

Brian Alliet

Jul 2, 2007, 11:45:51 AM
to maikmerten, NestedVM
On Mon, Jul 02, 2007 at 08:36:37AM -0700, maikmerten wrote:
> Okay, I have now done some testing. It seems that currently the Theora
> decoder is about 6-7 times slower in NestedVM when compared to the
> native x86 version (which also includes some MMX assembly). Personally

That is actually really good. The best we've seen is probably about 5
or 6 times slower.

> Toying around with the gcc compiler settings may squeeze a bit more
> performance out of it.

What gcc options are you using? Did you try the ones in the NestedVM
makefile? They have given the best possible performance for us.

I'm glad NestedVM has worked so well for you.

-Brian

maikmerten

Jul 2, 2007, 12:24:51 PM
to NestedVM
> What gcc options are you using? Did you try the ones in the NestedVM
> makefile? They have given the best possible performance for us.

The generated Makefiles contain

CFLAGS = -Wall -O3 -fforce-addr -fomit-frame-pointer
-finline-functions -funroll-loops -O3 -mmemcpy -ffunction-sections
-fdata-sections -falign-functions=512 -fno-rename-registers
-fno-schedule-insns -fno-delayed-branch -freduce-all-givs

I assume that optimization features leading to bigger binaries may
lead to more usage of the trampoline? It may be worth trying to omit
-finline-functions and -funroll-loops - let the JIT take care of
this ;)

> I'm glad NestedVM has worked so well for you.

I'm pretty impressed with NestedVM. So far everything I could compile
with the provided toolchain did work flawlessly.
