This command-line option forces the ARMv5TE IDCT (useful for ARM9E and old XScale
cores without IWMMXT support). The ARMv6 IDCT can be enabled using
'-lavdopts idct=17'; it may work better.
> On a nokia n800 (300MHz omap2420):
AFAIK N800 runs at 330MHz with OS2007 and at up to 400MHz with OS2008.
In order to disable frequency scaling in OS2008 and keep it running
at 400MHz for more reliable results, you can use:
# echo null > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
# echo 0 > /sys/power/op_active
>
> BENCHMARKs: VC: 122.543s VO: 0.162s A: 0.000s Sys: 1.416s = 124.120s
>
> So it can decode the complete video in ~2 minutes. The beagle:
>
> BENCHMARKs: VC: 193.856s VO: 0.153s A: 0.000s Sys: 2.718s = 196.727s
>
> Wow! That's a *lot* slower than nokia n800. A CPU with twice the
> megahertz is 50% slower!
From the Cortex-A8 TRM, instruction cycle timing:
Halfword: SMULxx and SMLAxx - 2 cycles
but dual halfword: SMUAD, SMUSD - 1 cycle
The ARMv5TE IDCT heavily uses SMULxx and SMLAxx instructions, which take 1 cycle on
ARM9E, ARM11 and XScale.
Anyway, I suspect that the best results can be obtained when using NEON SIMD
optimizations :)
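Wall-clock totals hide the clock difference the quoted exchange is about. As a rough cross-check, the two BENCHMARKs totals can be turned into approximate cycle counts (seconds times MHz). A minimal Python sketch, assuming the BeagleBoard ran at the 381 MHz ARM clock reported later in this thread and trying both N800 clocks mentioned above (330 MHz under OS2007, up to 400 MHz under OS2008), since the thread does not say which applied. The helper name megacycles is illustrative, and the estimate lumps cache misses, memory stalls and OS overhead into the cycle count.

```python
# Rough per-clock comparison of the two mplayer -benchmark totals quoted above.
# Assumption: elapsed seconds * clock MHz approximates millions of CPU cycles
# (stalls, cache misses and OS overhead are all folded in).


def megacycles(total_seconds: float, clock_mhz: float) -> float:
    """Return the approximate number of cycles, in millions, spent on a run."""
    return total_seconds * clock_mhz


N800_TOTAL_S = 124.120    # OMAP2420 (N800) total from its BENCHMARKs line
BEAGLE_TOTAL_S = 196.727  # OMAP3530 (BeagleBoard) total with idct=16
BEAGLE_MHZ = 381.0        # ARM core clock reported for the board

# The N800 clock during the test is not stated: 330 MHz (OS2007) or 400 MHz (OS2008).
for n800_mhz in (330.0, 400.0):
    ratio = megacycles(BEAGLE_TOTAL_S, BEAGLE_MHZ) / megacycles(N800_TOTAL_S, n800_mhz)
    print(f"N800 at {n800_mhz:.0f} MHz: Cortex-A8 run takes {ratio:.2f}x the cycles")
```

It prints about 1.83x (N800 at 330 MHz) and 1.51x (N800 at 400 MHz): the Cortex-A8 build spends noticeably more cycles on the same decode, which fits the per-instruction costs listed above but does not by itself prove they are the cause.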
>
> The mplayer used is the one from https://garage.maemo.org/projects/mplayer/
> because that has armv6 simd and armv6 vfp optimizations.
>
> The CFLAGS used:
>
> -march=armv7-a -mtune=cortex-a8 -mfpu=vfp -mfloat-abi=softfp -fexpensive-optimizations -ftree-vectorize -fomit-frame-pointer -O4 -ffast-math
>
> I wondered why that is and got a hint from this:
>
> "Clocking rate (Crystal/DPLL/ARM core): 26.0/266/381 MHz"
>
> So the cpu is not running at 600MHz, but at 381MHz, is that expected?
> But even at 381 MHz it should be faster than an omap2.
>
> Does anyone have some idea and/or hints on this? I'll try running the
> test-idct and test-unquantize programs later this week.
That would also be interesting. I'm especially interested in 'test-vfp',
because, judging by the TRM, VFP also seems to have got a major slowdown on
Cortex-A8.
But Cortex-A9 claims to double VFP performance compared with the previous
generation :)
--
Best regards,
Siarhei Siamashka
On 21 Apr 2008, at 02:00, Siarhei Siamashka wrote:
> On Sunday 20 April 2008, koen wrote:
>> I'm trying to do some tests to see how the cortex-a8 performs with
>> video and I'm getting very strange results with mplayer:
>>
>> The test:
>>
>> # wget http://samples.mplayerhq.hu/benchmark/testsuite1/matrixbench_normdivx_vbrmp3.avi
>> # mplayer -nosound -vo null -quiet -benchmark -loop 12 -lavdopts idct=16 matrixbench_normdivx_vbrmp3.avi | grep BENCHMARK
>
> This command line option forces ARMv5TE IDCT (useful for ARM9E and
> old XScale cores without IWMMXT support). ARMv6 IDCT can be enabled using
> '-lavdopts idct=17', it may work better.
with idct=17:
BENCHMARKs: VC: 186.421s VO: 0.143s A: 0.000s Sys: 2.025s = 188.588s
BENCHMARK%: VC: 98.8504% VO: 0.0760% A: 0.0000% Sys: 1.0736% = 100.0000%
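When comparing runs such as idct=16 and idct=17, it can help to parse the BENCHMARK lines instead of eyeballing them. A minimal Python sketch, assuming the line format shown in this thread (a component name, a colon, a value with an "s" or "%" suffix, then "= total"); parse_benchmark is a hypothetical helper, not part of MPlayer.

```python
from __future__ import annotations

import re

# Per-component fields such as "VC: 186.421s" or "Sys: 1.0736%".
FIELD_RE = re.compile(r"\b(VC|VO|A|Sys): ([\d.]+)[s%]")
# The trailing "= 188.588s" (or "= 100.0000%") total.
TOTAL_RE = re.compile(r"= ([\d.]+)[s%]\s*$")


def parse_benchmark(line: str) -> tuple[dict[str, float], float]:
    """Split one BENCHMARKs/BENCHMARK% line into per-component values and the total."""
    fields = {name: float(value) for name, value in FIELD_RE.findall(line)}
    total_match = TOTAL_RE.search(line)
    if total_match is None:
        raise ValueError(f"no total found in: {line!r}")
    return fields, float(total_match.group(1))


# The two BeagleBoard runs from this thread.
IDCT16 = "BENCHMARKs: VC: 193.856s VO: 0.153s A: 0.000s Sys: 2.718s = 196.727s"
IDCT17 = "BENCHMARKs: VC: 186.421s VO: 0.143s A: 0.000s Sys: 2.025s = 188.588s"

vc16 = parse_benchmark(IDCT16)[0]["VC"]
vc17 = parse_benchmark(IDCT17)[0]["VC"]
print(f"idct=17 cuts decode (VC) time by {100 * (vc16 - vc17) / vc16:.1f}%")
```

With the two quoted runs it reports a reduction of about 3.8% in VC time for the ARMv6 IDCT.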
> That would be also interesting. I'm especially interested in 'test-vfp',
> because looking at TRM, seems like VFP also got a major slowdown on
> Cortex-A8.
root@beagleboard:~/test# ./test-vfp --freq=$(dmesg | grep MHz | grep ARM | awk -F/ '{print $5}' | awk '{print $1}')
Function: 'vector_fmul_vfp', time=123.040
Function: 'vector_fmul_reverse_vfp', time=116.570
Function: 'float_to_int16_vfp', time=143.864
Function: 'ff_float_to_int16_c', time=38.269
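The --freq=$(...) pipeline above pulls the ARM core clock out of the kernel log: split the matching line on "/", take the fifth field ("381 MHz"), and keep its first word. A Python equivalent, assuming the log line has exactly the form quoted earlier in the thread ("Clocking rate (Crystal/DPLL/ARM core): 26.0/266/381 MHz"); other kernels may word it differently, in which case the hypothetical helper arm_core_mhz returns None.

```python
import re
import subprocess
from typing import Optional

# Assumed log format, e.g. "Clocking rate (Crystal/DPLL/ARM core): 26.0/266/381 MHz"
CLOCK_RE = re.compile(
    r"Clocking rate \(Crystal/DPLL/ARM core\): ([\d.]+)/([\d.]+)/([\d.]+) MHz"
)


def arm_core_mhz(log_text: str) -> Optional[float]:
    """Return the ARM core clock in MHz (third value) from kernel log text, or None."""
    match = CLOCK_RE.search(log_text)
    return float(match.group(3)) if match else None


def read_dmesg() -> str:
    """Return the kernel log, or an empty string if dmesg cannot be run."""
    try:
        return subprocess.run(["dmesg"], capture_output=True, text=True).stdout
    except OSError:
        return ""


if __name__ == "__main__":
    mhz = arm_core_mhz(read_dmesg())
    # Same value the shell pipeline would pass to --freq (e.g. "--freq=381").
    print(f"--freq={mhz:g}" if mhz is not None else "clock line not found")
```

Matching the whole line with a regular expression, rather than counting "/" separators, makes it fail loudly (None) instead of silently returning the wrong field if the message format changes.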
root@beagleboard:~/test# ./test-unquantize --freq=$(dmesg | grep MHz | grep ARM | awk -F/ '{print $5}' | awk '{print $1}')
no cpu clock frequency specified, trying to autodetect it...
... detected as 469.6MHz
running correctness tests...
running performance tests...
dct_unquantize_h263_helper_c time=0.05625 usec per element, or 26.4 cycles (469.6MHz)
dct_unquantize_h263_special_helper_armv5te time=0.01772 usec per element, or 8.3 cycles (469.6MHz)
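test-unquantize reports both a per-element time and a cycle count; the cycle count is the time multiplied by the clock rate in MHz, since 1 MHz means one cycle per microsecond. A small Python sketch of that conversion (cycles_per_element is an illustrative name), reproducing the two figures above from the autodetected 469.6 MHz. That autodetected value differs from the 381 MHz ARM clock reported elsewhere in the thread, so the cycle counts are only as accurate as the frequency used; the sketch also shows what the same times would mean at 381 MHz.

```python
def cycles_per_element(usec_per_element: float, clock_mhz: float) -> float:
    """Convert a per-element time in microseconds to CPU cycles (1 MHz = 1 cycle/usec)."""
    return usec_per_element * clock_mhz


# Per-element times printed by test-unquantize above.
SAMPLES = (
    ("dct_unquantize_h263_helper_c", 0.05625),
    ("dct_unquantize_h263_special_helper_armv5te", 0.01772),
)

# 469.6 MHz was autodetected by the tool; 381 MHz is the clock reported for the board.
for clock_mhz in (469.6, 381.0):
    for name, usec in SAMPLES:
        print(f"{clock_mhz:6.1f} MHz  {name}: {cycles_per_element(usec, clock_mhz):.1f} cycles")
```

At 469.6 MHz this gives 26.4 and 8.3 cycles, matching the tool; at 381 MHz the same times would correspond to about 21.4 and 6.8 cycles.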
root@beagleboard:~/test# ./test-idct --freq=$(dmesg | grep MHz | grep ARM | awk -F/ '{print $5}' | awk '{print $1}') --enable-armv6
avg=-0.08, stddev=36.96, min=-168.00, max=149.00
Assuming cpu clock frequency 381MHz (ARMv6 enabled)
Please be patient and wait for the results, test requires quite a lot of time to run...
correctness tests passed
--- benchmarking with zero idct coefficients ---
simple_idct_armv5te time=535.2
simple_idct_put_armv5te cache=no, time=668.3
simple_idct_put_armv5te cache=yes, time=662.9
simple_idct_add_armv5te cache=no, time=890.5
simple_idct_add_armv5te cache=yes, time=744.9
simple_idct_armv5te_ref time=935.8
simple_idct_put_armv5te_ref cache=no, time=1190.6
simple_idct_put_armv5te_ref cache=yes, time=1171.2
simple_idct_add_armv5te_ref cache=no, time=1372.2
simple_idct_add_armv5te_ref cache=yes, time=1229.4
simple_idct_armv6 time=665.1
simple_idct_put_armv6 cache=no, time=934.0
simple_idct_put_armv6 cache=yes, time=754.6
simple_idct_add_armv6 cache=no, time=999.4
simple_idct_add_armv6 cache=yes, time=854.8
--- benchmarking with random idct coefficients ---
simple_idct_armv5te time=1235.1
simple_idct_put_armv5te cache=no, time=1375.2
simple_idct_put_armv5te cache=yes, time=1367.0
simple_idct_add_armv5te cache=no, time=1617.9
simple_idct_add_armv5te cache=yes, time=1472.9
simple_idct_armv5te_ref time=1616.1
simple_idct_put_armv5te_ref cache=no, time=1863.3
simple_idct_put_armv5te_ref cache=yes, time=1843.0
simple_idct_add_armv5te_ref cache=no, time=2041.1
simple_idct_add_armv5te_ref cache=yes, time=1899.8
simple_idct_armv6 time=1038.1
simple_idct_put_armv6 cache=no, time=1299.8
simple_idct_put_armv6 cache=yes, time=1119.5
simple_idct_add_armv6 cache=no, time=1383.3
simple_idct_add_armv6 cache=yes, time=1234.1
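The raw listing is easier to judge as ARMv5TE/ARMv6 ratios. A minimal Python sketch that parses the cache=yes lines and the plain IDCT lines copied from the output above and prints how much faster (ratio above 1) or slower (below 1) the ARMv6 variant is. The thread does not state the unit of "time", so only the ratios are meaningful; parse_timings and armv6_speedup are hypothetical helpers.

```python
import re

# Lines copied from the test-idct output above (cache=no variants omitted).
ZERO_COEFFS = """\
simple_idct_armv5te time=535.2
simple_idct_put_armv5te cache=yes, time=662.9
simple_idct_add_armv5te cache=yes, time=744.9
simple_idct_armv6 time=665.1
simple_idct_put_armv6 cache=yes, time=754.6
simple_idct_add_armv6 cache=yes, time=854.8
"""

RANDOM_COEFFS = """\
simple_idct_armv5te time=1235.1
simple_idct_put_armv5te cache=yes, time=1367.0
simple_idct_add_armv5te cache=yes, time=1472.9
simple_idct_armv6 time=1038.1
simple_idct_put_armv6 cache=yes, time=1119.5
simple_idct_add_armv6 cache=yes, time=1234.1
"""

# Either "name time=T" or "name cache=yes|no, time=T".
LINE_RE = re.compile(r"^(\S+)(?: cache=(?:yes|no),)? time=([\d.]+)$")


def parse_timings(text: str) -> dict:
    """Map each function name to its reported time (units as printed by test-idct)."""
    timings = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            timings[match.group(1)] = float(match.group(2))
    return timings


def armv6_speedup(timings: dict, op: str) -> float:
    """ARMv5TE time / ARMv6 time for one operation; > 1 means ARMv6 is faster."""
    return timings[f"{op}_armv5te"] / timings[f"{op}_armv6"]


for label, text in (("zero", ZERO_COEFFS), ("random", RANDOM_COEFFS)):
    timings = parse_timings(text)
    for op in ("simple_idct", "simple_idct_put", "simple_idct_add"):
        print(f"{label:6} {op:16} ARMv6 speedup: {armv6_speedup(timings, op):.2f}x")
```

With these numbers ARMv6 comes out about 1.19x to 1.22x faster with random coefficients, but only 0.80x to 0.88x (i.e. slower) with all-zero coefficients.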
regards,
Koen
You have all of these mods already on your board
Gerald
Is there a replacement for DVFlasher to install the x-loader easily?
Philip
Have you tried generating the MLO yourself? I haven't tried it yet, but
http://code.google.com/p/beagleboard/wiki/BeagleSourceCode
tells us:
-- cut --
Convert x-load.bin to MLO (required for MMC Boot)
1. Use the "SignGP" tool to sign the x-loader image. (“x-load.bin.ift”
file is generated in the same folder.)
./signGP x-load.bin
2. Rename x-load.bin.ift to MLO
-- cut --
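The signing step itself is small. A minimal Python sketch, under assumptions that are not stated in this thread: that signGP prepends a header of two little-endian 32-bit words (the image size, then the load address) to the unmodified x-load.bin, and that the default load address is 0x40200800. DEFAULT_LOAD_ADDR, sign_gp and sign_file are illustrative names; check the signGP source that ships with your tree before relying on this to boot a board.

```python
import struct
import sys

# ASSUMPTION: default OMAP3 SRAM load address used by signGP; verify against its source.
DEFAULT_LOAD_ADDR = 0x40200800


def sign_gp(image: bytes, load_addr: int = DEFAULT_LOAD_ADDR) -> bytes:
    """Prepend an assumed GP header: <size: u32 LE><load address: u32 LE>."""
    return struct.pack("<II", len(image), load_addr) + image


def sign_file(path: str) -> str:
    """Write path + '.ift' (e.g. x-load.bin -> x-load.bin.ift) and return its name."""
    with open(path, "rb") as src:
        image = src.read()
    out_path = path + ".ift"
    with open(out_path, "wb") as dst:
        dst.write(sign_gp(image))
    return out_path


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 sign_gp.py x-load.bin   (then rename x-load.bin.ift to MLO)
    print(sign_file(sys.argv[1]))
```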
X-Loader source is available via
http://elinux.org/BeagleBoard#Git
As X-Loader is a stripped-down U-Boot, its include directory links to
U-Boot. So you need a recent U-Boot with
http://groups.google.com/group/beagleboard/browse_thread/thread/3473b44af1e6e326#
on top. Have a look at omap3530beagle.h. Currently, PRCM_CLK_CFG2_266MHZ
is configured. Instead, PRCM_CLK_CFG2_332MHZ can be enabled.
I don't know how to enable L2 cache and/or other frequencies, though.
It seems that there is no provision for other (higher?) frequency
configurations in the public code yet?
Dirk
The POWER MODS I was referring to were hardware modifications, which Gerald confirmed are already in place on your boards.
1. For enabling L2 cache: I have not disabled it in X-Loader, so no X-Loader changes are needed for this. However, it is currently disabled in the kernel; to enable it, you have to deselect the option "Disable L2 Cache".
2. For running the MPU at 500 MHz: I can give out the u-boot and x-loader changes, but I am waiting for everyone to get their boards modified; otherwise it might block others. For now, I have attached the MLO and u-boot.bin for testing. Just try them out, boot the kernel and read out the MPU clock by doing:
cat /proc/omap_clocks | grep "MPU"
Regards,
Khasim