On Tue, Oct 2, 2012 at 1:57 PM, amix <mixich....@gmail.com> wrote:
> On Monday, October 1, 2012 7:34:50 PM UTC+2, Jon Smirl wrote:
>>
>> Doesn't bruteFIR use libfftw? If so libfftw already has ASM level NEON
>> support in it.
>
>
> I didn't know this. Last information I had was, that the BruteFIR author did
> not want to support ARM and in the docs he wrote about asm routines, so I
> combined the two.
Looks like there are two pieces of ASM, one in libfftw and one in
brutefir. libfftw has good ARM NEON code. brutefir probably needs some
work.
Google around for implementations. One paper on filtering I looked at
said hand-optimized NEON would be good for about 2x over what the
compiler generates.
Implementing this in hand ASM is not that hard. Let the compiler
generate an assembly listing from the C code. Then switch the build
over to use that ASM file. Start making small improvements to the ASM
code. In general a decent amount of trial and error is required to
make the fastest code. You will think some optimizations are going to
be faster and they end up being slower.
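As a starting point for that workflow, a plain C loop like the one below
is the kind of thing you would feed to the compiler. This is only a
sketch: fir_direct is a made-up name, it is a simple time-domain FIR
rather than BruteFIR's actual routine, and the gcc flags in the comment
are assumptions for an ARM toolchain.

```c
#include <stddef.h>

/*
 * Hypothetical direct-form FIR kernel (not BruteFIR code).
 * Computes out[n] = sum over k of taps[k] * in[n + ntaps - 1 - k].
 * `in` must hold nout + ntaps - 1 samples.
 *
 * Assumed command to get an assembly listing to start editing:
 *   gcc -O3 -mfpu=neon -S fir.c -o fir.s
 */
void fir_direct(const float *restrict in, const float *restrict taps,
                float *restrict out, size_t nout, size_t ntaps)
{
    for (size_t n = 0; n < nout; n++) {
        float acc = 0.0f;
        /* Inner multiply-accumulate loop: the part worth hand-tuning. */
        for (size_t k = 0; k < ntaps; k++)
            acc += taps[k] * in[n + ntaps - 1 - k];
        out[n] = acc;
    }
}
```

Keep the C version around as a reference so you can compare its output
against each ASM change.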
>
>>
>> I suspect the Cedar mpeg encode/decode hardware is capable of doing
>> FFTs much faster than the NEON unit. But we'd have to beg until we can
>> get some documentation for it.
>
>
> Never heard about this hardware. Do you think, that the A10 will be too slow
> for DRC?
The Cedar hardware is inside the A10 chip. It is used for video
processing. After you finish playing around with NEON, that is the next
place to look for speed gains.
The speed gains translate into more points in the FIR filter. The more
points, the smoother it gets.
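To see why speed limits the point count, here is a rough cost estimate
for a plain time-domain FIR. It is a sketch only: fir_macs_per_second is
a made-up helper, and FFT-based convolution (what libfftw is used for)
grows much more slowly than this.

```c
#include <stdint.h>

/*
 * Hypothetical helper: multiply-accumulates per second needed by a
 * time-domain FIR, which does one MAC per tap per output sample.
 */
uint64_t fir_macs_per_second(uint64_t sample_rate_hz, uint64_t ntaps,
                             uint64_t channels)
{
    return sample_rate_hz * ntaps * channels;
}
```

For 44.1 kHz stereo and 65536 taps this comes to about 5.8 billion MACs
per second, so every bit of extra throughput buys more points.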
koonlab.com is not responding right now. If it comes back up, it shows
how to do million-point DRC on a GPU.
> Actually, as far as I know, the most important thing seems to be
> the memory speed (and size?). I would go x86, but the idea, to have a little
The limit is FPU speed first, then memory architecture. For example, an
identically clocked PPC will beat ARM by about 30% because it has a
better memory architecture.
x86 wins for two reasons - it is running at 4 GHz, and it has SSE. A
GPU can do about 20x what a CPU can.
> Mele A2000 connected to my amplifier/speakers, install Linux with BruteFIR
> onto it, plug in a hard-disk with FLAC and have it serve my audio-needs is
> very nice. I even want to go further and think about connecting the
> CubieBoard to the chip of my DAC (Buffalo-II, ESS Sabre), doing something
It is interesting to work with an all digital system and use a PWM amp
with I2S input. TAS5508 is an example of this.
> like the HifiDuino, with the CubieBoard instead of an Arduino. Not sure,
> whether this is realistic. It would mean, I'd have to route the input signal
> through the CubieBoard, first, apply the convolution, and feed it into the
> DAC.
>
> Greetings, Andreas
>
--
Jon Smirl
jons...@gmail.com