Compiler optimisation for RPi 3.

Andrew Back

Jan 24, 2019, 5:06:24 AM
to gr-gsm
Hello,

I'm running grgsm_scanner on a Raspberry Pi 3 and attempting to optimise it for best performance. CPU usage spends a lot of the time around 400%, but building from source with the following flags seems to have helped:

gnuradio and gr-gsm:

cmake -DCMAKE_C_FLAGS="-mcpu=cortex-a53 -funsafe-math-optimizations -O2" -DCMAKE_CXX_FLAGS="-mcpu=cortex-a53 -funsafe-math-optimizations -O2" ../

libosmocore:

./configure CFLAGS="-mcpu=cortex-a53 -funsafe-math-optimizations -O2" CPPFLAGS="-mcpu=cortex-a53 -funsafe-math-optimizations -O2"

However, I'm not sure if:

1. Those are actually the optimal compiler flags
2. gr-gsm uses VOLK at all (in GR core blocks used or the OOT blocks)
3. gr-gsm can otherwise be optimised to use e.g. NEON (as with OsmoTRX)

Would be good to know if any further optimisation can easily be done (don't care if builds won't run on other CPUs). Also where the heavy lifting is being done, so that if we want to further optimise we know where efforts should be focused.

Also, running volk_profile seemed to pick generic machine for pretty much everything and no mention of NEON, which I thought I might see in the output.
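
As a cross-check on the NEON point, here is a minimal sketch that reads the SIMD feature flags the kernel reports. It assumes the Linux /proc/cpuinfo layout: 32-bit ARM kernels list "neon", while aarch64 kernels list "asimd" (Advanced SIMD) instead, so a search for the word "neon" alone can come up empty on a 64-bit system. The function name simd_flags is made up for illustration.

```shell
#!/bin/sh
# Sketch: report which ARM SIMD feature flags the kernel advertises.
# Assumption: Linux /proc/cpuinfo with a "Features" line, as on ARM kernels.
simd_flags() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # Take the first "Features" line and keep only the whole words
    # "neon" (32-bit ARM) or "asimd" (aarch64), joined by spaces.
    grep -m1 -i '^Features' "$cpuinfo" 2>/dev/null \
        | grep -o -w -E 'neon|asimd' \
        | paste -sd ' ' -
}

flags="$(simd_flags)"
echo "SIMD flags: ${flags:-none found}"
```

On a non-ARM machine this simply prints "none found", since there is no "Features" line to read.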

Finally, this is on Debian/aarch64, but going to try with optimised source builds on stock 32-bit Raspbian also.

Regards,

Andrew

Vasil Velichkov

Jan 30, 2019, 6:34:39 AM
to gr-gsm
Hi Andrew,

On 24/01/2019 12.06, Andrew Back wrote:
> Running grgsm_scanner on a Raspberry Pi 3 and attempting to optimise
> this for best performance.

May I ask why you want to optimize grgsm_scanner? Do you
experience overflows or any other problems when running it on the RPi 3?

> CPU is spending a lot of time around 400%, but seems to have helped by
> building from source with:
>
> gnuradio and gr-gsm:
>
> cmake -DCMAKE_C_FLAGS="-mcpu=cortex-a53 -funsafe-math-optimizations -O2"
> -DCMAKE_CXX_FLAGS="-mcpu=cortex-a53 -funsafe-math-optimizations -O2" ../
>
> libosmocore:
>
> ./configure CFLAGS="-mcpu=cortex-a53 -funsafe-math-optimizations -O2"
> CPPFLAGS="-mcpu=cortex-a53 -funsafe-math-optimizations -O2"
>
> However, I'm not sure if:
>
> 1. Those are actually the optimal compiler flags
I don't have any experience with RPi or other ARM based platforms so I
don't know which flags are optimal.
> 2. gr-gsm uses VOLK at all (in GR core blocks used or the OOT blocks)
The scanner uses some GR core blocks like pfb.channelizer_ccf but right
now I don't know how to check whether VOLK is supported and enabled.
> 3. gr-gsm can otherwise be optimised to use e.g. NEON (as with OsmoTRX)
If it is possible for OsmoTRX then it should be possible for gr-gsm and
gnuradio as well.
> Would be good to know if any further optimisation can easily be done (don't care if builds won't run on other CPUs). Also where the heavy lifting is being done, so that if we want to further optimise we know where efforts should be focused.

Although I haven't done any profiling, in my opinion the two blocks you
could try to profile and optimize first are the "GSM Input Adaptor" and
"GSM Receiver".
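
A rough way to see where the CPU time goes, before reaching for a real profiler, is to look at per-thread CPU time of the running scanner. The sketch below assumes a Linux /proc filesystem and that GNU Radio runs each block in its own thread named after the block (which, as far as I know, it does, truncated to 15 characters). The function name thread_cpu_ticks and the optional second argument (an alternative /proc root, only there to make testing easy) are hypothetical.

```shell
#!/bin/sh
# Sketch: list the threads of a process sorted by CPU time consumed.
# Output is "<utime+stime in clock ticks> <thread name>", busiest first.
thread_cpu_ticks() {
    pid="$1"
    procroot="${2:-/proc}"
    for task in "$procroot/$pid"/task/*; do
        [ -r "$task/stat" ] || continue
        name="$(cat "$task/comm")"
        # Drop "pid (comm) " so the remaining fields are stable even if the
        # name contains spaces; utime and stime are then fields 12 and 13.
        ticks="$(awk '{ sub(/^.*\) /, ""); print $12 + $13 }' "$task/stat")"
        echo "$ticks $name"
    done | sort -rn
}

# Example (while the scanner is running):
#   thread_cpu_ticks "$(pgrep -o -f grgsm_scanner)" | head
```

The threads at the top of the list are the ones worth profiling first; `top -H -p <pid>` gives a similar live view.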

> Also, running volk_profile seemed to pick generic machine for pretty much everything and no mention of NEON, which I thought I might see in the output.
>
> Finally, this is on Debian/aarch64, but going to try with optimised source builds on stock 32-bit Raspbian also.
And please keep us posted on your progress.

Cheers,
Vasil

Andrew Back

Jan 30, 2019, 6:53:05 AM
to Vasil Velichkov, gr-gsm
Hi Vasil,

On Wed, 30 Jan 2019 at 11:34, Vasil Velichkov <vvvel...@gmail.com> wrote:
>
> Hi Andrew,
>
> On 24/01/2019 12.06, Andrew Back wrote:
> > Running grgsm_scanner on a Raspberry Pi 3 and attempting to optimise
> > this for best performance.
>
> May I ask you why do you want to optimize grgsm_scanner? Do you
> experience Overflows or any other problem when running it on RPi3?

It consumes ~400% CPU a lot of the time, and each run gives slightly
different results: different ARFCNs are detected, sometimes with
different parameters extracted. It certainly appears as though the RPi
struggles to keep up.

> > CPU is spending a lot of time around 400%, but seems to have helped by
> > building from source with:
> >
> > gnuradio and gr-gsm:
> >
> > cmake -DCMAKE_C_FLAGS="-mcpu=cortex-a53 -funsafe-math-optimizations -O2"
> > -DCMAKE_CXX_FLAGS="-mcpu=cortex-a53 -funsafe-math-optimizations -O2" ../
> >
> > libosmocore:
> >
> > ./configure CFLAGS="-mcpu=cortex-a53 -funsafe-math-optimizations -O2"
> > CPPFLAGS="-mcpu=cortex-a53 -funsafe-math-optimizations -O2"
> >
> > However, I'm not sure if:
> >
> > 1. Those are actually the optimal compiler flags
> I don't have any experience with RPi or other ARM based platforms so I
> don't know which flags are optimal.
> > 2. gr-gsm uses VOLK at all (in GR core blocks used or the OOT blocks)
> The scanner uses some GR core blocks like pfb.channelizer_ccf but right
> now I don't know how to check whether VOLK is supported and enabled.

OK, thanks.

> > 3. gr-gsm can otherwise be optimised to use e.g. NEON (as with OsmoTRX)
> If it is possible for OsmoTRX then it should be possible for gr-gsm and
> gnuradio as well.
> > Would be good to know if any further optimisation can easily be done (don't care if builds won't run on other CPUs). Also where the heavy lifting is being done, so that if we want to further optimise we know where efforts should be focused.
>
> Although I haven't done any profiling in my opinion the two blocks you
> could try to profile and optimize first are the "GSM Input Adaptor" and
> "GSM Receiver".

Thanks, this is useful to know.

> > Also, running volk_profile seemed to pick generic machine for pretty much everything and no mention of NEON, which I thought I might see in the output.
> >
> > Finally, this is on Debian/aarch64, but going to try with optimised source builds on stock 32-bit Raspbian also.
> And please keep us posted on your progress.

The aforementioned compiler flags helped a lot on aarch64 and the
results were much improved, but it still appeared CPU bound and not
100% reliable. Next I'm going to try some different optimisations on
32-bit Raspbian.

I will let you know how I get on.

Cheers,

Andrew

Vasil Velichkov

Jan 30, 2019, 9:35:43 AM
to Andrew Back, gr-gsm
Hi Andrew,

On 30/01/2019 13.52, Andrew Back wrote:
> Hi Vasil,
>
> On Wed, 30 Jan 2019 at 11:34, Vasil Velichkov <vvvel...@gmail.com> wrote:
>> Hi Andrew,
>>
>> On 24/01/2019 12.06, Andrew Back wrote:
>>> Running grgsm_scanner on a Raspberry Pi 3 and attempting to optimise
>>> this for best performance.
>> May I ask you why do you want to optimize grgsm_scanner? Do you
>> experience Overflows or any other problem when running it on RPi3?
> It consumes ~400% CPU a lot of the time and with each run you get
> slightly different results
> — different ARFCNs detected and sometimes with different parameters
> extracted.

There are some known problems not directly related to the performance,
see https://github.com/ptrkrysik/gr-gsm/issues/421

Cinaed Simson

Feb 2, 2019, 1:54:57 PM
to gr-...@googlegroups.com
On 1/30/19 3:33 AM, Vasil Velichkov wrote:
> Hi Andrew,
>
> On 24/01/2019 12.06, Andrew Back wrote:
>> Running grgsm_scanner on a Raspberry Pi 3 and attempting to optimise
>> this for best performance.
>
> May I ask you why do you want to optimize grgsm_scanner? Do you
> experience Overflows or any other problem when running it on RPi3?
>
>> CPU is spending a lot of time around 400%, but seems to have helped by
>> building from source with:
>>
>> gnuradio and gr-gsm:

It depends upon how you obtained the source for gnuradio - volk is a
package which is separate from gnuradio.

But you can check the ./volk subdirectory of gnuradio and see if there's
anything in it.

Or check to see if you have

volk_profile

installed in your path.

If you do, then run it - it will probably take a while to run on the
RPi 3 - make sure the machine is idle.
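
The checks above could be scripted roughly as follows. This is a sketch under a few assumptions: that volk_profile stores its results in ~/.volk/volk_config with lines of the form "<kernel> <aligned_impl> <unaligned_impl>", and that the volk-config-info tool shipped with VOLK accepts --avail-machines. The function name volk_impl_summary is made up for illustration.

```shell
#!/bin/sh
# Sketch: check that the VOLK tools are installed and summarise which
# implementations volk_profile selected.

volk_impl_summary() {
    # Assumed location and format of the profile results (see lead-in).
    config="${1:-$HOME/.volk/volk_config}"
    # Skip comment lines; column 2 is taken to be the aligned implementation.
    awk '!/^#/ && NF >= 2 { count[$2]++ }
         END { for (impl in count) print count[impl], impl }' "$config" \
        | sort -rn
}

if command -v volk_profile >/dev/null 2>&1; then
    echo "volk_profile: $(command -v volk_profile)"
else
    echo "volk_profile not found in PATH"
fi

# List the SIMD "machines" this VOLK build can use on this CPU, if available.
if command -v volk-config-info >/dev/null 2>&1; then
    volk-config-info --avail-machines
fi

if [ -f "$HOME/.volk/volk_config" ]; then
    volk_impl_summary
fi
```

If nearly every kernel is counted under "generic", that matches what Andrew saw and suggests the NEON kernels were either not built or not detected as usable.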

Ren Lie

Mar 18, 2019, 8:35:30 AM
to gr-gsm
Hi Andrew,

I tried to build GNU Radio on a Raspberry Pi 3B v1.2, but ran into some trouble.

Can you help me?

The GNU Radio version is 3.7.12.

The VOLK version is 1.3.1.

---------------

cmake -DCMAKE_C_FLAGS="-mcpu=cortex-a53 -funsafe-math-optimizations -O2"
-DCMAKE_CXX_FLAGS="-mcpu=cortex-a53 -funsafe-math-optimizations -O2" ../

=>Success

make 
=>Error

/root/gnuradio-3.7.12.0/volk/kernels/volk/asm/neon/volk_32f_x2_dot_prod_32f_neonasm_opts.s: Assembler messages:
/root/gnuradio-3.7.12.0/volk/kernels/volk/asm/neon/volk_32f_x2_dot_prod_32f_neonasm_opts.s:46: Error: selected processor does not support `sbfx r11,r1,#2,#1' in ARM mode
volk/lib/CMakeFiles/volk_obj.dir/build.make:1462: recipe for target 'volk/lib/CMakeFiles/volk_obj.dir/__/kernels/volk/asm/neon/volk_32f_x2_dot_prod_32f_neonasm_opts.s.o' failed
make[2]: *** [volk/lib/CMakeFiles/volk_obj.dir/__/kernels/volk/asm/neon/volk_32f_x2_dot_prod_32f_neonasm_opts.s.o] Error 1
CMakeFiles/Makefile2:252: recipe for target 'volk/lib/CMakeFiles/volk_obj.dir/all' failed
make[1]: *** [volk/lib/CMakeFiles/volk_obj.dir/all] Error 2
Makefile:160: recipe for target 'all' failed
make: *** [all] Error 2
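
One possible cause, not confirmed anywhere in this thread, is that the hand-written .s files are assembled with CMAKE_ASM_FLAGS rather than CMAKE_C_FLAGS, so the -mcpu=cortex-a53 setting never reaches the assembler and it falls back to the toolchain's default architecture, which may lack `sbfx`. The sketch below prints a configure command that passes the same CPU selection to the assembler as well. The FPU and float-ABI flags are assumptions for a 32-bit hard-float (armhf) system, and the function name gnuradio_cmake_cmd is hypothetical. It prints the command rather than running it, so it can be reviewed first.

```shell
#!/bin/sh
# Sketch: build a cmake command line that also sets CMAKE_ASM_FLAGS.
# Assumption: 32-bit armhf userland on a Cortex-A53 (Raspberry Pi 3).
ARCH_FLAGS="-mcpu=cortex-a53 -mfpu=neon-fp-armv8 -mfloat-abi=hard"
OPT_FLAGS="-funsafe-math-optimizations -O2"

gnuradio_cmake_cmd() {
    # C and C++ get architecture plus optimisation flags; the assembler
    # only needs the architecture flags.
    printf 'cmake -DCMAKE_C_FLAGS="%s %s" -DCMAKE_CXX_FLAGS="%s %s" -DCMAKE_ASM_FLAGS="%s" ../\n' \
        "$ARCH_FLAGS" "$OPT_FLAGS" "$ARCH_FLAGS" "$OPT_FLAGS" "$ARCH_FLAGS"
}

gnuradio_cmake_cmd
```

It is probably safest to run the printed command in a clean build directory, since cached values from the earlier configure run may otherwise persist.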



Regards,

Ren Lie