Application for GSoC contribution and advice requested

pradan

Feb 29, 2020, 2:13:36 PM

to BeagleBoard GSoC

Hello everyone,

I am Prashant Dandriyal, a final-year undergraduate pursuing a Bachelor of Technology in Electronics and Communication Engineering (ECE). My experience with embedded systems includes simple electro-mechanical circuits, 8-bit microcontrollers, and 32-bit ones such as the EK-TM4C123GXL Tiva LaunchPad and the CC26X2R1 development board. These boards were provided to me by Texas Instruments as part of the India Innovation Challenge and Design Contest (IICDC-2018).

I have also been drawn towards the (previously niche) field of Embedded Artificial Intelligence (Embedded AI), better known as Edge AI, of which TinyML is one subfield. I have been following the TinyML community for some time now. To learn how to run machine learning at the edge, I have completed coursework and worked with TensorFlow, TensorFlow Lite and the Intel OpenVINO toolkit, all with the aim of shifting inference to the edge. For my final-year project, I am implementing on-device learning on low-compute devices.

For GSoC 2020, I would like to contribute to the project "YOLO models on the X15/AI". As one of the mentors is Mr Hunyue Yau, I would request you all to direct me to the relevant communities where I can discuss the idea with the mentors.

I am also going to introduce myself in the #beagle-gsoc channel at riot.im and ask for help.

I would be grateful for this help, as it will enable me to finalize my project and begin preparing.

Thanking You,

Prashant Dandriyal

Hunyue Yau

Mar 2, 2020, 6:40:00 PM

to beaglebo...@googlegroups.com

Hi,

There have been a few mails on the YOLO project idea. I am trying to address
all of them in one email, on the assumption that the students have subscribed.

Please try to catch mentors on the freenode IRC in the #beagle-gsoc channel. It
looks like you have seen the elinux page. Please note the timezones for the
potential mentors at the bottom. I am often on that channel on and off between
10:30-19:00 US/Pacific time. Do hang around the channel. Most of us will look
at what has been said on the channel and respond, even if it is later, so asking
and leaving immediately will not work well.

General comments - Ideally, we would like to see the YOLO model working on the
BBAI (or x15 or even a BBB!) at a full 30fps video frame rate. Having said
that, right now I am seeing around 10 seconds per frame. This is largely due
to YOLO as implemented with the Darknet framework not taking advantage of the
hardware. There are many possible ways of working on this. Do keep in mind
GSoC is a relatively short period of time. As part of the application, there
should be a convincing explanation of why you think you can accomplish what
you propose in the time frame.
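
As a rough illustration of the gap between those two figures (30 fps wanted,
about 10 seconds per frame today), the arithmetic can be sketched as below.
The function name is made up for this example; only the two numbers come from
the message.

```python
def required_speedup(current_seconds_per_frame, target_fps):
    """Return how many times faster inference must become to reach target_fps."""
    # The per-frame budget is 1 / target_fps seconds, so the ratio
    # current / budget simplifies to current * target_fps.
    return current_seconds_per_frame * target_fps


# Figures quoted in the message: ~10 s per frame now, 30 fps desired.
budget_ms = 1000.0 / 30
speedup = required_speedup(10.0, 30)
print(f"per-frame budget: {budget_ms:.1f} ms, speedup needed: {speedup:.0f}x")
```

A factor of this size is worth keeping in mind when scoping a proposal: a
smaller model or a lower target frame rate changes the required factor
directly.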

Just to throw out potential work in this general area:
- Attempt to leverage the TIDL stuff to accelerate it. Right now, TIDL doesn't
support all the layers, so some of it will have to be done on the
accelerators and some of it on the ARM.

- Attempt to use the (any day now) updated TIDL stuff with the TensorFlow Lite
support to run the model. Jason might prefer this path.

- Attempt to use the model conversion tools in TIDL.

- Attempt to use OpenCL to accelerate things. Please note, a brute force
recompile of the OpenCL port of darknet does not work. Most likely, this is
due to the port focusing on OpenCL with a GPU instead of OpenCL with the DSP
as it is on the BBAI/x15. A preliminary debug suggests it is a memory problem
somewhere.

- Attempt to use the SGX GPU. Currently, only OpenGLES is supported on there.
This would basically be a port to use OpenGLES. Nice thing about this path is
it could be reused on the BBB too.

- It could be a combination of any of the above.
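
The accelerator/ARM split mentioned in the first bullet can be pictured as
grouping consecutive layers by where they are able to run. The sketch below is
only an illustration: the layer list and the "supported" set are invented for
the example and do not reflect what TIDL actually supports, and
partition_layers is a hypothetical helper, not a TIDL function.

```python
from itertools import groupby


def partition_layers(layers, supported):
    """Group consecutive layers into runs tagged 'accelerator' or 'arm'."""
    def device(layer):
        return "accelerator" if layer in supported else "arm"

    # groupby only merges neighbours, so layer order is preserved.
    return [(dev, list(run)) for dev, run in groupby(layers, key=device)]


# Hypothetical network and hypothetical support set (assumptions, not TIDL facts).
layers = ["conv", "maxpool", "conv", "reorg", "conv", "region"]
supported = {"conv", "maxpool"}

groups = partition_layers(layers, supported)
for dev, run in groups:
    print(dev, run)

# Every boundary between groups is a hand-off between processors.
print("hand-offs:", len(groups) - 1)
```

Each hand-off implies moving intermediate data between processors, so a split
with fewer, longer runs is generally preferable to one that alternates often.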

Caveats -
YOLOv3 may not fit on the device. YOLOv2 (or even YOLOv2-tiny) may be a more
feasible approach.

Part of this is performance; it would be good to identify what frame size is
being targeted. I have found 320x240 to be convenient as that's common to many
webcams.

--
Hunyue Yau
http://www.hy-research.com/

pradan

Mar 3, 2020, 6:21:19 AM

to BeagleBoard GSoC

Thank you so much Sir. I have some queries but I'll ask them in the GSoC channel at your prescribed time.
I hope that's ok.

pradan

Mar 25, 2020, 4:42:00 PM

to BeagleBoard GSoC

@H Sir, I am working on your suggestion and installing TIDL to validate my assumptions about the configuration file. In the meantime, please go through my updated application, which mentions the demo I was talking about: https://elinux.org/BeagleBoard/GSoC/2020Proposal/PrashantDandriyal_2 I have described it in detail in my repo: https://github.com/PrashantDandriyal/GSoC2020_YOLOModelsOnTheBB_AI/blob/master/README.md

Hunyue Yau

Mar 25, 2020, 5:42:27 PM

to beaglebo...@googlegroups.com, pradan

Hi,

It may be useful if you expand/clarify the benefits. You speak of Edge devices
but with AI, there are 2 halves: training and inference. Up until this point,
most other things on the Beagle (bone/boards) are pretty much the same as on a
desktop, i.e. you can natively compile things. However, this symmetry isn't
there with AI.

Also, a big thing with the YOLO models is the ability to locate and identify
at the same time without doing iterative things like a sliding window to
search. This couples nicely with limited compute power like on the x15/AI
platforms. Getting this model to work would make locate/identify more viable.
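
To give a feel for what the sliding-window search costs, the sketch below
counts how many crops a window-based detector would have to classify on one
320x240 frame (the frame size suggested earlier in this thread). The window
sizes and stride are assumptions picked for illustration; a YOLO-style model
replaces all of these classifier runs with a single network pass per frame.

```python
def sliding_window_count(frame_w, frame_h, win_w, win_h, stride):
    """Number of window positions a sliding-window detector must classify."""
    if win_w > frame_w or win_h > frame_h:
        return 0
    cols = (frame_w - win_w) // stride + 1
    rows = (frame_h - win_h) // stride + 1
    return cols * rows


# Assumed search settings: three square window sizes, 16-pixel stride.
frame_w, frame_h, stride = 320, 240, 16
window_sizes = [32, 64, 128]

total = sum(
    sliding_window_count(frame_w, frame_h, size, size, stride)
    for size in window_sizes
)
print("classifier runs per frame (sliding window):", total)
print("network passes per frame (single-shot detector): 1")
```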

pradan

Apr 12, 2020, 6:30:21 AM

to BeagleBoard GSoC

Hello Sir,

There seems to be a problem. If I use NNPACK or another common acceleration library, it can only help our objective by using the on-board GPUs, which do not offer much on the AI or x15. Alternatively, using the DSP requires writing OpenCL... It's not clear how the EVEs can be configured similarly...

Can you suggest any way around that?

Hunyue Yau

Apr 12, 2020, 5:53:21 PM

to pradan, beaglebo...@googlegroups.com

Hi Prashant,

NNPACK has 2 main acceleration strategies that should apply to every ARM
board. I have this working on the AI and there are references to people using
it on the RPi. It uses NEON (ARM SIMD) when possible and it uses a math
identity to change things around. Convolution in time is multiplication in
frequency, so NNPACK does a Fourier transform to convert it to multiplication.
In some cases this is faster.
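
The identity itself can be checked on a toy 1-D signal. The sketch below is
not NNPACK code: it uses a naive O(n^2) DFT from the Python standard library
purely to show that transforming, multiplying and inverting gives the same
numbers as direct convolution. Any speed benefit in practice depends on using
a fast transform (FFT) and on the kernel and input sizes.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform; the inverse applies the 1/n factor."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def direct_convolve(signal, kernel):
    """Textbook linear convolution: one multiply-add per (signal, kernel) pair."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def fourier_convolve(signal, kernel):
    """Same result via the convolution theorem: transform, multiply, invert."""
    n = len(signal) + len(kernel) - 1
    # Zero-pad to the full output length so the circular convolution
    # produced by the DFT matches the linear convolution.
    a = dft(list(signal) + [0.0] * (n - len(signal)))
    b = dft(list(kernel) + [0.0] * (n - len(kernel)))
    product = [p * q for p, q in zip(a, b)]
    return [v.real for v in dft(product, inverse=True)]


signal = [1.0, 2.0, 3.0, 4.0]
kernel = [0.5, -1.0, 0.25]
direct = direct_convolve(signal, kernel)
via_fourier = fourier_convolve(signal, kernel)
print(direct)
print([round(v, 6) for v in via_fourier])
```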

I think on other platforms, it can use the GPU. If we want to use the GPU, we
have a few ways of doing this:
- Expand on the GPU acceleration in NNPack.
- Expand on the CUDA/OpenCL ports of Darknet.

The risk with the GPU is we are entering unknown territory as OpenGLES 2.0 for
GPGPU is untested; in addition, the SGX is not the fastest GPU around.
Nevertheless, it may still be useful. I can share code to set things up - main
thing is to figure out how to express things as shader code.

Coding in OpenCL for the DSP is the best way to go but there may be a few
limitations. Most code/examples for OpenCL assume a GPU backend whereas we
have a DSP. From my own experiments, it seems there is a limitation on how
much/fast we can move data between the DSP and the main ARM core.

For us mere mortals, the only way to use the EVE for this purpose is via TIDL.

A crazy idea (as in, I haven't thought it through) - can we divide up the
tasks so some of it goes through TIDL to leverage the DSP/EVE and some of it
goes via the GPU? The starting point would probably be Darknet+NNPack. Biggest
risk I see here is transfer overhead between all those components and possibly
having to waste a lot of time figuring out how to pipeline all these together
to have a reasonable throughput. Just an idea for thought.
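
A back-of-the-envelope model of that risk, under stated assumptions: every
timing below is invented, and each hand-off is charged to the stage that
receives the data. It only shows how transfer overhead eats into throughput
and why pipelining matters; it says nothing about the real numbers on the
x15/AI.

```python
def sequential_fps(stage_ms, transfer_ms):
    """Frames per second when one frame runs every stage and hand-off back to back."""
    return 1000.0 / (sum(stage_ms) + sum(transfer_ms))


def pipelined_fps(stage_ms, transfer_ms):
    """Frames per second when the stages overlap on different frames.

    Assumes each hand-off is paid by the receiving stage, so the slowest
    (stage + incoming transfer) sets the rate.
    """
    costs = [stage_ms[0]] + [s + t for s, t in zip(stage_ms[1:], transfer_ms)]
    return 1000.0 / max(costs)


# Hypothetical per-frame timings in milliseconds for three stages
# (for example ARM, DSP/EVE via TIDL, GPU) and the two hand-offs between them.
stage_ms = [40.0, 25.0, 30.0]
transfer_ms = [10.0, 15.0]

print(f"sequential: {sequential_fps(stage_ms, transfer_ms):.1f} fps")
print(f"pipelined:  {pipelined_fps(stage_ms, transfer_ms):.1f} fps")
```

In this toy model the pipelined rate is bounded by the single most expensive
stage, so a large transfer into one stage can cancel the gain from offloading
to it.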

pradan

Apr 14, 2020, 2:21:33 PM

to BeagleBoard GSoC

Hello Sir,

I am working on both the paths:

1) Trying to get the best model compression using the automatic layer grouping of the "model import" feature. However, the YOLO v2-tiny model converted to TensorFlow seems to have many unsupported layers (probably due to the unoptimised conversion from Darknet).

2) Understanding how to use the OpenCL ports of Darknet. There are some good ports on the web, but they still fail to meet our objectives (only TIDL seems to be our saviour).

Meanwhile, I stumbled upon this mind-blowing work https://www.jevoisinc.com/pages/Examples... where they manage to get more than 70 FPS. When running the YOLO models, they show 15 FPS here: http://jevois.org/moddoc/DarknetYOLO/modinfo.html. I am trying to understand their backend now. This is the code I have found so far: http://jevois.org/basedoc/classDarknetYOLO.html. I will update you in tomorrow's meeting.

Hunyue Yau

Apr 15, 2020, 3:51:15 AM

to beaglebo...@googlegroups.com, pradan

Hi,

Maybe I missed it but...
Where did they say they can do inference at 70fps? From what I saw, that's how
fast they can pull frames out of USB. Even 15 doesn't seem to be for
inference.
