Is it possible to write PRU firmware for remoteproc completely in Assembler?


n.da...@web.de

Feb 19, 2016, 2:04:30 PM
to BeagleBoard

Hi, 

I'm using the AM335X Starter Kit from TI with an AM3359 SoC and I use the TI Processor SDK Linux version 02.00.01.07. I managed to get the remoteproc driver working, then I removed the display module and used the flat-flex connector to break out some GPIOs of PRU1 to hook up some LEDs. I wrote a blink-LED firmware in CCS v6 as described in the PRU Hands-On Lab and successfully booted PRU1 to make my LEDs blink.

Now I need to write very fast code, so I have to write it in assembly language. I would like to use the AM335x PRU-ICSS Reference Guide to write my code and assemble it with PASM.

Is it possible to write pure assembler code like with the PASM and make the code work for the newer remoteproc? Or can I write assembly code in CodeComposer Studio with the TI compiler? I couldn't figure out yet how I can do that. Is there a Tutorial somewhere?

I read that TI does not support the PRU very well, which is why I am asking here.

Regards
Nico

John Syne

Feb 19, 2016, 3:05:30 PM
to beagl...@googlegroups.com
I recommend that you develop your code in C and then hand optimize the assembler where required. This helps document your code and make it more manageable. I have used CCSV6 for developing PRU apps and I think the support is pretty good. Make sure you use the scripts to configure the processor memory map and bring the PRU out of reset or you will have all kinds of issues when debugging. I have used both XDS200 and Blackhawk USB560M JTAG emulators and they both work without issue. 

Regards,
John





William Hermans

Feb 19, 2016, 5:02:09 PM
to beagl...@googlegroups.com
Is it possible to write pure assembler code like with the PASM and make the code work for the newer remoteproc? Or can I write assembly code in CodeComposer Studio with the TI compiler? I couldn't figure out yet how I can do that. Is there a Tutorial somewhere?

So just like any other language in Linux, I'm sure you could write remoteproc firmware completely in assembly. But there is no assembler like the original one for the PRUs, that I'm aware of.

However, I think the more important question would be why on earth would you want to write code for remoteproc / rpmsg in assembler ? The whole idea of remoteproc / rpmsg is to abstract many of those low level details, to make using multiple processors in this way much easier.

Charles Steinkuehler

Feb 19, 2016, 6:17:09 PM
to beagl...@googlegroups.com
The ARM side should be written in C, unless you have a _really_ good
reason not to.

For the PRU, you can code in C or ASM as desired. If you do write
assembly, you will probably want to use the C calling conventions so
you can call your assembly PRU code from C or perhaps a C shim for
remoteproc. Ultimately, it doesn't really matter what you code in as
long as you generate and process the remoteproc messages and/or
interrupts your application needs.

NOTE: The C calling conventions are in the TI compiler documentation
(spruhv7a), section 6.3 "Register Conventions" and 6.4 "Function
Structure and Calling Conventions", and there's a section on mixing
assembly and C: 6.6 "Interfacing C and C++ with Assembly Language".

http://www.ti.com/general/docs/litabsmultiplefilelist.tsp?literatureNumber=spruhv7a
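
A minimal sketch of the "C shim around an assembly routine" idea, in plain C so it also builds on a desktop. The names fast_checksum, handle_packet and HAVE_PRU_ASM_IMPL are hypothetical and only serve the illustration. On the PRU the body of fast_checksum would live in an assembly file that follows the register and calling conventions from the manual sections cited above; here a portable C body stands in for it.

```c
#include <stdint.h>

/* Prototype the C side sees. On the PRU, the body would be written in
 * assembly and would have to honour the compiler's calling conventions
 * (argument registers, return register, saved registers). */
uint32_t fast_checksum(const uint8_t *buf, uint32_t len);

#ifndef HAVE_PRU_ASM_IMPL   /* hypothetical switch: define it when the .asm version is linked */
/* Portable stand-in with the same contract as the assembly routine:
 * returns the sum of all bytes in buf. */
uint32_t fast_checksum(const uint8_t *buf, uint32_t len)
{
    uint32_t sum = 0;
    while (len--)
        sum += *buf++;
    return sum;
}
#endif

/* C shim: argument checking and any remoteproc/rpmsg glue stay in C,
 * only the hot loop is delegated to the hand-written routine. */
uint32_t handle_packet(const uint8_t *payload, uint32_t len)
{
    return len ? fast_checksum(payload, len) : 0;
}
```

Keeping the prototype in C means the hand-written routine can be swapped in or out without touching its callers.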

--
Charles Steinkuehler
cha...@steinkuehler.net

Greg

Feb 20, 2016, 11:11:59 AM
to BeagleBoard
The support from TI is quite extensive:


Download the C compiler manual.  There is a section which describes several ways to incorporate assembly code.
This looks like a very detailed manual, which combined with the examples in the pru support package should be very helpful.

I'm still coming up to speed on all of this, and it's complicated because you have to think about what is going on with the C compiler, remoteproc, rpmsg, and
all of the details of what is going with these sort of kernel processes and the virtIO bus mechanism.  Too much going on for a Linux newbie, I've had to retreat
and study some of the fundamentals before getting back to this (I hope!).

You need to be aware that PASM is no longer supported.  The path forward is clpru, the C compiler, which works with the included assembler (asmpru?).
There are some differences in the way assembly code is written for the newer assembler (there are notes on this in the command line package download).

I was also able to get the examples going with the PRU cape using remoteproc and version 4 kernel (Robert Nelson's testing image).  This massively simplified the process
compared to what you see in the TI "Hands On Labs" tutorial.  Pretty much everything with regards to remoteproc and the clpru compiler is ready-to-run.  You don't need cross-compilation
or the IDE, all can be done at the command line on the BBB.  If you prefer to operate at the command line all the tools are there.

Please correct me if I've got this wrong, but I think it's fair to say that TI has provided a wealth of information for the PRU, however, they expect further support to be coming from the community.

Here's another really great contribution by TI:

Greg

din...@gmail.com

Feb 20, 2016, 2:01:14 PM
to BeagleBoard
Nico,

There are two prerequisites for your PRU firmware to be loaded by the remoteproc driver:
1. PRU firmware image must be in ELF format. Only TI's clpru and the unofficial GNU PRU toolchain support this.
2. PRU firmware must include a ".resource_table" ELF section containing a resource table.

You cannot use PASM with remoteproc because PASM cannot output ELF.
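
To make the two prerequisites concrete, here is a hedged sketch in C. The struct mirrors the header fields of the Linux kernel's remoteproc resource table (version, entry count, two reserved words) plus one unused offset slot; the names resource_table_hdr, empty_resource_table and looks_like_elf are made up for this illustration. The section attribute is GCC syntax and is only applied on ELF targets; with clpru the same placement is done with a data-section pragma instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Header of a remoteproc resource table, followed by one unused offset
 * slot so that the empty table still has a well-defined size. */
struct resource_table_hdr {
    uint32_t ver;          /* table format version, 1 at the time of writing */
    uint32_t num;          /* number of resource entries that follow */
    uint32_t reserved[2];  /* must be zero */
    uint32_t offset[1];    /* unused when num == 0 */
};

/* An empty table: sufficient for firmware that asks the loader for
 * nothing (no rpmsg vrings, no interrupt mappings). */
#if defined(__GNUC__) && defined(__ELF__)
__attribute__((section(".resource_table"), used))
#endif
const struct resource_table_hdr empty_resource_table = { 1, 0, { 0, 0 }, { 0 } };

/* Return 1 if buf starts with the ELF magic bytes 0x7F 'E' 'L' 'F'.
 * A raw PASM .bin image is plain instruction words and lacks them. */
int looks_like_elf(const unsigned char *buf, size_t len)
{
    return len >= 4 && buf[0] == 0x7F && buf[1] == 'E' &&
           buf[2] == 'L' && buf[3] == 'F';
}
```

Running looks_like_elf over the first bytes of a firmware file is a quick way to tell a raw binary from an image that remoteproc has a chance of loading.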

Here is a GNU assembler example that can be loaded and executed by remoteproc: https://github.com/dinuxbg/pru-gcc-examples/blob/master/blinking-led/pru/main1.S . You should be able to write something similar for TI's clpru assembler. But as others have pointed out, it is more sensible to start with C and optimize only the critical parts of your program in assembly.

Regards,
Dimitar

John Syne

Feb 20, 2016, 2:23:40 PM
to beagl...@googlegroups.com
This is an excellent explanation of the workings of Remoteproc/RPMSG. Thanks for sharing.

Regards,
John

Regards,
Greg

William Hermans

Feb 20, 2016, 2:45:57 PM
to beagl...@googlegroups.com
This is an excellent explanation of the workings of Remoteproc/RPMSG. Thanks for sharing.

Regards,
John

Yeah I've seen that, or something similar it is pretty good, except there is still one problem. That explanation  implies it instructs us how to use the PRU hardware with rpmsg, and I suppose on some level it really does. But what it does not explain, is how to interact with the rest of the on chip hardware through this mechanism.

Sending text messages between ARM, and PRU processors is a good intro demonstration of the software, but it is not really the least bit useful in the real world.

Anyway, people like me who are very experienced with writing code, will be put off using rpmsg etc because of this. Is it really so much to ask for example code to demonstrate how to interact with the on die hardware ? Without having to download 1GB of pretty much useless library . . .

John Syne

Feb 20, 2016, 3:01:38 PM
to beagl...@googlegroups.com
Hi William,

So here is how I like to use this. The PRU is performing some function and I send commands to modify that function. An example would be controlling the position of a stepper motor. The ARM app sends a new position and the PRU takes care of stepping the motor to that new location. I think of the PRU as being good at doing low latency stuff and I use RPMSG/Remoteproc to send instructions and then I get feedback on measurements from the PRU. The interface isn’t fast enough to do anything more than this. Simply flashing an LED by sending a command isn’t the best use of this technology. Changing the flashing rate or the duty cycle is more appropriate. I hope I’m answering your question. 
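
The stepper example could be sketched roughly like this, in plain C. The struct and function names are hypothetical, and the actual pin toggling and rpmsg receive call are left out; the point is only the split between a rare, slow command (a new target) and the fast loop the PRU runs on its own.

```c
#include <stdint.h>

struct stepper {
    int32_t position;  /* current position in steps */
    int32_t target;    /* last position requested by the ARM side */
};

/* Called when a command arrives from the ARM side, e.g. out of an
 * rpmsg receive handler. This is the slow, infrequent path. */
void stepper_set_target(struct stepper *s, int32_t target)
{
    s->target = target;
}

/* Called once per step interval from the PRU's main loop.
 * Returns +1 or -1 for the direction of the step taken, 0 if idle.
 * The caller would pulse the step/direction pins accordingly. */
int stepper_tick(struct stepper *s)
{
    if (s->position < s->target) { s->position++; return 1; }
    if (s->position > s->target) { s->position--; return -1; }
    return 0;
}
```

The ARM side only ever calls the equivalent of stepper_set_target; the timing-critical stepping never waits on the message channel.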

Regards,
John



William Hermans

Feb 20, 2016, 3:29:59 PM
to beagl...@googlegroups.com
I hope I’m answering your question.

No, not even close. I need an answer that gives an example in code of how to use on-die peripherals through the PRUs when using remoteproc / rpmsg. Past that, I do not want to download a couple of gigs of data for software I do not need, or even want.

What would be really good, would be a github example. Blinking an on board LED or toggling a GPIO would be the simplest, but anything demonstrating using the onboard peripherals. ADC, I2C, CAN, or even just GPIO - whichever. The ARM processor side code would not exactly be so important, except it would be a good example of how the two sides of software interact with one another.

John Syne

Feb 20, 2016, 4:08:40 PM
to beagl...@googlegroups.com
The PRU examples that I have pointed out several times do exactly what you are asking for. Also, several other posters have shown how to build these examples without CCSV6. After you build the PRU code, you have to place it in /lib/firmware so that Remoteproc can load it into the PRU, configure resources and start the PRU code. 

Regards,
John



William Hermans

Feb 20, 2016, 6:59:41 PM
to beagl...@googlegroups.com
The PRU examples that I have pointed out several times do exactly what you are asking for. Also, several other posters have shown how to build these examples without CCSV6. After you build the PRU code, you have to place it in /lib/firmware so that Remoteproc can load it into the PRU, configure resources and start the PRU code.
Regards,
John
 
We'll just have to agree to disagree. Since I'm a very experienced programmer who has not had any problems setting up, or writing / using software for multiple other aspects of the hardware. Somehow, it must be my fault.

Przemek Klosowski

Feb 20, 2016, 11:45:37 PM
to beagl...@googlegroups.com
On Sat, Feb 20, 2016 at 2:45 PM, William Hermans <yyr...@gmail.com> wrote:
> Is it really so much to ask for example code to demonstrate how to interact
> with the on die hardware ? Without having to download 1GB of pretty much
> useless library . . .

William,

I must be missing something, because I see remoteproc as a
communication and management mechanism for code on CPUs other than the
main processor. The actual code that you are running on those
subsidiary processors does not depend on the mechanism you use for
talking to it (other than the parts that do the talking, of course).

In particular, running ADC, I2C or GPIO should be the same, regardless
whether you use remoteproc or not---what changes is how you tell this
code what to do.
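
One way to picture that separation, as a sketch only: the peripheral-facing logic takes no notice of the transport, and the transport is hidden behind a small polling function. All names here (blinker, poll_cmd_fn, poll_mailbox, mailbox_word) are invented for the example; the mailbox stands in for whatever actually delivers commands, whether that is rpmsg or a shared-memory word written through prussdrv.

```c
#include <stdint.h>

/* Peripheral-side state: an LED blinked with a configurable period. */
struct blinker {
    uint32_t period_ticks;  /* ticks between toggles, must be > 0 */
    uint32_t counter;       /* ticks since the last toggle */
    uint32_t led_on;        /* 0 or 1; would drive a GPIO on hardware */
};

/* Hardware-facing logic. Identical whichever transport is in use. */
void blinker_tick(struct blinker *b)
{
    if (++b->counter >= b->period_ticks) {
        b->counter = 0;
        b->led_on ^= 1u;
    }
}

/* Transport abstraction: returns 1 and fills *period when a new
 * command is pending, 0 otherwise. */
typedef int (*poll_cmd_fn)(uint32_t *period);

/* Stand-in transport: a one-word mailbox, 0 meaning "nothing pending". */
volatile uint32_t mailbox_word;

int poll_mailbox(uint32_t *period)
{
    uint32_t v = mailbox_word;
    if (v == 0)
        return 0;
    mailbox_word = 0;   /* consume the command */
    *period = v;
    return 1;
}

/* One pass of the main loop: apply a pending command, then do the work. */
void service_once(struct blinker *b, poll_cmd_fn poll)
{
    uint32_t period;
    if (poll && poll(&period) && period > 0) {
        b->period_ticks = period;
        b->counter = 0;
    }
    blinker_tick(b);
}
```

Swapping the transport means passing a different poll function to service_once; blinker_tick does not change.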

Does it make sense to you?

William Hermans

Feb 21, 2016, 12:39:53 AM
to beagl...@googlegroups.com
William,

I must be missing something, because I see remoteproc as a
communication and management mechanism for code on CPUs other than the
main processor. The actual code that you are running on those
subsidiary processors does not depend on the mechanism you use for
talking to it (other than the parts that do the talking, of course).

In particular, running ADC, I2C or GPIO should be the same, regardless
whether you use remoteproc or not---what changes is how you tell this
code what to do.

Does it make sense to you?

What it is supposed to do has always made sense to me. How exactly it is done is another story.

With uio_prussdrv, you have a driver module which sets various things up, loads the PRU binary, and then enables / runs the PRU(s). On the PRU side, the code runs, communicates with various peripherals as needed (usually one, if any), and then the PRU code performs its function as specified in assembly, sometimes dumping data into DDR3 (as per the example), and sometimes not.

Anyway, the above is a fairly rough description, but how each aspect communicates with the other is abundantly clear in code. Some have even attempted to describe what happens, but if you ask me inadequately. No matter though the code is pretty clear.

With remoteproc, the Documentation/*txt documentation is very minimal, and does not describe the process in which it works very well. However, the code is fairly clear as to how the ARM, and PRU sides communicate with one another( rpmsg ). However, what is not clear, is how the PRU code actually manipulates the physics on system hardware. Additionally, to confuse matters even more, the assembler has changed to a compiler( C - clpru ), and there is something like "map" files for hardware configuration that do not seem to be very well documented. Just some examples, that are not very clear as to how, or why these are even needed.

So here I am, attempting to learn a few things new to me. Documentation is very poor, TI refuses to answer any questions in relation to PRUs on their e2e forums(" go to beagleboard.org google groups . . ." ). I spend several days learning about everything PRU related, and immediately pick up the concept of uio_prussdrv. Still having a hard time with the TI C compiler on the PRU side of things, largely due to these mysterious configuration files. But no matter, the TI Assembler is fairly straight forward, the PRU instruction set is a minimal Cortex M3 set, and easy.

Anyway, for context on my competence level: not long ago I wrote a set of processes / applications to read from the CANBUS in realtime, decode the CANBUS data, and shuffle this decoded data out over a websocket. This required me to learn several aspects of Linux systems programming from scratch, including POSIX shared memory files, socketCAN, and process spawning / management. All from scratch, since this was my first major Linux application. All of this, including reverse engineering parts of the high level CANBUS protocol, took me around a month. The point here is, I have no problem picking up / understanding technologies, APIs, libraries, and such that I've previously had no experience with. *So long* as there is at least a little decent documentation on the subject, or I can talk to someone who does understand things that may be confusing to me.

Additionally, I'm not saying exactly that remoteproc can't be made to work, because obviously it can. What I am saying is that since the concept is so poorly documented, is still in an experimental phase, and, as I now learn, is slower than the traditional prussdrv driver / methods, it's just not worth my time to even attempt to get it working.

That and I *have* spent some time ( roughly a week ), *just because* I'm the type that does not mind experimenting with new technology in software. But only new technology that is not too argumentative. As my time is far too valuable to me than to screw around with technology that honestly makes very little sense to me.

Also for what it is worth. remoteproc / rpmsg in my own mind is far more useful in cases where a processor may have multiple application / general purpose cores. In that one core can be made to run Linux, while the others can be made to run bare metal - Simultaneously. Less useful on the case of the PRUs since we already have a software layer that is well documented, works very well, and quite honestly far superior to remoteproc / rpmsg in this case. If nothing else. Speed.
 

William Hermans

Feb 21, 2016, 12:53:31 AM
to beagl...@googlegroups.com
I do expect that TI will improve the documentation on their implementation of remoteproc / rpmsg sometime in the future  though. As in the case of the X15, there are not only 4 on die PRU's, but there are 4 IPU's( 2 usable for general purpose ), and two DSP's( on the dual core A15 ). I've no idea what TI has compiler / assembler wise for these DSP's but the IPU's from what I understand are fairly new( in the context of general purpose ). So I'd assume this is where remoteproc / rpmsg will make the most sense. the on die IPU's

John Syne

Feb 21, 2016, 1:20:22 AM
to beagl...@googlegroups.com
On Feb 20, 2016, at 9:39 PM, William Hermans <yyr...@gmail.com> wrote:

William,

I must be missing something, because I see remoteproc as a
communication and management mechanism for code on CPUs other than the
main processor. The actual code that you are running on those
subsidiary processors does not depend on the mechanism you use for
talking to it (other than the parts that do the talking, of course).

In particular, running ADC, I2C or GPIO should be the same, regardless
whether you use remoteproc or not---what changes is how you tell this
code what to do.

Does it make sense to you?

What it is supposed to do has always made sense to me. How exactly it is done is another story.

With uio_prussdrv, you have a driver module which sets various things up, loads the PRU binary, and then enables / runs the PRU(s). On the PRU side, the code runs, communicates with various peripherals as needed (usually one, if any), and then the PRU code performs its function as specified in assembly, sometimes dumping data into DDR3 (as per the example), and sometimes not.

Anyway, the above is a fairly rough description, but how each aspect communicates with the other is abundantly clear in code. Some have even attempted to describe what happens, but if you ask me inadequately. No matter though the code is pretty clear.

With remoteproc, the Documentation/*txt documentation is very minimal, and does not describe the process in which it works very well. However, the code is fairly clear as to how the ARM, and PRU sides communicate with one another( rpmsg ). However, what is not clear, is how the PRU code actually manipulates the physics on system hardware. Additionally, to confuse matters even more, the assembler has changed to a compiler( C - clpru ), and there is something like "map" files for hardware configuration that do not seem to be very well documented. Just some examples, that are not very clear as to how, or why these are even needed.
What do you mean by “how the PRU code actually manipulates the physics on system hardware”?

This is standard PRU code that toggles PRU dedicated IO, sets/clears register values of peripherals, in exactly the same way as the code that you run via prussdrv which is just doing the same, but via UIO. I think you are just pulling my leg here. This is trivial stuff. What is complicated? I’m scratching my head and totally confused ;-)
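
For readers who have not seen such code, a rough sketch in plain C. On the PRU with TI's compiler the output register is declared as `volatile register uint32_t __R30;`; since that only builds with clpru, an ordinary variable named r30_shadow (a made-up name) stands in for it here, and reg_update is a hypothetical helper showing the usual read-modify-write on a memory-mapped peripheral register.

```c
#include <stdint.h>

/* Stand-in for the PRU's R30 output register so the sketch builds on
 * any C compiler. On the PRU, replace it with the real register. */
volatile uint32_t r30_shadow;

/* Drive one of the PRU's dedicated output pins. */
void pin_set(unsigned bit)    { r30_shadow |=  (UINT32_C(1) << bit); }
void pin_clear(unsigned bit)  { r30_shadow &= ~(UINT32_C(1) << bit); }
void pin_toggle(unsigned bit) { r30_shadow ^=  (UINT32_C(1) << bit); }

/* Set and clear bits in a memory-mapped peripheral register with one
 * read-modify-write: bits in clear_mask go to 0, then bits in set_mask
 * go to 1. On hardware, reg would point at the peripheral's address. */
void reg_update(volatile uint32_t *reg, uint32_t clear_mask, uint32_t set_mask)
{
    *reg = (*reg & ~clear_mask) | set_mask;
}
```

None of this involves remoteproc or prussdrv; the loader only decides how the code gets onto the PRU and started.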

Regards,
John

John Syne

Feb 21, 2016, 1:21:09 AM
to beagl...@googlegroups.com
The IPU’s are CortexM4 processors. 

Regards,
John



William Hermans

Feb 21, 2016, 1:22:30 AM
to beagl...@googlegroups.com
The IPU’s are CortexM4 processors. 
Regards,
John

You're just now figuring that out ?

William Hermans

Feb 21, 2016, 1:30:25 AM
to beagl...@googlegroups.com
I think more correctly said. They're similar to a Cortex M4 that sits on an Lx host processor interconnect. So you can not just use the eabi-none gcc port to make them work . . .

John Syne

Feb 21, 2016, 1:40:16 AM
to beagl...@googlegroups.com
Ah, so I just use CCSV6 which has all the scripts that take the CortexM4s out of reset and configures their memory map so that I can write code and debug pretty quickly. Now if you don’t use CCSV6, you have to do all that via the CortexA15s and that is going to be very difficult for development. I’ve been doing this on the OMAP5 for several years, which has many of the same features as AM5728. I also use CCSV6 for the DSPs, which have the same issues. The TI DSP C compiler is highly optimized for the C66 DSP which has many cores that operate in parallel. Also, the instrumentation provided by CCSV6 makes it possible to do very accurate measurements while running live code. This is especially important for multithreaded applications. BTW, I believe CCSV6 doesn’t need a license for code that is less than 16K. 


Regards,
John



William Hermans

Feb 21, 2016, 1:47:35 AM
to beagl...@googlegroups.com
BTW, I believe CCSV6 doesn’t need a license for code that is less than 16K.

I believe that any TI dev board is supported in CCSv6 for free so long as the code is not used for commercial purposes. This also includes various other dev boards, which I believe includes the beaglebone boards.

However, that is not the point. I have a considerable amount of time invested into using gcc based tool chains and prefer to stick with gcc. period. I do not need all that instrumentation fluff to write code, and in fact do not require, or even want an IDE of any sort most of the time. Let alone a buggy, poor performing IDE written in java . . .

Also do us both a favor. Don't try and tell me that CCS isn't buggy, and isn't poor performing. You're not the only one who's been exposed to CCS for years . . .

John Syne

Feb 21, 2016, 1:55:11 AM
to beagl...@googlegroups.com
On the contrary, I have had personal connections with the CCSV6 developers for many years. I have helped them fix several bugs, especially related to debugging Linux kernel code back in CCSV4. After CCSV5, TI went in a different direction and I could no longer use CCS for kernel debugging, so I went the Lauterbach route. However, for DSP development, there is nothing better, period. For all the other embedded processors, TI does a pretty decent job with CCSV6. That isn’t to say there are no bugs, but they do fix them pretty quickly. I have a pretty fast desktop with lots of memory so Eclipse performs quite well for me. 

Regards,
John



John Syne

Feb 21, 2016, 2:00:49 AM
to beagl...@googlegroups.com
This video series by Eric Wilbur explains some of the things the TI C Compiler does so well that you cannot do with GCC:


You need to view all of them to see that advantage. BTW, the C66xx DSP on the AM5728 is way more powerful than the C64/C67 DSP described in the videos. 

Regards,
John



William Hermans

Feb 21, 2016, 2:04:49 AM
to beagl...@googlegroups.com
That isn’t to say there are no bugs, but they do fix them pretty quickly. I have a pretty fast desktop with lots of memory so Eclipse performs quite well for me.

i7 4710HQ with 16GB RAM, with 2GB dedicated 860M. So it's a laptop, and the only reason why I mention dedicated graphics. It is very, very fast.

But again, that's not the point, heh. The point is, even something like Visual Studio Code (not the IDE but the editor), which is IDE-like, can perform very much faster than any IDE. I've also stopped using VS (the IDE) because it is also sluggish nowadays, and it's native code.

As it is, I actually prefer writing much of my code in Sublime Text, as I like many of the features it has, including dark themes I can live with . . . VIM classic mode, snippets, customizable code completion, etc.

John Syne

Feb 21, 2016, 2:11:45 AM
to beagl...@googlegroups.com
Yep, I like Sublime Text as well. It is clearly my favorite editor, but for indexing the Linux Kernel, to include only code for the platform I’m using, I use Eclipse. This helps me browse to any Linux Kernel function with a ctrl click. For Javascript, I use Webstorm and for embedded I use CCSV6.1. I use whatever tools get the job done. 

Regards,
John



William Hermans

Feb 21, 2016, 2:32:02 AM
to beagl...@googlegroups.com
This help me browse to any Linux Kernel function with a ctrl click.

This is something Visual Studio has had / done for years, as in since  . . . well as long as I can remember. According to wikipedia, Visual Studio 6 was released in 1998, and I know it was a feature in VS6 . . . at any rate it is why I've used Visual Studio for many years. If for nothing else, "function explorer". Which works fine with any source even if that source can not be compiled with VS ;)

Nowadays, find and grep take the place of many tools, as do many other command line utilities . . .

The only "compiler" that I'll put up with that is not gcc is actually not a compiler, but TI's PRU assembler. I might also tolerate clpru in the future if I ever get around to reading the manual for it. But the PRU is a special case, where I feel that community based open source tools are not good enough, and probably never will be.

So, when you use a tool chain based on gcc. As well as all the wonderful Linux command line utilities. IDE "tools" are no longer necessary, and are in fact less efficient. GUI's tend to get in the way, in this context.

William Hermans

Feb 21, 2016, 2:35:45 AM
to beagl...@googlegroups.com
Studio for many years. If for nothing else, "function explorer". Which works fine with any source even if that source can not be compiled with VS ;)

By the way, sublime text 3 has this built in now too.

John Syne

Feb 21, 2016, 3:42:41 PM
to beagl...@googlegroups.com
Not true. The Kernel supports so many architectures and most indexers cannot deal with this in an intelligent way. BTW, I use Visual Studio Code, which supports TypeScript and runs on any platform. I have used cscope and several other indexers in the past, but there is no way to teach them that you are using the ARM architecture. So when you look for the source for a function, you get dozens of references and that just slows things down. Using “git grep”, grep, ack, etc. also produces multiple references and that is unacceptable. 

Regards,
John



William Hermans

Feb 21, 2016, 5:42:02 PM
to beagl...@googlegroups.com
Visual studio code is *not* Visual Studio. Visual Studio code is a text editor meant for web development, but *can* be used for other languages. Just as any other text editor can be used as such.

Visual Studio on the other hand is a full blown IDE that has had features in the past that no other IDE's could rival, or even compare to. If Eclipse can index this stuff you're talking about. So can Visual Studio. As Visual Studio is light years ahead of Eclipse, no doubt. The problem with Visual Studio however, is that once you stray outside of cl.exe( in the context of C/C++ ), setup increasingly gets more difficult. But the compiler *can* be "changed out", and the debugging system can be made to work with gcc tools if you understand how. Honestly though, I personally do not find the effort worth it anymore.

grep works just fine if you understand how to use it correctly.

John Syne

Feb 21, 2016, 5:56:03 PM
to beagl...@googlegroups.com
When you have made VS index the Linux Kernel, then we can talk, but speculating that it can be done is senseless. Here is a simple exercise to prove my point. In two minutes, can you define the call sequence for say the ti_am335x_adc probe function. In other words, how does the tiadc_probe function get called? Start with the "module_platform_driver(tiadc_driver)” on line 594. 

Regards,
John



William Hermans

Feb 21, 2016, 6:05:59 PM
to beagl...@googlegroups.com
This isn't a pissing contest John. Go out and look into it on the VS front if you want to. Otherwise don't worry about it.

John Syne

Feb 21, 2016, 8:24:22 PM
to beagl...@googlegroups.com
You are right, and I didn’t expect you to take on the challenge. I was only making the point that browsing the Linux Kernel isn’t trivial and many of the online indexers have a long way to go to become useful. Anyway, thanks for playing ;-)

Regards,
John


