Is there a way to send an interrupt from userspace to the PRU-ICSS?

ags

Mar 7, 2017, 11:45:08 PM
to BeagleBoard
The mechanism for generating an interrupt from a PRU to the A8 (host) is well-documented. Is there a way to send an interrupt (one of the 64 system interrupt events documented in the PRU-ICSS literature) from userspace?

From reading the TI documentation, the only candidates seem to be the two "mailbox" interrupts. I recall reading that a version of the remoteproc (or RPMsg, or virtio) drivers used these mailboxes but ultimately abandoned them because they are not available on all platforms (that may be incorrect).

Setting a flag in PRU DRAM or shared RAM is clearly a method that will work. However, it appears that polling DRAM or shared RAM is a multi-clock task; if a PRU system interrupt can be generated, it can be polled in one clock by examining R31 bits 30/31 (if configured correctly). Is this possible?

William Hermans

Mar 7, 2017, 11:52:27 PM
to beagl...@googlegroups.com
On Tue, Mar 7, 2017 at 9:45 PM, ags <alfred.g...@gmail.com> wrote:
> The mechanism for generating an interrupt from a PRU to the A8 (host) is well-documented. Is there a way to send an interrupt (one of the 64 system interrupt events documented in the PRU-ICSS literature) from userspace?

No, there are no such things as userspace interrupts, period.

> From reading the TI documentation, the only two that seem to be candidates are two "mailbox" interrupts. I recall reading something about a version of the remoteproc (or RPMsg, or virtio) drivers that utilized these mailboxes, but ultimately abandoned them as they are not available on all platforms. (that may be incorrect).
>
> Setting a flag in PRU DRAM or shared RAM is clearly a method that will work. However, it appears that polling DRAM or shared RAM is a multi-clock task; if a PRU system interrupt can be generated, it can be polled in one clock by examining R31 bits 30/31 (if configured correctly). Is this possible?

I get the feeling, however, that you're misunderstanding the purpose of an interrupt. An interrupt is a way for hardware to let software know that something has happened that may require attention. Either way, you would probably be better off thinking in terms of setting a bitfield or, in this case, a single bit.

ags

Mar 8, 2017, 12:44:20 AM
to BeagleBoard
I see what you are saying. But (from what I've read) it seems that many would also say that you can't *receive* interrupts in userspace. I guess that's technically true, but uio does provide a way, through sysfs, to determine if an interrupt was fired.

Also from what I've read (I've been doing a lot of reading...), I thought that remoteproc was using mailboxes to communicate from userspace to the PRU, and I was wondering whether this "reverse-direction" interrupt mechanism could also be supported by drivers (say, by writing to /dev/uio<n>, analogous to using select()/poll() on /dev/uio<n> to detect an interrupt).
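
For reference, the receive direction mentioned above can be sketched as follows. With the generic Linux UIO framework, a blocking 4-byte read() on the /dev/uio<n> device node returns the running interrupt count once an interrupt has fired. The helper name wait_for_irq is illustrative, and which /dev/uio<n> node corresponds to which PRU host interrupt depends on the driver setup, so treat that mapping as an assumption.

```python
import os
import struct


def wait_for_irq(fd: int) -> int:
    """Block until the UIO device behind fd reports an interrupt.

    A UIO read returns a native-endian 32-bit count of interrupts seen so far.
    """
    data = os.read(fd, 4)
    if len(data) != 4:
        raise OSError("short read from UIO device")
    (count,) = struct.unpack("=I", data)
    return count
```

On real hardware this would be called with something like fd = os.open("/dev/uio0", os.O_RDONLY); the same descriptor can also be handed to select()/poll() to wait with a timeout.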

Granted, after going through all that machinery, it may well be faster to just set a bit in PRU memory and have PRU poll.

If this is just nonsense, even in theory, then I'd welcome an education as to why.

William Hermans

Mar 8, 2017, 1:13:01 AM
to beagl...@googlegroups.com
So, there is a driver that populates /proc/irq/, although I do not know its name. If you cat /proc/interrupts, you'll easily see which number correlates to what.

root@beaglebone:~# cat /proc/interrupts
           CPU0
 16:   39266767      INTC  68 Level     gp_timer
 20:     118529      INTC  12 Level     49000000.edma_ccint
 22:         97      INTC  14 Level     49000000.edma_ccerrint
 23:          0      INTC  96 Level     44e07000.gpio
 30:          0  44e07000.gpio   6 Edge      48060000.mmc cd
 56:          0      INTC  98 Level     4804c000.gpio
 89:          0      INTC  32 Level     481ac000.gpio
122:          0      INTC  62 Level     481ae000.gpio
155:         18      INTC  72 Level     44e09000.serial
156:       1431      INTC  70 Level     44e0b000.i2c
157:         33      INTC  30 Level     4819c000.i2c
158:         13      INTC  64 Level     mmc0
159:     587731      INTC  28 Level     mmc1
167:          0      INTC  75 Level     rtc0
168:          0      INTC  76 Level     rtc0
172:    2117580      INTC  41 Level     4a100000.ethernet
173:     218976      INTC  42 Level     4a100000.ethernet
181:          0      INTC 111 Level     48310000.rng
183:         41      INTC  18 Level     musb-hdrc.0.auto
184:          1      INTC  19 Level     musb-hdrc.1.auto
185:          0      INTC  17 Level     47400000.dma-controller
186:          0      INTC   7 Level     tps65217
187:          0      INTC  16 Level     TI-am335x-adc

The reason there are no userspace interrupts is that they would force context switches from userspace to kernel space and back to userspace. Needless to say, this would create a system-wide performance problem.

Now, concerning setting a bit flag in memory *somewhere*. I ran into a similar problem that was entirely in userspace: I was decoding a custom CANBUS protocol, and all of the so-called "non-blocking" mechanisms available through the standard Linux libc API were too slow by far. I needed this mechanism because I had two separate processes communicating with each other in real time. One half decoded CANBUS messages at 1Mbit/s; the other half was a web server displaying the decoded data in real time through web sockets.

Anyway, with traditional non-blocking methods I was able to switch between these two apps at around 5-8 messages a second. But when I switched to using mmap() + a structure in memory + a bit in this structure as a locking mechanism, my maximum message count shot up to around 1000 messages / second. Granted, actual new message values only arrived at around 20 a second.

So basically, I had a structure in which one member was a single bit (8 bits in storage, but I only used 1). This bit would be either 0 or 1 (of course), and its value indicated which process had access to the file; access rotation was a real factor in this situation. The first half would decode and write, then set the access bit to 1 to give the second half access. The second half would read the value, then change the bit back to 0. As implied above, this turned out to be the fastest mechanism possible by a long shot.
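
The handoff described above can be sketched roughly as follows. This is a minimal illustration, not the original code: the layout (one flag byte, then a single 32-bit payload word) and the function names are hypothetical, and an anonymous mapping stands in for the region the two processes would actually share. It also ignores memory-ordering concerns that a real two-process implementation would need to consider.

```python
import mmap
import struct

FLAG_OFFSET = 0       # one byte: 0 = writer owns the buffer, 1 = reader owns it
PAYLOAD_OFFSET = 4    # hypothetical layout: a single 32-bit payload word
REGION_SIZE = mmap.PAGESIZE


def try_publish(region, value: int) -> bool:
    """Writer side: store value and hand the buffer over, if we own it."""
    if region[FLAG_OFFSET] != 0:
        return False                      # reader has not consumed the last value
    struct.pack_into("<I", region, PAYLOAD_OFFSET, value)
    region[FLAG_OFFSET] = 1               # set the flag last, after the payload
    return True


def try_consume(region):
    """Reader side: return the payload and hand the buffer back, or None."""
    if region[FLAG_OFFSET] != 1:
        return None
    (value,) = struct.unpack_from("<I", region, PAYLOAD_OFFSET)
    region[FLAG_OFFSET] = 0
    return value


# Anonymous mapping used here only so the sketch runs in a single process.
region = mmap.mmap(-1, REGION_SIZE)
```

Because each side only touches the payload while it owns the flag, neither process needs a system call per message, which is presumably where the speedup came from.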

Using the PRUs you would not be able to use POSIX IPC shared memory as I did, but you can still use mmap() + /dev/mem on the userspace side of the application. That would allow you to communicate with the PRUs through a specified location in memory.
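
A minimal sketch of the mmap() + /dev/mem approach, under stated assumptions: it needs root, the kernel must allow /dev/mem access to the range, and the physical address passed in is entirely up to the caller. The helper names are hypothetical. The only generic part is the page-alignment arithmetic, since mmap() offsets must be page-aligned.

```python
import mmap
import os

PAGE = mmap.PAGESIZE


def page_window(phys_addr: int, length: int):
    """Return (page-aligned base, offset within mapping, mapping length)."""
    base = phys_addr & ~(PAGE - 1)
    offset = phys_addr - base
    span = offset + length
    map_len = (span + PAGE - 1) & ~(PAGE - 1)
    return base, offset, map_len


def map_physical(phys_addr: int, length: int):
    """Map a physical range through /dev/mem (needs root; hardware-specific)."""
    base, offset, map_len = page_window(phys_addr, length)
    fd = os.open("/dev/mem", os.O_RDWR | os.O_SYNC)
    try:
        mem = mmap.mmap(fd, map_len, mmap.MAP_SHARED,
                        mmap.PROT_READ | mmap.PROT_WRITE, offset=base)
    finally:
        os.close(fd)                      # the mapping stays valid after close
    return mem, offset
```

For example, assuming the AM335x PRU-ICSS shared RAM sits at physical address 0x4A310000 (check the TRM for your part), mem, off = map_physical(0x4A310000, 4) followed by mem[off] = 1 would set a one-byte flag there.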

If this is not clear enough, I can elaborate further. In either case, I'd be interested in how you dealt with your problem here, and would really love to see some simplified example code.


--
For more options, visit http://beagleboard.org/discuss
---
You received this message because you are subscribed to the Google Groups "BeagleBoard" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beagleboard+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/beagleboard/f49b6ac0-7cc8-467c-8fcf-3060b65dec05%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Dennis Lee Bieber

Mar 8, 2017, 9:22:15 AM
to beagl...@googlegroups.com
On Tue, 7 Mar 2017 21:52:21 -0700, William Hermans
<yyr...@gmail.com> declaimed the following:

>I get the feeling however that you're misunderstanding the purpose of an
>interrupt. An interrupt is a way for hardware to let software know,
>something has happened that may require attention. Either way you would
>probably be better off thinking in the context of setting a bitfield, or in
>this case, a single bit.

Depending upon one's background, there may be different levels of
things under "interrupt".

Your focus appears to be on (external) hardware signalling for service.

I have encountered systems with the concept of a software interrupt
being used by unprivileged code to signal a request that needs to be
handled in privileged code; or even if the system didn't support
privileged/unprivileged just to give an entry point to system functions
that wouldn't change with system updates (single interrupt with a passed
index into a function table). Many tend to use the name "trap" for a
software interrupt.

Even TI-RTOS defines something called a SW interrupt, running at a
lower priority than HW interrupts (with a suggestion that the HW interrupt
should trigger a SW interrupt if more than a minimum amount of processing
is needed -- which, granted, is the opposite of a low priority task
requesting an operation by triggering a SW interrupt)

Hardware interrupts are pretty much always asynchronous -- taking
control away from whatever process is running. Software interrupts used to
access system services are synchronous -- the running process is the one
transferring control; though they may also be asynchronous if they are part
of a priority scheme and triggering may come from code running at other
priority levels (immediately if triggered by lower priority code, deferred
if triggered by higher priority code, only to be activated when the higher
priority code exits)


--
Wulfraed Dennis Lee Bieber AF6VN
wlf...@ix.netcom.com HTTP://wlfraed.home.netcom.com/

William Hermans

Mar 8, 2017, 11:04:57 AM
to beagl...@googlegroups.com
For the purpose of this discussion with ags, I do not think the exact definition of an interrupt is as important as how to achieve the end goal. On a single-threaded "system", I also do not think asynchrony is ever really a factor. But I usually do tend to view interrupts as prioritized and preemptive.

William Hermans

Mar 8, 2017, 11:18:48 AM
to beagl...@googlegroups.com
Additionally, what I proposed should not interfere with system interrupts much, if at all. It should complete the task as fast as the system allows, and it is blocking in nature.

One thing I did not mention, however, is that even though my idea is blocking in nature, you can give the processor back to the system by using sleep() or usleep(), instead of polling so continuously that the processor has little time to do anything else.

In my use case, I think I used usleep() with a value of 10,000 (microseconds, i.e. 10 ms), which seemed responsive enough for my purpose and only used 1-3% processor time.
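
The polling-with-sleep idea can be sketched like this. It is only an illustration: the function name, the one-byte flag at a given offset, and the timeout parameter are assumptions, and the 10 ms default interval mirrors the usleep(10000) figure mentioned above.

```python
import time


def wait_for_flag(region, offset: int = 0, interval_s: float = 0.01,
                  timeout_s: float = 1.0) -> bool:
    """Poll one byte until it becomes non-zero, sleeping between checks."""
    deadline = time.monotonic() + timeout_s
    while True:
        if region[offset] != 0:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)            # yield the CPU, like usleep(10000)
```

The trade-off is latency: with a 10 ms sleep, a flag change can go unnoticed for up to roughly 10 ms, in exchange for very low CPU use.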

Justin Pearson

Mar 8, 2017, 5:14:15 PM
to BeagleBoard
According to this 2015 video from the Embedded Linux Conference, the PRU does not support asynchronous interrupts:


I think there is some sort of PRU interrupt queue, but it does not interrupt the PRU's execution. Your PRU code must explicitly monitor the PRU interrupt queue to check for an interrupt.

Alternatively, I've used a method for ARM/PRU coordination that is similar to what William Hermans described: when the ARM CPU wants to trigger something on the PRU, it writes a 1 to the bottom byte of the PRU data RAM. The PRU continuously monitors this byte to watch for a change.

-Justin

ags

Mar 8, 2017, 6:34:03 PM
to BeagleBoard
Correct - to preserve deterministic execution, the PRU cannot be asynchronously interrupted. Polling (of some form) is required.

Back to the OP: there is a way to register a (non-async) interrupt with the PRU. One can force a system interrupt (any one of the 64 that the PRUSS recognizes) by setting a bit in the Interrupt Status Register. From userspace it looks just like writing to the PRU DRAM, since it is just writing a value to an mmap()'d physical address. The advantage over what's been discussed here is that, depending on how it's set up, it could be faster for the PRU than polling DRAM. I will have to implement it to provide actual measurements.
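
A rough sketch of what that write could look like, with loud caveats. The register names and offsets below (INTC at offset 0x20000 inside the PRU-ICSS region, raw status/set registers SRSR0 at 0x200 and SRSR1 at 0x204) are assumptions based on a reading of the AM335x TRM and must be checked against it. The event also has to be enabled and routed to host 0 or 1 in the INTC configuration before it shows up in R31 bit 30/31, and the PRU has to clear it afterwards. The function names are hypothetical, and on real hardware the store should be a single 32-bit access, which struct.pack_into does not strictly guarantee.

```python
import struct

# Assumed AM335x values; check them against the TRM for your part.
PRUSS_INTC_OFFSET = 0x20000   # INTC offset inside the PRU-ICSS region
SRSR0 = 0x200                 # raw status/set register, system events 0-31
SRSR1 = 0x204                 # raw status/set register, system events 32-63


def srsr_target(event: int):
    """Return (register offset within INTC, bit mask) for a system event."""
    if not 0 <= event < 64:
        raise ValueError("system event must be in 0..63")
    if event < 32:
        return SRSR0, 1 << event
    return SRSR1, 1 << (event - 32)


def raise_event(pruss, event: int) -> None:
    """Write the set bit for event into a mapped PRU-ICSS region."""
    reg, mask = srsr_target(event)
    struct.pack_into("<I", pruss, PRUSS_INTC_OFFSET + reg, mask)
```

Here pruss would be a writable mapping of the whole PRU-ICSS region (for example obtained through /dev/mem or the UIO driver); a plain bytearray is enough to exercise the arithmetic off-target.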

William Hermans

Mar 8, 2017, 7:14:56 PM
to beagl...@googlegroups.com

So I remember asking Charles, some time ago, whether it would be more efficient, from the ARM side, to write to DRAM or to one of the PRU's shared memory areas. I think that in this one case, writing to one of the PRU's memory areas might be more efficient. My reasoning is that from userspace one would have to write to memory through /dev/mem anyhow, so why not write to a memory location where the PRU has single-cycle read speeds? One *would* have to take extra care to make sure this memory location is correct, but no more so than when writing into DRAM.

Something to think about anyhow.

ags

Mar 10, 2017, 4:54:00 PM
to BeagleBoard
I've had a hard time getting any definitive responses to questions on the subject of memory access & latency. It is true that the PRU cores have faster access to DRAM that is part of the PRU-ICSS (through the 32-bit interconnect SCR) - though not single-cycle - than to system DDR. However, the ARM core accesses DDR through L3 fabric, but the PRU-ICSS through L4FAST, so I'm thinking that it can access DDR faster than PRU-ICSS memory.

I've also asked about differences in latency/throughput/contention between the PRU-ICSS 12KB shared RAM and the 8KB data RAMs. No response. Since both 8KB data RAMs are accessible to both PRU cores, I'm not sure what the benefit of the 12KB shared RAM is (though I imagine there is one, I just can't figure it out).

Lastly, and even more importantly, I totally agree that you have to be careful to access any memory correctly. I have posted several times asking about the am335x_pru_package examples (using UIO). In at least one (https://github.com/beagleboard/am335x_pru_package/blob/master/pru_sw/example_apps/PRU_PRUtoPRU_Interrupt/PRU_PRUtoPRU_Interrupt.c), there is hardcoded use of the first 8 bytes of physical memory at 0x8000_0000. I don't see how that can be OK. It may be that I don't know some secrets of Linux internals, but from a theoretical perspective, I just don't know how one can assume that any part of main memory is not in use by another process unless that is guaranteed by the kernel.

William Hermans

Mar 10, 2017, 9:24:59 PM
to beagl...@googlegroups.com

So here is what I meant. Of course, I have no personal hands-on experience here, but looking at things from 35k feet, I *know* that writing directly to the PRU shared memory from userspace through /dev/mem would be, performance-wise, just as fast as writing to the 512M of system DDR. On the PRU side, however, the PRUs would have single-cycle access to their own memory. So the tricky part for me would not be making sure we're writing to the right memory location, but knowing that it's possible to begin with, because I have not attempted this personally. In fact, my hands-on experience with the PRU is limited to setting up a couple of examples and proving to myself that they would work with a 4.x kernel.

So my only real "concern" is whether it really is possible to mmap() the physical address of the PRU's shared memory, and whether that could be done "safely". But I do know that if it is possible, it would be faster, from the PRU side, than reading and writing the system's 512M DDR, because of the fabric latency. Not only that, from what I've read in the past, accessing devices or memory through that fabric can add a little bit of non-deterministic latency. So my thinking here is that "we'd" gain back the little bit of determinism that we lost using DDR.

After that, I have no idea how important what I'm talking about is to you and your project. Address 0x80000000, though, I seem to recall is possibly related to the kernel, or perhaps the initrd. Another thing I do not pretend to understand 100% is how Linux virtual memory works. When we say we're accessing "physical memory" through mmap(), we're actually accessing the device modules or external memory through virtual memory. It could very well be that the person who wrote the UIO PRU examples knew this going in, and that it's not by accident at all, but rather by design. I'd have to look further into the gory details of everything before I could make this determination.

William Hermans

Mar 10, 2017, 9:39:05 PM
to beagl...@googlegroups.com
Thinking on it for a little longer, I almost want to say that address 0x80000000 is actually the start of Linux's virtual memory map. But I'm not 100% sure.
I'm doing my own research for a paying project, so I can't really dive into documentation for something else right now . . .

William Hermans

Mar 10, 2017, 9:59:39 PM
to beagl...@googlegroups.com
OK, according to some documentation I was able to find quickly, address 0x80000000 is the base address for the start of the DDR memory on the TI EVM board, which is very similar to the BeagleBone in memory layout.

William Hermans

Mar 10, 2017, 10:30:25 PM
to beagl...@googlegroups.com
So I would say that it is not by accident that the base address of 0x80000000 works. In fact, if you think about it a little bit . . . Read the opening paragraph labeled "purpose", and replace "DSP" with "PRU" for all intents and purposes of this discussion.

Woody Stanford

Mar 12, 2017, 8:11:29 PM
to beagl...@googlegroups.com
William,

Thank you so much for this information. It will really help with that thread I'm doing on BB. I'm just trying to get the P8/9 up on my little BBBW. It's nice having a little insight into the internals of them... I'm happy about as much information as I can get. I'm not quite finished reading all my emails, but give me time.

If I have any questions can I bounce them off you, bro? It takes me a little while to get things done but I'm starting your email. I do GREAT in burst mode, so feel free to continue communicating as you have done.

Woody.
 



ags

Mar 14, 2017, 12:24:48 AM
to BeagleBoard
@William Hermans like you I won't be able to dig into the gory details of loading Linux. This is an interesting read (albeit high-level and prompting more questions). I think I can say a few things without understanding all the details:

It is correct (from detailed reading of the TI TRM) that 0x80000000 is the physical memory address of the L3 DDR.
If Linux is leaving any physical memory unmapped, unused - that's a shame. Wasted precious resource.
The PRUSS UIO driver allocates memory and exposes the physical address in userspace. If this is not used, it is also a precious wasted resource.

Now comes the subjective stuff:

I'm going to presume that Linux isn't stupid, and not count on it leaving permanently-allocated and undocumented physical memory addresses available for those that know the secret handshake.
I will use the memory allocated by the PRUSS UIO driver to communicate between userspace and the PRU-ICSS.
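
As a sketch of how the driver-allocated buffer can be located from userspace: the generic UIO framework publishes each memory region under /sys/class/uio/uio<n>/maps/map<k>/ with addr and size files. The helper name below is hypothetical, and which map index holds the DDR buffer allocated by the PRUSS UIO driver is an assumption to verify on the target (the name file in the same directory can help).

```python
import os


def read_uio_map(index: int, uio_root: str = "/sys/class/uio/uio0"):
    """Return (physical address, size) of UIO memory map `index` from sysfs."""
    map_dir = os.path.join(uio_root, "maps", f"map{index}")
    with open(os.path.join(map_dir, "addr")) as f:
        addr = int(f.read().strip(), 16)   # sysfs prints e.g. "0x9e6c0000"
    with open(os.path.join(map_dir, "size")) as f:
        size = int(f.read().strip(), 16)
    return addr, size
```

The physical address is what the PRU firmware would be given; userspace would normally reach the same region by calling mmap() on /dev/uio<n> with an offset of k times the page size, per the UIO convention, rather than by going through /dev/mem.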

If someone from TI/BeagleBoard.org responds with clarification on where I'm incorrect, I'll adjust my position. As of now, I've been asking this same question for over two years and have gotten no definitive response. Does anyone know who came up with the am335x_pru_package examples?

Thanks for your input and replies. Much appreciated.

John Syne

Mar 14, 2017, 4:20:57 AM
to beagl...@googlegroups.com
On Mar 13, 2017, at 9:24 PM, ags <alfred.g...@gmail.com> wrote:

> @William Hermans like you I won't be able to dig into the gory details of loading Linux. This is an interesting read (albeit high-level and prompting more questions). I think I can say a few things without understanding all the details:
>
> It is correct (from detailed reading of the TI TRM) that 0x80000000 is the physical memory address of the L3 DDR.
> If Linux is leaving any physical memory unmapped, unused - that's a shame. Wasted precious resource.
> The PRUSS UIO driver allocates memory and exposes the physical address in userspace. If this is not used, it is also a precious wasted resource.
>
> Now comes the subjective stuff:
>
> I'm going to presume that Linux isn't stupid, and not count on it leaving permanently-allocated and undocumented physical memory addresses available for those that know the secret handshake.
> I will use the memory allocated by the PRUSS UIO driver to communicate between userspace and the PRU-ICSS.
>
> If someone from TI/BeagleBoard.org responds with clarification on where I'm incorrect, I'll adjust my position. As of now, for over two years I've been asking this same question and gotten no definitive response. Anyone know who came up with the am335x_pru_package examples?

Please understand that TI has nothing to do with BeagleBoard.org. Also, there is no BeagleBoard.org support staff. We are all users just like yourself, and we volunteer our time to help others. If no one answers your questions, then perhaps your questions are not interesting, or no one has the time to investigate the answers you need. To answer your questions, we would have to read the TRM and then do some experimentation. Why should we do this work for you when you can do it for yourself?

Learn how to use the tools and help yourself. For example, clone the am335x_pru_package repo and then run “git blame <file.c>”; it will give you the e-mail of the person who wrote each line of code in <file.c>. Pick up a good book on Git, as this is a very powerful tool.

Regards,
John

Thanks for your input and replies. Much appreciated.

On Friday, March 10, 2017 at 7:30:25 PM UTC-8, William Hermans wrote:
So I would say that it is not by accident that the base address of 0x8000000 works. In fact, if you think about it a little bit. . Read the opening paragraph labeled "purpose", and replace "DSP" with "PRU", for all intents and purposes. of this discussion.


On Fri, Mar 10, 2017 at 7:59 PM, William Hermans <yyr...@gmail.com> wrote:
OK, according to some dicumentation I was able to find quickly, address 0x8000000 is the base address for the start of the DDR memory on the TI EVM board. Which is very similar to the beaglebone in memory layout.

On Fri, Mar 10, 2017 at 7:38 PM, William Hermans <yyr...@gmail.com> wrote:
Thinking on it for a little longer, I almost want to say that the Address 0x8000000h is actually the start of Linux's virtual memory map. But I'm not 100% sure.
 I'm doing my own research for a paying project, so can't really dive into documentation for something else right now . . .

On Fri, Mar 10, 2017 at 7:24 PM, William Hermans <yyr...@gmail.com> wrote:


On Fri, Mar 10, 2017 at 2:53 PM, ags <alfred.g...@gmail.com> wrote:
I've had a hard time getting any definitive responses to questions on the subject of memory access & latency. It is true that the PRU cores have faster access to DRAM that is part of the PRU-ICSS (through the 32-bit interconnect SCR) - though not single-cycle - than to system DDR. However, the ARM core accesses DDR through L3 fabric, but the PRU-ICSS through L4FAST, so I'm thinking that it can access DDR faster than PRU-ICSS memory.

I've also asked about differences in latency/throughput/contention comparing PRU-ICSS 12KB shared RAM v the 8KB data RAM. No response. Since both 8K data RAM is accessible to both PRU cores, I'm not sure what the benefit of the 12KB shared RAM is (thought I imagine there is, I just can't figure it out).

Lastly - and even more importantly - is total agreement that you have to be careful about accessing any memory correctly. I have posted several times asking about the am335x_pru_package examples (using UIO). In at least one (https://github.com/beagleboard/am335x_pru_package/blob/master/pru_sw/example_apps/PRU_PRUtoPRU_Interrupt/PRU_PRUtoPRU_Interrupt.c), there is hardcoded use of the first 8 bytes of physical memory at 0x8000_0000. I don't see how that can be OK. It may be that I don't know some secrets of Linux internals, but from a theoretical perspective, I just don't know how one can make the assumption that any part of main memory is not in use by another process unless it is guaranteed by the kernel.


So here is what I meant. Of course, I have no personal hands-on experience with this, and I'm looking at things from 35k feet. I *know* that writing directly to the PRU shared memory from userspace, through /dev/mem, would be, performance-wise, just as fast as writing to the 512M of system DDR. On the PRU side, however, the PRUs would have single-cycle access to their own memory. So the tricky part for me here would not be making sure we're writing to the right memory location, but knowing whether it's possible to begin with, because I have not attempted this personally. In fact, my hands-on experience with the PRU is limited to setting up a couple of examples and proving to myself that they would work with a 4.x kernel.

So my only real "concern" is whether it really is possible to mmap() the physical address of the PRU's shared memory, and whether that could be done "safely". But I do know that if it is possible, then from the PRU side it would be faster than reading and writing the system's 512M DDR, because of the fabric latency. Not only that, from what I've read in the past, accessing devices or memory through that fabric can add a little bit of non-deterministic latency. So my thinking here is that "we'd" gain back the little bit of determinism that we lost using DDR.

After that, I have no idea how important what I'm talking about is to you, given your project. Address 0x80000000, though, I seem to recall is possibly related to the kernel, or perhaps the initrd. Another thing that I do not pretend to know 100% is how Linux virtual memory works. When we say we're accessing "physical memory" through mmap(), we're actually accessing the device modules, or external memory, through virtual memory. It could very well be that the person who wrote the UIO PRU examples knew this going in, and that it's not by accident at all, but rather by design. I'd have to look further into the gory details of everything before I could make this determination.
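
A minimal sketch of the mmap() idea discussed above, not a tested BeagleBone recipe. mmap() only accepts page-aligned file offsets, so the helper rounds the requested physical address down to a page boundary and hands back a pointer adjusted by the remainder. The names map_phys_window, unmap_phys_window and struct phys_window are hypothetical, and the PRU-ICSS addresses in the macros are assumptions taken from the AM335x memory map that should be checked against the TRM.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed AM335x values; verify against the TRM before relying on them. */
#define PRUSS_BASE_PHYS      0x4A300000UL   /* PRU-ICSS base address */
#define PRUSS_SHARED_RAM_OFF 0x00010000UL   /* 12KB shared RAM offset */
#define PRUSS_SHARED_RAM_LEN (12UL * 1024UL)

/* Hypothetical bookkeeping for one mapped window. */
struct phys_window {
    void  *map_base;          /* page-aligned address returned by mmap() */
    size_t map_len;           /* number of bytes actually mapped */
    volatile uint32_t *ptr;   /* points at the requested offset */
};

/*
 * Map len bytes starting at byte offset phys of the file behind fd.
 * phys must be 4-byte aligned because ptr is a 32-bit pointer.
 * Returns 0 on success and -1 on failure.
 */
int map_phys_window(int fd, off_t phys, size_t len, struct phys_window *w)
{
    assert(w != NULL && len > 0);

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    /* Round the offset down to a page boundary, remember the remainder. */
    off_t aligned = phys & ~((off_t)page - 1);
    size_t delta = (size_t)(phys - aligned);

    void *base = mmap(NULL, len + delta, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, aligned);
    if (base == MAP_FAILED)
        return -1;

    w->map_base = base;
    w->map_len = len + delta;
    w->ptr = (volatile uint32_t *)((uint8_t *)base + delta);
    return 0;
}

/* Release a window obtained from map_phys_window(). */
void unmap_phys_window(struct phys_window *w)
{
    munmap(w->map_base, w->map_len);
}
```

On a BeagleBone the file descriptor would presumably come from open("/dev/mem", O_RDWR | O_SYNC) as root, with PRUSS_BASE_PHYS + PRUSS_SHARED_RAM_OFF as the offset and PRUSS_SHARED_RAM_LEN as the length. Whether that access is safe still depends on the PRU-ICSS being enabled and on nothing else owning the region, which is exactly the open question in this thread. The pointer is volatile so that a flag written from the ARM side is actually stored rather than cached in a register by the compiler.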



 

--
For more options, visit http://beagleboard.org/discuss
---
You received this message because you are subscribed to the Google Groups "BeagleBoard" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beagleboard...@googlegroups.com.

roberts...@yahoo.com

unread,
Mar 14, 2017, 4:25:00 AM3/14/17
to beagl...@googlegroups.com

--------------------------------------------
On Tue, 3/14/17, John Syne <john...@gmail.com> wrote:

Subject: Re: [beagleboard] Is there a way to send an interrupt from userspace to the PRU-ICSS?
To: beagl...@googlegroups.com
Date: Tuesday, March 14, 2017, 10:20 AM


On Mar 13, 2017, at 9:24 PM, ags <alfred.g...@gmail.com> wrote:

@William Hermans like you I won't be able to dig into the gory details of loading Linux. This is an interesting read (albeit high-level and prompting more questions). I think I can say a few things without understanding all the details:

It is correct (from detailed reading of the TI TRM) that 0x80000000 is the physical memory address of the L3 DDR.
If Linux is leaving any physical memory unmapped, unused - that's a shame. Wasted precious resource.
The PRUSS UIO driver allocates memory and exposes the physical address in userspace. If this is not used, it is also a precious wasted resource.

Now comes the subjective stuff:

I'm going to presume that Linux isn't stupid, and not count on it leaving permanently-allocated and undocumented physical memory addresses available for those that know the secret handshake.
I will use the memory allocated by the PRUSS UIO driver to communicate between userspace and the PRU-ICSS.

If someone from TI/BeagleBoard.org responds with clarification on where I'm incorrect, I'll adjust my position. As of now, for over two years I've been asking this same question and gotten no definitive response. Anyone know who came up with the am335x_pru_package examples?

Please understand,
The most important legislative act underlying Romania's political and institutional edifice was the Constitution of 28 March 1923. A new constitution was urgently required by the transformations that appeared in Romanian society after the Great Union.