openamp crashing: rsc_table versus device-tree versus linker script versus application settings


justthecook

Feb 14, 2019, 5:28:34 PM2/14/19
to open-amp
Hi,
We are using an AMP solution on Zynq with Linux (Xilinx v4.14) on Core 0 and FreeRTOS v10 on Core 1.  We are seeing the system crash under heavy load across OpenAMP (either 2018.04 or 2017.10).  Our BSP is the 2018.2 release.  We are using a custom Yocto-based Linux build flow with a custom proxy app.

We see failures on Core 1 in the interrupt mask routine, the task-notify-from-ISR routine, or vPortEnterCritical.  On Core 0, I usually see it fail in dma_inv_range or macb_interrupt.  There have been many changes over the years since the release of the original XAPP/UG promoting the Zynq Linux/FreeRTOS AMP solution.

I suspect a DMA or caching issue, but I've exhausted all my resources.

A couple of questions:
1) Does Core 1 still need to mark certain areas as uncacheable?  If so, which addresses (e.g. the entire linker-script region, only the vrings, or ...)?  Or does -DUSE_AMP take care of this?

2) Is there anything special to do on Core 1 for private CPU interrupts (i.e. IRQ 29 for FreeRTOS, IRQ 31 from the FPGA)?  I thought the only IRQs to worry about are shared IRQs (using the map-to-CPU function).

3) Do I need to hack the boot.S file like the original XAPP/UG to mark regions as reserved?  If so, which region?

4) Do I need to disable OCM using Xil_SetTlbAttributes(0xFFFF0000, 0x04de2) on Core 1?  What about Core 0?  Do I need to do anything special in Linux or the Linux build?


Here is my device tree snippet for the carveout:

reserved-memory {
    #address-cells = <1>;
    #size-cells = <1>;
    ranges;
    rproc_0_reserved: rproc@08000000 {
        no-map;
        reg = <0x08000000 0xff00000>; /* start address: 128 MiB, length: 255 MiB */
    };
};

amba {
    elf_ddr_0: ddr@08000000 {
        compatible = "mmio-sram";
        reg = <0x08000000 0x7f00000>; /* start: 128 MiB, len: 127 MiB */
    };
};

remoteproc {
    compatible = "xlnx,zynq_remoteproc";
    vring0 = <15>;
    vring1 = <14>;
    srams = <&elf_ddr_0>;
    interrupt-parent = <&intc>;
    interrupts = <0 34 4 0 35 4 0 36 4>;
    firmware = "freertos";
};

Here is my rsc_table settings snippet:
RING_TX 0x0800_0000
RING_RX 0x0800_4000

RSC_RPROC_MEM 0x1000_0000, 0x1000_0000, 0x0010_0000, 0

Here is my lscript settings snippet:
ddr_0    ORIGIN= 0x0800_0000,  LENGTH=127M

We are using an RPC_BUFF_SIZE of 2048.


Many thanks,
AJ

Jiaying Liang

Feb 15, 2019, 1:19:56 PM2/15/19
to open...@googlegroups.com

From: open...@googlegroups.com [mailto:open...@googlegroups.com] On Behalf Of justthecook
Sent: Thursday, February 14, 2019 2:29 PM
To: open-amp <open...@googlegroups.com>
Subject: [open-amp] openamp crashing: rsc_table versus device-tree versus linker script versus application settings

 

Hi,

We are using an AMP solution on Zynq with Linux (Xilinx v4.14) on Core 0 and FreeRTOS v10 on Core 1.  We are seeing the system crash under heavy load across OpenAMP (either 2018.04 or 2017.10).  Our BSP is the 2018.2 release.  We are using a custom Yocto-based Linux build flow with a custom proxy app.

 

We see failures on Core 1 in the interrupt mask routine, the task-notify-from-ISR routine, or vPortEnterCritical.  On Core 0, I usually see it fail in dma_inv_range or macb_interrupt.  There have been many changes over the years since the release of the original XAPP/UG promoting the Zynq Linux/FreeRTOS AMP solution.

 

I suspect a DMA or caching issue, but I've exhausted all my resources.

 

A couple of questions:

1) Does Core 1 still need to mark certain areas as uncacheable?  If so, which addresses (e.g. the entire linker-script region, only the vrings, or ...)?  Or does -DUSE_AMP take care of this?

[Wendy] The shared memory between Core 0 and Core 1 needs to be uncacheable.  -DUSE_AMP will not take care of it; you will need to map that memory as uncacheable from the application.
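A minimal sketch of that mapping (an illustration, not the official recipe: it assumes the Xilinx standalone/FreeRTOS BSP on the Cortex-A9, and that the vrings and shared buffers sit at 0x1000_0000 with a length of 1 MiB, per the RSC_RPROC_MEM entry earlier in the thread; Xil_SetTlbAttributes() operates on 1 MiB MMU sections):

#include "xil_mmu.h"

#define SHM_BASE 0x10000000U   /* assumed: vrings + shared buffers, from RSC_RPROC_MEM */
#define SHM_SIZE 0x00100000U   /* assumed: 1 MiB */

void mark_shared_mem_uncacheable(void)
{
    UINTPTR addr;

    /* 0x04de2 is the same attribute value the thread uses for OCM in
     * question 4: normal memory, non-cacheable.  One call per 1 MiB section. */
    for (addr = SHM_BASE; addr < SHM_BASE + SHM_SIZE; addr += 0x00100000U) {
        Xil_SetTlbAttributes(addr, 0x04de2U);
    }
}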

 

2) Is there anything special to do on Core 1 for private CPU interrupts (i.e. IRQ 29 for FreeRTOS, IRQ 31 from the FPGA)?  I thought the only IRQs to worry about are shared IRQs (using the map-to-CPU function).

[Wendy] As Linux is already running on Core 0, Core 1 cannot reinitialize the GIC.  -DUSE_AMP=1 will take care of that.

 

3) Do I need to hack the boot.S file like the original XAPP/UG to mark regions as reserved?  If so, which region?

[Wendy] It depends on which XAPP you are using.  You can raise this question on the Xilinx support portal.

 

4) Do I need to disable OCM using Xil_SetTlbAttributes(0xFFFF0000, 0x04de2) on Core 1?  What about Core 0?  Do I need to do anything special in Linux or the Linux build?

[Wendy] When Linux runs, it doesn't use OCM.  It is not clear whether your application running on Core 1 uses OCM.

[Wendy] Just to isolate the issue: let the FreeRTOS application loop in main without doing any IPC, to see whether you still have problems.  That will tell you whether the boot code needs to change, since you mentioned you also see issues on Core 0 after you load Core 1.
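For what it's worth, the isolation build can be as small as this sketch (plain FreeRTOS, no OpenAMP, no shared-memory access; names are illustrative):

#include "FreeRTOS.h"
#include "task.h"

static void idle_spin_task(void *arg)
{
    (void)arg;
    for (;;) {
        vTaskDelay(pdMS_TO_TICKS(1000));  /* no IPC, no shared memory touched */
    }
}

int main(void)
{
    xTaskCreate(idle_spin_task, "spin", configMINIMAL_STACK_SIZE, NULL,
                tskIDLE_PRIORITY + 1, NULL);
    vTaskStartScheduler();
    for (;;) { }  /* never reached */
}

If Core 0 still destabilizes with this image loaded, the problem is in the boot/MMU setup rather than in the IPC path.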

 

Best Regards,

Wendy

 

 



justthecook

Feb 19, 2019, 4:00:56 PM2/19/19
to open-amp
My responses inline...


On Friday, February 15, 2019 at 1:19:56 PM UTC-5, Jiaying Liang wrote:

 

 


 

Hi,

We are using an AMP solution on Zynq with Linux (Xilinx v4.14) on Core 0 and FreeRTOS v10 on Core 1.  We are seeing the system crash under heavy load across OpenAMP (either 2018.04 or 2017.10).  Our BSP is the 2018.2 release.  We are using a custom Yocto-based Linux build flow with a custom proxy app.

 

We see failures on Core 1 in the interrupt mask routine, the task-notify-from-ISR routine, or vPortEnterCritical.  On Core 0, I usually see it fail in dma_inv_range or macb_interrupt.  There have been many changes over the years since the release of the original XAPP/UG promoting the Zynq Linux/FreeRTOS AMP solution.

 

I suspect a DMA or caching issue, but I've exhausted all my resources.

 

A couple of questions:

1) Does Core 1 still need to mark certain areas as uncacheable?  If so, which addresses (e.g. the entire linker-script region, only the vrings, or ...)?  Or does -DUSE_AMP take care of this?

[Wendy] The shared memory between Core 0 and Core 1 needs to be uncacheable.  -DUSE_AMP will not take care of it; you will need to map that memory as uncacheable from the application.


[AJ] OK, I marked it inner-cacheable only, just like in boot.S from XAPP1078.  My region runs from 0x08000000 through 0x17f00000.  One question: does the shared region also include the region identified in the lscript.ld file where the ELF resides?  Or is it the entire region marked in the device tree where rproc_0_reserved is located?

 

2) Is there anything special to do on Core 1 for private CPU interrupts (i.e. IRQ 29 for FreeRTOS, IRQ 31 from the FPGA)?  I thought the only IRQs to worry about are shared IRQs (using the map-to-CPU function).

[Wendy] As Linux is already running on Core 0, Core 1 cannot reinitialize the GIC.  -DUSE_AMP=1 will take care of that.

 

[AJ] What about interrupt nesting with OpenAMP?  Do I have to modify libmetal to allow for nested interrupts?  If I have a high-priority interrupt that can never be interrupted, even by OpenAMP, can I achieve this?
 

3) Do I need to hack the boot.S file like the original XAPP/UG to mark regions as reserved?  If so, which region?

[Wendy] It depends on which XAPP you are using.  You can raise this question on the Xilinx support portal.

 

[AJ] OK. I posted to the Xilinx forums as well.  I saw a post there about disabling branch prediction: https://forums.xilinx.com/t5/OpenAMP/AMP-hangs/m-p/652410
 

4) Do I need to disable OCM using Xil_SetTlbAttributes(0xFFFF0000, 0x04de2) on Core 1?  What about Core 0?  Do I need to do anything special in Linux or the Linux build?

[Wendy] When Linux runs, it doesn't use OCM.  It is not clear whether your application running on Core 1 uses OCM.


[AJ] I am not using OCM on Core 1.

 

Here is my device tree snippet for the carveout:

 

reserved-memory {
    #address-cells = <1>;
    #size-cells = <1>;
    ranges;
    rproc_0_reserved: rproc@08000000 {
        no-map;
        reg = <0x08000000 0xff00000>; /* start address: 128 MiB, length: 255 MiB */
    };
};

amba {
    elf_ddr_0: ddr@08000000 {
        compatible = "mmio-sram";
        reg = <0x08000000 0x7f00000>; /* start: 128 MiB, len: 127 MiB */
    };
};

remoteproc {
    compatible = "xlnx,zynq_remoteproc";
    vring0 = <15>;
    vring1 = <14>;
    srams = <&elf_ddr_0>;
    interrupt-parent = <&intc>;
    interrupts = <0 34 4 0 35 4 0 36 4>;
    firmware = "freertos";
};

 

Here is my rsc_table settings snippet:

RING_TX 0x0800_0000  

[AJ] I fixed this to be 0x1000_0000 as I believe I had an error.

RING_RX 0x0800_4000

[AJ] I also fixed this to be 0x1000_4000 as I believe I had an error.

 

RSC_RPROC_MEM 0x1000_0000, 0x1000_0000, 0x0010_0000, 0

 

Here is my lscript settings snippet:

ddr_0    ORIGIN= 0x0800_0000,  LENGTH=127M

 

We are using an RPC_BUFF_SIZE of 2048.

 

[Wendy] Just to isolate the issue: let the FreeRTOS application loop in main without doing any IPC, to see whether you still have problems.  That will tell you whether the boot code needs to change, since you mentioned you also see issues on Core 0 after you load Core 1.

[AJ] I did try allowing only lwip_main to run on Core 1 across OpenAMP, with lwip_perf.  lwip_perf seems to run fine by itself.  It's when my high-priority interrupt (priority 144) and my other thread (including TCP across OpenAMP) run in conjunction that I get system lock-ups, in either Core 0 (arm_heavy_mb, macb_interrupt, or v7_dma_clean_range) or Core 1 (vPortEnterCritical or ulPortSetInterruptMask).

wendy...@xilinx.com

Feb 27, 2019, 6:59:18 PM2/27/19
to open-amp


On Tuesday, February 19, 2019 at 1:00:56 PM UTC-8, justthecook wrote:
My responses inline...

On Friday, February 15, 2019 at 1:19:56 PM UTC-5, Jiaying Liang wrote:

 

 


 

Hi,

We are using an AMP solution on Zynq with Linux (Xilinx v4.14) on Core 0 and FreeRTOS v10 on Core 1.  We are seeing the system crash under heavy load across OpenAMP (either 2018.04 or 2017.10).  Our BSP is the 2018.2 release.  We are using a custom Yocto-based Linux build flow with a custom proxy app.

 

We see failures on Core 1 in the interrupt mask routine, the task-notify-from-ISR routine, or vPortEnterCritical.  On Core 0, I usually see it fail in dma_inv_range or macb_interrupt.  There have been many changes over the years since the release of the original XAPP/UG promoting the Zynq Linux/FreeRTOS AMP solution.

 

I suspect a DMA or caching issue, but I've exhausted all my resources.

 

A couple of questions:

1) Does Core 1 still need to mark certain areas as uncacheable?  If so, which addresses (e.g. the entire linker-script region, only the vrings, or ...)?  Or does -DUSE_AMP take care of this?

[Wendy] The shared memory between Core 0 and Core 1 needs to be uncacheable.  -DUSE_AMP will not take care of it; you will need to map that memory as uncacheable from the application.


[AJ] OK, I marked it inner-cacheable only, just like in boot.S from XAPP1078.  My region runs from 0x08000000 through 0x17f00000.  One question: does the shared region also include the region identified in the lscript.ld file where the ELF resides?  Or is it the entire region marked in the device tree where rproc_0_reserved is located?
The ELF text and data sections can be cacheable for Core 1; for Core 0, it needs to be the reserved region.
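In linker-script terms, that split could look something like this sketch (addresses taken from the snippets in this thread; the region names are illustrative):

MEMORY
{
    ddr_0 (rwx) : ORIGIN = 0x08000000, LENGTH = 127M  /* ELF text/data: may stay cacheable on Core 1 */
    shm_0 (rw)  : ORIGIN = 0x10000000, LENGTH = 1M    /* vrings + buffers: mapped uncacheable */
}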

 

2) Is there anything special to do on Core 1 for private CPU interrupts (i.e. IRQ 29 for FreeRTOS, IRQ 31 from the FPGA)?  I thought the only IRQs to worry about are shared IRQs (using the map-to-CPU function).

The OpenAMP demo on Zynq uses software-generated interrupts (SGIs) for inter-processor notifications.
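As a sketch of what such a kick looks like with the Xilinx scugic driver (the GIC instance name is an assumption, e.g. the one the FreeRTOS port creates; 15 and 14 are the vring SGI numbers from the device tree above, and which ring/direction each serves depends on the setup):

#include "xscugic.h"

extern XScuGic xInterruptController;  /* assumed: GIC instance from the BSP/port */

/* Notify the peer after updating a vring: raise SGI 15 targeted at CPU 0. */
void notify_core0(void)
{
    XScuGic_SoftwareIntr(&xInterruptController, 15U, 0x01U /* CPU0 target mask */);
}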

[Wendy] As Linux is already running on Core 0, Core 1 cannot reinitialize the GIC.  -DUSE_AMP=1 will take care of that.

 

[AJ] What about interrupt nesting with OpenAMP?  Do I have to modify libmetal to allow for nested interrupts?  If I have a high-priority interrupt that can never be interrupted, even by OpenAMP, can I achieve this?
libmetal does not manage interrupt nesting.  When you set up the interrupt with the GIC controller driver, you will need to specify the interrupt's priority via the GIC controller API.
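For example, something along these lines with the scugic API (a sketch only; the interrupt ID and priorities are illustrative, 144 is the high-priority IRQ value mentioned earlier in the thread, and on the A9 GIC a lower numeric value means higher priority):

#include "xscugic.h"

extern XScuGic xInterruptController;  /* assumed: GIC instance from the BSP/port */

void configure_irq_priorities(void)
{
    /* High-priority FPGA interrupt (example SPI ID 61): priority 144,
     * rising-edge triggered (0x3). */
    XScuGic_SetPriorityTriggerType(&xInterruptController, 61U, 144U, 0x3U);

    /* OpenAMP notification SGIs: numerically higher value = lower priority,
     * so these cannot preempt the FPGA interrupt above. */
    XScuGic_SetPriorityTriggerType(&xInterruptController, 15U, 0xA0U, 0x3U);
    XScuGic_SetPriorityTriggerType(&xInterruptController, 14U, 0xA0U, 0x3U);
}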
 

 

3) Do I need to hack the boot.S file like the original XAPP/UG to mark regions as reserved?  If so, which region?

[Wendy] It depends on which XAPP you are using.  You can raise this question on the Xilinx support portal.

 

[AJ] OK. I posted to the Xilinx forums as well.  I saw a post there about disabling branch prediction: https://forums.xilinx.com/t5/OpenAMP/AMP-hangs/m-p/652410
I don't think you need to hack boot.S to mark regions as reserved; the region can be mapped after boot.
 

4) Do I need to disable OCM using Xil_SetTlbAttributes(0xFFFF0000, 0x04de2) on Core 1?  What about Core 0?  Do I need to do anything special in Linux or the Linux build?

[Wendy] When Linux runs, it doesn't use OCM.  It is not clear whether your application running on Core 1 uses OCM.


[AJ] I am not using OCM on Core 1.
You don't have to disable OCM, then.
I cannot see how the issue is connected to OpenAMP.  If Core 1 impacts Core 0, there are two likely causes: Core 1 sends too many interrupts to Core 0, or Core 1 corrupts Core 0's system memory.  Try mapping Core 0's system memory as read-only or not accessible from Core 1; if Core 0 still locks up, it looks like an interrupt issue.
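A sketch of that experiment from the Core 1 side (heavy on assumptions: Linux owns the first 128 MiB below the carveout; Core 1's own vectors/code do not live in that range; and 0x041e2 is 0x04de2 with the AP[1:0] bits at descriptor bits 11:10 cleared to give AP=b00, i.e. no access; verify the encoding against the ARMv7 short-descriptor format before relying on it):

#include "xil_mmu.h"

#define LINUX_MEM_BASE 0x00000000U  /* assumed: Core 0 / Linux RAM */
#define LINUX_MEM_SIZE 0x08000000U  /* assumed: 128 MiB */

void fence_off_core0_memory(void)
{
    UINTPTR addr;

    /* Map Linux's RAM no-access on Core 1 so any stray write faults
     * immediately instead of silently corrupting Core 0's memory. */
    for (addr = LINUX_MEM_BASE; addr < LINUX_MEM_BASE + LINUX_MEM_SIZE;
         addr += 0x00100000U) {
        Xil_SetTlbAttributes(addr, 0x041e2U);
    }
}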

justthecook

Apr 3, 2019, 3:24:52 PM4/3/19
to open-amp
So we were able to stabilize the system to some degree, although there is still either a bug, an exposed limitation, or a required implementation change.  Here is a summary of the changes I implemented.

1) Reviewed and modified boot.S to carve out DDR (i.e. reserved sections and inner-cacheable mappings).  Ensured the rsc_table and device tree align with these boot.S sections.  I left our boot.S as last modified, since the system is much better now.

2) Implemented the ARM errata 799769 workaround (and the corresponding Zynq errata) pertaining to potential live-lock scenarios.  We implemented wfe (wait-for-event) in our empty loops (see the first sketch after this list).  These changes were mostly in our board support package library source.

3) Re-prioritized all FreeRTOS interrupt and thread priorities (on the second Zynq core) to ensure they align with the FreeRTOS FPU-safe scheme (see the second sketch after this list), and implemented FPU-safe memory functions as suggested by FreeRTOS for the Cortex-A9.  We treat OpenAMP and lwIP as background services priority-wise (the lowest FPU-safe level); this helps ensure priority inversion does not occur.

4) Modified our FreeRTOS application linker script and our FreeRTOS libsrc port_asm_vectors.S to align with the official FreeRTOS v10.x demo releases.  There were some differences between Xilinx's port and the official FreeRTOS demo, which actually demonstrates how FPU-safe IRQ contexts can be achieved.
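The loop change in (2) is tiny; a sketch of the pattern (illustrative, the real sites are in our BSP sources):

/* Replace tight polling loops like
 *     while (!flag) { }
 * with a WFE-based wait so the spinning core backs off the bus, per the
 * ARM errata 799769 note above.  The waking side pairs this with an
 * event (SEV) after updating the flag. */
static inline void cpu_relax(void)
{
    __asm__ volatile ("wfe" ::: "memory");
}

volatile int flag;

void wait_for_flag(void)
{
    while (!flag) {
        cpu_relax();
    }
}

And the priority split in (3) comes down to a couple of FreeRTOSConfig.h settings in the Cortex-A9 port (a sketch; the values are illustrative and must be checked against your port, where the GIC priority register value is the unique priority shifted left by 3, so register value 144 corresponds to unique priority 18):

/* FreeRTOSConfig.h excerpt (sketch).  Interrupts logically above
 * configMAX_API_CALL_INTERRUPT_PRIORITY must never call FreeRTOS API
 * functions; that is where a "never interrupted" IRQ has to live. */
#define configUNIQUE_INTERRUPT_PRIORITIES       32
#define configMAX_API_CALL_INTERRUPT_PRIORITY   18  /* GIC register value 18 << 3 = 144 */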

From a stability standpoint, we are less likely to lock up the system (as was previously happening).  Now I can get certain second-core threads to essentially stall under certain OpenAMP demands.  This usually occurs when we have a lot of file and TCP activity at the same time.  Our TCP activity is 2-3 Mbit/s at best.  Our file activity is heaviest during file renames and moves from the FreeRTOS side, where there are a lot of fopen/fread/fwrite calls, etc.  Our files are at most 512 KB.

So this begs the question: what performance should be achievable via OpenAMP?  Has there been any benchmarking on dual-core/multi-core systems (i.e. ARM Cortex-A9)?  Is there a reasonable fundamental limit on the number of interrupts that can be taken in a given number of CPU clock cycles at a given DDR performance, which we should expect not to exceed?  Pertaining to the too-many-interrupts comment at the bottom of the last post: how many is too many?  I can essentially stall second-core threads as expected, though they usually don't recover.  I can iperf through the first core into the second core (using lwIP's perf server) and usually survive short bursts.  I can usually achieve 23 Mbit/s of bandwidth during short bursts.

Since our implementation runs both a file service and TCP over a single pair of vrings (IPIs 14, 15), does this inherently create a limitation?  Do we need to add a separate set of IPIs for TCP?  Or do we need to increase the vring/virtio buffers, or create a separate set?  Are there any built-in diagnostics in OpenAMP to gauge how close we are to filling the buffers?
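On the buffer question: the vring depth lives in the resource table, so the knob looks roughly like this (a sketch; the field layout follows OpenAMP's struct fw_rsc_vdev_vring, the da values and notify IDs are the corrected ones from this thread, and .num = 256 is purely illustrative; it must be a power of two and the rings must still fit inside the carveout):

#include <openamp/remoteproc.h>  /* struct fw_rsc_vdev_vring */

/* Deeper rings = more RPMsg buffers in flight before the sender blocks. */
struct fw_rsc_vdev_vring ring_tx = {
    .da = 0x10000000, .align = 0x1000, .num = 256, .notifyid = 15,
};
struct fw_rsc_vdev_vring ring_rx = {
    .da = 0x10004000, .align = 0x1000, .num = 256, .notifyid = 14,
};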


Thank you,
AJ