Oprofile on Pandaboard / Omap4

D. Arger

Dec 27, 2010, 4:57:08 PM
to pandaboard
Hello,

Based on the information at

http://omappedia.org/wiki/Android_Debugging#OProfile_on_OMAP4

it seems the Oprofile infrastructure is working fine on OMAP4/
pandaboard in Android, but I can't seem to get it to cooperate
in Angstrom and Ubuntu.

I'm currently running with kernel-omap4-ti-2.6.35-omap4-L24.11-p5
kernel, configured to enable Oprofile, and with the newest CVS
Oprofile userland. I can successfully list the counters, can
configure events, etc., but get no counter data. I'm fairly familiar
with Oprofile, and have used it for getting counter information on
Beagle-xM, and several other platforms.

On pandaboard however, it seems Oprofile fails internally, since I
see the following in syslog:

hw perfevents: unable to reserve pmu

It seems the PMU is not getting configured correctly in
arch/arm/mach-omap2/devices.c, which has no omap4 case for initializing
the PMU in the kernel. (BTW, does anyone know what the omap4 init is in
mach-omap2?!)
I found a reference to similar questions in a discussion between 'mru'
and 'robclark' on IRC:

http://pandaboard.org/pbirclogs/index.php?date=2010-10-31

Anyone have any ideas what the problem might be?

Cheers,
Dennis Arger


8<- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

# opcontrol -l
oprofile: available events for CPU type "ARM Cortex-A9"

See Cortex-A9 Technical Reference Manual
Cortex A9 DDI (ARM DDI 0388E, revision r2p0)
PMNC_SW_INCR: (counter: 1, 2, 3, 4, 5, 6)
Software increment of PMNC registers (min count: 500)
...
<more detail deleted>


# opcontrol --status
Daemon running: pid 4026
Event 0: CPU_CYCLES:25000000:0:1:1
Separate options: library
vmlinux file: /boot/vmlinux-2.6.35.3
Image filter: none
Call-graph depth: 0

etc.

Måns Rullgård

Dec 29, 2010, 12:23:47 PM
to panda...@googlegroups.com
"D. Arger" <darger.p...@gmail.com> writes:

> Hello,
>
> Based on the information at
>
> http://omappedia.org/wiki/Android_Debugging#OProfile_on_OMAP4
>
> it seems the Oprofile infrastructure is working fine on OMAP4/
> pandaboard in Android, but I can't seem to get it to cooperate
> in Angstrom and Ubuntu.
>
> I'm currently running with kernel-omap4-ti-2.6.35-omap4-L24.11-p5
> kernel, configured to enable Oprofile, and with the newest CVS
> Oprofile userland. I can successfully list the counters, can
> configure events, etc., but get no counter data. I'm fairly familiar
> with Oprofile, and have used it for getting counter information on
> Beagle-xM, and several other platforms.
>
> On pandaboard however, it seems Oprofile fails internally, since I
> see the following in syslog:
>
> hw perfevents: unable to reserve pmu
>
> It seems the PMU is not getting configured correctly in
> arch/arm/mach-omap2/devices.c, which has no omap4 case for initializing
> the PMU in the kernel. (BTW, does anyone know what the omap4 init is in
> mach-omap2?!)
> I found a reference to similar questions in a discussion between 'mru'
> and 'robclark' on IRC:
>
> http://pandaboard.org/pbirclogs/index.php?date=2010-10-31
>
> Anyone have any ideas what the problem might be?

The cross-trigger interrupts need to be configured. Unfortunately,
the public TRM does not reveal how to do this.

--
Måns Rullgård
ma...@mansr.com

Woodruff, Richard

Dec 29, 2010, 4:00:58 PM
to panda...@googlegroups.com

> From: panda...@googlegroups.com [mailto:panda...@googlegroups.com]
> On Behalf Of Måns Rullgård

> The cross-trigger interrupts need to be configured. Unfortunately,
> the public TRM does not reveal how to do this.

I think necessary bits for GP-device are incorporated in patches referenced on wiki. Secure devices also need a debug firewall dropped.

After asking around it seems the actual routing matches the ARM examples which can be found in public ARM Coresight docs. I'll attach more info below which can speed up understanding + ARM docs.

From emulation designer:
------------------------
TI has done nothing special. It is exactly done per ARM specs and ARM specs are the ones to follow.

The PMUIRQ (CPU_0) is routed via CTI_0 TRIGIN[1] to CTI. This can be routed via any CTI channel (0..3) to TRIGOUT[6] which is the MA_IRQ_1 seen by GIC.

The PMUIRQ(CPU_1) is routed via CTI_1 TRIGIN[1]. This can be routed via any CTI channel (0..3) to TRIGOUT[6] which is the MA_IRQ_2 seen by GIC.

CTI0 (Cortex_A9_0) : Base address is 0x54148000 (App view)
CTI1 (Cortex-A9-1): Base address is 0x54149000 (App view)

Step 0: CTIx: unlock the module so application writes can go through (x = 0 and 1)

*(base + 0xFB0) = 0xC5ACCE55

Step 1: Use channel 2 on CTI0 to map TRIGIN[1] to TRIGOUT[6] (channel 2 is just an example by the way)

--- Enable TRIGIN[1] to go to channel 2
*(base + 0x024) |= (1 << 2)

--- Enable channel 2 to go to TRIGOUT[6]
*(base + 0x0B8) |= (1 << 2)

--- Global enable for CTIx module
*(base) = 0x1

This will cause the interrupt to be generated to the GIC - GIC will have to be configured per interrupt table in Chiron spec for the CTI0 interrupt source.

To clear the interrupt in the ISR:

Disable interrupt source in PMU (need to review PMU documentation in case this is needed)
Clear the interrupt in CTI (do a SW acknowledge)
*(base + 0x10) |= (1 << 6) // CTI Interrupt acknowledge register

Step 2: Use channel 3 on CTI1 to map TRIGIN[1] to TRIGOUT[6] (channel 3 is just an example also)

Picking two different channels may avoid issues. Alternatively you could pick same channel and use CTIGATE register to prevent channel propagation from CTIx to CTM to CTIx+1 (x=0).

--- Enable TRIGIN[1] to go to channel 3
*(base + 0x024) |= (1 << 3)

--- Enable channel 3 to go to TRIGOUT[6]
*(base + 0x0B8) |= (1 << 3)

------------------------------------
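
The sequence above can be sketched in C roughly as follows. This is only a sketch under assumptions, not code from the referenced patches: the helper names (cti_write, cti_read, cti_route_pmu_irq, cti_ack_pmu_irq) are made up for illustration, `base` is assumed to be an already-mapped pointer to one CTI's register block, and the acknowledge is done as a plain write on the assumption that the acknowledge register is write-only, as in the ARM CoreSight documentation.

```c
#include <assert.h>
#include <stdint.h>

/* CTI register offsets as used in the steps above. */
#define CTI_CONTROL     0x000u               /* global enable */
#define CTI_INTACK      0x010u               /* interrupt acknowledge */
#define CTI_INEN(n)     (0x020u + 4u * (n))  /* TRIGIN[n] -> channel map */
#define CTI_OUTEN(n)    (0x0A0u + 4u * (n))  /* channel -> TRIGOUT[n] map */
#define CTI_LOCKACCESS  0xFB0u               /* lock access register */
#define CTI_UNLOCK_KEY  0xC5ACCE55u

/* Hypothetical helpers: 32-bit access at a byte offset from base. */
static void cti_write(volatile uint32_t *base, uint32_t off, uint32_t val)
{
    base[off / 4] = val;
}

static uint32_t cti_read(volatile uint32_t *base, uint32_t off)
{
    return base[off / 4];
}

/* Route the PMU interrupt (TRIGIN[1]) to TRIGOUT[6] over one channel. */
static void cti_route_pmu_irq(volatile uint32_t *base, unsigned channel)
{
    assert(channel < 4);                              /* channels 0..3 */
    cti_write(base, CTI_LOCKACCESS, CTI_UNLOCK_KEY);  /* step 0: unlock */
    cti_write(base, CTI_INEN(1),
              cti_read(base, CTI_INEN(1)) | (1u << channel));
    cti_write(base, CTI_OUTEN(6),
              cti_read(base, CTI_OUTEN(6)) | (1u << channel));
    cti_write(base, CTI_CONTROL, 1);                  /* global enable */
}

/* Acknowledge TRIGOUT[6] from the interrupt handler. */
static void cti_ack_pmu_irq(volatile uint32_t *base)
{
    cti_write(base, CTI_INTACK, 1u << 6);
}
```

With the example channels from the text, this would be called as cti_route_pmu_irq(cti0, 2) and cti_route_pmu_irq(cti1, 3), where cti0 and cti1 are mappings of 0x54148000 and 0x54149000.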

Hope that clears it up some. I think this info was passed on to TRM team to supplement TI TRM.

Regards,
Richard W.


Felix

Jan 11, 2011, 3:50:44 AM
to pandaboard

Has anyone successfully programmed the CTI?
Is any special configuration needed to program it? I tried to program it
from the kernel, a module, a user application, and even u-boot, but the
board always hangs. Even reading 0x54148fb4 (the physical address of the
Lock Status register) hangs.
Any suggestions?

What does this code in their patches do? Is it used to program the CTI?
omap_writel(1, CM_L3INSTR_L3_3_CLKCTRL);
omap_writel(1, CM_L3INSTR_L3_INSTR_CLKCTRL);
omap_writel(2, CM_EMU_CLKSTCTRL);
while ((omap_readl(CM_EMU_CLKSTCTRL) & 0x300) != 0x300);

Thanks
Binwei

On Dec 30 2010, 5:00 am, "Woodruff, Richard" <r-woodru...@ti.com>
wrote:

Binwei Yang

Jan 12, 2011, 1:38:47 AM
to pandaboard

I understand the CM_xxx code now. I also found the whole EMU configuration space can't be accessed.
Here is my code in u-boot, cmd_bdinfo.

*(unsigned int*)0x4A008E20=1;
*(unsigned int*)0x4A008E28=1;
printf("t1: %x, t2:%x\n",*(unsigned int*)0x4A008E20,*(unsigned int*)0x4A008E28);
//print 1 1

*(unsigned int*)0x4a307a00=2;
while ((*(unsigned int*)0x4a307a00 & 0x300) != 0x300);
printf("-- %x\n",*(unsigned int*)0x4a307a00);
//print 302

printf("-- %x\n",*(unsigned int*)(0x54148000+0xfb4));
//System hang here!
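
As a side note, the busy-wait on 0x4a307a00 above spins forever if the bits under the 0x300 mask never come up. A bounded variant could look like the sketch below; wait_for_bits and max_polls are hypothetical names, not part of u-boot or the kernel. It only bounds the polling loop and cannot help with the hang on the CTI read itself, since a bus access that never completes cannot be timed out from software this way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Poll *reg until every bit in mask is set, giving up after max_polls
 * reads. Returns true if the bits came up, false on timeout.
 */
static bool wait_for_bits(const volatile uint32_t *reg, uint32_t mask,
                          unsigned long max_polls)
{
    assert(reg != 0);
    while (max_polls-- > 0) {
        if ((*reg & mask) == mask)
            return true;
    }
    return false;
}
```

In the snippet above, the while loop would become something like: if (!wait_for_bits((volatile uint32_t *)0x4a307a00, 0x300, 100000)) printf("timeout\n"); where the poll count is an arbitrary example value.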


Thanks
Binwei

Felix

Jan 11, 2011, 11:23:15 AM
to pandaboard

Here is the code I used. It's a user-space application. I tried a similar
routine in the kernel and it also hangs. Any clue? Thank you very much in advance!

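#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>

/* Map a physical address range into this process through /dev/mem. */
void *map_to_virt(unsigned int phys, unsigned int size)
{
    void *vaddr;

    int fd = open("/dev/mem", O_RDWR);
    if (fd < 0)
        return NULL;

    vaddr = mmap(NULL, size, PROT_READ|PROT_WRITE,
                 MAP_SHARED, fd, phys);
    return vaddr == MAP_FAILED ? NULL : vaddr;
}

int main(void)
{
    unsigned long CTI[3] = {0x4A008000, 0x4a307000, 0x54148000};

    /* volatile so the compiler really performs every register access */
    volatile unsigned char *wdbase1 = map_to_virt(CTI[0], 4096);
    volatile unsigned char *wdbase2 = map_to_virt(CTI[1], 4096);
    volatile unsigned char *wdbase3 = map_to_virt(CTI[2], 4096);
    if (!wdbase1 || !wdbase2 || !wdbase3) {
        perror("map_to_virt");
        return 1;
    }
    wdbase1 += 0x820;
    wdbase2 += 0xa00;
    wdbase3 += 0xfb4;

    printf("%p\t%p\t%p\n", (void *)wdbase1, (void *)wdbase2, (void *)wdbase3);
    *(volatile unsigned int *)wdbase1 = 1;
    wdbase1 += 8;
    *(volatile unsigned int *)wdbase1 = 1;
    *(volatile unsigned int *)wdbase2 = 3;
    while ((*(volatile unsigned int *)wdbase2 & 0x300) != 0x300);
    printf("%x,%x,%x\n", *(volatile unsigned int *)wdbase1,
           *(volatile unsigned int *)(wdbase1 - 8),
           *(volatile unsigned int *)wdbase2);
    // Everything is OK till here.

    printf("%x\n", *(volatile unsigned int *)wdbase3); // System hang here
    return 0;
}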

Binwei Yang

Jan 17, 2011, 3:29:23 AM
to pandaboard

Seems nobody cares about oprofile :-)

I solved the issue. It happens because the ti-omap4 branch doesn't support CONFIG_PM yet. With TI's dev tree, which does support PM, the CTI can be programmed normally and oprofile works without issue.

Once CONFIG_PM is enabled, prcm_setup_regs is called, which enables autoidle and autogating for all the DPLLs. After that, the CTI can be initialized normally.

please let me know if you need the patch.

Thanks
Binwei

Jayabharath, Goluguri

Jan 17, 2011, 12:16:17 PM
to panda...@googlegroups.com
On Mon, Jan 17, 2011 at 2:29 AM, Binwei Yang <binw...@gmail.com> wrote:

> please let me know if you need the patch.

Most certainly - yes

Also feel free to add any notes at: http://www.omappedia.com/wiki/Android_Debugging#OProfile_on_OMAP4

Best Regards, --Jayabharath

Joshi, Vikas

Jan 17, 2011, 1:00:33 PM
to panda...@googlegroups.com

I have to come back to my desk to see the number. Please send updated number and I will dial in.

 

Vikas

Binwei Yang

Jan 18, 2011, 10:52:15 PM
to panda...@googlegroups.com

Interestingly, the Linux kernel doesn't provide a way to set IRQ priority. On x86, IRQ priority is implied by the vector number, while on ARM a priority can be set for every IRQ. Has anyone considered using this feature in the Linux kernel?

Right now all IRQs have the same priority, so oprofile can't give any clue about what happens inside an IRQ routine. On x86 oprofile uses NMI, so there is no such issue.
I hacked gic_dist_init in gic.c to give the oprofile IRQ a higher priority, though it's ugly. Devices should be able to request this.
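
The hack itself is not shown in this thread. As a rough sketch of what raising one interrupt's priority in the GIC distributor could look like, assuming the standard ARM GIC layout (priority registers starting at offset 0x400, one byte per interrupt, lower value meaning higher priority): gic_set_irq_priority is a hypothetical helper, and the IRQ number and priority value in the usage note are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

#define GIC_DIST_PRI 0x400u  /* start of the interrupt priority registers */

/*
 * Set the 8-bit priority field of one interrupt using a 32-bit
 * read-modify-write, leaving the three neighbouring fields untouched.
 * dist_base points at the (already mapped) GIC distributor registers.
 */
static void gic_set_irq_priority(volatile uint32_t *dist_base,
                                 unsigned irq, uint8_t prio)
{
    assert(irq < 1020);  /* the GIC architecture allows at most 1020 IDs */

    volatile uint32_t *reg = dist_base + GIC_DIST_PRI / 4 + irq / 4;
    unsigned shift = (irq % 4) * 8;
    uint32_t val = *reg;

    val &= ~(0xFFu << shift);        /* clear the old priority */
    val |= (uint32_t)prio << shift;  /* insert the new one */
    *reg = val;
}
```

For example, gic_set_irq_priority(dist, 33, 0x80) would place interrupt 33 above interrupts left at a numerically larger priority value. The real PMU/CTI interrupt number has to come from the platform's interrupt table.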

Binwei Yang

Jan 26, 2011, 7:57:22 PM
to panda...@googlegroups.com

OK, the 2.6.35 kernel keeps interrupts disabled for the whole duration of an IRQ handler. So even if we give the PMU IRQ a higher priority, oprofile still can't profile inside an IRQ handler. It does solve another issue, though: if there are too many interrupts, the PMU's IRQ gets lost.
It's easy to reproduce: run iperf and then oprofile to see what happens.

Linux chooses the simplest way to handle IRQs and discards many features the hardware has. Smart? Stupid?