[PATCH] HACK: OMAP: DSS2: Fix for "DISPC error: GFX_FIFO_UNDERFLOW" on OMAP4

Siarhei Siamashka

Nov 6, 2010, 11:09:58 PM
to panda...@googlegroups.com, Siarhei Siamashka
On pandaboard EA1 (OMAP4430 ES2.0), when doing graphics intensive
operations involving massive updates to the framebuffer, the display
often flickers and eventually turns off with the following error
message in dmesg log:
"omapdss DISPC error: GFX_FIFO_UNDERFLOW, disabling GFX"

The bug can be easily reproduced by running x11perf program:
$ x11perf -copywinwin500

This patch tries to solve the problem by setting burst size to 32
bytes, same as on the other OMAP variants. And it helps. I have not
checked whether larger 64 byte bursts can be safely enabled with
some better fix.

I'm currently using "L24.11" branch from
http://dev.omapzoom.org/?p=integration/kernel-omap4.git;a=summary
---
drivers/video/omap2/dss/display.c | 8 ++------
drivers/video/omap2/dss/dsi.c | 2 +-
drivers/video/omap2/dss/dss.h | 10 ++++++++++
drivers/video/omap2/dss/manager.c | 2 +-
4 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/video/omap2/dss/display.c b/drivers/video/omap2/dss/display.c
index d2b0b69..5cd5233 100644
--- a/drivers/video/omap2/dss/display.c
+++ b/drivers/video/omap2/dss/display.c
@@ -378,12 +378,8 @@ void default_get_overlay_fifo_thresholds(enum omap_plane plane,
{
unsigned burst_size_bytes;

- *burst_size = OMAP_DSS_BURST_16x32;
- if (cpu_is_omap44xx())
- burst_size_bytes = 8 * 128 / 8; /* OMAP4: highest
- burst size is 8x128*/
- else
- burst_size_bytes = 16 * 32 / 8;
+ *burst_size = omap_dss_burst_16x32();
+ burst_size_bytes = 16 * 32 / 8;

*fifo_high = fifo_size - 1;
*fifo_low = fifo_size - burst_size_bytes;
diff --git a/drivers/video/omap2/dss/dsi.c b/drivers/video/omap2/dss/dsi.c
index 06ca944..f37b2e7 100644
--- a/drivers/video/omap2/dss/dsi.c
+++ b/drivers/video/omap2/dss/dsi.c
@@ -3782,7 +3782,7 @@ void dsi_get_overlay_fifo_thresholds(enum omap_plane plane,
{
unsigned burst_size_bytes;

- *burst_size = OMAP_DSS_BURST_16x32;
+ *burst_size = omap_dss_burst_16x32();
burst_size_bytes = 16 * 32 / 8;

*fifo_high = fifo_size - burst_size_bytes;
diff --git a/drivers/video/omap2/dss/dss.h b/drivers/video/omap2/dss/dss.h
index 569d0b2..80d595f 100644
--- a/drivers/video/omap2/dss/dss.h
+++ b/drivers/video/omap2/dss/dss.h
@@ -108,6 +108,16 @@ enum omap_burst_size {
OMAP_DSS_BURST_16x32 = 2,
};

+static inline enum omap_burst_size omap_dss_burst_16x32(void)
+{
+ if (cpu_is_omap44xx()) {
+ /* 4x128bit bursts from OMAP4 have the same size as 16x32 */
+ return 1;
+ } else {
+ return 2;
+ }
+}
+
enum omap_parallel_interface_mode {
OMAP_DSS_PARALLELMODE_BYPASS, /* MIPI DPI */
OMAP_DSS_PARALLELMODE_RFBI, /* MIPI DBI */
diff --git a/drivers/video/omap2/dss/manager.c b/drivers/video/omap2/dss/manager.c
index 2809dcd..69bd50e 100644
--- a/drivers/video/omap2/dss/manager.c
+++ b/drivers/video/omap2/dss/manager.c
@@ -1644,7 +1644,7 @@ int omap_dss_wb_apply(struct omap_overlay_manager *mgr, struct omap_writeback *w
wbc->enabled = true;

wbc->capturemode = wb->info.capturemode;
- wbc->burst_size = OMAP_DSS_BURST_16x32; /* 8x128 - min. for OMAP4 */
+ wbc->burst_size = omap_dss_burst_16x32();
wbc->source = wb->info.source;

/* TODO: Set fifo high, fifo low values ? */
--
1.7.2.2
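To make the threshold arithmetic in the display.c hunk easier to follow, here is a minimal userspace sketch of what default_get_overlay_fifo_thresholds() computes after this patch. Only the two formulas come from the patch. The struct and function names are hypothetical, the burst size is passed in bytes rather than as the hardware enum, and the comments assume the usual DISPC behaviour, where a DMA refill is requested once the FIFO level falls below the low threshold.

```c
#include <stdint.h>

/* Hypothetical container for the two thresholds (in bytes). */
struct fifo_thresholds {
	uint32_t low;	/* refill is requested below this level (assumed semantics) */
	uint32_t high;	/* refill stops once the FIFO reaches this level */
};

/* Same formulas as default_get_overlay_fifo_thresholds() in the patch:
 * high = fifo_size - 1, low = fifo_size - burst_size_bytes. */
struct fifo_thresholds sketch_fifo_thresholds(uint32_t fifo_size,
					      uint32_t burst_bytes)
{
	struct fifo_thresholds t;

	t.high = fifo_size - 1;
	t.low = fifo_size - burst_bytes;
	return t;
}
```

Take an illustrative 1024-byte FIFO (not a documented OMAP4 size). The removed OMAP4 path (128-byte bursts) gives low = 896. The patched path (64-byte bursts) gives low = 960, so refills start earlier and are smaller.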

Siarhei Siamashka

Nov 7, 2010, 12:25:23 AM
to panda...@googlegroups.com
On Sunday 07 November 2010 05:09:58 Siarhei Siamashka wrote:
> This patch tries to solve the problem by setting burst size to 32
> bytes, same as on the other OMAP variants. And it helps. I have not
> checked whether larger 64 byte bursts can be safely enabled with
> some better fix.

These burst sizes are actually 64 and 128 bytes respectively, elementary
math is difficult at 5 AM :)
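The corrected figures follow from multiplying the number of beats by the bus width. The sketch below does that arithmetic. It assumes the value mapping given in the patch comments and in Archit's reply later in the thread: on OMAP4, 2 = 8x128 bit and 1 = 4x128 bit; on OMAP2/3, 2 = 16x32 bit. The helper names are hypothetical, and values not mentioned in the thread are deliberately left unmapped.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bytes per burst = number of beats * bus width in bits / 8. */
uint32_t burst_bytes(uint32_t beats, uint32_t width_bits)
{
	return beats * width_bits / 8;
}

/* Burst size in bytes for the burst-size field values discussed in this
 * thread; returns 0 ("unknown") for any other combination. */
uint32_t dispc_burst_bytes(bool is_omap4, unsigned int field)
{
	if (is_omap4) {
		if (field == 2)
			return burst_bytes(8, 128);	/* 8x128 bit */
		if (field == 1)
			return burst_bytes(4, 128);	/* 4x128 bit */
	} else if (field == 2) {
		return burst_bytes(16, 32);		/* 16x32 bit */
	}
	return 0;
}
```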

--
Best regards,
Siarhei Siamashka

Måns Rullgård

Nov 7, 2010, 6:09:41 AM
to panda...@googlegroups.com
Siarhei Siamashka <siarhei....@gmail.com> writes:

> On pandaboard EA1 (OMAP4430 ES2.0), when doing graphics intensive
> operations involving massive updates to the framebuffer, the display
> often flickers and eventually turns off with the following error
> message in dmesg log:
> "omapdss DISPC error: GFX_FIFO_UNDERFLOW, disabling GFX"
>
> The bug can be easily reproduced by running x11perf program:
> $ x11perf -copywinwin500
>
> This patch tries to solve the problem by setting burst size to 32
> bytes, same as on the other OMAP variants. And it helps. I have not
> checked whether larger 64 byte bursts can be safely enabled with
> some better fix.

Nice find. I still think fixing the DDR clock is key though.

--
Måns Rullgård
ma...@mansr.com

Vincent Stehlé

Nov 8, 2010, 3:52:40 AM
to panda...@googlegroups.com, Måns Rullgård
On 11/07/2010 12:09 PM, Måns Rullgård wrote:
> (..) I still think fixing the DDR clock is key though.

Hi Måns,

Pardon me but I am not aware of any issue with the DDR clock. Could you
please elaborate a bit? Maybe we can fix it.

Best regards,

--
Vincent Stehlé
Systems Engineer - TI France

Vladimir Pantelic

Nov 8, 2010, 4:08:26 AM
to panda...@googlegroups.com
Vincent Stehlé wrote:
> On 11/07/2010 12:09 PM, Måns Rullgård wrote:
>> (..) I still think fixing the DDR clock is key though.
>
> Hi Måns,
>
> Pardon me but I am not aware of any issue with the DDR clock. Could you
> please elaborate a bit? Maybe we can fix it.

Måns has an EA1 with ES2.1 silicon. This is supposed to run the DDR at
400MHz, but it is only stable up to 266MHz.

What we are after is a DPLL_CORE configuration for running ES 2.1 at
400MHz as advertised for the OMAP4.

Måns Rullgård

Nov 8, 2010, 4:30:32 AM
to panda...@googlegroups.com
Vladimir Pantelic <vlad...@gmail.com> writes:

> Vincent Stehlé wrote:
>> On 11/07/2010 12:09 PM, Måns Rullgård wrote:
>>> (..) I still think fixing the DDR clock is key though.
>>
>> Hi Måns,
>>
>> Pardon me but I am not aware of any issue with the DDR clock. Could you
>> please elaborate a bit? Maybe we can fix it.
>
> Måns has an EA1 with ES2.1 silicon. This is supposed to run the DDR at
> 400MHz, but it is only stable up to 266MHz.
>
> What we are after is a DPLL_CORE configuration for running ES 2.1 at
> 400MHz as advertised for the OMAP4.

The missing configuration could be either in DPLL_CORE or EMIF. I
can't tell which based on the information available.

The configuration I found giving 266MHz seems to be running DPLL_CORE
near its limit. I'm hoping someone will reveal a secret way to
configure the dividers differently.

--
Måns Rullgård
ma...@mansr.com

Vincent Stehlé

Nov 8, 2010, 4:34:04 AM
to panda...@googlegroups.com, Vladimir Pantelic
On 11/08/2010 10:08 AM, Vladimir Pantelic wrote:
(..)

> What we are after is a DPLL_CORE configuration for running ES 2.1 at
> 400MHz as advertised for the OMAP4.

I am a bit surprised: all ES2.1 I used were running fine with DDR at 400
MHz.

The MLO in Ubuntu 10.10 preinstalled image for OMAP4 does configure the
DDR to 400 MHz on ES2.1.

http://cdimage.ubuntu.com/ubuntu-netbook/ports/releases/10.10/release/ubuntu-netbook-10.10-preinstalled-netbook-armel+omap4.img.gz

If this is not the s/w you are using already, could you please try to
boot this image on your board?

Måns Rullgård

Nov 8, 2010, 5:29:07 AM
to panda...@googlegroups.com
Vincent Stehlé <v-st...@ti.com> writes:

> On 11/08/2010 10:08 AM, Vladimir Pantelic wrote:
> (..)
>> What we are after is a DPLL_CORE configuration for running ES 2.1 at
>> 400MHz as advertised for the OMAP4.
>
> I am a bit surprised: all ES2.1 I used were running fine with DDR at 400
> MHz.

I have yet to see positive confirmation of one that does. Every board
I've seen believable figures for was running at 200MHz.

> The MLO in Ubuntu 10.10 preinstalled image for OMAP4 does configure the
> DDR to 400 MHz on ES2.1.

How do you know? Where is the source code of this MLO? Perhaps you
could provide a dump of the DPLL_CORE registers with this running.
That would be physical addresses 0x4a004100--0x4a004153.

> http://cdimage.ubuntu.com/ubuntu-netbook/ports/releases/10.10/release/ubuntu-netbook-10.10-preinstalled-netbook-armel+omap4.img.gz
>
> If this is not the s/w you are using already, could you please try to
> boot this image on your board?

This special treatment of ubuntu has to stop. If ubuntu has magical
patches that make things work, they should be integrated upstream
without delay. TI-developed functionality should go _first_ to
upstream, then ubuntu can pick it from there.

--
Måns Rullgård
ma...@mansr.com

Vincent Stehlé

Nov 8, 2010, 6:20:23 AM
to panda...@googlegroups.com, Måns Rullgård, Vladimir Pantelic
On 11/08/2010 11:29 AM, Måns Rullgård wrote:
(400 MHz working with ES2.1)

> I have yet to see positive confirmation of one that does.

I do :)

(..)
> How do you know?

Because I checked the frequencies after booting and I have L3_CORE @200
MHz, LPDDR2 @400 MHz.

(..)

> Where is the source code of this MLO?

On Ricardo's gitorious for example:
http://gitorious.org/~rsalveti/pandaboard/rsalveti-x-loader

Note that you may want to use the Ubuntu kernel with this MLO.

> Perhaps you
> could provide a dump of the DPLL_CORE registers with this running.
> That would be physical addresses 0x4a004100--0x4a004153.

Sure. Here you go (u-boot dump):

Panda # md.l 0x4a004100
4a004100: 00000110 00000000 00000500 00000000 ................
4a004110: 00000000 00000000 00000000 00000000 ................
4a004120: 0000000f 00000001 00000000 00007d05 .............}..
4a004130: 00000301 00000305 00000328 00000324 ........(...$...
4a004140: 00000326 00000325 00000000 00000000 &...%...........
4a004150: 00000000 00000000 00000000 00000000 ................

(Ubuntu 10.10 fine on OMAP4)

> This special treatment of ubuntu has to stop. If ubuntu has magical
> patches that make things work, they should be integrated upstream
> without delay. TI-developed functionality should go _first_ to
> upstream, then ubuntu can pick it from there.

I agree with you that "upstreaming" is the best way to go.

Unfortunately this is not how things went but hopefully we will do
better in the future.

...you can also look at it the other way round: at least Ubuntu has the
correct setup :)

Vladimir Pantelic

Nov 8, 2010, 6:38:06 AM
to Vincent Stehlé, panda...@googlegroups.com, Måns Rullgård
Vincent Stehlé wrote:
> On 11/08/2010 11:29 AM, Måns Rullgård wrote:
> (400 MHz working with ES2.1)
>> I have yet to see positive confirmation of one that does.
>
> I do :)
>
> (..)
>> How do you know?
>
> Because I checked the frequencies after booting and I have L3_CORE @200
> MHz, LPDDR2 @400 MHz.

which freqs does that correspond with in the clock tool?

Måns Rullgård

Nov 8, 2010, 7:04:50 AM
to panda...@googlegroups.com
Vincent Stehlé <v-st...@ti.com> writes:

> On 11/08/2010 11:29 AM, Måns Rullgård wrote:
> (400 MHz working with ES2.1)
>> I have yet to see positive confirmation of one that does.
>
> I do :)
>
> (..)
>> How do you know?
>
> Because I checked the frequencies after booting and I have L3_CORE @200
> MHz, LPDDR2 @400 MHz.

How did you check that? Can you please download and run
http://hardwarebug.org/files/omap4_dramspeed.gz under Linux (as root)?
This will print the actual DDR clock frequency used.

>> Where is the source code of this MLO?
>
> On Ricardo's gitorious for example:
> http://gitorious.org/~rsalveti/pandaboard/rsalveti-x-loader

That one gives 200MHz.

> Note that you may want to use the Ubuntu kernel with this MLO.
>
>> Perhaps you could provide a dump of the DPLL_CORE registers with
>> this running. That would be physical addresses
>> 0x4a004100--0x4a004153.
>
> Sure. Here you go (u-boot dump):
>
> Panda # md.l 0x4a004100
> 4a004100: 00000110 00000000 00000500 00000000 ................
> 4a004110: 00000000 00000000 00000000 00000000 ................
> 4a004120: 0000000f 00000001 00000000 00007d05 .............}..
> 4a004130: 00000301 00000305 00000328 00000324 ........(...$...
> 4a004140: 00000326 00000325 00000000 00000000 &...%...........
> 4a004150: 00000000 00000000 00000000 00000000 ................

This configuration set the DPLL_CORE CLKOUT_M2 output to 800MHz. EMIF
then divides this by 4, giving a DDR clock of 200MHz. Is there
perhaps a secret setting in EMIF to change this divider to something
other than 4? There are a number of undocumented bits there. A dump
of 0x4c000000--0x4c0000ef might tell us something.

> (Ubuntu 10.10 fine on OMAP4)
>> This special treatment of ubuntu has to stop. If ubuntu has magical
>> patches that make things work, they should be integrated upstream
>> without delay. TI-developed functionality should go _first_ to
>> upstream, then ubuntu can pick it from there.
>
> I agree with you that "upstreaming" is the best way to go.

It's funny how everybody always agrees, but nobody ever does anything.

> Unfortunately this is not how things went but hopefully we will do
> better in the future.
>
> ...you can also look at it the other way round: at least Ubuntu has the
> correct setup :)

Maybe. I'm still unconvinced.

--
Måns Rullgård
ma...@mansr.com

Siarhei Siamashka

Nov 8, 2010, 8:47:58 AM
to panda...@googlegroups.com
On Sunday 07 November 2010 13:09:41 Måns Rullgård wrote:
> Siarhei Siamashka <siarhei....@gmail.com> writes:
> > On pandaboard EA1 (OMAP4430 ES2.0), when doing graphics intensive
> > operations involving massive updates to the framebuffer, the display
> > often flickers and eventually turns off with the following error
> > message in dmesg log:
> > "omapdss DISPC error: GFX_FIFO_UNDERFLOW, disabling GFX"
> >
> > The bug can be easily reproduced by running x11perf program:
> > $ x11perf -copywinwin500
> >
> > This patch tries to solve the problem by setting burst size to 32
> > bytes, same as on the other OMAP variants. And it helps. I have not
> > checked whether larger 64 byte bursts can be safely enabled with
> > some better fix.
>
> Nice find.

Well, after a bit more testing, it looks like this patch only helps in the
1280x1024-16@60 display mode: there the "x11perf" test always fails without
the patch and works properly with it. But increasing the color depth to 24 or
going up to 1920x1080 resolution with 16-bit color depth brings back the same
problem, with or without the patch.
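One way to see why the higher modes are harder on the display DMA is to compare their average scanout bandwidth. The rough sketch below counts the active area only. It assumes that "24-bit" depth is stored as 32 bits per pixel in the framebuffer; the thread does not say so. Blanking is ignored, so the instantaneous rate during active lines is somewhat higher than these averages.

```c
#include <stdint.h>

/* Average framebuffer read rate for scanout, in bytes per second:
 * width * height * refresh rate * bytes per pixel (active area only). */
uint64_t scanout_bytes_per_sec(uint32_t width, uint32_t height,
			       uint32_t refresh_hz, uint32_t bytes_per_pixel)
{
	return (uint64_t)width * height * refresh_hz * bytes_per_pixel;
}
```

The three modes compare as follows:

- 1280x1024-16@60 needs about 157 MB/s.
- 1280x1024 at 32 bpp needs about 315 MB/s.
- 1920x1080-16@60 needs about 249 MB/s.

Both failing modes read noticeably more than the mode the patch fixes. This is consistent with a bandwidth or latency explanation, but it does not prove one.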

So it looks more like a band-aid solution, but at least it gives me a more or
less usable Linux desktop (I'll keep an eye on it to see if any of these
problems show up again).

So my original hypothesis that just the 8x128-bit burst mode is broken in
ES2.0 hardware proved to be wrong, or at least partially wrong.

> I still think fixing the DDR clock is key though.

Yes, something definitely looks wrong with the DDR. And we already knew that
ES2.0 has some hardware bugs there which prevented using a higher memory clock
frequency.

But in the case of DSS, the problem does not look like it is purely related to
memory bandwidth. The memory bandwidth is not that bad, and running memcpy
tests in the background does not seem to cause any problems as long as we
don't touch the framebuffer.

Right now I think (pure speculation) that concurrent access to the same memory
address from both the DISPC DMA and the MPU may cause something bad to happen,
and that it is not related to the DMA buffer being too small. Changing the
burst size helps a bit to avoid this problematic condition, but does not solve
the problem completely.

Thinking about it a bit more, there are different strategies for working with
the DISPC DMA buffer. Depending on the values in the DISPC_GFX_BUF_THRESHOLD
register, it can be configured to prefer either frequent small buffer refills
or larger transfers with a bigger interval between them. I think it is worth
checking whether avoiding long consecutive memory reads by the DISPC DMA would
help. Maybe the problem is triggered by large or multiple back-to-back burst
reads from the DISPC DMA happening simultaneously with MPU accesses to the
same address.
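The trade-off between frequent small refills and larger, less frequent ones can be put into numbers. The sketch below is simplified. It assumes that each refill moves roughly (high - low) bytes and that the FIFO then drains back to the low threshold at the scanout rate. It ignores the draining that happens during the refill itself. The threshold values in the example are hypothetical, not actual DISPC_GFX_BUF_THRESHOLD settings.

```c
#include <stdint.h>

/* Approximate time between DMA refills, in nanoseconds, for FIFO
 * thresholds `low` and `high` (bytes) and a scanout drain rate in
 * bytes per second. Each refill moves about (high - low) bytes. */
uint64_t refill_interval_ns(uint32_t low, uint32_t high,
			    uint64_t drain_bytes_per_sec)
{
	return (uint64_t)(high - low) * 1000000000u / drain_bytes_per_sec;
}
```

At about 157 MB/s (1280x1024-16@60), a 63-byte window (low 960, high 1023) means a short read roughly every 400 ns. A 511-byte window (low 512, high 1023) means a longer read roughly every 3.2 µs.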

Marcin Juszkiewicz

Nov 8, 2010, 8:31:13 AM
to panda...@googlegroups.com
On Monday, 8 November 2010 at 13:04:50, Måns Rullgård wrote:
> Vincent Stehlé <v-st...@ti.com> writes:

> > Because I checked the frequencies after booting and I have L3_CORE @200
> > MHz, LPDDR2 @400 MHz.
>
> How did you check that? Can you please download and run
> http://hardwarebug.org/files/omap4_dramspeed.gz under Linux (as root)?
> This will print the actual DDR clock frequency used.

14:26 hrw@panda:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu natty (development branch)
Release: 11.04
Codename: natty
14:27 hrw@panda:~$ sudo ./omap4_dramspeed
99 MHz
14:27 hrw@panda:~$ uname -a
Linux panda 2.6.35-903-omap4 #14-Ubuntu SMP PREEMPT Wed Oct 6 17:23:24 UTC 2010 armv7l GNU/Linux
14:27 hrw@panda:~$ for gpio in 171 101 182;do cat /sys/class/gpio/gpio$gpio/value;done
0
1
1

This is EA1 pandaboard with ES2.1 silicon.

Regards,
--
JID: h...@jabber.org
Website: http://marcin.juszkiewicz.com.pl/
LinkedIn: http://www.linkedin.com/in/marcinjuszkiewicz

Vincent Stehlé

Nov 8, 2010, 8:37:28 AM
to panda...@googlegroups.com, Måns Rullgård
On 11/08/2010 01:04 PM, Måns Rullgård wrote:
(..)

> How did you check that?

I used an internal tool that we have, which analyzes the register values.

> Can you please download and run
> http://hardwarebug.org/files/omap4_dramspeed.gz under Linux (as root)?
> This will print the actual DDR clock frequency used.

vincent@vincent-panda:~$ sudo ./omap4_dramspeed
199 MHz

Either this tool is wrong, or our tool is :)

(..)

> This configuration set the DPLL_CORE CLKOUT_M2 output to 800MHz.
> EMIF then divides this by 4, giving a DDR clock of 200MHz.

CLKOUT_M2 @800 MHz, Fdpll @1600 MHz looks fine to me.

I need to check for EMIF DDR clock.

(..)

> Is there
> perhaps a secret setting in EMIF to change this divider to something
> other than 4?

If there is, I don't know it.

(..)

> A dump
> of 0x4c000000--0x4c0000ef might tell us something.

Here you go, then:

Panda # md.l 0x4c000000
4c000000: 40441403 40000004 80001ab9 00000010 ..D@...@........
4c000010: 00000618 00000618 10eb0662 10eb0662 ........b...b...
4c000020: 20370dd2 20370dd2 00b1c33f 00b1c33f ..7 ..7 ?...?...
4c000030: 11220c0c 11220c0c 00000000 00000000 .."...".........
4c000040: 00000000 00000000 00000000 00000000 ................
4c000050: c0000010 070700ff 9000190a 00042727 ............''..
4c000060: 00002011 00000000 00000000 00000000 . ..............
4c000070: 00000000 00000000 00000000 00000000 ................
4c000080: 03a5bf99 0030fce6 00010000 00000000 ......0.........
4c000090: a0a9d60a 00000000 000501ff 000501ff ................
4c0000a0: 00000000 00000000 00000000 00000000 ................
4c0000b0: 00000000 00000000 00000000 00000000 ................
4c0000c0: 00000000 00000000 d00b3215 00000000 .........2......
4c0000d0: 00000000 00000000 00000000 00000000 ................
4c0000e0: 00000000 849ff408 849ff408 00000004 ................

Vladimir Pantelic

Nov 8, 2010, 8:39:47 AM
to panda...@googlegroups.com, Vincent Stehlé, Måns Rullgård
Vincent Stehlé wrote:
> On 11/08/2010 01:04 PM, Måns Rullgård wrote:
> (..)
>> How did you check that?
>
> I used an internal tool that we have, which analyzes the register values.
>
>> Can you please download and run
>> http://hardwarebug.org/files/omap4_dramspeed.gz under Linux (as root)?
>> This will print the actual DDR clock frequency used.
>
> vincent@vincent-panda:~$ sudo ./omap4_dramspeed
> 199 MHz
>
> Either this tool is wrong, or our tool is :)
>
> (..)
>> This configuration set the DPLL_CORE CLKOUT_M2 output to 800MHz.
>> EMIF then divides this by 4, giving a DDR clock of 200MHz.
>
> CLKOUT_M2 @800 MHz, Fdpll @1600 MHz looks fine to me.

so DPLL_CORE is supposed to run at 1600MHz?
Can you please state the M and N values to be used?

Vincent Stehlé

Nov 8, 2010, 8:43:32 AM
to Vladimir Pantelic, panda...@googlegroups.com, Måns Rullgård
On 11/08/2010 02:39 PM, Vladimir Pantelic wrote:
> so DPLL_CORE is supposed to run at 1600MHz?
> Can you please state the M and N values to be used?

Let me rephrase: when you have CLKOUT_M2 @800, you have Fdpll @1600 MHz
(see TRM rev. M Figure 3-40. Generic DPLL Functional Diagram).

This is:

> 4a004120: 0000000f 00000001 00000000 00007d05 .............}..
                                       ^^^^^^^^

Fdpll = 2 * 38.4 * 0x7d / (5 + 1) = 1600 MHz.
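The same calculation can be done by decoding the register value directly. The field positions used here (multiplier M in bits 18:8, divider N in bits 6:0) are an assumption, chosen because they reproduce M = 0x7d and N = 5 from 0x00007d05. The helper names are hypothetical. Working in kHz keeps the arithmetic exact for the 38.4 MHz reference.

```c
#include <stdint.h>

/* Assumed CM_CLKSEL_DPLL_CORE layout: M in bits 18:8, N in bits 6:0. */
uint32_t dpll_mult(uint32_t clksel) { return (clksel >> 8) & 0x7ff; }
uint32_t dpll_div(uint32_t clksel)  { return clksel & 0x7f; }

/* Fdpll = 2 * Fref * M / (N + 1), with all frequencies in kHz. */
uint64_t fdpll_khz(uint32_t clksel, uint32_t fref_khz)
{
	return 2ull * fref_khz * dpll_mult(clksel) / (dpll_div(clksel) + 1);
}
```

With clksel = 0x7d05 and Fref = 38400 kHz, this gives 1600000 kHz, matching the 1600 MHz above.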

Vladimir Pantelic

Nov 8, 2010, 8:48:24 AM
to Vincent Stehlé, panda...@googlegroups.com, Måns Rullgård
Vincent Stehlé wrote:
> On 11/08/2010 02:39 PM, Vladimir Pantelic wrote:
>> so DPLL_CORE is supposed to run at 1600MHz?
>> Can you please state the M and N values to be used?
>
> Let me rephrase: when you have CLKOUT_M2 @800, you have Fdpll @1600 MHz
> (see TRM rev. M Figure 3-40. Generic DPLL Functional Diagram).
>
> This is:
>
>> 4a004120: 0000000f 00000001 00000000 00007d05 .............}..
>                                       ^^^^^^^^
>
> Fdpll = 2 * 38.4 * 0x7d / (5 + 1) = 1600 MHz.

so 125 and 5/6, that's what we have as well.

now, CLKOUT_M2 goes to EMIF and TRM says there it is
divided by 4 to get sdram physical clock, that would be
200MHz then, no?

where does the 400MHz sdram come to play?

Måns Rullgård

Nov 8, 2010, 9:00:34 AM
to panda...@googlegroups.com
Vincent Stehlé <v-st...@ti.com> writes:

> On 11/08/2010 01:04 PM, Måns Rullgård wrote:
> (..)
>> How did you check that?
>
> I used an internal tool that we have, which analyzes the register values.
>
>> Can you please download and run
>> http://hardwarebug.org/files/omap4_dramspeed.gz under Linux (as root)?
>> This will print the actual DDR clock frequency used.
>
> vincent@vincent-panda:~$ sudo ./omap4_dramspeed
> 199 MHz
>
> Either this tool is wrong, or our tool is :)

My tool uses the EMIF_PERF_CNT_TIM register.
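The tool's source is not posted here. The general technique for measuring a clock with a free-running counter such as EMIF_PERF_CNT_TIM is to sample it twice over a known wall-clock interval. The sketch below shows only that arithmetic, assuming a 32-bit up-counter. How the register is mapped and read (for example through /dev/mem) is left out, and the function name is hypothetical.

```c
#include <stdint.h>

/* Counter frequency in Hz from two samples of a free-running 32-bit
 * up-counter taken `elapsed_ns` apart. Unsigned subtraction absorbs a
 * single wrap-around, so the interval must be shorter than one full
 * counter period (about 21 s at 200 MHz). */
uint64_t counter_rate_hz(uint32_t first, uint32_t second, uint64_t elapsed_ns)
{
	uint32_t ticks = second - first;	/* modulo 2^32 */

	return (uint64_t)ticks * 1000000000u / elapsed_ns;
}
```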

>> This configuration set the DPLL_CORE CLKOUT_M2 output to 800MHz.
>> EMIF then divides this by 4, giving a DDR clock of 200MHz.
>
> CLKOUT_M2 @800 MHz, Fdpll @1600 MHz looks fine to me.

It would be, if EMIF did not divide that 800MHz by 4.

> I need to check for EMIF DDR clock.

It's 200MHz on your system. My tool said so.

>> Is there perhaps a secret setting in EMIF to change this divider to
>> something other than 4?
>
> If there is, I don't know it.
>
> (..)
>> A dump of 0x4c000000--0x4c0000ef might tell us something.
>
> Here you go, then:

Since you're also only getting 200MHz, that doesn't tell me much.

If someone has details on the EMIF_DDR_PHY_CTRL_1/2 registers, I'd be
interested.

--
Måns Rullgård
ma...@mansr.com

archi

Nov 7, 2010, 11:28:16 PM
to pandaboard
Hi,

When you write '2' as the burst size on an OMAP4 pipeline, you are actually
setting the burst size to 128 bytes. So this patch changes the burst size
from 128 bytes to 64 bytes.

Another option is to extend the FIFO size of the GFX pipeline by borrowing
from another pipeline which isn't in use (this can be done by writing to the
DISPC_GLOBAL_BUFFER register). The GFX_BUF_THRESHOLDS values then have to be
reprogrammed for the extended FIFO size.

This works for 1080p resolutions with 128-byte bursts, but we haven't started
using it yet since we haven't implemented a "pipeline FIFO resource manager".
If, for example, we are sure that we won't use the writeback pipeline, we can
give its FIFO to GFX during init.
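A rough way to see why a larger FIFO helps at 1080p: once the FIFO level drops to the low threshold, the display keeps consuming data while the DMA request is served. The data still below that threshold therefore bounds the memory latency the pipeline can absorb. The sketch assumes the fifo_low = fifo_size - burst formula from the patch and a drain rate of pixel clock times bytes per pixel. The 1024- and 2048-byte FIFO sizes are hypothetical placeholders, not OMAP4 figures, and the DISPC_GLOBAL_BUFFER programming itself is not shown.

```c
#include <stdint.h>

/* Time in ns that the FIFO can keep feeding the display after its level
 * reaches the low threshold (fifo_size - burst_bytes), i.e. roughly the
 * memory latency the DMA can tolerate before an underflow. */
uint64_t underflow_margin_ns(uint32_t fifo_size, uint32_t burst_bytes,
			     uint64_t drain_bytes_per_sec)
{
	uint64_t low = fifo_size - burst_bytes;

	return low * 1000000000u / drain_bytes_per_sec;
}
```

Take 1080p60 at 4 bytes per pixel: a 148.5 MHz pixel clock gives 594 MB/s during active lines. With 128-byte bursts, a hypothetical 1024-byte FIFO gives about 1.5 µs of margin. Doubling it to 2048 bytes gives about 3.2 µs.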

Regards,
Archit
GLMM Display

Siarhei Siamashka

Nov 8, 2010, 10:14:24 AM
to panda...@googlegroups.com
On Monday 08 November 2010 15:37:28 Vincent Stehlé wrote:

> On 11/08/2010 01:04 PM, Måns Rullgård wrote:
> (..)
>
> > How did you check that?
>
> I used an internal tool that we have, which analyzes the register values.
>
> > Can you please download and run
> > http://hardwarebug.org/files/omap4_dramspeed.gz under Linux (as root)?
> > This will print the actual DDR clock frequency used.
>
> vincent@vincent-panda:~$ sudo ./omap4_dramspeed
> 199 MHz
>
> Either this tool is wrong, or our tool is :)

I think your tools have lost grasp of the reality ;)

On my Intel Atom N450 based netbook, I'm getting ~1.4GB/s for the
'memcpy' function, just with the implementation provided by glibc (so I don't
know whether we can actually do better than that). And surely automatic
hardware prefetch is also working properly on this hardware.

Reading Intel datasheet [1], we can find:
"1.2.1 System Memory Support
* One channel of DDR2 memory (consists of 64-bit of data lines):
* Maximum of two SO-DIMMs in Raw Card-A or Raw Card-C format
* Memory DDR2 data transfer rates of 667 MT/s"

Now moving to OMAP4 hardware, I could find the following presentation [2],
claiming 5x better memory bandwidth than OMAP3. Looks very nice on the paper.

So I was optimistically expecting memory performance of OMAP4 to be comparable
to that of Intel Atom in the final revisions of the silicon once all the major
problems are ironed out.

Needless to say, the memory performance of the EA1 pandaboard is currently very
poor (worse than the beagleboard), even taking into account that it is supposed
to have memory half as fast as the A1 revision.

But I still have high hopes for you guys. Please do your best and solve these
problems (wrong software configuration or bugs in the hardware) before the
OMAP4 based products go to the mass market. I think OMAP4 based netbooks would
be awesome, and I'm going to get one if/when they show up in the shops :)

1. http://download.intel.com/design/processor/datashts/322847.pdf
2. http://www.hotchips.org/archives/hc21/2_mon/HC21.24.400.ClientProcessors-Epub/HC21.24.421.Witt-OMAP4430.pdf

Måns Rullgård

Nov 8, 2010, 9:38:37 AM
to panda...@googlegroups.com
Siarhei Siamashka <siarhei....@gmail.com> writes:

> On Monday 08 November 2010 15:37:28 Vincent Stehlé wrote:
>> On 11/08/2010 01:04 PM, Måns Rullgård wrote:
>> (..)
>>
>> > How did you check that?
>>
>> I used an internal tool that we have, which analyzes the register values.
>>
>> > Can you please download and run
>> > http://hardwarebug.org/files/omap4_dramspeed.gz under Linux (as root)?
>> > This will print the actual DDR clock frequency used.
>>
>> vincent@vincent-panda:~$ sudo ./omap4_dramspeed
>> 199 MHz
>>
>> Either this tool is wrong, or our tool is :)
>
> I think your tools have lost grasp of the reality ;)
>
> On my Intel Atom N450 based netbook, I'm getting ~1.4GB/s for the
> 'memcpy' function, just with the implementation provided by glibc (so I don't
> know whether we can actually do better than that). And surely automatic
> hardware prefetch is also working properly on this hardware.

Intel uses automatic prefetching a lot. OMAP4 managed to disable it
even though the A9 core has it. Way to go, TI.

> Reading Intel datasheet [1], we can find:
> "1.2.1 System Memory Support
> * One channel of DDR2 memory (consists of 64-bit of data lines):
> * Maximum of two SO-DIMMs in Raw Card-A or Raw Card-C format
> * Memory DDR2 data transfer rates of 667 MT/s"
>
> Now moving to OMAP4 hardware, I could find the following
> presentation [2], claiming 5x better memory bandwidth than
> OMAP3. Looks very nice on the paper.
>
> So I was optimistically expecting memory performance of OMAP4 to be
> comparable to that of Intel Atom in the final revisions of the
> silicon once all the major problems are ironed out.
>
> Needless to say, the memory performance of the EA1 pandaboard is
> currently very poor (worse than the beagleboard), even taking into
> account that it is supposed to have memory half as fast as the A1
> revision.

Running DDR2 memory at 200MHz is ridiculous. DDR memory at the same
frequency is faster due to lower latencies. The only reason for using
(and indeed inventing) DDR2 is for the higher clock frequencies it
supports, which bring latencies down to DDR levels while "doubling"
the bandwidth. Scare quotes because the doubling only applies to
burst rates, obviously.
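The peak figures behind this comparison are simple to compute: a double-data-rate interface makes two transfers per clock, each as wide as the bus. The sketch assumes a 32-bit LPDDR2 channel on the OMAP4 side; this bus width is not stated in the thread. It uses the 667 MT/s, 64-bit figures from the Atom datasheet quote above. These are burst peaks only. Sustained bandwidth is lower, which is the point about latencies made here.

```c
#include <stdint.h>

/* A DDR interface transfers data on both clock edges. */
uint64_t ddr_transfers_per_sec(uint64_t clock_hz)
{
	return 2 * clock_hz;
}

/* Peak (burst) bandwidth in bytes per second. */
uint64_t peak_bytes_per_sec(uint64_t transfers_per_sec, uint32_t bus_bits)
{
	return transfers_per_sec * (bus_bits / 8);
}
```

This gives 1.6 GB/s per 32-bit channel at a 200 MHz DDR clock and 3.2 GB/s at 400 MHz. The Atom's single 64-bit DDR2-667 channel reaches about 5.3 GB/s.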

> But I still have high hopes for you guys. Please do your best and
> solve these problems (wrong software configuration or bugs in the
> hardware) before the OMAP4 based products go to the mass market.

Solving problems like this one quickly is crucial, or someone will
publish a "benchmark" showing just how slow it is. A bad reputation
is much harder to lose than a good one...

--
Måns Rullgård
ma...@mansr.com

Måns Rullgård

Nov 8, 2010, 11:21:33 AM
to panda...@googlegroups.com
Vincent Stehlé <v-st...@ti.com> writes:

> Because I checked the frequencies after booting and I have L3_CORE @200
> MHz, LPDDR2 @400 MHz.
>

> Panda # md.l 0x4a004100
> 4a004100: 00000110 00000000 00000500 00000000 ................
> 4a004110: 00000000 00000000 00000000 00000000 ................
> 4a004120: 0000000f 00000001 00000000 00007d05 .............}..
> 4a004130: 00000301 00000305 00000328 00000324 ........(...$...
> 4a004140: 00000326 00000325 00000000 00000000 &...%...........
> 4a004150: 00000000 00000000 00000000 00000000 ................

CM_CLKSEL_CORE = 0x110 => L3_CLK = CORE_CLK/2
CORE_CLK = CORE_X2_CLK

CORE_X2_CLK is CLKOUTX2_M5 of DPLL_CORE.

CM_DIV_M5_DPLL_CORE:HSDIVIDER_CLKOUT2_DIV = 4 => CORE_X2_CLK = Fdpll/4
=> L3_CLK = Fdpll/8

---

CM_CLKSEL_DPLL_CORE = 0x00007d05 => M = 125, N = 5
=> Fdpll = Fref * 2 * 125 / 6

Fref = 38.4MHz => Fdpll = 1600MHz
=> L3_CLK = 200MHz

So far it's consistent.
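The L3 half of this derivation can be written as a short sketch. It assumes the following register layout:

- CM_CLKSEL_CORE bit 0 selects CORE_CLK = CORE_X2_CLK / 1 or / 2.
- CM_CLKSEL_CORE bit 4 selects L3_CLK = CORE_CLK / 1 or / 2.
- The M5 divider sits in bits 4:0 of CM_DIV_M5_DPLL_CORE, presumably the 0x00000324 word at 0x4a00413c in the dump above.

These field positions are assumptions consistent with the decoding in this message, not quotations from the TRM.

```c
#include <stdint.h>

/* L3_CLK in kHz from Fdpll and the two register values, following
 * CORE_X2_CLK = Fdpll / M5, CORE_CLK = CORE_X2_CLK / (1 or 2),
 * L3_CLK = CORE_CLK / (1 or 2). Returns 0 for a zero M5 divider. */
uint32_t l3_clk_khz(uint32_t fdpll_khz, uint32_t clksel_core, uint32_t div_m5)
{
	uint32_t m5 = div_m5 & 0x1f;		/* HSDIVIDER_CLKOUT2_DIV */
	uint32_t core_x2, core;

	if (m5 == 0)
		return 0;
	core_x2 = fdpll_khz / m5;
	core = core_x2 / (1 + (clksel_core & 0x1));		/* bit 0 */
	return core / (1 + ((clksel_core >> 4) & 0x1));	/* bit 4 */
}
```

With Fdpll = 1600000 kHz, CM_CLKSEL_CORE = 0x110 and M5 register 0x324, this gives 200000 kHz, the same L3_CLK as above.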

Section 15.3.4.2 Clock Management (public TRM rev M) states that
"EMIFi_FCLK is divided inside the EMIF module: DDR clock = EMIFi_FCLK / 4."

Figures 15-46 and 15-47 show that both EMIFi_FCLK clocks are connected
to PHY_ROOT_CLK, which is the CLKOUT_M2 output of DPLL_CORE:

EMIFi_FCLK = PHY_ROOT_CLK = CLKOUT_M2 = Fdpll/(2*M2) = 1600MHz / 2 = 800MHz

DDR clock = EMIFi_FCLK / 4 = 200MHz

Reading the EMIF_PERF_CNT_TIM register shows it counting at a rate of 200MHz.

So where does the 400MHz figure come from?

--
Måns Rullgård
ma...@mansr.com

Vincent Stehlé

Nov 8, 2010, 12:32:56 PM
to Vladimir Pantelic, panda...@googlegroups.com, Måns Rullgård
On 11/08/2010 02:48 PM, Vladimir Pantelic wrote:
> where does the 400MHz sdram come to play?

After asking people knowledgeable about the DDR, it appears that:

DDR clock (PAD) == CLKOUT_M2 / 2

There is a "hint" in Table 3-185. CD_EMIF Modules Clocks Association in
rev. M of the TRM. Also, as far as I can tell the TRM statement in
15.3.4.2 Clock Management that "DDR clock = EMIFi_FCLK / 4" is wrong.
(If I understand correctly we always have EMIFi_FCLK == PHY_ROOT_CLOCK.)
I entered a "ticket" for that to be fixed in a future TRM revision.
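The two readings of the DDR clock can be compared directly. The sketch assumes the M2 divider is in bits 4:0 of CM_DIV_M2_DPLL_CORE, presumably the 0x00000301 word at 0x4a004130 in the earlier dump, giving M2 = 1. It also assumes EMIFi_FCLK equals CLKOUT_M2, as stated in this thread. The /4 variant is the TRM rev. M statement from section 15.3.4.2; the /2 variant is the correction given in this message. Function names are hypothetical.

```c
#include <stdint.h>

/* CLKOUT_M2 = Fdpll / (2 * M2), in kHz. Returns 0 for a zero divider. */
uint32_t clkout_m2_khz(uint32_t fdpll_khz, uint32_t div_m2)
{
	uint32_t m2 = div_m2 & 0x1f;

	return m2 ? fdpll_khz / (2 * m2) : 0;
}

/* DDR clock as stated in TRM rev. M 15.3.4.2: EMIFi_FCLK / 4. */
uint32_t ddr_clk_trm_khz(uint32_t clkout_m2) { return clkout_m2 / 4; }

/* DDR clock at the pad, per the correction above: CLKOUT_M2 / 2. */
uint32_t ddr_clk_pad_khz(uint32_t clkout_m2) { return clkout_m2 / 2; }
```

For the dumped configuration, CLKOUT_M2 is 800 MHz. The TRM formula then gives 200 MHz and the corrected formula gives 400 MHz.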

Thanks for pointing out that it was not clear, by the way.

Best regards,

Koen Kooi

Nov 8, 2010, 12:51:38 PM
to panda...@googlegroups.com, Vladimir Pantelic, Måns Rullgård

So how do we fix the perceived slowness?

regards,

Koen

Søren Steen Christensen

Nov 9, 2010, 3:29:53 AM
to panda...@googlegroups.com, Vladimir Pantelic, Måns Rullgård
I think it seems like we have two different issues?
1) Too slow DDR2 access (200MHz instead of expected 400MHz)
2) DISPC underrun (maybe partly caused by the slow DDR2?) - or maybe also
caused by the DISPC/DMA burst setting?

Has anyone tried to:
a) Actually measure the DDR clock frequency - assuming it can be accessed
somehow/is fed to a test point?
- I have to admit that I haven't checked :-(
b) Use any kind of tool (e.g. Lauterbach) to analyze the L3 activity/status
to see what's actually going on inside the chip?
- Assuming something similar to the L3 OCP tool for OMAP3 exists for L3
NoC for OMAP4?

Best regards
Søren

---
SSC Solutions ApS - Denmark - www.ssc-solutions.dk

Vincent Stehlé

Nov 9, 2010, 3:51:01 AM
to panda...@googlegroups.com, Søren Steen Christensen, Vladimir Pantelic, Måns Rullgård
On 11/09/2010 09:29 AM, Søren Steen Christensen wrote:
(..)

> Has anyone tried to:
> a) Actually measure the DDR clock frequency - assuming it can be accessed
> somehow/is fed to a test point?

This has been observed during hardware validation of the OMAP4430.

(..)


> - Assuming something similar to the L3 OCP tool for OMAP3 exists for L3
> NoC for OMAP4?

I don't know about the tools, but the hardware counters are there (see
TRM vM 13.2.3.6 "Statistic Collectors Group" for example).

Siarhei Siamashka

Nov 9, 2010, 5:53:16 AM
to panda...@googlegroups.com
On Tuesday 09 November 2010 10:29:53 Søren Steen Christensen wrote:
> I think it seems like we have two different issues?
> 1) Too slow DDR2 access (200MHz instead of expected 400MHz)
> 2) DISPC underrun (maybe partly caused by slow DDR2?) - Maybe as well
> caused by the DISPC/DMA burst setting?

The discussion about (2) has currently moved offlist. And looks like the issue
can be solved (by a different patch from Archit Taneja). If anybody else is
suffering from this bug and would like to participate in testing different
patch variants, don't be shy and let us know.

Søren Steen Christensen

Nov 9, 2010, 5:04:39 AM
to panda...@googlegroups.com, Vladimir Pantelic, Måns Rullgård
> > - Assuming something similar to the L3 OCP tool for OMAP3 exists for L3
> > NoC for OMAP4?

> I don't know about the tools, but the hardware counters are there (see
> TRM vM 13.2.3.6 "Statistic Collectors Group" for example).

On OMAP3 you can, AFAIR, monitor all L3 OCP requests (size, cycles, initiator,
target, etc.) by setting up the chip to stream all this OCP header information
to STP/STM/ETM. I have never actually done this myself, but I saw it
demonstrated at TIF once some years back...

Best regards,

Vincent Stehlé

Nov 9, 2010, 8:19:43 AM
to panda...@googlegroups.com
On 11/09/2010 11:53 AM, Siarhei Siamashka wrote:
> (..) If anybody else is
> suffering from this bug and would like to participate in testing different
> patch variants, don't be shy and let us know.

Please add me to the list; I am interested in the patches.

Steve Sakoman

Nov 9, 2010, 8:58:26 AM
to panda...@googlegroups.com

Where has the discussion moved?

This seems to be a pretty universal issue among Panda users. I am
certainly interested in testing.

Steve

Richard

Nov 8, 2010, 4:13:53 PM
to pandaboard
Hi,

On Nov 8, 8:38 am, Måns Rullgård <m...@mansr.com> wrote:
> Siarhei Siamashka <siarhei.siamas...@gmail.com> writes:

> Intel uses automatic prefetching a lot.  OMAP4 managed to disable it
> even though the A9 core has it.  Way to go, TI.

I got a forward on this thread and can add some info. A number of
bits you are discussing have been known to many for a long time.

> > Now moving to OMAP4 hardware, I could find the following
> > presentation [2], claiming 5x better memory bandwidth than
> > OMAP3. Looks very nice on the paper.

Raw performance from the DDR controller point of view is much higher
than OMAP3. The performance seen by any initiator is likely not what
the controller can do. To understand speed, you need to look at the
whole path from source to sink. You will find you can do things like
1080p on OMAP4 which are impossible on OMAP3.

> Running DDR2 memory at 200MHz is ridiculous.  DDR memory at the same
> frequency is faster due to lower latencies.  The only reason for using
> (and indeed inventing) DDR2 is for the higher clock frequencies it
> supports, which bring latencies down to DDR levels while "doubling"
> the bandwidth.  Scare quotes because the doubling only applies to
> burst rates, obviously.

- The ES1 chip was not timing closed to run at 400MHz, but you could
get away with it if you didn't care about some HDMI audio glitches.
- The ES2 chip was timing closed, but a new power feature caused a
timing violation on the 2nd LPDDR2 controller, resulting in corruption
in one byte lane. There are many ways to work around this, but all have
some drawback. For development, the one in the public tree was deemed the
best for general use.
- The ES2.1 chip fixed this and is running fine at 400MHz on both
channels. It's highly likely it will take a while for all fixes to
propagate to all trees (assuming you have a hardware version which can
take advantage of it). PandaBoards running Ubuntu builds have some
validated voltage/frequency settings in them.

On ES2 you can probably get away with running at 400MHz if you use just one
controller, but it still has other downsides.

> > But I still have high hopes for you guys. Please do your best and
> > solve these problems (wrong software configuration or bugs in the
> > hardware) before the OMAP4 based products go to the mass market.

ES2.1 is a production candidate but likely most people will opt for
ES2.2 depending on needs.

> Solving problems like this one quickly is crucial, or someone will
> publish a "benchmark" showing just how slow it is.  A bad reputation
> is much harder to lose than a good one...

Sure. Several synthetic benchmarks do bring up questions for Cortex-A9
in general.

When you dig in you will find things like the A9's cache line size being
set at 32 bytes instead of 64 bytes (as in A8 & A15), which has a noticeable
impact on some instruction streams. You will find that the required (for
coherency) write-alloc caching also changes curves in significant
ways. These have both upsides and downsides depending on point of view. The
bigger L2 cache and higher speculation of A9 hide some of these.

You do find that good use of PLD or hardware prefetchers can recover a lot
of memcpy-type performance. However, a memcpy on A9 and A8 looks
different as the PLD policy is different. One throws them away past 4 and
the other holds on to them. There are other interesting details like
how write-alloc can be partially suppressed for sequential addresses
if the A9 BIU is in a state to recognize it.

On HW prefetch, both the MPCore and the PL310 have prefetchers. The
versions of ARM IP which were bundled in OMAP4 ES1 did not support
prefetch (it was not documented in ARM information). In ES2 they are both
documented & supported in the ARM IP which is bundled in OMAP4.
The TrustZone ROM throws some wrenches into full use. You will find for the
PL310 that the prefetcher likely performs best when setting the
prefetch distance to 5 or greater (POR reg). It can nearly double throughput
for some streams, as it's like a HW-inserted PLD.

Regards,
Richard W.

Vladimir Pantelic

Nov 9, 2010, 5:23:09 AM
to panda...@googlegroups.com

Well, unless the patch is under NDA just post it here, no? :-)

Måns Rullgård

Nov 9, 2010, 10:03:14 AM
to panda...@googlegroups.com
Richard <r-woo...@ti.com> writes:

> Hi,
>
> On Nov 8, 8:38 am, Måns Rullgård <m...@mansr.com> wrote:
>> Siarhei Siamashka <siarhei.siamas...@gmail.com> writes:
>
>> Intel uses automatic prefetching a lot.  OMAP4 managed to disable it
>> even though the A9 core has it.  Way to go, TI.
>
> I got a forward on this thread and can add some info. A number of
> bits you are discussing have been known to many for a long time.

All the more disappointing that they are not fixed or even documented.

>> > Now moving to OMAP4 hardware, I could find the following
>> > presentation [2], claiming 5x better memory bandwidth than
>> > OMAP3. Looks very nice on the paper.
>
> Raw performance from the DDR controller point of view is much higher
> than OMAP3. The performance seen by any initiator is likely not what
> the controller can do. To understand speed, you need to look at the
> whole path from source to sink. You will find you can do things like
> 1080p on OMAP4 which are impossible on OMAP3.

People keep saying that. Talk is cheap.

>> Running DDR2 memory at 200MHz is ridiculous.  DDR memory at the same
>> frequency is faster due to lower latencies.  The only reason for using
>> (and indeed inventing) DDR2 is for the higher clock frequencies it
>> supports, which bring latencies down to DDR levels while "doubling"
>> the bandwidth.  Scare quotes because the doubling only applies to
>> burst rates, obviously.
>
> - The ES1 chip was not timing closed to run at 400MHz, but you could
> get away with it if you didn't care about some HDMI audio glitches.
> - The ES2 chip was timing closed, but a new power feature caused a
> timing violation on the 2nd LPDDR2 controller, resulting in corruption
> in one byte lane. There are many ways to work around this, but all have
> some drawback. For development, the one in the public tree was deemed the
> best for general use.
> - The ES2.1 chip fixed this and is running fine at 400MHz on both

Allegedly.

> channels. It's highly likely it will take a while for all fixes to
> propagate to all trees (assuming you have a hardware version which can
> take advantage of it). PandaBoards running Ubuntu builds have some
> validated voltage/frequency settings in them.
>
> On ES2 you can probably get away with running at 400MHz if you use just one
> controller, but it still has other downsides.

My board has ES2.1, and the relevant software recognises it as such.

>> > But I still have high hopes for you guys. Please do your best and
>> > solve these problems (wrong software configuration or bugs in the
>> > hardware) before the OMAP4 based products go to the mass market.
>
> ES2.1 is a production candidate but likely most people will opt for
> ES2.2 depending on needs.

If it provides access to the A9 aux control register and the remaining
PL310 control registers, it will hopefully be a bit better. We'll see
once the chips arrive.

>> Solving problems like this one quickly is crucial, or someone will
>> publish a "benchmark" showing just how slow it is.  A bad reputation
>> is much harder to lose than a good one...
>
> Sure. Several synthetic benchmarks do bring up questions for Cortex-A9
> in general.

The Tegra2 is also A9, and it readily outperforms the OMAP4 when NEON
is not used, i.e. the vast majority of applications.

> When you dig in you will find things like the A9's cache line size being
> set at 32 bytes instead of 64 bytes (as in A8 & A15), which has a noticeable
> impact on some instruction streams. You will find that the required (for
> coherency) write-alloc caching also changes curves in significant
> ways. These have both upsides and downsides depending on point of view. The
> bigger L2 cache and higher speculation of A9 hide some of these.
>
> You do find that good use of PLD or hardware prefetchers can recover a lot
> of memcpy-type performance. However, a memcpy on A9 and A8 looks
> different as the PLD policy is different. One throws them away past 4 and
> the other holds on to them. There are other interesting details like
> how write-alloc can be partially suppressed for sequential addresses
> if the A9 BIU is in a state to recognize it.

The PL310 can be configured to discard prefetch requests conflicting
with non-prefetch requests. Of course the register governing this
(prefetch control register) is not accessible on OMAP4 GP devices.

> On HW prefetch, both the MPCore and the PL310 have prefetchers. The
> versions of ARM IP which were bundled in OMAP4 ES1 did not support
> prefetch (it was not documented in ARM information). In ES2 they are both
> documented & supported in the ARM IP which is bundled in OMAP4.
> The TrustZone ROM throws some wrenches into full use. You will find for the
> PL310 that the prefetcher likely performs best when setting the
> prefetch distance to 5 or greater (POR reg). It can nearly double throughput
> for some streams, as it's like a HW-inserted PLD.

That setting is also in the prefetch control register, so there's no
way to alter it.

--
Måns Rullgård
ma...@mansr.com

messier

Nov 9, 2010, 2:10:28 PM
to pandaboard
Sorry to post slightly off topic, but I am curious what silicon of the
OMAP4 is shipped with the current pandaboards. Is it already ES2.1?

Regards,

Thomas

On 9 Nov., 16:03, Måns Rullgård <m...@mansr.com> wrote:
> m...@mansr.com

Rob Clark

Nov 9, 2010, 3:45:52 PM
to panda...@googlegroups.com
On Tue, Nov 9, 2010 at 1:10 PM, messier <tomb...@gmail.com> wrote:
> Sorry to post slightly off topic, but I am curious what silicon of the
> OMAP4 is shipped with the current pandaboards. Is it already ES2.1?
>

Most EA1 boards are ES2.0, and all A1 boards are ES2.1.

http://omappedia.org/wiki/PandaBoard_Revisions

BR,
-R

> Regards,
>
> Thomas
>

Måns Rullgård

Nov 9, 2010, 4:04:30 PM
to panda...@googlegroups.com
messier <tomb...@gmail.com> writes:

> Sorry to post slightly off topic, but I am curious what silicon of the
> OMAP4 is shipped with the current pandaboards. Is it already ES2.1?

Rev A1 boards (shipped by Digikey) are all ES2.1. EA1 boards are
mostly ES2.0 except a handful identified by an ES2.1 sticker on the
backside.

--
Måns Rullgård
ma...@mansr.com

messier

Nov 9, 2010, 4:36:15 PM
to pandaboard
Ok, thanks for your answers. I ordered one, finally :-)
Hopefully shipping is quicker than expected...

Bernard Blackham

Nov 9, 2010, 7:37:16 PM
to panda...@googlegroups.com

I received a board through the PEAP program last week (thanks!).
It is labelled on the bottom "Rev EA1", but the GPIOs suggest it
has ES2.1 silicon:

/sys/class/gpio# cat gpio171/value
0
/sys/class/gpio# cat gpio101/value
1
/sys/class/gpio# cat gpio182/value
1

Does this mean it is actually ES2.1 silicon?

Cheers,
Bernard.

Måns Rullgård

Nov 10, 2010, 6:13:41 AM
to panda...@googlegroups.com
Richard <r-woo...@ti.com> writes:

> - The ES2.1 chip fixed this and is running fine at 400MHz

Let's make sure we're talking about the same clock. The DDR interface
uses a differential clock, like this:

---\ /~~~\ /---\ /~~~\ /---\ /~~~\ /--- CK_t
    X     X     X     X     X     X
~~~/ \---/ \~~~/ \---/ \~~~/ \---/ \~~~ CK_c

    |<--------->|
        t_CK

The clock period is t_CK, and the frequency is 1/t_CK. Data transfer
occurs at each crossover between CK_t and CK_c. The data rate is thus
twice the clock frequency.

When I say memory is clocked at 200MHz, I mean that t_CK = 5ns.

--
Måns Rullgård
ma...@mansr.com

Bernard Blackham

Nov 10, 2010, 7:18:34 AM
to panda...@googlegroups.com

To answer my own question, it appears the answer is no. Mans pointed
me to section 1.5 of the TRM, about device identification and the
ID_CODE register.

# devmem2 0x4A002204
Value at address 0x4A002204 (0x2ac81204): 0x1B85202F

Seems to definitively say that this board is ES2.0 silicon, and at
least for these boards, the GPIO identifiers aren't reliable.

Bernard.

Vladimir Pantelic

Nov 15, 2010, 4:32:04 AM
to panda...@googlegroups.com, Richard
Richard wrote:

> - The ES2.1 chip fixed this and is running fine at 400MHz on both
> channels. It's highly likely it will take a while for all fixes to
> propagate to all trees (assuming you have a hardware version which can
> take advantage of it). PandaBoards running Ubuntu builds have some
> validated voltage/frequency settings in them.

Richard,

could you point out the relevant "fixes" that are needed, so that
we can make sure they have "propagated" enough?

Siarhei Siamashka

Nov 15, 2010, 6:30:06 AM
to panda...@googlegroups.com, Richard
On Tuesday 09 November 2010 17:03:14 Måns Rullgård wrote:

> Richard <r-woo...@ti.com> writes:
> > You do find that good use of PLD or hardware prefetchers can recover a lot
> > of memcpy-type performance. However, a memcpy on A9 and A8 looks
> > different as the PLD policy is different. One throws them away past 4 and
> > the other holds on to them. There are other interesting details like
> > how write-alloc can be partially suppressed for sequential addresses
> > if the A9 BIU is in a state to recognize it.
>
> The PL310 can be configured to discard prefetch requests conflicting
> with non-prefetch requests. Of course the register governing this
> (prefetch control register) is not accessible on OMAP4 GP devices.
>
> > On HW prefetch, both the MPCore and the PL310 have prefetchers. The
> > versions of ARM IP which were bundled in OMAP4 ES1 did not support
> > prefetch (it was not documented in ARM information). In ES2 they are both
> > documented & supported in the ARM IP which is bundled in OMAP4.
> > The TrustZone ROM throws some wrenches into full use. You will find for the
> > PL310 that the prefetcher likely performs best when setting the
> > prefetch distance to 5 or greater (POR reg). It can nearly double throughput
> > for some streams, as it's like a HW-inserted PLD.
>
> That setting is also in the prefetch control register, so there's no
> way to alter it.

Richard, do you have any comments regarding this? Could you clarify whether
it is possible to enable any kind of hardware prefetcher with ES2.0 or ES2.1
silicon? If it is not possible, can we be sure that this is going to be fixed
in the next revisions of OMAP4?

Also do you have any kind of recommendation about how to improve performance
with the current setup (assuming that somebody selected it on purpose due to
some reasons)? Maybe -fprefetch-loop-arrays gcc option or something similar?

Thanks.

Woodruff, Richard

Nov 16, 2010, 1:51:21 AM
to Siarhei Siamashka, panda...@googlegroups.com
> From: Siarhei Siamashka [mailto:siarhei....@gmail.com]
> Sent: Monday, November 15, 2010 5:30 AM
> To: panda...@googlegroups.com; Woodruff, Richard

> Richard, do you have any comments regarding this? Could you clarify whether
> it is possible to enable any kind of hardware prefetcher with ES2.0 or ES2.1
> silicon? If it is not possible, can we be sure that this is going to be fixed
> in the next revisions of OMAP4?

On ES2.2 GP, or on any ES2.x EMU device with the PPA, the needed registers should be available.

I do believe you can enable the PL310 prefetcher via an exported ES2 ROM call. However, the POR (prefetch offset register) is not open. The default POR is 0, which for some workloads is like not issuing a PLD soon enough.

> Also do you have any kind of recommendation about how to improve performance
> with the current setup (assuming that somebody selected it on purpose due to
> some reasons)? Maybe -fprefetch-loop-arrays gcc option or something similar?

I've learned some PLD tricks which help some of the streaming operations. I'm unsure about all compiler tricks. ARM has buried some in RVCT. Mans pointed out that the Linaro tree has picked up some of these.

Regards,
Richard W.

yi li

May 17, 2011, 6:22:45 AM
to panda...@googlegroups.com
Hi Mans,

I read this thread carefully since I am also troubled by my Pandaboard's poor memory performance.
I am using an A1 Pandaboard (chip revision ES2.1, I think).

Just wondering: what is the A1 Pandaboard's DDR clock frequency? Do you still believe it is 200MHz? Is there any way to configure the DDR clock to 400MHz?

I got a similar performance result on my board to this one: http://computerarch.com/log/2011/03/01/pandaboard/
I think this result was obtained with L2 prefetch turned off, but it is still very low given 400MHz LPDDR2.

Is this an OMAP4 bug, or a misconfiguration of the Pandaboard? Is there any way to fix it?

Regards,
-yi

