To subscribe or unsubscribe via the World Wide Web, visit
http://lists.freebsd.org/mailman/listinfo/freebsd-arm
or, via email, send a message with subject or body 'help' to
freebsd-a...@freebsd.org
You can reach the person managing the list at
freebsd-...@freebsd.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of freebsd-arm digest..."
Today's Topics:
1. Re: Smallest ARM Device for FreeBSD (batcilla itself)
2. Re: Performance of SheevaPlug on 8-stable (Bernd Walter)
3. RE: Smallest ARM Device for FreeBSD (li...@walkertc.com)
4. Re: Performance of SheevaPlug on 8-stable (Mark Tinguely)
5. Re: Performance of SheevaPlug on 8-stable (M. Warner Losh)
6. Re: Smallest ARM Device for FreeBSD (Bernd Walter)
7. Re: Performance of SheevaPlug on 8-stable (Maks Verver)
8. Re: Performance of SheevaPlug on 8-stable (Grzegorz Bernacki)
9. Re: Performance of SheevaPlug on 8-stable (M. Warner Losh)
10. Re: Performance of SheevaPlug on 8-stable (Mark Tinguely)
11. Re: Performance of SheevaPlug on 8-stable (Bernd Walter)
12. Re: Performance of SheevaPlug on 8-stable (Mark Tinguely)
13. Re: Performance of SheevaPlug on 8-stable (Bernd Walter)
14. RM9200 tuning (Bernd Walter)
15. Re: RM9200 tuning (M. Warner Losh)
16. Re: RM9200 tuning (Stanislav Sedov)
17. Re: RM9200 tuning (Bernd Walter)
18. Re: RM9200 tuning (Bernd Walter)
19. Re: RM9200 tuning (Bernd Walter)
20. Re: RM9200 tuning (Stanislav Sedov)
21. Re: RM9200 tuning (Bernd Walter)
22. Re: RM9200 tuning (Rafal Jaworowski)
23. Re: RM9200 tuning (M. Warner Losh)
24. Re: RM9200 tuning (Bernd Walter)
25. Re: Performance of SheevaPlug on 8-stable (Maks Verver)
26. Re: RM9200 tuning (M. Warner Losh)
27. Re: RM9200 tuning (Bernd Walter)
----------------------------------------------------------------------
Message: 1
Date: Mon, 8 Mar 2010 14:28:38 +0200
From: batcilla itself <batc...@gmail.com>
Subject: Re: Smallest ARM Device for FreeBSD
To: li...@walkertc.com
Cc: freeb...@freebsd.org
Message-ID:
<6c36ec371003080428p54...@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
It depends - which features do you need?
I believe FreeBSD 8 or -current can run on most 10x10cm ARM XScale
boards, but only with a kernel and a very minimal md rootfs.
You may also need to port some i2c stuff and fix IRQ->GPIO etc.
//batcilla
2010/3/8 <li...@walkertc.com>
> What is the smallest ARM device that FreeBSD can run on?
>
> I am also looking for recommendations with respect to the most compatible
> ARM device that works with FreeBSD - and possibly the tradeoffs between
> devices. The web site doesn't go into much detail.
------------------------------
Message: 2
Date: Mon, 8 Mar 2010 13:41:17 +0100
From: Bernd Walter <ti...@cicely7.cicely.de>
Subject: Re: Performance of SheevaPlug on 8-stable
To: Hans Petter Selasky <hsel...@c2i.net>
Cc: freeb...@freebsd.org, Mark Tinguely <ting...@casselton.net>,
ti...@cicely.de
Message-ID: <20100308124...@cicely7.cicely.de>
Content-Type: text/plain; charset=us-ascii
On Mon, Mar 08, 2010 at 10:07:14AM +0100, Hans Petter Selasky wrote:
> On Monday 08 March 2010 09:25:59 Jacques Fourie wrote:
> > On Mon, Mar 8, 2010 at 5:00 AM, Mark Tinguely <ting...@casselton.net>
> wrote:
> > > <deletes>
> > >
> > >> It is still puzzling me why it is not near 80 seconds.
> > >> This would mean it is loosing something about 5-6 cycles.
> > >> Well - Ok - the pipeline might be that long and real loops are
> > >> mostly some instructions longer.
> > >> But I would still be interested to see Linux results on RM9200.
> > >>
> > >> --
> > >> B.Walter <be...@bwct.de> http://www.bwct.de
> > >> Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm.
> > >
> > > Thinking way out of the box ... has anyone tried this in single user
> > > mode?
> > >
>
> Was the output from "vmstat -i" and "top" posted?
No, but I can say that my current and 8.0-current systems had almost no
load.
My 7.0-current system had about 60-70% load and was about 3 times slower
than the 8.0 and the patched 9.0 systems, so it makes sense.
Do you expect to see anything special?
--
B.Walter <be...@bwct.de> http://www.bwct.de
Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm.
------------------------------
Message: 3
Date: Mon, 8 Mar 2010 08:40:52 -0500
From: <li...@walkertc.com>
Subject: RE: Smallest ARM Device for FreeBSD
To: <freeb...@freebsd.org>
Message-ID: <000c01cabec4$f8758800$e9609800$@com>
Content-Type: text/plain; charset="us-ascii"
This will be a minimal system that really only needs 2 USB or serial inputs
and network connectivity. It will act as a "sensor" - serving to read from
the 2 inputs and relaying that information over the network.
I'm actually flexible enough to not even have the 2 inputs since shrinking
the system takes a higher priority.
-----Original Message-----
From: batcilla itself [mailto:batc...@gmail.com]
Sent: Monday, March 08, 2010 7:29 AM
To: li...@walkertc.com
Cc: freeb...@freebsd.org
Subject: Re: Smallest ARM Device for FreeBSD
Is depends - which features you need?
I believe FreeBSD 8 or -current can be run on most 10x10cm ARM Xscale
boards, but only kernel and very minimal md rootfs.
Also you may need to port some i2c stuff and fix IRQ->GPIO etc.
//batcill
------------------------------
Message: 4
Date: Mon, 8 Mar 2010 07:57:54 -0600 (CST)
From: Mark Tinguely <ting...@casselton.net>
Subject: Re: Performance of SheevaPlug on 8-stable
To: ti...@cicely.de
Cc: freeb...@freebsd.org
Message-ID: <201003081357....@casselton.net>
> On Mon, Mar 08, 2010 at 10:07:14AM +0100, Hans Petter Selasky wrote:
> > On Monday 08 March 2010 09:25:59 Jacques Fourie wrote:
> > > On Mon, Mar 8, 2010 at 5:00 AM, Mark Tinguely <ting...@casselton.net>
> > wrote:
> > > > <deletes>
> > > >
> > > >> It is still puzzling me why it is not near 80 seconds.
> > > >> This would mean it is loosing something about 5-6 cycles.
> > > >> Well - Ok - the pipeline might be that long and real loops are
> > > >> mostly some instructions longer.
> > > >> But I would still be interested to see Linux results on RM9200.
> > > >>
> > > >> --
> > > >> B.Walter <be...@bwct.de> http://www.bwct.de
> > > >> Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm.
> > > >
> > > > Thinking way out of the box ... has anyone tried this in single user
> > > > mode?
> > > >
> >
> > Was the output from "vmstat -i" and "top" posted?
>
> No, but I can say that my current and 8.0-current system had almost no
> load.
> My 7.0-current system had about 60-70% load and was about 3 times slower
> than the 8.0 and the patched 9.0 system, so it makes sense.
> Do you expect anything special to see?
>
> --
> B.Walter <be...@bwct.de> http://www.bwct.de
> Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm.
>
The process is in a tight CPU-bound loop. I am thinking that there is
still the interrupt handler (short assembly), finding the next interrupt
routine (typically a bitmap shift loop), the clock interrupt handler, the
scheduler, and cpu_switch() (even if it is just to the same process), all of
which go off every 1/HZ seconds. I would think that if there is an
inefficiency in the above loop, the times should be of the same magnitude in
single-user as in multi-user mode.
If you cannot go to single user then 'vmstat -i' and 'top' are a good
idea - make sure something else is not causing a context switch which would
flush our caches.
The performance counter idea is a good one too.
Wildly grasping, here:
I suppose you could run the program with time and use the wall clock
or "date; a.out; date" to eliminate some problem in the "time" command.
--Mark.
------------------------------
Message: 5
Date: Mon, 08 Mar 2010 07:07:39 -0700 (MST)
From: "M. Warner Losh" <i...@bsdimp.com>
Subject: Re: Performance of SheevaPlug on 8-stable
To: ting...@casselton.net
Cc: freeb...@freebsd.org, ti...@cicely.de
Message-ID: <20100308.070739.110...@bsdimp.com>
Content-Type: Text/Plain; charset=us-ascii
In message: <201003081357....@casselton.net>
Mark Tinguely <ting...@casselton.net> writes:
:
: > On Mon, Mar 08, 2010 at 10:07:14AM +0100, Hans Petter Selasky wrote:
: > > On Monday 08 March 2010 09:25:59 Jacques Fourie wrote:
: > > > On Mon, Mar 8, 2010 at 5:00 AM, Mark Tinguely <ting...@casselton.net>
: > > wrote:
: > > > > <deletes>
: > > > >
: > > > >> It is still puzzling me why it is not near 80 seconds.
: > > > >> This would mean it is loosing something about 5-6 cycles.
: > > > >> Well - Ok - the pipeline might be that long and real loops are
: > > > >> mostly some instructions longer.
: > > > >> But I would still be interested to see Linux results on RM9200.
: > > > >>
: > > > >> --
: > > > >> B.Walter <be...@bwct.de> http://www.bwct.de
: > > > >> Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm.
: > > > >
: > > > > Thinking way out of the box ... has anyone tried this in single user
: > > > > mode?
: > > > >
: > >
: > > Was the output from "vmstat -i" and "top" posted?
: >
: > No, but I can say that my current and 8.0-current system had almost no
: > load.
: > My 7.0-current system had about 60-70% load and was about 3 times slower
: > than the 8.0 and the patched 9.0 system, so it makes sense.
: > Do you expect anything special to see?
: >
: > --
: > B.Walter <be...@bwct.de> http://www.bwct.de
: > Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm.
: >
:
: The process is in a tight CPU bound loop. I am thinking that there is
: still the interrupt handler (short assembly), find the next interrupt routine
: (typically a bitmap shift loop), clock interrupt handler, scheduler,
: cpu_switch() (even if it is just to the same process) that goes off every
: 1/HZ seconds. I would think that if there is an inefficiency in the above
: loop, the times should be the same magnitude in single-user as in multi-using.
:
: If you cannot go to single user then the 'vmstat -i' and 'top' is a good
: idea - make sure something else is not causing a context switch which would
: flush our caches.
:
: The performance counter idea is a good one too.
:
: Wildly grasping, here:
: I suppose you could run the program with time and use the wall clock
: or "date; a.out; date" to eliminate some problem in the "time" command.
ntpdate might help here to eliminate any local clock effects.
Warner
------------------------------
Message: 6
Date: Mon, 8 Mar 2010 15:24:07 +0100
From: Bernd Walter <ti...@cicely7.cicely.de>
Subject: Re: Smallest ARM Device for FreeBSD
To: li...@walkertc.com
Cc: freeb...@freebsd.org
Message-ID: <20100308142...@cicely7.cicely.de>
Content-Type: text/plain; charset=us-ascii
On Mon, Mar 08, 2010 at 08:40:52AM -0500, li...@walkertc.com wrote:
> This will be a minimal system that really only needs 2 USB or serial inputs
> and network connectivity. It will act as a "sensor" - serving to read from
> the 2 inputs and relaying that information over the network.
>
> I'm actually flexible enough to not even have the 2 inputs since shrinking
> the system takes a higher priority.
It is hard to say without knowing your design constraints on
RAM, size, speed and power.
My RM9200 boards measure 10x10cm and have a 4 port ethernet switch,
64-128MB RAM and a power consumption of 1.5-2.6W.
They have a few GPIOs - especially the RX/TX of the 4 USARTs as TTL
and an I2C header, but only one USB port.
There are smaller RM9200 boards with just a single RAM chip, which
usually results in lower speed, and without a switch.
If you want more power and you have access to grid power, a SheevaPlug
can also be a nice alternative.
On the other hand - I personally often use much smaller ARM7 with
an embedded IP stack instead of a full featured FreeBSD system.
Those systems can be even smaller since they don't need large
SDRAM chips and the controllers are physically smaller as well.
> -----Original Message-----
> From: batcilla itself [mailto:batc...@gmail.com]
> Sent: Monday, March 08, 2010 7:29 AM
> To: li...@walkertc.com
> Cc: freeb...@freebsd.org
> Subject: Re: Smallest ARM Device for FreeBSD
>
> Is depends - which features you need?
> I believe FreeBSD 8 or -current can be run on most 10x10cm ARM Xscale
> boards, but only kernel and very minimal md rootfs.
> Also you may need to port some i2c stuff and fix IRQ->GPIO etc.
>
> //batcill
--
B.Walter <be...@bwct.de> http://www.bwct.de
Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm.
------------------------------
Message: 7
Date: Mon, 08 Mar 2010 15:29:25 +0100
From: Maks Verver <maksv...@geocities.com>
Subject: Re: Performance of SheevaPlug on 8-stable
To: freeb...@freebsd.org
Message-ID: <4B9509C5...@geocities.com>
Content-Type: text/plain; charset=ISO-8859-1
On 03/07/2010 10:25 PM, Mark Tinguely wrote:
> FreeBSD-current has kernel and user witness turned on. Witness is
> for locks, so it should not change the performance of a tight
> arithmetic loop like this.
For the record, I've been using 8-stable so far.
> I don't know the marvell interals, and from what I tell, their
> technial docs require NDA.
Yeah, that sucks. But I don't think the SheevaPlug contains a lot of
novel technology; it's just a slightly different configuration. In any
case, Linux seems to have more or less complete support for the
SheevaPlug (including L2 cache, SDIO and NAND flash) so for
details, the GPL'ed Linux source code may be helpful.
> It looks like from the cpu identification that the the branch
> prediction is turned on. Branch prediction compensates for the longer
> pipelines. I can't see how in the tight loop how that could go
> astray.
Well, since the Linux version of the test program runs exactly as well
as I expect (or could ever hope for) I don't have any doubts that the
CPU is able to run the tight loop efficiently. The question (for me) is
why it doesn't run just as well on FreeBSD.
I tried a couple of the suggestions:
Mark Tinguely wrote:
> Thinking way out of the box ... has anyone tried this in single user
> mode?
I did, and it still takes 287 seconds (same as before).
Hans Petter Selasky wrote:
> Was the output from "vmstat -i" and "top" posted?
Not yet. vmstat -i reports:
  interrupt          total   rate
  irq1: timer0      130981    999
  irq33: uart0         477      3
  irq19: ehci0         875      6
  Total             132333   1010
Which looks entirely reasonable to me. Top contains the same info as the
time data I posted: 99.x% of CPU time is spent in user-mode, lots of
free memory. So it seems the kernel has very little to do with this.
Next up, this patch:
> http://www.casselton.net/~tinguely/arm_pmap_unmanaged.diff
No idea what this does, but it helps a lot:
%time ./test
9.000u 0.000s 0:09.11 99.2% 40+1324k 0+0io 0pf+0w
That's much better than the 280+ seconds from before. But it's still
nearly twice as long as Linux takes.
There is more weirdness though. If I freshly boot the system I get
timings like these, and even nbench reports decent scores. However, if I
do a couple things like rerun/recompile nbench, then at some point
something 'breaks' and the performance goes back down to what it used to be.
So Mark's patch definitely touches on something related to the problem,
but doesn't quite solve the problem completely. I still have no clue
what's going on, but I'm willing to try out suggestions if anyone has
them. :-)
- Maks Verver.
------------------------------
Message: 8
Date: Mon, 08 Mar 2010 16:50:58 +0100
From: Grzegorz Bernacki <g...@semihalf.com>
Subject: Re: Performance of SheevaPlug on 8-stable
To: Mark Tinguely <ting...@casselton.net>
Cc: freeb...@freebsd.org
Message-ID: <4B951CE2...@semihalf.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Mark Tinguely wrote:
> FreeBSD-current has kernel and user witness turned on. Witness is for
> locks, so it should not change the performance of a tight arithmetic loop
> like this.
>
> I don't know the marvell interals, and from what I tell, their technial
> docs require NDA. That said, many of the ARM processors also have a
> instruction internal cache (instruction prefetch) in addition to the
> instruction cache. I don't think the prefetch has an enable/disable.
>
> It looks like from the cpu identification that the the branch prediction
> is turned on. Branch prediction compensates for the longer pipelines.
> I can't see how in the tight loop how that could go astray.
>
> Thus says the ARM ARM:
>
> ARM implementations are free to choose how far ahead of the
> current point of execution they prefetch instructions; either
> a fixed or a dynamically varying number of instructions. As well
> as being free to choose how many instructions to prefetch, an ARM
> implementation can choose which possible future execution path to
> prefetch along. For example, after a branch instruction, it can
> choose to prefetch either the instruction following the branch
> or the instruction at the branch target. This is known as branch
> prediction.
>
> There are a few data dangling allocations that I would like to see
> closed from the multiple kernel allocation fix. *IN THEORY, IF* a page
> is allocated via the arm_nocache (DMA COHERENT) or a sendfile, then
> it is never marked as unallocated. *IN THEORY*, if that page is used
> again, then we could falsely believe that page is being shared and
> we turn off the cache, eventhough it is not shared.
>
> http://www.casselton.net/~tinguely/arm_pmap_unmanaged.diff
>
> * Disclaimer: I am not sure if DMA COHERENT nor sendfiles are used in
> the Sheeva implementation. This is a theoritical observation of a side
> effect of the multiple kernel mapping patch that we did just before
> FreeBSD 8-release.
>
> --Mark Tinguely
This is probably caused by the mechanism which turns off the cache for
shared pages. When I applied the following patch:
diff --git a/sys/arm/arm/pmap.c b/sys/arm/arm/pmap.c
index 390dc3c..d17c0cc 100644
--- a/sys/arm/arm/pmap.c
+++ b/sys/arm/arm/pmap.c
@@ -1401,6 +1401,8 @@ pmap_fix_cache(struct vm_page *pg, pmap_t pm, vm_offset_t va)
*/
TAILQ_FOREACH(pv, &pg->md.pv_list, pv_list) {
+ if (pv->pv_flags & PVF_EXEC)
+ return;
/* generate a count of the pv_entry uses */
if (pv->pv_flags & PVF_WRITE) {
if (pv->pv_pmap == pmap_kernel())
execution time of 'test' program is:
mv78100-4# time ./test
5.000u 0.000s 0:05.40 99.8% 40+1324k 0+0io 0pf+0w
and without this patch it is:
mv78100-4# time ./test
295.000u 0.000s 4:56.01 99.7% 40+1322k 0+0io 0pf+0w
I think we need to handle executable pages in a different way.
grzesiek
------------------------------
Message: 9
Date: Mon, 08 Mar 2010 09:15:16 -0700 (MST)
From: "M. Warner Losh" <i...@bsdimp.com>
Subject: Re: Performance of SheevaPlug on 8-stable
To: g...@semihalf.com
Cc: ting...@casselton.net, freeb...@freebsd.org
Message-ID: <20100308.091516.295...@bsdimp.com>
Content-Type: Text/Plain; charset=us-ascii
In message: <4B951CE2...@semihalf.com>
Grzegorz Bernacki <g...@semihalf.com> writes:
: This is probably caused by mechanism which turns of cache for shared
: pages.
: When I add applied following path:
:
: diff --git a/sys/arm/arm/pmap.c b/sys/arm/arm/pmap.c
: index 390dc3c..d17c0cc 100644
: --- a/sys/arm/arm/pmap.c
: +++ b/sys/arm/arm/pmap.c
: @@ -1401,6 +1401,8 @@ pmap_fix_cache(struct vm_page *pg, pmap_t pm,
: vm_offset_t va)
: */
:
: TAILQ_FOREACH(pv, &pg->md.pv_list, pv_list) {
: + if (pv->pv_flags & PVF_EXEC)
: + return;
: /* generate a count of the pv_entry uses */
: if (pv->pv_flags & PVF_WRITE) {
: if (pv->pv_pmap == pmap_kernel())
:
: execution time of 'test' program is:
: mv78100-4# time ./test
: 5.000u 0.000s 0:05.40 99.8% 40+1324k 0+0io 0pf+0w
:
: and without this path is:
: mv78100-4# time ./test
: 295.000u 0.000s 4:56.01 99.7% 40+1322k 0+0io 0pf+0w
:
:
: I think we need to handle executable pages in different way.
Agreed. Why would we turn off caching for shared pages? I can
understand it for read/write pages in a system that has a virtually
indexed cache with the classic cache aliasing problems, as a workaround
for lousy hardware, but otherwise, this one has me scratching my head...
And if there's only one copy of 'test' running, why does it hit the
'shared' case for this code?
Warner
------------------------------
Message: 10
Date: Mon, 8 Mar 2010 12:19:23 -0600 (CST)
From: Mark Tinguely <ting...@casselton.net>
Subject: Re: Performance of SheevaPlug on 8-stable
To: g...@semihalf.com, i...@bsdimp.com
Cc: freeb...@freebsd.org, ting...@casselton.net
Message-ID: <201003081819....@casselton.net>
> In message: <4B951CE2...@semihalf.com>
> Grzegorz Bernacki <g...@semihalf.com> writes:
> : This is probably caused by mechanism which turns of cache for shared
> : pages.
> : When I add applied following path:
> :
> : diff --git a/sys/arm/arm/pmap.c b/sys/arm/arm/pmap.c
> : index 390dc3c..d17c0cc 100644
> : --- a/sys/arm/arm/pmap.c
> : +++ b/sys/arm/arm/pmap.c
> : @@ -1401,6 +1401,8 @@ pmap_fix_cache(struct vm_page *pg, pmap_t pm,
> : vm_offset_t va)
> : */
> :
> : TAILQ_FOREACH(pv, &pg->md.pv_list, pv_list) {
> : + if (pv->pv_flags & PVF_EXEC)
> : + return;
> : /* generate a count of the pv_entry uses */
> : if (pv->pv_flags & PVF_WRITE) {
> : if (pv->pv_pmap == pmap_kernel())
> :
> : execution time of 'test' program is:
> : mv78100-4# time ./test
> : 5.000u 0.000s 0:05.40 99.8% 40+1324k 0+0io 0pf+0w
> :
> : and without this path is:
> : mv78100-4# time ./test
> : 295.000u 0.000s 4:56.01 99.7% 40+1322k 0+0io 0pf+0w
> :
> :
> : I think we need to handle executable pages in different way.
>
> Agreed. Why would we turn off caching for shared pages? I can
> understand read/write pages in a system that has a virtually index
> cache with the classic cache aliasing problems as a workaround for
> lousy hardware, but otherwise, this one has me scratching my head...
>
> And if there's only one copy of 'test' running, why does it hit the
> 'shared' case for this code?
>
> Warner
Could you do this instead:
+ int stop = 0;
TAILQ_FOREACH(pv, &pg->md.pv_list, pv_list) {
+ if (pv->pv_flags & PVF_EXEC)
+ stop = 1;
/* generate a count of the pv_entry uses */
if (pv->pv_flags & PVF_WRITE) {
if (pv->pv_pmap == pmap_kernel())
kwritable++;
else if (pv->pv_pmap == pm)
uwritable++;
writable++;
}
if (pv->pv_pmap == pmap_kernel())
kentries++;
else {
if (pv->pv_pmap == pm)
uentries++;
entries++;
}
}
+ if (stop) {
+ if (writable && entries)
+ printf("fix results %d %d %d %d\n", kwritable,uwritable,
+ kentries, uentries);
+ return;
+ }
This would give counts to make sure there is not a logic error in fix_cache.
My best guess is that kwritable/kentries will be non-zero, which says there
is another "dangling kernel allocation" somewhere. We allocate a page into a
kernel mapping and then either:
1) we are not clearing the md.pv_kva entry every time the
allocation is freed. I thought I was careful.
2) someone is temporarily allocating the page and not telling
pmap that they are done with it.
The information that this page is mapped to a kernel mapping is incorrectly
remembered, and the caches are turned off.
We could look in the free page code for a non-empty md.pv_kva entry to
test this theory.
md.pv_kva is set for kernel mappings, so the culprit will be in the
pmap_kenter calls.
Thank-you.
--Mark Tinguely
------------------------------
Message: 11
Date: Mon, 8 Mar 2010 19:41:47 +0100
From: Bernd Walter <ti...@cicely7.cicely.de>
Subject: Re: Performance of SheevaPlug on 8-stable
To: "M. Warner Losh" <i...@bsdimp.com>
Cc: ting...@casselton.net, freeb...@freebsd.org
Message-ID: <20100308184...@cicely7.cicely.de>
Content-Type: text/plain; charset=us-ascii
On Mon, Mar 08, 2010 at 09:15:16AM -0700, M. Warner Losh wrote:
> cIn message: <4B951CE2...@semihalf.com>
> Grzegorz Bernacki <g...@semihalf.com> writes:
> : This is probably caused by mechanism which turns of cache for shared
> : pages.
> : When I add applied following path:
> :
> : diff --git a/sys/arm/arm/pmap.c b/sys/arm/arm/pmap.c
> : index 390dc3c..d17c0cc 100644
> : --- a/sys/arm/arm/pmap.c
> : +++ b/sys/arm/arm/pmap.c
> : @@ -1401,6 +1401,8 @@ pmap_fix_cache(struct vm_page *pg, pmap_t pm,
> : vm_offset_t va)
> : */
> :
> : TAILQ_FOREACH(pv, &pg->md.pv_list, pv_list) {
> : + if (pv->pv_flags & PVF_EXEC)
> : + return;
> : /* generate a count of the pv_entry uses */
> : if (pv->pv_flags & PVF_WRITE) {
> : if (pv->pv_pmap == pmap_kernel())
> :
> : execution time of 'test' program is:
> : mv78100-4# time ./test
> : 5.000u 0.000s 0:05.40 99.8% 40+1324k 0+0io 0pf+0w
> :
> : and without this path is:
> : mv78100-4# time ./test
> : 295.000u 0.000s 4:56.01 99.7% 40+1322k 0+0io 0pf+0w
> :
> :
> : I think we need to handle executable pages in different way.
>
> Agreed. Why would we turn off caching for shared pages? I can
> understand read/write pages in a system that has a virtually index
> cache with the classic cache aliasing problems as a workaround for
> lousy hardware, but otherwise, this one has me scratching my head...
This puzzled me as well.
What is the requirement for such handling of shared pages?
I thought handing over shared data is done by cache-flush, barriers or
whatever an architecture has for this.
Most systems we talk about are single CPU, so it is just DMA and
handing over dcache writes to icache, but we don't support self
modifying code, so it is always done in a controlled way.
And even for SMP systems handing over data requires using
cache coherence mechanisms - e.g. those embedded in mutexes.
So what is wrong in my picture and requires us to do special handling
for shared pages on ARM?
> And if there's only one copy of 'test' running, why does it hit the
> 'shared' case for this code?
>
> Warner
--
B.Walter <be...@bwct.de> http://www.bwct.de
Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm.
------------------------------
Message: 12
Date: Mon, 8 Mar 2010 13:37:23 -0600 (CST)
From: Mark Tinguely <ting...@casselton.net>
Subject: Re: Performance of SheevaPlug on 8-stable
To: ti...@cicely.de
Cc: freeb...@freebsd.org
Message-ID: <201003081937....@casselton.net>
<deleted>
>
> This puzzled me as well.
> What is the requirement for such a handling with shared pages?
> I though handing over shared data is done by cache-flush, barriers or
> whatever an architectur has for this.
> Most systems we talk about are single CPU, so it is just DMA and
> handing over dcache writes to icache, but we don't support self
> modifying code, so it is always done in a controlled way.
> And even for SMP systems handing over data requires using
> cache coherence mechanisms - e.g. those embedded in mutexes.
> So what is wrong in my picture and requires us to do special handling
> for shared pages on ARM?
>
> > And if there's only one copy of 'test' running, why does it hit the
> > 'shared' case for this code?
> >
> > Warner
>
> --
> B.Walter <be...@bwct.de> http://www.bwct.de
> Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm.
ARMv4/ARMv5 use virtual indexed / virtual tagged (VIVT) level one caches.
They may or may not have level two caches. These are the ARM chips
that we currently support, and I will explain the rules below.
Of the newer processors, the ARMv6 can have virtual index / physical tagged
or physical index / physical tagged (PIPT) level one caches; the ARMv7 must
have physical index / physical tag level one caches. The ARMv6 and ARMv7
have more pde/pte bits describing the cache status of the "inner"
and "outer" caches. The ARMv7 has the more mature cache management;
it defines the "level of unification" and "level of coherency" for the
caches. There is also a level of snooping for the ARMv7 multi-core, which I
will just dance around. A PIPT cache must be synced to the "level of
coherency" before DMA and when modified from another process - think of a
debugger in another address space modifying instruction code. ARMv6/ARMv7
have special address spaces to avoid tlb flushes. If they are not used, then
tlbs have to be flushed on context switch. This is close to i386/amd64
with the exception of DMA: i386/amd64 have self-snooping cache buses.
VIVT cache rules:
1) flush cache and tlb on context change.
2) USER cache must be disabled if a physical page has AT LEAST one writable
user mapping AND is also mapped more than one time in the same user
address space. (multiple read mappings and no writes are fine, they take
up multiple cache entries. Obviously, a single read or a single write
is fine. If the mappings are in different user address spaces, we will
be okay because the flush on context change will sync things up).
3) KERNEL spaces are global.
a) If the page is mapped writable AT LEAST ONCE to a kernel space
AND the page is mapped more than once, no matter if the second
mapping is in the user or kernel space, all mappings must not
be cached.
b) If the page has only readable kernel mappings but at least one
writable user mapping, the cache must be disabled for the mappings
of page in this address space. This is slightly different from
rule 2. Kernel mappings are typically writable, so this is a
case that really does not happen.
It gets a little tricky to implement, because we have to catch the transition
from cache -> non-cache (change pte and wbinv/inv data or instruction caches)
and from non-cache -> cache (change the pte).
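The VIVT rules 2, 3a and 3b above can be written down as a small decision
function. This is only a minimal sketch under stated assumptions, not the
actual pmap_fix_cache() logic: struct page_maps, enum cache_mode and
vivt_cache_mode() are hypothetical names, and the user counts are assumed
to cover only the one user address space being examined. Rule 1 (flushing
on context change) and the cache/non-cache transitions are not modelled.

```c
#include <assert.h>

/* Hypothetical result type: which mappings of a page may stay cached. */
enum cache_mode {
	CACHE_ON,		/* every mapping may be cached */
	CACHE_OFF_THIS_SPACE,	/* uncache mappings in this user address space */
	CACHE_OFF_ALL		/* uncache every mapping of the page */
};

/* Hypothetical per-page mapping counts for one user address space. */
struct page_maps {
	int kentries;	/* kernel mappings of the page */
	int kwritable;	/* writable kernel mappings */
	int uentries;	/* user mappings in this address space */
	int uwritable;	/* writable user mappings in this address space */
};

static enum cache_mode
vivt_cache_mode(const struct page_maps *m)
{
	assert(m->kwritable >= 0 && m->kwritable <= m->kentries);
	assert(m->uwritable >= 0 && m->uwritable <= m->uentries);

	/* Rule 3a: a writable kernel mapping plus any second mapping. */
	if (m->kwritable > 0 && m->kentries + m->uentries > 1)
		return (CACHE_OFF_ALL);

	/*
	 * Rule 3b: kernel mappings exist and a user mapping is writable.
	 * Reaching this point with uwritable > 0 implies kwritable == 0,
	 * so the kernel mappings are read-only here.
	 */
	if (m->kentries > 0 && m->uwritable > 0)
		return (CACHE_OFF_THIS_SPACE);

	/* Rule 2: a writable user mapping and more than one user mapping. */
	if (m->uwritable > 0 && m->uentries > 1)
		return (CACHE_OFF_THIS_SPACE);

	return (CACHE_ON);
}
```

In this sketch a page with one writable kernel mapping and one read-only
user mapping comes out as CACHE_OFF_ALL, which is the situation a stale
kernel mapping would produce for an otherwise private executable page.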
--Mark.
------------------------------
Message: 13
Date: Mon, 8 Mar 2010 20:54:49 +0100
From: Bernd Walter <ti...@cicely7.cicely.de>
Subject: Re: Performance of SheevaPlug on 8-stable
To: Mark Tinguely <ting...@casselton.net>
Cc: freeb...@freebsd.org, ti...@cicely.de
Message-ID: <20100308195...@cicely7.cicely.de>
Content-Type: text/plain; charset=us-ascii
On Mon, Mar 08, 2010 at 01:37:23PM -0600, Mark Tinguely wrote:
>
> <deleted>
> >
> > This puzzled me as well.
> > What is the requirement for such a handling with shared pages?
> > I though handing over shared data is done by cache-flush, barriers or
> > whatever an architectur has for this.
> > Most systems we talk about are single CPU, so it is just DMA and
> > handing over dcache writes to icache, but we don't support self
> > modifying code, so it is always done in a controlled way.
> > And even for SMP systems handing over data requires using
> > cache coherence mechanisms - e.g. those embedded in mutexes.
> > So what is wrong in my picture and requires us to do special handling
> > for shared pages on ARM?
> >
> > > And if there's only one copy of 'test' running, why does it hit the
> > > 'shared' case for this code?
> > >
> > > Warner
> >
> > --
> > B.Walter <be...@bwct.de> http://www.bwct.de
> > Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm.
>
> ARMv4/ARMv5 use virtual indexed / virtual tagged level one caches.
> They may or may not have level two caches. This is the ARM chips
> that we currently support, and I will explain the rules below.
>
> Newest processors the ARMv6 can be virtual index / physical tagged or
> physical index / physical tagged level one caches; The ARM7 must have
> physical index / physical tag level one caches. The ARMv6 and ARMv7
> have more pde/pte bit explaining the cache status on the "inner"
> and "outter" caches. The ARMv7 has the more mature cache management;
> it defines the "level of unity" and "level of coherence" for the caches.
> There is also a level snooping for the ARMv7 mulit-core, that I will
> just dance around. PIPT cache must be synced to the "level of coherency"
> before DMA and when modified from another process - think debugger in
> another address space modifying instruction code. ARMv6/ARMv7 have
> special address spaces to avoid tlb flushes. If they are not used, then
> tlbs have to be flushed on context switch. This is close to the i386/amd64
> with the exception of DMA, the i386/amd64 have self snooping cache buses.
>
> VIVT cache rules:
>
> 1) flush cache and tlb on context change.
>
> 2) USER cache must be disabled if a physical page has AT LEAST one writable
> user mapping AND is also mapped more than one time in the same user
> address space. (multiple read mappings and no writes are fine, they take
> up multiple cache entries. Obviously, a single read or a single write
> is fine. If the mappings are in different user address spaces, we will
> be okay because the flush on context change will sync things up).
>
> 3) KERNEL spaces are global.
> a) If the page is mapped writable AT LEAST ONCE to a kernel space
> AND the page is mapped more than once, no matter if the second
> mapping is in the user or kernel space, all mappings must not
> be cached.
I never assumed to be happy without a direct map.
> b) If the page has only readable kernel mappings but at least one
> writable user mapping, the cache must be disabled for the mappings
> of the page in this address space. This is slightly different from
> rule 2. Kernel mappings are typically writable, so this is a
> case that really does not happen.
>
> It gets a little tricky to implement, because we have to catch the transition
> from cache -> non-cache (change pte and wbinv/inv data or instruction caches)
> and from non-cache -> cache (change the pte).
Thanks for the detailed explanation.
It took a while, but now I get it.
My picture didn't expect caches that work on virtual addresses.
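For readers who prefer code, the VIVT rules quoted above can be condensed
into a small decision function. This is only a sketch of my reading of those
rules, not the actual pmap code: struct page_maps, enum cache_policy and
vivt_policy() are hypothetical names, and the counters describe one physical
page as seen from one user address space plus the global kernel space.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-page summary; not the real FreeBSD pmap structures. */
struct page_maps {
	unsigned kentries;	/* kernel mappings of the page */
	unsigned kwritable;	/* writable kernel mappings */
	unsigned uentries;	/* user mappings in this address space */
	unsigned uwritable;	/* writable user mappings in this address space */
};

enum cache_policy {
	CACHE_ALL,		/* every mapping may stay cached */
	UNCACHE_IN_SPACE,	/* uncache the mappings seen in this address space */
	UNCACHE_ALL		/* no mapping of the page may be cached */
};

static enum cache_policy
vivt_policy(const struct page_maps *m)
{
	assert(m->kwritable <= m->kentries && m->uwritable <= m->uentries);

	/* Rule 3a: a writable kernel mapping plus any second mapping. */
	if (m->kwritable > 0 && m->kentries + m->uentries > 1)
		return (UNCACHE_ALL);

	/* Rule 3b: read-only kernel mappings plus a writable user mapping. */
	if (m->kentries > 0 && m->uwritable > 0)
		return (UNCACHE_IN_SPACE);

	/* Rule 2: writable user mapping, mapped more than once in this space. */
	if (m->uwritable > 0 && m->uentries > 1)
		return (UNCACHE_IN_SPACE);

	/* Single mappings and read-only sharing are fine. */
	return (CACHE_ALL);
}
```

The transitions between these states are the tricky part mentioned above:
going from cached to uncached needs a pte change plus a cache
write-back/invalidate, while going back only needs the pte change.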
--
B.Walter <be...@bwct.de> http://www.bwct.de
Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm.
------------------------------
Message: 14
Date: Mon, 8 Mar 2010 21:23:38 +0100
From: Bernd Walter <ti...@cicely7.cicely.de>
Subject: RM9200 tuning
To: a...@freebsd.org
Cc: Bernd Walter <ti...@cicely7.cicely.de>
Message-ID: <20100308202...@cicely7.cicely.de>
Content-Type: text/plain; charset=us-ascii
The whole cache story is scary enough I think, but it showed that it
is not avoidable to have uncacheable pages in some cases.
There are a few facts about RM9200 systems, which most of you running
those systems already know.
Most systems have 16Mx32 SDRAM - although other configurations with
x16 and different size are possible.
Although there is an external SRAM bus, I don't think it makes sense
to use it as general purpose RAM, since the ranges are small and
16bit wide and require lots of waitstates, which makes them slower than
SDRAM in almost every case.
There is also 16k fast internal SRAM, which isn't used within FreeBSD
at all.
Is there anything reasonable we can place in this RAM?
Originally I thought about ate RX buffers or kernel page tables.
Another point is the clock speeds.
There are 3 important clocks - USB clock is generated by a separate
PLL and required to have a specific rate.
CPU clock is generated by another PLL and is typically 180MHz,
but allowed to be up to 205MHz IIRC.
Peripheral clock is divided from CPU clock and is typically 60MHz,
because 180MHz/3, but is allowed to be up to 80MHz.
The limiting factor is the PLL, which can't produce more than 180MHz.
But there are allowed settings, which might be faster in real world.
One example is running the CPU at 160MHz, which allows the
peripheral clock to be 80MHz (160MHz/2).
The CPU is slower, but memory is much faster.
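To make the arithmetic concrete, here is a small sketch that picks the
fastest peripheral clock for a given CPU clock. It assumes integer dividers
from 1 to 4 (my assumption about the master clock divider) and the 80MHz
limit mentioned above; best_master_clock() is a made-up helper, not
existing kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MCK_MAX_HZ	80000000u	/* peripheral clock limit stated above */
#define MCK_DIV_MAX	4u		/* assumed largest CPU-to-peripheral divider */

/*
 * Return the highest peripheral clock not exceeding MCK_MAX_HZ that can be
 * derived from cpu_hz with an integer divider, or 0 if no divider fits.
 * The divider chosen is stored in *div_out when div_out is not NULL.
 */
static uint32_t
best_master_clock(uint32_t cpu_hz, unsigned *div_out)
{
	assert(cpu_hz > 0);

	for (unsigned div = 1; div <= MCK_DIV_MAX; div++) {
		uint32_t mck = cpu_hz / div;

		if (mck <= MCK_MAX_HZ) {
			if (div_out != NULL)
				*div_out = div;
			return (mck);
		}
	}
	return (0);
}
```

With 180MHz the first divider that fits is 3, giving 60MHz; with 160MHz a
divider of 2 already fits and gives the full 80MHz.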
The first thing outside the specification is overclocking the PLL.
I've seen the PLL happily produce 288MHz and I think the restriction
only exists to cover the full temperature range and full xtal type range.
In fact I've even seen the CPU run at 288MHz, but that's another
story.
It should be OK to run the system with full ~205MHz in most cases,
which also increases the peripheral clock including the SDRAM speed.
SDRAM is typically rated with 133MHz or 100MHz, so no problem from
this side.
Then of course there is the possibility of overclocking the CPU
as well, as in the 288MHz case.
Originally FreeBSD had assumed fixed clock rates.
Knowing the peripheral rate is important for e.g. UART bps dividers.
I think in the meantime it is possible to reconfigure the kernel to
different clock rates - if yes what are the kernel options for it?
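As an illustration of why the peripheral rate matters for the UART, here is
a sketch of the divider calculation. It assumes the usual 16x oversampling
relation baud = clock / (16 * divider); the function names are made up for
this example and are not taken from the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Nearest integer divider for the requested baud rate (16x oversampling). */
static uint32_t
uart_divider(uint32_t mck_hz, uint32_t baud)
{
	assert(baud > 0);
	return ((mck_hz + 8u * baud) / (16u * baud));
}

/* Baud rate actually produced by a divider at a given peripheral clock. */
static uint32_t
uart_actual_baud(uint32_t mck_hz, uint32_t cd)
{
	assert(cd > 0);
	return (mck_hz / (16u * cd));
}
```

At 60MHz and 115200 bps the divider is 33, which yields about 113636 bps.
If the clock is really 80MHz while the kernel still assumes 60MHz, the same
divider produces about 151515 bps and the console becomes unreadable.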
Which would be the best place to reconfigure the PLL?
I know how to do it and that it is done by the loader right now, but
I would like to have it as a kernel tunable.
All I need to know is a good place in the kernel startup.
--
B.Walter <be...@bwct.de> http://www.bwct.de
Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm.
------------------------------
Message: 15
Date: Mon, 08 Mar 2010 13:49:57 -0700 (MST)
From: "M. Warner Losh" <i...@bsdimp.com>
Subject: Re: RM9200 tuning
To: ti...@cicely.de, ti...@cicely7.cicely.de
Cc: a...@freebsd.org
Message-ID: <20100308.134957.431...@bsdimp.com>
Content-Type: Text/Plain; charset=us-ascii
In message: <20100308202...@cicely7.cicely.de>
Bernd Walter <ti...@cicely7.cicely.de> writes:
: Originally FreeBSD had assumed fixed clock rates.
Only for the first few revs. Now it is settable with the
AT91_MASTER_CLOCK option.
: Knowing the peripheral rate is important for e.g. UART bps dividers.
: I think in the meantime it is possible to reconfigure the kernel to
: different clock rates - if yes what are the kernel options for it?
: Which would be the best place to reconfigure the PLL?
: I know how to do it and that it is done by the loader right now, but
: I would like to have it as a kernel tuneable.
: All I need to know is a good place in the kernel startup.
Hmmm, I thought the kernel tried to read the master clock rate, but I
can't find the code that does it anymore. The AT91 family has a
register that can be read to get this value, so long as your board
design conforms to the Atmel documented restrictions on the clock xtal
used.
Or, as is usually the case with clocks on these parts, am I confusing
this with something else :)
Warner
------------------------------
Message: 16
Date: Mon, 8 Mar 2010 12:46:38 -0800
From: Stanislav Sedov <st...@FreeBSD.org>
Subject: Re: RM9200 tuning
To: ti...@cicely.de
Cc: a...@freebsd.org, Bernd Walter <ti...@cicely7.cicely.de>
Message-ID: <20100308124638...@FreeBSD.org>
Content-Type: text/plain; charset=US-ASCII
On Mon, 8 Mar 2010 21:23:38 +0100
Bernd Walter <ti...@cicely7.cicely.de> mentioned:
> Originally FreeBSD had assumed fixed clock rates.
> Knowing the peripheral rate is important for e.g. UART bps dividers.
> I think in the meantime it is possible to reconfigure the kernel to
> different clock rates - if yes what are the kernel options for it?
> Which would be the best place to reconfigure the PLL?
> I know how to do it and that it is done by the loader right now, but
> I would like to have it as a kernel tuneable.
> All I need to know is a good place in the kernel startup.
>
I think the best place to do this would be the loader itself. AFAIK, FreeBSD
on AT91 doesn't assume any specific clock rate except the FSB clock rate and does
the calibration of the CPU clock and xtal clock on startup. One solution is to
add a loader tunable that will allow you to pass the FSB clock rate from the loader,
instead of assuming a constant value.
--
Stanislav Sedov
ST4096-RIPE
------------------------------
Message: 17
Date: Mon, 8 Mar 2010 22:24:01 +0100
From: Bernd Walter <ti...@cicely7.cicely.de>
Subject: Re: RM9200 tuning
To: "M. Warner Losh" <i...@bsdimp.com>
Cc: a...@freebsd.org, ti...@cicely7.cicely.de, ti...@cicely.de
Message-ID: <20100308212...@cicely7.cicely.de>
Content-Type: text/plain; charset=us-ascii
On Mon, Mar 08, 2010 at 01:49:57PM -0700, M. Warner Losh wrote:
> In message: <20100308202...@cicely7.cicely.de>
> Bernd Walter <ti...@cicely7.cicely.de> writes:
> : Originally FreeBSD had assumed fixed clock rates.
>
> Only for the first few revs. Now it is settable with the
> AT91_MASTER_CLOCK option.
This is what I called the peripheral clock?
So in the normal case I setup this value to 60,000,000?
> : Knowing the peripheral rate is important for e.g. UART bps dividers.
> : I think in the meantime it is possible to reconfigure the kernel to
> : different clock rates - if yes what are the kernel options for it?
> : Which would be the best place to reconfigure the PLL?
> : I know how to do it and that it is done by the loader right now, but
> : I would like to have it as a kernel tuneable.
> : All I need to know is a good place in the kernel startup.
>
> Hmmm, I thought the kernel tried to read the master clock rate, but I
> can't find the code that does it anymore. The AT91 family have a
> register than can be read to get this value, so long as your board
> design conforms to the atmel documented restrictions on the clock xtal
> used.
Ah - I vaguely remember that there was something like this.
I think it is initialized from the ROM code by comparing with the
32768 oscillator - without this it wouldn't be possible for the ROM code
to xmodem the first boot code via DBGU.
But I think it is just to get the xtal clock.
At least it would be possible to calculate the remaining clocks from
that.
Well - I want something different.
I want to reprogram the PLL inside the kernel, so I would be more
interested to know where there would be a good place to inject code for
this.
It needs to be early enough, before anything depending on the clock has
started.
I don't care whether the trampoline code unzips the kernel at the highest speed.
> Or, as is usually the case with clocks on these parts, am I confusing
> this with something else :)
It sounds familiar, but I do know that I had to hardcode the xtal speed
in the bootcode - which isn't the worst thing, since the xtal is always
16MHz with my boards.
Maybe it is used to set up the USB PLL later in the kernel.
--
B.Walter <be...@bwct.de> http://www.bwct.de
Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm.
------------------------------
Message: 18
Date: Mon, 8 Mar 2010 22:32:57 +0100
From: Bernd Walter <ti...@cicely7.cicely.de>
Subject: Re: RM9200 tuning
To: Stanislav Sedov <st...@FreeBSD.org>
Cc: a...@FreeBSD.org, Bernd Walter <ti...@cicely7.cicely.de>,
ti...@cicely.de
Message-ID: <20100308213...@cicely7.cicely.de>
Content-Type: text/plain; charset=us-ascii
On Mon, Mar 08, 2010 at 12:46:38PM -0800, Stanislav Sedov wrote:
> On Mon, 8 Mar 2010 21:23:38 +0100
> Bernd Walter <ti...@cicely7.cicely.de> mentioned:
>
> > Originally FreeBSD had assumed fixed clock rates.
> > Knowing the peripheral rate is important for e.g. UART bps dividers.
> > I think in the meantime it is possible to reconfigure the kernel to
> > different clock rates - if yes what are the kernel options for it?
> > Which would be the best place to reconfigure the PLL?
> > I know how to do it and that it is done by the loader right now, but
> > I would like to have it as a kernel tuneable.
> > All I need to know is a good place in the kernel startup.
> >
>
> I think the best place to do this would be the loader itself. AFAIK, FreeBSD
> on AT91 doesn't assume any specific clock rate except the FSB clock rate and does
> the calibration of the CPU clock and xtal clock on startup. One of solutions is to
> add a loader tunable that will allow you to pass the FSB clock rate from the loader,
> instead of assuming the constant value.
Well I still don't use a real loader, just the plain bootcode.
I would be very happy to switch to loader(8) with FICL, tunables and
boot prompt.
Is it possible to do that today?
I was with Warner using my elfbuild hardware for the first time when he
did the first steps on RM9200.
Therefore I'm probably still using obsolete old quick and dirty hacks.
If loader(8) can be used now it is the first thing I will change before
trying anything else.
--
B.Walter <be...@bwct.de> http://www.bwct.de
Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm.
------------------------------
Message: 19
Date: Mon, 8 Mar 2010 23:18:50 +0100
From: Bernd Walter <ti...@cicely7.cicely.de>
Subject: Re: RM9200 tuning
To: "M. Warner Losh" <i...@bsdimp.com>
Cc: a...@freebsd.org, ti...@cicely7.cicely.de, ti...@cicely.de
Message-ID: <20100308221...@cicely7.cicely.de>
Content-Type: text/plain; charset=us-ascii
On Mon, Mar 08, 2010 at 10:24:01PM +0100, Bernd Walter wrote:
> On Mon, Mar 08, 2010 at 01:49:57PM -0700, M. Warner Losh wrote:
> > In message: <20100308202...@cicely7.cicely.de>
> > Bernd Walter <ti...@cicely7.cicely.de> writes:
> > : Originally FreeBSD had assumed fixed clock rates.
> >
> > Only for the first few revs. Now it is settable with the
> > AT91_MASTER_CLOCK option.
>
> This is what I called the peripheral clock?
> So in the normal case I setup this value to 60,000,000?
>
> > : Knowing the peripheral rate is important for e.g. UART bps dividers.
> > : I think in the meantime it is possible to reconfigure the kernel to
> > : different clock rates - if yes what are the kernel options for it?
> > : Which would be the best place to reconfigure the PLL?
> > : I know how to do it and that it is done by the loader right now, but
> > : I would like to have it as a kernel tuneable.
> > : All I need to know is a good place in the kernel startup.
> >
> > Hmmm, I thought the kernel tried to read the master clock rate, but I
> > can't find the code that does it anymore. The AT91 family have a
> > register than can be read to get this value, so long as your board
> > design conforms to the atmel documented restrictions on the clock xtal
> > used.
>
> Ah - I hardly remember that there was something like this.
> I think it is initialized from the ROM code by comparing with the
> 32768 oscillator - without this it wouldn't be possible for the ROM code
> to xmodem the first boot code via DBGU.
> But I think it is just to get the xtal clock.
> At least it would be possible to calculate the remaining clocks from
> that.
> Well - I want something different.
> I want to reprogramm the PLL inside the kernel so I would be more
> interested to know where there would be a good place to inject code for
> this.
> It needs to be early enough before anything depending on the clock has
> startet.
> I don't care if trampoline code unzip the kernel at highest speed.
>
> > Or, as is usually the case with clocks on these parts, am I confusing
> > this with something else :)
>
> It sounds familar, but I do know that I had to hardcode the xtal speed
> in the bootcode - which isn't the worst thing, since the xtal is always
> 16MHz with my boards.
> Maybe is is used to setup the USB PLL later in the kernel.
Yes - there is CKGR_MCFR to read out the xtal frequency.
However this is not an exact value since it is counted with the
32768 oscillator.
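A rough sketch of the conversion shows where the inexactness comes from. It
assumes, from my recollection of the datasheet, that the counter field holds
the number of main clock cycles seen during 16 periods of the 32768Hz
oscillator; the names below are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define SLOW_CLOCK_HZ	32768u	/* 32768 Hz oscillator used as reference */
#define MCFR_WINDOW	16u	/* assumed: slow clock periods per measurement */

/* Convert the raw 16 bit cycle count into an xtal frequency estimate. */
static uint32_t
mainf_to_hz(uint32_t mainf)
{
	assert(mainf <= 0xffffu);	/* keeps the product within 32 bits */
	return (mainf * SLOW_CLOCK_HZ / MCFR_WINDOW);
}

/* One count corresponds to this many Hz, which bounds the precision. */
static uint32_t
mainf_resolution_hz(void)
{
	return (SLOW_CLOCK_HZ / MCFR_WINDOW);
}
```

Under these assumptions a 16MHz xtal would read as a count of about 7812,
which converts back to 15998976Hz, so the result is only good to roughly
2kHz and also inherits any error of the 32768Hz oscillator itself.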
--
B.Walter <be...@bwct.de> http://www.bwct.de
Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm.
------------------------------
Message: 20
Date: Mon, 8 Mar 2010 14:38:50 -0800
From: Stanislav Sedov <st...@FreeBSD.org>
Subject: Re: RM9200 tuning
To: ti...@cicely.de
Cc: a...@FreeBSD.org, Bernd Walter <ti...@cicely7.cicely.de>
Message-ID: <20100308143850...@FreeBSD.org>
Content-Type: text/plain; charset=US-ASCII
On Mon, 8 Mar 2010 22:32:57 +0100
Bernd Walter <ti...@cicely7.cicely.de> mentioned:
> Well I still don't use a real loader, just the plain bootcode.
> I would be very happy to switch to loader(8) with FICL, tuneables and
> bootpromt.
> Is it possible to do today?
> I was with Warner using my elfbuild hardware for the first time when he
> did the first steps on RM9200.
> Therefor I'm probably still using obsolete old quick and dirty hacks.
> If loader(8) can be used now it is the first thing I will change before
> trying anything else.
I believe it should be pretty easy to port ubldr(8) to support AT91, as raj@ made it
work on the Marvell ARM platform. I'm not sure which details would need to be
reimplemented, though, but from what I have seen it should require only a little
work.
I'm just using plain U-Boot loading directly from flash without the Atmel 1st level
loader, and then use U-Boot to load the FreeBSD kernel and pass hints and kenv variables.
I think I posted the relevant code a couple of times to the list, but maybe it
really makes more sense to port loader(8) instead, as my code isn't going to make
it into the official U-Boot distribution.
--
Stanislav Sedov
ST4096-RIPE
------------------------------
Message: 21
Date: Mon, 8 Mar 2010 23:54:21 +0100
From: Bernd Walter <ti...@cicely7.cicely.de>
Subject: Re: RM9200 tuning
To: Stanislav Sedov <st...@FreeBSD.org>
Cc: a...@FreeBSD.org, Bernd Walter <ti...@cicely7.cicely.de>,
ti...@cicely.de
Message-ID: <20100308225...@cicely7.cicely.de>
Content-Type: text/plain; charset=us-ascii
On Mon, Mar 08, 2010 at 02:38:50PM -0800, Stanislav Sedov wrote:
> On Mon, 8 Mar 2010 22:32:57 +0100
> Bernd Walter <ti...@cicely7.cicely.de> mentioned:
>
> > Well I still don't use a real loader, just the plain bootcode.
> > I would be very happy to switch to loader(8) with FICL, tuneables and
> > bootpromt.
> > Is it possible to do today?
> > I was with Warner using my elfbuild hardware for the first time when he
> > did the first steps on RM9200.
> > Therefor I'm probably still using obsolete old quick and dirty hacks.
> > If loader(8) can be used now it is the first thing I will change before
> > trying anything else.
>
> I belive it should be pretty easy to port ubldr(8) to support AT91 as raj@ made it
> working on the Marverll ARM platform. I'm not sure which details should be
> reimplemented, though, but from what I seen it should be quite a little work
> required.
>
> I'm just using plain uboot loading direclty from flash without Atmel 1st level
> loader and then use uboot to load FreeBSD kernel/pass hints and kenv variables.
> I think I posted the relevant code a couple of times to the list, but maybe it
> really makes more sense to port loader(8) instead as my code isn't going to make
> it into the official uboot distribution.
The ability to configure kenv variables is a big win.
I'm using sys/boot/arm/at91/boot2 from spiflash instead of uboot, which
directly starts the kernel.
The kernel has ugly hardcoded things, such as rootdev and hints.
There is already a good amount of support for loader(8) in libat91
so it shouldn't be too hard to do.
I may give it a try after finishing some other projects.
--
B.Walter <be...@bwct.de> http://www.bwct.de
Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm.
------------------------------
Message: 22
Date: Tue, 9 Mar 2010 00:06:00 +0100
From: Rafal Jaworowski <r...@semihalf.com>
Subject: Re: RM9200 tuning
To: Stanislav Sedov <st...@FreeBSD.org>
Cc: a...@FreeBSD.org, Bernd Walter <ti...@cicely7.cicely.de>,
ti...@cicely.de
Message-ID: <F1BB4E3F-6094-4FC0...@semihalf.com>
Content-Type: text/plain; charset=us-ascii
On 2010-03-08, at 23:38, Stanislav Sedov wrote:
> On Mon, 8 Mar 2010 22:32:57 +0100
> Bernd Walter <ti...@cicely7.cicely.de> mentioned:
>
>> Well I still don't use a real loader, just the plain bootcode.
>> I would be very happy to switch to loader(8) with FICL, tuneables and
>> bootpromt.
>> Is it possible to do today?
>> I was with Warner using my elfbuild hardware for the first time when he
>> did the first steps on RM9200.
>> Therefor I'm probably still using obsolete old quick and dirty hacks.
>> If loader(8) can be used now it is the first thing I will change before
>> trying anything else.
>
> I belive it should be pretty easy to port ubldr(8) to support AT91 as raj@ made it
> working on the Marverll ARM platform. I'm not sure which details should be
> reimplemented, though, but from what I seen it should be quite a little work
> required.
The loader(8) support for ARM is independent of any specific board, but it's currently only integrated with U-Boot as an underlying firmware (though it should not be difficult to integrate it with other first-stage bootloaders provided they export some API for elementary operations on devices).
We have actually had it running so far on various Marvell systems (ARMv5 and v6), TI DaVinci (v5) and EP93xx (also v5), as far as I remember. The only requirement for running current loader(8) on ARM is that U-Boot is built with CONFIG_API option.
Rafal
------------------------------
Message: 23
Date: Mon, 08 Mar 2010 16:06:22 -0700 (MST)
From: "M. Warner Losh" <i...@bsdimp.com>
Subject: Re: RM9200 tuning
To: ti...@cicely.de, ti...@cicely7.cicely.de
Cc: st...@FreeBSD.org, a...@FreeBSD.org
Message-ID: <20100308.160622.149...@bsdimp.com>
Content-Type: Text/Plain; charset=us-ascii
In message: <20100308225...@cicely7.cicely.de>
Bernd Walter <ti...@cicely7.cicely.de> writes:
: On Mon, Mar 08, 2010 at 02:38:50PM -0800, Stanislav Sedov wrote:
: > On Mon, 8 Mar 2010 22:32:57 +0100
: > Bernd Walter <ti...@cicely7.cicely.de> mentioned:
: >
: > > Well I still don't use a real loader, just the plain bootcode.
: > > I would be very happy to switch to loader(8) with FICL, tuneables and
: > > bootpromt.
: > > Is it possible to do today?
: > > I was with Warner using my elfbuild hardware for the first time when he
: > > did the first steps on RM9200.
: > > Therefor I'm probably still using obsolete old quick and dirty hacks.
: > > If loader(8) can be used now it is the first thing I will change before
: > > trying anything else.
: >
: > I belive it should be pretty easy to port ubldr(8) to support AT91 as raj@ made it
: > working on the Marverll ARM platform. I'm not sure which details should be
: > reimplemented, though, but from what I seen it should be quite a little work
: > required.
: >
: > I'm just using plain uboot loading direclty from flash without Atmel 1st level
: > loader and then use uboot to load FreeBSD kernel/pass hints and kenv variables.
: > I think I posted the relevant code a couple of times to the list, but maybe it
: > really makes more sense to port loader(8) instead as my code isn't going to make
: > it into the official uboot distribution.
:
: The ability to configure kenv variables is a big win.
: I'm using sys/boot/arm/at91/boot2 from spiflash instead of uboot, which
: directly starts the kernel.
: The kernel has ugly hardcoded things, such as rootdev and hints.
: There is already a good amount of support for loader(8) in libat91
: so it shouldn't be too hard to do.
: I may give it a try after finishing some other projects.
I have an 8MB SPI flash on one of my boards... Maybe I should get
/boot/loader support going with it.. as well as the different
GEOM_FOO partitioning schemes... Certainly would make a certain local
company interested...
Warner
------------------------------
Message: 24
Date: Tue, 9 Mar 2010 00:38:22 +0100
From: Bernd Walter <ti...@cicely7.cicely.de>
Subject: Re: RM9200 tuning
To: "M. Warner Losh" <i...@bsdimp.com>
Cc: st...@FreeBSD.org, a...@FreeBSD.org, ti...@cicely7.cicely.de,
ti...@cicely.de
Message-ID: <20100308233...@cicely7.cicely.de>
Content-Type: text/plain; charset=us-ascii
On Mon, Mar 08, 2010 at 04:06:22PM -0700, M. Warner Losh wrote:
> In message: <20100308225...@cicely7.cicely.de>
> Bernd Walter <ti...@cicely7.cicely.de> writes:
> : On Mon, Mar 08, 2010 at 02:38:50PM -0800, Stanislav Sedov wrote:
> : > On Mon, 8 Mar 2010 22:32:57 +0100
> : > Bernd Walter <ti...@cicely7.cicely.de> mentioned:
> : >
> : > > Well I still don't use a real loader, just the plain bootcode.
> : > > I would be very happy to switch to loader(8) with FICL, tuneables and
> : > > bootpromt.
> : > > Is it possible to do today?
> : > > I was with Warner using my elfbuild hardware for the first time when he
> : > > did the first steps on RM9200.
> : > > Therefor I'm probably still using obsolete old quick and dirty hacks.
> : > > If loader(8) can be used now it is the first thing I will change before
> : > > trying anything else.
> : >
> : > I belive it should be pretty easy to port ubldr(8) to support AT91 as raj@ made it
> : > working on the Marverll ARM platform. I'm not sure which details should be
> : > reimplemented, though, but from what I seen it should be quite a little work
> : > required.
> : >
> : > I'm just using plain uboot loading direclty from flash without Atmel 1st level
> : > loader and then use uboot to load FreeBSD kernel/pass hints and kenv variables.
> : > I think I posted the relevant code a couple of times to the list, but maybe it
> : > really makes more sense to port loader(8) instead as my code isn't going to make
> : > it into the official uboot distribution.
> :
> : The ability to configure kenv variables is a big win.
> : I'm using sys/boot/arm/at91/boot2 from spiflash instead of uboot, which
> : directly starts the kernel.
> : The kernel has ugly hardcoded things, such as rootdev and hints.
> : There is already a good amount of support for loader(8) in libat91
> : so it shouldn't be too hard to do.
> : I may give it a try after finishing some other projects.
>
> I have an 8MB SPI flash on one of my boards... Maybe I should get
> /boot/loader support going with it.. as well as the different
> GEOM_FOO partioning schemes... Certainly would make a certain local
> company interested...
I just thought about boot2 retrieving loader from SD instead of
kernel.
But you are probably right to retrieve the loader from flash instead.
boot2 with UFS support is already too big for AT91 with just 8k SRAM.
A boot2 with just flash support would be much smaller and loader then
can switch between netboot and such, which currently has to be hardcoded
to keep the size down.
My boards usually have AT45DB161D (some early boards have 321C) chips.
Considering the loader size on i386, this should fit.
I remember you also worked with boards booting from I2C flash?
--
B.Walter <be...@bwct.de> http://www.bwct.de
Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm.
------------------------------
Message: 25
Date: Tue, 09 Mar 2010 00:50:29 +0100
From: Maks Verver <maksv...@geocities.com>
Subject: Re: Performance of SheevaPlug on 8-stable
To: freeb...@freebsd.org
Message-ID: <4B958D45...@geocities.com>
Content-Type: text/plain; charset=ISO-8859-1
On 03/08/2010 07:19 PM, Mark Tinguely wrote:
> Could you do this instead:
> <code>
> This would give counts to make sure there is not a logic error in fix_cache.
I tried this (adding initialization of the flag variable) and the
problem is triggered with (at least) these values:
kwritable  uwritable  kentries  uentries
    1          0         1         0
    1          0         1         1
    0          1         0         1
    1          0         1         2
The 1/0/1/1 case appears most frequently. Why executable pages are ever
mapped writable in user-space, I don't know. It's useful for
generated/self-modifying code obviously, but I would not expect any of
the standard tools or libraries to rely on that.
- Maks Verver.
------------------------------
Message: 26
Date: Mon, 08 Mar 2010 16:58:57 -0700 (MST)
From: "M. Warner Losh" <i...@bsdimp.com>
Subject: Re: RM9200 tuning
To: ti...@cicely.de, ti...@cicely7.cicely.de
Cc: st...@FreeBSD.org, a...@FreeBSD.org
Message-ID: <20100308.165857.535...@bsdimp.com>
Content-Type: Text/Plain; charset=us-ascii
In message: <20100308233...@cicely7.cicely.de>
Bernd Walter <ti...@cicely7.cicely.de> writes:
: On Mon, Mar 08, 2010 at 04:06:22PM -0700, M. Warner Losh wrote:
: > In message: <20100308225...@cicely7.cicely.de>
: > Bernd Walter <ti...@cicely7.cicely.de> writes:
: > : On Mon, Mar 08, 2010 at 02:38:50PM -0800, Stanislav Sedov wrote:
: > : > On Mon, 8 Mar 2010 22:32:57 +0100
: > : > Bernd Walter <ti...@cicely7.cicely.de> mentioned:
: > : >
: > : > > Well I still don't use a real loader, just the plain bootcode.
: > : > > I would be very happy to switch to loader(8) with FICL, tuneables and
: > : > > bootpromt.
: > : > > Is it possible to do today?
: > : > > I was with Warner using my elfbuild hardware for the first time when he
: > : > > did the first steps on RM9200.
: > : > > Therefor I'm probably still using obsolete old quick and dirty hacks.
: > : > > If loader(8) can be used now it is the first thing I will change before
: > : > > trying anything else.
: > : >
: > : > I belive it should be pretty easy to port ubldr(8) to support AT91 as raj@ made it
: > : > working on the Marverll ARM platform. I'm not sure which details should be
: > : > reimplemented, though, but from what I seen it should be quite a little work
: > : > required.
: > : >
: > : > I'm just using plain uboot loading direclty from flash without Atmel 1st level
: > : > loader and then use uboot to load FreeBSD kernel/pass hints and kenv variables.
: > : > I think I posted the relevant code a couple of times to the list, but maybe it
: > : > really makes more sense to port loader(8) instead as my code isn't going to make
: > : > it into the official uboot distribution.
: > :
: > : The ability to configure kenv variables is a big win.
: > : I'm using sys/boot/arm/at91/boot2 from spiflash instead of uboot, which
: > : directly starts the kernel.
: > : The kernel has ugly hardcoded things, such as rootdev and hints.
: > : There is already a good amount of support for loader(8) in libat91
: > : so it shouldn't be too hard to do.
: > : I may give it a try after finishing some other projects.
: >
: > I have an 8MB SPI flash on one of my boards... Maybe I should get
: > /boot/loader support going with it.. as well as the different
: > GEOM_FOO partioning schemes... Certainly would make a certain local
: > company interested...
:
: I just thought about boot2 retrieving loader from SD instead of
: kernel.
: But you are probably right to retrieve the loader from flash instead.
: boot2 with UFS support is already too big for AT91 with just 8k SRAM.
But the AT91RM9200 has 16k :) We are at about 9.5k, and further space
savings are difficult.
: A boot2 with just flash support would be much smaller and loader then
: can switch between netboot and such, which currently has to be hardcoded
: to keep the size down.
: My boards usually have AT45DB161D (some early boards have 321C) chips.
: Considered the loader size on i386 this should fit.
: I remeber you also worked with boards booting from I2C flash?
Yes.
Warner
------------------------------
Message: 27
Date: Tue, 9 Mar 2010 01:21:47 +0100
From: Bernd Walter <ti...@cicely7.cicely.de>
Subject: Re: RM9200 tuning
To: "M. Warner Losh" <i...@bsdimp.com>
Cc: st...@FreeBSD.org, a...@FreeBSD.org, ti...@cicely7.cicely.de,
ti...@cicely.de
Message-ID: <20100309002...@cicely7.cicely.de>
Content-Type: text/plain; charset=us-ascii
On Mon, Mar 08, 2010 at 04:58:57PM -0700, M. Warner Losh wrote:
> In message: <20100308233...@cicely7.cicely.de>
> Bernd Walter <ti...@cicely7.cicely.de> writes:
> : On Mon, Mar 08, 2010 at 04:06:22PM -0700, M. Warner Losh wrote:
> : > In message: <20100308225...@cicely7.cicely.de>
> : > Bernd Walter <ti...@cicely7.cicely.de> writes:
> : > : On Mon, Mar 08, 2010 at 02:38:50PM -0800, Stanislav Sedov wrote:
> : > : > On Mon, 8 Mar 2010 22:32:57 +0100
> : > : > Bernd Walter <ti...@cicely7.cicely.de> mentioned:
> : > : >
> : > : > > Well I still don't use a real loader, just the plain bootcode.
> : > : > > I would be very happy to switch to loader(8) with FICL, tuneables and
> : > : > > bootpromt.
> : > : > > Is it possible to do today?
> : > : > > I was with Warner using my elfbuild hardware for the first time when he
> : > : > > did the first steps on RM9200.
> : > : > > Therefor I'm probably still using obsolete old quick and dirty hacks.
> : > : > > If loader(8) can be used now it is the first thing I will change before
> : > : > > trying anything else.
> : > : >
> : > : > I belive it should be pretty easy to port ubldr(8) to support AT91 as raj@ made it
> : > : > working on the Marverll ARM platform. I'm not sure which details should be
> : > : > reimplemented, though, but from what I seen it should be quite a little work
> : > : > required.
> : > : >
> : > : > I'm just using plain uboot loading direclty from flash without Atmel 1st level
> : > : > loader and then use uboot to load FreeBSD kernel/pass hints and kenv variables.
> : > : > I think I posted the relevant code a couple of times to the list, but maybe it
> : > : > really makes more sense to port loader(8) instead as my code isn't going to make
> : > : > it into the official uboot distribution.
> : > :
> : > : The ability to configure kenv variables is a big win.
> : > : I'm using sys/boot/arm/at91/boot2 from spiflash instead of uboot, which
> : > : directly starts the kernel.
> : > : The kernel has ugly hardcoded things, such as rootdev and hints.
> : > : There is already a good amount of support for loader(8) in libat91
> : > : so it shouldn't be too hard to do.
> : > : I may give it a try after finishing some other projects.
> : >
> : > I have an 8MB SPI flash on one of my boards... Maybe I should get
> : > /boot/loader support going with it.. as well as the different
> : > GEOM_FOO partioning schemes... Certainly would make a certain local
> : > company interested...
> :
> : I just thought about boot2 retrieving loader from SD instead of
> : kernel.
> : But you are probably right to retrieve the loader from flash instead.
> : boot2 with UFS support is already too big for AT91 with just 8k SRAM.
>
> But the AT91RM9200 has 16k :) We are at about 9.5k, and further space
> savings is difficult.
I would be lying if I said that I never looked at the AT91SAM9260.
It is cheaper, faster, needs less external circuitry and has fewer hardware
drawbacks (especially worth mentioning are ATE and MCI) - but only 8k SRAM.
My boot0spi is, including a simple RAM check, just 3585 bytes (codesize).
It is mostly the UFS, netboot and SD support which makes boot2 that big.
Just initializing the hardware and then copying loader(8) from SPI into SDRAM
should easily fit into 8k.
Or was there a different reason to mention that you have 8M flash?
--
B.Walter <be...@bwct.de> http://www.bwct.de
Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm.
------------------------------
End of freebsd-arm Digest, Vol 206, Issue 2
*******************************************