Today's Topics:
1. Re: Performance of SheevaPlug on 8-stable (Grzegorz Bernacki)
2. Re: Performance of SheevaPlug on 8-stable (Rafal Jaworowski)
3. Re: Performance of SheevaPlug on 8-stable (Mark Tinguely)
4. [releng_8 tinderbox] failure on arm/arm (FreeBSD Tinderbox)
5. Re: Performance of SheevaPlug on 8-stable (Bernd Walter)
6. Re: Performance of SheevaPlug on 8-stable (Rafal Jaworowski)
7. Re: Performance of SheevaPlug on 8-stable (Mark Tinguely)
8. Re: Performance of SheevaPlug on 8-stable (Mark Tinguely)
----------------------------------------------------------------------
Message: 1
Date: Wed, 10 Mar 2010 14:58:00 +0100
From: Grzegorz Bernacki <g...@semihalf.com>
Subject: Re: Performance of SheevaPlug on 8-stable
To: Mark Tinguely <ting...@casselton.net>
Cc: freeb...@freebsd.org
Message-ID: <4B97A568...@semihalf.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Mark Tinguely wrote:
> FreeBSD-current has kernel and user witness turned on. Witness is for
> locks, so it should not change the performance of a tight arithmetic loop
> like this.
>
> I don't know the Marvell internals, and from what I can tell, their technical
> docs require an NDA. That said, many of the ARM processors also have an
> internal instruction cache (instruction prefetch) in addition to the
> instruction cache. I don't think the prefetch has an enable/disable.
>
> From the CPU identification, it looks like branch prediction
> is turned on. Branch prediction compensates for the longer pipelines.
> I can't see how that could go astray in a tight loop.
>
> Thus says the ARM ARM:
>
> ARM implementations are free to choose how far ahead of the
> current point of execution they prefetch instructions; either
> a fixed or a dynamically varying number of instructions. As well
> as being free to choose how many instructions to prefetch, an ARM
> implementation can choose which possible future execution path to
> prefetch along. For example, after a branch instruction, it can
> choose to prefetch either the instruction following the branch
> or the instruction at the branch target. This is known as branch
> prediction.
>
> There are a few data dangling allocations that I would like to see
> closed from the multiple kernel allocation fix. *IN THEORY, IF* a page
> is allocated via the arm_nocache (DMA COHERENT) or a sendfile, then
> it is never marked as unallocated. *IN THEORY*, if that page is used
> again, then we could falsely believe that page is being shared and
> we turn off the cache, even though it is not shared.
>
> http://www.casselton.net/~tinguely/arm_pmap_unmanaged.diff
>
> * Disclaimer: I am not sure whether DMA COHERENT or sendfile is used in
> the Sheeva implementation. This is a theoretical observation of a side
> effect of the multiple kernel mapping patch that we did just before
> FreeBSD 8-release.
I instrumented the code with KTRs and your theory is correct. The kernel reuses
a page that was previously mapped via arm_nocache. Your patch should be applied
to -current.
grzesiek
------------------------------
Message: 2
Date: Wed, 10 Mar 2010 15:05:14 +0100
From: Rafal Jaworowski <r...@semihalf.com>
Subject: Re: Performance of SheevaPlug on 8-stable
To: Bernd Walter <ti...@cicely7.cicely.de>
Cc: Mark Tinguely <ting...@casselton.net>, freeb...@freebsd.org
Message-ID: <0C313497-FA0C-4E93...@semihalf.com>
Content-Type: text/plain; charset=us-ascii
On 2010-03-10, at 14:58, Grzegorz Bernacki wrote:
>> There are a few data dangling allocations that I would like to see
>> closed from the multiple kernel allocation fix. *IN THEORY, IF* a page
>> is allocated via the arm_nocache (DMA COHERENT) or a sendfile, then
>> it is never marked as unallocated. *IN THEORY*, if that page is used
>> again, then we could falsely believe that page is being shared and
>> we turn off the cache, even though it is not shared.
>> http://www.casselton.net/~tinguely/arm_pmap_unmanaged.diff
>> * Disclaimer: I am not sure whether DMA COHERENT or sendfile is used in
>> the Sheeva implementation. This is a theoretical observation of a side
>> effect of the multiple kernel mapping patch that we did just before
>> FreeBSD 8-release.
>
> I instrumented the code with KTRs and your theory is correct. The kernel reuses
> a page that was previously mapped via arm_nocache. Your patch should be applied
> to -current.
Bernd,
Could you confirm this also fixes the issues for you on the RM9200 machine? If so, I'll go ahead and commit the changes.
Rafal
------------------------------
Message: 3
Date: Wed, 10 Mar 2010 08:21:30 -0600 (CST)
From: Mark Tinguely <ting...@casselton.net>
Subject: Re: Performance of SheevaPlug on 8-stable
To: g...@semihalf.com, ting...@casselton.net
Cc: freeb...@freebsd.org
Message-ID: <201003101421....@casselton.net>
<deletes of my stuff>
> > There are a few data dangling allocations that I would like to see
> > closed from the multiple kernel allocation fix. *IN THEORY, IF* a page
> > is allocated via the arm_nocache (DMA COHERENT) or a sendfile, then
> > it is never marked as unallocated. *IN THEORY*, if that page is used
> > again, then we could falsely believe that page is being shared and
> > we turn off the cache, even though it is not shared.
> >
> > http://www.casselton.net/~tinguely/arm_pmap_unmanaged.diff
> >
> > * Disclaimer: I am not sure whether DMA COHERENT or sendfile is used in
> > the Sheeva implementation. This is a theoretical observation of a side
> > effect of the multiple kernel mapping patch that we did just before
> > FreeBSD 8-release.
>
> I instrumented the code with KTRs and your theory is correct. The kernel reuses
> a page that was previously mapped via arm_nocache. Your patch should be applied
> to -current.
>
> grzesiek
Thank you. I would appreciate it if someone would commit that patch. It
has been on my mind since FreeBSD 8.0-release. This patch should help some
data buffers because we will no longer incorrectly think the page is still
mapped to a KVA.
I don't think this is the solution to our I/O performance problem. Off-list,
I have mentioned my belief that it may be beneficial to consider not turning
off the cache for DMA_COHERENT, even on ARMv4/ARMv5.
--Mark Tinguely.
------------------------------
Message: 4
Date: Wed, 10 Mar 2010 14:28:18 GMT
From: FreeBSD Tinderbox <tind...@freebsd.org>
Subject: [releng_8 tinderbox] failure on arm/arm
To: FreeBSD Tinderbox <tind...@freebsd.org>, <sta...@freebsd.org>,
<a...@freebsd.org>
Message-ID: <201003101428....@freebsd-current.sentex.ca>
TB --- 2010-03-10 14:00:00 - tinderbox 2.6 running on freebsd-current.sentex.ca
TB --- 2010-03-10 14:00:00 - starting RELENG_8 tinderbox run for arm/arm
TB --- 2010-03-10 14:00:00 - cleaning the object tree
TB --- 2010-03-10 14:00:08 - cvsupping the source tree
TB --- 2010-03-10 14:00:08 - /usr/bin/csup -z -r 3 -g -L 1 -h cvsup.sentex.ca /tinderbox/RELENG_8/arm/arm/supfile
TB --- 2010-03-10 14:00:26 - building world
TB --- 2010-03-10 14:00:26 - MAKEOBJDIRPREFIX=/obj
TB --- 2010-03-10 14:00:26 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
TB --- 2010-03-10 14:00:26 - TARGET=arm
TB --- 2010-03-10 14:00:26 - TARGET_ARCH=arm
TB --- 2010-03-10 14:00:26 - TZ=UTC
TB --- 2010-03-10 14:00:26 - __MAKE_CONF=/dev/null
TB --- 2010-03-10 14:00:26 - cd /src
TB --- 2010-03-10 14:00:26 - /usr/bin/make -B buildworld
>>> World build started on Wed Mar 10 14:00:26 UTC 2010
>>> Rebuilding the temporary build tree
>>> stage 1.1: legacy release compatibility shims
>>> stage 1.2: bootstrap tools
>>> stage 2.1: cleaning up the object tree
>>> stage 2.2: rebuilding the object tree
>>> stage 2.3: build tools
>>> stage 3: cross tools
>>> stage 4.1: building includes
>>> stage 4.2: building libraries
>>> stage 4.3: make dependencies
>>> stage 4.4: building everything
[...]
gzip -cn /src/lib/libc/sys/sched_setparam.2 > sched_setparam.2.gz
gzip -cn /src/lib/libc/sys/sched_setscheduler.2 > sched_setscheduler.2.gz
gzip -cn /src/lib/libc/sys/sched_yield.2 > sched_yield.2.gz
gzip -cn /src/lib/libc/sys/sctp_generic_recvmsg.2 > sctp_generic_recvmsg.2.gz
gzip -cn /src/lib/libc/sys/sctp_generic_sendmsg.2 > sctp_generic_sendmsg.2.gz
gzip -cn /src/lib/libc/sys/sctp_peeloff.2 > sctp_peeloff.2.gz
gzip -cn /src/lib/libc/sys/select.2 > select.2.gz
/libexec/ld-elf.so.1: Cannot open "/lib/libncurses.so.8"
*** Error code 1
Stop in /src/lib/libc.
*** Error code 1
Stop in /src/lib.
*** Error code 1
Stop in /src.
*** Error code 1
Stop in /src.
*** Error code 1
Stop in /src.
TB --- 2010-03-10 14:28:18 - WARNING: /usr/bin/make returned exit code 1
TB --- 2010-03-10 14:28:18 - ERROR: failed to build world
TB --- 2010-03-10 14:28:18 - 1165.76 user 320.11 system 1697.95 real
http://tinderbox.freebsd.org/tinderbox-releng_8-RELENG_8-arm-arm.full
------------------------------
Message: 5
Date: Wed, 10 Mar 2010 15:38:11 +0100
From: Bernd Walter <ti...@cicely7.cicely.de>
Subject: Re: Performance of SheevaPlug on 8-stable
To: Rafal Jaworowski <r...@semihalf.com>
Cc: Mark Tinguely <ting...@casselton.net>, Bernd Walter
<ti...@cicely7.cicely.de>, freeb...@freebsd.org
Message-ID: <20100310143...@cicely7.cicely.de>
Content-Type: text/plain; charset=us-ascii
On Wed, Mar 10, 2010 at 03:05:14PM +0100, Rafal Jaworowski wrote:
>
> On 2010-03-10, at 14:58, Grzegorz Bernacki wrote:
>
> >> There are a few data dangling allocations that I would like to see
> >> closed from the multiple kernel allocation fix. *IN THEORY, IF* a page
> >> is allocated via the arm_nocache (DMA COHERENT) or a sendfile, then
> >> it is never marked as unallocated. *IN THEORY*, if that page is used
> >> again, then we could falsely believe that page is being shared and
> >> we turn off the cache, even though it is not shared.
> >> http://www.casselton.net/~tinguely/arm_pmap_unmanaged.diff
> >> * Disclaimer: I am not sure whether DMA COHERENT or sendfile is used in
> >> the Sheeva implementation. This is a theoretical observation of a side
> >> effect of the multiple kernel mapping patch that we did just before
> >> FreeBSD 8-release.
> >
> > I instrumented the code with KTRs and your theory is correct. The kernel reuses
> > a page that was previously mapped via arm_nocache. Your patch should be applied
> > to -current.
>
> Bernd,
> Could you confirm this also fixes the issues for you on the RM9200 machine? If so, I'll go ahead and commit the changes.
For me it helped to get back to the speed of my older systems.
Someone mentioned that even with this patch the speed can still drop
after some time.
--
B.Walter <be...@bwct.de> http://www.bwct.de
Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm.
------------------------------
Message: 6
Date: Wed, 10 Mar 2010 16:53:15 +0100
From: Rafal Jaworowski <r...@semihalf.com>
Subject: Re: Performance of SheevaPlug on 8-stable
To: Maks Verver <maksv...@geocities.com>
Cc: freeb...@freebsd.org
Message-ID: <BC70E9A7-56C4-499C...@semihalf.com>
Content-Type: text/plain; charset=us-ascii
On 2010-03-08, at 15:29, Maks Verver wrote:
> Next up, this patch:
>
>> http://www.casselton.net/~tinguely/arm_pmap_unmanaged.diff
>
> No idea what this does, but it helps a lot:
>
> %time ./test
> 9.000u 0.000s 0:09.11 99.2% 40+1324k 0+0io 0pf+0w
>
> That's much better than the 280+ seconds from before. But it's still
> nearly twice as long as Linux takes.
>
> There is more weirdness though. If I freshly boot the system I get
> timings like these, and even nbench reports decent scores. However, if I
> do a couple things like rerun/recompile nbench, then at some point
> something 'breaks' and the performance goes back down to what it used to be.
Mark,
Can you confirm that this worsening over time happens with a fresh (from scratch) kernel build (with Mark T.'s patch applied)? Please provide the scenario / steps that lead to this behaviour.
Rafal
------------------------------
Message: 7
Date: Wed, 10 Mar 2010 10:42:15 -0600 (CST)
From: Mark Tinguely <ting...@casselton.net>
Subject: Re: Performance of SheevaPlug on 8-stable
To: ti...@cicely.de
Cc: freeb...@freebsd.org
Message-ID: <201003101642....@casselton.net>
On Wed Mar 10 at 0:38:20am, Bernd Walter wrote:
>
> On Wed, Mar 10, 2010 at 03:05:14PM +0100, Rafal Jaworowski wrote:
> >
> > On 2010-03-10, at 14:58, Grzegorz Bernacki wrote:
> >
> > >> There are a few data dangling allocations that I would like to see
> > >> closed from the multiple kernel allocation fix. *IN THEORY, IF* a page
> > >> is allocated via the arm_nocache (DMA COHERENT) or a sendfile, then
> > >> it is never marked as unallocated. *IN THEORY*, if that page is used
> > >> again, then we could falsely believe that page is being shared and
> > >> we turn off the cache, even though it is not shared.
> > >> http://www.casselton.net/~tinguely/arm_pmap_unmanaged.diff
> > >> * Disclaimer: I am not sure whether DMA COHERENT or sendfile is used in
> > >> the Sheeva implementation. This is a theoretical observation of a side
> > >> effect of the multiple kernel mapping patch that we did just before
> > >> FreeBSD 8-release.
> > >
> > > I instrumented the code with KTRs and your theory is correct. The kernel reuses
> > > a page that was previously mapped via arm_nocache. Your patch should be applied
> > > to -current.
> >
> > Bernd,
> > Could you confirm this also fixes the issues for you on the RM9200 machine? If so, I'll go ahead and commit the changes.
>
> For me it helped to get back to the speed of my older systems.
> Someone mentioned that even with this patch the speed can still drop
> after some time.
The original patch is needed, but the above would imply that there are more
places where we are not removing the record of the kernel allocation.
The assumption was that the allocations are to the kernel map, originate
from pmap_kenter/pmap_qenter, and will be removed with pmap_kremove/pmap_qremove.
I looked at the kernel sources yesterday; pmap_qenter/pmap_qremove is used
exclusively in the machine-independent code.
Maybe pmap_qremove is not used (process termination? another allocation
with remove?) and the page is freed instead by the pmap_remove_page,
pmap_remove_all, or pmap_remove routines.
A test panic in vm_page_free_toq() on a non-zero md.pv_kva will tell us
which routine is releasing the page.
I will think some more about the pmap_remove_page, pmap_remove_all, and
pmap_remove paths.
--Mark Tinguely.
------------------------------
Message: 8
Date: Wed, 10 Mar 2010 12:07:14 -0600 (CST)
From: Mark Tinguely <ting...@casselton.net>
Subject: Re: Performance of SheevaPlug on 8-stable
To: maksv...@geocities.com, r...@semihalf.com
Cc: freeb...@freebsd.org
Message-ID: <201003101807....@casselton.net>
On Wed, 10 Mar 2010 16:53pm, Rafal Jaworowski asks:
>
> On 2010-03-08, at 15:29, Maks Verver wrote:
>
> > Next up, this patch:
> >
> >> http://www.casselton.net/~tinguely/arm_pmap_unmanaged.diff
> >
> > No idea what this does, but it helps a lot:
> >
> > %time ./test
> > 9.000u 0.000s 0:09.11 99.2% 40+1324k 0+0io 0pf+0w
> >
> > That's much better than the 280+ seconds from before. But it's still
> > nearly twice as long as Linux takes.
> >
> > There is more weirdness though. If I freshly boot the system I get
> > timings like these, and even nbench reports decent scores. However, if I
> > do a couple things like rerun/recompile nbench, then at some point
> > something 'breaks' and the performance goes back down to what it used to be.
>
> Mark,
> Can you confirm this worsening over time happens with a fresh (from scratch) kernel build (with Mark T. patch applied)? Please provide the scenario / steps which lead to this behaviour.
>
> Rafal
I believe there is still a path on which md.pv_kva is not zeroed before
the page is freed. Later, when the page is re-mapped, either for data or
as an executable, we think (because md.pv_kva is non-zero) that this page is
still mapped at the KVA stored in md.pv_kva.
My patch covered two places in the machine-dependent code where
we were not clearing md.pv_kva.
We can prove this theory by placing a temporary printf statement or
even better a panic in vm_page_free_toq(). Something like:
        if (m->hold_count != 0) {
                m->flags &= ~PG_ZERO;
                vm_page_enqueue(PQ_HOLD, m);
        } else {
+               KASSERT(!m->md.pv_kva,
+                   ("vm_page_free_toq: pv_kva nonzero %p", (void *)m->md.pv_kva));
                /*
                 * Restore the default memory attribute to the page.
                 */
                if (pmap_page_get_memattr(m) != VM_MEMATTR_DEFAULT)
                        pmap_page_set_memattr(m, VM_MEMATTR_DEFAULT);
Since the machine-independent sources look pretty consistent in their
pmap_qenter/pmap_qremove calls, I bet one of the pmap_remove* routines
is freeing the page and will show up in the panic traceback. pmap_remove*
was not the path by which I expected the kernel-mapped page to be removed.
--Mark Tinguely
------------------------------
End of freebsd-arm Digest, Vol 206, Issue 5
*******************************************