To subscribe or unsubscribe via the World Wide Web, visit
http://lists.freebsd.org/mailman/listinfo/freebsd-arm
or, via email, send a message with subject or body 'help' to
freebsd-a...@freebsd.org
You can reach the person managing the list at
freebsd-...@freebsd.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of freebsd-arm digest..."
Today's Topics:
1. Performance of SheevaPlug on 8-stable (Maks Verver)
2. Re: Performance of SheevaPlug on 8-stable (Bernd Walter)
3. Re: Performance of SheevaPlug on 8-stable (Bernd Walter)
4. Re: Performance of SheevaPlug on 8-stable (M. Warner Losh)
5. Re: Performance of SheevaPlug on 8-stable (Maks Verver)
6. Re: Performance of SheevaPlug on 8-stable (Bernd Walter)
----------------------------------------------------------------------
Message: 1
Date: Sat, 06 Mar 2010 21:39:57 +0100
From: Maks Verver <maksv...@geocities.com>
Subject: Performance of SheevaPlug on 8-stable
To: freeb...@freebsd.org
Message-ID: <4B92BD9D...@geocities.com>
Content-Type: text/plain; charset=ISO-8859-1
Hi everyone,
After a bit of patching and tinkering I got my SheevaPlug to boot
FreeBSD from a UFS2-formatted USB stick. To compare it with Linux I
decided to run nbench to see how FreeBSD compares with Ubuntu (which is
shipped with the SheevaPlug). To my surprise, the results were
atrocious! FreeBSD scores about 50 times worse than Ubuntu.
Of course, this performance difference is too large to be caused by
implementation differences. There must be something more fundamental
wrong here. To simplify things, I created a simple testcase that counts
up to the maximum value of an integer:
int main() { int i = 0; do ++i; while(i > 0); return 0; }
This compiles to: (both on Linux and on FreeBSD)
0000848c <main>:
848c: e3a03000 mov r3, #0 ; 0x0
8490: e2833001 add r3, r3, #1 ; 0x1
8494: e3530000 cmp r3, #0 ; 0x0
8498: cafffffc bgt 8490 <main+0x4>
849c: e3a00000 mov r0, #0 ; 0x0
84a0: e1a0f00e mov pc, lr
This stresses the CPU and not much else. Since there are three
instructions in the loop and the SheevaPlug runs at 1.2 GHz, I
expect this to take around (1<<31)*3/1.2e9 ~ 5.3687 seconds. On Ubuntu:
$ time ./test
real 0m5.422s
user 0m5.390s
sys 0m0.020s
Exactly as expected. On FreeBSD on the other hand:
%time ./test
286.000u 0.000s 4:47.22 99.8% 40+1321k 0+0io 0pf+0w
This takes almost five minutes, or over 50 times as long! All of it is
user-space CPU time. Does anybody have a suggestion why the CPU appears
to run so slowly in FreeBSD?
I pored over my kernel configuration but I don't see anything suspect. I
did (manually) apply Hans Petter Selasky's patch [1] to be able to boot
from USB, and consequently removed the NFS and BOOTP stuff from the
config provided at sys/arm/conf/SHEEVAPLUG. Furthermore I removed the
NO_SWAPPING and NO_FFS_SNAPSHOT options (because I plan to attach a USB
disk drive) and I left in the KDB and DDB options because I think
they do not significantly affect performance. Is this correct?
Kind regards,
Maks Verver.
P.S. The strange thing is that stuff like network performance is
perfectly fine. I can fetch FTP data at 11 MB/s, which is about the
maximum possible on the cheap 100 Mbit switch I use, and is even a few
percent better than Ubuntu. So it seems it's really the CPU that's the
bottleneck, for no apparent reason.
[1] http://p4db.freebsd.org/chv.cgi?CH=169183
------------------------------
Message: 2
Date: Sat, 6 Mar 2010 22:17:16 +0100
From: Bernd Walter <ti...@cicely7.cicely.de>
Subject: Re: Performance of SheevaPlug on 8-stable
To: Maks Verver <maksv...@geocities.com>
Cc: freeb...@freebsd.org
Message-ID: <20100306211...@cicely7.cicely.de>
Content-Type: text/plain; charset=us-ascii
On Sat, Mar 06, 2010 at 09:39:57PM +0100, Maks Verver wrote:
> Hi everyone,
>
> After a bit of patching and tinkering I got my SheevaPlug to boot
> FreeBSD from a UFS2-formatted USB stick. To compare it with Linux I
> decided to run nbench to see how FreeBSD compares with Ubuntu (which is
> shipped with the SheevaPlug). To my surprise, the results were
> atrocious! FreeBSD scores about 50 times worse than Ubuntu.
>
> Of course, this performance difference is too large to be caused by
> implementation differences. There must be something more fundamental
> wrong here. To simplify things, I created a simple testcase that counts
> up to the maximum value of an integer:
>
> int main() { int i = 0; do ++i; while(i > 0); return 0; }
>
> This compiles to: (both on Linux and on FreeBSD)
>
> 0000848c <main>:
> 848c: e3a03000 mov r3, #0 ; 0x0
> 8490: e2833001 add r3, r3, #1 ; 0x1
> 8494: e3530000 cmp r3, #0 ; 0x0
> 8498: cafffffc bgt 8490 <main+0x4>
> 849c: e3a00000 mov r0, #0 ; 0x0
> 84a0: e1a0f00e mov pc, lr
>
> This stresses the CPU and not much else. Since there are three
> instructions in the loop and the SheevaPlug runs at 1.2 GHz, I
> expect this to take around (1<<31)*3/1.2e9 ~ 5.3687 seconds. On Ubuntu:
>
> $ time ./test
> real 0m5.422s
> user 0m5.390s
> sys 0m0.020s
>
> Exactly as expected. On FreeBSD on the other hand:
>
> %time ./test
> 286.000u 0.000s 4:47.22 99.8% 40+1321k 0+0io 0pf+0w
>
> This takes almost five minutes, or over 50 times as long! All of it is
> user-space CPU time. Does anybody have a suggestion why the CPU appears
> to run so slowly in FreeBSD?
I was tempted to say different compiler optimisations, but you say that
the resulting code is the same.
Such a massive speed difference sounds a bit like cache problems.
For what it's worth, I see it taking minutes (not finished yet) on a
180 MHz RM9200 as well.
According to dmesg IC is enabled:
CPU: ARM920T rev 0 (ARM9TDMI core)
DC enabled IC enabled WB enabled LABT
16KB/32B 64-way Instruction cache
16KB/32B 64-way write-back-locking-A Data cache
If the above calculation is correct, I would expect it to take about
7 times longer here than the time calculated for the SheevaPlug.
If the calculation is wrong, then why does Ubuntu agree with it?
> I pored over my kernel configuration but I don't see anything suspect. I
> did (manually) apply Hans Petter Selasky's patch [1] to be able to boot
> from USB, and consequently removed the NFS and BOOTP stuff from the
> config provided at sys/arm/conf/SHEEVAPLUG. Furthermore I removed the
> NO_SWAPPING and NO_FFS_SNAPSHOT options (because I plan to attach a USB
> disk drive) and I left in the KDB and DDB options because I think
> they do not significantly affect performance. Is this correct?
>
> Kind regards,
> Maks Verver.
>
> P.S. The strange thing is that stuff like network performance is
> perfectly fine. I can fetch FTP data at 11 MB/s, which is about the
> maximum possible on the cheap 100 Mbit switch I use, and is even a few
> percent better than Ubuntu. So it seems it's really the CPU that's the
> bottleneck, for no apparent reason.
FTP won't gain that much from the cache, and our network stack might
outweigh the loss, so this all makes sense if the instruction cache
isn't working.
I think you have made a very interesting catch, although I don't know
exactly what causes it.
--
B.Walter <be...@bwct.de> http://www.bwct.de
Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm.
------------------------------
Message: 3
Date: Sat, 6 Mar 2010 22:51:53 +0100
From: Bernd Walter <ti...@cicely7.cicely.de>
Subject: Re: Performance of SheevaPlug on 8-stable
To: Maks Verver <maksv...@geocities.com>
Cc: freeb...@freebsd.org
Message-ID: <20100306215...@cicely7.cicely.de>
Content-Type: text/plain; charset=us-ascii
On Sat, Mar 06, 2010 at 10:17:16PM +0100, Bernd Walter wrote:
> On Sat, Mar 06, 2010 at 09:39:57PM +0100, Maks Verver wrote:
> > Hi everyone,
> >
> > After a bit of patching and tinkering I got my SheevaPlug to boot
> > FreeBSD from a UFS2-formatted USB stick. To compare it with Linux I
> > decided to run nbench to see how FreeBSD compares with Ubuntu (which is
> > shipped with the SheevaPlug). To my surprise, the results were
> > atrocious! FreeBSD scores about 50 times worse than Ubuntu.
> >
> > Of course, this performance difference is too large to be caused by
> > implementation differences. There must be something more fundamental
> > wrong here. To simplify things, I created a simple testcase that counts
> > up to the maximum value of an integer:
> >
> > int main() { int i = 0; do ++i; while(i > 0); return 0; }
> >
> > This compiles to: (both on Linux and on FreeBSD)
> >
> > 0000848c <main>:
> > 848c: e3a03000 mov r3, #0 ; 0x0
> > 8490: e2833001 add r3, r3, #1 ; 0x1
> > 8494: e3530000 cmp r3, #0 ; 0x0
> > 8498: cafffffc bgt 8490 <main+0x4>
> > 849c: e3a00000 mov r0, #0 ; 0x0
> > 84a0: e1a0f00e mov pc, lr
> >
> > This stresses the CPU and not much else. Since there are three
> > instructions in the loop and the SheevaPlug runs at 1.2 GHz, I
> > expect this to take around (1<<31)*3/1.2e9 ~ 5.3687 seconds. On Ubuntu:
> >
> > $ time ./test
> > real 0m5.422s
> > user 0m5.390s
> > sys 0m0.020s
> >
> > Exactly as expected. On FreeBSD on the other hand:
> >
> > %time ./test
> > 286.000u 0.000s 4:47.22 99.8% 40+1321k 0+0io 0pf+0w
> >
> > This takes almost five minutes, or over 50 times as long! All of it is
> > user-space CPU time. Does anybody have a suggestion why the CPU appears
> > to run so slowly in FreeBSD?
>
> I was tempted to say different compiler optimisations, but you say that
> the resulting code is the same.
> Such a massive speed difference sounds a bit like cache problems.
> For what it's worth, I see it taking minutes (not finished yet) on a
> 180 MHz RM9200 as well.
[67]chipmunk.cicely.de# ./test
2185.000u 3.000s 42:03.86 86.6% 46+1532k 0+0io 0pf+0w
This is a really long time to count to 2^31 at 180 MHz.
I would really say that there is something wrong.
> According to dmesg IC is enabled:
> CPU: ARM920T rev 0 (ARM9TDMI core)
> DC enabled IC enabled WB enabled LABT
> 16KB/32B 64-way Instruction cache
> 16KB/32B 64-way write-back-locking-A Data cache
>
> If the above calculation is correct, I would expect it to take about
> 7 times longer here than the time calculated for the SheevaPlug.
> If the calculation is wrong, then why does Ubuntu agree with it?
>
> > I pored over my kernel configuration but I don't see anything suspect. I
> > did (manually) apply Hans Petter Selasky's patch [1] to be able to boot
> > from USB, and consequently removed the NFS and BOOTP stuff from the
> > config provided at sys/arm/conf/SHEEVAPLUG. Furthermore I removed the
> > NO_SWAPPING and NO_FFS_SNAPSHOT options (because I plan to attach a USB
> > disk drive) and I left in the KDB and DDB options because I think
> > they do not significantly affect performance. Is this correct?
> >
> > Kind regards,
> > Maks Verver.
> >
> > P.S. The strange thing is that stuff like network performance is
> > perfectly fine. I can fetch FTP data at 11 MB/s, which is about the
> > maximum possible on the cheap 100 Mbit switch I use, and is even a few
> > percent better than Ubuntu. So it seems it's really the CPU that's the
> > bottleneck, for no apparent reason.
>
> FTP won't gain that much from the cache, and our network stack might
> outweigh the loss, so this all makes sense if the instruction cache
> isn't working.
> I think you have made a very interesting catch, although I don't know
> exactly what causes it.
>
> --
> B.Walter <be...@bwct.de> http://www.bwct.de
> Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm.
--
B.Walter <be...@bwct.de> http://www.bwct.de
Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm.
------------------------------
Message: 4
Date: Sat, 06 Mar 2010 15:26:03 -0700 (MST)
From: "M. Warner Losh" <i...@bsdimp.com>
Subject: Re: Performance of SheevaPlug on 8-stable
To: ti...@cicely.de, ti...@cicely7.cicely.de
Cc: freeb...@freebsd.org
Message-ID: <20100306.152603.716...@bsdimp.com>
Content-Type: Text/Plain; charset=us-ascii
In message: <20100306215...@cicely7.cicely.de>
Bernd Walter <ti...@cicely7.cicely.de> writes:
: This is a really long time to count to 2^31 at 180 MHz.
: I would really say that there is something wrong.
Sounds a lot like ICACHE isn't being enabled, since a 3-liner like
this should be executing entirely out of cache after the first
instruction in main prefetches the cache line.
Warner
------------------------------
Message: 5
Date: Sun, 07 Mar 2010 02:39:48 +0100
From: Maks Verver <maksv...@geocities.com>
Subject: Re: Performance of SheevaPlug on 8-stable
To: freeb...@freebsd.org
Message-ID: <4B9303E4...@geocities.com>
Content-Type: text/plain; charset=ISO-8859-1
On 03/06/2010 10:17 PM, Bernd Walter wrote:
> Such a massive speed difference sounds a bit like cache problems.
On 03/06/2010 11:26 PM, M. Warner Losh wrote:
> Sounds a lot like ICACHE isn't being enabled, since a 3-liner like
> this should be executing entirely out of cache after the first
> instruction in main prefetches the cache line.
Thanks for the quick responses! I think both of you are right. I
didn't realize the cache could be turned off at all, but the boot output
shows:
CPU: Feroceon 88FR131 rev 1 (write-through core)
WB enabled EABT branch prediction enabled
16KB/32B 4-way Instruction cache
16KB/32B 4-way write-back-locking-C Data cache
This is different from the output on the wiki (whose instructions I
followed, to some extent) at http://wiki.freebsd.org/FreeBSDMarvell:
CPU: ARM926EJ-S rev 0 (ARM9EJ-S core)
DC enabled IC enabled WB enabled EABT branch prediction enabled
32KB/32B 1-way Instruction cache
32KB/32B 4-way write-back-locking-C Data cache
Note that this guy is not running a SheevaPlug; the CPU is different.
But it's clear enough that on my system both processor caches are
disabled (even though they are correctly identified) and this is
understandably catastrophic for performance. It's good to have that
figured out at least. :-)
The logical next question is: why aren't these caches enabled? How is
this supposed to work? Is the bootloader supposed to enable the cache,
or the kernel? If the kernel, why isn't it doing this? (If it's the
bootloader's task, then it's strange that the Linux kernel has no
trouble enabling the cache with the same bootloader).
Kind regards,
Maks Verver.
------------------------------
Message: 6
Date: Sun, 7 Mar 2010 08:00:10 +0100
From: Bernd Walter <ti...@cicely7.cicely.de>
Subject: Re: Performance of SheevaPlug on 8-stable
To: Maks Verver <maksv...@geocities.com>
Cc: freeb...@freebsd.org
Message-ID: <20100307070...@cicely7.cicely.de>
Content-Type: text/plain; charset=us-ascii
On Sun, Mar 07, 2010 at 02:39:48AM +0100, Maks Verver wrote:
> On 03/06/2010 10:17 PM, Bernd Walter wrote:
> > Such a massive speed difference sounds a bit like cache problems.
>
> On 03/06/2010 11:26 PM, M. Warner Losh wrote:
> > Sounds a lot like ICACHE isn't being enabled, since a 3-liner like
> > this should be executing entirely out of cache after the first
> > instruction in main prefetches the cache line.
>
> Thanks for the quick responses! I think both of you are right. I
> didn't realize the cache could be turned off at all, but the boot output
> shows:
>
> CPU: Feroceon 88FR131 rev 1 (write-through core)
> WB enabled EABT branch prediction enabled
> 16KB/32B 4-way Instruction cache
> 16KB/32B 4-way write-back-locking-C Data cache
>
> This is different from the output on the wiki (whose instructions I
> followed, to some extent) at http://wiki.freebsd.org/FreeBSDMarvell:
>
> CPU: ARM926EJ-S rev 0 (ARM9EJ-S core)
> DC enabled IC enabled WB enabled EABT branch prediction enabled
> 32KB/32B 1-way Instruction cache
> 32KB/32B 4-way write-back-locking-C Data cache
>
> Note that this guy is not running a SheevaPlug; the CPU is different.
That's probably just because of different CPUs.
I see a similar output on all of my systems with ARM920T CPU and
still there is something wrong.
I just verified with my 7.0-current system:
[102]arm9# ./test
200.000u 3.000s 12:51.47 26.3% 45+1512k 0+0io 0pf+0w
The system is in production use and isn't completely idle, but the time
is still shorter, so it is hard to say whether it has the same problem.
Most interesting is an 8.0-current system I have:
[4]beaver.cicely.de> ./test
196.000u 1.000s 3:43.03 88.8% 44+1452k 0+0io 0pf+0w
That is still much slower than the calculated 80 seconds, but also much
faster than on my 9-current system.
> But it's clear enough that on my system both processor caches are
> disabled (even though they are correctly identified) and this is
> understandably catastrophic for performance. It's good to have that
> figured out at least. :-)
Your loop isn't doing any data access, so it only says something
about the ICACHE not working.
> The logical next question is: why aren't these caches enabled? How is
> this supposed to work? Is the bootloader supposed to enable the cache,
> or the kernel? If the kernel, why isn't it doing this? (If it's the
> bootloader's task, then it's strange that the Linux kernel has no
> trouble enabling the cache with the same bootloader).
That's a good question.
The kernel identifies them as being enabled on my CPU, but is that
really true?
Is there something that disables them later, or is this code already
wrong?
But maybe it is not the ICACHE itself, and the memory pages are just
declared uncacheable?
--
B.Walter <be...@bwct.de> http://www.bwct.de
Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm.
------------------------------
End of freebsd-arm Digest, Vol 205, Issue 4
*******************************************