Re: SCHED_ULE should not be the default

O. Hartmann

Dec 12, 2011, 8:47:57 AM12/12/11
to Current FreeBSD, freebsd...@freebsd.org, freebsd-p...@freebsd.org

> Not fully right, boinc defaults to run on idprio 31 so this isn't an
> issue. And yes, there are cases where SCHED_ULE shows much better
> performance than SCHED_4BSD. [...]

Do we have any proof at hand for such cases where SCHED_ULE performs
much better than SCHED_4BSD? Whenever the subject comes up, it is
mentioned that SCHED_ULE performs better on boxes with ncpu > 2.
But in the end I see contradictory statements here. People
complain about poor performance (especially in scientific environments),
and others counter that this is not the case.

Within our department, we developed a highly scalable code for planetary
science purposes on imagery. It uses GPUs via OpenCL if they are
present; otherwise it grabs as many cores as it can.
By the end of this year I'll get a new desktop box based on Intel's new
Sandy Bridge-E architecture with plenty of memory. If the colleague who
developed the code is willing to perform some benchmarks on the same
hardware platform, we'll benchmark both FreeBSD 9.0/10.0 and the most
recent Suse. For FreeBSD I also intend to compare performance with both
available schedulers.

O.

Vincent Hoffman

Dec 12, 2011, 10:13:00 AM12/12/11
to O. Hartmann, freebsd-p...@freebsd.org, Current FreeBSD, freebsd...@freebsd.org

On 12/12/2011 13:47, O. Hartmann wrote:
>
>> Not fully right, boinc defaults to run on idprio 31 so this isn't an
>> issue. And yes, there are cases where SCHED_ULE shows much better
>> performance than SCHED_4BSD. [...]
>
> Do we have any proof at hand for such cases where SCHED_ULE performs
> much better than SCHED_4BSD? Whenever the subject comes up, it is
> mentioned that SCHED_ULE performs better on boxes with ncpu > 2.
> But in the end I see contradictory statements here. People
> complain about poor performance (especially in scientific environments),
> and others counter that this is not the case.

It's all a little old now, but some of the material in
http://people.freebsd.org/~kris/scaling/
covers improvements that were seen.

http://jeffr-tech.livejournal.com/5705.html
shows a little too; reading through Jeff's blog is worthwhile, as it has some
interesting material on SCHED_ULE.

I thought there were some more benchmarks floating around, but I can't find
any with a quick Google search.


Vince

>
> Within our department, we developed a highly scalable code for planetary
> science purposes on imagery. It uses GPUs via OpenCL if they are
> present; otherwise it grabs as many cores as it can.
> By the end of this year I'll get a new desktop box based on Intel's new
> Sandy Bridge-E architecture with plenty of memory. If the colleague who
> developed the code is willing to perform some benchmarks on the same
> hardware platform, we'll benchmark both FreeBSD 9.0/10.0 and the most
> recent Suse. For FreeBSD I also intend to compare performance with both
> available schedulers.
>
> O.
>

_______________________________________________
freebsd-p...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-performance
To unsubscribe, send any mail to "freebsd-perform...@freebsd.org"

Gary Jennejohn

Dec 12, 2011, 10:32:21 AM12/12/11
to Vincent Hoffman, O. Hartmann, Current FreeBSD, freebsd...@freebsd.org, freebsd-p...@freebsd.org

These observations are not scientific, but I have a CPU from AMD with
6 cores (AMD Phenom(tm) II X6 1090T Processor).

My simple test was ``make buildkernel'' while watching the core usage with
gkrellm.

With SCHED_4BSD all 6 cores are loaded to 97% during the build phase.
I've never seen any value above 97% with gkrellm.

With SCHED_ULE I never saw all 6 cores loaded this heavily. Usually
2 or more cores were at or below 90%. Not really that significant, but
still a noticeable difference in apparent scheduling behavior. Whether
the observed difference is due to some change in data from the kernel to
gkrellm is beyond me.

--
Gary Jennejohn

Steve Kargl

Dec 12, 2011, 10:51:59 AM12/12/11
to O. Hartmann, freebsd-p...@freebsd.org, Current FreeBSD, freebsd...@freebsd.org

This comes up every 9 months or so, and must be approaching
FAQ status.

In an HPC environment, I recommend 4BSD. Depending on
the workload, ULE can cause a severe increase in turnaround
time when doing already long computations. If
you have an MPI application, simply launching more
than ncpu+1 jobs can show the problem.

PS: search the list archives for "kargl and ULE".

--
Steve

m...@freebsd.org

Dec 12, 2011, 11:04:37 AM12/12/11
to gljen...@googlemail.com, O. Hartmann, Current FreeBSD, freebsd...@freebsd.org, freebsd-p...@freebsd.org, Vincent Hoffman

SCHED_ULE is much sloppier about calculating which thread used a
timeslice -- unless the timeslice went 100% to a thread, the fraction
it used may get attributed elsewhere. So top's reporting of thread
usage is not a useful metric. Total buildworld time is, potentially.

Thanks,
matthew

Lars Engels

Dec 12, 2011, 11:10:46 AM12/12/11
to gljen...@googlemail.com, Vincent Hoffman, O. Hartmann, Current FreeBSD, freebsd...@freebsd.org, freebsd-p...@freebsd.org
Did you use -jX to build the world?

_____________________________________________
From: Gary Jennejohn <gljen...@googlemail.com>
Sent: Mon Dec 12 16:32:21 CET 2011
To: Vincent Hoffman <vi...@unsane.co.uk>
CC: "O. Hartmann" <ohar...@mail.zedat.fu-berlin.de>, Current FreeBSD <freebsd...@freebsd.org>, freebsd...@freebsd.org, freebsd-p...@freebsd.org
Subject: Re: SCHED_ULE should not be the default

--
Gary Jennejohn
_____________________________________________

freebsd...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stabl...@freebsd.org"

Lars Engels

Dec 12, 2011, 11:13:08 AM12/12/11
to Steve Kargl, O. Hartmann, freebsd-p...@freebsd.org, Current FreeBSD, freebsd...@freebsd.org
Would it be possible to implement a mechanism that lets one change the scheduler on the fly? AFAIK Solaris can do that.

_____________________________________________
From: Steve Kargl <s...@troutmask.apl.washington.edu>
Sent: Mon Dec 12 16:51:59 CET 2011
To: "O. Hartmann" <ohar...@mail.zedat.fu-berlin.de>
CC: freebsd-p...@freebsd.org, Current FreeBSD <freebsd...@freebsd.org>, freebsd...@freebsd.org
Subject: Re: SCHED_ULE should not be the default

--
Steve
_____________________________________________

Bruce Cran

Dec 12, 2011, 11:18:35 AM12/12/11
to Steve Kargl, O. Hartmann, Current FreeBSD, freebsd...@freebsd.org, freebsd-p...@freebsd.org
On 12/12/2011 15:51, Steve Kargl wrote:
> This comes up every 9 months or so, and must be approaching FAQ
> status. In an HPC environment, I recommend 4BSD. Depending on the
> workload, ULE can cause a severe increase in turnaround time when
> doing already long computations. If you have an MPI application,
> simply launching more than ncpu+1 jobs can show the problem. PS:
> search the list archives for "kargl and ULE".

Isn't this something that can be fixed by tuning ULE? For example, for
desktop applications kern.sched.preempt_thresh should be set to 224 instead of
its default. I'm wondering if the installer should ask people what the
typical use will be, and tune the scheduler appropriately.

--
Bruce Cran
_______________________________________________
freebsd...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-curre...@freebsd.org"

Ivan Klymenko

Dec 12, 2011, 11:28:41 AM12/12/11
to Bruce Cran, O. Hartmann, Current FreeBSD, freebsd...@freebsd.org, freebsd-p...@freebsd.org, Steve Kargl
On Mon, 12 Dec 2011 16:18:35 +0000
Bruce Cran <br...@cran.org.uk> wrote:

> On 12/12/2011 15:51, Steve Kargl wrote:
> > This comes up every 9 months or so, and must be approaching FAQ
> > status. In an HPC environment, I recommend 4BSD. Depending on the
> > workload, ULE can cause a severe increase in turnaround time when
> > doing already long computations. If you have an MPI application,
> > simply launching more than ncpu+1 jobs can show the problem. PS:
> > search the list archives for "kargl and ULE".
>
> Isn't this something that can be fixed by tuning ULE? For example, for
> desktop applications kern.sched.preempt_thresh should be set to 224
> instead of its default. I'm wondering if the installer should ask people
> what the typical use will be, and tune the scheduler appropriately.
>

By and large, this does not help in certain situations...

Pieter de Goeje

Dec 12, 2011, 10:53:04 AM12/12/11
to freebsd...@freebsd.org, O. Hartmann, freebsd...@freebsd.org, freebsd-p...@freebsd.org

In my spare time I do some stuff which can be considered "HPC". If I recall
correctly, the loudest supporters of the notion that SCHED_4BSD is faster
than SCHED_ULE are using more threads than there are cores, causing CPU core
contention and, more importantly, unevenly distributed runtimes among threads,
resulting in suboptimal execution times for their programs. Since I've never
actually seen the code in question, it's hard to say whether this "unfair"
distribution actually results in lower throughput or whether it simply
violates an assumption in the code that each thread takes about the same
time to finish its task.
Although I haven't actually benchmarked the two schedulers directly, I have no
reason to suspect SCHED_ULE of suboptimal performance because:
1) A program model where there are N threads on N cores which take work items
from a shared queue until it is empty has almost perfect scaling on SCHED_ULE
(I get 398% CPU usage on a quadcore)
2) The same program on Linux (dual boot) compiled with exactly the same
compiler and flags runs slightly slower. I think this has to do with VM
differences.

What I'm trying to say is that until someone actually shows some code which
has demonstrably lower performance on SCHED_ULE, and this is not caused by
(IMHO improper) timing dependencies between threads, I'd say that there is no
cause for concern here. I actually expect performance differences between the
two schedulers to show up in problems which cause a lot more contention on the
CPU cores and use lots of locks internally, so that threads are frequently
waiting on each other; for instance, the MySQL benchmarks done a couple of
years ago by Kris Kennaway.

Aside from algorithmic limitations (SCHED_4BSD doesn't really scale all that
well), there will always be some problems for which SCHED_4BSD is faster
because it happens to produce a better execution order for them... The
good thing is that people have a choice :-).

I'm looking forward to the results of your benchmark.

--
Pieter de Goeje


Gary Jennejohn

Dec 12, 2011, 11:47:30 AM12/12/11
to Lars Engels, O. Hartmann, Current FreeBSD, freebsd...@freebsd.org, freebsd-p...@freebsd.org, Vincent Hoffman
On Mon, 12 Dec 2011 17:10:46 +0100
Lars Engels <lars....@0x20.net> wrote:

> Did you use -jX to build the world?
>

I'm top posting since Lars did.

It was buildkernel, not buildworld.

Yes, -j6.


--
Gary Jennejohn

Gary Jennejohn

Dec 12, 2011, 11:48:54 AM12/12/11
to m...@freebsd.org, O. Hartmann, Current FreeBSD, freebsd...@freebsd.org, freebsd-p...@freebsd.org, Vincent Hoffman

I suspect you're right since the buildworld time, a much better test,
was pretty much the same with 4BSD and ULE.

--
Gary Jennejohn

Steve Kargl

Dec 12, 2011, 12:06:04 PM12/12/11
to Bruce Cran, O. Hartmann, Current FreeBSD, freebsd...@freebsd.org, freebsd-p...@freebsd.org
On Mon, Dec 12, 2011 at 04:18:35PM +0000, Bruce Cran wrote:
> On 12/12/2011 15:51, Steve Kargl wrote:
> >This comes up every 9 months or so, and must be approaching FAQ
> >status. In an HPC environment, I recommend 4BSD. Depending on the
> >workload, ULE can cause a severe increase in turnaround time when
> >doing already long computations. If you have an MPI application,
> >simply launching more than ncpu+1 jobs can show the problem. PS:
> >search the list archives for "kargl and ULE".
>
> Isn't this something that can be fixed by tuning ULE? For example, for
> desktop applications kern.sched.preempt_thresh should be set to 224 instead of
> its default. I'm wondering if the installer should ask people what the
> typical use will be, and tune the scheduler appropriately.
>

Tuning kern.sched.preempt_thresh did not seem to help for
my workload. My code is a classic master-slave OpenMPI
application where the master runs on one node and all
cpu-bound slaves are sent to a second node. If I send
ncpu+1 jobs to the 2nd node, which has ncpu cpus, then
ncpu-1 jobs are assigned to the first ncpu-1 cpus. The
last two jobs are assigned to the ncpu'th cpu, and
these ping-pong on this cpu. AFAICT, it is a cpu
affinity issue, where ULE is trying to keep each job
associated with its initially assigned cpu.

While one might suggest that starting ncpu+1 jobs
is not prudent, my example is just that: an
example showing that ULE has performance issues.
So I can either start only ncpu jobs on each node
in the cluster and email all other users
not to use those nodes, or use 4BSD and not worry
about loading issues.

--
Steve

John Baldwin

Dec 12, 2011, 1:50:11 PM12/12/11
to freebsd...@freebsd.org, Bruce Cran, O. Hartmann, freebsd...@freebsd.org, freebsd-p...@freebsd.org, Steve Kargl

This is a case where 4BSD's naive algorithm will spread out the load more
evenly because all the threads are on a single, shared queue and each CPU
just grabs the head of the queue when it finishes a timeslice. ULE always
assigns threads to a single CPU (even if they aren't pinned to a single
CPU using cpuset, etc.) and then tries to balance the load across cores
later, but I believe in this case its rebalancer won't have anything to
really do, as no matter what it does with the N+1 job, it's going to be
sharing a CPU with another job.

--
John Baldwin

Scott Lambert

Dec 12, 2011, 2:03:30 PM12/12/11
to Current FreeBSD, freebsd...@freebsd.org, freebsd-p...@freebsd.org
On Mon, Dec 12, 2011 at 09:06:04AM -0800, Steve Kargl wrote:
> Tuning kern.sched.preempt_thresh did not seem to help for
> my workload. My code is a classic master-slave OpenMPI
> application where the master runs on one node and all
> cpu-bound slaves are sent to a second node. If I send
> ncpu+1 jobs to the 2nd node, which has ncpu cpus, then
> ncpu-1 jobs are assigned to the first ncpu-1 cpus. The
> last two jobs are assigned to the ncpu'th cpu, and
> these ping-pong on this cpu. AFAICT, it is a cpu
> affinity issue, where ULE is trying to keep each job
> associated with its initially assigned cpu.
>
> While one might suggest that starting ncpu+1 jobs
> is not prudent, my example is just that: an
> example showing that ULE has performance issues.
> So I can either start only ncpu jobs on each node
> in the cluster and email all other users
> not to use those nodes, or use 4BSD and not worry
> about loading issues.

Does it meet your expectations if you start j jobs on a node,
where (j modulo ncpu) = 0?

--
Scott Lambert KC5MLE Unix SysAdmin
lam...@lambertfam.org

Steve Kargl

Dec 12, 2011, 2:26:37 PM12/12/11
to Current FreeBSD, freebsd...@freebsd.org, freebsd-p...@freebsd.org
On Mon, Dec 12, 2011 at 01:03:30PM -0600, Scott Lambert wrote:
> On Mon, Dec 12, 2011 at 09:06:04AM -0800, Steve Kargl wrote:
> > Tuning kern.sched.preempt_thresh did not seem to help for
> > my workload. My code is a classic master-slave OpenMPI
> > application where the master runs on one node and all
> > cpu-bound slaves are sent to a second node. If I send
> > ncpu+1 jobs to the 2nd node, which has ncpu cpus, then
> > ncpu-1 jobs are assigned to the first ncpu-1 cpus. The
> > last two jobs are assigned to the ncpu'th cpu, and
> > these ping-pong on this cpu. AFAICT, it is a cpu
> > affinity issue, where ULE is trying to keep each job
> > associated with its initially assigned cpu.
> >
> > While one might suggest that starting ncpu+1 jobs
> > is not prudent, my example is just that: an
> > example showing that ULE has performance issues.
> > So I can either start only ncpu jobs on each node
> > in the cluster and email all other users
> > not to use those nodes, or use 4BSD and not worry
> > about loading issues.
>
> Does it meet your expectations if you start j jobs on a node,
> where (j modulo ncpu) = 0?
>

I've never tried to launch more than ncpu + 1 (or + 2)
jobs. I suppose at the time I was investigating the issue,
it was determined that 4BSD allowed me to get my work done
in a more timely manner. So, I took the path of least
resistance.

--
Steve

O. Hartmann

Dec 12, 2011, 6:48:38 PM12/12/11
to Steve Kargl, Bruce Cran, O. Hartmann, Current FreeBSD, freebsd...@freebsd.org, freebsd-p...@freebsd.org
On 12/12/11 18:06, Steve Kargl wrote:
> On Mon, Dec 12, 2011 at 04:18:35PM +0000, Bruce Cran wrote:
>> On 12/12/2011 15:51, Steve Kargl wrote:
>>> This comes up every 9 months or so, and must be approaching FAQ
>>> status. In an HPC environment, I recommend 4BSD. Depending on the
>>> workload, ULE can cause a severe increase in turnaround time when
>>> doing already long computations. If you have an MPI application,
>>> simply launching more than ncpu+1 jobs can show the problem. PS:
>>> search the list archives for "kargl and ULE".
>>
>> Isn't this something that can be fixed by tuning ULE? For example, for
>> desktop applications kern.sched.preempt_thresh should be set to 224 instead of
>> its default. I'm wondering if the installer should ask people what the
>> typical use will be, and tune the scheduler appropriately.
>>

Is the tuning of kern.sched.preempt_thresh, and a proper method of
estimating its correct value for the intended workload, documented
in the manpages, maybe tuning(7)?

I find it hard to wade through lots of pros and cons on mailing lists to
evaluate a correct value for this seemingly important tunable.

Bruce Cran

Dec 12, 2011, 7:15:13 PM12/12/11
to O. Hartmann, O. Hartmann, Current FreeBSD, freebsd...@freebsd.org, freebsd-p...@freebsd.org, Steve Kargl
On 12/12/2011 23:48, O. Hartmann wrote:
> Is the tuning of kern.sched.preempt_thresh, and a proper method of
> estimating its correct value for the intended workload, documented
> in the manpages, maybe tuning(7)? I find it hard to wade through lots
> of pros and cons on mailing lists to evaluate a correct value for
> this seemingly important tunable.

Note that I said "for example" :)
I was suggesting that there may be sysctls that can be tweaked to
improve performance.

--
Bruce Cran

Doug Barton

Dec 12, 2011, 7:29:14 PM12/12/11
to O. Hartmann, freebsd-p...@freebsd.org, Current FreeBSD, freebsd...@freebsd.org
On 12/12/2011 05:47, O. Hartmann wrote:
> Do we have any proof at hand for such cases where SCHED_ULE performs
> much better than SCHED_4BSD?

I complained about poor interactive performance of ULE in a desktop
environment for years. I had numerous people try to help, including
Jeff, with various tunables, dtrace'ing, etc. The cause of the problem
was never found.

I switched to 4BSD, problem gone.

This is on 2 separate systems with Core 2 Duos.


hth,

Doug

--

Breadth of IT experience, and depth of knowledge in the DNS.
Yours for the right price. :) http://SupersetSolutions.com/

Ivan Klymenko

Dec 13, 2011, 3:40:48 AM12/13/11
to Doug Barton, O. Hartmann, Current FreeBSD, freebsd...@freebsd.org, freebsd-p...@freebsd.org
> On 12/12/2011 05:47, O. Hartmann wrote:
> > Do we have any proof at hand for such cases where SCHED_ULE performs
> > much better than SCHED_4BSD?
>
> I complained about poor interactive performance of ULE in a desktop
> environment for years. I had numerous people try to help, including
> Jeff, with various tunables, dtrace'ing, etc. The cause of the problem
> was never found.
>
> I switched to 4BSD, problem gone.
>
> This is on 2 separate systems with Core 2 Duos.
>
>
> hth,
>
> Doug
>

If the ULE algorithm itself has no problems, then the problem lies with
the Core2Duo, or in a piece of code that uses the ULE scheduler.
I already wrote to the mailing list that, in my specific case (Core2Duo),
the following patch partially helps:
--- sched_ule.c.orig 2011-11-24 18:11:48.000000000 +0200
+++ sched_ule.c 2011-12-10 22:47:08.000000000 +0200
@@ -794,7 +794,8 @@
* 1.5 * balance_interval.
*/
balance_ticks = max(balance_interval / 2, 1);
- balance_ticks += random() % balance_interval;
+// balance_ticks += random() % balance_interval;
+ balance_ticks += ((int)random()) % balance_interval;
if (smp_started == 0 || rebalance == 0)
return;
tdq = TDQ_SELF();
@@ -2118,13 +2119,21 @@
struct td_sched *ts;

THREAD_LOCK_ASSERT(td, MA_OWNED);
+ if (td->td_pri_class & PRI_FIFO_BIT)
+ return;
+ ts = td->td_sched;
+ /*
+ * We used up one time slice.
+ */
+ if (--ts->ts_slice > 0)
+ return;
tdq = TDQ_SELF();
#ifdef SMP
/*
* We run the long term load balancer infrequently on the first cpu.
*/
- if (balance_tdq == tdq) {
- if (balance_ticks && --balance_ticks == 0)
+ if (balance_ticks && --balance_ticks == 0) {
+ if (balance_tdq == tdq)
sched_balance();
}
#endif
@@ -2144,9 +2153,6 @@
if (TAILQ_EMPTY(&tdq->tdq_timeshare.rq_queues[tdq->tdq_ridx]))
tdq->tdq_ridx = tdq->tdq_idx;
}
- ts = td->td_sched;
- if (td->td_pri_class & PRI_FIFO_BIT)
- return;
if (PRI_BASE(td->td_pri_class) == PRI_TIMESHARE) {
/*
* We used a tick; charge it to the thread so
@@ -2157,11 +2163,6 @@
sched_priority(td);
}
/*
- * We used up one time slice.
- */
- if (--ts->ts_slice > 0)
- return;
- /*
* We're out of time, force a requeue at userret().
*/
ts->ts_slice = sched_slice;


together with not using options FULL_PREEMPTION.
But no one has replied to say whether or not my patch helps in the Core2Duo case...
I suspect that the problems stem from the SMP-related sections of the code...
Maybe I'm wrong about something, but I want to help solve this problem...

Andrey Chernov

Dec 13, 2011, 4:00:51 AM12/13/11
to Ivan Klymenko, freebsd...@freebsd.org, O. Hartmann, Doug Barton, Current FreeBSD, freebsd-p...@freebsd.org
On Tue, Dec 13, 2011 at 10:40:48AM +0200, Ivan Klymenko wrote:
> > On 12/12/2011 05:47, O. Hartmann wrote:
> > > Do we have any proof at hand for such cases where SCHED_ULE performs
> > > much better than SCHED_4BSD?
> >
> > I complained about poor interactive performance of ULE in a desktop
> > environment for years. I had numerous people try to help, including
> > Jeff, with various tunables, dtrace'ing, etc. The cause of the problem
> > was never found.
> >
> > I switched to 4BSD, problem gone.
> >
> > This is on 2 separate systems with Core 2 Duos.
> >
> >
> > hth,
> >
> > Doug
> >
>
> If the ULE algorithm itself has no problems, then the problem lies with
> the Core2Duo, or in a piece of code that uses the ULE scheduler.

I observe ULE interactivity slowness even on a single-core machine (Pentium
4) in very visible places, like 'ps ax' output stalling in the middle for ~1
second. When I switch back to SCHED_4BSD, all the slowness is gone.

--
http://ache.vniz.net/

Adrian Chadd

Dec 13, 2011, 5:22:48 AM12/13/11
to Andrey Chernov, Ivan Klymenko, Doug Barton, O. Hartmann, Current FreeBSD, freebsd...@freebsd.org, freebsd-p...@freebsd.org
On 13 December 2011 01:00, Andrey Chernov <ac...@freebsd.org> wrote:

>> If the ULE algorithm itself has no problems, then the problem lies with
>> the Core2Duo, or in a piece of code that uses the ULE scheduler.
>
> I observe ULE interactivity slowness even on a single-core machine (Pentium
> 4) in very visible places, like 'ps ax' output stalling in the middle for ~1
> second. When I switch back to SCHED_4BSD, all the slowness is gone.

Are you able to provide KTR traces of the scheduler results? Something
that can be fed to schedgraph?


Adrian

Jeremy Chadwick

Dec 13, 2011, 2:36:15 AM12/13/11
to O. Hartmann, freebsd-p...@freebsd.org, Current FreeBSD, freebsd...@freebsd.org
On Mon, Dec 12, 2011 at 02:47:57PM +0100, O. Hartmann wrote:

This is in no way, shape, or form the same kind of benchmark as what
you're planning to do, but I thought I'd throw it out there for folks to
take in as they see fit.

I know folks were focused mainly on buildworld.

I personally would find it interesting if someone with a higher-end
system (e.g. 2 physical CPUs, with 6 or 8 cores per CPU) was to do the
same test (changing -jX to -j{numofcores} of course).

--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |


sched_ule
===========
- time make -j2 buildworld
1689.831u 229.328s 18:46.20 170.4% 6566+2051k 432+4264io 4565pf+0w
- time make -j2 buildkernel
640.542u 87.737s 9:01.38 134.5% 6490+1920k 134+5968io 0pf+0w


sched_4bsd
============
- time make -j2 buildworld
1662.793u 206.908s 17:12.02 181.1% 6578+2054k 23750+4271io 6451pf+0w
- time make -j2 buildkernel
638.717u 76.146s 8:34.90 138.8% 6530+1927k 6415+5903io 0pf+0w


software
==========
* sched_ule test: FreeBSD 8.2-STABLE, Thu Dec 1 04:37:29 PST 2011
* sched_4bsd test: FreeBSD 8.2-STABLE, Mon Dec 12 22:42:54 PST 2011


hardware
==========
* Intel Core 2 Duo E8400, 3GHz
* Supermicro X7SBA
* 8GB ECC RAM (4x2GB), DDR2-800
* Intel 320-series SSD, 80GB: /, swap, /var, /tmp, /usr


tuning adjustments / etc.
===========================
* Before each scheduler test, system was rebooted to ensure I/O cache
and other whatnots were empty
* All filesystems stock UFS2 + SU (root is non-SU)
* All filesystems had tunefs -t enable applied to them
* powerd(8) in use, with two rc.conf variables (per CPU spec):

performance_cx_lowest="C2"
economy_cx_lowest="C2"

* loader.conf

kern.maxdsiz="2560M"
kern.dfldsiz="2560M"
kern.maxssiz="256M"
ahci_load="yes"
hint.p4tcc.0.disabled="1"
hint.acpi_throttle.0.disabled="1"
vfs.zfs.arc_max="5120M"

* make.conf

CPUTYPE?=core2

* src.conf

WITHOUT_INET6=true
WITHOUT_IPFILTER=true
WITHOUT_LIB32=true
WITHOUT_KERBEROS=true
WITHOUT_PAM_SUPPORT=true
WITHOUT_PROFILE=true
WITHOUT_SENDMAIL=true

* kernel configuration
- note: between kernel builds, config was changed to either use
SCHED_4BSD or SCHED_ULE respectively.

cpu HAMMER
ident GENERIC

makeoptions DEBUG=-g # Build kernel with gdb(1) debug symbols

options SCHED_4BSD # Classic BSD scheduler
#options SCHED_ULE # ULE scheduler
options PREEMPTION # Enable kernel thread preemption
options INET # InterNETworking
options FFS # Berkeley Fast Filesystem
options SOFTUPDATES # Enable FFS soft updates support
options UFS_ACL # Support for access control lists
options UFS_DIRHASH # Improve performance on big directories
options UFS_GJOURNAL # Enable gjournal-based UFS journaling
options MD_ROOT # MD is a potential root device
options NFSCLIENT # Network Filesystem Client
options NFSSERVER # Network Filesystem Server
options NFSLOCKD # Network Lock Manager
options NFS_ROOT # NFS usable as /, requires NFSCLIENT
options MSDOSFS # MSDOS Filesystem
options CD9660 # ISO 9660 Filesystem
options PROCFS # Process filesystem (requires PSEUDOFS)
options PSEUDOFS # Pseudo-filesystem framework
options GEOM_PART_GPT # GUID Partition Tables.
options GEOM_LABEL # Provides labelization
options COMPAT_43TTY # BSD 4.3 TTY compat (sgtty)
options SCSI_DELAY=5000 # Delay (in ms) before probing SCSI
options KTRACE # ktrace(1) support
options STACK # stack(9) support
options SYSVSHM # SYSV-style shared memory
options SYSVMSG # SYSV-style message queues
options SYSVSEM # SYSV-style semaphores
options P1003_1B_SEMAPHORES # POSIX-style semaphores
options _KPOSIX_PRIORITY_SCHEDULING # POSIX P1003_1B real-time extensions
options PRINTF_BUFR_SIZE=128 # Prevent printf output being interspersed.
options KBD_INSTALL_CDEV # install a CDEV entry in /dev
options HWPMC_HOOKS # Necessary kernel hooks for hwpmc(4)
options AUDIT # Security event auditing
options MAC # TrustedBSD MAC Framework
options FLOWTABLE # per-cpu routing cache
#options KDTRACE_FRAME # Ensure frames are compiled in
#options KDTRACE_HOOKS # Kernel DTrace hooks
options INCLUDE_CONFIG_FILE # Include this file in kernel

# Make an SMP-capable kernel by default
options SMP # Symmetric MultiProcessor Kernel

# Debugging options
options BREAK_TO_DEBUGGER # Sending a serial BREAK drops to DDB
options ALT_BREAK_TO_DEBUGGER # Permit <CR>~<Ctrl-b> to drop to DDB
options KDB # Enable kernel debugger support
options KDB_TRACE # Print stack trace automatically on panic
options DDB # Support DDB
options DDB_NUMSYM # Print numeric value of symbols
options GDB # Support remote GDB

# CPU frequency control
device cpufreq

# Bus support.
device acpi
device pci

# Floppy drives
device fdc

# ATA and ATAPI devices
# NOTE: "device ata" is missing because we use the Modular ATA core
# to only include the ATA-related drivers we need (e.g. AHCI).
device atadisk # ATA disk drives
device ataraid # ATA RAID drives
device atapicd # ATAPI CDROM drives
options ATA_STATIC_ID # Static device numbering

# Modular ATA
device atacore # Core ATA functionality
device ataisa # ISA bus support
device atapci # PCI bus support; only generic chipset support
device ataahci # AHCI SATA
device ataintel # Intel

# SCSI peripherals
device scbus # SCSI bus (required for SCSI)
device da # Direct Access (disks)
device cd # CD
device pass # Passthrough device (direct SCSI access)
device ses # SCSI Environmental Services (and SAF-TE)
options CAMDEBUG # CAM debugging (camcontrol debug)

# atkbdc0 controls both the keyboard and the PS/2 mouse
device atkbdc # AT keyboard controller
device atkbd # AT keyboard
device psm # PS/2 mouse

device kbdmux # keyboard multiplexer

device vga # VGA video card driver

device splash # Splash screen and screen saver support

# syscons is the default console driver, resembling an SCO console
device sc

device agp # support several AGP chipsets

# Serial (COM) ports
device uart # Generic UART driver

# PCI Ethernet NICs.
device em # Intel PRO/1000 Gigabit Ethernet Family

# Wireless NIC cards
device wlan # 802.11 support
options IEEE80211_DEBUG # enable debug msgs
options IEEE80211_AMPDU_AGE # age frames in AMPDU reorder q's
device wlan_wep # 802.11 WEP support
device wlan_ccmp # 802.11 CCMP support
device wlan_tkip # 802.11 TKIP support
device wlan_amrr # AMRR transmit rate control algorithm
device wlan_acl # MAC Access Control List support

# Pseudo devices.
device loop # Network loopback
device random # Entropy device
device ether # Ethernet support
device pty # BSD-style compatibility pseudo ttys
device md # Memory "disks"
device gif # IPv6 and IPv4 tunneling
device faith # IPv6-to-IPv4 relaying (translation)
device firmware # firmware assist module

# The `bpf' device enables the Berkeley Packet Filter.
# Be aware of the administrative consequences of enabling this!
# Note that 'bpf' is required for DHCP.
device bpf # Berkeley packet filter

# USB support
device uhci # UHCI PCI->USB interface
device ohci # OHCI PCI->USB interface
device ehci # EHCI PCI->USB interface (USB 2.0)
device usb # USB Bus (required)
#device udbp # USB Double Bulk Pipe devices
device uhid # "Human Interface Devices"
device ukbd # Keyboard
device umass # Disks/Mass storage - Requires scbus and da
device ums # Mouse

# Intel Core/Core2Duo CPU temperature monitoring driver
device coretemp

# SMBus support, needed for bsdhwmon
device smbus
device smb
device ichsmb

# Intel ICH hardware watchdog support
device ichwd

# pf ALTQ support
options ALTQ
options ALTQ_CBQ # Class Based Queueing
options ALTQ_RED # Random Early Detection
options ALTQ_RIO # RED In/Out
options ALTQ_HFSC # Hierarchical Packet Scheduler
options ALTQ_CDNR # Traffic conditioner
options ALTQ_PRIQ # Priority Queueing
options ALTQ_NOPCC # Required for SMP build

O. Hartmann

Dec 13, 2011, 6:13:42 AM
to Vincent Hoffman, freebsd-p...@freebsd.org, Current FreeBSD, freebsd...@freebsd.org
On 12/12/11 16:13, Vincent Hoffman wrote:
>
> On 12/12/2011 13:47, O. Hartmann wrote:
>
>>> Not fully right, boinc defaults to run on idprio 31 so this isn't an
>>> issue. And yes, there are cases where SCHED_ULE shows much better
>>> performance then SCHED_4BSD. [...]
>
>> Do we have any proof at hand for such cases where SCHED_ULE performs
>> much better than SCHED_4BSD? Whenever the subject comes up, it is
>> mentioned, that SCHED_ULE has better performance on boxes with a ncpu >
>> 2. But in the end I see contradictory statements here. People
>> complain about poor performance (especially in scientific environments),
>> and others contend that this is not the case.
> It's all a little old now, but some of the stuff in
> http://people.freebsd.org/~kris/scaling/
> covers improvements that were seen.
>
> http://jeffr-tech.livejournal.com/5705.html
> shows a little too. Reading through Jeff's blog is worth it, as it has
> some interesting stuff on SCHED_ULE.
>
> I thought there were some more benchmarks floating around but can't find
> any with a quick google.
>
>
> Vince
>
>

Interesting: there seems to have been a much more performant scheduler
in 7.0 called SCHED_SMP. I have a faint recollection of it... where has
this beast gone?

Oliver


Jeremy Chadwick

Dec 13, 2011, 7:47:07 AM
to O. Hartmann, freebsd-p...@freebsd.org, Current FreeBSD, freebsd...@freebsd.org, Vincent Hoffman

Boy I sure hope I remember this right. I strongly urge others to
correct me where I'm wrong; thanks in advance!

The classic scheduler, SCHED_4BSD, was implemented back before there was
oxygen. sched_4bsd(4) mentions this. No need to discuss it.

Jeff Roberson began working on the "first-generation ULE scheduler"
during the days of FreeBSD 5.x (I believe 5.1), and a paper on it was
presented at USENIX circa 2003:
http://www.usenix.org/event/bsdcon03/tech/full_papers/roberson/roberson.pdf

Over the following years, Jeff (and others I assume -- maybe folks like
George Neville-Neil and/or Kirk McKusick?) adjusted and tinkered with
some of the semantics and models/methods. If I remember right, some of
these quirks/fixes were committed. All of this was happening under the
scheduler that was then called SCHED_ULE, but it was "ULE 1.0" for lack
of better terminology.

This scheduler did not perform well, if I remember right, and Jeff was
quite honest about that. From this point forward, Jeff began designing
and working on a scheduler which he called SCHED_SMP -- think of it as
"ULE 2.0", again, for lack of better terminology. It was different than
the existing SCHED_ULE scheduler, hence a different name. Jeff blogged
about this in early 2007, using exactly that term ("ULE 2.0"):
http://jeffr-tech.livejournal.com/3729.html

In mid-2007, prior to FreeBSD 7.0-RELEASE, Jeff announced that
effectively he wanted to make SCHED_ULE do what SCHED_SMP did, and
provided a patch to SCHED_ULE to accomplish just that:
http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2007-07/msg00755.html

Full thread is here (beware -- many replies):
http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2007-07/threads.html#00755

The patch mentioned above was merged into HEAD on 2007/07/19.
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/sched_ule.c#rev1.202

So in effect, as of 2007/07/19, SCHED_ULE became SCHED_SMP.

FreeBSD 7.0-RELEASE was released on 2008/02/27, and the above
commit/changes were available at that time as well (meaning: RELENG_7
and RELENG_7_0 at that moment in time should have included the patch
from the above paragraph).

The document released by Kris Kennaway hinted at those changes and
performance improvements:
http://people.freebsd.org/~kris/scaling/7.0%20Preview.pdf

Keep in mind, however, that at that time kernel configuration files
(GENERIC, etc.) still defaulted to SCHED_4BSD.

The default scheduler in kernel config files (GENERIC, etc.) for i386
and amd64 (not sure about others) was changed on 2007/10/19:
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/i386/conf/GENERIC#rev1.475
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/amd64/conf/GENERIC#rev1.485

This was done *prior* to FreeBSD 7.1-RELEASE. So, it first became
available as the default scheduler "for the masses" when 7.1-RELEASE
came out on 2009/01/05.

"All of the answers", in a roundabout and non-user-friendly way, are
available by examining the commit history for src/sys/kern/sched_ule.c.
It's hard to follow, especially since you have to consider all the
releases/branch points that took place over time, but:
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/sched_ule.c

Are we having fun yet? :-)

--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |

_______________________________________________

O. Hartmann

Dec 13, 2011, 8:23:46 AM
to Steve Kargl, freebsd-p...@freebsd.org, Current FreeBSD, freebsd...@freebsd.org

Well, those recommendations should be based on "WHY". Since the mostly
negative experiences with SCHED_ULE under highly compute-intensive
workloads always get countered with "...but there are workloads that
show the opposite...", this should be demonstrated by more recent
benchmarks and explanations rather than by legacy benchmarks from years
ago.

And, indeed, I would strongly recommend a FAQ entry or a short note in
"tuning" or the Handbook saying to use SCHED_4BSD in HPC environments
and SCHED_ULE for other workloads (which would have to be more
specific).

It is not an easy task to set up an OS for a specific purpose and to
tune it by crawling the mailing lists. Notes and hints in the
documentation are always valuable and highly appreciated by folks who
are not deep into development.

And by the way, I have the strong impression that most of these
discussions about the poor performance of SCHED_ULE always end up
covering up that flaw, and the development effort is wasted as a result.
But this is only my personal impression.


Steve Kargl

Dec 13, 2011, 10:54:56 AM
to O. Hartmann, freebsd-p...@freebsd.org, Current FreeBSD, freebsd...@freebsd.org

I have given the WHY in previous discussions of ULE, based
on what you call legacy benchmarks. I have not seen any
commit to sched_ule.c that would lead me to believe that
the performance issues with ULE and cpu-bound numerical
codes have been addressed. Repeating the benchmark would
be a waste of time.

Mike Tancsa

Dec 13, 2011, 4:25:18 PM
to freebsd-p...@freebsd.org, Current FreeBSD, freebsd...@freebsd.org
On 12/13/2011 10:54 AM, Steve Kargl wrote:
>
> I have given the WHY in previous discussions of ULE, based
> on what you call legacy benchmarks. I have not seen any
> commit to sched_ule.c that would lead me to believe that
> the performance issues with ULE and cpu-bound numerical
> codes have been addressed. Repeating the benchmark would
> be a waste of time.

Running a simple pbzip2 test on a large file, I get results that are
pretty consistent across iterations. pbzip2 with 4BSD is barely faster
on a file that's 322MB in size.

After a reboot, I did a
strings bigfile > /dev/null
then ran
pbzip2 -v xaa -c > /dev/null
7 times.

If I run burnP6 (from sysutils/cpuburn) in the background, they perform
about the same.

E.g.:

pbzip2 -v xaa -c > /dev/null
Parallel BZIP2 v1.1.6 - by: Jeff Gilchrist [http://compression.ca]
[Oct. 30, 2011] (uses libbzip2 by Julian Seward)
Major contributions: Yavor Nikolov <nikolov.ja...@gmail.com>

# CPUs: 4
BWT Block Size: 900 KB
File Block Size: 900 KB
Maximum Memory: 100 MB
-------------------------------------------
File #: 1 of 1
Input Name: xaa
Output Name: <stdout>

Input Size: 352404831 bytes
Compressing data...
Output Size: 50630745 bytes
-------------------------------------------

Wall Clock: 18.139342 seconds


ULE
18.113204
18.116896
18.123400
18.105894
18.163332
18.139342
18.082888

ULE with burnP6
23.076085
22.003666
21.162987
21.682445
21.935568
23.595781
21.601277


4BSD
17.983395
17.986218
18.009254
18.004312
18.001494
17.997032

4BSD with burnP6
22.215508
21.886459
21.595179
21.361830
21.325351
21.244793

# ministat uleP6 bsdP6
x uleP6
+ bsdP6
    N           Min           Max        Median           Avg        Stddev
x   6     21.162987     23.595781     22.003666     22.242755    0.91175566
+   6     21.244793     22.215508     21.595179     21.604853     0.3792413
No difference proven at 95.0% confidence
No difference proven at 95.0% confidence

x ule
+ bsd
    N           Min           Max        Median           Avg        Stddev
x   7     18.082888     18.163332     18.116896     18.120708   0.025468695
+   6     17.983395     18.009254     18.001494     17.996951   0.010248473
Difference at 95.0% confidence
-0.123757 +/- 0.024538
-0.68296% +/- 0.135414%
(Student's t, pooled s = 0.0200388)

Hardware is a Xeon X3450 with 8G of memory, running RELENG_8.

---Mike


--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mi...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/

_______________________________________________
freebsd...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current

To unsubscribe, send any mail to "freebsd-curre...@freebsd.org"

Doug Barton

Dec 13, 2011, 5:03:50 PM
to ma...@randstrom.com, freebsd...@freebsd.org, Malin Randstrom, O. Hartmann, Current FreeBSD, Steve Kargl, freebsd-p...@freebsd.org
On 12/13/2011 13:31, Malin Randstrom wrote:
> stop sending me spam mail ... you never stop despite me having unsubscribed
> several times. stop this!

If you had actually unsubscribed, the mail would have stopped. :)

You can see the instructions you need to follow below.

> _______________________________________________
> freebsd...@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stabl...@freebsd.org"
>

--
Breadth of IT experience, and depth of knowledge in the DNS.
Yours for the right price. :) http://SupersetSolutions.com/


Jilles Tjoelker

Dec 13, 2011, 6:04:42 PM
to Ivan Klymenko, O. Hartmann, Doug Barton, freebsd...@freebsd.org, freebsd-p...@freebsd.org, Current FreeBSD
On Tue, Dec 13, 2011 at 10:40:48AM +0200, Ivan Klymenko wrote:
> If the ULE algorithm does not contain problems, then the problem lies
> with the Core2Duo, or in a piece of code that uses the ULE scheduler.
> I already wrote on a mailing list that, specifically in my case
> (Core2Duo), the following patch partially helps:
> --- sched_ule.c.orig 2011-11-24 18:11:48.000000000 +0200
> +++ sched_ule.c 2011-12-10 22:47:08.000000000 +0200
> @@ -794,7 +794,8 @@
> * 1.5 * balance_interval.
> */
> balance_ticks = max(balance_interval / 2, 1);
> - balance_ticks += random() % balance_interval;
> +// balance_ticks += random() % balance_interval;
> + balance_ticks += ((int)random()) % balance_interval;
> if (smp_started == 0 || rebalance == 0)
> return;
> tdq = TDQ_SELF();

This avoids a 64-bit division on 64-bit platforms but seems to have no
effect otherwise. Because this function is not called very often, the
change seems unlikely to help.

> @@ -2118,13 +2119,21 @@
> struct td_sched *ts;
>
> THREAD_LOCK_ASSERT(td, MA_OWNED);
> + if (td->td_pri_class & PRI_FIFO_BIT)
> + return;
> + ts = td->td_sched;
> + /*
> + * We used up one time slice.
> + */
> + if (--ts->ts_slice > 0)
> + return;

This skips most of the periodic functionality (long term load balancer,
saving switch count (?), insert index (?), interactivity score update
for long running thread) if the thread is not going to be rescheduled
right now.

It looks wrong but it is a data point if it helps your workload.

> tdq = TDQ_SELF();
> #ifdef SMP
> /*
> * We run the long term load balancer infrequently on the first cpu.
> */
> - if (balance_tdq == tdq) {
> - if (balance_ticks && --balance_ticks == 0)
> + if (balance_ticks && --balance_ticks == 0) {
> + if (balance_tdq == tdq)
> sched_balance();
> }
> #endif

The main effect of this appears to be to disable the long term load
balancer completely after some time. At some point, a CPU other than the
first CPU (which uses balance_tdq) will set balance_ticks = 0, and
sched_balance() will never be called again.

It also introduces a hypothetical race condition because the access to
balance_ticks is no longer restricted to one CPU under a spinlock.

If the long term load balancer may be causing trouble, try setting
kern.sched.balance_interval to a higher value with unpatched code.

--
Jilles Tjoelker

Marcus Reid

Dec 13, 2011, 6:02:15 PM
to Doug Barton, O. Hartmann, Current FreeBSD, freebsd...@freebsd.org, freebsd-p...@freebsd.org
On Mon, Dec 12, 2011 at 04:29:14PM -0800, Doug Barton wrote:
> On 12/12/2011 05:47, O. Hartmann wrote:
> > Do we have any proof at hand for such cases where SCHED_ULE performs
> > much better than SCHED_4BSD?
>
> I complained about poor interactive performance of ULE in a desktop
> environment for years. I had numerous people try to help, including
> Jeff, with various tunables, dtrace'ing, etc. The cause of the problem
> was never found.

The issues that I've seen with ULE on the desktop seem to be caused by X
taking up a steady amount of CPU, and being demoted from being an
"interactive" process. X then becomes the bottleneck for other
processes that would otherwise be "interactive". Try 'renice -20
<pid_of_X>' and see if that makes your problems go away.

Marcus

Ivan Klymenko

Dec 13, 2011, 6:39:06 PM
to Jilles Tjoelker, O. Hartmann, Doug Barton, freebsd...@freebsd.org, freebsd-p...@freebsd.org, Current FreeBSD
On Wed, 14 Dec 2011 00:04:42 +0100
Jilles Tjoelker <jil...@stack.nl> wrote:

> On Tue, Dec 13, 2011 at 10:40:48AM +0200, Ivan Klymenko wrote:
> > If the ULE algorithm does not contain problems, then the problem
> > lies with the Core2Duo, or in a piece of code that uses the ULE
> > scheduler. I already wrote on a mailing list that, specifically in
> > my case (Core2Duo), the following patch partially helps:
> > --- sched_ule.c.orig 2011-11-24 18:11:48.000000000 +0200
> > +++ sched_ule.c 2011-12-10 22:47:08.000000000 +0200
> > @@ -794,7 +794,8 @@
> > * 1.5 * balance_interval.
> > */
> > balance_ticks = max(balance_interval / 2, 1);
> > - balance_ticks += random() % balance_interval;
> > +// balance_ticks += random() % balance_interval;
> > + balance_ticks += ((int)random()) % balance_interval;
> > if (smp_started == 0 || rebalance == 0)
> > return;
> > tdq = TDQ_SELF();
>
> This avoids a 64-bit division on 64-bit platforms but seems to have no
> effect otherwise. Because this function is not called very often, the
> change seems unlikely to help.

Yes, this hunk is not relevant to this problem :)
I just posted the latest patch that I am using now...

>
> > @@ -2118,13 +2119,21 @@
> > struct td_sched *ts;
> >
> > THREAD_LOCK_ASSERT(td, MA_OWNED);
> > + if (td->td_pri_class & PRI_FIFO_BIT)
> > + return;
> > + ts = td->td_sched;
> > + /*
> > + * We used up one time slice.
> > + */
> > + if (--ts->ts_slice > 0)
> > + return;
>
> This skips most of the periodic functionality (long term load
> balancer, saving switch count (?), insert index (?), interactivity
> score update for long running thread) if the thread is not going to
> be rescheduled right now.
>
> It looks wrong but it is a data point if it helps your workload.

Yes, I did that to delay the execution of the following code for as
long as possible:
...
#ifdef SMP
	/*
	 * We run the long term load balancer infrequently on the first cpu.
	 */
	if (balance_tdq == tdq) {
		if (balance_ticks && --balance_ticks == 0)
			sched_balance();
	}
#endif
...

>
> > tdq = TDQ_SELF();
> > #ifdef SMP
> > /*
> > * We run the long term load balancer infrequently on the
> > first cpu. */
> > - if (balance_tdq == tdq) {
> > - if (balance_ticks && --balance_ticks == 0)
> > + if (balance_ticks && --balance_ticks == 0) {
> > + if (balance_tdq == tdq)
> > sched_balance();
> > }
> > #endif
>
> The main effect of this appears to be to disable the long term load
> balancer completely after some time. At some point, a CPU other than
> the first CPU (which uses balance_tdq) will set balance_ticks = 0, and
> sched_balance() will never be called again.
>

That is for the same reason as stated above...

> It also introduces a hypothetical race condition because the access to
> balance_ticks is no longer restricted to one CPU under a spinlock.
>
> If the long term load balancer may be causing trouble, try setting
> kern.sched.balance_interval to a higher value with unpatched code.

I tried that first, but it did not fix the situation...

My impression is that rebalancing malfunctions...
It seems that threads are handed to the same core that is already
loaded, and so on...
Perhaps this is a consequence of the CPU topology being detected
incorrectly?


Ivan Klymenko

Dec 13, 2011, 6:42:11 PM
to Marcus Reid, O. Hartmann, Doug Barton, freebsd...@freebsd.org, freebsd-p...@freebsd.org, Current FreeBSD
On Tue, 13 Dec 2011 23:02:15 +0000
Marcus Reid <mar...@blazingdot.com> wrote:

> On Mon, Dec 12, 2011 at 04:29:14PM -0800, Doug Barton wrote:
> > On 12/12/2011 05:47, O. Hartmann wrote:
> > > Do we have any proof at hand for such cases where SCHED_ULE
> > > performs much better than SCHED_4BSD?
> >
> > I complained about poor interactive performance of ULE in a desktop
> > environment for years. I had numerous people try to help, including
> > Jeff, with various tunables, dtrace'ing, etc. The cause of the
> > problem was never found.
>
> The issues that I've seen with ULE on the desktop seem to be caused
> by X taking up a steady amount of CPU, and being demoted from being an
> "interactive" process. X then becomes the bottleneck for other
> processes that would otherwise be "interactive". Try 'renice -20
> <pid_of_X>' and see if that makes your problems go away.

Why, then, is X not a bottleneck when using 4BSD?

m...@freebsd.org

Dec 13, 2011, 7:01:56 PM
to Ivan Klymenko, Doug Barton, freebsd...@freebsd.org, Jilles Tjoelker, O. Hartmann, Current FreeBSD, freebsd-p...@freebsd.org
On Tue, Dec 13, 2011 at 3:39 PM, Ivan Klymenko <fi...@ukr.net> wrote:
> On Wed, 14 Dec 2011 00:04:42 +0100
> Jilles Tjoelker <jil...@stack.nl> wrote:
>
>> On Tue, Dec 13, 2011 at 10:40:48AM +0200, Ivan Klymenko wrote:
>> > If the ULE algorithm does not contain problems, then the problem
>> > lies with the Core2Duo, or in a piece of code that uses the ULE
>> > scheduler. I already wrote on a mailing list that, specifically in
>> > my case (Core2Duo), the following patch partially helps:
>> > --- sched_ule.c.orig        2011-11-24 18:11:48.000000000 +0200
>> > +++ sched_ule.c     2011-12-10 22:47:08.000000000 +0200
>> > @@ -794,7 +794,8 @@
>> >      * 1.5 * balance_interval.
>> >      */
>> >     balance_ticks = max(balance_interval / 2, 1);
>> > -   balance_ticks += random() % balance_interval;
>> > +// balance_ticks += random() % balance_interval;
>> > +   balance_ticks += ((int)random()) % balance_interval;
>> >     if (smp_started == 0 || rebalance == 0)
>> >             return;
>> >     tdq = TDQ_SELF();
>>
>> This avoids a 64-bit division on 64-bit platforms but seems to have no
>> effect otherwise. Because this function is not called very often, the
>> change seems unlikely to help.
>
> Yes, this hunk is not relevant to this problem :)
> I just posted the latest patch that I am using now...
>
>>
>> > @@ -2118,13 +2119,21 @@
>> >     struct td_sched *ts;
>> >
>> >     THREAD_LOCK_ASSERT(td, MA_OWNED);
>> > +   if (td->td_pri_class & PRI_FIFO_BIT)
>> > +           return;
>> > +   ts = td->td_sched;
>> > +   /*
>> > +    * We used up one time slice.
>> > +    */
>> > +   if (--ts->ts_slice > 0)
>> > +           return;
>>
>> This skips most of the periodic functionality (long term load
>> balancer, saving switch count (?), insert index (?), interactivity
>> score update for long running thread) if the thread is not going to
>> be rescheduled right now.
>>
>> It looks wrong but it is a data point if it helps your workload.
>
> Yes, I did that to delay the execution of the following code for as long as possible:
> ...
> #ifdef SMP
>        /*
>         * We run the long term load balancer infrequently on the first cpu.
>         */
>        if (balance_tdq == tdq) {
>                if (balance_ticks && --balance_ticks == 0)
>                        sched_balance();
>        }
> #endif
> ...
>
>>
>> >     tdq = TDQ_SELF();
>> >  #ifdef SMP
>> >     /*
>> >      * We run the long term load balancer infrequently on the
>> > first cpu. */
>> > -   if (balance_tdq == tdq) {
>> > -           if (balance_ticks && --balance_ticks == 0)
>> > +   if (balance_ticks && --balance_ticks == 0) {
>> > +           if (balance_tdq == tdq)
>> >                     sched_balance();
>> >     }
>> >  #endif
>>
>> The main effect of this appears to be to disable the long term load
>> balancer completely after some time. At some point, a CPU other than
>> the first CPU (which uses balance_tdq) will set balance_ticks = 0, and
>> sched_balance() will never be called again.
>>
>
> That is for the same reason as stated above...
>
>> It also introduces a hypothetical race condition because the access to
>> balance_ticks is no longer restricted to one CPU under a spinlock.
>>
>> If the long term load balancer may be causing trouble, try setting
>> kern.sched.balance_interval to a higher value with unpatched code.
>
> I tried that first, but it did not fix the situation...
>
> My impression is that rebalancing malfunctions...
> It seems that threads are handed to the same core that is already loaded, and so on...
> Perhaps this is a consequence of the CPU topology being detected incorrectly?
>
>>


Has anyone experiencing problems tried setting sysctl kern.sched.steal_thresh=1?

I don't remember what our specific problem at $WORK was, perhaps it
was just interrupt threads not getting serviced fast enough, but we've
hard-coded this to 1 and removed the code that sets it in
sched_initticks(). The same effect should be had by setting the
sysctl after a box is up.

Thanks,
matthew



Ivan Klymenko

Dec 13, 2011, 7:36:29 PM
to m...@freebsd.org, Doug Barton, freebsd...@freebsd.org, Tjoelker, O. Hartmann, Current FreeBSD, Jil...@freebsd.org, freebsd-p...@freebsd.org
В Tue, 13 Dec 2011 16:01:56 -0800
m...@FreeBSD.org пишет:

In my case, the variable kern.sched.steal_thresh and so has the value 1.

Ivan Klymenko

Dec 13, 2011, 7:36:29 PM12/13/11
to m...@freebsd.org, Doug Barton, freebsd...@freebsd.org, Tjoelker, O. Hartmann, Current FreeBSD, Jil...@freebsd.org, freebsd-p...@freebsd.org
On Tue, 13 Dec 2011 16:01:56 -0800
m...@FreeBSD.org wrote:

In my case, kern.sched.steal_thresh already has the value 1.

> I don't remember what our specific problem at $WORK was, perhaps it
> was just interrupt threads not getting serviced fast enough, but we've
> hard-coded this to 1 and removed the code that sets it in
> sched_initticks(). The same effect should be had by setting the
> sysctl after a box is up.
>
> Thanks,
> matthew

Bruce Evans

Dec 13, 2011, 8:25:14 PM12/13/11
to Ivan Klymenko, Doug Barton, freebsd...@freebsd.org, Jilles Tjoelker, O. Hartmann, Current FreeBSD, freebsd-p...@freebsd.org
On Wed, 14 Dec 2011, Ivan Klymenko wrote:

> On Wed, 14 Dec 2011 00:04:42 +0100
> Jilles Tjoelker <jil...@stack.nl> wrote:
>

>> On Tue, Dec 13, 2011 at 10:40:48AM +0200, Ivan Klymenko wrote:
>>> If the ULE algorithm itself has no problems, then the problem lies
>>> with the Core2Duo, or in a piece of code that the ULE scheduler
>>> uses. I already wrote to the mailing list that in my case
>>> specifically (Core2Duo) the following patch partially helps:
>>> --- sched_ule.c.orig 2011-11-24 18:11:48.000000000 +0200
>>> +++ sched_ule.c 2011-12-10 22:47:08.000000000 +0200

>>> ...


>>> @@ -2118,13 +2119,21 @@
>>> struct td_sched *ts;
>>>
>>> THREAD_LOCK_ASSERT(td, MA_OWNED);
>>> + if (td->td_pri_class & PRI_FIFO_BIT)
>>> + return;
>>> + ts = td->td_sched;
>>> + /*
>>> + * We used up one time slice.
>>> + */
>>> + if (--ts->ts_slice > 0)
>>> + return;
>>
>> This skips most of the periodic functionality (long term load
>> balancer, saving switch count (?), insert index (?), interactivity
>> score update for long running thread) if the thread is not going to
>> be rescheduled right now.
>>
>> It looks wrong but it is a data point if it helps your workload.
>

> Yes, I did it to delay the execution of the code in this section for as long as possible:

I don't understand what you are doing here, but recently noticed that
the timeslicing in SCHED_4BSD is completely broken. This bug may be a
feature. SCHED_4BSD doesn't have its own timeslice counter like ts_slice
above. It uses `switchticks' instead. But switchticks hasn't been usable
for this purpose since long before SCHED_4BSD started using it for this
purpose. switchticks is reset on every context switch, so it is useless
for almost all purposes -- any interrupt activity on a non-fast interrupt
clobbers it.

Removing the check of ts_slice in the above and always returning might
give a similar bug to the SCHED_4BSD one.

I noticed this while looking for bugs in realtime scheduling. In the
above, returning early for PRI_FIFO_BIT also skips most of the periodic
functionality. In SCHED_4BSD, returning early is the usual case, so
the PRI_FIFO_BIT might as well not be checked, and it is the unusual
fifo scheduling case (which is supposed to only apply to realtime
priority threads) which has a chance of working as intended, while the
usual roundrobin case degenerates to an impure form of fifo scheduling
(it is impure since priority decay still works, so it is only fifo
among threads of the same priority).

>>...


>>> @@ -2144,9 +2153,6 @@
>>> if (TAILQ_EMPTY(&tdq->tdq_timeshare.rq_queues[tdq->tdq_ridx]))
>>> tdq->tdq_ridx = tdq->tdq_idx;
>>> }
>>> - ts = td->td_sched;
>>> - if (td->td_pri_class & PRI_FIFO_BIT)
>>> - return;
>>> if (PRI_BASE(td->td_pri_class) == PRI_TIMESHARE) {
>>> /*
>>> * We used a tick; charge it to the thread so
>>> @@ -2157,11 +2163,6 @@
>>> sched_priority(td);
>>> }
>>> /*
>>> - * We used up one time slice.
>>> - */
>>> - if (--ts->ts_slice > 0)
>>> - return;
>>> - /*
>>> * We're out of time, force a requeue at userret().
>>> */
>>> ts->ts_slice = sched_slice;

With the ts_slice check here before you moved it, removing it might
give buggy behaviour closer to SCHED_4BSD.

>>> and refusal to use options FULL_PREEMPTION

4-5 years ago, I found that any form of PREEMPTION was a pessimization
for at least makeworld (since it caused too many context switches).
PREEMPTION was needed for the !SMP case, at least partly because of
the broken switchticks (switchticks, when it works, gives voluntary
yielding by some CPU hogs in the kernel. PREEMPTION, if it works,
should do this better). So I used PREEMPTION in the !SMP case and
not for the SMP case. I didn't worry about the CPU hogs in the SMP
case since it is rare to have more than 1 of them and 1 will use at
most 1/2 of a multi-CPU system.

>>> But no one has replied to my letter saying whether my patch helps in
>>> the Core2Duo case or not...
>>> I suspect that the problems stem from the SMP-related sections of
>>> code...
>>> Maybe I'm wrong about something, but I want to help solve this
>>> problem...

The main point of SCHED_ULE is to give better affinity for multi-CPU
systems. But the `multi' apparently needs to be strictly more than
2 for it to break even.

Bruce

Mike Tancsa

Dec 14, 2011, 11:59:51 AM12/14/11
to m...@freebsd.org, Ivan Klymenko, Doug Barton, freebsd...@freebsd.org, Jilles Tjoelker, O. Hartmann, Current FreeBSD, freebsd-p...@freebsd.org
On 12/13/2011 7:01 PM, m...@freebsd.org wrote:
>
> Has anyone experiencing problems tried to set sysctl kern.sched.steal_thresh=1 ?
>
> I don't remember what our specific problem at $WORK was, perhaps it
> was just interrupt threads not getting serviced fast enough, but we've
> hard-coded this to 1 and removed the code that sets it in
> sched_initticks(). The same effect should be had by setting the
> sysctl after a box is up.

FWIW, this does impact the performance of pbzip2 on an i7. Using a 1.1G file

pbzip2 -v -c big > /dev/null

with burnP6 running in the background,

sysctl kern.sched.steal_thresh=1
vs
sysctl kern.sched.steal_thresh=3

    N           Min           Max        Median           Avg        Stddev
x  10     38.005022      38.42238     38.194648     38.165052    0.15546188
+   9     38.695417     40.595544     39.392127     39.435384    0.59814114
Difference at 95.0% confidence
        1.27033 +/- 0.412636
        3.32852% +/- 1.08119%
        (Student's t, pooled s = 0.425627)

a value of 1 is *slightly* faster.


--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mi...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/

Andrey Chernov

Dec 14, 2011, 12:34:35 PM12/14/11
to Adrian Chadd, Ivan Klymenko, Doug Barton, freebsd...@freebsd.org, O. Hartmann, Current FreeBSD, freebsd-p...@freebsd.org
On Tue, Dec 13, 2011 at 02:22:48AM -0800, Adrian Chadd wrote:
> On 13 December 2011 01:00, Andrey Chernov <ac...@freebsd.org> wrote:
>
> >> If the ULE algorithm itself has no problems, then the problem lies
> >> with the Core2Duo, or in a piece of code that the ULE scheduler uses.
> >
> > I observe ULE interactivity slowness even on a single-core machine (Pentium
> > 4) in very visible places, like 'ps ax' output stalling in the middle for ~1
> > second. When I switch back to SCHED_4BSD, all slowness is gone.
>
> Are you able to provide KTR traces of the scheduler results? Something
> that can be fed to schedgraph?

Sorry, this machine is not mine anymore. I tried SCHED_ULE on a Core 2 Duo
instead and didn't notice this effect, but it is pretty fast overall
compared to that Pentium 4.

--
http://ache.vniz.net/

Ivan Klymenko

Dec 14, 2011, 12:55:49 PM12/14/11
to Andrey Chernov, Adrian Chadd, Doug Barton, freebsd...@freebsd.org, O. Hartmann, Current FreeBSD, freebsd-p...@freebsd.org
On Wed, 14 Dec 2011 21:34:35 +0400
Andrey Chernov <ac...@FreeBSD.ORG> wrote:

> On Tue, Dec 13, 2011 at 02:22:48AM -0800, Adrian Chadd wrote:
> > On 13 December 2011 01:00, Andrey Chernov <ac...@freebsd.org> wrote:
> >
> > >> If the ULE algorithm itself has no problems, then the problem lies
> > >> with the Core2Duo, or in a piece of code that the ULE scheduler
> > >> uses.
> > >
> > > I observe ULE interactivity slowness even on a single-core machine
> > > (Pentium 4) in very visible places, like 'ps ax' output stalling in
> > > the middle for ~1 second. When I switch back to SCHED_4BSD, all
> > > slowness is gone.
> >
> > Are you able to provide KTR traces of the scheduler results?
> > Something that can be fed to schedgraph?
>
> Sorry, this machine is not mine anymore. I tried SCHED_ULE on a Core 2
> Duo instead and didn't notice this effect, but it is pretty fast
> overall compared to that Pentium 4.
>

Please give me detailed instructions on how to do it, and I'll do it...
It would be a shame if this topic once again ends with nothing but
discussion... :(

Malin Randstrom

Dec 13, 2011, 4:31:20 PM12/13/11
to Steve Kargl, freebsd-p...@freebsd.org, Current FreeBSD, freebsd...@freebsd.org, O. Hartmann
stop sending me spam mail ... you never stop despite me having unsubscribed
several times. stop this!
On Dec 13, 2011 8:12 PM, "Steve Kargl" <s...@troutmask.apl.washington.edu>
wrote:

O. Hartmann

Dec 15, 2011, 2:32:48 AM12/15/11
to Jeremy Chadwick, freebsd-p...@freebsd.org, Current FreeBSD, FreeBSD Stable Mailing List
Just saw this short benchmark on Phoronix dot com today:

http://www.phoronix.com/scan.php?page=news_item&px=MTAyNzA

It may be worth discussing the poor performance of FBSD in some parts of
the benchmark. A difference of a factor of 10 or 100 is far beyond
disappointing; it is unacceptable, and just from reading those
benchmarks I'd be inclined to stop considering FreeBSD even as a backend
server in scientific and business environments. In particular, some of the
SciMark benchmarks look disappointing. That FreeBSD performs better in
C-Ray does not make up for the overall picture.

Compiler differences alone, I'd say, could not cause a drop of more than
10-15% in performance, certainly not a factor of 10 or 100.

I was just thinking about the SCHED_ULE discussion and all the sore
spots we discussed when I stumbled over the test.

Regards,
Oliver


Adrian Chadd

Dec 15, 2011, 2:40:32 AM12/15/11
to O. Hartmann, freebsd-p...@freebsd.org, Current FreeBSD, FreeBSD Stable Mailing List, Jeremy Chadwick
On 14 December 2011 23:32, O. Hartmann <ohar...@zedat.fu-berlin.de> wrote:
> Just saw this short benchmark on Phoronix dot com today:
>
> http://www.phoronix.com/scan.php?page=news_item&px=MTAyNzA
>
> It may be worth discussing the poor performance of FBSD in some parts of
> the benchmark. A difference of a factor of 10 or 100 is far beyond

Well, the only way it's going to get fixed is if someone sits down,
replicates it, and starts to document exactly what it is that these
benchmarks are/aren't doing.

Sometimes it's because the benchmark is very much tickling things
incorrectly. In a lot of cases though, the benchmark is testing
something synthetic that Linux just happens to have micro-optimised.

So if you care about this a lot, someone needs to stand up, work with
Phoronix to get some actual feedback about what's going on, and see if
it can be fixed. Maybe you'll find ULE is broken in some instances; I
bet you'll find something like "the disk driver is suboptimal." For
example, I remember seeing someone mess up a test because they split
their filesystems across raid5 boundaries, and this was hidden by the
choice of raid controller and stripe size. This made FreeBSD look
worse; when this was corrected for, it sped up far past Linux.

Adrian

Steven Hartland

Dec 15, 2011, 6:02:47 AM12/15/11
to Michael Larabel, Michael Ross, FreeBSD Stable Mailing List, freebsd-p...@freebsd.org, Current FreeBSD, O. Hartmann, Jeremy Chadwick
----- Original Message -----
From: "Michael Larabel" <michael...@phoronix.com>

>
> I was the one who carried out the testing and know that it was on the
> same system.
>
> All of the testing, including the system tables, is fully automated.
> Under FreeBSD sometimes the parsing of some component strings isn't as
> nice as Linux and other supported operating systems by the Phoronix Test
> Suite. For the BSD motherboard string parsing it's grabbing
> hw.vendor/hw.product from sysctl. Is there a better place to read the
> motherboard DMI information from?

dmidecode may provide better info?

Regards
Steve

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it.

In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337
or return the E.mail to postm...@multiplay.co.uk.

Stefan Esser

Dec 15, 2011, 8:25:26 AM12/15/11
to Michael Larabel, FreeBSD Stable Mailing List, Current FreeBSD, Michael Ross, freebsd-p...@freebsd.org, O. Hartmann, Jeremy Chadwick
On 15.12.2011 11:10, Michael Larabel wrote:
> No, the same hardware was used for each OS.
>
> In terms of the software, the stock software stack for each OS was used.

Just curious: Why did you choose ZFS on FreeBSD, while UFS2 (with
journaling enabled) should be an obvious choice since it is more similar
in concept to ext4 and since that is what most FreeBSD users will use
with FreeBSD?

Did you tune the ZFS ARC (e.g. vfs.zfs.arc_max="6G") for the tests?

And BTW: Did your measured run times account for the fact that Linux
keeps much more dirty data in the buffer cache? (FreeBSD has a low limit
on dirty buffers since, under realistic load, already cached data is
much more likely to be reused and thus more valuable than freshly
written data; aggressively caching dirty data would significantly reduce
throughput and responsiveness under high load.) Given the hardware specs
of the test system, I guess that Linux accepts at least 100 times as much
dirty data in the buffer cache as FreeBSD (where this amount is at most
in the tens of megabytes).

If you did not, then your results do not represent a server load (which
I'd expect to be relevant if you are testing against Oracle Linux 6.1
server), where continuous performance is required. Tests that run on an
idle system starting in a clean state and ignoring background flushing
of the buffer cache after the timed program has stopped are perhaps
useful for a very lowly loaded PC, but not for a system with high load
average as the default.

I bet the picture would look different if you compared the systems under
higher load (which admittedly makes it much harder to get sensible numbers
for the program under test), or with a reduced buffer cache size (or with
the dirty buffer limit in FreeBSD raised accordingly, which ought to be
possible with sysctl and/or boot-time tunables, e.g. "vfs.hidirtybuffers").

And a last remark: Single benchmark runs do not provide reliable data.
FreeBSD comes with "ministat" to check the significance of benchmark
results. Each test should be repeated at least 5 times for meaningful
averages with an acceptable confidence level.

Regards, Stefan

Daniel Kalchev

Dec 15, 2011, 8:58:44 AM12/15/11
to Jeremy Chadwick, Adrian Chadd, Samuel J. Greear, Current FreeBSD, FreeBSD Stable Mailing List, freebsd-p...@freebsd.org, O. Hartmann

On Dec 15, 2011, at 3:48 PM, Jeremy Chadwick wrote:

[…]
> That said: thrown out, data ignored, done.
>
> Now what? Where are we? We're right back where we were a day or two
> ago; meaning no closer to solving the dilemma reported by users and
> SCHED_ULE. Heck, we're not even sure if there is an issue, other than
> some folks confirming that SCHED_4BSD performs better for them (that's
> what started this whole thread), and there are at least a couple which
> have stated this.

But, are any of these benchmarks really engaging the 4BSD/ULE scheduler differences? Most such benchmarks are run on a system with no other load whatsoever and in no way represent real world experience.

What is more, I believe that in such benchmarks "the system feels sluggish" is not measured at all. Even if it were, a system that finishes the benchmark "better" (that is, faster) while, say, freezing the system for the user for the duration of the test would be considered the "winner", because the benchmark suite ran faster on it, whereas a system that ran the benchmark reasonably fast while providing good interactive response would be considered the "loser".

I think it is not a good idea to hijack this thread. Instead, we should focus on the other SCHED_ULE-bashing thread to define a reasonable benchmark, or rather a set of benchmarks, so that many people could run it and provide feedback.


Daniel

Daniel Kalchev

Dec 15, 2011, 8:51:33 AM12/15/11
to Stefan Esser, Michael Larabel, FreeBSD Stable Mailing List, Current FreeBSD, Michael Ross, freebsd-p...@freebsd.org, O. Hartmann, Jeremy Chadwick

On Dec 15, 2011, at 3:25 PM, Stefan Esser wrote:

> On 15.12.2011 11:10, Michael Larabel wrote:
>> No, the same hardware was used for each OS.
>>
>> In terms of the software, the stock software stack for each OS was used.
>
> Just curious: Why did you choose ZFS on FreeBSD, while UFS2 (with
> journaling enabled) should be an obvious choice since it is more similar
> in concept to ext4 and since that is what most FreeBSD users will use
> with FreeBSD?


Or perhaps, since it is a "server" Linux distribution, use ZFS on Linux as well, with identical tuning on both Linux and FreeBSD. Having the same FS used by both OSes will make the comparison of FS I/O more sensible.

Daniel

Sergey Matveychuk

Dec 15, 2011, 9:26:16 AM12/15/11
to Michael Larabel, FreeBSD Stable Mailing List, Current FreeBSD, Michael Ross, freebsd-p...@freebsd.org, O. Hartmann, Jeremy Chadwick
On 15.12.2011 17:36, Michael Larabel wrote:

> On 12/15/2011 07:25 AM, Stefan Esser wrote:
>> On 15.12.2011 11:10, Michael Larabel wrote:
>>> No, the same hardware was used for each OS.
>>>
>>> In terms of the software, the stock software stack for each OS was used.
>> Just curious: Why did you choose ZFS on FreeBSD, while UFS2 (with
>> journaling enabled) should be an obvious choice since it is more similar
>> in concept to ext4 and since that is what most FreeBSD users will use
>> with FreeBSD?
>
> I was running some ZFS vs. UFS tests as well and this happened to have
> ZFS on when I was running some other tests.
>

Can we look at the tests?
In my opinion, ZFS without tuning is much slower than UFS2.



Mike Bedwell

Dec 15, 2011, 9:32:27 AM12/15/11
to freebsd-p...@freebsd.org
The benchmarks also need to be run on equivalent hardware. I've yet to
see Phoronix perform a benchmark test where they actually used the same
rig to perform benchmark comparisons. Too many things change from one
benchmark to the next to be able to reliably say what is at fault for
the benchmark differences. The test needs to be re-run in an
environment where the only thing that changes is the operating system.
These benchmarks only show that linux on one machine performs
differently than bsd on an entirely different machine.


Tony McC

Dec 15, 2011, 9:49:24 AM12/15/11
to freebsd-p...@freebsd.org

I suggest always ignoring benchmarks. They are like reading the
astrology column in a tabloid newspaper. Instead, try FreeBSD for your
work. Is it fast enough? Surely that is all you need to know. FreeBSD
is quite fast enough for my needs and I am simply more productive using
it than when I use any other operating system. That is partly to do
with my familiarity with my setup, which I have customised the way I
want. That is something that no benchmark can allow for.

Tony

Volodymyr Kostyrko

Dec 15, 2011, 9:37:27 AM12/15/11
to Jeremy Chadwick, Adrian Chadd, Samuel J. Greear, Current FreeBSD, FreeBSD Stable Mailing List, freebsd-p...@freebsd.org, O. Hartmann
15.12.2011 15:48, Jeremy Chadwick wrote:
> I'm getting to the point where I'm considering formulating a private
> mail to Jeff Roberson, requesting that he be aware of the discussion
> that's happening (not that he necessarily follow or read it), and that
> based on what I can tell we're at a roadblock -- nobody so far is
> absolutely certain how to "benchmark" and compare ULE vs. 4BSD in
> multiple ways, so that those of us involved here can run such utilities
> and provide the data somewhere central for devs to review. I only
> mention this because so far I haven't seen anyone really say "okay, this
> is what we should be using for these kinds of tests". Yay nature of the
> beast.

I'll try to summarize and propose a test scenario. I don't know whether
this helps or not.

We should have two different task types for this one. The first would be
Super Affine tasks. They should make few or no syscalls, do a moderate
amount of math, and have a low memory footprint. No syscalls means these
tasks will never stop for memory/disk or other activity, so each time the
run queue is examined the task will be ready to run. Moderate math means
this shouldn't be just a simple big loop; the processor should really
compute something with the data. A low memory footprint means the task and
its data can stay in the CPU's L1 cache for ages. I'm not sure about branch
prediction, whether it should be deliberately defeated or not...

The other task type would be Worker. It doesn't matter what it does, but
it aggressively uses syscalls, e.g. working with files/directories.

There should be at least one SA-task per core and at least 10 (?)
W-tasks per core.

--
Sphinx of black quartz judge my vow.

Pieter de Goeje

Dec 15, 2011, 10:48:33 AM12/15/11
to O. Hartmann, freebsd-p...@freebsd.org, Current FreeBSD
On 15-12-2011 8:32, O. Hartmann wrote:
Detailed results here:
http://openbenchmarking.org/result/1112113-AR-ORACLELIN37

As usual, the Phoronix benchmarks are very misleading.
1) The linked benchmarks were not run on the same hardware. Hardware is
close but not completely equal; for instance different brands of disks
were used.
2) They didn't use the same compiler. This is really bad and _can_ lead
to more than a factor 2 performance difference. Especially in
"scientific" programs where (auto) vectorization is very important. Why
on earth the benchmarker was too lazy to install a more recent GCC I
have no idea.

Of all the benchmarks shown only the disk benchmarks are interesting,
because they actually stress the system. Unfortunately they screwed that
up too because they were performed on ZFS instead of the default, plain
UFS which is a lot more like EXT4 in terms of functionality.

The rest are pure CPU bound userspace workloads and I bet that if they
were performed using the same compiler, similar results would've been
achieved (barring any major VM differences). In any case we would've
been able to actually compare FreeBSD vs Oracle Linux instead of GCC 4.5
vs 4.2. Now they are useless.

I'm sorry if this mail sounds a bit harsh, but I'm tired of seeing
Phoronix make the same elementary mistakes again and again, even after
they were pointed out years ago.

- Pieter

Attilio Rao

Dec 15, 2011, 11:26:04 AM12/15/11
to Mike Tancsa, Ivan Klymenko, m...@freebsd.org, Doug Barton, freebsd...@freebsd.org, Jilles Tjoelker, O. Hartmann, Current FreeBSD, freebsd-p...@freebsd.org
2011/12/14 Mike Tancsa <mi...@sentex.net>:

> On 12/13/2011 7:01 PM, m...@freebsd.org wrote:
>>
>> Has anyone experiencing problems tried to set sysctl kern.sched.steal_thresh=1 ?
>>
>> I don't remember what our specific problem at $WORK was, perhaps it
>> was just interrupt threads not getting serviced fast enough, but we've
>> hard-coded this to 1 and removed the code that sets it in
>> sched_initticks().  The same effect should be had by setting the
>> sysctl after a box is up.
>
> FWIW, this does impact the performance of pbzip2 on an i7. Using a 1.1G file
>
> pbzip2 -v -c big > /dev/null
>
> with burnP6 running in the background,
>
> sysctl kern.sched.steal_thresh=1
> vs
> sysctl kern.sched.steal_thresh=3
>
>
>
>    N           Min           Max        Median           Avg        Stddev
> x  10     38.005022      38.42238     38.194648     38.165052    0.15546188
> +   9     38.695417     40.595544     39.392127     39.435384    0.59814114
> Difference at 95.0% confidence
>        1.27033 +/- 0.412636
>        3.32852% +/- 1.08119%
>        (Student's t, pooled s = 0.425627)
>
> a value of 1 is *slightly* faster.

Hi Mike,
was that just the same codebase with the switch SCHED_4BSD/SCHED_ULE?

Also, the results here should be in the 3% interval for the avg case,
which is not yet at the 'alarm level' but could still be an
indication.
I still suspect I/O plays a big role here, however, so the result could be
determined by other factors.

Could you retry the bench checking CPU usage and possible thread
migration around for both cases?

Thanks,
Attilio


--
Peace can only be achieved by understanding - A. Einstein



Attilio Rao

Dec 15, 2011, 11:26:27 AM12/15/11
to Jeremy Chadwick, O. Hartmann, Current FreeBSD, freebsd...@freebsd.org, freebsd-p...@freebsd.org
2011/12/13 Jeremy Chadwick <fre...@jdc.parodius.com>:

> On Mon, Dec 12, 2011 at 02:47:57PM +0100, O. Hartmann wrote:
>> > Not fully right, boinc defaults to run on idprio 31 so this isn't an
>> > issue. And yes, there are cases where SCHED_ULE shows much better
>> > performance then SCHED_4BSD.  [...]
>>
>> Do we have any proof at hand for such cases where SCHED_ULE performs
>> much better than SCHED_4BSD? Whenever the subject comes up, it is
>> mentioned, that SCHED_ULE has better performance on boxes with a ncpu >
>> 2. But in the end I see here contradictionary statements. People
>> complain about poor performance (especially in scientific environments),
>> and other give contra not being the case.
>>
>> Within our department, we developed a highly scalable code for planetary
>> science purposes on imagery. It utilizes present GPUs via OpenCL if
>> present. Otherwise it grabs as many cores as it can.
>> By the end of this year I'll get a new desktop box based on Intels new
>> Sandy Bridge-E architecture with plenty of memory. If the colleague who
>> developed the code is willing performing some benchmarks on the same
>> hardware platform, we'll benchmark bot FreeBSD 9.0/10.0 and the most
>> recent Suse. For FreeBSD I intent also to look for performance with both
>> different schedulers available.
>
> This is in no way shape or form the same kind of benchmark as what
> you're planning to do, but I thought I'd throw it out there for folks to
> take in as they see fit.
>
> I know folks were focused mainly on buildworld.
>
> I personally would find it interesting if someone with a higher-end
> system (e.g. 2 physical CPUs, with 6 or 8 cores per CPU) was to do the
> same test (changing -jX to -j{numofcores} of course).
>
> --
> | Jeremy Chadwick                                jdc at parodius.com |
> | Parodius Networking                       http://www.parodius.com/ |
> | UNIX Systems Administrator                   Mountain View, CA, US |
> | Making life hard for others since 1977.               PGP 4BD6C0CB |
>
>
> sched_ule
> ===========
> - time make -j2 buildworld
>  1689.831u 229.328s 18:46.20 170.4% 6566+2051k 432+4264io 4565pf+0w
> - time make -j2 buildkernel
>  640.542u 87.737s 9:01.38 134.5% 6490+1920k 134+5968io 0pf+0w
>
>
> sched_4bsd
> ============
> - time make -j2 buildworld
>  1662.793u 206.908s 17:12.02 181.1% 6578+2054k 23750+4271io 6451pf+0w
> - time make -j2 buildkernel
>  638.717u 76.146s 8:34.90 138.8% 6530+1927k 6415+5903io 0pf+0w
>
>
> software
> ==========
> * sched_ule test:  FreeBSD 8.2-STABLE, Thu Dec  1 04:37:29 PST 2011
> * sched_4bsd test: FreeBSD 8.2-STABLE, Mon Dec 12 22:42:54 PST 2011

Hi Jeremy,
thanks for the time you spent on this.

However, I wanted to ask/let you note 3 things:
1) Did you use 2 different code bases for the test? (one updated on
December 1 and another one on December 12)
2) Please note that you should have repeated this test several times
(basically until you get a standard deviation which is
acceptable according to ministat) and reported the ministat output
3) The difference is less than 2%, which I suspect is really
statistically insignificant/the same

I'm not really even surprised ULE is not faster than 4BSD in this case
because usually buildworld/buildkernel tests are driven for the vast
majority by I/O overhead rather than scheduler capacity. It would be
more interesting to analyze how buildworld does while another type of
workload is going on.

Mike Tancsa

Dec 15, 2011, 11:38:22 AM12/15/11
to Attilio Rao, Ivan Klymenko, m...@freebsd.org, Doug Barton, freebsd...@freebsd.org, Jilles Tjoelker, O. Hartmann, Current FreeBSD, freebsd-p...@freebsd.org
On 12/15/2011 11:26 AM, Attilio Rao wrote:
>
> Hi Mike,
> was that just the same codebase with the switch SCHED_4BSD/SCHED_ULE?

Hi Attilio,
It was the same codebase.


> Could you retry the bench checking CPU usage and possible thread
> migration around for both cases?

I can, but how do I do that ?

---Mike


Attilio Rao

Dec 15, 2011, 11:42:09 AM12/15/11
to Mike Tancsa, Ivan Klymenko, m...@freebsd.org, Doug Barton, freebsd...@freebsd.org, Jilles Tjoelker, O. Hartmann, Current FreeBSD, freebsd-p...@freebsd.org
2011/12/15 Mike Tancsa <mi...@sentex.net>:

> On 12/15/2011 11:26 AM, Attilio Rao wrote:
>>
>> Hi Mike,
>> was that just the same codebase with the switch SCHED_4BSD/SCHED_ULE?
>
> Hi Attilio,
>        It was the same codebase.
>
>
>> Could you retry the benchmark, checking CPU usage and possible thread
>> migration for both cases?
>
> I can, but how do I do that?

I'm now thinking of a better test case for this: can you try it on a
tmpfs volume?

Also, what filesystem were you using? How many CPUs were in place?
Did you reboot before changing the steal_thresh value?

Attilio


--
Peace can only be achieved by understanding - A. Einstein

Mike Tancsa

Dec 15, 2011, 11:52:11 AM12/15/11
to Attilio Rao, Ivan Klymenko, m...@freebsd.org, Doug Barton, freebsd...@freebsd.org, Jilles Tjoelker, O. Hartmann, Current FreeBSD, freebsd-p...@freebsd.org
On 12/15/2011 11:42 AM, Attilio Rao wrote:
>
> I'm now thinking of a better test case for this: can you try it on a
> tmpfs volume?

There is enough RAM in the box so that it should not touch the disk, and
I was sending the output to /dev/null, so it was not writing to the disk.

>
> Also, what filesystem were you using?

UFS

> How many CPUs were in place?

4

> Did you reboot before changing the steal_thresh value?

No.

---Mike

Randy Schultz

Dec 15, 2011, 11:38:20 AM12/15/11
to Pieter de Goeje, freebsd-p...@freebsd.org, Current FreeBSD, O. Hartmann
On Thu, 15 Dec 2011, Pieter de Goeje spaketh thusly:

-}Detailed results here:
-}http://openbenchmarking.org/result/1112113-AR-ORACLELIN37

LOL! Pretty much 2 entirely different systems, even running different screen
resolutions. Tnx for this link.


-}
-}As usual, the phoronix benchmarks are very misleading.

Also, they tested fbsd RC2. This same thing has come up repeatedly. Seems to
me "big waves" happened when fbsd 8.0 was coming out and phoronix tested RC1
or RC2. Unless my memory is in error (and it may well be), on the 8.0
"comparison" fiasco, it was pointed out that testing a fbsd RC release is like
racing but being prevented from going full throttle. There are debugging
hooks and various extra code bits that slow things down and are not taken out
until the stable release. They *can* be taken out by the end-SA, but phoronix
stated they used a stock kernel. That phoronix did this again makes me
wonder...

I have to agree with and cannot stress enough the importance of testing in the
environment it is to be run in, with the software that is to be run on it. I
used to be a massive linux fan, right up until the day I put freebsd up
against several *nix boxen (IIRC Redhat, Debian, SuSE and IRIX) in a particular
application I was re-working. I had to run the test several times, the
difference was so great. Fbsd didn't just beat the others, it rolled 'em,
smoked 'em and tapped them in the ashtray. But this was with _our_ hardware
configurations and _our_ software configurations and tweaks. Currently we
have a mixture of linux and fbsd in production and test. Some of the things
we do run better on linux, some run better on fbsd. And if they're close,
I'll pick fbsd mostly for personal reasons, e.g. it just makes more sense to
me, some things I like to do are more easily done in fbsd, ...

FWIW, YMMV, yadda yadda. ;>

--
Randy (sch...@earlham.edu) 765.983.1283 <*>

nosce te ipsum


Attilio Rao

Dec 15, 2011, 11:56:38 AM12/15/11
to Mike Tancsa, Ivan Klymenko, m...@freebsd.org, Doug Barton, freebsd...@freebsd.org, Jilles Tjoelker, O. Hartmann, Current FreeBSD, freebsd-p...@freebsd.org
2011/12/15 Mike Tancsa <mi...@sentex.net>:

> On 12/15/2011 11:42 AM, Attilio Rao wrote:
>>
>> I'm now thinking of a better test case for this: can you try it on a
>> tmpfs volume?
>
> There is enough RAM in the box so that it should not touch the disk, and
> I was sending the output to /dev/null, so it was not writing to the disk.
>
>>
>> Also, what filesystem were you using?
>
> UFS
>
>> How many CPUs were in place?
>
> 4
>
>> Did you reboot before changing the steal_thresh value?
>
> No.

So, as a very first thing, can you try the following:
- Same codebase, etc. etc.
- Run the test 4 times, discard the first and ministat the other 3
- Reboot
- Change the steal_thresh value
- Run the test 4 times, discard the first and ministat the other 3

Then report the discarded values and the ministat output, and we will
have more information, I guess
(also, I don't think devfs contention should play a role here, so
never mind it for now).

Thanks,
Attilio
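A minimal Python sketch of that procedure (not part of the original mail), for anyone who wants to script it: it times a command four times, keeps the first (cold-cache) run separately, and summarizes the other three with the same kind of columns ministat(1) prints. The function names and the placeholder command are hypothetical; the actual comparison between configurations is still meant to be done by ministat on files containing one value per line.

```python
import statistics
import subprocess
import sys
import time


def run_trials(cmd, runs=4):
    """Run cmd `runs` times; return (first_run_seconds, remaining_seconds)."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    # The first run warms caches, so keep it apart instead of averaging it in.
    return timings[0], timings[1:]


def summarize(samples):
    """Min/max/median/avg/stddev, similar to the columns ministat prints."""
    return {
        "n": len(samples),
        "min": min(samples),
        "max": max(samples),
        "median": statistics.median(samples),
        "avg": statistics.mean(samples),
        "stddev": statistics.stdev(samples),
    }


def write_ministat_file(path, samples):
    """Write one value per line, the input format ministat(1) reads."""
    with open(path, "w") as f:
        for s in samples:
            f.write(f"{s:.3f}\n")


if __name__ == "__main__":
    # Placeholder workload; substitute the buildworld command being measured.
    first, rest = run_trials([sys.executable, "-c", "pass"])
    print("discarded first run:", round(first, 3))
    print(summarize(rest))
```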



Adrian Chadd

Dec 15, 2011, 12:30:13 PM12/15/11
to Tony McC, michael...@phoronix.com, freebsd-p...@freebsd.org
On 15 December 2011 06:49, Tony McC <af...@btinternet.com> wrote:
> I suggest always ignoring benchmarks. They are like reading the
> astrology column in a tabloid newspaper.  Instead, try FreeBSD for your
> work.  Is it fast enough?  Surely that is all you need to know. FreeBSD
> is quite fast enough for my needs and I am simply more productive using
> it than when I use any other operating system.  That is partly to do
> with my familiarity with my setup, which I have customised the way I
> want.  That is something that no benchmark can allow for.

You can't ignore benchmarks because:

* people read them;
* media link to them;
* they have pretty pictures.

These are all very important things. If all we do is talk on a mailing
list and never write public articles of our own, if we never push out
our message or work with groups like Phoronix to dig into the WHY,
we're going to be stuck looking bad. It doesn't matter if we aren't
bad, we still look bad.

It's PR and marketing 101. :)

This discussion with Michael @ Phoronix is very helpful. Michael, are
you willing to help dig into why this is the case, and possibly write
a followup article or two about it?
I believe Stefan's response looking at what the benchmarks actually do
is worthwhile. I wonder if there's a way to get FreeBSD to behave the
same at least for comparison, so you can publish what the underlying
differences are due to and what relevance they have in the real world.

To everyone else (including Jeremy) - don't be afraid to publish and
be wrong. You invite discussion and further research that way. Just
don't act like a self-absorbed asshat.

Everyone benefits. :)


Adrian

O. Hartmann

Dec 15, 2011, 12:58:46 PM12/15/11
to Daniel Kalchev, Michael Larabel, FreeBSD Stable Mailing List, Current FreeBSD, Stefan Esser, Michael Ross, freebsd-p...@freebsd.org, Jeremy Chadwick
On 12/15/11 14:51, Daniel Kalchev wrote:

>
> On Dec 15, 2011, at 3:25 PM, Stefan Esser wrote:
>
>> On 15.12.2011 11:10, Michael Larabel wrote:
>>> No, the same hardware was used for each OS.
>>>
>>> In terms of the software, the stock software stack for each OS was used.
>>
>> Just curious: Why did you choose ZFS on FreeBSD, while UFS2 (with
>> journaling enabled) should be an obvious choice since it is more similar
>> in concept to ext4 and since that is what most FreeBSD users will use
>> with FreeBSD?
>
>
> Or perhaps, since it is "server" Linux distribution, use ZFS on Linux as well. With identical tuning on both Linux and FreeBSD. Having the same FS used by both OS will help make the comparison more sensible for FS I/O.
>
> Daniel

Since ZFS in Linux can only be achieved via FUSE (as far as I know), it
is legitimate to compare ZFS and ext4. It would be much more competitive
to compare Linux BTRFS and FreeBSD ZFS.

Each OS optimizes for different filesystems, and a user/manager can
assume that the vendor offers the best available performance with the
default FS of a standard stock installation.

Using ZFS on Linux would be a great disadvantage, and the benchmark would
turn out to be the same bullsh... as pitting only Linux's strengths
against only FreeBSD's weaknesses ...

Linux distributions offer setups for desktop and server. The FreeBSD
folks have the choice to do it themselves. And maybe I'm one of those
puritan people who appreciate this. "Out of the box" OS is Windooze, with
all its consequences.

Oliver

Post scriptum:
It seems to be hard to follow the benchmark environment on Phoronix
since the URL refers to a setup of different systems.


O. Hartmann

Dec 15, 2011, 1:00:43 PM12/15/11
to Daniel Kalchev, Adrian Chadd, FreeBSD Stable Mailing List, Current FreeBSD, Samuel J. Greear, freebsd-p...@freebsd.org, Jeremy Chadwick
On 12/15/11 14:58, Daniel Kalchev wrote:

>
> On Dec 15, 2011, at 3:48 PM, Jeremy Chadwick wrote:
>
> […]
>> That said: thrown out, data ignored, done.
>>
>> Now what? Where are we? We're right back where we were a day or two
>> ago; meaning no closer to solving the dilemma reported by users and
>> SCHED_ULE. Heck, we're not even sure if there is an issue, other than
>> some folks confirming that SCHED_4BSD performs better for them (that's
>> what started this whole thread), and there are at least a couple which
>> have stated this.
>
> But, are any of these benchmarks really engaging the 4BSD/ULE scheduler differences? Most such benchmarks are run on a system with no other load whatsoever and in no way represent real world experience.
>
> What is more, I believe in such benchmarks "the system feels sluggish" is not measured at all. Even if it is measured, if in such case the benchmark finishes "better" - that is, faster, or say, makes the system freeze for the user for the duration of the test -- it will be considered "win", because the benchmark suite ran faster on that particular system -- whereas a system which ran the benchmark fast, provided good interactive response etc would be considered "loser".

I guess you have some proof of that "feeling"?


Freddie Cash

Dec 15, 2011, 1:06:21 PM12/15/11
to O. Hartmann, Michael Larabel, FreeBSD Stable Mailing List, Current FreeBSD, Daniel Kalchev, Michael Ross, freebsd-p...@freebsd.org, Jeremy Chadwick
On Thu, Dec 15, 2011 at 9:58 AM, O. Hartmann
<ohar...@zedat.fu-berlin.de> wrote:
> On 12/15/11 14:51, Daniel Kalchev wrote:
>>
>> On Dec 15, 2011, at 3:25 PM, Stefan Esser wrote:
>>
>>> On 15.12.2011 11:10, Michael Larabel wrote:
>>>> No, the same hardware was used for each OS.
>>>>
>>>> In terms of the software, the stock software stack for each OS was used.
>>>
>>> Just curious: Why did you choose ZFS on FreeBSD, while UFS2 (with
>>> journaling enabled) should be an obvious choice since it is more similar
>>> in concept to ext4 and since that is what most FreeBSD users will use
>>> with FreeBSD?
>>
>>
>> Or perhaps, since it is "server" Linux distribution, use ZFS on Linux as well. With identical tuning on both Linux and FreeBSD. Having the same FS used by both OS will help make the comparison more sensible for FS I/O.
>>
>> Daniel
>
> Since ZFS in Linux can only be achieved via FUSE (as far as I know), it
> is legitimate to compare ZFS and ext4. It would be much more competitive
> to compare Linux BTRFS and FreeBSD ZFS.

There is a separate kernel module for ZFS that can be installed,
giving you proper kernel-level support for ZFS on Linux.

--
Freddie Cash
fjw...@gmail.com



O. Hartmann

Dec 15, 2011, 1:09:23 PM12/15/11
to Steven Hartland, teva...@googlemail.com, per...@pluto.rain.com, att...@freebsd.org, george+...@m5p.com, freebsd-current >> Current FreeBSD, freebsd...@freebsd.org, fre...@jdc.parodius.com
On 12/15/11 15:20, Steven Hartland wrote:
> With all the discussion I thought I'd give a buildworld
> benchmark a go here on a spare 24 core machine. ULE
> tested fine, but with 4BSD it won't even boot, panicking
> with the following:-
> http://screensnapr.com/v/hwysGV.png
>
> This is on a clean 8.2-RELEASE-p4
>
> Upgrading to RELENG_9 fixed this, but it's a bit concerning
> that just changing the scheduler would cause the machine
> to panic on boot.
>
> It's only a single run, so variance could be high, but here's
> the result of a buildworld on this machine running the
> two different schedulers:-
> 4BSD: 24m54.10s real 2h43m12.42s user 56m20.07s sys
> ULE: 23m54.68s real 2h34m59.04s user 50m59.91s sys
>
> What really sticks out is that this is over double that
> of an 8.2 buildworld on the same machine with the same
> kernel
> ULE: 11m12.76s real 1h27m59.39s user 28m59.57s sys
>
> This was run on a 9.0-PRERELEASE kernel due to 4BSD panicking
> on boot under 8.2.
>
> So for this use ULE vs 4BSD is neither here-nor-there
> but 9.0 buildworld is very slow (x2 slower) compared
> with 8.2, so that's a bigger question in my mind.
>
> Regards
> Steve
>


All of our 8.2-STABLE boxes with ncpu >= 4 compile the OS in half the
time that FreeBSD 9/10 needs. I guess this is due to the huge LLVM
contribution which is now part of the source tree. If you also build a
whole LLVM suite (and not just the pieces FreeBSD builds by default for
Clang), it takes another 10 to 20 minutes, depending on the architecture
of the underlying host.

Building the kernel or world, timing it, and then showing the inverse of
that number isn't a good idea, in my opinion.
Therefore I like "artificial" benchmarks: have a set of programs that
can be compiled, and time that, if compilation time is what matters.

Well, your one-shot test does show that there is indeed a marginal
advantage for SCHED_ULE if the number of cores is big enough (said in
this thread to be n > 2). But I'm a bit disappointed by the very small
advantage on that 24-core hog.

Oliver
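To put Steve's one-shot numbers in proportion, a small Python sketch (arithmetic only, not from the thread) converts the `NmN.NNs` real times he posted into seconds and compares them. It assumes the sh-style `time` output format shown in his mail; with one run per scheduler, the resulting percentages say nothing about run-to-run variance.

```python
import re


def to_seconds(text):
    """Convert a time(1)-style '[Nh]NmN.NNs' string into seconds."""
    match = re.fullmatch(r"(?:(\d+)h)?(\d+)m([\d.]+)s", text)
    if match is None:
        raise ValueError(f"unrecognized time: {text!r}")
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes) * 60 + float(seconds)


bsd_90 = to_seconds("24m54.10s")  # 4BSD, 9.0-PRERELEASE kernel
ule_90 = to_seconds("23m54.68s")  # ULE, 9.0-PRERELEASE kernel
ule_82 = to_seconds("11m12.76s")  # ULE, 8.2 buildworld, same kernel

print(f"4BSD took {(bsd_90 - ule_90) / ule_90:.1%} longer than ULE")  # 4.1%
print(f"9.0 vs 8.2 buildworld (ULE): {ule_90 / ule_82:.2f}x")         # 2.13x
print(to_seconds("2h43m12.42s"))  # user-time values may include hours
```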


Chris Rees

Dec 15, 2011, 1:46:22 PM12/15/11
to O. Hartmann, Michael Larabel, FreeBSD Stable Mailing List, Current FreeBSD, Daniel Kalchev, Michael Ross, freebsd-p...@freebsd.org, Jeremy Chadwick
On 15 December 2011 17:58, O. Hartmann <ohar...@zedat.fu-berlin.de> wrote:
> Since ZFS in Linux can only be achieved via FUSE (as far as I know), it
> is legitimate to compare ZFS and ext4. It would be much more competitive
> to compare Linux BTRFS and FreeBSD ZFS.
>


Er... does ext4 guarantee data integrity?

You're not comparing like with like; please do some research on the
point of ZFS before asserting that they're fair comparisons.

A fair(er) comparison could be ext4 with UFS+soft-updates.

Chris



Attilio Rao

Dec 15, 2011, 2:02:44 PM12/15/11
to Jeremy Chadwick, O. Hartmann, Current FreeBSD, freebsd...@freebsd.org, freebsd-p...@freebsd.org
2011/12/15 Jeremy Chadwick <fre...@jdc.parodius.com>:

> On Thu, Dec 15, 2011 at 05:26:27PM +0100, Attilio Rao wrote:
>> 2011/12/13 Jeremy Chadwick <fre...@jdc.parodius.com>:
>> > On Mon, Dec 12, 2011 at 02:47:57PM +0100, O. Hartmann wrote:
>> >> > Not fully right, boinc defaults to run on idprio 31 so this isn't an
>> >> > issue. And yes, there are cases where SCHED_ULE shows much better
>> >> > performance than SCHED_4BSD. [...]

>> >>
>> >> Do we have any proof at hand for such cases where SCHED_ULE performs
>> >> much better than SCHED_4BSD? Whenever the subject comes up, it is
>> >> mentioned, that SCHED_ULE has better performance on boxes with a ncpu >
>> >> 2. But in the end I see contradictory statements here. People
>> >> complain about poor performance (especially in scientific environments),
>> >> and others counter that this is not the case.
>> >>
>> >> Within our department, we developed a highly scalable code for planetary
>> >> science purposes on imagery. It utilizes present GPUs via OpenCL if
>> >> present. Otherwise it grabs as many cores as it can.
>> >> By the end of this year I'll get a new desktop box based on Intels new
>> >> Sandy Bridge-E architecture with plenty of memory. If the colleague who
>> >> developed the code is willing to perform some benchmarks on the same
>> >> hardware platform, we'll benchmark both FreeBSD 9.0/10.0 and the most
>> >> recent Suse. For FreeBSD I also intend to look at performance with both
>> >> different schedulers available.
>> >
>> > This is in no way shape or form the same kind of benchmark as what
>> > you're planning to do, but I thought I'd throw it out there for folks to
>> > take in as they see fit.
>> >
>> > I know folks were focused mainly on buildworld.
>> >
>> > I personally would find it interesting if someone with a higher-end
>> > system (e.g. 2 physical CPUs, with 6 or 8 cores per CPU) was to do the
>> > same test (changing -jX to -j{numofcores} of course).
>> >
>> > --
>> > | Jeremy Chadwick                                jdc at parodius.com |
>> > | Parodius Networking                       http://www.parodius.com/ |
>> > | UNIX Systems Administrator                  Mountain View, CA, US |
>> > | Making life hard for others since 1977.             PGP 4BD6C0CB |

>> >
>> >
>> > sched_ule
>> > ===========
>> > - time make -j2 buildworld
>> >   1689.831u 229.328s 18:46.20 170.4% 6566+2051k 432+4264io 4565pf+0w

>> > - time make -j2 buildkernel
>> >   640.542u 87.737s 9:01.38 134.5% 6490+1920k 134+5968io 0pf+0w

>> >
>> >
>> > sched_4bsd
>> > ============
>> > - time make -j2 buildworld
>> >   1662.793u 206.908s 17:12.02 181.1% 6578+2054k 23750+4271io 6451pf+0w

>> > - time make -j2 buildkernel
>> >   638.717u 76.146s 8:34.90 138.8% 6530+1927k 6415+5903io 0pf+0w
>> >
>> >
>> > software
>> > ==========
>> > * sched_ule test:  FreeBSD 8.2-STABLE, Thu Dec  1 04:37:29 PST 2011

>> > * sched_4bsd test: FreeBSD 8.2-STABLE, Mon Dec 12 22:42:54 PST 2011
>>
>> Hi Jeremy,
>> thanks for the time you spent on this.
>>
>> However, I wanted to ask about, or point out, 3 things:
>> 1) Did you use 2 different code bases for the test? (one updated on
>> December 1 and another one on December 12)
>
> No; src-all (/usr/src on this system) was not updated between December
> 1st and December 12th PST.  I do believe I updated it today (15th PST).
> I can/will obviously hold off so that we have a consistent code base for
> comparing numbers between schedulers during buildworld and/or
> buildkernel.

>
>> 2) Please note that you should have repeated this test several times
>> (basically until you get a standard deviation that is acceptable
>> with ministat) and reported the ministat output
>
> This is the first time I have heard of ministat(1).  I'm pretty sure I
> see what it's for and how it applies to this situation, but boy that man
> page could use some clarification (I have 3 people looking at this thing
> right now trying to figure out what means what in the graph :-) ).
> Anyway, graph or not, I see the point.
>
> Regarding multiple tests: yup, you're absolutely right, the only way to
> do it would be to run a sequence of tests repeatedly (probably 10 per
> scheduler).  Reboots and rm -fr /usr/obj/* would be required after each
> test too, to guarantee empty kernel caches (of all types) consistently
> every time.
>
> What I posted was supposed to give people just a "general idea" if there
> was any gigantic difference between the two, and there really isn't.
> But, as others have stated (and you below), buildworld may not be an
> effective way to "benchmark" what we're trying to test.
>
> Hence me wondering exactly what would make for a good test.  Example:
>
> 1. Run + background some program that "beats on things" (I really don't
> know what; creation/deletion of threads?  CPU benchmark?  bonnie++?),
> with output going to /dev/null.
> 2. Run + background "time make -j2 buildworld" with output going to /dev/null
> 3. Record/save output from "time".
> 4. rm -fr /usr/obj && shutdown -r now
> 5. Repeat all steps ~10 times
> 6. Adjust kernel configuration file to use other scheduler
> 7. Repeat steps 1-5.
>
> What I'm trying to figure out is what #1 and #2 should be in the above
> example.

>
>> 3) The difference is less than 2%, which I suspect is statistically
>> insignificant (i.e. effectively the same)
>
> Understood.

>
>> I'm not really even surprised ULE is not faster than 4BSD in this case,
>> because buildworld/buildkernel tests are usually dominated by I/O
>> overhead rather than scheduler capacity. It would be more interesting
>> to analyze how buildworld does while another type of workload is
>> running.
>
> Yup, agreed/understood, hence me trying to find out what would classify
> as a good stress test for all of this.
>
> I have a testbed system in my garage which I could set up to literally
> do all of this in a loop, meaning automate the entire above process and
> just let it go, writing stderr from time to a file (which wouldn't skew
> the results at all).
>
> Let me know what #1 and #2 above, re: "the workloads", should be and
> I'll be happy to set it up.

My idea, in order to gather meaningful data for both ULE and 4BSD,
would be to see how well they behave in the following situations:
- 2 concurrent interactive workloads
- 2 concurrent cpu-intensive workloads
- mixed

and having the number of threads for both varying as: N/2, N, N +
small_amount (1 or 2 or 3, etc), N*2 (where N is the number of
available CPUs) which automatically translates into:

- 2 concurrent interactive and intensive (A and B workloads):
* A N/2 threads, B N/2 threads
* A N threads, B N/2 threads
* A N + small_amount, B N/2 threads
* A N*2 threads, B N/2 threads
* A N threads, B N threads
* A N + small_amount, B N threads
* A N*2 threads, B N threads
* A N + small_amount, B N + small_amount threads
* A N*2 threads, B N + small_amount threads
* A N*2 threads, B N*2 threads

For the mixed case, instead, we should try all 16 combinations if
possible, and it is likely the most interesting case, to be honest.
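The test matrix above can be enumerated mechanically. A minimal Python sketch, not part of the original mail, assuming N is the number of CPUs and `small_amount` defaults to 2 (both placeholders): for two workloads of the same kind, unordered pairs of the four thread counts give the 10 cases listed above (in the same order, with A >= B), and for the mixed case, where A and B are different workloads, all ordered pairs give the 16 combinations. Labels are kept alongside the counts because on small machines, e.g. N = 2, N + 2 and N*2 coincide.

```python
from itertools import combinations_with_replacement, product


def thread_counts(n, small_amount=2):
    """Label -> thread count for one workload (N = number of CPUs)."""
    return {
        "N/2": n // 2,
        "N": n,
        "N+small": n + small_amount,
        "N*2": n * 2,
    }


def same_kind_cases(n, small_amount=2):
    """Two workloads of the same kind: (A, B) and (B, A) are equivalent.

    Returns the 10 cases with A >= B, in the order of the list above.
    """
    counts = thread_counts(n, small_amount)
    return [((a, counts[a]), (b, counts[b]))
            for b, a in combinations_with_replacement(counts, 2)]


def mixed_cases(n, small_amount=2):
    """Interactive workload A vs CPU-bound workload B: all 16 ordered pairs."""
    counts = thread_counts(n, small_amount)
    return [((a, counts[a]), (b, counts[b]))
            for a, b in product(counts, repeat=2)]


if __name__ == "__main__":
    for (a, na), (b, nb) in same_kind_cases(4):
        print(f"A {a} ({na} threads), B {b} ({nb} threads)")
    print(len(same_kind_cases(4)), len(mixed_cases(4)))  # 10 16
```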

About the workload, we could use:
interactives: buildworld and bonnie++ (I'm not totally sure if
bonnie++ lets you decide how many threads to run, but I'm sure we can
replace it with something that really does that)
cpu-intensive: dnetc and SOMETHINGELSE (please propose something that
can be set up very easily!)
mixed case: buildworld and dnetc

About the environment I'd suggest the following things:
- Try to boot with a maximum of 16 CPUs. I'm sure past that point TLB
shootdown overhead is going to be too overwhelming, make doesn't
really scale well, and also there could be too much contention on
vm_page_lock_queue for interactive threads.
- Try to reduce the I/O effect by using tmpfs as storage for input and
output data when running the benchmark
- Use 10.0 with both kerneland and userland totally debug-free (please
remember to set MALLOC_PRODUCTION for jemalloc) and always at the same
svn revision, with the only change being the scheduler switch and the
number of threads changing during the runs

About the test itself I'd suggest the following things:
- After every test combination, please reboot the machine (e.g., after
you have tested the A N/2 threads and B N/2 threads case on
sched_4bsd, reboot the machine before doing A N threads and B N/2
threads)
- For every test combination I suggest running the workloads 4 times,
discarding the first one (but keep the value!) and running ministat on
the other three. Showing the "uncached" case against the average cached
one will give much more indication than expected.
- Expect ministat to report a difference at 95% confidence (or beyond)
for a result to be valuable
- For any performance difference we find, we should probably start
worrying if it is 3% or bigger, and be very concerned at 5% and above
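A rough Python sketch of how a 95% confidence check plus the 3% / 5% thresholds could be applied (not part of the original mail, and not ministat itself). It assumes ministat's default test, a pooled-variance Student's t at 95% confidence with the difference expressed relative to the first data set's mean; the t table holds standard two-sided 95% critical values for up to 20 degrees of freedom (enough for 10 runs per scheduler). The function names are hypothetical.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
T_95 = {
    1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
    6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228,
    11: 2.201, 12: 2.179, 13: 2.160, 14: 2.145, 15: 2.131,
    16: 2.120, 17: 2.110, 18: 2.101, 19: 2.093, 20: 2.086,
}


def compare(baseline, candidate):
    """Pooled-variance Student's t test at 95% confidence.

    Returns (difference in percent of the baseline mean, significant?).
    """
    n1, n2 = len(baseline), len(candidate)
    if n1 < 2 or n2 < 2:
        raise ValueError("need at least two samples per data set")
    df = n1 + n2 - 2
    if df not in T_95:
        raise ValueError("t table only covers up to 20 degrees of freedom")
    m1 = statistics.mean(baseline)
    m2 = statistics.mean(candidate)
    pooled_sd = math.sqrt(((n1 - 1) * statistics.variance(baseline)
                           + (n2 - 1) * statistics.variance(candidate)) / df)
    margin = T_95[df] * pooled_sd * math.sqrt(1 / n1 + 1 / n2)
    diff = m2 - m1
    return diff / m1 * 100.0, abs(diff) > margin


def verdict(baseline, candidate):
    """Apply the 3% (worry) and 5% (very concerned) thresholds from the mail."""
    pct, significant = compare(baseline, candidate)
    if not significant:
        return "no difference proven at 95% confidence"
    if abs(pct) >= 5.0:
        return f"{pct:+.1f}%: very concerning"
    if abs(pct) >= 3.0:
        return f"{pct:+.1f}%: worth worrying about"
    return f"{pct:+.1f}%: real but below the 3% threshold"
```

For example, with made-up timings, `verdict([100.0, 101.0, 99.0], [106.0, 107.0, 105.0])` flags a +6.0% difference as very concerning, while `verdict([100.0, 101.0, 99.0], [100.5, 99.5, 101.0])` finds no difference at 95% confidence.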

I think we already have some data showing ULE being broken in some cases
(like George's and Steven's case) but we really need to characterize
it more, I think.

Now, I understand this seems like gigantic work, but I think there are
many people who are interested in working on this, and we may scatter
these tests around to different testers to find meaningful data.

If it was me, I would start with comparisons involving all the N and N
+ small_amount cases which should be the most interesting.

Do you have questions?

Ivan Klymenko

Dec 15, 2011, 2:46:27 PM12/15/11
to Attilio Rao, O. Hartmann, Current FreeBSD, freebsd...@freebsd.org, freebsd-p...@freebsd.org, Jeremy Chadwick
On Thu, 15 Dec 2011 20:02:44 +0100
Attilio Rao <att...@freebsd.org> wrote:

Perhaps it makes sense to co-write a script to automate these actions?
And place it in /usr/src/tools/sched/...



Mike Tancsa

Dec 15, 2011, 3:58:04 PM12/15/11
to Attilio Rao, Ivan Klymenko, m...@freebsd.org, Doug Barton, freebsd...@freebsd.org, Jilles Tjoelker, O. Hartmann, Current FreeBSD, freebsd-p...@freebsd.org
On 12/15/2011 11:56 AM, Attilio Rao wrote:
> So, as a very first thing, can you try the following:
> - Same codebase, etc. etc.
> - Run the test 4 times, discard the first and ministat the other 3
> - Reboot
> - Change the steal_thresh value
> - Run the test 4 times, discard the first and ministat the other 3
>
> Then report the discarded values and the ministat output, and we will
> have more information, I guess
> (also, I don't think devfs contention should play a role here, so
> never mind it for now).


Results and data at

http://www.tancsa.com/ule-bsd.html

---Mike



Kevin Oberman

Dec 15, 2011, 4:25:06 PM12/15/11
to Chris Rees, Michael Larabel, FreeBSD Stable Mailing List, Current FreeBSD, Daniel Kalchev, Michael Ross, freebsd-p...@freebsd.org, O. Hartmann, Jeremy Chadwick
On Thu, Dec 15, 2011 at 10:46 AM, Chris Rees <cr...@freebsd.org> wrote:
> On 15 December 2011 17:58, O. Hartmann <ohar...@zedat.fu-berlin.de> wrote:
>> Since ZFS in Linux can only be achieved via FUSE (as far as I know), it
>> is legitimate to compare ZFS and ext4. It would be much more competitive
>> to compare Linux BTRFS and FreeBSD ZFS.
>>
>
>
> Er... does ext4 guarantee data integrity?
>
> You're not comparing like with like; please do some research on the
> point of ZFS before asserting that they're fair comparisons.
>
> A fair(er) comparison could be ext4 with UFS+soft-updates.

Wouldn't UFS+SUJ be the closest match?
--
R. Kevin Oberman, Network Engineer
E-mail: kob...@gmail.com


_______________________________________________
freebsd-p...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-performance

To unsubscribe, send any mail to "freebsd-perform...@freebsd.org"

Attilio Rao

Dec 15, 2011, 4:30:08 PM12/15/11
to Mike Tancsa, Ivan Klymenko, m...@freebsd.org, Doug Barton, freebsd...@freebsd.org, Jilles Tjoelker, O. Hartmann, Current FreeBSD, freebsd-p...@freebsd.org
2011/12/15 Mike Tancsa <mi...@sentex.net>:

> On 12/15/2011 11:56 AM, Attilio Rao wrote:
>> So, as a very first thing, can you try the following:
>> - Same codebase, etc. etc.
>> - Run the test 4 times, discard the first and ministat the other 3
>> - Reboot
>> - Change the steal_thresh value
>> - Run the test 4 times, discard the first and ministat the other 3
>>
>> Then report the discarded values and the ministat output, and we will
>> have more information, I guess
>> (also, I don't think devfs contention should play a role here, so
>> never mind it for now).
>
>
> Results and data at
>
> http://www.tancsa.com/ule-bsd.html

I'm not totally sure: what does burnP6 do? Is it a CPU-bound workload?
Also, how many threads are spawned in your case for parallel bzip2?

Also, it would be very good if you could arrange these tests against
newer -CURRENT (with userland and kerneland debugging off).

Thanks a lot for your hard work,
Attilio



Chris Rees

Dec 15, 2011, 4:50:16 PM12/15/11
to Kevin Oberman, Michael Larabel, FreeBSD Stable Mailing List, Current FreeBSD, Daniel Kalchev, Michael Ross, freebsd-p...@freebsd.org, O. Hartmann, Jeremy Chadwick
On 15 Dec 2011 21:25, "Kevin Oberman" <kob...@gmail.com> wrote:
>
> On Thu, Dec 15, 2011 at 10:46 AM, Chris Rees <cr...@freebsd.org> wrote:
> > On 15 December 2011 17:58, O. Hartmann <ohar...@zedat.fu-berlin.de> wrote:
> >> Since ZFS in Linux can only be achieved via FUSE (as far as I know), it
> >> is legitimate to compare ZFS and ext4. It would be much more competitive
> >> to compare Linux BTRFS and FreeBSD ZFS.
> >>
> >
> >
> > Er... does ext4 guarantee data integrity?
> >
> > You're not comparing like with like; please do some research on the
> > point of ZFS before asserting that they're fair comparisons.
> >
> > A fair(er) comparison could be ext4 with UFS+soft-updates.
>
> Wouldn't UFS+SUJ be the closest match?

Yup. Thanks.

Chris

Arnaud Lacombe

Dec 16, 2011, 12:41:41 AM12/16/11
to O. Hartmann, freebsd-p...@freebsd.org, Current FreeBSD, FreeBSD Stable Mailing List, Jeremy Chadwick
Hi,

On Thu, Dec 15, 2011 at 2:32 AM, O. Hartmann
<ohar...@zedat.fu-berlin.de> wrote:
> Just saw this short benchmark on Phoronix dot com today:
>
> http://www.phoronix.com/scan.php?page=news_item&px=MTAyNzA
>

it might be worth highlighting that despite Oracle Linux 6.1 Server
using a kernel + compiler almost 2 years old, it still manages to
out-perform the bleeding edge FreeBSD :-)

Now, from what I've read so far in this thread, it seems that a lot of
people are still in denial...

my 0.2c,
- Arnaud

> It may be worth discussing the sad performance of FBSD in some parts of
> the benchmark. A difference of a factor of 10 or 100 is simply far beyond
> disappointing; it is more than unacceptable, and by just reading those
> benchmarks, I'd like to drop thinking of using FreeBSD even as a backend
> server in scientific and business environments. In detail, some of the
> SciMark benches look disappointing. The fact that FreeBSD performs
> better in C-Ray can't save the overall picture.
>
> From the compiler alone, I'd say there couldn't be a drop of more than
> 10-15% in performance, but not 10 or 100 times.
>
> I'm just thinking about the discussion of SCHED_ULE and all the sore
> spots we discussed when I stumbled over the test.
>
> Regards,
> Oliver
>

Alex Kuster

Dec 16, 2011, 1:06:59 AM12/16/11
to freebsd...@freebsd.org
On 12/16/2011 02:41, Arnaud Lacombe wrote:
> Hi,
>
> On Thu, Dec 15, 2011 at 2:32 AM, O. Hartmann
> <ohar...@zedat.fu-berlin.de> wrote:
>> Just saw this short benchmark on Phoronix dot com today:
>>
>> http://www.phoronix.com/scan.php?page=news_item&px=MTAyNzA
>>
> it might be worth highlighting that despite Oracle Linux 6.1 Server
> using a kernel + compiler almost 2 years old, it still manages to
> outperform the bleeding-edge FreeBSD :-)
>
> Now, from what I've read so far in this thread, it seems that a lot of
> people are still in denial...
>
> my 0.2c,
> - Arnaud

This smells like flamebait...
Anyone with a little love for, or knowledge of, benchmarking
would realize that the benchmark is all wrong, and not only that:
they say that the benchmark tests defaults, and ZFS, AFAIK, is far from
being a default.

Joe Holden

Dec 16, 2011, 1:44:47 AM12/16/11
to Arnaud Lacombe, FreeBSD Stable Mailing List, freebsd-p...@freebsd.org, Current FreeBSD, O. Hartmann, Jeremy Chadwick
Arnaud Lacombe wrote:
> Hi,
>
> On Thu, Dec 15, 2011 at 2:32 AM, O. Hartmann
> <ohar...@zedat.fu-berlin.de> wrote:
>> Just saw this short benchmark on Phoronix dot com today:
>>
>> http://www.phoronix.com/scan.php?page=news_item&px=MTAyNzA
>>
> it might be worth highlighting that despite Oracle Linux 6.1 Server
> using a kernel + compiler almost 2 years old, it still manages to
> outperform the bleeding-edge FreeBSD :-)
>
serenity# gcc --version
gcc (GCC) 4.2.1 20070831 patched [FreeBSD]

serenity# uname -r
9.0-RC3

> Now, from what I've read so far in this thread, it seems that a lot of
> people are still in denial...
>
> my 0.2c,
> - Arnaud
>
>> It may be worth discussing FBSD's sad performance in some parts of
>> the benchmark. A difference of a factor of 10 or 100 is far beyond
>> disappointing; it is unacceptable, and just from reading those
>> benchmarks, I'd be inclined to drop any thought of using FreeBSD even as a backend
>> server in scientific and business environments. In particular, some of the
>> SciMark benchmarks look disappointing. FreeBSD performing better in C-Ray
>> can't rescue the overall picture.
>>
>> I'd expect the compiler alone to cost no more than 10-15% in
>> performance, not a factor of 10 or 100.
>>
>> I was just thinking about the SCHED_ULE discussion and all the sore
>> spots we discussed when I stumbled over the test.
>>
>> Regards,
>> Oliver
>>
> _______________________________________________
> freebsd...@freebsd.org mailing list

> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stabl...@freebsd.org"

O. Hartmann

Dec 16, 2011, 2:06:09 AM12/16/11
to Joe Holden, freebsd-p...@freebsd.org, Current FreeBSD, FreeBSD Stable Mailing List, Jeremy Chadwick, Arnaud Lacombe
On 12/16/11 07:44, Joe Holden wrote:
> Arnaud Lacombe wrote:
>> Hi,
>>
>> On Thu, Dec 15, 2011 at 2:32 AM, O. Hartmann
>> <ohar...@zedat.fu-berlin.de> wrote:
>>> Just saw this short benchmark on Phoronix dot com today:
>>>
>>> http://www.phoronix.com/scan.php?page=news_item&px=MTAyNzA
>>>
>> it might be worth highlighting that despite Oracle Linux 6.1 Server
>> using a kernel + compiler almost 2 years old, it still manages to
>> outperform the bleeding-edge FreeBSD :-)
>>
> serenity# gcc --version
> gcc (GCC) 4.2.1 20070831 patched [FreeBSD]
>
> serenity# uname -r
> 9.0-RC3
>

For the underlying OS, as far as I know, the compiler doesn't have as much
impact as on userland software, since autovectorization and other neat
things are not used during the system build.

In my experience, using gcc 4.2 versus 4.4/4.5 makes no more than a 3%
difference when SSE isn't explicitly enforced.

More interesting is the performance gain due to the architecture. I
think it would be very easy for M. Larabel to repeat this benchmark with
a "bleeding edge" Ubuntu or Suse as well. And since FreeBSD 9.0 can be
compiled with CLANG, it should be possible to compare both also with
"bleeding edge" compilers, say FreeBSD 9/CLANG, Ubuntu 12/gcc 4.6.2.


Stefan Esser

Dec 16, 2011, 3:33:18 AM12/16/11
to freebsd...@freebsd.org
On 16.12.2011 07:06, Alex Kuster wrote:
> On 12/16/2011 02:41, Arnaud Lacombe wrote:
>> Hi,
>>
>> On Thu, Dec 15, 2011 at 2:32 AM, O. Hartmann
>> <ohar...@zedat.fu-berlin.de> wrote:
>>> Just saw this short benchmark on Phoronix dot com today:
>>>
>>> http://www.phoronix.com/scan.php?page=news_item&px=MTAyNzA
>>>
>> it might be worth highlighting that despite Oracle Linux 6.1 Server
>> using a kernel + compiler almost 2 years old, it still manages to
>> outperform the bleeding-edge FreeBSD :-)

No, there was no measurement of Oracle Linux 6.1 compared to a normal
FreeBSD installation, as has already been pointed out by quite a number of people.

>> Now, from what I've read so far in this thread, it seems that a lot of
>> people are still in denial...
>>
>> my 0.2c,
>> - Arnaud
>
> This smells like flamebait...
> Anyone with a little love for, or knowledge of, benchmarking
> would realize that the benchmark is all wrong, and not only that:
> they say that the benchmark tests defaults, and ZFS, AFAIK, is far from
> being a default.

Yes, and a default installation of FreeBSD (with UFS2 and SU or SU+J)
would have allowed running the *exact same* binaries used in the Linux
test by just recursively copying the Linux root to /compat/linux (and
loading linux.ko, of course). There is some emulation overhead (more
paths are searched, for example), but FreeBSD compared well under
realistic loads in prior tests.

Another problem is that a number of the tests obviously measure the amount
of dirty buffers the kernel allows before a writing program is
throttled back (to prevent losing valuable buffer cache contents). That
leads to very misleading results, since they do not measure the
steady-state load situation common on a server.
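A toy model makes the point about short write tests concrete. The sketch below is an illustrative assumption, not how either kernel's buffer cache actually behaves: the first `dirty_limit` bytes are absorbed at memory speed and everything after that is throttled to disk speed. The function name, rates and limits are made up for illustration.

```python
MB = 1024 * 1024


def apparent_throughput(total_bytes, dirty_limit, mem_rate, disk_rate):
    """Toy model (hypothetical): bytes up to dirty_limit are absorbed by the
    buffer cache at mem_rate; the remainder is throttled to disk_rate.
    Rates are in bytes per second. Returns the throughput a naive benchmark
    that just divides bytes written by elapsed time would report."""
    buffered = min(total_bytes, dirty_limit)
    throttled = total_bytes - buffered
    elapsed = buffered / mem_rate + throttled / disk_rate
    return total_bytes / elapsed


# Same made-up "disk" (100 MB/s) and "memory" (2000 MB/s), two dirty limits.
for size_mb in (64, 512, 8192):
    small = apparent_throughput(size_mb * MB, 64 * MB, 2000 * MB, 100 * MB)
    large = apparent_throughput(size_mb * MB, 2048 * MB, 2000 * MB, 100 * MB)
    print(f"{size_mb:5d} MB written: {small / MB:7.1f} MB/s (64 MB limit) "
          f"vs {large / MB:7.1f} MB/s (2 GB limit)")
```

With these made-up numbers, a 512 MB write looks more than 15 times faster with the larger dirty limit, while an 8 GB write differs by only about 30%: a test that finishes before reaching steady state mostly reports the dirty-buffer allowance, not the storage speed.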


We have gone through this topic a number of times (as a search for
Phoronix in the mail archives should reveal).

There may be performance advantages for either OS compared to the other,
but most of the Phoronix tests are totally unsuitable to find them, even
when performed under fair conditions (e.g. same compiler version,
comparable file system).

STefan

Daniel Nebdal

Dec 16, 2011, 5:30:49 AM12/16/11
to Mike Tancsa, freebsd-p...@freebsd.org, Current FreeBSD
On Thu, Dec 15, 2011 at 9:58 PM, Mike Tancsa <mi...@sentex.net> wrote:
> On 12/15/2011 11:56 AM, Attilio Rao wrote:
>> So, as very first thing, can you try the following:
>> - Same codebase, etc. etc.
>> - Make the test 4 times, discard the first and ministat for the other 3
>> - Reboot
>> - Change the steal_thresh value
>> - Make the test 4 times, discard the first and ministat for the other 3
>>
>> Then report the discarded values and the ministat'ed ones and we will have
>> more information, I guess
>> (also, I don't think devfs contention should play a role here, so
>> never mind about it for now).
>
>
> Results and data at
>
> http://www.tancsa.com/ule-bsd.html
>
>        ---Mike
>

I took the liberty of re-plotting this as one boxplot per test-type,
in the hope of getting a better overview. R script included. Beware
the y-ranges. (To re-plot with a specific y range, add e.g.
"ylim=c(0,35)" to the boxplot() calls.)

http://nebdal.net/sched/plot.html
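Attilio's procedure quoted above (run the test four times, discard the first run, summarize the remaining three) can be sketched in Python for anyone who wants the numbers without ministat(1). This is a minimal sketch that mimics ministat's summary columns (N, Min, Max, Median, Avg, Stddev); it does not perform ministat's Student's t comparison, the helper names are hypothetical, and the timings in the usage lines are invented.

```python
import statistics


def summarize(runs, discard=1):
    """Drop the first `discard` runs (cache warm-up), then summarize the rest
    with the same columns ministat(1) prints."""
    kept = list(runs)[discard:]
    if len(kept) < 2:
        raise ValueError("need at least two runs after discarding warm-up")
    return {
        "n": len(kept),
        "min": min(kept),
        "max": max(kept),
        "median": statistics.median(kept),
        "avg": statistics.mean(kept),
        "stddev": statistics.stdev(kept),  # sample standard deviation
    }


def percent_difference(baseline, candidate):
    """Relative change of candidate's average versus baseline's, in percent."""
    return 100.0 * (candidate["avg"] - baseline["avg"]) / baseline["avg"]


# Invented wall-clock times in seconds, four runs per scheduler.
ule = summarize([30.0, 20.0, 22.0, 24.0])
bsd = summarize([28.0, 18.0, 19.0, 20.0])
print(ule)
print(bsd)
print(f"4BSD vs ULE: {percent_difference(ule, bsd):+.1f}%")
```

With only three kept runs per side, the standard deviation matters as much as the averages; a difference smaller than the spread says little, which is the point of using ministat (or boxplots like the ones above) rather than single numbers.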

--
Daniel Nebdal
Dep. of genetics, Oslo University Hospital

Attilio Rao

Dec 16, 2011, 5:54:55 AM12/16/11
to Arnaud Lacombe, FreeBSD Stable Mailing List, freebsd-p...@freebsd.org, Current FreeBSD, O. Hartmann, Jeremy Chadwick
2011/12/16 Arnaud Lacombe <laco...@gmail.com>:

> Hi,
>
> On Thu, Dec 15, 2011 at 2:32 AM, O. Hartmann
> <ohar...@zedat.fu-berlin.de> wrote:
>> Just saw this short benchmark on Phoronix dot com today:
>>
>> http://www.phoronix.com/scan.php?page=news_item&px=MTAyNzA
>>
> it might be worth highlighting that despite Oracle Linux 6.1 Server
> using a kernel + compiler almost 2 years old, it still manages to
> outperform the bleeding-edge FreeBSD :-)
>
> Now, from what I've read so far in this thread, it seems that a lot of
> people are still in denial...
>
> my 0.2c,
>  - Arnaud

Coming from someone who really thinks that passing __FILE__ and __LINE__ to
kernel functions imposes a measurable performance penalty, that is
really hilarious :)

It is crystal clear you really don't understand how to make reliable
benchmarks (and likely you don't really have a grasp of today's
machine contention points), so why do you keep talking about it? It would
be more valuable for you and whatever project you follow if you spent
your time coding and doing real benchmarking.

Attilio


--
Peace can only be achieved by understanding - A. Einstein

_______________________________________________
freebsd-p...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-performance

To unsubscribe, send any mail to "freebsd-perform...@freebsd.org"

Johan Hendriks

Dec 16, 2011, 7:12:44 AM12/16/11
to Arnaud Lacombe, freebsd...@freebsd.org
Arnaud Lacombe wrote:

Well, it is just the way it is.
I must say that every time FreeBSD comes out badly, there are always
comments on how the benchmark was done, but NOT when FreeBSD
comes out better.
I remember the MySQL and ULE benchmarks.
FreeBSD was quicker than Linux...
Nobody on the FreeBSD side complained that we did not use the same gcc as
Linux, or whatever else; maybe the benchmarks were more equal, but did
we care?
Does FreeBSD stop doing the job for you depending on whether it wins a benchmark or not?
Do you want to run Linux because it comes out better in benchmarks? I
certainly do not.
And to be honest, I did try Linux because of the bad Samba performance
of FreeBSD, but I'll take the lower performance over the whole Linux thing.
Linux is just not my cup of tea. Why? Feeling, community? I do not know.

See it from the bright side, there is much more room for improvements. :D

I think that FreeBSD should not worry that much about benchmarks.
Sure, it is strange that FreeBSD shows such a great gap, but we all know
that FreeBSD needs some tuning.
It is also known that FreeBSD is quite conservative with some default
settings.
Every now and then someone complains about this.
MAXPHYS is one such value that comes to mind.
What most people seem to do after installing FreeBSD is set some
network tunings in /etc/sysctl.conf, and other stuff.

Maybe it is time to review the default settings and make them more
suitable for today's machines.

The argument is mostly that FreeBSD also needs to run on older hardware,
but if you use amd64, you already have some 'newer' hardware.

just me ...

regards
Johan Hendriks

Stefan Esser

Dec 16, 2011, 8:08:01 AM12/16/11
to O. Hartmann, Joe Holden, FreeBSD Stable Mailing List, Current FreeBSD, Arnaud Lacombe, freebsd-p...@freebsd.org, Jeremy Chadwick
On 16.12.2011 08:06, O. Hartmann wrote:
> For the underlying OS, as far as I know, the compiler doesn't have as much
> impact as on userland software, since autovectorization and other neat
> things are not used during the system build.
>
> In my experience, using gcc 4.2 versus 4.4/4.5 makes no more than a 3%
> difference when SSE isn't explicitly enforced.

Well, but the compute-intensive tests showed performance variance of a
few percent only, IIRC. The big differences were in the parts that
heavily depend on file system and buffer cache concepts (i.e. the low
limit on dirty buffers in FreeBSD, which is very beneficial in real
world situations; do you remember the first few releases of SunOS-4,
which heavily suffered in interactive performance due to a naive unified
buffer cache VM system that did not limit the amount of dirty buffers?
It caused interactive shells to be swapped out within seconds on systems
with background jobs writing to disk).

> More interesting is the performance gain due to the architecture. I
> think it would be very easy for M. Larabel to repeat this benchmark with
> a "bleeding edge" Ubuntu or Suse as well. And since FreeBSD 9.0 can be
> compiled with CLANG, it should be possible to compare both also with
> "bleeding edge" compilers, say FreeBSD 9/CLANG, Ubuntu 12/gcc 4.6.2.

Clang may be considered "bleeding edge", but in quite a different way
than gcc-4.6.2. While the latter can look back on 2 decades of
development, clang is still in a state where feature completeness (and
bug-for-bug compatibility with GCC ;-) is much more important than
performance. There is much promise of powerful optimizations becoming
available in clang once it is mature, but for now expect GCC 4.6.2 to
deliver 5% to 10% higher performance than clang.

But as stated before: to exclude compiler dependencies, just run the
Linux binaries on FreeBSD. There is a slight emulation overhead and Glibc
is not particularly optimized for FreeBSD, but this will still provide
more useful results.

And the tests should be selected to represent reasonable real-world
scenarios. Server programs tested on otherwise idle systems and running
for just a few seconds (not reaching equilibrium during the majority of
the test period) are not representative at all (again: if your goal is
to compare server performance).
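Stefan's point about tests that never reach equilibrium can be checked mechanically. A minimal sketch, assuming per-interval samples (e.g. requests per second) are recorded during the run: it keeps only the tail where the rolling mean stays close to the final level. The function name, the heuristic, and its `window`/`tol` parameters are illustrative choices, not a standard method.

```python
def drop_warmup(samples, window=3, tol=0.05):
    """Return samples from the first point after which every `window`-sized
    rolling mean stays within `tol` (relative) of the final window's mean,
    i.e. once the run appears to have settled. A simple heuristic, not a
    rigorous stationarity test."""
    if len(samples) < window:
        raise ValueError("not enough samples for one window")
    means = [sum(samples[i:i + window]) / window
             for i in range(len(samples) - window + 1)]
    target = means[-1]
    start = len(means) - 1
    # Walk backwards while the rolling mean is still close to the final level.
    while start > 0 and abs(means[start - 1] - target) <= tol * abs(target):
        start -= 1
    return samples[start:]


# Invented requests/second, sampled once per second during a short run.
rps = [1200, 2600, 3900, 4300, 4400, 4380, 4420, 4410]
steady = drop_warmup(rps, window=3, tol=0.02)
print(f"{len(steady)} of {len(rps)} samples look steady; "
      f"mean {sum(steady) / len(steady):.0f} req/s")
```

If only a small fraction of the samples survive, the run was probably too short to say much about server performance.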

Regards, STefan


Arnaud Lacombe

Dec 16, 2011, 11:30:38 AM12/16/11
to Attilio Rao, FreeBSD Stable Mailing List, freebsd-p...@freebsd.org, Current FreeBSD, O. Hartmann, Jeremy Chadwick
Hi,

[resend on the ml, my bad]

On Fri, Dec 16, 2011 at 5:54 AM, Attilio Rao <att...@freebsd.org> wrote:
> 2011/12/16 Arnaud Lacombe <laco...@gmail.com>:
>> Hi,
>>
>> On Thu, Dec 15, 2011 at 2:32 AM, O. Hartmann
>> <ohar...@zedat.fu-berlin.de> wrote:
>>> Just saw this short benchmark on Phoronix dot com today:
>>>
>>> http://www.phoronix.com/scan.php?page=news_item&px=MTAyNzA
>>>
>> it might be worth highlighting that despite Oracle Linux 6.1 Server
>> using a kernel + compiler almost 2 years old, it still manages to
>> outperform the bleeding-edge FreeBSD :-)
>>
>> Now, from what I've read so far in this thread, it seems that a lot of
>> people are still in denial...
>>
>> my 0.2c,
>>  - Arnaud
>
> Coming from someone who really thinks that passing __FILE__ and __LINE__ to
> kernel functions imposes a measurable performance penalty, that is
> really hilarious :)
>

You are right, the rest of the kernel's subsystems are so sluggish,
fragile and half-baked that this would barely improve anything...

That will be my last word in this thread.

- Arnaud


Adrian Chadd

Dec 16, 2011, 11:43:27 AM12/16/11
to Stefan Esser, Joe Holden, FreeBSD Stable Mailing List, Current FreeBSD, Arnaud Lacombe, freebsd-p...@freebsd.org, O. Hartmann, Jeremy Chadwick
Can someone please write up a nice, concise blog post somewhere
outlining all of this?

Extra bonus points if it's a blog that is picked up by
blogs.freebsdish.org and/or some of the other BSD sites.

Guys/girls/fuzzy things - this is 2011; people look at shiny blog
sites with graphs rather than mailing lists. Sorry, we lost that
battle. :)

Adrian


Bruce Cran

Dec 17, 2011, 9:37:52 PM12/17/11
to Andrey Chernov, Ivan Klymenko, Doug Barton, O. Hartmann, Current FreeBSD, freebsd...@freebsd.org, freebsd-p...@freebsd.org
On 13/12/2011 09:00, Andrey Chernov wrote:
> I observe ULE interactivity slowness even on a single-core machine
> (Pentium 4) in very visible places, like 'ps ax' output stalling
> midway for ~1 second. When I switch back to SCHED_4BSD, all slowness is
> gone.

I'm also seeing problems with ULE on a dual-socket quad-core Xeon
machine with 16 logical CPUs. If I run "tar xf somefile.tar" and "make
-j16 buildworld" then logging into another console can take several
seconds. Sometimes even the "Password:" prompt can take a couple of
seconds to appear after typing my username.

--
Bruce Cran

Andrey Chernov

Dec 18, 2011, 2:52:42 AM12/18/11
to Ian Smith, Bruce Cran, Ivan Klymenko, Doug Barton, freebsd...@freebsd.org, O. Hartmann, Current FreeBSD, freebsd-p...@freebsd.org
On Sun, Dec 18, 2011 at 05:51:47PM +1100, Ian Smith wrote:

> On Sun, 18 Dec 2011 02:37:52 +0000, Bruce Cran wrote:
> > On 13/12/2011 09:00, Andrey Chernov wrote:
> > > I observe ULE interactivity slowness even on a single-core machine (Pentium
> > > 4) in very visible places, like 'ps ax' output stalling midway for ~1
> > > second. When I switch back to SCHED_4BSD, all slowness is gone.
> >
> > I'm also seeing problems with ULE on a dual-socket quad-core Xeon machine
> > with 16 logical CPUs. If I run "tar xf somefile.tar" and "make -j16
> > buildworld" then logging into another console can take several seconds.
> > Sometimes even the "Password:" prompt can take a couple of seconds to appear
> > after typing my username.
>
> I'd resigned myself to expecting this sort of behaviour as 'normal' on
> my single-core 1133MHz PIII-M. As a reproducible data point, running
> 'dd if=/dev/random of=/dev/null' in one konsole, specifically to heat
> the CPU while testing my manual fan control script, pretty much hogs it,
> while the script below, which I run regularly in another konsole to
> check values, often gets stuck half way, occasionally pausing
> _twice_ before finishing. Switching back to the first konsole (on
> another desktop) to kill the dd can also take a couple/few seconds.

This issue is not about a slow machine under load: the same
slow machine under exactly the same load, but with SCHED_4BSD, responds
very quickly to interactive input.

I think we should not confuse interactivity with speed. I see no big
speed (i.e. compilation time) differences when switching schedulers, but I see
a big _interactivity_ difference. ULE in general tends to shortchange
interactive processes in favour of background ones. That perhaps helps
compilation, but makes the OS feel like a slowpoke from the interactive
user's point of view.

--
http://ache.vniz.net/
Alexander Best

Dec 18, 2011, 5:24:01 AM12/18/11
to Andrey Chernov, Ian Smith, Bruce Cran, Ivan Klymenko, Doug Barton, O. Hartmann, Current FreeBSD, freebsd...@freebsd.org, freebsd-p...@freebsd.org

+1

i've also experienced issues with ULE and performed several tests to compare
it to the historical 4BSD scheduler. the difference between the two does *not*
seem to be speed (at least not a huge difference), but interactivity.

one of the tests i performed was the following

ttyv0: untar a *huge* (+10G) archive
ttyv1: after ~30 seconds of untarring, do 'ls -la $directory', where $directory
contains a lot of files. i used "directory = /var/db/portsnap", because
that directory contains 23117 files on my machine.

measuring 'ls -la $directory' via time(1) revealed that SCHED_ULE takes > 15
seconds, whereas SCHED_4BSD only takes ~ 3-5 seconds. i think the issue is io.
io operations usually get a high priority, because statistics have shown that
- unlike computational tasks - io intensive tasks only run for a small fraction
of time and then exit: read data -> change data -> writeback data.

so SCHED_ULE might take these statistics too literally and give tasks like
bsdtar(1) (in my case) too many resources, so other tasks which require io are
struggling to get any resources assigned to them (ls(1) in my case).
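Alexander's hypothesis can be made concrete with ULE's interactivity score. The function below is a sketch based on a reading of `sched_interact_score()` in `sys/kern/sched_ule.c` from that era; treat the constants and the early-return shortcut as assumptions to check against the actual source. It ignores the nice value, which the kernel adds to the score before comparing it with the `kern.sched.interact` threshold (default 30).

```python
# Constants as recalled from sys/kern/sched_ule.c; verify against your tree.
SCHED_INTERACT_MAX = 100
SCHED_INTERACT_HALF = SCHED_INTERACT_MAX // 2
SCHED_INTERACT_THRESH = 30  # default for the kern.sched.interact sysctl


def interact_score(runtime, slptime, thresh=SCHED_INTERACT_THRESH):
    """Approximate ULE interactivity score (0 = most interactive, 100 = least).

    runtime and slptime are the thread's recently accumulated run and sleep
    time in any consistent unit (the kernel uses scaled ticks).
    """
    # Shortcut: a thread that runs at least as much as it sleeps can never
    # score below a threshold <= HALF, so HALF is returned directly.
    if thresh <= SCHED_INTERACT_HALF and runtime >= slptime:
        return SCHED_INTERACT_HALF
    if runtime > slptime:
        # Mostly running: score lands in the upper half (50..100).
        div = max(1, runtime // SCHED_INTERACT_HALF)
        return SCHED_INTERACT_HALF + (SCHED_INTERACT_HALF - slptime // div)
    if slptime > runtime:
        # Mostly sleeping (e.g. blocked on disk I/O): score lands in 0..50.
        div = max(1, slptime // SCHED_INTERACT_HALF)
        return runtime // div
    # runtime == slptime: HALF, or 0 if the thread has no history at all.
    return SCHED_INTERACT_HALF if runtime else 0


def is_interactive(runtime, slptime, thresh=SCHED_INTERACT_THRESH):
    """A thread counts as interactive when its score is below the threshold
    (the kernel adds the nice value first, which is ignored here)."""
    return interact_score(runtime, slptime, thresh) < thresh


# Hypothetical history: an I/O-bound tar mostly asleep vs. a CPU hog.
print(interact_score(20, 980), is_interactive(20, 980))  # -> 1 True
print(interact_score(990, 10), is_interactive(990, 10))  # -> 50 False
```

Under this model, a bsdtar that spends most of its time asleep waiting for the disk scores as interactive, just like ls(1), so both can land in the interactive class. That would be consistent with, though not proof of, the behaviour described above.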

of course SCHED_4BSD isn't perfect either. try using it and run the stress2
testsuite. your whole system will grind to a halt. mouse input drops below
1 Hz. even after killing all the stress2 tests, it will take a few minutes
before the system becomes snappy again.

cheers.
alex

Alexander Best

Dec 18, 2011, 5:26:00 AM12/18/11
to Andrey Chernov, Ian Smith, Bruce Cran, Ivan Klymenko, Doug Barton, O. Hartmann, Current FreeBSD, freebsd...@freebsd.org, freebsd-p...@freebsd.org
s/portsnap/portsnap\/files/

Bruce Cran

Dec 18, 2011, 8:06:28 AM12/18/11
to Adrian Chadd, freebsd-p...@freebsd.org, Current FreeBSD
On 18/12/2011 10:34, Adrian Chadd wrote:
> I applaud reppie for trying to make it as easy as possible for people
> to use KTR to provide scheduler traces for him to go digging with, so
> please, if you have these issues and you can absolutely reproduce
> them, please follow his instructions and work with him to get him what
> he needs.

Who's 'reppie'?

--
Bruce Cran

O. Hartmann

Dec 18, 2011, 8:44:24 AM12/18/11
to Bruce Cran, Ivan Klymenko, Doug Barton, freebsd...@freebsd.org, O. Hartmann, Current FreeBSD, freebsd-p...@freebsd.org
On 12/18/11 03:37, Bruce Cran wrote:
> On 13/12/2011 09:00, Andrey Chernov wrote:
>> I observe ULE interactivity slowness even on a single-core machine
>> (Pentium 4) in very visible places, like 'ps ax' output stalling
>> midway for ~1 second. When I switch back to SCHED_4BSD, all slowness is
>> gone.
>
> I'm also seeing problems with ULE on a dual-socket quad-core Xeon
> machine with 16 logical CPUs. If I run "tar xf somefile.tar" and "make
> -j16 buildworld" then logging into another console can take several
> seconds. Sometimes even the "Password:" prompt can take a couple of
> seconds to appear after typing my username.
>

Ages ago I reported several problems using SCHED_ULE on FreeBSD 8/9 when
doing heavy I/O, either disk or network bound (at that time I noticed the
problem on servers doing heavy disk or network I/O). It was suspected
that X could be the problem, but we also have a Dell PowerEdge 1950III
running FreeBSD 8.2-STABLE (by next week 9.0-RC[2/3]/STABLE) without X,
and it shows the same problems, though not as prominently as with X. The
box has 8 cores (4 per socket), 16 GB RAM, a SAS 6/iR controller and
two PCI-X attached Broadcom NetXtreme NICs, so the hardware shouldn't be
any kind of trouble.

But back then (and for the past two years now), the problem was
considered "a personal" problem. Bah!

By the beginning of next year my working group expects new hardware.
Since we use Linux for scientific work (due to OpenCL and CUDA on
TESLA cards), I can't use the Blade system. The boxes I expect are a
Dell Precision T7500 with 96 GB RAM and two sockets, each holding a
Westmere Xeon, for a total of 12 cores/24 threads. I'll start a dual-OS
installation with FreeBSD 10 and the most recent Suse (since my
colleagues do most of the development on Suse for the C2075 TESLA
board, I need Suse Linux).
I will then be able to perform some benchmarks with both OSes on
the very same hardware. The other box will be my desk's box, a brand new
Sandy Bridge-E CPU (i7-3960X) with 32 GB RAM. I'm also inclined to
make it a dual-boot box (I have avoided this up to now since I do not like
having to install GRUB2 for multiboot when using GPT on FreeBSD). The
box will run FreeBSD 9 and either Ubuntu or Gentoo Linux. I'm
unsure about the Linux choice, but I tend toward Gentoo for compiling
everything myself.
On this box, I can also perform benchmarks with several setups.

I look forward to getting some help and/or tips to prove the issues we
discussed here.

Oliver


Adrian Chadd

Dec 18, 2011, 2:30:21 PM12/18/11
to O. Hartmann, Bruce Cran, Ivan Klymenko, Doug Barton, freebsd...@freebsd.org, O. Hartmann, Current FreeBSD, Andrey Chernov, freebsd-p...@freebsd.org
Hi,

What Attilio and others need are KTR traces of the most stripped-down
example of an interactivity-busting workload you can find.

E.g.: if you're doing 32 concurrent buildworlds and trying to test
interactivity - fine, but that's going to result in a lot of KTR
stuff.
If you can reproduce it using dd from /dev/random to /dev/null (like
another poster did) with nothing else running, then even better.
If you can do it without X running, even better.

I honestly suggest ignoring benchmarks for now and concentrating on
interactivity.
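Adrian's request for a stripped-down reproducer, and Alexander's untar-plus-`ls` test above, both boil down to timing a short command while a background load runs. A minimal sketch of such a harness follows; the function names and the commands in the example comment are placeholders. It measures wall-clock latency only, so it complements rather than replaces the KTR traces.

```python
import subprocess
import time


def time_command(cmd):
    """Run cmd to completion and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def probe_under_load(load_cmd, probe_cmd, delay=30.0, probes=5):
    """Start load_cmd in the background, give it `delay` seconds to ramp up,
    then time `probes` consecutive runs of probe_cmd while the load continues.
    The background process is killed afterwards even if a probe fails."""
    load = subprocess.Popen(load_cmd, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    try:
        time.sleep(delay)
        return [time_command(probe_cmd) for _ in range(probes)]
    finally:
        load.kill()
        load.wait()


# Example invocations (placeholder paths; run once per scheduler and compare):
#   probe_under_load(["tar", "xf", "/path/to/huge.tar", "-C", "/tmp/untar"],
#                    ["ls", "-la", "/var/db/portsnap/files"])
#   probe_under_load(["dd", "if=/dev/random", "of=/dev/null"],
#                    ["ps", "ax"], delay=5)
```

Running the same invocation on a SCHED_ULE kernel and a SCHED_4BSD kernel, with nothing else running, gives directly comparable latency lists.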


Adrian


Michael Ross

Dec 15, 2011, 5:41:05 AM12/15/11
to Michael Larabel, FreeBSD Stable Mailing List, freebsd-p...@freebsd.org, Current FreeBSD, O. Hartmann, Jeremy Chadwick
On 15.12.2011, 11:10, Michael Larabel
<michael...@phoronix.com> wrote:

> On 12/15/2011 02:48 AM, Michael Ross wrote:

>> Anyway these tests were performed on different hardware, FWIW.
>> And with different filesystems, different compilers, different GUIs...
>>
>>
>
> No, the same hardware was used for each OS.
>

The picture under the heading "System Hardware / Software" does not
reflect that.

Motherboard description differs, Chipset description for FreeBSD is empty.


Regards,

Michael


> In terms of the software, the stock software stack for each OS was used.
>
> -- Michael

Michael Larabel

Dec 15, 2011, 5:55:16 AM12/15/11
to Michael Ross, O. Hartmann, freebsd-p...@freebsd.org, Current FreeBSD, FreeBSD Stable Mailing List, Jeremy Chadwick
On 12/15/2011 04:41 AM, Michael Ross wrote:
> On 15.12.2011, 11:10, Michael Larabel
> <michael...@phoronix.com> wrote:
>
>> On 12/15/2011 02:48 AM, Michael Ross wrote:
>
>>> Anyway these tests were performed on different hardware, FWIW.
>>> And with different filesystems, different compilers, different GUIs...
>>>
>>>
>>
>> No, the same hardware was used for each OS.
>>
>
> The picture under the heading "System Hardware / Software" does not
> reflect that.
>
> Motherboard description differs, Chipset description for FreeBSD is
> empty.
>

I was the one who carried out the testing and know that it was on the
same system.

All of the testing, including the system tables, is fully automated.
Under FreeBSD, the parsing of some component strings sometimes isn't as
nice as on Linux and the other operating systems supported by the Phoronix Test
Suite. For the BSD motherboard string, it's grabbing
hw.vendor/hw.product from sysctl. Is there a better place to read the
motherboard DMI information from?

-- Michael

>
> Regards,
>
> Michael
>
>
>> In terms of the software, the stock software stack for each OS was used.
>>
>> -- Michael

Michael Ross

Dec 15, 2011, 6:18:55 AM12/15/11
to Michael Larabel, O. Hartmann, freebsd-p...@freebsd.org, Current FreeBSD, FreeBSD Stable Mailing List, Jeremy Chadwick
On 15.12.2011, 11:55, Michael Larabel
<michael...@phoronix.com> wrote:

> On 12/15/2011 04:41 AM, Michael Ross wrote:
>> On 15.12.2011, 11:10, Michael Larabel
>> <michael...@phoronix.com> wrote:
>>
>>> On 12/15/2011 02:48 AM, Michael Ross wrote:
>>
>>>> Anyway these tests were performed on different hardware, FWIW.
>>>> And with different filesystems, different compilers, different GUIs...
>>>>
>>>>
>>>
>>> No, the same hardware was used for each OS.
>>>
>>
>> The picture under the heading "System Hardware / Software" does not
>> reflect that.
>>
>> Motherboard description differs, Chipset description for FreeBSD is
>> empty.
>>
>
> I was the one who carried out the testing and know that it was on the
> same system.

No offense. I'm not doubting you.

But I didn't know this:

> All of the testing, including the system tables, is fully automated.
> Under FreeBSD, the parsing of some component strings sometimes isn't as
> nice as on Linux and the other operating systems supported by the Phoronix Test
> Suite. For the BSD motherboard string, it's grabbing
> hw.vendor/hw.product from sysctl.

so maybe you can understand how I got my impression.
NVidia Audio and Realtek Audio.
Looks different to me :-)

> Is there a better place to read the motherboard DMI information from?
>

Following Steven Hartland's suggestion,
from one of my machines:

/usr/ports/sysutils/dmidecode/#sysctl -a | egrep "hw.vendor|hw.product"

/usr/ports/sysutils/dmidecode/#dmidecode -t 2
# dmidecode 2.11
SMBIOS 2.6 present.

Handle 0x0002, DMI type 2, 15 bytes
Base Board Information
Manufacturer: FUJITSU
Product Name: D2759
Version: S26361-D2759-A13 WGS04 GS02
Serial Number: 35838599
Asset Tag: -
Features:
Board is a hosting board
Board is removable
Location In Chassis: -
Chassis Handle: 0x0003
Type: Motherboard
Contained Object Handles: 0


Nice. Didn't know about that.

Regards,

Michael

Patrick M. Hausen

Dec 15, 2011, 6:28:01 AM12/15/11
to Michael Ross, Michael Larabel, FreeBSD Stable Mailing List, Current FreeBSD, freebsd-p...@freebsd.org, O. Hartmann, Jeremy Chadwick
Hi, all,

On 15.12.2011 at 12:18, Michael Ross wrote:
> Following Steven Hartland's suggestion,
> from one of my machines:
>
> /usr/ports/sysutils/dmidecode/#sysctl -a | egrep "hw.vendor|hw.product"
>
> /usr/ports/sysutils/dmidecode/#dmidecode -t 2
> # dmidecode 2.11
> SMBIOS 2.6 present.
>
> Handle 0x0002, DMI type 2, 15 bytes
> Base Board Information
> Manufacturer: FUJITSU
> Product Name: D2759
> Version: S26361-D2759-A13 WGS04 GS02
> Serial Number: 35838599
> Asset Tag: -
> Features:
> Board is a hosting board
> Board is removable
> Location In Chassis: -
> Chassis Handle: 0x0003
> Type: Motherboard
> Contained Object Handles: 0


Without the need to install an additional port:

datatomb2# kenv

smbios.bios.reldate="11/03/2011"
smbios.bios.vendor="FUJITSU // American Megatrends Inc."
smbios.bios.version="V4.6.4.1 R1.18.0 for D3034-A1x"
smbios.chassis.maker="FUJITSU"
smbios.chassis.serial="YLAP004857"
smbios.chassis.tag="System Asset Tag "
smbios.chassis.version="RX100S7R2"
smbios.memory.enabled="8388608"
smbios.planar.maker="FUJITSU"
smbios.planar.product="D3034-A1"
smbios.planar.serial="LJ1B-P00996"
smbios.planar.version="S26361-D3034-A100 WGS01 GS02"
smbios.socket.enabled="1"
smbios.socket.populated="1"
smbios.system.maker="FUJITSU"
smbios.system.product="PRIMERGY RX100 S7"
smbios.system.serial="YLAP004857"
smbios.system.uuid="f0493081-f5ca-e011-b8a5-a1c4d143da5f"
smbios.system.version="GS02"
smbios.version="2.7"
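For the Phoronix Test Suite question, here is a minimal sketch of reading the board strings from `kenv` output instead of `hw.vendor`/`hw.product`. It assumes the `name="value"` line format shown above and, because vendors sometimes fill SMBIOS fields with nothing but spaces, treats blank values as missing. The helper names are hypothetical.

```python
import re
import subprocess

# kenv(1) prints one name="value" pair per line.
KENV_LINE = re.compile(r'^([\w.]+)="(.*)"$')


def parse_kenv(text):
    """Parse kenv(1) output into a dict of name -> value."""
    env = {}
    for line in text.splitlines():
        match = KENV_LINE.match(line.strip())
        if match:
            env[match.group(1)] = match.group(2)
    return env


def first_nonblank(env, *keys):
    """Return the first of `keys` whose value is not empty or whitespace."""
    for key in keys:
        value = env.get(key, "").strip()
        if value:
            return value
    return ""


def board_description(env):
    """Motherboard string from smbios.planar.*, falling back to smbios.system.*."""
    maker = first_nonblank(env, "smbios.planar.maker", "smbios.system.maker")
    product = first_nonblank(env, "smbios.planar.product", "smbios.system.product")
    return " ".join(part for part in (maker, product) if part) or "unknown"


def read_kenv():
    """Run kenv(1) on the local FreeBSD host and parse its output."""
    result = subprocess.run(["kenv"], check=True, capture_output=True, text=True)
    return parse_kenv(result.stdout)
```

On the machine above, `board_description(read_kenv())` would be expected to yield the `smbios.planar.maker` and `smbios.planar.product` values joined by a space.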

Kind regards,
Patrick
--
punkt.de GmbH * Kaiserallee 13a * 76133 Karlsruhe
Tel. 0721 9109 0 * Fax 0721 9109 100
in...@punkt.de http://www.punkt.de
Gf: Jürgen Egeling AG Mannheim 108285

Jeremy Chadwick

Dec 15, 2011, 6:52:54 AM12/15/11
to Michael Larabel, O. Hartmann, Michael Ross, freebsd-p...@freebsd.org, Current FreeBSD, FreeBSD Stable Mailing List
On Thu, Dec 15, 2011 at 04:55:16AM -0600, Michael Larabel wrote:
> On 12/15/2011 04:41 AM, Michael Ross wrote:
> >On 15.12.2011, 11:10, Michael Larabel
> ><michael...@phoronix.com> wrote:
> >
> >>On 12/15/2011 02:48 AM, Michael Ross wrote:
> >
> >>>Anyway these tests were performed on different hardware, FWIW.
> >>>And with different filesystems, different compilers, different GUIs...
> >>>
> >>>
> >>
> >>No, the same hardware was used for each OS.
> >>
> >
> >The picture under the heading "System Hardware / Software" does
> >not reflect that.
> >
> >Motherboard description differs, Chipset description for FreeBSD
> >is empty.
> >
>
> I was the one who carried out the testing and know that it was on
> the same system.
>
> All of the testing, including the system tables, is fully automated.
> Under FreeBSD the parsing of some component strings sometimes isn't
> as nice as on Linux and the other operating systems supported by the
> Phoronix Test Suite. For the BSD motherboard string parsing it
> grabs hw.vendor/hw.product from sysctl.
>
> Is there a better place to read the motherboard DMI information from?

I *think* what you're referring to is SMBIOS strings -- and these are
available from kenv(1) / kenv(2), not sysctl. But keep reading for why
SMBIOS data is not 100% reliable (greatly depends on the hardware). For
actual device strings/etc. for all devices on busses (PCI, AGP, etc.)
you can use pciconf -lvcb.
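A minimal Python sketch of the kenv(1) route, assuming the name="value" line format shown in the kenv output earlier in this thread (escaped quotes inside values are not handled). The helper names read_kenv, parse_kenv, smbios_value and board_description are hypothetical, and the placeholder list is an assumed, incomplete example of vendor filler text. A None result is meant to tell the caller to fall back to some other source rather than print a blank motherboard string.

```python
import re
import subprocess
from typing import Dict, Optional

# One line of kenv(1) output, e.g.: smbios.planar.maker="FUJITSU"
_KENV_LINE = re.compile(r'^([^=\s]+)="(.*)"$')

# Assumed, incomplete list of vendor filler strings (compared lowercased).
_PLACEHOLDERS = {"to be filled by o.e.m.", "default string"}


def read_kenv() -> str:
    """Run kenv(1) with no arguments (FreeBSD only) and return its output."""
    return subprocess.run(["kenv"], capture_output=True, text=True,
                          check=True).stdout


def parse_kenv(text: str) -> Dict[str, str]:
    """Turn kenv(1) output into a {name: value} dict, skipping other lines."""
    env: Dict[str, str] = {}
    for line in text.splitlines():
        match = _KENV_LINE.match(line.strip())
        if match:
            env[match.group(1)] = match.group(2)
    return env


def smbios_value(env: Dict[str, str], key: str) -> Optional[str]:
    """Return a trimmed SMBIOS string, or None if missing, blank or filler."""
    value = env.get(key, "").strip()
    if not value or value.lower() in _PLACEHOLDERS:
        return None
    return value


def board_description(env: Dict[str, str]) -> Optional[str]:
    """Combine planar (motherboard) maker and product, skipping bad fields."""
    parts = [smbios_value(env, "smbios.planar.maker"),
             smbios_value(env, "smbios.planar.product")]
    usable = [p for p in parts if p]
    return " ".join(usable) if usable else None
```

On a FreeBSD box, board_description(parse_kenv(read_kenv())) would yield "FUJITSU D3034-A1" for the PRIMERGY output quoted earlier; a board whose SMBIOS fields are only spaces yields None instead of an empty string.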

That's about as good as it's going to get via software. SMBIOS data
(e.g. smbios.{bios,chassis,planar,system}) is never going to give you
fully-identifiable data; I can point you to tons of systems where the
data inserted there is nonsense, sometimes even just ASCII spaces (and
that is the fault of the system vendor/BIOS manufacturer, not FreeBSD).
Sometimes identical strings are used across completely different
systems/boards (sometimes even server-class boards like ones from
Supermicro). And PCI vendor strings don't give you things like speeds,
frequency/voltages, etc. Sometimes this matters. For example (just
making something up): "the video benchmark was horrible on FreeBSD",
when in fact it turned out that a run of "pciconf -lvcb" showed your
PCIe card was running at x4 link speed instead of x16.
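To catch the x4-instead-of-x16 case above without eyeballing every device, a hypothetical Python helper could scan pciconf -lvcb output for PCI Express capability lines. It assumes device header lines start in column 0 with name@selector and that the capability line reports the link as "link xN(xM)" (negotiated width, then maximum width); that is the format assumed here for FreeBSD's pciconf(8), and other releases may print more or different fields, so check the regular expressions against your own output first. The name narrow_links is made up for this sketch.

```python
import re
from typing import List, Optional, Tuple

# Device header lines start in column 0, e.g. "vgapci0@pci0:1:0:0: class=..."
_DEVICE = re.compile(r'^(\w+)@')
# Assumed PCIe capability fragment: "link x<negotiated>(x<maximum>)"
_LINK = re.compile(r'link x(\d+)\(x(\d+)\)')


def narrow_links(text: str) -> List[Tuple[str, int, int]]:
    """Return (device, negotiated, maximum) for links running below max width.

    A link that reports x0 (e.g. nothing trained) is flagged too.
    """
    flagged: List[Tuple[str, int, int]] = []
    device: Optional[str] = None
    for line in text.splitlines():
        header = _DEVICE.match(line)
        if header:
            device = header.group(1)
            continue
        link = _LINK.search(line)
        if link and device is not None:
            negotiated, maximum = int(link.group(1)), int(link.group(2))
            if negotiated < maximum:
                flagged.append((device, negotiated, maximum))
    return flagged
```

Feeding it the text of pciconf -lvcb and getting back something like [("vgapci0", 4, 16)] would point straight at the kind of link-width problem described above, before any benchmark numbers are blamed on the OS.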

The best places to get your specifications from are:

* The box
* The physical hardware (by physically inspecting it)
* The user manual / product documentation
* Purchase orders from whoever bought the hardware
* And, of course, operational speed (if possible) from the OS/userland
utilities

When I read a benchmark/review, I have to assume the person is doing
them on a system they have 100% control over, all the way down to the
hardware. Thus, they should know what exact hardware they have.

Also, when publishing results online, you should take the time to
proofread everything (with a 2nd set of eyes if possible) and be patient
and thorough. People like accuracy, especially when there's hard
data/evidence to back it up that can be made available for download.

Try to understand: so many review-esque sites consist of individuals who
do not understand even remotely what they're doing.

I'm going to give you two examples -- one personal, one word-of-mouth
but from someone I trust dearly.

I have a "reverse analysis" of AnandTech's Intel 510 SSD review that has been
sitting in my "draft" folder on my blog for a month now because I'm
downright afraid to publish how their data seems completely and totally
wrong (with evidence to prove it). I'm afraid/stalling because I want
to make absolutely damn sure I'm not missing some key piece of evidence
that explains it, and I've had multiple people read it and go "...wow, I
didn't notice that, that benchmark data makes no sense", but I'm STILL
reluctant. The last thing I want to do is "publish" something that
sparks a controversy where it turns out I'm wrong (and I AM wrong, quite
often!).

As for the other:

http://www.overclockers.com/bulldozer-architecture-explained/

The author of this "review" talks about CPU arch and is praised for
writing a "wonderful article that speaks the truth". But sadly that
doesn't appear to be the case. A colleague of mine is long-time friends
with someone who is working on his Ph.D. in computer architecture and
recently had a paper accepted and published by a journal that has also
published papers on things like RAID (when it was first introduced as a
concept/method) and hardware watchpoints. Said
individual read the above "review" and described it as, quote, "the
worst article on computer architecture on the entire Internet". One of
the amusing quotes (that got me laughing since I did understand it; my
understanding of CPUs on a silicon level is limited, I'm just an old
65xxx assembly programmer...) was how the article states "this is the
first time AMD has implemented branch prediction". Sigh.

Here's the kicker: said individual immediately recognised that the
article was a near-verbatim cut-and-paste from one of two commonly used
computer architecture books in college/universities; the first book is
basically a "beginner's guide to CPU architecture". The book is also a
bit old at that. The individual then looked up where the article
author went to school, and noted that said school's CPU architecture
course **ends** with that book.

The user/viewer demographic of overclockers.com is going to be
significantly different from that of phoronix.com -- you know that I'm
sure. The point is that you should be aware that there are going to
be significant discussions that come from publishing such benchmark
comparisons with such a demographic. Things that indicate severe
performance differential (e.g. "10x to 100x worse") are going to be
focused on and criticised -- and hopefully in a socially-agreeable
manner[1] -- and in a much different way than, say, a 3D video card
review site ("lol ur pc sux if u spend onl $4000 on it lol").

The first step is to try and figure out what exactly you're seeing and
why it's so significantly different when compared to other OSes.

[1]: I'm sure by now you know that the BSDs in general tend to harbour a
community of folks who are more argumentative/aggressive than, say,
Linux (generally speaking). In this thread though, I think all of us
really want to assist in some way to figure out what exactly is going on
here, scheduler-wise, and see if we can put something together to hand
developers who are "responsible" for said code and see what comes of it.
Remember, we're all here to try and make things better... I hope. :-)

Footnote: It's nice meeting you (indirectly), I was always curious who
did the phoronix.com reviews/"stuff" when it came to FreeBSD.
Greetings!

--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |
