performance problem: Mac OS X vs Linux

Andreas Otto

Mar 29, 2004, 12:33:08 PM

Hi,

here is my problem.

I ported a piece of software from Linux to Mac OS X with an identical build
environment. The port went quickly and the results were identical.

As hardware I use a 500 MHz Athlon and a 500 MHz PPC machine. Performance
tests have shown that both machines are at the same speed level.

First, how the software works:

my software uses fifo's to communicate between 2 independent parts.
every transaction needs 2 fifo requests.

Now the problem:

On Linux one transaction needs ~327 usec, and on Mac OS X one transaction
needs ~2424 usec.

The FIFO transaction time is 2 x 50 usec on Linux and 2 x 1100 usec on
Mac OS X.

My question: is it possible to optimize the FIFO speed on Mac OS X?

Regards,

aotto

Reinder Verlinde

Apr 2, 2004, 2:33:29 PM

In article <40685DD4...@t-online.de>,
Andreas Otto <ao...@t-online.de> wrote:

Why do you want to know? If you want to speed up your program, there
might be other ways. Give some more information about what your program
does, and somebody might give a hint.

If you just want to know why the Mac is slower, study the Darwin source
code <http://developer.apple.com/darwin/projects/darwin/>, and/or the
PowerPC architecture.

Reinder

Peter da Silva

Apr 2, 2004, 6:57:54 PM

In article <40685DD4...@t-online.de>,
Andreas Otto <ao...@t-online.de> wrote:

> on linux one transaction needs ~327 usec and on macosx one transaction
> needs ~2424 usec.

Hmmm. FIFO implementations can vary widely in latency, because they
tend to be optimised for throughput. Here you may be seeing Mach message
overhead?

Might need to use a different IPC mechanism if you need less latency.

--
I've seen things you people can't imagine. Chimneysweeps on fire over the roofs
of London. I've watched kite-strings glitter in the sun at Hyde Park Gate. All
these things will be lost in time, like chalk-paintings in the rain. `-_-'
Time for your nap. | Peter da Silva | Har du kramat din varg, idag? 'U`

Eric Grant

Apr 2, 2004, 7:53:04 PM

On 4/2/04 3:57 PM, "Peter da Silva" <pe...@abbnm.com> wrote:

> In article <40685DD4...@t-online.de>,
> Andreas Otto <ao...@t-online.de> wrote:
>> on linux one transaction needs ~327 usec and on macosx one transaction
>> needs ~2424 usec.
>
> Hmmm. FIFO implementations can vary widely in latency, because they
> tend to be optimised for throughput. Here you may be seeing Mach message
> overhead?
>
> Might need to use a different IPC mechanism if you need less latency.

What IPC mechanism on Mac OS X is faster than a Mach message?

Thanks,
Eric Grant

Paul Russell

Apr 2, 2004, 8:38:59 PM

Eric Grant wrote:

>
> What IPC mechanism on Mac OS X is faster than a Mach message?
>

Shared memory ?

Paul

Thomas Bushnell, BSG

Apr 2, 2004, 11:39:36 PM

Paul Russell <prus...@sonic.net> writes:

Don't count on it! On a uniprocessor, moving data between processes
using shared memory requires the same context switch, and sometimes
requires page manipulations or cache flushes that can be more
expensive than the copying of the IPC message. It all depends on how
much data you are moving and exactly how.

Thomas

Eric Albert

Apr 3, 2004, 1:08:26 AM

In article <40685DD4...@t-online.de>,
Andreas Otto <ao...@t-online.de> wrote:

Maybe; maybe not. Can you provide an example program that demonstrates
the performance problem?

-Eric

--
Eric Albert ejal...@cs.stanford.edu
http://rescomp.stanford.edu/~ejalbert/
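
A test program of the kind requested here could look like the following minimal sketch. It is not the original poster's code: the helper name `fifo_round_trip_usec`, the FIFO paths passed to it, and the one-byte payload are all assumptions made for illustration. It forks an echo process, sends one byte over a request FIFO, waits for the echo on a reply FIFO, and reports the mean wall-clock time per round trip, so each measured transaction involves two FIFO transfers, as in the original description.

```c
#include <fcntl.h>
#include <signal.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from the thread): returns the mean wall-clock
 * time in microseconds for one request/reply round trip over two FIFOs,
 * or -1.0 on error. */
double fifo_round_trip_usec(const char *req_path, const char *rep_path,
                            int iterations)
{
    struct timeval start, end;
    char byte = 'x';
    int req = -1, rep = -1, i, ok = 1;
    pid_t pid;

    if (iterations <= 0)
        return -1.0;
    unlink(req_path);               /* remove stale FIFOs from earlier runs */
    unlink(rep_path);
    if (mkfifo(req_path, 0600) != 0 || mkfifo(rep_path, 0600) != 0) {
        unlink(req_path);
        return -1.0;
    }

    pid = fork();
    if (pid == 0) {
        /* Child: echo every request byte back on the reply FIFO. */
        req = open(req_path, O_RDONLY);
        rep = open(rep_path, O_WRONLY);
        if (req < 0 || rep < 0)
            _exit(1);
        while (read(req, &byte, 1) == 1)
            if (write(rep, &byte, 1) != 1)
                _exit(1);
        _exit(0);                   /* EOF: the parent closed its end */
    }

    /* Parent: open in the same order as the child so that each blocking
     * open() finds its peer. */
    if (pid > 0) {
        req = open(req_path, O_WRONLY);
        if (req >= 0)
            rep = open(rep_path, O_RDONLY);
    }
    if (pid < 0 || req < 0 || rep < 0) {
        ok = 0;
        if (pid > 0)
            kill(pid, SIGKILL);     /* do not leave the child blocked */
    } else {
        gettimeofday(&start, NULL);
        for (i = 0; i < iterations; i++) {
            if (write(req, &byte, 1) != 1 || read(rep, &byte, 1) != 1) {
                ok = 0;
                break;
            }
        }
        gettimeofday(&end, NULL);
    }

    if (req >= 0) close(req);       /* closing req makes the child see EOF */
    if (rep >= 0) close(rep);
    if (pid > 0) waitpid(pid, NULL, 0);
    unlink(req_path);
    unlink(rep_path);

    if (!ok)
        return -1.0;
    return ((end.tv_sec - start.tv_sec) * 1e6 +
            (end.tv_usec - start.tv_usec)) / iterations;
}
```

Calling `fifo_round_trip_usec("/tmp/req", "/tmp/rep", 10000)` on both machines (the paths are placeholders) would give per-transaction figures that can be compared directly. Because the result is elapsed time, it includes any time either process spends waiting to be scheduled.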

Robert Bonomi

Apr 4, 2004, 10:41:07 PM

In article <BC934AF0.3C58%er...@eagrant.com>,
Eric Grant <er...@eagrant.com> wrote:

Facetious answer: a Mach 2 message would be twice as fast.

Seriously: shared memory. Messaging involves the O/S, so you have a context
switch to the O/S and back, for both send and receive, plus task switching
between sender and receiver. Shared memory eliminates all the O/S overhead
except for the task switching between sender and receiver.

Nick Landsberg

Apr 4, 2004, 10:58:16 PM

Robert Bonomi wrote:

All good points.

None of which explains the ~2 millisecond discrepancy
between Linux and the Mach kernel doing the same
thing (I believe pipes/FIFOs were the original question?)

(2 milliseconds is almost an eternity when you
get down to that low a level, might as well be
accessing a database system... hrmph)

I would suspect either that the tests were not apples to apples
(since I can't envision the amount of inefficiency
necessary to drive the timings to the millisecond
range for *any* kernel) or the implementation
forced the write out to disk for some reason.
(In which case it is broken, IMO.)

Were the original measurements CPU time or wall-clock
time?

--
"It is impossible to make anything foolproof
because fools are so ingenious"
- A. Bloch
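
The CPU-time versus wall-clock distinction raised above can be checked with a small wrapper like the one below. It is a sketch, not anything the original poster is known to have used: `struct timing`, `time_call`, and `nap_20ms` are hypothetical names, and the workload is a plain sleep chosen because it consumes elapsed time but almost no CPU time. The wrapper uses `gettimeofday()` for elapsed time and `getrusage()` for user plus system CPU time of the calling process.

```c
#include <sys/resource.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical result type: both figures in microseconds. */
struct timing {
    double wall_usec;   /* elapsed (wall-clock) time          */
    double cpu_usec;    /* user + system CPU time of this process */
};

static double to_usec(struct timeval tv)
{
    return tv.tv_sec * 1e6 + tv.tv_usec;
}

/* Hypothetical helper: run fn() once and report both kinds of time. */
struct timing time_call(void (*fn)(void))
{
    struct timeval w0, w1;
    struct rusage r0, r1;
    struct timing t;

    gettimeofday(&w0, NULL);
    getrusage(RUSAGE_SELF, &r0);
    fn();
    getrusage(RUSAGE_SELF, &r1);
    gettimeofday(&w1, NULL);

    t.wall_usec = to_usec(w1) - to_usec(w0);
    t.cpu_usec = (to_usec(r1.ru_utime) + to_usec(r1.ru_stime)) -
                 (to_usec(r0.ru_utime) + to_usec(r0.ru_stime));
    return t;
}

/* Example workload: sleeps for about 20 ms and uses almost no CPU. */
void nap_20ms(void)
{
    usleep(20000);
}
```

For `time_call(nap_20ms)` the wall-clock figure should be around 20000 usec while the CPU figure stays near zero. A FIFO transaction that shows a large elapsed time but a small CPU time would point at waiting (scheduling, blocking) rather than at work done in the kernel.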

Peter da Silva

Apr 5, 2004, 1:03:00 PM

In article <BC934AF0.3C58%er...@eagrant.com>,
Eric Grant <er...@eagrant.com> wrote:

You mean "what IPC mechanism is faster than a FIFO implemented on top of Mach
message ports". What you've got is a packet-oriented mechanism with a stream
(FIFO) on top of it, and you're implementing a packet protocol on top of that.

In traditional UNIX the underlying mechanism the FIFO uses is typically
something like a list of fixed-size blocks. In Mac OS X they may be doing
this and using Mach messages to lock the list, or they may be using Mach
messages to carry the data itself.

Have a look at using the low-level protocol instead of all these layers.

Robert Bonomi

Apr 7, 2004, 3:08:55 AM

In article <cX3cc.21907$vo5.6...@bgtnsc05-news.ops.worldnet.att.net>,

Nick Landsberg <huk...@NOSPAM.att.net> wrote:
>Robert Bonomi wrote:
>
>> In article <BC934AF0.3C58%er...@eagrant.com>,
>> Eric Grant <er...@eagrant.com> wrote:
>>
>>>On 4/2/04 3:57 PM, "Peter da Silva" <pe...@abbnm.com> wrote:
>>>
>>>
>>>>In article <40685DD4...@t-online.de>,
>>>>Andreas Otto <ao...@t-online.de> wrote:
>>>>
>>>>> on linux one transaction needs ~327 usec and on macosx one transaction
>>>>> needs ~2424 usec.
>>>>
>>>>Hmmm. FIFO implementations can vary widely in latency, because they
>>>>tend to be optimised for throughput. Here you may be seeing Mach message
>>>>overhead?
>>>>
>>>>Might need to use a different IPC mechanism if you need less latency.
>>>
>>>What IPC mechanism on Mac OS X is faster than a Mach message?
>>
>>
>> Facetious answer: a Mach 2 message would be twice as fast.
>>
>> Seriously: shared memory. messaging involves the O/S, so you have context
>> switch to the OS, and back. both for send and receive. plus task switching
>> between sender and receiver. shared-memory eliminates all the O/S overhead
>> except for the task-switching between sender and receiver.
>>
>All good points.
>
>None of which explains the ~2 millisecond discrepancy
>between linux and the MACH kernel to do the same
>thing (I believe pipes/fifos was the original question?)

It very well *could* explain the difference. All it takes is
a "slightly different" implementation of the task scheduler.

If, on one machine, the receiving task has a higher priority than
the sending one, and the scheduler is fully pre-emptive, then the
sending task will be pended *immediately* upon sending the message,
and the receiving task activated.

If, on the other machine, the receiving task does -not- have higher
priority, or the scheduler is not pre-emptive, then the sending task
will _continue_ to run until it exhausts its time-slice (or releases
the CPU -- either expressly via sleep()/usleep(), or implicitly by
making some O/S function call), and _only_then_ will the receiving task
become active.

I _think_ the Mac kernel *is* biased to let running jobs run till the
end of their quanta, when possible.

Thomas Bushnell, BSG

Apr 7, 2004, 3:45:42 PM

bon...@host122.r-bonomi.com (Robert Bonomi) writes:

> If, on the other machine, the receiving task does -not- have higher
> priority, or the scheduler is not pre-emptive, then the sending task
> will _continue_ to run until it exhausts its time-slice (or releases
> the CPU -- either expressly via sleep()/usleep(), or implicitly by
> making some O/S function call), and _only_then_ will the receiving task
> become active.

If it's a synchronous Mach 3.0 message, then this is incorrect. The
sending task will, in one system call, send a message and wait for a
reply, with only one context switch (and then another for the reply).

Nick Landsberg

Apr 7, 2004, 4:11:49 PM

That's what I thought too.

In any event, has anyone asked the
question of whether the timings
were actually CPU time or elapsed
time? I don't recall from the thread.

Tim

Apr 9, 2004, 12:35:35 PM

> my software uses fifo's to communicate between 2 independent parts.
> every transaction needs 2 fifo requests.

Not sure if this is related, but I just ported a simple Linux app to a
BSD system, and found it sucking down major CPU on BSD. It uses
mkfifo to create a named pipe, and then select to wait on reads. I
haven't had time to do any experimentation, but my best guess right now
is that the select is never blocking. I open the pipe
O_RDONLY|O_NONBLOCK, because the open itself will block unless you
specify the O_NONBLOCK. I suspect, maybe, on BSD this prevents the
select from ever blocking. From some poking around, it appears that
the solution on BSD may be to open the pipe with read/write access, to
prevent the open from blocking. Haven't had time to test yet.

Thomas Bushnell, BSG

Apr 9, 2004, 4:28:23 PM

tim...@pacbell.net (Tim) writes:

> I haven't had time to do any experimentation, but my best guess right
> now is that the select is never blocking. I open the pipe
> O_RDONLY|O_NONBLOCK, because the open itself will block unless you
> specify the O_NONBLOCK. I suspect, maybe, on BSD this prevents the
> select from ever blocking.

If you set O_NONBLOCK at open time, then it gets copied into the file
descriptor and affects later operations. What you want to do is clear
it with fcntl after you open the file.
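
A minimal sketch of that sequence, assuming a POSIX system: open the FIFO with `O_NONBLOCK` so that `open()` returns even when no writer is present, then read the descriptor's status flags with `F_GETFL` and write them back with `O_NONBLOCK` masked out. The helper name `open_fifo_reader_blocking` is hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open a FIFO for reading without blocking in open(),
 * then switch the descriptor back to blocking mode.
 * Returns the descriptor, or -1 on error. */
int open_fifo_reader_blocking(const char *path)
{
    int fd, flags;

    /* O_NONBLOCK lets open() succeed even though no writer exists yet. */
    fd = open(path, O_RDONLY | O_NONBLOCK);
    if (fd < 0)
        return -1;

    /* Fetch the current status flags and clear only O_NONBLOCK. */
    flags = fcntl(fd, F_GETFL);
    if (flags < 0 || fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

After this call, `read()` on the descriptor blocks while a writer is connected and no data is available. With no writer connected at all, `read()` returns 0 (end of file) rather than blocking, which is worth keeping in mind when combining this with `select()`.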

Tim

Apr 21, 2004, 5:08:55 PM

> If you set O_NONBLOCK at open time, then it gets copied into the file
> descriptor and affects later operations. What you want to do is clear
> it with fcntl after you open the file.

I know this is very off-topic, but for what it's worth, I found that
on FreeBSD, select() will not block on a named pipe fd that is opened
O_RDONLY, even if I clear the O_NONBLOCK flag. On OSX or Linux,
select() works as expected regardless of whether O_NONBLOCK is set.
On FreeBSD, the only way I could get it to work as expected (block
until there is data to read) was to open the pipe O_RDWR.
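
The O_RDWR workaround described above might be sketched like this. Two caveats apply: POSIX leaves the result of opening a FIFO with `O_RDWR` undefined, although Linux and the BSDs accept it, and the helper names `open_fifo_rdwr` and `wait_readable` are hypothetical. A plausible reason the trick works is that the process then counts as a writer itself, so the FIFO never sits in the "no writer, end of file" state that `select()` reports as readable; treat that explanation as an assumption rather than a statement about any particular kernel.

```c
#include <fcntl.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: open the FIFO read/write so that open() returns at
 * once and the FIFO always has at least one writer (this process). */
int open_fifo_rdwr(const char *path)
{
    return open(path, O_RDWR);
}

/* Hypothetical helper: wait up to timeout_ms for data on fd.
 * Returns 1 if fd is readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    fd_set readfds;
    struct timeval tv;
    int n;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    n = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (n < 0)
        return -1;
    return n > 0 ? 1 : 0;
}
```

With an empty FIFO, `wait_readable(fd, 50)` is expected to time out and return 0; once some process writes a byte, it returns 1. Passing `NULL` instead of a timeout to `select()` would give the fully blocking behaviour the post is after.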

dave

May 21, 2005, 9:47:31 AM

Mach messages are more efficient for short messages than shared memory.
If the standard FIFO latencies are an issue, consider writing your own FIFO
process/thread using Mach messaging.