Minix IPC traffic measurement


Jaswinder Singh Rajput

Dec 29, 2009, 5:18:13 AM
to minix3, Erik van der Kouwe, Niek Linnenbank, Ben Gras
Dear friends,

I prepared a simple patch to measure Minix IPC traffic; it seems to be very high:

2,00,000 mini_send /sec
2,00,000 mini_receive /sec
10,000 mini_senda /sec

My plan is to reduce this traffic by a factor of 1000. Any comments/hints?


Procedure to check IPC traffic on Minix 3.1.5
---------------------------------------------

1. Install Minix 3.1.5

2. Install iperf package on Minix

https://gforge.cs.vu.nl/gf/project/minix/tracker/?action=TrackerItemEdit&tracker_item_id=349

3. Install iperf on Linux

4. Apply either ipc_traffic.patch or copy proc.c to /usr/src/kernel/proc.c

5. Run make install (in /usr/src)

6. reboot

7. Run iperf server on Minix
$ cd iperf-2.0.4/src
$ ./iperf -s

8. Run iperf client on Minix
$ cd iperf-2.0.4/src
$ ./iperf -c 192.168.1.150 (my Minix box's IP address is 192.168.1.150)

9. Here is the captured loopback test of IPC traffic on a 2.4 GHz Pentium
Minix box:
Tasks involved in the loopback test are:

iperf <--> system <--> inet

Dec 29 14:10:07 minix kernel: mr(355)
Dec 29 14:10:07 minix kernel: ms(320)
Dec 29 14:10:07 minix kernel: mr(360)
Dec 29 14:10:07 minix kernel: ms(325)
Dec 29 14:10:07 minix kernel: mr(365)
Dec 29 14:10:07 minix kernel: ms(330)

Here mr means mini_receive, and 355 means 3,550,000 (the counters are
printed in units of 10,000, every 50,000th call).
Here ms means mini_send, and 320 means 3,200,000.

So in 1 second the IPC traffic for the loopback test is:
mini_receive : (365 - 355) x 10,000 = 100,000/sec
mini_send : (330 - 320) x 10,000 = 100,000/sec

10. Run iperf client on Linux box
$ cd iperf-2.0.4/src
$ ./iperf -c 192.168.1.150 (my Minix box's IP address is 192.168.1.150)

11. Here is the captured network test of IPC traffic on a 2.4 GHz Pentium
Minix box with RTL8139:
Tasks involved in the network test are:

iperf <--> system <--> inet <--> rtl8139 <--> pci

Dec 29 14:11:13 minix kernel: mr(445)
Dec 29 14:11:13 minix kernel: ms(405)
Dec 29 14:11:13 minix kernel: mr(450)
Dec 29 14:11:13 minix kernel: msa(11,2)
Dec 29 14:11:13 minix kernel: ms(410)
Dec 29 14:11:13 minix kernel: mr(455)
Dec 29 14:11:13 minix kernel: ms(415)
Dec 29 14:11:13 minix kernel: mr(460)
Dec 29 14:11:13 minix kernel: ms(420)
Dec 29 14:11:13 minix kernel: mr(465)
Dec 29 14:11:13 minix kernel: ms(425)

Here mr means mini_receive, and 445 means 4,450,000.
Here ms means mini_send, and 405 means 4,050,000.
Here msa means mini_senda; 11 means 110,000, and 2 means the maximum size.

So in 1 second the IPC traffic for the network test is:
mini_receive : (465 - 445) x 10,000 = 200,000/sec
mini_send : (425 - 405) x 10,000 = 200,000/sec
mini_senda : (11 - 10) x 10,000 = 10,000/sec

Enjoy !!

--
Jaswinder Singh.

ipc_traffic.zip

Jaswinder Singh Rajput

Dec 29, 2009, 5:29:47 AM
to minix3, Erik van der Kouwe, Niek Linnenbank, Ben Gras, phi...@cs.vu.nl
Hello friends,

Minor clarification: in the loopback test, run the iperf client on the
same Minix machine.

Thank you,
--
Jaswinder Singh.

Jaswinder Singh Rajput

Dec 29, 2009, 11:21:33 AM
to Philip Homburg, Niek Linnenbank, Ben Gras, minix3
Dear Philip,

On Tue, Dec 29, 2009 at 6:15 PM, Philip Homburg <phi...@cs.vu.nl> wrote:


> In your letter dated Tue, 29 Dec 2009 15:59:47 +0530 you wrote:
>>> I prepared a simple patch to measure Minix IPC traffic, it seems to be very high :
>>>
>>> 2,00,000 mini_send /sec
>

> I can't parse this number. Is that two hundred thousand or 2 million?

It is two hundred thousand.

>
> Anyhow, you have to relate that to throughput.
>

iperf is a network bandwidth measurement tool; in this case it transfers
6.5 MBytes/sec, which is very low for 100% CPU usage. I want to
transfer 50 MBytes/sec (approx. 400 Mbits/sec) with 70-80% CPU
usage.

I am curious why the IPC call counts are so high. How can I reduce them
from 200,000/sec to 200/sec for 6.5 MBytes/sec, so that I can transfer
50 MBytes/sec easily?

Why is mini_senda showing a size of 2?

> I don't have detailed performance measurements for Minix-3, but for Minix-vmd
> on fairly slow machines (1.6 GHz Atom and 766 MHz Celeron) the cost of a null
> system call is around 2 microseconds. For two hundred thousand mini_sends/sec
> that would be left in the noise.
>
> I assume systemcalls are more expensive on Minix-3, but I don't know how much.
>
> An important issue is how big the reads and writes issued by iperf are. Too
> small, and performance will suffer.

iperf is a network bandwidth measurement tool; it keeps transferring
packets for whatever duration is specified.

Niek Linnenbank

Dec 30, 2009, 2:45:07 AM
to Jaswinder Singh Rajput, Philip Homburg, Ben Gras, minix3
Hi Jaswinder,

These are very interesting numbers. Such a high number of IPC calls may
explain the slow performance we get. I believe Ben is working on a
profiling solution for MINIX; perhaps that can be useful here as well :-)

Niek
--
Niek Linnenbank

WWW: http://www.nieklinnenbank.nl/
BLOG: http://nieklinnenbank.wordpress.com/
FUN:    http://www.FreeNOS.org/

Jaswinder Singh Rajput

Dec 30, 2009, 3:01:31 AM
to Niek Linnenbank, Philip Homburg, Ben Gras, minix3
Dear Niek,

In these tests I am rounding to 50,000 IPC calls, and I calculated
200,000, so the IPC count is > 200,000 mini_send/sec.

I am trying to use cprofile to check who is making so many
send/receive calls, like this:

1. Set CPROFILE to 1 in /usr/src/include/minix/config

2. export CPROFILE="-Rcmem-p"

3. make install

4. reboot

5. profile get -o pc8139

6. profile reset

7. /usr/bin/cprofalyze.pl 2400 pc8139

But it does not show the call profiling. Am I missing something?

Regarding iperf: I can only test TCP; UDP is failing on Minix. Can
you please run a UDP test on Minix and share the results with us?

Thank you,
--
Jaswinder Singh.

Jaswinder Singh Rajput

Dec 30, 2009, 6:16:31 AM
to Philip Homburg, Niek Linnenbank, Ben Gras, minix3
Dear Philip,

On Wed, Dec 30, 2009 at 4:09 PM, Philip Homburg <phi...@cs.vu.nl> wrote:


> In your letter dated Tue, 29 Dec 2009 21:51:33 +0530 you wrote:
>>iperf is network bandwidth measurement tool, in this case it transfers
>>6.5 MBytes/sec. which is very less with 100% CPU usage. I want to
>>transfer 50 MBytes/seconds means approx 400Mbits/sec with 70-80% CPU
>>usage.
>

> Yes, 200000 for 6.5 Mbyte/s is way too high to get reasonable performance.


>
>>why mini_senda is showing size of 2 ?
>

> I guess that the '2' is the size of the table passed by senda. But I don't
> know where you are getting that number from.
>

I am attaching the patch and proc.c; please check them. I think
mini_senda() was written by you and no one has touched it so far.

> In a loopback test, I expect senda only from FS. In that case 10000 suggests
> that the request size used by iperf is a bit small.
>

In the loopback test the rtl8139 and pci send/receive traffic is not
there; that is why the count is lower.

>>>
>>> An important issue is how big the reads and writes issued by iperf are. Too
>>> small, and performance will suffer.
>>
>>iperf is network bandwidth measurement tool, it keep on transferring
>>the packets based on what time is specified.
>

> But if iperf does reads and writes of, say, one byte at a time, then
> performance will suffer.
>

iperf sends very big packets.

The problem is not with iperf; it works perfectly on Linux.

On a Linux box with iperf:
in the loopback test it transfers 285 MBytes/sec (2.39 Gbits/sec),

and on a Gigabit network it runs at 400 Mbits/sec.

ipc_traffic.zip

Antoine Leca

Dec 30, 2009, 8:16:38 AM
to min...@googlegroups.com
Jaswinder Singh Rajput wrote:

> Dear friends,
>
> I prepared a simple patch to measure Minix IPC traffic, it seems to very high :
>
> 2,00,000 mini_send /sec
> 2,00,000 mini_receive /sec
> 10,000 mini_senda /sec
>
> My plan is to reduce this traffic by 1000 times. Any comments/Hints.

I guess you are speaking about net traffic.

Thinking about it, I am under the impression that many of the IPC
calls deal with I/O ports. Since you are using Realtek cards, you should
have the (hardware) option to use memory-mapped ports instead of I/O-
mapped ones; doing so probably requires a bit of diving into the internals of
the VM server (to map the PCI memory space into the address space of the
driver task), and unfortunately I cannot help here.

But I would like to learn about the result!


Philip Homburg also wrote (privately?):
> [...] for Minix-vmd [...] the cost of a null
> system call is around 2 microseconds. [...]


> I assume systemcalls are more expensive on Minix-3, but I don't
> know how much.

Well, this would be an interesting stat, and one which is not too
difficult to get.

The results are interesting ;-) ; and beyond Philip's idea, while there
might be differences between Minix-vmd and Minix3, there are also
differences between Minix versions themselves!

On a qemu VM (certainly an example of a slow setup, since the host is a P4
2.4 GHz), 2.0.4 spends 17 µs on a getpid() system call, 3.1.2a spends
25 µs (+50%), while 3.1.5 spends 51 µs! (+200%, or ×3 vs 2.0.4)


Antoine

Jaswinder Singh Rajput

Dec 30, 2009, 9:02:44 AM
to Philip Homburg, Niek Linnenbank, Ben Gras, minix3
Hello Friends,

Here I am checking the receive() count :

Here is the log for 2 seconds :

Dec 30 17:19:51 minix kernel: system(64) .. means receive() of system count: 640,000
Dec 30 17:19:51 minix kernel: ms(140) .. means mini_send() count: 1,400,000
Dec 30 17:19:51 minix kernel: mr(150) .. means mini_receive() count: 1,500,000
Dec 30 17:19:51 minix kernel: 8139(8) .. means receive() of 8139 count: 80,000
Dec 30 17:19:51 minix kernel: system(66)
Dec 30 17:19:51 minix kernel: mr(155)
Dec 30 17:19:51 minix kernel: ms(145)
Dec 30 17:19:51 minix kernel: inet(6) .. means receive() of inet count:60,000
Dec 30 17:19:51 minix kernel: system(68)
Dec 30 17:19:51 minix kernel: mr(160)
Dec 30 17:19:51 minix kernel: ms(150)
Dec 30 17:19:51 minix kernel: system(70)
Dec 30 17:19:51 minix kernel: mr(165)
Dec 30 17:19:51 minix kernel: ms(155)
Dec 30 17:19:51 minix kernel: msa(5,2) .. means mini_senda() count:50,000
Dec 30 17:19:51 minix kernel: system(72)
Dec 30 17:19:51 minix kernel: mr(170)
Dec 30 17:19:52 minix kernel: ms(160)
Dec 30 17:19:52 minix kernel: system(74)
Dec 30 17:19:52 minix kernel: mr(175)
Dec 30 17:19:52 minix kernel: ms(165)
Dec 30 17:19:52 minix kernel: system(76)
Dec 30 17:19:52 minix kernel: mr(180)
Dec 30 17:19:52 minix kernel: ms(170)
Dec 30 17:19:52 minix kernel: system(78)
Dec 30 17:19:52 minix kernel: 8139(10)
Dec 30 17:19:52 minix kernel: mr(185)
Dec 30 17:19:52 minix kernel: ms(175)
Dec 30 17:19:52 minix kernel: system(80)
Dec 30 17:19:52 minix kernel: mr(190)
Dec 30 17:19:52 minix kernel: inet(8)
Dec 30 17:19:52 minix kernel: system(82)
Dec 30 17:19:52 minix kernel: ms(180)

So in 2 seconds :

mini_send() : (180 - 140) x 10,000 = 400,000, so 200,000/sec
mini_senda() : 10,000/sec
mini_receive() : (190 - 150) x 10,000 = 400,000, so 200,000/sec
system: receive() : (82 - 64) x 10,000 = 180,000, so 90,000/sec
inet: receive() : (8 - 6) x 10,000 = 20,000, so 10,000/sec
8139: receive() : (10 - 8) x 10,000 = 20,000, so 10,000/sec

Total mini_receive() is 200,000/sec.
And system + inet + 8139 = 110,000/sec, so the remaining 90,000/sec is
coming from sendrec() + system calls.

I am checking call profiling; if it works, it will make my task easier.

Thank you,
--
Jaswinder Singh.


Jaswinder Singh Rajput

Dec 30, 2009, 9:30:04 AM
to min...@googlegroups.com, Philip Homburg
Dear Antoine,

On Wed, Dec 30, 2009 at 6:46 PM, Antoine Leca <Antoine...@gmail.com> wrote:
> Jaswinder Singh Rajput wrote:


>> Dear friends,
>>
>> I prepared a simple patch to measure Minix IPC traffic, it seems to very high :
>>
>> 2,00,000 mini_send /sec
>> 2,00,000 mini_receive /sec
>> 10,000 mini_senda /sec
>>
>> My plan is to reduce this traffic by 1000 times. Any comments/Hints.
>
> I guess you are speaking about net traffic.
>

If Minix IPC is using 70% of the CPU, then how can network traffic
improve? I am looking at Minix IPC traffic and trying to improve it.


> Thinking about it, I am under the impression that there is much IPC
> calls dealing with I/O ports. Since you are Realtek cards you should
> have the (hardware) option to use memory mapped ports instead of I/O
> mapped; doing so probably require a bit of diving into the internals of
> the VM server (to map the PCI memory space into the address space of the
> driver task), and unfortunately I cannot help here.
>

In my log file I have also shown the loopback test, where there is no 8139 and no PCI.


>> [...] for Minix-vmd [...] the cost of a null
>> system call is around 2 microseconds. [...]
>> I assume systemcalls are more expensive on Minix-3, but I don't
>> know how much.
>
> Well, this would be a interesting stat, and one which is not too
> difficult to get.
>

How can I calculate the total number of Minix syscalls?

Antoine Leca

Dec 30, 2009, 9:38:47 AM
to min...@googlegroups.com
Jaswinder Singh Rajput wrote:
> How can I calculate number of total Minix syscalls ?

I guess you mean the rate.

#!/bin/sh
cat >getpid.c <<EOF
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    int n = atoi(argv[1]);

    while (n--)
        getpid();
    return 0;
}
EOF

cc -o getpid getpid.c
chmod +x getpid

time ./getpid 1000000 # result in µs
#time ./getpid 1000000000 # result in ns. Takes longer :-)

exit

Antoine

PAP

Dec 30, 2009, 9:55:09 AM
to minix3
A suggested approach to reduce I/O IPC overhead is presented in:
Enhancing MINIX 3.X Input/Output Performance
http://sites.google.com/site/minix3projects/

PERFORMANCE EVALUATIONS
The tests were performed sending and receiving files through the
RS-232 serial port at 19200 and 38400 bps.
The I/O performance test results are presented in Table 1.
The time units are CPU cycles as reported by the Time Stamp Counter
(TSC).

Table 1: I/O Performance Tests

          MINIX STANDARD      MINIX-HAL
          IN       OUT        IN       OUT
Average   3268     3346       1435     1454
Std Dev   1053     785        754      687

The average time to perform I/O operations with the I/O HAL is 43% of
the average time used by standard MINIX3.
The equipment used for the tests was an Intel Pentium MMX 233.9 MHz
with an L1 code cache of 16 KB, L1 data cache of 16 KB, 96 MB of RAM,
SDRAM access time 12 ns, EDO DRAM access time 60 ns.

Now I am working on enhancing MINIX3 input/output performance through
a virtual machine approach.


Antoine Leca

Dec 30, 2009, 12:13:21 PM
to min...@googlegroups.com
> On Wed, Dec 30, 2009 at 4:09 PM, Philip Homburg <phi...@cs.vu.nl> wrote:
>> Yes, 200000 for 6.5 Mbyte/s is way too high to get reasonable performance.

There is something I cannot match.

I compute that 6.5 Mbytes of TCP traffic means 4333+4333 Ethernet packets
(data+ACK). (The real numbers for packets received appear lower, probably
because the TCP window is larger than the MTU; but let's go with them.)

If there are 200,000 IPCs, it means (at least) 23 messages/packet.
Looks a bit high to me.

So I went on and counted for sending, in the RTL8139 driver:
.5IPC for part of the (32K) TCP send from iperf (via VFS' ioctl)
0 IPC to set up the grant mechanism inside inet
1 IPC for the DL_WRITE_S, replying "SENT"
1 IPC to get the iovec[] data
1 IPC to get each part of the frame (assuming 1 part)
1 IPC(out) to actually send
1 IPC to add the timestamp to the reply (getuptime())
1 IPC(notify) for the "done" interrupt
1 IPC to reenable the IRQ
1 IPC(in) to check the ISR status
1 IPC(out) to acknowledge the interrupt (writing ISR)
1 IPC(in) to read TSAD
1 IPC(in) to read TSDx inside the for(i) loop, assuming the loop works
correctly (I find it a bit difficult to follow)

Total 12.5 (!), still short of the 23 above.

Now while reading (the ACKs), still on the same RTL8139:
.5IPC for part of the traffic with iperf (via VFS' ioctl)
0 IPC to set up the grant mechanism inside inet
1 IPC for the DL_READ_S, replying 0 (because we suspend)
1 IPC(in) to check the Rx status (BUFE), assume empty, suspend
1 IPC(notify) for the "got_something" interrupt
1 IPC to reenable the IRQ
1 IPC(in) to check the ISR status
1 IPC(out) to acknowledge the interrupt (writing ISR)
1 IPC(in) to read TSAD, since we enter the ISR_TER|ISR_TOK case
because of the ||1 at the end the if() case (is it a bug?);
I assume tx_head==tx_tail so we quickly go out of the for(i) loop
1 IPC(in) to check the Rx status (BUFE) in rl_check_ints (spurious?)
2 IPCs(in) to get d_start and d_end
1 IPC to get the iovec[] data
1 IPC (?) to put the data up
1 IPC(out) to reset CAPR
1 IPC(free reply) to signal the RECVed status

Total 14.5 (!), again short of the 23 of the above computation.

While this protocol is a bit fatter than I would have expected (though it
was a very interesting exercise to walk through it), there is still an
important difference in the numbers.

I note there are perhaps 2 "in" operations that could be spared, but it
would not change the whole picture.

I did not inspect the inet server (beyond checking that the grant
mechanism did not introduce any spurious calls while sending or
receiving). In particular, I did not check how many fragments inet was
really passing to the mnx_eth interface; I assumed 1 while sending and 1
while reading above, but I remember from other tests that the actual
numbers might be higher (like 2-3, probably because of the various
checksums); still, I do not expect 10 fragments...
However, based on Jaswinder's data, and particularly when using the
loopback interface (which scores 0 in the above tables), I suspect there
is some fat in the inet server as well; unfortunately I know nothing
about it, so I am reduced to plain hints. Does inet use ioctl() to
communicate between layers? If yes, are those ioctl() calls routed through
the VFS server? (Each ioctl costs 2 IPCs.) If this hypothesis is
correct, the scheme introduced by Jaswinder would more realistically be

iperf <--> vfs <--> inet/tcp <--> vfs <--> inet/ip <--> rtl8139 <--> pci

or perhaps something even more complex, introducing inet/eth...


Antoine

Jaswinder Singh Rajput

Dec 30, 2009, 12:16:40 PM
to min...@googlegroups.com
Dear Antoine,


On Wed, Dec 30, 2009 at 8:08 PM, Antoine Leca <Antoine...@gmail.com> wrote:
> Jaswinder Singh Rajput wrote:
>> How can I calculate number of total Minix syscalls ?
>
> I guess you mean the rate.
>

No, I mean how many Minix syscalls happen in a certain period; I
want to see how many Minix syscalls were called in, say, the last 10 seconds.

Jaswinder Singh Rajput

Dec 30, 2009, 12:31:10 PM
to min...@googlegroups.com
Dear Antoine Leca,

On Wed, Dec 30, 2009 at 10:43 PM, Antoine Leca <Antoine...@gmail.com> wrote:
>> On Wed, Dec 30, 2009 at 4:09 PM, Philip Homburg <phi...@cs.vu.nl> wrote:
>>> Yes, 200000 for 6.5 Mbyte/s is way too high to get reasonable performance.
>
> There is something I do not match.
>
> I compute 6.5 Mbytes TCP means 4333+4333 Ethernet packets (data+ACK)
> (real numbers for received appears lower, probably because the TCP
> window is larger than MTU; but go with them.)
>

Have you tried the patch I posted? How many ms() and mr() are you
getting per second?


> If there are 200,000 IPCs, it means (at least) 23 messages/packet.
> Look like a bit high to me.
>
> So I went on and counted for sending, in the RTL8139 driver:
>  .5IPC for part of the (32K) TCP send from iperf (via VFS' ioctl)
>  0 IPC to set up the grant mechanism inside inet
>  1 IPC for the DL_WRITE_S, replying "SENT"
>  1 IPC to get the iovec[] data
>  1 IPC to get each part of the frame (assuming 1 part)
>  1 IPC(out) to actually send
>  1 IPC to add the timestamp to the reply (getuptime())
>  1 IPC(notify) for the "done" interrupt
>  1 IPC to reenable the IRQ
>  1 IPC(in) to check the ISR status
>  1 IPC(out) to acknowledge the interrupt (writing ISR)
>  1 IPC(in) to read TSAD
>  1 IPC(in) to read TSDx inside the for(i) loop, assuming the loop works
> correctly (I find it a bit difficult to follow)
>
> Total 12.5 (!), still short from the 23 above.
>

How did you get 23?

Have you also tried the loopback test with iperf (steps 7, 8 and 9 on the
same machine; then there is no rtl8139 and pci)?

Antoine Leca

Dec 30, 2009, 1:01:14 PM
to min...@googlegroups.com
Jaswinder Singh Rajput wrote:
> No, I mean how many Minix syscalls happens in certain period I mean I
> want to see how many Minix syscalls called in last 10 seconds.

Ah sorry.

I remember there were features like this inside mdb (Minix debugger),
particularly to trap messages and IPCs, beyond what is usually seen in a
debugger; yet I do not know the status of mdb (David was working on it?)
and I do not even know if this particular feature is in.

Yet I believe the path you were on (hooking mini_send) is the correct
one; perhaps you could improve it to get "profiling" information by
indexing the calls according to the endpoints. I do not know if you are
actually working on that; I did not check the "cprofiling" thing.


Antoine

Antoine Leca

Dec 30, 2009, 1:41:31 PM
to min...@googlegroups.com
Jaswinder Singh Rajput wrote:
> Have you tried the patch I specified, how many ms() and mr() you are
> getting per second ?

I am on it. See below.


>> If there are 200,000 IPCs, it means (at least) 23 messages/packet.
>> Look like a bit high to me.

> How you get 23 ?

I got 4333 from 6.5E6 / 1500, and 23 from 200E3 / (2*4333).


> Have you also tried loopback test with iperf (step 7,8 and 9 on same
> machine, then there will be no rtl8139 and pci)

Yes, and the throughput greatly increases in my (very modest) testbed,
going from 2.5 Mbps (.3 MB/s) with the NIC up to 12.3 Mbps (1.47 MB/s)
with lo; as the raw performance is worse than on your hardware, it might
confirm your hypothesis that the IPC mechanism indeed takes the lion's
share of the lost performance.


<LATER>
I finalized the tests (and I have to go home, so EOT)

About 13,000 IPC/s with the rtl8139 (emulated, virtual) NIC to get .3 MB/s,
and 7,000 IPC/s for 1.5 MB/s with the loopback interface. One measurement
only. The idle system settles at about 300 IPC/s (with iperf -s).

That boils down to 33 IPCs/frame with the NIC involved,
and 3.5 IPCs/frame without it, assuming 1500 byte frame.

Also, given 51 µs for an empty system call (getpid), 13,000 IPC/s means
66% of the whole time, while with 7,000 IPC/s it "lowers" to 35%; the margin
increase is +90% (×1.9), yet the bandwidth grows +400% (×5). I would
conclude that the NIC work (inet/generic/eth + inet/mnx_eth + rtl8139) would
eat more than half of the available time, once the IPC overhead is
discarded. But this is based on one measurement only, clearly not sufficient
even if it looks reasonable.


Happy New Year to all!

Antoine

Jaswinder Singh Rajput

Dec 30, 2009, 1:49:04 PM
to Philip Homburg, Niek Linnenbank, Ben Gras, minix3
Dear Philip,

Here is the test with rtl8139 :

Dec 31 00:02:20 minix kernel: Realtek RTL 8139 statistics of port 0:
Dec 31 00:02:20 minix kernel: recvErr   : 0  sendErr    : 0  OVW        : 0
Dec 31 00:02:20 minix kernel: CRCerr    : 0  frameAll   : 0  missedP    : 0
Dec 31 00:02:20 minix kernel: packetR   : 1  packetT    : 2  transDef   : 0
Dec 31 00:02:20 minix kernel: collision : 0  transAb    : 0  carrSense  : 0
Dec 31 00:02:20 minix kernel: fifoUnder : 0  fifoOver   : 0  CDheartbeat: 0
Dec 31 00:02:20 minix kernel: OWC       : 0  Interrupts : 3  re_flags = 0x390
Dec 31 00:02:20 minix kernel: TSAD: 0x300f, TSD: 0x0000a03c, 0x0000a03c, 0x00002000, 0x00002000
Dec 31 00:02:20 minix kernel: tx_head 2, tx_tail 2, busy: 0 0 0 0
Dec 31 00:02:32 minix kernel: system(16)
Dec 31 00:02:32 minix kernel: ms(35)
Dec 31 00:02:32 minix kernel: mr(40)
Dec 31 00:02:32 minix kernel: system(18)
Dec 31 00:02:32 minix kernel: ms(40)
Dec 31 00:02:32 minix kernel: mr(45)
Dec 31 00:02:32 minix kernel: system(20)
Dec 31 00:02:32 minix kernel: ms(45)
Dec 31 00:02:32 minix kernel: system(22)
Dec 31 00:02:32 minix kernel: mr(50)
Dec 31 00:02:33 minix kernel: ms(50)
Dec 31 00:02:33 minix kernel: system(24)
Dec 31 00:02:33 minix kernel: mr(55)
Dec 31 00:02:33 minix kernel: ms(55)
Dec 31 00:02:33 minix kernel: system(26)
Dec 31 00:02:33 minix kernel: mr(60)
Dec 31 00:02:39 minix kernel:
Dec 31 00:02:39 minix kernel: Realtek RTL 8139 statistics of port 0:
Dec 31 00:02:39 minix kernel: recvErr   : 0     sendErr    : 0     OVW        : 0
Dec 31 00:02:39 minix kernel: CRCerr    : 0     frameAll   : 0     missedP    : 0
Dec 31 00:02:39 minix kernel: packetR   : 4226  packetT    : 4432  transDef   : 0
Dec 31 00:02:39 minix kernel: collision : 0     transAb    : 0     carrSense  : 0
Dec 31 00:02:39 minix kernel: fifoUnder : 1     fifoOver   : 0     CDheartbeat: 0
Dec 31 00:02:39 minix kernel: OWC       : 0     Interrupts : 7682  re_flags = 0x390
Dec 31 00:02:39 minix kernel: TSAD: 0xf00f, TSD: 0x0000a03c, 0x0000a03c, 0x0000a03c, 0x0000a03c
Dec 31 00:02:39 minix kernel: tx_head 0, tx_tail 0, busy: 0 0 0 0



(1) the number of interrupts received : 7679
the number of interrupts sent : 7919
total interrupts = 7679 + 7919 = 15598
(2) the number of packets received : 4225
the number of packets transmitted : 4430
Total packets : 4225 + 4430 = 8675
(3) the number of mini_send : (55 - 35) x 10,000 = 200,000
the number of mini_receive : (60 - 40) x 10,000 = 200,000
(4) Duration of this period : 1 second
(5) Useful data transferred : 5.87 MBytes (6155141.12 bytes)
(6) Bandwidth : 48.4 Mbits/sec

So this test needs 6155141.12 / 1514 = approx. 4065 packets, but it
used more than double that.

And it used a lot of interrupts: 15598.

Thank you,
--
Jaswinder Singh.


D.C. van Moolenbroek

Dec 30, 2009, 2:19:38 PM
to minix3
Hi,

> I remember there were features like this inside mdb (Minix debugger),
> particularly to trap messages and IPCs, beyond what is usually seen in a
> debugger; yet I do not know the status of mdb (David was working on it?)
> and I do not even know if this particular feature is in.

Indeed, mdb used to have limited support for system call (IPC)
tracing, but that part has been broken and disabled for a long time.
I've been working on a separate system call/message trace utility
"mtrace" for a while now, and a first version should be finished soon
(ish).

However, while such tools can tell you what processes are doing, any
ptrace-based approach would itself be so slow as to be unsuitable for
taking performance-related measurements.

Regards,
David

Antoine Leca

Dec 31, 2009, 6:29:18 AM
to min...@googlegroups.com
Jaswinder Singh Rajput wrote:
> (1) the number of interrupts received : 7679
> the number of interrupts send : 7919
> total interrupt = 7679 + 7919 = 15598

I do not reach this conclusion. I indeed see 7679 interrupts received by
rtl8139 driver, but nothing else, at least in the log you sent.
Furthermore, each "interrupt received" by the driver have to been "sent"
by something, but it is all the same interrupt, just seen by two
different counters.

> (2) the number of packets received : 4225
> the number of packets transmitted : 4430
> Total packets : 4225 + 4430 = 8675

> (5) Useful data transferred : 5.87 MBytes (6155141.12 bytes)
> (6) Bandwidth : 48.4 Mbits/sec
>
> So this test need 6155141.12 / 1514 = approx 4065 packets but it
> wasted more than double packets for it.

Again I do not see that. Assuming minix is iperf's "client", it has
to send the data; that's 6,155,141/1500 = 4104 packets, and it sent 4430 --
that's OK for me. The server got them and, since this is a TCP
transmission, should send back ACK packets; perhaps that number (4225)
is a bit high given a 32K window, yet it does not seem overwhelming.


> And wasted lot of interrupts 15598.

I got 7700, to be compared with 8675 packets.
And this reflects the way the driver is programmed: it interrupts for
every packet received (other ways are more CPU intensive), and when
sending it uses interrupts so it can return to the caller as soon as
possible, before the data is actually delivered over the wire.


I do not know enough about TCP/IP and its implementations to discuss
whether these numbers are correct, but they do not look completely wrong
to me.


Antoine

Jaswinder Singh Rajput

Dec 31, 2009, 8:30:40 AM
to Philip Homburg, Niek Linnenbank, Ben Gras, minix3
Dear Philip,

On Thu, Dec 31, 2009 at 1:16 AM, Philip Homburg <phi...@cs.vu.nl> wrote:
>>
>>(1) the number of interrupts received : 7679
>>the number of interrupts send : 7919
>>total interrupt = 7679 + 7919 = 15598
>>(2) the number of packets received : 4225
>>the number of packets transmitted : 4430
>>Total packets : 4225 + 4430 = 8675
>>(3) the number of mini_send : 55 - 35 = 200,000
>>the number of mini_receive : 60 - 40 = 200,000
>>(4) Duration of this period : 1 second
>>(5) Useful data transferred : 5.87 MBytes (6155141.12 bytes)
>>(6) Bandwidth : 48.4 Mbits/sec
>>
>>So this test need 6155141.12 / 1514 = approx 4065 packets but it
>>wasted more than double packets for it.
>>
>>And wasted lot of interrupts 15598.
>

> Maybe I'm missing something, but I see a total of 7682 interrupts. Where is the
> 7919 coming from?
>

7919 is calculated on the Linux side, which is the iperf client; I
calculated it from the difference in cat /proc/interrupts.

> The number of packets makes sense. For every packet that is received, inet
> also sends an ACK. The RFCs suggest to send an ACK every other packet, but
> that never seemed worth the extra complexity.
>

I am sorry, but I am not happy with this: the 8139 reads only a single
packet per interrupt and sends an ACK. I am planning to change it to
support reading multiple packets per interrupt.

A major problem with inet, which affects all kinds of networks:

1. It does not read/write more than one packet (1514 bytes) per interrupt

2. It further divides the 1514-byte packet like this:

For reading: to read a 1514-byte packet it uses 3 cycles:
a. 512 bytes
b. 512 bytes
c. 494 bytes

For writing: to write a 1514-byte packet it uses 6 cycles:
a. 14 bytes
b. 40 bytes
c. 400 bytes
d. 512 bytes
e. 512 bytes
f. 36 bytes

So due to this, the 100 Mbit 8139 driver can only read at 48 Mbits/sec and
write at 12 Mbits/sec on Minix.

Minix vs Linux Network Performance
----------------------------------

A. Speed of any Minix Networking loopback without Network driver :
approx 250 Mbits/sec
B. Speed of any Linux Networking loopback without Network driver :
approx 2.5 Gbits/sec
C. Reading speed of any 100/1000 Mbit Minix Ethernet driver : approx
55 Mbits/sec
D. Writing speed of any 100/1000 Mbit Minix Ethernet driver : approx 9 Mbits/sec
E. Reading speed of any 100Mbit Linux ethernet driver : approx 94 Mbits/sec
F. Writing speed of any 100Mbit Linux ethernet driver : approx 94 Mbits/sec
G. Reading speed of any 1000Mbit Linux ethernet driver : approx 505 Mbits/sec
H. Writing speed of any 1000Mbit Linux ethernet driver : approx 505 Mbits/sec

So overall, Minix networking is almost 10 times slower than Linux,
and the Minix network driver model is very bad.

> So the number of interrupts and the number of packets seem to add up.
>
> I just did a quick test between a FreeBSD system and an FXP in a Celeron 766
> running Minix-vmd. In that test, about 90 Mbit/s was received. So I don't
> expect either the number of ACKs or the processing speed of inet to be an
> issue.
>

Can you please let me know how I can test Minix-vmd?

Niek Linnenbank

unread,
Dec 31, 2009, 9:25:23 AM12/31/09
to Jaswinder Singh Rajput, Philip Homburg, Ben Gras, minix3
Hi Jaswinder,


That's a very interesting conclusion! I came to the exact same conclusion a while ago, and even
wrote a small test program to test the difference in performance, called ethbench (see attachment). This ethbench mimics
the protocol used by INET, except that it passes multiple network packets per message. I tested it with a modified Ethernet
driver (I used e1000) to send Ethernet packets to a Linux machine, and achieved the full 1000 Mbit :-)

ethbench-multi.tar

Jaswinder Singh Rajput

unread,
Dec 31, 2009, 2:52:29 PM12/31/09
to Philip Homburg, Niek Linnenbank, Ben Gras, minix3
Happy New Year !!

On Thu, Dec 31, 2009 at 1:16 AM, Philip Homburg <phi...@cs.vu.nl> wrote:

> Maybe I'm missing something, but I see a total of 7682 interrupts. Where is the
> 7919 coming from?
>

I further debugged it.

(1) the number of interrupts received : 7679
the number of interrupts send : 7919

total interrupts = 7679 + 7919 = 15598


(2) the number of packets received : 4225
the number of packets transmitted : 4430

Total packets : 4225 + 4430 = 8655

In r8139 interrupts : 7679

ROK interrupts : 4225 (Received packets)
TOK interrupts : 4430 (Transmitted data + ACKs)
Shared interrupts (ROK + TOK) : 976

So 4225 + 4430 - 976 = 7679

We do not need to count the 7919 interrupts on the Linux side, as they
are also ROK + TOK + shared on the Linux side.

So 7679 Interrupts are OK.

Jaswinder Singh Rajput

unread,
Jan 3, 2010, 11:22:12 AM1/3/10
to Philip Homburg, Niek Linnenbank, Ben Gras, minix3
Hello,

I have further investigated mini_send and mini_receive.

mini_send() is called from 4 places:
1. sys_call with call_nr = SEND <- send()
2. sys_call with call_nr = SENDNB <- sendnb()
3. lock_send() <- sys_task()
4. sys_call with call_nr = SENDREC <- sendrec()

Here ms(85 = s:4 + b:0 + l:38 + sr:41) mini_send (s: SEND b: SENDNB l:
lock_send sr: SENDREC)

and mini_receive() is called from 2 places:
1. sys_call with call_nr = RECEIVE <- receive()
2. sys_call with call_nr = SENDREC <- sendrec()

Here mr(90 = r:49 + sr:40) mini_receive (r: RECEIVE sr: SENDREC)

Jan 3 20:11:47 minix kernel: system(38)
Jan 3 20:11:47 minix kernel: mr(90 = r:49 + sr:40)
Jan 3 20:11:47 minix kernel: ms(85 = s:4 + b:0 + l:38 + sr:41)
Jan 3 20:11:47 minix kernel: system(40)
Jan 3 20:11:47 minix kernel: mr(95 = r:51 + sr:43)
Jan 3 20:11:47 minix kernel: ms(90 = s:4 + b:0 + l:41 + sr:43)
Jan 3 20:11:47 minix kernel: system(42)
Jan 3 20:11:47 minix kernel: mr(100 = r:54 + sr:45)
Jan 3 20:11:47 minix kernel: msa(3,2)
Jan 3 20:11:47 minix kernel: ms(95 = s:4 + b:1 + l:43 + sr:45)
Jan 3 20:11:47 minix kernel: system(44)
Jan 3 20:11:47 minix kernel: mr(105 = r:57 + sr:47)
Jan 3 20:11:47 minix kernel: 8139(6)
Jan 3 20:11:47 minix kernel: ms(100 = s:4 + b:1 + l:45 + sr:48)
Jan 3 20:11:47 minix kernel: system(46)
Jan 3 20:11:47 minix kernel: inet(4)
Jan 3 20:11:47 minix kernel: mr(110 = r:60 + sr:49)
Jan 3 20:11:47 minix kernel: system(48)
Jan 3 20:11:47 minix kernel: ms(105 = s:5 + b:1 + l:48 + sr:50)

So in 1 second :

mini_send() : 200,000
1. sys_call with call_nr = SEND <- send() : 10,000
2. sys_call with call_nr = SENDNB <- sendnb() : 10,000
3. lock_send() <- sys_task() : 100,000
4. sys_call with call_nr = SENDREC <- sendrec() : 90,000

mini_receive() : 200,000
1. sys_call with call_nr = RECEIVE <- receive() : 110,000
2. sys_call with call_nr = SENDREC <- sendrec() : 90,000

It shows the major load is due to lock_send() and sendrec().

lock_send() is called from only one place, sys_task(); can you give me
some hint why it is doing so many sends?

sendrec() is called from more than 40 places; I am doing further
investigation of sendrec(). Once again, if I get call-profiling support
it will make my task much easier.

Thank you,
--
Jaswinder Singh.

On Thu, Dec 31, 2009 at 1:16 AM, Philip Homburg <phi...@cs.vu.nl> wrote:
> In your letter dated Thu, 31 Dec 2009 00:19:04 +0530 you wrote:

> Maybe I'm missing something, but I see a total of 7682 interrupts. Where is the
> 7919 coming from?
>

> The number of packets makes sense. For every packet that is received, inet
> also sends an ACK. The RFCs suggest to send an ACK every other packet, but
> that never seemed worth the extra complexity.
>

Antoine Leca

unread,
Jan 4, 2010, 10:14:50 AM1/4/10
to min...@googlegroups.com, Philip Homburg
I finally realized that Philip does not seem to be subscribed to the
list, or at the very least his answers are not going to the list.
Sorry, Philip, for having left you out of my earlier posts; I wonder if
cross-posting to comp.os.minix would not be a good idea?

Jaswinder Singh Rajput wrote:
> On Thu, Dec 31, 2009 at 1:16 AM, Philip Homburg <phi...@cs.vu.nl> wrote:
>>> (1) the number of interrupts received : 7679
>>> the number of interrupts send : 7919
>>> total interrupt = 7679 + 7919 = 15598
>>> (2) the number of packets received : 4225
>>> the number of packets transmitted : 4430
>>> Total packets : 4225 + 4430 = 8675
>>> (3) the number of mini_send : 55 - 35 = 200,000
>>> the number of mini_receive : 60 - 40 = 200,000
>>> (4) Duration of this period : 1 second
>>> (5) Useful data transferred : 5.87 MBytes (6155141.12 bytes)
>>> (6) Bandwidth : 48.4 Mbits/sec
>>>
>>> So this test need 6155141.12 / 1514 = approx 4065 packets but it
>>> wasted more than double packets for it.
>>>

>> The number of packets makes sense. For every packet that is received, inet
>> also sends an ACK. The RFCs suggest to send an ACK every other packet, but
>> that never seemed worth the extra complexity.

Perhaps^W Probably I misunderstood something (and I must confess I did
not read RFC793 :-( ), but my idea of the TCP window size was that you
are supposed to send an ACK at the end of the "window", assuming that
the stream of packets is coming in "regularly".

On the other side, I empirically observed (much to my surprise at the
time) that the Windows stack sent about half as many ACKs as packets it
received; I guess I should take this as a confirmation. ;-)

At the end of the day, this means supplementary traffic of half the
number of frames sent by Minix' inet (when acting as receiver of the
data). Of course the overhead is negligible over the wire (since these
are small packets, probably 64 bytes or something like that); the real
overhead is within the stack.


> I am sorry, but I am not happy with this, 8139 read only single packet
> per interrupt and send ack. I am planning to change it to support read
> multiple packets per interrupt.

The problem above only occurs when Minix is receiving data and sending
ACKs, so the overhead is only in the "writing" path.

Furthermore, I do not see how you can prevent the RTL8139 from
interrupting for every received frame. I believe polling is not an option
with Minix: clock granularity is 60 Hz, which at 100 Mb/s means as much
as 200 kbytes per tick, far more than the RTL8139 can buffer; even the
increased capacity (up to 1024 frames) of the RTL8169 can be
insufficient, and at 1 Gb/s it clearly would be.

On the tx side, I believe the interrupt could be avoided (provided 4
buffers, as on the RTL8139) if the driver waits for the chip to actually
send the data before returning (instead of the current model, where the
driver returns to Inet as soon as possible, then cleans up when the chip
interrupts it later.)
However there is a big difference here: such a driver _and_ such an Inet
server should be able to deal with many (up to 4) frames at the same
moment, which means they should be multi-programmed (or multi-tasked if
you prefer), while the current model is much easier, allowing only one
frame to be dealt with at any given time...


> major problem with inet and it affects to all kinds networks :-
>
> 1. It does not read/write more than one packet (1514 bytes) per interrupt
>
> 2. It further divided the packet of 1514 like this :
>
> For reading : To read 1514 bytes packet it uses 3 cycles :
> a. 512 bytes
> b. 512 bytes
> c. 494 bytes

Clearly we see an artefact of some undersized intermediary buffer.

Did you try to increase BUF_S to 2048 in inet/const.h, along with
switching the setup of buffers in inet/buf.c?

I believe it should work (a little) better.

At least it appears to do so here. The number of IPCs to move data
between the driver and inet drops from 3 to 1 while receiving, and from
5.82 to 3.75 when sending (see below for the discussion). Throughput
increased from 3.0 Mb/s up to
3.45 Mb/s (that's +15%, not 10-fold), while I did not notice significant
changes in the number of IPCs/s (using your tool).

>
> For writing : To write 1514 bytes packet it uses 6 cycles :
> a. 14 bytes
> b. 40 bytes
> c. 400 bytes

OK, this one is different: I believe we are seeing the various headers
(Ethernet then TCP/IP; a wild guess here, I did not look at the code)
which are probably computed in various places, then the Eth layer is
using the vectored I/O facility to avoid moving data within the inet
server. I believe this is a typical trade-off between ease of
development and performance.
Perhaps Philip (or any developer fluent enough with the inet server...
is Michael still around?) can help us by making the relevant changes
to test the impact?

> d. 512 bytes
> e. 512 bytes
> f. 36 bytes

Clearly (spurious) cases d and e are the same as above (buffer
undersized), with a similar cure as my test might show.


> Minix vs Linux Network Performance

... are completely unfair: Linux is targeting high-performance
networking, so it should be one of the leaders in this area; clearly this
is not the same target for Minix3.
In fact, current Minix3 is just the opposite of a base OS for a
networking server: the design is unstable, performance is almost never
considered as long as the behaviour is "workable", and a lot of hooks
(like port I/O using the IPC mechanism) are present to provide features
like resiliency which run completely counter to your point, etc.

Furthermore, Minix is suffering from a shortage of resources; so if
Minix's networking code is to be improved, I believe your contributions
will be welcome.


> And Minix network driver model is very bad.

One of the nice things about the microkernel approach is that it is easy
to wipe out a defective part and replace it completely with a working one.

Yet I am afraid there are few "good" solutions, once you accept that the
hardware-dependent driver should be separated from the server, and also
that the (privileged) I/O operations incur the IPC overhead (both
decisions are deeply entrenched in the Minix 3 philosophy.)

See above about sending multiple frames at once (and no Tx interrupts),
but I feel there are significant changes into Inet to make it possible...

It should be possible to authorize drivers to do I/O directly on some
designated ports: the i386 hardware has a mechanism for that, using the
"iomap" (see src/kernel/i386/protect.c, currently commented out); but
that would mean giving a distinct TSS to each and every driver, a
significant change in the internals (and since drivers are started by
RS, would lead to having a hardware-dependent part in RS, somewhat a
weird idea.) Also there is a performance cost to using the iomap.

Much easier would be to change the drivers (for the capable hardware) to
use memory-mapped registers instead of I/O ports; then bringing in a
mechanism at VM server level to actually do that mapping.
However this is vaporware, as it is very much akin to user-space mmap(),
something that people are expecting to come (ain't you, Roman? :-))


>> So the number of interrupts and the number of packets seem to add up.
>>
>> I just did a quick test between a FreeBSD system and an FXP in a Celeron 766
>> running Minix-vmd. In that test, about 90 Mbit/s was received. So I don't
>> expect either the number of ACKs or the processing speed of inet to be an
>> issue.

Philip,
I shall send you my other messages separately, but in a nutshell: IPC
costs have increased a lot: on an (admittedly weak) qemu on top of a
P4/2.4GHz, for a getpid() system call, the time went from 17 µs with
plain old 2.0.4, to 25 µs (+50%) with 3.1.2a, while 3.1.5 with virtual
memory spends 51 µs! I know VM is a killer for emulators, yet 3.1.5 has
improved a lot here over 3.1.4; I have no idea of the performance of VMD
vs. 3.1.5 on the bare metal.
Also the 3.1.x model for drivers incurs a substantial penalty for any
I/O port operation.


> Can you please let me know how can I test Minix-vmd.

http://www.minix-vmd.org has a lot of information.


Antoine

Tomas Hruby

unread,
Jan 4, 2010, 1:51:11 PM1/4/10
to min...@googlegroups.com
> Furthermore, I do not see how you can avoid the RTL8139 to interrupt for
> every received frame. I believe polling is not an option with Minix,
> clock granularity is 60Hz, which at 100 Mb/s means as much as 200
> kbytes, far more than the RTL8139 can buffer; even the increased
> capacity (up to 1024 frames) of the RTL8169 can be insufficient, and at
> 1 Gb/s it clearly would.

Indeed polling is an option on the RX side and, as proven by the NAPI
model in Linux, it is _essential_ to reach more than a few hundred
Mbit/s, especially as Minix has inherently more overhead than Linux.

Imho the same needs to be done on the TX side so that the driver can
poll inet. I think a major rework is required to place shared ring
buffers between drivers and inet on both the TX and RX side, and possibly
between other parts of the networking stack if inet is chopped into
more parts in the future. This would mimic the NAPI model:
messages between inet and drivers would be changed into notifies
(which are much cheaper as they are delivered cumulatively as a single
message) and copying would be reduced by exploiting shared memory. On
the TX side, inet would be able to pass more than a single packet to
the driver under high load, which Niek has already shown to be
necessary to reach gigabit.

The even more interesting part of this rework is that the very same
model is applicable in other parts of a multiserver system like Minix,
if designed in a generic way.

T.

Tomas Hruby

unread,
Jan 4, 2010, 1:59:15 PM1/4/10
to min...@googlegroups.com
> Philip Homburg also wrote (privately?):
> > [...] for Minix-vmd [...] the cost of a null
> > system call is around 2 microseconds. [...]
> > I assume systemcalls are more expensive on Minix-3, but I don't
> > know how much.
>
> Well, this would be a interesting stat, and one which is not too
> difficult to get.
>
> The results are interesting ;-) ; and beyond Philip's idea, while there
> might be differences between Minix-VMD and Minix3, there are also
> differences between Minix3 and Minix3!

I don't know the difference between minix-vmd and 3; however, every
syscall is a pretty expensive operation, as the system task has to be
scheduled first and the message has to be copied to the system task's
address space.
This overhead will be removed soonish by removing the system task.
T.

Leith

unread,
Jan 4, 2010, 2:12:31 PM1/4/10
to min...@googlegroups.com

This overhead will be removed soonish by removing the system task.

               T.

--


Not to take this thread in a different direction, but that comment interested me.  What is the plan to remove the system task?

Antoine Leca

unread,
Jan 5, 2010, 5:35:57 AM1/5/10
to min...@googlegroups.com
Tomas Hruby wrote:
>> Furthermore, I do not see how you can avoid the RTL8139 to interrupt for
>> every received frame. I believe polling is not an option with Minix,
>> clock granularity is 60Hz, which at 100 Mb/s means as much as 200
>> kbytes, far more than the RTL8139 can buffer; even the increased
>> capacity (up to 1024 frames) of the RTL8169 can be insufficient, and at
>> 1 Gb/s it clearly would.
>
> Indeed polling is an option on the RX side and as proven by the NAPI
> model in Linux it is _essential_ to reach more then a few 100s of
> Mbit/s especially as Minix has inherently more overheads then Linux.

Sorry I was not clear. Of course polling is definitely a way to
achieve high performance; I took that as a given. My point was, how
can you _implement_ polling at 100 Mb/s when the receiver's capacity is
0.5 Mb (64K), and the obvious poll (system alarms) runs at 60 Hz?

Thinking more about it, I found a way. The RTL8139 has an internal timer
running at the PCI clock frequency, so it should be possible to create a
local (to the driver) timer at 200 Hz, and activate it instead of
at-each-packet interrupts when the stream is sustained (to avoid putting
stress on the whole system when the interface is idle); the drawback is
of course increased complexity in the driver, for example to detect
the "stream is sustained" state (since Inet does not provide any hint).


> Imho the same needs to be done on the TX side so that the driver can
> poll inet.

Sorry, I do not understand that. On the tx side, it should be Inet which
drives the game, shouldn't it? What might be done is that Inet
polls the driver to learn if it can send many frames in a burst, without
waiting for intermediary acknowledgements; but I do not see a
significant advantage in doing such a poll, while I do see problems when
the driver cannot fulfill the promise made in the poll.

IMHO, the problem on the tx side is after the rendez-vous rather than
before it: if the (tx side of the) driver does not have sufficient
resources to achieve the rendezvous, it blocks tx and waits (perhaps
till a reset), I do not see a need for an additional poll.

On the other hand, after the rendezvous, current hardware can store
multiple frames pending to be sent, so we should enable some overlapping
mechanism to allow the driver to take advantage of this; this in turn
will require a mechanism to notify inet afterwards that sending is
indeed completed (up to a given frame).
The easiest way I can see with current IPC mechanisms is to allow
multiple DL_WRITEs concurrently, each returning only when the send is
completed; thus requiring the driver to be multi-programmed, as I said
earlier. I have no idea of the additional complexity it adds to Inet.


> I think a major rework is required to place shared ring buffers [...]

I agree this would be a major rework!

Does it mean we should abandon any new work on the current model,
pending to further definition of this rework to take place, or at least
to be concretized by some design documents?


Antoine

Tomas Hruby

unread,
Jan 5, 2010, 5:05:57 PM1/5/10
to min...@googlegroups.com
On Tue, Jan 05, 2010 at 11:35:57AM +0100, Antoine Leca wrote:
> Tomas Hruby wrote:
> >> Furthermore, I do not see how you can avoid the RTL8139 to interrupt for
> >> every received frame. I believe polling is not an option with Minix,
> >> clock granularity is 60Hz, which at 100 Mb/s means as much as 200
> >> kbytes, far more than the RTL8139 can buffer; even the increased
> >> capacity (up to 1024 frames) of the RTL8169 can be insufficient, and at
> >> 1 Gb/s it clearly would.
> >
> > Indeed polling is an option on the RX side and as proven by the NAPI
> > model in Linux it is _essential_ to reach more then a few 100s of
> > Mbit/s especially as Minix has inherently more overheads then Linux.
>
> Sorry I was not clear. Of course polling is definitively a way to
> achieve high performances, I took it as an evidence. My point was, how
> can you _implement_ polling at 100 Mb/s when the receiver's capacity is
> 0.5 Mb (64K), and the obvious poll (system alarms) has 60Hz frequency?
>
> Thinking more to it, I find a way. The RTL8139 have a internal timer
> running at PCI clock frequency, so it should be possible to create a
> local (to the driver) timer at 200 Hz, and activate it instead of
> at-each-packet interrupts when the stream is sustained (to avoid putting
> stress on the whole system when the interface is idle); the drawback is
> of course an increased complexity in the driver, for example to detect
> the "stream is sustained" state (since Inet does not provide any hint).

NAPI is a mix of interrupts and polling. After a driver receives an
interrupt it starts polling the NIC for as long as it can get any new
packets. The interrupts from the device are disabled in the meantime.
If there are no new packets available, it falls back to
interrupt-driven mode. No need for any timers. Actually, any timer
would have similar overhead to an interrupt. In addition, it is a
completely driver-specific problem, i.e. some drivers can use this
mode and some cannot.

>
> > Imho the same needs to be done on the TX side so that the driver can
> > poll inet.
>
> Sorry, I do not understand that. On the tx side, it should be Inet which
> drives the game, shouldn't it? What might be done could be that Inet
> polls the driver to learn if it can burstly send many frames, without
> waiting for intermediary acknowledgements; but I do not see a
> significant advantage in doing such a poll, while I do see problems when
> the driver cannot fullfill the promise done in the poll.
>
> IMHO, the problem on the tx side is after the rendez-vous rather than
> before it: if the (tx side of the) driver does not have sufficient
> resources to achieve the rendezvous, it blocks tx and waits (perhaps
> till a reset), I do not see a need for an additional poll.

In my view, it is very similar to RX, only the producer is
not the NIC but inet. There are various options using this model to
reduce the IPC overhead per packet. In any case, if the driver sees
that there are more packets ready for TX it can hand them off to the
NIC at once.

As a side note, it becomes even more interesting if inet and the
driver(s) are on different CPUs and run in parallel.

> > I think a major rework is required to place shared ring buffers [...]
>
> I agree this would be a major rework!
>
> Does it mean we should abandon any new work on the current model,
> pending to further definition of this rework to take place, or at least
> to be concretized by some design documents?

I am merely trying to argue that imo more needs to be done than
optimizing the current model, and to share some ideas which I find
potentially interesting to work on. If IPC is the major factor and it
takes 23 messages to transmit a single packet, you can only optimize by
a factor of 23, not 1000. Another note is that the previous discussion
talks about transmitting 200,000 packets per second (iirc). The worst
case for gigabit is an order of magnitude higher!

T.

Tomas Hruby

unread,
Jan 5, 2010, 5:08:55 PM1/5/10
to min...@googlegroups.com

There is no need for a system task. It only adds overhead and
complexity. Therefore the plan is to handle system calls straight away
in the kernel.

T.

Leith

unread,
Jan 5, 2010, 5:50:13 PM1/5/10
to min...@googlegroups.com
That is interesting...who is working on that? Is there anything
documented on that yet?

AntoineLeca

unread,
Jan 6, 2010, 10:03:33 AM1/6/10
to minix3
On Jan 5th, 22:05Z, Tomas Hruby wrote:
> On Tue, Jan 05, 2010 at 11:35:57AM +0100, Antoine Leca wrote:
> NAPI is a mix of interrupts and polling. After a driver receives an
> interrpt it starts polling the NIC as long as it can get any new
> packets. The interrupts from the device are disabled in the meantime.
> If there are no new packets available, it falls back to interrupt
> driven mode. No need for any timers.

Ah OK, I did not know that. It makes sense.
However, I fail to understand a fine point: assuming the packets are
coming in at 90 Mb/s, and the backoffice (driver + inet + bus +
infrastructure overhead) is able to pass say 120 Mb/s,
you'll get about 1 interrupt every 2 packets, at most 3, won't you?

As I understand it: the interrupt comes when the first packet is
received; once it is fully received and passed up, more than 135 µs
have elapsed so the 2nd is now there, it is also handled, and 1 full
interrupt cycle is avoided;
yet at the end of the 2nd packet, about 240 µs have elapsed, that's less
than 270 µs so the driver returns to interrupt mode; is that a correct
description?


> There are various option using this model to
> reduce the IPC overhead per packet. In any case, if the driver sees
> that there are more packets ready for TX it can hand them of to the
> NIC at once.

So I believe we are in agreement here.


> If IPC is the major factor and it takes 23
> messages to transmit a single packet you can only optimize by a factor
> of 23 not 1000.

:-) 1000 was probably just a word; anyway the number 23 above is based
on 100 Mb/s hardware transferring "only" 6.5 Mbytes/s (52 Mb/s), so
the whole margin for optimization is at most 2, not 23 either!

> Another note is that the previous discussion talks
> about transmitting 200.000 packets per seccond (iirc).

Did not see that.
I did see a stat about 200,000 (2e+5) IPCs/s to achieve 6.5 Mbytes/s
which turns into 8700 packets transmitted (counting the TCP ACK),
hence 23 IPCs/(full) packet interaction (of which I can identify only
18, to which we should add the IRQ cost in IPCs).
However I did not see a computation if the time of a single IPC on
that hardware.

For a different (low-performing) target I did compute the IPC "cost"
to be below 51 µs (to be compared to 2 µs for Philip on his low-end,
bare-metal machines, so clearly there is need for further measurements);
and I did ONE (!) test counting IPCs, resulting in a max throughput
of 300 kbytes/s (!) with 13,000 IPC/s, so 32.5 IPC/packet; this
boils down to an upper value of 65% occupation just for IPC (but it
could be less if the 51 above is exaggerated, which is quite possible.)

Clearly this latter target is not capable of 1Gb/s, even under Linux!
Also, with those two points I cannot establish a cost of IPC/packet
with Minix, it seems to me that there are other factors than Ethernet
throughput.

Since my testbed is "out of the charts", it would be good to have more
testcases.


Antoine

Tomas Hruby

unread,
Jan 6, 2010, 2:48:20 PM1/6/10
to min...@googlegroups.com
On Wed, Jan 06, 2010 at 07:03:33AM -0800, AntoineLeca wrote:
> On Jan 5th, 22:05Z, Tomas Hruby wrote:
> > On Tue, Jan 05, 2010 at 11:35:57AM +0100, Antoine Leca wrote:
> > NAPI is a mix of interrupts and polling. After a driver receives an
> > interrpt it starts polling the NIC as long as it can get any new
> > packets. The interrupts from the device are disabled in the meantime.
> > If there are no new packets available, it falls back to interrupt
> > driven mode. No need for any timers.
>
> Ah OK, I did not know that. It makes sense.
> However, I fail to understand a fine point: assuming the packets are
> coming at 90 Mb/s, and the backoffice (driver + inet + bus +
> infrastructure structure overhead) is able to pass say 120 Mb/s,
> you'll get about 1 interrupt each 2 packets, at most 3, won't you?
>
> As I understand it: the interrupt comes when the first packet is
> received; once it is fully received and passed up, more than 135 µs
> elapsed so the 2nd is now there, it is also engaged, and 1 full
> interrupt cycle is avoided;
> yet at the end of the 2nd packet, about 240 µs elapsed, that's less
> than 270 µs so the driver returns to interrupt mode; is it a correct
> description?

The description is correct.

If processing takes too long even with all overheads minimized, one can
hardly do more. But that is not the case here.

> > Another note is that the previous discussion talks
> > about transmitting 200.000 packets per seccond (iirc).
>
> Did not see that.
> I did see a stat about 200,000 (2e+5) IPCs/s to achieve 6.5 Mbytes/s
> which turns into 8700 packets transmitted (counting the TCP ACK),
> hence 23 IPCs/(full) packet interaction (of which I can identify only
> 18, to which we should add the IRQ cost in IPCs).
> However I did not see a computation if the time of a single IPC on
> that hardware.

OK, 8700 packets per second. What if you have 100,000 packets per
second? Or 1,000,000? Quite a lot of interrupts and messages per second.
Quite a lot of CPU time spent in IPC ...

T.

Yuntao Yang

unread,
Jan 6, 2010, 7:30:20 PM1/6/10
to min...@googlegroups.com
What do you mean by handling system calls straight away
in the kernel?


I'm Free


--
You received this message because you are subscribed to the Google Groups "minix3" group.
To post to this group, send email to min...@googlegroups.com.
To unsubscribe from this group, send email to minix3+un...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/minix3?hl=en.




Jaswinder Singh Rajput

unread,
Jan 7, 2010, 2:12:29 AM1/7/10
to Philip Homburg, Niek Linnenbank, Ben Gras, minix3
On Thu, Dec 31, 2009 at 1:16 AM, Philip Homburg <phi...@cs.vu.nl> wrote:
>
> I just did a quick test between a FreeBSD system and an FXP in a Celeron 766
> running Minix-vmd. In that test, about 90 Mbit/s was received. So I don't
> expect either the number of ACKs or the processing speed of inet to be an
> issue.
>
>

On Minix315 __minix_vmd is broken :

cc -I. -D_MINIX -o buf.o -c buf.c
"./inet.h", line 24: cannot open include file "minix/ansi.h"
"./inet.h", line 25: cannot open include file "minix/cfg_public.h"
"./inet.h", line 80: _NORETURN is not a type identifier
"./inet.h", line 110: extern formal illegal
"./inet.h", line 110: this_proc not in parameter list
"./inet.h", line 111: extern formal illegal
"./inet.h", line 111: version not in parameter list
"./inet.h", line 114: extern formal illegal
"./inet.h", line 114: system_hz not in parameter list
"./generic/assert.h", line 11: (warning) bad_assertion is a function;
cannot be formal
"./generic/assert.h", line 11: bad_assertion not in parameter list
"./generic/assert.h", line 11: identifier not expected
"buf.c", line 113: { not expected

Who is maintaining minix_vmd on minix3?

Tomas Hruby

unread,
Jan 9, 2010, 3:44:31 PM1/9/10
to min...@googlegroups.com
On Wed, Jan 06, 2010 at 08:30:20PM -0400, Yuntao Yang wrote:
> What do you mean by handling system calls straight away
> in the kernel.

I mean that it will be done in the interrupt context, without
scheduling a system task. There will soon be no special kernel threads
(tasks) except pseudo idle tasks. There will be only user processes
then.
T.

>
> I'm Free
>
>
> On Tue, Jan 5, 2010 at 6:08 PM, Tomas Hruby <thr...@gmail.com> wrote:
>
> > On Mon, Jan 04, 2010 at 01:12:31PM -0600, Leith wrote:
> > > >
> > > >
> > > > This overhead will be removed soonish by removing the system task.
> > > >
> > > > T.
> > > >
> > > > --
> > > >
> > > >
> > > Not to take this thread in a different direction, but that comment
> > > interested me. What is the plan to remove the system task?
> >
> > There is no need for a system task. It only adds overhead and
> > complexity. Therefore the plan is to handle system calls straight away
> > in the kernel.
> >
> > T.
> >
