
Re: SMPable version of EM driver


Jack Vogel

Aug 1, 2007, 12:23:45 PM
to Vladimir Ivanov, freeb...@freebsd.org
On 8/1/07, Vladimir Ivanov <wa...@yandex-team.ru> wrote:
> Hi,
>
> I've just published a revision of the EM driver (the mainstream RELENG_6
> version with a patch) that our company uses to increase network
> performance. The main benefit: significantly better SMP utilization.
>
> http://people.yandex-team.ru/~wawa/em-6.2.9-yandex.tar.gz.
> The driver should be used with RELENG_6.
>
> Feedback welcome.

I will take a look at what you've done as soon as I can. I have
some issues keeping me busy, so it may take me a few days.

Regards,

Jack

Vladimir Ivanov

Oct 2, 2007, 1:33:42 PM
to Jack Vogel, freeb...@freebsd.org
Hi,
I still haven't received any feedback. :-(

We've just published the newest revision of Yandex's em driver at
http://people.yandex-team.ru/~wawa/em-6.2.9-yandex-1.15.tar.gz.

The main improvement in this version: the driver does not use TX
interrupts at all, so the interrupt rate is significantly reduced.

There are also several small bug fixes and a short explanation of the
patch set in the file README.Yandex.
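For readers unfamiliar with the technique: a driver that takes no TX interrupts still has to return finished transmit descriptors (and their mbufs) to the free pool somewhere. One common approach is to reclaim them lazily on the transmit path whenever the ring runs low. The following is a user-space model of that idea under stated assumptions; it is not the Yandex code, and the ring size, threshold, and all structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TX_RING_SIZE    8   /* hypothetical; real rings hold hundreds */
#define TX_CLEAN_THRESH 2   /* hypothetical: reclaim when this few are free */

struct tx_desc {
    bool in_use;   /* frame queued, owned by the NIC */
    bool done;     /* stand-in for the "descriptor done" bit the NIC sets */
};

struct tx_ring {
    struct tx_desc desc[TX_RING_SIZE];
    int next_to_use;    /* slot for the next outgoing frame */
    int next_to_clean;  /* oldest slot not yet reclaimed */
    int avail;          /* free slots */
};

static void tx_ring_init(struct tx_ring *r)
{
    memset(r, 0, sizeof(*r));
    r->avail = TX_RING_SIZE;
}

/* Walk from the oldest slot and free every descriptor the NIC has
 * marked done; stop at the first one still in flight. */
static int tx_reclaim(struct tx_ring *r)
{
    int freed = 0;

    while (r->avail < TX_RING_SIZE && r->desc[r->next_to_clean].done) {
        r->desc[r->next_to_clean].in_use = false;
        r->desc[r->next_to_clean].done = false;
        r->next_to_clean = (r->next_to_clean + 1) % TX_RING_SIZE;
        r->avail++;
        freed++;
    }
    return freed;
}

/* Transmit path: cleanup happens here instead of in a TX interrupt
 * handler. Returns false when the ring is full; the caller would requeue. */
static bool tx_enqueue(struct tx_ring *r)
{
    if (r->avail <= TX_CLEAN_THRESH)
        tx_reclaim(r);
    if (r->avail == 0)
        return false;
    r->desc[r->next_to_use].in_use = true;
    r->next_to_use = (r->next_to_use + 1) % TX_RING_SIZE;
    r->avail--;
    return true;
}

/* Simulation helper: pretend the NIC finished sending slot `idx`. */
static void nic_complete(struct tx_ring *r, int idx)
{
    assert(r->desc[idx].in_use);
    r->desc[idx].done = true;
}
```

A real driver would typically also need a periodic timer to reclaim descriptors once traffic stops; otherwise completed buffers stay held until the next send.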

WBR,

--
Vladimir Ivanov
Network Operations Center
OOO "Yandex"
t: +7 495 739-7000
f: +7 495 739-7070
@: n...@yandex.net (corporate)
wa...@yandex-team.ru (personal)
www: www.yandex.ru
--

Jack Vogel

Oct 2, 2007, 4:58:22 PM
to Vladimir Ivanov, freeb...@freebsd.org
I'm sorry I have not been able to get to this yet, but putting
food on the table comes first, so the FreeBSD work that
Intel pays me for has to take priority. Also, your driver work
is based on a version that is too old to simply accept. I am
hoping to get the STABLE tree converted shortly to the new
shared code that CURRENT has, so whatever good ideas you have,
and I'm sure you have many, will need to be ported to the new
code base. Furthermore, that base is a moving target as I add
new hardware support.

In the near term I will be taking the changes I made for the
10G Oplin driver, specifically multiqueue/RSS and the lock
splitting in that driver, and putting them back into
the gigabit driver, but that should go into CURRENT first.

My recommendation is to move your work to CURRENT.

Regards,

Jack

Vladimir Ivanov

Oct 2, 2007, 6:03:13 PM
to Jack Vogel, freeb...@freebsd.org
Jack Vogel wrote:
> [skip]
> My recommendation is to move your work to CURRENT.

We have slightly different points of view. Your business is code
development. Our business is keeping several thousand FreeBSD boxes fast
and stable. That's why we are limited in our choice of OS release. But
the other side of the coin is that we are able to test software on a
large number of running systems.

We do plan to work with CURRENT, though.

WBR,
Vladimir

Jack Vogel

Oct 2, 2007, 6:44:32 PM
to Vladimir Ivanov, freeb...@freebsd.org
On 10/2/07, Vladimir Ivanov <wa...@yandex-team.ru> wrote:

I can understand and appreciate that; in fact, both our tasks
are necessary for success. I will try harder to get to your code,
Vladimir.

Jack

Bruce Evans

Oct 2, 2007, 9:48:33 PM
to Vladimir Ivanov, freeb...@freebsd.org, Jack Vogel
On Tue, 2 Oct 2007, Vladimir Ivanov wrote:

> Main improvement of this version: driver does not use TX interrupts at all.
> So, interrupt rate reduced significantly.

Polling for anything is a bug IMO. Buggy hardware may work better with it,
but em is not buggy :-).

For bge, I tune the interrupt moderation parameters to reduce the tx
interrupt rate to almost as low as possible without doing polling.
The rate is either 1 interrupt per second if tx is almost inactive
or 1 interrupt every 384 packets if tx is active. -current mistunes
these parameters to 150 (microseconds) and 10 (descriptors). The old
tuning of 150 and 128 loses only a little compared with 1000000 and 384.
(150 gives 6667 interrupts per second under load. This interrupt rate is
quite manageable, and is about the same rate you would have to use with
polling to get the same throughput as with interrupts, but at lower
efficiency. 128 for the descriptor limit results in a max interrupt rate
of only a few hundred per second except with tiny packets, but 10 is
excessively small and requires a rate of up to 140000 per second to keep
up with tiny packets. 140000 isn't manageable.)
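Bruce's numbers can be reproduced with a simple model, assuming the controller raises an interrupt as soon as either the timer or the descriptor count is reached, whichever comes first, so that under steady load the gap between interrupts is the smaller of the two. This is arithmetic, not bge(4) code, and the function name is hypothetical. The 1.4 Mpps tiny-packet rate is the figure implied by "140000 per second" with a 10-descriptor limit; 81274 pps is roughly the rate of maximum-size 1518-byte frames on gigabit Ethernet, counting preamble and inter-frame gap.

```c
#include <assert.h>

/*
 * Back-of-the-envelope model (an assumption, not driver code): an
 * interrupt fires when EITHER `ticks_us` microseconds have passed OR
 * `max_bds` descriptors have completed, and there is never more than
 * one interrupt per packet.
 */
static double coalesced_intr_rate(double pps, double ticks_us, double max_bds)
{
    double timer_rate, count_rate, rate;

    assert(pps >= 0.0 && ticks_us > 0.0 && max_bds > 0.0);
    timer_rate = 1e6 / ticks_us;    /* interrupts/s if the timer always wins */
    count_rate = pps / max_bds;     /* interrupts/s if the count always wins */
    /* Interval = min(timer, count interval), so rate = max of the two. */
    rate = timer_rate > count_rate ? timer_rate : count_rate;
    return rate < pps ? rate : pps; /* cannot exceed one per packet */
}
```

With these inputs the model gives about 140000 interrupts/s for 150/10 with tiny packets, about 6667/s for 150/128 with full-size frames (the timer dominates), and about 212/s for 1000000/384 (the descriptor count dominates).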

em has more/better interrupt parameters with non-broken defaults, so I
haven't needed to tune them. For bge, I implement dynamic rx interrupt
moderation in software, whereas em has it in hardware. 10000
interrupts/second for rx is a good limit. IIRC, em uses 8000, which is a
bit low for a max, and it is missing a sysctl for easy tuning.
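To relate the 8000 interrupts/second figure to a register value: assuming the em ITR register expresses the minimum interval between interrupts in 256 ns units (which is what a DEFAULT_ITR-style macro computing 1000000000 / (MAX_INTS_PER_SEC * 256) implies), the conversion in both directions is a single integer division. These are illustrative helpers with hypothetical names, not functions from if_em.c.

```c
#include <assert.h>

/*
 * Assumption: ITR holds the minimum gap between interrupts in 256 ns
 * units. Integer division truncates, as it would in a C macro.
 */
static unsigned itr_from_ints_per_sec(unsigned ints_per_sec)
{
    assert(ints_per_sec > 0);
    return 1000000000u / (ints_per_sec * 256u);
}

/* Inverse: the interrupt ceiling a given ITR value works out to. */
static unsigned ints_per_sec_from_itr(unsigned itr)
{
    assert(itr > 0);
    return 1000000000u / (itr * 256u);
}
```

Under this assumption, 8000 interrupts/s corresponds to an ITR value of 488 and the suggested 10000/s to 390; because of truncation, 488 maps back to a ceiling of 8004/s rather than exactly 8000.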

Bruce

LI Xin

Oct 3, 2007, 5:45:39 PM
to Vladimir Ivanov, freeb...@freebsd.org, Jack Vogel
Hi Vladimir and Jack,

I have ported Vladimir's 1.16 revision of their driver to the -CURRENT
code as of today, but I don't have a box that is suitable for testing
right now: I just moved, and the server I used for FreeBSD coding is
located several thousand miles away :-)

I hope this will be useful for adoption into the official em(4)
driver. Thanks to Vladimir and Yandex for their work on this.

Cheers,
--
Xin LI <del...@delphij.net> http://www.delphij.net/
FreeBSD - The Power to Serve!

em.diff

Vladimir Ivanov

Oct 4, 2007, 4:59:51 PM
to rmkml, freeb...@freebsd.org
rmkml wrote:
> Hi Vladimir,
> Many thanks for your work on the Intel em driver!
> I just commented out line 772 and added line 773:
> 771: /* Send a copy of the frame to the BPF listener */
> 772: /* ETHER_BPF_MTAP(ifp, m_head); */
> 773: BPF_MTAP(ifp, m_head);
> What is ETHER_BPF_MTAP()?

Look at RELENG_6.
They've moved the VLAN promisc hack from the driver level into ethersubr.
Briefly: we have to disable hardware VLAN tagging if we want to tcpdump
or bridge a trunked port.

> Best Regards
> Rmkml

Regards,
Vladimir

Vladimir Ivanov

Oct 4, 2007, 5:38:02 PM
to rmkml, freeb...@freebsd.org
rmkml wrote:
> Thanks Vladimir,
> I have tested your em driver a little.
> Quick comparison:
> kernel: sys=10% intr=25% user=15%
> kernel+polling: sys=7% intr=5% user=15%
> kernel+yandex_em: sys=30% intr=1% user=15%
> Same traffic (tcpreplay, ~230 Mbps)
> fbsd62release_amd64 on an Intel Core 2 Duo E6600 with 4 GB RAM
> Is it possible to reduce sys CPU?

I'm not sure what those numbers mean :-).
The driver starts one more thread than usual (see RX_KTHREADS_NUM).
That's why the results aren't easy to compare: many load-average (L/A)
calculation techniques depend on the "number of running threads".

> Regards
> Rmkml

LI Xin

Oct 8, 2007, 1:46:19 PM
to Vladimir Ivanov, freeb...@freebsd.org, d...@delphij.net, Jack Vogel
Vladimir Ivanov wrote:
> Hi,
>
> LI Xin wrote:
>> [skip]
>
> Jack has committed version 6.6.6 to RELENG_6. It seems to be very close
> to the CURRENT version.
> I've merged it into our revision
> (http://people.yandex-team.ru/~wawa/em-6.6.6-yandex-1.18.tar.gz). Be
> careful: it is fresh (today's) code.

Yes, as Jack said, 6.6.6 was the version tested at Intel (thanks to
Jack and Intel :) and will become the -CURRENT version for FreeBSD.
Thanks for the work!


Vladimir Ivanov

Oct 14, 2007, 12:14:19 PM
to freeb...@freebsd.org
Hi,

Mike Tancsa wrote:
> On Wed, 01 Aug 2007 18:26:10 +0400, in sentex.lists.freebsd.net you
> wrote:
>
>
>> Bill Marquette wrote:
>>
>>> [skip]
>>> What type of performance differences are you seeing with these
>>> changes? Is this with FreeBSD acting as a router/firewall, or purely
>>>
>>>
>> The RX queue is processed by more than one thread.
>> The TX queue thread is no longer locked together with RX.
>>
>> The extra CPU time can be used by, e.g., the IPFW firewall, routing, and so on.
>>
>>
>
> Hi,
> I am interested in trying your version of the em driver. On
> one of my routers, I am seeing
>
> kernel: em2: Missed Packets = 953
> kernel: em2: Receive No Buffers = 128
> kernel: em2: RX overruns = 7
> kernel: em2: Good Packets Rcvd = 62453961
> kernel: em2: Good Packets Xmtd = 31935910
>
> This is with the em driver currently in the RELENG_6 tree (version
> 6.6.6). Previous versions were the same.
> I notice that you also have some different defaults:
>
> dev.em.0.rx_int_delay: 0
> dev.em.0.tx_int_delay: 67108
> dev.em.0.rx_abs_int_delay: 1000
> dev.em.0.tx_abs_int_delay: 67108
>
> vs
>
> dev.em.1.rx_int_delay: 0
> dev.em.1.tx_int_delay: 66
> dev.em.1.rx_abs_int_delay: 66
> dev.em.1.tx_abs_int_delay: 66
> dev.em.1.rx_processing_limit: 100
>
> What are these tuned for? High pps? Low latency?
>
We have both problems and more:
we need low latency, we have huge pps, we have to run a firewall, and
so on.

Tuning cannot solve them.

Actually, our rx/tx timeout defaults are mostly meaningless because:
1) we do not use TX interrupts at all;
2) we use an explicit sysctl (see dev.em.N.rx_kthread_priority) to tune
the RX threads' priority instead of rx_processing_limit;
3) we mask RX interrupts when no thread is ready to catch them, which is
why we do not need interrupt pending/throttling.
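The odd-looking 67108 in Mike's listing can be explained by unit conversion. Assuming em(4)'s *_int_delay sysctls convert microseconds to the hardware's 1.024 µs ticks the way the EM_USECS_TO_TICKS / EM_TICKS_TO_USECS macros in if_em.c do, and that the delay registers are 16 bits wide, 67108 µs corresponds to the largest settable value (65535 ticks), while the stock 66 µs is 64 ticks. That would be consistent with point 1: with no TX interrupts, the TX delays can simply be pushed to the maximum. The helper names below are illustrative, not the driver's.

```c
#include <assert.h>

/*
 * Sketch of the us <-> tick conversion assumed above. One tick is
 * 1.024 us; the delay field is 16 bits, so 65535 ticks is the maximum.
 */
#define DELAY_REG_MAX_TICKS 65535UL

static unsigned long ticks_to_usecs(unsigned long ticks)
{
    return (1024UL * ticks + 500UL) / 1000UL;   /* round to nearest us */
}

static unsigned long usecs_to_ticks(unsigned long usecs)
{
    return (1000UL * usecs + 512UL) / 1024UL;   /* round to nearest tick */
}
```

For example, the Yandex rx_abs_int_delay of 1000 µs works out to 977 ticks under the same formula.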
> Thanks for any info,
>
> ---Mike
>
>
>> Also:
>> + RX and TX use different priority values. The system seems to be more
>> stable if RX is scheduled with lower priority.
>> + RX/TX interrupts stay masked if no thread is ready to catch them.
>>
>>
>>> as a server? Any chance you are using the pf filtering engine (which
>>> I believe is still under Giant in RELENG_6) with this? Thanks
>>>
>>>
>> I have been told that Giant is a big problem for the pf driver and
>> that they cannot fix it easily.
>>
>> Regards,
>> --
>>

[skip]

WBR,
PS: your personal e-mail doesn't work
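Point 3 above, and the quoted note that RX/TX stay masked when no thread is ready, describe a hand-off between the interrupt handler and a pool of worker threads. The following single-threaded model is one way to picture that gate, assuming one interrupt wakes one worker and the interrupt is unmasked whenever a worker becomes idle again. The structure and function names are hypothetical, and this is not the Yandex implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one RX queue's interrupt gate. */
struct rx_gate {
    int  idle;    /* worker threads parked and ready for work */
    int  busy;    /* workers currently draining the RX ring */
    bool masked;  /* stand-in for the NIC interrupt mask */
};

static void rx_gate_init(struct rx_gate *g, int nworkers)
{
    assert(nworkers > 0);
    g->idle = nworkers;
    g->busy = 0;
    g->masked = false;
}

/* Interrupt handler: hand the work to an idle thread. If that was the
 * last idle thread, leave the interrupt masked so nothing fires that no
 * thread is ready to service. Returns true if a worker was woken. */
static bool rx_gate_intr(struct rx_gate *g)
{
    if (g->masked || g->idle == 0)
        return false;
    g->idle--;
    g->busy++;
    if (g->idle == 0)
        g->masked = true;
    return true;
}

/* A worker has drained the ring and parks again; a thread is now ready,
 * so unmask. Assumption: the NIC re-raises the interrupt on unmask if
 * packets arrived meanwhile, so nothing is lost while masked. */
static void rx_gate_worker_idle(struct rx_gate *g)
{
    assert(g->busy > 0);
    g->busy--;
    g->idle++;
    g->masked = false;
}
```

In this model the interrupt rate is bounded by how quickly workers finish draining the ring, which is the reasoning behind the remark that no separate throttling is needed.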

Lawrence Stewart

Oct 24, 2007, 10:53:43 PM
to d...@delphij.net, James Healy, freeb...@freebsd.org, Jack Vogel, Vladimir Ivanov
Hi Xin,

LI Xin wrote:
> [skip]

We've just tested your patch on a FreeBSD 7-PRERELEASE box running
source cvsup'ed on 14th Oct 2007. The patch applied cleanly and the
kernel compiled without error.

Booting the new Yandex-enabled kernel resulted in an apparent lock
acquisition problem and, shortly afterwards, a possibly unrelated kernel
panic after devd started. I'm not sure what info you might need to debug
it, but let me know if you need anything beyond what I thought was
relevant and included in the attached text file.

Cheers,
Lawrence Stewart

http://caia.swin.edu.au


freebsd7_em_yandex_debug.txt

LI Xin

Oct 25, 2007, 1:25:20 AM
to Lawrence Stewart, James Healy, freeb...@freebsd.org, d...@delphij.net, Jack Vogel, Vladimir Ivanov
Shoot, the TX mutex locking and unlocking do not belong here. Let
me check the code.

Vladimir Ivanov

Oct 26, 2007, 11:31:31 AM
to d...@delphij.net, Lawrence Stewart, James Healy, Jack Vogel, freeb...@freebsd.org
Hi,

LI Xin wrote:
> Shoot, the TX mutex locking and unlocking should not belong here. Let
> me check the code.
>
> Cheers,
>

Don't forget: our latest version
http://people.yandex-team.ru/wawa/em-6.6.6-yandex-1.20.tar.gz is very
close to CURRENT.
Also, you can change the number of threads at runtime in this revision.

LI Xin

Oct 27, 2007, 1:19:49 AM
to Vladimir Ivanov, Lawrence Stewart, James Healy, d...@delphij.net, Jack Vogel, freeb...@freebsd.org
Vladimir Ivanov wrote:
> [skip]
> Don't forget: our latest version
> http://people.yandex-team.ru/wawa/em-6.6.6-yandex-1.20.tar.gz is very
> close to CURRENT.

Oh... So you have adopted Jack's new version of the driver? Maybe I
should take some time to port it to -HEAD first? :-)
