I will take a look at what you've done as soon as I can. I have
some issues keeping me busy, so it may take me a few days.
We've just published the newest revision of Yandex's em driver at
The main improvement in this version: the driver does not use TX interrupts at
all, so the interrupt rate is reduced significantly.
There are also several small bug fixes, and a brief explanation of the patch
set in the file README.Yandex.
In the near term I will be taking changes that I did for the
10G Oplin driver, specifically multiqueue/rss and the lock
splitting that is in that driver, and putting them back into
the Gig driver, but that should go into CURRENT first.
My recommendation is to move your work to CURRENT.
We have slightly different points of view. Your business is code
development. Our business is to make several thousand FreeBSD boxes fast
and stable. That's why we're limited in our choice of OS release. But
the other side of the coin is: we are able to test software with a lot of
We plan to deal with CURRENT though.
I can understand and appreciate that, and in fact, both our tasks
are necessary for success. I will try harder to get to your code
> The main improvement in this version: the driver does not use TX interrupts at all,
> so the interrupt rate is reduced significantly.
Polling for anything is a bug IMO. Buggy hardware may work better with it,
but em is not buggy :-).
For bge, I tune the interrupt moderation parameters to reduce the tx
interrupt rate to almost as low as possible without doing polling.
The rate is either 1 interrupt per second if the tx is almost inactive
or 1 interrupt every 384 packets if the tx is active. -current mistunes
these parameters to 150 (microseconds) and 10 (descriptors). Old tuning
of 150 and 128 only loses a little compared with 1000000 and 384. (150
gives 6667 interrupts per second under load. This interrupt rate is
quite manageable and is about the same rate as you have to use with
polling to get the same throughput as with interrupts, but at lower
efficiency. 128 for the descriptor limit results in a max interrupt rate
of only a few hundred per second except with tiny packets, but 10 is
excessively small and requires a rate of up to 140000 per second to keep
up with tiny packets. 140000 isn't manageable.)
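
The rates in the parenthetical above follow from simple arithmetic, sketched
here. The post gives only the tuning values (150 microseconds; 10, 128 and 384
descriptors). The gigabit line rate, the 20 bytes of per-frame wire overhead
(preamble plus inter-frame gap), the 64- and 1518-byte frame sizes, one
descriptor per packet, and all function names are assumptions made for
illustration.

```python
# Back-of-envelope check of the interrupt rates discussed above.
# Assumptions (not from the post): gigabit line rate, 20 bytes of wire
# overhead per frame (8-byte preamble + 12-byte inter-frame gap), and
# one descriptor consumed per packet.

LINE_RATE_BPS = 1_000_000_000
WIRE_OVERHEAD_BYTES = 20


def timer_interrupt_rate(holdoff_usecs):
    """Interrupts per second when a holdoff timer fires every holdoff_usecs."""
    return 1_000_000 / holdoff_usecs


def max_packet_rate(frame_bytes):
    """Packets per second at line rate for a frame size that includes the FCS."""
    return LINE_RATE_BPS / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)


def descriptor_interrupt_rate(frame_bytes, descriptors_per_interrupt):
    """Interrupts per second when one fires every N descriptors at line rate."""
    return max_packet_rate(frame_bytes) / descriptors_per_interrupt


if __name__ == "__main__":
    print("150 us holdoff:", round(timer_interrupt_rate(150)))
    for limit in (10, 128, 384):
        for frame in (64, 1518):
            rate = descriptor_interrupt_rate(frame, limit)
            print(f"{limit:>3} descriptors, {frame:>4}-byte frames: {rate:.0f}/s")
```

Under these assumptions, 150 microseconds gives about 6667 interrupts per
second, a 128-descriptor limit with full-size frames gives a few hundred per
second, and a 10-descriptor limit with minimum-size frames lands near 149000
per second, the same order as the 140000 quoted above.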
em has more/better interrupt parameters with non-broken defaults so I haven't
needed to tune them. For bge, I implement dynamic rx interrupt moderation
in software where em has it in hardware. 10000 interrupts/second for rx
is a good limit. IIRC, em uses 8000 which is a bit low for a max, and
is missing a sysctl for easy tuning.
I have ported Vladimir's 1.16 revision of their driver to -CURRENT code
as of today, but I don't have a box that is suitable for testing right
now as I just moved, and the server I used to do FreeBSD coding stuff is
located several thousand miles away :-)
I hope that this will be useful for adoption into the official em(4)
driver, and thanks to Vladimir and Yandex for their work on this.
look at RELENG_6.
They've moved the vlan promisc hack from the driver level into ethersubr.
Briefly: we have to disable hardware vlan tagging if we want to tcpdump or
bridge a trunked port.
> Best Regards
I'm not sure what those digits mean :-).
There is one more thread than usual started by the driver (see
RX_KTHREADS_NUM). That's why the results aren't easy to compare. Much of the
L/A (load average) calculation depends on the "number of running threads".
> On Fri, 5 Oct 2007, Vladimir Ivanov wrote:
>> Date: Fri, 05 Oct 2007 00:59:51 +0400
>> From: Vladimir Ivanov <wa...@yandex-team.ru>
>> To: rmkml <rm...@free.fr>, "freeb...@freebsd.org"
>> Subject: Re: SMPable version of EM driver
Yes, as Jack said 6.6.6 was the tested version at Intel (thanks to Jack
and Intel :) and will become the -CURRENT version for FreeBSD. Thanks
for the work!
Mike Tancsa wrote:
> On Wed, 01 Aug 2007 18:26:10 +0400, in sentex.lists.freebsd.net you
>> Bill Marquette wrote:
>>> What type of performance differences are you seeing with these
>>> changes? Is this with FreeBSD acting as a router/firewall, or purely
>> RX queue is being processed w/more than one thread.
>> TX queue thread isn't locked with RX anymore.
>> Extra CPU time can be used by e.g. IPFW firewall or routing and so on.
> I am interested in trying your version of the em driver. On
> one of my routers, I am seeing
> kernel: em2: Missed Packets = 953
> kernel: em2: Receive No Buffers = 128
> kernel: em2: RX overruns = 7
> kernel: em2: Good Packets Rcvd = 62453961
> kernel: em2: Good Packets Xmtd = 31935910
> This is with the em driver currently in the RELENG_6 tree (version
> 6.6.6).. Previous versions were the same.
> I notice that you have some different defaults as well
> dev.em.0.rx_int_delay: 0
> dev.em.0.tx_int_delay: 67108
> dev.em.0.rx_abs_int_delay: 1000
> dev.em.0.tx_abs_int_delay: 67108
> dev.em.1.rx_int_delay: 0
> dev.em.1.tx_int_delay: 66
> dev.em.1.rx_abs_int_delay: 66
> dev.em.1.tx_abs_int_delay: 66
> dev.em.1.rx_processing_limit: 100
> What are these tuned for ? Hi pps ? Low latency ?
We have both problems and even more:
we need low latency, we have huge pps, we have to run a firewall, and so on.
Tuning cannot solve them.
Actually, our rx/tx timeout defaults are mostly meaningless because:
1) we do not use TX interrupts at all
2) we use an explicit SYSCTL (see dev.em.N.rx_kthread_priority) for tuning
RX threads' priority instead of rx_processing_limit.
3) we mask rx interrupts if no thread is ready to catch them, which is why
we do not need interrupt pending/throttling.
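
As a side note on the quoted sysctl values: the odd-looking 66 and 67108 are
what one would expect if the driver stores each delay in 1.024-microsecond
register ticks and converts to microseconds for display. The sketch below
transliterates the rounding that the stock em(4) driver is believed to use
(its EM_TICKS_TO_USECS and EM_USECS_TO_TICKS macros). The formulas, the tick
length, and the 16-bit register width are assumptions to be checked against
if_em.h, not something stated in this thread, and the Python function names
are made up for illustration.

```python
# Assumed conversion between delay-register ticks and the microsecond
# values shown by sysctl. One tick is taken to be 1.024 microseconds and
# the delay register is taken to be 16 bits wide.

MAX_TICKS = 0xFFFF  # assumed 16-bit delay register


def ticks_to_usecs(ticks):
    """Round a tick count to the nearest whole microsecond."""
    return (1024 * ticks + 500) // 1000


def usecs_to_ticks(usecs):
    """Round a microsecond value to the nearest whole tick."""
    return (1000 * usecs + 512) // 1024


if __name__ == "__main__":
    for ticks in (0, 64, MAX_TICKS):
        print(ticks, "ticks ->", ticks_to_usecs(ticks), "usecs")
    for usecs in (66, 1000, 67108):
        print(usecs, "usecs ->", usecs_to_ticks(usecs), "ticks")
```

If these assumptions hold, 66 corresponds to 64 ticks and 67108 to the largest
value the register can hold, which fits the statement that the TX delay values
no longer matter once TX interrupts are unused.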
> Thanks for any info,
>> + RX and TX use different priority value. System seems to be more stable
>> if RX scheduled w/less priority.
>> + RX/TX stay masked if there is no thread ready to catch interrupt.
>>> as a server? Any chance you are using the pf filtering engine (which
>>> I believe is still under giant in releng_6) with this? Thanks
>> I have been told that GIANT is a big problem for the pf driver and that
>> they cannot fix it easily.
PS: your personal e-mail doesn't work
LI Xin wrote:
> Hi Vladimir and Jack,
> I have ported Vladimir's 1.16 revision of their driver to -CURRENT code
> as of today, but I don't have a box that is suitable for testing right
> now as I just moved, and the server I used to do FreeBSD coding stuff is
> located several thousand miles away :-)
> I hope that this will be useful for adoption into the official em(4)
> driver, and thanks to Vladimir and Yandex for their work on this.
We've just tested your patch on a FreeBSD 7-PRERELEASE box running
cvsuped source from 14th Oct 2007. The patch applied cleanly and the
kernel compiled without error.
Booting the new Yandex-enabled kernel resulted in an apparent lock
acquisition problem and shortly after, a possibly unrelated kernel panic
after starting devd. I'm not sure what info you might need to debug it,
but let me know if you need anything in addition to what I thought was
relevant and have included in the attached text file.
LI Xin wrote:
> Shoot, the TX mutex locking and unlocking should not belong here. Let
> me check the code.
Don't forget: our latest version
http://people.yandex-team.ru/wawa/em-6.6.6-yandex-1.20.tar.gz is very
close to CURRENT.
Also, you can change the number of threads at runtime in this revision.
Oh... So you have adopted Jack's new version of the driver? Maybe I should
take some time to port it to -HEAD first? :-)