The Linux networking system is about to be heavily reworked; the thinking behind it is useful for those of us writing high-performance programs


redsea

Oct 12, 2007, 1:29:39 AM
to TopLanguage
Its current performance is actually already quite good: it falls behind FreeBSD in only a few places, and not by much, and it is far better than Windows.

Now the stack is to be rewritten from the ground up. The main reason is that hardware has changed: CPUs are now much faster than memory and core counts keep growing, so many approaches that used to work well enough no longer do.

Their reasoning is worth borrowing for our own work. The main problems they point out are: too many memory operations, CPU affinity not being maintained well enough, locking being too expensive, and the use of linked lists being unfriendly to the cache.
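
The CPU affinity point in particular carries over directly to user-space high-performance programs: pinning a worker thread to one core keeps its cache-hot state from being dragged across CPUs. Below is a minimal sketch using the Linux-specific pthread_setaffinity_np call; the pin_to_cpu helper and the choice of CPU 0 are purely illustrative, not anything from the article.

/* Sketch: pin the calling thread to a single CPU so its cache state stays
 * on that core.  Linux-specific; build with: gcc -pthread example.c */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

static int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);                    /* allow only this one CPU */
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

int main(void)
{
    int err = pin_to_cpu(0);               /* CPU 0 chosen arbitrarily */
    if (err != 0) {
        fprintf(stderr, "pin failed: %s\n", strerror(err));
        return 1;
    }
    /* ... run the hot packet-processing loop here ... */
    return 0;
}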

Here is one of the articles:
http://lwn.net/Articles/169961/

It says:
Van, like many others, points out that the biggest impediment to
scalability on contemporary hardware is memory performance. Current
processors can often execute multiple instructions per nanosecond, but
loading a cache line from memory still takes 50ns or more. So cache
behavior will often be the dominant factor in the performance of
kernel code. That is why simply making code smaller often makes it
faster. The kernel developers understand cache behavior well, and much
work has gone into improving cache utilization in the kernel.

The Linux networking stack (like all others) does a number of things
which reduce cache performance, however. These include:

* Passing network packets through multiple layers of the kernel. When
a packet arrives, the network card's interrupt handler begins the task
of feeding the packet to the kernel. The remainder of the work may
well be performed at software interrupt level within the driver (in a
tasklet, perhaps). The core network processing happens in another
software interrupt. Copying the data (an expensive operation in
itself) to the application happens in kernel context. Finally the
application itself does something interesting with the data. The
context changes are expensive, and if any of these changes causes the
work to move from one CPU to another, a big cache penalty results.
Much work has been done to improve CPU locality in the networking
subsystem, but much remains to be done.

* Locking is expensive. Taking a lock requires a cross-system atomic
operation and moves a cache line between processors. Locking costs
have led to the development of lock-free techniques like seqlocks and
read-copy-update, but the networking stack (like the rest of the
kernel) remains full of locks. [A user-space sketch of the seqlock
pattern follows after this list.]

* The networking code makes extensive use of queues implemented with
doubly-linked lists. These lists have poor cache behavior since they
require each user to make changes (and thus move cache lines) in
multiple places. [A contiguous ring-buffer sketch, the usual
cache-friendlier alternative, also follows after this list.]
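
The locking point can be made concrete. A seqlock lets readers take no lock at all: the writer bumps a sequence counter to an odd value while it updates the data and back to an even value when done, and a reader simply retries if the counter was odd or changed underneath it. The kernel's real seqlock_t API is different; the following is only a user-space imitation of the pattern with C11 atomics, and the stats structure is made up for the example.

/* Toy seqlock pattern: one writer, lock-free readers that retry.
 * This glosses over the C memory-model subtleties (the plain copy of
 * `shared` is formally racy); it shows the idea, it is not the kernel
 * implementation. */
#include <stdatomic.h>

struct stats {
    unsigned long packets;
    unsigned long bytes;
};

static _Atomic unsigned int seq;   /* even = stable, odd = writer active */
static struct stats shared;

void writer_update(unsigned long pkts, unsigned long bytes)
{
    atomic_fetch_add(&seq, 1);     /* counter becomes odd */
    shared.packets += pkts;
    shared.bytes   += bytes;
    atomic_fetch_add(&seq, 1);     /* counter becomes even again */
}

struct stats reader_snapshot(void)
{
    struct stats copy;
    unsigned int before, after;
    do {
        before = atomic_load(&seq);
        copy   = shared;           /* consistency is checked below */
        after  = atomic_load(&seq);
    } while ((before & 1) || before != after);
    return copy;
}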
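On the linked-list point, the usual cache-friendlier alternative is a queue backed by a contiguous array (a ring buffer): enqueue and dequeue each touch one array slot and two counters instead of chasing and rewriting pointers in several places. A toy single-threaded sketch, again only for illustration and not taken from the kernel:

/* Sketch: a contiguous ring buffer as a cache-friendlier queue than a
 * doubly-linked list.  Single-threaded toy. */
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 256              /* power of two so wrap-around is a mask */

struct ring {
    void  *slot[RING_SIZE];
    size_t head;                   /* next slot to dequeue */
    size_t tail;                   /* next slot to enqueue */
};

static bool ring_push(struct ring *r, void *item)
{
    if (r->tail - r->head == RING_SIZE)
        return false;              /* full */
    r->slot[r->tail & (RING_SIZE - 1)] = item;
    r->tail++;
    return true;
}

static void *ring_pop(struct ring *r)
{
    if (r->head == r->tail)
        return NULL;               /* empty */
    void *item = r->slot[r->head & (RING_SIZE - 1)];
    r->head++;
    return item;
}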

刘未鹏(pongba)

Oct 12, 2007, 2:13:50 AM
to TopLanguage
A very valuable article; thanks to redsea for sharing it :=)

