Bug#1034550: r8168-dkms: Excessive network latency with PREEMPT_RT kernel without the R8168-dkms driver

Rod Webster

Apr 18, 2023, 6:00:03 AM
Thanks for the prompt response. We used the alpha 2 ISO release, and I had checked sources.list. The two of us working on this were unaware that there was a difference between non-free-firmware and non-free. non-free-firmware is not described as a component on the Debian wiki (https://wiki.debian.org/SourcesList), yet the examples for bookworm use it, so it's no wonder we were confused. Could the documentation team amend the wiki to clarify the changes and the difference between the non-free and non-free-firmware components? I'm sure this will be a source of confusion moving forward.

In any case, this is a side issue and we await a response from the kernel team in relation to the latency issue with the R8169 module under PREEMPT_RT.





On Tue, 18 Apr 2023 at 18:13, Andreas Beckmann <an...@debian.org> wrote:
Control: tag -1 - newcomer
Control: retitle -1 excessive network latency with PREEMPT_RT kernel without the r8168-dkms driver
Control: reassign -1 src:linux

On 18/04/2023 04.12, Rod Webster wrote:
> Package: r8168-dkms

> We are linuxcnc users which is packaged in Bookworm. Linuxcnc requires the
> PREEMPT_RT real time kernel as a prerequisite. We have found excessive latency
> in the real time environment since Debian moved to the 5.x kernels.  We note
...

I'm reassigning this bug to the linux kernel package, as this is not an
issue in the r8168 driver.

> We are no longer able to locate the R8168-dkms driver in the repositories,
> despite it being listed as available in package search. We have downloaded a
> .deb file from the Sid packages to install the correct driver.

>   3. The R8168-dkms driver to continue to be made available in the Bookworm
> repositories.

The r8168-dkms package is in non-free - do you have that enabled?
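
One way to answer that question is to list the components named on the active entries of a sources.list file. The following is only a minimal sketch: the function name and the sample entries are hypothetical, and it handles just the one-line "deb ..." style. APT also reads files under /etc/apt/sources.list.d/, including deb822-style .sources files, which this sketch does not parse.

```python
def enabled_components(sources_text):
    """Return the set of components named on active one-line 'deb' entries."""
    components = set()
    for raw in sources_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding space
        fields = line.split()
        if not fields or fields[0] != "deb":  # skips blanks, 'deb-src', anything else
            continue
        fields = fields[1:]
        if fields and fields[0].startswith("["):
            # Skip the optional [option=value ...] group, which may span several fields.
            while fields and not fields[0].endswith("]"):
                fields.pop(0)
            fields = fields[1:]
        # What remains is: URI, suite, then zero or more components.
        components.update(fields[2:])
    return components


# Hypothetical example entries, not taken from any machine in this thread.
sample = """
deb http://deb.debian.org/debian bookworm main non-free-firmware
deb-src http://deb.debian.org/debian bookworm main
"""
found = enabled_components(sample)
print("non-free enabled:", "non-free" in found)  # prints: non-free enabled: False
```

With only main and non-free-firmware listed, as in the sample, a package that lives in non-free would not be visible to apt.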


Andreas

Ben Hutchings

Apr 22, 2023, 11:33:06 AM
On Tue, 18 Apr 2023 12:12:58 +1000 Rod Webster <r...@vmn.com.au> wrote:
[...]
> Linuxcnc uses a 1 ms realtime thread and we regularly see "Error Finishing
> Read" reported.  This error disables the connection because our 1 ms thread has
> been overrun. This issue mainly affects Realtek NIC hardware and is of real
> concern where the motion hardware could be commanding components weighing
> several thousand pounds.
[...]

The real-time kernel packages are provided as a convenience for users
that have non-safety-critical real-time requirements, such as audio
production.

For safety-critical applications, you must take responsibility (or find
a supplier who can) for selecting and validating software that meets
the real-time and other reliability requirements.

As a reminder, "Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to
the extent permitted by applicable law."

Ben.

--
Ben Hutchings
Theory and practice are closer in theory than in practice - John Levine

Rod Webster

Apr 22, 2023, 7:20:03 PM
Thanks.
That is really a disappointing response because:
1. Hardware selected based on the Debian 4.x kernels in Buster, which operated safely, was broken by the 5.10 and later kernels in Bullseye and Bookworm.
2. You ask us to report a bug if the R8168-dkms package has to be used, so we did; now no interest is shown in actioning the report.
3. It does not address the excessive latency in the Debian RT kernel that is not present in the upstream version at kernel.org.
4. It has taken a lot of work from a lot of Linuxcnc users to identify the issues before this report could be made.

The official ISO release of Linuxcnc is still based on Buster, so not many users have ventured into the later kernels, hence the delay in reporting. Linuxcnc is packaged in Bookworm, so the issue will be more prevalent moving forward.

I was told by a Debian developer involved in linuxcnc that if there were issues affecting us, they would be fixed. I hope something comes of this.


Rod Webster

VMN®

www.vmn.com.au

Ph: 1300 896 832

Mob: +61 435 765 611



Ben Hutchings

Apr 23, 2023, 9:50:09 AM
Control: retitle -1 linux-image-rt-amd64: High network latency with r8169 driver
Control: tag -1 moreinfo

On Sun, 2023-04-23 at 09:14 +1000, Rod Webster wrote:
> Thanks.
> That is really a disappointing response because:
> 1. Hardware selected based on Debian 4.x kernels in Buster that operated
> safely was broken by the 5.10 and above kernels in Bullseye and Bookworm
> 2. You ask us to report a bug if the R8168-dkms package has to be used so
> we did, now no interest is shown in actioning the report
> 3. It does not address the excessive latency in the Debian RT kernel that
> is not present in the upstream version at kernel.org
> 4. It has taken a lot of work from a lot of Linuxcnc users to identify the
> issues before this report could be made.
>
> The official ISO release of Linuxcnc is still based on Buster so not many
> users ventured into the later kernels hence the delay in reporting.
> Linuxcnc is packaged in Bookworm so the issue will be more prevalent moving
> forward.
>
> I was told by a Debian developer involved in linuxcnc that if there were
> issues affecting us, they would be fixed. I hope something comes of this.
[...]

I'm not dismissing this bug report, but I wanted to first make it clear
that we cannot take any responsibility for safety-critical
applications.

As to the general issue of network latency:
- What was the latest Debian packaged kernel version you used?
- You've said that installing r8168-dkms resolves the issue. Am I
correct in assuming that when you ran the Debian packaged kernel, the
r8169 driver was used?
- Have you tested on any other machines with different network
hardware?
- We don't make a lot of changes to the kernel source, but our build
configuration will be different. Can you confirm exactly which upstream
release you've tested, and provide the configuration (.config) file you
used?
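
For the second question above, the driver bound to an interface can be read from sysfs, where /sys/class/net/<iface>/device/driver is a symlink to the driver's directory. This is a rough sketch rather than an official tool; the function name is made up for illustration, and the sysfs root is a parameter only so the logic can be exercised against a test directory.

```python
import os


def bound_driver(iface, sysfs_net="/sys/class/net"):
    """Return the name of the kernel driver bound to iface, or None.

    Virtual interfaces such as 'lo' have no device/driver link, so they
    also yield None.
    """
    link = os.path.join(sysfs_net, iface, "device", "driver")
    if not os.path.exists(link):
        return None
    # The symlink target's last path component is the driver name.
    return os.path.basename(os.path.realpath(link))


if __name__ == "__main__" and os.path.isdir("/sys/class/net"):
    for name in sorted(os.listdir("/sys/class/net")):
        print(name, bound_driver(name))
```

On an affected machine one would expect this to report r8169 for the Realtek NIC under the stock kernel, and r8168 once the r8168-dkms module is in use.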

Ben.

--
Ben Hutchings
It's easier to fight for one's principles than to live up to them.

Diederik de Haas

Apr 23, 2023, 11:10:03 AM
On Sunday, 23 April 2023 15:43:01 CEST Ben Hutchings wrote:
> Can you confirm exactly which upstream release you've tested

The initial bug report (which didn't end up on debian-kernel ML) had:

On Tue, 18 Apr 2023 12:12:58 +1000 Rod Webster <r...@vmn.com.au> wrote:
> We note that RT latency/jitter has significantly improved in the 6.x kernels
> and is better again with the 6.3 kernel compiled from kernel.org sources
> where latency/jitter is on a par with the 4.x kernels found in Buster.

So I'm guessing upstream master (so 6.3-rc7, for example).

me@pc:~/dev/kernel.org/linux$ git log --oneline v6.1..HEAD -- drivers/net/ethernet/realtek/r8169*
33189f0a94b9 r8169: fix RTL8168H and RTL8107E rx crc error
ce870af39558 r8169: reset bus if NIC isn't accessible after tx timeout
a99da46ac01a Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
80c0576ef179 r8169: disable ASPM in case of tx timeout
2ea26b4de6f4 Revert "r8169: disable detection of chip version 36"
bb41c13c05c2 r8169: fix dmar pte write access is not set error
ad425666a1f0 r8169: move rtl_wol_enable_rx() and rtl_prepare_power_down()
42f66a44d837 r8169: enable GRO software interrupt coalescing per default
4b6c6065fca1 r8169: use tp_to_dev instead of open code
eca485d22165 drivers: net: convert to boolean for the mac_managed_pm flag

It looks like some are now part of 6.1.25 too, but not all.

It also looks like Realtek is now actually contributing to the upstream kernel
instead of periodically dumping their own code on the internet :-)

Rod Webster

Apr 23, 2023, 5:10:03 PM
Thanks. I think I have tried most (if not all) released kernels, from 5.10 to the present-day 6.1. I adopted Bullseye a few weeks before it became the stable branch. Bookworm supports the Realtek R8125 NIC, which I noted this week is also using the R8169 driver with the latest Bullseye kernel. I believe it is also affected by this issue but have not benchmarked it.

Linuxcnc was accepted into SID back in January 2022 and I started using the non-free sid versions from that time. I then started using Bookworm/Testing once linuxcnc was accepted into it.

I have personally tested on 4-5 USFF PCs with CPUs ranging from the Intel J1900 and J4115 to i3. All used Realtek network hardware and all were affected. All were initially using the R8169 driver. Many other Linuxcnc users have reported the issue. All of these had hardware covered by Realtek's official R8168 driver, and all benefited from installing the Debian R8168-dkms driver.

Compiling the RT kernel is not new to linuxcnc users as it was required up until Debian first released linux-image-rt packages. All we have ever needed to do was to patch the code and make a single change in menuconfig/xconfig to select the fully preemptible kernel and compile. I learnt how to build kernel debs when Bookworm was on the 6.0 kernel and built a 6.1-rt5 version which I shared publicly with other users via Google Drive. This resolved issues for a lot of users. Another user recently reported substantial improvement in latency with the 6.3 kernel so two of us built and tested it with outstanding and near identical results for both overall latency and network latency.

I have not kept my .config files, as the PCs have been reformatted so many times. However, my kernels and the steps used to build them are available in my Google Drive. They will show what we have changed. Here is the link to the 6.1.0-rt5 kernel: https://drive.google.com/drive/folders/1jGc6AUYKMPvsSOdWRdvhWeDX1P96tsFQ?usp=sharing I will redo this for the final 6.1 kernel and share the config to get in step with Debian Bookworm's current state.

Note we use Linuxcnc's latency testing tools to measure latency but cyclictest produces similar observable results.
Unfortunately, we don't have any portable method to test network latency. Our hardware reports the maximum time to read and write to it in CPU timer ticks.
This command may give some insight, but the other device would need to be on a dedicated point-to-point Ethernet connection (no hub or router):
sudo chrt 99 ping -i .001 -q 10.10.10.10
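
As a rough illustration of how the output of that command could be checked against the 1 ms thread period mentioned in the report, the sketch below parses the "rtt min/avg/max/mdev" summary line that iputils ping prints when it finishes or is interrupted (the command above has no -c, so it runs until stopped). The function names are hypothetical, the sample text is made up to show the format and is not a measurement, and ping round-trip time is only a proxy for the read/write time the motion hardware reports.

```python
import re

# Matches the iputils summary line, e.g. "rtt min/avg/max/mdev = a/b/c/d ms".
_RTT = re.compile(r"min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms")


def parse_rtt_summary(ping_output):
    """Return min/avg/max/mdev in milliseconds as a dict, or None if absent."""
    match = _RTT.search(ping_output)
    if match is None:
        return None
    return dict(zip(("min", "avg", "max", "mdev"), map(float, match.groups())))


def overruns_budget(ping_output, budget_ms=1.0):
    """True if the worst round trip exceeded budget_ms (assumed: the 1 ms period)."""
    stats = parse_rtt_summary(ping_output)
    if stats is None:
        raise ValueError("no rtt summary line found in ping output")
    return stats["max"] > budget_ms


# Illustrative text only: these numbers are invented to show the format.
sample = "rtt min/avg/max/mdev = 0.100/0.200/1.500/0.050 ms"
print(parse_rtt_summary(sample))
print("overrun:", overruns_budget(sample))
```

Looking at the maximum rather than the average matters here, since a single late packet is enough to overrun a fixed-period thread.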

I hope this covers your questions.

Rod Webster



Ben Hutchings

Apr 23, 2023, 6:11:53 PM
On Mon, 2023-04-24 at 07:00 +1000, Rod Webster wrote:

[...]
> I have not kept my .config files as PC's have been reformatted so many
> times. However, my kernels and the steps used to build them are available
> in my google drive. They will show what we have changed. Here is the link
> to the 6.1.0-rt5 kernel
> https://drive.google.com/drive/folders/1jGc6AUYKMPvsSOdWRdvhWeDX1P96tsFQ?usp=sharing
> I will redo this for the final 6.1 kernel and share the config to get in step
> with Debian Bookworm's current state.
[...]

I'm not spotting any particularly interesting differences in the config
there, unfortunately. And from your earlier messages, it sounded like
you didn't have so much of a problem with the Debian packages for Linux
6.1 anyway.

If you still have custom packages of Linux 5.10 where the r8169
driver's network latency is OK, I would like to see those.