Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

[PATCH] fix PHY polling system blocking

7 views
Skip to first unread message

Stefani Seibold

unread,
Mar 6, 2010, 12:00:02 PM3/6/10
to
This patch fix the PHY poller, which can block the whole system. On a
Freescale PPC 834x this result in a delay of 450 us due the slow
communication with the PHY chip.

For PHY chips without interrupts, the status of the ethernet will be
polled every 2 sec. The poll function will read some register of the MII
PHY. The time between the sending the MII_READ_COMMAND and receiving the
result is more the 100 us on a PPC 834x.

The patch modifies the poller a lit bit. Only a link status state change
will result in a successive detection of the connection type. The poll
cycle on the other hand will be increased to every seconds.

Together this patch will prevent a blocking of nearly 400 us every two
seconds of the whole system on a PPC 834x.

The patch is against kernel 2.6.33. Please merge it.

Signed-off-by: Stefani Seibold <ste...@seibold.net>
---
phy.c | 5 ++---
phy_device.c | 12 +++++++++---
2 files changed, 11 insertions(+), 6 deletions(-)

diff -u -N -r -p linux-2.6.33.orig/drivers/net/phy/phy.c linux-2.6.33/drivers/net//phy/phy.c
--- linux-2.6.33.orig/drivers/net/phy/phy.c 2010-02-24 19:52:17.000000000 +0100
+++ linux-2.6.33/drivers/net//phy/phy.c 2010-02-28 22:53:14.725464101 +0100
@@ -871,9 +871,8 @@ void phy_state_machine(struct work_struc
case PHY_RUNNING:
/* Only register a CHANGE if we are
* polling */
- if (PHY_POLL == phydev->irq)
- phydev->state = PHY_CHANGELINK;
- break;
+ if (PHY_POLL != phydev->irq)
+ break;
case PHY_CHANGELINK:
err = phy_read_status(phydev);

diff -u -N -r -p linux-2.6.33.orig/drivers/net/phy/phy_device.c linux-2.6.33/drivers/net//phy/phy_device.c
--- linux-2.6.33.orig/drivers/net/phy/phy_device.c 2010-02-24 19:52:17.000000000 +0100
+++ linux-2.6.33/drivers/net//phy/phy_device.c 2010-02-28 22:53:14.726464145 +0100
@@ -161,7 +161,7 @@ struct phy_device* phy_device_create(str
dev->speed = 0;
dev->duplex = -1;
dev->pause = dev->asym_pause = 0;
- dev->link = 1;
+ dev->link = 0;
dev->interface = PHY_INTERFACE_MODE_GMII;

dev->autoneg = AUTONEG_ENABLE;
@@ -694,10 +694,16 @@ int genphy_update_link(struct phy_device
if (status < 0)
return status;

- if ((status & BMSR_LSTATUS) == 0)
+ if ((status & BMSR_LSTATUS) == 0) {
+ if (phydev->link==0)
+ return 1;
phydev->link = 0;
- else
+ }
+ else {
+ if (phydev->link==1)
+ return 1;
phydev->link = 1;
+ }

return 0;
}

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Andrew Morton

unread,
Mar 12, 2010, 5:50:01 PM3/12/10
to
On Sat, 06 Mar 2010 17:50:58 +0100
Stefani Seibold <ste...@seibold.net> wrote:

> This patch fix the PHY poller, which can block the whole system. On a
> Freescale PPC 834x this result in a delay of 450 us due the slow
> communication with the PHY chip.
>
> For PHY chips without interrupts, the status of the ethernet will be
> polled every 2 sec. The poll function will read some register of the MII
> PHY. The time between the sending the MII_READ_COMMAND and receiving the
> result is more the 100 us on a PPC 834x.
>
> The patch modifies the poller a lit bit. Only a link status state change
> will result in a successive detection of the connection type. The poll
> cycle on the other hand will be increased to every seconds.

You didn't tell us how many seconds. That would be important?

> Together this patch will prevent a blocking of nearly 400 us every two
> seconds of the whole system on a PPC 834x.
>

I can't say that I really understand what you did - what functionality
did we lose?

Would it not be better to extend the phy state machine a bit? When we
detect the link status change, issue the MII command then arm the
delayed-work timer to expire in a millisecond, then go in next time and
read the result?

That might require fancy locking to prevent other threads of control
from going in and mucking up the MII state. Possibly leave phydev_lock
held across that millisecond to keep other people away?

>
> ...


>
> diff -u -N -r -p linux-2.6.33.orig/drivers/net/phy/phy.c linux-2.6.33/drivers/net//phy/phy.c
> --- linux-2.6.33.orig/drivers/net/phy/phy.c 2010-02-24 19:52:17.000000000 +0100
> +++ linux-2.6.33/drivers/net//phy/phy.c 2010-02-28 22:53:14.725464101 +0100

erp, your weird patch headers ("//") broke my seven-year-old script.
But I fixed it!

Please don't invent new coding styles. scripts/checkpatch.pl is here
to help.

Stefani Seibold

unread,
Mar 13, 2010, 7:00:01 AM3/13/10
to
Am Freitag, den 12.03.2010, 14:42 -0800 schrieb Andrew Morton:
> On Sat, 06 Mar 2010 17:50:58 +0100
> Stefani Seibold <ste...@seibold.net> wrote:
>
> > This patch fix the PHY poller, which can block the whole system. On a
> > Freescale PPC 834x this result in a delay of 450 us due the slow
> > communication with the PHY chip.
> >
> > For PHY chips without interrupts, the status of the ethernet will be
> > polled every 2 sec. The poll function will read some register of the MII
> > PHY. The time between the sending the MII_READ_COMMAND and receiving the
> > result is more the 100 us on a PPC 834x.
> >
> > The patch modifies the poller a lit bit. Only a link status state change
> > will result in a successive detection of the connection type. The poll
> > cycle on the other hand will be increased to every seconds.
>
> You didn't tell us how many seconds. That would be important?
>

The old implementation would poll the PHC every 2 seconds, the new one
once per second.

> > Together this patch will prevent a blocking of nearly 400 us every two
> > seconds of the whole system on a PPC 834x.
> >
>
> I can't say that I really understand what you did - what functionality
> did we lose?
>

There is not real drawback, only the detection of a connection type
change without unplugging the cable will not work. But this is more an
esoteric use case.

> Would it not be better to extend the phy state machine a bit? When we
> detect the link status change, issue the MII command then arm the
> delayed-work timer to expire in a millisecond, then go in next time and
> read the result?
>
> That might require fancy locking to prevent other threads of control
> from going in and mucking up the MII state. Possibly leave phydev_lock
> held across that millisecond to keep other people away?
>

You are right, that would be the best solution. But i am currently busy,
so this a fast interim solution ;-) It was also my approach, because it
the PHY communication in the most cases very slow. Maybe i will do this
in the near future.

> >
> > ...
> >
> > diff -u -N -r -p linux-2.6.33.orig/drivers/net/phy/phy.c linux-2.6.33/drivers/net//phy/phy.c
> > --- linux-2.6.33.orig/drivers/net/phy/phy.c 2010-02-24 19:52:17.000000000 +0100
> > +++ linux-2.6.33/drivers/net//phy/phy.c 2010-02-28 22:53:14.725464101 +0100
>
> erp, your weird patch headers ("//") broke my seven-year-old script.
> But I fixed it!
>

Sorry, my fault. But now you have a better script.

Will be fixed!

Stefani Seibold

unread,
Mar 13, 2010, 8:00:02 AM3/13/10
to
This patch fix the PHY poller, which can block the whole system. PHY
access are normaly not very fast, since there are serial attached.

For PHY chips without interrupts, the status of the ethernet will be
polled every 2 sec. The poll function will read some register of the MII
PHY. The time between the sending the MII_READ_COMMAND and receiving the

result could be very long (>100us).

For example:

On a Freescale PPC 834x this result in a delay of 450 us due the slow

communication with the PHY chip. The time between the sending the
MII_READ_COMMAND and receiving the result is more the 100 us on this
controller.



The patch modifies the poller a lit bit. Only a link status state change
will result in a successive detection of the connection type. The poll

cycle on the other hand will be increased to one every seconds.

All in all this patch will prevent a blocking of f.e nearly 400 us every


two seconds of the whole system on a PPC 834x.

There is not real drawback, only the detection of a connection type


change without unplugging the cable will not work. But this is more an
esoteric use case.

The patch is against kernel 2.6.33. Please merge it.

Signed-off-by: Stefani Seibold <ste...@seibold.net>
---
phy.c | 5 ++---

phy_device.c | 11 ++++++++---
2 files changed, 10 insertions(+), 6 deletions(-)

diff -u -N -r -p linux-2.6.33.orig/drivers/net/phy/phy.c linux-2.6.33/drivers/net/phy/phy.c


--- linux-2.6.33.orig/drivers/net/phy/phy.c 2010-02-24 19:52:17.000000000 +0100

+++ linux-2.6.33/drivers/net/phy/phy.c 2010-02-28 22:53:14.725464101 +0100


@@ -871,9 +871,8 @@ void phy_state_machine(struct work_struc
case PHY_RUNNING:
/* Only register a CHANGE if we are
* polling */
- if (PHY_POLL == phydev->irq)
- phydev->state = PHY_CHANGELINK;
- break;
+ if (PHY_POLL != phydev->irq)
+ break;
case PHY_CHANGELINK:
err = phy_read_status(phydev);

diff -u -N -r -p linux-2.6.33.orig/drivers/net/phy/phy_device.c linux-2.6.33/drivers/net/phy/phy_device.c


--- linux-2.6.33.orig/drivers/net/phy/phy_device.c 2010-02-24 19:52:17.000000000 +0100

+++ linux-2.6.33/drivers/net/phy/phy_device.c 2010-02-28 22:53:14.726464145 +0100


@@ -161,7 +161,7 @@ struct phy_device* phy_device_create(str
dev->speed = 0;
dev->duplex = -1;
dev->pause = dev->asym_pause = 0;
- dev->link = 1;
+ dev->link = 0;
dev->interface = PHY_INTERFACE_MODE_GMII;

dev->autoneg = AUTONEG_ENABLE;

@@ -694,10 +694,15 @@ int genphy_update_link(struct phy_device


if (status < 0)
return status;

- if ((status & BMSR_LSTATUS) == 0)
+ if ((status & BMSR_LSTATUS) == 0) {
+ if (phydev->link == 0)
+ return 1;
phydev->link = 0;
- else

+ } else {
+ if (phydev->link == 1)


+ return 1;
phydev->link = 1;
+ }

return 0;

David Miller

unread,
Mar 13, 2010, 3:20:02 PM3/13/10
to
From: Stefani Seibold <ste...@seibold.net>
Date: Sat, 13 Mar 2010 13:53:10 +0100

> For PHY chips without interrupts, the status of the ethernet will be
> polled every 2 sec. The poll function will read some register of the MII
> PHY. The time between the sending the MII_READ_COMMAND and receiving the
> result could be very long (>100us).

I'm not apply this, as I described in my previous email.

If it's expensive to detect link configuration changes that
doesn't mean you just turn it off.

David Miller

unread,
Mar 13, 2010, 3:20:02 PM3/13/10
to
From: Stefani Seibold <ste...@seibold.net>
Date: Sat, 13 Mar 2010 12:54:02 +0100

> There is not real drawback, only the detection of a connection type
> change without unplugging the cable will not work. But this is more an
> esoteric use case.

I don't think it's so esoteric.

We should definitely notice all link state and capability changes.

If proper functionality cannot be done without an expensive operation,
it doesn't mean we take away the proper functionality.

I really don't agree with these changes, sorry.

Stefani Seibold

unread,
Mar 21, 2010, 6:00:02 PM3/21/10
to
I had now analyzed the PHY handling in most of the network drivers. Most
of the PHY communication will be handled in a polling/blocking way,
write a command word and then wait for the results. Due the nature of
the PHY attachment, this will take some time.

Some of the network drivers do this polling/blocking also in atomic code
paths, like interrupts or timer. So activities on the PHY can cause huge
latency jitters.

On the other side, most of the network driver handle the PHY without
using or only partially using the phylib.

The phylib has also a drawback, because it polls the PHY despite if it
has interrupt support for it or not. I can't see a reason for this
behavior.

So the problem of huge latencies by polling the PHY occurs in most of
the network drivers. For example have a look at the e100 network driver
in the file drivers/net/e100.c, function mdio_ctrl_hw(): This function
will poll for max. of 4000 us or 4 ms.

To fix this latency jitter problem with the PHY polling there are the
following steps to do:

- disable polling in driver/net/phy.c if an interrupt for the PHY is
available
- create an own single or per cpu workqueue for the phylib, so that the
PHY specific code can temporary schedule or block
- prevent all current user of the phylib to access the PHY in a atomic
code path
- modify all current users of the phylib from using cpu_relax() to
cond_resched() and replace the counters against inquiring a timeout
- modify all other network drivers to use the phylib

What do you think?

Stefani

David Miller

unread,
Mar 21, 2010, 9:00:02 PM3/21/10
to
From: Stefani Seibold <ste...@seibold.net>
Date: Sun, 21 Mar 2010 22:54:50 +0100

> The phylib has also a drawback, because it polls the PHY despite if it
> has interrupt support for it or not. I can't see a reason for this
> behavior.

Careful, in my experience many PHYs that do have interrupt
support have buggy implementations to the point where the
interrupt support cannot be used at all.

Typically the problem is that events aren't reported reliably.

So I just wanted you to keep in mind that a chip having
interrupt support doesn't automatically mean it can be used.

0 new messages