> The KSZ8842 has a switch with lots of hardware configurations. The =
> driver uses the proc system to allow users to configure the switch. If =
> this is not desired the whole thing can be removed by not calling the =
> init_proc() function.
I think there needs to be a serious discussion about how
this driver uses bridge layer internals by doing things like:
+/* Needed for STP support. */
+#ifdef CONFIG_KSZ8842_STP
+#include <../net/bridge/br_private.h>
+#endif
and uses procfs to configure the ports.
Stephen please look this over and make suggestions for better
ways to support and configure these kinds of devices.
Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
We have existing standard defines for most of these that should be used
instead where relevant. The code appears not use most of these defines
for generic PCI stuff anyway.
> +#define PORT_CTRL_ADDR(port, addr) \
> + (addr = KS8842_PORT_1_CTRL_1 + port * \
> + (KS8842_PORT_2_CTRL_1 - KS8842_PORT_1_CTRL_1))
Brackets around port so that things like PORT_CTRL_ADDR(x + 1) work
as expected
> +/*
> + * Hardware register access macros
> + */
> +
> +#define hw_dis_intr_sync(hw) hw_dis_intr(hw)
> +
> +#define HW_R8(hw, addr, data) (*(data) = readb((hw)->ioaddr + (addr)))
> +#define HW_W8(hw, addr, data) writeb((data), (hw)->ioaddr + (addr))
> +
> +#define HW_R16(hw, addr, data) (*(data) = readw((hw)->ioaddr + (addr)))
> +#define HW_W16(hw, addr, data) writew((data), (hw)->ioaddr + (addr))
> +
> +#define HW_R32(hw, addr, data) (*(data) = readl((hw)->ioaddr + (addr)))
> +#define HW_W32(hw, addr, data) writel((data), (hw)->ioaddr + (addr))
> +
> +#define HW_PCI_READ_BYTE(hw, addr, data) \
> + pci_read_config_byte((struct pci_dev *) (hw)->pdev, addr, data)
> +
> +#define HW_PCI_READ_WORD(hw, addr, data) \
> + pci_read_config_word((struct pci_dev *) (hw)->pdev, addr, data)
> +
> +#define HW_PCI_READ_DWORD(hw, addr, data) \
> + pci_read_config_dword((struct pci_dev *) (hw)->pdev, addr, data)
> +
> +#define HW_PCI_WRITE_BYTE(hw, addr, data) \
> + pci_write_config_byte((struct pci_dev *) (hw)->pdev, addr, data)
> +
> +#define HW_PCI_WRITE_WORD(hw, addr, data) \
> + pci_write_config_word((struct pci_dev *) (hw)->pdev, addr, data)
> +
> +#define HW_PCI_WRITE_DWORD(hw, addr, data) \
> + pci_write_config_dword((struct pci_dev *) (hw)->pdev, addr, data)
This sort of stuff just obfuscates things so the defines ought to get
removed so the code is more readable to all (remember the Linux kernel
code needs to be generally readable not just readable by Micrel people)
> + * delay_milli - delay in millisecond
> + * @millisec: Number of milliseconds to delay.
> + *
> + * This routine delays in milliseconds.
> + */
> +static void delay_milli(uint millisec)
> +{
> + unsigned long ticks = millisec * HZ / 1000;
> +
> + if (!ticks || in_interrupt())
> + mdelay(millisec);
> + else {
> + set_current_state(TASK_INTERRUPTIBLE);
> + schedule_timeout(ticks);
msleep() is probably what you want for this part ? Note that you really
don't want to be spinning for milliseconds in the IRQ paths if at all
possible.
The DEBUG ioctl is a bit odd - you define it and it does nothing - just a
left over ?
> +
> +#define PCI_VENDOR_ID_KS884X 0x16C6
> +#define PCI_DEVICE_ID_KS8841 0x8841
> +#define PCI_DEVICE_ID_KS8842 0x8842
Those belong in the pci device id header.
In your pci init function you do the following
> + hw->pdev = pdev;
If you make a private copy of pdev in your struct you should refcount it
and use pci_dev_get/pci_dev_put when you take and release the reference.
The proc stuff probably belongs in sysfs nowdays
2010/1/16 Ha, Tristram <Trist...@micrel.com>:
> This is a new network driver for Micrel KSZ8841/KSZ8842 PCI Ethernet chips. �The same driver can run both chips at the same time. �It supports IPv4 TCP hardware checksumming and so can use scatter/gather transmission.
>
> The KSZ8842 has a switch with lots of hardware configurations. �The driver uses the proc system to allow users to configure the switch. �If this is not desired the whole thing can be removed by not calling the init_proc() function.
>
> The KSZ8842 switch has 2 ports. �Some users like to take direct control of those ports. �So KSZ8842 has a multiple devices mode in which the driver creates another network device so that users can specify which port to send packets. �This mode is enabled by passing the "multi_dev=1" parameter to the driver during loading.
[...]
I briefly looked over the datasheet for your device and it looks like
it has the same basic design idea as a "ROBO" switch from Broadcom
that is used in bcm53xx SoCs. This can be seen as a 3-port switch with
one port fixed to be link to CPU (a special kind of PHY).
If my assumptions are correct, then I thing it would be better to have
a driver that presents this device as two: one - an ethernet device,
and second - a switch management interface like the one used in
OpenWRT.
Here are some links for reference:
http://downloads.openwrt.org/kamikaze/8.09.2/kamikaze_8.09.2_source.tar.bz2;
subdir: package/switch/src/
-> switch management driver for ADMTEK Adm6996 and Broadcom BCM5325E/536x
(I'm adding switch driver's authors to CC list. I hope I picket the
correct address for Felix from 3 in the sources. :-)
Best Regards,
Micha� Miros�aw
I've been working on a switch configuration API based on netlink, which
is being used on some of the other platforms in OpenWrt.
You can find the sources here:
https://dev.openwrt.org/export/19173/trunk/target/linux/generic-2.6/files/drivers/net/phy/swconfig.c
https://dev.openwrt.org/export/19173/trunk/target/linux/generic-2.6/files/include/linux/switch.h
It's meant to be used with drivers that hook into the PHY abstraction layer.
You can find an example of such a driver here:
https://dev.openwrt.org/export/19173/trunk/target/linux/generic-2.6/files/drivers/net/phy/rtl8306.c
I'll do the final cleanups and submit it for review once I have time to do so.
- Felix
I like to explain a little bit about this Spanning Tree Protocol
support.
Micrel KSZ8842's 3-port switch and Micrel's other 5-port switches have
port controls to enable/disable tx and rx and stop MAC address learning.
They are supposed to help run STP more efficiently, but somebody needs
to control those ports.
From my observation of how the brctl application controls the network
devices when running STP, I know the kernel bridge puts all the devices
under it in promiscuous mode and declares the state of each bridge port
associated with the device blocked or forwarding depending on the BPDU
frames received. When the port is blocked, the device is still active
and passes all frames to the host. The bridge only looks at BPDU frames
and drops all other frames. It is better to just shut off the port.
From the time when the KSZ8842 driver was developed for the Linux 2.4
kernel I looked for a kernel API to tell the bridge port's state so the
device driver can shut off the port if necessary. I could not find one
and so I came up with this hack to look at the bridge port's structure
directly. It looks dangerous but is quite safe. The driver only looks
at the bridge port state variable and finds out the MAC address
associated with the bridge device. It can get the state definitions
from the if_bridge header. The private bridge structure may change in
the future and break the code, but as kernel network interfaces are
changing all the time, the driver just needs to be modified for the new
version. To avoid this situation, the kernel may need to export two
functions to tell the bridge port state and bridge device address and
put the prototypes in the if_bridge header.
Now for the driver implementation for STP support. I programmed the
switch's static MAC table to always pass the following frames to the
host: BPDU frames with specific multicast address, broadcast frames,
unicast frames with the device bridge's MAC address, and multicast
frames with ICMPv6 multicast address. All other frames are not passed
to the host and are handled by the switch, forwarding each frame with
its standard forwarding logic. The port can be shut off if it is
blocked and those frames will not pass through that port. The host gets
BPDU frames so that the bridge can determine each port's state. The
other broadcast, unicast, and multicast frames passed to the host are
necessary if some other network devices want to communicate with the
host. As the forwarding is done by hardware rather than software,
overall performance does increase.
I did verify the driver disables or enables the port appropriately when
the bridge port state changes, but as I do not have the experience of
running a full-scale Spanning Tree, I do not know if this hardware
implementation behaves the same as the software one provided by the
kernel and brctl. I also did not try using VLAN.
The KSZ884X driver is modified from an old driver with shared code which
are used by other OS. I will clean it up further.
> The DEBUG ioctl is a bit odd - you define it and it does nothing -
just a left over ?
>
>
The KSZ884X has features and configurations that are not accessible from
standard Linux interface. The best way is probably using an application
to access the driver through ioctl interface. As that application
exists outside the kernel, I removed some code in the ioctl function. I
am still looking for a better solution to support those hardware
features. See my procfs comments below.
>
>> +
>> +#define PCI_VENDOR_ID_KS884X 0x16C6
>> +#define PCI_DEVICE_ID_KS8841 0x8841
>> +#define PCI_DEVICE_ID_KS8842 0x8842
>
> Those belong in the pci device id header.
>
>
I do not quite understand your suggestion. Do I need to put those IDs
in one of the kernel headers?
> In your pci init function you do the following
>
>> + hw->pdev = pdev;
>
>
> If you make a private copy of pdev in your struct you should refcount
it and use
> pci_dev_get/pci_dev_put when you take and release the reference.
>
>
The old driver uses shared code which compile in other OS. Those shared
functions use the hw structure, which does not use any OS dependent data
structures, as an anchor to access the hardware. In a few situations
the driver needs to write to the PCI configuration registers. That is
why the hw structure stores the pci device pointer as a generic pointer
variable.
I do not understand how pci_dev_get/pci_dev_put work. Does the pdev
pointer actually change during the lifetime of the PCI driver?
> The proc stuff probably belongs in sysfs nowdays
I need a quick way to access KSZ884X's hardware features without using a
user application. I use procfs as it is easy to implement. I am aware
that sysfs is preferred. But after trying it I do not think it is
appropriate for my purpose.
The sysfs interface requires just a filename, and read/write functions.
It does not provide a means to pass user-defined data. So it is fine to
just modify a simple global variable that controls a driver behavior.
The KSZ884X driver has same settings for each individual port. The
sysfs does not seem to have the capability to create an attribute with
filename like "port1/setting1," so an alternative is to use
"port1_setting1," "port1_setting2," and so on. It does not look
organized. (I used the DEVICE_ATTR macro, and the compiler complained
about the name. I probably did it wrong.) Also, as the attribute
access functions do not know anything about the port, the same identical
functions have to be created for each port. It is not efficient.
I also like the attribute to associate with the network device rather
than the PCI device, as the KSZ8842 driver can create another virtual
network device. I tried to pass the device pointer of the network
device to device_create_file function and the driver crashed. I will
investigate this matter further.
Into include/linux/pci_ids.h
> >> + hw->pdev = pdev;
> >
> >
> > If you make a private copy of pdev in your struct you should refcount
> it and use
> > pci_dev_get/pci_dev_put when you take and release the reference.
> I do not understand how pci_dev_get/pci_dev_put work. Does the pdev
> pointer actually change during the lifetime of the PCI driver?
No but it can go away if the device is removed. The pci_dev_get ensures
it won't go away while you have a pointer to it. and the pci_dev_put
gives up your reference.
> I also like the attribute to associate with the network device rather
> than the PCI device, as the KSZ8842 driver can create another virtual
> network device. I tried to pass the device pointer of the network
> device to device_create_file function and the driver crashed. I will
> investigate this matter further.
Ok
Ok.
> From the time when the KSZ8842 driver was developed for the Linux 2.4
> kernel I looked for a kernel API to tell the bridge port's state so the
> device driver can shut off the port if necessary. I could not find one
> and so I came up with this hack to look at the bridge port's structure
> directly. It looks dangerous but is quite safe. The driver only looks
> at the bridge port state variable and finds out the MAC address
> associated with the bridge device. It can get the state definitions
> from the if_bridge header. The private bridge structure may change in
> the future and break the code, but as kernel network interfaces are
> changing all the time, the driver just needs to be modified for the new
> version. To avoid this situation, the kernel may need to export two
> functions to tell the bridge port state and bridge device address and
> put the prototypes in the if_bridge header.
There was one added for user level RSTP support.
RSTP is where Linux STP is headed, so adding more hooks into
existing STP is going backwards.
> Now for the driver implementation for STP support. I programmed the
> switch's static MAC table to always pass the following frames to the
> host: BPDU frames with specific multicast address, broadcast frames,
> unicast frames with the device bridge's MAC address, and multicast
> frames with ICMPv6 multicast address. All other frames are not passed
> to the host and are handled by the switch, forwarding each frame with
> its standard forwarding logic. The port can be shut off if it is
> blocked and those frames will not pass through that port. The host gets
> BPDU frames so that the bridge can determine each port's state. The
> other broadcast, unicast, and multicast frames passed to the host are
> necessary if some other network devices want to communicate with the
> host. As the forwarding is done by hardware rather than software,
> overall performance does increase.
What about LACP needed by bridging?
Your work looks interesting, but rtl8036 seems to use MDIO page registers to access its own internal registers. Micrel switches do not have that feature and so cannot use that way.
I am looking at those PHY drivers and try to figure out how they work under mdio bus. I am using Micrel's own KSZ8695P and KSZ9692P SoC to develop and debug the KSZ884X PCI driver. KSZ9692P actually has two external PHYs so that the PHY driver can run on it. But the driver does not load because I probably need to add some hooks to the mdio bus driver to do the actual hardware access. Can somebody give me some pointers to this mdio bus support? Does the PHY driver create a user interface like eth0 so that it can be accessed using some tools like ethtool?
Anyway I am afraid the phy driver model does not work on KSZ884X. I cannot think of how the mdio bus accesses hardware registers located on a PCI device.
I also have a config utility for it:
https://dev.openwrt.org/browser/trunk/package/swconfig/src
It currently depends on our config library (libuci), but that can be
removed easily, as the code for that is not spread over the core of the
utility.
Part of this code can also be used as a small library, if you want to
write your own utility.
- Felix
I am not aware of LACP and do not know how this protocol works under
bridging. If the requirement is certain multicast frames do not get
forwarded and must pass to the host bridge, I can add those fixed
multicast addresses. The static MAC table has 8 entries, so there are 4
more to use.
>
If the PCI device is removed, like physically removing the card,
shouldn't the kernel also close the network device assocated with the
PCI device? The driver does actually cleanup the network devices and
free all memory when the pci remove function is called.
From the PCI network drivers included in the kernel I found most drivers
use pci_dev_put only on devices that are outside their own PCI devices,
retrieved from the pci_get_device call. (They never call pci_dev_get.)
That makes sense as those PCI devices are outside PCI driver control.
> Stephen Hemminger wrote:
> >> Now for the driver implementation for STP support. I programmed the
> >> switch's static MAC table to always pass the following frames to the
> >> host: BPDU frames with specific multicast address, broadcast frames,
> >> unicast frames with the device bridge's MAC address, and multicast
> >> frames with ICMPv6 multicast address. All other frames are not
> passed
> >> to the host and are handled by the switch, forwarding each frame with
> >> its standard forwarding logic. The port can be shut off if it is
> >> blocked and those frames will not pass through that port. The host
> >> gets BPDU frames so that the bridge can determine each port's state.
> >> The other broadcast, unicast, and multicast frames passed to the host
> >> are necessary if some other network devices want to communicate with
> >> the host. As the forwarding is done by hardware rather than
> software,
> >> overall performance does increase.
> >
> > What about LACP needed by bridging?
> >
>
> I am not aware of LACP and do not know how this protocol works under
> bridging. If the requirement is certain multicast frames do not get
> forwarded and must pass to the host bridge, I can add those fixed
> multicast addresses. The static MAC table has 8 entries, so there are 4
> more to use.
Anything 01:80:C2:00:00:00 should go local host.
LACP is part of 802.3ad bonding and uses 01:80:C2:00:02
In general anything to 01:80:C2:00:00:XX is likely to be
used by some IEEE 802 standard for link only multicast.
If the strict requirement is to support all 01:80:C2:00:00:XX multicast
addresses, my scheme will not work. It was designed only for STP, as
most our customers request that feature.
I will pass your suggestions to our hardware engineers so that they can
develop a better switch engine.
I wouldn't worry about anything but STP.
> If the strict requirement is to support all 01:80:C2:00:00:XX multicast
> addresses, my scheme will not work. It was designed only for STP, as
> most our customers request that feature.
>
> I will pass your suggestions to our hardware engineers so that they can
> develop a better switch engine.
It is fine as is, just a warning of what standards committees are
likely to invent in future.
--