This sequence of patches will add a framework to libata to allow for
hot-swapping disks in and out.
There are three patches:
01-promise_sataII150_support
02-libata_hotswap_infrastructure
03-promise_hotswap_support
The rationale for each will be described in following emails.
I encourage anyone with design ideas to come forward and contribute, and
anyone who can see concurrency problems (I will describe what I see as
issues along with the second patch) to suggest fixes.
Thus far, I've tested this HEAVILY with a 2.6.11.12 kernel +
2.6.11-libata-dev1.patch. I have found no issues outstanding on that
kernel. All testing was done with Promise SATA150 and SATAII150 Tx4/Tx2
Plus controllers and a huge variety of Western Digital and Seagate disks.
I have ported my patches to 2.6.13-rc3 and 2.6.13-rc3-mm1, and have
tested on the latter as well; they work identically to the 2.6.11 tests
except for a breakage in the SCSI layer [1].
The patches I will attach apply to the latter (2.6.13-rc3-mm1) tree,
since I expect that by the time people start looking at them seriously,
the existing libata patches in that tree will have become part of
mainline. If this is NOT the right thing to do, please tell me, and
I'll submit patches for the requested kernel version.
Enjoy!
Luke Kosewski
Human Cannonball
Net Integration Technologies
[1] The SCSI error on 2.6.13-rc3-mm1 that I found:
'echo "scsi add-single-device a b c d" > /proc/scsi/scsi' //works, or
no-op if the sd corresponding to that device is there already
'echo "scsi remove-single-device a b c d" > /proc/scsi/scsi' //works
'echo "scsi add-single-device a b c d" > /proc/scsi/scsi' //works
'echo "scsi remove-single-device a b c d" > /proc/scsi/scsi' //FAILS
As such, since the same underlying functions are called by hotplugging,
you will only be able to remove a disk from a device once before it
fails, until this error is fixed. I'll look into it as well.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
The design goals for this framework were as follows:
- easy, stable API.
- simplicity of design and code
- as damn near as we can get to a guarantee that we will NOT panic the
kernel if the user does something stupid, an an attempt to guarantee
correct behaviour anyways.
To meet these goals, I have many comments. The new API, as far as
driver writers see, is two functions 'ata_hotplug_plug' and
'ata_hotplug_unplug'. Respectively, these should be called when a new
disk is picked up or removed. This seems like the most logical way to
go about this.
The functions use a single shared workqueue to schedule plug and unplug
events. In the 'normal case' where a user merely plugs in or unplugs a
disk, you can trivially follow the functionality. The edge cases all
have to do with these cases: what happens when the user really quickly
plugs/unplugs disks, or has pending I/O?
You'll note at a cursory glance that we need to call an ata_bus_reset
when plugging in a disk; this means that we have to wait a bit for
everything to settle. If the user goes and unplugs the disk on us, we
need to immediately take corrective action.
As such, the 'plug' function is a little complicated.
There is also the issue of I/O (or other requests?) pending on a device
when we unplug it. Since we do not want to panic the kernel, we need to
handle them gracefully. As it stands, the SCSI layer does this for us
rather nicely, EXCEPT in the case that we have an outstanding qc on our
device, waiting for completion. This is the explanation for the 'if
(qc)' checks in various parts of the code to kill these functions.
I must point out that I have NOT tested this on SMP systems, where we
can potentially be servicing more than one request from our request
queue. Potentially doing a plug/unplug/plug/unplug/etc. kind of action
very fast might lead us to running multiple 'plug' and 'unplug'
functions simultaneously.
I did NOT want to complicate the libata code by throwing spinlocks
around all over the place to protect against this (changing working,
existing code), instead, I have wrapped the 'plug' and 'unplug'
functions within a mutex. This should ensure no erroneous behaviour,
and still service requests in a timely fashion 99% of the time (on UP
systems, it actually makes no difference).
There are GRATUITOUS comments put in the code for you to look at to see
if my logic is flawed. Have fun!
Luke Kosewski
Human Cannonball
Net Integration Technologies
Signed-off-by: Luke Kosewski <lkos...@nit.ca>
We handle hotplug interrupts and call our new API functions in libata.
There is also one new part for the SATAII150 controllers and how they
handle a phy_reset, since they seem to want to re-spew their last
hotplug interrupt unless we mask and clear it.
I don't expect this to be too contentious either.
Also, the promise interrupt handler uses a 'spin_lock', I have changed
it into a 'spin_lock_irqsave', since I observe this on most other libata
drivers, so for consistency.
I've also set up a new callback for handling SATAII150 interrupts, which
seemingly does nothing in this patch. Yes, it does nothing. Wait until
we get to patch 03, though.
I don't expect there to be too much contentious material in this patch.
echo -n 1 > /sys/.../hostA/targetA:B:C/A:B:C:D/delete
still works. I think.
And (again, I think) this same problem exists with 2.6.11 as well.
At least, I wasn't able to remove-single-device even once (I discovered
this mechanism only recently, haven't tried it with other kernels).
> As such, since the same underlying functions are called by hotplugging,
> you will only be able to remove a disk from a device once before it
> fails, until this error is fixed. I'll look into it as well.
/mjt
You're both correct and incorrect based on my testing; in 2.6.11.12, I
have no problems.
However, in 2.6.13-rc1-mm1, you're right that echoing 1 into the delete
node does remove the device.
It seems that there is some issue with the 'scsi_device_lookup' function
then?
I'd have to debug further.
Luke Kosewski
Human Cannonball
Net Integration Technologies
Pretty cool stuff!
As soon as I finish SATA ATAPI (this week[end]), I'll take a look at
this. A quick review of the patches didn't turn up anything terribly
objectionable, though :)
Jeff
P.S. You might want to CC linux-ide as well, on libata patches.
Excellent! Thanks for the quick turnaround!
> P.S. You might want to CC linux-ide as well, on libata patches.
I've CCed the original patches to Linux-ide for comments. Also, I've
re-sent patch 3 with a minor fix to boolean logic with bits.
Enjoy!
Luke Kosewski
Human Cannonball
Net Integration Technologies
Comments:
1) Interrupt handler should use the much-less-expensive spin_lock().
spin_lock_irqsave() is only used where two separate interrupts (legacy
IDE irqs 14 & 15) could be sharing the same interrupt handler.
2) Don't pass the hotplug register offset as an argument to a function.
Add the offset as a member of struct pdc_port_priv.
This eliminates the need to split up the interrupt handler as you have done.
3) Don't comment out ATA_FLAG_SATA, it is a SATA controller :)
Jeff
P.S. Watch for the SiI hotplug email I'm about to send, too...
Don't special case this. disable/enable hotplug interrupts for all
controllers.
> static void pdc_pata_cbl_detect(struct ata_port *ap)
> {
> u8 tmp;
> @@ -483,6 +526,7 @@ static inline unsigned int pdc_interrupt
> struct ata_host_set *host_set = dev_instance;
> struct ata_port *ap;
> void *mmio_base;
> + u8 plugdata, maskflags;
> u32 mask = 0;
> unsigned int i, tmp, handled = 0;
> unsigned long flags;
> @@ -492,21 +536,18 @@ static inline unsigned int pdc_interrupt
> }
>
> mmio_base = host_set->mmio_base;
> -
> +
> spin_lock_irqsave(&host_set->lock, flags);
>
> /* reading should also clear interrupts */
> mask = readl(mmio_base + PDC_INT_SEQMASK);
>
> - if (mask == 0xffffffff) {
> - VPRINTK("QUICK EXIT 2\n");
> - goto done_irq;
> - }
> + if (mask == 0xffffffff)
> + goto try_hotplug;
you are confusing controller hotplug (mask == 0xffffffff) and device
hotplug.
controller hotplug should already be handled correct, by testing
0xffffffff and supporting the PCI ->remove hook.
> mask &= 0xffff; /* only 16 tags possible */
> - if (!mask) {
> - VPRINTK("QUICK EXIT 3\n");
> - goto done_irq;
> - }
> + if (!mask)
> + goto try_hotplug;
>
> writel(mask, mmio_base + PDC_INT_SEQMASK);
>
> @@ -522,7 +563,32 @@ static inline unsigned int pdc_interrupt
> handled += pdc_host_intr(ap, qc);
> }
> }
> -
> +
> + if (handled)
> + goto done_irq;
> +
> +try_hotplug:
> + plugdata = readb(mmio_base + hotplug_regs);
> + maskflags = readb(mmio_base + hotplug_regs + 2);
> + plugdata &= ~maskflags;
> + if (plugdata) {
> + writeb(plugdata, mmio_base + hotplug_regs);
> + for (i = 0; i < host_set->n_ports; ++i) {
> + ap = host_set->ports[i];
> + if (!(ap->flags & ATA_FLAG_SATA))
> + continue; //No PATA support here... yet
> + // Check unplug flag
> + if (plugdata & 0x01) {
> + ata_hotplug_unplug(ap);
> + handled = 1;
> + } else if ((plugdata >> 4) & 0x01) { //Check plug flag
> + ata_hotplug_plug(ap);
> + handled = 1;
> + }
> + plugdata >>= 1;
you probably need a debounce timer; see the SiI email I am about to send.
> done_irq:
> spin_unlock_irqrestore(&host_set->lock, flags);
> return handled;
> @@ -636,9 +702,9 @@ static void pdc_host_init(unsigned int c
> tmp = readl(offset);
> writel(tmp | 0xff, offset);
>
> - /* mask plug/unplug ints */
> + /* unmask plug/unplug ints */
> tmp = readl(offset);
> - writel(tmp | 0xff0000, offset);
> + writel(tmp & 0xff00ffff, offset);
>
> /* reduce TBG clock to 133 Mhz. */
> tmp = readl(mmio + PDC_TBG_MODE);
I would like to offer to test when you are ready. Some older and new SATAPI
drives, various chipsets (ICH{5,6}, TX4 on the way). And a SATA analyzer
for anything really odd.
++doug
Great!
It'll be posted here on linux-ide, so just keep an eye out.
Analysis of any portion of libata, with a SATA analyzer, would be much
appreciated.
Jeff