[PATCH 0/3] Add disk hotswap support to libata

Lukasz Kosewski

unread,

Jul 21, 2005, 5:20:09 PM7/21/05

to

Hey all, introductory blurb here.

This sequence of patches will add a framework to libata to allow for
hot-swapping disks in and out.

There are three patches:
01-promise_sataII150_support
02-libata_hotswap_infrastructure
03-promise_hotswap_support

The rationale for each will be described in following emails.

I encourage anyone with design ideas to come forward and contribute, and
anyone who can see concurrency problems (I will describe what I see as
issues along with the second patch) to suggest fixes.

Thus far, I've tested this HEAVILY with a 2.6.11.12 kernel +
2.6.11-libata-dev1.patch. I have found no issues outstanding on that
kernel. All testing was done with Promise SATA150 and SATAII150 Tx4/Tx2
Plus controllers and a huge variety of Western Digital and Seagate disks.

I have ported my patches to 2.6.13-rc3 and 2.6.13-rc3-mm1, and have
tested on the latter as well; they work identically to the 2.6.11 tests
except for a breakage in the SCSI layer [1].

The patches I will attach apply to the latter (2.6.13-rc3-mm1) tree,
since I expect that by the time people start looking at them seriously,
the existing libata patches in that tree will have become part of
mainline. If this is NOT the right thing to do, please tell me, and
I'll submit patches for the requested kernel version.

Enjoy!

Luke Kosewski
Human Cannonball
Net Integration Technologies

[1] The SCSI error on 2.6.13-rc3-mm1 that I found:
'echo "scsi add-single-device a b c d" > /proc/scsi/scsi' //works, or
no-op if the sd corresponding to that device is there already
'echo "scsi remove-single-device a b c d" > /proc/scsi/scsi' //works
'echo "scsi add-single-device a b c d" > /proc/scsi/scsi' //works
'echo "scsi remove-single-device a b c d" > /proc/scsi/scsi' //FAILS

As such, since the same underlying functions are called by hotplugging,
you will only be able to remove a disk from a device once before it
fails, until this error is fixed. I'll look into it as well.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Lukasz Kosewski

unread,

Jul 21, 2005, 5:30:12 PM7/21/05

to

This patch is probably the most contentious one; adding a hotswap
framework to libata to allow it to handle disk plugs/unplugs.

The design goals for this framework were as follows:
- easy, stable API.
- simplicity of design and code
- as damn near as we can get to a guarantee that we will NOT panic the
kernel if the user does something stupid, an an attempt to guarantee
correct behaviour anyways.

To meet these goals, I have many comments. The new API, as far as
driver writers see, is two functions 'ata_hotplug_plug' and
'ata_hotplug_unplug'. Respectively, these should be called when a new
disk is picked up or removed. This seems like the most logical way to
go about this.

The functions use a single shared workqueue to schedule plug and unplug
events. In the 'normal case' where a user merely plugs in or unplugs a
disk, you can trivially follow the functionality. The edge cases all
have to do with these cases: what happens when the user really quickly
plugs/unplugs disks, or has pending I/O?

You'll note at a cursory glance that we need to call an ata_bus_reset
when plugging in a disk; this means that we have to wait a bit for
everything to settle. If the user goes and unplugs the disk on us, we
need to immediately take corrective action.

As such, the 'plug' function is a little complicated.

There is also the issue of I/O (or other requests?) pending on a device
when we unplug it. Since we do not want to panic the kernel, we need to
handle them gracefully. As it stands, the SCSI layer does this for us
rather nicely, EXCEPT in the case that we have an outstanding qc on our
device, waiting for completion. This is the explanation for the 'if
(qc)' checks in various parts of the code to kill these functions.

I must point out that I have NOT tested this on SMP systems, where we
can potentially be servicing more than one request from our request
queue. Potentially doing a plug/unplug/plug/unplug/etc. kind of action
very fast might lead us to running multiple 'plug' and 'unplug'
functions simultaneously.

I did NOT want to complicate the libata code by throwing spinlocks
around all over the place to protect against this (changing working,
existing code), instead, I have wrapped the 'plug' and 'unplug'
functions within a mutex. This should ensure no erroneous behaviour,
and still service requests in a timely fashion 99% of the time (on UP
systems, it actually makes no difference).

There are GRATUITOUS comments put in the code for you to look at to see
if my logic is flawed. Have fun!

Luke Kosewski
Human Cannonball
Net Integration Technologies

Signed-off-by: Luke Kosewski <lkos...@nit.ca>

02-libata_hotswap_infrastructure-2.6.13-rc3-mm1.diff

Lukasz Kosewski

unread,

Jul 21, 2005, 5:30:14 PM7/21/05

to

This patch is an implementation of hotswap on the sata_promise module,
tested on SATA150 and SATAII150 Promise controllers. It depends on
patch 01 from this series of patches to apply, and both 01 and 02 in
order to compile.

We handle hotplug interrupts and call our new API functions in libata.

There is also one new part for the SATAII150 controllers and how they
handle a phy_reset, since they seem to want to re-spew their last
hotplug interrupt unless we mask and clear it.

I don't expect this to be too contentious either.

03-promise_hotswap_support-2.6.13-rc3-mm1.diff

Lukasz Kosewski

unread,

Jul 21, 2005, 5:30:14 PM7/21/05

to

This patch changes the sata_promise driver in libata to correctly mask
out hotplug interrupts. The location of the primary hotplug registers
in the SATA150 Tx4/Tx2 Plus controllers is correctly defined as '0x6C',
HOWEVER, for the SATAII150 Tx4/Tx2 Plus controllers, this changes to
'0x60'. This patch rectifies us 'masking out interrupts' at the wrong
location, thus not masking them out at all.

Also, the promise interrupt handler uses a 'spin_lock', I have changed
it into a 'spin_lock_irqsave', since I observe this on most other libata
drivers, so for consistency.

I've also set up a new callback for handling SATAII150 interrupts, which
seemingly does nothing in this patch. Yes, it does nothing. Wait until
we get to patch 03, though.

I don't expect there to be too much contentious material in this patch.

01-promise_sataII150_support-2.6.13-rc3-mm1.diff

Michael Tokarev

unread,

Jul 21, 2005, 6:10:11 PM7/21/05

to

Lukasz Kosewski wrote:
[]

> [1] The SCSI error on 2.6.13-rc3-mm1 that I found:
> 'echo "scsi add-single-device a b c d" > /proc/scsi/scsi' //works, or
> no-op if the sd corresponding to that device is there already
> 'echo "scsi remove-single-device a b c d" > /proc/scsi/scsi' //works
> 'echo "scsi add-single-device a b c d" > /proc/scsi/scsi' //works
> 'echo "scsi remove-single-device a b c d" > /proc/scsi/scsi' //FAILS

echo -n 1 > /sys/.../hostA/targetA:B:C/A:B:C:D/delete
still works. I think.
And (again, I think) this same problem exists with 2.6.11 as well.
At least, I wasn't able to remove-single-device even once (I discovered
this mechanism only recently, haven't tried it with other kernels).

> As such, since the same underlying functions are called by hotplugging,
> you will only be able to remove a disk from a device once before it
> fails, until this error is fixed. I'll look into it as well.

/mjt

Lukasz Kosewski

unread,

Jul 21, 2005, 6:20:07 PM7/21/05

to

Michael Tokarev wrote:
> echo -n 1 > /sys/.../hostA/targetA:B:C/A:B:C:D/delete
> still works. I think.
> And (again, I think) this same problem exists with 2.6.11 as well.
> At least, I wasn't able to remove-single-device even once (I discovered
> this mechanism only recently, haven't tried it with other kernels).

You're both correct and incorrect based on my testing; in 2.6.11.12, I
have no problems.

However, in 2.6.13-rc1-mm1, you're right that echoing 1 into the delete
node does remove the device.

It seems that there is some issue with the 'scsi_device_lookup' function
then?

I'd have to debug further.

Luke Kosewski
Human Cannonball
Net Integration Technologies

Jeff Garzik

unread,

Jul 21, 2005, 9:40:06 PM7/21/05

to

Lukasz Kosewski wrote:
> Hey all, introductory blurb here.
>
> This sequence of patches will add a framework to libata to allow for
> hot-swapping disks in and out.
>
> There are three patches:
> 01-promise_sataII150_support
> 02-libata_hotswap_infrastructure
> 03-promise_hotswap_support

Pretty cool stuff!

As soon as I finish SATA ATAPI (this week[end]), I'll take a look at
this. A quick review of the patches didn't turn up anything terribly
objectionable, though :)

Jeff

P.S. You might want to CC linux-ide as well, on libata patches.

Lukasz Kosewski

unread,

Jul 22, 2005, 2:40:08 PM7/22/05

to

On 7/21/05, Jeff Garzik <jga...@pobox.com> wrote:
>
> Pretty cool stuff!
>
> As soon as I finish SATA ATAPI (this week[end]), I'll take a look at
> this. A quick review of the patches didn't turn up anything terribly
> objectionable, though :)

Excellent! Thanks for the quick turnaround!

> P.S. You might want to CC linux-ide as well, on libata patches.

I've CCed the original patches to Linux-ide for comments. Also, I've
re-sent patch 3 with a minor fix to boolean logic with bits.

Enjoy!

Luke Kosewski
Human Cannonball
Net Integration Technologies

Jeff Garzik

unread,

Jul 26, 2005, 1:30:31 PM7/26/05

to

Lukasz Kosewski wrote:
> This patch changes the sata_promise driver in libata to correctly mask
> out hotplug interrupts. The location of the primary hotplug registers
> in the SATA150 Tx4/Tx2 Plus controllers is correctly defined as '0x6C',
> HOWEVER, for the SATAII150 Tx4/Tx2 Plus controllers, this changes to
> '0x60'. This patch rectifies us 'masking out interrupts' at the wrong
> location, thus not masking them out at all.
>
> Also, the promise interrupt handler uses a 'spin_lock', I have changed
> it into a 'spin_lock_irqsave', since I observe this on most other libata
> drivers, so for consistency.

Comments:

1) Interrupt handler should use the much-less-expensive spin_lock().
spin_lock_irqsave() is only used where two separate interrupts (legacy
IDE irqs 14 & 15) could be sharing the same interrupt handler.

2) Don't pass the hotplug register offset as an argument to a function.
Add the offset as a member of struct pdc_port_priv.

This eliminates the need to split up the interrupt handler as you have done.

3) Don't comment out ATA_FLAG_SATA, it is a SATA controller :)

Jeff

P.S. Watch for the SiI hotplug email I'm about to send, too...

Jeff Garzik

unread,

Jul 26, 2005, 1:30:27 PM7/26/05

to

Lukasz Kosewski wrote:
> Hey Jeff, everyone.
>
> This is a resend of patch 3 of my libata hotswap series, wherein I
> found a silly flaw in my logic. I don't know what kind of crack I was
> smoking, but I somehow turned "a = a & ~b" in the Promise driver into
> "a = a ^ b" in my hotswap code, which is clearly wrong. I don't know
> how it worked through my testing, but I'm willing to bet this is a
> coincidence. Please apply this patch instead of the previously sent
> patch 03.

>
> Luke Kosewski
> Human Cannonball
> Net Integration Technologies
>
>

> ------------------------------------------------------------------------
>
> 21.07.05 Luke Kosewski <lkos...@nit.ca>
>
> * A full implementation of hotplug on a libata controller, this being
> the Promise Tx4/Tx2 Plus controller line (both SATA150 and SATAII150).
> Almost all of the code pertaining to how to talk to the hotplug
> registers has been stolen from the pdc-ulsata2 and ultra-1.0.8 Promise
> drivers. This involves detecting when we have an interrupt pending
> and on what device, as well as the bit where a hard SATA reset gets
> a SATAII150 controller to re-spew a plug interrupt.
> * Note that the hotplug handling code comes AFTER the normal interrupt
> handling code in pdc_interrupt_common; this is because we're much
> more likely to receive normal interrupts, so this drops the AVERAGE
> interrupt handling time down a lot.
>
> Signed-off-by: Luke Kosewski <lkos...@nit.ca>
>
> --- linux-2.6.13-rc3/drivers/scsi/sata_promise.c.old 2005-07-21 13:52:13.037895639 -0400
> +++ linux-2.6.13-rc3/drivers/scsi/sata_promise.c 2005-07-21 13:55:53.490964645 -0400
> @@ -84,6 +84,7 @@ static void pdc_eng_timeout(struct ata_p
> static int pdc_port_start(struct ata_port *ap);
> static void pdc_port_stop(struct ata_port *ap);
> static void pdc_phy_reset(struct ata_port *ap);
> +static void pdc2_phy_reset(struct ata_port *ap);
> static void pdc_pata_phy_reset(struct ata_port *ap);
> static void pdc_pata_cbl_detect(struct ata_port *ap);
> static void pdc_qc_prep(struct ata_queued_cmd *qc);
> @@ -139,7 +140,7 @@ static struct ata_port_operations pdc2_a
> .check_status = ata_check_status,
> .exec_command = pdc_exec_command_mmio,
> .dev_select = ata_std_dev_select,
> - .phy_reset = pdc_phy_reset,
> + .phy_reset = pdc2_phy_reset,
> .qc_prep = pdc_qc_prep,
> .qc_issue = pdc_qc_issue_prot,
> .eng_timeout = pdc_eng_timeout,
> @@ -325,6 +326,48 @@ static void pdc_phy_reset(struct ata_por
> pdc_pata_phy_reset(ap);
> }
>
> +/* Mask hotplug interrupts for one channel (ap) */
> +static inline void pdc2_disable_channel_hotplug_interrupts(struct ata_port *ap)
> +{
> + void *mmio = ap->host_set->mmio_base + PDC2_SATA_PLUG_CSR + 2;
> +
> + u8 maskflags = readb(mmio);
> + maskflags |= (0x11 << (u8)ap->hard_port_no);
> + writeb(maskflags, mmio);
> +}
> +
> +static inline void pdc2_enable_channel_hotplug_interrupts(struct ata_port *ap)
> +{
> +
> + void *mmio = ap->host_set->mmio_base + PDC2_SATA_PLUG_CSR;
> +
> + //Clear channel hotplug interrupts
> + u8 maskflags = readb(mmio);
> + maskflags = (0x11 << (u8)ap->hard_port_no);
> + writeb(maskflags, mmio);
> +
> + //Unmask channel hotplug interrupts
> + maskflags = readb(mmio + 2);
> + maskflags &= ~(0x11 << (u8)ap->hard_port_no);
> + writeb(maskflags, mmio + 2);
> +}
> +
> +static void pdc2_phy_reset(struct ata_port *ap)
> +{
> + /* As observed on the Promise SATAII150 Tx2 Plus/Tx4, giving the
> + * controller a hard reset triggers another hotplug interrupt. So
> + * disable them for the hard reset, and re-enable afterwards.
> + *
> + * No PATA support here yet
> + */
> + if (ap->flags & ATA_FLAG_SATA_RESET && ap->flags & ATA_FLAG_SATA) {
> + pdc2_disable_channel_hotplug_interrupts(ap);
> + pdc_phy_reset(ap);
> + pdc2_enable_channel_hotplug_interrupts(ap);
> + } else
> + pdc_phy_reset(ap);
> +}

Don't special case this. disable/enable hotplug interrupts for all
controllers.

> static void pdc_pata_cbl_detect(struct ata_port *ap)
> {
> u8 tmp;
> @@ -483,6 +526,7 @@ static inline unsigned int pdc_interrupt
> struct ata_host_set *host_set = dev_instance;
> struct ata_port *ap;
> void *mmio_base;
> + u8 plugdata, maskflags;
> u32 mask = 0;
> unsigned int i, tmp, handled = 0;
> unsigned long flags;
> @@ -492,21 +536,18 @@ static inline unsigned int pdc_interrupt
> }
>
> mmio_base = host_set->mmio_base;
> -
> +
> spin_lock_irqsave(&host_set->lock, flags);
>
> /* reading should also clear interrupts */
> mask = readl(mmio_base + PDC_INT_SEQMASK);
>
> - if (mask == 0xffffffff) {
> - VPRINTK("QUICK EXIT 2\n");
> - goto done_irq;
> - }
> + if (mask == 0xffffffff)
> + goto try_hotplug;

you are confusing controller hotplug (mask == 0xffffffff) and device
hotplug.

controller hotplug should already be handled correct, by testing
0xffffffff and supporting the PCI ->remove hook.

> mask &= 0xffff; /* only 16 tags possible */
> - if (!mask) {
> - VPRINTK("QUICK EXIT 3\n");
> - goto done_irq;
> - }
> + if (!mask)
> + goto try_hotplug;
>
> writel(mask, mmio_base + PDC_INT_SEQMASK);
>
> @@ -522,7 +563,32 @@ static inline unsigned int pdc_interrupt
> handled += pdc_host_intr(ap, qc);
> }
> }
> -
> +
> + if (handled)
> + goto done_irq;
> +
> +try_hotplug:
> + plugdata = readb(mmio_base + hotplug_regs);
> + maskflags = readb(mmio_base + hotplug_regs + 2);
> + plugdata &= ~maskflags;
> + if (plugdata) {
> + writeb(plugdata, mmio_base + hotplug_regs);
> + for (i = 0; i < host_set->n_ports; ++i) {
> + ap = host_set->ports[i];
> + if (!(ap->flags & ATA_FLAG_SATA))
> + continue; //No PATA support here... yet
> + // Check unplug flag
> + if (plugdata & 0x01) {
> + ata_hotplug_unplug(ap);
> + handled = 1;
> + } else if ((plugdata >> 4) & 0x01) { //Check plug flag
> + ata_hotplug_plug(ap);
> + handled = 1;
> + }
> + plugdata >>= 1;

you probably need a debounce timer; see the SiI email I am about to send.

> done_irq:
> spin_unlock_irqrestore(&host_set->lock, flags);
> return handled;
> @@ -636,9 +702,9 @@ static void pdc_host_init(unsigned int c
> tmp = readl(offset);
> writel(tmp | 0xff, offset);
>
> - /* mask plug/unplug ints */
> + /* unmask plug/unplug ints */
> tmp = readl(offset);
> - writel(tmp | 0xff0000, offset);
> + writel(tmp & 0xff00ffff, offset);
>
> /* reduce TBG clock to 133 Mhz. */
> tmp = readl(mmio + PDC_TBG_MODE);

Doug Maxey

unread,

Jul 28, 2005, 3:00:11 PM7/28/05

to

On Thu, 21 Jul 2005 21:35:24 EDT, Jeff Garzik wrote:
>As soon as I finish SATA ATAPI (this week[end]), I'll take a look at
>this. A quick review of the patches didn't turn up anything terribly
>objectionable, though :)
>

I would like to offer to test when you are ready. Some older and new SATAPI
drives, various chipsets (ICH{5,6}, TX4 on the way). And a SATA analyzer
for anything really odd.

++doug

Jeff Garzik

unread,

Jul 28, 2005, 3:20:13 PM7/28/05

to

Doug Maxey wrote:
> On Thu, 21 Jul 2005 21:35:24 EDT, Jeff Garzik wrote:
>
>>As soon as I finish SATA ATAPI (this week[end]), I'll take a look at
>>this. A quick review of the patches didn't turn up anything terribly
>>objectionable, though :)
>>
>
>
> I would like to offer to test when you are ready. Some older and new SATAPI
> drives, various chipsets (ICH{5,6}, TX4 on the way). And a SATA analyzer
> for anything really odd.

Great!

It'll be posted here on linux-ide, so just keep an eye out.

Analysis of any portion of libata, with a SATA analyzer, would be much
appreciated.

Jeff