cciss: Fix race between disk-adding code and interrupt handler
 
scame...@beardog.cca.cpqcorp.net  
Newsgroups: linux.kernel
From: scame...@beardog.cca.cpqcorp.net
Date: Mon, 14 Apr 2008 16:30:20 +0200
Local: Mon, Apr 14 2008 10:30 am
Subject: [patch] cciss: Fix race between disk-adding code and interrupt handler

Fix race condition between cciss_init_one(), cciss_update_drive_info(),
and cciss_check_queues().  cciss_softirq_done would try to start
queues which were not quite ready to be started, as its checks for
readiness were not sufficiently synchronized with the queue initializing
code in cciss_init_one and cciss_update_drive_info.  Slow cpu and
large numbers of logical drives seem to make the race more likely
to cause a problem.

Signed-off-by: Stephen M. Cameron <scame...@beardog.cca.cpqcorp.net>

---

 linux-2.6.25-rc9/drivers/block/cciss.c |   25 ++++++++++++++++++++++++-
 linux-2.6.25-rc9/drivers/block/cciss.h |    3 +++
 2 files changed, 27 insertions(+), 1 deletion(-)

diff -puN linux-2.6.25-rc9/drivers/block/cciss.c~cciss_init_one_race linux-2.6.25-rc9/drivers/block/cciss.c
--- linux-2.6.25-rc9/linux-2.6.25-rc9/drivers/block/cciss.c~cciss_init_one_race 2008-04-14 08:21:03.000000000 -0500
+++ linux-2.6.25-rc9-scameron/linux-2.6.25-rc9/drivers/block/cciss.c    2008-04-14 08:22:04.000000000 -0500
@@ -1270,7 +1270,9 @@ static void cciss_check_queues(ctlr_info
                /* make sure the disk has been added and the drive is real
                 * because this can be called from the middle of init_one.
                 */
-               if (!(h->drv[curr_queue].queue) || !(h->drv[curr_queue].heads))
+               if (!(h->drv[curr_queue].queue) ||
+                       !(h->drv[curr_queue].heads) ||
+                       !h->drv[curr_queue].queue_ready)
                        continue;
                blk_start_queue(h->gendisk[curr_queue]->queue);

@@ -1394,6 +1396,11 @@ geo_inq:

        /* if it's the controller it's already added */
        if (drv_index) {
+
+               /* Prevent race with interrupt handler's queue starting code. */
+               h->drv[drv_index].queue_ready = 0;
+               wmb();
+
                disk->queue = blk_init_queue(do_cciss_request, &h->lock);
                sprintf(disk->disk_name, "cciss/c%dd%d", ctlr, drv_index);
                disk->major = h->major;
@@ -1420,6 +1427,11 @@ geo_inq:
                                        hba[ctlr]->drv[drv_index].block_size);

                h->drv[drv_index].queue = disk->queue;
+
+               /* Prevent race with interrupt handler's queue starting code. */
+               wmb();
+               h->drv[drv_index].queue_ready = 1;
+
                add_disk(disk);
        }

@@ -3473,6 +3485,11 @@ static int __devinit cciss_init_one(stru
                struct gendisk *disk = hba[i]->gendisk[j];
                struct request_queue *q;

+               /* prevent race with interrupt handler's */
+               /* queue starting code. */
+               drv->queue_ready = 0;
+               wmb();
+
                /* Check if the disk was allocated already */
                if (!disk){
                        hba[i]->gendisk[j] = alloc_disk(1 << NWD_SHIFT);
@@ -3520,6 +3537,12 @@ static int __devinit cciss_init_one(stru
                        continue;
                blk_queue_hardsect_size(q, drv->block_size);
                set_capacity(disk, drv->nr_blocks);
+
+               /* prevent race with interrupt handler's */
+               /* queue starting code. */
+               wmb();
+               drv->queue_ready = 1;
+
                add_disk(disk);
                j++;
        } while (j <= hba[i]->highest_lun);
diff -puN linux-2.6.25-rc9/drivers/block/cciss.h~cciss_init_one_race linux-2.6.25-rc9/drivers/block/cciss.h
--- linux-2.6.25-rc9/linux-2.6.25-rc9/drivers/block/cciss.h~cciss_init_one_race 2008-04-14 08:21:06.000000000 -0500
+++ linux-2.6.25-rc9-scameron/linux-2.6.25-rc9/drivers/block/cciss.h    2008-04-14 08:21:58.000000000 -0500
@@ -39,6 +39,9 @@ typedef struct _drive_info_struct
                                   *to prevent it from being opened or it's queue
                                   *from being started.
                                  */
+       int queue_ready; /* This is used to prevent the interrupt handler */
+                        /* from racing (while starting up queues) with */
+                        /* cciss_init_one() (while setting up new queues) */
 } drive_info_struct;

 #ifdef CONFIG_CISS_SCSI_TAPE
_
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
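The first version of the patch guards the queue with a `queue_ready` flag that is written only after a `wmb()`. The publish-then-flag idea can be sketched in userspace C. This is a minimal sketch, assuming C11 `<stdatomic.h>` as a stand-in for the kernel barrier: the writer fills in the drive fields and then sets the flag with a release store, and the reader only trusts the fields after seeing the flag with an acquire load. `struct flag_drive` and the `flag_drive_*` functions are hypothetical names, not driver code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for drive_info_struct, reduced to the fields
 * the readiness check in cciss_check_queues() looks at. */
struct flag_drive {
    void *queue;             /* like h->drv[i].queue */
    int heads;               /* like h->drv[i].heads */
    atomic_int queue_ready;  /* like the patch's queue_ready */
};

/* Start out "not ready" with no queue attached. */
static void flag_drive_init(struct flag_drive *d)
{
    d->queue = NULL;
    d->heads = 0;
    atomic_init(&d->queue_ready, 0);
}

/* Writer (cciss_init_one() analogue): fill in the fields first, then
 * raise the flag. The release store keeps the field stores from being
 * reordered after it, which is the job of the wmb() in the patch. */
static void flag_drive_publish(struct flag_drive *d, void *queue, int heads)
{
    assert(queue != NULL && heads > 0);
    d->queue = queue;
    d->heads = heads;
    atomic_store_explicit(&d->queue_ready, 1, memory_order_release);
}

/* Reader (cciss_check_queues() analogue): true if the queue may be
 * started. The acquire load pairs with the release store above, so
 * once the flag reads 1 the field values are visible too. */
static bool flag_drive_can_start(struct flag_drive *d)
{
    if (!atomic_load_explicit(&d->queue_ready, memory_order_acquire))
        return false;
    return d->queue != NULL && d->heads != 0;
}
```

Only the initial publication is modelled here; the patch also clears the flag before re-initialising a drive in cciss_update_drive_info(). In the C11 model a release store is normally paired with an acquire load on the reading side, whereas the kernel expresses the read half separately (for example with `smp_rmb()`).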

Jens Axboe  
Newsgroups: linux.kernel
From: Jens Axboe <jens.ax...@oracle.com>
Date: Mon, 14 Apr 2008 19:10:13 +0200
Local: Mon, Apr 14 2008 1:10 pm
Subject: Re: [patch] cciss: Fix race between disk-adding code and interrupt handler

On Mon, Apr 14 2008, scame...@beardog.cca.cpqcorp.net wrote:

> Fix race condition between cciss_init_one(), cciss_update_drive_info(),
> and cciss_check_queues().  cciss_softirq_done would try to start
> queues which were not quite ready to be started, as its checks for
> readiness were not sufficiently synchronized with the queue initializing
> code in cciss_init_one and cciss_update_drive_info.  Slow cpu and
> large numbers of logical drives seem to make the race more likely
> to cause a problem.

Hmm, this seems backwards to me.  cciss_softirq_done() isn't going to
start the queues, until an irq has triggered for instance. Why isn't the
init properly ordered instead of band-aiding around this with a
'queue_ready' variable?

--
Jens Axboe

scame...@beardog.cca.cpqcorp.net  
Newsgroups: linux.kernel
From: scame...@beardog.cca.cpqcorp.net
Date: Mon, 14 Apr 2008 19:30:09 +0200
Local: Mon, Apr 14 2008 1:30 pm
Subject: Re: [patch] cciss: Fix race between disk-adding code and interrupt handler

Each call to add_disk() will trigger some interrupts,
and earlier added disks may cause the queues of later,
not-yet-completely added disks to be started.

I suppose the init routine might be reorganized to initialize all
the queues, then have a second loop call add_disk() for all
of them.  Is that what you had in mind by "properly ordered"?

Disks may be added at run time though, and I think this tears
down all but the first disk, and re-adds them all, if I remember
right, so there is some complication there to think about.

-- steve

Jens Axboe  
Newsgroups: linux.kernel
From: Jens Axboe <jens.ax...@oracle.com>
Date: Mon, 14 Apr 2008 19:40:20 +0200
Local: Mon, Apr 14 2008 1:40 pm
Subject: Re: [patch] cciss: Fix race between disk-adding code and interrupt handler

Yep, precisely: don't call add_disk() until everything is set up.

> Disks may be added at run time though, and I think this tears
> down all but the first disk, and re-adds them all, if I remember
> right, so there is some complication there to think about.

Well, other drivers manage quite fine without resorting to work-arounds
:-)

--
Jens Axboe

scame...@beardog.cca.cpqcorp.net  
Newsgroups: linux.kernel
From: scame...@beardog.cca.cpqcorp.net
Date: Mon, 14 Apr 2008 19:50:10 +0200
Local: Mon, Apr 14 2008 1:50 pm
Subject: Re: [patch] cciss: Fix race between disk-adding code and interrupt handler

Ok.  Thanks for the constructive criticism.  I'll rethink it.

Fortunately (or unfortunately), the race is apparently pretty hard
to trigger: it's been in there for ages, and we've only just seen it
manifest as a problem recently, and only in one particular configuration.

-- steve

Jens Axboe  
Newsgroups: linux.kernel
From: Jens Axboe <jens.ax...@oracle.com>
Date: Mon, 14 Apr 2008 20:00:17 +0200
Local: Mon, Apr 14 2008 2:00 pm
Subject: Re: [patch] cciss: Fix race between disk-adding code and interrupt handler

Hopefully that will not matter. If you rework the init code so that
everything is up and running before you allow any IO to go on, then
it'll be easier to 'prove' that you can't hit such races. If you can
have disks added at runtime, make them go through the same init
process/function.

--
Jens Axboe
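Jens's suggestion, combined with Steve's "second loop" idea, amounts to two-phase registration: configure every queue first, then publish. A minimal sketch under stated assumptions: plain single-threaded C, with the hypothetical `fake_add_disk()` standing in for `add_disk()` and an `assert` checking that every disk in the batch is configured before any of them becomes visible. The same hypothetical `setup_and_add()` serves both probe time and a runtime addition, following the advice to route both through one init function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-disk state, not the driver's drive_info_struct. */
struct fake_disk {
    bool configured;  /* phase 1 done: queue fully set up */
    bool added;       /* phase 2 done: visible, I/O may arrive */
};

/* Stand-in for add_disk(). It checks the invariant the reordering is
 * meant to provide: every disk in the batch is configured before any
 * of them becomes visible. */
static void fake_add_disk(struct fake_disk *disks, int first, int last,
                          int idx)
{
    for (int k = first; k <= last; k++)
        assert(disks[k].configured);
    disks[idx].added = true;
}

/* Single entry point used both at probe time and for a runtime
 * addition, so both paths follow the same ordering. */
static void setup_and_add(struct fake_disk *disks, int first, int last)
{
    /* Phase 1: configure every queue in the batch. */
    for (int j = first; j <= last; j++)
        disks[j].configured = true;

    /* A kernel version would issue wmb() here, as the revised patch
     * does, before anything is published. */

    /* Phase 2: only now expose the disks. */
    for (int j = first; j <= last; j++)
        fake_add_disk(disks, first, last, j);
}
```

Probe time corresponds to `setup_and_add(disks, 0, highest_lun)` and a runtime addition to `setup_and_add(disks, idx, idx)`.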

scame...@beardog.cca.cpqcorp.net  
Newsgroups: linux.kernel
From: scame...@beardog.cca.cpqcorp.net
Date: Wed, 16 Apr 2008 21:10:12 +0200
Local: Wed, Apr 16 2008 3:10 pm
Subject: Re: [patch] cciss: Fix race between disk-adding code and interrupt handler

Fix race condition between cciss_init_one(), cciss_update_drive_info(),
and cciss_check_queues().

Signed-off-by: Stephen M. Cameron <scame...@beardog.cca.cpqcorp.net>

---

 linux-2.6.25-rc9/drivers/block/cciss.c |   17 ++++++++++++++++-
 drivers/block/cciss.h                  |    0
 2 files changed, 16 insertions(+), 1 deletion(-)

diff -puN linux-2.6.25-rc9/drivers/block/cciss.c~cciss_init_one_race linux-2.6.25-rc9/drivers/block/cciss.c
--- linux-2.6.25-rc9/linux-2.6.25-rc9/drivers/block/cciss.c~cciss_init_one_race 2008-04-14 08:21:03.000000000 -0500
+++ linux-2.6.25-rc9-scameron/linux-2.6.25-rc9/drivers/block/cciss.c    2008-04-16 08:15:38.000000000 -0500
@@ -1349,6 +1349,10 @@ static void cciss_update_drive_info(int
                spin_lock_irqsave(CCISS_LOCK(h->ctlr), flags);
                h->drv[drv_index].busy_configuring = 1;
                spin_unlock_irqrestore(CCISS_LOCK(h->ctlr), flags);
+
+               /* deregister_disk sets h->drv[drv_index].queue = NULL */
+               /* which keeps the interrupt handler from starting */
+               /* the queue. */
                ret = deregister_disk(h->gendisk[drv_index],
                                      &h->drv[drv_index], 0);
                h->drv[drv_index].busy_configuring = 0;
@@ -1419,6 +1423,10 @@ geo_inq:
                blk_queue_hardsect_size(disk->queue,
                                        hba[ctlr]->drv[drv_index].block_size);

+               /* Make sure all queue data is written out before */
+               /* setting h->drv[drv_index].queue, as setting this */
+               /* allows the interrupt handler to start the queue */
+               wmb();
                h->drv[drv_index].queue = disk->queue;
                add_disk(disk);
        }
@@ -3520,10 +3528,17 @@ static int __devinit cciss_init_one(stru
                        continue;
                blk_queue_hardsect_size(q, drv->block_size);
                set_capacity(disk, drv->nr_blocks);
-               add_disk(disk);
                j++;
        } while (j <= hba[i]->highest_lun);

+       /* Make sure all queue data is written out before */
+       /* interrupt handler, triggered by add_disk,  */
+       /* is allowed to start them. */
+       wmb();
+
+       for (j = 0; j <= hba[i]->highest_lun; j++)
+               add_disk(hba[i]->gendisk[j]);
+
        return 1;

       clean4:
diff -puN linux-2.6.25-rc9/drivers/block/cciss.h~cciss_init_one_race linux-2.6.25-rc9/drivers/block/cciss.h
_
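The revised patch drops the separate flag and uses `h->drv[drv_index].queue` itself as the readiness signal: the pointer is written only after a `wmb()`, and `deregister_disk()` clears it. A sketch of that pointer-publication pattern, assuming C11 atomics as the userspace analogue of "`wmb()` followed by a plain store"; `struct ptr_drive`, `struct fake_queue` and the `ptr_drive_*` helpers are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, much-reduced stand-in for a request queue. */
struct fake_queue {
    int hardsect_size;
    int started;
};

/* Hypothetical drive entry: the queue pointer doubles as the
 * "ready" signal, NULL meaning "do not touch". */
struct ptr_drive {
    _Atomic(struct fake_queue *) queue;
};

static void ptr_drive_init(struct ptr_drive *drv)
{
    atomic_init(&drv->queue, NULL);
}

/* Writer: configure the queue completely, then publish the pointer.
 * The release store stands in for "wmb(); drv->queue = q;". */
static void ptr_drive_publish(struct ptr_drive *drv, struct fake_queue *q,
                              int hardsect_size)
{
    q->hardsect_size = hardsect_size;
    q->started = 0;
    atomic_store_explicit(&drv->queue, q, memory_order_release);
}

/* Teardown: clearing the pointer keeps the reader away, much as
 * deregister_disk() setting the queue to NULL does in the driver. */
static void ptr_drive_unpublish(struct ptr_drive *drv)
{
    atomic_store_explicit(&drv->queue, NULL, memory_order_release);
}

/* Reader (cciss_check_queues() analogue): start the queue only if a
 * fully configured one has been published. Returns 1 if started. */
static int ptr_drive_try_start(struct ptr_drive *drv)
{
    struct fake_queue *q =
        atomic_load_explicit(&drv->queue, memory_order_acquire);

    if (q == NULL)
        return 0;
    assert(q->hardsect_size > 0);  /* writer's setup is visible */
    q->started = 1;
    return 1;
}
```

Compared with the first version, there is one fewer field to keep consistent: the pointer cannot be non-NULL while the queue it points to is still half-built, provided the writer publishes it last.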

scame...@beardog.cca.cpqcorp.net  
Newsgroups: linux.kernel
From: scame...@beardog.cca.cpqcorp.net
Date: Wed, 16 Apr 2008 21:10:15 +0200
Local: Wed, Apr 16 2008 3:10 pm
Subject: [patch] cciss: fix warning oops on rmmod of driver

* Fix oops on cciss rmmod due to calling pci_free_consistent with
  irqs disabled.

Signed-off-by: Stephen M. Cameron <scame...@beardog.cca.cpqcorp.net>
---

 linux-2.6.25-rc9/drivers/block/cciss_scsi.c |    2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

diff -puN linux-2.6.25-rc9/drivers/block/cciss_scsi.c~pci_free_consistent_oops linux-2.6.25-rc9/drivers/block/cciss_scsi.c
--- root/linux-2.6.25-rc9/drivers/block/cciss_scsi.c~pci_free_consistent_oops   2008-04-16 12:58:53.000000000 -0500
+++ root-root/linux-2.6.25-rc9/drivers/block/cciss_scsi.c       2008-04-16 12:59:22.000000000 -0500
@@ -1349,9 +1349,9 @@ cciss_unregister_scsi(int ctlr)
        /* set scsi_host to NULL so our detect routine will
           find us on register */
        sa->scsi_host = NULL;
+       spin_unlock_irqrestore(CCISS_LOCK(ctlr), flags);
        scsi_cmd_stack_free(ctlr);
        kfree(sa);
-       spin_unlock_irqrestore(CCISS_LOCK(ctlr), flags);
 }

 static int
_
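The rmmod fix moves `spin_unlock_irqrestore()` ahead of the freeing calls, so the `pci_free_consistent()` call (presumably reached through `scsi_cmd_stack_free()`) no longer runs with interrupts disabled. Interrupts have no userspace equivalent, so the sketch below only illustrates the general "detach under the lock, free after dropping it" ordering, assuming a pthread mutex in place of `CCISS_LOCK(ctlr)`. The structures are hypothetical, and clearing the controller's pointer to the adapter while the lock is held is an extra assumption of the sketch, not something the hunk shows.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical adapter state, standing in for the driver's sa. */
struct fake_adapter {
    int scsi_host_registered;
};

/* Hypothetical controller: the mutex stands in for CCISS_LOCK(ctlr). */
struct fake_ctlr {
    pthread_mutex_t lock;
    struct fake_adapter *sa;
};

/* Update shared state under the lock, then drop the lock before
 * freeing, mirroring the patch, where spin_unlock_irqrestore() now
 * precedes scsi_cmd_stack_free() and kfree(). */
static void fake_unregister(struct fake_ctlr *c)
{
    struct fake_adapter *sa;

    pthread_mutex_lock(&c->lock);
    sa = c->sa;
    c->sa = NULL;                 /* sketch-only: nobody can find it now */
    if (sa)
        sa->scsi_host_registered = 0;
    pthread_mutex_unlock(&c->lock);

    /* The free runs with the lock already released. */
    free(sa);
}
```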

Miller, Mike (OS Dev)  
Newsgroups: linux.kernel
From: "Miller, Mike (OS Dev)" <Mike.Mil...@hp.com>
Date: Wed, 16 Apr 2008 21:10:18 +0200
Local: Wed, Apr 16 2008 3:10 pm
Subject: RE: [patch] cciss: Fix race between disk-adding code and interrupt handler

> -----Original Message-----
> From: scame...@beardog.cca.cpqcorp.net
> [mailto:scame...@beardog.cca.cpqcorp.net]
> Sent: Wednesday, April 16, 2008 1:59 PM
> To: Jens Axboe
> Cc: linux-ker...@vger.kernel.org; Miller, Mike (OS Dev);
> mi...@beardog.cca.cpqcorp.net
> Subject: Re: [patch] cciss: Fix race between disk-adding code
> and interrupt handler

> Fix race condition between cciss_init_one(),
> cciss_update_drive_info(), and cciss_check_queues().

> Signed-off-by: Stephen M. Cameron <scame...@beardog.cca.cpqcorp.net>

Acked-by: Mike Miller <mike.mil...@hp.com>

Miller, Mike (OS Dev)  
Newsgroups: linux.kernel
From: "Miller, Mike (OS Dev)" <Mike.Mil...@hp.com>
Date: Wed, 16 Apr 2008 21:20:10 +0200
Local: Wed, Apr 16 2008 3:20 pm
Subject: RE: [patch] cciss: fix warning oops on rmmod of driver

> -----Original Message-----
> From: scame...@beardog.cca.cpqcorp.net
> [mailto:scame...@beardog.cca.cpqcorp.net]
> Sent: Wednesday, April 16, 2008 2:01 PM
> To: Jens Axboe
> Cc: linux-ker...@vger.kernel.org; Miller, Mike (OS Dev);
> mi...@beardog.cca.cpqcorp.net
> Subject: [patch] cciss: fix warning oops on rmmod of driver

> * Fix oops on cciss rmmod due to calling pci_free_consistent with
>   irqs disabled.

> Signed-off-by: Stephen M. Cameron <scame...@beardog.cca.cpqcorp.net>

Acked-by: Mike Miller <mike.mil...@hp.com>

Jens Axboe  
Newsgroups: linux.kernel
From: Jens Axboe <jens.ax...@oracle.com>
Date: Thu, 17 Apr 2008 13:20:09 +0200
Local: Thurs, Apr 17 2008 7:20 am
Subject: Re: [patch] cciss: Fix race between disk-adding code and interrupt handler

On Wed, Apr 16 2008, scame...@beardog.cca.cpqcorp.net wrote:

> Fix race condition between cciss_init_one(), cciss_update_drive_info(),
> and cciss_check_queues().

That's certainly belt and suspenders, but it's looking much better than
the previous version. Applied, thanks.

--
Jens Axboe

Jens Axboe  
Newsgroups: linux.kernel
From: Jens Axboe <jens.ax...@oracle.com>
Date: Thu, 17 Apr 2008 13:20:15 +0200
Local: Thurs, Apr 17 2008 7:20 am
Subject: Re: [patch] cciss: fix warning oops on rmmod of driver

On Wed, Apr 16 2008, scame...@beardog.cca.cpqcorp.net wrote:

> * Fix oops on cciss rmmod due to calling pci_free_consistent with
>   irqs disabled.

> Signed-off-by: Stephen M. Cameron <scame...@beardog.cca.cpqcorp.net>

Applied

--
Jens Axboe
