a deadlock (or corruption) bug in iscsid's logging
From: guy keren <c...@actcom.co.il>
Date: Tue, 03 Nov 2009 02:54:07 +0200
Local: Mon, Nov 2 2009 7:54 pm
Subject: a deadlock (or corruption) bug in iscsid's logging

Hi,

the logging code in open-iscsi uses a "logarea" structure in shared
memory protected by a SysV semaphore (using semop system calls) - and
also places the sembuf array structure used in the semop calls in this
same shared-memory area (i.e. inside the logarea struct that is
allocated in shared memory at function logarea_init).

as a result, both the iscsid logging process and the iscsid control
process attempt to use this structure in a non-synchronized manner,
which is racy and may result in either a deadlock or data corruption (we
saw these deadlocks several times).

the relevant code of the logging process is in usr/log.c, function
log_flush(). the relevant code of the control process is in the same
file, function dolog().

the deadlock scenario:

    1. the logging process has the semaphore held. the control process is
       doing some work.
    2. the logging process is about to release the semaphore. it sets
       the sem_op parameter in the shared sembuf structure to '1'.
    3. the control process now wants to add a logging record. it sets
       the sem_op parameter in the same sembuf structure to '-1'.
    4. the control process invokes semop and gets blocked (because the
       semaphore is held by the logging process).
    5. the logging process invokes semop, but sem_op now holds '-1'
       instead of '1', so it tries to take the semaphore it already
       holds and gets blocked as well.

    we're in deadlock.
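
The five steps above can be replayed deterministically in a single process. This is only a sketch: the real bug needs two processes and a shared-memory segment, whereas here one local variable named shared stands in for the sembuf inside the logarea, and IPC_NOWAIT is added so that a call that would block returns EAGAIN instead of hanging. The names replay_deadlock_steps and union log_semun are made up for this illustration and are not open-iscsi code.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Same layout as the "union semun" described in semctl(2); hypothetical name. */
union log_semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Returns how many semop calls would have blocked, or -1 on setup failure. */
static int replay_deadlock_steps(void)
{
    int blocked = 0;
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid < 0)
        return -1;

    /* step 1: value 0 means the logging process currently holds the lock */
    union log_semun arg = { .val = 0 };
    if (semctl(semid, 0, SETVAL, arg) < 0) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }

    /* stands in for the single sembuf that lives in shared memory */
    struct sembuf shared = { .sem_num = 0, .sem_op = 0, .sem_flg = IPC_NOWAIT };

    shared.sem_op = 1;   /* step 2: logging process prepares to release */
    shared.sem_op = -1;  /* step 3: control process overwrites it to take */

    /* step 4: control process calls semop; it would block */
    if (semop(semid, &shared, 1) < 0 && errno == EAGAIN)
        blocked++;

    /* step 5: logging process calls semop with the overwritten value,
     * so it tries to take the lock again instead of releasing it */
    if (semop(semid, &shared, 1) < 0 && errno == EAGAIN)
        blocked++;

    semctl(semid, 0, IPC_RMID);
    return blocked;
}
```

On a system with SysV IPC available, replay_deadlock_steps() should return 2: both calls carry sem_op = -1 against a semaphore whose value is 0, so without IPC_NOWAIT both callers would sleep and nobody is left to release the lock.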

to get data corruption, we need a slightly different scheduling: the
process that wants to take the lock updates the sembuf struct first
(sem_op = '-1'), and then the process releasing the lock sets sem_op
to '1' before either of them calls semop. both processes then increase
the semaphore's value, instead of one increasing and one decreasing it,
so the semaphore's value later allows both processes to grab the
semaphore at the same time.
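
The corruption ordering can be replayed the same way in one process. Again this is a sketch under assumptions: a local variable named shared plays the role of the sembuf in shared memory, and replay_corruption_steps and union log_semun are hypothetical names, not open-iscsi code.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Same layout as the "union semun" described in semctl(2); hypothetical name. */
union log_semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Returns the semaphore value after both calls, or -1 on setup failure. */
static int replay_corruption_steps(void)
{
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid < 0)
        return -1;

    /* value 0: the releasing process currently holds the lock */
    union log_semun arg = { .val = 0 };
    if (semctl(semid, 0, SETVAL, arg) < 0) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }

    /* stands in for the single sembuf that lives in shared memory */
    struct sembuf shared = { .sem_num = 0, .sem_op = 0, .sem_flg = IPC_NOWAIT };

    shared.sem_op = -1;  /* taker prepares to take the lock */
    shared.sem_op = 1;   /* releaser overwrites it before anyone calls semop */

    semop(semid, &shared, 1);  /* taker's call: actually adds 1 */
    semop(semid, &shared, 1);  /* releaser's call: adds 1 again */

    int value = semctl(semid, 0, GETVAL);
    semctl(semid, 0, IPC_RMID);
    return value;
}
```

The returned value should be 2 where a correct take/release pair would leave 0. The taker's semop also returned successfully, so it already believes it owns the lock, and the leftover count lets further takers in while it is still writing to the log area.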

solution:

to solve this, i have created a local variable on the stack of both
processes, that is used with the semop calls. another possible solution
is to move the sembuf structure to a global variable that is NOT placed
in shared memory.
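
A minimal sketch of the first option, with the sembuf built on the caller's stack for every call. This is not the patch attached later in this thread; log_sem_create, log_lock, log_unlock and union log_semun are hypothetical names chosen for illustration, and the semaphore is assumed to be a single SysV semaphore initialised to 1.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Same layout as the "union semun" described in semctl(2); hypothetical name. */
union log_semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Create a one-element semaphore set with value 1 (unlocked). */
static int log_sem_create(void)
{
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid < 0)
        return -1;

    union log_semun arg = { .val = 1 };
    if (semctl(semid, 0, SETVAL, arg) < 0) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }
    return semid;
}

/* Take the lock. The sembuf is private to this call, so another
 * process cannot change sem_op between the assignment and semop(). */
static int log_lock(int semid)
{
    struct sembuf op = { .sem_num = 0, .sem_op = -1, .sem_flg = 0 };
    return semop(semid, &op, 1);
}

/* Release the lock, again with a sembuf on the caller's own stack. */
static int log_unlock(int semid)
{
    struct sembuf op = { .sem_num = 0, .sem_op = 1, .sem_flg = 0 };
    return semop(semid, &op, 1);
}
```

Only the semaphore id has to be visible to both processes; the argument passed to semop is per-call scratch data, so keeping it on the stack (or in a per-process global) removes the race without touching the layout of the data the semaphore protects.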

before i send a patch - is there a preference either way? or some other way?

thanks,
--guy


From: "Ulrich Windl" <ulrich.wi...@rz.uni-regensburg.de>
Date: Wed, 04 Nov 2009 08:33:01 +0100
Local: Wed, Nov 4 2009 2:33 am
Subject: Re: a deadlock (or corruption) bug in iscsid's logging
On 3 Nov 2009 at 2:54, guy keren wrote:

> Hi,

> the logging code in open-iscsi uses a "logarea" structure in shared
> memory protected by a SysV semaphore (using semop system calls) - and
> also places the sembuf array structure used in the semop calls in this
> same shared-memory area (i.e. inside the logarea struct that is
> allocated in shared memory at function logarea_init).

> as a result, both the iscsid logging process and the iscsid control
> process attempt to use this structure in a non-synchronized manner,

The fact that the logging and the control process use the shared memory in an
unsynchronized way seems unrelated to the fact that both structures are located in
the same memory area, or I didn't understand your statement. For performance
reasons it seems wise to locate the controlling semaphores close to the area being
controlled.

> which is racy and may result in either a deadlock or data corruption (we
> saw these deadlocks several times).

> the relevant code of the logging process is in usr/log.c, function
> log_flush(). the relevant code of the control process is in the same
> file, function dolog().

> the deadlock scenario:

>     1. the logging process has the semaphore held. the control process is
>        doing some work.
>     2. the logging process is about to release the semaphore. it sets
> the sem_op parameter in the sembuf structure to '1'.
>     3. the control process now wants to add a logging record. it sets
> the sem_op parameter in the sembuf structure to '-1'.

Ah, I understand: it is not the semaphore itself that is in shared memory, but the
parameter structure passed to semop(). OK, that's bad. Those structures should
probably be local (on the stack). I was confusing this with POSIX semaphores,
where shared memory is required.

>     4. the control process invokes semop and gets blocked (because the
> semaphore is held by the logging process).
>     5. the logging process invokes semop and also gets blocked for the
> same reason.

>     we're in deadlock.

Good spotting!

Ulrich


From: guy keren <c...@actcom.co.il>
Date: Sun, 22 Nov 2009 13:47:46 +0200
Local: Sun, Nov 22 2009 6:47 am
Subject: Re: a deadlock (or corruption) bug in iscsid's logging

(finally) attached is the patch:

each process must have its own semarg structure - or they step on each
other's toes - which could cause either deadlocks or smearing of the
shared memory protected by the semaphore.

Signed-off-by: guy keren <c...@actcom.co.il>

--guy

  0001-do-not-use-a-semarg-in-shared-mem-for-semop-calls.patch
