a deadlock (or corruption) bug in iscsid's logging

7 views
Skip to first unread message

guy keren

unread,
Nov 2, 2009, 7:54:07 PM11/2/09
to open-...@googlegroups.com

Hi,

the logging code in open-iscsi uses a "logarea" structure in shared
memory protected by a SysV semaphore (using semop system calls) - and
also places the sembuf array structure used in the semop calls in this
same shared-memory area (i.e. inside the logarea struct that is
allocated in shared memory at function logarea_init).

as a result, both the iscsid logging process and the iscsid control
process attempt to use this structure in a non-synchronized manner,
which is racy and may result either a deadlock or data corruption (we
saw these deadlocks several times).

the relevant code of the logging process is in usr/log.c, function
log_flush(). the relevant code of the control process is in the same
file, function dolog().

the deadlock senario:

1. the logging process has the semaphore held. the control process is
doing some work.
2. the logging process is about to release the semaphore. it sets
the sem_op parameter in the sembuf structure to '1'.
3. the control process now wants to add a logging record. it sets
the sem_op parameter in the sembuf structure to '-1'.
4. the control process invokes semop and gets blocked (because the
semaphore is held by the logging process).
5. the logging process invokes semop and also gets blocked for the
same reason.

we're in deadlock.

to get a data corruption, we'll need a slightly different scheduling -
i.e. that the process that wants to take the lock will update the sembuf
struct first - and then the process releasing the lock would modify
sem_op to '1' - and we'll have both processes increasing the semaphore's
value instead of one increasing and one decreasing it - and thus the
semaphore's value will later allow both processes to grab the semaphore
at the same time.

solution:

to solve this, i have created a local variable on the stack of both
processes, that is used with the semop calls. another possible solution
is to move the sembuf structure to a global variable that is NOT placed
in shared memory.

before i send a patch - is there a preference either way? or some other way?

thanks,
--guy

Ulrich Windl

unread,
Nov 4, 2009, 2:33:01 AM11/4/09
to open-...@googlegroups.com
On 3 Nov 2009 at 2:54, guy keren wrote:

>
>
> Hi,
>
> the logging code in open-iscsi uses a "logarea" structure in shared
> memory protected by a SysV semaphore (using semop system calls) - and
> also places the sembuf array structure used in the semop calls in this
> same shared-memory area (i.e. inside the logarea struct that is
> allocated in shared memory at function logarea_init).
>
> as a result, both the iscsid logging process and the iscsid control
> process attempt to use this structure in a non-synchronized manner,

The fact that the logging and the control process use the shared memory in a
unsynchronized way seems unrelated to the fact that both structures are located in
the same memory area, or I didn't understand your statement. For performance
reasons it seems wise to locate the controlling semaphores close to the area being
controlled.

> which is racy and may result either a deadlock or data corruption (we
> saw these deadlocks several times).
>
> the relevant code of the logging process is in usr/log.c, function
> log_flush(). the relevant code of the control process is in the same
> file, function dolog().
>
> the deadlock senario:
>
> 1. the logging process has the semaphore held. the control process is
> doing some work.
> 2. the logging process is about to release the semaphore. it sets
> the sem_op parameter in the sembuf structure to '1'.
> 3. the control process now wants to add a logging record. it sets
> the sem_op parameter in the sembuf structure to '-1'.

Ah, I understand: not the semaphore structure is in shared memory, but the
parameter structure for calling the semop(). OK, that's bad. Probably those
structures should be local (on the stack). I was confused with POSIX semaphores
where shared memory is required.

> 4. the control process invokes semop and gets blocked (because the
> semaphore is held by the logging process).
> 5. the logging process invokes semop and also gets blocked for the
> same reason.
>
> we're in deadlock.

Good spotting!

Ulrich

guy keren

unread,
Nov 22, 2009, 6:47:46 AM11/22/09
to open-...@googlegroups.com
(finally) attached is the patch:

each process must have its own semarg structure - or they step on each
others' toes - which could cause either deadlocks or smearing of the
shared memory protected by the semaphore.

Signed-off-by: guy keren <ch...@actcom.co.il>

--guy
0001-do-not-use-a-semarg-in-shared-mem-for-semop-calls.patch

Mike Christie

unread,
Dec 7, 2009, 11:59:03 AM12/7/09
to open-...@googlegroups.com
Sorry for the late reply. Looks ok. Merged in
3300b26934848cb674390d86f09a91edf2c71980.
Reply all
Reply to author
Forward
0 new messages