Account Options

  1. Sign in
The old Google Groups will be going away soon, but your browser is incompatible with the new version.
Google Groups Home
« Groups Home
lockdep trace from posix timers
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  Messages 1 - 25 of 48 - Collapse all  -  Translate all to Translated (View all originals)   Newer >
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
Dave Jones  
View profile  
 More options Jul 24 2012, 4:40 pm
Newsgroups: linux.kernel
From: Dave Jones <da...@redhat.com>
Date: Tue, 24 Jul 2012 22:40:02 +0200
Local: Tues, Jul 24 2012 4:40 pm
Subject: lockdep trace from posix timers
Linus tree as of 5fecc9d8f59e765c2a48379dd7c6f5cf88c7d75a

        Dave

======================================================
[ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ]
3.5.0+ #122 Not tainted
------------------------------------------------------
trinity-child2/5327 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
blocked:  (tasklist_lock){.+.+..}, instance: ffffffff81c05098, at: [<ffffffff8109762b>] posix_cpu_timer_del+0x2b/0xe0

and this task is already holding:
blocked:  (&(&new_timer->it_lock)->rlock){-.-...}, instance: ffff880143bce170, at: [<ffffffff81093d49>] __lock_timer+0x89/0x1f0
which would create a new lock dependency:
 (&(&new_timer->it_lock)->rlock){-.-...} -> (tasklist_lock){.+.+..}

but this new dependency connects a HARDIRQ-irq-safe lock:
 (&(&new_timer->it_lock)->rlock){-.-...}
... which became HARDIRQ-irq-safe at:
  [<ffffffff810d8e31>] __lock_acquire+0x7e1/0x1ae0
  [<ffffffff810da83d>] lock_acquire+0xad/0x220
  [<ffffffff8168a15d>] _raw_spin_lock_irqsave+0x7d/0xd0
  [<ffffffff81093c17>] posix_timer_fn+0x37/0xe0
  [<ffffffff8109a534>] __run_hrtimer+0x94/0x4c0
  [<ffffffff8109b157>] hrtimer_interrupt+0xf7/0x230
  [<ffffffff816953a9>] smp_apic_timer_interrupt+0x69/0x99
  [<ffffffff8169402f>] apic_timer_interrupt+0x6f/0x80
  [<ffffffff811644e9>] __generic_file_aio_write+0x239/0x450
  [<ffffffff81164773>] generic_file_aio_write+0x73/0xe0
  [<ffffffff81258832>] ext4_file_write+0xc2/0x280
  [<ffffffff811cfb76>] do_sync_write+0xe6/0x120
  [<ffffffff811d04af>] vfs_write+0xaf/0x190
  [<ffffffff811d07ed>] sys_write+0x4d/0x90
  [<ffffffff8169336d>] system_call_fastpath+0x1a/0x1f

to a HARDIRQ-irq-unsafe lock:
 (&(&p->alloc_lock)->rlock){+.+...}
... which became HARDIRQ-irq-unsafe at:
...  [<ffffffff810d8c37>] __lock_acquire+0x5e7/0x1ae0
  [<ffffffff810da83d>] lock_acquire+0xad/0x220
  [<ffffffff816895f6>] _raw_spin_lock+0x46/0x80
  [<ffffffff811d8882>] set_task_comm+0x32/0x180
  [<ffffffff81095848>] kthreadd+0x38/0x2e0
  [<ffffffff81694934>] kernel_thread_helper+0x4/0x10

other info that might help us debug this:

Chain exists of:
  &(&new_timer->it_lock)->rlock --> tasklist_lock --> &(&p->alloc_lock)->rlock

 Possible interrupt unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&(&p->alloc_lock)->rlock);
                               local_irq_disable();
                               lock(&(&new_timer->it_lock)->rlock);
                               lock(tasklist_lock);
  <Interrupt>
    lock(&(&new_timer->it_lock)->rlock);

 *** DEADLOCK ***

1 lock on stack by trinity-child2/5327:
 #0: blocked:  (&(&new_timer->it_lock)->rlock){-.-...}, instance: ffff880143bce170, at: [<ffffffff81093d49>] __lock_timer+0x89/0x1f0

the dependencies between HARDIRQ-irq-safe lock and the holding lock:
-> (&(&new_timer->it_lock)->rlock){-.-...} ops: 584 {
   IN-HARDIRQ-W at:
                    [<ffffffff810d8e31>] __lock_acquire+0x7e1/0x1ae0
                    [<ffffffff810da83d>] lock_acquire+0xad/0x220
                    [<ffffffff8168a15d>] _raw_spin_lock_irqsave+0x7d/0xd0
                    [<ffffffff81093c17>] posix_timer_fn+0x37/0xe0
                    [<ffffffff8109a534>] __run_hrtimer+0x94/0x4c0
                    [<ffffffff8109b157>] hrtimer_interrupt+0xf7/0x230
                    [<ffffffff816953a9>] smp_apic_timer_interrupt+0x69/0x99
                    [<ffffffff8169402f>] apic_timer_interrupt+0x6f/0x80
                    [<ffffffff811644e9>] __generic_file_aio_write+0x239/0x450
                    [<ffffffff81164773>] generic_file_aio_write+0x73/0xe0
                    [<ffffffff81258832>] ext4_file_write+0xc2/0x280
                    [<ffffffff811cfb76>] do_sync_write+0xe6/0x120
                    [<ffffffff811d04af>] vfs_write+0xaf/0x190
                    [<ffffffff811d07ed>] sys_write+0x4d/0x90
                    [<ffffffff8169336d>] system_call_fastpath+0x1a/0x1f
   IN-SOFTIRQ-W at:
                    [<ffffffff810d8bfe>] __lock_acquire+0x5ae/0x1ae0
                    [<ffffffff810da83d>] lock_acquire+0xad/0x220
                    [<ffffffff8168a15d>] _raw_spin_lock_irqsave+0x7d/0xd0
                    [<ffffffff81093c17>] posix_timer_fn+0x37/0xe0
                    [<ffffffff8109a534>] __run_hrtimer+0x94/0x4c0
                    [<ffffffff8109b157>] hrtimer_interrupt+0xf7/0x230
                    [<ffffffff8109b2bb>] __hrtimer_peek_ahead_timers.part.27+0x2b/0x30
                    [<ffffffff8109b309>] hrtimer_peek_ahead_timers+0x49/0xa0
                    [<ffffffff8109b391>] run_hrtimer_softirq+0x31/0x40
                    [<ffffffff810746e0>] __do_softirq+0xf0/0x400
                    [<ffffffff81074b2c>] run_ksoftirqd+0x13c/0x230
                    [<ffffffff81095517>] kthread+0xb7/0xc0
                    [<ffffffff81694934>] kernel_thread_helper+0x4/0x10
   INITIAL USE at:
                   [<ffffffff810d8956>] __lock_acquire+0x306/0x1ae0
                   [<ffffffff810da83d>] lock_acquire+0xad/0x220
                   [<ffffffff8168a15d>] _raw_spin_lock_irqsave+0x7d/0xd0
                   [<ffffffff81093d49>] __lock_timer+0x89/0x1f0
                   [<ffffffff810947a4>] sys_timer_delete+0x44/0x100
                   [<ffffffff8169336d>] system_call_fastpath+0x1a/0x1f
 }
 ... key      at: [<ffffffff820ac8a0>] __key.29841+0x0/0x8
 ... acquired at:
   [<ffffffff810d820b>] check_irq_usage+0x5b/0xe0
   [<ffffffff810d93da>] __lock_acquire+0xd8a/0x1ae0
   [<ffffffff810da83d>] lock_acquire+0xad/0x220
   [<ffffffff81689b59>] _raw_read_lock+0x49/0x80
   [<ffffffff8109762b>] posix_cpu_timer_del+0x2b/0xe0
   [<ffffffff81094786>] sys_timer_delete+0x26/0x100
   [<ffffffff8169336d>] system_call_fastpath+0x1a/0x1f

the dependencies between the lock to be acquired and HARDIRQ-irq-unsafe lock:
 -> (&(&p->alloc_lock)->rlock){+.+...} ops: 1189942 {
    HARDIRQ-ON-W at:
                      [<ffffffff810d8c37>] __lock_acquire+0x5e7/0x1ae0
                      [<ffffffff810da83d>] lock_acquire+0xad/0x220
                      [<ffffffff816895f6>] _raw_spin_lock+0x46/0x80
                      [<ffffffff811d8882>] set_task_comm+0x32/0x180
                      [<ffffffff81095848>] kthreadd+0x38/0x2e0
                      [<ffffffff81694934>] kernel_thread_helper+0x4/0x10
    SOFTIRQ-ON-W at:
                      [<ffffffff810d8c6c>] __lock_acquire+0x61c/0x1ae0
                      [<ffffffff810da83d>] lock_acquire+0xad/0x220
                      [<ffffffff816895f6>] _raw_spin_lock+0x46/0x80
                      [<ffffffff811d8882>] set_task_comm+0x32/0x180
                      [<ffffffff81095848>] kthreadd+0x38/0x2e0
                      [<ffffffff81694934>] kernel_thread_helper+0x4/0x10
    INITIAL USE at:
                     [<ffffffff810d8956>] __lock_acquire+0x306/0x1ae0
                     [<ffffffff810da83d>] lock_acquire+0xad/0x220
                     [<ffffffff816895f6>] _raw_spin_lock+0x46/0x80
                     [<ffffffff811d8882>] set_task_comm+0x32/0x180
                     [<ffffffff81095848>] kthreadd+0x38/0x2e0
                     [<ffffffff81694934>] kernel_thread_helper+0x4/0x10
  }
  ... key      at: [<ffffffff820883a0>] __key.45832+0x0/0x8
  ... acquired at:
   [<ffffffff810da83d>] lock_acquire+0xad/0x220
   [<ffffffff816895f6>] _raw_spin_lock+0x46/0x80
   [<ffffffff812d5f2e>] keyctl_session_to_parent+0xde/0x490
   [<ffffffff812d634d>] sys_keyctl+0x6d/0x1a0
   [<ffffffff8169336d>] system_call_fastpath+0x1a/0x1f

-> (tasklist_lock){.+.+..} ops: 35423 {
   HARDIRQ-ON-R at:
                    [<ffffffff810d8b1e>] __lock_acquire+0x4ce/0x1ae0
                    [<ffffffff810da83d>] lock_acquire+0xad/0x220
                    [<ffffffff81689b59>] _raw_read_lock+0x49/0x80
                    [<ffffffff810701a2>] do_wait+0xb2/0x360
                    [<ffffffff81072095>] sys_wait4+0x75/0xf0
                    [<ffffffff8108a482>] wait_for_helper+0x82/0xb0
                    [<ffffffff81694934>] kernel_thread_helper+0x4/0x10
   SOFTIRQ-ON-R at:
                    [<ffffffff810d8c6c>] __lock_acquire+0x61c/0x1ae0
                    [<ffffffff810da83d>] lock_acquire+0xad/0x220
                    [<ffffffff81689b59>] _raw_read_lock+0x49/0x80
                    [<ffffffff810701a2>] do_wait+0xb2/0x360
                    [<ffffffff81072095>] sys_wait4+0x75/0xf0
                    [<ffffffff8108a482>] wait_for_helper+0x82/0xb0
                    [<ffffffff81694934>] kernel_thread_helper+0x4/0x10
   INITIAL USE at:
                   [<ffffffff810d8956>] __lock_acquire+0x306/0x1ae0
                   [<ffffffff810da83d>] lock_acquire+0xad/0x220
                   [<ffffffff81689f0c>] _raw_write_lock_irq+0x5c/0xa0
                   [<ffffffff81068a41>] copy_process.part.22+0x1051/0x1750
                   [<ffffffff810692d0>] do_fork+0x140/0x4e0
                   [<ffffffff81024606>] kernel_thread+0x76/0x80
                   [<ffffffff81664602>] rest_init+0x26/0x154
                   [<ffffffff81efecb3>] start_kernel+0x3f8/0x405
                   [<ffffffff81efe356>] x86_64_start_reservations+0x131/0x135
                   [<ffffffff81efe4a2>] x86_64_start_kernel+0x148/0x157
 }
 ... key      at: [<ffffffff81c05098>] tasklist_lock+0x18/0x80
 ... acquired at:
   [<ffffffff810d820b>] check_irq_usage+0x5b/0xe0
   [<ffffffff810d93da>] __lock_acquire+0xd8a/0x1ae0
   [<ffffffff810da83d>] lock_acquire+0xad/0x220
   [<ffffffff81689b59>] _raw_read_lock+0x49/0x80
   [<ffffffff8109762b>] posix_cpu_timer_del+0x2b/0xe0
   [<ffffffff81094786>] sys_timer_delete+0x26/0x100
   [<ffffffff8169336d>] system_call_fastpath+0x1a/0x1f

stack backtrace:
Pid: 5327, comm: trinity-child2 Not tainted 3.5.0+ #122
Call Trace:
 [<ffffffff810d8194>] check_usage+0x4e4/0x500
 [<ffffffff81023729>] ? native_sched_clock+0x19/0x80
...

read more »


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Dave Jones  
View profile  
 More options Jul 27 2012, 12:30 pm
Newsgroups: linux.kernel
From: Dave Jones <da...@redhat.com>
Date: Fri, 27 Jul 2012 18:30:01 +0200
Local: Fri, Jul 27 2012 12:30 pm
Subject: Re: lockdep trace from posix timers
On Tue, Jul 24, 2012 at 04:36:13PM -0400, Dave Jones wrote:

 > Linus tree as of 5fecc9d8f59e765c2a48379dd7c6f5cf88c7d75a
 >
 >   Dave
 >
 > ======================================================
 > [ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ]
 > 3.5.0+ #122 Not tainted
 > ------------------------------------------------------
 > trinity-child2/5327 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
 > blocked:  (tasklist_lock){.+.+..}, instance: ffffffff81c05098, at: [<ffffffff8109762b>] posix_cpu_timer_del+0x2b/0xe0
 >
 > and this task is already holding:
 > blocked:  (&(&new_timer->it_lock)->rlock){-.-...}, instance: ffff880143bce170, at: [<ffffffff81093d49>] __lock_timer+0x89/0x1f0
 > which would create a new lock dependency:
 >  (&(&new_timer->it_lock)->rlock){-.-...} -> (tasklist_lock){.+.+..}
 >
 > but this new dependency connects a HARDIRQ-irq-safe lock:
 >  (&(&new_timer->it_lock)->rlock){-.-...}
 > ... which became HARDIRQ-irq-safe at:

Shall I start bisecting this ? I can trigger it very easily, but if you
can give me a set of commits to narrow down, it'll speed up the bisection.

        Dave
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Ming Lei  
View profile  
 More options Aug 16 2012, 9:00 am
Newsgroups: linux.kernel
From: Ming Lei <tom.leim...@gmail.com>
Date: Thu, 16 Aug 2012 15:00:02 +0200
Local: Thurs, Aug 16 2012 9:00 am
Subject: Re: lockdep trace from posix timers

It should a real possible deadlock, could you test the below patch to
see if it can fix the warning?

--
diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index 125cb67..29f6a8e 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -420,7 +420,7 @@ static int posix_cpu_timer_del(struct k_itimer *timer)
        int ret = 0;

        if (likely(p != NULL)) {
-               read_lock(&tasklist_lock);
+               /* tasklist_lock held already in timer_delete */
                if (unlikely(p->sighand == NULL)) {
                        /*
                         * We raced with the reaping of the task.
@@ -435,7 +435,6 @@ static int posix_cpu_timer_del(struct k_itimer *timer)
                                list_del(&timer->it.cpu.entry);
                        spin_unlock(&p->sighand->siglock);
                }
-               read_unlock(&tasklist_lock);

                if (!ret)
                        put_task_struct(p);
diff --git a/kernel/posix-timers.c b/kernel/posix-timers.c
index 69185ae..222d24c 100644
--- a/kernel/posix-timers.c
+++ b/kernel/posix-timers.c
@@ -884,15 +884,31 @@ SYSCALL_DEFINE1(timer_delete, timer_t, timer_id)
        struct k_itimer *timer;
        unsigned long flags;

+       /*
+        * hold tasklist_lock to protect tsk->sighand which might be
+        * accessed inside ->timer_del. It should be held before
+        * timer->it_lock to avoid the below deadlock:
+        *      CPU0                    CPU1
+        *      lock(tasklist_lock)
+        *                              timer_delete()
+        *                                      lock(timer->it_lock)
+        *                                      lock(tasklist_lock)
+        *      <timer interrupt>
+        *              lock(timer->it_lock)
+        */
+       read_lock(&tasklist_lock);
 retry_delete:
        timer = lock_timer(timer_id, &flags);
-       if (!timer)
+       if (!timer) {
+               read_unlock(&tasklist_lock);
                return -EINVAL;
+       }

        if (timer_delete_hook(timer) == TIMER_RETRY) {
                unlock_timer(timer, flags);
                goto retry_delete;
        }
+       read_unlock(&tasklist_lock);

        spin_lock(&current->sighand->siglock);
        list_del(&timer->list);

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Dave Jones  
View profile  
 More options Aug 16 2012, 10:10 am
Newsgroups: linux.kernel
From: Dave Jones <da...@redhat.com>
Date: Thu, 16 Aug 2012 16:10:02 +0200
Local: Thurs, Aug 16 2012 10:10 am
Subject: Re: lockdep trace from posix timers
On Thu, Aug 16, 2012 at 08:54:31PM +0800, Ming Lei wrote:

 > On Sat, Jul 28, 2012 at 12:20 AM, Dave Jones <da...@redhat.com> wrote:
 > > On Tue, Jul 24, 2012 at 04:36:13PM -0400, Dave Jones wrote:
 > >  > Linus tree as of 5fecc9d8f59e765c2a48379dd7c6f5cf88c7d75a
 > >  >
 > >  >      Dave
 > >  >
 > >  > ======================================================
 > >  > [ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ]
 > >  > 3.5.0+ #122 Not tainted
 > >  > ------------------------------------------------------
 > >  > trinity-child2/5327 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
 > >  > blocked:  (tasklist_lock){.+.+..}, instance: ffffffff81c05098, at: [<ffffffff8109762b>] posix_cpu_timer_del+0x2b/0xe0
 > >  >
 > >  > and this task is already holding:
 > >  > blocked:  (&(&new_timer->it_lock)->rlock){-.-...}, instance: ffff880143bce170, at: [<ffffffff81093d49>] __lock_timer+0x89/0x1f0
 > >  > which would create a new lock dependency:
 > >  >  (&(&new_timer->it_lock)->rlock){-.-...} -> (tasklist_lock){.+.+..}
 > >  >
 > >  > but this new dependency connects a HARDIRQ-irq-safe lock:
 > >  >  (&(&new_timer->it_lock)->rlock){-.-...}
 > >  > ... which became HARDIRQ-irq-safe at:
 > >
 > > Shall I start bisecting this ? I can trigger it very easily, but if you
 > > can give me a set of commits to narrow down, it'll speed up the bisection.
 >
 > It should a real possible deadlock, could you test the below patch to
 > see if it can fix the warning?

I've not managed to hit it in a while. It seems very dependant upon
specific builds for some reason.  Very strange.

        Dave
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Peter Zijlstra  
View profile  
 More options Aug 16 2012, 2:10 pm
Newsgroups: linux.kernel
From: Peter Zijlstra <pet...@infradead.org>
Date: Thu, 16 Aug 2012 20:10:02 +0200
Local: Thurs, Aug 16 2012 2:10 pm
Subject: Re: lockdep trace from posix timers

So we have:

 sys_keyctl()
   keyctl_session_to_parent()
     write_lock_irq(&tasklist_lock)
     task_lock(parent)          parent->alloc_lock

VS

  sys_timer_delete()
    lock_timer()                timer->it_lock
    posix_cpu_timer_del()
      read_lock(&tasklist_lock)

Creating:

  timer->it_lock -> tasklist_lock -> task->alloc_lock

And since it_lock is IRQ-safe and alloc_lock isn't, you've got the IRQ
inversion deadlock reported.

The task_lock() in keyctl_session_to_parent() comes from Al who didn't
think it necessary to write a changelog in d35abdb2.

David, Al, anybody want to have a go at fixing this?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Oleg Nesterov  
View profile  
 More options Aug 17 2012, 11:20 am
Newsgroups: linux.kernel
From: Oleg Nesterov <o...@redhat.com>
Date: Fri, 17 Aug 2012 17:20:02 +0200
Local: Fri, Aug 17 2012 11:20 am
Subject: Re: lockdep trace from posix timers
On 08/16, Peter Zijlstra wrote:

>      write_lock_irq(&tasklist_lock)
>      task_lock(parent)             parent->alloc_lock

And this is already wrong. See the comment above task_lock().

> And since it_lock is IRQ-safe and alloc_lock isn't, you've got the IRQ
> inversion deadlock reported.

Yes. Or, IOW, write_lock(tasklist) is IRQ-safe and thus it can't nest
with alloc_lock.

> David, Al, anybody want to have a go at fixing this?

I still think that task_work_add() should synhronize with exit_task_work()
itself and fail if necessary. But I wasn't able to convince Al ;)

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Oleg Nesterov  
View profile  
 More options Aug 17 2012, 11:30 am
Newsgroups: linux.kernel
From: Oleg Nesterov <o...@redhat.com>
Date: Fri, 17 Aug 2012 17:30:02 +0200
Local: Fri, Aug 17 2012 11:30 am
Subject: Re: lockdep trace from posix timers
On 08/17, Oleg Nesterov wrote:

And this is my old patch: http://marc.info/?l=linux-kernel&m=134082268721700
It should be re-diffed of course.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Discussion subject changed to "task_work_add() should not succeed unconditionally (Was: lockdep trace from posix timers)" by Oleg Nesterov
Oleg Nesterov  
View profile  
 More options Aug 17 2012, 12:50 pm
Newsgroups: linux.kernel
From: Oleg Nesterov <o...@redhat.com>
Date: Fri, 17 Aug 2012 18:50:01 +0200
Local: Fri, Aug 17 2012 12:50 pm
Subject: task_work_add() should not succeed unconditionally (Was: lockdep trace from posix timers)
On 08/17, Oleg Nesterov wrote:

Something like below. Uncompiled/untested, I need to re-check and test.
Now we can remove that task_lock() and rely on task_work_add().

Al, what do you think?

Oleg.

--- x/include/linux/task_work.h
+++ x/include/linux/task_work.h
@@ -18,8 +18,7 @@ void task_work_run(void);

 static inline void exit_task_work(struct task_struct *task)
 {
-       if (unlikely(task->task_works))
-               task_work_run();
+       task_work_run();
 }

 #endif /* _LINUX_TASK_WORK_H */
--- x/kernel/task_work.c
+++ x/kernel/task_work.c
@@ -2,29 +2,35 @@
 #include <linux/task_work.h>
 #include <linux/tracehook.h>

+#define TWORK_EXITED   ((struct callback_head *)1)
+
 int
 task_work_add(struct task_struct *task, struct callback_head *twork, bool notify)
 {
        struct callback_head *last, *first;
        unsigned long flags;
+       int err = -ESRCH;

        /*
-        * Not inserting the new work if the task has already passed
-        * exit_task_work() is the responisbility of callers.
+        * We must not insert the new work if the exiting task has already
+        * passed task_work_run().
         */
        raw_spin_lock_irqsave(&task->pi_lock, flags);
-       last = task->task_works;
-       first = last ? last->next : twork;
-       twork->next = first;
-       if (last)
-               last->next = twork;
-       task->task_works = twork;
+       if (likely(task->task_works != TWORK_EXITED) {
+               last = task->task_works;
+               first = last ? last->next : twork;
+               twork->next = first;
+               if (last)
+                       last->next = twork;
+               task->task_works = twork;
+               err = 0;
+       }
        raw_spin_unlock_irqrestore(&task->pi_lock, flags);

        /* test_and_set_bit() implies mb(), see tracehook_notify_resume(). */
-       if (notify)
+       if (!err && notify)
                set_notify_resume(task);
-       return 0;
+       return err;
 }

 struct callback_head *
@@ -35,7 +41,7 @@ task_work_cancel(struct task_struct *tas

        raw_spin_lock_irqsave(&task->pi_lock, flags);
        last = task->task_works;
-       if (last) {
+       if (last && last != TWORK_EXITED) {
                struct callback_head *q = last, *p = q->next;
                while (1) {
                        if (p->func == func) {
@@ -63,7 +69,12 @@ void task_work_run(void)
        while (1) {
                raw_spin_lock_irq(&task->pi_lock);
                p = task->task_works;
-               task->task_works = NULL;
+               /*
+                * twork->func() can do task_work_add(), do not
+                * set TWORK_EXITED until the list becomes empty.
+                */
+               task->task_works = (!p && (task->flags & PF_EXITING))
+                                       ? TWORK_EXITED : NULL;
                raw_spin_unlock_irq(&task->pi_lock);

                if (unlikely(!p))

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Discussion subject changed to "lockdep trace from posix timers" by Peter Zijlstra
Peter Zijlstra  
View profile  
 More options Aug 20 2012, 3:20 am
Newsgroups: linux.kernel
From: Peter Zijlstra <pet...@infradead.org>
Date: Mon, 20 Aug 2012 09:20:02 +0200
Local: Mon, Aug 20 2012 3:20 am
Subject: Re: lockdep trace from posix timers

On Fri, 2012-08-17 at 17:14 +0200, Oleg Nesterov wrote:
> I still think that task_work_add() should synhronize with exit_task_work()
> itself and fail if necessary. But I wasn't able to convince Al ;)

I'm not at all sure how that relates to needing task_lock() in the
keyctl stuff.

Also, can't task_work use llist stuff? That would also avoid using
->pi_lock.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Peter Zijlstra  
View profile  
 More options Aug 20 2012, 7:50 am
Newsgroups: linux.kernel
From: Peter Zijlstra <pet...@infradead.org>
Date: Mon, 20 Aug 2012 13:50:02 +0200
Local: Mon, Aug 20 2012 7:50 am
Subject: Re: lockdep trace from posix timers

On Mon, 2012-08-20 at 09:15 +0200, Peter Zijlstra wrote:
> On Fri, 2012-08-17 at 17:14 +0200, Oleg Nesterov wrote:
> > I still think that task_work_add() should synhronize with exit_task_work()
> > itself and fail if necessary. But I wasn't able to convince Al ;)

> I'm not at all sure how that relates to needing task_lock() in the
> keyctl stuff.

> Also, can't task_work use llist stuff? That would also avoid using
> ->pi_lock.

How about something like the below?

---
 include/linux/task_work.h |   7 +--
 kernel/exit.c             |   2 +-
 kernel/task_work.c        | 120 ++++++++++++++++++++++++----------------------
 3 files changed, 65 insertions(+), 64 deletions(-)

diff --git a/include/linux/task_work.h b/include/linux/task_work.h
index fb46b03..f365416 100644
--- a/include/linux/task_work.h
+++ b/include/linux/task_work.h
@@ -15,11 +15,6 @@ init_task_work(struct callback_head *twork, task_work_func_t func)
 int task_work_add(struct task_struct *task, struct callback_head *twork, bool);
 struct callback_head *task_work_cancel(struct task_struct *, task_work_func_t);
 void task_work_run(void);
-
-static inline void exit_task_work(struct task_struct *task)
-{
-       if (unlikely(task->task_works))
-               task_work_run();
-}
+void task_work_exit(void);

 #endif /* _LINUX_TASK_WORK_H */
diff --git a/kernel/exit.c b/kernel/exit.c
index f65345f..92aa94b 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -992,7 +992,7 @@ void do_exit(long code)
        exit_shm(tsk);
        exit_files(tsk);
        exit_fs(tsk);
-       exit_task_work(tsk);
+       task_work_exit();
        check_stack_usage();
        exit_thread();

diff --git a/kernel/task_work.c b/kernel/task_work.c
index 91d4e17..e5eac14 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -2,79 +2,85 @@
 #include <linux/task_work.h>
 #include <linux/tracehook.h>

+static void task_work_nop(struct callback_head *work)
+{
+}
+
+static struct callback_head dead = {
+       .next = NULL,
+       .func = task_work_nop,
+};
+
 int
-task_work_add(struct task_struct *task, struct callback_head *twork, bool notify)
+task_work_add(struct task_struct *task, struct callback_head *work, bool notify)
 {
-       struct callback_head *last, *first;
-       unsigned long flags;
-
-       /*
-        * Not inserting the new work if the task has already passed
-        * exit_task_work() is the responisbility of callers.
-        */
-       raw_spin_lock_irqsave(&task->pi_lock, flags);
-       last = task->task_works;
-       first = last ? last->next : twork;
-       twork->next = first;
-       if (last)
-               last->next = twork;
-       task->task_works = twork;
-       raw_spin_unlock_irqrestore(&task->pi_lock, flags);
+       struct callback_head **head = &task->task_works;
+       struct callback_head *entry, *old_entry;
+
+       entry = &head;
+       for (;;) {
+               if (entry == &dead)
+                       return -ESRCH;
+
+               old_entry = entry;
+               work->next = entry;
+               entry = cmpxchg(head, old_entry, work);
+               if (entry == old_entry)
+                       break;
+       }

        /* test_and_set_bit() implies mb(), see tracehook_notify_resume(). */
        if (notify)
                set_notify_resume(task);
+
        return 0;
 }

 struct callback_head *
 task_work_cancel(struct task_struct *task, task_work_func_t func)
 {
-       unsigned long flags;
-       struct callback_head *last, *res = NULL;
-
-       raw_spin_lock_irqsave(&task->pi_lock, flags);
-       last = task->task_works;
-       if (last) {
-               struct callback_head *q = last, *p = q->next;
-               while (1) {
-                       if (p->func == func) {
-                               q->next = p->next;
-                               if (p == last)
-                                       task->task_works = q == p ? NULL : q;
-                               res = p;
-                               break;
-                       }
-                       if (p == last)
-                               break;
-                       q = p;
-                       p = q->next;
+       struct callback_head **workp, *work;
+
+again:
+       workp = &task->task_works;
+       work = *workp;
+       while (work) {
+               if (work->func == func) {
+                       if (cmpxchg(workp, work, work->next) == work)
+                               return work;
+                       goto again;
                }
+
+               workp = &work->next;
+               work = *workp;
        }
-       raw_spin_unlock_irqrestore(&task->pi_lock, flags);
-       return res;
+
+       return NULL;
 }

-void task_work_run(void)
+static void __task_work_run(struct callback_head *tail)
 {
-       struct task_struct *task = current;
-       struct callback_head *p, *q;
-
-       while (1) {
-               raw_spin_lock_irq(&task->pi_lock);
-               p = task->task_works;
-               task->task_works = NULL;
-               raw_spin_unlock_irq(&task->pi_lock);
-
-               if (unlikely(!p))
-                       return;
-
-               q = p->next; /* head */
-               p->next = NULL; /* cut it */
-               while (q) {
-                       p = q->next;
-                       q->func(q);
-                       q = p;
+       struct callback_head **head = &current->task_works;
+
+       do {
+               struct callback_head *work = xchg(head, NULL);
+               while (work) {
+                       struct callback_head *next = ACCESS_ONCE(work->next);
+
+                       WARN_ON_ONCE(work == &dead);
+
+                       work->func(work);
+                       work = next;
                }
-       }
+       } while (cmpxchg(head, NULL, tail) != NULL);
+}
+
+void task_work_run(void)
+{
+       __task_work_run(NULL);
+}
+
+void task_work_exit(void)
+{
+       __task_work_run(&dead);
 }

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Peter Zijlstra  
View profile  
 More options Aug 20 2012, 7:50 am
Newsgroups: linux.kernel
From: Peter Zijlstra <pet...@infradead.org>
Date: Mon, 20 Aug 2012 13:50:02 +0200
Local: Mon, Aug 20 2012 7:50 am
Subject: Re: lockdep trace from posix timers

On Mon, 2012-08-20 at 13:44 +0200, Peter Zijlstra wrote:
> > I'm not at all sure how that relates to needing task_lock() in the
> > keyctl stuff.

Ah, is it used to serialize against exit_mm() so that the !->mm check
will suffice to avoid queuing works past exit_task_work() ?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Peter Zijlstra  
View profile  
 More options Aug 20 2012, 8:00 am
Newsgroups: linux.kernel
From: Peter Zijlstra <pet...@infradead.org>
Date: Mon, 20 Aug 2012 14:00:02 +0200
Local: Mon, Aug 20 2012 8:00 am
Subject: Re: lockdep trace from posix timers

On Mon, 2012-08-20 at 13:44 +0200, Peter Zijlstra wrote:
>  int
> +task_work_add(struct task_struct *task, struct callback_head *work, bool notify)
>  {
> +       struct callback_head **head = &task->task_works;
> +       struct callback_head *entry, *old_entry;
> +
> +       entry = &head;

My compiler just alerted me that:

        entry = *head;

works a lot better..

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Steven Rostedt  
View profile  
 More options Aug 20 2012, 8:20 am
Newsgroups: linux.kernel
From: Steven Rostedt <rost...@goodmis.org>
Date: Mon, 20 Aug 2012 14:20:02 +0200
Local: Mon, Aug 20 2012 8:20 am
Subject: Re: lockdep trace from posix timers

But your compiler likes "entry == &dead"? ;-)

-- Steve

> > +                       return -ESRCH;
> > +
> > +               old_entry = entry;
> > +               work->next = entry;
> > +               entry = cmpxchg(head, old_entry, work);
> > +               if (entry == old_entry)
> > +                       break;
> > +       }

> >         /* test_and_set_bit() implies mb(), see tracehook_notify_resume(). */
> >         if (notify)
> >                 set_notify_resume(task);
> > +
> >         return 0;
> >  }

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Peter Zijlstra  
View profile  
 More options Aug 20 2012, 8:30 am
Newsgroups: linux.kernel
From: Peter Zijlstra <pet...@infradead.org>
Date: Mon, 20 Aug 2012 14:30:02 +0200
Local: Mon, Aug 20 2012 8:30 am
Subject: Re: lockdep trace from posix timers
On Mon, 2012-08-20 at 08:19 -0400, Steven Rostedt wrote:

> > > +       for (;;) {
> > > +               if (entry == &dead)

> But your compiler likes "entry == &dead"? ;-)

Yes, fancy as GCC is these days, it doesn't quite bother with the
meta-physical questions just yet.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Oleg Nesterov  
View profile  
 More options Aug 20 2012, 11:00 am
Newsgroups: linux.kernel
From: Oleg Nesterov <o...@redhat.com>
Date: Mon, 20 Aug 2012 17:00:02 +0200
Local: Mon, Aug 20 2012 11:00 am
Subject: Re: lockdep trace from posix timers
On 08/20, Peter Zijlstra wrote:

> On Fri, 2012-08-17 at 17:14 +0200, Oleg Nesterov wrote:
> > I still think that task_work_add() should synhronize with exit_task_work()
> > itself and fail if necessary. But I wasn't able to convince Al ;)

> I'm not at all sure how that relates to needing task_lock() in the
> keyctl stuff.

To serialize with exit_mm() which clears ->mm.

We shouldn't do task_work_add(task) if the exiting task has already
passed exit_task_work(). There is no way to do this after ed3e694d7
(and I think this is wrong), so keyctl relies on the fact that
exit_task_work() is called after exit_mm().

> Also, can't task_work use llist stuff? That would also avoid using
> ->pi_lock.

Not sure.... task_work_add/run can use cmpxchg, but _cancel can't.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Oleg Nesterov  
View profile  
 More options Aug 20 2012, 11:10 am
Newsgroups: linux.kernel
From: Oleg Nesterov <o...@redhat.com>
Date: Mon, 20 Aug 2012 17:10:02 +0200
Local: Mon, Aug 20 2012 11:10 am
Subject: Re: lockdep trace from posix timers
On 08/20, Peter Zijlstra wrote:

>  task_work_cancel(struct task_struct *task, task_work_func_t func)
>  {
> ...
> +
> +again:
> +  workp = &task->task_works;
> +  work = *workp;
> +  while (work) {
> +          if (work->func == func) {

But this all can race with task_work_run() if task != current.

This "work" can be freed/reused. And it should only return !NULL
if twork->func() was not called.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Oleg Nesterov  
View profile  
 More options Aug 20 2012, 11:10 am
Newsgroups: linux.kernel
From: Oleg Nesterov <o...@redhat.com>
Date: Mon, 20 Aug 2012 17:10:04 +0200
Local: Mon, Aug 20 2012 11:10 am
Subject: Re: lockdep trace from posix timers
On 08/20, Peter Zijlstra wrote:

Yes, we can add the explicit argument to __task_work_run(), but it can
check PF_EXITING instead, this looks simpler to me.

Note also your patch breaks fifo, but this is fixable.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Peter Zijlstra  
View profile  
 More options Aug 20 2012, 11:20 am
Newsgroups: linux.kernel
From: Peter Zijlstra <pet...@infradead.org>
Date: Mon, 20 Aug 2012 17:20:02 +0200
Local: Mon, Aug 20 2012 11:20 am
Subject: Re: lockdep trace from posix timers

I guess we could.. but I thought the explicit callback was simpler ;-)

> Note also your patch breaks fifo, but this is fixable.

Why do you care about the order? Iterating a single linked queue in fifo
seems more expensive than useful.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Peter Zijlstra  
View profile  
 More options Aug 20 2012, 11:20 am
Newsgroups: linux.kernel
From: Peter Zijlstra <pet...@infradead.org>
Date: Mon, 20 Aug 2012 17:20:03 +0200
Local: Mon, Aug 20 2012 11:20 am
Subject: Re: lockdep trace from posix timers

Ah, because we could be iterating the list after the xchg done by run.

Hmm..
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Peter Zijlstra  
View profile  
 More options Aug 20 2012, 11:30 am
Newsgroups: linux.kernel
From: Peter Zijlstra <pet...@infradead.org>
Date: Mon, 20 Aug 2012 17:30:03 +0200
Local: Mon, Aug 20 2012 11:30 am
Subject: Re: lockdep trace from posix timers

I guess we could steal the entire list and requeue it afterwards and
lift TIF_NOTIFY_RESUME along with it.. but I can't say that's pretty.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Oleg Nesterov  
View profile  
 More options Aug 20 2012, 11:40 am
Newsgroups: linux.kernel
From: Oleg Nesterov <o...@redhat.com>
Date: Mon, 20 Aug 2012 17:40:03 +0200
Local: Mon, Aug 20 2012 11:40 am
Subject: Re: lockdep trace from posix timers
On 08/20, Peter Zijlstra wrote:

We can't. This can race with exit_task_work(). And this can break
fput(), the task can return to the userspace without __fput().

> but I can't say that's pretty.

Yes ;)

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Peter Zijlstra  
View profile  
 More options Aug 20 2012, 11:50 am
Newsgroups: linux.kernel
From: Peter Zijlstra <pet...@infradead.org>
Date: Mon, 20 Aug 2012 17:50:01 +0200
Local: Mon, Aug 20 2012 11:50 am
Subject: Re: lockdep trace from posix timers

On Mon, 2012-08-20 at 17:32 +0200, Oleg Nesterov wrote:
> > I guess we could steal the entire list and requeue it afterwards and
> > lift TIF_NOTIFY_RESUME along with it..

> We can't. This can race with exit_task_work(). And this can break
> fput(), the task can return to the userspace without __fput().

So we could put that spinlock back around cancel and run and leave add
lockless. That'd solve my immediate problem but its not something I'm
proud of.

All schemes I can come up with end up being broken or effectively the
same as the above proposal.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Peter Zijlstra  
View profile  
 More options Aug 20 2012, 11:50 am
Newsgroups: linux.kernel
From: Peter Zijlstra <pet...@infradead.org>
Date: Mon, 20 Aug 2012 17:50:01 +0200
Local: Mon, Aug 20 2012 11:50 am
Subject: Re: lockdep trace from posix timers

On Mon, 2012-08-20 at 17:43 +0200, Oleg Nesterov wrote:
> BTW, do you think it is really important to try to avoid ->pi_lock?

It is for me, I'm doing task_work_add() from under rq->lock.. :/ I could
fudge around and delay, but not having to would be nice.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Oleg Nesterov  
View profile  
 More options Aug 20 2012, 11:50 am
Newsgroups: linux.kernel
From: Oleg Nesterov <o...@redhat.com>
Date: Mon, 20 Aug 2012 17:50:02 +0200
Local: Mon, Aug 20 2012 11:50 am
Subject: Re: lockdep trace from posix timers
On 08/20, Peter Zijlstra wrote:

> Also, can't task_work use llist stuff? That would also avoid using
> ->pi_lock.

BTW, do you think it is really important to try to avoid ->pi_lock?

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Oleg Nesterov  
View profile  
 More options Aug 20 2012, 11:50 am
Newsgroups: linux.kernel
From: Oleg Nesterov <o...@redhat.com>
Date: Mon, 20 Aug 2012 17:50:02 +0200
Local: Mon, Aug 20 2012 11:50 am
Subject: Re: lockdep trace from posix timers
On 08/20, Peter Zijlstra wrote:

> On Mon, 2012-08-20 at 17:05 +0200, Oleg Nesterov wrote:

> > Yes, we can add the explicit argument to __task_work_run(), but it can
> > check PF_EXITING instead, this looks simpler to me.

> I guess we could.. but I thought the explicit callback was simpler ;-)

I won't insist. The patch I sent uses PF_EXITING and the fake
"struct callback_head* TWORK_EXITED", but this looks almost the same.

> > Note also your patch breaks fifo, but this is fixable.

> Why do you care about the order?

IMHO, this is just more natural.

For example. keyctl_session_to_parent() does _cancel only to protect
from exploits doing keyctl(KEYCTL_SESSION_TO_PARENT) in an endless
loop. It could simply do task_work_add(), but in this case we need
fifo for correctness.

> Iterating a single linked queue in fifo
> seems more expensive than useful.

Currently the list is fifo (we add to the last element), this is O(1).

But the list should be short, we can reverse it in _run() if we change
task_work_add() to add to the head.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Messages 1 - 25 of 48   Newer >
« Back to Discussions « Newer topic     Older topic »