Re: [RFC][PATCH] kmap_atomic

Linus Torvalds

unread,

Oct 8, 2009, 12:00:11 PM10/8/09

to

On Thu, 8 Oct 2009, Peter Zijlstra wrote:
>
> The below patchlet changes the kmap_atomic interface to a stack based
> one that doesn't require the KM_types anymore.

I think this is how we should have done it originally.

That said, if we do this, I'd hate to have the "push" and "pop" parts to
the name. They don't really add a whole lot.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Ingo Molnar

unread,

Oct 8, 2009, 12:10:09 PM10/8/09

to

* Peter Zijlstra <pet...@infradead.org> wrote:

> The below patchlet changes the kmap_atomic interface to a stack based
> one that doesn't require the KM_types anymore.
>

> This significantly simplifies some code (more still than are present
> in this patch -- ie. pte_map_nested can go now)
>
> This obviously requires that push and pop are matched, I fixed a few
> cases that were not properly nested, the (x86) code checks for this
> and will go BUG when trying to pop a vaddr that isn't the top one so
> abusers should be rather visible.

Looks great IMO! Last i proposed this i think either Andrew or Avi had
second thoughts about the hard-to-calculate worst-case mapping limit -
but i dont think that's a big issue.

Lets not change the API names though - the rule is that map/unmap must
be properly nested.

Ingo

Peter Zijlstra

unread,

Oct 8, 2009, 12:40:10 PM10/8/09

to

On Thu, 2009-10-08 at 17:53 +0200, Ingo Molnar wrote:
> * Peter Zijlstra <pet...@infradead.org> wrote:
>
> > The below patchlet changes the kmap_atomic interface to a stack based
> > one that doesn't require the KM_types anymore.
> >
> > This significantly simplifies some code (more still than are present
> > in this patch -- ie. pte_map_nested can go now)
> >
> > This obviously requires that push and pop are matched, I fixed a few
> > cases that were not properly nested, the (x86) code checks for this
> > and will go BUG when trying to pop a vaddr that isn't the top one so
> > abusers should be rather visible.
>
> Looks great IMO! Last i proposed this i think either Andrew or Avi had
> second thoughts about the hard-to-calculate worst-case mapping limit -
> but i dont think that's a big issue.

That would've been me ;-)

> Lets not change the API names though - the rule is that map/unmap must
> be properly nested.

Right, so I did that full rename just so that people wouldn't get
confused or something, but if both you and Linus think it should remain:
kmap_atomic() and kunmap_atomic(), I can certainly undo that part.

Linus Torvalds

unread,

Oct 8, 2009, 1:00:21 PM10/8/09

to

On Thu, 8 Oct 2009, Peter Zijlstra wrote:
>

> Right, so I did that full rename just so that people wouldn't get
> confused or something, but if both you and Linus think it should remain:
> kmap_atomic() and kunmap_atomic(), I can certainly undo that part.

I think the renaming probably helps find all the places (simple "grep -w"
shows the difference, and no fear of confusion with comma-expressions and
multi-line arguments etc). But once they've all been converted, you might
as well then do a search-and-replace-back on the patch, and make the end
result look like you just removed the (now pointless) argument.

In fact, I'd personally be inclined to split the patch into two patches:

- one that just ignores the now redundant argument (but still keeps it),
and fixes the cases that didn't nest

- one that then removes the argument.

Why? The _bugs_ are going to be shown by the first patch, and it would be
nice to keep that patch small. When a bug shows up, it would be either
because there's something wrong in that (much smaller) patch, or because
some not-properly-nested casel wasn't fixed.

In contrast, the second patch would be large, but if done right, you could
then prove that it has no actual semantic changes (ie "binary is same
before and after"). That just sounds _much_ nicer from a debug standpoint.
Developers would look at the small and concentrated "real changes" patch,
rather than be distracted by all the trivial noise.

Linus

Hugh Dickins

unread,

Oct 8, 2009, 2:20:07 PM10/8/09

to

On Thu, 8 Oct 2009, Linus Torvalds wrote:
> On Thu, 8 Oct 2009, Peter Zijlstra wrote:
> >
> > Right, so I did that full rename just so that people wouldn't get
> > confused or something, but if both you and Linus think it should remain:
> > kmap_atomic() and kunmap_atomic(), I can certainly undo that part.

I love the patch, Peter: thank you. But agree with the others to
keep the old names, just let the change in prototype (vanishing
second arg) do the work of weeding out any stragglers.

>
> I think the renaming probably helps find all the places (simple "grep -w"
> shows the difference, and no fear of confusion with comma-expressions and
> multi-line arguments etc). But once they've all been converted, you might
> as well then do a search-and-replace-back on the patch, and make the end
> result look like you just removed the (now pointless) argument.
>
> In fact, I'd personally be inclined to split the patch into two patches:
>
> - one that just ignores the now redundant argument (but still keeps it),
> and fixes the cases that didn't nest
>
> - one that then removes the argument.
>
> Why? The _bugs_ are going to be shown by the first patch, and it would be
> nice to keep that patch small. When a bug shows up, it would be either
> because there's something wrong in that (much smaller) patch, or because
> some not-properly-nested casel wasn't fixed.
>
> In contrast, the second patch would be large, but if done right, you could
> then prove that it has no actual semantic changes (ie "binary is same
> before and after"). That just sounds _much_ nicer from a debug standpoint.
> Developers would look at the small and concentrated "real changes" patch,
> rather than be distracted by all the trivial noise.

And I very much agree with Linus's two patch approach: it also makes
it much easier to review, separating the wheat of the interesting
changes from the chaff of eliminating the unnecessary arg. It was
hard to find the interesting part in the patch as you sent it.

I wasn't really checking it, but think I noticed something called
swap_two_pages() somewhere, which wasn't doing the unnesting right:
you may need to swap two lines ;)

Hugh

David Howells

unread,

Oct 8, 2009, 6:20:05 PM10/8/09

to

Peter Zijlstra <pet...@infradead.org> wrote:

> The below patchlet changes the kmap_atomic interface to a stack based
> one that doesn't require the KM_types anymore.
>
> This significantly simplifies some code (more still than are present in
> this patch -- ie. pte_map_nested can go now)

> ...
> What do people think?

Brrr.

This makes FRV much worse. kmap_atomic() used to take a compile-time constant
value - which meant that the switch-statement inside it was mostly optimised
away. kmap_atomic() would be rendered down to very few instructions. Now
it'll be a huge jump table plus all those few instructions repeated because
the selector is now dynamic.

What I would prefer to see is something along the lines of local_irq_save()
and local_irq_restore(), where the displaced value gets stored on the machine
stack. In FRV, I could then represent kmap_atomic() as:

static inline void *kmap_atomic_push(struct page *page,
kmap_save_t *save)
{
unsigned long paddr, damlr, dampr;
int type;

pagefault_disable();

dampr = page_to_phys(page);
dampr |= xAMPRx_L | xAMPRx_M | xAMPRx_S | xAMPRx_SS_16Kb | xAMPRx_V;
asm volatile("movgs dampr6,%0 \n"
"movgs %0,dampr6"
: "=r"(save->dampr) : "r"(dampr) : "memory");
asm("movsg damlr6,%0" : "=r"(damlr));
return (void *) damlr;
}

However, since we occasionally want a second kmap slot, we could also add:

typedef struct { unsigned long dampr; } kmap_save_t;

static inline void *kmap2_atomic_push(struct page *page,
kmap_save_t *save)
{
unsigned long paddr, damlr, dampr;
int type;

pagefault_disable();

dampr = page_to_phys(page);
dampr |= xAMPRx_L | xAMPRx_M | xAMPRx_S | xAMPRx_SS_16Kb | xAMPRx_V;
asm volatile("movgs dampr7,%0 \n"
"movgs %0,dampr7"
: "=r"(save->dampr) : "r"(dampr) : "memory");
asm("movsg damlr7,%0" : "=r"(damlr));
return (void *) damlr;
}

And the reverse ops would be:

static inline void kmap_atomic_pop(kmap_save_t *save)
{
asm volatile("movgs %0,dampr6"
:: "r"(save->dampr) : "memory");
pagefault_enable();
}

static inline void kmap2_atomic_pop(kmap_save_t *save)
{
asm volatile("movgs %0,dampr7"
:: "r"(save->dampr) : "memory");
pagefault_enable();
}

And I would avoid the need to lock fake TLB entries in the shallow TLB.

If it's too much trouble for an arch to extract the current kmap setting from
the MMU or the page tables, *those* could be mirrored in per-CPU data.

Do we have any code that uses two slots and then calls more code that
ultimately requires further slots? In other words, do we need more than two
slots?

> David, frv has a 'funny' issue in that it treats __KM_CACHE special from
> the other ones, something which isn't possibly anymore. Do you see
> another option than to add kmap_atomic_push_cache() for frv?

The four __KM_xxx slots are special and must not be committed to this stack.
They're used by assembly code directly for certain tasks, such as maintaining
the TLB-lookup state. Your changes are guaranteed to break MMU-mode FRV.

That said, there's no reason these four slots *have* to be administered
through kmap_atomic(). A kmap_frv() could be made instead for them.

David

Peter Zijlstra

unread,

Oct 8, 2009, 6:40:04 PM10/8/09

to

On Thu, 2009-10-08 at 18:27 -0400, jim owens wrote:
> So if I understand this correctly, the sequence:
>
> in = kmap_atomic(inpage, KM_USER1);
>
> out = kmap_atomic(outpage, KM_USER0);
>
> kunmap_atomic(in, KM_USER1);
>
> in = kmap_atomic(next_inpage, KM_USER1);
>
> is now illegal with this patch

Yep.

Peter Zijlstra

unread,

Oct 8, 2009, 6:40:06 PM10/8/09

to

On Thu, 2009-10-08 at 23:12 +0100, David Howells wrote:
> Do we have any code that uses two slots and then calls more code that
> ultimately requires further slots? In other words, do we need more than two
> slots?

I can think of code that does a lot more than that, suppose you have
both KM_USER[01], get an interrupt that takes KM_IRQ[01], take an NMI
that takes KM_NMI.

Maybe we can stack the SOFTIRQ ones in as well ;-)

jim owens

unread,

Oct 8, 2009, 6:40:04 PM10/8/09

to

So if I understand this correctly, the sequence:

in = kmap_atomic(inpage, KM_USER1);

out = kmap_atomic(outpage, KM_USER0);

kunmap_atomic(in, KM_USER1);

in = kmap_atomic(next_inpage, KM_USER1);

is now illegal with this patch, which breaks code
I am testing now for btrfs.

My code does this because the in/out are zlib inflate
and the in/out run at different rates.

OK, the code is not submitted yet and I can redesign the
code using a temp buffer for out and copy every byte or
use kmap(), either of them at some performance cost.

I'm just pointing out that there are cases where this
stack design puts an ugly restriction on use.

So if I understand this right, I don't love the patch (:

jim

Peter Zijlstra

unread,

Oct 8, 2009, 6:50:09 PM10/8/09

to

Just to make it clear, the stack design gets rid of crap like:

-#define __KM_PTE \
- (in_nmi() ? KM_NMI_PTE : \
- in_irq() ? KM_IRQ_PTE : \
- KM_PTE0)

and

-static inline enum km_type crypto_kmap_type(int out)
-{
- enum km_type type;
-
- if (in_softirq())
- type = out * (KM_SOFTIRQ1 - KM_SOFTIRQ0) + KM_SOFTIRQ0;
- else
- type = out * (KM_USER1 - KM_USER0) + KM_USER0;
-
- return type;
-}

David Howells

unread,

Oct 8, 2009, 7:10:05 PM10/8/09

to

Peter Zijlstra <pet...@infradead.org> wrote:

> I can think of code that does a lot more than that, suppose you have
> both KM_USER[01], get an interrupt that takes KM_IRQ[01], take an NMI
> that takes KM_NMI.

But whilst the interrupt might want to use two slots, it's probably a bug for
it to want to access the mappings set up by whatever called KM_USER[01] - so
it can probably reuse those slots, provided it puts them back again.

Similarly for NMI taking KM_NMI - it probably shouldn't be attempting to
access the mappings set up by the normal mode or the interrupt mode - in which
case, why can't it reuse those slots?

> Maybe we can stack the SOFTIRQ ones in as well ;-)

Ditto.

David

Peter Zijlstra

unread,

Oct 8, 2009, 7:10:07 PM10/8/09

to

On Thu, 2009-10-08 at 18:27 -0400, jim owens wrote:

> So if I understand this correctly, the sequence:
>
> in = kmap_atomic(inpage, KM_USER1);
>
> out = kmap_atomic(outpage, KM_USER0);
>
> kunmap_atomic(in, KM_USER1);
>
> in = kmap_atomic(next_inpage, KM_USER1);
>
> is now illegal with this patch, which breaks code
> I am testing now for btrfs.
>
> My code does this because the in/out are zlib inflate
> and the in/out run at different rates.

You can do things like:

do {
in = kmap_atomic(inpage);
out = kmap_atomic(outpage);

kunmap_atomic(outpage);
kunmap_atomic(inpage);

cond_resched();

} while (<not done>)

The double unmap gives a preemption point, which sounds like a good
thing to have, because your scheme could run for a long while without
enabling preemption, which is badness.

jim owens

unread,

Oct 9, 2009, 8:30:10 AM10/9/09

to

Peter Zijlstra wrote:
>
> The double unmap gives a preemption point, which sounds like a good
> thing to have, because your scheme could run for a long while without
> enabling preemption, which is badness.

Thanks, optimizing my loop, I forgot all about
the old long code path not preempting problem.

Now I have to love the patch for making it harder to be stupid :)

Re: [RFC][PATCH] kmap_atomic_push

Linus Torvalds

Ingo Molnar

Peter Zijlstra

Linus Torvalds

Hugh Dickins

David Howells

Peter Zijlstra

Peter Zijlstra

jim owens

Peter Zijlstra

David Howells

Peter Zijlstra

jim owens