Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

[PATCH 1/6] Remove node's validity check in alloc_pages

1 view
Skip to first unread message

Minchan Kim

unread,
Apr 13, 2010, 11:30:02 AM4/13/10
to
alloc_pages calls alloc_pages_node with numa_node_id().
alloc_pages_node can't see nid < 0.

So we can use alloc_pages_exact_node instead of alloc_pages_node.
It could avoid comparison and branch as 6484eb3e2a81807722 tried.

Cc: Mel Gorman <m...@csn.ul.ie>
Cc: KAMEZAWA Hiroyuki <kamezaw...@jp.fujitsu.com>
Signed-off-by: Minchan Kim <minch...@gmail.com>
---
include/linux/gfp.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 4c6d413..b65f003 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -308,7 +308,7 @@ extern struct page *alloc_page_vma(gfp_t gfp_mask,
struct vm_area_struct *vma, unsigned long addr);
#else
#define alloc_pages(gfp_mask, order) \
- alloc_pages_node(numa_node_id(), gfp_mask, order)
+ alloc_pages_exact_node(numa_node_id(), gfp_mask, order)
#define alloc_page_vma(gfp_mask, vma, addr) alloc_pages(gfp_mask, 0)
#endif
#define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0)
--
1.7.0.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Minchan Kim

unread,
Apr 13, 2010, 11:30:02 AM4/13/10
to
alloc_pages_exact_node naming makes some people misleading.
They considered it following as.
"This function will allocate pages from node which I wanted
exactly".
But it can allocate pages from fallback list if page allocator
can't find free page from node user wanted.

So let's comment this NOTE.

Actually I wanted to change naming with better.
ex) alloc_pages_explict_node.
But I changed my mind since the comment would be enough.

If anybody suggests better name, I will do with pleasure.

Cc: Mel Gorman <m...@csn.ul.ie>
Cc: KAMEZAWA Hiroyuki <kamezaw...@jp.fujitsu.com>

Cc: Bob Liu <lli...@gmail.com>


Signed-off-by: Minchan Kim <minch...@gmail.com>
---

include/linux/gfp.h | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index b65f003..7539c17 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -288,6 +288,11 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
}

+/*
+ * NOTE : Allow page from fallback if page allocator can't find free page
+ * in your nid. Only if you want to allocate page from your nid, use
+ * __GFP_THISNODE flags with gfp_mask.
+ */
static inline struct page *alloc_pages_exact_node(int nid, gfp_t gfp_mask,
unsigned int order)
{

Minchan Kim

unread,
Apr 13, 2010, 11:30:02 AM4/13/10
to
alloc_pages_node is called with cpu_to_node(cpu).
I think cpu_to_node(cpu) never returns -1.
(But I am not sure we need double check.)

So we can use alloc_pages_exact_node instead of alloc_pages_node.
It could avoid comparison and branch as 6484eb3e2a81807722 tried.

Cc: Tejun Heo <t...@kernel.org>
Cc: Christoph Lameter <c...@linux-foundation.org>


Signed-off-by: Minchan Kim <minch...@gmail.com>
---

mm/percpu.c | 2 +-


1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/percpu.c b/mm/percpu.c
index 768419d..ec3e671 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -720,7 +720,7 @@ static int pcpu_alloc_pages(struct pcpu_chunk *chunk,
for (i = page_start; i < page_end; i++) {
struct page **pagep = &pages[pcpu_page_idx(cpu, i)];

- *pagep = alloc_pages_node(cpu_to_node(cpu), gfp, 0);
+ *pagep = alloc_pages_exact_node(cpu_to_node(cpu), gfp, 0);
if (!*pagep) {
pcpu_free_pages(chunk, pages, populated,
page_start, page_end);

Mel Gorman

unread,
Apr 13, 2010, 11:40:01 AM4/13/10
to
On Wed, Apr 14, 2010 at 12:24:58AM +0900, Minchan Kim wrote:
> alloc_pages calls alloc_pages_node with numa_node_id().
> alloc_pages_node can't see nid < 0.
>
> So we can use alloc_pages_exact_node instead of alloc_pages_node.
> It could avoid comparison and branch as 6484eb3e2a81807722 tried.
>
> Cc: Mel Gorman <m...@csn.ul.ie>
> Cc: KAMEZAWA Hiroyuki <kamezaw...@jp.fujitsu.com>
> Signed-off-by: Minchan Kim <minch...@gmail.com>

Makes sense.

Signed-off-by: Mel Gorman <m...@csn.ul.ie>

> ---
> include/linux/gfp.h | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 4c6d413..b65f003 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -308,7 +308,7 @@ extern struct page *alloc_page_vma(gfp_t gfp_mask,
> struct vm_area_struct *vma, unsigned long addr);
> #else
> #define alloc_pages(gfp_mask, order) \
> - alloc_pages_node(numa_node_id(), gfp_mask, order)
> + alloc_pages_exact_node(numa_node_id(), gfp_mask, order)
> #define alloc_page_vma(gfp_mask, vma, addr) alloc_pages(gfp_mask, 0)
> #endif
> #define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0)
> --
> 1.7.0.5
>

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

Mel Gorman

unread,
Apr 13, 2010, 11:50:02 AM4/13/10
to
On Wed, Apr 14, 2010 at 12:24:59AM +0900, Minchan Kim wrote:
> alloc_pages_node is called with cpu_to_node(cpu).
> I think cpu_to_node(cpu) never returns -1.
> (But I am not sure we need double check.)
>
> So we can use alloc_pages_exact_node instead of alloc_pages_node.
> It could avoid comparison and branch as 6484eb3e2a81807722 tried.
>

Well, numa_node_id() is implemented as

#ifndef numa_node_id
#define numa_node_id() (cpu_to_node(raw_smp_processor_id()))
#endif

and the mapping table on x86 at least is based on possible CPUs in
init_cpu_to_node() leaves the mapping as 0 if the APIC is bad or the numa
node is reported in apicid_to_node as -1. It would appear on power that
the node will be 0 for possible CPUs as well.

Hence, I believe this to be safe but a confirmation from Tejun would be
nice. I would continue digging but this looks like an initialisation path
so I'll move on to the next patch rather than spending more time.

> Cc: Tejun Heo <t...@kernel.org>
> Cc: Christoph Lameter <c...@linux-foundation.org>
> Signed-off-by: Minchan Kim <minch...@gmail.com>
> ---
> mm/percpu.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/mm/percpu.c b/mm/percpu.c
> index 768419d..ec3e671 100644
> --- a/mm/percpu.c
> +++ b/mm/percpu.c
> @@ -720,7 +720,7 @@ static int pcpu_alloc_pages(struct pcpu_chunk *chunk,
> for (i = page_start; i < page_end; i++) {
> struct page **pagep = &pages[pcpu_page_idx(cpu, i)];
>
> - *pagep = alloc_pages_node(cpu_to_node(cpu), gfp, 0);
> + *pagep = alloc_pages_exact_node(cpu_to_node(cpu), gfp, 0);
> if (!*pagep) {
> pcpu_free_pages(chunk, pages, populated,
> page_start, page_end);
> --
> 1.7.0.5
>

--

Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

Mel Gorman

unread,
Apr 13, 2010, 12:20:02 PM4/13/10
to
On Wed, Apr 14, 2010 at 12:25:03AM +0900, Minchan Kim wrote:
> alloc_pages_exact_node naming makes some people misleading.
> They considered it following as.
> "This function will allocate pages from node which I wanted
> exactly".
> But it can allocate pages from fallback list if page allocator
> can't find free page from node user wanted.
>
> So let's comment this NOTE.
>

It's a little tough to read. How about

/*
* Use this instead of alloc_pages_node when the caller knows
* exactly which node they need (as opposed to passing in -1
* for current). Fallback to other nodes will still occur
* unless __GFP_THISNODE is specified.
*/

That at least will tie in why "exact" is in the name?

> Actually I wanted to change naming with better.
> ex) alloc_pages_explict_node.

"Explicit" can also be taken to mean "this and only this node".

--

Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

Minchan Kim

unread,
Apr 13, 2010, 12:30:02 PM4/13/10
to
On Wed, Apr 14, 2010 at 1:13 AM, Mel Gorman <m...@csn.ul.ie> wrote:
> On Wed, Apr 14, 2010 at 12:25:03AM +0900, Minchan Kim wrote:
>> alloc_pages_exact_node naming makes some people misleading.
>> They considered it following as.
>> "This function will allocate pages from node which I wanted
>> exactly".
>> But it can allocate pages from fallback list if page allocator
>> can't find free page from node user wanted.
>>
>> So let's comment this NOTE.
>>
>
> It's a little tough to read. How about
>
> /*
>  * Use this instead of alloc_pages_node when the caller knows
>  * exactly which node they need (as opposed to passing in -1
>  * for current). Fallback to other nodes will still occur
>  * unless __GFP_THISNODE is specified.
>  */

It is better than mine.

>
> That at least will tie in why "exact" is in the name?
>
>> Actually I wanted to change naming with better.
>> ex) alloc_pages_explict_node.
>
> "Explicit" can also be taken to mean "this and only this node".

I agree.
I will repost modified comment after Tejun comment [2/6].
Thanks for quick review, Mel. :)
--
Kind regards,
Minchan Kim

KAMEZAWA Hiroyuki

unread,
Apr 13, 2010, 8:10:02 PM4/13/10
to
On Wed, 14 Apr 2010 00:24:58 +0900
Minchan Kim <minch...@gmail.com> wrote:

> alloc_pages calls alloc_pages_node with numa_node_id().
> alloc_pages_node can't see nid < 0.
>
> So we can use alloc_pages_exact_node instead of alloc_pages_node.
> It could avoid comparison and branch as 6484eb3e2a81807722 tried.
>
> Cc: Mel Gorman <m...@csn.ul.ie>
> Cc: KAMEZAWA Hiroyuki <kamezaw...@jp.fujitsu.com>
> Signed-off-by: Minchan Kim <minch...@gmail.com>

Acked-by: KAMEZAWA Hiroyuki <kamezaw...@jp.fujisu.com>

Minchan Kim

unread,
Apr 14, 2010, 11:00:03 AM4/14/10
to
alloc_pages_node is called with cpu_to_node(cpu).
I think cpu_to_node(cpu) never returns -1.
(But I am not sure we need double check.)

So we can use alloc_pages_exact_node instead of alloc_pages_node.


It could avoid comparison and branch as 6484eb3e2a81807722 tried.

Cc: Tejun Heo <t...@kernel.org>
Cc: Christoph Lameter <c...@linux-foundation.org>


Signed-off-by: Minchan Kim <minch...@gmail.com>
---

mm/percpu.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/percpu.c b/mm/percpu.c
index 768419d..ec3e671 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -720,7 +720,7 @@ static int pcpu_alloc_pages(struct pcpu_chunk *chunk,
for (i = page_start; i < page_end; i++) {
struct page **pagep = &pages[pcpu_page_idx(cpu, i)];

- *pagep = alloc_pages_node(cpu_to_node(cpu), gfp, 0);
+ *pagep = alloc_pages_exact_node(cpu_to_node(cpu), gfp, 0);
if (!*pagep) {
pcpu_free_pages(chunk, pages, populated,
page_start, page_end);
--
1.7.0.5

--

Minchan Kim

unread,
Apr 14, 2010, 11:00:02 AM4/14/10
to
V2
* Add some reviewed-by

alloc_pages calls alloc_pages_node with numa_node_id().
alloc_pages_node can't see nid < 0.

So we can use alloc_pages_exact_node instead of alloc_pages_node.


It could avoid comparison and branch as 6484eb3e2a81807722 tried.

Reviewed-by: Mel Gorman <m...@csn.ul.ie>
Acked-by: KAMEZAWA Hiroyuki <kamezaw...@jp.fujisu.com>


Signed-off-by: Minchan Kim <minch...@gmail.com>
---

include/linux/gfp.h | 2 +-


1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 4c6d413..b65f003 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h


@@ -308,7 +308,7 @@ extern struct page *alloc_page_vma(gfp_t gfp_mask,
struct vm_area_struct *vma, unsigned long addr);
#else
#define alloc_pages(gfp_mask, order) \
- alloc_pages_node(numa_node_id(), gfp_mask, order)
+ alloc_pages_exact_node(numa_node_id(), gfp_mask, order)
#define alloc_page_vma(gfp_mask, vma, addr) alloc_pages(gfp_mask, 0)
#endif
#define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0)

Minchan Kim

unread,
Apr 14, 2010, 11:10:02 AM4/14/10
to
V2
* Add some reviewed-by

__vmalloc_area_node never pass -1 to alloc_pages_node.
It means node's validity check is unnecessary.


So we can use alloc_pages_exact_node instead of alloc_pages_node.
It could avoid comparison and branch as 6484eb3e2a81807722 tried.

Cc: Nick Piggin <npi...@suse.de>
Reviewed-by: Mel Gorman <m...@csn.ul.ie>
Reviewed-by: KAMEZAWA Hiroyuki <kamezaw...@jp.fujitsu.com>


Signed-off-by: Minchan Kim <minch...@gmail.com>
---

mm/vmalloc.c | 2 +-


1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ae00746..7abf423 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1499,7 +1499,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
if (node < 0)
page = alloc_page(gfp_mask);
else
- page = alloc_pages_node(node, gfp_mask, 0);
+ page = alloc_pages_exact_node(node, gfp_mask, 0);

if (unlikely(!page)) {
/* Successfully allocated i pages, free them in __vunmap() */

Minchan Kim

unread,
Apr 14, 2010, 11:10:02 AM4/14/10
to
V2
o modify comment by suggestion. (Thanks to Mel)

alloc_pages_exact_node naming makes some people misleading.
They considered it following as.
"This function will allocate pages from node which I wanted
exactly".
But it can allocate pages from fallback list if page allocator
can't find free page from node user wanted.

So let's comment this NOTE.

Actually I wanted to change naming with better.
ex) alloc_pages_explict_node.


But I changed my mind since the comment would be enough.

If anybody suggests better name, I will do with pleasure.

Cc: Mel Gorman <m...@csn.ul.ie>
Cc: KAMEZAWA Hiroyuki <kamezaw...@jp.fujitsu.com>
Cc: Bob Liu <lli...@gmail.com>


Signed-off-by: Minchan Kim <minch...@gmail.com>
---

include/linux/gfp.h | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index b65f003..56b5fe6 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -288,6 +288,12 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,


return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
}

+/*

+ * Use this instead of alloc_pages_node when the caller knows
+ * exactly which node they need (as opposed to passing in -1
+ * for current). Fallback to other nodes will still occur
+ * unless __GFP_THISNODE is specified.


+ */
static inline struct page *alloc_pages_exact_node(int nid, gfp_t gfp_mask,
unsigned int order)
{

Minchan Kim

unread,
Apr 14, 2010, 11:10:03 AM4/14/10
to
V2
* change changelog
* Add some reviewed-by

alloc_slab_page always checks nid == -1, so alloc_page_node can't be
called with -1.
It means node's validity check in alloc_pages_node is unnecessary.

So we can use alloc_pages_exact_node instead of alloc_pages_node.
It could avoid comparison and branch as 6484eb3e2a81807722 tried.

Cc: Pekka Enberg <pen...@cs.helsinki.fi>


Cc: Christoph Lameter <c...@linux-foundation.org>
Signed-off-by: Minchan Kim <minch...@gmail.com>

Reviewed-by: KAMEZAWA Hiroyuki <kamezaw...@jp.fujitsu.com>
Reviewed-by: Mel Gorman <m...@csn.ul.ie>
---
mm/slub.c | 2 +-


1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index b364844..9984165 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1084,7 +1084,7 @@ static inline struct page *alloc_slab_page(gfp_t flags, int node,
if (node == -1)
return alloc_pages(flags, order);
else
- return alloc_pages_node(node, flags, order);
+ return alloc_pages_exact_node(node, flags, order);
}

static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)

Minchan Kim

unread,
Apr 14, 2010, 11:10:03 AM4/14/10
to
V2
* Add some reviewed-by

if node_state is N_HIGH_MEMORY, node doesn't have -1.
It means node's validity check is unnecessary.


So we can use alloc_pages_exact_node instead of alloc_pages_node.
It could avoid comparison and branch as 6484eb3e2a81807722 tried.

Cc: Christoph Lameter <c...@linux-foundation.org>


Reviewed-by: Mel Gorman <m...@csn.ul.ie>
Reviewed-by: KAMEZAWA Hiroyuki <kamezaw...@jp.fujitsu.com>

Signed-off-by: Minchan Kim <minch...@gmail.com>
---

mm/sparse-vmemmap.c | 2 +-


1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 392b9bb..7710ebc 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -53,7 +53,7 @@ void * __meminit vmemmap_alloc_block(unsigned long size, int node)
struct page *page;

if (node_state(node, N_HIGH_MEMORY))
- page = alloc_pages_node(node,
+ page = alloc_pages_exact_node(node,
GFP_KERNEL | __GFP_ZERO, get_order(size));
else
page = alloc_pages(GFP_KERNEL | __GFP_ZERO,

Pekka Enberg

unread,
Apr 14, 2010, 2:00:02 PM4/14/10
to

Applied, thanks!

Tejun Heo

unread,
Apr 14, 2010, 7:40:01 PM4/14/10
to
Hello,

On 04/14/2010 12:48 AM, Mel Gorman wrote:
> and the mapping table on x86 at least is based on possible CPUs in
> init_cpu_to_node() leaves the mapping as 0 if the APIC is bad or the numa
> node is reported in apicid_to_node as -1. It would appear on power that
> the node will be 0 for possible CPUs as well.
>
> Hence, I believe this to be safe but a confirmation from Tejun would be
> nice. I would continue digging but this looks like an initialisation path
> so I'll move on to the next patch rather than spending more time.

This being a pretty cold path, I don't really see much benefit in
converting it to alloc_pages_node_exact(). It ain't gonna make any
difference. I'd rather stay with the safer / boring one unless
there's a pressing reason to convert.

Thanks.

--
tejun

Minchan Kim

unread,
Apr 14, 2010, 9:40:02 PM4/14/10
to
Hi, Tejun.

On Thu, Apr 15, 2010 at 8:39 AM, Tejun Heo <t...@kernel.org> wrote:
> Hello,
>
> On 04/14/2010 12:48 AM, Mel Gorman wrote:
>> and the mapping table on x86 at least is based on possible CPUs in
>> init_cpu_to_node() leaves the mapping as 0 if the APIC is bad or the numa
>> node is reported in apicid_to_node as -1. It would appear on power that
>> the node will be 0 for possible CPUs as well.
>>
>> Hence, I believe this to be safe but a confirmation from Tejun would be
>> nice. I would continue digging but this looks like an initialisation path
>> so I'll move on to the next patch rather than spending more time.
>
> This being a pretty cold path, I don't really see much benefit in
> converting it to alloc_pages_node_exact().  It ain't gonna make any
> difference.  I'd rather stay with the safer / boring one unless
> there's a pressing reason to convert.

Actually, It's to weed out not-good API usage as well as some performance gain.
But I don't think to need it strongly.
Okay. Please keep in mind about this and correct it if you confirms it
in future. :)

>
> Thanks.
>
> --
> tejun
>

--
Kind regards,
Minchan Kim

Tejun Heo

unread,
Apr 15, 2010, 3:20:02 AM4/15/10
to
Hello,

On 04/15/2010 10:31 AM, Minchan Kim wrote:
> Hi, Tejun.


>> This being a pretty cold path, I don't really see much benefit in
>> converting it to alloc_pages_node_exact(). It ain't gonna make any
>> difference. I'd rather stay with the safer / boring one unless
>> there's a pressing reason to convert.
>
> Actually, It's to weed out not-good API usage as well as some
> performance gain. But I don't think to need it strongly.
> Okay. Please keep in mind about this and correct it if you confirms
> it in future. :)

Hmm... if most users are converting over to alloc_pages_node_exact(),
I think it would be better to convert percpu too. I thought it was a
performance optimization (of rather silly kind too). So, this is to
weed out -1 node id usage? Wouldn't it be better to update
alloc_pages_node() such that it whines once per each caller if it's
called with -1 node id and after updating most users convert the
warning into WARN_ON_ONCE()? Having two variants for this seems
rather extreme to me.

Thanks.

--
tejun

Minchan Kim

unread,
Apr 15, 2010, 4:10:01 AM4/15/10
to
On Thu, Apr 15, 2010 at 4:21 PM, Tejun Heo <t...@kernel.org> wrote:
> Hello,
>
> On 04/15/2010 10:31 AM, Minchan Kim wrote:
>> Hi, Tejun.
>>> This being a pretty cold path, I don't really see much benefit in
>>> converting it to alloc_pages_node_exact().  It ain't gonna make any
>>> difference.  I'd rather stay with the safer / boring one unless
>>> there's a pressing reason to convert.
>>
>> Actually, It's to weed out not-good API usage as well as some
>> performance gain.  But I don't think to need it strongly.
>> Okay. Please keep in mind about this and correct it if you confirms
>> it in future. :)
>
> Hmm... if most users are converting over to alloc_pages_node_exact(),
> I think it would be better to convert percpu too.  I thought it was a
> performance optimization (of rather silly kind too).  So, this is to
> weed out -1 node id usage?  Wouldn't it be better to update
> alloc_pages_node() such that it whines once per each caller if it's
> called with -1 node id and after updating most users convert the
> warning into WARN_ON_ONCE()?  Having two variants for this seems
> rather extreme to me.

Yes. I don't like it.
With it, someone who does care about API usage uses alloc_pages_exact_node but
someone who don't have a time or careless uses alloc_pages_node.
It would make API fragmentation and not good.
Maybe we can weed out -1 and make new API which is more clear.

* struct page *alloc_pages_any_node(gfp_t gfp_mask, unsigned int order);
* struct page *alloc_pages_exact_node(int nid, gfp_mask, unsigned int order);

So firstly we have to make sure users who use alloc_pages_node can
change alloc_pages_node with alloc_pages_exact_node.

After all of it was weed out, I will change alloc_pages_node with
alloc_pages_any_node.


--
Kind regards,
Minchan Kim

Tejun Heo

unread,
Apr 15, 2010, 4:20:01 AM4/15/10
to
Hello,

On 04/15/2010 05:00 PM, Minchan Kim wrote:
> Yes. I don't like it.
> With it, someone who does care about API usage uses alloc_pages_exact_node but
> someone who don't have a time or careless uses alloc_pages_node.
> It would make API fragmentation and not good.
> Maybe we can weed out -1 and make new API which is more clear.
>
> * struct page *alloc_pages_any_node(gfp_t gfp_mask, unsigned int order);
> * struct page *alloc_pages_exact_node(int nid, gfp_mask, unsigned int order);

I'm not an expert on that part of the kernel but isn't
alloc_pages_any_node() identical to alloc_pages_exact_node()? All
that's necessary to do now is to weed out callers which pass in
negative nid to alloc_pages_node(), right? If so, why not just do a
clean sweep of alloc_pages_node() users and update them so that they
don't call in w/ -1 nid and add WARN_ON_ONCE() in alloc_pages_node()?
Is there any reason to keep both variants going forward? If not,
introducing new API just to weed out invalid usages seems like an
overkill.

Thanks.

--
tejun

Minchan Kim

unread,
Apr 15, 2010, 5:50:02 AM4/15/10
to
On Thu, Apr 15, 2010 at 5:15 PM, Tejun Heo <t...@kernel.org> wrote:
> Hello,
>
> On 04/15/2010 05:00 PM, Minchan Kim wrote:
>> Yes. I don't like it.
>> With it, someone who does care about API usage uses alloc_pages_exact_node but
>> someone who don't have a time or careless uses alloc_pages_node.
>> It would make API fragmentation and not good.
>> Maybe we can weed out -1 and make new API which is more clear.
>>
>> * struct page *alloc_pages_any_node(gfp_t gfp_mask, unsigned int order);
>> * struct page *alloc_pages_exact_node(int nid, gfp_mask, unsigned int order);
>
> I'm not an expert on that part of the kernel but isn't
> alloc_pages_any_node() identical to alloc_pages_exact_node()?  All

alloc_pages_any_node means user allows allocated pages in any
node(most likely current node)
alloc_pages_exact_node means user allows allocated pages in nid node
if he doesn't use __GFP_THISNODE.

> that's necessary to do now is to weed out callers which pass in
> negative nid to alloc_pages_node(), right?  If so, why not just do a

It might be my final goal. I hope user uses alloc_pages_any_node
instead of nid == -1.

> clean sweep of alloc_pages_node() users and update them so that they
> don't call in w/ -1 nid and add WARN_ON_ONCE() in alloc_pages_node()?
> Is there any reason to keep both variants going forward?  If not,

I am not sure someone still need alloc_pages_node. That's because
sometime he want to allocate page from specific node but sometime not.
I hope it doesn't happen. Anyway I have to identify the situation.

> introducing new API just to weed out invalid usages seems like an
> overkill.

It might be.

It think it's almost same add_to_page_cache and add_to_page_cache_locked.
If user knows the page is already locked, he can use
add_to_page_cache_locked for performance gain and code readability
which we need to lock the page before calling it.
The important point is that user uses it as he is conscious of locked page.
I think if user already know to want page where from specific node, it
would be better to use alloc_pages_exact_node instead of
alloc_pages_node.

If he want to allocate page from any node[current..fallback list], he
have to use alloc_pages_any_node without nid argument. It would make a
little performance gain(reduce passing argument) and good readbility.
:)

Now, most of user uses alloc_pages_exact_node. So I think we can do it.
But someone else also might think of overkill. :)
It's a not urgent issue. So I will take it easy.

Thanks.

--
Kind regards,
Minchan Kim

Tejun Heo

unread,
Apr 15, 2010, 6:10:01 AM4/15/10
to
Hello,

On 04/15/2010 06:40 PM, Minchan Kim wrote:
>> I'm not an expert on that part of the kernel but isn't
>> alloc_pages_any_node() identical to alloc_pages_exact_node()? All
>
> alloc_pages_any_node means user allows allocated pages in any
> node(most likely current node) alloc_pages_exact_node means user
> allows allocated pages in nid node if he doesn't use __GFP_THISNODE.

Ooh, sorry, I meant alloc_pages(). What would be the difference
between alloc_pages_any_node() and alloc_pages()?

>> introducing new API just to weed out invalid usages seems like an
>> overkill.
>
> It might be.
>
> It think it's almost same add_to_page_cache and add_to_page_cache_locked.
> If user knows the page is already locked, he can use
> add_to_page_cache_locked for performance gain and code readability
> which we need to lock the page before calling it.

Yeah, if both APIs are necessary at the end of the conversion, sure.
I was just saying that introducing new APIs just to weed out invalid
usages and then later killing the old API would be rather excessive.

I was just wondering whether we could just clean up alloc_pages_node()
users and kill alloc_pages_exact_node().

Thanks.

--
tejun

Minchan Kim

unread,
Apr 15, 2010, 6:30:01 AM4/15/10
to
On Thu, Apr 15, 2010 at 7:08 PM, Tejun Heo <t...@kernel.org> wrote:
> Hello,
>
> On 04/15/2010 06:40 PM, Minchan Kim wrote:
>>> I'm not an expert on that part of the kernel but isn't
>>> alloc_pages_any_node() identical to alloc_pages_exact_node()?  All
>>
>> alloc_pages_any_node means user allows allocated pages in any
>> node(most likely current node) alloc_pages_exact_node means user
>> allows allocated pages in nid node if he doesn't use __GFP_THISNODE.
>
> Ooh, sorry, I meant alloc_pages().  What would be the difference
> between alloc_pages_any_node() and alloc_pages()?

It's no different. It's same. Just naming is more explicit. :)
I think it could be following as.

#define alloc_pages alloc_pages_any_node.
strucdt page * alloc_pages_node() {
int nid = numa_node_id();
...
return page;
}

>
>>> introducing new API just to weed out invalid usages seems like an
>>> overkill.
>>
>> It might be.
>>
>> It think it's almost same add_to_page_cache and add_to_page_cache_locked.
>> If user knows the page is already locked, he can use
>> add_to_page_cache_locked for performance gain and code readability
>> which we need to lock the page before calling it.
>
> Yeah, if both APIs are necessary at the end of the conversion, sure.
> I was just saying that introducing new APIs just to weed out invalid
> usages and then later killing the old API would be rather excessive.
>
> I was just wondering whether we could just clean up alloc_pages_node()
> users and kill alloc_pages_exact_node().

kill alloc_pages_exact_node?
Sorry but I can't understand your point.
I don't want to kill user of alloc_pages_exact_node.
That's opposite.
I want to kill user of alloc_pages_node and change it with
alloc_pages_any_node or alloc_pages_exact_node. :)

I think we can do it. That's because all of caller already can check nid == -1
before calling allocation function explicitly if he cares node locality.

--
Kind regards,
Minchan Kim

Minchan Kim

unread,
Apr 15, 2010, 6:40:02 AM4/15/10
to
On Thu, Apr 15, 2010 at 7:21 PM, Minchan Kim <minch...@gmail.com> wrote:
> On Thu, Apr 15, 2010 at 7:08 PM, Tejun Heo <t...@kernel.org> wrote:
>> Hello,
>>
>> On 04/15/2010 06:40 PM, Minchan Kim wrote:
>>>> I'm not an expert on that part of the kernel but isn't
>>>> alloc_pages_any_node() identical to alloc_pages_exact_node()?  All
>>>
>>> alloc_pages_any_node means user allows allocated pages in any
>>> node(most likely current node) alloc_pages_exact_node means user
>>> allows allocated pages in nid node if he doesn't use __GFP_THISNODE.
>>
>> Ooh, sorry, I meant alloc_pages().  What would be the difference
>> between alloc_pages_any_node() and alloc_pages()?
>
> It's no different. It's same. Just naming is more explicit. :)
> I think it could be following as.
>
> #define alloc_pages alloc_pages_any_node.
> strucdt page * alloc_pages_node() {
typo. Sorry.
struct page * alloc_pages_any_node {

>   int nid = numa_node_id();
>   ...
>   return page;
> }
>

Minchan Kim

unread,
Apr 15, 2010, 7:50:03 AM4/15/10
to
On Thu, Apr 15, 2010 at 8:43 PM, Tejun Heo <t...@kernel.org> wrote:
> Hello,
>
> On 04/15/2010 07:21 PM, Minchan Kim wrote:
>> kill alloc_pages_exact_node?
>> Sorry but I can't understand your point.
>> I don't want to kill user of alloc_pages_exact_node.
>> That's opposite.
>> I want to kill user of alloc_pages_node and change it with
>> alloc_pages_any_node or alloc_pages_exact_node. :)
>
> I see, so...
>
>  alloc_pages()          -> alloc_pages_any_node()
>  alloc_pages_node()     -> alloc_pages_exact_node()
>
> right?  It just seems strange to me and different from usual naming
> convention - ie. something which doesn't care about nodes usually
> doesn't carry _node postfix.  Anyways, no big deal, those names just
> felt a bit strange to me.

I don't want to remove alloc_pages for UMA system.
#define alloc_pages alloc_page_sexact_node

What I want to remove is just alloc_pages_node. :)
Sorry for confusing you.

Tejun Heo

unread,
Apr 15, 2010, 7:50:03 AM4/15/10
to
Hello,

On 04/15/2010 07:21 PM, Minchan Kim wrote:
> kill alloc_pages_exact_node?
> Sorry but I can't understand your point.
> I don't want to kill user of alloc_pages_exact_node.
> That's opposite.
> I want to kill user of alloc_pages_node and change it with
> alloc_pages_any_node or alloc_pages_exact_node. :)

I see, so...

alloc_pages() -> alloc_pages_any_node()
alloc_pages_node() -> alloc_pages_exact_node()

right? It just seems strange to me and different from usual naming
convention - ie. something which doesn't care about nodes usually
doesn't carry _node postfix. Anyways, no big deal, those names just

felt a bit strange to me.

Thanks.

--
tejun

Christoph Lameter

unread,
Apr 16, 2010, 12:10:02 PM4/16/10
to
On Thu, 15 Apr 2010, Minchan Kim wrote:

> I don't want to remove alloc_pages for UMA system.

alloc_pages is the same as alloc_pages_any_node so why have it?

> #define alloc_pages alloc_page_sexact_node
>
> What I want to remove is just alloc_pages_node. :)

Why remove it? If you want to get rid of -1 handling then check all the
callsites and make sure that they are not using -1.

Also could you define a constant for -1? -1 may have various meanings. One
is the local node and the other is any node. The difference is if memory
policies are obeyed or not. Note that alloc_pages follows memory policies
whereas alloc_pages_node does not.

Therefore

alloc_pages() != alloc_pages_node( , -1)

Lee Schermerhorn

unread,
Apr 16, 2010, 3:20:02 PM4/16/10
to
On Fri, 2010-04-16 at 11:07 -0500, Christoph Lameter wrote:
> On Thu, 15 Apr 2010, Minchan Kim wrote:
>
> > I don't want to remove alloc_pages for UMA system.
>
> alloc_pages is the same as alloc_pages_any_node so why have it?
>
> > #define alloc_pages alloc_page_sexact_node
> >
> > What I want to remove is just alloc_pages_node. :)
>
> Why remove it? If you want to get rid of -1 handling then check all the
> callsites and make sure that they are not using -1.
>
> Also could you define a constant for -1? -1 may have various meanings. One
> is the local node and the other is any node.

NUMA_NO_NODE is #defined as (-1) and can be used for this purpose. '-1'
has been replaced by this in many cases. It can be interpreted as "No
node specified" == "any node is acceptable". But, it also has multiple
meanings. E.g., in the hugetlb sysfs attribute and sysctl functions it
indicates the global hstates [all nodes] vs a per node hstate. So, I
suppose one could define a NUMA_ANY_NODE, to make the intention clear at
the call site.

I believe that all usage of -1 to mean the local node has been removed,
unless I missed one. Local allocation is now indicated by a mempolicy
mode flag--MPOL_F_LOCAL. It's treated as a special case of
MPOL_PREFERRED.

> The difference is if memory
> policies are obeyed or not. Note that alloc_pages follows memory policies
> whereas alloc_pages_node does not.
>
> Therefore
>
> alloc_pages() != alloc_pages_node( , -1)
>
> --

> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majo...@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"do...@kvack.org"> em...@kvack.org </a>

Minchan Kim

unread,
Apr 18, 2010, 12:00:01 PM4/18/10
to
Hi, Christoph.

On Fri, 2010-04-16 at 11:07 -0500, Christoph Lameter wrote:

> On Thu, 15 Apr 2010, Minchan Kim wrote:
>
> > I don't want to remove alloc_pages for UMA system.
>
> alloc_pages is the same as alloc_pages_any_node so why have it?

I don't want to force using '_node' postfix on UMA users.
Maybe they don't care getting page from any node and event don't need to
know about _NODE_.

>
> > #define alloc_pages alloc_page_sexact_node
> >
> > What I want to remove is just alloc_pages_node. :)
>
> Why remove it? If you want to get rid of -1 handling then check all the

alloc_pages_node have multiple meaning as you said. So some of users
misuses that API. I want to clear intention of user.

> callsites and make sure that they are not using -1.

Sure. I must do it before any progressing.

>
> Also could you define a constant for -1? -1 may have various meanings. One
> is the local node and the other is any node. The difference is if memory
> policies are obeyed or not. Note that alloc_pages follows memory policies
> whereas alloc_pages_node does not.
>
> Therefore
>
> alloc_pages() != alloc_pages_node( , -1)
>

Yes, now it's totally different.
On UMA, It's any node but on NUMA, local node.

My concern is following as.

alloc_pages_node means any node but it has nid argument.
Why should user of alloc_pages who want to get page from any node pass
nid argument? It's rather awkward.

Some of user misunderstood it and used alloc_pages_node instead of
alloc_pages_exact_node although he already know exact _NID_.
Of course, it's not a BUG since if nid >= 0 it works well.

But I want to remove such multiple meaning to clear intention of user.

--
Kind regards,
Minchan Kim

Minchan Kim

unread,
Apr 18, 2010, 12:00:02 PM4/18/10
to
Hi, Lee.

On Fri, 2010-04-16 at 15:13 -0400, Lee Schermerhorn wrote:
> On Fri, 2010-04-16 at 11:07 -0500, Christoph Lameter wrote:
> > On Thu, 15 Apr 2010, Minchan Kim wrote:
> >
> > > I don't want to remove alloc_pages for UMA system.
> >
> > alloc_pages is the same as alloc_pages_any_node so why have it?
> >
> > > #define alloc_pages alloc_page_sexact_node
> > >
> > > What I want to remove is just alloc_pages_node. :)
> >
> > Why remove it? If you want to get rid of -1 handling then check all the
> > callsites and make sure that they are not using -1.
> >
> > Also could you define a constant for -1? -1 may have various meanings. One
> > is the local node and the other is any node.
>
> NUMA_NO_NODE is #defined as (-1) and can be used for this purpose. '-1'
> has been replaced by this in many cases. It can be interpreted as "No
> node specified" == "any node is acceptable". But, it also has multiple
> meanings. E.g., in the hugetlb sysfs attribute and sysctl functions it
> indicates the global hstates [all nodes] vs a per node hstate. So, I
> suppose one could define a NUMA_ANY_NODE, to make the intention clear at
> the call site.
>
> I believe that all usage of -1 to mean the local node has been removed,
> unless I missed one. Local allocation is now indicated by a mempolicy
> mode flag--MPOL_F_LOCAL. It's treated as a special case of
> MPOL_PREFERRED.

Thanks for good information. :)

--
Kind regards,
Minchan Kim

Tejun Heo

unread,
Apr 18, 2010, 5:20:02 PM4/18/10
to
On 04/19/2010 12:54 AM, Minchan Kim wrote:
>> alloc_pages is the same as alloc_pages_any_node so why have it?
>
> I don't want to force using '_node' postfix on UMA users.
> Maybe they don't care getting page from any node and event don't need to
> know about _NODE_.

Yeah, then, remove alloc_pages_any_node(). I can't really see the
point of any_/exact_node. alloc_pages() and alloc_pages_node() are
fine and in line with other functions. Why change it?

>> Why remove it? If you want to get rid of -1 handling then check all the
>
> alloc_pages_node have multiple meaning as you said. So some of users
> misuses that API. I want to clear intention of user.

The name is fine. Just clean up the users and make the intended usage
clear in documentation and implementation (ie. trigger a big fat
warning) and make all the callers use named constants instead of -1
for special meanings.

Thanks.

--
tejun

Minchan Kim

unread,
Apr 18, 2010, 8:10:02 PM4/18/10
to
Hi, Tejun.

On Mon, Apr 19, 2010 at 6:22 AM, Tejun Heo <t...@kernel.org> wrote:
> On 04/19/2010 12:54 AM, Minchan Kim wrote:
>>> alloc_pages is the same as alloc_pages_any_node so why have it?
>>
>> I don't want to force using '_node' postfix on UMA users.
>> Maybe they don't care getting page from any node and event don't need to
>> know about _NODE_.
>
> Yeah, then, remove alloc_pages_any_node().  I can't really see the
> point of any_/exact_node.  alloc_pages() and alloc_pages_node() are
> fine and in line with other functions.  Why change it?
>
>>> Why remove it? If you want to get rid of -1 handling then check all the
>>
>> alloc_pages_node have multiple meaning as you said. So some of users
>> misuses that API. I want to clear intention of user.
>
> The name is fine.  Just clean up the users and make the intended usage
> clear in documentation and implementation (ie. trigger a big fat
> warning) and make all the callers use named constants instead of -1
> for special meanings.
>
> Thanks.

Let's tidy my table.

I made quick patch to show the concept with one example of pci-dma.
(Sorry but I attach patch since web gmail's mangling.)

On UMA, we can change alloc_pages with
alloc_pages_exact_node(numa_node_id(),....)
(Actually, the patch is already merged mmotm)

on NUMA, alloc_pages is some different meaning, so I don't want to change it.
on NUMA, alloc_pages_node means _ANY_NODE_.
So let's remove nid argument and change naming with alloc_pages_any_node.

Then, whole users of alloc_pages_node can be changed between
alloc_pages_exact_node and alloc_pages_any_node.

It was my intention. What's your concern?

Thanks for your interest, Tejun. :)

diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index a4ac764..dc511cb 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -152,12 +152,21 @@ void *dma_generic_alloc_coherent(struct device
*dev, size_t size,
unsigned long dma_mask;
struct page *page;
dma_addr_t addr;
+ int nid;

dma_mask = dma_alloc_coherent_mask(dev, flag);

flag |= __GFP_ZERO;
again:
- page = alloc_pages_node(dev_to_node(dev), flag, get_order(size));
+ nid = dev_to_node(dev);
+ /*
+ * If pci-dma maintainer makes sure nid never has NUMA_NO_NODE
+ * we can remove this ugly checking.
+ */
+ if (nid == NUMA_NO_NODE)
+ page = alloc_pages_any_node(flag, get_order(size));
+ else
+ page = alloc_pages_exact_node(nid, flag, get_order(size));
if (!page)
return NULL;

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 4c6d413..47fba21 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -278,13 +278,10 @@ __alloc_pages(gfp_t gfp_mask, unsigned int order,
return __alloc_pages_nodemask(gfp_mask, order, zonelist, NULL);
}

-static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
+static inline struct page *alloc_pagse_any_node(gfp_t gfp_mask,
unsigned int order)
{
- /* Unknown node is current node */
- if (nid < 0)
- nid = numa_node_id();
-
+ int nid = numa_node_id();


return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
}

@@ -308,7 +305,7 @@ extern struct page *alloc_page_vma(gfp_t gfp_mask,


struct vm_area_struct *vma, unsigned long addr);
#else
#define alloc_pages(gfp_mask, order) \
- alloc_pages_node(numa_node_id(), gfp_mask, order)
+ alloc_pages_exact_node(numa_node_id(), gfp_mask, order)
#define alloc_page_vma(gfp_mask, vma, addr) alloc_pages(gfp_mask, 0)
#endif
#define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0)

~

change_alloc_functions_naming.patch

Christoph Lameter

unread,
Apr 19, 2010, 1:40:03 PM4/19/10
to
On Mon, 19 Apr 2010, Minchan Kim wrote:

> My concern is following as.
>
> alloc_pages_node means any node but it has nid argument.
> Why should user of alloc_pages who want to get page from any node pass
> nid argument? It's rather awkward.

Its not awkward but an optimization. The page can be placed on any node
but the user would prefer a certain node. Most of the NUMA things are
there for optimization purposes and not for correctness. If you must have
an allocation on certain nodes for correctness (like SLAB) then
GFP_THISNODE is used.

> Some of user misunderstood it and used alloc_pages_node instead of
> alloc_pages_exact_node although he already know exact _NID_.
> Of course, it's not a BUG since if nid >= 0 it works well.
>
> But I want to remove such multiple meaning to clear intention of user.

Its not clear to me that this renaming etc helps. You
must use GFP_THISNODE if allocation must occur from a certain node.
alloc_pages_exact_node results in more confusion because it does suggest
that fallback to other nodes is not allowed.

Christoph Lameter

unread,
Apr 19, 2010, 1:50:02 PM4/19/10
to
On Mon, 19 Apr 2010, Minchan Kim wrote:

> Let's tidy my table.
>
> I made quick patch to show the concept with one example of pci-dma.
> (Sorry but I attach patch since web gmail's mangling.)
>
> On UMA, we can change alloc_pages with
> alloc_pages_exact_node(numa_node_id(),....)
> (Actually, the patch is already merged mmotm)

UMA does not have the concept of nodes. Whatever node you specify is
irrelevant. Please remove the patch from mmotm.

> on NUMA, alloc_pages is some different meaning, so I don't want to change it.

No it has the same meaning. It means allocate a page.

> on NUMA, alloc_pages_node means _ANY_NODE_.

It means allocate on the indicated node if possible. Memory could come
from any node due to fallback (in order of node preference).

> So let's remove nid argument and change naming with alloc_pages_any_node.

??? What in the world are you doing?

> Then, whole users of alloc_pages_node can be changed between
> alloc_pages_exact_node and alloc_pages_any_node.
>
> It was my intention. What's your concern?

I dont see the point.

> again:
> - page = alloc_pages_node(dev_to_node(dev), flag, get_order(size));
> + nid = dev_to_node(dev);
> + /*
> + * If pci-dma maintainer makes sure nid never has NUMA_NO_NODE
> + * we can remove this ugly checking.
> + */
> + if (nid == NUMA_NO_NODE)
> + page = alloc_pages_any_node(flag, get_order(size));

s/alloc_pages_any_node/alloc_pages/

> + else
> + page = alloc_pages_exact_node(nid, flag, get_order(size));

s/alloc_pages_exact_node/alloc_pages_node/

> -static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
> +static inline struct page *alloc_pagse_any_node(gfp_t gfp_mask,
> unsigned int order)
> {
> - /* Unknown node is current node */
> - if (nid < 0)
> - nid = numa_node_id();
> -
> + int nid = numa_node_id();
> return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
> }
>

This is very confusing. Because it is

alloc_pages_numa_node_id()


alloca_pages_any_node suggests that the kernel randomly picks a node?

change_alloc_functions_naming.patch

Tejun Heo

unread,
Apr 19, 2010, 6:30:02 PM4/19/10
to
Hello, Christoph.

On 04/20/2010 02:38 AM, Christoph Lameter wrote:
> alloc_pages_exact_node results in more confusion because it does suggest
> that fallback to other nodes is not allowed.

I can't see why alloc_pages_exact_node() exists at all. It's in the
mainline and if you look at the difference between alloc_pages_node()
and alloc_pages_exact_node(), it's almost silly. :-(

--
tejun

Minchan Kim

unread,
Apr 19, 2010, 8:30:02 PM4/19/10
to
On Tue, Apr 20, 2010 at 2:45 AM, Christoph Lameter
<c...@linux-foundation.org> wrote:
> On Mon, 19 Apr 2010, Minchan Kim wrote:
>
>> Let's tidy my table.
>>
>> I made quick patch to show the concept with one example of pci-dma.
>> (Sorry but I attach patch since web gmail's mangling.)
>>
>> On UMA, we can change alloc_pages with
>> alloc_pages_exact_node(numa_node_id(),....)
>> (Actually, the patch is already merged mmotm)
>
> UMA does not have the concept of nodes. Whatever node you specify is
> irrelevant. Please remove the patch from mmotm.

I didn't change API name. The patch is just for optimization.
http://lkml.org/lkml/2010/4/14/225
I think it's reasonable in UMA.
Why do you want to remove it?

Do you dislike alloc_pages_exact_node naming?
I added comment.
http://lkml.org/lkml/2010/4/14/230
Do you think it isn't enough?

This patch results from misunderstanding of alloc_pages_exact_node.
(http://marc.info/?l=linux-mm&m=127109064101184&w=2)
At that time, I thought naming changing is worth.
But many people don't like it.
Okay. It was just trial and if everyone dislike, I don't have any strong cause.
But this patch series don't relate to it. Again said, It's just for
optimization patch.

Let's clarify other's opinion.

1. "I dislike alloc_pages_exact_node naming. Let's change it with more
clear name."
2. "I hate alloc_pages_exact_node. It's trivial optimization. Let's
remove it and replace it with alloc_pages_node."
3. "alloc_pages_exact_node naming is not bad. Let's add the comment to
clear name"
4. "Let's cleanup alloc_pages_xxx in this change as well as 3.
5. "Please, don't touch. Remain whole of thing like as-is."

I think Chrsitop selects 5 or 1, Tejun selects 2, Mel selects 3, me
want to 4 but is satisfied with 3. Right?

If we selects 5, In future, there are confusing between
alloc_pages_node and alloc_pages_exact_node.So I don't want it.

If we select 2, We already have many place of alloc_pages_exact_node.
And I add this patch series. So most of caller uses alloc_pages_exact_node now.
Isn't it trivial?

So I want 3 at lest although you guys don't like 4.
Please, suggest better idea to me. :)

--
Kind regards,
Minchan Kim

Mel Gorman

unread,
Apr 20, 2010, 11:10:02 AM4/20/10
to
On Tue, Apr 20, 2010 at 07:27:09AM +0900, Tejun Heo wrote:
> Hello, Christoph.
>
> On 04/20/2010 02:38 AM, Christoph Lameter wrote:
> > alloc_pages_exact_node results in more confusion because it does suggest
> > that fallback to other nodes is not allowed.
>
> I can't see why alloc_pages_exact_node() exists at all. It's in the
> mainline and if you look at the difference between alloc_pages_node()
> and alloc_pages_exact_node(), it's almost silly. :-(
>

alloc_pages_exact_node() avoids a branch in a hot path that is checking for
something the caller already knows. That's the reason it exists.

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

Christoph Lameter

unread,
Apr 21, 2010, 10:20:01 AM4/21/10
to
On Tue, 20 Apr 2010, Mel Gorman wrote:

> alloc_pages_exact_node() avoids a branch in a hot path that is checking for
> something the caller already knows. That's the reason it exists.

We can avoid alloc_pages_exact_node() by making all callers of
alloc_pages_node() never use -1. -1 is ambiguous and only rarely will a
caller pass that to alloc_pages_node().

Minchan Kim

unread,
Apr 21, 2010, 1:10:01 PM4/21/10
to
On Wed, Apr 21, 2010 at 11:15 PM, Christoph Lameter
<c...@linux-foundation.org> wrote:
> On Tue, 20 Apr 2010, Mel Gorman wrote:
>
>> alloc_pages_exact_node() avoids a branch in a hot path that is checking for
>> something the caller already knows. That's the reason it exists.
>
> We can avoid alloc_pages_exact_node() by making all callers of
> alloc_pages_node() never use -1. -1 is ambiguous and only rarely will a
> caller pass that to alloc_pages_node().

That's very reasonable to me.
Then, we can remove alloc_pages_exact_node and nid < 0 check in
alloc_pages_node at the same time.

Mel. Could you agree?

Firstly Tejun suggested this but I didn't got the point.
Sorry for bothering you.

Okay. I will dive into this approach.
Thanks for careful review, All.


> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majo...@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"do...@kvack.org"> em...@kvack.org </a>
>

--
Kind regards,
Minchan Kim

Tejun Heo

unread,
Apr 22, 2010, 4:40:02 AM4/22/10
to
Hello,

On 04/20/2010 05:05 PM, Mel Gorman wrote:
> alloc_pages_exact_node() avoids a branch in a hot path that is checking for
> something the caller already knows. That's the reason it exists.

Yeah sure but Minchan is trying to tidy up the API by converting
alloc_pages_node() users to use alloc_pages_exact_node(), at which
point, the distinction becomes pretty useless. Wouldn't just making
alloc_pages_node() do what alloc_pages_exact_node() does now and
converting all its users be simpler? IIRC, the currently planned
transformation looks like the following.

alloc_pages() -> alloc_pages_any_node()
alloc_pages_node() -> basically gonna be obsoleted by _exact_node
alloc_pages_exact_node() -> gonna be used by most NUMA aware allocs

So, let's just make sure no one calls alloc_pages_node() w/ -1 nid,
kill alloc_pages_node() and rename alloc_pages_exact_node() to
alloc_pages_node().

Thanks.

--
tejun

Minchan Kim

unread,
Apr 22, 2010, 6:20:01 AM4/22/10
to
On Wed, Apr 21, 2010 at 7:48 PM, Tejun Heo <t...@kernel.org> wrote:
> Hello,
>
> On 04/20/2010 05:05 PM, Mel Gorman wrote:
>> alloc_pages_exact_node() avoids a branch in a hot path that is checking for
>> something the caller already knows. That's the reason it exists.
>
> Yeah sure but Minchan is trying to tidy up the API by converting
> alloc_pages_node() users to use alloc_pages_exact_node(), at which
> point, the distinction becomes pretty useless.  Wouldn't just making
> alloc_pages_node() do what alloc_pages_exact_node() does now and
> converting all its users be simpler?  IIRC, the currently planned
> transformation looks like the following.
>
>  alloc_pages()                  -> alloc_pages_any_node()
>  alloc_pages_node()             -> basically gonna be obsoleted by _exact_node
>  alloc_pages_exact_node()       -> gonna be used by most NUMA aware allocs
>
> So, let's just make sure no one calls alloc_pages_node() w/ -1 nid,
> kill alloc_pages_node() and rename alloc_pages_exact_node() to
> alloc_pages_node().

Yes. It was a stupid idea. I hope Mel agree this suggestion.
Thanks for careful review, Tejun.


--
Kind regards,
Minchan Kim

0 new messages