
/proc/kcore has an unreasonable size (281474974617600) on x86_64 2.6.30-rc8.

Tao Ma

Jun 5, 2009, 12:04:32 AM
to linux-...@vger.kernel.org
Hi list,
In 2.6.30-rc8, the size of /proc/kcore on x86_64 is unreasonably large:
281474974617600.
On an x86 box it is 931131392, which looks sane.

[root@test8 ~]# ll /proc/kcore
-r-------- 1 root root 281474974617600 Jun 5 11:15 /proc/kcore

[root@ocfs2-test9 ~]$ ll /proc/kcore
-r-------- 1 root root 931131392 Jun 5 11:58 /proc/kcore

I noticed this when kexec failed with "Can't find kernel text map area
from kcore".

Is there something wrong?

Regards,
Tao

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Andrew Morton

Jun 5, 2009, 1:38:28 AM
to Tao Ma, linux-...@vger.kernel.org
On Fri, 05 Jun 2009 12:03:52 +0800 Tao Ma <tao...@oracle.com> wrote:

> Hi list,
> In 2.6.30-rc8, /proc/kcore in x86_64's size is unreasonable large
> to be 281474974617600.
> While in a x86 box, it is 931131392 which looks sane.
>
> [root@test8 ~]# ll /proc/kcore
> -r-------- 1 root root 281474974617600 Jun 5 11:15 /proc/kcore
>
> [root@ocfs2-test9 ~]$ ll /proc/kcore
> -r-------- 1 root root 931131392 Jun 5 11:58 /proc/kcore
>
> I just noticed this when kexec fails in "Can't find kernel text map area
> from kcore".
>
> Is there something wrong?
>

fs/proc/kcore.c hasn't changed since October last year. Was 2.6.29 OK?
Earlier kernels?

Thanks.

Amerigo Wang

Jun 5, 2009, 1:47:04 AM
to Tao Ma, linux-...@vger.kernel.org
On Fri, Jun 05, 2009 at 12:03:52PM +0800, Tao Ma wrote:
> Hi list,
> In 2.6.30-rc8, /proc/kcore in x86_64's size is unreasonable large
> to be 281474974617600.
> While in a x86 box, it is 931131392 which looks sane.
>
> [root@test8 ~]# ll /proc/kcore
> -r-------- 1 root root 281474974617600 Jun 5 11:15 /proc/kcore
>
> [root@ocfs2-test9 ~]$ ll /proc/kcore
> -r-------- 1 root root 931131392 Jun 5 11:58 /proc/kcore

Hmm, what is your physical RAM size on test8?
/proc/kcore looks fine on my x86_64 box.

>
> I just noticed this when kexec fails in "Can't find kernel text map area
> from kcore".
>
> Is there something wrong?

It looks like that error message is from userspace?

Tao Ma

Jun 5, 2009, 2:08:26 AM
to Amerigo Wang, linux-...@vger.kernel.org

Amerigo Wang wrote:
> On Fri, Jun 05, 2009 at 12:03:52PM +0800, Tao Ma wrote:
>> Hi list,
>> In 2.6.30-rc8, /proc/kcore in x86_64's size is unreasonable large
>> to be 281474974617600.
>> While in a x86 box, it is 931131392 which looks sane.
>>
>> [root@test8 ~]# ll /proc/kcore
>> -r-------- 1 root root 281474974617600 Jun 5 11:15 /proc/kcore
>>
>> [root@ocfs2-test9 ~]$ ll /proc/kcore
>> -r-------- 1 root root 931131392 Jun 5 11:58 /proc/kcore
>
> Hmm, what is your physical RAM size on test8?
> /proc/kcore looks fine on my x86_64 box.

Only 4G.


>
>> I just noticed this when kexec fails in "Can't find kernel text map area
>> from kcore".
>>
>> Is there something wrong?
>
> It looks like that error message is from userspace?

I just started kdump and get the error message.

Regards,
Tao

Amerigo Wang

Jun 5, 2009, 2:41:26 AM
to Tao Ma, Amerigo Wang, linux-...@vger.kernel.org
On Fri, Jun 05, 2009 at 02:07:58PM +0800, Tao Ma wrote:
>
>
> Amerigo Wang wrote:
>> On Fri, Jun 05, 2009 at 12:03:52PM +0800, Tao Ma wrote:
>>> Hi list,
>>> In 2.6.30-rc8, /proc/kcore in x86_64's size is unreasonable
>>> large to be 281474974617600.
>>> While in a x86 box, it is 931131392 which looks sane.
>>>
>>> [root@test8 ~]# ll /proc/kcore
>>> -r-------- 1 root root 281474974617600 Jun 5 11:15 /proc/kcore
>>>
>>> [root@ocfs2-test9 ~]$ ll /proc/kcore
>>> -r-------- 1 root root 931131392 Jun 5 11:58 /proc/kcore
>>
>> Hmm, what is your physical RAM size on test8?
>> /proc/kcore looks fine on my x86_64 box.
> Only 4G.

Hmm, my x86_64 box has 8G of memory; the size of kcore there looks much
saner than the huge number above, but it is still wrong
according to what the man page describes...

Please do what Andrew said, it will be helpful.

>>
>>> I just noticed this when kexec fails in "Can't find kernel text map
>>> area from kcore".
>>>
>>> Is there something wrong?
>>
>> It looks like that error message is from userspace?
> I just started kdump and get the error message.

IIRC, kdump should use /proc/vmcore, instead of /proc/kcore...
nothing related.

Thanks.

Tao Ma

Jun 5, 2009, 2:57:01 AM
to Amerigo Wang, linux-...@vger.kernel.org

In el5, starting the kdump service runs something like:
/sbin/kexec --args-linux -p '--command-line=ro root=LABEL=/ rhgb quiet
irqpoll maxcpus=1' --initrd=/boot/initrd-2.6.18-53.el5kdump.img
/boot/vmlinuz-2.6.18-53.el5

And the error message is from there.

Regards,
Tao

Tao Ma

Jun 5, 2009, 3:00:19 AM
to Andrew Morton, linux-...@vger.kernel.org

Andrew Morton wrote:
> On Fri, 05 Jun 2009 12:03:52 +0800 Tao Ma <tao...@oracle.com> wrote:
>
>> Hi list,
>> In 2.6.30-rc8, /proc/kcore in x86_64's size is unreasonable large
>> to be 281474974617600.
>> While in a x86 box, it is 931131392 which looks sane.
>>
>> [root@test8 ~]# ll /proc/kcore
>> -r-------- 1 root root 281474974617600 Jun 5 11:15 /proc/kcore
>>
>> [root@ocfs2-test9 ~]$ ll /proc/kcore
>> -r-------- 1 root root 931131392 Jun 5 11:58 /proc/kcore
>>
>> I just noticed this when kexec fails in "Can't find kernel text map area
>> from kcore".
>>
>> Is there something wrong?
>>
>
> fs/proc/kcore.c hasn't changed since October last year. Was 2.6.29 OK?
> Earlier kernels?

with 2.6.29, ls shows the same output.


[root@test8 ~]# ll /proc/kcore

-r-------- 1 root root 281474974617600 Jun 5 14:35 /proc/kcore

But the kexec works.

I just checked .28, the same as .29.

Regards,
Tao

Amerigo Wang

Jun 5, 2009, 3:54:48 AM
to Tao Ma, Andrew Morton, linux-...@vger.kernel.org
On Fri, Jun 05, 2009 at 02:59:46PM +0800, Tao Ma wrote:
>
>
> Andrew Morton wrote:
>> On Fri, 05 Jun 2009 12:03:52 +0800 Tao Ma <tao...@oracle.com> wrote:
>>
>>> Hi list,
>>> In 2.6.30-rc8, /proc/kcore in x86_64's size is unreasonable
>>> large to be 281474974617600.
>>> While in a x86 box, it is 931131392 which looks sane.
>>>
>>> [root@test8 ~]# ll /proc/kcore
>>> -r-------- 1 root root 281474974617600 Jun 5 11:15 /proc/kcore
>>>
>>> [root@ocfs2-test9 ~]$ ll /proc/kcore
>>> -r-------- 1 root root 931131392 Jun 5 11:58 /proc/kcore
>>>
>>> I just noticed this when kexec fails in "Can't find kernel text map
>>> area from kcore".
>>>
>>> Is there something wrong?
>>>
>>
>> fs/proc/kcore.c hasn't changed since October last year. Was 2.6.29 OK?
>> Earlier kernels?
> with 2.6.29, ls shows the same output.
> [root@test8 ~]# ll /proc/kcore
> -r-------- 1 root root 281474974617600 Jun 5 14:35 /proc/kcore


Thanks.

It looks like the value of 'high_memory' is insane..

Can you get its value on your machine? You can add a printk() or use
systemtap etc..

Amerigo Wang

Jun 5, 2009, 3:58:26 AM
to Tao Ma, Amerigo Wang, linux-...@vger.kernel.org

From /sbin/kexec? I just checked the source code of kexec-tools,
I haven't found that message...

Tao Ma

Jun 5, 2009, 4:58:22 AM
to Amerigo Wang, Andrew Morton, linux-...@vger.kernel.org

Amerigo Wang wrote:
> On Fri, Jun 05, 2009 at 02:59:46PM +0800, Tao Ma wrote:
>>
>> Andrew Morton wrote:
>>> On Fri, 05 Jun 2009 12:03:52 +0800 Tao Ma <tao...@oracle.com> wrote:
>>>
>>>> Hi list,
>>>> In 2.6.30-rc8, /proc/kcore in x86_64's size is unreasonable
>>>> large to be 281474974617600.
>>>> While in a x86 box, it is 931131392 which looks sane.
>>>>
>>>> [root@test8 ~]# ll /proc/kcore
>>>> -r-------- 1 root root 281474974617600 Jun 5 11:15 /proc/kcore
>>>>
>>>> [root@ocfs2-test9 ~]$ ll /proc/kcore
>>>> -r-------- 1 root root 931131392 Jun 5 11:58 /proc/kcore
>>>>
>>>> I just noticed this when kexec fails in "Can't find kernel text map
>>>> area from kcore".
>>>>
>>>> Is there something wrong?
>>>>
>>> fs/proc/kcore.c hasn't changed since October last year. Was 2.6.29 OK?
>>> Earlier kernels?
>> with 2.6.29, ls shows the same output.
>> [root@test8 ~]# ll /proc/kcore
>> -r-------- 1 root root 281474974617600 Jun 5 14:35 /proc/kcore
>
>
> Thanks.
>
> It looks like the value of 'high_memory' is insane..
>
> Can you get its value on your machine? You can add a printk() or use
> systemtap etc..

Just did that.
Also a strange number.
high memory 18446612137615818752.

Regards,
Tao

Tao Ma

Jun 5, 2009, 5:01:27 AM
to Amerigo Wang, linux-...@vger.kernel.org

No, it is there.
See kexec-tools-1.101-reloc-update.patch.
src rpm is kexec-tools-1.101-194.4.el5.src.rpm. So it is a patch from el5.

Regards,
Tao

Andrew Morton

Jun 5, 2009, 5:16:25 AM
to Américo Wang, Tao Ma, linux-...@vger.kernel.org, Ingo Molnar, Yinghai Lu, Andi Kleen
On Fri, 5 Jun 2009 17:09:54 +0800 Américo Wang <xiyou.w...@gmail.com> wrote:

> On Fri, Jun 5, 2009 at 4:57 PM, Tao Ma<tao...@oracle.com> wrote:
> >
> >
> > Amerigo Wang wrote:
> >>
> >> On Fri, Jun 05, 2009 at 02:59:46PM +0800, Tao Ma wrote:
> >>>
> >>> Andrew Morton wrote:
> >>>>
> >>>> On Fri, 05 Jun 2009 12:03:52 +0800 Tao Ma <tao...@oracle.com> wrote:
> >>>>
> >>>>> Hi list,
> >>>>> In 2.6.30-rc8, /proc/kcore in x86_64's size is unreasonable large
> >>>>> to be 281474974617600.
> >>>>> While in a x86 box, it is 931131392 which looks sane.
> >>>>>
> >>>>> [root@test8 ~]# ll /proc/kcore
> >>>>> -r-------- 1 root root 281474974617600 Jun 5 11:15 /proc/kcore
> >>>>>
> >>>>> [root@ocfs2-test9 ~]$ ll /proc/kcore
> >>>>> -r-------- 1 root root 931131392 Jun 5 11:58 /proc/kcore
> >>>>>
> >>>>> I just noticed this when kexec fails in "Can't find kernel text map
> >>>>> area from kcore".
> >>>>>
> >>>>> Is there something wrong?
> >>>>>
> >>>> fs/proc/kcore.c hasn't changed since October last year. Was 2.6.29 OK?
> >>>> Earlier kernels?
> >>>
> >>> with 2.6.29, ls shows the same output.
> >>> [root@test8 ~]# ll /proc/kcore
> >>> -r-------- 1 root root 281474974617600 Jun 5 14:35 /proc/kcore
> >>
> >>
> >> Thanks.
> >>
> >> It looks like the value of 'high_memory' is insane..
> >> Can you get its value on your machine? You can add a printk() or use
> >> systemtap etc..
> >
> > Just did that.
> > Also a strange number.
> > high memory 18446612137615818752.
> >

(top-posting repaired)

> Add some Cc: to x86 people. :)
>
> Yinghai?
>

Please send the boot logs: dmesg -s 1000000 > foo

Américo Wang

Jun 5, 2009, 5:17:45 AM
to Tao Ma, Andrew Morton, linux-...@vger.kernel.org, Ingo Molnar, Yinghai Lu, Andi Kleen
Add some Cc: to x86 people. :)

Yinghai?

Amerigo Wang

Jun 5, 2009, 5:18:25 AM
to Tao Ma, Amerigo Wang, linux-...@vger.kernel.org
On Fri, Jun 05, 2009 at 05:01:00PM +0800, Tao Ma wrote:
>>>>>>> I just noticed this when kexec fails in "Can't find kernel
>>>>>>> text map area from kcore".
>>>>>>>
>>>>>>> Is there something wrong?
>>>>>> It looks like that error message is from userspace?
>>>>> I just started kdump and get the error message.
>>>> IIRC, kdump should use /proc/vmcore, instead of /proc/kcore...
>>>> nothing related.
>>> in el5, when start kdump service it will do something like
>>> /sbin/kexec --args-linux -p '--command-line=ro root=LABEL=/ rhgb
>>> quiet irqpoll maxcpus=1'
>>> --initrd=/boot/initrd-2.6.18-53.el5kdump.img
>>> /boot/vmlinuz-2.6.18-53.el5
>>>
>>> And the error message is from there.
>>
>> From /sbin/kexec? I just checked the source code of kexec-tools,
>> I haven't found that message...
> No, it is there.
> See kexec-tools-1.101-reloc-update.patch.
> src rpm is kexec-tools-1.101-194.4.el5.src.rpm. So it is a patch from el5.

Oh, I used the original source code without any extra patches.

Thanks for your reply.

Tao Ma

Jun 5, 2009, 5:33:40 AM
to Andrew Morton, Américo Wang, linux-...@vger.kernel.org, Ingo Molnar, Yinghai Lu, Andi Kleen

attached.

Thanks.
Tao

foo

Amerigo Wang

Jun 5, 2009, 5:49:57 AM
to Tao Ma, Andrew Morton, Américo Wang, linux-...@vger.kernel.org, Ingo Molnar, Yinghai Lu, Andi Kleen
On Fri, Jun 05, 2009 at 05:30:49PM +0800, Tao Ma wrote:
>>
>> Please send the boot logs: dmesg -s 1000000 > foo
> attached.

>#######high memory 18446612137615818752, size_t 18446612137615818752
>#######kcore size 5301604352, PAGE_OFFSET 0, PAGE_SIZE 4096


These two lines must be added by yourself...

What?!
How can PAGE_OFFSET be 0??
Can you show us these two printk() you just added?

And, the size of kcore is not the crazy number in the subject...
This one is much saner..

Tao Ma

Jun 5, 2009, 10:28:36 AM
to Amerigo Wang, Andrew Morton, linux-...@vger.kernel.org, Ingo Molnar, Yinghai Lu, Andi Kleen

Amerigo Wang wrote:
> On Fri, Jun 05, 2009 at 05:30:49PM +0800, Tao Ma wrote:
>>> Please send the boot logs: dmesg -s 1000000 > foo
>> attached.
>
>> #######high memory 18446612137615818752, size_t 18446612137615818752
>> #######kcore size 5301604352, PAGE_OFFSET 0, PAGE_SIZE 4096
>
>
> These two lines must be added by yourself...
>
> What?!
> How can PAGE_OFFSET be 0??
> Can you show us these two printk() you just added?
>
> And, the size of kcore is not the crazy number in the subject...
> This one is much saner..

Sorry, I used the wrong printk. The correct output is:

#######high memory 18446612137615818752, size_t 18446612137615818752
#######kcore size 5301604352, PAGE_OFFSET 18446612132314218496, PAGE_SIZE 4096

The printk is attached.

Thanks.
Tao

printk.diff

Yinghai Lu

Jun 5, 2009, 1:51:29 PM
to Tao Ma, Amerigo Wang, Andrew Morton, linux-...@vger.kernel.org, Ingo Molnar, Andi Kleen

%lx should be used.

Also, does your compiler not like

high_memory = (void *)__va(max_pfn * PAGE_SIZE - 1) + 1;

in setup.c?

YH
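
As a side note on the %lx suggestion: the decimal values printed earlier in the thread are much easier to interpret in hex. The following is only a sketch that uses Python as a calculator; the constants are copied from the printk output quoted above, and the helper name as_hex is made up for illustration.

```python
# Values copied from the printk output quoted earlier in this thread.
HIGH_MEMORY_DEC = 18446612137615818752
PAGE_OFFSET_DEC = 18446612132314218496
PAGE_SIZE = 4096

def as_hex(value):
    """Format a 64-bit value roughly the way printk's %lx would (no 0x prefix)."""
    return format(value & 0xFFFFFFFFFFFFFFFF, "x")

print(as_hex(HIGH_MEMORY_DEC))  # ffff88013c000000
print(as_hex(PAGE_OFFSET_DEC))  # ffff880000000000

# The size formula quoted in the thread:
#   (size_t)high_memory - PAGE_OFFSET + PAGE_SIZE
print(HIGH_MEMORY_DEC - PAGE_OFFSET_DEC + PAGE_SIZE)  # 5301604352
```

Seen in hex, both numbers are ordinary x86_64 direct-mapping addresses, and their difference plus one page gives the "kcore size" value from the same printk.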

Tao Ma

Jun 6, 2009, 10:39:28 AM
to Yinghai Lu, Amerigo Wang, Andrew Morton, linux-...@vger.kernel.org, Ingo Molnar, Andi Kleen

Yinghai Lu wrote:
> Tao Ma wrote:
>>
>> Amerigo Wang wrote:
>>> On Fri, Jun 05, 2009 at 05:30:49PM +0800, Tao Ma wrote:
>>>>> Please send the boot logs: dmesg -s 1000000 > foo
>>>> attached.
>>>> #######high memory 18446612137615818752, size_t 18446612137615818752
>>>> #######kcore size 5301604352, PAGE_OFFSET 0, PAGE_SIZE 4096
>>>
>>> These two lines must be added by yourself...
>>>
>>> What?!
>>> How can PAGE_OFFSET be 0??
>>> Can you show us these two printk() you just added?
>>>
>>> And, the size of kcore is not the crazy number in the subject...
>>> This one is much saner..
>> Sorry, I used the wrong printk. the correct one is:
>> #######high memory 18446612137615818752, size_t 18446612137615818752
>> #######kcore size 5301604352, PAGE_OFFSET 18446612132314218496,
>> PAGE_SIZE 4096
>>
>
> %lx should be used.
>
> also you compiler doesn't like
>
> high_memory = (void *)__va(max_pfn * PAGE_SIZE - 1) + 1;
>
> in setup.c?
Sorry for my poor English, but what do you mean?

I just printk in the setup.c and the result is

@@@@high_momory ffff88013c000000

and my gcc version is:
gcc (GCC) 4.1.2 20070626 (Red Hat 4.1.2-14)

Thanks.
Tao

Yinghai Lu

Jun 6, 2009, 6:21:59 PM
to Tao Ma, Amerigo Wang, Andrew Morton, linux-...@vger.kernel.org, Ingo Molnar, Andi Kleen

So the value printed out is right.

YH

Amerigo Wang

Jun 7, 2009, 9:50:30 PM
to Yinghai Lu, Tao Ma, Amerigo Wang, Andrew Morton, linux-...@vger.kernel.org, Ingo Molnar, Andi Kleen

Yeah.

Tao, can you reproduce the number mentioned in the subject??

Thanks.

Tao Ma

Jun 8, 2009, 2:04:25 AM
to Amerigo Wang, Yinghai Lu, Andrew Morton, linux-...@vger.kernel.org, Ingo Molnar, Andi Kleen

Sorry for the delay.

But the result is the same and I don't think it should be changed by my
printk.

Regards,
Tao

Américo Wang

Jun 8, 2009, 2:41:24 AM
to Tao Ma, Yinghai Lu, Andrew Morton, linux-...@vger.kernel.org, Ingo Molnar, Andi Kleen

Yes?
Your printk() shows kcore size is: 5301604352, and in your subject it is
281474974617600...

Or did they happen at the same time?

Tao Ma

Jun 8, 2009, 4:02:47 AM
to Américo Wang, Yinghai Lu, Andrew Morton, linux-...@vger.kernel.org, Ingo Molnar, Andi Kleen

yes. the same box and the same linux version.
A bit strange.

[taoma@ocfs2-test2 ~]$ dmesg|grep "high memory"
high memory ffff88013c000000, size 5301604352
[taoma@ocfs2-test2 ~]$ ll /proc/kcore
-r-------- 1 root root 281474974617600 Jun 8 15:20 /proc/kcore


Regards,
Tao

Américo Wang

Jun 8, 2009, 8:43:47 PM
to Tao Ma, Andrew Morton, linux-...@vger.kernel.org, Eric W. Biederman, Alexey Dobriyan
On Mon, Jun 8, 2009 at 4:00 PM, Tao Ma<tao...@oracle.com> wrote:
>>>
>>> But the result is the same
>>
>> Yes?
>> Your printk() shows kcore size is: 5301604352, and in your subject it is
>> 281474974617600...
>>
>> Or they happened in the same time?
>
> yes. the same box and the same linux version.
> A bit strange.
>
> [taoma@ocfs2-test2 ~]$ dmesg|grep "high memory"
> high memory ffff88013c000000, size 5301604352
> [taoma@ocfs2-test2 ~]$ ll /proc/kcore
> -r-------- 1 root root 281474974617600 Jun  8 15:20 /proc/kcore

Really weird...
They should be the same. This means we have some problem in our procfs.

And, we have no problem on i386, I, myself, even can't reproduce this on my
x86_64 box...

Drop Cc to x86 people, add some Cc to proc people. :)

Eric, Alexey, any ideas?

Tao, would you like to send us your .config? Thanks.

Eric W. Biederman

Jun 9, 2009, 12:10:34 AM
to Américo Wang, Tao Ma, Andrew Morton, linux-...@vger.kernel.org, Alexey Dobriyan
Américo Wang <xiyou.w...@gmail.com> writes:

> On Mon, Jun 8, 2009 at 4:00 PM, Tao Ma<tao...@oracle.com> wrote:
>>>>
>>>> But the result is the same
>>>
>>> Yes?
>>> Your printk() shows kcore size is: 5301604352, and in your subject it is
>>> 281474974617600...
>>>
>>> Or they happened in the same time?
>>
>> yes. the same box and the same linux version.
>> A bit strange.
>>
>> [taoma@ocfs2-test2 ~]$ dmesg|grep "high memory"
>> high memory ffff88013c000000, size 5301604352
>> [taoma@ocfs2-test2 ~]$ ll /proc/kcore
>> -r-------- 1 root root 281474974617600 Jun  8 15:20 /proc/kcore
>
> Really weird...
> They should be the same. This means we have some problem in our procfs.
>
> And, we have no problem on i386, I, myself, even can't reproduce this on my
> x86_64 box...
>
> Drop Cc to x86 people, add some Cc to proc people. :)
>
> Eric, Alexey, any ideas?
>
> Tao, would you like to send us your .config? Thanks.

Short of some strange patch applied I would guess that a non-sense /proc/kcore
size is related to a kernel memory stomp, stepping on the high_memory variable.

Eric

Amerigo Wang

Jun 11, 2009, 1:07:15 AM
to Eric W. Biederman, Américo Wang, Tao Ma, Andrew Morton, linux-...@vger.kernel.org, Alexey Dobriyan
On Mon, Jun 08, 2009 at 09:10:10PM -0700, Eric W. Biederman wrote:
>Américo Wang <xiyou.w...@gmail.com> writes:
>
>> On Mon, Jun 8, 2009 at 4:00 PM, Tao Ma<tao...@oracle.com> wrote:
>>>>>
>>>>> But the result is the same
>>>>
>>>> Yes?
>>>> Your printk() shows kcore size is: 5301604352, and in your subject it is
>>>> 281474974617600...
>>>>
>>>> Or they happened in the same time?
>>>
>>> yes. the same box and the same linux version.
>>> A bit strange.
>>>
>>> [taoma@ocfs2-test2 ~]$ dmesg|grep "high memory"
>>> high memory ffff88013c000000, size 5301604352
>>> [taoma@ocfs2-test2 ~]$ ll /proc/kcore
>>> -r-------- 1 root root 281474974617600 Jun  8 15:20 /proc/kcore
>>
>> Really weird...
>> They should be the same. This means we have some problem in our procfs.
>>
>> And, we have no problem on i386, I, myself, even can't reproduce this on my
>> x86_64 box...
>>
>> Drop Cc to x86 people, add some Cc to proc people. :)
>>
>> Eric, Alexey, any ideas?
>>
>> Tao, would you like to send us your .config? Thanks.
>
>Short of some strange patch applied I would guess that a non-sense /proc/kcore
>size is related to a kernel memory stomp, stepping on the high_memory variable.

Hello, Eric.

I see the problem now. I think the documentation of /proc/kcore
is wrong: the size of kcore can be more than the size of physical
memory, because it also covers the kernel modules area, which
sits above the mapping of physical memory; see arch/x86/mm/init_64.c.

What do you think?

Thanks!

Eric W. Biederman

Jun 11, 2009, 10:12:35 AM
to Amerigo Wang, Tao Ma, Andrew Morton, linux-...@vger.kernel.org, Alexey Dobriyan
Amerigo Wang <xiyou.w...@gmail.com> writes:

I think that doesn't make any sense.

I was reading the code.

I smell a nasty problem somewhere.

Eric

Tao Ma

Jun 12, 2009, 3:55:27 AM
to Eric W. Biederman, Amerigo Wang, Andrew Morton, linux-...@vger.kernel.org, Alexey Dobriyan
Hi all,
sorry for the delay. I am occupied by other stuff these days.

I just tried, and the strange thing is that two identical boxes (Dell Optiplex
745) with the 2.6.29 kernel give different output. One is normal and one is
wrong. So I am totally puzzled now.

So Eric may be right (there is a memory stomp), but it only shows up sometimes.

Regards,
Tao

Amerigo Wang

Jun 13, 2009, 12:07:58 AM
to Eric W. Biederman, Amerigo Wang, Tao Ma, Andrew Morton, linux-...@vger.kernel.org, Alexey Dobriyan

Fix wrong /proc/kcore size on x86_64.

x86_64 uses the __va() macro to calculate the virtual address passed to kclist_add()
but decodes it with its own macro kc_vaddr_to_offset(). This is wrong.

Also, according to Documentation/x86/x86_64/mm.txt, kc_vaddr_to_offset()
is wrong too.

So just remove them, use the generic macro.

BTW, the man page for /proc/kcore is wrong, its size can be more than
the physical memory size, because it also contains memory area of
vmalloc(), vsyscall etc...

Reported-by: Tao Ma <tao...@oracle.com>
Signed-off-by: WANG Cong <amw...@redhat.com>
Cc: Eric W. Biederman <ebie...@xmission.com>

---
diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index abde308..cdbfd1d 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -163,12 +163,6 @@ extern void cleanup_highmap(void);
#define PAGE_AGP PAGE_KERNEL_NOCACHE
#define HAVE_PAGE_AGP 1

-/* fs/proc/kcore.c */
-#define kc_vaddr_to_offset(v) ((v) & __VIRTUAL_MASK)
-#define kc_offset_to_vaddr(o) \
- (((o) & (1UL << (__VIRTUAL_MASK_SHIFT - 1))) \
- ? ((o) | ~__VIRTUAL_MASK) \
- : (o))

#define __HAVE_ARCH_PTE_SAME
#endif /* !__ASSEMBLY__ */
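
To make the effect of removing these macros concrete, here is a rough Python model of the two address/offset mappings. It is a sketch under stated assumptions, not kernel code: __VIRTUAL_MASK_SHIFT is assumed to be 48, PAGE_OFFSET is taken from the printk output earlier in the thread, the generic fallback is assumed to be a plain subtraction or addition of PAGE_OFFSET, and the vsyscall address is an assumed example input. The Python function names are invented for illustration.

```python
# Assumed constants (not stated in the patch itself).
VIRTUAL_MASK_SHIFT = 48                        # assumed __VIRTUAL_MASK_SHIFT
VIRTUAL_MASK = (1 << VIRTUAL_MASK_SHIFT) - 1   # models __VIRTUAL_MASK
PAGE_OFFSET = 0xFFFF880000000000               # from the printk output in this thread
U64 = (1 << 64) - 1                            # emulate 64-bit unsigned wraparound

def arch_vaddr_to_offset(v):
    """Models the removed x86_64 macro: keep only the low 48 bits."""
    return v & VIRTUAL_MASK

def arch_offset_to_vaddr(o):
    """Models the removed x86_64 macro: sign-extend from bit 47."""
    if o & (1 << (VIRTUAL_MASK_SHIFT - 1)):
        return (o | ~VIRTUAL_MASK) & U64
    return o

def generic_vaddr_to_offset(v):
    """Assumed generic fallback: subtract PAGE_OFFSET."""
    return (v - PAGE_OFFSET) & U64

def generic_offset_to_vaddr(o):
    """Assumed generic fallback: add PAGE_OFFSET back."""
    return (o + PAGE_OFFSET) & U64

vsyscall = 0xFFFFFFFFFF600000  # assumed example address high in the kernel map
print(hex(arch_vaddr_to_offset(vsyscall)))     # 0xffffff600000
print(hex(generic_vaddr_to_offset(vsyscall)))  # 0x77ffff600000
```

Both mappings round-trip correctly; they differ only in how large the resulting file offsets are, which is what ends up showing in the reported file size.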

Eric W. Biederman

Jun 13, 2009, 12:21:30 AM
to Amerigo Wang, Tao Ma, Andrew Morton, linux-...@vger.kernel.org, Alexey Dobriyan
Amerigo Wang <xiyou.w...@gmail.com> writes:

> Fix wrong /proc/kcore size on x86_64.

How does that change anything?

> x86_64 uses __va() macro to caculate the virtual address passed to kclist_add()
> but decodes it with its own macro kc_vadd_to_offset(). This is wrong.
>
> Also, according to Documentation/x86/x86_64/mm.txt, kc_vaddr_to_offset()
> is wrong too.
>
> So just remove them, use the generic macro.
>
> BTW, the man page for /proc/kcore is wrong, its size can be more than
> the physical memory size, because it also contains memory area of
> vmalloc(), vsyscall etc...

The set of offsets that are usable sure.

However the size from stat is:
proc_root_kcore->size = (size_t)high_memory - PAGE_OFFSET + PAGE_SIZE;

Which can not be different than the physical memory size.

Amerigo Wang

Jun 14, 2009, 10:12:59 PM
to Eric W. Biederman, Amerigo Wang, Tao Ma, Andrew Morton, linux-...@vger.kernel.org, Alexey Dobriyan
On Fri, Jun 12, 2009 at 09:20:50PM -0700, Eric W. Biederman wrote:
>Amerigo Wang <xiyou.w...@gmail.com> writes:
>
>> Fix wrong /proc/kcore size on x86_64.
>
>How does that change anything?

Please check the description below.

>
>> x86_64 uses __va() macro to caculate the virtual address passed to kclist_add()
>> but decodes it with its own macro kc_vadd_to_offset(). This is wrong.
>>
>> Also, according to Documentation/x86/x86_64/mm.txt, kc_vaddr_to_offset()
>> is wrong too.
>>
>> So just remove them, use the generic macro.
>>
>> BTW, the man page for /proc/kcore is wrong, its size can be more than
>> the physical memory size, because it also contains memory area of
>> vmalloc(), vsyscall etc...
>
>The set of offsets that are usable sure.

We have generic kc_vaddr_to_offset() etc. in fs/proc/kcore.c.


>
>However the size from stat is:
> proc_root_kcore->size = (size_t)high_memory - PAGE_OFFSET + PAGE_SIZE;
>
>Which can not be different than the physical memory size.

I never said they were different; of course they are the same. But what Tao
reported is the wrong size after a read operation. Please try the following:

#ls -l /proc/kcore
#readelf -l /proc/kcore
#ls -l /proc/kcore

You will find the *second* 'ls -l /proc/kcore' reports a size much more
than the physical mem size.

And you will notice the difference of it after this patch applied.

Tao Ma

Jun 15, 2009, 2:00:06 AM
to Amerigo Wang, Eric W. Biederman, Andrew Morton, linux-...@vger.kernel.org, Alexey Dobriyan
Hi Amerigo,
I just patched my kernel and tested it.
The bad news is that although the number has changed, it isn't right
either.

Here is the output.
[root@test3 ~]# ls -l /proc/kcore
-r-------- 1 root root 131941393240064 Jun 15 13:39 /proc/kcore

But your patch does change something. I just tried your commands on
another box, which shows the right value after a reboot. The result is:

[root@test8 ~]# ls -l /proc/kcore
-r-------- 1 root root 5301604352 Jun 15 13:35 /proc/kcore
[root@test8 ~]# readelf -l /proc/kcore

Elf file type is CORE (Core file)
Entry point 0x0
There are 6 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  NOTE           0x0000000000000190 0x0000000000000000 0x0000000000000000
                 0x00000000000008bc 0x0000000000000000         0
  LOAD           0x000077ffff601000 0xffffffffff600000 0x0000000000000000
                 0x0000000000800000 0x0000000000800000  RWE    1000
  LOAD           0x000077ffa0001000 0xffffffffa0000000 0x0000000000000000
                 0x000000005f000000 0x000000005f000000  RWE    1000
  LOAD           0x000077ff8200a000 0xffffffff82009000 0x0000000000000000
                 0x00000000006ceb50 0x00000000006ceb50  RWE    1000
  LOAD           0x00003a0000001000 0xffffc20000000000 0x0000000000000000
                 0x00001fffffffffff 0x00001fffffffffff  RWE    1000
  LOAD           0x0000000000001000 0xffff880000000000 0x0000000000000000
                 0x000000013c000000 0x000000013c000000  RWE    1000
[root@test8 ~]# ls -l /proc/kcore
-r-------- 1 root root 131941393240064 Jun 15 13:35 /proc/kcore

So you see, the second "ls -l" will show the wrong value.

Regards,
Tao
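
The numbers in this readelf output can be cross-checked by hand. The sketch below uses Python as a calculator and rests on two assumptions that the output itself does not state: that the patched kernel maps a virtual address to a file offset by subtracting PAGE_OFFSET, and that the ELF headers occupy one 4096-byte page at the start of the file. The segment list is copied from the LOAD entries above; the function names are hypothetical.

```python
PAGE_OFFSET = 0xFFFF880000000000
ELF_HEADER_LEN = 0x1000  # assumption: ELF headers rounded up to one page

# (virtual address, size) of each LOAD segment, copied from the readelf output.
SEGMENTS = [
    (0xFFFFFFFFFF600000, 0x0000000000800000),
    (0xFFFFFFFFA0000000, 0x000000005F000000),
    (0xFFFFFFFF82009000, 0x00000000006CEB50),
    (0xFFFFC20000000000, 0x00001FFFFFFFFFFF),
    (0xFFFF880000000000, 0x000000013C000000),
]

def file_offset(vaddr):
    """File offset of a segment under the assumed subtract-PAGE_OFFSET mapping."""
    return (vaddr - PAGE_OFFSET) + ELF_HEADER_LEN

def kcore_size(segments):
    """Largest mapped end offset plus the header length."""
    return max(vaddr + size - PAGE_OFFSET for vaddr, size in segments) + ELF_HEADER_LEN

for vaddr, _ in SEGMENTS:
    print(hex(vaddr), "->", hex(file_offset(vaddr)))

print(kcore_size(SEGMENTS))  # 131941393240064
```

Under these assumptions every computed offset matches the Offset column, and the computed size equals the value reported by the second "ls -l", so that number is the end of the highest segment rather than a random value.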

Tao Ma

Jun 15, 2009, 4:35:44 AM
to Amerigo Wang, Eric W. Biederman, Andrew Morton, linux-...@vger.kernel.org, Alexey Dobriyan
Hi Amerigo,

The wrong number I mean is 131941393240064.

So do you think


[root@test3 ~]# ls -l /proc/kcore
-r-------- 1 root root 131941393240064 Jun 15 13:39 /proc/kcore

is better than

[taoma@test2 ~]$ ll /proc/kcore
-r-------- 1 root root 281474974617600 Jun 15 15:20 /proc/kcore
?

I don't think so.

Actually the right result should look like

[root@test8 ~]# ls -l /proc/kcore
-r-------- 1 root root 5301604352 Jun 15 13:35 /proc/kcore

And with your patch I can't get this number.

Regards,
Tao

Amerigo Wang wrote:


> On Mon, Jun 15, 2009 at 01:59:08PM +0800, Tao Ma wrote:
>> Hi Amerigo,
>> Just patched my kernel and tested.
>> The bad news is that although the number is changed, but it isn't right
>> either.
>

> Thanks for testing.
>
> What do you mean by saying it isn't right? You think it is wrong only because
> it is more than phy mem size?
>
> Again, the document of /proc/kcore is wrong, it _can_ be more than phy mem size.
>
> Regards.

Amerigo Wang

Jun 15, 2009, 4:58:53 AM
to Tao Ma, Amerigo Wang, Eric W. Biederman, Andrew Morton, linux-...@vger.kernel.org, Alexey Dobriyan

Please don't top-post.

On Mon, Jun 15, 2009 at 04:34:27PM +0800, Tao Ma wrote:
> Hi Amerigo,
>

> The wrong number I mean is 131941393240064.
>
> So do you think
> [root@test3 ~]# ls -l /proc/kcore
> -r-------- 1 root root 131941393240064 Jun 15 13:39 /proc/kcore
>
> is better than
>
> [taoma@test2 ~]$ ll /proc/kcore
> -r-------- 1 root root 281474974617600 Jun 15 15:20 /proc/kcore
> ?

Yes, the former *is* what I can expect.


>
> I don't think so.
>
> Actually the right result should look like
>
> [root@test8 ~]# ls -l /proc/kcore
> -r-------- 1 root root 5301604352 Jun 15 13:35 /proc/kcore
>
> And with your patch I can't get this number.

Of course not.

Again and again, kernel modules and vsyscall are also included
into kcore, unless doing this is wrong you will never get the
number you mentioned above, because they sit above the
phy mem map on x86_64.

Please read the code, I don't want to explain again and again.

Eric W. Biederman

Jun 15, 2009, 6:09:01 AM
to Amerigo Wang, Tao Ma, Andrew Morton, linux-...@vger.kernel.org, Alexey Dobriyan
Amerigo Wang <xiyou.w...@gmail.com> writes:

> Fix wrong /proc/kcore size on x86_64.
>
> x86_64 uses __va() macro to caculate the virtual address passed to kclist_add()
> but decodes it with its own macro kc_vadd_to_offset(). This is wrong.

Ok. I finally understand what is going on here, and no kc_vaddr_to_offset
is not wrong when applied to a virtual address. In fact I expect the current
definition makes things a bit more predictable.

And yes, kclist_add must be given a virtual address.

> Also, according to Documentation/x86/x86_64/mm.txt, kc_vaddr_to_offset()
> is wrong too.

How so? The file offset is a number space that is different from both
physical and virtual addresses.

> So just remove them, use the generic macro.

I think a case can be made either way. In practice neither answer
gives us a dense offset space on x86_64 so I think I prefer the
current definition which sets or clears the high bits as opposed
to something that mangles the address more.

> BTW, the man page for /proc/kcore is wrong, its size can be more than
> the physical memory size, because it also contains memory area of
> vmalloc(), vsyscall etc...

Yes, the man page is wrong. The kcore code is also misleading as it
uses two entirely different definitions of size (aka the maximum
offset accepted).

It uses get_kcore_size and (size_t)high_memory - PAGE_OFFSET + PAGE_SIZE;
The second definition being bogus as it has nothing to do with which
offsets are accepted.

Eric
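
The two definitions of size mentioned here can be compared numerically for the unpatched kernel. This is only a sketch: it assumes a 48-bit __VIRTUAL_MASK, a one-page ELF header, and that the highest region end is the vsyscall area shown in the readelf output earlier in the thread. The function names are made up; only the formulas follow the thread.

```python
VIRTUAL_MASK = (1 << 48) - 1       # assumed __VIRTUAL_MASK (48-bit virtual addresses)
PAGE_OFFSET = 0xFFFF880000000000   # from the printk output in this thread
HIGH_MEMORY = 0xFFFF88013C000000   # from the printk output in this thread
PAGE_SIZE = 4096
ELF_HEADER_LEN = 0x1000            # assumption: headers rounded up to one page

# End of the highest region, taken from the readelf output (vsyscall area).
HIGHEST_END = 0xFFFFFFFFFF600000 + 0x800000

def size_from_high_memory():
    """First definition: (size_t)high_memory - PAGE_OFFSET + PAGE_SIZE."""
    return HIGH_MEMORY - PAGE_OFFSET + PAGE_SIZE

def size_from_offsets():
    """Second definition: largest masked end offset plus the header length."""
    return (HIGHEST_END & VIRTUAL_MASK) + ELF_HEADER_LEN

print(size_from_high_memory())  # 5301604352
print(size_from_offsets())      # 281474974617600
```

With these inputs the first formula gives the "sane looking" number and the second gives the number from the subject line, which is consistent with the size changing once the file has been read.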

Eric W. Biederman

Jun 15, 2009, 6:11:21 AM
to Tao Ma, Amerigo Wang, Andrew Morton, linux-...@vger.kernel.org, Alexey Dobriyan
Tao Ma <tao...@oracle.com> writes:

> Hi Amerigo,
>
> The wrong number I mean is 131941393240064.
>
> So do you think
> [root@test3 ~]# ls -l /proc/kcore
> -r-------- 1 root root 131941393240064 Jun 15 13:39 /proc/kcore
>
> is better than
>
> [taoma@test2 ~]$ ll /proc/kcore
> -r-------- 1 root root 281474974617600 Jun 15 15:20 /proc/kcore
> ?
>
> I don't think so.
>
> Actually the right result should look like
>
> [root@test8 ~]# ls -l /proc/kcore
> -r-------- 1 root root 5301604352 Jun 15 13:35 /proc/kcore
>
> And with your patch I can't get this number.

Actually that value is the bug. It has absolutely nothing
to do with the offsets that are valid within /proc/kcore.

Why do you prefer the smaller number?

Eric

TaoMa

Jun 15, 2009, 10:10:51 AM
to ebie...@xmission.com, Amerigo Wang, Andrew Morton, linux-...@vger.kernel.org, Alexey Dobriyan
ebie...@xmission.com wrote:
> Tao Ma <tao...@oracle.com> writes:
>
>
>> Hi Amerigo,
>>
>> The wrong number I mean is 131941393240064.
>>
>> So do you think
>> [root@test3 ~]# ls -l /proc/kcore
>> -r-------- 1 root root 131941393240064 Jun 15 13:39 /proc/kcore
>>
>> is better than
>>
>> [taoma@test2 ~]$ ll /proc/kcore
>> -r-------- 1 root root 281474974617600 Jun 15 15:20 /proc/kcore
>> ?
>>
>> I don't think so.
>>
>> Actually the right result should look like
>>
>> [root@test8 ~]# ls -l /proc/kcore
>> -r-------- 1 root root 5301604352 Jun 15 13:35 /proc/kcore
>>
>> And with your patch I can't get this number.
>>
>
> Actually that value is the bug. It has absolutely nothing
> to do with the offsets that are valid within /proc/kcore.
>
> Why do you prefer the smaller number?
>
Amerigo said in the previous e-mail that "the man page for /proc/kcore
is wrong, its size can be more than the physical memory size, because it
also contains memory area of vmalloc(), vsyscall etc..."

I have 4G of memory, and 5301604352 is just a bit larger than 4G and looks
sane. So I mistakenly assumed that this number was right.

But if it is also a bug, I am willing to test any new patch. ;)

Regards,
Tao

Eric W. Biederman

Jun 15, 2009, 3:48:34 PM
to TaoMa, Amerigo Wang, Andrew Morton, linux-...@vger.kernel.org, Alexey Dobriyan
TaoMa <tao...@oracle.com> writes:

It should also include the 32 Tebibyte range we have for vmalloc. So
a completely dense encoding would be a bit larger than 35184372088832
bytes. You can see that range in your readelf -l output.

Since the encoding is not dense, the size actually comes to 256 TiB,
or roughly 281474976710656 bytes.

> But if it is also a bug, I am willing to test any of the new patch. ;)

Not in the sense that anything could go wrong. Merely in the sense that
we have a contradictory definition. Which causes loads of confusion.

I am wondering if this difference in definition has caused any
applications to fail, or if this just started out as an
observation of an anomaly?

Eric

Tao Ma

Jun 15, 2009, 10:05:46 PM
to ebie...@xmission.com, Amerigo Wang, Andrew Morton, linux-...@vger.kernel.org, Alexey Dobriyan

I first noticed it when my el5 box refused to start kdump service and
kexec said something like "Can't find kernel text map area from kcore".
And then I found this number which looked a bit strange.
I also just have another x86 box and "ls -l /proc/kcore" shows:
-r-------- 1 root root 939528192 Jun 16 10:01 /proc/kcore
So I thought this may be a bug and started this thread.

Anyway, later I found that kexec's problem isn't related to this issue.
So maybe we can leave it as-is.

regards,
Tao

Américo Wang

Jun 16, 2009, 11:29:33 AM
to Eric W. Biederman, Tao Ma, Andrew Morton, linux-...@vger.kernel.org, Alexey Dobriyan
On Mon, Jun 15, 2009 at 6:08 PM, Eric W. Biederman<ebie...@xmission.com> wrote:
> Amerigo Wang <xiyou.w...@gmail.com> writes:
>
>> Fix wrong /proc/kcore size on x86_64.
>>
>> x86_64 uses __va() macro to caculate the virtual address passed to kclist_add()
>> but decodes it with its own macro kc_vadd_to_offset(). This is wrong.
>
> Ok.  I finally understand what is going on here, and no kc_vaddr_to_offset
> is not wrong when applied to a virtual address.  In fact I expect the current
> definition makes things a bit more predictable.
>
> And yes kclist_add is must be given a virtual address
>
>> Also, according to Documentation/x86/x86_64/mm.txt, kc_vaddr_to_offset()
>> is wrong too.
>
> How so?  The file offset is a number space that is different from both
> physical and virtual addresses.

Why? They _do_ have some calculated relations.

>
>> So just remove them, use the generic macro.
>
> I think a case can be made either way.  In practice neither answer
> gives us a dense offset space on x86_64 so I think I prefer the
> current definition which sets or clears the high bits as opposed
> to something that mangles the address more.
>

I am trying to dig more... There must be something wrong there.

>
> It uses get_kcore_size and (size_t)high_memory - PAGE_OFFSET + PAGE_SIZE;
> The second definition being bogus as it has nothing to do with which
> offsets are accepted.

Agreed. Maybe we can just remove the second one and update the doc?

Eric W. Biederman

Jun 16, 2009, 3:28:02 PM
to Américo Wang, Tao Ma, Andrew Morton, linux-...@vger.kernel.org, Alexey Dobriyan
Américo Wang <xiyou.w...@gmail.com> writes:

> On Mon, Jun 15, 2009 at 6:08 PM, Eric W. Biederman<ebie...@xmission.com> wrote:
>> Amerigo Wang <xiyou.w...@gmail.com> writes:
>>
>>> Fix wrong /proc/kcore size on x86_64.
>>>
>>> x86_64 uses __va() macro to caculate the virtual address passed to kclist_add()
>>> but decodes it with its own macro kc_vadd_to_offset(). This is wrong.
>>
>> Ok.  I finally understand what is going on here, and no kc_vaddr_to_offset
>> is not wrong when applied to a virtual address.  In fact I expect the current
>> definition makes things a bit more predictable.
>>
>> And yes kclist_add is must be given a virtual address
>>
>>> Also, according to Documentation/x86/x86_64/mm.txt, kc_vaddr_to_offset()
>>> is wrong too.
>>
>> How so?  The file offset is a number space that is different from both
>> physical and virtual addresses.
>
> Why? They _do_ have some calculated relations.

Sure. The offset is what you give to read/write. The virtual
addresses are what the kernel uses. In general in a core file they
are only tied together with the elf header. We do something a little
more pragmatic in the kernel.

>>> So just remove them, use the generic macro.
>>
>> I think a case can be made either way.  In practice neither answer
>> gives us a dense offset space on x86_64 so I think I prefer the
>> current definition which sets or clears the high bits as opposed
>> to something that mangles the address more.
>>
>
> I am trying to dig more... There must be something wrong there.

How so?

>> It uses get_kcore_size and (size_t)high_memory - PAGE_OFFSET + PAGE_SIZE;
>> The second definition being bogus as it has nothing to do with which
>> offsets are accepted.
>
> Agreed. Maybe we can just remove the second one and update the doc?

Yes. It isn't critical but reducing confusion is good.
Do you want to cook up the patch for that?

Eric

Amerigo Wang

Jun 17, 2009, 10:59:07 PM
to Eric W. Biederman, Américo Wang, Tao Ma, Andrew Morton, linux-...@vger.kernel.org, Alexey Dobriyan
On Tue, Jun 16, 2009 at 12:27:36PM -0700, Eric W. Biederman wrote:
>Américo Wang <xiyou.w...@gmail.com> writes:
>>> I think a case can be made either way.  In practice neither answer
>>> gives us a dense offset space on x86_64 so I think I prefer the
>>> current definition which sets or clears the high bits as opposed
>>> to something that mangles the address more.
>>>
>>
>> I am trying to dig more... There must be something wrong there.
>
>How so?

See what you will get for kc_vaddr_to_offset(__va(0))?
It is supposed to be 0.


>
>>> It uses get_kcore_size and (size_t)high_memory - PAGE_OFFSET + PAGE_SIZE;
>>> The second definition being bogus as it has nothing to do with which
>>> offsets are accepted.
>>
>> Agreed. Maybe we can just remove the second one and update the doc?
>
>Yes. It isn't critical but reducing confusion is good.
>Do you want to cook up the patch for that?

Yes, I am cooking a patch set... will send them when ready.

Eric W. Biederman

Jun 17, 2009, 11:37:57 PM
to Amerigo Wang, Tao Ma, Andrew Morton, linux-...@vger.kernel.org, Alexey Dobriyan
Amerigo Wang <xiyou.w...@gmail.com> writes:

> On Tue, Jun 16, 2009 at 12:27:36PM -0700, Eric W. Biederman wrote:
>>Américo Wang <xiyou.w...@gmail.com> writes:
>>>> I think a case can be made either way.  In practice neither answer
>>>> gives us a dense offset space on x86_64 so I think I prefer the
>>>> current definition which sets or clears the high bits as opposed
>>>> to something that mangles the address more.
>>>>
>>>
>>> I am trying to dig more... There must be something wrong there.
>>
>>How so?
>
> See what you will get for kc_vaddr_to_offset(__va(0))?
> It is supposed to be 0.

I see: 0x0000880000001000. That extra 0x1000 looks suspicious.

It MUST NOT be 0. That is where the ELF header lives in the file.

> Yes, I am cooking a patch set... will send them when ready.

Then I will leave it to you.

Eric

Amerigo Wang

Jun 18, 2009, 12:39:01 AM
to Eric W. Biederman, Amerigo Wang, Tao Ma, Andrew Morton, linux-...@vger.kernel.org, Alexey Dobriyan
On Wed, Jun 17, 2009 at 08:37:40PM -0700, Eric W. Biederman wrote:
>Amerigo Wang <xiyou.w...@gmail.com> writes:
>
>> On Tue, Jun 16, 2009 at 12:27:36PM -0700, Eric W. Biederman wrote:
>>>Américo Wang <xiyou.w...@gmail.com> writes:
>>>>> I think a case can be made either way.  In practice neither answer
>>>>> gives us a dense offset space on x86_64 so I think I prefer the
>>>>> current definition which sets or clears the high bits as opposed
>>>>> to something that mangles the address more.
>>>>>
>>>>
>>>> I am trying to dig more... There must be something wrong there.
>>>
>>>How so?
>>
>> See what you will get for kc_vaddr_to_offset(__va(0))?
>> It is supposed to be 0.
>
>I see: 0x0000880000001000 That extra 0x1000 looks suspicous.


Huh? Isn't it 0x0000880000000000?

>
>It MUST NOT be 0. That is where the ELF header lives in the file.

Of course I knew this.

Just read the code:

phdr->p_offset = kc_vaddr_to_offset(m->addr) + dataoff;

So it should be 0, 'dataoff' is there...

Eric W. Biederman

Jun 18, 2009, 1:41:49 AM
to Amerigo Wang, Tao Ma, Andrew Morton, linux-...@vger.kernel.org, Alexey Dobriyan
Amerigo Wang <xiyou.w...@gmail.com> writes:

> On Wed, Jun 17, 2009 at 08:37:40PM -0700, Eric W. Biederman wrote:
>>Amerigo Wang <xiyou.w...@gmail.com> writes:
>>
>>> On Tue, Jun 16, 2009 at 12:27:36PM -0700, Eric W. Biederman wrote:
>>>>Américo Wang <xiyou.w...@gmail.com> writes:
>>>>>> I think a case can be made either way.  In practice neither answer
>>>>>> gives us a dense offset space on x86_64 so I think I prefer the
>>>>>> current definition which sets or clears the high bits as opposed
>>>>>> to something that mangles the address more.
>>>>>>
>>>>>
>>>>> I am trying to dig more... There must be something wrong there.
>>>>
>>>>How so?
>>>
>>> See what you will get for kc_vaddr_to_offset(__va(0))?
>>> It is supposed to be 0.
>>
>>I see: 0x0000880000001000 That extra 0x1000 looks suspicous.
>
>
> huh? 0x0000880000000000 not?
>
>>
>>It MUST NOT be 0. That is where the ELF header lives in the file.
>
> Of course I knew this.
>
> Just read the code:
>
> phdr->p_offset = kc_vaddr_to_offset(m->addr) + dataoff;
>
> So it should be 0, 'dataoff' is there...

Sorry. The naming then is horrible. It is really
kc_vaddr_to_something_like_the_offset.

I still don't see the need for a flat offset space.

I can see a real point of only having a single kc_vaddr_to_offset
function. Instead of the 3 in existence.

No point in cluttering the whole world with the oddities of the kcore
code. Especially when it should get cleaned up.

My real point earlier is that kc_vaddr_to_offset and
kc_offset_to_vaddr actually on x86_64 aren't broken. They are just
peculiar. There is some small point to their oddities, in that if
something is in the upper half of the address space (like xen) but
below PAGE_OFFSET you have a chance of accessing it with /proc/kcore.
But that is a very minor benefit.

Eric

Amerigo Wang

Jun 22, 2009, 4:52:29 AM
to Eric W. Biederman, Amerigo Wang, Tao Ma, Andrew Morton, linux-...@vger.kernel.org, Alexey Dobriyan, mtk.ma...@gmail.com
On Wed, Jun 17, 2009 at 10:41:32PM -0700, Eric W. Biederman wrote:
>Amerigo Wang <xiyou.w...@gmail.com> writes:
>>
>> Of course I knew this.
>>
>> Just read the code:
>>
>> phdr->p_offset = kc_vaddr_to_offset(m->addr) + dataoff;
>>
>> So it should be 0, 'dataoff' is there...
>
>Sorry. The naming then is horrible. It is really
>kc_vaddr_to_something_like_the_offset.
>
>I still don't see the need for a flat offset space.
>
>I can see a real point of only having a single kc_vaddr_to_offset
>function. Instead of the 3 in existence.
>
>No point in cluttering the whole world with the oddities of the kcore
>code. Especially when it should get cleaned up.
>
>My real point earlier is that kc_vaddr_to_offset and
>kc_offset_to_vaddr actually on x86_64 aren't broken. They are just
>peculiar. There is some small point to their oddities, in that if
>something is in the upper half of the address space (like xen) but
>below PAGE_OFFSET you have a chance of accessing it with /proc/kcore.
>But that is a very minor benefit.

It looks like Linus fixed this in commit 9063c61fd5cbd.

So I will only fix the rest.

Signed-off-by: WANG Cong <amw...@redhat.com>
Cc: mtk.ma...@gmail.com

---
diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index 59b43a0..eca5201 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -405,9 +405,6 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
static int __init proc_kcore_init(void)
{
proc_root_kcore = proc_create("kcore", S_IRUSR, NULL, &proc_kcore_operations);
- if (proc_root_kcore)
- proc_root_kcore->size =
- (size_t)high_memory - PAGE_OFFSET + PAGE_SIZE;
return 0;
}
module_init(proc_kcore_init);

---
diff --git a/man5/proc.5 b/man5/proc.5
index ed47f70..e31aae4 100644
--- a/man5/proc.5
+++ b/man5/proc.5
@@ -1246,8 +1246,6 @@ kernel
binary, GDB can be used to
examine the current state of any kernel data structures.

-The total length of the file is the size of physical memory (RAM) plus
-4KB.
.TP
.I /proc/kmsg
This file can be used instead of the

Amerigo Wang

Jun 30, 2009, 6:07:16 AM
to Amerigo Wang, Eric W. Biederman, Tao Ma, Andrew Morton, linux-...@vger.kernel.org, Alexey Dobriyan, mtk.ma...@gmail.com

Linus fixed the wrong /proc/kcore size problem in commit 9063c61fd5cbd.

But its size still looks insane, since it never equals the size
of physical memory.

Signed-off-by: WANG Cong <amw...@redhat.com>
Cc: mtk.ma...@gmail.com

(Andrew, could you please just cut off the kernel part from below? :)

Andrew Morton

Jul 1, 2009, 5:48:03 PM
to Amerigo Wang, xiyou.w...@gmail.com, ebie...@xmission.com, tao...@oracle.com, linux-...@vger.kernel.org, adob...@gmail.com, mtk.ma...@gmail.com
On Tue, 30 Jun 2009 18:08:50 +0800
Amerigo Wang <xiyou.w...@gmail.com> wrote:

>
> Linus fixes wrong size of /proc/kcore problem in commit 9063c61fd5cbd.
>
> But its size still looks insane, since it never equals to the size
> of physical memory.

Better changelogs, please!

I think that what you're saying is that the stat.st_size field of the
/proc/kcore inode does not equal the amount of physical memory, and
that you think it should do so?

If that is correct then it would be appropriate to explain what value
the stat.st_size field has before the patch and afterwards. Just
calling it "insane" isn't optimal.

> Signed-off-by: WANG Cong <amw...@redhat.com>
> Cc: mtk.ma...@gmail.com
>
> (Andrew, could you please just cut off the kernel part from below? :)
>
> ---
> diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
> index 59b43a0..eca5201 100644
> --- a/fs/proc/kcore.c
> +++ b/fs/proc/kcore.c
> @@ -405,9 +405,6 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
> static int __init proc_kcore_init(void)
> {
> proc_root_kcore = proc_create("kcore", S_IRUSR, NULL, &proc_kcore_operations);
> - if (proc_root_kcore)
> - proc_root_kcore->size =
> - (size_t)high_memory - PAGE_OFFSET + PAGE_SIZE;
> return 0;
> }
> module_init(proc_kcore_init);

AFAICT this means that proc_root_kcore->size will remain uninitialised
until a process opens and reads from /proc/kcore. So on initial boot
the `ls' output will presumably show a size of zero, and this will
change once /proc/kcore has been read?

If so, should we run get_kcore_size() in proc_kcore_init(), perhaps?

In fact, do we need to run get_kcore_size() more than once per boot?
AFAICT we only run kclist_add() during bootup, so if proc_kcore_init()
is called at the appropriate time, we can permanently cache its result?

In which case get_kcore_size() and kclist_add() can be marked __init.

Maybe that's all wrong - I didn't look terribly closely.

Eric W. Biederman

Jul 1, 2009, 7:25:40 PM
to Andrew Morton, Amerigo Wang, tao...@oracle.com, linux-...@vger.kernel.org, adob...@gmail.com, mtk.ma...@gmail.com
Andrew Morton <ak...@linux-foundation.org> writes:

Which is better than showing a random number of dubious relationship
to the size we normally show. That code is just a maintenance problem.

> If so, should we run get_kcore_size() in proc_kcore_init(), perhaps?
>
> In fact, do we need to run get_kcore_size() more than once per boot?
>
> AFAICT we only run kclist_add() during bootup, so if proc_kcore_init()
> is called at the appropriate time, we can permanently cache its result?
>
> In which case get_kcore_size() and kclist_add() can be marked __init.
>
> Maybe that's all wrong - I didn't look terribly closely.

Memory hot add I expect is the excuse. There is more that could be
done. But this patch is an obvious bit of chipping away nonsense
code.

Eric

Andrew Morton

Jul 1, 2009, 8:13:25 PM
to Eric W. Biederman, Amerigo Wang, tao...@oracle.com, linux-...@vger.kernel.org, adob...@gmail.com, mtk.ma...@gmail.com, Yasunori Goto, KAMEZAWA Hiroyuki
On Wed, 01 Jul 2009 16:25:05 -0700 ebie...@xmission.com (Eric W. Biederman) wrote:

> Andrew Morton <ak...@linux-foundation.org> writes:
>
> >> index 59b43a0..eca5201 100644
> >> --- a/fs/proc/kcore.c
> >> +++ b/fs/proc/kcore.c
> >> @@ -405,9 +405,6 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
> >> static int __init proc_kcore_init(void)
> >> {
> >> proc_root_kcore = proc_create("kcore", S_IRUSR, NULL, &proc_kcore_operations);
> >> - if (proc_root_kcore)
> >> - proc_root_kcore->size =
> >> - (size_t)high_memory - PAGE_OFFSET + PAGE_SIZE;
> >> return 0;
> >> }
> >> module_init(proc_kcore_init);
> >
> > AFAICT this means that proc_root_kcore->size will remain uninitialised
> > until a process opens and reads from /proc/kcore. So on initial boot
> > the `ls' output will presumably show a size of zero, and this will
> > change once /proc/kcore has been read?
>
> Which is better than showing a random number of dubious relationship
> to the size we normally show. That code is just a maintenance problem.

Well it's not just that st_size is wrong before the first read. It's
also wrong after memory hot-add, up until the next read.

> > If so, should we run get_kcore_size() in proc_kcore_init(), perhaps?
> >
> > In fact, do we need to run get_kcore_size() more than once per boot?
> >
> > AFAICT we only run kclist_add() during bootup, so if proc_kcore_init()
> > is called at the appropriate time, we can permanently cache its result?
> >
> > In which case get_kcore_size() and kclist_add() can be marked __init.
> >
> > Maybe that's all wrong - I didn't look terribly closely.
>
> Memory hot add I expect is the excuse. There is more that could be
> done. But this patch is an obvious bit of chipping away nonsense
> code.

We have the infrastructure to get this right, I think:

- run

proc_root_kcore->size = get_kcore_size(...)

within proc_kcore_init()

- register a memory-hotplug notifier and each time memory goes online
or offline, rerun

proc_root_kcore->size = get_kcore_size(...)

- stop running get_kcore_size() within read_kcore().

I suspect that read_kcore() will not behave well if a memory hotplug
operation happens concurrently. But that's a separate problem.

(hopefully cc's some memory-hotplug people)


Or we just leave /proc/kcore's st_size at zero. It's a pretty hopeless
exercise trying to get this "right", as nobody can safely _use_ that
size - it can be wrong as soon as the caller has read from it.

KAMEZAWA Hiroyuki

Jul 1, 2009, 8:43:54 PM
to Andrew Morton, Eric W. Biederman, Amerigo Wang, tao...@oracle.com, linux-...@vger.kernel.org, adob...@gmail.com, mtk.ma...@gmail.com, Yasunori Goto
On Wed, 1 Jul 2009 17:12:49 -0700
Andrew Morton <ak...@linux-foundation.org> wrote:

> On Wed, 01 Jul 2009 16:25:05 -0700 ebie...@xmission.com (Eric W. Biederman) wrote:
> > Which is better than showing a random number of dubious relationship
> > to the size we normally show. That code is just a maintenance problem.
>
> Well it's not just that st_size is wrong before the first read. It's
> also wrong after memory hot-add, up until the next read.
>

And I found kclist_add() is not called at memory hotplug...


> > > If so, should we run get_kcore_size() in proc_kcore_init(), perhaps?
> > >
> > > In fact, do we need to run get_kcore_size() more than once per boot?
> > >
> > > AFAICT we only run kclist_add() during bootup, so if proc_kcore_init()
> > > is called at the appropriate time, we can permanently cache its result?
> > >
> > > In which case get_kcore_size() and kclist_add() can be marked __init.
> > >
> > > Maybe that's all wrong - I didn't look terribly closely.
> >
> > Memory hot add I expect is the excuse. There is more that could be
> > done. But this patch is an obvious bit of chipping away nonsense
> > code.
>
> We have the infrastructure to get this right, I think:
>
> - run
>
> proc_root_kcore->size = get_kcore_size(...)
>
> within proc_kcore_init()
>

yes, seems sane.


> - register a memory-hotplug notifier and each time memory goes online
> or offline, rerun
>
> proc_root_kcore->size = get_kcore_size(...)
>

yes. and we need kclist_add() under memory hotplug.


> - stop running get_kcore_size() within read_kcore().
>
> I suspect that read_kcore() will not behave well if a memory hotplug
> operation happens concurrently. But that's a separate problem.
>
> (hopefully cc's some memory-hotplug people)
>

Maybe no problem. I don't think people do memory hotplug while reading
/proc/kcore. (It sounds like modifying a coredump while investigating it.)

Thanks,
-Kame

Amerigo Wang

Jul 2, 2009, 5:26:27 AM
to Andrew Morton, Amerigo Wang, ebie...@xmission.com, tao...@oracle.com, linux-...@vger.kernel.org, adob...@gmail.com, mtk.ma...@gmail.com
On Wed, Jul 01, 2009 at 02:47:42PM -0700, Andrew Morton wrote:
>On Tue, 30 Jun 2009 18:08:50 +0800
>Amerigo Wang <xiyou.w...@gmail.com> wrote:
>
>>
>> Linus fixes wrong size of /proc/kcore problem in commit 9063c61fd5cbd.
>>
>> But its size still looks insane, since it never equals to the size
>> of physical memory.
>
>Better changelogs, please!
>
>I think that what you're saying is that the stat.st_size field of the
>/proc/kcore inode does not equal the amount of physical memory, and
>that you think it should do so?


No, it is expected to be more than the amount of physical memory.


>
>If that is correct then it would be appropriate to explain what value
>the stat.st_size field has before the patch and afterwards. Just
>calling it "insane" isn't optimal.

Yup!

My bad, I mentioned this in the earlier email in this thread,
but I forgot to put it here. Sorry for this!

>
>AFAICT this means that proc_root_kcore->size will remain uninitialised
>until a process opens and reads from /proc/kcore. So on initial boot
>the `ls' output will presumably show a size of zero, and this will
>change once /proc/kcore has been read?

Yes, exactly...

>
>If so, should we run get_kcore_size() in proc_kcore_init(), perhaps?

Yes, we can, but I think leaving this to behave like the rest of the /proc
files is better.

>
>In fact, do we need to run get_kcore_size() more than once per boot?
>AFAICT we only run kclist_add() during bootup, so if proc_kcore_init()
>is called at the appropriate time, we can permanently cache its result?
>
>In which case get_kcore_size() and kclist_add() can be marked __init.

A quick grep shows kclist_add() can be marked as __init, but I don't
know if anyone will use it in other parts in the future.

I prefer leaving it as it is.

Andrew Morton

Jul 17, 2009, 6:30:39 PM
to KAMEZAWA Hiroyuki, ebie...@xmission.com, xiyou.w...@gmail.com, tao...@oracle.com, linux-...@vger.kernel.org, adob...@gmail.com, mtk.ma...@gmail.com, y-g...@jp.fujitsu.com

I think I'm about to forget about the above issues. If everyone else
does the same, they won't get addressed. Oh well.

And I still need to decide whether
kcore-fix-proc-kcores-statst_size.patch fixes things up sufficiently
well to justify merging it.

KAMEZAWA Hiroyuki

Jul 20, 2009, 10:11:47 PM
to Andrew Morton, ebie...@xmission.com, xiyou.w...@gmail.com, tao...@oracle.com, linux-...@vger.kernel.org, adob...@gmail.com, mtk.ma...@gmail.com, y-g...@jp.fujitsu.com

Hmm, I read fs/proc/kcore.c and now feel the following.

- kclist doesn't handle memory holes, so it will never be the "correct" size.
For example, arch/x86/mm/init.c calls kclist_add() as follows:

kclist_add(&kcore_vmalloc, (void *)VMALLOC_START,
VMALLOC_END-VMALLOC_START);

Wow, extremely big anyway.

- Then, yes. The size of /proc/kcore is pointless. Anyway, what's important is
not the "size", but the ELF phdr of kcore.

To this patch,
Acked-by: KAMEZAWA Hiroyuki <kamezaw...@jp.fujitsu.com>

BTW, I'd like to look into handling the physical memory range for /proc/kcore.
IMHO, a kclist for physical memory is not necessary... it's handled by /proc/iomem.
"kdump" uses this information and it's properly maintained by memory hotplug.
I'd like to try some patches and make kclist_add() for physical memory cleaner,
later.

Thanks,
-Kame

KAMEZAWA Hiroyuki

Jul 21, 2009, 4:49:01 AM
to KAMEZAWA Hiroyuki, Andrew Morton, ebie...@xmission.com, xiyou.w...@gmail.com, tao...@oracle.com, linux-...@vger.kernel.org, adob...@gmail.com, mtk.ma...@gmail.com, y-g...@jp.fujitsu.com
On Tue, 21 Jul 2009 11:09:24 +0900
KAMEZAWA Hiroyuki <kamezaw...@jp.fujitsu.com> wrote:

> On Fri, 17 Jul 2009 15:29:55 -0700
> Andrew Morton <ak...@linux-foundation.org> wrote:
>
> > On Thu, 2 Jul 2009 09:41:38 +0900
> > KAMEZAWA Hiroyuki <kamezaw...@jp.fujitsu.com> wrote:
> > I think I'm about to forget about the above issues. If everyone else
> > does the same, they won't get addressed. Oh well.
> >
> > And I still need to decide whether
> > kcore-fix-proc-kcores-statst_size.patch fixes things up sufficiently
> > well to justify merging it.
> >
>
> Hmm, I read fs/proc/kcore.c and feel followng, now.
>
> - kclist doesn't handle memory hole, then, it will never be "correct" size.
> For example, arch/x86/mm/init.c calls kclist_add() as following
>
> 715 kclist_add(&kcore_vmalloc, (void *)VMALLOC_START,
> 716 VMALLOC_END-VMALLOC_START);
>
> Wow, extremely big anyway.
>
> - Then, yes. Size of /proc/kcode is pointless. Anyway, what's important is
> not "size", but ELF phder of kcore.
>
> To this patch,
> Acked-by: KAMEZAWA Hiroyuki <kamezaw...@jp.fujitsu.com>
>

Ah... BTW, if the size is set to 0,
% objdump -x /proc/kcore
returns immediately because objdump sees the size as 0, but readelf seems to
work well.

KAMEZAWA Hiroyuki

Jul 21, 2009, 5:38:30 AM
to Andrew Morton, ebie...@xmission.com, xiyou.w...@gmail.com, tao...@oracle.com, linux-...@vger.kernel.org, adob...@gmail.com, mtk.ma...@gmail.com, y-g...@jp.fujitsu.com

Now, /proc/kcore is built on kclist information which is constructed at boot.
This kclist includes physical memory range information but is not updated at
memory hotplug. And this information tends to include big memory holes.

On the other hand, /proc/iomem includes all physical memory information as
"System RAM"; this is updated properly and kdump uses it, IIUC.
(I hope all architectures store the necessary information...)

This patch tries to build the kclist for physical memory (direct map) from
/proc/iomem info. It's refreshed at open("/proc/kcore",) if necessary.

This is just an RFC. Any comments are welcome.

[1/3] ... clean up kclist handling.
[2/3] ... clean up kclist_add()
[3/3] ... use /proc/iomem information for /proc/kcore.


I can only test x86-64.

Thanks,
-Kame

KAMEZAWA Hiroyuki

Jul 21, 2009, 5:39:59 AM
to KAMEZAWA Hiroyuki, Andrew Morton, ebie...@xmission.com, xiyou.w...@gmail.com, tao...@oracle.com, linux-...@vger.kernel.org, adob...@gmail.com, mtk.ma...@gmail.com, y-g...@jp.fujitsu.com
From: KAMEZAWA Hiroyuki <kamezaw...@jp.fujitsu.com>

/proc/kcore uses its own list handling code. But it's better to use
the generic list code.

Also, read_kcore() uses "m" to refer to both
- a kcore entry
- a vmalloc entry
which have different types.
This patch renames "m" to "vms" for vmalloc(), avoiding confusion.

No changes in logic. Just cleanup.

Signed-off-by: KAMEZAWA Hiroyuki <kamezaw...@jp.fujitsu.com>
---
fs/proc/kcore.c | 41 ++++++++++++++++++++++-------------------
include/linux/proc_fs.h | 2 +-
2 files changed, 23 insertions(+), 20 deletions(-)

Index: mmotm-2.6.31-Jul16/fs/proc/kcore.c
===================================================================
--- mmotm-2.6.31-Jul16.orig/fs/proc/kcore.c
+++ mmotm-2.6.31-Jul16/fs/proc/kcore.c
@@ -20,6 +20,7 @@
#include <linux/init.h>
#include <asm/uaccess.h>
#include <asm/io.h>
+#include <linux/list.h>

#define CORE_STR "CORE"

@@ -57,7 +58,7 @@ struct memelfnote
void *data;
};

-static struct kcore_list *kclist;
+static LIST_HEAD(kclist_head);
static DEFINE_RWLOCK(kclist_lock);

void
@@ -67,8 +68,7 @@ kclist_add(struct kcore_list *new, void
new->size = size;

write_lock(&kclist_lock);
- new->next = kclist;
- kclist = new;
+ list_add_tail(&new->list, &kclist_head);
write_unlock(&kclist_lock);
}

@@ -80,7 +80,7 @@ static size_t get_kcore_size(int *nphdr,
*nphdr = 1; /* PT_NOTE */
size = 0;

- for (m=kclist; m; m=m->next) {
+ list_for_each_entry(m, &kclist_head, list) {
try = kc_vaddr_to_offset((size_t)m->addr + m->size);
if (try > size)
size = try;
@@ -192,7 +192,7 @@ static void elf_kcore_store_hdr(char *bu
nhdr->p_align = 0;

/* setup ELF PT_LOAD program header for every area */
- for (m=kclist; m; m=m->next) {
+ list_for_each_entry(m, &kclist_head, list) {
phdr = (struct elf_phdr *) bufp;
bufp += sizeof(struct elf_phdr);
offset += sizeof(struct elf_phdr);
@@ -317,7 +317,7 @@ read_kcore(struct file *file, char __use
struct kcore_list *m;

read_lock(&kclist_lock);
- for (m=kclist; m; m=m->next) {
+ list_for_each_entry(m, &kclist_head, list) {
if (start >= m->addr && start < (m->addr+m->size))
break;
}
@@ -328,7 +328,7 @@ read_kcore(struct file *file, char __use
return -EFAULT;
} else if (is_vmalloc_addr((void *)start)) {
char * elf_buf;
- struct vm_struct *m;
+ struct vm_struct *vms;
unsigned long curstart = start;
unsigned long cursize = tsz;

@@ -337,29 +337,32 @@ read_kcore(struct file *file, char __use
return -ENOMEM;

read_lock(&vmlist_lock);
- for (m=vmlist; m && cursize; m=m->next) {
+ for (vms = vmlist; vms && cursize; vms = vms->next) {
unsigned long vmstart;
unsigned long vmsize;
- unsigned long msize = m->size - PAGE_SIZE;
+ unsigned long msize = vms->size - PAGE_SIZE;
+ unsigned long curend, vmend;

- if (((unsigned long)m->addr + msize) <
+ if (((unsigned long)vms->addr + msize) <
curstart)
continue;
- if ((unsigned long)m->addr > (curstart +
+ if ((unsigned long)vms->addr > (curstart +
cursize))
break;
- vmstart = (curstart < (unsigned long)m->addr ?
- (unsigned long)m->addr : curstart);
- if (((unsigned long)m->addr + msize) >
- (curstart + cursize))
- vmsize = curstart + cursize - vmstart;
+ if (curstart < (unsigned long)vms->addr)
+ vmstart = (unsigned long)vms->addr;
else
- vmsize = (unsigned long)m->addr +
- msize - vmstart;
+ vmstart = curstart;
+ curend = curstart + cursize;
+ vmend = (unsigned long)vms->addr + msize;
+ if (vmend > curend)
+ vmsize = curend - vmstart;
+ else
+ vmsize = vmend - vmstart;
curstart = vmstart + vmsize;
cursize -= vmsize;
/* don't dump ioremap'd stuff! (TA) */
- if (m->flags & VM_IOREMAP)
+ if (vms->flags & VM_IOREMAP)
continue;
memcpy(elf_buf + (vmstart - start),
(char *)vmstart, vmsize);
Index: mmotm-2.6.31-Jul16/include/linux/proc_fs.h
===================================================================
--- mmotm-2.6.31-Jul16.orig/include/linux/proc_fs.h
+++ mmotm-2.6.31-Jul16/include/linux/proc_fs.h
@@ -79,7 +79,7 @@ struct proc_dir_entry {
};

struct kcore_list {
- struct kcore_list *next;
+ struct list_head list;
unsigned long addr;
size_t size;
};

KAMEZAWA Hiroyuki

Jul 21, 2009, 5:41:31 AM
to KAMEZAWA Hiroyuki, Andrew Morton, ebie...@xmission.com, xiyou.w...@gmail.com, tao...@oracle.com, linux-...@vger.kernel.org, adob...@gmail.com, mtk.ma...@gmail.com, y-g...@jp.fujitsu.com
From: KAMEZAWA Hiroyuki <kamezaw...@jp.fujitsu.com>

Currently, kclist_add() takes only a start address and a size as its
arguments. To make kclist dynamically reconfigurable, it is necessary
to know which kclists are for System RAM and which are not.

This patch adds the following kclist types:
KCORE_RAM
KCORE_VMALLOC
KCORE_TEXT
KCORE_OTHER

The region for KCORE_RAM will be dynamically updated at memory hotplug.

Signed-off-by: KAMEZAWA Hiroyuki <kamezaw...@jp.fujitsu.com>
---

arch/ia64/mm/init.c | 7 ++++---
arch/mips/mm/init.c | 7 ++++---
arch/powerpc/mm/init_32.c | 4 ++--
arch/powerpc/mm/init_64.c | 5 +++--
arch/sh/mm/init.c | 4 ++--
arch/x86/mm/init_32.c | 4 ++--
arch/x86/mm/init_64.c | 11 ++++++-----
fs/proc/kcore.c | 3 ++-
include/linux/proc_fs.h | 13 +++++++++++--
9 files changed, 36 insertions(+), 22 deletions(-)

Index: mmotm-2.6.31-Jul16/include/linux/proc_fs.h
===================================================================
--- mmotm-2.6.31-Jul16.orig/include/linux/proc_fs.h
+++ mmotm-2.6.31-Jul16/include/linux/proc_fs.h
@@ -78,10 +78,18 @@ struct proc_dir_entry {
struct list_head pde_openers; /* who did ->open, but not ->release */
};

+enum kcore_type {
+ KCORE_TEXT,
+ KCORE_VMALLOC,
+ KCORE_RAM,
+ KCORE_OTHER,
+};
+
struct kcore_list {
struct list_head list;
unsigned long addr;
size_t size;
+ int type;
};

struct vmcore {
@@ -233,11 +241,12 @@ static inline void dup_mm_exe_file(struc
#endif /* CONFIG_PROC_FS */

#if !defined(CONFIG_PROC_KCORE)
-static inline void kclist_add(struct kcore_list *new, void *addr, size_t size)
+static inline void
+kclist_add(struct kcore_list *new, void *addr, size_t size, int type)
{
}
#else
-extern void kclist_add(struct kcore_list *, void *, size_t);
+extern void kclist_add(struct kcore_list *, void *, size_t, int type);
#endif

union proc_op {
Index: mmotm-2.6.31-Jul16/arch/ia64/mm/init.c
===================================================================
--- mmotm-2.6.31-Jul16.orig/arch/ia64/mm/init.c
+++ mmotm-2.6.31-Jul16/arch/ia64/mm/init.c
@@ -639,9 +639,10 @@ mem_init (void)

high_memory = __va(max_low_pfn * PAGE_SIZE);

- kclist_add(&kcore_mem, __va(0), max_low_pfn * PAGE_SIZE);
- kclist_add(&kcore_vmem, (void *)VMALLOC_START, VMALLOC_END-VMALLOC_START);
- kclist_add(&kcore_kernel, _stext, _end - _stext);
+ kclist_add(&kcore_mem, __va(0), max_low_pfn * PAGE_SIZE, KCORE_RAM);
+ kclist_add(&kcore_vmem, (void *)VMALLOC_START,
+ VMALLOC_END-VMALLOC_START, KCORE_VMALLOC);
+ kclist_add(&kcore_kernel, _stext, _end - _stext, KCORE_TEXT);

for_each_online_pgdat(pgdat)
if (pgdat->bdata->node_bootmem_map)
Index: mmotm-2.6.31-Jul16/arch/mips/mm/init.c
===================================================================
--- mmotm-2.6.31-Jul16.orig/arch/mips/mm/init.c
+++ mmotm-2.6.31-Jul16/arch/mips/mm/init.c
@@ -409,11 +409,12 @@ void __init mem_init(void)
if ((unsigned long) &_text > (unsigned long) CKSEG0)
/* The -4 is a hack so that user tools don't have to handle
the overflow. */
- kclist_add(&kcore_kseg0, (void *) CKSEG0, 0x80000000 - 4);
+ kclist_add(&kcore_kseg0, (void *) CKSEG0,
+ 0x80000000 - 4, KCORE_TEXT);
#endif
- kclist_add(&kcore_mem, __va(0), max_low_pfn << PAGE_SHIFT);
+ kclist_add(&kcore_mem, __va(0), max_low_pfn << PAGE_SHIFT, KCORE_RAM);
kclist_add(&kcore_vmalloc, (void *)VMALLOC_START,
- VMALLOC_END-VMALLOC_START);
+ VMALLOC_END-VMALLOC_START, KCORE_VMALLOC);

printk(KERN_INFO "Memory: %luk/%luk available (%ldk kernel code, "
"%ldk reserved, %ldk data, %ldk init, %ldk highmem)\n",
Index: mmotm-2.6.31-Jul16/arch/powerpc/mm/init_32.c
===================================================================
--- mmotm-2.6.31-Jul16.orig/arch/powerpc/mm/init_32.c
+++ mmotm-2.6.31-Jul16/arch/powerpc/mm/init_32.c
@@ -270,11 +270,11 @@ static int __init setup_kcore(void)
size);
}

- kclist_add(kcore_mem, __va(base), size);
+ kclist_add(kcore_mem, __va(base), size, KCORE_RAM);
}

kclist_add(&kcore_vmem, (void *)VMALLOC_START,
- VMALLOC_END-VMALLOC_START);
+ VMALLOC_END-VMALLOC_START, KCORE_VMALLOC);

return 0;
}
Index: mmotm-2.6.31-Jul16/arch/powerpc/mm/init_64.c
===================================================================
--- mmotm-2.6.31-Jul16.orig/arch/powerpc/mm/init_64.c
+++ mmotm-2.6.31-Jul16/arch/powerpc/mm/init_64.c
@@ -128,10 +128,11 @@ static int __init setup_kcore(void)
if (!kcore_mem)
panic("%s: kmalloc failed\n", __func__);

- kclist_add(kcore_mem, __va(base), size);
+ kclist_add(kcore_mem, __va(base), size, KCORE_RAM);
}

- kclist_add(&kcore_vmem, (void *)VMALLOC_START, VMALLOC_END-VMALLOC_START);
+ kclist_add(&kcore_vmem, (void *)VMALLOC_START,
+ VMALLOC_END-VMALLOC_START, KCORE_VMALLOC);

return 0;
}
Index: mmotm-2.6.31-Jul16/arch/sh/mm/init.c
===================================================================
--- mmotm-2.6.31-Jul16.orig/arch/sh/mm/init.c
+++ mmotm-2.6.31-Jul16/arch/sh/mm/init.c
@@ -218,9 +218,9 @@ void __init mem_init(void)
datasize = (unsigned long) &_edata - (unsigned long) &_etext;
initsize = (unsigned long) &__init_end - (unsigned long) &__init_begin;

- kclist_add(&kcore_mem, __va(0), max_low_pfn << PAGE_SHIFT);
+ kclist_add(&kcore_mem, __va(0), max_low_pfn << PAGE_SHIFT, KCORE_RAM);
kclist_add(&kcore_vmalloc, (void *)VMALLOC_START,
- VMALLOC_END - VMALLOC_START);
+ VMALLOC_END - VMALLOC_START, KCORE_VMALLOC);

printk(KERN_INFO "Memory: %luk/%luk available (%dk kernel code, "
"%dk data, %dk init)\n",
Index: mmotm-2.6.31-Jul16/arch/x86/mm/init_32.c
===================================================================
--- mmotm-2.6.31-Jul16.orig/arch/x86/mm/init_32.c
+++ mmotm-2.6.31-Jul16/arch/x86/mm/init_32.c
@@ -886,9 +886,9 @@ void __init mem_init(void)
datasize = (unsigned long) &_edata - (unsigned long) &_etext;
initsize = (unsigned long) &__init_end - (unsigned long) &__init_begin;

- kclist_add(&kcore_mem, __va(0), max_low_pfn << PAGE_SHIFT);
+ kclist_add(&kcore_mem, __va(0), max_low_pfn << PAGE_SHIFT, KCORE_RAM);
kclist_add(&kcore_vmalloc, (void *)VMALLOC_START,
- VMALLOC_END-VMALLOC_START);
+ VMALLOC_END-VMALLOC_START, KCORE_VMALLOC);

printk(KERN_INFO "Memory: %luk/%luk available (%dk kernel code, "
"%dk reserved, %dk data, %dk init, %ldk highmem)\n",
Index: mmotm-2.6.31-Jul16/arch/x86/mm/init_64.c
===================================================================
--- mmotm-2.6.31-Jul16.orig/arch/x86/mm/init_64.c
+++ mmotm-2.6.31-Jul16/arch/x86/mm/init_64.c
@@ -677,13 +677,14 @@ void __init mem_init(void)
initsize = (unsigned long) &__init_end - (unsigned long) &__init_begin;

/* Register memory areas for /proc/kcore */
- kclist_add(&kcore_mem, __va(0), max_low_pfn << PAGE_SHIFT);
+ kclist_add(&kcore_mem, __va(0), max_low_pfn << PAGE_SHIFT, KCORE_RAM);
kclist_add(&kcore_vmalloc, (void *)VMALLOC_START,
- VMALLOC_END-VMALLOC_START);
- kclist_add(&kcore_kernel, &_stext, _end - _stext);
- kclist_add(&kcore_modules, (void *)MODULES_VADDR, MODULES_LEN);
+ VMALLOC_END-VMALLOC_START, KCORE_VMALLOC);
+ kclist_add(&kcore_kernel, &_stext, _end - _stext, KCORE_TEXT);
+ kclist_add(&kcore_modules, (void *)MODULES_VADDR, MODULES_LEN,
+ KCORE_OTHER);
kclist_add(&kcore_vsyscall, (void *)VSYSCALL_START,
- VSYSCALL_END - VSYSCALL_START);
+ VSYSCALL_END - VSYSCALL_START, KCORE_OTHER);

printk(KERN_INFO "Memory: %luk/%luk available (%ldk kernel code, "
"%ldk absent, %ldk reserved, %ldk data, %ldk init)\n",


Index: mmotm-2.6.31-Jul16/fs/proc/kcore.c
===================================================================
--- mmotm-2.6.31-Jul16.orig/fs/proc/kcore.c
+++ mmotm-2.6.31-Jul16/fs/proc/kcore.c
@@ -62,10 +62,11 @@ static LIST_HEAD(kclist_head);
static DEFINE_RWLOCK(kclist_lock);

void
-kclist_add(struct kcore_list *new, void *addr, size_t size)
+kclist_add(struct kcore_list *new, void *addr, size_t size, int type)
{
new->addr = (unsigned long)addr;
new->size = size;
+ new->type = type;

write_lock(&kclist_lock);
list_add_tail(&new->list, &kclist_head);
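
To illustrate what the new type field buys, here is a small userspace sketch of a typed region list. It is not the kernel code: `struct region`, `region_add()` and `region_total()` are hypothetical names, a plain singly linked list stands in for `struct list_head`, and locking is left out. The enum values are the ones introduced by this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Same type names as in the patch. */
enum kcore_type {
    KCORE_TEXT,
    KCORE_VMALLOC,
    KCORE_RAM,
    KCORE_OTHER,
};

/* Hypothetical userspace stand-in for struct kcore_list. */
struct region {
    struct region *next;
    unsigned long addr;
    size_t size;
    int type;
};

static struct region *region_head;
static struct region **region_tail = &region_head;

/* Append a caller-allocated entry, recording its type like kclist_add(). */
static void region_add(struct region *new, unsigned long addr,
                       size_t size, int type)
{
    assert(new != NULL);
    new->addr = addr;
    new->size = size;
    new->type = type;
    new->next = NULL;
    *region_tail = new;
    region_tail = &new->next;
}

/* Sum the sizes of all entries of one type, e.g. everything that is RAM. */
static size_t region_total(int type)
{
    const struct region *r;
    size_t total = 0;

    for (r = region_head; r; r = r->next)
        if (r->type == type)
            total += r->size;
    return total;
}
```

Once every entry carries a type, code that walks the list can pick out only the KCORE_RAM entries, which is what a later rebuild at memory hotplug needs.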

KAMEZAWA Hiroyuki

Jul 21, 2009, 5:43:21 AM7/21/09
to KAMEZAWA Hiroyuki, Andrew Morton, ebie...@xmission.com, xiyou.w...@gmail.com, tao...@oracle.com, linux-...@vger.kernel.org, adob...@gmail.com, mtk.ma...@gmail.com, y-g...@jp.fujitsu.com
From: KAMEZAWA Hiroyuki <kamezaw...@jp.fujitsu.com>

For /proc/kcore, each arch registers its memory ranges with kclist_add().
Usually,
- the range of physical memory
- the range of the vmalloc area
- text, etc...
are registered, but the "range of physical memory" has some problems.
It is not updated at memory hotplug and it tends to include
unnecessary memory holes. Now, /proc/iomem (kernel/resource.c)
includes the required physical memory range information and it is
properly updated at memory hotplug. So it is better to avoid
keeping a duplicate copy of this information and to rebuild the
kclist for physical memory based on /proc/iomem.

With this, the per-arch kclist_add() for KCORE_RAM can be dropped.


Signed-off-by: KAMEZAWA Hiroyuki <kamezaw...@jp.fujitsu.com>
---

Index: mmotm-2.6.31-Jul16/fs/proc/kcore.c
===================================================================
--- mmotm-2.6.31-Jul16.orig/fs/proc/kcore.c 2009-07-20 20:44:57.000000000 +0900
+++ mmotm-2.6.31-Jul16/fs/proc/kcore.c 2009-07-20 22:01:52.000000000 +0900
@@ -21,6 +21,9 @@
#include <asm/uaccess.h>
#include <asm/io.h>
#include <linux/list.h>
+#include <linux/ioport.h>
+#include <linux/memory_hotplug.h>
+#include <linux/memory.h>

#define CORE_STR "CORE"

@@ -30,17 +33,6 @@

static struct proc_dir_entry *proc_root_kcore;

-static int open_kcore(struct inode * inode, struct file * filp)
-{
- return capable(CAP_SYS_RAWIO) ? 0 : -EPERM;
-}
-
-static ssize_t read_kcore(struct file *, char __user *, size_t, loff_t *);
-
-static const struct file_operations proc_kcore_operations = {
- .read = read_kcore,
- .open = open_kcore,
-};

#ifndef kc_vaddr_to_offset
#define kc_vaddr_to_offset(v) ((v) - PAGE_OFFSET)
@@ -60,6 +52,7 @@

static LIST_HEAD(kclist_head);
static DEFINE_RWLOCK(kclist_lock);
+static int kcore_need_update;

void
kclist_add(struct kcore_list *new, void *addr, size_t size, int type)
@@ -98,6 +91,104 @@
return size + *elf_buflen;
}

+static void free_kclist_ents(struct list_head *head)
+{
+ struct kcore_list *tmp, *pos;
+
+ list_for_each_entry_safe(pos, tmp, head, list) {
+ list_del(&pos->list);
+ kfree(pos);
+ }
+}
+/*
+ * Replace all KCORE_RAM information with passed list.
+ */
+static void __kcore_update_ram(struct list_head *list)
+{
+ struct kcore_list *tmp, *pos;
+ LIST_HEAD(garbage);
+
+ write_lock(&kclist_lock);
+ if (kcore_need_update) {
+ list_for_each_entry_safe(pos, tmp, &kclist_head, list) {
+ if (pos->type == KCORE_RAM)
+ list_move(&pos->list, &garbage);
+ }
+ list_splice(list, &kclist_head);
+ } else
+ list_splice(list, &garbage);
+ kcore_need_update = 0;
+ write_unlock(&kclist_lock);
+
+ free_kclist_ents(&garbage);
+}
+
+
+#ifdef CONFIG_HIGHMEM
+/*
+ * If no highmem, we can assume [0...max_low_pfn) continuous range of memory
+ * because memory hole is not as big as !HIGHMEM case.
+ * (HIGHMEM is special because part of memory is _invisible_ from the kernel.)
+ */
+static int kcore_update_ram(void)
+{
+ LIST_HEAD(head);
+ struct kcore_list *ent;
+ int ret = 0;
+
+ ent = kmalloc(sizeof(*ent), GFP_KERNEL);
+ if (!ent) {
+ ret = -ENOMEM;
+ return ret;
+ }
+ ent->addr = (unsigned long)__va(0);
+ ent->size = max_low_pfn << PAGE_SHIFT;
+ ent->type = KCORE_RAM;
+ list_add(&ent->list, &head);
+ __kcore_update_ram(&head);
+ return ret;
+}
+
+#else /* !CONFIG_HIGHMEM */
+
+static int
+kclist_add_private(unsigned long pfn, unsigned long nr_pages, void *arg)
+{
+ struct list_head *head = (struct list_head *)arg;
+ struct kcore_list *ent;
+
+ ent = kmalloc(sizeof(*ent), GFP_KERNEL);
+ if (!ent)
+ return -ENOMEM;
+ ent->addr = (unsigned long)__va((pfn << PAGE_SHIFT));
+ ent->size = nr_pages << PAGE_SHIFT;
+ ent->type = KCORE_RAM;
+ list_add(&ent->list, head);
+ return 0;
+}
+
+static int kcore_update_ram(void)
+{
+ int nid, ret;
+ unsigned long end_pfn;
+ LIST_HEAD(head);
+
+ /* Not initialized....update now */
+ /* find out "max pfn" */
+ end_pfn = 0;
+ for_each_node_state(nid, N_HIGH_MEMORY)
+ if (end_pfn < node_end_pfn(nid))
+ end_pfn = node_end_pfn(nid);
+ /* scan 0 to max_pfn */
+ ret = walk_memory_resource(0, end_pfn, &head, kclist_add_private);
+ if (ret) {
+ free_kclist_ents(&head);
+ return -ENOMEM;
+ }
+ __kcore_update_ram(&head);
+ return ret;
+}
+#endif /* CONFIG_HIGHMEM */

/*****************************************************************************/
/*
@@ -271,6 +362,11 @@
read_unlock(&kclist_lock);
return 0;
}
+ /* memory hotplug ?? */
+ if (kcore_need_update) {
+ read_unlock(&kclist_lock);
+ return -EBUSY;
+ }

/* trim buflen to not go beyond EOF */
if (buflen > size - *fpos)
@@ -406,9 +502,42 @@
return acc;
}

+static int open_kcore(struct inode * inode, struct file *filp)
+{
+ if (!capable(CAP_SYS_RAWIO))
+ return -EPERM;
+ if (kcore_need_update)
+ kcore_update_ram();
+ return 0;
+}
+
+
+static const struct file_operations proc_kcore_operations = {
+ .read = read_kcore,
+ .open = open_kcore,
+};
+
+/* just remember that we have to update kcore */
+static int __meminit kcore_callback(struct notifier_block *self,
+ unsigned long action, void *arg)
+{
+ switch (action) {
+ case MEM_ONLINE:
+ case MEM_OFFLINE:
+ write_lock(&kclist_lock);
+ kcore_need_update = 1;
+ write_unlock(&kclist_lock);
+ }
+ return NOTIFY_OK;
+}
+
+
static int __init proc_kcore_init(void)
{
proc_root_kcore = proc_create("kcore", S_IRUSR, NULL, &proc_kcore_operations);
+ kcore_update_ram();
+ hotplug_memory_notifier(kcore_callback, 0);
return 0;
}
module_init(proc_kcore_init);
+
Index: mmotm-2.6.31-Jul16/include/linux/ioport.h
===================================================================
--- mmotm-2.6.31-Jul16.orig/include/linux/ioport.h 2009-07-20 20:44:57.000000000 +0900
+++ mmotm-2.6.31-Jul16/include/linux/ioport.h 2009-07-20 20:45:10.000000000 +0900
@@ -186,5 +186,13 @@
extern int iomem_map_sanity_check(resource_size_t addr, unsigned long size);
extern int iomem_is_exclusive(u64 addr);

+/*
+ * Walk through all SYSTEM_RAM which is registered as resource.
+ * arg is (start_pfn, nr_pages, private_arg_pointer)
+ */
+extern int walk_memory_resource(unsigned long start_pfn,
+ unsigned long nr_pages, void *arg,
+ int (*func)(unsigned long, unsigned long, void *));
+
#endif /* __ASSEMBLY__ */
#endif /* _LINUX_IOPORT_H */
Index: mmotm-2.6.31-Jul16/include/linux/memory_hotplug.h
===================================================================
--- mmotm-2.6.31-Jul16.orig/include/linux/memory_hotplug.h 2009-07-20 20:44:57.000000000 +0900
+++ mmotm-2.6.31-Jul16/include/linux/memory_hotplug.h 2009-07-20 20:45:10.000000000 +0900
@@ -191,13 +191,6 @@

#endif /* ! CONFIG_MEMORY_HOTPLUG */

-/*
- * Walk through all memory which is registered as resource.
- * arg is (start_pfn, nr_pages, private_arg_pointer)
- */
-extern int walk_memory_resource(unsigned long start_pfn,
- unsigned long nr_pages, void *arg,
- int (*func)(unsigned long, unsigned long, void *));

#ifdef CONFIG_MEMORY_HOTREMOVE

Index: mmotm-2.6.31-Jul16/kernel/resource.c
===================================================================
--- mmotm-2.6.31-Jul16.orig/kernel/resource.c 2009-07-20 20:44:57.000000000 +0900
+++ mmotm-2.6.31-Jul16/kernel/resource.c 2009-07-20 20:45:10.000000000 +0900
@@ -234,7 +234,7 @@

EXPORT_SYMBOL(release_resource);

-#if defined(CONFIG_MEMORY_HOTPLUG) && !defined(CONFIG_ARCH_HAS_WALK_MEMORY)
+#if !defined(CONFIG_ARCH_HAS_WALK_MEMORY)
/*
* Finds the lowest memory reosurce exists within [res->start.res->end)
* the caller must specify res->start, res->end, res->flags.
Index: mmotm-2.6.31-Jul16/arch/ia64/mm/init.c
===================================================================
--- mmotm-2.6.31-Jul16.orig/arch/ia64/mm/init.c 2009-07-20 19:29:53.000000000 +0900
+++ mmotm-2.6.31-Jul16/arch/ia64/mm/init.c 2009-07-20 21:20:24.000000000 +0900
@@ -639,7 +639,6 @@

high_memory = __va(max_low_pfn * PAGE_SIZE);

- kclist_add(&kcore_mem, __va(0), max_low_pfn * PAGE_SIZE, KCORE_RAM);
kclist_add(&kcore_vmem, (void *)VMALLOC_START,
VMALLOC_END-VMALLOC_START, KCORE_VMALLOC);
kclist_add(&kcore_kernel, _stext, _end - _stext, KCORE_TEXT);

Index: mmotm-2.6.31-Jul16/arch/mips/mm/init.c
===================================================================
--- mmotm-2.6.31-Jul16.orig/arch/mips/mm/init.c 2009-07-20 19:39:16.000000000 +0900
+++ mmotm-2.6.31-Jul16/arch/mips/mm/init.c 2009-07-20 21:20:55.000000000 +0900
@@ -412,7 +412,6 @@
kclist_add(&kcore_kseg0, (void *) CKSEG0,
0x80000000 - 4, KCORE_TEXT);
#endif
- kclist_add(&kcore_mem, __va(0), max_low_pfn << PAGE_SHIFT, KCORE_RAM);
kclist_add(&kcore_vmalloc, (void *)VMALLOC_START,
VMALLOC_END-VMALLOC_START, KCORE_VMALLOC);

Index: mmotm-2.6.31-Jul16/arch/powerpc/mm/init_32.c
===================================================================
--- mmotm-2.6.31-Jul16.orig/arch/powerpc/mm/init_32.c 2009-07-20 19:41:13.000000000 +0900
+++ mmotm-2.6.31-Jul16/arch/powerpc/mm/init_32.c 2009-07-20 21:21:54.000000000 +0900
@@ -249,30 +249,6 @@

static int __init setup_kcore(void)
{
- int i;
-
- for (i = 0; i < lmb.memory.cnt; i++) {
- unsigned long base;
- unsigned long size;
- struct kcore_list *kcore_mem;
-
- base = lmb.memory.region[i].base;
- size = lmb.memory.region[i].size;
-
- kcore_mem = kmalloc(sizeof(struct kcore_list), GFP_ATOMIC);
- if (!kcore_mem)
- panic("%s: kmalloc failed\n", __func__);
-
- /* must stay under 32 bits */
- if ( 0xfffffffful - (unsigned long)__va(base) < size) {
- size = 0xfffffffful - (unsigned long)(__va(base));
- printk(KERN_DEBUG "setup_kcore: restrict size=%lx\n",
- size);
- }
-
- kclist_add(kcore_mem, __va(base), size, KCORE_RAM);
- }
-
kclist_add(&kcore_vmem, (void *)VMALLOC_START,
VMALLOC_END-VMALLOC_START, KCORE_VMALLOC);

Index: mmotm-2.6.31-Jul16/arch/powerpc/mm/init_64.c
===================================================================
--- mmotm-2.6.31-Jul16.orig/arch/powerpc/mm/init_64.c 2009-07-20 19:42:06.000000000 +0900
+++ mmotm-2.6.31-Jul16/arch/powerpc/mm/init_64.c 2009-07-20 21:22:20.000000000 +0900
@@ -114,23 +114,6 @@

static int __init setup_kcore(void)
{
- int i;
-
- for (i=0; i < lmb.memory.cnt; i++) {
- unsigned long base, size;
- struct kcore_list *kcore_mem;
-
- base = lmb.memory.region[i].base;
- size = lmb.memory.region[i].size;
-
- /* GFP_ATOMIC to avoid might_sleep warnings during boot */
- kcore_mem = kmalloc(sizeof(struct kcore_list), GFP_ATOMIC);
- if (!kcore_mem)
- panic("%s: kmalloc failed\n", __func__);
-
- kclist_add(kcore_mem, __va(base), size, KCORE_RAM);
- }
-
kclist_add(&kcore_vmem, (void *)VMALLOC_START,
VMALLOC_END-VMALLOC_START, KCORE_VMALLOC);

Index: mmotm-2.6.31-Jul16/arch/sh/mm/init.c
===================================================================
--- mmotm-2.6.31-Jul16.orig/arch/sh/mm/init.c 2009-07-20 19:43:19.000000000 +0900
+++ mmotm-2.6.31-Jul16/arch/sh/mm/init.c 2009-07-20 21:22:52.000000000 +0900
@@ -218,7 +218,6 @@
datasize = (unsigned long) &_edata - (unsigned long) &_etext;
initsize = (unsigned long) &__init_end - (unsigned long) &__init_begin;

- kclist_add(&kcore_mem, __va(0), max_low_pfn << PAGE_SHIFT, KCORE_RAM);
kclist_add(&kcore_vmalloc, (void *)VMALLOC_START,
VMALLOC_END - VMALLOC_START, KCORE_VMALLOC);

Index: mmotm-2.6.31-Jul16/arch/x86/mm/init_32.c
===================================================================
--- mmotm-2.6.31-Jul16.orig/arch/x86/mm/init_32.c 2009-07-20 19:44:21.000000000 +0900
+++ mmotm-2.6.31-Jul16/arch/x86/mm/init_32.c 2009-07-20 21:23:36.000000000 +0900
@@ -886,7 +886,6 @@
datasize = (unsigned long) &_edata - (unsigned long) &_etext;
initsize = (unsigned long) &__init_end - (unsigned long) &__init_begin;

- kclist_add(&kcore_mem, __va(0), max_low_pfn << PAGE_SHIFT, KCORE_RAM);
kclist_add(&kcore_vmalloc, (void *)VMALLOC_START,
VMALLOC_END-VMALLOC_START, KCORE_VMALLOC);

Index: mmotm-2.6.31-Jul16/arch/x86/mm/init_64.c
===================================================================
--- mmotm-2.6.31-Jul16.orig/arch/x86/mm/init_64.c 2009-07-20 19:45:45.000000000 +0900
+++ mmotm-2.6.31-Jul16/arch/x86/mm/init_64.c 2009-07-20 21:24:28.000000000 +0900
@@ -677,7 +677,6 @@
initsize = (unsigned long) &__init_end - (unsigned long) &__init_begin;

/* Register memory areas for /proc/kcore */
- kclist_add(&kcore_mem, __va(0), max_low_pfn << PAGE_SHIFT, KCORE_RAM);
kclist_add(&kcore_vmalloc, (void *)VMALLOC_START,
VMALLOC_END-VMALLOC_START, KCORE_VMALLOC);
kclist_add(&kcore_kernel, &_stext, _end - _stext, KCORE_TEXT);
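
The core of the fs/proc/kcore.c change in this patch is "throw away every KCORE_RAM entry and splice in a freshly built list". A rough userspace sketch of that step, assuming hypothetical names (`struct ent`, `ent_new()`, `replace_ram()`, `total()`, and the `SK_*` constants) and a singly linked list instead of `struct list_head`; the kclist_lock handling and the kcore_need_update check are left out:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical type tags for this sketch only. */
enum { SK_RAM, SK_OTHER };

struct ent {
    struct ent *next;
    unsigned long addr;
    unsigned long size;
    int type;
};

/* Allocate one entry and link it in front of next. */
static struct ent *ent_new(unsigned long addr, unsigned long size,
                           int type, struct ent *next)
{
    struct ent *e = malloc(sizeof(*e));

    assert(e != NULL);
    e->addr = addr;
    e->size = size;
    e->type = type;
    e->next = next;
    return e;
}

/* Free every SK_RAM entry in *head, then put the fresh list in front. */
static void replace_ram(struct ent **head, struct ent *fresh)
{
    struct ent **pp = head;

    while (*pp) {
        struct ent *e = *pp;

        if (e->type == SK_RAM) {
            *pp = e->next;      /* unlink the stale RAM entry */
            free(e);
        } else {
            pp = &e->next;      /* keep vmalloc, text, etc. */
        }
    }
    if (fresh) {
        struct ent *last = fresh;

        while (last->next)
            last = last->next;
        last->next = *head;
        *head = fresh;
    }
}

/* Sum the sizes of all entries of one type. */
static unsigned long total(const struct ent *head, int type)
{
    unsigned long sum = 0;

    for (; head; head = head->next)
        if (head->type == type)
            sum += head->size;
    return sum;
}
```

In the patch itself the stale entries are first moved to a private "garbage" list under the write lock and freed only after the lock is dropped; the sketch frees them inline because it has no lock to hold.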

--

Andi Kleen

Jul 21, 2009, 7:30:17 AM7/21/09
to KAMEZAWA Hiroyuki, Andrew Morton, ebie...@xmission.com, xiyou.w...@gmail.com, tao...@oracle.com, linux-...@vger.kernel.org, adob...@gmail.com, mtk.ma...@gmail.com, y-g...@jp.fujitsu.com
KAMEZAWA Hiroyuki <kamezaw...@jp.fujitsu.com> writes:

> Now, /proc/kcore is built on kclist information which is constructed at boot.
> This kclist includes physical memory range information but is not updated at
> memory hotplug. And, this information tends to include big memory holes.
>
> On the other hand, /proc/iomem includes all physical memory information as
> "System RAM" and this is updated properly and kdump uses this, IIUC.
> (I hope all architectures store the necessary information...)
>
> This patch tries to build kclist for physical memory(direct map) on
> /proc/iomem info. It's refreshed at open("/proc/kcore",) if necessary.
>
> This is just a RFC. Any comments are welcome.
>
> [1/3] ... clean up kclist handling.
> [2/3] ... clean up kclist_add()
> [3/3] ... use /proc/iomem information for /proc/kcore.

Great cleanup! Thanks.

The only missing part that we still need is to also include
the kallsyms information, then the core would be even more useful.

-Andi

--
a...@linux.intel.com -- Speaking for myself only.

KAMEZAWA Hiroyuki

Jul 21, 2009, 8:29:57 PM7/21/09
to Andi Kleen, Andrew Morton, ebie...@xmission.com, xiyou.w...@gmail.com, tao...@oracle.com, linux-...@vger.kernel.org, adob...@gmail.com, mtk.ma...@gmail.com, y-g...@jp.fujitsu.com
On Tue, 21 Jul 2009 13:29:57 +0200
Andi Kleen <an...@firstfloor.org> wrote:

> KAMEZAWA Hiroyuki <kamezaw...@jp.fujitsu.com> writes:
>
> > Now, /proc/kcore is built on kclist information which is constructed at boot.
> > This kclist includes physical memory range information but is not updated at
> > memory hotplug. And, this information tends to include big memory holes.
> >
> > On the other hand, /proc/iomem includes all physical memory information as
> > "System RAM" and this is updated properly and kdump uses this, IIUC.
> > (I hope all architectures store the necessary information...)
> >
> > This patch tries to build kclist for physical memory(direct map) on
> > /proc/iomem info. It's refreshed at open("/proc/kcore",) if necessary.
> >
> > This is just a RFC. Any comments are welcome.
> >
> > [1/3] ... clean up kclist handling.
> > [2/3] ... clean up kclist_add()
> > [3/3] ... use /proc/iomem information for /proc/kcore.
>
> Great cleanup! Thanks.
>

Thank you. I'll review this set again and post v2.

> The only missing part that we still need is to also include
> the kallsyms information, then the core would be even more useful.
>

yes.

Thanks,
-Kame
