get_user_pages() still broken in 2.6

Timur Tabi

Sep 28, 2004, 6:40:26 PM

I was hoping that this bug would be fixed in the 2.6 kernels, but
apparently it hasn't been.

Function get_user_pages() is supposed to lock user memory. However,
under extreme memory constraints, the kernel will swap out the "locked"
memory.

I have a test app which does this:

1) Calls our driver, which issues a get_user_pages() call for one page.
2) Calls our driver again to get the physical address of that page (the
driver uses pgd/pmd/pte_offset).
3) Tries to allocate 1GB of memory (this system has 1GB of physical RAM).
4) Tries to get the physical address again.

In step 4, the physical address is usually zero, which means either
pgd_offset or pmd_offset failed. This indicates the page was swapped out.

I don't understand how this bug can continue to exist after all this
time. get_user_pages() is supposed to lock the memory, because drivers
use it for DMA'ing directly into user memory.

--
Timur Tabi
Staff Software Engineer
timur...@ammasso.com

--
Kernelnewbies: Help each other learn about the Linux kernel.
Archive: http://mail.nl.linux.org/kernelnewbies/
FAQ: http://kernelnewbies.org/faq/

Christoph Hellwig

Sep 28, 2004, 7:03:25 PM

On Tue, Sep 28, 2004 at 05:40:26PM -0500, Timur Tabi wrote:
> I was hoping that this bug would be fixed in the 2.6 kernels, but
> apparently it hasn't been.
>
> Function get_user_pages() is supposed to lock user memory. However,
> under extreme memory constraints, the kernel will swap out the "locked"
> memory.
>
> I have a test app which does this:
>
> 1) Calls our driver, which issues a get_user_pages() call for one page.
> 2) Calls our driver again to get the physical address of that page (the
> driver uses pgd/pmd/pte_offset).
> 3) Tries to allocate 1GB of memory (this system has 1GB of physical RAM).
> 4) Tries to get the physical address again.
>
> In step 4, the physical address is usually zero, which means either
> pgd_offset or pmd_offset failed. This indicates the page was swapped out.
>
> I don't understand how this bug can continue to exist after all this
> time. get_user_pages() is supposed to lock the memory, because drivers
> use it for DMA'ing directly into user memory.

get_user_pages locks the page in memory. It doesn't do anything about ptes.
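The distinction can be illustrated with a toy user-space model. Nothing below is kernel code: `toy_page`, `toy_pte`, `toy_pin`, `toy_reclaim` and `toy_walk` are hypothetical names, and the model assumes that get_user_pages() only takes a reference on the page frame, so that memory pressure may still tear down the pte while the frame itself stays where it is.

```c
#include <stdbool.h>

/* Toy stand-in for a physical page frame; NOT the kernel's struct page. */
struct toy_page {
    unsigned long phys;   /* physical address of the frame */
    int refcount;         /* references currently held on the frame */
};

/* Toy stand-in for a page table entry mapping one virtual page. */
struct toy_pte {
    struct toy_page *page; /* frame backing the mapping */
    bool present;          /* false once the mapping has been torn down */
};

/* Model of pinning: take an extra reference and leave the pte alone. */
static struct toy_page *toy_pin(struct toy_pte *pte)
{
    pte->page->refcount++;
    return pte->page;
}

/* Model of reclaim under memory pressure: the mapping is removed and its
 * reference dropped, but the frame is only freed when no reference is left.
 * Returns true if the frame would be freed. */
static bool toy_reclaim(struct toy_pte *pte)
{
    pte->present = false;
    pte->page->refcount--;
    return pte->page->refcount == 0;
}

/* What a page-table walk reports: 0 when the pte is not present. */
static unsigned long toy_walk(const struct toy_pte *pte)
{
    return pte->present ? pte->page->phys : 0;
}
```

In this model, calling toy_pin() and then toy_reclaim() makes toy_walk() return 0 even though the pinned frame still exists at the same physical address. If the real kernel behaves this way, the zero seen in step 4 would say something about the page tables, not about where the page is.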

Dave Hansen

Sep 28, 2004, 7:21:18 PM

On Tue, 2004-09-28 at 16:03, Christoph Hellwig wrote:
> get_user_pages locks the page in memory. It doesn't do anything about ptes.

You probably want mlock(2) to keep the kernel from messing with the ptes
at all. But, you should probably really be thinking about why you're
accessing the page tables at all. I count *ONE* instance in drivers/
where page tables are accessed directly.

-- Dave
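For reference, a minimal user-space sketch of the mlock(2) call mentioned above. The helper name `lock_one_page` is made up for this example, and whether the call succeeds depends on the caller's privileges and resource limits, so the function reports the errno value instead of assuming success.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Allocate one page-aligned page, lock it with mlock(2), touch it, then
 * unlock and free it. Returns 0 on success, or the errno value of the
 * failing call (for example EPERM or ENOMEM when locking is not allowed). */
static int lock_one_page(void)
{
    long page_size = sysconf(_SC_PAGESIZE);
    void *buf = NULL;
    int err;

    if (page_size <= 0)
        return EINVAL;

    err = posix_memalign(&buf, (size_t)page_size, (size_t)page_size);
    if (err != 0)
        return err;

    if (mlock(buf, (size_t)page_size) != 0) {
        err = errno;
        free(buf);
        return err;
    }

    ((volatile char *)buf)[0] = 1;   /* touch the page while it is locked */

    munlock(buf, (size_t)page_size);
    free(buf);
    return 0;
}
```

This only shows the user-space side of the call; it does not address how a driver would obtain physical addresses for DMA.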

Rakesh Jagota

Sep 29, 2004, 12:49:17 AM

Hi all,
I am working on Linux and would like to know whether I can open a file
inside a kernel module without using any application. If so, how will the
files_struct be maintained? Does a kernel module have this struct?

Waiting for any suggestion from the list.

Thanks in advance,
rakesh

Jeff Garzik

Sep 29, 2004, 1:00:01 AM

Rakesh Jagota wrote:
> Hi all,
> I am working on Linux and would like to know whether I can open a file
> inside a kernel module without using any application. If so, how will the
> files_struct be maintained? Does a kernel module have this struct?

Don't do this. It's incompatible with namespaces.

Instead, figure out some way to pass the file contents to the kernel module.

Jeff

Amith

Sep 29, 2004, 1:37:19 AM

hi there,
When your module is used by a process, it runs in that process's context. The process that
used your module has a files_struct, which is updated when you open a file from inside the
kernel. A module doesn't have a files_struct of its own, because it is not a process, and it
doesn't have a task_struct either (it doesn't need one).

cheers,
Amith

PS: Opening a file from inside the kernel is not a good idea anyway.

Rakesh Jagota

Sep 29, 2004, 1:56:30 AM

Hi,
Thanks.

I want to implement a socket from the module. I won't have any user
process running to handle the descriptors coming from the socket. Could you
please tell me how to handle the socket descriptor from the kernel module?

Thanks,
rakesh

Rakesh Jagota

Sep 29, 2004, 1:56:44 AM

Hi Amith,
Thanks a lot for your prompt reply.

I want to implement a socket from the module. I won't have any user
process running to handle the descriptors coming from the socket. Could you
please tell me how to handle the socket descriptor from the kernel module?

Thanks,
rakesh

Rakesh Jagota

Sep 29, 2004, 2:05:06 AM

Hi all,
What is the difference between a process and a kernel module? A process
contains text, data, bss, stack & heap; will all of these be present for a
kernel module also? Or am I asking a silly question?

Is it possible to do in a kernel module whatever we do inside a
process?

Thanks,
rakesh

Dhiman, Gaurav

Sep 29, 2004, 2:17:35 AM

Well, a process is an entity running on top of the kernel in user space,
whereas a module is a service provider that becomes part of the kernel
and is used by user processes through system calls. So a module does not
run as a process; it is used by processes to perform actions in kernel
space. For example, a device driver module in the kernel is called when
a device file is opened, read, written, or otherwise acted upon.

Cheers !!
Gaurav

Amith

Sep 29, 2004, 2:43:30 AM

Rakesh Jagota wrote:

hi,

My comments are inline.

>
> Hi all,
> what is the difference between provcess and kernel module?

A process:
1) Can be executed on its own; it is an executable.
2) Has a life: it runs on the CPU, and has a kernel data structure associated with it, called struct task_struct, in kernel space.
3) Has a process ID.

A module is just object code linked with the kernel when insmod'ed:

1) Can't be executed on its own; it is not an executable.
2) Doesn't have any struct task_struct associated with it, since it is not a process.
3) Can be considered a library that processes use (through an interface); hence it runs in process context (not always, though).

> As process
> contains text, data, bss, stack & heap, Will all this present for kernel
> Module also? asking silly Q?

I suggest you try some simple modules; it would clear up your doubts.

>
> Is it possible to do like whatever we are doing inside the process, can we
> do the same in the kernel module.

More than what a process can do, because your module code executes in kernel mode.
>
> Thanks,
> rakesh
>

cheers,
Amith
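For the process side of the question, a small user-space sketch can collect one address from each of the regions listed (text, data, bss, heap, stack). `sample_segments` and `struct segment_addrs` are names made up for this illustration; the sketch says nothing about how a module is laid out in kernel memory.

```c
#include <stdint.h>
#include <stdlib.h>

static int initialized_global = 42;  /* initialized data: .data */
static int zeroed_global;            /* uninitialized data: .bss */

/* One sample address from each region of a running process. */
struct segment_addrs {
    uintptr_t text;
    uintptr_t data;
    uintptr_t bss;
    uintptr_t heap;
    uintptr_t stack;
};

static struct segment_addrs sample_segments(void)
{
    int local = 0;                    /* lives on the calling thread's stack */
    void *heap_obj = malloc(16);      /* lives on the heap */
    struct segment_addrs a;

    a.text  = (uintptr_t)&sample_segments;
    a.data  = (uintptr_t)&initialized_global;
    a.bss   = (uintptr_t)&zeroed_global;
    a.heap  = (uintptr_t)heap_obj;    /* 0 if the allocation failed */
    a.stack = (uintptr_t)&local;

    free(heap_obj);
    return a;
}
```

Printing the five fields with `%#lx` (after a cast to `unsigned long`) from a small main() shows how far apart the regions sit in the process's address space.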

Brandon Niemczyk

Sep 29, 2004, 3:36:01 AM

On Wed, 29 Sep 2004 11:35:06 +0530, Rakesh Jagota
<j.ra...@gdatech.co.in> wrote:
> Hi all,
> what is the difference between provcess and kernel module? As process
> contains text, data, bss, stack & heap, Will all this present for kernel
> Module also? asking silly Q?

IIUC a kernel module uses the kernel's stack and heap. And because of
this modules which use a lot of stack space can cause some serious
problems.

>
> Is it possible to do like whatever we are doing inside the process, can we
> do the same in the kernel module.
>
> Thanks,
> rakesh
>

--
Brandon Niemczyk
http://bniemczyk.doesntexist.com

Dhiman, Gaurav

Sep 29, 2004, 3:48:42 AM

There is no single kernel stack as such.
When we make a system call, we enter kernel mode and the CPU switches SS and
SP to the process-specific ring 0 stack (this stack is used in kernel space
but belongs to a specific process). So if process P1 makes a system call and,
at the same time, P2 also makes a system call, the two execution threads
will have different stacks in kernel mode. There is no kernel stack as a
whole that is shared by all processes in kernel mode.

Read about it in the Intel architecture docs, which explain the task switch
and stack switching concepts.

Cheers !!
Gaurav
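One way to picture per-process kernel stacks: if each stack is a power-of-two sized, size-aligned block, then the block a stack pointer belongs to can be found by masking off its low bits. The sketch below is plain user-space arithmetic. `TOY_THREAD_SIZE` (8 KB) is an assumed value, and `toy_stack_base` and `toy_same_stack` are hypothetical helper names, not kernel functions.

```c
#include <stdint.h>

/* Assumed stack size: 8 KB, a power of two, with size-aligned stacks. */
#define TOY_THREAD_SIZE 8192UL

/* Lowest address of the stack block that contains sp. */
static uintptr_t toy_stack_base(uintptr_t sp)
{
    return sp & ~(uintptr_t)(TOY_THREAD_SIZE - 1);
}

/* Nonzero when both stack pointers fall inside the same stack block. */
static int toy_same_stack(uintptr_t sp_a, uintptr_t sp_b)
{
    return toy_stack_base(sp_a) == toy_stack_base(sp_b);
}
```

Two stack pointers taken from different processes in kernel mode would land in different blocks under this model, which is the point being made about P1 and P2.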

manish regmi

Sep 29, 2004, 3:49:27 AM

On Wed, 29 Sep 2004 11:35:06 +0530, Rakesh Jagota
<j.ra...@gdatech.co.in> wrote:

> Hi all,
> what is the difference between provcess and kernel module? As process
> contains text, data, bss, stack & heap, Will all this present for kernel
> Module also? asking silly Q?
>
> Is it possible to do like whatever we are doing inside the process, can we
> do the same in the kernel module.
>
> Thanks,
> rakesh
>

hi,
I think we had a good discussion on a similar topic some time ago. I
think the archives would give you a good idea of the concepts.

http://mail.nl.linux.org/kernelnewbies/2004-06/msg00332.html

regards manish

Dhiman, Gaurav

Sep 29, 2004, 5:01:30 AM

ftp://download.intel.com/design/PentiumII/manuals/24319202.pdf

The above link will download the specific Intel architecture doc that I was talking about (check chapters 4, 5 and 6 of this doc for information related to task switching and stack switching).

If you are interested in more Intel architecture docs, refer to the following link:

http://www.x86.org/intel.doc/386manuals.htm

Cheers !!

Gaurav

From: SiM [mailto:face2fac...@yahoo.co.in]
Sent: Wednesday, September 29, 2004 1:56 PM
To: Dhiman, Gaurav
Subject: RE: Difference between process & kernel module

Hi Dhiman,

Could you please send me the links to the Intel Arch docs?
I am unable to locate them!

TIA,

Cheers.

Simith

Stephane List

Sep 29, 2004, 5:24:49 AM

>IIUC a kernel module uses the kernel's stack and heap. And because of
>this modules which use a lot of stack space can cause some serious
>problems.

Does Linux provide a mechanism to panic in case of stack or heap overflow?
If I run Linux in an emulator, is there something I could trace to detect
such a problem?

Thanks

Stephane

Brandon Niemczyk

Sep 29, 2004, 6:00:13 AM

Apparently my post is a bit wrong; see Gaurav's posts.

That said, I found the following in arch/i386/kernel/irq.c

#ifdef CONFIG_DEBUG_STACKOVERFLOW
    /* Debugging check for stack overflow: is there less than 1KB free? */
    {
        long esp;

        __asm__ __volatile__("andl %%esp,%0" :
                             "=r" (esp) : "0" (THREAD_SIZE - 1));
        if (unlikely(esp < (sizeof(struct thread_info) + STACK_WARN))) {
            printk("do_IRQ: stack overflow: %ld\n",
                   esp - sizeof(struct thread_info));
            dump_stack();
        }
    }
#endif

is that what you are looking for?
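The arithmetic in that check can be tried in user space. The sketch below mirrors it with made-up constants: `TOY_THREAD_SIZE` (8 KB), `TOY_THREAD_INFO_SIZE` (a stand-in for sizeof(struct thread_info)) and `TOY_STACK_WARN` (1 KB, taken from the comment in the snippet) are all assumptions, and the `toy_*` helpers are hypothetical names.

```c
#include <stdint.h>

#define TOY_THREAD_SIZE      8192UL  /* assumed stack size (power of two) */
#define TOY_THREAD_INFO_SIZE 64UL    /* hypothetical sizeof(struct thread_info) */
#define TOY_STACK_WARN       1024UL  /* assumed warning threshold: 1 KB */

/* Offset of sp inside its stack block, i.e. sp & (THREAD_SIZE - 1).
 * The stack grows down, so this is the distance to the bottom of the block. */
static unsigned long toy_stack_offset(uintptr_t sp)
{
    return (unsigned long)(sp & (TOY_THREAD_SIZE - 1));
}

/* Bytes still free above the structure stored at the bottom of the block. */
static long toy_stack_free(uintptr_t sp)
{
    return (long)toy_stack_offset(sp) - (long)TOY_THREAD_INFO_SIZE;
}

/* Nonzero when the same comparison as in the quoted check would fire. */
static int toy_stack_low(uintptr_t sp)
{
    return toy_stack_offset(sp) < TOY_THREAD_INFO_SIZE + TOY_STACK_WARN;
}
```

With these assumed values, a stack pointer 768 bytes above the bottom of its block leaves 704 bytes free and trips the warning, while one 8000 bytes above the bottom does not.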

--
Brandon Niemczyk
http://bniemczyk.doesntexist.com

Stephane List

Sep 29, 2004, 8:20:09 AM

Thanks for the link,

CONFIG_DEBUG_STACKOVERFLOW is available for i386 and ppc64 only; I was looking for the same thing for ARM processors.

Stephane

Timur Tabi

Sep 29, 2004, 10:48:09 AM

Christoph Hellwig wrote:

> get_user_pages locks the page in memory. It doesn't do anything about ptes.

I don't understand the difference. I thought a locked page is one that
stays in memory (i.e. isn't swapped out) and whose physical address
never changes. Is that wrong? All I need to do is keep a page in
memory at the same physical address until I'm done with it.

--
Timur Tabi
Staff Software Engineer
timur...@ammasso.com

Timur Tabi

Sep 29, 2004, 10:46:38 AM

Dave Hansen wrote:

> You probably want mlock(2) to keep the kernel from messing with the ptes
> at all.

mlock() can only be called via sys_mlock(), which is a user-space call.
Not only that, but only root can call sys_mlock(). This is not
compatible with our needs.

> But, you should probably really be thinking about why you're
> accessing the page tables at all. I count *ONE* instance in drivers/
> where page tables are accessed directly.

I access PTEs to get the physical addresses of a user-space buffer, so
that we can DMA to/from it directly.

--
Timur Tabi
Staff Software Engineer
timur...@ammasso.com

Christoph Hellwig

Sep 29, 2004, 11:01:34 AM

On Wed, Sep 29, 2004 at 09:48:09AM -0500, Timur Tabi wrote:
> Christoph Hellwig wrote:
>
> > get_user_pages locks the page in memory. It doesn't do anything about ptes.
>
> I don't understand the difference. I thought a locked page is one that
> stays in memory (i.e. isn't swapped out) and whose physical address
> never changes. Is that wrong?

Yes. But if you're walking ptes you're looking at virtual addresses
somehow. Can you send me a pointer to your code please? I suspect
it's doing something terribly stupid.

Stuart MacDonald

Sep 29, 2004, 11:52:42 AM

From: linux-ker...@vger.kernel.org

> I want to implement socket from the module. I won't be having any user
> process running to handle the descriptors coming from socket.
> Could you pl
> tell me how to handle the socket descriptor from the kernel module.

Check out fs/smbfs/sock.c.

..Stu

Jon Masters

Sep 29, 2004, 4:02:51 PM

On Wed, 29 Sep 2004 11:24:49 +0200, Stephane List <sl...@lilotux.net> wrote:

> Does Linux provide a mecanism to panic in case of stack or heap overflow ?

Yes. That's the default action in StackOverflow(regs).

> If I run Linux in an emulator, is there a thing I could trace to detect
> such problem ?

You could trivially insert a breakpoint or infinite loop into the
overflow function.

Jon.

Mandeep Sandhu

Sep 30, 2004, 6:42:42 AM

hi list,

I don't know whether this is the correct list for KGDB-related problems.
If not, can someone point me to the mailing list where I can post my
question? Otherwise, read on.

I'm trying to bring up a kgdb setup. I have 2 Intel machines:
target - mac1, host - mac2. I downloaded the latest kgdb patch
for 2.6.8.1 on mac1 and applied it to the kernel source. I then
compiled it on the target machine itself, then copied (scp) the
kernel image + System.map to my host machine, where I have the same
kgdb-patched source (copied to /usr/src/linux-2.6.8.1 on mac2).

I made changes to grub etc. and rebooted the target machine, which stopped
at "waiting for connection from remote server".

From the host's /usr/src/linux-2.6.8.1 dir I ran
$> gdb vmlinuz-2.6.8.1-kgdb

and connected to the target machine using "target remote /dev/ttyS0".

Then I got the following line:

(gdb) 0x(some addr) in ??

When I then try to step through using the step command, I get "Cannot find
bounds of current function".

If I type "continue", the target resumes booting fine.
Any clues why this is happening?

TIA,
-mandeep

Artem B. Bityuckiy

Sep 30, 2004, 6:53:37 AM

Did you load debugging symbols?

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

Mandeep Sandhu

Sep 30, 2004, 7:15:26 AM

I don't think so... how do you do that?

Artem B. Bityuckiy

Sep 30, 2004, 7:35:47 AM

You should use the "add-symbol-file" command. The kernel should be compiled
with debug info. Also, you should have the uncompressed kernel (or module)
image to read symbols from.

--

Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

Mandeep Sandhu

Sep 30, 2004, 8:11:32 AM

On Thu, 2004-09-30 at 17:05, Artem B. Bityuckiy wrote:
> You should use "add-symbol-file" command. The kernel should be compiled
> with debuginfo. Also you should have the uncompressed kernel (or module)
> image to read symbols from it.

Does this mean I have to uncompress the "bzImage" that I just made?
I think that is the problem, as I'm giving the compressed image as the argument
to gdb. I'll remake the image with "make vmlinuz". That should
do, right?

Artem B. Bityuckiy

Sep 30, 2004, 8:44:49 AM

> Does this mean i have to uncompress the "bzImage" that i just made???
> i think that is the prob. as i'm giving the compressed image as arg
> to the gdb. i'll remake the image with "make vmlinuz". that shud
> do right??

You should build a Linux image that:
1. is built for debugging (enable this in the kernel configuration)
2. isn't compressed

--
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.

Shakthi Kannan

Sep 30, 2004, 10:11:18 AM

Hi!

<quote>

Does this mean i have to uncompress the "bzImage" that i just made???

</quote>

You need to run "make vmlinux"; the resulting image is very useful for
debugging with ksymoops and gdb.

Regards,

Shaks

mohanlal jangir

Sep 30, 2004, 10:24:31 AM

>
> I'm trying to bring up a kgdb setup. I have 2 intel machines.
> Target - mac1, Host - mac2. I downloaded the latest kgdb patch
> for 2.6.8.1 on mac1 and applied it on the kernel src. I then
> complied it on the target machine itself. Then copied (scp) the
> kernel image + System.map to my host mac where i have the same

^^^^^^^^^^^^^

> >From the host's /usr/src/linux-2.6.8.1 dir i ran
> $> gdb vmlinuz-2.6.8.1-kgdb

^^^^^^^^^^^^^^^^^

Are the "kernel image" and vmlinuz-2.6.8.1-kgdb the same?

Actually, you should do it the other way around: compile the kernel on mac2 and
copy bzImage to mac1, because the debugger needs the vmlinux image as well as the
source code. And if the path to the compiled sources on the target and the path to
the sources on the host are different, there may be a problem (although I am not
sure about this). If you compile on the host, you simply avoid this problem.

Regards
Mohanlal

Mandeep Sandhu

Oct 1, 2004, 1:29:09 AM

But the web page on the kgdb SourceForge site says "make bzImage"!
Why the discrepancy? Anyway, I checked my .config file and
my kernel was built with debug info.

On Thu, 2004-09-30 at 18:14, Artem B. Bityuckiy wrote:
> > Does this mean i have to uncompress the "bzImage" that i just made???
> > i think that is the prob. as i'm giving the compressed image as arg
> > to the gdb. i'll remake the image with "make vmlinuz". that shud
> > do right??
> You should build the linux image that:
> 1. built for debugging (switch this in the Linux configuration)
> 2. isn't compressed