| Howdy.
|
| On systems with lots of processors (512 for example), catting /proc/interrupts
| fails with a "not enough memory" error.
|
| This was observed in 2.6.0-test8
|
| I tracked this down to this in proc_misc.c:
|
| static int interrupts_open(struct inode *inode, struct file *file)
| {
| unsigned size = 4096 * (1 + num_online_cpus() / 8);
| char *buf = kmalloc(size, GFP_KERNEL);
|
| The kmalloc fails here.
|
| I'm looking for suggestions on how to fix this. I came up with one fix
| that seems to work OK for ia64. I have attached it to this message.
| I'm looking for advice on what should be proposed for the real fix.
An alternative is to limit 'size' to a maximum of 128 KB (or whatever
the max. kmalloc() on ia64 is) and continue to use kmalloc().
Does that work for you?
Another alternative is to convert show_interrupts to use a seq_file
iterator.
--
~Randy
MOTD: Always include version info.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
I think it'd make more sense to only use vmalloc when it's explicitly
too big for kmalloc - or simply switch on num_online_cpus > 100 or
whatever a sensible cutoff is (ie nobody but you would ever see this ;-))
M.
> static int interrupts_open(struct inode *inode, struct file *file)
> {
> unsigned size = 4096 * (1 + num_online_cpus() / 8);
> char *buf = kmalloc(size, GFP_KERNEL);
>
> The kmalloc fails here.
Ew.
We should probably use seq_file here, although vmalloc() should not
hurt.
Why __vmalloc() over vmalloc(), though? Eh, I do not even know what
__vmalloc() is?? ;)
Robert Love
> /* don't ask for more than the kmalloc() max size, currently 128 KB */
> if (size > 128 * 1024)
> size = 128 * 1024;
>- buf = kmalloc(size, GFP_KERNEL);
>+ buf = __vmalloc(size, GFP_KERNEL, PAGE_KERNEL);
>
>
kmalloc needs a contiguous memory block. Thus after a long runtime,
large kmalloc calls can fail due to fragmentation. I'd prefer if you
switch to vmalloc after 2*PAGE_SIZE.
Or switch to a line based seq file iterator, then you wouldn't need the
huge buffer.
--
Manfred
This is not the real fix.
Allowing people to use up vmalloc() space by opening the /proc files would
be a major DoS attack. Not worth it.
Instead, just make /proc/interrupts use the proper _sequence_ things, so
that instead of trying to print out everything in one go, you have the
"s_next()" thing to print them out one at a time. The seqfile interfaces
will then do the rigth thing with blocking/caching, and you only need a
single page.
Al - do we have some good documentation of how to use the seq-file
interface?
In the meantime, without documentation, the best place to look is just at
other examples. One such example would be the kernel/kallsyms.c case: see
how it does s_start/s_show/s_next/s_stop (or /proc/slabinfo, or vmstat, or
any number of them).
Linus
> Al - do we have some good documentation of how to use the seq-file
> interface?
There was a text on LWN and it's OK for starting point. I'll try to
put together something compact, but somebody will have to go through
the result and turn it into proper English...
No, please please please don't do these things.
vmalloc() is NOT SOMETHING YOU SHOULD EVER USE! It's only valid for when
you _need_ a big array, and you don't have any choice. It's slow, and it's
a very restricted resource: it's a global resource that is literally
restricted to a few tens of megabytes. It should be _very_ carefully used.
There are basically no valid new uses of it. There's a few valid legacy
users (I think the file descriptor array), and there are some drivers that
use it (which is crap, but drivers are drivers), and it's _really_ valid
only for modules. Nothing else.
Basically: if you think you need more memory than a kmalloc() can give,
you need to re-organize your data structures. To either not need a big
area, or to be able to allocate it in chunks.
Linus
OK, I was actually trying to avoid the use of vmalloc, instead of the
unconditional conversion to vmalloc, which is what the original patch did ;-)
But you are, of course, correct - in this case, it should be easy to use
the seq_file stuff to do it in smaller chunks, and use a smaller buffer.
M.
Yes, I realize that, but it's the old case of
"I'm totally faithful to my husband - I never sleep with other men when
he is around"
joke.
Basically, if it's wrong to use, it's wrong to use even occasionally. In
fact, having two different code-paths just makes the code worse.
Yes, I realize that sometimes you have to do it that way, and it might be
the simplest way to fix something. In this case, though, the cost and
fragility of a generic interface is not worth it, since the problem isn't
actually in the generic code at all..
Linus
|
| On Tue, 11 Nov 2003, Erik Jacobson wrote:
| >
| > I'm looking for suggestions on how to fix this. I came up with one fix
| > that seems to work OK for ia64. I have attached it to this message.
| > I'm looking for advice on what should be proposed for the real fix.
|
| This is not the real fix.
|
| Allowing people to use up vmalloc() space by opening the /proc files would
| be a major DoS attack. Not worth it.
|
| Instead, just make /proc/interrupts use the proper _sequence_ things, so
| that instead of trying to print out everything in one go, you have the
| "s_next()" thing to print them out one at a time. The seqfile interfaces
| will then do the rigth thing with blocking/caching, and you only need a
| single page.
|
| Al - do we have some good documentation of how to use the seq-file
| interface?
See http://lwn.net/Articles/22355/
and http://www.xenotime.net/linux/doc/seq_file_howto.txt
| In the meantime, without documentation, the best place to look is just at
| other examples. One such example would be the kernel/kallsyms.c case: see
| how it does s_start/s_show/s_next/s_stop (or /proc/slabinfo, or vmstat, or
| any number of them).
--
~Randy
MOTD: Always include version info.
FYI we are seeing this on ppc64 too (less cpus but probably more
interrupt sources).
Anton
Does anybody else find hw_interrupt_type.typename lacking?
"XT-PIC" doesn't tell us if it is level or edge triggered, high or low
polarity.
"IO-APIC-edge" and "IO-APIC-level" don't tell us if it is high or low
polarity.
Having this info easily available in /proc/interrupts would make
recognizing and diagnosing interrupt configuration issues much easier.
thanks,
-Len
That would be http://lwn.net/Articles/22355/.
If you'd like a version packaged up for Documentation/, say the word and I
can do that.
jon
Jonathan Corbet
Executive editor, LWN.net
cor...@lwn.net
The IPC code is doing ugly things too:
void* ipc_alloc(int size)
{
void* out;
if(size > PAGE_SIZE)
out = vmalloc(size);
else
out = kmalloc(size, GFP_KERNEL);
return out;
}
Anton
That seems particularly .... odd ... as PAGE_SIZE isn't anywhere near the
breakpoint. Worst case (and I know I'll get yelled at for this, but I'll
get another amusing analogy out of Linus ;-)) we should just call kmalloc
and if it fails, then try vmalloc ...
M.
> That seems particularly .... odd ... as PAGE_SIZE isn't anywhere near the
> breakpoint. Worst case (and I know I'll get yelled at for this, but I'll
> get another amusing analogy out of Linus ;-)) we should just call kmalloc
> and if it fails, then try vmalloc ...
You're Cruisin' for a bruisin' ;)
I wonder if it ever does exceed 128k anyway...
static int semctl_main(...)
{
int nsems = sma->sem_nsems;
...
sem_io = ipc_alloc(sizeof(ushort)*nsems);
...
This is a bit OT in this thread but...
Doesn't the information in /proc/interrupts really belong somewhere in sysfs?
Regards
Erlend Aasland
Only if you want to open 10 billion files to get the data every time.
M.