[REGRESSION] RLIMIT_DATA crashes named

Laura Abbott

unread,

Sep 16, 2016, 11:16:57 AM9/16/16

to Konstantin Khlebnikov, Linus Torvalds, Andrew Morton, Cyrill Gorcunov, Christian Borntraeger, linu...@kvack.org, Linux Kernel Mailing List

Hi,

Fedora received a bug report[1] after pushing 4.7.2 that named
was segfaulting with named-chroot. With some help (thank you
tibbs!), it was noted that on older kernels named was spitting
out

mmap: named (671): VmData 27566080 exceed data ulimit 23068672.
Will be forbidden soon.

and with f4fcd55841fc ("mm: enable RLIMIT_DATA by default with
workaround for valgrind") it now spits out

mmap: named (593): VmData 27566080 exceed data ulimit 20971520.
Update limits or use boot option ignore_rlimit_data.

Apparently the segfault goes away when dropping datasize=size.
I haven't looked into the named code yet but what I'm
suspecting is named is not setting its limits correctly and
then corrupting itself. This may have existed for much longer
but the rlimit is only now exposing it.

I'd like to propose reverting f4fcd55841fc ("mm: enable RLIMIT_DATA
by default with workaround for valgrind") or default to setting
ignore_rlimit_data to true and spitting out a warning until
named can be fixed.

Thanks,
Laura

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1374917

Linus Torvalds

unread,

Sep 16, 2016, 1:47:15 PM9/16/16

to Laura Abbott, Sam Varshavchik, Brent, Konstantin Khlebnikov, Andrew Morton, Cyrill Gorcunov, Christian Borntraeger, linu...@kvack.org, Linux Kernel Mailing List

On Fri, Sep 16, 2016 at 8:16 AM, Laura Abbott <lab...@redhat.com> wrote:
>
> Fedora received a bug report[1] after pushing 4.7.2 that named
> was segfaulting with named-chroot. With some help (thank you
> tibbs!), it was noted that on older kernels named was spitting
> out
>
> mmap: named (671): VmData 27566080 exceed data ulimit 23068672.
> Will be forbidden soon.
>
> and with f4fcd55841fc ("mm: enable RLIMIT_DATA by default with
> workaround for valgrind") it now spits out
>
> mmap: named (593): VmData 27566080 exceed data ulimit 20971520.
> Update limits or use boot option ignore_rlimit_data.

Ok, we can certainly revert, but before we do that I'd like to
understand a few more things.

For example, where the data limit came from, and how likely this is to
hit others that have a much harder time fixing it. Adding Sam
Varshavchik and Brent to the participants list...

In particular, this is clearly trivially fixable as noted by Brent in
that bugzilla entry:

'remove the "datasize 20M;" directive in named.conf'

along with the (much worse) option of "use boot option
ignore_rlimit_data" that the kernel dmesg itself suggests as an
option.

So for example, if that "datasize 20M;" is coming from just the Fedora
named package, it would be much nicer to just get that fixed instead.
Because RLIMIT_DATA the old way was just meaningless noise.

We definitely don't want to break peoples existing setups, but as this
is *so* easy to fix in other ways (even at runtime without even
updating a kernel), and since this commit is already four months old
by now with this single bugzilla being the only report since then that
I'm aware of, my reaction is just that there are better ways to fix it
than reverting a commit that can be worked around trivially.

Linus

Laura Abbott

unread,

Sep 16, 2016, 4:12:05 PM9/16/16

to Linus Torvalds, Sam Varshavchik, Brent, Konstantin Khlebnikov, Andrew Morton, Cyrill Gorcunov, Christian Borntraeger, linu...@kvack.org, Linux Kernel Mailing List

As far as I can tell this isn't Fedora specific.

> We definitely don't want to break peoples existing setups, but as this
> is *so* easy to fix in other ways (even at runtime without even
> updating a kernel), and since this commit is already four months old
> by now with this single bugzilla being the only report since then that
> I'm aware of, my reaction is just that there are better ways to fix it
> than reverting a commit that can be worked around trivially.

I was debating the merits of a revert. My concern is that this bugzilla
just represents the people who are reporting the bug and able to
correlate it to named. The actual number of people who are seeing
problems may be higher and anyone mucking with their config
could hit this and then have to go through troubleshooting steps again.
Add a config, get a segfault is a pretty terrible experience even
by Linux standards. I'd feel better about not reverting if there
were a proposed patch for named

I would like to see RLIMIT_DATA actually do something useful so worse
case I'll figure out something to carry in Fedora and this thread
can be an FYI for people googling.

>
> Linus
>

Thanks,
Laura

Linus Torvalds

unread,

Sep 16, 2016, 4:32:52 PM9/16/16

to Laura Abbott, Sam Varshavchik, Brent, Konstantin Khlebnikov, Andrew Morton, Cyrill Gorcunov, Christian Borntraeger, linu...@kvack.org, Linux Kernel Mailing List

On Fri, Sep 16, 2016 at 1:10 PM, Laura Abbott <lab...@redhat.com> wrote:
>
> As far as I can tell this isn't Fedora specific.

Some googling does seem to say that "datalimit 20M" and "named.conf"
ends up being some really old default that just gets endlessly copied.

So no, it's not Fedora-specific per se.

But I suspect most people with a named.conf did either

(a) get it from their distro and didn't change it and so if the
distro just updates theirs, things will automatically "just work"

(b) actually did write their own (or at least edited it), and knows
what they are doing, and have absolutely no problem removing or
updating that datalimit thing.

> I would like to see RLIMIT_DATA actually do something useful so worse
> case I'll figure out something to carry in Fedora and this thread
> can be an FYI for people googling.

Yeah, even if we only get a good hit for "named segmentation fault", I
guess that will help people a lot.

The really annoying thing seems to be that the kernel message has been
hidden too much. IOW, Sam in his bugzilla report clearly found the
system messages with

Sep 10 07:38:23 shorty systemd-coredump: Process 1651 (named) of
user 25 dumped core.

but for some reason never noticed the kernel saying (quoting Jason):

mmap: named (593): VmData 27566080 exceed data ulimit 20971520.
Update limits or use boot option ignore_rlimit_data

at the same time.

Ok, the kernel only says it *once*. Maybe Sam had it in his logs, but
didn't notice the initial failure (which would have had the kernel
message too), and he then looked at the logs for when he tried to
re-start.

Or maybe the system logs don't have those kernel messages, which would
be a disaster.

So maybe we should just change the "pr_warn_once()" into
"pr_warn_ratelimited()", except the default rate limits for that are
wrong (we'd perhaps want something like "at most once every minute" or
similar, while the default rate limits are along the lines of "max 10
lines every 5 _seconds_").

Sam, do you end up seeing the kernel warning in your logs if you just
go back earlier in the boot?

Linus

Sam Varshavchik

unread,

Sep 16, 2016, 6:36:19 PM9/16/16

to Linus Torvalds, Laura Abbott, Brent, Konstantin Khlebnikov, Andrew Morton, Cyrill Gorcunov, Christian Borntraeger, linu...@kvack.org, Linux Kernel Mailing List

Linus Torvalds writes:

> On Fri, Sep 16, 2016 at 1:10 PM, Laura Abbott <lab...@redhat.com> wrote:
> >
> > As far as I can tell this isn't Fedora specific.
>
> Some googling does seem to say that "datalimit 20M" and "named.conf"
> ends up being some really old default that just gets endlessly copied.
>
> So no, it's not Fedora-specific per se.

I'll confirm that.

It's been sitting in my named.conf for at least ten years. I don't remember
where it came from. The Google sources are very likely. I probably copied
it, from some tutorial.

> But I suspect most people with a named.conf did either
>
> (a) get it from their distro and didn't change it and so if the
> distro just updates theirs, things will automatically "just work"
>
> (b) actually did write their own (or at least edited it), and knows
> what they are doing, and have absolutely no problem removing or
> updating that datalimit thing.

(b) in my case. Now that the root cause is mostly known, I'll just bump it
up.

> The really annoying thing seems to be that the kernel message has been
> hidden too much. IOW, Sam in his bugzilla report clearly found the
> system messages with
>
> Sep 10 07:38:23 shorty systemd-coredump: Process 1651 (named) of
> user 25 dumped core.
>
> but for some reason never noticed the kernel saying (quoting Jason):
>
> mmap: named (593): VmData 27566080 exceed data ulimit 20971520.
> Update limits or use boot option ignore_rlimit_data
>
> at the same time.
>
> Ok, the kernel only says it *once*. Maybe Sam had it in his logs, but
> didn't notice the initial failure (which would have had the kernel
> message too), and he then looked at the logs for when he tried to
> re-start.

I still have this log file. Looking over it, this is indeed what happened.

> Or maybe the system logs don't have those kernel messages, which would
> be a disaster.
>
> So maybe we should just change the "pr_warn_once()" into
> "pr_warn_ratelimited()", except the default rate limits for that are
> wrong (we'd perhaps want something like "at most once every minute" or
> similar, while the default rate limits are along the lines of "max 10
> lines every 5 _seconds_").
>
> Sam, do you end up seeing the kernel warning in your logs if you just
> go back earlier in the boot?

Yes, I found it.

Sep 10 07:36:29 shorty kernel: mmap: named (1108): VmData 52588544 exceed
data ulimit 20971520. Update limits or use boot option ignore_rlimit_data.

Now that I know what to search for: this appeared about 300 lines earlier in
/var/log/messages.

When trying to figure out what's going on with named, searching backwards in
time, and finding the logged segault @07:38:23, IIRC I only looked as far
back until the @07:38:23 timestamp started, and did not see anything other
the apparent segfault. Before that, /var/log/messages was full of other
noise. The original named that was launched two minutes earlier was ancient
history, by then.

All I saw was that named was apparently segfaulting after booting a new
kernel. Ok, boot back to the previous kernel, search bugzilla to see if it
was reported already, and, if not, create it yourself. That's what happened.

Linus Torvalds

unread,

Sep 16, 2016, 7:59:31 PM9/16/16

to Sam Varshavchik, Ingo Molnar, Joe Perches, Laura Abbott, Brent, Konstantin Khlebnikov, Andrew Morton, Cyrill Gorcunov, Christian Borntraeger, linu...@kvack.org, Linux Kernel Mailing List

On Fri, Sep 16, 2016 at 3:30 PM, Sam Varshavchik <mr...@courier-mta.com> wrote:
>>
>> Sam, do you end up seeing the kernel warning in your logs if you just
>> go back earlier in the boot?
>
> Yes, I found it.
>
> Sep 10 07:36:29 shorty kernel: mmap: named (1108): VmData 52588544 exceed
> data ulimit 20971520. Update limits or use boot option ignore_rlimit_data.
>
> Now that I know what to search for: this appeared about 300 lines earlier in
> /var/log/messages.

Ok, so that's a pretty strong argument that we shouldn't just warn once.

Maybe the warning happened at bootup, and it is now three months
later, and somebody notices that something doesn't work. It might not
be *critical* (three months without working implies it isn't), but it
sure is silly for the kernel to say "yeah, I already warned you, I'm
not going to tell you why it's not working any more".

So it sounds like if the kernel had just had a that warning be
rate-limited instead of happening only once, there would never have
been any confusion about the RLIMIT_DATA change.

Doing a grep for "pr_warn_once()", I get the feeling that we could
just change the definition of "once" to be "at most once per minute"
and everybody would be happy.

Maybe we could change all the "pr_xyz_once()" to consider "once" to be
a softer "at most once per minute" thing. After all, these things are
*supposed* to be very uncommon to begin with, but when they do happen
we do want the user to be aware of them.

Here's a totally untested patch. What do people say?

Linus

patch.diff

Linus Torvalds

unread,

Sep 16, 2016, 8:05:12 PM9/16/16

to Sam Varshavchik, Ingo Molnar, Joe Perches, Laura Abbott, Brent, Konstantin Khlebnikov, Andrew Morton, Cyrill Gorcunov, Christian Borntraeger, linu...@kvack.org, Linux Kernel Mailing List

On Fri, Sep 16, 2016 at 4:58 PM, Linus Torvalds
<torv...@linux-foundation.org> wrote:
>
> Here's a totally untested patch. What do people say?

Heh. It looks like "pr_xyz_once()" is used in places that haven't
included "ratelimit.h", so this doesn't actually build for everything.

But I guess as a concept patch it's not hard to understand, even if
the implementation needs a bit of tweaking.

Linus

Cyrill Gorcunov

unread,

Sep 17, 2016, 5:10:03 AM9/17/16

to Konstantin Khlebnikov, Joe Perches, Linus Torvalds, Sam Varshavchik, Ingo Molnar, Laura Abbott, Brent, Andrew Morton, Christian Borntraeger, linu...@kvack.org, Linux Kernel Mailing List

On Sat, Sep 17, 2016 at 11:33:56AM +0300, Konstantin Khlebnikov wrote:
> >
> > do_just_once just isn't a good name for a global
> > rate limited mechanism that does something very
> > different than the name.
> >
> > Maybe allow_once_per_ratelimit or the like
> >
> > There could be an equivalent do_once
> >
> > https://lkml.org/lkml/2009/5/22/3
> >
>
> What about this printk_reriodic() and pr_warn_once_per_minute()?
>
> It simply remembers next jiffies to print rather than using that
> complicated ratelimiting engine.

+#define printk_periodic(period, fmt, ...) \
+({ \
+ static unsigned long __print_next __read_mostly = INITIAL_JIFFIES; \
+ bool __do_print = time_after_eq(jiffies, __print_next); \
+ \
+ if (__do_print) { \
+ __print_next = jiffies + (period); \
+ printk(fmt, ##__VA_ARGS__); \
+ } \
+ unlikely(__do_print); \
+})

Seems I don't understand the bottom unlikely...

Konstantin Khlebnikov

unread,

Sep 17, 2016, 8:09:25 AM9/17/16

to Cyrill Gorcunov, Joe Perches, Linus Torvalds, Sam Varshavchik, Ingo Molnar, Laura Abbott, Brent, Andrew Morton, Christian Borntraeger, linu...@kvack.org, Linux Kernel Mailing List

This is gcc extrension: https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html
Here macro works as a function which returns bool

After second though macro should update __print_next if it's too far
if first warning happens too late here will long period of silence
untill next jiffies overlap.

something like

#define printk_periodic(period, fmt, ...)
({
static unsigned long __print_next = INITIAL_JIFFIES;
unsigned long __print_jiffies = jiffies;
bool __do_print = time_after_eq(__print_jiffies, __print_next);

if (__do_print) {
__print_next = __print_jiffies + (period);
printk(fmt, ##__VA_ARGS__);
} else if (time_after(__print_next, __print_jiffies + (period))
__print_next = __print_jiffies + (period);
unlikely(__do_print);
})

Cyrill Gorcunov

unread,

Sep 17, 2016, 8:20:48 AM9/17/16

to Konstantin Khlebnikov, Joe Perches, Linus Torvalds, Sam Varshavchik, Ingo Molnar, Laura Abbott, Brent, Andrew Morton, Christian Borntraeger, linu...@kvack.org, Linux Kernel Mailing List

On Sat, Sep 17, 2016 at 03:09:09PM +0300, Konstantin Khlebnikov wrote:
> >
> > Seems I don't understand the bottom unlikely...
>
> This is gcc extrension: https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html
> Here macro works as a function which returns bool

no, no, I know for what unlikely extension stand for.
it was just hard to obtain from without the context.
this extension implies someone calls for
if (printk_periodic()) right?

Joe Perches

unread,

Sep 17, 2016, 5:52:52 PM9/17/16

to Konstantin Khlebnikov, Cyrill Gorcunov, Linus Torvalds, Sam Varshavchik, Ingo Molnar, Laura Abbott, Brent, Andrew Morton, Christian Borntraeger, linu...@kvack.org, Linux Kernel Mailing List

On Sun, 2016-09-18 at 00:40 +0300, Konstantin Khlebnikov wrote:
> #define printk_periodic(period, fmt, ...)
> ({

> static unsigned long __prev __read_mostly = INITIAL_JIFFIES - (period);
> unsigned long __now = jiffies;
> bool __print = !time_in_range_open(__now, __prev, __prev + (period));
>
> if (__print) {
> __prev = __now;
> printk(fmt, ##__VA_ARGS__);
> }
> unlikely(__print);
> })

printk_periodic reads like a thing that would create a
thread to printk a message every period.

And trivially, period should be copied to a temporary
and not be reused (use your choice of # of underscores)

unsigned long _period = period;
unsigned long _now = now;
static unsigned long _prev __read_mostly = etc...

Dan Wilcox

unread,

Jul 15, 2021, 11:50:19 PM7/15/21

to

We are using a new July 2021 kernel on an Amazon AWS Linux server, and we are experiencing this same error you all are talking about in this thread:
mmap: htmlos.cgi (14349): VmData 20000768 exceed data ulimit 20000000. Update limits or use boot option ignore_rlimit_data.

I read everything in the thread, but nowhere do I see mention of the fix. What is the fix? :-)

Specifically, it's our htmlos.cgi processes that ALL receive this error. And in our situation all our major processes powering our various websites are htmlos.cgi, so a fix is crucial.

Any advice is appreciated. I am not a Linux guru, but I have SSH access, can get around, and when I cannot I employ someone who can. So if you give me a fix, it would be awesome if you walked me thru it slowly with the various commands.

Thank you very much.

Dan W.