tl;dr: testing via a USE flag makes sense; shipping images needs a more
documented discussion (perf, mem, info-leaks).
On Wed, Mar 13, 2013 at 3:28 PM, Jason Abele <jab...@chromium.org> wrote:
> On Wed, Mar 13, 2013 at 12:19 PM, Mike Frysinger <vap...@chromium.org> wrote:
>> On Wed, Mar 13, 2013 at 12:52 PM, Jason Abele <jab...@chromium.org> wrote:
>>> On Tue, Mar 12, 2013 at 8:18 AM, Mike Frysinger <vap...@chromium.org> wrote:
>>>> On Tue, Mar 12, 2013 at 11:06 AM, Doug Anderson <dian...@chromium.org> wrote:
>>>> > On Mon, Mar 11, 2013 at 1:16 PM, Mike Frysinger <vap...@chromium.org> wrote:
>>>> >>> nice! any ability to pass the USE=debug into build_packages
>>>> >>> (selectively, so that it only affects the kernel) or cros_workon_make?
>>>> >>
>>>> >> you can do:
>>>> >> USE=debug ./build_packages ...
>>>> >
>>>> > What about using "cros-debug" for Jason's needs rather than "debug"?
>>>> >
>>>> > I believe that cros-debug is enabled by default for local developer
>>>> > builds and turned off for builders.
>>>>
>>>> that should work
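(For the "selectively, so that it only affects the kernel" part, scoping the
flag via package.use instead of exporting USE for the whole build is one
option. A rough sketch -- the sys-kernel/chromeos-kernel atom name is an
assumption, check your board overlay for the real one; ROOT defaults to a
temp dir so this can be dry-run outside the chroot:)

```shell
# Hedged sketch: apply USE=debug to just the kernel package via
# package.use rather than exporting USE globally.
# ASSUMPTION: the sys-kernel/chromeos-kernel atom name is illustrative.
root="${ROOT:-$(mktemp -d)}"
mkdir -p "$root/etc/portage/package.use"
echo 'sys-kernel/chromeos-kernel debug' \
    > "$root/etc/portage/package.use/kernel-debug"
cat "$root/etc/portage/package.use/kernel-debug"
```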
>>>
>>> So, the issue with both cros-debug and debug, as I see it, is that neither
>>> addresses the need to have this capability already available on the images
>>> used in the test lab.
>>
>> those images are test images, right? seems like a safe assumption,
>> but you didn't explicitly say here.
>
> it is an assumption on my part as well, I'll know soon ... but I do
> know they prefer to use images produced by the builders and selected
> as the next release candidate by the TPMs
Unfortunately, it would encourage further movement away from
testing-what-we-ship. If this were an option for test images, that might
make sense (when we know we need more).
>>
>>> * autotest run yields a failure with logs/traces pointing to a
>>> driver/hardware issue
>>> * chromium engineer raises partner issue
>>> * partner responds with trivial kernel patch enabling more printk's or
>>> tracing
>>> * chromium engineer builds custom chrome os image
>>> * chromium engineer waits while custom chrome os image is loaded and
>>> retested in lab
>>> * chromium engineer responds to partner with new logs
>>> * repeat partner request for trivial kernel patch in new area
>>>
>>> I would like to avoid building a whole custom image for the test lab, when
>>> all the partner seeks is better logging/tracing so they can understand and
>>> localize the issue.
Even with DYNAMIC_DEBUG, we wouldn't have gotten enough information.
Unfortunately, it seems like a lack of effective logging itself may be
one of the bigger culprits.
>>> How do we add DYNAMIC_DEBUG to the kernels on the test images generated
>>> automatically for the test lab?
>>
>> if you're running test images, then we might want to investigate
>> allowing different USE settings for test images.
>
> I think that would be excellent, but I have no real idea how to pursue
> it ... pointers welcome, I can think of a few use cases for such
> things in the connectivity group. I could definitely meet my
> immediate needs with something like this.
vapier and others can help with The-Right-Way :) In general, you can
declare an IUSE in the ebuild for the kernel, then check 'use
dyndebug' during the build and potentially grab a different config or
slap CONFIG_DYNAMIC_DEBUG=y onto the build args. Making that
configurable with test images means working up the build stack from
there.
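A rough sketch of that conditional config step -- the flag name (dyndebug)
and the config handling are illustrative, and use() is stubbed here only so
the fragment runs standalone (a real ebuild gets it from portage):

```shell
# Sketch of the USE-conditional kernel-config step described above.
# ASSUMPTIONS: the dyndebug flag name and .config handling are
# illustrative; use() is a stand-in for the portage-provided helper.
USE="dyndebug"

use() {
    case " $USE " in *" $1 "*) return 0 ;; *) return 1 ;; esac
}

config="$(mktemp)"
echo 'CONFIG_PRINTK=y' > "$config"

# The 'use dyndebug' check: slap CONFIG_DYNAMIC_DEBUG=y onto the
# kernel config only when the flag is set.
if use dyndebug; then
    echo 'CONFIG_DYNAMIC_DEBUG=y' >> "$config"
fi
cat "$config"
```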
>> we also have barriers for enabling new options for all kernels that are
>> more than a simple device driver or additional kernel module: what is
>> the space (disk/runtime) and execution overhead? you posted some
>> stats on the CL about disk usage.
>
> summary: CONFIG_DYNAMIC_DEBUG adds about 200k of resident memory in
> the kernel (plus some for strings in modules)
>
> execution overhead should be vanishingly close to zero when the
> pr_debug in question has not been enabled via the
> debugfs/dynamic_debug/control
Has this been tested? This replaces non-existent code with a
conditional branch. Will overly verbose drivers take a hit?
> Yeah, I put the CL up while I continued to build my case ... and like
> magic, a bunch of reviews appeared and asked questions, I felt I
> should address questions posed on the CL, on the CL. For anyone
> interested in the full details, see (but let's keep the discussion here
> until we resolve what will be an acceptable solution):
>
> https://gerrit.chromium.org/gerrit/#/c/45274/
> http://crbug.com/188825
> http://crbug.com/188822
>
>> i know in the past when i wrote kernel code i'd sprinkle pr_debug()
>> around, and i'd use/calculate values to store in variables that were
>> only used by pr_debug(). that means the overhead from enabling this
>> option isn't purely a memory load/test/jump (i.e. if (debug) {
>> printk() }), but also the overhead of those additional variable
>> storage/calculation. when dynamic debug is off, that means gcc sees
>> "if (0) { pr_debug() }" which allows it to do all sorts of fun DCE on
>> the output.
>
> my understanding of DYNAMIC_DEBUG is that aside from the extra memory
> (found by 'size vmlinux' comments to the CL) its execution overhead
> will be close to "if (0) { pr_debug() }" when the specific pr_debug is
> not enabled via the control file.
>
> Keep in mind ...
>
> We do not plan to enable extra pr_debug's during normal use ... only in test
> We could use some of the nifty new filesystem namespacing to prevent
> user access to the control file
Namespacing (as is) isn't going to keep access away from the root user
:) One of the challenges with dynamic debug is that it means that we
completely lose any effort to not leak information about pointer
addresses out of the kernel. With ftrace, we can at least pick and
choose events, etc. Debug messages are free-form unless we try to
block cases where pointer-esque variables are replaced (ew).
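For context on what that control file exposes, a sketch of how pr_debug
sites get flipped at runtime -- the debugfs path is the real dynamic-debug
interface on a CONFIG_DYNAMIC_DEBUG=y kernel, but the module/file queries
are only examples, and the helper falls back to a dry-run print when the
file isn't writable:

```shell
# Sketch: flipping pr_debug() call sites at runtime through debugfs.
# The control path is the real interface; 'ath9k' and the file/line
# query are illustrative only.
control=/sys/kernel/debug/dynamic_debug/control

dyndbg() {
    if [ -w "$control" ]; then
        echo "$1" > "$control"
    else
        # not root / not a dynamic-debug kernel: dry-run
        echo "would write: $1"
    fi
}

dyndbg 'module ath9k +p'          # enable every pr_debug() in ath9k
dyndbg 'file main.c line 120 +p'  # or a single call site
dyndbg 'module ath9k -p'          # and turn them back off
```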
> That said, I still am unclear why CONFIG_DYNAMIC_PRINTK is ok but
> CONFIG_DYNAMIC_DEBUG is not.
Does dynamic printk still exist?
> I think all the same arguments I have heard in favor of DYNAMIC_FTRACE
> apply to DYNAMIC_DEBUG.
DYNAMIC_FTRACE was grandfathered in. I'm not a fan of making all
kernel functions trivially redirect-able, but without it, ftrace
overhead was a concern (iirc).
> DYNAMIC_FTRACE adds ~100k to kernel memory (plus the sort of awesome
> about:tracing interface in chrome)
about:tracing access is brokered via debugd and its categories are
restricted (as per debugd/src/helpers/systrace.sh). Even if we limit
unprivileged dynamic-debug to just wireless cards, we'd not have a
good sense of the data that may come out.
> DYNAMIC_FTRACE, like DYNAMIC_DEBUG, adds little overhead when not
> enabled during normal use
>
>
> DYNAMIC_FTRACE, like DYNAMIC_DEBUG, is for ease of diagnosing field or
> test failures
>
> For example, see these issues where partners requested extra kernel
> debug messages by diff attachment:
>
> crosbug.com/p/18058#c8
This one was message additions and debug level changes.
> crosbug.com/p/18188#c25
This one is a debug level change!
> crosbug.com/p/17764#c64
This was a message addition, not a debug level change.
> crosbug.com/p/15129#c54
> crosbug.com/p/15129#c67
> crosbug.com/p/15129#c75
They still had to add new pr_debugs that didn't exist before. Did
they send them upstream after?
> and some others which predate my chrome os involvement
To me, it seems like dynamic debug will make test images easier to
work with, but there is still the problem of lacking the messages
needed for effective debugging. And if vendors start adding them,
we're going to burn even more memory on the strings. What's the plan
for enabling during use -- manually, by autotest, or something else?
> So, I am happy to have DYNAMIC_DEBUG just in test images, for now, but
>
> I do not know how to have different kernels for different image builds.
> I can foresee a time when a neat interface to DYNAMIC_DEBUG, like the one
> about:tracing provides for DYNAMIC_FTRACE, could help debug issues, but
> I am not asking for that (right now :)
I'd argue that CONFIG_FTRACE and DYNAMIC_DEBUG are much more similar
to each other than either is to DYNAMIC_FTRACE. Dynamic ftrace rewrites
function prologues, while dynamic debug introduces an if (unlikely())
followed by debug query matching. While hot patching the kernel scares
me, it means much less potential runtime overhead. Additionally,
ftrace's memory burden is different from DYNAMIC_DEBUG's because it
doesn't keep arbitrary static strings around.
Regardless, there is still a fundamental challenge here around
information leakage. As we work harder on making our kernel resilient
to attacks from even root-level attackers, we don't want to add more
places for the kernel addresses to appear in random logs. This is a
tractable problem with ftrace, but dynamic debug is a bigger issue.
Debug messages have no expected format or classification mechanism,
which will make it pretty much impossible to avoid leaking kernel
addresses on the regular if they are enabled. As we head down into
this rabbit hole, we'll need to see if there is a good way to approach
debug logging from the kernel on shipping devices -- either elegantly
or with brute strength.
For some test images, this makes perfect sense and a USE flag is
certainly the right place to start! For use on the shipping kernel,
I'd much prefer to see a short write-up that covers all the benefits,
drawbacks, and ways we can mitigate the risks (or do better debug
logging!) so we don't have to revisit old mail threads or old code
reviews, and can try to reach an approach that will keep all the
requirements sanely balanced.
thanks!
will