Thoughts on sending error telemetry for Jenkins Essentials

R. Tyler Croy

Feb 16, 2018, 9:51:29 AM
to jenkin...@googlegroups.com

One of the necessary details, in my opinion, to make Jenkins Essentials [0]
successful is providing near-real-time error telemetry. Coupled with the
"Evergreen" distribution system [1], error telemetry "post-deploy" will be
absolutely crucial to determine whether or not we have just pushed out bad code
worthy of reverting.

I currently define "error telemetry" to include:

* Uncaught exceptions which cause the Evil Jenkins 500 page
* Logged ERROR messages, with or without exceptions
* Logged WARN messages, with or without exceptions

This list is by no means set in stone, and it is expected that there's going to
be some "noise" in the system, so tooling upstream of this error telemetry
won't be looking for the presence of errors but rather tracking patterns over
time [2].
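
To make "tracking patterns over time" a little more concrete, here is a minimal
sketch in Java. It is only one possible interpretation, not something taken from
the Essentials design: the class name, the method `looksAnomalous`, the idea of
hourly error counts, and the threshold factor `k` are all assumptions made for
illustration. It flags a post-deploy hour only when its error count sits well
above what earlier hours looked like, so a steady trickle of noisy errors does
not trip it.

```java
import java.util.Arrays;

public class ErrorRateSketch {
    /**
     * Hypothetical check: true when the latest error count exceeds the
     * baseline mean by more than k (population) standard deviations.
     */
    static boolean looksAnomalous(double[] baseline, double latest, double k) {
        double mean = Arrays.stream(baseline).average().orElse(0.0);
        double variance = Arrays.stream(baseline)
                .map(x -> (x - mean) * (x - mean))
                .average()
                .orElse(0.0);
        return latest > mean + k * Math.sqrt(variance);
    }
}
```

With a baseline of 10, 12, 11, 9 and 13 errors per hour and `k` set to 3, a
post-deploy hour with 12 errors is treated as normal while one with 40 is
flagged.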


The big challenge that we have, for which I wanted feedback, is *how* we can
acquire this error telemetry.


My first prototype in this area was a plugin which integrates with the
Sentry[3] error reporting service: https://github.com/jenkinsci/sentry-plugin
This approach basically spins up a background busy-waiting thread which loops
over all the loggers in the JVM, and adds the SentryHandler to loggers. Not the
prettiest solution but it mostly works. There is an opportunity to miss
logged errors before the SentryHandler is added, but it's hard to quantify how
serious a gap that might be.

I am not /thrilled/ with this approach, but it meets a very important criterion in
that it's non-invasive to core and other plugins and can simply be installed in
a Jenkins instance in order to work.


I wanted to ask for more thoughts on alternative approaches, if they exist,
which would enable the collection of the error telemetry discussed above. I'm
sure there's something I'm missing.




[0] https://github.com/jenkinsci/jep/tree/master/jep/300
[1] https://github.com/jenkinsci/jep/tree/master/jep/300#auto-update
[2] For example: https://itmonitor.zenoss.com/is-your-performance-normal-how-do-you-know/
[3] https://sentry.io


Cheers
- R. Tyler Croy

------------------------------------------------------
Code: <https://github.com/rtyler>
Chatter: <https://twitter.com/agentdero>
xmpp: rty...@jabber.org

% gpg --keyserver keys.gnupg.net --recv-key 1426C7DC3F51E16F
------------------------------------------------------

Robert Sandell

Feb 16, 2018, 10:02:51 AM
to jenkin...@googlegroups.com
Well, IIUC, since the Evergreen master (and possibly also agents) are specially produced Docker images where you control how Jenkins starts, why not just provide a custom JUL configuration file? Either via the Java property -Djava.util.logging.config.file=myLoggingConfigFilePath or just put it in JDK_HOME/jre/lib/logging.properties in the container.
That way you can get the error logs all the way from jetty startup to the end.
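
As a rough illustration of what such a configuration could contain, the sketch below loads the same key=value syntax that a logging.properties file uses, but from a string so that it can be run on its own. The class name `JulConfigSketch` and the chosen levels are assumptions, and `ConsoleHandler` is only a stand-in: a real setup would presumably name whatever handler class ships the records to the telemetry service, and that class would have to be on the JVM's classpath at startup.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.logging.LogManager;

public class JulConfigSketch {
    // Same syntax as a file passed via -Djava.util.logging.config.file=...
    // ConsoleHandler is a placeholder for a (hypothetical) telemetry handler.
    static final String CONFIG = String.join("\n",
            "handlers=java.util.logging.ConsoleHandler",
            ".level=INFO",
            "java.util.logging.ConsoleHandler.level=WARNING");

    /** Replaces the current JUL configuration with the one defined above. */
    static void apply() throws IOException {
        LogManager.getLogManager().readConfiguration(
                new ByteArrayInputStream(CONFIG.getBytes(StandardCharsets.UTF_8)));
    }
}
```

When the properties are supplied as a file at JVM startup instead, no code is needed at all, which is what makes it possible to see records from the very beginning of the process.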

/B


--
You received this message because you are subscribed to the Google Groups "Jenkins Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-dev+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/jenkinsci-dev/20180216145116.yizslgftmjgnhwmn%40blackberry.coupleofllamas.com.
For more options, visit https://groups.google.com/d/optout.



--
Robert Sandell
Software Engineer
CloudBees, Inc.
Twitter: robert_sandell

Jesse Glick

Feb 16, 2018, 10:36:34 AM
to Jenkins Dev
On Fri, Feb 16, 2018 at 9:51 AM, R. Tyler Croy <ty...@monkeypox.org> wrote:
> I currently define "error telemetry" to include:
>
> * Uncaught exceptions which cause the Evil Jenkins 500 page
> * Logged ERROR messages, with or without exceptions

Absolutely.


> * Logged WARN messages, with or without exceptions

Possibly. Cf.

https://github.com/jenkinsci/jenkins/blob/09e1f8cd0ca173f3526f016e9ff18410fb422807/test/src/test/java/jenkins/model/StartupTest.java#L38-L45

But beware that the list of plugins which emit rather spurious
`WARNING`s is pretty long at the moment. I guess we will have the
chance to start fixing those cases among Essentials plugins, but how
do you plan to filter out cases coming from “inessential” [am I
allowed to say that?] plugins over which we have little control?


> a background busy-waiting thread which loops
> over all the loggers in the JVM, and adds the SentryHandler to loggers.

Huh?? All you need to do is add a handler to the root logger, once at
startup (say in an `@Initializer`). That will receive records sent to
any sublogger.
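
A minimal sketch of that idea using plain java.util.logging, outside of Jenkins.
The names `RootHandlerSketch` and `CollectingHandler` are hypothetical, and the
handler only keeps records in memory rather than sending them anywhere. In a
plugin, the `install()` call would presumably live in a method annotated with
`@Initializer`, which is not shown here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class RootHandlerSketch {
    /** Hypothetical handler that keeps WARNING and above in memory. */
    static class CollectingHandler extends Handler {
        final List<LogRecord> records =
                Collections.synchronizedList(new ArrayList<>());

        CollectingHandler() {
            setLevel(Level.WARNING);
        }

        @Override
        public void publish(LogRecord record) {
            // isLoggable applies this handler's level (and filter, if any).
            if (isLoggable(record)) {
                records.add(record);
            }
        }

        @Override
        public void flush() {}

        @Override
        public void close() {}
    }

    /** Attaches one handler to the root logger ("") and returns it. */
    static CollectingHandler install() {
        CollectingHandler handler = new CollectingHandler();
        Logger.getLogger("").addHandler(handler);
        return handler;
    }
}
```

Because child loggers pass records up to their parents' handlers by default, a
warning logged on any named logger reaches this handler without the logger ever
being touched.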

To Bobby’s response, well yes you would miss some early messages that
way, but Jenkins core is already capturing a bunch of records at
`INFO`+

https://github.com/jenkinsci/jenkins/blob/09e1f8cd0ca173f3526f016e9ff18410fb422807/core/src/main/java/hudson/WebAppMain.java#L84-L295

so in your initializer you can check for anything interesting in that
buffer too. Anything in Jetty startup prior to `contextInitialized` is
unlikely, and these would typically not be bugs in software we were
updating via Evergreen anyway.


Anyway, +1 for concept. Will this be a JEP?

R. Tyler Croy

Feb 16, 2018, 11:17:40 AM
to jenkin...@googlegroups.com
(replies inline)

On Fri, 16 Feb 2018, Jesse Glick wrote:

> On Fri, Feb 16, 2018 at 9:51 AM, R. Tyler Croy <ty...@monkeypox.org> wrote:
> > I currently define "error telemetry" to include:
> >
> > * Uncaught exceptions which cause the Evil Jenkins 500 page
> > * Logged ERROR messages, with or without exceptions
>
> Absolutely.
>
>
> > * Logged WARN messages, with or without exceptions
>
> Possibly. Cf.
>
> https://github.com/jenkinsci/jenkins/blob/09e1f8cd0ca173f3526f016e9ff18410fb422807/test/src/test/java/jenkins/model/StartupTest.java#L38-L45
>
> But beware that the list of plugins which emit rather spurious
> `WARNING`s is pretty long at the moment. I guess we will have the
> chance to start fixing those cases among Essentials plugins, but how
> do you plan to filter out cases coming from “inessential” [am I
> allowed to say that?] plugins over which we have little control?


Indeed! I've seen, and reported, a lot of these spurious warnings in JIRA based
on my work with the "Code Valet" experiment/prototype. Fortunately most
developers are fine with tweaking the log levels for spurious WARNING logs which
might confuse the already beleaguered Jenkins administrator :)


> > a background busy-waiting thread which loops
> > over all the loggers in the JVM, and adds the SentryHandler to loggers.
>
> Huh?? All you need to do is add a handler to the root logger, once at
> startup (say in an `@Initializer`). That will receive records sent to
> any sublogger.


Heh, I'm aware that it's not a good approach, thus my questions here.

See also: http://littlefun.org/uploads/52309db3e691b236df7d6b76_736.jpg


> To Bobby’s response, well yes you would miss some early messages that
> way, but Jenkins core is already capturing a bunch of records at
> `INFO`+
>
> https://github.com/jenkinsci/jenkins/blob/09e1f8cd0ca173f3526f016e9ff18410fb422807/core/src/main/java/hudson/WebAppMain.java#L84-L295
>
> so in your initializer you can check for anything interesting in that
> buffer too. Anything in Jetty startup prior to `contextInitialized` is
> unlikely, and these would typically not be bugs in software we were
> updating via Evergreen anyway.
>
>
> Anyway, +1 for concept. Will this be a JEP?


Definitely! This will be in a JEP as my goal is to provide comprehensive design
documents (via JEP) for as much of Jenkins Essentials as possible for two
reasons: it's the right thing to do, and also it helps me identify gaps and
build a better thing.


Thanks Jesse and Bobby for the ideas, I'll experiment with the custom logger
configuration file approach and verify that it will work as suggested.

Baptiste Mathus

Feb 17, 2018, 4:54:41 PM
to Jenkins Developers


On Feb 16, 2018 at 15:51, "R. Tyler Croy" <ty...@monkeypox.org> wrote:

One of the necessary details, in my opinion, to make Jenkins Essentials [0]
successful is providing near-real-time error telemetry. Coupled with the
"Evergreen" distribution system [1], error telemetry "post-deploy" will be
absolutely crucial to determine whether or not we have just pushed out bad code
worthy of reverting.

I currently define "error telemetry" to include:

 * Uncaught exceptions which cause the Evil Jenkins 500 page
 * Logged ERROR messages, with or without exceptions
 * Logged WARN messages, with or without exceptions

Totally agreed, automated reporting is a must.

Shouldn't the evergreen client send feedback too? Like if it triggered a Jenkins restart and never heard back since?

How about also a less automated /form/ in the Jenkins UI itself, to be used by a human in case something is clearly wrong but didn't cause logs or outages?
Along those lines, probably also a clear web UI somewhere in case everything went wrong.

General thought/note: this will probably require some setup so that attackers cannot trigger an auto-revert by sending bad reports to the telemetry endpoint.

R. Tyler Croy

Feb 21, 2018, 9:44:52 AM
to Baptiste Mathus, Jenkins Developers
(replies inline)

On Sat, 17 Feb 2018, Baptiste Mathus wrote:

> On Feb 16, 2018 at 15:51, "R. Tyler Croy" <ty...@monkeypox.org> wrote:
>
>
> One of the necessary details, in my opinion, to make Jenkins Essentials [0]
> successful is providing near-real-time error telemetry. Coupled with the
> "Evergreen" distribution system [1], error telemetry "post-deploy" will be
> absolutely crucial to determine whether or not we have just pushed out bad
> code
> worthy of reverting.
>
> I currently define "error telemetry" to include:
>
>  * Uncaught exceptions which cause the Evil Jenkins 500 page
>  * Logged ERROR messages, with or without exceptions
>  * Logged WARN messages, with or without exceptions
>
>
>
> Totally agreed, automated reporting is a must.
>
> Shouldn't the evergreen client send feedback too? Like if it triggered a
> Jenkins restart and never heard back since?


Your questions are definitely on the right track but I have been mentally
segmenting Jenkins _error_ telemetry from "generalized telemetry." For example,
my thinking recently evolved to change an "update" service to a "status"
service to more thoroughly accommodate the "status" from evergreen-client (for
example, is Jenkins online, what version, how long has it been online,
etc).

> How about also a less automated /form/ in the Jenkins UI itself, to be used by
> a human in case something is clearly wrong but didn't cause logs or outages?
> Along those lines, probably also a clear web UI somewhere in case everything
> went wrong.

I like the idea in theory, but in practice I believe we would get a tremendous
amount of low-signal "bug reports" through any such functionality and I don't
have the capacity to triage and handle that kind of feedback from users, thus
the automated routes :)


> General thought/note: this will probably require some setup so that attackers
> cannot trigger an auto-revert by sending bad reports to the telemetry endpoint.

Well certainly, "don't 100% trust client data" should be a foundational
principle for most applications :)


As an aside, your mail client sure likes to do non-standard quoting and inline
replies :/

Jesse Glick

Feb 21, 2018, 10:58:11 AM
to Jenkins Dev
On Wed, Feb 21, 2018 at 9:44 AM, R. Tyler Croy <ty...@monkeypox.org> wrote:
>> How about also a less automated /form/ in the Jenkins UI itself, to be used by
>> human in case something is clearly wrong but didn't cause logs or outages.
>
> I like the idea in theory, but in practice I believe we would get a tremendous
> amount of low-signal "bug reports" through any such functionality and I don't
> have the capacity to triage and handle that kind of feedback from users

Maybe a simple “sad face” or “bug” button somewhere in a standard footer?

If a user clicks it, we send back just the top-level Stapler view
being rendered (since a full URL would pose privacy concerns), e.g.:
`hudson/model/Run/console.jelly`

Not very precise feedback, but would at least alert us to sudden
spikes in problems encountered (according to the user’s evaluation) in
certain kinds of pages.

Robert Sandell

Feb 21, 2018, 11:14:18 AM
to jenkin...@googlegroups.com
How do you feel about Jenkins today? 👍👎

:)


Robert Sandell

Feb 21, 2018, 11:23:18 AM
to jenkin...@googlegroups.com
Or if you've ever seen one of these panels at the checkout counter in a store somewhere:

http://customerstrategy.net/customer-experience-improvement-systems-part-1/

R. Tyler Croy

Feb 21, 2018, 11:45:31 AM
to Robert Sandell, jenkin...@googlegroups.com
(replies inline)

On Wed, 21 Feb 2018, Robert Sandell wrote:

> Or if you've ever seen one of these panels at the checkout counter in a
> store somewhere
>
> http://customerstrategy.net/customer-experience-improvement-systems-part-1/


Heh, I've seen those outside bathrooms, which I find kind of gross; I'm not
touching those buttons. :)

I understand the potential utility in the system proffered by jglick, so I will
keep that idea in mind, but punt it down the road for when we need to better
understand our users' relationship with Jenkins Essentials, rather than the
system itself.

Right now my primary focus is on automating the collection of useful telemetry
about the system in aggregate. Once that is in place I'm happy to entertain
further iterations to gather more fine-grained information.

R. Tyler Croy

Feb 28, 2018, 7:17:39 PM
to jenkin...@googlegroups.com

I just wanted to make sure I shared an update on this thread. I have filed the
following ticket (https://issues.jenkins-ci.org/browse/JENKINS-49805) to
capture some of the prototyping work necessary here.

I expect we'll have a draft JEP coming up after the prototyping work is done.


Thanks for the ideas and feedback everybody!