Pre-JEP check: Jenkins telemetry

Daniel Beck

Aug 16, 2018, 7:14:23 PM

to Jenkins Developers

Hi everyone,

This is a pre-JEP email per https://github.com/jenkinsci/jep/tree/master/jep/1#discuss-the-idea-with-the-community

Please provide feedback on whether this looks worthwhile/feasible, and what questions you'd like to see answered in the JEP.

Thanks!
Daniel

---

We are occasionally in the situation that we need to understand how Jenkins is used or configured, typically to estimate the feasibility or impact of a planned change. As Jenkins is not a hosted SaaS, but distributed for local installation, we have very limited ways to get any information from our users in advance of actually performing the changes. Combined with the complexity of having 1500+ plugins with virtually unrestricted access to internal APIs, this means the risk of problems due to major changes is rather high. While we have _some_ tooling (mostly related to Java API use across components), that might only cover a small subset of what we'd need to know.

While there are anonymous usage statistics, the data provided is fairly limited (mostly basic master/agent setup, job types and count, and installed plugins), and the collection protocol does not allow for it to be easily extended with additional information. Not to mention that the server-side processing is a mess and very difficult to change.

Therefore I propose a second, independent usage telemetry collection system with the following properties:

* Handles specific information needs: While the proposed basic collection service (both client and server) is generic, the specific information would be collected using extension implementations that are tailored to a specific information need.
* Defined, limited collection time span: Data for a specific purpose is only collected during a specific time frame determined in advance.
* Admin controlled: Data is collected if and only if the instance allows anonymous usage statistics collection.
* Restricted data access: Only select board members and officers of the Jenkins project have access to the raw collected data and will make that data available in an aggregate form to specific, previously determined developers.

This will keep both the client side and server side simple -- we’re not plotting monthly updated graphs of install stats, but rather collecting information for a few weeks, evaluating it (perhaps just by grepping a bunch of files), and then it’s over.

What follows below is an _example_ for a specific information need to illustrate a potential use case. It's not what the JEP would be about -- the JEP would define the generic infrastructure (extension point, collection service, etc.), not a specific information collector. Information collected is also not necessarily related to system properties, but can, in principle, be any information. We'd also never want to collect actual user data like job or user names, but just anonymous information about configuration and usage.

----

Example use case:
The Jenkins security team routinely adds 'escape hatches' to security fixes, typically in the form of system property flags that can disable a fix. We don’t know whether they’re actually needed, or how commonly they’re used. While we consider system properties unsupported (i.e. they can, in theory, be removed at any time), to the best of my knowledge we’ve never done so unless they simply didn’t apply anymore.

So I'd like to collect the values of several system properties for a duration of 4 weeks:

* hudson.ConsoleNote.INSECURE (as boolean)
* hudson.model.ParametersAction.keepUndefinedParameters (as boolean)
* hudson.model.User.allowNonExistentUserToLogin (as boolean)
* hudson.model.User.allowUserCreationViaUrl (as boolean)
* hudson.model.User.SECURITY_243_FULL_DEFENSE (as boolean)
* Whether hudson.model.ParametersAction.safeParameters is set to a non-empty string.
* Whether hudson.model.DirectoryBrowserSupport.CSP is defined, and if so, whether it's an empty string.

Additional metadata to be collected to provide context for some of the above:

* Jenkins core version
* Number of user records on disk
* Class name of the configured security realm
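
As a rough illustration of the example above, the property checks could be gathered into a flat map. This is only a sketch under stated assumptions, not the proposed implementation: the class name `EscapeHatchSnapshot` and the key suffixes (`.nonEmpty`, `.defined`, `.empty`) are hypothetical, the properties are read through plain `java.lang.System` rather than any Jenkins API, and the additional metadata (core version, user count, security realm) is left out because it would need Jenkins internals.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch; this class is not part of Jenkins. */
final class EscapeHatchSnapshot {

    /** Flags from the list above that are reported as plain booleans. */
    static final String[] BOOLEAN_FLAGS = {
        "hudson.ConsoleNote.INSECURE",
        "hudson.model.ParametersAction.keepUndefinedParameters",
        "hudson.model.User.allowNonExistentUserToLogin",
        "hudson.model.User.allowUserCreationViaUrl",
        "hudson.model.User.SECURITY_243_FULL_DEFENSE"
    };

    /** Reads the JVM system properties and returns only booleans, never raw values. */
    static Map<String, Object> collect() {
        Map<String, Object> data = new LinkedHashMap<>();
        for (String name : BOOLEAN_FLAGS) {
            // Boolean.getBoolean is true only if the property is set to "true"
            // (case-insensitive); an unset flag is reported as false here,
            // regardless of what default Jenkins itself applies.
            data.put(name, Boolean.getBoolean(name));
        }

        String safe = System.getProperty("hudson.model.ParametersAction.safeParameters");
        data.put("hudson.model.ParametersAction.safeParameters.nonEmpty",
                safe != null && !safe.isEmpty());

        String csp = System.getProperty("hudson.model.DirectoryBrowserSupport.CSP");
        data.put("hudson.model.DirectoryBrowserSupport.CSP.defined", csp != null);
        data.put("hudson.model.DirectoryBrowserSupport.CSP.empty",
                csp != null && csp.isEmpty());
        return data;
    }

    public static void main(String[] args) {
        collect().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Reporting only booleans keeps the actual property values (for example a custom CSP string or a list of parameter names) out of the submitted data.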

Oleg Nenashev

Aug 17, 2018, 7:17:54 AM

to Jenkins Developers

Hi Daniel,

Do you plan to implement this data collection in the core directly or in a plugin?

In any case, it looks like a good opportunity for us to get some data, so +1 for implementing the engine.

Obviously, we need to review conflicts with JEP-308 (Telemetry API for evergreen) so that the proposals do not conflict with each other.

It would be great if we could somehow use the same APIs for both cases.

BR, Oleg

Daniel Beck

Aug 17, 2018, 7:57:26 AM

to Jenkins Developers

> On 17. Aug 2018, at 13:17, Oleg Nenashev <o.v.ne...@gmail.com> wrote:
>
> Do you plan to implement this data collection in the core directly or in a plugin?

Core (or core module).

> Obviously, we need to review conflicts with JEP-308 (Telemetry API for evergreen) so that the proposals do not conflict with each other.

The client side of that is outside Jenkins itself, while mine would be inside. This alone means it's practically independent.

Jesse Glick

Aug 17, 2018, 9:38:41 AM

to Jenkins Dev

On Fri, Aug 17, 2018 at 7:57 AM Daniel Beck <m...@beckweb.net> wrote:
>> Do you plan to implement this data collection in the core directly or in a plugin?
>
> Core (or core module).

Fine for the initial example of collecting system properties, but in
general this sort of collection will need to make use of plugin code.
For example, Pipeline developers might like to know things such as how
many `FlowNode`s are created per build on average. This is currently
sent to `support-core` by `workflow-cps` but we would like to gather
this sort of thing anonymously. (Note that `support-core` already
includes an anonymization system.) I see no straightforward way to
gather such a metric without explicit support from one of the Pipeline
plugins. That implies an _API_ defined in core for transmitting data
but _implementations_ possibly spread across core and various plugins.
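
To make that split concrete, here is a minimal sketch, assuming a contract owned by core and an implementation contributed by a plugin. Every name in it (`MetricCollector`, `CollectorRegistry`, `NodesPerBuildCollector`, the `pipeline-nodes-per-build` id) is hypothetical, the registry is a plain list standing in for whatever lookup mechanism core would really use, and the per-build node counts are passed in directly rather than read from any Pipeline API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical contract that core would own. */
interface MetricCollector {
    String id();
    Map<String, Object> collect();
}

/** Stand-in for core's lookup of implementations; a plain list for illustration. */
final class CollectorRegistry {
    private final List<MetricCollector> collectors = new ArrayList<>();

    void register(MetricCollector collector) {
        collectors.add(collector);
    }

    /** Gathers one payload per registered collector, keyed by collector id. */
    Map<String, Map<String, Object>> collectAll() {
        Map<String, Map<String, Object>> payloads = new LinkedHashMap<>();
        for (MetricCollector collector : collectors) {
            payloads.put(collector.id(), collector.collect());
        }
        return payloads;
    }

    public static void main(String[] args) {
        CollectorRegistry registry = new CollectorRegistry();
        // Illustrative input only, not measured data.
        registry.register(new NodesPerBuildCollector(Arrays.asList(10, 20, 30)));
        System.out.println(registry.collectAll());
    }
}

/** What a Pipeline plugin might contribute: an aggregate, with no job or build names. */
final class NodesPerBuildCollector implements MetricCollector {
    private final List<Integer> nodeCountsPerBuild;

    NodesPerBuildCollector(List<Integer> nodeCountsPerBuild) {
        this.nodeCountsPerBuild = nodeCountsPerBuild;
    }

    @Override
    public String id() {
        return "pipeline-nodes-per-build";
    }

    @Override
    public Map<String, Object> collect() {
        double average = nodeCountsPerBuild.stream()
                .mapToInt(Integer::intValue)
                .average()
                .orElse(0.0); // no builds recorded yet
        Map<String, Object> data = new LinkedHashMap<>();
        data.put("builds", nodeCountsPerBuild.size());
        data.put("averageNodes", average);
        return data;
    }
}
```

The point of the shape is that core only needs to know the interface and how to transmit a payload; what goes into the payload stays with the component that understands it.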

And this just highlights a fundamental question: how exactly do you
plan to collect this kind of information during a “specific time
frame”? You can add the collector logic to a core weekly release, or a
plugin release, and enable the server side for that statistic at the
same time; and then later turn off the server and remove the collector
logic from the next component release. But users update software on
whatever schedule they like, so lots of systems will not have the
right collector logic in place during the window in which it is
allowed to send data. Will that not bias the results toward
installations that update frequently? Do you care?

>> we need to review conflicts with JEP-308 (Telemetry API for evergreen)
>
> The client side of that is outside Jenkins itself, while mine would be inside. This alone means it's practically independent.

Well, in terms of current implementation, but we would like to align
these things if at all possible so that developers of a given
statistic can collect it both from Evergreen systems or from (more or
less up to date) traditional installations as a single data stream.
After all, what you are describing is very much one of the key goals
of Evergreen (if not JEP-308 specifically):

https://github.com/jenkinsci/jep/blob/master/jep/300/README.adoc#connected

> ensure that Jenkins project developers receive useful error and usage telemetry to drive further improvements in Jenkins

Daniel Beck

Aug 17, 2018, 4:28:12 PM

to jenkin...@googlegroups.com

> On 17. Aug 2018, at 15:38, Jesse Glick <jgl...@cloudbees.com> wrote:
>
> On Fri, Aug 17, 2018 at 7:57 AM Daniel Beck <m...@beckweb.net> wrote:
>>> Do you plan to implement this data collection in the core directly or in a plugin?
>>
>> Core (or core module).
>
> Fine for the initial example of collecting system properties, but in
> general this sort of collection will need to make use of plugin code.
> For example, Pipeline developers might like to know things such as how
> many `FlowNode`s are created per build on average. This is currently
> sent to `support-core` by `workflow-cps` but we would like to gather
> this sort of thing anonymously. (Note that `support-core` already
> includes an anonymization system.) I see no straightforward way to
> gather such a metric without explicit support from one of the Pipeline
> plugins. That implies an _API_ defined in core for transmitting data
> but _implementations_ possibly spread across core and various plugins.

That makes sense. Yes, that should definitely be possible.

> And this just highlights a fundamental question: how exactly do you
> plan to collect this kind of information during a “specific time
> frame”? You can add the collector logic to a core weekly release, or a
> plugin release, and enable the server side for that statistic at the
> same time; and then later turn off the server and remove the collector
> logic from the next component release.

That's one option, coupled with collectors defining the start and end dates during which they collect and submit data, otherwise being disabled client-side. Combined with shutting down collection on the server side for specific collectors, that should take care of this restriction.
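
The client-side half of this could be sketched roughly as follows. It is a minimal illustration, not the actual design: `CollectionWindow` is a hypothetical name, both dates are assumed to be inclusive, and the `usageStatsEnabled` parameter stands in for the admin's anonymous usage statistics setting from the original proposal.

```java
import java.time.LocalDate;

/** Hypothetical sketch of a collector's client-side collection window. */
final class CollectionWindow {
    private final LocalDate start;
    private final LocalDate end;

    /** Both dates are mandatory; the window is assumed inclusive on both ends. */
    CollectionWindow(LocalDate start, LocalDate end) {
        if (start == null || end == null) {
            throw new IllegalArgumentException("start and end dates are mandatory");
        }
        if (end.isBefore(start)) {
            throw new IllegalArgumentException("end date must not precede start date");
        }
        this.start = start;
        this.end = end;
    }

    /** True while the given day lies within [start, end]. */
    boolean isActive(LocalDate today) {
        return !today.isBefore(start) && !today.isAfter(end);
    }

    /** Data is submitted only inside the window and only if the admin allows usage statistics. */
    boolean shouldSubmit(LocalDate today, boolean usageStatsEnabled) {
        return usageStatsEnabled && isActive(today);
    }

    public static void main(String[] args) {
        // Illustrative four-week window; the dates are made up for the example.
        CollectionWindow window =
                new CollectionWindow(LocalDate.of(2018, 9, 1), LocalDate.of(2018, 9, 28));
        System.out.println(window.shouldSubmit(LocalDate.now(), true));
    }
}
```

Because the check runs on the instance itself, a collector that ships in a release keeps quiet once its end date has passed, even if the installation is never upgraded to a release that removes it.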

> But users update software on
> whatever schedule they like, so lots of systems will not have the
> right collector logic in place during the window in which it is
> allowed to send data. Will that not bias the results toward
> installations that update frequently? Do you care?

Usage stats point to fairly good update behavior: More than a third of installations are on a release no older than eight weeks. This is true for both weeklies and LTS. That seems like a large enough user base to get telemetry from, while still being in a fairly limited time window.

Still, without extending the collection window to many months, we'd only really collect information on those 35-50% of users. It's also difficult to say right now how we'd handle LTS releases -- would collectors be eligible for backports? (I say yes, as optional extension points written such that they mostly silently fail.)

While my original intention was to limit the collection time span as much as possible while retaining useful data, it seems we might need to adapt over time to the upgrade behavior of our users. That said, I'd prefer to start with rather small windows by default, and extend them as needed based on our experience.

(Realistically, how much do we care about the experience of users who haven't upgraded in several months? We frequently reject their bug reports, they get no security updates, we only have a single LTS line…)

>>> we need to review conflicts with JEP-308 (Telemetry API for evergreen)
>>
>> The client side of that is outside Jenkins itself, while mine would be inside. This alone means it's practically independent.
>
> Well, in terms of current implementation, but we would like to align
> these things if at all possible so that developers of a given
> statistic can collect it both from Evergreen systems or from (more or
> less up to date) traditional installations as a single data stream.
> After all, what you are describing is very much one of the key goals
> of Evergreen (if not JEP-308 specifically):

To be useful, we need a large enough user base (one of the concerns you brought up above), and therefore cannot tie ourselves to Evergreen here, at least not in the short term.

When I first brought up this idea with Tyler a few months ago, IIRC the response (from him and/or Baptiste) was there's nothing we can realistically share. That's why I went ahead with an independent proposal.

Jesse Glick

Aug 20, 2018, 9:07:34 AM

to Jenkins Dev

On Fri, Aug 17, 2018 at 4:28 PM Daniel Beck <m...@beckweb.net> wrote:
> When I first brought up this idea with Tyler a few months ago, IIRC the response (from him and/or Baptiste) was there's nothing we can realistically share.

OK. I hope that changes at some point.

R. Tyler Croy

Aug 21, 2018, 5:59:25 PM

to jenkin...@googlegroups.com

(replies inline)

There are a couple key differences in requirements between what Evergreen is
doing, and what Daniel seeks:

* Storage requirements: I have no intention of making promises around data
expiration with telemetry necessary for Evergreen. Some of the feature
development questions I would like to answer require data over a long
timeline in order to properly answer. My understanding of what Daniel is
proposing would be much more targeted and specific.

* Point of observation: with Evergreen all telemetry collection, thus far, is
done intentionally from an _external_ point of view. In essence the
evergreen-client observes what a Jenkins instance outputs or makes
available to it, or to other consumers, like a standard monitoring tool.
This principle is fairly important to me, as I'm wary of a Schroedinger's
Jenkins problem, where instrumenting Jenkins causes differing behavior from
what other users of Jenkins core/plugins might encounter, thereby reducing
the value of the feedback we can provide to developers.

My understanding of what Daniel seeks to do is a fairly tight integration
into some fairly key aspects of Jenkins core and related tooling.
Instrumentation which I doubt would ever be accessible to anything but the
innards of Jenkins itself.

Personally, I think both approaches are necessary for different reasons and are
independently valid.

I suggest keeping the scope for internal instrumentation small for now without
attempting to come up with the one-global-instrumentation-approach to solve all
imaginable use-cases. I trust Daniel to implement something which can grow in
the future, and may provide a foothold for future instrumentation requirements.

We have a tendency to imagine new requirements any time something like this
comes up, and ultimately burden the contributor with heaps of scope, leading to
nothing ultimately being done.

IMHO the scope here is narrow enough, so from my perspective, go for it Daniel!

Cheers

Daniel Beck

Aug 30, 2018, 6:43:45 AM

to Jenkins Developers

JEP draft for review at:
https://github.com/jenkinsci/jep/pull/192

> --
> You received this message because you are subscribed to the Google Groups "Jenkins Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-de...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/jenkinsci-dev/20180821215738.GA2292%40grape.lasagna.io.
> For more options, visit https://groups.google.com/d/optout.

Daniel Beck

Sep 11, 2018, 6:28:52 PM

to jenkin...@googlegroups.com

Hi everyone,

We're making pretty good progress here.

The JEP got a few updates (one in progress[1]), Tyler is working hard on the infra service receiving the submissions[2], and the core PR is in review and is basically "just" lacking tests right now[3].

If you're interested in this proposal, now would be a good time to provide feedback.

Thanks!
Daniel

1: https://github.com/jenkinsci/jep/pull/196
2: https://github.com/rtyler/uplink/
3: https://github.com/jenkinsci/jenkins/pull/3604

Baptiste Mathus

Sep 12, 2018, 12:59:07 AM

to Jenkins Developers

Hello Daniel,

A few thoughts:

* How does this relate to JENKINS-32485? I seem to understand it could well cover it, apart from the required end date?

* "Time period (start and end date) during which data collection is enabled on the client (end date is mandatory)"

I am not sure I get this one (I am going to have a look at the implem, but I refrained on purpose since I guess the spec ideally ought to not need one to look in the code to get it). Does this mean plugins will add an extension point for the new Telemetry class, and at some point, after the end date, that code will be dead code (that will aggressively be removed from core I suppose)?

Do/should we define a max date? If the implem says it ends in 2099, I suppose that's as good as not putting any end date? :)

As the spec does cover the link and "reasoning" about how it relates to JEP 308, this proposal looks fine to me as such.

Daniel Beck

Sep 12, 2018, 4:02:47 AM

to Jenkins Developers

> On 12. Sep 2018, at 06:58, Baptiste Mathus <bma...@batmat.net> wrote:
>
> * How does this relate to JENKINS-32485? I seem to understand it could well cover it, apart from the required end date?

Basically that, and JENKINS-32485 suffers from tying itself, at least in the description, to the terrible usage statistics mechanism which doesn't really have the capacity for extensibility.

> * "Time period (start and end date) during which data collection is enabled on the client (end date is mandatory)"
>
> Does this mean plugins will add an extension point for the new Telemetry class, and at some point, after the end date, that code will be dead code (that will aggressively be removed from core I suppose)?

Yes.

> Do/should we define a max date? If the implem says it ends in 2099, I suppose that's as good as not putting any end date? :)

Something we'd need to handle via project policy. The idea is to gather information for a specific purpose, not "to have it". So any collection defined to run for longer than a few months will need to be looked at closely. Since the Jenkins infra team controls access to the collected data (and which data even gets stored), it should not be a problem in practice.

On top of that it should be noted that some plugins implement their own stats collection systems, some even ignoring the opt out for usage statistics for that. So it's not like this will somehow result in a new problem.
