Diamond Dependencies and Upstreams at Scale


Gerald Wiltse

May 18, 2020, 4:48:03 PM
to jenkin...@googlegroups.com
TL;DR:
This ticket suggests that the Jenkins team open a discussion and plan to spend significant time on the problems described below. If you have experienced issues with diamond dependencies between Jenkins jobs, or want more features around upstream() and downstream() dependency handling, please read on and consider adding comments/votes to the ticket below. There are a lot of other related issues where you may have commented in the past; please use this ticket to update and aggregate those comments (maybe you've worked around the issue since, or found another approach).
https://issues.jenkins-ci.org/browse/JENKINS-57648 

Disclaimer:
The message below is based on my team's current understanding of Jenkins, and of the current plugin ecosystem for Pipeline jobs specifically. The call to action is based on that understanding. It's possible that there's some critical piece of information we are not aware of which renders this discussion completely or partially invalid. If so, we apologize; please let us know and help guide the discussion forward.

Introduction:
Jenkins upstream and downstream triggers seem to be a critical and fundamental abstraction for Jenkins when used at enterprise scale. In the abstract sense, they are the declarative primitives for defining a "dependency graph". There's another, non-declarative primitive, the build() step, which we can consider a "dynamic"/"runtime" version of downstream(). There are a number of Jira issues about features for the job dependency graph, referenced in the citations at the bottom of this message; some points in this message are borrowed from those issues.
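
To ground the terminology, here is a minimal sketch of both primitives in one Jenkinsfile (all job names are illustrative, not from any real setup):

    // This job is triggered declaratively by an upstream, and itself triggers a
    // further-downstream job at runtime.
    pipeline {
        agent any
        triggers {
            // Declarative edge: rebuild whenever 'libA' completes successfully.
            upstream(upstreamProjects: 'libA', threshold: hudson.model.Result.SUCCESS)
        }
        stages {
            stage('Build') {
                steps {
                    echo 'building this component against the latest libA'
                }
            }
            stage('Trigger downstream') {
                steps {
                    // Runtime edge: the build() step, the "dynamic" version of a
                    // downstream() declaration.
                    build(job: 'app1', wait: false)
                }
            }
        }
    }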

For the purpose of this message, we'll define one acronym:
 - "JDG" = job dependency graph

Focus:
This message is focused on the interaction between two related concepts:
- Upstream job triggers in pipelines
- The diamond dependency problem (locking/blocking/de-duplicating/etc of downstream builds)

History:
Jenkins has a very robust API for plugins and jobs to work with the job dependency graph. This API evolved over a long period of time and became very mature. Unfortunately, for the new Pipeline model, this entire body of work has been rendered unusable: plugins based on DependencyGraph do not work with pipelines. So, currently, in terms of "dependency graph" features for pipelines, Jenkins has lost a lot of ground. If we had a number of the old API's features in the new model, it's likely this message would be unnecessary.

There's a decent reason for this gap. It seems virtually impossible to update the existing API to support the new model, especially in a way that would be non-breaking. Also, the old DependencyGraph API had some fundamental limitations which had already been identified and needed to be overcome anyway. So it makes more sense to re-think any DependencyGraph-related features for pipelines and consult the old API for guidance where applicable. With so much going on in Jenkins' evolution, a massive project like a new from-scratch dependency graph API is not easy to justify without overwhelming public support, which simply hasn't existed yet.

The Hard Part for Jenkins:
It turns out that an overwhelming number of non-trivial pipeline cases rely on the build() step as the cornerstone of their pipelines, because it offers the unique power and flexibility of controlling the entire build tree from a single "job". That is likely one major contributing factor to the lack of public demand for better APIs for working with upstream() and downstream() constructs. Without conditional logic, upstream() and downstream() declarations just seem like non-starters compared to the build() step. Many organizations are now likely deeply entrenched in pipelines based on the build() step, and dis-incentivized to rethink them or get involved in this discussion.
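
For illustration, here is the kind of single orchestration job the build() approach encourages (a sketch only; the job names and the shape of the graph are made up):

    pipeline {
        agent any
        stages {
            stage('Base library') {
                steps {
                    build(job: 'libBase')
                }
            }
            stage('Intermediate libraries') {
                // The two legs of a diamond, built in parallel once libBase is done.
                parallel {
                    stage('libA') { steps { build(job: 'libA') } }
                    stage('libB') { steps { build(job: 'libB') } }
                }
            }
            stage('Application') {
                steps {
                    // The diamond closes here: app1 consumes both libA and libB.
                    build(job: 'app1')
                }
            }
        }
    }

Everything is in one place and fully controllable, which is exactly the appeal; the cost is that this one Jenkinsfile has to know about, and be updated for, every component in the tree.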

The Hard Part for Jenkins Users:
The hardest cases are enterprise organizations with large, complex JDGs with many hundreds of inter-related components. In these environments, designing, maintaining, and evolving the JDG is one of the most significant and challenging aspects of the Jenkins administration role. Smaller organizations have many of the same challenges, but design compromises and imperfections are less painful and costly. A lot of "annoyances" are acceptable up to a certain scale, BUT at some scale they become "completely unmanageable", and users in those environments are the ones this message is aimed at.

At some scale, the build() step goes from being a powerful, flexible "bread and butter" feature to becoming the "GOTO" of CI. It encourages large monolithic pipelines which are difficult to debug and manage, and which are anathema to the goal of modularity of components and their builds. In the OSS world, each component has separate builds on one or more build services, because that's a good general strategy for building components. This is not to say that the build() step is bad and we shouldn't use it, nor that ALL large, complex, monolithic builds can or should be replaced with modular ones. It is to point out that there is a scale and range of use cases in which separate modular builds would be preferred, but where users are forced to create and maintain large monolithic builds against their will because there is no clear alternative. Sadly, that scale isn't even that large.

The Wrong Direction for Scale:
As we said before, in an abstract sense we're talking about modeling a dependency graph. In a substantial number of cases, we're talking about jobs which align 1-to-1 with libraries and applications. In a substantial number of those cases, the JDG mirrors the dependency graph between the libraries in code.

In simple terms, we suggest that the problem with build() (and the downstream() declaration block) is that they define the graph in the wrong direction from a scalability and manageability perspective. They go in the opposite direction of the dependency declarations in the code. Dependency management is a very mature field with a lot of well-established principles, and one of the most fundamental is that downstream consumers specify their upstream dependencies, and NEVER the other way around. Upstream declarations are the most manageable way to define a JDG in Jenkins, or at least they would be if they received some necessary capabilities.
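
Concretely, this is what the "consumer declares its dependencies" direction looks like with today's upstream trigger (job names are illustrative):

    // app1's own Jenkinsfile: the downstream consumer declares which upstreams
    // it depends on, mirroring how dependency declarations work in code.
    pipeline {
        agent any
        triggers {
            // Rebuild whenever either library completes successfully.
            upstream(upstreamProjects: 'libA,libB',
                     threshold: hudson.model.Result.SUCCESS)
        }
        stages {
            stage('Build') {
                steps {
                    echo 'resolve the latest libA and libB, then build app1'
                }
            }
        }
    }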

It's worth pointing out that Jenkins jobs are different from code. Not all JDGs mirror library and application dependency graphs. Jenkins has far more use cases featuring huge diversity, so such strict principles don't apply to all JDGs. There are lots of cases that are handled just fine with the current features. We're not suggesting there's one true way.

A Future for Upstreams():
Here's what we ARE suggesting: Jenkins NEEDS something new for JDGs which ARE this way (bigger than can be managed with downstream() and build()). Furthermore, we feel that at least one new approach to solving this problem should be based on making upstream() declarations work for more cases. For this, they must gain new capabilities, including advanced features around the statuses of downstream jobs. Remember that even in a graph where all dependencies are declared in the upstream direction, at runtime it's still a graph which has a downstream direction. We could still have a condition like "whenAllDownstreamsComplete()", even for a job that declares no downstreams. It's not easy to re-think some of the current build() patterns in terms of upstream() declarations; a lot of questions come up, the biggest of which is "Diamond Dependencies".
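
To make that concrete, here is a purely hypothetical sketch of what such a declaration might look like (whenAllDownstreamsComplete() is invented for illustration and does not exist in Jenkins or any plugin we know of):

    pipeline {
        agent any
        triggers {
            // Real, existing trigger: this job follows its declared upstreams.
            upstream(upstreamProjects: 'libA,libB',
                     threshold: hudson.model.Result.SUCCESS)
        }
        options {
            // HYPOTHETICAL option, for discussion only: hold this build until every
            // downstream consumer of the previous build has finished, even though
            // this job declares no downstreams itself.
            whenAllDownstreamsComplete()
        }
        stages {
            stage('Build') {
                steps {
                    echo 'safe to rebuild: the downstream side of the graph is idle'
                }
            }
        }
    }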

Diamond Dependencies:
Diamond dependencies are actually a problem in large-scale JDGs whether you use upstream(), downstream(), or build() declarations. The one and only solution for pipelines today is the build() step strategy, because with it you effectively have the ability to resolve any possible issue by throwing more code at it. You can set up a locking system and complex functions which gate the triggering of downstream jobs on arbitrary conditions. You can engineer a solution all your own, tailored to each individual pipeline. However, if you want to use upstream() or even simple downstream() blocks to declare your graph, your options for gating diamond dependencies are very limited and unpleasant. You're better off giving up on upstream/downstream and calling build() everywhere you need it, at whatever price that comes with.
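
For example, a minimal sketch of the kind of gating the build() strategy allows, assuming the Lockable Resources plugin is installed (job and resource names are illustrative):

    // Scripted snippet from the end of libA's and libB's pipelines: both legs of
    // the diamond request an app1 rebuild, but a shared lock serializes those
    // rebuilds so app1 never runs twice concurrently. None of this is declared
    // anywhere Jenkins can reason about -- it is all hand-rolled per pipeline.
    lock(resource: 'app1-rebuild') {
        build(job: 'app1', wait: true, propagate: false)
    }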

We've thought of several reasonable ways to approach the diamond dependency problem with upstream() declarations. It's all technically possible, not quite as hard as it seems at first, and could actually be generalized nicely. We would very much like to go into more detail with the Jenkins core team in the future.

Moving forward on these topics:

To move forward, there needs to be some consensus from the core team that:

1. The scalability problems of downstream() and build() for some cases are acknowledged
2. There's some willingness to discuss advanced upstream() cases in more detail
3. There's some willingness and ability to actually work on this if we can agree on a path forward

If we've failed to make the case strongly enough, or the team is simply too busy or disinclined to work on these features, then we can shelve the idea until some future date. However, we're struggling with the problems now, so we wanted to bring it up now.  Also, we noticed some updates in some of the cited issues that seem to indicate some work might be underway, so we wanted to get our feedback in before it's too late.

Beyond Static Upstreams():
If Jenkins jobs can be given better comprehension of, and functionality around, upstream() triggers and the dependency graph in general, then several exciting things become possible. For example, we can imagine plugins and shared library functions which dynamically define blocks such as triggers{myUpstreamsFunction()} by enumerating a build-system or package-manager file for any language. Something similar has been done in the past with Maven, but this approach would be much more logical and generalizable.
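
To illustrate, a rough sketch of such a shared-library helper (the function name, file format, and job-naming convention are all assumptions, not an existing plugin):

    // vars/myUpstreamsFunction.groovy in a shared library (hypothetical).
    // Reads a one-dependency-per-line file from the workspace and maps each entry
    // to the Jenkins job assumed to build it, returning an upstreamProjects string
    // suitable for the upstream() trigger.
    def call(String depsFile = 'deps.txt') {
        def deps = readFile(depsFile).readLines()
                .collect { it.trim() }
                .findAll { it && !it.startsWith('#') }
        // Assumed convention: the component "foo" is built by the job "libs/foo".
        return deps.collect { "libs/${it}" }.join(',')
    }

The open question, and part of what this proposal asks for, is first-class support for feeding such a computed dependency list into the trigger and graph machinery.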

Regardless of the outcome, thanks to all who took the time to read.

Regards,
Jerry

Citations:

Much needed dependency management between jobs
https://issues.jenkins-ci.org/browse/JENKINS-19728

Diamond dependency descriptions:
https://issues.jenkins-ci.org/browse/JENKINS-19728?focusedCommentId=216076&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-216076

Generalize DependencyGraph to Job (or ParameterizedJobMixIn)
https://issues.jenkins-ci.org/browse/JENKINS-29913

RPM use case: 
https://issues.jenkins-ci.org/browse/JENKINS-29913?focusedCommentId=335785&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-335785

Block when downstream jobs are broken
https://issues.jenkins-ci.org/browse/JENKINS-19727
Open since 2013

Block Pipeline job while upstream or downstream projects are building
https://issues.jenkins-ci.org/browse/JENKINS-37718

Provide a 'prerequisites' sections (like triggers) that allows blocking the execution of a job
https://issues.jenkins-ci.org/browse/JENKINS-52203

Lock label does not work with upstream trigger
https://issues.jenkins-ci.org/browse/JENKINS-54627

Add New DependencyGraph API's for declarative pipelines
https://issues.jenkins-ci.org/browse/JENKINS-57648
 

Gerald R. Wiltse
jerry...@gmail.com

Jesse Glick

May 18, 2020, 5:55:26 PM
to Jenkins Dev
As far as I know there is no serious work in progress in this area,
and no particular plan for work on it from the “core team” (maybe a
misleading phrase).

Indeed `DependencyGraph` as currently defined is very rigid and could
not work well even for moderately subtle Pipeline scenarios, so it
does not seem worth trying to adapt.

You can define more sophisticated variants of `ReverseBuildTrigger` in
plugins, though I would tend to discourage doing this sort of thing at
the Jenkins level to begin with. Instead it is likely more scalable to
have “downstream” builds be triggered by some external event, such as an
artifact appearing in Nexus or an image in a Docker registry.
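
For example (a sketch assuming the Generic Webhook Trigger plugin is installed; the token and JSONPath below are placeholders for whatever the repository manager sends):

    // Downstream component Jenkinsfile triggered by a repository-manager webhook
    // rather than by a Jenkins upstream relationship.
    pipeline {
        agent any
        triggers {
            GenericTrigger(
                token: 'libA-published',
                genericVariables: [[key: 'artifactVersion', value: '$.component.version']],
                causeString: 'libA $artifactVersion published'
            )
        }
        stages {
            stage('Build') {
                steps {
                    echo "rebuilding against libA ${env.artifactVersion}"
                }
            }
        }
    }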

Alternatively, you can keep trigger management outside of component
Pipelines altogether, defining some sort of orchestration project that
uses the `build` step internally but in a computed graph. Or this
orchestration can be done by external tools designed for that purpose,
for example using the Jenkins REST API to trigger builds.

If some larger and more intrusive concept of dependency graphs needs
to make its way into fundamental APIs so that a variety of plugins can
interoperate based on a common understanding of project relationships
(for example so the graph can be displayed in build visualizations),
then someone would need to file a JEP for it and commit to writing a
reference implementation and driving integrations. The added
complexity would need to be justified by new abilities that a lot of
people could enjoy without too much migration effort.

Some inertia stems from the fact there is no obvious, straightforward,
single best practice for doing CI when you have hundreds of
interrelated components. Some organizations use a monorepo and use
various tools to cache partial build results. Others prefer microrepos
with subtle triggering relationships and special workflows. The build
system often frames the problem. If you have a particular model in
mind then you are in a position to sketch a tool which would help you
and others in the same situation.

Gerald Wiltse

May 19, 2020, 10:30:45 AM
to jenkin...@googlegroups.com
I find this feedback very encouraging. It definitely does seem like a good candidate for a JEP proposal; I will plan for that in the mid-term future.

Your suggestions about alternatives are all right on. In one large environment, I created a solution with a metajob that received webhooks from all jobs, used the Jenkins REST APIs to query all job configurations, correlated the hooks to metadata, and used build() to trigger the appropriate jobs. This effectively represented an alternative downstream mapping mechanism. It worked for its purpose and is still in production today. In the end, we looked back and squinted at it, and could see that with a few very deep yet reasonable (likely non-breaking) changes to the upstream/downstream system, Jenkins could do the same logic natively. That largely led to this thread. Right now, I'm engineering a solution for a different use case which is similar in scope and related to the topic, yet different enough to teach some new things. At the end of this, I think I will have an even better mix of perspectives to guide me through a JEP. I apologize in advance for bothering everyone in the future with my struggles creating the reference implementation.

To everyone reading: I would still like to collect support for this effort in the form of votes, comments, and other people's struggles in the issue I linked.

Regards,
Jerry

Gerald R. Wiltse
jerry...@gmail.com




Gerald Wiltse

May 20, 2020, 11:09:47 AM
to jenkin...@googlegroups.com
It's worth noting that I think a lot of good code, abstraction, and framing of the problem already exists here:


I'm not sure if Stefan Wolf still watches this list or actively contributes to Jenkins, but his insight would probably be invaluable. I will study his work and try to reach out to him.

Regards,
Gerald R. Wiltse
jerry...@gmail.com


Gerald Wiltse

May 20, 2020, 11:39:32 AM
to jenkin...@googlegroups.com
Upon further review, this uses an observer pattern, which is quite different from what I was envisioning. I'm still glad I found it, because it still captures the problem with visuals and helps demonstrate demand for a solution.

Gerald R. Wiltse
jerry...@gmail.com


Basil Crow

May 20, 2020, 11:43:44 AM
to jenkin...@googlegroups.com
On Wed, May 20, 2020 at 8:39 AM Gerald Wiltse <jerry...@gmail.com> wrote:
>
> it still captures the problem with visuals, and helps demonstrate demand for a solution.

When we were using Build Flow before the creation of Pipeline, my
users really liked the visualization provided by the buildgraph-view
plugin. I got a lot of complaints from my users when we switched to
Pipeline and this UI was removed.

Basil