Kogito Challenges with Quarkus Compatibility


Edoardo Vacchi

Jan 20, 2022, 9:26:46 AM
to Quarkus Development mailing list

I would like to take the opportunity of our recent, excellent cross-team collaboration on squashing a devmode-related issue [1,2] to discuss a couple of concerns that I have been meaning to raise for a while now:


(1) compatibility and testing of Quarkus features (esp. devmode/classloading)

(2) CR vs. Final releases


I have tried to keep this mail short and stick to the data as much as possible, with a few, most recent examples, so that together, we can discuss solutions and improve the overall development experience.  


- We all value moving fast, and we wouldn't want the Quarkus team to slow down just for the sake of it.

  • The business automation team definitely has its own issues with keeping up with the pace, but we are doing our best :-)

    • Part of it is due to the number of interconnected components that have to be cross-tested, but also the number and types of artifacts (e.g. cloud images) that have to be released:

      • all the BA-managed dependencies in Kogito have been released together at once, as part of the same train, for a while now

    • Part of it is due to technical limitations, that we've been trying to address (e.g. using JBoss infra instead of deploying to Maven Central)

    • Finally, like everyone, we have conflicting priorities and we cannot always interrupt other tasks to investigate Quarkus-related issues.


However, this has sometimes resulted in a poor experience for end users: Quarkus may ship with a broken Kogito release, or even not ship Kogito at all (luckily, this has only happened in extreme cases).


In our experience, the features that are most commonly a source of issues are:

  • native image building

  • devmode/classloading

  • integration between different extensions


Now:


  • with native-image, there is little we can do, as most issues often derive from changes in the upstream GraalVM native image builder

  • over DevMode, classloading and cross-extension integration we have more control 


For instance, when instrumentation was first introduced [3], it dramatically changed the behavior of "real devmode", sometimes departing significantly from the behavior that we could reproduce in supported DevMode tests. We were able to reproduce the behavior of instrumentation by using a mechanism (DevMojoIT [4]) that is supposedly for internal Quarkus use only.


A recent release (Quarkus 2.6) introduced a major bump of the Kafka version that inadvertently broke a Vert.X extension [5]. Admittedly, this extension was not among the advised Kafka connectors, but it still shipped as part of the core. I remember other similar incidents in the past (for instance with the OpenAPI extension). While evidence of such situations often surfaces within the Kogito codebase, this seems to suggest that some extensions may not be verifying some of their integration paths. Kogito, being a "leaf" consumer of many features, is more likely to incur breakage; but this also means that user-level code may incur similar breakage. Indeed, this is a thorny issue, because the compatibility matrix may be very large.


Another cause of disruption is that, sometimes in the past, fixes to issues of a CR release have been shipped as part of a Final release, without intermediate CRs. While it is completely understandable why this was so, it has sometimes (e.g. [5]) resulted in a change in behavior in the middle of the platform release window. As already mentioned, due to our release process, this window is already rather short, and issues occurring within it shrink it even further because of the time needed to find a solution and release the fix (sometimes on both ends).


What kind of actions can we take to further improve over the current state?


On the one hand, I think that, as the business automation/Kogito team, we may be able to contribute representative tests even to Quarkus core when some edge case is detected, instead of only creating integration tests in our own codebase. Again, the reason why such tests are often not provided on our end is that it is harder for us to create completely self-contained reproducers, because most bugs involve at least our Quarkus extension.


Do you have any other ideas on how to improve this process further?



[1] Zulip: Kogito and RESTEasy Reactive

[2] GitHub PR

[3] Zulip: Kogito extension: instrumentation may not reload some classes 

[4] DevMojoIT

[5] Kogito + 2.4.2.Final/2.5.0.CR1 issues

     quarkus-jackson no longer a (transitive) dependency

     Quarkus-provided Vert.X Kafka Client incompatible with the Quarkus Kafka client version (3.0.0) 



--
Edoardo Vacchi

Principal Software Engineer, PhD

Middleware Automation

Red Hat

Guillaume Smet

Jan 20, 2022, 1:06:09 PM
to Edoardo Vacchi, Quarkus Development mailing list
Hi Edoardo,

Thanks for your email. Always better to get things out :).

Two things from me:

> Another cause of disruption is that, sometimes in the past, fixes to issues of a CR release have been shipped as part of a Final release, without intermediate CRs.

Yes, we don't have a strict CR cycle for Quarkus. Getting a strict CR cycle would slow us down quite significantly as:
- we would have to release a new CR whenever we find an issue (and we fix quite a lot of issues between CR1 and Final)
- we would have to wait for the feedback from the field (so 1 week minimum)
- restart

Given we release every month, adding one or two weeks to the release cycle doesn't fly. Also, it makes things very unpredictable for the Platform members. You wouldn't have a fixed release date for Final but a rolling window so I don't think it would be sustainable for the Platform members as you wouldn't be able to schedule anything.

That being said, I don't think that's much of a problem:
- we try to be careful about what we merge after the CR1. Obviously, shit happens but it's not a systemic issue.
- it could also happen in micros as we push fixes all the time that might break Kogito (or other extensions) and we won't do CRs for micros

So I think it all boils down to us being careful with what we merge post CRs (and we are but sometimes we miss something) and my second point below.

> Do you have any other ideas on how to improve this process further?

Yes. We **need** to fix the Ecosystem CI for Kogito. For months, it hasn't been reliable so I personally do not check it anymore as I wasted a considerable amount of time tracking issues that were not Quarkus related.
We had it red every day for months. Sometimes it gets green for whatever reason and the next day it gets red again.
And every time I ask for it, I get another reason for it to be red. Which is understandable but, in the end, it's just impossible for us to track. I must admit I gave up on this.

Most of the problems would have been discovered far earlier if we were able to be warned properly when there is an issue.
It wouldn't have been perfect for sure but at least we would have more time to adjust things.

So, if you ask me, I would make this particular issue a big priority and that would help both teams a lot.

Cheers,

--
Guillaume

--
You received this message because you are subscribed to the Google Groups "Quarkus Development mailing list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to quarkus-dev...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/quarkus-dev/CACPHShxoOYA0cdzS-gX8%2Bu2XLC9i7VTX3Ujn%3DVYe2h3RMQ58gw%40mail.gmail.com.

Guillaume Smet

Jan 20, 2022, 1:15:14 PM
to Edoardo Vacchi, Quarkus Development mailing list
Also, maybe we should start documenting a bit more what gets consumed by various Platform members such as extensions, build items...

Sometimes we are under the impression that something won't have any impact on anything outside of the Core and we end up breaking Platform members.

I don't have a proper solution for that but that's probably something to think about.

Guillaume Smet

Jan 20, 2022, 1:19:32 PM
to Edoardo Vacchi, Quarkus Development mailing list
Oh and finally, maybe we should do post-mortems when we break something in Kogito/Camel/... and document the cases so that we could improve on how we handle things. I'm pretty sure we have some common patterns there.

We could typically present something in a few months with good practices based on this work.

Alexey Loubyansky

Jan 20, 2022, 2:58:14 PM
to Guillaume Smet, Edoardo Vacchi, Quarkus Development mailing list
This is what I was thinking as well, the ecosystem CI based on the main Quarkus branch is there exactly for this reason - to catch issues early. I was checking it and saw failures some weeks ago but it wasn't clear whether it was something on the Kogito side or the Quarkus main. I think once it fails we should clarify what caused it ASAP, otherwise, we keep evolving the codebase and it may result in more issues later. Ideally, the ecosystem CI has to be green before we release a Quarkus CR. Otherwise, we may need to apply significant changes in a short period of time between a CR and the Final to fix the integration issues, which may break other integration points.

Increasing test coverage for the integration points in Quarkus core is definitely a good idea.

Georgios Andrianakis

Jan 21, 2022, 1:35:41 AM
to Alexey Loubyansky, Guillaume Smet, Edoardo Vacchi, Quarkus Development mailing list
Very much +1 on putting effort into getting Kogito's Quarkus Ecosystem CI stable. I personally don't see how we can address the very valid concerns raised here without it.

Edoardo Vacchi

Jan 21, 2022, 8:17:34 AM
to Quarkus Development mailing list

Thanks everyone for chiming in.


So, to summarize:


  • it is not possible to introduce a further round of CR after a CR1, because it would make Final release dates less predictable

  • we need to address the issue with Kogito + Quarkus Ecosystem CI

  • document which components are being consumed by our extension 

  • introduce post-mortems after significant incidents


W.r.t. Ecosystem CI: indeed, I understand the sentiment. We ourselves have a list of nightly builds that are often in a broken state (e.g. the full native build), and in the last few weeks we have been trying to address those with a more thorough "guardian" schedule (i.e. we are taking turns at watching the logs); but admittedly, for some reason, the Quarkus Ecosystem CI was overlooked.


I am raising this issue to make sure that we are notified on more channels, so more eyes can be kept on it.


The Kogito Ecosystem CI would need further improvement too, because currently it only builds kogito-runtimes, while issues have sometimes also been found in kogito-apps and kogito-examples.


On the other hand, I think that we should not rely only on Kogito's Quarkus Ecosystem CI build, or we may end up being frustrated again and stop watching it again :-)


There are at least two reasons why it often breaks:


1) The Kogito surface itself is large because it integrates quite a few components. This is also one reason why we have nightly builds ourselves: to run longer integration tests, native builds, etc. We are also starting to consider some degree of manual testing to verify devmode and/or other features that have sometimes proven challenging to test in an automated way.


2) Because the surface is large, it integrates several different upstream components in the core, which means that breakage may occur in many different places.


Hence, as much as I am frustrated myself with it being routinely broken, even if we address it more promptly (which, again, is indeed necessary), it will continue to break often (even more so if we include -apps and -examples) unless we also extend the coverage of Quarkus core when it comes to integrating multiple extensions together.


In other words, we should strive to make it so that a red status in Kogito Ecosystem CI only means "Kogito did something wrong, that was rectified in Quarkus, and Kogito needs fixing", and not just "either Kogito or Quarkus broke, and the issue may be on either end".


<insert spiderman meme here>

:)



Max Rydahl Andersen

Jan 21, 2022, 9:07:16 AM
to Edoardo Vacchi, Quarkus Development mailing list

Interesting read and thanks for the link to issues that I'm still exploring.

I see it as a two-part solution:

  • fix Ecosystem CI for Kogito; as far as I can see, the Kogito team holds all the answers here, to add additional tests and keep an eye on it?

  • adding tests to Quarkus where the Kogito team finds there is not good enough coverage. I say that makes perfect sense to me. We have two places here, depending on what level the tests make most sense: in Quarkus core extensions and integration tests, or in the platform test suite.

And doing post mortems makes perfect sense for me.

/max


Edoardo Vacchi

Feb 1, 2022, 3:40:44 AM
to Quarkus Development mailing list
I am happy to report that the Ecosystem CI is back to green! Now let us all keep it that way! 🚀


 

Georgios Andrianakis

Feb 1, 2022, 4:31:36 AM
to Edoardo Vacchi, Quarkus Development mailing list
Today was a great day for CI jobs! All the ones I saw were green 🌲🌲🌲


Guillaume Smet

Feb 1, 2022, 4:47:12 AM
to Edoardo Vacchi, Quarkus Development mailing list
On Tue, Feb 1, 2022 at 9:40 AM Edoardo Vacchi <eva...@redhat.com> wrote:
> I am happy to report that the Ecosystem CI is back to green! Now let us all keep it that way! 🚀

Let me take a mental picture of what it looks like :).

Thanks!

--
Guillaume

Max Rydahl Andersen

Feb 1, 2022, 7:41:33 AM
to Georgios Andrianakis, Edoardo Vacchi, Quarkus Development mailing list
On 1 Feb 2022, at 10:31, Georgios Andrianakis wrote:

> Today was a great day for CI jobs! All the ones I saw were green 🌲🌲🌲

green and lean!

/max

