Issues with the Fault Tolerance 4.1 metrics

Andrew Rouse

Jul 29, 2024, 9:58:42 AM
to MicroProfile
I posted this to the microprofile-dev mailing list on Friday, but after the lack of response, realised I probably should have posted here instead.

Just as we're coming up to release, we've had an issue raised about the new Open Telemetry metrics added in Fault Tolerance 4.1: https://github.com/eclipse/microprofile-fault-tolerance/issues/639

When we added the integration of Open Telemetry metrics, we used our existing integration with MP Metrics as a guide. However, we didn't realise that while both APIs have many of the same concepts (counters, gauges, histograms etc.) the details between the two are quite different and some of the choices we made for MP Metrics are not the right choices for Open Telemetry metrics.

We do now have several PRs open:
My PR to update the TCK to match the spec proposal: https://github.com/eclipse/microprofile-fault-tolerance/pull/641
An Open Liberty PR to update our implementation: https://github.com/OpenLiberty/open-liberty/pull/29216

If we were to go ahead with merging these, it would take a few days to get the changes into an Open Liberty build, run the TCK against it and create a new CCR.

Given that we're trying to get MP 7.0 released, I think the options are:

1. Delay MP 7.0 again so that this issue can be addressed
2. Remove MP Fault Tolerance 4.1 from MP 7.0 (and include Fault Tolerance 4.0 instead)
3. Release MP Fault Tolerance 4.1 in its current state (which would require a breaking change to fix these problems in a later release)

The only change in MP Fault Tolerance 4.1 is the addition of these Open Telemetry metrics, so personally I would prefer one of the first two options, rather than releasing it with these issues.

Regards,
Andrew Rouse

John Clingan

Jul 29, 2024, 5:48:48 PM
to MicroProfile
For option 1, how long of a delay?

John Clingan

Jul 29, 2024, 5:53:42 PM
to MicroProfile
Also, if we reverted to FT 4.0, what would FT 4.0 Metrics be integrating with since MP Metrics is being removed from the platform spec? I hope I worded this in a parseable way, LOL.

Andrew Rouse

Jul 30, 2024, 7:02:49 AM
to MicroProfile
> For option 1, how long of a delay?

I think we could have a new release, implementation and CCR ready by Monday 5th August, which would mean the ballot could start then and finish on the 19th.

> Also, if we reverted to FT 4.0, what would FT 4.0 Metrics be integrating with since MP Metrics is being removed from the platform spec? I hope I worded this in a parseable way, LOL.

FT 4.0 states "When Microprofile Fault Tolerance and Microprofile Metrics are used together, metrics are automatically added for ...."

So where an implementation provides both technologies they must work together, but if MP Metrics is not there then there's no requirement to produce metrics for Fault Tolerance.

FT 4.1 extends this section to additionally require integration with MP Telemetry if it's present.
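
To make the rule above concrete, here is a minimal sketch of it in plain Java. All the names (`FtMetricsIntegration`, `Backend`, `FtVersion`, `requiredBackends`) are made up for illustration and are not part of any MicroProfile API; the sketch only encodes the rule as described in this message, not the spec text itself.

```java
import java.util.EnumSet;
import java.util.Set;

// Illustrative only: every name here is hypothetical, not a MicroProfile API.
final class FtMetricsIntegration {

    // The two technologies Fault Tolerance can report metrics through.
    enum Backend { MP_METRICS, MP_TELEMETRY }

    enum FtVersion { FT_4_0, FT_4_1 }

    // Returns the backends an implementation would have to integrate with,
    // given the FT version and the technologies present at runtime.
    static Set<Backend> requiredBackends(FtVersion version, Set<Backend> present) {
        Set<Backend> required = EnumSet.noneOf(Backend.class);
        // FT 4.0 and 4.1: MP Metrics integration applies only if MP Metrics is present.
        if (present.contains(Backend.MP_METRICS)) {
            required.add(Backend.MP_METRICS);
        }
        // FT 4.1 only: MP Telemetry integration applies if MP Telemetry is present.
        if (version == FtVersion.FT_4_1 && present.contains(Backend.MP_TELEMETRY)) {
            required.add(Backend.MP_TELEMETRY);
        }
        return required;
    }

    public static void main(String[] args) {
        // A runtime with MP Telemetry but without MP Metrics.
        Set<Backend> present = EnumSet.of(Backend.MP_TELEMETRY);
        System.out.println(requiredBackends(FtVersion.FT_4_0, present));
        System.out.println(requiredBackends(FtVersion.FT_4_1, present));
    }
}
```

Under these assumptions, a runtime that has MP Telemetry but not MP Metrics would have no required Fault Tolerance metrics with FT 4.0, and would need the MP Telemetry integration with FT 4.1.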

I did also note while looking up these references that we need to document how to run the TCK with either the MP Telemetry or MP Metrics tests disabled.

Andrew

John Clingan

Jul 30, 2024, 10:40:38 AM
to MicroProfile
Given your feedback, this is a no-brainer: delay, since this is a reasonable timeframe. We can discuss this during the technical call today. Will you be able to join? We can cover it as the first topic.

Andrew Rouse

Jul 30, 2024, 11:57:23 AM
to MicroProfile
I have another commitment later this evening but I can join for the first 15 minutes. Thanks.

John Clingan

Jul 30, 2024, 12:09:03 PM
to MicroProfile

Got it.

Andrew Rouse

Jul 31, 2024, 8:28:15 AM
to MicroProfile
I have released 4.1-RC3 with this change included and will now stage the final release.