Konflux Multi-Arch Support: POC Conclusion and Future Plans (May 2025)


Barak Korren

Jun 3, 2025, 8:59:19 AM
to konflux-...@redhat.com, kon...@redhat.com, kon...@googlegroups.com

Summary

The Proof of Concept (POC) to evaluate Kata-peer-PODs as a replacement for the multi-platform-controller (MPC) has concluded. While the POC demonstrated the successful launch of s390x and ARM64 PODs, it also identified gaps that prevent Kata-peer-PODs and the OpenShift Sandboxed Containers (OSC) product from being production-ready for Konflux at this time. This report details the POC findings, identified gaps, and outlines the future plans for the Konflux infra team.

POC Results and Findings

  • POC Completion: The POC has concluded. Full investigation of PPC64LE (PowerPC) was not completed due to a shift away from a dynamic VM allocation strategy, which is currently unsustainable with the existing IBM cloud and IBM PowerVS architectures. IBM's upstream contributions to Kata-peer-PODs, working towards VM pooling, further support the need for a different approach.

  • VM Storage Allocation: Investigations into the VM storage allocation capabilities of the cloud drivers built into Kata-peer-PODs were dropped, as a Bring Your Own VM (BYOVM) architecture is becoming the preferred approach. Making dynamic provisioning with Kata-peer-PODs flexible enough would require upstream enhancements to its support for IBM Cloud and other clouds.

  • OSC Production Readiness: POC results indicate that OSC and its underlying Kata-peer-PODs technology are not currently production-ready for Konflux.

  • Privileged Permissions: Granting privileged permissions to workloads in Kata-peer-PODs works similarly to typical worker nodes, using ServiceAccounts and SecurityContextConstraints. It is recommended to avoid creating new privileged ServiceAccounts and instead grant more permissions to existing ones, with usage restricted by Conforma policies. This investigation is currently stalled.

Gaps with OpenShift Sandboxed Containers

  • Bring Your Own VM (BYOVM): Konflux requires flexibility in VM allocation and management, including pre-allocated VM pools and VM reuse. Pushing all features into Kata-peer-PODs' Cloud API Adapter (CAA) is unsustainable. Leveraging existing VM allocation solutions through a BYOVM driver for CAA is preferred. Some upstream work on VM pre-allocation has begun, but more is needed. VM reuse was discussed, but no progress was made.

  • Multi-arch Image Builds: OSC and upstream projects do not provide ready-made cloud images for all needed architectures. Significant time was spent building custom images. The image build process is complex, involving Rust and Go software components, multiple container images, and VM image creation using tools like Packer, Mkosi, or BootC. The OSC approach of building images during operator setup is also unsuitable for Konflux. A Konflux build process for ready-to-use PODVM images for all architectures is desired, preferably with a single-phase build process using a multi-stage Dockerfile/Containerfile.

  • OSC Version: The POC started with OSC 1.8, but a pre-release OSC 1.9 image was used because custom PODVM images required a newer version of the CAA component.
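
To make the BYOVM requirement above more concrete, here is a minimal Python sketch of a pre-allocated VM pool with reuse. It is only an illustration under stated assumptions and not how CAA, MPC, or any planned Konflux component is implemented: the names VM, VMPool, fake_provision, target_idle, and max_uses are all hypothetical, and the provisioner is a stand-in for whatever existing VM allocation solution a BYOVM driver would delegate to.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class VM:
    """Hypothetical record for a pre-allocated build VM."""
    vm_id: str
    arch: str
    uses: int = 0


class VMPool:
    """Hypothetical pool that keeps idle VMs ready per architecture."""

    def __init__(self, provision: Callable[[str], VM],
                 target_idle: Dict[str, int], max_uses: int = 1):
        self._provision = provision      # external allocator (assumed)
        self._target_idle = target_idle  # idle VMs to keep per arch
        self._max_uses = max_uses        # workloads allowed per VM
        self._idle: Dict[str, List[VM]] = {a: [] for a in target_idle}
        self.replenish()

    def replenish(self) -> None:
        """Provision VMs until every architecture meets its idle target."""
        for arch, target in self._target_idle.items():
            while len(self._idle[arch]) < target:
                self._idle[arch].append(self._provision(arch))

    def acquire(self, arch: str) -> VM:
        """Hand out an idle VM, falling back to on-demand provisioning."""
        idle = self._idle.setdefault(arch, [])
        vm = idle.pop() if idle else self._provision(arch)
        vm.uses += 1
        return vm

    def release(self, vm: VM) -> None:
        """Return a VM for reuse, or retire it once max_uses is reached."""
        if vm.uses < self._max_uses:
            self._idle.setdefault(vm.arch, []).append(vm)
        # A retired VM is simply dropped here; a real system would
        # deprovision it through the same external allocator.
        self.replenish()


_ids = itertools.count(1)


def fake_provision(arch: str) -> VM:
    """Stand-in provisioner; a real one would call a cloud API."""
    return VM(vm_id=f"{arch}-{next(_ids)}", arch=arch)


pool = VMPool(fake_provision, {"s390x": 1, "arm64": 2}, max_uses=2)
vm = pool.acquire("s390x")
pool.release(vm)                # under max_uses, so it goes back to the pool
again = pool.acquire("s390x")   # the same VM is handed out a second time
```

The point of the sketch is the separation of concerns: the pool only decides when to hand out, reuse, or retire a VM, while the actual allocation sits behind a single callable that could be backed by any existing provisioning system.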

Gaps with Other Software

  • RHEL 9.6: RHEL 9.6 is required to run on PPC, while RHEL 9.5 was used during the POC.

  • Tekton PodTemplate Support: Support for setting podTemplate on Tekton Tasks to enable multi-arch builds with Matrix is being worked on upstream. This may be less critical if architecture limits can be set within a single pipeline.

Future Work by the Konflux Infra Team

  • VM Provisioning and Pooling: Develop VM provisioning and pooling based on prior art (KONFLUX-8415).

  • MPC Evolution: MPC will be adapted to work on top of the new VM provisioning system, and the current MPC driver layer will be phased out.

  • Kata Integration: When the Kata project is able to close the gaps we mentioned, we can re-evaluate using it as a replacement for MPC.

Further Issues with Current Architecture

The following issues exist with the current MPC architecture. Our expectation was that using Kata-peer-PODs would help resolve them, but the POC shows that we will need custom solutions whether we adopt Kata or not. The Konflux infra team will need to plan for resolving these issues in the future.


  • Outdated VM Images: MPC's VM images are outdated. A multi-arch image building process is needed. OSC image-related solutions could address this.

  • Lack of VM configuration management: The VMs may be long-lived and may require specialized configuration settings for increased security, better monitoring, etc., so adopting a configuration management solution such as Ansible is desirable.

  • Image Security and Attestation: Image security is critical. The full attestation chain for image builds and proof of VM image integrity are needed. Attestation data for builds on VMs also needs to be reflected.

  • MPC Metrics: MPC lacks metrics gathering for workloads and VMs. Kata-peer-PODs provide POD-like metrics, which is an advantage.

  • VM System Logs: System logs from all VMs need to be gathered. Currently, neither MPC nor Kata-peer-PODs provide a solution.

  • Network Egress Logging: Network egress traffic for VMs needs to be logged.


Adam Kaplan

Jun 4, 2025, 9:41:01 AM
to Barak Korren, kon...@googlegroups.com
Hi Barak,

Thanks so much for your detailed update. There are a lot of areas that you have identified for improvement or changes in strategy (such as dynamic provisioning of VMs, which appears not to scale on some cloud providers).

Are you able to provide links to upstream issues (particularly for Kata and Confidential Containers) where these discussions and improvements are happening?

Thanks,
Adam



--

Adam Kaplan

He/Him

Principal Software Engineer

Red Hat

100 E. Davie Street

adam....@redhat.com    


Barak Korren

Jun 5, 2025, 4:37:28 AM
to Adam Kaplan, kon...@googlegroups.com
Hi Adam,

Here is what is going on upstream in this space, to the best of my knowledge:

VM Pooling:

Issue: #1317: Investigate usage of pre-created instance pool for faster pod creation time
PR: #2405: WIP: caa: add a feature to use pre-created VM pools

* Note that the approach taken there is to build pooling directly into the CAA drivers rather than going for BYOVM, but if it is implemented it will lay important groundwork for enabling VM pre-allocation in general.

BYOVM:

Issue: #1224: Azure: Use Azure Service Operator (ASO) for cloud resource management

The original issue focused on Azure and on offloading to an Azure-specific controller, but the discussion quickly moved to considering other similar controllers, including provider-agnostic ones.

Regards,
Barak.

--

Barak Korren

SPSE, Secure Flow (Konflux), Vanguard team.

Red Hat
