On 26/03/2021 17:46, Jan Kiszka wrote:
> On 26.03.21 17:14, Silvano Cirujano Cuesta wrote:
>> On 26/03/2021 10:43, Jan Kiszka wrote:
>>> On 26.03.21 10:14, [ext] Silvano Cirujano Cuesta wrote:
>>>> ...
>>>>
>>>> 1. Container has to run privileged.
>>> Yes, though that need will not vanish for Isar very soon when resolving
>>> the binfmt topic, as we know.
>> If I remember correctly, getting rid of this binfmt_misc configuration would enable us to run it with just one or two capabilities granted, instead of "--privileged".
>>
> Yes, but it would not change the fact that the build could break/attack
> the host. It would get us one step closer, true.
AFAIK only the capabilities SYS_ADMIN and MKNOD were needed. Although I assume that a privilege escalation is somehow possible with just those two, the level of expertise needed to do so isn't widely available... I've investigated the topic a bit for a project and I'm not aware of any technique capable of it.
>
>>>> ...
>>>>
>>>> Is it the only problem that the qemu-user binaries can be too old? I mean, just having qemu-user-static > 5.2 (the version being currently installed with the buster-backport) would be enough? That's at least my assumption for the proposed solutions.
>> Can anybody confirm that this is the issue? I mean, would a new (how new? which is the minimal version?) qemu-user binary suffice? The answer to this question is key to understand which problem the binfmt_misc configuration from the container was trying to fix in the first place.
> Config from the container is first of all addressing the issue that we
> have to run on any host distribution, not just Debian, and on Debian
> irrespective of the fact if the user installed qemu-user-static or not.
> That's relevant for the "Linux beginner can build an image" story. We
> only need to tell them to install docker and enable the logged in user
> to access it.
I can understand that use-case. But breaking the binfmt_misc system of a Linux beginner is not cool. And that's what the current approach potentially does.
My proposals don't assume any distribution at all. Only the statically built qemu-user binaries are required (you can build them yourself or extract them from a package distributing them), plus some tooling to register them (a simple script like [1] suffices). Packages provided by the distros are simply a convenient way of getting them.
IMHO that "linux beginner" use-case shouldn't be the default, but activated with a flag. And a clear message should make users aware of the consequences (even if they cannot understand it the moment they read it, they might keep it in their head until they stumble upon it).
[1]
https://github.com/qemu/qemu/blob/master/scripts/qemu-binfmt-conf.sh
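To make the registration step concrete: a binfmt_misc rule is a single line written to the kernel's register file, and the only difference between the two approaches discussed in this thread is the trailing "F" (fix_binary) flag. A minimal sketch, assuming a helper named binfmt_rule (hypothetical, not part of kas or QEMU) and assuming the magic and mask strings are copied verbatim from the script in [1]:

```shell
# Build one binfmt_misc registration line in the kernel's documented format:
#   :name:type:offset:magic:mask:interpreter:flags
# binfmt_rule is a hypothetical helper, not part of kas or QEMU.
binfmt_rule() {
    name=$1 magic=$2 mask=$3 interpreter=$4 flags=$5
    # Type "M" matches on magic bytes; the offset field is left empty (0).
    printf ':%s:M::%s:%s:%s:%s\n' "$name" "$magic" "$mask" "$interpreter" "$flags"
}

# Usage (as root on the host; MAGIC and MASK copied from the script in [1]):
#   binfmt_rule qemu-aarch64 "$MAGIC" "$MASK" /usr/bin/qemu-aarch64-static F \
#       > /proc/sys/fs/binfmt_misc/register
# Passing an empty flags argument instead of "F" gives the non-fix_binary variant.
```

With "F" the kernel opens the interpreter once at registration time; without it the path is looked up on every exec, which is why it then has to exist in each chroot/mountns.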
>
>>>> ...
>>>>
>>>>
>>>> NO FIX_BINARY FLAG
>>>>
>>>> This approach is much more flexible, since each chroot and mount namespace in the system has its own configuration.
>>>>
>>>> But this flexibility doesn't come for free. The setup becomes much more complex on the host.
>>>>
>>>> The qemu-user statically linked binaries have to be provisioned on the host, but without the "fix_binary" registration.
>>>>
>>>> Each chroot/mountns has to:
>>>>
>>>> - either bring its own qemu-user binaries fitting the same path the host is using
>>>>
>>>> - or get the qemu-user binaries bind-mounted into their root filesystem
>>>>
>>>> Getting these requirements fulfilled in desktop environments seems to me to be too cumbersome.
>>> This is how things work so far, and how they are proven to work
>>> sufficiently reliable for single-user/single-build desktop environments.
>>> It's a must to preserve that use case.
>> What I meant with "these requirements" is not the current status, but the proposal of not using the "fix_binary" flag. That would require:
>>
>> 1. A configuration of the host that can be easily accomplished with some easy scripts.
>>
> - install docker on (recent) distro of your choice
> - add user to group docker (or whatever grants access -> distro doc)
> - run kas-container
>
> That's how projects/products like meta-iot2050 or
> {jailhouse,xenomai}-images work.
Accomplishing the required host configuration is even easier than installing Docker on most distros :-D
I mean, even kas-container could take care of it, requesting root permissions only once, from outside of the container and globally for the whole system.
>
>> 2. Starting the container bind mounting the corresponding qemu-user binaries.
>>
>> Using kas-container as it is now isn't really cumbersome, it gets problematic for all other uses of binfmt_misc with qemu-user in the same system.
>>
> Right, but we never heard complaints in the context of those single
> desktop user scenarios.
Most people wouldn't be able to blame the kas-isar container image even when facing issues provoked by it. Diagnosing them requires knowledge of the binfmt_misc mechanisms that most people don't have, so they would simply be puzzled by the kind of issues they might face.
>
>> The kas-container setup can only be reliable for a "single-user/single-build" desktop environment if you are only using binfmt_misc with qemu-user for ISAR builds with kas-container. As mentioned above, kas-container is leaving behind "scorched earth"... but since the first thing it does is "fertilizing", you won't notice anything as long as you only use kas-container. See below what I mean with "scorched earth".
> kas-container is not aiming at CI, the kas-isar /container/ is.
I've mixed up both in some places, but it's the kas-isar container image that I usually mean. In the above sentence replace "kas-container" with "kas-isar container image".
>
>> Give following a try:
>>
>> 1. Run "kas-container shell..."
>>
>> 2. Reboot
>>
>> 3. Run "docker run --platform arm64 debian:buster-slim uname -m"
>>
>> The second command will fail because kas-container configured its own qemu-user binary, but the binary disappears the moment the container is removed and the in memory loaded copy disappears with the reboot.
>>
>>
>> Installing qemu-user-static in the host would partially fix it, although it becomes an unpredictable setup if running other binfmt_misc consumers in the system (which you typically do on a development desktop).
> I have no qemu-user-static that configures binfmt_misc the way Debian
> does on my SUSE. And I bet that's similar, just different, on Fedora,
> Arch, you-name-it.
IMO qemu-user-static packages shouldn't be configuring binfmt_misc without a chance to modify it (like Debian does). You're right that my comment is too Debian-specific. More generally, by "Installing qemu-user-static" I meant "Installing statically built qemu-user binaries and registering them with binfmt_misc". If your distro provides a package that does it, fine. If not, you can do it yourself.
>
> Requiring a more specific host setup is ok for unisolated CI, it's not
> for the desktop.
If kas-container can require Docker, I don't understand why it cannot require a host binfmt_misc configuration.
>
>> First the qemu-user binaries of the host are loaded, until kas-container is run for the first time; then its binaries get loaded and remain loaded until a reboot or a binfmt_misc reconfiguration.
>>
>> If multiarch/qemu-user-static is run ("docker run --rm --privileged multiarch/qemu-user-static --reset -p yes"), then the multiarch/qemu-user-static binaries get loaded and remain active until a reboot or the next binfmt_misc reconfiguration (possibly done by kas-container).
>>
>> Either all binfmt_misc consumers reconfigure the system before running (what you typically wouldn't do) or they won't know which binaries are being used...
>>
>> But why only "partially fix"? Because if any configuration from a container is using different paths than the host, then the binfmt_misc configuration left behind will be pointing to a path that only exists inside of the container!
>>
>> I know of at least one project running on the same host as the KAS container where the first thing they have to do is run "docker run --rm --privileged multiarch/qemu-user-static --reset -p yes", because the setup left behind by the KAS container is broken for them :-/
>>
>> After having learned what kas-container does in my host, I won't be running it again in its pristine form (like I was doing until now). I'll make sure to run a patched version. And I suspect I'm not the only one doing so... I wonder if you are really running kas-container in your own desktop fully aware of what it's doing to your host.
>>
> You are free to do that. The majority of our users won't (...be able to).
I'd like to give those users the same alternative I'd like to have :-) One that gives them control over their system.
>
>>>> On systems automatically preparing execution environments (like CI runners) the setup doesn't seem to be that complex. Only a matching between the execution environment and the container requirements would be needed, what can typically be accomplished with tags or similar mechanisms (CI jobs requesting the tag "binfmt-qemu-52" only run on execution environments bind-mounting qemu-user statically linked binaries > 5.2).
>>>>
>>> Right. For CI runners with shared, concurrently builds, we likely want
>>> to provide an alternative strategy, one that "just works", detects when
>>> a host is outdated (though, how?) and otherwise does not stumble when it
>>> cannot update the settings from within the container.
>> Detecting outdated from the host before starting the CI job is trivial with "qemu-<arch>-static -version".
>>
>> I can investigate how to detect outdated qemu-user binaries from within the emulated architecture. Possibly checking the existence of new syscalls could help on this.
>>
>> Updating the settings from within the container should be completely forbidden in a CI runner.
>>
> I'm fine with that - as long as the desktop use case is not broken.
Of course, I'm advocating a solution that enables both use-cases. Breaking one of them is not an option for me. I'm clear about the need for something like kas-container (that's why I've been contributing to it right from the beginning).
>
> If there is a way to present kas-isar on a CI runner a fitting setup and
> prevent it from changing that, kas-isar can simply validate it and only
> fail (or warn) when expectations are not met. In the kas-container case,
> configuration (and qemu deployment) should continue to happen via kas-isar.
I think we kind of agree on the containerized CI use-case. My only doubt is how to validate the expectations from inside of the container...
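One possible starting point for that validation, as a sketch only: the kernel exposes each registered rule as a file under /proc/sys/fs/binfmt_misc/, listing whether it is enabled, the interpreter path and the flags. If the flags contain "F", the interpreter was already opened at registration time and its path need not exist in the container; otherwise the path has to resolve inside the container. The function name check_binfmt_entry is hypothetical, and whether that directory is visible inside the container depends on how the container is started, which I haven't verified for kas-isar.

```shell
# check_binfmt_entry ENTRY_FILE [ROOT]
# Reads a binfmt_misc entry file (e.g. /proc/sys/fs/binfmt_misc/qemu-aarch64)
# and prints "ok" or a reason why the rule would not work below ROOT
# (ROOT defaults to the current root filesystem).
check_binfmt_entry() {
    entry=$1 root=${2:-}
    [ -r "$entry" ] || { echo "not registered"; return 1; }
    # The first line of an entry file is "enabled" or "disabled".
    [ "$(head -n 1 "$entry")" = "enabled" ] || { echo "disabled"; return 1; }
    interp=$(sed -n 's/^interpreter //p' "$entry")
    flags=$(sed -n 's/^flags: *//p' "$entry")
    case $flags in
        *F*) echo "ok"; return 0 ;;   # fix_binary: interpreter already held open by the kernel
    esac
    # Without F the kernel resolves the path at exec time, in the caller's root.
    [ -x "$root$interp" ] || { echo "missing $interp"; return 1; }
    echo "ok"
}

# Usage inside the container:
#   check_binfmt_entry /proc/sys/fs/binfmt_misc/qemu-aarch64 || exit 1
```

This only tells whether a usable rule is present, not whether the registered qemu-user binary is recent enough.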
But I think we disagree on the host configuration done by kas-isar in the kas-container use-case. I'd give users the opportunity (via documentation) to configure the system themselves. Or offer the configuration from kas-container (but not magically from within the container), if strictly required and while clearly communicating what will be done. As mentioned above, this would be an option of kas-container, enabled with a flag. What kas-container would basically do is either install a distro package (like the one from "buster-backports" on a Debian Buster system) or, on non-supported distros, obtain the binaries directly and register them.
>
>>>> I personally would go for the first approach on desktop systems and consider using the second approach on CI systems. In any case, none of them require any binfmt_misc configurations from within the containers! What's my goal with this lengthy e-mail :-)
>>>>
>>> We will need both, see above.
>> I wonder if a setup with new qemu-user binaries in the host would fit all the hereby described scenarios. But in order to answer that it's critical understanding the root-cause for the binfmt_misc reconfiguration from within kas-container in the first place (see my questions above).
> See above, try to think it through from the perspective of a non-expert
> and/or non-Debian Linux user.
I am thinking from the perspective of a non-expert and/or non-Debian Linux user. The main goal of this thread is identifying the use-cases, requirements, issues to fix,... before sending an RFC patch.
One important question that is still open for me is which qemu-user version is required for kas-isar. Do we have a known number? Apparently 3.1.0 (which Debian Buster provides) doesn't fulfill the requirements, but 5.2.0 (which the Debian Bullseye backport to Buster provides) does. Whether something in between, possibly provided by other distros, suffices remains unclear to me.
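Until that number is known, any check has to take the minimum as a parameter. A minimal sketch of such a check on the host, assuming the first line of the version output looks like "qemu-aarch64 version 5.2.0 (...)". The function names are hypothetical, and the 5.2 in the usage example is only the value discussed in this thread, not a confirmed requirement:

```shell
# qemu_version_of TEXT: extract "X.Y[.Z]" from qemu-user version output.
qemu_version_of() {
    printf '%s\n' "$1" | sed -n 's/^qemu-[^ ]* version \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# version_ge A B: succeed if dotted numeric version A >= B.
# Missing fields count as 0, so "5.2" and "5.2.0" compare as equal.
version_ge() {
    a=$1 b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        x=${a%%.*} y=${b%%.*}
        if [ "$a" = "$x" ]; then a=; else a=${a#*.}; fi
        if [ "$b" = "$y" ]; then b=; else b=${b#*.}; fi
        if [ "${x:-0}" -gt "${y:-0}" ]; then return 0; fi
        if [ "${x:-0}" -lt "${y:-0}" ]; then return 1; fi
    done
    return 0
}

# Usage on the host:
#   v=$(qemu_version_of "$(qemu-aarch64-static -version)")
#   version_ge "$v" 5.2 || echo "qemu-user $v is older than 5.2" >&2
```

A CI runner could run this before starting a job and only advertise a tag like "binfmt-qemu-52" when the check passes.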
Silvano
>
> Jan
>
--
Siemens AG, T RDA IOT SES-DE