* Giuseppe Scrivano:
> Hi Florian,
>
> Florian Weimer <fwe...@redhat.com> writes:
>
>> This is related to the kernel patch I just sent:
>>
>> <https://groups.google.com/a/opencontainers.org/g/dev/c/gj9ErIn5LQI>
>>
>> I think it would be nice if we could phase out that ugly userspace
>> probing sequence eventually, but that requires switching from EPERM to
>> ENOSYS, with a patch like this:
>>
>> diff --git a/config-linux.md b/config-linux.md
>> index 9ea44a0..19278e1 100644
>> --- a/config-linux.md
>> +++ b/config-linux.md
>> @@ -646,7 +646,7 @@ The following parameters can be specified to set up seccomp:
>>
>> * **`errnoRet`** *(uint, OPTIONAL)* - the errno return code to use.
>> Some actions like `SCMP_ACT_ERRNO` and `SCMP_ACT_TRACE` allow to specify the errno
>> - code to return. If not specified its default value is `EPERM`.
>> + code to return. If not specified its default value is `ENOSYS`.
>>
>> * **`args`** *(array of objects, OPTIONAL)* - the specific syscall in seccomp.
>> Each entry has the following structure:
>>
>> Thoughts?
>
> I am afraid it will be a breaking change.
Some bug fixes necessarily are, unfortunately. Not making this change
also breaks things.
For glibc's use, it would be sufficient to attach the EPERM vs ENOSYS
default to the base image and have it inherited by any derived images. (I
have no idea whether such a mechanism exists.) glibc updates typically
happen at distribution release boundaries, and that is more or less a
well-defined event, so the wider impact of the EPERM → ENOSYS transition
could be mentioned in the container image/distribution release notes.
Since the actually permitted set of system calls does not change, it
would be safe to ask the image itself for the preferred error default.
> I think ENOSYS makes sense only for new added syscalls, as likely there
> is a fallback, but IMO it should be handled at a higher level.
For security reasons, I think these kinds of seccomp filters need to
come in the form of a list of permitted (and thus known) system calls.
The system calls not in this list are unknown. There is no third,
fourth &c state here. Where would it come from?
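
To make the two-state model concrete, here is a minimal sketch in Python. It is only a model of the decision, not of how any runtime or libseccomp implements it; the `PERMITTED` set and the `filter_result` helper are hypothetical names made up for illustration.

```python
import errno

# Hypothetical allowlist; a real profile lists far more system calls.
PERMITTED = {"read", "write", "openat", "close"}

def filter_result(syscall, default_errno=errno.ENOSYS):
    """Model of an allowlist filter: a call is either permitted or unknown.

    Returns 0 for a permitted call, otherwise the negated default errno.
    There is deliberately no third outcome.
    """
    if syscall in PERMITTED:
        return 0
    return -default_errno

# Everything outside the list gets the same error, whatever its name is.
print(filter_result("read"))          # permitted
print(filter_result("some_new_call")) # unknown, default error
```

The only knob in this model is the default error, which is what the proposed `errnoRet` change is about.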
libseccomp could hard-code a particular system call universe at build
time, based on the kernel headers it finds (or built-in tables). This
means that on every libseccomp upgrade, the universe could potentially
change, and with it some system call errors would turn from ENOSYS
(working fallback, previously not within the universe) to EPERM (broken
fallback, now in the universe). Any component that hard-codes such a
universe would have the same issue because the behavior outside the
permitted syscall list is essentially arbitrary. That includes the host
kernel: Assume a new system call is backported. Should containers
change their error code from ENOSYS to EPERM? I don't think so.
In a sense, the new/old system call distinction would turn what is a one
time potential breakage (due to the EPERM → ENOSYS transition) into an
ongoing source of issues related to potential ENOSYS → EPERM changes at
any cluster infrastructure update.
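
The failure mode can be sketched as a small Python simulation. Everything here is an assumption made for illustration: the "universe" sets, the helper names, and the choice of `clone3`/`clone` as the new call and its fallback. It models a policy that returns EPERM for denied calls inside the universe and ENOSYS for calls outside it, and a caller that falls back only on ENOSYS.

```python
import errno

PERMITTED = {"clone"}  # hypothetical allowlist, unchanged across upgrades

def invoke(syscall, universe):
    """Model of the new/old distinction: denied calls inside the
    universe get EPERM, calls outside the universe get ENOSYS."""
    if syscall in PERMITTED:
        return 0
    if syscall in universe:
        return -errno.EPERM
    return -errno.ENOSYS

def spawn(universe):
    """Caller that probes the newer call and falls back only on ENOSYS."""
    ret = invoke("clone3", universe)
    if ret == -errno.ENOSYS:
        ret = invoke("clone", universe)
    return ret

old_universe = {"clone"}            # before the infrastructure update
new_universe = {"clone", "clone3"}  # after it; the allowlist is the same

print(spawn(old_universe))  # fallback path is taken
print(spawn(new_universe))  # fallback is skipped, the caller sees EPERM
```

In this model the permitted set never changes, yet the caller starts failing once the universe grows, which is the ongoing ENOSYS → EPERM hazard described above.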
(Note that this message was written based on a view from the outside,
looking at various failure modes and discussions of related topics. I
do not know anything about the inner workings of libseccomp.)