Re: Fixing Linux getrandom() in stable

Adrian Bunk

May 13, 2018, 4:50:13 PM
On Wed, May 09, 2018 at 11:46:00PM +0100, Ben Hutchings wrote:
>...
> # Security flaw and initial fix
>
> Recently it was discovered that getrandom() could return successfully
> before the RNG was really ready to produce unpredictable data. This
> issue was designated as CVE-2018-1108, and was fixed in Linux 4.17-rc2
> and various stable updates.
>
> We fixed CVE-2018-1108 with an update to stretch last week
> (DSA-4188-1). The kernel versions in wheezy and jessie do not provide
> getrandom().
>
> # Regression
>
> The version of glibc in stretch does not provide access to getrandom(),
> but some packages in stable use syscall() to call it anyway, including:
>
> * krb5: k5_get_os_entropy()
> * libbsd: arc4_random_buf(). This is used by many other packages
> including libICE, and so indirectly by gnome-session.
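
A minimal sketch of what calling getrandom() through syscall() might look like when libc has no wrapper. The helper names raw_getrandom and fill_random are made up for illustration; this is not the code used by krb5 or libbsd.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for syscall() */
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>        /* SYS_getrandom, if the headers know it */
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: invoke getrandom(2) without a libc wrapper.
 * Returns the number of bytes written, or -1 with errno set. */
ssize_t raw_getrandom(void *buf, size_t len, unsigned int flags)
{
#ifdef SYS_getrandom
    return syscall(SYS_getrandom, buf, len, flags);
#else
    (void)buf; (void)len; (void)flags;
    errno = ENOSYS;             /* headers too old to know the syscall */
    return -1;
#endif
}

/* Fill buf completely, retrying on EINTR and short reads.
 * With flags == 0 the call blocks until the kernel RNG is initialised,
 * which is exactly the behaviour that caused the regressions above.
 * Returns 0 on success, -1 on error. */
int fill_random(unsigned char *buf, size_t len)
{
    size_t done = 0;

    while (done < len) {
        ssize_t n = raw_getrandom(buf + done, len - done, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        done += (size_t)n;
    }
    return 0;
}
```

A caller would use it as fill_random(key, sizeof key) and treat -1 with errno == ENOSYS as the signal to fall back to a device node.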
>
> Following DSA-4188-1, it turned out that the RNG did not become ready
> on some systems until several minutes after boot, causing severe
> regressions for GNOME/gdm (#897631, #897632) and Kerberos (#897599,
> #897917). We therefore reverted the fix in yesterday's update
> (DSA-4196-1).
>
> # Options for a new fix
>
> It is unlikely that any further fix will be forthcoming on the kernel
> side, so I believe that we need to do one of:
>
> 1. Add entropy to the kernel during boot; either:
> a. Improve systemd-random-seed
> b. Recommend use of haveged

I don't see any solution above that both always works and never results
in new CVEs.

As an example, what happens if I debootstrap and deploy the resulting
filesystem to a large number of identical embedded systems without
entropy sources?

As far as I can see, any solution above would either give me boot hangs
or might result in nasty security issues due to the same (known) entropy
being fed to /dev/random on many machines.

Similar problems for cases like live CDs and installers.

> 2. For each affected userland package, either:
> a. Revert to using /dev/urandom

I wonder whether the current issue is just the tip of the iceberg,
and usage of /dev/urandom is a gazillion CVEs waiting to be reported.

In that case the CVE-2018-1108 fix only revealed a long existing
vulnerability in some packages that already switched to getrandom().

/dev/urandom is documented in a very misleading way, quoting random(4):
When read during early boot time, /dev/urandom may return data prior to
the entropy pool being initialized. If this is of concern in your
application, use getrandom(2) or /dev/random instead.

What is the worst case for "early boot time" here? "always"?

Due to the gdm bugs mentioned above we know that there are real-life
situations where gdm currently uses "random" data that might be
predictable.

grep tells me:
daemon/gdm-x-session.c: auth_entry.data = gdm_generate_random_bytes (auth_entry.data_length, &error);
daemon/gdm-display-access-file.c: *cookie = gdm_generate_random_bytes (GDM_DISPLAY_ACCESS_COOKIE_SIZE,

Repeat the same for every package that uses /dev/urandom.

> b. Tolerate a longer wait for getrandom() to return

I suspect there might be no guaranteed upper bound for the waiting time.
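
If an application wants to tolerate a longer wait without risking an unbounded one, it can poll with GRND_NONBLOCK and give up after a deadline. A rough sketch, assuming glibc 2.25 or newer for the getrandom() wrapper in <sys/random.h>; the function name and the 100 ms polling interval are arbitrary choices for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical helper: try to fill buf from getrandom(2), giving up after
 * roughly timeout_ms milliseconds if the kernel RNG is still not ready.
 * Returns 0 on success, -1 with errno set (ETIMEDOUT on timeout). */
int getrandom_with_timeout(unsigned char *buf, size_t len, long timeout_ms)
{
    const struct timespec delay = { 0, 100 * 1000 * 1000 };  /* 100 ms */
    long waited_ms = 0;
    size_t done = 0;

    while (done < len) {
        ssize_t n = getrandom(buf + done, len - done, GRND_NONBLOCK);
        if (n >= 0) {
            done += (size_t)n;
            continue;
        }
        if (errno == EINTR)
            continue;
        if (errno != EAGAIN)
            return -1;          /* e.g. ENOSYS on a kernel older than 3.17 */
        if (waited_ms >= timeout_ms) {
            errno = ETIMEDOUT;  /* RNG still not initialised */
            return -1;
        }
        nanosleep(&delay, NULL);
        waited_ms += 100;
    }
    return 0;
}
```

What to do on ETIMEDOUT (fail, retry later, or accept weaker data) is still a policy decision the caller has to make.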

>...
> The libbsd maintainer (Guillem Jover) favours option 2a.
>
> One of the krb5 maintainers (Benjamin Kaduk) favours option 2b, and
> also proposed that systemd could provide a wait-for-rng-ready unit to
> support this.

I don't see any general solution that is both correct and easy.

The proper way forward might be to deprecate /dev/urandom and add a
third option GRND_UNSAFE_RANDOM to getrandom() that is documented to
never block but might return predictable data in some cases.

It would then be up to the application to decide whether predictable
data is acceptable, and what to do in entropy-starved situations.

Regarding the suggested wait-for-rng-ready systemd unit for others to
wait on, this only makes sense for cases where "do not start at all"
is the best handling for a "no entropy" situation.
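
For illustration, the core of such a wait-for-rng-ready step could be as small as a blocking one-byte getrandom() call. This is only a sketch of the idea, assuming glibc 2.25 or newer; wait_for_rng_ready is a hypothetical name and no such unit or helper is known to exist in systemd.

```c
#include <errno.h>
#include <sys/random.h>
#include <sys/types.h>

/* Block until the kernel RNG reports itself initialised.
 * Returns 0 once ready, -1 on an unexpected error. */
int wait_for_rng_ready(void)
{
    unsigned char byte;

    for (;;) {
        /* flags == 0: sleep in the kernel until the RNG is ready */
        ssize_t n = getrandom(&byte, 1, 0);
        if (n == 1)
            return 0;
        if (n < 0 && errno != EINTR)
            return -1;
    }
}
```

A program whose main() simply returns wait_for_rng_ready() would never exit on a system that never gathers enough entropy, so anything ordered after it would never start.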

> Ben.

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

Thorsten Glaser

May 13, 2018, 5:40:06 PM
Adrian Bunk dixit:

>As an example, what happens if I debootstrap and deploy the resulting
>filesystem to a large number of identical embedded systems without
>entropy sources?

Just get into a habit of not doing so, for example by modifying the
image during each writing process.

Having the bootloader inject entropy into the kernel would of course
help (something OpenBSD actually did, which I’d only dreamt of until then).
Reuse is only problematic then if the system actually booted, i.e. an
early userspace thing reading and immediately writing to a file will
stave off the remaining issues.
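
A sketch of that early-userspace step, under stated assumptions: the seed lives in a regular file, the RNG device is /dev/urandom, and the 64-byte seed size and the name refresh_seed are invented for this example. A plain write() to the device mixes the data in but credits no entropy.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define SEED_LEN 64   /* assumed seed size for this sketch */

/* Write all of buf to fd; returns 0 on success, -1 on error. */
static int write_all(int fd, const unsigned char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n <= 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical early-boot step: mix the saved seed into the kernel pool,
 * then immediately replace the file so the same seed is not fed twice.
 * Returns 0 on success, -1 on error. */
int refresh_seed(const char *seed_path, const char *rng_path)
{
    unsigned char seed[SEED_LEN];
    ssize_t got;
    int rc = -1;
    int rng_fd = open(rng_path, O_RDWR);
    int seed_fd = open(seed_path, O_RDWR);

    if (rng_fd < 0 || seed_fd < 0)
        goto out;

    /* 1. Feed the old seed to the kernel (mixed in, not credited). */
    got = read(seed_fd, seed, sizeof seed);
    if (got < 0)
        goto out;
    if (got > 0 && write_all(rng_fd, seed, (size_t)got) != 0)
        goto out;

    /* 2. Draw a replacement seed and store it before anything else runs. */
    if (read(rng_fd, seed, sizeof seed) != (ssize_t)sizeof seed)
        goto out;
    if (lseek(seed_fd, 0, SEEK_SET) < 0 ||
        write_all(seed_fd, seed, sizeof seed) != 0 ||
        fsync(seed_fd) != 0)
        goto out;
    rc = 0;
out:
    if (seed_fd >= 0)
        close(seed_fd);
    if (rng_fd >= 0)
        close(rng_fd);
    return rc;
}
```

This only helps where the seed file is writable and unique per machine; on read-only or cloned media every boot starts from the same file again.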

>As far as I can see, any solution above would either give me boot hangs
>or might result in nasty security issues due to the same (known) entropy
>being fed to /dev/random on many machines.
>
>Similar problems for cases like live CDs and installers.

And CPUs, and architectures, without usable boot entropy.

For the “CD image” case, though, adding stuff like the MAC addresses
of the onboard NICs and the current time would at least shuffle the
existing (albeit known) entropy around enough for it to begin to differ.

A web service for getting random bits sounds like an idea, until you
get to the privacy implications of that (as well as reliability). But
if it’s inhouse, it’s doable.

>I wonder whether the current issue is just the tip of the iceberg,
>and usage of /dev/urandom is a gazillion CVEs waiting to be reported.

Did you see the fallback code for Linux in OpenBSD’s code for libbsd?
It’s… like trying to find random-looking things on a 1990s Unix.

This is best fixed in the kernel and earlier, plus an extra read/write
in early userspace. Of course, embedding some entropy into the kernel
image itself will make the reproducible-builds people entirely unhappy…
(this one *is* implemented in MirBSD, complete with a tool to update
it).

>Due to the gdm bugs mentioned above we know that there are real-life
>situations where gdm currently uses "random" data that might be
>predictable.

The question is which uses actually need entropy that is estimated to be good enough.

>> b. Tolerate a longer wait for getrandom() to return
>
>I suspect there might be no guaranteed upper bound for the waiting time.

On a discless system with no hardware sources (possibly no network)
and no keyboard interaction? Infinite.

Of course, if early userspace could reliably update a file, then the
file’s content could be estimated as good enough and be credited to
the RNG, at least for non-identical/readonly-/shared-media systems.
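
Crediting, as opposed to merely mixing, goes through the RNDADDENTROPY ioctl, which takes a struct rand_pool_info and requires CAP_SYS_ADMIN. The sketch below shows the shape of such a call; the helper names are invented, and how many bits to credit is exactly the guesswork being discussed here.

```c
#include <linux/random.h>   /* struct rand_pool_info, RNDADDENTROPY */
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>

/* Build the RNDADDENTROPY request: a header followed by the seed bytes.
 * credit_bits is the caller's own estimate, clamped to 0..8 bits per byte.
 * Returns a malloc()ed block the caller must free(), or NULL. */
struct rand_pool_info *build_pool_info(const void *seed, int len, int credit_bits)
{
    struct rand_pool_info *info;

    if (len <= 0)
        return NULL;
    info = malloc(sizeof *info + (size_t)len);
    if (info == NULL)
        return NULL;
    if (credit_bits < 0)
        credit_bits = 0;
    if (credit_bits > len * 8)
        credit_bits = len * 8;
    info->entropy_count = credit_bits;  /* in bits */
    info->buf_size = len;               /* in bytes */
    memcpy(info->buf, seed, (size_t)len);
    return info;
}

/* Mix the seed into the pool and credit it. rng_fd is an open descriptor
 * for /dev/random or /dev/urandom. Returns 0 on success, -1 on error. */
int credit_seed(int rng_fd, const void *seed, int len, int credit_bits)
{
    struct rand_pool_info *info = build_pool_info(seed, len, credit_bits);
    int rc;

    if (info == NULL)
        return -1;
    rc = ioctl(rng_fd, RNDADDENTROPY, info);
    free(info);
    return rc;
}
```

Crediting a seed that is shared between machines, or that was never refreshed, would mark the RNG as ready while its output is still predictable.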

>never block but might return predictable data in some cases.

“What level of predictability?”

>It would then be up to the application to decide whether predictable
>data is acceptable, and what to do in entropy-starved situations.

I guess most application authors have no answer for you here.

It’s also no solution for the arc4random API… seems like a cultural
clash (BSD expectations vs. what Linux can actually deliver).

bye,
//mirabilos
--
“Having a smoking section in a restaurant is like having
a peeing section in a swimming pool.”
-- Edward Burr

Thorsten Glaser

May 13, 2018, 9:00:05 PM
Theodore Y. Ts'o dixit:

>that problem helps most of our users, and we shouldn't let the
>perfect be the enemy of the good.

Agreed. Start small, then enhance one bootloader at a time.
Or boot protocol, I assume.

>Also note that the bootloader has to depend on userspace to refresh the
>seed entropy, both in early boot (in case the system crashes), and at
>shutdown (so the entropy captured while the system is running can be

Definitely!

>saved as seed entropy). And this is trickier in Linux because the
>bootloader lives in a different source tree, and is maintained by
>different people from the systemd and/or initscripts people, and for

Yes, unfortunately.

>that matter the bootloader doesn't know which distribution it is

But in this case, the distribution can tell the bootloader the
path to the file to load.

>the *BSD's has its advantages. And this is where perhaps Debian as a
>distribution can solve this problem by coordinating action across
>multiple Debian packages.)

Of course.

>The *point* is that we can't really make a turn-key solution which
>will work for everyone. As much as we desire a "Universal OS",
>something that works for all hardware, all users, and all workloads
>is probably just not attainable here.

As Debian, we can try to come close, but, as you said, don’t let
the perfect be the enemy of the good. Perhaps there are multiple
somethings that, together (or having the local admin choose) can
help more people than one simple solution, even if the latter may
help a majority. (I’m a fan of minorities, in case you couldn’t
tell. I run an x32 system, after all, and helped out m68k a bit…)

>(It never was a complete solution, BTW; even before the patches to
>address CVE-2018-1108, there were already hardware systems where you
>couldn't count on the RNG being initialized in time and getrandom(2)

Another question is what it means that the RNG is initialised.
It all depends on what in the end boils down to guesswork,
although I tip my hat, because that RNG code of yours, both the
Linux and the BSD version, is pretty impressive.

But the point here is that, even if the RNG thinks it’s fully
initialised, it may not be “good” yet, depending on circumstances.
(Again, it should not stop us from trying.)

bye,
//mirabilos
--
Solange man keine schmutzigen Tricks macht, und ich meine *wirklich*
schmutzige Tricks, wie bei einer doppelt verketteten Liste beide
Pointer XORen und in nur einem Word speichern, funktioniert Boehm ganz
hervorragend. -- Andreas Bogk über boehm-gc in d.a.s.r

Theodore Y. Ts'o

May 13, 2018, 9:20:05 PM
(Quoting somewhat out of order)

On Sun, May 13, 2018 at 09:23:39PM +0000, Thorsten Glaser wrote:
>
> It’s also no solution for the arc4random API… seems like a cultural
> clash (BSD expectations vs. what Linux can actually deliver).

It's instructive to look at how OpenBSD solves this problem. OpenBSD
supports a much smaller set of architectures than Linux, and a very
small set of bootloaders (which are part of the OpenBSD sources). So
what OpenBSD does is make the bootloader responsible for reading in the
random seed file from persistent storage. Therefore OpenBSD doesn't
wait for the RNG to be initialized, because it assumes that an
uninitialized RNG never happens. (Hand-waving what happens during the install, but presumably
harvesting entropy from the CD installer is not a problem, and OpenBSD
doesn't support debootstrap. :-)

So the first thing is that we *really* should get folks working on
adding support to the x86 boot protocol so that, in addition to passing
a pointer to the loaded kernel, the initial ramdisk, and the boot
command line, the bootloader also passes the kernel a pointer to X
bytes of seed entropy. This raises the question of how we can trust
that the bootloader has actually gotten an effective source of seed
entropy. Unlike in the OpenBSD case, there are at least
five or six different bootloaders which implement the x86 boot loader
protocol for Linux (probably more), and can we trust that they are all
implemented correctly? And of course, this is an x86-only solution.
What about all of the other architectures supported by Debian?

Still, the vast majority of Debian users are using x86, so solving
that problem helps most of our users, and we shouldn't let the
perfect be the enemy of the good.

Also note that the bootloader has to depend on userspace to refresh the
seed entropy, both in early boot (in case the system crashes), and at
shutdown (so the entropy captured while the system is running can be
saved as seed entropy). And this is trickier in Linux because the
bootloader lives in a different source tree, and is maintained by
different people from the systemd and/or initscripts people, and for
that matter the bootloader doesn't know which distribution it is
booting. (This is one of the places where having a single source tree à la
the *BSD's has its advantages. And this is where perhaps Debian as a
distribution can solve this problem by coordinating action across
multiple Debian packages.)

> >Due to the gdm bugs mentioned above we know that there are real-life
> >situations where gdm currently uses "random" data that might be
> >predictable.

When does gdm need true cryptographic randomness? We should take a
step back and take a look at the big picture. The only uses I can think
of involve XDMCP or some other Remote Desktop Protocol. But
that protocol was invented in the days pre-SSH, and it is about as
secure as telnet --- which is to say, not at all. So picking a
randomly generated password for networked X or MIT Magic Cookie is
something where I'd argue if you're worried about the quality of
/dev/urandom, you're not worried about your biggest security
vulnerability. (Think bank vault doors attached to Papier mâché
walls....)

The util-linux-ng package made a similar calculation in v2.32
(interestingly, *before* the changes to address CVE-2018-1108 were
made):

commit a9cf659e0508c1f56813a7d74c64f67bbc962538
Author: Carlo Caione <ca...@endlessm.com>
Date: Mon Mar 19 10:31:07 2018 +0000

lib/randutils: Do not block on getrandom()

In Endless we have hit a problem when using 'sfdisk' on the really first
boot to automatically expand the rootfs partition. On this platform
'sfdisk' is blocking on getrandom() because not enough random bytes are
available. This is an ARM platform without a hwrng.

We fix this passing GRND_NONBLOCK to getrandom(). 'sfdisk' will use the
best entropy it has available and fallback only as necessary.

Signed-off-by: Carlo Caione <ca...@endlessm.com>

commit edc1c90cb972fdca1f66be5a8e2b0706bd2a4949
Author: Karel Zak <kz...@redhat.com>
Date: Tue Mar 20 14:17:24 2018 +0100

lib/randutils: don't break on EAGAIN, use usleep()

....

Note that we do not use random numbers for security sensitive things
like keys or so. It's used for random based UUIDs etc.

Addresses: https://github.com/karelzak/util-linux/pull/603
Signed-off-by: Karel Zak <kz...@redhat.com>

> >> b. Tolerate a longer wait for getrandom() to return
> >
> >I suspect there might be no guaranteed upper bound for the waiting time.
>
> On a discless system with no hardware sources (possibly no network)
> and no keyboard interaction? Infinite.
>
> Of course, if early userspace could reliably update a file, then the
> file’s content could be estimated as good enough and be credited to
> the RNG, at least for non-identical/readonly-/shared-media systems.

... and ultimately, this is the problem. Having an initialized RNG is
a system design issue that has to be considered
holistically. If you have special hardware that you trust, it's easy.
Or in a VM environment, where you have to implicitly trust the host
*anyway* you could just use Virtio-rng and be done with it.

Or it might depend on your workload. The security requirements of an
information kiosk system will be quite different from those of a
Kerberos KDC server.

Or it might depend on who you are. If you're Intel or the US
government, maybe you're willing to trust RDRAND, either because you
know that it's secure because you've laid eyes on the internal CPU
chip designs (or perhaps, maybe, you put the back door in yourself,
and you've decided you don't need to worry about own goals :-).

The *point* is that we can't really make a turn-key solution which
will work for everyone. As much as we desire a "Universal OS",
something that works for all hardware, all users, and all workloads
is probably just not attainable here.

(It never was a complete solution, BTW; even before the patches to
address CVE-2018-1108, there were already hardware systems where you
couldn't count on the RNG being initialized in time and getrandom(2)
would block. It's just that they were few in number, and they tended
to be very niche systems for the tiniest of IoT devices, where you
wouldn't be using gdm, or for that matter, systemd, because they
simply wouldn't fit.)

- Ted

Ben Hutchings

May 13, 2018, 10:20:06 PM
On Sun, 2018-05-13 at 23:48 +0300, Adrian Bunk wrote:
> On Wed, May 09, 2018 at 11:46:00PM +0100, Ben Hutchings wrote:
[...]
> > # Options for a new fix
> >
> > It is unlikely that any further fix will be forthcoming on the kernel
> > side, so I believe that we need to do one of:
> >
> > 1. Add entropy to the kernel during boot; either:
> > a. Improve systemd-random-seed
> > b. Recommend use of haveged
>
> I don't see any solution above that both always works and never results
> in new CVEs.

Indeed.

> As an example, what happens if I debootstrap and deploy the resulting
> filesystem to a large number of identical embedded systems without
> entropy sources?

Then it is your fault when they turn into a botnet. :-) Availability
of randomness must be considered in the design of embedded systems.

[...]
> /dev/urandom is documented in a very misleading way, quoting random(4):
> When read during early boot time, /dev/urandom may return data prior to
> the entropy pool being initialized. If this is of concern in your
> application, use getrandom(2) or /dev/random instead.
>
> What is the worst case for "early boot time" here? "always"?

No, I don't think so.

> Due to the gdm bugs mentioned above we know that there are real-life
> situations where gdm currently uses "random" data that might be
> predictable.
>
> grep tells me:
> daemon/gdm-x-session.c: auth_entry.data = gdm_generate_random_bytes (auth_entry.data_length, &error);
> daemon/gdm-display-access-file.c: *cookie = gdm_generate_random_bytes (GDM_DISPLAY_ACCESS_COOKIE_SIZE,
>
> Repeat the same for every package that uses /dev/urandom.

This is certainly undesirable, but it's exploitable only by local users.
(If you let the X server listen to the network, all authentication
cookies are sent in the clear so you've already lost. If you use ssh X
forwarding, it generates a new authentication cookie for use with the X
proxy on the remote machine.)

>
> > b. Tolerate a longer wait for getrandom() to return
>
> I suspect there might be no guaranteed upper bound for the waiting time.

Interrupt timing feeds into the RNG, and as long as there's at least
one interrupt per second then I think the RNG will reach the fully
initialised state after a few minutes. I just started a VM with a
serial console and only a shell running as pid 1, which is about as
idle a system as I can imagine, and it was seeing more than one
interrupt per second. However, other architectures (e.g. s390x) might
achieve greater idleness.
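
The interrupt rate on a given system can be checked from userspace: the first number on the "intr" line of /proc/stat is the total count of interrupts serviced since boot. A small sketch follows; the function names are made up, and sampling once per second is just one way to use it.

```c
#include <stdio.h>
#include <string.h>

/* Extract the total interrupt count from the text of /proc/stat, i.e. the
 * first number on the line starting with "intr ". Returns -1 if absent. */
long long parse_intr_total(const char *stat_text)
{
    const char *line = stat_text;

    while (line != NULL && *line != '\0') {
        long long total;

        if (strncmp(line, "intr ", 5) == 0 &&
            sscanf(line + 5, "%lld", &total) == 1)
            return total;
        line = strchr(line, '\n');      /* advance to the next line */
        if (line != NULL)
            line++;
    }
    return -1;
}

/* Read /proc/stat and return the interrupt total since boot, or -1. */
long long read_intr_total(void)
{
    static char buf[65536];
    size_t n;
    FILE *f = fopen("/proc/stat", "r");

    if (f == NULL)
        return -1;
    n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return parse_intr_total(buf);
}
```

Calling read_intr_total() twice, one second apart, and subtracting gives the interrupts-per-second figure for that interval.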

> > ...
> > The libbsd maintainer (Guillem Jover) favours option 2a.
> >
> > One of the krb5 maintainers (Benjamin Kaduk) favours option 2b, and
> > also proposed that systemd could provide a wait-for-rng-ready unit to
> > support this.
>
> I don't see any general solution that is both correct and easy.

Indeed.

> The proper way forward might be to deprecate /dev/urandom and add a
> third option GRND_UNSAFE_RANDOM to getrandom() that is documented to
> never block but might return predictable data in some cases.

This doesn't solve anything for us. (It does help with the original
problem of device nodes possibly being absent from a minimal container
or chroot.)

> It would then be up to the application to decide whether predictable
> data is acceptable, and what to do in entropy-starved situations.
>
> Regarding the suggested wait-for-rng-ready systemd unit for others to
> wait on, this only makes sense for cases where "do not start at all"
> is the best handling for a "no entropy" situation.

Yes.

Ben.

> > Ben.
>
> cu
> Adrian
>
--
Ben Hutchings
For every action, there is an equal and opposite criticism. - Harrison

Sam Hartman

May 14, 2018, 9:20:07 AM
>>>>> "Thorsten" == Thorsten Glaser <t...@mirbsd.de> writes:

Thorsten> Adrian Bunk dixit:
>> As an example, what happens if I debootstrap and deploy the
>> resulting filesystem to a large number of identical embedded
>> systems without entropy sources?

Thorsten> Just get into a habit of not doing so, for example by
Thorsten> modifying the image during each writing process.

I'm sorry, but modifying the image before each write is simply not
realistic.

My company has found that it's easy to get suppliers to deploy a static
image to the storage of appliances we're constructing during the
manufacturing process.
They do not have tools for modifying the image. We do detect first boot
and do things like change filesystem UUIDs.
Mixing in any entropy we can obtain during the first boot is relatively
easy. However, very quickly, we're going to need to do things like
generate ssh keys for management and generate a few other public keys.

Similar situations show up in cloud environments. There you can use
virtio-rng or similar.

However, the fact is that when we design systems, we are bound by
constraints imposed by other parts of the process that are outside our control.
Delivering an image that is static and that will be deployed onto
multiple systems is something that does happen and it happens because
it's the best design tradeoff available.
It does have security implications, and in fact may decrease security of
random numbers overall. On the other hand, it can increase security of
code integrity and tends to be associated with design methodologies that
create reproducible environments.

So, you can try and sweep static images under the rug, but all you're
doing is dismissing people with real problems they need to solve.
It would be much more constructive to acknowledge that people will use
static images, discuss the security implications, solve the problems we
can solve, and document the residual security implications so our users
and the broader community are aware of our limitations.

--Sam

Adrian Bunk

May 22, 2018, 3:50:08 PM
On Mon, May 14, 2018 at 03:11:30AM +0100, Ben Hutchings wrote:
> On Sun, 2018-05-13 at 23:48 +0300, Adrian Bunk wrote:
>...
> > Due to the gdm bugs mentioned above we know that there are real-life
> > situations where gdm currently uses "random" data that might be
> > predictable.
> >
> > grep tells me:
> > daemon/gdm-x-session.c: auth_entry.data = gdm_generate_random_bytes (auth_entry.data_length, &error);
> > daemon/gdm-display-access-file.c: *cookie = gdm_generate_random_bytes (GDM_DISPLAY_ACCESS_COOKIE_SIZE,
> >
> > Repeat the same for every package that uses /dev/urandom.
>
> This is certainly undesirable, but it's exploitable only by local users.
> (If you let the X server listen to the network, all authentication
> cookies are sent in the clear so you've already lost. If you use ssh X
> forwarding, it generates a new authentication cookie for use with the X
> proxy on the remote machine.)

It is possible that this specific case is not a problem.

There was a certain "never use /dev/random, /dev/urandom is always good
enough" push that started several years before getrandom() became
available, and I'd bet someone will find exploitable cases due to
that somewhere.

The documented behaviour is that it is safe to use /dev/urandom except
during "early boot", and this is not always true in practice.

>...
> > The proper way forward might be to deprecate /dev/urandom and add a
> > third option GRND_UNSAFE_RANDOM to getrandom() that is documented to
> > never block but might return predictable data in some cases.
>
> This doesn't solve anything for us. (It does help with the original
> problem of device nodes possibly being absent from a minimal container
> or chroot.)
>...

I am less worried about device nodes possibly being absent, and more
worried about 3 different cases splintered over 2 completely different
APIs.

Ignoring any security implications, "workaround by switching from
getrandom() to /dev/urandom" sounds wrong since you shouldn't be
forced to a different API for that - getrandom() is what people
should use, therefore it should offer all 3 options.
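
To make the "3 options behind one API" idea concrete, here is a userspace sketch of a single entry point with three policies. Everything in it is hypothetical: the enum, the names, and the choice to emulate the never-block case by falling back to /dev/urandom. It assumes glibc 2.25 or newer for getrandom().

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical policy names; none of these exist in any real API. */
enum rand_policy {
    RAND_WAIT,          /* block until the RNG is initialised       */
    RAND_FAIL,          /* fail with EAGAIN if the RNG is not ready */
    RAND_BEST_EFFORT    /* never block; output may be predictable   */
};

/* Fill buf (at most 256 bytes) according to the chosen policy.
 * Returns 0 on success, -1 with errno set. */
int rand_fill(void *buf, size_t len, enum rand_policy policy)
{
    unsigned int flags = (policy == RAND_WAIT) ? 0 : GRND_NONBLOCK;
    ssize_t n;
    int fd;

    if (len > 256) {
        errno = EINVAL;
        return -1;
    }
    do {
        n = getrandom(buf, len, flags);
    } while (n < 0 && errno == EINTR);
    if (n == (ssize_t)len)
        return 0;
    if (n >= 0) {               /* unexpected short read */
        errno = EIO;
        return -1;
    }
    if (policy != RAND_BEST_EFFORT || errno != EAGAIN)
        return -1;

    /* RNG not ready and the caller accepts weak output: read
     * /dev/urandom, which does not block even before initialisation. */
    fd = open("/dev/urandom", O_RDONLY);
    if (fd < 0)
        return -1;
    n = read(fd, buf, len);
    close(fd);
    return (n == (ssize_t)len) ? 0 : -1;
}
```

With something like this, a UUID generator might pass RAND_BEST_EFFORT while a key generator passes RAND_WAIT, and the choice is visible at the call site instead of being implied by which API was used.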

> Ben.
>...