Salt orchestration

Brian C. Duggan

Jan 4, 2019, 3:08:38 PM
to qubes-users
Hi,

I need to orchestrate Salt states so that VMs are started, stopped and
configured in stages. I tried using the Salt Orchestrate Runner, but it
couldn't find the states that I otherwise apply with
'qubesctl state.sls <state>' and 'qubesctl state.highstate'.

I have two use cases:

1. Salt should start, configure, and halt template VMs before it starts
app VMs that use them. For example, the Salt GPG state requires the
python-gnupg package. This package needs to be installed in the template
VM so that the Salt GPG state can import keys in the app VM.

The current sequence of my states appears to let template VMs halt
before Salt starts app VMs. But I would like to strictly enforce this
ordering between admin VM states and regular VM states.

2. Salt should ensure that service VMs are running before Salt applies
states to their client VMs. For example, I have a service VM that
exports gpg-agent's SSH socket through Qrexec. This VM needs to be
running so that the client VM can clone git repos using keys on the
service VM.

This second case is more difficult to enforce without orchestration.
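
To make the first use case concrete, the two pieces look roughly like
this (a minimal sketch; the state IDs, key ID and keyserver are only
illustrative):

gpg-deps.sls, applied to the template VM:

python-gnupg:
  pkg.installed: []

gpg-keys.sls, applied to the app VM, relying on python-gnupg already
being present in the underlying template:

import-my-key:
  gpg.present:
    - name: 0x12345678
    - user: user
    - keyserver: keys.openpgp.org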

I can approximate this functionality with a series of commands:

qubesctl --target template-vm state.highstate
qvm-shutdown template-vm
qubesctl --target service-vm state.highstate
qvm-start service-vm
qubesctl --target client-vm state.highstate

But I would like to be able to describe this orchestration in Salt.
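
If the orchestrate runner were available, I imagine it would look
something like this (a rough, untested sketch with illustrative VM and
file names):

orch/gpg.sls, run with something like 'salt-run state.orchestrate orch.gpg':

configure-template:
  salt.state:
    - tgt: template-vm
    - highstate: True

shutdown-template:
  salt.function:
    - name: cmd.run
    - tgt: dom0
    - arg:
      - qvm-shutdown --wait template-vm
    - require:
      - salt: configure-template

configure-service-vm:
  salt.state:
    - tgt: service-vm
    - highstate: True
    - require:
      - salt: shutdown-template

start-service-vm:
  salt.function:
    - name: cmd.run
    - tgt: dom0
    - arg:
      - qvm-start service-vm
    - require:
      - salt: configure-service-vm

configure-client-vm:
  salt.state:
    - tgt: client-vm
    - highstate: True
    - require:
      - salt: start-service-vm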

Does the Salt Orchestrate Runner work on Qubes? If not, is there a way
to orchestrate Salt on Qubes?

Thanks,
Brian

--
Brian C. Duggan
he/him/his

Brian C. Duggan

Jan 7, 2019, 12:20:38 PM
to qubes-users
On 1/4/19 3:08 PM, Brian C. Duggan wrote:
> 2. Salt should ensure that service VMs are running before Salt applies
> states to their client VMs. For example, I have a service VM that
> exports gpg-agent's SSH socket through Qrexec. This VM needs to be
> running so that the client VM can clone git repos using keys on the
> service VM.
>

I did some more testing. Of course, Qubes starts halted VMs when another
VM makes a Qrexec RPC call to it. The calling process on the client VM
will block until the service VM starts and the RPC call returns. So this
isn't really a valid use case for orchestration.

At first, I thought the SSH authentication attempts failed because the
service VM wasn't started yet. After more testing, I can see that the
systemd socket service just doesn't work at the stage during initial
boot when Salt runs. The socket file exists at this stage, though. SSH
authentication succeeds during subsequent Salt runs after the VM is
booted.

But I've also noticed that sometimes a new app VM's grain ID is still
the template's ID when Salt processes templates. This can be a problem
when both dom0 and app VMs need the same pillar data:

pillar/app/client-vm-1.sls:
app:
  client-vm-1:
    server-name: server-vm-1

pillar/app/client-vm-2.sls:
app:
  client-vm-2:
    server-name: server-vm-1

pillar/top.sls:
base:
  dom0,client-vm-1:
    - match: list
    - app.client-vm-1
  dom0,client-vm-2:
    - match: list
    - app.client-vm-2

dom0 needs the combined app data to set RPC policies between the clients
and their servers. The clients need their own data to configure which
service VM to send their RPC to. It's convenient for clients to find it
through pillar['app'][grains['id']]. Maybe there's a better way of
constructing this pillar data?
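
For reference, this is roughly how I consume it today (simplified from
my states; the file path and the qrexec policy name are illustrative):

In a client VM state:

{% set conf = pillar['app'][grains['id']] %}

/home/user/.config/my-app/server:
  file.managed:
    - contents: {{ conf['server-name'] }}
    - makedirs: True

In a dom0 state, combining all clients into one policy file:

/etc/qubes-rpc/policy/my.Service:
  file.managed:
    - contents: |
    {%- for client, conf in pillar.get('app', {}).items() %}
        {{ client }} {{ conf['server-name'] }} allow
    {%- endfor %}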

Is there a way to delay Salt execution on VMs until they are fully booted?

For the curious, I'm using a Salt formula to set up access to gpg-agent
on a service VM from client VMs through Qrexec:

https://gitlab.com/bcduggan/qrexec-gpg-agent-formula

Marek Marczykowski-Górecki

unread,
Jan 11, 2019, 7:41:37 PM1/11/19
to Brian C. Duggan, qubes-users

On Mon, Jan 07, 2019 at 12:20:31PM -0500, Brian C. Duggan wrote:
> On 1/4/19 3:08 PM, Brian C. Duggan wrote:
> > 2. Salt should ensure that service VMs are running before Salt applies
> > states to their client VMs. For example, I have a service VM that
> > exports gpg-agent's SSH socket through Qrexec. This VM needs to be
> > running so that the client VM can clone git repos using keys on the
> service VM.
> >
>
> I did some more testing. Of course, Qubes starts halted VMs when another
> VM makes a Qrexec RPC call to it. The calling process on the client VM
> will block until the service VM starts and the RPC call returns. So this
> isn't really a valid use case for orchestration.
>
> At first, I thought the SSH authentication attempts failed because the
> service VM wasn't started yet. After more testing, I can see that the
> systemd socket service just doesn't work at the stage during initial
> boot that Salt runs. The socket file exists at this stage, though. SSH
> authentication succeeds during subsequent Salt runs after the VM is booted.
>
> But I've also noticed that sometimes a new app VM's grain ID is still
> the template's ID when Salt processes templates.

That shouldn't happen in theory... Can you give more details, especially
which templates, and qubes* packages version?

Additionally, even if grains['id'] doesn't match, the target VM will
not get access to another VM's pillar data - this is enforced when
copying pillar data out of dom0.


> This can be a problem
> when both dom0 and app VMs need the same pillar data:
>
> pillar/app/client-vm-1.sls:
> app:
>   client-vm-1:
>     server-name: server-vm-1
>
> pillar/app/client-vm-2.sls:
> app:
>   client-vm-2:
>     server-name: server-vm-1
>
> pillar/top.sls:
> base:
>   dom0,client-vm-1:
>     - match: list
>     - app.client-vm-1
>   dom0,client-vm-2:
>     - match: list
>     - app.client-vm-2
>
> dom0 needs the combined app data to set RPC policies between the clients
> and their servers. The clients need their own data to configure which
> service VM to send their RPC to. It's convenient for clients to find it
> through pillar['app'][grains['id']]. Maybe there's a better way of
> constructing this pillar data?

The fact that you'll see only the right pillar data, regardless of
grains['id'], may help you. You can iterate over the 'app' dict and use
whatever you find there, regardless of the first-level key name.
It will complicate your configuration, but until a proper solution is
found, it should work.
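
Something along these lines, for example (untested, and the file path
is just an illustration):

{# take whatever single entry this VM can see under 'app' #}
{% set conf = pillar['app'].values() | list | first %}

/home/user/.config/my-app/server:
  file.managed:
    - contents: {{ conf['server-name'] }}
    - makedirs: True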

> Is there a way to delay Salt execution on VMs until they are fully booted?

By default it's delayed until qrexec-agent is started, which should be
after essential services. If you want, you may:

1. Add a state waiting for the user session and order other things
after it. This won't help with grains and such things, as Salt loads
them before considering states, but it may help with states that
depend on a running X server, for example. For this, add this state:

/etc/qubes-rpc/qubes.WaitForSession:
  cmd.run:
    - runas: user

2. Configure the qubes.VMRootShell qrexec service (used by Salt) in a
VM to wait for the user session. This will affect the whole Salt call
for that VM, but it also means it will wait indefinitely if no user
session is started at all (for example, if you're logged out of dom0).
For this, create /etc/qubes/rpc-config/qubes.VMRootShell in the template
with "wait-for-session=1" inside.

> For the curious, I'm using a Salt formula to set up access to gpg-agent
> on a service VM from client VMs through Qrexec:
>
> https://gitlab.com/bcduggan/qrexec-gpg-agent-formula

One MAJOR problem with giving unfiltered access to gpg-agent is that
the client can ask gpg-agent to export secret keys. That defeats the
whole purpose of keeping secret keys in a separate qube - namely, that
the client has no access to the secret material.
You may want to look at https://github.com/hw42/qubes-app-linux-split-gpg2/

I think this problem does not apply to the ssh-agent protocol, which
AFAIK does not allow the client to extract secret keys.

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

Brian C. Duggan

Jan 14, 2019, 6:34:03 AM
to qubes-users
On 1/11/19 7:41 PM, Marek Marczykowski-Górecki wrote:
> On Mon, Jan 07, 2019 at 12:20:31PM -0500, Brian C. Duggan wrote:
>> On 1/4/19 3:08 PM, Brian C. Duggan wrote:
>>> 2. Salt should ensure that service VMs are running before Salt applies
>>> states to their client VMs. For example, I have a service VM that
>>> exports gpg-agent's SSH socket through Qrexec. This VM needs to be
>>> running so that the client VM can clone git repos using keys on the
>>> service VM.
>>>
>
>> I did some more testing. Of course, Qubes starts halted VMs when another
>> VM makes a Qrexec RPC call to it. The calling process on the client VM
>> will block until the service VM starts and the RPC call returns. So this
>> isn't really a valid use case for orchestration.
>
>> At first, I thought the SSH authentication attempts failed because the
>> service VM wasn't started yet. After more testing, I can see that the
>> systemd socket service just doesn't work at the stage during initial
>> boot that Salt runs. The socket file exists at this stage, though. SSH
>> authentication succeeds during subsequent Salt runs after the VM is booted.
>
>> But I've also noticed that sometimes a new app VM's grain ID is still
>> the template's ID when Salt processes templates.
>
> That shouldn't happen in theory... Can you give more details, especially
> which templates, and qubes* packages version?
>

Sure, I was able to reproduce it and I created an issue for it with the
templates and qubes* packages versions:

https://github.com/QubesOS/qubes-issues/issues/4709

Briefly, app VMs get their templates' grains['id'] for about five
minutes after applying a state to the template and shutting down the
template.
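
To check, I'm querying the grain right after the template shuts down
with something like this (assuming qubesctl passes the module call
through unchanged):

qubesctl --target client-vm-1 grains.get id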

> Additionally, even if grains['id'] doesn't match, the target VM will
> not get access to another VM's pillar data - this is enforced when
> copying pillar data out of dom0.
>

Hm, my target VMs only get access to pillar data that is applied to them
through the top file or through templating. They don't see other VM's
pillar data. Did I understand you right?

> The fact that you'll see only the right pillar data, regardless of
> grains['id'], may help you. You can iterate over the 'app' dict and
> use whatever you find there, regardless of the first-level key name.

That's what I ended up doing, I think. In my formula I select the
first key in the app dict in the Jinja template. Since there's only one
key available to each client VM, it doesn't matter that the grains['id']
doesn't match the key name.

>> Is there a way to delay Salt execution on VMs until they are fully booted?
>
> By default it's delayed until qrexec-agent is started, which should be
> after essential services. If you want, you may:
>
> 1. Add a state waiting for the user session and order other things
> after it. This won't help with grains and such things, as Salt loads
> them before considering states, but it may help with states that
> depend on a running X server, for example. For this, add this state:
>
> /etc/qubes-rpc/qubes.WaitForSession:
>   cmd.run:
>     - runas: user
>
> 2. Configure the qubes.VMRootShell qrexec service (used by Salt) in a
> VM to wait for the user session. This will affect the whole Salt call
> for that VM, but it also means it will wait indefinitely if no user
> session is started at all (for example, if you're logged out of dom0).
> For this, create /etc/qubes/rpc-config/qubes.VMRootShell in the
> template with "wait-for-session=1" inside.
>

These are great ideas! I'll try them out.

>> For the curious, I'm using a Salt formula to set up access to gpg-agent
>> on a service VM from client VMs through Qrexec:
>
>> https://gitlab.com/bcduggan/qrexec-gpg-agent-formula
>
> One MAJOR problem with giving unfiltered access to gpg-agent is that
> the client can ask gpg-agent to export secret keys. That defeats the
> whole purpose of keeping secret keys in a separate qube - namely, that
> the client has no access to the secret material.

This is correct for the default gpg-agent 2.1.x socket, S.gpg-agent. I
don't export that one to client VMs.

Starting in GnuPG 2.1.1, gpg-agent provides a socket with restricted
functionality, S.gpg-agent.extra, which the GnuPG project says is safe
to forward to remote hosts:

https://wiki.gnupg.org/AgentForwarding

gpg-agent forbids secret key deletion and export through the extra
socket. This is the one I export with the formula. Would you consider
this filtered enough for inter-VM use?
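
In case it helps to picture it, the client side of the formula boils
down to a socket-activated proxy over qrexec, roughly like this (the
unit names and the RPC service name here are illustrative, not
necessarily what the formula actually uses):

~/.config/systemd/user/gpg-agent-extra-proxy.socket:

[Socket]
# listen where local tools expect gpg-agent's extra socket
ListenStream=%t/gnupg/S.gpg-agent.extra
Accept=true

[Install]
WantedBy=sockets.target

~/.config/systemd/user/gpg-agent-extra-proxy@.service:

[Service]
# each accepted connection is piped to the service VM over qrexec
ExecStart=/usr/bin/qrexec-client-vm service-vm qubes.GpgAgentExtra
StandardInput=socket
StandardOutput=socket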

> You may want to look at https://github.com/hw42/qubes-app-linux-split-gpg2/

Thanks, Marek. I think I have seen this before, and it's a good option.
It looks like it accomplishes one of the features of exporting the extra
socket: each client VM gets its own public keyring.

But, like normal Split GPG, it still requires using gpg wrappers on
clients and Split-GPG-specific environment variables to accommodate the
wrappers. I know that doesn't create a lot of friction for many users.
But I want to be able to use gpg and ssh on client VMs with no special
wrappers or environment. Using the extra socket allows this, too.

It also doesn't look like Split GPG2 allows SSH to use gpg-agent as an
SSH agent, which is another reason I wrote the formula.

> I think this problem does not apply to ssh-agent protocol, which AFAIK
> does not allow client to extract secret keys.
>

Yes, I think you're right about this.

Thanks!