Connection plugin with slow _connect


Roman Bolshakov

Jul 8, 2021, 3:32:34 PM
to ansibl...@googlegroups.com, rjo...@redhat.com, Konstantin Shelekhin, Vyacheslav Spiridonov
Hi all,

I'm working on a guestfs [1] connection plugin and seeking design
advice.

libguestfs provides a set of command-line tools that can be used to
operate on virtual machine disk images and modify their contents.

For every task, the connection plugin:

1. Starts guestfish in --remote mode on a remote host over ssh and adds
   a disk (passed as a parameter to the guestfs connection).

2. Runs the supermin appliance [2][3]. It typically takes two to four
   seconds to spin up the appliance VM.

3. Mounts the root filesystem partition (the partition number is passed
   as a parameter to the guestfs connection).

4. Performs the task:

Some implementation details:
- put_file/fetch_file are implemented using the copy-in/copy-out [4][5]
  guestfish commands

- there's an intermediate copy to/from the remote host over ssh (to
  enable remote guestfs operation)

- exec_command is implemented using the "command" [6] guestfish command

5. Stops the supermin appliance/guestfish instance
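
To make the lifecycle concrete, here is a minimal Python sketch of those
five steps. It assumes passwordless ssh to the remote host, and the names
(remote_host, disk_path, root_part, task_cmd) are illustrative rather
than the plugin's actual option names:

    import shlex
    import subprocess

    def ssh(remote_host, command):
        # Run a shell command on the remote host and return its stdout.
        return subprocess.run(
            ["ssh", remote_host, command],
            check=True, capture_output=True, text=True,
        ).stdout

    def run_task(remote_host, disk_path, root_part, task_cmd):
        # 1. Start guestfish in listen mode with the disk attached; it
        #    prints "GUESTFISH_PID=nnn; export GUESTFISH_PID;".
        out = ssh(remote_host,
                  "guestfish --listen -a " + shlex.quote(disk_path))
        pid = out.split("=")[1].split(";")[0]
        fish = "guestfish --remote=%s -- " % pid
        try:
            # 2. Boot the supermin appliance (the 2-4 second step).
            ssh(remote_host, fish + "run")
            # 3. Mount the root filesystem partition.
            ssh(remote_host, fish + "mount /dev/sda%s /" % root_part)
            # 4. Perform the task inside the guest image.
            return ssh(remote_host,
                       fish + "command " + shlex.quote(task_cmd))
        finally:
            # 5. Stop the appliance and the guestfish instance.
            ssh(remote_host, fish + "exit")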


Here's an example of how it looks in a playbook:

- name: Add disk image to inventory
  add_host:
    name: "{{ vm_disk_path }}"
    ansible_host: "{{ ansible_host }}"
    ansible_connection: guestfs
    ansible_guestfs_disk_path: "{{ vm_disk_path }}"
    ansible_guestfs_root_partnum: "{{ root_partnum }}"
  changed_when: false

- name: Test guestfs
  ping:
  delegate_to: "{{ vm_disk_path }}"

The ping module is executed using the execution environment from within
the disk image on the remote host:

TASK [Add disk image to inventory] ******************************************
ok: [remote-hypervisor]

TASK [Test guestfs] *********************************************************
ok: [remote-hypervisor -> /home/user/test.qcow2]

Likewise, a role can be delegated to the guestfs disk image.

The problem is that _connect() spins up the supermin VM on every task and
stops it afterwards, so it takes at least two seconds just to perform
_connect(). Obviously that's very slow for plays with a lot of tasks and
roles.

The question is: how can this be optimized to avoid the costly
_connect() caused by the appliance start?

I think of the following approaches:

1. Introduce a separate module that starts or stops the guestfs appliance,
   and remove that action from the connection plugin

Pros: similar to the lxd, docker, virt connections, which have separate
tasks for starting/stopping the containers/VMs
Cons: extra tasks need to be added to every play to start/stop guestfish
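
As a sketch of approach 1, such a module could simply wrap guestfish
--listen and "exit". The module name (guestfs_appliance) and its options
below are hypothetical:

    #!/usr/bin/python
    # Hypothetical "guestfs_appliance" module, sketching approach 1.
    import subprocess
    from ansible.module_utils.basic import AnsibleModule

    def main():
        module = AnsibleModule(argument_spec=dict(
            disk_path=dict(type='path'),  # required when state=started
            state=dict(choices=['started', 'stopped'], default='started'),
            pid=dict(type='int'),         # required when state=stopped
        ))
        if module.params['state'] == 'started':
            # guestfish --listen prints
            # "GUESTFISH_PID=nnn; export GUESTFISH_PID;" on stdout.
            out = subprocess.run(
                ['guestfish', '--listen', '-a', module.params['disk_path']],
                check=True, capture_output=True, text=True).stdout
            pid = int(out.split('=')[1].split(';')[0])
            # Boot the appliance once; later tasks reuse it via the PID.
            subprocess.run(['guestfish', '--remote=%d' % pid, 'run'],
                           check=True)
            module.exit_json(changed=True, pid=pid)
        else:
            # "exit" stops both the appliance and the listening guestfish.
            subprocess.run(
                ['guestfish', '--remote=%d' % module.params['pid'], 'exit'],
                check=True)
            module.exit_json(changed=True)

    if __name__ == '__main__':
        main()

The returned pid would then have to be handed to the connection (and to
the eventual stop task), which is exactly the bookkeeping those extra
tasks add.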

2. Add a separate meta task that closes the connection, plus a connection
   flag that keeps guestfish running after the first task

The meta task 'close_connection' could be added either as a separate
module or as an extension to the builtin meta module.

Cons:
- it looks flaky - guestfish might unintentionally be left running
  somewhere in the middle of the play in case of an error. Extra
  care (e.g. blocks) might be needed to always close the guestfs
  connection.
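
The plugin-side change for approach 2 could be as small as gating
close(). A sketch, where the option name guestfs_keep_alive and the
helper _shutdown_guestfish() are hypothetical:

    from ansible.plugins.connection import ConnectionBase

    class Connection(ConnectionBase):  # fragment of the existing plugin
        def close(self):
            # Keep guestfish running between tasks unless a
            # 'close_connection' meta task has forced a real close.
            if (self.get_option('guestfs_keep_alive')
                    and not getattr(self, '_force_close', False)):
                return
            self._shutdown_guestfish()  # guestfish --remote=<pid> exit
            self._connected = False

The flaw noted in the cons remains: if the play dies before the meta
task runs, nothing performs the real close.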

3. Extend the persistent connection framework [7]. There could be a new
   mode that keeps the connection open for a sequence of tasks running
   on the same connection, without an explicit timeout. The mode would
   look like this:

task 1 on a guestfs connection - implicit _connect
task 2 on the same guestfs connection - no _connect
...
task n on the same guestfs connection - no _connect
task z on any other connection or the end of play - implicit close()
of the guestfs connection

Pros: reliable, tidy - no need for extra tasks/blocks
Cons:
- need to modify ansible core - task_executor, etc. :)
- not sure if ansible is able to persist connections across roles
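
In core terms, that mode amounts to caching the connection while
consecutive tasks resolve to the same connection identity. Purely
illustrative pseudologic, not Ansible's actual task_executor code:

    # Illustrative only: keep one connection cached while consecutive
    # tasks resolve to the same identity; close it as soon as a task
    # resolves to a different one (or the play ends).
    _cached = {}  # connection identity -> live connection

    def get_connection(identity, factory):
        # Close any cached connection the new task does not reuse.
        for key in list(_cached):
            if key != identity:
                _cached.pop(key).close()  # the implicit close()
        if identity not in _cached:
            conn = factory()
            conn._connect()  # the expensive appliance boot, done once
            _cached[identity] = conn
        return _cached[identity]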

Looking forward to feedback on which of the approaches is the most
solid/sane.

1. https://libguestfs.org
2. https://libguestfs.org/guestfs-internals.1.html#architecture
3. https://libguestfs.org/supermin.1.html
4. https://libguestfs.org/guestfish.1.html#copy-in
5. https://libguestfs.org/guestfish.1.html#copy-out
6. https://libguestfs.org/guestfish.1.html#command
7. https://www.ansible.com/deep-dive-with-network-connection-plugins

Thanks,
Roman

Roman Bolshakov

Jul 9, 2021, 5:52:33 AM
to Richard W.M. Jones, ansibl...@googlegroups.com, Konstantin Shelekhin, Vyacheslav Spiridonov
On Thu, Jul 08, 2021 at 09:11:08PM +0100, Richard W.M. Jones wrote:
> On Thu, Jul 08, 2021 at 10:32:25PM +0300, Roman Bolshakov wrote:
> > Hi all,
> >
> > I'm working on a guestfs [1] connection plugin and seeking design
> > advice.
> >
> > libguestfs provides a set of command-line tools that can be used to
> > operate on virtual machine disk images and modify their contents.
> >
> > For every task, the connection plugin:
> >
> > 1. Starts guestfish in --remote mode on a remote host over ssh and adds
> > a disk (passed as a parameter to guestfs connection).
> >
> > 2. Runs the supermin appliance [2][3]. It typically takes two to four
> > seconds to spin up the appliance VM.
>
> Depending on the target, simply running something like
>
> guestfish -a /dev/null run
>
> will create and cache an appliance in /var/tmp/.guestfs-$UID/ (and
> it's safe if two processes run in parallel). Once the appliance is
> cached new libguestfs instances will use the cached appliance without
> any delay.
>
> Doesn't this mechanism work?

Hi Rich,

Appliance caching indeed works. If I remove the cache, it takes around 20
seconds to rebuild a new appliance, which is then used for new libguestfs
instances.

I was rather talking about the inherent latency caused by the instance/VM
start. In the current implementation of the guestfs plugin, the appliance
is started before each task and stopped afterwards.

My intent is to find a way to run multiple ansible tasks on the same
libguestfs instance. That saves 2-4 seconds per task.

>
> Nevertheless for virt-v2v we have something similar because virt-v2v
> is a long-running process that we want to start and query status from.
> My colleague wrote a wrapper (essentially a sort of daemon) which
> manages virt-v2v, and I guess may be useful to look at:
>
> https://github.com/ManageIQ/manageiq-v2v-conversion_host/tree/master/wrapper

I'm doing something similar, except I'm running guestfish --remote
under nohup, remembering the PID and then interacting with it. If we find
a way to pass the PID associated with a connection from task to task in
ansible, and kill it when it's no longer needed (to be able to start a
real VM with the disk image), then we can achieve very fast and reliable
task execution on the disk images.
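
One conceivable way to rediscover a running guestfish between tasks,
purely as a sketch (the plugin does not do this today): key a pidfile
off the disk path on the remote host, so each task's _connect() can
find a guestfish that is already serving the disk.

    import hashlib
    import os

    def pidfile_for(disk_path):
        # Stable per-disk, per-user location, loosely mirroring the
        # /var/tmp/.guestfs-$UID convention libguestfs uses for its
        # cached appliance.
        digest = hashlib.sha256(disk_path.encode()).hexdigest()[:16]
        return '/var/tmp/.guestfish-%s-%d.pid' % (digest, os.getuid())

    def existing_guestfish(disk_path):
        # Return the PID of a guestfish already serving this disk, or None.
        try:
            with open(pidfile_for(disk_path)) as f:
                pid = int(f.read())
            os.kill(pid, 0)  # liveness probe; raises OSError if gone
            return pid
        except (OSError, ValueError):
            return None

These helpers would run on the remote host (e.g. over the same ssh
channel), and the stop task or final close() would remove the pidfile
after killing guestfish.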

Thanks,
Roman

>
> Rich.
> --
> Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
> Read my programming and virtualization blog: http://rwmj.wordpress.com
> virt-builder quickly builds VMs from scratch
> http://libguestfs.org/virt-builder.1.html
>

Roman Bolshakov

Jul 9, 2021, 6:43:22 AM
to Richard W.M. Jones, ansibl...@googlegroups.com, Konstantin Shelekhin, Vyacheslav Spiridonov
On Fri, Jul 09, 2021 at 11:02:38AM +0100, Richard W.M. Jones wrote:
> Oh I see, yes that's right.
>
> > My intent is to find a way to run multiple ansible tasks on the same
> > libguestfs instance. That saves 2-4 seconds per task.
>
> You shouldn't really reuse the same appliance across trust boundaries
> (eg. if processing two disks which are owned by different tenants of
> your cloud), since it means one tenant would be able to interfere with
> or extract secrets from the other tenant. The 2-4 seconds is the
> price you pay here I'm afraid :-/
>
> If all disks you are processing are owned by the same tenant then
> there's no worry about security.
>

Right, I'm only trying to optimize access to the same disk by a set of
consecutive ansible tasks in the same playbook (typically belonging to a
VM owned by a specific user), so the trust boundaries are preserved.

Thanks,
Roman

> > >
> > > Nevertheless for virt-v2v we have something similar because virt-v2v
> > > is a long-running process that we want to start and query status from.
> > > My colleague wrote a wrapper (essentially a sort of daemon) which
> > > manages virt-v2v, and I guess may be useful to look at:
> > >
> > > https://github.com/ManageIQ/manageiq-v2v-conversion_host/tree/master/wrapper
> >
> > I'm doing something similar, except I'm running guestfish --remote
> > under nohup, remembering the PID and then interacting with it. If we find
> > a way to pass the PID associated with a connection from task to task in
> > ansible, and kill it when it's no longer needed (to be able to start a
> > real VM with the disk image), then we can achieve very fast and reliable
> > task execution on the disk images.
>
> Rich.
>
> --
> Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
> Read my programming and virtualization blog: http://rwmj.wordpress.com
> virt-p2v converts physical machines to virtual machines. Boot with a
> live CD or over the network (PXE) and turn machines into KVM guests.
> http://libguestfs.org/virt-v2v
>

Roman Bolshakov

Jul 12, 2021, 2:15:35 PM
to ansibl...@googlegroups.com, rjo...@redhat.com, Konstantin Shelekhin, Vyacheslav Spiridonov
Brian Coca from the Ansible team told me on IRC that this is the way to go.

I'm all set,
Thanks!