Metal3 3rd party CI on ironic?

Dmitry Tantsur

Oct 6, 2020, 08:50:01

to Metal3 Development List

Hi folks,

I'm wondering if we can somehow run our CI jobs on upstream Ironic projects. BMO uses Ironic in a somewhat unusual way, so regressions are not impossible (remember how long it took us to fix all agent token problems?). We also keep seeing problems with sushy-tools.

There are two paths we could follow:

1) A proper 3rd party CI. Something (what?) will listen for events from Gerrit, trigger a run and report the results. The "proper" option but may require a lot of work on the CI side.

2) A normal Zuul CI job that somehow (how?) triggers a run, collects and reports results.

Does anyone have opinions on how we should proceed (and whether we should at all)?

Dmitry

--

Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill

Zane Bitter

Oct 6, 2020, 14:06:46

to metal...@googlegroups.com

On 6/10/20 8:49 am, Dmitry Tantsur wrote:
> Hi folks,
>
> I'm wondering if we can somehow run our CI jobs on upstream Ironic
> projects. BMO uses Ironic in a somewhat unusual way, so regressions are
> not impossible (remember how long it took us to fix all agent token
> problems?). We also keep seeing problems with sushy-tools.

+1

> There are two paths we could follow:
> 1) A proper 3rd party CI. Something (what?) will listen for events from
> Gerrit, trigger a run and report the results. The "proper" option but
> may require a lot of work on the CI side.
> 2) A normal Zuul CI job that somehow (how?) triggers a run, collects and
> reports results.
>
> Does anyone have opinions on how we should proceed (and whether we
> should at all)?

I wonder if it makes sense to use OpenLab for this - it seems like
exactly the kind of thing they exist for (if, in fact, it is still
maintained): https://docs.openlabtesting.org/

It looks like somebody started on this 18 months ago but I don't know
what happened next: https://github.com/theopenlab/openlab/issues/260

cheers,
Zane.

Dmitry Tantsur

Oct 7, 2020, 07:49:21

to Zane Bitter, Metal3 Development List

On Tue, Oct 6, 2020 at 8:06 PM Zane Bitter <zbi...@redhat.com> wrote:

> On 6/10/20 8:49 am, Dmitry Tantsur wrote:
> > Hi folks,
> >
> > I'm wondering if we can somehow run our CI jobs on upstream Ironic
> > projects. BMO uses Ironic in a somewhat unusual way, so regressions are
> > not impossible (remember how long it took us to fix all agent token
> > problems?). We also keep seeing problems with sushy-tools.
>
> +1
>
> > There are two paths we could follow:
> > 1) A proper 3rd party CI. Something (what?) will listen for events from
> > Gerrit, trigger a run and report the results. The "proper" option but
> > may require a lot of work on the CI side.
> > 2) A normal Zuul CI job that somehow (how?) triggers a run, collects and
> > reports results.
> >
> > Does anyone have opinions on how we should proceed (and whether we
> > should at all)?
>
> I wonder if it makes sense to use OpenLab for this - it seems like
> exactly the kind of thing they exist for (if, in fact, it is still
> maintained): https://docs.openlabtesting.org/

At least I'm still using it for my pet projects :) But I feel like that's the opposite thing: a zuul-based CI to run on metal3 repos.

Dmitry

> It looks like somebody started on this 18 months ago but I don't know
> what happened next: https://github.com/theopenlab/openlab/issues/260
>
> cheers,
> Zane.

--
You received this message because you are subscribed to the Google Groups "Metal3 Development List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to metal3-dev+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/metal3-dev/fc0737bb-592d-b6e6-4be0-7dae7089887e%40redhat.com.

Julia Kreger

Oct 7, 2020, 10:01:07

to Dmitry Tantsur, Zane Bitter, Metal3 Development List

On Wed, Oct 7, 2020 at 4:49 AM Dmitry Tantsur <dtan...@redhat.com> wrote:
>
>
>
> On Tue, Oct 6, 2020 at 8:06 PM Zane Bitter <zbi...@redhat.com> wrote:
>>
>> On 6/10/20 8:49 am, Dmitry Tantsur wrote:
>> > Hi folks,
>> >
>> > I'm wondering if we can somehow run our CI jobs on upstream Ironic
>> > projects. BMO uses Ironic in a somewhat unusual way, so regressions are
>> > not impossible (remember how long it took us to fix all agent token
>> > problems?). We also keep seeing problems with sushy-tools.
>>
>> +1

I think it is a good idea and worthwhile. It is kind of the same
reason we created bifrost: not only as a tool for operators, but also
as a way to make sure we didn't break standalone usage.

I guess the question I have is what would be the lightest weight way
of testing the BMO and requisite integrations? We're very resource
constrained in upstream OpenDev CI, but multinode jobs may be a
possibility? One node running

I guess what might be ideal is cross-project CI, so if a job was
defined in the ironic repo, then another zuul instance might be able
to import that configuration and exec that job based upon another
repository. There is also the github event integration if memory
serves, but that is likely a better question for the opendev infra
folks.

Out of curiosity, is this same discussion taking place on the Metal3 side?

Julia Kreger

Oct 7, 2020, 10:01:53

to Dmitry Tantsur, Zane Bitter, Metal3 Development List

And of course, on my first look at this I thought it was
openstack-discuss, not metal3-dev. I'll show myself out now :)

Doug Hellmann

Oct 7, 2020, 10:31:47

to Julia Kreger, Dmitry Tantsur, Zane Bitter, Metal3 Development List

I don't think it's likely that the metal3 community is going to set up a Zuul instance. We don't have the experience or resources. So, whatever we do either needs to be based on triggering jobs on the existing metal3 Jenkins service, porting a job over to run entirely on the OpenDev Zuul, or some mix.

Zuul already knows what patches are being tested and I presume there are jobs (or parts of jobs) that build the IPA image from those patches. How much of that could be reused to run a metal3 job? Would the same scripts run in a job on the Jenkins server, for example? Or would we want Zuul to run some steps itself (taking advantage of the Depends-On feature of Zuul), then trigger a Jenkins job that grabs pre-built artifacts from the Zuul job to use with metal3?

How many job triggers should we expect based on the rate of change in the Ironic repositories we would want to monitor? Do we have the capacity to run those jobs on the metal3 CI setup that we have?

If we had a periodic (nightly?) job, would that be useful, or would it be hidden enough that it would be ignored?

Julia Kreger

Oct 7, 2020, 12:26:46

to dhel...@redhat.com, Dmitry Tantsur, Zane Bitter, Metal3 Development List

Well, I think part of the conundrum that makes this infinitely more
difficult is the model in which the BMO controls the pods/containers,
so we would need to build/override with replacement content, ideally on
every single patch. It would be far easier if we could just point it
at an external ironic. Then all of the existing job build/setup
substrate would be there and we could likely just do this in OpenDev
without much pain. Such a capability would entirely govern how
much re-use would even be possible.

For job triggers, we're likely looking in excess of twenty a day.

I don't really believe a nightly job would be useful since it would
only be a sign that something needs to be reverted once a problem has
been identified. The code has already been broken at that point. If
that makes sense.

Doug Hellmann

Oct 7, 2020, 12:31:02

to Julia Kreger, Dmitry Tantsur, Zane Bitter, Metal3 Development List

On Wed, Oct 7, 2020 at 12:26 PM Julia Kreger <juliaash...@gmail.com> wrote:

> Well, I think part of the conundrum that makes this infinitely more
> difficult is the model in which the BMO controls the pods/containers,
> so we would need to build/override with replacement content, ideally on
> every single patch. It would be far easier if we could just point it
> at an external ironic.

That's possible using some environment variables today. I don't know if the existing CI job supports running it that way, but a new job could configure the baremetal-operator with the Ironic and Inspector endpoints.

> Then all of the existing job build/setup
> substrate would be there and we could likely just do this in OpenDev
> without much pain. Such a capability would entirely govern how
> much re-use would even be possible.
>
> For job triggers, we're likely looking in excess of twenty a day.

Mael or someone else with more knowledge of the metal3 infrastructure will have to tell us if that would be enough to cause problems.

> I don't really believe a nightly job would be useful since it would
> only be a sign that something needs to be reverted once a problem has
> been identified. The code has already been broken at that point. If
> that makes sense.

Strictly, yes. As a compromise between doing nothing and completely consuming the metal3 CI capacity with these jobs, maybe we can figure out a way to make it useful.

Maël Kimmerlin

Oct 8, 2020, 01:39:44

to Julia Kreger, dhel...@redhat.com, Dmitry Tantsur, Zane Bitter, Metal3 Development List

Hello,

I really like this idea of running a Metal3 CI for Ironic. In the current state of our infra, I strongly doubt that we would be able to provide this with the number of runs being considered. In terms of resources on Citycloud (the OpenStack provider on which our CI runs), we do not have much margin. There are already situations where we reach our quota and some CI runs fail due to resource contention from Metal3 alone. We could try to find other resources, maybe via CNCF or another sponsor. In addition, the current infra (and the Metal3 jobs) are not always stable.

If we heavily limit the scope of the jobs (simply deploying a single VM with BMO, without involving CAPM3 or CAPI), a run could complete in up to 30 minutes, requiring a VM with 4 cores, 8 GB of RAM and a 50 GB disk as a minimum, I would say. I think we could start some design and work on this, put it in Jenkins for now, not trigger it on every Ironic change for now, and see if/when we hit a capacity issue, to get the whole thing started at least. In any case, quite some work is needed on metal3-dev-env to have a simplified version that skips anything not BMO-related.
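
A rough back-of-the-envelope check of these numbers, as a sketch only. It combines the per-run footprint estimated above with the figure of roughly twenty triggers a day mentioned earlier in the thread. Spreading the runs evenly over an eight-hour window is an assumption made purely for illustration; real Gerrit activity is bursty, so actual peaks would be higher.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class JobFootprint:
    """Per-run resource estimate for a BMO-only job (figures from this thread)."""
    minutes: int = 30
    vcpus: int = 4
    ram_gb: int = 8
    disk_gb: int = 50


def daily_vm_hours(runs_per_day: int, job: JobFootprint) -> float:
    """Total VM time consumed per day, in hours."""
    return runs_per_day * job.minutes / 60


def concurrent_vms(runs_per_day: int, job: JobFootprint, window_hours: float) -> int:
    """VMs needed at once if all runs are spread evenly over window_hours.

    The even spread is an illustrative assumption, not a measurement.
    """
    return math.ceil(daily_vm_hours(runs_per_day, job) / window_hours)


job = JobFootprint()
runs = 20  # "in excess of twenty a day"
hours = daily_vm_hours(runs, job)
vms = concurrent_vms(runs, job, window_hours=8)
print(f"{hours:.1f} VM-hours/day, about {vms} concurrent VM(s), "
      f"{vms * job.vcpus} vCPUs and {vms * job.ram_gb} GB RAM at a time")
```

Under these assumptions the extra load is modest on average, so the open question is mostly how it interacts with the existing quota pressure during busy periods.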

Regarding the trigger, we could also set up an OpenDev integration in Jenkins directly, so that the job would either be triggered automatically or by a keyword in comments. We already have a partial integration with OpenDev in that Jenkins instance; we could look further there.
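
To make the keyword idea concrete, here is a minimal sketch of the filtering step, assuming events arrive as JSON lines in the shape produced by Gerrit's `stream-events` (a `comment-added` event carrying `change`, `patchSet` and `comment` fields). The keyword `metal3-test`, the list of watched projects and the returned parameter names are all hypothetical choices, not something agreed in this thread.

```python
import json
import re
from typing import Optional

# Hypothetical values: neither the keyword nor the project list comes from the thread.
TRIGGER_KEYWORD = re.compile(r"^\s*metal3-test\s*$", re.MULTILINE)
WATCHED_PROJECTS = {
    "openstack/ironic",
    "openstack/ironic-python-agent",
    "openstack/sushy-tools",
}


def job_parameters(raw_event: str) -> Optional[dict]:
    """Return parameters for a CI run, or None if the event should be ignored."""
    event = json.loads(raw_event)
    if event.get("type") != "comment-added":
        return None
    change = event.get("change") or {}
    if change.get("project") not in WATCHED_PROJECTS:
        return None
    # The keyword must stand alone on a line of the review comment.
    if not TRIGGER_KEYWORD.search(event.get("comment") or ""):
        return None
    return {
        "GERRIT_PROJECT": change["project"],
        "GERRIT_CHANGE_NUMBER": str(change["number"]),
        "GERRIT_REFSPEC": event["patchSet"]["ref"],
    }


# A made-up event for illustration; the change number and ref are not real.
sample = json.dumps({
    "type": "comment-added",
    "change": {"project": "openstack/ironic", "number": 12345},
    "patchSet": {"number": 3, "ref": "refs/changes/45/12345/3"},
    "comment": "Patch Set 3:\n\nmetal3-test",
})
print(job_parameters(sample))
```

Triggering only on an explicit comment keeps the run count under the control of reviewers, which fits the capacity concerns above better than running on every patch set.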

Best regards,

Maël

From: metal...@googlegroups.com <metal...@googlegroups.com> on behalf of Doug Hellmann <dhel...@redhat.com>
Sent: 07 October 2020 19:30
To: Julia Kreger <juliaash...@gmail.com>
Cc: Dmitry Tantsur <dtan...@redhat.com>; Zane Bitter <zbi...@redhat.com>; Metal3 Development List <metal...@googlegroups.com>
Subject: Re: [metal3-dev] Metal3 3rd party CI on ironic?

Dmitry Tantsur

Oct 8, 2020, 06:49:31

to Maël Kimmerlin, dhel...@redhat.com, Metal3 Development List

Okay, thanks for input.

It seems that it's more likely that we should re-create the same CI job on OpenDev rather than trying to somehow attach the existing Metal3 CI to OpenDev. Could someone point me at the exact way the CI is run?

The conundrum, as Julia rightfully noted, is how to use a patched version of upstream projects. We have pretty much two options:

1) Update metal3-dev-env to be able to build and use images with ironic/inspector/sushy-tools from source. We'll also need a way to re-build ironic-python-agent images.

2) Set up the CI based on Bifrost, so that Bifrost manages ironic, inspector and virtual VMs, while Metal3 uses it as external ironic.

I'm not sure what is easier and what is more beneficial for us. Any opinions?

Dmitry

Riccardo Pittau

Oct 8, 2020, 07:28:47

to Dmitry Tantsur, Maël Kimmerlin, dhel...@redhat.com, Metal3 Development List

Hello everyone,

I was waiting for this part to reply as I knew we were going to talk about source-based images :)

There is currently some work going on to be able to build images using source code directly and integrate patches coming from different sources (mainly gerrit), but it's in early stages.
We also have a good way to rebuild ironic-python-agent ramdisks using mechanisms integrated in the ipa-downloader image, although that doesn't include patching or building from source, so some work is also needed here.

At the moment I believe it would be easier to just use Bifrost and eventually switch (or not) to metal3-dev-env once we have the opportunity to build/patch the code in the images.

Thanks,

Riccardo

Doug Hellmann

Oct 8, 2020, 07:45:35

to Dmitry Tantsur, Maël Kimmerlin, Metal3 Development List

On Thu, Oct 8, 2020 at 6:49 AM Dmitry Tantsur <dtan...@redhat.com> wrote:

> Okay, thanks for input.
>
> It seems that it's more likely that we should re-create the same CI job on OpenDev rather than trying to somehow attach the existing Metal3 CI to OpenDev. Could someone point me at the exact way the CI is run?
>
> The conundrum, as Julia rightfully noted, is how to use a patched version of upstream projects. We have pretty much two options:
> 1) Update metal3-dev-env to be able to build and use images with ironic/inspector/sushy-tools from source. We'll also need a way to re-build ironic-python-agent images.
> 2) Set up the CI based on Bifrost, so that Bifrost manages ironic, inspector and virtual VMs, while Metal3 uses it as external ironic.

As Mael pointed out, we don't really need all of metal3, since the integration point is the baremetal-operator. So, a job based on Bifrost could use the quay.io image to run the baremetal-operator in a small k8s deployment like KIND or minikube and use it to drive Ironic to provision and then deprovision a host and give us a reasonable starting point for test coverage.
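
The provision-then-deprovision check described here could be driven by a small polling loop along these lines. This is only a sketch: the state names `provisioned` and `ready` are assumptions about what the BareMetalHost reports, and `get_state`, `provision` and `deprovision` are hypothetical callables that a real job would implement by talking to the cluster (for example by reading the host's provisioning state and setting or clearing its image).

```python
import time
from typing import Callable


def wait_for_state(get_state: Callable[[], str], wanted: str,
                   timeout: float = 1800, interval: float = 10,
                   sleep: Callable[[float], None] = time.sleep) -> None:
    """Poll get_state() until it returns `wanted`, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        if state == wanted:
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"host stuck in {state!r}, wanted {wanted!r}")
        sleep(interval)


def smoke_test(get_state: Callable[[], str],
               provision: Callable[[], None],
               deprovision: Callable[[], None],
               **wait_kwargs) -> None:
    """Provision one host, wait for it, then deprovision and wait again."""
    provision()
    wait_for_state(get_state, "provisioned", **wait_kwargs)  # assumed state name
    deprovision()
    wait_for_state(get_state, "ready", **wait_kwargs)  # assumed state name


# Dry run against a scripted sequence of states instead of a real cluster.
states = iter(["provisioning", "provisioning", "provisioned", "deprovisioning", "ready"])
actions = []
smoke_test(get_state=lambda: next(states),
           provision=lambda: actions.append("provision"),
           deprovision=lambda: actions.append("deprovision"),
           sleep=lambda _seconds: None)
print(actions)
```

Keeping the cluster access behind injected callables means the loop itself can be exercised without a cluster, as in the dry run above.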

Dmitry Tantsur

Oct 8, 2020, 09:59:16

to Riccardo Pittau, Metal3 Development List, Doug Hellmann

On Thu, Oct 8, 2020 at 1:28 PM Riccardo Pittau <elfo...@gmail.com> wrote:

> Hello everyone,
>
> I was waiting for this part to reply as I knew we were going to talk about source-based images :)
>
> There is currently some work going on to be able to build images using source code directly and integrate patches coming from different sources (mainly gerrit), but it's in early stages.
> We also have a good way to rebuild ironic-python-agent ramdisks using mechanisms integrated in the ipa-downloader image, although that doesn't include patching or building from source, so some work is also needed here.

I hoped that you would jump in :) Yeah, this work would probably reduce the task to just installing metal3-dev-env in a zuul job.

> At the moment I believe it would be easier to just use Bifrost and eventually switch (or not) to metal3-dev-env once we have the opportunity to build/patch the code in the images.

"Easier" assumes that we do have a way to do it now.

This approach would also prevent us from testing that we're not breaking the configuration that ironic-image/ironic-inspector-image use. I'm fine with it as a temporary solution, assuming it's actually easier.

Does anyone know how to point metal3-dev-env to an external ironic?

Dmitry

Maël Kimmerlin

Oct 8, 2020, 11:56:35

to Dmitry Tantsur, Riccardo Pittau, Metal3 Development List, Doug Hellmann

As for pointing to an external Ironic: it's not possible out of the box right now. Metal3-dev-env will deploy an Ironic instance anyway, because we do not have logic in place to prevent it. But once we have that in place, it's a matter of setting the environment variables IRONIC_URL and IRONIC_INSPECTOR_URL, I think: https://github.com/metal3-io/metal3-dev-env/blob/master/lib/network.sh#L167
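
As an illustration of what setting those two variables might look like from a job script, here is a minimal sketch. Ports 6385 and 5050 are the usual defaults for the Ironic API and Inspector; the `/v1/` suffix and the helper name `external_ironic_env` are assumptions, and 192.0.2.10 is a documentation-only address standing in for wherever the external Ironic runs.

```python
import os
from urllib.parse import urlsplit


def external_ironic_env(ironic_host: str, scheme: str = "http") -> dict:
    """Build the two endpoint variables for an Ironic running elsewhere.

    Ports 6385 (Ironic API) and 5050 (Inspector) are the services' usual
    defaults; the /v1/ suffix is an assumption about what the consumer expects.
    """
    env = {
        "IRONIC_URL": f"{scheme}://{ironic_host}:6385/v1/",
        "IRONIC_INSPECTOR_URL": f"{scheme}://{ironic_host}:5050/v1/",
    }
    # Fail early on an empty or malformed host rather than at deploy time.
    for name, url in env.items():
        parts = urlsplit(url)
        if not parts.hostname or parts.port is None:
            raise ValueError(f"{name} is not a usable URL: {url!r}")
    return env


# Merge the overrides into a copy of the current environment.
env = {**os.environ, **external_ironic_env("192.0.2.10")}
print(env["IRONIC_URL"])
print(env["IRONIC_INSPECTOR_URL"])
```

The resulting `env` mapping could be passed to `subprocess.run(..., env=env)` when launching the setup scripts. It does not by itself stop a local Ironic from being deployed, which still needs the missing logic mentioned above.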

Best regards,

Maël

From: metal...@googlegroups.com <metal...@googlegroups.com> on behalf of Dmitry Tantsur <dtan...@redhat.com>
Sent: 08 October 2020 16:58
To: Riccardo Pittau <elfo...@gmail.com>
Cc: Metal3 Development List <metal...@googlegroups.com>; Doug Hellmann <dhel...@redhat.com>

Dmitry Tantsur

Oct 15, 2020, 06:30:08

to Maël Kimmerlin, Riccardo Pittau, Metal3 Development List

Okay, then it may be more beneficial to find a way to build images from an upstream source (or somehow inject it in the images). Then we can run the CI the usual way.

Riccardo, how far are we from ^^^?

Riccardo Pittau

Oct 19, 2020, 04:14:11

to Dmitry Tantsur, Maël Kimmerlin, Metal3 Development List

Hey Dmitry,

On Thu, Oct 15, 2020 at 12:30 PM Dmitry Tantsur <dtan...@redhat.com> wrote:

> Okay, then it may be more beneficial to find a way to build images from an upstream source (or somehow inject it in the images). Then we can run the CI the usual way.
>
> Riccardo, how far are we from ^^^?

Building 100% from source is not in scope at the moment; I'm just focusing on applying patches during the container build process.
That said, going from one to the other is a small step, excluding configuration, if we really want to go down that road.
To automate the process, I'm also working on decoupling the build logic in dev-scripts from the workflow, to have a standalone build script.
This will also help in troubleshooting just the build part.
Considering the various other things happening this week and the next, I expect to have something up probably at the beginning of next week, or the week after.
