Re: Fwd: [CRON] Broken: ClangBuiltLinux/continuous-integration#1432 (master - 0aceafc)

Nick Desaulniers

May 19, 2020, 8:56:44 PM
to Michael Ellerman, linuxppc-dev, clang-built-linux
Looks like our CI is still red from this:

https://travis-ci.com/github/ClangBuiltLinux/continuous-integration/builds/166854584

Filing a bug to follow up on:
https://github.com/ClangBuiltLinux/linux/issues/1031

On Thu, May 7, 2020 at 8:29 PM Michael Ellerman <m...@ellerman.id.au> wrote:
>
> Nick Desaulniers <ndesau...@google.com> writes:
> > Looks like ppc64le powernv_defconfig is suddenly failing the locking
> > torture tests, then locks up?
> > https://travis-ci.com/github/ClangBuiltLinux/continuous-integration/jobs/329211572#L3111-L3167
> > Any recent changes related here in -next? I believe this is the first
> > failure, so I'll report back if we see this again.
>
> Thanks for the report.
>
> There's nothing newly in next-20200507 that seems related.
>
> Odd that it just showed up.
>
> cheers
>
>
> > ---------- Forwarded message ---------
> > From: Travis CI <bui...@travis-ci.com>
> > Date: Thu, May 7, 2020 at 9:40 AM
> > Subject: [CRON] Broken: ClangBuiltLinux/continuous-integration#1432 (master
> > - 0aceafc)
> > To: <ndesau...@google.com>, <natecha...@gmail.com>
> >
> >
> > ClangBuiltLinux / continuous-integration
> > <https://travis-ci.com/github/ClangBuiltLinux/continuous-integration?utm_medium=notification&utm_source=email>
> >
> > Branch: master
> > <https://github.com/ClangBuiltLinux/continuous-integration/tree/master>
> > Build #1432 was broken
> > <https://travis-ci.com/github/ClangBuiltLinux/continuous-integration/builds/164415390?utm_medium=notification&utm_source=email>
> > Build time: 7 hrs, 0 mins, and 54 secs
> >
> > Nick Desaulniers
> > 0aceafc CHANGESET →
> > <https://github.com/ClangBuiltLinux/continuous-integration/compare/877d002bdcfe6bc5cb0255c3c39192e8175e2c19...0aceafcfcca7c4a095957efae0939a612d755077>
> >
> > Merge pull request #182 from ClangBuiltLinux/i386
> >
> > i386
> >
> >
> > --
> > Thanks,
> > ~Nick Desaulniers



--
Thanks,
~Nick Desaulniers

Nathan Chancellor

May 19, 2020, 9:01:51 PM
to Nick Desaulniers, Michael Ellerman, linuxppc-dev, clang-built-linux
This is probably still a manifestation of
https://github.com/ClangBuiltLinux/continuous-integration/issues/262
because rekicking the tests usually fixes it.

We should probably just disable the torture tests like we do for x86_64
for CI because we do not have access to QEMU 5.0.0 where this should be
fixed. I believe it is slated for 4.2.1 as well but we still have to
wait for that to be updated and packaged in Ubuntu.

Relevant threads:

https://lore.kernel.org/linuxppc-dev/20200410205932.GA880@ubuntu-s3-xlarge-x86/

https://lore.kernel.org/qemu-devel/20200414111131....@gmail.com/

Cheers,
Nathan

Michael Ellerman

May 21, 2020, 8:59:58 AM
to Nathan Chancellor, Nick Desaulniers, linuxppc-dev, clang-built-linux
Nathan Chancellor <natecha...@gmail.com> writes:
> On Tue, May 19, 2020 at 05:56:32PM -0700, 'Nick Desaulniers' via Clang Built Linux wrote:
>> Looks like our CI is still red from this:
>>
>> https://travis-ci.com/github/ClangBuiltLinux/continuous-integration/builds/166854584
>>
>> Filing a bug to follow up on:
>> https://github.com/ClangBuiltLinux/linux/issues/1031
>>
>> On Thu, May 7, 2020 at 8:29 PM Michael Ellerman <m...@ellerman.id.au> wrote:
>> >
>> > Nick Desaulniers <ndesau...@google.com> writes:
>> > > Looks like ppc64le powernv_defconfig is suddenly failing the locking
>> > > torture tests, then locks up?
>> > > https://travis-ci.com/github/ClangBuiltLinux/continuous-integration/jobs/329211572#L3111-L3167
>> > > Any recent changes related here in -next? I believe this is the first
>> > > failure, so I'll report back if we see this again.
>> >
>> > Thanks for the report.
>> >
>> > There's nothing newly in next-20200507 that seems related.
...
>
> This is probably still a manifestation of
> https://github.com/ClangBuiltLinux/continuous-integration/issues/262
> because rekicking the tests usually fixes it.

Oh yep.

I was looking at the RCU warning, which I still don't understand, but
the lockup is presumably the same problem you hit with interrupts being
lost.

> We should probably just disable the torture tests like we do for x86_64
> for CI because we do not have access to QEMU 5.0.0 where this should be
> fixed. I believe it is slated for 4.2.1 as well but we still have to
> wait for that to be updated and packaged in Ubuntu.

You just need to start building Qemu HEAD as part of your CI ;)

cheers

Nick Desaulniers

May 21, 2020, 6:23:25 PM
to Michael Ellerman, Nathan Chancellor, linuxppc-dev, clang-built-linux
On Thu, May 21, 2020 at 6:00 AM Michael Ellerman <m...@ellerman.id.au> wrote:
>
> Nathan Chancellor <natecha...@gmail.com> writes:
> > On Tue, May 19, 2020 at 05:56:32PM -0700, 'Nick Desaulniers' via Clang Built Linux wrote:
> >> Looks like our CI is still red from this:
> >>
> >> https://travis-ci.com/github/ClangBuiltLinux/continuous-integration/builds/166854584
> >>
> >> Filing a bug to follow up on:
> >> https://github.com/ClangBuiltLinux/linux/issues/1031
> >>
> >> On Thu, May 7, 2020 at 8:29 PM Michael Ellerman <m...@ellerman.id.au> wrote:
> >> >
> >> > Nick Desaulniers <ndesau...@google.com> writes:
> >> > > Looks like ppc64le powernv_defconfig is suddenly failing the locking
> >> > > torture tests, then locks up?
> >> > > https://travis-ci.com/github/ClangBuiltLinux/continuous-integration/jobs/329211572#L3111-L3167
> >> > > Any recent changes related here in -next? I believe this is the first
> >> > > failure, so I'll report back if we see this again.
> >> >
> >> > Thanks for the report.
> >> >
> >> > There's nothing newly in next-20200507 that seems related.
> ...
> >
> > This is probably still a manifestation of
> > https://github.com/ClangBuiltLinux/continuous-integration/issues/262
> > because rekicking the tests usually fixes it.

I thought we had upgraded our version of QEMU in response to this already?
https://github.com/ClangBuiltLinux/dockerimage/pull/44
https://github.com/ClangBuiltLinux/dockerimage/pull/46

>
> Oh yep.
>
> I was looking at the RCU warning, which I still don't understand, but
> the lockup is presumably the same problem you hit with interrupts being
> lost.
>
> > We should probably just disable the torture tests like we do for x86_64
> > for CI because we do not have access to QEMU 5.0.0 where this should be
> > fixed. I believe it is slated for 4.2.1 as well but we still have to
> > wait for that to be updated and packaged in Ubuntu.
>
> You just need to start building Qemu HEAD as part of your CI ;)

LOL
https://github.com/ClangBuiltLinux/dockerimage/pull/46#pullrequestreview-395639442
Yeah I think the hard part for all these dependencies is the risk
of living on the edge of "top of tree" for all of them, and trying to
control for some by using stable releases. May not always be
possible.
--
Thanks,
~Nick Desaulniers

Nathan Chancellor

May 21, 2020, 6:59:38 PM
to Nick Desaulniers, Michael Ellerman, linuxppc-dev, clang-built-linux
On Thu, May 21, 2020 at 03:23:11PM -0700, Nick Desaulniers wrote:
> On Thu, May 21, 2020 at 6:00 AM Michael Ellerman <m...@ellerman.id.au> wrote:
> >
> > Nathan Chancellor <natecha...@gmail.com> writes:
> > > On Tue, May 19, 2020 at 05:56:32PM -0700, 'Nick Desaulniers' via Clang Built Linux wrote:
> > >> Looks like our CI is still red from this:
> > >>
> > >> https://travis-ci.com/github/ClangBuiltLinux/continuous-integration/builds/166854584
> > >>
> > >> Filing a bug to follow up on:
> > >> https://github.com/ClangBuiltLinux/linux/issues/1031
> > >>
> > >> On Thu, May 7, 2020 at 8:29 PM Michael Ellerman <m...@ellerman.id.au> wrote:
> > >> >
> > >> > Nick Desaulniers <ndesau...@google.com> writes:
> > >> > > Looks like ppc64le powernv_defconfig is suddenly failing the locking
> > >> > > torture tests, then locks up?
> > >> > > https://travis-ci.com/github/ClangBuiltLinux/continuous-integration/jobs/329211572#L3111-L3167
> > >> > > Any recent changes related here in -next? I believe this is the first
> > >> > > failure, so I'll report back if we see this again.
> > >> >
> > >> > Thanks for the report.
> > >> >
> > >> > There's nothing newly in next-20200507 that seems related.
> > ...
> > >
> > > This is probably still a manifestation of
> > > https://github.com/ClangBuiltLinux/continuous-integration/issues/262
> > > because rekicking the tests usually fixes it.
>
> I thought we had upgraded our version of QEMU in response to this already?
> https://github.com/ClangBuiltLinux/dockerimage/pull/44
> https://github.com/ClangBuiltLinux/dockerimage/pull/46

That was more of a bandaid than an actual fix. It happens a lot less
often with QEMU 4.2.0, but I could still reproduce that hang very
rarely with the POWER9 machines on it. My machines are way more
powerful than the ones on Travis, which I am sure factors into that.

The real solution is to upgrade to QEMU 5.0.0, which we could probably
do via a PPA (or through our Docker image), or wait for QEMU 4.2.1,
which should hopefully have that fix since it was CC'd for QEMU stable.
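
Until one of those versions is available, the torture tests could be
gated on the QEMU version that CI actually has. What follows is only a
minimal sketch, not anything from the CI scripts: the function names are
hypothetical, the banner is assumed to look like the usual
"QEMU emulator version X.Y.Z" output of `qemu-system-ppc64 --version`,
and treating 4.2.1 as fixed is an assumption until that release ships
with the patch.

```python
import re

# Assumption: 5.0.0 has the fix (per this thread); 4.2.1 is only
# expected to have it, so this threshold may need to be raised.
FIRST_FIXED_VERSION = (4, 2, 1)


def parse_qemu_version(banner):
    """Extract (major, minor, micro) from a QEMU --version banner."""
    match = re.search(r"version (\d+)\.(\d+)\.(\d+)", banner)
    if match is None:
        raise ValueError("unrecognized QEMU version banner: %r" % banner)
    return tuple(int(part) for part in match.groups())


def should_run_torture_tests(banner):
    """Return True only when QEMU is new enough to avoid the known hang."""
    return parse_qemu_version(banner) >= FIRST_FIXED_VERSION
```

The banner would come from running the QEMU binary in the CI image;
anything older than the threshold would boot the kernel with the
torture tests disabled, as is already done for x86_64.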

> >
> > Oh yep.
> >
> > I was looking at the RCU warning, which I still don't understand, but
> > the lockup is presumably the same problem you hit with interrupts being
> > lost.
> >
> > > We should probably just disable the torture tests like we do for x86_64
> > > for CI because we do not have access to QEMU 5.0.0 where this should be
> > > fixed. I believe it is slated for 4.2.1 as well but we still have to
> > > wait for that to be updated and packaged in Ubuntu.
> >
> > You just need to start building Qemu HEAD as part of your CI ;)
>
> LOL
> https://github.com/ClangBuiltLinux/dockerimage/pull/46#pullrequestreview-395639442
> Yeah I think the hard part for all these dependencies is the risk
> of living on the edge of "top of tree" for all of them, and trying to
> control for some by using stable releases. May not always be
> possible.

Unfortunately, we are at the mercy of a bunch of different parties. If
only we had a ClangBuiltLinux build server that we maintained...

Cheers,
Nathan