
A head buildworld race visible in the ci.freebsd.org build history


Mark Millard

Jun 16, 2018, 1:59:03 AM
In watching ci.freebsd.org builds I've seen a notable
number of one-time failures, such as (example from
powerpc64):

--- all_subdir_lib/libufs ---
ranlib -D libufs.a
ranlib: fatal: Failed to open 'libufs.a'
*** [libufs.a] Error code 70

where the next build works despite the change being
irrelevant to whatever ranlib complained about.

Other builds failed similarly:

--- all_subdir_lib/libbsm ---
ranlib -D libbsm_p.a
ranlib: fatal: Failed to open 'libbsm_p.a'
*** [libbsm_p.a] Error code 70

and:

--- kerberos5/lib__L ---
ranlib -D libgssapi_spnego_p.a
--- libgssapi_spnego.a ---
ranlib -D libgssapi_spnego.a
--- libgssapi_spnego_p.a ---
ranlib: fatal: Failed to open 'libgssapi_spnego_p.a'
*** [libgssapi_spnego_p.a] Error code 70

and so on.


It is not limited to powerpc64. For example, for aarch64
there are:

--- libpam_exec.a ---
building static pam_exec library
ar -crD libpam_exec.a `NM='nm' NMFLAGS='' lorder pam_exec.o | tsort -q`
ranlib -D libpam_exec.a
ranlib: fatal: Failed to open 'libpam_exec.a'
*** [libpam_exec.a] Error code 70

and:

--- all_subdir_lib/libusb ---
ranlib -D libusb.a
ranlib: fatal: Failed to open 'libusb.a'
*** [libusb.a] Error code 70

and:

--- all_subdir_lib/libbsnmp ---
ranlib: fatal: Failed to open 'libbsnmp.a'
--- all_subdir_lib/ncurses ---
--- all_subdir_lib/ncurses/panelw ---
--- panel.pico ---
--- all_subdir_lib/libbsnmp ---
*** [libbsnmp.a] Error code 70


Even amd64 gets such failures:

--- libpcap.a ---
ranlib -D libpcap.a
ranlib: fatal: Failed to open 'libpcap.a'
*** [libpcap.a] Error code 70

and:


--- libkafs5.a ---
ranlib: fatal: Failed to open 'libkafs5.a'
--- libkafs5_p.a ---
ranlib: fatal: Failed to open 'libkafs5_p.a'
--- cddl/lib__L ---
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/lua/lbaselib.c:60:26: note: include the header <ctype.h> or explicitly provide a declaration for 'toupper'
--- kerberos5/lib__L ---
*** [libkafs5_p.a] Error code 70

make[5]: stopped in /usr/src/kerberos5/lib/libkafs5
--- libkafs5.a ---
*** [libkafs5.a] Error code 70

and:


--- lib__L ---
ranlib -D libclang_rt.asan_cxx-i386.a
ranlib: fatal: Failed to open 'libclang_rt.asan_cxx-i386.a'
*** [libclang_rt.asan_cxx-i386.a] Error code 70


(Notice the variability in which .a files the ranlib runs fail for.)
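
To tally which archives are affected across console logs, something like the following could be used. It is only a sketch: `failed_archives` is a hypothetical helper, the regular expression is based solely on the failure lines quoted above, and the sample text is stitched together from those excerpts.

```python
import re
from collections import Counter

# Matches the failure line as it appears in the console logs quoted above.
RANLIB_FAIL = re.compile(r"^ranlib: fatal: Failed to open '([^']+)'$")


def failed_archives(log_text):
    """Return a Counter of archive names that ranlib failed to open."""
    counts = Counter()
    for line in log_text.splitlines():
        match = RANLIB_FAIL.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample assembled from the excerpts above, not a complete log.
sample = """\
--- all_subdir_lib/libufs ---
ranlib -D libufs.a
ranlib: fatal: Failed to open 'libufs.a'
*** [libufs.a] Error code 70
--- libkafs5.a ---
ranlib: fatal: Failed to open 'libkafs5.a'
--- libkafs5_p.a ---
ranlib: fatal: Failed to open 'libkafs5_p.a'
"""

for name, count in sorted(failed_archives(sample).items()):
    print(name, count)
```

Because parallel make interleaves output from several targets, the sketch keys on the ranlib message itself rather than on the preceding "--- target ---" line.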





===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

_______________________________________________
freebsd...@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-curre...@freebsd.org"

Bryan Drewery

Jun 18, 2018, 3:46:41 PM
I looked at this a few days ago and don't believe it's actually a build
race. I think there is something wrong with the ar/ranlib on that system
or something else. I've found no evidence of concurrent building of the
.a files in question.


--
Regards,
Bryan Drewery

Konstantin Belousov

Jun 18, 2018, 4:49:46 PM
FWIW, I got a similar failure when I did the last checks for the OFED
commit. For me, it was libgcc.a.

Mark Millard

Jun 18, 2018, 5:07:51 PM
Looking at a bunch of the failures, spanning multiple
FreeBSD-head-*-build types of builds, I see only:

NODE_LABELS bhyve_host butler1.nyi.freebsd.org jailer jailer_fast
NODE_NAME butler1.nyi.freebsd.org

for the failures that I looked at.

So your "on that system" might well be correct.
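
The check described above (which node each failing build ran on) could be automated roughly as follows. This is a sketch under assumptions: it expects the NODE_NAME line to appear in the text exactly as quoted above, and the build labels and the second node name are made up for illustration.

```python
from collections import defaultdict


def node_name(log_text):
    """Return the value of the NODE_NAME line, or None if it is absent."""
    for line in log_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "NODE_NAME":
            return fields[1]
    return None


def group_by_node(logs):
    """Map node name -> list of build labels, given {label: log_text}."""
    groups = defaultdict(list)
    for label, text in logs.items():
        groups[node_name(text)].append(label)
    return dict(groups)


# Hypothetical labels; only the butler1 lines come from the thread.
logs = {
    "build-A": "NODE_LABELS bhyve_host butler1.nyi.freebsd.org jailer jailer_fast\n"
               "NODE_NAME butler1.nyi.freebsd.org\n",
    "build-B": "NODE_NAME some-other-node.example.org\n",
    "build-C": "NODE_NAME butler1.nyi.freebsd.org\n",
}

for node, labels in group_by_node(logs).items():
    print(node, labels)
```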


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

Bryan Drewery

Jun 18, 2018, 6:31:55 PM
If it was -lgcc_s then it's a known rare build race due to
tools/install.sh not handling -S.

--
Regards,
Bryan Drewery


Li-Wen Hsu

Jun 18, 2018, 6:35:13 PM
It seems to be a more general problem. This one:

https://ci.freebsd.org/job/FreeBSD-head-aarch64-build/8190/console

calls for libcuse_p.a, while this one:

https://ci.freebsd.org/job/FreeBSD-head-mips-build/2919/console

calls for libfifolog.a

--
Li-Wen Hsu <lw...@FreeBSD.org>
https://lwhsu.org

Li-Wen Hsu

Jun 18, 2018, 6:37:16 PM
Thanks for the insight. The build is done in an 11.1-R jail on a
-CURRENT host. butler1.nyi is running r333388 (as a canary) while
other builders are mostly running r328278. I upgraded a few others and
they seem to reproduce the issue, so I have now downgraded all the build
slaves to r328278 until we find the root cause.

Li-Wen

--
Li-Wen Hsu <lw...@FreeBSD.org>
https://lwhsu.org

Bryan Drewery

Jun 18, 2018, 6:40:03 PM
Well why is ar -> ranlib so special? Nothing else is failing.
What filesystem are these using for objdirs?
What revision is the host kernel?

--
Regards,
Bryan Drewery


Bryan Drewery

Jun 18, 2018, 7:12:56 PM
On 6/18/2018 3:27 PM, Li-Wen Hsu wrote:
> ranlib -D libpcap.a
> ranlib: fatal: Failed to open 'libpcap.a'

Where is this error even coming from? It's not in the usr.bin/ar code
and ranlib does not cause it.

# ranlib -D uh
ranlib: warning: uh: no such file



--
Regards,
Bryan Drewery


Bryan Drewery

Jun 18, 2018, 7:33:18 PM
On 6/18/2018 3:27 PM, Li-Wen Hsu wrote:
The error is coming from libarchive which had a change between those
revisions:

> ------------------------------------------------------------------------
> r328332 | mm | 2018-01-24 06:24:17 -0800 (Wed, 24 Jan 2018) | 14 lines
>
> MFV r328323,328324:
> Sync libarchive with vendor.
>
> Relevant vendor changes:
> PR #893: delete dead ppmd7 alloc callbacks
> PR #904: Fix archive freeing bug in bsdcat
> PR #961: Fix ZIP format names
> PR #962: Don't modify attributes for existing directories
> when ARCHIVE_EXTRACT_NO_OVERWRITE is set
> PR #964: Fix -Werror=implicit-fallthrough= for GCC 7
> PR #970: zip: Allow backslash as path separator
>
> MFC after: 1 week
>
> ------------------------------------------------------------------------

From a brief look, though, nothing obvious in the change stands out to me.


--
Regards,
Bryan Drewery


Ed Maste

Jun 18, 2018, 8:40:30 PM
On 18 June 2018 at 19:29, Bryan Drewery <bdre...@freebsd.org> wrote:
>
> The error is coming from libarchive which had a change between those
> revisions:
>
>> ------------------------------------------------------------------------
>> r328332 | mm | 2018-01-24 06:24:17 -0800 (Wed, 24 Jan 2018) | 14 lines

Li-Wen reported that the build is done in a 11.1-rel jail though, so
the libarchive (or any userland) change shouldn't be responsible.

Can we update a canary builder to somewhere between r328278 and r333388?

Mark Millard

Jun 18, 2018, 9:17:43 PM


On 2018-Jun-18, at 4:08 PM, Bryan Drewery <bdrewery at FreeBSD.org> wrote:

> On 6/18/2018 3:27 PM, Li-Wen Hsu wrote:
>> ranlib -D libpcap.a
>> ranlib: fatal: Failed to open 'libpcap.a'
>
> Where is this error even coming from? It's not in the usr.bin/ar code
> and ranlib does not cause it.
>
> # ranlib -D uh
> ranlib: warning: uh: no such file

A more complete sequence is (with some
other text mixed in, as in where I got
the text from on ci.freebsd.org):

--- libvgl.a ---
building static vgl library
ar -crD libvgl.a `NM='nm' NMFLAGS='' lorder main.o simple.o bitmap.o text.o mouse.o keyboard.o | tsort -q`
--- all_subdir_lib/libsysdecode ---
ranlib -D libsysdecode.a
--- all_subdir_lib/libvgl ---
ranlib -D libvgl.a
ranlib: fatal: Failed to open 'libvgl.a'
--- all_subdir_lib/libsysdecode ---
ranlib: fatal: Failed to open 'libsysdecode.a'
--- all_subdir_lib/libvgl ---
*** [libvgl.a] Error code 70

So, in essence,

ar -crD libvgl.a `NM='nm' NMFLAGS='' lorder main.o simple.o bitmap.o text.o mouse.o keyboard.o | tsort -q`
ranlib -D libvgl.a
ranlib: fatal: Failed to open 'libvgl.a'

It is not obvious to me that the "Failed to open" means
that there was "no such file". Might there be some other
kind of "Failed to open" for a file that does exist, given
that ar had at least created its output .a file?
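
On the question of whether "Failed to open" has to mean "no such file": a generic message like this can hide the underlying errno. The following is only an illustration in Python, not the ar/ranlib or libarchive code, and it does not reproduce the CI failure. `describe_open_failure` is a hypothetical helper and the file name is made up. It shows that the same open call can fail for different reasons, which a single "Failed to open" string would not distinguish.

```python
import errno
import os
import tempfile


def describe_open_failure(path):
    """Open path read-only; return None on success, else the errno name."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)
    return None


with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "libexample.a")  # made-up name
    before = describe_open_failure(archive)      # file does not exist yet
    with open(archive, "wb") as f:
        f.write(b"!<arch>\n")                    # ar archive magic string
    after = describe_open_failure(archive)       # file now exists
    # Treating the regular file as a directory fails for a different reason.
    nested = describe_open_failure(os.path.join(archive, "x"))
    print(before, after, nested)
```

Logging the errno (or its name) next to the failing path on the builder would be one way to tell these cases apart.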


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

Mark Millard

Jun 18, 2018, 9:51:09 PM
Also, if what varies between failing and working builds is the head
system version, and what stays the same is the 11.1R jail, then what
matters for the ar -> ranlib sequence behavior would seem to be the
underlying head system software on each host, not 11.1R's ar or ranlib
or the 11.1R libraries they indirectly use. Nor would it be head's ar
or ranlib (or their indirections): head's copies are unused.

The only parts of head that could be involved are parts that the
11.1R jail does not avoid.

This suggests more basic infrastructure in head to me.

Li-Wen Hsu

Jun 19, 2018, 11:07:09 AM
On Mon, Jun 18, 2018 at 8:36 PM Ed Maste <ema...@freebsd.org> wrote:
> Li-Wen reported that the build is done in a 11.1-rel jail though, so
> the libarchive (or any userland) change shouldn't be responsible.
>
> Can we update a canary builder to somewhere between r328278 and r333388?

butler1.nyi.freebsd.org is running r331373 now.

--
Li-Wen Hsu <lw...@FreeBSD.org>
https://lwhsu.org

Mark Millard

Jun 19, 2018, 9:27:44 PM


On 2018-Jun-19, at 8:02 AM, Li-Wen Hsu <lwhsu at freebsd.org> wrote:

> On Mon, Jun 18, 2018 at 8:36 PM Ed Maste <emaste at freebsd.org> wrote:
>> Li-Wen reported that the build is done in a 11.1-rel jail though, so
>> the libarchive (or any userland) change shouldn't be responsible.
>>
>> Can we update a canary builder to somewhere between r328278 and r333388?
>
> butler1.nyi.freebsd.org is running r331373 now.


But there seems to be another of the ar -> ranlib failures
after that on butler1.nyi.freebsd.org :

https://ci.freebsd.org/job/FreeBSD-head-powerpc-build/6321/ shows:

22:12:05 --- _bootstrap-tools-lib/liby ---
22:12:05 ranlib -D liby.a
22:12:05 ranlib: fatal: Failed to open 'liby.a'
22:12:05 *** [liby.a] Error code 70


with:

NODE_LABELS bhyve_host butler1.nyi.freebsd.org jailer jailer_fast
NODE_NAME butler1.nyi.freebsd.org



And in fact there is at least one more:

https://ci.freebsd.org/job/FreeBSD-head-sparc64-build/8291/consoleText

shows:

--- all_subdir_lib/libipsec ---
ranlib -D libipsec_p.a
ranlib: fatal: Failed to open 'libipsec_p.a'
*** [libipsec_p.a] Error code 70



===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

Li-Wen Hsu

Jun 20, 2018, 12:18:25 AM
On Tue, Jun 19, 2018 at 9:24 PM Mark Millard <mar...@yahoo.com> wrote:
>
> On 2018-Jun-19, at 8:02 AM, Li-Wen Hsu <lwhsu at freebsd.org> wrote:
>
> > On Mon, Jun 18, 2018 at 8:36 PM Ed Maste <emaste at freebsd.org> wrote:
> >> Li-Wen reported that the build is done in a 11.1-rel jail though, so
> >> the libarchive (or any userland) change shouldn't be responsible.
> >>
> >> Can we update a canary builder to somewhere between r328278 and r333388?
> >
> > butler1.nyi.freebsd.org is running r331373 now.
>
>
> But there seems to be another of the ar -> ranlib failures
> after that on butler1.nyi.freebsd.org :

Yes, I was trying to narrow down the cause; it now seems to lie between
r328278 and r330304.

butler1.nyi.freebsd.org is back to running r328278, and I'll try to
reproduce this elsewhere.

--
Li-Wen Hsu <lw...@FreeBSD.org>
https://lwhsu.org

Mark Millard

Jun 20, 2018, 1:58:38 AM


On 2018-Jun-19, at 9:14 PM, Li-Wen Hsu <lwhsu at freebsd.org> wrote:

> On Tue, Jun 19, 2018 at 9:24 PM Mark Millard <marklmi at yahoo.com> wrote:
>>
>> On 2018-Jun-19, at 8:02 AM, Li-Wen Hsu <lwhsu at freebsd.org> wrote:
>>
>>> On Mon, Jun 18, 2018 at 8:36 PM Ed Maste <emaste at freebsd.org> wrote:
>>>> Li-Wen reported that the build is done in a 11.1-rel jail though, so
>>>> the libarchive (or any userland) change shouldn't be responsible.
>>>>
>>>> Can we update a canary builder to somewhere between r328278 and r333388?
>>>
>>> butler1.nyi.freebsd.org is running r331373 now.
>>
>>
>> But there seems to be another of the ar -> ranlib failures
>> after that on butler1.nyi.freebsd.org :
>
> Yes I was trying to narrow down the cause, now it seems between
> r328278 and r330304.
>
> butler1.nyi.freebsd.org is back to run r328278. And I'll try to
> reproduce this in elsewhere.

Okay. Then I'll stop checking and reporting whether builds on
butler1.nyi.freebsd.org pass or fail (implicitly: search direction
information).

I will report if I see any new examples. (Seems unlikely.)


Side note . . .

It took me a while to find where to look for the head version
and jail version involved. For what I reported (powerpc):

22:12:03 uname:
22:12:03 FreeBSD FreeBSD-head-powerpc-build.jail.ci.FreeBSD.org 11.1-RELEASE FreeBSD 12.0-CURRENT #0 r330304M: Sat Mar 3 02:23:02 UTC 2018 pe...@build-12.freebsd.org:/usr/obj/usr/src/sys/CLUSTER12 amd64

Now I know.
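
For pulling those two versions out of many logs, a small parser along these lines might help. It is a sketch: `parse_uname` and the field names are hypothetical, the pattern is fitted only to the single uname line quoted above, and reading the first version as the jail's release and the second as the host kernel's is an interpretation based on this thread.

```python
import re

# Fitted to the one uname line quoted above; other layouts may not match.
UNAME_RE = re.compile(
    r"FreeBSD\s+(?P<host>\S+)\s+(?P<release>\S+)\s+"
    r"FreeBSD\s+(?P<kernel>\S+)\s+#\d+\s+r(?P<rev>\d+)(?P<modified>M?):"
)


def parse_uname(line):
    """Return a dict of fields from a uname line, or None if it does not match."""
    m = UNAME_RE.search(line)
    if m is None:
        return None
    return {
        "host": m.group("host"),
        "release": m.group("release"),        # jail version, e.g. 11.1-RELEASE
        "kernel": m.group("kernel"),          # head version, e.g. 12.0-CURRENT
        "revision": int(m.group("rev")),      # svn revision of the host kernel
        "modified": m.group("modified") == "M",
    }


line = (
    "22:12:03 FreeBSD FreeBSD-head-powerpc-build.jail.ci.FreeBSD.org "
    "11.1-RELEASE FreeBSD 12.0-CURRENT #0 r330304M: Sat Mar 3 02:23:02 UTC 2018 "
    "pe...@build-12.freebsd.org:/usr/obj/usr/src/sys/CLUSTER12 amd64"
)
info = parse_uname(line)
print(info["release"], info["kernel"], info["revision"], info["modified"])
```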


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

Mark Millard

Jun 21, 2018, 5:53:05 PM
On 2018-Jun-19, at 9:14 PM, Li-Wen Hsu <lwhsu at freebsd.org> wrote:

> On Tue, Jun 19, 2018 at 9:24 PM Mark Millard <marklmi at yahoo.com> wrote:
>>
>> On 2018-Jun-19, at 8:02 AM, Li-Wen Hsu <lwhsu at freebsd.org> wrote:
>>
>>> On Mon, Jun 18, 2018 at 8:36 PM Ed Maste <emaste at freebsd.org> wrote:
>>>> Li-Wen reported that the build is done in a 11.1-rel jail though, so
>>>> the libarchive (or any userland) change shouldn't be responsible.
>>>>
>>>> Can we update a canary builder to somewhere between r328278 and r333388?
>>>
>>> butler1.nyi.freebsd.org is running r331373 now.
>>
>>
>> But there seems to be another of the ar -> ranlib failures
>> after that on butler1.nyi.freebsd.org :
>
> Yes I was trying to narrow down the cause, now it seems between
> r328278 and r330304.
>
> butler1.nyi.freebsd.org is back to run r328278. And I'll try to
> reproduce this in elsewhere.

Has the range r328278 < PROBLEM_START <= r330304 been narrowed down
some more?

(I'm just curious where the problem started.)
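
Narrowing a range such as r328278 < PROBLEM_START <= r330304 is a bisection. The sketch below assumes a hypothetical predicate `is_bad(rev)` that reports whether a host running that revision shows the failure. Since the failure is intermittent, a real predicate would need several builds per revision before declaring it good. The first bad revision used in the example is invented purely to exercise the code.

```python
def bisect_first_bad(good, bad, is_bad):
    """Return (first_bad_revision, steps) for the range (good, bad].

    Assumes `good` is known good, `bad` is known bad, and that every
    revision from the first bad one onward is bad.
    """
    steps = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid    # problem started at or before mid
        else:
            good = mid   # problem started after mid
        steps += 1
    return bad, steps


# Invented value, only for demonstration; the real one is unknown here.
PRETEND_FIRST_BAD = 329000

first_bad, steps = bisect_first_bad(
    328278, 330304, lambda rev: rev >= PRETEND_FIRST_BAD
)
print(first_bad, steps)
```

With 2026 candidate revisions in the range, at most 11 host upgrades would be needed, provided each one can be classified reliably.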


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)
