gyp breaking ios ninja builds


Torne (Richard Coles)

Apr 29, 2014, 12:35:56 PM
to gyp-developer
I just tried to roll gyp to r1907 in Chromium and I had to revert this because it caused a build failure on the ios ninja bot: http://build.chromium.org/p/chromium.mac/builders/iOS%20Device%20%28ninja%29/builds/767/steps/compile/logs/stdio

ninja:error: '../../base/test/data/autofill/merge/input/ambiguous.in', needed by 'base_unittests.app/base/test/data/file_util/binary_file.bin', missing and no known rule to make it

My gyp change r1907 was Android-only so can't be causing this breakage; can someone investigate and find which gyp change is to blame? I don't have a mac or the time to do this :/

--
Torne (Richard Coles)
to...@google.com

Torne (Richard Coles)

Apr 30, 2014, 7:04:17 AM
to gyp-developer, Nico Weber
+thakis specifically

Nico, it looks like all the changes between 1895 and 1907 that aren't specific to android or msvs were landed by you, though I realise you didn't write all those patches. Can you help me try and figure out what might be causing this problem?

Nico Weber

Apr 30, 2014, 11:37:19 AM
to Torne (Richard Coles), Daniel Bratell, gyp-developer
Daniel, can you debug this please?

Nico Weber

May 1, 2014, 3:35:55 PM
to Torne (Richard Coles), Daniel Bratell, gyp-developer
bratell: ping. (This is why I asked you to roll gyp in small batches
as things land :-/)

Daniel Bratell

May 2, 2014, 4:46:58 AM
to Torne (Richard Coles), Nico Weber, gyp-developer
On Fri, 02 May 2014 09:23:24 +0200, Daniel Bratell <bra...@opera.com>
wrote:

> On Wed, 30 Apr 2014 17:37:19 +0200, Nico Weber <tha...@chromium.org>
> wrote:
>
>> Daniel, can you debug this please?
>
> Yup, will look at it immediately. Sorry for not checking earlier (public
> holiday :p)

So far I know:

The problem is triggered by 1900 (which is what trunk is currently using)
but not by 1895. The changes in between are:

1896: Potentially change the number of parallel threads
1897: Micro optimized "IsPathSections"
1898: Adding a set() to speed up FlattenToList
1899: Using a set() to speed up DeepDependencies
1900: Cached results of compile()

Nothing obvious.

1896 might change timing if there is a build system race somewhere.

1897 is a local change. I've re-read it several times without seeing what
it could be.

1898 and 1899 were rewritten in 1905 to use a helper class, and the
problem didn't change, so they are unlikely to have caused this.

1900 is caching the AST so that a python expression doesn't have to be
re-compiled all the time. This one depends on the cached AST actually
being just that and not mutating as it's being used.

It's all complicated by the fact that only a single builder (chromium.mac
/ iOS Device (ninja)) seems affected and it's one that I cannot easily
trigger tests at. tryserver.chromium / ios_rel_device_ninja which is the
closest one I found has no problem.

Is there anyone who knows anything about chromium.mac / iOS Device (ninja)
and might help me debug further?

/Daniel

Torne (Richard Coles)

May 2, 2014, 6:33:44 AM
to Daniel Bratell, Nico Weber, gyp-developer
I haven't looked at the actual code changes so your analysis is probably better, but my guesses would have been 1898 or 1899 since they touch dep handling.

iOS doesn't build blink, which means its dependency graph looks *very* different to the other platforms, and so I can imagine it being affected differently/worse if dependency handling is busted.

Daniel Bratell

May 2, 2014, 7:40:31 AM
to Torne (Richard Coles), Nico Weber, gyp-developer
On Fri, 02 May 2014 10:46:58 +0200, Daniel Bratell <bra...@opera.com>
wrote:

> Is there anyone that know anything about chromium.mac / iOS Device
> (ninja) that might help me debug further?

It seems to go wrong in build/copy_test_data_ios.gypi because it executes
"copy_test_data_ios.py --input test/data" in two different directories. If
those invocations happen in the same process, the command cache will
reuse the results of the first run for the second run as well.
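
A minimal sketch of this failure mode, assuming a per-process cache keyed
only by the command text. make_cached_runner, fake_list_inputs and the
directory names dir_a and dir_b are made up for illustration and are not
gyp's actual code. Keying on the working directory as well is shown as one
possible way to avoid the stale result, not necessarily what the proposed
patch does:

```python
# Hypothetical sketch: these names are illustrative, not gyp's actual code.

def make_cached_runner(run, key_includes_cwd):
    """Wrap run(command, cwd) with a per-process result cache."""
    cache = {}

    def cached_run(command, cwd):
        key = (command, cwd) if key_includes_cwd else command
        if key not in cache:
            cache[key] = run(command, cwd)
        return cache[key]

    return cached_run


def fake_list_inputs(command, cwd):
    # Stand-in for a script whose output depends on the directory it runs in.
    return [cwd + "/test/data/example.in"]


COMMAND = "copy_test_data_ios.py --input test/data"

# Cache keyed by command text only: the second directory gets stale results.
by_command = make_cached_runner(fake_list_inputs, key_includes_cwd=False)
stale_first = by_command(COMMAND, "dir_a")
stale_second = by_command(COMMAND, "dir_b")

# Cache keyed by (command, cwd): each directory gets its own results.
by_command_and_cwd = make_cached_runner(fake_list_inputs, key_includes_cwd=True)
fresh_first = by_command_and_cwd(COMMAND, "dir_a")
fresh_second = by_command_and_cwd(COMMAND, "dir_b")
```

With the command-only key, the second call returns paths under dir_a even
though it was asked about dir_b, which matches the symptom of one target
depending on another directory's test data.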

So the trigger is most likely the patch that changes gyp to use as many
processes as there are cores (r1896). That patch seems to have changed
things so that those two invocations end up in the same process much more
often (on the "iOS Device (ninja)" bot, 100% of the time it seems).

Good news, this is a known bug:
http://code.google.com/p/gyp/issues/detail?id=112
with a proposed patch
https://codereview.chromium.org/225783006/

Bad news: That patch has deadlocked.

Short term we could try reverting 1896. The root problem will still be
there but it should no longer reproduce 100% of the time on a bot.

/Daniel