"gn help" just hangs, no output

jack....@gmail.com

Sep 21, 2016, 5:58:12 PM
to gn-dev
I ran "fetch --no-history chromium". It completed successfully. However when I now run "gn gen out/Default" (or "gn help" or "buildtools/linux64/gn help") it just hangs, there's no output.
I'm running Debian Stretch.
Any idea what might be going on, or steps to figure out what's going on?
I thought maybe I need to build GN from source, but from tools/gn it looks like I need GN to build GN?

Dirk Pranke

Sep 21, 2016, 6:02:04 PM
to jack....@gmail.com, gn-dev
Are you sure you're getting the right gn binary (the one in buildtools/linux64)?

You can build your own binary with the //tools/gn/bootstrap/bootstrap.py script, though you do need a functional ninja binary first.

-- Dirk

jack....@gmail.com

Sep 21, 2016, 6:44:56 PM
to gn-dev, jack....@gmail.com

Thanks, I'm sure, because I printed gn_path in depot_tools/gn.py and because I tried "buildtools/linux64/gn help" explicitly. It just hung in exactly the same way.

jack....@gmail.com

Sep 21, 2016, 6:51:57 PM
to gn-dev, jack....@gmail.com
Thanks for the tip about bootstrapping GN. Unfortunately this is the output:

> $ tools/gn/bootstrap/bootstrap.py
> Building gn manually in a temporary directory for bootstrapping...
> ninja: Entering directory `/tmp/tmp1CmrVG'
> ninja: error: '/chromium/src/base/memory/ref_counted.cc', needed by 'base/memory/ref_counted.o', missing and no known rule to make it
> Command '['ninja', '-C', '/tmp/tmp1CmrVG', 'gn']' returned non-zero exit status 1
> $

I think this means I have a functional ninja binary? What am I missing?

Dirk Pranke

Sep 21, 2016, 7:07:02 PM
to Jack Bates, gn-dev
That means that the bootstrap.py script is broken and out of date :) Delete line 400 from bootstrap.py and try again?

-- Dirk

jack....@gmail.com

Sep 21, 2016, 7:27:42 PM
to gn-dev, jack....@gmail.com
That did it, thanks! But the result still just hangs without any output:

> $ tools/gn/bootstrap/bootstrap.py
> Building gn manually in a temporary directory for bootstrapping...
> ninja: Entering directory `/tmp/tmpF_WroE'
> [328/328] LINK gn
> Building gn using itself to out/Release...
> Done. Made 5570 targets from 1161 files in 2475ms
> ninja: Entering directory `/chromium/src/out/Release'
> [478/478] LINK ./gn
> $ out/Release/gn help
> ^C
> $

Any more ideas what might be going on? Anything else I can try? Do I need to start digging into the GN source?

Scott Graham

Sep 21, 2016, 7:29:34 PM
to jack....@gmail.com, gn-dev
Maybe pastebin the output of `strace gn help` somewhere?

jack....@gmail.com

Sep 22, 2016, 11:15:07 AM
to gn-dev, jack....@gmail.com
Here is the strace output: http://nottheoilrig.com/strace

Scott Graham

Sep 22, 2016, 1:51:02 PM
to Jack Bates, gn-dev
I _think_ the next thing that should be happening when yours decided to go to sleep is trying to open /dev/urandom. Maybe there's some reason that might be blocked or otherwise problematic on your system?

I also see something about querying /proc/cpuinfo that could do a sleep https://cs.chromium.org/chromium/src/third_party/tcmalloc/vendor/src/base/sysinfo.cc?rcl=0&l=323, maybe that's not available for some reason?

(Shooting in the dark here, maybe someone else has a better idea.)

jack....@gmail.com

Sep 22, 2016, 3:30:46 PM
to gn-dev, jack....@gmail.com
Well "cat /dev/urandom" and "cat /proc/cpuinfo" both seem to work fine.
I tried to step through the program, but it hangs before main():
> $ gdb -ex 'b main' -ex r --args out/Debug/gn help
> GNU gdb (Debian 7.11.1-2) 7.11.1
> Copyright (C) 2016 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-linux-gnu".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>.
> Find the GDB manual and other documentation resources online at:
> <http://www.gnu.org/software/gdb/documentation/>.
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from out/Debug/gn...done.
> Breakpoint 1 at 0x412256: file ../../tools/gn/gn_main.cc, line 40.
> Starting program: /chromium/src/out/Debug/gn help
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

There's no output after this point.
It looks to me like it sets the breakpoint in gn_main.cc but hangs before reaching it? How is that possible? Isn't main() the first thing that gets executed?

Dirk Pranke

Sep 22, 2016, 3:39:50 PM
to Jack Bates, gn-dev
The runtime loader and some setup code in libc both execute before the first line of main does. Are you building using the version of clang that chromium normally uses? Do you have any GN args set?

I'm suspecting you're hitting a bug in Stretch; I don't know if I've heard anyone else running this OS. You might also try building without the sysroot and see if that helps.
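
A minimal sketch of code that runs before main(), assuming GCC or Clang on Linux. The names early_init and early_init_done are made up for illustration; they are not part of GN or TCMalloc. The constructor attribute is a compiler extension, and C++ static initializers run in the same pre-main window, which is where an allocator can do its first-time setup.

```c
#include <stdio.h>

/* Set by the constructor below; main() can inspect it later. */
static int ran_before_main = 0;

/* GCC and Clang run functions marked with this attribute after the
 * dynamic loader has mapped the binary but before main() is entered. */
__attribute__((constructor))
static void early_init(void) {
    ran_before_main = 1;
    fputs("early_init: running before main()\n", stderr);
}

/* Returns 1 if early_init() has already executed, else 0. */
int early_init_done(void) {
    return ran_before_main;
}
```

If main() calls early_init_done() as its first statement, it gets 1. In GDB, a breakpoint on early_init is reached before a breakpoint on main. For a program that hangs before main(), pressing Ctrl-C in GDB and typing "bt" shows the stack it is stuck in.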

-- Dirk

jack....@gmail.com

Sep 22, 2016, 3:47:46 PM
to gn-dev, jack....@gmail.com
Thanks, I built as follows:

> $ tools/gn/bootstrap/bootstrap.py -d

Stretch is just the name of the next Debian release (Jessie +1).

Peter Mayo

Sep 22, 2016, 3:52:56 PM
to jack....@gmail.com, gn-dev
Just a suggestion, feel free to ignore, but can you compile hello_world.cc and make sure that runs?  It would provide a bunch of validation of the environment in which you are building and running.

Peter.

jack....@gmail.com

Sep 22, 2016, 4:05:46 PM
to gn-dev, jack....@gmail.com
It's weird that the "Building gn using itself to out/Debug..." step succeeds:

> $ tools/gn/bootstrap/bootstrap.py -d
> Building gn manually in a temporary directory for bootstrapping...
> ninja: Entering directory `/tmp/tmpGnEMBR'
> [328/328] LINK gn
> Building gn using itself to out/Debug...
> Done. Made 5526 targets from 1155 files in 13003ms
> ninja: Entering directory `/chromium/src/out/Debug'
> [478/478] LINK ./gn
> $

Doesn't this mean that the following doesn't hang?

> cmd = [temp_gn, 'gen', build_dir, '--args=%s' % gn_gen_args]

(build_gn_with_gn() in bootstrap.py)

What's the difference between the intermediate executable in /tmp/tmpGnEMBR and the final one in out/Debug?

Nico Weber

Sep 22, 2016, 4:13:37 PM
to jack....@gmail.com, gn-dev
If you say just `gn` you probably get the wrapper from depot_tools. Check `which gn`, `file $(which gn)`, `cat $(which gn)`. Maybe that script does something that doesn't work on your system?

jack....@gmail.com

Sep 22, 2016, 4:15:06 PM
to gn-dev, jack....@gmail.com, pete...@google.com
Is the following what you mean, or do I need to test something else?

> $ cc tools/gn/tutorial/hello_world.cc
> $ ./a.out
> Hello, world.
> $ cc -v
> Using built-in specs.
> COLLECT_GCC=cc
> COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/6/lto-wrapper
> Target: x86_64-linux-gnu
> Configured with: ../src/configure -v --with-pkgversion='Debian 6.1.1-11' --with-bugurl=file:///usr/share/doc/gcc-6/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-6 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-6-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-6-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-6-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
> Thread model: posix
> gcc version 6.1.1 20160802 (Debian 6.1.1-11)
> $

jack....@gmail.com

Sep 22, 2016, 4:22:07 PM
to gn-dev, jack....@gmail.com
"buildtools/linux64/gn help" does hang, but so do "out/Release/gn help" and "out/Debug/gn help".
I think depot_tools/gn is successfully calling buildtools/linux64/gn (which hangs).

Dirk Pranke

Sep 22, 2016, 4:49:54 PM
to Jack Bates, gn-dev
Yes; good point.

> What's the difference between the intermediate executable in /tmp/tmpGnEMBR and the final one in out/Debug?

Most likely they have fairly different command line flags. I doubt the intermediate executable builds against the sysroot, either.

So, the problem probably has something to do with the command lines. You might want to run ninja -v on both builds and start comparing differences.

-- Dirk

jack....@gmail.com

Sep 23, 2016, 2:56:45 PM
to gn-dev, jack....@gmail.com
How do I build GN without the sysroot?
I'm trying to figure it out, but maybe someone can beat me to the punch.
I guess there's a GN arg I need to set?
The intermediate executable is built with c++ (GCC)
whereas the ultimate executable is built with ../../third_party/llvm-build/Release+Asserts/bin/clang++.
Thanks everyone for your help!

jack....@gmail.com

Sep 23, 2016, 4:43:59 PM
to gn-dev, jack....@gmail.com
It seems like I need all of the Chromium build dependencies in order to build GN, is this as intended?

Dirk Pranke

Sep 23, 2016, 5:20:07 PM
to Jack Bates, gn-dev
In a "normal" Chromium GN build, you would set `use_sysroot=false` in your GN args. You could also try setting `use_clang=false`, which will then fall back to the system compiler (which I'd guess is gcc?) which is the same compiler used during the bootstrap.

Perhaps there is also an issue w/ the bundled version of clang on that platform.

Yes, you can only really build GN from a Chromium checkout at the moment. This is mostly intentional, though I've had a bug open forever to set up a stripped-down mirror of just the files you really need.

-- Dirk


jack....@gmail.com

Sep 23, 2016, 5:30:42 PM
to gn-dev, jack....@gmail.com
Thanks, I would like to try the system compiler, but use_clang isn't a build argument?

> $ out/Release/gn gen out/Release --args=use_clang=false
> ERROR at the command-line "--args":1:11: Build argument has no effect.
> use_clang=false
> ^----
> The variable "use_clang" was set as a build argument
> but never appeared in a declare_args() block in any buildfile.
>
> To view possible args, run "gn args --list <builddir>"
> Done. Made 5526 targets from 1155 files in 5260ms
> $

Building without the sysroot (as follows) still just hangs:

> $ tools/gn/bootstrap/bootstrap.py --gn-gen-args 'use_sysroot=false use_gtk3=true'

Dirk Pranke

Sep 23, 2016, 5:31:32 PM
to Jack Bates, gn-dev
Oh, sorry, 'is_clang=false'.

Nico Weber

Sep 23, 2016, 5:31:46 PM
to Jack Bates, gn-dev
it's is_clang

jack....@gmail.com

Sep 23, 2016, 6:03:40 PM
to gn-dev, jack....@gmail.com
Hmm, building without the sysroot and with the system compiler (as follows) still just hangs, whereas the intermediate executable (also built with the system compiler) works. Any idea what to try next?

> $ tools/gn/bootstrap/bootstrap.py --gn-gen-args 'is_clang=false use_sysroot=false use_gtk3=true treat_warnings_as_errors=false' -v

Dirk Pranke

Sep 23, 2016, 6:08:13 PM
to Jack Bates, gn-dev
Seems like it's some combination of the compile flags, then, so I'd start comparing command lines.

-- Dirk

jack....@gmail.com

Sep 26, 2016, 12:30:51 PM
to gn-dev, jack....@gmail.com
The following did the trick:

> $ tools/gn/bootstrap/bootstrap.py --gn-gen-args 'use_allocator="none"'

What do I need on my system to make TCMalloc work?
How can I reproduce this in an isolated test case?
How can I step through the code before the first line of main()?
I tried breaking on InitializeSystemInfo() (sysinfo.cc) but GDB just hangs without breaking.
Why would it just hang, before the first line of main()?

jack....@gmail.com

Sep 26, 2016, 1:50:24 PM
to gn-dev, jack....@gmail.com
The following isn't sufficient to reproduce this (it doesn't hang):

> #include <stdlib.h>
>
> int main() {
> malloc(1);
> }

> $ gcc malloc.c -Lthird_party/binutils/Linux_x64/Release/lib -ltcmalloc_minimal
> $ LD_LIBRARY_PATH=third_party/binutils/Linux_x64/Release/lib ./a.out
> $

How can I exercise base/allocator in an isolated test case?

Dirk Pranke

Sep 26, 2016, 2:09:39 PM
to Jack Bates, gn-dev
Hi Jack,

Can you file a bug for this? It's probably easier to follow up there.

I don't think there's a particularly compelling reason for GN to be using tcmalloc and so it's easy enough for us to turn it off for the official builds, but it would probably also be good to figure out what's going on with the hang.

-- Dirk

Primiano Tucci

Sep 26, 2016, 2:20:59 PM
to Dirk Pranke, Jack Bates, gn-dev
On Mon, Sep 26, 2016 at 7:09 PM, Dirk Pranke <dpr...@chromium.org> wrote:
> I don't think there's a particularly compelling reason for GN to be using tcmalloc

In crbug.com/586444 brettw/you reported a 10-40% perf hit by going back to the default allocator

> Can you file a bug for this? It's probably easier to follow up there.

+1 I'd happily try to help on this, but I couldn't reconstruct what's going on in this thread. Glanced through this but didn't manage to figure out what is the problem and what you are trying to achieve.

--
Primiano Tucci
Software Engineer
Google UK Limited
Registered Office: Belgrave House, 76 Buckingham Palace Road, London SW1W 9TQ

Dirk Pranke

Sep 26, 2016, 3:54:17 PM
to Primiano Tucci, Jack Bates, gn-dev, thomasa...@chromium.org
On Mon, Sep 26, 2016 at 11:20 AM, Primiano Tucci <prim...@google.com> wrote:
> On Mon, Sep 26, 2016 at 7:09 PM, Dirk Pranke <dpr...@chromium.org> wrote:
> > I don't think there's a particularly compelling reason for GN to be using tcmalloc
>
> In crbug.com/586444 brettw/you reported a 10-40% perf hit by going back to the default allocator

Yes, but all other things being equal I don't think that's a compelling reason enough by itself to stay on it :). That's more of a compelling reason to use it if it's available ...

As you well know, we've been having other debates about whether tcmalloc is worth it or not. This is another data point for one side of that debate.

Brett, being more performance-sensitive than I, probably would feel more strongly about staying on it if we can.

-- Dirk

 

Nico Weber

Sep 26, 2016, 3:57:05 PM
to Dirk Pranke, Primiano Tucci, Jack Bates, gn-dev, thomasa...@chromium.org
On Mon, Sep 26, 2016 at 3:53 PM, Dirk Pranke <dpr...@chromium.org> wrote:
> On Mon, Sep 26, 2016 at 11:20 AM, Primiano Tucci <prim...@google.com> wrote:
> > On Mon, Sep 26, 2016 at 7:09 PM, Dirk Pranke <dpr...@chromium.org> wrote:
> > > I don't think there's a particularly compelling reason for GN to be using tcmalloc
> >
> > In crbug.com/586444 brettw/you reported a 10-40% perf hit by going back to the default allocator
>
> Yes, but all other things being equal I don't think that's a compelling reason enough by itself to stay on it :).

I'm not aware of these other discussions, but slowing down gn 10-40% because of this thread (where gn doesn't run on one person's linux installation, which might be weird in a million ways, and where we don't understand yet what's going on) seems a bit over-eager to me...

jack....@gmail.com

Sep 26, 2016, 4:54:29 PM
to gn-dev, dpr...@chromium.org, jack....@gmail.com
On Monday, September 26, 2016 at 11:20:59 AM UTC-7, Primiano Tucci wrote:
> +1 I'd happily try to help on this, but I couldn't reconstruct what's going on in this thread. Glanced through this but didn't manage to figure out what is the problem and what you are trying to achieve.

Thanks everyone for your help.
I'm happy to open a bug, now or after we understand what's going on.
The original problem is that when I follow these instructions [1], GN just hangs:

> $ fetch --no-history chromium
> $ gn gen out/Default
> (No output, just hangs here)

I can work around this by first building GN without TCMalloc:

> $ tools/gn/bootstrap/bootstrap.py --gn-gen-args 'use_allocator="none"'

To understand what's going on, I'm attempting to isolate and reproduce the problem.
I'm stuck here at the moment:

> #include <malloc.h>
>
> void* malloc(size_t size) {
> return tc_malloc(size);
> }
>
> int main() {
> malloc(1);
> }

> $ gcc -c main.c
> $ g++ main.o out/Debug/obj/base/allocator/tcmalloc/*.o out/Debug/obj/base/third_party/dynamic_annotations/dynamic_annotations/dynamic_annotations.o
> $ ./a.out
> Segmentation fault
> $

I'm trying to replicate the relevant part of allocator_shim_override_libc_symbols.h but I think I'm missing some magic words.
Can anyone spot what's wrong with this test case?
Or suggest another approach to figuring out what's going on?

[1] https://www.chromium.org/developers/how-tos/get-the-code

Tom Anderson

Sep 26, 2016, 7:52:18 PM
to Nico Weber, Dirk Pranke, Primiano Tucci, Jack Bates, gn-dev, thomasa...@chromium.org, the...@chromium.org

Do we even support building on Debian (let alone Debian testing)?  I just tried running install-build-deps.sh and got this:

ERROR: Only Ubuntu 12.04 (precise), 14.04 (trusty),  14.10 (utopic), 15.04 (vivid), 15.10 (wily) and 16.04 (xenial)  are currently supported

Tom Anderson

Sep 26, 2016, 8:50:29 PM
to Nico Weber, Dirk Pranke, Primiano Tucci, Jack Bates, gn-dev, thomasa...@chromium.org, the...@chromium.org

I'm not seeing this issue on a fresh install of Debian Stretch.  The steps I took were:

1. fetch chromium
2. sudo apt-get install --reinstall libasound2:i386 libcap2:i386 libelf-dev:i386 libfontconfig1:i386 libgconf-2-4:i386 libgl1-mesa-glx:i386 libglib2.0-0:i386 libgpm2:i386 libgtk2.0-0:i386 libncurses5:i386 libnss3:i386 libpango1.0-0:i386 libtinfo-dev:i386 libudev1:i386 libxcomposite1:i386 libxcursor1:i386 libxdamage1:i386 libxi6:i386 libxrandr2:i386 libxss1:i386 libxtst6:i386 linux-libc-dev:i386 ant apache2-bin autoconf bison cdbs cmake curl devscripts dpkg-dev elfutils fakeroot flex fonts-indic fonts-thai-tlwg g++ g++-6-multilib gawk git-core git-svn g++-mingw-w64-i686 gperf intltool lib32gcc1 lib32ncurses5-dev lib32stdc++6 lib32z1-dev libapache2-mod-php7.0 libasound2 libasound2-dev libatk1.0-0 libav-tools libbluetooth-dev libbrlapi0.6 libbrlapi-dev libbz2-1.0 libbz2-dev libc6 libc6-dbg libc6-dev-armhf-cross libc6-i386 libcairo2 libcairo2-dbg libcairo2-dev libcap2 libcap-dev libcups2 libcups2-dev libcurl4-gnutls-dev libdrm-dev libelf-dev libexpat1 libffi6 libffi6-dbg libffi-dev libfontconfig1 libfontconfig1-dbg libfreetype6 libgbm-dev libgconf2-dev libgl1-mesa-dev libgles2-mesa-dev libglib2.0-0 libglib2.0-0-dbg libglib2.0-dev libglu1-mesa-dev libgnome-keyring0 libgnome-keyring-dev libgtk2.0-0 libgtk2.0-0-dbg libgtk2.0-dev libjpeg-dev libkrb5-dev libnspr4 libnspr4-dbg libnspr4-dev libnss3 libnss3-dbg libnss3-dev libpam0g libpam0g-dev libpango1.0-0 libpango1.0-0-dbg libpci3 libpci-dev libpcre3 libpcre3-dbg libpixman-1-0 libpixman-1-0-dbg libpulse0 libpulse-dev libsctp-dev libspeechd2 libspeechd-dev libsqlite3-0 libsqlite3-0-dbg libsqlite3-dev libssl-dev libstdc++6  libtinfo-dev libtool libudev1 libudev-dev libwww-perl libx11-6 libx11-6-dbg libx11-xcb1 libx11-xcb1-dbg libxau6 libxau6-dbg libxcb1 libxcb1-dbg libxcomposite1 libxcomposite1-dbg libxcursor1 libxcursor1-dbg libxdamage1 libxdamage1-dbg libxdmcp6 libxdmcp6-dbg libxext6 libxext6-dbg libxfixes3 libxi6 libxi6-dbg libxinerama1 libxinerama1-dbg libxkbcommon-dev libxrandr2 libxrandr2-dbg libxrender1 libxrender1-dbg 
libxslt1-dev libxss-dev libxt-dev libxtst6 libxtst6-dbg libxtst-dev linux-libc-dev-armhf-cross mesa-common-dev  openbox patch perl php7.0-cgi pkg-config python python-cherrypy3 python-crypto python-dev python-numpy python-opencv python-openssl python-psutil python-yaml realpath rpm ruby subversion texinfo ttf-dejavu-core wdiff xcompmgr xsltproc xutils-dev xvfb zip zlib1g zlib1g-dbg
3. gn gen out/Debug

Dirk Pranke

Sep 26, 2016, 8:58:11 PM
to Nico Weber, Primiano Tucci, Jack Bates, gn-dev, thomasa...@chromium.org
On Mon, Sep 26, 2016 at 12:57 PM, Nico Weber <tha...@chromium.org> wrote:
> On Mon, Sep 26, 2016 at 3:53 PM, Dirk Pranke <dpr...@chromium.org> wrote:
> > On Mon, Sep 26, 2016 at 11:20 AM, Primiano Tucci <prim...@google.com> wrote:
> > > On Mon, Sep 26, 2016 at 7:09 PM, Dirk Pranke <dpr...@chromium.org> wrote:
> > > > I don't think there's a particularly compelling reason for GN to be using tcmalloc
> > >
> > > In crbug.com/586444 brettw/you reported a 10-40% perf hit by going back to the default allocator
> >
> > Yes, but all other things being equal I don't think that's a compelling reason enough by itself to stay on it :).
>
> I'm not aware of these other discussions, but slowing down gn 10-40% because of this thread (where gn doesn't run on one person's linux installation, which might be weird in a million ways, and where we don't understand yet what's going on) seems a bit over-eager to me...

Sure. The larger issue is that tcmalloc causes us pain in other ways as well in Chromium (though this is not the thread to go into those details) and it's unclear if tcmalloc is even still a win for Chromium, perf-wise. 

I.e., all other things aren't equal, so it's not really worth debating this by itself :).

-- Dirk

Dirk Pranke

Sep 27, 2016, 12:15:50 AM
to Nico Weber, Primiano Tucci, Jack Bates, gn-dev, thomasa...@chromium.org
On Mon, Sep 26, 2016 at 5:57 PM, Dirk Pranke <dpr...@chromium.org> wrote:
> On Mon, Sep 26, 2016 at 12:57 PM, Nico Weber <tha...@chromium.org> wrote:
> > On Mon, Sep 26, 2016 at 3:53 PM, Dirk Pranke <dpr...@chromium.org> wrote:
> > > On Mon, Sep 26, 2016 at 11:20 AM, Primiano Tucci <prim...@google.com> wrote:
> > > > On Mon, Sep 26, 2016 at 7:09 PM, Dirk Pranke <dpr...@chromium.org> wrote:
> > > > > I don't think there's a particularly compelling reason for GN to be using tcmalloc
> > > >
> > > > In crbug.com/586444 brettw/you reported a 10-40% perf hit by going back to the default allocator
> > >
> > > Yes, but all other things being equal I don't think that's a compelling reason enough by itself to stay on it :).
> >
> > I'm not aware of these other discussions, but slowing down gn 10-40% because of this thread (where gn doesn't run on one person's linux installation, which might be weird in a million ways, and where we don't understand yet what's going on) seems a bit over-eager to me...
>
> Sure. The larger issue is that tcmalloc causes us pain in other ways as well in Chromium (though this is not the thread to go into those details) and it's unclear if tcmalloc is even still a win for Chromium, perf-wise.
>
> I.e., all other things aren't equal, so it's not really worth debating this by itself :).

In retrospect, what I wrote at first was really the opposite of what I should've written :). Apologies for the confusion.

-- Dirk

Primiano Tucci

Sep 27, 2016, 1:56:35 PM
to Dirk Pranke, Nico Weber, Jack Bates, gn-dev, thomasa...@chromium.org
> I'm not seeing this issue on a fresh install of Debian Stretch.  The steps I took were:

Same here. I just brought up a fresh docker container running debian stretch/sid and gn just runs fine.


jack....@gmail.com

Sep 27, 2016, 2:51:07 PM
to gn-dev, tha...@chromium.org, dpr...@chromium.org, prim...@google.com, jack....@gmail.com, thomasa...@chromium.org, the...@chromium.org, thomasa...@google.com
Okay, here's what's going on:
There's a package on my system that calls dlsym() from open(), and glibc calls calloc() from dlsym(), which deadlocks.

More fully, very early, dl-init.c [1] calls malloc(), via e.g. [2] or [3].
tc_malloc() calls open("/dev/urandom") [4].
On my system, open() calls dlsym() [5].
dlsym() calls calloc() [6].
Finally tc_calloc() calls ThreadCache::InitModule() which calls Static::pageheap_lock() [7], which the original tc_malloc() is already holding :-(
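
The chain above can be modelled in a few lines of C. This is only a sketch under loud assumptions: model_alloc, model_open_hook and heap_lock are hypothetical stand-ins, not TCMalloc, glibc or cowdancer code, and a C11 atomic_flag plays the role of the page heap lock. Where a real spinlock would spin forever, the model records the re-entry and returns so that it can terminate.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the allocator's global lock. */
static atomic_flag heap_lock = ATOMIC_FLAG_INIT;
static bool initialized = false;      /* first-time setup already done? */
static bool reentry_detected = false; /* set when the lock is requested twice */

static void *model_alloc(size_t size);

/* Stand-in for an interposed open() that ends up allocating memory,
 * as dlsym() -> calloc() does in the chain described above. */
static void model_open_hook(void) {
    (void)model_alloc(16);
}

/* Stand-in for malloc(): takes the lock and runs first-time setup once. */
static void *model_alloc(size_t size) {
    static char arena[64];
    (void)size;

    if (atomic_flag_test_and_set(&heap_lock)) {
        /* The lock is already held further up this same call stack. A real
         * spinlock would wait here forever; the model just records it. */
        reentry_detected = true;
        return NULL;
    }
    if (!initialized) {
        model_open_hook(); /* calls out while still holding heap_lock */
        initialized = true;
    }
    atomic_flag_clear(&heap_lock);
    return arena;
}

/* Returns 1 if a single allocation re-entered the allocator, else 0. */
int alloc_reenters(void) {
    reentry_detected = false;
    (void)model_alloc(1);
    return reentry_detected ? 1 : 0;
}
```

The first call to alloc_reenters() returns 1 because the setup step runs while the lock is held. A second call returns 0 because setup is already done by then, so nothing calls back into the allocator under the lock.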

So is there a bug?
Is it wrong of my system to call dlsym() from open()?
Is it wrong of TCMalloc to call open() from malloc()?
Should ThreadCache::InitModule() or tc_malloc(), etc. handle reentrancy, even with an error?
Or is this just an understandable conflict between Chromium and the cowdancer package?
(I guess dlsym() does what's expected.)

You can reproduce the problem with the following:

> #define _GNU_SOURCE
> #include <dlfcn.h>
>
> // Like https://anonscm.debian.org/git/pbuilder/cowdancer.git/tree/cowdancer.c
> int open64(const char* filename, int flags) {
>   int (*origlibc_open64)(const char *, int) = dlsym(RTLD_NEXT, "open64");
>
>   return origlibc_open64(filename, flags);
> }
>
> int main() {}

> $ tools/gn/bootstrap/bootstrap.py --gn-gen-args use_experimental_allocator_shim=false
> $ gcc -c main.c
> $ g++ -lpthread main.o out/Release/obj/base/allocator/tcmalloc/*.o out/Release/obj/base/third_party/dynamic_annotations/dynamic_annotations/dynamic_annotations.o -ldl
> $ ./a.out
> (No output, just hangs here)

The problem goes away unless all of the following are true:
The Chromium TCMalloc fork is used. Vanilla TCMalloc doesn't open("/dev/urandom").
ASLR is turned on. Address space layout randomization is what opens /dev/urandom.
Threads are present. Without libpthread, dlsym() uses a static buffer, not calloc().
ASLR isn't already initialized. Presumably if GetRandomAddrHint() were called outside tc_malloc(), it wouldn't deadlock.
cowdancer isn't already initialized. I didn't test, but if cowdancer.c:initialize_functions() were called outside tc_malloc(), presumably it wouldn't deadlock.

Is there a bug anywhere in all this?

[1] https://sourceware.org/git/?p=glibc.git;a=blob;f=elf/dl-init.c;h=818c3aa37cd052e6edbf5f55524647b45b5bfe87;hb=HEAD#l72
[2] https://git.gnome.org/browse/glib/tree/glib/gthread-posix.c#n1000
[3] https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/libsupc%2B%2B/eh_alloc.cc#L123
[4] https://chromium.googlesource.com/chromium/src.git/+/master/third_party/tcmalloc/chromium/src/system-alloc.cc#180
[5] https://anonscm.debian.org/git/pbuilder/cowdancer.git/tree/cowdancer.c#n182
[6] https://sourceware.org/git/?p=glibc.git;a=blob;f=dlfcn/dlerror.c;h=41b2bd6bf29be5f61affc5e750775ab2f64ee4b0;hb=HEAD#l141
[7] https://chromium.googlesource.com/chromium/src.git/+/master/third_party/tcmalloc/chromium/src/thread_cache.cc#322

Primiano Tucci

Sep 27, 2016, 3:24:17 PM
to jack....@gmail.com, gn-dev, tha...@chromium.org, dpr...@chromium.org, thomasa...@chromium.org, the...@chromium.org, thomasa...@google.com
On Tue, Sep 27, 2016 at 7:51 PM <jack....@gmail.com> wrote:
> Okay, here's what's going on:
> There's a package on my system that calls dlsym() from open(), and glibc calls calloc() from dlsym(), which deadlocks.

This really reminds me of crbug.com/586444

> So is there a bug?

IMHO a couple

> Is it wrong of my system to call dlsym() from open()?

A lot of code (like tcmalloc) expects open to be a pure syscall wrapper. By doing fancy things which are out of your control, like invoking other glibc functions, you break this expectation. There is no right or wrong in these cases, just: how many things depend on those assumptions, and how many things you break when you change that.

> Is it wrong of TCMalloc to call open() from malloc()?

Again, see "expectations" above. Honestly I find it very odd for open() to end up calling malloc() (directly or indirectly)

> Should ThreadCache::InitModule() or tc_malloc(), etc. handle reentrancy, even with an error?

Best place to discuss this would be on https://github.com/gperftools/gperftools

> Or is this just an understandable conflict between Chromium and the cowdancer package?
> (I guess dlsym() does what's expected.)

Well the problem here is that you have unexpected (and IMHO debatable) things, which just clash together: whatever overrides open() to call dlsym, and dlsym invoking allocator functions.

> You can reproduce the problem with the following:
>
> > #define _GNU_SOURCE
> > #include <dlfcn.h>
> >
> > // Like https://anonscm.debian.org/git/pbuilder/cowdancer.git/tree/cowdancer.c

No idea what cowdancer is, but definitely this smells like an unsupported configuration for chromium.

> The problem goes away unless all of the following are true:
> The Chromium TCMalloc fork is used. Vanilla TCMalloc doesn't open("/dev/urandom").
> ASLR is turned on. Address space layout randomization is what opens /dev/urandom.

I believe this is one of the many reasons why chrome uses tcmalloc: security

> Threads are present. Without libpthread, dlsym() uses a static buffer, not calloc().
> ASLR isn't already initialized. Presumably if GetRandomAddrHint() were called outside tc_malloc(), it wouldn't deadlock.
> cowdancer isn't already initialized. I didn't test, but if cowdancer.c:initialize_functions() were called outside tc_malloc(), presumably it wouldn't deadlock.
>
> Is there a bug anywhere in all this?

Honestly, this could probably be worked around if we change third_party/tcmalloc/chromium/src/system-alloc.cc to use sys_open() (from linux_syscall_support.h) instead of open(). But then I fear that your cowdancer is just going to cause some similar issue somewhere else.
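
As a rough illustration of the sys_open() idea, assuming Linux with glibc: the sketch below opens /dev/urandom through the raw openat system call instead of the open() symbol. raw_open_readonly and read_urandom are hypothetical names, and glibc's syscall() wrapper is used here only as a stand-in for sys_open() from linux_syscall_support.h, which this sketch does not use. Unlike that header, syscall() is itself a libc function, so an interposer could still hook it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for syscall() and AT_FDCWD on glibc */
#endif
#include <fcntl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Opens `path` read-only by invoking the openat system call directly,
 * so a library that interposes the open()/open64() symbols is not
 * entered. Returns a file descriptor, or -1 with errno set. */
int raw_open_readonly(const char *path) {
    return (int)syscall(SYS_openat, AT_FDCWD, path, O_RDONLY | O_CLOEXEC);
}

/* Fills `buf` with up to `len` bytes from /dev/urandom without calling
 * the libc open() symbol. Returns the byte count, or -1 on failure. */
ssize_t read_urandom(void *buf, size_t len) {
    int fd = raw_open_readonly("/dev/urandom");
    if (fd < 0) {
        return -1;
    }
    ssize_t n = read(fd, buf, len);
    close(fd);
    return n;
}
```

A caller would pass a small buffer, for example read_urandom(buf, sizeof buf), and check that the return value is not -1. The point of the design is only that the open step never reaches an LD_PRELOAD-style wrapper around open().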
 

Torne (Richard Coles)

Sep 27, 2016, 3:38:21 PM
to Primiano Tucci, jack....@gmail.com, gn-dev, tha...@chromium.org, dpr...@chromium.org, thomasa...@chromium.org, the...@chromium.org, thomasa...@google.com

Yeah, don't use cowdancer, use a civilised kernel based CoW implementation instead (like btrfs snapshots) :)

Tom Anderson

Sep 27, 2016, 3:44:34 PM
to Torne (Richard Coles), Primiano Tucci, jack....@gmail.com, gn-dev, tha...@chromium.org, dpr...@chromium.org, thomasa...@chromium.org, the...@chromium.org



On 09/27/2016 12:38 PM, Torne (Richard Coles) wrote:
> Yeah, don't use cowdancer, use a civilised kernel based CoW implementation instead (like btrfs snapshots) :)
>
> On Tue, 27 Sep 2016, 8:24 pm 'Primiano Tucci' via gn-dev, <gn-...@chromium.org> wrote:
>> On Tue, Sep 27, 2016 at 7:51 PM <jack....@gmail.com> wrote:
>>> Okay, here's what's going on:
>>> There's a package on my system that calls dlsym() from open(), and glibc calls calloc() from dlsym(), which deadlocks.
>> This really reminds me of crbug.com/586444
>>
>>> So is there a bug?
>> IMHO a couple
>>
>>> Is it wrong of my system to call dlsym() from open()?
>> A lot of code (like tcmalloc) expects open to be a pure syscall wrapper. By doing fancy things which are out of your control, like invoking other glibc functions, you break this expectation. There is no right or wrong in these cases, just: how many things depend on that assumption, and how many things you break when you change that.

I'd have to agree with this. open() is supposed to be async-signal safe, however it's not on your system since it calls malloc. I know this is not specifically about signals, but it's quite an odd case, definitely not one that Chromium expects.

>>> Is it wrong of TCMalloc to call open() from malloc()?
>> Again, see "expectations" above. Honestly I find it very odd for open() to end up calling malloc() (directly or indirectly)
>>
>>> Should ThreadCache::InitModule() or tc_malloc(), etc. handle reentrancy, even with an error?
>> Best place to discuss this would be on https://github.com/gperftools/gperftools

malloc() is not reentrant, there's no reason tc_malloc() should be.
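
To make the call chain concrete, here is a toy model of it. None of these functions are the real tcmalloc, glibc, or cowdancer code; every name is invented, and a boolean flag stands in for the non-recursive lock so that the sketch counts the reentrant call instead of hanging.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: each function stands in for one link of the chain
 * malloc() -> open() -> dlsym() -> calloc(). */
static bool in_allocator_init = false; /* stands in for a held, non-recursive lock */
static int reentrant_calls = 0;

static void *model_malloc(size_t size);

/* Stands in for dlsym(), which allocates memory on first use. */
static void model_dlsym(void) { (void)model_malloc(32); }

/* Stands in for an interposed open() that looks up the real open() lazily. */
static int model_interposed_open(void) {
    model_dlsym();
    return 3; /* pretend file descriptor */
}

static void *model_malloc(size_t size) {
    static char arena[256];
    static bool initialized = false;
    assert(size <= sizeof arena);
    if (in_allocator_init) {
        /* A real allocator would block here forever on its own lock. */
        reentrant_calls++;
        return NULL;
    }
    if (!initialized) {
        in_allocator_init = true;
        (void)model_interposed_open(); /* first-use setup opens a file */
        in_allocator_init = false;
        initialized = true;
    }
    return arena;
}
```

The first model_malloc() call re-enters itself exactly once through the interposed open; later calls do not, which matches the hang only showing up during allocator start-up.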

jack....@gmail.com

Sep 28, 2016, 2:01:41 PM9/28/16
to gn-dev, to...@chromium.org, prim...@google.com, jack....@gmail.com, tha...@chromium.org, dpr...@chromium.org, thomasa...@chromium.org, the...@chromium.org, thomasa...@google.com
> This really reminds me of crbug.com/586444

Yes, this conflict has a lot in common with that one.

> I'd have to agree with this.  open() is supposed to be
> async-signal safe, however it's not on your system since it calls
> malloc.  I know this is not specifically about signals, but it's
> quite an odd case, definitely not one that Chromium expects.

> malloc() is not reentrant, there's no reason tc_malloc() should
> be.

Gotcha. So it's okay for malloc() to call open() because open() is supposed to be async-signal safe [1], therefore open() mustn't call malloc(), therefore malloc() can call open() without risking reentrancy.

To keep open() AS-safe, cowdancer mustn't call dlsym() from open(), if that's possible. That means either:

A) Call dlsym() early, before anything calls open(). I don't know how or if you could orchestrate that, but it doesn't matter, because even if you managed to call dlsym() early, it would call calloc() which would call open("/dev/urandom") before cowdancer was done initializing :-(

B) Avoid dlsym() altogether. What are the options in this case?

1) Use sys_open() instead of dlsym(RTLD_NEXT, "open")

2) ???
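
A rough sketch of what option 1 might look like inside an interposing library, on Linux only. This is not cowdancer's code: shim_open and hook_calls are invented names, the function is deliberately not called open() so that it compiles into an ordinary program, and it assumes glibc's syscall() wrapper is acceptable.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdarg.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the shim's own copy-on-write bookkeeping. */
static int hook_calls = 0;

/* The mode argument is only passed when the call may create a file. */
static int needs_mode(int flags) {
#ifdef O_TMPFILE
    if ((flags & O_TMPFILE) == O_TMPFILE) return 1;
#endif
    return (flags & O_CREAT) != 0;
}

/* In a real LD_PRELOAD shim this would be exported as open(). */
int shim_open(const char *path, int flags, ...) {
    mode_t mode = 0;
    assert(path != NULL);
    if (needs_mode(flags)) {
        va_list ap;
        va_start(ap, flags);
        mode = (mode_t)va_arg(ap, int); /* read the variadic mode as int */
        va_end(ap);
    }
    hook_calls++; /* whatever the shim does here must not allocate either */
    /* Forward straight to the kernel: no dlsym(), so no calloc(). */
    return (int)syscall(SYS_openat, AT_FDCWD, path, flags, mode);
}
```

The point is only that the forwarding step needs no symbol lookup; the shim's own work before the forward would still have to avoid malloc() for open() to stay AS-safe.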

> Yeah, don't use cowdancer, use a civilised kernel
> based CoW implementation instead (like btrfs snapshots) :)

Great suggestion, I wasn't aware of "cp --reflink", thanks! That definitely seems like the "right" solution, however I suspect a lot of systems don't support it yet (mine included). (Upgrading to btrfs solves that.)

[1] http://man7.org/linux/man-pages/man7/signal.7.html "Async-signal-safe functions"

Torne (Richard Coles)

Sep 28, 2016, 3:18:12 PM9/28/16
to jack....@gmail.com, gn-dev, prim...@google.com, tha...@chromium.org, dpr...@chromium.org, thomasa...@chromium.org, the...@chromium.org, thomasa...@google.com
btrfs subvolume snapshots are even faster and more disk-efficient than cp --reflink; if you make the thing you want to be able to treat as a CoW volume into a btrfs subvolume you can create snapshots of that subvolume basically instantly at the cost of a couple of kilobytes of disk at most (instead of being linearly proportional to number-of-files like cp --reflink) and also delete the subvolume similarly efficiently instead of waiting for rm -rf :)

But yes, you need to be using btrfs to use it. :)

jack....@gmail.com

Sep 29, 2016, 1:03:41 PM9/29/16
to gn-dev, jack....@gmail.com, prim...@google.com, tha...@chromium.org, dpr...@chromium.org, thomasa...@chromium.org, the...@chromium.org, thomasa...@google.com
Bug filed [1].
What's the best way to do sys_open(), on Linux and BSD?
Thanks everyone for sharing your expertise.

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=839101

On Wednesday, September 28, 2016 at 12:18:12 PM UTC-7, Torne (Richard Coles) wrote:
> btrfs subvolume snapshots are even faster and more disk-efficient than cp --reflink; if you make the thing you want to be able to treat as a CoW volume into a btrfs subvolume you can create snapshots of that subvolume basically instantly at the cost of a couple of kilobytes of disk at most (instead of being linearly proportional to number-of-files like cp --reflink) and also delete the subvolume similarly efficiently instead of waiting for rm -rf :)
>
> But yes, you need to be using btrfs to use it. :)

I've added trying out btrfs and a subvolume solution to my to-do list, thanks!

Primiano Tucci

Sep 29, 2016, 1:05:35 PM9/29/16
to Jack Bates, gn-dev, Nico Weber, Dirk Pranke, thomasa...@chromium.org, Lei Zhang, Tom Anderson
> What's the best way to do sys_open(), on Linux and BSD?
I have absolutely no clue about BSD. I imagine they have some similar syscall() interface.