parallel build efficiency


Erik Rantapaa

Jul 8, 2016, 2:53:47 PM
to haskell-stack
Hi all,

I've been building a large number of packages with a single `stack install` command, e.g. something like:

    stack install --resolver ... --keep-going $(cat all-packages)

On an 8-core / 16 GB box I've noticed that the worker threads are not always kept busy. The overall %-idle time is at least 20%.

So I'm wondering...

- Is there anything I can do on the command line to improve CPU utilization?

- Where can I find the code that does the job scheduling? I would like to modify how package builds are prioritized to improve throughput.

Thanks,
Erik

Christopher Allen

Jul 8, 2016, 3:01:10 PM
to Erik Rantapaa, haskell-stack
How'd you generate the all-packages file? I'd like to test this on my machine.



--
Chris Allen
Currently working on http://haskellbook.com

Christopher Allen

Jul 8, 2016, 3:03:27 PM
to Erik Rantapaa, haskell-stack
Speaking anecdotally, it took me 3.5 minutes to compile lens on a
10-core / 20-thread machine with 32 GB of RAM, and it was not using my
cores very much for most of it. I don't think it's I/O limited because
the disk is an M.2 PCIe SSD.

This contrasts with lens taking ~5-5.5 minutes on my quad-core machine
with a SATA SSD.

Michael Sloan

Jul 8, 2016, 3:31:48 PM
to Erik Rantapaa, haskell-stack
On Fri, Jul 8, 2016 at 11:53 AM, Erik Rantapaa <eran...@gmail.com> wrote:
> Hi all,
>
> I've been building a large number of packages with a single `stack install` command, e.g. something like:
>
>     stack install --resolver ... --keep-going $(cat all-packages)
>
> On an 8-core / 16 GB box I've noticed that the worker threads are not always kept busy. The overall %-idle time is at least 20%.
>
> So I'm wondering...
>
> - Is there anything I can do on the command line to improve CPU utilization?

You can definitely increase CPU utilization by passing a "-jN" option to GHC (e.g. "-j2"). Unfortunately, in my experience it usually doesn't reduce overall build time as much as I'd hope. One issue is that it can interact poorly with package-level parallelism. For example, if stack is building 5 packages at once and each GHC invocation runs with "-j5", there are suddenly up to 25 compiler threads each trying to use 100% of a core.
 

> - Where can I find the code that does the job scheduling? I would like to modify how package builds are prioritized to improve throughput.

Michael Sloan

Jul 8, 2016, 3:34:46 PM
to Erik Rantapaa, haskell-stack
Oh, and it's worth mentioning that stack itself takes "-j" to set how many packages it builds concurrently. It defaults to your processor count. It can be beneficial to use a higher value.

To pass -j through to GHC in the Cabal builds, use

    --ghc-options "-j2"
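
One rough way to reason about how the two "-j" settings multiply is sketched below in Python. It is only arithmetic under stated assumptions: the function names are hypothetical, nothing here is part of stack or GHC, and it assumes every compiler thread can keep a core fully busy, which real builds rarely do.

```python
def worst_case_threads(stack_jobs: int, ghc_jobs: int) -> int:
    """Upper bound on compiler threads competing for cores at once.

    stack_jobs: packages built concurrently (stack's -j).
    ghc_jobs:   modules compiled concurrently per package (GHC's -j).
    """
    return stack_jobs * ghc_jobs


def ghc_jobs_within_budget(cores: int, stack_jobs: int) -> int:
    """Largest GHC -jN such that stack_jobs * N does not exceed `cores`.

    Never returns less than 1, since GHC always uses at least one thread.
    """
    return max(1, cores // stack_jobs)


if __name__ == "__main__":
    # The 5 packages x "-j5" case mentioned in the thread.
    print(worst_case_threads(5, 5))
    # Hypothetical: 8 cores, stack limited to 4 concurrent packages.
    print(ghc_jobs_within_budget(8, 4))
```

With 8 cores and stack limited to 4 concurrent packages, this suggests a GHC "-j2"; whether that actually shortens a build has to be measured.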

Christopher Allen

Jul 8, 2016, 3:35:31 PM
to Michael Sloan, Erik Rantapaa, haskell-stack
I've tried setting Stack's package build parallelism to -j20 or -j30
instead of the -j10 it infers from my core count, but it just leads to
more sleepy threads.



al...@frontrowed.com

Jul 8, 2016, 3:42:40 PM
to Christopher Allen, Erik Rantapaa, haskell-stack, Michael Sloan
On a semi-related note, if folks have figured out a sane way of parallelizing large Haskell builds and test suites in CircleCI, I'd be all ears.


Christopher Allen

Jul 8, 2016, 3:58:53 PM
to Alexandr Kurilin, Erik Rantapaa, haskell-stack, Michael Sloan
The parallelization machinery in CircleCI is coarse-grained and, I think, intended for orthogonal builds. I don't know how well you could implement a fork/join model with it, but it may be possible. I'd suggest aggressive caching first if you haven't exhausted that.

As you probably know, we abandoned CircleCI for greener (faster) pastures: http://bitemyapp.com/posts/2016-03-28-speeding-up-builds.html The numbers I quote in the article are for a quad-core dedicated server, I think, not the 10-core machine I mentioned earlier.

Erik Rantapaa

Jul 8, 2016, 4:00:30 PM
to haskell-stack


On Friday, July 8, 2016 at 2:01:10 PM UTC-5, Christopher Allen wrote:
> How'd you generate the all-packages file? I'd like to test this on my machine.


I get them from the cabal.config file for the resolver, e.g.:


Christopher Allen

Jul 8, 2016, 4:11:14 PM
to Erik Rantapaa, haskell-stack
Some samples from when it got into the thick of things.

The utilization doesn't seem that bad to me given how each job is
bouncing between networking, disk I/O, and solid CPU work.

I started it ~5 minutes ago and it's at 289/1813.

I think you'd need to start assigning the work to thread pools between
different jobs to saturate the CPU cores, and that's not going to be a
_ton_ of work in typical builds. Yesod is ~120 or so packages, so it
benefits from my having more cores, but even lens hits a couple of
bottlenecks where it's just waiting on one package before being able
to proceed.
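
The "waiting on one package" effect can be reproduced with a toy simulation. The Python sketch below is an illustration only: it is not stack's scheduler, the package names and build costs are made up, and it assumes each package occupies exactly one worker for a fixed time and that the dependency graph has no cycles.

```python
import heapq


def simulate(deps, cost, workers):
    """Greedy scheduling: whenever a worker is free, start any package
    whose dependencies have all finished.

    deps:    package -> list of packages it depends on (must be acyclic)
    cost:    package -> build time in arbitrary units
    workers: number of packages that may build concurrently
    Returns (makespan, utilization), where utilization is the fraction
    of total worker time spent building.
    """
    remaining = {p: set(ds) for p, ds in deps.items()}
    dependents = {p: [] for p in deps}
    for p, ds in deps.items():
        for d in ds:
            dependents[d].append(p)

    ready = sorted(p for p, ds in remaining.items() if not ds)
    running = []  # min-heap of (finish_time, package)
    now = 0.0
    done = 0
    while done < len(deps):
        # Hand ready packages to free workers.
        while ready and len(running) < workers:
            p = ready.pop(0)
            heapq.heappush(running, (now + cost[p], p))
        # Jump to the next completion and unblock its dependents.
        now, finished = heapq.heappop(running)
        done += 1
        for q in dependents[finished]:
            remaining[q].discard(finished)
            if not remaining[q]:
                ready.append(q)

    return now, sum(cost.values()) / (now * workers)


if __name__ == "__main__":
    # Hypothetical graph: everything waits on "core", and "app" waits on the rest.
    deps = {"core": [], "util": ["core"], "parser": ["core"], "app": ["util", "parser"]}
    cost = {"core": 4, "util": 1, "parser": 1, "app": 2}
    print(simulate(deps, cost, workers=4))
    print(simulate(deps, cost, workers=1))
```

In this made-up graph, four workers finish in 7 time units at roughly 29% utilization, while a single worker takes 8 units at 100%: the extra workers mostly idle while "core" and "app" build alone.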
Screenshot from 2016-07-08 15-06-12.png
Screenshot from 2016-07-08 15-08-25.png

Erik Rantapaa

Jul 8, 2016, 5:26:22 PM
to haskell-stack
I've created a repo for some Python scripts I used to monitor a long-running stack install job:


Check it out if you have a chance and let me know how it works for you.
