Massive slowdown in recent Hakyll versions due to parallelism and/or changed dep checking?

Gwern Branwen

May 3, 2022, 2:38:15 PM
to hakyll
So last night I upgraded from Ubuntu GHC 8.6.3 to 9.0.1*, and my
Hakyll from 4.13.4.1 (as of ~2021-03-14) to HEAD (4.15.1.1), with my
symlink patch <https://github.com/jaspervdj/hakyll/issues/786> added,
because without that, I am literally unable to compile my site
(https://github.com/gwern/gwern.net/) due to the 20k or so
documents/files in it (the hosted documents + generated infrastructure
files providing features like annotations, backlinks, and link
recommendations) - aside from copying being unnecessary & slow, I
simply don't have the disk space right now to store the full _site/ in
all its redundant glory. I have spent my morning trying to debug this.

The Hakyll compiler code for these static copied files is very simple
and has not changed in a long time other than the symlinking instead
of copying and occasionally adding an additional filetype or
directory: https://github.com/gwern/gwern.net/blob/a18b7851966735bd51e5dc5e2676d9168c5280fc/build/hakyll.hs#L117

let static = route idRoute >> compile symlinkFileCompiler
-- WARNING: custom optimization requiring forked Hakyll installation;
-- see https://github.com/jaspervdj/hakyll/issues/786
version "static" $ mapM_ (`match` static) [
"docs/**",
"images/**",
"**.hs",
"**.sh",
"**.txt",
"**.html",
"**.page",
...]

So I am surprised to find that while my static symlinking was the
*last* step in the build before, as it is in the do-block, somehow it
now runs *first*. I was even more surprised to see that whereas the
symlink step before took less than half a minute and printed output
faster than one could read, now it is taking somewhere upwards of an
hour and a half (thus far, as I write this, it is still going) while
leisurely printing out line by line. The slowdown would appear to be
on the order of >200×. (The symlinks otherwise do appear to be
correctly generated.) This would make my site syncing take not the
current ~1.5h but something crazy like >5h on average. (In practice it
would be even worse than the raw slowdown implies, because the
segfaults/strange-closure errors I have been hitting appear to strike at
random, anywhere in a run, without regexps necessarily running; so the
longer the run takes, the more likely it is to fail due to a segfault
and restart from scratch. At a certain point, it becomes effectively
impossible to sync.)

As it doesn't *do* anything other than copy the file name in idRoute
and then call `createFileLink` which ultimately ends up in
System.Posix.Files passing into C-land to do the equivalent of `ln -s`,
I am baffled as to what is going on. It can't be Pandoc because Pandoc
isn't involved at all, there is no Markdown processing. I've been
having problems with regexp libraries causing segfaults &
strange-closure errors (which is why I upgraded, yak-shaving) but
there aren't any regexps here, just globs, not that they should
possibly be able to cause the slowdown (if they were slow at listing
because some pathological case was hit, it'd freeze for a long time
and then create the symlinks at full speed). It doesn't call into any
of my janky annotation code where all sorts of crazy stuff could be
happening, since the annotations are merely copied without even being
read, so 99% of the codebase is irrelevant. Nothing in the git patches
or the CHANGELOG looks like it ought to create obviously pathologically
slow Hakyll ops, as it's mostly doc updates...
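
To rule out the raw symlink call itself, one could time it outside Hakyll entirely. The following is only a rough sketch, not anything from my build: it uses `createFileLink` from the `directory` package, and the scratch directory name, the `link-N` naming, and the 20000 count are made up for illustration (chosen to be roughly the size of my static file set).

```haskell
-- Standalone micro-benchmark: how long do N bare symlink calls take?
-- Assumption: directory name, link names, and count are illustrative only.
import Control.Monad (forM_)
import GHC.Clock (getMonotonicTime)
import System.Directory
  ( createDirectoryIfMissing, createFileLink, getTemporaryDirectory
  , removePathForcibly )
import System.FilePath ((</>))

-- | Hypothetical naming scheme for the generated links.
linkName :: Int -> FilePath
linkName i = "link-" ++ show i

-- | Create @n@ symlinks inside @dir@, all pointing at @target@, and
-- return the elapsed wall-clock time in seconds.
timeSymlinks :: FilePath -> FilePath -> Int -> IO Double
timeSymlinks dir target n = do
  start <- getMonotonicTime
  forM_ [1 .. n] $ \i ->
    createFileLink target (dir </> linkName i)
  end <- getMonotonicTime
  pure (end - start)

main :: IO ()
main = do
  tmp <- getTemporaryDirectory
  let dir    = tmp </> "symlink-bench"  -- hypothetical scratch directory
      target = dir </> "target.txt"
  removePathForcibly dir                -- start clean; no error if absent
  createDirectoryIfMissing True dir
  writeFile target "placeholder\n"
  secs <- timeSymlinks dir target 20000
  putStrLn ("20000 symlinks in " ++ show secs ++ " s")
  removePathForcibly dir                -- tidy up the scratch directory
```

If something like this finishes in seconds, the time is being lost somewhere in the Hakyll runtime rather than in the filesystem calls.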

The main culprit I notice is that running with my usual +RTS -N25 -RTS
(I have a 16-core Threadripper CPU), thread utilization is much higher
than before in htop, and RAM use has also roughly doubled (from ~10%
of 78GB to ~20%). One CHANGELOG entry pops out, referring to commits
142fc79c82b1bd4601c193f1c66edbb1bdaa3174
4c9ee55e14cc96dcd2312bfb235f6431b3da007a
6e77b4e7d2f74da964fd95494dad1ee56d4c4536

> - Make the runtime concurrent, which brings 30% speedups on real-world sites.
> This adds dependencies on `array` and `lifted-async`. Please note that it
> doesn't scale past the number of physical cores; ideas are welcome in
> https://github.com/jaspervdj/hakyll/issues/850 (contribution by
> Laurent P. René de Cotret and Vaibhav Sagar)

If I run with -N1 it somehow manages to OOM out after hitting 57GB
RAM. If I recompile without `-threaded` and run, it hits 20% RAM and
htop shows 2 processes taking 100% CPU time. About 10 minutes later,
it finally starts creating the symlinks; they are perhaps somewhat
slower than the -N25 but it can't be by much. `runhaskell` fails
because of what looks like library $PATH issues I can't be bothered to
figure out right now since they probably are unrelated. Going back to
`-threaded`, -N12 uses more like 7% RAM and runs much faster,
starting the symlinking quickly and running 10× faster than -N25.
Bumping it to -N16 to match my physical core count, at 10% RAM, it runs
a little faster. -N15 however appears faster than -N16; -N14 about the
same. None is as fast as it was before, and all of them appear to both
increase RAM use & slow down drastically as they go, so tuning the
thread count does not fix the problem.
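
For anyone wanting to repeat the thread-count comparison less manually than I did, a sweep can be scripted. This is a hedged sketch, not what I actually ran: the `./hakyll` binary path and the list of -N values are assumptions, it uses Hakyll's `rebuild` command so each run starts from a clean cache, and it presumes the binary was compiled with `-threaded` and `-rtsopts` so that `+RTS -N` is accepted.

```haskell
-- Time one full site build per RTS thread count.
-- Assumptions: "./hakyll" is the compiled site binary (hypothetical path)
-- and it was built with -threaded -rtsopts.
import Control.Monad (forM_)
import GHC.Clock (getMonotonicTime)
import System.Process (readProcessWithExitCode)

-- | Arguments for one run: the Hakyll command plus the RTS thread flag.
buildArgs :: Int -> [String]
buildArgs n = ["rebuild", "+RTS", "-N" ++ show n, "-RTS"]

-- | Run the site binary once with @-Nn@ and return wall-clock seconds.
-- Output and exit code are discarded; only the duration is kept.
timeBuild :: FilePath -> Int -> IO Double
timeBuild exe n = do
  start <- getMonotonicTime
  _ <- readProcessWithExitCode exe (buildArgs n) ""
  end <- getMonotonicTime
  pure (end - start)

main :: IO ()
main = forM_ [12, 14, 15, 16, 25] $ \n -> do
  secs <- timeBuild "./hakyll" n
  putStrLn ("-N" ++ show n ++ ": " ++ show secs ++ " s")
```

Wall-clock time alone won't show the RAM growth; for that one would still need to watch htop or add `-s` to the RTS flags.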

I would've liked to ablate it by disabling just the parallelism, but
it appears to be hardwired and a complete rewrite of that
functionality (which probably explains issues like no longer
preserving build order), so I instead reverted back to patch
a35e1c35b138c2a25b27c05936a6bb95f5f3d9e6 / version 4.14.0.0. This
doesn't work with GHC 9.0.1 & available libraries, so I had to change
the hakyll.cabal to loosen the memory, cryptonite, template-haskell,
base, and pandoc constraints, and add the aeson constraint. The newer
aeson causes hashmap/keymap type errors because support for it was
added post-4.14.0.0, so I had to backport that specific patch
(b8df202e6d310243ff384166e94fae7e8197d144) and edit it until it
applied. *Then* the pre-parallelism Hakyll installed.

The pre-parallelism Hakyll builds in the expected order with the expected speed.

So I conclude that the set of changes mostly involving parallelism
seems to be the issue. My impression of it is that it is fancy and
sounds nice, but far from delivering a 30% speedup, has disastrously
unpredictable and bad performance for my site: it has some sort of
severe quadratic-esque blowup with core count, and even at the best
setting I tried, was at least an order of magnitude or two slower than
before. So
the parallelism part is really really bad for me, but there may be
additional issues with the other packaged changes adding nasty
constant factor overhead.

* I would've gone to the latest GHC but 9.0.1 was the latest that
https://launchpad.net/~hvr/+archive/ubuntu/ghc had packaged for
Ubuntu, and I didn't feel like asking for trouble by installing GHC
manually.

--
gwern
https://www.gwern.net