Compiling my site has become increasingly difficult because hakyll
eats so much memory that it gets killed by the OOM killer. Some files
are particularly bad: for example, my page of Hofstadter's
superrationality essays causes RAM consumption to go from a final 60%
to 80%+ when profiling is turned on, and I have to delete it just to
get the profiling runs to finish. (I don't know why this would happen
when, on the command line, Pandoc apparently can compile it in a
fraction of a second.)
Naturally, the first step is to profile
http://www.gwern.net/hakyll.hs
(somewhat out of date repo:
https://github.com/gwern/gwern.net )
$ ghc -prof -auto-all -rtsopts -O2 -fforce-recomp -optl-s --make
hakyll.hs && nice ./hakyll rebuild +RTS -p -m10 -RTS
which produces bottomUp.prof.
The top culprits:
COST CENTRE            MODULE                           %time %alloc
pandocTransform        Main                              20.5   17.9
anyLine                Text.Pandoc.Parsing               12.8   25.2
main                   Main                              10.6    9.1
convertInterwikiLinks  Main                               4.2    0.0
likelyAbbrev           Text.Pandoc.Readers.Markdown       4.1    3.8
CAF                    Main                               4.0    5.3
str                    Text.Pandoc.Readers.Markdown       4.0    4.8
myPageCompiler         Main                               3.6    5.4
convertHakyllLinks     Main                               2.1    0.0
inline                 Text.Pandoc.Readers.Markdown       1.8    0.9
spaceChar              Text.Pandoc.Parsing                1.3    1.7
inlineListToHtml       Text.Pandoc.Writers.HTML           1.0    0.6
variable               Text.Pandoc.Templates              1.0    1.2
charOrRef              Text.Pandoc.Parsing                1.0    1.0
characterReference     Text.Pandoc.CharacterReferences    0.9    1.0
htmlTag                Text.Pandoc.Readers.HTML           0.7    1.1
There's probably nothing to be done about main or anyLine or
likelyAbbrev, so what is pandocTransform?
pandocTransform :: Pandoc -> Pandoc
pandocTransform = bottomUp (map (convertInterwikiLinks . convertHakyllLinks))
That probably indicates that one of the two convert functions is using
too much memory, so we look at pandocTransform and its sub-nodes:
                                                 individual     inherited
COST CENTRE            MODULE  no.      entries  %time %alloc  %time %alloc
pandocTransform        Main    22933          0   20.5   17.9   26.8   17.9
convertInterwikiLinks  Main    23046  247843051    4.2    0.0    4.2    0.0
inlinesToString        Main    23048       6102    0.0    0.0    0.0    0.0
convertHakyllLinks     Main    23045  247841141    2.1    0.0    2.1    0.0
inlinesToURL           Main    23362        121    0.0    0.0    0.0    0.0
inlinesToString        Main    23363        121    0.0    0.0    0.0    0.0
The inlines* are irrelevant, but the two convert* functions account for
0% of allocation and just 6.3% of CPU time! Where is the other 20.5% of
time and *17.9*% of allocation, charged to pandocTransform itself,
going? The only other functions are 'map' and 'bottomUp'!
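One way to split that residual cost further might be manual cost
centres: GHC's SCC pragma attaches a named cost centre to an
expression, so the traversal, the per-list map, and the per-element
conversion each get their own line in the .prof file. The following is
only a minimal sketch with toy stand-ins: stepA, stepB, traverseAll and
transform are hypothetical names, not code from hakyll.hs, and the
pragmas only take effect when compiling with -prof.

```haskell
module Main where

import Data.Char (toUpper)

-- Hypothetical stand-ins for the two link-rewriting functions.
stepA :: String -> String
stepA = map toUpper

stepB :: String -> String
stepB = reverse

-- Hypothetical stand-in for the generic traversal (bottomUp in the real code).
traverseAll :: ([String] -> [String]) -> [[String]] -> [[String]]
traverseAll = map

-- Each layer is wrapped in its own cost centre, so the profile reports
-- the traversal, the per-list map, and the conversion separately.
transform :: [[String]] -> [[String]]
transform doc =
  {-# SCC "transform_traversal" #-}
  (traverseAll
     (\xs -> {-# SCC "transform_map" #-}
             (map (\x -> {-# SCC "transform_convert" #-} (stepA (stepB x))) xs))
     doc)

main :: IO ()
main = print (transform [["ab", "cd"], ["ef"]])
-- prints [["BA","DC"],["FE"]]
```

Without -prof the pragmas are ignored, so the annotations can stay in
the source between profiling runs.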
Looking at bottomUp
http://hackage.haskell.org/packages/archive/pandoc-types/1.8.2/doc/html/Text-Pandoc-Generic.html
it sounds like it might be inefficient. Unfortunately, if we swap
bottomUp for topDown, we find the profile output changes only
trivially (hakyll.prof). The module offers no other options. Am I
stuck?
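One thing that might be worth checking is how bottomUp treats a
function on lists. The sketch below assumes bottomUp is defined as
syb's `everywhere (mkT f)` (my reading of Text.Pandoc.Generic, not
verified against that exact version) and rebuilds it with base only, on
a toy Inline type that is a hypothetical stand-in for Pandoc's. Under
that assumption a function of type [Inline] -> [Inline] fires on every
tail of every list, because each tail is itself a subterm of that type,
so `map f` touches the element at position i a total of i+1 times.

```haskell
{-# LANGUAGE DeriveDataTypeable, RankNTypes #-}
module Main where

import Data.Data (Data, Typeable, cast, gmapT)
import Data.Maybe (fromMaybe)

-- Toy stand-in for Pandoc's Inline (hypothetical, far smaller than the real type).
data Inline = Str String | Emph [Inline]
  deriving (Show, Eq, Data, Typeable)

-- Rewrite the children first, then the node itself (shape of syb's everywhere).
everywhere :: Data b => (forall a. Data a => a -> a) -> b -> b
everywhere f x = f (gmapT (everywhere f) x)

-- Act on values of the matching type, identity elsewhere (shape of syb's mkT).
mkT :: (Typeable a, Typeable b) => (b -> b) -> a -> a
mkT f = fromMaybe id (cast f)

-- ASSUMPTION: this mirrors pandoc-types' bottomUp; it is not copied from it.
bottomUp :: (Data a, Data b) => (a -> a) -> b -> b
bottomUp f = everywhere (mkT f)

-- Appends a mark on every application, so repeated visits show up in the result.
tick :: Inline -> Inline
tick (Str s) = Str (s ++ "!")
tick other   = other

sample :: [Inline]
sample = [Str "a", Str "b", Str "c"]

main :: IO ()
main = do
  -- List-level function: applied to every tail, so later elements are hit repeatedly.
  print (bottomUp (map tick) sample)  -- [Str "a!",Str "b!!",Str "c!!!"]
  -- Element-level function: applied exactly once per Inline.
  print (bottomUp tick sample)        -- [Str "a!",Str "b!",Str "c!"]
```

If the real bottomUp behaves like this, it could account for the
roughly 247 million entries into each convert* function. And if the
convert* functions are Inline -> Inline (an assumption; their types are
not shown here), passing their composition to bottomUp directly,
without the map, would visit each Inline once. I have not tested that
against the actual site.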
--
gwern
http://www.gwern.net