The config/models file has 13 tables with four or five fields each.
Model.hs has three derivePersistField calls, two for two-constructor
datatypes and one for a six-constructor datatype. I'm not sure if any
of these individually is causing pathological behavior in GHC, or if
this is a general TH bug.
I'm actually going to be starting a new Yesod-based project later this
week, so I'll look at GHC's behavior on a simpler model and see if
it's just as bad or if it's reasonable.
Braden Shepherdson
Just a wild guess:
The entityDefs are now using Text instead of String everywhere. While
Text is supposed to be more efficient, the TH code still requires
Strings for variable names, constructors, etc. This means that the TH
code must be doing a pack/unpack round trip, which might be costing a lot.
Regards
ppk
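
To make the point above concrete, here is a minimal sketch of the conversion in question. Template Haskell's mkName takes a String, so a name stored as Text has to be unpacked before it can be used. The FieldName type and fieldToName function are hypothetical illustrations, not taken from persistent-template.

```haskell
import qualified Data.Text as T
import Language.Haskell.TH (Name, mkName, nameBase)

-- Hypothetical stand-in for a field name stored as Text in an entity definition.
newtype FieldName = FieldName T.Text

-- mkName only accepts String, so the Text has to be unpacked first.
fieldToName :: FieldName -> Name
fieldToName (FieldName t) = mkName (T.unpack t)

main :: IO ()
main = putStrLn (nameBase (fieldToName (FieldName (T.pack "userEmail"))))
```

Whether this round trip is expensive enough to matter is exactly the open question; the sketch only shows where the conversion has to happen.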
I've put together a small test case:
https://gist.github.com/2048581
When I run this, it's slow, and it uses a lot of memory... but it
doesn't look like there are any bugs. It eventually compiles. It just
seems that GHC has trouble compiling so much code in a single module.
Perhaps your Model.hs file has something different in it that really
is triggering a bug, but I think the case is just one of "we need a
lot of memory."
I have a theoretical solution to the problem. Currently, the TH code
generates all of the model code (data types and instances) into a
single module and tries to compile them all at once. Most likely, if
we split this into multiple files, it will run much more smoothly.
Unfortunately, we can no longer do that from TH, but instead need to
move into code generation. My idea would be to define all of the
datatypes in one module, each instance of PersistEntity in its own
module, and then import all of those in Model.hs.
This would be a significant effort to implement, so I want to make
sure we have all the facts before embarking on this.
Michael
The following code from persistent-template/Database/Persist/TH.hs
looks suspicious.
persistWith :: PersistSettings -> QuasiQuoter
persistWith ps = QuasiQuoter
    { quoteExp = lift . parse ps . pack
    }
from persistent/Database/Persist/Quasi.hs
-- | Parses a quasi-quoted syntax into a list of entity definitions.
parse :: PersistSettings -> Text -> [EntityDef]
parse ps = parse' ps
         . removeSpaces
         . filter (not . empty)
         . map tokenize
         . T.lines
My guess is that the entire file has to be read as a string first,
then packed, and then parsed.
Can we at least try the following code?
persistWith :: PersistSettings -> QuasiQuoter
persistWith ps = QuasiQuoter
    { quoteExp = lift . parse ps  -- no pack here: parse now takes a String
    }
parse :: PersistSettings -> String -> [EntityDef]
parse ps = parse' ps
         . removeSpaces
         . filter (not . empty)
         . map tokenize
         . map T.pack
         . lines
Here the overhead will be only one line, I guess.
Regards
ppk
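
As a sanity check on the proposal above, the two orderings can be compared in isolation. This is only a sketch: packThenSplit, splitThenPack and the sample input are hypothetical names made up for illustration, and the real parse function does more work after this step.

```haskell
import qualified Data.Text as T

-- Current shape: pack the whole input, then split it into lines.
packThenSplit :: String -> [T.Text]
packThenSplit = T.lines . T.pack

-- Proposed shape: split the String into lines, then pack each line.
splitThenPack :: String -> [T.Text]
splitThenPack = map T.pack . lines

-- Hypothetical sample input, loosely modelled on a config/models file.
sample :: String
sample = unlines
  [ "User"
  , "    email Text"
  , "    age Int Maybe"
  , ""
  , "Post"
  , "    title Text"
  ]

main :: IO ()
main = do
  print (splitThenPack sample)
  -- Both orderings yield the same list of lines for this input.
  print (packThenSplit sample == splitThenPack sample)
```

Both versions produce the same lines, so the change affects only where the packing happens, not what the parser sees.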
Without a reproducing test case, discussing this is not really worthwhile. However, from everything I've heard, the code slowdown happens at the simplification phase, which is after any Text packing and unpacking has already occurred.
That's what I think, too. Here's my hypothesis: some function foo,
that is being called by the code that TH has generated, is getting
inlined everywhere; this creates a huge Core, which consumes lots of
memory and CPU time. On your test with my GHC 7.0, after desugaring
the result size is 55,760, but after simplification phase 0 it jumps
to a whopping 985,118, and it tops out at 1,032,068. The Core has grown by
18.5x!
> I have a theoretical solution to the problem. Currently, the TH code
> generates all of the model code (data types and instances) into a
> single module and tries to compile them all at once. Most likely, if
> we split this into multiple files, it will run much more smoothly.
> Unfortunately, we can no longer do that from TH, but instead need to
> move into code generation. My idea would be to define all of the
> datatypes in one module, each instance of PersistEntity in its own
> module, and then import all of those in Model.hs.
>
> This would be a significant effort to implement, so I want to make
> sure we have all the facts before embarking on this.
IMHO, we should instead try to understand exactly what is getting
inlined on the model file, or which RULES are being fired. Even if
using multiple files solved the problem (we don't know that), it'll
still generate a lot of code for those definitions, which is bad for
the cache of our processors.
Cheers,
--
Felipe.
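
For anyone who wants to repeat this kind of measurement, a rough sketch follows. The OPTIONS_GHC pragma turns on two GHC debugging flags for the module it sits in: -dshow-passes reports the result size after each pass, and -ddump-simpl-stats summarises what the simplifier did. The growthFactor helper is a hypothetical name used only to redo the arithmetic on the sizes quoted above.

```haskell
{-# OPTIONS_GHC -dshow-passes -ddump-simpl-stats #-}
-- -dshow-passes: print each compiler pass together with its result size.
-- -ddump-simpl-stats: print simplifier statistics (inlinings, rule firings).
-- Compile with -O so that the simplifier phases actually run.

-- Ratio between two Core sizes as reported by -dshow-passes.
growthFactor :: Int -> Int -> Double
growthFactor before after = fromIntegral after / fromIntegral before

main :: IO ()
main = do
  -- Size after desugaring versus size after simplifier phase 0.
  print (growthFactor 55760 985118)
  -- Size after desugaring versus the peak size.
  print (growthFactor 55760 1032068)
```

With the figures from the message, the two ratios come out at roughly 17.7 and 18.5. In a real project the pragma would go at the top of Model.hs instead of this toy module.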
Can someone who's been suffering from this bug test out the code and
let me know if it solves the problem? If so, I'll release a new
version to Hackage, and then we can figure out if this is something
that can be fixed in text/GHC or not.
Michael
[1] https://github.com/yesodweb/persistent/commit/bd3398f6ddd5c8540e9f8f38acdfe8827c998cc4
Also, forgot to mention: good call Piyush, I should have paid more attention to what you were saying.
If this doesn't solve it and we're still having trouble building a
reproducing case, I'm comfortable with sending the files privately to
some people who are investigating. I can do that this evening (North
American evening) if it proves necessary.
I'll follow up shortly with the results of that new version.
Braden
max
With that patched version of persistent-template, I did a clean and
build, and GHC peaked at 420MB resident, 520MB virtual, both less than
half what they were before. I tried again with +RTS -M400M to see if
it would manage on my old 512MB VM, and it peaked at 350MB virtual,
320MB resident. So this seems to have solved the problem!
Thanks for the quick turnaround, and I'm sorry I didn't get back to
the thread sooner with the files in question.
Braden
Normally inlining is good. Would we ask them to avoid too much inlining?
It seems that we should really be asking them to switch to Text.
Felipe already sent a report to the cafe, with Bryan CCed. It seems that the immediate issue is with text's over-use of INLINE. However, I think we might ideally want some kind of enhancement to IsString that makes it possible to deal with string literals more explicitly.
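
To illustrate the IsString point, here is a small sketch. With OverloadedStrings, a string literal is desugared to a call to fromString, which for Text is pack, so every literal in generated code becomes a call site for it. The packLit helper is hypothetical (it is not part of text or persistent) and is shown only as one possible way to route all literals through a single out-of-line function; it is not claimed to be the fix that was applied.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.String (fromString)
import qualified Data.Text as T

-- With OverloadedStrings, this literal is desugared to: fromString "email"
viaLiteral :: T.Text
viaLiteral = "email"

-- The same value, written out the way the compiler sees it.
viaFromString :: T.Text
viaFromString = fromString "email"

-- Hypothetical helper: a single packing function marked NOINLINE, so its
-- body is not copied into every place a literal is used.
packLit :: String -> T.Text
packLit = T.pack
{-# NOINLINE packLit #-}

main :: IO ()
main = print (viaLiteral == viaFromString && viaFromString == packLit "email")
```

All three values are equal; the only difference is how much code may end up at each use site once the optimiser has run.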