Runaway RAM usage on Model.hs

Braden Shepherdson

Mar 15, 2012, 3:24:25 PM

to yeso...@googlegroups.com

I have a fairly modest Model.hs and config/models file that GHC 7.4.1
requires huge quantities of RAM to compile. I was running on a 512MB
VM and it was thrashing painfully. Limiting the RAM with +RTS -M400M
caused GHC to fail after it hit that limit. Increasing the RAM on the
VM to 1GB allowed it to succeed, though it still required all the RAM
on the system and chugged for a while before moving on to the other
files. This wasn't a problem on 7.0. I'm running the latest Yesod and
Persistent (Sqlite), having just cabal installed fresh today after
upgrading to GHC 7.4.1.

The config/models file has 13 tables with four or five fields each.
Model.hs has three derivePersistField calls, two for two-constructor
datatypes and one for a six-constructor datatype. I'm not sure if any
of these individually is causing pathological behavior in GHC, or if
this is a general TH bug.

I'm actually going to be starting a new Yesod-based project later this
week, so I'll look at GHC's behavior on a simpler model and see if
it's just as bad or if it's reasonable.

Braden Shepherdson

Michael Snoyman

Mar 15, 2012, 3:33:12 PM

to yeso...@googlegroups.com

Would it be possible to share the config/models and Model.hs files? So
far, most of the reports have been either far too complicated for a
simple test case, or from closed-source projects. If possible, I'd
like to get something that I can pass on to the GHC team as a
standalone, reproducing example.

Piyush P Kurur

Mar 16, 2012, 12:22:34 AM

to yeso...@googlegroups.com

On Thu, Mar 15, 2012 at 09:33:12PM +0200, Michael Snoyman wrote:
> Would it be possible to share the config/models and Model.hs files? So
> far, most of the reports have been either far too complicated for a
> simple test case, or from closed-source projects. If possible, I'd
> like to get something that I can pass on to the GHC team as a
> standalone, reproducing example.

Just a wild guess:

The entityDefs now use Text instead of String everywhere. While
Text is supposed to be more efficient, the TH code still requires
Strings for variable names, constructors, etc. This means that the TH
code must be doing a pack/unpack round trip, which might be costing a lot.

Regards

ppk

Michael Snoyman

Mar 16, 2012, 1:15:21 AM

to yeso...@googlegroups.com

On Thu, Mar 15, 2012 at 9:24 PM, Braden Shepherdson
<braden.sh...@gmail.com> wrote:

I've put together a small test case:

https://gist.github.com/2048581

When I run this, it's slow, and it uses a lot of memory... but it
doesn't look like there are any bugs. It eventually compiles. It just
seems that GHC has trouble compiling so much code in a single module.
Perhaps your Model.hs file has something different in it that really
is triggering a bug, but I think the case is just one of "we need a
lot of memory."

I have a theoretical solution to the problem. Currently, the TH code
generates all of the model code (data types and instances) into a
single module and tries to compile them all at once. Most likely, if
we split this into multiple files, it will run much more smoothly.
Unfortunately, we can no longer do that from TH, but instead need to
move into code generation. My idea would be to define all of the
datatypes in one module, each instance of PersistEntity in its own
module, and then import all of those in Model.hs.

This would be a significant effort to implement, so I want to make
sure we have all the facts before embarking on this.

Michael

Piyush P Kurur

Mar 16, 2012, 1:59:20 AM

to yeso...@googlegroups.com

The following code from persistent-template/Database/Persist/TH.hs
looks suspicious.

persistWith :: PersistSettings -> QuasiQuoter
persistWith ps = QuasiQuoter
    { quoteExp = lift . parse ps . pack
    }

from persistent/Database/Persist/Quasi.hs

-- | Parses a quasi-quoted syntax into a list of entity definitions.
parse :: PersistSettings -> Text -> [EntityDef]
parse ps = parse' ps
         . removeSpaces
         . filter (not . empty)
         . map tokenize
         . T.lines

My guess is that the entire file has to be read as a String first,
then packed, and then parsed.

Can we at least try the following code?

persistWith :: PersistSettings -> QuasiQuoter
persistWith ps = QuasiQuoter
    { quoteExp = lift . parse ps  -- no pack here: parse now takes the String
    }

parse :: PersistSettings -> String -> [EntityDef]
parse ps = parse' ps
         . removeSpaces
         . filter (not . empty)
         . map tokenize
         . map T.pack             -- pack one line at a time
         . lines

Here the overhead should be only one line at a time, I guess.

Regards

ppk

Michael Snoyman

Mar 16, 2012, 2:02:35 AM

to yeso...@googlegroups.com

Without a reproducing test case, discussing this is not really worthwhile. However, from everything I've heard, the code slowdown happens at the simplification phase, which is after any Text packing and unpacking has already occurred.

Felipe Almeida Lessa

Mar 16, 2012, 6:28:33 AM

to yeso...@googlegroups.com

On Fri, Mar 16, 2012 at 2:15 AM, Michael Snoyman <mic...@snoyman.com> wrote:
> I've put together a small test case:
>
> https://gist.github.com/2048581
>
> When I run this, it's slow, and it uses a lot of memory... but it
> doesn't look like there are any bugs. It eventually compiles. It just
> seems that GHC has trouble compiling so much code in a single module.
> Perhaps your Model.hs file has something different in it that really
> is triggering a bug, but I think the case is just one of "we need a
> lot of memory."

That's what I think, too. Here's my hypothesis: some function foo,
that is being called by the code that TH has generated, is getting
inlined everywhere; this creates a huge Core, which consumes lots of
memory and CPU time. On your test with my GHC 7.0, after desugaring
the result size is 55,760, but after simplification phase 0 it jumps
to a whopping 985,118, and it tops out at 1,032,068. The Core has grown
by 18.5x!

> I have a theoretical solution to the problem. Currently, the TH code
> generates all of the model code (data types and instances) into a
> single module and tries to compile them all at once. Most likely, if
> we split this into multiple files, it will run much more smoothly.
> Unfortunately, we can no longer do that from TH, but instead need to
> move into code generation. My idea would be to define all of the
> datatypes in one module, each instance of PersistEntity in its own
> module, and then import all of those in Model.hs.
>
> This would be a significant effort to implement, so I want to make
> sure we have all the facts before embarking on this.

IMHO, we should instead try to understand exactly what is getting
inlined in the model file, or which RULES are being fired. Even if
using multiple files solved the problem (we don't know that), it'll
still generate a lot of code for those definitions, which is bad for
the cache of our processors.

Cheers,

--
Felipe.

Holger Reinhardt

Mar 16, 2012, 7:18:40 AM

to yeso...@googlegroups.com

I've looked at the simplifier output and there's a lot of code being inlined from the text library. It seems that every call to Text.pack produces 200 lines of intermediate code.

Maybe we can define an alias for Text.pack that is not inlined.
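
For concreteness, here is a minimal sketch of what such an alias could look like. The name packNoInline and the fieldNames list are made up for illustration; this is not necessarily what any persistent-template patch does. The idea is only that a NOINLINE wrapper leaves one ordinary function call at each use site instead of an unfolded copy of the body of Text.pack.

```haskell
import qualified Data.Text as T

-- | Hypothetical alias for 'T.pack' (the name is an assumption, not part of
-- any library). The NOINLINE pragma asks GHC to keep calls to this wrapper
-- as plain function calls rather than unfolding the body at each call site.
packNoInline :: String -> T.Text
packNoInline = T.pack
{-# NOINLINE packNoInline #-}

-- Stand-ins for the many string literals found in generated model code.
fieldNames :: [T.Text]
fieldNames = [packNoInline "id", packNoInline "name", packNoInline "email"]

main :: IO ()
main = mapM_ (putStrLn . T.unpack) fieldNames
```

The wrapper returns exactly what Text.pack returns, so the generated code behaves the same; only the size of the intermediate code at each call site should change.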

2012/3/16 Felipe Almeida Lessa <felipe...@gmail.com>

Michael Snoyman

Mar 16, 2012, 7:22:11 AM

to yeso...@googlegroups.com

I was in the middle of writing an email that said exactly that. Felipe
and I just worked on this a bit, and sure enough he got the exact same
result you did. I just put up a new version of persistent-template on
Yackage and Github[1] that does *exactly* that. In my test cases, it
certainly sped up compilation, but I haven't had one of those
cannot-compile-runs-out-of-memory cases yet.

Can someone who's been suffering from this bug test out the code and
let me know if it solves the problem? If so, I'll release a new
version to Hackage, and then we can figure out if this is something
that can be fixed in text/GHC or not.

Michael

[1] https://github.com/yesodweb/persistent/commit/bd3398f6ddd5c8540e9f8f38acdfe8827c998cc4

Michael Snoyman

Mar 16, 2012, 7:27:13 AM

to yeso...@googlegroups.com

Also, forgot to mention: good call Piyush, I should have paid more attention to what you were saying.

Braden Shepherdson

Mar 16, 2012, 10:38:53 AM

to yeso...@googlegroups.com

I'm pulling down that github version now and will let you know what it
does to my memory and time performance.

If this doesn't solve it and we're still having trouble building a
reproducing case, I'm comfortable with sending the files privately to
some people who are investigating. I can do that this evening (North
American evening) if it proves necessary.

I'll follow up shortly with the results of that new version.

Braden

Max Cantor

Mar 16, 2012, 10:47:24 AM

to yeso...@googlegroups.com

On OSX, as a temporary workaround, deleting all the -Ox flags from the cabal file helps a lot. It's not really a production solution, but it's saving my life in development.

max

Braden Shepherdson

Mar 16, 2012, 10:49:00 AM

to yeso...@googlegroups.com

Success!

With that patched version of persistent-template, I did a clean and
build, and GHC peaked at 420MB resident, 520MB virtual, both less than
half what they were before. I tried again with +RTS -M400M to see if
it would manage on my old 512MB VM, and it peaked at 350MB virtual,
320MB resident. So this seems to have solved the problem!

Thanks for the quick turnaround, and I'm sorry I didn't get back to
the thread sooner with the files in question.

Braden

Max Cantor

Mar 16, 2012, 10:53:30 AM

to yeso...@googlegroups.com

awesome! thanks guys!

Michael Snoyman

Mar 16, 2012, 10:55:06 AM

to yeso...@googlegroups.com

Thanks for testing this. I think I speak for everyone who's been
working on this when I say it's a huge relief. I'll release a new
version to Hackage now.

Greg Weber

Mar 16, 2012, 11:27:47 AM

to yeso...@googlegroups.com

Can you make a report to GHC, or would you like me to?

Normally inlining is good. Would we ask them to avoid too much inlining?
It seems that we should really be asking them to switch to Text.

Michael Snoyman

Mar 16, 2012, 11:31:42 AM

to yeso...@googlegroups.com

Felipe already sent a report to the cafe, with Bryan CCed. It seems that the immediate issue is text's overuse of INLINE. However, I think we might ideally want some kind of enhancement to IsString that makes it possible to deal with string literals more explicitly.
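
As background on why string literals matter here, the following is a small sketch (the names viaLiteral and viaFromString are made up for illustration). With OverloadedStrings, a literal is compiled as a call to fromString, and the IsString instance for Text implements fromString with pack, so every Text literal is a pack call on a String.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.String (IsString (..))
import qualified Data.Text as T

-- With OverloadedStrings, this literal is compiled as @fromString "users"@.
viaLiteral :: T.Text
viaLiteral = "users"

-- The same value with the conversion written out by hand.
viaFromString :: T.Text
viaFromString = fromString "users"

main :: IO ()
main = print (viaLiteral == viaFromString && viaLiteral == T.pack "users")
```

Running this prints True: the three spellings denote the same value, which is why the cost of pack shows up wherever literals do.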
