[Haskell-cafe] An interesting paper from Google

Andrew Coppin

Oct 15, 2010, 4:28:09 PM
to haskel...@haskell.org
http://k1024.org/~iusty/papers/icfp10-haskell-reagent.pdf

I'm sure some of you have seen this already. For those who lack the time
or inclination to read through the (six) pages of this report, here's
the summary...

We [i.e., the report authors] took a production Python system and
rewrote bits of it in Haskell, some of which is now in production use.
We conclude the following:

- Python lets you just do whatever the hell you want, but Haskell
demands that you actually have a *plan* before you start churning out
code and running it. The result is generally cleaner and more consistent
when you get there.

- Haskell's much-criticised immutable data is actually an *advantage*
for backtracking search problems.

- Haskell wins for thread-safety.

- ADTs are nicer than exceptions.

- The parts of Haskell stolen by Python aren't as nice in Python as they
are in Haskell. [Well, duh.]

- We like what GHC provides for profiling.

- We are disappointed by what GHC provides for debugging.

- "String" is too slow. None of the alternatives seem to be widely
supported. If your library consumes Strings and returns Strings, the
fact that ByteString exists doesn't help you.

- Recent changes to GHC broke our code. (Specifically, extensible
exceptions.) We were quite surprised that such a "stable" and "mature"
system as GHC would do this to us.

- Haskell has quite a high barrier to entry. [Again, duh.]

The paper also contains an interesting section that basically says "we
tried porting the Python implementation of XYZ into Haskell, but there
wasn't really any advantage because it's all I/O". In my humble opinion,
"it's all I/O" is a common beginner's mistake. Reading between the
lines, it sounds like they wrote the whole thing in the IO monad, and
then decided it looked just like the existing Python code so there
wasn't much point in continuing. I guess it's one of the bad habits that
imperative programmers get into. With a little more experience, you
eventually figure out that you can limit the stuff actually in the IO
monad to a surprisingly small section, and do almost everything else in
pure code, no matter how much the problem looks like it's "all I/O". But
anyway, I'm only guessing from what I can actually see with my own eyes
in the report itself.
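
Just to illustrate the shape I mean, here's a made-up toy (nothing to do
with their actual code): all the logic lives in ordinary pure functions,
and the IO monad only shows up as a thin shell around the edges.

import Data.List (sortBy)
import Data.Ord (comparing)

-- A pure "core": plain data plus plain functions, testable in GHCi.
data Node = Node { nodeName :: String, nodeFree :: Int } deriving Show

-- Hypothetical placement policy: pick the node with the most free space.
choosePlacement :: [Node] -> Maybe Node
choosePlacement [] = Nothing
choosePlacement ns = Just (head (sortBy (flip (comparing nodeFree)) ns))

-- The IO shell: read the input, call the pure core, print the answer.
main :: IO ()
main = do
    contents <- getContents
    let nodes = [ Node n (read f) | [n, f] <- map words (lines contents) ]
    print (choosePlacement nodes)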

I'm surprised about the profiler. They seem really, really impressed
with it. Which is interesting to me, since I can never seem to get
anything sensible out of it. It always seems to claim that my program is
spending 80% of its runtime executing zipWith or something equally
absurd. I'm unsurprised by their disappointment with debugging. I'm
still quite surprised that there's no tool anywhere which will trivially
print out the reduction sequence for executing an expression. You'd
think this would be laughably easy, and yet nobody has done it yet.

Their comments about String are sadly true.

_______________________________________________
Haskell-Cafe mailing list
Haskel...@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Iustin Pop

Oct 15, 2010, 5:43:39 PM
to Andrew Coppin, haskel...@haskell.org
On Fri, Oct 15, 2010 at 09:28:09PM +0100, Andrew Coppin wrote:
> http://k1024.org/~iusty/papers/icfp10-haskell-reagent.pdf
>
> I'm sure some of you have seen this already. For those who lack the
> time or inclination to read through the (six) pages of this report,
> here's the summary...

Nice summary, I hope you found the paper interesting!

> We [i.e., the report authors] took a production Python system and
> rewrote bits of it in Haskell, some of which is now in production
> use. We conclude the following:
>
> - Python lets you just do whatever the hell you want, but Haskell
> demands that you actually have a *plan* before you start churning
> out code and running it. The result is generally cleaner and more
> consistent when you get there.
>
> - Haskell's much-criticised immutable data is actually an
> *advantage* for backtracking search problems.
>
> - Haskell wins for thread-safety.
>
> - ADTs are nicer than exceptions.
>
> - The parts of Haskell stolen by Python aren't as nice in Python as
> they are in Haskell. [Well, duh.]

I'd say unfortunately, not just duh…

> - We like what GHC provides for profiling.
>
> - We are disappointed by what GHC provides for debugging.
>
> - "String" is too slow. None of the alternatives seem to be widely
> supported. If your library consumes Strings and returns Strings, the
> fact that ByteString exists doesn't help you.
>
> - Recent changes to GHC broke our code. (Specifically, extensible
> exceptions.) We were quite surprised that such a "stable" and
> "mature" system as GHC would do this to us.
>
> - Haskell has quite a high barrier to entry. [Again, duh.]
>
> The paper also contains an interesting section that basically says
> "we tried porting the Python implementing of XYZ into Haskell, but
> there wasn't really any advantage because it's all I/O". In my
> humble opinion, "it's all I/O" is a common beginner's mistake.
> Reading between the lines, it sounds like they wrote the whole thing
> in the IO monad, and then decided it looked just like the existing
> Python code so there wasn't much point in continuing.

Not quite (not all of it was in the I/O monad). It doesn't make sense to
rewrite 40K lines from language A into language B just for fun. But the
advantages were not as strong as they were for the balancing algorithms,
and not enough to justify a potential conversion. They were strong, just
not strong enough.

> I guess it's
> one of the bad habits that imperative programmers get into. With a
> little more experience, you eventually figure out that you can limit
> the stuff actually in the IO monad to a surprisingly small section,
> and do almost everything else in pure code, no matter how much the
> problem looks like it's "all I/O". But anyway, I'm only guessing
> from what I can actually see with my own eyes in the report itself.

That's not how I would describe it (if I had to write it in a single
paragraph).

Basically, if you take a random, numerical/algorithmic problem, and you
write it in FP/Haskell, it's easy to show to most non-FP programmers why
Haskell wins on many accounts. But if you take a heavy I/O problem
(networking code, etc.), while Haskell is as good as Python, it is less
easy to show the strengths of the language. Yes, all the nice bits are
still there, but when you marshal data between the network and your
internal structures, the type system is less useful than when you just
write algorithms that process the internal data. The same goes for the
other nice parts.

Now, if I were to start from scratch… :)

> I'm surprised about the profiler. They seem really, really impressed
> with it. Which is interesting to me, since I can never seen to get
> anything sensible out of it. It always seems to claim that my
> program is spending 80% of its runtime executing zipWith or
> something equally absurd.

I'm surprised that you're surprised :) The profiler is indeed awesome,
and in general I can manage to get an order of magnitude speedup on my
initial algorithms, if not more.

Even if it just tells me that zipWith is the slow part, that's enough.
I'd even say it's a very good hint where to start.

> I'm unsurprised by their
> disappointment with debugging. I'm still quite surprised that
> there's no tool anywhere which will trivially print out the
> reduction sequence for executing an expression. You'd think this
> would be laughably easy, and yet nobody has done it yet.

Indeed.

Thanks again for the summary :)

regards,
iustin

Andrew Coppin

Oct 15, 2010, 6:08:14 PM
to haskel...@haskell.org
On 15/10/2010 10:43 PM, Iustin Pop wrote:
> On Fri, Oct 15, 2010 at 09:28:09PM +0100, Andrew Coppin wrote:
>> http://k1024.org/~iusty/papers/icfp10-haskell-reagent.pdf
>>
>> I'm sure some of you have seen this already. For those who lack the
>> time or inclination to read through the (six) pages of this report,
>> here's the summary...
> Nice summary, I hope you found the paper interesting!

I often find it interesting seeing newcomers' opinions of Haskell.
Usually that's in the form of a blog that's just a braindump of what a
person has learned in half an hour of tinkering, following a tutorial.
(And usually it either says "oh wow, this is amazing! I never knew
writing programs could be so much FUN!" or it says "oh my God, I can't
believe Haskell SUCKS so much! It doesn't even have an IDE yet. How
primitive is that? And what the heck is a monomorphism anyway?!") It's a
first for me to see a coherently written, measured account from people
who actually do software "for real", as it were.

>> - The parts of Haskell stolen by Python aren't as nice in Python as
>> they are in Haskell. [Well, duh.]
> I'd say unfortunately, not just duh…

Well, yeah, I can see that angle. I'm just a hopeless Haskell fanatic,
sorry. ;-)

>> The paper also contains an interesting section that basically says
>> "we tried porting the Python implementing of XYZ into Haskell, but
>> there wasn't really any advantage because it's all I/O". In my
>> humble opinion, "it's all I/O" is a common beginner's mistake.
>> Reading between the lines, it sounds like they wrote the whole thing
>> in the IO monad, and then decided it looked just like the existing
>> Python code so there wasn't much point in continuing.
> Not quite (not all was in the I/O monad). It doesn't make sense to
> rewrite 40K of lines from language A into language B just for fun.

...you're talking to somebody who just spent an entire day implementing
an 800-line JavaScript monstrosity that uses the DOM to programmatically
generate SVG (hell, I didn't even know that was physically possible!) in
order to do a real-time demonstration of Huffman coding. Just to see if
I could do it. Just for the hell of it.

Really, the notion of actually getting *paid* to write software is quite
alien to me. I'd imagine you prioritise things quite differently.

> But
> the advantages were not as strong as for the balancing algorithms to
> justify any potential conversion. They were strong, just not strong
> enough.

> Basically, if you take a random, numerical/algorithmic problem, and you
> write it in FP/Haskell, it's easy to show to most non-FP programmers why
> Haskell wins on many accounts. But if you take a heavy I/O problem
> (networking code, etc.), while Haskell is as good as Python, it is less
> easy to show the strengths of the language. Yes, all the nice bits are
> still there, but when you marshal data between the network and your
> internal structures, the type system is less useful than when you just
> write algorithms that process the internal data. The same goes for the
> other nice parts.

I forget whether it was Galois or Well-Typed who claimed that "every
program we write is either a compiler or an interpreter". It depends on
exactly how much low-level bit-fiddling your program domain actually
requires of course, but most problems aren't nearly as I/O-centric as
they look. (Then again, you're the one who's seen the code, not me.)

> Now, if I were to start from scratch… :)

Hasn't every programmer said *that* before? ;-)

>> I'm surprised about the profiler. They seem really, really impressed
>> with it. Which is interesting to me, since I can never seem to get
>> anything sensible out of it. It always seems to claim that my
>> program is spending 80% of its runtime executing zipWith or
>> something equally absurd.
> I'm surprised that you're surprised :) The profiler is indeed awesome,
> and in general I can manage to get an order of magnitude speedup on my
> initial algorithms, if not more.
>
> Even if it just tells me that zipWith is the slow part, that's enough.
> I'd even say it's a very good hint where to start.

zipWith is a generic library function which always takes exactly the
same amount of time. Unless you're using it so extensively that it's
allocating huge amounts of memory or something, it would seem infinitely
more likely that whatever function zipWith is *applying* should be the
actual culprit, not zipWith itself.

Of course, I'm talking about profiling in time. GHC also enables you to
profile in space. I'm not actually sure to which one you're referring. I
haven't had much success with either. It's just too hard to figure out
what the sea of numbers actually represents. (Since it's quite new, I'm
assuming it's not the new ThreadScope functionality - which I haven't
tried yet, but looks extremely cool...)
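
(For the record, the recipe I use is roughly the following - quoting from
memory, so check the GHC manual for the exact flag spellings on your
version:)

    ghc -O2 -prof -auto-all --make Main.hs
    ./Main +RTS -p -RTS     # time profile, written to Main.prof
    ./Main +RTS -hc -RTS    # heap profile by cost centre, written to Main.hp
    hp2ps -c Main.hp        # render the heap profile as a graph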

> Thanks again for the summary :)
>
> regards,
> iustin

No problem. There's no charge for this service. ;-)

Iustin Pop

Oct 15, 2010, 6:18:28 PM
to Andrew Coppin, haskel...@haskell.org
On Fri, Oct 15, 2010 at 11:08:14PM +0100, Andrew Coppin wrote:
> On 15/10/2010 10:43 PM, Iustin Pop wrote:
> >On Fri, Oct 15, 2010 at 09:28:09PM +0100, Andrew Coppin wrote:
> >>I'm surprised about the profiler. They seem really, really impressed
> >>with it. Which is interesting to me, since I can never seem to get
> >>anything sensible out of it. It always seems to claim that my
> >>program is spending 80% of its runtime executing zipWith or
> >>something equally absurd.
> >I'm surprised that you're surprised :) The profiler is indeed awesome,
> >and in general I can manage to get an order of magnitude speedup on my
> >initial algorithms, if not more.
> >
> >Even if it just tells me that zipWith is the slow part, that's enough.
> >I'd even say it's a very good hint where to start.
>
> zipWith is a generic library function which always takes exactly the
> same amount of time. Unless you're using it so extensively that it's
> allocating huge amounts of memory or something, it would seem
> infinitely more likely that whatever function zipWith is *applying*
> should be the actual culprit, not zipWith itself.

I know about zipWith. And if the profile tells me I spend too much time
in zipWith, it means a few things:

- zipWith might have to force evaluation of the results, hence the
incorrect attribution of costs
- if even after that zipWith is the culprit, it might be the way the
lists are consumed (are they lazy-built?), and that might mean you
just have to work around that via a different algorithm

Again, the fact that it tells you time is being spent in a library
function is not bad, not at all.
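
To make that concrete, here's a contrived toy (a sketch, not code from
our project). All the real work happens in 'cost', but the list elements
are only demanded while zipWith and sum consume them, so depending on
which cost centres are in scope the profile may well pin most of the time
on zipWith rather than on the code that built the thunks:

import Data.List (foldl')

-- An artificially expensive per-element function.
cost :: Int -> Int
cost n = foldl' (+) 0 [1 .. n]

main :: IO ()
main = do
    -- Both lists are built lazily; no element is evaluated until
    -- zipWith (and the final sum) actually demands it.
    let xs = map cost [1 .. 20000]
        ys = map cost [20000, 19999 .. 1]
    print (sum (zipWith (+) xs ys))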

> Of course, I'm talking about profiling in time. GHC also enables you
> to profile in space as well. I'm not actually sure to which one
> you're referring.

In general, time profiling. Although the space profiling is useful too,
it gives you hints on what the (lazy) program does, as opposed to what
you think it does. The retainer graphs are cool, e.g. you might see that
some code hangs on to data more than you thought, and you can save some
heap and GC time due to that.

> I haven't had much success with either. It's just
> too hard to figure out what the sea of numbers actually represent.
> (Since it's quite new, I'm assuming it's not the new ThreadScope
> functionality - which I haven't tried yet, but looks extremely
> cool...)

I haven't used ThreadScope yet, but it's on my todo list.

iustin

Andrew Coppin

Oct 15, 2010, 6:29:50 PM
to haskel...@haskell.org
On 15/10/2010 11:18 PM, Iustin Pop wrote:
> I know about zipWith. And if the profile tells me I spend too much time
> in zipWith, it means a few things:
>
> - zipWith might have to force evaluation of the results, hence the
> incorrect attribution of costs
> - if even after that zipWith is the culprit, it might be the way the
> lists are consumed (are they lazy-built?), and that might mean you
> just have to work around that via a different algorithm
>
> Again, the fact that it tells you time is being spent in a library
> function is not bad, not at all.

I remember writing a long, complex program that was really slow. So I
looked at the profile, and discovered that it was spending 98% of the
runtime in... Prelude.floor?! o_O I thought maybe the costs were just
being mis-attributed. But on replacing Prelude.floor with some dark,
mysterious GHC primitives, my program's runtime went from minutes to
milliseconds.
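
(I forget the exact incantation, so treat this as a sketch rather than
the real code, but it was something along these lines:)

import GHC.Float (double2Int)

-- Prelude.floor goes through properFraction and can be surprisingly
-- slow when it isn't specialised; double2Int truncates towards zero
-- directly. (The two differ for negative non-integral inputs, so this
-- is only a drop-in replacement when the numbers are non-negative.)
fastFloor :: Double -> Int
fastFloor = double2Int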

So, yeah, profiling can sometimes help you to discover where the time is
_really_ being spent, not where you _think_ it's being spent. (I would
have expected the long complex numerical algorithm to be eating cycles,
not a trivial conversion from floating-point to integers.) But
personally, I usually find it almost intractably difficult to figure out
precisely what's going on by looking at a time profile. (Maybe that says
more about me than about GHC's time profiler...)

>> Of course, I'm talking about profiling in time. GHC also enables you
>> to profile in space as well. I'm not actually sure to which one
>> you're referring.
> In general, time profiling. Although the space profiling is useful too,
> it gives you hints on what the (lazy) program does, as opposed to what
> you think it does. The retainer graphs are cool, e.g. you might see that
> some code hangs on to data more than you thought, and you can save some
> heap and GC time due to that.

Again, I have great difficulty figuring out exactly what the numbers
mean. But it's definitely nice that GHC can spit out all these
statistics for you just by recompiling your program. (Shame it slows
down to a crawl, but I guess that's rather unavoidable...) Knowing how
much GC time you're using is especially invaluable.

> I haven't used ThreadScope yet, but it's on my todo list.

Yeah, that does look like an extremely cool piece of equipment. I hope
they start adding things like space profiling and so forth to the same
infrastructure. (As I understand it, they're intending to do so, it's
just a question of developer time...)

Brandon S Allbery KF8NH

Oct 15, 2010, 9:24:40 PM
to haskel...@haskell.org

On 10/15/10 16:28 , Andrew Coppin wrote:
> I'm surprised about the profiler. They seem really, really impressed with
> it. Which is interesting to me, since I can never seem to get anything
> sensible out of it. It always seems to claim that my program is spending 80%
> of its runtime executing zipWith or something equally absurd. I'm

That just means you haven't internalized managing laziness yet, so you're
seeing thunks get processed by zipWith instead of where they ought to be.
(Not that I'll claim to be any better; I just know why it happens.)

> surprised that there's no tool anywhere which will trivially print out the
> reduction sequence for executing an expression. You'd think this would be
> laughably easy, and yet nobody has done it yet.

Hat hasn't been maintained for years, sigh. A number of times I could have
used it... and I'm not confident enough of my ability to grok the code.

> Their comments about String are sadly true.

HP's still struggling with that one (I think some people need to realize
that Text and ByteString have different use cases and correspondingly
different data models, and trying to force both into the list API will only
cause grief, but I digress). I have hope that this situation will improve
in the future.

(Also, I process enough short strings that I'm uncertain of the wisdom of
just using Text or ByteString for everything; this is admittedly a function
of lack of experience.)
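
To make the original String complaint concrete, a small sketch (the
String-only library function is made up): if a dependency only speaks
String, keeping your own data in Text just moves the cost to a
pack/unpack tax at every call.

import qualified Data.Text as T

-- Pretend this is a third-party library that only speaks String.
libraryClean :: String -> String
libraryClean = filter (/= '_')

-- Our own code keeps everything in Text...
cleanText :: T.Text -> T.Text
cleanText = T.pack . libraryClean . T.unpack
-- ...but every call still round-trips through a linked list of Char,
-- so the cost of String never actually goes away.

main :: IO ()
main = putStrLn (T.unpack (cleanText (T.pack "hello_world")))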

--
brandon s. allbery [linux,solaris,freebsd,perl] all...@kf8nh.com
system administrator [openafs,heimdal,too many hats] all...@ece.cmu.edu
electrical and computer engineering, carnegie mellon university KF8NH

Ben Millwood

Oct 16, 2010, 7:04:23 PM
to Andrew Coppin, haskel...@haskell.org
On Fri, Oct 15, 2010 at 9:28 PM, Andrew Coppin
<andrew...@btinternet.com> wrote:
> I'm still quite
> surprised that there's no tool anywhere which will trivially print out the
> reduction sequence for executing an expression. You'd think this would be
> laughably easy, and yet nobody has done it yet.
>

I tried to do something a bit like this:

http://github.com/benmachine/stepeval

but it could be charitably described as "crude": has three failing
testcases and a bagful of unimplemented functionality.

Dan Doel

Oct 16, 2010, 8:25:46 PM
to haskel...@haskell.org
On Saturday 16 October 2010 7:04:23 pm Ben Millwood wrote:
> On Fri, Oct 15, 2010 at 9:28 PM, Andrew Coppin
>
> <andrew...@btinternet.com> wrote:
> > I'm still quite
> > surprised that there's no tool anywhere which will trivially print out
> > the reduction sequence for executing an expression. You'd think this
> > would be laughably easy, and yet nobody has done it yet.
>
> I tried to do something a bit like this:
>
> http://github.com/benmachine/stepeval
>
> but it could be charitably described as "crude": has three failing
> testcases and a bagful of unimplemented functionality.

I believe the Buddha debugger could do something like this, as well, although
normally you wouldn't dump the entire sequence. But it has bit rotted,
unfortunately (it's quite tied to GHC internals, as far as I can tell).

I never used it, but I've had at least one person tell me it was the best
debugger they'd ever used. You type in an expression, and continually step
into different parts of the reduction sequence until you find some core source
of whatever error you're looking for.

If someone were to revive it, I'm sure many people would be appreciative.

-- Dan

wren ng thornton

Oct 16, 2010, 8:33:27 PM
to haskel...@haskell.org
On 10/16/10 8:25 PM, Dan Doel wrote:
> On Saturday 16 October 2010 7:04:23 pm Ben Millwood wrote:
>> On Fri, Oct 15, 2010 at 9:28 PM, Andrew Coppin wrote:
>>> I'm still quite
>>> surprised that there's no tool anywhere which will trivially print out
>>> the reduction sequence for executing an expression. You'd think this
>>> would be laughably easy, and yet nobody has done it yet.
>>
>> I tried to do something a bit like this:
>>
>> http://github.com/benmachine/stepeval
>>
>> but it could be charitably described as "crude": has three failing
>> testcases and a bagful of unimplemented functionality.
>
> I believe the Buddha debugger could do something like this, [...]

> If someone were to revive it, I'm sure many people would be appreciative.

I've been wanting something like this recently. Or more particularly,
I've been wanting a partial evaluator somewhat like Coq's simpl tactic
(and lazy, cbv, red, hnf,...), which can look up the definitions of
functions to inline them and continue evaluation, and which can work
around variables and thus under lambdas. Though it would be nice to be
able to dump the whole sequence instead of just the result.

/me wonders how much time it'd suck away to actually implement it...

--
Live well,
~wren

Bernie Pope

Oct 16, 2010, 9:32:51 PM
to Dan Doel, haskel...@haskell.org
On 17 October 2010 11:25, Dan Doel <dan....@gmail.com> wrote:
> On Saturday 16 October 2010 7:04:23 pm Ben Millwood wrote:
>> On Fri, Oct 15, 2010 at 9:28 PM, Andrew Coppin
>>
>> <andrew...@btinternet.com> wrote:
>> > I'm still quite
>> > surprised that there's no tool anywhere which will trivially print out
>> > the reduction sequence for executing an expression. You'd think this
>> > would be laughably easy, and yet nobody has done it yet.
>>
>> I tried to do something a bit like this:
>>
>> http://github.com/benmachine/stepeval
>>
>> but it could be charitably described as "crude": has three failing
>> testcases and a bagful of unimplemented functionality.
>
> I believe the Buddha debugger could do something like this, as well, although
> normally you wouldn't dump the entire sequence.

Buddha is/was a declarative debugger. The basic idea is to build a
computation tree, and then search the tree for buggy nodes. Normally
the debugger would decide how to traverse the tree, and the user would
simply make judgements about the correctness of reductions stored in
the visited nodes. However, buddha also allowed the user to explore
the tree manually. None of this was particularly unique to buddha, and
most other declarative debuggers allow you to do this too.
Unfortunately most declarative debuggers don't make it far past the
proof of concept stage.

The HAT tracer also supports/supported declarative debugging and has
many useful trace exploration tools.

> But it has bit rotted,
> unfortunately (it's quite tied to GHC internals, as far as I can tell).

As the author of buddha I can confirm that it hasn't been maintained.
The main dependency on GHC is a small amount of code for printing data
structures. In fact some of that code would be easier to do now than
it was then, because GHC includes data constructor names by default in
compiled code (this was added to support the ghci breakpoint
debugger).

> I never used it, but I've had at least one person tell me it was the best
> debugger they'd ever used. You type in an expression, and continually step
> into different parts of the reduction sequence until you find some core source
> of whatever error you're looking for.

I'm happy to hear someone liked it so much. Declarative debugging is a
very nice idea (invented for Prolog by Ehud Shapiro - he is now famous
for DNA computing), but it is hard to make practical. Probably the
best declarative debugger I know of is the one provided for the
Mercury programming language. However, Mercury is a strict language,
which simplifies some aspects of the design.

The problem is that it is hard to scale to long running computations,
because the computation tree can become huge. HAT tackles this problem
by saving a trace record to file, although this can have rather high
runtime overheads. The declarative debugger for Mercury language
tackles this problem by piecemeal construction of the computation
tree, and by regenerating parts of it on demand by re-execution of
code. Re-execution in a lazy language is quite challenging (I tried to
do it in buddha).

I have some ideas about interspersing declarative debugging with
computation, but never had the time to implement it (though I think it
would make a great research project).

> If someone were to revive it, I'm sure many people would be appreciative.

I think it might be better to put more effort into HAT.

Cheers,
Bernie.

Evan Laforge

Oct 18, 2010, 9:37:35 PM
to Andrew Coppin, haskel...@haskell.org
>> Of course, I'm talking about profiling in time. GHC also enables you
>> to profile in space as well. I'm not actually sure to which one
>> you're referring.
>
> In general, time profiling. Although the space profiling is useful too,
> it gives you hints on what the (lazy) program does, as opposed to what
> you think it does. The retainer graphs are cool, e.g. you might see that
> some code hangs on to data more than you thought, and you can save some
> heap and GC time due to that.

In my experience, space profiling is more relevant, because the big
time killer is the GC, and the reason for that is space leaks. That's
also the thing that is essential to fix for interactive apps because
it causes GC hangs. But while I think the heap profiler is pretty
good, it's still not a trivial matter by any means.

Unfortunately, tracking down space leaks is still very much a trial
and error process for me. Sometimes it leads me right to a suspicious
thunk and a quick $! or bang pattern fixes it. But often no matter
how much I experiment with strictifying things, the lag, drag, and void
are unaffected. Sometimes it even causes more drag.
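
The easy cases look like the textbook one below (a sketch, not code from
my app): a lazily accumulated counter builds a chain of thunks that shows
up as a ramp in the heap profile, and a couple of bang patterns flatten
it.

{-# LANGUAGE BangPatterns #-}

-- Leaky: neither accumulator is forced inside the loop, so the whole
-- list is traversed while building a chain of (+) thunks that only
-- collapses at the very end.
mean :: [Double] -> Double
mean = go 0 0
  where
    go acc n []     = acc / fromIntegral (n :: Int)
    go acc n (x:xs) = go (acc + x) (n + 1) xs

-- Fixed: the bang patterns force both accumulators on every step,
-- so the loop runs in constant space.
mean' :: [Double] -> Double
mean' = go 0 0
  where
    go !acc !n []     = acc / fromIntegral (n :: Int)
    go !acc !n (x:xs) = go (acc + x) (n + 1) xs

main :: IO ()
main = do
    print (mean  [1 .. 1e5])   -- heap profile shows the ramp
    print (mean' [1 .. 1e5])   -- flat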

There's also a ghc bug that makes it hang when filtering the heap
profile by biography, so usually that feature is unavailable.

I've had variable results with time profiling too. As far as I can
see, you can't sort the output by allocation rather than time, and
sometimes it seems like the times don't add up (i.e. 'modify' is n%
time, but all instances in the list are 0.0%)

For instance, currently I have the top consumer of both time and alloc
as 'get', which is 'lift . Monad.State.Strict.get'. Of course it
occurs in a million places in the complete profile, along with
mysteries like a line with 0 entries 0.7%/0.1 time/alloc. Doesn't 0
entries mean it was never called? Meanwhile, a line with 37000
entries has 0.2/0.2. Is the difference how the 'get'ed value was
used? And then there's the wider question of how 'get' is taking so
much time and space. Doesn't it just return a pointer to the State
value? Biographical profiling shows large amounts of void, lag, and
drag, but no clear way to trace that to the code that is responsible.

And what does it mean when the top consumer is 'lift'? What's it
*doing* in there?

Maybe time and space profiling is just hard, no matter the tools, or
maybe it's just hard for me. But it's been the most opaque and
slippery part of haskell for me by far. Perhaps more people don't
have a problem with it only because there are not that many
interactive long-running haskell apps out there.

Also, I don't know what the time axis on the heap graphs is, but it
appears to be CPU time, because I can't correlate it to the time of
anything else, and if I try forcing things at certain times to see the
effect, there is no visible effect. The space axis also doesn't
correspond with the max residency on the -s output, and for that
matter, the .prof total alloc doesn't correspond with the 'bytes
allocated on the heap' from -s. The 'n bytes x seconds' in the heap
graph doesn't seem to correspond with the alloc rate, or anything else
I can see. But I guess I don't really need to know what these numbers
mean, I just need to know I want them to get smaller :)

My current mystery is a 'sequence_ $ replicate $
run_something_expensive' that shows large amounts of void and drag the
first 5 times... but the 6th time it's all been converted to either
USE or LAG. The only thing "using" the results is 'DeepSeq.deepseq'.
The actual use is split between a very large number of SCCs and types.

Anyway, I'm not sure where I was going with all this, except that it's
a source of continuing confusion for me and if others have different
or similar experiences I'd be interested to hear about them. While I
would love better debugging of course, I can always chuck in more
'trace's and write more detailed tests and eventually the bug has
nowhere to hide. No such luck with space leaks. If I were a
haskell-using company and could fund tool improvement, it would be in
the area of tools for detecting and tracking down space usage or even
static analysis for space leaks.

Brandon S Allbery KF8NH

Oct 19, 2010, 12:53:35 PM
to haskel...@haskell.org

On 10/18/10 21:37 , Evan Laforge wrote:
> For instance, currently I have the top consumer of both time and alloc
> as 'get', which is 'lift . Monad.State.Strict.get'. Of course it
> occurs in a million places in the complete profile, along with
> mysteries like a line with 0 entries 0.7%/0.1 time/alloc. Doesn't 0
> entries mean it was never called? Meanwhile, a line with 37000
> entries has 0.2/0.2. Is the difference how the 'get'ed value was
> used? And then there's the wider question of how 'get' is taking so
> much time and space. Doesn't it just return a pointer to the State
> value? Biographical profiling shows large amounts of void, lag, and
> drag, but no clear way to trace that to the code that is responsible.

Any time you see something "inexplicable" like lots of time being attributed
to something simple like "get", it means that something isn't strict enough
and "get" is having to force a bunch of lazy evaluations to do its job.
Since you're using State.Strict but lift-ing to get there, I'd first look at
the strictness of the monad you're lift-ing from. (I'm assuming
State.Strict does what the label says, but it's possible that it's not
strict in the way you need; strictness is kinda tricky.)

Moral of the story: time is accounted to the function that forces
evaluation of lazy thunks, not to the thunks themselves or the function that
created the lazy thunks. (I think the latter is impossible without passing
around a lot of expensive baggage, and in any case doesn't tell you anything
useful; unexpected functions taking a lot of time, on the other hand, tells
you right away that there's excessive laziness in the invocation somewhere
and gives you a starting point to track it down.)
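
A toy version of that moral, if it helps (a sketch, nothing to do with
Evan's actual code): even in State.Strict, plain modify leaves the
counter as a growing chain of (+ 1) thunks, and whichever piece of code
finally demands the value gets billed for collapsing the whole chain;
forcing the new state with $! on every step keeps the cost where it
belongs.

import Control.Monad.State.Strict

-- Lazy in the state value: State.Strict forces the (result, state)
-- pair at each step, but 'modify (+ 1)' still leaves the counter as
-- an unevaluated chain of (+ 1) applications.
countLazy :: Int -> Int
countLazy n = execState (mapM_ (\_ -> modify (+ 1)) [1 .. n]) 0

-- Forced on every step: 'put $!' keeps the counter a plain Int, so no
-- innocent-looking function gets billed for collapsing a huge chain.
countStrict :: Int -> Int
countStrict n = execState (mapM_ (\_ -> get >>= \s -> put $! s + 1) [1 .. n]) 0

main :: IO ()
main = do
    print (countStrict 100000)
    print (countLazy 100000)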

--
brandon s. allbery [linux,solaris,freebsd,perl] all...@kf8nh.com
system administrator [openafs,heimdal,too many hats] all...@ece.cmu.edu
electrical and computer engineering, carnegie mellon university KF8NH
