Re: [HF Board] Why Hasura is moving from Haskell to Rust


Andrew Lelechenko

Jan 6, 2024, 5:13:16 AM
to Haskell Foundation Board
FWIW I think there is more specific advice for industrial codebases than just “use boring haskell”. E.g., GADTs / DataKinds are less of an issue: they might be difficult to write, but “if it typechecks, it typechecks”, so they incur only limited maintenance costs, little compile-time penalty and no runtime penalty.
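
To illustrate (a toy sketch of my own, not code from any of the codebases discussed here): a GADT is checked entirely at compile time, and the evaluator needs no runtime tags or defensive cases.

    {-# LANGUAGE GADTs #-}

    -- A toy expression type: the type index rules out ill-formed
    -- expressions such as adding a boolean to an integer.
    data Expr a where
      IntLit  :: Int  -> Expr Int
      BoolLit :: Bool -> Expr Bool
      Add     :: Expr Int -> Expr Int -> Expr Int
      If      :: Expr Bool -> Expr a -> Expr a -> Expr a

    -- Once this typechecks, it typechecks: no runtime checks, no
    -- extra allocation, and limited maintenance surprises later.
    eval :: Expr a -> a
    eval (IntLit n)  = n
    eval (BoolLit b) = b
    eval (Add x y)   = eval x + eval y
    eval (If c t e)  = if eval c then eval t else eval e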

But TH or type families are a recipe for disaster, because they do not scale.

The pattern I saw a couple of times is: we start with a bad design / architecture, which requires lots of boilerplate. Instead of rethinking it, we say “haskell has great metaprogramming capabilities” and splash TH / SYB / Generics everywhere. Our data types are all ad-hoc and need a proper refactoring, but we just go for TypeFamilies to make it stick together. Then we are trapped: once our application has grown beyond a certain size, compilation is insanely long, migration to a new GHC is impossible, runtime is sluggish - and there is no way out, because now we have to redesign everything from scratch.
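
To make the kind of coupling I mean concrete (a hypothetical sketch, not anyone's real code): once every backend is glued together through associated type families like this, changing one representation ripples through every instance and every use site at once.

    {-# LANGUAGE TypeFamilies #-}

    -- A hypothetical "glue" class: each backend supplies its own
    -- associated types, and the rest of the application is written
    -- against them.  Refactoring any backend's representation now
    -- means touching every instance and every caller simultaneously.
    class Backend b where
      type Config b
      type Handle b
      connect  :: Config b -> IO (Handle b)
      runQuery :: Handle b -> String -> IO String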

Best regards,
Andrew

On 22 Dec 2023, at 17:04, Chris Dornan <ch...@chrisdornan.com> wrote:

Tom and Ryan,

As I see it, Haskellers do have a tendency toward excess cleverness that can be lethal if unchecked. It is incredibly important when building code bases _at scale_ to be extremely conservative and only structure things along lines that have been proven _at scale_, as it is just too likely that nasty properties will emerge otherwise. I think this is a universal principle, but Haskellers are probably the most likely to push the boat out here, where the desire to 'do things right' can easily subvert more conservative principles.

The second system effect looks like it _might_ have been a factor in this development (we would need more details to be sure).

An extremely sobering and valuable case study.

I am curious to hear what Andres thinks. 

Chris

 

On 22 Dec 2023, at 16:39, 'Ryan Trinkle' via Haskell Foundation Board <bo...@haskell.foundation> wrote:

Hi Tom,

Thanks for the summary; this is indeed very interesting.

This matches my experience.  When developing Reflex, getting the performance good enough was one of the main difficulties.  The only things I could really get much use out of were 1) custom instrumentation of my code, and 2) Linux perf.  Haskell's own profiling tools were not very useful, because everything in Reflex needs to get inlined in order to have reasonable performance, and this screws up the attribution of costs completely.  I also found that I needed to convert a lot of code to be very close to IO to achieve reasonable performance, rather than writing it the way I would have liked.  (The heap profiler, on the other hand, was very useful; I don't really have any complaints there.)
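
For concreteness, by "custom instrumentation" I mean nothing fancier than hand-rolled timing wrappers along these lines (a hypothetical sketch, not Reflex's actual code):

    import GHC.Clock   (getMonotonicTimeNSec)
    import Text.Printf (printf)

    -- Wrap an IO action and report how long it took.  Crude, but unlike
    -- cost-centre profiling it doesn't change what GHC optimises, so the
    -- numbers still mean something after heavy inlining.
    timed :: String -> IO a -> IO a
    timed label act = do
      t0 <- getMonotonicTimeNSec
      r  <- act
      t1 <- getMonotonicTimeNSec
      printf "%s: %.3f ms\n" label (fromIntegral (t1 - t0) / 1e6 :: Double)
      pure r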

One thing I found is that refactoring performance-sensitive code is terrifying because optimizations are not very stable under common refactors.  For instance, combining two functions or splitting a function in two will often impact inlining dramatically.  I assume this is also the case for languages like Rust, but they are so much less reliant on inlining/optimization that it is less of an issue.  (Or maybe it is an issue and I'm just not experienced enough in those languages.  I'm sure the high-frequency trading guys don't like to refactor their perf-sensitive bits.)

The issue with developers being given the tools to hang themselves with is a huge deal.  Probably half of the "engineering culture" at Obsidian is dedicated to reversing the tendency of Haskell developers to build things that are "too fancy".  I'm basically happy with what our engineers put out in that regard, but I have seen a lot of other companies where the leadership is unable or unwilling to oppose the "too fancy" stuff sufficiently, and it basically tanks the codebase.  There was the whole "simple haskell" movement a few years ago, which I think is basically in the right direction, but also I think the enforcement has to be pretty local and specific to actually work.  Of course, strong leadership is also needed in other languages; this is just the flavor that our need for strong leadership takes.


Best,

Ryan

On 12/22/23 12:13, Tom Ellis wrote:
Dear board,

I have just interviewed an industrial Haskell user and in his
experience:

* "Well-written" Haskell often performs poorly

* GHC consumes too much time and memory

* The tooling is flaky

* These problems form a vicious cycle and compound each other

My interviewee was developer Lyndon Maydwell from database company
Hasura.  You may have heard that Hasura, which developed the first two
versions of its software in Haskell, is changing to Rust for their
version 3.  Lyndon kindly agreed to talk to me about the reasons for
the switch.  This message is a summary of the parts of the
conversation I found pertinent to the HF.

I'm paraphrasing a lot, and all errors or omissions are mine not
Lyndon's.  It's probably best if you don't read this message as a
literal report into Haskell use at Hasura, but rather as my
explanation of some systemic risks faced by companies that use
Haskell, informed by Lyndon's experience at Hasura (I surely don't
have enough insight to accurately present the former).


Lyndon said that there were two major issues with developing their
version 2 in Haskell:

1. Garbage collection issues and run-time memory requirements

2. Build time and memory requirements

As I understand it, the major contributing factor to point 1 is the
architecture of Hasura's codebase.  Lyndon suggested that a Haskell
architecture designed with the lessons learned from their version 2 in
mind would not suffer from the problems to the same degree.

However, the poor architecture of the codebase was a direct
consequence of trying to implement Haskell "best practices".  Lyndon
says that the version 2 architecture adheres to the principle of
"Parse, don't validate" (although maybe "Make invalid states
unrepresentable" would be a more accurate slogan).  That is, their
design takes advantage of Haskell's type system to enforce invariants.
But some ways of making invalid states unrepresentable led to a
codebase that performed poorly and was difficult to evolve.  In
particular, those ways included abstraction that involved
closures/wrapping things in lambdas (as opposed to pure data
representations) and type-level programming.
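
To make the contrast concrete, here is a hypothetical sketch of my own
(not Hasura's code) of the two styles:

    -- Placeholder types so the sketch stands alone.
    type TableName = String
    data Predicate = FieldEquals String String
    data Request   = Request
    data Response  = Response

    -- Style 1: enforce invariants by building a closure at parse time.
    -- Nothing invalid can ever be constructed, but the result cannot be
    -- inspected, serialised, cached or compacted either.
    newtype PlanAsClosure = PlanAsClosure (Request -> IO Response)

    -- Style 2: a plain data representation of the same plan, run by a
    -- separate interpreter.  The invariants live in the constructors,
    -- and the value can be walked, serialised or put in a compact region.
    data PlanAsData
      = FetchTable TableName
      | Filter Predicate PlanAsData
      | Join PlanAsData PlanAsData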

To elaborate on the garbage collection part of point 1, the major
problem was pause times in what should be low-latency systems.  Lyndon
said that it's not purely Haskell's fault, and that using compact
regions would have solved the problem.  However, by the time they
needed compact regions it was impossible to refactor to an
architecture where they could use them,
because of the rigid style of their use of closures/lambdas and
type-level programming.
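
For reference, the compact-region API itself is tiny; the hard part,
as Lyndon describes, is having data that is already free of closures
(a minimal sketch, assuming the ghc-compact package that ships with
GHC):

    import GHC.Compact (compact, getCompact)

    main :: IO ()
    main = do
      -- A stand-in for some closure-free cache built at startup.
      let schemaCache = [("users", 3), ("orders", 7)] :: [(String, Int)]
      region <- compact schemaCache   -- copy the value into one region
      let cached = getCompact region  -- the GC never traverses inside it
      print (length cached)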

A further problem along similar lines was that the architecture had
high peak memory usage at server startup, which meant a lot of wasted
memory during the rest of the server's operation.  Ideally the results
of server startup would be cached and not recomputed the next time the
server started.  However, to the end of making invalid states
unrepresentable, the data that should have been cached was full of
lambdas/closures, which couldn't be serialised.

One way of understanding what happened is to notice that there is a
vicious cycle amongst various interacting phenomena such as the
following:

* Haskell programmers (creditably) like to "make invalid states
  unrepresentable"

* Some approaches to making invalid states unrepresentable lead to
  poor system architecture.

* Some approaches to making invalid states unrepresentable lead to
  code which is prohibitively difficult to refactor

* Compile times become painfully slow because of trying to make
  invalid states unrepresentable using type-level programming, or
  because of using Generic.

* It becomes prohibitively difficult to evolve the codebase because it
  is too easy to accidentally introduce high memory usage (for example
  because RULES that once fired failed to fire and/or space leaks were
  introduced)

* Tooling is flaky, so changes needed to mitigate resource usage
  problems snag on tooling problems (Lyndon mentioned that he saw HLS
  have problems with .hs-boot files)

* Profiling tooling is insufficient or too hard to use

I get the impression that these interacting factors made Hasura's
developers get bogged down, development slowed, and they just
struggled to get Haskell to do what was needed.

Regarding item 2, Lyndon says that "the build times and memory
requirements for building Haskell programs was enormous".  I think
items 1 and 2 are actually related: I haven't analysed this carefully,
but I think it's plausible that GHC contains numerous space leaks
which lead to high memory usage, and high memory usage leads to lots
of time spent in garbage collection.  Undoubtedly there are also many
other opportunities for optimising GHC.


Lyndon mentioned some other, less significant, factors:

3. HLS was a massive improvement when it was introduced, but the
   benefits from using it seem to have plateaued (not to mention it
   can be flaky) despite there being scope for much more.

4. He wanted to develop a plugin system for Hasura using Haskell's
   ability to dynamically load libraries, but the absence of ABI
   compatibility made this worthless in practice.

5. They had a tiny number of open-source contributions, and they
   guess that's because very few people know Haskell.


Hasura chose to develop their version 3 in Rust.  As I understand
things, the impetus for version 3 was not to move away from Haskell
but to move to a new architecture that wouldn't suffer from the
problems I explained above.  Since their codebase couldn't be
refactored to the new architecture (for the reasons described above), it
would require a rewrite anyway.  If they were going to do a rewrite
anyway, they decided it may as well be in Rust, because it wouldn't
suffer from most of the problems mentioned above and would have other
benefits besides.  In particular, there would be less incentive to try
to be "clever" and do things the "right way" by enforcing invariants
by abstracting with closures/lambdas and with type-level programming.
That is, they chose to move away from Haskell not so much because of
problems with well-architected Haskell, but because it's too easy,
convenient and tempting to get stuck in poorly-architected tar pits
(and those tar pits come from using Haskell the "right way").

What didn't Lyndon mention?  Interestingly, he didn't mention any
frustrations with stability/backwards compatibility.

Hope that's interesting to folks!

Tom

