On 2013-08-09, Cody Koeninger wrote:
>
> Looking forward to the meetup, will see if some of my coworkers @ Digby are
> interested. From the topics you listed, I'm especially curious about your
> thoughts on architecture / avoiding language features.
I'm putting together a small presentation right now (nothing too fancy), and I'm
hoping to cover some of this material (not sure how much of it at the kickoff).
Actually writing this out is good preparation for upcoming meetups, so I'll
share a little here in response. Feedback will help me refine the content, so I
appreciate your responses.
The basics are talked about a lot in the larger Scala community (although
broken surprisingly often):
- Prefer val over var.
- Turn all nulls into Options, Eithers, or something similar.
- Avoid Any (refine to the appropriate type).
- Avoid reflection.
- Avoid asInstanceOf casting.
but there's also some more that might be less common among the community at
large:
- Always pattern match in a way that the compiler can check for
exhaustivity.
- Never expose a pattern-match to an algebra that isn't locked down. We
  can accept by definition that a (List a) is either (Cons a (List a)) or
  Nil, but a lot of our algebras are not so lucky. We should write
  visitor/fold functions for these algebras instead and make the algebraic
  data type abstract (by making its constructors private).
- Don't let the case classes that encode an ADT's constructors leak out of
  the API as types. They aren't really types; they are value constructors.
  Scala's encoding of ADTs with subtype polymorphism is problematic here.
  Because of the impedance mismatch between subtyping and ADTs, Scala will
  compile something like (Some(1): Some[Int]), whereas Haskell gets it
  right: the compiler rejects (Just 1 :: Just Int) because Just is a
  constructor, not a type.
- Limit subtype polymorphism. Traits can be useful internally, but look at
how badly they complicate external APIs like the standard collection
library.
- Furthermore, never use traits as mixins where linearization order
actually matters ("super" is extremely hard to reason about).
- Avoid variance (both call-site and declaration-site) to avoid error-prone
signatures like CovariantCollection.contains(a: Any), not to mention
having to back out of variance when you hit variance gridlock.
- Turn all handleable exceptions into Options, Eithers, or something
similar.
- Similarly, avoid partial functions.
- Wrap all effects in effect-tracking types (like scalaz.IO or
  scalaz.concurrent.Task) for complete referential transparency.
  Monad transformers and free monads from Scalaz help with some of this.
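To make the "locked-down algebra" point concrete, here's a minimal, hypothetical sketch (the `Status` type and its constructors are invented for illustration): the case classes stay private to the companion object, clients construct values through smart constructors that return the abstract type, and a fold is the only way to consume a value from outside, so the compiler-checked match lives in exactly one place.

```scala
// A small ADT whose constructors are locked down. `Status` is the only
// type exposed; the case classes never leak out of the API.
sealed trait Status {
  // The fold (visitor) is the single, exhaustively-checked consumer.
  def fold[A](ifRunning: Long => A, ifFailed: String => A): A = this match {
    case Status.Running(startedAt) => ifRunning(startedAt)
    case Status.Failed(reason)     => ifFailed(reason)
  }
}

object Status {
  // Private to the companion: external code cannot pattern-match on
  // these, so the algebra can evolve behind the fold.
  private final case class Running(startedAt: Long) extends Status
  private final case class Failed(reason: String) extends Status

  // Smart constructors return the abstract type Status, never the case
  // class, so (Status.running(0L): Running) is not even expressible.
  def running(startedAt: Long): Status = Running(startedAt)
  def failed(reason: String): Status = Failed(reason)
}
```

Usage looks like `Status.failed("timeout").fold(_ => "ok", r => "failed: " + r)`; adding a third constructor later makes every fold call site a compile error instead of a runtime MatchError.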
At Rackspace, we're writing a work orchestration engine, so fault-tolerance
and correctness have been much larger concerns for us than performance. I
kind of like this, though, because it plays extremely nicely into some
strengths of typed FP.
Finally, there are some features that Odersky/Typesafe have labeled "advanced"
via SIP-18, but that I feel are appropriate, if not required -- especially
given all the architectural constraints above:
- using implicits (especially for type class encoding)
- enabling higher-kinded types
- using abstract types to encode existential types
- using type lambdas to partially apply type parameters
- allowing the use of postfix notation, since most of our stuff is
  expression-based and semicolon inference isn't that tricky.
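The first three items above tend to show up together. As a hedged sketch (the `Functor` type class and `double` helper are my own illustrative names, not anything from the list), here's implicit-based type-class encoding over a higher-kinded parameter, plus a type lambda to partially apply Either's first type parameter:

```scala
// Type class encoded with an implicit parameter over a higher-kinded
// type F[_] (may need -language:higherKinds on pre-2.13 compilers).
trait Functor[F[_]] {
  def map[A, B](fa: F[A])(f: A => B): F[B]
}

object Functor {
  // Instance for Option, discovered by implicit resolution.
  implicit val optionFunctor: Functor[Option] = new Functor[Option] {
    def map[A, B](fa: Option[A])(f: A => B): Option[B] = fa.map(f)
  }

  // A type lambda partially applies Either's error parameter so the
  // remaining one-hole shape fits where an F[_] is expected.
  // (Either's map is right-biased as of Scala 2.12.)
  implicit def eitherFunctor[E]: Functor[({ type L[A] = Either[E, A] })#L] =
    new Functor[({ type L[A] = Either[E, A] })#L] {
      def map[A, B](fa: Either[E, A])(f: A => B): Either[E, B] = fa.map(f)
    }
}

// Polymorphic over any F that has a Functor instance in implicit scope.
def double[F[_]](fa: F[Int])(implicit F: Functor[F]): F[Int] =
  F.map(fa)(_ * 2)
```

So `double(Option(21))` yields `Some(42)`, and the same function works over `Either[String, Int]` once the type lambda names the shape explicitly.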
Actually, all of this discussion reminds me. Have you seen this post?
http://eed3si9n.com/scala-the-flying-sandwich-parts
I agree with a lot of it, with only a couple of exceptions, which I discussed
in the comments:
http://eed3si9n.com/scala-the-flying-sandwich-parts#comment-939276530
Actually, merging this post with some of my own thoughts might lead to a
"Scala: The Good and Bad Parts" presentation. So much to cover... For the
kickoff meeting, I need to make sure not to try to cover too much.
> We've been doing a fair amount of work with Spark, which might fall into the
> category of a library with a deeper concept (distributed collections). I'd
> be happy to talk about that at some point if people are interested.
This sounds good to me. I've been interested in Spark for some time, but
haven't had a project that made it an obvious fit. I definitely agree with
the premise that Hadoop is overused for many analytic applications.
-Sukant