Help requested: naming things in conduit

Michael Snoyman

Jun 28, 2012, 1:11:26 PM
to Haskell Cafe, streamin...@googlegroups.com
Hi all,

I'm just about ready to make the 0.5 release of conduit. And as usual,
I'm running up against the hardest thing in programming: naming
things.

Here's the crux of the matter: in older versions of conduit, functions
would have a type signature of Source, Sink, or Conduit. For example:

sourceFile :: MonadResource m => FilePath -> Source m ByteString

I think most people can guess at what this function does: it produces
a stream of ByteStrings, which are read from the given file.

Now the trick: Source (and Sink and Conduit) are all type synonyms
wrapping around the same type, Pipe. Ideally, we'd like to be able to
reuse functions like sourceFile in other contexts, such as producing a
Conduit that calls sourceFile[1]. However, the type synonym Source
over-specifies some of the type parameters to Pipe, and therefore
`sourceFile` can't be used directly to create a Conduit[2].
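
To make the over-specification concrete, here is a minimal sketch. It assumes a simplified stand-in for Pipe (finalizers omitted), and sourceChars is a hypothetical replacement for sourceFile that needs no file system; andThen, feed, and conduitChars are likewise illustrative names, not part of conduit.

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Functor.Identity (Identity, runIdentity)

-- Simplified stand-in for conduit's Pipe (assumption: finalizers omitted).
-- l = leftovers, i = input, o = output, u = upstream result,
-- m = base monad, r = result.
data Pipe l i o u m r
  = HaveOutput (Pipe l i o u m r) o
  | NeedInput (i -> Pipe l i o u m r) (u -> Pipe l i o u m r)
  | Done r
  | PipeM (m (Pipe l i o u m r))
  | Leftover (Pipe l i o u m r) l

type Source m o    = Pipe () () o () m ()              -- fixes l, i and u to ()
type Conduit i m o = Pipe i i o () m ()
type GSource m o   = forall l i u. Pipe l i o u m ()   -- leaves l, i and u open

-- Hypothetical stand-in for sourceFile: yields each character of a string.
-- With the result type `Source m Char` instead, conduitChars below would be
-- rejected, because l and i would be () rather than String.
sourceChars :: String -> GSource m Char
sourceChars s = foldr (\c next -> HaveOutput next c) (Done ()) s

-- Run the first pipe to completion, then continue with the second.
andThen :: Functor m => Pipe l i o u m () -> Pipe l i o u m r -> Pipe l i o u m r
andThen (HaveOutput next o) k = HaveOutput (next `andThen` k) o
andThen (NeedInput f g) k     = NeedInput (\i -> f i `andThen` k) (\u -> g u `andThen` k)
andThen (Done ()) k           = k
andThen (PipeM mp) k          = PipeM (fmap (`andThen` k) mp)
andThen (Leftover next l) k   = Leftover (next `andThen` k) l

-- A Conduit that calls the generalized source once per incoming string.
conduitChars :: Functor m => Conduit String m Char
conduitChars = NeedInput (\s -> sourceChars s `andThen` conduitChars) (\() -> Done ())

-- Pure test driver: feed a list of inputs and collect the outputs.
feed :: [i] -> Pipe i i o () Identity r -> ([o], r)
feed is (HaveOutput next o)  = let (os, r) = feed is next in (o : os, r)
feed (i : is) (NeedInput f _) = feed is (f i)
feed [] (NeedInput _ done)   = feed [] (done ())
feed _ (Done r)              = ([], r)
feed is (PipeM mp)           = feed is (runIdentity mp)
feed is (Leftover next l)    = feed (l : is) next

main :: IO ()
main = print (feed ["ab", "c"] conduitChars)  -- prints ("abc",())
```

The only change needed to make the source reusable is in its signature: the body of sourceChars is the same whichever synonym is used.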

To get around this whole problem, I've added a number of type synonyms
with rank-2 types that don't over-specify. You can see the type
synonyms here[3], and more explanation of the problem here[4]. So my
question is: can anyone come up with better names for these synonyms?
Just to summarize here:

* All of the generalized types start with a G, e.g., Source becomes GSource.
* For Sinks and Conduits, if leftovers are generated, there's an L
after the G (e.g., GLSink).
* For Sinks and Conduits which consume all of their input and then
return the upstream result, we tack on an Inf for Infinite (e.g.,
GInfConduit, GLInfSink).
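
A minimal sketch of how the three markers could map onto type parameters, again assuming a simplified stand-in for Pipe (finalizers omitted). The synonym definitions here illustrate the scheme and may differ in detail from the ones at [3]; countAll, peek, mapInf, and runPure are hypothetical names used only for this example.

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Functor.Identity (Identity, runIdentity)

-- Simplified stand-in for conduit's Pipe (assumption: finalizers omitted).
data Pipe l i o u m r
  = HaveOutput (Pipe l i o u m r) o
  | NeedInput (i -> Pipe l i o u m r) (u -> Pipe l i o u m r)
  | Done r
  | PipeM (m (Pipe l i o u m r))
  | Leftover (Pipe l i o u m r) l

-- G: no leftovers are generated, so l stays fully polymorphic.
type GSink i m r       = forall l o u. Pipe l i o u m r
-- GL: leftovers are generated, so l must equal the input type i.
type GLSink i m r      = forall o u. Pipe i i o u m r
-- GInf: all input is consumed and the upstream result is returned (u = r).
type GInfConduit i m o = forall l r. Pipe l i o r m r

-- Counts every input value; never pushes anything back.
countAll :: GSink i m Int
countAll = go 0
  where
    go n = NeedInput (\_ -> go $! n + 1) (\_ -> Done n)

-- Looks at the next input and pushes it back as a leftover.
peek :: GLSink i m (Maybe i)
peek = NeedInput (\i -> Leftover (Done (Just i)) i) (\_ -> Done Nothing)

-- Transforms every input, then passes the upstream result through.
mapInf :: (a -> b) -> GInfConduit a m b
mapInf f = go
  where
    go = NeedInput (\a -> HaveOutput go (f a)) Done

-- Pure test driver: feed a list of inputs, then the upstream result u.
runPure :: [i] -> u -> Pipe i i o u Identity r -> ([o], r)
runPure is u (HaveOutput next o)     = let (os, r) = runPure is u next in (o : os, r)
runPure (i : is) u (NeedInput more _) = runPure is u (more i)
runPure [] u (NeedInput _ done)      = runPure [] u (done u)
runPure _ _ (Done r)                 = ([], r)
runPure is u (PipeM act)             = runPure is u (runIdentity act)
runPure is u (Leftover next i)       = runPure (i : is) u next

main :: IO ()
main = do
  print (snd (runPure "abc" () countAll))                 -- prints 3
  print (snd (runPure "abc" () peek))                     -- prints Just 'a'
  print (runPure [1, 2, 3 :: Int] "eof" (mapInf (* 2)))   -- prints ([2,4,6],"eof")
```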

I think these names are relatively descriptive, and certainly `GSink
ByteString m Int` is easier to follow than `Pipe l ByteString o u m
Int`, but I was wondering if anyone had some better recommendations.

Michael

[1] For example, maybe we want to produce `conduitFiles ::
MonadResource m => Conduit FilePath m ByteString`
[2] This problem exists to a smaller extent in conduit 0.4. This is
the purpose of the sinkToPipe function.
[3] https://github.com/snoyberg/conduit/blob/52d7bc0b551b877de92be4c87f933e3ffb1bb9f6/conduit/Data/Conduit/Internal.hs#L132
[4] https://github.com/snoyberg/conduit/blob/a853141d7b9eed047c7cc790979f73a346740ea0/conduit/Data/Conduit.hs#L403

Paolo Capriotti

Jun 28, 2012, 1:36:30 PM
to streamin...@googlegroups.com, Haskell Cafe
I ran into this problem myself with my implementation that used 7 type
parameters (the extra parameter with respect to conduit was used by Defer),
and I couldn't think of any satisfactory solution.

The dilemma here is:

- exposing the full `Pipe` type as the primary API would be really confusing
for new users
- creating a bunch of type synonyms adds a lot of conceptual overhead, and
it's actually a leaky abstraction, because `Pipe` will probably be shown in
error messages, and appears in the signatures of basic combinators

In the end, I gave up the 2 non-essential parameters, built the corresponding
lost features on top of `Pipe` using newtypes, and decided to expose a
5-parameter `Pipe` type with no universally quantified synonyms.

I'm not sure how easy this Pipe type is to understand, but at least all
parameters have a clear meaning that can be explained in the documentation,
whereas the `l` parameter is sort of a hack (like my 'd' parameter).

BR,
Paolo

Michael Snoyman

Jun 29, 2012, 12:22:19 AM
to streamin...@googlegroups.com, Haskell Cafe
I think even five parameters are too many. The original conduit types
had either 2 or 3 parameters, and each one was essential and easily
explainable. I realize that, for now, type synonyms will not help at
all with error messages (which I consider a serious problem), but at
least normal API functions like sourceFile will get helpful
signatures.

One idea that I've toyed around with, but not really pursued, is
creating actual newtypes for Source, Conduit, and Sink, and using
Chris's typeclass approach for when we want general functions. After
some basic fiddling, the typeclasses just seem to make everything more
difficult to work with.

You're correct, by the way, that we need a lot of type synonyms (I have 9
of them). But I still think it helps with the overhead instead of
hurting. While it may be important for some cases to understand the
difference between GSink and GLSink, for most use cases simply knowing
"oh, this thing takes a stream of `a` and gives a single result of
`b`" is sufficient. But I think only real world usage is going to help
us determine the best approach here.

Michael