On the need for an Empty input case in the Iteratee pattern

77 views
Skip to first unread message

Ben Hutchison

unread,
Jun 24, 2013, 8:52:38 PM6/24/13
to scala...@googlegroups.com, James Roper, Runar Bjarnason, Josh Suereth
At last night's Melbourne Scala meeting, there was much discussion,
debate and handwaving around a whiteboard, focused on the Iteratee
pattern.

The subject of debate was why the Iteratee pattern is always described
with 3 input cases:

1. An Element
2. EOF
3. An Empty element

What is the purpose of the Empty input? Why doesn't the producer code
that "runs" the iteratee simply collapse empty elements out of the
stream?

The matter was not settled to general satisfaction last night, so I
wrote up a question on stack overflow:

http://stackoverflow.com/questions/17287415/why-is-the-empty-input-case-needed-in-scala-iteratees

CC'ing James, Runar & Josh, the authors of 3 prominent blog posts on
Scala Iteratees. Any clarifications appreciated..

-Ben

Tony Morris

unread,
Jun 24, 2013, 8:56:46 PM6/24/13
to scala...@googlegroups.com
It just means, "this iteratee has no more input for you (but that does
not mean input is exhausted completely)."


--
Tony Morris
http://tmorris.net/

Bernie Pope

unread,
Jun 24, 2013, 9:33:43 PM6/24/13
to scala...@googlegroups.com

On 25/06/2013, at 10:56 AM, Tony Morris wrote:

> On 25/06/13 10:52, Ben Hutchison wrote:
>>
>>
>> What is the purpose of the Empty input? Why doesn't the producer code
>> that "runs" the iteratee simply collapse empty elements out of the
>> stream?

> It just means, "this iteratee has no more input for you (but that does
> not mean input is exhausted completely)."

Thanks for raising the question on SO Ben.

If you look at the documentation for the Haskell Enumerator library:

http://hackage.haskell.org/packages/archive/enumerator/0.4.19/doc/html/Data-Enumerator-Internal.html

it says:

"(Chunks []) is used to indicate that a stream is still active, but currently has no available data. Iteratees should ignore empty chunks."

But that raises the question we had last night: if empty chunks should be ignored by iteratees, would it make sense for the enumerator to collapse them instead?

Cheers,
Bernie.

Ben Hutchison

unread,
Jun 24, 2013, 10:10:00 PM6/24/13
to scala...@googlegroups.com
Apparently, since James isn't on the list this reply bounced..

---------- Forwarded message ----------
From: James Roper <james...@typesafe.com>
Date: Tue, Jun 25, 2013 at 11:35 AM
Subject: Re: On the need for an Empty input case in the Iteratee pattern
To: Ben Hutchison <brhut...@gmail.com>
Cc: "scala...@googlegroups.com" <scala...@googlegroups.com>, Runar
Bjarnason <runar...@gmail.com>, Josh Suereth
<joshua....@gmail.com>


Hi Ben,

There is one main purpose for Empty, and that is for use with Done and
flatMap. Done must contain the left over input, so for example, if
you had a dropWhile iteratee:

def dropWhile[E](p: E => Boolean): Iteratee[E, Unit] = Cont {
case Input.El(e) if p(e) => dropWhile(p)
case Input.Empty => dropWhile(p)
case in => Done(Unit, in)
}

Now imagine this code:

Enumerator(1, 2, 3, 4) &> dropWhile[Int](_ < 3).flatMap(_ => Iteratee.getChunks)

The result should be Seq(3, 4), but the dropWhile iteratee consumes 3
from the enumerator, how does Iteratee.getChunks get that 3? The way
it works is it passes 3 to Done as the left over input, and that
allows the implementation of flatMap to feed the left over 3 into the
getChunks iteratee before consuming more input from the enumerator.

But what if the iteratee doesn't have any left over input when it's
done? For example:

def drop[E](n: Int): Iteratee[E, Unit] = if (n <= 0) {
Done(Unit, Input.Empty)
} else Cont {
case _: Input.El => drop(n - 1)
case Input.Empty => drop(n)
case eof: Input.EOF => Done(Unit, eof)
}

As you can see, it needs to instead pass empty (this is actually the
default value for the argument), and then the implementation flatMap
can know that there's nothing to feed to the next iteratee.

Another convenient use for empty is for dropping input with an enumerator:

Enumerator.mapInput[Int] {
case Input.El(v) if v < 3 => Input.Empty
case other => other
}

I guess another way it could be implemented is to have Option[Input]
as the left over input for Done. This would make implementing
iteratees simpler since they wouldn't need to handle empty. I'm not
sure of any other purposes for empty.

Cheers,

James
--
James Roper
Software Engineer

Typesafe – Empowering professional developers to build amazing apps.
Twitter: @jroper

See you at Scala Days 2013 in NYC!
June 10th - June 12th
www.scaladays.org

Chen Harry

unread,
Jun 25, 2013, 3:10:20 AM6/25/13
to scala...@googlegroups.com
Hi, suppose I want to count the number of alphabetic character in a string, for instance, "I love Scala". The Enumerator can interpret the space character (and '\t', '\n') as Empty so that the Counter Iteratee can be implemented easier. 
Please correct me if I am wrong.
Regards,
Harry



--
You received this message because you are subscribed to the Google Groups "Melbourne Scala User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-melb+...@googlegroups.com.
To post to this group, send an email to scala...@googlegroups.com.
Visit this group at http://groups.google.com/group/scala-melb.
For more options, visit https://groups.google.com/groups/opt_out.





--
Best regards,

Harry

Ben Hutchison

unread,
Jun 25, 2013, 3:54:49 AM6/25/13
to scala...@googlegroups.com

In your example, there are 2 iteratees, a whitespace filter and a counter. I think its clear that filter iteratees are useful, that's not in dispute.

But I think filters can be implemented without needing an Empty input message type. The enumerator can just omit the call to the iteratee entirely if the message is Empty.

Ben

Chen Harry

unread,
Jun 25, 2013, 10:32:32 PM6/25/13
to scala...@googlegroups.com
Sometimes the Iteratee doesn't consume the chunk (i.e, the peek function) and hence it return Done(result, input),
Sometime it does (i.e. the head function). and hence it return Done(result, Empty) to indicate that the chunk has been consumed.
In Haskell, they use Maybe instead (Done(result, Just(input)) or Done(result, Nothing)) but I think the idea is same that we need to differentiate whether the iteratee consumes the chunk and Empty is just a signal to tell this.


Bryan Murphy

unread,
Jul 9, 2013, 1:52:36 AM7/9/13
to scala...@googlegroups.com, James Roper, Runar Bjarnason
I think I now understand Empty and EOF in Iteratess to my own satisfaction so I will have a go at giving my spin on it.

The Input[E] trait has three sub classes El[E], Empty and EOF

The Enumerator will only ever send El[E] or EOF

The Iteratee may receive input directly from an Enumerator or via the flatMap operation that composes two Iteratees together
- it is the invokation via flatMap where Empty may be received
- the flatMap operation takes the remaining Input[E] from Done and passes to the next Iteratee (if it is in Cont state)

The Done state of the Iteratee uses the Input[E] trait to specify remaining elements after processing
- Empty is used here to indicate all the elements have been consumed
- I haven't seen it explicitly documented but I believe an Iteratee receiving EOF must pass EOF into its Done constructor
  - this ensures that EOF is cascaded on to all Iteratees that are down stream via flatMap
  - this is why EOF cannot be used to indicate no elements to Done as it would terminate further processing

So Empty is really a message to flatMap but its presence means that all Iteratees must include a
a case statement for Empty when Input[E] is received (to satisfy the pattern matcher)

So to me the Iteratee interface has a bit of a smell about it since it relies on the user to do 2 things 
for the code to work properly and both things are easy to get wrong
- handle the Empty case (must return the Iteratee in the same state)
- propagate the EOF into Done
  - I have seen a number of blog posts explaining Iteratees that use Empty in this case but I think they are wrong

I think the flatMap operation could/should be changed to check the Input[E] from the Done state and
if Empty not call the operation that handles the Input[E].  To satisfy the pattern matcher the types
would need to be fiddled so that Empty is not an instance of Input[E] but a new base class that sits above
Input[E] (EOF and El[E] would be still be sub classes of Input[E].  Alternatively, James' suggestion of
Option[Input[E]] could be used for remiaing elements in Done at the cost of some verbosity.

It would be good to find a way to automatically propagate EOF but I can't see an easy way to do that.
Documentation and examples stressing the importance of doing this would be an alternative

Bryan Murphy.
Reply all
Reply to author
Forward
0 new messages