On Wed, Feb 12, 2014 at 3:49 AM, Michael Snoyman <mic...@snoyman.com> wrote:
As some people may know, the current state of Unicode decoding (e.g., UTF-8 or UTF-32BE decoding) is a bit tricky today in a Haskell streaming data framework. The problem arises from the two facts that:

1. The text package, until recently, provided no streaming API. Instead, it only provided a strict API (which assumes all data is in a single strict ByteString) or a lazy API, which would require using lazy I/O. Obviously streaming data fans would prefer to consume the bytes incrementally. (The recent text 1.1 adds a streaming API for UTF8.)

2. text generally treats errors in decoding via exceptions from pure code. In streaming data, we usually want to be able to decode as much of the stream as possible, and then do something intelligent with the remaining bytes.

The original solution (to my knowledge) to this problem came from John Millikin in Data.Enumerator.Text. This code has since been used in both conduit and pipes-text. However, there are some downsides: the code is complicated, it's now been copy-pasted into three different projects, and it has some subtle differences from text's own decode which can lead to bugs[1][2][3].

Michael Thompson raised issue #60 against the text package[4] about this problem. My most recent comment[5] on that issue points to some new code I've just written for conduit, which takes text's own decoding functions, and tweaks them to have the streaming data behavior we'd want. The code currently lives in conduit.

My questions for this list are:

* Are others interested in using this code outside of conduit?
In principle yes; I've been quite disappointed with the current text API WRT streaming. That said, I don't deal with much textual data, so it hasn't really been a big issue for me, and in all likelihood I won't use it for some time even if you do break it out.
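For readers who have not used it, the streaming UTF-8 API that the quoted message mentions can be exercised roughly as follows. This is only a sketch: `streamDecodeUtf8` and `Decoding(..)` are exported by `Data.Text.Encoding`, while `decodeChunks` is a name made up here. Note that this API still throws an exception from pure code on invalid input, which is the second problem listed above.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (Decoding (..), streamDecodeUtf8)

-- | Feed chunks to text's streaming decoder one at a time.
-- Returns all decoded text plus whatever bytes were still undecoded
-- after the last chunk (for example, a character cut in half).
-- 'decodeChunks' is a hypothetical helper, not part of any library.
decodeChunks :: [B.ByteString] -> (T.Text, B.ByteString)
decodeChunks = go streamDecodeUtf8 T.empty B.empty
  where
    -- No chunks left: report what we have.
    go _ acc leftover [] = (acc, leftover)
    -- 'Some' carries the decoded text, the undecoded tail of the input,
    -- and a continuation that expects the next chunk.
    go k acc _ (c : cs) =
      case k c of
        Some t leftover k' -> go k' (acc <> t) leftover cs
```

For instance, the two bytes of "é" (0xC3 0xA9) can arrive in different chunks, as in `decodeChunks [B.pack [0x68, 0xC3], B.pack [0xA9, 0x6C]]`; the decoder holds back the 0xC3 until the next chunk completes the character.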
* Are there issues with the API that I've provided? (Note that I've designed this as a low-level API, and have used the technique of using a null ByteString to indicate end-of-stream. It's a hack, but it avoids extra runtime checks and surface area of the API.)
It seems adequate, although it might be nice if DecodeFailure reported a bit more information about the failure. Then again, perhaps it provides enough to add that on top.
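To make the end-of-stream convention concrete, here is a minimal sketch of a decoder that returns a failure value instead of throwing, and treats an empty ByteString as end-of-stream. It is not the conduit code under discussion: the names `StreamResult`, `Decoded`, `Failed`, `decodeStream`, `splitValid` and `feedAll` are all invented for illustration, and the prefix search is deliberately naive (quadratic in the worst case). It assumes only `decodeUtf8'` from `Data.Text.Encoding`.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8')

-- Hypothetical result type for this sketch (not conduit's actual type).
data StreamResult
  = Decoded T.Text (B.ByteString -> StreamResult) -- text, plus a continuation
  | Failed T.Text B.ByteString -- text decoded before the error, remaining bytes

-- | Start decoding. Passing an empty chunk signals end-of-stream.
decodeStream :: B.ByteString -> StreamResult
decodeStream = step B.empty

-- 'carry' holds at most 3 bytes held back from the previous chunk.
step :: B.ByteString -> B.ByteString -> StreamResult
step carry chunk
  | B.null chunk =
      if B.null carry
        then Decoded T.empty decodeStream
        else Failed T.empty carry -- held-back bytes never became valid text
  -- Up to 3 trailing bytes may be a truncated character: hold them back.
  | B.length rest <= 3 = Decoded good (step rest)
  -- A longer undecodable tail must contain an invalid byte.
  | otherwise = Failed good rest
  where
    (good, rest) = splitValid (carry <> chunk)

-- | Longest prefix that decodes as UTF-8, and the bytes after it.
splitValid :: B.ByteString -> (T.Text, B.ByteString)
splitValid bs = go (B.length bs)
  where
    go k = case decodeUtf8' (B.take k bs) of
      Right t -> (t, B.drop k bs)
      Left _ -> go (k - 1) -- terminates: the empty prefix always decodes

-- | Drive the decoder over a list of chunks, then signal end-of-stream.
feedAll :: [B.ByteString] -> (T.Text, Maybe B.ByteString)
feedAll = loop decodeStream T.empty
  where
    loop k acc [] = finish acc (k B.empty)
    loop k acc (c : cs)
      | B.null c = loop k acc cs -- skip empty chunks; they would mean EOF
      | otherwise = case k c of
          Decoded t k' -> loop k' (acc <> t) cs
          Failed t rest -> (acc <> t, Just (rest <> B.concat cs))
    finish acc (Decoded t _) = (acc <> t, Nothing)
    finish acc (Failed t rest) = (acc <> t, Just rest)
```

One consequence of the empty-chunk convention shows up in `feedAll`: a driver has to filter out empty chunks from the real input, since passing one through would end the stream early. A failure constructor shaped like `Failed` also leaves room for a caller to layer more detailed error reporting on top, since the offending bytes are handed back.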
* I've run a battery of tests on this code, but I don't feel 100% comfortable with it yet. If others want to try and break it, I'd appreciate it.
Not being a big text user, I'm probably not well-positioned to help much with this.

Cheers,
John L.
--
You received this message because you are subscribed to the Google Groups "streaming-haskell" group.
To unsubscribe from this group and stop receiving emails from it, send an email to streaming-hask...@googlegroups.com.
To post to this group, send email to streamin...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/streaming-haskell/52FCA469.2030601%40gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.
benchmarking lazy text
mean: 4.928981 ms, lb 4.903218 ms, ub 5.024673 ms, ci 0.950
std dev: 225.9253 us, lb 55.82041 us, ub 522.7356 us, ci 0.950

benchmarking new conduit
mean: 10.72952 ms, lb 10.55484 ms, ub 10.97152 ms, ci 0.950
std dev: 1.043854 ms, lb 815.5910 us, ub 1.342237 ms, ci 0.950

benchmarking stream
mean: 10.74237 ms, lb 10.60407 ms, ub 10.93885 ms, ci 0.950
std dev: 834.0893 us, lb 627.3384 us, ub 1.150477 ms, ci 0.950

benchmarking pipes-text-old -- with FFI'd function
mean: 5.140928 ms, lb 5.116550 ms, ub 5.198108 ms, ci 0.950
std dev: 181.4065 us, lb 94.51407 us, ub 361.4704 us, ci 0.950

benchmarking pipes-text-new
mean: 11.06904 ms, lb 10.92610 ms, ub 11.28362 ms, ci 0.950
std dev: 886.1462 us, lb 647.3568 us, ub 1.243963 ms, ci 0.950