Collecting design ideas for File/FileReader, ArrayBuffer and TypedArrays/DataViews

Daniel Bachler

Aug 15, 2016, 5:00:31 PM
to Elm Discuss
In this thread we collected use cases for loading files and for decoding and constructing binary data.

I would now like to take this discussion to the next stage and start collecting design ideas for those use cases. I think it would be best to try and see what we would like an Elm solution to the collected use cases to look like, where the goal would be for the solution to feel "elmish" but ideally also be informed by the capabilities and limitations of the available native browser APIs. Let us explore the design space a little and see what we can come up with.

I think that discussing several different proposals in detail in one discussion thread will quickly become unwieldy, so my thought was that we could discuss individual ideas as gists, with specific feedback happening there, and use this thread for big picture ideas and to synchronize what is happening on the gists.

Simon has already started with a draft of reading local files and uploading them as signed uploads to S3 - Simon, could you create a gist for that?

I started reading up a bit on the TypedArray/DataView APIs and created drafts for 3 use cases that, in addition to Simon's read-and-upload example, represent some of the major use case themes.

Here is what one of the simplest use cases could look like, reading local text files:
https://gist.github.com/danyx23/35cea7421be6691cbda9e437d58641f0

Here are some initial thoughts on what a binary data decoder could look like. I haven't thought about this in detail; it is really just a conversation starter. This is a very bare-bones api idea. It is probably a lot more elmish to create a BinaryDecoder library similar to the Json.Decode library.
https://gist.github.com/danyx23/852d793f44a74008868d37596069d49a

And finally a rough brainstorm of what creating binary data in Elm could look like. When creating/writing binary data, we have the problem that this will mutate the underlying ArrayBuffer, thus undermining some of the guarantees we have with normal Elm code. Maybe this will have to be modelled as a Cmd instead?
https://gist.github.com/danyx23/a1f88da8913299bdb153a83979234878
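
The mutation concern can be seen directly in the browser API. The following is a minimal TypeScript sketch against the standard ArrayBuffer/DataView API (it is not Elm and is not taken from the gist). `withUint16` is a hypothetical helper that shows one copy-before-write alternative. It trades one buffer copy per write for leaving the caller's data untouched.

```typescript
// Two views over the same ArrayBuffer observe each other's writes.
const shared = new ArrayBuffer(4);
const writer = new DataView(shared);
const reader = new Uint8Array(shared);

writer.setUint16(0, 0xabcd, false); // false = big-endian
// reader[0] is now 0xab and reader[1] is 0xcd, although `reader` was never written to.

// Hypothetical helper: copy the buffer first, then write into the copy,
// so the buffer passed in by the caller is never modified.
function withUint16(
  source: ArrayBuffer,
  offset: number,
  value: number,
  littleEndian: boolean
): ArrayBuffer {
  const copy = source.slice(0); // ArrayBuffer.prototype.slice returns a new buffer
  new DataView(copy).setUint16(offset, value, littleEndian);
  return copy;
}

const untouched = new ArrayBuffer(2);
const updated = withUint16(untouched, 0, 0x0102, false);
// `untouched` still holds zeros; `updated` holds the bytes 0x01, 0x02.
```

Copying on every write is expensive for large buffers, which is one reason a Cmd-style or builder-style API may be the better fit.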

I am very interested in what you all think and in alternative design ideas and/or feedback on these very rough ideas.

Markus

Aug 16, 2016, 7:33:17 AM
to Elm Discuss
Regarding the binary encoder/decoder, our team is developing a SPA (mostly native but slowly moving to Elm) where the server uses binary strings to encode large typed arrays inside json responses. These typed arrays are encoded as base64 strings and must be decoded according to the data they represent. For instance, we use uint32 for timestamps and float32 for real numbers.

The encode/decode process is pretty straightforward using the native TypedArray, but unfortunately that means that any json response that contains binary data must be handled outside elm and then sent as flags to the app or using a port.
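
As an illustration of that kind of decoding, here is a minimal TypeScript sketch. It is not the code from the project described above. The names `base64ToBytes` and `decodeUint32List` are hypothetical, the base64 decoder is hand-written only to keep the example self-contained, and the byte order is an explicit parameter because the post does not say how the values are read.

```typescript
const BASE64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Decode a standard base64 string (with optional "=" padding) into bytes.
function base64ToBytes(text: string): Uint8Array {
  const clean = text.replace(/=+$/, "");
  const bytes = new Uint8Array(Math.floor((clean.length * 3) / 4));
  let bits = 0; // holds the not yet emitted low bits
  let bitCount = 0; // number of valid bits in `bits`
  let out = 0;
  for (let i = 0; i < clean.length; i++) {
    const value = BASE64_ALPHABET.indexOf(clean.charAt(i));
    if (value < 0) throw new Error("invalid base64 character at index " + i);
    bits = ((bits << 6) | value) & 0xffff;
    bitCount += 6;
    if (bitCount >= 8) {
      bitCount -= 8;
      bytes[out++] = (bits >> bitCount) & 0xff;
    }
  }
  return bytes;
}

// Read the decoded bytes as a list of uint32 values with an explicit byte order.
function decodeUint32List(text: string, littleEndian: boolean): number[] {
  const bytes = base64ToBytes(text);
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const result: number[] = [];
  for (let offset = 0; offset + 4 <= view.byteLength; offset += 4) {
    result.push(view.getUint32(offset, littleEndian));
  }
  return result;
}

// "AAAAAQAAAQA=" encodes the bytes 00 00 00 01 00 00 01 00.
const timestampsBig = decodeUint32List("AAAAAQAAAQA=", false); // [1, 256]
const timestampsLittle = decodeUint32List("AAAAAQAAAQA=", true); // [16777216, 65536]
```

The same input yields different numbers depending on the byte order flag, so server and client have to agree on it.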

I've been playing with several ideas to integrate this _inside_ elm, so I can use packages like HTTP. The result is a more or less elaborate experiment, available here https://github.com/mapmarkus/elm-bytestring. It contains an example where you can see the result of encoding and decoding different kinds of typed arrays. The native implementation is basically what we use right now in plain javascript to manipulate binary strings.

I hope that serves as a valid example for this thread.

Daniel Bachler

Aug 20, 2016, 6:47:32 AM
to elm-d...@googlegroups.com
That is a great example, thanks Markus! I hope to have some time next
week to look at your example in more detail, but one thing I already
noticed is that you don't explicitly deal with endianness. Have you
verified that all your clients use the same (in this case presumably
little endian) encoding?

I am currently thinking back and forth how closely to expose the
various browser apis surrounding binary data especially with regards
to endianness. Do we want to expose both typed arrays and data views?
Or should the elm api be based solely on dataviews and require the
user to always specify the endianness?
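
For reference, this is how the two browser APIs differ on this point. It is a small TypeScript sketch using only standard calls, and the variable names are illustrative. A DataView takes the byte order per read (big-endian when the flag is omitted), while a typed array always uses the byte order of the machine it runs on.

```typescript
// Two bytes, 0x12 followed by 0x34.
const twoBytes = new Uint8Array([0x12, 0x34]);
const dataView = new DataView(twoBytes.buffer);

const asBig = dataView.getUint16(0); // big-endian by default: 0x1234
const asLittle = dataView.getUint16(0, true); // little-endian: 0x3412

// A typed array has no such parameter; the result depends on the platform.
const asPlatform = new Uint16Array(twoBytes.buffer)[0];
const platformIsLittleEndian = asPlatform === asLittle;
```

An API built only on DataView therefore gives the same result on every client, at the cost of forcing the caller to choose a byte order for each read.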

Several file formats specify the endianness in the file itself, and
some insane ones even switch endianness for certain blocks (psd for
example, but I think also jpeg, because the exif block can be in a
different endianness than the rest of the file). So maybe the actual
functions of a binary decoder/encoder will have to explicitly state
the endianness.
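
A concrete case of a file declaring its own byte order is the TIFF header, which is also what an Exif block starts with: the first two bytes are "II" for little-endian or "MM" for big-endian, followed by the number 42 stored in that byte order. The TypeScript sketch below reads such a header. The function names are hypothetical and the two sample headers are constructed by hand.

```typescript
type ByteOrder = "little" | "big";

// Read the two-byte order marker at `offset`: "II" (0x4949) or "MM" (0x4d4d).
function readTiffByteOrder(view: DataView, offset: number): ByteOrder {
  const marker = view.getUint16(offset, false); // both marker bytes are equal, so the flag does not matter
  if (marker === 0x4949) return "little";
  if (marker === 0x4d4d) return "big";
  throw new Error("not a TIFF header");
}

// Read the uint16 that follows the marker, using the byte order the file declared.
function readTiffMagic(view: DataView, offset: number): number {
  const order = readTiffByteOrder(view, offset);
  return view.getUint16(offset + 2, order === "little");
}

const littleHeader = new DataView(new Uint8Array([0x49, 0x49, 0x2a, 0x00]).buffer);
const bigHeader = new DataView(new Uint8Array([0x4d, 0x4d, 0x00, 0x2a]).buffer);

const magicFromLittle = readTiffMagic(littleHeader, 0); // 42
const magicFromBig = readTiffMagic(bigHeader, 0); // 42
```

Because the byte order is only known after reading part of the file, it has to be a runtime value that can be passed to each decoding function.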

I am brainstorming while I write this - I think a binary
decoder/encoder might be based on typed array using uint8 only
internally, and on the elm side we expose a number of functions with
specified endianness.

```elm
type alias Cursor = { arrayBuffer : ArrayBuffer, position : Int }

-- Decoders:
uint16Little : Cursor -> (Int, Cursor)
uint16Big : Cursor -> (Int, Cursor)
float32Little : Cursor -> (Float, Cursor)
float32Big : Cursor -> (Float, Cursor)
float64Little : Cursor -> (Float, Cursor)
float64Big : Cursor -> (Float, Cursor)

type alias DecoderFn a = (Cursor -> (a, Cursor))

array : DecoderFn a -> Int -> Cursor -> (Array a, Cursor)
combine : (a -> b -> c) -> DecoderFn a -> DecoderFn b -> Cursor -> (c, Cursor)
-- we may want to have map and andThen too or instead of combine
```
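
To make the shape of these signatures concrete, here is a rough TypeScript transliteration on top of DataView. It is only a sketch: TypeScript stands in for the native side, the `advance` helper is hypothetical, out-of-range reads simply throw the RangeError that DataView raises, and only three of the primitive decoders are shown.

```typescript
interface Cursor {
  readonly buffer: ArrayBuffer;
  readonly position: number;
}

// A decoder reads a value at the cursor and returns it with the advanced cursor.
type DecoderFn<A> = (cursor: Cursor) => [A, Cursor];

// Hypothetical helper: a new cursor moved forward by `byteCount` bytes.
function advance(cursor: Cursor, byteCount: number): Cursor {
  return { buffer: cursor.buffer, position: cursor.position + byteCount };
}

const uint16Little: DecoderFn<number> = (c) => [
  new DataView(c.buffer).getUint16(c.position, true),
  advance(c, 2),
];
const uint16Big: DecoderFn<number> = (c) => [
  new DataView(c.buffer).getUint16(c.position, false),
  advance(c, 2),
];
const float32Little: DecoderFn<number> = (c) => [
  new DataView(c.buffer).getFloat32(c.position, true),
  advance(c, 4),
];

// Run `decoder` `count` times in sequence and collect the results.
function array<A>(decoder: DecoderFn<A>, count: number): DecoderFn<A[]> {
  return (start) => {
    const items: A[] = [];
    let cursor = start;
    for (let i = 0; i < count; i++) {
      const [item, next] = decoder(cursor);
      items.push(item);
      cursor = next;
    }
    return [items, cursor];
  };
}

// Run two decoders one after the other and merge their results with `f`.
function combine<A, B, C>(
  f: (a: A, b: B) => C,
  first: DecoderFn<A>,
  second: DecoderFn<B>
): DecoderFn<C> {
  return (start) => {
    const [a, afterA] = first(start);
    const [b, afterB] = second(afterA);
    return [f(a, b), afterB];
  };
}

// Usage: a big-endian width/height pair followed by two little-endian uint16 values.
const sampleBuffer = new ArrayBuffer(8);
new Uint8Array(sampleBuffer).set([0x00, 0x02, 0x00, 0x03, 0x0a, 0x00, 0x0b, 0x00]);

const sizeDecoder = combine(
  (width: number, height: number) => ({ width, height }),
  uint16Big,
  uint16Big
);
const [size, afterSize] = sizeDecoder({ buffer: sampleBuffer, position: 0 });
const [samples, afterSamples] = array(uint16Little, 2)(afterSize);
// size is { width: 2, height: 3 }, samples is [10, 11], afterSamples.position is 8.
```

Threading the cursor by hand, as in the last two lines, is the part that `map` and `andThen` would hide.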

Do you think you will also have the use case where you have data that
is more complex than just lists of numbers? Do you think an api along
these lines would make sense?

--
Daniel Bachler
http://www.danielbachler.de

Markus

Aug 20, 2016, 12:56:39 PM
to Elm Discuss
Thanks Daniel.

> That is a great example, thanks Markus! I hope to have some time next
> week to look at your example in more detail, but one thing I already
> noticed is that you don't explicitly deal with endianness. Have you
> verified that all your clients use the same (in this case presumably
> little endian) encoding?

This experiment comes from a big SPA that uses binary data inside fields of most json responses. We use big endian exclusively, so originally handling endianness was unnecessary.
Well, my example is designed to address the most trivial use case, that is, a list in plain binary data. The full version should support zlib and protocol buffers, which I have no idea how to implement in Elm at the moment :)

Regarding the use of uint8 only, that is what I actually do to create all typed arrays: first the data is dumped into a Uint8Array, and then its underlying buffer is passed to the constructor of each specific typed array.
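
That two-step construction could look roughly like the TypeScript sketch below. It is not the code from elm-bytestring. `bytesToFloat32` is a hypothetical name, and the length check is an added assumption, since the Float32Array constructor throws when the buffer length is not a multiple of 4.

```typescript
// Dump the bytes into a fresh Uint8Array, then hand its buffer to the
// constructor of the specific typed array (here Float32Array).
function bytesToFloat32(bytes: Uint8Array): Float32Array {
  if (bytes.byteLength % 4 !== 0) {
    throw new RangeError("byte length must be a multiple of 4");
  }
  const dumped = new Uint8Array(bytes.byteLength);
  dumped.set(bytes); // copy, so the data starts at offset 0 of its own buffer
  return new Float32Array(dumped.buffer); // reinterprets the same memory, no second copy
}

// Round trip: take the raw bytes of some floats and view them as floats again.
const sampleFloats = new Float32Array([1.5, -2, 0.25]);
const sampleBytes = new Uint8Array(sampleFloats.buffer);
const roundTrip = bytesToFloat32(sampleBytes);
```

The typed array constructor interprets the bytes in the platform's byte order, so data produced on a machine with a different byte order would need a DataView or a byte swap first.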

TypedArrays are used for lots of stuff internally in javascript (webgl, audio, canvas, websockets, etc), which means that any api has to be able to work with all those use cases (some are already Elm packages). So the api you propose, even though it is reasonable and type safe, might not be enough, or flexible enough, to cover all of them. Additionally, there are things like decoding files (psd or jpeg, as you've mentioned) or strings that contain binary data (like my example) that might require a different way of working with data.

Finally, performance is also a big design concern, because letting javascript do most of the work will be a lot faster than providing a rich Elm api that allows you to read and manipulate the input byte by byte.

Providing a minimal api, like Blob in elm-http, might be another interesting approach. However, more use cases will definitely help a lot.