Writing a Base64 encoder/decoder on top of core?


Jesper Louis Andersen

Jul 26, 2014, 6:38:44 AM
to ocaml...@googlegroups.com
Hi,

Say I need to encode/decode Base64-encoded strings into something Bytes.t-ish. I am not sure what approach to take, given that I am to write it on top of Core. What I am lacking is a sense of which idioms to pick and how to start out. There are Base64 libraries out there, but they seem to be deeply tied to Batteries.

The naive solution is to pick out chars one at a time from the source string, run them through a lookup table, and write to a Buffer.t as you go along. This works to a first approximation, but it also means you have to keep the whole data set in memory. So I was looking for a solution that is based on streams to a greater extent:

* Turn the string into an In_channel.t (How? Scanf.Scanning.from_string maybe?)
* *Provide* a result In_channel.t so when reads happen on it, we lazily drag values from the string.
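
For reference, the naive whole-string approach described above might look roughly like the sketch below. It is only an illustration: it uses the OCaml standard library rather than Core so that it runs on its own, the names (alphabet, reverse, encode, decode) are made up for this example, and the decoder is deliberately lenient (it skips '=' wherever it appears and does not validate padding).

```ocaml
(* Naive whole-string Base64 codec, standard library only.
   All names are illustrative; nothing here comes from Core or Batteries. *)
let alphabet =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

(* Reverse lookup table: byte -> 6-bit value, or -1 if not in the alphabet. *)
let reverse =
  let t = Array.make 256 (-1) in
  String.iteri (fun i c -> t.(Char.code c) <- i) alphabet;
  t

let encode (s : string) : string =
  let n = String.length s in
  let buf = Buffer.create ((n + 2) / 3 * 4) in
  (* Bytes past the end of the input count as zero. *)
  let byte i = if i < n then Char.code s.[i] else 0 in
  let i = ref 0 in
  while !i < n do
    (* Pack up to three input bytes into one 24-bit group. *)
    let g = (byte !i lsl 16) lor (byte (!i + 1) lsl 8) lor byte (!i + 2) in
    let remaining = n - !i in
    Buffer.add_char buf alphabet.[(g lsr 18) land 63];
    Buffer.add_char buf alphabet.[(g lsr 12) land 63];
    Buffer.add_char buf
      (if remaining > 1 then alphabet.[(g lsr 6) land 63] else '=');
    Buffer.add_char buf (if remaining > 2 then alphabet.[g land 63] else '=');
    i := !i + 3
  done;
  Buffer.contents buf

let decode (s : string) : string =
  let buf = Buffer.create (String.length s / 4 * 3) in
  let acc = ref 0 and bits = ref 0 in
  String.iter
    (fun c ->
      if c <> '=' then begin
        let v = reverse.(Char.code c) in
        if v < 0 then invalid_arg "decode: character outside the alphabet";
        (* Shift in 6 bits; at most 12 bits are ever live, so mask to 16. *)
        acc := ((!acc lsl 6) lor v) land 0xFFFF;
        bits := !bits + 6;
        if !bits >= 8 then begin
          bits := !bits - 8;
          Buffer.add_char buf (Char.chr ((!acc lsr !bits) land 255))
        end
      end)
    s;
  Buffer.contents buf
```

Both functions hold the entire input and output in memory, which is exactly the limitation the question is about.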

What types should I be looking at to achieve this in Core? The library is fairly daunting in size when you have only spent a couple of hours with it :)

The goal is to be able to ditch the string requirement and work directly on channels instead. I guess what I am after is the answer to the question: "Say you were to write a Base64 library on Core. How would you do it?"

J.

Ralph Douglass

Jul 26, 2014, 11:04:01 AM
to ocaml...@googlegroups.com
If you are looking for existing code not tied to Batteries that just encodes or decodes Base64 encoded strings, you can look here:

https://github.com/janestreet-alpha/email_message/blob/master/lib/encoding.ml

I don't remember if it's doing anything fancy, so this is probably not a full answer to your question.


--
Ralph

Malcolm Matalka

Jul 26, 2014, 11:21:48 AM
to ocaml...@googlegroups.com
It depends on your exact use case but I'll suggest how I would solve it:

- I believe ExtLib comes with a Base64 module.

- You'll primarily want to operate on chunks of strings rather than a
character at a time. So whichever of the following suggestions you go
with you'll want to consume as much string as makes sense, then send
it through processing, getting out the Base64 chunk.

- If you are doing asynchronous things, you probably want to read from
some place, then publish it to a Pipe.t and have processing pull off
the Pipe.t and do something with it and pass it on wherever.

- If you're not, then you want to read a chunk, send it through
processing, and push out a string on the other side.

IME it is rare to structure OCaml code like Haskell, where you are
stringing together a bunch of lazy data structures and hiding where
exactly that list is being produced from. That means you wouldn't try
to feed a string into an in_channel (although it can be done); instead
your lowest layer would handle a string, and the layers above pull the
string (or are given it) from the appropriate place.

That being said, I think the Base64 module provides some mechanism to
hide that the input is coming from a string vs an in_channel. I don't
know how to use the latter.

So the simplest form of what you want looks something like just a
recursive function that asks an In_channel.t for its next chunk, does
Base64.str_encode or str_decode on it, does something with the
result, and loops.

Although someone who writes more elegant OCaml code can correct me.
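
A minimal sketch of such a loop, under a few assumptions: it uses the standard library's in_channel and out_channel instead of Core's In_channel.t, it carries its own small encoder instead of calling ExtLib's Base64, and encode_sub, fill and encode_channel are hypothetical names. The one detail that matters is that every chunk except the last must be a multiple of 3 bytes long, otherwise padding would end up in the middle of the output.

```ocaml
(* Chunked Base64 encoding from an in_channel to an out_channel.
   Standard library only; all function names are illustrative. *)
let alphabet =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

(* Encode the first [len] bytes of [src] and append the result to [out].
   Padding is emitted only when [len] is not a multiple of 3. *)
let encode_sub (src : Bytes.t) ~len (out : Buffer.t) =
  let byte i = if i < len then Char.code (Bytes.get src i) else 0 in
  let rec go i =
    if i < len then begin
      let g = (byte i lsl 16) lor (byte (i + 1) lsl 8) lor byte (i + 2) in
      let left = len - i in
      Buffer.add_char out alphabet.[(g lsr 18) land 63];
      Buffer.add_char out alphabet.[(g lsr 12) land 63];
      Buffer.add_char out
        (if left > 1 then alphabet.[(g lsr 6) land 63] else '=');
      Buffer.add_char out (if left > 2 then alphabet.[g land 63] else '=');
      go (i + 3)
    end
  in
  go 0

(* Read until [buf] is full or the channel is exhausted. [input] on its
   own may return fewer bytes than requested. Returns the byte count. *)
let fill ic buf =
  let size = Bytes.length buf in
  let rec go pos =
    if pos = size then pos
    else
      match input ic buf pos (size - pos) with
      | 0 -> pos
      | n -> go (pos + n)
  in
  go 0

let encode_channel ?(chunk_size = 3 * 1024) ic oc =
  if chunk_size <= 0 || chunk_size mod 3 <> 0 then
    invalid_arg "encode_channel: chunk_size must be a positive multiple of 3";
  (* Both buffers are allocated once and reused for every chunk. *)
  let chunk = Bytes.create chunk_size in
  let out = Buffer.create (chunk_size / 3 * 4) in
  let rec loop () =
    let n = fill ic chunk in
    if n > 0 then begin
      Buffer.clear out;
      encode_sub chunk ~len:n out;
      Buffer.output_buffer oc out;
      (* A short read means end of input, so only loop on a full chunk. *)
      if n = chunk_size then loop ()
    end
  in
  loop ()
```

Decoding would follow the same shape with chunks that are a multiple of 4 characters long.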

Yaron Minsky

Jul 26, 2014, 1:32:23 PM
to ocaml...@googlegroups.com
I agree that the most general approach is to have a chunk-oriented
decoder, rather than going through an in-channel. That way you can
adapt it for synchronous use, or use it with an asynchronous I/O
library like Async or Lwt. Breaking out the pure-computational piece
of your library from the I/O will generally increase its range of
applicability.

Here's an interface that kind of gets at the idea:

module Encoder : sig
  type t
  val empty : t

  (* Presumably a reusable view onto the encoder's output buffer. *)
  type temp_data =
    private { buffer: Bytes.t
            ; mutable pos: int
            ; mutable len: int
            }

  type add_result =
    | Data of temp_data
    | Error of Error.t
    | Nothing

  val add_bytes
    :  t
    -> Bytes.t
    -> pos:int
    -> len:int
    -> add_result

  val end_of_stream : t -> add_result
end

You could run this without any allocation at all, just re-using the
buffers and the temp_data over and over. One thing that's a little
awkward about the above is that you can't really write out a Bytes.t
without copying it, due to OCaml's copying collector. So you might
want to back this on Bigstrings instead, or, worst case scenario, have
a bit of code duplication to allow you to support both.

With an API of this kind, you should be able to hook it in to Async,
Lwt or OCaml's native channels pretty straightforwardly.
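
As a rough illustration of the chunk-oriented idea, here is one way the encoder state could be implemented. This is a simplified sketch and not the interface above: the caller passes in the destination buffer and gets back a byte count instead of an add_result, the state is mutable (hence create rather than empty), error reporting is omitted, and create, max_output and the ~dst argument are names invented for this example. It uses only the standard library.

```ocaml
module Encoder : sig
  type t
  val create : unit -> t

  (* Upper bound on bytes written by one [add_bytes] call of [len] bytes. *)
  val max_output : int -> int

  (* Encode [len] bytes of the source starting at [pos] into [dst],
     writing from offset 0. Returns the number of bytes written. *)
  val add_bytes : t -> Bytes.t -> pos:int -> len:int -> dst:Bytes.t -> int

  (* Flush the 0 to 2 carried-over bytes, with padding. Writes 0 or 4 bytes. *)
  val end_of_stream : t -> dst:Bytes.t -> int
end = struct
  (* [acc] holds the [pending] (0 to 2) input bytes not yet encoded. *)
  type t = { mutable acc : int; mutable pending : int }

  let alphabet =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

  let create () = { acc = 0; pending = 0 }

  (* At most 2 bytes are carried in, so (len + 2) / 3 full groups. *)
  let max_output len = (len + 2) / 3 * 4

  let add_bytes t src ~pos ~len ~dst =
    let o = ref 0 in
    for i = pos to pos + len - 1 do
      t.acc <- (t.acc lsl 8) lor Char.code (Bytes.get src i);
      t.pending <- t.pending + 1;
      if t.pending = 3 then begin
        (* A full 24-bit group: emit four 6-bit symbols. *)
        for k = 0 to 3 do
          Bytes.set dst (!o + k) alphabet.[(t.acc lsr (18 - (6 * k))) land 63]
        done;
        o := !o + 4;
        t.acc <- 0;
        t.pending <- 0
      end
    done;
    !o

  let end_of_stream t ~dst =
    let n =
      match t.pending with
      | 0 -> 0
      | p ->
        (* Left-align the carried bytes in a 24-bit group, then pad. *)
        let g = t.acc lsl (8 * (3 - p)) in
        Bytes.set dst 0 alphabet.[(g lsr 18) land 63];
        Bytes.set dst 1 alphabet.[(g lsr 12) land 63];
        Bytes.set dst 2 (if p = 2 then alphabet.[(g lsr 6) land 63] else '=');
        Bytes.set dst 3 '=';
        4
    in
    t.acc <- 0;
    t.pending <- 0;
    n
end
```

A caller would size dst with max_output (at least 4 bytes, so end_of_stream fits), call add_bytes for each chunk as it arrives from whatever I/O layer is in use, write out the returned number of bytes, and finish with end_of_stream. Because the encoder never touches a channel, the same code could sit behind synchronous or asynchronous I/O.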

y

Stephen Weeks

Jul 26, 2014, 5:20:27 PM
to ocaml...@googlegroups.com
This reminds me of [Core.Unpack_buffer]

Yaron Minsky

Jul 26, 2014, 9:21:27 PM
to ocaml...@googlegroups.com
Yeah, I agree this is quite close. Here's the API, from Core_kernel,
technically:

https://github.com/janestreet/core_kernel/blob/master/lib/unpack_buffer.mli

I think it's not quite suitable here, because for base-64
encoding the Unpack_buffer API will require more allocation.
In particular, it requires the generation of a Queue of results,
whereas the API I was proposing would allow for zero-allocation
streaming conversions, since you can get away with allocating just two
buffers and filling and refilling them.

All told, I think we could probably use a bit more work on figuring
out better abstractions for handling this class of designs.

y

Anil Madhavapeddy

Jul 29, 2014, 8:19:51 AM
to ocaml...@googlegroups.com
On 26 Jul 2014, at 12:38, Jesper Louis Andersen <jesper.lou...@gmail.com> wrote:

> There are Base64 libraries out there, but they seem to be deeply wrapped up into Batteries.

If you just need a Base64 library (as opposed to wanting to write one to learn Core), then there's a widely used standalone one here:


-anil