Scala ByteString

Richard Wallace

Aug 4, 2012, 10:41:23 PM
to sca...@googlegroups.com
Hey everybody,

I just pushed an initial pass at a ByteString library for Scala.

A ByteString can be created from an Array[Byte] using ByteString.pack.
There are also some functions for reading from InputStreams and
writing to OutputStreams in purefn.bytestring.io and pimps in
purefn.bytestring.syntax.
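A minimal sketch of the idea, assuming a ByteString that wraps a read-only java.nio.ByteBuffer and whose `pack` copies its input. The names `ByteString` and `pack` come from the post, but the body below is a hypothetical illustration, not the purefn.bytestring code.

```scala
import java.nio.ByteBuffer

// Hypothetical sketch: an immutable ByteString over a read-only ByteBuffer.
// Not the purefn.bytestring implementation; only the names follow the post.
final class ByteString private (private val buf: ByteBuffer) {
  // Number of readable bytes.
  def length: Int = buf.remaining()

  // Absolute get, so reads never move the buffer's position.
  def apply(i: Int): Byte = {
    if (i < 0 || i >= length) throw new IndexOutOfBoundsException(i.toString)
    buf.get(buf.position() + i)
  }

  // Copy the bytes out; duplicate() leaves our own position untouched.
  def toArray: Array[Byte] = {
    val out = new Array[Byte](length)
    buf.duplicate().get(out)
    out
  }
}

object ByteString {
  // Defensive copy: later writes to the caller's array cannot leak in.
  def pack(bytes: Array[Byte]): ByteString =
    new ByteString(ByteBuffer.wrap(bytes.clone()).asReadOnlyBuffer())
}
```

Copying in `pack` is what makes the immutability claim hold: without it, the caller could still mutate the array the buffer wraps.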

A ByteString is backed by a java.nio.ByteBuffer and is immutable.
Currently the ByteBuffers are always array-based. I'd like to compare
the performance of array-based buffers with direct buffers, and I
may change that depending on the results. At the very least, there will
probably be specialized functions for reading/writing using direct or
array-based buffers so that you can choose which you want to use.
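Both buffer kinds being compared come from the standard java.nio API: `ByteBuffer.allocate` gives an array-backed (heap) buffer and `ByteBuffer.allocateDirect` gives a direct one. `BufferKinds` is a hypothetical helper name used only for this sketch; because both kinds share the `ByteBuffer` interface, code written against it works with either.

```scala
import java.nio.ByteBuffer

// Sketch of the two ByteBuffer kinds; BufferKinds is a hypothetical helper.
object BufferKinds {
  // Heap buffer: backed by an ordinary Array[Byte] (hasArray is true).
  def heap(size: Int): ByteBuffer = ByteBuffer.allocate(size)

  // Direct buffer: memory outside the Java heap that the JVM makes a best
  // effort to use for native I/O without an extra copy.
  def direct(size: Int): ByteBuffer = ByteBuffer.allocateDirect(size)

  // Works for either kind through the shared ByteBuffer interface.
  def fill(buf: ByteBuffer, bytes: Array[Byte]): ByteBuffer = {
    buf.put(bytes)
    buf.flip() // switch from writing to reading
    buf
  }
}
```

Which kind is faster depends on the workload, which is exactly what the planned benchmarks would have to settle.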

For now, I just wanted to get _something_ done and I'm pretty happy
with the API so far.

In the next week or so I plan to add tests, enumerator/iteratee
support and ByteChannel support.
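For the planned ByteChannel support, one plausible shape is a loop that reads a `ReadableByteChannel` to end-of-stream in fixed-size chunks. This is a hypothetical sketch (`ChannelChunks` and `readAll` are illustrative names); an enumerator would hand each chunk to an iteratee rather than collecting them into a `Vector`, and that part is not shown.

```scala
import java.nio.ByteBuffer
import java.nio.channels.ReadableByteChannel

// Hypothetical sketch: drain a blocking ReadableByteChannel in chunks of at
// most chunkSize bytes. An enumerator/iteratee design would pass each chunk
// on as it arrives instead of accumulating them. The caller closes the channel.
object ChannelChunks {
  def readAll(ch: ReadableByteChannel, chunkSize: Int = 4096): Vector[Array[Byte]] = {
    val buf = ByteBuffer.allocate(chunkSize)
    var chunks = Vector.empty[Array[Byte]]
    var n = ch.read(buf) // -1 signals end-of-stream
    while (n != -1) {
      buf.flip()
      if (buf.hasRemaining()) {
        val chunk = new Array[Byte](buf.remaining())
        buf.get(chunk)
        chunks :+= chunk
      }
      buf.clear() // reuse the same buffer for the next read
      n = ch.read(buf)
    }
    chunks
  }
}
```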

I'd love feedback.

Rich

Roland Kuhn

Aug 6, 2012, 2:19:09 AM
to sca...@googlegroups.com, sca...@googlegroups.com
Hi Richard,

have you had a look at https://github.com/akka/akka/blob/master/akka-actor/src/main/scala/akka/util/ByteString.scala ? It will enter the Scala distribution by way of akka-actor.jar for 2.10 and it would be awesome if we could make it so good that only one implementation is needed (plus a few type classes in scalaz, I guess).


Regards,

Roland Kuhn
Typesafe — The software stack for applications that scale
twitter: @rolandkuhn

Richard Wallace

Aug 6, 2012, 2:24:02 AM
to sca...@googlegroups.com

By "Scala distribution" do you mean the standard library?

Roland Kuhn

Aug 6, 2012, 2:38:23 AM
to sca...@googlegroups.com
As I linked, it is still in the Akka source repository: the Scala distribution for 2.10 will consist of multiple JARs, e.g. scala-library, scala-reflect, scala-actors, akka-actor, …, some of which are built from the scala/scala repository and others not. This is my complicated way of saying that I would call scala-library.jar the “standard library”, but that is my personal view.

So, if we can arrive at a common, high-performance ByteString implementation, then I think it should be moved into the standard library proper, under the scala.collection package. I would presume that such an addition could happen in a minor release.

Regards,

Roland

Richard Wallace

Aug 6, 2012, 3:12:35 AM
to sca...@googlegroups.com
The reason I asked is because the standard lib often seems to be where
things go to die. Not just in Scala, but I've seen it in other
languages too. I'd just as soon keep useful things out of the
language stdlib so that they can be more rapidly improved independent
of the rest of the language.

I'm also not entirely sure that being a part of another library is the
best place for something like this, especially something as large as
Akka. There is something to be said for small libraries that can
evolve quickly.

All that being said, I'd be totally up for creating a common
ByteString that is really really fast. We will probably never get it
quite as good as the Haskell version because we lack array fusion and
some low level memory functions (realloc, memchr, and a few others I
can't think of right now) - but I've been very seriously considering
writing some JNI to fix that problem. Array fusion is the biggest
win, and maybe macros could help make that happen (Paul P posted
something that looked interesting in this regard a while back, but
with very few details).
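To make the array fusion point concrete: composing two maps over a byte array naively allocates an intermediate array, while a fused loop applies both functions in a single pass. The sketch below fuses by hand and shows only the transformation itself; `Fusion` is an illustrative name, and whether macros could generate such loops automatically is the open question raised above.

```scala
// Illustrative sketch (Fusion is a hypothetical name): the same pipeline
// written unfused and hand-fused.
object Fusion {
  // Unfused: the first map allocates a temporary array, the second map
  // walks it and allocates the result.
  def mapTwice(xs: Array[Byte], f: Byte => Byte, g: Byte => Byte): Array[Byte] =
    xs.map(f).map(g)

  // Fused: one loop and one output array, no intermediate allocation.
  def mapFused(xs: Array[Byte], f: Byte => Byte, g: Byte => Byte): Array[Byte] = {
    val out = new Array[Byte](xs.length)
    var i = 0
    while (i < xs.length) {
      out(i) = g(f(xs(i)))
      i += 1
    }
    out
  }
}
```

Both functions return the same bytes; fusion only changes how many arrays are allocated and traversed.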

Tony Morris

Aug 6, 2012, 3:23:49 AM
to sca...@googlegroups.com

I totally agree that if it is useful, it belongs. All you need is three or so instances and a few handy combinators to make the case.
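As a rough illustration of "three or so instances and a few handy combinators", here is a self-contained sketch. It defines its own minimal `Monoid`, `Equal` and `Show` traits instead of depending on scalaz (which has its own versions), and uses a hypothetical `Bytes` wrapper as a stand-in for a real ByteString.

```scala
// Hypothetical stand-in for a ByteString; not a real library type.
final case class Bytes(value: Vector[Byte])

// Minimal local type classes, loosely modelled on the scalaz ones.
trait Monoid[A] { def zero: A; def append(x: A, y: A): A }
trait Equal[A] { def equal(x: A, y: A): Boolean }
trait Show[A] { def show(a: A): String }

object Bytes {
  implicit val bytesMonoid: Monoid[Bytes] = new Monoid[Bytes] {
    val zero: Bytes = Bytes(Vector.empty)
    def append(x: Bytes, y: Bytes): Bytes = Bytes(x.value ++ y.value)
  }
  implicit val bytesEqual: Equal[Bytes] = new Equal[Bytes] {
    def equal(x: Bytes, y: Bytes): Boolean = x.value == y.value
  }
  implicit val bytesShow: Show[Bytes] = new Show[Bytes] {
    // Lower-case hex, e.g. Bytes(Vector(1, -1)) shows as "01ff".
    def show(a: Bytes): String = a.value.map(b => f"${b & 0xff}%02x").mkString
  }
}

// A combinator that works for any Monoid, including Bytes.
object Combinators {
  def concatAll[A](xs: Seq[A])(implicit m: Monoid[A]): A =
    xs.foldLeft(m.zero)(m.append)
}
```

Putting the instances in the companion object places them in implicit scope, so callers get them without any imports.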

It is a shame that it is String which gets special language syntax support in Scala (and only slightly better in Haskell with [Char] mod OverloadedStrings). I totally do not buy the "yeah but Java interop" lame excuse.

Assuming Cord is a reasonable standard for a general-purpose String and has the library support to go with it, I think an IsString type class would be a great way to give library users the best choice. That is, I would love to defer the choice of string type behind a generalisation and let users decide whether to use String at their own peril, or perhaps Cord or ByteString.Char if they are willing to forgo the syntax support in exchange for a useful string.
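A sketch of what an IsString-style generalisation might look like in Scala, modelled loosely on Haskell's `IsString` class: library code is written against any `S` that has an instance, and the caller picks the concrete type. `IsString`, `Utf8Bytes` and `Greeting` are hypothetical names for this illustration; `Utf8Bytes` stands in for something like Cord or ByteString.Char.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical IsString-style type class: "build an S from a String".
trait IsString[S] {
  def fromString(s: String): S
}

// Illustrative byte-based string type, standing in for Cord or ByteString.Char.
final case class Utf8Bytes(bytes: Vector[Byte])

object IsString {
  implicit val stringIsString: IsString[String] = new IsString[String] {
    def fromString(s: String): String = s
  }
  implicit val utf8IsString: IsString[Utf8Bytes] = new IsString[Utf8Bytes] {
    def fromString(s: String): Utf8Bytes =
      Utf8Bytes(s.getBytes(StandardCharsets.UTF_8).toVector)
  }
}

object Greeting {
  // Library code that never commits to a concrete string type.
  def greet[S](name: String)(implicit ev: IsString[S]): S =
    ev.fromString("hello, " + name)
}
```

Without literal syntax support this still needs an explicit conversion point (`fromString`), which is the gap the macro suggestion below is aimed at.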

Roland Kuhn

Aug 6, 2012, 3:25:17 AM
to sca...@googlegroups.com
I see what you’re getting at, and in general I find myself in agreement, but in this specific case I’d beg to differ: ByteString is a very stable problem, with all properties known up-front, which means that my idea would be to write a good solution once and then be done with it. The “quickly evolving” stage surely comes long before the “consider putting into standard library” stage. And after that, it will be pretty good by definition, and if a 2% performance improvement can be achieved, that’s fine, but that can then also wait some time for the next update release.

If you want to be sufficiently fanatic (i.e. go down the JNI route), then it would need to be a separate library in any case, since JNI is just horrible to support in general.

Which brings me to the conclusion that we might be aiming for different things after all: I’m shooting for a solid and fast implementation using only JVM-bytecode, so that it can enter the standard library proper, while you seem to be going for the fastest possible at all costs.

Jason Zaugg

Aug 6, 2012, 3:26:30 AM
to sca...@googlegroups.com
On Mon, Aug 6, 2012 at 9:23 AM, Tony Morris <tonym...@gmail.com> wrote:
> I totally agree that if it useful, it belongs. All you need is three or so
> instances and a few handy combinators to compel the case.
>
> It is a shame that it is String which gets special language syntax support
> in Scala (and only slightly better in Haskell with [Char] mod
> OverloadedStrings). I totally do not buy the "yeah but Java interop" lame
> excuse.

Actually with a macro you could convert a string literal to another
representation at compile time.
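Short of a macro, the custom string interpolators added in Scala 2.10 already give literal-like syntax. The sketch below (the `b` prefix and the `ByteLiterals` object are hypothetical names) encodes the interpolated text as UTF-8 at run time on every evaluation; a macro version could perform the encoding at compile time instead.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical b"..." interpolator: encodes the interpolated text as UTF-8.
// This runs at run time; a macro could do the encoding at compile time.
object ByteLiterals {
  implicit class ByteInterpolator(private val sc: StringContext) extends AnyVal {
    def b(args: Any*): Array[Byte] =
      sc.s(args: _*).getBytes(StandardCharsets.UTF_8)
  }
}
```

Usage: after `import ByteLiterals._`, writing `b"n=$n"` yields the UTF-8 bytes of the interpolated string.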

-jason

Roland Kuhn

Aug 6, 2012, 3:30:11 AM
to sca...@googlegroups.com
Since I’m not aware of array literals at the JVM level, how much would you gain? The other representation would have to result in byte-code which reconstructs the same thing at runtime.

Regards,

Roland Kuhn

Tony Morris

Aug 6, 2012, 3:34:34 AM
to sca...@googlegroups.com
Well that would be even more awesome if we could keep it as flexible as
a type-class but with all the other benefits of macros.

--
Tony Morris
http://tmorris.net/


Richard Wallace

Aug 6, 2012, 4:01:37 AM
to sca...@googlegroups.com
I'd like to get as fast as possible before going down the JNI path
because I agree that it would be a PITA to support. I also agree that
_eventually_ it would make sense to make this part of the standard
lib, once things stabilize. At the same time, I think it makes sense
for a ByteString library to be its own library, not a smaller part of
something larger.

TBH I have no idea how my current implementation of
ByteString measures up performance-wise. Like I said before, I just wanted to get
_something_ up. I'm writing tests now and then I plan on doing some
performance testing and trying out some different implementations.
When I started I went down a similar path to Rúnar's Cord
implementation, using a Rope[Byte]. I ran into some problems with
that and thought that ByteBuffers would make more sense because of the
IO operations involved. Looking at it again, I may change the impl to
use Rope[ByteBuffer], since I couldn't make use of memchr or other
low-level functions that I was hoping to be able to use. But before I do
something like that I would very much like to get some performance
numbers.
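A rough sketch of the Rope[ByteBuffer] direction, assuming the simplest possible structure: a flat `Vector` of read-only buffer chunks, so concatenation shares chunk references instead of copying bytes (a real rope would typically use a balanced tree). `ChunkedBytes` and `fromArray` are hypothetical names, and `indexOf` is an ordinary JVM loop standing in for memchr, which plain bytecode cannot call.

```scala
import java.nio.ByteBuffer

// Hypothetical sketch of a rope-like byte string: a flat Vector of read-only
// ByteBuffer chunks (a real rope would typically use a balanced tree).
final class ChunkedBytes private (private val chunks: Vector[ByteBuffer]) {
  val length: Int = chunks.map(_.remaining()).sum

  // Concatenation shares chunk references; no bytes are copied.
  def ++(that: ChunkedBytes): ChunkedBytes = new ChunkedBytes(chunks ++ that.chunks)

  // memchr-style search written as a plain JVM loop: index of the first
  // occurrence of target across all chunks, or -1 if absent.
  def indexOf(target: Byte): Int = {
    var offset = 0
    var ci = 0
    while (ci < chunks.length) {
      val c = chunks(ci)
      var i = c.position()
      while (i < c.limit()) {
        if (c.get(i) == target) return offset + (i - c.position())
        i += 1
      }
      offset += c.remaining()
      ci += 1
    }
    -1
  }
}

object ChunkedBytes {
  // Copies the input once, then wraps it as a single read-only chunk.
  def fromArray(bytes: Array[Byte]): ChunkedBytes =
    new ChunkedBytes(Vector(ByteBuffer.wrap(bytes.clone()).asReadOnlyBuffer()))
}
```

Because every read uses absolute `get`, the shared buffers are never repositioned, so chunks can safely be shared between instances.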

Rich

Tony Morris

Aug 6, 2012, 4:07:29 AM
to sca...@googlegroups.com
I don't know if you have ever supported a JNI library, but I have
(IBM's rough equivalent to keytool), and I must caution you: it becomes a
full-time job.


Richard Wallace

Aug 6, 2012, 4:36:45 AM
to sca...@googlegroups.com
I already have too many full-time jobs. I'd definitely like to
squeeze every last drop of blood out of the JVM before going down that
path, but I don't want to rule anything out yet.
