Experience report on LFE after first major sprint.


anu...@cinova.co

Dec 2, 2014, 10:21:22 PM
to lisp-flavo...@googlegroups.com
As previously posted, we have embarked on developing a substantial web application using LFE. We just finished our first major sprint and I thought this would be a good time to report on my experience. Our stack is built around Yaws and we are using MySQL as the database. We're not using any templating language, but instead relying on an API-based architecture with all front-end interactions in JavaScript+HTML.

We are a small team of developers (5) who are trying to get a pretty major web application launched in 12 weeks. This is a challenge in its own right, but as we were evaluating web platforms, we ended up rejecting most others in favor of Erlang. The choice was made collectively by all the devs, who evaluated all of our different options (the others being PHP, Node.js, Clojure, Scala, Haskell). We picked Erlang primarily for the scalability, reliability and ecosystem support.

Of the team, I am the only experienced Lisper (Scheme, Racket, CL). I was naturally attracted to LFE because of the philosophy of staying close to Erlang, but still being a Lisp. I further made the decision that at least some people would use LFE for development, perhaps mixed with Erlang. So we now have a team that is learning Erlang and LFE at the same time. The devs are all very competent, but this learning curve is our biggest risk at this point. It is my hope that LFE will help us transcend the learning curve. There are some indications that this is happening, but I will know more for sure in a few more days. 

So, now on to my opinion of LFE so far:

The good:
1. I am impressed at the correctness of the implementation. I have not seen unpredictable behavior yet and have unearthed no major correctness issues in the language implementation. Kudos to the team for that!
2. I like the closeness to Erlang. Being able to mindlessly call Erlang modules is a huge plus. Including .hrl files and having the records available in the defrecord form is very convenient. While I am no Erlang expert, I find converting cut and pasted Erlang code to LFE quite easy. So far, I only had a little bit of trouble with Bit comprehensions on that front. But, I suspect that is more because I am new to Erlang. 
3. The compiler seems fast enough, but I'll know more as the number of source files grows.

The moderately good: 
1. Documentation ... It exists, but things are hard to find. I realize this is a work in progress and hopefully we will be able to help in some ways. 
2. The LFE repl. I was thankful for it for understanding behavior which was not in the documentation or which I was too lazy to look up. Working with the shell is not smooth, though; I have listed the issues below.
3. Unit test framework. Useful for many cases, but it took me a while to get it working correctly. I had trouble understanding test outputs, and the inability to isolate runs to specific tests was a little painful. 

The frustrations:  Please bear with me on this. Not all of this relates to LFE entirely, but I feel it better to list it here in case LFE can have better solutions.  The section is bigger because I'm trying to give all the gory details. 

1. The macro system. About 80% of the code I wrote in the past 10 days is for macros. It was a painful experience. 
     -- The biggest problem is that the macro system does not report errors properly. If an expansion encounters an error, all that the compiler reports is something like 'could not expand form' or, even more frustrating, 'bad application'.
         No other information is provided about what the error was and where it arose. This was true in the LFE repl as well. It would REALLY REALLY help if the underlying error were reported.
     -- (macroexpand ...) in the repl is broken, or I don't know how to use it. I never once was able to get a macroexpanded output. 
     -- I got around these issues by writing helper functions which I could debug in the repl, but it took me a while to settle on this methodology. It makes my macros a trivial shell over a helper function, which makes for uglier code.

    Note, however, the macro system works correctly once the macros are debugged. There are no issues with correctness. 

2. Strings/Binary complexity. I realize this is more a problem with Erlang's idiotic language design decision, but it flows into LFE and caused too many frustrations. I would randomly see strings looking like "(129 33 83 121)" in my database for reasons I still don't understand. I began brute-force lists:flatten, or my own utility function ensure-string around anything that is supposed to return a string. I think lfe_io:format1 was a big offender here, but I'm not sure about it. I suspect I don't understand it very well at all. 

In any case, the reason I list it here, is that LFE has a chance to undo the Erlang idiocy. My ideal situation here would be that LFE force all strings to be Erlang binary always. I don't know how to do this safely and still maintain interoperability. 

3. Erlang's poor formatted i/o. "~p" is woefully inadequate. One of my resolutions is to contribute a CL compatible format function to LFE. Let me know if someone else is already working on this or there is one already out there. 

One other thing on my wish-list would be that LFE be a little more opinionated on syntax choices. Allowing multiple ways to define functions etc., while helpful to lazy old goats like me, probably complicates the language more than necessary. It may be best left to user-defined macros to provide alternatives. 

Anyhow, that is all for now. More as things develop. Congratulations on getting LFE this far. I am working on a document called "LFE for Erlang programmers", which I will publish to this group as soon as it reaches some level of stability. 

Thanks again!

Anurag.







Fred Hebert

Dec 3, 2014, 8:19:17 AM
to lisp-flavo...@googlegroups.com
On 12/02, anu...@cinova.co wrote:
> 2. Strings/Binary complexity. I realize this is more a problem with
> Erlang's idiotic language design decision, but it flows into LFE and caused
> too many frustrations. I would randomly see strings looking like "(129 33
> 83 121)" in my database for reasons I still don't understand. I began
> brute-force lists:flatten, or my own utility function ensure-string around
> anything that is supposed to return a string. I think lfe_io:format1 was a
> big offender here, but I'm not sure about it. I suspect I don't understand
> it very well at all.
>

So what you noticed was not Erlang idiocy, it was Erlang smarts. I mean,
outside the fact that lists vs. strings is a bit confusing, what you
noticed (if it's solved through lists:flatten) was the presence of
iolists and iodata.

From Learn You Some Erlang (http://learnyousomeerlang.com/buckets-of-sockets#io-lists)

A string is a bit like a linked list of integers: for each character,
you've got to store the character itself plus a link towards the rest of
the list. Moreover, if you want to add elements to a list, either in the
middle or at the end, you have to traverse the whole list up to the
point you're modifying and then add your elements. This isn't the case
when you prepend, however:

A = [a]
B = [b|A] = [b,a]
C = [c|B] = [c,b,a]

In the case of prepending, as above, whatever is held into A or B or
C never needs to be rewritten. The representation of C can be seen
as either [c,b,a], [c|B] or [c|[b|[a]]], among others. In the last
case, you can see that the shape of A is the same at the end of the
list as when it was declared. Similarly for B. Here's how it looks
with appending:

A = [a]
B = A ++ [b] = [a] ++ [b] = [a|[b]]
C = B ++ [c] = [a|[b]] ++ [c] = [a|[b|[c]]]

Do you see all that rewriting? When we create B, we have to rewrite
A. When we write C, we have to rewrite B (including the [a|...] part
it contains). If we were to add D in a similar manner, we would need
to rewrite C. Over long strings, this becomes way too inefficient,
and it creates a lot of garbage left to be cleaned up by the Erlang
VM.
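
As a rough illustration of the cost difference described above, here is a minimal sketch (the module and function names are made up for this example and are not from the book). Both functions build the same list, but the first prepends and reverses once at the end, while the second appends with ++ and so copies the accumulator on every step:

```erlang
-module(build_demo).
-export([build_prepend/1, build_append/1]).

%% Builds [1..N] by prepending to an accumulator and reversing once.
%% Each step is O(1); the final reverse is a single O(N) pass.
build_prepend(N) when is_integer(N), N >= 0 ->
    build_prepend(1, N, []).

build_prepend(I, N, Acc) when I > N -> lists:reverse(Acc);
build_prepend(I, N, Acc) -> build_prepend(I + 1, N, [I | Acc]).

%% Builds the same list by appending with ++.
%% Each step copies the whole accumulator, so total work grows as O(N^2).
build_append(N) when is_integer(N), N >= 0 ->
    build_append(1, N, []).

build_append(I, N, Acc) when I > N -> Acc;
build_append(I, N, Acc) -> build_append(I + 1, N, Acc ++ [I]).
```

The results are identical; only the amount of copying (and garbage) differs, which is the point the passage is making.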

...

In these cases, IO lists are our saviour. IO lists are a weird type
of data structure. They are lists of either bytes (integers from 0
to 255), binaries, or other IO lists. This means that functions that
accept IO lists can accept items such as
[$H, $e, [$l, <<"lo">>, " "], [[["W","o"], <<"rl">>]] | [<<"d">>]].
When this happens, the Erlang VM will just flatten the list as it
needs to do it to obtain the sequence of characters Hello World.

IoLists will be accepted by most Erlang functions (sockets, io module,
file module, re module, unicode module, etc.) and will allow you to do
appending (or splicing) in a waaay more efficient manner than having to
do an operation, then flatten, then do an operation, etc.

It saves you a whole lot of runtime complexity, and as long as you
perceive the iodata as an opaque data type (meaning you don't go and
iterate all willy-nilly down the data structure which makes sense in the
unicode age), you should be able to have a good time with them at a
cheaper cost.
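
To make the "opaque data type" idea concrete, here is a minimal sketch (module and function names are hypothetical, chosen only for this example). The first function assembles a reply from pieces without copying any of them, and the second flattens it exactly once, at the point where a contiguous binary is actually needed:

```erlang
-module(iolist_demo).
-export([greeting/1, to_binary/1]).

%% Returns an iolist: a nested mix of binaries, strings and characters.
%% Nothing is flattened or copied here; Name may itself be iodata.
greeting(Name) ->
    [<<"Hello, ">>, Name, [$!, "\n"]].

%% Flattens once, only where a contiguous binary is required
%% (for example just before storing the value in a database).
to_binary(IoData) ->
    erlang:iolist_to_binary(IoData).
```

Calling iolist_demo:to_binary(iolist_demo:greeting("World")) yields <<"Hello, World!\n">>. If the result is only going to a socket or a file, the iolist can be handed over as-is and the flattening step skipped entirely.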

Regards,
Fred.

anu...@cinova.co

Dec 3, 2014, 11:06:39 AM
to lisp-flavo...@googlegroups.com

Sorry, a clarification: What I meant to say is that when I tried to print what I assumed were strings (which I do know are lists), they got printed out as lists instead of strings for no apparent reason. It is this behavior that I cannot explain.

BTW, I still maintain it is an idiocy :-)

A.

Fred Hebert

Dec 3, 2014, 11:14:06 AM
to lisp-flavo...@googlegroups.com
On 12/03, anu...@cinova.co wrote:
>
> Sorry a clarification: What I meant to say is that when I tried to print
> what I assumed were strings (which I do know are lists), they got printed
> out as lists instead of strings for no apparent reason. It is this behavior
> that I cannot explain.
>
> BTW, I still maintain it is an idiocy :-)
>
> A.

That could have been because they contained non-printable characters.
That may happen specifically if you're using unicode and using '~p' or
'~s' to print them, which assumes lists or binaries of bytes.

I.e. ~p is unreliable because you're printing a string text
representation, not actual text data.

~s expects lists of bytes and prints them as such -- the implicit
encoding scheme expected is latin1 (possibly ISO-8859-1, but I don't
remember there).

~ts is to be used to print unicode strings or iolists or iodata. This
will allow you to print lists of codepoints as a unicode string, or
binaries encoded in utf8.
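
A small sketch of the difference between these control sequences, using only io_lib:format/2 and lists:flatten/1 from the standard library (the module and function names are invented for this example):

```erlang
-module(format_demo).
-export([show/2]).

%% Formats Term with the given control sequence and returns a flat string.
%% io_lib:format/2 itself returns a possibly deep character list, so the
%% result is flattened here purely to make it easy to inspect.
show(Directive, Term) ->
    lists:flatten(io_lib:format(Directive, [Term])).
```

With this helper, format_demo:show("~w", "abc") gives "[97,98,99]" and format_demo:show("~p", "abc") gives "\"abc\"", because every element is a printable character. A list such as [129,33,83,121] contains a non-printable value (129), so ~p falls back to the plain list form "[129,33,83,121]", which looks very much like the "(129 33 83 121)" reported earlier in the thread. For code points above 255, ~ts accepts the list while plain ~s rejects it with badarg.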

Duncan McGreggor

Dec 3, 2014, 11:15:28 AM
to lisp-flavo...@googlegroups.com
Anurag, I know Fred's (amazing!) explanation was a bit long (and therefore awesome!), but he did explain the reason that you got the behaviour: because Erlang was doing the smart thing and not manipulating lists inefficiently.

If you need them in a particular form, just wait until the step right before you need that form, and do your final formatting then.

Also, you might consider using binary instead of string, in which case that last-step conversion is as simple as (erlang:iolist_to_binary ...)

Keep up with the LFE!

d

P.S. You'll get used to the "string" situation in Erlang :-) In a few months, it won't even bother you. Really!

--
You received this message because you are subscribed to the Google Groups "Lisp Flavoured Erlang" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lisp-flavoured-e...@googlegroups.com.
To post to this group, send email to lisp-flavo...@googlegroups.com.
Visit this group at http://groups.google.com/group/lisp-flavoured-erlang.
For more options, visit https://groups.google.com/d/optout.

anu...@cinova.co

Dec 3, 2014, 11:37:11 AM
to lisp-flavo...@googlegroups.com
Thanks .. That is definitely helpful! 

A.

anu...@cinova.co

Dec 3, 2014, 12:01:25 PM
to lisp-flavo...@googlegroups.com
Thanks Duncan. 

I think the biggest problem I have is that I don't have a coding hygiene figured out around strings. Keeping things in binary throughout the program makes sense, but I don't know yet how I'll trip up against various built-ins that don't expect a binary.

Anyhow, the reason I brought this up in this discussion forum is to really put forth the idea that since LFE is a language processing system, it can actually choose to deal with strings differently than Erlang. There are advantages to thinking about lists and strings interchangeably in many contexts, but choosing to implement strings as lists is something you do in a homework assignment. Erlang made it part of the base language and so far I have only seen negative consequences of this. As you suggested, programmers are forced to develop a discipline of using binaries and putting converters at the boundary of their system. For example, in the string-heavy web-app world, the fastest Erlang library for JSON encode/decode is the Jiffy library, which deals exclusively with binaries for performance reasons. This feels artificial and something the base language should help with.

Other modern lisps (CL, Racket, Clojure) have unified strings and lists (and vectors) through the concept of sequences, where a range of functions is defined to efficiently implement sequence manipulations where they make sense, but without conflating the representations. In CL, this was rather easy to incorporate in the language because of its excellent model of generic functions. LFE is joined at the hip with Erlang, but as an emerging Lisp, I think LFE has an opportunity to further improve upon Erlang's design.

Anurag. 

Duncan McGreggor

Dec 3, 2014, 12:29:10 PM
to lisp-flavo...@googlegroups.com
A couple of thoughts here:

1) LFE can't really improve upon Erlang's design because it's built on top of the Erlang VM,  not instead of it. LFE can't make Erlang's internals different (and thus, e.g., can't make string processing more efficient, etc.).

However! There is hope :-)

2) There's no reason you can't create an lstring module for LFE that becomes the standard for the community, providing all the features that you want. Elixir did something similar, iirc. Under the covers, you would still want your library to play by the rules for efficient Erlang, but you could wrap that in exported functions or macros so that the user/dev would never have to worry about it. They'd just use your API.

Which, when it really comes down to it, is playing to a Lisp's strengths. 

d



anu...@cinova.co

Dec 3, 2014, 12:57:18 PM
to lisp-flavo...@googlegroups.com
Both good points. In addition, I would also ideally like that LFE make binary the default representation for anything in double-quotes. In other words, flip the default. This may impact some things like ++ or other dependencies on the VM, but most functions that take IoData will still be happy, and the few places where a list representation is desired can be explicitly bin_to_list'ed. I also admit that I don't fully understand all of this yet :-) so I don't know how practical the suggestion is.

Also, does LFE have reader macros? They will help further in making this a user/framework level choice instead of being built into the language. (And keep people like me from harassing you all :-)

A. 

anu...@cinova.co

Dec 3, 2014, 1:03:18 PM
to lisp-flavo...@googlegroups.com
ALL:

I want to apologize in advance if I come across too harshly when I use terms like "idiotic" and "homework assignment". As many in our profession tend to do, I attribute Programming Languages with independent existence without associating them with the people behind them. 

Please treat this as my whimsical style of debate. I mean no disrespect to anyone. 

A.

Duncan McGreggor

Dec 3, 2014, 1:40:59 PM
to lisp-flavo...@googlegroups.com
On Wed, Dec 3, 2014 at 11:57 AM, <anu...@cinova.co> wrote:
> Both good points. In addition, I would also ideally like that LFE make binary the default representation for anything in double-quotes.

Huh, that's an interesting proposition. I like it.

Robert, what do you think?

> In other words, flip the default. This may impact some things like ++ or other dependencies on the VM, but most functions that take IoData will still be happy, and the few places where a list representation is desired can be explicitly bin_to_list'ed. I also admit that I don't fully understand all of this yet :-) so I don't know how practical the suggestion is.

Yeah, I'm not sure what the full implications would be, but perhaps a breaking-change would be acceptable as we move to the 1.x series for LFE :-)

If we do that, though, I'd like to tack on a UX request: provide formatting for binary lists in the REPL similarly to what Erlang does in its shell. Instead of #B(int int int ...), display #B("...").

> Also, does LFE have reader macros? They will help further in making this a user/framework level choice instead of being built into the language. (And keep people like me from harassing you all :-)

I don't believe so. I think Robert has discussed in the past some of the difficulties in doing that with LFE.

d

Duncan McGreggor

Dec 3, 2014, 1:42:52 PM
to lisp-flavo...@googlegroups.com
For me, this has not been a problem. From your first email your personality was clear -- no malice or flippancy, just passion, interest, and experience in other languages :-)

Your emails are great -- thanks for contributing!

d


Dreki Þórgísl

Dec 3, 2014, 3:56:33 PM
to lisp-flavo...@googlegroups.com
On Wed, Dec 3, 2014 at 12:40 PM, Duncan McGreggor <dun...@cogitat.io> wrote:

> On Wed, Dec 3, 2014 at 11:57 AM, <anu...@cinova.co> wrote:
> > Both good points. In addition, I would also ideally like that LFE make binary the default representation for anything in double-quotes.
>
> Huh, that's an interesting proposition. I like it.
>
> Robert, what do you think?
>
> > In other words, flip the default. This may impact some things like ++ or other dependencies on the VM, but most functions that take IoData will still be happy, and the few places where a list representation is desired can be explicitly bin_to_list'ed. I also admit that I don't fully understand all of this yet :-) so I don't know how practical the suggestion is.
>
> Yeah, I'm not sure what the full implications would be, but perhaps a breaking-change would be acceptable as we move to the 1.x series for LFE :-)
>
> If we do that, though, I'd like to tack on a UX request: provide formatting for binary lists in the REPL similarly to what Erlang does in its shell. Instead of #B(int int int ...), display #B("...").
>
> > Also, does LFE have reader macros? They will help further in making this a user/framework level choice instead of being built into the language. (And keep people like me from harassing you all :-)
>
> I don't believe so. I think Robert has discussed in the past some of the difficulties in doing that with LFE.

A 2-krone contribution/paste:

In summary: not yet, they're hard, but possibly in the future. Robert has given them some thought.

~dþ
 


anu...@cinova.co

Dec 4, 2014, 11:42:08 AM
to lisp-flavo...@googlegroups.com
Thanks, Duncan!

Robert Virding

Dec 4, 2014, 5:07:57 PM
to lisp-flavo...@googlegroups.com
Hi, I am on the road and have not had much time to respond. Here are some comments to your initial comments:

- Yes, the handling of errors in macros is not good and is on my list of things to do. My goal is to report what went wrong in the macro evaluation and the macro call which caused the error.

- (macroexpand ...) in the shell by default only expands the pre-defined macros. To get it to expand other macros you need to add an extra argument which is the environment which contains macros you wish to use. In the shell $ENV is this environment so you can do (macroexpand ... $ENV) to expand all macros defined in the shell. Maybe this should be the default.

- I will take strings as lists later. I will just say that this is common in many functional languages and actually a very good way of representing strings, but different from what many are used to.

- What exactly are you after in the formatting? One problem I see with CL format is that it contains too much junk, not just the kitchen sink but most of the house as well.

- User defined reader macros are of course possible but not with the current way of reading input. It works in a more traditional fashion of first tokenising the input and then parsing the tokens. This would only be able to handle reader macros in a very limited way.

I will get back later,

Robert

anu...@cinova.co

Dec 4, 2014, 5:44:41 PM
to lisp-flavo...@googlegroups.com
Thanks for the detailed responses! My comments to some points below.

Best,

Anurag.


On Thursday, December 4, 2014 2:07:57 PM UTC-8, Robert Virding wrote:
> Hi, I am on the road and have not had much time to respond. Here are some comments to your initial comments:
>
> - Yes, the handling of errors in macros is not good and is on my list of things to do. My goal is to report what went wrong in the macro evaluation and the macro call which caused the error.
>
> - (macroexpand ...) in the shell by default only expands the pre-defined macros. To get it to expand other macros you need to add an extra argument which is the environment which contains macros you wish to use. In the shell $ENV is this environment so you can do (macroexpand ... $ENV) to expand all macros defined in the shell. Maybe this should be the default.

Got it! This will certainly help things a lot.

> - I will take strings as lists later. I will just say that this is common in many functional languages and actually a very good way of representing strings, but different from what many are used to.
>
> - What exactly are you after in the formatting? One problem I see with CL format is that it contains too much junk, not just the kitchen sink but most of the house as well.


While it is true that format is overly loaded, I specifically have a need for complex formatting because we do extensive (non-erlang/lfe source) code generation. At various times, I find myself craving format magic.  

Robert Virding

Dec 12, 2014, 11:43:50 AM
to lisp-flavo...@googlegroups.com
I want to add a comment about having strings as lists rather than binary data. This is quite common in functional languages and actually has many advantages.

- Having them as lists means that you automatically get a large number of functions which work on strings.
- You can completely avoid all the problems with encoding, it can be a list of unicode codepoints.
- It is also usually *faster* working with lists than binaries as there is often a lot less copying of data involved. This may sound strange but check out this thread where a guy did some measurements and found the results surprising:

http://erlang.org/pipermail/erlang-questions/2012-October/070067.html

Others were surprised as well but they shouldn't have been. Note that this is not due to bad implementation of erlang binaries, in fact the binary implementation does some work to optimise concatenating binaries.

Also Erlang has the concept of an iolist, which is a nested list of integers, lists and binaries which don't need to be concatenated or flattened if they are just going to be output. It means that you can concatenate two strings, either list or binary, by doing [String1|String2], and append a character to the end of a string by doing [String,Char] (the tail of an iolist has to be a list or a binary, so the character goes in as an element rather than as the tail). If you need to work with the string *then* you flatten it, either by using erlang:iolist_to_binary or, if you are working with UTF-8 encoded binaries, unicode:characters_to_binary (or unicode:characters_to_list). This is of course *much* more efficient than copying binary strings every time you add something to them.
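
A minimal sketch of that workflow, assuming nothing beyond the standard erlang and unicode modules (the module and function names below are hypothetical). Strings are joined by nesting rather than copying, and flattened once at the end. Note that erlang:iolist_to_binary/1 treats integers as bytes, whereas unicode:characters_to_binary/1 treats them as code points and emits UTF-8:

```erlang
-module(chardata_demo).
-export([join/2, append_char/2, to_utf8/1]).

%% "Concatenates" two strings (lists or binaries) in constant time by
%% nesting them in a new list; neither argument is copied.
join(A, B) ->
    [A, B].

%% Adds one character at the end, again without copying the string.
append_char(String, Char) when is_integer(Char) ->
    [String, Char].

%% Flattens chardata into a UTF-8 binary, once, when it is needed.
%% Integers are code points here, so values above 255 are accepted.
to_utf8(Chardata) ->
    case unicode:characters_to_binary(Chardata) of
        Bin when is_binary(Bin) -> Bin;
        Error -> erlang:error({bad_chardata, Error})
    end.
```

For example, chardata_demo:to_utf8([229]) returns the two-byte UTF-8 sequence <<195,165>>, while erlang:iolist_to_binary([229]) returns the single byte <<229>>. Which of the two is right depends on what the consumer at the boundary expects.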

This is one reason for example that the io_lib functions don't bother flattening their return strings, they leave it to the caller to decide what post-processing, if any, needs to be done.

One problem with strings as lists is that people are just not used to them. For a more philosophical discussion see this thread:

http://erlang.org/pipermail/erlang-questions/2012-December/071399.html

One problem with Erlang/OTP's handling of strings and characters is that it is latin1 based and not unicode based. It is slowly migrating but has not reached the end of the path yet. This has nothing to do with strings as lists.

This is why LFE uses strings as lists, so it can integrate seamlessly with Erlang/OTP (and because I like it :-)). If we were to choose strings as binaries then we would need a set of interface libraries for OTP so we could talk binary strings while Erlang talks list strings. Elixir chose this path.

Robert

Duncan McGreggor

Dec 12, 2014, 9:25:47 PM
to lisp-flavo...@googlegroups.com
Thanks for the extra info, Robert -- that's good to know. It was a perfect complement to Fred's email :-)

I was surprised to read about the performance of lists vs. binaries ... and I now take back my suggestion to Anurag about them :-) (using binaries, that is).

Robert and Fred, one last question: can either of you recommend a good example of string/list usage/manipulation in an Erlang library? Something that could be used as a reference point on how to work with them in the right, most elegant way?

Thanks again :-)

d





Robert Virding

Dec 13, 2014, 12:54:47 PM
to lisp-flavo...@googlegroups.com
After a bit of thinking, a reasonable syntactic extension would be to add a construct for binary UTF-8 encoded strings. Either #"åäö ð" or #b"åäö ð" would be reasonable, though I prefer the first. Prettyprinting would also then generate these if the binary was a UTF-8 encoded binary string.
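
For reference, here is a small sketch of what "UTF-8 encoded" means at the byte level for the example string, using unicode:characters_to_binary/3 (the module and function names are made up for illustration, and the string is written as a list of code points to sidestep source-file encoding questions):

```erlang
-module(encoding_demo).
-export([sample/0, latin1_bytes/1, utf8_bytes/1]).

%% The code points of "åäö ð".
sample() ->
    [229, 228, 246, 32, 240].

%% One byte per character; only works for code points up to 255.
latin1_bytes(Codepoints) ->
    unicode:characters_to_binary(Codepoints, unicode, latin1).

%% UTF-8: each of these non-ASCII characters takes two bytes.
utf8_bytes(Codepoints) ->
    unicode:characters_to_binary(Codepoints, unicode, utf8).
```

Here latin1_bytes(sample()) is <<229,228,246,32,240>> and utf8_bytes(sample()) is <<195,165,195,164,195,182,32,195,176>>, so a literal form for UTF-8 binary strings would denote the second, nine-byte binary.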

Thoughts? Alternatives?

Robert

Duncan McGreggor

Dec 13, 2014, 1:15:28 PM
to lisp-flavo...@googlegroups.com
Oh, nice -- this would be helpful... string encoding can be fairly tedious, and thus this looks like it would be very helpful.

Why do you prefer the first syntax? Easier to parse for LFE?

The second one makes more intuitive sense to me (visually looking at it, that is) ... but I'm not sure how unambiguously it would be parsed.

This is a good idea :-)

d



anu...@cinova.co

Dec 14, 2014, 6:30:41 PM
to lisp-flavo...@googlegroups.com
I'd like to throw in a vote for #"..." , for no other reason than it is a little bit cleaner to look at.

A. 


anu...@cinova.co

Dec 14, 2014, 6:57:41 PM
to lisp-flavo...@googlegroups.com
I agree with the interoperability points, but I am still not convinced that lists are a better representation for strings in the big scheme of things. 

I looked through the first link, and it sort of makes sense in the context of the test the person was doing, but that is all. The issue is not so much of whether binaries are more efficient than strings within the Erlang environment alone. In the past 20 years, the world of array-representation of strings has found support all the way down to the hardware making it essentially a single instruction to do byte block manipulations. Also, with streaming instructions built into CPU's, cache behavior is now critical in determining performance, and cache behavior really supports the array representation of strings over list representation of strings, making it a much more efficient choice to represent strings as arrays. 

I also understand that this level of optimization is out of the scope of LFE since it lies in the purview of the Erlang VM, so this discussion is probably moot :-)

In any case, it was exactly in the context of JSON (the use-case of the first link here) that I first realized the difference between binaries and strings. The fastest JSON library for Erlang is Jiffy, which uses C to parse/unparse JSON. C libraries, as you know, are heavily tuned to run super efficiently on modern processors. This link https://kivikakk.ee/2013/05/20/erlang_is_slow.html did some measurements on different JSON libraries and found Jiffy to be nearly 5x-10x faster than Mochijson (which does seem to use binaries internally, but no C assistance). Brief recap of the numbers:

  • jiffy: 1,271ms
  • mochijson2: 8,692ms
  • mochijson: 11,111ms

Duncan McGreggor

Dec 14, 2014, 9:00:19 PM
to lisp-flavo...@googlegroups.com
My hesitation with #"..." is that I'm not sure how much it maintains the visual consistency... Let's take a look, maybe I can talk myself into it:

 * #(...) - tuple literal
 * #b(...) - binary literal
 * #m(...) - map literal

Then we have the following for numbers in various bases:

 * #b
 * #o
 * #d
 * #x
 * #23r

Hrm, I don't think I talked myself into #"..."; worse, I may have talked myself out of #b"..."! The numbers did it for me. Here's my thinking:

 * Strings are lists, not  scalar quantities like numbers.
 * They are more similar to binaries and tuples, in that regard.
 * In particular, they should resemble the binary form as much as possible.

Instead of either #"..." or #b"...", how about this:

#u(...)

That maintains symmetry with #(...), #b(...), and #m(...) and signifies that it's unicode with an obvious reference via the "u" ...

Thoughts?

d




Duncan McGreggor

Dec 15, 2014, 11:44:13 AM
to lisp-flavo...@googlegroups.com
I had more thoughts regarding unicode this morning. Let me try to write them out... I think I now agree with *both* approaches :-)

There are four ways of representing string data in Erlang (for various definitions of "string data"):

 * As a list
 * As a printable list
 * As a binary 
 * As a printable binary

For example:

> (io:format "~w~n" (list "apple"))
[97,112,112,108,101]
ok
> (io:format "~p~n" (list "apple"))
"apple"
ok
> (io:format "~w~n" (list (binary "apple")))
<<97,112,112,108,101>>
ok
> (io:format "~p~n" (list (binary "apple")))
<<"apple">>
ok

(lfe_io doesn't render the last one as #B("apple"), rather the output for ~w and ~p are the same; otherwise I would have used it instead of io).

For unicode strings:

> (io:format "~w~n" (list "åäö ð"))
[229,228,246,32,240]
ok
> (io:format "~p~n" (list "åäö ð"))
"åäö ð"
ok
> (io:format "~w~n" (list (binary "åäö ð")))
<<229,228,246,32,240>>
ok
> (io:format "~p~n" (list (binary "åäö ð")))
<<"åäö ð">>
ok

So the question in my mind is: in what way(s) does it make sense to represent unicode that doesn't break the symmetry of the above?

I haven't worked that much with unicode in Erlang/LFE, so I can't speak very well to coding workflows, usability of functions/options/etc. Regardless, I will continue venturing :-) If a list or binary value can be interpreted as unicode, I feel (today!) that the following presents a nice symmetry with the existing conventions while making the special case of unicode obvious:

 * Unicode data as a list: (229 228 246 32 240)
 * Unicode data as a printable list: u"åäö ð"
 * Unicode data as a binary: #b(229 228 246 32 240)
 * Unicode data as a printable binary: #u("åäö ð")

Thoughts?

d


anu...@cinova.co

Dec 15, 2014, 12:02:04 PM
to lisp-flavo...@googlegroups.com
I like this proposal. One small mod, however: 
Unicode data as a printable binary should be #u"åäö ð", without the parentheses, in order to maintain symmetry with unicode strings u"åäö ð".

A. 


Duncan McGreggor

Dec 15, 2014, 12:16:04 PM
to lisp-flavo...@googlegroups.com
So the idea there with #u("...") is to keep symmetry with the binary form #b("..."), while u"..." would keep symmetry with the printable string "...".

Does that make sense?

d


anu...@cinova.co

Dec 15, 2014, 5:21:54 PM
to lisp-flavo...@googlegroups.com
I see ... That makes sense as well. I guess where I am coming from is that, after a few weeks of struggling with strings/binaries, I am coming to the conclusion that I would like to use binaries exclusively as a coding practice, since they seem to be the common denominator in all the external dependencies that I have. In that case, I'd like to keep the code as clean looking and minimalist as possible. To build on your proposal:

"latin-string"
#"latin-binary"
u"unicodestring"
#u"unicodebinary"

Of course, in this day and age, I would also push for dropping latin strings and latin binaries entirely, and exclusively push for utf8 encoded unicode strings, without having to call them out syntactically. (This is the approach taken by Racket, and it is wonderful working with strings in Racket). In this case, we would have:

"utf8 string"
#"utf8 binary"

This would be the cleanest syntax and simplest semantics.

A. 

Robert Virding

Dec 15, 2014, 8:17:12 PM
to lisp-flavo...@googlegroups.com
For the binary string I think it is good; I was thinking along those lines myself. I would extend it slightly by having:

#"latin binary"
#u"utf8 binary"
#u8"utf8 binary"
#u16"utf16 binary"
#u32"utf32 binary"

The default is utf8 binary but the last three would allow you to be specific, and utf-16 and utf-32 are valid formats.
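
For comparison, the existing binary segment syntax can already produce each of these encodings. A minimal sketch at the LFE REPL, assuming the utf8, utf16 and utf32 segment types behave as they do in Erlang's bit syntax. The #u8/#u16/#u32 forms above are proposals, not existing syntax, and the printed form of a binary may differ between LFE versions:

```lfe
;; Code point 229 is å. Each segment type encodes the same code point
;; differently (big-endian is the default for utf16 and utf32).
(binary (229 utf8))    ; two bytes:  195 165
(binary (229 utf16))   ; two bytes:  0 229
(binary (229 utf32))   ; four bytes: 0 0 0 229
```

The code point is written as an integer here so the result does not depend on how the reader decodes string literals.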

Just to be difficult, and to flip the question: do we actually need latin1 binary strings? We could always just go like this, which mirrors Erlang binary syntax:

#"utf8 binary"
#b("latin1 binary")

We can already input the second form as lists of positive integers are specially treated, though they are masked down to bytes so you can't use it for general unicode codepoints.

The question with strings is what do we mean by a latin1 string and a utf8 string? The string is just a list of integers, and whether they are latin1 or unicode codepoints doesn't matter; in this respect they are the same. We never use utf8-encoded strings, or at least never should, as there is no point. So we don't need any syntax for it.
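
The difference between a list of codepoints and an encoded binary can be checked with the unicode module. A minimal sketch at the LFE REPL; the variable names chars and utf8-bin are made up for illustration, and the codepoints are written as integers so the result does not depend on how the reader decodes source text:

```lfe
;; "åäö ð" as a list of code points: one integer per character.
(set chars '(229 228 246 32 240))

;; Encoding the code points as UTF-8 gives a 9-byte binary:
;; 195 165 195 164 195 182 32 195 176
(set utf8-bin (unicode:characters_to_binary chars))

;; Decoding the UTF-8 binary gives back the same 5 code points
;; (the shell may print the list as a string).
(unicode:characters_to_list utf8-bin)

;; list_to_binary keeps one byte per integer, i.e. a latin1 binary
;; with the bytes 229 228 246 32 240. It raises badarg for any
;; integer above 255, so it cannot hold general code points.
(list_to_binary chars)
```

The list is the same whichever character set it came from; an encoding only enters the picture when the list is turned into a binary.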

I know that OTP is slowly going towards having utf-8 strings and I think we should follow them. They are getting more BIFs which work with utf8 binaries and the module unicode.

Robert

anu...@cinova.co

Dec 15, 2014, 8:42:13 PM
to lisp-flavo...@googlegroups.com

#"utf8 binary"
#b("latin1 binary")

I like this proposal. 
 

We can already input the second form as lists of positive integers are specially treated, though they are masked down to bytes so you can't use it for general unicode codepoints.

The question with strings is what do we mean by a latin1 string and a utf8 string? The string is just a list of integers, and whether they are latin1 or unicode codepoints doesn't matter; in this respect they are the same. We never use utf8-encoded strings, or at least never should, as there is no point. So we don't need any syntax for it.


I am not sure I understand this fully. If I type in "å√å©" as a string in my source files (which are presumably utf8 encoded), is it automatically read as a list of Unicode code points, or is it a list of unsigned bytes, with the choice of how to interpret it left to the program?

 
I know that OTP is slowly going towards having utf-8 strings and I think we should follow them. They are getting more BIFs which work with utf8 binaries and the module unicode.

Yes, I fully support this :-)

Robert Virding

Dec 16, 2014, 12:38:15 PM
to lisp-flavo...@googlegroups.com
On Tuesday, December 16, 2014 2:42:13 AM UTC+1, anu...@cinova.co wrote:

#"utf8 binary"
#b("latin1 binary")

I like this proposal. 
 

We can already input the second form as lists of positive integers are specially treated, though they are masked down to bytes so you can't use it for general unicode codepoints.

The question with strings is what do we mean by a latin1 string and a utf8 string? The string is just a list of integers, and whether they are latin1 or unicode codepoints doesn't matter; in this respect they are the same. We never use utf8-encoded strings, or at least never should, as there is no point. So we don't need any syntax for it.


I am not sure I understand this fully. If I type in "å√å©" as a string in my source files (which are presumably utf8 encoded), is it automatically read as a list of Unicode code points, or is it a list of unsigned bytes, with the choice of how to interpret it left to the program?

Something about this is mentioned here http://erlang.org/doc/reference_manual/character_set.html and in the pages it links to, but I have not had the chance to see what all this *really* means. What I hope is that, while the file may be utf-8 encoded, what the scanner/parser/compiler sees are the unicode codepoints. This I think would be the logical and best way of doing it. Otherwise we would have to handle potential utf encoding everywhere. There seems to be handling of it in both epp.erl and file_io_server.erl, but the code is not very clear and is undocumented, so it is not trivial to work out what is going on.

When this has stabilised I will try to do something sensible for LFE.
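
One way to see which of the two behaviours a given setup has is to count the elements of the string from the question. A rough sketch, assuming the source file or terminal is UTF-8 encoded; the result depends on the LFE and OTP versions in use, so none is shown:

```lfe
;; "å√å©" is 4 code points, but 9 bytes when encoded as UTF-8
;; (å and © take 2 bytes each, √ takes 3).
;; A length of 4 means the reader produced code points;
;; a length of 9 means the raw UTF-8 bytes were passed through.
(length "å√å©")
```

In the 9-byte case the list can still be decoded by hand with (unicode:characters_to_list (list_to_binary ...)). In the 4-element case that same call would fail with badarg, because √ is code point 8730 and does not fit in a byte.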

Duncan McGreggor

Dec 16, 2014, 1:31:42 PM
to lisp-flavo...@googlegroups.com
On Mon, Dec 15, 2014 at 7:17 PM, Robert Virding <rvir...@gmail.com> wrote:
For the binary string I think it is good; I was thinking along those lines myself. I would extend it slightly by having:

#"latin binary"
#u"utf8 binary"
#u8"utf8 binary"
#u16"utf16 binary"
#u32"utf32 binary"

I *love* the u8, u16, and u32 distinctions. Nice touch.

I can't say that I'm wild about breaking the symmetry of form with regular binaries, though. I'd prefer:

#b("latin binary")
#u("utf8 binary")
#u8("utf8 binary")
#u16("utf16 binary")
#u32("utf32 binary")

This is a nice visual reminder that these aren't "strings" or lists of integers that can be represented as strings, but rather actual Erlang/LFE binaries.

I realize that this may seem like a minor (and possibly silly) point. I have deep philosophical views on the nature of language, its representation, and how that affects cognition :-) As part of that, I feel the more consistent, the better ...

That being said, I've probably started to beat a dead horse, so I will let this go! I will be happy to use whatever LFE ships with :-)

Another related question on the representation of unicode data... right now, lfe_io:format doesn't work with unicode data:

> (set encoded (binary ("åäö ð" utf8)))
#B(195 165 195 164 195 182 32 195 176)
> (io:format "~tp~n" (list encoded))
<<"åäö ð"/utf8>>
ok
> (lfe_io:format "~tp" (list encoded))
exception error: badarg
  in (: lfe_io fwrite1 "~tp" (#B(195 165 195 164 195 182 32 195 176)))
  in (lfe_io format 3)

(Hrm, Robert... I wonder: do you suppose this could be related to the issues we were seeing in Erjang earlier this year?)

If LFE handled this, I would imagine it to look something like this:

> (lfe_io:format "~tp~n" (list encoded))
#B(("åäö ð" utf8))

Is that what you would do, if we weren't thinking about the #u syntax?

With the #u syntax, then, something like these?

> (lfe_io:format "~tp~n" (list encoded))
#U8("åäö ð" utf8)
or
#U8"åäö ð"
?

And then, the following would be equivalent, yes?

> (set encoded (binary ("åäö ð" utf8)))
#B(195 165 195 164 195 182 32 195 176)
> (set encoded #u8("åäö ð"))
#B(195 165 195 164 195 182 32 195 176)
or
> (set encoded #u8"åäö ð")
#B(195 165 195 164 195 182 32 195 176)

Btw, this is a great discussion :-)

d

 
The default is utf8 binary but the last three would allow you to be specific, and utf-16 and utf-32 are valid formats.

Just to be difficult: to flip the question do we actually need latin1 binary strings? We could always just go like this which mirrors erlang binary syntax:

#"utf8 binary"
#b("latin1 binary")

As you know, I'd prefer the latter :-) 
 