[erlang-questions] Maps

304 views
Skip to first unread message

Björn-Egil Dahlberg

unread,
May 8, 2013, 10:18:57 AM5/8/13
to ee...@erlang.org, erlang-q...@erlang.org
Hi everyone!

We finally have a Maps EEP for you. This will get you an idea on what we are working on. Some of the ideas presented here was presented at Erlang Factory SF Bay Area 2013.

I am sure that there will be some discussions about the contents of this EEP and I hope the discussions will be both fruitful and civilized.

The journey of Maps and this EEP has been long and by no means a straight-forward or continuous one. I had a crystal clear picture of what I wanted Maps to be when we first started discussing it within OTP about two-three years ago. This EEP resembles that vision but it has had a lot of contributions of other ideas from both within and outside of OTP.

The idea was a functional data-type, a syntax aware mapping of key-value associations with pattern matching. A syntax similar to records but without the hazzle of compile-time dependency and with arbitrary terms as keys. Order was not important and it could be implemented with a Hash-Array-Mapped-Trie with good performance and memory trade-offs. This was not an approach to replace records. It was meant to replace records where suitable and in other regards not be a replacement but its own thing.

From the community there has been many wishes of a Map like data-type and a few suggestions.  The one suggestion that stands out is of course the Frames proposal from Richard O'Keefe. It is the most complete proposal I've seen and is very well thought out. Its goal is to be a record replacement and the proposal satisfies this goal very well.

- If Frames are that good, why a separate EEP?
- It boils down to goals and constraints.

A record replacement is just that, a replacement.
It's like asking the question, "What do we have?" instead of "What can we get?"
The instant rebuttal would be "What do we need?" I say Maps.

Frames has certainly influenced Maps. In many regards Maps also encompasses Frames but Maps tries to do more. I think the most significant difference would be, arbitrary terms as keys and how many different keys we would have in a Map. In the end I believe they are two different things and have different goals.

Some Notes and Disclaimers:

Later iterations of Maps has gone through some changes, most significantly,

  * From a set of key-values to a ordered set of key-value associations

I was originally against this change since it forces restrictions on the implementation and it illuminates the lack of distinction between arithmetic order and term order, i.e. the problem of mixing integer and float types as keys in a tree. However, I was later persuaded that key ordering is necessary. We have to respect the totalitarian order of terms.

Considerations has been made on how to, if at all possible, apply Frames characteristics to Maps. Most significantly memory space and key-sharing characteristics. This is not detailed in the EEP though, just mentioned.

The function interface has had many revisions as well. At some stage the API was considered to be a drop-in-replacement for `dict` and thus would have the same function-names. This goal/constraint was dropped by Technical Board decision recently.

From the very beginning Maps was envisioned to have the ability to bind variables derived from the environment. Like this:

    function(K1, #{ K1 := K2, K2 := V }) -> V.

This feature is a beast. Powerful and scary. It is not confined to only Maps but should also be applicable to other types as well:

    function(Skip, <<_:Skip/binary, Value:Size, _/bits>>, Size) -> Value.

It is uncertain how effective such an implementation would be and in the end we might not want this feature at all.

In this EEP we will describe syntax and semantics of Maps but very little is disclosed of its actual implementation. Current prototypes stems from using sparse tuples in a HAMT-like data structure and tuple-like data structures. The HAMT-like data structure is discontinued and will be replaced by some ordered tree.

The proposal is included as an attachment but can also be viewed at this git-repository:
https://github.com/psyeugenic/eep/blob/egil/maps/eeps/eep-0043.md


Regards,
Björn-Egil Dahlberg
Erlang/OTP
eep-XXX.md

Loïc Hoguin

unread,
May 8, 2013, 10:31:19 AM5/8/13
to Björn-Egil Dahlberg, erlang-q...@erlang.org
On 05/08/2013 04:18 PM, Björn-Egil Dahlberg wrote:
> Hi everyone!

Hi!

There was examples of updating a map in a map in the SF talk, what have
they become? I see nothing about this in the EEP.

--
Loïc Hoguin
Erlang Cowboy
Nine Nines
http://ninenines.eu
_______________________________________________
erlang-questions mailing list
erlang-q...@erlang.org
http://erlang.org/mailman/listinfo/erlang-questions

Max Lapshin

unread,
May 8, 2013, 11:20:18 AM5/8/13
to Loïc Hoguin, Erlang-Questions Questions
It is great.

I haven't understood what is the difference from Frames? Syntax?

Michael Uvarov

unread,
May 8, 2013, 11:33:56 AM5/8/13
to Max Lapshin, Erlang-Questions Questions
Max, map is just new name for frames.
--
С уважением,
Уваров Михаил.
Best regards,
Uvarov Michael

Fred Hebert

unread,
May 8, 2013, 11:39:56 AM5/8/13
to Björn-Egil Dahlberg, ee...@erlang.org, erlang-q...@erlang.org
I was looking at an example:

1> M = #{ 1 => a },
#{ 1 => a }
2> M#{ 1.0 => b }.
#{ 1.0 => b }.
3> M#{ 1 := b }.
#{ 1 => b }
4> M#{ 1.0 := b }.
** exception error: bad argument

And given the definition where we have 'match' and 'equal' being two
different things, am I right in understanding that '=>' can both create
a mapping and update it based on 'equal' whereas ':=' can only update
them using 'match' ?

That seems like tricky semantics to me. Any reason why one can both
update and insert but not the other? It's not like one operator is the
alternative of the other exclusively on update/create or match/equal,
but they're different in both forms.

For the Dialyzer syntax, is there any distinction for types that have to
do with empty/non_empty maps?

I don't think there is any mention of the expected behaviour when
matching on #{_ := V} = #{a => 3}, unless I missed it (there
were cases in map comprehensions, but these are obviously not the same).
Does this need to be specified? Similarly, #{_ := V} = #{a => 1, b => 1}
and #{K := 1} = #{a => 1, b => 1} do not seem specified to me.

For the rest of it, that looks like a decent proposal, and I'm
interested by map comprehensions, but I'll let guys like ROK or jlouis
take a more critical look at it ;)

I'm just hoping nobody's gonna end up using maps as the thing that all
state of all apps of all processes use and send over messages and you
just have to look inside the map, son, because I'm not gonna go through
the bother of sending concise messages to you!

Regards,
Fred.

José Valim

unread,
May 8, 2013, 11:39:56 AM5/8/13
to Loïc Hoguin, erlang-q...@erlang.org
Thanks Björn-Egil, this proposal is great!
I've also found myself agreeing with all the proposals discussed in Open Questions.

There was examples of updating a map in a map in the SF talk, what have they become? I see nothing about this in the EEP.

For those who haven't seen the talk yet, I believe Loic is talking about this:

    A = #{ a => false },
    B = #{ b => A },
    C = B#{b}#{a => true},
    C == #{ b => #{a => true} }.

Notice how we did a nested update by nesting the access and update syntax. If there is a reason for it not to be included in the EEP I would love to hear it too.

Anthony Ramine

unread,
May 8, 2013, 6:50:07 PM5/8/13
to Björn-Egil Dahlberg, ee...@erlang.org, erlang-q...@erlang.org
Hello Björn-Egil,

So what happens with the following code?

X = 2, F = fun (X, <<X:X>>) end.

If we allow to bind variables derived from the environment, shouldn't we forbid confusing expressions where a variable is both shadowed and used?

With the current bindings, this is equivalent to:

> X = 2, F = fun (Y, <<Y:X>>) -> {X,Y} end.
#Fun<erl_eval.12.17052888>
> F(1, <<1:2>>).
{2,1}
> F(1, <<1:1>>).
** exception error: no function clause matching
erl_eval:'-inside-an-interpreted-fun-'(1,<<1:1voilà>>)

With the new bindings, this could mean:

> X = 2, F = fun (Y, <<Y:Y>>) -> {X,Y} end.
#Fun<erl_eval.12.17052888>
> F(1, <<1:1>>).
{1,1}
> F(1, <<1:2>>).
** exception error: no function clause matching
erl_eval:'-inside-an-interpreted-fun-'(1,<<1:2>>)

If variables from the outer environment are still used over the ones from the pattern's, the current behaviour would still be exhibited; but it would be weird to have such backwards bindings.

What I suggest on that matter is to forbid variables both used and shadowed at the same time in a pattern when maps are introduced, or maybe even before. I can write a patch for that.


On the matter of the patterns themselves, will this be allowed?

fun (<<A:B,C:D>>, <<B,D:A>>) -> {A,B,C,D} end.

To fully use the limited scope of patterns, we could also broaden the support of constant expressions in patterns to guard expressions using only variables bound in both environments. As the question of dependencies is already answered by the previous convoluted pattern, allowing guard expressions to be used that way wouldn't be difficult.

fun (<<A:B*2,C:D*3>>, <<B,D:A div 2>>) -> {A,B,C,D} end.
keyreplace(element(N, Tuple0), N, [Tuple0|L], Tuple) -> [Tuple|L].

I find this last example quite soothing. Obviously, that can be done already in Erlang:

keyreplace(Key, N, [Tuple0|L], Tuple) when Key =:= element(N, Tuple0) -> [Tuple|L].

But what about that one?

decode(<<5, Len, Data:(Len-1)/binary>>) -> {p5,Data};
% decode other packets.

This one is less obvious to rewrite in current Erlang and I won't bother to do it.

Regards,

--
Anthony Ramine

Le 8 mai 2013 à 16:18, Björn-Egil Dahlberg a écrit :

> function(Skip, <<_:Skip/binary, Value:Size, _/bits>>, Size) -> Value.

Richard A. O'Keefe

unread,
May 8, 2013, 11:12:07 PM5/8/13
to Björn-Egil Dahlberg, ee...@erlang.org, erlang-q...@erlang.org
Summary:
For their intended uses, frames are smaller and faster and safer
than maps.

For uses *other* than replacing records, maps are superior to frames.

It would be good to have BOTH something with clean syntax that
makes a good replacement for records AND something that does even
better than the 'dict' module for dictionaries.

It would be a very bad thing if we got maps *instead* of frames.

Joe Armstrong is writing a new Erlang book in which he *does* use
maps to replace records, but that would be a Bad Thing.

Björn-Egil Dahlberg <eg...@erlang.org> is rather unkind about
the Frames proposal when he says
A record replacement is just that, a replacement.
It's like asking the question, "What do we have?"
instead of "What can we get?"
The instant rebuttal would be "What do we need?" I say Maps.

The central claim, that Frames are *JUST* a replacement for
records, doing nothing novel, is quite untrue. We can see this
with a point-by-point comparison.

Quoting EEP 43:

The new data-type shall have semantics, syntax and operations that:

• provides an association set from key terms to value terms which can be constructed, accessed and updated using language syntax

Frames do that.

• can be uniquely distinguished from every other data-type in the language

Frames do that.

• has no compile-time dependency for constructing, accessing or updating contents of maps nor for passing maps between modules, processes or over Erlang distribution

Frames do that.

• can be used in matching expressions in the language

Frames do that.

• has a one-to-one association between printing and parsing the data-type

To the extent that I can make sense of that, NEITHER frames NOT maps
do that. For example, <{ b = 1, a = 2 }> would be expected to
print as <{ a = 2, b = 1 }>. The only way to understand the requirement
so that maps satisfy it is

if you print one and read back what you just read
you get a new term which is indistinguishable from
the old one.

Frames do that.

Frames do one more thing:

++++ has a canonical print form (up to white space) so that frames
print the same if and only if they are equal.

When maps were based on hash tables it did not appear that they would
satisfy this requirement, and for Erlang I think it _is_ a requirement.

• has a well defined order between terms of the type and other Erlang types

Frames do that.

• has at most O(log N) time complexity in insert and lookup operations, where N is the number of key-value associations.

Frames have at most O(log N) time complexity in lookup operations.

They do not satisfy a "fast insert" require", but that's not because I
didn't _think_ of it, but because I thought of it and deliberately
rejected it.

Frames are BY DELIBERATE CAREFUL CHOICE not a general purpose
dictionary data type. That's because they offer something that
EEP 43's maps do not and cannot:

++++ costs aN+b worst case space (yes, Maps do that do)
with b small and a <= 2 (Maps cannot do that) and
typical cases should see a == 1 (Maps cannot do that).

There is another explicit requirement for frames:

++++ lend themselves to type checking.

If you look at type checked dictionaries in other languages,
they require _all_ values to have the _same_ type. We would
presumably have something like map(Domain, Range) in Erlang
types. That makes them unsuitable for record-like uses.

Frames are for record-like uses where you want millions of the things,
holding few enough slots that you could bear to write them all down
by hand, and where if you are already using -type specifications and
the dialyzer, you want to continue to benefit from that.

If Maps are "meant to replace records where suitable",
the answer is that they are hardly ever suitable for that purpose.

I see no problem with a really good general purpose dictionary data
type _as such_.

What I'm terrified is that after waiting all this time for something
good to replace records, we'll get maps ***INSTEAD*** of frames,
rather than ***AS WELL AS*** frames.

We know (thanks to my micro-VM experiments) that frames can be
compact _enough_ and efficient _enough_ to be used instead of
records, and it's also clear that with type checking they can
be safe _enough_ (defined as "safer than records") for this to be
a good thing.

It's clear that a data structure that offers O(log N) update
instead of frames' O(N) --- well actually, that's the cost to
update *one* field, updating F fields is still O(N) so it's
O(N/F) per field --- has to pay for it somehow, and it's going
to be more memory and more time.

"In many records Maps also encompasses Frames but Maps tries to
do more" is misleading. The Maps proposal does not in any
practical sense "encompass Frames" because it does not achieve
what the Frames proposal set out to achieve. The Frames
proposal does more of what I *want* than the Maps proposal.

What are the two main user-visible differences?

- Frames are limited to atom keys in the interests of
compile-time error detection (both simply noticing that
a key is not an atom at all and allowing the possibility
of Dialyzer support).

Maps allow any term as a key.

Arbitrary keys is something I wanted the way I wanted my hand
nailed to a desk. If I want dictionaries with arbitrary keys,
I want the 'dict' module, and a more efficient implementation
of that would be just dandy. But by far the commoner case is
to need a restricted key space so that my mistakes get caught.

- Frames have O(lg N) access to a single field or O(N/k) access
to k fields. They have O(N/k) cost for copying a frame with
k fields changed.

Maps have O(lg N) access and O(lg N) update.

With a bit of data flow analysis, typical field access can be
reduced to O(1) time with frames. I cannot recall if I've
written that up, but it should be pretty obvious.

The interesting thing here is that for smallish N, even
fetching a *single* field can be faster with a frame than
with a Map. And this is precisely the use-case for frames.

Richard A. O'Keefe

unread,
May 8, 2013, 11:33:46 PM5/8/13
to Michael Uvarov, Erlang-Questions Questions

On 9/05/2013, at 3:33 AM, Michael Uvarov wrote:

> Max, map is just new name for frames.

ABSOLUTELY *NOT*!

The desire for O(lg N) update forces some crucial implementation
differences.

For record-like uses,
frames would be smaller, faster, and safer than maps.
For dictionary-like uses,
maps would be superior to frames.

Clearly distinguishing between record-like data structures and
dictionary-like data structures is the core of good design in
this area.

Looking at the example

function(K1, #{ K1 := K2, K2 := V }) -> V.

we see an example of the kind of thinking that has gone into
the two proposals.

The Frames proposal won't have anything to do with something
like this, because I am a bear of very little brain who makes
an embarrassing number of mistakes; I am _far_ more likely to
have typed K1 by mistake (intending k1) than to want a variable
here, and I would not give you a thank-you for a compiler that
let this kind of thing through. In particular, frames were
designed to be Dialyzer-friendly, so that there'd be a good
chance that the Erlang tools would not just tell me "Hey, K1
is not known to be an atom here" but "hey, you asked for slot
k1, but there isn't any such slot, did you mean k2?"

The Maps proposal is about *breadth* (a jack of all trades
data structure) and *expressiveness*. Note the example
above, which _has_ to be matched from left to right.
Frame patterns can be matched in any order, just like tuple
patterns. (No, I _don't_ want patterns that can only be
matched from left to right, thank you. I want the pattern
match compiler to be able to choose what subpattern to do
next based on information gain, not order.) I don't
believe it is possible to type check that example, at least
not in situations where the compiler doesn't know exactly what
keys might be present. In contrast, <{ k1 = X, k2 = Y }> _can_
be type checked no matter what other fields might be present.

Being able to write functions that require strict left to right
matching is definitely more expressive than not being able to.
Using frames, you'd have to write the running example as

function(K1, Frame) ->
frame_value(Frame, frame_value(Frame, K1)).

However, sometimes we like to re-order the arguments of our
functions. In the absence of binary patterns, we can do so
freely now. Making it hard to spot when a refactoring is
safe? That's not something the frames proposal is supposed to
contain, and if anyone can find such a thing, I promise I'll
fix it.

Max Lapshin

unread,
May 9, 2013, 4:10:31 AM5/9/13
to Richard A. O'Keefe, Erlang-Questions Questions
Richard. Would you, kindly, tell: where is the difference between maps
and frames?

As far as I understand, it is not about syntax, it is about internal
way of storing data?

Robert Virding

unread,
May 9, 2013, 4:56:42 PM5/9/13
to Björn-Egil Dahlberg, ee...@erlang.org, erlang-q...@erlang.org
Even before taking a really deep dive into studying the EEP one thing I can immediately say: get rid of this having both equal and match and USE ONLY MATCH. Keys are the same when they match. Period. This paragraph:

If key K does not equal any existing key in the map, a new association will be created from key K to value V. If key K is equal to an existing key in map M its associated value will be replaced by the new value V and the new key K will replace the old key if the key does not match. In both cases the evaluated map expression will return a new map.

is weird. Yes I know what it means but it is not intuitive. When keys are replaced or not replaced when they are equal is not can seem very strange to those not deep into erlang semantics.

Push
Yes, I think we made an error with the different types of equalities and that comparisons covert integers to floats. With 20-20 hindsight I would prefer ==,/=,<,=<,>,>= to ONLY work on numbers and convert while another set @==,@/=,@<,@=<,@>,@>= (say) to work on terms and never convert.
Pop

While we are at it I think the evaluation order for keys and values should be defined as left-to-right. The rest of erlang is so why not here?

Robert


Richard A. O'Keefe

unread,
May 9, 2013, 11:03:08 PM5/9/13
to Erlang-Questions Questions

On 9/05/2013, at 8:10 PM, Max Lapshin wrote:

> Richard. Would you, kindly, tell: where is the difference between maps
> and frames?
>
> As far as I understand, it is not about syntax, it is about internal
> way of storing data?

I thought I already had, but here goes.

There is one central semantic difference, which leads to two
others.

(U) What do you want to optimise the design for?

Frames are optimised (pared to the bone, in fact) for use in
record-like ways. They are somewhere between pathetic and
hopeless as general purpose dictionaries.

Maps optimised for service as general-purpose dictionaries.
They are somewhere between almost usable and pathetic for
record-like uses.

I do not believe that it is possible to design a data structure
which is *good* at both tasks. I suspect that it is possible to
prove this, but I must admit I haven't tried.

THIS is the reason that Björn-Egil Dahlberg was able to sneer at
frames as "just a replacement". I don't try to build perpetual
motion machines. I don't try to square the circle with ruler and
compass. I don't try to design bottle-openers that make good
chisels. And given two sets of conflicting requirements (even if
the sets overlap substantially), I don't try to satisfy them both
with a single data structure. (Using a Java ArrayList as a queue
is *possible*, but stupid.)

(K) Frames only allow atoms as keys, maps allow any term.
Frames can be manipulated using special syntax <{ K1 ~ V1, ... }>,
as can maps. In the special syntax, the keys must be visibly and
obviously literal atoms; they may not be expressions; maps allow
keys to be expressions in expression context and variables (or is
it guard expressions, and if not, why not?) in pattern context.

Frames and maps can also be manipulated using functions; that is
not a difference.

(T) Frames take O(M+N) time to update/insert M fields into a (copy of a)
frame that already has N keys. Maps take O(lg N) time to update/
insert a single field into an N-key map. Frames take O(M+N) time to
match M fields in an N-key frame; this can be reduced to O(M.lg N)
for small M and large N, but since the simple code can be blisteringly
fast, there's not really much point.

Frames go with a programming style in which instead of

... R#r.a ...
... R#r.b ...
... R#r.a ... R#r.c ...

you write

<{ a ~ A, b ~ B, c ~ C }> = R,
... A ...
... B ...
... A ... C ...

which has the advantage of failing early if some of the required fields
are not present. In fact my personal style for current Erlang says
"never use R#r.a".

These are not independent properties, and they bear on the key
use-case difference:

Frames were designed to be close to records in space and close to
records in performance, for *record*-like uses (fixed collections
of heterogeneous data) while also allowing a some flexibility,
especially for upgrading.

Maps were designed to be good for use as *dictionaries*; the design
is optimised for *large* numbers of key/value associations. Such a
design intrinsically satisfies the "functional requirements" for a
record replacement (at least in an untyped language), but not the
"non-functional" or quality requirements.

As for implementation, Björn-Egil Dahlberg wrote:

> Current prototypes stems from using sparse tuples in a
> HAMT-like data structure and tuple-like data structures.
> The HAMT-like data structure is discontinued and will be
> replaced by some ordered tree.

The HAMT-like data structure for OCAML that I've looked at
takes *at least* four words for every key/value association.
(I have been able to find plenty of numbers for the *speed*
of HAMTs but nothing for the *space* they need other than a
claim of "small".) That's not necessarily the same thing,
and the "tuple-like" part makes me wonder if something much
cleverer is being done. However, we're told that's going "and
will be replaced by some ordered tree". Suppose you have a
2-3-4 tree. We can play SML/Mercury-style games to move the
size of a node into the pointer to it, so we

2-node: C1 K1 V1 C2 4 words/pair
3-node: C1 K1 V1 C2 K2 V2 C3 3.5 words/pair
4-node: C1 K1 V1 C2 K2 V2 C3 K3 V3 C4 3.3+words/pair

So it's fair to say that this would take *roughly* 3.5 words
per field.

Playing similar move-the-balance-info-into-the-pointers games
would let us do red-black trees or AVL trees in 4 words/pair.

There are other games one can play, at the price of more
complicated code, where we take advantage of the leaves of
a 2-3-4 tree being at the same depth, so that leaf nodes
are K1 V1
or K1 V1 K2 V2
or K1 V1 K2 V2 K3 V3
or K1 V1 K2 V2 K3 V3 K4 V4
and the tree as a whole is roughly 2.3 words per pair.

I don't know what exactly Björn-Egil Dahlberg has in mind,
but if we charge 3 words per key/value association, it's
not likely that we're overestimating the space cost.

What about frames? A frame <{ K1 ~ V1, ..., Kn ~ Vn }>
is represented as

---> { size-and-tag , | , V1, ..., Vn }
/
/
{ size-and-tag, '', K1, ..., Kn }

It is, in fact, the same representation as a record, except that
instead of the first element being the record *name* it is a
pointer to the record's *descriptor*. The space needed is 4+2N
words.

But it gets better. In dictionary-like uses, you have *few*
dictionaries with key sets that tend to be *different*. In
record-like uses, you have *many* 'records, with a much smaller
number of key-sets, so the descriptors can be *shared*. A
typical record can be replaced by a frame with the *same*
space cost: 2+N words.

Segregating the keys is old idea for implementing environments in
Lisp-like languages. It works brilliantly for key-sets that don't
change much.

One really cool thing is that it really lends itself to supporting
basic operations like comparison. The storage representation is
*canonical*. Two equal frames have the same parts in the same order.
To compare two frames for equality:
- if the sizes are different, the frames are different.
- if the descriptors are not the same pointer, then
if the descriptors are not *bitwise* identical in contents,
the frames are different. (This can be implemented by a
tiny chunk of machine code that's faster than memcmp().)
- compare the rest of the frames as if comparing tuples.
To compare two frames for ordering:
- if the sizes are different, the shorter one is less
- if the descriptors are not the same pointer, then
compare the descriptors for ordering as tuples.
If they are not equal, the lesser frame is the one with
the lesser descriptor.
- compare the rest of the frames as if comparing tuples.

I don't understand HAMTs well enough to tell if they are canonical.
In the absence of hash collisions, they might be, but what if there
are collisions? It's certainly hard to see how they could be
canonical for both == and =:= at the same time. (Not a problem for
frames, because == and =:= coincide for atoms).

For search trees, however, it is well understood that two trees can
represent the same dictionary while having quite different shapes.
This makes comparison much much harder. You can still do it in
linear time, but you really have to work at it.

Loïc Hoguin

unread,
May 10, 2013, 7:48:19 AM5/10/13
to Richard A. O'Keefe, Erlang-Questions Questions
On 05/10/2013 05:03 AM, Richard A. O'Keefe wrote:
> Frames are optimised (pared to the bone, in fact) for use in
> record-like ways. They are somewhere between pathetic and
> hopeless as general purpose dictionaries.

I think that's the bigger issue with frames. Are they worth spending the
time implementing considering they are essentially a records
replacement? Records work good enough for most purposes, with the
exception of upgrades, which few people do anyway.

Maps have the potential to replace proplists, dicts, and also some
record misuses, where an interface requires you to include a file to
have the records definition (the file example found in the EEP is one).
They're much more interesting to implement.

As far as I understand maps are not meant as a replacement for #state{}
records.

> (K) Frames only allow atoms as keys, maps allow any term.

Atoms are only good for record-like use or list of options. They are a
minority in a reasonably sized application (Cowboy, 8k+ lines, for
example). Maps on the other hand with keys as iodata() would be much
helpful to store many things related to requests and responses, like
headers, cookies and more, and to convert them quickly with maps
comprehensions.

> But it gets better. In dictionary-like uses, you have *few*
> dictionaries with key sets that tend to be *different*.

Not my experience.

--
Loïc Hoguin
Erlang Cowboy
Nine Nines
http://ninenines.eu

Tom Murphy

unread,
May 10, 2013, 9:19:12 AM5/10/13
to Loïc Hoguin, Erlang-Questions Questions
On Fri, May 10, 2013 at 4:48 AM, Loïc Hoguin <es...@ninenines.eu> wrote:
On 05/10/2013 05:03 AM, Richard A. O'Keefe wrote:
     Frames are optimised (pared to the bone, in fact) for use in
     record-like ways.  They are somewhere between pathetic and
     hopeless as general purpose dictionaries.

I think that's the bigger issue with frames. Are they worth spending the time implementing considering they are essentially a records replacement? Records work good enough for most purposes, with the exception of upgrades, which few people do anyway.

One of the things that's very compelling to me about frames as a record replacement is that (as I understand it), frames are fully-distinguishable as a separate data type.

The record abstraction is *very* leaky. Any Erlang coder who uses records has to know about - and contend with - its underlying representation as a tuple (insertion into ETS tables, for example, is something that's common and trips people up when the first atom (the record tag) is used as the key for the table).

Tom

Tom Murphy

unread,
May 10, 2013, 9:35:39 AM5/10/13
to Loïc Hoguin, Erlang-Questions Questions
Also, to anyone interested in the specifics of what frames actually are, I think the most recent/best resource for understanding them is here:

http://www.cs.otago.ac.nz/staffpriv/ok/frames.pdf

A lot of time has gone into refining this proposal - check it out!

Tom

Natesh Manikoth

unread,
May 10, 2013, 9:41:47 AM5/10/13
to Tom Murphy, Erlang-Questions Questions

Erlang newbie here. This comment therefore is not about the merits of the proposals.
I detect a certain tendency to dismiss suggestions if the suggestions are not germane to that particular user's current (or past) needs. Maybe the suggestions didn't originate from a particular community of Erlang users. I hope things don't get dismissed because of NIH syndrome.
There are very few (a handful) who take the time and trouble ROK seems to take to explore and explain the nuances of choices. I have followed his passionate support of Frames and read his proposal and see the value of something that replaces records. Since I am not a professional user of Erlang I will leave it to others to debate the technical merits. I just hope that folks ask the question whether it has to be an 'either/or' decision between Maps and Frames - can it be both.

Now I will go back to watching these highly educational exchanges on the technical merits of the proposals.

-A newbie

Loïc Hoguin

unread,
May 10, 2013, 10:34:58 AM5/10/13
to Natesh Manikoth, Erlang-Questions Questions
On 05/10/2013 03:41 PM, Natesh Manikoth wrote:
> Erlang newbie here. This comment therefore is not about the merits of
> the proposals.
> I detect a certain tendency to dismiss suggestions if the suggestions
> are not germane to that particular user's current (or past) needs.

Not dismissing it here, I'd be happy to have both, but maps are solving
a problem (dict manipulation is incredibly tedious and time consuming in
Erlang) while frames simply improve what we already have.

Personally not having maps prevents me from doing a particular project,
unless I want to spend all my time manipulating dicts of dicts or
records of records. Hence my only question about the maps proposal about
recursive updating being built-in.

Not having frames just requires me to know about a few pitfalls, but
doesn't prevent me to get the job done.

--
Loïc Hoguin
Erlang Cowboy
Nine Nines
http://ninenines.eu

José Valim

unread,
May 10, 2013, 11:17:46 AM5/10/13
to Loïc Hoguin, Erlang-Questions Questions
Maps have the potential to replace proplists, dicts, and also some record misuses, where an interface requires you to include a file to have the records definition (the file example found in the EEP is one). They're much more interesting to implement.

I am not sure realistically speaking how much Maps will replace proplists. They are extensively used in OTP and maps don't really seem to provide a huge advantage over them. My fear is adding more inconsistency to the libraries since some will accept maps as options, others accept proplists. The benefits would have to be considerable to replace proplists and I don't think it is the case now.

But I agree; they can replace a good amount of dict usage (hard to say if it will be a full replacement without proper benchmarks) and record misuses.

Personally not having maps prevents me from doing a particular project, unless I want to spend all my time manipulating dicts of dicts or records of records. Hence my only question about the maps proposal about recursive updating being built-in.

Agreed, nested access/updates are definitely a pain. Although there are other ways to ease this, like:

    access:find_in(NestedStructure, [dict_key, #some_record.a, another_key])
    access:store_in(NestedStructure, [dict_key, #some_record.a, another_key], Value)

So even if we don't get a recursive update built-in, it seems we can provide good solutions to the problem.

José Valim
Skype: jv.ptec
Founder and Lead Developer

Juan Jose Comellas

unread,
May 10, 2013, 12:21:26 PM5/10/13
to Erlang-Questions Questions
I think that maps give us the chance to greatly simplify inter-module communication. As Loïc said, I'd also like to have both maps and frames, but right now if you want to send simple structured data to a process/function in another module you can:

a) Use a record, with the added problem of sharing its definition and making sure that both modules are kept in sync with it

b) Use a proplist and deal with very verbose code when manipulating it

This is becoming increasingly relevant with all the JSON messaging being used in HTTP APIs and maps seem to be a great way to ease the pain in these situations (even erlson was already very convenient when dealing with JSON).

I don't see the need to allow anything more than atoms, binaries and strings for the keys, though, but I do feel that forcing the keys to be literals would severely restrict their usability for the cases I mentioned.

Juanjo

Max Lapshin

unread,
May 10, 2013, 12:30:15 PM5/10/13
to Juan Jose Comellas, Erlang-Questions Questions
Guys.

erlang is a rather simple language, which is seriously spoiled by
endless list_to_binary and binary_to_list conversions. Are you really
sure that we need map_to_frame and frame_to_map ?

ami...@gmail.com

unread,
May 10, 2013, 1:14:22 PM5/10/13
to Max Lapshin, Erlang-Questions Questions
Just like in the case of lists/strings and binaries, the reason the conversions happen is because the data types converted to and from are useful in different ways!

I agree we should avoid cluttering the language up, but if we view/use Maps as replacement for Dicts and Frames as replacement for Records, then the situation wouldn't be any more cluttered than it already is.

Tom

Robert Virding

unread,
May 10, 2013, 6:42:28 PM5/10/13
to ami...@gmail.com, Erlang-Questions Questions
I am personally a ambivalent to maps as they have been presented. They are trying to solve two problems which both need solving but I don't think *one* solution is the way to go. So maps:

- are to be a dynamic alternative to records which is a Good Thing.

- are to be a faster replacement to dicts, which is always useful.

The problems are different enough so that I think that in trying to be a solution to both will mean that they will do neither in the best way. So either pick one and solve it properly, or come with two solutions.

From my own experience I would say save the new syntax for the record alternative, this is where I would get the most out of it. Having atoms as keys would pose me no problems. Use a new data type here.

For the dict alternative having a special syntax would give me very little so I wouldn't bother about it, using functions calls handles my usage very well. It also makes it easy to test alternate implementations to find the one with the most appropriate properties, just change the name of the module. For a built in type I would prefer speed to ordering. Actually you probably wouldn't need a built in type, just suitable BIFs to manage existing tuples/lists.

I think I agree with ROK here.

Anyway, that's my take on it,

Robert

Steve Davis

unread,
May 10, 2013, 6:43:28 PM5/10/13
to erlang-pr...@googlegroups.com, Erlang-Questions Questions
I offer the suggestion that if "strings" were not processed as "just lists of integers" but treated as the binary data that they are then you'd not be needing to do this kind of thing...

However, owing to that incorrect insistence in most of the platform apis, I can see that it's hard to avoid.

best,
/s

Robert Virding

unread,
May 10, 2013, 6:45:13 PM5/10/13
to Steve Davis, erlang-pr...@googlegroups.com, Erlang-Questions Questions
Processing lists of integers is faster than processing binaries, which is not surprising really.

Robert


Loïc Hoguin

unread,
May 10, 2013, 6:47:14 PM5/10/13
to Robert Virding, Erlang-Questions Questions
On 05/11/2013 12:42 AM, Robert Virding wrote:
> For the dict alternative having a special syntax would give me very little

You break my heart.

It's the one thing that is important to me. I don't care about the
implementation speed at all.

--
Loïc Hoguin
Erlang Cowboy
Nine Nines
http://ninenines.eu

Loïc Hoguin

unread,
May 10, 2013, 6:52:38 PM5/10/13
to Robert Virding, erlang-pr...@googlegroups.com, Steve Davis, Erlang-Questions Questions
Matching no (match context and sub binaries prevent the need for copying
anything), sending to another process no (smaller, plus above 64 bytes
binaries are not copied), appending no (you can always prepend a string,
but that's hardly intuitive use when manipulating text), etc.

On 05/11/2013 12:45 AM, Robert Virding wrote:
> Processing lists of integers is faster than processing binaries, which
> is not surprising really.
>
> Robert
>
> ------------------------------------------------------------------------
>
> *From: *"Steve Davis" <steven.cha...@gmail.com>
--
Loïc Hoguin
Erlang Cowboy
Nine Nines
http://ninenines.eu

Steve Davis

unread,
May 10, 2013, 6:57:00 PM5/10/13
to erlang-pr...@googlegroups.com, Steve Davis, Erlang-Questions Questions
I agree... if they are actually lists of integers. 

However, if the list of integers represents a reference (e.g. path "strings", attribute name "strings") does this also hold?

Robert Virding

unread,
May 10, 2013, 7:01:28 PM5/10/13
to Loïc Hoguin, Erlang-Questions Questions
I am curious. Could you give me an example of how you would like to use the new syntax? Or a new syntax?

Robert

Michael Truog

unread,
May 10, 2013, 8:41:19 PM5/10/13
to Richard A. O'Keefe, Erlang-Questions Questions
I am a bit disturbed by the concern over Frames. Since I started using Erlang I have always wanted native, efficient dictionary functionality as an actual type within the language (not just a dict module, despite the fact dict has one of the best possible implementations within native Erlang source code). It is a major disappointment coming to the Erlang language, being given the process dictionary, and then being told that you should never use it, since it is dirty mutable state that ruins our pure referential transparency.. So, the native dictionary type, Maps, seems long (long) overdue when Erlang is compared with newer languages like Python, Ruby., Perl and Java.

Having a newer and better type of records named Frames seems like an opportunity to take something decent and make it slower (though I understand the extreme effort and attention to make performance close). The argument is hopefully not: "But wait, we have neat syntax which allows us to make neat examples, and they might be useful.... and now everyone should use Frames in their code instead of records", since that seems least desirable. I think it should be hard to make something more efficient than a simple array index lookup, which seems to be the basic idea behind a tuple (with the record syntactic sugar). I don't necessarily care about the records syntax being this preprocessor trick since that is made clear within the documentation. I think the idea that Frames provide a record alternative relies on the benefits of Frames, which appear too minimal because they seem to be focusing on improving the type system, more than extending functionality (so the concern looks
more academic than pragmatic). One of the basic benefits of Frames that has been mentioned is that the type can be used explicitly for external data, however an external API could easily wrap the creation of an atom and tuple, so that justification of Frames doesn't feel very compelling (though the Frame type would make a record/tuple object more naturally a C struct-like type within Erlang).

I understand often performance is not a concern, but I can't help but think it should always be the most important concern for compiler writers and VM writers (at the very least, not to mention the database writers, the message bus writers, and the algorithm library writers), which is why I believe it carries so much importance to the decision of the implementation of Maps or Frames. So, I am very happy with the pursuit of the Map type, while the pursuit of the Frame type seems inappropriate (in the sense of a feature triage).

Loïc Hoguin

unread,
May 11, 2013, 5:46:26 AM5/11/13
to Robert Virding, Erlang-Questions Questions
Here's a simplified snippet:

add_item(Player=#{ inventory => #{ ItemID => # { quantity = Q }}},
ItemID, ItemInfos) ->
Player#{ inventory => #{ ItemID => #{ quantity => Q + 1 }}};
add_item(Player, ItemID, ItemInfos) ->
Player#{ inventory => #{ ItemID => ItemInfos }}.

Pretty clear what this code is doing, the equivalent with functions
leaves me scratching my head. This kind of syntax also allows you to get
it right the first time a lot more often than with functions.

I need to do things like this for more than 500 different actions.
There's not enough time in the world to do it with functions. That
particular project is on hold until I can avoid wasting my time btw.

Richard A. O'Keefe

unread,
May 13, 2013, 1:22:32 AM5/13/13
to Natesh Manikoth, Erlang-Questions Questions

On 11/05/2013, at 1:41 AM, Natesh Manikoth wrote:

> Erlang newbie here. This comment therefore is not about the merits of the proposals.
> I detect a certain tendency to dismiss suggestions if the suggestions are not germane to that particular user's current (or past) needs.

The way I see it, we need *BOTH* a good replacement for records (which are
pretty much ubiquitous in OTP) *AND* a top notch implementation of dictionaries.

I don't just think it *can* be both, I think it *should* be both.

One thing I note about dictionaries is that sometimes I want hashed
dictionaries and sometimes I want sorted dictionaries, and when I
want sorted dictionaries I often need to provide my own comparison
function. (Java has HashMap and TreeMap. My Smalltalk has
Dictionary and SortedDictionary.) The fact that I want more than one
kind of dictionary and that I want more than one kind of hashing+equality
(does string case matter?) or sorting (ascending or descending? what fields)?
Suggests to me that no one kind of dictionary, however, good, should be
privileged with special syntax, unless they _all_ can be.

Just today, in the space of a few dozen lines of Java code, I found myself
juggling both hashed and sorted sets and maps, with three different kinds
of keys/elements.

Given the possibility of a top notch implementation of dictionaries,
a top notch implementation of sets should not be far away (hint hint).

Richard A. O'Keefe

unread,
May 13, 2013, 1:26:47 AM5/13/13
to Loïc Hoguin, Erlang-Questions Questions

On 11/05/2013, at 2:34 AM, Loïc Hoguin wrote:
> Not dismissing it here, I'd be happy to have both, but maps are solving a problem (dict manipulation is incredibly tedious and time consuming in Erlang) while frames simply improve what we already have.

There is another EEP that I haven't finished trying to set up a *general*
solution for the "deep updates" problem. It really isn't a data structure
issue.

Maps are about three things:
- a data structure (although it appears that the data structure will be changed)
- a syntax for that data structure which makes it seem as though it is strongly
recommended as a record replacement (and that is *precisely* how Joe Armstrong
has taken it; the new book he's working on uses maps where it would otherwise
have used records or frames)
- an attempt to solve the deep updates problem.

These things can be decoupled. We can have a "deep updates" solution
that works for lists AND tuples AND records AND frames AND maps. We
can have a top notch data structure. And these things do not require
maps to use frame-like syntax.

Jesper Louis Andersen

unread,
May 13, 2013, 5:02:20 AM5/13/13
to Robert Virding, erlang-pr...@googlegroups.com, Steve Davis, Erlang-Questions Questions

On May 11, 2013, at 12:45 AM, Robert Virding <robert....@erlang-solutions.com> wrote:

> Processing lists of integers is faster than processing binaries, which is not surprising really.

I would contend this claim. Processing a binary is more cache-friendly than processing a list, so in certain cases, cache locality would have the binary win by a large margin. Or are you thinking of something entirely different?

Jesper Louis Andersen
Erlang Solutions Ltd., Copenhagen

Joe Armstrong

unread,
May 13, 2013, 6:15:03 AM5/13/13
to Robert Virding, erlang-q...@erlang.org, ee...@erlang.org
On Thu, May 9, 2013 at 10:56 PM, Robert Virding <robert....@erlang-solutions.com> wrote:
Even before taking a really deep dive into studying the EEP one thing I can immediately say: get rid of this having both equal and match and USE ONLY MATCH. Keys are the same when they match. Period. This paragraph:

If key K does not equal any existing key in the map, a new association will be created from key K to value V. If key K is equal to an existing key in map M its associated value will be replaced by the new value V and the new key K will replace the old key if the key does not match. In both cases the evaluated map expression will return a new map.

is weird. Yes I know what it means but it is not intuitive. When keys are replaced or not replaced when they are equal is not can seem very strange to those not deep into erlang semantics.

I think the problem here is the description - not the semantics. It's not the keys which are replaced, the crucial
idea is to have two different syntaxes. This needs a longer explanation to make sense.

Short explanation
==============

    ':=' means "update an existing key - crash if they key is not present"
    '=>' means "update an existing key OR add a new key" 

The value is pretty much irrelevant

Long explanation of why this is good
=============================

We can update an existing map M with the syntax:

    M#{ K1 Op V1, K2 Op V2, ... Kn Op Vn}

Where Op is either => or :=.

The syntax K => V never generates an error and is used to introduce a new key
or to update an existing key with a new value.

The syntax K :=  V is used to update the value of an *existing* key and will
raise an exception if the key K does not already exist in the map M.

The difference between these two modes of update is crucial, but needs
a couple of examples to explain:

Assume we define a map M as follows:

    M = #{ foo => 1, bar => 2, baz => 3 }


The update

    M # {foo := 12,  bary := 24}
 
Will fail (raise an exception) since there is no key called bary in the
original map. This is good idea (ROK suggested this in his frames paper)
since we don't want too accidentally create a new key due to a spelling 
error. This is the crash-early property of the := update syntax.

Crashing later would make debugging difficult, we would accidentally add
a bad key to a map and learn about it way later.

The update

   M # {foo := 12, bar := 24}

Will succeed, but more importantly the new map has exactly the same
keys as the old map (since all the updates are ':=' updates) - and so
can *share* the same key descriptor. So if we have a very long list of
maps they can be stored in a space efficient manner. (Again this idea
comes from ROKs frames paper). Björn-Egil's eep didn't mention this
but the fact that we know that two maps have the same keys from the
syntax make a lot of optimizations possible).

All of this is possible because there are two operators not one :-)

---

As regards efficiency, utility, beauty and so on these are subjective.

If you want the last ounce of efficiency records and dicts are not
going to go away when maps arrive. So if maps have the wrong
performance characteristics then use the exiting mechanisms.

In the latest addition of my book I've been documenting the changes to maps
- this chapter has changed three times and has tended to be conservative 
so I haven't (yet) mentioned that keys can be any term (and not just atoms).

I'm rather looking forward to being able to represent things like XML
and JSON and property lists in a maps and to have an one-size-fits-all
replacement for dicts and records. I've never really worried about the
last ounce of efficiency - if I want real efficiency I'd change
language and go to C or program an FPGA.
 
Cheers

/Joe

Loïc Hoguin

unread,
May 13, 2013, 7:43:45 AM5/13/13
to Richard A. O'Keefe, Erlang-Questions Questions
On 05/13/2013 07:26 AM, Richard A. O'Keefe wrote:
>
> On 11/05/2013, at 2:34 AM, Loïc Hoguin wrote:
>> Not dismissing it here, I'd be happy to have both, but maps are solving a problem (dict manipulation is incredibly tedious and time consuming in Erlang) while frames simply improve what we already have.
>
> There is another EEP that I haven't finished trying to set up a *general*
> solution for the "deep updates" problem. It really isn't a data structure
> issue.

Don't forget the "deep access" possible not only in expressions but also
in function clauses. You'd only be solving half the problem otherwise.

> - a syntax for that data structure which makes it seem as though it is strongly
> recommended as a record replacement (and that is *precisely* how Joe Armstrong
> has taken it; the new book he's working on uses maps where it would otherwise
> have used records or frames)

To be perfectly honest I don't think that for most record uses this is
going to be much of a problem. There's plenty of #state{} records in
user applications that aren't accessed enough to make this significant.

And if you wait 10 years, any performance difference will be
insignificant...

> These things can be decoupled. We can have a "deep updates" solution
> that works for lists AND tuples AND records AND frames AND maps. We
> can have a top notch data structure. And these things do not require
> maps to use frame-like syntax.

I would agree for key/value structures, but I fear that by including
lists and non key/value tuples you're going to make this much too complex.

Another very important thing that maps have and your solution wouldn't
have, is that the syntax carries semantics that your solution won't
have. For example you can't add new fields to a record, while you can to
maps. If the same syntax is used for everything then that means I need
to look elsewhere to understand the code. The maps EEP doesn't force me
to do that.

--
Loïc Hoguin
Erlang Cowboy
Nine Nines
http://ninenines.eu

Björn-Egil Dahlberg

unread,
May 13, 2013, 2:27:56 PM5/13/13
to ee...@erlang.org, erlang-q...@erlang.org
Hi!

Thank you all who have contributed in this discussion!

Instead of replying to every mail I will try to address and correct some misconceptions.

- As stated earlier by several people. Maps are not Frames.

- There are several semantic and syntax differences. Most notably arbitrary terms as keys.

- The implementation of Maps will have Frame like characteristics for "simple" Maps, i.e. small number of associations (at most ~30 - 40 entries) and immediates as keys. Granted, the type information will not be there at compile-time, and Maps will not have literals as keys (as proposed in Frames), but should otherwise be comparable in performance with Frames for those cases. I also believe this would cover most of the use-cases as a record-replacement.

The internal structure in the "simple" case would best be described with two tuples (but it would not have some other header and might have additionally meta-data in the header if necessary):

    { {K1, ...,Kn}, V1, ..., Vn}

This structure will also allow key-sharing between Maps of the same "type", i.e. Maps with identical keys.

When the Map no longer satisfies this "simple" arrangement it will transform into a tree and loose the previous characteristics and gain the characteristics of a tree.


- Maps are of type ordered set and follows term order. This has several implications.
  - Term order does not distinguish between types float and integer.
  - If data is stored in a tree where we search entries based on keys and we do so by term order. Since term order does not distinguish between float and integer, floats and integers which are equal occupy the same place.
  - When iterating over keys in a Map it is done so in Term order, or reversed Term order, meaning it is expected that floats and integer are mixed, ex. 1 < 2.0 < 3 < 4.0 < 5 < 6.0. If we distinguish between floats and integers in Maps, let's say integer < doubles, the previous order would be, ex: 1 < 3 < 5 < 2.0 < 4.0 < 6.0. We would no longer iterate in Term order and this would not be consistent with the rest of Erlang, hence unacceptable.

Maps emulates the behaviour of ETS ordered set in this sense.

I think we can all agree that this is an unfortunate feature of Erlangs term order. I also agree that it would be best to have *matching* only, and not *equals* but I don't see how this could be achieved easily. At least not in an acceptable way. For example, we cannot achieve this by specifying our own Term order since that would violate iteration order, printing order, etc, or at least expected order.

As noted previously, if we had Maps of type set we would not have this problem. However, since Maps needs to fit into Erlangs term order, Maps needs to be ordered (unless we could somehow abandon this strict rule).


- Deep updates of Maps are not covered by this EEP. We would welcome suggestions on this.

Thank you!

Regards,
Björn-Egil

Richard A. O'Keefe

unread,
May 13, 2013, 11:43:03 PM5/13/13
to Loïc Hoguin, Erlang-Questions Questions

On 13/05/2013, at 11:43 PM, Loïc Hoguin wrote:

> On 05/13/2013 07:26 AM, Richard A. O'Keefe wrote:
>>
>> On 11/05/2013, at 2:34 AM, Loïc Hoguin wrote:
>>> Not dismissing it here, I'd be happy to have both, but maps are solving a problem (dict manipulation is incredibly tedious and time consuming in Erlang) while frames simply improve what we already have.
>>
>> There is another EEP that I haven't finished trying to set up a *general*
>> solution for the "deep updates" problem. It really isn't a data structure
>> issue.
>
> Don't forget the "deep access" possible not only in expressions but also in function clauses. You'd only be solving half the problem otherwise.

I was talking about deep *update*, not deep *access*.
>
>
> To be perfectly honest I don't think that for most record uses this is going to be much of a problem. There's plenty of #state{} records in user applications that aren't accessed enough to make this significant.
>
> And if you wait 10 years, any performance difference will be insignificant...

The frames proposal has been around for 10 years already.
If I wait 10 years, it is quite likely that nothing will happen.

Machines are not getting faster these days, so I'm not sure why
waiting would make the performance difference insignificant.

>> I would agree for key/value structures, but I fear that by including lists and non key/value tuples you're going to make this much too complex.

Single mechanism. ONE thing to understand.
Proven technology: people using other languages with this approach
don't seem to have any problem.

Bruno Girin

unread,
May 14, 2013, 5:18:46 AM5/14/13
to Joe Armstrong, ee...@erlang.org, erlang-q...@erlang.org
Exactly and that is very, very useful. Being used to other languages that have maps but don't differentiate between those two use cases, I regularly find myself writing code like:

if key in map then do this else do that

So being able to differentiate between both behaviours by just using the right operator for what I want to do at the time is nice.



---

As regards efficiency, utility, beauty and so on these are subjective.

If you want the last ounce of efficiency records and dicts are not
going to go away when maps arrive. So if maps have the wrong
performance characteristics then use the exiting mechanisms.

In the latest addition of my book I've been documenting the changes to maps
- this chapter has changed three times and has tended to be conservative 
so I haven't (yet) mentioned that keys can be any term (and not just atoms).

I'm rather looking forward to being able to represent things like XML
and JSON and property lists in a maps and to have an one-size-fits-all
replacement for dicts and records. I've never really worried about the
last ounce of efficiency - if I want real efficiency I'd change
language and go to C or program an FPGA.

I'd debate that. As you explain very well in chapter 7 of your book, Erlang is very good at network protocol processing because of its bit syntax. For simple protocols, I'd really want to keep it lightweight and use something efficient like records. Records also allow me to say: this data has a fixed structured format, don't add random keys to it.

On the other hand, if I'm processing JSON or XML, I'd want the whole power of maps.

Now if I can have maps behave as efficiently as records for simple structures while giving me a single syntax to deal with both simple and complex structures then I'm all for it.

Cheers,

Bruno

Loïc Hoguin

unread,
May 14, 2013, 7:15:32 AM5/14/13
to Richard A. O'Keefe, Erlang-Questions Questions
On 05/14/2013 05:43 AM, Richard A. O'Keefe wrote:
>> To be perfectly honest I don't think that for most record uses this is going to be much of a problem. There's plenty of #state{} records in user applications that aren't accessed enough to make this significant.
>>
>> And if you wait 10 years, any performance difference will be insignificant...
>
> The frames proposal has been around for 10 years already.
> If I wait 10 years, it is quite likely that nothing will happen.

I would suggest not waiting, then.

> Machines are not getting faster these days, so I'm not sure why
> waiting would make the performance difference insignificant.

That's simply not true. Clock rate doesn't increase, but performance
continues getting better. Core i7 CPUs (and Xeon equivalents) are a huge
improvement over the previous generation, even for single-threaded
operations.

>>> I would agree for key/value structures, but I fear that by including lists and non key/value tuples you're going to make this much too complex.
>
> Single mechanism. ONE thing to understand.
> Proven technology: people using other languages with this approach
> don't seem to have any problem.

You don't say what language though, so I'm assuming it's obscure
languages and the technology has only been proven with a small
population, probably mostly academic.

The only thing Google returns me is Perl's Data::Deep which has a
function applyPatch which is unreadable.

If there is a fool proof solution I'm all for it, but I can't really
envision a readable universal syntax for something like this. (And this
is the part where you say that nobody needs syntax, they only need
functions, I assume.)

--
Loïc Hoguin
Erlang Cowboy
Nine Nines
http://ninenines.eu

Loïc Hoguin

unread,
May 14, 2013, 7:38:11 AM5/14/13
to Björn-Egil Dahlberg, erlang-q...@erlang.org
On 05/13/2013 08:27 PM, Björn-Egil Dahlberg wrote:
> - Deep updates of Maps are not covered by this EEP. We would welcome
> suggestions on this.

I think what Kenneth had in his talk was good. But

Simple update:
M#{ K => V }

Nested update:
M#{ K }#{ DK => DV }

I personally would prefer a more powerful:
M#{ K #{ DK => DV }}

Because it allows you to write things like:
M#{ K #{ DK => DV }, K2 => V2 }

And update multiple levels at once.

I know K can be a map, but deep update only works on values, the same
way normal update only works on values, so no confusion is possible,
we're only updating values, not K itself. If you feel an operator is
needed it can be introduced between K and # in the two examples above.

Programmatically this would translate as extracting the map found at key
K, and ensuring it's a map, updating this map with DK => DV, then
placing this map back into the key K in map M along with setting V2 in
key K2.

It doesn't sound hard to implement in the compiler, it's just unrolling
things for access and rolling back again for the actual update. Exactly
what we do manually today.

It should also be easy to compile to an optimized deep update with this
syntax as you got everything in a single expression.

Note: If you are not interested in it despite how simple it is please at
least ensure it can be done with a parse transform.

Thoughts?

--
Loïc Hoguin
Erlang Cowboy
Nine Nines
http://ninenines.eu

Jon Schneider

unread,
May 14, 2013, 7:43:56 AM5/14/13
to "Loïc Hoguin", Erlang-Questions Questions
> That's simply not true. Clock rate doesn't increase, but performance
> continues getting better. Core i7 CPUs (and Xeon equivalents) are a huge
> improvement over the previous generation, even for single-threaded
> operations.

Some would use the word slight rather than huge.

Available processing power is no excuse to throw it down the toilet.

Jon

Jesper Louis Andersen

unread,
May 14, 2013, 7:49:14 AM5/14/13
to Jon Schneider, Erlang-Questions Questions

On May 14, 2013, at 1:43 PM, "Jon Schneider" <j...@axismilton.ltd.uk> wrote:

>> That's simply not true. Clock rate doesn't increase, but performance
>> continues getting better. Core i7 CPUs (and Xeon equivalents) are a huge
>> improvement over the previous generation, even for single-threaded
>> operations.
>
> Some would use the word slight rather than huge.
>
> Available processing power is no excuse to throw it down the toilet.
>

One problem with more processing power is that we have a memory bandwidth bottleneck as well and we can't move data from the memory banks onto the CPU at ease. Optimizing for tighter data representations helps these days. Perhaps more than gaining a clock cycle here and there.

Jesper Louis Andersen
Erlang Solutions Ltd., Copenhagen



ami...@gmail.com

unread,
May 14, 2013, 8:51:08 AM5/14/13
to Jesper Louis Andersen, Erlang-Questions Questions

El May 14, 2013, a las 7:49 AM, Jesper Louis Andersen <jesper.lou...@erlang-solutions.com> escribió:

>
> On May 14, 2013, at 1:43 PM, "Jon Schneider" <j...@axismilton.ltd.uk> wrote:
>
>>> That's simply not true. Clock rate doesn't increase, but performance
>>> continues getting better. Core i7 CPUs (and Xeon equivalents) are a huge
>>> improvement over the previous generation, even for single-threaded
>>> operations.
>>
>> Some would use the word slight rather than huge.
>>
>> Available processing power is no excuse to throw it down the toilet.
>

+1

> One problem with more processing power is that we have a memory bandwidth bottleneck as well and we can't move data from the memory banks onto the CPU at ease. Optimizing for tighter data representations helps these days. Perhaps more than gaining a clock cycle here and there.
>

+1

Tom

Robert Virding

unread,
May 14, 2013, 9:07:09 AM5/14/13
to Michael Truog, Erlang-Questions Questions
One thing that has always puzzled me in these discussions is this fascination with *NATIVE* implementations. Why *native*? What is it with *native* that makes people want it? Is it the speed? Or is it the special syntax? Or what? Why care HOW things are done just as long as it fulfils my requirements. When I want a fast dictionary then FAST is the requirement, same with ORDERED or BAGS or ... . How it is done I don't care. If a special syntax is helpful then great, it is the syntax I am after. People complain that you need a special data type or else you can get them mixed up. I don't buy that. I work with lists, tuples, proplists, orddicts, dicts, gb_trees, records, etc together in one application and one problem I don't have is getting them mixed up. I see it as a solution to a problem which is at worst only very small.

Michael, please don't the idea that I am coming down on you specifically here. I am not. You just had the misfortune to mention this near the top where I could easily see it. And it is something I have been pondering about for a long time. It is one of the classic complaints about Erlang (after ;, if, variables, ...), why isn't there a built-in map type.

One of the ideas behind dicts/orddicts was to provide a standardised API which would allow you to choose and easily change which implementation you need/want. Unfortunately we never wrote this down anywhere and only two different versions exist (though I have a ttdict based on 2-3 trees if anyone needs a sorted tree, very fast too). While this forced you to think it would give you lots of options. One of the bad things of having a built-in type is that you don't have think, but you won't always get the most suitable. And seriously thinking about your data structures is always a Good Thing.

Sorry for the rant here.

What *I* would like is:

- A dynamic record-like alternative which I call rr (Robert's records) to avoid getting into a discussion about what frames are/aren't.

- A really fast dict implementation. I don't care how it's done but speed is the main requirement. If it represented as a built-in type or a tree of tuples/lists I don't care, just as long as it is fast.

Loïc you and I can meet at dawn like gentlemen to discuss special syntax or not. Swords or pistols? :-)

Robert

Max Lapshin

unread,
May 14, 2013, 9:41:15 AM5/14/13
to Robert Virding, Erlang-Questions Questions
I think, that first problem is a convenient syntax for key-value
structure. Syntax that will also work for pattern matching, not also
editing existing structure.

Second problem is a distinguishing between tuples and this special
type. Many of us have problems trying to understand: is this a list of
integers or a string? How should I pack it to json? As a result, many
modern pieces of software are migrating to binary-only.

Dan Gudmundsson

unread,
May 14, 2013, 11:17:11 AM5/14/13
to Max Lapshin, Erlang-Questions Questions
To summon up this thread with my view.

We want both Frames and Dictionaries.

Frames for it's small memory footprint and fast access, can replace records except in the most 
time critical places.

Dictionaries for it's arbitrary key terms and the ability to handle many objects.

Matching is nice so we want have syntax to use them both, we do not want to introduce two
new syntax's in the language because that will make it harder to read and learn.

Fred Hebert
>  #{_ := V} = #{a => 3},
> and variants.

Value searching will not be allowed.

Richard A. O'Keefe
> Frames
>
Maps are basically an extension of Frames which allows for arbitrary key terms.
Which could use your implementation proposal for small maps defined in source code,  
and switches to a tree (or something else) for larger dynamic sized Maps.

>- Frames are limited to atom keys in the interests of
> compile-time error detection (both simply noticing that
>  a key is not an atom at all and allowing the possibility
>  of Dialyzer support).

So use := and you will get a runtime/compile check that the key exists.

>- Frames have O(lg N) access to a single field or O(N/k) access
>  to k fields.  They have O(N/k) cost for copying a frame with
>  k fields changed.

You will still have that for small maps.

Robert Virding
> Use only matching

We would like to but we believe we can not do that because we can't output a defined order, 
i.e. you can not sort 1.0 and 1.
We would need to define a new term order in erlang and we need to introduce something
like <:<, >:> =:< and >:=.

/Dan

Chris King

unread,
May 14, 2013, 11:25:28 AM5/14/13
to Erlang-Questions Questions
+1 for "Robert's records".

I do NOT want to use a dictionary to store record data, for the same
reason I don't want to use a list to store tuple data -- it is not the
correct concept.

All compound data structures are one of two kinds -- heterogeneous and
static, or homogeneous and dynamic.

By "heterogeneous and static", I mean that the components are
potentially of differing types (heterogeneous), and that the form of
the structure as a whole is determinable at compile time (static).
This includes tuples, records, "Robert's records", frames, and
functional objects (as they are known in OCaml).

By "homogeneous and dynamic", I mean that the components are
necessarily of the same type (homogeneous), and that the form of the
structure as a whole is not necessarily determinable at compile time
(dynamic). This includes lists, binaries, maps, and set- or map-like
ADTs.

The other two combinations -- heterogeneous and dynamic, and
homogeneous and static -- are rather useless. The latter is merely a
degenerate case of both the above kinds, and the former is almost a
definite code smell (if you can't describe your data, how can you
operate on it?).

Maps, as presented, are heterogeneous and dynamic. I claim this is
folly: If my data is heterogeneous and static, what use have I for
non-atom keys, and why must I be bound by an explicit ordering of
keys(!?)? If my data is homogeneous and dynamic, why must I pay the
storage cost of identifying the type of each element, and preclude my
typechecker from ensuring the homogeneity of my data?

"Robert's records" / frames, on the other hand, are heterogeneous and
static. They lend themselves well to type-checking (see OCaml's
functional objects), and (IMPORTANTLY -- many seem to miss this!) they
provide SUBTYPING OF RECORDS. This is HUGE (I would say the entire
purpose of frames!), NOT merely syntactic, and importantly -- maps
don't provide this! thanks to the above-mentioned ordering of keys.

I personally don't care about a syntax for homogeneous dynamic data --
syntax is useless unless it provides additional expressivity, and I
don't see that the current proposal does for this case. But if others
do, all I ask is PLEASE don't present this syntax as a solution to the
record subtyping problem, as it is not for reasons outlined above --
and I suspect that a "maps" proposal *ignoring* the record subtyping
use case will result in a more efficient implementation.

Jeremy Ong

unread,
May 14, 2013, 11:38:47 AM5/14/13
to Chris King, Erlang-Questions Questions
>
The other two combinations -- heterogeneous and dynamic, and
homogeneous and static -- are rather useless.  The latter is merely a
degenerate case of both the above kinds, and the former is almost a
definite code smell (if you can't describe your data, how can you
operate on it?).

While I agree to some of your points, I don't quite agree here. For example, during a code upgrade, one may want to add a new field to the data (heterogeneous). With static data structures, this upgrade is trickier as it requires the origin and destination to have the correct number of fields or it won't work. With a dynamic structure, no state migration is necessary.

In addition, just because a feature could cause code smell in certain circumstances doesn't mean it should be immediately excluded. Erlang provides list_to_atom (with appropriate caution) because occasionally, it is the right thing to do (convenience in small scripts for example). I think a dynamic heterogeneous structure can provide tremendous value. Take JSON for example. It's used everywhere, and people import JSON into Erlang as a bunch of nested keylists. This is the heterogeneous dynamic structure you are talking about and it is very clunky to work with! Using nested keyfinds, keystores, and keydeletes is extremely clunky and I feel forced to "take an object oriented approach" by creating modules providing getters and setters for each of these values. I hate doing this personally but it's better than the alternative.

Cheers,
Jeremy

Michael Truog

unread,
May 14, 2013, 12:22:46 PM5/14/13
to Robert Virding, Erlang-Questions Questions
Comments below:

On 05/14/2013 06:07 AM, Robert Virding wrote:
> One thing that has always puzzled me in these discussions is this fascination with *NATIVE* implementations. Why *native*? What is it with *native* that makes people want it? Is it the speed? Or is it the special syntax? Or what? Why care HOW things are done just as long as it fulfils my requirements. When I want a fast dictionary then FAST is the requirement, same with ORDERED or BAGS or ... . How it is done I don't care. If a special syntax is helpful then great, it is the syntax I am after. People complain that you need a special data type or else you can get them mixed up. I don't buy that. I work with lists, tuples, proplists, orddicts, dicts, gb_trees, records, etc together in one application and one problem I don't have is getting them mixed up. I see it as a solution to a problem which is at worst only very small.

I think my concern about whether the implementation is native (Erlang-only source code) or not relates to the performance characteristics. It seems as if we need C integration to get faster than dict on a wide variety of key types. I agree that it is best to have many different types of dictionary implementations, and ideally we would have a bunch of dictionaries implemented in C also :-) However, I understand that more testing and considerations are necessary to make the C integration decent performance-wise and fault-tolerance-wise. Basically, what seems required, is performance similar to the process dictionary, or better, as a separate type, called Maps.


>
> Michael, please don't the idea that I am coming down on you specifically here. I am not. You just had the misfortune to mention this near the top where I could easily see it. And it is something I have been pondering about for a long time. It is one of the classic complaints about Erlang (after ;, if, variables, ...), why isn't there a built-in map type.
>
> One of the ideas behind dicts/orddicts was to provide a standardised API which would allow you to choose and easily change which implementation you need/want. Unfortunately we never wrote this down anywhere and only two different versions exist (though I have a ttdict based on 2-3 trees if anyone needs a sorted tree, very fast too). While this forced you to think it would give you lots of options. One of the bad things of having a built-in type is that you don't have think, but you won't always get the most suitable. And seriously thinking about your data structures is always a Good Thing.

I agree, and I wish OTP/Erlang was able to add more data structures like your rbtree to compliment the gbtree, and possibly resolving any differences between the aatree and the gbtree. So, I think having more data structures is more beneficial than having less, since it gives developers more options. Though I understand these decisions are probably conservative and slow moving. Please release the ttdict! Have been wanting to see that one for awhile.


>
> Sorry for the rant here.
>
> What *I* would like is:
>
> - A dynamic record-like alternative which I call rr (Robert's records) to avoid getting into a discussion about what frames are/aren't.
>
> - A really fast dict implementation. I don't care how it's done but speed is the main requirement. If it represented as a built-in type or a tree of tuples/lists I don't care, just as long as it is fast.

I agree a record-like alternative would be nice. I just see it as less important than the fast dictionary implementation. So, if given the choice between the two, I am happy we can get the fast dictionary implementation.

The record-like alternative is supposedly most important for external term representation, since it is a separate type. Normally, usage of records as external terms are a type of message format used in communication with ports or other things (cnodes, or possibly port drivers). So, perhaps what we want is something a bit different from the concept of a "record" but rather a "message" type. A message could just be a generic container that defines a name, a header, and a body, where the header is normally empty (the header is always key->value) and the name is always an atom (or possibly atom or binary/string, to avoid excessive atom consumption). The body could be a binary that is accessed efficiently based on key->values where each key maps to an index. I will admit this idea is coming from a different angle, so it isn't well-aligned with the previous discussions, but it seems like trying to replace records as they are typically used is futile (unless we just make a type
specific to tuples with an atom to keep the same performance, and satisfy both concerns). Either way, it seems important to focus on what the record-like alternative is suppose to solve and the problems it is trying to solve. In the past, the discussion of frames seems to have gotten bogged down in syntax discussions, and convenience, but whatever record-like alternative which is pursued seems to require clearly stated benefits. So far, the main one I have seen is just that there is a way to represent records as external terms, and that reason isn't entirely compelling when you first see it. Simplifying debugging and development is probably an obvious goal, but to really do that, requires the same performance to replace records usage, so then you get back to replicating a tuple atom combo as a separate type, instead of focusing on a new type with new characteristics.

Loïc Hoguin

unread,
May 14, 2013, 12:38:38 PM5/14/13
to Robert Virding, Erlang-Questions Questions
On 05/14/2013 03:07 PM, Robert Virding wrote:
> Loïc you and I can meet at dawn like gentlemen to discuss special syntax or not. Swords or pistols? :-)

Pistols.

I don't disagree with most of what you say. The special syntax isn't
really needed until you get a project where you need to keep really many
things in the state (hundreds if not thousands of terms per connection)
and update them many times, in my case many times per second, while
still striving for low latency.

When you have that many values, your first enemy is verbosity, not only
to be able to write the code fast, but also to find the bugs in it
quicker. Execution speed only comes after. Of course by verbosity I
don't mean saving a character or two, I mean that what the code says is
simpler. For example, why would you want to say "I want to extract this
value V1 then extract this value V2 then modify a value in V2 then put
V2 back in V1 then put V1 in the original structure" instead of "I want
to modify this deep value there"? We shouldn't have to worry about all
these intermediate operations. They're a lot longer to write, and a
bigger source of error.

My personal use case is probably not that universal, but I believe it
also applies to anyone who has to access or modify JSON or equivalent
(not XML). How do you access deep values in JSON in Erlang? How do you
update a value in JSON? How do you update 20 of them? Painfully. You can
write functions to do it if you don't need to do it many times in your
program, but the more you need to do it the more you'll want a special
syntax to actually get things done instead of handling the details of
the modification.

--
Loïc Hoguin
Erlang Cowboy
Nine Nines
http://ninenines.eu

Chris King

unread,
May 14, 2013, 1:37:15 PM5/14/13
to Jeremy Ong, Erlang
On Tue, May 14, 2013 at 12:25 PM, Jeremy Ong <jer...@quarkgames.com> wrote:
>> Code upgrades involve only invoking new functions from old functions
>> -- these are syntactically different functions and hence may have
>> different static types.
>
> What about running processes? How does a gen_server upgrade work? The state
> isn't explicitly mutated but it does need to transition.

The old loop() function invokes the new upgrade() function, which
invokes the new loop() function. The two loop() functions are
separate functions and hence may have different types. There's no
magic or hand-waving involved.


>> Yes, and OCaml provides Obj.magic (C-style type cast). However it,
>> like list_to_atom, are *explicit* -- I know immediately by its
>> presence that something very weird is going on which warrants closer
>> inspection, and its uses are almost *always* at I/O boundaries.
>>
> C++ is an obvious extreme of a language featuring many things that could
> involve "code smell" but each of which may be situationally useful. Again,
> this is an extreme example but you get the idea.

Extreme indeed ;) However if feature X is useful only in certain
situations but is a code smell in most others, then I claim that
feature X is poorly designed. C++ templates come to mind -- yes, they
are useful for massive abstraction, but they provide enough means of
abuse that one should consider them a poorly designed feature.
(OCaml's module system is a superb example of a feature intended for
massive abstraction that makes doing "bad" things nigh impossible.)


>> JSON is not dynamic. I must know exactly what fields exist in any
>> JSON object I'm working with, or I'm doing something wrong. (This
>> design pattern is evinced by the fact that JSON keys may not be
>> arbitrary values.)
>
> Perhaps I misunderstand what you mean by the term dynamic. I should be able
> to access the field "foo" or 5 inside a json object regardless of whatever
> other crap is there.

Static structure does not preclude subtyping. OCaml's functional
objects capture this notion perfectly -- an object has a set of
methods which are known at compile time (hence static). However any
given function need only be aware of the presence of the subset of
methods in which it is interested. Note that this does not preclude a
function from creating a copy of said object with only those methods
modified!

By "dynamic" I mean not determinable at compile time: e.g. the length
of a list is generally not knowable at compile time, as the point of a
list is expressly to sequence an arbitrary number of items.

Not knowing which members are present in a structure renders the
structure nigh useless -- if a structure is missing some data my code
expects it to contain, then it is for all purposes broken. (Note that
using the absence of a member to indicate a "null" or default value is
folly -- how am I to distinguish between code wishing to use a default
value, and code which incorrectly did not specify a value due to a
typo?)

Naturally we encounter a problem at the I/O boundaries of our program
-- if we are reading JSON data from the network, we can in no way
guarantee it will contain (a superset of) the fields we expect. But
this is exactly where input sanitation must take place: if we blindly
accept foreign data, we open ourselves up to security risks (viz.
list_to_atom). To ensure correct and secure operation, we must
explicitly sanitize the homogeneous dynamic data from the network (via
parsing) to obtain heterogeneous static data (via construction).

Of course we may permit the function "binary_to_frame" as a shortcut,
but, like list_to_atom and the infamous eval, we lose the ability to
typecheck the result (unless we decorate with assertions, which
fortunately Erlang provides for), and we open up security holes.


> Yea I'm wrong here, for some reason, I thought JSON keys could be integers
> as well.

IIRC Javascript object keys can be (this is how arrays work), perhaps
this is what you were recalling?


>> The problem you describe is one of data interface. Assume for a
>> moment that the network does not exist, and all JSON data lives purely
>> within Erlang. JSON data may then happily be represented as a
>> heterogeneous static structure (e.g. a frame) rather than a
>> homogeneous dynamic one (e.g. a keylist).
>
> A better example would be the interface between Erlang and Ruby. Ruby hashes
> are heterogeneous and dynamic and allow for integer, atom, or string keys.
> If I want full interoperability between them, I will need to use a
> heterogeneous structure.

In more strict statically typed languages (e.g. the ML family), this
issue is resolved by tagged unions -- the type of a Ruby key could be
"{int, integer()} | {atom, atom()} | {str, string()}". This preserves
type-ability at the cost of (admittedly annoying) extra keystrokes.

Erlang, being dynamically typed, maintains an implicit tag for each
type, the full union of which is denoted "any()"; hence the above use
case may be supported by claiming that the Ruby dict() maps type any()
to type any().

Obviously this only works for maps, not frames -- the reason I argue
*for* frames (not against maps! I don't care if there's both!) is
precisely because I *don't* want to be able to do this: I want my
frames to have static structure, and I want the compiler (or Dialyzer)
to kick me in the pants if they don't!

Björn-Egil Dahlberg

unread,
May 14, 2013, 1:40:16 PM5/14/13
to Fred Hebert, erlang-q...@erlang.org, ee...@erlang.org



2013/5/8 Fred Hebert <mono...@ferd.ca>
I was looking at an example:

    1> M = #{ 1 => a },
    #{ 1 => a }
    2> M#{ 1.0 => b }.
    #{ 1.0 => b }.
    3> M#{ 1 := b }.
    #{ 1 => b }
    4> M#{ 1.0 := b }.
    ** exception error: bad argument

And given the definition where we have 'match' and 'equal' being two
different things, am I right in understanding that '=>' can both create
a mapping and update it based on 'equal' whereas ':=' can only update
them using 'match' ?

Yes.
 '=>' is used to associate new key value pairs, or update an existing (*equal*) key.
 ':=' is used to update an existing (matching) key, or used in patterns in matching.
 

I don't think there is any mention of the expected behaviour when
matching on #{_ := V} = #{a => 3}, unless I missed it (there
were cases in map comprehensions, but these are obviously not the same).
Does this need to be specified? Similarly, #{_ := V} = #{a => 1, b => 1}
and #{K := 1} = #{a => 1, b => 1} do not seem specified to me.

No, #{ _ := V } = #{ a => 1} is not allowed. Only keys which are bound by the environment are allowed.
 

Björn-Egil Dahlberg

unread,
May 14, 2013, 2:21:32 PM5/14/13
to Loïc Hoguin, erlang-q...@erlang.org

2013/5/14 Loïc Hoguin <es...@ninenines.eu>

On 05/13/2013 08:27 PM, Björn-Egil Dahlberg wrote:
- Deep updates of Maps are not covered by this EEP. We would welcome
suggestions on this.

I think what Kenneth had in his talk was good. But

Simple update:
  M#{ K => V }

Nested update:
  M#{ K }#{ DK => DV }

I believe this was a hasty example since it would, i gather, return the inner Map.

1> M = #{ a => #{ b => 1 } }.
#{ a => #{ b => 1 } }
2> M#{ a }#{ b := 2 }.
#{ b => 2 }
 



I personally would prefer a more powerful:
  M#{ K #{ DK => DV }}

Because it allows you to write things like:
  M#{ K #{ DK => DV }, K2 => V2 }

And update multiple levels at once.

I know K can be a map, but deep update only works on values, the same way normal update only works on values, so no confusion is possible, we're only updating values, not K itself. If you feel an operator is needed it can be introduced between K and # in the two examples above.

Programmatically this would translate as extracting the map found at key K, and ensuring it's a map, updating this map with DK => DV, then placing this map back into the key K in map M along with setting V2 in key K2.

It doesn't sound hard to implement in the compiler, it's just unrolling things for access and rolling back again for the actual update. Exactly what we do manually today.

It should also be easy to compile to an optimized deep update with this syntax as you got everything in a single expression.

Note: If you are not interested in it despite how simple it is please at least ensure it can be done with a parse transform.

Thoughts?

I think we want to be explicit here. Meaning using an operator and "seeing" the value in action. (I have bad memories of perl).

Just some first thoughts on deep updates here to see if I get the intention right. What I think we need is someway to use an associated value within the map. I figure, something like "escaping" the key which would fetch the associated value. Let's use '~' in this example.

Say I have a nested map like above, M = #{ a => #{ b => 1 }} and I would like to update the inner b value with it self plus four. I could write it like this:

3> M#{ a := ~a#{ b := ~b + 4 }}.
#{ a => #{ b => 5 }}

Another possibility would perhaps be to bind values first (wrapped in { .. } )and then use them in updates, say

M#{ a := M }{ a := M#{ b := V }{ b := V + 4 } }

.. that looks even more horrible .. =)

I haven't given this much thought but I'm trying to get a sense of it. Something like this?

// Björn-Egil

Jachym Holecek

unread,
May 14, 2013, 3:05:09 PM5/14/13
to Lo?c Hoguin, Erlang-Questions Questions
# Lo?c Hoguin 2013-05-14:
> On 05/14/2013 03:07 PM, Robert Virding wrote:
> >Lo?c you and I can meet at dawn like gentlemen to discuss special syntax
> >or not. Swords or pistols? :-)
>
> Pistols.
>
> I don't disagree with most of what you say. The special syntax isn't
> really needed until you get a project where you need to keep really many
> things in the state (hundreds if not thousands of terms per connection)
> and update them many times, in my case many times per second, while
> still striving for low latency.

Have you considered ETS tables? A private ordered_set variety would seem
to fit your needs rather well, based on what you just said and an example
you gave in an earlier post. You mention keeping large state, be mindful
this may, inherently, be playing against your low latency goal if you keep
all the data on heap. Well, depending on what precisely you mean by "large
state" and "low latency".

Just briefly -- you mentioned you need to maintain nested objects along
with attributes:

true = ets:insert_new(Tid, {{Player_id, possesions, Item_id, count}, 1})

There's also ets:{update_counter/X|lookup_element/X}, and ets:select/X with
known prefix behaves just nice too. Maybe a list would work better as key
type than tuple. Anyway, you get the idea.

Also -- you could consider using processes to encapsulate various bits of
state.

> When you have that many values, your first enemy is verbosity, not only
> to be able to write the code fast, but also to find the bugs in it
> quicker. Execution speed only comes after. Of course by verbosity I
> don't mean saving a character or two, I mean that what the code says is
> simpler.

True. It may be that a native type would help, it may be that more careful
choice of data representation and analysis of data flows in your application
would help a lot more (not to come across patronizing here -- I'm reminded
of a few messy protocol converters/gateways we did at ${dayjob}).

> For example, why would you want to say "I want to extract this
> value V1 then extract this value V2 then modify a value in V2 then put
> V2 back in V1 then put V1 in the original structure" instead of "I want
> to modify this deep value there"? We shouldn't have to worry about all
> these intermediate operations. They're a lot longer to write, and a
> bigger source of error.
>
> My personal use case is probably not that universal, but I believe it
> also applies to anyone who has to access or modify JSON or equivalent
> (not XML).

<rant>The exact degree of ugliness and dismal design work in all these
"modern" HTTP-ish "protocols" is perhaps new</rant>, dealing with deeply
nested data structures on external interfaces is not...

> How do you access deep values in JSON in Erlang?

I don't. Usually it ends up being pretty reasonable to convert them to
Erlang-friendly internal representations on protocol boundaries (good
thing one mostly knows what parts of them are actually of intereset),
these end up being mostly flat -- often tagged-value lists or records.

For conversion itself -- layer-by-layer, accumulating the result as
you go. But clearly, it all depends on the details here.

> How do you update a value in JSON? How do you update 20 of them?
> Painfully.

Convert them to something sensible and only then process them? A few
utility functions on top of tagged-value lists can work miracles for
you, for example.

> You can write functions to do it if you don't need to do it many times
> in your program, but the more you need to do it the more you'll want a
> special syntax to actually get things done instead of handling the
> details of the modification.

Just to be clear: I'm not expressing any opinions about any existing or
fictitious EEP. Merely trying to approach your problem from a different
angle, a more general one I think. Also my suggestion to use ETS in not
necessarily related to following points about data representation.

BR,
-- Jachym

Michael Truog

unread,
May 14, 2013, 3:51:43 PM5/14/13
to Björn-Egil Dahlberg, ee...@erlang.org, erlang-q...@erlang.org
I haven't seen it mentioned yet, so I thought I would mention it... sorry if this repeats what others have suggested:

It would help if we had a way to walk the tree representation of the Map.  I know it might be an uncommon use-case, but it probably would provide support for deep-updates and complex path usage (JSONPath, JSONQuery, or JSONSelect type of stuff, http://stackoverflow.com/a/8481733).  Walking the tree, could be emulated even if the data structure is not tree shaped due to a small size.

The particular use case I am thinking of, is for doing matching on the key value.  I have some trie source code here: https://github.com/okeuday/trie with a trie:find_match/2 function, which takes an exact string (list of integers) value and attempts to match it to stored key string patterns that possibly contain 1 or more "*" wildcard character (strings containing "**" do not exist, since the occurrence is considered an invalid entry).  A "*" wildcard character matches one or more characters in the exact string value provided as a trie:find_match/2 function argument.  The match could be ambiguous, except that it always finds the most exact match, going as deep into the tree as possible, and then iterating in alphabetical order (i.e., as exact as possible, based on the prefix).  The opposite functionality (pattern matching exact string keys stored) is provided by trie:fold_match/4.  This type of functionality should require an incomplete traversal of the tree structure (a set of functions that provide iterator behavior).

Thanks,
Michael
_______________________________________________
eeps mailing list
ee...@erlang.org
http://erlang.org/mailman/listinfo/eeps

Loïc Hoguin

unread,
May 14, 2013, 4:21:29 PM5/14/13
to Björn-Egil Dahlberg, erlang-q...@erlang.org
On 05/14/2013 08:21 PM, Björn-Egil Dahlberg wrote:
> 2013/5/14 Loïc Hoguin <es...@ninenines.eu <mailto:es...@ninenines.eu>>
That's taking it even further than I thought. I would be perfectly fine
with binding separately from updating, especially because:

M#{ a := ~a#{ b := ~b + 4 }}.

isn't so pretty when done with some key types like:

M#{ <<"a">> := ~<<"a">>#{ <<"b">> := ~<<"b">> + 4 }}.

Now most of what I would be using would be atoms and integers, but this
still would be the ugliest allowed syntax ever.

It works fine if done in two steps (with the first step most likely done
in the function clause itself).

doit(M=#{ a := A=#{ b := B }}) ->
M#{ a := A#{ b := B + 4 }}.

Writing this I realize it's quite possible that you can already do it
with the current EEP. Am I right?

And it also fits:

doit(M=#{ a := A=#{ b := B }, aa := yes_do_it}) ->
M#{ a := A#{ b := B + 4 }, aa := done}.

You can already do this with records. Can we do this with maps too? This
+ any type as key + dynamic is all I need.

It would be cool if that could also be optimized at compile time, though
that's more of an edge case so understandable if that comes later.

Loïc Hoguin

unread,
May 14, 2013, 4:38:43 PM5/14/13
to Jachym Holecek, Erlang-Questions Questions
On 05/14/2013 09:05 PM, Jachym Holecek wrote:
> # Lo?c Hoguin 2013-05-14:
>> On 05/14/2013 03:07 PM, Robert Virding wrote:
>>> Lo?c you and I can meet at dawn like gentlemen to discuss special syntax
>>> or not. Swords or pistols? :-)
>>
>> Pistols.
>>
>> I don't disagree with most of what you say. The special syntax isn't
>> really needed until you get a project where you need to keep really many
>> things in the state (hundreds if not thousands of terms per connection)
>> and update them many times, in my case many times per second, while
>> still striving for low latency.
>
> Have you considered ETS tables? A private ordered_set variety would seem
> to fit your needs rather well, based on what you just said and an example
> you gave in an earlier post. You mention keeping large state, be mindful
> this may, inherently, be playing against your low latency goal if you keep
> all the data on heap. Well, depending on what precisely you mean by "large
> state" and "low latency".
>
> Just briefly -- you mentioned you need to maintain nested objects along
> with attributes:
>
> true = ets:insert_new(Tid, {{Player_id, possesions, Item_id, count}, 1})
>
> There's also ets:{update_counter/X|lookup_element/X}, and ets:select/X with
> known prefix behaves just nice too. Maybe a list would work better as key
> type than tuple. Anyway, you get the idea.

Thanks but I already know ets. ets is too verbose, and I can't pattern
match against values easily in function clauses.

There's been a thread a while back with various "solutions" being given,
none coming close to what maps can allow, most particularly none
allowing matching in function clauses, which is the other requirement
for the code to be tidy.

> Also -- you could consider using processes to encapsulate various bits of
> state.

I'd just be passing the data back to the connection process all the
time, it's pointless. I don't control the protocol, and the protocol
requires many superfluous values to be sent to the client all the time.
The only thing I can control is how I structure things to make this
easier for me, and avoid writing too much code to do it. Maps allow me
to do that.

>> How do you access deep values in JSON in Erlang?
>
> I don't. Usually it ends up being pretty reasonable to convert them to
> Erlang-friendly internal representations on protocol boundaries (good
> thing one mostly knows what parts of them are actually of intereset),
> these end up being mostly flat -- often tagged-value lists or records.
>
> For conversion itself -- layer-by-layer, accumulating the result as
> you go. But clearly, it all depends on the details here.

If you only had objects you could do it in a single line with maps. If
you also have arrays, it'll be one list comprehension per array on top
of that, with possible operations inside it. You could process your
whole JSON with a one liner. No accumulators, not lists:reverse, no need
to write any function, and most likely a lot more optimized with less
chances of bugs than you'd have done otherwise.

>> How do you update a value in JSON? How do you update 20 of them?
>> Painfully.
>
> Convert them to something sensible and only then process them? A few
> utility functions on top of tagged-value lists can work miracles for
> you, for example.

I know what I can do today, you don't need to tell me. I also know what
I could do tomorrow. The future looks pretty damn good.

--
Loïc Hoguin
Erlang Cowboy
Nine Nines
http://ninenines.eu

Björn-Egil Dahlberg

unread,
May 14, 2013, 5:04:10 PM5/14/13
to Loïc Hoguin, erlang-q...@erlang.org
2013/5/14 Loïc Hoguin <es...@ninenines.eu>
On 05/14/2013 08:21 PM, Björn-Egil Dahlberg wrote:
2013/5/14 Loďc Hoguin <es...@ninenines.eu <mailto:es...@ninenines.eu>>
I totally agree with you, it certainly lacks beauty.

 

It works fine if done in two steps (with the first step most likely done in the function clause itself).

doit(M=#{ a := A=#{ b := B }}) ->
  M#{ a := A#{ b := B + 4 }}.

Writing this I realize it's quite possible that you can already do it with the current EEP. Am I right?

Yes.
 

And it also fits:

doit(M=#{ a := A=#{ b := B }, aa := yes_do_it}) ->
  M#{ a := A#{ b := B + 4 }, aa := done}.

You can already do this with records. Can we do this with maps too? This + any type as key + dynamic is all I need.

Yes. I had similar examples in the EEP but didn't nest the Maps.
 

It would be cool if that could also be optimized at compile time, though that's more of an edge case so understandable if that comes later.

What was it you said to me? "Make it work, make it pretty, make it fast." =) Still on step oneish.
 


--
Loďc Hoguin

Loïc Hoguin

unread,
May 14, 2013, 5:48:38 PM5/14/13
to Björn-Egil Dahlberg, erlang-q...@erlang.org
On 05/14/2013 11:04 PM, Björn-Egil Dahlberg wrote:
> It works fine if done in two steps (with the first step most likely
> done in the function clause itself).
>
> doit(M=#{ a := A=#{ b := B }}) ->
> M#{ a := A#{ b := B + 4 }}.
>
> Writing this I realize it's quite possible that you can already do
> it with the current EEP. Am I right?
>
>
> Yes.
>
>
> And it also fits:
>
> doit(M=#{ a := A=#{ b := B }, aa := yes_do_it}) ->
> M#{ a := A#{ b := B + 4 }, aa := done}.
>
> You can already do this with records. Can we do this with maps too?
> This + any type as key + dynamic is all I need.
>
>
> Yes. I had similar examples in the EEP but didn't nest the Maps.

Could be worth adding to the EEP so as to not forget to test it. Should
I send a PR to your repository?

> It would be cool if that could also be optimized at compile time,
> though that's more of an edge case so understandable if that comes
> later.
>
>
> What was it you said to me? "Make it work, make it pretty, make it
> fast." =) Still on step oneish.

Yep. It was just a suggestion that heavily hinted at "optimize for that
case after everything else is optimized".

--
Loïc Hoguin


Erlang Cowboy
Nine Nines
http://ninenines.eu

Björn-Egil Dahlberg

unread,
May 14, 2013, 6:05:58 PM5/14/13
to Loïc Hoguin, erlang-q...@erlang.org
2013/5/14 Loïc Hoguin <es...@ninenines.eu>
On 05/14/2013 11:04 PM, Björn-Egil Dahlberg wrote:
    It works fine if done in two steps (with the first step most likely
    done in the function clause itself).

    doit(M=#{ a := A=#{ b := B }}) ->
       M#{ a := A#{ b := B + 4 }}.

    Writing this I realize it's quite possible that you can already do
    it with the current EEP. Am I right?


Yes.


    And it also fits:

    doit(M=#{ a := A=#{ b := B }, aa := yes_do_it}) ->
       M#{ a := A#{ b := B + 4 }, aa := done}.

    You can already do this with records. Can we do this with maps too?
    This + any type as key + dynamic is all I need.


Yes. I had similar examples in the EEP but didn't nest the Maps.

Could be worth adding to the EEP so as to not forget to test it. Should I send a PR to your repository?

If you do, make sure to base it of erlang/eep master since Raimo did not use my eep branch for some reason. But hopefully we will not forget to test this anyhow =)
 


    It would be cool if that could also be optimized at compile time,
    though that's more of an edge case so understandable if that comes
    later.


What was it you said to me? "Make it work, make it pretty, make it
fast." =) Still on step oneish.

Yep. It was just a suggestion that heavily hinted at "optimize for that case after everything else is optimized".

=)

Richard A. O'Keefe

unread,
May 14, 2013, 6:55:39 PM5/14/13
to Loïc Hoguin, Erlang-Questions Questions

On 14/05/2013, at 11:15 PM, Loïc Hoguin wrote:
>> Single mechanism. ONE thing to understand.
>> Proven technology: people using other languages with this approach
>> don't seem to have any problem.
>
> You don't say what language though, so I'm assuming it's obscure languages and the technology has only been proven with a small population, probably mostly academic.

Stop making assumptions. I want the EEP to have my name on it, but the
approach is used in a language which has enough users that new books about
it or based on it keep rolling off the presses (much faster than Erlang
books appear) and for which there are millions of lines of maintained code
in the public repositories (dwarfing Erlang).
>
> If there is a fool proof solution I'm all for it, but I can't really envision a readable universal syntax for something like this. (And this is the part where you say that nobody needs syntax, they only need functions, I assume.)

Stop making assumptions. Except perhaps for the spelling of a particular
operator, the syntax is familiar to every programmer, including you.
>
There are details that have to be spelled out clearly, and I'm very busy at
the moment, which is why the EEP isn't ready yet.

(Actually, the basic idea has been used in four different languages that I've
used. Great artists steal. (:-))

Richard A. O'Keefe

unread,
May 14, 2013, 9:15:34 PM5/14/13
to Dan Gudmundsson, Erlang-Questions Questions

On 15/05/2013, at 3:17 AM, Dan Gudmundsson wrote:
> Richard A. O'Keefe
> > Frames
> >
> Maps are basically an extension of Frames which allows for arbitrary key terms.

And from my perspective that is the major thing WRONG with them.

I have never understood why it is so hard to persuade people
that unbridled power is a *bad* thing.

"Unbridled Power" is the title of a book by a former New Zealand Prime Minister
arguing that there were practically no limits on what the New Zealand Government
could allow itself to do and that this was a bad thing. When MMP came in he
wrote a successor called "Bridled Power" explaining why the limitations this
would place on a government would be a good thing. However, actual experience
has shown that the party that gets most of the cabinet seats still thinks that
getting the votes of 35% of the electors is a licence to do what you please.
We definitely need better reins for Government.

See the analogy? Frames are valuable because of the things they CAN'T do.
Maps will take any old rubbish as a key EVEN IF YOU DIDN'T MEAN THAT.

For example, suppose you are dealing with JSON, and you add
X=Y to a map, intending this to mirror adding X: Y to the JSON.
But in fact you needed to add Y=X. The map doesn't know or care.

If you want to model JSON in Erlang, you want a data structure
that won't _let_ you use "arbitrary key terms".

> Which could use your implementation proposal for small maps defined in source code,
> and switches to a tree (or something else) for larger dynamic sized Maps.

And frames could also switch to a tree for larger frames.
>
> >- Frames are limited to atom keys in the interests of
> > compile-time error detection (both simply noticing that
> > a key is not an atom at all and allowing the possibility
> > of Dialyzer support).

Frames are limited to atom keys IN THE INTERESTS OF DOING WHAT
THE PROGRAMMER SAYS S/HE WANTS. Compile-time is always an
attractive option, but run-time will do.
>
> So use := and you will get a runtime/compile check that the key exists.

I used to have a Pogo cartoon on my office door where the punchline
was something like "Your mistake is not the right mistake." (:-).

The existence of two different notions of equality in Erlang is a
problem, and some day it will have to be addressed. Fortunately
Prolog was able to distinguish between X < Y (X is numerically less
than Y) and X @< Y (X is less than Y as a term) and we were able to
say that nan(X) => X = X, X == X, \+ (X =:= Y). (a NaN unifies with
itself, is identical to itself, but is not numerically equal to
itself.) One of the reasons for sticking with atoms in frames is
that the two notions of equality and indeed the two notions of
ordering coincide. You could extend that to allow integers as keys,
which arguably ML does. You could extend it to allow binaries as
keys, which could be a good thing. But you can't extend it to floats
without running into trouble.

> Robert Virding
> > Use only matching
>
> We would like to but we believe we can not do that because we can't output a defined order,
> i.e. you can not sort 1.0 and 1.
> We would need to define a new term order in erlang and we need to introduce something
> like <:<, >:> =:< and >:=.

No, the issue here is that ORDERED dictionaries and UNORDERED dictionaries
are different types that have similar but not identical interfaces,
constraints, and performance.

In particular, "is identical to" is the right comparison for unordered
dictionaries and "is numerically equal to" is the right comparison for
ordered dictionaries.

Just as I think trying to smush records and dictionaries into a single
data structure is rather like trying to make a combined motorbike/pogo
stick, so I think that trying to smuch ordered dictionaries and
unordered dictionaries into a single data structure is rather like
trying to make a combined motorbike/tracked vehicle.
See http://en.wikipedia.org/wiki/Kettenkrad. If you want a motorbike,
you probably _don't_ want a Kettenkrad, even though it looks like a
motorbike at the front.

In my Smalltalk library, SortedDictionary offers
_ first
_ first: n
_ firstKey
_ firstKeyGeq: key [ifNone: aBlock]
_ firstKeyGtr: key [ifNone: aBlock]
_ removeFirst
_ removeFirst: n
_ last
_ last: n
_ lastKey
_ lastKeyLeq: key [ifNone: aBlock]
_ lastKeyLss: key [ifNone: aBlock]
_ removeLast
_ removeLast: n
_ removeKeysGeq: key [return: aBoolean]
_ removeKeysGtr: key [return: aBoolean]
_ removeKeysLeq: key [return: aBoolean]
_ removeKeysLss: key [return: aBoolean]
_ reverse[Keys[AndValues]]Do: aBlock
(not all of these are implemented at the moment), as well
as the operations suitable for unordered dictionaries.
These methods make no sense at all for unordered dictionaries.

Providing them offers greater power. There are more things you can do.
But the last thing any sane Smalltalk programmer would want is to have
them as the *default* kind of dictionary, because this power comes at
the cost of increased storage and increased time. I use them
unhesitatingly when I *need* the sorted order. But I avoid them at all
other times.

The existence of two data structures with *different* contracts
(sorted dictionaries and unsorted dictionaries) makes it hard to
privilege one of them as THE map type to get a syntax.

In developing my Smalltalk, I assure you that I have felt the
cultural pressure from Python and Perl to have a syntax for
dictionaries most keenly. But I have 12 different kinds of
dictionary (one of them unfinished) and can foresee a need for
two more. No, wait, there's a "ShellEnvironment" class that is
also a dictionary. So somewhere between 12 and 15. Let's admit
that two of them are for compatibility only, so claim just 10
right now. They differ in
- what kinds of keys are allowed
- what kind of comparison is done
- is sorted traversal easy
- do they have a "parent" that is consulted for read
access to keys not directly present
- what the time and space costs are


Erlang doesn't have a huge number of dictionary types,
but it should certainly have at least

- keys any term, built in hashing and equality
- keys any term, programmer-defined hashing and equality
- keys any term, built-in term order
- keys any term, programmer-defined term order
- keys restricted somehow, perhaps a user-defined filter on addition
- perhaps some optimisation for binary keys or integer keys or atom keys.

Much of the variations could be hidden behind a common interface,
but the distinction between ORDERED and UNORDERED cannot be completely
hidden.

Richard A. O'Keefe

unread,
May 14, 2013, 9:32:57 PM5/14/13
to Loïc Hoguin, Erlang-Questions Questions

>
> My personal use case is probably not that universal, but I believe it also applies to anyone who has to access or modify JSON or equivalent (not XML). How do you access deep values in JSON in Erlang? How do you update a value in JSON?

Not present-day Erlang, but we could do it.

(1) Copy Pop-2.
Define that E0.[m:]f[([E1,...,En])]
is equivalent to [m:]f(E0[,E1,...,En]).

This means that X.hd and hd(X) are the same expression,
and X.lists:sublist(Y, Z) and lists:sublist(X, Y, Z) are the same expression.

This is a trivial source->source transformation that gives functional
semantics OO-like syntax.

Restriction: the dot must be immediately adjacent to the m or the f.
Similar restrictions are found in other languages. For example, in
Haskell, m.f is "f in module m" but m . f is "composition of functions
m and f". People live with it.

(2) With a slight twist.
Define that E0.[E1]
is equivalent to slot(E0, E1), whatever slot is in scope.

So if you do
-import(json, [slot/2]).
then J.[1].[foo].[3].[oar] would be the functional
equivalent of J[1]["foo"][3]["oar"] in Javascript.

Again, we have function *semantics* hiding under OO-ish syntax.
(1) and (2) can be mixed freely.

This is also a pretty trivial source->source transformation.

(3) Define update according to the EEP that I haven't finished yet,
so that
J.[1].[foo].[3].[oar] := Whatever
creates a new variable J' which is used downstream in the place
of J and does pretty much the right thing.

Getting all the details of this right is non-trivial, but
definitely doable.

All of this is purely at the level of syntax.
None of it is specific to records, frames, maps, or Dho-Nha curves.

We *DON'T* have this yet, but we *COULD*,
and it would require NO changes to the VM or the compiler back-end,
so BEAM code generated with a new compiler would be backwards
compatible.

Jesper Louis Andersen

unread,
May 15, 2013, 4:20:20 AM5/15/13
to Chris King, Erlang

On May 14, 2013, at 7:37 PM, Chris King <colan...@gmail.com> wrote:

> Static structure does not preclude subtyping. OCaml's functional
> objects capture this notion perfectly -- an object has a set of
> methods which are known at compile time (hence static). However any
> given function need only be aware of the presence of the subset of
> methods in which it is interested. Note that this does not preclude a
> function from creating a copy of said object with only those methods
> modified!

This is often called "structural subtyping" in contrast to "nominal subtyping". The difference is that the structural variant is implicit whereas the nominal variant is explicit. You have to explicitly define the relationsship in the nominal case, which is common in Java, C++, …

Google Go is another language with structural subtyping in its interface model. There is also some structurality in the Standard ML module system w.r.t. signature checks.

I agree we need something which is better than records at some point, with static guarantee of the contents. But currently, the thing we are missing is a fast map implementation which is functional, so we can avoid ETS tables for purposes they aren't designed for.



Jesper Louis Andersen
Erlang Solutions Ltd., Copenhagen



Robert Virding

unread,
May 15, 2013, 6:28:25 AM5/15/13
to Dan Gudmundsson, Erlang-Questions Questions
From: "Dan Gudmundsson" <dg...@erlang.org>


Robert Virding
> Use only matching

We would like to but we believe we can not do that because we can't output a defined order, 
i.e. you can not sort 1.0 and 1.
We would need to define a new term order in erlang and we need to introduce something
like <:<, >:> =:< and >:=.

I personally think this would be a good idea to introduce anyway as I think today's comparison ops are not good. So:

- introduce a new set of comparison operators that do not convert integers to floats on comparison
- fix the term ordering for this. This is actually very easy, just make integers < floats, we are comparing *terms*
- modify the libraries which use comparisons to use the new ops
- use them in maps if we are going to have sorted maps

I know we can't make the existing comparison ops only work numbers, but it would be nice.

Robert

Chris King

unread,
May 15, 2013, 10:27:43 AM5/15/13
to Jesper Louis Andersen, Erlang
On Wed, 15 May 2013 04:20:20 -0400, Jesper Louis Andersen
<jesper.lou...@erlang-solutions.com> wrote:

>
> On May 14, 2013, at 7:37 PM, Chris King <colan...@gmail.com> wrote:
>
>> Static structure does not preclude subtyping. OCaml's functional
>> objects capture this notion perfectly -- an object has a set of
>> methods which are known at compile time (hence static). However any
>> given function need only be aware of the presence of the subset of
>> methods in which it is interested. Note that this does not preclude a
>> function from creating a copy of said object with only those methods
>> modified!
>
> This is often called "structural subtyping" in contrast to "nominal
> subtyping". The difference is that the structural variant is implicit
> whereas the nominal variant is explicit. You have to explicitly define
> the relationsship in the nominal case, which is common in Java, C++, …
>
> Google Go is another language with structural subtyping in its interface
> model. There is also some structurality in the Standard ML module system
> w.r.t. signature checks.

Yes, I know.


> I agree we need something which is better than records at some point,
> with static guarantee of the contents. But currently, the thing we are
> missing is a fast map implementation which is functional, so we can
> avoid ETS tables for purposes they aren't designed for.

I understand that. But why must (a) this new fast map implementation have
an associated syntax, and (b) this syntax displace the syntax which *is*
needed to implement pattern-matching of frames?

If anything you can get an *even faster* map implementation if you ignore
the use case of extensible records.

Ola Bäckström

unread,
May 15, 2013, 11:15:10 AM5/15/13
to Erlang

> On Wed, 15 May 2013 04:20:20 -0400, Jesper Louis Andersen <jesper.lou...@erlang-solutions.com> wrote:
[snip]
> I understand that. But why must (a) this new fast map implementation have
> an associated syntax, and (b) this syntax displace the syntax which *is*
> needed to implement pattern-matching of frames?

Could the syntax somehow be decoupled from the implementation, allowing for usage in both scenarios?
I just played with the idea a little... so assume the map implementation is found in the module 'coolmap'

A) To create a map, one would have to point out what implementation module that should do the job.

M = coolmap#{foo => 1, bar => 2, baz =>3}.
Since coolmap is an atom, the compiler knows this must be creation of a new map... thus translated to
M= coolmap:create([{foo, 1},{bar, 2}, {baz, 3}]).

The variable M should somewhere contain the module name, so that a call extract_module(M) returns coolmap.

B) To update a map,

M2 = M#{foo := 12}.
Since M isn't an atom the compiler understands that this translates to
M2= (extract_module(M)):update([{foo, 12}], M).

C) To introduce new keys

M3 = M2#{foo:=100, bary => 24}
The Since => is used in the expression this will be translated to another function:
M2= (extract_module(M2)):edit([{update, foo, 100},{set, foo, 12}], M2).

The above would allow anybody to create new implementations, could it be useful to allow Frames and Maps to share syntax?

Richard A. O'Keefe

unread,
May 16, 2013, 1:43:06 AM5/16/13
to Erlang
Oddly enough, the EEP I've been working on (or at least Part 1, as I've
split it into four parts) _is_ syntax decoupled from the implementation.

Here's a copy of the current draft. I'm about to go through the torture
of adding Markdown, sigh.

eep-slot.md

Ola Bäckström

unread,
May 16, 2013, 2:30:54 AM5/16/13
to Erlang
Correction: It was Chris King that wrote what I cited below. I cut away the wrong part, sorry for the confusion.

After reading the EPP I see that similar functions to B and C are called Update and Put but just acts on one pair at a time.

Bengt Kleberg

unread,
May 16, 2013, 2:39:40 AM5/16/13
to Erlang
Please do not cut away parts of an email. Keep all, or remove all and
let me look up the things I need in the archive.


bengt

Loïc Hoguin

unread,
May 16, 2013, 6:38:27 AM5/16/13
to Richard A. O'Keefe, Erlang
If I understand right, you need '#[]' defined in the local module, and
as such can only have one defined?

This is going to be annoying when you want to use this syntax on both
records and JSON and some other structure in the same module.

Why not just have Erlang define a default function if one isn't
available that does what people expect depending on the type?

* lists, tuples: integer N fetching nth element
* proplists: atom A doing a keyfind
* dicts, maps: any A doing an equivalent of a keyfind
* etc.

You'd still allow overriding it but it would do the sane thing by default.

--
Loïc Hoguin
Erlang Cowboy
Nine Nines
http://ninenines.eu

Björn-Egil Dahlberg

unread,
May 16, 2013, 9:23:35 AM5/16/13
to erlang-q...@erlang.org
On 2013-05-10 05:03, Richard A. O'Keefe wrote:
> On 9/05/2013, at 8:10 PM, Max Lapshin wrote:
>
>> Richard. Would you, kindly, tell: where is the difference between maps
>> and frames?
>>
>> As far as I understand, it is not about syntax, it is about internal
>> way of storing data?
> I thought I already had, but here goes.
>
> There is one central semantic difference, which leads to two
> others.
>
> (U) What do you want to optimise the design for?
>
> Frames are optimised (pared to the bone, in fact) for use in
> record-like ways. They are somewhere between pathetic and
> hopeless as general purpose dictionaries.
>
> Maps optimised for service as general-purpose dictionaries.
> They are somewhere between almost usable and pathetic for
> record-like uses.
>
> I do not believe that it is possible to design a data structure
> which is *good* at both tasks. I suspect that it is possible to
> prove this, but I must admit I haven't tried.
>
> THIS is the reason that Björn-Egil Dahlberg was able to sneer at
> frames as "just a replacement". I don't try to build perpetual
> motion machines. I don't try to square the circle with ruler and
> compass. I don't try to design bottle-openers that make good
> chisels. And given two sets of conflicting requirements (even if
> the sets overlap substantially), I don't try to satisfy them both
> with a single data structure. (Using a Java ArrayList as a queue
> is *possible*, but stupid.)

Richard, I'm sorry that you feel that way. It was not my intention to
"sneer" at Frames. I merely stated that Frames is intended as a record
replacement whereas Maps is not.

I also said: "The one suggestion that stands out is of course the Frames
proposal from Richard O'Keefe. It is the most complete proposal I've
seen and is very well thought out." .. but I wanted something else,
hence the EEP.

> As for implementation, Björn-Egil Dahlberg wrote:
>
>> Current prototypes stems from using sparse tuples in a
>> HAMT-like data structure and tuple-like data structures.
>> The HAMT-like data structure is discontinued and will be
>> replaced by some ordered tree.
> The HAMT-like data structure for OCAML that I've looked at
> takes *at least* four words for every key/value association.
> (I have been able to find plenty of numbers for the *speed*
> of HAMTs but nothing for the *space* they need other than a
> claim of "small".) That's not necessarily the same thing,
> and the "tuple-like" part makes me wonder if something much
> cleverer is being done.
I don't know how OCAML does it but that word count seems excessive.

Now, the HAMT impl. is discontinued because it was decided an ordered
structured was going to be used. But,
here is what I had i mind:

First things first. HAMT is a functional structure, meaning no
destructive updates.

In my implementation I used the structure of an radix tree. In the tree
we have three types of nodes,
leaf nodes, tuple nodes and "sparse-tuple" nodes.

Tuple nodes are a saturated "sparse-tuple". It is a plain tuple of fixed
size. Each element in the tuple is a leaf node, tuple node or
"sparse-tuple".

Leaf nodes are conses with key values [K|V]. Leaf nodes not are not
really necessary since we can save KVs directly in the other nodes but
it simplifies implementation and was used in the prototype. Storing KVs
directly in tuple and "sparse-tuple" nodes saves one word per KV but it
also doubles the number of words that needs copying in each update. This
boils down to a trade-off between speed and memory.

"Sparse-tuple" nodes is an internal construct, but a real type, only
used in this data structure. Elements may be indexed 1 - N but it would
*not* take N + 1 words in memory. Instead a mapping function is used to
index elements. We also need the information of which elements in the
sparse tuple are set. This information takes about one word. The number
of words required depends on the node size (which is fixed), but can be
placed in the header of this type. So the memory size of a sparse tuple,
indexable 1 - N with only two elements, takes 3 - 4 words (3 if the
mapping is in the header). N is typically 16 or 32.

What this means is that we can take a hashing approach to this but still
keep the tree compact. I believe this is meant to be the "small" memory
footprint.

The key to be used to traverse tree is hashed (using the same algorithm
as phash2) which produces an 32 bit integer. We index the tree by using
a set of bits from this hash at each level. The number of bits used is
dependent on the fixed size of the nodes. In my prototype i used nodes
of size 32. For each level in the tree we use 5 bits of the hash to
index down in the tree. When the hash is exhausted we produce an new
hash but add the level as well, ex. phash([Lvl|Key]).

When we update a value in the tree only the nodes in the path are
updated. "sparse-tuples" reduces the amount of copying.

The space this tree requires depends on the hashing algorithm (and the
keys that are hashed). Some "management size" is required ofc. The root
of this structure may be implemented in several ways. For instance we
can consider a variant of the "sparse-tuple" as the root. The management
size requirement would be 2 words, a header defining the type and
keeping the mapping, and a Eterm for the number of entries in the tree.
If we optimize for space we would require one word header per node in
the tree, the rest of the elements are key/values. I have no exact value
for words/pair but I think you get it.

It also worth noting that,
* the tree would be shallow,
* only one key comparison is done while traversing the tree (leaf node).

It is also worth noting that the mapping function in "sparse-tuples" is
an expensive one. We need hamming weight of a bitfield. This can be done
in newer hardware using a popcount instruction. Doing popcount in Erlang
code is doable but not very efficient unless compiler/runtime aided.

This whole explanation serves as one big parenthesis since HAMT will not
be used. Space conservation, key-sharing and ordering requirements were
added.

> However, we're told that's going "and
> will be replaced by some ordered tree". Suppose you have a
> 2-3-4 tree. We can play SML/Mercury-style games to move the
> size of a node into the pointer to it, so we
>
> 2-node: C1 K1 V1 C2 4 words/pair
> 3-node: C1 K1 V1 C2 K2 V2 C3 3.5 words/pair
> 4-node: C1 K1 V1 C2 K2 V2 C3 K3 V3 C4 3.3+words/pair
>
> So it's fair to say that this would take *roughly* 3.5 words
> per field.
>
> Playing similar move-the-balance-info-into-the-pointers games
> would let us do red-black trees or AVL trees in 4 words/pair.
>
> There are other games one can play, at the price of more
> complicated code, where we take advantage of the leaves of
> a 2-3-4 tree being at the same depth, so that leaf nodes
> are K1 V1
> or K1 V1 K2 V2
> or K1 V1 K2 V2 K3 V3
> or K1 V1 K2 V2 K3 V3 K4 V4
> and the tree as a whole is roughly 2.3 words per pair.
>
> I don't know what exactly Björn-Egil Dahlberg has in mind,
> but if we charge 3 words per key/value association, it's
> not likely that we're overestimating the space cost.

Maps would have a two tier approach.
Tier 1 would have an approach much like your frames { {K1, ..., Kn}, V1,
..., Vn }, essentially two tuples where keys are ordered in term order.

When the number of entries reaches a certain point the cost of doing
business becomes too great. Searching and the garbage generated by
updates costs more than space saving earns us. Typically 30 - 50
entries. At this point the structure is converted to some ordered tree,
Tier 2. I say "some" ordered tree because it is still undecided what is
suitable here. I think it is fair to say that space cost here would be 3
- 4 words per pair. I would view Tier 1 as an optimization of Tier 2 for
few entries.

I would also like to point out that when this point is reached, space is
not our primary concern.

Regards,
Björn-Egil

Vincent de Phily

unread,
May 17, 2013, 4:58:53 AM5/17/13
to erlang-q...@erlang.org
On Thursday 16 May 2013 12:38:27 Loïc Hoguin wrote:
> On 05/16/2013 07:43 AM, Richard A. O'Keefe wrote:
> > Oddly enough, the EEP I've been working on (or at least Part 1, as I've
> > split it into four parts) _is_ syntax decoupled from the implementation.
> >
> > Here's a copy of the current draft. I'm about to go through the torture
> > of adding Markdown, sigh.
>
> If I understand right, you need '#[]' defined in the local module, and
> as such can only have one defined?
>
> This is going to be annoying when you want to use this syntax on both
> records and JSON and some other structure in the same module.


That worried me when reading the draft too. I expect that the answer is to
define :
> '#[]'({my_type, V}, K) -> extract_mytype(K,V);
> '#[]'(Other, K) -> erlang_default:'#[]'(Other, K).

But this can become ugly and unwieldy quickly. Perhaps we need a standard
wraper function :
> '#[]'(K,V,[]) -> erlang:error(badarg).
> '#[]'(K,V,[M|T]) -> try M:'#[]'(K,V) catch error:badarg -> '#[]'(K,V,T).

That we can use in each module as :
> '#[]'(K,V) -> '#[]'(K,V,[mytype,json,maps,lists...]).

But you need to order your modules very carefully, puting the most selective
ones first.

Appart from that, while this eep looks interesting, it doesn't seem to address
pattern matching ? Or did I read incorrectly and you actually can
> case Json of
> Json#['class']#['foo'] -> handle_foo(Json);
> Json#['class']#['bar'] -> handle_bar(Json);
> end.

?
--
Vincent de Phily

Björn-Egil Dahlberg

unread,
May 17, 2013, 5:11:25 AM5/17/13
to Vincent de Phily, erlang-q...@erlang.org
I like this idea of deep update, haven't look too closely though. :)

But please, discuss this in a separate thread.



// egil
Sent from my iPhone

17 maj 2013 kl. 10:58 skrev Vincent de Phily
<vincent...@mobile-devices.fr>:

Patrik Nyblom

unread,
May 17, 2013, 5:11:44 AM5/17/13
to ee...@erlang.org, erlang-questions
Hi!

I think we need to clarify a few things from the perspective of the technical board that gave Björn-Egil the premises for this EEP.

1) Maps are Maps and Frames are Frames. Maps is not some kind of extension to Frames, as Frames hold properties that Maps never should hold. What was decided, and what Björn-Egil is working on, was to investigate if a clever implementation of a dictionary can be a good enough record replacement to be used in situations like interfaces between modules, instead of records and property lists. As the implementation is not yet done, we only have the interface to Maps to discuss. Even if the EEP mentions complexity, we still have to see if the implementation in a "record-like" situation can be made efficient both when it comes to memory consumption and time complexity.

2) Maps is *not* to be implemented using HAMT's, so discussing the complexity or implementation of HAMT's is interesting, but beside the point. The task is to define an ordered dictionary. Maps have keys in order. How that is implemented is not defined and that job is yet to be done. A maximal O(log N) complexity in look up was the only requirement, but an implementation with low memory consumption for "simple" Maps was *suggested*. Discussing characteristics of an implementation that is no more than a general idea seems futile. There are expectations, but no facts about the performance characteristics.

3) The behavior of float and integer keys that compare equal is the same as the one for ordered_set ETS tables. Even though we don't like it, that behavior is the only consistent we've found for ordered dictionaries. If we want to define two types of comparison in Erlang, that is another discussion. (An EEP for that would be welcome. Do not forget to add suggestions for all current interfaces where order is used, like lists:sort, orddict etc.)

4) Maps is supposed to be general, a general mapping from keys to values. The thinking was that the current data types (at least the compound ones) are very general. Also the current dictionaries have no restrictions on the keys or values. A Map data type should of course not have that either. When used in place of records, the Map gives maybe to much freedom, that is the price to pay when being very general.

5) To be able to differentiate between operations where you possibly add new keys and operations where you replace existing keys was a requirement. The arguments have already been outlined in Joe's earlier posts, so I see no point in repeating that.

6) Imposing an order on the keys was thought to make the data type fit better with the current built in types. Using for example the HAMT implementation to describe the order between Maps, to have a serialization looking different depending on how the map was constructed, or having random order of the keys when serializing or displaying the data, would make the Map an odd creature in the Erlang world. If we for example want to serialize and send two Maps to a program written in another language, we would like to be able to describe the order between the Maps so that comparison could be done in that other language. The order between Maps as they look in this EEP is easily described and comparison is simple to implement. Suggestions like "sorting the MAP before comparing and printing" was rejected. Making a comparison between two elements possibly have the complexity of O(N log N) was thought unacceptable.

7) To publish an EEP even before there was a prototype implementation was a requirement from the technical board, as we wanted feedback on the API, the semantics etc. I think this thread has shown that there is definitely a lot to discuss. It however also shows that as long as there is no description of the actual implementation, there's a lot that cannot be discussed, but we really would want to discuss...

So - these were the premises and I think Björn-Egil has done a really good job in writing this EEP. A lot of people has also taken the time to give valuable feedback. I think the EEP and the current feedback gives me reason enough to suggest a prototype implementation. Possibly with a few adjustments to the EEP text, examples etc.

Cheers,
Patrik
_______________________________________________
eeps mailing list
ee...@erlang.org
http://erlang.org/mailman/listinfo/eeps

Reply all
Reply to author
Forward
0 new messages