Literal Arrays vs. Tuples?


Stefan Marr

Jan 28, 2017, 5:23:04 AM
to Newspeak Programming Language, Richard Roberts
Hi:

SOMns is likely going to have literal arrays soon (https://github.com/smarr/SOMns/pull/100). Thanks to Richard.

So, I was looking again at the Newspeak spec [1], because I didn't really find arrays defined the last time I skimmed.

Now I saw that it defines tuples (sec. 5.1.7) instead of arrays, and as shallowly immutable.
What was the reasoning behind this design?
On the one hand, why make them 'tuples' (instance of ReadOnlyTuple) instead of arrays, and then, why only shallowly immutable?

I am guessing the shallow immutability is there to allow more flexibility in what can go in there. But I am not too sure about the immutable part. (Also, the spec says it's not actually implemented.)

I was thinking about having {} for array literals, and #{} for deeply immutable value array literals instead [2].

So now I am wondering what the benefit of having tuples is.
Any comments would be greatly appreciated.

Another thing I am wondering about is how to interpret the spec exactly.
I read it that the literal syntax essentially desugars into an expression that constructs and populates an array, which is then used to create the ReadOnlyTuple. So far so clear, I think.
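
A rough sketch of that reading, written in Python as an analogy rather than in Newspeak. The class below is a hypothetical stand-in for the spec's ReadOnlyTuple, and `desugar_tuple_literal` is an invented helper name; neither is taken from any Newspeak implementation.

```python
class ReadOnlyTuple:
    """Hypothetical stand-in for the spec's ReadOnlyTuple (not the real class)."""

    def __init__(self, elements):
        # Copy the temporary array so later changes to it cannot leak in.
        self._elements = tuple(elements)

    def at(self, index):
        # 1-based indexing, as in Smalltalk-family languages.
        if not 1 <= index <= len(self._elements):
            raise IndexError("index out of bounds: %d" % index)
        return self._elements[index - 1]

    def size(self):
        return len(self._elements)


def desugar_tuple_literal(*element_thunks):
    """Evaluate each element expression, fill a temporary array, then wrap it."""
    array = [None] * len(element_thunks)
    for i, thunk in enumerate(element_thunks):
        array[i] = thunk()
    return ReadOnlyTuple(array)


inner = [1, 2]
t = desugar_tuple_literal(lambda: 3, lambda: inner)
inner.append(99)  # still possible: the tuple is only shallowly immutable
```

The tuple offers no way to replace its elements, but an element that is itself mutable (here `inner`) can still change.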

But then, it says it refers to the class ReadOnlyTuple. So, that means, ReadOnlyTuple is a well-known value instead of a lookup?
I am wondering because I remember the discussion in the spec around literal objects and the question whether `Object()()` should behave differently from `()()` (I think for the latter there would not be a lookup of Object, which for the former could be overridden).

Thanks and best regards
Stefan


Gilad Bracha

Jan 28, 2017, 12:51:23 PM
to Newspeak Programming Language, Richard Roberts
Hi Stefan,

Literal arrays are one of the few parts of the Smalltalk expression language that are broken. They are at once too restrictive (they can only contain literals) and not restrictive enough (the set of objects they contain can change). I've encountered (albeit rarely) code that changed literals after the fact. This was most egregious with literal strings (Squeak seems to prohibit this, but I remember systems that did not), but arrays as well.

The spec is therefore very deliberate on literal tuples. Literals should not change underfoot. Arrays are mutable, and hence should not appear as literals. The fact that the implementation has a bug doesn't change this.

The question then arises: why shallow immutability rather than deep immutability? Here we have a countervailing issue. Being able to place arbitrary expressions in a tuple rather than just literals is very useful (as I noted, Smalltalk is annoyingly restrictive here). This does not create a conflict with the literal as written, because it contains the same objects as when it was created. Nor does it preclude one from filling the tuple with deeply immutable values, in which case the implementation can determine quite easily that it is deeply immutable and mark it as such when the tuple is constructed (or lazily, or at a later time). So overall I think the current rules are a win.
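
To illustrate the "mark it when the tuple is constructed" idea, here is a minimal Python sketch. It is an analogy only: `ShallowTuple`, `is_deeply_immutable`, and the choice of which Python types count as immutable leaves are assumptions made for the example, not anything defined by the Newspeak spec.

```python
# Assumption: these Python types play the role of deeply immutable values.
IMMUTABLE_LEAVES = (int, float, bool, str, bytes, type(None))


def is_deeply_immutable(obj):
    """True for immutable leaves and for tuples already marked as deep."""
    if isinstance(obj, IMMUTABLE_LEAVES):
        return True
    if isinstance(obj, ShallowTuple):
        return obj.deeply_immutable
    return False


class ShallowTuple:
    """Hypothetical shallowly immutable tuple that records deep immutability."""

    def __init__(self, *elements):
        self._elements = tuple(elements)
        # Decided once, at construction time, from the elements themselves.
        self.deeply_immutable = all(is_deeply_immutable(e) for e in self._elements)

    def __getitem__(self, index):
        return self._elements[index]


plain = ShallowTuple(1, "a", None)              # every element is immutable
tainted = ShallowTuple(1, [2])                  # holds a mutable list
nested = ShallowTuple(ShallowTuple(1), 2)       # nested tuple is itself deep
```

The check is cheap because a nested tuple already carries its own mark, so nothing needs to be traversed twice.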

As for the question of where ReadOnlyTuple comes from: I expect it would be defined as a method on Object, as are a handful of other things (like String). This does imply it could be overridden, but the spec also says that when we use such names, we mean the platform's standard version, so the implementation can rely on those types.

I'd urge you not to create semantic differences between the implementations. I'd like to try and harmonize the syntax of SOMns with the specification (and the other implementations) instead.




Stefan Marr

Jan 28, 2017, 2:43:02 PM
to newspeak...@googlegroups.com, Richard Roberts
Hi Gilad:

Thanks for the swift answer.

> On 28 Jan 2017, at 18:51, Gilad Bracha <gbr...@gmail.com> wrote:
>
> Literal arrays are one of the few parts of the Smalltalk expression language that are broken. They are at once too restrictive (they can only contain literals)

Ok, let’s start with literals in general. I suppose, I try to see literals in a way that allows them to be treated uniformly across the different kinds of literals.

We got numeric, boolean, nil, character, string, symbol, tuple, closure, and object literals (I haven’t really looked into pattern literals).
The only way to treat them uniformly, it seems to me, is to look at them as syntactic sugar for instantiating new objects with the ‘value’ encoded by the literal expression.

Taking Object literals, I think it is natural to assume that they can have mutable slots. So, I would like to look at literals in Newspeak as rather different from literals in Smalltalk. The problem with Smalltalk is that its literals are not consistently treated as expressions that generate new objects. I suppose that’s mostly an unfortunate ‘optimization’, and only blocks retain that notion.

However, for Newspeak, I’d argue it is not an issue. Well, at least for SOMns it is not an issue to take that stance. Specifically, numeric, boolean, nil, character, string, and symbol literals all generate values, which don’t have any identity. So, arguing that they are fresh objects doesn’t really make a difference for them.

It’s not observable. For closures and objects, I suppose it is natural to see them as generator expressions that create a new object every time. And both are somewhat mutable kinds of objects.

From that, I’d argue that arrays should also be seen as freshly generated objects, which makes modifying them at least in my mind as natural as objects or closures.

> and not restrictive enough (the set of objects they contain can change). I've encountered (albeit rarely) code that changed literals after the fact. This was most egregious with literal strings (Squeak seems to prohibit this, but I remember systems that did not), but arrays as well.

Squeak and Pharo also have the `{Object new. Object new. #d. 3}` style arrays. They conform to the generator expression notion. So, you get a new mutable array every time.
And that’s what I was thinking of.
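
For readers less familiar with that notion, a small Python analogy of the generator-expression reading: each evaluation of the array expression yields a fresh, mutable array with freshly created elements. The function name `make_array` is made up for the example.

```python
def make_array():
    # Rough analogue of evaluating `{Object new. Object new. #d. 3}` once.
    return [object(), object(), "d", 3]


a = make_array()
b = make_array()
a[3] = 4  # mutating one result does not affect the other
```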

Now to the aspect of them being immutable.
I looked through all the SOMns code I got, which is some 36500 lines, mostly tests and benchmarks.
I didn’t see anything where I would yearn for literal arrays that are mutable. With the exception of test cases perhaps. Some of my language tests could benefit from that.

So, let’s say, I have neither a strong nor particularly informed opinion on shallow immutability.
I don’t have the notion of a tuple in SOMns, but I suppose that would be a minimal addition.

However, considering your point below:

> Nor does it preclude one from filling the tuple with deeply immutable values, in which case the implementation can determine quite easily that it is deeply immutable and mark it as such when the tuple is constructed (or lazily, or at a later time).

I disagree with the notion of implicit value semantics. I’d call that magic. And, when dealing with actor code, I would argue that it makes code brittle and error prone.

The issue is that when you construct objects/tuples just right, you get a value. If you pass it on to another actor, you can work directly with it. However, if one of your code paths should return a mutable object somewhere along the way, everything gets tainted, and suddenly your code doesn’t work anymore, because you got an unexpected far reference in the other actor.

This is one of the reasons why in SOMns there is a strict distinction between objects and values. Objects are never implicitly promoted to values. Neither should tuples or, for what it’s worth, arrays be promoted.

Which means in effect, I would also want deeply immutable literal tuples/arrays.

[Tangent: Value objects check all their parameters on construction and if one isn’t a value, we get a NotAValue exception. However, there is a loophole to get a half-initialized Value object that will behave like a value but isn’t: you can escape it from the initializer expressions, I guess, perhaps by setting it on a slot of one of the parameters. I haven’t thought of a way to avoid that yet…]
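
A minimal Python sketch of the explicit check described in the tangent, as opposed to implicit promotion. Only the name NotAValue comes from the text above; the `Value` class, the `is_value` helper, and the set of primitive types treated as values are assumptions for illustration and do not reflect how SOMns implements this. The initializer loophole is not modelled.

```python
class NotAValue(Exception):
    """Raised when a Value is constructed from something that is not a value."""


def is_value(obj):
    # Assumption: these primitives and other Value instances count as values.
    return isinstance(obj, (int, float, bool, str, type(None), Value))


class Value:
    """Hypothetical value object: checks every field eagerly, then freezes."""

    def __init__(self, *fields):
        for field in fields:
            if not is_value(field):
                raise NotAValue("not a value: %r" % (field,))
        # Bypass our own __setattr__ exactly once, during construction.
        object.__setattr__(self, "_fields", fields)

    def __setattr__(self, name, value):
        raise AttributeError("Value objects are immutable")


ok = Value(1, "a", Value(2))  # succeeds: every field is a value
```

With this style, `Value([1])` fails immediately at the construction site instead of silently producing an object that turns into a far reference later.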

> I'd urge you not to create semantic differences between the implementations. I’d like to try and harmonize the syntax of SOMns with the specification (and the other implementations) instead.

Yeah, I do try to stick as close to the spec as possible.

Sorry for not being more concise. Hope at least some of this makes sense.

Best regards
Stefan


Vassili Bykov

Jan 28, 2017, 6:07:01 PM
to newspeak...@googlegroups.com, Richard Roberts
IMHO, a critical prerequisite of a constructive discussion about literals is a strict definition of "literal", as an adjective and as a noun. In the presence of mutability and object identity, the "common sense" understanding is vague enough to create arguments when both sides are mostly in agreement.

I'd propose that "literal" invites expectations of full immutability, and given what they do, it would be more clear to speak of array and object constructors instead of what's commonly called array and object literals. (Interestingly, ECMAScript spec calls them "initializers" in the descriptive text, explaining that they are "written in a form resembling a literal," and then proceeds to call them ArrayLiteral and ObjectLiteral in the actual syntax rules).

In this terminology, the classic #(...) syntax in Smalltalk is a literal array with a design bug of being shallowly mutable, and the newer {...} syntax is an array constructor.

Cheers,
--Vassili

Gilad Bracha

Jan 29, 2017, 11:42:04 AM
to newspeak...@googlegroups.com, Richard Roberts
I agree that the term "literal" is imperfect and contributes to confusion. Unfortunately so does the term "constructor". So we should find some other terminology.  "Array" also carries a legacy of mutability, whereas "tuple" does not, so I intend to continue calling these things tuples. For now, I'll use "tuple expressions", but I realize that is questionable as well.

Substantively, none of this changes my view that tuple expressions should be shallowly immutable for the reasons I discussed earlier and others*. What I do think is worth considering are deeply immutable tuple expressions. So Stefan's notation would be #{e1. ... . eN}. An alternative would be to have a deeplyImmutable accessor, as in {e1. ... . eN} deeplyImmutable. This is more verbose, and requires an analysis to ensure that no aliasing is going on to avoid double allocation, but avoids a new language construct.

* others might be: 
- we want to encourage immutability, and so having a convenient form for immutable things but not for mutable ones is a small step in that direction.
- Stefan's data indicates that mutable array expressions have less utility than one might think. This is debatable of course. Dart has both mutable and constant list expressions, which in fact amount to either array expressions or to properly immutable Smalltalk-like array literals. The latter are a cause of all kinds of problems, but the former are widely used. The motivation for them was dubious, basically copying a Python idiom where one starts with an empty list given conveniently as "[]" (would be "{}" in our syntax) and then fills it in. I think this doesn't matter at all.