Naming Conventions

Andrew Sorensen

Mar 1, 2016, 8:57:24 PM

to extemp...@googlegroups.com

Hi All,

Two related questions about naming conventions.

First question:

At the moment whenever you create a named type e.g.:

(bind-type MyObj <i64,String*,|4,i64|>)

Extempore creates two constructors for you. A reference constructor MyObj and a value constructor myObj (note case). This is also true for a lower-case named type e.g.

(bind-type my_other_obj <i64,String*,|4,i64|>)

Which gives a reference constructor My_other_obj and a value constructor my_other_obj.

Note that by convention we are trying to encourage capitalized camel-case named types. Although this is not enforced, you will now get a little notification message about it. This is in contrast to xtlang procedures, which are by convention lowercase and separated by underscores.

If you are wondering what I mean by reference and value constructors, here are two examples using MyObj.

(bind-func test_myobj_ref_constructor
  (lambda ()
    (let ((obj (MyObj 4 (Str "Hello") (array 1 2 3 4))))
      obj)))

(bind-func test_myobj_val_constructor
  (lambda ()
    (let ((obj (myObj 4 (Str "Hello") (array 1 2 3 4))))
      obj)))

The important things to notice are (a) the use of the MyObj vs myObj constructor, and (b) the return types of the two functions.

Compiled: test_myobj_ref_constructor >>> [MyObj*]*

Compiled: test_myobj_val_constructor >>> [MyObj]*

The ref constructor example returns type MyObj* and the value constructor example returns type MyObj. The reference constructor zone-allocates memory for the new MyObj and returns a reference to that zone-allocated memory. The value constructor stack-allocates memory for the new MyObj and then returns a copy of that object (for small objects on 64-bit architectures this will usually be by register).

Also worth mentioning here is the use of (array 1 2 3 4). You can "construct" arrays in xtlang using either "array" or "Array", where again the capitalization distinguishes by value ("array") from by reference ("Array"), and the reference allocation is (as with named types) zone allocation. So for example (array 1 2 3 4) is type |4,i64| and (Array 1 2 3 4) is type |4,i64|*. xtlang also supports similar semantics for tuples, i.e. (Tuple 1:i64 (Str "Andrew")) and (tuple 1:i64 (Str "Andrew")), and for vectors, i.e. (Vector 1.0:f 2.0 3.0 4.0) and (vector 1.0:f 2.0 3.0 4.0).

So, apart from providing a little more information, my main question is about this use of upper/lowercase constructors to differentiate ref vs value semantics. I'm obviously uncomfortable about this and was wondering if anyone has a better idea. We need a solution that is concise. Something like MyObj_r and MyObj_v might be ok - but MyObj and MyObj_val might not be? It would also be nice to use the same semantics for named types as well as for array/vector/tuple constructors. So if we went with "_val" for example, we would also have Array & array_val, Tuple and tuple_val etc.

Please let me know what you think.

Second Question:

pref/aref/tref and pref-ptr/aref-ptr/tref-ptr are poorly named. The naming originally came from Scheme's 'list-ref' (the full expansion of aref in xtlang is actually array-ref). list-ref does actually make sense for Scheme, but array-ref doesn't make sense for Extempore, because array-ref actually returns a value, not a ref. In fact it is aref-ptr that returns a reference.

So we really need a rename here. I'm pretty happy with aref (and friends) changing to aval and aref-ptr changing to aref. Of course this is going to be a little painful while we all adjust to 'aref' semantics changing.
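For concreteness, here is a minimal sketch of the behaviour being discussed, written with the names as they stand today. The function name aref_demo is made up for illustration, and the comments about the proposed names only restate the suggestion above.

```xtlang
;; Sketch only: aref_demo is a hypothetical name, not part of any library.
(bind-func aref_demo
  (lambda ()
    (let ((arr:|4,i64|* (alloc)))
      (aset! arr 2 42)
      (let ((v (aref arr 2))        ; v:i64, a copy of the element's value
            (p (aref-ptr arr 2)))   ; p:i64*, a pointer to the element itself
        (pset! p 0 43)              ; writes through the pointer into arr
        ;; v still holds the old value; (aref arr 2) sees the write through p
        (+ v (aref arr 2))))))

;; Under the proposed rename, (aref arr 2) would be spelled (aval arr 2)
;; and (aref-ptr arr 2) would be spelled (aref arr 2).
```

The point of the sketch is that today's aref hands back a value, so the "ref" in its name is misleading, while aref-ptr is the one that actually gives you something you can write through.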

Unfortunately this is a fairly aggressive change that will certainly break everyone's code - in some fairly nasty ways.

Alternatively, if we change aref-ptr to something other than aref we could run aref and aval in parallel for a while, and aref-ptr with xxxx in parallel for a while. This would make the change over much easier - but the question then is what would aref-ptr become? (aref really does make the most sense).

Please let me know what you think (a) about names and (b) about when might be a good time to introduce this kind of breaking change.

I would really appreciate a range of views on these topics so please sing out.

Cheers,

Andrew.

Ben Swift

Mar 2, 2016, 5:44:38 PM

to extemp...@googlegroups.com

Hi Andrew

Q1. I think the case thing isn't great (as you pointed out yourself).
Succinctness is a noble goal, but I don't think it's worth introducing
that extra confusion, especially as someone who has to teach xtlang to
people :)

Is one option to have the plain MyObj as the value constructor and MyObjref (or
MyObj_ref or MyObjptr or MyObj_ptr) as the reference constructor? Especially since I guess there are
a few situations where I currently use things by reference, but in
future if it were easier to work by value I'd like to do things that way?

Q2. this is an easier one in my mind - I reckon it's worth breaking
existing code to go to an {a,t,p,v}{ref,val} scheme. The change will
touch a lot of code, but it should be *mostly* just a find/replace
jobby, and the compiler might also help out with catching any borked
code (although it won't in all cases).

Cheers,
Ben

Andrew Sorensen

Mar 2, 2016, 6:12:41 PM

to extemp...@googlegroups.com

On Thu, Mar 3, 2016 at 8:44 AM, Ben Swift <b...@benswift.me> wrote:

> Hi Andrew
>
> Q1. I think the case thing isn't great (as you pointed out yourself).
> Succinctness is a noble goal, but I don't think it's worth introducing
> that extra confusion, especially as someone who has to teach xtlang to
> people :)
>
> Is one option to have the plain MyObj as the value constructor and MyObjref (or
> MyObj_ref or MyObjptr or MyObj_ptr)? Especially since I guess there are
> a few situations where I currently use things by reference, but in
> future if it were easier to work by value I'd like to do things that way?

I think if anything the opposite: MyObj should return a zalloc'd reference and then we could have a MyObj_val. Value semantics are nice for small objects (on 64-bit), but for *most* user-defined named types passing by reference will still be the common (i.e. efficient) method. This would then also mean Array and Array_val, Tuple and Tuple_val etc. Or perhaps explicit for both? MyObj_ref & MyObj_val??

MyObj and MyObj_ ??

I'm very keen to hear other thoughts on this one.

> Q2. this is an easier one in my mind - I reckon it's worth breaking
> existing code to go to an {a,t,p,v}{ref,val} scheme. The change will
> touch a lot of code, but it should be *mostly* just a find/replace
> jobby, and the compiler might also help out with catching any borked
> code (although it won't in all cases).

Actually the problem is that the compiler will not be able to help out for the pathological case pref/aref/tref, which is where the trouble will be ;)


digego

Mar 3, 2016, 6:03:11 PM

to Extempore

OK, if I don't hear any objections this is what I'm going to go with:

(bind-type MyObj <i64,i64>)

will construct

MyObj >>> [MyObj*]*

MyObj_ref >>> [MyObj*]*

MyObj_val >>> [MyObj]*

Note that MyObj is an alias for MyObj_ref (i.e. the reference case is the default).

(bind-type myobj <i64,i64>)

myobj >>> [myobj*]*

myobj_ref >>> [myobj*]*

myobj_val >>> [myobj]*

i.e. no case sensitivity anymore

tuple, array and vector ...

No capitalized versions for any of the above.

tuple >>> [tuple ...]*

tuple_val >>> [tuple ...]*

tuple_ref >>> [tuple* ...]*

.. and similar for array and vector

Note that tuple is an alias for tuple_val (i.e. the value case is now the default). Yes, this is the opposite of the named-type behaviour above, but it also represents what I believe will be the 'common' case in both instances: named types default to reference, and array/vector/tuple default to value. To avoid any confusion, just stick with the explicit _val/_ref naming and avoid using the 'default' alias names.

We will change tref -> tval and tref-ptr -> tref (and same for aref and pref ...).
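As a rough illustration of how code might read if everything lands exactly as described in this message, here is a sketch. The constructor names and compiled types are the ones listed above, and tval/tref follow the rename just mentioned. The four function names are invented for the example, and none of this has been checked against a release.

```xtlang
;; Sketch only: assumes the _ref/_val constructors and the tval/tref
;; accessors exist exactly as proposed in this message.
(bind-type MyObj <i64,i64>)

(bind-func myobj_by_ref
  (lambda ()
    (MyObj_ref 1 2)))        ; proposed: compiles as [MyObj*]*

(bind-func myobj_by_val
  (lambda ()
    (MyObj_val 1 2)))        ; proposed: compiles as [MyObj]*

(bind-func myobj_first
  (lambda (obj:MyObj*)
    (tval obj 0)))           ; proposed spelling of today's (tref obj 0)

(bind-func myobj_first_ptr
  (lambda (obj:MyObj*)
    (tref obj 0)))           ; proposed spelling of today's (tref-ptr obj 0)
```

Sticking to the explicit _ref/_val spellings, as in the first two functions, sidesteps the question of which default applies to named types versus tuple/array/vector.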

Ben Swift

Mar 3, 2016, 9:03:46 PM

to extemp...@googlegroups.com

Hi mate

Sounds good.

Cheers,
Ben

ami...@gmail.com

Mar 4, 2016, 2:31:29 AM

to extemp...@googlegroups.com

On Mar 2, 2016, at 18:12, Andrew Sorensen <dig...@gmail.com> wrote:

> On Thu, Mar 3, 2016 at 8:44 AM, Ben Swift <b...@benswift.me> wrote:
>> Hi Andrew
>>
>> Q1. I think the case thing isn't great (as you pointed out yourself).
>> Succinctness is a noble goal, but I don't think it's worth introducing
>> that extra confusion, especially as someone who has to teach xtlang to
>> people :)
>>
>> Is one option to have the plain MyObj as the value constructor and MyObjref (or
>> MyObj_ref or MyObjptr or MyObj_ptr)? Especially since I guess there are
>> a few situations where I currently use things by reference, but in
>> future if it were easier to work by value I'd like to do things that way?
>
> I think if anything the opposite, MyObj should return a zalloc'd reference and then we could have a MyObj_val. Value semantics are nice for small objects (on 64bit), but for *most* user defined named types passing by reference will still be the common (i.e. efficient) method. This would then also mean Array and Array_val, Tuple and Tuple_val etc.. Or perhaps explicit for both? MyObj_ref & MyObj_val??
>
> MyObj and MyObj_ ??
>
> I'm very keen to hear other thoughts on this one.

Just a shot in the dark, but is it possible to do some sort of punning on the pointer operation symbols '&'/'*', to make "_val" and "_ref" clear but also succinct? I don't love writing "_val" every time I want to pass something around (although I see the reason for it).

Tom

Andrew Sorensen

Mar 4, 2016, 4:58:38 PM

to extemp...@googlegroups.com

Hey Tom,

Yes, I thought about that option also, but I was a little concerned that people might interpret & * as operators? I'm with you on the _val problem. In practice, it shouldn't be too bad though as the 'common case' defaults should be the norm.

Toby Gifford

Mar 4, 2016, 8:21:15 PM

to extemp...@googlegroups.com

Hi Andrew, sorry I'm a bit late to the party. Some thoughts/replies:

- I like the idea of getting rid of the uppercase/lowercase thing. Actually I had no idea it even existed, and it seems likely to cause confusion.

- I like the idea of always having the option to obtain a value or a ref as desired (though see caveat below)

- the _ref and _val suffixes seem OK to me, though I would probably favour _r and _v

Thanks for your answer to my question about return values and allocation, i.e. that they are generally stored in registers. That was actually an 'aha' moment for me! Now I see why POD types like integers and floats are returned as values, and almost anything else as a reference. Also makes a lot more sense of various debugger readouts ...

Which brings me to the caveat I referred to earlier: what does it actually mean to return a value for an object type? I get that the value constructor allocates memory in the constructor's stack frame (and initialises that memory). But then you say it makes a copy of that to 'return'. But where is that copy stored (in the typical case that it won't fit into the registers)? Is it on the stack frame of the (parent) calling function? And if so, why not have the compiler just use the parent-stack-frame-allocated space in the first place? Is this something to do with implementing continuations that means my simple stack frame hierarchy mental model is problematic?

While we are on this question, what actually happens when you declare a POD type in a closure, like say

(bind-func fnord:[void]*
  (lambda ()
    (let ((x:i64 3))
      (println "fnord")
      void)))

Is the memory for x stack allocated or zone allocated? Would it make a difference if x was declared before the lambda? And what if the function actually returned an i64:

(bind-func fnord:[i64]*
  (lambda ()
    (let ((x:i64 3))
      (println "fnord")
      x)))

Is memory for an i64 allocated twice? Stack or zone? And which stack(s)/zone(s)? You say the return value is stored in a register, but I presume the calling function can't just leave it there?

Having arrays and vectors and tuples being value by default, whilst named types are references by default, seems very confusing, but is probably worth being confusing if it's the commonest use. Actually, I find this whole area confusing. Is it true to say that a tuple, array, or vector is a contiguous region of memory? Why would a named type (which is just a tuple really?) have different typical usage?

Lastly, relating to Tom's question, I have found myself wanting an address-of operator, which is I think what you were fearing Andrew! The use case here is when I want to pass a POD type around (or have a C-library that requires it), and I find myself having to do this

(bind-func use-old-skool-c-funktion:[i64,i8*,i8*]*
  (lambda (input output)
    (let ((size_ref:i64* (salloc)))
      (pset! size_ref 0 (* FRAMES CHANNELS))
      (let ((output:i8* (zalloc (pref size_ref 0))))
        (old-skool-c-api input output size_ref)
        (pref size_ref 0)))))

I guess it's not really much harder doing this than declaring a value type and passing around its address, but it feels a bit awkward.

Andrew Sorensen

Mar 7, 2016, 7:21:38 PM

to extemp...@googlegroups.com

> Which brings me to the caveat I referred to earlier: what does it actually mean to return a value for an object type? I get that the value constructor allocates memory in the constructor's stack frame(and initialises that memory). But then you say it makes a copy of that to 'return'. But where is that copy stored (in the typical case that it won't fit into the registers)? Is it on the stack frame of the (parent) calling function? And if so, why not have the compiler just use the parent-stack-frame-allocated space in the first place? Is this something to do with implementing continuations that means my simple stack frame hierarchy mental model is problematic?

There is *generally* no actual copy: the caller makes space for the return "value" for the callee to write into. Just be aware though that this is a compiler optimization, and is architecture, platform and compiler dependent. Also keep in mind that this large return "value" will likely be used elsewhere (returned again, used as a parameter to other function calls etc.), and that the compiler cannot always optimize away additional "copies". It is likely the value will need to be pushed (i.e. copied) for future calls etc. In general, it is better to help the compiler out by passing larger objects by reference - obviously an 8 byte address is easier to throw around than a 1M "value". Also worth keeping in mind that the stack is a relatively small resource, whereas zones use heap memory space.
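To make the by-reference advice concrete, here is a minimal sketch. The Point3 type and both function names are invented for this example. It relies on the default named-type constructor returning a zone-allocated pointer, as described earlier in the thread.

```xtlang
;; Sketch only: Point3, point3_len2 and point3_demo are hypothetical names.
(bind-type Point3 <double,double,double>)

;; Takes a Point3*, so only an address crosses the call boundary,
;; however large the underlying tuple is.
(bind-func point3_len2
  (lambda (p:Point3*)
    (+ (* (tref p 0) (tref p 0))
       (+ (* (tref p 1) (tref p 1))
          (* (tref p 2) (tref p 2))))))

(bind-func point3_demo
  (lambda ()
    (let ((p (Point3 1.0 2.0 3.0)))   ; zone-allocated, p is a Point3*
      (point3_len2 p))))
```

A three-double tuple is small enough that by-value passing would probably be fine too. The pattern matters more as the named type grows.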

> While we are on this question, what actually happens when you declare a POD type in a closure, like say
>
> (bind-func fnord:[void]*
>   (lambda ()
>     (let ((x:i64 5))
>       (println "fnord")
>       void)))
>
> Is the memory for x stack allocated or zone allocated? Would it make a difference if x was declared before the lambda?

x is stack allocated. Here is the x86:

0x123ccb000: mov qword ptr [rsp - 0x8], rdi
0x123ccb005: mov qword ptr [rsp - 0x10], 0x5
0x123ccb00e: ret 0x8

The first line is pushing the closure environment (gets passed into every xtlang closure as a hidden argument). The second line pushes 0x5. The third line returns - nothing else to do. So the first two lines here are redundant - more on this below.

If the x (i.e. let) was before the lambda the answer would be more complicated, as the x would then belong to the closure's environment - let's leave that as a discussion for another day :)

> And what if the function actually returned an i64:
>
> (bind-func fnord:[i64]*
>   (lambda ()
>     (let ((x:i64 3))
>       (println "fnord")
>       x)))
>
> Is memory for an i64 allocated twice? Stack or zone? And which stack(s)/zone(s)? You say the return value is stored in a register, but I presume the calling function can't just leave it there?

This totally depends on the context. In this simple case the compiler would optimally simply move "3" directly to rax (return register) and return - job done!

Looking at the actual x86 output from fnord we find that llvm doesn't do an optimal job of optimizing this. This is not a criticism of llvm in any way, the xtlang compiler does not make llvm's job very easy and leaves most of the optimization to llvm.

So in the asm below the first and second lines (which are basically pushing the stack) are redundant. Again the first line pushes the closure's environment parameter (reg rdi) onto the stack. As we don't actually use the environment param in this case this could have been optimized away. LLVM could also optimize out the second line, as pushing the 0x3 to the stack is also redundant. The third line is the business though - it moves the 0x3 to eax (i.e. rax), and the fourth line returns. You can imagine *why* we end up with code like this though, if you consider the let separately from the return.

0x123cc7000: mov qword ptr [rsp - 0x8], rdi
0x123cc7005: mov qword ptr [rsp - 0x10], 0x3
0x123cc700e: mov eax, 0x3
0x123cc7013: ret 0x8

I'm not sure what you mean by "just leave it there"? Why would the calling function want to just leave it there?

> Having arrays and vectors and tuples being value by default, whilst named types are references by default, seems very confusing, but is probably worth being confusing if its the commonest use. Actually, I find this whole area confusing. Is it true to say that a tuple, array, or vector, is a contiguous region of memory? Why would a named type (which is just a tuple really?) have different typical usage?

Purely because of size. Named types are likely to be larger, in general. Calls to (tuple ...), (array ...) etc. are likely to be small and should fit in registers, for example (array 1.0 2.0 3.0 4.0) or (tuple a b). It's unlikely you'll see too many (array 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 etc.) calls. You should think of (tuple ..) and (array ..) as literals, and named type constructors as ... constructors :)

> Lastly, relating to Tom's question, I have found myself wanting an address-of operator, which is I think what you were fearing Andrew! The use case here is when I want to pass a POD type around (or have a C-library that requires it), and I find myself having to do this
>
> (bind-func use-old-skool-c-funktion:[i64,i8*,i8*]*
>   (lambda (input output)
>     (let ((size_ref:i64* (salloc)))
>       (pset! size_ref 0 (* FRAMES CHANNELS))
>       (let ((output:i8* (zalloc (pref size_ref 0))))
>         (old-skool-c-api input output size_ref)
>         (pref size_ref 0)))))
>
> I guess it's not really much harder doing this than declaring a value type and passing around its address, but it feels a bit awkward.

Yes, I agree that this is awkward, but as you say this usually happens as a byproduct of interfacing to C - there really isn't a *good* reason to write code like this for xtlang that doesn't interface to C. For those (relatively) few "C" lib cases I'm inclined to leave things as they are.

Cheers,

Andrew.

George

Jan 24, 2022, 3:36:45 PM

to Extempore

Hi to all

I was referred to this thread by a note box in the "xtlang Types" document.

"The semantics of the *ref functions are in the process of being changed---see this thread on the mailing list for more details. We'll update these docs as soon as things settle down, but for now accept my humble apology that some of this stuff is out of date. Sorry!"

I wonder if this discussion continues to be relevant. Or has the matter been settled?

No sensible contribution from me as a complete novice. Just wondering.

Regards

George

Ben Swift

Jan 26, 2022, 8:11:07 PM

to extemp...@googlegroups.com, George

Hi George

I agree that the reference to a 6yo mailing list thread as "currently
under discussion" is a bit confusing :)

I've removed that box. In terms of where we landed, the stuff discussed in the
mailing list thread is still largely accurate, and I'll have a look to see if
that docs page is misleading at all.

Cheers,
Ben
