Re: Adding fixed sized integer and word types

Ben Lippmeier

Dec 20, 2018, 6:14:24 PM
to Chris Hall, discu...@googlegroups.com
(cc-ing to discus-lang as Amos is also hacking currently)

On 20 Dec 2018, at 7:18 pm, Chris Hall <followin...@gmail.com> wrote:

First pass has been committed as

I added 'constructor functions', so `#word16 12` is how you get a word16 instance.

Currently `Nat` is the default type for a parsed integer literal. I would like to change that to Int so that we can support negative numbers, which would give:

12 : Int
-6 : Int
#nat 14 : nat
#word8 43 : word8

Thoughts?

In Shimmer the literals are constructed like #nat'5, #word16'12, #int'-5 and so on. This is consistent with the naming for primitive operators like #nat'add, and using the prime means that the whole value is defined in a single lexeme. We can have multiple lexemes for the same value: e.g. #word8'13, #word8'0b1101 and #word8'0xd would all be the same value. Out-of-range values like #word8'300 would be caught in the lexer.

With the space syntax like “#nat 14”, the “#nat” is an odd thing. The # in its name would indicate it’s a primitive of some sort, but it can’t be assigned a type. Detection of out-of-range values like #word8'300 would need to go somewhere between the lexer and the type checker, but it’s not really type checking as neither #word8 nor 300 has a type by itself (?). Also consider floating point values, where #float32'-0.5 would make sense but #word8 -0.5 would not.
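As a rough illustration of the single-lexeme idea (not Shimmer's actual lexer), the Python sketch below treats the whole of `#word8'…` as one lexeme, accepts decimal, binary and hex spellings of the same value, and rejects out-of-range values at lexing time. The function name `lex_word8` and the `("word8", value)` token shape are assumptions made for the example.

```python
import re

# Largest value representable in an unsigned 8-bit word.
WORD8_MAX = 2**8 - 1

# The prefix, the prime and the digits form a single lexeme.
WORD8_LITERAL = re.compile(r"#word8'(0b[01]+|0x[0-9a-fA-F]+|[0-9]+)")


def lex_word8(lexeme):
    """Lex one #word8'... lexeme into a hypothetical ("word8", value) token."""
    match = WORD8_LITERAL.fullmatch(lexeme)
    if match is None:
        raise ValueError(f"not a word8 literal: {lexeme}")

    digits = match.group(1)
    if digits.startswith("0b"):
        value = int(digits[2:], 2)
    elif digits.startswith("0x"):
        value = int(digits[2:], 16)
    else:
        value = int(digits, 10)

    # The range check happens here, before any type checking.
    if value > WORD8_MAX:
        raise ValueError(f"{lexeme}: {value} exceeds the word8 maximum {WORD8_MAX}")
    return ("word8", value)
```

With this sketch, `#word8'13`, `#word8'0b1101` and `#word8'0xd` all produce the same token, while `#word8'300` is rejected before the parser ever sees it.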

I’m fine with having the default number format be #Int if that seems more useful, provided the form with the explicit type also works. The default number is #Nat now only because that was the easiest to implement at the time.

Ben.


Chris Hall

Dec 20, 2018, 7:13:30 PM
to Ben Lippmeier, discu...@googlegroups.com
The type of `#word8` is Int -> Word8, but these constructor functions are meta-level so they are a bit funny, and I dislike how they would handle failure (runtime panic).

I'm happy to move to #nat'5 instead; I just went for what was easiest and required no real updates to the lexer/parser.
I was thinking yesterday that for floating point we would at least need floating point literal lexing and parsing anyway.

The default number literal being Int is mostly only useful if we stick with `#word8 12`; if we instead move to `#word8'12` then it doesn't make a big difference.
For completely ideological reasons, I like the idea of the default numeric type being unsigned :p

As an aside, in our tests I saw
> [list #Nat| 10, 11, 12, 13, 14]
What is the type of `list` there? It seems to be roughly `(t : Type) (items : [t]) -> ...`

~ Chris

Ben Lippmeier

Dec 20, 2018, 7:21:40 PM
to discu...@googlegroups.com, Chris Hall

On 21 Dec 2018, at 11:12 am, Chris Hall <followin...@gmail.com> wrote:

The type of `#word8` is Int -> Word8, but these constructor functions are meta-level so they are a bit funny, and I dislike how they would handle failure (runtime panic).

I'm happy to move to #nat'5 instead; I just went for what was easiest and required no real updates to the lexer/parser.
I was thinking yesterday that for floating point we would at least need floating point literal lexing and parsing anyway.

The default number literal being Int is mostly only useful if we stick with `#word8 12`; if we instead move to `#word8'12` then it doesn't make a big difference.

For completely ideological reasons, I like the idea of the default numeric type being unsigned :p

Maybe the default should just be of type “Natural”: unsigned, arbitrary precision, and non-concrete — the way numbers were always supposed to be.

As an aside, in our tests I saw
> [list #Nat| 10, 11, 12, 13, 14]
What is the type of `list` there? It seems to be roughly `(t : Type) (items : [t]) -> ...`

The ‘list’ is a soft keyword that marks the list construction, not a variable with a type. I added the explicit keyword just so it is obvious whether you’ve got an argument vector, a list, set or map — there are never enough sorts of brackets on the keyboard. There is a production for it in the grammar under doc/reference/01-grammar.md

Using the keyword means the types of empty containers are unambiguous:

[list #Nat|]         :: #List #Nat
[set  #Nat|]         :: #Set  #Nat
[map  #Nat #Symbol|] :: #Map  #Nat #Symbol
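To make the point about unambiguous empty containers concrete, here is a small Python sketch that derives the container type purely from the header of the literal (the keyword and type arguments before the `|`), without looking at any elements. It is only an illustration of the idea, not the Discus parser; the function name `container_type` is hypothetical, and the mapping from keyword to type constructor simply follows the three examples above.

```python
# Keywords taken from the examples above, mapped to their type constructors.
CONSTRUCTORS = {"list": "#List", "set": "#Set", "map": "#Map"}


def container_type(literal):
    """Return the type of a bracketed container literal, based on its header only."""
    text = literal.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError(f"not a bracketed literal: {literal}")

    # Everything before the first '|' is the header; elements are ignored.
    header, bar, _elements = text[1:-1].partition("|")
    if not bar:
        raise ValueError(f"missing '|' in: {literal}")

    words = header.split()
    if not words or words[0] not in CONSTRUCTORS:
        raise ValueError(f"unknown container keyword in: {literal}")

    return " ".join([CONSTRUCTORS[words[0]], *words[1:]])
```

Because the element type is written in the header, `[list #Nat|]` and `[list #Nat| 10, 11, 12, 13, 14]` get the same type in this sketch, with no inference from the elements needed.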

Ben.

Amos Robinson

Dec 20, 2018, 7:38:41 PM
to discu...@googlegroups.com, Chris Hall
I have a test that generates random values and checks they can be pretty-printed and then parsed back unambiguously – Int fails for non-negative numbers right now because "VInt 100" prints as 100, which parses as "VNat 100". If you add the specific int or nat syntax, you might want to update the `valuePrimitive` test data generator to produce positive ints here:
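As a rough illustration of the round-trip failure described above (this is not the project's actual test or generator), the Python sketch below models values as hypothetical `("VNat", n)` / `("VInt", n)` pairs. The untagged printer and parser are assumptions that mimic the described behaviour: a bare non-negative number parses as a natural, and a leading minus sign is assumed to parse as an int. The tagged pair uses the `#nat'` / `#int'` forms discussed earlier in the thread.

```python
def pretty_untagged(value):
    """Print only the number, dropping the distinction between nat and int."""
    _tag, n = value
    return str(n)


def parse_untagged(text):
    """Assumed behaviour: a bare number is a natural unless it is negative."""
    n = int(text)
    return ("VInt", n) if n < 0 else ("VNat", n)


def pretty_tagged(value):
    """Print with an explicit type prefix, e.g. #int'100."""
    tag, n = value
    prefix = {"VNat": "#nat", "VInt": "#int"}[tag]
    return f"{prefix}'{n}"


def parse_tagged(text):
    """Parse the explicit form back into a tagged value."""
    prefix, _prime, digits = text.partition("'")
    tag = {"#nat": "VNat", "#int": "VInt"}[prefix]
    return (tag, int(digits))


def round_trips(value, pretty, parse):
    """The property under test: parsing the printed form gives the value back."""
    return parse(pretty(value)) == value
```

In this model `("VInt", 100)` fails the property with the untagged pair, because it prints as `100` and comes back as `("VNat", 100)`, but it passes once the printer emits `#int'100`.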




Chris Hall

Jan 17, 2019, 8:14:19 PM
to Amos Robinson, discu...@googlegroups.com
Sorry for the delay.

So poking around Shimmer I can see that the pretty printer in SMR/Prim/Name.hs can output w8', i8', etc.
I can't find any evidence that these exist within the lexer or parser; it looks like it just blindly consumes the primitive `#int'-5`, which I assume is then dealt with at runtime.

In SMR/Prim/Name.hs I can also see `readPrim`, which seems to only deal with the nat' prefix.

Where do you deal with w8, w16, i8, etc.? I couldn't find the runtime code.

~ Chris

Ben Lippmeier

Jan 17, 2019, 9:43:20 PM
to discu...@googlegroups.com, Chris Hall, Amos Robinson
It looks like I never finished the text parser and printer. I probably added the w8, w16 etc prim forms when building the binary codec that DDC uses, but never finished the text codec support.

Ben.

Chris Hall

Jan 17, 2019, 10:15:59 PM
to Ben Lippmeier, discu...@googlegroups.com, Amos Robinson
That makes sense based on what I found.

I should have a clean enough lexer for this soon, just fighting bounds checking a little.

~ Chris

Chris Hall

Feb 3, 2019, 7:03:52 AM
to Ben Lippmeier, discu...@googlegroups.com, Amos Robinson
Okay, after much delay and many back-and-forths, I have landed the 114th iteration of this

Importantly, it is non-breaking for naturals: a number on its own is assumed to be a natural, so the majority of existing tests did not need to change.
If anything in this causes a problem, please feel free to revert and I can then re-land after it is fixed, but all current tests are passing.

#bool'true -- #Bool
1 -- #Nat
#nat'1 -- #Nat
#int'12 -- #Int
#word'14 -- #Word

#int8'1 -- #Int8
#int16'12 -- #Int16
#int32'4 -- #Int32
#int64'7 -- #Int64

#word8'1 -- #Word8
#word16'7 -- #Word16
#word32'12 -- #Word32
#word64'37 -- #Word64

#word8'256 -- lexing error, as the maximum value for Word8 is 255; this then becomes an unknown primitive (I haven't figured out how to nicely give errors from lexing for bounds checking)
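One possible way to surface the bounds problem, sketched in Python rather than in the real lexer: have the lexer return a dedicated error token carrying a message instead of falling back to an unknown primitive. Everything here is an assumption for illustration, including the token shapes `("lit", type, value)`, `("error", message)` and `("unknown_prim", lexeme)`, the function names, and the usual two's-complement ranges for the signed types. Only the sized types from the list above are covered; `#nat`, `#int` and `#word` are left out because their ranges are not stated in this thread.

```python
import re

# Bit widths of the sized types listed above.
WIDTHS = {
    "int8": 8, "int16": 16, "int32": 32, "int64": 64,
    "word8": 8, "word16": 16, "word32": 32, "word64": 64,
}

DECIMAL = re.compile(r"-?[0-9]+")


def bounds(type_name):
    """Inclusive (lowest, highest) values, assuming two's complement for ints."""
    bits = WIDTHS[type_name]
    if type_name.startswith("int"):
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


def lex_literal(lexeme):
    """Lex a sized literal, reporting range violations as an explicit error token."""
    name, prime, digits = lexeme.partition("'")
    type_name = name[1:] if name.startswith("#") else ""
    if not prime or type_name not in WIDTHS or not DECIMAL.fullmatch(digits):
        return ("unknown_prim", lexeme)

    value = int(digits, 10)
    lowest, highest = bounds(type_name)
    if not lowest <= value <= highest:
        message = f"{lexeme}: {value} is outside {lowest}..{highest} for {type_name}"
        return ("error", message)
    return ("lit", type_name, value)
```

Under this sketch `#word8'256` yields an `("error", ...)` token that a later stage could report with a source position, rather than silently becoming an unknown primitive.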

~ Chris