Bitwidth operator

Ben Laurie

May 8, 2010, 4:39:38 PM

to Stupid Development

I've been thinking.

I've left unaddressed the problem of bitwidths of constants, and
alongside that we have the very clunky operator syntax. Not to mention
the pain of implementation of all those operators.

It seems to me that introducing a bitwidth operator, _, that works on
(pretty much) anything gives a clean way to proceed. This would also
allow us to write operators a little more sanely (e.g. "plus32" would be
"+_32").

It also seems to me that allowing bitwidth deduction is a perfectly fine
thing to do - no casting, just deduction. We still insist that
everything matches.

Combining the ideas we might write something like this:

uint_32 a = 1234;
uint_8 b = 56;
uint_32 c = (a + widen b) ^ 234;

which, after deduction, would be the same as:

uint_32 a = 1234_32;
uint_8 b = 56_8;
uint_32 c = (a +_32 widen_8_32 b) ^ 234_32;

I'd refer to things like "1234" and "+" as having an uncommitted bitwidth.
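
A minimal sketch of how this deduction could work, written in Haskell only because it is compact. The `Expr` type, `committed` and `deduce` are hypothetical names, not part of the Stupid compiler. The sketch assumes that the width is pushed down from the declared type of the variable being assigned, that a committed width which disagrees is an error (no casting), and that an unsigned constant must fit the deduced width.

```haskell
-- Hypothetical mini-AST for the example above; none of these names come
-- from the Stupid compiler.
data Expr
  = Lit Integer (Maybe Int)            -- constant, width possibly uncommitted
  | Var String Int                     -- variables always have a declared width
  | Bin String (Maybe Int) Expr Expr   -- operator, width possibly uncommitted
  | Widen (Maybe (Int, Int)) Expr      -- widen, (from, to) possibly uncommitted
  deriving (Eq, Show)

-- The width an expression is already committed to, if any.
committed :: Expr -> Maybe Int
committed (Lit _ w)     = w
committed (Var _ w)     = Just w
committed (Bin _ w _ _) = w
committed (Widen ft _)  = fmap snd ft

-- Push the width demanded by the context down into an expression.
-- Nothing is ever cast: a committed width that disagrees is an error.
deduce :: Int -> Expr -> Either String Expr
deduce w (Lit n mw)
  | maybe False (/= w) mw = Left ("width mismatch on constant " ++ show n)
  | n < 0 || n >= 2 ^ w   = Left (show n ++ " does not fit in " ++ show w ++ " bits")
  | otherwise             = Right (Lit n (Just w))
deduce w (Var x v)
  | v == w    = Right (Var x v)
  | otherwise = Left ("width mismatch on variable " ++ x)
deduce w (Bin op mw l r)
  | maybe False (/= w) mw = Left ("width mismatch on operator " ++ op)
  | otherwise             = Bin op (Just w) <$> deduce w l <*> deduce w r
deduce w (Widen ft e)
  | maybe False ((/= w) . snd) ft = Left "width mismatch on widen"
  | otherwise =
      case committed e of
        Just v | v < w -> Widen (Just (v, w)) <$> deduce v e
        Just _         -> Left "widen must increase the width"
        Nothing        -> Left "operand of widen has uncommitted width"

-- uint_32 c = (a + widen b) ^ 234;   with a : uint_32 and b : uint_8
example :: Expr
example =
  Bin "^" Nothing
    (Bin "+" Nothing (Var "a" 32) (Widen Nothing (Var "b" 8)))
    (Lit 234 Nothing)
```

Under these assumptions, `deduce 32 example` commits every node to the widths shown in the second listing above (`+_32`, `widen_8_32`, `234_32`), and `deduce 8 (Lit 257 Nothing)` is rejected because 257 does not fit in 8 bits.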

Thoughts?

Some other digressions:

1. I was thinking of using ++ for overflowplus, and >>> and <<< for
rotation.

2. The narrow operator should fault if the bits it discards are non-zero.

3. Given 2, we could then write both "widen a" and "narrow a" as
a_<bitwidth> safely.

4. Obviously it should be an error to write 257_8. If you wanted to do
that for some reason, you could write (257 &_16 0xff)_8. (257 & 0xff)_8
should be an error, I think, because after deduction & still has
uncommitted width.

5. Thinking about it, sign probably has to be part of the bitwidth. So
we'd actually write:

int_32u a = 1234_32u;
int_8u b = 56_8u;
int_32u c = (a +_32u widen_8u_32u b) ^ 234_32u;

which is kinda ugly. Or we could admit that bits are essentially
unsigned and that sign is in the eye of the operator and just not have a
signed type. Though that sounds like a great way to cause programmer
error, so maybe not.

We could make unsigned the default, and of course you wouldn't have to
write most of the bitwidths.
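
As a rough illustration of digressions 1 to 3, here is how the proposed operators might behave at fixed widths, modelled with Haskell's `Data.Word` and `Data.Bits`. The function names are made up for this sketch, `Nothing` stands in for a run-time fault, and the reading of `<<<` as rotate-left and `>>>` as rotate-right is an assumption.

```haskell
import Data.Bits (rotateL, rotateR, shiftR)
import Data.Word (Word8, Word32)

-- widen: every 8-bit value fits in 32 bits, so this can never fault.
widen8to32 :: Word8 -> Word32
widen8to32 = fromIntegral

-- narrow: fault (modelled as Nothing) if any discarded bit is non-zero.
narrow32to8 :: Word32 -> Maybe Word8
narrow32to8 x
  | x `shiftR` 8 /= 0 = Nothing
  | otherwise         = Just (fromIntegral x)

-- "++" (overflowplus) at 8 bits: Word8 addition wraps modulo 2^8.
overflowPlus8 :: Word8 -> Word8 -> Word8
overflowPlus8 = (+)

-- "<<<" and ">>>" at 8 bits, read here as rotate-left and rotate-right.
rotl8, rotr8 :: Word8 -> Int -> Word8
rotl8 = rotateL
rotr8 = rotateR
```

With these definitions `narrow32to8 56` is `Just 56` and `narrow32to8 1234` is `Nothing`, which is what makes writing both widen and narrow as `a_<bitwidth>` safe: the conversion either preserves the value or faults.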

--
http://www.apache-ssl.org/ben.html http://www.links.org/

"There is no limit to what a man can do or how far he can go if he
doesn't mind who gets the credit." - Robert Woodruff

Ben Clifford

May 8, 2010, 5:03:38 PM

to Stupid Development

sounds quite like Hindley-Milner type inference, but a bit more
restricted (not inferring the whole type - eg you don't describe inferring
the type of variables themselves)

I like the idea of using it the way you describe to remove redundant
size/sign annotations.

--
http://www.hawaga.org.uk/ben/

Ben Clifford

May 9, 2010, 11:31:41 AM

to Stupid Development

This looks very much like Haskell numeric type handling. So part of this
message talks about what that looks like, but mostly because I want to
comment on it wrt some of the stuff you said - hopefully you can pick up
the syntax.

That has 'type annotations' on expressions (including on constants). These
are type declarations but you can make them almost anywhere rather than
specifically on a variable declaration.

So you write:

32

and that actually means '32 in whatever numeric type is needed'

eg. type this into GHC:

$ ghci
Prelude> :t 32 -- what is the type of 32?
32 :: (Num t) => t

means 32 has unknown type, represented by some type variable t, but it is
the case that t is a Num(ber) type.

So now when you ask about +:

Prelude> :t (+)
(+) :: (Num a) => a -> a -> a

That means + is an operator that takes two parameters of a number type a
and returns a value of that number type. Note it's a -> a -> a, not
a -> b -> c. So that means all the number types are the same. You can't
add a float and an int, even though both float and int are number types -
because when we call +, either a = Int or a = Float.

We could have a different operator, fuzzyAdd with type:
(Num a, Num b, Num c) => a -> b -> c
which *could* add a float and an int, and return something of potentially
a third number type. But that's not what we want in Stupid.
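
For concreteness, a sketch of what such a fuzzyAdd could look like. Note one assumption: with only Num constraints there is no way to convert between number types, so this version asks for Real inputs and a Fractional result in order to use `realToFrac`.

```haskell
-- Hypothetical mixed-type addition. The constraints differ from the
-- (Num a, Num b, Num c) signature above because Num alone provides no
-- conversion between number types.
fuzzyAdd :: (Real a, Real b, Fractional c) => a -> b -> c
fuzzyAdd x y = realToFrac x + realToFrac y

-- Adds a Float and an Int and returns a third type, Double.
mixed :: Double
mixed = fuzzyAdd (1.5 :: Float) (2 :: Int)
```

Here `mixed` is 3.5. The conversions happen silently inside the operator, which is exactly the kind of implicit casting the "everything matches" rule is meant to exclude.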

> I've left unaddressed the problem of bitwidths of constants, and
> alongside that we have the very clunky operator syntax. Not to mention
> the pain of implementation of all those operators.
>
> It seems to me that introducing a bitwidth operator, _, that works on
> (pretty much) anything gives a clean way to proceed. This would also
> allow us to write operators a little more sanely (e.g. "plus32" would be
> "+_32").

OK. The Haskell syntax uses :: as a type annotation:

32 :: Float

means that the expression on the LHS has type Float.

Prelude> :t (32 :: Float)
(32 :: Float) :: Float

So _ in your description is close to :: for numeric constants, and
anything that represents a value.

So this can be used like this:

Prelude> (32 + 8) :: Float
40.0
Prelude> (32 + 8) :: Int
40

In the above two cases, a different + is being used - the first for
add-floats, and the second, add-ints.

That behaviour is the same as you propose.

Also:

Prelude> ( (32 ::Float) + (8 ::Float))
40.0

but:

Prelude> ( (32 ::Float) + (8 ::Int))

<interactive>:1:18:
Couldn't match expected type `Float' against inferred type `Int'
In the second argument of `(+)', namely `(8 :: Int)'
In the expression: ((32 :: Float) + (8 :: Int))
In the definition of `it': it = ((32 :: Float) + (8 :: Int))

So far this is almost identical to what you describe.

The Haskell syntax for forcing a particular + is different - it's more
general than the +_type syntax:

Prelude> ((+) :: Float -> Float -> Float) 5 5
10.0

So +_32 is much more concise.

So then what about widen / narrow?

Maybe they're the same function, as you suggest by saying:

> 3. Given 2, we could then write both "widen a" and "narrow a" as
> a_<bitwidth> safely.

widen has two type variables, the input type, and the output type. In
Haskell syntax: widen :: (Num a, Num b) => a -> b

Saying (expr)_type here in your notation:

(257 & 0xff)_8

means (in Haskell notation):

(widen (257 & 0xff)) :: uint8

which gives an implicit widen and binds the output type.

A different interpretation, the Haskell interpretation of the expression:

(257 & 0xff) :: uint8

makes 257 and 0xff be uint8s (and then would give an error later on that
257 can't be a uint8), like this:

the expression is uint8, so for the operator & :: a -> a -> a we know
a = uint8, and so we know 257 and 0xff are both uint8s.

So what's the difference between the two, in the case of Stupid being used
in real life rather than contrived examples?

Mostly, I think: the Haskell way requires an explicit resize operation
when you have something that is known to be one size that you now want to
be a different size. It doesn't use the same syntax (_) for "I'm telling
you what this is" vs "I'm telling you I want you to convert this to a
different size". Maybe that difference is worth making explicit in the
language.

explicit-resize could still be an operator, but using a different symbol.

> 2. The narrow operator should fault if the bits it discards are non-zero.

> 5. Thinking about it, sign probably has to be part of the bitwidth. So
> we'd actually write:

yes.

Thinking about what's going on inside the bitfields, there are a bunch of
types that aren't in a strict linear order.

u32 can hold anything a u8 can hold so a widen always works. Sometimes you
can narrow a u32 to a u8, but you won't know until runtime. So u32 is
strictly bigger than u8 and the words 'narrow' and 'widen' apply.

But for signed 8 (s8?) and u8, then sometimes you can 'convert' an s8 to a
u8 (but not know until runtime - eg 5 works but -5 doesn't work), and
likewise u8 to s8 ( 100 works but 200 doesn't work). So neither is
strictly bigger than the other, and 'widen' and 'narrow' don't make sense,
but their generalisation, 'resize' does.
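
A sketch of 'resize' as a single checked conversion, using `toIntegralSized` from `Data.Bits` (available in base 4.8 and later). It returns `Nothing` when the value does not fit the target type, which stands in here for the run-time fault. The names `resize`, `s8ToU8`, `u8ToS8` and `u8ToU32` are only for illustration.

```haskell
import Data.Bits (Bits, toIntegralSized)
import Data.Int (Int8)
import Data.Word (Word8, Word32)

-- One checked conversion covering widen, narrow and sign changes.
resize :: (Integral a, Integral b, Bits a, Bits b) => a -> Maybe b
resize = toIntegralSized

-- Neither s8 nor u8 is "bigger": each direction can fail at run time.
s8ToU8 :: Int8 -> Maybe Word8
s8ToU8 = resize

u8ToS8 :: Word8 -> Maybe Int8
u8ToS8 = resize

-- u8 to u32 is a true widen, so this never returns Nothing.
u8ToU32 :: Word8 -> Maybe Word32
u8ToU32 = resize
```

This reproduces the cases above: `s8ToU8 5` is `Just 5` while `s8ToU8 (-5)` is `Nothing`, and `u8ToS8 100` is `Just 100` while `u8ToS8 200` is `Nothing`.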

--

Ben Laurie

May 9, 2010, 12:27:10 PM

to stupi...@googlegroups.com

On 09/05/2010 16:31, Ben Clifford wrote:
> Mostly, I think: the Haskell way requires an explicit resize operation
> when you have something that is known to be one size that you now want to
> be a different size. It doesn't use the same syntax (_) for "I'm telling
> you what this is" vs "I'm telling you I want you to convert this to a
> different size". Maybe that difference is worth making explicit in the
> language.
>
> explicit-resize could still be an operator, but using a different symbol.

Since this seems like the only point of disagreement, I've snipped the rest.

In the abstract this argument makes sense, but I feel that in the case
of Stupid it makes a distinction that's not of much value, since I think
it's clear from context which one you have. If _ is applied to a
primitive, it is setting its type. If it is applied to a derived value,
it is a cast.

That said, maybe it's nice to make it crystal clear. _ for setting size
and __ for casting (or resizing, if you prefer)?

Ben Clifford

May 9, 2010, 12:43:46 PM

to stupi...@googlegroups.com

> Since this seems like the only point of disagreement, I've snipped the
> rest.
>
> In the abstract this argument makes sense, but I feel that in the case
> of Stupid it makes a distinction that's not of much value, since I think
> it's clear from context which one you have. If _ is applied to a
> primitive, it is setting its type. If it is applied to a derived value,
> it is a cast.
>
> That said, maybe it's nice to make it crystal clear. _ for setting size
> and __ for casting (or resizing, if you prefer)?

So if __ is a cast and _ is a type spec, then 100__u8 is almost the same
as 100_u8. In 100__u8 it's not clear what the type of 100 is, but who cares
as long as when you cast it to u8 it's 100_u8 - the expression 100__u8 has
type u8, which is all that matters.

So then maybe I'm happy with the single operator approach after all, where
that operator is cast.

--

Ben Laurie

May 9, 2010, 3:03:46 PM

to stupi...@googlegroups.com

The half-formulated example in my head is, given two operators:

(1 + 1)_8

vs.

(1 + 1)__8

The first would be setting the type, and so all would be hunky dory. The
second would be casting, but would be a compile-time error, because the
size of what it is casting is not determined.

If there is only one operator, is (1+1)_8 an error or not? I'd say it
was. But you'd say it wasn't, I think, from the above.

Does it matter? Not sure!

Ben Clifford

May 9, 2010, 3:13:47 PM

to stupi...@googlegroups.com

> If there is only one operator, is (1+1)_8 an error or not? I'd say it
> was. But you'd say it wasn't, I think, from the above.

Right. I think in both of the opinions that I've expressed, it's not an
error.

In one opinion, _8 sets the type, but that type propagates through the +
and so sets the types of the constants

In the other opinion _8 is a cast; the type of the numeric constants is
not tied to any particular kind of numeric constant, and so is some magic
parent type that can be cast to any other numeric type.

--

Ben Laurie

May 9, 2010, 3:58:11 PM

to stupi...@googlegroups.com

OK, so it's interesting to consider the case:

(1 ++ 255)_8

which is impossible to correctly evaluate without knowing the bitwidth,
but depending which of your alternatives we choose, we get different
outcomes: in one case, 1 ++ 255 is set to 8 bits, result 0, _8 is ok. In
the other, 1 ++ 255 is infinite bits, so the outcome is 256 and _8 is an
error.
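
The two outcomes can be sketched in Haskell, with `Word8` standing in for a committed 8-bit width and `Integer` for "infinite bits". The names are hypothetical, and `Nothing` models the error from `_8`.

```haskell
import Data.Word (Word8)

-- Reading 1: _8 commits "++" and both constants to 8 bits before
-- evaluating, so the addition wraps modulo 2^8.
committedFirst :: Word8
committedFirst = 1 + 255

-- Reading 2: evaluate at unbounded width, then cast the result to 8 bits.
castTo8 :: Integer -> Maybe Word8
castTo8 n
  | n >= 0 && n <= 255 = Just (fromInteger n)
  | otherwise          = Nothing

castAfter :: Maybe Word8
castAfter = castTo8 (1 + 255)
```

`committedFirst` is 0, whereas `castAfter` is `Nothing` because the unbounded sum is 256.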

Ben Clifford

May 9, 2010, 4:27:14 PM

to stupi...@googlegroups.com

> OK, so it's interesting to consider the case:
>
> (1 ++ 255)_8
>
> which is impossible to correctly evaluate without knowing the bitwidth,

So three different possibilities at least (my two and your one). Toss a
three sided coin ;)

--

Ben Laurie

May 10, 2010, 5:08:48 AM

to stupi...@googlegroups.com

On 09/05/2010 20:13, Ben Clifford wrote:
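> In one opinion, _8 sets the type, but that type propagates through the +
> and so sets the types of the constants
>
> In the other opinion _8 is a cast; the type of the numeric constants is
> not tied to any particular kind of numeric constant, and so is some magic
> parent type that can be cast to any other numeric type.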

This means they need to be evaluated as bignums. Which I guess is
do-able in the compiler, but not yet do-able in the output.

Ben Clifford

May 10, 2010, 7:12:50 AM

to stupi...@googlegroups.com

> > In one opinion, _8 sets the type, but that type propagates through the +
> > and so sets the types of the constants
> >
> > In the other opinion _8 is a cast; the type of the numeric constants is
> > not tied to any particular kind of numeric constant, and so is some magic
> > parent type that can be cast to any other numeric type.
>
> This means they need to be evaluated as bignums. Which I guess is
> do-able in the compiler, but not yet do-able in the output.

and that makes 257_8 fail at runtime and not at compile time. unless there
is compile-time partial evaluation etc. which makes me less favour the
casting option and prefer either what you proposed initially or "_8 sets