
"core C"


Rosario19

Jun 30, 2015, 2:38:42 AM
"core C" already exist

it is the use of
"intxx_t" type that would make all operation +-/* all the same for
differents machine/compiler

it is one float of fixed size

it is some macro value for aligned

it is a pointer linear of 32 bit
and a pointer linear of 64 bit

end

endianes of cpu for me has no rilevance
the only place that it count something is when integers are sent
code-decode

endianes not count for operators
0x2 & 0x7
would be the same number for all computer whaterver endianess they
have, or am i wrong?

Jens Thoms Toerring

Jun 30, 2015, 3:54:20 AM
Rosario19 <R...@invalid.invalid> wrote:
> "core C" already exist

Yes?

> it is the use of the
> "intxx_t" types, which makes the operations + - * / behave the same on
> different machines/compilers

There's one drawback with the intN_t types: they're optional.
So this will only work on systems where those typedefs
actually exist. Systems with CHAR_BIT larger than 8 will
have real problems supporting all, or at least some, of them.
And the behaviour on overflow is still not defined for
the intN_t types, so the result of a simple addition that
overflows can be different on different systems.

> it is a float of fixed size

And which is that? There's, as far as I am aware, float,
double and long double. But none of them have a fixed size.

> it is a macro value for alignment

> it is a linear 32-bit pointer
> and a linear 64-bit pointer

No idea what you're talking about.

> end

> the endianness of the CPU has no relevance for me
> the only place it counts is when integers are sent,
> encoded and decoded

Endianness is very relevant whenever you exchange data with
a different system, either over the network or when reading
a file that was created on a different system. And it not
only concerns integers but also floating-point numbers.

> endianness does not count for operators:
> 0x2 & 0x7
> would be the same number on every computer, whatever endianness it
> has, or am I wrong?

Yes, but that requires that the data has already been read
into memory correctly.
Regards, Jens
--
\ Jens Thoms Toerring ___ j...@toerring.de
\__________________________ http://toerring.de

Malcolm McLean

Jun 30, 2015, 5:55:27 AM
On Tuesday, June 30, 2015 at 8:54:20 AM UTC+1, Jens Thoms Toerring wrote:
> Rosario19 <R...@invalid.invalid> wrote:
>
> > it is one float of fixed size
>
> And which is that? There's, as far as I am aware, float,
> double and long double. But none of them have a fixed size.
>
If floating-point precision is important to your application, then it's likely
that you are doing heavy-duty floating-point number crunching, so it's
only practical to use the hardware floating-point unit. If it isn't strictly
conforming to IEEE then the results won't be consistent with runs on
other platforms, and there's not much you can do about it.

Rick C. Hodgin

Jun 30, 2015, 7:27:25 AM
We can define Core C here if you'd like. :-) I appoint myself chair,
and the rest of us a "committee of the whole" to address the issue.

#1 Little endian.
#2 Two's complement.
#3 All pointers are the same size per target.
#4 All memory is executable.
#5 All memory is read/write unless explicitly flagged as read-only.
#6 No default support for char, short, int, long, long long, etc.,
but these keywords are reserved for use with typedef or #define
statements to map to the appropriate target for the machine.
#7 Signed variable types: s8, s16, s32, s64
#8 Unsigned variable types: u8, u16, u32, u64
#9 Floating point types all IEEE 754-2008 compliant: f32, f64
#10 A new "flag" variable type is introduced which is 1-bit,
assigned with yes/true/up and no/false/down keywords.
#11 A new "bits" variable is introduce which is n-bits, where:
1 <= n <= 7. Defined as "bits:2" for a 2-bit width, range-
checked on use with inquiries being thrown (see #16 below).
#12 Two new arbitrary size variable type are introduced: bi, bfp
(bi=big integer, bfp=big floating point)
#13 A new weakly typed variable type is introduced: var
#14 The basic class is introduced.
#15 The concept of edit-and-continue compilation is mandatory.
#16 A new exception framework is introduced based on inquiries
rather than errors. These allow the machine to suspend for
edit-and-continue guidance when an "error" condition arises,
which will now be called an "inquiry" condition.
#17 Flow control blocks are introduced.
#18 Ad hoc functions are introduced.
#19 The #marker is introduced (for named return destinations, as in:
returnto markerName).
#20 The cask is introduced allowing for arbitrary code injection,
and calling protocols on logical tests to meta (true) and
mefa (false), as well as message passing using mema.
#21 A new self-documenting protocol is introduced, which allows
the software to be documented in source code (similar to Java's).
#22 A new metadata set of tags is made available on a line-by-line
basis, or at the function / struct level. These explicitly
tag source code lines to be of a particular type, making a new
type of search and/or replace possible (such as /* md: a, b, c */).
#23 All standard libraries are available and #include'd by default.
They must be explicitly excluded if they are not needed.
#24 All functions are linked to (1) a bare minimum for standard
libraries, (2) all are included for developer code; unless
otherwise directed by compiler/linker flags.

An incomplete list. I'll add more as I think of them. Please feel
free to comment or propose.

Best regards,
Rick C. Hodgin

Rick C. Hodgin

Jun 30, 2015, 7:41:34 AM
The thing we're creating is called "Core C" and the body responsible
for it will be the "Core C Group".

Core C will be comprised of:

(1) The Core C language definition.
(2) The Core C known extensions definition.
(3) The Core C guide for adding unknown extensions.
(4) The Core C compiler.
(5) The Core C debugger.
(6) The Core C editor and/or IDE.
(7) The Core C virtual machine.

-----
The Core C compiler will implement all Core C features. It will be
designed to run on the Core C virtual machine, but written in a way
that allows immediate recompilation onto native hardware (for
generating code which runs on the virtual machine), with a well-
documented front-end, middle, and back-end for support of other
languages, and other hardware.

The Core C virtual machine will be a very thin machine, with an
opcode set sufficient to represent a virtual CPU that is easily
mapped onto x86-64 and ARM hardware in emulation. The Core C virtual
machine will implement a JIT compiler to translate VM opcodes into
native machine opcodes at load time.

The Core C debugger and editor and/or IDE will be designed to run on
the virtual machine, and will help define the VM's API for OS/host
services.

The Core C language definition will include a list of known
extensions or allowances, such as allowing big-endian, one's
complement, pointers of various sizes, and these will be given
a definition on use.

A guide will also be given on how to implement things that are not
known to Core C, so they can still be added "within the definition."

Rick C. Hodgin

Jun 30, 2015, 7:42:43 AM
Note: I wrote all of this in less than an hour this morning. It's all
off the top of my head. It is not cast in stone, and is just my
first thinking. It will undoubtedly be revised as time goes on.

Rick C. Hodgin

Jun 30, 2015, 7:48:30 AM
On Tuesday, June 30, 2015 at 7:27:25 AM UTC-4, Rick C. Hodgin wrote:
> #11 A new "bits" variable is introduce which is n-bits, where:
> 1 <= n <= 7. Defined as "bits:2" for a 2-bit width, range-
> checked on use with inquiries being thrown (see #16 below).

I think this should be defined as "bN" where N is the number of bits,
and the range should be 1 <= n <= 64. It is designed, by default, to
contain an unsigned integer of that width. To use it in another form,
cast it to another destination:

b17 lbMyBitVar;
s32 lnSigned32;

lnSigned32 = (s32)lbMyBitVar;
...
lbMyBitVar = (b17)lnSigned32;

In each of these casts, whatever raw bit sequence exists is
transferred to the destination. No sign extensions or interpolation.

> #16 A new exception framework is introduced based on inquiries
> rather than errors. These allow the machine to suspend for
> edit-and-continue guidance when an "error" condition arises,
> which will now be called an "inquiry" condition.

Bartc

Jun 30, 2015, 8:02:16 AM
On 30/06/2015 12:27, Rick C. Hodgin wrote:
> We can define Core C here if you'd like. :-) I appoint myself chair,
> and the rest of us a "committee of the whole" to address the issue.
>
> #1 Little endian.
> #2 Two's complement.
> #3 All pointers are the same size per target.
> #4 All memory is executable.

No way of turning it into non-executable? A bit dangerous.

> #5 All memory is read/write unless explicitly flagged as read-only.

> #6 No default support for char, short, int, long, long long, etc.,
> but these keywords are reserved for use with typedef or #define
> statements to map to the appropriate target for the machine.
> #7 Signed variable types: s8, s16, s32, s64
> #8 Unsigned variable types: u8, u16, u32, u64

No 'char' type?

> #9 Floating point types all IEEE 754-2008 compliant: f32, f64
> #10 A new "flag" variable type is introduced which is 1-bit,
> assigned with yes/true/up and no/false/down keywords.

I bet it won't be 1-bit! How much storage will an array of 1000 flags
take up? Anything smaller than 1000 bytes means you have to deal with
bits and bit-pointers, which might compromise #3, making all pointers
the same size.

> #11 A new "bits" variable is introduce which is n-bits, where:
> 1 <= n <= 7. Defined as "bits:2" for a 2-bit width, range-
> checked on use with inquiries being thrown (see #16 below).

Anywhere, or just inside structs or as array elements? But the same
objection as before. What about bits:12 and bits:20 (which tidily fit
into a 32-bit type)?

What is the range of bits:2? Is it 0 to 3, or can it be anything (-2 to
+1 for example)? Will bits:N ever cross a byte-boundary? (In which case,
you might want bits:8 starting halfway through a byte.)

Range-checking on assignment is going to slow down any operations, as
presumably this encompasses all possible overflows on arithmetic as well
as when assigning to and from a bits:N from a different type.

How many bytes and/or bits will a [5] array of bits:7 take up?

Just saying bit types and bitfields take a lot of support.

> #12 Two new arbitrary-size variable types are introduced: bi, bfp
> (bi=big integer, bfp=big floating point)

Not exactly part of a core language. You would expect a core language to
implement features like this.

> #13 A new weakly typed variable type is introduced: var

And this is completely misplaced. I assume this would be dynamically
typed? Then it would make nonsense of whatever type system you're trying
to impose.

Example, in A=&B; you would expect that if B is of type T, then A ought
to be of type T*. But both could be of 'var' type with nothing amiss;
anything goes. And you need to make a distinction between the static
'var' type of something, and the dynamic type it contains. Would a var*
type be allowed? Would var types be passed by value or by reference? Etc.

Another can of worms inappropriate in a 'core' language.

> #15 The concept of edit-and-continue compilation is mandatory.

And this is where you're going to lose everyone who's made it this far.
This is nothing to do with languages or compilers as they are normally
understood.

--
Bartc

Bartc

Jun 30, 2015, 8:07:44 AM
On 30/06/2015 12:48, Rick C. Hodgin wrote:
> On Tuesday, June 30, 2015 at 7:27:25 AM UTC-4, Rick C. Hodgin wrote:
>> #11 A new "bits" variable is introduce which is n-bits, where:
>> 1 <= n <= 7. Defined as "bits:2" for a 2-bit width, range-
>> checked on use with inquiries being thrown (see #16 below).
>
> I think this should be defined as "bN" where N is the number of bits,
> and the range should be 1 <= n <= 64. It is designed, by default, to
> contain an unsigned integer of that width. To use it in another form,
> cast it to another destination:
>
> b17 lbMyBitVar;
> s32 lnSigned32;
>
> lnSigned32 = (s32)lbMyBitVar;
> ...
> lbMyBitVar = (b17)lnSigned32;
>
> In each of these casts, whatever raw bit sequence exists is
> transferred to the destination. No sign extensions or interpolation.

OK, this shows you haven't properly thought these through... (...if you
can make a significant change even before someone can respond to the
original proposals!)

So, what is the difference between b17 and u17? And what exactly does it
mean to have standalone variables 17 bits wide? Is it to impose the
range of values 0 to 131071? Which would be a rather odd range. And what
happens on overflow: should it wrap (as happens in C with unsigned), or
be undefined, or trap?

--
Bartc

Rick C. Hodgin

Jun 30, 2015, 8:13:15 AM
On Tuesday, June 30, 2015 at 8:02:16 AM UTC-4, Bart wrote:
> On 30/06/2015 12:27, Rick C. Hodgin wrote:
> > #11 A new "bits" variable is introduce which is n-bits, where:
> > 1 <= n <= 7. Defined as "bits:2" for a 2-bit width, range-
> > checked on use with inquiries being thrown (see #16 below).
>
> Anywhere, or just inside structs or as array elements? But the same
> objection as before. What about bits:12 and bits:20 (which tidily fit
> into a 32-bit type)?

When defined separately, they are bit-merged into local or global
memory space, and then bit-masked and bit-shifted for use.

> What is the range of bits:2? Is it 0 to 3, or can it be anything (-2 to
> +1 for example)? Will bits:N ever cross a byte-boundary? (In which case,
> you might want bits:8 starting halfway through a byte.)

All bit types are unsigned integers.

> Range-checking on assignment is going to slow down any operations, as
> presumably this encompasses all possible overflows on arithmetic as well
> as when assigning to and from a bits:N from a different type.

Yes. They are used to migrate the need to range check from developer
code to compiler code. If you don't want range checking, do it
manually in developer code.

> How many bytes and/or bits will a [5] array of bits:7 take up?

When defined as an array, they are bit-packed.

> Just saying bit types and bitfields take a lot of support.

Yup.

> > #12 Two new arbitrary-size variable types are introduced: bi, bfp
> > (bi=big integer, bfp=big floating point)
>
> Not exactly part of a core language. You would expect a core language to
> implement features like this.

Would or wouldn't expect a core language to implement a feature like
this? Core C defines it.

> > #13 A new weakly typed variable type is introduced: var
>
> And this is completely misplaced. I assume this would be dynamically
> typed? Then it would make nonsense of whatever type system you're trying
> to impose.

It is based on usage:

var x;

x = 5; // assumed s32
x = "rick"; // assumed s8[]

It is not a literal type, but points to a simple structure, which
then conveys the type within. It must be converted or translated
to fundamental types, and there are new features which allow for
a var type to be tested to see what type it is.

> Example, in A=&B; you would expect that if B is of type T, then A ought
> to be of type T*. But both could be of 'var' type with nothing amiss;
> anything goes. And you need to make a distinction between the static
> 'var' type of something, and the dynamic type it contains. Would a var*
> type be allowed? Would var types be passed by value or by reference? Etc.

If B is var, A will point to the base of the var structure.

> Another can of worms inappropriate in a 'core' language.
>
> > #15 The concept of edit-and-continue compilation is mandatory.
>
> And this is where you're going to lose everyone who's made it this far.
> This is nothing to do with languages or compilers as they are normally
> understood.

No problem. I would say it's a division: I'm talking about dividing
things here into the before and after camps.

Rick C. Hodgin

Jun 30, 2015, 8:15:13 AM
u17 is not defined in Core C; only u8, u16, u32 and u64 are. If you
go into extensions of Core C which support a type of u17, then there
would be no difference between b17 and u17.

Typically b17 is needed as a bit-packed value, and u32 would be
used to manipulate it, and then it's stored back as b17.

In RDC I have defined a set of auto-expanding variables which
do that. They are always their fundamental type, but they upsize
in calculations to another size for normal use.

Bartc

Jun 30, 2015, 8:58:34 AM
On 30/06/2015 13:13, Rick C. Hodgin wrote:
> On Tuesday, June 30, 2015 at 8:02:16 AM UTC-4, Bart wrote:

>> What is the range of bits:2? Is it 0 to 3, or can it be anything (-2 to
>> +1 for example)?

> All bit types are unsigned integers.

Suppose you want a signed 7-bit value?

>> How many bytes and/or bits will a [5] array of bits:7 take up?
>
> When defined as an array, they are bit-packed.

>> Just saying bit types and bitfields take a lot of support.

So an element of a 7-bit array could start at any offset into a byte,
and could cross a byte boundary (so most accesses will need to be 16-bit
ones). Also, a 7-bit array passed to a function, if you don't start from
element [0] but choose the slice starting at [3] for example, might not
be byte-aligned on the first element.

What about a 2D array of 7-bit values? It gets hairy!

(Which is why I limited bit-types in one language to 1, 2 or 4 bits, as
the sequence tidily continues 8, 16, 32, 64, and then to 128 bits and up.)

You don't find integer types of 43 bits for example. But the same
language is happy to provide bit operations, such as extracting a 23-bit
field from a 32- or 64-bit int.

For accessing fields that cross boundaries, there is a bitfield pointer
type that is supposed to be capable of that: accessing fields of 1-64 bits
at any byte address and starting at any bit offset. But you then get
into questions of bit-order and byte-order. Beyond that, there is a
separate bit-slice pointer where you can have a bit (or bit:2 or bit:4)
sequence of any length.

If it gets complicated enough that you have to think twice about what
things mean, then it might be too advanced for a core language.

I would advise limiting bit ops to 1, 2 or 4 bits, or even 1 bit, and
limiting them to use as array elements. (C already has bitfields for
structs, with limited capabilities.) And providing bit/bitfield
operations instead.

>>> #12 Two new arbitrary-size variable types are introduced: bi, bfp
>>> (bi=big integer, bfp=big floating point)
>>
>> Not exactly part of a core language. You would expect a core language to
>> implement features like this.
>
> Would or wouldn't expect a core language to implement a feature like
> this? Core C defines it.

No, I mean a core language would be used to implement them, in a language
or extension beyond the core language. How many times are big integers
needed anyway? They are rather specialised. Better is built-in support
for 128- and 256-bit types.

>>> #13 A new weakly typed variable type is introduced: var
>>
>> And this is completely misplaced. I assume this would be dynamically
>> typed? Then it would make nonsense of whatever type system you're trying
>> to impose.
>
> It is based on usage:
>
> var x;
>
> x = 5; // assumed s32
> x = "rick"; // assumed s8[]

So characters are signed; why?

>> Example, in A=&B; you would expect that if B is of type T, then A ought
>> to be of type T*. But both could be of 'var' type with nothing amiss;
>> anything goes. And you need to make a distinction between the static
>> 'var' type of something, and the dynamic type it contains. Would a var*
>> type be allowed? Would var types be passed by value or by reference? Etc.
>
> If B is var, A will point to the base of the var structure.

And the type of A will be what? (apart from 'var'). Don't forget the
compiler may not know what type it contains. Try this:

var A, B;

B = 632;
A = &B;

will A contain a var* type or a s32* type?

I think people programming at this level will have trouble dealing with
code such as:

var A;
A = &A; // or:
A = *A; // or:
A = A->month; // or:
A = A.month; // or:
A = strlen(A);

with all equally valid as far as the compiler is concerned.

--
Bartc

Ben Bacarisse

Jun 30, 2015, 9:03:11 AM
"Rick C. Hodgin" <rick.c...@gmail.com> writes:

> We can define Core C here if you'd like. :-) I appoint myself chair,
> and the rest of us a "committee of the whole" to address the issue.
>
> #1 Little endian.
> #2 Two's complement.
> #3 All pointers are the same size per target.
> #4 All memory is executable.
> #5 All memory is read/write unless explicitly flagged as read-only.
> #6 No default support for char, short, int, long, long long, etc.,
> but these keywords are reserved for use with typedef or #define
> statements to map to the appropriate target for the machine.
> #7 Signed variable types: s8, s16, s32, s64
> #8 Unsigned variable types: u8, u16, u32, u64
> #9 Floating point types all IEEE 754-2008 compliant: f32, f64

This is, basically, C on an Intel chip. My guess is that all the other
parts of the C standard that you've left undefined or implementation
defined are just parts you don't yet know about. Would it be fair to
assume that everywhere the C standard does not mandate the behaviour you
are used to, you will just go on adding #3a, #3b, #7a etc. until it's
all pinned down to match your parochial expectations?
So "core C" is a new language, surprising like your pet language -- the
one you've not yet been able to specify or implement? That not a good
name for something the much bigger than C. I would wholly support your
plan if you picked another name for it like, like... I don't
know... RDC, perhaps?

<snip>
--
Ben.

Rick C. Hodgin

Jun 30, 2015, 9:11:40 AM
On Tuesday, June 30, 2015 at 8:58:34 AM UTC-4, Bart wrote:
> On 30/06/2015 13:13, Rick C. Hodgin wrote:
> > On Tuesday, June 30, 2015 at 8:02:16 AM UTC-4, Bart wrote:
>
> >> What is the range of bits:2? Is it 0 to 3, or can it be anything (-2 to
> >> +1 for example)?
>
> > All bit types are unsigned integers.
>
> Suppose you want a signed 7-bit value?

Two's complement. Use an s8 value and store as b7 when done.

> >> How many bytes and/or bits will a [5] array of bits:7 take up?
> > When defined as an array, they are bit-packed.
> >> Just saying bit types and bitfields take a lot of support.
>
> So an element of a 7-bit array could start at any offset into a byte,
> and could cross a byte boundary (so most accesses will need to be 16-bit
> ones). Also, a 7-bit array passed to a function, if you don't start from
> element [0] but choose the slice starting at [3] for example, might not
> be byte-aligned on the first element.

An array of bit variables cannot be passed except by pointer to
element 0.

> What about a 2D array of 7-bit values? It get's hairy!

The 2nd dimension will append to the first. The third onto
that, etc. It's just math.

> (Which is why I limited bit-types in one language to 1, 2 or 4 bits; as
> the sequence tidily continues 8, 16, 32, 64 (and then to 128 bits and up).
>
> You don't find integer types of 43 bits for example. But the same
> language is happy to provide bit operations, such as extracting a 23-bit
> field from a 32- or 64-bit int.
>
> For accessing fields that cross boundaries, there is a bitfield pointer

Bitfield pointers do not exist in Core C. Bits are designed to be
accessed as values, stored back out as values. If you want to have
a particular group of bits being manipulated in some way, put them
in a struct and pass a pointer to the struct around and access them
there.

> type that is supposed to be capable of that: accessing fields 1-64 bits
> at any byte address and starting at any bit offset. But you then get
> into questions of bit-order and byte-order. Beyond that, there is a
> separate bit-slice pointer where you can have a bit (or bit:2 or bit:4)
> sequence of any length.
>
> If it gets complicated enough that you have to think twice about what
> things mean, then it might be too advanced for a core language.
>
> I would advise limiting bit ops to 1, 2 or 4 bits, or even 1 bit, and
> limiting them to use as array elements. (C already has bitfields for
> structs, with limited capabilities.) And providing bit/bitfield
> operations instead.
>
> >>> #12 Two new arbitrary-size variable types are introduced: bi, bfp
> >>> (bi=big integer, bfp=big floating point)
> >>
> >> Not exactly part of a core language. You would expect a core language to
> >> implement features like this.
> >
> > Would or wouldn't expect a core language to implement a feature like
> > this? Core C defines it.
>
> No, I mean a core language would be used to implement them in a language
> or extension beyond the core language. How many times are big integers
> needed anyway? There are rather specialised. Better is built-in support
> for 128- and 256-bits.

They will be defined as part of Core C, as there are many mature
libraries out there which support them. The expense is minimal, and
they are another tool in the arsenal.

> >>> #13 A new weakly typed variable type is introduced: var
> >>
> >> And this is completely misplaced. I assume this would be dynamically
> >> typed? Then it would make nonsense of whatever type system you're trying
> >> to impose.
> >
> > It is based on usage:
> >
> > var x;
> >
> > x = 5; // assumed s32
> > x = "rick"; // assumed s8[]
>
> So characters are signed; why?
>
> >> Example, in A=&B; you would expect that if B is of type T, then A ought
> >> to be of type T*. But both could be of 'var' type with nothing amiss;
> >> anything goes. And you need to make a distinction between the static
> >> 'var' type of something, and the dynamic type it contains. Would a var*
> >> type be allowed? Would var types be passed by value or by reference? Etc.
> >
> > If B is var, A will point to the base of the var structure.
>
> And the type of A will be what? (apart from 'var'). Don't forget the
> compiler may not know what type it contains. Try this:
>
> var A, B;
>
> B = 632;
> A = &B;
>
> will A contain a var* type or a s32* type?

A would be a var, whose type is var*, which points to B. If you
were to reference A after that, you would be referencing and
manipulating B's s32 value.

struct SVar
{
    s32 type;
    union {
        // Whole bunch of values from s8..s64, u8..u64, f32, f64,
        // as well as pointers to fundamental types.
        // For s8[] data, it would have an SDatum struct, which
        // holds a pointer and length.
    };
};

> I think people programming at this level will have trouble dealing with
> code such as:
>
> var A;
> A = &A; // or:
> A = *A; // or:
> A = A->month; // or:
> A = A.month; // or:
> A = strlen(A);
>
> with all equally valid as far as the compiler is concerned.

Using a weakly typed variable comes at a performance expense, but it
can also be fantastic for certain data manipulations. I think it
will be good.

Rick C. Hodgin

Jun 30, 2015, 9:16:35 AM
I think you're quite solidly in the "before" camp, Ben. So, I'll just
leave that there. :-)

Ben Bacarisse

Jun 30, 2015, 9:17:20 AM
"Rick C. Hodgin" <rick.c...@gmail.com> writes:

> On Tuesday, June 30, 2015 at 8:02:16 AM UTC-4, Bart wrote:
>> On 30/06/2015 12:27, Rick C. Hodgin wrote:
<snip>
>> > #12 Two new arbitrary-size variable types are introduced: bi, bfp
>> > (bi=big integer, bfp=big floating point)
>>
>> Not exactly part of a core language. You would expect a core language to
>> implement features like this.
>
> Would or wouldn't expect a core language to implement a feature like
> this? Core C defines it.

I think BartC means that a truly "core" language -- one that's widely
implemented and can be used to write fast, portable code (a bit like C,
for example) -- would be used to provide such features for languages like
RDC (and Haskell and Python and so on).

Interesting that you don't have a br type (big rational), and whilst I
find the topic of bfp very interesting, I am guessing you don't know
what you mean by it yet.

<snip>
--
Ben.

Ben Bacarisse

Jun 30, 2015, 9:19:59 AM
"Rick C. Hodgin" <rick.c...@gmail.com> writes:

> The thing we're creating is called "Core C" and the body responsible
> for it will be the "Core C Group".

I think you misspoke. You are trying to define your language RDC, and
since you don't know much about programming language design you would
like comp.lang.c to help out.
<snip>
--
Ben.

Rick C. Hodgin

Jun 30, 2015, 9:25:47 AM
I get that you don't like me, Ben. And that's fine. Hate away all
you want to. Be insulting and mean. It will bring you nothing but
pain and torment though, and I advise against it.

I have come here to work on Core C. It is different than RDC.
RDC doesn't define a virtual machine. RDC has ports and other
data types and abilities which are not defined here, such as
line-if statements, and full-if statements.

I am posting here to get help with Core C. If there's no interest
it will die quickly. If there is interest, then it will move on
its own -- your non-help notwithstanding.

David Brown

Jun 30, 2015, 9:47:07 AM
On 30/06/15 13:27, Rick C. Hodgin wrote:
> We can define Core C here if you'd like. :-) I appoint myself chair,
> and the rest of us a "committee of the whole" to address the issue.

That would be the "royal we"? This is entirely /your/ thoughts on a
language. Others (like me) might comment, but don't pretend that any
one other than /you/ thinks this is a useful idea in practice.

However, it can always be fun to think about what would make up a
"perfect" C language, from an almost entirely subjective viewpoint.

>
> #1 Little endian.

There are far too many big endian systems around for this to be defined
as "core". I'd be happy to say it should support big endian and little
endian systems, but not mixed endian systems.

More usefully, I would like to see type/variable qualifiers for
specifying endianness, which would eliminate all issues that different
endianness causes today (e.g., when specifying a binary file format, you
would explicitly give the endianness of the fields).

The bit-endianness (order of bitfields) would also need to be specified.

> #2 Two's complement.

I'm OK with that.

> #3 All pointers are the same size per target.

I don't see the point of this restriction, and it would cause limitations
on real-world microcontrollers.

> #4 All memory is executable.

Nope - that's a terrible idea. C does not currently distinguish between
types of memory (code, data, bss and stack sections are all
implementation details, not part of the standards). But if you want to
make memory areas part of "Core C", then you definitely want
restrictions on executable parts.

And for many real-world microcontrollers, only some of the memory can be
executable. If "Core C" is to have a point, it must be suitable for a
range of "normal" systems, even if it excludes odd or outdated systems.

> #5 All memory is read/write unless explicitly flagged as read-only.

Again, that's a terrible idea. And it would not work on
microcontrollers, which have read-only memory regardless of what the
user might try to do with flagging.

> #6 No default support for char, short, int, long, long long, etc.,
> but these keywords are reserved for use with typedef or #define
> statements to map to the appropriate target for the machine.

That's going to upset a lot of people...

> #7 Signed variable types: s8, s16, s32, s64

No, these are called int8_t, int16_t, int32_t and int64_t. Don't change
these to non-standard names that could easily be mixed up, just to save
a couple of keystrokes.

> #8 Unsigned variable types: u8, u16, u32, u64

Again, these are uint8_t, uint16_t, uint32_t and uint64_t.

Note that some systems (in particular, some DSP's) have minimum 16-bit
or 32-bit accesses. I'm okay with these not being fully compatible with
"Core C", but you need to be aware of them.

> #9 Floating point types all IEEE 754-2008 compliant: f32, f64

That would be float32_t and float64_t.

> #10 A new "flag" variable type is introduced which is 1-bit,
> assigned with yes/true/up and no/false/down keywords.

That would be "bool", with "true" and "false" values. It would be fine
to make those keywords (just like in C++), and save #include'ing <stdbool.h>

> #11 A new "bits" variable is introduce which is n-bits, where:
> 1 <= n <= 7. Defined as "bits:2" for a 2-bit width, range-
> checked on use with inquiries being thrown (see #16 below).

That is not in the spirit of C - and certainly not in a "Core C".

> #12 Two new arbitrary size variable types are introduced: bi, bfp
> (bi=big integer, bfp=big floating point)

That is not in the spirit of C - and certainly not in a "Core C".

> #13 A new weakly typed variable type is introduced: var

That is not in the spirit of C - and certainly not in a "Core C".

> #14 The basic class is introduced.

That is not in the spirit of C - and certainly not in a "Core C".

If you want to invent a language that is half-way towards C++, and call
it "Core C+", that's fine - but it's a different language.

> #15 The concept of edit-and-continue compilation is mandatory.

That has no place in a language. Even though you like
edit-and-continue, you must surely see that!

> #16 A new exception framework is introduced based on inquiries
> rather than errors. These allow the machine to suspend for
> edit-and-continue guidance when an "error" condition arises,
> which will now be called an "inquiry" condition.

That is not in the spirit of C - and certainly not in a "Core C".

> #17 Flow control blocks are introduced.

C already has plenty of flow control keywords - it doesn't need any more.

> #18 Adhoc functions are introduced.

I'd be fine with local nested functions - but call them local functions
or nested functions, since that is what they are. "Adhoc" means
something completely different from the way you use it (look the word up
in a dictionary).

> #19 The #marker is introduced (for named return destinations, as in:
> returnto markerName).

Nope.

> #20 The cask is introduced allowing for arbitrary code injection,
> and calling protocols on logical tests to meta (true) and
> mefa (false), as well as message passing using mema.

Nope.

"casks" have no place in C of any sort, "core" or otherwise. They may
be part of RDC, but that's because RDC is a different sort of language.

> #21 A new self-documenting protocol is introduced, which allows
> the software to be documented in source code (similar to Java's).

doxygen already exists, and is very popular. You can't mandate
documentation as part of a programming language.

> #22 A new metadata set of tags is made available on a line-by-line
> basis, or at the function / struct level. These explicitly
> tag source code lines to be of a particular type, making a new
> type of search and/or replace possible (such as /* md: a, b, c */).

That makes almost no sense at all - and if it is nothing more than an
attempt to standardise comments, then forget it. Comments are entirely
at the discretion of the programmer - that's the nature of comments.

> #23 All standard libraries are available and #include'd by default.
> They must be explicitly excluded if they are not needed.

Nope, bad idea - polluting the namespace and making compilation far
slower and more complicated. I would have nothing against a bit of
restructuring, simplifying, and improving the standard library - but it
should not be included automatically.

> #24 All functions are linked to (1) a bare minimum for standard
> libraries, (2) all are included for developer code; unless
> otherwise directed by compiler/linker flags.
>

Terrible idea.

Bartc
Jun 30, 2015, 10:11:19 AM
On 30/06/2015 14:11, Rick C. Hodgin wrote:
> On Tuesday, June 30, 2015 at 8:58:34 AM UTC-4, Bart wrote:
>> On 30/06/2015 13:13, Rick C. Hodgin wrote:
>>> On Tuesday, June 30, 2015 at 8:02:16 AM UTC-4, Bart wrote:
>>
>>>> What is the range of bits:2? Is it 0 to 3, or can it be anything (-2 to
>>>> +1 for example)?
>>
>>> All bit types are unsigned integers.
>>
>> Suppose you want a signed 7-bit value?
>
> Two's complement. Use an s8 value and store as b7 when done.

Not elegant. You might as well go for ubit:N and sbit:N while you're at it.

>> So an element of a 7-bit array could start at any offset into a byte,
>> and could cross a byte boundary (so most accesses will need to be 16-bit
>> ones). Also, a 7-bit array passed to a function, if you don't start from
>> element [0] but choose the slice starting at [3] for example, might not
>> be byte-aligned on the first element.
>
> An array of bit variables cannot be passed except by pointer to
> element 0.
>
>> What about a 2D array of 7-bit values? It gets hairy!
>
> The 2nd dimension will append to the first. The third onto
> that, etc. It's just math.

Then some rows of the array won't be aligned to the start of a byte.
This can cause problems when passing one of those rows, even from
element[0] of that row, to a function if the mechanism assumes the
pointer will be aligned.

In fact, each row will start at a different bit position from most of
the other rows.

>> (Which is why I limited bit-types in one language to 1, 2 or 4 bits; as
>> the sequence tidily continues 8, 16, 32, 64 (and then to 128 bits and up).
>>
>> You don't find integer types of 43 bits for example. But the same
>> language is happy to provide bit operations, such as extracting a 23-bit
>> field from a 32- or 64-bit int.
>>
>> For accessing fields that cross boundaries, this is a bitfield pointer
>
> Bitfield pointers do not exist in Core C. Bits are designed to be
> accessed as values, stored back out as values. If you want to have
> a particular group of bits being manipulated in some way, put them
> in a struct and pass a pointer to the struct around and access them
> there.

But if you are going to have arrays of bits, then it's going to be hard
to manage without using bit pointers. Or are you going to pass a
1000000-element array of bit:3 types by value?

>> No, I mean a core language would be used to implement them in a language
>> or extension beyond the core language. How many times are big integers
>> needed anyway? There are rather specialised. Better is built-in support
>> for 128- and 256-bits.
>
> They will be defined as part of Core C as there are many well-mature
> libraries

That's part of the problem; which library will you choose to be
standard? The authors of the other libraries might be put out!

>>>> Example, in A=&B; you would expect that if B is of type T, then A ought
>>>> to be of type T*. But both could be of 'var' type with nothing amiss;
>>>> anything goes. And you need to make a distinction between the static
>>>> 'var' type of something, and the dynamic type it contains. Would a var*
>>>> type be allowed? Would var types be passed by value or by reference? Etc.
>>>
>>> If B is var, A will point to the base of the var structure.
>>
>> And the type of A will be what? (apart from 'var'). Don't forget the
>> compiler may not know what type it contains. Try this:
>>
>> var A, B;
>>
>> B = 632;
>> A = &B;
>>
>> will A contain a var* type or a s32* type?
>
> A would be a var, whose type is var*, which points to B. If you
> were to reference A after that, you would be referencing and
> manipulating B's s32 value.

That's not clear. What does *A give you? Would you need **A to get the
s32 value?

Suppose you had a function taking an s32* parameter, and you wanted to
pass B to it so that it could modify the s32 value inside B; what would
you pass? Would you need to insert run-time checks to ensure B contains
the right type? Could you pass either A or B to it (with A containing &B)?

> struct SVar
> {
> s32 type;
> union {
> // Whole bunch of values from s8..s64, u8..u64, f32, f64,
> // As well as pointers to fundamental types.
> // For s8[] data, it would have an SDatum struct, which
> // holds a pointer, and length.
> };
> };

Sorry, this doesn't look right. You are going to end up with a struct
which is likely to be 16 bytes (maybe 12 bytes if you put the values
first, unless it's padded anyway), to store a single u8 value; why?

If you want the particular semantics associated with u8 (so 255+1 gives
you 0 rather than 256), there must be better ways of doing that. (In C,
255+1 will give 256 even using uint8_t unless casts and such are used.)

Inside variants you want a simpler set of numeric types.

You might instead look at ways of having flexible strings and arrays,
which could be a better fit into a core language which is a little
higher level than C.

> Using a weakly typed variable comes at performance expense, but it
> can also be fantastic for certain data manipulations. I think it
> will be good.

If performance doesn't matter then people will just use a more apt
language for that. Mixing static and dynamic types is difficult.

--
Bartc

Ben Bacarisse
Jun 30, 2015, 10:48:33 AM
"Rick C. Hodgin" <rick.c...@gmail.com> writes:

> On Tuesday, June 30, 2015 at 9:19:59 AM UTC-4, Ben Bacarisse wrote:
>> "Rick C. Hodgin" <rick.c...@gmail.com> writes:
>>
>> > The thing we're creating is called "Core C" and the body responsible
>> > for it will be the "Core C Group".
>>
>> I think you misspoke. You are trying to define your language RDC and
>> since you don't know much about programming langage design you like
>> comp.lang.c to help out.
<snip>
>
> I get that you don't like me, Ben. And that' fine. Hate away all
> you want to.

I hate your ideas. They are bad ideas. The world does not need another
half-baked language from the 90s.

<snip>
--
Ben.

Rick C. Hodgin
Jun 30, 2015, 10:50:28 AM
On Tuesday, June 30, 2015 at 10:11:19 AM UTC-4, Bart wrote:
> On 30/06/2015 14:11, Rick C. Hodgin wrote:
> > On Tuesday, June 30, 2015 at 8:58:34 AM UTC-4, Bart wrote:
> >> On 30/06/2015 13:13, Rick C. Hodgin wrote:
> >>> On Tuesday, June 30, 2015 at 8:02:16 AM UTC-4, Bart wrote:
> >>
> >>>> What is the range of bits:2? Is it 0 to 3, or can it be anything (-2 to
> >>>> +1 for example)?
> >>
> >>> All bit types are unsigned integers.
> >>
> >> Suppose you want a signed 7-bit value?
> >
> > Two's complement. Use an s8 value and store as b7 when done.
>
> Not elegant. You might as well go for ubit:N and sbit:N while you're at it.

I agree. Alright. We won't limit it at all. We'll handle them as
structs, allowing for any type. The syntax will be b9:type if you
need it to be some specific type, like b9:s32. That will also help with
ranging, which could also be specified here using bN:type:rangemethod.

> >> So an element of a 7-bit array could start at any offset into a byte,
> >> and could cross a byte boundary (so most accesses will need to be 16-bit
> >> ones). Also, a 7-bit array passed to a function, if you don't start from
> >> element [0] but choose the slice starting at [3] for example, might not
> >> be byte-aligned on the first element.
> >
> > An array of bit variables cannot be passed except by pointer to
> > element 0.
> >
> >> What about a 2D array of 7-bit values? It gets hairy!
> >
> > The 2nd dimension will append to the first. The third onto
> > that, etc. It's just math.
>
> Then some rows of the array won't be aligned to the start of a byte.

Correct.

> This can cause problems when passing one of those rows, even from
> element[0] of that row, to a function if the mechanism assumes the
> pointer will be aligned.

You can't pass one of those rows. If you have a bit array, only the
entire thing can be passed, and you access it by referencing the
elements within.

> In fact, each row will start at a different bit position from most of
> the other rows.
>
> >> (Which is why I limited bit-types in one language to 1, 2 or 4 bits; as
> >> the sequence tidily continues 8, 16, 32, 64 (and then to 128 bits and up).
> >>
> >> You don't find integer types of 43 bits for example. But the same
> >> language is happy to provide bit operations, such as extracting a 23-bit
> >> field from a 32- or 64-bit int.
> >>
> >> For accessing fields that cross boundaries, this is a bitfield pointer
> >
> > Bitfield pointers do not exist in Core C. Bits are designed to be
> > accessed as values, stored back out as values. If you want to have
> > a particular group of bits being manipulated in some way, put them
> > in a struct and pass a pointer to the struct around and access them
> > there.
>
> But if you are going to have arrays of bits, then it's going to be hard
> to manage without using bit pointers. Or are you going to pass a
> 1000000-element array of bit:3 types by value?

You would pass a pointer to the start of the array, and an offset to
the element within.

> >> No, I mean a core language would be used to implement them in a language
> >> or extension beyond the core language. How many times are big integers
> >> needed anyway? There are rather specialised. Better is built-in support
> >> for 128- and 256-bits.
> >
> > They will be defined as part of Core C as there are many well-mature
> > libraries
>
> That's part of the problem; which library will you choose to be
> standard? The authors of the other libraries might be put out!

It would be defined. Discussed. Debated. And ultimately approved.
But I think if a function is known to C, and it's referenced, there
should not need to be an #include for it. It is known to the C libs
that are provided by Core C, or by the implementation extensions
to Core C, and they should automatically be known.

> >>>> Example, in A=&B; you would expect that if B is of type T, then A ought
> >>>> to be of type T*. But both could be of 'var' type with nothing amiss;
> >>>> anything goes. And you need to make a distinction between the static
> >>>> 'var' type of something, and the dynamic type it contains. Would a var*
> >>>> type be allowed? Would var types be passed by value or by reference? Etc.
> >>>
> >>> If B is var, A will point to the base of the var structure.
> >>
> >> And the type of A will be what? (apart from 'var'). Don't forget the
> >> compiler may not know what type it contains. Try this:
> >>
> >> var A, B;
> >>
> >> B = 632;
> >> A = &B;
> >>
> >> will A contain a var* type or a s32* type?
> >
> > A would be a var, whose type is var*, which points to B. If you
> > were to reference A after that, you would be referencing and
> > manipulating B's s32 value.
>
> That's not clear. What does *A give you? Would you need **A to get the
> s32 value?

When you refer to A, it would resolve for you automatically to the
value pointed to in B.

> Suppose you had a function taking an s32* parameter, and you wanted to
> pass B to it so that it could modify the s32 value inside B; what would
> you pass? Would you need to insert run-time checks to ensure B contains
> the right type? Could you pass either A or B to it (with A containing &B)?

I think I'll define a default data item for each var type to be
something like d, so you would reference A.d for the data item,
and in that way you could pass it as an s32* if B was holding an
s32.

> > struct SVar
> > {
> > s32 type;
> > union {
> > // Whole bunch of values from s8..s64, u8..u64, f32, f64,
> > // As well as pointers to fundamental types.
> > // For s8[] data, it would have an SDatum struct, which
> > // holds a pointer, and length.
> > };
> > };
>
> Sorry, this doesn't look right. You are going to end up with a struct
> which is likely to be 16 bytes (maybe 12 bytes if you put the values
> first, unless it's padded anyway), to store a single u8 value; why?

Because it's a universal type. The same A you use in one moment to
store a u8 can later be used to store a double, etc. It requires no
additional allocation, and is available to be used in this weakly
typed form.

> If you want the particular semantics associated with u8 (so 255+1 gives
> you 0 rather than 256), there must be better ways of doing that. (In C,
> 255+1 will give 256 even using uint8_t unless casts and such are used.)
>
> Inside variants you want a simpler set of numeric types.
>
> You might instead look at ways of having flexible strings and arrays,
> which could be a better fit into a core language which is a little
> higher level than C.

I agree, but C is low level in terms of its strings and arrays. I
like that. But, we could provide standard library extensions which
allow for those features. I would like them a lot.

> > Using a weakly typed variable comes at performance expense, but it
> > can also be fantastic for certain data manipulations. I think it
> > will be good.
>
> If performance doesn't matter then people will just use a more apt
> language for that. Mixing static and dynamic types is difficult.

They're another tool in the toolbox. Nobody has to use them. But,
when needed, they're there (so you don't need to go to a higher
level language).

I appreciate your thoughtful feedback, Bart.

Rick C. Hodgin
Jun 30, 2015, 10:52:31 AM
On Tuesday, June 30, 2015 at 10:48:33 AM UTC-4, Ben Bacarisse wrote:
> "Rick C. Hodgin" <rick.c...@gmail.com> writes:
>
> > On Tuesday, June 30, 2015 at 9:19:59 AM UTC-4, Ben Bacarisse wrote:
> >> "Rick C. Hodgin" <rick.c...@gmail.com> writes:
> >>
> >> > The thing we're creating is called "Core C" and the body responsible
> >> > for it will be the "Core C Group".
> >>
> >> I think you misspoke. You are trying to define your language RDC and
> >> since you don't know much about programming langage design you like
> >> comp.lang.c to help out.
> <snip>
> >
> > I get that you don't like me, Ben. And that' fine. Hate away all
> > you want to.
>
> I hate your ideas. They are bad ideas. The world does not need another
> half-baked language from the 90s.

Your voice has been heard. Thank you for your input. You have my
permission to not participate in these discussions and to never use
the software if you don't want to. No worries.

Rick C. Hodgin
Jun 30, 2015, 10:56:09 AM
On Tuesday, June 30, 2015 at 9:47:07 AM UTC-4, David Brown wrote:
> [snip]

I am considering your post. Give me some time.

Rosario19
Jun 30, 2015, 11:24:43 AM
On Tue, 30 Jun 2015 07:50:19 -0700 (PDT), "Rick C. Hodgin" wrote:
>On Tuesday, June 30, 2015 at 10:11:19 AM UTC-4, Bart wrote:
>> On 30/06/2015 14:11, Rick C. Hodgin wrote:
>> > On Tuesday, June 30, 2015 at 8:58:34 AM UTC-4, Bart wrote:
>> >> On 30/06/2015 13:13, Rick C. Hodgin wrote:
>> >>> On Tuesday, June 30, 2015 at 8:02:16 AM UTC-4, Bart wrote:
>> >>
>> >>>> What is the range of bits:2? Is it 0 to 3, or can it be anything (-2 to
>> >>>> +1 for example)?
>> >>
>> >>> All bit types are unsigned integers.
>> >>
>> >> Suppose you want a signed 7-bit value?

Such a basic type would not exist. If the problem is an array of
7-bit values, it is perhaps better to use an array of u32 with
accessor functions that make it appear as an array of 7-bit values.

>> > Two's complement. Use an s8 value and store as b7 when done.
>>
>> Not elegant. You might as well go for ubit:N and sbit:N while you're at it.
>
>I agree. Alright. We won't limit it at all. We'll handle them as
>structs, allowing for any type. The syntax will be b9:type if you
>need it to some specific type, like b9:s32. That will also help with
>ranging, which I could also be specified here using bN:type:rangemethod.
>> >> So an element of a 7-bit array could start at any offset into a byte,
>> >> and could cross a byte boundary (so most accesses will need to be 16-bit
>> >> ones). Also, a 7-bit array passed to a function, if you don't start from
>> >> element [0] but choose the slice starting at [3] for example, might not
>> >> be byte-aligned on the first element.
>> >
>> > An array of bit variables cannot be passed except by pointer to
>> > element 0.
>> >
>> >> What about a 2D array of 7-bit values? It gets hairy!

An N-D array of elements x bits wide is a matter of functions that
access a 1-D array of u32 elements.

Rick C. Hodgin
Jun 30, 2015, 11:28:12 AM
On Tuesday, June 30, 2015 at 9:47:07 AM UTC-4, David Brown wrote:
> On 30/06/15 13:27, Rick C. Hodgin wrote:
> > #1 Little endian.
>
> There are far too many big endian systems around for this to be defined
> as "core". I'd be happy to say it should support big endian and little
> endian systems, but not mixed endian systems.

Alright. Both are allowed. Little endian will be the default.

> More usefully, I would like to see type/variable qualifiers for
> specifying endianness, which would eliminate all issues that different
> endianness causes today (e.g., when specifying a binary file format, you
> would explicitly give the endianness of the fields).

I do like this. Added. What should the syntax be? I have been fond
of adding "dot tags" to things, so s32.le for little endian, and
s32.be for big endian. They also allow additional tags, such as
read-only being s32.le.ro. And of course these can be typedef'd
or #define'd.

> The bit-endianness (order of bitfields) would also need to be
> specified.

Yes.

> > #2 Two's complement.
> I'm OK with that.
>
> > #3 All pointers are the same size per target.
> I don't see the point of this restriction, and it would cause
> limitation on real-world microcontrollers.

No limitations. They will be included in extension allowances to
Core C. But, they are not part of Core C.

> > #4 All memory is executable.
>
> Nope - that's a terrible idea. C does not currently distinguish between
> types of memory (code, data, bss and stack sections are all
> implementation details, not part of the standards). But if you want to
> make memory areas part of "Core C", then you definitely want
> restrictions on executable parts.
>
> And for many real-world microcontrollers, only some of the memory can be
> executable. If "Core C" is to have a point, it must be suitable for a
> range of "normal" systems, even if it excludes odd or outdated systems.

Only some memory on any system can be executable. However, in the
Core C virtual machine, all memory will be executable. And by
default, the Core C language would allow branching to any address.
It would be up to the OS to catch the error and shut down the app.

> > #5 All memory is read/write unless explicitly flagged as read-only.
>
> Again, that's a terrible idea. And it would not work on
> microcontrollers, which have read-only memory regardless of
> what the user might try to do with flagging.

I disagree. Memory can be flagged as read-only. We can add a
block like:

readonly {
// Everything defined in here is in read-only memory
};

I don't anticipate this language being used for microcontrollers,
by the way. I anticipate it being used by people on desktop
machines, mobile devices, and other personal devices.

> > #6 No default support for char, short, int, long, long long, etc.,
> > but these keywords are reserved for use with typedef or #define
> > statements to map to the appropriate target for the machine.
>
> That's going to upset a lot of people...

I've had the idea of including a #pragma or compiler switch which,
for each version of Core C, maps those names to their appropriate
values on that version. But, by default we need to move away from
those generic concepts and get into specifics.

> > #7 Signed variable types: s8, s16, s32, s64
> No, these are called int8_t, int16_t, int32_t and int64_t. Don't change
> these to non-standard names that could easily be mixed up, just to save
> a couple of keystrokes.

Disagree. Names like int8_t and uint8_t are silly. They can
be shortened, and I'm writing a language for the future, not
the past.

> > #8 Unsigned variable types: u8, u16, u32, u64
> Again, these are uint8_t, uint16_t, uint32_t and uint64_t.

Ibid.

> Note that some systems (in particular, some DSP's) have minimum 16-bit
> or 32-bit accesses. I'm okay with these not being fully compatible with
> "Core C", but you need to be aware of them.
>
> > #9 Floating point types all IEEE 754-2008 compliant: f32, f64
> That would be float32_t and float64_t.

Ibid.

> > #10 A new "flag" variable type is introduced which is 1-bit,
> > assigned with yes/true/up and no/false/down keywords.
>
> That would be "bool", with "true" and "false" values. It would be fine
> to make those keywords (just like in C++), and save #include'ing <stdbool.h>

bool is 1-byte as I understand it. flag variables are bits, packed
into scoped local or global space.

> > #11 A new "bits" variable is introduce which is n-bits, where:
> > 1 <= n <= 7. Defined as "bits:2" for a 2-bit width, range-
> > checked on use with inquiries being thrown (see #16 below).
> That is not in the spirit of C - and certainly not in a "Core C".

I've redefined it in subsequent messages. It's now bN, such as b17.
And you can specify a type to be b17:s32, which means treat the
value in those 17 bits as an s32 for calculations. It will also
allow for simple signed and unsigned values using b17:s, b17:u.

> > #12 Two new arbitrary size variable types are introduced: bi, bfp
> > (bi=big integer, bfp=big floating point)
> That is not in the spirit of C - and certainly not in a "Core C".

It is part of Core C.

> > #13 A new weakly typed variable type is introduced: var
> That is not in the spirit of C - and certainly not in a "Core C".

It is part of Core C.

> > #14 The basic class is introduced.
> That is not in the spirit of C - and certainly not in a "Core C".

It is part of Core C.

> If you want to invent a language that is half-way towards C++, and call
> it "Core C+", that's fine - but it's a different language.

C needs the class. It is a minor addition with major scope. We
already have everything we need provided for in structs. It just
allows encapsulation and the nicer syntax usage.

It is part of Core C.

> > #15 The concept of edit-and-continue compilation is mandatory.
> That has no place in a language. Even though you like
> edit-and-continue, you must surely see that!

Disagree completely. And you even said previously that you could
see how it could have some use. For others it is exceedingly useful.
Apple even thought enough of it to add it to versions of GCC,
calling it "fix-and-continue." It's not a trivial thing. It's
major. And...

It's part of Core C.

> > #16 A new exception framework is introduced based on inquiries
> > rather than errors. These allow the machine to suspend for
> > edit-and-continue guidance when an "error" condition arises,
> > which will now be called an "inquiry" condition.
> That is not in the spirit of C - and certainly not in a "Core C".

It is part of Core C.

> > #17 Flow control blocks are introduced.
> C already has plenty of flow control keywords - it doesn't need any more.

It is part of Core C.

> > #18 Adhoc functions are introduced.
>
> I'd be fine with local nested functions - but call them local functions
> or nested functions, since that is what they are. "Adhoc" means
> something completely different from the way you use it (look the word up
> in a dictionary).

Adhocs relate to casks, which are now part of Core C. As such,
adhocs are part of Core C.

> > #19 The #marker is introduced (for named return destinations, as in:
> > returnto markerName).
> Nope.

It is part of Core C.

> > #20 The cask is introduced allowing for arbitrary code injection,
> > and calling protocols on logical tests to meta (true) and
> > mefa (false), as well as message passing using mema.
> Nope.

It is part of Core C.

> "casks" have no place in C of any sort, "core" or otherwise. They may
> be part of RDC, but that's because RDC is a different sort of language.

It is part of Core C.

> > #21 A new self-documenting protocol is introduced, which allows
> > the software to be documented in source code (similar to Java's).
> doxygen already exists, and is very popular. You can't mandate
> documentation as part of a programming language.

Mandated in Core C. We are looking to the future, not the past.
I may even use the doxygen code then as an import into Core C, but
it will be an explicitly defined and integral part.

> > #22 A new metadata set of tags is made available on a line-by-line
> > basis, or at the function / struct level. These explicitly
> > tag source code lines to be of a particular type, making a new
> > type of search and/or replace possible (such as /* md: a, b, c */).
>
> That makes almost no sense at all - and if it is nothing more than an
> attempt to standardise comments, then forget it. Comments are entirely
> at the discretion of the programmer - that's the nature of comments.

It will be parsed and known by the compiler, debugger, and the IDE.
It will provide utility which today has no specific form other than
text-based searches. Adding metadata abilities means sorting and
finding things far more easily.

> > #23 All standard libraries are available and #include'd by default.
> > They must be explicitly excluded if they are not needed.
> Nope, bad idea - polluting the namespace and making compilation far
> slower and more complicated. I would have nothing against a bit of
> restructuring, simplifying, and improving the standard library - but it
> should not be included automatically.

Disagree. I'll add a #pragma at the top which allows for some kind
of "undefine all standard #includes" so it can be done away with,
but I can't tell you how many hours over the years I've spent trying
to figure out which #include file I needed for some function I was
using. It's probably added up to a week or more over time.

I would spare all future users from that. And the experienced ones
will have no problems adding that one additional line to #undef
everything so they can manually define them one-by-one.

> > #24 All functions are linked to (1) a bare minimum for standard
> > libraries, (2) all are included for developer code; unless
> > otherwise directed by compiler/linker flags.
> Terrible idea.

Disagree. It is part of Core C. Compiler/linker flags will allow
for other settings.

Rosario19
Jun 30, 2015, 11:31:47 AM
On 30 Jun 2015 07:54:13 GMT,(Jens Thoms Toerring) wrote:
>Rosario19 <R...@invalid.invalid> wrote:
>> "core C" already exist
>
>Yes?
>
>> it is the use of
>> "intxx_t" type that would make all operation +-/* all the same for
>> differents machine/compiler
>
>There's one drawback with the intN_t types: they're optional.
>Thus this only will work on systems were these typedefs
>actually exist. Systems with CHAR_BIT larger than 8 will
>have real problems supporting all or at least some of them.

On such a system it would just be a little slower.

>And the behaviour on overflow is still not defined for
>the intN_t types, so the result of a simple addition that
>overflows can be different on different systems.

Overflow would be: mod_n, without UB, plus something to see whether
the last operation overflowed.

>> it is one float of fixed size
>
>And which is that? There's, as far as I am aware, float,
>double and long double. But none of them have a fixed size.

double (a 64-bit float) and float (a 32-bit float)

>> it is some macro value for aligned
>
>> it is a pointer linear of 32 bit
>> and a pointer linear of 64 bit
>
>No idea what you're talking about.

>> end
>
>> endianes of cpu for me has no rilevance
>> the only place that it count something is when integers are sent
>> code-decode
>
>Endianess is very relevant whenever you exchange data with
>a different system, either over the network or when reading
>a file thatwas created on a different system. And it not
>only concerns integers but also floating point numbers.

I have only used it through OS functions when sending over the
internet; I have never seen another use.

>> endianes not count for operators
>> 0x2 & 0x7
>> would be the same number for all computer whaterver endianess they
>> have, or am i wrong?
>
>Yes, but that requires that you have the data already read
>in correctly into memory.

yes

> Regards, Jens

Rosario19

Jun 30, 2015, 12:15:53 PM
to
On Tue, 30 Jun 2015 17:31:41 +0200, Rosario19 wrote:
>On 30 Jun 2015 07:54:13 GMT, (Jens Thoms Toerring) wrote:
>>Rosario19 <R...@invalid.invalid> wrote:
>>> "core C" already exist
>>
>>Yes?
>>
>>> it is the use of
>>> "intxx_t" type that would make all operation +-/* all the same for
>>> differents machine/compiler
>>
>>There's one drawback with the intN_t types: they're optional.
>>Thus this only will work on systems were these typedefs
>>actually exist. Systems with CHAR_BIT larger than 8 will
>>have real problems supporting all or at least some of them.
>
>in that sys would be a little more slow
>
>>And the behaviour on overflow is still not defined for
>>the intN_t types, so the result of a simple addition that
>>overflows can be different on different systems.
>
>overflow would be: as mod_n wighout UBs and something for see if there
>is overflow for the last operation

possibly something more revolutionary...
u32 has 0xFFFFFFFF as its error element

so if "u32 n1, n2;"

n1 b 0xFFFFFFFF = n2 b 0xFFFFFFFF = 0xFFFFFFFF, the error element
n1 b n2 = 0xFFFFFFFF if the operation b overflows

where b is a binary operator

any operator applied to 0xFFFFFFFF = 0xFFFFFFFF

so no UB, and overflow is detected in the result even across long
chains of operations

a+b+c*f=m

if m==0xFFFFFFFF

there was an error or overflow somewhere

the same for signed integers, where the error element is 0x80000000;
the same for double or float, where the error element would be
DBL_MAX and FLT_MAX

but i don't know if all that is ok

there is a problem with big numbers stored in arrays of u32 [or u8 or
u16]: c=a*b has to wrap mod 2^32 to build the big number [using the
overflow of that operation too], and the scheme just above does not
allow that

so in the end i prefer everything mod 2^N, plus something to detect
overflow:

a=b+c; v=ov();

where ov() reports the overflow of the last statement


Bartc

Jun 30, 2015, 12:45:08 PM
to
On 30/06/2015 15:50, Rick C. Hodgin wrote:
> On Tuesday, June 30, 2015 at 10:11:19 AM UTC-4, Bart wrote:

>>> Two's complement. Use an s8 value and store as b7 when done.
>>
>> Not elegant. You might as well go for ubit:N and sbit:N while you're at it.
>
> I agree. Alright. We won't limit it at all. We'll handle them as
> structs, allowing for any type. The syntax will be b9:type if you
> need it to some specific type, like b9:s32. That will also help with
> ranging, which I could also be specified here using bN:type:rangemethod.

You've already got the syntax sN and uN for signed/unsigned ints of N
bits, and now you're mixing in the syntax bN for signed/unsigned values
of N bits!

Either combine the two, so using s9 for your example, or keep them
separate as sbits:9 or sbits9.

>> This can cause problems when passing one of those rows, even from
>> element[0] of that row, to a function if the mechanism assumes the
>> pointer will be aligned.
>
> You can't pass one of those rows. If you have a bit array, only the
> entire thing can be passed, and you access it by referencing the
> elements within.

Suppose someone creates a 2D array by having an array of bit arrays (as
is done in C anyway). Are you saying that you are now restricted in what
you can do with a row of that array-of-bit-arrays?

>> But if you are going to have arrays of bits, then it's going to be hard
>> to manage without using bit pointers. Or are you going to pass a
>> 1000000-element array of bit:3 types by value?
>
> You would pass a pointer to the start of the array, and an offset to
> the element within.

Presumably this (pointer,offset) combo is the same size overall as the
other pointers in the language? Or would you make an exception, or not
call it a pointer, or would require two values to be passed?

>> Sorry, this doesn't look right. You are going to end up with a struct
>> which is likely to be 16 bytes (maybe 12 bytes if you put the values
>> first, unless it's padded anyway), to store a single u8 value; why?
>
> Because it's a universal type. The same A you use in one moment to
> store a u8 can later be used to store a double, etc. It requires no
> additional allocation, and is available to be used in this weakly
> typed form.

But you gain nothing from storing a value as an 8-bit int instead of 32
or 64 bits.

And since this has now all got to be sorted out at runtime, it means
that to execute:

A+B

when each operand might be s8 s16 s32 s64 u8 u16 u32 u64 f32 f64 (to say
nothing of big ints and big floats), that means up to 100 different
combinations to consider. Per operator. Plus all the illegal
combinations to be trapped (an array plus a struct, for example).

Small types such as 8-bit values are only of use as storage types, to
save space when you have an array of them, or to craft a particular
shape of struct, or for forming strings. They are of little use as
standalone variables, let alone inside a variant type.

--
Bartc

Rick C. Hodgin

Jun 30, 2015, 1:04:09 PM
to
On Tuesday, June 30, 2015 at 12:45:08 PM UTC-4, Bart wrote:
> On 30/06/2015 15:50, Rick C. Hodgin wrote:
> > On Tuesday, June 30, 2015 at 10:11:19 AM UTC-4, Bart wrote:
>
> >>> Two's complement. Use an s8 value and store as b7 when done.
> >>
> >> Not elegant. You might as well go for ubit:N and sbit:N while you're at it.
> >
> > I agree. Alright. We won't limit it at all. We'll handle them as
> > structs, allowing for any type. The syntax will be b9:type if you
> > need it to some specific type, like b9:s32. That will also help with
> > ranging, which I could also be specified here using bN:type:rangemethod.
>
> You've already got syntax sN and uN for signed/unsigned ints of N bits,
> and mixing it up with the syntax bN for signed/unsigned of N bits!
>
> Either combine the two, so using s9 for your example, or keep them
> separate as sbits:9 or sbits9.

There's a problem with using sN or uN: they don't relate to another
type. They need to be identified as the wieldable type used in
calculations, only being transformed back into their target storage
type when stored back.

It's like the bit structs:

struct SWhatever
{
    char field1 : 5;
    int  field2 : 20;
};

In Core C it would be:

b5:s8 field1;
b20:s32 field2;

And we could set it up so it's auto-upsizing to the appropriate value.
In that case it would only be:

b5:s field1; // Auto-upsized in calculations to s8
b20:s field2; // Auto-upsized in calculations to s32

> >> This can cause problems when passing one of those rows, even from
> >> element[0] of that row, to a function if the mechanism assumes the
> >> pointer will be aligned.
> >
> > You can't pass one of those rows. If you have a bit array, only the
> > entire thing can be passed, and you access it by referencing the
> > elements within.
>
> Suppose someone creates a 2D array by having an array of bit arrays (as
> is done in C anyway). Are you saying that you are now restricted in what
> you can do with a row of that array-of-bit-arrays?

You can do whatever you want to the data by casting it to another form.
By going through the bit array protocols, you will need a pointer to
the bits, and an offset, to either read or write content at that
element.

b5 array[200];

// Manual population
array[5] = 9;

// Or:
set_array(array, 5, 9);

void set_array(b5 array[], int element, int value)
{
    array[element] = value;
}

> >> But if you are going to have arrays of bits, then it's going to be hard
> >> to manage without using bit pointers. Or are you going to pass a
> >> 1000000-element array of bit:3 types by value?
> >
> > You would pass a pointer to the start of the array, and an offset to
> > the element within.
>
> Presumably this (pointer,offset) combo is the same size overall as the
> other pointers in the language? Or would you make an exception, or not
> call it a pointer, or would require two values to be passed?

Same size. And the array passed would be more like b5& than b5* so
there would be no null checking. And we can add bounds settings to
it so there is a scope that must be adhered to with element
references.

> >> Sorry, this doesn't look right. You are going to end up with a struct
> >> which is likely to be 16 bytes (maybe 12 bytes if you put the values
> >> first, unless it's padded anyway), to store a single u8 value; why?
> >
> > Because it's a universal type. The same A you use in one moment to
> > store a u8 can later be used to store a double, etc. It requires no
> > additional allocation, and is available to be used in this weakly
> > typed form.
>
> But you gain nothing from storing an 8-bit value int instead of 32 or 64
> bits.

Except that when you go to pass A to some function, it is converted to
"char" (s8) rather than "int" (s32) or "long long" (s64).

It has proper context for follow-on use based on how it was stored.

> And since this has now all got to be sorted out at runtime, it means
> that to execute:
>
> A+B
>
> when each operand might be s8 s16 s32 s64 u8 u16 u32 u64 f32 f64 (to say
> nothing of big ints and bit floatS), that means up to 100 different
> combinations to consider. Per operator. Plus all the illegal
> combinations to be trapped (array plus a struct for example).

Yes. There will be code written to support every combination. I do
this presently in Visual FreePro, where all of the types are weakly
typed. It is the only variable type that is supported.

> Small types such as 8-bit values are only of use as storage types, to
> save space when you have an array of them, or to craft a particular
> shape of struct, or for forming strings. They are of little use as
> standalone variables, let alone inside a variant type.

You don't have to use it, Bart. It is a tool that has validity and
can be useful. In XBASE languages there is a function called TYPE()
used for this:

if (type(A) == _S8) { ... }

And we can add these:

if (isNumeric(A))
if (isFloatingPoint(A))
if (isInteger(A))

Et cetera.

Rick C. Hodgin

Jun 30, 2015, 3:19:37 PM
to
I have defined something similar in RDC and Visual FreePro. It is
a cask, and it allows injection at any point to identify the range
values expected. It performs that test, and if the test fails it
calls the adhoc function named on the cask.

(|reference cask|) // References something which exists
[|definition cask|] // Defines something new (sets a flag)
<|logic cask|> // Performs a test
~|utility cask|~ // Arbitrary code injection without adhoc

Simplified, the casks can be single, left, right, or both:

(|cask|) // Single
(||left|cask|) // Left
(|cask|right||) // Right
(||left|cask|right||) // Both

And each of the right or left sides can have pips between (|
and |), which identify if that portion is raised or lowered.
For example:

(o|cask|o)

This would graphically present as two closed pips.

(O|cask|o)

This would graphically present as left-open, right-closed pips.
And the same is true with other things on the sides:

(o||left|cask|o)
(o|cask|right||o)
(o||left|cask|right||o)

And to add multiple pips up to three, open or closed:

(oo|cask|oo)
(ooo|cask|ooo)

Graphically you can see the forms here in this template:

https://raw.githubusercontent.com/RickCHodgin/libsf/master/source/vjr/source/bmps/graphics/raw_tiled/cask_icons.bmp

And here in something Visual FreePro uses (upper right):

http://www.visual-freepro.org/images/vjr_046.png

-----
For ranges, overflows, underflows, you can setup casks which will
be called when those hit, such as:

s32 x, y;
flag failure;

adhoc underflow(s32& x) {
    // Underflow handler code goes here: log, report
    failure = yes;
}

adhoc overflow(s32& x) {
    // Overflow handler code goes here: log, report
    failure = yes;
}

// Some code which populates y
failure = no;
x = y <|x < 0|underflow(x)||> <|x > 200|overflow(x)||>;
if (failure)
    returnto(restart_process);

David Brown

Jun 30, 2015, 4:28:21 PM
to
On 30/06/15 17:28, Rick C. Hodgin wrote:
> On Tuesday, June 30, 2015 at 9:47:07 AM UTC-4, David Brown wrote:
>> On 30/06/15 13:27, Rick C. Hodgin wrote:
>>> #1 Little endian.
>>
>> There are far too many big endian systems around for this to be defined
>> as "core". I'd be happy to say it should support big endian and little
>> endian systems, but not mixed endian systems.
>
> Alright. Both are allowed. Little endian will be the default.
>

No, that makes no sense. You can't have them both supported yet one of
them is the default.

>> More usefully, I would like to see type/variable qualifiers for
>> specifying endianness, which would eliminate all issues that different
>> endianness causes today (e.g., when specifying a binary file format, you
>> would explicitly give the endianness of the fields).
>
> I do like this. Added. What should the syntax be? I have been fond
> of adding "dot tags" to things, so s32.le for little endian, and
> s32.be for big endian. They also allow additional tags, such as
> read-only being s32.le.ro. And of course these can be typedef'd
> or #define'd.

I would suggest, as I said, using a qualifier - grammatically similar to
"const" or "volatile". You would have qualifiers _Big_endian, and
_Little_endian, using the style now common for new C keywords.

I could see uses of other qualifiers too, such as _Saturated and
_Wrapped to affect how integer types perform arithmetic.

Qualifiers, as I see it (without having thought everything through in
depth), would let you make these specialised types when you need them,
without having to have a proliferation of different types. And they can
be applied to types, variables, structs, arrays, etc. You could not do
that with "dot tags". I also think this sort of thing should have long,
clear names - you would only use them rarely, so clear and explicit
names are far better than minimal abbreviations.

>
>> The bit-endianness (order of bitfields) would also need to be
>> specified.
>
> Yes.
>
>>> #2 Two's complement.
>> I'm OK with that.
>>
>>> #3 All pointers are the same size per target.
>> I don't see the point of this restriction, and it would cause
>> limitation on real-world microcontrollers.
>
> No limitations. They will be included in extension allowances to
> Core C. But, they are not part of Core C.

Extensions to a language specification can make additional restrictions
(such as the way POSIX restricts the C standard's flexibility on
CHAR_BIT by insisting that it is 8). They can also allow flexibility
and expansion capability of the base standard to be made concrete - such
as the C standards allowing extended integer types, and gcc implementing
an __int128 type.

But you can't base your language around a "core" and then have
extensions break those rules. They can add to the rules, but not break
them. You cannot have a base or "core" that says all pointers are the
same size, and then say that in /this/ extension those rules don't hold.
It would be completely ass-backward.

If you don't want to allow pointers of different sizes (you have yet to
come up with a convincing argument for this, but I realise that you are
adamant that it is important), then don't allow them. I and many others
would disagree on this rule, but it's /your/ "core C".

>
>>> #4 All memory is executable.
>>
>> Nope - that's a terrible idea. C does not currently distinguish between
>> types of memory (code, data, bss and stack sections are all
>> implementation details, not part of the standards). But if you want to
>> make memory areas part of "Core C", then you definitely want
>> restrictions on executable parts.
>>
>> And for many real-world microcontrollers, only some of the memory can be
>> executable. If "Core C" is to have a point, it must be suitable for a
>> range of "normal" systems, even if it excludes odd or outdated systems.
>
> Only some memory on any system can be executable. However, in the
> Core C virtual machine, all memory will be executable. And by
> default, the Core C language would allow branching to any address.
> It would be up to the OS to catch the error and shut down the app.

That only makes sense in a sandbox in a VM - not in the real world, and
not on many systems (most microcontroller systems don't even have an OS).

Are you trying to reinvent Java, or are you trying to make a better C?

>
>>> #5 All memory is read/write unless explicitly flagged as read-only.
>>
>> Again, that's a terrible idea. And it would not work on
>> microcontrollers, which have read-only memory regardless of
>> what the user might try to do with flagging.
>
> I disagree. Memory can be flagged as read-only. We can add a
> block like:
>
> readonly {
> // Everything defined in here is in read-only memory
> };
>

C already has a way to do this - it's called "const". What are you
trying to do here that is different?

> I don't anticipate this language being used for microcontrollers,
> by the way. I anticipate it being used by people on desktop
> machines, mobile devices, and other personal devices.

I believe a substantial proportion, perhaps more than 50%, of all C code
written today is for microcontrollers rather than PC's. On the desktop
and PC world, there are many languages in use for different purposes,
and C is very much a minority choice for special cases such as low-level
code, OS code, and libraries where speed is a high priority. On
microcontrollers, C dominates (though C++ use is increasing).

Trying to move forward with C but ignoring microcontrollers is almost
entirely pointless.

>
>>> #6 No default support for char, short, int, long, long long, etc.,
>>> but these keywords are reserved for use with typedef or #define
>>> statements to map to the appropriate target for the machine.
>>
>> That's going to upset a lot of people...
>
> I've had the idea of including a #pragma or compiler switch which,
> for each version of Core C, wraps those values to their appropriate
> values on that version. But, by default, we need to move away from
> those generic concepts and get into specifics.

The language specs should not be concerned with compiler switches, and
#pragmas are only suitable for odd implementation-specific settings.
They should not be part of the language itself.

>
>>> #7 Signed variable types: s8, s16, s32, s64
>> No, these are called int8_t, int16_t, int32_t and int64_t. Don't change
>> these to non-standard names that could easily be mixed up, just to save
>> a couple of keystrokes.
>
> Disagree. Both the name int8_t and uint8_t are silly. It can
> be shortened, and I'm writing a language for the future, not
> the past.

Those names were picked for good reason, based on what was in use at the
time. Those reasons have not changed.

I have worked with code that uses a variety of different size-specific
types, with histories dating back to the pre-C99 days. It may be a
matter of taste, but I am much happier with int8_t style. (Things like
int_fast8_t get a bit tedious, however - I am open to better ideas
there. But since we are guaranteed that 8-bit data types exist, and the
compiler can optimise local variables in many ways, types like
int_fast8_t are not really needed - just use int8_t and let the compiler
implement it with a 32-bit int if that is faster.)

The original C language had various short forms (such as implicit int),
which are widely believed to have been a way to save K & R typing on
their unpleasant DEC keyboards rather than a real necessity. There is
no such necessity now - you can pick names that make sense.

>
>>> #8 Unsigned variable types: u8, u16, u32, u64
>> Again, these are uint8_t, uint16_t, uint32_t and uint64_t.
>
> Ibid.
>
>> Note that some systems (in particular, some DSP's) have minimum 16-bit
>> or 32-bit accesses. I'm okay with these not being fully compatible with
>> "Core C", but you need to be aware of them.
>>
>>> #9 Floating point types all IEEE 754-2008 compliant: f32, f64
>> That would be float32_t and float64_t.
>
> Ibid.
>
>>> #10 A new "flag" variable type is introduced which is 1-bit,
>>> assigned with yes/true/up and no/false/down keywords.
>>
>> That would be "bool", with "true" and "false" values. It would be fine
>> to make those keywords (just like in C++), and save #include'ing <stdbool.h>
>
> bool is 1-byte as I understand it. flag variables are bits, packed
> into scoped local or global space.

The only valid values for bool are true and false. If C had a way to
address single bits (some systems do), then it would be sufficient to
use a single bit for a bool.

But if you want something that is always 1 bit, and that the compiler
should pack (despite the inefficiencies that may entail), why not call
it "bit" ? That would be in addition to the logical "bool" type.

>
>>> #11 A new "bits" variable is introduce which is n-bits, where:
>>> 1 <= n <= 7. Defined as "bits:2" for a 2-bit width, range-
>>> checked on use with inquiries being thrown (see #16 below).
>> That is not in the spirit of C - and certainly not in a "Core C".
>
> I've redefined it in subsequent messages. It's now bN, such as b17.
> And you can specify a type to be b17:s32, which means treat the
> value in those 17 bits as an s32 for calculations. It will also
> allow for simple signed and unsigned values using b17:s, b17:u.
>

That's just too unpleasant for words.

>>> #12 Two new arbitrary size variable type are introduced: bi, bfp
>>> (bi=big integer, bfp=big floating point)
>> That is not in the spirit of C - and certainly not in a "Core C".
>
> It is part of Core C.
>
>>> #13 A new weakly typed variable type is introduced: var
>> That is not in the spirit of C - and certainly not in a "Core C".
>
> It is part of Core C.
>
>>> #14 The basic class is introduced.
>> That is not in the spirit of C - and certainly not in a "Core C".
>
> It is part of Core C.
>
>> If you want to invent a language that is half-way towards C++, and call
>> it "Core C+", that's fine - but it's a different language.
>
> C needs the class. It is a minor addition with major scope. We
> already have everything we need provided for in structs. It just
> allows encapsulation and the nicer syntax usage.
>
> It is part of Core C.

C does not /need/ classes, or any of the stuff here. Remember, this is
just a thought experiment - not reality.

There certainly are advantages in adding classes to C - that is why C++
was created. Of course one does not need to go as far as C++, but you
very quickly move out of C philosophy - hence the suggested Core C+.

>
>>> #15 The concept of edit-and-continue compilation is mandatory.
>> That has no place in a language. Even though you like
>> edit-and-continue, you must surely see that!
>
> Disagree completely. And you even said previously that you could
> see how it could have some use.

I said I could see how it could have some use in an IDE and development
tools - not that it should be involved in the language itself in any
way. There is a /huge/ difference.

> For others it has exceeding use.
> Apple even thought it had enough to add it to versions of GCC,
> calling it "fix-and-continue." It's not a trivial thing. It's
> major. And...

It's a trivial thing as far as the language is concerned - there is
nothing involved.

It is also a trivial thing as far as the compiler is concerned, and I
believe has been implemented in several tools. All that is needed is a
couple of nop's at the start of the function, where the debugger can
insert a jump to the patched code. That is it, basically.

Making it work well with the debugger and IDE is another matter. /That/
is hard. But it is not a language issue in any shape or form.

>
> It's part of Core C.

It would never be part of anything anyone else would consider to be
"Core C", or a "next generation C" or a replacement for C.

>
>>> #16 A new exception framework is introduced based on inquiries
>>> rather than errors. These allow the machine to suspend for
>>> edit-and-continue guidance when an "error" condition arises,
>>> which will now be called an "inquiry" condition.
>> That is not in the spirit of C - and certainly not in a "Core C".
>
> It is part of Core C.

It would never be part of anything anyone else would consider to be
"Core C", or a "next generation C" or a replacement for C.

>
>>> #17 Flow control blocks are introduced.
>> C already has plenty of flow control keywords - it doesn't need any more.
>
> It is part of Core C.

What flow control blocks do you mean here? I assumed you were talking
about "iterate", "until", "unless", etc., that you discussed earlier -
and which I showed are easily implemented by simple macros in today's C.

>
>>> #18 Adhoc functions are introduced.
>>
>> I'd be fine with local nested functions - but call them local functions
>> or nested functions, since that is what they are. "Adhoc" means
>> something completely different from the way you use it (look the word up
>> in a dictionary).
>
> Adhocs relate to casks, which are now part of Core C. As such,
> adhocs are part of Core C.

Casks would never be part of anything anyone else would consider to be
"Core C", or a "next generation C" or a replacement for C, and thus the
same applies to other features that go with them.

On the other hand, local nested functions are often useful ideas (so
useful that C++ has them now, called lambdas, and many other programming
languages have always supported them), and could be added to C if it
were done carefully.

>
>>> #19 The #marker is introduced (for named return destinations, as in:
>>> returnto markerName).
>> Nope.
>
> It is part of Core C.

Great, a new way to do spaghetti programming. Just what we need.

If you feel that something like this has any use, read about generators
and coroutines for better methods.

>
>>> #20 The cask is introduced allowing for arbitrary code injection,
>>> and calling protocols on logical tests to meta (true) and
>>> mefa (false), as well as message passing using mema.
>> Nope.
>
> It is part of Core C.
>
>> "casks" have no place in C of any sort, "core" or otherwise. They may
>> be part of RDC, but that's because RDC is a different sort of language.
>
> It is part of Core C.

See above.

If you insist on things like this as "Core C", rather than RDC, then you
will lose any hope of support from other people. Many of your ideas
here are interesting and could conceivably form suggestions for changes
to the C standards or enhancements and extensions in real compilers.
But this cask nonsense is for you alone.


(I'm not going to bother commenting any further, as your later ideas and
replies just get sillier and more Rick-centric.)

Ian Collins

Jun 30, 2015, 4:45:40 PM
to
David Brown wrote:
> On 30/06/15 17:28, Rick C. Hodgin wrote:
>
>> I don't anticipate this language being used for microcontrollers,
>> by the way. I anticipate it being used by people on desktop
>> machines, mobile devices, and other personal devices.
>
> I believe a substantial proportion, perhaps more than 50%, of all C code
> written today is for microcontrollers rather than PC's. On the desktop
> and PC world, there are many languages in use for different purposes,
> and C is very much a minority choice for special cases such as low-level
> code, OS code, and libraries where speed is a high priority. On
> microcontrollers, C dominates (though C++ use is increasing).

I would have put the proportion of C code written today for
microcontrollers at >75%.

> Trying to move forward with C but ignoring microcontrollers is almost
> entirely pointless.

I would drop "almost"...

--
Ian Collins

Rick C. Hodgin

Jun 30, 2015, 4:56:09 PM
to
Alrighty. If it's pointless, I will proceed back to RDC. No harm. No foul.

Thank you for your input.

Malcolm McLean

Jun 30, 2015, 5:56:19 PM
to
On Tuesday, June 30, 2015 at 9:45:40 PM UTC+1, Ian Collins wrote:
>
> > Trying to move forward with C but ignoring microcontrollers is almost
> > entirely pointless.
>
> I would drop "almost"...
>
No, because there's an important subset of C which is for processor-
intensive basic algorithms on medium-sized to large systems. Generally
such code needs to be very portable, because it's too much grief to
rewrite it when a new OS comes out. And it's not too bothered about
micro-optimisations like cache usage or fast boolean types, largely
because getting the algorithm down to a decent big-O is usually so
complicated that if you're trying to do that kind of optimisation as
well, you're blowing the job up into a massive task.

You can reasonably produce a "Core C" aimed at this sector.

Ian Collins

Jun 30, 2015, 6:03:36 PM
to
How can you have a core targeted at a niche?

--
Ian Collins

Malcolm McLean

Jun 30, 2015, 6:32:02 PM
to
Because all code has to do a little bit of flow control and
arithmetical work. But algorithm implementation does just that,
together with memory allocation. Other code needs things like
thread control, timings and interrupts for reading data from ports,
and high-level concepts like windows. You can get rid of all that
for a lot of algorithm development.

Ian Collins

Jun 30, 2015, 6:52:42 PM
to
Malcolm McLean wrote:
> On Tuesday, June 30, 2015 at 11:03:36 PM UTC+1, Ian Collins wrote:
>> Malcolm McLean wrote:
>>> On Tuesday, June 30, 2015 at 9:45:40 PM UTC+1, Ian Collins wrote:
>>
>>> You can reasonably produce a "Core C" aimed at this sector.
>>
>> How can you have a core targeted at a niche?
>>
> Because all code has to do a little bit of flow control and
> arithmetical work. But algorithm implementation does just that,
> together with memory allocation. Other code needs things like
> thread control, timings and interrupts for reading data from ports,
> and high-level concepts like windows. You can get rid of all that
> for a lot of algorithms development.

In other words use standard C as is, not some rehash.

--
Ian Collins

Malcolm McLean

Jun 30, 2015, 7:58:50 PM
to
On Tuesday, June 30, 2015 at 11:52:42 PM UTC+1, Ian Collins wrote:
> Malcolm McLean wrote:
> >
> > Because all code has to do a little bit of flow control and
> > arithmetical work. But algorithm implementation does just that,
> > together with memory allocation. Other code needs things like
> > thread control, timings and interrupts for reading data from ports,
> > and high-level concepts like windows. You can get rid of all that
> > for a lot of algorithms development.
>
> In other words use standard C as is, not some rehash.
>
ANSI C is pretty good for the purpose, yes.

Keith Thompson

Jun 30, 2015, 8:30:46 PM
to
By "ANSI C", do you mean the language defined by the 1989 ANSI standard
(complete with the implicit int rule)? Or do you mean the version
defined by the 2011 ISO C standard, which is the only one currently
recognized by ANSI?

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

David Brown

Jul 1, 2015, 2:54:38 AM
to
On 01/07/15 02:30, Keith Thompson wrote:
> Malcolm McLean <malcolm...@btinternet.com> writes:
>> On Tuesday, June 30, 2015 at 11:52:42 PM UTC+1, Ian Collins wrote:
>>> Malcolm McLean wrote:
>>>>
>>>> Because all code has to do a little bit of flow control and
>>>> arithmetical work. But algorithm implementation does just that,
>>>> together with memory allocation. Other code needs things like
>>>> thread control, timings and interrupts for reading data from ports,
>>>> and high-level concepts like windows. You can get rid of all that
>>>> for a lot of algorithms development.
>>>
>>> In other words use standard C as is, not some rehash.
>>>
>> ANSI C is pretty good for the purpose, yes.
>
> By "ANSI C", do you mean the language defined by the 1989 ANSI standard
> (complete with the implicit int rule)? Or do you mean the version
> defined by the 2011 ISO C standard, which is the only one currently
> recognized by ANSI?
>

Let's just say "standard C", and skip the details. The point is, I
think, that ordinary, everyday C that we know and love is suitable - we
don't need a new "Core C" to do the sorts of jobs Malcolm is talking about.

David Brown
Jul 1, 2015, 4:27:31 AM
On 30/06/15 22:56, Rick C. Hodgin wrote:
> Alrighty. If it's pointless, I will proceed back to RDC. No harm. No foul.
>

Discussions about possible future C-like languages can be interesting,
even if they will never see the light of day - it can inspire new
thoughts or ideas. (Ironically, it has been inspiring me to look more
seriously about moving to C++ as a main development language.)


But I think you would do well to distinguish between different types of
language or sets of features. If you are talking about a hypothetical
future C, then changes should be relatively minor, and not break
compatibility without good reason. It is okay to suggest "CHAR_BIT will
always be 8" and "integers will always be two's complement" - that would
simplify some code, and only exclude a small proportion of existing
systems from the "new C". But it is /not/ okay to suggest data is
always little-endian - that would break a large number of systems.

Things like a VM and "casks" have no place in C, and never will have.
If you have any understanding of what C is, why it is exists, and how it
is used, then this is obvious. But they could well be useful for other
languages - many modern languages use a VM. So those suggestions are
suitable for a completely new language, such as RDC. Discussions about
such languages are OT here, but /may/ be of interest to some if the
threads are appropriately marked.

Some ideas, such as simple classes, fall in the middle - some people
might like the idea, but many will reject it as too much for C.


Bartc
Jul 1, 2015, 6:46:13 AM
On 30/06/2015 18:04, Rick C. Hodgin wrote:
> On Tuesday, June 30, 2015 at 12:45:08 PM UTC-4, Bart wrote:

>> Presumably this (pointer,offset) combo is the same size overall as the
>> other pointers in the language? Or would you make an exception, or not
>> call it a pointer, or would require two values to be passed?
>
> Same size.

I'm curious: how can a pointer and offset together be the same size as a
pointer?

Or do all pointers have a spare 32-bit field on the off-chance they
might have to store an offset?

(I also think that a normal byte-addressed pointer to a single value
ought to be the same size regardless of the type it points to. But it is
also reasonable that a bit pointer, bit-field pointer, slice, or
bit/bitfield slice might all need extra information and could therefore
be wider. You wouldn't then expect a void* type to accommodate all those
special forms too.)

--
Bartc

Rick C. Hodgin
Jul 1, 2015, 7:40:42 AM
On Wednesday, July 1, 2015 at 6:46:13 AM UTC-4, Bart wrote:
> On 30/06/2015 18:04, Rick C. Hodgin wrote:
> > On Tuesday, June 30, 2015 at 12:45:08 PM UTC-4, Bart wrote:
>
> >> Presumably this (pointer,offset) combo is the same size overall as the
> >> other pointers in the language? Or would you make an exception, or not
> >> call it a pointer, or would require two values to be passed?
> > Same size.
>
> I'm curious: how can a pointer and offset together be the same size
> as a pointer?

I misunderstood you. I thought you were asking about the size of
the pointer to the bit array.

Keith Thompson
Jul 1, 2015, 10:55:11 AM
David Brown <david...@hesbynett.no> writes:
> On 01/07/15 02:30, Keith Thompson wrote:
>> Malcolm McLean <malcolm...@btinternet.com> writes:
>>> On Tuesday, June 30, 2015 at 11:52:42 PM UTC+1, Ian Collins wrote:
>>>> Malcolm McLean wrote:
>>>>>
>>>>> Because all code has to do a little bit of flow control and
>>>>> arithmetical work. But algorithm implementation does just that,
>>>>> together with memory allocation. Other code needs things like
>>>>> thread control, timings and interrupts for reading data from ports,
>>>>> and high-level concepts like windows. You can get rid of all that
>>>>> for a lot of algorithms development.
>>>>
>>>> In other words use standard C as is, not some rehash.
>>>>
>>> ANSI C is pretty good for the purpose, yes.
>>
>> By "ANSI C", do you mean the language defined by the 1989 ANSI standard
>> (complete with the implicit int rule)? Or do you mean the version
>> defined by the 2011 ISO C standard, which is the only one currently
>> recognized by ANSI?
>
> Let's just say "standard C", and skip the details.

That would be ISO C. (But that doesn't tell us what Malcolm meant by
it.)

> The point is, I
> think, that ordinary, everyday C that we know and love is suitable - we
> don't need a new "Core C" to do the sorts of jobs Malcolm is talking about.

Jens Thoms Toerring
Jul 1, 2015, 11:04:41 AM
Rick C. Hodgin <rick.c...@gmail.com> wrote:
> We can define Core C here if you'd like. :-) I appoint myself chair,
> and the rest of us a "committee of the whole" to address the issue.

> #1 Little endian.
> #2 Two's complement.
> #3 All pointers are the same size per target.
> #4 All memory is executable.
> #5 All memory is read/write unless explicitly flagged as read-only.
> #6 No default support for char, short, int, long, long long, etc.,
> but these keywords are reserved for use with typedef or #define
> statements to map to the appropriate target for the machine.
> #7 Signed variable types: s8, s16, s32, s64
> #8 Unsigned variable types: u8, u16, u32, u64
> #9 Floating point types all IEEE 754-2008 compliant: f32, f64
> #10 A new "flag" variable type is introduced which is 1-bit,
> assigned with yes/true/up and no/false/down keywords.
> #11 A new "bits" variable is introduce which is n-bits, where:
> 1 <= n <= 7. Defined as "bits:2" for a 2-bit width, range-
> checked on use with inquiries being thrown (see #16 below).
> #12 Two new arbitrary size variable type are introduced: bi, bfp
> (bi=big integer, bfp=big floating point)
> #13 A new weakly typed variable type is introduced: var
> #14 The basic class is introduced.
> #15 The concept of edit-and-continue compilation is mandatory.
> #16 A new exception framework is introduced based on inquiries
> rather than errors. These allow the machine to suspend for
> edit-and-continue guidance when an "error" condition arises,
> which will now be called an "inquiry" condition.
> #17 Flow control blocks are introduced.
> #18 Adhoc functions are introduced.
> #19 The #marker is introduced (for named return destinations, as in:
> returnto markerName).
> #20 The cask is introduced allowing for arbitrary code injection,
> and calling protocols on logical tests to meta (true) and
> mefa (false), as well as message passing using mema.
> #21 A new self-documenting protocol is introduced, which allows
> the software to be documented in source code (similar to Java's).
> #22 A new metadata set of tags is made available on a line-by-line
> basis, or at the function / struct level. These explicitly
> tag source code lines to be of a particular type, making a new
> type of search and/or replace possible (such as /* md: a, b, c */).
> #23 All standard libraries are available and #include'd by default.
> They must be explicitly excluded if they are not needed.
> #24 All functions are linked to (1) a bare minimum for standard
> libraries, (2) all are included for developer code; unless
> otherwise directed by compiler/linker flags.

All this can't be done without an interpreter or a kind of VM
in which the program is run. With that you're throwing out of
the window everything that has made C such a success: that you
can do bare-metal programming at nearly the fastest possible
speed - for which C has been called, not too far off the mark,
a "portable assembler". With an interpreter or a VM it's
impossible to use it for writing operating systems or as the
implementation language for higher-level languages. So what you
propose here is a new language with a C-like syntax, but without
any of the fundamental benefits of C. And such languages are a
dime a dozen.

And with such a language all these points about endianness,
numbers of bits in variables, and pointer sizes become mostly
irrelevant. These aren't language properties but are determined
by the hardware the program is running on. They all become
implementation details the author of the interpreter/VM has to
care about, but which the user of the language has no reason to
care (or even know) about - the interpreter/VM has to somehow
map the variables of that language onto the real hardware
(otherwise it is impossible to implement it on any hardware that
doesn't exactly fit your model).

And when you start designing a new language then you really
should look a lot further than just reproducing the syntax
of C with a few new bits added while giving up all that makes
C the lingua franca of computing. There's, for example, no
discernible reason why you're so fixated on integer variables
with sizes of 8 bits and multiples thereof. In C there is a
reason: it's often the "natural" size of the underlying
hardware. But for something that runs in an interpreter/VM
this is rather unimportant. Since you will have to check each
and every calculation with such variables for over- and
underflow anyway, why not start with a fresh approach (not a
new one - Ada, for example, has done this for a long time) and
have variables with user-defined ranges? So, if the problem
requires it, I can define a variable that can only hold values
between, say, -17 and 541. Since you have already thrown out
the speed advantage of C, the tiny bit of extra checking this
would require won't make any major difference. And it might
help in writing much more secure programs, since all the checks
the user has to do himself in C (but too often forgets) are
done automatically.

On the other hand, if the user doesn't know in advance what
range he's going to need, why not make the variables adapt
themselves to what's needed? You'll have to check each assign-
ment to a variable anyway, so if the new value does not fit
into an int32_t anymore, why not transparently upgrade it to
an int64_t (or your 'bi' type)?

The only thing this "Core C" as stated will have to do with C
(except for some similarities in the syntax) is that the
interpreter/VM will need to be written in C or C++ (or some
other language that probably is itself implemented in C, if you
don't mind running your interpreter/VM in another
interpreter/VM ;-). But it won't be of any real interest to
anyone if it hasn't got any features that make it a lot better
than C, when it just runs a lot slower and doesn't allow you to
do most things C is typically used for.
Regards, Jens
--
\ Jens Thoms Toerring ___ j...@toerring.de
\__________________________ http://toerring.de

Keith Thompson
Jul 1, 2015, 11:05:58 AM
Bartc <b...@freeuk.com> writes:
> On 30/06/2015 18:04, Rick C. Hodgin wrote:
>> On Tuesday, June 30, 2015 at 12:45:08 PM UTC-4, Bart wrote:
>>> Presumably this (pointer,offset) combo is the same size overall as the
>>> other pointers in the language? Or would you make an exception, or not
>>> call it a pointer, or would require two values to be passed?
>>
>> Same size.
>
> I'm curious: how can a pointer and offset together be the same size as a
> pointer?
>
> Or do all pointers have a spare 32-bit field on the off-chance they
> might have to store an offset?

An example I've mentioned here many times: Cray vector machines are
word-oriented, where a word is 64 bits. A machine-level address/pointer
is a 64-bit quantity containing the address of a word. Cray's C
compiler has CHAR_BIT==8 (for Unix/POSIX compatibility), and implements
a byte pointer as a word pointer with the byte offset stored in the
otherwise unused high-order 3 bits.

Any system with 64-bit pointers that doesn't actually support 16
exabytes of memory is likely to be able to do something similar.

> (I also think that a normal byte-addressed pointer to a single value
> ought to be the same size regardless of the type it points to. But it is
> also reasonable that a bit pointer, bit-field pointer, slice, or
> bit/bitfield slice might all need extra information and could therefore
> be wider. You wouldn't then expect a void* type to accommodate all those
> special forms too.)

--

Rick C. Hodgin
Jul 1, 2015, 11:12:21 AM
RDC allows this to be done using casks in the definition. You
can set casks which look at min's and/or max's and then call
the appropriate function if violated, which presumably corrects
the value.

> You'll have to check each assign-
> ment to a variable anyway, so if the new value does not fit
> anymore into an int32_t why not transparently upgrade it into
> an int64_t (or your 'bi' type)?

I had set up some rules for this, and had capped them at 64-bits.
They could upgrade to a bi, however.

> Anything this "Core C" as stated will have to do with C (except
> some similarities in the syntax) will be that the interpreter/VM
> will need to be written in C or C++. (or some other languange
> that probably is itself implemented in C if you don't mind to
> run your interpreter/VM in another interpreter/VM;-). But it
> won't be of any real interest to anyone if it hasn't any fea-
> tures that make it a lot better than C when it just runs a lot
> slower and doesn't allow you to do most things C typically is
> used for.
> Regards, Jens
> --
> \ Jens Thoms Toerring ___ j...@toerring.de
> \__________________________ http://toerring.de

I appreciate your well thought out input, Jens. Thank you.

Bartc
Jul 1, 2015, 12:01:56 PM
On 01/07/2015 16:05, Keith Thompson wrote:
> Bartc <b...@freeuk.com> writes:

>> I'm curious: how can a pointer and offset together be the same size as a
>> pointer?
>>
>> Or do all pointers have a spare 32-bit field on the off-chance they
>> might have to store an offset?
>
> An example I've mentioned here many times: Cray vector machines are
> word-oriented, where a word is 64 bits. A machine-level address/pointer
> is a 64-bit quantity containing the address of a word. Cray's C
> compiler has CHAR_BIT==8 (for Unix/POSIX compatibility), and implements
> a byte pointer as a word pointer with the byte offset stored in the
> otherwise unused high-order 3 bits.
>
> Any system with 64-bit pointers that doesn't actually support 16
> exabytes of memory is likely to be able to do something similar.

We're talking about offsets into an array which could be as big as
available memory, and that's for a byte array. For a bit-array, the
offset could be 8 times as big!

So it's not just a few extra bits.

With 32-bit pointers, there is nowhere to put a substantial offset. With
64-bit ones, you might get away with it if pointers are limited to a 4GB
range (2GB user and 2GB system), and offsets are limited to +2GB or +4GB
(the latter being the maximum offset within a 500MB bit-array). And if
the OS and everything else plays along.

Otherwise it gets tight. My point was that the natural way to do this
was with a discrete pointer/offset pair that is larger (perhaps double
the size) than a regular pointer. (With my preferred method of using bit
pointers, then there's a better chance of storing the 0..7 bit offset in
a spare corner of the pointer.)

--
Bartc

Malcolm McLean
Jul 1, 2015, 12:43:03 PM
On Wednesday, July 1, 2015 at 3:55:11 PM UTC+1, Keith Thompson wrote:
> David Brown <david...@hesbynett.no> writes:
>
> > Let's just say "standard C", and skip the details.
>
> That would be ISO C. (But that doesn't tell us what Malcolm meant by
> it.)
>
The conservative subset of C which runs anywhere.
We assume that int can index any array we're likely to be passed, and that data
are either small integers or reals. We assume that strings are UTF-8 and if
we need any character-wise processing of non-English text, we'll be told
about it.
Code written like that will usually run correctly and efficiently anywhere.
Theoretically such code can break (which is a nuisance where security and
malicious exploits are a concern), but in practice it generally doesn't,
unlike more advanced constructs.

Keith Thompson
Jul 1, 2015, 1:56:34 PM
Bartc <b...@freeuk.com> writes:
> On 01/07/2015 16:05, Keith Thompson wrote:
>> Bartc <b...@freeuk.com> writes:
>>> I'm curious: how can a pointer and offset together be the same size as a
>>> pointer?
>>>
>>> Or do all pointers have a spare 32-bit field on the off-chance they
>>> might have to store an offset?
>>
>> An example I've mentioned here many times: Cray vector machines are
>> word-oriented, where a word is 64 bits. A machine-level address/pointer
>> is a 64-bit quantity containing the address of a word. Cray's C
>> compiler has CHAR_BIT==8 (for Unix/POSIX compatibility), and implements
>> a byte pointer as a word pointer with the byte offset stored in the
>> otherwise unused high-order 3 bits.
>>
>> Any system with 64-bit pointers that doesn't actually support 16
>> exabytes of memory is likely to be able to do something similar.
>
> We're talking about offsets into an array which could be as big as
> available memory, and that's for a byte array. For a bit-array, the
> offset could be 8 times as big!
>
> So it's not just a few extra bits.

Ok, I missed some of the context.

If the base pointer points to the word containing the data, then the
offset can be no bigger than 6 bits (assuming 64-bit pointers and 1-bit
elements). If the base pointer points to the beginning of an array
object, and the offset can refer to any element of the array, then the
offset has to be a size_t or equivalent.

[...]

> Otherwise it gets tight. My point was that the natural way to do this
> was with a discrete pointer/offset pair that is larger (perhaps double
> the size) than a regular pointer. (With my preferred method of using bit
> pointers, then there's a better chance of storing the 0..7 bit offset in
> a spare corner of the pointer.)

It can vary a bit if the maximum size of an object is substantially
smaller than the total size of the memory space; then size_t can be
smaller than void*. Of course monolithic address spaces are all the
rage these days.

Steve Thompson
Jul 3, 2015, 5:17:01 PM
On Tue, Jun 30, 2015 at 03:46:58PM +0200, David Brown wrote:
> On 30/06/15 13:27, Rick C. Hodgin wrote:
> > We can define Core C here if you'd like. :-) I appoint myself chair,
> > and the rest of us a "committee of the whole" to address the issue.
>
> That would be the "royal we"? This is entirely /your/ thoughts on a
> language. Others (like me) might comment, but don't pretend that any
> one other than /you/ thinks this is a useful idea in practice.
>
> However, it can always be fun to think about what would make up a
> "perfect" C language, from an almost entirely subjective viewpoint.

It must be a coincidence that his "Core C" feature enumeration has a
close correspondence with several of the features I am putting in my
VM project. It must be the nature of the beast; C-derived systems
will tend to be similar, right? At any rate, my thing is a research
project and not something I am worried about promoting on Usenet.

> >
> > #1 Little endian.
>
> There are far too many big endian systems around for this to be defined
> as "core". I'd be happy to say it should support big endian and little
> endian systems, but not mixed endian systems.
>
> More usefully, I would like to see type/variable qualifiers for
> specifying endianness, which would eliminate all issues that different
> endianness causes today (e.g., when specifying a binary file format, you
> would explicitly give the endianness of the fields).
>
> The bit-endianness (order of bitfields) would also need to be specified.

That's a pretty good idea. If you had a CPU instruction to byteswap
16-64 bit words it would be easy to implement.

> > #2 Two's complement.
>
> I'm OK with that.
>
> > #3 All pointers are the same size per target.
>
> I don't see the point of this restriction, and it would cause limitation
> on real-world microcontrollers.

Doesn't really make sense given some of his other criteria.

> > #4 All memory is executable.
>
> Nope - that's a terrible idea. C does not currently distinguish between
> types of memory (code, data, bss and stack sections are all
> implementation details, not part of the standards). But if you want to
> make memory areas part of "Core C", then you definitely want
> restrictions on executable parts.
>
> And for many real-world microcontrollers, only some of the memory can be
> executable. If "Core C" is to have a point, it must be suitable for a
> range of "normal" systems, even if it excludes odd or outdated systems.
>
> > #5 All memory is read/write unless explicitly flagged as read-only.
>
> Again, that's a terrible idea. And it would not work on
> microcontrollers, which have read-only memory regardless of what the
> user might try to do with flagging.

His "core C" is not going to work where there is no hardware memory
protection.

> > #6 No default support for char, short, int, long, long long, etc.,
> > but these keywords are reserved for use with typedef or #define
> > statements to map to the appropriate target for the machine.
>
> That's going to upset a lot of people...
>
> > #7 Signed variable types: s8, s16, s32, s64
>
> No, these are called int8_t, int16_t, int32_t and int64_t. Don't change
> these to non-standard names that could easily be mixed up, just to save
> a couple of keystrokes.
>
> > #8 Unsigned variable types: u8, u16, u32, u64
>
> Again, these are uint8_t, uint16_t, uint32_t and uint64_t.

These types should have been defined way back when C was an unruly
teenager and used then instead of char, int, etc.

How much trouble has the indeterminate range of int types caused when
porting code?



Regards,

Steve Thompson

--
"If I had a nickel for every time some idiot called me about a
computer problem that turned out to be user error, I would be able to
retire and spend the rest of my days cultivating clues in my backyard
hillside garden." -- MysteryDog in 24hoursupport.helpdesk.

BGB
Jul 4, 2015, 11:40:36 AM
On 6/30/2015 6:27 AM, Rick C. Hodgin wrote:
> We can define Core C here if you'd like. :-) I appoint myself chair,
> and the rest of us a "committee of the whole" to address the issue.
>

late to this party, but I'll compare with my own C-Aux language variant
and similar efforts where relevant.


> #1 Little endian.

possible, but limits range of target.
C-Aux supports either endianness, but it is nominally left as a runtime issue.

default is little-endian though, for cases where the choice is nearly
equal weight. in a way it is ironic though that the VM's compiled image
format is big-endian (it reused the TLV format from some of my other
efforts, which was big-endian, so everything else was done this way in
the format for consistency).


> #2 Two's complement.

sane.


> #3 All pointers are the same size per target.

probably sane.


> #4 All memory is executable.

probably a bad idea in general.

internally, the VM used by C-Aux essentially treats executable code and
normal memory differently. potentially, you could produce bytecode and
shove it into the VM, but this is outside its current use-case.


> #5 All memory is read/write unless explicitly flagged as read-only.

depends.

some of my past VMs have actually used a Unix-like security model (RWX
and UID/GID) for memory access, enforced by the VM (similar to file
access). with some trickery, I was able to basically make a lot of the
access-rights checking cacheable.

the premise was that ideally VM code could be sandboxed, and thus kept
mostly safe.

this mostly ended up unused as it is a bit much code overhead (it
required a lot of boilerplate in declarations).


the VM used by C-Aux currently just exposes raw memory, with it
basically using OS/hardware level access protections. but, this is
itself mostly about current use-case, and may be different for other
use-cases.

a simpler model is just splitting code into two levels, such as
'trusted' and 'untrusted', and restricting what is accessible from
untrusted code.


> #6 No default support for char, short, int, long, long long, etc.,
> but these keywords are reserved for use with typedef or #define
> statements to map to the appropriate target for the machine.

probably not viable.


> #7 Signed variable types: s8, s16, s32, s64
> #8 Unsigned variable types: u8, u16, u32, u64

can be done, but another option is making the normal types fixed size by
default.

for example, in C-Aux:
char, signed 8 bit;
short, signed 16-bit;
int, signed 32-bit;
long long, signed 64-bit.

some remained variable:
long, 32 or 64 depending on target.
internally defaults to 64-bit treatment in the VM.
technically, it is similar to 'short' (which internally uses 'int').
it is a type/converting load/store hack.
internally to the VM, and in my 'BS-RT' language,
this type is called 'native long'.
in the VM, 'long' actually refers to 'long long'.
pointers:
32 or 64 bit, depending on target.

the VM leaves off support for 16-bit targets, as pretty much all the
current 16-bit targets still in active use (typically small
microcontrollers) have too little RAM to make such a VM viable.

like, probably sane to at least require a higher-end 32-bit microcontroller.

other type names:
__int8_t, alias to char
__int16_t, alias to short
__int32_t, alias to int
__int64_t, alias to long long
__int128_t, special, non-core type.

core types and non-core types get different treatment from the VM:
core types have assigned bit-patterns and operators in the bytecode:
the bytecode mostly uses an ILFDA model similar to the JVM.
some other types are core by having short bit-codes.
non-core types generally use operations which depend on signatures.
this includes most extended types, as well as structures.
current core uses a 4-bit type-code.
possible could be to create an "extended core" with 8-bit type-IDs.


> #9 Floating point types all IEEE 754-2008 compliant: f32, f64

mostly sane.

C-Aux and its VM define:
short float, 16-bit IEEE half-float.
like short, it is mostly a load/store hack.
non-core type.
float, 32-bit IEEE float.
double, 64-bit IEEE double.
long double, mostly an alias for double.
internally, the VM does not distinguish them.
__float128_t, non-core, 128-bit float.


> #10 A new "flag" variable type is introduced which is 1-bit,
> assigned with yes/true/up and no/false/down keywords.
> #11 A new "bits" variable is introduce which is n-bits, where:
> 1 <= n <= 7. Defined as "bits:2" for a 2-bit width, range-
> checked on use with inquiries being thrown (see #16 below).

could just make bitfields not suck (such as by better nailing down their
behavior).


> #12 Two new arbitrary size variable type are introduced: bi, bfp
> (bi=big integer, bfp=big floating point)

possible, but these sorts of things are difficult to support with much
of any real semblance of efficiency (the added complexity of dealing
with the variable size tends to outweigh any gains).

hence, why large fixed-size types tend to be a more popular option.

this includes things like 2048-bit packed decimal types and similar.


> #13 A new weakly typed variable type is introduced: var

similar has been considered for C-Aux, but it is more likely to be fully
dynamically typed (weak typing and dynamic typing are different), partly
as a side-effect of both C-Aux and BS-RT sharing the same VM.

in the current VM, it is a 64-bit tagged reference type, generally with
the type of the pointed-to object squirreled in the reference:
on 32-bit targets, there were 28 bits left over;
on current 64 bit targets, about 12 (Linux) or 16 (Win64).

currently, it uses 12 bits for the type-tag, with any remaining bits
reserved. future-proofing? dunno...

a challenge though is supporting these types (and also dynamic objects)
in the VM without risking screwing up timing constraints. as-is, there
are some limits to the type-system to try to keep the timing for
everything bounded and reasonably predictable.



there may also be an 'auto' type, which uses type inference.

ex:
auto x=3LL; //x is implicitly long long
auto y=x; //y is also implicitly long long

my past language ('BS') had used 'var' for both cases, but BS-RT and
C-Aux would likely have separate dynamically-typed and type-inferred
variables.


> #14 The basic class is introduced.

I decided against this for C-Aux.

there may or may not (eventually) be a C+Aux which does classes (and is
probably a lazy mimic of C++ syntax).



> #15 The concept of edit-and-continue compilation is mandatory.

not viable in my case.

the VM is mostly intended for controllers for my robotics projects, and
it is generally easier to use generic tools (text editors), and some
offline simulation.

granted, the simulators don't really show what actual hardware will do.
something like a 3D and maybe physics-based simulator could be useful,
but the effort of making such a thing would likely not outweigh the
effort it would require to make it.

like, it isn't too difficult to render or animate an action, but
realistic simulation of things like motors and mechanics is non-trivial.

IRL also has all this fun, like stuff not sliding for crap because the
rails aren't quite perfectly parallel (like, they diverge by a small
fraction of an inch along their length, causing things to stick when
sliding along them), ...

well, and also real-world fun like motors getting burnt up because IO
pins get stuck on (... rewinding motors is fun ...).


also cats help, like one can use a PC or laptop for controlling
something (dealing with the controller), and cats will hop up and do a
little dance all over the keyboard, fouling things up or sending the
computer into hibernate or whatever, then one has a mess.


> #16 A new exception framework is introduced based on inquiries
> rather than errors. These allow the machine to suspend for
> edit-and-continue guidance when an "error" condition arises,
> which will now be called an "inquiry" condition.

typically one would glue on some variant of try/catch.


> #17 Flow control blocks are introduced.
> #18 Adhoc functions are introduced.

ok.


currently no plans for this personally in C-Aux, but they will exist in
BS-RT.

possible syntax choices (if added to C-Aux):
carrot:
^ { ... }
^(args)->rtype { ... }
bracket:
[](args)->rtype { ... }
keyword:
__function rtype(args) { ... }


slightly uncertain is whether or not a call/cc type mechanism should
exist (AKA: "call with current continuation"). these allow basically
having more explicit control over control flow.

as-is, nothing like this is supported (and is generally not often useful
enough to justify the added complexity of supporting it).


> #19 The #marker is introduced (for named return destinations, as in:
> returnto markerName).

... hmm ...


> #20 The cask is introduced allowing for arbitrary code injection,
> and calling protocols on logical tests to meta (true) and
> mefa (false), as well as message passing using mema.

?


> #21 A new self-documenting protocol is introduced, which allows
> the software to be documented in source code (similar to Java's).
sort of lazily-done already.


> #22 A new metadata set of tags is made available on a line-by-line
> basis, or at the function / struct level. These explicitly
> tag source code lines to be of a particular type, making a new
> type of search and/or replace possible (such as /* md: a, b, c */).
> #23 All standard libraries are available and #include'd by default.
> They must be explicitly excluded if they are not needed.

why?...


> #24 All functions are linked to (1) a bare minimum for standard
> libraries, (2) all are included for developer code; unless
> otherwise directed by compiler/linker flags.

some OS libraries and the C runtime tend to be included by default in
most compilers anyways.

for non-hosted code, you explicitly don't have or can't use libraries.

>
> An incomplete list. I'll add more as I think of them. Please feel
> free to comment or propose.

BGB

Jul 4, 2015, 2:03:33 PM
On 6/30/2015 11:44 AM, Bartc wrote:
> On 30/06/2015 15:50, Rick C. Hodgin wrote:
>> On Tuesday, June 30, 2015 at 10:11:19 AM UTC-4, Bart wrote:
>

...

>>> Sorry, this doesn't look right. You are going to end up with a struct
>>> which is likely to be 16 bytes (maybe 12 bytes if you put the values
>>> first, unless it's padded anyway), to store a single u8 value; why?
>>
>> Because it's a universal type. The same A you use in one moment to
>> store a u8 can later be used to store a double, etc. It requires no
>> additional allocation, and is available to be used in this weakly
>> typed form.
>
> But you gain nothing from storing an 8-bit value int instead of 32 or 64
> bits.
>
> And since this has now all got to be sorted out at runtime, it means
> that to execute:
>
> A+B
>
> when each operand might be s8 s16 s32 s64 u8 u16 u32 u64 f32 f64 (to say
> nothing of big ints and big floats), that means up to 100 different
> combinations to consider. Per operator. Plus all the illegal
> combinations to be trapped (an array plus a struct, for example).
>
> Small types such as 8-bit values are only of use as storage types, to
> save space when you have an array of them, or to craft a particular
> shape of struct, or for forming strings. They are of little use as
> standalone variables, let alone inside a variant type.
>

for dynamic types, my stuff generally uses a full-width 'fixnum' and
'flonum' for pretty much all the narrower integer and floating point types.

the fixnum type has a 62-bit range by default.
there is a subset of fixnum which is 32-bits, but it isn't really
clearly distinguished, and mostly exists as a special-case.

the flonum type has full double precision over a large part of its range.


basic summary (stated before):
variants are 64-bit, with the high 4 bits as a major tag:
0: Pointer
1/2: Fixnum (Positive)
3/4: Flonum (+, E+/-256)
5/6: currently reserved in FRVM
7: smaller tagged literal spaces
8: Flonum (large exponent, Inf/NaN)
9/A: reserved in FRVM
B/C: Flonum (-, E+/-256)
D/E: Fixnum (Negative)
F: Pointer (64-bit only)

the Fixnum coding is a little awkward, sadly, but is based mostly on
wanting to be able to identity-map parts of the pointer and double spaces.

Fixnum-32 special cases:
0x10000000'XXXXXXXX: Positive/Unsigned
0xEFFFFFFF'XXXXXXXX: Negative
or, basically, sign or zero extend and XOR high bits with 0x10000000.

in my past VM (BSVM), both pointer-cases were untagged (memory-objects
were typed, rather than pointers); however, in the current FRVM:
0x0TTTYYYY'XXXXXXXX: Pointer
X is the low bits of the pointer.
Y is also part of the pointer for 64 bits.
currently it is reserved and MBZ for 32-bit targets.
T: type-tag, index into a type-info table
0 is special, and means a raw/untyped pointer.
0xFXXXXXXX'XXXXXXXX: Pointer (64-bit only)
BSVM: Negative Pointers only
FRVM: Possibly may be used for 60-bit addresses (both signs).


TBD: currently there is no good way to encode a primitive-typed pointer
in the FRVM tagged-reference.

ex, if one did something like:
__variant v;
int *pi;
int i;
...
v=pi;
i=v[3];

then one has a problem if variant would not remember the type of the
pointer and assumes it to be 'void *'. the tags are generally used for
VM object types, rather than for arbitrary type signatures.

alternatively, it could do like BSVM, and have the VM use heap-allocated
boxed pointers in this case.

possible:
0x9TTTYYYY'XXXXXXXX: Primitive-Type Pointer
X is the low bits of the pointer.
Y is also part of the pointer for 64 bits.
For 32-bits (Possible):
High 3 bits:
0: Signature
1: '*'
2: '**'
3: '***'
4: '****'
5: '*****'
6: '******'
7: '*******'
Low 13 bits:
0: Unbounded / Signature Based
1-8191: Bounded
T: index into a type signature table.
0-15 or 0-255 are special cases (built-in types).
0: 'int *'
1: 'long long *'
2: 'float *'
3: 'double *'
4: 'void **'
5: 'void *'
6: reserved
7: 'long *'
8: 'char *'
9: 'unsigned char *'
10: 'short *'
11: 'unsigned short *'
12: 'unsigned int *'
13: 'unsigned long long *'
14: 'unsigned long *'
15: reserved
256+ would be for types defined via signatures (structs, ...).


or such...

Bartc

Jul 4, 2015, 3:25:43 PM
On 04/07/2015 18:59, BGB wrote:
> On 6/30/2015 11:44 AM, Bartc wrote:

[Implementing variant types]
I've been overhauling my dynamic language and interpreter, initially to
be a lot more Python-like, but have since rejected most new ideas (as I
decided mine were better and yielded a faster interpreter!).

That included looking at using 64-bit references as the main way of
moving data around, both with scalars stored on the heap, and by means
of bit-tagging within the pointers.

Ultimately I decided my old way was best, which is to use a composite
(tag, value) object as the main way to push and pop data. On a 64-bit
system, this is only double the size that's needed anyway (128-bits),
but the advantages are:

* Full-size 64-bit int and float types
* No need to do any bit-fiddling, because the pointer and value aren't mixed up

> TBD: currently there is no good way to encode a primitive-typed pointer
> in the FRVM tagged-reference.

* And, with 128-bits, there is space to store both the tag for the
pointer, the type that it points to, and the pointer itself.

(In fact, on a 32-bit target, or on my 64-bit compiler with 32-bit
pointers, I manage to store all the info I need for types such as
strings and lists within the 128-bits. Obviously the string and list
contents are at the other end of a pointer.)

....
> T: index into a type signature table.
> 0-15 or 0-255 are special cases (built-in types).
> 0: 'int *'
> 1: 'long long *'
> 2: 'float *'
> 3: 'double *'
> 4: 'void **'
> 5: 'void *'
> 6: reserved
> 7: 'long *'
> 8: 'char *'
> 9: 'unsigned char *'
> 10: 'short *'
> 11: 'unsigned short *'
> 12: 'unsigned int *'
> 13: 'unsigned long long *'
> 14: 'unsigned long *'
> 15: reserved
> 256+ would be for types defined via signatures (structs, ...).

I've also separated out all the type codes used for 'packed' types (all
the ones you get in C), from those used within the variant system. So
that the latter is more streamlined and far simpler.

--
Bartc

BGB

Jul 4, 2015, 10:01:16 PM
I had come from the opposite direction, originally mostly using 32-bit
values with a mix of dynamic lookup, and parts of the address space used
for literal values (had 28 bit fixnum and flonum).

the transition to 64-bit (in the BSVM) was mostly because it actually
saved memory, due to significant reductions in the amount of boxing
(outweighing the increased reference size), and similarly made things
faster.

also partly it was because the BSVM was technically difficult to move
entirely over to static types.

mostly-full-sized types were sufficient.
the main priorities at the time had been to unbox 'int' and 'double',
and to these ends it has been successful.


>> TBD: currently there is no good way to encode a primitive-typed pointer
>> in the FRVM tagged-reference.
>
> * And, with 128-bits, there is space to store both the tag for the
> pointer, the type that it points to, and the pointer itself.
>
> (In fact, on a 32-bit target, or on my 64-bit compiler with 32-bit
> pointers, I manage to store all the info I need for types such as
> strings and lists within the 128-bits. Obviously the string and list
> contents are at the other end of a pointer.)
>

but, 128 bits is a bit steep.
more so for a VM that is mostly still used on 32-bit hardware (a lot of
it not particularly high-spec).

like, if your memory is measured in MB, there is still some need to be
concerned with memory footprint.

this is partly why the BSVM wasn't being used (besides latency and code
size): it had a memory footprint of around 150MB, which is pretty steep
on a board with 256MB of RAM. though, a lot of the memory use was due
to how the FFI and metadata system worked.

luckily, the FRVM currently needs nowhere near this much memory (my CNC
program running, with this VM in use, currently has a footprint of
around 5MB).


then again, the tagged reference type isn't used all that prominently in
FRVM (unlike in the BSVM).

currently in C-Aux, it isn't used at all.
in BS-RT, it is (besides 'variant') mostly likely to be used for
'string', because some tagging is needed to distinguish between the
various 'string' sub-types (they may be UTF-16, UTF-8, CP-1252, or CP-437).

basically, the BS-RT 'string' type is essentially a thinly veiled
character pointer, relying on the type-tag to know what character set it
is using.


> ....
>> T: index into a type signature table.
>> 0-15 or 0-255 are special cases (built-in types).
>> 0: 'int *'
>> 1: 'long long *'
>> 2: 'float *'
>> 3: 'double *'
>> 4: 'void **'
>> 5: 'void *'
>> 6: reserved
>> 7: 'long *'
>> 8: 'char *'
>> 9: 'unsigned char *'
>> 10: 'short *'
>> 11: 'unsigned short *'
>> 12: 'unsigned int *'
>> 13: 'unsigned long long *'
>> 14: 'unsigned long *'
>> 15: reserved
>> 256+ would be for types defined via signatures (structs, ...).
>
> I've also separated out all the type codes used for 'packed' types (all
> the ones you get in C), from those used within the variant system. So
> that the latter is more streamlined and far simpler.
>

these ones here are mostly to represent raw pointer types.
this type numbering is derived from what is used in the FRBC bytecode.

note that the same basic part of the tag space could likely also
represent struct pointers.


both C-Aux and BS-RT use C-like type-systems.

FWIW, core BS-RT types, their C-Aux equivalents:
byte , unsigned char , 8-bit unsigned byte
sbyte , signed char , 8-bit signed byte
short , short , 16-bit signed short
ushort , unsigned short , 16-bit unsigned short
int , int , 32-bit signed integer
uint , unsigned int , 32-bit unsigned integer
long , long long , 64-bit signed integer
ulong , unsigned long long , 64-bit unsigned integer
nlong , long , 32/64-bit signed integer
unlong , unsigned long , 32/64-bit unsigned integer
char , wchar_t , 16-bit character
float , float , 32-bit float
double , double , 64-bit double

both make use of structs, pointers, and pointer arithmetic.

in BS-RT, the dynamic type-system will exist partly as an extension of
the static type-system, rather than as a separate system. both languages
will be primarily statically-typed.

