On 7/7/2018 5:39 AM, Bart wrote:
> On 07/07/2018 02:20, Rick C. Hodgin wrote:
>> On Friday, July 6, 2018 at 7:48:56 PM UTC-4, Chris M. Thomasson wrote:
>>> What type of operations would you use this for? Perhaps a read mostly,
>>> write rarely type of scenario? Do you think it could scale to a
>>> situation where there are a lot of sustained writes?
>>
>> Various parts of a compiler. Picture a source file with #include
>> files. You can load the one file, lex everything, assign keywords,
>> begin parsing, encounter an #include file, repeat the process on it,
>> then load that loaded + lexed + tokenized chain into the original
>> chain after the #include line, etc.
>>
>> SControl maintains every loaded #include file, and its full token
>> range start to end, and the source code is one continuous expression
>> from top to bottom.
>
> Is this an actual compiler, or that header-file reducer you mentioned elsewhere?
Actual compiler.
> And will the list elements be actual tokens?
I call them components. In this expression there would be these:
a = b + c;
[a][whitespace][=][ws][b][ws][+][ws][c][;]
The whitespaces are removed, leaving:
[a][=][b][+][c][;]
The expression is reduced to its footprint by the semicolon terminator:
[a][=][b][+][c]
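Roughly, that chain is a linked list of components.  A minimal sketch,
with illustrative names (not necessarily SControl's actual structures),
might look like:

    /* Hypothetical component node -- names are illustrative only */
    enum { COMP_IDENT, COMP_OP, COMP_WS, COMP_TERM };

    typedef struct SComp {
        struct SComp *prev;    /* previous component in the chain */
        struct SComp *next;    /* next component in the chain     */
        int           type;    /* COMP_IDENT, COMP_WS, etc.       */
        const char   *text;    /* pointer into the loaded source  */
        int           length;  /* length of the lexeme in bytes   */
    } SComp;

    /* Unlink every whitespace component, leaving [a][=][b][+][c][;] */
    void remove_whitespace(SComp **head)
    {
        SComp *c = *head;
        while (c) {
            SComp *next = c->next;
            if (c->type == COMP_WS) {
                if (c->prev) c->prev->next = c->next;
                else         *head         = c->next;
                if (c->next) c->next->prev = c->prev;
                /* node could be freed, or kept for round-tripping */
            }
            c = next;
        }
    }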
> Because it seems an elaborate way of handling it. When would you delete
> tokens that have already long since been parsed? (And false #if/#elif blocks
> already dealt with.)
Most of them are not actually deleted, but migrated into an operations
chain, which then auto-injects whatever is needed to make the
expression work functionally.
Suppose you have:
float a; // 32-bit floating point
double b, c; // 64-bit floating point
set_values(&b, &c);
a = b + c;
In order to get this to work, you have to change the operation (ops)
order to something usable:
step 1: add b,c
step 2: round to 32-bit float, saturate if need be
step 3: store a
So your actual expression "a = b + c;" becomes something more like:
a = saturate_f64_to_f32(b + c);
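For reference, one possible reading of that injected function is to
clamp to the 32-bit float range rather than overflow to infinity (the
exact saturation rule here is an assumption, not necessarily CAlive's
definitive behavior):

    #include <float.h>
    #include <math.h>

    /* One reading of "round to 32-bit float, saturate if need be":
       out-of-range values clamp to +/-FLT_MAX instead of going to
       infinity.  CAlive's actual rule may differ. */
    float saturate_f64_to_f32(double v)
    {
        if (isnan(v))      return (float)v;  /* NaN passes through      */
        if (v >  FLT_MAX)  return  FLT_MAX;  /* clamp positive overflow */
        if (v < -FLT_MAX)  return -FLT_MAX;  /* clamp negative overflow */
        return (float)v;                     /* in range: just round    */
    }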
So you wind up injecting operations into your token chain:
[a][=][saturate_f64_to_f32][(][b][+][c][)]
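Injecting those components amounts to a splice on the chain; something
along these lines (reusing the illustrative SComp node from above)
would do it:

    /* Insert a new component immediately before 'where', e.g. to
       splice [saturate_f64_to_f32][(] in ahead of [b]. */
    void insert_before(SComp **head, SComp *where, SComp *comp)
    {
        comp->prev = where->prev;
        comp->next = where;
        if (where->prev) where->prev->next = comp;
        else             *head             = comp;
        where->prev = comp;
    }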
Then you have physical operation steps:
01: [load b to left]
02: [load c to right]
03: [add]
04: [store result to t0]
05: [push t0 to stack]
06: [call saturate_f64_to_f32]
07: [store return value to t1]
08: [load t1]
09: [store t1 to a]
10: [store t1 to t2]
Tokens can then be deleted from the ops chain after they're consumed.
After step 4, [b][+][c] are replaced with [t0], reducing the expression
to:
[a][=][saturate_f64_to_f32][(][t0][)]
After step 7, [saturate_f64_to_f32][(][t0][)] are replaced with [t1]:
[a][=][t1]
After steps 9 and 10, the final result [t2] is all that remains, and
all processing of that expression is completed:
[t2]
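Each of those replacements is again only a splice, something like this
(illustrative, reusing the SComp node from above):

    /* Splice the consumed span [first..last] out of the chain and put
       a single temp component in its place, e.g. [b][+][c] -> [t0]. */
    void replace_span(SComp **head, SComp *first, SComp *last,
                      SComp *temp)
    {
        temp->prev = first->prev;
        temp->next = last->next;
        if (first->prev) first->prev->next = temp;
        else             *head             = temp;
        if (last->next)  last->next->prev  = temp;
        /* the consumed components can now be deleted or recycled */
    }

The same splice covers the later replacements with [t1] and [t2].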
-----
In this way, everything in source code is lexed and parsed as is, and
wherever things need to be injected or deleted, they can be.  The
common expression parsing engine injects auto-fixups for type needs,
auto-injects functions like saturate_f64_to_f32() for CAlive's needs,
and so on.
These are all done with internal codes for the operations taking place,
and are not done with source code. The only source code references
which remain are for named tokens. The rest are all symbolic.
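In rough terms, the internal codes and the ops chain could look
something like this (illustrative names only, not CAlive's real
encoding):

    struct SComp;                /* the component node sketched above */

    /* Hypothetical internal op codes */
    typedef enum {
        OP_LOAD_LEFT, OP_LOAD_RIGHT, OP_ADD,
        OP_STORE_TEMP, OP_PUSH, OP_CALL, OP_STORE
    } OpCode;

    /* The ops chain is itself a linked list of steps */
    typedef struct SOp {
        struct SOp   *prev, *next;
        OpCode        code;         /* which internal operation       */
        struct SComp *operand;      /* named token, temp, or function */
    } SOp;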
> Also, if the list represents a flattening of all tokens in source code that
> includes #includes, then that structure can not only be deeply nested, but
> the same header, and the same tokens, can occur multiple times. I think
> recursively too.
Correct. This was an issue. I decided for CAlive to give #include a
new meaning. Since CAlive does not require forward-declarations, when
you use #include it will only load a given source file once, keeping
track of what has already been loaded so it is never loaded a second
time.
The expression here:
#include <something.h>
Which is:
[#include][<][something.h][>]
It's converted to this after loading, or on any subsequent reference
once it's already been loaded:
[#include_loaded][<][something.h][>]
So whether it loads or not, the directive is marked as loaded and that
line is completed.
Then, to meet the cases where a source file actually does need to be
included in more than one place, I introduce #reinclude:
#reinclude "myfile.h"
This will load the source file as is multiple times, wherever it is
referenced. And when you compile in -c90 or -c99 mode, it will work as
it does in C, treating each #include as a #reinclude internally.
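The load-once bookkeeping amounts to something like this (a rough
sketch with illustrative names; the real implementation will differ):

    #include <string.h>

    #define MAX_INCLUDES 1024

    static const char *loaded[MAX_INCLUDES];
    static int         loaded_count;

    /* Returns 1 if the file should be loaded now, 0 if it has already
       been seen and the directive can just be marked
       [#include_loaded].  #reinclude skips the check entirely. */
    int should_load(const char *filename, int is_reinclude)
    {
        int i;
        if (is_reinclude)
            return 1;                           /* load it every time  */
        for (i = 0; i < loaded_count; ++i)
            if (strcmp(loaded[i], filename) == 0)
                return 0;                       /* already loaded once */
        if (loaded_count < MAX_INCLUDES)
            loaded[loaded_count++] = filename;  /* assumes name persists */
        return 1;
    }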
-----
In CAlive, because forward-declarations are not a requirement,
everything for the compilation is loaded, including the things inside
unused #ifdef..#endif blocks. They are then parsed in subsequent
passes, and are ultimately deleted if they go unused.
And again, in -c90 or -c99 mode, it works as it does today. My goal is
to get CAlive mode working in total, and then go back and add in the
constraints and relaxations C requires. I do not expect to have full
-c90 or -c99 support until probably 2022 because it is a low priority
for me. My true aim is to draw people away from C and C++ and get them
using CAlive for their future projects.
--
Rick C. Hodgin