
OT: SmallC. Steve, I've looked at the heirs. It should be easy to implement them for && and ||.


Rod Pemberton

Sep 19, 2010, 6:40:22 AM

Steve,

I've looked at the SmallC heirX() routines. It should be easy to implement
them for logical-and (&&) and logical-or (||).

C doesn't officially have precedence levels, but it effectively has 16
precedence levels (H&S, "C: A Reference Manual", 3rd ed. with corrections,
p. 167). SmallC seems to have 11: it skips a few levels, it merges two
levels into one, and one implemented level is trivially incomplete.
However, it should be relatively straightforward to add heir() routines for
"&&" and "||", logical-and and logical-or respectively.

Each heirX() corresponds to one precedence level. They are called from the
lowest precedence to the highest: heir1() calls heir2() before doing
anything else, heir2() calls heir3(), heir3() calls heir4(), and so on. To
add "&&" and "||", one needs to insert two levels between heir1() and
heir2(). The first level is for "||". Its routine should be much like
heir2(), except that it has match("||") instead of match("|"), and zor() is
changed to a logical-or routine. The second level is for "&&" and should be
like heir4(); again, match() and zand() must be adjusted. The heirX() calls
within each routine must be adjusted to call the next higher level: heir1()
is modified to call the first inserted level, for "||"; that level calls the
second inserted level, for "&&"; and the second inserted level calls the
original heir2().
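A minimal sketch of that level chain, assuming a toy evaluator rather than SmallC's code generator: the function names mirror the heirX() routines but are hypothetical, the code computes values directly instead of emitting assembly, and match_single()/match_double() are invented helpers. Only the levels from "||" down to "&" plus a primary are shown. Definitions run from the highest level to the lowest, so only heir1b() needs a forward declaration (primary() re-enters it for parentheses).

```c
#include <ctype.h>

/* Hypothetical evaluator, not SmallC code: one function per precedence
   level, each calling the next-higher level first as the heirX()
   routines do. It computes values instead of emitting assembly. */
static const char *src;                 /* current scan position */

static long heir1b(void);               /* lowest level; primary() re-enters it */

static void skip_blanks(void) { while (*src == ' ') src++; }

/* Match a one-character operator unless it is the first half of a
   doubled one, so '&' never takes half of "&&" (same for '|'). */
static int match_single(char c)
{
    skip_blanks();
    if (src[0] != c || src[1] == c)
        return 0;
    src++;
    return 1;
}

/* Match a doubled operator such as "&&" or "||". */
static int match_double(char c)
{
    skip_blanks();
    if (src[0] != c || src[1] != c)
        return 0;
    src += 2;
    return 1;
}

static long primary(void)               /* number or ( expression ) */
{
    long v = 0;
    skip_blanks();
    if (*src == '(') {
        src++;
        v = heir1b();                   /* back to the lowest level */
        skip_blanks();
        if (*src == ')')
            src++;
        return v;
    }
    while (isdigit((unsigned char)*src))
        v = v * 10 + (*src++ - '0');
    return v;
}

static long heir4(void)                 /* & */
{
    long v = primary();
    while (match_single('&'))
        v &= primary();
    return v;
}

static long heir3(void)                 /* ^ */
{
    long v = heir4();
    while (match_single('^'))
        v ^= heir4();
    return v;
}

static long heir2(void)                 /* | */
{
    long v = heir3();
    while (match_single('|'))
        v |= heir3();
    return v;
}

static long heir1c(void)                /* && : shaped like heir4() */
{
    long v = heir2();
    while (match_double('&')) {
        long r = heir2();               /* next level is the original heir2() */
        v = (v != 0) && (r != 0);       /* logical-and instead of zand() */
    }
    return v;
}

static long heir1b(void)                /* || : shaped like heir2() */
{
    long v = heir1c();
    while (match_double('|')) {
        long r = heir1c();
        v = (v != 0) || (r != 0);       /* logical-or instead of zor() */
    }
    return v;
}

long eval(const char *text)
{
    src = text;
    return heir1b();
}
```

For example, eval("2 & 3 && 1") gives 1: match_single('&') declines the first character of "&&", so heir4() stops and heir1c() gets the operator. In a real compiler the logical levels would also have to skip the right operand's code when the left one already decides the result.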

I haven't tested this, or implemented the assembly yet, but it should work.
The missing "." and "->" (direct and indirect selection) would probably be
too complex to add. FYI, they go in heir11(). The postfix "++" and "--",
which are in heir10(), are supposed to be in heir11(). There are two "++"
and two "--" sections in heir10(). I believe the prefix "++" and "--" come
at the start of heir10() and the postfix ones are the two that come later.
The sizeof operator is supposed to be in heir10(). Casts are supposed to be
between heir10() and heir9(). The incomplete level is heir10(), which is
missing the trivial unary plus (not the addition plus, which is in
heir8()). The compound assignment operators aren't implemented; they go in
heir1(). The lowest levels are supposed to be, from highest to lowest
precedence:

& (implemented)
^ (implemented)
| (implemented)
&& (unimplemented)
|| (unimplemented)
?: (unimplemented)
= += -= *= /= %= <<= >>= &= ^= |= (mostly unimplemented)
, (unimplemented)


That would require inserting 3 levels between heir1() and heir2(). However,
supporting the ternary operator ?: may be difficult. Its behavior is
runtime dependent: only one of the second and third operands is evaluated,
depending on the value of the first. Adding a level for "," would require 4
new levels, with heir1() reworked for "," instead of "=". heir1() appears to
be a "base" level, so I don't think you can insert a level before it.


Rod Pemberton

s_dub...@yahoo.com

Sep 19, 2010, 8:38:52 PM
On Sep 19, 5:40 am, "Rod Pemberton" <do_not_h...@notreplytome.cmm>
wrote:

> Steve,
>
> I've looked at the SmallC heir's.  It should be easy to implement them for
> logical-and && and logical-or ||.
>
> C doesn't officially have precedence levels, but it effectively has 16
> precedence levels. (H&S "C. Ref. Man." 3rd w/corrections, pg. 167)  It seems
> SmallC has 11.  It skipped a few levels.  It merged two levels together.
> And, one implemented level is trivially incomplete.  However, it should be
> relatively straightforward to add heir()'s for "&&" and "||", logical-and
> and logical-or respectively.
>
Hi Rod,
see: http://www.project-fbin.hostoi.com/CCN8_HTM/CCN8.19sep2010.C.HTML
I've put up this work-in-progress code from March in response to your
msg here.
Search the code for: /* >>>>>>> start of cc5 <<<<<<< */
as there I have comments about the precedence levels. I'd appreciate
any comments you may have. I thought the precedence of operators, and
their associativity, were officially established in K&R; perhaps that's
another misconception I have about C syntax.

-in short-
/** - WARNING, pre-release, work-in-progress, WARNING! **/
/**----------------------------------------------------**/
/** File: CCN8.C By: s_dub...@yahoo.com **/
/** ?(C): COPYLEFT, spin fold or mungicate it, just re-**/
/** name it avoid confusion with this named codebase.**/
/** Last: 19-Sep-10 06:12:18 PM **/
/** Prev: 28-Mar-10 10:59:04 PM, 30-Mar-10 08:19:35 AM **/
/** 01-Apr-10 09:23:04 PM. **/
/** Base: 17-Feb-10 11:23:53 AM **/
/** Vers: 0.0.2 r7hd 26-Mar-2010 **/
/**----------------------------------------------------**/
/** 19-Sep-10 06:12:18 PM - added copyleft. this is **/
/** BETA pre-release code, testing and code incomplete **/
/**----------------------------------------------------**/
/** 01-Apr-10 04:14:34 PM 2rhd **/
/** worked on ?: **/
/**----------------------------------------------------**/
/** 28-Mar-10 09:08:12 PM 2r7hc **/
/** - worked on - !, ~, **/
/**----------------------------------------------------**/
/** 20-Mar-10 09:57:15 AM 2r7h **/
/** inbyte()->scan_nxt(), heir1b for '||' boolean **/
/**----------------------------------------------------**/
/** 19-Mar-10 10:22:58 PM 2r7h **/
/** - worked on heir1c for '&&' boolean - **/
/**----------------------------------------------------**/
/** 17-Mar-10 09:22:40 AM 2r7h **/
/** -begin adding functionality- **/

I've had work on this code parked since March: the syntax for arrays of
pointers to char is not handled properly, among other things.

Steve

Maxim S. Shatskih

Sep 20, 2010, 12:31:27 AM
> I've looked at the SmallC heir's. It should be easy to implement them for

To me, parsing C by recursive descent is kind of a toy: good as a proof of concept and in an educational book, but not good for real practice.

--
Maxim S. Shatskih
Windows DDK MVP
ma...@storagecraft.com
http://www.storagecraft.com

Alexei A. Frounze

Sep 20, 2010, 12:49:53 AM
On Sep 19, 9:31 pm, "Maxim S. Shatskih" <ma...@storagecraft.com.no.spam> wrote:
> > I've looked at the SmallC heir's.  It should be easy to implement them for
>
> For me, parsing C by recursive descent is kinda a toy. Good as a proof-of-concept and in the educational book, not good for real practice.

Parsing C code (C99), checking it for syntactic and other errors, and
transforming it into executable code isn't all that toyish, even if it's
more or less straightforward. I'd say parsing ASM, Basic, or Lisp is a
toy. :) And Pascal probably as well.

Alex

Alexei A. Frounze

Sep 20, 2010, 1:02:15 AM
On Sep 19, 3:40 am, "Rod Pemberton" <do_not_h...@notreplytome.cmm>
wrote:

> C doesn't officially have precedence levels, but it effectively has 16
> precedence levels. (H&S "C. Ref. Man." 3rd w/corrections, pg. 167)

Um, the C standard does contain phrases such as "precedence of
operators" (section 6.5, Expressions), but where it effectively
establishes this precedence it uses slightly different wording, if any
at all. It introduces operators and their respective expressions (e.g.,
postfix-expression, unary-expression, ..., multiplicative-expression,
..., assignment-expression) and defines operator expressions in terms of
other (operator) expressions. For example, a multiplicative-expression is
built from cast-expressions, so a cast binds tighter than *. This relative
definition of (sub)expressions establishes what we know as operator
precedence.

Alex

Rod Pemberton

Sep 20, 2010, 6:27:39 AM
<s_dub...@yahoo.com> wrote in message
news:c5a86133-e3e5-4165...@f26g2000vbm.googlegroups.com...

On Sep 19, 5:40 am, "Rod Pemberton" <do_not_h...@notreplytome.cmm>
wrote:
>
> > I've looked at the SmallC heir's. It should be easy to implement them
> > for logical-and && and logical-or ||.
>
> > C doesn't officially have precedence levels, but it effectively has 16
> > precedence levels. (H&S "C. Ref. Man." 3rd w/corrections, pg. 167) It
> > seems SmallC has 11. It skipped a few levels. It merged two levels
> > together. And, one implemented level is trivially incomplete. However,
> > it should be relatively straightforward to add heir()'s for "&&" and
> > "||", logical-and and logical-or respectively.
>
> see:
> http://www.project-fbin.hostoi.com/CCN8_HTM/CCN8.19sep2010.C.HTML
> I've put up this work-in-progress code from march in responce to your
> msg here. search the code for: /* >>>>>>> start of cc5 <<<<<<< */
> as there I have comments about the precedence levels. I'd appreciate
> any comments you may have.

Ok.

> I thought the precedence of operators, and
> their R-L associativity, is officially established in K&R,
>

Got a few minutes? ... Long post ahead.

> K&R,

Sorry, I never read that one. IIRC, the .pdf for a newer version is freely
available online. I've got a large collection of C books, packed away. I
learned C from a bunch of other books, but found Harbison and Steele, "C: A
Reference Manual", 3rd ed., 1991, Prentice Hall, to be invaluable. It's the
only one I look at. It covers Traditional C and ANSI C. There are newer
editions, e.g., for C99. H&S says that the grammar effectively defines the
precedence levels. They had 17 levels, but an error correction left 16.

> I've put up this work-in-progress

Yeah, I attempted adding some of the functionality from RM Yorston's Z80
version. I wanted the struct support. But he radically changed a bunch of
stuff. Some of it drops right in, but the rest is not a good fit. He also
used newer features, like structs, within the SmallC source itself, which I
was having to figure out how to implement with #defines and arrays. I've
got another one on the back burner too. It added a stack frame, which,
IIRC, you did for one of yours. I'm not working on "my version" as
enthusiastically as you are on "yours". But I think I'm going to leave the
stack frame attempt there for a while.

> I'd appreciate any comments you may have.

It looks good to me, for whatever that's worth... ;) I'm trying to keep the
old routine names and original functionality, etc. Once routines get
renamed or their functionality changes too much, code from other SmallC
versions no longer just "drops in". I'd like to keep some "cut-n-paste"
ability.

I'm guessing there was a collision between the matches for & and &&, and
for | and ||, since you added some checks. I notice that you didn't change
& and | to && and || in the compiler's own source, as Yorston did with
structs... Are you still bootstrapping from an earlier version? From the
ternary operator routine, I'd guess you've figured out how the parsing
works. I haven't fully. Are you just converting it to if-else, or are you
preserving the runtime semantics?


FYI, the H&S 3rd ed. summary says ?: is:

-right associative
-first operand is scalar
-second and third operands are subject to unary conversions
-first operand is evaluated
-first operand tested against zero via ==
-if 0:
--third operand evaluated
--third operand type conversion
--second operand not evaluated
-if not 0:
--second operand evaluated
--second operand type conversion
--third operand not evaluated

It has two charts, one for ANSI C and one for traditional C, covering the
allowed type conversions between the arguments. I suspect the "not
evaluated" part is what most call the run-time semantics.
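The "not evaluated" part can be checked with any C compiler. In this sketch, pick(), second() and third() are hypothetical names; the counters show that exactly one of the second and third operands runs, which is what a compiler preserving the runtime semantics has to reproduce, typically by jumping around the unselected operand rather than evaluating both.

```c
/* Counters record which operands of ?: actually ran. */
static int second_calls, third_calls;

static int second(void) { second_calls++; return 2; }
static int third(void)  { third_calls++;  return 3; }

/* cond ? second() : third().  Generic pseudo-assembly for what a
   compiler has to emit (not SmallC output):
       test primary; jump-if-zero Lelse
       <code for second operand>; jump Lend
   Lelse:
       <code for third operand>
   Lend:                                                          */
int pick(int cond)
{
    second_calls = third_calls = 0;
    return cond ? second() : third();
}
```

pick(5) returns 2 and leaves third_calls at 0; pick(0) returns 3 and leaves second_calls at 0.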


> [...] I have comments about the precedence levels.

"Your" current heirX()'s versus H&S levels (since my other post was
cryptic...):

heir1()    =
           TBI: += -= /= *= %= <<= >>= &= ^= |=
heir1a()   ?:
heir1b()   ||
heir1c()   &&
heir2()    |
heir3()    ^
heir4()    &
heir5()    == !=
heir6()    < > <= >=
heir7()    >> <<
heir8()    + -
heir9()    * / %
heir10()   ++ -- - ~ ! * & ++ --
           TBI: cast sizeof
           I think both prefix and postfix ++ -- are here... ?
heir11()   () [] -> .

(TBI = to be implemented)


If you implemented all 17 H&S levels, it would be:

 1  heir0()    ,  (comma, for sequential evaluation)
 2  heir1()    = += -= /= *= %= <<= >>= &= ^= |=
 3  heir1a()   ?:
 4  heir1b()   ||
 5  heir1c()   &&
 6  heir2()    |
 7  heir3()    ^
 8  heir4()    &
 9  heir5()    == !=
10  heir6()    < > <= >=
11  heir7()    >> <<
12  heir8()    + -
13  heir9()    * / %
14  heir9a()   cast
15  heir10()   ++ -- + - ~ ! * & sizeof
               (unary + -, prefix ++ --, indirection *, address-of &)
16  (removed)
17  heir11()   ++ -- () [] -> . function calls, names, literals
               (postfix ++ --)
heir1() through heir9() are the same. "heir0" and "heir9a" are "new". I
think heir10() gets split: casts move to heir9a(), postfix ++ -- move to
heir11(), prefix ++ -- stay, and unary + gets added to it.
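A quick way to sanity-check the table is to compile unparenthesized expressions with an ordinary C compiler. The function below (precedence_checks() is a hypothetical name) returns 1 if every grouping matches what the table predicts; each value was picked so the opposite grouping would give a different answer, and GCC may print -Wparentheses warnings for them.

```c
/* Each line leaves the expression unparenthesized on purpose; the
   comment shows the grouping the table predicts and, where useful,
   what the other grouping would give. */
int precedence_checks(void)
{
    int ok = 1;
    int x = 1, a, b;
    int arr[2] = { 10, 20 };
    int *p = arr;

    ok &= (1 << 2 + 1) == 8;          /* 1 << (2 + 1); not (1 << 2) + 1 == 5 */
    ok &= (5 & 3 == 3) == 1;          /* 5 & (3 == 3); not (5 & 3) == 3 -> 0 */
    ok &= (6 ^ 3 & 5) == 7;           /* 6 ^ (3 & 5); not (6 ^ 3) & 5 == 5   */
    ok &= (1 | 2 ^ 3) == 1;           /* 1 | (2 ^ 3); not (1 | 2) ^ 3 == 0   */
    ok &= (1 || 1 && 0) == 1;         /* 1 || (1 && 0); not (1 || 1) && 0    */
    ok &= (x ? 2 : 0 ? 3 : 4) == 2;   /* ?: groups right to left; left -> 3  */
    ok &= (double)7 / 2 == 3.5;       /* cast binds tighter than /           */
    ok &= *p++ == 10 && p == arr + 1; /* postfix ++ binds tighter than *     */
    b = (a = 5, 6);                   /* = binds tighter than ,              */
    ok &= a == 5 && b == 6;
    return ok;
}
```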


Questions:

In heir1b() and heir1c(), you convert the two arguments to bool via
sa_mk_bool(). The second argument is converted after a call to another
heirX() level; the first is converted before that heirX() call. Should the
first be converted after? Should it be popped, converted, and pushed again
after the "if(heirX())" in the while(1)? I don't know what the correct
answer is. I'm just asking whether converting the argument before calling
another heirX() might mess up the pushed argument's value, i.e., does
another heirX() level use this pushed value?

Is there some code from RMY's Z80 version? I see that odd callrts()
function in there...


Future?

Well, maybe your next step is fixing the parameter lists for ANSI C, and
then structs. Then you can use GCC, etc., to verify the update to ANSI C.
IIRC, you'll need to reorder the heirX() routines from 11 down to 1, since
GCC won't like the current ordering without function declarations, e.g.,
the forward reference from heir4() to heir5(). One of the problems I had
with the version I'm working on was that there are quite a few implicit
things in this C: all pointers are "char *", all types are "int" or "char",
there are no typedefs and no structs (arrays and defines are used instead),
& and | are used instead of && and ||, etc. GCC complained about that stuff
with the -ansi flag.
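A small sketch of the forward-reference point, under the assumption (typical of recursive descent, and I believe true of SmallC's primary()) that the parenthesized case re-enters the lowest level: even with definitions ordered from the highest level down, one prototype is still needed. The grammar here is a toy (single digits, '+', parentheses) and eval_sum() is a hypothetical name.

```c
/* Definitions go from the highest level down, but primary() still
   calls heir1() for "( ... )", so heir1() needs a prototype first. */
static int heir1(const char **p);      /* forward declaration */

static int primary(const char **p)     /* highest level in this toy */
{
    if (**p == '(') {
        int v;
        (*p)++;                          /* skip '(' */
        v = heir1(p);                    /* recursion back to the lowest level */
        if (**p == ')')
            (*p)++;                      /* skip ')' */
        return v;
    }
    if (**p >= '0' && **p <= '9')
        return *(*p)++ - '0';            /* single digit */
    return 0;
}

static int heir1(const char **p)       /* lowest level in this toy: '+' */
{
    int v = primary(p);
    while (**p == '+') {
        (*p)++;
        v += primary(p);
    }
    return v;
}

int eval_sum(const char *text)
{
    return heir1(&text);
}
```

With prototypes for all the levels, the routines could stay in their current 1-to-11 order as well.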


Rod Pemberton


Maxim S. Shatskih

Sep 20, 2010, 6:47:09 AM
>more or less straightforward. I'd say parsing ASM, Basic, Lisp is a
>toy. :) And Pascal probably as well.

By recursive descent??? I was talking about recursive descent, not C.

Pascal, BTW, is better parsed by LL(1).

Rod Pemberton

Sep 20, 2010, 8:36:59 AM
"Rod Pemberton" <do_no...@notreplytome.cmm> wrote in message
news:i77cua$ajb$1...@speranza.aioe.org...

> <s_dub...@yahoo.com> wrote in message
> news:c5a86133-e3e5-4165...@f26g2000vbm.googlegroups.com...
> On Sep 19, 5:40 am, "Rod Pemberton" <do_not_h...@notreplytome.cmm>
> wrote:
> >

[SNIP long reply by me]

> > I'd appreciate any comments you may have.
>

> I'm guessing there was collision with matches for & and &&, | and ||,
> since you added some checks.
>

Did you complicate heir1c() just a bit?

There is a collision between && and &, and between || and |. That needed
an extra check for "&&" in heir4(). "My" heir2_b() is "your" heir1c(). Mine
is identical to the original heir4() except for match("&&") and the call to
a different heirX() level. You've changed the chr() check for '&' to a
string check for "&&"... "My" logical || and | are done the same way. This
seems to work for both. lang() and lor(), instead of zand() and zor(), take
care of the logical conversion before the bitwise and-ing and or-ing, i.e.,
there's no sa_mk_bool within the heirX() levels. It does require a few more
lines of assembly code to handle the secondary register, but fewer than the
sa_mk_bool() calls need. For the logical conversion, I used the "NEG reg;
SBB reg,reg" sequence: zero for zero, -1 otherwise. It's register-size
independent and requires no jump.

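heir4(lval)
int lval[];
{
    int k,lval2[2];

    k=heir5(lval);
    blanks();
    if(chr()!='&')
        return k;
    if(streq(line+lptr,"&&"))       /* new: "&&" is left for heir2_b() */
        return k;                   /* new */
    if(k)
        rvalue(lval);
    while(1)
    {
        blanks();                   /* new */
        if(streq(line+lptr,"&&"))   /* new: stop at an "&&" that follows */
            return 0;               /* new: an "&" handled in this loop  */
        if(match("&"))
        {
            zpush();                /* save left operand */
            if(heir5(lval2))
                rvalue(lval2);
            zpop();                 /* left operand -> secondary */
            zand();                 /* primary = secondary & primary */
        }
        else
            return 0;
    }
}

heir2_b(lval)
int lval[];
{
    int k,lval2[2];

    k=heir2(lval);
    blanks();
    if(chr()!='&')
        return k;
    if(k)
        rvalue(lval);
    while(1)
    {
        if(match("&&"))             /* diff: "&&" instead of "&" */
        {
            zpush();
            if(heir2(lval2))        /* diff: next level is heir2() */
                rvalue(lval2);
            zpop();
            land();                 /* diff: logical and, not zand() */
        }
        else
            return 0;
    }
}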
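Here is a rough C model of the NEG/SBB conversion and of what lang() and lor() compute; lnorm(), lang_value() and lor_value() are hypothetical names, and this is not the actual assembly. Two things worth keeping in mind if the scheme is used as-is: true comes out as -1, while C's && and || yield 1, which matters if the result is used as a value rather than a condition; and both operands are always evaluated, whereas C's && and || skip the right operand when the left one already decides the result.

```c
/* Model of "NEG reg; SBB reg,reg": on x86, NEG sets the carry flag
   exactly when reg was nonzero, and SBB reg,reg then leaves 0 - CF,
   i.e. 0 for zero and -1 otherwise. */
static long lnorm(long x)
{
    return -(long)(x != 0);
}

/* What lang()/lor() compute from the secondary and primary values:
   normalize both, then apply the bitwise operation. */
long lang_value(long secondary, long primary)
{
    return lnorm(secondary) & lnorm(primary);
}

long lor_value(long secondary, long primary)
{
    return lnorm(secondary) | lnorm(primary);
}
```

lang_value(2, 4) is -1 even though 2 & 4 is 0, which is why the normalization has to come before the bitwise AND; negating the result gives C's 1.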


Rod Pemberton


s_dub...@yahoo.com

Sep 20, 2010, 2:48:49 PM
On Sep 20, 5:27 am, "Rod Pemberton" <do_not_h...@notreplytome.cmm>
wrote:
> <s_dubrov...@yahoo.com> wrote in message

From other things you've pointed out from Harbison and Steele, it is a
valuable resource. I've yet to get it; I sure wish I'd had it back in the
day...

Well, small-c is, after all, a subset of K&R, so that is my definitive
guide, such as it is...

I guess one should ask oneself what the goal is (left for later
discussion). But my interests in it were: a. a self-compiling compiler,
and b. understanding the implementation of recursive descent parsing of
binary operators in expressions.

> only one I look at.  It covered Traditional C and ANSI C.  They have newer
> versions, e.g., for C99.  H&S says that the grammar effectively defines the
> precedence levels.  They had 17 levels, but an error correction left 16.
>
> > I've put up this work-in-progress
>
> Yeah, I attempted adding some of the functionality from RM Yorston's Z80
> version.  I was wanting the struct support.  But, he radically changed a
> bunch of stuff.  Some stuff drops right in, but the other stuff is not a
> good fit.  He also used newer features within the SmallC program like
> struct's that I was having to figure out how to implement with #define's and

I've gone back to Yorston's credits to check..(note1)

char Author[] = " Cain, Van Zandt, Hendrix, Yorston" ;


> arrays.  I've got another one on the back-burner too.  It added a
> stackframe, which IIRC, you did for one of yours.  I'm not working on "my
> version" as enthusiastically as you are on "yours".  But, I think I'm going
> to leave stackframe attempt there for a while.
>
> > I'd appreciate any comments you may have.
>
> It looks good to me, for whatever that's worth... ;)  I'm trying to keep the
> old namings of routines and original functionality, etc.  Once they get
> renamed or functionality gets changed too much things from other SmallC
> versions don't just "drop-in" anymore.  I'd like to keep some "cut-n-paste"
> ability.
>
> I'm guessing there was collision with matches for & and &&, | and ||,
> since you added some checks.  I notice that you didn't update the & and | to
> && and || in the main program, e.g., like Yorston did with struct's...  Are
> you still bootstrapping from an earlier version?  From the ternary operator

Yes, earlier versions, before Van Zandt's DOUBLE,
Cain v1.1, &(Glen Fisher) v1.1,x, Van Zandt v1.2, Hendrix v2, Yorston

AFAICT, Hendrix copyrighted v2, his modification of the v1.x public
domain code, so I try to avoid v2 stuff altogether; I don't think any v2
stuff has crept in. My additions, unique to me, I now affirm as COPYLEFT.
The ternary ?: is cribbed from Chris Lewis's Unix cross-compiler version,
SCC3 (Small C version C3.0R1.1), whose readme says:
"
This directory contains the source for a version of Ron Cain's Small C
compiler that I have heavily modified - beyond the Small-C V2.0 later
published in Dr. Dobbs. This compiler generates assembler source code that
needs to be assembled and linked to make a running program.

Small C is a public domain compiler for a subset of C. The main things
lacking are "#if", structs/unions, doubles/floats/longs and more than
one level of indirection. Even so, it's powerful enough to be able to
compile itself. It's also lots of fun to play around with. It could
use lots of more work (eg: a real scanner), but what the heck...
Retargetting the compiler requires only relinking the frontend with a new
code generator.
"

So I infer that his code is public domain as well.

> routine, I'd guess you've figured out how the parsing works.  I haven't
> fully.  Are you just converting it to if-else or are you preserving the
> runtime semantics?
>

The ternary syntax isn't handled properly yet..
====== main()
Line +5, main+4: invalid expression
if ( y < z ) ? y : z;
^
...but my thoughts were to preserve the runtime semantics - as I say,
this is the 'work in progress'.

> FYI, the H&S 3rd ed. summary says ?: is:
>
> -right associative
> -first operand is scalar
> -second and third operands are subject to unary conversions
> -first operand is evaluated
> -first operand tested against zero via ==
> -if 0:
> --third operand evaluated
> --third operand type conversion
> --second operand not evaluated
> -if not 0:
> --second operand evaluated
> --second operand type conversion
> --third operand not evaluated
>
> It has two charts, one for ANSI C and one for traditional C, covering the
> allowed type conversions between the arguments.  I suspect the "not
> evaluated" part is what most call the run-time semantics.
>
> > [...] I have comments about the precedence levels.
>
> "Your" current heirX()'s versus H&S levels (since my other post was
> cryptic...):
>

Thanks for the comparisons, and the ternary info.

Yeah, my motto: first see if you can do it, then see if you did it
right!

There is a collision between bitwise and logical operations that was
supposed to be cleaned up by the BOOL type. Traditional C often uses
if (expr) ..., where expr evaluates to any value, but the if syntax
(generically) implies true or false. And since an expression can include
an assignment, things like if (c=a+2) ... have to be considered for their
assignment side effects too. One of those 'C-isms', I guess. I haven't
tried to confirm the standards-correct handling of it yet.
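For what it's worth, the standards-correct behavior is easy to check: the value of an assignment expression is the value stored, and if only asks whether that value is nonzero. In this sketch (assign_in_condition() and last_c are hypothetical names), c keeps the full a + 2, so any Bool conversion would have to apply only to the value used for the branch, not to what gets stored.

```c
/* Value left in c by the most recent call, so a caller can inspect it. */
static int last_c;

/* if ((c = a + 2)) tests the value of the assignment expression, which
   is the value stored in c. The zero/nonzero test only steers the
   branch; c itself keeps the full a + 2, not 0 or 1. */
int assign_in_condition(int a)
{
    int c;
    int taken = 0;

    if ((c = a + 2))
        taken = 1;
    last_c = c;
    return taken;
}
```

For a = -3 the branch is taken and c is -1, not 1.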

> I'm just asking if converting the argument prior to a calling another
> heirX() might mess up the pushed argument's value?  I.e., does another
> heirX() level use this pushed value?
>

Well, I'm not 100% sure. An expression is evaluated to some value, which
is left in the primary register as the initial operand (note2). When the
code for a precedence level, heirX(), finds the lexeme for its operator in
the source text, it pushes the current primary register onto the stack,
and the next expression is evaluated into the newly freed primary
register. The code for the current operator then pops the initial operand
into the secondary register and performs the binary operation between the
secondary and primary registers, leaving the result in the primary
register. This is the point of possible recursion if the expression
requires it; we are back at the state of (note2) above. So this sounds
like no, but...

Your question relates to the point at which a Bool conversion could mess
up a side effect of evaluating an expression, where the side effect is not
meant to be a Bool, as in an assignment, again like if (c=a+2) .... This is
one of those darker corners that needs careful inspection, and I don't
have the answer yet.
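A toy model of the push/pop sequence described above may help with the question. It assumes each nested level pops exactly what it pushes; zpush(), zpop() and zand() borrow SmallC's names but only simulate their effect on a primary register, a secondary register and a stack, and mk_bool() is a hypothetical stand-in for sa_mk_bool(). It evaluates x && (y & z) with the left operand converted either before the push or after the pop.

```c
/* Simulated run-time state of the generated code. */
struct machine {
    long primary, secondary;
    long stack[16];
    int sp;
};

static void zpush(struct machine *m)   { m->stack[m->sp++] = m->primary; }
static void zpop(struct machine *m)    { m->secondary = m->stack[--m->sp]; }
static void zand(struct machine *m)    { m->primary &= m->secondary; }
static void mk_bool(struct machine *m) { m->primary = (m->primary != 0); }

static int nested_balanced;  /* 1 if the inner level left sp unchanged */

/* x && (y & z); convert_early chooses whether the left operand of &&
   is made 0/1 before the push or after the pop. */
long land_of_band(long x, long y, long z, int convert_early)
{
    struct machine m = { 0 };
    int sp_before;

    m.primary = x;                   /* left operand of && */
    if (convert_early)
        mk_bool(&m);
    zpush(&m);                       /* saved while the right side runs */

    sp_before = m.sp;
    m.primary = y;                   /* inner & level: y & z */
    zpush(&m);
    m.primary = z;
    zpop(&m);
    zand(&m);
    nested_balanced = (m.sp == sp_before);

    mk_bool(&m);                     /* right operand to 0/1 */
    zpop(&m);                        /* left operand -> secondary */
    if (!convert_early)
        m.secondary = (m.secondary != 0);
    zand(&m);                        /* 0/1 & 0/1 acts as logical and */
    return m.primary;
}
```

Both orders give the same result because the inner & level leaves the stack as it found it. Skipping the left conversion entirely would give 2 & 1 == 0 in the first test case, so the conversion is needed; under this model, where it happens appears not to matter.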

> Is there some code from RMY's Z80 version?  I see that odd callrts()
> function in there...
>

(note1) callrts() first appears in gtf's (Glen Fisher's) distribution of
small-c from 1980; he did the CP/M-80 port of Ron Cain's public domain
code in order to distribute it to CP/M-80 fans.

http://www.project-fbin.hostoi.com/Proj_SmC/CainC/C80CPM.C.HTML

header(){ ...
callrts("ccgo"); /* set default drive for CP/M */
zcall("main"); /* call the code generated by small-c */
zcall("exit"); /* do an exit gtf 7/16/80 */
... }

> Future?

Good leading question. In a way I've satisfied my interests a. and b.,
and the effort has certainly taught me a lot, cleared up some questions,
and introduced new ones.

>
> Well, maybe your next step is fixing parameter list for ANSI C and then
> structs.  Then, you can use GCC etc. to verify updating it to ANSI C.  IIRC,
> you'll need to reorder the heirX()'s from 11 to 1, since GCC won't like that
> ordering without function declarations, e.g., forward reference from hier4()
> to heir5() etc.  One of the problems I had with the version I'm working on
> was that there are quite a few implicitly done things in this C, like all
> pointers being "char *", all types as "int" or "char", no typedef's, and no
> struct's - using "arrays" and defines, & and | instead of && and ||, etc.
> GCC complained about that stuff with -ansi flag.
>

Yes. I've looked at initializing variables at declaration time; the
parsing for it looks to be a major rewrite. I've also looked at memory
models beyond .com, as more space will be needed soon. Updating to ANSI C
seems a long way off, as the implementation of memory allocation is
specific to an environment.

But, what are your goals for small-c?

Thx,
Steve

> Rod Pemberton

s_dub...@yahoo.com

unread,
Sep 20, 2010, 3:06:23 PM9/20/10
to
On Sep 20, 5:27 am, "Rod Pemberton" <do_not_h...@notreplytome.cmm>
wrote:
> <s_dubrov...@yahoo.com> wrote in message

From other things you've pointed out from 'Harbison and Steele' it is


a valuable resource, I've yet to get it, sure wish I'd had it back in
the day..

Well, small-c is, after all, a subset of K&R, so that is my definitive
guide, such as it is..

I guess one should ask oneself what the goal is. (left for later
discussion). But my interest in it was; a. self-compiling compiler,
b. understanding the implementation of recursive descent parsing of
binary operators on expressions.

> only one I look at. It covered Traditional C and ANSI C. They have newer


> versions, e.g., for C99. H&S says that the grammar effectively defines the
> precedence levels. They had 17 levels, but an error correction left 16.
>
> > I've put up this work-in-progress
>
> Yeah, I attempted adding some of the functionality from RM Yorston's Z80
> version. I was wanting the struct support. But, he radically changed a
> bunch of stuff. Some stuff drops right in, but the other stuff is not a
> good fit. He also used newer features within the SmallC program like
> struct's that I was having to figure out how to implement with #define's and

I've gone back to Yorston's credits to check..(note1)

char Author[] = " Cain, Van Zandt, Hendrix, Yorston" ;

> arrays. I've got another one on the back-burner too. It added a
> stackframe, which IIRC, you did for one of yours. I'm not working on "my
> version" as enthusiastically as you are on "yours". But, I think I'm going
> to leave stackframe attempt there for a while.
>
> > I'd appreciate any comments you may have.
>
> It looks good to me, for whatever that's worth... ;) I'm trying to keep the
> old namings of routines and original functionality, etc. Once they get
> renamed or functionality gets changed too much things from other SmallC
> versions don't just "drop-in" anymore. I'd like to keep some "cut-n-paste"
> ability.
>
> I'm guessing there was collision with matches for & and &&, | and ||,
> since you added some checks. I notice that you didn't update the & and | to
> && and || in the main program, e.g., like Yorston did with struct's... Are
> you still bootstrapping from an earlier version? From the ternary operator

Yes, earlier versions, before Van Zandt's DOUBLE,


Cain v1.1, &(Glen Fisher) v1.1,x, Van Zandt v1.2, Hendrix v2, Yorston

AFAICT, Hendrix at v2 copyrighted, after the modification of the v1.x
public domain code, so I try to avoid v2 stuff altogether, I don't
think any of v2 stuff has crept in. My additions, unique to me, I
affirm as COPYLEFT now. The ternary ?: is cribbed from Chris Lewis's
Unix cross compiler version SCC3 Small C version C3.0R1.1 of which the
readme says:
"
This directory contains the source for a version of Ron Cain's Small
C
compiler that I have heavily modified - beyond the Small-C V2.0 later
published in Dr. Dobbs. This compiler generates assembler source code
that
needs to be assembled and linked to make a running program.

Small C is a public domain compiler for a subset of C. The main
things
lacking are "#if", structs/unions, doubles/floats/longs and more than
one level of indirection. Even so, it's powerful enough to be able to
compile itself. It's also lots of fun to play around with. It could
use lots of more work (eg: a real scanner), but what the heck...
Retargetting the compiler requires only relinking the frontend with a
new
code generator.
"

So I infer his code as 'public domain' as well.

> routine, I'd guess you've figured out how the parsing works. I haven't


> fully. Are you just converting it to if-else or are you preserving the
> runtime semantics?
>

The ternary syntax isn't handled properly yet..


====== main()
Line +5, main+4: invalid expression
if ( y < z ) ? y : z;
^
..but my thoughts were to preserve the runtime sematics - as I say,
this is the 'work in progress'.

> FYI, H&R 3rd summary says ?: is:


>
> -right associative
> -first argument is scalar
> -second and third arguments are unary conversions
> -first operand is evaluated
> -first operand tested against zero via ==
> -if 0:
> --second operand evaluated
> --second operand type conversion
> --third operand not evaluated
> -if not 0:
> --third operand evaluated
> --third operand type conversion
> --second operand not evaluated
>
> It has two charts, one for ANSI C and one for traditional C, covering the
> allowed type conversions between the arguments. I suspect the "not
> evaluated" part is what most call the run-time semantics.
>
> > [...] I have comments about the precedence levels.
>
> "Your" current heirX()'s versus H&S levels (since my other post was
> cryptic...):
>

Thanks for the comparisons, and the ternary info.

> heir1() =

Yeah, my motto: first see if you can do it, then see if you did it
right!

There is a collision in the use of binary and logical operations that
was supposed to be cleaned up in the type of BOOL. Traditional C
often uses: if (expr) .. where expr evaluates to any value but
the ..if.. syntax (_generically_) implies a T or F. But since an
expression includes an assignment, things like; if (c=a+2) .. have to
be considered for their assignment side effects also. One of those 'C-
isms' I guess. I haven't tried to confirm the 'standards-correct'
handling of it yet.

> I'm just asking if converting the argument prior to a calling another


> heirX() might mess up the pushed argument's value? I.e., does another
> heirX() level use this pushed value?
>

Well, I'm not 100% sure. An expression is evaluated to some value,
left in the primary register as the initial operand (note2). When the
lexeme for an operator is found in the source text by the code
matching for it, at its precedence level, heirX(), the code pushes the
current primary register onto the stack, and the next expression is
evaluated into the newly free primary register. The code execution
for the current operator then unstacks the initial operand into the
secondary register and performs the binary operation between the
secondary and primary register, leaving the result in the primary
register - this is the execution point of possible recursion if the
expression further requires it, we are at the state of (note2) above,
again. -So this sounds like no, but...

Your question relates to at what point in time a Bool would mess up a
side effect of evaluation of an expression where the side effect is
not meant to be a Bool, as in an assignment, again like; if (c=a
+2) .. . This is one of those darker corners that needs careful
inspection, and I don't have the answer yet.

> Is there some code from RMY's Z80 version? I see that odd callrts()
> function in there...
>


(note1) callrts() is first seen in gtf's (Glen Fisher) distribution of
small-c from 1980, he did the cp/m-80 port for Ron Cain's public
domain code in order to distribute it to cp/m-80 fans..

http://www.project-fbin.hostoi.com/Proj_SmC/CainC/C80CPM.C.HTML

header(){ ...
callrts("ccgo"); /* set default drive for CP/M */
zcall("main"); /* call the code generated by small-c */
zcall("exit"); /* do an exit gtf 7/16/80 */
... }

> Future?

Good leading question.. In a way I've satisfied my a.&b. interest and
the effort has certainly taught me alot, cleared up some questions,
and introduced new one's.

>


> Well, maybe your next step is fixing parameter list for ANSI C and then
> structs. Then, you can use GCC etc. to verify updating it to ANSI C. IIRC,
> you'll need to reorder the heirX()'s from 11 to 1, since GCC won't like that
> ordering without function declarations, e.g., forward reference from hier4()
> to heir5() etc. One of the problems I had with the version I'm working on
> was that there are quite a few implicitly done things in this C, like all
> pointers being "char *", all types as "int" or "char", no typedef's, and no
> struct's - using "arrays" and defines, & and | instead of && and ||, etc.
> GCC complained about that stuff with -ansi flag.
>
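A minimal sketch of why ANSI prototypes help with the heirX() ordering problem: once the levels call each other in a cycle (here a parenthesized sub-expression at the highest level calls back into the lowest level), no ordering of the definitions removes every forward reference, but a prototype does. The function names are hypothetical, not SmallC's, and operands are single digits.

```c
/* ANSI prototypes declared up front, so definition order no longer matters. */
static int level_low(const char **s);
static int level_high(const char **s);

/* Lowest level: sums terms separated by '+'. */
static int level_low(const char **s)
{
    int v = level_high(s);      /* forward reference, legal thanks to the prototype */
    while (**s == '+') {
        (*s)++;
        v += level_high(s);
    }
    return v;
}

/* Highest level: a digit, or a parenthesized expression that calls back down. */
static int level_high(const char **s)
{
    int v;
    if (**s == '(') {
        (*s)++;
        v = level_low(s);
        if (**s == ')')
            (*s)++;
        return v;
    }
    v = 0;
    if (**s >= '0' && **s <= '9') {
        v = **s - '0';
        (*s)++;
    }
    return v;
}

int parse_sum(const char *text)
{
    return level_low(&text);
}
```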

Yes. I've looked at assignments to variables at declaration time, the

Alexei A. Frounze

Sep 21, 2010, 3:09:11 AM
On Sep 20, 3:47 am, "Maxim S. Shatskih"

<ma...@storagecraft.com.no.spam> wrote:
> >more or less straightforward. I'd say parsing ASM, Basic, Lisp is a
> >toy. :) And Pascal probably as well.
>
> By recursive descent??? I was about recursive descent, not C.
>
> Pascal, BTW, is better parsed by LL(1)

If I'm not mistaken, Pascal types can be almost as complicated as C's:
it has pointers and arrays, and those can be "chained" indefinitely.
Also, I don't know if it's part of the standard, but Borland and other
compilers supported nested functions and procedures. While none of
that has to be implemented explicitly with recursion, all of it can be. And
whether you want it or not, some kind of recursion or stack will be
needed internally in most compilers just to handle expressions and
other nested, tree-like things. Recursion is natural here.

Alex

Rod Pemberton

Sep 21, 2010, 3:19:06 AM
<s_dub...@yahoo.com> wrote in message
news:cbfdc8a3-b20b-4773...@f26g2000vbm.googlegroups.com...

On Sep 20, 5:27 am, "Rod Pemberton" <do_not_h...@notreplytome.cmm>
wrote:
> <s_dubrov...@yahoo.com> wrote in message
> news:c5a86133-e3e5-4165...@f26g2000vbm.googlegroups.com...
> On Sep 19, 5:40 am, "Rod Pemberton" <do_not_h...@notreplytome.cmm>
> wrote:
>
> > Is there some code from RMY's Z80 version? I see that odd callrts()
> > function in there...
> >
> (note1) callrts() is first seen in gtf's (Glen Fisher) distribution of
> small-c from 1980, he did the cp/m-80 port for Ron Cain's public
> domain code in order to distribute it to cp/m-80 fans..
>

Hmm... I must be missing that version. RMY's was the only one I located it
in. Actually, I don't recognize a couple of the names attributed in RMY's,
so my collection is probably incomplete, or not as complete as I thought.
You said V2 versions were copyright. I may have avoided those.

> But, what are your goals for small-c?

The main goal is complete. It originally was to get Evgueniy Vitchev's
smc386c.c version for Linux working with DJGPP. I "ported" it to DJGPP
(GCC) and OpenWatcom. After seeing yours, the goal was to get it
bootstrapped too. With your help (errors), some of your routines (dumpglb
etc.), NCC NASM code, your I/O lib, Bob White's outdec(), and other things, it
was bootstrapped a while back. In the process, the code was converted back
to original SmallC C syntax... If I didn't thank you previously, A BIG
THANK YOU!


These are or were loose "goals":

* parsing && || - This was just done but is not thoroughly tested.
* compile ANSI C syntax and coded in ANSI C syntax - I.e., I'd like it to be
bootstrappable with GCC which requires it to support parsing ANSI and be
written in ANSI C. I.e., compiles with GCC -ansi -pedantic without errors
or warnings. The version I started with *was* coded in modern C for Linux.
It wasn't bootstrappable since it only *compiled* SmallC syntax. In the
process of bootstrapping, I converted it back to SmallC style syntax based
on other versions of smallC. It was bootstrapped via NCC and uses C5LIBC.
* structs, switch, case, and default - I was attempting to use Yorston code.
Part of it is not a good fit. Some of it I "translated". It's not complete
enough to test.
* stackframes - The assembly code appeared correct. But, something wasn't
working correctly. This isn't needed. I thought it might reduce the code
size. It would also stabilize the addressing of items on the stack.
* smaller assembly - It works, but it seems to emit large amounts of
assembly for small amounts of work. I had problems with it compiling to
code small enough to fit into a .com.
* 32-bit - Just as the version I started with was in C for Linux, it was
also 32-bit. I converted it from 32-bit to 16-bit, from GAS to NASM.
32-bit support would be nice. That'd require #if X, #endif, to separate
16/32-bit code in a single file.
* strings in assembly instead of bytes - This isn't necessary, but a
nicety.

You've done some of them, but I haven't. I think that parsing ANSI code
may require casts, other integer types, other pointer types, ANSI style
parameter list. Structs, I realized from looking at your version, will also
need at least the selection "dot" operator for sub-elements. a->b is
equivalent to (*a).b, so -> can be done without. I almost never use the ,
comma operator, except where required, like in for() or a parameter list. I
don't need that. I don't need higher level loops, actually, for(),
do-while(), as long as I have an infinite loop and a way to escape it. I
don't need void types. Having a pointer without a type is nice. It fits
better with assembly. But, void pointer doesn't explicitly support pointer
arithmetic in C. That limits their usefulness as a pointer or requires
workarounds with address-of & and subscript [] operators and casts. I like
having different integer sizes: 8, 16, 32 bits. I don't use qualifiers or
signed types or floating point much. switch() especially the "unstructured"
form, i.e., without a {} block, is useful for small programs. Having both
it and the structured switch, i.e., {} block and default: case, is nice.
The compound assignment operators (+=, etc.) can be expanded manually, so they aren't
absolutely necessary... just inconvenient. If I could update it to have
&& and ||, structs, switch, #if, #endif, more integer and pointer types,
it'd be much more useable. It'd be really nice if it had casts, since I use
them a bit with the address-of operator, mostly to work around conversion
restrictions in C's typesystem. I love typedef's, esp. for structs, since
they don't allocate space, and can be overlaid onto any object, esp.
"arrays" of chars, with a pointer. But, I doubt that I'll ever do anything
like that for SmallC. I might in one of my other in progress C-like
compilers.
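Two of the points above, sketched in standard C (the struct and function names are hypothetical): a->b is the same as (*a).b, and since standard C defines no arithmetic on a void * (GCC accepts it only as an extension), a common workaround is to cast to a character pointer, step in bytes, and convert back.

```c
#include <stddef.h>

struct point { int x; int y; };

/* a->b is defined as (*a).b, so a compiler that has '.' can rewrite '->'. */
int arrow_equals_dot(const struct point *p)
{
    return p->y == (*p).y;
}

/* Standard C defines no arithmetic on void *; cast to a character pointer,
   step in bytes, and let the result convert back to void *. */
void *byte_offset(void *base, size_t n)
{
    return (unsigned char *)base + n;
}
```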

I recall reading that the old PCC compiler handled structs _without_ knowing
the name of the structs. I'm not sure how that is done.


Rod Pemberton
PS. I see no difference between the 2:48 and 3:06 post.


Maxim S. Shatskih

Sep 21, 2010, 3:52:23 AM
> Pascal, BTW, is better parsed by LL(1)
>If I'm not mistaken, Pascal types can be almost as complicated as C's,

Yes, but its syntax (including all smart features) is LL(1), while C's is IIRC not (and C++ is non deterministic).

>Recursion is natural here.

I'm not talking about recursion, but about a manually coded recursive descent parser. There is yacc/bison, so why not describe the C grammar for it and run it, getting the LR(1) parsing table?

This is by far less error-prone than coding ~20 steps of recursive descent manually.

Alexei A. Frounze

Sep 21, 2010, 4:48:54 AM
On Sep 21, 12:52 am, "Maxim S. Shatskih"

<ma...@storagecraft.com.no.spam> wrote:
> > Pascal, BTW, is better parsed by LL(1)
> >If I'm not mistaken, Pascal types can be almost as complicated as C's,
>
> Yes, but its syntax (including all smart features) is LL(1), while C's is IIRC not (and C++ is non deterministic).
>
> >Recursion is natural here.
>
> I'm not talking about recursion, but about a manually coded recursive descent parser. There is yacc/bison, so why not describe the C grammar for it and run it, getting the LR(1) parsing table?
>
> This is by far less error-prone than coding ~20 steps of recursive descent manually.

I'm actually not sure if even C syntax is fully deterministic (we
might be talking about somewhat different things, though).
A little example to show that one needs to look at types to make sense
of syntactic constructs:
-------8<-------
#include <stddef.h>

#if 01
void x (void* ptr)
{
(void)ptr;
}
#else
typedef int x;
#endif

void* p = NULL;

int main(void)
{
x(p);
(void)p;
return 0;
}
-------8<-------
In order to know if x(p) can at all be followed by =, something has to
be known about x.

The parser has to be very "stateful".

Alex

Maxim S. Shatskih

Sep 21, 2010, 7:29:24 AM
>In order to know if x(p) can at all be followed by =

I think that, if "x" is a type name, then "x(p)" is an error in C.

Not so in C++, there "typename(p)" is a valid syntax.

A C++ parser must know which idents are typenames and which are not; it is "non-deterministic" in this sense (at least).

Actually, the lexical parser must access the typename table to parse "ident" and "typename" differently, after this C++ is (probably) LALR(1).
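The arrangement where the lexer consults a typename table (often called the "lexer hack") can be sketched as follows. The table, token names, and functions are hypothetical; the table is flat (a real compiler would track scopes) and stores the caller's string pointers, so the names must outlive it.

```c
#include <string.h>

/* Hypothetical token classes handed from the lexer to the grammar. */
enum tok { TOK_IDENT, TOK_TYPENAME };

/* Hypothetical flat typedef table; no scoping, no removal. */
static const char *typedef_names[32];
static int n_typedefs;

/* Called by the parser when it finishes a typedef declaration. */
void note_typedef(const char *name)
{
    if (n_typedefs < 32)
        typedef_names[n_typedefs++] = name;
}

/* The lexer looks the name up, so the grammar sees two distinct token kinds. */
enum tok classify(const char *name)
{
    int i;
    for (i = 0; i < n_typedefs; i++)
        if (strcmp(typedef_names[i], name) == 0)
            return TOK_TYPENAME;
    return TOK_IDENT;
}
```

With this feedback from parser to lexer, "x(p);" reaches the grammar either as IDENT ( IDENT ) or as TYPENAME ( IDENT ), and each form can then have its own rule.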

s_dub...@yahoo.com

Sep 21, 2010, 12:18:45 PM
On Sep 21, 2:19 am, "Rod Pemberton" <do_not_h...@notreplytome.cmm>

wrote:
> <s_dubrov...@yahoo.com> wrote in message
>
> news:cbfdc8a3-b20b-4773...@f26g2000vbm.googlegroups.com...
> On Sep 20, 5:27 am, "Rod Pemberton" <do_not_h...@notreplytome.cmm>
> wrote:
>
> > <s_dubrov...@yahoo.com> wrote in message
> >news:c5a86133-e3e5-4165...@f26g2000vbm.googlegroups.com...
> > On Sep 19, 5:40 am, "Rod Pemberton" <do_not_h...@notreplytome.cmm>
> > wrote:
>
> > > Is there some code from RMY's Z80 version? I see that odd callrts()
> > > function in there...
>
> > (note1) callrts() is first seen in gtf's (Glen Fisher) distribution of
> > small-c from 1980, he did the cp/m-80 port for Ron Cain's public
> > domain code in order to distribute it to cp/m-80 fans..
>
> Hmm...  I must be missing that version.  RMY's was the only one I located it
> in.  Actually, I don't recognize a couple of the names attributed in RMY's,
> so my collection is probably incomplete, or not as complete as I thought.
> You said V2 versions were copyright.  I may have avoided those.
>

It is the most common version for cp/m-80 generating 8080 code.
The Cain version is the one from the DDJ articles. Cain didn't use cp/m;
he bootstrapped his own unixy OS, and his article about the iolib is
for it, which others used as a 'how-to' guide. That inspired me to do
the C5LIB for NCC, although I was already familiar with the cp/m-86
version.

> > But, what are your goals for small-c?
>
> The main goal is complete.  It originally was to get Evgueniy Vitchev's
> smc386c.c version for Linux working with DJGPP.  I "ported" it to DJGPP
> (GCC) and OpenWatcom.  After seeing yours, the goal was to get it
> bootstrapped too.  With your help (errors), some of your routines (dumpglb
> etc.), NCC NASM code, your I/O lib, Bob White's outdec(), and other things, it
> was bootstrapped a while back.  In the process, the code was converted back
> to original SmallC C syntax...  If I didn't thank you previously, A BIG
> THANK YOU!
>

Hey, great. Code is meant to be shared and used; glad you found a use for
it.

> These are or were loose "goals":
>
> * parsing && || - This was just done but is not thoroughly tested.
> * compile ANSI C syntax and coded in ANSI C syntax - I.e., I'd like it to be
> bootstrappable with GCC which requires it to support parsing ANSI and be
> written in ANSI C.  I.e., compiles with GCC -ansi -pedantic without errors
> or warnings.  The version I started with *was* coded in modern C for Linux.
> It wasn't bootstrappable since it only *compiled* SmallC syntax.  In the
> process of bootstrapping, I converted it back to SmallC style syntax based
> on other versions of smallC.  It was bootstrapped via NCC and uses C5LIBC.
> * structs, switch, case, and default - I was attempting to use Yorston code.
> Part of it is not a good fit.  Some of it I "translated".  It's not complete
> enough to test.
> * stackframes - The assembly code appeared correct.  But, something wasn't
> working correctly.  This isn't needed.  I thought it might reduce the code
> size.  It would also stabilize the addressing of items on the stack.
> * smaller assembly - It works, but it seems to emit large amounts of
> assembly for small amounts of work.  I had problems with it compiling to
> code small enough to fit into a .com.
> * 32-bit - Just as the version I started with was in C for Linux, it was
> also 32-bit.  I converted it from 32-bit to 16-bit, from GAS to NASM.
> 32-bit support would be nice.  That'd require #if X, #endif, to separate
> 16/32-bit code in a single file.
> * strings in assembly instead of bytes -  This isn't necessary, but a
> nicety.
>

Yeah, the preprocessor level needs major attention. One might as well
rewrite small-c into ansi-c and use an ansi-c compiler to bootstrap
that version. I originally used PowerC to bootstrap small-c, because
that old compiler still supported K&R as well as ansi-c. Knowing now
what I know, that avenue could be revisited.

Right, small-c should be seen from its historical origins of 8-bit
floppy systems of sub 64k memory and tiny disk storage space, amazing
what was accomplished in that footprint.

-Pointers- I have to wonder what the direction of things would have
taken if C's pointers were defined as a unique non-scalar type. I
wonder if segmentation memory models would have taken the lead over
flat memory models. Right now for small-c, it is important that DS=SS
for dereferencing variables whether they be auto (on the stack) or
static (in DS memory), not enough information is carried along for
indirection to determine which segment, only some offset. Thus arose
the 'far' pointer nonsense as a weak work-around. There's no going
back, however. Btw, NASM -fbin effective addressing is non-scalar, it
requires more than just offset information in certain situations
because of its treatment of 'section'.

> I recall reading that the old PCC compiler handled structs _without_ knowing
> the name of the structs.  I'm not sure how that is done.
>

I should go back and study PCC more closely; it seems to me it became
the predominant compiler on Unix.

> Rod Pemberton
> PS.  I see no difference between the 2:48 and 3:06 post.

There isn't any, I reposted after not seeing the first one come thru,
once in a while google has problems. BTW, I meant to comment on the
following:

>> I'd appreciate any comments you may have.

>It looks good to me, for whatever that's worth... ;) I'm trying to keep the
>old namings of routines and original functionality, etc. Once they get
>renamed or functionality gets changed too much things from other SmallC
>versions don't just "drop-in" anymore. I'd like to keep some "cut-n-paste"
>ability.

Yes, I'm reluctant to rename things too, but for this latest effort I
needed more descriptive naming to point out those functions which
generate backend code more clearly. Those with 'sa_' generate
syntactic actions for the backend code. Also, global and local are
assembly terms, whereas the C terms would be static and auto.

Steve

s_dub...@yahoo.com

Sep 21, 2010, 12:33:30 PM
On Sep 21, 6:29 am, "Maxim S. Shatskih"

I've a question for both of you..

Is the C (standard or C99) grammar actually a context-free grammar?
The treatment by the parser of '*' depends on context, right? Or,
putting it differently: if there is overloading of operators in a
language, is the language's grammar still context-free? (These
are two separate but similar questions to my mind.)

Steve

Alexei A. Frounze

Sep 21, 2010, 12:56:57 PM
On Sep 21, 4:29 am, "Maxim S. Shatskih"

<ma...@storagecraft.com.no.spam> wrote:
> >In order to know if x(p) can at all be followed by =
>
> I think that, if "x" is a type name, then "x(p)" is an error in C.
>
> Not so in C++, there "typename(p)" is a valid syntax.

Well, you have this as valid syntax:
int (*pa)[10]; // pa is a C99 pointer to array of 10 ints

And this (copied from C99):
F *((e))(void) { /* ... */ } // same: parentheses irrelevant

I'd say, it's a pretty valid, however awkward, syntax.
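Both declarations can be checked with a small C program. The typedef follows the C99 example the second line comes from, where F is a function type (int(void)), so e returns a pointer to such a function; arr, ap, answer() and the sizes checked are illustrative additions.

```c
/* F is "function with no parameters returning int", as in the C99 example. */
typedef int F(void);

static int arr[10];
static int (*pa)[10] = &arr;   /* parentheses bind * to pa: pointer to array of 10 ints */
static int *ap[10];            /* no parentheses: array of 10 pointers to int */

static int answer(void) { return 42; }

/* Redundant parentheses around the name change nothing:
   e is a function returning F *, i.e. a pointer to a function. */
F *((e))(void) { return answer; }
```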

> C++ parser must know what idents are typenames and what are not, it is "non-deterministic" in this sense (at least).

AFAIK, there was something about syntax around constructors and their
use that wasn't very deterministic.

Alex

Maxim S. Shatskih

Sep 21, 2010, 1:21:00 PM
>The treatment by the parser of '*' depends on context, right?

I'm not sure of it. Why?

*ptr is a term
a * b is also a term (another level)

>putting it differently- if there is overloading of operators in a
>language, is the language also still a context free grammar?

Operator overloading in C++ does not change any syntax.
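To illustrate the point that the position in the grammar, not the token itself, decides what * means, here is a hypothetical mini recursive-descent parser (operands are single letters, the only operator is *, all names are illustrative): a * met where an operand is expected is unary (indirection), and a * met right after an operand is binary (multiplication).

```c
#include <ctype.h>

/* Hypothetical mini-parser: operands are single letters, the only operator
   is '*'. Each '*' is recorded in out as 'U' (unary, indirection) or
   'B' (binary, multiplication), in source order. */
static const char *cur;
static char *outp;

static void skip_spaces(void)
{
    while (*cur == ' ')
        cur++;
}

/* An operand is expected here, so a '*' can only be a prefix operator. */
static void unary(void)
{
    skip_spaces();
    if (*cur == '*') {
        cur++;
        *outp++ = 'U';
        unary();
    } else if (isalpha((unsigned char)*cur)) {
        cur++;
    }
}

/* An operand has just been parsed, so a following '*' is infix. */
static void multiplicative(void)
{
    unary();
    for (;;) {
        skip_spaces();
        if (*cur != '*')
            break;
        cur++;
        *outp++ = 'B';
        unary();
    }
}

void classify_stars(const char *text, char *out)
{
    cur = text;
    outp = out;
    multiplicative();
    *outp = '\0';
}
```

For "a * *p" this records "BU": the first star follows the operand a, the second appears where an operand is expected.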

Maxim S. Shatskih

Sep 21, 2010, 1:25:24 PM
> I think that, if "x" is a type name, then "x(p)" is an error in C.
>
> Not so in C++, there "typename(p)" is a valid syntax.

> Well, you have this as valid syntax:
> int (*pa)[10]; // pa is a C99 pointer to array of 10 ints
>
> And this (copied from C99):
> F *((e))(void) { /* ... */ } // same: parentheses irrelevant
>
> I'd say, it's a pretty valid, however awkward, syntax.

And how is this related to x(p)? Sorry, I'm too tired today and can't see it just now.

"e" is a function which returns F*. So, e(p) is OK.

>AFAIK, there was something about syntax around constructors

Why? this is more or less streamlined.

"typename(params)" is a tmp object init. Works for all types, even "int". How to interpret "params" and whether this is a class object constructor call or something degenerate like "int(1)" - is semantics, not syntax.

"typename name(params);" is a non-tmp named object init and declaration. Again works for all types.

Rod Pemberton

Sep 21, 2010, 7:22:39 PM
<s_dub...@yahoo.com> wrote in message
news:379c4ca8-c76f-440a...@h25g2000vba.googlegroups.com...

>
> Right, small-c should be seen from its historical origins of 8-bit
> floppy systems of sub 64k memory and tiny disk storage space, amazing
> what was accomplished in that footprint.
>

Many of my personal C programs allocate very little memory, and those that
do usually have no more than 4KB plus some variables. There are some that
allocate more or have unknown usage due to malloc. But, those are the
outliers. If you can access everything as a file, and move forward and
backward in the file, then you only need a window.

> -Pointers- I have to wonder what the direction of things would
> have taken if C's pointers were defined as a unique non-scalar type.

How many architectures have need of that?

Most of the "oddball" ones have "died". We have flat memory due to really
large segments, 8-bit bytes due to ASCII and EBCDIC, contiguous memory,
address sizes equivalent to integer sizes. These are all very nice for
programming languages. Unfortunately, C was standardized while some of the
"oddball" architectures were still "alive".

> I wonder if segmentation memory models would have taken the lead
> over flat memory models. Right now for small-c, it is important that
> DS=SS for dereferencing variables whether they be auto (on the stack)
> or static (in DS memory), not enough information is carried along for
> indirection to determine which segment, only some offset. Thus arose
> the 'far' pointer nonsense as a weak work-around. There's no going
> back, however.

I prefer keeping the flat memory model. Unfortunately, that limits
available memory when small segments are in use. It'd be nice to support
segment registers in 16-bit mode. But, the compiler would need to keep
track of when a pointer changes segment. ISTM, there could be some wrong
segment accesses, if not designed carefully. What do you do about a pointer
that can be assigned a value? I think you'd need to update the segment - or
at least verify no change - on every pointer assignment, pointer arithmetic
operation, array indexing, boundary crossing, etc.
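A minimal sketch of that bookkeeping, assuming 8086 real mode, where a physical address is segment * 16 + offset. The struct and function names are hypothetical; renormalizing after every addition is roughly what DOS compilers did for their "huge" pointers, while plain "far" pointer arithmetic only changed the offset.

```c
#include <stdint.h>

/* Hypothetical real-mode far pointer: 16-bit segment plus 16-bit offset. */
struct farptr { uint16_t seg; uint16_t off; };

/* Physical address = segment * 16 + offset. */
uint32_t linear(struct farptr p)
{
    return ((uint32_t)p.seg << 4) + p.off;
}

/* Adds n bytes, then renormalizes so the offset is below 16 and the
   segment absorbs the rest. Without this, the 16-bit offset would wrap
   inside the old segment. Assumes the result stays below 1 MB and
   ignores the 8086 address wraparound. */
struct farptr far_add(struct farptr p, uint32_t n)
{
    uint32_t lin = linear(p) + n;
    struct farptr r;
    r.seg = (uint16_t)(lin >> 4);
    r.off = (uint16_t)(lin & 0xFu);
    return r;
}
```

For example, adding 0x20 to 1000:FFF0 crosses the end of the segment and yields 2001:0000, where naive offset arithmetic would have wrapped to 1000:0010.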

> Btw, NASM -fbin effective addressing is non-scalar, it
> requires more than just offset information in certain
> situations because of its treatment of 'section'.

Uh, hmm, yeah, you've moved up to segments... It looks like you're using
two: code and data. That's quite clean segmentation. What if you need
1.44MB of memory for a floppy ramdisk program? ;-)

> > I recall reading that the old PCC compiler handled structs _without_ knowing
> > the name of the structs. I'm not sure how that is done.
> >
>
> I should go back and study PCC more closely; it seems to me it became
> the predominant compiler on Unix.

They were trying to resurrect it a while ago. Apparently, various BSD
Unices switched to GCC from PCC years ago. But, some in the BSD crowd
wanted a BSD license.

http://pcc.ludd.ltu.se/
http://www.bsdfund.org/projects/pcc/


Rod Pemberton

Rod Pemberton

Sep 21, 2010, 7:25:35 PM
"Maxim S. Shatskih" <ma...@storagecraft.com.no.spam> wrote in message
news:i79o7h$1fma$1...@news.mtu.ru...
> [...] while C's is IIRC not (and C++ is non deterministic).
>

AIUI, C is almost LALR(1). It's not 100%. There is ambiguity when a
typedef is used since there is no keyword for it like for "struct" or
"union" before using a struct or union. This means that a symbol table is
needed to determine if an "identifier" is an identifier or a typedef. And,
there is ambiguity with implicit int's and typedef's. IIRC, C99 eliminated
implicit int's and made ambiguous parameter names resolve to typedef's,
not parameter names. There were some scope changes to C90, apparently in an
attempt to fix this too. Preprocessing directives can also cause ambiguities,
according to H&S "C: A Ref. Man." 3rd ed. I'm not sure what else.


The implicit int ambiguity:

void f(T);

Is T an implicit int or a typedef? In C99, it's a typename. In C90, it depends... If
there was a typedef T, then it's a typename. If there wasn't, then it's an
implicit int T.


The "struct" and "union" keywords are used to indicate both defining a
struct or union, and when using them. The "typedef" keyword is used to
define, but there is no usage keyword for typedef's in C.


E.g., the "implicit" typedef keyword:

typedef struct XYZ /* XYZ is an identifier */
{
/* struct */
} ASD; /* ASD is a typename */

struct XYZ junk; /* XYZ is an identifier */
ASD garbage; /* ASD is a typename */

Note that there is no "typedef" or "typedefname" keyword to indicate usage
of a typename. There is an "implicit" or "invisible" keyword. This means a
parser doesn't "know" from the syntax if ASD is an identifier or a typename.
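The same ambiguity shows up in the classic "T * x;" statement: the tokens form a declaration when T names a type, and a multiplication when an ordinary identifier hides the typedef. A sketch with hypothetical names; compilers may warn that the expression statement has no effect.

```c
#include <stddef.h>

typedef int T;              /* at file scope, T names a type */

/* With T visible as a typedef, "T * x;" is a declaration: x is an int *. */
size_t star_declares(void)
{
    T * x;
    x = NULL;
    (void)x;
    return sizeof x;
}

/* A local variable named T hides the typedef, so the very same tokens
   "T * x;" become an expression statement that multiplies and discards. */
int star_multiplies(void)
{
    int T = 6, x = 7;
    T * x;                  /* compilers may warn: statement with no effect */
    return T * x;
}
```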


RP

Rod Pemberton

Sep 21, 2010, 7:25:57 PM
<s_dub...@yahoo.com> wrote in message
news:a5955ef3-55bb-4200...@w4g2000vbh.googlegroups.com...

>
> Is C (standard or C99) grammar actually a context free grammar?
>

It has some ambiguities. I believe that means it's a "context sensitive
grammar". Supposedly, it's almost LALR(1). See my other reply to Maxim for
more on the ambiguities. I think the ANTLR project also has an LL(k) parser
for it. I'm not from a Comp. Sci. background, so I'm not sure exactly where
the various types of parsing algorithms fit into the so called "Chomsky
hierarchy". But, I believe most parser algorithms handle "context free
grammars". Both ANTLR (LL(k)) and Gold Parser (LALR(1)) have a large
toolset including grammar development environments. It should be possible
to analyze grammars with them.


http://en.wikipedia.org/wiki/Chomsky_hierarchy
http://www.antlr.org/
http://www.devincook.com/goldparser/


Rod Pemberton

Maxim S. Shatskih

Sep 22, 2010, 1:32:35 AM
> there is ambiguity with implicit int's and typedef's. IIRC, C99 eliminated
> implicit int's

Weren't they eliminated by ANSI C in the late 1980s?

Alexei A. Frounze

Sep 22, 2010, 4:47:19 AM
On Sep 21, 10:25 am, "Maxim S. Shatskih"

<ma...@storagecraft.com.no.spam> wrote:
> > I think that, if "x" is a type name, then "x(p)" is an error in C.
>
> > Not so in C++, there "typename(p)" is a valid syntax.
>
> Well, you have this as valid syntax:
> int (*pa)[10]; // pa is a C99 pointer to array of 10 ints
>
> And this (copied from C99):
> F *((e))(void) { /* ... */ } // same: parentheses irrelevant
>
> I'd say, it's a pretty valid, however awkward, syntax.
>
> And how is this related to x(p)? sorry, I'm too tired today and cannot understand just now?
>
> "e" is a function which returns F*. So, e(p) is OK.

It shows 2 valid uses of parentheses (perhaps I should've included
only the second one):
1. associating the pointer (or the star symbol) not with the base type
(int) but with the derived type (array of 10 ints), i.e., the grouping
function
2. irrelevant parentheses around variable names that change nothing
and are ignored; many compilers don't mind "int (*p)" either, that is,
without [], where the parentheses aren't around just the name, but
around... what should I call it?... a "subexpression"? Many books and
people say that type and variable definition follows their use, which
seems to justify these unnecessary and harmless parentheses.

> >AFAIK, there was something about syntax around constructors
>
> Why? this is more or less streamlined.
>
> "typename(params)" is a tmp object init. Works for all types, even "int". How to interpret "params" and whether this is a class object constructor call or something degenerate like "int(1)" - is semantics, not syntax.
>
> "typename name(params);" is a non-tmp named object init and declaration. Again works for all types.

I've seen something along the lines of this:
http://www.informit.com/guides/content.aspx?g=cplusplus&seqNum=439&rll=1
http://stackoverflow.com/questions/1424510/
...

Alex

Rod Pemberton

Sep 22, 2010, 2:03:27 PM
"Maxim S. Shatskih" <ma...@storagecraft.com.no.spam> wrote in message
news:i7c4dj$3b5$1...@news.mtu.ru...

> > there is ambiguity with implicit int's and typedef's. IIRC, C99 eliminated
> > implicit int's
>
> Aren't they eliminated by ANSI C in late 1980ies?

http://groups.google.com/group/comp.std.c/browse_thread/thread/238705e56172ff37/5f1181ef93a8f081


RP


s_dub...@yahoo.com

Sep 22, 2010, 10:53:15 PM
On Sep 21, 6:22 pm, "Rod Pemberton" <do_not_h...@notreplytome.cmm>
wrote:
> <s_dubrov...@yahoo.com> wrote in message

>
> news:379c4ca8-c76f-440a...@h25g2000vba.googlegroups.com...
>
>
>
> > Right, small-c should be seen from its historical origins of 8-bit
> > floppy systems of sub 64k memory and tiny disk storage space, amazing
> > what was accomplished in that footprint.
>
> Many of my personal C programs allocate very little memory, and those that
> do usually have no more than 4KB plus some variables.  There are some that
> allocate more or have unknown usage due to malloc.  But, those are the
> outliers.  If you can access everything as a file, and move forward and
> backward in the file, then you only need a window.
>
> > -Pointers- I have to wonder what the direction of things would
> > have taken if C's pointers were defined as a unique non-scalar type.
>
> How many architectures have need of that?
>
I guess none, save the one in my head :-)

> Most of the "oddball" ones have "died".  We have flat memory due to really
> large segments, 8-bit bytes due to ASCII and EBCDIC, contiguous memory,
> address sizes equivalent to integer sizes.  These are all very nice for
> programming languages.  Unfortunately, C was standardized while some of the
> "oddball" architectures were still "alive".
>
> > I wonder if segmentation memory models would have taken the lead
> > over flat memory models.  Right now for small-c, it is important that
> > DS=SS for dereferencing variables whether they be auto (on the stack)
> > or static (in DS memory), not enough information is carried along for
> > indirection to determine which segment, only some offset.  Thus arose
> > the 'far' pointer nonsense as a weak work-around.  There's no going
> > back, however.
>
> I prefer keeping the flat memory model.  Unfortunately, that limits
> available memory when small segments are in use.  It'd be nice to support
> segment registers in 16-bit mode.  But, the compiler would need to keep
> track of when a pointer changes segment.  ISTM, there could be some wrong
> segment accesses, if not designed carefully.  What do you do about a pointer
> that can be assigned a value?  I think you'd need to update the segment - or
> at least verify no change - on every pointer assignment, pointer arithmetic
> operation, array indexing, boundary crossing, etc.
>

Keep more info in the symbol table, perhaps a linked list for
indirection chain.
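A minimal sketch of that idea, assuming a symbol table where each level of indirection is one node in a linked list and each pointer level records which segment it addresses. All names are hypothetical; this is an illustration, not SmallC's existing table layout.

```c
#include <stddef.h>

/* Hypothetical type-chain node: one link per level of indirection. */
enum kind { K_INT, K_CHAR, K_POINTER };
enum segment { SEG_NONE, SEG_DS, SEG_SS };  /* segment a pointer level addresses */

struct type {
    enum kind kind;
    enum segment seg;       /* meaningful only when kind == K_POINTER */
    struct type *next;      /* pointed-to type; NULL for a base type */
};

/* Counts how many dereferences it takes to reach the base type. */
int indirection_depth(const struct type *t)
{
    int depth = 0;
    while (t != NULL && t->kind == K_POINTER) {
        depth++;
        t = t->next;
    }
    return depth;
}

/* Segment used when dereferencing t once; SEG_NONE if t is not a pointer. */
enum segment deref_segment(const struct type *t)
{
    return (t != NULL && t->kind == K_POINTER) ? t->seg : SEG_NONE;
}
```

A code generator could then emit a segment override (or reload a segment register) whenever deref_segment() differs from the current data segment.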

> > Btw, NASM -fbin effective addressing is non-scalar, it
> > requires more than just offset information in certain
> > situations because of its treatment of 'section'.
>
> Uh, hmm, yeah, you've moved up to segments...  It looks like you're using
> two: code and data.  That's quite clean segmentation.  What if you need
> 1.44MB of memory for a floppy ramdisk program?  ;-)
>

Shim C5LIB to support int 15h ah 87h, iirc. Well, not likely supported
by cmd.exe, though.
An 8MB ramdisk was done for cp/m-86 by Freek Heite, I guess back in
2000.

>
>
> > > I recall reading that the old PCC compiler handled structs _without_ knowing
> > > the name of the structs. I'm not sure how that is done.
>
> > I should go back and study PCC more closely; it seems to me it became
> > the predominant compiler on Unix.
>
> They were trying to resurrect it a while ago.  Apparently, various BSD
> Unices switched to GCC from PCC years ago.  But, some in the BSD crowd
> wanted a BSD license.
>
> http://pcc.ludd.ltu.se/
> http://www.bsdfund.org/projects/pcc/
>
> Rod Pemberton

Thx, found porttour.pdf worth a read in light of the current parser
discussion:
"
Parsing

As mentioned above, the parser is generated by Yacc from the grammar on file cgram.y. The grammar is relatively readable, but contains some unusual features that are worth comment.

Perhaps the strangest feature of the grammar is the treatment of declarations. The problem is to keep track of the basic type and the storage class while interpreting the various stars, brackets, and parentheses that may surround a given name. The entire declaration mechanism must be recursive, since declarations may appear within declarations of structures and unions, or even within a sizeof construction inside a dimension in another declaration!

There are some difficulties in using a bottom-up parser, such as produced by Yacc, to handle constructions where a lot of left context information must be kept around. The problem is that the original PDP-11 compiler is top-down in implementation, and some of the semantics of C reflect this. In a top-down parser, the input rules are restricted somewhat, but one can naturally associate temporary storage with a rule at a very early stage in the recognition of that rule. In a bottom-up parser, there is more freedom in the specification of rules, but it is more difficult to know what rule is being matched until the entire rule is seen. The parser described by cgram.c makes effective use of the bottom-up parsing mechanism in some places (notably the treatment of expressions), but struggles against the restrictions in others. The usual result is that it is necessary to run a stack of values ‘‘on the side’’, independent of the Yacc value stack, in order to be able to store and access information deep within inner constructions, where the relationship of the rules being recognized to the total picture is not yet clear.

In the case of declarations, the attribute information (type, etc.) for a declaration is carefully kept immediately to the left of the declarator (that part of the declaration involving the name). In this way, when it is time to declare the name, the name and the type information can be quickly brought together. The ‘‘$0’’ mechanism of Yacc is used to accomplish this. The result is not pretty, but it works. The storage class information changes more slowly, so it is kept in an external variable, and stacked if necessary. Some of the grammar could be considerably cleaned up by using some more recent features of Yacc, notably actions within rules and the ability to return multiple values for actions.

A stack is also used to keep track of the current location to be branched to when a break or continue statement is processed.

This use of external stacks dates from the time when Yacc did not permit values to be structures. Some, or most, of this use of external stacks could be eliminated by redoing the grammar to use the mechanisms now provided. There are some areas, however, particularly the processing of structure, union, and enum declarations, function prologs, and switch statement processing, when having all the affected data together in an array speeds later processing; in this case, use of external storage seems essential.

The cgram.y file also contains some small functions used as utility functions in the parser. These include routines for saving case values and labels in processing switches, and stacking and popping values on the external stack described above.
"
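The side stack for break and continue targets that the excerpt describes can be sketched in C roughly as follows. This is a hypothetical illustration, not code from cgram.y: the names push_loop(), pop_loop(), break_label(), continue_label(), NO_LABEL and the fixed depth MAX_NEST are all assumptions, and labels are plain ints.

```c
#include <assert.h>

/* Hypothetical side stack of branch targets for break/continue.
   Not taken from cgram.y; names and sizes are illustrative only. */
#define MAX_NEST 64
#define NO_LABEL (-1)          /* e.g. a switch has no continue target */

struct loop_labels {
    int break_label;           /* label just past the loop or switch */
    int continue_label;        /* label of the loop test, or NO_LABEL */
};

static struct loop_labels label_stack[MAX_NEST];
static int label_depth = 0;

/* Called when the parser enters a loop or switch body. */
void push_loop(int brk, int cont)
{
    assert(label_depth < MAX_NEST);
    label_stack[label_depth].break_label = brk;
    label_stack[label_depth].continue_label = cont;
    label_depth++;
}

/* Called once the loop or switch body has been fully recognized. */
void pop_loop(void)
{
    assert(label_depth > 0);
    label_depth--;
}

/* 'break' jumps out of the innermost enclosing loop or switch. */
int break_label(void)
{
    if (label_depth == 0)
        return NO_LABEL;
    return label_stack[label_depth - 1].break_label;
}

/* 'continue' skips enclosing switches and uses the innermost loop. */
int continue_label(void)
{
    for (int i = label_depth - 1; i >= 0; i--)
        if (label_stack[i].continue_label != NO_LABEL)
            return label_stack[i].continue_label;
    return NO_LABEL;
}
```

A switch would push NO_LABEL as its continue target, so continue_label() walks outward to the nearest enclosing loop, which is how continue behaves inside a switch in C.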

Steve


s_dub...@yahoo.com

Sep 22, 2010, 11:08:43 PM
On Sep 21, 12:21 pm, "Maxim S. Shatskih"
<ma...@storagecraft.com.no.spam> wrote:
> >The treatment by the parser of '*' depends on context, right?
>
> I'm not sure of it. Why?
>
AIUI LL(1) and LR(1) work on context free grammars. I see that
previously you thought C was maybe not LL(1).

The star, for example as a token to be parsed, can be an indirection
operator or a multiplication operator. -determined by context. Maybe
I'm wrong to look at it this way.

Steve

Alexei A. Frounze

Sep 23, 2010, 3:44:00 AM
On Sep 22, 8:08 pm, s_dubrov...@yahoo.com wrote:
> On Sep 21, 12:21 pm, "Maxim S. Shatskih"<ma...@storagecraft.com.no.spam> wrote:
> > >The treatment by the parser of '*' depends on context, right?
>
> > I'm not sure of it. Why?
>
> AIUI LL(1) and LR(1)  work on context free grammars.  I see that
> previously you thought C was maybe not LL(1).
>
> The star, for example as a token to be parsed, can be an indirection
> operator or a multiplication operator. -determined by context.  Maybe
> I'm wrong to look at it this way.

As I understand it, * can only be a multiplication operator if what
precedes it is an expression. If there's "nothing" before it or
there's a unary operator before it, it's an indirection/dereference
operator.
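Alex's rule can be sketched as a check on the token that precedes the star. This is a minimal sketch assuming a hypothetical token enumeration (enum tok) and a hypothetical classify_star() helper. It covers expression context only: pointer declarators and casts such as (unsigned int*) are not handled, and a ')' that closes a cast, as in (int)*p, would be misread as multiplication.

```c
/* Hypothetical token kinds, reduced to what the '*' decision needs. */
enum tok {
    TOK_NONE,      /* start of expression: nothing precedes the '*' */
    TOK_IDENT,     /* variable name */
    TOK_NUMBER,    /* integer constant */
    TOK_RPAREN,    /* ')' closing a parenthesized expression or call */
    TOK_RBRACKET,  /* ']' closing a subscript */
    TOK_POSTFIX,   /* postfix '++' or '--' */
    TOK_LPAREN,    /* '(' */
    TOK_OPERATOR   /* any binary or prefix operator, e.g. '=', '+', '!' */
};

enum star_role { STAR_MULTIPLY, STAR_DEREFERENCE };

/* A '*' is binary multiplication only when a complete operand
   precedes it; otherwise it is unary indirection. */
enum star_role classify_star(enum tok prev)
{
    switch (prev) {
    case TOK_IDENT:
    case TOK_NUMBER:
    case TOK_RPAREN:
    case TOK_RBRACKET:
    case TOK_POSTFIX:
        return STAR_MULTIPLY;
    default:
        return STAR_DEREFERENCE;
    }
}
```

In 12**p, for example, the first '*' follows the constant 12 and is a multiplication, while the second follows an operator and is an indirection.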

Alex

Rod Pemberton

Sep 23, 2010, 3:37:58 PM
"Alexei A. Frounze" <alexf...@gmail.com> wrote in message
news:e715dc28-0814-4628...@a7g2000prb.googlegroups.com...

It's also used to declare a pointer type, e.g., within a cast. So, you've
got at least three uses of it within non-declaration syntax. Yes?

q=12**(unsigned int*)&bytes[12];

Parentheses are another syntax element that is "overloaded", e.g., casts,
function call parameters, argument lists, parenthesized expressions.
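One common way to tell the overloaded uses of '(' apart in an expression is to look at the token before it and the word after it. This is a minimal sketch assuming a hypothetical classify_lparen() helper and a deliberately incomplete list of type keywords; typedef names would need a symbol-table lookup, and sizeof(type) and parameter lists in declarations are not covered.

```c
#include <stddef.h>
#include <string.h>

enum paren_role { PAREN_CALL, PAREN_CAST, PAREN_GROUP };

/* Words that can begin a type name in a cast (typedef names omitted). */
static const char *const type_words[] = {
    "void", "char", "short", "int", "long", "float", "double",
    "signed", "unsigned", "const", "volatile", "struct", "union", "enum"
};

static int starts_type_name(const char *word)
{
    for (size_t i = 0; i < sizeof type_words / sizeof type_words[0]; i++)
        if (strcmp(word, type_words[i]) == 0)
            return 1;
    return 0;
}

/* prev_is_operand: nonzero if the token before '(' ends an operand,
   e.g. an identifier or ')'. next_word: the identifier or keyword
   that follows '(', or NULL if the next token is not a word. */
enum paren_role classify_lparen(int prev_is_operand, const char *next_word)
{
    if (prev_is_operand)
        return PAREN_CALL;       /* f(x) or (*fp)(x) */
    if (next_word != NULL && starts_type_name(next_word))
        return PAREN_CAST;       /* (unsigned int*)&bytes[12] */
    return PAREN_GROUP;          /* (a + b) */
}
```

In q=12**(unsigned int*)&bytes[12]; the '(' follows an operator and is followed by "unsigned", so this sketch treats it as the start of a cast.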

For a small compiler, there is no reason to support "oddball" C syntax. No
one expects SmallC to compile stuff from the IOCCC. It only needs to parse
respectable, clean, human readable C. It doesn't have to handle all legal
C.


Rod Pemberton


Alexei A. Frounze

Sep 24, 2010, 12:27:00 AM
On Sep 23, 12:37 pm, "Rod Pemberton" <do_not_h...@notreplytome.cmm>
wrote:
> "Alexei A. Frounze" <alexfrun...@gmail.com> wrote in message news:e715dc28-0814-4628...@a7g2000prb.googlegroups.com...

> On Sep 22, 8:08 pm, s_dubrov...@yahoo.com wrote:
>
> > On Sep 21, 12:21 pm, "Maxim S. Shatskih"<ma...@storagecraft.com.no.spam>
> wrote:
> > > >The treatment by the parser of '*' depends on context, right?
>
> > > I'm not sure of it. Why?
>
> > AIUI LL(1) and LR(1) work on context free grammars. I see that
> > previously you thought C was maybe not LL(1).
>
> > > The star, for example as a token to be parsed, can be an indirection
> > > operator or a multiplication operator. -determined by context. Maybe
> > > I'm wrong to look at it this way.
>
> > As I understand it, * can only be a multiplication operator if what
> > precedes it is an expression. If there's "nothing" before it or
> > there's a unary operator before it, it's an indirection/dereference
> > operator.
>
> It's also used to declare a pointer type, e.g., within a cast.  So, you've
> got at least three uses of it within non-declaration syntax.  Yes?
>
> q=12**(unsigned int*)&bytes[12];

You're right. I didn't consider variable/type definitions/declarations
when I was replying.

> Parentheses are another syntax element that is "overloaded", e.g., casts,
> function call parameters, argument lists, parenthesized expressions.
>
> For a small compiler, there is no reason to support "oddball" C syntax.  No
> one expects SmallC to compile stuff from the IOCCC.  It only needs to parse
> respectable, clean, human readable C.  It doesn't have to handle all legal
> C.

Surely, old K&R style isn't something you'd be upset about if it were
missing. The same goes for some odd, very compiler-specific behavior (e.g.,
mishandled backslashes or spaces in wrong places in preprocessor
directives).

Alex

Rod Pemberton

Sep 25, 2010, 10:46:30 AM
"Rod Pemberton" <do_no...@notreplytome.cmm> wrote in message
news:i7gaa2$34f$1...@speranza.aioe.org...

> "Alexei A. Frounze" <alexf...@gmail.com> wrote in message
> news:e715dc28-0814-4628...@a7g2000prb.googlegroups.com...
> On Sep 22, 8:08 pm, s_dubrov...@yahoo.com wrote:
>
> > > The star, for example as a token to be parsed, can be an indirection
> > > operator or a multiplication operator. -determined by context. Maybe
> > > I'm wrong to look at it this way.
> >
> > As I understand it, * can only be a multiplication operator if what
> > precedes it is an expression. If there's "nothing" before it or
> > there's a unary operator before it, it's an indirection/dereference
> > operator.
>
> It's also used to declare a pointer type, e.g., within a cast. So,
> you've got at least three uses of it within non-declaration syntax. Yes?
>
> q=12**(unsigned int*)&bytes[12];
>

What's needed is for a more powerful C compiler to parse this, and then emit
the parsing information intermixed with the C, so that a very simple parser
doesn't need to determine what this is. It can use the intermixed parse
info to correctly determine how to parse. It'd be like mixing the C source
and the intermediate representation together. As long as there is a
consistent format for the parse info, it should be possible to separate
them.

E.g., say '.' precedes all parse info,

> q=12**(unsigned int*)&bytes[12];

becomes:

.variableq.operator=.integer12.operator*.dereference*.cast(.pointerunsignedint*.cast_end).addressof&.variablebytes.operator[.index12.unused].eos;

E.g., the parser doesn't need to determine that 12 is an integer. It's told
that 12 is an integer by the ".integer" tag, and it calls the routine to get
an integer. It may need to be more robust than that, i.e., there may be
situations where '.' followed by a parse type cannot be separated from the C
code. Also, I removed the spaces in ".pointer" because I thought breaking it
down into unsigned, int, and * would be harder for a low-level parser to
reconstruct as a pointer type for a cast.

Oh, we were discussing ambiguities in the grammar. When can the spaces in
"long long" be removed without introducing ambiguity? ISTM that most C
compilers won't compile "longlong" as "long long"... AFAIK, it's the only
"keyword" with spaces within.


Rod Pemberton


s_dub...@yahoo.com

Sep 25, 2010, 2:44:53 PM
On Sep 25, 9:46 am, "Rod Pemberton" <do_not_h...@notreplytome.cmm>
wrote:
> "Rod Pemberton" <do_not_h...@notreplytome.cmm> wrote in message
> news:i7gaa2$34f$1...@speranza.aioe.org...
>
> > "Alexei A. Frounze" <alexfrun...@gmail.com> wrote in message
> > news:e715dc28-0814-4628...@a7g2000prb.googlegroups.com...
> > On Sep 22, 8:08 pm, s_dubrov...@yahoo.com wrote:
>
> > > > The star, for example as a token to be parsed, can be an indirection
> > > > operator or a multiplication operator. -determined by context. Maybe
> > > > I'm wrong to look at it this way.
>
> > > As I understand it, * can only be a multiplication operator if what
> > > precedes it is an expression. If there's "nothing" before it or
> > > there's a unary operator before it, it's an indirection/dereference
> > > operator.
>
> > It's also used to declare a pointer type, e.g., within a cast.  So,
> > you've got at least three uses of it within non-declaration syntax.  Yes?
>
> > q=12**(unsigned int*)&bytes[12];
>
> What's needed is for a more powerful C compiler to parse this, and then emit
> the parsing information intermixed with the C, so that a very simple parser
> doesn't need to determine what this is.  It can use the intermixed parse
> info to correctly determine how to parse.  It'd be like mixing the C source
> and the intermediate representation together.  As long as there is a
> consistent format for the parse info, it should be possible to separate
> them.
>
> E.g., say '.' precedes all parse info,
>
> > q=12**(unsigned int*)&bytes[12];
>
> becomes:
>
> .variableq.operator=.integer12.operator*.dereference*.cast(.pointerunsignedint*.cast_end).addressof&.variablebytes.operator[.index12.unused].eos;
>
> E.g., the parser doesn't need to determine that 12 is an integer.  It's told
> that 12 is an integer by the "integer" and calls the routine to get an
> integer.  It may need to be more robust than that.  I.e., there may be
> situations where '.' followed by parse type cannot be separated from the C
> code.  Also, I removed spaces for ".pointer" because I thought breaking it
> down into unsigned, int, and * would be harder for a low-level parser to
> reconstruct as a pointer type for a cast.
>
> Oh, we were discussing ambiguities in the grammar.  When can the spaces in
> "long long" be removed without introducing ambiguity?  ISTM that most C
> compilers won't compile "longlong" as "long long"...  AFAIK, it's the only
> "keyword" with spaces within.
>
a+++b; means ? without whitespace. There are other ambiguities as
well.

Small-c compacts whitespace but doesn't remove it.

In some languages, whitespace can be totally removed, it is only there
for readability.

Rod Pemberton

Sep 25, 2010, 3:51:18 PM
<s_dub...@yahoo.com> wrote in message
news:5b71e3dc-ed9b-40b5...@l6g2000yqb.googlegroups.com...

I think it means "a++ + b", if the H&S precedence table is correct... + is
lowest. ++ postfix is higher than ++ prefix. If I'm unsure and don't need
to verify, I generally assume left-to-right, most complete sequence first,
adjusted by parens when present. But, that'd be interesting to check with
some compilers to see what they do. GCC, OpenWatcom, and SmallC (NCC) all
do "a++ + b".


Rod Pemberton


James Harris

Sep 25, 2010, 6:24:31 PM
On 25 Sep, 20:51, "Rod Pemberton" <do_not_h...@notreplytome.cmm>
wrote:
> <s_dubrov...@yahoo.com> wrote in message
>
> news:5b71e3dc-ed9b-40b5...@l6g2000yqb.googlegroups.com...

...

> > a+++b; means ? without whitespace.
>
> I think it means "a++ + b", if the H&S precedence table is correct...  + is
> lowest.  ++ postfix is higher than ++ prefix.  If I'm unsure and don't need
> to verify, I generally assume left-to-right, most complete sequence first,
> adjusted by parens when present.  But, that'd be interesting to check with
> some compilers to see what they do.  GCC, OpenWatcom, and SmallC (NCC) all
> do "a++ + b".

You may have the right interpretation but I don't think such
recognition is anything to do with precedence. Lexers often use
maximal munch when forming composite tokens.

http://en.wikipedia.org/wiki/Maximal_munch
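A minimal sketch of maximal munch for the '+' case follows. The munch() function is hypothetical and recognizes only identifiers, '+' and '++'; it writes the tokens it finds, separated by single spaces, into a caller-supplied buffer.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Tokenize src with maximal munch and write the tokens, separated by
   single spaces, into out. Only identifiers, '+' and '++' are treated
   as multi-character tokens; everything else is one character. */
void munch(const char *src, char *out, size_t outsize)
{
    size_t n = 0;

    if (outsize == 0)
        return;
    out[0] = '\0';
    while (*src != '\0') {
        const char *start = src;

        if (isspace((unsigned char)*src)) {
            src++;                      /* whitespace only separates tokens */
            continue;
        }
        if (isalpha((unsigned char)*src) || *src == '_') {
            while (isalnum((unsigned char)*src) || *src == '_')
                src++;
        } else if (src[0] == '+' && src[1] == '+') {
            src += 2;                   /* longest match: "++" beats "+" */
        } else {
            src++;                      /* '+' or any other single char */
        }

        size_t len = (size_t)(src - start);
        if (n + len + 2 > outsize)      /* room for ' ', token and '\0' */
            break;
        if (n > 0)
            out[n++] = ' ';
        memcpy(out + n, start, len);
        n += len;
        out[n] = '\0';
    }
}
```

With this sketch, "a+++b" comes out as "a ++ + b" and "a+++++b" as "a ++ ++ + b": the lexer never reconsiders a "++" it has already taken, even when a shorter split would have produced a valid expression.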

James

io_x

Sep 26, 2010, 4:19:39 AM
<Steve> wrote in message

>Small-c compacts whitespace but doesn't remove it.

>In some languages, whitespace can be totally removed, it is only there
>for readability.

I think so too.
>Steve

s_dub...@yahoo.com

Sep 26, 2010, 11:28:37 AM
On Sep 25, 5:24 pm, James Harris <james.harri...@googlemail.com>
wrote:

> On 25 Sep, 20:51, "Rod Pemberton" <do_not_h...@notreplytome.cmm>
> wrote:
>
> > <s_dubrov...@yahoo.com> wrote in message
>
> >news:5b71e3dc-ed9b-40b5...@l6g2000yqb.googlegroups.com...
>
> ...
>
> > > a+++b; means ? without whitespace.
>
> > I think it means "a++ + b", if the H&S precedence table is correct...  + is
> > lowest.  ++ postfix is higher than ++ prefix.  If I'm unsure and don't need
> > to verify, I generally assume left-to-right, most complete sequence first,
> > adjusted by parens when present.  But, that'd be interesting to check with
> > some compilers to see what they do.  GCC, OpenWatcom, and SmallC (NCC) all
> > do "a++ + b".

Interesting...

>
> You may have the right interpretation but I don't think such
> recognition is anything to do with precedence. Lexers often use
> maximal munch when forming composite tokens.
>
>  http://en.wikipedia.org/wiki/Maximal_munch
>
> James

-and what of a+++++b ? -obviously other rules come into play for
maximal munch as (a)(++)(++)(+)(b) should be in error.

Does a++b and a++++b generate an error as it should?

I agree with Rod because the recognition and precedence (and context
for that matter) is effectively codified in the recursive descent
method.

Steve

James Harris

Sep 26, 2010, 12:47:46 PM
On 26 Sep, 16:28, s_dubrov...@yahoo.com wrote:

...

> > > > a+++b; means ? without whitespace.
>
> > > I think it means "a++ + b", if the H&S precedence table is correct...  + is
> > > lowest.  ++ postfix is higher than ++ prefix.  If I'm unsure and don't need
> > > to verify, I generally assume left-to-right, most complete sequence first,
> > > adjusted by parens when present.  But, that'd be interesting to check with
> > > some compilers to see what they do.  GCC, OpenWatcom, and SmallC (NCC) all
> > > do "a++ + b".
>
> Interesting...
>
> > You may have the right interpretation but I don't think such
> > recognition is anything to do with precedence. Lexers often use
> > maximal munch when forming composite tokens.
>
> >  http://en.wikipedia.org/wiki/Maximal_munch
>

> -and what of  a+++++b ?  -obviously other rules come into play for
> maximal munch as (a)(++)(++)(+)(b) should be in error.
>
> Does a++b and a++++b generate an error as it should?

OK I've run a few tests using from two to five adjacent plus signs.
Here are the results.

/* Two plus signs */
c = a++b;
Fails with error: expected ';' before 'b'
Presumably lexes as a++ b
OK if changed to c = a+ +b;

/* Three plus signs */
c = a+++b;
OK, presumably lexes as a++ +b

/* Four plus signs */
c = a++++b;
Fails with error: lvalue required as increment operand
and error: expected ';' before 'b'
Presumably lexes as a++ ++ b
OK if changed to a+++ +b (presumably seen as a++ + +b)

/* Five plus signs */
c = a+++++b;
Fails with error: lvalue required as increment operand
Presumably lexes as a++ ++ +b
OK if written as c = a++ + ++b;

Aren't all of the above consistent with maximal munch?

> I agree with Rod because the recognition and precedence (and context
> for that matter) is effectively codified in the recursive descent
> method.

As I said, lexers "often" use maximal munch, as gcc seems to do as
shown above. I didn't claim that all did. What results do you get?

James

Rod Pemberton

Sep 26, 2010, 1:23:28 PM
<s_dub...@yahoo.com> wrote in message
news:5b71e3dc-ed9b-40b5...@l6g2000yqb.googlegroups.com...

Except for preprocessing lines, that should be true for C too. You should
be able to remove whitespace and have the code compile. IIRC, one of the
specification defined parsing phases removes whitespace. The publicly
available grammars for C usually only match "long". They don't match "long
long". But, they have to be able to distinguish "long" from "long long"
somehow. So, I assume they are doing something special. So far, I've not
found a C compiler that supports "long long" that will accept it written
without the space, as "longlong".


Rod Pemberton

Rod Pemberton

Sep 26, 2010, 2:58:16 PM
<s_dub...@yahoo.com> wrote in message
news:23af1e5e-002a-40be...@a9g2000yqg.googlegroups.com...

> On Sep 25, 5:24 pm, James Harris <james.harri...@googlemail.com>
> wrote:
> > On 25 Sep, 20:51, "Rod Pemberton" <do_not_h...@notreplytome.cmm>
> > wrote:
> > > <s_dubrov...@yahoo.com> wrote in message
> >
>news:5b71e3dc-ed9b-40b5...@l6g2000yqb.googlegroups.com...
>
> > > a+++b; means ? without whitespace.
>
> > I think it means "a++ + b", if the H&S precedence table is correct... +
> > is lowest. ++ postfix is higher than ++ prefix. If I'm unsure and don't
> > need to verify, I generally assume left-to-right, most complete sequence
> > first, adjusted by parens when present. But, that'd be interesting to
> > check with some compilers to see what they do. GCC, OpenWatcom,
> > and SmallC (NCC) all do "a++ + b".
>
>Interesting...
>
...

> > You may have the right interpretation but I don't think such
> > recognition is anything to do with precedence. Lexers often use
> > maximal munch when forming composite tokens.
>

> -and what of a+++++b ? -obviously other rules come into play for
> maximal munch as (a)(++)(++)(+)(b) should be in error.
>
> Does a++b and a++++b generate an error as it should?
>
>I agree with Rod because the recognition and precedence (and
> context for that matter) is effectively codified in the recursive
> descent method.
>

Well, James beat me to the first reply, but I'm posting anyway since this
took a couple o' minutes of my life. It'll also answer his question on
"What results did you get?". :-)

You'll need a fixed-width font for what follows. The results are for GCC
(from DJGPP), OpenWatcom, and your first version of SmallC. I modified NCC
slightly to emit what it was parsing, since I was getting invalid-instruction
errors when running the compiled code... So, the first two "parses" for NCC
are my guess, which I believe are reasonable and correct. And, the last two
are what it actually emitted for its parsing after modification. I
didn't rerun the first two after modification.

a++b
    GCC   a++ b       Xparse_error
    OW    a++ b       Xparse_error
    NCC   a++ b       Xmissing_semicolon

a+++b
    GCC   a++ + b     OK
    OW    a++ + b     OK
    NCC   a++ + b     OK

a++++b
    GCC   a++ ++b     Xparse_error
    OW    a++ ++b     Xparse_error
    NCC   a++ + +b    Xinvalid_expression

a+++++b
    GCC   a++ ++ +b   Xlvalue_error
    OW    a++ ++ +b   Xlvalue_error
    NCC   a++ + ++b   OK

Personally, I think GCC and OW are incorrect on the last one. I tried a
couple of combinations with spaces. The few I tried seemed to fix this. So,
spaces or parens are probably a good idea. Except for SmallC, they do seem
to be "maximal munch" as James said. NCC doesn't support unary '+' on the
third one (4 pluses). I added unary plus to one version of SmallC, and that
version compiles it. I didn't modify that version to see what it was actually
parsing though. It's a little hard to tell, but the code looks like a++
plus b to me, i.e.,

/* unary plus on b */
a++ + +b

As a comparison, a publicly available ANSI C grammar updated (by me) to ISO
C99 detects:

++        ++         Xparse_error
+++       ++ +       OK
++++      ++ ++      Xparse_error
+++++     ++ ++ +    OK

The grammar is in yacc and lex (actually, bison and flex). So, it's
"maximal munch", probably. Obviously, the last should be an error, but not
a parsing error...


Rod Pemberton

s_dub...@yahoo.com

Sep 26, 2010, 10:01:16 PM
On Sep 26, 11:47 am, James Harris <james.harri...@googlemail.com>
wrote:

> On 26 Sep, 16:28, s_dubrov...@yahoo.com wrote:
[snip]
Yes. I appreciate your input; I wouldn't have thought about
maximal munch otherwise.

> > I agree with Rod because the recognition and precedence (and context
> > for that matter) is effectively codified in the recursive descent
> > method.
>
> As I said, lexers "often" use maximal munch, as gcc seems to do as
> shown above. I didn't claim that all did. What results do you get?
>

Well, Rod did a good job checking. I hadn't considered unary plus as
Rod had. I hadn't tried any until now, so I thought I'd add results from
Mix PowerC and CCN8, which give..

.. for PowerC, as Rod got me thinking about unary '+', I did:
/** File: maxmunch.c **/
/** test parsing for 'maximal munch' **/

main()
{
int a,b;

+a++b;
+a+++b;
+a++++b;
+a+++++b;
}

/** Results

Power C - Version 2.2.0
(C) Copyright 1989-1993 by Mix Software
Compiling ...
maxmunch.C(8): +a++b; (pointer under b)
************* ^ 14
14: ';' expected
------------------------------------------------------------
maxmunch.C(10): +a++++b; (pointer under b)
************** ^ 29
29: Variable required for ++ and --
------------------------------------------------------------
maxmunch.C(11): +a+++++b; (pointer under '+' before b)
************** ^ 29
29: Variable required for ++ and --
------------------------------------------------------------
29 lines compiled
3 Compile errors
**/

.. for CCN8, which doesn't support unary plus. However, it does
generate nasm output in spite of the error, as best it can, so some
inference can be drawn from that as to the point of failure...

/** File: maxmunch.c **/
/** test parsing for 'maximal munch' **/

main()
{
int a,b;

a++b; /** seen as: a++ b failure **/
a+++b;
a++++b; /** seen as: a++ + .. failure **/
a+++++b; /** parsed as a++ + ++b success **/
}
/*** results CCN8

====== main()
Line +8, main+4: missing semicolon
a++b;
^ (pointer under b)

Line +10, main+6: invalid expression
a++++b;
^ (pointer under b)
***/

So, PowerC does maximal munch also..

I'm disappointed that the lexer doesn't backtrack, considering unary
plus.
Even a++b; could be: a + +b on a second attempt.
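To see what a backtracking lexer would be up against, the sketch below counts and lists every way a run of n plus signs can be cut into "+" and "++" tokens. The functions count_splits() and print_splits() are hypothetical. Maximal munch always picks exactly one of these splits, whereas a backtracking lexer would have to try the others until the parser accepts one.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_RUN 32

/* Number of ways to cut n '+' characters into "+" and "++" tokens.
   The last token is either "+" or "++", so
   splits(n) = splits(n - 1) + splits(n - 2): a Fibonacci sequence. */
unsigned long count_splits(unsigned n)
{
    unsigned long prev = 1, cur = 1;   /* splits(0) and splits(1) */

    for (unsigned i = 2; i <= n; i++) {
        unsigned long next = cur + prev;
        prev = cur;
        cur = next;
    }
    return cur;
}

/* Print each split, tokens separated by spaces, e.g. "++ + ++". */
static void splits_rec(unsigned left, char *buf, size_t len)
{
    if (left == 0) {
        buf[len > 0 ? len - 1 : 0] = '\0';   /* drop the trailing space */
        puts(buf);
        return;
    }
    memcpy(buf + len, "+ ", 2);              /* next token is "+" */
    splits_rec(left - 1, buf, len + 2);
    if (left >= 2) {
        memcpy(buf + len, "++ ", 3);         /* next token is "++" */
        splits_rec(left - 2, buf, len + 3);
    }
}

void print_splits(unsigned n)
{
    char buf[2 * MAX_RUN + 1];   /* worst case: n single "+" tokens */

    assert(n <= MAX_RUN);
    splits_rec(n, buf, 0);
}
```

print_splits(3) prints "+ + +", "+ ++" and "++ +", one per line. For a++b the two candidates are "++" (what maximal munch takes, giving a ++ b, an error) and "+ +" (a + +b, a valid expression); for a+++++b there are count_splits(5) == 8 candidates, and maximal munch only ever considers "++ ++ +".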

So we know for certain whitespace is required for C disambiguation.

And what can I say about maximal munch.. clever, but not too smart.

Thxs,
Steve

> James

robert...@yahoo.com

Sep 27, 2010, 2:38:14 AM


Perhaps actually looking at the standard might help settle a question
this elementary.

From 6.4 in the C99 standard:

"(4) If the input stream has been parsed into preprocessing tokens up
to a given character, the next preprocessing token is the longest
sequence of characters that could constitute a preprocessing token."
(and then goes on to describe the one exception regarding header names).

And just below that:

"(6) EXAMPLE 2 The program fragment x+++++y is parsed as x ++ ++ + y,
which violates a constraint on increment operators, even though the
parse x ++ + ++ y might yield a correct expression."

The common C89 draft has basically the same language. I don't have
my copy of the final C89 standard handy, but I don't think that
section changed.

robert...@yahoo.com

Sep 27, 2010, 2:45:00 AM
On Sep 26, 12:23 pm, "Rod Pemberton" <do_not_h...@notreplytome.cmm>
wrote:
> <s_dubrov...@yahoo.com> wrote in message


Because "long long" is two tokens, not one. Just as "long double"
and "unsigned int" are two tokens each, "longdouble" and "unsignedint"
are *not* valid keywords, despite each being a single (preprocessing) token.

Rod Pemberton

Sep 27, 2010, 11:39:26 AM
<robert...@yahoo.com> wrote in message
news:55a4903c-a1d6-47cb...@26g2000yqv.googlegroups.com...

> On Sep 26, 9:01 pm, s_dubrov...@yahoo.com wrote:
> > On Sep 26, 11:47 am, James Harris <james.harri...@googlemail.com>
> > wrote:
> > > On 26 Sep, 16:28, s_dubrov...@yahoo.com wrote:
> [snip]
>
> > > > > You may have the right interpretation but I don't think such
> > > > > recognition is anything to do with precedence. Lexers often use
> > > > > maximal munch when forming composite tokens.
>
> > > > > http://en.wikipedia.org/wiki/Maximal_munch
>
> > > > -and what of a+++++b ? -obviously other rules come into play for
> > > > maximal munch as (a)(++)(++)(+)(b) should be in error.
>
> > > > Does a++b and a++++b generate an error as it should?
>
> > > Aren't all of the above consistent with maximal munch?
>
> > Yes. I appreciate your input, I wouldn't have thought to think about
> > maximal munch otherwise.
>
> > > > I agree with Rod because the recognition and precedence (and context
> > > > for that matter) is effectively codified in the recursive descent
> > > > method.
>
> > > As I said, lexers "often" use maximal munch, as gcc seems to do as
> > > shown above. I didn't claim that all did. What results do you get?
>
> > Well Rod did a good job checking. I hadn't considered unary plus as
> > Rod had. I hadn't tried any until now, so I thought I'd add from Mix
> > PowerC and CCN8, which give..
>
> > .. for PowerC, as Rod got me thinking about unary '+', I did:
> [...]

> > So, PowerC does maximal munch also..
>
> > I'm disappointed that the lexer doesn't backtrack, considering unary
> > plus.
> > Even a++b; could be: a + +b on a second attempt.
>
> > So we know for certain whitespace is required for C disambiguation.
>
> > And what can I say about maximal munch.. clever, but not too smart.
>
> From 6.4 in the C99 standard:
>
> "(4) If the input stream has been parsed into preprocessing tokens up
> to a given character, the
> next preprocessing token is the longest sequence of characters that
> could constitute a
> preprocessing token." (and then goes on to describe the one exception
> regarding header names).
>
>
> And just below that:
>
> "(6) EXAMPLE 2 The program fragment x+++++y is parsed as x ++ ++ + y,
> which violates a constraint on
> increment operators, even though the parse x ++ + ++ y might yield a
> correct expression."
>
> The common C89 draft has basically the same language, but I don't have
> my copy of the final C89 standard handy, but I don't think that
> section is changed.
>

Interesting...

It's also interesting to note that C89 3.1 has "identifier" and "operator"
as part of both syntax classes: "token" and "pre-processing token". So, it
can parse the "EXAMPLE 2". But, C99 6.4 removes "operator" from both.
Technically, this means that an "operator" is never a "pre-processing
token", nor a "token", and the quoted example therefore doesn't apply...
;-) Laugh! I'm assuming that's an error in the C99 spec, unless "operator"
somehow *now* falls into the "each non-white-space character that cannot be
one of the above" class. It seems it's that way throughout all the new
drafts too.

Humor aside, it seems this "maximal munch" goes back to K&R '74:

"If the input stream has been parsed into tokens up to a given character,
the next token is taken to include the longest string of characters which
could possibly constitute a token."

At a minimum, it prevents backtracking of the parser. Although, I'd say it
also doesn't follow the effective precedences created by the grammar.


Rod Pemberton

Rod Pemberton

Sep 27, 2010, 12:16:18 PM
<robert...@yahoo.com> wrote in message
news:d6785fa5-3b7c-4e55...@h7g2000yqn.googlegroups.com...

AIUI, the C specifications do not require spaces to delimit tokens. C is
not space delimited like Forth. If true, this means that "unsignedint"
should be parsable as "unsigned" and "int", not an identifier "unsignedint".
Parsing C that way requires exact matching of keywords, operators, etc.
Using "maximal munch" with space-less C syntax causes incorrect parsing due
to token concatenation/merging. Since "maximal munch" seems to go back to
'74, apparently C *was required* to be space delimited through this rule.


Rod Pemberton


wolfgang kern

Sep 27, 2010, 1:56:08 PM

Rod Pemberton in discussion with Steve...
...
what I read so far within this thread is that you
like to see an easy readable syntax together with
some enhanced functionality ...

My way would be (and partially already is) to go back to
ye olde BASIC-interpreters and add modern features to it.
Expressions and actions on it will become code-functions
(not macros) while the language above will remain fully
readable/maintainable to a HLL-programmer.

And besides this, I'd have a language which allows access to
all hardware capabilities, including I/O (not only AX, BX).

Too much abstraction may add convenience for the lazy,
but will for sure not add to code performance in terms
of timing and size.

__
wolfgang


robert...@yahoo.com

Sep 27, 2010, 4:10:57 PM
On Sep 27, 11:16 am, "Rod Pemberton" <do_not_h...@notreplytome.cmm>
wrote:
> <robertwess...@yahoo.com> wrote in message


The tokenization is done in terms of preprocessing tokens. Keywords
mainly do not exist at that point, and all of the keywords are parsed
as identifiers. Identifiers start with a letter or underscore, and
continue with additional letters, underscores or digits, until some
other character occurs. For reference, PP-tokens are: header names,
identifiers, preprocessing numbers, character constants, string
literals, operators, and punctuators, plus any individual characters
that can't be parsed as one, or part, of those types of tokens.

Only when you get to the later translation stages are some of the
(PP-token) identifiers, *ahem*, identified as keywords. Similarly,
PP-numbers (at least those left after preprocessing) get turned into
constants. In general, phase 7 translates all remaining PP-tokens into
"tokens" (no "PP"), reclassifying some of them, and then what most
people call the compilation step happens on that list of tokens.

Thus the (nonsensical) sequences "longlong++" and "long++" are both
two preprocessing tokens. A space ends many PP-tokens (obviously not
inside strings, for example), so "long long++" is actually three
PP-tokens. When translation gets to phase 7, the "long" identifier
PP-tokens in the second and third examples get translated into keyword
tokens, while the "longlong" becomes an identifier token. In all
three cases the "++" operator PP-token turns into an operator token.
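Robert's description can be sketched as a scanner that first reads a whole identifier (letters, digits, underscores) and only then checks it against a keyword table. This is a simplification that folds phase 3 and phase 7 into a single pass; classify(), its "kw:/id:/op:" output format and the abbreviated keyword list are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Abbreviated keyword table; a real compiler lists all of them. */
static const char *const keywords[] = {
    "char", "double", "float", "int", "long", "short", "signed", "unsigned"
};

static int is_keyword(const char *s, size_t len)
{
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strlen(keywords[i]) == len && strncmp(s, keywords[i], len) == 0)
            return 1;
    return 0;
}

/* Write each token of src as "kw:", "id:" or "op:" plus its spelling,
   separated by spaces. The identifier scan never stops early because a
   keyword happens to be a prefix, so "longlong" stays one token.
   Only "++" is treated as a multi-character operator. */
void classify(const char *src, char *out, size_t outsize)
{
    size_t n = 0;

    if (outsize == 0)
        return;
    out[0] = '\0';
    while (*src != '\0') {
        const char *start = src;
        const char *kind;

        if (isspace((unsigned char)*src)) {
            src++;                       /* whitespace ends a token */
            continue;
        }
        if (isalpha((unsigned char)*src) || *src == '_') {
            while (isalnum((unsigned char)*src) || *src == '_')
                src++;
            kind = is_keyword(start, (size_t)(src - start)) ? "kw:" : "id:";
        } else {
            src += (src[0] == '+' && src[1] == '+') ? 2 : 1;
            kind = "op:";
        }

        int len = (int)(src - start);
        int w = snprintf(out + n, outsize - n, "%s%s%.*s",
                         n > 0 ? " " : "", kind, len, start);
        if (w < 0 || (size_t)w >= outsize - n)
            break;                       /* out of room: output truncated */
        n += (size_t)w;
    }
}
```

With this sketch, "longlong++" yields "id:longlong op:++", "long++" yields "kw:long op:++", and "long long++" yields "kw:long kw:long op:++", matching the three cases above.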

s_dub...@yahoo.com

Sep 27, 2010, 6:56:01 PM

I agree with you Wolfgang.

I would prefer a block-structured, curly-bracket-delimited language
over BASIC, though not C: one sufficiently different from C so as not
to be confused with it.

Steve

Rod Pemberton

Sep 29, 2010, 8:06:54 PM
"wolfgang kern" <now...@never.at> wrote in message
news:i7qm4d$gge$1...@newsreader2.utanet.at...

>
> Rod Pemberton in discussion with Steve...
> ...
> what I read so far within this thread is that you
> like to see an easy readable syntax together with
> some enhanced functionality ...
>
> My way would be (and partilar already is) to go back to
> ye olde BASIC-interpreters and add modern features to it.
>

Have you looked at these?

http://cbmbasic.sourceforge.net/
http://ti99basic.sourceforge.net/


Rod Pemberton


s_dub...@yahoo.com

Sep 29, 2010, 10:29:55 PM
On Sep 27, 11:16 am, "Rod Pemberton" <do_not_h...@notreplytome.cmm>
wrote:
> <robertwess...@yahoo.com> wrote in message
> Rod Pemberton


Out of curiosity I took a look at BCPL to see if it specified
delimiters for identifiers. The language for names (identifiers) is
near what Robert cites for the C standard:
"
A name either starts with a capital letter and is terminated by the
first non-letter or digit, or it is a single small letter.
"
So, by inference, the remaining symbols, including whitespace, are name
(identifier) delimiters.
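Taking the quoted BCPL rule literally, a name scanner might look like the sketch below. bcpl_name_length() is hypothetical and follows only the sentence quoted above; actual BCPL implementations may accept other characters in names.

```c
#include <ctype.h>
#include <stddef.h>

/* Length of the BCPL name starting at s, or 0 if s does not start one.
   Rule as quoted: a capital letter followed by letters and digits, up
   to the first character that is neither, or else a single small
   letter on its own. */
size_t bcpl_name_length(const char *s)
{
    unsigned char c = (unsigned char)s[0];

    if (isupper(c)) {
        size_t n = 1;
        while (isalnum((unsigned char)s[n]))
            n++;
        return n;
    }
    if (islower(c))
        return 1;
    return 0;
}
```

Under this reading, "ab" lexes as two single-letter names, a and b, with no whitespace needed, whereas in C it would be the single identifier ab.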

I was going to advance the notion that a language standard is like an
'Engineering As-Built', i.e. the factual resulting compromise of a
construction. But what the heck, I'm exhausted. The standard is what
it is, and the C language is what it is. -And I'm too exhausted to
deprecate small-c into parsing a+++++b like the standard: (a)(++)(++)(+)(b).
:-)

Steve

wolfgang kern

Sep 30, 2010, 5:45:28 AM

Rod Pemberton wrote:
>> ...
>> what I read so far within this thread is that you
>> like to see an easy readable syntax together with
>> some enhanced functionality ...

>> My way would be (and particular already is) to go back to
>> ye olde BASIC-interpreters and add modern features to it.

> Have you looked at these?

> http://cbmbasic.sourceforge.net/
> http://ti99basic.sourceforge.net/

Yeah, these are funny, but I had something like TurboBasic,
PowerBasic (structured, w/o line-numbers) for x86 in mind.
For user-editable scripts I took the idea from Mr. Sinclair's
ZX80/81 to create just token-parameter strings and interpret
them as fast as possible.

__
wolfgang


Rod Pemberton

Sep 30, 2010, 6:38:50 PM
"wolfgang kern" <now...@never.at> wrote in message
news:i81n00$9r6$1...@newsreader2.utanet.at...

:)

I pulled out the C64 programmer's reference manual a while back and looked
at its BASIC. It's not something you'd want to be "100% compatible" with.
IMO, there were way too many 8-bit limitations. It is a BASIC, but it's not
a BASIC well suited to modern usage. Even a restricted subset of C is more
functional. But using line-numbered BASIC as a machine control language
does work. Many years ago, I used a machine that had BASIC as its main
control language. The BASIC code was mixed with other code that controlled
the machine's specialized operation. Machine control lines started with an
escape character, i.e., the escaped lines were redirected from BASIC's
interpreter to another interpreter. BASIC was just fine for user input,
screen output, data, control flow, and saving, running, and editing the
program. The other interpreter handled the machine control. After a
DOS-style CLI, I'd say this was one of the more useful interactive
interfaces I've used. Interpreters are nice for quick testing and
implementation. I think it worked well for three reasons: it was
line-numbered, it was interpreted, and non-BASIC code wasn't handled by
BASIC. Line-numbered interpreted C, anyone?

Linux shells are very awkward, IMO. They've got weirdly named commands,
strange parameters, odd side effects, and of course, extremely powerful
commands embedded in primitive utilities, i.e., it's easy to wreck
everything unintentionally, etc. I'd love to have a DOS shell for Linux.
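The escape-character routing described above amounts to a one-character check in front of the BASIC interpreter. A C sketch, assuming the escape character is ASCII ESC (0x1B), which the post doesn't specify; `basic_line` and `machine_line` are hypothetical stand-ins for the two interpreters.

```c
#include <stdio.h>

#define CTRL_ESC '\x1b'   /* assumed escape character; not specified in the post */

enum target { TO_BASIC, TO_MACHINE };

/* Hypothetical stand-ins for the two interpreters. */
static void basic_line(const char *line)   { printf("BASIC:   %s\n", line); }
static void machine_line(const char *line) { printf("MACHINE: %s\n", line); }

/* Route one program line. Escaped lines never reach BASIC: they go,
   minus the escape character, to the machine-control interpreter. */
enum target dispatch_line(const char *line)
{
    if (line[0] == CTRL_ESC) {
        machine_line(line + 1);
        return TO_MACHINE;
    }
    basic_line(line);
    return TO_BASIC;
}
```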


Rod Pemberton


Alexei A. Frounze

Oct 1, 2010, 8:05:20 AM
On Sep 30, 3:38 pm, "Rod Pemberton" <do_not_h...@notreplytome.cmm>
wrote:
> "wolfgang kern" <nowh...@never.at> wrote in message

Interesting... Are you running everything as root all the time? I
think there's something good in being able to run stuff with reduced
privileges and elevate/switch user (or runas/sudo) only as necessary.
It can save you from a lot of unintentional mistakes with far reaching
consequences.

Alex

wolfgang kern

Oct 1, 2010, 4:09:13 PM

Rod Pemberton replied:

[removed ALA from the AOD crosspost here (I'm subscribed to both)]

...
>>>> what I read so far within this thread is that you
>>>> like to see an easy readable syntax together with
>>>> some enhanced functionality ...

>>>> My way would be (and partly already is) to go back to
>>>> ye olde BASIC-interpreters and add modern features to it.

>> Yeah these are funny, but I had something like TurboBasic,
>> PowerBasic (structured, w/o line-numbers) for x86 in mind.
>> For user editable scripts I took the idea from Mr.Sinclair's
>> ZX80/81 to create just token-parameter strings and interpret
>> them as fast as possible.

> :)
> I pulled out the C64 programmer's reference manual a while back and looked
> at its BASIC. It's not something you'd want as "100% compatible".

Yes, C64 BASIC isn't, and never was meant to be, compatible with or
easily convertible to x86 platforms.

> IMO, there were way too many 8-bit limitations.
> It is a BASIC, but it's not a BASIC well suited to modern usage.

I agree here, so yes, it would need some add-ons like block-structured
syntax and nested IF/DO/WHILE/...

> Even a restricted subset of C is more functional.

Yeah, but Steve's intention seems to be to add even more functionality
and get rid of useless/redundant syntax requirements...
I really appreciate every attempt in this direction.

> But, using line-numbered BASIC as a machine control language does work.

I can confirm this from personal experience (even many years ago).

> Many years ago, I used a machine that had BASIC as its main control
> language. The BASIC code was mixed with other code that controlled
> the machine's specialized operation.

I too used various MC boards, and I even produced a few 8/16-bit
machines back in the old days with BASIC-style ROM code.

> Machine control lines were started with an escape character.
> I.e., the escaped lines were redirected from BASIC's interpreter to
> another interpreter.

Yes, I remember...PDP, NOVA-II,...

> BASIC was just fine for user input, screen output, data, control flow,
> saving, running, editing the program.
> The other interpreter handled the machine control. After a DOS
> style CLI, I'd say this was one of the more useful interactive interfaces
> I've used. Interpreters are nice for quick testing and implementation. I
> think it worked well for three reasons: line-numbered, interpreted, and
> non-BASIC code wasn't handled by BASIC. Line-numbered interpreted C
> anyone?

Interpreters are often more effective than complex code compilation.
But line-numbered source... is certainly not my thing.

> Linux shells are very awkward, IMO. They've got weirdly named commands,
> strange parameters, odd side effects, and of course, extremely powerful
> commands embedded in primitive utilities, i.e., easy to wreck everything
> unintentionally, etc. I'd love to have a DOS shell for Linux.

Windoze was, and obviously still acts as, a 'new BASIC' interpreter,
and it got quite a high market share...
(a lesson in how to fool customers).

I gave up using M$ products for anything other than games and the
Internet; it's just not worth fiddling with all the required detours
only to end up with limited access rights.

Feel free to create a new Mshit-DOS-shell for Loonix ...:)
but I'd like to see it in ASM without commandline parameters.

__
wolfgang


Rod Pemberton

Oct 1, 2010, 5:57:44 PM
"Alexei A. Frounze" <alexf...@gmail.com> wrote in message
news:6dbf5e0b-8930-4ae3...@v6g2000prd.googlegroups.com...

> On Sep 30, 3:38 pm, "Rod Pemberton" <do_not_h...@notreplytome.cmm>
> wrote:
>
> > [snip]

> > Linux shells are very awkward, IMO. They've got weirdly named commands,
> > strange parameters, odd side effects, and of course, extremely powerful
> > commands embedded in primitive utilities, i.e., easy to wreck everything
> > unintentionally, etc. I'd love to have a DOS shell for Linux.
>
>
> Interesting... Are you running everything as root all the time?
>

Well, duh! :-) Actually, the primary OSes I use (DOS, Win98SE) have no
support for multi-user, so everything is "root". Linux does, but I don't
use it much. I've been meaning to use it more.

My comment was more on the chaotic features of Linux commands and utilities
than on needing to reduce my access level...

>
> I think there's something good in being able to run stuff with
> reduced privileges and elevate/switch user (or runas/sudo) only
> as necessary. It can save you from a lot of unintentional mistakes
> with far reaching consequences.
>

You do know that multi-user systems - especially for home use - have
historically been the rarity? I.e., I've been using OSes without user
privilege levels for so long that I don't really care for them. And when
I've worked on OSes with multi-user privileges, I've always needed to be
"root" to do the work.

Nowadays, Linux is the only environment I have which is multi-user, but I
don't use it much. I've been meaning to downgrade it from 64-bit to 32-bit
(both VectorLinux), so I've got access to glibc instead of DJGPP's custom C
library. You suggested using 'runas' or 'sudo'. It's been a long time
since I used such a command on SysV. I don't recall what it was named.
AFAIR, the command had you re-login to boost your privilege; however, the
privilege of running that command had to be enabled for that account. If
I'm the only user and I'm using Linux mostly to compile code (i.e., no
Internet, no GUI, etc.), how does that help me rather than annoy me? Can
you compile and run executables without root privilege? Without
sudo/runas? IIRC, compiling and executing are typically locked out of
"user-level" accounts by default, i.e., to prevent users from compiling C
code with buffer overflow and C stack exploits.

Most of the systems I use, used, or worked on the most are single-user
(DOS, Win98SE, C64, Macintosh, etc.), while others are multi-user (SysV,
VOS, VMS, Linux). I have minimal WinXP/Vista/7 exposure. I know they
implement some sort of multi-user by default. Years ago, yes, I thoroughly
locked down *other* users since I was the admin. ;-) I do have access to
newer versions of Windows, but they aren't my machines, so I'm not
thoroughly familiar with the account levels and restrictions. If I had a
"new" Windows version on a system of mine, I'd probably attempt to log in
as "root", or reconfigure it so it would do so automatically. I know that
the privileges WinXP, Vista, and Win7 allow by default have become more
restrictive, but I'm not too familiar with them. IIRC, XP had a special
login that annoyingly removed the GUI and access to installed programs,
etc., while boosting privilege to run admin scripts. For Internet usage,
I'd probably fall back to a "user-level" account to help prevent
"drive-by" "root-kit" installs, since MS can't seem to fix this, i.e., lock
it down when it's most likely to come under attack. Most Linuxes have the
multi-user login-level scripts set up without annoyances. I leave them
alone. But I've used more than one Linux where I've completely removed the
scripts for setting user levels. It's when the multi-user levels become
annoying that I turn the system into a single-user, single-terminal
system. If I used Linux a bit more, I'd probably do it every time.

BTW, do you set your GUI trash bin to delete automatically? I do... and have
for at least a decade. Have I deleted something I wanted back? Yes, maybe
three times over 3 decades. Does DOS allow you to recover deleted files?
It did, up to 6.22 I think, but after that, no. TUC: Total (single) User
Control. If I tell it to delete, I want it deleted. It doesn't have to be
"scrubbed" or overwritten, but I want my disk space back now. Odd, that
makes me sound authoritarian... "Do what I say!" I just want the computer
to do exactly what I tell it to do: nothing more, nothing less. I don't
want any "interference" from the OS. I don't want a "paternalistic" OS,
like Windows is becoming, that keeps blocking, changing, requesting
confirmation, or preventing my actions. I don't want it to make "informed"
or "smart" or "better" or "ask questions" or "automatically restore deleted
files since they're system files" or "delay/prevent deleting a file since
it's 'in use' or 'locked'" or "do something else behind my back for 'safety'
or 'security' or 'because it "knows" better or "knows" what I want'". Of
course, if utilities are designed rationally, then a simple utility isn't
going to have a powerful destructive command as one of its features, and
therefore, privilege levels aren't needed.


Rod Pemberton


Robert Redelmeier

Oct 1, 2010, 8:43:48 PM
In alt.lang.asm Rod Pemberton <do_no...@notreplytome.cmm> wrote in part:

> Linux shells are very awkward, IMO. They've got weirdly named commands,
> strange parameters, odd side effects, and of course, extremely powerful
> commands embedded in primitive utilities, i.e., easy to wreck everything
> unintentionally, etc. I'd love to have a DOS shell for Linux.


IIRC 4DOS is available for Linux. The main interactive
difference between Linux command shells and MS-DOSish command
shells is that the former automagically parse wildcards and pass
the resulting list to the command invoked, while MS-DOS passes the cmd
line untouched and commands have to do their own parsing.
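To illustrate the difference: a Unix shell expands `*.c` itself and hands the command a list of file names, while an MS-DOS command receives the literal `*.c` and has to match it on its own. The sketch below does the DOS-style job with POSIX `fnmatch()`; the name list stands in for a directory read, and `count_matches` is a hypothetical helper. Note that `fnmatch()` follows Unix matching rules (case-sensitive, no 8.3 semantics), not DOS's exact ones.

```c
#include <fnmatch.h>
#include <stddef.h>

/* DOS-style: the command itself matches the literal pattern it was given
   against each name. Under a Unix shell this step has already happened
   before the command starts. Returns how many of the n names match. */
size_t count_matches(const char *pattern, const char *const names[], size_t n)
{
    size_t hits = 0;

    for (size_t i = 0; i < n; i++)
        if (fnmatch(pattern, names[i], 0) == 0)   /* 0 means "matched" */
            hits++;
    return hits;
}
```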

-- Robert


Alexei A. Frounze

Oct 2, 2010, 6:01:16 AM
On Oct 1, 2:57 pm, "Rod Pemberton" <do_not_h...@notreplytome.cmm>
wrote:
> "Alexei A. Frounze" <alexfrun...@gmail.com> wrote in message news:6dbf5e0b-8930-4ae3...@v6g2000prd.googlegroups.com...

>
> > On Sep 30, 3:38 pm, "Rod Pemberton" <do_not_h...@notreplytome.cmm>
> > wrote:
>
> > >  [snip]
> > > Linux shells are very awkward, IMO. They've got weirdly named commands,
> > > strange parameters, odd side effects, and of course, extremely powerful
> > > commands embedded in primitive utilities, i.e., easy to wreck everything
> > > unintentionally, etc. I'd love to have a DOS shell for Linux.
>
> > Interesting... Are you running everything as root all the time?
>
> Well, duh!  :-)  Actually, the primary OSes I use (DOS, Win98SE) have no
> support for multi-user, so everything is "root".  Linux does, but I don't
> use it much.  I've been meaning to use it more.
>
> My comment was more on the chaotic features of Linux commands and utilities
> than on needing to reduce my access level...

Well, you're paying the price for the convenience. :) Most people do.
But most people probably do it because they just don't or can't or are
unwilling to know any better. You, OTOH, are informed of the risks
associated with what you're doing but do it anyway. In either case,
it's a trade-off, a fair one in your case.

Alex

Rod Pemberton

Oct 4, 2010, 3:06:41 AM
"Alexei A. Frounze" <alexf...@gmail.com> wrote in message
news:5d72f5ac-a1b6-4e2e...@e34g2000prn.googlegroups.com...

> On Oct 1, 2:57 pm, "Rod Pemberton" <do_not_h...@notreplytome.cmm>
> wrote:
> > "Alexei A. Frounze" <alexfrun...@gmail.com> wrote in message news:6dbf5e0b-8930-4ae3...@v6g2000prd.googlegroups.com...
> > > On Sep 30, 3:38 pm, "Rod Pemberton" <do_not_h...@notreplytome.cmm>
> > > wrote:
>
> > > > [snip]
> > > > Linux shells are very awkward, IMO. They've got weirdly named
> > > > commands, strange parameters, odd side effects, and of course,
> > > > extremely powerful commands embedded in primitive utilities, i.e.,
> > > > easy to wreck everything unintentionally, etc. I'd love to have a
> > > > DOS shell for Linux.
>
> > > Interesting... Are you running everything as root all the time?
>
> > [snip]

>
> Well, you're paying the price for the convenience. :) Most people do.
>
...

> But most people probably do it because they just don't or can't or are
> unwilling to know any better.
>

The category most computer users are in is: "I can use it to do something".
Beyond that, they don't know or care how it works, until they think it
doesn't work correctly which can be a long time after the fact. The next
category is: "proficient with applications", e.g., word processing,
spreadsheets, presentation software, email. Are we even close to any of
those categories? Or, are we way, way beyond? I think I'm way beyond. I
think my intellect, electrical engineering background, computer programming
skills, and personal operating system development experiences put me into a
very small group of people when it comes to computers. If you're in an area
of the US where many of these types of people are employed, they may seem to
be quite common. If you take the US as a whole, they aren't... Employees
with certain skillsets tend to cluster around employers who hire people with
those skillsets. It's like trying to find the students who performed poorly
in High School on an elite University campus. If you exclude the athletes,
you'll only find the strong academic performers at a top University.

Is your mother very computer literate? If not a mother, then is there a
father, grandparents, brother, sister, etc. in your family, or a friend who
is utterly clueless about computers? Have you had to answer many, many
questions about computers for such a person? Did he/she try to understand?
Did he/she ask you the same question at least a dozen times? Have you tried
explaining the perils of computers to him/her? Security? Virii, malware,
adware, rootkits, phishing, hacking, spoofing, scams, etc? Have they asked
you to run a full virus scan, only for them to become annoyed after it was
still going 5 hours later?

I have tried to explain a very small amount of my computing related
knowledge to my mother. I would like her to grasp the basics so using a
computer is easier. But, it's no use. She doesn't understand the
terminology, or my explanations, or accept that the "way things are" is the
"way things are". She doesn't want to understand how computers work, or
what computer security or Internet safety is, etc. She just wants to use
her computer to do what she wants. It doesn't matter to her if computers
don't work the way she wants them to work. She'll attempt to make it work
her way. She doesn't care that the software doesn't do what she wants.
She'll attempt to make it do what she wants. She'll keep trying until she
breaks something. Or, I hear her complaints about not being able to do it,
essentially forever. I.e., she likes using filenames as a notepad - think
4KB filenames in Windows... She likes saving, deleting, opening, and
renaming files while in Windows save file dialog, etc. Why? She doesn't
have to figure out where her files are stored... They're right there in the
save dialog box. She doesn't know or care what a directory is, what a
directory path is, refuses to remember directory paths, i.e., where her
files are saved, or how to get to them. She just wants them in a folder,
and for that folder to always "be there" when she's using an application
which accesses them. She doesn't understand the difference between a GUI, a
web browser, and the Internet.

I wasn't mentioning all this to belittle my mother. I'm just pointing out
that a typical computer programmer's understanding of computers is way
beyond that of most computer users, most office workers, most college
students, etc. And OS developers are far beyond that. Some "way beyond"
people are paranoid and will lock a computer down to the point it can't be
used, while others will decide it's not worth the effort, since you can't
keep everyone out and there is always one expert out there who can
intrude. I tend to be in the "keep as many out as I can with reasonable
effort" camp, but I'm aware that there is always someone who can, or
possibly can, get in. The problem is that the Internet lets anyone in the
World intrude into your life without needing to travel to you.

> You, OTOH, are informed of the risks associated with what you're doing
> but do it anyway. In either case, it's a trade-off, a fair one in your
case.
>

Well, it's not like there is zero security present. Is it enough? I hope
so, but probably not... Is there a way in? Probably, although I hope
not...

I have virus and malware scanners. I have all browser privileges disabled
for most Windows Internet zones. I have most browser privileges disabled
for zones in use. I have Javascript disabled for Adobe .pdf's. I have a
variety of exploits disabled by a security program or I add the site to the
hosts file. I've removed or changed registry keys to enhance security.
I've got a hardware firewall. I actively monitor and kill applications that
I think are out of control. I reboot, if something is doing "too much":
disk, network, cpu, or "more than I told it to do", or "just seems not quite
correct". I aggressively delete file caches, cookies, history, and Flash
storage, etc. and I have most caches and data from the Internet going to a
ramdisk - so this data actually gets deleted. I've redone my registry keys
and browser commands to ensure those caches go to ramdisk. I check certain
registry keys to prevent changes. I pull the ethernet cable if I see
network activity that shouldn't be there. And, other stuff I'll not mention
publicly. So far, most people targeted on the Internet are targeted for
financial theft.

Security is a bit like the lock on your front house door. Any guy with
enough momentum can knock the door in. A smart guy can usually get in
with a shim or screwdriver, and a shove. Authorities or criminals can shoot
out the hinges. Do you add a chain lock? A deadbolt? Multiple locks?
Different door keys? What about a large sliding bolt? Steel door? Heavy
duty industrial hinges? Door with a bullet proof core? Door with a water
core (e.g., for bombs)? At what point is the door secure enough? There are
windows on the house too, a wood roof, a thin metal garage door, etc.
Security for the house as a whole is dependent on other features: visual
monitoring of your property, use of inside lights, motion activated outside
lights, an alarm system, door/window lock pins, fence (think guy trying to
hop fence with stolen item in the dark...), landscaping (think rocks,
ditches, trees, shrubs, etc.), sprinkler system (think guy carrying stolen
item on wet lawn at 3 or 4am...), dog, shotgun, etc. At some point,
security becomes more than is usually required and too much may hinder or
reduce the quality of life of the occupant. There are parallels to
OSes with excessive privilege restrictions.


Rod Pemberton


s_dub...@yahoo.com

Oct 22, 2010, 8:40:09 PM
On Oct 1, 3:09 pm, "wolfgang kern" <nowh...@never.at> wrote:
> Rod Pemberton replied:
>
[snip]

>
> > IMO, there were way too many 8-bit limitations.
> > It is a BASIC, but it's not a BASIC well suited to modern usage.
>
> I agree here, so yes it would need some add-ons like block-structured
> syntax and nested IF/DO/WHILE/...
>
> > Even a restricted subset of C is more functional.
>
> Yeah, but Steve's intention seems to be to add even more functionality
> and get rid of useless/redundant syntax requirements...
> I really appreciate every attempt in this direction.
>

Yes, my attention is toward a simpler syntax than C, with flexible
functionality.

I've been sketching out a 'curly brackets' kind of syntax for a while...

I'm interested in Abstract Data Types, because they lend themselves to
axiomatic (algebraic) specification.
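For readers unfamiliar with algebraic specification, the classic example is a stack, specified by equations such as top(push(s, x)) = x and pop(push(s, x)) = s. The C sketch below is a generic rendering of those textbook axioms, not the notation used in the Dale and Walker book; the stack is kept immutable so the equations can be checked literally, with pointer equality.

```c
#include <stdbool.h>
#include <stdlib.h>

/* An immutable (persistent) stack: push never modifies its argument,
   so the axioms can be checked literally, with pointer equality. */
typedef struct node {
    int top;
    const struct node *rest;
} Stack;

const Stack *stack_new(void)       { return NULL; }   /* the empty stack */
bool stack_isempty(const Stack *s) { return s == NULL; }

const Stack *stack_push(const Stack *s, int x)
{
    Stack *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->top = x;
    n->rest = s;                       /* share, never copy, the old stack */
    return n;
}

/* top and pop are unspecified on the empty stack, as in the usual
   specification; callers must test stack_isempty first. */
int stack_top(const Stack *s)          { return s->top; }
const Stack *stack_pop(const Stack *s) { return s->rest; }

/* Checks the four axioms for one particular s and x. */
bool axioms_hold(const Stack *s, int x)
{
    const Stack *p = stack_push(s, x);
    bool ok = stack_isempty(stack_new())      /* isempty(new) = true        */
           && !stack_isempty(p)               /* isempty(push(s,x)) = false */
           && stack_top(p) == x               /* top(push(s,x)) = x         */
           && stack_pop(p) == s;              /* pop(push(s,x)) = s         */
    free((void *)p);
    return ok;
}
```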

One book I have, 'Abstract Data Types' by Nell Dale and Henry M.
Walker covers specifications, implementations and applications written
in a generic syntax somewhat like pascal and ada, but closer to (after
digging thru the appendix) Mathematica where a number of examples are
given in Mathematica form.

You can wiki for Mathematica: http://en.wikipedia.org/wiki/Mathematica

And an example of its syntax is here (for Mod):

http://reference.wolfram.com/mathematica/ref/Mod.html

It turns out that what I had in mind for my syntax is somewhat close
to what Mathematica is:

http://reference.wolfram.com/mathematica/ref/If.html

..and Mathematica has a 20+ year headstart.

Some of the basic ideas for statements..

a function call..
<dotted_operator> ::= '.'operator'.' '[' <operator list> ']' ';'

a declaration..
<colon_definition> ::= ':'operator':' '[' <formal_parameter_list> ']'
'{' <compound_statement> '}'

.etc.
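A hand-written recursive-descent recognizer for the function-call production above is short. This C sketch assumes that operator names are letters only, that operands are runs of letters and digits, and that an empty operand list is allowed; the posted grammar fixes none of that, and `parse_dotted_call` is a hypothetical name. The colon definition could be recognized the same way, with a compound statement in place of the final ';'.

```c
#include <ctype.h>
#include <stddef.h>

/* Result of recognizing one call of the form
   '.' operator '.' '[' operand { ',' operand } ']' ';'  */
struct call {
    char op[16];     /* operator name, e.g. "pwr" */
    int  nargs;      /* number of operands */
};

static const char *skip_ws(const char *p)
{
    while (isspace((unsigned char)*p))
        p++;
    return p;
}

/* An operand is assumed to be a run of letters/digits. */
static const char *operand(const char *p)
{
    const char *start = p = skip_ws(p);
    while (isalnum((unsigned char)*p))
        p++;
    return p == start ? NULL : p;
}

/* Returns 1 and fills *c on success, 0 on a syntax error. */
int parse_dotted_call(const char *p, struct call *c)
{
    size_t n = 0;

    p = skip_ws(p);
    if (*p++ != '.') return 0;
    while (isalpha((unsigned char)*p) && n < sizeof c->op - 1)
        c->op[n++] = *p++;
    c->op[n] = '\0';
    if (n == 0 || *p++ != '.') return 0;
    p = skip_ws(p);
    if (*p++ != '[') return 0;

    c->nargs = 0;
    p = skip_ws(p);
    if (*p != ']') {                      /* empty list allowed (assumption) */
        for (;;) {
            if ((p = operand(p)) == NULL) return 0;
            c->nargs++;
            p = skip_ws(p);
            if (*p != ',') break;
            p++;                          /* another operand follows */
        }
    }
    if (*p++ != ']') return 0;
    p = skip_ws(p);
    return *p == ';';
}
```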


Steve


> > But, using line-numbered BASIC as a machine control language does work.
>
> I can confirm this from personal experience (even many years ago).
>

[snip]
> __
> wolfgang


Rod Pemberton

Oct 23, 2010, 2:10:47 PM
<s_dub...@yahoo.com> wrote in message
news:41c7fb80-d6e1-4848...@c20g2000yqj.googlegroups.com...

> Some of the basic ideas for statements..
>
> a function call..
> <dotted_operator> ::= '.'operator'.' '[' <operator list> ']' ';'
>
> a declaration..
> <colon_definition> ::= ':'operator':' '[' <formal_parameter_list> ']'
> '{' <compound_statement> '}'
>

Ah... some grammar... The guys on comp.lang.misc, and probably
comp.compilers too, love grammar stuff.


RP


Rod Pemberton

Oct 23, 2010, 2:34:47 PM
"wolfgang kern" <now...@never.at> wrote in message
news:i9um4r$9cv$2...@newsreader2.utanet.at...
>
> I don't need/want to be C-compliant, so I decided to use a fast
> token interpreter for user editable application-strings.
> Functions are already part of the OS and are chain-jumped rather
> than called, because it saves on call/ret pairs and this is faster!

chain-jumped!!! :-)

> [...] fast token interpreter [...]

and

> good old GOTO survived, and because parameters won't haunt my stack
> there are no issues with nested IF/DO/WHILE by jumping out of them.
>

Sounds a lot like FORTH in assembly as DTC (Direct Threaded Code)...
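Wolfgang's actual token format isn't shown, so this is only a sketch of the general idea: a token-threaded interpreter in portable C, with each token's parameter stored inline right after it, so nothing goes through a call stack and GOTO is just an assignment to the instruction pointer. In real DTC (and in the chain-jumping described above) each routine ends by jumping straight to the next routine instead of returning to a central `switch`; standard C can't express that jump portably, although GCC's labels-as-values extension comes close. The token set here is made up for illustration.

```c
/* Made-up token set; a token that takes a parameter has it inline. */
enum { T_END, T_LOAD, T_ADD, T_SETC, T_LOOP, T_GOTO };

/* Token-threaded interpreter: fetch a token, dispatch, repeat.
   Parameters are read straight from the token string, so nothing is
   passed on a call stack, and GOTO is just a change of ip. */
long run_tokens(const int *code)
{
    const int *ip = code;
    long acc = 0;
    int count = 0;

    for (;;) {
        switch (*ip++) {
        case T_END:  return acc;
        case T_LOAD: acc = *ip++;     break;   /* T_LOAD n: acc = n   */
        case T_ADD:  acc += *ip++;    break;   /* T_ADD n:  acc += n  */
        case T_SETC: count = *ip++;   break;   /* T_SETC n: count = n */
        case T_LOOP: {                         /* T_LOOP t: if --count > 0, go to t */
            int target = *ip++;
            if (--count > 0)
                ip = code + target;
            break;
        }
        case T_GOTO: ip = code + *ip; break;   /* T_GOTO t: go to index t */
        default:     return -1;                /* unknown token */
        }
    }
}
```

For example, `{ T_LOAD, 0, T_SETC, 3, T_ADD, 4, T_LOOP, 4, T_END }` adds 4 three times and returns 12.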

> I too search for better solutions for my user-editable applications
> and programmer tools. The syntax-check is the biggest part of it.
>

Yeah, I ran into this issue too with my attempts at C compilers.

Basically, I've concluded that both my tools and I will "know" what type
an object or keyword or syntax element is. So, what I call
"character-directed parsing" works for me. Officially, I think it's a form
of syntax-directed parsing. I use a single ASCII character directly in
front of the syntax to direct a switch(). The parser doesn't have to
identify what the syntax element represents, since I knew what it was when
I coded it, or the code-emitting application I wrote implemented what I
knew about it. Currently, I use it for assembly, but it can be used for
other situations, like C, too. I use it like this for assembly:

.eax _out $255
.eax _push
.ax _rcl $4
.ax .01 _rcl

E.g., a period indicates a register, an underscore indicates an
instruction, $ indicates a decimal integer, ' a character constant, and " a
string. There is no need to figure out what "eax" is, i.e., is it an
instruction, a register, or a variable? The period says it's a register.
As you can see, I've got pseudo-registers, like .01 for the hardcoded ,1
syntax of the ror/rcl/shl/shr, etc. instructions. I've got others set up
too, like , comma for prefixes, ! for code size, % for hex integer, : for
label, ; for label-reference, [ and ] for memory reference, etc. The parser
doesn't need to identify what comes next. It just needs to jump to the
routine to get it. Identifying what comes next takes a lot of processing
in conventional parsers.
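The "character-directed" dispatch described above boils down to a single `switch` on the first character of each field. A C sketch using the sigils listed in the post; the enum names and `classify_line` are hypothetical, and fields are assumed to be separated by spaces (a real tokenizer would also have to read strings and character constants that contain spaces).

```c
/* Kinds of syntax element, decided purely by the leading character. */
enum kind {
    K_REGISTER, K_INSTRUCTION, K_DECIMAL, K_HEX, K_CHAR, K_STRING,
    K_LABEL, K_LABEL_REF, K_MEM_OPEN, K_MEM_CLOSE, K_PREFIX, K_SIZE,
    K_UNKNOWN
};

/* No symbol-table lookup and no guessing whether "eax" is a register,
   a mnemonic or a variable: the sigil alone picks the routine. */
enum kind classify(const char *field)
{
    switch (field[0]) {
    case '.':  return K_REGISTER;     /* .eax, .ax, pseudo-register .01 */
    case '_':  return K_INSTRUCTION;  /* _push, _rcl, _out */
    case '$':  return K_DECIMAL;      /* $255 */
    case '%':  return K_HEX;
    case '\'': return K_CHAR;
    case '"':  return K_STRING;
    case ':':  return K_LABEL;
    case ';':  return K_LABEL_REF;
    case '[':  return K_MEM_OPEN;
    case ']':  return K_MEM_CLOSE;
    case ',':  return K_PREFIX;
    case '!':  return K_SIZE;         /* code size */
    default:   return K_UNKNOWN;      /* no sigil: not valid in this scheme */
    }
}

/* Classify each space-separated field of one source line into kinds[],
   storing at most max entries; returns how many were stored. */
int classify_line(const char *line, enum kind kinds[], int max)
{
    int n = 0;

    while (n < max) {
        while (*line == ' ')
            line++;
        if (*line == '\0')
            break;
        kinds[n++] = classify(line);
        while (*line != '\0' && *line != ' ')
            line++;                   /* a real parser would read the field here */
    }
    return n;
}
```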


I've posted to alt.lang.asm and comp.lang.misc on this:
http://groups.google.com/group/alt.lang.asm/msg/489f14849201d8ee
http://groups.google.com/group/alt.lang.asm/msg/6157767abb722830

And, I mentioned the idea for C on alt.os.development:
http://groups.google.com/group/alt.os.development/msg/1810186d02a3c5f8

> At the moment I have something that looks like a G-styled text editor
> where the user can enter plain text for captions/comments/...
> and insert predefined tokens from a set of selection panels for
> position/format/color/functions/type/size..., while myself still
> use the editable hexdump to create msg-boxes,menus,input forms,...
>

The guys on comp.lang.misc seem to be enamored with XML...


Rod Pemberton


wolfgang kern

Oct 23, 2010, 7:41:45 PM

Rod Pemberton replied:

>> I don't need/want to be C-compliant, so I decided to use a fast
>> token interpreter for user editable application-strings.
>> Functions are already part of the OS and are chain-jumped rather
>> than called, because it saves on call/ret pairs and this is faster!

> chain-jumped!!! :-)

there may be another term for it, but I can't remember it right now ;)

>> [...] fast token interpreter [...]
> and
>> good old GOTO survived, and because parameters won't haunt my stack
>> there are no issues with nested IF/DO/WHILE by jumping out of them.
> Sounds a lot like FORTH in assembly as DTC (Direct Threaded Code)...

Even though my strings are just token/parameter sequences, the thing
actually works 'self-threaded' and can interact with other string
instances without detours through the stack or system calls.

And because every instance has its very own data field, declared
by a single pointer in the header of such a string, all the functional
code in the OS behind it allows the rarely needed reentrance and
asynchronous time-slice multiplexing of several strings.
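The reentrancy point can be sketched in C: if the interpreter keeps no static state and reaches everything through the instance's own data-field pointer, the same code can run several token strings interleaved. The layout below (a header holding the data pointer plus a pointer to the tokens) and the one-token-per-turn scheduler are assumptions for illustration, not wolfgang's actual format.

```c
#include <stddef.h>

enum { S_END, S_ADD };            /* S_ADD n: acc += n (made-up token set) */

/* Per-instance working data; each token string owns exactly one. */
struct data_field {
    const int *ip;                /* resume point, NULL before the first slice */
    long acc;
};

/* Header of a token string: the single pointer to its own data field,
   plus (in this sketch) a pointer to the tokens themselves. */
struct token_string {
    struct data_field *data;
    const int *tokens;
};

/* Execute one token of one instance. All state lives in s->data, none in
   statics, so the same code serves any number of interleaved instances.
   Returns 1 if a token was executed, 0 once the instance has ended. */
int step(struct token_string *s)
{
    struct data_field *d = s->data;

    if (d->ip == NULL)
        d->ip = s->tokens;        /* first time slice for this instance */
    switch (d->ip[0]) {
    case S_ADD:
        d->acc += d->ip[1];
        d->ip += 2;
        return 1;
    default:                      /* S_END (or anything unknown) stops it */
        return 0;
    }
}

/* Round-robin time slicing: one token per live instance per turn. */
void run_all(struct token_string *s[], int n)
{
    int live;

    do {
        live = 0;
        for (int i = 0; i < n; i++)
            live += step(s[i]);
    } while (live);
}
```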

Yes, I remember these discussions.
For reading code from others I prefer the original x86 ASM notation,
while in my private tools I can use:
R0w=fedc ; or
ax=fedc ; both mean MOV ax,0xfedc
eax=Sxbh ; MOVSX byte eax,bh
besides other 'abnormal' mnemonics.

But my tokens are also interpreted during source editing, so the
human-readable keywords for them are not part of the strings and could
be different to support other languages.
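Keeping only tokens in the strings means the editor's keyword text is just a lookup table, so swapping the table swaps the display language. A C sketch with made-up token codes and made-up German keywords; `detokenize` is a hypothetical helper.

```c
#include <string.h>

/* Token codes as stored in a string; the keyword text is never stored. */
enum { TK_IF, TK_THEN, TK_GOTO, TK_COUNT };

/* One keyword table per display language (the German words are made up). */
static const char *const english[TK_COUNT] = { "IF", "THEN", "GOTO" };
static const char *const german[TK_COUNT]  = { "WENN", "DANN", "GEHE" };

/* Render n tokens for the editor using the given keyword table.
   Writes at most size bytes, including the terminating NUL (size >= 1);
   stops early, at a word boundary, if the next word would not fit. */
void detokenize(const unsigned char *tok, size_t n,
                const char *const table[], char *out, size_t size)
{
    size_t used = 0;

    for (size_t i = 0; i < n; i++) {
        const char *word = tok[i] < TK_COUNT ? table[tok[i]] : "?";
        size_t len = strlen(word);
        size_t sep = (i > 0) ? 1 : 0;

        if (used + sep + len + 1 > size)
            break;                            /* no room: truncate cleanly */
        if (sep)
            out[used++] = ' ';
        memcpy(out + used, word, len);
        used += len;
    }
    out[used] = '\0';
}
```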

>> At the moment I have something that looks like a G-styled text editor
>> where the user can enter plain text for captions/comments/...
>> and insert predefined tokens from a set of selection panels for
>> position/format/color/functions/type/size..., while myself still
>> use the editable hexdump to create msg-boxes,menus,input forms,...

> The guys on comp.lang.misc seem to be enamored with XML...

Script languages... I can't tell you how much I hate redundant typing
jobs. I'm only able to create simple homepages with a standard text
editor because I have this terribly huge HTML documentation.

__
wolfgang


s_dub...@yahoo.com

Oct 23, 2010, 8:01:03 PM
On Oct 23, 7:57 am, "wolfgang kern" <nowh...@never.at> wrote:

[snip]


>
> I don't need/want to be C-compliant, so I decided to use a fast
> token interpreter for user editable application-strings.
> Functions are already part of the OS and are chain-jumped rather
> than called, because it saves on call/ret pairs and this is faster!

> good old GOTO survived, and because parameters won't haunt my stack
> there are no issues with nested IF/DO/WHILE by jumping out of them.
>

> I too search for better solutions for my user-editable applications
> and programmer tools. The syntax-check is the biggest part of it.
>

> At the moment I have something that looks like a G-styled text editor
> where the user can enter plain text for captions/comments/...
> and insert predefined tokens from a set of selection panels for
> position/format/color/functions/type/size..., while myself still
> use the editable hexdump to create msg-boxes,menus,input forms,...
>

This reminds me of a product that DRI had for CP/M called Display
Manager. I don't remember much about it because I never actually used
it past a demo, but it was a package which took a simple syntax of
commands and built text based msg-boxes, menus, input forms, and the
like, on the fly, as it interpreted commands for it. It was an
interesting idea for its day, but it suffered in speed on the 8 MHz
systems.

> I followed some discussions about 'regular expression' interpretation
> in the past, but even though I think this may add to readability, it
> needs some more graphic elements to display complex multi-line formulas.
> Not to mention an editor for power(sup),index(sub),roots,infititives
> and multiline brackets.

Actually, what got me thinking about some sort of dotted operator
syntax was how to represent such math constructs without graphics,
being restricted to ASCII, or a superset of it. In the old days,
power was sometimes represented as m**n, or m^n, or m^^n. So another
way, using dotted operator syntax, is .pwr.[m,n]; for
.operator.[operand_list]; and the syntax allows parsing in a reasonable
way, I think. An interpreter for it could generate graphic output
display for the user at some point.

Steve

> It may exist already anyway, I gave up on it to avoid bloat.
>
> __
> wolfgang

s_dub...@yahoo.com

Oct 23, 2010, 8:04:40 PM
On Oct 23, 1:10 pm, "Rod Pemberton" <do_not_h...@notreplytome.cmm>
wrote:
> <s_dubrov...@yahoo.com> wrote in message

Yeah, just enough shown for Halloween :-)

If I ever finish the syntax and an interpreter for it, I'll drop a
note there too.

Steve

wolfgang kern

Oct 24, 2010, 8:56:59 AM

Sreve wrote:

|[snip]
...


> At the moment I have something that looks like a G-styled text editor
> where the user can enter plain text for captions/comments/...
> and insert predefined tokens from a set of selection panels for
> position/format/color/functions/type/size..., while myself still
> use the editable hexdump to create msg-boxes,menus,input forms,...

|This reminds me of a product that DRI had for CP/M called Display
|Manager. I don't remember much about it because I never actually used
|it past a demo, but it was a package which took a simple syntax of
|commands and built text based msg-boxes, menus, input forms, and the
|like, on the fly, as it interpreted commands for it. It was an
|interesting idea for its day, but suffered in speed on the 8 MHz
|systems.

I liked the idea (and partly took it) from Clive Sinclair's
ZX80/81 BASIC, which saved me a lot of keystrokes by using the
hotkey functions.
I may be a decent programmer, but an awfully lazy typist ;)

Speed wasn't an issue; even my old 50 MHz machine (RIP) could fill
a full 1024*768 screen with formatted (interpreted) graphic text
within one 60 Hz screen-refresh period.
On the newer, faster machines, I can add functions and calculations,
like direct numeric display from a memory image (bin2ascii, bin2hex, ...),
and graphic elements like borders, icons and symbols, up to the
point where the screen starts flickering.

> I followed some discussions about 'regular expression' interpretation
> in the past, but even though I think this may add to readability, it needs
> some more graphic elements to display complex multi-line formulas.
> Not to mention an editor for power(sup),index(sub),roots,infititives
> and multiline brackets.

|Actually, what got me thinking about some sort of dotted-operator
|syntax was how to represent such math constructs without graphics,
|restricted to ASCII or a superset of it. In the old days, power was
|sometimes written as m**n, m^n, or m^^n. So another way, using
|dotted-operator syntax, would be .pwr.[m,n]; i.e., .operator.[operand_list];
|The syntax allows parsing in a reasonable way, I think, and an
|interpreter for it could eventually generate graphical output for the
|user.

Dots, commas and flyshit...
I can't recommend giving these little ones more functionality than
parameter separators. I wasted too many hours hunting bugs during
the ZX80/C-64 BASIC days because they are so easily overlooked.

__
wolfgang

wolfgang kern

Oct 24, 2010, 9:13:05 AM

I posted:

> Sreve wrote:
...
I'm awfully sorry, Steve; my eyes and fingers are worn out, so my
warning about using dots syntactically is not for nothing ;)

__
wolfgang


s_dub...@yahoo.com

Oct 24, 2010, 11:56:21 PM

It's OK, Wolfgang, that is a valid concern; my eyesight isn't what it
was either. The syntax is more about position and uniqueness, so
substituting a different symbol is just a macro substitution away. :-)

Steve

Rod Pemberton

Jan 30, 2011, 6:41:21 AM

"Rod Pemberton" <do_no...@notreplytome.cmm> wrote in message
news:i74pa5$qo0$1...@speranza.aioe.org...
>
> I've looked at the SmallC heir's. It should be easy to implement them for
> logical-and && and logical-or ||.
>

Steve,

IIRC, we were having problems following the recursive-descent parser in
SmallC. Well, it seems someone wrote a grammar for SmallC back in 1983.
The grammar follows the code. Geof Cooper's SmallC grammar is here:
http://groups.google.com/group/net.sources/browse_thread/thread/7e70ce70b4d4dd17/13cb3e331ece1bd7

I've not seen that style before, but it's close to E/BNF for Yacc/Bison.

Google Groups' archive sometimes pulls up posts from as far back as 1983 or
1981. However, posts from before 1991 or so become unfindable for 6-8 months
at a time... Among them, I also found two other versions of SmallC, one for vax and
the complete Chris Lewis C3.OR1.1 version. I also found 3 of 4 posts on V2
CP/M runtime support for SmallC. They were posted to net.sources and
comp.os.minix. If you want links or Usenet msg-IDs, let me know.


Rod Pemberton

s_dub...@yahoo.com

Jan 30, 2011, 11:57:35 PM

On Jan 30, 5:41 am, "Rod Pemberton" <do_not_h...@notreplytome.cmm>
wrote:
> "Rod Pemberton" <do_not_h...@notreplytome.cmm> wrote in message
> news:i74pa5$qo0$1...@speranza.aioe.org...
>
> > I've looked at the SmallC heir's.  It should be easy to implement them for
> > logical-and && and logical-or ||.
>
> Steve,
>
> IIRC, we were having problems following the recursive-descent parser in
> SmallC.  Well, it seems someone wrote a grammar for SmallC back in 1983.
> The grammar follows the code.  Geof Cooper's SmallC grammar is here: http://groups.google.com/group/net.sources/browse_thread/thread/7e70c...
>
> I've not seen that style before, but it's close to E/BNF for Yacc/Bison.
>

Hey, thanks.
Yeah, informal EBNF, that's ok.

Looks like a version for version 2 (Hendrix), just a heads-up on
that.

> Google Groups' archive sometimes pulls stuff up back to 1983 or 1981.
> However, posts prior to 1991 or so become unfindable for 6-8 months at a
> time...  In them, I also found two other versions of SmallC, one for vax and
> the complete Chris Lewis C3.OR1.1 version.  I also found 3 of 4 posts on V2
> CP/M runtime support for SmallC.  They were posted to net.sources and
> comp.os.minix.  If you want links or Usenet msg-IDs, let me know.
>

Sure, pass them (the links) along; I'll check them against whatever
sources I may or may not have.

I'd started on a PC-DOS 'handles' (INT 21h) version of an iolib, and I'm
still in the midst of it.

And, I've been trying to inch along an interpreter for a "simplified"
curly braces kind of language.
It's been hard to gather enough time together to work on it.

Thanks again,

Steve

> Rod Pemberton

Rod Pemberton

Feb 1, 2011, 2:24:08 AM

<s_dub...@yahoo.com> wrote in message
news:11a75afc-840d-4ed4...@d11g2000yql.googlegroups.com...

> On Jan 30, 5:41 am, "Rod Pemberton" <do_not_h...@notreplytome.cmm>
> wrote:
> > "Rod Pemberton" <do_not_h...@notreplytome.cmm> wrote in message
> > news:i74pa5$qo0$1...@speranza.aioe.org...
> >
> > I've looked at the SmallC heir's. It should be easy to implement them
> > for logical-and && and logical-or ||.
> >
> > IIRC, we were having problems following the recursive-descent parser in
> > SmallC. Well, it seems someone wrote a grammar for SmallC back in 1983.
> > The grammar follows the code. Geof Cooper's SmallC grammar is [link].

> >
> > I've not seen that style before, but it's close to E/BNF for Yacc/Bison.
> >
>
> Hey, thanks.
> Yeah, informal EBNF, that's ok.
>
> Looks like a version for version 2 (Hendrix), just a heads-up
> on that.

How do I tell version 1 from version 2?

> I also found two other versions of SmallC, one for vax
> and the complete Chris Lewis C3.OR1.1 version. I also
> found 3 of 4 posts on V2 CP/M runtime support for SmallC.

Ok. I found the first.

> And, I've been trying to inch along an interpreter for a
> "simplified" curly braces kind of language.
>

Interpreter of bytecode or source code? If source code, that's different.
It may be somewhat slower to do lexing and parsing, but I think it's
doable. You'd definitely want to find ways to minimize lexing/parsing. I
think some of those issues are called syntax "ambiguities". E.g., I think
space-based parsing, like FORTH's, will work with much of C, although C
doesn't require spaces.

I think it's doable because ... SmallC uses a two-register model. FORTH,
which is stack-based, usually uses only the first few stack items at a time,
e.g., 3 or 4, somewhat like procedure locals and parameters in C. So, it
should be easy to implement that two-register model for a stack-based
language, like FORTH, or for a stack-based interpreter. Also, I wrote a C
"compiler" (very limited) which *almost* output FORTH. There were some
remaining issues. I wasn't taking it in the FORTH direction, since I've
already got an in-progress FORTH interpreter in C. E.g., one of the things
I didn't finish was the "compiler" code to convert C function parameters to
stack use, i.e., the parameters/return and prolog/epilog. But, it was
pretty close to FORTH. Still is.

> Sure, pass them (the links) along,

Chris Lewis C3.OR1.1 (all 3)
http://groups.google.com/group/comp.os.minix/browse_thread/thread/bbea732c991fe4f8/a931179f3d2a2bdb
http://groups.google.com/group/comp.os.minix/browse_thread/thread/baa464962324565b/28a71e38d03359cf
http://groups.google.com/group/comp.os.minix/browse_thread/thread/bcfd910bbc3a19f3/7a0efd632d5f5f17

V2 CP/M support (all 4)
http://groups.google.com/group/net.sources/browse_thread/thread/4be21f65ce5dda7f/c219f5c302ca8f4a
http://groups.google.com/group/net.sources/browse_thread/thread/b660134d0e9c2fa0/3651698ed1557836
http://groups.google.com/group/net.sources/browse_thread/thread/b660134d0e9c2fa0/b234d4d757e89ddc
http://groups.google.com/group/net.sources/browse_thread/thread/b660134d0e9c2fa0/62e319347f3496fb

SmallC for vax. Somewhere in there, there is supposed to be an #include ""
fix:
http://groups.google.com/group/net.sources/browse_thread/thread/1001cddd9bf4d/9104cdcd9bacce9a


Rod Pemberton


s_dub...@yahoo.com

Feb 1, 2011, 1:10:29 PM

On Feb 1, 1:24 am, "Rod Pemberton" <do_not_h...@notreplytome.cmm>
wrote:
> <s_dubrov...@yahoo.com> wrote in message
> news:11a75afc-840d-4ed4...@d11g2000yql.googlegroups.com...
>
> > On Jan 30, 5:41 am, "Rod Pemberton" <do_not_h...@notreplytome.cmm>
> > wrote:
> > > "Rod Pemberton" <do_not_h...@notreplytome.cmm> wrote in message
> > > news:i74pa5$qo0$1...@speranza.aioe.org...
>
> > > I've looked at the SmallC heir's. It should be easy to implement them
> > > for logical-and && and logical-or ||.
>
> > > IIRC, we were having problems following the recursive-descent parser in
> > > SmallC. Well, it seems someone wrote a grammar for SmallC back in 1983.
> > > The grammar follows the code. Geof Cooper's SmallC grammar is [link].
>
> > > I've not seen that style before, but it's close to E/BNF for Yacc/Bison.
>
> > Hey, thanks.
> > Yeah, informal EBNF, that's ok.
>
> > Looks like a version for version 2 (Hendrix), just a heads-up
> > on that.
>
> How do I tell version 1 from 2 ?
>
By the added syntax for things unsupported in Cain's original version,
such as:
"extern", "#undef", "register", "switch", etc.

> > I also found two other versions of SmallC, one for vax
> > and the complete Chris Lewis C3.OR1.1 version. I also
> > found 3 of 4 posts on V2 CP/M runtime support for SmallC.
>
> Ok.  I found the first.
>

Thanks again; I have this already, but not in this form. There's _not_
an 8086 backend for it, just to be clear.

> > And, I've been trying to inch along an interpreter for a
> > "simplified" curly braces kind of language.
>
> Interpreter of bytecode or source code?  If source code, that's different.

The source is to be typical text syntax. Purportedly, an interpreter
for a language is easier to write than a compiler; we'll see. I can
see that it should be easier to debug and bootstrap a language this
way, if starting from assembler.

> It may be somewhat slower to do lexing and parsing, but I think it's
> doable.  You'd definitely want to find ways to minimize lexing/parsing.  I
> think they call some of those issues syntax "ambiguities".  E.g., I think
> space based parsing, like FORTH, will work with much of C, although C
> doesn't require it.
>

Those issues depend on the form of the language. I'm not far enough
along to know what I'll end up with; we'll see.

> I think it's doable because ...  SmallC uses a two register model.  FORTH,
> which is stack based, usually uses only the first few stack items at a time,
> e.g., 3 or 4, somewhat like procedure locals and parameters for C.  So, it
> should be easy to implement that two register model for a stack based
> language, like FORTH, or for a stack-based interpreter.  Also, I wrote a C
> "compiler" (very limited) which *almost* output FORTH.  There were some
> remaining issues.  I wasn't taking it in the FORTH direction, since I've
> already got an in-progress FORTH interpreter in C.  E.g., one of the things
> I didn't finish was the "compiler" code to convert C function parameters to
> stack use, i.e., the parameters/return and prolog/epilog.  But, it was
> pretty close to FORTH.  Still is.
>
> > Sure, pass them (the links) along,
>

> Chris Lewis C3.OR1.1 (all 3)
> http://groups.google.com/group/comp.os.minix/browse_thread/thread/bbe...
> http://groups.google.com/group/comp.os.minix/browse_thread/thread/baa...
> http://groups.google.com/group/comp.os.minix/browse_thread/thread/bcf...
>
> V2 CP/M support (all 4)
> http://groups.google.com/group/net.sources/browse_thread/thread/4be21...
> http://groups.google.com/group/net.sources/browse_thread/thread/b6601...
> http://groups.google.com/group/net.sources/browse_thread/thread/b6601...
> http://groups.google.com/group/net.sources/browse_thread/thread/b6601...
>
> SmallC for vax.  Somewhere in there, there is supposed to be an #include ""
> fix:
> http://groups.google.com/group/net.sources/browse_thread/thread/1001c...
>

Thanks for those. As you know, the CP/M stuff can map to Call5. What's
interesting in them is a mapping of Unix-y stuff to CP/M, so I'll be
reading them more closely with an eye to mapping the Unix-y stuff to
Call5.
There's also a clearer indication of how 'switch' could be
implemented.

Anyway, thanks again for those,

Steve

> Rod Pemberton
