Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Using a C struct to represent a node in a parse tree ... how many fields in the struct?

64 views
Skip to first unread message

Roger L Costello

unread,
Apr 8, 2022, 11:50:41 AM4/8/22
to
Hi Folks,

The compiler book [1] that I am reading says this:

The most obvious way to represent the information gained from lexical and
syntax analysis is as a tree along the lines of the parse tree. In C this is
suitably handled using a simple struct for each node:

struct node
{
int nodetype ;
struct node *field1 ;
struct node *field2 ;
struct node *field3 ;
struct node *field4 ;
struct node *field5 ;
} ;

But, but, but, ..what if a node requires more than 5 fields; how is that
handled?

/Roger

[1] Introduction to Compiling Techniques, A First Course Using ANSI C, LEX and
YACC by J. P. Bennett, page 47.
[Use a bigger struct. You know what kind of nodes your parser creates, so define
a struct that does what you want. Personally, I prefer to define a struct for
each node type and a general node that is a union, or perhaps a union of
pointers. -John]

Johann Klammer

unread,
Apr 8, 2022, 2:59:23 PM4/8/22
to
doesn't happen.
e.g. if/else is always 1 or two nodes. if they're deeper nested
it ends up in subnodes. as long as you use flex/yacc you'll just have to look
what's the highest number of nonterminals in your ruleset. and that's the number
of fields in your struct, possibly plus an identifying enum and usually some
union for the terminal symbols(as they won't have subnodes)
if you're overly anal you can prolly dynamic allocate those pointers.

> struct node
> {
> int nodetype ;
> node *fields[0];
> } ;

Kaz Kylheku

unread,
Apr 8, 2022, 5:09:26 PM4/8/22
to
On 2022-04-08, Roger L Costello <cost...@mitre.org> wrote:
> Hi Folks,
>
> The compiler book [1] that I am reading says this:
>
> The most obvious way to represent the information gained from lexical and
> syntax analysis is as a tree along the lines of the parse tree. In C this is
> suitably handled using a simple struct for each node:
>
> struct node
> {
> int nodetype ;
> struct node *field1 ;
> struct node *field2 ;
> struct node *field3 ;
> struct node *field4 ;
> struct node *field5 ;
> } ;
>
> But, but, but, ..what if a node requires more than 5 fields; how is that
> handled?

The fields are all nodes so you can easily have a linked list:

struct node
{
int nodetype;
struct node *car;
struct node *cdr;
}

The Lisp family of languages handle all syntax using nothing but binary
cells like this, including nodes with arbitrary numbers of arguments.


You can also also have an N-ary tree, compactly allocated, using
the C struct hack (which is now official and called "flexible array
member"):

struct node {
int nodetype;
int nfields;
struct node* fields[];
};

When the node is allocated, extra room is requested for required
number of fields, which are then accessed as nodeptr->field[0],
and so on.

If it's ever necessary to dynamically grow that after a node has
come into existence, and is referenced in multiple places in
the program, the above representation has disadvantages.

But you can easily switch to:

struct node {
int nodetype;
int nfields;
struct node** fields;
};

where fields is a separately allocated array of pointers. The
access expressions look the same.

nodeptr->field[0]

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal

marb...@yahoo.co.uk

unread,
Apr 9, 2022, 12:05:58 PM4/9/22
to
Hmm! The last time I wrote a (simple) compiler, it never occurred to me to use any of those methods! Here's my code (with the 'payload' removed from the struct and new_node):
I guess Johann would say I'm being extremely anal :-D

struct tree {
enum token_type type;
struct tree *child, *sibling;
};

static struct tree *new_node(int type) {
struct tree *res=malloc_or_bomb(sizeof *res);
res->type=type;
res->child=res->sibling=NULL;
return res;
}

static struct tree *add_child(struct tree *parent, struct tree *child) {
/* Add child to end of parent's list of children. */
struct tree **p;
for (p=&parent->child; NULL!=*p; p=&(*p)->sibling)
;
*p=child;
return child;
}

static struct tree *add_first_child(struct tree *parent, struct tree *child) {
/* Add child to start of parent's list of children. */
child->sibling=parent->child;
parent->child=child;
return child;
}

(I think I pinched the child/sibling technique for storing trees with arbitrary numbers of children from an Infocom game!)
[The child/sibling approach is the way Lisp lists have worked since the last 1950s.
For reasons related to the architecture of the IBM 709, they're usually spelled CAR and CDR. -John]

luser droog

unread,
Apr 28, 2022, 12:30:37 AM4/28/22
to
On Friday, April 8, 2022 at 4:09:26 PM UTC-5, Kaz Kylheku wrote:
> On 2022-04-08, Roger L Costello <cost...@mitre.org> wrote:
> > Hi Folks,
> >
> > The compiler book [1] that I am reading says this:
> >
> > The most obvious way to represent the information gained from lexical and
> > syntax analysis is as a tree along the lines of the parse tree. In C this is
> > suitably handled using a simple struct for each node:
> >
> > struct node
> > {
> > int nodetype ;
> > struct node *field1 ;
> > struct node *field2 ;
> > struct node *field3 ;
> > struct node *field4 ;
> > struct node *field5 ;
> > } ;
> >
> > But, but, but, ..what if a node requires more than 5 fields; how is that
> > handled?
> The fields are all nodes so you can easily have a linked list:
>
> struct node
> {
> int nodetype;
> struct node *car;
> struct node *cdr;
> }
>
> The Lisp family of languages handle all syntax using nothing but binary
> cells like this, including nodes with arbitrary numbers of arguments.
>

This is mostly how I do it. I've just started a new draft in C, so it's the
appropriate size (and scope) to share as an example. One trick I've found
useful in previous versions is the extra 'data' member in the symbol struct.
If the output of the tokenizer is a stream of symbols, then the data for each
symbol can be something useful like the original string representation and/or
file position.


#define PC10OBJECT_H

typedef union object object;
typedef object list;
typedef object suspension;
typedef object parser;
typedef object boolean;
typedef object operator;
typedef operator predicate;
typedef object fSuspension( object );
typedef object fParser( object, list );
typedef object fOperator( object, object );
typedef boolean fPredicate( object, object );
typedef object fBinaryOperator( object, object );

typedef enum {
INVALID, INT, LIST, SUSPENSION, PARSER, OPERATOR, SYMBOL, STRING, VOID
} tag;

union object { tag t;
struct { tag t; int i; } Int;
struct { tag t; struct list *ref; } List;
struct { tag t; struct suspension *ref; } Suspension;
struct { tag t; struct parser *ref; } Parser;
struct { tag t; struct operator *ref; } Operator;
struct { tag t; struct symbol *ref; } Symbol;
struct { tag t; struct string *ref; } String;
struct { tag t; void *ptr; } Void;
};

struct list {
object a, b; // or car and cdr if you like
};

struct suspension {
object env; // an association list of (<symbol>, <value>) pairs
fSuspension *f;
};

struct parser {
object env;
fParser *f;
};

struct operator {
object env;
fOperator *f;
};

struct symbol {
int code;
char *printname;
object data;
};

struct string {
char *ptr;
int disposable; // if yes, the GC should call free(ptr)
};

extern boolean T_, NIL_;
0 new messages