Bithov Vinu Student <
vi...@calday.co.uk> writes:
> Here is my code for parsing S-expressions:
>
> ```
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #define TOKEN_MAX 256
> typedef struct List List;
>
> struct List {
> union {
> char* string;
> List* list;
> };
> enum {
> STRING,
> SYMBOL,
> LIST
> } type;
> List* next;
> };
>
> List* parse(FILE* stream) {
> List* head, *tail = NULL;
You don't initialise head, and later on the logic relies on it being
NULL to start with.
> char c;
>
> while((c = fgetc(stream)) != EOF) {
c should be an int. What are you using as your reference for learning
C? It's possible you are being led astray, because this sort of loop
should be explained early on.
Also, if you loop until the input finishes, you are not, logically,
parsing an S-expression. What would you do with the input
a b c
? It's a list in some sense but not an S-expression. An S-expression
parser should be ready to stop when it's done.
> List* current = NULL;
>
> if(c == '(') {
> current->list = parse(stream);
> current->type = LIST;
You can't access p->mem unless p points to a structure but I think the
"big picture" needs to be re-thought out here. Break the problem down
into cooperating functions.
> } else if(c == '"') {
> char* tmp = calloc(1, TOKEN_MAX);
> int i = 0;
> while(((c = fgetc(stream)) != '"') && (i < TOKEN_MAX) && (c != ')') && (c != '(')) {
> tmp[i++] = c;
> }
You need to account for EOF. And why can't a string include
parentheses?
> tmp[i] = c;
> current = calloc(1, sizeof(List));
> current->string = calloc(1, strlen(tmp)+1);
> strcpy(current->string, tmp);
> current->type = STRING;
This is the sort of pattern I'd expect. Allocate the struct and fill in
the members. It's a small point, but you know the length already. No
need for a strlen call.
But once the scope is left, the storage pointed to by tmp is lost
forever. You should free it.
> } else {
> char* tmp = calloc(1, TOKEN_MAX);
> int i = 0;
> while(((c = fgetc(stream)) != ' ') && (i < TOKEN_MAX) && (c != ')') && (c != '(')) {
> tmp[i++] = c;
> }
EOF again, but this time you are too permissive. Symbols can't contain
(unescaped) spaces and such like. This loop should test what is
permitted, rather than what is not.
> tmp[i] = c;
> current = calloc(1, sizeof(List));
> current->string = calloc(1, strlen(tmp)+1);
> strcpy(current->string, tmp);
> current->type = SYMBOL;
> }
> if(current != NULL) {
> if(head == NULL) {
> head = tail = current;
> } else {
> tail = tail->next = current;
> }
> }
> }
> return head;
> }
>
> int main() {
> parse(fopen("test.lisp", "r"));
> }
> ```
>
> I know the code for parsing symbols and strings can be truncated into
> a single general-purpose function but for now I'm just looking to get
> any sort of parser working.
Hmm... not sure I'd put those together. But I would suggest lots of
functions. S-expressions have a structure; I'd reflect that in the
parsing functions.
Tips: turn on as many warnings form the compiler as you can. If using a
recent gcc, use its -fsanitize=undefined option. And/or, if you can,
run the program under the control of an run0time sanitiser like
valgrind.
--
Ben.