Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Segmentation fault when appending to a Linked List

23 views
Skip to first unread message

Bithov Vinu Student

unread,
Sep 10, 2021, 4:51:32 PM9/10/21
to
Here is my code for parsing S-expressions:

```
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define TOKEN_MAX 256
typedef struct List List;

struct List {
union {
char* string;
List* list;
};
enum {
STRING,
SYMBOL,
LIST
} type;
List* next;
};

List* parse(FILE* stream) {
List* head, *tail = NULL;
char c;

while((c = fgetc(stream)) != EOF) {
List* current = NULL;

if(c == '(') {
current->list = parse(stream);
current->type = LIST;
} else if(c == '"') {
char* tmp = calloc(1, TOKEN_MAX);
int i = 0;
while(((c = fgetc(stream)) != '"') && (i < TOKEN_MAX) && (c != ')') && (c != '(')) {
tmp[i++] = c;
}
tmp[i] = c;
current = calloc(1, sizeof(List));
current->string = calloc(1, strlen(tmp)+1);
strcpy(current->string, tmp);
current->type = STRING;
} else {
char* tmp = calloc(1, TOKEN_MAX);
int i = 0;
while(((c = fgetc(stream)) != ' ') && (i < TOKEN_MAX) && (c != ')') && (c != '(')) {
tmp[i++] = c;
}
tmp[i] = c;
current = calloc(1, sizeof(List));
current->string = calloc(1, strlen(tmp)+1);
strcpy(current->string, tmp);
current->type = SYMBOL;
}
if(current != NULL) {
if(head == NULL) {
head = tail = current;
} else {
tail = tail->next = current;
}
}
}
return head;
}

int main() {
parse(fopen("test.lisp", "r"));
}
```

I know the code for parsing symbols and strings can be truncated into a single general-purpose function but for now I'm just looking to get any sort of parser working. When tested with the following code (in test.lisp):

```
("hello" world ("this is " an ("example of a common" lisp (program))))
```

Compiling with gcc with standard flags gives me a segfault. Compiling with -g and using "gdb run" gives me the output:

```
Starting program: /home/USER/sex

Program received signal SIGSEGV, Segmentation fault.
0x00005555555553e8 in parse (stream=0x5555555592a0) at sex.c:57
57 tail = tail->next = current;
```

What have I done wrong? Thanks

--
Student Account

--
Calday Grange Grammar School is a charitable company limited by guarantee
and registered in England and Wales with company number 8332696.
The
Registered Office is at Grammar School Lane, West Kirby, Wirral, CH48 8GG

Lew Pitcher

unread,
Sep 10, 2021, 4:57:32 PM9/10/21
to
Consider the above test. If c contains a '(', you
a) assign the results of parse() to a structure member
pointed to by current (current->list), and
b) assign the value of LIST to a structure member pointed
to by the current (current->type)

However, just prior to the test, you set current to point
to NULL. You can't safely access memory at NULL.




[snip]
> What have I done wrong? Thanks
>
> --
> Student Account




--
Lew Pitcher
"In Skills, We Trust"

Lew Pitcher

unread,
Sep 10, 2021, 5:45:37 PM9/10/21
to
Regarding the error reported by gdb

On Fri, 10 Sep 2021 13:51:26 -0700, Bithov Vinu Student wrote:

> Here is my code for parsing S-expressions:
>
> ```
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #define TOKEN_MAX 256
> typedef struct List List;
>
> struct List {
> union {
> char* string;
> List* list;
> };
> enum {
> STRING,
> SYMBOL,
> LIST
> } type;
> List* next;
> };
>
> List* parse(FILE* stream) {
> List* head, *tail = NULL;
> char c;
>
> while((c = fgetc(stream)) != EOF) {
> List* current = NULL;
>
[snip]
> if(current != NULL) {
> if(head == NULL) {
> head = tail = current;
> } else {
> tail = tail->next = current;
> }
> }
> }
> return head;
> }
[snip]
> Compiling with gcc with standard flags gives me a segfault. Compiling with -g and using "gdb run" gives me the output:
>
> ```
> Starting program: /home/USER/sex
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x00005555555553e8 in parse (stream=0x5555555592a0) at sex.c:57
> 57 tail = tail->next = current;
> ```
>
> What have I done wrong? Thanks

Note your initialization...
> List* head, *tail = NULL;

Note the assignment
> tail = tail->next = current;

Given the initialization of tail,
1) Where do you set tail to an object (other than NULL)?
2) what does tail point at?
3) what does tail->next reference?

Ben Bacarisse

unread,
Sep 10, 2021, 6:28:47 PM9/10/21
to
Bithov Vinu Student <vi...@calday.co.uk> writes:

> Here is my code for parsing S-expressions:
>
> ```
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #define TOKEN_MAX 256
> typedef struct List List;
>
> struct List {
> union {
> char* string;
> List* list;
> };
> enum {
> STRING,
> SYMBOL,
> LIST
> } type;
> List* next;
> };
>
> List* parse(FILE* stream) {
> List* head, *tail = NULL;

You don't initialise head, and later on the logic relies on it being
NULL to start with.

> char c;
>
> while((c = fgetc(stream)) != EOF) {

c should be an int. What are you using as your reference for learning
C? It's possible you are being led astray, because this sort of loop
should be explained early on.

Also, if you loop until the input finishes, you are not, logically,
parsing an S-expression. What would you do with the input

a b c

? It's a list in some sense but not an S-expression. An S-expression
parser should be ready to stop when it's done.

> List* current = NULL;
>
> if(c == '(') {
> current->list = parse(stream);
> current->type = LIST;

You can't access p->mem unless p points to a structure but I think the
"big picture" needs to be re-thought out here. Break the problem down
into cooperating functions.

> } else if(c == '"') {
> char* tmp = calloc(1, TOKEN_MAX);
> int i = 0;
> while(((c = fgetc(stream)) != '"') && (i < TOKEN_MAX) && (c != ')') && (c != '(')) {
> tmp[i++] = c;
> }

You need to account for EOF. And why can't a string include
parentheses?

> tmp[i] = c;
> current = calloc(1, sizeof(List));
> current->string = calloc(1, strlen(tmp)+1);
> strcpy(current->string, tmp);
> current->type = STRING;

This is the sort of pattern I'd expect. Allocate the struct and fill in
the members. It's a small point, but you know the length already. No
need for a strlen call.

But once the scope is left, the storage pointed to by tmp is lost
forever. You should free it.

> } else {
> char* tmp = calloc(1, TOKEN_MAX);
> int i = 0;
> while(((c = fgetc(stream)) != ' ') && (i < TOKEN_MAX) && (c != ')') && (c != '(')) {
> tmp[i++] = c;
> }

EOF again, but this time you are too permissive. Symbols can't contain
(unescaped) spaces and such like. This loop should test what is
permitted, rather than what is not.

> tmp[i] = c;
> current = calloc(1, sizeof(List));
> current->string = calloc(1, strlen(tmp)+1);
> strcpy(current->string, tmp);
> current->type = SYMBOL;
> }
> if(current != NULL) {
> if(head == NULL) {
> head = tail = current;
> } else {
> tail = tail->next = current;
> }
> }
> }
> return head;
> }
>
> int main() {
> parse(fopen("test.lisp", "r"));
> }
> ```
>
> I know the code for parsing symbols and strings can be truncated into
> a single general-purpose function but for now I'm just looking to get
> any sort of parser working.

Hmm... not sure I'd put those together. But I would suggest lots of
functions. S-expressions have a structure; I'd reflect that in the
parsing functions.

Tips: turn on as many warnings form the compiler as you can. If using a
recent gcc, use its -fsanitize=undefined option. And/or, if you can,
run the program under the control of an run0time sanitiser like
valgrind.

--
Ben.
0 new messages