
Open source C compiler using Regular Expressions


sasho648

Sep 1, 2021, 12:28:50 PM
It uses PCRE2 to parse the C file by matching one huge regex composed of several .regex files stitched together by one Perl script (main.pl). There are currently about 94 callouts placed inside it, which invoke C++ code that reads named capture groups and calls the LLVM APIs appropriately to construct a program.

https://github.com/6a4h8/cparser2/tree/wip

This is an open source compiler using regular expressions, mainly targeting C89 (as described in the FIPS PUB 160 PDF document).



The backend was originally a huge C switch, which I recently converted into C++ virtual functions. There are two sets of them: one for parsing (these can alter the match) and one for producing.
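A rough sketch of what such a switch-to-virtual-functions conversion can look like. Everything here is hypothetical and not taken from the repository: the names `CalloutHandler`, `onParse`, `onProduce`, `dispatch`, the `ident` capture name, and the string "output" standing in for the real LLVM calls are all assumptions made for illustration.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical view of the named capture groups visible at a callout.
using Captures = std::map<std::string, std::string>;

// One handler object per callout number; each replaces one `case` of a switch.
struct CalloutHandler {
    virtual ~CalloutHandler() = default;
    // Parse-time hook: returning false asks the matcher to reject this match.
    virtual bool onParse(const Captures&) { return true; }
    // Produce-time hook: emits output for an accepted match.
    virtual void onProduce(const Captures&, std::vector<std::string>&) {}
};

// Example handler (assumed): accepts identifiers, but refuses the keyword "int".
struct IdentifierHandler : CalloutHandler {
    bool onParse(const Captures& c) override {
        auto it = c.find("ident");
        return it != c.end() && it->second != "int";
    }
    void onProduce(const Captures& c, std::vector<std::string>& out) override {
        out.push_back("load " + c.at("ident"));  // placeholder for a real LLVM call
    }
};

using HandlerTable = std::map<int, std::unique_ptr<CalloutHandler>>;

// Looks up the handler by callout number and runs both hooks in order.
// Returns false when the parse hook vetoes the match.
bool dispatch(const HandlerTable& table, int number,
              const Captures& c, std::vector<std::string>& out) {
    auto it = table.find(number);
    if (it == table.end()) return true;         // unknown callout: nothing to do
    if (!it->second->onParse(c)) return false;  // parse hook altered the match
    it->second->onProduce(c, out);
    return true;
}
```

Registering a handler is then `table[1] = std::make_unique<IdentifierHandler>();`, and adding a new callout means adding a class rather than another `case`.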

The parsing one is mainly used for typedefs since they require context sensitive parsing inside functions.
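To illustrate why typedefs need context: in C, `T * x;` declares a pointer if `T` currently names a type, and is a multiplication expression otherwise, and an inner block can hide a typedef name by declaring an ordinary object with the same name. The sketch below shows one way to track that with a scope stack; `TypedefScopes` and `classifyStarStatement` are hypothetical names, not the project's code.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical scope stack recording whether a name denotes a type right now.
class TypedefScopes {
    // Per scope: name -> true if declared by typedef, false if ordinary identifier.
    std::vector<std::map<std::string, bool>> scopes_;
public:
    TypedefScopes() { scopes_.emplace_back(); }  // start with the file scope
    void enter() { scopes_.emplace_back(); }     // e.g. on '{'
    void leave() { if (scopes_.size() > 1) scopes_.pop_back(); }  // e.g. on '}'
    void declareTypedef(const std::string& n) { scopes_.back()[n] = true; }
    void declareObject(const std::string& n) { scopes_.back()[n] = false; }

    // The innermost declaration wins, so an inner object hides an outer typedef.
    bool isTypeName(const std::string& n) const {
        for (auto it = scopes_.rbegin(); it != scopes_.rend(); ++it) {
            auto found = it->find(n);
            if (found != it->end()) return found->second;
        }
        return false;  // undeclared names are not type names
    }
};

enum class Kind { Declaration, Expression };

// Classifies a statement of the shape `first * other;`.
Kind classifyStarStatement(const TypedefScopes& s, const std::string& first) {
    return s.isTypeName(first) ? Kind::Declaration : Kind::Expression;
}
```

A parse-time hook could consult such a table to accept or reject the "declaration" alternative of the regex, which is the kind of decision a context-free pattern cannot make on its own.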

Currently it doesn't implement: initialization, incomplete types, un-prototyped functions. Conditional evaluation with the logical ops is a WIP.

Most importantly it doesn't support attributes and preprocessor directives.

It does implement: everything else, hopefully.

Check out the WIP branch (last worked on under Windows). Invocation:

cparser main.pl in_src.c

Expected output (LLVM bitcode and IR representation):

in_src.c.bc
in_src.c.ll

It can be debugged if you uncomment the ending of line 6 in main.h. This will produce 2 output.txt files and significantly slow down the compilation process.

Benjamin Williams (Hodgez)

Jun 12, 2023, 1:55:35 AM
Absolute mad lad. I love it. I will have to give it a try later to see how it all works.

sasho648

Jun 12, 2023, 3:46:16 AM
On Monday, June 12, 2023 at 8:55:35 AM UTC+3, Benjamin Williams (Hodgez) wrote:
> Absolute mad lad. I love it. I will have to give it a try later to see how it all works.
Just FYI - it's now at https://github.com/AnFunctionArray/cllvmbackend (the actual perl/regex part is a git submodule). As for the "mad lad" part, you'll be happy to hear that this version is also multithreaded: last time I checked (I haven't tried the latest perl updates) this was actually faster, the bottleneck otherwise being the regex engine. You need these env vars:

MAXTHREADS=8
MINLEN=50000
SILENT=1

Otherwise the syntax is the same:

regularc ./parse.pl ./bulk/tests/test.c

Note that last time it had some issues in general, since I was trying it for different purposes (for which there is the non-standard INTPROM env var). However, at a certain point in the past I did succeed in compiling the C donut program with slight modifications (mainly removing the preprocessor parts: line concatenation and comments).

sasho648

Jun 12, 2023, 3:49:47 AM
Faster - that's for **very large** files - otherwise it's the same.