Regular expression parsing


trx2358...@yahoo.com

Sep 12, 2024, 1:39:13 AM
to Digest Recipients
As a follow-up to my article on Apter trees, I did a write-up on parsing regular expressions into trees, which can then be walked to generate a state machine for matching strings.

It builds upon Raul Miller’s post on Nondeterministic Finite Automata.

Most of the hard work is in that article.  I just slapped a skin on top of it.  Though there is at least one trick in my parser that I quite like.
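
For anyone who wants the general shape of the idea before reading the write-up, here is a minimal Python sketch. It is not the code from the write-up: the tuple-based tree, the function names (parse, build_nfa, matches), and the supported syntax (literals, ., |, *, +, ? and parentheses) are all assumptions made for illustration, and it does no error checking. The state machine is built with a textbook Thompson-style construction, which may well differ from the construction in Raul's post.

```python
from itertools import count

ANY = object()  # sentinel edge label meaning "any single character"

def parse(pattern):
    """Parse a regex into a nested-tuple tree, e.g. ('cat', ('lit', 'a'), ('star', ('lit', 'b')))."""
    pos = 0

    def peek():
        return pattern[pos] if pos < len(pattern) else None

    def alternation():
        nonlocal pos
        node = concatenation()
        while peek() == '|':
            pos += 1
            node = ('alt', node, concatenation())
        return node

    def concatenation():
        node = ('empty',)
        while peek() not in (None, '|', ')'):
            item = repetition()
            node = item if node == ('empty',) else ('cat', node, item)
        return node

    def repetition():
        nonlocal pos
        node = atom()
        while peek() in ('*', '+', '?'):
            node = ({'*': 'star', '+': 'plus', '?': 'opt'}[peek()], node)
            pos += 1
        return node

    def atom():
        nonlocal pos
        ch = peek()
        pos += 1
        if ch == '(':
            node = alternation()
            pos += 1  # skip the closing ')' (no error checking)
            return node
        return ('any',) if ch == '.' else ('lit', ch)

    return alternation()

def build_nfa(tree):
    """Walk the tree and return (start, accept, edges); an edge label of None is an epsilon move."""
    ids = count()
    edges = []

    def walk(node):
        kind = node[0]
        s, a = next(ids), next(ids)  # fresh start/accept states for this subtree
        if kind == 'lit':
            edges.append((s, node[1], a))
        elif kind == 'any':
            edges.append((s, ANY, a))
        elif kind == 'empty':
            edges.append((s, None, a))
        elif kind == 'cat':
            s1, a1 = walk(node[1])
            s2, a2 = walk(node[2])
            edges.extend([(s, None, s1), (a1, None, s2), (a2, None, a)])
        elif kind == 'alt':
            for child in node[1:]:
                cs, ca = walk(child)
                edges.extend([(s, None, cs), (ca, None, a)])
        else:  # 'star', 'plus' or 'opt'
            cs, ca = walk(node[1])
            edges.extend([(s, None, cs), (ca, None, a)])
            if kind in ('star', 'opt'):
                edges.append((s, None, a))    # allow zero occurrences
            if kind in ('star', 'plus'):
                edges.append((ca, None, cs))  # allow repeats
        return s, a

    start, accept = walk(tree)
    return start, accept, edges

def closure(states, edges):
    """All states reachable from `states` through epsilon moves alone."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for src, label, dst in edges:
            if src == q and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def matches(pattern, text):
    """True if the whole of `text` is accepted by the NFA built from `pattern`."""
    start, accept, edges = build_nfa(parse(pattern))
    current = closure({start}, edges)
    for ch in text:
        step = {dst for src, label, dst in edges
                if src in current and label is not None
                and (label is ANY or label == ch)}
        current = closure(step, edges)
    return accept in current
```

With this sketch, matches("a(b|c)*d", "abcbd") is true and matches("ab+", "a") is false. Note that it always matches against the whole string, so every pattern behaves as if it were anchored at both ends.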


This is obviously not meant to compete with PCRE; it’s more of an educational tool.  There’s currently no error checking, and things like escape handling in character classes don’t match that standard, but it should be serviceable.

Also there’s no support for group capture or anchors.  I think it’s a good exercise to imagine how these things might be added.

I’m not actually sure how to do proper anchor support, but auto-inserting .* after each non-anchored rightmost clause doesn’t work too badly as a hack.
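
Roughly, the hack amounts to something like the following Python sketch. This is an illustration rather than the code from the write-up: the helper names are made up, it treats each top-level alternative as a "clause", it ignores escapes and character classes when splitting, and it leans on Python's re.fullmatch to stand in for an engine that always matches the whole string.

```python
import re

def split_top_level(pattern):
    """Split on '|' outside parentheses (escapes and character classes are ignored for brevity)."""
    parts, depth, current = [], 0, []
    for ch in pattern:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
        if ch == '|' and depth == 0:
            parts.append(''.join(current))
            current = []
        else:
            current.append(ch)
    parts.append(''.join(current))
    return parts

def add_trailing_wildcards(pattern):
    """Hypothetical helper: drop a trailing '$' where present, otherwise append '.*'."""
    out = []
    for clause in split_top_level(pattern):
        out.append(clause[:-1] if clause.endswith('$') else clause + '.*')
    return '|'.join(out)

rewritten = add_trailing_wildcards("foo|bar$")  # "foo.*|bar"
print(bool(re.fullmatch(rewritten, "food")))    # foo is unanchored, so trailing text is allowed
print(bool(re.fullmatch(rewritten, "bart")))    # bar$ is anchored, so trailing text is rejected
```

The same trick with a leading .* would cover a missing ^, but neither is a substitute for real anchor support (for example, an anchor in the middle of a pattern is not handled at all).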

Please do let me know if you have any trouble running it, run into any bugs, etc.  The Discussion page on the Wiki is as good a place as any to post such comments.

Regards,
-Doug

