Allowing regular expressions in my language

chri...@gmail.com

May 10, 2013, 4:48:58 AM
to pe...@googlegroups.com
Hey everyone!

I am currently writing a small DSL for my master's thesis and have reached a point where I am not sure how to proceed.

I have the following rule:

value
= "true"
/ "false"
/ ["] content:([^"])* ["]
/ first:[0-9]+ last:(.[0-9]+)?
/ [/] content:([^/])* [/]
/ f:field

Now I want to allow escaped double quotes and escaped slashes as well, something like this:

value
= "true"
/ "false"
/ ["] content:([\"] / [^"])* ["]
/ first:[0-9]+ last:(.[0-9]+)?
/ [/] content:([\/] / [^/])* [/]
/ f:field

But this leads me to an error:
Expected ["] or [^"] but end of input found

I think the problem is that [\"] already "grabs" the closing double quote of a double-quoted string.

So. How do I make this work? :D

Thanks in advance!
Chris

Guilherme Vieira

May 10, 2013, 12:42:54 PM
to chri...@gmail.com, pe...@googlegroups.com
The problem is that you're using the "[characters]" expression ([\"]) to match the double-quote escape sequence (\"). In other words, at that point your grammar matches either a backslash or a double-quote. In order to match them in sequence, you should use a string literal, like this:

["] content:('\"' / [^"])* ["]

Also, I'm not sure if it's a code style matter, but I found it funny that you use ["] instead of '"'. I always use the latter. It looks like you bumped into this problem in the first place because you were using [characters] to match quotes instead of string literals.

Additionally, you might want to take a look at the !expression and &expression operators (http://pegjs.majda.cz/documentation). These will match stuff or avoid matching stuff without advancing the parser position, thus not consuming the matched or avoided string. In this particular case it's not necessary, but you could've used something like this, which is what I usually do:

["] content:('\"'? !'"' .)* ["]

Adding double-slashes is easy, too:

["] content:('\\'? '\"'? !'"' .)* ["]

It may be a little difficult to read at first, but I find this form the most easily extensible one (adding / removing stuff is easy), and it's not really that bad to read once you get used to it, I think.
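
To illustrate the !expression operator mentioned above, here is a minimal hand-written JavaScript sketch of what the simpler rule '"' (!'"' .)* '"' does. It deliberately leaves out the optional escape parts and is not code generated by PEG.js; the function name matchQuoted and the shape of its return value are assumptions made for illustration only.

```javascript
// Hand-written equivalent of the PEG expression  '"' (!'"' .)* '"'
// Returns { content, end } on success, or null when there is no match at `pos`.
function matchQuoted(input, pos = 0) {
  if (input[pos] !== '"') return null;   // opening '"'
  let i = pos + 1;
  let content = '';
  while (i < input.length) {
    // !'"' : peek at the next character and succeed only if it is NOT a quote.
    // The peek itself never advances i.
    if (input[i] === '"') break;
    // . : consume exactly one character
    content += input[i];
    i += 1;
  }
  if (input[i] !== '"') return null;     // closing '"'
  return { content, end: i + 1 };
}

console.log(matchQuoted('"abc" rest'));
```

The lookahead only inspects the input; consuming is left to the "." that follows it, which is why (!'"' .)* behaves like [^"]*.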

I hope this helps, and good luck with your thesis.

-- 
Atenciosamente / Sincerely,
Guilherme Prá Vieira





chri...@gmail.com

May 10, 2013, 1:07:04 PM
to pe...@googlegroups.com, chri...@gmail.com

Hey!
First of all, thank you very much for your fast reply :)

I tried your suggested solution:

value
= "true"
/ "false"

/ '"' content:('\"'? !'"' .)* '"'

But still, when I have this as input:
string == "Blabl\"abla"

It tells me "Line 9, column 2: Expected "\"" or any character but end of input found."

Line 9 is the end of my input, so I think the expression now consumes all characters.

Thanks for your help.
Chris
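
A possible explanation for this behaviour, sketched as hand-written JavaScript rather than PEG.js output: in a PEG.js string literal, as in JavaScript, '\"' is just a double quote, so '\"'? can swallow the closing quote whenever another character follows it. The function name matchBuggy and its return shape are hypothetical, and the inputs are simplified stand-ins for the real grammar and input.

```javascript
// Hand-written equivalent of  '"' ('\"'? !'"' .)* '"'
// Note: '\"' denotes a single double-quote character, not backslash + quote.
function matchBuggy(input, pos = 0) {
  if (input[pos] !== '"') return null;   // opening '"'
  let i = pos + 1;
  while (true) {
    let j = i;
    if (input[j] === '"') j += 1;        // '\"'? : optionally consume a quote
    if (input[j] === '"') break;         // !'"'  : fail if a quote comes next
    if (j >= input.length) break;        // .     : fail at end of input
    i = j + 1;                           // whole sequence matched; commit
  }
  if (input[i] !== '"') return null;     // closing '"'
  return { content: input.slice(pos + 1, i), end: i + 1 };
}

// The string on its own matches, because nothing follows the closing quote.
console.log(matchBuggy('"Blabl\\"abla"'));
// With more input after the string, the closing quote is consumed as if it
// were escaped, the loop runs to the end of input, and the match fails.
console.log(matchBuggy('"Blabl\\"abla" and more'));
```

This would also account for the rule working when the string is tested in isolation but failing inside a larger input.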

Guilherme Vieira

May 10, 2013, 1:45:59 PM
to chri...@gmail.com, pe...@googlegroups.com
Odd, I just tried that grammar online with the input you mentioned and it worked. I don't know how that's possible, but maybe the problem is with some other portion of your grammar. Could you put the rest of your grammar and the input that's breaking on Pastebin?

-- 
Atenciosamente / Sincerely,
Guilherme Prá Vieira



chri...@gmail.com

May 10, 2013, 2:17:50 PM
to pe...@googlegroups.com, chri...@gmail.com

Hey, sure, I can do this.

Grammar: http://pastebin.com/zeNkPdvE
Input: http://pastebin.com/QBcHJLF7

Thank you very much
Chris

Guilherme Vieira

May 10, 2013, 2:54:20 PM
to Chris Vaas, pe...@googlegroups.com
So, I made a few mistakes, sorry, I guess my JavaScript is a little rusty... You were actually right to parse the string contents using the slash operator. Additionally, I told you to use '\"' instead of '\\"'. This works:

["] content:('\\"' / [^"])* ["] { return '"' + content.join('') + '"' }
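
For readers who want to see what this rule does step by step, here is a rough hand-written JavaScript equivalent. It is a sketch only, not what PEG.js generates, and the name matchString and the { value, end } return shape are assumptions. The literal '\\"' stands for two characters, a backslash followed by a double quote, and it is tried before [^"] so the escaped quote is never mistaken for the closing one.

```javascript
// Hand-written equivalent of
//   ["] content:('\\"' / [^"])* ["] { return '"' + content.join('') + '"' }
function matchString(input, pos = 0) {
  if (input[pos] !== '"') return null;                 // opening ["]
  let i = pos + 1;
  const content = [];
  while (i < input.length) {
    if (input[i] === '\\' && input[i + 1] === '"') {   // '\\"' : backslash + quote
      content.push('\\"');
      i += 2;
    } else if (input[i] !== '"') {                     // [^"] : any non-quote char
      content.push(input[i]);
      i += 1;
    } else {
      break;                                           // unescaped quote ends the loop
    }
  }
  if (input[i] !== '"') return null;                   // closing ["]
  return { value: '"' + content.join('') + '"', end: i + 1 };
}

const input = 'string == "Blabl\\"abla" && x';
console.log(matchString(input, input.indexOf('"')));
```

One limitation of this form: a string whose content ends in an escaped backslash, written "a\\", is not recognised, because the final backslash and the closing quote are read as an escaped quote.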

-- 
Atenciosamente / Sincerely,
Guilherme Prá Vieira


