Can Hampi output the regular expressions generated from a CFG grammar

100 views
Skip to first unread message

Jun He

unread,
May 8, 2012, 12:30:03 PM5/8/12
to hampi-devel, Joseph...@it.uu.se
Hi,

Is it possible that Hampi outputs the regular expressions generated
from
a CFG grammar? For example, the regular expression limit0 is
generated
from the CFG grammar S0 with string length of 5:

var hampiStringVar : 5;
cfg S0 := "a" | A0 B0 ;
cfg A0 := "b" | S0 B0 ;
cfg B0 := "a" | B0 A0 ;
reg limit0 := fix(S0, 5);

Is it possible to output the regular expression limit0?

Best regards,
Jun He

Adam Kiezun

unread,
May 8, 2012, 8:32:08 PM5/8/12
to hampi...@googlegroups.com, Joseph...@it.uu.se
Hi,
it's currently not possible to see this expression. 
You'd need to add a println in the Hampi's source code. I could also imagine adding a println statement to a future version of Hampi grammar. If someone wants to works on it, I'd be happy to help. 

./adam

Raphaël Michel

unread,
May 11, 2012, 2:32:10 AM5/11/12
to hampi...@googlegroups.com, Joseph...@it.uu.se, Adam Kiezun
Hi !

I'm going to need it within a couple of weeks so I'll start to work on this soon.

Raphaël.

Jun He

unread,
May 11, 2012, 8:03:59 AM5/11/12
to hampi-devel
Hi,

I guess I have found a solution. You just change the Hampi's source
code file "src/hampi/parser/HConstraintPrepare.java", after line 236:
Regexp boundedRegexp = gsb.getBoundedRegexp(g, cfg, size, false), add
a new line: System.out.println(boundedRegexp). Then it will print the
converted regular expression. However there is a problem when size is
large, as it will create a long string and cause a stack overflow.
Hence it's better to output the regular expression to a file.

/Jun

Adam Kiezun

unread,
May 11, 2012, 9:28:02 AM5/11/12
to hampi...@googlegroups.com
Great. I'm happy it worked for you.
Yes, these expressions can be huge. They're not optimized for printing or readiing in any way. In memory, they are stored in an efficient way, with a lot of sharing. But when they're printed out, they're huge.

./adam

Jun He

unread,
May 13, 2012, 11:11:39 AM5/13/12
to hampi-devel
Hi Adam,

I am planning to a write a function similar to Regexp::toString(), the
purpose is trying to make the printed regular expression easy to read
and short by sharing sub regular expressions. For example,
considering the regular expression (a | b) c (a | b), I prefer to
print it as: S1 = a | b, S0 = S1 c S1. Hence the key is to compare
whether two sub regular expressions are equal or not. I tried to use
int Regexp::cachedHashCode to identify a regular expression, however
it doesn't work. A counter example is bounding the CFG in file
"resources/othertools/ambiguity/053.cfg" with length larger than 30.
Do you have any suggestion how to identify a regular expression?

/Jun

Adam Kiezun

unread,
May 15, 2012, 9:58:04 AM5/15/12
to hampi...@googlegroups.com
In general, checking if two regular expression encode the same
language is a hard problem (I think PSPACE-complete). Google
'equivalence problem of regular expressions'

./adam

shay....@gmail.com

unread,
May 13, 2012, 3:50:16 PM5/13/12
to hampi...@googlegroups.com

Sent from my LG phone
Reply all
Reply to author
Forward
0 new messages