A possible bug in "not in"

8 views
Skip to first unread message

prateeks

unread,
Nov 2, 2009, 7:53:36 PM11/2/09
to hampi...@googlegroups.com
Hi guys,

The following example return UNSAT.

var hampiStringVar : 5 ;
cfg flax0 := q0;
cfg q0 := "=" q1 ;
cfg q1 := ;

// --------------------------------

assert hampiStringVar not in flax0 ;

If the first line is changed to "var hampiStringVar : 1", it works.

-- Prateek.

Adam Kiezun

unread,
Nov 2, 2009, 8:45:24 PM11/2/09
to hampi...@googlegroups.com
Nice find! BTW, this simpler case also fails (returns UNSAT):

var hampiStringVar : 2 ;
cfg flax0 := "x";


assert hampiStringVar not in flax0 ;

I believe (without looking at the code yet) the reason is that Hampi
is trying to be smart and notices that flax0 cannot be expanded to
size 2 and so it says UNSAT! without checking whether the assertion is
an 'in' or 'not in'.

I'm may going to be able to fix this right now due to other work. Can
you look into it? I'll be very happy to put the patch in.

Thanks
./adam

Devdatta

unread,
Nov 2, 2009, 8:46:38 PM11/2/09
to hampi...@googlegroups.com
In GrammarStringBounder.java internalGetBoundedRegexp() :

I removed the check that returned NULL when a grammar can't return
empty string and I think that this fixes it.

@Adam : I might be really wrong .. please confirm!

--- src/hampi/grammars/apps/GrammarStringBounder.java (revision 20)
+++ src/hampi/grammars/apps/GrammarStringBounder.java (working copy)
@@ -141,9 +141,8 @@
}

Regexp result;
- if (x.isEmpty() && !canGenerateEmptyString){
- result = null;
- }else if (x.isEmpty() && canGenerateEmptyString){
+
+ if (x.isEmpty()){
Hampi h = new Hampi();
result = h.constRegexp("");
}else if (x.size() == 1){
Index: src/hampi/Hampi.java


Cheers
Devdatta

2009/11/2 prateeks <prat...@eecs.berkeley.edu>:

Devdatta

unread,
Nov 2, 2009, 9:04:49 PM11/2/09
to hampi...@googlegroups.com
Hi

Consider that I want to find a string that is of length 5 and does not
contain "=" in it. How would I write this as a constraint to Hampi ?

Cheers
Devdatta

2009/11/2 Devdatta <dev.a...@gmail.com>:

Adam Kiezun

unread,
Nov 2, 2009, 9:54:46 PM11/2/09
to hampi...@googlegroups.com
Hi Devdatta,
Thanks for looking into it. I have not yet confirmed that the fix
works but regardless of that, we need to make sure that the fix does
not (unless we find no other way) make Hampi slower in cases where
Hampi works correctly. In particular, simply removing the check I
mentioned (your fix may or may not be that check - I'll need to
confirm) would make Hampi much slower in cases where the check works.
If you run the regression tests with and without your fix and if they
all pass and there is no significant slowdown, I'll be happy to put
the fix in.

./adam

Adam Kiezun

unread,
Nov 2, 2009, 10:06:08 PM11/2/09
to hampi...@googlegroups.com
Currently, you need to specify the alphabet (universe) for Hampi. So,
to say that the string does not contain something you need to also say
what it can contain. For you specific problem:

var hampiStringVar : 5 ;
cfg Text := (Sigma)* ;
cfg Sigma := ['a'-'z'] | "=" | "_";

assert hampiStringVar in Text ;
assert hampiStringVar not contains "=";

This limitation is something I'd like to see resolved at some point -
I think Hampi could just assume the ASCII alphabet (or, at least emit
a warning that specifying an alphabet is required).

./adam

Vijay Ganesh

unread,
Nov 5, 2009, 11:26:59 AM11/5/09
to hampi...@googlegroups.com
Hi All,

The CAV deadline is on January 15th. Thats two months. HAMPI++ would be an excellent paper for that.

-Vijay.

Devdatta

unread,
Dec 6, 2009, 6:32:23 PM12/6/09
to hampi...@googlegroups.com
Hi Adam,

> This limitation is something I'd like to see resolved at some point -
> I think Hampi could just assume the ASCII alphabet (or, at least emit
> a warning that specifying an alphabet is required).

Have you thought about this a little more ? I am looking into
automatic conversion of RegEx to Hampi Grammar right now and support
for being able to say "cfg NT := not ( some set of characters) " would
be really useful.

Take the regex /a[^bcd][^rfg]mno[^fgh]/ for e.g. Writing this right
now in Hampi Grammar terms is really hard, if I am not wrong. Do you
know of a way to write this easily in Hampi ?


Cheers
Devdatta


2009/11/2 Adam Kiezun <aki...@gmail.com>:

Adam Kiezun

unread,
Dec 6, 2009, 7:10:22 PM12/6/09
to hampi-devel
Hi Devdatta,
I'm open to almost any solution that would work for you. What do you
think about a solution in which you'd need to declare the universe for
the string variable at declaration? (once we have multiple variables
then we could have an alphabet for each).

For example, you'd be able to declare v as:
var v : 5 in star([a-z]); //this sets the universe for the variable v
This way "assert v not in something" would have a simple semantics.

Another thing we should add is support for combining regexps as differences, eg:
reg r3 := minus(r1, r2); //means that r3 is the set of strings that
are in r1 and are not in r2

I have not thought these ideas through yet though. Do you think this
would work (in general, and for you in particular)? We'd need to come
up with a way of encoding those 'minus' expressions in STP.

./adam

Devdatta

unread,
Dec 6, 2009, 7:20:25 PM12/6/09
to hampi...@googlegroups.com
The problem with the 'universe' idea is that you have to write the
assert for the whole string, which seems to me would not be trivial
when you have multiple negated character classes at arbitrary depths
in the regular expression. What I really want to say is that a
particular cfg Non Terminal generates any character other than a given
set of characters.

Right now , I am manually doing a negation (e.g if I have the ASCII
values \097's negation then I write that the cfg produces \000-\096 |
\098-\255 ) but I am afraid this would make really big STP constraints
( I have had some experience with that). Ideally I would want to write
the "NOT of chars x" so that Hampi can do any optimization it can
think of. To start off , Hampi could do the trivial negation that I
did , but later on we could look at optimizing it.

Cheers
Devdatta


2009/12/6 Adam Kiezun <aki...@gmail.com>:

Pieter Hooimeijer

unread,
Dec 6, 2009, 7:29:09 PM12/6/09
to hampi...@googlegroups.com
One proposed approach:

1) Add an alphabet notion for any declared variable, as Adam suggested.
2) Treat negated character classes (e.g. [^0-5]) as a front end
feature -- in other words, if the alphabet is [0-9], then do the naive
encoding of [^0-5] -> [6-9].
3) During the STP encoding step, track the alphabet (this is necessary
to implement (1) anyway). As an optimization, if a particular position
can be encoded more cheaply using a negation in the STP clause, (e.g.,
when describing all alphabet symbols except a few) then do it.

I think this nicely separates the frontend/backend issues. Notably it
avoids the maintenance cost of propagating additional regular language
notation all the way to the STP encoding.

Cheers,

Pieter

Devdatta

unread,
Dec 6, 2009, 7:40:22 PM12/6/09
to hampi...@googlegroups.com
> to implement (1) anyway). As an optimization, if a particular position
> can be encoded more cheaply using a negation in the STP clause, (e.g.,
> when describing all alphabet symbols except a few) then do it.

This optimization (imho) will always be possible. I don't really need
support for "not" of a non terminal. The regex syntax only supports
notting for one character of the matched string. So instead of doing
the naive thing first and then trying to figure out if an optimization
is possible you want to support the NOT using STP's negation
assertions directly. For that you could change the formal grammar to
only support "not" of a set of characters at the front end.

Cheers
Devdatta

2009/12/6 Pieter Hooimeijer <pieter.h...@gmail.com>:

Pieter Hooimeijer

unread,
Dec 6, 2009, 8:03:21 PM12/6/09
to hampi...@googlegroups.com
> This optimization (imho) will always be possible.

There are a couple of reasons why I figured this might be more easily
done at the backend:
1) A (suboptimal) negated character class like [^a-zA-Z0-9[bunch of
other stuff here]] may actually be cheaper
to encode without an STP negation.
2) There may be other situations where the optimization is relevant,
unrelated to character class negations. Consider a regex like
"or([a-zA-Z0-9], [\000-254])" -- in this case it might be beneficial
to encode the equivalent of [^\255], even though the negation is not
explicit in the input.

Cheers,

Pieter

Adam Kiezun

unread,
Dec 7, 2009, 11:26:58 AM12/7/09
to hampi-devel
Hi,
This sounds like the simplest solution: ie for now, add front-end
support only for 1-character negations
1. not([ CHAR_LIT - CHAR_LIT ])
2. not([ CHAR_SEQ - CHAR_SEQ ])
3. (maybe) not(CHAR_SEQ)

this may be relatively easy to implement - the length of this string
is always 1 (knowing the length is important for many calculations
inside of hampi), and we should be able to encode it in STP as
'NOT(the usual thing)'. So for example, Hampi's "not([a-z])" would get
encoded in STP as something like "NOT(AND(BV[..]>= 'a', BV[..]<=
'z'))". I think we'd not need to split such 'nots' into two sub-ranges
inside Hampi, because STP would do that for us. Do you think this
would work?

The alphabet/universe may need to be added anyway. Otherwise things
like this do not have a semantics:
var v : 1;
assert a not contains '+';

I think Pieter's optimization idea is a very good one and we should
definitely look into it too. Perhaps we can start with the simplest
thing and see how much the optimization would help.

What do you guys think?
./adam

Devdatta

unread,
Dec 7, 2009, 12:51:01 PM12/7/09
to hampi...@googlegroups.com
> This sounds like the simplest solution: ie for now, add front-end
> support only for 1-character negations
> 1. not([ CHAR_LIT - CHAR_LIT  ])
> 2. not([ CHAR_SEQ - CHAR_SEQ  ])
> 3. (maybe) not(CHAR_SEQ)

You will need support for OR inside the not or (equivalently) AND of
different not constraints. I suggest the former as it seems to be
cleaner (OR is already present in the grammar).

> 'z'))". I think we'd not need to split such 'nots' into two sub-ranges
> inside Hampi, because STP would do that for us. Do you think this
> would work?
Not clear to me what you mean by 'split such nots into two sub ranges'.

>
> The alphabet/universe may need to be added anyway. Otherwise things
> like this do not have a semantics:
> var v : 1;
> assert a not contains '+';

Why not ? It just means that the 8 bit value can be anything other
than ASCII('+')

>
> I think Pieter's optimization idea is a very good one and we should
> definitely look into it too. Perhaps we can start with the simplest
> thing and see how much the optimization would help.
>

Sure. I think the first step should be to just add support in front
end. Optimizations can be added based on how much time they take to
implement.


Cheers
devdatta

Adam Kiezun

unread,
Dec 7, 2009, 1:06:16 PM12/7/09
to hampi-devel
On Mon, Dec 7, 2009 at 12:51 PM, Devdatta <dev.a...@gmail.com> wrote:
>> This sounds like the simplest solution: ie for now, add front-end
>> support only for 1-character negations
>> 1. not([ CHAR_LIT - CHAR_LIT  ])
>> 2. not([ CHAR_SEQ - CHAR_SEQ  ])
>> 3. (maybe) not(CHAR_SEQ)
>
> You will need support for OR inside the not or (equivalently) AND of
> different not constraints. I suggest the former as it seems to be
> cleaner (OR is already present in the grammar).

I'm not sure why we'd need 'OR inside the not'. Can you give an example?

>> 'z'))". I think we'd not need to split such 'nots' into two sub-ranges
>> inside Hampi, because STP would do that for us. Do you think this
>> would work?
> Not clear to me what you mean by 'split such nots into two sub ranges'.

I meant that we won't need to split "NOT(AND(BV[..]>= 'a', BV[..]<=
'z'))" into something like "OR(AND(BV[..]>= \000, BV[..]<'a'),
AND(BV[..]> 'z', BV[..]<\256))"
that is conversion of ~[b-d] => [\000-a] & [e-\256] would be taken
care of by STP.

>> The alphabet/universe may need to be added anyway. Otherwise things
>> like this do not have a semantics:
>> var v : 1;
>> assert a not contains '+';
>
> Why not ? It just means that the 8 bit value can be anything other
> than ASCII('+')

That's already implicitly assuming that ASCII is the universe. What if
we wanted smaller/bigger universe? Why not make the universe explicit
so that it's (a) obvious, and (b) controlable.

./adam

Devdatta

unread,
Dec 7, 2009, 1:42:02 PM12/7/09
to hampi...@googlegroups.com
>
> I'm not sure why we'd need 'OR inside the not'. Can you give an example?
>

The simple regex character class [^afm] will mean not ( 'a' | 'f' | 'm').

>
> I meant that we won't need to split "NOT(AND(BV[..]>= 'a', BV[..]<=
> 'z'))" into something like "OR(AND(BV[..]>= \000, BV[..]<'a'),
> AND(BV[..]> 'z', BV[..]<\256))"
> that is conversion of ~[b-d] => [\000-a] & [e-\256] would be taken
> care of by STP.
>

ok

>
> That's already implicitly assuming that ASCII is the universe. What if
> we wanted smaller/bigger universe? Why not make the universe explicit
> so that it's (a) obvious, and (b) controlable.
>
Why make it a default in Hampi ? Its already controllable in that a
user can write sigma = a-z | 0-9 | \ | / | ... , Text =sigma*
and then assert string in Text.

Cheers
devdatta

Adam Kiezun

unread,
Dec 7, 2009, 1:58:05 PM12/7/09
to hampi-devel
>> I'm not sure why we'd need 'OR inside the not'. Can you give an example?
>
> The simple regex character class  [^afm] will mean not ( 'a' | 'f' | 'm').

got it.

>> That's already implicitly assuming that ASCII is the universe. What if
>> we wanted smaller/bigger universe? Why not make the universe explicit
>> so that it's (a) obvious, and (b) controllable.
>>
> Why make it a default in Hampi ? Its already controllable in that a
> user can write sigma = a-z | 0-9 | \ | / | ... , Text =sigma*
> and then assert string in Text.

we can make it a hidden default (which it may be already, modulo the
bug in 'not in') or an explicitly controlled requirement. I'm fine
either way but perhaps explicit is better here because it's one fewer
thing to remember for a user, ie makes the constraint file more
self-contained.

./adam
Reply all
Reply to author
Forward
0 new messages