Ok, I've been trying to sort this out.
It seems that Hampi correctly interprets my input, and iteratively considers lengths from 1 to 2. For length=1, Hampi first calls getBoundedRegexp (from GrammarStringBounder.java) for the non-terminal S1 and succeeds to create the regular expression "a", and then Hampi calls the function for S2 and null is returned. This null causes prepareSizeFixRegexp (from HConstraintPreparer.java) to throw an UNSAT exception, which is captured in the function run (from Hampi.java) and it is handled by skipping length=1 and proceeding to test length=2.
Similarly, for length=2, the call to getBoundedRegexp for the non-terminal S1 fails returning null, which is transformed into an exception, which is captured in the run function. Then, length=2 is skipped and it proceeds to test length=3 without even considering the non-terminal S2. (For length=3 it finishes, since I asked only lengths 1 and 2).
I believe this is a bug, since whenever a regular expression for a "not in"-assert cannot be constructed, it means that we should ignore that assert (note that the assert is not restricting the values of the variable since it is "not in emptyset"). I've fixed this "bug" by changing these 2 lines of prepareIn (from HConstraintPreparer.java)
Regexp reg = prepareSizeFixRegexp(varName, boundSize, ast);
return hampi.regexpConstraint(exp, inExpr.isIn(), reg);
into this code:
try{
Regexp reg = prepareSizeFixRegexp(varName, boundSize, ast);
return hampi.regexpConstraint(exp, inExpr.isIn(), reg);
}catch(HampiResultException e){
if (!e.isUnsat() || inExpr.isIn())
throw e;
return hampi.andConstraint(new ArrayList<Constraint>()); //a trivially true constraint
}
With this I get the answers I expected in my toy example.
------------------------------
------------------------------
However, if I'm not mistaken, there's another unrelated bug. Consider the following example:
cfg S1 := "a" S1 "a" | "a" | ;
Obviously the language generated by S1 and S2 are the same, yet I get this answer:
{VAR(v)=aa}
I haven't been able to understand why this fails (the source code is quite huge!), but at least this seems wrong to me (line 73 of GrammarStringBounder.java):
if (canGenerateEmptyString || elems.size() <= bound){//uses the fact that every symbol (other than start) produces at least one terminal
where elems is the right-hand side being analyzed. In this example, A2 has exactly one right-hand side, with 3 elements. Therefore, when bound is 1 or 2, this "if" fails even though A2 can generate a word of length 2 (thanks to the fact that S2 can generate the empty string).
It seems that instead of counting the number of elements in elems by doing an elems.size(), it should count the non-nullable elements in elems.
I do not know if this tool is still being maintained, but I hope that somebody finds this information useful.
Carles