First, I'm quite sure that the characteristic vector must be 2n + 1,
not 2n + 2. If you replace all 2n + 2 occurences
in the code with 2n + 1, then the crash goes away.
Second, the RecognizeMapping index update by this line in "recognize" function:
stack.push(new RecognizeMapping(mapping.working_string + c,
dictionary_next, levenshtein_next, mapping.index + 1));
the "mapping.index + 1" is just wrong. As you can see on table 5.2 of
Schulz and Mihov, the next index is determined by the
next levenshtein state.
I changed the characteristic vector from 2n + 2 to 2n + 1. The crash
I get happens when this.chi = ChiType.MS. Then I get this stacktrace:
Exception in thread "main" java.lang.NegativeArraySizeException
at com.infiauto.datastr.auto.LevenshteinAutomaton.functionR(LevenshteinAutomaton.java:444)
at com.infiauto.datastr.auto.LevenshteinAutomaton.deltaE(LevenshteinAutomaton.java:453)
at com.infiauto.datastr.auto.LevenshteinAutomaton.delta(LevenshteinAutomaton.java:488)
at com.infiauto.datastr.auto.LevenshteinAutomaton.<init>(LevenshteinAutomaton.java:669)
at com.infiauto.datastr.auto.LevenshteinAutomaton.main(LevenshteinAutomaton.java:789)
functionR has often been the source of my bugs.
> Second, the RecognizeMapping index update by this line in "recognize" function:
>
> stack.push(new RecognizeMapping(mapping.working_string + c,
> dictionary_next, levenshtein_next, mapping.index + 1));
>
> the "mapping.index + 1" is just wrong. As you can see on table 5.2 of
> Schulz and Mihov, the next index is determined by the
> next levenshtein state.
You're right. I made a big mistake there. I'm going to have to look
at my design in order to fix that.
I'll go ahead and include the latest working copy of the only file I've changed.
Thanks again.
Court
There are two problems I'm struggling to understand right now:
1) The LA as I've built it right now is constructed of States, which are
just sets of Positions. When I build a characteristic vector and apply
the delta function to the current State I get a State back, but
shouldn't it be just a single Position? I mean, since an LA is a DFA I
shouldn't be getting multiple Positions for any given input, correct?
2) I'm looking at your Python code to try and help my understanding.
I'm looking at the ErrorTolerantRecognizer.recognize function in fsc.py.
Your code says:
if mPrime[0] != []:
So what kind of data structure (mPrime) is being returned by the delta
function in your code? A set of sets?
Courtney Falk