Upgrading from 0.7


Tiago Ferreira

Aug 12, 2020, 7:49:45 AM
to LearnLib Q&A
Hi everyone, 

I'm working on learning state machines for a network protocol, much like TCP-Learner. I'm now at the point where I should upgrade parts of this code, which are a bit old and use LearnLib 0.7, to the latest version. This is mostly motivated by the need for better equivalence oracles and more recent learning algorithms.

The only "pattern" I'm missing is how to build "oracle chains" in the new version. Currently I make use of 4 main oracles:

* The Membership Oracle -> wraps around and interacts with the SUL. I have adapted this to a SULOracle and it seems to work great.
* The Probabilistic Oracle -> responsible for running each query multiple times and selecting the most frequent response as the answer. My SUL is (mostly) deterministic, but there's a ~10% chance of non-determinism due to a missed packet and the like. This is manually implemented; as far as I know there isn't a built-in oracle that does this, but please correct me if I'm wrong.
* The Cache Oracle -> pretty similar to what is now built into LearnLib; this was a manual implementation because it wasn't present in the old versions.
* The Equivalence Oracle (Random Walk at the moment) -> performs equivalence testing and provides counterexamples.

These work as a stack, all wrapping down to the SULOracle, and it's that behaviour I'm not sure how to reproduce in the current version. Should I use the Filter interface, or something else?
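For context, the core of my hand-rolled probabilistic oracle boils down to something like this (a simplified sketch with made-up types, not the actual LearnLib interfaces):

```java
import java.util.HashMap;
import java.util.Map;

// Majority-vote sketch: ask the delegate oracle several times and
// return the most frequently observed answer. Hypothetical types only.
public class MajorityVoteOracle {

    public interface Oracle {
        String answer(String query);
    }

    private final Oracle delegate;
    private final int tries;

    public MajorityVoteOracle(Oracle delegate, int tries) {
        this.delegate = delegate;
        this.tries = tries;
    }

    public String answer(String query) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < tries; i++) {
            // each try may hit a flaky response (e.g. a dropped packet)
            counts.merge(delegate.answer(query), 1, Integer::sum);
        }
        // pick the answer observed most often
        return counts.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .get()
                .getKey();
    }
}
```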

Thank you!
Tiago

LearnLib Q&A

Sep 12, 2020, 6:20:34 PM
to LearnLib Q&A
Hi Tiago,

sorry for the delay, but I didn't receive any notification about your post (not sure whether this was an issue on my end or with Google Groups). I hope this answer still reaches you in time.

Two main points before we go to the code:
  • The equivalence oracle simply needs a delegate membership oracle for comparing the system's output with the output of the hypothesis, so it doesn't really care about any oracle chain as long as it is provided with some kind of membership-query mechanism.
  • As you are dealing with a non-deterministic system, you already made the correct decision to put the cache between the learner and the probabilistic oracle. This lowers the effectiveness of the cache: if your probabilistic oracle does a majority vote over, say, 10 tries, then each new query still has to be executed 10 times on the system before its answer can be cached. However, otherwise you would get exceptions about cache inconsistencies. With this order of oracles, a query is only inserted into the cache once the probabilistic oracle has decided which answer to pick.
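As a plain-Java sketch of that ordering (made-up functional types, not the actual LearnLib classes): the learner only ever talks to the cache, the cache delegates to the majority vote, and only the agreed-upon answer ever enters the cache, so flaky SUL responses can never make it inconsistent.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical delegation chain: learner -> cache -> majority vote -> SUL.
public class OracleChain {

    // cache wraps an arbitrary delegate oracle; answers are stored on first use
    public static Function<String, String> cached(Function<String, String> delegate) {
        Map<String, String> cache = new HashMap<>();
        return query -> cache.computeIfAbsent(query, delegate);
    }

    // majority vote over a fixed number of tries on the delegate
    public static Function<String, String> majority(Function<String, String> delegate,
                                                    int tries) {
        return query -> {
            Map<String, Integer> counts = new HashMap<>();
            for (int i = 0; i < tries; i++) {
                counts.merge(delegate.apply(query), 1, Integer::sum);
            }
            return counts.entrySet().stream()
                    .max(Map.Entry.comparingByValue()).get().getKey();
        };
    }
}
```

Note that a repeated query is answered from the cache without triggering another round of SUL runs, which is exactly the behavior you want with an expensive system.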
I have uploaded an example setup as a gist. A few things to note:
  • For membership oracles you don't really need chains or filters (to be fair, I have yet to see a useful setup using the FilterChain class). Most MQ oracles require some form of delegation mechanism, which is achieved by passing an existing oracle to the constructor. So all you have to do is instantiate the oracles in the proper order to ensure correct nesting.
  • I suggest you use a RandomWordsEQOracle rather than a RandomWalkEQOracle, because it allows you to operate on the (cached) membership oracle rather than the SUL, and it uses basically the same search heuristic.
  • With caches you get an additional cache consistency test that checks whether the hypothesis conforms to all of the behavior observed so far (basically an additional EQ oracle). I used this consistency check to construct a chain of EQ oracles (here the chaining mechanism makes more sense, because the two EQ oracles are truly independent).
  • You could further optimize the ProbabilisticOracle implementation by properly batching the queries and using the processQueries(Collection<? extends Query<I, Word<O>>> queries) method of the delegate oracle. This is especially impactful if you use a parallel delegate oracle.
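The random-word search heuristic itself is easy to sketch (again with hypothetical types; the real LearnLib class has a different signature): sample bounded random words over the alphabet, ask both the hypothesis and the (cached) membership oracle, and report the first disagreement as a counterexample.

```java
import java.util.List;
import java.util.Optional;
import java.util.Random;
import java.util.function.Function;

// Sketch of a random-word equivalence check over a membership oracle.
public class RandomWordEQ {

    public static Optional<String> findCounterexample(
            Function<String, String> hypothesis,
            Function<String, String> oracle,
            List<String> alphabet,
            int maxTests, int maxLength, Random rnd) {
        for (int t = 0; t < maxTests; t++) {
            // build a random word of length 1..maxLength
            StringBuilder word = new StringBuilder();
            int len = 1 + rnd.nextInt(maxLength);
            for (int i = 0; i < len; i++) {
                word.append(alphabet.get(rnd.nextInt(alphabet.size()))).append(' ');
            }
            String w = word.toString().trim();
            if (!hypothesis.apply(w).equals(oracle.apply(w))) {
                return Optional.of(w);   // counterexample found
            }
        }
        return Optional.empty();         // no difference observed
    }
}
```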
Hope this helps.

Kind regards,
Markus

Merlin Chlosta

Sep 14, 2020, 10:04:05 AM
to LearnLib Q&A
Hi Tiago,
Hi Markus,

I have a similar setup and some follow-up questions. My caching detects non-determinism; usually the 'probabilistic oracle' works around that. However, one of my SULs shows roughly 50% non-determinism on some paths.
How would you deal with that? I found the paper "Automata Learning with Automated Alphabet Abstraction Refinement" [1], which seems to address just that -- but is it available in LearnLib, and if so, could you point me in the right direction?

It would also be handy to see LearnLib's progress at a finer granularity than the learning/refining iterations. For example, I already have a very big cache file that previously produced a learned model. After increasing the input alphabet and adding new queries to the cache, I can't reproduce the learned model for the initial (reduced) input alphabet, and I'd like to figure out why. My SUL is rather slow, with just a few queries per second, so running on the cache and reproducing previous results is pretty valuable.

Thanks,
Merlin

[1] https://learnlib.de/wp-content/uploads/2017/10/automated-alphabet-abstraction-refinement-vmcai2011.pdf

Markus Frohme

Sep 15, 2020, 3:41:57 PM
to Merlin Chlosta, LearnLib Q&A
Hi Merlin,


I think the AAR paper mostly deals with non-determinism introduced by input alphabet abstractions. Say you have an abstract input symbol "numeric value" and your SUL driver decides to always map this abstract value to the number "0". Now you receive a counterexample that tells you "if you input 42, you observe a different behavior". Mapping the 42 back to "numeric value" would then yield non-deterministic behavior on the abstract level.

The AAR approach would be able to automatically detect the difference and handle the concrete inputs 0 and 42 separately. From your description, it sounds like your SUL already answers the concrete symbols non-deterministically, so the AAR approach would probably not help you (and it is currently not implemented in LearnLib anyway). I think implementing your own heuristic (e.g. via a "probabilistic oracle") is probably the easiest solution to begin with.


Regarding your second question: reconstructing a learner's state from a cache is often very complicated, because you have to deal with the internal structure of the learner, which often depends on the order in which the learner received information such as counterexamples. Given only cache entries, it may not even be possible at all.

However, you can use your cache data to "cheat" a little bit on the cost of membership and equivalence queries in a new learning setup. Using a simple map from input to output data, you can write your own membership oracle that first checks whether your cache data contains an answer for the posed query and only delegates to the actual SUL on a cache miss (similar to the DFAHashCacheOracle [0]). Similarly, you can use the SampleSetEQOracle [1] (and its subtypes) to check whether the current hypothesis is consistent with your previously observed cache data and automatically return a counterexample if necessary.
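A rough sketch of both ideas (plain Java with hypothetical types; the real LearnLib interfaces look different):

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Reuse recorded (input -> output) data both as a cache-backed membership
// oracle and as a sample-set equivalence check. Hypothetical types only.
public class RecordedDataOracles {

    // membership oracle: answer from the recorded data if possible,
    // otherwise fall through to the real SUL
    public static Function<String, String> fromRecords(Map<String, String> records,
                                                       Function<String, String> sul) {
        return query -> records.containsKey(query) ? records.get(query)
                                                   : sul.apply(query);
    }

    // sample-set EQ check: return a recorded query on which the
    // hypothesis disagrees with the recorded output, if any
    public static Optional<String> findCounterexample(Map<String, String> records,
                                                      Function<String, String> hypothesis) {
        return records.entrySet().stream()
                .filter(e -> !hypothesis.apply(e.getKey()).equals(e.getValue()))
                .map(Map.Entry::getKey)
                .findFirst();
    }
}
```

With a slow SUL this means every query already answered in a previous run is essentially free, and every stored sample doubles as a cheap equivalence test before you spend time on random testing.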

Integrating information via these external interfaces (MQs, EQs) lets you abstract from all the internal clutter of the learners and set up some interesting mixed approaches (e.g. begin learning with L* and finish with TTT), which may lead to interesting effects.


Kind regards,
Markus


[0] - https://github.com/LearnLib/learnlib/blob/develop/oracles/filters/cache/src/main/java/de/learnlib/filter/cache/dfa/DFAHashCacheOracle.java
[1] - https://github.com/LearnLib/learnlib/blob/develop/oracles/equivalence-oracles/src/main/java/de/learnlib/oracle/equivalence/SampleSetEQOracle.java

