I'm making an alchemy world so that ONA can learn its rules.
I'm having mixed results, hoping someone has suggestions.
I suspect that the operations' output (which I feed back into ONA) is poorly constructed.
The world has 4 operations and one constraint:
- ^mix increments the inventory of "mixed" things.
- ^boil decrements the mixed inventory, increments the "boiled" inventory.
- ^give produces a potion. It decrements boiled & sends "g0" (which is the goal) back to ONA.
- ^blend does nothing. It's wasted time.
- No inventory counter can go negative. No inventory can go over 3. (So in 100 consecutive ^mix calls, 97 are wasted.)
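
To make the rules above concrete, here is a minimal Python sketch of the world as described. The class name `AlchemyWorld`, the `potions` counter, and the `step` method are hypothetical names for illustration, not part of ONA. It also assumes that a ^boil is wasted entirely (nothing changes) when the boiled inventory is already at 3, which the rules above don't spell out.

```python
from dataclasses import dataclass

MAX_INVENTORY = 3  # upper bound from the rules; the lower bound is 0


@dataclass
class AlchemyWorld:
    """Hypothetical model of the alchemy world; not ONA code."""
    mixed: int = 0
    boiled: int = 0
    potions: int = 0  # illustrative counter of successful ^give calls

    def step(self, op: str) -> bool:
        """Apply one op; return True if it changed the state (was not wasted)."""
        if op == "^mix":
            if self.mixed < MAX_INVENTORY:
                self.mixed += 1
                return True
        elif op == "^boil":
            # Assumption: blocked when there is nothing to boil OR boiled is full.
            if self.mixed > 0 and self.boiled < MAX_INVENTORY:
                self.mixed -= 1
                self.boiled += 1
                return True
        elif op == "^give":
            if self.boiled > 0:
                self.boiled -= 1
                self.potions += 1  # this is where "g0" would be sent back to ONA
                return True
        # ^blend, unknown ops, and blocked ops all fall through as wasted
        return False


world = AlchemyWorld()
wasted = sum(not world.step("^mix") for _ in range(100))
print(wasted)  # 97, matching the example in the rules
```

Counting the `False` results from `step` over a run is one way to measure the fraction of wasted op calls.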
ONA quickly learns that ^blend is a waste; it still calls it occasionally, but not often. That's cool.
ONA also quickly tends to call the ops in order: ^mix, ^boil, ^give. Also cool.
It never quite figures out the max & min on the inventories, though. For example, it might fall into a pattern of 5 more ^mixes (most of them wasted), a couple ^boils, a couple ^gives, couple more ^boils, couple more ^gives. Due to the bounds on the inventory, a lot of those ops are wasted.
I presume that the ops can send some feedback into ONA, and I've experimented with that. The feedback seems to help it learn more quickly, but it still plateaus at about the same success rate (about 1/2 of the op calls are wasted).
Any advice? In particular, what format should the feedback take? I've tried simple statements such as <mix --> N> (where N is the new inventory of mixed) and more complex sentences such as "<<((mix * 2) * (boil * 0)) --> ^boil> =/> ((mix * 1) * (boil * 1))>" (which is an attempt to show that calling ^boil when mix = 2 and boil = 0 results in mix = 1 and boil = 1).
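
For reference, the two feedback formats quoted above can be generated with a small Python sketch like the following. The function names `simple_feedback` and `transition_feedback` are hypothetical helpers. The sketch only reproduces the strings as written above; it makes no claim that ONA parses or uses them well, which is the open question.

```python
def simple_feedback(name: str, count: int) -> str:
    """Build the simple form, e.g. <mix --> 2>."""
    return f"<{name} --> {count}>"


def transition_feedback(op: str, before: tuple, after: tuple) -> str:
    """Build the complex form: state before, the op called, state after.

    `before` and `after` are (mixed, boiled) pairs.
    """
    def state(mixed: int, boiled: int) -> str:
        return f"((mix * {mixed}) * (boil * {boiled}))"

    return f"<<{state(*before)} --> {op}> =/> {state(*after)}>"


print(simple_feedback("mix", 1))
print(transition_feedback("^boil", (2, 0), (1, 1)))
```

The second call prints the exact example sentence quoted above for ^boil with mix = 2 and boil = 0.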
Thanks for your thoughts.
Gene