Hi -- Thanks very much for publicizing and maintaining this code. I'm interested in using a sequence memoizer as a component of a larger model, which leads me to a few questions.
(1) In my setting, the training sequences are not fixed. They are themselves imputed by a sampler. So I can't just feed in all the training sequences once and for all. Rather, as the state of the sampler evolves, I need to continually add and remove short independent training sequences. In principle, this is straightforward thanks to the exchangeability of the training sequences with one another. However, the implementation probably involves removing data from a trie. I see at
http://www.sequencememoizer.com/documentation/sequencememoizer/index.html that newSequence() and continueSequence() make it possible to add new sequences. However, it doesn't look like there are methods currently for removing sequences. I haven't looked at the source code. Would it be easy for me (or you!) to add this functionality?
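To make the request in (1) concrete, here is a rough sketch of the add/remove behavior I have in mind, written against a toy context trie that stores plain next-symbol counts. This is only an illustration under strong assumptions: it is not the Sequence Memoizer's actual data structure (there are no Pitman-Yor seating arrangements and no path compression), and the names CountTrie, add_sequence, remove_sequence, and snapshot are hypothetical, not part of the published API. A real implementation would presumably also have to choose which table each removed customer leaves.

```python
class Node:
    """One context in the trie: child contexts plus next-symbol counts."""
    __slots__ = ("children", "counts")

    def __init__(self):
        self.children = {}  # previous symbol -> Node for the longer context
        self.counts = {}    # next symbol -> number of observations


class CountTrie:
    """Toy stand-in for a context trie (hypothetical, counts only)."""

    def __init__(self):
        self.root = Node()

    def add_sequence(self, seq):
        # Each symbol is observed in the context of everything before it.
        for i, sym in enumerate(seq):
            self._add(seq[:i], sym)

    def remove_sequence(self, seq):
        # Assumes seq was added earlier; exchangeability means the order
        # in which sequences are added and removed does not matter here.
        for i, sym in enumerate(seq):
            self._remove(seq[:i], sym)

    def _add(self, context, sym):
        node = self.root
        node.counts[sym] = node.counts.get(sym, 0) + 1
        for c in reversed(context):  # most recent symbol first
            node = node.children.setdefault(c, Node())
            node.counts[sym] = node.counts.get(sym, 0) + 1

    def _remove(self, context, sym):
        path = [(None, None, self.root)]
        node = self.root
        for c in reversed(context):
            child = node.children[c]  # KeyError if the context is unknown
            path.append((node, c, child))
            node = child
        # Decrement from the deepest node upward, pruning empty nodes.
        for parent, key, node in reversed(path):
            node.counts[sym] -= 1
            if node.counts[sym] == 0:
                del node.counts[sym]
            if parent is not None and not node.counts and not node.children:
                del parent.children[key]

    def snapshot(self, node=None):
        # Nested plain-dict copy of the trie, handy for comparing states.
        node = self.root if node is None else node
        kids = {c: self.snapshot(n) for c, n in node.children.items()}
        return (dict(node.counts), kids)


trie = CountTrie()
trie.add_sequence("abc")
before = trie.snapshot()
trie.add_sequence("abab")
trie.remove_sequence("abab")
print(trie.snapshot() == before)
```

The point of the sketch is just the invariant: adding a sequence and later removing it should leave the model in the same state as if it had never been added. This version costs time quadratic in the sequence length, which is tolerable only because my training sequences are short.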
(2) Is there anything wrong with using a very large alphabet size?
(3) Is any support coming for graphical PY processes?
(4) Actually, I am using the sequence memoizer merely to encode a kind of conditional backoff. I care about p(yz | ...tuvwx), and I want a model of it that backs off to p(yz | uvwx) and p(yz | vwx) and so on. This is roughly what your code is for. However, note that I shouldn't train on the whole sequence ...tuvwxyz. In my setting, the arbitrarily long context ...tuvwx is given by some OTHER model, and I only want the SM model to be trained on and to predict the last two characters, yz. If I trained on ...tuvwxyz, then I would also be learning that uv tends to be followed by w, which is a fact about the distribution over contexts, not a fact about the conditional distribution that the SM is supposed to model. Spuriously learning that uv tends to be followed by w would both waste memory and distort the predictive distributions. It is possible to hack around the latter problem with a bit of extra expense, but that doesn't save the waste. Suggestions?
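Here is a small sketch of the behavior I mean in (4). It swaps in interpolated absolute discounting over raw suffix counts as a crude stand-in for the hierarchical Pitman-Yor backoff, so it is not what the SM code does. The class ConditionalBackoff, its methods, the fixed discount, and the max_order truncation of the context are all assumptions made for illustration. What matters is that only the target symbols yz are ever recorded, each in the context that precedes it, so no statistics about the context itself are stored.

```python
from collections import defaultdict


class ConditionalBackoff:
    """Hypothetical count-based backoff model trained on targets only."""

    def __init__(self, alphabet, discount=0.5, max_order=5):
        assert max_order >= 1 and 0.0 < discount < 1.0
        self.alphabet = list(alphabet)
        self.discount = discount
        self.max_order = max_order  # assumption: context is truncated
        # context suffix (tuple) -> next symbol -> count
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, context, sym):
        # Record sym under every suffix of the (truncated) context.
        ctx = tuple(context)[-self.max_order:]
        for k in range(len(ctx) + 1):
            self.counts[ctx[len(ctx) - k:]][sym] += 1

    def train_targets(self, context, targets):
        # The given context is conditioned on but never itself trained on.
        ctx = list(context)
        for sym in targets:
            self.observe(ctx, sym)
            ctx.append(sym)  # later targets also condition on earlier ones

    def prob(self, context, sym):
        # Interpolate from the uniform base up through longer suffixes.
        ctx = tuple(context)[-self.max_order:]
        p = 1.0 / len(self.alphabet)
        d = self.discount
        for k in range(len(ctx) + 1):
            c = self.counts.get(ctx[len(ctx) - k:])
            if not c:
                break  # longer suffixes were never observed either
            total = sum(c.values())
            p = (max(c.get(sym, 0) - d, 0.0) + d * len(c) * p) / total
        return p


model = ConditionalBackoff("tuvwxyz")
model.train_targets("tuvwx", "yz")
print(("u", "v") in model.counts)  # was anything stored for context uv?
print(model.prob("tuvwx", "y"), model.prob("tuvwx", "w"))
```

After this training call the model has entries for contexts such as x, wx, vwx, and uvwx predicting y, but nothing recording that uv was followed by w. That is the memory saving and the undistorted conditional I am hoping to get from the real SM.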
Thanks again!
-cheers, jason