You can do it in lattice rescoring as long as your lattice can potentially have all the paths you care about (i.e. as long as it contains the sub-words).
Combining different LMs with weights is easy using combinations of lattice-compose and lattice-scale. lattice-compose always adds the LM with a weight of one, and you can use lattice-scale before and after it to, in effect, add the LM with a weight that's different from one; see lmrescore.sh, which uses this trick to add the 'old' LM with a weight of -1, i.e. to subtract its scores.
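To make the scale/compose/scale trick concrete, here is a toy numeric sketch in Python. It is only an illustration under simplifying assumptions: a path is reduced to a single scalar cost, and the functions `lattice_scale`, `lattice_compose` and `add_lm_with_weight` are hypothetical stand-ins, not the Kaldi binaries or their options.

```python
def lattice_scale(cost, scale):
    """Toy stand-in for lattice-scale: multiply a path's cost by `scale`."""
    return cost * scale


def lattice_compose(cost, lm_cost):
    """Toy stand-in for lattice-compose: the LM cost is added with weight one."""
    return cost + lm_cost


def add_lm_with_weight(cost, lm_cost, weight):
    """Scale by 1/weight, compose (weight one), then scale back by weight.

    The net effect is cost + weight * lm_cost.
    """
    scaled = lattice_scale(cost, 1.0 / weight)
    composed = lattice_compose(scaled, lm_cost)
    return lattice_scale(composed, weight)


# Hypothetical numbers: a path whose cost already includes the old LM cost.
old_lm_cost = 7.5
path_cost = 12.0 + old_lm_cost

# Weight -1 subtracts the old LM cost again, as in the lmrescore.sh trick.
removed = add_lm_with_weight(path_cost, old_lm_cost, -1.0)

# Any other nonzero weight works the same way.
half = add_lm_with_weight(10.0, 4.0, 0.5)
print(removed, half)
```

With weight -1 the two scalings flip the sign before and after the addition, which is why the old LM score ends up subtracted.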
As you mention, lattice-compose "adds" the acoustic costs and graph costs from each lattice. But this is not the same as a linear interpolation between the likelihoods, since the costs are negative log-likelihoods. So I would be doing a logarithmic (log-linear) interpolation between the two models, not the linear one that I want.
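A small Python sketch of the difference, with made-up probabilities (0.5 and 0.01) and an assumed interpolation weight of 0.5. The function names are hypothetical and only for illustration.

```python
import math


def log_linear_cost(cost_a, cost_b, weight_a):
    """Weighted sum of costs: what scaling and adding the costs gives."""
    return weight_a * cost_a + (1.0 - weight_a) * cost_b


def linear_cost(cost_a, cost_b, weight_a):
    """Cost of the linearly interpolated probability."""
    prob = weight_a * math.exp(-cost_a) + (1.0 - weight_a) * math.exp(-cost_b)
    return -math.log(prob)


# Hypothetical path probabilities under the two models.
cost_a = -math.log(0.5)
cost_b = -math.log(0.01)

log_linear = log_linear_cost(cost_a, cost_b, 0.5)
linear = linear_cost(cost_a, cost_b, 0.5)

# Weighted geometric mean of the probabilities vs. their weighted average.
print(math.exp(-log_linear), math.exp(-linear))
```

Adding weighted costs corresponds to a weighted geometric mean of the probabilities (about 0.07 here), whereas linear interpolation gives their weighted average (0.255 here), so a path that one model strongly dislikes is penalised much more heavily in the log-linear case.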
I am wondering whether there is a version of lattice-determinize that determinizes in the log semiring instead of the tropical one. This would be perfect, since I could easily create a non-deterministic lattice from the union of the (scaled) original lattices and then determinize it. This should work since, AFAIK, the union of the (scaled) lattices would be determinizable.
Let's assume that the lattices are converted to regular FSTs. Then, the recipe would be something like this:

1. Linear scale to each lattice. Just create a new initial state with an epsilon transition to the old initial state and cost -log interpolation_weight, for each lattice.
2. fstunion scaled_lattice1.fst scaled_lattice2.fst | fstproject --project_output=true | fstdeterminizestar --use-log=true > linear_interpolation.fst

I think this should work, provided that the union of the lattices is in fact determinizable on the output symbols, which I think is the case.

What do you think? Is there another way of doing this in a more direct way?
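The intended effect of the two steps can be sketched in Python on a toy model where a "lattice" is just a dict from word sequence to path cost. This is an assumption-laden illustration, not what the FST tools do internally; `scale_lattice`, `union_and_merge` and all the numbers are hypothetical.

```python
import math
from collections import defaultdict


def scale_lattice(paths, weight):
    """Step 1: add -log(weight) to every path cost (the new initial arc)."""
    return {words: cost - math.log(weight) for words, cost in paths.items()}


def union_and_merge(lattice_a, lattice_b, use_log=True):
    """Step 2: take the union, then merge identical word sequences.

    use_log=True sums the probabilities (log semiring);
    use_log=False keeps the cheaper path (tropical semiring).
    """
    grouped = defaultdict(list)
    for lattice in (lattice_a, lattice_b):
        for words, cost in lattice.items():
            grouped[words].append(cost)
    merged = {}
    for words, costs in grouped.items():
        if use_log:
            merged[words] = -math.log(sum(math.exp(-c) for c in costs))
        else:
            merged[words] = min(costs)
    return merged


# Hypothetical path probabilities; ("a", "b") appears in both lattices.
lat1 = {("a", "b"): -math.log(0.6), ("a", "c"): -math.log(0.4)}
lat2 = {("a", "b"): -math.log(0.2), ("b", "c"): -math.log(0.8)}

scaled1 = scale_lattice(lat1, 0.7)
scaled2 = scale_lattice(lat2, 0.3)

log_result = union_and_merge(scaled1, scaled2, use_log=True)
tropical_result = union_and_merge(scaled1, scaled2, use_log=False)

# Probability of the shared path under each semiring.
print(math.exp(-log_result[("a", "b")]), math.exp(-tropical_result[("a", "b")]))
```

For the shared path, the log semiring yields 0.7 * 0.6 + 0.3 * 0.2 = 0.48, i.e. the linear interpolation, while the tropical semiring keeps only the better of the two scaled paths (0.42). Paths that occur in only one lattice are the same in both results.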
Lattices have their own semiring due to the different types of scores. lattice-combine can scale and combine the lattices in the way you want. Doing lattice-determinize generally won't make much of a difference here, however you do it: these two types of lattices would almost never contain identical paths, so properly summing the probabilities (or not) wouldn't make a difference. Doing this and then finding the lattice best path would always give you a path from one or the other lattice. If you do MBR decoding (lattice-mbr-decode, IIRC), it will combine them in a more meaningful way.
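A toy Python check of the point about non-identical paths, again treating a lattice as a dict from symbol sequence to cost. The word and sub-word paths and their costs are invented for illustration, and `combine` is a hypothetical helper, not a Kaldi function.

```python
import math

# Hypothetical lattices over different symbol types: words vs. sub-words.
word_lattice = {("new", "york"): 2.0, ("new", "<unk>"): 3.5}
subword_lattice = {("new", "yo", "rk"): 2.4, ("ne", "w", "york"): 4.0}


def combine(lattices, use_log):
    """Union the lattices, merging identical paths by log-add or by min."""
    merged = {}
    for lattice in lattices:
        for path, cost in lattice.items():
            if path not in merged:
                merged[path] = cost
            elif use_log:
                merged[path] = -math.log(math.exp(-merged[path]) + math.exp(-cost))
            else:
                merged[path] = min(merged[path], cost)
    return merged


log_merged = combine([word_lattice, subword_lattice], use_log=True)
tropical_merged = combine([word_lattice, subword_lattice], use_log=False)

# No path is shared, so nothing is ever summed and both results coincide.
best = min(log_merged, key=log_merged.get)
print(log_merged == tropical_merged, best)
```

Because no symbol sequence occurs in both lattices, the merge step never fires, and the best path is simply the cheapest path of one of the inputs.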
lattice-combine does this, without the determinization (which, as I said, wouldn't give you a meaningful combination of the lattices anyway). MBR decoding will give you a little bit of combination. But this is not at all equivalent to linear interpolation of the LMs. In fact, it's not clear to me that it's even meaningful to linearly combine the LMs, since they are over different types of symbol. What you probably want to do is to create an FST representation of an OOV word and use 'fstreplace' to insert it in place of the 'UNK' symbol in the LM, at the 'LG' stage of graph compilation. Getting this LM to compile correctly is tricky because of the determinization that happens in building HCLG. At one point Nagendra Goel was working on scripts like this, but I don't think they are ready.