ad 1) I'm not sure your idea would work well. Also, you would essentially limit yourself to a single set of keywords and would have to run the decoding again for every new keyword set.
If you are asking about this just because you are worried about memory consumption, I'd approach it from a different angle -- prune the LM so that you get to a reasonable memory footprint during decoding, and then rescore the lattices with the original (unpruned) LM. There is functionality for rescoring directly with const-arpa (carpa) models, which I believe is faster and more memory-efficient than the "old" way of rescoring with G.fst.
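To make that concrete, here is a rough sketch of the prune-then-rescore workflow using SRILM and the standard Kaldi recipe scripts. All paths, directory names, the LM order, and the pruning threshold are illustrative assumptions -- adapt them to your setup.

```shell
# Hedged sketch: decode with a pruned LM, rescore with the full LM as const-arpa.
# Paths, threshold, and model/data directories below are assumptions.

# 1) Prune the full ARPA LM with SRILM (threshold 1e-7 is just a starting point)
ngram -order 4 -lm full_lm.arpa.gz -prune 1e-7 -write-lm pruned_lm.arpa.gz

# 2) Build a lang dir and decoding graph from the pruned LM
utils/format_lm.sh data/lang pruned_lm.arpa.gz \
  data/local/dict/lexicon.txt data/lang_pruned
utils/mkgraph.sh data/lang_pruned exp/model exp/model/graph_pruned

# 3) Build a const-arpa model from the *full* LM for rescoring
utils/build_const_arpa_lm.sh full_lm.arpa.gz data/lang data/lang_full_carpa

# 4) Decode with the small graph, then rescore the lattices with the carpa
steps/decode.sh --nj 8 exp/model/graph_pruned data/test exp/model/decode_pruned
steps/lmrescore_const_arpa.sh data/lang_pruned data/lang_full_carpa \
  data/test exp/model/decode_pruned exp/model/decode_rescored
```

The key point is that only the small, pruned G.fst is composed into the decoding graph, so decoding memory stays bounded; the full LM is only consulted at rescoring time through the const-arpa structure, which is loaded read-only and is much more compact than an equivalent G.fst.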
ad 2) It is definitely possible, but it would take some, probably substantial, work. The Kaldi KWS pipeline is set up for a different scenario -- decode and index once, search many times (possibly with a different keyword list each time) -- so it was optimized to achieve fast search times once the index has been generated. Decoding/lattice generation is usually the slowest part; from the lattices, the KWS index is created, and the actual search is really quick.
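The decode-once / search-many pattern looks roughly like the sketch below. The binary names come from Kaldi's src/kwsbin, but the exact arguments and options are driven by the egs/babel recipes, so treat these invocations as assumptions and consult those recipes for the real pipeline.

```shell
# Hedged sketch of decode-once / search-many; invocations are illustrative
# assumptions -- see egs/babel in the Kaldi repo for the actual recipe.

# 1) Decode once -- the slow part (lattice generation)
steps/decode.sh --nj 8 exp/model/graph data/eval exp/model/decode_eval

# 2) Build the KWS index from the lattices, also once
lattice-to-kws-index ark:data/eval/kws/utter_id \
  "ark:gunzip -c exp/model/decode_eval/lat.1.gz|" \
  ark:exp/model/decode_eval/kws/index.1

# 3) Search as many times as you like -- each new keyword list
#    (compiled to FSTs) is run against the same pre-built index
kws-search ark:exp/model/decode_eval/kws/index.1 \
  ark:data/eval/kws/keywords.fsts \
  ark,t:exp/model/decode_eval/kws/results.1
```

So only step 3 is repeated for a new keyword set; steps 1 and 2 are amortized across all searches, which is why the pipeline would need real restructuring to support keywords baked in at decode time.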
y.