Hello,
1) I assume you're talking about the lexicon.txt file. Each line in that file is space-delimited: the first token is a word and the remaining tokens are its phones. So !SIL is a word and sil is a phone. I'm actually not sure whether !SIL has special meaning, in particular whether the word !SIL will show up in decoded text. <UNK> does have special meaning: decoded results contain the <UNK> word when no other word in the ASR system's vocabulary is found to be likely. spn is a stand-in fake phone meaning "spoken noise". You don't necessarily have to use <UNK> as your out-of-vocabulary word. Whatever is in your oov.txt is the out-of-vocabulary word, but oov.txt conventionally contains <UNK>.
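To make the lexicon.txt layout concrete, here is a minimal Python sketch of reading it. The function names (parse_lexicon, map_oov) and the sample entries are made up for illustration and are not part of Kaldi. I'm also assuming a word can appear on more than one line, once per pronunciation.

```python
from collections import defaultdict


def parse_lexicon(lines):
    """Map each word to a list of pronunciations (each a list of phones)."""
    lexicon = defaultdict(list)
    for line in lines:
        tokens = line.split()  # entries are whitespace-delimited
        if not tokens:
            continue  # skip blank lines
        word, phones = tokens[0], tokens[1:]
        lexicon[word].append(phones)
    return dict(lexicon)


def map_oov(words, lexicon, oov_word="<UNK>"):
    """Replace any word missing from the lexicon with the OOV word."""
    return [w if w in lexicon else oov_word for w in words]


# Hypothetical sample entries, in the same shape as lexicon.txt lines.
sample = [
    "!SIL sil",
    "<UNK> spn",
    "hello hh ah l ow",
    "hello hh eh l ow",
]
lexicon = parse_lexicon(sample)
mapped = map_oov(["hello", "zyzzyva"], lexicon)
```

Here lexicon["<UNK>"] is [["spn"]], and mapped is ["hello", "<UNK>"] because "zyzzyva" is not in the sample lexicon. In a real setup the OOV word would be read from oov.txt rather than hard-coded.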
2) It's been a long time since my phonetics class, but in ASR "phones" do not necessarily have to be phonemes. Something you will see in egs/wsj/s5/, for example, is that there are several phones for each phoneme: a beginning phone, an intermediate phone, and an end phone (plus a singleton variant for words made of a single phone), depending on where that phoneme occurs in the word. In practice, the underlying phones do end up being phonemes (maybe with some small deviations from IPA) because most lexicons (like CMUdict) are written using linguistic phonemes.
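The position-dependent idea can be sketched in a few lines of Python. The suffixes _B, _I, _E, and _S (the last one for a word that consists of a single phone) follow the convention that, as far as I know, Kaldi's lang preparation uses. The function name is hypothetical, and this ignores any special handling of silence phones.

```python
def add_word_positions(phones):
    """Tag each phone with its position in the word: begin, internal, end."""
    if len(phones) == 1:
        return [phones[0] + "_S"]  # singleton: the word is one phone long
    tagged = []
    last = len(phones) - 1
    for i, phone in enumerate(phones):
        if i == 0:
            suffix = "_B"  # word-initial
        elif i == last:
            suffix = "_E"  # word-final
        else:
            suffix = "_I"  # word-internal
        tagged.append(phone + suffix)
    return tagged


tagged_hello = add_word_positions(["hh", "ah", "l", "ow"])
# → ['hh_B', 'ah_I', 'l_I', 'ow_E']
tagged_a = add_word_positions(["ah"])
# → ['ah_S']
```

So the phoneme "ah" can surface as ah_B, ah_I, ah_E, or ah_S in the phone set, even though the lexicon itself only ever mentions "ah".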