Now consider what happens if $p(w|d) = 0$: the logarithm in the perplexity formula $\exp\left(-\frac{1}{n}\sum_d \sum_w n_{dw} \ln p(w|d)\right)$ cannot be computed. This case actually happens, especially for sparse models, because $p(w|d)$ is calculated as $\sum_t p(w|t)\,p(t|d)$, and it may be that for every topic either $p(w|t) = 0$ or $p(t|d) = 0$. There are three ways to handle $p(w|d) = 0$ (see the sketch after the list):
1. Completely exclude the summands with $p(w|d) = 0$ from the perplexity formula.
2. Approximate $p(w|d) \approx n_{dw} / n_d$ (the number of times $w$ occurs in a given document $d$ divided by the length of $d$ in words); BigARTM calls this the "document unigram model".
3. Approximate $p(w|d) \approx \mathrm{tf}(w)$ (the number of times $w$ occurs in the entire collection divided by the length of the collection in words); BigARTM calls this the "collection unigram model".
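To make the three options concrete, here is a minimal illustrative sketch in plain Python. This is not BigARTM's actual implementation; the document representation, the `p_wd` callback, and the `zero_mode` parameter are all hypothetical:

```python
import math

def perplexity(docs, p_wd, tf=None, zero_mode="collection"):
    """Compute perplexity, handling p(w|d) == 0 by one of three methods.

    docs:      list of {token: count} dicts (toy document representation)
    p_wd:      function (w, d) -> model probability p(w|d)
    tf:        {token: collection frequency} dict, used by method (3)
    zero_mode: "skip" (method 1), "document" (2), or "collection" (3)
    """
    log_sum, n = 0.0, 0
    for d, doc in enumerate(docs):
        n_d = sum(doc.values())                      # document length in words
        for w, n_dw in doc.items():
            p = p_wd(w, d)
            if p == 0.0:
                if zero_mode == "skip":              # method (1):
                    continue                         # exclude the summand
                if zero_mode == "collection" and tf is not None and w in tf:
                    p = tf[w]                        # method (3)
                else:
                    p = n_dw / n_d                   # method (2); also the
                                                     # fallback when tf(w)
                                                     # is unknown
            log_sum += n_dw * math.log(p)
            n += n_dw
    return math.exp(-log_sum / n)
```

Note that in the `"collection"` mode a token missing from `tf` silently degrades to method (2), which mirrors the fallback described in the warning below.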
Method (1) may give a deceptively low, i.e. too optimistic, perplexity: extremely sparse models will tend to have perplexity close to 0 despite being very bad.
Method (2) is better than (1), but still gives a too optimistic perplexity.
Method (3) gives a conservative estimate of perplexity, but it requires knowing $\mathrm{tf}(w)$, i.e. the token's frequency from the dictionary.
The warning you get indicates that BigARTM tried to use method (3) and fell back to method (2), because the token was not present in the dictionary and so its $\mathrm{tf}$ is unknown. For you this means that your perplexity score might be slightly underestimated.
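If you want method (3) to be available for as many tokens as possible, pass a dictionary gathered over your collection to the perplexity score. A minimal sketch using BigARTM's Python API (the batch path `'my_batches'`, topic count, and score name are placeholders):

```python
import artm

# Load batches and gather a dictionary over the same collection,
# so tf(w) is known for every token that appears in it.
batch_vectorizer = artm.BatchVectorizer(data_path='my_batches',
                                        data_format='batches')
dictionary = artm.Dictionary()
dictionary.gather(data_path=batch_vectorizer.data_path)

model = artm.ARTM(num_topics=20, dictionary=dictionary)
# Passing the dictionary lets the score use the collection unigram model
# (method 3); tokens absent from the dictionary still fall back to method 2.
model.scores.add(artm.PerplexityScore(name='perplexity',
                                      dictionary=dictionary))
model.fit_offline(batch_vectorizer=batch_vectorizer,
                  num_collection_passes=10)
print(model.score_tracker['perplexity'].last_value)
```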
Kind regards,
Alex