Hi there,
Thanks for the transcript. It hadn't occurred to me that the console output would get so big; I can see why you didn't just cut and paste it in!
So here's what I suspect is happening:
i. Term vector indexing / learning has already completed, so you should have a termvectors.bin file that you can use for term-term analyses.
ii. The document vector warning messages are only happening for some documents, not all.
iii. The code was logging a severe message every time any contents field was empty, which I think was a mistake in the code. If you recompile now, you should get only a fine-level message when a single field is empty, and a severe message only when the whole document is empty.
iv. In addition, these messages should now have the user-provided docid (typically the path / filename), rather than Lucene's internal doc integer ID.
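To make points iii and iv concrete, here is a minimal sketch of the logging behaviour I have in mind. It is not the actual code from the package: the class name EmptyFieldLogging, the method checkDocument, and the use of a plain Map for the fields are all made up for illustration, and I am assuming logging goes through java.util.logging.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical illustration only; names and structure are assumptions.
class EmptyFieldLogging {
  private static final Logger logger =
      Logger.getLogger(EmptyFieldLogging.class.getName());

  /**
   * Logs at FINE for each empty field and at SEVERE only if every field
   * is empty. Messages use the user-provided docId (e.g. a file path).
   * Returns the most serious level logged, or null if nothing was logged.
   */
  static Level checkDocument(String docId, Map<String, String> fields) {
    int emptyCount = 0;
    for (Map.Entry<String, String> entry : fields.entrySet()) {
      String value = entry.getValue();
      if (value == null || value.trim().isEmpty()) {
        emptyCount++;
        logger.log(Level.FINE, "Document {0}: field {1} is empty",
            new Object[] {docId, entry.getKey()});
      }
    }
    // A document with no fields at all is also treated as empty.
    if (emptyCount == fields.size()) {
      logger.log(Level.SEVERE, "Document {0} has no content in any field", docId);
      return Level.SEVERE;
    }
    return emptyCount > 0 ? Level.FINE : null;
  }

  public static void main(String[] args) {
    Map<String, String> titleOnly = new LinkedHashMap<>();
    titleOnly.put("title", "A title");
    titleOnly.put("body", "");
    // One empty field out of two: only a fine-level message is logged.
    System.out.println(checkDocument("docs/title-only.txt", titleOnly));
  }
}
```

The point of the design is that a document with a title and no body is still usable, so it should not produce the same alarm as a document with nothing in it at all.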
If I'm correct / lucky, you should be able to check out the latest code, recompile, and rerun without seeing these problems. If you want to confirm the hypothesis of what's happening, you should be able to check the fine-level log messages and see if they correspond with documents where you have (say) a title and no body, or something like that.
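Fine-level messages are usually hidden by default, so you may need to turn them on before you can check them. Here is a rough sketch of one way to do that, assuming the package logs through java.util.logging and that you are happy to lower the level for everything via the root logger. The class name FineLoggingSetup and the "demo" logger are just placeholders.

```java
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper; assumes java.util.logging is the logging backend.
class FineLoggingSetup {
  /** Sets the root logger and each of its handlers to the given level. */
  static void setRootLevel(Level level) {
    Logger root = Logger.getLogger("");
    root.setLevel(level);
    // Handlers filter independently of loggers, so lower them as well.
    for (Handler handler : root.getHandlers()) {
      handler.setLevel(level);
    }
  }

  public static void main(String[] args) {
    setRootLevel(Level.FINE);
    Logger.getLogger("demo").fine("Fine-level messages are now visible.");
  }
}
```

The same effect can be had from a logging.properties file if you would rather not touch any code.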
I have not written new test cases for this; that would take a bit longer because I'd need to hack up an example Lucene index. I would like to create some tests that do this in due course, but in the interests of time I wanted to share what I think is a reasonable diagnosis and fix so that you can see if it works for you.
Best wishes,
Dominic