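String tsvText = jCas.getDocumentText();
// Offset of the current line's first character within the document text
int begin = 0;
for (String tsvLine : tsvText.split("\\n")) {
    // This is where you would swap the TSV parsing to match your own schema
    String[] parts = tsvLine.split("\\t");
    // Skip blank or malformed lines that lack a label column
    if (parts.length >= 2) {
        String docText = parts[0];
        // trim() drops a trailing \r left over from Windows line endings
        String docLabel = parts[1].trim();
        // The annotation covers only the text column of this line
        int end = begin + docText.length();
        // Swap UsenetDocument with your own type
        UsenetDocument document = new UsenetDocument(jCas, begin, end);
        document.setCategory(docLabel);
        document.addToIndexes();
    }
    // Advance past the whole line plus the \n that split() removed
    begin += tsvLine.length() + 1;
}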
[1] This isn't tested, working code: you will need to check that the offsets are calculated correctly and that the Java I'm cobbling together from memory is right. You may also want to change UsenetDocument to something from your own type system.
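To sanity-check the offsets without a UIMA pipeline, a minimal plain-Java sketch along these lines may help. TsvOffsets and Span are hypothetical names made up for this illustration (Span just stands in for an annotation's begin/end/label), and it assumes one "text<TAB>label" record per line:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of UIMA or ClearTK: computes the character
// offsets that the text column of each TSV line occupies in the full string.
final class TsvOffsets {

    // Stand-in for an annotation: a [begin, end) span plus its label.
    static final class Span {
        final int begin;
        final int end;
        final String label;

        Span(int begin, int end, String label) {
            this.begin = begin;
            this.end = end;
            this.label = label;
        }
    }

    static List<Span> spans(String tsvText) {
        List<Span> result = new ArrayList<>();
        int begin = 0; // offset of the current line's first character
        for (String line : tsvText.split("\\n")) {
            String[] parts = line.split("\\t");
            if (parts.length >= 2) {
                // Span covers only the text column; trim() drops a trailing \r
                result.add(new Span(begin, begin + parts[0].length(), parts[1].trim()));
            }
            // Advance past the line and the \n that split() removed
            begin += line.length() + 1;
        }
        return result;
    }

    public static void main(String[] args) {
        String tsv = "first post\tsci.space\nsecond one\trec.autos\n";
        for (Span s : spans(tsv)) {
            // The substring at each span should be exactly the text column
            System.out.println(tsv.substring(s.begin, s.end) + " -> " + s.label);
        }
    }
}
```

If the substring printed for each span matches the text column of the corresponding line, the same begin/end arithmetic should carry over to the annotation constructor.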
Cheers,
Lee