Hello,
thanks for the answer.
I figured out how it works. There were two problems.
First, the string indicating the Simplified Chinese encoding had to be
gb_2312, not GB2312.
Second, I was passing the settings that tell the converter how to load
txt files in the wrong way. Below is the working version of my code
(part of the method that converts documents):
-----------------------------------------------------
OfficeDocumentConverter converter = null;
String inputExt = FileConverterUtils.getFileExtension(inputFile.getName());
if ("txt".equals(inputExt)) {
    String txtCharset = CharsetDetector.detectCharset(inputFile);
    // equalsIgnoreCase is null-safe, unlike txtCharset.toLowerCase()
    if ("gb2312".equalsIgnoreCase(txtCharset)) {
        log.info("setting custom encoding type (GB_2312)");
        SimpleDocumentFormatRegistry sdfr = new SimpleDocumentFormatRegistry();

        // Input format: plain text, loaded with an explicit encoding
        DocumentFormat txt = new DocumentFormat("Plain Text", "txt", "text/plain");
        txt.setInputFamily(DocumentFamily.TEXT);
        Map<String,Object> txtLoadProperties = new LinkedHashMap<String,Object>();
        txtLoadProperties.put("Hidden", true);
        txtLoadProperties.put("ReadOnly", true);
        txtLoadProperties.put("FilterName", "Text (encoded)");
        txtLoadProperties.put("FilterOptions", "gb_2312");
        txt.setLoadProperties(txtLoadProperties);
        sdfr.addFormat(txt);

        // Output format: PDF, with one export filter per document family
        DocumentFormat pdf = new DocumentFormat("Portable Document Format", "pdf", "application/pdf");
        pdf.setStoreProperties(DocumentFamily.TEXT,
                Collections.singletonMap("FilterName", "writer_pdf_Export"));
        pdf.setStoreProperties(DocumentFamily.SPREADSHEET,
                Collections.singletonMap("FilterName", "calc_pdf_Export"));
        pdf.setStoreProperties(DocumentFamily.PRESENTATION,
                Collections.singletonMap("FilterName", "impress_pdf_Export"));
        pdf.setStoreProperties(DocumentFamily.DRAWING,
                Collections.singletonMap("FilterName", "draw_pdf_Export"));
        sdfr.addFormat(pdf);

        // Use the custom registry only for GB2312 text files
        converter = new OfficeDocumentConverter(this.officeManager, sdfr);
    } else {
        converter = new OfficeDocumentConverter(this.officeManager);
    }
} else {
    converter = new OfficeDocumentConverter(this.officeManager);
}
converter.convert(inputFile, outputFile);
-----------------------------------------------------
I used the org.mozilla.intl.chardet.CharsetDetector class to detect the
file encoding.
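If a detection library is not an option, a rough stand-in is a strict
decode check using only the JDK. The sketch below is not what the code
above does: the class name StrictDecodeCheck and the method
decodesCleanly are made up for illustration, and the check only tells
you whether the bytes are valid in a given charset, not which charset
the file was actually written in.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper, not part of JODConverter or jchardet.
final class StrictDecodeCheck {

    // Returns true if the bytes decode under the named charset without
    // any malformed or unmappable sequence; false otherwise, including
    // when the running JVM does not support the charset at all.
    public static boolean decodesCleanly(byte[] data, String charsetName) {
        if (!Charset.isSupported(charsetName)) {
            return false;
        }
        try {
            Charset.forName(charsetName).newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

For the case above one could call decodesCleanly(bytes, "GB2312") on the
file contents, assuming the JVM ships that charset. Keep in mind that a
pure ASCII file passes the check for every ASCII-compatible charset, so
this is much weaker than a real detector.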
I hope this helps, at least a little, other folks dealing with
similar problems. Of course it would be useful to write JODConverter
documentation, but as always there's no time for it.
Greetings,
Janusz