private void identify(IdentificationRequest request, ContainerSignatureDefinitions defs) throws IOException {
    // Set up a container identifier for OLE2 files using the supplied signature definitions.
    Ole2ContainerContentIdentifier identifier = new Ole2ContainerContentIdentifier();
    identifier.init(defs, "OLE2");

    // Wire the OLE2 engine to the factory that creates requests for the container's entries.
    Ole2IdentifierEngine engine = new Ole2IdentifierEngine();
    ContainerFileIdentificationRequestFactory requestFactory = new ContainerFileIdentificationRequestFactory();
    engine.setRequestFactory(requestFactory);
    identifier.setIdentifierEngine(engine);

    // Run the identification and print the PUID and MIME type of every match.
    IdentificationResultCollection resultCollection = new IdentificationResultCollection(request);
    resultCollection = identifier.process(request.getSourceInputStream(), resultCollection);
    List<IdentificationResult> results = resultCollection.getResults();
    for (IdentificationResult result : results) {
        System.out.println(result.getPuid() + ": " + result.getMimeType());
    }
}
Exception in thread "main" org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)
    at org.apache.poi.poifs.storage.HeaderBlockReader.<init>(HeaderBlockReader.java:116)
    at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:153)
    at uk.gov.nationalarchives.droid.container.ole2.Ole2IdentifierEngine.process(Ole2IdentifierEngine.java:59)
    at AbstractContainerContentIdentifier.process(AbstractContainerContentIdentifier.java:118)
    at Ole.identify(Ole.java:56)
I can't comment on the code itself (hard to debug on a phone!), but I would note that OLE2 is the container format for older Office documents. Docx files are ZIP containers, which is why POI rejects the file with OfficeXmlFileException.
Before attempting container identification, you must make sure you pass each file to the container identifier that matches its container type.
Regards
Matt
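To illustrate the point about matching the file to the identifier, here is a minimal sketch of one way to check the container type before calling the OLE2 code. It is not part of DROID or nanite: the class name ContainerSniffer, the Kind enum, and the sniff methods are made up for this example. It relies only on the well-known leading bytes of the two formats (D0 CF 11 E0 A1 B1 1A E1 for OLE2, "PK" 03 04 for a ZIP local file header) and assumes the stream supports mark/reset.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical helper: guesses the container type from the first bytes of a file. */
public class ContainerSniffer {

    public enum Kind { OLE2, ZIP, UNKNOWN }

    // OLE2 (Compound File Binary) header signature.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local file header signature, as found at the start of a .docx.
    private static final byte[] ZIP_MAGIC = { 'P', 'K', 0x03, 0x04 };

    /** Classifies a buffer holding the first bytes of a file. */
    public static Kind sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return Kind.OLE2;
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return Kind.ZIP;
        }
        return Kind.UNKNOWN;
    }

    /** Peeks at up to 8 bytes of the stream, then rewinds it so it can be reused. */
    public static Kind sniff(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(OLE2_MAGIC.length);
        byte[] header = new byte[OLE2_MAGIC.length];
        int total = 0;
        while (total < header.length) {
            int n = in.read(header, total, header.length - total);
            if (n < 0) {
                break; // end of stream before 8 bytes were available
            }
            total += n;
        }
        in.reset();
        return sniff(Arrays.copyOf(header, total));
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] docxStart = { 'P', 'K', 0x03, 0x04, 0x14, 0x00, 0x06, 0x00 };
        Kind kind = sniff(new ByteArrayInputStream(docxStart));
        System.out.println("Container type: " + kind);
    }
}
```

With something like this in place, identify(...) would only be called when the result is Kind.OLE2, and files reported as Kind.ZIP would go to a ZIP container identifier instead. A plain FileInputStream does not support mark/reset, so it would need wrapping in a BufferedInputStream first. Note also that an empty ZIP archive starts with a different signature, so this check is a rough filter rather than full format identification.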
<dependency>
    <groupId>eu.scape-project.nanite</groupId>
    <artifactId>nanite-core</artifactId>
    <version>1.0.72.2</version>
</dependency>