Hi Vladimir,
As the person responsible for these changes (projects 5 and 6), I'm sorry we didn't find time to include the XML generation. It was on the roadmap, but those who expressed an interest felt the CSV output was more important. Had we had the time and resources, it would have been included. Hopefully it will be included in the DROID 7 project now underway.
However, I should explain that the original XML output was dropped because the data model of DROID 5 and 6 is substantially different from that of DROID 4 and below, which means the same XML schema would not work. Even if an XML output is included in DROID 7, it will necessarily be different from that of previous versions.
The main changes were:
1. DROID can now scan inside zip and tar files, which themselves contain files. This means a file path is not available for every file DROID scans, so a URI is now the preferred identifier (e.g. "zip://C:/files/largeArchive.zip!fileInsideZipFile.txt"). File paths are still available for files which are genuinely sitting directly in a file system, but not for files inside archive files.
2. The "Tentative"/"Positive" system of identification was removed, as it was felt to represent a subjective assessment of the strength of an identification (based purely on whether the identification was made by a binary signature or a file extension). There were several cases where files were identified "positively", but the identification was in fact wrong. A new, more objective system was introduced, which tells you what kind of signature matched a file, and how many identifications there were. This allows a user of DROID to decide how much faith to place in different methods of identification, for different sorts of files.
3. A new form of identification, "container signatures", was introduced to identify composite file formats (e.g. Microsoft DOCX is actually a zip file which contains certain other XML files). The previous binary signatures were fairly poor at identifying these sorts of files, as the contents were obscured by the containing zip. Container identification now opens zip and OLE2 files and looks inside them. This in itself made us change the positive/tentative system, as we now had three different forms of identification (extension, binary and container). Should we regard container identifications as better than binary ones, or vice versa, or the same? As point (2) shows, we moved away from trying to assess the identification strength, and simply reported which identifications were made by which method.
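To make points (1) and (3) a little more concrete, here is a minimal Python sketch of the general idea. It is not DROID's actual code: the rule table, the function names and the "DOCX-like" label are all hypothetical, and the only thing taken from above is the shape of the example URI. It simply opens a zip, checks which entries are present, and builds a URI for an entry inside the archive.

```python
import io
import zipfile

# Hypothetical rule table: a format is suggested when all of the listed
# entries exist inside the zip. Illustrative only, not real DROID signatures.
CONTAINER_RULES = {
    "DOCX-like": {"[Content_Types].xml", "word/document.xml"},
}


def archive_uri(archive_path, entry_name):
    """Build an identifier shaped like the example URI in point (1)."""
    return "zip://" + archive_path + "!" + entry_name


def identify_container(data):
    """Return the names of every rule whose required entries are all present."""
    if not zipfile.is_zipfile(io.BytesIO(data)):
        return []  # not a zip at all, so no container rule can apply
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = set(zf.namelist())
    return [fmt for fmt, required in CONTAINER_RULES.items() if required <= names]


def make_sample_docx():
    """Create a tiny in-memory zip that satisfies the hypothetical rule."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("word/document.xml", "<document/>")
    return buf.getvalue()


if __name__ == "__main__":
    print(archive_uri("C:/files/largeArchive.zip", "fileInsideZipFile.txt"))
    print(identify_container(make_sample_docx()))
```

Note that the function returns a list rather than a single answer, in keeping with point (2): it reports what matched and leaves the judgement to the user.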
With these changes, the original XML would not work at all. It would, of course, be possible to create a new XML output which was very similar to the original, but accommodated the differences outlined above.
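Purely as an illustration of what such an output might need to carry, here is a short Python sketch. The element and attribute names ("File", "Method", "Identifications", "Format") are invented for this example and do not correspond to any existing or planned DROID schema. The point is only that each record would hold a URI instead of a file path, the method that produced the match, and a count of identifications instead of a Positive/Tentative status.

```python
import xml.etree.ElementTree as ET


def result_to_xml(uri, method, formats):
    """Serialise one identification result using a hypothetical layout."""
    file_el = ET.Element("File", {"uri": uri})
    # Which kind of signature matched (e.g. container, binary, extension).
    ET.SubElement(file_el, "Method").text = method
    # How many identifications there were, followed by each one.
    ids = ET.SubElement(file_el, "Identifications", {"count": str(len(formats))})
    for name in formats:
        ET.SubElement(ids, "Format").text = name
    return ET.tostring(file_el, encoding="unicode")


if __name__ == "__main__":
    print(result_to_xml(
        "zip://C:/files/largeArchive.zip!fileInsideZipFile.txt",
        "extension",
        ["Plain Text File"],
    ))
```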
Regards,
Matt Palmer.
On Wednesday, October 24, 2012 5:01:35 PM UTC+1, Vladimir Knobel wrote: