Generate FileCollection report XML from Command Line with DROID 6.1

Vladimir Knobel

Oct 24, 2012, 12:01:35 PM

to droid...@googlegroups.com

Hi

I'm having real trouble generating the above-mentioned XML report with DROID 6.1, the way it was possible in DROID 4 using the command-line option "-o".

Can anyone guide me on how to reproduce this with the new version of DROID?

Thxs!

V.

e.g.:

<FileCollection xmlns="http://www.nationalarchives.gov.uk/pronom/FileCollection">
	<DROIDVersion>4.0</DROIDVersion>
	<SignatureFileVersion>16</SignatureFileVersion>
	<DateCreated>2012-10-11T15:03:47</DateCreated>
	<IdentificationFile IdentQuality="Positive">
		<FilePath>somefile.pdf</FilePath>
		<FileFormatHit>
			<Status>Positive (Specific Format)</Status>
			<Name>Portable Document Format</Name>
			<Version>1.4</Version>
			<PUID>fmt/18</PUID>
			<MimeType>application/pdf</MimeType>
		</FileFormatHit>
	</IdentificationFile>
</FileCollection>
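For anyone with tooling built around this report, here is a minimal Python sketch of reading a FileCollection document of the shape shown above. It assumes the element names and the namespace are exactly those in the example; `read_file_collection` is a hypothetical helper name, not part of DROID.

```python
import xml.etree.ElementTree as ET

# Namespace copied from the example report; the "fc" prefix is arbitrary.
NS = {"fc": "http://www.nationalarchives.gov.uk/pronom/FileCollection"}


def read_file_collection(xml_text):
    """Return one dict per FileFormatHit found in a DROID 4 style report."""
    root = ET.fromstring(xml_text)
    rows = []
    for ident in root.findall("fc:IdentificationFile", NS):
        path = ident.findtext("fc:FilePath", default="", namespaces=NS)
        quality = ident.get("IdentQuality", "")
        for hit in ident.findall("fc:FileFormatHit", NS):
            rows.append({
                "path": path,
                "quality": quality,
                "puid": hit.findtext("fc:PUID", default="", namespaces=NS),
                "name": hit.findtext("fc:Name", default="", namespaces=NS),
                "version": hit.findtext("fc:Version", default="", namespaces=NS),
                "mime": hit.findtext("fc:MimeType", default="", namespaces=NS),
            })
    return rows
```

Each row carries the fields I depend on (path, PUID, name, version, MIME type), which is the information I would like to get out of DROID 6.1 as well.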

ross-spencer

Oct 24, 2012, 12:55:50 PM

to droid...@googlegroups.com

Hi Vladimir,

I used to work on the DROID project (more so on 6.1). I believe this report was deprecated between DROID 4 and 5 due to a number of significant changes in the DROID paradigm / architecture. The main focus of the development then was to provide a different set of reports for government departments and the File Collection report was one of the reports that fell by the wayside. Others might be able to comment on why but I believe it was a combination of priorities, limitations in what it could report (i.e. it would need to be re-designed for containers) and new code would need to be created to generate the report after the code base was re-engineered.

You could roll back to 4.0, as I imagine you rely on this report for your workflow (I myself used to maintain an XSLT to generate further reports based on this XML), but the improvements in DROID 6.1 are numerous (identification capability, containers, speed, etc.). A number of users now make use of one of the two CSV outputs from the tool. The new version provides a simple CSV output of: "{file},{puid}".

I also suggest you head to the http://droid7.wikispaces.com site and log your requirements for an XML output similar to the 'FileCollection' in a future version of DROID. You could also include any identified improvements you'd like to see made too as it will provide a good opportunity to be able to make them.

I hope some of that information helps. I appreciate, however, that accommodating such changes can be difficult.

Ross

Matt Palmer

Oct 25, 2012, 5:38:10 AM

to droid...@googlegroups.com

Hi Vladimir,

as the person responsible for these changes (projects 5 and 6), I'm sorry we didn't find time to include the XML generation. It was on the roadmap, but the CSV output was felt to be more important by those who expressed an interest. Had we time and resource, it would have been included. Hopefully it will be included in the DROID 7 project now underway.

However, I should explain that the reason for dropping the original XML output was that the data model of DROID 5 and 6 is substantially different to that of DROID 4 and below, which means the same XML schema output would not work - and even if an XML output is included in DROID 7, it will necessarily be different to previous versions.

The main changes were:

1. DROID can now scan inside zip and tar files, which themselves contain files. This means a file-path is not always available for all files which DROID scans, so a URI is now the preferred identifier (e.g. "zip://C:/files/largeArchive.zip!fileInsideZipFile.txt"). File paths are still available for files which are genuinely sitting directly in a file system, but not for files inside archival files.
2. The "Tentative", "Positive" system of identification was removed, as it was felt that it represented a subjective assessment of the strength of identification (based purely on whether the identification was done by a binary signature or a file extension). There were several cases where files were identified "positively", but the identification was in fact wrong. A new, more objective, system of identification was introduced, which tells you what kind of signature matched a file, and how many identifications there were. This allows a user of DROID to decide how much faith to place in different systems of identification, for different sorts of files.
3. A new form of identification "container signatures" was introduced to identify composite file formats (e.g. Microsoft DOCX is actually a zip file which contains certain other XML files). The previous binary signatures were reasonably bad at identifying these sorts of files, as the contents were obscured by the containing zip. Container identification now opens zip and ole2 files and looks inside them. This in itself made us change the positive/tentative system, as we now had three different forms of identification. Should we regard container identifications as better than binary ones, or vice versa, or the same? As point (2) shows, we moved away from trying to assess the identification strength, and simply reported what identifications were done by what method.
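Point 1 can be illustrated with a small Python sketch that splits an identifier like the one quoted above into its archive and the entry inside it. This is only an illustration, assuming that "://" follows the scheme and that "!" separates the container from the entry, as in the example URI; `split_resource_uri` and `ResourceRef` are hypothetical names, not DROID code, and nested archives are not handled.

```python
from typing import NamedTuple, Optional


class ResourceRef(NamedTuple):
    scheme: str           # e.g. "zip", or "file" for a plain file-system path
    container: str        # the archive (or the file itself for plain paths)
    entry: Optional[str]  # the entry inside the archive, if any


def split_resource_uri(uri: str) -> ResourceRef:
    """Split 'zip://archive!entry' style identifiers; plain paths pass through."""
    scheme, sep, rest = uri.partition("://")
    if not sep:
        # No scheme: treat the whole string as an ordinary file path.
        return ResourceRef("file", uri, None)
    # Assumption: the first "!" marks the boundary between archive and entry.
    container, bang, entry = rest.partition("!")
    return ResourceRef(scheme, container, entry if bang else None)
```

A consumer of the old XML that expected every record to have a plain file path would need an extra field like `entry` to represent files found inside archives.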

With these changes, the original XML would not work at all. It would, of course, be possible to create a new XML output which was very similar to the original, but accommodated the differences outlined above.

Regards,

Matt Palmer.

Vladimir Knobel

Oct 25, 2012, 6:23:09 AM

to droid...@googlegroups.com

Hi Ross and Matt

Thanks for your very comprehensive and detailed answers and the insights into what is going on in the project!

Now I have a better picture of how DROID works, but I may need a little more information to understand whether we can continue using DROID in its latest version.

I have seen an improvement from using the "container signatures"; that is exactly what moved me to try the latest version of DROID, especially the results when identifying an Office Word document.

DROID 4 returns "OLE2 Compound Document Format fmt/111", whereas DROID 6 returns a more accurate "fmt/40 word 97-2003".

Now here are my questions:

1. I have tested the built-in reports and I can't seem to find one that links the file path (or URI) with the PRONOM PUID, MIME type and so on.

Is it possible to create my own report file in the "report_definitions" folder?

2. I've followed the help built into the GUI version of DROID to try to run it from the command line. I manage to create a profile, although the command seems to hang after creating it (it doesn't return control to the console).

Here is an example of my input:

droid -a "C:\some\path\to\a\folder\primarydata" -p "testprofile.droid"

Now when I try to run the profile following the instructions in the help, by executing the following command:

droid -p "testprofile.droid" -e "Results.csv"

I get an error stating:

No command line options specified
Invalid usage: use droid -h to print the options.

and then all the usage text... (by the way, it would be nice to be able to disable the usage output after an error; it fills the console screen unnecessarily)

No matter what combination of parameters I choose, I keep getting that error; it seems to be related to the -p command-line option.

Is that a known bug? Or am I missing something?

Thanks in advance for any hints!

Vlad

Vladimir Knobel

Oct 25, 2012, 6:35:01 AM

to droid...@googlegroups.com

By the way, I'm running DROID on a Windows 7 x64 machine with JRE 6. If more detailed info is needed, don't hesitate to ask me.

Also, the No-Profile mode works fine. For example:

droid -Nr "C:\some\path\to\a\folder\primarydata" -Ns DROID_SignatureFile_V63.xml -Nc container-signature-20120828.xml

returns a list on standard output:

C:\some\path\to\a\folder\primarydata\Testpdf1-000003.TIFF,fmt/353
C:\some\path\to\a\folder\primarydata\Testpdf2-000001.TIFF,fmt/353
C:\some\path\to\a\folder\primarydata\Testpdf1.pdf,fmt/18

But I need more identification information than that, i.e. MIME type, name and version...
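As a stopgap, the `path,PUID` output of the no-profile mode can at least be wrapped into XML that resembles the old FileCollection layout. The Python sketch below is only that: it assumes one `path,PUID` pair per line with the PUID after the last comma, it reuses the DROID 4 namespace and element names purely for familiarity (this is not a schema that DROID 6.1 defines), and `console_output_to_xml` is a hypothetical helper. Name, version and MIME type are left out because the console output does not contain them.

```python
import xml.etree.ElementTree as ET

# Namespace borrowed from the DROID 4 report; using it here is an assumption.
NS = "http://www.nationalarchives.gov.uk/pronom/FileCollection"


def console_output_to_xml(text: str) -> str:
    """Wrap 'path,PUID' lines in FileCollection-like XML."""
    root = ET.Element("FileCollection", {"xmlns": NS})
    for line in text.splitlines():
        # Split on the LAST comma so commas inside a path do not break parsing.
        path, sep, puid = line.strip().rpartition(",")
        if not sep or not path:
            continue  # skip blank or unexpected lines
        ident = ET.SubElement(root, "IdentificationFile")
        ET.SubElement(ident, "FilePath").text = path
        hit = ET.SubElement(ident, "FileFormatHit")
        ET.SubElement(hit, "PUID").text = puid
    return ET.tostring(root, encoding="unicode")
```

The missing fields could in principle be filled in afterwards by looking each PUID up in the signature file, but I have not tried that.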

Matt Palmer

Oct 25, 2012, 6:46:52 AM

to droid...@googlegroups.com

Hi Vladimir,

I can't really comment on the behaviour of DROID 6.1, as I'm no longer associated with the project, or working at the National Archives. The output of DROID 6.01 included all information in the command line, and as far as I know, worked fine generating profiles and CSV output (if a little awkwardly). Your command line options seem entirely reasonable to me - and are in fact correct for DROID 6.01, even if they are not for DROID 6.1. You should probably report these as separate bugs on the GitHub issue tracking system for DROID.

1. Include PDF of help in main distribution.
2. Include all identification information in console output (mime-type, name, version, identification method, warnings, etc.) - the same output as for CSV.
3. Exporting profiles using command line options does not work (or at least, does not work as described in the help file).

Regards,

Matt.

Matt Palmer

Oct 25, 2012, 6:53:00 AM

to droid...@googlegroups.com

Hi Vladimir,

if report generation is the same as DROID 6.01, then you can indeed create your own reports in report_definitions. They are just XML files in a folder which specify which information should be included or filtered out in the report.

You can additionally create different output formats using XSLT by placing an XSLT file that transforms the DROID report XML in the report_definitions folder, using the naming convention:

[name of output format].[output extension].xsl

Regards,

Matt

Vladimir Knobel

Oct 25, 2012, 7:00:53 AM

to droid...@googlegroups.com

Hi Matt

Thanks for confirming this!

But I've just realized the CSV export may work as well, if only the command line worked :)

Well, I'm going to report the bugs and enhancements.

Thank you very much for your help!

Regards

Vlad

Vladimir Knobel

Oct 25, 2012, 9:56:16 AM

to droid...@googlegroups.com

Hi Matt

Here is an update, in case you are curious or someone finds it helpful when reading this.

The problem with the command-line options is that the examples in the documentation are outdated, and the order of the options matters (strange, but it does!).

So a correct command to export your previously generated profile to a CSV file would be something like this:

droid -E "exportreults_testprofiledroid_commandline.csv" -p "testprofile_word.droid"

and not as described in the GUI Help:

droid -p "testprofile_word.droid" -E "exportreults_testprofiledroid_commandline.csv"

The hint came from the release notes for version 6.1:

- CLI options reordered for usability

and in fact the help displayed after the error shows them in the right order, but the message could be more descriptive...
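For scripted use, the ordering that worked for me can be pinned down in a small Python wrapper so it cannot be mixed up again. This is a sketch under assumptions: it relies only on the `-E` and `-p` options exactly as used in the command above, `build_export_command` and `export_profile` are hypothetical helper names, and the `droid` argument must point at whatever launcher your installation provides (on Windows that probably means the full path to the batch file).

```python
import subprocess
from typing import List


def build_export_command(profile: str, csv_path: str, droid: str = "droid") -> List[str]:
    """Return the argument list with -E (export target) before -p (profile)."""
    return [droid, "-E", csv_path, "-p", profile]


def export_profile(profile: str, csv_path: str, droid: str = "droid") -> None:
    """Run the export and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(build_export_command(profile, csv_path, droid), check=True)
```

Calling `build_export_command("testprofile_word.droid", "results.csv")` yields the arguments in the same order as the working command, and passing a list rather than one string avoids quoting problems with paths that contain spaces.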

Thxs again!

Vlad

Matt Palmer

Oct 25, 2012, 10:10:26 AM

to droid...@googlegroups.com

Hi Vladimir,

that's strange. I just installed DROID 6.1 and ran it using the -p option followed by the -E option, and it actually worked for me. So I'm not sure it's anything to do with the order of the options... or maybe it's a bug that only affects some users...?

A mystery...

Matt.

Vladimir Knobel

Oct 29, 2012, 9:27:49 AM

to droid...@googlegroups.com

Thanks for taking the time to test it on your side; it's indeed really strange.

I'm running DROID 6.1 (I don't know if there are different builds, but mine has the following MD5 checksum: 9445acd75225057c9bfc3cd948ffb0e4 droid-command-line-6.1.jar) on a Windows 7 Enterprise x64 machine with JRE 6 (build 1.6.0_34-b04).

Since we have seen that generating a profile before identifying the files is quite slow, we will try to extend the console output to something more comprehensive here in-house, with a programmer who is proficient in Java, and then try to contribute the changes to the project. We may also have a look at the option order issue.

Best Regards

Vlad

Dclipsham

Nov 2, 2012, 8:34:23 AM

to droid...@googlegroups.com

Thank you Vlad. We'd be very keen to see what your in-house team comes up with.

David
