Droid Embedded guidance


Frederic Brégier

Sep 25, 2012, 9:57:08 AM
to droid...@googlegroups.com
Hi,

I'm interested in embedding Droid in an open source project.

My goal is almost the same as that of the earlier "fits" open source project, but that project relies on an old version of Droid.

So I have some questions:
1) Is there any obstacle (the license, for instance) to embedding the Droid jars in an open source project?
2) Is there any guidance on how to do this, and in particular on how to identify files one at a time?
3) Among all the dependencies, which jars are actually necessary for an "embedded" solution?

Thank you,
Frederic

Dclipsham

Sep 26, 2012, 6:41:42 AM
to droid...@googlegroups.com
Hi Frederic.

1) Absolutely none, provided the licence you choose for your open source project is not incompatible. DROID 6.1 is released under the BSD Licence, which is very permissive.
2) There is no explicit public API for DROID 6. Ideally one should be added in the near future.
However, the functionality that you need basically exists. I would start by looking at the Java class uk.gov.nationalarchives.droid.command.action.NoProfileRunCommand in the droid-command-line module.
If you look at the execute() method, you can see how to programmatically construct a DROID instance and have it scan a folder.
3) DROID 6.1 is built using Maven, and all of the dependencies required for DROID are documented in its pom.xml files (see the source code on GitHub). If you choose to use Maven for your project as well, you can simply add a dependency on DROID, and DROID's dependencies will be pulled in automatically.

I hope this answers your questions.

David

Frederic Brégier

Sep 26, 2012, 8:10:39 AM
to droid...@googlegroups.com
Hi David,

First, thank you very much for your answers.

Regarding the license, we will use either your model (BSD) or the GPL family (probably LGPL).

Going through the code, I arrived at exactly the same entry points you gave me.
I found a few things that could make life easier for someone who, like me, wants to use Droid in embedded mode. Here are my thoughts; please take them simply as a proposal from my side.

A) Getting the full signature information
When using the current NoProfileRunCommand implementation, and in particular the current BinarySignatureIdentifier, I had to extend BinarySignatureIdentifier in my own code just to override the following method:

    /**
     * @return the sigFile
     */
    public FFSignatureFile getSigFile() {
        return super.getSigFile();
    }

This is needed if one then wants to use the PUID returned in the response to look up the full signature information (name, version, MIME type, ...). Currently this method is not public (no access modifier is declared, which in Java means package-private access rather than "protected"). Making it public would allow the already initialized BinarySignatureIdentifier to be reused, without having to create a new one only to get the signature information.

B) Getting the information back to a list instead of System.out
When using the current NoProfileRunCommand implementation, I had to put in place a workaround that captures System.out, like this:

        PrintStream oldOut = System.out;
        DroidOutputStream out = new DroidOutputStream(myBinarySignatureIdentifier.getSigFile());
        PrintStream newOut = new PrintStream(out, true);
        System.setOut(newOut);
        try {
            command.execute();
        } catch (CommandExecutionException e) {
            e.printStackTrace();
        } finally {
            // Always restore the original stream, even if execute() fails unexpectedly.
            System.setOut(oldOut);
            newOut.close();
        }

Being able to pass a PrintStream as an extra argument or, better, to call the very same code and get the result back as a List<String> or equivalent would help embedding projects a lot.
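
The capture-and-restore workaround described above can be sketched with the JDK alone. This is a minimal illustration, not DROID code: StdoutCapture and capture are hypothetical names, and the task passed in stands in for a call such as command.execute(). Because System.setOut changes a JVM-wide setting, the sketch is only safe when nothing else writes to System.out at the same time.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of DROID): runs a task while System.out is
// redirected into an in-memory buffer and returns the captured lines.
final class StdoutCapture {

    private StdoutCapture() {
    }

    static List<String> capture(Runnable task) {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            PrintStream redirected = new PrintStream(buffer, true, "UTF-8");
            System.setOut(redirected);
            try {
                task.run();
            } finally {
                // Restore the real stream even if the task throws.
                System.setOut(original);
                redirected.close();
            }
            String text = buffer.toString("UTF-8");
            List<String> lines = new ArrayList<String>();
            if (!text.isEmpty()) {
                // Accept both Unix and Windows line endings.
                lines.addAll(Arrays.asList(text.split("\\r?\\n")));
            }
            return lines;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<String> lines = capture(() -> System.out.println("sample.docx,fmt/412"));
        System.out.println("Captured " + lines.size() + " line(s): " + lines);
    }
}
```

A dedicated PrintStream argument on the command itself would avoid touching the global System.out at all, which is why that option is the cleaner one.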

C) While going through the great Droid code, I saw that you could quite easily add bzip2 support too (zip, tar, and gzip are already supported).
It is not really needed, however; it is just that this format is available in Commons Compress.
I tested it and it is really easy (based on GzipArchiveContentIdentifier).


Regarding Maven, I use Eclipse 3.7 and currently the project does not show up correctly. I may have set up the Droid project wrongly, so I added the dependencies manually from the binary package.
I will look at the various sub-module pom.xml files to work out the right dependencies (limiting the number of jars, since I don't use the GUI).

One more question: I saw that the project page says the Droid jars will be available in a Maven repository, but I was not able to find them. Is that because it has not been done yet?

Best regards,
Frederic

Frederic Brégier

Sep 27, 2012, 11:43:13 AM
to droid...@googlegroups.com
Hi,

Just to let you know that I am now able to import the jar through Eclipse (using Eclipse 4.2, in fact).
Also, as far as I have tested, I was able to embed Droid. Thank you!

Best regards,
Frederic

Vladislav Korecký

Nov 14, 2012, 7:45:57 AM
to droid...@googlegroups.com
Hi Frederic,
I followed your guide on how to embed DROID and it works, thank you.
But I cannot get the MIME type, version, and other information for container file types such as docx. When I debugged the ResultPrinter class, I found that the MIME type and other information are returned only by the binary identifier, not by the container identifier.
All PUIDs are correct; only the MIME type and version information are missing from the container identification result.
Could you please help me?

Thank you in advance,
Vlada

Frederic Brégier

Nov 14, 2012, 10:39:39 AM
to droid...@googlegroups.com
Hi Vlada,

As I wrote in other mails, for our prototype we had to adapt some code from Droid (by replacing or inheriting from some Droid classes), in particular to gain access to Droid's internal representation of the binary signatures. Once I have the PUID, I can query Droid's internal repository for the corresponding signature, and thereby obtain the related "descriptive" information.
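
The lookup described above (PUID in, descriptive information out) amounts to a keyed index. The sketch below uses stand-in types, FormatInfo and PuidIndex, which are hypothetical and are not DROID classes; the sample entry is illustrative data and is not read from a signature file.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the descriptive data attached to a signature (hypothetical type).
final class FormatInfo {
    final String puid;
    final String name;
    final String version;
    final String mimeType;

    FormatInfo(String puid, String name, String version, String mimeType) {
        this.puid = puid;
        this.name = name;
        this.version = version;
        this.mimeType = mimeType;
    }
}

// Stand-in for the internal signature repository, indexed by PUID (hypothetical type).
final class PuidIndex {
    private final Map<String, FormatInfo> byPuid = new HashMap<String, FormatInfo>();

    void register(FormatInfo info) {
        byPuid.put(info.puid, info);
    }

    // Returns the descriptive information for a PUID, or null if the PUID is unknown.
    FormatInfo lookup(String puid) {
        return byPuid.get(puid);
    }

    public static void main(String[] args) {
        PuidIndex index = new PuidIndex();
        // Illustrative sample entry only.
        index.register(new FormatInfo("fmt/412", "Microsoft Word for Windows", "2007 onwards",
                "application/vnd.openxmlformats-officedocument.wordprocessingml.document"));

        // A container identification that only yields a PUID can be enriched afterwards.
        FormatInfo info = index.lookup("fmt/412");
        System.out.println(info.name + " " + info.version + " (" + info.mimeType + ")");
    }
}
```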

You can have a look at the prototype code at :
https://github.com/fredericBregier/VitamTools
And in particular :
https://github.com/fredericBregier/VitamTools/tree/master/src/main/java/fr/gouv/culture/vitam/droid
and
https://github.com/fredericBregier/VitamTools/tree/master/src/main/java/uk/gov/nationalarchives/droid

This code is by no means a final product, just a prototype, and it contains more than just the Droid interface (EXIF and JHOVE, plus digest support, plus some French specifics).

I have asked TNA to adapt the Droid code (maybe in version 7 or earlier) to make this possible without such "ugly" workarounds.

Best regards,
Frederic

Vladislav Korecký

Nov 15, 2012, 3:26:34 AM
to droid...@googlegroups.com
Thank you Frederic,
your workaround works perfectly and I now have all the information I need from DROID.

Vlada


d.lem...@docuteam.ch

Jan 9, 2013, 3:41:27 AM
to droid...@googlegroups.com

Hi David

I have successfully embedded DROID in our project and can use it :-)

Can you now give me a hint on where to start reading if I want to test files using the CONTAINER method?

Checking files using the SIGNATURE and EXTENSION methods already works fine.

Many thanx in advance
:-Denis

Dclipsham

Jan 11, 2013, 8:31:30 AM
to droid...@googlegroups.com
Hi Denis,

According to the Code Structure section (found on this GitHub page: http://digital-preservation.github.com/droid/), container processing is handled by droid-container.

I'm sorry I can't be more specific, but if you describe in more detail what you are trying to achieve, I or one of our other members may be able to help.

David

Matt Palmer

Jan 11, 2013, 8:37:57 AM
to droid...@googlegroups.com

Have a look at the SubmissionGateway in droid-results. It invokes both binary signature and container signature identification, and dispatches archive formats for processing.

It's a bit confusing, as the real purpose of the SubmissionGateway is just to schedule jobs to be processed in a multithreaded way. Unfortunately, it also ended up with the identification logic embedded in it.

These should really be cleanly separated in a future version of Droid, possibly with a new identification package that contains the identification logic only.
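
As a rough illustration of that separation, the sketch below keeps the identification logic behind one small interface and leaves the multithreaded scheduling to a separate class. FormatIdentifier and IdentificationScheduler are hypothetical names chosen for this example; they do not exist in DROID, and a real design would return richer results than a String.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical: identification logic only, with no knowledge of threads or queues.
interface FormatIdentifier {
    String identify(String fileName);
}

// Hypothetical: scheduling only; it delegates every decision to the identifier.
final class IdentificationScheduler {
    private final FormatIdentifier identifier;
    private final int threads;

    IdentificationScheduler(FormatIdentifier identifier, int threads) {
        this.identifier = identifier;
        this.threads = threads;
    }

    // Submits one job per file and returns the results in input order.
    List<String> identifyAll(List<String> fileNames) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<Future<String>>();
            for (final String name : fileNames) {
                Callable<String> job = () -> identifier.identify(name);
                futures.add(pool.submit(job));
            }
            List<String> results = new ArrayList<String>();
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With this split, an embedding project could call the identifier directly for a single file and never touch the scheduler.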

Regards,

Matt Palmer


d.lem...@docuteam.ch

Jan 15, 2013, 5:04:56 AM
to droid...@googlegroups.com

Hi David and Matt

Many thanx for your answers. In the meantime I was able to solve my issue myself. What I needed was a collection of IdentificationResults for a file, and this collection had to cover all available kinds of IdentificationResults: binary signature matches, container signature matches, and extension matches; i.e. their "method" is one of "SIGNATURE", "CONTAINER", or "EXTENSION".

To accomplish this, I had to do quite a lot of programming. I find it a pity that DROID doesn't offer a simple external interface like

<code>
List<IdentificationResult> getIdentificationResults(String filePath);
List<IdentificationResult> getIdentificationResults(File file);
</code>

to retrieve the relevant information easily.

This is my implementation. It is an abstract class that offers only static methods because I don't need several different instances.

<code>

/**
 * Copyright (C) 2011-2013 Docuteam GmbH
 *
 * This program is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 3
 * as published by the Free Software Foundation.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program.  If not, see <http://www.gnu.org/licenses/>.
 */
package ch.docuteam.docutools.file;

import java.io.*;
import java.util.List;

import uk.gov.nationalarchives.droid.command.action.CommandExecutionException;
import uk.gov.nationalarchives.droid.command.container.Ole2ContainerContentIdentifier;
import uk.gov.nationalarchives.droid.command.container.ZipContainerContentIdentifier;
import uk.gov.nationalarchives.droid.container.*;
import uk.gov.nationalarchives.droid.container.ole2.Ole2IdentifierEngine;
import uk.gov.nationalarchives.droid.container.zip.ZipIdentifierEngine;
import uk.gov.nationalarchives.droid.core.BinarySignatureIdentifier_Extended;
import uk.gov.nationalarchives.droid.core.interfaces.*;
import uk.gov.nationalarchives.droid.core.interfaces.archive.IdentificationRequestFactory;
import uk.gov.nationalarchives.droid.core.interfaces.resource.FileSystemIdentificationRequest;
import uk.gov.nationalarchives.droid.core.interfaces.resource.RequestMetaData;
import uk.gov.nationalarchives.droid.core.signature.FileFormat;
import ch.docuteam.docutools.file.exception.*;
import ch.docuteam.docutools.out.Logger;

/**
 * This is an abstract class for getting file format information using DROID.
 * <br>
 * The DROID subsystem requires the signature file "config/DROID_SignatureFile_V66.xml" to be in the folder "config" in the working directory.
 *
 * @author denis
 *
 */
public abstract class MetadataProviderDROID
{
    // ===========================================================================================
    // ======== Structure =======================================================
    // ===========================================================================================

    // ======== Static Final Public =======================================================

    // ======== Static Final Private =======================================================

    static private final String DefaultSignatureFile = "config/DROID_SignatureFile_V66.xml";
    static private final String DefaultContainerSignatureFile = "config/container-signature-20121218.xml";

    // ======== Static Public =======================================================

    // ======== Static Private =======================================================

    static private String SignatureFile = DefaultSignatureFile;
    static private String ContainerSignatureFile = DefaultContainerSignatureFile;

    static private BinarySignatureIdentifier_Extended SignatureIdentificator = null;
    static private ContainerSignatureDefinitions ContainerSignatureDefs = null;
    static private List<TriggerPuid> ContainerSignatureTriggerPuids = null;

    // A primitive flag is enough here; a boxed Boolean is not needed.
    private static boolean IsInitialized = false;

    // ===========================================================================================
    // ======== Methods =======================================================
    // ===========================================================================================

    // ======== Static Public =======================================================

    // -------- Initializing -------------------------------------------------------

    static public void setSignatureFile(String newSignatureFile)
    {
        SignatureFile = newSignatureFile;

        IsInitialized = false;
        // I will (re-)initialize myself the next time I am used.
    }

    static public void setContainerSignatureFile(String newContainerSignatureFile)
    {
        ContainerSignatureFile = newContainerSignatureFile;

        IsInitialized = false;
        // I will (re-)initialize myself the next time I am used.
    }

    /**
     * @deprecated Use method setSignatureFile() instead.
     */
    @Deprecated
    static public void setConfigFile(String newSignatureFile)
    {
        setSignatureFile(newSignatureFile);
    }

    // -------- Accessing -------------------------------------------------------

    static public IdentificationResult getIdentificationResult(String filePath) throws DROIDCouldNotInitializeException, DROIDNoIdentificationFoundException, DROIDMultipleIdentificationsFoundException, FileNotFoundException
    {
        List<IdentificationResult> resultList = getIdentificationResults(filePath);

        if (resultList == null || resultList.isEmpty()) throw new DROIDNoIdentificationFoundException();
        if (resultList.size() != 1) throw new DROIDMultipleIdentificationsFoundException(resultList);

        return resultList.get(0);
    }

    // The following are convenience methods (shortcuts for retrieving specific metadata directly):

    static public String getFileFormatPUID(String fileName) throws DROIDCouldNotInitializeException, DROIDNoIdentificationFoundException, DROIDMultipleIdentificationsFoundException, FileNotFoundException
    {
        IdentificationResult result = getIdentificationResult(fileName);
        return (result == null)? null: result.getPuid();
    }

    static public String getMimeType(String fileName) throws DROIDCouldNotInitializeException, DROIDNoIdentificationFoundException, DROIDMultipleIdentificationsFoundException, FileNotFoundException
    {
        IdentificationResult result = getIdentificationResult(fileName);
        return (result == null)? null: result.getMimeType();
    }

    static public String getFileFormatName(String fileName) throws DROIDCouldNotInitializeException, DROIDNoIdentificationFoundException, DROIDMultipleIdentificationsFoundException, FileNotFoundException
    {
        IdentificationResult result = getIdentificationResult(fileName);
        return (result == null)? null: result.getName();
    }

    static public String getFileFormatVersion(String fileName) throws DROIDCouldNotInitializeException, DROIDNoIdentificationFoundException, DROIDMultipleIdentificationsFoundException, FileNotFoundException
    {
        IdentificationResult result = getIdentificationResult(fileName);
        return (result == null)? null: result.getVersion();
    }

    static public String getFileFormatMethod(String fileName) throws DROIDCouldNotInitializeException, DROIDNoIdentificationFoundException, DROIDMultipleIdentificationsFoundException, FileNotFoundException
    {
        IdentificationResult result = getIdentificationResult(fileName);
        return (result == null)? null: result.getMethod().getMethod();
    }

    // ======== Static Private =======================================================

    // -------- Initializing -------------------------------------------------------

    static private void initializeIfNecessary() throws DROIDCouldNotInitializeException
    {
        if (IsInitialized) return;

        try
        {
            Logger.debug("Initializing DROID...");

            SignatureIdentificator = new BinarySignatureIdentifier_Extended();
            SignatureIdentificator.setSignatureFile(SignatureFile);
            SignatureIdentificator.init();

            // Close the container signature stream once it has been parsed.
            FileInputStream containerSignatureStream = new FileInputStream(ContainerSignatureFile);
            try
            {
                ContainerSignatureDefs = new ContainerSignatureSaxParser().parse(containerSignatureStream);
            }
            finally
            {
                containerSignatureStream.close();
            }
            ContainerSignatureTriggerPuids = ContainerSignatureDefs.getTiggerPuids();

            IsInitialized = true;

            Logger.debug("...OK");
        }
        catch (java.lang.Exception ex)
        {
            Logger.debug("...NOK!");

            throw new DROIDCouldNotInitializeException(ex);
        }
    }

    // -------- Calculating -------------------------------------------------------

    static private List<IdentificationResult> getIdentificationResults(String filePath) throws DROIDCouldNotInitializeException, FileNotFoundException
    {
        initializeIfNecessary();

        File file = new File(filePath);
        if (!file.exists()) throw new FileNotFoundException(filePath);

        RequestMetaData metadata = new RequestMetaData(file.length(), file.lastModified(), file.getName());
        RequestIdentifier id = new RequestIdentifier(file.toURI());
        IdentificationRequest request = new FileSystemIdentificationRequest(metadata, id);

        FileInputStream fis = null;
        try
        {
            fis = new FileInputStream(file);
            request.open(fis);

            // Identify the file. Try 3 different systems: first container, then binary (= signature), and finally file extension:
            // (NOTE: To get the container identifications, I need the binary identifications)
            IdentificationResultCollection signatureResultCollection = SignatureIdentificator.matchBinarySignatures(request);
            IdentificationResultCollection containerResultCollection = getContainerResults(request, signatureResultCollection);

            IdentificationResultCollection finalResultCollection;
            if      (containerResultCollection.getResults().size() > 0) finalResultCollection = containerResultCollection;
            else if (signatureResultCollection.getResults().size() > 0) finalResultCollection = signatureResultCollection;
            else finalResultCollection = SignatureIdentificator.matchExtensions(request, false);

            SignatureIdentificator.removeLowerPriorityHits(finalResultCollection);

            return finalResultCollection.getResults();
        }
        catch (Exception ex)
        {
            // Identification failed: callers receive null and treat it as "no identification found".
            ex.printStackTrace();
            return null;
        }
        finally
        {
            // Close both resources independently so that a failure in one does not leak the other.
            try
            {
                request.close();
            }
            catch (IOException ex) {}
            try
            {
                if (fis != null) fis.close();
            }
            catch (IOException ex) {}
        }
    }


    private static IdentificationResultCollection getContainerResults(IdentificationRequest request, IdentificationResultCollection results) throws CommandExecutionException
    {
        IdentificationResultCollection containerResults = new IdentificationResultCollection(request);

        for (IdentificationResult identResult: results.getResults())
        {
            String filePuid = identResult.getPuid();
            if (filePuid == null) continue;

            // Only PUIDs listed as triggers in the container signature file are inspected further.
            TriggerPuid containerPuid = null;
            for (TriggerPuid tp: ContainerSignatureTriggerPuids)
            {
                if (tp.getPuid().equals(filePuid))
                {
                    containerPuid = tp;
                    break;
                }
            }
            if (containerPuid == null) continue;

            IdentificationRequestFactory requestFactory = new ContainerFileIdentificationRequestFactory();
            String containerType = containerPuid.getContainerType();

            if ("OLE2".equals(containerType))
            {
                try
                {
                    Ole2ContainerContentIdentifier ole2Identifier = new Ole2ContainerContentIdentifier();
                    ole2Identifier.init(ContainerSignatureDefs, containerType);
                    Ole2IdentifierEngine ole2IdentifierEngine = new Ole2IdentifierEngine();
                    ole2IdentifierEngine.setRequestFactory(requestFactory);
                    ole2Identifier.setIdentifierEngine(ole2IdentifierEngine);
                    ole2Identifier.process(request.getSourceInputStream(), containerResults);
                }
                catch (IOException e)
                {
                    e.printStackTrace();
                }
            }
            else if ("ZIP".equals(containerType))
            {
                try
                {
                    ZipContainerContentIdentifier zipIdentifier = new ZipContainerContentIdentifier();
                    zipIdentifier.init(ContainerSignatureDefs, containerType);
                    ZipIdentifierEngine zipIdentifierEngine = new ZipIdentifierEngine();
                    zipIdentifierEngine.setRequestFactory(requestFactory);
                    zipIdentifier.setIdentifierEngine(zipIdentifierEngine);
                    zipIdentifier.process(request.getSourceInputStream(), containerResults);
                }
                catch (IOException e)
                {
                    e.printStackTrace();
                }
            }
            else
            {
                // Report the offending type string rather than the TriggerPuid object.
                throw new CommandExecutionException("Unknown container type: " + containerType);
            }
        }

        // Container hits only carry a PUID, so enrich each one with the full file format information.
        IdentificationResultCollection finalContainerResults = new IdentificationResultCollection(request);
        for (IdentificationResult r: containerResults.getResults())
        {
            FileFormat ff = SignatureIdentificator.getFileFormatForPuid(r.getPuid());
            finalContainerResults.addResult(new MetadataProviderDROID_IdentificationResult(r, ff));
        }

        return finalContainerResults;
    }

}

</code>
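
The fallback order used in getIdentificationResults() above (container matches first, then binary signature matches, then extension matches only as a last resort) can be shown in isolation. This is a minimal sketch with a hypothetical ResultSelector class and plain lists standing in for DROID's result collections; the extension lookup is passed as a Supplier so that, as in the class above, it only runs when the other two methods found nothing.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helper: picks the most specific non-empty result list.
final class ResultSelector {

    private ResultSelector() {
    }

    static <T> List<T> select(List<T> containerHits, List<T> signatureHits, Supplier<List<T>> extensionHits) {
        if (!containerHits.isEmpty()) return containerHits;
        if (!signatureHits.isEmpty()) return signatureHits;
        // Evaluated lazily: extension matching is the least reliable method.
        return extensionHits.get();
    }

    public static void main(String[] args) {
        List<String> none = Collections.emptyList();

        // A docx matches the generic ZIP signature, but the container hit wins.
        System.out.println(select(Arrays.asList("fmt/412"), Arrays.asList("x-fmt/263"), () -> none));

        // No container or signature hit: fall back to whatever the extension lookup returns.
        System.out.println(select(none, none, () -> Arrays.asList("extension-match")));
    }
}
```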


Dclipsham

Jan 16, 2013, 8:18:16 AM
to droid...@googlegroups.com
Denis,

Thank you for your post and for documenting your successful implementation. We are currently recruiting a new Digital Preservation Analyst Developer ( http://ig24.i-grasp.com/fe/tpl_nationalarchives01.asp?s=PyAxDIfSqHTyVvHqn&jobid=64413,9852215965 ) and the successful applicant will be tasked with driving the development of DROID. I hope that we'll be able to take your interface suggestions forward as I'm sure many would find this useful.

David