Wildcards (?? * {n}) not being matched in DROID container signatures


ross-spencer

Jul 24, 2014, 11:16:34 PM
to droid...@googlegroups.com
Dear Droid-list,

I have been doing some work on container signatures with a view to getting a number of formats added to PRONOM.

I am working on Thumbs.db files at present and, early in this research, have found with reasonable confidence that the wildcard (regular expression) matching in container signatures is faulty.

Thumbs.db files are OLE2 based objects. As such, they can be sent through to the container recognition mechanism in DROID 6.1.3.

Their structure is:

Thumbs.db
   - 1
   - 2..n (images stored in Thumbs.db)
   - Catalog

The numbered files and 'Catalog' sit at the root level, and there are no additional files or folders. These files do not have an extension.  

I am able to get a simple container signature working by just looking for 'Catalog'. Further, I can get it to match a basic byte sequence in the object: 10 00 07 00

However, I can't get it to match longer, more variable byte sequences using various regular expression components. For example:

10 00 07 00 ?? 00 
10 00 07 00 {1} 00 
10 00 07 00 * 00 

None of these match.

The wildcard byte stands in for the number of image files stored inside the Thumbs.db file. For the moment I can't verify which additional bytes it will use for larger numbers.

For now I can use the workaround:

10 00 07 00 [00:FF] 00 

Or just truncate the signature. However, my final Thumbs.db signature is likely to be much longer, and also use a second file within the object to strengthen it. 
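
To make the expected behaviour concrete, here is a rough Python sketch of how I would expect the expressions above to behave. It is not DROID's implementation: the function name `signature_to_regex` is hypothetical, it handles only the handful of tokens used in this post, and the sample bytes (with 05 standing in for the image count) are made up for illustration.

```python
import re

# Token shapes used in this post (a small subset of the PRONOM syntax).
HEX_BYTE = re.compile(r"[0-9A-Fa-f]{2}")
FIXED_GAP = re.compile(r"\{(\d+)\}")
BYTE_RANGE = re.compile(r"\[([0-9A-Fa-f]{2}):([0-9A-Fa-f]{2})\]")


def signature_to_regex(signature):
    """Translate a space-separated signature into a compiled bytes regex.

    Hypothetical helper for illustration only; not part of DROID.
    """
    parts = []
    for token in signature.split():
        gap = FIXED_GAP.fullmatch(token)
        rng = BYTE_RANGE.fullmatch(token)
        if token == "??":
            parts.append(b".")  # exactly one byte of any value
        elif token == "*":
            parts.append(b".*?")  # any number of bytes, including none
        elif gap:
            parts.append(b".{%d}" % int(gap.group(1)))  # exactly n bytes
        elif rng:
            low, high = (int(g, 16) for g in rng.groups())
            parts.append(rb"[\x%02x-\x%02x]" % (low, high))  # one byte in range
        elif HEX_BYTE.fullmatch(token):
            parts.append(rb"\x%02x" % int(token, 16))  # literal byte
        else:
            raise ValueError("unsupported token: " + token)
    # DOTALL so that "." also matches the byte 0x0A.
    return re.compile(b"".join(parts), re.DOTALL)


# Made-up sample: the fixed prefix, a count byte of 05, then 00.
sample = bytes.fromhex("10 00 07 00 05 00")
for sig in ("10 00 07 00 ?? 00",
            "10 00 07 00 {1} 00",
            "10 00 07 00 * 00",
            "10 00 07 00 [00:FF] 00"):
    print(sig, "->", bool(signature_to_regex(sig).match(sample)))
```

Under these assumptions all four expressions match the sample, which is the behaviour I expected from the container signatures as well.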

Attached are the three signature files exhibiting the problem and the two working ones. I've also included a bare-bones signature file containing OLE2 and the reference information for Thumbs.db. 

I've also attached one of a sample of Thumbs.db files that I have been testing with. 

I'll create a corresponding GitHub issue for this; GitHub doesn't allow for file attachments. 

Please let me know if I can provide further information or help with testing. 

Cheers. 

Ross
thumbs.db-signature-files.zip
thumbs-for-tna.db

Lehane, Richard

Jul 24, 2014, 11:48:31 PM
to droid...@googlegroups.com

Hi Ross and all

I’ve been messing with Thumbs.db files a bit lately and not all of them have that Catalog object, e.g. the file attached. Perhaps different versions of Windows produce different flavours of thumbs?

In these ones, the image objects have names that look a bit like GUIDs rather than incrementing integers.

An alternative strategy for a container signature might be to look for the JFIF signature in the image objects. In all the thumbs files I’ve dug into, this seems to start at either offset 12 or offset 24. You can tell which by looking at the first four bytes in the file: an unsigned int that holds one of those two offsets (i.e. 0x0000000C or 0x00000018).
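
A rough sketch of that check in Python, in case it helps with testing. The function name `thumbnail_jpeg_offset` is hypothetical, the little-endian byte order is an assumption (I have not confirmed it here), and the check only looks for the JPEG start-of-image marker FF D8 rather than the full JFIF header.

```python
import struct

JPEG_SOI = b"\xff\xd8"    # JPEG start-of-image marker
KNOWN_OFFSETS = (12, 24)  # the two offsets seen in the samples so far


def thumbnail_jpeg_offset(stream):
    """Return the offset of the embedded JPEG in an image object, or None.

    Hypothetical helper: reads the first four bytes as an unsigned int
    (assumed little-endian) and checks for the JPEG marker at that offset.
    """
    if len(stream) < 4:
        return None
    (offset,) = struct.unpack_from("<I", stream, 0)
    if offset not in KNOWN_OFFSETS:
        return None
    if stream[offset:offset + 2] != JPEG_SOI:
        return None
    return offset


# Made-up image object: a 12-byte header followed by the start of a JPEG.
fake_object = struct.pack("<I", 12) + b"\x00" * 8 + b"\xff\xd8\xff\xe0"
print(thumbnail_jpeg_offset(fake_object))
```

The `stream` argument would be the raw bytes of one image object extracted from the OLE2 container; extracting it is left out of this sketch.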

Cheers

Richard

--
You received this message because you are subscribed to the Google Groups "droid-list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to droid-list+...@googlegroups.com.
To post to this group, send email to droid...@googlegroups.com.
Visit this group at http://groups.google.com/group/droid-list.
For more options, visit https://groups.google.com/d/optout.

______________________________________________________________________
This email has been scanned by the Symantec Email Security.cloud service.
For more information please visit http://www.symanteccloud.com
______________________________________________________________________


Thumbs.db

ross-spencer

Jul 25, 2014, 12:08:18 AM
to droid...@googlegroups.com
Hi Richard,

I'm planning on starting a different thread for Thumbs.db as a potential identification, documenting some of my thinking. If you're OK with this, I can include your notes and possibly create a development signature alongside my own to ask people to test. I posted this as soon as I could to pass on the regular expression fault to the DROID team asap and get some feedback on it. I'm also hoping to develop another OLE2 signature for Serif PagePlus files. This is really excellent information though, and I think it should be fairly easy to develop a test signature for it, although without a single consistent file to hook into it might be difficult to capture all flavours... 

Cheers,

Ross

