DROID not identifying EML files

tol...@gmail.com

Oct 14, 2014, 8:58:15 AM
to droid...@googlegroups.com
Hi all,
 
I recently ran DROID on a sample of EML files (i.e. files exported from Windows Mail on Vista). DROID correctly lists the extension as EML, but the "Format", "MIME type" and "PUID" fields are all blank, which gives me the impression that these files are not true "EML" files. I am able to open the files in MS Outlook 2013.

Any ideas of why this might be occurring?

Thanks,
 
Heather Tompkins

Lehane, Richard

Oct 14, 2014, 5:18:52 PM
to droid...@googlegroups.com
Hi Heather
The two byte-sequence signatures for EML (http://apps.nationalarchives.gov.uk/PRONOM/fmt/278) are pretty specific and probably don't cover your examples (there is a note on the signature page saying: "Signature may be too prescriptive and will need improving with more eml format testing.")

The two signatures PRONOM currently has are:
"X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3350"
and
"X-Converted-By: Emailchemy ??"

You could open your EML files in a text editor to check whether they have an X header that is different (or if you can share any of these EMLs, send them to the list). If so, it may mean a PRONOM update.
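Checking the X headers by hand works for a few files; for a larger sample the same check can be scripted. A minimal Python sketch, assuming the files are ordinary RFC 5322 messages on disk; the `x_headers` helper and the sample message are hypothetical illustrations, not part of DROID or PRONOM:

```python
import sys
from email.parser import BytesHeaderParser


def x_headers(raw: bytes) -> list[tuple[str, str]]:
    """Return (name, value) pairs for every top-level X- header in an EML message."""
    # Only the header block is parsed; the body is left undecoded.
    msg = BytesHeaderParser().parsebytes(raw)
    return [(name, str(value)) for name, value in msg.items() if name.lower().startswith("x-")]


# Hypothetical sample carrying the header version Heather reports later in the thread.
SAMPLE = (
    b"From: someone@example.com\r\n"
    b"To: other@example.com\r\n"
    b"X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157\r\n"
    b"MIME-Version: 1.0\r\n"
    b"\r\n"
    b"Body text\r\n"
)

if __name__ == "__main__":
    paths = sys.argv[1:]
    if not paths:
        # No files given: show the headers found in the built-in sample.
        for name, value in x_headers(SAMPLE):
            print(f"{name}: {value}")
    for path in paths:
        with open(path, "rb") as fh:
            for name, value in x_headers(fh.read()):
                print(f"{path}\t{name}: {value}")
```

Passing a list of .eml paths prints each file's X headers, which makes it easy to spot version strings that differ from the ones in the PRONOM signature.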

cheers
Richard



From: droid...@googlegroups.com [droid...@googlegroups.com] on behalf of tol...@gmail.com [tol...@gmail.com]
Sent: Tuesday, 14 October 2014 11:58 PM
To: droid...@googlegroups.com
Subject: DROID not identifying EML files

--
You received this message because you are subscribed to the Google Groups "droid-list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to droid-list+...@googlegroups.com.
To post to this group, send email to droid...@googlegroups.com.
Visit this group at http://groups.google.com/group/droid-list.
For more options, visit https://groups.google.com/d/optout.

______________________________________________________________________
This email has been scanned by the Symantec Email Security.cloud service.
For more information please visit http://www.symanteccloud.com
______________________________________________________________________

richard....@nationalarchives.gsi.gov.uk

Oct 15, 2014, 9:25:45 AM
to droid...@googlegroups.com
Hello Heather
 
As Richard says, your files vary from the signatures we have in PRONOM at the moment. Do send us some examples, if possible, at PRO...@nationalarchives.gsi.gov.uk for us to take a look at; we may be able to adjust the existing signature if we can see another variant of the format.
 
Thanks, Richard

tol...@gmail.com

Oct 20, 2014, 10:12:47 AM
to droid...@googlegroups.com
Hi Richard,
 
Thanks for the info.  I opened one of the EML files in Notepad and got the following:
 
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157
 
So perhaps it is not the right MIME type for EML? 
 
I will look into sharing the files (the files are a test sample from a donor so I will need to obtain permission first).
 
Thanks,
 
Heather

tol...@gmail.com

Oct 20, 2014, 10:13:48 AM
to droid...@googlegroups.com
Hi Richard,
 
I will look into sharing the files and, if possible, send them to the address you specified so that PRONOM can be updated.
 
Thanks,
 
Heather

Tyler Thorsted

Jul 21, 2016, 10:32:03 AM
to droid-list
Hi, I would like to bring this topic back to the front for discussion. It looks like the PRONOM signature is still the original from 2010 and has not been updated with new variations. 

We see most of our EML files identified as HTML fmt/99 documents. 

Thanks.

Tyler

Dclipsham

Jul 22, 2016, 4:34:29 AM
to droid-list
Hi Tyler,

Are you able to provide samples to pro...@nationalarchives.gsi.gov.uk? The usual caveats apply: we will not share or use them beyond the PRONOM team here at The National Archives, and will delete them as soon as we are finished with them. Based on Heather's observations, I'm inclined to drop at least the revision number, and possibly also the minor version number (assuming that is what they represent), from the X header, but I could do with samples to show this would suffice.

David

Dclipsham

Jul 22, 2016, 7:05:40 AM
to droid-list
I attach a sample set of 5 .eml files from the following scenarios:
2x Outlook Express 6 introductory emails
Mozilla Thunderbird email without a subject
Gmail email plain text (a copy of the email I received when I posted the above)
Gmail email with attachment

Based on the above, the following fields are consistently present, though not always in the same order:
MIME-Version: 1.0
To: 
From: 
Date: 
Content-Type: 
Content-Transfer-Encoding: 


Additionally, we received an email in the PRONOM inbox in June from Jay Gattuso, as follows:
----

Hi PRONOM,

I have another new signature for you.

I was poking around with some .eml files and found that the current signature (fmt/278) is not very good. Out of 789 files, I see 1 match in DROID as is; the comments in the signature section hint at the same.

I made another signature that might be useful. This would be a new format.

Extension is .eml
Binary sig is “4d494d452d56657273696f6e3a20312e30”
Name: MIME email version 1.0

Offset is tough to nail down, but I can see it occurs between 393 and 2236 bytes from BOF in the small sample set I have.

My sample set is 789 eml files.

Running the set through DROID gives me only one (1) match to the current fmt/278 signature.

Interestingly, if I change the signature to a less specific string I get a few more (24).

The shortened sig is the hex version of “X-MimeOLE: Produced By Microsoft MimeOLE V6.00” rather than the existing “X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3350”.

The hit count goes up to 54 if it's reduced to “X-MimeOLE:”.

I have attached the two XML signature snippets.

Using my proposed signature I see 515 matches.

A sample message can be found here: http://www.phpclasses.org/browse/file/14672.html

MS cover it here: https://support.microsoft.com/en-us/kb/836555

“Q5: What is a MIME version header?

A5: The MIME version header field denotes a MIME formatted message. Messages that are sent from earlier software that do not support MIME do not have this field. Mail clients use the absence of this field to distinguish non-MIME messages.”

Is this enough to go on?

Cheers,

J

----
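Jay's binary signature is the ASCII encoding of the `MIME-Version: 1.0` header line, and the variable offset he describes (393 to 2236 bytes in his sample) is simply where that line falls in each file's header block. A minimal Python sketch to confirm both; the `sig_offset` helper and the demo header are hypothetical illustrations, not DROID's matcher:

```python
# Jay's proposed byte signature, exactly as given in hex in his email.
MIME_VERSION_SIG = bytes.fromhex("4d494d452d56657273696f6e3a20312e30")


def sig_offset(data: bytes, sig: bytes = MIME_VERSION_SIG) -> int:
    """Return the offset of the first occurrence of sig from BOF, or -1 if absent."""
    return data.find(sig)


if __name__ == "__main__":
    print(MIME_VERSION_SIG)  # b'MIME-Version: 1.0'
    # Hypothetical header block: the signature starts after one 30-byte header line.
    demo = b"Return-Path: <a@example.com>\r\nMIME-Version: 1.0\r\n"
    print(sig_offset(demo))  # 30
```

In real messages, transport headers such as Received: lines usually precede the MIME-Version line, which is why a fixed offset does not work.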
I agree with Jay that the existing .eml signatures are not good. I like the suggestion of using 'MIME-Version: 1.0' as a new entry, but would like to strengthen it, so I'm proposing using all the consistent fields as a multi-BOF signature (we've historically tried to avoid this where possible, but it seems to be the best approach here). I'd give a fairly arbitrary maximum offset of 16k from BOF for each field, both to tie the identification relatively near the start of the files and to account for the layout variations seen.
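The multi-field idea can be approximated outside DROID to see how it behaves: every field must appear somewhere in the first 16k bytes, in any order. A minimal Python sketch, assuming "16k" means 16 KiB and using plain substring tests as a stand-in for DROID's byte-sequence matching; `matches_all_fields` and the sample header lines are hypothetical:

```python
WINDOW = 16 * 1024  # assumption: "16k" read as 16 KiB from BOF

# Fields listed above as consistently present (in varying order) in the samples.
EML_FIELDS = (
    b"MIME-Version: 1.0",
    b"To: ",
    b"From: ",
    b"Date: ",
    b"Content-Type: ",
    b"Content-Transfer-Encoding: ",
)

# Hypothetical header lines; the last one is Content-Transfer-Encoding.
SAMPLE_LINES = [
    b"From: a@example.com",
    b"To: b@example.com",
    b"Date: Fri, 22 Jul 2016 12:00:00 +0100",
    b"MIME-Version: 1.0",
    b"Content-Type: text/plain",
    b"Content-Transfer-Encoding: 7bit",
]


def matches_all_fields(data: bytes, fields=EML_FIELDS, window: int = WINDOW) -> bool:
    """True if every field occurs, in any order, entirely within the first `window` bytes."""
    head = data[:window]
    return all(field in head for field in fields)


if __name__ == "__main__":
    print(matches_all_fields(b"\r\n".join(SAMPLE_LINES)))            # True
    print(matches_all_fields(b"\r\n".join(reversed(SAMPLE_LINES))))  # True: order is irrelevant
    print(matches_all_fields(b"\r\n".join(SAMPLE_LINES[:-1])))       # False: no Content-Transfer-Encoding
```

Like a byte-sequence signature, a substring test also counts `To: ` inside `Reply-To: `, so this is a presence check rather than a header parse. The last case shows why a message that omits one field (such as the Apple Mail sample mentioned below) fails a signature that requires it.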

The resulting signature file is attached, so Tyler, Heather, Jay and anybody else interested, please do test with your collections and report back your findings.

I'd also relax "X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3350" to "X-MimeOLE: Produced By Microsoft MimeOLE V6.00" in the existing signature.
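Relaxing the string to a shorter prefix means any MimeOLE 6.00 build number matches, including the 6.00.2900.6157 header Heather found, which the current string misses. This is the same effect behind Jay's rising hit counts (1, then 24, then 54). A minimal Python sketch over the two header lines quoted in this thread; `hits` is a hypothetical helper, not a DROID function:

```python
EXISTING_SIG = b"X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3350"
RELAXED_SIG = b"X-MimeOLE: Produced By Microsoft MimeOLE V6.00"

# Header lines quoted in this thread: the one fmt/278 expects, and the one Heather found.
HEADERS = [
    b"X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3350",
    b"X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157",
]


def hits(sig: bytes, samples) -> int:
    """Count samples containing sig anywhere (a stand-in for a variable-offset match)."""
    return sum(sig in s for s in samples)


if __name__ == "__main__":
    print(hits(EXISTING_SIG, HEADERS))  # 1
    print(hits(RELAXED_SIG, HEADERS))   # 2
```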


David


EML.zip
DROID_SignatureEMLTESTFile_V104.xml

Dclipsham

Jul 22, 2016, 9:59:13 AM
to droid-list
Tyler has forwarded me a sample generated by Apple Mail that did not contain the 'Content-Transfer-Encoding: ' field, so a revised signature excluding this field is attached.
DROID_SignatureEMLTESTFile_V104v2.xml

Dclipsham

Jul 26, 2016, 2:06:20 PM
to droid-list
I've further refined the signature to account for a MIME/Mime capitalisation variation. The hard deadline for the release is today, so I'm running with this signature for tomorrow's signature release. There is scope to use Content-Transfer-Encoding to be more specific with identification around character encoding, differentiating, for example, between UTF-8, ANSI 1252 etc., so I request examples where this field is used (not entire email messages, just the parameter given where the Content-Transfer-Encoding header field is employed).
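Both points can be sketched in Python: matching either capitalisation of the MIME-Version line, and collecting the per-part encoding headers that examples would reveal. Note that in MIME (RFC 2045) Content-Transfer-Encoding takes values such as 7bit, 8bit, quoted-printable or base64, while the character set is a `charset` parameter of Content-Type, so the sketch reports both. The regex, helper names and sample message are hypothetical, and the pattern deliberately covers only the two capitalisations mentioned:

```python
import re
from email.parser import BytesParser
from typing import Optional

# Only the two capitalisations discussed above; other casings are not matched.
MIME_VERSION_RE = re.compile(rb"(?:MIME|Mime)-Version: 1\.0")


def has_mime_version(data: bytes, window: int = 16 * 1024) -> bool:
    """True if either capitalisation of the header occurs in the first `window` bytes."""
    return MIME_VERSION_RE.search(data[:window]) is not None


def encodings(data: bytes) -> list[tuple[str, Optional[str], Optional[str]]]:
    """List (content type, Content-Transfer-Encoding, charset) for each MIME part."""
    msg = BytesParser().parsebytes(data)
    return [
        (part.get_content_type(), part["Content-Transfer-Encoding"], part.get_content_charset())
        for part in msg.walk()
    ]


# Hypothetical single-part message using the "Mime" capitalisation.
SAMPLE = (
    b"Mime-Version: 1.0\r\n"
    b'Content-Type: text/plain; charset="windows-1252"\r\n'
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"Caf=E9\r\n"
)

if __name__ == "__main__":
    print(has_mime_version(SAMPLE))  # True
    print(encodings(SAMPLE))
```

For a multipart message, `walk()` also yields the container part, whose transfer encoding and charset are usually absent (`None`).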