Magic Number Files

Eustacio Gadit

Jul 24, 2024, 7:43:27 PM
to elguaphara

This article gives an introduction to magic numbers and file headers: how to identify and extract a file based on its magic number, and how to corrupt and repair a file's magic number, in a Linux environment.

Most file formats place their signature in the first bytes of the file, but some file systems put the signature at an offset other than the beginning. For example, the ext2/ext3 file system has the bytes 0x53 and 0xEF at byte offsets 1080 and 1081 (the superblock magic 0xEF53 stored little-endian).
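
As a rough illustration, here is a minimal Python sketch of signature checking; the signature table is just a small sample, not an exhaustive list, and the script name is made up:

# check_magic.py -- minimal sketch; the signature list below is illustrative only
SIGNATURES = [
    # (offset, magic bytes, description)
    (0,    b"\x89PNG\r\n\x1a\n", "PNG image"),
    (0,    b"GIF89a",            "GIF image"),
    (0,    b"%PDF",              "PDF document"),
    (0,    b"PK\x03\x04",        "ZIP container"),
    (1080, b"\x53\xEF",          "ext2/ext3/ext4 superblock magic"),
]

def identify(path):
    matches = []
    with open(path, "rb") as f:
        for offset, magic, name in SIGNATURES:
            f.seek(offset)
            if f.read(len(magic)) == magic:
                matches.append(name)
    return matches

if __name__ == "__main__":
    import sys
    for p in sys.argv[1:]:
        print(p, "->", identify(p) or "unknown")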

I am designing a binary file format from scratch, and I would like to include some magic bytes at the beginning so that it can be identified easily. How do I go about choosing which bytes? I am not aware of any central registry of magic numbers, so is it just a matter of picking something fairly random that isn't already identified by, say, the file command on a nearby UNIX box?

Stay away from super-short magic numbers. Just because you're designing a binary format doesn't mean you can't use a text string as the identifier. Follow it with an EOF character, and as an added bonus people who cat or type your binary file won't get a mangled terminal.
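
A sketch of that idea in Python; the identifier "MYFMT1" and the layout are invented for illustration, borrowing PNG's trick of mixing a readable name with control bytes, including the DOS end-of-file character 0x1A so that DOS/Windows type stops before the binary payload:

# Hypothetical header; "MYFMT1" is an invented identifier, not a registered one.
# 0x89 makes the file clearly non-ASCII, "\r\n" catches line-ending corruption,
# and 0x1A (DOS EOF / SUB) stops "type" from dumping the binary payload.
MAGIC = b"\x89MYFMT1\r\n\x1a\n"

def write_file(path, payload: bytes):
    with open(path, "wb") as f:
        f.write(MAGIC)
        f.write(payload)

def is_my_format(path) -> bool:
    with open(path, "rb") as f:
        return f.read(len(MAGIC)) == MAGIC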

There is no universally correct way. Best practices can be suggested, but these are often situational. For example, if you're checking the integrity of volatile memory, which has an undefined initial state when power is applied, it may be beneficial to incorporate long runs of 0s or 1s (e.g. FFF0 00FF F000) that stand out against random noise.

If the file is mostly binary, a popular choice is an ASCII text identifier, which stands out among the binary data in a hex editor. For example, GIF uses GIF89a and FLAC uses fLaC. On the other hand, a plain-text identifier may be falsely detected in a random text file, so invalid/control characters are often incorporated as well.

In general, it does not matter that much what the bytes are; even a run of NULL bytes can be used for file detection. Ideally, though, you want the longest unique identifier you can afford, and at minimum 4 bytes: anything shorter will show up more often in random data, and the longer the identifier, the less likely it is ever to be detected as a false positive. Some known examples are as long as 40 bytes. In a way, it's like a password.

That said, a single file signature should not be the only line of defense. The actual parsing process itself should be able to verify integrity and weed out invalid files even if the signature matches. This can be done with additional file signatures, length-sensitive data, value/range checking, and, especially, hash/checksum values.
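
A sketch of that layered checking, continuing the invented "MYFMT1" example above (magic, a one-byte version, a four-byte payload length, the payload, and a trailing CRC-32); the version range is assumed for illustration:

import struct, zlib

MAGIC = b"\x89MYFMT1\r\n\x1a\n"          # invented identifier from the sketch above
SUPPORTED_VERSIONS = range(1, 3)          # assumption: versions 1 and 2 exist

def pack(version: int, payload: bytes) -> bytes:
    header = MAGIC + struct.pack(">BI", version, len(payload))
    return header + payload + struct.pack(">I", zlib.crc32(payload))

def unpack(blob: bytes) -> bytes:
    if blob[:len(MAGIC)] != MAGIC:
        raise ValueError("bad magic number")
    version, length = struct.unpack_from(">BI", blob, len(MAGIC))
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported version {version}")
    start = len(MAGIC) + 5                # 5 = one version byte + four length bytes
    payload = blob[start:start + length]
    if len(payload) != length:
        raise ValueError("truncated file")
    (crc,) = struct.unpack_from(">I", blob, start + length)
    if crc != zlib.crc32(payload):
        raise ValueError("checksum mismatch")
    return payload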

This table of file signatures (aka "magic numbers") is a continuing work-in-progress. I had found little information on this in a single place, with the exception of the table in Forensic Computing: A Practitioner's Guide by T. Sammes & B. Jenkinson (Springer, 2000); that was my inspiration to start this list in 2002. See also Wikipedia's List of file signatures. Comments, additions, and queries can be sent to Gary Kessler at g...@garykessler.net.

This list is not exhaustive, although I add new files as I find them or someone contributes signatures. Interpret the table as a one-way function: the magic number generally indicates the file type, whereas the file type does not always have the given magic number. If you want to know what a particular file extension refers to, check out File Extension Seeker: Metasearch engine for file extensions.

My software utility page contains a custom signature file based upon this list, for use with FTK, Scalpel, Simple Carver, Simple Carver Lite, and TrID. There is also a raw CSV file and JSON file of signatures.

Tim Coakley's Filesig.co.uk site, with Filesig Manager and Simple Carver. Also, see Tim's SQLite Database Catalog page, "a repository of information used to identify specific SQLite databases and properties for research purposes."

The National Archives' PRONOM site provides on-line information about data file formats and their supporting software products, as well as their multi-platform DROID (Digital Record Object Identification) software.

The following individuals have given me updates or suggestions for this list over the years: Devon Ackerman, Ansh Aggarwal, Nazim Aliyev, Justin Almanza, Marco Barbieri, Vladimir Benko, Arvin Bhatnagar, Jim Blackson, Keith Blackwell, Alex Boschma, Sam Brothers, David Burton, Alex Caithness, Erik Campeau, Björn Carlin, Tim Carver, Michael D Cavalier, Per Christensson, Oscar Choi, JMJ.Conseil, Jesse Cooper, Jesse Corwin, Mike Daniels, David DeBrota, Cornelis de Groot, Jeffrey Duggan, Tony Duncan, Jon Eldridge, Ehsan Elhampour, Jean-Pierre Fiset, Peter Almer Frederiksen, Tim Gardner, Mark Gonyea, Chris Griffith, Linda Grody, Andis Grosšteins, Paulo Guzmán, Rich Hanes, George Harpur, Brian High, Eric Huber, Alexander Hébert, John Hughes, Allan Jensen, Broadus Jones, Matthew Kelly, Axel Kesseler, Nick Khor, Shane King, Art Kocsis, Thiemo Kreuz, Bill Kuhns, Evgenii Kustov, Andreas Kyrmegalos, Glenn Larsson, Jeremy Lloyd, Anand Mani, Kevin Mansell, Nevena Marković, Davyd McColl, Par Osterberg Medina, Michal, Sergey Miklin, David Millard, Bruce Modick, Lee Nelson, Mart Oskamp, Dan P., Jorge Paulhiac, Carlo Politi, Seth Polley, Hedley Quintana, Anthony Rabon, Stanley Rainey, Cory Redfern, Bruce Robertson, Ben Roeder, Thomas Rösner, Gerd Röthig, Gaurav Sehgal, Andy Seitz, Anli Shundi, Erik Siers, Philip Smith, Mike Sutton, Matthias Sweertvaegher, Tobiasz Światlowski, Frank Thornton, Erik van de Burgwal, Øyvind Walding, Jason Wallace, Daniel Walton, Franklin Webber, Bernd Wechner, Douglas White, Mike Wilkinson, Gavin Williams, Sean Wolfinger, David Wright, Yuna, and Shaul Zevin. I thank them and apologize if I have missed anyone.

I would like to give particular thanks to Danny Mares of Mares and Company, author of the MaresWare Suite (primarily for the "subheaders" for many of the file types here), and the people at X-Ways Forensics for their permission to incorporate their lists of file signatures.

Finally, Dr. Nicole Beebe from The University of Texas at San Antonio posted samples of more than 32 file types at the Digital Corpora, which I used for verification and additional signatures. These files were used to develop the Sceadan File Type Classifier. The file samples can be downloaded from the Digital Corpora website.

All information on this page © 2002-present, Gary C. Kessler. Permission to use the material here is extended to any of this page's visitors, as long as appropriate attribution is provided and the information is not altered in any way without express written permission of the author.

The only thing I can find online suggests this is caused by compiling a .py file to a .pyc and then trying to use it with the wrong version of Python. In my case, however, the file seems to import fine sometimes but not others, and I'm not sure why.

If they are not yours, and the original py files are not provided, you'll have to either get the py files for re-compilation, or use an interpreter that can run the pyc files with that particular magic value.

One thing might explain the intermittent behaviour: the .pyc that's causing the problem may only be imported under certain conditions, since it's highly unlikely the same file would import successfully only some of the time. You should check the actual full stack trace when the import fails.

Open Calculator and switch its display mode to Programmer (Scientific in XP) to convert between hex and decimal. Select the "Hex" radio button and enter the first two bytes of the .pyc file in reverse order (second byte first, then the first byte), e.g. f303. Now click the "Dec" (Decimal) radio button. The value displayed corresponds to the magic number, i.e. the Python version that produced the file.
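
If you have a working Python 3 interpreter handy, you can also read the magic straight from the .pyc and compare it with what that interpreter expects. This sketch assumes Python 3.4+ (for importlib.util.MAGIC_NUMBER) and only looks at the first four bytes of the file:

import importlib.util, struct, sys

def pyc_magic(path):
    # The first 4 bytes of a .pyc are the magic: a little-endian
    # version number followed by b"\r\n".
    with open(path, "rb") as f:
        magic = f.read(4)
    version_number = struct.unpack("<H", magic[:2])[0]
    return magic, version_number

if __name__ == "__main__":
    magic, number = pyc_magic(sys.argv[1])
    print("file magic:  ", magic.hex(), "->", number)
    print("interpreter: ", importlib.util.MAGIC_NUMBER.hex(), "->",
          struct.unpack("<H", importlib.util.MAGIC_NUMBER[:2])[0])
    print("match:", magic == importlib.util.MAGIC_NUMBER)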

This can also be caused by a missing __init__.py file in the directory. Say you create a new directory in a Django project to split the unit tests into multiple files: you also have to create an __init__.py file alongside the other files in the newly created test directory, otherwise importing the tests can fail with a similar error.
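
A quick sketch of the fix; the "myapp/tests" path is a placeholder for whatever directory you created:

from pathlib import Path

# Placeholder layout: adjust "myapp/tests" to your actual test directory.
tests_dir = Path("myapp/tests")
tests_dir.mkdir(parents=True, exist_ok=True)
(tests_dir / "__init__.py").touch()   # makes the directory an importable package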

I had a strange case of the Bad Magic Number error using a very old (1.5.2) implementation. I generated a .pyo file and that triggered the error. Bizarrely, the problem was solved by changing the name of the module. The offending name was sms.py: if I generated an sms.pyo from that module, the Bad Magic Number error was the result. When I changed the name to smst.py, the error went away. I checked back and forth to see if sms.py somehow interfered with any other module with the same name, but I could not find any name collision. Even though the source of this problem remained a mystery to me, I recommend trying a module name change.

This can also happen if you have the wrong python27.dll file (on Windows); to solve this, just re-install (or extract) Python with the exact corresponding DLL version. I had a similar experience.

I just faced the same issue on Fedora 26, where many tools such as dnf were broken due to a bad magic number for six. For an unknown reason I had a file /usr/bin/six.pyc with an unexpected magic number; deleting this file fixed the problem.

I looked into some forums and understood that this could be a problem with the R version.
While saving the file, I did not do anything different: I simply used the Save button in the Environment pane and replaced the already existing file.
I tried readRDS() and readr, but no help.

Likely it's a file that got misnamed with the wrong extension. The magic number should start with "RDA2" or something like that; file-format magic numbers are traditionally the first bytes of a file. Seeing something that looks like a fragment of a column name leads one to believe that it's not an RData file but some plain text that begins with "Featu".

Is the first column of the data named "Feature"? It might be an issue of someone using write() instead of save() to serialize a data frame to disk. If that is the case, read.csv() or read.delim() would be appropriate.
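
Without touching R, you can peek at the first bytes to see what the file really is. This is only a sketch, under the assumption that save() writes a gzip-compressed stream by default (whose decompressed content starts with an RDA2/RDB2/RDX2-style magic), while write()/write.csv() output is plain text:

import gzip

def peek_rdata(path, n=8):
    with open(path, "rb") as f:
        head = f.read(n)
    if head.startswith(b"\x1f\x8b"):                # gzip container (save()'s default)
        with gzip.open(path, "rb") as g:
            return "gzip-compressed, inner magic: %r" % g.read(n)
    if head.startswith((b"RDA", b"RDB", b"RDX")):   # uncompressed RData magic
        return "uncompressed RData, magic: %r" % head
    return "not RData; starts with: %r" % head      # e.g. b'Feature,...' for a CSV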

I don't quite understand what caused this... (you wrote "It might be an issue of someone using write() instead of save() to serialize a data frame to disk"; I'll have to read up to understand this), but I saved the files like this (Save button in the Environment pane) before without any problem.
