
detect endianness of a binary with python


Holger brunck

Jul 21, 2010, 8:57:28 AM
to pytho...@python.org
Hi all,
I use Python 2.5 and I am looking for a way to determine a file type.
In particular, I need the endianness of a file. Is there a way to detect
this easily in Python? Something like the "file" utility for Linux would be very
helpful.

Any help is appreciated.

Best regards
Holger Brunck

Grant Edwards

Jul 21, 2010, 10:02:16 AM
On 2010-07-21, Holger brunck <holger...@keymile.com> wrote:

> I use python 2.5 and I am looking for a possibility to determine a
> file type. Especially the endianness of a file is needed for me. Is
> there a way to detect this easily in python?

Only if you already know what's going to be in the file.

> Something like the "file" utility for linux would be very helpful.
>
> Any help is appreciated.

You're going to have to describe in detail what's in the file before
anybody can help.

--
Grant Edwards (grant.b.edwards at gmail.com)
Yow! A shapely CATHOLIC SCHOOLGIRL is FIDGETING inside my costume..

Michael Torrie

Jul 21, 2010, 10:44:57 AM
to pytho...@python.org
On 07/21/2010 08:02 AM, Grant Edwards wrote:
> On 2010-07-21, Holger brunck <holger...@keymile.com> wrote:
>
>> I use python 2.5 and I am looking for a possibility to determine a
>> file type. Especially the endianness of a file is needed for me. Is
>> there a way to detect this easily in python?
>
> Only if you already know what's going to be in the file.
>
>> Something like the "file" utility for linux would be very helpful.
>>
>> Any help is appreciated.
>
> You're going to have to describe in detail what's in the file before
> anybody can help.

There is a Python module called "magic" that uses the same engine as
file to determine a file type. It's part of the "file" source code:

http://www.darwinsys.com/file/

On Fedora I can just run "yum install python-magic" to get it.

Holger brunck

Jul 21, 2010, 11:29:14 AM
to pytho...@python.org

>> Something like the "file" utility for linux would be very helpful.
>>
>> Any help is appreciated.

>You're going to have to describe in detail what's in the file before
>anybody can help.

As part of our build system for an embedded system, we create a cramfs filesystem
image. Later in the build process we have to check its endianness,
because it could be little endian or big endian (ARM or PPC).

The output of the "file" tool for a little-endian cramfs image is:
<ourImage>: Linux Compressed ROM File System data, little endian size 1875968
version #2 sorted_dirs CRC 0x8721dfc0, edition 0, 462 blocks, 10 files

It would be possible to execute
ret = os.system("file <ourImage> | grep 'little endian'")
and evaluate the return code.
But I don't like evaluating a piped system command. If there is a way to do this
without using the os.system command, that would be great.

Best regards
Holger

Thomas Jollans

Jul 21, 2010, 3:06:38 PM
to pytho...@python.org

Files don't, as such, have a detectable endianness. 0x23 0x41 could mean
either 0x4123 or 0x2341 - there's no way of knowing.

The "file" utility also doesn't really know about endianness (well,
maybe it does swap bytes here and there, but that's an implementation
detail) - it just knows about file types. It knows what a little-endian
cramfs image looks like, and what a big-endian cramfs image looks like.
And as they're different, it can tell them apart.

If you're only interested in a couple of file types, it shouldn't be too
difficult to read the first few bytes/words with the struct module and
apply your own heuristics. Open the files in question in a hex editor
and try to figure out how to tell them apart!
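
For the cramfs images discussed in this thread, such a heuristic might look
like the sketch below. It assumes the image starts with the cramfs superblock
magic number 0x28cd3d45, stored in the image's own byte order (the value used
in the Linux cramfs headers, worth confirming in a hex editor first). The
function name cramfs_endianness is made up for illustration, and images with
padding in front of the superblock are not handled.

```python
import struct

# Assumption: cramfs superblock magic as defined in the Linux cramfs headers.
CRAMFS_MAGIC = 0x28CD3D45


def cramfs_endianness(path):
    """Return 'little', 'big', or None if the magic number is not found."""
    f = open(path, "rb")
    try:
        header = f.read(4)
    finally:
        f.close()
    if len(header) < 4:
        return None
    # Interpret the same four bytes both ways and see which one matches.
    if struct.unpack("<I", header)[0] == CRAMFS_MAGIC:
        return "little"
    if struct.unpack(">I", header)[0] == CRAMFS_MAGIC:
        return "big"
    return None
```

The try/finally form is used instead of a "with" block so that the sketch
should also work on Python 2.5.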

MRAB

Jul 21, 2010, 3:54:34 PM
to pytho...@python.org

If you have control over the file format then you could ensure that
there's a double-byte value such as 0xFF00 at a certain offset. That
will tell you the endianness of the file.
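
A rough sketch of that idea, assuming the marker sits at the very start of the
file (the offset and the helper names write_marker and read_byte_order are
only illustrative, and the b"..." literals need Python 2.6 or later):

```python
import struct

MARKER = 0xFF00  # the example value mentioned above


def write_marker(f, big_endian):
    """Write the two-byte marker in the file's own byte order."""
    f.write(struct.pack(">H" if big_endian else "<H", MARKER))


def read_byte_order(f):
    """Return '>' or '<' (struct byte-order prefixes) based on the marker."""
    raw = f.read(2)
    if raw == b"\xff\x00":
        return ">"  # most significant byte stored first
    if raw == b"\x00\xff":
        return "<"  # least significant byte stored first
    raise ValueError("byte-order marker not found")
```

The returned prefix can then be reused for the remaining fields, for example
struct.unpack(order + "I", f.read(4)).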

Grant Edwards

Jul 21, 2010, 10:31:54 PM
On 2010-07-21, Thomas Jollans <tho...@jollans.com> wrote:

>> It would be possible to execute ret = os.system("file <ourImage> |
>> grep 'little endian'") and evaluate the return code. But I don't like
>> to evaluate a piped system command. If there is a way without using
>> the os.system command this would be great.
>
> Files don't, as such, have a detectable endianness. 0x23 0x41 could mean
> either 0x4123 or 0x2341 - there's no way of knowing.
>
> The "file" utility also doesn't really know about endianness (well,
> maybe it does swap bytes here and there, but that's an implementation
> detail) - it just knows about file types. It knows what a little-endian
> cramfs image looks like, and what a big-endian cramfs image looks like.
> And as they're different, it can tell them apart.
>
> If you're only interested in a couple of file types, it shouldn't be too
> difficult to read the first few bytes/words with the struct module and
> apply your own heuristics. Open the files in question in a hex editor
> and try to figure out how to tell them apart!

And by looking at the rules that "file" uses for the two file types
that matter, one should be able to figure out how to implement
something in Python. Or one can use the Python "magic" module as
previously suggested: http://pypi.python.org/pypi/python-magic/
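
A minimal sketch of the second option, assuming the PyPI python-magic package,
whose magic.from_file() returns the same kind of description string that
"file" prints (the "magic" module bundled with the file sources has a
different interface). The helper names are made up for illustration.

```python
def endianness_from_description(description):
    """Pick the byte order out of a file-style description string."""
    text = description.lower()
    if "little endian" in text:
        return "little"
    if "big endian" in text:
        return "big"
    return None


def image_endianness(path):
    # Imported here so the parsing helper works without python-magic installed.
    import magic  # assumption: the PyPI "python-magic" package
    return endianness_from_description(magic.from_file(path))
```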

--
Grant

Daniel Fetchinson

Jul 22, 2010, 5:38:23 AM
to Python
>>> Something like the "file" utility for linux would be very helpful.
>>>
>>> Any help is appreciated.
>
>>You're going to have to describe in detail what's in the file before
>>anybody can help.
>
> We are creating inside our buildsystem for an embedded system a cram
> filesystem
> image. Later on inside our build process we have to check the endianness,
> because it could be Little Endian or big endian (arm or ppc).
>
> The output of the "file" tool is for a little endian cramfs image:
> <ourImage>: Linux Compressed ROM File System data, little endian size
> 1875968
> version #2 sorted_dirs CRC 0x8721dfc0, edition 0, 462 blocks, 10 files
>
> It would be possible to execute
> ret = os.system("file <ourImage> | grep 'little endian'")
> and evaluate the return code.
> But I don't like to evaluate a piped system command. If there is a way
> without
> using the os.system command this would be great.
>

Please see http://pypi.python.org/pypi/python-magic

HTH,
Daniel


--
Psss, psss, put it down! - http://www.cafepress.com/putitdown

Tim Roberts

Jul 23, 2010, 1:44:55 AM
Holger brunck <holger...@keymile.com> wrote:
>
>We are creating inside our buildsystem for an embedded system a cram filesystem
>image. Later on inside our build process we have to check the endianness,
>because it could be Little Endian or big endian (arm or ppc).
>
>The output of the "file" tool is for a little endian cramfs image:
><ourImage>: Linux Compressed ROM File System data, little endian size 1875968
>version #2 sorted_dirs CRC 0x8721dfc0, edition 0, 462 blocks, 10 files
>
>It would be possible to execute
>ret = os.system("file <ourImage> | grep 'little endian'")
>and evaluate the return code.

I wouldn't use os.system with grep and evaluate the return code. Instead
I'd use subprocess.Popen("file <ourImage>") and read the text output of the
command directly. By parsing that string, I can extract all kinds of
interesting information.

That is an entirely Unix-like way of doing things. Don't reinvent the
wheel when there's a tool that already does what you want.
--
Tim Roberts, ti...@probo.com
Providenza & Boekelheide, Inc.

Robert Kern

Jul 23, 2010, 11:44:51 AM
to pytho...@python.org
On 7/23/10 12:44 AM, Tim Roberts wrote:
> I wouldn't use os.system with grep and evaluate the return code. Instead
> I'd use subprocess.Popen("file <ourImage>") and read the text output of the
> command directly. By parsing that string, I can extract all kinds of
> interesting information.

Small correction: subprocess.Popen(["file", our_image_filename])

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
