Using struct to read binary files

mercado

Nov 26, 2009, 8:59:29 PM
to pytho...@python.org
Hello,

I am writing a Python program to read the Master Boot Record (MBR),
and I'm having trouble because I have no previous experience reading
binary files.

First off, I've written a binary file containing the MBR to disk using
the following command:
sudo dd if=/dev/sda of=/tmp/mbrcontent bs=1 count=512

Then I want to read the MBR file and parse out its information. For
example, I know that bytes 454-457 contain the address of the first
sector of the first partition, and bytes 458-461 contain the number of
sectors in the first partition (see
http://en.wikipedia.org/wiki/Master_boot_record for more information
on the structure of the MBR).

So far what I have is this:

--------------------------------------------------------------------------------
f = open("/tmp/mbrcontent", "rb")
contents = f.read()
f.close()

firstSectorAddress = contents[454:458]
numSectors = contents[458:462]
--------------------------------------------------------------------------------

On my machine, the bytes contained in firstSectorAddress are
"\x3F\x00\x00\x00", and the bytes contained in numSectors are
"\x20\x1F\x80\x01". I know from doing a pen and paper calculation
that the first sector address is 63, and the number of sectors is
25,173,792 (the numbers are stored in little-endian format).

How do I figure that out programmatically? I think that I can use
struct.unpack() to do this, but I'm not quite sure how to use it.

Can anybody help me out? Thanks in advance.

Gabriel Genellina

Nov 26, 2009, 10:22:37 PM
to pytho...@python.org
On Thu, 26 Nov 2009 22:59:29 -0300, mercado <python...@gmail.com>
wrote:

You're almost done:

py> import struct
py> firstSectorAddress = "\x3F\x00\x00\x00"
py> struct.unpack("<L", firstSectorAddress)
(63,)
py> struct.unpack("<L", "\x20\x1F\x80\x01")
(25173792,)
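
As a cross-check on the pen-and-paper figure from the original post, the little-endian decoding can also be spelled out byte by byte and compared with what struct returns. This is only a minimal sketch, written for Python 3 (where the byte strings need a `b` prefix); the helper name `le_to_int` is made up for the illustration.

```python
import struct

def le_to_int(data):
    """Sum each byte times 256**position, lowest position first (little-endian)."""
    total = 0
    for position, byte in enumerate(bytearray(data)):
        total += byte * 256 ** position
    return total

num_sectors = b"\x20\x1F\x80\x01"

# Same sum as on paper: 0x20 + 0x1F*256 + 0x80*65536 + 0x01*16777216
by_hand = le_to_int(num_sectors)

# struct.unpack always returns a tuple, so take element [0]
by_struct = struct.unpack("<L", num_sectors)[0]

print(by_hand, by_struct)
```

In the format string, `<` selects little-endian byte order and `L` an unsigned 32-bit integer, which is why both approaches should agree.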

--
Gabriel Genellina

Tim Chase

Nov 26, 2009, 10:22:47 PM
to mercado, pytho...@python.org
> --------------------------------------------------------------------------------
> f = open("/tmp/mbrcontent", "rb")
> contents = f.read()
> f.close()
>
> firstSectorAddress = contents[454:458]
> numSectors = contents[458:462]
> --------------------------------------------------------------------------------
>
> On my machine, the bytes contained in firstSectorAddress are
> "\x3F\x00\x00\x00", and the bytes contained in numSectors are
> "\x20\x1F\x80\x01". I know from doing a pen and paper calculation
> that the first sector address is 63, and the number of sectors is
> 25,173,792 (the numbers are stored in little-endian format).
>
> How do I figure that out programmatically? I think that I can use
> struct.unpack() to do this, but I'm not quite sure how to use it.

To pull out little-endian data, you can use struct.unpack() like this:

>>> struct.unpack('<L', firstSectorAddress)
(63,)

>>> struct.unpack('<L', numSectors)
(25173792,)
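
If the whole sector is already in memory, the slicing step can be skipped: struct.unpack_from() takes a byte offset, and a single format string can cover both fields. The following is a sketch for Python 3. It uses a synthetic 512-byte buffer in place of the real /tmp/mbrcontent so that it runs without a disk image; the name `fake_mbr` is made up for the illustration.

```python
import struct

# Stand-in for the contents of /tmp/mbrcontent: 512 zero bytes with the
# two values from the original post written at offset 454.
fake_mbr = bytearray(512)
struct.pack_into("<LL", fake_mbr, 454, 63, 25173792)

# "<LL" = little-endian, two unsigned 32-bit integers (8 bytes in total),
# read starting at byte 454 of the buffer.
first_sector, num_sectors = struct.unpack_from("<LL", fake_mbr, 454)

print(first_sector, num_sectors)
```

With a real image, the bytes returned by f.read() can be passed to struct.unpack_from() in the same way.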

It's a curious problem. I haven't toyed with boot sectors since my
ASM days back on a 286, so I threw together the following code,
which you're welcome to tear apart and remash as you see fit.

-tkc

########################################################
import struct

mbr = open('mbrcontent', 'rb').read()
partition_table = mbr[446:510]
signature = struct.unpack('<H', mbr[510:512])[0]
little_endian = (signature == 0xaa55) # should be True
print "Little endian:", little_endian
PART_FMT = (little_endian and '<' or '>') + (
    "B" # status (0x80 = bootable (active), 0x00 = non-bootable)
        # CHS of first block
    "B" # Head
    "B" # Sector is in bits 5-0; bits 9-8 of cylinder are in bits 7-6
    "B" # bits 7-0 of cylinder
    "B" # partition type
        # CHS of last block
    "B" # Head
    "B" # Sector is in bits 5-0; bits 9-8 of cylinder are in bits 7-6
    "B" # bits 7-0 of cylinder
    "L" # LBA of first sector in the partition
    "L" # number of blocks in partition, in little-endian format
    )
PART_SIZE = 16
fmt_size = struct.calcsize(PART_FMT)
# sanity check expectations
assert fmt_size == PART_SIZE, \
    "Partition format string is %i bytes, not %i" % (
    fmt_size, PART_SIZE)

def cyl_sector(sector_cyl, cylinder7_0):
    sector = sector_cyl & 0x3F # bits 5-0

    # bits 7-6 of sector_cyl contain bits 9-8 of the cylinder
    cyl_high = (sector_cyl >> 6) & 0x03
    cyl = (cyl_high << 8) | cylinder7_0
    # (cyl, sector), matching the Cyl/Sect labels printed below
    return cyl, sector

for partition in range(4):
    print "Partition #%i" % partition,
    offset = PART_SIZE * partition
    (
        status,
        start_head, start_sector_cyl, start_cyl7_0,
        part_type,
        end_head, end_sector_cyl, end_cyl7_0,
        lba,
        blocks
        ) = struct.unpack(
            PART_FMT,
            partition_table[offset:offset + PART_SIZE]
            )
    if status == 0x80:
        print "Bootable",
    elif status:
        print "Unknown status [%s]" % hex(status),
    print "Type=0x%x" % part_type
    start = (start_head,) + cyl_sector(
        start_sector_cyl, start_cyl7_0)
    end = (end_head,) + cyl_sector(
        end_sector_cyl, end_cyl7_0)
    print " (Start: Heads:%i\tCyl:%i\tSect:%i)" % start
    print " (End: Heads:%i\tCyl:%i\tSect:%i)" % end
    print " LBA:", lba
    print " Blocks:", blocks
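
The CHS packing described in the comments above (sector in bits 5-0 of the second byte, cylinder bits 9-8 in bits 7-6 of that byte, cylinder bits 7-0 in the third byte) can be tried in isolation. This is a minimal Python 3 sketch under that layout; the function name `decode_chs` and the sample byte values are made up for the illustration and do not come from a real disk.

```python
def decode_chs(head, sector_cyl, cyl_low):
    """Return (cylinder, head, sector) from the three raw CHS bytes."""
    sector = sector_cyl & 0x3F            # bits 5-0 hold the sector
    cyl_high = (sector_cyl >> 6) & 0x03   # bits 7-6 hold cylinder bits 9-8
    cylinder = (cyl_high << 8) | cyl_low  # 10-bit cylinder number
    return cylinder, head, sector

# Hypothetical raw bytes: head 0xFE, packed sector/cylinder byte 0xFF,
# low cylinder byte 0xFF.
print(decode_chs(0xFE, 0xFF, 0xFF))
```

Because the cylinder has only 10 bits and the sector only 6, the largest values this encoding can express are cylinder 1023 and sector 63, which is what the all-ones sample bytes decode to.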

mercado

Nov 27, 2009, 9:39:24 AM
to pytho...@python.org
Thanks Tim and Gabriel! That was exactly what I needed.