
Using awk to parse binary files: help me please.


Christian Ferrari

Apr 17, 2001, 6:06:19 AM
Has anyone used awk to parse binary files?

The problem: reading fixed-length records (for example, 80 characters) with
no record separator.

Thank you in advance.

Christian

Dan Mercer

Apr 17, 2001, 8:58:50 AM
In article <3ADC159B...@primeur.com>,

You can use dd to unblock the records; conv=unblock will also strip trailing spaces:

$ cat block
012345678 012345678 012345678 012345678 012345678 012345678 ...
(240 bytes long)
$ dd if=block cbs=10 conv=unblock
0+1 records in
0+1 records out
012345678
012345678
012345678
012345678
012345678
012345678
012345678
012345678
012345678
012345678
012345678
012345678
012345678
012345678
012345678
012345678
012345678
012345678
012345678
012345678
012345678
012345678
012345678
012345678
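Once dd has unblocked the records, they can be piped straight into awk. A minimal sketch, recreating the 240-byte example file above (the file name `block` is from the example):

```shell
# Recreate the example file: "012345678 " repeated 24 times, i.e. 24
# fixed-length 10-byte records with no record separators.
printf '012345678 %.0s' $(seq 24) > block

# conv=unblock emits a newline after every cbs bytes and strips the
# trailing spaces, so awk downstream sees one record per line.
dd if=block cbs=10 conv=unblock 2>/dev/null |
awk 'END { print NR, "records of", length($0), "chars" }'
# prints: 24 records of 9 chars
```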

--
Dan Mercer
dame...@mmm.com

Opinions expressed herein are my own and may not represent those of my employer.

Jim Monty

Apr 17, 2001, 12:29:04 PM
Christian Ferrari <Christia...@primeur.com> wrote:
> Has anyone used awk to parse binary files?

I have not.

> The problem: read fixed length records (for example 80 chars) without
> record separator.

Awk is a text processing language. Use Perl or C for binary data.

Just curious... Are you SURE these are binary files? Can you describe
their binary-ness exactly? From your brief problem description, I
infer that you have fixed-length data records and I suspect that
this data is partly corrupted with NULs. If this is the case, you
could use tr to fix the corruption, dd to turn the fixed-length
data records into one-line-per-record ASCII text files, then awk
to parse the records. But this is a wild guess.

--
Jim Monty
mo...@primenet.com
Tempe, Arizona USA

Patrick TJ McPhee

Apr 18, 2001, 1:25:31 AM
In article <3ADC159B...@primeur.com>,
Christian Ferrari <Christia...@primeur.com> wrote:
% Has anyone used awk to parse binary files?

Standard awk is not the right tool for this purpose.

On the other hand, gawk handles binary files well, and without introducing
a new, awful language in the process. For your particular purpose,
there is a special variable called FIELDWIDTHS, which is a list of integers.
If you set it, gawk stops splitting each record into fields at delimiters
and instead cuts it into fixed-width fields (records themselves are still
separated by RS).

For instance, according to /usr/include/elf.h, an elf executable starts
with a 4-byte magic number, a byte giving the class of machine, a byte
giving the data format, a byte giving the ELF version, a byte of padding,
8 bytes of brand information, and various other crap. You could set
FIELDWIDTHS to
FIELDWIDTHS = "4 1 1 1 1 8"

and then parse this much of the file header with

FNR == 1 {
    if ($1 == "\177ELF") print "elf file"
    else { print "not elf file"; nextfile }

    if ($2 == "\001") print "32-bit"
    else if ($2 == "\002") print "64-bit (2x as good as 32-bit)"
    else print "you've got some odd kind of bits here"

    if ($3 == "\001") print "little endian"
    else if ($3 == "\002") print "big endian"
    else print "medium endian"

    if ($4 == "\001") print "this is the only elf version I know about"
    else print "you've got a wacky elf version going here"

    print "brand", $6

    nextfile
}

[note that nextfile is a gawk language extension]
--

Patrick TJ McPhee
East York Canada
pt...@interlog.com
