finding hex sequence in binary file

Gregory Bloom

Sep 14, 2001, 7:24:37 PM

Is there a simple command in UNIX that can find the offset of all matches of
a hexadecimal sequence within a binary file?

Bonus question: Is there a regex way to identify hex sequence offsets using
embedded wildcards?
i.e. 'hexgrep "AF8B*CCEE" myfile' might produce
0277: AF8B0014CCEE
0588: AF8BA7B901CCEE

laura fairhead

Sep 16, 2001, 10:58:00 AM

On Fri, 14 Sep 2001 17:24:37 -0600, "Gregory Bloom" <gjbloom...@yahoo.com> wrote:

>Is there a simple command in UNIX that can find the offset of all matches of
>a hexadecimal sequence within a binary file?
>

No, at least not amongst the standard utilities.

>Bonus question: Is there a regex way to identify hex sequence offsets using
>embedded wildcards?
>i.e. 'hexgrep "AF8B*CCEE" myfile' might produce
>0277: AF8B0014CCEE
>0588: AF8BA7B901CCEE

I don't really understand what you mean here but I've written
a 'hexgrep' shell script you could use to do this sort of thing.

UNIX falls flat on its face w.r.t. binary file operations, really,
but you can use 'od -txC -An' to dump the binary as hex, after which
it can be operated upon more easily (because it is now text).

A very crude but often useful search in a binary is just:
od -txC file | grep 'XX XX'

But that has the problem that for more than one byte it won't
work if the sequence happens to be wrapped across lines.
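
The wrapping problem can be reproduced with a small sketch. The demo
file, its contents and the use of 'mktemp' are assumptions made up for
illustration; the file puts AF 8B across the 16-byte boundary at which
od breaks its lines, and 'tr' folds the dump back into a single line
before grep sees it.

```shell
# Hypothetical demo file: 15 zero bytes, then AF 8B 01, so that AF ends
# the first od line and 8B starts the second.
demo=$(mktemp)
printf '\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\257\213\001' > "$demo"

# Per-line search: the pair is split across two lines, so nothing matches.
# (|| true only hides grep's non-zero exit status when the count is 0.)
per_line_hits=$(od -An -v -txC "$demo" | tr 'a-f' 'A-F' | grep -c 'AF 8B' || true)

# Joined search: turn newlines into spaces and squeeze repeats, so the
# whole dump is one line of space-separated bytes.
joined_hits=$(od -An -v -txC "$demo" | tr 'a-f' 'A-F' | tr -s ' \n' ' ' | grep -c 'AF 8B' || true)

echo "per-line: $per_line_hits, joined: $joined_hits"
rm -f "$demo"
```

Joining everything into one line finds the sequence but gives no
offsets and produces one very long line for a big file, which is what
the buffering approach in the script avoids.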

This utility uses 'awk' to buffer the lines so that a search
can match almost arbitrarily long patterns.
The buffer is clipped at BUFMAX characters (not bytes; each byte
takes three characters in the dump), which should be set depending
on how long a string your version of 'awk' allows (the buffer can
actually grow about a line bigger than this). BUFMAX also sets the
maximum 'span' of the entire expression match.
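
To make the characters-versus-bytes point concrete, here is a minimal
sketch, assuming od prints each byte as two hex digits followed by a
space. The sample input 'ABCDEF' and the variable names are made up
for illustration. A character position in the flattened dump converts
to a byte offset by dividing by three.

```shell
# 'ABCDEF' is 41 42 43 44 45 46 in hex; flatten the dump to one line.
dump=$(printf 'ABCDEF' | od -An -v -txC | tr -s ' \n' ' ')
dump=${dump# }    # drop the single leading space left after squeezing

# Locate "43 44" by character position, then convert to a byte offset:
# (position - 1) / 3, printed the same way the script prints offsets.
offset=$(awk -v s="$dump" 'BEGIN { i = index(s, "43 44"); printf "%08X", (i - 1) / 3 }')
echo "$offset"

# Roughly how many bytes a 2048-character buffer covers.
span=$((2048 / 3))
echo "$span"
```

So with BUFMAX=2048 a single match can span at most about 682 bytes.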

The pattern from the command line is matched using awk's 'sub'
function and consequently should be an ERE. The expression
should match two-digit hex values separated by a space character,
so your example above would be:

hexgrep "AF 8B.*CC EE" file

'.*' is necessary because it's an ERE, where '*' only repeats the
preceding item and '.' stands for any character. The program could
be changed to reformat the command-line pattern from something
more to your taste into the ERE, but because it is a regular
expression you can do lots of things. This should work on almost
any Bourne shell with an 'od -txC' (with SunOS you might have
to change 'awk' to 'nawk'):

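#!/bin/sh
# hexgrep: print the offset and bytes of every match of an ERE in a hex dump
BUFMAX=2048	# maximum buffer length in characters (3 per byte)
[ $# -ne 2 ] && { echo "usage: `basename $0` pattern file" >&2;exit 1;}
# upper-case the hex digits in the pattern to match the dump below
EXPR=`echo "$1" |tr '[a-f]' '[A-F]'`
# -v stops od collapsing repeated lines into "*", which would skew offsets;
# sed drops the leading space od prints, so every byte is exactly "XX "
od -An -v -txC "$2" |sed 's/.//' |tr '[a-f]' '[A-F]' |
awk '
{
    buf=buf$0" "
    # bracket the first match with colons, then cut it out of the buffer
    while(sub("'"$EXPR"'",":&:",buf))
    {
        i=index(buf,":")
        buf=substr(buf,i+1)
        j=index(buf,":")
        printf "%08X:%s\n",base+(i-1)/3,substr(buf,1,j-1)
        buf=substr(buf,j+2)
        base+=(i-1+j)/3
    }
    # drop whole bytes from the front once the buffer exceeds BUFMAX
    if((i=int((length(buf)-'$BUFMAX')/3))>0)
    {
        buf=substr(buf,i*3+1);base+=i
    }
}'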
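
The reformatting of the command-line pattern mentioned above could be
sketched roughly as follows. 'hex2ere' is a hypothetical helper, not
part of the script, and it assumes the pattern is an even number of
hex digits with '*' as the only wildcard.

```shell
# hex2ere (hypothetical): turn a compact pattern such as AF8B*CCEE
# into the spaced ERE form that hexgrep expects.
hex2ere() {
    printf '%s\n' "$1" |
    sed -e 's/\([0-9A-Fa-f]\{2\}\)/\1 /g' \
        -e 's/ *\*/.*/g' \
        -e 's/ $//'
    # 1st expression: put a space after every pair of hex digits
    # 2nd expression: replace each "*" (and any space before it) with ".*"
    # 3rd expression: remove the trailing space left by the first step
}

hex2ere 'AF8B*CCEE'    # prints: AF 8B.*CC EE
```

The result can then be passed on, e.g. hexgrep "`hex2ere 'AF8B*CCEE'`" file.
Note that '.*' is greedy, so within one buffered window the match runs
to the last CC EE rather than the first.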

byefrom

--
: ${L} # http://lf.8k.com:80
