How to read a binary data file?

Grant

Oct 26, 2009, 10:51:53 PM

I want to read a binary data file with fixed field widths, but gawk is
not converting a character to a number for me. What did I forget?

$ printf "%s" 4321 > number_file

$ xxd number_file
0000000: 3433 3231 4321

$ gawk 'BEGIN{ FS="" }; \
{ x = $1; \
print x, x+0; \
y = (($4 * 256 + $3) * 256 + $2) * 256 + $1; \
printf"%08x\n",y
}' number_file
4 4
01020304

Expected output:
4 52
31323334

How do I convert a char type to a number type?

(for example '1' == 0x31 = 49)

Thanks,
Grant.
--
http://bugsplatter.id.au
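The missing piece in the test above is a character-to-code table: awk turns the string "4" into the number 4, not into its byte value 52. A minimal sketch of the fix, assuming a POSIX awk run in the C locale (so one byte is one character); le32_hex is a hypothetical helper name, and substr() stands in for gawk's FS="" so the sketch also runs under other awks:

```shell
# Sketch: the 4-byte test above with a character-to-code table added.
# Assumes a POSIX awk and the C locale (one byte == one character).
# le32_hex is a hypothetical helper name, not part of any awk.
le32_hex() {
    LC_ALL=C awk '
    BEGIN {
        # Map each byte value 1..255 to its code; NUL is left out.
        for (i = 1; i < 256; i++) ord[sprintf("%c", i)] = i
    }
    {
        # substr() instead of FS="" so non-gawk awks work too.
        c1 = substr($0, 1, 1); c2 = substr($0, 2, 1)
        c3 = substr($0, 3, 1); c4 = substr($0, 4, 1)
        print c1, ord[c1]
        # Little-endian: the 4th byte is the most significant.
        y = ((ord[c4] * 256 + ord[c3]) * 256 + ord[c2]) * 256 + ord[c1]
        printf "%08x\n", y
    }'
}

printf "%s" 4321 | le32_hex
# prints:
# 4 52
# 31323334
```

This keeps the same byte weighting as the original one-liner; only the lookup through ord[] is new.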

Janis Papanagnou

Oct 26, 2009, 11:53:59 PM

Grant wrote:
> I want to read binary data file, fixed field widths, but gawk is
> not converting a character to number for me, what did I forget?

http://www.gnu.org/software/gawk/manual/gawk.html#Ordinal-Functions

Janis
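The Ordinal Functions section linked above provides ord() and chr() as awk library functions. The sketch below is written in the same spirit but is not the manual's code; the names ord, chr, _ord and _ord_init here are assumptions, and the C locale is assumed so that codes 128-255 map to single bytes:

```shell
# Sketch in the spirit of the manual's ord()/chr() library functions,
# not a copy of them; all names here are assumptions.
ordchr_demo() {
    LC_ALL=C awk '
    # ord(c): code of the first character of c (0 if not in the table).
    function ord(c) {
        if (!_ord_ready) _ord_init()
        return _ord[substr(c, 1, 1)] + 0
    }
    # chr(n): the single character whose code is n.
    function chr(n) { return sprintf("%c", n + 0) }
    # Build the lookup table once, on first use.
    function _ord_init(   i) {
        for (i = 1; i < 256; i++) _ord[sprintf("%c", i)] = i
        _ord_ready = 1
    }
    BEGIN { print ord("1"), ord("A"), chr(49) chr(65) }'
}

ordchr_demo   # prints: 49 65 1A
```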

Grant

Oct 27, 2009, 1:53:43 AM

On Tue, 27 Oct 2009 04:53:59 +0100, Janis Papanagnou <janis_pa...@hotmail.com> wrote:

>Grant wrote:
>> I want to read binary data file, fixed field widths, but gawk is
>> not converting a character to number for me, what did I forget?
>
>http://www.gnu.org/software/gawk/manual/gawk.html#Ordinal-Functions
>
>Janis

Thank you! I thought gawk could do it :)

Grant.
--
http://bugsplatter.id.au

Ted Davis

Oct 27, 2009, 2:35:14 PM

This is part of a script to widen lines in a BMP file - the code to read
the file into an array and the function to convert bytes to scalar numbers
are included; the rest of the program is omitted.

function Bytes2Number( String, x, y, z, Number, Scratch ) {
# This function converts byte strings (binary numbers) into their
# corresponding numeric strings so that they can be processed as gawk numbers.
# The lookup table (CharString) is a global variable.
# This code assumes that binary numbers are big-endian (most significant
# byte first) - it is up to the calling program to order the bytes.

# On the first use, the (global) LUT is created, then left for later use. It
# consists of a list of characters from \000 to \377 in order - the (index
# value minus 1) of a character multiplied by the power of 256 corresponding
# to its position in the string is the byte's numerical weight. The function
# doesn't care about the length of the byte string (within the integer limits
# of the gawk version and port).
if( !CharString ) {
for( x = 0; x <= 255; x++ ) CharString = CharString sprintf( "%c", x )
}
x = split( String, Scratch, "" )
Number = 0
for( y = 1; y <= x; y++ ) {
z = index( CharString, Scratch[ y ] ) -1

Number = Number + z * (256^(x - y))
}
return Number # Note that Number is a regular gawk scalar variable.
}

BEGIN{
# It is necessary to tell gawk to read/write the file as binary, especially under
# Windows where ^Z in files is a killer. Setting BINMODE to 3 will also work,
# but it throws error messages.
BINMODE = "rw"
# Setting FS to null causes gawk to make each byte a separate field.
FS= ""
# The next two lines are not strictly necessary - they are here for clarity.
Header = ""
ByteCount = 0
# Testing indicates that, in Windows at least, it is necessary to specify RS, even though
# it would appear redundant to set it to \n - not doing so results in 0A0D being
# replaced with 0A in the output, with the loss of one byte for each occurrence.
# The value is arbitrary - it has been tested using one of the line colors.
RS = "\n"
}
{
# Read the file into an array. If there are multiple lines, that is, if RS appears
# in the file, insert the record separator back into the array at the end of
# each line for which RT exists.
for( x = 1; x <= NF; x++ ) Bytes[ ++ByteCount ] = $(x)
if( RT ) { Bytes[ ++ByteCount ] = RT }
}

Some lines probably wrapped.

I don't remember if I tested that under Linux ... I think I did.

--

T.E.D. (tda...@mst.edu)
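Bytes2Number above reads big-endian byte strings and leaves byte order to the caller. A minimal sketch of that split, assuming a POSIX awk in the C locale: be() accumulates with Horner's rule (n = n * 256 + byte) instead of computing 256^(x - y), and rev() reverses a little-endian field before conversion. bytes_to_numbers, be and rev are hypothetical names; NUL is not in the table and may not survive input in every awk, so it counts as 0:

```shell
# Sketch: big-endian conversion plus a byte reverser for little-endian
# fields; bytes_to_numbers, be() and rev() are hypothetical names.
bytes_to_numbers() {
    LC_ALL=C awk '
    BEGIN { for (i = 1; i < 256; i++) ord[sprintf("%c", i)] = i }
    # be(s): big-endian value of byte string s (first byte most significant),
    # using Horner rule; bytes missing from the table count as 0.
    function be(s,   i, n) {
        n = 0
        for (i = 1; i <= length(s); i++) n = n * 256 + ord[substr(s, i, 1)]
        return n
    }
    # rev(s): s with its bytes in reverse order.
    function rev(s,   i, r) {
        r = ""
        for (i = length(s); i >= 1; i--) r = r substr(s, i, 1)
        return r
    }
    # Print each record read as big-endian, then as little-endian.
    { print be($0), be(rev($0)) }'
}

printf '\001\002' | bytes_to_numbers   # prints: 258 513
```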

Grant

Oct 27, 2009, 3:08:01 PM

Too late :)

I already reached:

!read_file {
for (i = 1; i <= NF; i++) { data[offset++] = $i }
data[offset++] = "\n"
next
}

Don't hit me for the offset++, data area starts at file offset 8.

And I discovered the technique is way too slow, reading the file a byte
at a time -- the data file I'm looking at is ~8MB, the freakin' index
appears after the variable-length record data area, and I think more
playing with this particular project is impractical.

I was exploring the feasibility of reading binary data, but for what
I'm looking at, awk is not the answer -- unless gawk can do random
file access, and I don't recall seeing fseek(), fread() and friends
in gawk.
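One workaround for the missing fseek(): let od jump to an offset and decode the bytes into decimal numbers, so awk only does arithmetic and needs no character lookup. A sketch assuming POSIX od (-j skip, -N count, -t u1); read_u32le is a hypothetical helper that reads an unsigned 32-bit little-endian value at a given byte offset:

```shell
# Sketch: let od skip and decode, let awk do the arithmetic.
# read_u32le is a hypothetical helper name; assumes POSIX od.
# Usage: read_u32le FILE OFFSET  (OFFSET as plain decimal bytes)
read_u32le() {
    # -j skips OFFSET bytes, -N 4 formats four bytes, -tu1 prints each
    # byte as unsigned decimal, -An drops the address column, and -v
    # keeps od from collapsing repeated lines into "*".
    od -An -v -tu1 -j "$2" -N 4 "$1" |
        awk '{ for (i = 1; i <= NF; i++) b[n++] = $i }
             END { print ((b[3] * 256 + b[2]) * 256 + b[1]) * 256 + b[0] }'
}
```

For example, after printf 'XXXXXXXX4321' > sample.bin, running read_u32le sample.bin 8 should print 825373492 (0x31323334, the value expected from "4321" earlier). Bytes missing past end of file count as 0, and only the requested four bytes ever reach awk.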
>
>Some lines probably wrapped.

Not here, line wrap is under my control.

>
>I don't remember if I tested that under Linux ... I think I did.

Thanks for your info -- you posted your binary file reader before
and I forgot all about it -- it didn't seem useful to me at the time
-- isn't that true for lots of stuff? --> information overload.

Grant.
--
http://bugsplatter.id.au

Loki Harfagr

Oct 28, 2009, 2:56:28 PM

Wed, 28 Oct 2009 06:08:01 +1100, Grant did cat :

More overload here, hope it'll help ;-)

sample test:
-------------
$ echo "@ABCDabcd" | LC_ALL=C awk -f byte2wot2byte
(64)[40](65)[41](66)[42](67)[43](68)[44](97)[61](98)[62](99)[63](100)[64]
-------------

sample tool:
-------------
$ cat byte2wot2byte

BEGIN{
while(++i<256) MCchar[i]=sprintf("%c",i)
while(--i){
MCval[MCchar[i]]=i
MXval[MCchar[i]]=sprintf("%x",i)
}
FS=""
}
{
j=0 ### foolproof
while(1+NF-(++j)){
printf "(%02d)",MCval[$j]
printf "[%02s]",MXval[$j]
}
}
END{print ""}
-------------
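The same idea runs the other way too, which is the "2byte" half of the name: printf "%c" with a numeric argument emits the byte with that code. A minimal sketch, assuming the C locale; nums_to_bytes is a hypothetical name:

```shell
# Sketch: the reverse direction, decimal byte values in, raw bytes out.
# nums_to_bytes is a hypothetical name; assumes the C locale.
nums_to_bytes() {
    # $i + 0 forces a numeric argument, so %c emits the byte with that
    # code rather than the first character of the field.
    LC_ALL=C awk '{ for (i = 1; i <= NF; i++) printf "%c", $i + 0 }'
}

echo "64 65 66 97" | nums_to_bytes; echo   # prints: @ABa
```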

Grant

Oct 28, 2009, 3:43:48 PM

So pretty :o)

A minor change to allow for 0..255 width? Certainly I'm not playing golf.

$ cat b2w2b

BEGIN{
while(++i<256) MCchar[i]=sprintf("%c",i)
while(--i){

MCval[MCchar[i]]=sprintf("%3d",i)
MXval[MCchar[i]]=sprintf(" %02x",i)
}
FS=""
}
{
j=0; printf "%s", " "
while(1+NF-(++j)){
printf " %s", $j
}
print ""
j=0; printf "%s", "d"
while(1+NF-(++j)){
printf " %s",MCval[$j]
}
print ""
j=0; printf "%s", "x"
while(1+NF-(++j)){
printf " %s",MXval[$j]
}
print ""
}
END{print ""}

$ echo "@ABCDabcd" | LC_ALL=C awk -f b2w2b
@ A B C D a b c d
d 64 65 66 67 68 97 98 99 100
x 40 41 42 43 44 61 62 63 64

Looks better to me :)

But your BEGIN block is very nice golf :)

Grant.
--
http://bugsplatter.id.au

Grant

Oct 28, 2009, 4:22:32 PM

On Thu, 29 Oct 2009 06:43:48 +1100, Grant <g_r_a...@bugsplatter.id.au> wrote:

>A minor change to allow for 0..255 width? Certainly I'm not playing golf.

Maybe a little:

j=0; a=" "; d="d"; x="x"
while(1+NF-(++j)){
a=a sprintf(" %s", $j)
d=d sprintf(" %s",MCval[$j])
x=x sprintf(" %s",MXval[$j])
}
print a; print d; print x; print ""
}

$ echo "@ABCDabcd" | LC_ALL=C awk -f b2w2b
@ A B C D a b c d
d 64 65 66 67 68 97 98 99 100
x 40 41 42 43 44 61 62 63 64

Grant.
--
http://bugsplatter.id.au