how to read binary data file?

Grant

Oct 26, 2009, 10:51:53 PM

I want to read a binary data file with fixed field widths, but gawk is
not converting a character to a number for me. What did I forget?

$ printf "%s" 4321 > number_file

$ xxd number_file
0000000: 3433 3231 4321

$ gawk 'BEGIN{ FS="" }; \
{ x = $1; \
print x, x+0; \
y = (($4 * 256 + $3) * 256 + $2) * 256 + $1; \
printf"%08x\n",y
}' number_file
4 4
01020304

Expected output:
4 52
31323334

How do I convert a char type to a number type?

(for example '1' == 0x31 == 49)

Thanks,
Grant.
--
http://bugsplatter.id.au

Janis Papanagnou

Oct 26, 2009, 11:53:59 PM
Grant wrote:
> I want to read a binary data file with fixed field widths, but gawk is
> not converting a character to a number for me. What did I forget?

http://www.gnu.org/software/gawk/manual/gawk.html#Ordinal-Functions

Janis
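
For reference, a minimal sketch of the lookup-table idea behind the ordinal functions, applied to the example from the question. It assumes single-byte characters (hence LC_ALL=C). The array name ord and the variable result are made up for illustration, and substr() is used instead of FS="" so the sketch does not depend on gawk's empty-FS behaviour.

```sh
# Recreate the sample file from the question.
printf '%s' 4321 > number_file

result=$(LC_ALL=C awk '
BEGIN {
    # ord[] (hypothetical name) maps each single-byte character to its code 1..255.
    for (i = 1; i < 256; i++) ord[sprintf("%c", i)] = i
}
{
    x = substr($0, 1, 1)
    print x, ord[x]                      # the character and its numeric code
    y = 0
    # Little-endian, as in the question: byte 4 is the most significant.
    for (i = 4; i >= 1; i--) y = y * 256 + ord[substr($0, i, 1)]
    printf "%08x\n", y
}' number_file)

printf '%s\n' "$result"
```

With the sample file this should print the two lines given as the expected output in the question. Byte value 0 is deliberately left out of the table, since not every awk handles NUL bytes in strings.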

Grant

Oct 27, 2009, 1:53:43 AM
On Tue, 27 Oct 2009 04:53:59 +0100, Janis Papanagnou <janis_pa...@hotmail.com> wrote:

>Grant wrote:
>> I want to read a binary data file with fixed field widths, but gawk is
>> not converting a character to a number for me. What did I forget?
>
>http://www.gnu.org/software/gawk/manual/gawk.html#Ordinal-Functions
>
>Janis

Thank you! I thought gawk could do it :)

Grant.
--
http://bugsplatter.id.au

Ted Davis

Oct 27, 2009, 2:35:14 PM

This is part of a script to widen lines in a BMP file - the code to read
the file into an array and the function to convert bytes to scalar numbers
are included; the rest of the program is omitted.


function Bytes2Number( String, x, y, z, Number ) {
    # This function converts byte strings (binary numbers) into their
    # corresponding numeric strings so that they can be processed as gawk
    # numbers.
    # The lookup table (CharString) is a global variable.
    # This code assumes that binary numbers are big-endian (most significant
    # byte first) - it is up to the calling program to order the bytes.

    # On the first use, the (global) LUT is created, then left for later use. It
    # consists of a list of characters from \000 to \377 in order - the (index
    # value minus 1) of a character multiplied by the power of 256 corresponding
    # to its position in the string is the byte's numerical weight. The function
    # doesn't care about the length of the byte string (within the integer limits
    # of the gawk version and port).
    if( !CharString ) {
        for( x = 0; x <= 255; x++ ) CharString = CharString sprintf( "%c", x )
    }
    x = split( String, Scratch, "" )
    Number = 0
    for( y = 1; y <= x; y++ ) {
        z = index( CharString, Scratch[ y ] ) - 1
        Number = Number + z * (256^(x - y))
    }
    return Number # Note that Number is a regular gawk scalar variable.
}


BEGIN{
    # It is necessary to tell gawk to read/write the file as binary, especially under
    # Windows where ^Z in files is a killer. Setting BINMODE to 3 will also work,
    # but it throws error messages.
    BINMODE = "rw"
    # Setting FS to null causes gawk to make each byte a separate field.
    FS = ""
    # The next two lines are not strictly necessary - they are here for clarity.
    Header = ""
    ByteCount = 0
    # Testing indicates that, in Windows at least, it is necessary to specify RS, even though
    # it would appear redundant to set it to \n - not doing so results in 0D0A being
    # replaced with 0A in the output, with the loss of one byte for each occurrence.
    # The value is arbitrary - it has been tested using one of the line colors.
    RS = "\n"
}
{
    # Read the file into an array. If there are multiple lines, that is, if RS appears
    # in the file, insert the record separator back into the array at the end of
    # each line for which RT exists.
    for( x = 1; x <= NF; x++ ) Bytes[ ++ByteCount ] = $(x)
    if( RT ) { Bytes[ ++ByteCount ] = RT }
}
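
A compact illustration of the big-endian weighting described in the Bytes2Number comments, written as a sketch rather than a drop-in replacement. The helper bytes2num, the table name Chars and the shell variable be_value are hypothetical. This variant uses Horner's rule instead of powers of 256, and builds the table from \001 upward so that index() returns the byte value directly (a byte that is not found, such as NUL, yields 0). Single-byte characters are assumed, hence LC_ALL=C.

```sh
# "AB" is the two bytes 0x41 0x42, so big-endian it is 65*256 + 66.
be_value=$(printf 'AB' | LC_ALL=C awk '
function bytes2num(s,    i, n, v) {      # hypothetical helper, not the function above
    n = length(s)
    v = 0
    for (i = 1; i <= n; i++)
        v = v * 256 + index(Chars, substr(s, i, 1))   # Horner: most significant byte first
    return v
}
BEGIN {
    # Chars holds the characters \001..\377, so position == byte value.
    for (i = 1; i < 256; i++) Chars = Chars sprintf("%c", i)
}
{ print bytes2num($0) }')

printf '%s\n' "$be_value"
```

Here 65*256 + 66 = 16706, which is the value printed.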


Some lines probably wrapped.

I don't remember if I tested that under Linux ... I think I did.

--

T.E.D. (tda...@mst.edu)

Grant

Oct 27, 2009, 3:08:01 PM

Too late :)

I already reached:

!read_file {
    for (i = 1; i <= NF; i++) { data[offset++] = $i }
    data[offset++] = "\n"
    next
}

Don't hit me for the offset++, data area starts at file offset 8.

I also discovered that reading the file a byte at a time is way too
slow -- the data file I'm looking at is ~8MB, the freakin' index
appears after the variable-length record data area, and I think more
playing with this particular project is impractical.

I was exploring the feasibility of reading binary data, but for what
I'm looking at, awk is not the answer -- unless gawk can do file
random access, and I don't recall seeing fseek(), fread() and friends
in gawk.
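
One possible workaround for the random-access problem, sketched here under assumptions and not tested on the file in question, is to let od do the seeking and hand awk plain decimal numbers. The file name sample.bin, its 8-byte header and the 4-byte little-endian field are invented for the example; only the standard od options -A n, -v, -t u1, -j and -N are relied on.

```sh
# Sample file: 8 header bytes followed by the 4 data bytes "4321".
printf 'HEADER__4321' > sample.bin

# od skips 8 bytes (-j 8), reads 4 bytes (-N 4) and prints each one as an
# unsigned decimal (-t u1) without the address column (-A n).
value=$(od -A n -v -t u1 -j 8 -N 4 sample.bin | awk '
{
    # Every field is already a number, so no character lookup is needed.
    for (i = 1; i <= NF; i++) b[++n] = $i
}
END {
    y = 0
    for (i = n; i >= 1; i--) y = y * 256 + b[i]   # little-endian: last byte is most significant
    printf "%08x\n", y
}')

printf '%s\n' "$value"
```

Because od reads only the requested range, awk never has to split the whole file into one-byte fields. Whether that is fast enough for an ~8MB file with a trailing index would still need measuring.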
>
>Some lines probably wrapped.

Not here, line wrap is under my control.


>
>I don't remember if I tested that under Linux ... I think I did.

Thanks for your info -- you posted your binary file reader before
and I forgot all about it -- didn't seem useful to me at the time
-- isn't that true for lots of stuff? --> information overload.

Grant.
--
http://bugsplatter.id.au

Loki Harfagr

Oct 28, 2009, 2:56:28 PM
Wed, 28 Oct 2009 06:08:01 +1100, Grant did cat :

More overload here, hope it'll help ;-)

sample test:
-------------
$ echo "@ABCDabcd" | LC_ALL=C awk -f byte2wot2byte
(64)[40](65)[41](66)[42](67)[43](68)[44](97)[61](98)[62](99)[63](100)[64]
-------------

sample tool:
-------------
$ cat byte2wot2byte

BEGIN{
    while(++i<256) MCchar[i]=sprintf("%c",i)
    while(--i){
        MCval[MCchar[i]]=i
        MXval[MCchar[i]]=sprintf("%x",i)
    }
    FS=""
}
{
    j=0 ### foolproof
    while(1+NF-(++j)){
        printf "(%02d)",MCval[$j]
        printf "[%02s]",MXval[$j]
    }
}
END{print ""}
-------------

Grant

Oct 28, 2009, 3:43:48 PM

So pretty :o)

A minor change to allow for 0..255 width? Certainly I'm not playing golf.

$ cat b2w2b


BEGIN{
    while(++i<256) MCchar[i]=sprintf("%c",i)
    while(--i){
        MCval[MCchar[i]]=sprintf("%3d",i)
        MXval[MCchar[i]]=sprintf(" %02x",i)
    }
    FS=""
}
{
    j=0; printf "%s", " "
    while(1+NF-(++j)){
        printf " %s", $j
    }
    print ""
    j=0; printf "%s", "d"
    while(1+NF-(++j)){
        printf " %s",MCval[$j]
    }
    print ""
    j=0; printf "%s", "x"
    while(1+NF-(++j)){
        printf " %s",MXval[$j]
    }
    print ""
}
END{print ""}

$ echo "@ABCDabcd" | LC_ALL=C awk -f b2w2b
@ A B C D a b c d
d 64 65 66 67 68 97 98 99 100
x 40 41 42 43 44 61 62 63 64

Looks better to me :)

But your BEGIN block is very nice golf :)

Grant.
--
http://bugsplatter.id.au

Grant

Oct 28, 2009, 4:22:32 PM
On Thu, 29 Oct 2009 06:43:48 +1100, Grant <g_r_a...@bugsplatter.id.au> wrote:

>A minor change to allow for 0..255 width? Certainly I'm not playing golf.

Maybe a little:

    j=0; a=" "; d="d"; x="x"
    while(1+NF-(++j)){
        a=a sprintf(" %s", $j)
        d=d sprintf(" %s",MCval[$j])
        x=x sprintf(" %s",MXval[$j])
    }
    print a; print d; print x; print ""
}

$ echo "@ABCDabcd" | LC_ALL=C awk -f b2w2b
@ A B C D a b c d
d 64 65 66 67 68 97 98 99 100
x 40 41 42 43 44 61 62 63 64

Grant.
--
http://bugsplatter.id.au
