Awk script to convert HEX to binary?

Jean-Paul Iribarren

Dec 27, 2011, 1:12:07 PM

Hi awk experts!

I've been trying to write something myself, with no success so far, so
I'll take the pragmatic approach: ask people who know. Here is the
story: I am dealing with a Linux-based embedded device where in some
situations the only connection to the outside world is the serial port
used for the console. No pppd available to set up a comfortable PLIP-based
communication over the serial port. And I need to add some utilities to
this embedded device...

Thus the idea: I would compile these utilities on my workstation, then
convert them to a very basic ".hex" ASCII format, using two chars for
each byte of the binary file (e.g. 0x2A -> "2A"), with LF separators
added every few characters for readability and convenience. Then I would
pipe the resultant ASCII file through the console to a (typically
awk-based) script running on the embedded device that would perform the
opposite conversion (e.g. 0x32/0x41 -> 0x2A) for each pair of ASCII
characters:

Hexdumps on the workstation:

- binary file: my_utility

[...]
10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F |................|
[...]

- hex-encoded file: my_utility.hex

[...]
31 30 31 31 31 32 31 33 31 34 31 35 31 36 0a 31 |10111213141516.1|
37 31 38 31 39 31 41 31 42 31 43 31 44 0a 01 45 |718191A1B1C1D.1E|
[...]
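
The workstation-side encoding described above could be produced with standard tools. A minimal sketch, assuming a POSIX od, tr and fold on the workstation; the function name bin2hex and the 32-character line width are arbitrary choices for illustration, not something from the original post:

```shell
# bin2hex: write a file (or stdin) as upper-case hex text,
# two characters per byte, 32 characters (16 bytes) per line.
bin2hex() {
    # -An: no address column, -v: do not collapse repeated lines,
    # -tx1: one byte per field in hex
    od -An -v -tx1 "$@" |
        tr -d ' \n' |        # drop od's spacing and line breaks
        tr 'a-f' 'A-F' |     # match the "2A" style used in the post
        fold -w 32           # re-wrap with LF separators
    echo                     # terminate the last line
}

# Usage (on the workstation):
#   bin2hex my_utility > my_utility.hex
```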

On the embedded device:

~ # my_hex_to_bin_awk_script > my_utility
[copy-paste my_utility.hex to console on workstation]
[Ctrl-D]

===> script my_hex_to_bin_awk_script reads the incoming flow from stdin
(up to EOF) and rebuilds my_utility binary file. Simple, eh? Except that
so far I haven't been able to find the proper getline() / printf()
combination in awk to achieve what I want...

So, many thanks in advance for your suggestions!
--
JPI

Kenny McCormack

Dec 27, 2011, 1:58:31 PM

In article <4efa0a76$0$30667$426a...@news.free.fr>,

Jean-Paul Iribarren <jpi.sp...@free.fr> wrote:
>Hi awk experts!
>
>I've been trying to write something by myself, but no success so far, so
>I consider the pragmatic approach: ask people who know. Here is the
>story: I am dealing with a Linux-based embedded device where in some
>situations the only connection to the outside world is the serial port
>used for the console. No pppd available to set a comfortable PLIP-based
>communication above the serial port. And I need to add some utilities to
>this embedded device...

Warning: Alternatives/meta-based answer (i.e., not an answer to your question)

IMHO, the real question here is: What do I have on the primitive device and
what tools found therein can be used to leverage a solution? I'm actually
amazed that you seem to actually have a usable AWK on the primitive device.
And I suspect that by the time you get it (the tool-chain on the embedded
device) up to the level where you could actually do what you want to do,
you'll have already solved your problem. I have some experience with this -
where I have an embedded device that has an "awk" on it, but the awk is so
old and primitive that the first thing I did was to compile and install gawk
on it.

I also did something similar to this a million or so years ago, where I had
a PC with a serial port and no other usable communications devices (no
removable media). Luckily, the PC had DOS and GWBasic installed, and I was
able to cobble something together in GWBasic to communicate over the serial
line and move files back and forth.

Anyway, what this is all leading up to is: Does your device have rz/sz
installed? If not, I suggest that you get that going, one way or another,
then use that to move files henceforth. Of course, this doesn't answer the
question of how to get rz onto the machine.

P.S. I have deliberately avoided answering the AWK-based side of your
question, because I doubt that the primitive version of AWK that (I suspect)
you have on the device will be able to do what you need. GAWK can do it, of
course, but there you go...


Janis Papanagnou

Dec 27, 2011, 2:25:47 PM

On 27.12.2011 19:12, Jean-Paul Iribarren wrote:
> Hi awk experts!
>
> I've been trying to write something by myself, but no success so far, so I
> consider the pragmatic approach: ask people who know. Here is the story: I am
> dealing with a Linux-based embedded device where in some situations the only
> connection to the outside world is the serial port used for the console. No
> pppd available to set a comfortable PLIP-based communication above the serial
> port. And I need to add some utilities to this embedded device...
>
> Thus the idea: I would compile these utilities on my workstation, then convert
> them to a very basic ".hex" ASCII format, using two chars for each byte of the
> binary file (e.g. 0x2A -> "2A"), with LF separators added every few characters
> for readability and convenience. Then I would pipe the resultant ASCII file
> through the console to a (typically awk-based) script running on the embedded
> device that would perform the opposite conversion (e.g. 0x32/0x41 -> 0x2A) for
> each pair of ASCII characters:
>
> Hexdumps on the workstation:
>
> - binary file: my_utility
>
> [...]
> 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F |................|
> [...]
>
> - hex-encoded file: my_utility.hex
>
> [...]
> 31 30 31 31 31 32 31 33 31 34 31 35 31 36 0a 31 |10111213141516.1|
> 37 31 38 31 39 31 41 31 42 31 43 31 44 0a 01 45 |718191A1B1C1D.1E|

The last "01" should be a "31" I suppose?

> [...]
>
> On the embedded device:
>
> ~ # my_hex_to_bin_awk_script > my_utility
> [copy-paste my_utility.hex to console on workstation]
> [Ctrl-D]
>
> ===> script my_hex_to_bin_awk_script reads the incoming flow from stdin (up to
> EOF) and rebuilds my_utility binary file. Simple, eh? Except that so far I
> haven't been able to find the proper getline() / printf() combination in awk
> to achieve what I want...
>
> So, many thanks in advance for your suggestions!

I am not perfectly sure whether that is what you want...

There are a couple of possibilities. With a standard awk, for example,

awk '
  BEGIN {
    n = split ("30,31,32,33,34,35,36,37,38,39,41,42,43,44,45,46",h,",")
    n = split ("0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F",l,",")
    for (i=1; i<=n; i++) m[h[i]] = i
  }
  { for (i=1; i<NF; i++) printf ("%s", l[m[$i]]) }
'

I've used upper case letters here for the hex digits (change as necessary).
The "\n" will be ignored. And note that the i<NF is no typo; it's to skip
the last field in the line (like the "|10111213141516.1|")

Janis

Jean-Paul Iribarren

Dec 28, 2011, 3:49:19 AM

Le 27/12/2011 20:25, Janis Papanagnou a écrit :
> (...)

> The last "01" should be a "31" I suppose?

Yes, sorry, my mistake.

> (...)

> There are a couple of possibilities. With a standard awk, for example,
>
> awk '
> BEGIN {
> n = split ("30,31,32,33,34,35,36,37,38,39,41,42,43,44,45,46",h,",")
> n = split ("0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F",l,",")
> for (i=1; i<=n; i++) m[h[i]] = i
> }
> { for (i=1; i<NF; i++) printf ("%s", l[m[$i]]) }
> '

Hhmmm, that doesn't seem to work. Perhaps I didn't run it properly, the
command I have used is:

awk '
  BEGIN {
    n = split ("30,31,32,33,34,35,36,37,38,39,41,42,43,44,45,46",h,",")
    n = split ("0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F",l,",")
    for (i=1; i<=n; i++) m[h[i]] = i
  }
  { for (i=1; i<NF; i++) printf ("%s", l[m[$i]]) }
' file.hex | hexdump -C

... but it doesn't output anything.

Never mind, I have discovered a base64 utility at the deep end of my
embedded device, so I should be able to use it for such "file transfers".

Anyway, thank you for the try.
--
JPI

Jean-Paul Iribarren

Dec 28, 2011, 3:55:15 AM

Le 27/12/2011 19:58, Kenny McCormack a écrit :
> (...)

> Anyway, what this is all leading up to is: Does your device have rz/sz
> installed? If not, I suggest that you get that going, one way or another,
> then use that to move files henceforth. Of course, this doesn't answer the
> question of how to get rz onto the machine.

Been there, done that in the good ol' times with rz/sz, xmodem or kermit
:-), but no, my device doesn't have these utilities installed.

On the other hand, I have just discovered a base64 utility at the deep
end of the filesystem of my device, so I can use it to decode
base64-encoded data sent from my workstation through the terminal. No
awk script required anymore!
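
The base64 route could look roughly like this. It is only a sketch: it assumes both the workstation and the device have a base64 command that accepts -d for decoding (GNU coreutils and BusyBox do; the utility on this particular device was not named, so its options may differ). The function names are made up for illustration.

```shell
# Workstation side: print the binary as base64 text, ready to copy.
encode_b64() {
    base64 "$1"
}

# Device side: read pasted base64 text from stdin (finish with Ctrl-D)
# and write the decoded binary to the named file.
decode_b64() {
    base64 -d > "$1"
}

# Usage:
#   workstation$ encode_b64 my_utility > my_utility.b64
#   device     # decode_b64 my_utility     (then paste, then Ctrl-D)
```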

Thank you for having answered.
--
JPI

Loki Harfagr

Dec 28, 2011, 6:36:43 AM

Wed, 28 Dec 2011 09:49:19 +0100, Jean-Paul Iribarren did cat :

and if you want some fun (and have no bladi nor blada at hand) here's
some base64 code for playing with ;-)
---------
cat base64_in_awk.sh

### not yet complete nor compliant ;-)
###
### this is a simple base64 enc/dec in awk
### mostly made for fun but actually used in a few awk scripts I use
### in some other tools I wrote for fine grain analysis of
### texts, mainly emails, mainly spams to try and generate some
### synthetic regexps (or ideas of) regarding false positives or reinforcements.
### (and yes I know some tools exist in perl and I even use some of
### them which is another reason why I also do it in awk ;-)
### the wrapping is set at 72 like the 'mimencode' usage
### to avoid wrapping set ORS to nil.
###
Aargh(){
    r=$1
    shift
    printf "\n%s\nThats all, folks...\n\n" "${@}"
    exit $r
}

[ $# -gt 1 ] || Aargh 1 "something is direly in the unseen world"
### most used way by default, anyway as 2 parms are mandatory this is only belting the suspenders
WOT=${1:-d}
shift
### gawk -v wot=$WOT -v ORS='' '
### gawk -v wot=$WOT -v ORS='µ' '
gawk -v wot=$WOT '
function _ba64dec(_b64str,_BASE64,_wrap,_res,_ba,_by,_len,_i,_j)
{
    _len=split(_b64str,_ba,"")
    while (_i<=_len){
        if( 0==(++_wrap) %72){++_i;continue}
        ### get the 4 _bytes values and find their position in BASE64 base
        for(_j=1;_j<5;_j++){
            _by[_j] = index(_BASE64, _ba[++_i])
            _by[_j]--
        }
        ### Reconstruct ASCII string
        _res = _res sprintf( "%c", lshift(and(_by[1], 63), 2) + rshift(and(_by[2], 48), 4) )
        _res = _res sprintf( "%c", lshift(and(_by[2], 15), 4) + rshift(and(_by[3], 60), 2) )
        _res = _res sprintf( "%c", lshift(and(_by[3], 3), 6) + _by[4] )
        gsub(/[\x00\xff\xbf\x0f]/,"",_res)
    }
    return _res
}
function _ord(_char, i)
{
    while(++i<256) if (sprintf("%c", i) == _char) return i
}

function _ba64enc(_b64str,_BASE64,_wrap, _ba1,_ba2,_ba3,_ba4,_by1,_by2,_by3,_by4, _res)
{
    while (length(_b64str) > 0){
        ### find the values
        _by1 = _ord(substr(_b64str, 1, 1))
        if (length(_b64str) == 1){
            _by2 = 0
            _by3 = 0
        }
        if (length(_b64str) == 2){
            _by2 = _ord(substr(_b64str, 2, 1))
            _by3 = 0
        }
        if (length(_b64str) >= 3){
            _by2 = _ord(substr(_b64str, 2, 1))
            _by3 = _ord(substr(_b64str, 3, 1))
        }

        ### transform to BASE64 values
        _ba1 = rshift(_by1, 2)
        _ba2 = lshift(and(_by1, 3), 4) + rshift(and(_by2, 240), 4)
        _ba3 = lshift(and(_by2, 15), 2) + rshift(and(_by3, 192), 6)
        _ba4 = and(_by3, 63)

        ### transmute values to BASE64 string
        _res = _res substr(_BASE64, _ba1 + 1, 1)
        _res = _res substr(_BASE64, _ba2 + 1, 1)
        if (length(_b64str) == 1){
            _res = _res "=="
            _b64str = ""
        }
        if (length(_b64str) == 2){
            _res = _res substr(_BASE64, _ba3 + 1, 1)
            _res = _res "="
            _b64str = ""
        }
        if (length(_b64str) >= 3){
            _res = _res substr(_BASE64, _ba3 + 1, 1)
            _res = _res substr(_BASE64, _ba4 + 1, 1)
            _b64str = substr(_b64str, 4)
        }
        if( 0==(++_wrap) %18) _res=_res ORS
    }
    return _res
}
BEGIN{_w=0}
{
    ### Base64 for filenames given as alternate example, see RFC4648
    ### _BASE64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"
    _BASE64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
    print wot=="d"?_ba64dec($0,_BASE64,_w):_ba64enc($0,_BASE64,_w)
}
' ${@}
---------

Janis Papanagnou

Dec 28, 2011, 6:37:56 AM

You've found a tool, that's fine. Here, just for the record, is what you
should see using my code (with your test data embedded)...

$ cat hex2bin.sh

awk '
  BEGIN {
    n = split ("30,31,32,33,34,35,36,37,38,39,41,42,43,44,45,46",h,",")
    n = split ("0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F",l,",")
    for (i=1; i<=n; i++) m[h[i]] = i
  }
  {
    for (i=1; i<NF; i++)
      printf ("%s", l[m[$i]])
  }
' << EOT
31 30 31 31 31 32 31 33 31 34 31 35 31 36 0a 31 |10111213141516.1|
37 31 38 31 39 31 41 31 42 31 43 31 44 0a 31 45 |718191A1B1C1D.1E|
EOT

$ ksh hex2bin.sh
101112131415161718191A1B1C1D1E
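
A possible explanation for the earlier run that printed nothing, assuming file.hex held the raw hex text (one unbroken run of digits per line) rather than a space-separated hexdump: each line is then a single field, NF is 1, and the loop condition i<NF is never true. The sketch below reproduces both cases. It is the same logic as the script above with longer array names; the wrapper name hexdump_to_text is made up for illustration.

```shell
# Same logic as the script above: map hexdump byte fields such as "31"
# back to the hex digit they encode ("1"), skipping the last field.
hexdump_to_text() {
    awk '
    BEGIN {
        n = split("30,31,32,33,34,35,36,37,38,39,41,42,43,44,45,46", code, ",")
        n = split("0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F", digit, ",")
        for (i = 1; i <= n; i++) idx[code[i]] = i
    }
    { for (i = 1; i < NF; i++) printf("%s", digit[idx[$i]]) }
    '
}

# Space-separated hexdump line: the fields are decoded.
printf '31 30 31 31 |1011|\n' | hexdump_to_text; echo

# Raw .hex line: a single field, so NF is 1 and nothing is printed.
printf '10111213141516\n' | hexdump_to_text; echo
```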

Janis

Kenny McCormack

Dec 28, 2011, 9:20:18 AM

In article <4efad988$0$26329$426a...@news.free.fr>,

Well, I would still suggest that the first (and, effectively, the last) thing
you do with your newly found tool is to use it to get rz/sz onto the box.

>Thank you for having answered.

Thanks - it was fun to write.


pk

Jan 1, 2012, 2:07:36 PM

One way is to write your own hex2dec() function (trivial), then do
something like

awk '{for(i=1;i<length;i+=2)printf "%c", hex2dec(substr($0,i,2))}' my_utility.hex > my_utility

GNU awk (which I guess isn't what you have on your embedded system) has a
--non-decimal-data command line switch to automatically recognize, er, non
decimal numeric data, so that would make it a bit simpler.
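
A minimal sketch of what such a hex2dec() could look like in portable awk, wired into a decoder for the raw ".hex" format from the original post (two hex characters per byte, LF separators). The wrapper name hex2bin is an assumption for illustration. LC_ALL=C is set because gawk in a UTF-8 locale would otherwise emit multibyte sequences for values above 127. Some awk implementations reportedly mishandle printf "%c" with 0, so it is worth checking NUL bytes with a small test file before trusting the result.

```shell
# hex2bin: read hex text on stdin, write the decoded bytes on stdout.
hex2bin() {
    LC_ALL=C awk '
    # Convert a string of hex digits (either case) to a number.
    function hex2dec(s,    i, d, n) {
        s = toupper(s)
        n = 0
        for (i = 1; i <= length(s); i++) {
            d = index("0123456789ABCDEF", substr(s, i, 1))
            n = n * 16 + d - 1
        }
        return n
    }
    {
        # Drop anything that is not a hex digit (stray spaces, CR, ...).
        gsub(/[^0-9A-Fa-f]/, "")
        # Decode pairs; a trailing odd digit on a line is ignored.
        for (i = 1; i < length($0); i += 2)
            printf "%c", hex2dec(substr($0, i, 2))
    }'
}

# Usage (on the device):
#   hex2bin > my_utility      (then paste my_utility.hex, then Ctrl-D)
```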

Ed Morton

Jan 3, 2012, 12:43:16 PM

Janis Papanagnou <janis_pa...@hotmail.com> wrote:
<snip>

> n = split ("0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F",l,",")

<snip>

> { for (i=1; i<NF; i++) printf ("%s", l[m[$i]]) }

Just a suggestion - don't use the letter "l" for variable names as it
looks too much like the number "1" in some fonts. Right now with the
browser I'm using I can't see any difference at all between 1 and l.

Ed.


Janis Papanagnou

Jan 3, 2012, 12:57:17 PM

On 03.01.2012 18:43, Ed Morton wrote:
> Janis Papanagnou <janis_pa...@hotmail.com> wrote:
> <snip>
>> n = split ("0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F",l,",")
> <snip>
>> { for (i=1; i<NF; i++) printf ("%s", l[m[$i]]) }
>
> Just a suggestion - don't use the letter "l" for variable names as it
> looks too much like the number "1" in some fonts.

I know. With the fonts that I use I have no problems, and I remembered
about potential visibility problems just after I had sent the posting.

> Right now with the
> browser I'm using I can't see any difference at all between 1 and l.

I am sorry for the inconvenience.

Janis

>
> Ed.
>
> Posted using www.webuse.net

Kaz Kylheku

Jan 3, 2012, 6:40:21 PM

On 2011-12-27, Jean-Paul Iribarren <jpi.sp...@free.fr> wrote:
> Hi awk experts!
>
> I've been trying to write something by myself, but no success so far, so
> I consider the pragmatic approach: ask people who know. Here is the
> story: I am dealing with a Linux-based embedded device where in some
> situations the only connection to the outside world is the serial port
> used for the console. No pppd available to set a comfortable PLIP-based
> communication above the serial port. And I need to add some utilities to
> this embedded device...

The only utility you need to transport this way is the binary executable
of "rz", the Zmodem receive program. Then use that to upload everything
else.

Is there any possibility that you can upgrade the base image of this Linux
system to include these Zmodem utilities, so they are already there when you
need to upload some one-off thing?

> Hexdumps on the workstation:

Does the Linux host have no utility for decoding base64? No uudecode?

> - binary file: my_utility
>
> [...]
> 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F |................|
> [...]

The ASCII column is useless for the purpose. It would be more useful
to give yourself a hex address at the far left so that you know which
line is which, if you ever get confused.

00001F20 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F

But wait! A format very similar to this is already put out by an existing
utility: od, with arguments -tx1. More precisely "od -vtx1". The -v is needed
so that od does not condense runs of identical lines (such as all-zero
regions) into a single line consisting of just an asterisk.

$ od -vtx1 file > file.hex

Speaking of which, do you have a decompression utility on the embedded system?
It would be silly to cut and paste the hexdump of an uncompressed binary
where you do have reams of zeros.

To get "rz" onto the system, I'd strip all symbols and compress it first.

> so far I haven't been able to find the proper getline() / printf()
> combination in awk to achieve what I want...

There is already a printf utility, so you don't need awk.

#!/bin/bash
# reverse a dump produced by "od -vtx1"

if [ $# != 2 ] ; then
    echo "read the source code of $0 for usage"
    exit 1
fi

infile=$1
outfile=$2

while read address bytes ; do
    set -- $bytes
    for hexpair in "$@" ; do
        printf "\x$hexpair"
    done
done < "$infile" > "$outfile"

This example requires bash; if you have some stripped down shell like "ash", it
probably won't work because the printf won't do the hex escapes.

E.g. on Ubuntu:

bash $ printf "\x41\n"
A
$ dash
dash $ printf "\x41\n"
\x41

If you have the standalone printf from GNU coreutils, that yields a workaround:

dash $ /usr/bin/printf "\x41\n"
A
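
For a shell whose builtin printf lacks \x escapes, one possible workaround is to go through octal escapes, which POSIX requires printf to accept in the format string. A minimal sketch, assuming the shell's printf also accepts a 0x-prefixed argument for %o (dash and bash do; worth checking on the target). The function name unhex is made up for illustration, and the loop runs two printf calls per byte, so it will be slow on large files.

```shell
# unhex: reverse a dump produced by "od -vtx1", reading it on stdin.
unhex() {
    while read -r address bytes; do
        # The final od line holds only an address, so $bytes is empty there.
        for hexpair in $bytes; do
            # Inner printf turns e.g. 41 into 101; outer printf emits \101.
            printf "\\$(printf '%03o' "0x$hexpair")"
        done
    done
}

# Usage:
#   unhex < file.hex > file
```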