
number to binary / binary to number


Digi

Apr 16, 2022, 8:49:07 AM
What is "heavy" in gawk?

Here is an example of what I mean:

If you are parsing some structure, generating a graphics file, cutting an audio file, and so on, you need to work with numbers in their binary form:


Thus "0123" is 0x33323130 (bytes in hex: 30 31 32 33) when read as a little-endian 32-bit number.

So to convert a number to its binary form I need something like:


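BEGIN {
    # CHR[] maps a byte value to its character, ASC[] maps the character back
    for ( i = 0; i < 256; i++ )
        ASC[ CHR[ i ] = sprintf( "%.c", i ) ] = i

    n = numtobin32( 0x1234567 )

    print dump( n )    # dump() is not shown in this post
}

# pack the low 32 bits of n into four bytes, least significant byte first
func numtobin32( n ) {
    return CHR[ and( 0xFF, n ) ] \
           CHR[ and( 0xFF, rshift( n, 8 ) ) ] \
           CHR[ and( 0xFF, rshift( n, 16 ) ) ] \
           CHR[ and( 0xFF, rshift( n, 24 ) ) ]
}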

The opposite conversion looks even worse:

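# unpack a byte string into a number; the bytes are walked from last to
# first so that the byte order matches numtobin32() (least significant first)
func bintonum( t    ,a,r,A ) {
    a = split( t, A, "" )
    r = 0
    for ( ; a > 0; a-- )
        r = lshift( r, 8 ) + ASC[ A[ a ] ]
    return r
}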

The paradox is that both conversions above transform four bytes of data into another four bytes, and those are exactly the same four bytes ...
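
A minimal sketch of a round-trip check for the two conversions above. The names roundtrip, le32_encode, le32_decode and hexdump are made up for this example, and plain arithmetic stands in for and()/rshift() so that it runs under any POSIX awk, not only gawk. It assumes LC_ALL=C so that "%c" and substr() work on single bytes.

```shell
# Hypothetical round-trip check: number -> 4 bytes (little-endian) -> number.
# LC_ALL=C makes sprintf("%c") and substr() operate on single bytes.
roundtrip() {
    LC_ALL=C awk -v n="$1" '
    function le32_encode(n,    i, s) {      # least significant byte first
        for (i = 0; i < 4; i++) {
            s = s sprintf("%c", n % 256)
            n = int(n / 256)
        }
        return s
    }
    function le32_decode(t,    i, r) {      # walk from the last byte down
        for (i = length(t); i > 0; i--)
            r = r * 256 + ORD[substr(t, i, 1)]
        return r
    }
    function hexdump(t,    i, s) {          # bytes as space-separated hex
        for (i = 1; i <= length(t); i++)
            s = s sprintf("%s%02x", (i > 1 ? " " : ""), ORD[substr(t, i, 1)])
        return s
    }
    BEGIN {
        # byte -> value table; a NUL byte has no entry and so counts as 0
        for (i = 1; i < 256; i++) ORD[sprintf("%c", i)] = i
        b = le32_encode(n)
        printf "%s -> %d\n", hexdump(b), le32_decode(b)
        printf "%s -> 0x%x\n", hexdump("0123"), le32_decode("0123")
    }'
}
roundtrip 19088743    # 19088743 is 0x01234567
```

With the input above this should print "67 45 23 01 -> 19088743" followed by "30 31 32 33 -> 0x33323130". The round trip only works because both functions agree on the byte order.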

Janis Papanagnou

Apr 16, 2022, 9:06:28 AM
You haven't provided any question or statement in your post. Nonetheless
I'd say that the answer to your thoughts is @include "binary_ops", where
the library is what you may want to provide.

Janis

Janis Papanagnou

Apr 16, 2022, 9:21:08 AM
On 16.04.2022 15:06, Janis Papanagnou wrote:
> On 16.04.2022 14:49, Digi wrote:
>> What is "heavy" in gawk?
>>
>> Here is an example of what I mean:
>>
>> If you are parsing some structure, generating a graphics file, cutting an audio file, and so on, you need to work with numbers in their binary form:
>>
>>
>> Thus "0123" is 0x33323130 (bytes in hex: 30 31 32 33) when read as a little-endian 32-bit number.
>>
>> So to convert a number to its binary form I need something like:
>>
>>
>> BEGIN{
>> for ( i = 0; i < 256; i++ )
>> ASC[ CHR[ i ] = sprintf( "%.c", i ) ] = i
>>
[...]
>
> You haven't provided any question or statement in your post. Nonetheless
> I'd say that the answer to your thoughts is @include "binary_ops", where
> the library is what you may want to provide.

I saw that there's an 'ordchar' extension in gawk's extension directory
which may help you reduce own implementation efforts.

>
> Janis
>

Kenny McCormack

Apr 16, 2022, 9:48:00 AM
In article <613c5e48-d8f9-4df6...@googlegroups.com>,
Digi <cosm...@gmail.com> wrote:
>What is "heavy" in gawk?
>
>Here is an example of what I mean:
>
>If you are parsing some structure, generating a graphics file, cutting an
>audio file, and so on, you need to work with numbers in their binary
>form:

Yes, this is a weakness in the AWK model. The reason we program in AWK in
the first place is so that we don't have to worry about this stuff - i.e.,
the underlying representations of numbers and strings. We can just program
as if numbers and strings are native (and basically interchangeable) types.

But, every so often, we have a need to go below the surface. To get at the
underlying bits and bytes. In my case, this is usually when I want to
access some underlying Unix/Linux functionality that GAWK doesn't currently
provide. For example, I recently decided to re-implement the "touch"
utility in GAWK. To do so, I had to be able to access (one of the many)
"utime" function(s) from GAWK. As it happens, accessing the function
itself was easy (once you have already built functionality to access
arbitrary system calls from GAWK, as I have done), but the hard part (hard
only because it had not already been done) was creating functionality to
convert a GAWK number into the underlying binary representation needed to
pass to the system call.

This was all written up in a recent thread here on this newsgroup (q.v.).
The gist of it was that this function will do the work:

function encode(n,   i,s) {
    s = sprintf("%c",n)
    for (i=1; i<4; i++)
        s = s sprintf("%c",rshift(n,i*8))
    return s
}

But it is not exactly a thing of beauty.

Overall, I think the best advice I can give is that if you think you're
going to be doing this in any ongoing scale, you will probably end up
writing an extension library (in C) to do most of the nitty gritty stuff.

--
Men rarely (if ever) manage to dream up a God superior to themselves.
Most Gods have the manners and morals of a spoiled child.

Digi

Apr 17, 2022, 8:03:21 AM
Janis:

"I saw that there's an 'ordchar' extension in gawk's extension directory
which may help you reduce own implementation efforts."
Yeah, I usually just provide examples in their best-performing form,
and I just want to discuss some gawk topics.

I heard that this is a good place for that. Isn't it? Are there other places? )

Kenny:

"Yes, this is a weakness in the AWK model."

But I heard here, from somebody on the gawk team, that gawk positions itself as perfectly suited for "pure file parsing".
That still holds, but it is strange that the best language has this kind of weakness.

It looks like this kind of thing should be compensated for by two new dynamic extensions:

n = bintonum( t )

and something like:

t = numtobin( n, bytewide )

But this is not a perfect solution either, for at least one reason: dynamic extensions also require file infrastructure, which is hard to arrange on remote machines.

However, it looks like I should start doing that myself, I mean writing dynamic extensions ... it's time =)

regards
D
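
For comparison, a rough pure-awk sketch of the two proposed calls. It is not a dynamic extension, only an illustration of the interface: numtobin() and bintonum() take their names from the post above, while the byte order (least significant byte first), the BYTE table, the binary_ops variable and the bin_demo wrapper are assumptions made for this example. Keeping the library text in a shell variable is one way around installing extra files on a remote machine. LC_ALL=C is assumed so that "%c" and substr() work on single bytes.

```shell
# Hypothetical pure-awk stand-ins for the proposed numtobin()/bintonum().
# The library lives in a shell variable, so no extra file has to be installed.
binary_ops='
function numtobin(n, bytewide,    s) {      # least significant byte first
    while (bytewide-- > 0) {
        s = s sprintf("%c", n % 256)
        n = int(n / 256)
    }
    return s
}
function bintonum(t,    i, r) {              # inverse of numtobin()
    if (!BYTE_READY) {                       # build the byte table once
        for (i = 1; i < 256; i++) BYTE[sprintf("%c", i)] = i
        BYTE_READY = 1
    }
    for (i = length(t); i > 0; i--)
        r = r * 256 + BYTE[substr(t, i, 1)]
    return r + 0
}
'
# Encode n in the given number of bytes, decode it again,
# and print the byte count followed by the decoded value.
bin_demo() {
    LC_ALL=C awk -v n="$1" -v w="$2" "$binary_ops"'
    BEGIN { t = numtobin(n, w); print length(t), bintonum(t) }'
}
bin_demo 66051 3      # 66051 is 0x010203
bin_demo 66051 2      # only the low two bytes (0x0203 = 515) survive
```

The two calls should print "3 66051" and "2 515". A width smaller than the value needs silently drops the high bytes, which a real extension would probably want to report.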




Kpop 2GM

Apr 22, 2022, 9:38:59 AM
[[ REPOSTING due to minor mawk-1/nawk bug found ]]

@Digi : If you want a num-to-bin to handle it all, even when using gawk unicode mode, you can try this :

The exact code can print out any arbitrary 4-byte combination in mawk-1.3.4, mawk-1.9.9.6, gawk 5.1.1 byte mode, gawk 5.1.1 unicode mode, and macOS nawk.

It also has auto awk-variant detection in order to make necessary behavior adjustments, such as mawk-1 not printing large hex or negative hex, nawk not printing negative hex, mawk-2 not having auto-comma feature, and ensuring gawk-unicode-mode doesn't interpret the values as multi-byte unicode code points. (remove the leading dots - they're only for formatting purposes on newsgroup)

.. gawk -e '
....function encode(_,__,___) {
....return \
....sprintf("%c%c%c%c",(__<__)*((_%=___=(__^=__^=__+=__=__==__)\
........*__*__*__)+(_=int(_+___)%___))+int(_/=___/=__)%__+___,
.......................................... int(_*=__)%__+___,
..........................................int(_*=__)%__+___,
........................................int(_*__)%__+___)
.. } BEGIN {
........___=2^2^5; __["1701734259"]__["3891792015"]
........__["2405365991"]__[sprintf("%.f",3^32-4^7)]
........__["-444025027"]=__["3850942269"]=-1

........PROCINFO["sorted_in"]="@val_num_asc"

........for(_ in __) {
............printf(" \t 32-bit usgned <( %"(\
..................("\333\222")~"[^\333\222]"?"":"\47"\
.................. )"22.f | 0x %16.8x )> = [[ %.4s\t]] \n",
.................._, +"0x1" ? ((_%___)+___)%___ : _, encode(_))
........}
....}'

........32-bit usgned <(.......... -444,025,027 | 0x ffffffffe588b73d )> = [[ 刷= ]]
........32-bit usgned <(..........3,850,942,269 | 0x........ e588b73d )> = [[ 刷= ]]
........32-bit usgned <(..........1,701,734,259 | 0x........ 656e6773 )> = [[ engs ]]
........32-bit usgned <(..1,853,020,188,835,457 | 0x....6954fe21dfe81 )> = [[ ??? ]]
........32-bit usgned <(..........2,405,365,991 | 0x........ 8f5ef8e7 )> = [[ ?^?? ]]
........32-bit usgned <(..........3,891,792,015 | 0x........ e7f8088f )> = [[ ?? ]]

% echo; mawk 'function encode(_,__,___) { return sprintf("%c%c%c%c",(__<__)*( (_%=___=(__^=__^=__+=__=__==__)*__*__*__)+(_=int(_+___)%___))+int(_/=___/=__)%__+___, int(_*=__)%__+___,int(_*=__)%__+___,int(_*__)%__+___) } BEGIN {___=2^2^5; __["1701734259"]__["3891792015"]__["2405365991"]__[sprintf("%.f",3^32-4^7)];__["-444025027"]=__["3850942269"]=-1; PROCINFO["sorted_in"]="@val_num_asc"; for(_ in __) { printf(" \t 32-bit usgned <( %"(("\333\222")~"[^\333\222]"?"":"\47")"22.f | 0x %16.8x )> = [[ %.4s\t]] \n",_,+"0x1" ? ((_%___)+___)%___ : _, encode(_)) } }'

........32-bit usgned <(..........1,701,734,259 | 0x........ 656e6773 )> = [[ engs ]]
........32-bit usgned <(..1,853,020,188,835,457 | 0x........ e21dfe81 )> = [[ ??? ]]
........32-bit usgned <(..........2,405,365,991 | 0x........ 8f5ef8e7 )> = [[ ?^?? ]]
........32-bit usgned <(..........3,891,792,015 | 0x........ e7f8088f )> = [[ ?? ]]
........32-bit usgned <(..........3,850,942,269 | 0x........ e588b73d )> = [[ 刷= ]]
........32-bit usgned <(.......... -444,025,027 | 0x........ e588b73d )> = [[ 刷= ]]

% echo; mawk2 'function encode(_,__,___) { return sprintf("%c%c%c%c",(__<__)*( (_%=___=(__^=__^=__+=__=__==__)*__*__*__)+(_=int(_+___)%___))+int(_/=___/=__)%__+___, int(_*=__)%__+___,int(_*=__)%__+___,int(_*__)%__+___) } BEGIN {___=2^2^5; __["1701734259"]__["3891792015"]__["2405365991"]__[sprintf("%.f",3^32-4^7)];__["-444025027"]=__["3850942269"]=-1; PROCINFO["sorted_in"]="@val_num_asc"; for(_ in __) { printf(" \t 32-bit usgned <( %"(("\333\222")~"[^\333\222]"?"":"\47")"22.f | 0x %16.8x )> = [[ %.4s\t]] \n",_,+"0x1" ? ((_%___)+___)%___ : _, encode(_)) } }'

........32-bit usgned <(............ 3891792015 | 0x........ e7f8088f )> = [[ ?? ]]
........32-bit usgned <(............ 3850942269 | 0x........ e588b73d )> = [[ 刷= ]]
........32-bit usgned <(............ 1701734259 | 0x........ 656e6773 )> = [[ engs ]]
........32-bit usgned <(...... 1853020188835457 | 0x....6954fe21dfe81 )> = [[ ??? ]]
........32-bit usgned <(............ 2405365991 | 0x........ 8f5ef8e7 )> = [[ ?^?? ]]
........32-bit usgned <(............ -444025027 | 0x ffffffffe588b73d )> = [[ 刷= ]]

% echo; nawk 'function encode(_,__,___) { return sprintf("%c%c%c%c",(__<__)*( (_%=___=(__^=__^=__+=__=__==__)*__*__*__)+(_=int(_+___)%___))+int(_/=___/=__)%__+___, int(_*=__)%__+___,int(_*=__)%__+___,int(_*__)%__+___) } BEGIN {___=2^2^5; __["1701734259"]__["3891792015"]__["2405365991"]__[sprintf("%.f",3^32-4^7)];__["-444025027"]=__["3850942269"]=-1; PROCINFO["sorted_in"]="@val_num_asc"; for(_ in __) { printf(" \t 32-bit usgned <( %"(("\333\222")~"[^\333\222]"?"":"\47")"22.f | 0x %16.8x )> = [[ %.4s\t]] \n",_,+"0x1" ? ((_%___)+___)%___ : _, encode(_)) } }'

........32-bit usgned <(..........3,850,942,269 | 0x........ e588b73d )> = [[ 刷= ]]
........32-bit usgned <(..........3,891,792,015 | 0x........ e7f8088f )> = [[ ?? ]]
........32-bit usgned <(..........1,701,734,259 | 0x........ 656e6773 )> = [[ engs ]]
........32-bit usgned <(.......... -444,025,027 | 0x........ e588b73d )> = [[ 刷= ]]
........32-bit usgned <(..........2,405,365,991 | 0x........ 8f5ef8e7 )> = [[ ?^?? ]]
........32-bit usgned <(..1,853,020,188,835,457 | 0x........ e21dfe81 )> = [[ ??? ]]

% echo; gawk -b -e 'function encode(_,__,___) { return sprintf("%c%c%c%c",(__<__)*( (_%=___=(__^=__^=__+=__=__==__)*__*__*__)+(_=int(_+___)%___))+int(_/=___/=__)%__+___, int(_*=__)%__+___,int(_*=__)%__+___,int(_*__)%__+___) } BEGIN {___=2^2^5; __["1701734259"]__["3891792015"]__["2405365991"]__[sprintf("%.f",3^32-4^7)];__["-444025027"]=__["3850942269"]=-1; PROCINFO["sorted_in"]="@val_num_asc"; for(_ in __) { printf(" \t 32-bit usgned <( %"(("\333\222")~"[^\333\222]"?"":"\47")"22.f | 0x %16.8x )> = [[ %.4s\t]] \n",_,+"0x1" ? ((_%___)+___)%___ : _, encode(_)) } }'

........32-bit usgned <(.......... -444,025,027 | 0x ffffffffe588b73d )> = [[ 刷= ]]
........32-bit usgned <(..........3,850,942,269 | 0x........ e588b73d )> = [[ 刷= ]]
........32-bit usgned <(..........1,701,734,259 | 0x........ 656e6773 )> = [[ engs ]]
........32-bit usgned <(..1,853,020,188,835,457 | 0x....6954fe21dfe81 )> = [[ ??? ]]
........32-bit usgned <(..........2,405,365,991 | 0x........ 8f5ef8e7 )> = [[ ?^?? ]]
........32-bit usgned <(..........3,891,792,015 | 0x........ e7f8088f )> = [[ ?? ]]
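
For readers who want the arithmetic without the underscore variables, here is a rough, readable sketch of the same idea: reduce the input modulo 2^32 so that negative and oversized values wrap, then emit the four bytes most significant first. The name be32 is made up for this example. It does not reproduce the awk-variant detection or the gawk Unicode-mode workaround from the code above; it simply assumes LC_ALL=C so that "%c" yields single bytes.

```shell
# Readable sketch of a big-endian 32-bit encoder (hypothetical name be32).
# The input is reduced modulo 2^32 first, so negative and oversized values wrap.
be32() {
    LC_ALL=C awk -v n="$1" '
    BEGIN {
        m = 4294967296                     # 2^32
        n = ((n % m) + m) % m              # wrap into 0 .. 2^32-1
        for (i = 3; i >= 0; i--) {         # most significant byte first
            b = int(n / 256 ^ i) % 256
            raw = raw sprintf("%c", b)     # the four raw bytes
            hex = hex sprintf("%02x", b)   # the same bytes as hex digits
        }
        printf "%s %s\n", hex, raw
    }'
}
be32 1701734259       # prints "656e6773 engs"
```

Since -444025027 and 3850942269 differ by exactly 2^32, be32 gives the same four bytes (e5 88 b7 3d) for both, which matches the listings above.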