
calculating the value of two or four bytes?


Tars_Tarkas

Jan 27, 2003, 9:55:37 PM
Hi there!

If this is not the right newsgroup to ask, I'll gladly accept pointers
to places to ask.

I'm currently writing a hex-editor (under Linux, but I want it to work
"everywhere" that does ANSI C (and curses, which I use to display
stuff)). Mostly because the ones I found I didn't like, but also as an
exercise in portable C-coding.

I thought it would be a good feature to display the value of the two and
the four bytes starting at the cursor position in the status line.

Now, I have a char buffer (char *buffer) which contains the currently
displayed bytes.

What I had in mind is (basically):

[ ... ]
#include <net/in.h>
[ ... ]
n2 = *((uint16_t *)(&buffer[cursor]));
n4 = *((uint32_t *)(&buffer[cursor]));

and let the compiler figure it out. I have some problems with that:

a) I use net/in.h for the uint16_t and uint32_t types to be somewhat
platform independent. I know it exists on Linux and Solaris, so I'm
figuring most UNIXes are covered, but how about elsewhere? That's
assuming elsewhere also has the curses library for text-mode displaying
stuff...

b) This of course assumes that 16 bit == 2 bytes and 32 bit == 4 bytes,
but what if that's not true? I mean is there a standard C way of saying
"a type the size of sizeof(char)*2 or sizeof(char)*4" (sizeof(char)
being one, IIRC).

c) For reference, I'm testing on i386/Linux and Sparc/Solaris. The
problem I have with that is, on Linux it works all the time, but on
Solaris it bus errors if cursor (the current byte the cursor is
highlighting) indexes a non-2-bytes aligned address (in the n2 case,
non-4-bytes aligned in the n4 case).

Now, this may seem like a hardware/platform specific question, but what
I really want to know is not how to fix this on Solaris, but whether it
is broken at all. And if anybody has a better/more portable way of doing
this.

I thought it was a nice way to avoid any endianness issues which I would
otherwise have to deal with. I have some other ideas how to do this, but
all of them involve figuring out whether the system is little or big
endian, which I don't particularly want to do (if I can avoid it)...

Well, much blah for a simple question, I'm sure, but any help appreciated.

TIA,
Tars Tarkas

Bertrand Mollinier Toublet

Jan 27, 2003, 10:48:16 PM
Tars_Tarkas wrote:
> Hi there!
>
> <snip>

>
> I thought it would be a good feature to display the value of the two and
> the four bytes starting at the cursor position in the status line.
>
> Now, I have a char buffer (char *buffer) which contains the currently
> displayed bytes.
>
> What I had in mind is (basically):
>
> [ ... ]
> #include <net/in.h>
> [ ... ]
> n2 = *((uint16_t *)(&buffer[cursor]));
> n4 = *((uint32_t *)(&buffer[cursor]));
>
> and let the compiler figure it out. I have some problems with that:
>
> a) I use net/in.h for the uint16_t and uint32_t types to be somewhat
> platform independent. I know it exists on Linux and Solaris, so I'm
> figuring most UNIXes are covered, but how about elsewhere? That's
> assuming elsewhere also has the curses library for text-mode displaying
> stuff...

net/in.h is not a standard (ANSI/ISO) header. So much for portability :-)
However, C99 defines fixed-width integer types (in <stdint.h>). Given the
current deployment of C99-capable platforms, this might not be very
relevant to you.

>
> b) This of course assumes that 16 bit == 2 bytes and 32 bit == 4 bytes,
> but what if that's not true? I mean is there a standard C way of saying
> "a type the size of sizeof(char)*2 or sizeof(char)*4" (sizeof(char)
> being one, IIRC).

I don't know of such a way. CHAR_BIT, defined in limits.h, gives you the
number of bits per char.
Now you might want to think about where you are going to use your hex
editor. I am not sure it would be very relevant to use it on a platform
where chars are not 8 bits (or multiples of it, maybe) long. This might
let you make some assumptions and thus let you simplify your design, and
eventually your code.


>
> c) For reference, I'm testing on i386/Linux and Sparc/Solaris. The
> problem I have with that is, on Linux it works all the time, but on
> Solaris it bus errors if cursor (the current byte the cursor is
> highlighting) indexes a non-2-bytes aligned address (in the n2 case,
> non-4-bytes aligned in the n4 case).
>
> Now, this may seem like a hardware/platform specific question, but what
> I really want to know is not how to fix this on Solaris, but whether it
> is broken at all. And if anybody has a better/more portable way of doing
> this.
>
> I thought it was a nice way to avoid any endianness issues which I would
> otherwise have to deal with. I have some other ideas how to do this, but
> all of them involve figuring out whether the system is little or big
> endian, which I don't particularly want to do (if I can avoid it)...
>

To make a long story short: yes, your idea is broken, as
evidenced by the failure on Sparc/Solaris. Casting a pointer to
char to a pointer to a bigger type creates alignment issues.

As far as I know, you will have to end up determining the endianness
yourself (either at run-time or at compile-time), and compute your
2-bytes and 4-bytes values with masks and shifts...

--
Bertrand Mollinier Toublet
Currently looking for employment in the San Francisco Bay Area
http://bmt-online.dapleasurelounge.com/

Tars_Tarkas

Jan 27, 2003, 11:31:14 PM
Bertrand Mollinier Toublet wrote:
> Tars_Tarkas wrote:
>
>> Hi there!
>>
> > <snip>
>
>>
>> I thought it would be a good feature to display the value of the two
>> and the four bytes starting at the cursor position in the status line.
>>
>> Now, I have a char buffer (char *buffer) which contains the currently
>> displayed bytes.
>>
>> What I had in mind is (basically):
>>
>> [ ... ]
>> #include <net/in.h>
>> [ ... ]
>> n2 = *((uint16_t *)(&buffer[cursor]));
>> n4 = *((uint32_t *)(&buffer[cursor]));
>>
>> and let the compiler figure it out. I have some problems with that:
>>
>> a) I use net/in.h for the uint16_t and uint32_t types to be somewhat
>> platform independent. I know it exists on Linux and Solaris, so I'm
>> figuring most UNIXes are covered, but how about elsewhere? That's
>> assuming elsewhere also has the curses library for text-mode
>> displaying stuff...
>
>
> net/in.h is not a standard (ANSI/ISO) header. So much for portability :-)
> However, C99 defines fixed-width integer types (in <stdint.h>). Given the
> current deployment of C99-capable platforms, this might not be very
> relevant to you.

Well, it's good to know that this is now getting included. Though I
don't know if it's a good thing or a bad thing to have...

>> b) This of course assumes that 16 bit == 2 bytes and 32 bit == 4
>> bytes, but what if that's not true? I mean is there a standard C way
>> of saying "a type the size of sizeof(char)*2 or sizeof(char)*4"
>> (sizeof(char) being one, IIRC).
>
>
> I don't know of such a way. CHAR_BIT, defined in limits.h, gives you the
> number of bits per char.
> Now you might want to think about where you are going to use your hex
> editor.

At the moment I'm going to use it on my Linux box to edit a Wizardry 8
save file (not for character boosting, but I killed a rather important
NPC about a zillion save games ago and I don't feel like redoing
everything since then, so I want to bring that character back
to life with some non-game magic). Aside from that, it has always
bothered me that there wasn't a simple little hex editor for Unix in
text mode (that I know of), so I just decided to write one.

> I am not sure it would be very relevant to use it on a platform
> where chars are not 8 bits (or multiples of it, maybe) long. This might
> let you make some assumptions and thus let you simplify your design, and
> eventually your code.

Yeah, I see the wisdom in that. I guess I'll just assume that a byte is
8 bits for now and code it in a way that if I have to, it'll be easy to
take care of other cases.

>>
>> c) For reference, I'm testing on i386/Linux and Sparc/Solaris. The
>> problem I have with that is, on Linux it works all the time, but on
>> Solaris it bus errors if cursor (the current byte the cursor is
>> highlighting) indexes a non-2-bytes aligned address (in the n2 case,
>> non-4-bytes aligned in the n4 case).
>>
>> Now, this may seem like a hardware/platform specific question, but
>> what I really want to know is not how to fix this on Solaris, but
>> whether it is broken at all. And if anybody has a better/more portable
>> way of doing this.
>>
>> I thought it was a nice way to avoid any endianness issues which I
>> would otherwise have to deal with. I have some other ideas how to do
>> this, but all of them involve figuring out whether the system is
>> little or big endian, which I don't particularly want to do (if I can
>> avoid it)...
>>
>
> To make a long story short: yes, your idea is broken, as
> evidenced by the failure on Sparc/Solaris.

[1]


> Casting a pointer to char to a pointer to a bigger type creates
> alignment issues.
>
> As far as I know, you will have to end up determining the endianness
> yourself (either at run-time or at compile-time), and compute your
> 2-bytes and 4-bytes values with masks and shifts...

Not what I hoped for, but I already guessed it was like this. No free
lunch today ;)

Well, thanks for the quick answer and all the information,
Tars Tarkas

[1](Actually I'm beginning to like Solaris more and more as a test
platform, it points out many subtle bugs/incorrect code that Linux just
tolerates (Solaris initializing memory with 0, for example, makes array
mis-indexing much easier to find))

Tars_Tarkas

Jan 28, 2003, 12:08:51 AM
Bertrand Mollinier Toublet wrote:
> Tars_Tarkas wrote:
>
>> Hi there!
>>
> > <snip>
>
[ snip writing as portable as I can? ]
>> [ ... ]
>> #include <net/in.h>
[ snip ]

> net/in.h is not a standard (ANSI/ISO) header. So much for portability :-)

I just rechecked (which I should've done before posting...): I did mean
<netinet/in.h>. But that's probably not ANSI/ISO either.

Well, no matter, since the only use of including this has disappeared
anyway.
--
Tars_Tarkas, human C coder, killed by dereferencing NULL.

Kevin Easton

Jan 28, 2003, 1:23:38 AM

You don't need to compute the endianness of the platform - just display
"bytes interpreted as 2-byte little endian", "bytes interpreted as
2-byte big endian", "bytes interpreted as 4-byte little endian" and so
on - after all, the endianness of the platform might not match the
endianness of the data that someone's viewing - only they know that
themselves.
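
A minimal sketch of that suggestion, assuming 8-bit bytes: compute both interpretations of the two bytes at the cursor and let the status line show each one with a label. The names `u16_views` and `view_u16` are made up for illustration and do not come from this thread.

```c
/* Both interpretations of the same two bytes (hypothetical type). */
struct u16_views {
    unsigned le;  /* bytes read as 2-byte little endian */
    unsigned be;  /* bytes read as 2-byte big endian */
};

/* Assumes 8-bit bytes and at least two readable bytes at p. */
static struct u16_views view_u16(const unsigned char *p)
{
    struct u16_views v;
    v.le = (unsigned)p[0] | ((unsigned)p[1] << 8);
    v.be = ((unsigned)p[0] << 8) | (unsigned)p[1];
    return v;
}
```

Because the bytes are combined by value, the result does not depend on the byte order of the machine running the editor.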

- Kevin.

those who know me have no need of my name

Jan 28, 2003, 5:53:23 AM
in comp.lang.c i read:

>I'm currently writing a hex-editor

>I thought it would be a good feature to display the value of the two
>and the four bytes starting at the cursor position in the status line.
>
>Now, I have a char buffer (char *buffer) which contains the currently
>displayed bytes.

in which case you have the two or four bytes at hand ...

>n2 = *((uint16_t *)(&buffer[cursor]));
>n4 = *((uint32_t *)(&buffer[cursor]));

so why are you considering these highly unsafe casts? these attempt to
treat the data as other than bytes, which has alignment and representation
issues. but you want to display bytes, which seems to indicate:

    assert(8 == CHAR_BIT);
    char display[2+1];
    /* go through unsigned char so bytes >= 0x80 do not sign-extend */
    sprintf(display,"%02x",(unsigned)(unsigned char)buffer[cursor]);

since you are already doing this i guess you mean that you want to use the
bytes at the cursor as if they were integers. you cannot do that portably
the way you'd like because on some platforms some bit patterns are not
valid integer values, you must use the value not the representation of the
bytes, but then you'd also need to cope with each platform's byte ordering,
which is what you are trying to avoid. oh well, you can't win all your
battles. it's not tough to deal with; you would write a routine that does
the task, but which would be different, or at least potentially so, on each
platform, i.e., you would provide `ports'. on some platforms it will be as
simple as a memcpy, on others you will have to take the value of each byte
which you add into an accumulator, in the `proper' order.

>a) I use net/in.h for the uint16_t and uint32_t types to be somewhat
>platform independent. I know it exists on Linux and Solaris, so I'm
>figuring most UNIXes are covered, but how about elsewhere?

you have only posix/sus portability, once you use the proper header. if
you want full iso c language portability you need to dump the header. if
you can presume/require c99 or at least stdint.h then you can test for and
use uint_fastN_t, uintN_t or uint_leastN_t, though if any of that isn't
available there is nothing available that can be used as you desire.
unsigned int and unsigned long can be used within the constraints of your
other presumptions, mainly that CHAR_BIT will be 8.
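
One way to "test for and use" the C99 types while still falling back to plain C90 could look like the sketch below. It relies on the fact that `<stdint.h>` defines `UINT32_MAX` only when `uint32_t` exists; the typedef name `u32_type` is hypothetical.

```c
#include <limits.h>

/* Only pull in <stdint.h> when the compiler claims C99 or later. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#include <stdint.h>
#endif

#ifdef UINT32_MAX
typedef uint32_t u32_type;       /* exact-width type is available */
#else
typedef unsigned long u32_type;  /* guaranteed to hold at least 32 bits */
#endif
```

With the fallback the type may be wider than 32 bits, so values should be masked with 0xFFFFFFFFUL where exact wrap-around matters.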

>That's assuming elsewhere also has the curses library for text-mode
>displaying stuff...

poor assumption, i.e., that will limit the portability. (there's nothing
wrong with that, i'm just noting it. also, posix/sus and curses stuff is
off-topic here, if you have problems with those aspects you need to post
to comp.unix.programming.)

>b) This of course assumes that 16 bit == 2 bytes and 32 bit == 4
>bytes, but what if that's not true? I mean is there a standard C way
>of saying "a type the size of sizeof(char)*2 or sizeof(char)*4"
>(sizeof(char) being one, IIRC).

sizeof(char) is always 1. always.

int and long must be at least this wide, respectively, when CHAR_BIT is 8.
it's very likely that your code would not survive/work on a platform with
any other CHAR_BIT value so perhaps they (or rather the unsigned versions)
would be useful to you.

--
bringing you boring signatures for 17 years

those who know me have no need of my name

Jan 28, 2003, 6:01:10 AM
in comp.lang.c i read:

>it has always bothered me that there wasn't a simple little hex editor for
>unix in text mode (that I know of), so I just decided to write one.

better search tools might help you. well, they'll help in obtaining what
you want without having to write it. not that the exercise in writing a
portable application isn't worth the effort -- it is, well worth it in
fact.

>I guess I'll just assume that a byte is 8 bits for now and code it in a
>way that if I have to, it'll be easy to take care of other cases.

don't assume, insure -- test CHAR_BIT.
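
A sketch of what testing CHAR_BIT might look like at compile time rather than with a run-time assert. The `#error` message and the constant `HEX_DIGITS_PER_BYTE` are illustrative, not something from this thread.

```c
#include <limits.h>

/* Refuse to build where a byte is not 8 bits wide. */
#if CHAR_BIT != 8
#error "this program assumes 8-bit bytes (CHAR_BIT == 8)"
#endif

/* Hypothetical constant: hex digits needed to print one byte. */
enum { HEX_DIGITS_PER_BYTE = (CHAR_BIT + 3) / 4 };
```

A build failure on an unusual platform is easier to diagnose than a display that is silently wrong.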

>[1](Actually I'm beginning to like Solaris more and more as a test
>platform, it points out many subtle bugs/incorrect code that Linux
>just tolerates (Solaris initializing memory with 0, for example, makes
>array mis-indexing much easier to find))

linux isn't tolerating the problems, it's the platform (ia-32 i'm guessing)
which is doing so. if you were running linux/sparc you would have had the
same problem, and if you were running solaris/x86 you would have had the
same toleration.

CBFalconer

Jan 28, 2003, 7:00:19 AM
Tars_Tarkas wrote:
>
> I'm currently writing a hex-editor (under Linux, but I want it to work
> "everywhere" that does ANSI C (and curses, which I use to display
> stuff)). Mostly because the ones I found I didn't like, but also as an
> exercise in portable C-coding.
>
> I thought it would be a good feature to display the value of the two and
> the four bytes starting at the cursor position in the status line.
>
> Now, I have a char buffer (char *buffer) which contains the currently
> displayed bytes.
>
> What I had in mind is (basically):
>
> [ ... ]
> #include <net/in.h>

You have just gone totally non-portable. Curses also makes you
non-portable, but that can be isolated in a display module.

I think you would be better off deciding what you are displaying
(ints, floats, whatever) and then showing their hex
representation. Don't forget that bytes can be larger than 8 bits
(see CHAR_BIT), and the sizeof operator can show you how many of
those are needed to hold an object. Then you can make displays
such as:

xxxxxxxx yyyyyyyy xxxxxxxx yyyyyyyy
xx xx xx xx yy yy yy yy xx xx xx xx yy yy yy yy

where the longer values are the normal representation (eg 1.2345)
and the shorter one is the byte decomposition of that.
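
The byte decomposition of an arbitrary object can be sketched by walking it through an unsigned char pointer, which C permits for any object type. The helper name `object_bytes_to_hex` is hypothetical, and the two-digit format assumes 8-bit bytes.

```c
#include <stdio.h>

/* Write the bytes of any object as two-digit hex pairs separated by
   spaces. out must have room for 3 * size characters (at least 1).
   Returns the number of characters written, not counting the '\0'. */
static int object_bytes_to_hex(const void *obj, size_t size, char *out)
{
    const unsigned char *p = (const unsigned char *)obj;
    size_t i;
    int len = 0;

    out[0] = '\0';
    for (i = 0; i < size; i++)
        len += sprintf(out + len, i ? " %02x" : "%02x", (unsigned)p[i]);
    return len;
}
```

Called as `object_bytes_to_hex(&x, sizeof x, out)`, it shows the bytes in the order they sit in memory, whatever the type of `x`.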

Let Delia and John Carter handle the distribution via the Helium
war fleet. You will have all 4 hands full, besides which you are
green. IIRC.

--
Chuck F (cbfal...@yahoo.com) (cbfal...@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!


Tars_Tarkas

Jan 28, 2003, 7:35:41 AM
those who know me have no need of my name wrote:
> in comp.lang.c i read:
>
>
>>I'm currently writing a hex-editor
>
>
>>I thought it would be a good feature to display the value of the two
>>and the four bytes starting at the cursor position in the status line.
>>
>>Now, I have a char buffer (char *buffer) which contains the currently
>>displayed bytes.
>
>
> in which case you have the two or four bytes at hand ...
>
>
>>n2 = *((uint16_t *)(&buffer[cursor]));
>>n4 = *((uint32_t *)(&buffer[cursor]));
>
>
> so why are you considering these highly unsafe casts?

Precisely because I didn't consider them highly unsafe, until the sparc
machine jumped at me with a (bus) error. And it seemed the easy way out.

> these attempt to
> treat the data as other than bytes, which has alignment and representation
> issues. but you want to display bytes, which seems to indicate:
>
> assert(8 == CHAR_BIT);
> char display[2+1];
> sprintf(display,"%02x",(unsigned)(unsigned char)buffer[cursor]);

Yes something like that. The assert is a good idea, hadn't thought of
that yet.

> since you are already doing this i guess you mean that you want to use the
> bytes at the cursor as if they were integers.

Yep, that's about it. I figured it would be useful to know.

> you cannot do that portably
> the way you'd like because on some platforms some bit patterns are not
> valid integer values, you must use the value not the representation of the
> bytes, but then you'd also need to cope with each platform's byte ordering,
> which is what you are trying to avoid. oh well, you can't win all your
> battles.

Seems so. Or rather I can't win them the way I want to, since the
overall aim of this battle is to write a portable hex editor, which I
might actually be able to do (given enough time).

> it's not tough to deal with; you would write a routine that does
> the task, but which would be different, or at least potentially so, on each
> platform, i.e., you would provide `ports'. on some platforms it will be as
> simple as a memcpy, on others you will have to take the value of each byte
> which you add into an accumulator, in the `proper' order.

Hmm... This is getting downright complicated... ;) Well, I'll see what
can be done.

>>a) I use net/in.h for the uint16_t and uint32_t types to be somewhat
>>platform independent. I know it exists on Linux and Solaris, so I'm
>>figuring most UNIXes are covered, but how about elsewhere?
>
>
> you have only posix/sus portability, once you use the proper header. if
> you want full iso c language portability you need to dump the header. if
> you can presume/require c99 or at least stdint.h then you can test for and
> use uint_fastN_t, uintN_t or uint_leastN_t, though if any of that isn't
> available there is nothing available that can be used as you desire.
> unsigned int and unsigned long can be used within the constraints of your
> other presumptions, mainly that CHAR_BIT will be 8.

Well, presuming or requiring C99 doesn't sound like a good idea here. I
think C99 compilers aren't all that widespread yet, or
am I mistaken?

>>That's assuming elsewhere also has the curses library for text-mode
>>displaying stuff...
>
>
> poor assumption, i.e., that will limit the portability. (there's nothing
> wrong with that, i'm just noting it. also, posix/sus and curses stuff is
> off-topic here, if you have problems with those aspects you need to post
> to comp.unix.programming.)

I was just noting that I was using curses, which does limit portability,
I agree, but writing my own display code is a bit too much (for now).

And the question wasn't about posix/sus anyway... If I feel up to it, I
can always see how nethack handles its display stuff (nethack text-mode
I mean, which compiles and runs on very many platforms with and without
curses).

>>b) This of course assumes that 16 bit == 2 bytes and 32 bit == 4
>>bytes, but what if that's not true? I mean is there a standard C way
>>of saying "a type the size of sizeof(char)*2 or sizeof(char)*4"
>>(sizeof(char) being one, IIRC).
>
>
> sizeof(char) is always 1. always.

Good. At least something I remembered correctly.

> int and long must be at least this wide, respectively, when CHAR_BIT is 8.
> it's very likely that your code would not survive/work on a platform with
> any other CHAR_BIT value so perhaps they (or rather the unsigned versions)
> would be useful to you.

Ok, I'll see if those can help me here. Thanks!
--
GP_Spukgestalt, human C coder, killed by pointing the wrong way.

Tars_Tarkas

Jan 28, 2003, 7:39:04 AM
those who know me have no need of my name wrote:
> in comp.lang.c i read:
>
>
>>it has always bothered me that there wasn't a simple little hex editor for
>>unix in text mode (that I know of), so I just decided to write one.
>
>
> better search tools might help you.

I did try freshmeat.net, which had a hex editor with a GUI for the GNOME
desktop as the first link. After the silly GNOME wouldn't accept that I
wanted to compile the thing, I decided to just write it myself.

> well, they'll help in obtaining what
> you want without having to write it. not that the exercise in writing a
> portable application isn't worth the effort -- it is, well worth it in
> fact.
>
>
>>I guess I'll just assume that a byte is 8 bits for now and code it in a
>>way that if I have to, it'll be easy to take care of other cases.
>
>
> don't assume, insure -- test CHAR_BIT.

Done.

>>[1](Actually I'm beginning to like Solaris more and more as a test
>>platform, it points out many subtle bugs/incorrect code that Linux
>>just tolerates (Solaris initializing memory with 0, for example, makes
>>array mis-indexing much easier to find))
>
>
> linux isn't tolerating the problems, it's the platform (ia-32 i'm guessing)
> which is doing so. if you were running linux/sparc you would have had the
> same problem, and if you were running solaris/x86 you would have had the
> same toleration.

Ah so, I see. Well, still a good thing that Sparc doesn't tolerate it,
I'd much rather be told there's something wrong right away than losing
data and finding out afterwards...
--
GP_Spukgestalt, human C coder, overrun by a bus.

Tars_Tarkas

Jan 28, 2003, 7:49:03 AM
CBFalconer wrote:
> Tars_Tarkas wrote:
>
>>I'm currently writing a hex-editor (under Linux, but I want it to work
>>"everywhere" that does ANSI C (and curses, which I use to display
>>stuff)). Mostly because the ones I found I didn't like, but also as an
>>exercise in portable C-coding.
>>
>>I thought it would be a good feature to display the value of the two and
>>the four bytes starting at the cursor position in the status line.
>>
>>Now, I have a char buffer (char *buffer) which contains the currently
>>displayed bytes.
>>
>>What I had in mind is (basically):
>>
>>[ ... ]
>>#include <net/in.h>
>
>
> You have just gone totally non-portable.

So I've been told. ;) Since I was just using this (or rather
netinet/in.h, but same difference) include file for the uintFoo_t types,
which I won't use anymore, I can get rid of it.

> Curses also makes you
> non-portable, but that can be isolated in a display module.

Which I'm planning to do. Nethack seems a good candidate to see how to
display text mode stuff in a portable way...

> I think you would be better off deciding what you are displaying
> (ints, floats, whatever) and then showing their hex
> representation. Don't forget that bytes can be larger than 8 bits
> (see CHAR_BIT), and the sizeof operator can show you how many of
> those are needed to hold an object. Then you can make displays
> such as:
>
> xxxxxxxx yyyyyyyy xxxxxxxx yyyyyyyy
> xx xx xx xx yy yy yy yy xx xx xx xx yy yy yy yy
>
> where the longer values are the normal representation (eg 1.2345)
> and the shorter one is the byte decomposition of that.

Actually what I want to display is lines laid out roughly like this:

0x0000001 XX XX XX XX XX XX XX XX XX XX YYYYYYYYYY

where XX is the representation of one byte as hex-numbers ('A3' or '29'
or something) and the corresponding Y would be the corresponding
character (if it can be displayed, which is another non-trivial thing to
find out, as I'm finding out).
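
A rough sketch of building one such line with standard C only. It uses isprint() from <ctype.h>, which answers "can it be displayed" for the current locale, and shows '.' otherwise. The helper names and the exact column layout are assumptions made for the example.

```c
#include <ctype.h>
#include <stdio.h>

/* Character to show for a byte: itself if printable, '.' otherwise. */
static char printable_or_dot(unsigned char c)
{
    return isprint(c) ? (char)c : '.';
}

/* Write "0xOFFSET XX XX ...  CHARS" into out and return its length.
   Assumes 8-bit bytes and an offset that fits in 8 hex digits;
   out must hold at least 4 * n + 13 characters. */
static int format_hex_line(unsigned long offset, const unsigned char *bytes,
                           size_t n, char *out)
{
    size_t i;
    int len = sprintf(out, "0x%08lx ", offset);

    for (i = 0; i < n; i++)
        len += sprintf(out + len, "%02x ", (unsigned)bytes[i]);
    out[len++] = ' ';
    for (i = 0; i < n; i++)
        out[len++] = printable_or_dot(bytes[i]);
    out[len] = '\0';
    return len;
}
```

The resulting string can then be handed to whatever display module is in use, which keeps the curses calls out of the formatting code.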

> Let Delia and John Carter handle the distribution via the Helium
> war fleet. You will have all 4 hands full, besides which you are
> green. IIRC.

4 hands would make typing that much faster if I could fit them all on
the keyboard at the same time, that is. And I hope being green is not an
issue here! Good idea on the distribution channels though ;)
--
Tars_Tarkas, barsoomian C coder.

Tars_Tarkas

Jan 28, 2003, 7:51:04 AM

I was wondering whether it would be a good idea to make the display
switchable in this respect, but the default setting should still match
the platform, I should think.
--
GP_Spukgestalt, human C coder, killed by opening an egg the wrong side.

glen herrmannsfeldt

Jan 28, 2003, 1:39:52 PM

"Tars_Tarkas" <Tars_...@ganja.com> wrote in message
news:b14rfb$fo5$1...@minotaurus.cip.informatik.uni-muenchen.de...
(snip)

>
> I thought it would be a good feature to display the value of the two and
> the four bytes starting at the cursor position in the status line.
>
> Now, I have a char buffer (char *buffer) which contains the currently
> displayed bytes.
>
> What I had in mind is (basically):
>
> [ ... ]
> #include <net/in.h>
> [ ... ]
> n2 = *((uint16_t *)(&buffer[cursor]));
> n4 = *((uint32_t *)(&buffer[cursor]));
>
(snip)

memcpy() the bytes to a properly aligned variable, and then use
that.

maybe memcpy(&n2,&buffer[cursor],sizeof(n2));

(you might need some more casts)

The point being that you want to use the endianness of the CPU you
are running on, but don't want to know what it is. This doesn't
help if you want to edit the file on a different machine, though.
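
A minimal sketch of the memcpy() approach, using unsigned short as the target so that no non-standard header is needed. The helper name `load_ushort` is hypothetical, and the caller is assumed to guarantee that sizeof(unsigned short) bytes are available at the cursor.

```c
#include <string.h>

/* Copy the bytes at buffer[cursor] into a properly aligned variable.
   The result is in the byte order of the machine running the code. */
static unsigned short load_ushort(const char *buffer, size_t cursor)
{
    unsigned short n;

    memcpy(&n, buffer + cursor, sizeof n);
    return n;
}
```

This works at any cursor position, aligned or not, because memcpy() copies byte by byte and only the local variable is accessed as an unsigned short.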

-- glen


Tars_Tarkas

Jan 28, 2003, 1:45:49 PM
glen herrmannsfeldt wrote:
> "Tars_Tarkas" <Tars_...@ganja.com> wrote in message
> news:b14rfb$fo5$1...@minotaurus.cip.informatik.uni-muenchen.de...
> (snip)
>
>>I thought it would be a good feature to display the value of the two and
>>the four bytes starting at the cursor position in the status line.
>>
>>Now, I have a char buffer (char *buffer) which contains the currently
>>displayed bytes.
>>
>>What I had in mind is (basically):
>>
>>[ ... ]
>>#include <net/in.h>
>>[ ... ]
>>n2 = *((uint16_t *)(&buffer[cursor]));
>>n4 = *((uint32_t *)(&buffer[cursor]));
>>
>
> (snip)
>
> memcpy() the bytes to a properly aligned variable, and then use
> that.
>
> maybe memcpy(&n2,&buffer[cursor],sizeof(n2));
>
> (you might need some more casts)

Very good idea! Thanks a lot.

> The point being that you want to use the endianness of the CPU you
> are running on, but don't want to know what it is.

Yes, precisely :)

> This doesn't
> help if you want to edit the file on a different machine, though.

I can find out the endianness of a machine, I think, and take appropriate
action (such as switching the endianness of the display or
something).
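
One common way to find that out at run time is to look at the first byte of a known integer. This is only a sketch: the function name is made up, and it merely separates "least significant byte first" from everything else, so unusual mixed byte orders would be reported as not little endian.

```c
/* Returns 1 if the least significant byte of an unsigned int is stored
   at the lowest address, 0 otherwise. */
static int host_is_little_endian(void)
{
    unsigned int probe = 1;

    /* Inspecting an object through unsigned char is always permitted. */
    return *(const unsigned char *)&probe == 1;
}
```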
--
GP_Spukgestalt, human C coder, killed by a if (foo = bar < foobar).

Daniel Fox

Jan 28, 2003, 2:15:12 PM
Tars_Tarkas <Tars_...@ganja.com> wrote in
news:b16j4t$j94$1...@minotaurus.cip.informatik.uni-muenchen.de:

> glen herrmannsfeldt wrote:

>> memcpy() the bytes to a properly aligned variable, and then use
>> that.
>>
>> maybe memcpy(&n2,&buffer[cursor],sizeof(n2));
>>
>> (you might need some more casts)
>
> Very good idea! Thanks a lot.

Except that the result might be a trap representation, if I'm not
mistaken...

>> The point being that you want to use the endianness of the CPU you
>> are running on, but don't want to know what it is.
>
> Yes, precisely :)
>
>> This doesn't
>> help if you want to edit the file on a different machine, though.
>
> I can find out the endianness of a machine, I think, and take
> appropriate action (such as switching the endianness of the
> display or something).

You need not know the endian order of the machine at all, really.

Lawrence Kirby once posted an excellent way to read values from arbitrary
locations in memory; perfect for a hex editor (I have indeed used them for
my own small hex viewer).

Here are my modified versions of his code. The second (optional) parameter
is provided to allow you to determine whether you wish to receive a pointer
to the characters after the read or not, useful for both examining
different values at a single location in memory or reading in various
values sequentially, as the case may be.

This will read a 32-bit big-endian value from an arbitrary
location in memory, returning it as an unsigned long:

unsigned long
read_u32_be( const unsigned char * data_p, const unsigned char ** data_pp )
{
    /* assemble the value, most significant byte first */
    unsigned long u32_be = ((unsigned long)data_p[0] << 24 ) |
                           ((unsigned long)data_p[1] << 16 ) |
                           ((unsigned long)data_p[2] << 8 ) |
                           data_p[3];

    /* optionally report the position just past the bytes read */
    if( data_pp != NULL ) {
        *data_pp = data_p + 4;
    }

    return u32_be;
}

This will do the same, with a little endian value:

unsigned long
read_u32_le( const unsigned char * data_p, const unsigned char ** data_pp )
{
    /* assemble the value, least significant byte first */
    unsigned long u32_le = ((unsigned long)data_p[3] << 24 ) |
                           ((unsigned long)data_p[2] << 16 ) |
                           ((unsigned long)data_p[1] << 8 ) |
                           data_p[0];

    /* optionally report the position just past the bytes read */
    if( data_pp != NULL ) {
        *data_pp = data_p + 4;
    }

    return u32_le;
}

And here's a signed version, for big endian. It reads two's complement
values. The odd-looking bit manipulations and subtractions ensure that the
two's complement values can be read even on a machine that isn't two's
complement. The '- 1' is to avoid overflow.

long
read_s32_be( const unsigned char * data_p, const unsigned char ** data_pp )
{
    unsigned long u32_be = ((unsigned long)data_p[0] << 24 ) |
                           ((unsigned long)data_p[1] << 16 ) |
                           ((unsigned long)data_p[2] << 8 ) |
                           data_p[3];
    long s32_be;

    /* sign bit set: map the two's complement pattern to a negative value */
    if( u32_be & 0x80000000UL ) {
        s32_be = (long)( u32_be - 0x80000000UL ) - 0x7FFFFFFF - 1;
    }
    else {
        s32_be = (long)u32_be;
    }

    if( data_pp != NULL ) {
        *data_pp = data_p + 4;
    }

    return s32_be;
}

And little endian:

long
read_s32_le( const unsigned char * data_p, const unsigned char ** data_pp )
{
    unsigned long u32_le = ((unsigned long)data_p[3] << 24 ) |
                           ((unsigned long)data_p[2] << 16 ) |
                           ((unsigned long)data_p[1] << 8 ) |
                           data_p[0];
    long s32_le;

    /* sign bit set: map the two's complement pattern to a negative value */
    if( u32_le & 0x80000000UL ) {
        s32_le = (long)( u32_le - 0x80000000UL ) - 0x7FFFFFFF - 1;
    }
    else {
        s32_le = (long)u32_le;
    }

    if( data_pp != NULL ) {
        *data_pp = data_p + 4;
    }

    return s32_le;
}


Perhaps these will be useful to you; I certainly have found them so. You
can adapt them to arbitrary integer sizes (16 or 24 or XX) quite easily.
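
As an illustration of that adaptation (not code from the original posting), the shifts can be folded into a loop that accumulates one byte at a time. The names `read_uint_be` and `read_uint_le` are hypothetical, and the sketch assumes 8-bit bytes and a byte count between 1 and 4 so the result fits in an unsigned long.

```c
/* Read an n-byte big-endian unsigned value (1 <= n <= 4). */
static unsigned long read_uint_be(const unsigned char *data_p, unsigned n)
{
    unsigned long value = 0;
    unsigned i;

    for (i = 0; i < n; i++)
        value = (value << 8) | data_p[i];
    return value;
}

/* Read an n-byte little-endian unsigned value (1 <= n <= 4). */
static unsigned long read_uint_le(const unsigned char *data_p, unsigned n)
{
    unsigned long value = 0;
    unsigned i;

    /* walk from the most significant (last) byte down to the first */
    for (i = n; i > 0; i--)
        value = (value << 8) | data_p[i - 1];
    return value;
}
```

For a hex editor status line, `read_uint_be(&buffer[cursor], 2)` and `read_uint_le(&buffer[cursor], 2)` would give the two 16-bit readings, provided the buffer is accessed as unsigned char.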


-Daniel

Chris Torek

Jan 28, 2003, 3:43:14 PM
In article <Xns931193FEDC3D0...@65.82.44.9>
Daniel Fox <danielfox200...@hotmail.com> writes:
>And here [are] signed version[s] .... . It reads two's complement
>values. The odd looking bit manipulations and subtractions ensure that the
>two's complement values can be read even on a machine that isn't two's
>complement. The '- 1' is to avoid overflow.

If you have a large enough "long" type, there is a shorter
version than this:

> if( u32_be & 0x80000000UL ) {
> s32_be = (long)( u32_be - 0x80000000UL ) - 0x7FFFFFFF - 1;
> }
> else {
> s32_be = u32_be;
> }

Here is an example that reads the two's complement value of some
16-bit "input" value, regardless of whether the machine you are
using has two's complement:

long output;
...
/* input is in [0..0xffff] -- if need be, mask with 0xffff */
output = (long)(input ^ 0x8000) - 32768L;

The trick here is to take an input "unsigned int" value in [0..0x7fff]
and turn it into one in [0x8000..0xffff] -- i.e., 32768 through
65535 -- and take an input in [0x8000..0xffff] and turn it into
one in [0..0x7fff]. Convert that to long, giving a value in
0..65535, and then subtract 32768 (0x8000). Input values that were
0x8000 through 0xffff become -32768 through -1 respectively, and
inputs in the [0..0x7fff] range become values in the 0 through 32767
range.

Note that we can sign-extend 9-bit values the same way:

/* can use ordinary "int output" here */
output = (input ^ 0x100) - 256;

or 18-bit values:

/* back to "long"s */
output = (long)(input ^ 0x20000) - 131072L;

There is a pattern here: xor with the sign bit, then subtract the
value "the sign bit" represents when considered as a non-sign-bit.
For 18-bit values, bit 17 is the sign bit, so xor with (1UL << 17)
or 0x20000, then subtract (1L << 17) or 131072L (signed).

The one problem here is that, to read 32-bit values, you need at
least a 33-bit "long". In C99 you could use "long long" to obtain
this.

(Of course, if you can assume -- or verify -- that you have a non-
overflow-trapping two's complement machine to begin with, all you
need to do is use "unsigned long" for the xor and subtraction. Even
if your "unsigned long" is more than 32 bits, this produces the
proper bit pattern, which becomes the "right answer".)
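
A minimal sketch of the xor-and-subtract pattern as one general function, for readers who want it in reusable form. The name sign_extend and its bits parameter are illustrative only, and the function assumes widths of 1 to 31 bits so that a 32-bit long can hold every intermediate value:

```c
#include <assert.h>

/* Sign-extend the low `bits` bits of `input`, read as two's complement.
   Assumes 1 <= bits <= 31; a 32-bit input needs a wider result type,
   as discussed in the post. */
long sign_extend(unsigned long input, int bits)
{
    unsigned long sign;

    assert(bits >= 1 && bits <= 31);
    sign = 1UL << (bits - 1);              /* the sign bit, e.g. 0x8000 */
    input &= (sign << 1) - 1UL;            /* keep only the low `bits` bits */
    return (long)(input ^ sign) - (long)sign;
}
```

With bits set to 16 this computes the same expression as the 16-bit example, (long)(input ^ 0x8000) - 32768L, and it works without relying on the host's own signed representation.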
--
In-Real-Life: Chris Torek, Wind River Systems (BSD engineering)
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)
Domain: to...@bsdi.com http://67.40.109.61/torek/ (for the moment)
(you probably cannot email me -- spam has effectively killed email)

Tars_Tarkas

unread,
Jan 28, 2003, 6:16:15 PM1/28/03
to
Daniel Fox wrote:
> Tars_Tarkas <Tars_...@ganja.com> wrote in
> news:b16j4t$j94$1...@minotaurus.cip.informatik.uni-muenchen.de:
>
>
>>glen herrmannsfeldt wrote:
>
>
>>>memcpy() the bytes to a properly aligned variable, and then use
>>>that.
>>>
>>>maybe memcpy(&n2,&buffer[cursor],sizeof(n2));
>>>
>>>(you might need some more casts)
>>
>>Very good idea! Thanks a lot.
>
>
> Except that the result might be a trap representation, if I'm not
> mistaken...

Just wondering, trap representation, does that mean it could be a bit
pattern that, while separated into a few chars, is perfectly ok but
taken together as a short or long (or whatever) means an error
condition? Or something completely different?

[ snip code to do it better / more portable and safe and stuff ]

> Perhaps these will be useful to you; I certainly have found them so. You
> can adapt them to arbitrary integer sizes (16 or 24 or XX) quite easily.

It certainly looks useful and makes sense to do it that way. Now, I
noticed you use some shifts to gather the bits together into a foo bit
variable. This will then not result in "trap representation", because C
knows what it's doing during those shifts and '|'s and such?

So here's another set of questions (which will probably sound totally
clueless): I am currently (in a horribly un-portable way, I'm sure)
calculating the hexadecimal representation of a byte like this:

int ntoFF(unsigned char number, char *buffer)
{
    buffer[1] = (number & 15);
    buffer[0] = (number & 240) >> 4;
    buffer[1] += (buffer[1] < 10 ? '0' : 'A' - 10);
    buffer[0] += (buffer[0] < 10 ? '0' : 'A' - 10);
    buffer[2] = ' ';
    return 0;
}

(assumption is that buffer points to somewhere that has enough space)

Now, the first question I have is this: Assuming that chars are 8 bit
and can thus represent (unsigned) values from 0 to (at most) 255, can I
then be sure that number & 15 selects the lower 4 bits and number & 240
selects the higher 4 bits? (If the chars are not 8 bit, then the whole
thing won't work anyway). What I'm asking is, does C act the way I'd
expect it to, i.e. like 15 being 00001111 and 240 11110000, or do I have
to do something else here too?

And, while I'm at it, the second question is this: After I have the
values of the higher 4 bits in buffer[0] and the lower ones in
buffer[1], how do I portably calculate the correct character to display?

I'm currently relying heavily on the ASCII properties of 0 to 9 being
adjacent and in proper order and A-F as well. I looked at the EBCDIC
table and it's the same there, but still, some strange platform might
have the letters all jumbled and then what? So, does C do some magic on
its own or do I have to use a lookup table or something?

Come to think of it, a lookup table sounds good anyway. Sort of like this:

char lookup[16] = "0123456789ABCDEF";

int ntoFF(unsigned char number, char *buffer)
{
    int l,h;

    l = (number & 15);
    h = (number & 240) >> 4;

    buffer[1] = lookup[l];
    buffer[0] = lookup[h];

    buffer[2] = ' ';

    return 0;
}

with the int so it won't complain I'm indexing using a char (otherwise
I'd just do buffer[1] = lookup[buffer[1]])...

Still, the original question would still interest me. I think I saw
something in another post about the numbers at least being guaranteed to
be in ascending order, is that so? Then what about the letters?

Hmm, again quite a long post, hope you'll forgive the length, but I'm
trying to learn something here. :)
--
GP_Spukgestalt, human C coder, killed by falling off the end of an array.

bd

unread,
Jan 28, 2003, 6:20:12 PM1/28/03
to
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Tars_Tarkas wrote:

>> poor assumption, i.e., that will limit the portability. (there's nothing
>> wrong with that, i'm just noting it. also, posix/sus and curses stuff is
>> off-topic here, if you have with those aspects you need to post to
>> comp.unix.programming.)
>
> I was just noting that I was using curses, which does limit portability,
> I agree, but writing my own display code is a bit too much (for now).
>
> And the question wasn't about posix/sus anyway... If I feel up to it, I
> can always see how nethack handles it's display stuff (nethack text-mode
> I mean, which compiles and runs on very many platforms with and without
> curses).

That's because it supports any of the following libraries:
* termlib
* termcap
* curses
* ncurses

They probably all support some API calls which it uses.

- --
Replace spamtrap with bd to reply.
Fry: Nowadays people aren't interested in art that's not tattooed on fat
guys.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE+Nw/ux533NjVSos4RAiBdAJ9sqLFFcBu9UdfcYm0TyxS8wpgpaACgq7sW
sGrQMheQOdkJ4a4P3bOkFtM=
=RyI9
-----END PGP SIGNATURE-----

bd

unread,
Jan 28, 2003, 6:25:14 PM1/28/03
to
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Tars_Tarkas wrote:

>> I don't know of such a way. CHAR_BIT, defined in limits.h, gives you the
>> number of bits per char.
>> Now you might want to think about where you are going to use your hex
>> editor.
>
> At the moment I'm going to use it on my linux box to edit a wizardry 8
> save file (not for character boosting, but I killed a rather important
> NPcharacter about a zillion save games ago and I don't feel like doing
> all the stuff since then again, so I want to bring that character back
> to life with some non-game magic). Aside from that it has always
> bothered me that there wasn't a simple little hex editor for unix in
> text mode (that I know of), so I just decided to write one.

Emacs has one:

M-x hexl-mode

- --
Replace spamtrap with bd to reply.

Maslow's Maxim:
If the only tool you have is a hammer, you treat everything like
a nail.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE+NxDlx533NjVSos4RAvbIAJ90zJ3NGjjUWb3JUtrCiq/yMDv+ggCgzZAy
1pOJiDxl6z8MbZon+3Jlh3Y=
=8QSF
-----END PGP SIGNATURE-----

Tars_Tarkas

unread,
Jan 28, 2003, 6:28:58 PM1/28/03
to
bd wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Tars_Tarkas wrote:
>
>
[ snip ]

>>to life with some non-game magic). Aside from that it has always
>>bothered me that there wasn't a simple little hex editor for unix in
>>text mode (that I know of), so I just decided to write one.
>
> Emacs has one:
>
> M-x hexl-mode

Ok, I didn't know that, might've guessed, but still, last time I checked
Emacs wasn't simple and surely not little ;)

(Not that emacs isn't a good editor, just not what I'd call "simple
little").
--
GP_Spukgestalt, human C coder, killed by eight megabytes (and constantly
swapping).

Tars_Tarkas

unread,
Jan 28, 2003, 6:31:30 PM1/28/03
to
bd wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Tars_Tarkas wrote:
>
>
>>>poor assumption, i.e., that will limit the portability. (there's nothing
>>>wrong with that, i'm just noting it. also, posix/sus and curses stuff is
>>>off-topic here, if you have with those aspects you need to post to
>>>comp.unix.programming.)
>>
>>I was just noting that I was using curses, which does limit portability,
>>I agree, but writing my own display code is a bit too much (for now).
>>
>>And the question wasn't about posix/sus anyway... If I feel up to it, I
>>can always see how nethack handles it's display stuff (nethack text-mode
>>I mean, which compiles and runs on very many platforms with and without
>>curses).
>
>
> That's because it supports any of the following libraries:
> * termlib
> * termcap
> * curses
> * ncurses

Even in the non-unix version? I mean termFoo and Foocurses seem a
little too unixish for all the platforms supported by nethack...
--
GP_Spukgestalt, human C coder, killed by LD_LIBRARY_PATH.

Mark McIntyre

unread,
Jan 28, 2003, 6:52:49 PM1/28/03
to
On Wed, 29 Jan 2003 00:16:15 +0100, in comp.lang.c , Tars_Tarkas
<Tars_...@ganja.com> wrote:

>Just wondering, trap representation, does that mean it could be a bit
>pattern that, while seperated into a few chars, is perfectly ok but
>taken together as a short or long (or whatever) means an error
>condition? Or something completely different?

From the ISO standard:
6.2.6.1 General
Certain object representations need not represent a value of the
object type. If the stored value of an object has such a
representation and is read by an lvalue expression that does not have
character type, the behavior is undefined. If such a representation is
produced by a side effect that modifies all or any part of the object
by an lvalue expression that does not have character type, the
behavior is undefined. Such a representation is called a trap
representation.


--
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>
CLC readme: <http://www.angelfire.com/ms3/bchambless0/welcome_to_clc.html>

Daniel Fox

unread,
Jan 28, 2003, 7:04:22 PM1/28/03
to
Tars_Tarkas <Tars_...@ganja.com> wrote in
news:b17301$jv2$1...@minotaurus.cip.informatik.uni-muenchen.de:

> Daniel Fox wrote:

>> Except that the result might be a trap representation, if I'm not
>> mistaken...
>
> Just wondering, trap representation, does that mean it could be a bit
> pattern that, while seperated into a few chars, is perfectly ok but
> taken together as a short or long (or whatever) means an error
> condition? Or something completely different?

Essentially so. The individual bytes that make up an object can be
referred to as the object representation. A trap representation is an
object representation that doesn't correspond to a value for that
object. You could think of it as the lack of a value or an invalid
value; either way, you want to avoid generating one if strict
portability is a concern.

> [ snip code to do it better / more protable and safe and stuff ]
>
>> Perhaps these will be useful to you; I certainly have found them so.
>> You can adapt them to arbitrary integer sizes (16 or 24 or XX) quite
>> easily.
>
> It certainly looks useful and makes sense to do it that way. Now, I
> noticed you use some shifts to gather the bits together into a foo bit
> variable. This will then not result in "trap representation", because
> C knows what it's doing during those shifts and '|'s and such?

No arithmetic operation on valid values can produce a trap
representation, unless its result is undefined or implementation-defined.
Where a type has padding bits, the shift operators act as though they
were not there: shifts operate on the value, not on the
representation.

> So here's another set questions (which will probably sound totally
> clueless): I am currently (in a horribly un-portable way, I'm sure)
> calculating the hexadecimal representaion of a byte like this:
>
> int ntoFF(unsigned char number, char *buffer)
> {
> buffer[1] = (number & 15);
> buffer[0] = (number & 240) >> 4;
> buffer[1] += (buffer[1] < 10 ? '0' : 'A' - 10);
> buffer[0] += (buffer[0] < 10 ? '0' : 'A' - 10);
> buffer[2] = ' ';
> return 0;
> }

> Now, the first question I have is this: Assuming that chars are 8 bit

> and can thus represent (unsigned) values from 0 to (at most) 255,

You can make the assumption that type char has at least 8 bits, but it
may also have more. Instead of 8, consider using CHAR_BIT to determine
how many bits will be in the char type; if you're not interested in
dealing with more than 8 bits, you can always mask off the extra ones.

> can I then be sure that number & 15 selects the lower 4 bits and
> number & 240 selects the higher 4 bits? (If the chars are not 8 bit,
> then the whole thing won't work anyway).

Well, I highly recommend you use hexadecimal notation when dealing with
bitwise operators; it's what most programmers are used to. But yes, that
will do what you want. If chars are > 8 bits, keep in mind all you need to
do is discard the extra bits by & with 0xFF.

> What I'm asking is, does C act the way I'd expect it to, i.e. like 15
> being 00001111 and 240 11110000, or do I have to do something else
> here too?

You're pretty safe here. Unsigned char has the advantage of having no
padding bits and no trap representations; you are free to fiddle the bits
of an unsigned char as much as you like.

> Still, the original question would still intereset me. I think I saw
> something in another post about the numbers at least being guaranteed
> to be in ascending order, is that so? Then what about the letters?

You're guaranteed to have the digits '0'-'9' be adjacent in any
implementation regardless of character set, but not so with the letters
A-F. This is why generating an index into a string of digits is the
preferred method.
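
A minimal sketch of that string-indexing approach, with the nibbles masked so that it still behaves if chars are wider than 8 bits. The name byte_to_hex is illustrative, not something from the thread:

```c
#include <assert.h>
#include <stddef.h>

/* Write two hex digits for `number` into out[0] and out[1].
   No terminating '\0' is written; the caller owns the buffer layout. */
void byte_to_hex(unsigned char number, char *out)
{
    /* Indexing a string makes no assumption about where 'A'..'F'
       sit in the execution character set. */
    static const char digits[] = "0123456789ABCDEF";

    assert(out != NULL);
    out[0] = digits[(number >> 4) & 0x0F];  /* high nibble */
    out[1] = digits[number & 0x0F];         /* low nibble  */
}
```

Because the index is always masked to 0..15, the lookup cannot run past the end of the digit string whatever value is passed in.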

- Daniel

Pika Palasokeri

unread,
Jan 28, 2003, 7:19:21 PM1/28/03
to
In article <3E36471A...@yahoo.com>, cbfal...@yahoo.com says...
>
[snip]

>I think you would be better off deciding what you are displaying
>(ints, floats, whatever) and then showing their hex
>representation. Don't forget that bytes can be larger than 8 bits
>(see CHAR_BIT), and the sizeof operator can show you how many of
>those are needed to hold an object. Then you can make displays
>such as:
>
> xxxxxxxx yyyyyyyy xxxxxxxx yyyyyyyy
> xx xx xx xx yy yy yy yy xx xx xx xx yy yy yy yy
>
>where the longer values are the normal representation (eg 1.2345)
>and the shorter one is the byte decomposition of that.
>

The whole idea with a hex editor is that it should be able to handle any kind
of file without knowing or caring about its contents. So while your response
may be sound in principle, it's probably of little use in the particular
context of a hex editor.

PP

CBFalconer

unread,
Jan 28, 2003, 8:57:41 PM1/28/03
to
Tars_Tarkas wrote:
>
> Daniel Fox wrote:
> >
... snip ...

> >
> > Except that the result might be a trap representation, if I'm not
> > mistaken...
>
> Just wondering, trap representation, does that mean it could be a bit
> pattern that, while seperated into a few chars, is perfectly ok but
> taken together as a short or long (or whatever) means an error
> condition? Or something completely different?

For example, a system may designate 0x8000 to be a trap
representation on a 2's complement machine. This probably means
that INT_MAX == -INT_MIN, but is allowable. Any access to such a
pattern *AS AN INTEGER* triggers a system trap. Very useful if
uninitialized memory is set to this value.

CBFalconer

unread,
Jan 28, 2003, 8:57:39 PM1/28/03
to

Why? Especially if modifications to one field are reflected in
the other. The byte order dump only makes sense if it is in
memory address order. If the actual byte order is unknown a few
quick experiments will rapidly disclose it.

Kevin Easton

unread,
Jan 28, 2003, 11:06:40 PM1/28/03
to
Daniel Fox <danielfox200...@hotmail.com> wrote:
> Tars_Tarkas <Tars_...@ganja.com> wrote in
> news:b17301$jv2$1...@minotaurus.cip.informatik.uni-muenchen.de:

Or in this case, it would probably be best to use sprintf with the
"%02x" format string.

- Kevin.
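
For comparison, a small sketch of the sprintf route for a single byte. The wrapper name format_byte is made up for illustration; the only library call is the standard sprintf with the "%02x" conversion:

```c
#include <assert.h>
#include <stdio.h>

/* Format one byte as two lowercase hex digits plus '\0' into out[3].
   Returns the number of characters written, excluding the '\0'. */
int format_byte(unsigned char number, char out[3])
{
    assert(out != NULL);
    /* Mask to 8 bits so the output stays two digits even if CHAR_BIT > 8. */
    return sprintf(out, "%02x", (unsigned int)(number & 0xFF));
}
```

Use "%02X" instead if uppercase digits are wanted.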

Tars_Tarkas

unread,
Jan 29, 2003, 1:01:09 PM1/29/03
to
CBFalconer wrote:
> Pika Palasokeri wrote:
>
>>In article <3E36471A...@yahoo.com>, cbfal...@yahoo.com says...
>>
>>[snip]
>>
>>
>>>I think you would be better off deciding what you are displaying
>>>(ints, floats, whatever) and then showing their hex
>>>representation. Don't forget that bytes can be larger than 8 bits
>>>(see CHAR_BIT), and the sizeof operator can show you how many of
>>>those are needed to hold an object. Then you can make displays
>>>such as:
>>>
>>> xxxxxxxx yyyyyyyy xxxxxxxx yyyyyyyy
>>> xx xx xx xx yy yy yy yy xx xx xx xx yy yy yy yy
>>>
>>>where the longer values are the normal representation (eg 1.2345)
>>>and the shorter one is the byte decomposition of that.
>>
>>The whole idea with a hex editor is that it should be able to handle
>>any kind of file without knowing or caring about its contents. So
>>while your response may be sound in principle, it's probably of
>>little use in the particular context of a hex editor.
>
>
> Why? Especially if modifications to one field are reflected in
> the other. The byte order dump only makes sense if it is in
> memory address order. If the actual byte order is unknown a few
> quick experiments will rapidly disclose it.

What I had in mind was to provide a simple status line below the actual
data displayed, where the current two and four byte values are displayed.

Actually, here's a picture of what it looks like at the moment, maybe
this will clear things up (if somebody can't read http/jpg and wants to
see it, mail me, I'll send it around (though why anybody would want it
that badly, I know not)):

http://home.in.tum.de/~haasse/comp/hex-edit/screenshot.jpg

(it's running inside an fvwm2 decorated Eterm, which has nothing
whatsoever to do with the hex-editor).
--
GP_Spukgestalt, human C coder, killed by a giant chicken.

Tars_Tarkas

unread,
Jan 29, 2003, 1:09:49 PM1/29/03
to
Kevin Easton wrote:
> Daniel Fox <danielfox200...@hotmail.com> wrote:
>
>>Tars_Tarkas <Tars_...@ganja.com> wrote in
>>news:b17301$jv2$1...@minotaurus.cip.informatik.uni-muenchen.de:

[ snip displaying numbers as hexvalues and letters in character codes ]

>>>Still, the original question would still intereset me. I think I saw
>>>something in another post about the numbers at least being guaranteed
>>>to be in ascending order, is that so? Then what about the letters?
>>
>>You're guaranteed to have the digits '0'-'9' be adjacent in any
>>implementation regardless of character set, but not so with the letters A-
>>F. This is why generating an index into a string of digits is the preferred
>>method.
>
>
> Or in this case, it would probably be best to use sprintf with the
> "%02x" format string.

Well, I did that as an intermediate solution, but I thought that to
display, let's say, 16 * 20 bytes (16 bytes per line, 20 lines per
screen), with an sprintf call for each byte to display was a bit much.

So I decided to write a simple little function to do it myself. Which, I
reckon, will be faster than the sprintf call, since I only handle one
case of a very specific nature, whereas sprintf can handle a lot more
stuff and has therefore (probably) more overhead involved...
--
GP_Spukgestalt, human C coder, killed by -O3.

bd

unread,
Jan 29, 2003, 5:30:14 PM1/29/03
to

Tars_Tarkas wrote:

Dunno about dos. It probably uses ANSI, which should work in DOS.


- --
Replace spamtrap with bd to reply.

Keep a diary and one day it'll keep you.
-- Mae West

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE+OFW2x533NjVSos4RAhvCAJsGCF5Ug6waTlCudQMq4XgXsrVzYgCgoBYH
H2mOvhX8drsNWSuPQmj9IPo=
=9VgQ
-----END PGP SIGNATURE-----

Kevin Easton

unread,
Jan 29, 2003, 6:52:02 PM1/29/03
to

Don't overlook the fact that while you can't portably assume that the
characters A to F are consecutive and in-order (so you have to use the
string indexing method), the standard library CAN make these nonportable
assumptions, so it may well be faster. Also, you can do all 16 at once with
sprintf(str, "%02x %02x %02x %02x %02x %02x %02x %02x - "
"%02x %02x %02x %02x %02x %02x %02x %02x", /* ... */)

- Kevin.


Tars_Tarkas

unread,
Jan 29, 2003, 7:05:21 PM1/29/03
to

Ok, there's that. But still the sprintf code has to parse the input
string and then somehow invoke the right magic to convert the
argument(s) into whatever I specified. That's something I can skip over,
since I know exactly what specific kind of transformation I want to do
with the values I have...

> Also, you do all 16 at once with
> sprintf(str, "%02x %02x %02x %02x %02x %02x %02x %02x - "
> "%02x %02x %02x %02x %02x %02x %02x %02x", /* ... */)

Assuming there will always be 16 bytes to one line. That's something I'm
not assuming, you see. What I did was to automatically calculate how
many bytes per line I can display and use that (if the user doesn't
override it).[1] So I'd have to somehow build the format string for
every bytes-per-line value... Otherwise that solution would be quite
acceptable I guess..

[1] This is not perfect for sure. I'm thinking it would be a better idea
to calculate how many bytes would fit and use the minimum of that and 16
as the default bytes-per-line value. The user could then specify a
command line argument to tell the program to override that and display
as many bytes per line as possible.
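
A rough sketch of that calculation, reading the footnote as "cap the default at 16 unless the user asks to fill the screen". Everything here is an assumption made for illustration: the function names, the fill_screen flag, and the layout of an address column followed by "XX " (3 columns) and one character column per byte:

```c
#include <assert.h>

/* Hypothetical layout: an address column of `addr_width` characters,
   then 3 columns of hex plus 1 character column for every byte. */
int bytes_that_fit(int screen_width, int addr_width)
{
    int room;

    assert(screen_width >= 0 && addr_width >= 0);
    room = screen_width - addr_width;
    return room > 0 ? room / 4 : 0;
}

/* Default: at most 16 bytes per line; with fill_screen set, use
   everything that fits. */
int default_bytes_per_line(int screen_width, int addr_width, int fill_screen)
{
    int fit = bytes_that_fit(screen_width, addr_width);

    if (fill_screen || fit < 16)
        return fit;
    return 16;
}
```

On an 80-column screen with a 12-column address field this gives 17 bytes that fit, so the default is 16 and the override yields 17.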
--
GP_Spukgestalt, human C coder, killed by a buffer overflow.

Richard Bos

unread,
Jan 30, 2003, 4:19:54 AM1/30/03
to
bd <spam...@bd-home-comp.no-ip.org> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1

*snigger*

> Tars_Tarkas wrote:


>
> > bd wrote:
> >> That's because it supports any of the following libraries:
> >> * termlib
> >> * termcap
> >> * curses
> >> * ncurses
> >
> > even in the non-unix version? I mean termFoo and Foocurses seems a
> > little to unixish for all the platforms supported by nethack...

No, under MS-DOS it seems to call interrupts directly. It could have
used curses, though - there _are_ curses implementations for MS-DOS, you
know.

> Dunno about dos. It probably uses ANSI, which should work in DOS.

NetHack is very much pre-ANSI - the code always horrifies me when I read
it, but the result is good :-).

If you mean ANSI escape codes: *fwap*. Any program which expects me to
load another device driver merely because it is too broken to do the
right thing shall find itself heading for the bit bucket pronto.

Richard

Peter Shaggy Haywood

unread,
Jan 30, 2003, 8:27:51 PM1/30/03
to
Groovy hepcat Tars_Tarkas was jivin' on Tue, 28 Jan 2003 13:49:03
+0100 in comp.lang.c.
Re: calculating the value of two or four bytes?'s a cool scene! Dig
it!

>Actually what I want to display is lines of about this build:
>
>0x0000001 XX XX XX XX XX XX XX XX XX XX YYYYYYYYYY
>
>where XX is the representation of one byte as hex-numbers ('A3' or '29'
>or something) and the corresponding Y would be the corresponding
>character (if it can be displayed, which is another non-trivial thing to
>find out, as I'm finding out).

#include <ctype.h>
...
if(isgraph(your_byte))
    display(your_byte);
else
    display('.');

Still think it's non-trivial?
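
A slightly fuller sketch of the same test applied to a whole line of bytes. The function name ascii_column is made up here. Note that the <ctype.h> functions expect a value representable as unsigned char (or EOF), so the buffer is typed as unsigned char, and that isgraph() rejects the space character where isprint() would accept it:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Fill `out` with one display character per input byte and terminate it.
   `out` must have room for n + 1 chars. */
void ascii_column(const unsigned char *bytes, size_t n, char *out)
{
    size_t i;

    assert(bytes != NULL && out != NULL);
    for (i = 0; i < n; i++) {
        /* bytes[i] is already unsigned char, as <ctype.h> requires */
        out[i] = isgraph(bytes[i]) ? (char)bytes[i] : '.';
    }
    out[n] = '\0';
}
```

Which bytes count as graphic depends on the current locale; in the default "C" locale only the basic printing characters qualify.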

--

Dig the even newer still, yet more improved, sig!

http://alphalink.com.au/~phaywood/
"Ain't I'm a dog?" - Ronny Self, Ain't I'm a Dog, written by G. Sherry & W. Walker.
I know it's not "technically correct" English; but since when was rock & roll "technically correct"?

Tars_Tarkas

unread,
Jan 30, 2003, 8:49:27 PM1/30/03
to
Peter Shaggy Haywood wrote:
> Groovy hepcat Tars_Tarkas was jivin' on Tue, 28 Jan 2003 13:49:03
> +0100 in comp.lang.c.
> Re: calculating the value of two or four bytes?'s a cool scene! Dig
> it!
>
>
>>Actually what I want to display is lines of about this build:
>>
>>0x0000001 XX XX XX XX XX XX XX XX XX XX YYYYYYYYYY
>>
>>where XX is the representation of one byte as hex-numbers ('A3' or '29'
>>or something) and the corresponding Y would be the corresponding
>>character (if it can be displayed, which is another non-trivial thing to
>>find out, as I'm finding out).
>
>
> #include <ctype.h>
> ...
> if(isgraph(your_byte))
> display(your_byte);
> else
> display('.');
>
> Still think it's non-trivial?

<grin> Must've overlooked isgraph() amongst all the other isfoobar()
functions (I am already including ctype.h, you see...). Thanks a bunch!

> --
>
> Dig the even newer still, yet more improved, sig!

No offence or nothing, but could you improve it one step further and use

--<space><newline>

as your sig delimiter?
--
GP_Spukgestalt, human C coder, killed by a triviality.

David Thompson

unread,
Feb 1, 2003, 5:47:02 PM2/1/03
to
Tars_Tarkas <Tars_...@ganja.com> wrote :
(wanting to write a hex editor)

> At the moment I'm going to use it on my linux box to edit a wizardry 8
> save file .... Aside from that it has always

> bothered me that there wasn't a simple little hex editor for unix in
> text mode (that I know of), so I just decided to write one.
>
<OT>
From other posts you obviously mean text mode of a video
display as opposed to graphics mode, but still full screen.
Classic Unix rarely had and couldn't rely on full screens;
the line-at-a-time solution for binary editing was adb --
like so many things in Unix, "perverted" to a purpose
other than that for which it was designed, and for which
it was just barely adequate, but enough to get by on. :-?
</>

--
- David.Thompson 1 now at worldnet.att.net


those who know me have no need of my name

unread,
Feb 1, 2003, 9:16:26 PM2/1/03
to
in comp.lang.c i read:
>Kevin Easton wrote:

>> do all 16 at once with
>> sprintf(str, "%02x %02x %02x %02x %02x %02x %02x %02x - "
>> "%02x %02x %02x %02x %02x %02x %02x %02x", /* ... */)
>
>Assuming there will always be 16 bytes to one line. That's something
>I'm not assuming, you see. What I did was to automatically calculate
>how many bytes per line I can display and use that (if the user
>doesn't override it).[1] So I'd have to somehow build the format
>string for every bytes-per-line value...

smells trivial enough ...

[warning: incomplete error checking, assumed headers]

char *build_format(const unsigned int linewidth)
{
    static char *fmt = "%02.2x ";
    const size_t fmtlen = strlen(fmt);
    const size_t bpl = linewidth / snprintf(0, 0, fmt, 0);
    const size_t buflen = bpl * fmtlen + 1;
    char *buf = malloc(buflen);
    if (0 == buf) return 0;
    for (size_t i=0; i<bpl; i++) sprintf(buf+(i*fmtlen), "%s", fmt);
    return buf;
}

but i'd just call printf multiple times, e.g.,

size_t output_hex_bytes(const unsigned int linewidth,
                        const char *buffer,
                        const size_t start,
                        const size_t end)   /* one past the last byte */
{
    static char *fmt = "%02.2x ";
    const size_t numb = end - start;
    if (0 == numb) return end;
    const size_t bpf = snprintf(0, 0, fmt, 0);
    const size_t bpl = linewidth / bpf;
    size_t wid = numb < bpl ? numb : bpl;
    size_t i;
    /* go through unsigned char so negative chars don't print as ffffffxx */
    for (i=0; i<wid; i++) printf(fmt, (unsigned)(unsigned char)buffer[start+i]);
    for (; wid<bpl; wid++) printf("%*.*s", (int)bpf, (int)bpf, "");
    return start + i;
}
/* example usage */
void output_line(const char *s)
{
    const size_t len = strlen(s);
    size_t i = 0;
    while (1)
    {
        const size_t j = output_hex_bytes(30, s, i, len);
        /* i've used a constant, but it could as well be a variable */

        /* whatever should follow the hex bytes on each line, e.g., */
        fputs(" ", stdout);
        for (size_t k=i; k<j; k++)
            printf("%c ", isprint((unsigned char)s[k]) ? s[k] : '.');

        puts("");
        if ((i=j) >= len) break;
    }
}
/* driver */
int main(void)
{
    output_line("123456789");
    output_line("1234567890");
    output_line("12345678901");
    return 0;
}

[people with implementations that conform to an older standard should cry
in their beer rather than whining about the fact that i used the current.]

--
bringing you boring signatures for 17 years

those who know me have no need of my name

unread,
Feb 1, 2003, 9:21:26 PM2/1/03
to
[fu-t set]

in comp.lang.c i read:
>bd wrote:
>> Tars_Tarkas wrote:

>>>it has always bothered me that there wasn't a simple little hex editor
>>>for unix in text mode (that I know of),

>> Emacs has one:


>> M-x hexl-mode
>
>Ok, I didn't know that, might've guessed, but still, last time I
>checked Emacs wasn't simple and surely not little ;)

depends, e.g., if you use emacs a lot then you probably don't exit it very
often (i've got one running for the last 40+ days) in which case the
incremental cost of another buffer is trivial.

>(Not that emacs isn't a good editor, just not what I'd call "simple
>little").

bed, biew, fb, ghex, khexedit, shed, vile and vim all come to mind.

Andrew Phillips

unread,
Feb 7, 2003, 4:14:37 PM2/7/03
to
This seems to have generated a lot of discussion for such a simple
question.

> ... "everywhere" that does ANSI C ...

You are asking for long debates about C standard issues by putting
this in. Don't confuse "portable" with "works on any possible ANSI C
compiler that conforms to the standard". You can easily write code
that works on most (all?) standard compilers that actually exist. If
you are worried that your code may be compiled on some weird system
then just use assertions to detect this.

assert(sizeof(long) == 4);
assert(sizeof(short) == 2);

long i32;
memcpy(&i32, buffer+cursor, sizeof(i32));
sprintf(disp, "%ld", i32);

short i16;
memcpy(&i16, buffer+cursor, sizeof(i16));
sprintf(disp, "%d", i16);

Simple.

> b) This of course assumes that 16 bit == 2 bytes and 32 bit == 4 bytes,
> but what if that's not true? I mean is there a standard C way of saying
> "a type the size of sizeof(char)*2 or sizeof(char)*4" (sizeof(char)
> being one, IIRC).

Actually the code above doesn't make that assumption. (But even if it
did you would be very hard pressed to find a compiler where CHAR_BIT
was not 8.)

> c) For reference, I'm testing on i386/Linux and Sparc/Solaris. The
> problem I have with that is, on Linux it works all the time, but on
> Solaris it bus errors if cursor (the current byte the cursor is
> highlighting) indexes a non-2-bytes aligned address (in the n2 case,
> non-4-bytes aligned in the n4 case).

Just read the bytes one at a time or copy them somewhere with the
correct alignment (as in memcpy above).

> I thought it was a nice way to avoid any endianess issues which I would
> otherwise have to deal with.

It is. Of course, the byte order, and even the format of signed
numbers (2's complement, 1's complement, or sign & magnitude) being
viewed may not match that of the native system. It's a bit more
difficult to handle that but I can send you code.

> I have some other ideas how to do this, but
> all of them involve figuring out whether the system is little or big
> endian, which I don't particularily want to do (if I can avoid it)...

This is pretty simple as long as you assume the native system uses
normal big or little endian byte orders (ie ABCD and DCBA) and avoid
other weirdness (like CDAB).
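
One way such a check might look, as a sketch only. The enum and function names are invented for this example, and it assumes 8-bit bytes and an unsigned long without padding bits. It copies a known value into a byte array and looks at where the bytes ended up:

```c
#include <assert.h>
#include <string.h>

enum byte_order { ORDER_LITTLE, ORDER_BIG, ORDER_OTHER };

/* Report the native byte order by inspecting the bytes of a known value. */
enum byte_order native_byte_order(void)
{
    const unsigned long probe = 0x01020304UL;
    unsigned char b[sizeof probe];
    const size_t n = sizeof probe;

    assert(n >= 4);
    memcpy(b, &probe, n);   /* reading the bytes as unsigned char is always allowed */

    /* DCBA: least significant byte at the lowest address */
    if (b[0] == 0x04 && b[1] == 0x03 && b[2] == 0x02 && b[3] == 0x01)
        return ORDER_LITTLE;
    /* ABCD: least significant byte at the highest address */
    if (b[n - 1] == 0x04 && b[n - 2] == 0x03 && b[n - 3] == 0x02 && b[n - 4] == 0x01)
        return ORDER_BIG;
    return ORDER_OTHER;     /* mixed orders such as CDAB land here */
}
```

Reading the file's bytes one at a time and combining them with shifts avoids needing this check at all, since the shifts work on values rather than on the native layout.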

Mark Gordon

unread,
Feb 10, 2003, 12:39:26 PM2/10/03
to
On 7 Feb 2003 13:14:37 -0800
and...@expertcomsoft.com (Andrew Phillips) wrote:

<snip>

> > b) This of course assumes that 16 bit == 2 bytes and 32 bit == 4
> > bytes, but what if that's not true? I mean is there a standard C way
> > of saying "a type the size of sizeof(char)*2 or sizeof(char)*4"
> > (sizeof(char) being one, IIRC).
>
> Actually the code above doesn't make that assumption. (But even if it
> did you would be very hard pressed to find a compiler where CHAR_BIT
> was not 8.)

I think this is false for a lot of C compilers for DSP processors. I
know for a fact that the compiler for the TMS320C1x, C2x, C5x series had
CHAR_BIT == 16 for the simple reason that the processors could not
operate on anything smaller for most basic operations.
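
For a hex display that should survive such a platform, the digit count per byte can be derived from CHAR_BIT rather than hard-coded as 2. The following is a sketch under that assumption; the function names are made up for illustration:

```c
#include <assert.h>
#include <limits.h>

/* Number of hex digits needed to show every bit of one byte (one char). */
int hex_digits_per_byte(void)
{
    return (CHAR_BIT + 3) / 4;
}

/* Write all hex digits of one byte, most significant first, plus '\0'.
   `out` must have room for hex_digits_per_byte() + 1 chars. */
void byte_to_hex_wide(unsigned char number, char *out)
{
    static const char digits[] = "0123456789ABCDEF";
    const int n = hex_digits_per_byte();
    int i;

    assert(out != 0);
    for (i = n - 1; i >= 0; i--) {
        out[i] = digits[number & 0x0F];   /* lowest remaining nibble */
        number >>= 4;
    }
    out[n] = '\0';
}
```

With CHAR_BIT equal to 8 this produces the usual two digits; with CHAR_BIT equal to 16 it would produce four.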
--
Mark Gordon
Paid to be a Geek & a Senior Software Developer
Currently looking for a new job commutable from Slough, Berks, U.K.
Although my email address says spamtrap, it is real and I read it.
