Variable Block Text File

scad

Oct 28, 2008, 6:57:04 PM

I have a file that has blocks of data that can vary in length. The
first 2 bytes of the block are a Hex number telling me how many bytes
long the block is (including those 2 bytes). I need to be able to
read those first 2 bytes, then read the entire block and write it out
to a new file with '\n' at the end of each block. Can someone help me
with that? I am having significant trouble determining the block
length, as I have done little work in C++.

Thank you,

Scott

Juha Nieminen

Oct 28, 2008, 7:07:46 PM

scad wrote:
> I have a file that has blocks of data that can vary in length. The
> first 2 bytes of the block are a Hex number telling me how many bytes
> long the block is (including those 2 bytes).

Are you sure the two bytes form a hexadecimal number (in ASCII?), that
is, that the maximum size of the block is 255 bytes (i.e. FF in hex), rather
than the two bytes forming a 16-bit value giving the size of the block
(i.e. a maximum size of 65535 bytes)?

The solution obviously depends on which it is. In the latter case it
also depends on whether the two bytes form a little-endian or a
big-endian value.
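To make the distinction concrete, here is a minimal C++ sketch of the three possible readings. It applies them to an illustrative pair of bytes, 01 2C, or to the two characters '2' and 'C'. The function names are hypothetical. The first reading assumes the two bytes are ASCII hex digit characters.

```cpp
#include <string>

// Reading 1 (assumed format): the two bytes are ASCII hex digit characters,
// e.g. '2' and 'C', so the largest possible length is "FF" = 255.
unsigned from_ascii_hex(char c1, char c2)
{
    // std::stoul throws std::invalid_argument if c1 is not a hex digit.
    return static_cast<unsigned>(std::stoul(std::string{c1, c2}, nullptr, 16));
}

// Reading 2: a raw 16-bit value with the high-order byte first (big-endian).
unsigned from_big_endian(unsigned char first, unsigned char second)
{
    return (static_cast<unsigned>(first) << 8) | second;
}

// Reading 3: a raw 16-bit value with the low-order byte first (little-endian).
unsigned from_little_endian(unsigned char first, unsigned char second)
{
    return (static_cast<unsigned>(second) << 8) | first;
}
```

The same two raw bytes give 300 when read big-endian and 11265 when read little-endian. The characters '2' 'C' read as hex give 44. So the format has to be pinned down before any reading code can be written.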

scad

Oct 28, 2008, 8:28:48 PM

It is a 16-bit value. 7F 88 = 32648

Thank you,
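As a worked check, read the bytes 7F 88 with the first byte as the high-order byte. That reproduces the value above, while the opposite (little-endian) order does not:

```latex
\begin{aligned}
\texttt{7F88}_{16} &= 127 \cdot 256 + 136 = 32648 && \text{(first byte high: big-endian)}\\
\texttt{887F}_{16} &= 136 \cdot 256 + 127 = 34943 && \text{(first byte low: little-endian)}
\end{aligned}
```

So the length field is most likely a big-endian 16-bit value, assuming this example is typical of the file.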

James Kanze

Oct 29, 2008, 5:05:19 AM

And how is this binary value represented? Without knowing that,
we can't read it. If it's the same as an unsigned short in XDR,
something like:

// high-order byte first, as in XDR
unsigned short result = input.get() ;
result = (result << 8) | input.get() ;

would do the trick (except for error handling). If the format
is something else, you'd need something different.

And of course, this only works if you open the file in binary.
Similarly, reading the data, then outputting it with a trailing
'\n', will likely only work if the data is text, encoded in the
same character set as you normally use.

--
James Kanze (GABI Software) email:james...@gmail.com
Consulting in object-oriented computing
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Juha Nieminen

Oct 29, 2008, 11:35:55 AM

scad wrote:
> It is a 16-bit value. 7F 88 = 32648

Thus it had nothing to do with hexadecimal. You should be more
accurate when posting questions, or else you will only send people on
wild-goose chases.

sean_in...@yahoo.com

Oct 30, 2008, 9:17:02 AM

It's common for beginners to associate binary values
with hex. No need to bite the newbies.

Sean

Richard Herring

Oct 30, 2008, 9:50:46 AM

In message
<9db199e7-3f40-4ad0...@y71g2000hsa.googlegroups.com>,
sean_in...@yahoo.com writes

It's common for beginners and others to confuse values with
representations, and this should be discouraged.

A value is just a value, it isn't "binary" any more than it is
"hexadecimal".

--
Richard Herring
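A small C++ illustration of Richard's point, with hypothetical helper names. One value can be written out as decimal text or as hexadecimal text, and parsing either text back gives the same value.

```cpp
#include <sstream>
#include <string>

// Writes a value as decimal text, e.g. 32648 -> "32648".
std::string to_decimal_text(unsigned v)
{
    std::ostringstream s;
    s << std::dec << v;
    return s.str();
}

// Writes the same value as hexadecimal text, e.g. 32648 -> "7F88".
std::string to_hex_text(unsigned v)
{
    std::ostringstream s;
    s << std::hex << std::uppercase << v;
    return s.str();
}

// Parses text in the given base back into a value.
unsigned parse_text(const std::string& text, int base)
{
    return static_cast<unsigned>(std::stoul(text, nullptr, base));
}
```

"32648" and "7F88" are two representations; parse_text turns both into the one value they denote.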

Juha Nieminen

Oct 30, 2008, 3:14:20 PM

Richard Herring wrote:
> A value is just a value, it isn't "binary" any more than it is
> "hexadecimal".

True, but it's difficult to talk about values and their storage when
the terminology is so confusing.

"Hexadecimal" refers quite unambiguously to the (usually ASCII)
representation of a numerical value (in base 16). The term "binary" is
more complicated.

In theory when you say "the number is stored in binary" it might refer
to one of two things:

1) It's stored in base-2 representation. That is, the number is stored
by writing a combination of the two characters '0' and '1'.

2) It's stored in the same way as it's stored in memory, that is, as a
series of octets. In other words, it's stored in "raw" format, without any
conversion or representation in ASCII.
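The two meanings can be sketched in C++ for the 16-bit value 32648 from earlier in the thread. The helper names are hypothetical. The high-order-byte-first order in the second function is an assumption, chosen to match that example.

```cpp
#include <bitset>
#include <cstdint>
#include <string>

// Meaning 1: base-2 *text*, one '0' or '1' character per bit.
std::string to_base2_text(std::uint16_t v)
{
    return std::bitset<16>(v).to_string();   // 16 characters for a 16-bit value
}

// Meaning 2: "raw" octets, here with the high-order byte first.
// The byte order is an assumption; the file format has to specify it.
std::string to_raw_octets(std::uint16_t v)
{
    std::string bytes;
    bytes.push_back(static_cast<char>((v >> 8) & 0xFF));
    bytes.push_back(static_cast<char>(v & 0xFF));
    return bytes;                            // 2 octets for a 16-bit value
}
```

For 32648, the first function produces the 16 characters "0111111110001000". The second produces just two octets, 7F and 88.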

Thus the term "binary" is used with two different meanings: in some
contexts it refers to base-2 (ASCII) representation, in other contexts
to raw, unconverted byte values (e.g. when saying "open the
file in binary mode"). These two things have basically nothing to do with
each other, except that they share the name "binary".

Maybe this is why some people get even more
confused and think "hexadecimal" refers to what is usually meant by
"binary" (in the second meaning).

James Kanze

Oct 31, 2008, 6:33:57 AM

On Oct 30, 8:14 pm, Juha Nieminen <nos...@thanks.invalid> wrote:
> Richard Herring wrote:
> > A value is just a value, it isn't "binary" any more than it
> > is "hexadecimal".

> True, but it's difficult to talk about values and their
> storage when the terminology is so confusing.

> "Hexadecimal" refers quite unambiguously to the (usually
> ASCII) representation of a numerical value (in base 16). The
> term "binary" is more complicated.

> In theory when you say "the number is stored in binary" it
> might refer to one of two things:

> 1) It's stored in base-2 representation. That is, the number
> is stored by writing a combination of the two characters '0'
> and '1'.

That is, actually, what is required by the C++ standard.

Of course, since only two characters are involved, a character
encoding using just one bit (rather than the usual 7, 8 or more)
is sufficient, and used by all of the implementations I've ever
encountered.

(Sort of a half :-). Just thought I'd add to the confusion, for
the fun of it.)

> 2) It's stored in the same way as it's stored in memory, that
> is, as a series of octets. In other words, it's stored in
> "raw" format, without any conversion or representation in
> ASCII.

I like the word "raw". Or "machine" or "hardware" representation.

The C++ standard requires this to be a pure binary
representation (and I don't think the intent is to require
ASCII).

Of course, all of the standard requirements are "as if"; an
implementation can use base 10, as long as it implements &, |, ^
and ~ in a manner that they behave "as if" the representation
were base 2.
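To look at the "machine" representation directly, a sketch with hypothetical names can copy an object's bytes with std::memcpy. The value is fixed by the language, but the order of its bytes in memory depends on the host. So this code only checks which of the two possible orders it sees.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Copies the object representation ("raw"/"machine" bytes) of a 16-bit value.
// The byte order is whatever the host uses: {7F, 88} for 0x7F88 on a
// big-endian machine, {88, 7F} on a little-endian one such as x86.
std::array<unsigned char, 2> machine_bytes(std::uint16_t v)
{
    std::array<unsigned char, 2> bytes{};
    std::memcpy(bytes.data(), &v, sizeof v);
    return bytes;
}

// True if the host stores the low-order byte first.
bool host_is_little_endian()
{
    return machine_bytes(1)[0] == 1;
}
```

This host dependence is also why a file format usually has to specify a byte order explicitly rather than rely on the machine's.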

> Thus the term "binary" is used with two different meanings: in
> some contexts it refers to base-2 (ASCII) representation, in
> other contexts to raw, unconverted byte values
> (e.g. when saying "open the file in binary mode"). These two
> things have basically nothing to do with each other, except
> that they share the name "binary".

And that they are both demonstrably base 2. (Consider the
behavior of |, &, ^ and ~.)
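One way to see this point for unsigned values, sketched with hypothetical function names: the digits extracted with & and >> match the digits extracted with % 2 and / 2. Also, ~ on a 16-bit value flips every base-2 digit, which is the same as subtracting the value from 65535.

```cpp
#include <cstdint>
#include <string>

// Base-2 digits obtained with the bitwise operators & and >>.
std::string digits_bitwise(unsigned v)
{
    std::string d;
    do {
        d.insert(d.begin(), static_cast<char>('0' + (v & 1u)));
        v >>= 1;
    } while (v != 0);
    return d;
}

// The same digits obtained arithmetically, using only % and /.
std::string digits_arithmetic(unsigned v)
{
    std::string d;
    do {
        d.insert(d.begin(), static_cast<char>('0' + v % 2));
        v /= 2;
    } while (v != 0);
    return d;
}

// Flips every base-2 digit of a 16-bit value; the result equals 65535 - v.
std::uint16_t complement16(std::uint16_t v)
{
    return static_cast<std::uint16_t>(~v);   // conversion keeps the low 16 bits
}
```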

> Maybe this is why some people get
> even more confused and think "hexadecimal" refers to what
> is usually meant by "binary" (in the second meaning).

Since most modern machines are byte oriented, maybe we should
call machine format base 256.
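Taking that closing remark literally, here is a sketch (hypothetical names) that treats each byte as one base-256 digit. Seen this way, 32648 is the two-digit base-256 number (127, 136), i.e. the bytes 7F 88 from earlier in the thread.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Splits a 32-bit value into its four base-256 digits, most significant first,
// using only division and remainder by 256.
std::array<unsigned, 4> to_base256(std::uint32_t v)
{
    std::array<unsigned, 4> digits{};
    for (std::size_t i = digits.size(); i-- > 0;) {
        digits[i] = v % 256;
        v /= 256;
    }
    return digits;
}

// Rebuilds the value: each digit is worth 256 times the digit after it.
std::uint32_t from_base256(const std::array<unsigned, 4>& digits)
{
    std::uint32_t v = 0;
    for (unsigned d : digits)
        v = v * 256 + d;
    return v;
}
```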