
Proper name for meta data on bools?

Rick C. Hodgin

Oct 17, 2017, 4:18:44 PM
Is there a proper name for encoding metadata on something like a bool,
where the actual data value is 0 for false and !0 for true, but then using
the additional storage to encode information within the value, so that
it still reads as true or false as expected but also carries some
metadata or additional information beyond just yes/no?

#include <iostream>

int main()
{
    union
    {
        char value;
        bool flag;
    };

    flag = true;    // value is 0x01
    flag = false;   // value is 0x00

    value = 10;
    std::cout << ((flag) ? "True" : "False") << std::endl;
}

Is there a proper term for populating value (flag's internal value)?
And does anyone know if C or C++ compilers will ever strip the value
down to its true/false state when passing the value to a function?

Thank you,
Rick C. Hodgin

asetof...@gmail.com

Oct 17, 2017, 11:17:50 PM
Some time ago I heard someone say that a boolean variable
is better off being an int...
I agree; possibly it could be an unsigned int.
For me it cannot be a single bit or a char,
because C has to follow
the conversion up to int.

A boolean has practical use as a single bit only when you have an array of booleans...

Paavo Helde

Oct 18, 2017, 1:23:13 AM
On 18.10.2017 6:17, asetof...@gmail.com wrote:
> Sometime ago I heard someone speak of
> boolean variable it is better to be one int...
> I agree, possibly can be unsigned int;
> for me can not be a bit nor one char
> because conversion until int
> C language has to follow

Wow, a poem!

Jorgen Grahn

Oct 18, 2017, 1:28:56 AM
On Tue, 2017-10-17, Rick C. Hodgin wrote:
> Is there a proper name for encoding meta data on something like a bool,
> where actual data value is 0 for false, !0 for true, but to then use
> that additional storage to encode information within the value so that
> it still reads as true and false as expected, but also contains some
> meta data or additional information beyond just yes/no?
>
> union
> {
> char value;
> bool flag;
> };
>
> flag = true; // value is 0x01
> flag = false; // value is 0x00
>
> value = 10;
> std::cout << ((flag) ? "True" : "False") << std::endl;
>
> Is there a proper term for populating value (flag's internal value)?

I think the problem is you're trying for too much abstraction.
A "bool with metadata" doesn't make sense to me, because bools
don't have metadata.

While if you take a more concrete example, an errno value (or an error
code in general) is in some sense a boolean with (in case of an error)
additional data.

> And does anyone know if C or C++ compilers will ever strip the value
> down to its true/false state when passing the value to a function?

I'm ignoring your union above, but in C++ with a class, you as the
author control conversion to bool. (Before C++11 it was a bit tricky
IIRC, if you wanted nice syntax.)

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

David Brown

Oct 18, 2017, 3:02:59 AM
"A bit tricky" ? The "safe boolean idiom" was a fine example of just
how hideous C++ could be in order to do something so useful and so
simple to describe. The introduction of explicit conversions in C++11
is then a fine example of how the C++ committee pay attention to the
real world C++ challenges, and add features that make the language
significantly better. Safe boolean conversions are now simple to write,
and very useful.

David Brown

Oct 18, 2017, 3:09:10 AM
If you are adding metadata to a bool, it is not a bool any more. You
should consider it from the other side - think of a type holding some
information, that you will sometimes want to view or test as a simple
boolean flag.

There is a lot of use in having something that holds data and can be
considered as "true" or "false", or "valid" or "invalid". This concept
goes back to the earliest days of C, using a null pointer 0 to indicate
an invalid pointer - thus letting you test pointer validity with "if (p)
{ ... }". In C, you basically take a pointer or an integer type and use
that - any time you treat it in a boolean context (like in "if", or when
assigning to a bool) then 0 is "false", non-0 is "true".

With C++, you make a class with an explicit conversion to bool. That
lets you have more freedom in picking which values (of the data members)
are considered "false", and which are considered "true".


Rick C. Hodgin

Oct 18, 2017, 3:38:55 AM
Is there a proper name for exploiting otherwise unused data space,
such as in the example I gave, which does not alter program behavior,
but does introduce additional data conveyance mechanisms atop legacy
code bases (for example)?

If you don't know the proper name, don't answer. I am looking to know
if there is a proper name for this ability.

I do not know the proper name, or even if one exists in computer science,
hence my asking. I presume it does exist.

David Brown

Oct 18, 2017, 5:03:51 AM
On 18/10/17 09:38, Rick C. Hodgin wrote:
> Is there a proper name for exploiting otherwise unused data space,
> such as in the example I gave, which does not alter program behavior,
> but does introduce additional data conveyance mechanisms atop legacy
> code bases (for example)?

I don't know of any name - perhaps "tagging" of some sort?

I am sceptical of the idea of applying this to unchanged existing code.
Either it /will/ change program behaviour to at least some extent, or
it is going to be unreliable. For example, if you try to use values
other than 0 or 1 in place of a boolean, to convey additional
information, then there are three things that can go badly wrong. One
is that any use of "bool", "!!", or similar constructs, will wipe the
extra information. Another is that code that expects the value to be
either 0 or 1, will break. Finally, the existing code could already be
using the extra bits for its own purposes in exactly the same way.

It is a different matter if you are talking about a C++ type with
careful access control, and you change the type itself - if all access
to the data members, copying, etc., happens via methods that /you/
control, you can add more information and safely re-compile the code
that uses it. But for low-level types, you would have to be sure that
the legacy code uses them safely.

It is certainly possible to make use of otherwise unused space in types
- look up "tagged pointer" for an example. But that is done in
cooperation with code that uses the pointer, not as an add-on to
existing code.

Rick C. Hodgin

Oct 18, 2017, 7:44:25 AM
What is it called when padded data is used to convey information?
Like on BMP files. Every row must be 32-bit aligned, but RGB()
color data can be 24-bits, leaving up to 3 bytes per row available
that is otherwise wasted data space.

If the use of that data has a name, then the name for this
type of other data use I describe is probably similar.

David Brown

Oct 18, 2017, 8:29:40 AM
On 18/10/17 13:44, Rick C. Hodgin wrote:
> What is it called when padded data is used to convey information?

My immediate instinct is to call it "a mistake". I am still not sure
what you are hoping to do here, so it would be wrong of me to be too
categorical - but please tread carefully. You risk making something
that will work during your testing, but break with other code because
either you or the original code author are making assumptions.

> Like on BMP files. Every row must be 32-bit aligned, but RGB()
> color data can be 24-bits, leaving up to 3 bytes per row available
> that is otherwise wasted data space.
>
> If the use of that data has a name, then the name for this
> type of other data use I describe is probably similar.
>

Are you thinking of this?
<https://en.wikipedia.org/wiki/Steganography>

That is more about putting information in less significant bits
(effectively adding noise) rather than unused bits.

(In a BMP file, there will only be padding if the image width is not a
multiple of 4. And any manipulation of the picture data may change or
drop data stored in the padding.)

If you are really just adding information into previously unused spaces,
in a safe manner, then you are simply extending the existing format.

Scott Lurndal

Oct 18, 2017, 8:30:32 AM
"Rick C. Hodgin" <rick.c...@gmail.com> writes:
>Is there a proper name for exploiting otherwise unused data space,
>such as in the example I gave, which does not alter program behavior,
>but does introduce additional data conveyance mechanisms atop legacy
>code bases (for example)?

The proper name is "foolish". Many past attempts to
"exploit otherwise unused data space" have not lasted. Many
systems used to use the unused high bits of a pointer, for example,
which led to problems when VA/PA sizes changed on newer processors,
and to crappy code.

Other attempts (e.g. storing data in unused portions of the
instruction stream - like the Burroughs B200) have their own
issues.

Bo Persson

Oct 18, 2017, 8:44:36 AM
On 17/10/17 22:18, Rick C. Hodgin wrote:
> Is there a proper name for encoding meta data on something like a bool,
> where actual data value is 0 for false, !0 for true, but to then use
> that additional storage to encode information within the value so that
> it still reads as true and false as expected, but also contains some
> meta data or additional information beyond just yes/no?
>
> union
> {
> char value;
> bool flag;
> };
>
> flag = true; // value is 0x01
> flag = false; // value is 0x00
>
> value = 10;
> std::cout << ((flag) ? "True" : "False") << std::endl;
>
> Is there a proper term for populating value (flag's internal value)?
> And does anyone know if C or C++ compilers will ever strip the value
> down to its true/false state when passing the value to a function?
>

It is not unusual to see compilers test only the low bit of a boolean,
for example using TEST byte [flag],1 on an x86 system.

So it would fail for even values, like 10.

This sounds like holding one foot over the edge of a cliff, hoping not
to fall.



Bo Persson

Rick C. Hodgin

Oct 18, 2017, 8:59:26 AM
On Wednesday, October 18, 2017 at 8:30:32 AM UTC-4, Scott Lurndal wrote:
> "Rick C. Hodgin" <rick.c...@gmail.com> writes:
> >Is there a proper name for exploiting otherwise unused data space,
> >such as in the example I gave, which does not alter program behavior,
> >but does introduce additional data conveyance mechanisms atop legacy
> >code bases (for example)?
>
> The proper name is "foolish"...

I'm not asking about the ethical nature, or security or validity of
using this technique. I'm simply asking what it's called (if there's
a technical name for it, if it's something noted (perhaps by computer
virus researchers) as something that is known, and is employed to
convey more information than the original intent in the data).

I've searched and I don't know what to search for to find the name,
so I thought I would ask other developers who may know.

If you don't know that's fine. All I'm asking for is the name.

Rick C. Hodgin

Oct 18, 2017, 9:01:19 AM
On Wednesday, October 18, 2017 at 8:44:36 AM UTC-4, Bo Persson wrote:
> It is not unusual to see compilers only test the low bit of a boolean,
> for example using BT [flag],1 on an x86 system.
>
> So would fail for even values, like 10.

This is useful information, Bo. Appreciated.

David Brown

Oct 18, 2017, 9:20:20 AM
I remember helping someone with code that was something equivalent to:

extern void readFromEEprom(uint16_t eepromAddress, uint16_t count,
void * destination);

int foo(void) {
bool b;
readFromEEprom(1000, 1, &b);
if (b) {
return 1;
} else {
return 2;
}
}

He was most surprised that this function was returning values other than
1 or 2 - apparently his bool was neither true nor false. It turned out
that the value stored in the eeprom was something like 10, rather than 0
or 1. The compiler had optimised the conditional into the equivalent of
"return 2 - b;" - which is perfectly correct as a bool can only hold
either 0 or 1, as long as you have used it in a proper manner.

The moral here is don't lie to your compiler - it will get its revenge.

>
> This sounds like holding one foot over the edge of a cliff, hoping not
> to fall.
>

Indeed.

(Further comments from Rick suggest he simply wants to know about this,
perhaps to be sure of avoiding it, rather than because he wants to
balance on a cliff edge. I still can't help him with any names to aid
his search.)

Öö Tiib

Oct 18, 2017, 9:53:17 AM
On Wednesday, 18 October 2017 15:59:26 UTC+3, Rick C. Hodgin wrote:
> On Wednesday, October 18, 2017 at 8:30:32 AM UTC-4, Scott Lurndal wrote:
> > "Rick C. Hodgin" <rick.c...@gmail.com> writes:
> > >Is there a proper name for exploiting otherwise unused data space,
> > >such as in the example I gave, which does not alter program behavior,
> > >but does introduce additional data conveyance mechanisms atop legacy
> > >code bases (for example)?
> >
> > The proper name is "foolish"...
>
> I'm not asking about the ethical nature, or security or validity of
> using this technique. I'm simply asking what it's called (if there's
> a technical name for it, if it's something noted (perhaps by computer
> virus researchers) as something that is known, and is employed to
> convey more information than the original intent in the data).

It is called "bit packing" and more generally "data compression".
The exact technique that you describe has additionally "foolish"
before it and "in the wrong way" after it.

David Brown

Oct 18, 2017, 10:07:41 AM
It might have other adjectives attached, like "deceitful" or "secret" -
as Rick suggests, it could be used in connection with malware or other
hidden information (watermarking is a related technology). Presumably
if that's what Rick is thinking of, it is in terms of spotting the
malware or preventing it.

Scott Lurndal

Oct 18, 2017, 10:09:58 AM
ARM calls it TBI (top byte ignored) when applied to pointers. A processor
state bit can be set by the OS to cause the high byte of 64-bit pointers
to be ignored by the hardware. This does, however, limit the virtual
address space to only 56 bits (to be fair, it's currently limited by the
architecture (v8.2) to 52-bits).

Rick C. Hodgin

Oct 18, 2017, 10:24:33 AM
On Wednesday, October 18, 2017 at 10:09:58 AM UTC-4, Scott Lurndal wrote:
> ARM calls it TBI (top byte ignored) when applied to pointers. A processor
> state bit can be set by the OS to cause the high byte of 64-bit pointers
> to be ignored by the hardware. This does, however, limit the virtual
> address space to only 56 bits (to be fair, it's currently limited by the
> architecture (v8.2) to 52-bits).

I think it has to be something like data co-opting, or non-design
reclamation, or something along those lines.

James R. Kuyper

Oct 18, 2017, 11:38:44 AM
On 2017-10-18 09:19, David Brown wrote:
> On 18/10/17 14:44, Bo Persson wrote:
>> On 17/10/17 22:18, Rick C. Hodgin wrote:
>>> Is there a proper name for encoding meta data on something like a bool,
>>> where actual data value is 0 for false, !0 for true, but to then use
>>> that additional storage to encode information within the value so that
>>> it still reads as true and false as expected, but also contains some
>>> meta data or additional information beyond just yes/no?
>>>
>>> union
>>> {
>>> char value;
>>> bool flag;
>>> };
>>>
>>> flag = true; // value is 0x01
>>> flag = false; // value is 0x00

Unless bool is your own typedef, you're making way too many unjustified
assumptions.
Assuming that <stdbool.h> has been #included, so bool is a macro that
expands to _Bool, the only thing that the standard guarantees about
flag is that it has at least one value bit. There's no upper limit on
the number of padding or value bits it may contain. sizeof(bool) is
permitted to be > 1 (it might be the same as sizeof(int_fast8_t)). If it
contains only one value bit, there's no guaranteeing which one that is.
It might be the same bit that signed char uses as a sign bit.

Think for a minute about how your code might fail if sizeof(flag) >
sizeof(value). Consider how it might fail if the only value bit of
flag corresponds to any bit other than the lowest-order bit of value.
Then re-design.

I get the impression he's having trouble seeing the forest because
there are too many trees in the way. He should simply use 'value', and
take advantage of the fact that, for the purpose of conditional
expressions, 0 is treated as false and non-zero is treated as true. By
declaring it simply and correctly as a char, rather than as a union with
bool, he avoids the problems that Bo Persson and you have pointed out.
Those are valid optimizations only because of the unique properties of _Bool.
Chris Vine

Oct 18, 2017, 11:46:56 AM
The bottom bits (rather than top bits) of pointers are sometimes used in
dynamically typed language implementations to hold a limited amount of
type information. If the allocator allocates memory aligned on the
pointer size, with 64-bit pointers the bottom 3 bits hold 000 and are
available for use; likewise the bottom 2 bits for 32-bit pointers[1].
This seems to go by the name of pointer tagging.

Chris

[1] glibc malloc() and cognates reputedly always align addresses on
8-byte boundaries even on 32-bit systems, so allowing the bottom 3 bits
to be used in that case.

Gareth Owen

Oct 18, 2017, 2:33:01 PM
sc...@slp53.sl.home (Scott Lurndal) writes:

> "Rick C. Hodgin" <rick.c...@gmail.com> writes:
>>Is there a proper name for exploiting otherwise unused data space,
>>such as in the example I gave, which does not alter program behavior,
>>but does introduce additional data conveyance mechanisms atop legacy
>>code bases (for example)?
>
> The proper name is "foolish". Many attempts in the past to
> "exploit otherwise unused data space" have never lasted. Many
> systems used to use the unused high bits of a pointer, for example,
> which led to problems with VA/PA sizes changed on processors
> and crappy code.

Counterpoint: Chandler Carruth on how they use every single bit of composite
objects in Clang/LLVM https://www.youtube.com/watch?v=vElZc6zSIXM (the key
bits are from about 22:00)

Rick C. Hodgin

Oct 18, 2017, 2:37:48 PM
On Wednesday, October 18, 2017 at 2:33:01 PM UTC-4, gwowen wrote:
> CounterPoint: Chandler Carruth how they use ever single bit of composite
> objects in Clang/LLVM https://www.youtube.com/watch?v=vElZc6zSIXM (key
> bits from 22:00 ish)

You can embed timestamps in YouTube URLs using &t=XXhYYmZZs:

Jump right to 22:00 using this url:
https://www.youtube.com/watch?v=vElZc6zSIXM&t=22m0s

Gareth Owen

Oct 18, 2017, 2:47:50 PM
"Rick C. Hodgin" <rick.c...@gmail.com> writes:

Bo Persson

Oct 19, 2017, 1:32:32 AM
Of course you can do that if you have your own compiler.

But the code will have zero portability, which is a problem only if you
intended it to be portable. Clang/LLVM internal code might not have that
goal.


Bo Persson

Öö Tiib

Oct 19, 2017, 2:44:28 AM
It can be "secret". In practice I have seen it done, with a perfectly straight
face, out of a desire to pass more information without changing an interface.
The result is typically the opposite of what I imagine they wanted to achieve.
Instead of backward compatibility they get very odd defects in the components
that were written earlier and so did not anticipate the extended "Gotcha!"s
in the interface, and the future is also dim because maintainers are confused
by that strange interface. That is why I lean towards "foolish".

Gareth Owen

Oct 19, 2017, 1:55:51 PM
Bo Persson <b...@gmb.dk> writes:

> Of course you can do that if you have your own compiler.
>
> But the code will have zero portability, which is a problem only if
> you intended it to be portable.

Nope.

It's insanely complicated template magic, and it relies on alignof and
uintptr_t from C++11, but other than that, it's completely standard.