Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Re: Ideas on RTTI support?

1 view
Skip to first unread message

Rod Pemberton

unread,
Jun 19, 2009, 2:50:23 AM6/19/09
to
"Scott Balmos" <sba...@fastmail.fm> wrote in message
news:4a516244-9d26-4e26...@m19g2000yqk.googlegroups.com...
> [snip]
> Has anyone ever attempted writing C++ RTTI support in an OS, if not in
> a kernel?
>

No, I use C. I'll listen to your idea though. But, I'm not familiar with
RTTI. Since I'm not, could you explain why or what you're wanting to do in
your OS?

I mean, from what I've been able to gather about RTTI, it's a C++ feature
that apparently allows you to perform a validity check on a type cast or
type conversion at runtime instead of at compile time. What advantage does
doing this at runtime have? What I'm trying to ask is, once your code is
compiled and passes the compile time checks for a specific cast or
conversion, why would it also need to do a runtime check? I'm assuming this
has something to do with C++'s programming paradigm... Overloading?


Rod Pemberton


Dmitry A. Kazakov

unread,
Jun 19, 2009, 3:54:57 AM6/19/09
to

Checking is a wrong word, identification is a correct one. The type
identification is necessary to be able to perform dynamic dispatch. Since C
is does not have polymorphic objects there is no dispatch and no need in
identification.

As for OS support there are problems with it:

1. Reuse of the type identifiers (tags), after the type dies (leaving its
scope). The tags should be dense or hashable in order to have dispatch
efficient.

2. Global scope of the tags. In a distributed OS you will need to have
unique tag for whatever possible type.

3. Persistence (stored polymorphic objects must keep their unique type
tags)

4. Language impedance. Some languages like C++ define the way type tags
must behave, which is inconsistent, limiting, and inefficient for more
advanced OO languages. It might be difficult to find a common denominator.

I guess that tags (and dispatching tables) would have much in common with
the virtual memory management.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

cr88192

unread,
Jun 19, 2009, 1:18:52 PM6/19/09
to

"Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
news:pvpvu6vo1ahz$.81meb7ad8cfn.dlg@40tude.net...

> On Fri, 19 Jun 2009 02:50:23 -0400, Rod Pemberton wrote:
>
>> "Scott Balmos" <sba...@fastmail.fm> wrote in message
>> news:4a516244-9d26-4e26...@m19g2000yqk.googlegroups.com...
>>> [snip]
>>> Has anyone ever attempted writing C++ RTTI support in an OS, if not in
>>> a kernel?
>>
>> No, I use C. I'll listen to your idea though. But, I'm not familiar
>> with
>> RTTI. Since I'm not, could you explain why or what you're wanting to do
>> in
>> your OS?
>>
>> I mean, from what I've been able to gather about RTTI, it's a C++ feature
>> that apparently allows you to perform a validity check on a type cast or
>> type conversion at runtime instead of at compile time. What advantage
>> does
>> doing this at runtime have? What I'm trying to ask is, once your code is
>> compiled and passes the compile time checks for a specific cast or
>> conversion, why would it also need to do a runtime check? I'm assuming
>> this
>> has something to do with C++'s programming paradigm... Overloading?

(responding to Rod):
RTTI can be used as a type-checking mechanism similar to dynamic types, but
differing in that primarily, as opposed to using predicates, you use a
"dynamic_cast" and see if the result is NULL...


>
> Checking is a wrong word, identification is a correct one. The type
> identification is necessary to be able to perform dynamic dispatch. Since
> C
> is does not have polymorphic objects there is no dispatch and no need in
> identification.
>

some of us do dynamic typing in C however, although typically by different
means...


> As for OS support there are problems with it:
>
> 1. Reuse of the type identifiers (tags), after the type dies (leaving its
> scope). The tags should be dense or hashable in order to have dispatch
> efficient.
>
> 2. Global scope of the tags. In a distributed OS you will need to have
> unique tag for whatever possible type.
>
> 3. Persistence (stored polymorphic objects must keep their unique type
> tags)
>

these can be accomplished by using strings for types.
though a "simple and intuitive" approach exists (comming up with a name for
each possible use and naming each object with it), another possible approach
(which works better if one is the compiler dev), is to use the source-code's
type as the basis for the type ID.

the remaining issue is that, at the C level, there may be many "anonymous"
types (such as structs or function-pointers which are declared implicitly as
part of another declaration).

originally, I had used a "global gensym" to generate unique names for each
such object, but once adding persistent metadata this became an issue since
then the metadata cache becomes quickly polluted with endless redefinitions
of the same type...

one idea I had found was to make use of the lexical structure for
calculating a hash, such that 2 types formed via the same sequence of
(internal) tokens are considered equivalent (theoretically, structural
equivalence would be better, but it is much easier to calculate a hash and
look up a type via its hash than it is to go and look for a pre-existing
structurally-equivalent type).

this system is then used in combination with explicit type naming...


> 4. Language impedance. Some languages like C++ define the way type tags
> must behave, which is inconsistent, limiting, and inefficient for more
> advanced OO languages. It might be difficult to find a common denominator.
>

if possible, one option is to make use of a universal/generic "signature"
system, where most types can be identified in terms of a signature string...

the issue here is that such a setup is a little more complicated than it may
seem, as it needs to be capable of expressing not just structural types
(structs, classes, ...), but also primitive types (int, short, long, ...),
should not alias with languages (AKA: Java/C# char and C char are different,
and both likely need to be expressed as different types, as well as
expressing the very different object systems/semantics between C# and C++,
...);
as well, it should also be capable of representing abstract virtual types
(such as 'cons', 'fixnum', 'number', ...) typically found in dynamic
languages (these should not be represented as the implementation's
base-types, such as 'void *' or 'int');
...

I have partly been addressing these issues, though mostly through mutation
and adaptation (typically changes are kept small, as any notable change in
my case requires modifying lots of disparate code...).

in my case, each signature is a string, so often this allows a very simple
generic type-check mechanism:
types are compatible if the signature strings are equivalent (theoretically,
compatibility may exist between lexically different strings, but this is
much more difficult and expensive to check, and in most cases, exact-match
semantics are sufficient...).

I also like signature strings being strings as this means that sig-strings
are far more easily human-decodable (unlike a binary signature coding, such
as used in .NET). this does not mean that I think said strings should be
"human readable" (as in, verbose to a point of being easily recognizable),
rather, I just figure they should be composed of ASCII characters.

as-is, I will note that my system is sort of a descendant and hybridization
of both the IA64 name-mangling scheme, and the JVM's signature strings
(syntax closer to JVM, type notation similar to IA64's name mangling). FWIW,
I will note that the IA64 name mangling scheme is de-facto in GCC and many
other C++ compilers (except maybe MSVC and friends...). so, fammiliarity
with name-mangling should give an idea of the basic notation (for C and C++
it is more-or-less letter-equivalent).

however, for other reasons my actual mangled names are somewhat different,
where personally I feel my scheme to be an improvement over the IA64 scheme
(both a generally cleaner syntax, as well as able to retain more
information, such as the return type, ...).
however, the combined name-and-signature is itself mangled (using an
adaptation of the JNI scheme), and so the resultant mangled names look a
little different.

in my case, the idea is to bridge the systems at link time (allowing calls
to/from existing C++ compilers).


> I guess that tags (and dispatching tables) would have much in common with
> the virtual memory management.
>

yeah...

type strings are associated with every object allocated from my garbage
collector (among other things).
allocating raw memory is typed as well, at least as far as identifying that
the allocation is more-or-less unstructured memory (or unknown-structured
memory).

of course, in this case, my GC uses a much older system than my signature
system, so typically heap objects are named with generic names. I have
considered at times using sig-strings when allocating objects as well, just
this has not caught on as of yet in my case (technically, it would likely
require internal framework code to be doing the allocations, and generic
user code is unlikely to be able to get the signatures exactly right or keep
everything consistsnt between their code and the matadata database, ...).

so, for user-allocated objects, generic/semantic type names are probably
better (used in combination with sig-strings).

the main transition point would likely be with the introdution of the 'new'
keyword, as then it is the compiler allocating the object, and so can far
more likely generate correct sig-strings for the allocs.


I have no real idea how existing C++ compilers typically do RTTI though,
since this does not seem to be exactly well documented...

cr88192

unread,
Jun 19, 2009, 1:49:49 PM6/19/09
to

"cr88192" <cr8...@hotmail.com> wrote in message
news:h1gh9u$7oq$1...@news.albasani.net...
>
<snip>

>
> I also like signature strings being strings as this means that sig-strings
> are far more easily human-decodable (unlike a binary signature coding,
> such as used in .NET). this does not mean that I think said strings should
> be "human readable" (as in, verbose to a point of being easily
> recognizable), rather, I just figure they should be composed of ASCII
> characters.
>
> as-is, I will note that my system is sort of a descendant and
> hybridization of both the IA64 name-mangling scheme, and the JVM's
> signature strings (syntax closer to JVM, type notation similar to IA64's
> name mangling). FWIW, I will note that the IA64 name mangling scheme is
> de-facto in GCC and many other C++ compilers (except maybe MSVC and
> friends...). so, fammiliarity with name-mangling should give an idea of
> the basic notation (for C and C++ it is more-or-less letter-equivalent).
>
> however, for other reasons my actual mangled names are somewhat different,
> where personally I feel my scheme to be an improvement over the IA64
> scheme (both a generally cleaner syntax, as well as able to retain more
> information, such as the return type, ...).
> however, the combined name-and-signature is itself mangled (using an
> adaptation of the JNI scheme), and so the resultant mangled names look a
> little different.
>
> in my case, the idea is to bridge the systems at link time (allowing calls
> to/from existing C++ compilers).
>

I went and checked, and I will now de-emphasize the similarity of my scheme
and the IA64 scheme...

it seems I had failed to take note of something, which is that my scheme was
originally a subset of the IA64 scheme, and has since mutated away much
further than I had originally taken into account...

much past the level of common base types (char, int, long, ...) and
pointers-to-types, the similarity ends...

so, in both schemes "void **" is "PPv" and "unsigned char *" is "Ph", but
much past this level, and things quickly diverge (handling of complex types,
scoping, ... is very different, and at this point my scheme becomes far more
similar to the JVM's scheme).

"struct Foo_s *" -> "PXFoo_s;"

'X' is used for structs/unions/unmanaged classes/...

'U' ended up being used for user types, rather than user type-modifiers
(different from IA64, so 'U' in my case serves the role of 'u' in IA64, and
oddly I had considered 'F' for the use 'U' apparently originally had, ...).

as well as endless other differences...


for:
"namespace foo { ... __gc class Bar { ... }; ... }"
"Bar" -> "Lfoo/Bar;"


so, yeah, it is sort of like the love-child of the JVM's signature-system
with aspects of the IA64 scheme, and some amount of custom engineering as
well...

"int[,]" -> "C2i" ("Cf" is "float _Complex", ... but 'C' with a number was
overloaded for square arrays).
"int[][]" -> "QQi" (vs "[[i" as in the JVM, as I had wanted to keep '['
available as a possible later brace, rather than as a type modifier...).


as noted, my JNI-scheme adaptation has several additional
character-shorthands, as well as the ability to encode 2-digit (8 bit) char
escapes (vs forcing a 16-bit escape for all chars):
'_9xx' vs '_0xxxx'
...


so, yeah, in all it is a little more "custom" than "conventional"...

Dmitry A. Kazakov

unread,
Jun 19, 2009, 2:31:27 PM6/19/09
to
On Fri, 19 Jun 2009 10:18:52 -0700, cr88192 wrote:

> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
> news:pvpvu6vo1ahz$.81meb7ad8cfn.dlg@40tude.net...

>> Checking is a wrong word, identification is a correct one. The type


>> identification is necessary to be able to perform dynamic dispatch. Since
>> C is does not have polymorphic objects there is no dispatch and no need in
>> identification.
>
> some of us do dynamic typing in C however, although typically by different
> means...

Sure, but dynamic typing is a larger different issue. We should
differentiate identification and full type descriptions. You might wish to
have the first in OS, keeping the second outside.

>> As for OS support there are problems with it:
>>
>> 1. Reuse of the type identifiers (tags), after the type dies (leaving its
>> scope). The tags should be dense or hashable in order to have dispatch
>> efficient.
>>
>> 2. Global scope of the tags. In a distributed OS you will need to have
>> unique tag for whatever possible type.
>>
>> 3. Persistence (stored polymorphic objects must keep their unique type
>> tags)
>
> these can be accomplished by using strings for types.

Yes, but that would make it pretty useless. An implementation should
deliver performance of a dispatching call at the level of a plain call x
1.5-2 x number of dispatching arguments.

> the remaining issue is that, at the C level, there may be many "anonymous"
> types (such as structs or function-pointers which are declared implicitly as
> part of another declaration).

> originally, I had used a "global gensym" to generate unique names for each
> such object, but once adding persistent metadata this became an issue since
> then the metadata cache becomes quickly polluted with endless redefinitions
> of the same type...

This problem comes with structured equivalence of types. Which makes
identification quite difficult, because you should describe the type in its
tag. With a nominal equivalence the type tag could be a number.

cr88192

unread,
Jun 19, 2009, 3:26:44 PM6/19/09
to

"Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
news:rjm4lkpzo505$.jh2kpma5noap.dlg@40tude.net...

> On Fri, 19 Jun 2009 10:18:52 -0700, cr88192 wrote:
>
>> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
>> news:pvpvu6vo1ahz$.81meb7ad8cfn.dlg@40tude.net...
>
>>> Checking is a wrong word, identification is a correct one. The type
>>> identification is necessary to be able to perform dynamic dispatch.
>>> Since
>>> C is does not have polymorphic objects there is no dispatch and no need
>>> in
>>> identification.
>>
>> some of us do dynamic typing in C however, although typically by
>> different
>> means...
>
> Sure, but dynamic typing is a larger different issue. We should
> differentiate identification and full type descriptions. You might wish to
> have the first in OS, keeping the second outside.
>

yes, ok.
I had almost thought maybe the point was to support a full VM-like
typesystem in the kernel or similar...


>>> As for OS support there are problems with it:
>>>
>>> 1. Reuse of the type identifiers (tags), after the type dies (leaving
>>> its
>>> scope). The tags should be dense or hashable in order to have dispatch
>>> efficient.
>>>
>>> 2. Global scope of the tags. In a distributed OS you will need to have
>>> unique tag for whatever possible type.
>>>
>>> 3. Persistence (stored polymorphic objects must keep their unique type
>>> tags)
>>
>> these can be accomplished by using strings for types.
>
> Yes, but that would make it pretty useless. An implementation should
> deliver performance of a dispatching call at the level of a plain call x
> 1.5-2 x number of dispatching arguments.
>

FWIW, if well-designed, a string-based signature can be of similar
performance to a binary signature...

similarly, via interning strings, ... a number of other optimizations can be
done (for example, typechecking via pointer-equality, ...).

but, yeah, for dispatch, one can do as the JVM does and require 1:1 string
equivalence in these cases...


granted, usually one things strings, and they also think "slow", but in
practice this need not be the case, as carefully designed strings and
processing code can perform similarly to binary representations (typically
by basing both the representation and processing code around the use of
switches or jump-tables...).

similarly, though it is possible to dynamically dispatch using strings, in
many cases I optimize this by using the strings to drive the creation of
specialized machine code thunks, such that the machine code can perform the
requested dispatch with no additional runtime overhead (internally, I use
this for object method dispatching and similar...).


many other tasks can be done effectively via hash tables...


>> the remaining issue is that, at the C level, there may be many
>> "anonymous"
>> types (such as structs or function-pointers which are declared implicitly
>> as
>> part of another declaration).
>
>> originally, I had used a "global gensym" to generate unique names for
>> each
>> such object, but once adding persistent metadata this became an issue
>> since
>> then the metadata cache becomes quickly polluted with endless
>> redefinitions
>> of the same type...
>
> This problem comes with structured equivalence of types. Which makes
> identification quite difficult, because you should describe the type in
> its
> tag. With a nominal equivalence the type tag could be a number.
>

yeah...

that was the point of my later fix of using lexical hashing:
as a general rule, structurally equivalent types will generate the same
hash.

"struct { int x; int y; }"

may appear in 2 different contexts, and can then be hashed using the tokens,
in this case:
"{", "int", "x", ";", "int", "y", ";", '}'

by hashing and checking this, we can know, at least, that the types are
structurally equivalent.

however, there are cases where structurally "compatible" types may end up
with different hashes, which FWIW would require more expensive checks to be
done (such as searching for a matching type).

but, for now, I am lazy, and make the simplifying assumption that
structural-equivalence = lexical-equivalence...


a slight modification (making it potentially safer), is using "internal"
tokens, AKA, the structure is parsed and types/... are fully expanded, and
then we hash this "conceptual" version of the object.

this way, 2 lexically equivalent objects which appear in different contexts
and potentially have different types (but which overlap in terms of name)
would generate different hashes.

something much closer to "proper" structural equivalence could be checked by
then flattening the entire structure into a signature, and hashing/checking
this signature.


for example, a loose structural equivalence could flatten the above struct
to "{ii}", or if names are significant: "{/x;i/y;i}" or similar...

Dmitry A. Kazakov

unread,
Jun 20, 2009, 4:57:38 AM6/20/09
to
On Fri, 19 Jun 2009 12:26:44 -0700, cr88192 wrote:

> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
> news:rjm4lkpzo505$.jh2kpma5noap.dlg@40tude.net...
>> On Fri, 19 Jun 2009 10:18:52 -0700, cr88192 wrote:
>>
>>>> As for OS support there are problems with it:
>>>>
>>>> 1. Reuse of the type identifiers (tags), after the type dies (leaving
>>>> its
>>>> scope). The tags should be dense or hashable in order to have dispatch
>>>> efficient.
>>>>
>>>> 2. Global scope of the tags. In a distributed OS you will need to have
>>>> unique tag for whatever possible type.
>>>>
>>>> 3. Persistence (stored polymorphic objects must keep their unique type
>>>> tags)
>>>
>>> these can be accomplished by using strings for types.
>>
>> Yes, but that would make it pretty useless. An implementation should
>> deliver performance of a dispatching call at the level of a plain call x
>> 1.5-2 x number of dispatching arguments.
>
> FWIW, if well-designed, a string-based signature can be of similar
> performance to a binary signature...

[...]

> many other tasks can be done effectively via hash tables...

To start with you have to calculate the hash value from a string, which
alone is too slow as compared to simple indexing.

Further type tags must be ordered according to the <: relation. An
implementation of <: is required for tests like if the type S is in the
class rooted in the type T.



> but, for now, I am lazy, and make the simplifying assumption that
> structural-equivalence = lexical-equivalence...

Which is a wrong assumption. Two types are equivalent when they have same
operations. Once you define an operation Foo for the type int in some
context the result will be another type different from original int. As
known in OO, you have to carry all operations with types in order to be
able to compare (and deal) them. Just for this reason structural
equivalence is hardly compatible with OO.

cr88192

unread,
Jun 20, 2009, 1:28:15 PM6/20/09
to

"Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
news:1fn4evn388rjq.m...@40tude.net...

> On Fri, 19 Jun 2009 12:26:44 -0700, cr88192 wrote:
>
>> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
>> news:rjm4lkpzo505$.jh2kpma5noap.dlg@40tude.net...
>>> On Fri, 19 Jun 2009 10:18:52 -0700, cr88192 wrote:
>>>
>>>>> As for OS support there are problems with it:
>>>>>
>>>>> 1. Reuse of the type identifiers (tags), after the type dies (leaving
>>>>> its
>>>>> scope). The tags should be dense or hashable in order to have dispatch
>>>>> efficient.
>>>>>
>>>>> 2. Global scope of the tags. In a distributed OS you will need to have
>>>>> unique tag for whatever possible type.
>>>>>
>>>>> 3. Persistence (stored polymorphic objects must keep their unique type
>>>>> tags)
>>>>
>>>> these can be accomplished by using strings for types.
>>>
>>> Yes, but that would make it pretty useless. An implementation should
>>> deliver performance of a dispatching call at the level of a plain call x
>>> 1.5-2 x number of dispatching arguments.
>>
>> FWIW, if well-designed, a string-based signature can be of similar
>> performance to a binary signature...
>
> [...]
>
>> many other tasks can be done effectively via hash tables...
>
> To start with you have to calculate the hash value from a string, which
> alone is too slow as compared to simple indexing.
>

typically, hashes and indices are trivial to calculate.
if you mean for representing the type (in an object), then yes, an index IS
used here, but an index only tells where something is, not what it is...

for other tasks, one can precompute the hash, or make use of the literal
pointer value...


so, as I see it, using indices/handles does not preclude using strings as
the cannonical representation, no more than does using strings preclude
using indices or handles for cases where these make more sense...


the point here is ASCII signature vs binary signature (AKA: a string of
bytes where each byte works as a modifier), and the indernal use of indices
or handles is not particularly relevant.

a loop and swtich will not really care whether or not its input is based on
ASCII characters or literal byte values...


> Further type tags must be ordered according to the <: relation. An
> implementation of <: is required for tests like if the type S is in the
> class rooted in the type T.
>

this is a special case...

to pull this one off requires having a prebuilt structural representation of
the class heirarchy (at which point, one fetches the relevant classes and
performs the operation), or if one does not, that the types are strings or
not does not make much difference, the operation will still be slow (for
example, if performing the operation is performed via a large number of
database operations...).

more so, this operation can be sped up notably via the use of hashing, and
so in the general cases ceases to matter much.

is T descended from S?
combine T and S hashes, and check the hash table for either a conformation
or rejection, and if not found, perform the formal check and add it to the
hash.


anyways, many operations (for example, in my object system), are pulled off
in double-digit clock-cycles, which means I probably can't be doing things
THAT badly...


>> but, for now, I am lazy, and make the simplifying assumption that
>> structural-equivalence = lexical-equivalence...
>
> Which is a wrong assumption. Two types are equivalent when they have same
> operations. Once you define an operation Foo for the type int in some
> context the result will be another type different from original int. As
> known in OO, you have to carry all operations with types in order to be
> able to compare (and deal) them. Just for this reason structural
> equivalence is hardly compatible with OO.
>

I fail to see the point of this claim...


the point in this case is to merge multiple redefinitions of the same type
(such as a struct, ...), and lexical equivalence and/or structural
equivalence is a good way to do this...

the semantic type heirarchy is, as I see it, an entirely different system.

actually, I tend to represent "types" and "structures" as essentially 2
parallel systems, where "structures" represents the layout or representation
of things, and tends to rely on structural equivalence.

"types" is its own systems, where a type may use a structure, but it is not
necessarily the case that a type is a structure (or vice-versa...).

types are still, however, typically merged along the lines of definitional
equivalence, which would include things like their names, overloaded
operators, ...


anyway, because something is a "simplifying assumption" does not mean it is
"universally true", only that "it works in enough cases that, for now, I
will assume it to be true" (in effect, there are cases which may break such
an assumption, but they are ignored).


it would be like, for example, if one makes an assembler and makes a
simplifying assumption:
there are 3 sections: '.text', '.data', and '.bss'.

there may well be other sections in existence, but for the sake of the task
at hand, they are not relevant.


now, in my case, I tend to assume that
lexical-equivalence=structural-equivalence.
the reason:
doing a proper check for structural equivalence is expensive;
hashing a sequence of tokens and seeing if I have a match, is VERY much
cheaper (even if it "may" risk false negatives, or a naive approach may risk
false positives in certain contrieved cases...).


so, one need not actually care in all cases what is actually the case, just
whatever can be done most cheaply and efficiently...

Dmitry A. Kazakov

unread,
Jun 21, 2009, 7:15:56 AM6/21/09
to

O(n), where n is the length of the string as compared to O(1) of indexing.

> the point here is ASCII signature vs binary signature (AKA: a string of
> bytes where each byte works as a modifier), and the indernal use of indices
> or handles is not particularly relevant.

What I meant that the difference is n vs. 1.

>> Further type tags must be ordered according to the <: relation. An
>> implementation of <: is required for tests like if the type S is in the
>> class rooted in the type T.
>
> this is a special case...

You need in order to implement dynamic cast.

> is T descended from S?

This is the definition of T <: S. In order to evaluate it efficiently, you
need ordered sets, rather than unordered, represented by a hash. The point
is merely that simple hash won't really help here.

>>> but, for now, I am lazy, and make the simplifying assumption that
>>> structural-equivalence = lexical-equivalence...
>>
>> Which is a wrong assumption. Two types are equivalent when they have same
>> operations. Once you define an operation Foo for the type int in some
>> context the result will be another type different from original int. As
>> known in OO, you have to carry all operations with types in order to be
>> able to compare (and deal) them. Just for this reason structural
>> equivalence is hardly compatible with OO.
>
> I fail to see the point of this claim...

The point is that structural equivalence does not capture the notion of
type used in OO.

> the point in this case is to merge multiple redefinitions of the same type
> (such as a struct, ...), and lexical equivalence and/or structural
> equivalence is a good way to do this...

If you have identification, you just compare the identifiers. Otherwise,
structural equivalence is in general case a halting problem. So the bottom
line is, do not bother with structural equivalence, it does not work
anyway.

> the semantic type heirarchy is, as I see it, an entirely different system.

Why? You need a tree distance defined on type tags, otherwise they would be
useless.

> now, in my case, I tend to assume that
> lexical-equivalence=structural-equivalence.

Huh, but that is what named equivalence is. It compares names and ignores
any semantics!

Richard Harter

unread,
Jun 21, 2009, 12:40:49 PM6/21/09
to
On Sun, 21 Jun 2009 13:15:56 +0200, "Dmitry A. Kazakov"
<mai...@dmitry-kazakov.de> wrote:

>On Sat, 20 Jun 2009 10:28:15 -0700, cr88192 wrote:
>
>> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
>> news:1fn4evn388rjq.m...@40tude.net...
>>> On Fri, 19 Jun 2009 12:26:44 -0700, cr88192 wrote:
>>>
>>>> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
>>>> news:rjm4lkpzo505$.jh2kpma5noap.dlg@40tude.net...
>>>>> On Fri, 19 Jun 2009 10:18:52 -0700, cr88192 wrote:

>>> To start with you have to calculate the hash value from a string, which
>>> alone is too slow as compared to simple indexing.
>>
>> typically, hashes and indices are trivial to calculate.
>
>O(n), where n is the length of the string as compared to O(1) of indexing.

I'm not sure what you mean by indexing, but for any meaning that
occurs to me, this is not right.

If we have N distinct strings it takes log(N) decisions to
distinguish between them. Moreover the average string length, n,
must be >= log2(N) bits.

Furthermore O(log n) is the cost of determining that a string s is
possibly in our set of strings. To verify that s actually is in
the set of strings we must check the the entire string. I.e.,
verification is O(n).

In short, the cost of looking of looking up a string in a set of
strings is O(n). The cost can be concealed by a languages syntax
but it is there.


Richard Harter, c...@tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
If I do not see as far as others, it is because
I stand in the footprints of giants.

cr88192

unread,
Jun 21, 2009, 1:39:30 PM6/21/09
to

"Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
news:fykmjqnzt20z.o...@40tude.net...

theoretically, yes.
in practice, this usually does not matter (as small loops are typically
really fast and most strings are short).

in cases where performance is important, one need not do lookups with
strings.
in other cases, this does not matter.

>> the point here is ASCII signature vs binary signature (AKA: a string of
>> bytes where each byte works as a modifier), and the indernal use of
>> indices
>> or handles is not particularly relevant.
>
> What I meant that the difference is n vs. 1.
>

however, if the string is maybe only a few chars long (typical of the sig
strings in my case), this 'n' does not matter a whole lot.

I get a much bigger hit, FWIW, from having profiler options turned on...


>>> Further type tags must be ordered according to the <: relation. An
>>> implementation of <: is required for tests like if the type S is in the
>>> class rooted in the type T.
>>
>> this is a special case...
>
> You need in order to implement dynamic cast.
>

ok. yes, but I was noting the "entirety" of the typesystem, not simply
dynamic cast.

dynamic cast can use special mechanisms (as in, directly interfacing with
the object system), and thus bypassing any direct usage of these strings.
the core mechanisms for dynamic cast were actually implementing in my case
for implementing a JVM, and it makes use of direct pointers, precomputed
hashes, and a hashtable.

InstanceOfP(cls, obj)
SubclassOfP(cls, obj->info)
SubclassOfP(clsa, clsb)
i=(clsa->qhash*127+clsb->qhash)&4095;
if((hclsa[i]==clsa) && (hclsb[i]==clsb))
return(hstat[i]);
...

or something like this...

I think I mentally before wondered if the hash table wondered much in the
face of an operation which simply walks the inheritence tree, but it could
be a lot more expensive in the MI case or similar, so the hash is probably
reasonable.


or, at least, this is how it is done for "managed objects". I have not yet
decided on the RTTI scheme for unmanaged objects.

possibility A: implement the approach from the IA64 C++ ABI.
possibility B: hide a reference to the "class info" in the same spot as
would be used for RTTI, but this would break compatibility with gcc's
RTTI...
possibility C: do internal magic and combine A and B, where I have plain
RTTI, and a hidden reference into the object system embedded in the RTTI
info magic...


>> is T descended from S?
>
> This is the definition of T <: S. In order to evaluate it efficiently, you
> need ordered sets, rather than unordered, represented by a hash. The point
> is merely that simple hash won't really help here.
>

a hash does not help "much", but it does help.

otherwise, it is walking the inheritence tree, which is also cheap.
either way, it is cheap...

from this stance, one could just as easily argue "you can't implement
interface method dispatch with a hash table", but oh wait, I do, and it
works fairly well...

many operations are in the double-digits of clock cycles, so things are
probably not THAT bad...


>>>> but, for now, I am lazy, and make the simplifying assumption that
>>>> structural-equivalence = lexical-equivalence...
>>>
>>> Which is a wrong assumption. Two types are equivalent when they have
>>> same
>>> operations. Once you define an operation Foo for the type int in some
>>> context the result will be another type different from original int. As
>>> known in OO, you have to carry all operations with types in order to be
>>> able to compare (and deal) them. Just for this reason structural
>>> equivalence is hardly compatible with OO.
>>
>> I fail to see the point of this claim...
>
> The point is that structural equivalence does not capture the notion of
> type used in OO.
>

this is a different point.
as noted, the behavior of "named types" is very different from "unnamed
types".

in my case, it is mostly a matter of relevance to unnamed types, than of
named types...

consider:
struct { int x; int y; } foo;
...
struct { int x; int y; } bar;

(stuff like this actually happens in practice).

really, should each struct be assumed to be a completely unique type (and
thus, through subsequent runs, fill up the metadata database with endless
versions of essentially the same struct), or can we just safely merge them,
thus avoiding a "database getting gradually filled with garbage" level of
threat?...


FWIW, 2 different/unrelated classes generally CAN'T be merged via structural
equivalence, it just does not work this way (firstly, classes are typically
named, secondly, they typically inherit things, ...).

as for anonymous classes, well, that remains to be seen...


>> the point in this case is to merge multiple redefinitions of the same
>> type
>> (such as a struct, ...), and lexical equivalence and/or structural
>> equivalence is a good way to do this...
>
> If you have identification, you just compare the identifiers. Otherwise,
> structural equivalence is in general case a halting problem. So the bottom
> line is, do not bother with structural equivalence, it does not work
> anyway.
>

if I have identifiers, then I use them...

the issue here is what to do about cases where the original programmer did
not bother to name the type, and the compiler is left figuring out what to
do about it.

about the halting problem, I don't see why this is the case.
structural equivalence is cheap enough to check, and hashes can be easily
enough computed.

granted, in the "general case" (as in, not doing hash-based merging), I
would have to go through a potentially long and slow look through the
database, and probably deal with an O(n^2) overhead trying to merge things
(especially bad since the DB is a key/value system vaguely similar to the
Windows System Registry, and so is not so efficient for generalized
queries...).

simple solution: I don't do this...

if the hash approach doesn't work, then the structures do not merge.
it is not that difficult really...


>> the semantic type heirarchy is, as I see it, an entirely different
>> system.
>
> Why? You need a tree distance defined on type tags, otherwise they would
> be
> useless.
>

in my compiler framework, they are implemented as 2 essentially parallel
structures.
more or less, this is a side effect of how C is designed, and C++ (more or
less) follows suit (C++ blurs the line a little, but the split more or less
remains).

the result is that there is an internal split between "types" and
"structure".

within the metadata DB, both types and structures are defined, typically, in
terms of QNames (or, alternatively, hashes, if they are used, along with
GUIDs, but I have not had much reason thus far for using/allowing GUIDs as
identifiers, and have instead typically been using Base64-coded 96-bit UIDs
instead...).

there is also fairly heavy use of signatures, which in this case may both
reference the QName (as well as indicating the way in which this qname is
being used), as well as identify primitive types and other things.


hmm:
I am now left tempted by the idea of a B64-GUID, or essentially a GUID using
a Base64 representation as opposed to the
"{XXXXXXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXX}" notation...

the reason: 22 or 23 characters (vs like 38 chars).

my 96 bit UIDs, however, nicely take up 16 chars (raw representation...).

A..Z a..z 0..9 _ $


>> now, in my case, I tend to assume that
>> lexical-equivalence=structural-equivalence.
>
> Huh, but that is what named equivalence is. It compares names and ignores
> any semantics!
>

I use named equivalence when this is available.

but as noted, not all code contains named types.

Dmitry A. Kazakov

unread,
Jun 21, 2009, 2:24:27 PM6/21/09
to
On Sun, 21 Jun 2009 16:40:49 GMT, Richard Harter wrote:

> On Sun, 21 Jun 2009 13:15:56 +0200, "Dmitry A. Kazakov"
> <mai...@dmitry-kazakov.de> wrote:
>
>>On Sat, 20 Jun 2009 10:28:15 -0700, cr88192 wrote:
>>
>>> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
>>> news:1fn4evn388rjq.m...@40tude.net...
>>>> On Fri, 19 Jun 2009 12:26:44 -0700, cr88192 wrote:
>>>>
>>>>> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
>>>>> news:rjm4lkpzo505$.jh2kpma5noap.dlg@40tude.net...
>>>>>> On Fri, 19 Jun 2009 10:18:52 -0700, cr88192 wrote:
>
>>>> To start with you have to calculate the hash value from a string, which
>>>> alone is too slow as compared to simple indexing.
>>>
>>> typically, hashes and indices are trivial to calculate.
>>
>>O(n), where n is the length of the string as compared to O(1) of indexing.
>
> I'm not sure what you mean by indexing,

I meant evaluation of the target of a dispatching call. When the
dispatching table is represented by a dense array (the single dispatch
case) indexed by the type tag. Ideal performance of dispatch should be
close to that.

> but for any meaning that
> occurs to me, this is not right.
>
> If we have N distinct strings it takes log(N) decisions to
> distinguish between them. Moreover the average string length, n,
> must be >= log2(N) bits.

You consider here a dispatching table represented by a binary tree. cr88192
did consider hash tables. My response was that evaluation of a hash value
is still O(n).

> Furthermore O(log n) is the cost of determining that a string s is
> possibly in our set of strings. To verify that s actually is in
> the set of strings we must check the the entire string. I.e.,
> verification is O(n).
>
> In short, the cost of looking of looking up a string in a set of
> strings is O(n). The cost can be concealed by a languages syntax
> but it is there.

Yes!

Dmitry A. Kazakov

unread,
Jun 21, 2009, 3:01:44 PM6/21/09
to
On Sun, 21 Jun 2009 10:39:30 -0700, cr88192 wrote:

> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
> news:fykmjqnzt20z.o...@40tude.net...
>> On Sat, 20 Jun 2009 10:28:15 -0700, cr88192 wrote:

>>> typically, hashes and indices are trivial to calculate.
>>
>> O(n), where n is the length of the string as compared to O(1) of indexing.
>
> theoretically, yes.
> in practice, this usually does not matter (as small loops are typically
> really fast and most strings are short).

Not in the case when you have to pack all existing types into the table.
The idea was to handle it global by the OS.

> in cases where performance is important, one need not do lookups with
> strings.

The language does not know if performance is important when the programmer
writes f(x). It has to decide how to implement the call. If the expected
performance of an OS service is poor, it will not be used at all.

>>>> Further type tags must be ordered according to the <: relation. An
>>>> implementation of <: is required for tests like if the type S is in the
>>>> class rooted in the type T.
>>>
>>> this is a special case...
>>
>> You need in order to implement dynamic cast.
>
> ok. yes, but I was noting the "entirety" of the typesystem, not simply
> dynamic cast.
>
> dynamic cast can use special mechanisms (as in, directly interfacing with
> the object system), and thus bypassing any direct usage of these strings.

So it is that "bypassing" which actually does all the job. (:-)) So you
need something that does it (without strings and hash tables). That thing
could be made a part of OS. But I doubt it would be easy to do.

> I think I mentally before wondered if the hash table wondered much in the
> face of an operation which simply walks the inheritence tree, but it could
> be a lot more expensive in the MI case or similar, so the hash is probably
> reasonable.

I don't believe in hash tables. Additionally to already discussed:

1. Dispatch table search (dispatching)
2. Subtype test (dynamic cast, <:)

you also need a very efficient merge and split of the dispatching tables.

This is an issue when types are declared in a scope (you extend the
dispatching table by adding new type tags = new rows and columns there).
Then, when the type leaves the scope and dies you have to collapse the
tables. It would be extremely expensive operation with hashes. And as an OS
service that is always the case, since applications come and go. So we have
to forget that nice global scope static types.

> in my case, it is mostly a matter of relevance to unnamed types, than of
> named types...
>
> consider:
> struct { int x; int y; } foo;
> ...
> struct { int x; int y; } bar;
>
> (stuff like this actually happens in practice).
>
> really, should each struct be assumed to be a completely unique type (and
> thus, through subsequent runs, fill up the metadata database with endless
> versions of essentially the same struct), or can we just safely merge them,
> thus avoiding a "database getting gradually filled with garbage" level of
> threat?...

Because when f is defined on foo that does not make it defined on bar.

> the issue here is what to do about cases where the original programmer did
> not bother to name the type, and the compiler is left figuring out what to
> do about it.

You introduce a unique ID, which is the type tag.

> about the halting problem, I don't see why this is the case.

Because two types are equivalent if and only if all operations defined on
them are equivalent. If somebody defines f on foo, that alone would make it
distinct from bar. Note that members like x and y are actually operations
too, which get and set the corresponding components. So foo and bar are
equivalent because both have "get x" and "set x" of type integer.

cr88192

unread,
Jun 21, 2009, 6:46:07 PM6/21/09
to

"Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
news:107uhrk3t7q4x$.pu1rjhwu0wpp$.dlg@40tude.net...

> On Sun, 21 Jun 2009 10:39:30 -0700, cr88192 wrote:
>
>> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
>> news:fykmjqnzt20z.o...@40tude.net...
>>> On Sat, 20 Jun 2009 10:28:15 -0700, cr88192 wrote:
>
>>>> typically, hashes and indices are trivial to calculate.
>>>
>>> O(n), where n is the length of the string as compared to O(1) of
>>> indexing.
>>
>> theoretically, yes.
>> in practice, this usually does not matter (as small loops are typically
>> really fast and most strings are short).
>
> Not in the case when you have to pack all existing types into the table.
> The idea was to handle it global by the OS.
>

not so much of a problem...
one can just allocate a hash table with maybe 1M entries, and hope that it
does not run out (or, if it does, endure the cost of re-hashing into a
bigger table...).


>> in cases where performance is important, one need not do lookups with
>> strings.
>
> The language does not know if performance is important when the programmer
> writes f(x). It has to decide how to implement the call. If the expected
> performance of an OS service is poor, it will not be used at all.
>

hmm... many people tolerate Vista and Windows 7...


I have noticed that many typical Win32 API calls are FAR slower than even
code which is done dynamically and via ASCII-based accessors (such as doing
an interface method call and dispatching the method based on a textual name
and signature, rather than via method handles...).

the slowness of the Win32 API is almost notable in its own right...


anyways, one can generally optimize according to commonality.
machinery likely to be core functionality of a framework (such as accessing
fields or calling methods) can be highly optimized.

this does not mean that every other piece of machinery need follow the exact
same design...

>>>>> Further type tags must be ordered according to the <: relation. An
>>>>> implementation of <: is required for tests like if the type S is in
>>>>> the
>>>>> class rooted in the type T.
>>>>
>>>> this is a special case...
>>>
>>> You need in order to implement dynamic cast.
>>
>> ok. yes, but I was noting the "entirety" of the typesystem, not simply
>> dynamic cast.
>>
>> dynamic cast can use special mechanisms (as in, directly interfacing with
>> the object system), and thus bypassing any direct usage of these strings.
>
> So it is that "bypassing" which actually does all the job. (:-)) So you
> need something that does it (without strings and hash tables). That thing
> could be made a part of OS. But I doubt it would be easy to do.
>

yeah...

strings are typically my "cannonical" representation, but not every
operation needs to use them...

it is just, which would one rather have for everything:
strings; specialized structures and bit-twiddling.

I prefer to use strings, as then I can represent things fairly uniformly (it
doesn't matter which exact subsystem or similar I am interacting with...).

some subsystems, however, may do things different internally...


>> I think I mentally before wondered if the hash table wondered much in the
>> face of an operation which simply walks the inheritence tree, but it
>> could
>> be a lot more expensive in the MI case or similar, so the hash is
>> probably
>> reasonable.
>
> I don't believe in hash tables. Additionally to already discussed:
>
> 1. Dispatch table search (dispatching)
> 2. Subtype test (dynamic cast, <:)
>
> you also need a very efficient merge and split of the dispatching tables.
>
> This is an issue when types are declared in a scope (you extend the
> dispatching table by adding new type tags = new rows and columns there).
> Then, when the type leaves the scope and dies you have to collapse the
> tables. It would be extremely expensive operation with hashes. And as an
> OS
> service that is always the case, since applications come and go. So we
> have
> to forget that nice global scope static types.
>

once added we could "pretend" that types don't die...

that or, in my object system, the approach is basically to periodically
"rehash" in a way which lesser-used entries are discarded (rehashing is not
cheap, but it is also rare...).

as such, types will die when the table is rehashed, and they fall out of
visibility.


as for the inheritence-checking hash, this has is small, and is more of a
cache. if/when a type is actually destroyed, or another critical change is
made, the hash can be simply cleared and then rebuilt (I don't use this as
much for bigger hashes, as it can be a far more severe hit, overall, than
rehashing...).


>> in my case, it is mostly a matter of relevance to unnamed types, than of
>> named types...
>>
>> consider:
>> struct { int x; int y; } foo;
>> ...
>> struct { int x; int y; } bar;
>>
>> (stuff like this actually happens in practice).
>>
>> really, should each struct be assumed to be a completely unique type (and
>> thus, through subsequent runs, fill up the metadata database with endless
>> versions of essentially the same struct), or can we just safely merge
>> them,
>> thus avoiding a "database getting gradually filled with garbage" level of
>> threat?...
>
> Because when f is defined on foo that does not make it defined on bar.
>

no, as then the structure has changed, and they will no longer be merged...

basically, adding a property will essentially change the structure, and thus
the hash, and so it will no longer try to merge with the older type, rather
it will try to merge with whatever type has the new hash (should such a type
exist).


>> the issue here is what to do about cases where the original programmer
>> did
>> not bother to name the type, and the compiler is left figuring out what
>> to
>> do about it.
>
> You introduce a unique ID, which is the type tag.
>

the problem here is with multiple runs and a large persistent metadata
database...

prior to the DB, this worked plenty well.
after adding the DB, each new run essentially creates garbage, in the form
of new UIDs and redefinitions of essentially the same types...

to deal with this would require some means of effectively "cleaning" the DB
of old entries, and preventing it from gradually being weighed down by huge
amounts of garbege...


>> about the halting problem, I don't see why this is the case.
>
> Because two types are equivalent if and only if all operations defined on
> them are equivalent. If somebody defines f on foo, that alone would make
> it
> distinct from bar. Note that members like x and y are actually operations
> too, which get and set the corresponding components. So foo and bar are
> equivalent because both have "get x" and "set x" of type integer.
>

this is only if said operations are defined externally...

if they are methods, they are part of the structure, and no such merging
would happen in the first place.


if they are defined via operator overloading, then they exist as part of the
types' heirarchy, and not as part of the structures' heirarchy (a given
structural type may be shared between several ADTs, and these ADTs will
continue being unique, even if the structure is shared...).

if operator overloading were made to merge the overloaded operators into the
structure, then effectively they would join with said structure, thus
changing its hash and preventing merging.

so, again, there does not seem to be any real problem IMO...

Dmitry A. Kazakov

unread,
Jun 22, 2009, 4:18:17 AM6/22/09
to
On Sun, 21 Jun 2009 15:46:07 -0700, cr88192 wrote:

> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
> news:107uhrk3t7q4x$.pu1rjhwu0wpp$.dlg@40tude.net...
>> On Sun, 21 Jun 2009 10:39:30 -0700, cr88192 wrote:

>>> theoretically, yes.
>>> in practice, this usually does not matter (as small loops are typically
>>> really fast and most strings are short).
>>
>> Not in the case when you have to pack all existing types into the table.
>> The idea was to handle it global by the OS.
>
> not so much of a problem...
> one can just allocate a hash table with maybe 1M entries, and hope that it
> does not run out (or, if it does, endure the cost of re-hashing into a
> bigger table...).

...in a flight control system during a flight across Atlantic...

>>> in cases where performance is important, one need not do lookups with
>>> strings.
>>
>> The language does not know if performance is important when the programmer
>> writes f(x). It has to decide how to implement the call. If the expected
>> performance of an OS service is poor, it will not be used at all.
>
> hmm... many people tolerate Vista and Windows 7...

There is already Windows. You cannot do it better (sorry worse (:-)). It is
perfectly bad.

> anyways, one can generally optimize according to commonality.

I don't believe it. If an optimum exists and can be found, why don't you
implement the optimum from the start?

>>> dynamic cast can use special mechanisms (as in, directly interfacing with
>>> the object system), and thus bypassing any direct usage of these strings.
>>
>> So it is that "bypassing" which actually does all the job. (:-)) So you
>> need something that does it (without strings and hash tables). That thing
>> could be made a part of OS. But I doubt it would be easy to do.
>
> yeah...
>
> strings are typically my "cannonical" representation, but not every
> operation needs to use them...

... and operations on type tags are among those!

>>> I think I mentally before wondered if the hash table wondered much in the
>>> face of an operation which simply walks the inheritence tree, but it could
>>> be a lot more expensive in the MI case or similar, so the hash is probably
>>> reasonable.
>>
>> I don't believe in hash tables. Additionally to already discussed:
>>
>> 1. Dispatch table search (dispatching)
>> 2. Subtype test (dynamic cast, <:)
>>
>> you also need a very efficient merge and split of the dispatching tables.
>>
>> This is an issue when types are declared in a scope (you extend the
>> dispatching table by adding new type tags = new rows and columns there).
>> Then, when the type leaves the scope and dies you have to collapse the
>> tables. It would be extremely expensive operation with hashes. And as an OS
>> service that is always the case, since applications come and go. So we have
>> to forget that nice global scope static types.
>
> once added we could "pretend" that types don't die...

Oh yes, that is the Windows approach. If something goes wrong - reboot the
damn beast...

> as for the inheritence-checking hash, this has is small, and is more of a
> cache. if/when a type is actually destroyed, or another critical change is
> made, the hash can be simply cleared and then rebuilt (I don't use this as
> much for bigger hashes, as it can be a far more severe hit, overall, than
> rehashing...).

That makes it totally useless for a language that creates type in a tight
loop. For example, in Ada one often writes:

procedure Foo (X : <some-array-type>) is
subtype Index is Integer range X'Range;
-- The subtype which is always in the bounds of X.
begin
... -- The compiler can omit any bound checks related to
-- use Index with X

This is innocent hint to the compiler with turn catastrophic with what you
proposed when Foo is called in a loop.

>> Because when f is defined on foo that does not make it defined on bar.
>
> no, as then the structure has changed, and they will no longer be merged...

In that case you will need to clone types re-hash each time an operation
get declared. It does not worth the efforts, you save a paar KBytes at the
cost of a massive distributed overhead.



>> You introduce a unique ID, which is the type tag.
>
> the problem here is with multiple runs and a large persistent metadata
> database...

This is what I was trying to convey.



> prior to the DB, this worked plenty well.
> after adding the DB, each new run essentially creates garbage, in the form
> of new UIDs and redefinitions of essentially the same types...
>
> to deal with this would require some means of effectively "cleaning" the DB
> of old entries, and preventing it from gradually being weighed down by huge
> amounts of garbege...

Collecting the garbage implies that the type tag is a system resource,
which makes it even more expensive to use...

>>> about the halting problem, I don't see why this is the case.
>>
>> Because two types are equivalent if and only if all operations defined on
>> them are equivalent. If somebody defines f on foo, that alone would make it
>> distinct from bar. Note that members like x and y are actually operations
>> too, which get and set the corresponding components. So foo and bar are
>> equivalent because both have "get x" and "set x" of type integer.
>
> this is only if said operations are defined externally...

You mean free functions?

> if they are methods, they are part of the structure, and no such merging
> would happen in the first place.

The relation between a type and its operations (methods) is n to m. Neither
is a part of another.

Anyway, the point is, structural equivalence requires looking into the
operations in order to determine equivalence. This is halting.

> if they are defined via operator overloading, then they exist as part of the
> types' heirarchy, and not as part of the structures' heirarchy (a given
> structural type may be shared between several ADTs, and these ADTs will
> continue being unique, even if the structure is shared...).

class X { virtual void Foo (); };
class Y { virtual void Foo (); };

are these two equivalent?

Richard Harter

unread,
Jun 22, 2009, 12:20:40 PM6/22/09
to
On Sun, 21 Jun 2009 20:24:27 +0200, "Dmitry A. Kazakov"
<mai...@dmitry-kazakov.de> wrote:

>On Sun, 21 Jun 2009 16:40:49 GMT, Richard Harter wrote:
>
>> On Sun, 21 Jun 2009 13:15:56 +0200, "Dmitry A. Kazakov"
>> <mai...@dmitry-kazakov.de> wrote:
>>
>>>On Sat, 20 Jun 2009 10:28:15 -0700, cr88192 wrote:
>>>
>>>> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
>>>> news:1fn4evn388rjq.m...@40tude.net...
>>>>> On Fri, 19 Jun 2009 12:26:44 -0700, cr88192 wrote:
>>>>>
>>>>>> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
>>>>>> news:rjm4lkpzo505$.jh2kpma5noap.dlg@40tude.net...
>>>>>>> On Fri, 19 Jun 2009 10:18:52 -0700, cr88192 wrote:
>>
>>>>> To start with you have to calculate the hash value from a string, which
>>>>> alone is too slow as compared to simple indexing.
>>>>
>>>> typically, hashes and indices are trivial to calculate.
>>>
>>>O(n), where n is the length of the string as compared to O(1) of indexing.
>>
>> I'm not sure what you mean by indexing,
>
>I meant evaluation of the target of a dispatching call. When the
>dispatching table is represented by a dense array (the single dispatch
>case) indexed by the type tag. Ideal performance of dispatch should be
>close to that.

Thank you for the context. Yes, given that there exists a
precomputed type tag, it will be more efficient time wise to do a
table lookup using the precomputed tag than it is compute a new
type signature.

>
>> but for any meaning that
>> occurs to me, this is not right.
>>
>> If we have N distinct strings it takes log(N) decisions to
>> distinguish between them. Moreover the average string length, n,
>> must be >= log2(N) bits.
>
>You consider here a dispatching table represented by a binary tree. cr88192
>did consider hash tables. My response was that evaluation of a hash value
>is still O(n).

No, I am not considering a dispatching table represented by a
binary tree. I was considering that which I considered in the text
that I wrote.



>
>> Furthermore O(log n) is the cost of determining that a string s is
>> possibly in our set of strings. To verify that s actually is in
>> the set of strings we must check the the entire string. I.e.,
>> verification is O(n).
>>
>> In short, the cost of looking of looking up a string in a set of
>> strings is O(n). The cost can be concealed by a languages syntax
>> but it is there.
>
>Yes!

Just as a side note, one should remember that the use of the big
O() function in discussions of programming is almost always a
convenient simplification. We aren't actually interested in the
asymptotic value of cost functions as n goes to infinity - n is
always bounded for some perhaps unknown value of n. Moreover usage
seldom reflects all costs; rather it reflects those costs that are
considered relevant in context.

This side note is not a disagreement with what you wrote; it is
merely a general observation.

cr88192

unread,
Jun 22, 2009, 1:26:18 PM6/22/09
to

"Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
news:xssk85vmx8ts.1a6biujdmgcr5$.dlg@40tude.net...

> On Sun, 21 Jun 2009 15:46:07 -0700, cr88192 wrote:
>
>> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
>> news:107uhrk3t7q4x$.pu1rjhwu0wpp$.dlg@40tude.net...
>>> On Sun, 21 Jun 2009 10:39:30 -0700, cr88192 wrote:
>
>>>> theoretically, yes.
>>>> in practice, this usually does not matter (as small loops are typically
>>>> really fast and most strings are short).
>>>
>>> Not in the case when you have to pack all existing types into the table.
>>> The idea was to handle it global by the OS.
>>
>> not so much of a problem...
>> one can just allocate a hash table with maybe 1M entries, and hope that
>> it
>> does not run out (or, if it does, endure the cost of re-hashing into a
>> bigger table...).
>
> ...in a flight control system during a flight across Atlantic...
>

you would have to have a MASSIVE hash to have any real effect here...
a delay of maybe a few us is not likely to have any real effect here.


of course, you would probably not want to use a garbage collector though
(since these may easily have delays of several hundred MS or more).


>>>> in cases where performance is important, one need not do lookups with
>>>> strings.
>>>
>>> The language does not know if performance is important when the
>>> programmer
>>> writes f(x). It has to decide how to implement the call. If the expected
>>> performance of an OS service is poor, it will not be used at all.
>>
>> hmm... many people tolerate Vista and Windows 7...
>
> There is already Windows. You cannot do it better (sorry worse (:-)). It
> is
> perfectly bad.
>

yep, maybe...


>> anyways, one can generally optimize according to commonality.
>
> I don't believe it. If an optimum exists and can be found, why don't you
> implement the optimum from the start?
>

it is too much effort to optimize everything.
as a result, I optimize whatever is going slow, AKA: whatever starts showing
up in the profiler...


>>>> dynamic cast can use special mechanisms (as in, directly interfacing
>>>> with
>>>> the object system), and thus bypassing any direct usage of these
>>>> strings.
>>>
>>> So it is that "bypassing" which actually does all the job. (:-)) So you
>>> need something that does it (without strings and hash tables). That
>>> thing
>>> could be made a part of OS. But I doubt it would be easy to do.
>>
>> yeah...
>>
>> strings are typically my "cannonical" representation, but not every
>> operation needs to use them...
>
> ... and operations on type tags are among those!
>

well, it works...

granted, this GC/typesystem is not the fastest possible option, but in
general it is "plenty fast enough".
partly this is because I don't do a whole lot of fine-grain typechecking
though.

it is not a very good GC setup for dynamically typed interpreters, for
various reasons (actually, its garbage collection behavior is a much worse
offender), but works plenty well for statically typed code...

as for type signatures, most of these result from compiling said statically
typed code...


>>>> I think I mentally before wondered if the hash table wondered much in
>>>> the
>>>> face of an operation which simply walks the inheritence tree, but it
>>>> could
>>>> be a lot more expensive in the MI case or similar, so the hash is
>>>> probably
>>>> reasonable.
>>>
>>> I don't believe in hash tables. Additionally to already discussed:
>>>
>>> 1. Dispatch table search (dispatching)
>>> 2. Subtype test (dynamic cast, <:)
>>>
>>> you also need a very efficient merge and split of the dispatching
>>> tables.
>>>
>>> This is an issue when types are declared in a scope (you extend the
>>> dispatching table by adding new type tags = new rows and columns there).
>>> Then, when the type leaves the scope and dies you have to collapse the
>>> tables. It would be extremely expensive operation with hashes. And as an
>>> OS
>>> service that is always the case, since applications come and go. So we
>>> have
>>> to forget that nice global scope static types.
>>
>> once added we could "pretend" that types don't die...
>
> Oh yes, that is the Windows approach. If something goes wrong - reboot the
> damn beast...
>

yep.

although, if done well, a rehash or GC operation can be done to clear out
full hashes...


>> as for the inheritence-checking hash, this has is small, and is more of a
>> cache. if/when a type is actually destroyed, or another critical change
>> is
>> made, the hash can be simply cleared and then rebuilt (I don't use this
>> as
>> much for bigger hashes, as it can be a far more severe hit, overall, than
>> rehashing...).
>
> That makes it totally useless for a language that creates type in a tight
> loop. For example, in Ada one often writes:
>
> procedure Foo (X : <some-array-type>) is
> subtype Index is Integer range X'Range;
> -- The subtype which is always in the bounds of X.
> begin
> ... -- The compiler can omit any bound checks related to
> -- use Index with X
>
> This is innocent hint to the compiler with turn catastrophic with what you
> proposed when Foo is called in a loop.
>

that is only if "ranges" are implemented as "proper" types.

anyways, I don't really expect things like this to happen in much practice.

anyways, the above table in question is for the C/I-OO system, and so, it
would only really be an issue if one is creating a constant stream of new
classes, and at this point even then the impact is likely to be modest (it
falls back to doing linear inheritence checks).

creating a constant stream of classes is likely to create far worse problems
than just poor hashing behavior though.


>>> Because when f is defined on foo that does not make it defined on bar.
>>
>> no, as then the structure has changed, and they will no longer be
>> merged...
>
> In that case you will need to clone types re-hash each time an operation
> get declared. It does not worth the efforts, you save a paar KBytes at the
> cost of a massive distributed overhead.
>

not really.

types are not (themselves) mutable...
similarly, they don't force rebuilding the hash tables.

only type deletion would cause this, and in general, type deletion does not
happen explicitly (more often, types fall out of visibility and gradually
disappear).


>>> You introduce a unique ID, which is the type tag.
>>
>> the problem here is with multiple runs and a large persistent metadata
>> database...
>
> This is what I was trying to convey.
>

note that simply creating a new name, will make the problem worse.
this is why I tend to use hash values, as a hash value will tend to be
deterministic, and thus has much better "merge" properties against said
persistent database.


>> prior to the DB, this worked plenty well.
>> after adding the DB, each new run essentially creates garbage, in the
>> form
>> of new UIDs and redefinitions of essentially the same types...
>>
>> to deal with this would require some means of effectively "cleaning" the
>> DB
>> of old entries, and preventing it from gradually being weighed down by
>> huge
>> amounts of garbege...
>
> Collecting the garbage implies that the type tag is a system resource,
> which makes it even more expensive to use...
>

database size is the system resource...

bigger database means more wasted time and memory.

as a result, the generation of DB garbage should be minimized.


>>>> about the halting problem, I don't see why this is the case.
>>>
>>> Because two types are equivalent if and only if all operations defined
>>> on
>>> them are equivalent. If somebody defines f on foo, that alone would make
>>> it
>>> distinct from bar. Note that members like x and y are actually
>>> operations
>>> too, which get and set the corresponding components. So foo and bar are
>>> equivalent because both have "get x" and "set x" of type integer.
>>
>> this is only if said operations are defined externally...
>
> You mean free functions?
>

free functions are not operations...
I don't need to care what functions are defined, since a function is not a
part of the type.

a function is a function...


I had thought you were talking about operator overloading, ...


>> if they are methods, they are part of the structure, and no such merging
>> would happen in the first place.
>
> The relation between a type and its operations (methods) is n to m.
> Neither
> is a part of another.
>
> Anyway, the point is, structural equivalence requires looking into the
> operations in order to determine equivalence. This is halting.
>

no, I don't do it this way.

a class, for example, is a list of slots and methods.
the only methods which can exist for a given class are those defined for
this class, and they are physically located "within" the class.

classes with different method declarations are thus unique.

methods are internally associated with classes based on QName, meaning that,
as is, as far as definitions go, only named classes can have methods at
present (although, this is at the metadata/IL level).

at the language level, a class can have internally-defined methods, which
could apply to unnamed classes (an unnamed class, for technical reasons,
can't have external methods).

in the compiler, the class would be hashed (in this case, likely including
said methods), and so if the classes are not identical, they will not merge.

if 2 unnammed classes both have identical definitions and identical methods,
they may as well be merged...

however, the point is moot, as I don't think this is likely to happen much
in practice.

more so, this could only happen in C++ (in C# and Java, these languages will
not allow this particular situation to happen).


>> if they are defined via operator overloading, then they exist as part of
>> the
>> types' heirarchy, and not as part of the structures' heirarchy (a given
>> structural type may be shared between several ADTs, and these ADTs will
>> continue being unique, even if the structure is shared...).
>
> class X { virtual void Foo (); };
> class Y { virtual void Foo (); };
>
> are these two equivalent?
>

no, these two classes are named and have different names.


mostly it is an issue for things like:
struct { int x; } foo;
...
struct { int x; } bar;

and, also, for the situation where the same type shows up multiple times,
typically as a result of having been included (as part of a header) in
several disjoint modules.

the strategy is to merge such definitions such that they don't pollute the
metadata database...

this situation could also arrise with function pointers/... so it is
important to merge them.


similarly, my framework also typically merges strings as well...
feel like complaining about this next?...

cr88192

unread,
Jun 22, 2009, 2:04:06 PM6/22/09
to

"Richard Harter" <c...@tiac.net> wrote in message
news:4a3fa9b4....@text.giganews.com...

yep, or we can statically lookup the method and then hold a method handle...

granted, this is best suited to static typing, not dynamic typing...

however, my present emphasis is on primarily optimizing the statically typed
case, and dealing with dynamic languages primarily by looking for
"invariants" and compiling them to statically typed code...


FWIW, this can also help locate "questionable" constructions in dynamic
languages, which are often more likely to be programmer error than the
intended behavior.


>>
>>> but for any meaning that
>>> occurs to me, this is not right.
>>>
>>> If we have N distinct strings it takes log(N) decisions to
>>> distinguish between them. Moreover the average string length, n,
>>> must be >= log2(N) bits.
>>
>>You consider here a dispatching table represented by a binary tree.
>>cr88192
>>did consider hash tables. My response was that evaluation of a hash value
>>is still O(n).
>
> No, I am not considering a dispatching table represented by a
> binary tree. I was considering that which I considered in the text
> that I wrote.
>

he seems to have an issue with losing the context and then arguing about
things in a context different than where they were stated.


>>
>>> Furthermore O(log n) is the cost of determining that a string s is
>>> possibly in our set of strings. To verify that s actually is in
>>> the set of strings we must check the the entire string. I.e.,
>>> verification is O(n).
>>>
>>> In short, the cost of looking of looking up a string in a set of
>>> strings is O(n). The cost can be concealed by a languages syntax
>>> but it is there.
>>
>>Yes!
>
> Just as a side note, one should remember that the use of the big
> O() function in discussions of programming is almost always a
> convenient simplification. We aren't actually interested in the
> asymptotic value of cost functions as n goes to infinity - n is
> always bounded for some perhaps unknown value of n. Moreover usage
> seldom reflects all costs; rather it reflects those costs that are
> considered relevant in context.
>
> This side note is not a disagreement with what you wrote; it is
> merely a general observation.
>

yeah...

in particular, what is important is not some "theoretical" complexity,
rather the practical performance.

O(n) where n is both small and bounded, and where O(1) is absurdly cheap, is
IMO not worth worrying about.

O(log2 n) where O(1) is very expensive could very well be a worse situation
than O(n) where O(1) is cheap, except for some sufficiently large n (and if
n is known to be below this threshold, it makes little sense to use an algo
which will be slower).

FFS, in some cases I use O(n^2) and O(n^3) algos to good effect in cases
where I know that n is sufficiently small that it will not matter to the
overall performance.


so, the big concern as I see it, is the actual performance (as in, testable,
measurable, and verifiable with the profiler), not some theoretical
complexity.

and, FWIW, hashing typically improves performance FAR more than it hurts it.

Dmitry A. Kazakov

unread,
Jun 23, 2009, 3:46:50 AM6/23/09
to
On Mon, 22 Jun 2009 10:26:18 -0700, cr88192 wrote:

> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message

> news:xssk85vmx8ts.1a6biujdmgcr5$.dlg@40tude.net...

>> I don't believe it. If an optimum exists and can be found, why don't you
>> implement the optimum from the start?
>
> it is too much effort to optimize everything.
> as a result, I optimize whatever is going slow, AKA: whatever starts showing
> up in the profiler...

Things put into the OS must be close to optimal.

>>> strings are typically my "cannonical" representation, but not every
>>> operation needs to use them...
>>
>> ... and operations on type tags are among those!
>
> well, it works...

I have no doubts about it. The question was if it makes sense to put that
into the OS. To answer this, making it just working is not enough. It must
work optimal for most of possible uses.

>>> as for the inheritence-checking hash, this has is small, and is more of a
>>> cache. if/when a type is actually destroyed, or another critical change
>>> is
>>> made, the hash can be simply cleared and then rebuilt (I don't use this
>>> as
>>> much for bigger hashes, as it can be a far more severe hit, overall, than
>>> rehashing...).
>>
>> That makes it totally useless for a language that creates type in a tight
>> loop. For example, in Ada one often writes:
>>
>> procedure Foo (X : <some-array-type>) is
>> subtype Index is Integer range X'Range;
>> -- The subtype which is always in the bounds of X.
>> begin
>> ... -- The compiler can omit any bound checks related to
>> -- use Index with X
>>
>> This is innocent hint to the compiler with turn catastrophic with what you
>> proposed when Foo is called in a loop.
>
> that is only if "ranges" are implemented as "proper" types.

Yes. Are you saying that tags of "improper" types need not to be handled by
the OS? (:-))

> creating a constant stream of classes is likely to create far worse problems
> than just poor hashing behavior though.

Yes, if the implementation is poor.

Now considering a distributed networking system of the next generation, you
will inevitable face this problem. You cannot put all types into one node.
It means that types will be massively distributed between communicating
nodes (and applications). So you will create and kill types all the time in
the same way, we presently create and kill values. This is IMO a
characteristic of "new generation" OS - working with types *routinely* = at
virtually zero cost.

>>>> Because when f is defined on foo that does not make it defined on bar.
>>>
>>> no, as then the structure has changed, and they will no longer be
>>> merged...
>>
>> In that case you will need to clone types re-hash each time an operation
>> get declared. It does not worth the efforts, you save a paar KBytes at the
>> cost of a massive distributed overhead.
>
> not really.
>
> types are not (themselves) mutable...

Yes.

> similarly, they don't force rebuilding the hash tables.

No. Dispatching table is not a property of a type. It is of a polymorphic
operation of a class (set of types). The set of types is mutable. You can
create new types derived form another. That does not mutate the base. It
mutates the dispatching table.

> only type deletion would cause this, and in general, type deletion does not
> happen explicitly (more often, types fall out of visibility and gradually
> disappear).

I see no difference. Once the type vanishes, the dispatching table should
be cleaned in order to reuse the type tag and memory.

>>> if they are methods, they are part of the structure, and no such merging
>>> would happen in the first place.
>>
>> The relation between a type and its operations (methods) is n to m.
>> Neither is a part of another.
>>
>> Anyway, the point is, structural equivalence requires looking into the
>> operations in order to determine equivalence. This is halting.
>
> no, I don't do it this way.

As I said, what you call structural is in rather nominal based on
signatures.

> a class, for example, is a list of slots and methods.
> the only methods which can exist for a given class are those defined for
> this class, and they are physically located "within" the class.

This model is wrong because it does not support multi-methods. The relation
in not 1 to n, as a matter of fact.

>>> if they are defined via operator overloading, then they exist as part of the
>>> types' heirarchy, and not as part of the structures' heirarchy (a given
>>> structural type may be shared between several ADTs, and these ADTs will
>>> continue being unique, even if the structure is shared...).
>>
>> class X { virtual void Foo (); };
>> class Y { virtual void Foo (); };
>>
>> are these two equivalent?
>
> no, these two classes are named and have different names.
>
> mostly it is an issue for things like:
> struct { int x; } foo;
> ...
> struct { int x; } bar;

OK, I see the first declaration of struct { int } introduces an anonymous
type. The second declaration refers to this declaration.

I don't think it makes sense to support type inference in OS (I would
rather consider equivalence of foo and bar in your example as type
inference). This could be dealt with at the language level.



> similarly, my framework also typically merges strings as well...
> feel like complaining about this next?...

Yes, because there exist different attitudes to the issue. In other
languages anonymously declared types might be treated as not equivalent.
Compare to Ada:

Foo : array (1..2) of Integer; -- Anonymous array type is declared here
Bar : array (1..2) of Integer;

Foo := Bar; -- This is illegal, two anonymous types are not equivalent

cr88192

unread,
Jun 23, 2009, 3:01:20 PM6/23/09
to

"Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
news:y82e4z753y0c$.w6jdikh6sv99.dlg@40tude.net...

> On Mon, 22 Jun 2009 10:26:18 -0700, cr88192 wrote:
>
>> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
>> news:xssk85vmx8ts.1a6biujdmgcr5$.dlg@40tude.net...
>
>>> I don't believe it. If an optimum exists and can be found, why don't you
>>> implement the optimum from the start?
>>
>> it is too much effort to optimize everything.
>> as a result, I optimize whatever is going slow, AKA: whatever starts
>> showing
>> up in the profiler...
>
> Things put into the OS must be close to optimal.
>

maybe, but some things are more important than others.

granted, for an OS is might be a little harder to run it though a profiler.
in my past OS, one idea I once discovered to make it easier to profile and
debug, was basically to rig up code so that the kernel could be built in
"hosted" mode, AKA, running on top of an existing OS.
this then allowed me to profile and debug things more easily, without as
often booting the thing and trying to figure out just where the hell the
crash was.

it might be a little more difficult if the OS involves loadable binaries, as
one would have to devise a loader and process manager also able to cooperate
with the host OS (such as Windows or Linux).

this is made a lot easier if ones binaries and libraries are load-time
relocatable.

I forget exactly, but I think I had used fixed-address binaries (in the
PE/COFF format), which had been set to relocate to an odd address (vague
memory of 0x00C00000 or 0x0C000000 or similar).

either way, this way could allow a "dummy stub" to get the binary loaded
into the process, and get the libraries loaded. note that this would mean
using a different syscall mechanism between hosted and non-hosted operation
(such as using one lib for core syscalls in hosted mode, and another for
non-hosted).


from what I remember though, my kernel did not use fully separate process
spaces, but at the time I was lazy and typically relocated binaries on load
so they went in the same address space as the kernel (I think I had binaries
with relocations left in or similar).

this would simplify loading, but is not so good from a design perspective...

(if I were to do OS-dev now, I would probably likely implement a partial
special purpose CPU emulator and custom profiler and debugger, but this is
more because now I would likely be targetting a long-mode kernel, which
would likely be a PITA to run hosted...).


or such...


>>>> strings are typically my "cannonical" representation, but not every
>>>> operation needs to use them...
>>>
>>> ... and operations on type tags are among those!
>>
>> well, it works...
>
> I have no doubts about it. The question was if it makes sense to put that
> into the OS. To answer this, making it just working is not enough. It must
> work optimal for most of possible uses.
>

I think a lot of this (in its early form) was used in an OS.

basically, my modern codebase is more or less a distant descendant of my OS
project, where long ago I gave up on OS dev (concluding that it was not
likely to achieve useful results), and so all of the non-hosted parts
(drivers, ...) eventually fell off, and other parts atrophied and
disappeared...


it is worth noting that later, a few minor bits which had fell off, were
re-incorporated into my compiler framework, such as chunks of the PE/COFF
loader, ...

integer ranges to me seem more like a compiler hack, so the underlying type
would likely just be a plain integer...


>> creating a constant stream of classes is likely to create far worse
>> problems
>> than just poor hashing behavior though.
>
> Yes, if the implementation is poor.
>

creating lots of classes will end up using up lots of memory in the C/I
system, and at present, there are no mechanisms by which classes can be
"garbage". as such, creating a constant stream of "new" classes would
essentially use up all the memory...

the main advantage is that, in good old C-family languages, this does not
really happen (except maybe in Java, but this creates "anonymous classes"
which I think I implemented as GC friendly).


> Now considering a distributed networking system of the next generation,
> you
> will inevitable face this problem. You cannot put all types into one node.
> It means that types will be massively distributed between communicating
> nodes (and applications). So you will create and kill types all the time
> in
> the same way, we presently create and kill values. This is IMO a
> characteristic of "new generation" OS - working with types *routinely* =
> at
> virtually zero cost.
>

potentially...

however, the question is whether they will be enough "unique" types in the
entirety of the codebase running on such an OS to make this too much of a
problem.

actually, a way to handle this is probably just making a separate
"per-assembly" types DB, then loading this DB for whatever process is being
loaded (and for whatever libraries/assemblies are loaded).

AKA: the .NET strategy...

as is, I maintain a single global types DB, but mostly this is because this
was less effort than bothering to implement a separate DB per assembly.


I had a few times considered the possibility of splitting up the DB, though
probably this would be either through "mounts" or "hives". however, I have
not had reason to do so thus far.


>>>>> Because when f is defined on foo that does not make it defined on bar.
>>>>
>>>> no, as then the structure has changed, and they will no longer be
>>>> merged...
>>>
>>> In that case you will need to clone types re-hash each time an operation
>>> get declared. It does not worth the efforts, you save a paar KBytes at
>>> the
>>> cost of a massive distributed overhead.
>>
>> not really.
>>
>> types are not (themselves) mutable...
>
> Yes.
>
>> similarly, they don't force rebuilding the hash tables.
>
> No. Dispatching table is not a property of a type. It is of a polymorphic
> operation of a class (set of types). The set of types is mutable. You can
> create new types derived form another. That does not mutate the base. It
> mutates the dispatching table.
>

yes.

but, this is not (exactly) how my hashes are used.

I use hash tables for interface method calls, but not for virtual method
calls (except for currently in the MI case).

the reason this is is, because I can make use of a method handle, and
directly index the vtable slot...


for interfaces, the traditional mechanism gets rather ugly, and is not
likely to be much faster than a hash table. so, my solution was to basically
produce a joint hash value (class and method), and look up the method in a
big hash table (and add the requested method if not found).

granted, the hash table I use for this is not exactly small (I think it has
initially 2^18 entries or similar).
(note that I also precompute the hash values, and so combining them and
performing a lookup is fairly fast).

for MI, a similar mechanism is used, but I have since discovered that this
is "ill advised" (namely, in that it can't correctly implement C++'s MI
semantics).

the issue is mostly that MI is implemented by creating a new class which
"merges" all the superclasses, and uses some internal amount of hackery to
redirect superclass fields and methods into the new merged class (or,
essentially, MI is implemented as hacked SI).


>> only type deletion would cause this, and in general, type deletion does
>> not
>> happen explicitly (more often, types fall out of visibility and gradually
>> disappear).
>
> I see no difference. Once the type vanishes, the dispatching table should
> be cleaned in order to reuse the type tag and memory.
>

well, it will happen, but gradually...
once a type falls out of use, it will be cleared out of the hashes
eventually.
the main thing would be to allow named classes to be garbage collected,
which presently is not done (since all the classes are linked together in
the class manager).

note that the class-manager holds resident classes, but classes may exist in
the DB and be non-resident (AKA: just a collection of DB entries). trying to
instance the class may cause it to be loaded from the DB.

a remote possibility could be an "unload" feature for resident classes where
there are no instances (or resident subclasses). however, this would be best
done sparingly, as reloading would be expensive.


>>>> if they are methods, they are part of the structure, and no such
>>>> merging
>>>> would happen in the first place.
>>>
>>> The relation between a type and its operations (methods) is n to m.
>>> Neither is a part of another.
>>>
>>> Anyway, the point is, structural equivalence requires looking into the
>>> operations in order to determine equivalence. This is halting.
>>
>> no, I don't do it this way.
>
> As I said, what you call structural is in rather nominal based on
> signatures.
>

I am not sure what is meant here...


>> a class, for example, is a list of slots and methods.
>> the only methods which can exist for a given class are those defined for
>> this class, and they are physically located "within" the class.
>
> This model is wrong because it does not support multi-methods. The
> relation
> in not 1 to n, as a matter of fact.
>

method overloading works fine, as each method has a different signature...


>>>> if they are defined via operator overloading, then they exist as part
>>>> of the
>>>> types' heirarchy, and not as part of the structures' heirarchy (a given
>>>> structural type may be shared between several ADTs, and these ADTs will
>>>> continue being unique, even if the structure is shared...).
>>>
>>> class X { virtual void Foo (); };
>>> class Y { virtual void Foo (); };
>>>
>>> are these two equivalent?
>>
>> no, these two classes are named and have different names.
>>
>> mostly it is an issue for things like:
>> struct { int x; } foo;
>> ...
>> struct { int x; } bar;
>
> OK, I see the first declaration of struct { int } introduces an anonymous
> type. The second declaration refers to this declaration.
>
> I don't think it makes sense to support type inference in OS (I would
> rather consider equivalence of foo and bar in your example as type
> inference). This could be dealt with at the language level.
>

it is done by the compiler, and then the relevant data is sent to the
matadata DB (managed by the framework runtime...).

in an OS, well, no, this would probably not be all located in the kernel,
rather it would probably be located in runtime libraries (DLLs or Shared
Objects or whatever...).


I would assume in an OS a relatively low-level (AKA: Linux style) kernel
would be used.


>> similarly, my framework also typically merges strings as well...
>> feel like complaining about this next?...
>
> Yes, because there exist different attitudes to the issue. In other
> languages anonymously declared types might be treated as not equivalent.
> Compare to Ada:
>
> Foo : array (1..2) of Integer; -- Anonymous array type is declared here
> Bar : array (1..2) of Integer;
>
> Foo := Bar; -- This is illegal, two anonymous types are not equivalent
>

this is a compiler issue, not a core typesystem or runtime issue...

the compiler can complain about it if it wants, but the runtime need not do
so...

and, if the compiler wants to make each as its own type, it is free to do so
(only that this is not the sort of behavior typically done in C or C++,
where multiple definitions are more likely a side-effect than intentional,
and also it is worth noting that the C standard specifies the use of
structural equivalence...).

Dmitry A. Kazakov

unread,
Jun 24, 2009, 8:30:29 AM6/24/09
to
On Tue, 23 Jun 2009 12:01:20 -0700, cr88192 wrote:

> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message

> news:y82e4z753y0c$.w6jdikh6sv99.dlg@40tude.net...


>
>> Yes. Are you saying that tags of "improper" types need not to be handled
>> by the OS? (:-))
>
> integer ranges to me seem more like a compiler hack, so the underlying type
> would likely just be a plain integer...

There is no such thing like "underlying type" or "plain integer". If any
that is an implementation detail.

It is not only wrong to think in that, it is just wrong. Consider modular
integer of the range 1..3. Again, type is a set of values and operations
defined on them, period. Nothing beyond that.

>>> creating a constant stream of classes is likely to create far worse problems
>>> than just poor hashing behavior though.
>>
>> Yes, if the implementation is poor.
>
> creating lots of classes will end up using up lots of memory in the C/I
> system, and at present, there are no mechanisms by which classes can be
> "garbage". as such, creating a constant stream of "new" classes would
> essentially use up all the memory...

Yes, if the implementation is poor... (:-))

>>>>> if they are methods, they are part of the structure, and no such merging
>>>>> would happen in the first place.
>>>>
>>>> The relation between a type and its operations (methods) is n to m.
>>>> Neither is a part of another.
>>>>
>>>> Anyway, the point is, structural equivalence requires looking into the
>>>> operations in order to determine equivalence. This is halting.
>>>
>>> no, I don't do it this way.
>>
>> As I said, what you call structural is in rather nominal based on
>> signatures.
>
> I am not sure what is meant here...

It is meant that you don't compare types for being semantically equivalent.
You compare their signatures constructed out of the memory layout. This is
type of equivalence is nominal. Semantic equivalence of T1 and T2 means
that

1. for each value of T1 there exist one and only one value of T2
2. the same is true for their operations
3 for each pair of such operations any combination the arguments
equivalent in the sense defined by 1, the both yield the results equivalent
in this sense.

Equivalently any provable predicate P defined in terms of T1 remains true
after substituting T2 for T1.

>>> a class, for example, is a list of slots and methods.
>>> the only methods which can exist for a given class are those defined for
>>> this class, and they are physically located "within" the class.
>>
>> This model is wrong because it does not support multi-methods. The relation
>> in not 1 to n, as a matter of fact.
>
> method overloading works fine, as each method has a different signature...

Methods cannot be overloaded, per definition of. Unless you mean something
else.

>>> similarly, my framework also typically merges strings as well...
>>> feel like complaining about this next?...
>>
>> Yes, because there exist different attitudes to the issue. In other
>> languages anonymously declared types might be treated as not equivalent.
>> Compare to Ada:
>>
>> Foo : array (1..2) of Integer; -- Anonymous array type is declared here
>> Bar : array (1..2) of Integer;
>>
>> Foo := Bar; -- This is illegal, two anonymous types are not equivalent
>
> this is a compiler issue, not a core typesystem or runtime issue...
>
> the compiler can complain about it if it wants, but the runtime need not do
> so...

Exactly the reverse. The OS does not care if any two types are equivalent
according to the language standard. The compiler takes care to infer
equivalence if it will.

This is different in different languages and incomputable in general case.
Why would OS bother about the mess? It is the language semantics to
determine what is equivalent, not the OS's business.

cr88192

unread,
Jun 24, 2009, 1:23:54 PM6/24/09
to

"Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
news:1nwbhpkrjmg29.b...@40tude.net...

> On Tue, 23 Jun 2009 12:01:20 -0700, cr88192 wrote:
>
>> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
>> news:y82e4z753y0c$.w6jdikh6sv99.dlg@40tude.net...
>>
>>> Yes. Are you saying that tags of "improper" types need not to be handled
>>> by the OS? (:-))
>>
>> integer ranges to me seem more like a compiler hack, so the underlying
>> type
>> would likely just be a plain integer...
>
> There is no such thing like "underlying type" or "plain integer". If any
> that is an implementation detail.
>

plain integer:
in my case is given the letter 'i' and is defined as 32 bits.
otherwise, it is 'x' and 64 bits, or 'n' and 128 bits.


> It is not only wrong to think in that, it is just wrong. Consider modular
> integer of the range 1..3. Again, type is a set of values and operations
> defined on them, period. Nothing beyond that.
>

then, the integer "can" be 8 bits, but more likely is 32...


>>>> creating a constant stream of classes is likely to create far worse
>>>> problems
>>>> than just poor hashing behavior though.
>>>
>>> Yes, if the implementation is poor.
>>
>> creating lots of classes will end up using up lots of memory in the C/I
>> system, and at present, there are no mechanisms by which classes can be
>> "garbage". as such, creating a constant stream of "new" classes would
>> essentially use up all the memory...
>
> Yes, if the implementation is poor... (:-))
>

well, you see if you can figure out how to allocate things without using any
memory...


>>>>>> if they are methods, they are part of the structure, and no such
>>>>>> merging
>>>>>> would happen in the first place.
>>>>>
>>>>> The relation between a type and its operations (methods) is n to m.
>>>>> Neither is a part of another.
>>>>>
>>>>> Anyway, the point is, structural equivalence requires looking into the
>>>>> operations in order to determine equivalence. This is halting.
>>>>
>>>> no, I don't do it this way.
>>>
>>> As I said, what you call structural is in rather nominal based on
>>> signatures.
>>
>> I am not sure what is meant here...
>
> It is meant that you don't compare types for being semantically
> equivalent.
> You compare their signatures constructed out of the memory layout. This is
> type of equivalence is nominal. Semantic equivalence of T1 and T2 means
> that
>
> 1. for each value of T1 there exist one and only one value of T2
> 2. the same is true for their operations
> 3 for each pair of such operations any combination the arguments
> equivalent in the sense defined by 1, the both yield the results
> equivalent
> in this sense.
>
> Equivalently any provable predicate P defined in terms of T1 remains true
> after substituting T2 for T1.
>

oh, ok...

this can be done with named types, but generally is not the case for unnamed
types.

this is also more a matter of the "core typesystem", but is not as much the
case for "auxilary types".


>>>> a class, for example, is a list of slots and methods.
>>>> the only methods which can exist for a given class are those defined
>>>> for
>>>> this class, and they are physically located "within" the class.
>>>
>>> This model is wrong because it does not support multi-methods. The
>>> relation
>>> in not 1 to n, as a matter of fact.
>>
>> method overloading works fine, as each method has a different
>> signature...
>
> Methods cannot be overloaded, per definition of. Unless you mean something
> else.
>

void foo(int x);
void foo(float x, float y);
void foo(double x, double y, double z);
...

this is generally called overloading...


internally, each will have a modified name, so, they might be instead
regarded as:
foo(i)v
foo(ff)v
foo(ddd)v

when compiling, the compiler figures out which types the call-site is using,
and picks the appropriate method as the reciever.

actually, internally the name and signature are kept separate, but for some
uses they are joined and spit apart when needed. this is similar to the JVM
strategy.


>>>> similarly, my framework also typically merges strings as well...
>>>> feel like complaining about this next?...
>>>
>>> Yes, because there exist different attitudes to the issue. In other
>>> languages anonymously declared types might be treated as not equivalent.
>>> Compare to Ada:
>>>
>>> Foo : array (1..2) of Integer; -- Anonymous array type is declared
>>> here
>>> Bar : array (1..2) of Integer;
>>>
>>> Foo := Bar; -- This is illegal, two anonymous types are not equivalent
>>
>> this is a compiler issue, not a core typesystem or runtime issue...
>>
>> the compiler can complain about it if it wants, but the runtime need not
>> do
>> so...
>
> Exactly the reverse. The OS does not care if any two types are equivalent
> according to the language standard. The compiler takes care to infer
> equivalence if it will.
>
> This is different in different languages and incomputable in general case.
> Why would OS bother about the mess? It is the language semantics to
> determine what is equivalent, not the OS's business.
>

FWIW, the OS need not particularly care about "types" in the first place...

the OS can instead provide abstract facilities (such as, facilities for
signature handling, if even this, more likely the OS will just provide mmap,
...), leaving all other issues up to the combination of compiler and runtime
libraries...

the runtime only need care about general issues, not specific issues (such
as "exact" type behavior).
things like type-checking, ... which are really compiler issues.

as I said before, the types equivalence/merging semantics are done in the
compiler, not in the runtime. the runtime is only told things by the
compiler ("hey, here is this type, ...").


but, anyways, my main concern here is C-family languages (C, C++, Java, C#,
...), and I am going by what is "standard" practice on these fronts. any
other languages will just have to live with it...

Dmitry A. Kazakov

unread,
Jun 24, 2009, 2:22:59 PM6/24/09
to
On Wed, 24 Jun 2009 10:23:54 -0700, cr88192 wrote:

> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
> news:1nwbhpkrjmg29.b...@40tude.net...
>> On Tue, 23 Jun 2009 12:01:20 -0700, cr88192 wrote:
>>
>>> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
>>> news:y82e4z753y0c$.w6jdikh6sv99.dlg@40tude.net...
>>>
>>>> Yes. Are you saying that tags of "improper" types need not to be handled
>>>> by the OS? (:-))
>>>
>>> integer ranges to me seem more like a compiler hack, so the underlying
>>> type
>>> would likely just be a plain integer...
>>
>> There is no such thing like "underlying type" or "plain integer". If any
>> that is an implementation detail.
>
> plain integer:
> in my case is given the letter 'i' and is defined as 32 bits.
> otherwise, it is 'x' and 64 bits, or 'n' and 128 bits.

It is not an integer. Plain integer if any is such implements the set of
integer numbers. Yours does not.

>> It is not only wrong to think in that, it is just wrong. Consider modular
>> integer of the range 1..3. Again, type is a set of values and operations
>> defined on them, period. Nothing beyond that.
>
> then, the integer "can" be 8 bits, but more likely is 32...

It cannot be, since 2+2 = 1 (mod 4)

This illustrates the principle. The type is a set of values and operations.
The operations of the ring modulus 4 are sufficiently different from ones
of the ring 256 (if "8-bits" means modulus 2**8).

>> Yes, if the implementation is poor... (:-))
>
> well, you see if you can figure out how to allocate things without using any
> memory...

That depends on what has to be in the memory...

>>>>> a class, for example, is a list of slots and methods.
>>>>> the only methods which can exist for a given class are those defined for
>>>>> this class, and they are physically located "within" the class.
>>>>
>>>> This model is wrong because it does not support multi-methods. The
>>>> relation in not 1 to n, as a matter of fact.
>>>
>>> method overloading works fine, as each method has a different
>>> signature...
>>
>> Methods cannot be overloaded, per definition of. Unless you mean something
>> else.
>
> void foo(int x);
> void foo(float x, float y);
> void foo(double x, double y, double z);
> ...
>
> this is generally called overloading...

OK, you mean overloaded name "foo". What was the point?

cr88192

unread,
Jun 24, 2009, 5:17:50 PM6/24/09
to

"Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
news:mv65ymz5j45s$.2vjcwt609bi6.dlg@40tude.net...

> On Wed, 24 Jun 2009 10:23:54 -0700, cr88192 wrote:
>
>> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
>> news:1nwbhpkrjmg29.b...@40tude.net...
>>> On Tue, 23 Jun 2009 12:01:20 -0700, cr88192 wrote:
>>>
>>>> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
>>>> news:y82e4z753y0c$.w6jdikh6sv99.dlg@40tude.net...
>>>>
>>>>> Yes. Are you saying that tags of "improper" types need not to be
>>>>> handled
>>>>> by the OS? (:-))
>>>>
>>>> integer ranges to me seem more like a compiler hack, so the underlying
>>>> type
>>>> would likely just be a plain integer...
>>>
>>> There is no such thing like "underlying type" or "plain integer". If any
>>> that is an implementation detail.
>>
>> plain integer:
>> in my case is given the letter 'i' and is defined as 32 bits.
>> otherwise, it is 'x' and 64 bits, or 'n' and 128 bits.
>
> It is not an integer. Plain integer if any is such implements the set of
> integer numbers. Yours does not.
>

integers hold however many values will fit in the bits...


>>> It is not only wrong to think in that, it is just wrong. Consider
>>> modular
>>> integer of the range 1..3. Again, type is a set of values and operations
>>> defined on them, period. Nothing beyond that.
>>
>> then, the integer "can" be 8 bits, but more likely is 32...
>
> It cannot be, since 2+2 = 1 (mod 4)
>
> This illustrates the principle. The type is a set of values and
> operations.
> The operations of the ring modulus 4 are sufficiently different from ones
> of the ring 256 (if "8-bits" means modulus 2**8).
>

not really, all that has to happen is that the compiler inserts a hidden
modulus operator or mask, and maybe some extra hidden logic.

same difference though...


>>> Yes, if the implementation is poor... (:-))
>>
>> well, you see if you can figure out how to allocate things without using
>> any
>> memory...
>
> That depends on what has to be in the memory...
>

if certain types are runtime objects (such as classes), then necessarily
they need to be in memory.

OTOH, unmanaged classes don't (necessarily) need runtime memory, but one
can't create them dynamically either (since the class is a fixed part of the
code to be compiled, and even templates/generics are fixed post-compile).


>>>>>> a class, for example, is a list of slots and methods.
>>>>>> the only methods which can exist for a given class are those defined
>>>>>> for
>>>>>> this class, and they are physically located "within" the class.
>>>>>
>>>>> This model is wrong because it does not support multi-methods. The
>>>>> relation in not 1 to n, as a matter of fact.
>>>>
>>>> method overloading works fine, as each method has a different
>>>> signature...
>>>
>>> Methods cannot be overloaded, per definition of. Unless you mean
>>> something
>>> else.
>>
>> void foo(int x);
>> void foo(float x, float y);
>> void foo(double x, double y, double z);
>> ...
>>
>> this is generally called overloading...
>
> OK, you mean overloaded name "foo". What was the point?
>

you were making the claim that structural equivalence (or whatever is
currently being argued against) would not support multi-methods (which can
be noted are more or less analogous to overloaded methods), which is
incorrect, since method overloading clearly works with structural
equivalence (only that, in a hidden way, the signature is regarded as a part
of the name...).

Dmitry A. Kazakov

unread,
Jun 25, 2009, 3:36:21 AM6/25/09
to
On Wed, 24 Jun 2009 14:17:50 -0700, cr88192 wrote:

> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
> news:mv65ymz5j45s$.2vjcwt609bi6.dlg@40tude.net...
>> On Wed, 24 Jun 2009 10:23:54 -0700, cr88192 wrote:
>>
>>> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
>>> news:1nwbhpkrjmg29.b...@40tude.net...
>>>> On Tue, 23 Jun 2009 12:01:20 -0700, cr88192 wrote:
>>>>
>>>>> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
>>>>> news:y82e4z753y0c$.w6jdikh6sv99.dlg@40tude.net...
>>>>>
>>>>>> Yes. Are you saying that tags of "improper" types need not to be
>>>>>> handled
>>>>>> by the OS? (:-))
>>>>>
>>>>> integer ranges to me seem more like a compiler hack, so the underlying
>>>>> type
>>>>> would likely just be a plain integer...
>>>>
>>>> There is no such thing like "underlying type" or "plain integer". If any
>>>> that is an implementation detail.
>>>
>>> plain integer:
>>> in my case is given the letter 'i' and is defined as 32 bits.
>>> otherwise, it is 'x' and 64 bits, or 'n' and 128 bits.
>>
>> It is not an integer. Plain integer if any is such implements the set of
>> integer numbers. Yours does not.
>
> integers hold however many values will fit in the bits...

Integer hold integer values. Your "plain integer" is not an integer at all.
Because if X is an integer, then any provable proposition about integer set
holds for X. Take the proposition if X is integer then X+1 is integer.
q.e.d.

The moral: You are using a model of integers. This model is inadequate.
There are other competing models, which are inadequate as well. None is
"better" than other, without further considerations. Therefore there is no
"plain integers". Equivalently, there is no "plain number".

>>>> It is not only wrong to think in that, it is just wrong. Consider modular
>>>> integer of the range 1..3. Again, type is a set of values and operations
>>>> defined on them, period. Nothing beyond that.
>>>
>>> then, the integer "can" be 8 bits, but more likely is 32...
>>
>> It cannot be, since 2+2 = 1 (mod 4)
>>
>> This illustrates the principle. The type is a set of values and operations.
>> The operations of the ring modulus 4 are sufficiently different from ones
>> of the ring 256 (if "8-bits" means modulus 2**8).
>
> not really, all that has to happen is that the compiler inserts a hidden
> modulus operator or mask, and maybe some extra hidden logic.

That is an irrelevant implementation detail.



>>>> Yes, if the implementation is poor... (:-))
>>>
>>> well, you see if you can figure out how to allocate things without using
>>> any memory...
>>
>> That depends on what has to be in the memory...
>
> if certain types are runtime objects (such as classes), then necessarily
> they need to be in memory.

No. You don't need full descriptions of a class in the memory. That was the
starting point: what would be possible to do in order to support RTTI.
Note, not reflection, nor introspection, nor type marshaling. Only RTTI.

>>>>>>> a class, for example, is a list of slots and methods.
>>>>>>> the only methods which can exist for a given class are those defined
>>>>>>> for this class, and they are physically located "within" the class.
>>>>>>
>>>>>> This model is wrong because it does not support multi-methods. The
>>>>>> relation in not 1 to n, as a matter of fact.
>>>>>
>>>>> method overloading works fine, as each method has a different
>>>>> signature...
>>>>
>>>> Methods cannot be overloaded, per definition of. Unless you mean
>>>> something
>>>> else.
>>>
>>> void foo(int x);
>>> void foo(float x, float y);
>>> void foo(double x, double y, double z);
>>> ...
>>>
>>> this is generally called overloading...
>>
>> OK, you mean overloaded name "foo". What was the point?
>
> you were making the claim that structural equivalence (or whatever is
> currently being argued against) would not support multi-methods (which can
> be noted are more or less analogous to overloaded methods), which is
> incorrect,

No, I claimed that you cannot have "class = list of slots for method". That
is inconsistent with multi-methods.

> since method overloading clearly works with structural
> equivalence (only that, in a hidden way, the signature is regarded as a part
> of the name...).

That depends on the scope of the name:

declare
procedure Foo is begin null; end Foo;
begin
declare
procedure Foo is begin null; end Foo;
begin
null;
end;
end;

But I fail to see how this is relevant to run-time. Run-time just does not
care about equivalence of the objects Foo and Foo.

cr88192

unread,
Jun 25, 2009, 11:29:34 AM6/25/09
to

"Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
news:z4pud70x3xae$.1ton4ahqn2jsx$.dlg@40tude.net...

int i;
...
i=i+1;

always true (even when there is an overflow).

bounds-checked arithmetic and saturating arithmetic are possible though.


> The moral: You are using a model of integers. This model is inadequate.
> There are other competing models, which are inadequate as well. None is
> "better" than other, without further considerations. Therefore there is no
> "plain integers". Equivalently, there is no "plain number".
>

they are called integers.
int=32 bits.

then we use longs or int128 if we need more, or short or char/sbyte if we
need less...


>>>>> It is not only wrong to think in that, it is just wrong. Consider
>>>>> modular
>>>>> integer of the range 1..3. Again, type is a set of values and
>>>>> operations
>>>>> defined on them, period. Nothing beyond that.
>>>>
>>>> then, the integer "can" be 8 bits, but more likely is 32...
>>>
>>> It cannot be, since 2+2 = 1 (mod 4)
>>>
>>> This illustrates the principle. The type is a set of values and
>>> operations.
>>> The operations of the ring modulus 4 are sufficiently different from
>>> ones
>>> of the ring 256 (if "8-bits" means modulus 2**8).
>>
>> not really, all that has to happen is that the compiler inserts a hidden
>> modulus operator or mask, and maybe some extra hidden logic.
>
> That is an irrelevant implementation detail.
>

implementation detail?...
then what the hell then is the rest of this thread?...

grr...


but, yes, as I see it, the implementation is primarily, and the conceptual
model is secondary.
we will still have the implementation even with no model, but we have no (or
at least, no useful) model without the implementation.


>>>>> Yes, if the implementation is poor... (:-))
>>>>
>>>> well, you see if you can figure out how to allocate things without
>>>> using
>>>> any memory...
>>>
>>> That depends on what has to be in the memory...
>>
>> if certain types are runtime objects (such as classes), then necessarily
>> they need to be in memory.
>
> No. You don't need full descriptions of a class in the memory. That was
> the
> starting point: what would be possible to do in order to support RTTI.
> Note, not reflection, nor introspection, nor type marshaling. Only RTTI.
>

RTTI, to work at all, needs at least a basic skeleton of the type-system and
class heirarchy (in C++, this much is usually generated by the compiler).

but, for my framework, for it to operate much at all (given the nature of
the project), much of this ends up existing in memory.

?...

a class is:
a list of fields;
a list of methods;
...

I am not sure what you are claiming here...


>> since method overloading clearly works with structural
>> equivalence (only that, in a hidden way, the signature is regarded as a
>> part
>> of the name...).
>
> That depends on the scope of the name:
>
> declare
> procedure Foo is begin null; end Foo;
> begin
> declare
> procedure Foo is begin null; end Foo;
> begin
> null;
> end;
> end;
>
> But I fail to see how this is relevant to run-time. Run-time just does not
> care about equivalence of the objects Foo and Foo.
>

it depends on how ones' project is implemented.

my project lies on the border between a (traditional) compiler and a VM, and
between a static and dynamic typesystem.

actually, a lot of it is "soft-static typing", AKA, the implementation is
largely statically typed in many places, but the machinery is sufficiently
flexible as to allow simulation of dynamically-typed behavior.

dynamic typing can be used in some places, but I try to use it less as it is
less efficient (and is less "provable"...).

Dmitry A. Kazakov

unread,
Jun 25, 2009, 12:34:23 PM6/25/09
to

Are you kidding?

> bounds-checked arithmetic and saturating arithmetic are possible though.

Neither is the integer arithmetic. You consider them integer because you
ignore how arithmetic operations are implemented there. That is exactly the
point. If they are supposed to implement THE integer arithmetic then they
are WRONG. Otherwise they implement something ELSE, i.e. it is not integer
type.

>> The moral: You are using a model of integers. This model is inadequate.
>> There are other competing models, which are inadequate as well. None is
>> "better" than other, without further considerations. Therefore there is no
>> "plain integers". Equivalently, there is no "plain number".
>
> they are called integers.

They are not. They are called *machine numbers*.

>>>>>> It is not only wrong to think in that, it is just wrong. Consider modular
>>>>>> integer of the range 1..3. Again, type is a set of values and operations
>>>>>> defined on them, period. Nothing beyond that.
>>>>>
>>>>> then, the integer "can" be 8 bits, but more likely is 32...
>>>>
>>>> It cannot be, since 2+2 = 1 (mod 4)
>>>>
>>>> This illustrates the principle. The type is a set of values and operations.
>>>> The operations of the ring modulus 4 are sufficiently different from
>>>> ones of the ring 256 (if "8-bits" means modulus 2**8).
>>>
>>> not really, all that has to happen is that the compiler inserts a hidden
>>> modulus operator or mask, and maybe some extra hidden logic.
>>
>> That is an irrelevant implementation detail.
>
> implementation detail?...

Yes, an irrelevant to the issue what is the type.

> then what the hell then is the rest of this thread?...

It was about types and their equivalence.

> but, yes, as I see it, the implementation is primarily, and the conceptual model is secondary.

No the model is primary, its implementation is of no interest. In the
context of type equivalence, the implementation is strictly irrelevant. The
same type can be implemented differently at the *same* machine in the
*same* program.

> we will still have the implementation even with no model, but we have no (or
> at least, no useful) model without the implementation.

Implementation without model is garbage, just because you don't know what
it does implement.

>> No, I claimed that you cannot have "class = list of slots for method". That
>> is inconsistent with multi-methods.
>>
>
> ?...
>
> a class is:
> a list of fields;
> a list of methods;
> ...
>
> I am not sure what you are claiming here...

Class is neither from above.

>>> since method overloading clearly works with structural
>>> equivalence (only that, in a hidden way, the signature is regarded as a
>>> part of the name...).
>>
>> That depends on the scope of the name:
>>
>> declare
>> procedure Foo is begin null; end Foo;
>> begin
>> declare
>> procedure Foo is begin null; end Foo;
>> begin
>> null;
>> end;
>> end;
>>
>> But I fail to see how this is relevant to run-time. Run-time just does not
>> care about equivalence of the objects Foo and Foo.
>
> it depends on how ones' project is implemented.

If you put forward an existing implementation. But the question was how
RTTI could/should be, rather than how it is.

cr88192

unread,
Jun 25, 2009, 1:58:38 PM6/25/09
to

"Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
news:896rus71oc2k.1nk85oajodq9x$.dlg@40tude.net...

if integer means:
scalar with no fractional part.

then an integer is an integer.
even overflow does not add a fractional part or require changing the type,
so all is well...


>> bounds-checked arithmetic and saturating arithmetic are possible though.
>
> Neither is the integer arithmetic. You consider them integer because you
> ignore how arithmetic operations are implemented there. That is exactly
> the
> point. If they are supposed to implement THE integer arithmetic then they
> are WRONG. Otherwise they implement something ELSE, i.e. it is not integer
> type.
>

none has a fractional part...

if you mean unbounded range, well maybe, but that is a different issue, and
can't be practically solved.

VLIs can help though (I use these sometimes...). but, a VLI is still limited
to whatever type is used to implement the arithmetic, even if storage takes
less...

>>> The moral: You are using a model of integers. This model is inadequate.
>>> There are other competing models, which are inadequate as well. None is
>>> "better" than other, without further considerations. Therefore there is
>>> no
>>> "plain integers". Equivalently, there is no "plain number".
>>
>> they are called integers.
>
> They are not. They are called *machine numbers*.
>

you call them that.

most people call them integers (though usually with a size qualifier...).

it is like how floats are also called floats, but have little real relation
to either water or concrete...

I guess byte/word/dword/qword/... could also be used, but these are usually
considered ASM-specific...


>>>>>>> It is not only wrong to think in that, it is just wrong. Consider
>>>>>>> modular
>>>>>>> integer of the range 1..3. Again, type is a set of values and
>>>>>>> operations
>>>>>>> defined on them, period. Nothing beyond that.
>>>>>>
>>>>>> then, the integer "can" be 8 bits, but more likely is 32...
>>>>>
>>>>> It cannot be, since 2+2 = 1 (mod 4)
>>>>>
>>>>> This illustrates the principle. The type is a set of values and
>>>>> operations.
>>>>> The operations of the ring modulus 4 are sufficiently different from
>>>>> ones of the ring 256 (if "8-bits" means modulus 2**8).
>>>>
>>>> not really, all that has to happen is that the compiler inserts a
>>>> hidden
>>>> modulus operator or mask, and maybe some extra hidden logic.
>>>
>>> That is an irrelevant implementation detail.
>>
>> implementation detail?...
>
> Yes, an irrelevant to the issue what is the type.
>

type is what type is.
if the type is a 32 bit integer, the type is a 32 bit integer.
that is how the type is usually defined...

one types 'int', they usually expect 32 bits.


>> then what the hell then is the rest of this thread?...
>
> It was about types and their equivalence.
>

yes, ok.

none the less, i=i, l=l, x=x, ...

i!=x, because 32bits!=64bits...


>> but, yes, as I see it, the implementation is primarily, and the
>> conceptual model is secondary.
>
> No the model is primary, its implementation is of no interest. In the
> context of type equivalence, the implementation is strictly irrelevant.
> The
> same type can be implemented differently at the *same* machine in the
> *same* program.
>

this makes little sense, things are not done this way...

when we say we have a type (say, a 64-bit double), we expect a 64 bit
double.
to do otherwise is in error...

when we say a struct is layed out a certain way, we can assume that it is
layed out as defined, otherwise, again, this is in error...


>> we will still have the implementation even with no model, but we have no
>> (or
>> at least, no useful) model without the implementation.
>
> Implementation without model is garbage, just because you don't know what
> it does implement.
>

we have ASM without C;
we have the CPU without ASM;
we have physics without the CPU;
...

just because people don't know something does not mean it is not there...
even if everyone and everything were dead (and thus, no ideas or models),
rocks/light/... would remain.


>>> No, I claimed that you cannot have "class = list of slots for method".
>>> That
>>> is inconsistent with multi-methods.
>>>
>>
>> ?...
>>
>> a class is:
>> a list of fields;
>> a list of methods;
>> ...
>>
>> I am not sure what you are claiming here...
>
> Class is neither from above.
>

ok, inheritence then, if you must.
inheritence was implied though...


>>>> since method overloading clearly works with structural
>>>> equivalence (only that, in a hidden way, the signature is regarded as a
>>>> part of the name...).
>>>
>>> That depends on the scope of the name:
>>>
>>> declare
>>> procedure Foo is begin null; end Foo;
>>> begin
>>> declare
>>> procedure Foo is begin null; end Foo;
>>> begin
>>> null;
>>> end;
>>> end;
>>>
>>> But I fail to see how this is relevant to run-time. Run-time just does
>>> not
>>> care about equivalence of the objects Foo and Foo.
>>
>> it depends on how ones' project is implemented.
>
> If you put forward an existing implementation. But the question was how
> RTTI could/should be, rather than how it is.
>

I am not sure I see the distinction.

things are how things are, granted possibilities are possible, one is bound,
inherently, to one of any number of possible implementations.

now, as for how one implements dynamic cast, one can make some
generalizations:

dynamic_cast<classname> classinstance

since we know already that the value is an class-instance, all we have to do
is verify (at runtime) that the instance is a compatible class.

if we are casting up the heirarchy, than in most cases this can be
degenerated to a normal cast.
if the classes are unrelated, then a warning or error may be appropriate
(compiler may then treat it like a downwards cast, or reject the code).


in other cases, rejection or type-checking can be done as appropriate.

in this case, signature checking should not be necessary (and would not
likely produce the correct information anyways).


the IA-64 ABI uses special info embedded in the object files (basically, a
bunch of linked structures standing in for the class heirarchy...). the JVM
uses the class heirarchy directly (internally via use of several special
opcodes).

my approach is to do how I normally do other things in my compiler+object
system:
the compiler produces references to special "magic functions" (basically,
link-time generated code), which would serve to return the class handle in
question, and probably another intrinsic to do the check.

then again, I could merge them, thus giving me something analogous to the
JVM's "instanceof" opcode.

internally, when generating the code (link-time), it requests info (such as
the class handle) from the object system, and may generate a call into the
object system for completing certain tasks (later, I may make the system
tighter-integrated, such that the automatic code generation generates the
code to implement tasks directly, rather than redirecting many of them into
internal API calls...).

for checking inheritence, my system first checks the classes via a special
hash table, and failing this, walks the heirarchy.


the reason I will not, as of yet, directly generate code to perform certain
tasks, is that this will essentially lock down the internals of the object
system, as well as possibly the physical layout of objects. a partial
solution here would be to allow many classes to be declared "immutable"
(basically, telling the object system that no runtime modification of the
class layout will take place), allowing these classes to have more efficient
code be generated (such as accessing a field being a single 'lea'
instruction), but at the cost that altering the layout would break running
code (that or make use of special linker/code generation ninjitsu...).

but, accessing the field with the current approach in within the 2-digit
clock cycles range, so for now it is probably good enough...


note that I use link-time code generation, mostly as otherwise it would
involve adding a lot of ugly complexity to the code generator (and gives a
little more flexibility to the runtime as to how to implement this
stuff...).

I am now thinking though, for my uses it may make sense to sneak in part of
the state of the register allocator as one of the fixed-arguments to some of
these fixed calls, since knowing this could help the link-time codegen
generate more efficient code as it will know which regs it is free to use
(as is, these calls are to preserve any used registers, whereas if told
which regs are free, it could allow using these regs without needing to
preserve/restore them, ...). this would probably be passed in as a bitmask.

...

Pawel Dziepak

unread,
Jun 25, 2009, 2:34:01 PM6/25/09
to
cr88192 wrote:
> then an integer is an integer.
> even overflow does not add a fractional part or require changing the type,
> so all is well...

If there is an overflow it is an undefined behavior (C++03 5/5).

> type is what type is.
> if the type is a 32 bit integer, the type is a 32 bit integer.
> that is how the type is usually defined...
>
> one types 'int', they usually expect 32 bits.

Not really. C/C++ programmer should expect 'int' to "have the natural
size suggested by the architecture of the execution environment", which
not always equals 32 bits.

>> Implementation without model is garbage, just because you don't know what
>> it does implement.
>>
>
> we have ASM without C;
> we have the CPU without ASM;
> we have physics without the CPU;
> ...
> just because people don't know something does not mean it is not there...
> even if everyone and everything were dead (and thus, no ideas or models),
> rocks/light/... would remain.

Even if you know something that does not mean that it is good to depend
on it. Specification tells you what you can do, implementation makes it
possible. If you start to take advantage of implementation details then
your code may become incompatible with the specification.

Pawel Dziepak

cr88192

unread,
Jun 25, 2009, 3:18:37 PM6/25/09
to

"Pawel Dziepak" <pdzi...@quarnos.org> wrote in message
news:h20fur$5pa$1...@news.eternal-september.org...

> cr88192 wrote:
>> then an integer is an integer.
>> even overflow does not add a fractional part or require changing the
>> type,
>> so all is well...
>
> If there is an overflow it is an undefined behavior (C++03 5/5).
>

technically, maybe...
usually this "undefined behavior" is simply to wrap around though...


>> type is what type is.
>> if the type is a 32 bit integer, the type is a 32 bit integer.
>> that is how the type is usually defined...
>>
>> one types 'int', they usually expect 32 bits.
>
> Not really. C/C++ programmer should expect 'int' to "have the natural
> size suggested by the architecture of the execution environment", which
> not always equals 32 bits.
>

in practice, on current computers, it is de-facto defined as 32 bits.


>>> Implementation without model is garbage, just because you don't know
>>> what
>>> it does implement.
>>>
>>
>> we have ASM without C;
>> we have the CPU without ASM;
>> we have physics without the CPU;
>> ...
>> just because people don't know something does not mean it is not there...
>> even if everyone and everything were dead (and thus, no ideas or models),
>> rocks/light/... would remain.
>
> Even if you know something that does not mean that it is good to depend
> on it. Specification tells you what you can do, implementation makes it
> possible. If you start to take advantage of implementation details then
> your code may become incompatible with the specification.
>

typically, the specification and implementation are closely linked.

it makes little sense to consider that the specification is somehow isolated
from the implementation...

it makes about as much sense and then regarding their use by humans as an
"implementation detail"...


anyways, as I see it, a specification is a description of an implementation,
or an abstraction over, an implementation. a specification is in no way
"more fundamental" than the implementation, only that it makes it easier to
gloss over the details, and allow multiple possible implementation
strategies to be in use.


people speaking of "implementation detail" as a prejorative just annoys me
though.

it is the implementation which is the real part, and a specification is
little more than formalisms...

specifying something, similarly, does no good if it is not possible to
implement it effectively.


> Pawel Dziepak


Dmitry A. Kazakov

unread,
Jun 25, 2009, 4:29:52 PM6/25/09
to
On Thu, 25 Jun 2009 10:58:38 -0700, cr88192 wrote:

> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
> news:896rus71oc2k.1nk85oajodq9x$.dlg@40tude.net...
>> On Thu, 25 Jun 2009 08:29:34 -0700, cr88192 wrote:
>>
>> Are you kidding?
>
> if integer means:
> scalar with no fractional part.

No, integer means:

http://en.wikipedia.org/wiki/Integer



> most people call them integers (though usually with a size qualifier...).
>
> it is like how floats are also called floats, but have little real relation
> to either water or concrete...

Float numbers are machine numbers of certain type used to model real
numbers:

http://en.wikipedia.org/wiki/Real_number

> one types 'int', they usually expect 32 bits.

No, one usually expects a reference in the language or compiler manual
which describes the semantics of int. A description there defines the type,
of which name is int.

>>> but, yes, as I see it, the implementation is primarily, and the
>>> conceptual model is secondary.
>>
>> No the model is primary, its implementation is of no interest. In the
>> context of type equivalence, the implementation is strictly irrelevant. The
>> same type can be implemented differently at the *same* machine in the
>> *same* program.
>
> this makes little sense, things are not done this way...

Of course they are. Our software works with many numeric models at once,
because it communicates with sensors and actuators deploying a great
variety of numeric representations.

> when we say we have a type (say, a 64-bit double), we expect a 64 bit
> double.

64-bit double has no meaning. There exist a dozen of floating-point
representations of 64-bit length.

> just because people don't know something does not mean it is not there...

Regarding programs it means exactly this. A program without a specification
is a random collection of bits. The purpose of a program cannot be
determined. This is a hard mathematical fact (see halting problem).

cr88192

unread,
Jun 25, 2009, 7:23:39 PM6/25/09
to

"Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
news:12v66dbfgzpen.1...@40tude.net...

> On Thu, 25 Jun 2009 10:58:38 -0700, cr88192 wrote:
>
>> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
>> news:896rus71oc2k.1nk85oajodq9x$.dlg@40tude.net...
>>> On Thu, 25 Jun 2009 08:29:34 -0700, cr88192 wrote:
>>>
>>> Are you kidding?
>>
>> if integer means:
>> scalar with no fractional part.
>
> No, integer means:
>
> http://en.wikipedia.org/wiki/Integer
>

which property do you feel is "absent"?...


>> most people call them integers (though usually with a size qualifier...).
>>
>> it is like how floats are also called floats, but have little real
>> relation
>> to either water or concrete...
>
> Float numbers are machine numbers of certain type used to model real
> numbers:
>
> http://en.wikipedia.org/wiki/Real_number
>

yes, but "real" is not "float".
"float" represents "real", none the less, the term is different, so in the
same way your objection to "integer" is pointless...

it might have some validity if "float" were somehow realted to water or
concrete, however...


>> one types 'int', they usually expect 32 bits.
>
> No, one usually expects a reference in the language or compiler manual
> which describes the semantics of int. A description there defines the
> type,
> of which name is int.
>

most code for maybe about the past decade has expected 'int' to be 32 bits.
before this, much code was around which expected it to be 16 bits...


>>>> but, yes, as I see it, the implementation is primarily, and the
>>>> conceptual model is secondary.
>>>
>>> No the model is primary, its implementation is of no interest. In the
>>> context of type equivalence, the implementation is strictly irrelevant.
>>> The
>>> same type can be implemented differently at the *same* machine in the
>>> *same* program.
>>
>> this makes little sense, things are not done this way...
>
> Of course they are. Our software works with many numeric models at once,
> because it communicates with sensors and actuators deploying a great
> variety of numeric representations.
>

yet, I don't have to "know" much about any of these, and yet they still keep
working despite this...

FWIW, my "conceptual model" of most of this crap is pointless, it is the
implementation which makes it work...


>> when we say we have a type (say, a 64-bit double), we expect a 64 bit
>> double.
>
> 64-bit double has no meaning. There exist a dozen of floating-point
> representations of 64-bit length.
>

yet, only a few that we can bother to care about, this is besides the
point...


>> just because people don't know something does not mean it is not there...
>
> Regarding programs it means exactly this. A program without a
> specification
> is a random collection of bits. The purpose of a program cannot be
> determined. This is a hard mathematical fact (see halting problem).
>

no, we just will be sitting around stupidly looking at something and not
knowing how it works...


by this reasoning, because for most of human history, no one had any damn
idea about DNA, does this mean that humans could not exist?...

humans were around, and maybe had almost no idea about "how" they exist,
only that they did. similarly, many people had some rather stupid ideas
about biology (such that parents physically contained all of their offspring
and all future generations, at the same time, or for that matter,
spontaneous generation...).


yet, all the same, people continued to exist.

similarly, physics works even though, throughout history, people didn't know
their physics either...
the "implementations" existed, but there was no "conceptual model".
humans figured out all this crap through figuring it out...

thus "existence" / "the implementation" is primary, and "the model" is
secondary.

referencing the halting problem here is, also, pointless...

Dmitry A. Kazakov

unread,
Jun 26, 2009, 4:15:26 AM6/26/09
to
On Thu, 25 Jun 2009 16:23:39 -0700, cr88192 wrote:

> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
> news:12v66dbfgzpen.1...@40tude.net...
>> On Thu, 25 Jun 2009 10:58:38 -0700, cr88192 wrote:
>>
>>> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
>>> news:896rus71oc2k.1nk85oajodq9x$.dlg@40tude.net...
>>>> On Thu, 25 Jun 2009 08:29:34 -0700, cr88192 wrote:
>>>>
>>>> Are you kidding?
>>>
>>> if integer means:
>>> scalar with no fractional part.
>>
>> No, integer means:
>>
>> http://en.wikipedia.org/wiki/Integer
>
> which property do you feel is "absent"?...

forall n in Z exists n + 1 in Z and n + 1 > n

>>> most people call them integers (though usually with a size qualifier...).
>>>
>>> it is like how floats are also called floats, but have little real
>>> relation to either water or concrete...
>>
>> Float numbers are machine numbers of certain type used to model real
>> numbers:
>>
>> http://en.wikipedia.org/wiki/Real_number
>
> yes, but "real" is not "float".
> "float" represents "real", none the less, the term is different, so in the
> same way your objection to "integer" is pointless...

"float" is an abbreviation of "floating-point", which refers to a
representation of real in the form:

Fraction * Radix ** Exponent

where fraction is usually normalized.

>>> one types 'int', they usually expect 32 bits.
>>
>> No, one usually expects a reference in the language or compiler manual
>> which describes the semantics of int. A description there defines the
>> type, of which name is int.
>
> most code for maybe about the past decade has expected 'int' to be 32 bits.
> before this, much code was around which expected it to be 16 bits...

Broken code. Every decent programming guide line on C/C++ requires an
explicit use of int32 (or equivalent) when the program requires the
corresponding range of values.

> FWIW, my "conceptual model" of most of this crap is pointless, it is the
> implementation which makes it work...

How do you know if it does? Heard anything about testing, verification and
validation?

>>> when we say we have a type (say, a 64-bit double), we expect a 64 bit
>>> double.
>>
>> 64-bit double has no meaning. There exist a dozen of floating-point
>> representations of 64-bit length.
>
> yet, only a few that we can bother to care about, this is besides the
> point...

How do you identify these "few"?

> by this reasoning, because for most of human history, no one had any damn
> idea about DNA, does this mean that humans could not exist?...

You are confusing the program with its medium and the execution platform.

> thus "existence" / "the implementation" is primary, and "the model" is
> secondary.

> referencing the halting problem here is, also, pointless...

It is not a reference, it is a proof. You cannot create a program which
could tell the purpose of other programs.

"Existence" without specifying what does exist is either tautology or else
meaningless. So any formal system either postulates existence of certain
entities axiomatically, otherwise it proves existence *after* defining what
is we are looking for. There is no third way.

So either you specify your program and them prove, verify etc, that the
junk of bits is indeed the thing, or else you just present that garbage
as-is giving your word that it is, well, is. (The customer must be happy,
with that.)

Pawel Dziepak

unread,
Jun 26, 2009, 4:55:26 AM6/26/09
to
cr88192 wrote:
> in practice, on current computers, it is de-facto defined as 32 bits.

The fact that in most cases sizeof(int) equals 4 doesn't mean that you
should expect it on all hardware architectures (e.g. gcc makes int 16
bit on AVR). AFAIK char is 9 bits on PDP-10, so even though sizeof(int)
is 4, int is 36 bits. That's not common situation (only if you
concentrate on PCs), but if one doesn't make such assumptions it will be
possible to easily port programs to other architectures.

> anyways, as I see it, a specification is a description of an implementation,
> or an abstraction over, an implementation. a specification is in no way
> "more fundamental" than the implementation, only that it makes it easier to
> gloss over the details, and allow multiple possible implementation
> strategies to be in use.
>
>
> people speaking of "implementation detail" as a prejorative just annoys me
> though.
>
> it is the implementation which is the real part, and a specification is
> little more than formalisms...

Specification is an abstract. I think it can be compared to interfaces
in OOP. In general it is always better to depend only on interface (or
specification) instead of implementation. Even if the latter changes,
code may not require any corrections.
The other reason why it is vital to base on language specification is
that, for example C++, still evolves. Even though everybody is trying to
keep backward compatibility, they rather don't bother thinking about
UDs. In other words, new version of the standard might force
implementation to change behavior that is not covered by the specification.

Pawel Dziepak

Rod Pemberton

unread,
Jun 26, 2009, 9:34:25 AM6/26/09
to
"Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
news:12v66dbfgzpen.1...@40tude.net...

> On Thu, 25 Jun 2009 10:58:38 -0700, cr88192 wrote:
> > "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
> > news:896rus71oc2k.1nk85oajodq9x$.dlg@40tude.net...
> >> On Thu, 25 Jun 2009 08:29:34 -0700, cr88192 wrote:
> >>
> > if integer means:
> > scalar with no fractional part.
>
> No, integer means:
>
> http://en.wikipedia.org/wiki/Integer
>

OK. You two have been going back and forth on this long enough. That link
is "integer" as defined by mathematics. Right at the top it directs you to
another definition of "integer" as defined by computer science...:
http://en.wikipedia.org/wiki/Integer_(computer_science)

The C and C++ standards also _imply_ the definition of integers also. They
can't explicitly define what they are, although the language appears to do
so - it actually doesn't, since the standards are abstracted from real
systems. That is they can't require int's or integers to be signed or
unsigned, or a certain size, or one's or two's complement. Some cpu
architecture might not have signed integers, even though an "int" in C has
historically been a signed value and most cpu architectures since the
mid-70's do have signed integers. So, the language strongly implies an int
is signed, but it can't require it. That's the problem with abstracting a
model from implementation: nothing can be required.

A byte has been 8-bits since Bob "The Father of ASCII" Bemer standardized
8-bit groupings, and Dr. Werner Buchholz coined the term "byte" in 1960.
However, the C specification in 1989 redefined a byte to be something else
(for portability). So, there is a whole group of people who don't
understand that a byte is 8-bits because their reference definition is
something else.


Rod Pemberton


cr88192

unread,
Jun 27, 2009, 5:29:33 PM6/27/09
to

"Pawel Dziepak" <pdzi...@quarnos.org> wrote in message
news:h222e1$jb4$1...@news.eternal-september.org...

> cr88192 wrote:
>> in practice, on current computers, it is de-facto defined as 32 bits.
>
> The fact that in most cases sizeof(int) equals 4 doesn't mean that you
> should expect it on all hardware architectures (e.g. gcc makes int 16
> bit on AVR). AFAIK char is 9 bits on PDP-10, so even though sizeof(int)
> is 4, int is 36 bits. That's not common situation (only if you
> concentrate on PCs), but if one doesn't make such assumptions it will be
> possible to easily port programs to other architectures.
>

whoever brought up universal portability?...

the usual idea is this:
a person as 1..N targets they actually care about (usually in decreasing
order), past this point, they could care less...

as such, although "some" architectures may have a quirk/issue/... if it is
not in the N they care about, they are not likely to care.


"well, on the z80 and m68k there is 'this' particular issue?..."
"who cares?... the targets are x86, x86-64, and PPC...".


>> anyways, as I see it, a specification is a description of an
>> implementation,
>> or an abstraction over, an implementation. a specification is in no way
>> "more fundamental" than the implementation, only that it makes it easier
>> to
>> gloss over the details, and allow multiple possible implementation
>> strategies to be in use.
>>
>>
>> people speaking of "implementation detail" as a prejorative just annoys
>> me
>> though.
>>
>> it is the implementation which is the real part, and a specification is
>> little more than formalisms...
>
> Specification is an abstract. I think it can be compared to interfaces
> in OOP. In general it is always better to depend only on interface (or
> specification) instead of implementation. Even if the latter changes,
> code may not require any corrections.
> The other reason why it is vital to base on language specification is
> that, for example C++, still evolves. Even though everybody is trying to
> keep backward compatibility, they rather don't bother thinking about
> UDs. In other words, new version of the standard might force
> implementation to change behavior that is not covered by the
> specification.
>

I was not saying that people should ignore the generalities and only target
specific implementations, rather, that what IS the underlying implementation
should not be ignored.


it would be like if an engineer designed structures, only to have them
collapse later, since they regard it as a mere "implementation detail" that
a steel cable does not have a tensile strength of 1000000 lbs or a melting
poing greater than 2000F...

then they refuse to acknowledge the complaints of the contractors that the
design is absurd and that "all these trivial details" will prevent it from
working as desired...


> Pawel Dziepak


cr88192

unread,
Jun 27, 2009, 5:49:57 PM6/27/09
to

"Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
news:rsxktggkb2wi.1c...@40tude.net...

> On Thu, 25 Jun 2009 16:23:39 -0700, cr88192 wrote:
>
>> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
>> news:12v66dbfgzpen.1...@40tude.net...
>>> On Thu, 25 Jun 2009 10:58:38 -0700, cr88192 wrote:
>>>
>>>> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
>>>> news:896rus71oc2k.1nk85oajodq9x$.dlg@40tude.net...
>>>>> On Thu, 25 Jun 2009 08:29:34 -0700, cr88192 wrote:
>>>>>
>>>>> Are you kidding?
>>>>
>>>> if integer means:
>>>> scalar with no fractional part.
>>>
>>> No, integer means:
>>>
>>> http://en.wikipedia.org/wiki/Integer
>>
>> which property do you feel is "absent"?...
>
> forall n in Z exists n + 1 in Z and n + 1 > n
>

at this point, one can simply say that intergers (in computers) are a finite
bounded subset of the integer space...

>>>> most people call them integers (though usually with a size
>>>> qualifier...).
>>>>
>>>> it is like how floats are also called floats, but have little real
>>>> relation to either water or concrete...
>>>
>>> Float numbers are machine numbers of certain type used to model real
>>> numbers:
>>>
>>> http://en.wikipedia.org/wiki/Real_number
>>
>> yes, but "real" is not "float".
>> "float" represents "real", none the less, the term is different, so in
>> the
>> same way your objection to "integer" is pointless...
>
> "float" is an abbreviation of "floating-point", which refers to a
> representation of real in the form:
>
> Fraction * Radix ** Exponent
>
> where fraction is usually normalized.
>

yes, but alas, none of this is water or concrete, for which the word "float"
implies (you can "float" a boat, or use the "float" to level the concrete
slab).

otherwise, one need not bother arguing about the difference between
'integer' in computers, and 'integer' in mathematics, and simply note that,
in the case of computers, integers do not hold the complete set of integers.


>>>> one types 'int', they usually expect 32 bits.
>>>
>>> No, one usually expects a reference in the language or compiler manual
>>> which describes the semantics of int. A description there defines the
>>> type, of which name is int.
>>
>> most code for maybe about the past decade has expected 'int' to be 32
>> bits.
>> before this, much code was around which expected it to be 16 bits...
>
> Broken code. Every decent programming guide line on C/C++ requires an
> explicit use of int32 (or equivalent) when the program requires the
> corresponding range of values.
>

but, really, why does this matter?...
after all, code can be edited easily enough...


>> FWIW, my "conceptual model" of most of this crap is pointless, it is the
>> implementation which makes it work...
>
> How do you know if it does? Heard anything about testing, verification and
> validation?
>

one can test, but this tests that the code is not broke, and says nothing
about the universal ontology...


>>>> when we say we have a type (say, a 64-bit double), we expect a 64 bit
>>>> double.
>>>
>>> 64-bit double has no meaning. There exist a dozen of floating-point
>>> representations of 64-bit length.
>>
>> yet, only a few that we can bother to care about, this is besides the
>> point...
>
> How do you identify these "few"?
>

we know them because we know them, it does not exactly require a lot of
thinking to figure these things out...


>> by this reasoning, because for most of human history, no one had any damn
>> idea about DNA, does this mean that humans could not exist?...
>
> You are confusing the program with its medium and the execution platform.
>

the "program" IS the "medium and execution platform"...

it just happens to be the case that source code happens to be easy to
transport and mechanically translate (AKA: compile...).


>> thus "existence" / "the implementation" is primary, and "the model" is
>> secondary.
>
>> referencing the halting problem here is, also, pointless...
>
> It is not a reference, it is a proof. You cannot create a program which
> could tell the purpose of other programs.
>

it is not necessary to do so in the first place.


> "Existence" without specifying what does exist is either tautology or else
> meaningless. So any formal system either postulates existence of certain
> entities axiomatically, otherwise it proves existence *after* defining
> what
> is we are looking for. There is no third way.
>

we know what exists, as it is already there, and simply requires looking at
it enough to know how to best describe what is already there...

people study what exists, and people make new things, which then proceed to
exist as well...
(now, whether anything new is "created" is itself debatable, as it would all
be said to just be an elaborate combination of patterns and internal
reorganizations).


> So either you specify your program and them prove, verify etc, that the
> junk of bits is indeed the thing, or else you just present that garbage
> as-is giving your word that it is, well, is. (The customer must be happy,
> with that.)
>

as I see it, your view of things is backwards...

Dmitry A. Kazakov

unread,
Jun 28, 2009, 4:36:37 AM6/28/09
to
On Sat, 27 Jun 2009 14:49:57 -0700, cr88192 wrote:

> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
> news:rsxktggkb2wi.1c...@40tude.net...
>> On Thu, 25 Jun 2009 16:23:39 -0700, cr88192 wrote:
>>
>>> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
>>> news:12v66dbfgzpen.1...@40tude.net...
>>>> On Thu, 25 Jun 2009 10:58:38 -0700, cr88192 wrote:
>>>>
>>>>> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
>>>>> news:896rus71oc2k.1nk85oajodq9x$.dlg@40tude.net...
>>>>>> On Thu, 25 Jun 2009 08:29:34 -0700, cr88192 wrote:
>>>>>>
>>>>>> Are you kidding?
>>>>>
>>>>> if integer means:
>>>>> scalar with no fractional part.
>>>>
>>>> No, integer means:
>>>>
>>>> http://en.wikipedia.org/wiki/Integer
>>>
>>> which property do you feel is "absent"?...
>>
>> forall n in Z exists n + 1 in Z and n + 1 > n
>
> at this point, one can simply say that intergers (in computers) are a finite
> bounded subset of the integer space...

Wrong. A bounded subset of Z is not Z. That is the whole point.

When we talk about the integer type, we mean Z and operations defined on
it. Take anything out, and the result will be another type, not integer.
You can call in "int", bounded int, whatever, but that is not integer, it
has properties different from Z, like that (2**32 - 1) + 1 /= 2**32.

> otherwise, one need not bother arguing about the difference between
> 'integer' in computers, and 'integer' in mathematics, and simply note that,
> in the case of computers, integers do not hold the complete set of integers.

Wrong. Computer integers are exactly same as in mathematics. Computer "int"
as a type can be (and is) perfectly described mathematically. The resulting
mathematical structure is not of Z, yet it is a perfectly mathematical
structure.



>>>>> one types 'int', they usually expect 32 bits.
>>>>
>>>> No, one usually expects a reference in the language or compiler manual
>>>> which describes the semantics of int. A description there defines the
>>>> type, of which name is int.
>>>
>>> most code for maybe about the past decade has expected 'int' to be 32 bits.
>>> before this, much code was around which expected it to be 16 bits...
>>
>> Broken code. Every decent programming guide line on C/C++ requires an
>> explicit use of int32 (or equivalent) when the program requires the
>> corresponding range of values.
>
> but, really, why does this matter?...
> after all, code can be edited easily enough...

I don't understand this remark.

>>> FWIW, my "conceptual model" of most of this crap is pointless, it is the
>>> implementation which makes it work...
>>
>> How do you know if it does? Heard anything about testing, verification and
>> validation?
>
> one can test, but this tests that the code is not broke, and says nothing
> about the universal ontology...

How do you test without specifications? What do you test without specifying
it?

>>>>> when we say we have a type (say, a 64-bit double), we expect a 64 bit
>>>>> double.
>>>>
>>>> 64-bit double has no meaning. There exist a dozen of floating-point
>>>> representations of 64-bit length.
>>>
>>> yet, only a few that we can bother to care about, this is besides the
>>> point...
>>
>> How do you identify these "few"?
>
> we know them because we know them, it does not exactly require a lot of
> thinking to figure these things out...

So what does "double" mean without "lot of thinking"?

>>> by this reasoning, because for most of human history, no one had any damn
>>> idea about DNA, does this mean that humans could not exist?...
>>
>> You are confusing the program with its medium and the execution platform.
>
> the "program" IS the "medium and execution platform"...

Nonsense. A program can be printed on paper, painted on a T-shirt,
represented by a composition of used beer cans...

>> "Existence" without specifying what does exist is either tautology or else
>> meaningless. So any formal system either postulates existence of certain
>> entities axiomatically, otherwise it proves existence *after* defining
>> what is we are looking for. There is no third way.
>
> we know what exists, as it is already there, and simply requires looking at
> it enough to know how to best describe what is already there...

What an impressing argument... (:-))

cr88192

unread,
Jun 28, 2009, 9:52:58 AM6/28/09
to

"Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
news:1cwr1of458xeo.1...@40tude.net...

close enough, and it is traditionally called 'int' or 'integer' either
way...

to do anything differently is a violation of long-held and authoritative
programmer tradition.


>>>>>> one types 'int', they usually expect 32 bits.
>>>>>
>>>>> No, one usually expects a reference in the language or compiler manual
>>>>> which describes the semantics of int. A description there defines the
>>>>> type, of which name is int.
>>>>
>>>> most code for maybe about the past decade has expected 'int' to be 32
>>>> bits.
>>>> before this, much code was around which expected it to be 16 bits...
>>>
>>> Broken code. Every decent programming guide line on C/C++ requires an
>>> explicit use of int32 (or equivalent) when the program requires the
>>> corresponding range of values.
>>
>> but, really, why does this matter?...
>> after all, code can be edited easily enough...
>
> I don't understand this remark.
>

if code only works in one case, and can be edited to work in another, it is
thus not "broken"...
one can argue, why does it matter, if one can just edit it and make it work
as-needed?...


>>>> FWIW, my "conceptual model" of most of this crap is pointless, it is
>>>> the
>>>> implementation which makes it work...
>>>
>>> How do you know if it does? Heard anything about testing, verification
>>> and
>>> validation?
>>
>> one can test, but this tests that the code is not broke, and says nothing
>> about the universal ontology...
>
> How do you test without specifications? What do you test without
> specifying
> it?
>

one tests for behaviors, and that nothing "out of place" happens.

none the less, it is the combination of the code and implementation that
defines the "reality" of the matter (it runs, and we see the results, ...).

editing and testing thus forges it more into what we want it to be and do,
and as it so happens, code is not too difficult to edit.


>>>>>> when we say we have a type (say, a 64-bit double), we expect a 64 bit
>>>>>> double.
>>>>>
>>>>> 64-bit double has no meaning. There exist a dozen of floating-point
>>>>> representations of 64-bit length.
>>>>
>>>> yet, only a few that we can bother to care about, this is besides the
>>>> point...
>>>
>>> How do you identify these "few"?
>>
>> we know them because we know them, it does not exactly require a lot of
>> thinking to figure these things out...
>
> So what does "double" mean without "lot of thinking"?
>

IEEE 754 / IEEE 754r

it also means that the CPU architecture is x86, x86-64, or (maybe) PPC...

since the CPU does something a certain way, the CPU is the authority for
what is double.
since given CPU types are in common use, they are the authority.
given the dependence of the existing software base on particular CPUs, the
software base gives the CPU authority.
the mass of users/developers/... give the software authority.
...

it is much like how a governments/... are composed of large numbers of
individuals, and thus become the rightful authority over the lives and
behaviors of said individuals...


>>>> by this reasoning, because for most of human history, no one had any
>>>> damn
>>>> idea about DNA, does this mean that humans could not exist?...
>>>
>>> You are confusing the program with its medium and the execution
>>> platform.
>>
>> the "program" IS the "medium and execution platform"...
>
> Nonsense. A program can be printed on paper, painted on a T-shirt,
> represented by a composition of used beer cans...
>

these are different representations, and due to their difference in
representation, are essentially different entities.

only that it is possible to convert from one representation to another.


for example, source code and machine code are different simply by a matter
of translation, yet the compiled program is a different entity from the
source representation of the program.

the machine code version is more solidly connected to existence than is the
source-code, but the source code is anchored into existence in large part
because of the existence of compilers.

both exist in large part due to the existence of CPU's and hard drives, ...


>>> "Existence" without specifying what does exist is either tautology or
>>> else
>>> meaningless. So any formal system either postulates existence of certain
>>> entities axiomatically, otherwise it proves existence *after* defining
>>> what is we are looking for. There is no third way.
>>
>> we know what exists, as it is already there, and simply requires looking
>> at
>> it enough to know how to best describe what is already there...
>
> What an impressing argument... (:-))
>

yep.

Dmitry A. Kazakov

unread,
Jun 28, 2009, 11:47:15 AM6/28/09
to
On Sun, 28 Jun 2009 06:52:58 -0700, cr88192 wrote:

> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message

> news:1cwr1of458xeo.1...@40tude.net...

>> Wrong. Computer integers are exactly same as in mathematics. Computer "int"
>> as a type can be (and is) perfectly described mathematically. The resulting
>> mathematical structure is not of Z, yet it is a perfectly mathematical
>> structure.
>
> close enough, and it is traditionally called 'int' or 'integer' either way...
>
> to do anything differently is a violation of long-held and authoritative
> programmer tradition.

I never heard about such. Maybe you meant a tradition of buffer overruns in
C programs? Anyway, no tradition can serve as an excuse for ignorance.

>>>>>>> one types 'int', they usually expect 32 bits.
>>>>>>
>>>>>> No, one usually expects a reference in the language or compiler manual
>>>>>> which describes the semantics of int. A description there defines the
>>>>>> type, of which name is int.
>>>>>
>>>>> most code for maybe about the past decade has expected 'int' to be 32 bits.
>>>>> before this, much code was around which expected it to be 16 bits...
>>>>
>>>> Broken code. Every decent programming guide line on C/C++ requires an
>>>> explicit use of int32 (or equivalent) when the program requires the
>>>> corresponding range of values.
>>>
>>> but, really, why does this matter?...
>>> after all, code can be edited easily enough...
>>
>> I don't understand this remark.
>
> if code only works in one case, and can be edited to work in another, it is
> thus not "broken"...

This is called "fixing broken code".

If the code weren't broken there would be no need to fix it. If it were
designed properly in first place, e.g. following C/C++ guide lines and
elementary common sense, you would not need to fix this kind of errors.

>>>>> FWIW, my "conceptual model" of most of this crap is pointless, it is the
>>>>> implementation which makes it work...
>>>>
>>>> How do you know if it does? Heard anything about testing, verification
>>>> and validation?
>>>
>>> one can test, but this tests that the code is not broke, and says nothing
>>> about the universal ontology...
>>
>> How do you test without specifications? What do you test without
>> specifying it?
>
> one tests for behaviors, and that nothing "out of place" happens.

How "behaviours" and "out of place" are specified? Note the word
"specified".

> none the less, it is the combination of the code and implementation that
> defines the "reality" of the matter (it runs, and we see the results, ...).

Buggy code defines bugs. That's is only reality.

>>>>>>> when we say we have a type (say, a 64-bit double), we expect a 64 bit
>>>>>>> double.
>>>>>>
>>>>>> 64-bit double has no meaning. There exist a dozen of floating-point
>>>>>> representations of 64-bit length.
>>>>>
>>>>> yet, only a few that we can bother to care about, this is besides the
>>>>> point...
>>>>
>>>> How do you identify these "few"?
>>>
>>> we know them because we know them, it does not exactly require a lot of
>>> thinking to figure these things out...
>>
>> So what does "double" mean without "lot of thinking"?
>
> IEEE 754 / IEEE 754r
>
> it also means that the CPU architecture is x86, x86-64, or (maybe) PPC...

> since the CPU does something a certain way, the CPU is the authority for
> what is double.

So, how are you talking about "double" without mentioning the CPU? What are
you talking about? About CPU? Mass culture? Program? In the program, you
have:

#define double char*

what is double?

>>>>> by this reasoning, because for most of human history, no one had any
>>>>> damn
>>>>> idea about DNA, does this mean that humans could not exist?...
>>>>
>>>> You are confusing the program with its medium and the execution
>>>> platform.
>>>
>>> the "program" IS the "medium and execution platform"...
>>
>> Nonsense. A program can be printed on paper, painted on a T-shirt,
>> represented by a composition of used beer cans...
>
> these are different representations, and due to their difference in
> representation, are essentially different entities.

One nonsensical statement derived from another. A program remains the same
entity independently on whether it is stored at the hard drive or memory
steak.

That is one of the key ideas of programming to separate means of computing
from algorithms.

Instructions how to walk to the next bus station can be followed by anybody
who has a head and feet.

cr88192

unread,
Jun 28, 2009, 1:16:12 PM6/28/09
to

"Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
news:5ejfdcqynoc9.1d...@40tude.net...

> On Sun, 28 Jun 2009 06:52:58 -0700, cr88192 wrote:
>
>> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
>> news:1cwr1of458xeo.1...@40tude.net...
>
>>> Wrong. Computer integers are exactly same as in mathematics. Computer
>>> "int"
>>> as a type can be (and is) perfectly described mathematically. The
>>> resulting
>>> mathematical structure is not of Z, yet it is a perfectly mathematical
>>> structure.
>>
>> close enough, and it is traditionally called 'int' or 'integer' either
>> way...
>>
>> to do anything differently is a violation of long-held and authoritative
>> programmer tradition.
>
> I never heard about such. Maybe you meant a tradition of buffer overruns
> in
> C programs? Anyway, no tradition can serve as an excuse for ignorance.
>

C, C++, Java, C#, ... all generally use the terms 'int' and 'integer' to
refer to fixed-size subsets of the integer space...

since these are some of the main programming languages, their use of terms
is authoritative.


>>>>>>>> one types 'int', they usually expect 32 bits.
>>>>>>>
>>>>>>> No, one usually expects a reference in the language or compiler
>>>>>>> manual
>>>>>>> which describes the semantics of int. A description there defines
>>>>>>> the
>>>>>>> type, of which name is int.
>>>>>>
>>>>>> most code for maybe about the past decade has expected 'int' to be 32
>>>>>> bits.
>>>>>> before this, much code was around which expected it to be 16 bits...
>>>>>
>>>>> Broken code. Every decent programming guide line on C/C++ requires an
>>>>> explicit use of int32 (or equivalent) when the program requires the
>>>>> corresponding range of values.
>>>>
>>>> but, really, why does this matter?...
>>>> after all, code can be edited easily enough...
>>>
>>> I don't understand this remark.
>>
>> if code only works in one case, and can be edited to work in another, it
>> is
>> thus not "broken"...
>
> This is called "fixing broken code".
>
> If the code weren't broken there would be no need to fix it. If it were
> designed properly in first place, e.g. following C/C++ guide lines and
> elementary common sense, you would not need to fix this kind of errors.
>

it is only needed to change it:
firstly, if the behavior is depended on;
in porting to an arch where this is not upheld...


'int32_t' and friends were not added until C99, and so it is an ongoing
tradition (from pre-C99) to use base types and assume that they are the
intended sizes...

languages such as Java and C# also standardize the size of these various
base types (for example, standardizing on int being 32 bits and long being
64).


>>>>>> FWIW, my "conceptual model" of most of this crap is pointless, it is
>>>>>> the
>>>>>> implementation which makes it work...
>>>>>
>>>>> How do you know if it does? Heard anything about testing, verification
>>>>> and validation?
>>>>
>>>> one can test, but this tests that the code is not broke, and says
>>>> nothing
>>>> about the universal ontology...
>>>
>>> How do you test without specifications? What do you test without
>>> specifying it?
>>
>> one tests for behaviors, and that nothing "out of place" happens.
>
> How "behaviours" and "out of place" are specified? Note the word
> "specified".
>

we generally "know" when a behavior is out of place, and so usually it is
not necessary to say what is not the expected behavior, only what is the
expected behavior.

now, that one can say what they want code to do, however, says little about
which aspect is more fundamental.


>> none the less, it is the combination of the code and implementation that
>> defines the "reality" of the matter (it runs, and we see the results,
>> ...).
>
> Buggy code defines bugs. That's is only reality.
>

yes, and these bugs are a part of the reality of the compiled program...
so, what is the issue?...

bugs are just as much apart of reality as non-bugs.
things like sickness and death also just happen to exist.

we don't write somewhere that someone will get sick, and then they do.
rather, they get sick, and we note that they have gotten sick.


>>>>>>>> when we say we have a type (say, a 64-bit double), we expect a 64
>>>>>>>> bit
>>>>>>>> double.
>>>>>>>
>>>>>>> 64-bit double has no meaning. There exist a dozen of floating-point
>>>>>>> representations of 64-bit length.
>>>>>>
>>>>>> yet, only a few that we can bother to care about, this is besides the
>>>>>> point...
>>>>>
>>>>> How do you identify these "few"?
>>>>
>>>> we know them because we know them, it does not exactly require a lot of
>>>> thinking to figure these things out...
>>>
>>> So what does "double" mean without "lot of thinking"?
>>
>> IEEE 754 / IEEE 754r
>>
>> it also means that the CPU architecture is x86, x86-64, or (maybe) PPC...
>
>> since the CPU does something a certain way, the CPU is the authority for
>> what is double.
>
> So, how are you talking about "double" without mentioning the CPU? What
> are
> you talking about? About CPU? Mass culture? Program? In the program, you
> have:
>
> #define double char*
>
> what is double?
>

this is a preprocessor issue, and so in the code "double" no longer refers
to double...


>>>>>> by this reasoning, because for most of human history, no one had any
>>>>>> damn
>>>>>> idea about DNA, does this mean that humans could not exist?...
>>>>>
>>>>> You are confusing the program with its medium and the execution
>>>>> platform.
>>>>
>>>> the "program" IS the "medium and execution platform"...
>>>
>>> Nonsense. A program can be printed on paper, painted on a T-shirt,
>>> represented by a composition of used beer cans...
>>
>> these are different representations, and due to their difference in
>> representation, are essentially different entities.
>
> One nonsensical statement derived from another. A program remains the same
> entity independently on whether it is stored at the hard drive or memory
> steak.
>

I disagree...

a copy of a program on a hard-drive and on a memory stick are different
entities (one is located in the hard drive, and the other may go in ones'
pocket...).

they, however, may remain as a single conceptual entity due to all of the
machinery existing to translate from one form to another.


> That is one of the key ideas of programming to separate means of computing
> from algorithms.
>

even as such, this is itself an abstraction over the implementation, and
does not change which is more fundamental.


> Instructions how to walk to the next bus station can be followed by
> anybody
> who has a head and feet.
>

yes, however, one person is not another, and said "instructions" do not
exist apart from either the individuals or the mediums and forms by which
they are transferred...

Rod Pemberton

unread,
Jun 28, 2009, 4:35:41 PM6/28/09
to
"Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
news:5ejfdcqynoc9.1d...@40tude.net...

>
> If the code weren't broken there would be no need to fix it. If it were
> designed properly in first place, e.g. following C/C++ guide lines and
> elementary common sense, you would not need to fix this kind of errors.
>

Oh, well, I just see that claim as untrue... C has had quite a few changes
to the way the language works. Implicit int's which are allowed in K&R C
create parsing conflicts with typedef's in ANSI C. Use of ANSI C's void and
void *'s catch a few common programming errors, but are unavailable in K&R
C. The defintion of NULL changed. pre- and post-decrement operators no
longer work properly due to ANSI C's insertion of "sequence points". The
language has been continually reworked to support larger integer types or
generic parameters behind the scenes. etc. etc.

> Buggy code defines bugs. That's is only reality.

Um... humorously:

So, buggy code can't define non-bugs? ;-)

So, non-buggy code can't define non-bugs, since that would imply an
alternate and additional reality...? :)


Seriously, having done my fair share of professional and non-professional
software development, I can say that the bugs that are found relatively soon
are those that are either expected and tested for, or those that expose
themselves through basic usage of the program. Unfortunately, you only test
for problems where you expect problems are likely to occur, and you can test
a program only so thoroughly in a development environment. Your production
environment will reveal bugs that were missed in the most painful way. Bugs
that are in low use code, are trivially incorrect, are due to unforeseen
logic complexities or unexpected interactions of code modifications, or
require review by other "experts" such as accountants and lawyers, are
likely to not be found for a long time.


Rod Pemberton

Dmitry A. Kazakov

unread,
Jun 29, 2009, 3:41:48 AM6/29/09
to
On Sun, 28 Jun 2009 10:16:12 -0700, cr88192 wrote:

> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
> news:5ejfdcqynoc9.1d...@40tude.net...
>> On Sun, 28 Jun 2009 06:52:58 -0700, cr88192 wrote:
>>
>> I never heard about such. Maybe you meant a tradition of buffer overruns
>> in C programs? Anyway, no tradition can serve as an excuse for ignorance.
>
> C, C++, Java, C#, ... all generally use the terms 'int' and 'integer' to
> refer to fixed-size subsets of the integer space...
>
> since these are some of the main programming languages, their use of terms
> is authoritative.

And the Moon is made of cheese...

>>>> How do you test without specifications? What do you test without
>>>> specifying it?
>>>
>>> one tests for behaviors, and that nothing "out of place" happens.
>>
>> How "behaviours" and "out of place" are specified? Note the word
>> "specified".
>
> we generally "know" when a behavior is out of place, and so usually it is
> not necessary to say what is not the expected behavior, only what is the
> expected behavior.

How this "knowledge" is communicated? Or is it so that all humankind is
born with this knowledge?



>>> none the less, it is the combination of the code and implementation that
>>> defines the "reality" of the matter (it runs, and we see the results,
>>> ...).
>>
>> Buggy code defines bugs. That's is only reality.
>
> yes, and these bugs are a part of the reality of the compiled program...
> so, what is the issue?...

The issue is that bug is defined as a behavior different from specified.
There is neither bug, nor behavior, nor even program without a
specification. It is the specification of sine, which makes a program
computing sine what it is. With the specification of text processor this
program is garbage.

> bugs are just as much apart of reality as non-bugs.
> things like sickness and death also just happen to exist.
>
> we don't write somewhere that someone will get sick, and then they do.
> rather, they get sick, and we note that they have gotten sick.

Ask physicians for a definition of sickness.

>>>>>>>>> when we say we have a type (say, a 64-bit double), we expect a 64
>>>>>>>>> bit
>>>>>>>>> double.
>>>>>>>>
>>>>>>>> 64-bit double has no meaning. There exist a dozen of floating-point
>>>>>>>> representations of 64-bit length.
>>>>>>>
>>>>>>> yet, only a few that we can bother to care about, this is besides the
>>>>>>> point...
>>>>>>
>>>>>> How do you identify these "few"?
>>>>>
>>>>> we know them because we know them, it does not exactly require a lot of
>>>>> thinking to figure these things out...
>>>>
>>>> So what does "double" mean without "lot of thinking"?
>>>
>>> IEEE 754 / IEEE 754r
>>>
>>> it also means that the CPU architecture is x86, x86-64, or (maybe) PPC...
>>
>>> since the CPU does something a certain way, the CPU is the authority for
>>> what is double.
>>
>> So, how are you talking about "double" without mentioning the CPU? What are
>> you talking about? About CPU? Mass culture? Program? In the program, you have:
>>
>> #define double char*
>>
>> what is double?
>
> this is a preprocessor issue, and so in the code "double" no longer refers
> to double...

Which code you are talking about? The source code is the content of the
text files with the extensions .h and .c.

>>>>>>> by this reasoning, because for most of human history, no one had any damn
>>>>>>> idea about DNA, does this mean that humans could not exist?...
>>>>>>
>>>>>> You are confusing the program with its medium and the execution
>>>>>> platform.
>>>>>
>>>>> the "program" IS the "medium and execution platform"...
>>>>
>>>> Nonsense. A program can be printed on paper, painted on a T-shirt,
>>>> represented by a composition of used beer cans...
>>>
>>> these are different representations, and due to their difference in
>>> representation, are essentially different entities.
>>
>> One nonsensical statement derived from another. A program remains the same
>> entity independently on whether it is stored at the hard drive or memory
>> steak.
>
> I disagree...
>
> a copy of a program on a hard-drive and on a memory stick are different
> entities (one is located in the hard drive, and the other may go in ones'
> pocket...).

I have nothing to add to this. It speaks for itself.

Dmitry A. Kazakov

unread,
Jun 29, 2009, 4:00:50 AM6/29/09
to
On Sun, 28 Jun 2009 16:35:41 -0400, Rod Pemberton wrote:

> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
> news:5ejfdcqynoc9.1d...@40tude.net...
>>
>> If the code weren't broken there would be no need to fix it. If it were
>> designed properly in first place, e.g. following C/C++ guide lines and
>> elementary common sense, you would not need to fix this kind of errors.
>
> Oh, well, I just see that claim as untrue... C has had quite a few changes
> to the way the language works. Implicit int's which are allowed in K&R C
> create parsing conflicts with typedef's in ANSI C. Use of ANSI C's void and
> void *'s catch a few common programming errors, but are unavailable in K&R
> C. The defintion of NULL changed. pre- and post-decrement operators no
> longer work properly due to ANSI C's insertion of "sequence points". The
> language has been continually reworked to support larger integer types or
> generic parameters behind the scenes. etc. etc.

So? The point is, when the range of an integer number is known, the program
shall use an integer type which range is 1) defined and 2) corresponds to
the required range. What is here to disagree with?

My opponent claimed, something like that a long standing tradition of
indiscriminate use of int everywhere, should justify its use to represent
readings of a sensor with the range of 0..2**93-1 even if the machine
number corresponding to int is -2**15-1..2**15. That is an obvious rubbish.

>> Buggy code defines bugs. That's is only reality.
>
> Um... humorously:
>
> So, buggy code can't define non-bugs? ;-)

False |= True (:-))

> So, non-buggy code can't define non-bugs, since that would imply an
> alternate and additional reality...? :)

The inverse wrong. (:-))

> Seriously, having done my fair share of professional and non-professional
> software development, I can say that the bugs that are found relatively soon
> are those that are either expected and tested for, or those that expose
> themselves through basic usage of the program. Unfortunately, you only test
> for problems where you expect problems are likely to occur, and you can test
> a program only so thoroughly in a development environment. Your production
> environment will reveal bugs that were missed in the most painful way. Bugs
> that are in low use code, are trivially incorrect, are due to unforeseen
> logic complexities or unexpected interactions of code modifications, or
> require review by other "experts" such as accountants and lawyers, are
> likely to not be found for a long time.

Yes. This can be stated in a simpler way: what happens with the bugs in the
specifications? They are worse, much worse than "first-order" bugs.

Rod Pemberton

unread,
Jun 29, 2009, 4:35:40 AM6/29/09
to
"Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
news:w9tbp6x90lp7.1v...@40tude.net...

> On Sun, 28 Jun 2009 16:35:41 -0400, Rod Pemberton wrote:
>
> > "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
> > news:5ejfdcqynoc9.1d...@40tude.net...
> >>
> >> If the code weren't broken there would be no need to fix it. If it were
> >> designed properly in first place, e.g. following C/C++ guide lines and
> >> elementary common sense, you would not need to fix this kind of errors.
> >
> > Oh, well, I just see that claim as untrue... C has had quite a few
changes
> > to the way the language works. Implicit int's which are allowed in K&R
C
> > create parsing conflicts with typedef's in ANSI C. Use of ANSI C's void
and
> > void *'s catch a few common programming errors, but are unavailable in
K&R
> > C. The defintion of NULL changed. pre- and post-decrement operators no
> > longer work properly due to ANSI C's insertion of "sequence points".
The
> > language has been continually reworked to support larger integer types
or
> > generic parameters behind the scenes. etc. etc.
>
> So?

All examples of situations where the C code isn't broken, but must be fixed.

> My opponent claimed, something like that a long standing tradition of
> indiscriminate use of int everywhere, should justify its use to represent
> readings of a sensor with the range of 0..2**93-1 even if the machine
> number corresponding to int is -2**15-1..2**15. That is an obvious
rubbish.

With what do you intend to read in a value of range of 0..2**93-1, if you
can only read in values with a range of -2**15-1..2**15?

I think it's clearly not rubbish. In general, you can't read a float or
double directly from a device, but you can read in data via bounded integers
or more typically via bytes.

For example, you'll need to construct an artificial type with the range of
0..2**93-1 to represent the data received from the sensor. Then, you'll
need to read in the value from the sensor using multiple reads of values
with range of -2**15-1..2**15. Then, you'll need to convert those multiple
values of -2**15-1..2**15 to your artificial type with the range of
0..2**93-1.


Rod Pemberton


Dmitry A. Kazakov

unread,
Jun 29, 2009, 8:22:14 AM6/29/09
to
On Mon, 29 Jun 2009 04:35:40 -0400, Rod Pemberton wrote:

> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
> news:w9tbp6x90lp7.1v...@40tude.net...
>> On Sun, 28 Jun 2009 16:35:41 -0400, Rod Pemberton wrote:
>>
>>> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
>>> news:5ejfdcqynoc9.1d...@40tude.net...
>>>>
>>>> If the code weren't broken there would be no need to fix it. If it were
>>>> designed properly in first place, e.g. following C/C++ guide lines and
>>>> elementary common sense, you would not need to fix this kind of errors.
>>>
>>> Oh, well, I just see that claim as untrue... C has had quite a few changes
>>> to the way the language works. Implicit int's which are allowed in K&R C
>>> create parsing conflicts with typedef's in ANSI C. Use of ANSI C's void and
>>> void *'s catch a few common programming errors, but are unavailable in K&R
>>> C. The defintion of NULL changed. pre- and post-decrement operators no
>>> longer work properly due to ANSI C's insertion of "sequence points". The
>>> language has been continually reworked to support larger integer types or
>>> generic parameters behind the scenes. etc. etc.
>>
>> So?
>
> All examples of situations where the C code isn't broken, but must be fixed.

I see. No, it was broken. When you change the language that may break the
code. You can argue that it is rather the language which is broken, but
that is equivalent. (And regarding C, it was always broken... (:-))

>> My opponent claimed, something like that a long standing tradition of
>> indiscriminate use of int everywhere, should justify its use to represent
>> readings of a sensor with the range of 0..2**93-1 even if the machine
>> number corresponding to int is -2**15-1..2**15. That is an obvious
>> rubbish.
>
> With what do you intend to read in a value of range of 0..2**93-1, if you
> can only read in values with a range of -2**15-1..2**15?

Who said that? Surely I can read values. I need to deploy an integer type
with the range of the value read.

Formally: between the model type and the domain type there must be a
bijective mapping of the sets of values.



> I think it's clearly not rubbish. In general, you can't read a float or
> double directly from a device, but you can read in data via bounded integers
> or more typically via bytes.

No, there is plenty of hardware (some I have on my desk) that communicates
floating-point numbers.

Automation makes some progress too, believe me. (:-))

> For example, you'll need to construct an artificial type with the range of
> 0..2**93-1 to represent the data received from the sensor. Then, you'll
> need to read in the value from the sensor using multiple reads of values
> with range of -2**15-1..2**15.

That depends on the hardware protocol. Whether acquiring a value requires
multiple I/O operations is irrelevant. Normally there are multiple layered
protocols which connects you to the sensor.

cr88192

unread,
Jun 29, 2009, 12:35:47 PM6/29/09
to

"Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
news:1vkli3q8m415f$.1cwt2r2a5ahi7.dlg@40tude.net...

> On Sun, 28 Jun 2009 10:16:12 -0700, cr88192 wrote:
>
>> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
>> news:5ejfdcqynoc9.1d...@40tude.net...
>>> On Sun, 28 Jun 2009 06:52:58 -0700, cr88192 wrote:
>>>
>>> I never heard about such. Maybe you meant a tradition of buffer overruns
>>> in C programs? Anyway, no tradition can serve as an excuse for
>>> ignorance.
>>
>> C, C++, Java, C#, ... all generally use the terms 'int' and 'integer' to
>> refer to fixed-size subsets of the integer space...
>>
>> since these are some of the main programming languages, their use of
>> terms
>> is authoritative.
>
> And the Moon is made of cheese...
>

claiming the moon is made of cheese is a bit different than a disagreement
over the the exact use of a term...


>>>>> How do you test without specifications? What do you test without
>>>>> specifying it?
>>>>
>>>> one tests for behaviors, and that nothing "out of place" happens.
>>>
>>> How "behaviours" and "out of place" are specified? Note the word
>>> "specified".
>>
>> we generally "know" when a behavior is out of place, and so usually it is
>> not necessary to say what is not the expected behavior, only what is the
>> expected behavior.
>
> How this "knowledge" is communicated? Or is it so that all humankind is
> born with this knowledge?
>

humans have things like intuition/... where ones' intuition will usually
start making a fuss if something seems out of place. so, one need not always
need to use exact knowledge.

most bugs tend to be fairly obvious once encountered.
granted, some are less obvious, and some things may seem like bugs which are
not, and these cases may require looking into it a bit more.


>>>> none the less, it is the combination of the code and implementation
>>>> that
>>>> defines the "reality" of the matter (it runs, and we see the results,
>>>> ...).
>>>
>>> Buggy code defines bugs. That's is only reality.
>>
>> yes, and these bugs are a part of the reality of the compiled program...
>> so, what is the issue?...
>
> The issue is that bug is defined as a behavior different from specified.
> There is neither bug, nor behavior, nor even program without a
> specification. It is the specification of sine, which makes a program
> computing sine what it is. With the specification of text processor this
> program is garbage.
>

a 'specification' is only one way to regard the correctness or incorrectness
of a behavior.

for example, organic life continues on just fine, with little or no
adherence to what could be called a specification.

DNA fills this role, however, DNA has not in itself been specified, and
essentially follows a different set of rules from those of formal systems...


things like genetic algorithms, neural nets, ... can do all sorts of things
while similarly, not having such behavior be specified.

likewise for things like hidden markov models, self-organizing behavior, ...


>> bugs are just as much apart of reality as non-bugs.
>> things like sickness and death also just happen to exist.
>>
>> we don't write somewhere that someone will get sick, and then they do.
>> rather, they get sick, and we note that they have gotten sick.
>
> Ask physicians for a definition of sickness.
>

it is usually not needed, as we can usually see when it has happened easily
enough.
anyways, physicians only categorize and describe sickness, but they did not
create it.
them not having created it, makes it essentially a different situation than
a bug.

both.

in the input code, the word remains the seem, but the target is changed.
after comming out of the preprocessor, there is no longer a double to be
seen...


>>>>>>>> by this reasoning, because for most of human history, no one had
>>>>>>>> any damn
>>>>>>>> idea about DNA, does this mean that humans could not exist?...
>>>>>>>
>>>>>>> You are confusing the program with its medium and the execution
>>>>>>> platform.
>>>>>>
>>>>>> the "program" IS the "medium and execution platform"...
>>>>>
>>>>> Nonsense. A program can be printed on paper, painted on a T-shirt,
>>>>> represented by a composition of used beer cans...
>>>>
>>>> these are different representations, and due to their difference in
>>>> representation, are essentially different entities.
>>>
>>> One nonsensical statement derived from another. A program remains the
>>> same
>>> entity independently on whether it is stored at the hard drive or memory
>>> steak.
>>
>> I disagree...
>>
>> a copy of a program on a hard-drive and on a memory stick are different
>> entities (one is located in the hard drive, and the other may go in ones'
>> pocket...).
>
> I have nothing to add to this. It speaks for itself.
>

as they say, "existence precedes essence"...

Marco van de Voort

unread,
Jul 31, 2009, 2:31:14 PM7/31/09
to
["Followup-To:" header set to comp.lang.misc.]

On 2009-06-26, Pawel Dziepak <pdzi...@quarnos.org> wrote:
> cr88192 wrote:
>> in practice, on current computers, it is de-facto defined as 32 bits.
>
> The fact that in most cases sizeof(int) equals 4 doesn't mean that you
> should expect it on all hardware architectures (e.g. gcc makes int 16
> bit on AVR). AFAIK char is 9 bits on PDP-10, so even though sizeof(int)
> is 4, int is 36 bits. That's not common situation (only if you
> concentrate on PCs), but if one doesn't make such assumptions it will be
> possible to easily port programs to other architectures.

What does C do btw on machines that can't adress single bytes, like afaik
the CDC (6000?), where Pascal e.g. got his "pack" and "unpack" oddities in
the standard from?


Alexei A. Frounze

unread,
Aug 1, 2009, 4:50:26 AM8/1/09
to
On Jul 31, 11:31 am, Marco van de Voort <mar...@stack.nl> wrote:
> ["Followup-To:" header set to comp.lang.misc.]
> On 2009-06-26, Pawel Dziepak <pdzie...@quarnos.org> wrote:
>
> > cr88192 wrote:
> >> in practice, on current computers, it is de-facto defined as 32 bits.
>
> > The fact that in most cases sizeof(int) equals 4 doesn't mean that you
> > should expect it on all hardware architectures (e.g. gcc makes int 16
> > bit on AVR). AFAIK char is 9 bits on PDP-10, so even though sizeof(int)
> > is 4, int is 36 bits. That's not common situation (only if you
> > concentrate on PCs), but if one doesn't make such assumptions it will be
> > possible to easily port programs to other architectures.
>
> What does C do btw on machines that can't adress single bytes, like afaik
> the CDC (6000?), where Pascal e.g. got his "pack" and "unpack" oddities in
> the standard from?

Two options:
- waste memory: have as big C bytes/chars as the minimum physically
addressable unit of memory
- waste CPU cycles: every time extract individual C bytes/chars from
the same unit, this would affect pointers (and possibly the
addressable range) since a pointer to a char/byte would need to
contain the number of that individual byte/char

Alex

BGB / cr88192

unread,
Aug 13, 2009, 1:59:13 AM8/13/09
to

"Alexei A. Frounze" <alexf...@gmail.com> wrote in message
news:d52aba0a-3a85-4e94...@f20g2000prn.googlegroups.com...

On Jul 31, 11:31 am, Marco van de Voort <mar...@stack.nl> wrote:
> ["Followup-To:" header set to comp.lang.misc.]
> On 2009-06-26, Pawel Dziepak <pdzie...@quarnos.org> wrote:
>
> > cr88192 wrote:
> >> in practice, on current computers, it is de-facto defined as 32 bits.
>
> > The fact that in most cases sizeof(int) equals 4 doesn't mean that you
> > should expect it on all hardware architectures (e.g. gcc makes int 16
> > bit on AVR). AFAIK char is 9 bits on PDP-10, so even though sizeof(int)
> > is 4, int is 36 bits. That's not common situation (only if you
> > concentrate on PCs), but if one doesn't make such assumptions it will be
> > possible to easily port programs to other architectures.
>
> What does C do btw on machines that can't adress single bytes, like afaik
> the CDC (6000?), where Pascal e.g. got his "pack" and "unpack" oddities in
> the standard from?

<--


Two options:
- waste memory: have as big C bytes/chars as the minimum physically
addressable unit of memory
- waste CPU cycles: every time extract individual C bytes/chars from
the same unit, this would affect pointers (and possibly the
addressable range) since a pointer to a char/byte would need to
contain the number of that individual byte/char

-->

AFAIK, it has more often faked the bytes...

0 new messages