Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

OO object layout - method-table pointer or tag

73 views
Skip to first unread message

James Harris

unread,
Aug 29, 2021, 2:24:52 PM8/29/21
to
I am a long way from implementing OO-type objects but I have in mind a
layout for them and something Dmitry mentioned in another thread
suggested that there may be a better way.

My intended model is to have each object's data preceded by a pointer to
a table of its methods. To illustrate, let's say the source code had a
structure 'point' with two fields (x and y) and which referred to two
methods (plot and clear).

Each 'point' would have the coordinates and the two methods mentioned
such that if we had a point P then

P.x would refer to the x coordinate
P.plot() would invoke the plot method

My idea is to implement that as two parts. Each object would have a
read-write part as

pointer to method table
x
y

and the pointer therein would point to a read-only 'method table' which
had the records and some other fields, such as

number of objects pointing at this table
pointer to the type
pointer to procedure 'plot'
pointer to procedure 'clear'


Sorry if that looks a bit complicated. It's about as short as I can
think to make it. Basically, the method table would contain a count of
the number of objects which were pointing at it, a pointer to the type
definition, and a pointer to each method.

I chose it that way to make it fast to dispatch one of the procedures:
follow one pointer to get to the method table then invoke the routine
pointed to by the pointer at the designated offset. (The routine would
be supplied a pointer to the object.)

To get to base info about the type would also be possible by following
the pointer to the type.

Opinions on the scheme?

However Dmitry said: "You do not need length, because it can be deduced
from the tag. Then tag is more universal than a pointer to the table.
Pointer do not work with multiple-dispatch, for example."

Ignoring the length part (as some objects would have their own length
and some would not but that difference is immaterial here) how would a
'tag' system be laid out in memory and how would one use that to locate
a method?


--
James Harris

Dmitry A. Kazakov

unread,
Aug 29, 2021, 3:32:21 PM8/29/21
to
On 2021-08-29 20:24, James Harris wrote:

> Ignoring the length part (as some objects would have their own length
> and some would not but that difference is immaterial here) how would a
> 'tag' system be laid out in memory and how would one use that to locate
> a method?

You must consider the following inheritance variants:

1. Multiple inheritance (MI)

A B
\ /
C

1.1. Java model. This is when only one parent may have an implementation
(e.g. members). Thus you have a single path where you could keep vptr
[the limited model you have in mind].

1.2. Full MI, C++ model. Any parent may have an implementation. This in
particular means that when you pass C as A you must have vptr of
A'Class, when you pass C as B, you must have another vptr of B'Class.

1.3. Idempotent inheritance:

A
/ \
B C
\ /
D

D has the single instance of A inherited over B and C.

1.4. Additive inheritance:

A
/ \
B C
\ /
D

D has two instances of A, one inherited over B another over C.

2. Multiple dispatch.

2.1. Multimethods only. A multimethod is when all arguments and results
are from the same type hierarchy. E.g. arithmetic operations are
multimethods. Multimethods can have vptr, but the table becomes
multidimensional.

2.2. Parallel inheritance. Arguments can be from different hierarchies,
but only diagonal combinations are allowed:

A1 B1
| |
A2 B2

Foo (A1, B1) - OK
Foo (A1, B2) - not OK
Foo (A2, B2) - OK

This is very common case, e.g. when you have a hierarchy of array
element types parallel to the corresponding array types.

2.3. Full multiple dispatch. Any combinations allowed.

3. Embedded vs. outside tags. If you embed the tag into the object, you
cannot have classes of scalar types.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

David Brown

unread,
Aug 29, 2021, 5:28:44 PM8/29/21
to
So far, that is a straight copy of C++ (without virtual inheritance
support - and it's questionable whether virtual inheritance is useful
enough to justify its complications).

> I chose it that way to make it fast to dispatch one of the procedures:
> follow one pointer to get to the method table then invoke the routine
> pointed to by the pointer at the designated offset. (The routine would
> be supplied a pointer to the object.)
>
> To get to base info about the type would also be possible by following
> the pointer to the type.
>
> Opinions on the scheme?

It works for C++.

Bart

unread,
Aug 29, 2021, 6:04:39 PM8/29/21
to
I haven't implemented this stuff in static code: classes with methods
and with inheritance, so can't speak directly from experience.

I have played with it however, and I have implemented it for dynamic code.

The first thing that strikes me, if I understood correctly, is that each
instance of a Point needs to include not only the data x,y, but also a
pointer.

With apparently, exactly the same pointer in each of millions of
instances. Which all means that if you allocate an array of such points,
you can't just initialise them with all-zeros to (0,0), but need to
specifically initialise that pointer field in each one.

So, why is that actually necessary? Is it because of inheritance, where
any instance P could be of the base class, or of a derived class which
could have its own sets of methods, and this information is not known
until runtime?

I can illustrate that with this example in dynamic code:

record point =
var x,y
proc plot(&self) =
println "POINT",self.x,self.y
end
end

record dirvec(point) = # inherit from Point
proc plot(&self) =
println "DIRVEC",self.x,self.y
end
end

proc start=
a:=point(10,20)
b:=dirvec(30,30)

a.plot()
b.plot()
end

The output here is:

POINT 10 20
DIRVEC 30 30

Each instance (A and B), contains values for its X and Y fields. But no
pointers to method tables. But they both have, because it is dynamic, is
a type tag: a index into tables of types.

That is used to search (again because its dynamic), a table of entries
for the 'plot' attribute (which can be used in a different other types
too), which matches the types of A and B.

I would have expect a static language to convert those last the lines to:

point.plot(a)
dirvec.plot(b)

I guess it depends on how A and B would be declared, and whether, if
they have those two different types, either can still be passed to a
function taking a Point type, and possibly one taking a Dirvec type.

(I think I ran into trouble trying this in static code, not just because
of methods, but because the two types instances would have the normal
fields at different offsets, and could be different types; say Point has
int32 X,Y, but Dirvec defines float64 Y,Z and inherits int32 X.

But maybe I'm just not understanding how OOP is supposed to work.)


James Harris

unread,
Aug 30, 2021, 5:47:58 AM8/30/21
to
On 29/08/2021 20:32, Dmitry A. Kazakov wrote:
> On 2021-08-29 20:24, James Harris wrote:
>
>> Ignoring the length part (as some objects would have their own length
>> and some would not but that difference is immaterial here) how would a
>> 'tag' system be laid out in memory and how would one use that to
>> locate a method?
>
> You must consider the following inheritance variants:

...

Thanks for the information, Dmitry. I might come back to you about some
of that. But for this thread I was wondering how the memory might be
laid out under the tag model that you mentioned. What form does a tag
take and how does the runtime get from the tag to a method?

If it depends on which model is in use then what about single
inheritance and the Full MI model you mentioned?


--
James Harris

James Harris

unread,
Aug 30, 2021, 6:23:15 AM8/30/21
to
On 29/08/2021 23:04, Bart wrote:
> On 29/08/2021 19:24, James Harris wrote:
>> I am a long way from implementing OO-type objects but I have in mind a
>> layout for them and something Dmitry mentioned in another thread
>> suggested that there may be a better way.
>>
>> My intended model is to have each object's data preceded by a pointer
>> to a table of its methods.

...

>>
>> However Dmitry said: "You do not need length, because it can be
>> deduced from the tag. Then tag is more universal than a pointer to the
>> table. Pointer do not work with multiple-dispatch, for example."

...

> I haven't implemented this stuff in static code: classes with methods
> and with inheritance, so can't speak directly from experience.
>
> I have played with it however, and I have implemented it for dynamic code.
>
> The first thing that strikes me, if I understood correctly, is that each
> instance of a Point needs to include not only the data x,y, but also a
> pointer.
>
> With apparently, exactly the same pointer in each of millions of
> instances. Which all means that if you allocate an array of such points,
> you can't just initialise them with all-zeros to (0,0), but need to
> specifically initialise that pointer field in each one.
>
> So, why is that actually necessary? Is it because of inheritance, where
> any instance P could be of the base class, or of a derived class which
> could have its own sets of methods, and this information is not known
> until runtime?

It's not because of inheritance. It's to support polymorphism.

Without OO there would be no 'method table pointer' (MTP). Instead, each
function called to act on an object would know the object's type and,
hence, where its fields are.

With OO the object could be one of a number of different types as long
as they all have the same conceptual set of functions (in the same
internal order), e.g. they both have a plot() function or they both have
an add() function. So you could have

pixel_point P
float_point F

Both types could have a plot() function. As long as it had the same
interface then the function could be invoked by

P.plot()
F.plot()

The key benefit, AISI, is that the routine which calls plot would work
with either type of point. The call wouldn't care what form the point
took; it would be polymorphic.

Of course, whether that's worth the complexity of the implementation is
debatable but it does allow a lot of source code to be simpler.

>
> I can illustrate that with this example in dynamic code:
>
>     record point =
>         var x,y
>         proc plot(&self) =
>             println "POINT",self.x,self.y
>         end
>     end
>
>     record dirvec(point) =        # inherit from Point
>         proc plot(&self) =
>             println "DIRVEC",self.x,self.y
>         end
>     end
>
>     proc start=
>         a:=point(10,20)
>         b:=dirvec(30,30)
>
>         a.plot()
>         b.plot()
>     end
>
> The output here is:
>
>     POINT 10 20
>     DIRVEC 30 30
>
> Each instance (A and B), contains values for its X and Y fields. But no
> pointers to method tables. But they both have, because it is dynamic, is
> a type tag: a index into tables of types.

So your form of a tag is an index into a table? Maybe that's how
Dmitry's 'tag' would be implemented.

I have doubts about indexing into a table rather than pointing to the
method table directly. It would probably require at least one extra
level of dereferencing to find a method.

>
> That is used to search (again because its dynamic), a table of entries
> for the 'plot' attribute (which can be used in a different other types
> too), which matches the types of A and B.
>
> I would have expect a static language to convert those last the lines to:
>
>    point.plot(a)
>    dirvec.plot(b)

Why not

a.plot()
b.plot()

?

>
> I guess it depends on how A and B would be declared, and whether, if
> they have those two different types, either can still be passed to a
> function taking a Point type, and possibly one taking a Dirvec type.
>
> (I think I ran into trouble trying this in static code, not just because
> of methods, but because the two types instances would have the normal
> fields at different offsets, and could be different types; say Point has
> int32 X,Y, but Dirvec defines float64 Y,Z and inherits int32 X.
>
> But maybe I'm just not understanding how OOP is supposed to work.)

Dmitry's reply about various different interpretations of how OOP can
work shows there is no single right answer. Indeed, OOP may not be the
best solution, at all.


--
James Harris

David Brown

unread,
Aug 30, 2021, 7:19:28 AM8/30/21
to
In particular, it is for support of run-time polymorphism.

There is also compile-time polymorphism, handled by templates or other
generic coding mechanisms. Since the types are determined entirely at
compile time, there is no need for a pointer to a virtual method table
and like C++ classes without virtual methods, there is no need for the
extra pointer in the class implementation. (Obviously compile-time
polymorphism has limitations of flexibility, but it results in more
efficient code.)

I often use inheritance in C++ without virtual methods, and therefore no
virtual method table overhead in the classes (and no possibility of
run-time polymorphism).

>
> Without OO there would be no 'method table pointer' (MTP). Instead, each
> function called to act on an object would know the object's type and,
> hence, where its fields are.
>
> With OO the object could be one of a number of different types as long
> as they all have the same conceptual set of functions (in the same
> internal order), e.g. they both have a plot() function or they both have
> an add() function. So you could have
>
>   pixel_point P
>   float_point F
>
> Both types could have a plot() function. As long as it had the same
> interface then the function could be invoked by
>
>   P.plot()
>   F.plot()
>
> The key benefit, AISI, is that the routine which calls plot would work
> with either type of point. The call wouldn't care what form the point
> took; it would be polymorphic.
>
> Of course, whether that's worth the complexity of the implementation is
> debatable but it does allow a lot of source code to be simpler.
>

<https://en.wikipedia.org/wiki/Virtual_method_table>

You are describing exactly the system used in many compiled object
oriented languages.

But you should be aware that compile-time polymorphism has become the
current trend, giving more efficient results in many cases.

>>
>> I can illustrate that with this example in dynamic code:
>>
>>      record point =
>>          var x,y
>>          proc plot(&self) =
>>              println "POINT",self.x,self.y
>>          end
>>      end
>>
>>      record dirvec(point) =        # inherit from Point
>>          proc plot(&self) =
>>              println "DIRVEC",self.x,self.y
>>          end
>>      end
>>
>>      proc start=
>>          a:=point(10,20)
>>          b:=dirvec(30,30)
>>
>>          a.plot()
>>          b.plot()
>>      end
>>
>> The output here is:
>>
>>      POINT 10 20
>>      DIRVEC 30 30
>>
>> Each instance (A and B), contains values for its X and Y fields. But
>> no pointers to method tables. But they both have, because it is
>> dynamic, is a type tag: a index into tables of types.
>

That code is not dynamic polymorphism - the types are all statically
known at compile time. Dynamic polymorphism would be if you also had:

proc doubleplot(&x) =
x.plot();
x.plot();
end

proc start =
a : =point(10, 20)
b := dirvec(30, 30)
doubleplot(&a)
doubleplot(&b)
end

i.e., the function "doubleplot" is compiled without knowing whether it
will be calling the "point" version of "plot", or the "dirvec" version.

> So your form of a tag is an index into a table? Maybe that's how
> Dmitry's 'tag' would be implemented.

That's what it sounds like to me. I don't see off-hand any significant
benefits of using an index into a table, rather than a pointer directly
to the type information (primarily the virtual methods list, but perhaps
also containing run-time type information if the language supports taht).

>
> I have doubts about indexing into a table rather than pointing to the
> method table directly. It would probably require at least one extra
> level of dereferencing to find a method.
>

The extra level of indirection could be removed if you are doing
whole-program code generation, but you'd still need to scale the index
and add it to the base of the table instead of using the pointer directly.

>>
>> That is used to search (again because its dynamic), a table of entries
>> for the 'plot' attribute (which can be used in a different other types
>> too), which matches the types of A and B.
>>
>> I would have expect a static language to convert those last the lines to:
>>
>>     point.plot(a)
>>     dirvec.plot(b)
>
> Why not
>
>   a.plot()
>   b.plot()
>
> ?

Perhaps Bart meant the static language would implement "a.plot()" as
though it had been "point.plot(a)", which is exactly how optimising
compilers /do/ implement polymorphism when they know the types at
compile time. (These applies to virtual methods too - it's called
devirtualisation optimisation.)

>
>>
>> I guess it depends on how A and B would be declared, and whether, if
>> they have those two different types, either can still be passed to a
>> function taking a Point type, and possibly one taking a Dirvec type.
>>
>> (I think I ran into trouble trying this in static code, not just
>> because of methods, but because the two types instances would have the
>> normal fields at different offsets, and could be different types; say
>> Point has int32 X,Y, but Dirvec defines float64 Y,Z and inherits int32 X.
>>
>> But maybe I'm just not understanding how OOP is supposed to work.)
>
> Dmitry's reply about various different interpretations of how OOP can
> work shows there is no single right answer. Indeed, OOP may not be the
> best solution, at all.
>

Yes. Your job here is to listen to all the ideas, and figure out what
will work for /you/ and /your/ language.

Dmitry A. Kazakov

unread,
Aug 30, 2021, 7:37:01 AM8/30/21
to
On 2021-08-30 11:47, James Harris wrote:
> On 29/08/2021 20:32, Dmitry A. Kazakov wrote:
>> On 2021-08-29 20:24, James Harris wrote:
>>
>>> Ignoring the length part (as some objects would have their own length
>>> and some would not but that difference is immaterial here) how would
>>> a 'tag' system be laid out in memory and how would one use that to
>>> locate a method?
>>
>> You must consider the following inheritance variants:
>
> ...
>
> Thanks for the information, Dmitry. I might come back to you about some
> of that. But for this thread I was wondering how the memory might be
> laid out under the tag model that you mentioned. What form does a tag
> take and how does the runtime get from the tag to a method?

You must also consider type tests. E.g. is A a descendant of B, unrelated?

> If it depends on which model is in use then what about single
> inheritance and the Full MI model you mentioned?

In the most general case a dispatching table belongs to the method, not
to the type. The tag would denote an N-th coordinate (plane) of the
multidimensional index in the table.

E.g. in the case of full dispatch: Print (Device, Shape), Print is a
method of two hierarchies.

Now, of course, because the same tag must be used to index a multitude
of tables, it cannot be a dense index. So, go figure.

Another issue to consider is whether tables are contiguous. Consider the
following scenario:

The type A is declared in the shared library libA. So you can put the
dispatching table there? Not so fast. Let there be the library libB,
that derives some B from A. That must change the dispatching tables for
A upon loading of libB. When unloading libB you must roll the changes back.

Bart

unread,
Aug 30, 2021, 8:08:08 AM8/30/21
to
On 30/08/2021 11:23, James Harris wrote:
> On 29/08/2021 23:04, Bart wrote:

>> So, why is that actually necessary? Is it because of inheritance,
>> where any instance P could be of the base class, or of a derived class
>> which could have its own sets of methods, and this information is not
>> known until runtime?
>
> It's not because of inheritance. It's to support polymorphism.
>
> Without OO there would be no 'method table pointer' (MTP). Instead, each
> function called to act on an object would know the object's type and,
> hence, where its fields are.
>
> With OO the object could be one of a number of different types as long
> as they all have the same conceptual set of functions (in the same
> internal order), e.g. they both have a plot() function or they both have
> an add() function. So you could have
>
>   pixel_point P
>   float_point F
>
> Both types could have a plot() function. As long as it had the same
> interface then the function could be invoked by
>
>   P.plot()
>   F.plot()
>
> The key benefit, AISI, is that the routine which calls plot would work
> with either type of point. The call wouldn't care what form the point
> took; it would be polymorphic.

That example looks too simple to me; it doesn't really involve OO in my
opinion as it merely provides a convenient alternative syntax. Example
in static code:

record pixelpoint =
byte x,y
proc plot(pixelpoint &self)=
println "Pixelpoint:",self.x,self.y
end
end

record floatpoint =
real x,y
proc plot(floatpoint &self)=
println "Floatpoint:",self.x,self.y
end
end

proc start=
pixelpoint p:=(10,20)
floatpoint q:=(30.5,40.6)

p.plot()
q.plot()
println p.bytes, q.bytes
end

Output is:

Pixelpoint: 10 20
Floatpoint: 30.500000 40.600000
2 16


That p.plot() line is just an alternative to pixelpoint.plot(p), with
the transformation done at compile-time. Here, this is just
encapsulation (which I actually never use; I'm surprised it works!).

I'm not up on OO terms, but polymorphism to me would mean function and
operator overloading. So if you had two functions both called plot,
taking pixelpoint or floatpoint parameters respectively, you could say:

plot(p)
plot(f)

>> Each instance (A and B), contains values for its X and Y fields. But
>> no pointers to method tables. But they both have, because it is
>> dynamic, is a type tag: a index into tables of types.
>
> So your form of a tag is an index into a table? Maybe that's how
> Dmitry's 'tag' would be implemented.

That's just generally how it works with dynamic typing; the tag is the
key to everything. However such code usually means dynamic type dispatch
which is slow. If you have static typing at all, you expect it to be
much faster.

> I have doubts about indexing into a table rather than pointing to the
> method table directly. It would probably require at least one extra
> level of dereferencing to find a method.

Why, in your example, wouldn't the compiler know the type of P?

Do you have a more elaborate example that shows off the advantages of
OO, or whatever you are trying to do, that would be too clunky in C?

Bart

unread,
Aug 30, 2021, 8:21:27 AM8/30/21
to
On 30/08/2021 12:19, David Brown wrote:
> On 30/08/2021 12:23, James Harris wrote:
>> On 29/08/2021 23:04, Bart wrote:

>>> I can illustrate that with this example in dynamic code:

> That code is not dynamic polymorphism - the types are all statically
> known at compile time. Dynamic polymorphism would be if you also had:
>
> proc doubleplot(&x) =
> x.plot();
> x.plot();
> end
>
> proc start =
> a := point(10, 20)
> b := dirvec(30, 30)
> doubleplot(&a)
> doubleplot(&b)
> end
>
> i.e., the function "doubleplot" is compiled without knowing whether it
> will be calling the "point" version of "plot", or the "dirvec" version.

This works fine (but the &s are not needed). But since this is a
dynamically typed language where you get generics and everything else
for free, that is not surprising.

I was making a point about what information needs to be stored with each
instance. Sometimes I don't even need the tag for every point value:

type point = struct
int32 x,y
end

proc start =
x:=new(array,point, 1 million)
println x.bytes
print x[12345]
end

Output is:

8000000
(0,0)

However the 'struct' type doesn't have methods. That is just because the
language decided not to allow that; it could have done, and each array
element would still need only 8 bytes (the array object itself has a tag
and knows the type of the elements).

But it still knows how to print the type! (For a custom print function,
here I'd favour an overload of 'tostring' for this point type, then I
can just do a regular 'print'.)

Dmitry A. Kazakov

unread,
Aug 30, 2021, 8:44:07 AM8/30/21
to
On 2021-08-30 13:19, David Brown wrote:

> There is also compile-time polymorphism, handled by templates or other
> generic coding mechanisms.

Yes, but static polymorphism has a very limited use because you cannot
have objects from the closure of the class, only instances of specific
type ones. In practice this precludes most cases of reuse.

James Harris

unread,
Aug 30, 2021, 4:25:55 PM8/30/21
to
On 30/08/2021 12:36, Dmitry A. Kazakov wrote:
> On 2021-08-30 11:47, James Harris wrote:
>> On 29/08/2021 20:32, Dmitry A. Kazakov wrote:
>>> On 2021-08-29 20:24, James Harris wrote:
>>>
>>>> Ignoring the length part (as some objects would have their own
>>>> length and some would not but that difference is immaterial here)
>>>> how would a 'tag' system be laid out in memory and how would one use
>>>> that to locate a method?
>>>
>>> You must consider the following inheritance variants:
>>
>> ...
>>
>> Thanks for the information, Dmitry. I might come back to you about
>> some of that. But for this thread I was wondering how the memory might
>> be laid out under the tag model that you mentioned. What form does a
>> tag take and how does the runtime get from the tag to a method?
>
> You must also consider type tests. E.g. is A a descendant of B, unrelated?

Interesting question...!

>
>> If it depends on which model is in use then what about single
>> inheritance and the Full MI model you mentioned?
>
> In the most general case a dispatching table belongs to the method, not
> to the type. The tag would denote an N-th coordinate (plane) of the
> multidimensional index in the table.

That sounds weird. If the dispatch table belongs to the method what
would it be indexed by?

When you talk about a tag do you mean something that might, if it cannot
be optimised any better, result in a series of three or even more
lookups executed at run time before the required method is found?

>
> E.g. in the case of full dispatch: Print (Device, Shape), Print is a
> method of two hierarchies.
>
> Now, of course, because the same tag must be used to index a multitude
> of tables, it cannot be a dense index. So, go figure.
>
> Another issue to consider is whether tables are contiguous. Consider the
> following scenario:
>
> The type A is declared in the shared library libA. So you can put the
> dispatching table there? Not so fast. Let there be the library libB,
> that derives some B from A. That must change the dispatching tables for
> A upon loading of libB. When unloading libB you must roll the changes back.

It's rare that a good idea is so complex or has so many semantically
different implementations. I do wonder whether all this complexity
should be replaced by something simpler.

My current scheme is simpler but it doesn't necessarily have
substitutability which is something I gather you believe to be
important, even from your interesting question, above.


--
James Harris

Dmitry A. Kazakov

unread,
Aug 30, 2021, 5:13:12 PM8/30/21
to
By the type, of course. It is pretty straightforward. Consider

Surface Shape
/ \ / \
PDF SVG Ellipse Rectangle

Draw | PDF | SVG |
----------+-----+-----+
Ellipse | ptr | ptr |
----------+-----+-----+
Rectangle | ptr | ptr |
----------+-----+-----+

> When you talk about a tag do you mean something that might, if it cannot
> be optimised any better, result in a series of three or even more
> lookups executed at run time before the required method is found?

Yes. You need some unique type identifier, short enough to fit in a
register.

>> Another issue to consider is whether tables are contiguous. Consider
>> the following scenario:
>>
>> The type A is declared in the shared library libA. So you can put the
>> dispatching table there? Not so fast. Let there be the library libB,
>> that derives some B from A. That must change the dispatching tables
>> for A upon loading of libB. When unloading libB you must roll the
>> changes back.
>
> It's rare that a good idea is so complex or has so many semantically
> different implementations. I do wonder whether all this complexity
> should be replaced by something simpler.

It is a very complex issue. E.g. consider requirements:

1. All slots in a dispatching table are always defined, i.e. dispatch
never fail at run-time.

1.1. That would require the user to override slots the language cannot
safely inherit from the parent type, e.g. mixed type operations.

1.2. The problem with modules. Let the module X derive HTML from Surface
and the module Y derive Sector from Shape. If neither of the modules
uses/knows another, where to define Draw (HTML, Sector)?

There must be some solution for where to place methods. BTW, C++ model
when methods are put into classes is clearly inconsistent with multiple
dispatch. Where to place Draw?

2. No object of a type may exist after the type ceases to exist.

3. Dispatch never happens on partially constructed/destructed objects.
Consider:

A
|
B

In the constructor of B is called after A is constructed. Now what
happens if you dispatch to Foo from the constructor of A and Foo is
overridden by B?

C++ "resolves" that by refusing to dispatch. It will call A::Foo instead
of B::Foo.

That is no solution because programmers do need dispatching
constructors/destructors. A possible solution would be to have
constructors/destructors of class-wide objects where you could safely
dispatch from.

4. Allowing or not to re-dispatch. I.e. if you dispatch on Foo, can you
dispatch again on the same argument inside Foo? This is how C++ works
and this is greatly inefficient and also inconsistent with the types.

> My current scheme is simpler but it doesn't necessarily have
> substitutability which is something I gather you believe to be
> important, even from your interesting question, above.

Inheritance is an open issue. I do not know any language that handles it
consistently and complete. So you have a chance to invent something useful.

James Harris

unread,
Sep 3, 2021, 8:27:24 AM9/3/21
to
On 30/08/2021 22:13, Dmitry A. Kazakov wrote:
> On 2021-08-30 22:25, James Harris wrote:
>> On 30/08/2021 12:36, Dmitry A. Kazakov wrote:

...

>>> In the most general case a dispatching table belongs to the method,
>>> not to the type. The tag would denote an N-th coordinate (plane) of
>>> the multidimensional index in the table.
>>
>> That sounds weird. If the dispatch table belongs to the method what
>> would it be indexed by?
>
> By the type, of course. It is pretty straightforward. Consider
>
>    Surface       Shape
>      /  \         /  \
>    PDF  SVG  Ellipse  Rectangle
>
>    Draw      | PDF | SVG |
>    ----------+-----+-----+
>    Ellipse   | ptr | ptr |
>    ----------+-----+-----+
>    Rectangle | ptr | ptr |
>    ----------+-----+-----+

That solution looks as though the shapes would have to know how to draw
themselves on different surfaces. As such I doubt it would scale well as
new shapes and new types of surface were added. But maybe it's purely
illustrative of multiple dispatch. What I take from it is that, however
it's done, the address of the drawing function (the ptr in the above
table) would need to be found from a key of three parameters

(shape type, surface type, function name)

whereas single dispatch would find a function based on two parameters
such as

(shape type, function name)

Is that correct?


>
>> When you talk about a tag do you mean something that might, if it
>> cannot be optimised any better, result in a series of three or even
>> more lookups executed at run time before the required method is found?
>
> Yes. You need some unique type identifier, short enough to fit in a
> register.

If the tag is the unique identifier you mention how would the compiled
code get from the tag to the correct routine?

My intention was for the tag to be the address of the dispatch table so
that functions (as long as they were at known offsets) could be accessed
quickly but I don't think you liked that.

I will have to come back to you on the other points of your post.


--
James Harris

Dmitry A. Kazakov

unread,
Sep 3, 2021, 9:39:26 AM9/3/21
to
Yes, and the surface parameter is then class-wide, i.e. each function
must have to deal with any shape (any instance from the class = class-wide).

>>> When you talk about a tag do you mean something that might, if it
>>> cannot be optimised any better, result in a series of three or even
>>> more lookups executed at run time before the required method is found?
>>
>> Yes. You need some unique type identifier, short enough to fit in a
>> register.
>
> If the tag is the unique identifier you mention how would the compiled
> code get from the tag to the correct routine?

Via the dispatching table indexed by the tag. Tag could not be a dense
index, true, so you would use a hash or binary search instead.

> My intention was for the tag to be the address of the dispatch table so
> that functions (as long as they were at known offsets) could be accessed
> quickly but I don't think you liked that.

Hash functions are pretty quick.

Then again, if you don't repeat C++ and Java error confusing class-wide
and specific types, you would drastically reduce cases when dispatch
ever happens. Most of the code deals with statically known specific
types, ergo, no dispatch (or as some call it, "static dispatch").

Segmented dispatching tables is IMO a bigger problem unless you rebuild
affected dispatching each time you load a library or dynamically create
a new type etc.

Bart

unread,
Sep 3, 2021, 10:00:10 AM9/3/21
to
On 03/09/2021 13:27, James Harris wrote:
> On 30/08/2021 22:13, Dmitry A. Kazakov wrote:
>> On 2021-08-30 22:25, James Harris wrote:
>>> On 30/08/2021 12:36, Dmitry A. Kazakov wrote:
>
> ...
>
>>>> In the most general case a dispatching table belongs to the method,
>>>> not to the type. The tag would denote an N-th coordinate (plane) of
>>>> the multidimensional index in the table.
>>>
>>> That sounds weird. If the dispatch table belongs to the method what
>>> would it be indexed by?
>>
>> By the type, of course. It is pretty straightforward. Consider
>>
>>     Surface       Shape
>>       /  \         /  \
>>     PDF  SVG  Ellipse  Rectangle
>>
>>     Draw      | PDF | SVG |
>>     ----------+-----+-----+
>>     Ellipse   | ptr | ptr |
>>     ----------+-----+-----+
>>     Rectangle | ptr | ptr |
>>     ----------+-----+-----+
>
> That solution looks as though the shapes would have to know how to draw
> themselves on different surfaces. As such I doubt it would scale well as
> new shapes and new types of surface were added. But maybe it's purely
> illustrative of multiple dispatch.

(The example reminds me of actual drawing code I needed to write in my
applications in the 1990s. The script language included functions such
as a general-purpose one called Plot(), with 8 parameters (many optional
with defaults). The remaining docs are poor but I think they were:

Dest Destination code
D Info specific to a destination
Scan Scan type
View View point
P Drawing element
Clear Whether to clear destination first
Nsegments 1, or build results as multiple hoz bands
Flags Misc

This is for displaying a 3D model, which is represented in P as a
hierarchical tree of drawing elements. There were nearly 50 leaf objects
(including Rectangle and Ellipse!) plus named collections which in a HLL
would correspond to a named function call.

Dest is a code broadly indicating one of Window, Image, Printer/Plotter,
or collections of 3D Face or Lines. Here D might refer to the actual
window, a filename to capture output (eg. EPS print format) etc.

Scan indicates how the data should be processed, with 16 variations,
including Wireframe, Hidden, Shaded, Erase, Highlight, DXF dest, Expand
to lines/faces, ...

View specifies the viewpoint from which the result will be rendered, and
whether it uses perspective projection.

Flags controls many other factors: whether to draw grids or axes, page
outlines ...

OK, the example was clearly a simple one to illustrate some aspects of
how OOP works. But I see such examples using shapes all the time and
find them somewhat annoying.

It gives the impression that OOP can magically solve all the problems
that come up in actual applications, but the reality is always a lot
more complicated.

I'm rather sceptical about what OOP can do, and I have firm views on how
far a language should get involved in matters that should be the concern
of an application.

I also think that, if you really want to use OOP seriously, then you're
quite liable to just end up reinventing C++, as you meet the same
obstacles.)


What I take from it is that, however
> it's done, the address of the drawing function (the ptr in the above
> table) would need to be found from a key of three parameters
>
>   (shape type, surface type, function name)
>
> whereas single dispatch would find a function based on two parameters
> such as
>
>   (shape type, function name)
>
> Is that correct?

The shape should describe the shape itself and nothing else. If you want
to plot it, then extra information is needed:

p.plot(surface_type)

With perhaps that parameter being optional when there is a sensible
default. So here there is single dispatch based on shape type.

Anything else would be difficult to get your head around. If there are M
shape types and N surface types, then you've got MxN methods to write,
but where are you going to write them? Functions are written one after
the other, not in a table!

You might consider introducing overloaded functions, so the plot()
method for Rectangle is split into one each for PDF and SVG. Feature creep.

(My approach as I used was single dispatch - a simple switch - using
common code to reduce complex objects to basic edges, and then using
single dispatch on the 'surface' when drawing those edges.)

James Harris

unread,
Sep 3, 2021, 10:40:46 AM9/3/21
to
OK. I am now trying to understand what you mean by 'class wide'. Are you
saying that each drawing function would have to be passed the target
surface as a parameter - and that makes surface 'class wide'?

In practice, I'd imagine it's best for 'surfaces' to know how to draw on
themselves and they would export drawing primitives (such as point,
line, curve, rectangle, ellipse etc) which routines can call.

>
>>>> When you talk about a tag do you mean something that might, if it
>>>> cannot be optimised any better, result in a series of three or even
>>>> more lookups executed at run time before the required method is found?
>>>
>>> Yes. You need some unique type identifier, short enough to fit in a
>>> register.
>>
>> If the tag is the unique identifier you mention how would the compiled
>> code get from the tag to the correct routine?
>
> Via the dispatching table indexed by the tag. Tag could not be a dense
> index, true, so you would use a hash or binary search instead.

Are you thinking that to find each function the executable code would
carry out the hash or binary search at run time (if optimisation didn't
remove the need for it)?

My guess is that Python works that way: each name referred to in a
method is subjected to lookup in a series of tables (in something called
Method Resolution Order, MRO) starting with the object and ascending to
each class in the MRO in turn until a matching name is found. But that's
slow.

In compiled code, carrying out a hash lookup or a binary search just to
find which function to call would be faster than Python but still too slow.

>
>> My intention was for the tag to be the address of the dispatch table
>> so that functions (as long as they were at known offsets) could be
>> accessed quickly but I don't think you liked that.
>
> Hash functions are pretty quick.

Hash functions are not fast enough to find a call target! I was/am even
concerned about the extra cost of following a double pointer. Using a
hash table would be slower still.

>
> Then again, if you don't repeat C++ and Java error confusing class-wide
> and specific types, you would drastically reduce cases when dispatch
> ever happens. Most of the code deals with statically known specific
> types, ergo, no dispatch (or as some call it, "static dispatch").

I'm not sure I understand that but if you think C++ and Java got it
wrong do you think Ada got it right and what did it do differently?

>
> Segmented dispatching tables is IMO a bigger problem unless you rebuild
> affected dispatching each time you load a library or dynamically create
> a new type etc.
>

I presume that means when a large dispatch table reaches the limits of
its allocated memory and has to be split.


--
James Harris

James Harris

unread,
Sep 3, 2021, 10:59:19 AM9/3/21
to
On 03/09/2021 14:59, Bart wrote:
> On 03/09/2021 13:27, James Harris wrote:

...

> OK, the example was clearly a simple one to illustrate some aspects of
> how OOP works. But I see such examples using shapes all the time and
> find them somewhat annoying.

I recently had to take a first look at how I'd want a GUI to work. My
answer was basically sentient widgets which knew how to operate on data
and how to draw themselves. But the drawing would be by graphics
primitives which were provided by the surfaces on which the widgets
wanted to appear. Without having implemented any of this, yet, that
seems to me to be the best approach.

>
> It gives the impression that OOP can magically solve all the problems
> that come up in actual applications, but the reality is always a lot
> more complicated.

OOP has lots wrong with it - including multiple incompatible ways of
realising it.

ADTs, by contrast, are great. Each type comes with its own functions.

Don't throw the ADTs out with the OOP!

>
> I'm rather sceptical about what OOP can do, and I have firm views on how
> far a language should get involved in matters that should be the concern
> of an application.
>
> I also think that, if you really want to use OOP seriously, then you're
> quite liable to just end up reinventing C++, as you meet the same
> obstacles.)

I don't know much about C++ but ending up with similar solutions is
unlikely as its design goals are different.
>
>
>  What I take from it is that, however
>> it's done, the address of the drawing function (the ptr in the above
>> table) would need to be found from a key of three parameters
>>
>>    (shape type, surface type, function name)
>>
>> whereas single dispatch would find a function based on two parameters
>> such as
>>
>>    (shape type, function name)
>>
>> Is that correct?
>
> The shape should describe the shape itself and nothing else. If you want
> to plot it, then extra information is needed:
>
>    p.plot(surface_type)
>
> With perhaps that parameter being optional when there is a sensible
> default. So here there is single dispatch based on shape type.
>
> Anything else would be difficult to get your head around. If there are M
> shape types and N surface types, then you've got MxN methods to write,

Indeed, that's exactly the issue. Better to have the M shapes (or
widgets) use a set of drawing primitives, and have those provided by the
N surface types, IMO.


> but where are you going to write them? Functions are written one after
> the other, not in a table!
>
> You might consider introducing overloaded functions, so the plot()
> method for Rectangle is split into one each for PDF and SVG. Feature creep.
>
> (My approach as I used was single dispatch - a simple switch - using
> common code to reduce complex objects to basic edges, and then using
> single dispatch on the 'surface' when drawing those edges.)
>

Why not say that each surface has to provide a set of primitives that
other programs can use to draw on an instance of that type of surface?


--
James Harris

Dmitry A. Kazakov

unread,
Sep 3, 2021, 11:28:08 AM9/3/21
to
On 2021-09-03 16:59, James Harris wrote:

> Why not say that each surface has to provide a set of primitives that
> other programs can use to draw on an instance of that type of surface?

Because that set is open ended. OOP allows you

1. adding new primitives (new shapes)

2. using primitives with new surfaces (inheritance)

All counter proposals are static and do not scale in practice, which is
why OOP is used so widely in large projects regardless any difficulties.
There is simple no other way to do it, economically, there is no other way.

Dmitry A. Kazakov

unread,
Sep 3, 2021, 11:46:13 AM9/3/21
to
Class-wide = anything from the members of the class.

> Are you
> saying that each drawing function would have to be passed the target
> surface as a parameter - and that makes surface 'class wide'?

Not only passed, it must accept any surface of any surface type. In
particular it means that it must know the type tag.

> In practice, I'd imagine it's best for 'surfaces' to know how to draw on
> themselves and they would export drawing primitives (such as point,
> line, curve, rectangle, ellipse etc) which routines can call.

I commented on that in another post.

>>>>> When you talk about a tag do you mean something that might, if it
>>>>> cannot be optimised any better, result in a series of three or even
>>>>> more lookups executed at run time before the required method is found?
>>>>
>>>> Yes. You need some unique type identifier, short enough to fit in a
>>>> register.
>>>
>>> If the tag is the unique identifier you mention how would the
>>> compiled code get from the tag to the correct routine?
>>
>> Via the dispatching table indexed by the tag. Tag could not be a dense
>> index, true, so you would use a hash or binary search instead.
>
> Are you thinking that to find each function the executable code would
> carry out the hash or binary search at run time (if optimisation didn't
> remove the need for it)?

https://en.wikipedia.org/wiki/Hash_function

Tag -> 1st Foo's parameter Hash = I1
Tag -> 2nd Foo's parameter Hash = I2
..
Tag -> Nth Foo's parameter Hash = IN

Foo_Table (I1, I2, ... IN) --> Target pointer

There exist multi-dimensional hash techniques which would allow you to
do it in less steps. I do not know how effective they are.

>> Hash functions are pretty quick.
>
> Hash functions are not fast enough to find a call target! I was/am even
> concerned about the extra cost of following a double pointer. Using a
> hash table would be slower still.

Faster than anything the user would have to write otherwise, in order to
emulate dispatch.

Then performance is never a big concern. Safety is. User emulations of
dispatch would be nested if-statements, which are not only significantly
slower O(n), but inherently error-prone and unmaintainable.

>> Then again, if you don't repeat C++ and Java error confusing
>> class-wide and specific types, you would drastically reduce cases when
>> dispatch ever happens. Most of the code deals with statically known
>> specific types, ergo, no dispatch (or as some call it, "static
>> dispatch").
>
> I'm not sure I understand that but if you think C++ and Java got it
> wrong do you think Ada got it right and what did it do differently?

If foo and baz are virtual functions of T then

void foo (T& x)
{
x.baz (); -- Dispatches
}

In Ada

procedure Foo (X : T) is
begin
X.Baz; -- Does not dispatch, the type is established to be T
end Foo;

>> Segmented dispatching tables is IMO a bigger problem unless you
>> rebuild affected dispatching each time you load a library or
>> dynamically create a new type etc.
>
> I presume that means when a large dispatch table reaches the limits of
> its allocated memory and has to be split.

Not large, but consisting of several parts, so that you must linearly
search through all of them.

James Harris

unread,
Sep 3, 2021, 12:43:16 PM9/3/21
to
On 03/09/2021 16:28, Dmitry A. Kazakov wrote:
> On 2021-09-03 16:59, James Harris wrote:
>
>> Why not say that each surface has to provide a set of primitives that
>> other programs can use to draw on an instance of that type of surface?
>
> Because that set is open ended. OOP allows you
>
> 1. adding new primitives (new shapes)
>
> 2. using primitives with new surfaces (inheritance)
>
> All counter proposals are static and do not scale in practice, which is
> why OOP is used so widely in large projects regardless any difficulties.
> There is simple no other way to do it, economically, there is no other way.
>

Your example was good in showing multiple dispatch but I'm not so sure
it's a good solution in this case, let alone the only way!

The rights and wrongs of the approach are OT so all I'll say is that M*N
does not scale and that new shapes can be constructed from primitives.

Unless I am missing something the cartesian M*N would be costly and so
even economically, as you mention, I cannot agree that there is no other
way.


--
James Harris

Dmitry A. Kazakov

unread,
Sep 3, 2021, 1:27:59 PM9/3/21
to
On 2021-09-03 18:43, James Harris wrote:
> On 03/09/2021 16:28, Dmitry A. Kazakov wrote:
>> On 2021-09-03 16:59, James Harris wrote:
>>
>>> Why not say that each surface has to provide a set of primitives that
>>> other programs can use to draw on an instance of that type of surface?
>>
>> Because that set is open ended. OOP allows you
>>
>> 1. adding new primitives (new shapes)
>>
>> 2. using primitives with new surfaces (inheritance)
>>
>> All counter proposals are static and do not scale in practice, which
>> is why OOP is used so widely in large projects regardless any
>> difficulties. There is simple no other way to do it, economically,
>> there is no other way.
>>
>
> Your example was good in showing multiple dispatch but I'm not so sure
> it's a good solution in this case, let alone the only way!

Of course it is. Write a program that renders a list of shapes in a
widget in a way that you would not need to modify each time another
developers team creates a new shape or supports another surface.

> The rights and wrongs of the approach are OT so all I'll say is that M*N
> does not scale and that new shapes can be constructed from primitives.

You did not thought it through. A set of primitives supported by all
shapes is already a class. This is exactly what a class of shapes is.
Any client of the class expects to find an implementation of any
primitive in any surface.

Remove the class, you will need to write a new client for each surface.

> Unless I am missing something the cartesian M*N would be costly and so
> even economically, as you mention,

And its existence is a fact of life. Shapes exist, surfaces exist,
drawing exist.

> I cannot agree that there is no other way.

Show the other way to implement and use M*N operations safely.

Bart

unread,
Sep 3, 2021, 1:45:10 PM9/3/21
to
Say you have an enum type listing M different shapes, and another
listing N different surfaces:

* Create a 2D MxN array H of function pointers

* Populate the table (see below)

* To plot any shape S, call the function H[shape_tag, surface_type]


To populate the table (this is what I can do in my language; probably
it's not possible in Ada or others, it might need external
code-generation, or auxiliary, manually-created tables of available
combinations):

* Write the functions for the combinations of Shape and Surface that are
supported; name the functions according to some agreed format. (These
might be methods; that is just a detail)

* Have an initialisation routine that scans all functions in the program
looking for those that implement the plotting (it will look at the name)

* For each such function, extract the shape X and surface Y it
implements, from the name. Fill in H[X,Y] with a reference to the function

* For all table entries not yet populated, fill in a reference to an
error-handler or default-handling function. 'nil' is not recommended!


Dmitry A. Kazakov

unread,
Sep 3, 2021, 2:01:40 PM9/3/21
to
On 2021-09-03 19:44, Bart wrote:
> On 03/09/2021 18:27, Dmitry A. Kazakov wrote:
>> On 2021-09-03 18:43, James Harris wrote:
>
>>
>>> Unless I am missing something the cartesian M*N would be costly and
>>> so even economically, as you mention,
>>
>> And its existence is a fact of life. Shapes exist, surfaces exist,
>> drawing exist.
>>
>>> I cannot agree that there is no other way.
>>
>> Show the other way to implement and use M*N operations safely.
>
> Say you have an enum type listing M different shapes, and another
> listing N different surfaces:
>
> * Create a 2D MxN array H of function pointers

Failed. The requirement included different teams of developers and
different times rows and columns (types) added.

Bart

unread,
Sep 3, 2021, 4:07:03 PM9/3/21
to
How is that relevant? Presumably all the necessary information (method
definitions or their interfaces) will be visible to the (presumably AOT)
compiler when the application is built?

Then who cares who wrote it or when.

Anyway who are you to say how my application will be organised.

Dmitry A. Kazakov

unread,
Sep 3, 2021, 4:25:21 PM9/3/21
to
Nope. An unknown number of dynamically loaded shared libraries may
contain types derived from Shape and/or Surface. The libraries are
loaded during run-time per scanning a list of directories containing DLL
plug-ins.

> Then who cares who wrote it or when.

You must care. You must scan all code to determine N and M and find all
functions for all slots. Then you must recompile everything.

> Anyway who are you to say how my application will be organised.

A provider of operations.

The point is that in real projects nobody has control or knowledge about
the types comprising dispatching tables. It is called late bindings for
a reason.

You may have your wet dreams about how you organize *your* application,
but the fact is that nobody develops software this way for longer than
three decades.

Bart

unread,
Sep 3, 2021, 4:48:42 PM9/3/21
to
On 03/09/2021 21:25, Dmitry A. Kazakov wrote:
> On 2021-09-03 22:06, Bart wrote:
>> On 03/09/2021 19:01, Dmitry A. Kazakov wrote:
>>> On 2021-09-03 19:44, Bart wrote:
>>>> On 03/09/2021 18:27, Dmitry A. Kazakov wrote:
>>>>> On 2021-09-03 18:43, James Harris wrote:
>>>>
>>>>>
>>>>>> Unless I am missing something the cartesian M*N would be costly
>>>>>> and so even economically, as you mention,
>>>>>
>>>>> And its existence is a fact of life. Shapes exist, surfaces exist,
>>>>> drawing exist.
>>>>>
>>>>>> I cannot agree that there is no other way.
>>>>>
>>>>> Show the other way to implement and use M*N operations safely.
>>>>
>>>> Say you have an enum type listing M different shapes, and another
>>>> listing N different surfaces:
>>>>
>>>> * Create a 2D MxN array H of function pointers
>>>
>>> Failed. The requirement included different teams of developers and
>>> different times rows and columns (types) added.
>>
>> How is that relevant? Presumably all the necessary information (method
>> definitions or their interfaces) will be visible to the (presumably
>> AOT) compiler when the application is built?

> Nope. An unknown number of dynamically loaded shared libraries may
> contain types derived from Shape and/or Surface. The libraries are
> loaded during run-time per scanning a list of directories containing DLL
> plug-ins.

Then any approach using tables attached to class instances is going to
fail too, if every time someone makes a suggestion, you are going to
think up more unlikely scenarios to sabotage the idea.



>
>> Then who cares who wrote it or when.
>
> You must care. You must scan all code to determine N and M and find all
> functions for all slots. Then you must recompile everything.

That's what I said you have to do. a For static application which knows
all the shapes and all the surfaces.

Now you are saying you can NEVER do that because SOME applications have
shapes and surfaces defined at runtime.

Does that make sense? So I shouldn't buy a car to do shopping etc,
because I won't be able to drive from London to New York?

Anyway, when stuff like that happens at runtime, then it sounds more a
job for a dynamic language than a static one.

(In my 1990s application, modules of my script language could be
hot-loaded; commands and functionality could be added /while my
application was running/.)

>> Anyway who are you to say how my application will be organised.
>
> A provider of operations.
>
> The point is that in real projects nobody has control or knowledge about
> the types comprising dispatching tables. It is called late bindings for
> a reason.

This is not about types. It's about enumerations. What you're saying is
that I can't use a fixed enumeration like:

(Mon, Tue, Wed, Thu, Fri, Sat, Sun)

as the basis for a dispatch table, because new days of the week could be
created while the program runs.

> You may have your wet dreams about how you organize *your* application,
> but the fact is that nobody develops software this way for longer than
> three decades.

So, go back to that inheritance diagram of shapes and surfaces; you're
also saying that it's a waste of time for a compiler to even look at
that, because the real inheritance graph is not known until runtime?

In that case we might as well all pack up and go home.

Dmitry A. Kazakov

unread,
Sep 3, 2021, 5:42:34 PM9/3/21
to
On 2021-09-03 22:48, Bart wrote:
> On 03/09/2021 21:25, Dmitry A. Kazakov wrote:

>> Nope. An unknown number of dynamically loaded shared libraries may
>> contain types derived from Shape and/or Surface. The libraries are
>> loaded during run-time per scanning a list of directories containing
>> DLL plug-ins.
>
> Then any approach using tables attached to class instances is going to
> fail too,

It works perfectly well with C++, Ada and dozens other OOPL. You can try
it yourself. Get a grip, you can have a type T in one library. Derive S
from T in another. Load that library at run-time. Create an object of
the type S. Pass it to a function in the first library and dispatch from
that function to a method overridden in the dynamically loaded library!

> if every time someone makes a suggestion, you are going to
> think up more unlikely scenarios to sabotage the idea.

What? Plug-ins are unlikely scenario? Do you live under a rock?

>>> Then who cares who wrote it or when.
>>
>> You must care. You must scan all code to determine N and M and find
>> all functions for all slots. Then you must recompile everything.
>
> That's what I said you have to do. a For static application which knows
> all the shapes and all the surfaces.
>
> Now you are saying you can NEVER do that because SOME applications have
> shapes and surfaces defined at runtime.

No they are defined at compile time in the components which know nothing
about each other. This is called loose coupling design principle:

https://en.wikipedia.org/wiki/Loose_coupling

>> The point is that in real projects nobody has control or knowledge
>> about the types comprising dispatching tables. It is called late
>> bindings for a reason.
>
> This is not about types. It's about enumerations.

You can treat tags of all types from some class as an enumeration.
Dispatching is equivalent to a switch statement over the tags.

One of the motivations behind OOPL was to allow alternatives of such a
monstrous switch to be located in different modules developed and
maintained independently on each other.

Operating systems used dispatch since very beginning. Even the term
dispatch is from there, I believe. Implementations of kernels written in
assembler dispatched from interrupt handlers to the driver code. And,
yes, surprise, drivers were dynamically loadable half a century ago.
OOPLs are there merely to support such software architectures.

Bart

unread,
Sep 3, 2021, 8:49:28 PM9/3/21
to
On 03/09/2021 22:42, Dmitry A. Kazakov wrote:
> On 2021-09-03 22:48, Bart wrote:
>> On 03/09/2021 21:25, Dmitry A. Kazakov wrote:
>
>>> Nope. An unknown number of dynamically loaded shared libraries may
>>> contain types derived from Shape and/or Surface. The libraries are
>>> loaded during run-time per scanning a list of directories containing
>>> DLL plug-ins.
>>
>> Then any approach using tables attached to class instances is going to
>> fail too,
>
> It works perfectly well with C++, Ada and dozens other OOPL. You can try
> it yourself. Get a grip, you can have a type T in one library. Derive S
> from T in another. Load that library at run-time. Create an object of
> the type S. Pass it to a function in the first library and dispatch from
> that function to a method overridden in the dynamically loaded library!

So my H[shape,surface] table /can/ work?

You want to plot S, get X/Y as its shape tag, plus the desired surface
type, and call the handler. The difference is that building the handler
table is more dynamic than it was (if you remember, it was populated at
runtime anyway).

X and Y are normally linear enumarations; if not then H[,] becomes a
hashtable instead (with a key of 2 values).


>> if every time someone makes a suggestion, you are going to think up
>> more unlikely scenarios to sabotage the idea.
>
> What? Plug-ins are unlikely scenario? Do you live under a rock?

What do you think my hot-loaded modules were? Just crude versions of
plug-ins from 25 years ago.

You do like to bamboozle people don't you with yours seems to be to
bamboozle people with scenarios from giant, enterprise languages, as
though there's no place for anything else.

I want to create my own recipes in my own kitchen using my own methods.
You look down your nose at that because those methods aren't suitable
for industrial scale food production.

OK. But my own meals will still taste better for me!

There's a place for your languages, and there's a place for ones like mine.

>> Now you are saying you can NEVER do that because SOME applications
>> have shapes and surfaces defined at runtime.
>
> No they are defined at compile time in the components which know nothing
> about each other.

Well, I create my 1990s CAD application without any of these methods. I
seem to recall that it worked pretty well.

One problem that I recall was that it fitted easily onto a single floppy
disk. Since it was rather expensive product, it was shipped on CD to
make out was it was much bigger than it was! Even then people had gotten
used to bloated, cumbersome applications so that's what they expected.

Dmitry A. Kazakov

unread,
Sep 4, 2021, 4:27:21 AM9/4/21
to
On 2021-09-04 02:49, Bart wrote:
> On 03/09/2021 22:42, Dmitry A. Kazakov wrote:
>> On 2021-09-03 22:48, Bart wrote:
>>> On 03/09/2021 21:25, Dmitry A. Kazakov wrote:
>>
>>>> Nope. An unknown number of dynamically loaded shared libraries may
>>>> contain types derived from Shape and/or Surface. The libraries are
>>>> loaded during run-time per scanning a list of directories containing
>>>> DLL plug-ins.
>>>
>>> Then any approach using tables attached to class instances is going
>>> to fail too,
>>
>> It works perfectly well with C++, Ada and dozens other OOPL. You can
>> try it yourself. Get a grip, you can have a type T in one library.
>> Derive S from T in another. Load that library at run-time. Create an
>> object of the type S. Pass it to a function in the first library and
>> dispatch from that function to a method overridden in the dynamically
>> loaded library!
>
> So my H[shape,surface] table /can/ work?

Dispatching tables work. It is the methods of creating and maintaining
them safely that is the issue.

> There's a place for your languages, and there's a place for ones like mine.

Nobody forces you to implement any feature. The grapes are sour.

James Harris

unread,
Sep 4, 2021, 1:43:36 PM9/4/21
to
On 03/09/2021 16:46, Dmitry A. Kazakov wrote:
> On 2021-09-03 16:40, James Harris wrote:
>> On 03/09/2021 14:39, Dmitry A. Kazakov wrote:

...

>>> Hash functions are pretty quick.
>>
>> Hash functions are not fast enough to find a call target! I was/am
>> even concerned about the extra cost of following a double pointer.
>> Using a hash table would be slower still.
>
> Faster than anything the user would have to write otherwise, in order to
> emulate dispatch.

On the contrary, where the scheme I mentioned would have just two
dereferences to get the address of the required function ... your scheme
would have multiple dereferences, a function call, a hash calculation,
and possibly collision resolution.

>
> Then performance is never a big concern.

I'm really surprised that such a lookup would even be considered. It
would be much slower than a direct function call.


> Safety is. User emulations of
> dispatch would be nested if-statements, which are not only significantly
> slower O(n), but inherently error-prone and unmaintainable.

That's an interesting point from the point of view of performance. A
case structure would nominally be faster than sequential search but with
a potentially complex key such as your tag that case statement may well
have to do hash lookup or binary search. A simple index or a pointer
would be faster.

What is your objection to my proposed solution? To be clear, the idea is
to have the object's pointer to link directly to the method table as
follows.

object: |___ptr___|___field0___|___field1___| etc
|
V
|__type___|__reserved__|__ptr__|__ptr__|__ptr__|__ptr__|

I've shown one dimension but there could be as many as necessary for
multiple dispatch.

One area in which your scheme would do well is if there were a large,
sparse, multidimensional table of method pointers. But in that case the
routine I would dispatch to could still carry out its own lookup to find
the target function.

Having said all that, IMO the static dispatch that David mentioned would
be significantly faster than either of the above approaches, though I
gather from your point about closures that it can be impossible at
compile time to predict what the type will be.


>
>>> Then again, if you don't repeat C++ and Java error confusing
>>> class-wide and specific types, you would drastically reduce cases
>>> when dispatch ever happens. Most of the code deals with statically
>>> known specific types, ergo, no dispatch (or as some call it, "static
>>> dispatch").
>>
>> I'm not sure I understand that but if you think C++ and Java got it
>> wrong do you think Ada got it right and what did it do differently?
>
> If foo and baz are virtual functions of T then
>
>    void foo (T& x)
>    {
>       x.baz (); -- Dispatches
>    }

I didn't know what a virtual function was but AIUI it's a function which
has to be overridden in a descendant class. (?)

If that's right is the issue that Ada prohibits virtual functions?

>
> In Ada
>
>    procedure Foo (X : T) is
>    begin
>       X.Baz; -- Does not dispatch, the type is established to be T
>    end Foo;

So T.Baz has to be in T and cannot be overridden in a subclass of T?


--
James Harris

Dmitry A. Kazakov

unread,
Sep 4, 2021, 3:56:26 PM9/4/21
to
On 2021-09-04 19:43, James Harris wrote:
> On 03/09/2021 16:46, Dmitry A. Kazakov wrote:
>> On 2021-09-03 16:40, James Harris wrote:
>>> On 03/09/2021 14:39, Dmitry A. Kazakov wrote:
>
> ...
>
>>>> Hash functions are pretty quick.
>>>
>>> Hash functions are not fast enough to find a call target! I was/am
>>> even concerned about the extra cost of following a double pointer.
>>> Using a hash table would be slower still.
>>
>> Faster than anything the user would have to write otherwise, in order
>> to emulate dispatch.
>
> On the contrary, where the scheme I mentioned would have just two
> dereferences to get the address of the required function ... your scheme
> would have multiple dereferences, a function call, a hash calculation,
> and possibly collision resolution.

You did not propose anything for the case of full dispatch. Certainly
specialized cases like single dispatch or multi-methods could be handled
differently.

The point was, if you do not support this in the language, the
programmer will have to implement it, which is always slower and always
error prone.

>> Then performance is never a big concern.
>
> I'm really surprised that such a lookup would even be considered. It
> would be much slower than a direct function call.

Again, what is the alternative?

GTK is written in C. C has no classes. So GTK emulates them using the
GObject library. Guess, what is faster, C++ implementation of dispatch
or the GObject kludge?

> What is your objection to my proposed solution? To be clear, the idea is
> to have the object's pointer to link directly to the method table as
> follows.
>
>   object:   |___ptr___|___field0___|___field1___| etc
>                  |
>                  V
>             |__type___|__reserved__|__ptr__|__ptr__|__ptr__|__ptr__|

Show how this is supposed to work with full multiple dispatch, with
multi-methods, with multiple inheritance, with constructors and
destructors, with scalar types.

BTW, there is a book by Bjarne Stroustrup with a detailed discussion
regarding implementation of objects and dispatching tables (virtual
tables) in C++.

https://www.stroustrup.com/arm.html

I suggest you read it first in order to get an elementary understanding
of the problematic. And keep in mind that C++ has single dispatch.

> I've shown one dimension but there could be as many as necessary for
> multiple dispatch.

No, because there will be many objects in the call.

> Having said all that, IMO the static dispatch that David mentioned would
> be significantly faster than either of the above approaches, though I
> gather from your point about closures that it can be impossible at
> compile time to predict what the type will be.

Static dispatch is just a direct call, provided you designed the
language properly.

>>>> Then again, if you don't repeat C++ and Java error confusing
>>>> class-wide and specific types, you would drastically reduce cases
>>>> when dispatch ever happens. Most of the code deals with statically
>>>> known specific types, ergo, no dispatch (or as some call it, "static
>>>> dispatch").
>>>
>>> I'm not sure I understand that but if you think C++ and Java got it
>>> wrong do you think Ada got it right and what did it do differently?
>>
>> If foo and baz are virtual functions of T then
>>
>>     void foo (T& x)
>>     {
>>        x.baz (); -- Dispatches
>>     }
>
> I didn't know what a virtual function was but AIUI it's a function which
> has to be overridden in a descendant class. (?)

"Virtual function" is C++ term for method.

> If that's right is the issue that Ada prohibits virtual functions?

No, Ada's name for virtual function is "primitive operation."

>> In Ada
>>
>>     procedure Foo (X : T) is
>>     begin
>>        X.Baz; -- Does not dispatch, the type is established to be T
>>     end Foo;
>
> So T.Baz has to be in T and cannot be overridden in a subclass of T?

X.Baz does not dispatch in Ada. In C++ X.baz () it does dispatch.

Bart

unread,
Sep 4, 2021, 4:19:05 PM9/4/21
to
[Original DAK post lost]

DAK:

On 2021-09-04 02:49, Bart wrote:


>> There's a place for your languages, and there's a place for ones
like mine.

> Nobody forces you to implement any feature. The grapes are sour.


There is a certain amount of satisfaction in having, over exactly 40
years now, used almost exclusively my own languages, implementations and
tools, and all self-hosted.

One key to doing that is to be selective about what features I've added.
I've kept things simple, managable and understandable. Which is not
always easy.

Your (DAK's) role seems to be the opposite: turn everything into
something harder, less manageable, and incomprehensible!

James Harris

unread,
Sep 4, 2021, 4:44:57 PM9/4/21
to
On 30/08/2021 13:08, Bart wrote:
> On 30/08/2021 11:23, James Harris wrote:

...

>>    P.plot()

...

> Why, in your example, wouldn't the compiler know the type of P?

The compiler would know if P were created and manipulated wholly by the
current function but P could be a formal parameter which could be one of
a number of types. I can think of two cases. One is inheritance

type A
|
type B

then a parameter declared to be of type A could include a function from
either type A or type B. (Or is it the other way round?)

function f(A P)

A given activation of function f could be required to use either A's or
B's version of a function.

The other is a variant

typedef C = int or float

function f(C P)

A given activation of function f could have been passed an int or a float.


>
> Do you have a more elaborate example that shows off the advantages of
> OO, or whatever you are trying to do, that would be too clunky in C?
>

I'm certainly not trying to promote OOP. I have serious doubts about the
inheritance part of it.


--
James Harris

Dmitry A. Kazakov

unread,
Sep 4, 2021, 5:01:01 PM9/4/21
to
Right, different priorities. Simplicity for the compiler designer
translates into complexity for the programmer.

Bart

unread,
Sep 4, 2021, 5:35:47 PM9/4/21
to
On 04/09/2021 21:44, James Harris wrote:
> On 30/08/2021 13:08, Bart wrote:
>> On 30/08/2021 11:23, James Harris wrote:
>
> ...
>
>>>    P.plot()
>
> ...
>
>> Why, in your example, wouldn't the compiler know the type of P?
>
> The compiler would know if P were created and manipulated wholly by the
> current function but P could be a formal parameter which could be one of
> a number of types. I can think of two cases. One is inheritance
>
>   type A
>     |
>   type B
>
> then a parameter declared to be of type A could include a function from
> either type A or type B. (Or is it the other way round?)
>
>   function f(A P)
>
> A given activation of function f could be required to use either A's or
> B's version of a function.

I didn't understand that at all. I guess inheritance is not for me.

I have played with chains of derived types like this:

A -> B -> C -> D
X -> Y -> Z

Then had to decide what to do about C+Y or even A+B*D, since some ops
may be inherited, some overloaded, and what do about auto coercions in
D:=A or Y:=A.

So, I don't have distinct derived types; only aliases for the same type,
to make like simpler.


> The other is a variant
>
>   typedef C = int or float
>
>   function f(C P)
>
> A given activation of function f could have been passed an int or a float.

Here it's easy to understand your intention. But possibly more
challenging to implement.

That looks a bit like a Haskell type, but if you're not implementing
that, would f be a generic function? Or would C be a variant type along
the lines of C++'s? Will there be overloads of f()?

Just getting the basics working (eg. A+B if both are of type C) would be
challenging here.

So perhaps pin down more precisely what features you're thinking of having.

Bart

unread,
Sep 4, 2021, 5:37:12 PM9/4/21
to
You might want to look back at the fragments of my language that I've
posted. Such as 'print x'.

Dmitry A. Kazakov

unread,
Sep 5, 2021, 3:32:51 AM9/5/21
to
I do not need it. But if you ask, print x with the sign as a ternary
number aligned at the right margin of the field of 10 character width,
and do not forget it must be superscript.

Dmitry A. Kazakov

unread,
Sep 5, 2021, 3:58:07 AM9/5/21
to
On 2021-09-04 23:35, Bart wrote:

>> The other is a variant
>>
>>    typedef C = int or float
>>
>>    function f(C P)
>>
>> A given activation of function f could have been passed an int or a
>> float.
>
> Here it's easy to understand your intention. But possibly more
> challenging to implement.

This is quite easy, since no inheritance or [multi-]methods involved.
[f() is not a method]

> That looks a bit like a Haskell type,

C is a class-wide type of the class {int, float}.

> but if you're not implementing
> that, would f be a generic function?

f() is a class-wide function.

> Or would C be a variant type along
> the lines of C++'s?

One can imagine it as a variant type with the tag selecting the value.
The values of C would be like these:

(int'tag, 123)
(float'tag, 3.1415)

The representation of tags is irrelevant in this case.

> Will there be overloads of f()?

Why not? If you rather meant overriding, then no, only methods can be
overridden.

> Just getting the basics working (eg. A+B if both are of type C) would be
> challenging here.

function "+" (Left, Right : C) return C is
begin
if Left in int then -- Left.tag = int'tag
if Right in int then
return (int'tag, int (Left) + int (Right);
else
return (float'tag, float (int (Left)) + float (Right);
end if;
else
if Right in int then
return (float'tag, float (Left) + float (int (Right));
else
return (float'tag, float (Left) + float (Right);
end if;
end if;
end "+";

int (Left) is what C++ calls dynamic type cast. It checks the type tag
to be int'tag and then returns the integer value.

Here we see the an obvious problem, the above is nothing but a manual
implementation of a dispatch on the multi-method. "+" should have been a
multi-method, not a class-wide operation. Then each choice in the
cascaded ifs would be a separate overriding/inheriting body.

Bart

unread,
Sep 5, 2021, 6:24:38 AM9/5/21
to
I can manage this:

a:=730

println a
fprintln "|#|", a:"x3 10 jr"

This shows:

730
| 1000001|

I can't do much about superscript; that depends on the capabilities of
the console window I'm using.

In any case I consider aspects of typeface, style, size, colour etc
outside the remit of 'print' which is mainly concerned with turning
expressions into plain text, eg. serialising.

You might as well ask whether the output of a JSON or XML converter will
render strings as superscript.

But, show us all how it's done in Ada, and in a way that will work on my PC.

(I know have an Ada compiler installed. I've added it to one of my
benchmarks here:
https://github.com/sal55/langs/blob/master/Compilertest1.md. Ada (Gnat)
is just over halfway down.)

Dmitry A. Kazakov

unread,
Sep 5, 2021, 6:54:35 AM9/5/21
to
Looks like broken checksum rather than a program...

> This shows:
>
>     730
>     |   1000001|

Where is the +?

> I can't do much about superscript; that depends on the capabilities of
> the console window I'm using.

Windows console supports UTF-8. See the code page command CHCP.

Bart

unread,
Sep 5, 2021, 7:15:11 AM9/5/21
to
On 05/09/2021 11:54, Dmitry A. Kazakov wrote:
> On 2021-09-05 12:24, Bart wrote:

>>
>>      730
>>      |   1000001|
>
> Where is the +?


Oops, I forgot. Try:

println a : "+x3 10 jr"

I've dispensed with the format which added the |...| delimiters.

If you don't like the way those attributes are specified, then please
post how it's done in Ada.

If you like a more long-winded style, then I could use this (dynamic code):

a:=730

println a:fmt(dosign:"+", width:10, base:3, just:"right")

....

function fmt(width=0, dosign="", just="", base=10)=
s:=dosign
if width then s+:=tostr(width) fi
if base<>10 then s+:="x"+tostr(base) fi
if just then s+:="j"+leftstr(just) fi
return s
end



>> I can't do much about superscript; that depends on the capabilities of
>> the console window I'm using.
>
> Windows console supports UTF-8. See the code page command CHCP.

It might do, but it doesn't belong in the serialisation options of a
single print item. If superscript can be turned on and off by string
escape sequences, then my example becomes:

println sson, a:"x3...", ssoff

But this is no longer within the language.


James Harris

unread,
Sep 9, 2021, 1:25:06 PM9/9/21
to
On 04/09/2021 22:35, Bart wrote:
> On 04/09/2021 21:44, James Harris wrote:
>> On 30/08/2021 13:08, Bart wrote:
>>> On 30/08/2021 11:23, James Harris wrote:
>>
>> ...
>>
>>>>    P.plot()
>>
>> ...
>>
>>> Why, in your example, wouldn't the compiler know the type of P?
>>
>> The compiler would know if P were created and manipulated wholly by
>> the current function but P could be a formal parameter which could be
>> one of a number of types. I can think of two cases. One is inheritance
>>
>>    type A
>>      |
>>    type B
>>
>> then a parameter declared to be of type A could include a function
>> from either type A or type B. (Or is it the other way round?)
>>
>>    function f(A P)
>>
>> A given activation of function f could be required to use either A's
>> or B's version of a function.
>
> I didn't understand that at all. I guess inheritance is not for me.

Inheritance is maybe not for /anyone/! It encourages composition by tall
hierarchies rather than by flattening and simplifying but that's another
matter. The model I had in mind for that example is a simple one:

An object of type A has fields Aa, Ab and Ac

An object of type B has all of A's fields and some of its own.

As long as both types of object have A's fields at the same offsets then
code which expects an A can still work on A's fields and doesn't care
whether the object is really of type A or of type B.

...

>> The other is a variant
>>
>>    typedef C = int or float
>>
>>    function f(C P)
>>
>> A given activation of function f could have been passed an int or a
>> float.
>
> Here it's easy to understand your intention. But possibly more
> challenging to implement.

Could function f not just use a switch? As in

switch P.tag
case int
/* int processing */
case float
/* float processing */
end switch

(That switch could either be explicit in the source code or implicit and
generated by the compiler.)

>
> That looks a bit like a Haskell type, but if you're not implementing
> that, would f be a generic function? Or would C be a variant type along
> the lines of C++'s? Will there be overloads of f()?

To answer your questions: in the example, I had in mind C being a
variant type; f could be a simple function and would not have to be
overloaded.

I read a nice, simple description the other day:

* Polymorphism is about one function handling multiple types.

* Generics is about producing a custom function for each type.

One could add a variant type (and a switch on the type) to that in order
to achieve the same effect.

>
> Just getting the basics working (eg. A+B if both are of type C) would be
> challenging here.
>
> So perhaps pin down more precisely what features you're thinking of having.
>

Frankly, I'd like the programmer to be able to work purely in terms of
the logic of his algorithm and for the compiler to pick the implementation.

I am looking into various options but there are plenty of 'em!


--
James Harris

Bart

unread,
Sep 9, 2021, 2:10:19 PM9/9/21
to
There is also the case where B overrides some fields of A. They might be
of different types and sizes from A's, which makes keeping them at the
same offsets hard, and cause other difficulties in using a value X which
can be of either A or B types.

> As long as both types of object have A's fields at the same offsets then
> code which expects an A can still work on A's fields and doesn't care
> whether the object is really of type A or of type B.

In my dynamic language, none of these matters. The offsets can be
different, as they are picked up at runtime; and the types can be
different anyway.

But I ran into trouble with static code.


> ...
>
>>> The other is a variant
>>>
>>>    typedef C = int or float
>>>
>>>    function f(C P)
>>>
>>> A given activation of function f could have been passed an int or a
>>> float.
>>
>> Here it's easy to understand your intention. But possibly more
>> challenging to implement.
>
> Could function f not just use a switch?

What, for each reference to P inside the body? What about when you have
the expression P+Q+R, each of which has type C; will you need an 8-way
switch, or ... ?


As in
>
>   switch P.tag
>     case int
>       /* int processing */
>     case float
>       /* float processing */
>   end switch

... would the switch be on entry to the function, where it looks at all
parameter combinations and then calls a specific instantiation for that?
That would just be generics, and would preferably be done at a
compile-time by a compiler that is aware of the actual types.

> (That switch could either be explicit in the source code or implicit and
> generated by the compiler.)

>>
>> That looks a bit like a Haskell type, but if you're not implementing
>> that, would f be a generic function? Or would C be a variant type
>> along the lines of C++'s? Will there be overloads of f()?
>
> To answer your questions: in the example, I had in mind C being a
> variant type; f could be a simple function and would not have to be
> overloaded.
>
> I read a nice, simple description the other day:
>
> * Polymorphism is about one function handling multiple types.
>
> * Generics is about producing a custom function for each type.
>
> One could add a variant type (and a switch on the type) to that in order
> to achieve the same effect.

I use type-dispatching in dynamic code; I have difficulty seeing how it
could be applied efficiently to static code.

How about looking at an expression such as A:=B+C*D, where these are all
variants, and mocking up what generate code might look like. Try also
X[i] where X has variant elements, and i might be variant too.

Or maybe write some C++ code and see what the compiler generates.



> Frankly, I'd like the programmer to be able to work purely in terms of
> the logic of his algorithm and for the compiler to pick the implementation.

You might be on the verge of switching to a dynamically typed language!

James Harris

unread,
Sep 9, 2021, 2:32:04 PM9/9/21
to
On 09/09/2021 19:10, Bart wrote:
> On 09/09/2021 18:25, James Harris wrote:
>> On 04/09/2021 22:35, Bart wrote:
>>> On 04/09/2021 21:44, James Harris wrote:

...

>>>>    type A
>>>>      |
>>>>    type B
>>>>
>>>> then a parameter declared to be of type A could include a function
>>>> from either type A or type B. (Or is it the other way round?)
>>>>
>>>>    function f(A P)
>>>>
>>>> A given activation of function f could be required to use either A's
>>>> or B's version of a function.
>>>
>>> I didn't understand that at all. I guess inheritance is not for me.
>>
>> Inheritance is maybe not for /anyone/! It encourages composition by
>> tall hierarchies rather than by flattening and simplifying but that's
>> another matter. The model I had in mind for that example is a simple one:
>>
>> An object of type A has fields Aa, Ab and Ac
>>
>> An object of type B has all of A's fields and some of its own.
>
> There is also the case where B overrides some fields of A. They might be
> of different types and sizes from A's, which makes keeping them at the
> same offsets hard, and cause other difficulties in using a value X which
> can be of either A or B types.

AIUI in a statically typed language data types are typically not
changed. (They may be in a language which has dynamic typing.) Only the
functions will be overrdden, not the data type. So the offsets would not
change, and a type B could appear to be a type A.

>
>> As long as both types of object have A's fields at the same offsets
>> then code which expects an A can still work on A's fields and doesn't
>> care whether the object is really of type A or of type B.
>
> In my dynamic language, none of these matters. The offsets can be
> different, as they are picked up at runtime; and the types can be
> different anyway.
>
> But I ran into trouble with static code.

Did you try to change the types?

>
>>>> The other is a variant
>>>>
>>>>    typedef C = int or float
>>>>
>>>>    function f(C P)
>>>>
>>>> A given activation of function f could have been passed an int or a
>>>> float.
>>>
>>> Here it's easy to understand your intention. But possibly more
>>> challenging to implement.
>>
>> Could function f not just use a switch?
>
> What, for each reference to P inside the body? What about when you have
> the expression P+Q+R, each of which has type C; will you need an 8-way
> switch, or ... ?
>

If those are integers I'd have thought /you/ would just expand them all
up ... to 64 bits!

A more typical use than numbers would probably be nodes of different types.

...

> I use type-dispatching in dynamic code; I have difficulty seeing how it
> could be applied efficiently to static code.
>
> How about looking at an expression such as A:=B+C*D, where these are all
> variants, and mocking up what generate code might look like. Try also
> X[i] where X has variant elements, and i might be variant too.

1. Convert i to int.
2. Get the address of the element.

What is needed next will be the same as operating on a scalar.

>
> Or maybe write some C++ code and see what the compiler generates.

Why would I want to do that?!

>
>
>
>> Frankly, I'd like the programmer to be able to work purely in terms of
>> the logic of his algorithm and for the compiler to pick the
>> implementation.
>
> You might be on the verge of switching to a dynamically typed language!
>

Not me!


--
James Harris

James Harris

unread,
Sep 9, 2021, 3:39:08 PM9/9/21
to
On 03/09/2021 18:27, Dmitry A. Kazakov wrote:
> On 2021-09-03 18:43, James Harris wrote:
>> On 03/09/2021 16:28, Dmitry A. Kazakov wrote:
>>> On 2021-09-03 16:59, James Harris wrote:
>>>
>>>> Why not say that each surface has to provide a set of primitives
>>>> that other programs can use to draw on an instance of that type of
>>>> surface?
>>>
>>> Because that set is open ended. OOP allows you
>>>
>>> 1. adding new primitives (new shapes)
>>>
>>> 2. using primitives with new surfaces (inheritance)
>>>
>>> All counter proposals are static and do not scale in practice, which
>>> is why OOP is used so widely in large projects regardless any
>>> difficulties. There is simple no other way to do it, economically,
>>> there is no other way.
>>>
>>
>> Your example was good in showing multiple dispatch but I'm not so sure
>> it's a good solution in this case, let alone the only way!
>
> Of course it is. Write a program that renders a list of shapes in a
> widget in a way that you would not need to modify each time another
> developers team creates a new shape or supports another surface.

Sure. I would have surfaces support different interfaces: pixmap, vector
and whatever else was needed, and have a basic set of component shapes
that surfaces could draw on themselves: lines, triangles, rectangles,
ellipses, etc, and bit blit. Then widgets could then use those
primitives to do all required drawing of basic and complex shapes.

There is no need that I can see for the complexity or the lack of
scalability that you seem to be proposing.

>
>> The rights and wrongs of the approach are OT so all I'll say is that
>> M*N does not scale and that new shapes can be constructed from
>> primitives.
>
> You did not thought it through. A set of primitives supported by all
> shapes is already a class. This is exactly what a class of shapes is.
> Any client of the class expects to find an implementation of any
> primitive in any surface.

I don't know what that means but composite shapes can be made up of
primitive ones. I suspect, in fact, that that's how rendering engines
work: a set of primitives that the hardware or firmware knows how to
render and widget which know how to use those primitives to draw
everything they want to.

>
> Remove the class, you will need to write a new client for each surface.

I disagree. As above, I'd use a common interface.

>
>> Unless I am missing something the cartesian M*N would be costly and so
>> even economically, as you mention,
>
> And its existence is a fact of life. Shapes exist, surfaces exist,
> drawing exist.
>
>> I cannot agree that there is no other way.
>
> Show the other way to implement and use M*N operations safely.

See above.


--
James Harris

James Harris

unread,
Sep 9, 2021, 4:30:27 PM9/9/21
to
On 04/09/2021 20:56, Dmitry A. Kazakov wrote:
> On 2021-09-04 19:43, James Harris wrote:
>> On 03/09/2021 16:46, Dmitry A. Kazakov wrote:
>>> On 2021-09-03 16:40, James Harris wrote:
>>>> On 03/09/2021 14:39, Dmitry A. Kazakov wrote:
>>
>> ...
>>
>>>>> Hash functions are pretty quick.
>>>>
>>>> Hash functions are not fast enough to find a call target! I was/am
>>>> even concerned about the extra cost of following a double pointer.
>>>> Using a hash table would be slower still.
>>>
>>> Faster than anything the user would have to write otherwise, in order
>>> to emulate dispatch.
>>
>> On the contrary, where the scheme I mentioned would have just two
>> dereferences to get the address of the required function ... your
>> scheme would have multiple dereferences, a function call, a hash
>> calculation, and possibly collision resolution.
>
> You did not propose anything for the case of full dispatch. Certainly
> specialized cases like single dispatch or multi-methods could be handled
> differently.

If by 'full dispatch' you mean selecting a function based on the types
of all of its arguments then I don't remember you proposing a mechanism
for it. Yes, you spoke about something that could be used in toy
situations where you had two or three parameters and a handful of types
but it would not scale. If you had just five parameters which could be
of a dozen types. You would need a dispatching table of 5^12 or 244
million pointers to who knows how many functions. That is fantasyland.

>
> The point was, if you do not support this in the language, the
> programmer will have to implement it, which is always slower and always
> error prone.

I agree that there are things the language should do for the programmer.
Where I am less certain is the multiple-dispatch /concept/ you want to
compile.

>
>>> Then performance is never a big concern.
>>
>> I'm really surprised that such a lookup would even be considered. It
>> would be much slower than a direct function call.
>
> Again, what is the alternative?

I have some ideas but I'd need to see some examples of what the problems
really are before coming up with a concrete proposal.

>
> GTK is written in C. C has no classes. So GTK emulates them using the
> GObject library. Guess, what is faster, C++ implementation of dispatch
> or the GObject kludge?

Straw man!

https://en.wikipedia.org/wiki/Straw_man

Your argument reminds me of how a certain graphics system got its
reputation completely trashed (unfairly, I gather) by a slow
implementation.

>
>> What is your objection to my proposed solution? To be clear, the idea
>> is to have the object's pointer to link directly to the method table
>> as follows.
>>
>>    object:   |___ptr___|___field0___|___field1___| etc
>>                   |
>>                   V
>>              |__type___|__reserved__|__ptr__|__ptr__|__ptr__|__ptr__|
>
> Show how this is supposed to work with full multiple dispatch, with
> multi-methods, with multiple inheritance, with constructors and
> destructors, with scalar types.

Those are /your/ desires. I am not sure I would have all those because
as I've said before the fundamental design which you are espousing does
not scale. It's possible that a rethink of the design is needed.

I /would/ support hooks such as constructors and destructors and I can
see a need to support multiple /interfaces/ but I am not sold on either
inheritance or the full dispatch that you keep talking about.

>
> BTW, there is a book by Bjarne Stroustrup with a detailed discussion
> regarding implementation of objects and dispatching tables (virtual
> tables) in C++.
>
>    https://www.stroustrup.com/arm.html
>
> I suggest you read it first in order to get an elementary understanding
> of the problematic.

If I ever start to think like Stroustrup I will have failed.


> And keep in mind that C++ has single dispatch.

OK.

...

> "Virtual function" is C++ term for method.

OK, though I cannot imagine what's virtual about a normal method.

>
>> If that's right is the issue that Ada prohibits virtual functions?
>
> No, Ada's name for virtual function is "primitive operation."

Whew! ;-)

>
>>> In Ada
>>>
>>>     procedure Foo (X : T) is
>>>     begin
>>>        X.Baz; -- Does not dispatch, the type is established to be T
>>>     end Foo;
>>
>> So T.Baz has to be in T and cannot be overridden in a subclass of T?
>
> X.Baz does not dispatch in Ada. In C++ X.baz () it does dispatch.
>

I am confused by your terms. When you say 'dispatch' do you mean
dispatch at run time aka dynamic dispatch

https://en.wikipedia.org/wiki/Dynamic_dispatch

as opposed to dispatch at compile time or static dispatch

https://en.wikipedia.org/wiki/Static_dispatch

?


--
James Harris

Bart

unread,
Sep 9, 2021, 5:05:02 PM9/9/21
to
Where is that written?

I've just tried a bit of C++ online (this example:
https://www.w3schools.com/cpp/cpp_inheritance.asp). I was able to change
the type of 'brand' within derived class 'Vehicle'.

No errors were reported, but it either didn't work properly, or it crashed.

And while the methods could be overridden, I could change the function
signatures (eg. add extra params) and here it worked.

I also created a function taking the base class. I was able to pass both
an instance of the base class, or of the derived class. It worked, but
treated it as a base class (it would call the .honk method of the base
class).

If the function instead took the derived class, I couldn't pass it an
instance of a base class.

So there there seem to certain rules about what can and what can't be done.

>>
>>> As long as both types of object have A's fields at the same offsets
>>> then code which expects an A can still work on A's fields and doesn't
>>> care whether the object is really of type A or of type B.
>>
>> In my dynamic language, none of these matters. The offsets can be
>> different, as they are picked up at runtime; and the types can be
>> different anyway.
>>
>> But I ran into trouble with static code.
>
> Did you try to change the types?

I can't remember my tests. But then I wasn't aware of the rules that OOP
needs to follow to make this stuff work.

> If those are integers I'd have thought /you/ would just expand them all
> up ... to 64 bits!

A variant type can presumably be any mix of types. And if there was one
obvious dominant type, you wouldn't need a variant; you'd use that type.

> A more typical use than numbers would probably be nodes of different types.
>
> ...
>
>> I use type-dispatching in dynamic code; I have difficulty seeing how
>> it could be applied efficiently to static code.
>>
>> How about looking at an expression such as A:=B+C*D, where these are
>> all variants, and mocking up what generate code might look like. Try
>> also X[i] where X has variant elements, and i might be variant too.
>
> 1. Convert i to int.
> 2. Get the address of the element.
>
> What is needed next will be the same as operating on a scalar.

Exactly how different can variant types be? Maybe X is an int, and maybe
X is an array of structs. Or an array of Variants. What are the limitations?

Staying with my X[i] where X's type is array of Variant elements, they
could be ints, floats or strings. And maybe i is a 32-int, or 64-bit
int, or a float.

You need to analyse it each time is is accessed to see if conversion is
needed.

Also, Variants could presumably be assigned something else of a
different type? Then you can't even assume it will stay the same type.


>>
>> Or maybe write some C++ code and see what the compiler generates.
>
> Why would I want to do that?!

To see what limitations it places on the possibilities. Or maybe there
are no limitations and the resulting code is horrendous.

I don't quite understand why you seem to think that dealing with variant
types is straightforward.


>>> Frankly, I'd like the programmer to be able to work purely in terms
>>> of the logic of his algorithm and for the compiler to pick the
>>> implementation.
>>
>> You might be on the verge of switching to a dynamically typed language!
>>
>
> Not me!

If you have a fully flexible Variant type with no limitations, then you
will de facto have a language that supports dynamic typing.

The main difference is needing to specify the set of types it can
assume, which creates an extra difficulty, for example if assigning:

A := B

where A has type V1 and B has type V2, which are known at compile-time
to have some common types, but you still need a runtime check.

Dmitry A. Kazakov

unread,
Sep 10, 2021, 3:08:32 AM9/10/21
to
Who calls these? Your rendering program?

You confuse implementation with means of. Rendering primitives are means
to implement the procedure Draw you would call from your program for
each element from the list shapes. The classic OO way:

for Shape in Shapes_List loop
Shape.Draw (Widget.Get_Redenring Context);
-- Dispatches to the Shape's Draw
end loop;

Your way?

>>> The rights and wrongs of the approach are OT so all I'll say is that
>>> M*N does not scale and that new shapes can be constructed from
>>> primitives.
>>
>> You did not thought it through. A set of primitives supported by all
>> shapes is already a class. This is exactly what a class of shapes is.
>> Any client of the class expects to find an implementation of any
>> primitive in any surface.
>
> I don't know what that means but composite shapes can be made up of
> primitive ones. I suspect, in fact, that that's how rendering engines
> work: a set of primitives that the hardware or firmware knows how to
> render and widget which know how to use those primitives to draw
> everything they want to.

Again, rendering primitives is just another higher-level language, like
OpenGL. This changes nothing. You still have to implement Draw in this
language for this type of shape.

>> Remove the class, you will need to write a new client for each surface.
>
> I disagree. As above, I'd use a common interface.

for Shape in Shapes_List loop

!!! Fill the gap !!!

end loop;

>>> Unless I am missing something the cartesian M*N would be costly and
>>> so even economically, as you mention,
>>
>> And its existence is a fact of life. Shapes exist, surfaces exist,
>> drawing exist.
>>
>>> I cannot agree that there is no other way.
>>
>> Show the other way to implement and use M*N operations safely.
>
> See above.

Nothing to see, so far.

Dmitry A. Kazakov

unread,
Sep 10, 2021, 3:42:36 AM9/10/21
to
On 2021-09-09 22:30, James Harris wrote:
> On 04/09/2021 20:56, Dmitry A. Kazakov wrote:
>> On 2021-09-04 19:43, James Harris wrote:
>>> On 03/09/2021 16:46, Dmitry A. Kazakov wrote:
>>>> On 2021-09-03 16:40, James Harris wrote:
>>>>> On 03/09/2021 14:39, Dmitry A. Kazakov wrote:
>>>
>>> ...
>>>
>>>>>> Hash functions are pretty quick.
>>>>>
>>>>> Hash functions are not fast enough to find a call target! I was/am
>>>>> even concerned about the extra cost of following a double pointer.
>>>>> Using a hash table would be slower still.
>>>>
>>>> Faster than anything the user would have to write otherwise, in
>>>> order to emulate dispatch.
>>>
>>> On the contrary, where the scheme I mentioned would have just two
>>> dereferences to get the address of the required function ... your
>>> scheme would have multiple dereferences, a function call, a hash
>>> calculation, and possibly collision resolution.
>>
>> You did not propose anything for the case of full dispatch. Certainly
>> specialized cases like single dispatch or multi-methods could be
>> handled differently.
>
> If by 'full dispatch' you mean selecting a function based on the types
> of all of its arguments then I don't remember you proposing a mechanism
> for it.

Single dispatch = single controlling argument
Milti-method = several controlling arguments from the same hierarchy
Full multiple dispatch = several controlling arguments from any hierarchies

> Yes, you spoke about something that could be used in toy
> situations where you had two or three parameters and a handful of types
> but it would not scale. If you had just five parameters which could be
> of a dozen types. You would need a dispatching table of 5^12 or 244
> million pointers to who knows how many functions. That is fantasyland.

This is the requirement. Either way, manually or with the compiler
support, the programmers must do that. You cannot tell the customer, it
is too many shapes and too many surfaces you put in the requirements. It
is exactly as many as the customer requested.

>>>> Then performance is never a big concern.
>>>
>>> I'm really surprised that such a lookup would even be considered. It
>>> would be much slower than a direct function call.
>>
>> Again, what is the alternative?
>
> I have some ideas but I'd need to see some examples of what the problems
> really are before coming up with a concrete proposal.

See above. Single dispatch, multi-method, full multiple dispatch, it is
all you need to know.

>> GTK is written in C. C has no classes. So GTK emulates them using the
>> GObject library. Guess, what is faster, C++ implementation of dispatch
>> or the GObject kludge?
>
> Straw man!
>
>   https://en.wikipedia.org/wiki/Straw_man
>
> Your argument reminds me of how a certain graphics system got its
> reputation completely trashed (unfairly, I gather) by a slow
> implementation.

Irrelevant now as you already agreed with me on the principle, compiler
implementation is always better.

>>> What is your objection to my proposed solution? To be clear, the idea
>>> is to have the object's pointer to link directly to the method table
>>> as follows.
>>>
>>>    object:   |___ptr___|___field0___|___field1___| etc
>>>                   |
>>>                   V
>>>              |__type___|__reserved__|__ptr__|__ptr__|__ptr__|__ptr__|
>>
>> Show how this is supposed to work with full multiple dispatch, with
>> multi-methods, with multiple inheritance, with constructors and
>> destructors, with scalar types.
>
> Those are /your/ desires.

The grapes are sour...

> I /would/ support hooks such as constructors and destructors

Then you must confront the issue of dispatching from there.

[Just try to understand that the world is much more complex than you
imagine. Talking about fantasy lands...]

>> BTW, there is a book by Bjarne Stroustrup with a detailed discussion
>> regarding implementation of objects and dispatching tables (virtual
>> tables) in C++.
>>
>>     https://www.stroustrup.com/arm.html
>>
>> I suggest you read it first in order to get an elementary
>> understanding of the problematic.
>
> If I ever start to think like Stroustrup I will have failed.

You may think anything about him, but in the book he (and his co-writer)
discusses the technical issues of implementation of dispatching tables
specifically for single dispatch and full multiple inheritance. It is
very instructive, written in plain language with examples. It will help
you understand the basics. Since, from your proposals and questions, I
see that you still do not.

>> "Virtual function" is C++ term for method.
>
> OK, though I cannot imagine what's virtual about a normal method.

Compare this with "virtual memory" when the memory address goes through
the address table to get the physical address. So the method goes
through the virtual (dispatching) table to get the implementation of for
the specific type.

>>>> In Ada
>>>>
>>>>     procedure Foo (X : T) is
>>>>     begin
>>>>        X.Baz; -- Does not dispatch, the type is established to be T
>>>>     end Foo;
>>>
>>> So T.Baz has to be in T and cannot be overridden in a subclass of T?
>>
>> X.Baz does not dispatch in Ada. In C++ X.baz () it does dispatch.
>
> I am confused by your terms. When you say 'dispatch' do you mean
> dispatch at run time aka dynamic dispatch
>
>   https://en.wikipedia.org/wiki/Dynamic_dispatch
>
> as opposed to dispatch at compile time or static dispatch
>
>   https://en.wikipedia.org/wiki/Static_dispatch

Yes, I mean dispatch as using the dispatching table = dynamic
polymorphism. Static polymorphism is of no interest here.

luserdroog

unread,
Sep 20, 2021, 11:08:52 PM9/20/21
to
On Monday, August 30, 2021 at 4:47:58 AM UTC-5, James Harris wrote:
> On 29/08/2021 20:32, Dmitry A. Kazakov wrote:
> > On 2021-08-29 20:24, James Harris wrote:
> >
> >> Ignoring the length part (as some objects would have their own length
> >> and some would not but that difference is immaterial here) how would a
> >> 'tag' system be laid out in memory and how would one use that to
> >> locate a method?
> >
> > You must consider the following inheritance variants:
> ...
>
> Thanks for the information, Dmitry. I might come back to you about some
> of that. But for this thread I was wondering how the memory might be
> laid out under the tag model that you mentioned. What form does a tag
> take and how does the runtime get from the tag to a method?
>
> If it depends on which model is in use then what about single
> inheritance and the Full MI model you mentioned?

For TMI on tags and tagging, see Representing Type Information in
Dynamically Typed Languages by David Gudeman.

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.39.4394&rep=rep1&type=pdf

James Harris

unread,
Sep 22, 2021, 10:24:12 AM9/22/21
to
On 09/09/2021 22:04, Bart wrote:
> On 09/09/2021 19:32, James Harris wrote:
>> On 09/09/2021 19:10, Bart wrote:
>>> On 09/09/2021 18:25, James Harris wrote:
>>>> On 04/09/2021 22:35, Bart wrote:
>>>>> On 04/09/2021 21:44, James Harris wrote:

...

>>>>>>    type A
>>>>>>      |
>>>>>>    type B

...

>>>> An object of type B has all of A's fields and some of its own.

...

>> So the offsets
>> would not change, and a type B could appear to be a type A.
>
> Where is that written?

Try the following set of slides. They are part of a compilers course and
show how such objects could be implemented, including how they could be
laid out in memory.

http://openclassroom.stanford.edu/MainFolder/courses/Compilers/docs/slides/12-06-object-layout-annotated.pdf

Keeping the fields is a natural consequence of the "is a" relationship
when using static typing. An object of type B _is_ also an object of
type A so it has to be usable wherever an object of type A is usable
and/or has been declared. That's the case even if type B provides new
methods to manipulate A's fields or adds some fields of its own.

To stress, this is about /static/ typing. If using dynamic typing then
the field types /could/ be overridden, and IIUC that's how the
dynamically typed Python works.

There are various permutations but you should find the above slides to
be really helpful in detailing the fundamentals. The full lectures are
probably findable on line, too.

...

> Exactly how different can variant types be? Maybe X is an int, and maybe
> X is an array of structs. Or an array of Variants. What are the
> limitations?

I don't see the need for any limitation on what types could be used but
the permissible types would have to be specified. For example,

typedef T = variant (int # array float # boolean)

(where # separates alternatives and "array float" means an array of
floats).

>
> Staying with my X[i] where X's type is array of Variant elements, they
> could be ints, floats or strings. And maybe i is a 32-int, or 64-bit
> int, or a float.
>
> You need to analyse it each time is is accessed to see if conversion is
> needed.
>
> Also, Variants could presumably be assigned something else of a
> different type? Then you can't even assume it will stay the same type.

Wouldn't code which handles variants typically include a switch on the
type as in the following?

switch T.type
case int:
/* Handle int */
case array float:
/* Handle array of float */
end switch /* Ignoring other cases */

>
>
>>>
>>> Or maybe write some C++ code and see what the compiler generates.
>>
>> Why would I want to do that?!
>
> To see what limitations it places on the possibilities. Or maybe there
> are no limitations and the resulting code is horrendous.

The problem with looking at how another language handles matters is that
it can limit the scope for creativity and tend to lead to the same ideas.

Wouldn't it be better to come up with designs based on how objects and
structures might be laid out and used?

>
> I don't quite understand why you seem to think that dealing with variant
> types is straightforward.

I don't think I ever said it was straightforward.

In fact, IMO the opposite is the case. There's a big language-design
challenge, here. Designers have come up with many ways to match objects
with code:

coercion
overloading
variants
polymorphism
generics
metatypes
macros

It may be better to get rid of some of them and absorb what they can do
into a common approach - as long as it makes programming easier rather
than harder.

>
>
>>>> Frankly, I'd like the programmer to be able to work purely in terms
>>>> of the logic of his algorithm and for the compiler to pick the
>>>> implementation.
>>>
>>> You might be on the verge of switching to a dynamically typed language!
>>>
>>
>> Not me!
>
> If you have a fully flexible Variant type with no limitations, then you
> will de facto have a language that supports dynamic typing.

Indeed! Variants effectively allow a programmer to use dynamically typed
objects in a statically typed language. But variants have the advantage
of letting a programmer restrict which types can be used.


BTW, if anyone is wondering why I haven't yet replied to some other
message in this group I should say it's probably your own fault! You
shouldn't raise so many big issues!


--
James Harris

Dmitry A. Kazakov

unread,
Sep 22, 2021, 11:39:12 AM9/22/21
to
On 2021-09-22 16:24, James Harris wrote:

> Try the following set of slides. They are part of a compilers course and
> show how such objects could be implemented, including how they could be
> laid out in memory.
>
> http://openclassroom.stanford.edu/MainFolder/courses/Compilers/docs/slides/12-06-object-layout-annotated.pdf

A quite poor introduction that does not even cover multiple inheritance.

> Keeping the fields is a natural consequence of the "is a" relationship
> when using static typing. An object of type B _is_ also an object of
> type A so it has to be usable wherever an object of type A is usable
> and/or has been declared.

Not quite. Is-a = substitutable, is a member of same class. B is not A,
it can fake A.

The means of substitutution may differ. If B contains a representation
of A, then substitution could be a mere reference shift. This why
languages tend to limit inheritance to extension only and pass tagged
objects by reference.

Otherwise, to achieve substitution you would need to create a new object
A from B. In C and C++ implicit conversions are that sort of is-a
relationship, e.g. float is-a double.

Bart

unread,
Sep 22, 2021, 3:47:48 PM9/22/21
to
On 22/09/2021 15:24, James Harris wrote:
> On 09/09/2021 22:04, Bart wrote:

>> Also, Variants could presumably be assigned something else of a
>> different type? Then you can't even assume it will stay the same type.
>
> Wouldn't code which handles variants typically include a switch on the
> type as in the following?
>
>   switch T.type
>     case int:
>       /* Handle int */
>     case array float:
>       /* Handle array of float */
>   end switch /* Ignoring other cases */

You need to show an example of source code where such types are used,
and see that the generated code might be like.

But let's take your example:

type U = int |[]float

U a, b, c

a := b + c

Here, it knows that b and c are both of type U, but doesn't know - at
compile-time - whether they are ints or arrays.

So just to do b+c, it will need 2-way dispatch:


int temp
if b.type=c.type=int then # Hey, a chained compare example!
temp = b.ivalue+c.ivalue
else
.... # error? Unless "+" is defined for float arrays, array+iny
end

Now to assign result to a:

if a.type=float_array then
// destroy resources, or decr reference count for array
end

a.type = int
a.ivalue = temp

And that's just for one valid combination, and one type of temp, so that
'a' is only ever going to be assigned an int. How about:

type U = int32 | int64 | float32 | float64

U a, b, c, d, e

a = b*c + d*e

All 16 combinations of types should be valid, but do you want to try
writing the switch statements for them?

I don't think this amount of boilerplate is practical to write as inline
code.

You should however try doing so for example fragments of code to see as
I suggested to see what is involved.

This type of variant means that:

a := a[i]

could be legal. Anything goes.

Dmitry A. Kazakov

unread,
Sep 22, 2021, 5:43:27 PM9/22/21
to
On 2021-09-22 21:47, Bart wrote:
> On 22/09/2021 15:24, James Harris wrote:
>> On 09/09/2021 22:04, Bart wrote:
>
>>> Also, Variants could presumably be assigned something else of a
>>> different type? Then you can't even assume it will stay the same type.
>>
>> Wouldn't code which handles variants typically include a switch on the
>> type as in the following?
>>
>>    switch T.type
>>      case int:
>>        /* Handle int */
>>      case array float:
>>        /* Handle array of float */
>>    end switch /* Ignoring other cases */
>
> You need to show an example of source code where such types are used,

Come on, is that a joke?

https://docs.microsoft.com/en-us/windows/win32/api/oaidl/ns-oaidl-variant

http://documentation.unified-automation.com/uasdkcpp/1.5.2/html/structOpcUa__Variant.html

https://en.wikipedia.org/wiki/Tagged_union

https://zone.ni.com/reference/en-XX/help/371361R-01/lvhowto/variants/

Next you will ask for examples of loops?

Bart

unread,
Sep 22, 2021, 6:13:34 PM9/22/21
to
So where are the actual examples of using those variants? What are the
capabilities? How are those capabilities implemented: what code is
generated, how much is generated, and how efficient is it?

'Variants' seem to be mean lots of different things.

I've used them for 35 years within dynamic scripting languages, so I
know the answers for my own code. But I assume the context here is for a
static language or perhaps one with type inference.

That Switch example is meaningless without knowing what operation is
being performed in the source code, hence the request for the
corresponding fragment of source code.

Or perhaps you can tell just from the example?

I'm played with variants in my static language, but they have now been
dropped; they were too difficult to deal with in the general case (eg.
arbitrary lists of elements that can be variants and also arbitrary
lists....).

But this fragment of code from an old version, where 'variant' can be
any /object/ type (ie. one implemented as a reference to a
reference-counted descriptor):

variant a, b, c
a := b + c

generated this x64 code (not ABI-compliant):

lea D0, [Dframe+-8]
push D0
call [L5] # init a
lea D0, [Dframe+-16]
push D0
call [L5] # init b
lea D0, [Dframe+-24]
push D0
call [L5] # init c

mov D0, [Dframe+-16] # 'push b'
inc word32 [D0] # step ref count
mov D1, [Dframe+-24] # 'push c'
inc word32 [D1] # step ref count
push D0
push D1
call [L6] # addvar(b, c)
lea D1, [Dframe+-8] # &a
push D0
push D1
call [L7] # 'pop 'a'

So it's all done with function calls; little inline code. And it's
inefficient.

But that's how /I/ did it. This might not be what James has in mind.

David Brown

unread,
Sep 23, 2021, 3:10:31 AM9/23/21
to
On 22/09/2021 21:47, Bart wrote:
> On 22/09/2021 15:24, James Harris wrote:
>> On 09/09/2021 22:04, Bart wrote:
>
>>> Also, Variants could presumably be assigned something else of a
>>> different type? Then you can't even assume it will stay the same type.
>>
>> Wouldn't code which handles variants typically include a switch on the
>> type as in the following?
>>
>>    switch T.type
>>      case int:
>>        /* Handle int */
>>      case array float:
>>        /* Handle array of float */
>>    end switch /* Ignoring other cases */
>
> You need to show an example of source code where such types are used,
> and see that the generated code might be like.
>

Variant types are typically implemented by having a "type" field - an
integer that says which of the allowed types is currently valid. In
C++, this is std::variant. In Pascal, it is a "tagged union" or
"variant record".

In C, you might have :

typedef struct {
enum { T_is_int, T_is_float } type_index;
union {
int int_value;
float float_value;
}
} T;

switch (t.type_index) {
case T_is_int :
// Use t.int_value
break;
case T_is_float :
// Use t.float_value
break;
}


You can produce similar code in any language. Variant types are not
magic, but of course there is plenty of scope for making them neater and
safer than in C! (It doesn't have to be part of the language - with a
powerful enough language, they can be part of the library.)

And compilers can optimise their knowledge of the type stored to reduce
storing or checking "type_index" as much as possible.
If you write impractical code, you must expect inefficient results.

It is not common for constructed types to automatically inherit support
for operations or functions from their parts. An array of floats does
not inherit addition from float, a struct of 3 ints does not inherit
multiplication from its constituent ints. A variant or tagged union
does not (in most languages or libraries) inherit arithmetic operations
from its parts even if they all support them individually.

So the programmer who makes the U type, decides what is sensible to do
with elements of the type. /They/ decide which combinations are
allowed, and how they are implemented.

(A helpful language might also generate some of the boilerplate code
automatically.)

Of course, a variant like your example here is unlikely to be found in
real code. Some more likely examples could be a type that can hold a
valid result or an error number:


type Result = Value int | Error int

A binomial "Tree" might be a leaf with a string, or a node with two
branches :

type Tree = Leaf string | Node Tree Tree

In more theoretical programming (such as with functional programming
languages), such "sum types" are as important as "product types", which
you know of as structs or records.

>
> You should however try doing so for example fragments of code to see as
> I suggested to see what is involved.
>
> This type of variant means that:
>
>    a := a[i]
>
> could be legal. Anything goes.

Anything the programmer decides to allow and support, goes.

Dmitry A. Kazakov

unread,
Sep 23, 2021, 5:14:10 AM9/23/21
to
On 2021-09-23 00:13, Bart wrote:

> That Switch example is meaningless without knowing what operation is
> being performed in the source code, hence the request for the
> corresponding fragment of source code.

The predefined operations should be:

1. Equality/inequality
2. Access to the constraint (the selector value, read-only)
3. Member access (the variant value, read/write)
4. Aggregate (literal value)
5. Assignment as a whole, IFF variant is definite, otherwise, no assignment.

Bart

unread,
Sep 23, 2021, 5:38:32 AM9/23/21
to
On 23/09/2021 10:14, Dmitry A. Kazakov wrote:
> On 2021-09-23 00:13, Bart wrote:
>
>> That Switch example is meaningless without knowing what operation is
>> being performed in the source code, hence the request for the
>> corresponding fragment of source code.
>
> The predefined operations should be:
>
> 1. Equality/inequality
> 2. Access to the constraint (the selector value, read-only)
> 3. Member access (the variant value, read/write)
> 4. Aggregate (literal value)
> 5. Assignment as a whole, IFF variant is definite, otherwise, no
> assignment.

But you can't do A+B? See, this is the sort of information which needs
to be explicitly stated: what sort of 'variant' are we talking about;
what are the possibilities, and what even source code might look like.

What do you have to do in a language with YOUR idea of a variant, in
order to evaluate A+B? Assuming it can be done, what goes on behind the
scenes?

If you have a variant which can include suitable types, tell me why any
of the following operations, which I generally allow, should not apply
to variants.

In your first link, there is a variant record shown with 50 different
types. A user-defined one could have more than that, including user
defined types.


add
sub
mul
div
idiv
irem
iand
ior
ixor
and
or
shl
shr
eq
ne
lt
le
ge
gt
neg
abs
inot
not
sqrt
sin
cos etc
conversions
indexing
slicing
bit indexing
bitfield slicing
member selection
call
len
lwb/upb
minvalue/maxvalue
switch index
in/not in
min/max

addto (augmented assignmnt)
subto
multo
divto
idivto
iremto
iandto
iorto
ixorto
andto
orto
shlto
shrto
negto
absto
inotto
notto
incr
decr
minto/maxto

Oh, and assignment. And sort(). And dozens of other operations usually
implemented as library functions rather than as built-in ops.


Bart

unread,
Sep 23, 2021, 5:54:38 AM9/23/21
to
No one has yet explained exactly what kind of variant they are talking
about.

You're assuming it is just a tagged union or sumtype, where all you can
do is write explicit code to extract one of the possible values.

Or if you try and use one of those values directly (t.int_value) the
compiler will perform a behind the scenes check that it is an int.

The C++ variant might do a little bit more.

The kind of variant I've used in dynamic code is considerably more powerful:

for x in (100, 23.4, "cat", 50..60, [7,8,9], (100,110,120)) do
println x.type, "\t", x
end

Here, 'x' is a variant type (actually, everything is), and can store
/anything/. The output of this program is:

int 100
real 23.400000
string cat
range 50..60
set [7..9]
list (100,110,120)

I assumed James was talking about the kind of variant which can only
represent a predefined set of types, and which was to be implemented in
static code.

Dmitry A. Kazakov

unread,
Sep 23, 2021, 6:22:43 AM9/23/21
to
On 2021-09-23 11:38, Bart wrote:
> On 23/09/2021 10:14, Dmitry A. Kazakov wrote:
>> On 2021-09-23 00:13, Bart wrote:
>>
>>> That Switch example is meaningless without knowing what operation is
>>> being performed in the source code, hence the request for the
>>> corresponding fragment of source code.
>>
>> The predefined operations should be:
>>
>> 1. Equality/inequality
>> 2. Access to the constraint (the selector value, read-only)
>> 3. Member access (the variant value, read/write)
>> 4. Aggregate (literal value)
>> 5. Assignment as a whole, IFF variant is definite, otherwise, no
>> assignment.
>
> But you can't do A+B?

Because it is not defined. If the programmer wants "+", Foo, Boo he can
declare and implement them as with any other type.

> What do you have to do in a language with YOUR idea of a variant, in
> order to evaluate A+B? Assuming it can be done, what goes on behind the
> scenes?

Nothing. What you see is what you get.

> If you have a variant which can include suitable types, tell me why any
> of the following operations, which I generally allow, should not apply
> to variants.

Tell me why they should. Days of PL/1 are gone.

Dmitry A. Kazakov

unread,
Sep 23, 2021, 6:29:45 AM9/23/21
to
On 2021-09-23 11:54, Bart wrote:

> You're assuming it is just a tagged union or sumtype, where all you can
> do is write explicit code to extract one of the possible values.

Why there should be more?

If you want more, you must tell that in the language. For this you need
subtypes (that pesky polymorhpism), e.g. variant being a subtype of the
type of some/all of its members.

It is the stuff you are unwilling to support, telling us that you do not
need it. Why asking then?

Bart

unread,
Sep 23, 2021, 6:40:12 AM9/23/21
to
On 23/09/2021 11:22, Dmitry A. Kazakov wrote:
> On 2021-09-23 11:38, Bart wrote:
>> On 23/09/2021 10:14, Dmitry A. Kazakov wrote:
>>> On 2021-09-23 00:13, Bart wrote:
>>>
>>>> That Switch example is meaningless without knowing what operation is
>>>> being performed in the source code, hence the request for the
>>>> corresponding fragment of source code.
>>>
>>> The predefined operations should be:
>>>
>>> 1. Equality/inequality
>>> 2. Access to the constraint (the selector value, read-only)
>>> 3. Member access (the variant value, read/write)
>>> 4. Aggregate (literal value)
>>> 5. Assignment as a whole, IFF variant is definite, otherwise, no
>>> assignment.
>>
>> But you can't do A+B?
>
> Because it is not defined. If the programmer wants "+", Foo, Boo he can
> declare and implement them as with any other type.

Show me. Let's say with a variant that can be an Integer, or a String.

And let's say that "+" is defined between two such variants when they
contain integers.

What does the source code look like to make it possible. And what will
the behind-the-scenes code be?

Assume that what the user wants to do is something like this:

Variant (String | Int) a, b, c

b := 200
c := 300
.... undisclosed code
a := b + c

print a

Would that be possible? What should it do when 'c' is set to a string?


>> What do you have to do in a language with YOUR idea of a variant, in
>> order to evaluate A+B? Assuming it can be done, what goes on behind
>> the scenes?
>
> Nothing. What you see is what you get.
>
>> If you have a variant which can include suitable types, tell me why
>> any of the following operations, which I generally allow, should not
>> apply to variants.
>
> Tell me why they should. Days of PL/1 are gone.


Most languages allows all those ops, or a big subset.

You seem to be saying that A+B is allowed when A and B are both int types.

But not when A and B are variant types, even if that type is (Int |
Real), not without a lot of work.

Then that's not a Variant type as I understand it. It's just a union or
tagged union, a much cruder type.

Dmitry A. Kazakov

unread,
Sep 23, 2021, 7:15:28 AM9/23/21
to
On 2021-09-23 12:40, Bart wrote:
> On 23/09/2021 11:22, Dmitry A. Kazakov wrote:
>> On 2021-09-23 11:38, Bart wrote:
>>> On 23/09/2021 10:14, Dmitry A. Kazakov wrote:
>>>> On 2021-09-23 00:13, Bart wrote:
>>>>
>>>>> That Switch example is meaningless without knowing what operation
>>>>> is being performed in the source code, hence the request for the
>>>>> corresponding fragment of source code.
>>>>
>>>> The predefined operations should be:
>>>>
>>>> 1. Equality/inequality
>>>> 2. Access to the constraint (the selector value, read-only)
>>>> 3. Member access (the variant value, read/write)
>>>> 4. Aggregate (literal value)
>>>> 5. Assignment as a whole, IFF variant is definite, otherwise, no
>>>> assignment.
>>>
>>> But you can't do A+B?
>>
>> Because it is not defined. If the programmer wants "+", Foo, Boo he
>> can declare and implement them as with any other type.
>
> Show me. Let's say with a variant that can be an Integer, or a String.

function "+" (Left, Right : Variant) return Variant;

>>> If you have a variant which can include suitable types, tell me why
>>> any of the following operations, which I generally allow, should not
>>> apply to variants.
>>
>> Tell me why they should. Days of PL/1 are gone.
>
> Most languages allows all those ops, or a big subset.

None I know of.

> You seem to be saying that A+B is allowed when A and B are both int types.
>
> But not when A and B are variant types, even if that type is (Int |
> Real), not without a lot of work.

Again, what is the reason why?

> Then that's not a Variant type as I understand it. It's just a union or
> tagged union, a much cruder type.

You confuse a variant type with an ad-hoc class of types. A variant type
is not a class of its member types. These are semantically different
types, though they might have similar representations.

Bart

unread,
Sep 23, 2021, 7:34:13 AM9/23/21
to
On 23/09/2021 12:15, Dmitry A. Kazakov wrote:
> On 2021-09-23 12:40, Bart wrote:
>> On 23/09/2021 11:22, Dmitry A. Kazakov wrote:
>>> On 2021-09-23 11:38, Bart wrote:
>>>> On 23/09/2021 10:14, Dmitry A. Kazakov wrote:
>>>>> On 2021-09-23 00:13, Bart wrote:
>>>>>
>>>>>> That Switch example is meaningless without knowing what operation
>>>>>> is being performed in the source code, hence the request for the
>>>>>> corresponding fragment of source code.
>>>>>
>>>>> The predefined operations should be:
>>>>>
>>>>> 1. Equality/inequality
>>>>> 2. Access to the constraint (the selector value, read-only)
>>>>> 3. Member access (the variant value, read/write)
>>>>> 4. Aggregate (literal value)
>>>>> 5. Assignment as a whole, IFF variant is definite, otherwise, no
>>>>> assignment.
>>>>
>>>> But you can't do A+B?
>>>
>>> Because it is not defined. If the programmer wants "+", Foo, Boo he
>>> can declare and implement them as with any other type.
>>
>> Show me. Let's say with a variant that can be an Integer, or a String.
>
>    function "+" (Left, Right : Variant) return Variant;


So, what does that do? Does it magically allow A+B?

If not, then what's the point?

If it does, then we're back to square one: if A+B is now allowed in user
code, how is it implemented?

(Also, what is the 'Variant' in your example; is it the name of a
specific user type, so N such declarations are needed, or is it a
generic 'variant' so that this declaration suffices for all possible types?)

Suppose, also, that variant A is (int | real), and variant B is (real |
string)'; can I do A+B? (Here A,B represent types.)

I'm not expecting any actual concrete answers. I'm just interested in
how you will avoid answering the question this time.


>
>>>> If you have a variant which can include suitable types, tell me why
>>>> any of the following operations, which I generally allow, should not
>>>> apply to variants.
>>>
>>> Tell me why they should. Days of PL/1 are gone.
>>
>> Most languages allows all those ops, or a big subset.
>
> None I know of.
>
>> You seem to be saying that A+B is allowed when A and B are both int
>> types.
>>
>> But not when A and B are variant types, even if that type is (Int |
>> Real), not without a lot of work.
>
> Again, what is the reason why?

Why what? Why I should be able to do A+B?

If not, then what is the purpose of a Variant type, other than as a
glorified union.

>
>> Then that's not a Variant type as I understand it. It's just a union
>> or tagged union, a much cruder type.
>
> You confuse a variant type with an ad-hoc class of types.

James' proposal looks to me like an ad-hoc collection of types, within
an umbrella type.

It might also be specifically created to get away from terms like class,
and member and inheritance!

Dmitry A. Kazakov

unread,
Sep 23, 2021, 8:20:33 AM9/23/21
to
On 2021-09-23 13:34, Bart wrote:
> On 23/09/2021 12:15, Dmitry A. Kazakov wrote:
>> On 2021-09-23 12:40, Bart wrote:
>>> On 23/09/2021 11:22, Dmitry A. Kazakov wrote:
>>>> On 2021-09-23 11:38, Bart wrote:
>>>>> On 23/09/2021 10:14, Dmitry A. Kazakov wrote:
>>>>>> On 2021-09-23 00:13, Bart wrote:
>>>>>>
>>>>>>> That Switch example is meaningless without knowing what operation
>>>>>>> is being performed in the source code, hence the request for the
>>>>>>> corresponding fragment of source code.
>>>>>>
>>>>>> The predefined operations should be:
>>>>>>
>>>>>> 1. Equality/inequality
>>>>>> 2. Access to the constraint (the selector value, read-only)
>>>>>> 3. Member access (the variant value, read/write)
>>>>>> 4. Aggregate (literal value)
>>>>>> 5. Assignment as a whole, IFF variant is definite, otherwise, no
>>>>>> assignment.
>>>>>
>>>>> But you can't do A+B?
>>>>
>>>> Because it is not defined. If the programmer wants "+", Foo, Boo he
>>>> can declare and implement them as with any other type.
>>>
>>> Show me. Let's say with a variant that can be an Integer, or a String.
>>
>>     function "+" (Left, Right : Variant) return Variant;
>
> So, what does that do?

Whatever the programmer puts inside it.

> Does it magically allow A+B?

Yes, when you declare a function you can magically call it.

> If not, then what's the point?

No idea, you asked how to declare a function "+". The point is up to you.

> (Also, what is the 'Variant' in your example; is it the name of a
> specific user type, so N such declarations are needed, or is it a
> generic 'variant' so that this declaration suffices for all possible
> types?)

Read the link I posted:

https://en.wikipedia.org/wiki/Tagged_union

> Suppose, also, that variant A is (int | real), and variant B is (real |
> string)'; can I do A+B? (Here A,B represent types.)

Nope.

>>> But not when A and B are variant types, even if that type is (Int |
>>> Real), not without a lot of work.
>>
>> Again, what is the reason why?
>
> Why what? Why I should be able to do A+B?

Yes, why should you be able to call arbitrary operations on arbitrary types.

> If not, then what is the purpose of a Variant type, other than as a
> glorified union.

None, variant is exactly a tagged union AKA variant record.

>>> Then that's not a Variant type as I understand it. It's just a union
>>> or tagged union, a much cruder type.
>>
>> You confuse a variant type with an ad-hoc class of types.
>
> James' proposal looks to me like an ad-hoc collection of types, within
> an umbrella type.

That is not variant, that is an ad-hoc class with inheritance and all
bells and whistles. He may not understand that or be in denial, but that
is his problem.

> It might also be specifically created to get away from terms like class,
> and member and inheritance!

He has his delusions you have yours.

Bart

unread,
Sep 23, 2021, 9:47:56 AM9/23/21
to
On 23/09/2021 13:20, Dmitry A. Kazakov wrote:
> On 2021-09-23 13:34, Bart wrote:

>> If not, then what is the purpose of a Variant type, other than as a
>> glorified union.
>
> None, variant is exactly a tagged union AKA variant record.


Then we're talking at cross purposes.

This was entirely the point of my request for usage examples in source
code, then the kind of variant people had in mind would become more
apparent.

In my language, if I implemented the type of variant I assumed this was,
A+B (and EVERY operation normally allowed for fixed static types) would
be allowed for any two variants, provided it was meaningful between at
least two combinations of types from each side.

But it would be horrendous to implement in a static language.

This is the proposed feature in action:

type T = int | real | string

function biggerof (T a, b)=>T =
if a>b then
return a
else
return b
fi
end

println biggerof(100, 200) # 200
println biggerof(pi, 2) # 3.14159...
println biggerof("two","three") # two (alphatical ordering)
println biggerof(23, "45") # error (runtime or compiletime)

Note that this is not generics, so 'biggerof' is not a template for
multiple instantiations of 'biggerof'; it is just one function.

(This is the kind of example that would have made things clear.)

>
>>>> Then that's not a Variant type as I understand it. It's just a union
>>>> or tagged union, a much cruder type.
>>>
>>> You confuse a variant type with an ad-hoc class of types.
>>
>> James' proposal looks to me like an ad-hoc collection of types, within
>> an umbrella type.
>
> That is not variant, that is an ad-hoc class with inheritance and all
> bells and whistles. He may not understand that or be in denial, but that
> is his problem.
>
>> It might also be specifically created to get away from terms like
>> class, and member and inheritance!
>
> He has his delusions you have yours.

We're both implementing actual languages.


David Brown

unread,
Sep 23, 2021, 9:55:45 AM9/23/21
to
That is almost exactly what a variant /is/. But you are missing a few
operations - you can also read which type is contained in some manner
(perhaps by reading a field, perhaps by pattern matching or a special
type of switch, perhaps by trial-and-error and exceptions), and of
course you can assign values to it.

You seem to be mixing up variants with dynamic typing. Perhaps you have
been playing with Visual Basic, and copied MS's misuse of the term.

>
> Or if you try and use one of those values directly (t.int_value) the
> compiler will perform a behind the scenes check that it is an int.
>
> The C++ variant might do a little bit more.
>
> The kind of variant I've used in dynamic code is considerably more
> powerful:
>
>    for x in (100, 23.4, "cat", 50..60, [7,8,9], (100,110,120)) do
>        println x.type, "\t", x
>    end
>

It is not "more powerful". It is /different/. You can do some things
with it that you can't do with a real variant (as the term is normally
used), but it has other limitations.

Basically, what you are talking about is a void* pointer. That is
flexible, but has severe disadvantages in being sure your code is
correct. (Python uses a similar idea - it is great for writing quick
and dirty scripts or simple code, but terrible when you want to work
with large code bases and where you need to /know/ the code is correct,
not just run some tests.)

Dmitry A. Kazakov

unread,
Sep 23, 2021, 9:57:09 AM9/23/21
to
On 2021-09-23 15:47, Bart wrote:

> This is the proposed feature in action:
>
>    type T = int | real | string
>
>    function biggerof (T a, b)=>T =
>        if a>b then
>           return a
>        else
>           return b
>        fi
>     end

Type inference, when the interfaces of the class members are somehow
combined = non-starter.

It is impossible to infer semantics from interfaces. No need to discuss.

Bart

unread,
Sep 23, 2021, 10:31:54 AM9/23/21
to
On 23/09/2021 14:57, Dmitry A. Kazakov wrote:
> On 2021-09-23 15:47, Bart wrote:
>
>> This is the proposed feature in action:
>>
>>     type T = int | real | string
>>
>>     function biggerof (T a, b)=>T =
>>         if a>b then
>>            return a
>>         else
>>            return b
>>         fi
>>      end
>
> Type inference, when the interfaces of the class members are somehow
> combined = non-starter.
>
> It is impossible to infer semantics from interfaces. No need to discuss.
>

I don't really know what you're talking about. Such a feature works fine
in dynamic code:

function biggerof (a, b) =
if a>b then
return a
else
return b
fi
end

println biggerof(100, 200) ! 200
println biggerof(pi, 2) ! 3.14159...
println biggerof("two","three") ! two (alphatical ordering)
println biggerof(23, "45") ! error (runtime or compiletime)

Output is:

200
3.141593
two
PC Error:
Types not supported: comparemixed : int/string

What it doesn't do is restrict the types to int/real/string. I'd have to
do that manually to emulate my proposal more closely:

var T := (int, real, string) # just a list, not a type

function biggerof (a, b) =
unless a.type in T and b.type in T then
abort("Illegal types")
end
...

Now a call like: 'println biggerof(10..20, 30..40)' will fail ('range'
is not one of int, real, string).

So it is viable in static. But in my view not practical without
implementing most of a dynamic type system.

(Note that my biggerof() function does the same job as the built-in
max() operator; I couldn't think of a better example off-hand.)

Dmitry A. Kazakov

unread,
Sep 23, 2021, 10:48:00 AM9/23/21
to
On 2021-09-23 16:31, Bart wrote:
> On 23/09/2021 14:57, Dmitry A. Kazakov wrote:
>> On 2021-09-23 15:47, Bart wrote:
>>
>>> This is the proposed feature in action:
>>>
>>>     type T = int | real | string
>>>
>>>     function biggerof (T a, b)=>T =
>>>         if a>b then
>>>            return a
>>>         else
>>>            return b
>>>         fi
>>>      end
>>
>> Type inference, when the interfaces of the class members are somehow
>> combined = non-starter.
>>
>> It is impossible to infer semantics from interfaces. No need to discuss.
>
> I don't really know what you're talking about. Such a feature works fine
> in dynamic code:

You just have illustrated the point by providing a broken program that
has a type error at run time.

Bart

unread,
Sep 23, 2021, 11:23:18 AM9/23/21
to
It has an /error/ at runtime. Lots of programs have errors which occur
then, such as input errors. The language might deal with that by
generating an exception, which can be checked for, or just checking the
inputs, then it can issue a message, or do something else.

The program below does arithmetic on user input, so the compiler won't
know what types are being added (since they are dynamic).

Are you saying this program is broken?

const expr_error=10

do
print "Enter A and B> "
readln a, b # a,b can be numbers, or strings
stop when a=""

try
c := dosum(a,b)
fprintln "# + # is #", a, b, c

except expr_error then
fprintln "Can't add # and #", a, b

end
od

function dosum(a,b)=
unless a.isnumber and b.isnumber then
raise exprerror
end
return a+b
end

(This language can actually add strings or numbers, but it can't mix
them. The example allows only numbers.)

Dmitry A. Kazakov

unread,
Sep 23, 2021, 11:47:43 AM9/23/21
to
On 2021-09-23 17:23, Bart wrote:
> On 23/09/2021 15:48, Dmitry A. Kazakov wrote:
>> On 2021-09-23 16:31, Bart wrote:
>>> On 23/09/2021 14:57, Dmitry A. Kazakov wrote:
>>>> On 2021-09-23 15:47, Bart wrote:
>>>>
>>>>> This is the proposed feature in action:
>>>>>
>>>>>     type T = int | real | string
>>>>>
>>>>>     function biggerof (T a, b)=>T =
>>>>>         if a>b then
>>>>>            return a
>>>>>         else
>>>>>            return b
>>>>>         fi
>>>>>      end
>>>>
>>>> Type inference, when the interfaces of the class members are somehow
>>>> combined = non-starter.
>>>>
>>>> It is impossible to infer semantics from interfaces. No need to
>>>> discuss.
>>>
>>> I don't really know what you're talking about. Such a feature works
>>> fine in dynamic code:
>>
>> You just have illustrated the point by providing a broken program that
>> has a type error at run time.
>
> It has an /error/ at runtime.

Not an error, a type error = the language's type system is broken, as
expected.

Bart

unread,
Sep 23, 2021, 12:05:50 PM9/23/21
to
The language has one type: a 'variant'. A and B will ALWAYS be variants,
so they can never have the wrong type.

The tagged type-code is effectively runtime data.

You clearly don't like it, and anything you don't like, is broken in
your eyes.

Dmitry A. Kazakov

unread,
Sep 23, 2021, 2:02:23 PM9/23/21
to
They do since you infer operations from member types. That cannot work
and it does not.

Properly designed languages inherit interfaces rather than infer them.
Interface lists operations applicable to the type.

This allows to ensure existence of implementations of all operations at
compile time, statically, so that no type error (AKA "operation not
understood") may ever occur.

Bart

unread,
Sep 23, 2021, 2:30:01 PM9/23/21
to
On 23/09/2021 19:02, Dmitry A. Kazakov wrote:
> On 2021-09-23 18:05, Bart wrote:

>> The language has one type: a 'variant'. A and B will ALWAYS be
>> variants, so they can never have the wrong type.
>
> They do since you infer operations from member types. That cannot work
> and it does not.
>
> Properly designed languages inherit interfaces rather than infer them.
> Interface lists operations applicable to the type.
>
> This allows to ensure existence of implementations of all operations at
> compile time, statically, so that no type error (AKA "operation not
> understood") may ever occur.

So tagged unions are no good either. Since you don't know their exact
active type until runtime.

Anyway, the lines between compiled and interpreted, compile-time and
run-time, are less sharply defined now.

With a very fast compiler, you might be able to run Ada source code
directly from source, rather than a precompiled binary.

If there is a type error, the user won't care if that was at
compile-time or run-time, as there is no distinction.

Dmitry A. Kazakov

unread,
Sep 23, 2021, 3:12:09 PM9/23/21
to
On 2021-09-23 20:29, Bart wrote:
> On 23/09/2021 19:02, Dmitry A. Kazakov wrote:
>> On 2021-09-23 18:05, Bart wrote:
>
>>> The language has one type: a 'variant'. A and B will ALWAYS be
>>> variants, so they can never have the wrong type.
>>
>> They do since you infer operations from member types. That cannot work
>> and it does not.
>>
>> Properly designed languages inherit interfaces rather than infer them.
>> Interface lists operations applicable to the type.
>>
>> This allows to ensure existence of implementations of all operations
>> at compile time, statically, so that no type error (AKA "operation not
>> understood") may ever occur.
>
> So tagged unions are no good either. Since you don't know their exact
> active type until runtime.

Why do I need that? Again, all interfaces are manifested and static and
apply whatever value of the selector. See?

You still do not understand basic intertype relationships. The relation
between the variant type T having int as a choice is aggregation has-a.
T does not inherit anything from int, so it does not have + of int's
interface because it is has-a.

https://en.wikipedia.org/wiki/Has-a

This is what languages having tagged unions provide.

What you unsuccessfully trying to figure out is a very different thing,
is-a.

https://en.wikipedia.org/wiki/Is-a

That is when the type T inherits from int, and maybe from float and
maybe from string. So that an instance of T may play an int, or a float,
or a string.

This is not what the variant type is. This is multiple inheritance and
class-wide object which actual tag indicates the actual specific type
int, float, string. That one could inherit + from int, + from float,
nothing from string. Then in a decent language you would have to
reconcile inherited interfaces (like conflicting +'s) and provide
implementations, which in the case of + would be a multi-method with all
niceties it brings.

That both variant and class-wide objects have similar representations
with a tag indicating the actual choice or the specific type does not
make them same. Your obsession with representations prevents you from
seeing the whole picture and turns everything upside down.

Bart

unread,
Sep 23, 2021, 3:55:21 PM9/23/21
to
And yours is an obsession with classes and inheritance and the usual OO
nonsense.

Your links meant nothing to me; I saw it as result of people with too
much time on the hands inventing all this pointless jargon.

There is nothing to stop me calling my type-tags something entirely
different, so that my variants are just user data structures, and my
interpreter is just another application.

So, are you then going to tell how me I should design my applications?

Have a look at a file system some time. Files are blocks of data with
tags (maybe the extension, maybe an internal signature), which tell you
something about the data they contains.

An OS shell program issues commands on files. The command looks at the
file types and decides how to apply the command, which might be between
two files.

Just like my variant system.

Are you going to tell the people who write such file systems that they
got it all wrong too?

Mike Austin

unread,
Sep 23, 2021, 7:01:20 PM9/23/21
to
If I don't like blue cars, but that doesn't make them "wrong". Does static typing help? Absolutely. In this day and age, should everything be statically typed? In some form, probably. But there are plenty of productive Closure users for example that get along just fine. I'd simply say that people that need static typing - should use a statically typed language.

Oh jeez, did I just respond to one of the oldest debates in computing :)

For my language Kopi... I'm letting it go where it wants to go. It's an experiment, and I feel free not defining what it is exactly. I had prototypes of a static type checker, but currently I'm having so much fun implementing features.

Dmitry A. Kazakov

unread,
Sep 24, 2021, 3:04:21 AM9/24/21
to
Just terminology for concepts same in all languages and known for a half
of century.

> There is nothing to stop me calling my type-tags something entirely
> different, so that my variants are just user data structures, and my
> interpreter is just another application.

Yes, but in order to understand and compare languages one needs words
for design concepts.

> So, are you then going to tell how me I should design my applications?

No, I tell you what you are attempting and what consequences would be.

> Have a look at a file system some time. Files are blocks of data with
> tags (maybe the extension, maybe an internal signature), which tell you
> something about the data they contains.

Filesystems can be described in terms of types as well. If you read a
bit about historical filesystems you would know that.

James Harris

unread,
Sep 25, 2021, 11:19:57 AM9/25/21
to
On 22/09/2021 20:47, Bart wrote:
> On 22/09/2021 15:24, James Harris wrote:
>> On 09/09/2021 22:04, Bart wrote:
>
>>> Also, Variants could presumably be assigned something else of a
>>> different type? Then you can't even assume it will stay the same type.
>>
>> Wouldn't code which handles variants typically include a switch on the
>> type as in the following?
>>
>>    switch T.type
>>      case int:
>>        /* Handle int */
>>      case array float:
>>        /* Handle array of float */
>>    end switch /* Ignoring other cases */
>
> You need to show an example of source code where such types are used,
> and see that the generated code might be like.
>
> But let's take your example:
>
>    type U = int |[]float

Your lack of spacing makes that difficult to read.

...

>   if b.type=c.type=int then        # Hey, a chained compare example!

Indeed, and it looks quite natural.

Apart from the lack of spacing, of course.

...

>   type U = int32 | int64 | float32 | float64
>
>   U a, b, c, d, e
>
>   a = b*c + d*e
>
> All 16 combinations of types should be valid, but do you want to try
> writing the switch statements for them?

IMO the programmer should be able to focus on the algorithm and leave
the mechanisms to the compiler so I wouldn't want a programmer to have
to write a switch statement for that. A switch statement on types is
best used when the algorithms (and, hence, source code) for each case
are different. In my example, above, the array would presumably be
treated very differently from the scalar.

But you make a good point about how variants can be difficult to compile.

I'd add that they make it possible to have type mismatches at run time -
which would be undesirable.

>
> I don't think this amount of boilerplate is practical to write as inline
> code.
>
> You should however try doing so for example fragments of code to see as
> I suggested to see what is involved.
>
> This type of variant means that:
>
>    a := a[i]
>
> could be legal. Anything goes.

A good example of a variant is in a tokeniser where the token as read
from the input needs to be categorised into a keyword, a string, a name,
a number, a known and possibly composite symbol, a single character, etc.

Another good example is an incoming message - where the message - and
the way it has to be handled - could be one of a number of different types.

I cannot at the moment, however, think of a good example of where code
would have to switch on the combination of two or more variants - which
is the type of code you have been showing. A switch (in the source code)
on a single variant would be more normal, I would think.


--
James Harris

Bart

unread,
Sep 25, 2021, 12:05:16 PM9/25/21
to
On 25/09/2021 16:19, James Harris wrote:
> On 22/09/2021 20:47, Bart wrote:

> A good example of a variant is in a tokeniser where the token as read
> from the input needs to be categorised into a keyword, a string, a name,
> a number, a known and possibly composite symbol, a single character, etc.

That one can just be a tagged union (see below).

> I cannot at the moment, however, think of a good example of where code
> would have to switch on the combination of two or more variants - which
> is the type of code you have been showing. A switch (in the source code)
> on a single variant would be more normal, I would think.

I think look at some of my follow-up posts over last few days, since
there might be some confusion over what everyone means by 'variant'.

My idea of it is something that can be used exactly like a fixed type,
which is what happens with dynamic languages.

Some might use it to mean tagged union, with variable amounts of
language support.

(For that kind, I use zero language support. Here's how I express your
token example:

global record lexrec =
union
int64 value
real64 xvalue
word64 uvalue
ichar svalue
ref int128 pvalue128
symbol symptr
end

word32 pos: (sourceoffset:24, moduleno:8)

int16 symbol
int16 subcode
end

Which kind of token it represents is dertermined by the 'symbol' field.
It's been optimised to fit into a 16-byte record, something you have
less control over using language-controlled tagged unions.)

James Harris

unread,
Sep 25, 2021, 12:09:31 PM9/25/21
to
On 23/09/2021 14:55, David Brown wrote:
> On 23/09/2021 11:54, Bart wrote:

...

>> No one has yet explained exactly what kind of variant they are talking
>> about.
>>
>> You're assuming it is just a tagged union or sumtype, where all you can
>> do is write explicit code to extract one of the possible values.
>
> That is almost exactly what a variant /is/. But you are missing a few
> operations - you can also read which type is contained in some manner
> (perhaps by reading a field, perhaps by pattern matching or a special
> type of switch, perhaps by trial-and-error and exceptions), and of
> course you can assign values to it.
>
> You seem to be mixing up variants with dynamic typing. Perhaps you have
> been playing with Visual Basic, and copied MS's misuse of the term.

That's an interesting comment. What did MS do wrong and how are you
saying a variant should differ from a dynamic type?

In a language which uses dynamic typing one typically doesn't specify
the type of any object, just using something like

var x

That would allow x to be set to any type. AISI with a variant all one
does is limit the types which x can be, e.g.

(int # bool) x ;x can be of type int or type bool

How that's implemented wouldn't matter to the semantics.


--
James Harris

James Harris

unread,
Sep 25, 2021, 12:59:44 PM9/25/21
to
On 22/09/2021 23:13, Bart wrote:

...

> 'Variants' seem to be mean lots of different things.

...

>     variant a, b, c
>     a := b + c
>
> generated this x64 code (not ABI-compliant):
>
>           lea       D0,    [Dframe+-8]
>           push      D0
>           call      [L5]               # init a
>           lea       D0,    [Dframe+-16]
>           push      D0
>           call      [L5]               # init b
>           lea       D0,    [Dframe+-24]
>           push      D0
>           call      [L5]                   # init c
>
>           mov       D0, [Dframe+-16]       # 'push b'
>           inc       word32 [D0]            # step ref count
>           mov       D1, [Dframe+-24]       # 'push c'
>           inc       word32 [D1]            # step ref count
>           push      D0
>           push      D1
>           call      [L6]                   # addvar(b, c)
>           lea       D1, [Dframe+-8]        # &a
>           push      D0
>           push      D1
>           call      [L7]                   # 'pop 'a'
>
> So it's all done with function calls; little inline code. And it's
> inefficient.
>
> But that's how /I/ did it. This might not be what James has in mind.
>

I think the key part of that is the bit you've not shown: how the
variants are combined in addvar.


--
James Harris

James Harris

unread,
Sep 25, 2021, 1:59:56 PM9/25/21
to
Run-time errors happen.

Consider a parser which reads the keyword "return" and then expects
either an expression or end-of-statement. The /type/ of the following
token returned by the lexer may, instead, be a keyword. The parser would
then have a type error detected at run time.

Neither a program nor a language is 'broken' as long as the error is
detected.


--
James Harris

James Harris

unread,
Sep 25, 2021, 2:22:08 PM9/25/21
to
On 23/09/2021 13:20, Dmitry A. Kazakov wrote:
> On 2021-09-23 13:34, Bart wrote:

...

>    https://en.wikipedia.org/wiki/Tagged_union

...

>> James' proposal looks to me like an ad-hoc collection of types, within
>> an umbrella type.

I wouldn't go so far as to call it a proposal but what I have in mind is
roughly as Bart describes: an arbitrary collection of types where the
types are explicitly specified by the programmer (as with the typedef
below). An object thereof would be implemented by a tagged or
discriminated union as in the above link with the tag being managed by
the compiler.

Code to handle a certain variant would in general use a switch or an
'if' but cases could be combined. For example, the following code
(somewhat artificially) shares code for int and bool.

typedef answer = int # bool # char

function decrement(answer A)
switch A.type
case int
case bool
A = A - 1
case char
A = char(ord(A) - 1)
end switch
end function

The whole switch construct could be omitted if all cases shared source code

>
> That is not variant, that is an ad-hoc class with inheritance and all
> bells and whistles. He may not understand that or be in denial, but that
> is his problem.

No, there's not necessarily any class or inheritance. With your OO
mindset you may struggle to understand that but it is so, nonetheless.

>
>> It might also be specifically created to get away from terms like
>> class, and member and inheritance!
>
> He has his delusions you have yours.

Anything which isn't OO is a delusion, eh?


--
James Harris

James Harris

unread,
Sep 25, 2021, 2:29:55 PM9/25/21
to
On 25/09/2021 17:05, Bart wrote:
> On 25/09/2021 16:19, James Harris wrote:
>> On 22/09/2021 20:47, Bart wrote:
>
>> A good example of a variant is in a tokeniser where the token as read
>> from the input needs to be categorised into a keyword, a string, a
>> name, a number, a known and possibly composite symbol, a single
>> character, etc.
>
> That one can just be a tagged union (see below).

Yes.
I agree with all that (though your double use of "symbol" looks
strange). But I thought you were going to come up with an example of
where TWO variants had to be combined in a single dyadic operation.

Also, in your example it looks as though the tag could be changed
without changing the contents of the union. IMO it's better to require
tag and content to change at the same time, if possible, so as to
maintain type safety. Just because this is a variant doesn't mean that
type safety needs to be lost.


--
James Harris

Dmitry A. Kazakov

unread,
Sep 25, 2021, 2:31:39 PM9/25/21
to
On 2021-09-25 19:59, James Harris wrote:
> On 23/09/2021 15:48, Dmitry A. Kazakov wrote:
>> On 2021-09-23 16:31, Bart wrote:
>>> On 23/09/2021 14:57, Dmitry A. Kazakov wrote:
>>>> On 2021-09-23 15:47, Bart wrote:
>>>>
>>>>> This is the proposed feature in action:
>>>>>
>>>>>     type T = int | real | string
>>>>>
>>>>>     function biggerof (T a, b)=>T =
>>>>>         if a>b then
>>>>>            return a
>>>>>         else
>>>>>            return b
>>>>>         fi
>>>>>      end
>>>>
>>>> Type inference, when the interfaces of the class members are somehow
>>>> combined = non-starter.
>>>>
>>>> It is impossible to infer semantics from interfaces. No need to
>>>> discuss.
>>>
>>> I don't really know what you're talking about. Such a feature works
>>> fine in dynamic code:
>>
>> You just have illustrated the point by providing a broken program that
>> has a type error at run time.
>
> Run-time errors happen.

No, errors do not. At run-time happen exceptions. Type error is a bug,
constraint check failure is a valid program state. In a properly
designed language type errors are detected at compile time.

> Consider a parser which reads the keyword "return" and then expects
> either an expression or end-of-statement. The /type/ of the following
> token returned by the lexer may, instead, be a keyword. The parser would
> then have a type error detected at run time.

Broken language.

> Neither a program nor a language is 'broken' as long as the error is
> detected.

By whom and when?

Dmitry A. Kazakov

unread,
Sep 25, 2021, 2:44:07 PM9/25/21
to
On 2021-09-25 20:22, James Harris wrote:
> On 23/09/2021 13:20, Dmitry A. Kazakov wrote:

>> That is not variant, that is an ad-hoc class with inheritance and all
>> bells and whistles. He may not understand that or be in denial, but
>> that is his problem.
>
> No, there's not necessarily any class or inheritance. With your OO
> mindset you may struggle to understand that but it is so, nonetheless.

There necessarily is, IFF variant acts as its members. That is the
definition of subtyping, subtypes form a class, also per definition of.

Standard variant does not do this, it is just an aggregation = has-a.

>>> It might also be specifically created to get away from terms like
>>> class, and member and inheritance!
>>
>> He has his delusions you have yours.
>
> Anything which isn't OO is a delusion, eh?

Anything that contradicts reality is. OO has nothing to do with all
this. Whatever choice you or Bart do it falls into one of well known and
described categories.

Delusion is to believe that if you would not call a spade spade then it
might turn into a Lamborghini.

Bart

unread,
Sep 25, 2021, 3:28:00 PM9/25/21
to
That's not inline code; it will just do whateever is necessary. My point
was that I found it too complex to do it inline; just the above is
long-winded enough.

But this s the code for when b and c are strings; a can be anything, but
will also be a string after the assignment. 'addvar' is really m$add_var:

global function m$add_var(object b,a)object =
m$obj_bin_m(v_add,a,b)
end

function m$obj_bin_m(int opc, object a, b)object c=
c := obj_bin(opc, a, b)
if --a.refcount = 0 then obj_free(a) fi
if --b.refcount = 0 then obj_free(b) fi
return c
end

global function obj_bin(int opc, object a, b)object=
ref function(object a, b)object fnptr

fnptr := binoptable[opc, a.tag, b.tag]
if fnptr = nil then
fnptr := binoptable[opc, a.tag, b.tag] :=
getbinophandler(opc, a.tag, b.tag)
if fnptr = nil then
novxhandler(binopnames[opc], a.tag, b.tag)
fi
fi

return fnptr^(a, b)
end

global function string_add(object a, b)object c=
return string_concat(a, b)
end

The dispatch between two variant types, which I think you asked about
elsewhere, is done with that binoptable 3D array.

This is quite a heavyweight method of implementing first-class strings,
for which I'd used methods imported from my interpreter projects.

I didn't really intend a variant type here to be anything other than a
string. So for statically-defined strings, this could probably be
streamlined quite a bit: inline code directly calls string_add, but it
would have to deal reference counts as these strings are shared.

If you want variants with a choice of /simple/ types (say just scalars
manipulated by value), then it can again be streamlined, but probably
still not enough to fully inline.

Bart

unread,
Sep 25, 2021, 4:58:04 PM9/25/21
to
Does this function exist in the user's source code, or is it part of the
implmementation which is called with the user code does:

answer X=1234 # X set to int type and value 1234
--X # X ends up as having int type still and value 1233

If the latter, you will also need something to deal with this line:

A = A - 1 # (A additionally needs to be a reference type)

Also, you need something for ord(A). And also, for assignment, although
that is simple enough to do inline, provided that A's existing type
doesn't need a complex destruct routine.

At some point, there must be code that knows where all the components of
A are stored.

Bart

unread,
Sep 25, 2021, 7:48:31 PM9/25/21
to
On 25/09/2021 19:29, James Harris wrote:
> On 25/09/2021 17:05, Bart wrote:

>> (For that kind, I use zero language support. Here's how I express your
>> token example:
>>
>>      global record lexrec =
>>          union
>>              int64 value
>>              real64 xvalue
>>              word64 uvalue
>>              ichar svalue
>>              ref int128 pvalue128
>>              symbol symptr
>>          end
>>
>>          word32 pos: (sourceoffset:24, moduleno:8)
>>
>>          int16 symbol
>>          int16 subcode
>>      end
>>
>> Which kind of token it represents is dertermined by the 'symbol'
>> field. It's been optimised to fit into a 16-byte record, something you
>> have less control over using language-controlled tagged unions.)
>>
>
> I agree with all that (though your double use of "symbol" looks
> strange).

I edited after pasting to bring it into line with usage elsewhere so
that the type 'ref strec' uses the alias 'symbol'. I hadn't spotted it
clashed with a field name. That probably explains why it was as it was!

> But I thought you were going to come up with an example of
> where TWO variants had to be combined in a single dyadic operation.

This why I don't call such constructs variants.


> Also, in your example it looks as though the tag could be changed
> without changing the contents of the union. IMO it's better to require
> tag and content to change at the same time, if possible, so as to
> maintain type safety.

There is only a loose coupling here between the token type (.symbol),
and the set of values, which are only relevant for a handful of the
dozens of kinds of token.

That's one of several reasons why language-controlled tagged unions
don't suit my style of coding. The tag needs to be a global enum as it
will be used in different places and can be used in other logic than
just selecting one element of a union.

In another language-related struct, an AST node, the tag is used to
lookup flags in a table that tell it whether 0, 1 or 2 sub-nodes are
used. Each subnode pointer is a union.


James Harris

unread,
Sep 26, 2021, 3:57:55 AM9/26/21
to
On 25/09/2021 19:31, Dmitry A. Kazakov wrote:
> On 2021-09-25 19:59, James Harris wrote:
>> On 23/09/2021 15:48, Dmitry A. Kazakov wrote:
>>> On 2021-09-23 16:31, Bart wrote:
>>>> On 23/09/2021 14:57, Dmitry A. Kazakov wrote:

...

>>>>> It is impossible to infer semantics from interfaces. No need to
>>>>> discuss.
>>>>
>>>> I don't really know what you're talking about. Such a feature works
>>>> fine in dynamic code:
>>>
>>> You just have illustrated the point by providing a broken program
>>> that has a type error at run time.
>>
>> Run-time errors happen.
>
> No, errors do not. At run-time happen exceptions. Type error is a bug,
> constraint check failure is a valid program state. In a properly
> designed language type errors are detected at compile time.

You always get hung up on terminology, Dmitry. Whether you call them
errors or exceptions the issue is the same.

>
>> Consider a parser which reads the keyword "return" and then expects
>> either an expression or end-of-statement. The /type/ of the following
>> token returned by the lexer may, instead, be a keyword. The parser
>> would then have a type error detected at run time.
>
> Broken language.

You've again caught yourself out due to inflexibility of thought
patterns. I didn't mention a language. Try reading the paragraph again.

>
>> Neither a program nor a language is 'broken' as long as the error is
>> detected.
>
> By whom and when?
>

In the context, by the executable code at run time.


--
James Harris

James Harris

unread,
Sep 26, 2021, 4:23:56 AM9/26/21
to
On 25/09/2021 19:44, Dmitry A. Kazakov wrote:
> On 2021-09-25 20:22, James Harris wrote:
>> On 23/09/2021 13:20, Dmitry A. Kazakov wrote:
>
>>> That is not variant, that is an ad-hoc class with inheritance and all
>>> bells and whistles. He may not understand that or be in denial, but
>>> that is his problem.
>>
>> No, there's not necessarily any class or inheritance. With your OO
>> mindset you may struggle to understand that but it is so, nonetheless.
>
> There necessarily is, IFF variant acts as its members. That is the
> definition of subtyping,

That's interesting. I see that definition at

https://en.wikipedia.org/wiki/Subtyping

but according to

http://archive.adaic.com/standards/83lrm/html/lrm-03-03.html

"The set of values of a subtype is a subset of the values of the base
type" which I thought of much as day-of-month (1-31) might be a subtype
of integer. Am I wrong?

Why the difference in definitions? Is it because OOP has hijacked
familiar terms?

>
> subtypes form a class, also per definition of.
>
> Standard variant does not do this, it is just an aggregation = has-a.
>
>>>> It might also be specifically created to get away from terms like
>>>> class, and member and inheritance!
>>>
>>> He has his delusions you have yours.
>>
>> Anything which isn't OO is a delusion, eh?
>
> Anything that contradicts reality is. OO has nothing to do with all
> this. Whatever choice you or Bart do it falls into one of well known and
> described categories.

It doesn't help communication if OOP appropriated to itself terms like
class, member and inheritance and gave then new meanings. IIRC even you
sometimes speak of a class in different terms from an OOP class.

Maybe OOP appropriates familiar terms to try to contain its overblown
complexity and make it seems more user-friendly but that just causes
problems for everyone.

I have to say that what makes things worse is that in these discussions
you do tend to rely on definitions - like class or, in a recent post,
error - which doesn't help much if they have different meanings in
different contexts.

>
> Delusion is to believe that if you would not call a spade spade then it
> might turn into a Lamborghini.
>

Well, that adds clarity to the discussion. ;-)


--
James Harris

Dmitry A. Kazakov

unread,
Sep 26, 2021, 4:38:15 AM9/26/21
to
On 2021-09-26 09:57, James Harris wrote:
> On 25/09/2021 19:31, Dmitry A. Kazakov wrote:
>> On 2021-09-25 19:59, James Harris wrote:
>>> On 23/09/2021 15:48, Dmitry A. Kazakov wrote:
>>>> On 2021-09-23 16:31, Bart wrote:
>>>>> On 23/09/2021 14:57, Dmitry A. Kazakov wrote:
>
> ...
>
>>>>>> It is impossible to infer semantics from interfaces. No need to
>>>>>> discuss.
>>>>>
>>>>> I don't really know what you're talking about. Such a feature works
>>>>> fine in dynamic code:
>>>>
>>>> You just have illustrated the point by providing a broken program
>>>> that has a type error at run time.
>>>
>>> Run-time errors happen.
>>
>> No, errors do not. At run-time happen exceptions. Type error is a bug,
>> constraint check failure is a valid program state. In a properly
>> designed language type errors are detected at compile time.
>
> You always get hung up on terminology, Dmitry. Whether you call them
> errors or exceptions the issue is the same.

No, it is very important distinction. You can always convert errors into
exceptions, by weakening contracts of the interfaces. But that has
direct consequences for the user who must deal with a weaker contracts
and this weakness propagates up the inheritance tree, so that the
interface you would never expect raising errors suddenly does this, e.g.
arithmetic + subscript errors.

>>> Neither a program nor a language is 'broken' as long as the error is
>>> detected.
>>
>> By whom and when?
>
> In the context, by the executable code at run time.

In that context consider an aircraft you are sitting in...

James Harris

unread,
Sep 26, 2021, 4:52:07 AM9/26/21
to
On 25/09/2021 21:57, Bart wrote:
> On 25/09/2021 19:22, James Harris wrote:
>> On 23/09/2021 13:20, Dmitry A. Kazakov wrote:

>>>     https://en.wikipedia.org/wiki/Tagged_union

...

>> Code to handle a certain variant would in general use a switch or an
>> 'if' but cases could be combined. For example, the following code
>> (somewhat artificially) shares code for int and bool.
>>
>>    typedef answer = int # bool # char
>>
>>    function decrement(answer A)
>>      switch A.type
>>      case int
>>      case bool
>>        A = A - 1
>>      case char
>>        A = char(ord(A) - 1)
>>      end switch
>>    end function
>>
>> The whole switch construct could be omitted if all cases shared source
>> code
>
> Does this function exist in the user's source code, or is it part of the
> implmementation which is called with the user code does:
>
>   answer X=1234    # X set to int type and value 1234
>   --X              # X ends up as having int type still and value 1233

It was meant to be an outline for a function which could be called as

decrement(X)

I picked it because it was an operation which had just one argument but
it wasn't a good example. In my language, for example, if the parameter,
A, were to be modifiable by the function it would have a corresponding
parameter definition and the actual in the call would need to appear as &X.


>
> If the latter, you will also need something to deal with this line:
>
>   A = A - 1        # (A additionally needs to be a reference type)
>
> Also, you need something for ord(A). And also, for assignment, although
> that is simple enough to do inline, provided that A's existing type
> doesn't need a complex destruct routine.

True.

>
> At some point, there must be code that knows where all the components of
> A are stored.
>

I don't understand that. If the type is a variant wouldn't there just be

tag
value

where the tag would be an index of 0 to 2 in the case of the value being
of type "int # bool # char"?


--
James Harris

Dmitry A. Kazakov

unread,
Sep 26, 2021, 4:54:13 AM9/26/21
to
On 2021-09-26 10:23, James Harris wrote:
> On 25/09/2021 19:44, Dmitry A. Kazakov wrote:
>> On 2021-09-25 20:22, James Harris wrote:
>>> On 23/09/2021 13:20, Dmitry A. Kazakov wrote:
>>
>>>> That is not variant, that is an ad-hoc class with inheritance and
>>>> all bells and whistles. He may not understand that or be in denial,
>>>> but that is his problem.
>>>
>>> No, there's not necessarily any class or inheritance. With your OO
>>> mindset you may struggle to understand that but it is so, nonetheless.
>>
>> There necessarily is, IFF variant acts as its members. That is the
>> definition of subtyping,
>
> That's interesting. I see that definition at
>
>   https://en.wikipedia.org/wiki/Subtyping
>
> but according to
>
>   http://archive.adaic.com/standards/83lrm/html/lrm-03-03.html
>
> "The set of values of a subtype is a subset of the values of the base
> type" which I thought of much as day-of-month (1-31) might be a subtype
> of integer. Am I wrong?
>
> Why the difference in definitions? Is it because OOP has hijacked
> familiar terms?

Historically Ada used the term subtype for a very narrow set of
subtyping. So, to clarify one would say "Ada subtype", or "Liskov
subtype" etc. Same with C++ classes, which are just types. C++ class
reads "dynamically polymorphic type."

>>>>> It might also be specifically created to get away from terms like
>>>>> class, and member and inheritance!
>>>>
>>>> He has his delusions you have yours.
>>>
>>> Anything which isn't OO is a delusion, eh?
>>
>> Anything that contradicts reality is. OO has nothing to do with all
>> this. Whatever choice you or Bart do it falls into one of well known
>> and described categories.
>
> It doesn't help communication if OOP appropriated to itself terms like
> class, member and inheritance and gave then new meanings. IIRC even you
> sometimes speak of a class in different terms from an OOP class.

OOP did not do that, in OOPL the terminology is as unsettled as anywhere
else. BTW, Simula is early 60's, you are welcome to claim youe
appropriation case!

But, one can agree on different terms, just define them, then it is OK.
If you want to use "class" for "type" like in C++, that is fine, just
give another word for class. Do not say they do not exist because you
used the name...

I have no issue with terms, but with broken semantics.

Bart

unread,
Sep 26, 2021, 6:10:14 AM9/26/21
to
Then this is just user-code.

So where does the variant magic come in? In your example, it appears to
be in the lines A=A-1, and A=char(ord(A)-1.

Suppose we used the trick where the language transpiles into C, and had
two columns where on the left is what the user writes in the language,
and on the right is what is generated. Then your example could have
looked like this:

New Langauge | Generated C

| enum {Void=0, Int, Bool, Char};
|
type answer = |
int # bool # char | typedef struct {int tag, value;} answer;
|
answer A | answer A = {Void, 0};
|
A = 1234 | A.tag = Int;
| A.value = 1234;
|
--A | switch (A.tag) {
| case Int: --A.value; break;
| case Char: --A.value; break;
| case Bool:
| if (A.value) --A.value;
| else /* error */;
| break;
| default: /* error */;
| }



It's not a great example, because it's not challenging: the variant
types aren't different enough to warrant the use of a union, and all the
branches of the switch can use the same code (I tried to mix it up with
Bool).

I'm using here a global set of enums for the types; but if there are 50
types in all (that MS link used a variant of 50), and your variant uses
only 3, you may want to use ordinals 1, 2, 3 instead of 12, 15, 39.

I've also uses an extra type variation: Void, whose tag has the value
zero, and is used when no value has yet been assumed. This allows such
variables in .bss for example, to have a stable type.

Anyway, you can extend my example with more operations on A, but I
suggest the type is changed to:

type answer = int # float # string

(Where string is just a char* type, nothing complicated, and it will
never 'own' its data.)

There is probably no point in having different widths of int, since each
instance will have the same size anyway. But then maybe you may want
values limited to 255 or 65535.

James Harris

unread,
Sep 26, 2021, 6:32:06 AM9/26/21
to
On 26/09/2021 09:38, Dmitry A. Kazakov wrote:
> On 2021-09-26 09:57, James Harris wrote:
>> On 25/09/2021 19:31, Dmitry A. Kazakov wrote:
>>> On 2021-09-25 19:59, James Harris wrote:

...

>>>> Neither a program nor a language is 'broken' as long as the error is
>>>> detected.
>>>
>>> By whom and when?
>>
>> In the context, by the executable code at run time.
>
> In that context consider an aircraft you are sitting in...
>

try
one path
catch exception
other path

Plane keeps flying.


--
James Harris

Bart

unread,
Sep 26, 2021, 7:04:21 AM9/26/21
to
You seems to think that having an impossibly strict type system and
using the right jargon for everything somehow magically eliminates all bugs.

Most bugs IME are not caused by a too-lax type system. Over-strict
typing is more likely to /generate/ bugs IMO since you have to pay too
much attention to over-pedanticness and you take your eye off the bigger
picture.

I used to develop some programs using dynamic code which doesn't care
about exact types (it just stops you adding 37.2 to "cat"), and allows
you to quickly develop a working program free of logic errors.

/Then/ I would port that to statically typed code to benefit from its
speed. But I'd know the program was correct, and any errors now are
likely due to the porting.


>>>> Neither a program nor a language is 'broken' as long as the error is
>>>> detected.
>>>
>>> By whom and when?
>>
>> In the context, by the executable code at run time.
>
> In that context consider an aircraft you are sitting in...

Aeroplanes seemed to manage perfectly well without computers at all!

The stuff I write isn't mission critical so there's no point in tying
myself up in knots for no reason.
It is loading more messages.
0 new messages