
Syntax for user-defined infix operators


James Harris

Aug 21, 2009, 8:54:15 AM
Opinions sought....

Many (maybe most) languages accept symbols as infix operators for
binary (two-operand) operations such as

x + y

Some also predefine words as infix operators such as Pascal's

i div j

I would like to offer to programmers the ability to use the same
syntax as is available for built-in operations so instead of

op(a, b)

the programmer could code

a op b

for a user-defined binary operator, op.

The problem with this is that we have, effectively, three adjacent
words. The Pascal example only looks right because 1) i and j look
like variable names, 2) div sounds a little like an operation, 3)
Pascal reserves div so it is a known word.

Of course a programmer defining an operation could ensure that
something like a verb was used and could also ensure variables are
something like nouns but is that enough?

I'm thinking of using a modifier symbol. Some languages modify the
variables such as

$a op $b

It may be better to modify the operation such as

a $op b

where $ as a prefix indicates that the word is an operation. Maybe a
suffix or both prefix and suffix would be better. Maybe a different
symbol should be used.

Anyone have suggestions for a syntax that makes the operation clear? I
should say this is intended to be for both unary and binary operators.
Operators with more operands would have to come before the operands
such as op(a, b, c). Whatever notation is used may be best defined as
optional - i.e. just used for clarity where needed. What do you think?

James

Dmitry A. Kazakov

Aug 21, 2009, 9:21:33 AM
On Fri, 21 Aug 2009 05:54:15 -0700 (PDT), James Harris wrote:

> Opinions sought....
>
> Many (maybe most) languages accept symbols as infix operators for
> binary (two-operand) operations such as
>
> x + y
>
> Some also predefine words as infix operators such as Pascal's
>
> i div j
>
> I would like to offer to programmers the ability to use the same
> syntax as is available for built-in operations so instead of
>
> op(a, b)
>
> the programmer could code
>
> a op b
>
> for a user-defined binary operator, op.

This is ambiguous for unary operations. Consider:

+(1)

is it:

1. Plus(1)

or

2. Plus(Order_Brackets(1))

> The problem with this is that we have, effectively, three adjacent
> words. The Pascal example only looks right because 1) i and j look
> like variable names, 2) div sounds a little like an operation, 3)
> Pascal reserves div so it is a known word.

Well, without an ambiguity between overloaded parameter list brackets ()
and ordering brackets (), it is no problem to parse i div j, provided that
operations are keywords. It is also possible to parse it even if div is not
reserved:

div div div

is OK and means div(div, div). The problem is with unary operations, which
will collide with proper names:

plus plus plus

is it + + plus or plus + plus?

> Of course a programmer defining an operation could ensure that
> something like a verb was used and could also ensure variables are
> something like nouns but is that enough?
>
> I'm thinking of using a modifier symbol. Some languages modify the
> variables such as
>
> $a op $b
>
> It may be better to modify the operation such as
>
> a $op b
>
> where $ as a prefix indicates that the word is an operation. Maybe a
> suffix or both prefix and suffix would be better. Maybe a different
> symbol should be used.

I think it makes no sense to allow the programmer to introduce new infix
(and unary) operations, because he would need to give them priorities and
associativity. But you need the priorities in the parser to make it work.
It is possible to reconstruct the parser dynamically each time the
programmer defines an operator, but that will open a can of worms. For
example, nested operation declarations will change the syntax of their
scope. Nobody will be able to understand such a program or the errors
emitted by the compiler.

A pragmatic approach is to fix all operations and their priorities, i.e.
there is a predefined symbol + with priority lower than *, etc. You can
introduce a special syntax for op(x,y,z). For example, C++ uses operator op
to construct a proper name for op. Ada uses "op". I think these are good
pragmatic solutions, especially because the syntax op(x,y,z) is rarely used
with operations.
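
To make the dependency on the priorities concrete, here is a minimal
sketch of precedence climbing over a fixed operator table (Haskell,
purely illustrative; the table, the token handling and all names are
invented for the example):

import Data.Char (isDigit)

-- Fixed table of operator priorities (higher binds tighter).
opTable :: [(String, Int)]
opTable = [("+", 6), ("-", 6), ("*", 7), ("/", 7)]

data Expr = Num Int | BinOp String Expr Expr deriving Show

-- Parse an expression whose operators all have priority >= minPrec.
-- The lookup into opTable is exactly where the parser must already
-- know the priorities.
parseExpr :: Int -> [String] -> (Expr, [String])
parseExpr minPrec ts0 = loop (parseAtom ts0)
  where
    loop (lhs, op : rest)
      | Just prec <- lookup op opTable, prec >= minPrec =
          let (rhs, rest') = parseExpr (prec + 1) rest  -- left-associative
          in loop (BinOp op lhs rhs, rest')
    loop (lhs, rest) = (lhs, rest)

parseAtom :: [String] -> (Expr, [String])
parseAtom (t : ts) | all isDigit t = (Num (read t), ts)
parseAtom ts = error ("number expected at " ++ show ts)

main :: IO ()
main = print (fst (parseExpr 0 (words "1 + 2 * 3 - 4")))
-- BinOp "-" (BinOp "+" (Num 1) (BinOp "*" (Num 2) (Num 3))) (Num 4)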

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

tm

Aug 21, 2009, 9:40:30 AM
On 21 Aug., 14:54, James Harris <james.harri...@googlemail.com> wrote:
> Opinions sought....
>
> Many (maybe most) languages accept symbols as infix operators for
> binary (two-operand) operations such as
>
> x + y
>
> Some also predefine words as infix operators such as Pascal's
>
> i div j
>
> I would like to offer to programmers the ability to use the same
> syntax as is available for built-in operations so instead of
>
> op(a, b)
>
> the programmer could code
>
> a op b
>
> for a user-defined binary operator, op.
>
> The problem with this is that we have, effectively, three adjacent
> words.

For a parser it is not a problem to recognize variables and operator
symbols when only words are used.

> The Pascal example only looks right because 1) i and j look
> like variable names, 2) div sounds a little like an operation, 3)
> Pascal reserves div so it is a known word.

It is not necessary to reserve operator symbols in order to make
user-defined operator symbols possible. Misuse of an operator symbol
as the name of a variable can still be prohibited without reserved
symbols. Seed7 supports user-defined operators without reserved
symbols. See:

http://seed7.sourceforge.net/faq.htm#reserved_words

> Of course a programmer defining an operation could ensure that
> something like a verb was used and could also ensure variables are
> something like nouns but is that enough?

I think distinguishing between verbs and nouns is not a good idea.

> I'm thinking of using a modifier symbol. Some languages modify the
> variables such as
>
> $a op $b
>
> It may be better to modify the operation such as
>
> a $op b
>
> where $ as a prefix indicates that the word is an operation. Maybe a
> suffix or both prefix and suffix would be better. Maybe a different
> symbol should be used.

IMHO no modifier symbols are necessary. When a symbol is defined to
be an infix or prefix operator it is distinguishable from variables.

> Anyone have suggestions for a syntax that makes the operation clear? I
> should say this is intended to be for both unary and binary operators.
> Operators with more operands would have to come before the operands
> such as op(a, b, c). Whatever notation is used may be best defined as
> optional - i.e. just used for clarity where needed. What do you think?

You might be interested to take a look at how Seed7 defines the
syntax of operators. See:

http://seed7.sourceforge.net/manual/syntax.htm#The_syntax_of_operators

Seed7 is not restricted to user defined operator symbols. The syntax
of (almost) all constructs is defined in the language itself. See:

http://seed7.sourceforge.net/manual/syntax.htm

Greetings Thomas Mertes

Seed7 Homepage: http://seed7.sourceforge.net
Seed7 - The extensible programming language: User defined statements
and operators, abstract data types, templates without special
syntax, OO with interfaces and multiple dispatch, statically typed,
interpreted or compiled, portable, runs under linux/unix/windows.

tm

Aug 21, 2009, 10:43:22 AM
On 21 Aug., 15:21, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
wrote:

> On Fri, 21 Aug 2009 05:54:15 -0700 (PDT), James Harris wrote:
> > The problem with this is that we have, effectively, three adjacent
> > words. The Pascal example only looks right because 1) i and j look
> > like variable names, 2) div sounds a little like an operation, 3)
> > Pascal reserves div so it is a known word.
>
> Well, without an ambiguity between overloaded parameter list brackets ()
> and ordering brackets (), it is no problem to parse i div j, provided that
> operations are keywords. It is also possible to parse it even if div is not
> reserved:
>
> div div div
>
> is OK and means div(div, div).

Correct.
Seed7 accepts this also, since the variable 'div' is a prefix
object and the operator 'div' is an infix object. The following
code snippet was accepted by the 'hi' (Seed7) interpreter:

$ include "seed7_05.s7i";

var integer: div is 3;
var integer: test is div div div;

Whether 'div div div' makes sense is a different question. Even if such
things were prohibited by the compiler, writing obfuscated code could
never be prevented. IMHO it is essentially the job of the programmer
to write easy-to-read programs. A language can just help to make it
unambiguous.

> The problem is with unary operations, which
> will collide with proper names:
>
> plus plus plus
>
> is it + + plus or plus + plus?

When 'plus' is defined as both an infix and a prefix operator you
would not be able to define it as a variable in Seed7. I wrote the
following program into the file operator.sd7:

$ include "seed7_05.s7i";

$ syntax expr: .plus.() is -> 12;
$ syntax expr: .().plus.() is -> 14;

var integer: plus is 0;

Starting the 'hi' (Seed7) interpreter gave:

D:\Programme\seed7\prg>hi operator
HI INTERPRETER Version 4.5.5095 Copyright (c) 1990-2009 Thomas...
262 D:/Programme/seed7/lib/syntax.s7i
3545 D:/Programme/seed7/lib/seed7_05.s7i
*** operator.sd7(6):30: Expression expected found "is"
var integer: plus is 0;
--------------------^

The compiler recognizes 'plus' as an infix operator and expects
something after it. Since the 'is' is part of the surrounding
construct, an expression is missing. Parsers (such as the table-driven
parser of Seed7) usually work left to right, so a left-to-right
interpretation is normally preferred.

> > Of course a programmer defining an operation could ensure that
> > something like a verb was used and could also ensure variables are
> > something like nouns but is that enough?
>
> > I'm thinking of using a modifier symbol. Some languages modify the
> > variables such as
>
> > $a op $b
>
> > It may be better to modify the operation such as
>
> > a $op b
>
> > where $ as a prefix indicates that the word is an operation. Maybe a
> > suffix or both prefix and suffix would be better. Maybe a different
> > symbol should be used.
>
> I think it makes no sense to allow the programmer to introduce new infix
> (and unary) operations, because he would need to give them priorities
> and associativity.

And what is the problem when the user defines new operator symbols
with priority and associativity?

> But you need the priorities in the parser to make it work.

Seed7 has solved this problem with a table driven LL(1) parser.

> It is possible to reconstruct the parser dynamically each time the
> programmer defines an operator, but that will open a can of worms.

It is not necessary to reconstruct the parser dynamically. Just the
tables need to be changed.
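
As a sketch of the table-driven idea (my own illustration, not Seed7's
actual mechanism; the table format, the eval function and the ^^
operator are all invented), the priorities can live in an ordinary data
structure that a declaration merely extends, while the parsing code
itself stays fixed:

import Data.Char (isDigit)

type OpTable = [(String, Int)]

baseTable :: OpTable
baseTable = [("+", 6), ("*", 7)]

-- A user-level operator declaration just extends the table; the
-- evaluator below is never touched.
declareOp :: String -> Int -> OpTable -> OpTable
declareOp sym prec = ((sym, prec) :)

-- Shunting-yard style evaluation: reduce while the stacked operator
-- binds at least as tightly as the incoming one (all operators are
-- treated as left-associative here).
eval :: OpTable -> [String] -> Int
eval table = go [] []
  where
    prec op = maybe (error ("unknown operator " ++ op)) id (lookup op table)
    apply "+"  a b = a + b
    apply "*"  a b = a * b
    apply "^^" a b = a ^ b  -- semantics of the invented operator
    apply op   _ _ = error ("no semantics for " ++ op)
    go nums ops (t : ts)
      | all isDigit t = go (read t : nums) ops ts
      | otherwise     = shift nums ops t ts
    go nums ops [] = finish nums ops
    shift (b : a : nums) (op : ops) t ts
      | prec op >= prec t = shift (apply op a b : nums) ops t ts
    shift nums ops t ts = go nums (t : ops) ts
    finish (b : a : nums) (op : ops) = finish (apply op a b : nums) ops
    finish [n] [] = n
    finish _ _ = error "malformed expression"

main :: IO ()
main = do
  print (eval baseTable (words "1 + 2 * 3"))   -- 7
  let table' = declareOp "^^" 8 baseTable      -- "declare" ^^
  print (eval table' (words "2 ^^ 3 * 2"))     -- 16: ^^ binds tighter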

> For example, nested
> operation declarations will change the syntax of their scope.

Nesting of syntax declarations is not supported in Seed7.

> Nobody will
> be able to understand such a program or the errors emitted by the compiler.

While there is always room to improve, the error messages of the
'hi' (Seed7) interpreter are useful.

> A pragmatic approach is to fix all operations and their priorities, i.e.
> there is a predefined symbol + with priority lower than *, etc.

No, it is just not necessary to fix all operator symbols and their
priorities in the compiler.

> You can
> introduce a special syntax for op(x,y,z).

Are you suggesting an additional notation used when the operator
is defined? Is 'y' the operator symbol and 'op' used for all
operators? An additional function style notation for operator
symbols is IMHO a bad idea. Operators are operators and functions
are functions. If an operator syntax is allowed it can be used both
in the semantic declaration and when the operator is used. Seed7
distinguishes between syntax and semantic declarations. Once a
syntax declaration has been made, the semantic declaration does not
need a special notation.

In Seed7 the syntax of the infix '+' operator is defined with:

$ syntax expr: .(). + .() is -> 7;

and the semantic definition is defined with:

const func integer: (in integer: a) + (in integer: b) is ...

Robbert Haarman

Aug 21, 2009, 10:49:37 AM
On Fri, Aug 21, 2009 at 05:54:15AM -0700, James Harris wrote:
>
> I would like to offer to programmers the ability to use the same
> syntax as is available for built-in operations so instead of
>
> op(a, b)
>
> the programmer could code
>
> a op b
>
> for a user-defined binary operator, op.

You may want to take a look at how Haskell does it. In Haskell, any function
whose name consists entirely of "symbols" (characters you would normally
expect to find in infix operators) is an infix operator. Other names are
prefix by default. E.g.

12 + 4
div 12 4

You can use an infix operator in prefix position by surrounding it with
parentheses (the parenthesized operator is then an ordinary function),
and you can use a prefix function in infix position by surrounding it
with backticks:

(+) 12 4
12 `div` 4

Moreover, you can define associativity and priority using "fixity
declarations", where you declare whether your function is left-associative
(infixl), right-associative (infixr), or non-associative (infix), and how
strongly it binds (0 being weakest, 9 being strongest; normal function
application has a strength of 10). E.g.

infixr 6 +++

declares +++ to be right-associative with a binding strength of 6.

A little example of definition and usage:

-- | Kripke's quus function.
--   Behaves like +, but returns 5 if either operand is 56 or greater
x `quus` y
    | x < 56 && y < 56 = x + y
    | otherwise = 5

-- | Tests if x is more or less equal to y
x +- y = (x >= y * 0.95) && (x <= y * 1.05)

main = do
  print (4 `quus` 5)
  print (56 `quus` 3)
  print (10 +- 11)
  print (105 +- 100)


Disclaimer: I am not a Haskell programmer, so these examples may not be
idiomatic Haskell.

Regards,

Bob

--
But I ask you, what can a mathematician do without a sponge?

Dmitry A. Kazakov

Aug 21, 2009, 2:19:55 PM
On Fri, 21 Aug 2009 07:43:22 -0700 (PDT), tm wrote:

> On 21 Aug., 15:21, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
> wrote:

> And what is the problem when the user defines new operator symbols
> with priority and associativity?
>
>> But you need the priorities in the parser to make it work.
>
> Seed7 has solved this problem with a table driven LL(1) parser.
>
>> It is possible to reconstruct the parser dynamically each time the
>> programmer defines an operator, but that will open a can of worms.
>
> It is not necessary to reconstruct the parser dynamically. Just the
> tables need to be changed.
>
>> For example, nested
>> operation declarations will change the syntax of their scope.
>
> Nesting of syntax declarations is not supported in Seed7.

This is the point.

>> Nobody will
>> be able to understand such a program or the errors emitted by the compiler.
>
> While there is always room to improve, the error messages of the
> 'hi' (Seed7) interpreter are useful.

It is impossible to produce reasonable error messages if syntax is fluid.

(That is the reason why in natural languages syntax is the most
conservative part. Otherwise we would be unable to understand each other
in the presence of errors and uncertainty.)

>> A pragmatic approach is to fix all operations and their priorities, i.e.
>> there is a predefined symbol + with priority lower than *, etc.
>
> No, it is just not necessary to fix all operator symbols and their
> priorities in the compiler.

If you don't support scoped declarations.

>> You can introduce a special syntax for op(x,y,z).
>
> Are you suggesting an additional notation used when the operator
> is defined?

Not only. It is also necessary for fully qualified names. Again, I assume
scoping. I also assume that operations are treated equivalently, so that
brackets, commas, and the membership operation . have the same treatment
in the language as + or *.

> Is 'y' the operator symbol and 'op' used for all
> operators? An additional function style notation for operator
> symbols is IMHO a bad idea.

If an operation is not a proper name then, trivially, you cannot use
it in contexts where a proper name is expected. You cannot get rid of all
such contexts. Consider referencing an operation as an object rather than
calling it. So if you maintain the distinction, you have to be able to
get a proper name for an operation.
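
Haskell's parenthesized operator form is such a proper-name device,
alongside C++'s operator op and Ada's "op": (+) names the + operation
and can be passed wherever an ordinary name is expected. A tiny sketch
(the integrate function is invented to mirror the example above, not a
real library function):

-- (+) is the proper name of the + operation, usable as a parameter.
integrate :: (Double -> Double -> Double) -> [Double] -> Double
integrate op = foldr1 op

main :: IO ()
main = print (integrate (+) [1, 2, 3])  -- 6.0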

tm

Aug 21, 2009, 4:51:52 PM
On 21 Aug., 20:19, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>

wrote:
> On Fri, 21 Aug 2009 07:43:22 -0700 (PDT), tm wrote:
> > On 21 Aug., 15:21, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
> > wrote:
> > And what is the problem when the user defines new operator symbols
> > with priority and associativity?
>
> >> But you need the priorities in the parser to make it work.
>
> > Seed7 has solved this problem with a table driven LL(1) parser.
>
> >> It is possible to reconstruct the parser dynamically each time the
> >> programmer defines an operator, but that will open a can of worms.
>
> > It is not necessary to reconstruct the parser dynamically. Just the
> > tables need to be changed.
>
> >> For example, nested
> >> operation declarations will change the syntax of their scope.
>
> > Nesting of syntax declarations is not supported in Seed7.
>
> This is the point.
>
> >> Nobody will
> >> be able to understand such a program or the errors emitted by the compiler.
>
> > While there is always room to improve, the error messages of the
> > 'hi' (Seed7) interpreter are useful.
>
> It is impossible to produce reasonable error messages if syntax is fluid.

Please take into account that an implementation of Seed7 exists. The
'hi' (Seed7) interpreter is certainly capable of producing reasonable
error messages.

> (That is the reason why in natural languages syntax is the most
> conservative part. Otherwise we would be unable to understand each other
> in the presence of errors and uncertainty.)

Changing the syntax is a feature not intended to be used in every
program. It is a feature to be used when libraries are defined. Some
people seem to be concerned about what happens if everybody invents new
statements. I discuss this subject here:

http://seed7.sourceforge.net/faq.htm#everybody_invents_statements

> >> A pragmatic approach is to fix all operations and their priorities, i.e.
> >> there is a predefined symbol + with priority lower than *, etc.
>
> > No, it is just not necessary to fix all operator symbols and their
> > priorities in the compiler.
>
> If you don't support scoped declarations.
>
> >> You can introduce a special syntax for op(x,y,z).
>
> > Are you suggesting an additional notation used when the operator
> > is defined?
>
> Not only. It is also necessary for fully qualified names. Again, I assume
> scoping.

Seed7 allows scoping of semantic declarations (local variable
declarations, local functions, etc.). Seed7 currently does not
support local syntax declarations. Syntax declarations are top
level and take effect from the declaration to the end of the
program. It would be possible to introduce local syntax declarations
(with scoping) as well, but IMHO this is not an important feature.

> I also assume that operations are treated equivalently, so that
> brackets, commas, and the membership operation . have the same treatment
> in the language as + or *.

Except for some expressions (see below) all operator symbols are
declared with syntax declarations. The membership operation . and
the statements are also defined with 'syntax' declarations. Even
the syntax of the declaration constructs (which are used to declare
constants, variables and functions) is defined this way. To see
the syntax declarations just look into the file 'syntax.s7i', which
is part of the Seed7 library:

http://seed7.sourceforge.net/prg/syntax.htm

The concepts to define syntax with the Seed7 Structured Syntax
Description (S7SSD) are explained here:

http://seed7.sourceforge.net/manual/syntax.htm

Hardcoded syntax is used for:
- Comments (Comments start with (* and end with *) )
- Line comments (Line comments start with # )
- Char, string, integer and bigInteger literals.
- Parentheses ( ). Parentheses are used to overrule priority rules.
- Function calls with commas to separate the parameters.
- So called 'Dot expressions' which start with a .

The hardcoded syntax of expressions is explained here:

http://seed7.sourceforge.net/manual/expr.htm

The syntax of whitespace, comments, literals, and identifiers is
explained here:

http://seed7.sourceforge.net/manual/tokens.htm

> > Is 'y' the operator symbol and 'op' used for all
> > operators? An additional function style notation for operator
> > symbols is IMHO a bad idea.
>
> If an operation is not a proper name then, trivially, you cannot use
> it in contexts where a proper name is expected.

Seed7 uses 'Dot expressions' in syntax declarations to solve this
problem. Dot expressions are parsed with a hardcoded syntax
analysis.

> You cannot get rid of all
> such contexts. Consider referencing an operation as an object rather than
> calling it.

Referencing an operation as an object rather than calling it is
a semantic problem. The syntax is not affected by this. E.g. when
operators are defined, the same syntax is used as when they are
called. An integer addition is called with:

And the integer addition operator is declared with:

const func integer: (in integer: a) + (in integer: b) is ...

As you can see, the + is used as an infix operator in both cases.

Dmitry A. Kazakov

Aug 22, 2009, 3:58:31 AM
On Fri, 21 Aug 2009 13:51:52 -0700 (PDT), tm wrote:

> On 21 Aug., 20:19, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
> wrote:

>> You cannot get rid of all
>> such contexts. Consider referencing an operation as an object rather than
>> calling it.
>
> Referencing an operation as an object rather than calling it is
> a semantic problem.

It is a syntax problem so long as the syntax maintains a difference
between proper names and operations. The distinction was introduced in
the first place to make parsing easier (both for humans and the
compiler). When you pass the operation + to the operation integrate as a
parameter (a closure), you are using the operation name in a context
where a proper name is expected. This breaks the rules. Either you
maintain this distinction syntactically or not. In the latter case you
should consistently drop infix notation and use something more rigid,
like Polish notation.

James Harris

Aug 22, 2009, 6:54:19 PM
On 21 Aug, 14:21, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
wrote:

> On Fri, 21 Aug 2009 05:54:15 -0700 (PDT), James Harris wrote:
> > Opinions sought....
>
> > Many (maybe most) languages accept symbols as infix operators for
> > binary (two-operand) operations such as
>
> >   x + y
>
> > Some also predefine words as infix operators such as Pascal's
>
> >   i div j
>
> > I would like to offer to programmers the ability to use the same
> > syntax as is available for built-in operations so instead of
>
> >   op(a, b)
>
> > the programmer could code
>
> >   a op b
>
> > for a user-defined binary operator, op.
>
> This is ambiguous for unary operations. Consider:
>
>    +(1)
>
> is it:
>
> 1. Plus(1)
>
> or
>
> 2. Plus(Order_Brackets(1))

I think you lost me. The only thing I can guess you mean is the
difference between using parentheses for grouping of terms and using
them for parameters. If that's the case I'd better explain some more
of my notation.

Where the operation is a word, to apply it to parameters I have

word.(parameter)

The dot binds the two so there should be no ambiguity. For symbols I
could perhaps use a dot too so if "~" means ones-complement then I
could use

~.(parameter)

but don't plan (yet) to do so. Instead symbols are aliases of words.
If "~" was a short alternative for "not" one could therefore code

not.(parameter)

>
> > The problem with this is that we have, effectively, three adjacent
> > words. The Pascal example only looks right because 1) i and j look
> > like variable names, 2) div sounds a little like an operation, 3)
> > Pascal reserves div so it is a known word.
>
> Well, without an ambiguity between overloaded parameter list brackets ()
> and ordering brackets (), it is no problem to parse i div j, provided that
> operations are keywords. It is also possible to parse it even if div is not
> reserved:
>
>    div div div
>
> is OK and means div(div, div).

I don't think I want to parse "div div div"!

> The problem is with unary operations, which
> will collide with proper names:
>
>   plus plus plus
>
> is it + + plus or plus + plus?


Nor "plus plus plus" either. If "plus" had been defined as an operator
it could not also be used as a variable. Conversely, if it had been
defined as a variable (by a code obfuscator, perhaps) it could not
also be defined as an operator.

>
> > Of course a programmer defining an operation could ensure that
> > something like a verb was used and could also ensure variables are
> > something like nouns but is that enough?
>
> > I'm thinking of using a modifier symbol. Some languages modify the
> > variables such as
>
> >   $a op $b
>
> > It may be better to modify the operation such as
>
> >   a $op b
>
> > where $ as a prefix indicates that the word is an operation. Maybe a
> > suffix or both prefix and suffix would be better. Maybe a different
> > symbol should be used.
>
> I think it makes no sense to allow the programmer to introduce new infix
> (and unary) operations, because he would need to give them priorities and
> associativity. But you need the priorities in the parser to make it work.

These priorities and associations are NOT necessary. The plan is to
define all such user-defined infix operators as unspecified precedence
and association - at least for now. Then they would need parentheses
to explicitly define the order of application.
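
Haskell, mentioned elsewhere in this thread, offers an approximation of
that rule which may be worth comparing: an operator declared plain infix
(non-associative) cannot be chained without parentheses, although
Haskell still insists on a numeric precedence. A small sketch (the <+>
operator is invented):

infix 5 <+>

(<+>) :: Int -> Int -> Int
(<+>) = (+)

main :: IO ()
main = do
  print ((1 <+> 2) <+> 3)   -- fine: the order is made explicit
  -- print (1 <+> 2 <+> 3)  -- rejected: <+> is non-associative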

> It is possible to reconstruct the parser dynamically each time the
> programmer defines an operator, but that will open a can of worms. For
> example, nested operation declarations will change the syntax of their
> scope. Nobody will be able to understand such a program or the errors
> emitted by the compiler.
>
> A pragmatic approach is to fix all operations and their priorities, i.e.
> there is a predefined symbol + with priority lower than *, etc.

Yes, that's what I have. Even if symbols are overloaded they still
retain their precedence and associativity.

> You can
> introduce a special syntax for op(x,y,z). For example, C++ uses operator op
> to construct a proper name for op. Ada uses "op". I think these are good
> pragmatic solutions, especially because the syntax op(x,y,z) is rarely used
> with operations.

Not sure what you mean here. By a proper name of op you mean a name
for a symbol?

James

Dmitry A. Kazakov

Aug 23, 2009, 4:03:05 AM
On Sat, 22 Aug 2009 15:54:19 -0700 (PDT), James Harris wrote:

> I think you lost me. The only thing I can guess you mean is the
> difference between using parentheses for grouping of terms and using
> them for parameters. If that's the case I'd better explain some more
> of my notation.
>
> Where the operation is a word to apply it to parameters I have
>
> word.(parameter)
>
> The dot binds the two so there should be no ambiguity. For symbols I
> could perhaps use a dot too so if "~" means ones-complement then I
> could use
>
> ~.(parameter)
>
> but don't plan (yet) to do so. Instead symbols are aliases of words.
> If "~" was a short alternative for "not" one could therefore code
>
> not.(parameter)
>

>> I think it makes no sense to allow the programmer to introduce new infix
>> (and unary) operations, because he would need to give them priorities and
>> associativity. But you need the priorities in the parser to make it work.
>
> These priorities and associations are NOT necessary. The plan is to
> define all such user-defined infix operators as unspecified precedence
> and association - at least for now. Then they would need parentheses
> to explicitly define the order of application.

OK, but that would take the charm of infix notation away; you would
have parentheses anyway.

However, many levels of priority become a problem, as in C. I think a
reasonable solution is 4-6 levels plus association rules that forbid
certain combinations of operations of equal priority. Ada deploys this
technique. For example:

x and y or z -- Illegal

logical "and" may not share operands with "or" (they have the same priority).

x and (y or z) -- This is OK

>> It is possible to reconstruct the parser dynamically each time the
>> programmer defines an operator, but that will open a can of worms. For
>> example, nested operation declarations will change the syntax of their
>> scope. Nobody will be able to understand such a program or the errors
>> emitted by the compiler.
>>
>> A pragmatic approach is to fix all operations and their priorities, i.e.
>> there is a predefined symbol + with priority lower than *, etc.
>
> Yes, that's what I have. Even if symbols are overloaded they still
> retain their precedence and associativity.

That is for sure, because you cannot resolve overloaded symbols before
you have parsed them. I.e. association comes before semantic analysis.

>> You can
>> introduce a special syntax for op(x,y,z). For example, C++ uses operator op
>> to construct a proper name for op. Ada uses "op". I think these are good
>> pragmatic solutions, especially because the syntax op(x,y,z) is rarely used
>> with operations.
>
> Not sure what you mean here. By a proper name of op you mean a name
> for a symbol?

I think you already have this in the form <op><dot>. I.e. if "+" is an
operation then its proper name is "+.". So

1 + 2
+.(1, 2)
integrate (array_of_data, +., *.) // Passing operations as parameters

I don't think that dot suffix is a good choice. You probably wanted to use
dot as an operation. Traditionally the record member extraction operation
is denoted as dot.

tm

Aug 26, 2009, 4:24:49 AM
On 22 Aug., 09:58, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>

wrote:
> On Fri, 21 Aug 2009 13:51:52 -0700 (PDT), tm wrote:
> > On 21 Aug., 20:19, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
> > wrote:
> >> You cannot get rid of all
> >> such contexts. Consider referencing an operation as an object rather than
> >> calling it.
>
> > Referencing an operation as an object rather than calling it is
> > a semantic problem.
>
> It is a syntax problem so long the syntax maintains a difference between
> proper names and operations.

Note that Seed7 does not maintain a syntactic difference between
proper names and operations at the level of the + symbol (see
below). A + alone means nothing. In the expression 1+2 the function
'integer + integer' is called.

> The distinction was introduced in first place
> to make it easier to parse (both for humans and the compiler). When you
> pass the operation + to the operation integrate as a parameter (closure),
> you are using the operation name in the context where a proper name is
> expected. This breaks the rules.

Yes, therefore Seed7 tries to avoid breaking the rules. A + alone
makes no sense as an actual parameter for the function 'integrate'
(just 'integrate' is also not enough to describe the 'integrate'
function, since several 'integrate' functions might be overloaded,
but in this context I leave the ambiguity of 'integrate' aside). A call
of 'integrate' can look like:

integrate( (in integer param) + (in integer param) )

where

(in integer param) + (in integer param)

describes the actual parameter for the closure. This notation has
the advantage that the type of the expression (and of its
subexpressions) can be determined without looking at where the
expression is used. The length of this notation is not so bad, since
this category of closures is not used so often.

The category of closures described above requires that calling the
closure inside the 'integrate' function needs actual parameters
(the two parameters of 'integer + integer'). The implementation of
Seed7 currently does not support this category of closure parameters
(but support is planned for the future).

Closure parameters that can be called without actual parameters are
well supported by Seed7 (e.g. they are used as the condition of
'while' loops).

I plan to introduce the more complicated closure parameters (which
need parameters when the closure is called). I want to use the same
notation as in the declaration of a function. This means that the
integer addition which is defined with:

const func integer: (in integer: a) + (in integer: b) is ...

can be called with

1+2

and the operation itself (closure) can be described with

(in integer param) + (in integer param)

As you can see the priority rules defined for the infix + are used
in all three cases.

> Either you maintain this distinction
> syntactically or not.

Although a call of an operation and the operation itself look
quite different, the difference is not syntactic at the level
of the + symbol (e.g. the EBNF syntax for both + expressions could
be identical). The difference is in the expressions used left and
right of the '+'. But this is not relevant at the syntactic level.

When a function is declared you actually define two things:

1. The semantics of how to call the function
2. The semantics of how to refer to the function as a closure

I plan to use attribute parameters to cover the second case.
The concept of attribute parameters is also used to introduce class
methods:

http://seed7.sourceforge.net/manual/objects.htm#class_methods

Dmitry A. Kazakov

Aug 26, 2009, 5:02:34 AM

Then it is just so that in Seed7 the proper name of + is

(in integer param) + (in integer param)

My personal preference is on the side of C++ and Ada. Full signatures are
extremely tedious to use as names. They do not fit into scoped languages,
where different scopes and the same scope may contain identically named
objects with identical / equivalent signatures. That is why Ada and to a
lesser extent C++ deploy nominal equivalence.

So basically, the choice is bound to the choices nominal vs. structural
equivalence and scopes vs. flat spaces.

James Harris

Aug 26, 2009, 7:55:39 AM
On 23 Aug, 09:03, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
wrote:

...

> > These priorities and associations are NOT necessary. The plan is to
> > define all such user-defined infix operators as unspecified precedence
> > and association - at least for now. Then they would need parentheses
> > to explicitly define the order of application.
>
>
> OK, but that would take the charm of infix notation away, you would have
> parentheses anyway.

Parentheses would be there but they would appear differently. Instead
of

binop.(binop.(a, b), c)

we would have

(a binop b) binop c

which, I think, is more readable.

>
> However, many levels of priority become a problem, as in C. I think a
> reasonable solution is 4-6 levels plus association rules that forbid
> certain combinations of operations of equal priority. Ada deploys this
> technique. For example:
>
>    x and y or z   -- Illegal
>
> logical "and" may not share operands with "or" (they have the same priority).
>
>    x and (y or z)   -- This is OK
>

Yes. And releasing the initial version of the language with few
precedences (thus requiring parentheses) allows a few additional
precedences to be supported in later versions of the language, if that
appears sensible.

...

> >> You can
> >> introduce a special syntax for op(x,y,z). For example, C++ uses operator op
> >> to construct a proper name for op. Ada uses "op". I think these are good
> >> pragmatic solutions, especially because the syntax op(x,y,z) is rarely used
> >> with operations.
>
> > Not sure what you mean here. By a proper name of op you mean a name
> > for a symbol?
>
> I think you already have this in the form <op><dot>. I.e. if "+" is an
> operation then its proper name is "+.". So
>
>   1 + 2
>   +.(1, 2)
>   integrate (array_of_data, +., *.)  // Passing operations as parameters
>
> I don't think that dot suffix is a good choice. You probably wanted to use
> dot as an operation. Traditionally the record member extraction operation
> is denoted as dot.

Well, here are my current plans for the humble dot:

I have dot as a "subordinacy" or "binding" operator. For records the
notation is record.field. For indexable sequences or any other type of
mapping including function calls the notation is seq.(index) or
mapping.(arg1, arg2, ... argN). Of course, a dot is also used in
floating point numbers.

Dots are therefore used in many places. Perhaps too many. I hope they
don't become as irritating as Lisp's parentheses....

Anyone see a problem with using dot in these places?

James

bartc

Aug 26, 2009, 8:14:33 AM
James Harris wrote:

> Well, here are my current plans for the humble dot:
>
> I have dot as a "subordinacy" or "binding" operator. For records the
> notation is record.field. For indexable sequences or any other type of
> mapping including function calls the notation is seq.(index) or
> mapping.(arg1, arg2, ... argN). Of course, a dot is also used in
> floating point numbers.
>
> Dots are therefore used in many places. Perhaps too many. I hope they
> don't become as irritating as Lisp's parentheses....
>
> Anyone see a problem with using dot in these places?

Why is the dot necessary in seq.(index) or mapping.(arg...)?

--
bartc

James Harris

Aug 26, 2009, 9:52:02 AM

> James Harris wrote:

Ah - good question. I'll need to explain a bit more.

The first reason is for simple consistency.

Fields are components of records.
Elements are components of arrays.

For example,

record.field selects an element of the record
seq.index selects an element of the sequence

In both cases the dot allows selection of a subcomponent.

Sequences are mappings from the index to the element. In a similar
way, function calls can be seen as mappings. They may not return the
same value each time (nor do arrays) but functions do map inputs to
outputs. So they too get the same format for element reference.

function.argument or arguments

In any of the above if an element reference is an expression or a
tuple or a range it needs to be enclosed in parentheses but otherwise
the parens are optional.

The second reason is that it seems best for data structures to be
constructable by features rather than predefined by names such as
"list" or "vector". Some structures will be simple - such as an array
or a record. Others will be arbitrarily complex. In all cases the idea
is that a dot binds the structure to the subcomponent specification.

This allows arbitrary data structures to be treated as simple ones.
For example, take a FAT-formatted floppy disk. It has a boot sector,
two FATs, a root directory and a data area. If the floppy disk is
represented by variable f we might specify some of its components as

f.fat2 selects all of the second FAT
f.root_dir selects all of the root directory
f.data_sector.(15) selects sector 15 of the data area

As the last example shows, components can themselves be composites.
Field data_sector is part of f. It is also subscripted showing it has
subcomponents. It refers to sector 15 in the data area.

To implement the above the underlying floppy structure, f, may be a
simple array of sectors. It would offset its data_sector field to the
correct starting sector.

Notably, this is intended to work in exactly the same way whether we
have a real floppy disk or just a floppy disk image, and whether there
is caching or not.

The third reason is to do with keeping open the option to parse what I
call command format but hopefully the above is enough to show why I
use dot in seq.(index) and mapping.(arg...).
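
As a toy model of that uniformity (my own sketch in Haskell, not
James's design; the Value type, dot function and asInt helper are all
invented), records, sequences and functions can each be treated as a
mapping from a key to a subcomponent, selected by a single operation:

import qualified Data.Map as M

data Value
  = IntV Int
  | StrV String
  | Record (M.Map String Value)
  | Sequence [Value]
  | Fn (Value -> Value)

-- One selection operation covers record.field, seq.(index) and
-- mapping.(arg): the value left of the dot maps a key to a component.
dot :: Value -> Value -> Value
dot (Record fields) (StrV name) = fields M.! name
dot (Sequence es)   (IntV i)    = es !! i
dot (Fn f)          arg         = f arg
dot _               _           = error "no such subcomponent"

asInt :: Value -> Int
asInt (IntV n) = n
asInt _        = error "integer expected"

main :: IO ()
main = do
  let employee = Record (M.fromList [("id", IntV 42), ("surname", StrV "H")])
      scores   = Sequence [IntV 10, IntV 20, IntV 30, IntV 40]
      double   = Fn (IntV . (* 2) . asInt)
  print (asInt (employee `dot` StrV "id"))  -- employee.id -> 42
  print (asInt (scores `dot` IntV 2))       -- scores.(2)  -> 30
  print (asInt (double `dot` IntV 7))       -- double.(7)  -> 14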

How does this look? Dotty? :-(

James

Dmitry A. Kazakov

Aug 26, 2009, 10:22:54 AM
On Wed, 26 Aug 2009 06:52:02 -0700 (PDT), James Harris wrote:

> On 26 Aug, 13:14, "bartc" <ba...@freeuk.com> wrote:
>
>> James Harris wrote:
>
>>> Well, here are my current plans for the humble dot:
>>
>>> I have dot as a "subordinacy" or "binding" operator. For records the
>>> notation is record.field. For indexable sequences or any other type of
>>> mapping including function calls the notation is seq.(index) or
>>> mapping.(arg1, arg2, ... argN). Of course, a dot is also used in
>>> floating point numbers.
>>
>>> Dots are therefore used in many places. Perhaps too many. I hope they
>>> don't become as irritating as Lisp's parentheses....
>>
>>> Anyone see a problem with using dot in these places?
>>
>> Why is the dot necessary in seq.(index) or mapping.(arg...)?
>
> Ah - good question. I'll need to explain a bit more.
>
> The first reason is for simple consistency.
>
> Fields are components of records.
> Elements are components of arrays.
>
> For example,
>
> record.field selects an element of the record
> seq.index selects an element of the sequence
>
> In both cases the dot allows selection of a subcomponent.

Yes, but for a record the field is a name, while for an array the index
is an expression. So record selection should probably be:

record.field.

because

record.field

in your notation could rather mean: take the variable named field and
index record by the value of field.

Traditionally the record member is itself considered an operation. So it
is the operation <dot><field-name> which is applied to the record, rather
than an operation <select> applied to the arguments record and field. The
distinction is important, because the former can give birth to methods,
all distinct according to the names of the fields. With <select> you have
only one method, which limits the design to flat containers. Further, the
signature of <select> statically has the same result type (or no type),
so statically all record components would have one type (or none). This
type would be resolved to the actual specific types dynamically at run
time. I.e. you force yourself into dynamic typing and only dynamic
typing.

James Harris

Aug 26, 2009, 11:29:28 AM
On 26 Aug, 15:22, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
wrote:

Not quite. If the subcomponent is an expression it would need
parentheses. Let me show specific examples. Say we had a boring old
employee record. Its fields might include

employee.id
employee.surname
employee.initial

For an array, if we had an array called scores indexed from 0 up to 3
its elements would be

scores.0
scores.1
scores.2
scores.3

In the above, because they are constants, parentheses are optional so

scores.2
scores.(2)

mean exactly the same. If we wanted to access that array with
subscripts which were expressions - say "i" and "i + 2" we would
require parentheses so we get

scores.(i)
scores.(i + 2)

If we had an associative array p_tab its elements might be accessed by

p_tab."id"
p_tab.(name_type + "name")
p_tab.(field_name)

Parens would be mandatory for the last two entries as they are
expressions. Parentheses would be optional for "id" as it is a
constant.

>
> Traditionally the record member is itself considered an operation. So it
> is the operation <dot><field-name> which is applied to the record, rather
> than an operation <select> applied to the arguments record and field.

I'm not sure I follow. Are you talking about implementation? My
intention is that the source code express the algorithm but is as
ignorant as possible of the implementation. The idea is that the
implementation can change - perhaps to something faster or to a
debugging version - but the application logic does not need to change.

> The
> distinction is important, because the former can give birth to methods,
> all distinct according to the names of the fields. With <select> you have
> only one method, which limits the design to flat containers. Further, the
> signature of <select> statically has the same result type (or no type),
> so statically all record components would have one type (or none). This
> type would be resolved to the actual specific types dynamically at run
> time. I.e. you force yourself into dynamic typing and only dynamic typing.

Interesting. I've not considered object orientation much, as I'm
waiting to see what is readily implementable and what would be too
slow. What I have in mind at this early stage is:

Classes are types and are effectively records with field protection
combined with a pseudo-executable inheritance. (I really don't want to
get into the pseudo-executable part of that just now as that is way
off topic. It's probably enough to just ignore the pseudo-executable
part of it and say that classes are types and are effectively records
with field protection combined with inheritance.)

Methods are effectively executable fields of classes with their own
types which includes the types of results and the types of parameters.

A reference to object instance Inst2's methods (i.e. those from
Inst2's class) might be

Inst2.update_user
Inst2.write_name

These methods would be typed. If I wanted to implement dynamic method
calls, which is what I think you are referring to, that's getting into
another area. Since each of the methods is typed the types would need
to match. For example, if apply_calculation was a variable referring
to one of the methods of an instance/class it would be typed to match
specific calls only. Then

Inst2.(apply_calculation)

would dynamically select the appropriate method. But as I say this is
really for the future.

A general apology is probably in order: I guess my syntax is confusing
when you see just part of it. I should say it is not arbitrary. In
fact it has all come from analysis of the meaning behind familiar
computing mechanisms and separating application logic from the
mechanisms of implementation. It wasn't enough to explain I used a dot
to separate object from element, eh!

James

James Harris

Aug 26, 2009, 12:22:06 PM
On 21 Aug, 15:49, Robbert Haarman <comp.lang.m...@inglorion.net>
wrote:

> On Fri, Aug 21, 2009 at 05:54:15AM -0700, James Harris wrote:
>
> > I would like to offer to programmers the ability to use the same
> > syntax as is available for built-in operations so instead of
>
> >   op(a, b)
>
> > the programmer could code
>
> >   a op b
>
> > for a user-defined binary operator, op.
>
> You may want to take a look at how Haskell does it. In Haskell, any function
> whose name consists entirely of "symbols" (characters you would normally
> expect to find in infix operators) is an infix operator. Other names are
> prefix by default. E.g.
>
> 12 + 4
> div 12 4
>
> You can use an infix operator in prefix position by surrounding it with
> parentheses (the parenthesized operator is then an ordinary function),
> and you can use a prefix function in infix position by surrounding it
> with backticks:
>
> (+) 12 4
> 12 `div` 4

Thanks, Bob. This is the kind of thing I was looking for. I have my
doubts about the specific syntax used but it shows the same constructs
as I have in mind: turning prefix into infix (and vice versa, which is
additional). The result is easy to read though the mechanisms seem
fairly arbitrary.

If the + sign was a symbol which meant the add operation my intention
was that these should be equivalent

12 add 4
12 + 4

And if add was part of the "integer" library they should be equivalent
to

12 integer.add 4

Turning that into Haskel-esque and adding a couple of other operations

12 integer.`add` 4
12 integer.`sub` 4
12 integer.`mul` 4

Other options: first, an asterisk prefix

12 integer.*add 4
12 integer.*sub 4
12 integer.*mul 4

An asterisk suffix

12 integer.add* 4
12 integer.sub* 4
12 integer.mul* 4

Or maybe using both looks better

12 integer.*add* 4
12 integer.*sub* 4
12 integer.*mul* 4

I'm not sure if any of these would fall foul of the parser and the
asterisk be recognised as a multiplication symbol....

...

> A little example of definition and usage:
>
> -- | Kripke's quus function.
> --   Behaves like +, but returns 5 if either operand is 56 or greater
> x `quus` y
>     | x < 56 && y < 56 = x + y
>     | otherwise = 5

>
> -- | Tests if x is more or less equal to y
> x +- y = (x >= y * 0.95) && (x <= y * 1.05)


>
> main = do
>   print (4 `quus` 5)
>   print (56 `quus` 3)
>   print (10 +- 11)
>   print (105 +- 100)
>
> Disclaimer: I am not a Haskell programmer, so these examples may not be
> idiomatic Haskell.

No need. That's very neat.

James

bartc

Aug 26, 2009, 2:46:51 PM

"James Harris" <james.h...@googlemail.com> wrote in message
news:169f6574-7d8d-4b06...@b14g2000yqd.googlegroups.com...

> On 26 Aug, 13:14, "bartc" <ba...@freeuk.com> wrote:
>
>> James Harris wrote:
>
>> > Well, here are my current plans for the humble dot:
>>
>> > I have dot as a "subordinacy" or "binding" operator. For records the
>> > notation is record.field. For indexable sequences or any other type of
>> > mapping including function calls the notation is seq.(index) or
>> > mapping.(arg1, arg2, ... argN). Of course, a dot is also used in
>> > floating point numbers.
>>
>> > Dots are therefore used in many places. Perhaps too many. I hope they
>> > don't become as irritating as Lisp's parentheses....
>>
>> > Anyone see a problem with using dot in these places?
>>
>> Why is the dot necessary in seq.(index) or mapping.(arg...)?
>
> Ah - good question. I'll need to explain a bit more.
>
> The first reason is for simple consistency.
>
> Fields are components of records.
> Elements are components of arrays.
>
> For example,
>
> record.field selects an element of the record
> seq.index selects an element of the sequence
>
> In both cases the dot allows selection of a subcomponent.

So the dot selects a field or array element, with the parentheses needed for
array elements because otherwise there would be confusion (but you go on to
say these are optional).

(The confusion, assuming you allow seq.12 even though it looks odd, is that
seq.i could be selecting a field i, or indexing an array with i. With my
languages that would be ambiguous because both kinds of i are allowed into
the namespace at the same time, and a record can be both field selected and
indexed!)

That's OK although most array syntax looks like seq[i] or seq(i).

(I did have vaguely similar ideas once, where I grouped my complex objects
into two: multiple values (lists and arrays), and compound values usually
considered a single object (such as records and strings). So I used two
indexing methods:

array[i] i'th element
record.[i] i'th field (as well as regular fields)
string.[i] i'th character
integer.[i] i'th bit

Having the two methods caused some subtle problems however so now I just
have regular a[i] indexing for everything)

--
Bartc

James Harris

Aug 26, 2009, 3:51:01 PM
On 26 Aug, 19:46, "bartc" <ba...@freeuk.com> wrote:
> "James Harris" <james.harri...@googlemail.com> wrote in message

The expression "seq.12" is OK and would be the same as "seq.(12)" (the
expression in parens, 12, resolves to, er, 12 so is fine) but if i is
an integer the expression would need to be "seq.(i)". On the other
hand "seq.i" would try to select a field called i (which wouldn't
exist in an array) and wouldn't compile. To be clear, if i is 12

seq.12
seq.(12)
seq.(i)
seq.(8 + 4)

would all mean the same thing.

seq.i

would be invalid for an array.

>
> That's OK although most array syntax looks like seq[i] or seq(i).
>
> (I did have vaguely similar ideas once, where I grouped my complex objects
> into two: multiple values (lists and arrays), and compound values usually
> considered a single object (such as records and strings).

I'm with you up until not regarding strings as multiple objects. In
terms of indexing I don't see how they are different from arrays
and lists.

> So I used two
> indexing methods:
>
> array[i]     i'th element
> record.[i]   i'th field (as well as regular fields)
> string.[i]   i'th character
> integer.[i]  i'th bit
>
> Having the two methods caused some subtle problems however so now I just
> have regular a[i] indexing for everything)

OK.

James

bartc

Aug 26, 2009, 4:20:31 PM
James Harris wrote:
> On 26 Aug, 19:46, "bartc" <ba...@freeuk.com> wrote:
>> "James Harris" <james.harri...@googlemail.com> wrote in message

>> news:169f6574-7d8d-4b06...@b14g2000yqd.googlegroups.com...

>>> On 26 Aug, 13:14, "bartc" <ba...@freeuk.com> wrote:
>>
>>>> James Harris wrote:

>>>>> I have dot as a "subordinacy" or "binding" operator. For records
>>>>> the notation is record.field. For indexable sequences or any
>>>>> other type of mapping including function calls the notation is
>>>>> seq.(index) or mapping.(arg1, arg2, ... argN). Of course, a dot
>>>>> is also used in floating point numbers.

>>>> Why is the dot necessary in seq.(index) or mapping.(arg...)?

>>> Fields are components of records.

It seems then that indexing in this language will usually be seq.(index), so
that the ability to do seq.intconst would be an exception.

I have also looked at numeric constants following dot operators, and it
seemed to cause problems:

seq.12.34

Is this seq.(12).(34), or seq.(12.34)? And at the lexical level, abc.123
looks at first like a name followed by a floating point constant value.
For that matter, does seq .12 work? What about seq..950 (i.e.
seq.(0.950))? (I'm assuming floating indices will be converted to
integers.) Or:

define twelve=12

seq.twelve?

Yes, I think I would insist on the parentheses!
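
For what it's worth, a toy maximal-munch lexer makes the seq.12.34
point visible (a sketch; the token names and the number rule are
invented, not either of our actual lexers): a greedy number rule
swallows "12.34" as one float, so the double-index reading never gets
a chance:

import Data.Char (isAlpha, isDigit)

data Token = Name String | Number String | Dot deriving Show

lexer :: String -> [Token]
lexer [] = []
lexer s@(c : cs)
  | isAlpha c = let (name, rest) = span isAlpha s in Name name : lexer rest
  | isDigit c =
      -- maximal munch: digits, then optionally ".digits" as one number
      let (intPart, rest) = span isDigit s
      in case rest of
           '.' : r@(d : _) | isDigit d ->
             let (frac, rest') = span isDigit r
             in Number (intPart ++ "." ++ frac) : lexer rest'
           _ -> Number intPart : lexer rest
  | c == '.'  = Dot : lexer cs
  | otherwise = lexer cs  -- skip whitespace and anything else

main :: IO ()
main = mapM_ print (lexer "seq.12.34")
-- Name "seq", Dot, Number "12.34" -- not Number "12", Dot, Number "34"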

--
Bartc

Rod Pemberton

Aug 26, 2009, 4:30:31 PM
"James Harris" <james.h...@googlemail.com> wrote in message
news:68b51e45-fd8a-40e5...@18g2000yqa.googlegroups.com...

>
> If we had an associative array p_tab its elements might be accessed by
>
> p_tab."id"
> p_tab.(name_type + "name")
> p_tab.(field_name)
>
> Parens would be mandatory for the last two entries as they are
> expressions. Parentheses would be optional for "id" as it is a
> constant.

Sorry, I'm only mildly following the thread. So, how do I know "field_name"
is an expression and not a named constant? I'm assuming an expression can
change, but a named constant would point to a fixed name, like "id".
Without a solid rule of when to use parens, I think I could get confused
between:

field_name can change:
p_tab.(field_name)

field_name is fixed, say reduces to "id" only:
p_tab.field_name

Oh, and you said somewhere else that the user can define their own
operators. Then someone responded that an operator will need precedence.
Um, how does the user define the precedence? Or, will it be maximum or
minimum by default?


RP


James Harris

Aug 26, 2009, 5:56:37 PM
On 26 Aug, 21:20, "bartc" <ba...@freeuk.com> wrote:

...

> >   seq.12
> >   seq.(12)
> >   seq.(i)
> >   seq.(8 + 4)
>
> > would all mean the same thing.
>
> >   seq.i
>
> > would be invalid for an array.
>
> It seems then that indexing in this language will usually be seq.(index), so
> that the ability to do seq.intconst would be an exception.
>
> I have also looked at numeric constants following dot operators, and it
> seemed to cause problems:
>
> seq.12.34
>
> Is this seq.(12).(34), or seq.(12.34)? And at the lexical level, abc.123
> looks at first like a name followed by a floating point constant value.
> For that matter, does seq .12 work? What about seq..950 (i.e.
> seq.(0.950))? (I'm assuming floating indices will be converted to
> integers.) Or:
>
> define twelve=12
>
> seq.twelve?
>
> Yes, I think I would insist on the parentheses!

Thanks for explaining your findings. They are useful. In fact the
integer literal suffix of a sequence is not meant to be a virtue,
merely an effort towards consistency and a consequence of certain
design decisions. My main concern was whether people would find the
language had too many dots. You are right that seq.(12).(34) would be
valid - a reference to a two-dimensional structure or a reference to a
one-dimensional component of another one-dimensional component.

Plain floating point literals - i.e. those without an exponent - would
need at least one digit before and one after the decimal point so
there would be no ambiguity. That said, I'm undecided as yet whether
in general to permit whitespace before a binding dot. On one hand I
don't want to make the language too whitespace-sensitive - which
suggests allowing and ignoring any whitespace. On the other hand some
restrictions can make the language safer - which suggests outlawing
whitespace before the dot.

I'm also not decided but am tending away from making any implicit data
conversion that loses information (including the typical conversion
from 32-bit integer to 32-bit float which loses precision) so I may
disallow floating point indices unless they have been truncated or
rounded.

Finally, seq.twelve would be a field reference or, more generally, a
reference to a variable called twelve in the namespace called seq. It
could not refer to a variable called twelve in the active
namespace - so it would need to be seq.(twelve).

Again, thanks for the pointers. I'm happier now about including the
dots!

James

James Harris

unread,
Aug 26, 2009, 6:41:44 PM8/26/09
to
On 26 Aug, 21:30, "Rod Pemberton" <do_not_h...@nohavenot.cmm> wrote:
> "James Harris" <james.harri...@googlemail.com> wrote in message

>
> news:68b51e45-fd8a-40e5...@18g2000yqa.googlegroups.com...
>
>
>
> > If we had an associative array p_tab its elements might be accessed by
>
> >   p_tab."id"
> >   p_tab.(name_type + "name")
> >   p_tab.(field_name)
>
> > Parens would be mandatory for the last two entries as they are
> > expressions. Parentheses would be optional for "id" as it is a
> > constant.
>
> Sorry, I'm only mildly following the thread.  So, how do I know "field_name"
> is an expression and not a named constant?  I'm assuming an expression can
> change, but a named constant would point to a fixed name, like "id".
> Without a solid rule of when to use parens, I think I could get confused
> between:

Haha - I set out to query whether folks found the prevalence of dots
annoying but it seems all the issues are with the parentheses. No
matter - it's a big help to be challenged on those as well.

> field_name can change:
>   p_tab.(field_name)

In this case p_tab is an associative array and field_name is a string.
If field_name = "abc" the expression would simply be the same as

p_tab.("abc")

and would hash "abc" and look up the value corresponding to that
label, if any.

> field_name is fixed, say reduces to "id" only:
>   p_tab.field_name

In this case p_tab couldn't be an associative array as associative
arrays don't have named fields. It could be a record. If so this would
be a reference to the named field - i.e. the field called "field_name"
in record p_tab. This is a standard field reference such as you might
get in a C struct.

Does that make sense?
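
To make the two cases concrete, here is a rough C++ analogue (the map,
struct and values are mine, purely for illustration - not the proposed
language):

    #include <map>
    #include <string>

    struct Record { int field_name; };     // a record with a named field

    int main() {
        std::map<std::string, int> p_tab;  // the associative array
        p_tab["id"] = 42;

        std::string field_name = "id";
        int a = p_tab[field_name];         // p_tab.(field_name): key computed at run time
        int b = p_tab["id"];               // p_tab.("id"): same lookup, literal key

        Record r = { 7 };
        int c = r.field_name;              // p_tab.field_name: fixed field, compile time
        return a + b + c;
    }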

>
> Oh, and you said somewhere else that the user can define their own
> operators.  Then someone responded that an operator will need precedence.
> Um, how does the user define the precedence?  Or, will it be maximum or
> minimum by default?

I mentioned it but don't mind doing so again. We don't all follow
every thread. For now, all user-defined operators need explicit
parentheses so need no precedence and associativity rules. I may relax
this at some point in the future but if I do older code with the
explicit parens will still compile.

Additional: I think unary prefix operators may be safe to associate
without parentheses as they have to go right to left ... unless
someone can say otherwise.
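
A tiny recursive-descent sketch of that claim (the operator names are
invented, nothing from a real grammar):

    #include <iostream>
    #include <sstream>
    #include <string>

    // Grammar: expr := op expr | number   (ops here: "neg", "abs")
    long parse(std::istringstream& in) {
        std::string tok;
        in >> tok;
        if (tok == "neg") return -parse(in);
        if (tok == "abs") { long v = parse(in); return v < 0 ? -v : v; }
        return std::stol(tok);               // otherwise a number
    }

    int main() {
        std::istringstream in("neg abs -7"); // can only mean neg(abs(-7))
        std::cout << parse(in) << '\n';      // prints -7
    }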

James

Dmitry A. Kazakov

unread,
Aug 27, 2009, 3:39:11 AM8/27/09
to

You have already started explaining this in a subthread. So if I correctly
understood the idea, employee.id is equivalent to employee."id". I.e. each
name is also a string literal of itself. Considering an example with named
constants, variables and parameterless functions, let id is a variable with
the value "surname", then:

employee.id = employee."id"
employee.(id) = employee."surname"

The latter case dereferences id, the former case does not. It could turn
out to be very confusing.
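
The confusion is easy to demonstrate with an ordinary map - a rough C++
sketch, all names mine:

    #include <iostream>
    #include <map>
    #include <string>

    int main() {
        std::map<std::string, std::string> employee =
            { {"id", "E42"}, {"surname", "Harris"} };

        std::string id = "surname";             // a variable named like a key

        std::cout << employee["id"] << '\n';    // employee.id   -> "E42"
        std::cout << employee[id]   << '\n';    // employee.(id) -> "Harris"
    }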

>> Traditionally record member is considered itself an operation. So it is the
>> operation <dot><field-name> which is applied to record, rather than the
>> operation <select> applied to the arguments record and field.
>
> I'm not sure I follow. Are you talking about implementation? My
> intention is that the source code express the algorithm but is as
> ignorant as possible of the implementation. The idea is that the
> implementation can change - perhaps to something faster or to a
> debugging version - but the application logic does not need to change.

I meant that dot followed by an identifier usually denotes an operation
"get the member named by the identifier". I.e. there is a compound
operation (let's name it <.id>) defined on the employee type. Formally:

<.id> : employee_type -> id_type

so if employee is of employee_type then employee.id is <.id> called on
employee:

<.id> (employee)

rather than an operation <.> defined on the Cartesian product of the
employee and string types:

<.> : employee_type x string -> component_type (class of)

called on a tuple:

<.> (employee, "id")

These are two semantically different interpretations of the syntactic sugar
employee.id, with far-reaching consequences. The first one is that you
should go straight to classes of components bound as late as at run-time.
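
In C++ terms the two interpretations might be sketched like this (my own
construction, only to mirror the signatures above):

    #include <any>
    #include <stdexcept>
    #include <string>

    struct Employee { int id; std::string surname; };

    // Interpretation 1: one operation per field name, each statically typed.
    int         get_id(const Employee& e)      { return e.id; }
    std::string get_surname(const Employee& e) { return e.surname; }

    // Interpretation 2: a single <select> on employee x string; all
    // components must share one (dynamically typed) result type.
    std::any select(const Employee& e, const std::string& field) {
        if (field == "id")      return e.id;
        if (field == "surname") return e.surname;
        throw std::out_of_range(field);
    }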

>> The
>> distinction is important. Because the former can give birth to methods, all
>> distinct according to the names of the fields. With <select> you have only
>> one method, which limits design to flat containers. Further the signature
>> of <select> has statically same result type (or no type), so statically all
>> record components would have one type (or none). This type would
>> dynamically be resolved to the actual specific types at run-time. I.e. you
>> force yourself to dynamic typing and only dynamic typing,
>
> Interesting. I've not considered object orientation much as I'm
> waiting to see what is readily implementable and what would be too
> slow. What I have in mind at this early stage is:
>
> Classes are types and are effectively records with field protection
> combined with a pseudo-executable inheritance. (I really don't want to
> get into the pseudo-executable part of that just now as that is way
> off topic. It's probably enough to just ignore the pseudo-executable
> part of it and say that classes are types and are effectively records
> with field protection combined with inheritance.)
>
> Methods are effectively executable fields of classes with their own
> types which includes the types of results and the types of parameters.

OK, this is the "standard model", which is slow (due to redispatch),
asymmetric (cannot handle integers etc), excludes multiple dispatch (cannot
handle + as a method). I dislike it.

Dmitry A. Kazakov

unread,
Aug 27, 2009, 3:55:13 AM8/27/09
to
On Wed, 26 Aug 2009 20:20:31 GMT, bartc wrote:

> I have also looked at numeric constants following dot operators, and it
> seemed to cause problems:
>
> seq.12.34

Once I designed a language close to what James proposes with regard to the
operation dot. (The idea is somehow infectious (:-))

In particular it has the operations "." and ":", which are used to extract
substrings (and numeric slices as well). ":" takes the string prefix, "."
does its suffix. For example:

"abcdef".3:2 = "cd"

it works as follows:

("abcdef".3) = "cdef"
"cdef":2 = "cd"

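As plain functions the two operations might look like this (a sketch,
assuming the 1-based positions of the example):

    #include <cassert>
    #include <cstddef>
    #include <string>

    std::string suffix(const std::string& s, std::size_t n) { return s.substr(n - 1); } // "."
    std::string prefix(const std::string& s, std::size_t n) { return s.substr(0, n); }  // ":"

    int main() {
        assert(suffix("abcdef", 3) == "cdef");          // "abcdef".3
        assert(prefix(suffix("abcdef", 3), 2) == "cd"); // "abcdef".3:2
    }
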
After some years of using it, I must concede that it was rather an unwise
choice because of confusion with numeric literals. Exactly as you said.

BTW, my motivation was to attempt getting rid of index/function
parentheses. The language has only ordering parentheses. All operations are
either unary or infix. For example: "sin (x)" is merely "sin x"

> Yes, I think I would insist on the parentheses!

Agreed.

Rod Pemberton

unread,
Aug 27, 2009, 4:40:24 AM8/27/09
to
"Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
news:zm5nrfntir2z$.212ivhpy3lmn.dlg@40tude.net...

> On Wed, 26 Aug 2009 20:20:31 GMT, bartc wrote:
>
> Once I designed a language close to what James proposes with regard to the
> operation dot. (The idea is somehow infectious (:-))
>
> In particular it has the operations "." and ":", which are used to extract
> substrings (and numeric slices as well). ":" takes the string prefix, "."
> does its suffix. For example:
>
> "abcdef".3:2 = "cd"
>
> it works as follows:
>
> ("abcdef".3) = "cdef"
> "cdef":2 = "cd"
>

Heh! You just reminded me of BASIC. BASIC had/has three substring
operators, i.e., LEFT$, RIGHT$, MID$, and one operator for string
concatenation, +. Even after years of C programming with the immensely
useful and powerful C string functions, I'm always amazed by how much those
four operations could do in BASIC. Unfortunately, C needs a function to do
string concatenation, i.e., strcat()...
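
For comparison, a quick C++ sketch of those four BASIC operations
(assuming MID$'s 1-based convention; bounds checking omitted, and any
off-by-one errors are mine):

    #include <cstddef>
    #include <string>

    std::string left (const std::string& s, std::size_t n)  // LEFT$
        { return s.substr(0, n); }
    std::string right(const std::string& s, std::size_t n)  // RIGHT$
        { return s.substr(s.size() - n); }
    std::string mid  (const std::string& s, std::size_t i, std::size_t n) // MID$
        { return s.substr(i - 1, n); }
    // concatenation is just s + t, as with BASIC's +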


Rod Pemberton


tm

unread,
Aug 27, 2009, 5:10:47 AM8/27/09
to
On 26 Aug., 11:02, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>

Yes, when you refer to the function itself (note that the Seed7
interpreter does not currently support the use of these signatures
as the name of a closure).

> My personal preference is on the side of C++ and Ada. Full signatures are
> extremely tedious to use as names.

Full signatures are only needed in rare cases. Ok, some languages
rely on them when modules/packages are used (to allow a shorter
notation for some elements from a package), but this usage can
IMHO be avoided with a better module/package concept.

> They do not fit into scoped languages,
> where different scopes and the same scope may contain identically named
> objects with identical / equivalent signatures.

Two objects with identical / equivalent signatures in the same
scope? That would mean that there are ambiguous expressions which can
only be resolved by context.

Ada supports this kind of expressions, but I don't think they are a
good idea. In Ada the + operator can be overloaded with the same
argument types and different result type. E.g.: Two + operators, one
with an integer and one with a float result. This way 1+2 may have
an integer or a float result and the context decides which operator
should be used.

Seed7 does not allow this kind of overloading. The Seed7 overloading
resolution does not take the result of an expression into account.
This way the overload resolution algorithm works strictly bottom up.
This also makes reading expressions easier for humans.

Seed7 supports scopes (e.g.: local declarations) but the signature
in a scope is always unambiguous.

> That is why Ada and to a
> lesser extent C++ deploy nominal equivalence.

Seed7 - The extensible programming language: User defined statements
Ada and to a lesser extent C++ have ambiguous expressions
resolved by context. Seed7 tries to avoid this problem by using
unambiguous expressions. In some rare cases this looks complicated,
but in the common case it improves the readability.

> So basically, the choice is bound to the choices nominal vs. structural
> equivalence and scopes vs. flat spaces.

When you refer to objects just by name as in C there is no
overloading. If you want overloading and still refer to objects
by name you get ambiguous expressions which need to be resolved
by context.

You seem to think about ways to resolve ambiguities introduced by
importing modules/packages. Such ambiguities can IMHO be resolved
with other concepts unrelated to the problem to identify function
objects.

James Harris

unread,
Aug 27, 2009, 6:01:05 AM8/27/09
to
On 27 Aug, 09:40, "Rod Pemberton" <do_not_h...@nohavenot.cmm> wrote:
> "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de> wrote in messagenews:zm5nrfntir2z$.212ivhpy3lmn.dlg@40tude.net...

One could do a lot with mid$ and its friends but they were seriously
horrible, weren't they?

*Far* better, IMHO, is simple string slicing treating the string as an
array of characters.

Concatenation is fine though.

James

James Harris

unread,
Aug 27, 2009, 6:11:21 AM8/27/09
to
On 27 Aug, 08:39, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
wrote:

...

I note from the other subthread that you and bartc would insist on the
parentheses. I'll take that on board.

> > Classes are types and are effectively records with field protection
> > combined with a pseudo-executable inheritance. (I really don't want to
> > get into the pseudo-executable part of that just now as that is way
> > off topic. It's probably enough to just ignore the pseudo-executable
> > part of it and say that classes are types and are effectively records
> > with field protection combined with inheritance.)
>
> > Methods are effectively executable fields of classes with their own
> > types which includes the types of results and the types of parameters.
>
> OK, this is the "standard model", which is slow (due to redispatch),
> asymmetric (cannot handle integers etc), excludes multiple dispatch (cannot
> handle + as a method). I dislike it.

"Standard model" is good, "slow" is bad. As mentioned, I'm not at the
stage of implementing this yet. I'll see what I can do for performance
when I get there.

Thanks for the input.

James

bartc

unread,
Aug 27, 2009, 6:44:31 AM8/27/09
to


Suppose you had a string say s="ABCDEF", and you indexed it using:

s[3]

would the result be a character, or a string of length 1?

(For years I've been using a language with the latter approach, and it's
worked well (after all why should s[3] be that different from the slice
s[3..4]), with asc(s[3]) to get the character value.)

But which is better?
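
C++ happens to offer both views side by side, which makes the trade-off
easy to state (0-based indexing here, of course):

    #include <cassert>
    #include <string>

    int main() {
        std::string s = "ABCDEF";
        char        c = s[2];            // the character view: 'C'
        std::string t = s.substr(2, 1);  // the length-1 slice view: "C"
        assert(c == 'C' && t == "C");
    }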

--
Bartc

Dmitry A. Kazakov

unread,
Aug 27, 2009, 8:09:52 AM8/27/09
to
On Thu, 27 Aug 2009 02:10:47 -0700 (PDT), tm wrote:

> On 26 Aug., 11:02, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
> wrote:

>> They do not fit into scoped languages,
>> where different scopes and the same scope may contain identically named
>> objects with identical / equivalent signatures.
>
> Two objects with identical / equivalent signatures in the same
> scope? That would mean that there are ambiguous expressions which can
> only be resolved by context.

Not even by context, if the signature includes the result, as it should,
then the only way to resolve ambiguity is by using fully qualified names,
which is a reason to have them.

> Ada supports this kind of expressions, but I don't think they are a
> good idea. In Ada the + operator can be overloaded with the same
> argument types and different result type. E.g.: Two + operators, one
> with an integer and one with a float result. This way 1+2 may have
> an integer or a float result and the context decides which operator
> should be used.

Well this is not really ambiguous. In Ada you can have a context where two
objects of equivalent signatures are visible. These cannot be distinguished
otherwise than qualifying the names.

> Seed7 does not allow this kind of overloading. The Seed7 overloading
> resolution does not take the result of an expression into account.
> This way the overload resolution algorithm works strictly bottom up.
> This also makes reading expressions easier for humans.

Humans use both forms. In English you cannot tell if "set" is a noun or a
verb without the context. Fully inflectional languages do exist (sort of
"bottom up"), but they are far more complex to learn and use than English.

>> That is why Ada and to a
>> lesser extent C++ deploy nominal equivalence.
>
> Seed7 - The extensible programming language: User defined statements
> Ada and to a lesser extent C++ have ambiguous expressions
> resolved by context. Seed7 tries to avoid this problem by using
> unambiguous expressions. In some rare cases this looks complicated,
> but in the common case it improves the readability.

Well, that depends. Forcing the user to invent names for language reasons is a
bad practice. Hungarian notation is the worst example of this plague.
Technically you cannot enforce unique names in a large program. You have to
be able to resolve name clashes in the context. There are only two
thinkable ways: context-local renaming and canonic qualifying. The former
is very obtrusive and totally unreadable when it comes to large-scale
software.

tm

unread,
Aug 27, 2009, 9:29:32 AM8/27/09
to
On 27 Aug., 14:09, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>

wrote:
> On Thu, 27 Aug 2009 02:10:47 -0700 (PDT), tm wrote:
> > On 26 Aug., 11:02, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
> > wrote:
> >> They do not fit into scoped languages,
> >> where different scopes and the same scope may contain identically named
> >> objects with identical / equivalent signatures.
>
> > Two objects with with identical / equivalent signature in the same
> > scope? That would mean that there are ambiguous expressions which can
> > only be resolved by context.
>
> Not even by context, if the signature includes the result, as it should,

IMHO a function should be identified by its name and its parameters.
The type of the result should not be needed to identify a function.

Overloading resolution which takes the result into account is not a
good idea. This "feature" costs a lot (expressions must be analyzed
bottom up and top down several times with a nontrivial (maybe even
heuristic) algorithm). And it buys almost nothing since arithmetic
operators can be overloaded and work for any combination of numeric
types (like integer, float, complex, bigInteger, bigRational)
without this "feature".

OTOH a bottom up overloading resolution algorithm is easy to
implement and easy to understand for humans. This way it is also
easy to see why a compiler complains. With a nontrivial overloading
resolution algorithm it can happen that humans think that something
is unambiguous, but the compiler complains and the reason for the
error is not obvious.
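
C++ takes the same line for ordinary calls - the result type plays no
part in overload resolution - so the point can be illustrated there
(sketch only):

    #include <string>

    // Overloading on the argument types resolves bottom up:
    int         twice(int x)                { return 2 * x; }
    std::string twice(const std::string& x) { return x + x; }

    // Overloading on the result type alone would be ill-formed:
    //     int    half(int x);
    //     double half(int x);   // error: differs only in return type

    int main() {
        int a = twice(21);                        // picked from the argument type
        std::string b = twice(std::string("ab")); // likewise
        return a + (int)b.size();                 // 42 + 4
    }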

> then the only way to resolve ambiguity is by using fully qualified names,
> which is a reason to have them.
>
> > Ada supports this kind of expressions, but I don't think they are a
> > good idea. In Ada the + operator can be overloaded with the same
> > argument types and different result type. E.g.: Two + operators, one
> > with an integer and one with a float result. This way 1+2 may have
> > an integer or a float result and the context decides which operator
> > should be used.
>
> Well this is not really ambiguous.

So you know the result type of 1+2 in the example above?

> In Ada you can have a context where two
> objects of equivalent signatures are visible. These cannot be distinguished
> otherwise than qualifying the names.

Exactly for this reason I think the Ada way of overloading is wrong.

> > Seed7 does not allow this kind of overloading. The Seed7 overloading
> > resolution does not take the result of an expression into account.
> > This way the overload resolution algorithm works strictly bottom up.
> > This also makes reading expressions easier for humans.
>
> Humans use both forms. In English you cannot tell if "set" is a noun or a
> verb without the context. Fully inflectional languages do exist (sort of
> "bottom up"), but they are far more complex to learn and use than English.

Human languages and computer languages cannot be compared in every
aspect. In case of overloading resolution the Ada one is far more
complex to learn and use than the Seed7 one.

> >> That is why Ada and to a
> >> lesser extent C++ deploy nominal equivalence.
>
> > Seed7 - The extensible programming language: User defined statements

This line should not have been here (I sometimes insert this line
to compare the length of lines). Sorry.

> > Ada and to a lesser extent C++ have ambiguous expressions
> > resolved by context. Seed7 tries to avoid this problem by using
> > unambiguous expressions. In some rare cases this looks complicated,
> > but in the common case it improves the readability.
>
> Well, that depends. Forcing the user to invent names for language reasons is a
> bad practice. Hungarian notation is the worst example of this plague.

I was not talking about forcing users to invent names. I was talking
about the need to use something like

(in integer param) + (in integer param)

instead of just

+

as an identifier for a closure when the 'integrate' function is called.

> Technically you cannot enforce unique names in a large program. You have to
> be able to resolve name clashes in the context. There are only two
> thinkable ways: context-local renaming and canonic qualifying. The former
> is very obtrusive and totally unreadable when it comes to a large scale
> software.

I agree, but different libraries usually work with different types so
most things are resolved by normal overloading without the need to
invent strange names. Seed7 has also a feature called attribute
parameter which can be used to attach functions to a type. This
further reduces the need to invent strange names. Attribute
parameters are explained together with class methods here:

http://seed7.sourceforge.net/manual/objects.htm#class_methods

An example of an object declared with an attribute parameter is:

const char: (attr char) . value is ' ';

This attaches '.value' to the type char. To use this constant just
write:

char.value

Seed7 uses this concept to attach default values to all types.
Attribute parameters are not limited to expressions with '.'.
They can be used in normal functions:

const func circle: create (in integer: radius, attr circle) is
return circle(radius);

This attaches the function 'create' to the type 'circle'. This
function can be called with:

create(10, circle)

Overloading 'create' with several other attribute types is also
possible. As you can see the author of a library has ways to
avoid name clashes to some extent when the library is designed.
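
A rough C++ analogue of the attribute parameter, using an empty tag type
(my construction, not how Seed7 implements it):

    struct Circle { int radius; };

    template <class T> struct attr {};         // stands in for Seed7's "attr T"

    Circle create(int radius, attr<Circle>) { return Circle{ radius }; }

    int main() {
        Circle c = create(10, attr<Circle>{}); // cf. Seed7: create(10, circle)
        return c.radius;
    }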

Back to the qualifying as you see it. Seed7 does qualifying of
objects without parameters as

myModule.anElement

This is supported in structs and will be also done this way in the
(to be implemented) modules/packages. For objects with parameters
I prefer

myModule.(1+2)

or

myModule.((in integer param) + (in integer param))

over

1 myModule.+ 2

But these things need to be worked out since Seed7 modules/packages
are currently not implemented. Instead a simpler mechanism (include
libraries) is used and the resolution is done with overloading and
attribute parameters (see above).

Greetings Thomas Mertes

Seed7 Homepage: http://seed7.sourceforge.net


Seed7 - The extensible programming language: User defined statements

robin

unread,
Aug 27, 2009, 12:07:20 PM8/27/09
to
"James Harris" <james.h...@googlemail.com> wrote in message
news:a20767c7-b537-4e6b...@o32g2000yqm.googlegroups.com...

> Opinions sought....
>
> Many (maybe most) languages accept symbols as infix operators for
> binary (two-operand) operations such as
>
> x + y
>
> Some also predefine words as infix operators such as Pascal's
>
> i div j
>
> I would like to offer to programmers the ability to use the same
> syntax as is available for built-in operations so instead of
>
> op(a, b)
>
> the programmer could code
>
> a op b
>
> for a user-defined binary operator, op.
>
> The problem with this is that we have, effectively, three adjacent
> words.

You might like to look at Fortran, which offers that facility
(namely to define new operators).

Charles Lindsey

unread,
Aug 27, 2009, 12:00:27 PM8/27/09
to

>On 21 Aug, 15:49, Robbert Haarman <comp.lang.m...@inglorion.net>
>wrote:

>> On Fri, Aug 21, 2009 at 05:54:15AM -0700, James Harris wrote:
>>
>> You can use an infix operator in prefix position by surrounding it with
>> parentheses (this is an instance of currying), and you can use a prefix
>> function in infix position by surrounding it with backticks:
>>
>> (+) 12 4
>> 12 `div` 4

>Thanks, Bob. This is the kind of thing I was looking for. I have my
>doubts about the specific syntax used but it shows the same constructs
>as I have in mind: turning prefix into infix (and vice versa which is
>additional). The result is easy to read though the mechanisms seem
>fairly arbitrary.

Ah! At last we have got there. Algol 68 solved this whole problem 40 years
ago. Syntactically, it is no problem, but you DO need two alphabets of
letters. So you have 26 letters which you use to construct identifiers (NO
reserved words needed) and you have another 26 letters which you use for
Types and Operators (and maybe a few reserved words like BEGIN, END,
DO...).

So how to distinguish them? Various alternatives exist:

For Publication in pretty Journals, you use Bold for the 2nd alphabet (a
good old ALGOL convention).

For practical programming, you use the Upper Case letters.

And if you are still stuck in a world without any upper/lower case
distinction (as was common in 1968) you use a "stropping convention"
(usually apostrophes).

So you can have a DIV b
or A 'DIV' B

and you can define Types (ALGOL 68 called them 'Modes') with

MODE LINK = STRUCT(REF LINK head, tail)

which does away with all the trouble you get if you mis-spell some
'typedef' in C.

--
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Web: http://www.cs.man.ac.uk/~chl
Email: c...@clerew.man.ac.uk Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9 Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5

tm

unread,
Aug 27, 2009, 12:59:01 PM8/27/09
to

So you suggest that all functions with two parameters
can be called prefix and infix?

Some random thoughts:
Why do you want two notations for the same thing?

I am not sure that inconsistent use of infix and
prefix notation will improve readability.

Or do you propose different application areas, such
as calling the function or referring to the function
object itself, for infix and prefix notation?

Functions with three or more parameters probably
don't have this infix/prefix possibility.

Infix notation with stropping and without priority
and associativity is probably not as handy as usual.

Dmitry A. Kazakov

unread,
Aug 27, 2009, 2:53:33 PM8/27/09
to
On Thu, 27 Aug 2009 06:29:32 -0700 (PDT), tm wrote:

> On 27 Aug., 14:09, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
> wrote:
>> On Thu, 27 Aug 2009 02:10:47 -0700 (PDT), tm wrote:
>>> On 26 Aug., 11:02, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
>>> wrote:
>>>> They do not fit into scoped languages,
>>>> where different scopes and the same scope may contain identically named
>>>> objects with identical / equivalent signatures.
>>
>>> Two objects with identical / equivalent signatures in the same
>>> scope? That would mean that there are ambiguous expressions which can
>>> only be resolved by context.
>>
>> Not even by context, if the signature includes the result, as it should,
>
> IMHO a function should be identified by its name and its parameters.
> The type of the result should not be needed to identify a function.

Counterexample is represented by parameterless functions and named
constants. Numeric and string literals are such things. If you have several
numeric types you need to overload their literals as well as operations.

> OTOH a bottom up overloading resolution algorithm is easy to
> implement and easy to understand for humans. This way it is also
> easy to see why a compiler complains. With a nontrivial overloading
> resolution algorithm it can happen that humans think that something
> is unambiguous, but the compiler complains and the reason for the
> error is not obvious.

But anybody would expect:

A : Array (1..20) := (others => 0);

Without the context (array of 20 elements), you cannot resolve the array
aggregate of indefinite bounds, unknown index type and unknown type of the
elements.
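
For what it is worth, C++ shows the same phenomenon: a braced
initializer has no type of its own and is resolved from the target (an
illustration, not the Ada semantics):

    #include <array>
    #include <vector>

    int main() {
        std::array<int, 20> a = {};       // {} means "20 zeroed ints" only
                                          // because of the declared target type
        std::vector<double> v = { 0, 0 }; // the same tokens build a 2-element
                                          // vector here; no bottom-up type exists
        return a[0] + (int)v.size();
    }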

>> then the only way to resolve ambiguity is by using fully qualified names,
>> which is a reason to have them.
>>
>>> Ada supports this kind of expressions, but I don't think they are a
>>> good idea. In Ada the + operator can be overloaded with the same
>>> argument types and different result type. E.g.: Two + operators, one
>>> with an integer and one with a float result. This way 1+2 may have
>>> an integer or a float result and the context decides which operator
>>> should be used.
>>
>> Well this is not really ambiguous.
>
> So you know the result type of 1+2 in the example above?

Why should I know it? Technically in Ada it is Universal_Integer, but for
the sake of argument, there is no reason to define it in absence of the
target.

>> In Ada you can have a context where two
>> objects of equivalent signatures are visible. These cannot be distinguished
>> otherwise than qualifying the names.
>
> Exactly for this reason I think the Ada way of overloading is wrong.

Actually it is never a problem. In 99% of cases where qualified names are
used, it is with generic instances. But generics are an abomination by
themselves in any language. C++ is strictly bottom up, yet templates there
are a sheer horror.

> In case of overloading resolution the Ada one is far more
> complex to learn and use than the Seed7 one.

No need to learn them, if you are not a compiler designer. The programmer
just uses the names he wants. The compiler rejects illegal choices. This is
automatically more friendly, because the body of name clashes is smaller in
Ada than seems to be in Seed7.

The golden rule of language design - do not introduce arbitrary
constraints.

This move redresses the lack of the result type in the form of a fake
parameter, just to make it *overloadable*, as it should have been right
from the start.

Why is this better than obvious:

My_Circle : Circle := Create (10);

> Back to the qualifying as you see it. Seed7 does qualifying of
> objects without parameters as
>
> myModule.anElement
>
> This is supported in structs and will be also done this way in the
> (to be implemented) modules/packages. For objects with parameters
> I prefer
>
> myModule.(1+2)
>
> or
>
> myModule.((in integer param) + (in integer param))
>
> over
>
> 1 myModule.+ 2

In Ada it is, in the case of a conflict or when + is not directly visible:

My_Module."+" (1, 2)

tm

unread,
Aug 27, 2009, 6:22:38 PM8/27/09
to
On 27 Aug., 20:53, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>

wrote:
> On Thu, 27 Aug 2009 06:29:32 -0700 (PDT), tm wrote:
> > On 27 Aug., 14:09, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
> > wrote:
> >> On Thu, 27 Aug 2009 02:10:47 -0700 (PDT), tm wrote:
> >>> On 26 Aug., 11:02, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
> >>> wrote:
> >>>> They do not fit into scoped languages,
> >>>> where different scopes and the same scope may contain identically named
> >>>> objects with identical / equivalent signatures.
>
> >>> Two objects with identical / equivalent signatures in the same
> >>> scope? That would mean that there are ambiguous expressions which can
> >>> only be resolved by context.
>
> >> Not even by context, if the signature includes the result, as it should,
>
> > IMHO a function should be identified by its name and its parameters.
> > The type of the result should not be needed to identify a function.
>
> Counterexample is represented by parameterless functions and named
> constants. Numeric and string literals are such things. If you have several
> numeric types you need to overload their literals as well as operations.

Maybe for Ada these are counterexamples, but for Seed7 they are not.
In Seed7 a parameterless function and a named constant are both
identified just with the name (but attribute parameters can be used
to attach this names to a type or even several types). The type of
a Seed7 literal is also unambiguous:

5 ... integer literal
'a' ... character literal
1.2 ... float literal
"ab" ... string literal
3_ ... bigInt literal

Please keep in mind that I am not talking about a 'dream' language as
many in this group do. Seed7 is implemented, it can be downloaded
and this concepts work. Just try it.

> > OTOH a bottom up overloading resolution algorithm is easy to
> > implement and easy to understand for humans. This way it is also
> > easy to see why a compiler complains. With a nontrivial overloading
> > resolution algorithm it can happen that humans think that something
> > is unambiguous, but the compiler complains and the reason for the
> > error is not obvious.
>
> But anybody would expect:
>
> A : Array (1..20) := (others => 0);
>
> Without the context (array of 20 elements), you cannot resolve the array
> aggregate of indefinite bounds, unknown index type and unknown type of the
> elements.

This is a typical Ada construct. For Ada your reasoning is probably
ok, but for Seed7 a different view is necessary.

I assume that the Ada example above declares an array of integer
elements and when an element outside the allowed index range is
accessed the value 0 should be returned.

In Seed7 an array is declared this way:

var array integer: A is 20 times 0;

The expression '20 times 0' creates the array value of 20 integer
elements with the value 0. This value is assigned to the variable
'A' which has the type 'array integer'. Accessing elements outside
the allowed range of 1 to 20 results in a RANGE_ERROR exception. To
support your Ada example it would be necessary to define an improved
array type which allows assigning a value for indices outside the array
bounds. Since Seed7 supports abstract data types such an improved
array type can be defined with it. I just prefer to omit the
implementation here and continue as if it is already defined. The
'times' operator could be extended to something like

20 times 1 others 0

which specifies 20 integer elements with value 1 and the value
0 for all elements outside the allowed range. The declaration would
look like:

var improvedArray integer: A is 20 times 1 others 0;

Assigning an 'others' value later could be done with

A.others := 3;

As you can see: A slight shift in focus and the Seed7 world can
adapt to such needs without ambiguous expressions.
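
A quick sketch of such an improved array in C++ (the names and the
1-based convention are mine, following the example):

    #include <cstddef>
    #include <vector>

    template <class T>
    struct ImprovedArray {
        std::vector<T> data;
        T others;                          // value for indices out of range
        ImprovedArray(std::size_t n, T fill, T out)
            : data(n, fill), others(out) {}
        T get(std::size_t i) const {       // 1-based, as in the Seed7 example
            return (i >= 1 && i <= data.size()) ? data[i - 1] : others;
        }
    };

    int main() {
        ImprovedArray<int> a(20, 1, 0);    // "20 times 1 others 0"
        a.others = 3;                      // "A.others := 3;"
        return a.get(5) + a.get(99);       // 1 + 3
    }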

> >> then the only way to resolve ambiguity is by using fully qualified names,
> >> which is a reason to have them.
>
> >>> Ada supports this kind of expressions, but I don't think they are a
> >>> good idea. In Ada the + operator can be overloaded with the same
> >>> argument types and different result type. E.g.: Two + operators, one
> >>> with an integer and one with a float result. This way 1+2 may have
> >>> an integer or a float result and the context decides which operator
> >>> should be used.
>
> >> Well this is not really ambiguous.
>
> > So you know the result type of 1+2 in the example above?
>
> Why should I know it? Technically in Ada it is Universal_Integer,

Correct when + is only defined for integers. But what happens when
+ has been overloaded with:

function "+"(LEFT, RIGHT: INTEGER) return REAL;

In this case you don't know. The Ada compiler will take this
function or the original one depending on the context.

> but for
> the sake of argument, there is no reason to define it in absence of the
> target.
>
> >> In Ada you can have a context where two
> >> objects of equivalent signatures are visible. These cannot be distinguished
> >> otherwise than qualifying the names.
>
> > Exactly for this reason I think the Ada way of overloading is wrong.
>
> Actually it is never a problem. In 99% of cases where qualified names are
> used, it is with generic instances. But generics are an abomination by
> themselves in any language. C++ is strictly bottom up, yet templates there
> are a sheer horror.

Seed7 supports functions with type parameters and type results. They
are executed at compile time and have the power of templates/generics
without introducing a special syntax. An example "template" which
defines for loops for a given type is here:

http://seed7.sourceforge.net/examples/for_decl.htm

> > In case of overloading resolution the Ada one is far more
> > complex to learn and use than the Seed7 one.
>
> No need to learn them, if you are not a compiler designer. The programmer
> just uses the names he wants. The compiler rejects illegal choices.

And you don't know why the compiler rejects it. Maybe another
compiler does accept it.

> This is
> automatically more friendly, because the body of name clashes is smaller in
> Ada than seems to be in Seed7.

I have probably more knowledge about Ada than you about Seed7 :-)
and I have a different view. "Do what I mean" concepts which are
not understood by the programmer are a possible source of undetected
errors. The programmer has one concept but the compiler has
a different view and this may lead to erroneous behaviour.

> The golden rule of language design - do not introduce arbitrary
> constraints.

Come on, this is not an arbitrary constraint. Doing overloading
without taking the result of a function into account is a natural
concept. Ask people about the result type of

1 + 2

and

1.5 + 3

They will tell you that the first expression has an 'integer'
result and the second one has a 'float' result. Nobody will assume
that there is another + operator which adds two 'integers' but has
a 'float' result. Therefore they will not say:

I can only tell you when I know where the expressions are used.

People use the same bottom up algorithm for overload resolution as
Seed7. They identify the + in '1 + 2' as integer addition and the +
in '1.5 + 3' as float addition.

People are instinctively aware that the bottom up overloading
resolution determines the type of every expression and subexpression
unambiguously.

The bottom up overloading resolution just seems arbitrary when you
are influenced by the Ada overloading concept.

IMHO the bottom up overloading organizes the concept of overloading
just the same way as strucured statements organize the flow of
control. Statements like 'while' are seemingly less powerful then a
spaghetti program with 'goto' statements, nevertheless most
programmers prefer structured statements.

> parameter, ...

The attribute parameter allows class functions to be written in a
more elegant way.

> ... just to make it *overloadable*, as it should have been right
> from the start.

What is right and what is wrong depends on the point of view
(see above).

> Why is this better than obvious:
>
> My_Circle : Circle := Create (10);

1. Because this is obviously only obvious for Ada people. :-)
2. For a "procedure" (which has no result and therefore cannot be
assigned) this approach is not possible.

> > Back to the qualifying as you see it. Seed7 does qualifying of
> > objects without parameters as
>
> > myModule.anElement
>
> > This is supported in structs and will be also done this way in the
> > (to be implemented) modules/packages. For objects with parameters
> > I prefer
>
> > myModule.(1+2)
>
> > or
>
> > myModule.((in integer param) + (in integer param))
>
> > over
>
> > 1 myModule.+ 2
>
> In Ada it is, in the case of a conflict or when + is not directly visible:
>
> My_Module."+" (1, 2)

Ah, the "infix operator is a function and vice versa" concept.

Btw.: I am interested in a critical view of my concepts.
If I don't sound like it, let me repeat it: criticism is welcome.
Maybe you should take a look at the Seed7 homepage to find weak
points in my concepts even better.

Dmitry A. Kazakov

unread,
Aug 28, 2009, 4:09:04 AM8/28/09
to
On Thu, 27 Aug 2009 15:22:38 -0700 (PDT), tm wrote:

> On 27 Aug., 20:53, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
> wrote:
>> On Thu, 27 Aug 2009 06:29:32 -0700 (PDT), tm wrote:

>>> IMHO a function should be identified by its name and its parameters.
>>> The type of the result should not be needed to identify a function.
>>
>> Counterexample is represented by parameterless functions and named
>> constants. Numeric and string literals are such things. If you have several
>> numeric types you need to overload their literals as well as operations.
>
> Maybe for Ada these are counterexamples, but for Seed7 they are not.
> In Seed7 a parameterless function and a named constant are both
> identified just with the name (but attribute parameters can be used
> to attach this names to a type or even several types). The type of
> a Seed7 literal is also unambiguous:
>
> 5 ... integer literal

In Ada there is an infinite number of integer types. So already integer
literals are top down.

> 'a' ... character literal

The same is true for character types. Is 'a' ASCII, Latin-1, UCS-2, UCS-4,
EBCDIC character?

> 1.2 ... float literal

Same for real types as for integer ones, and even worse because of fixed
point numbers which share literals with floats.

> "ab" ... string literal

There are more string types than types of characters. You can declare:

type My_Fancy_String is array (Character range <>) of Wide_Character;

and use "ab" as a literal of it. Note that My_Fancy_String is indexed by
Character (Latin-1) and contains Wide_Character (UCS-2 Unicode). So a value
of it could be:

('a' => 'a', 'b' => 'b')

where 'a' on the left is not 'a' on the right...

> 3_ ... bigInt literal
>
> Please keep in mind that I am not talking about a 'dream' language as
> many in this group do. Seed7 is implemented, it can be downloaded
> and this concepts work. Just try it.

Well, the question was about how to make a dream language. To me
user-defined scalar types are paramount. That implies overloaded literals
and operations.

>>> OTOH a bottom up overloading resolution algorithm is easy to
>>> implement and easy to understand for humans. This way it is also
>>> easy to see why a compiler complains. With a nontrivial overloading
>>> resolution algorithm it can happen that humans think that something
>>> is unambiguous, but the compiler complains and the reason for the
>>> error is not obvious.
>>
>> But anybody would expect:
>>
>> A : Array (1..20) := (others => 0);
>>
>> Without the context (array of 20 elements), you cannot resolve the array
>> aggregate of indefinite bounds, unknown index type and unknown type of the
>> elements.
>
> This is a typical Ada construct. For Ada your reasoning is probably
> ok, but for Seed7 a different view is necessary.

(others => 0) is an array value which cannot be analyzed bottom up. Note
that the information about the array range, the type of index and the type
of the elements is redundant. There is no reason to repeat this stuff on
the right side, though possible:

A : Array_Type (1..20) :=
Array_Type'(Integer'(1)..Integer'(20) => Integer'(0));

It would only bother the programmer and make the program less readable,
while adding no value.

> The expression '20 times 0' creates the array value of 20 integer
> elements with the value 0.

But already this is ambiguous in a language like Ada. "Times" does not
define either the index or the lower bound. In Ada the index can be of any
discrete type. You will need some elaborated type conversion in order to
pass such aggregates to subprograms or assign them. Note that type
conversions are another way to achieve the same effect: top down resolution
of ambiguities.

>>>> then the only way to resolve ambiguity is by using fully qualified names,
>>>> which is a reason to have them.
>>
>>>>> Ada supports this kind of expressions, but I don't think they are a
>>>>> good idea. In Ada the + operator can be overloaded with the same
>>>>> argument types and different result type. E.g.: Two + operators, one
>>>>> with an integer and one with a float result. This way 1+2 may have
>>>>> an integer or a float result and the context decides which operator
>>>>> should be used.
>>
>>>> Well this is not really ambiguous.
>>
>>> So you know the result type of 1+2 in the example above?
>>
>> Why should I know it? Technically in Ada it is Universal_Integer,
>
> Correct when + is only defined for integers. But what happens when
> + has been overloaded with:
>
> function "+"(LEFT, RIGHT: INTEGER) return REAL;
>
> In this case you don't know. The Ada compiler will take this
> function or the original one depending on the context.

Yes it will, and where is the problem? Ada is a strongly typed language; you
cannot get anything wrong because of overloading. Any ambiguity is treated
as an error.

>>> In case of overloading resolution the Ada one is far more
>>> complex to learn and use than the Seed7 one.
>>
>> No need to learn them, if you are not a compiler designer. The programmer
>> just uses the names he wants. The compiler rejects illegal choices.
>
> And you don't know why the compiler rejects it. Maybe another
> compiler does accept it.

No chance. Each certified Ada compiler must undergo the Ada Conformity
Assessment Tests (ACATS); only then may the compiler vendor call it "Ada".

>> This is
>> automatically more friendly, because the body of name clashes is smaller in
>> Ada than seems to be in Seed7.
>
> I have probably more knowledge about Ada than you about Seed7 :-)
> and I have a different view. "Do what I mean" concepts which are
> not understood by the programmer are a possible source of undetected
> errors.

I don't see how it applies here. If there are conflicting interpretations
of a construct, the program is rejected in Ada.

> The programmer has one concept but the compiler has
> a different view and this may lead to erroneous behaviour.

Where does that follow from? The concept is exactly the one the programmer
has. Otherwise (in the presence of conflicting concepts) the program is
rejected.

>> The golden rule of language design - do not introduce arbitrary
>> constraints.
>
> Come on, this is not an arbitrary constraint. Doing overloading
> without taking the result of a function into account is a natural
> concept. Ask people about the result type of
>
> 1 + 2
>
> and
>
> 1.5 + 3
>
> They will tell you that the first expression has an 'integer'
> result and the second one has a 'float' result.

No, the second is just illegal in Ada because + is not defined on Float x
Integer. A programmer could make the second legal by defining another "+".
But that is his business.

> People are instinctively aware that the bottom up overloading
> resolution determines the type of every expression and subexpresion
> unambibuously.

This is an unsupported claim. On the contrary, observe natural languages
and scientific notations as examples where bottom-up does not work. The human
brain, being heavily parallel, works differently from a computer. To human
perception top down analysis is as natural as bottom up.

>> Why is this better than obvious:
>>
>> My_Circle : Circle := Create (10);
>
> 1. Because this is obvously only obvious for Ada people. :-)

What is unclear in the above?

> 2. For a "procedure" (which has no result and therefore cannot be
> assigned) this approach is not possible.

But Create is not a procedure! Look, it won't work for a goto statement
either! (:-))

>> In Ada it is, in the case of a conflict or when + is not directly visible:
>>
>> My_Module."+" (1, 2)
>
> Ah, the "infix operator is a function and vice versa" concept.

Yes. In Ada each operator has a corresponding function, which can be named
and used as a function. For example, + defined on Integer has the name
Standard."+".



> Btw.: I am interested in a critical view at my concepts.

Actually, in my view, there exist some more fundamental language design
principles which control the matters we are discussing here. For example,
whether scalar types can be user-defined, whether the language should
support by-value semantics, whether all types get the same treatment, which
objects are first-class, what has to be statically checkable, etc. People
often do not understand each other because they are motivated by
different fundamentals.

tm

unread,
Aug 28, 2009, 9:20:45 AM8/28/09
to
On 28 Aug., 10:09, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>

wrote:
> On Thu, 27 Aug 2009 15:22:38 -0700 (PDT), tm wrote:
> > On 27 Aug., 20:53, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
> > wrote:
> >> On Thu, 27 Aug 2009 06:29:32 -0700 (PDT), tm wrote:
> >>> IMHO a function should be identified by its name and its parameters.
> >>> The type of the result should not be needed to identify a function.
>
> >> Counterexample is represented by parameterless functions and named
> >> constants. Numeric and string literals are such things. If you have several
> >> numeric types you need to overload their literals as well as operations.
>
> > Maybe for Ada these are counterexamples, but for Seed7 they are not.
> > In Seed7 a parameterless function and a named constant are both
> > identified just with the name (but attribute parameters can be used
> > to attach this names to a type or even several types). The type of
> > a Seed7 literal is also unambiguous:
>
> > 'a' ... character literal
>
> The same is true for character types. Is 'a' ASCII, Latin-1, UCS-2, UCS-4,
> EBCDIC character?

Seed7 characters are UNICODE characters which use the UTF-32
encoding.

> > 1.2 ... float literal
>
> Same for real types as for integer ones, and even worse because of fixed
> point numbers which share literals with floats.
>
> > "ab" ... string literal
>
> There are more string types than types of characters.

Seed7 strings are UNICODE strings which use the UTF-32 encoding.
All conversions to and from strings used by the operating system
(ASCII, Latin-1, UTF-8, UTF-16, ...) are done automatically by the
run-time library. You don't need to care about encoding, codepages
and other low-level stuff used in system calls. BTW: Seed7 strings
are not '\0' terminated and therefore can also contain binary data.
An overview about strings can be found here:

http://seed7.sourceforge.net/manual/types.htm#string

> You can declare:
>
> type My_Fancy_String is array (Character range <>) of Wide_Character;
>
> and use "ab" as a literal of. Note the My_Fancy_String is indexed by
> Character (Latin-1) and contains Wide_Character (UCS-2 Unicode). So a value
> of could be:
>
> ('a' => 'a', 'b' => 'b')
>
> where 'a' on the left is not 'a' on the right...

Really?
Without your explanation I would have never expected that 'a' and
'a' are not the same. I guess that every normal (not Ada) programmer
would assume that 'a' means the same in both cases. This is a good
example for the negative sides of ambiguous expressions.

> > 3_ ... bigInt literal
>
> > Please keep in mind that I am not talking about a 'dream' language as
> > many in this group do. Seed7 is implemented, it can be downloaded
> > and this concepts work. Just try it.
>
> Well, the question was about how to make a dream language.

Yes.
I just had the impression that you treat Seed7 as a dream language
(which is not the case). Comparisons of dream with real concepts
make only limited sense. The dream language can always easily win
against an existing implementation. But a dream might look quite
different in reality. OTOH: Cross checking dream languages with
existing implementations makes sense.

> To me
> user-defined scalar types are paramount.

Seed7 supports user defined enumeration types and subtypes of scalar
(and other) types (see below for a link to an example).

> That implies overloaded literals
> and operations.
>
> >>> OTOH a bottom up overloading resolution algorithm is easy to
> >>> implement and easy to understand for humans. This way it is also
> >>> easy to see why a compiler complains. With a nontrivial overloading
> >>> resolution algorithm it can happen that humans think that something
> >>> is unambiguous, but the compiler complains and the reason for the
> >>> error is not obvious.
>
> >> But anybody would expect:
>
> >> A : Array (1..20) := (others => 0);
>
> >> Without the context (array of 20 elements), you cannot resolve the array
> >> aggregate of indefinite bounds, unknown index type and unknown type of the
> >> elements.
>
> > This is a typical Ada construct. For Ada your reasoning is probably
> > ok, but for Seed7 a different view is necessary.
>
> (others => 0) is an array value which cannot be analyzed bottom up. Note
> that the information about the array range, the type of index and the type
> of the elements is redundant. There is no reason to repeat this stuff on
> the right side, though possible:
>
> A : Array_Type (1..20) :=
> Array_Type'(Integer'(1)..Integer'(20) => Integer'(0));
>
> It would only bother the programmer and make the program less readable,
> while adding no value.

The notation

A : Array (1..20) := (others => 0);

is misleading since it suggested to me that the values outside the
array are defined. This is a good example of a notation which is not
easy to read (easy for a regular Ada user perhaps, but my Ada is a
little bit rusty).

> > The expression '20 times 0' creates the array value of 20 integer
> > elements with the value 0.
>
> But already this is ambiguous in a language like Ada.

But not in Seed7. In the expression '20 times 0' the 'times'
operator uses the index type 'integer' and the lower bound 1.

> "Times" does not
> define either the index or the lower bound.

With

[0 .. 19] times 3

a lower bound of 0 is used. Seed7 also supports arrays where the
index type is not 'integer'.

> In Ada the index can be of any
> discrete type. You will need some elaborated type conversion in order to
> pass such aggregates to subprograms or assign them.

No, in this case no type conversion is necessary (see above).

> Note that type
> conversions are another way to achieve the same effect: top down resolution
> of ambiguities.

A type conversion is a function which has a parameter of one type
and a result of another type. The bottom up overloading resolution
of Seed7 can handle type conversions without problems. A type
conversion is like a function call neither top down nor bottom up.
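
Sketched trivially (names mine): a conversion is just another function,
so it resolves like any other call:

    float toFloat(int i) { return (float)i; }

    int main() {
        float x = toFloat(1 + 2);  // the inner + is resolved first (int),
                                   // then the conversion function is applied
        return (int)x;
    }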

> >>>> then the only way to resolve ambiguity is by using fully qualified names,
> >>>> which is a reason to have them.
>
> >>>>> Ada supports this kind of expressions, but I don't think they are a
> >>>>> good idea. In Ada the + operator can be overloaded with the same
> >>>>> argument types and different result type. E.g.: Two + operators, one
> >>>>> with an integer and one with a float result. This way 1+2 may have
> >>>>> an integer or a float result and the context decides which operator
> >>>>> should be used.
>
> >>>> Well this is not really ambiguous.
>
> >>> So you know the result type of 1+2 in the example above?
>
> >> Why should I know it? Technically in Ada it is Universal_Integer,
>
> > Correct when + is only defined for integers. But what happens when
> > + has been overloaded with:
>
> > function "+"(LEFT, RIGHT: INTEGER) return REAL;
>
> > In this case you don't know. The Ada compiler will take this
> > function or the original one depending on the context.
>
> Yes it will, and where is a problem?

There is no problem.
This is just the proof that 1+2 can be ambiguous in Ada.

> Ada is a strongly typed language you
> cannot get anything wrong because of overloading. Any ambiguity is treated
> as an error.

As I showed above a sub-expression like 1+2 can be ambiguous.
An Ada compiler would just complain if the ambiguity cannot be
resolved by the context.

> >> This is
> >> automatically more friendly, because the body of name clashes is smaller in
> >> Ada than seems to be in Seed7.
>
> > I have probably more knowledge about Ada than you about Seed7 :-)
> > and I have a different view. "Do what I mean" concepts which are
> > not understood by the programmer are a possible source of undetected
> > errors.
>
> I don't see how it applies here. If there are conflicting interpretations
> of a construct, the program is rejected in Ada.

When the programmer does not know how the overload resolution works
he might think that 'a' has type Latin-1 instead of UCS-2. Here I am
referring to your example

('a' => 'a', 'b' => 'b')

where you said:

"where 'a' on the left is not 'a' on the right"

> > The programmer has one concept but the compiler has
> > a different view and this may lead to erroneous behaviour.
>
> Where that follows from? The concept is exactly the one, the programmer
> has. Otherwise (in presence of conflicting concepts) the program is
> rejected.

See above.

> >> The golden rule of language design - do not introduce arbitrary
> >> constraints.
>
> > Come on, this is not an arbitrary constraint. Doing overloading
> > without taking the result of a function into account is a natural
> > concept. Ask people about the result type of
>
> > 1 + 2
>
> > and
>
> > 1.5 + 3
>
> > They will tell you that the first expression has an 'integer'
> > result and the second one has a 'float' result.
>
> No, the second is just illegal in Ada because + is not defined on Float x
> Integer. A programmer could make the second legal by defining another "+".
> That that is his business.

Ok, this is also the case in Seed7, but I did not assume you would ask an
Ada or Seed7 programmer.

> > People are instinctively aware that the bottom up overloading
> > resolution determines the type of every expression and subexpression
> > unambiguously.


>
> This is an unsupported claim.

Proof: When I asked you for the type of 1+2 you answered:


"Technically in Ada it is Universal_Integer"

You probably used a bottom up approach.

> On the contrary, observe natural languages
> and scientific notations as examples where bottom-up does not work. Human
> brain, being heavily parallel, works differently to a computer. To human
> perception top down analysis is as natural as bottom up.

I did not state that bottom up is always better than top down. Just
for overload resolution I think it is a simpler and more natural
approach with an unambiguous result.

From the time when structured programming was introduced up to today
there have been discussions about the advantages of 'goto' statements.
I do not expect that the concept of unambiguous expressions (and
sub-expressions) is accepted without any discussion.

> >> Why is this better than obvious:
>
> >> My_Circle : Circle := Create (10);
>

> > 1. Because this is obviously only obvious for Ada people. :-)
>
> What is unclear in above?

QED. You are an Ada programmer.

> > 2. For a "procedure" (which has no result and therefore cannot be
> > assigned) this approach is not possible.
>
> But Create is not a procedure! Look, it won't work for a goto statement
> either! (:-))

AFAIK Ada supports attributes like S'Digits and S'First. Attributes
are predefined and IMHO the user cannot define new attributes. The
attribute parameters of Seed7 are a concept which allows
user-defined attributes.

> >> In Ada it is, in the case of a conflict or when + is not directly visible:
>
> >> My_Module."+" (1, 2)
>
> > Ah, the "infix operator is a function and vice versa" concept.
>
> Yes. In Ada each operator has a corresponding function, which can be named
> and used as a function. For example, + defined on Integer has the name
> Standard."+".

I think this duality of operator/function is not necessary (At least
Seed7 does not need it).

> > Btw.: I am interested in a critical view of my concepts.
>
> Actually, in my view, there exist some more fundamental language design
> principles which control the matters we are discussing here. For example,
> whether scalar types can be user-defined, whether the language should
> support by-value semantics, whether all types get the same treatment, which
> objects are first-class, what has to be statically checkable, etc. People
> often do not understand each other because they are motivated by
> different fundamentals.

Seed7 allows user-defined enumeration types. Subtypes of scalar
types such as 'integer' can also be introduced. See:

http://seed7.sourceforge.net/examples/subtype.htm

Seed7 supports value semantics (I am not sure what you mean by
"by-value semantics").

In Seed7 all types get the same treatment. Even the type 'type'
can be used as the type of variables, parameters and function
results. An example using this feature can be found here:

http://seed7.sourceforge.net/examples/for_decl.htm

Seed7 uses static type checking. The reasons behind this decision
can be found here:

http://seed7.sourceforge.net/faq.htm#static_type_checking

I hope this information helps us to understand each other better.

Dmitry A. Kazakov

unread,
Aug 28, 2009, 12:48:01 PM8/28/09
to
On Fri, 28 Aug 2009 06:20:45 -0700 (PDT), tm wrote:

> On 28 Aug., 10:09, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
> wrote:
>> On Thu, 27 Aug 2009 15:22:38 -0700 (PDT), tm wrote:
>>> On 27 Aug., 20:53, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
>>> wrote:
>>>> On Thu, 27 Aug 2009 06:29:32 -0700 (PDT), tm wrote:
>>>>> IMHO a function should be identified by its name and its parameters.
>>>>> The type of the result should not be needed to identify a function.
>>
>>>> Counterexample is represented by parameterless functions and named
>>>> constants. Numeric and string literals are such things. If you have several
>>>> numeric types you need to overload their literals as well as operations.
>>
>>> Maybe for Ada these are counterexamples, but for Seed7 they are not.
>>> In Seed7 a parameterless function and a named constant are both
>>> identified just with the name (but attribute parameters can be used
>>> to attach these names to a type or even several types). The type of
>>> a Seed7 literal is also unambiguous:
>>
>>> 'a' ... character literal
>>
>> The same is true for character types. Is 'a' ASCII, Latin-1, UCS-2, UCS-4,
>> EBCDIC character?
>
> Seed7 characters are UNICODE characters which use the UTF-32
> encoding.

With no option for user-defined ones, like Latin-1? If you had that
possibility you would face the problem of overloading character literals,
or of inventing different literals for different types in the way C++ does.
But how are you going to foresee literals for all possible user-defined
types?

> Seed7 strings are UNICODE strings which use the UTF-32 encoding.
> All conversions to and from strings used by the operating system
> (ASCII, Latin-1, UTF-8, UTF-16, ...) are done automatically by the
> run-time library.

Very bad, it is non-portable too.

>> You can declare:
>>
>> type My_Fancy_String is array (Character range <>) of Wide_Character;
>>
>> and use "ab" as a literal of. Note the My_Fancy_String is indexed by
>> Character (Latin-1) and contains Wide_Character (UCS-2 Unicode). So a value
>> of could be:
>>
>> ('a' => 'a', 'b' => 'b')
>>
>> where 'a' on the left is not 'a' on the right...
>
> Really?
> Without your explanation I would have never expected that 'a' and
> 'a' are not the same.

Why should you expect that?

> I guess that every normal (not Ada) programmer
> would assume that 'a' means the same in both cases. This is a good
> example for the negative sides of ambiguous expressions.

No, this is a good example of misunderstanding the difference between
program semantics (what 'a' means) and its presentation (as a character
literal). It is not the language's business to define the semantics. That is
up to the programmer. The literal 'a' can serve any discrete type.
Consider this type declaration in Ada:

type FSM_Alphabet is ('a', '1', '\');

It is absolutely ungrounded to imagine that 'a' of FSM_Alphabet is a
character. It is NOT.

>> To me user-defined scalar types are paramount.
>
> Seed7 supports user defined enumeration types and subtypes of scalar
> (and other) types (see below for a link to an example).

But no integers, reals etc. BTW, already with enumerations there are lots
of cases where it is required to have something like:

type T is (A, B, C);
type S is (C, D, E); -- It is fully legal in Ada to share C between T and S

I take it that this is impossible in Seed7 as well.
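
For illustration, Ada resolves such a shared literal from the expected
type, or via an explicit qualification where the context is not enough
(X, Y and Z are made-up names):

   X : T := C;      -- C resolves to T'(C) from the expected type
   Y : S := C;      -- the same literal resolves to S'(C) here
   Z : T := T'(C);  -- a qualified expression disambiguates explicitly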

> The notation
>
> A : Array (1..20) := (others => 0);
>
> is misleading since it suggested to me that the values outside the
> array are defined.

In Ada you cannot define values outside the array. It is just meaningless,
there is nothing outside the array.

>>> The expression '20 times 0' creates the array value of 20 integer
>>> elements with the value 0.
>>
>> But already this is ambiguous in a language like Ada.
>
> But not in Seed7. In the expression '20 times 0' the 'times'
> operator uses the index type 'integer' and the lower bound 1.

Why Integer and not Byte, Priority_Level, Identity_No etc?

>> Ada is a strongly typed language; you
>> cannot get anything wrong because of overloading. Any ambiguity is treated
>> as an error.
>
> As I showed above a sub-expression like 1+2 can be ambiguous.

Ambiguous in Ada = illegal. You cannot compile an illegal program. No harm
can happen.

> An Ada compiler would just complain if the ambiguity cannot be
> resolved by the context.

Exactly. I hope you are not saying that any random sequence of characters
shall constitute a legal, compilable program.



>>>> This is
>>>> automatically more friendly, because the body of name clashes is smaller in
>>>> Ada than seems to be in Seed7.
>>
>>> I have probably more knowledge about Ada than you about Seed7 :-)
>>> and I have a different view. "Do what I mean" concepts which are
>>> not understood by the programmer are a possible source of undetected
>>> errors.
>>
>> I don't see how it applies here. If there are conflicting interpretations
>> of a construct, the program is rejected in Ada.
>
> When the programmer does not know how the overload resolution works
> he might think that 'a' has type Latin-1 instead of UCS-2. Here I am
> referring to your example
>
> ('a' => 'a', 'b' => 'b')
>
> where you said:
>
> "where 'a' on the left is not 'a' on the right"

The programmer can think anything he wants. What is the problem? So long
as there is no ambiguity, everything is OK.

>>> People are instinctively aware that the bottom up overloading
>>> resolution determines the type of every expression and subexpression
>>> unambiguously.
>>
>> This is an unsupported claim.
>
> Proof: When I asked you for the type of 1+2 you answered:
> "Technically in Ada it is Universal_Integer"
> You probably used a bottom up approach.

No, I did not. Ada declares all integer literals to be of the type
Universal_Integer, which is automatically converted to the particular
integer or modular type. The effect is as if types had literals of their
own. That does not change the semantics. You can treat 1+2 as
Unsigned_64'(1) + Unsigned_64'(2).

There is a subtle but important difference, which can illustrate the
advantages of Ada's model. The standard requires all static numeric
expressions to be evaluated exactly. Consider the following:

type T is range 1..2; -- Has only 1 and 2 values
X : T := 1024 / 512; -- This is OK!

Though neither 1024 nor 512 belong to T, the compiler is required to accept
this program because 1024 / 512 is statically 2, which is in T. Observe that
an attempt to qualify the types involved, in a bottom-up manner, would
produce an illegal program:

X : T := T'(1024) / T'(512); -- Illegal, 1024 is not in T!

This example might look artificial, but in real programs it is very handy
not to have to work around 'last value + 1' cases etc. You just write what
you mean, and it works.

>> On the contrary, observe natural languages
>> and scientific notations as examples where bottom-up does not work. Human
>> brain, being heavily parallel, works differently to a computer. To human
>> perception top down analysis is as natural as bottom up.
>
> I did not state that bottom up is always better than top down. Just
> for overload resolution I think it is a simpler and more natural
> approach with an unambiguous result.
>
> From the time structured programming was introduced up to today there
> have been discussions about the advantages of 'goto' statements.
> I do not expect the concept of unambiguous expressions (and
> sub-expressions) to be accepted without any discussion.

Huh, the alias name of structured programming was TOP DOWN design.

1+2 perfectly fits the concept. You want to add things at the higher
abstraction level (on the TOP). If the compiler grasps your idea,
everything is OK and you both are happy. If it complains, you descend one
level below (DOWN) and consider what the types involved are, etc. It is
just a matter of productivity, comfort and SAFETY. Consider the awful
integer literals of different lengths in C. You have to specify 1L, and if
the type gets changed later you will have to revise the program. That is error prone.

>>>> Why is this better than obvious:
>>
>>>> My_Circle : Circle := Create (10);
>>
>>> 1. Because this is obviously only obvious for Ada people. :-)
>>
>> What is unclear in above?
>
> QED. You are an Ada programmer.

Should I be ashamed, then? (:-)) But you did not answer the question. In
tons of other languages you would find:

int myNumber = create (10);

What's wrong with that? Why should Circles be different?

>>> 2. For a "procedure" (which has no result and therefore cannot be
>>> assigned) this approach is not possible.
>>
>> But Create is not a procedure! Look, it won't work for a goto statement
>> either! (:-))
>
> AFAIK Ada supports attributes like S'Digits and S'First. Attributes
> are predefined and IMHO the user cannot define new attributes.

Just like operators, some attributes can be redefined in Ada. BTW I am not
a big fan of attributes, especially because they aren't well integrated
into the Ada OO model. I wish they were primitive operations (AKA "virtual"
in C++).

> The
> attribute parameters of Seed7 are a concept which allows
> user-defined attributes.

That is good. What is bad is that this feature is misused in order to patch
a language problem: that parameters of a subprogram receive different
treatment depending on whether they are arguments or results.

>>>> In Ada it is, in the case of a conflict or when + is not directly visible:
>>
>>>> My_Module."+" (1, 2)
>>
>>> Ah, the "infix operator is a function and vice versa" concept.
>>
>> Yes. In Ada each operator has a corresponding function, which can be named
>> and used as a function. For example, + defined on Integer has the name
>> Standard."+".
>
> I think this duality of operator/function is not necessary (At least
> Seed7 does not need it).

Consider this:

type Saturated is new Integer range 1..20;
function "+" (Left, Right : Saturated) return Saturated;

How would you implement "+" that overrides the standard one? The
implementation can access it using its fully qualified name. (There are
also other ways. Ada usually offers more than one solution to the problem)
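
One possible sketch of such an overriding body: converting the operands to
the parent type Integer reaches the predefined "+" without recursing into
this very function (the saturating behaviour is an assumption made here for
illustration):

   function "+" (Left, Right : Saturated) return Saturated is
      -- the conversions invoke the predefined Integer "+",
      -- i.e. Standard."+", avoiding infinite recursion
      Sum : constant Integer := Integer (Left) + Integer (Right);
   begin
      if Sum > 20 then
         return 20;              -- saturate at the upper bound
      else
         return Saturated (Sum);
      end if;
   end "+";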

tm

unread,
Aug 28, 2009, 5:01:34 PM8/28/09
to
On 28 Aug., 18:48, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
wrote:
> On Fri, 28 Aug 2009 06:20:45 -0700 (PDT), tm wrote:
> > On 28 Aug., 10:09, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
> > wrote:
> >> On Thu, 27 Aug 2009 15:22:38 -0700 (PDT), tm wrote:
> >>> On 27 Aug., 20:53, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
> >>> wrote:
> >>>> On Thu, 27 Aug 2009 06:29:32 -0700 (PDT), tm wrote:
> >>>>> IMHO a function should be identified by its name and its parameters.
> >>>>> The type of the result should not be needed to identify a function.
>
> >>>> Counterexample is represented by parameterless functions and named
> >>>> constants. Numeric and string literals are such things. If you have several
> >>>> numeric types you need to overload their literals as well as operations.
>
> >>> Maybe for Ada these are counterexamples, but for Seed7 they are not.
> >>> In Seed7 a parameterless function and a named constant are both
> >>> identified just with the name (but attribute parameters can be used
> >>> to attach these names to a type or even several types). The type of
> >>> a Seed7 literal is also unambiguous:
>
> >>> 'a' ... character literal
>
> >> The same is true for character types. Is 'a' ASCII, Latin-1, UCS-2, UCS-4,
> >> EBCDIC character?
>
> > Seed7 strings are UNICODE strings which use the UTF-32 encoding.
> > All conversions to and from strings used by the operating system
> > (ASCII, Latin-1, UTF-8, UTF-16, ...) are done automatically by the
> > run-time library.
>
> Very bad, it is non-portable too.

Totally wrong. Seed7 strings are portable.
Please look at the Seed7 documentation before claiming such things.

> >> You can declare:
>
> >> type My_Fancy_String is array (Character range <>) of Wide_Character;
>
> >> and use "ab" as a literal of. Note the My_Fancy_String is indexed by
> >> Character (Latin-1) and contains Wide_Character (UCS-2 Unicode). So a value
> >> of could be:
>
> >> ('a' => 'a', 'b' => 'b')
>
> >> where 'a' on the left is not 'a' on the right...
>
> > Really?
> > Without your explanation I would have never expected that 'a' and
> > 'a' are not the same.
>
> Why should you expect that?

It seems obvious that 'a' and 'a' are the same.
Most people will agree with me. Ask somebody else to verify.

> >> To me user-defined scalar types are paramount.
>
> > Seed7 supports user defined enumeration types and subtypes of scalar
> > (and other) types (see below for a link to an example).
>
> But no integers, reals etc.

This is wrong. Seed7 supports subtypes of integers and floats also.
Please do not make such unproven claims.

> >>> The expression '20 times 0' creates the array value of 20 integer
> >>> elements with the value 0.
>
> >> But already this is ambiguous in a language like Ada.
>
> > But not in Seed7. In the expression '20 times 0' the 'times'
> > operator uses the index type 'integer' and the lower bound 1.
>
> Why Integer and not Byte, Priority_Level, Identity_No etc?

I did not say that the index must be 'integer'. I wrote:

Seed7 also supports arrays where the
index type is not 'integer'.

which includes Byte, Priority_Level, Identity_No etc.

> >> Ada is a strongly typed language; you
> >> cannot get anything wrong because of overloading. Any ambiguity is treated
> >> as an error.
>
> > As I showed above a sub-expression like 1+2 can be ambiguous.
>
> Ambiguous in Ada = illegal. You cannot compile an illegal program. No harm
> can happen.

You obviously don't try to follow my arguments.

> >>>> This is
> >>>> automatically more friendly, because the body of name clashes is smaller in
> >>>> Ada than seems to be in Seed7.
>
> >>> I have probably more knowledge about Ada than you about Seed7 :-)
> >>> and I have a different view. "Do what I mean" concepts which are
> >>> not understood by the programmer are a possible source of undetected
> >>> errors.
>
> >> I don't see how it applies here. If there are conflicting interpretations
> >> of a construct, the program is rejected in Ada.
>
> > When the programmer does not know how the overload resolution works
> > he might think that 'a' has type Latin-1 instead of UCS-2. Here I am
> > referring to your example
>
> > ('a' => 'a', 'b' => 'b')
>
> > where you said:
>
> > "where 'a' on the left is not 'a' on the right"
>
> The programmer can think anything he wants. What is the problem? So long
> as there is no ambiguity, everything is OK.

You think that a program that compiles without errors is OK?

> >>> People are instinctively aware that the bottom up overloading
> >>> resolution determines the type of every expression and subexpression
> >>> unambiguously.
>
> >> This is an unsupported claim.
>
> > Proof: When I asked you for the type of 1+2 you answered:
> > "Technically in Ada it is Universal_Integer"
> > You probably used a bottom up approach.
>
> No, I did not. Ada declares all integer literals to be of the type
> Universal_Integer, which is automatically converted to the particular
> integer or modular type. The effect is as if types had literals of their
> own. That does not change the semantics. You can treat 1+2 as
> Unsigned_64'(1) + Unsigned_64'(2).

When the + is overloaded with

function "+"(LEFT, RIGHT: INTEGER) return REAL;

the expression 1+2 may return a 'REAL' result as well as an
'INTEGER' one.
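
A minimal illustration of that context dependence in Ada, assuming a
user-declared floating-point type Real (all names here are hypothetical):

   type Real is digits 6;
   function "+" (Left, Right : Integer) return Real;  -- body given elsewhere
   I : constant Integer := 1 + 2;  -- picks the predefined Integer "+"
   R : constant Real    := 1 + 2;  -- picks the overloaded "+" above

In a context that accepts both types, for example a procedure overloaded on
both Integer and Real, 1 + 2 would become ambiguous and the compiler would
reject it.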

> There is a subtle but important difference, which can illustrate the
> advantages of Ada's model. The standard requires all static numeric
> expressions to be evaluated exactly. Consider the following:
>
> type T is range 1..2; -- Has only 1 and 2 values
> X : T := 1024 / 512; -- This is OK!
>
> Though neither 1024 nor 512 belong to T, the compiler is required to accept
> this program because 1024 / 512 is statically 2, which is in T. Observe that
> an attempt to qualify the types involved, in a bottom-up manner, would
> produce an illegal program,

There is a cast involved when the integer 1024 / 512 is assigned
to T. Seed7 would just require that this cast is explicit instead
of implicit.

> >> On the contrary, observe natural languages
> >> and scientific notations as examples where bottom-up does not work. Human
> >> brain, being heavily parallel, works differently to a computer. To human
> >> perception top down analysis is as natural as bottom up.
>
> > I did not state that bottom up is always better than top down. Just
> > for overload resolution I think it is a simpler and more natural
> > approach with an unambiguous result.
>
> > From the time structured programming was introduced up to today there
> > have been discussions about the advantages of 'goto' statements.
> > I do not expect the concept of unambiguous expressions (and
> > sub-expressions) to be accepted without any discussion.
>
> Huh, the alias name of structured programming was TOP DOWN design.

You seem to misunderstand something.
This is not a fight of TOP DOWN against BOTTOM UP.

> 1+2 perfectly fits the concept. You want to add things at the higher
> abstraction level (on the TOP). If the compiler grasps your idea,
> everything is OK and you both are happy. If it complains, you descend one
> level below (DOWN) and consider what the types involved are, etc. It is
> just a matter of productivity, comfort and SAFETY. Consider the awful
> integer literals of different lengths in C. You have to specify 1L, and if
> the type gets changed later you will have to revise the program. That is error prone.

A strongly typed language would at least tell you where you need
to change something. In C you don't get helpful error messages
to hint at which changes are necessary.

> > AFAIK Ada supports attributes like S'Digits and S'First. Attributes
> > are predefined and IMHO the user cannot define new attributes.
>
> Just like operators, some attributes can be redefined in Ada. BTW I am not
> a big fan of attributes, especially because they aren't well integrated
> into the Ada OO model. I wish they were primitive operations (AKA "virtual"
> in C++).
>
> > The
> > attribute parameters of Seed7 are a concept which allows
> > user-defined attributes.
>
> That is good. What is bad is that this feature is misused in order to patch
> a language problem: that parameters of a subprogram receive different
> treatment depending on whether they are arguments or results.

Attribute parameters are a feature that can be used for different
purposes.

> >>>> In Ada it is, in the case of a conflict or when + is not directly visible:
>
> >>>> My_Module."+" (1, 2)
>
> >>> Ah, the "infix operator is a function and vice versa" concept.
>
> >> Yes. In Ada each operator has a corresponding function, which can be named
> >> and used as a function. For example, + defined on Integer has the name
> >> Standard."+".
>
> > I think this duality of operator/function is not necessary (At least
> > Seed7 does not need it).
>
> Consider this:
>
> type Saturated is new Integer range 1..20;
> function "+" (Left, Right : Saturated) return Saturated;
>
> How would you implement "+" that overrides the standard one? The
> implementation can access it using its fully qualified name. (There are
> also other ways. Ada usually offers more than one solution to the problem)

I will think that over.

Dmitry A. Kazakov

unread,
Aug 29, 2009, 4:20:33 AM8/29/09
to
On Fri, 28 Aug 2009 14:01:34 -0700 (PDT), tm wrote:

> On 28 Aug., 18:48, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
> wrote:

>>>> To me user-defined scalar types are paramount.
>>
>>> Seed7 supports user defined enumeration types and subtypes of scalar
>>> (and other) types (see below for a link to an example).
>>
>> But no integers, reals etc.
>
> This is wrong. Seed7 supports subtypes of integers and floats also.

I wrote about types. If user-defined integer *types* are supported, do they
have literals? If they do, then there shall be contexts where 1 may mean
more than one type.

>>>>> The expression '20 times 0' creates the array value of 20 integer
>>>>> elements with the value 0.
>>
>>>> But already this is ambiguous in a language like Ada.
>>
>>> But not in Seed7. In the expression '20 times 0' the 'times'
>>> operator uses the index type 'integer' and the lower bound 1.
>>
>> Why Integer and not Byte, Priority_Level, Identity_No etc?
>
> I did not say that the index must be 'integer'. I wrote:
>
> Seed7 also supports arrays where the
> index type is not 'integer'.
>
> which includes Byte, Priority_Level, Identity_No etc.

In that case 20 times 0 is "ambiguous" because 20 may refer to
Priority_Levels Idle, Very_Low, Low etc, or Bytes from 0 to 19, or
whatever.
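
For comparison, an Ada sketch of the same point: the aggregate gets its
index type top-down, from the expected type (all type and object names
below are invented for illustration):

   type Byte is mod 2**8;
   type Priority_Level is (Idle, Very_Low, Low, Normal, High);
   type Byte_Row  is array (Byte range <>) of Integer;
   type Level_Row is array (Priority_Level range <>) of Integer;

   A : Byte_Row (0 .. 19)       := (others => 0);
   B : Level_Row (Idle .. High) := (others => 0);
   -- the same aggregate (others => 0) serves both declarations;
   -- its index type comes from the expected type, not from the aggregate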

>>>> Ada is a strongly typed language; you
>>>> cannot get anything wrong because of overloading. Any ambiguity is treated
>>>> as an error.
>>
>>> As I showed above a sub-expression like 1+2 can be ambiguous.
>>
>> Ambiguous in Ada = illegal. You cannot compile an illegal program. No harm
>> can happen.
>
> You obviously don't try to follow my arguments.

No, "ambiguity" is a language term. You are using it is a psychological
context of some imaginary layman, who would consider some language
constructs "ambiguous" or not. This is fruitless, because in order to make
statements about psychology or sociology, one should conduct scientific
experiments. Otherwise it is all a matter of taste.

As an OO programmer I am accustomed to generic programming, that is
programming in terms of sets of types (AKA classes). So an expression like
1+2 renders to me to a class of additive types probably of the structure of
a ring or a group, with an operation + and elements 1 and 2. This is quite
enough to grasp the idea (semantics) of the program. The concrete types
involved are of no interests so long my strongly typed language has checked
them OK.

----------------
Anyway. The language design point is rather simple. In the presence of
user-defined scalar types (= types that have literals), you have
semantically overloaded literals. Period.

>>>>>> This is
>>>>>> automatically more friendly, because the body of name clashes is smaller in
>>>>>> Ada than seems to be in Seed7.
>>
>>>>> I have probably more knowledge about Ada than you about Seed7 :-)
>>>>> and I have a different view. "Do what I mean" concepts which are
>>>>> not understood by the programmer are a possible source of undetected
>>>>> errors.
>>
>>>> I don't see how it applies here. If there are conflicting interpretations
>>>> of a construct, the program is rejected in Ada.
>>
>>> When the programmer does not know how the overload resolution works
>>> he might think that 'a' has type Latin-1 instead of UCS-2. Here I am
>>> referring to your example
>>
>>> ('a' => 'a', 'b' => 'b')
>>
>>> where you said:
>>
>>> "where 'a' on the left is not 'a' on the right"
>>
>> The programmer can think anything he wants. What is the problem? So long
>> as there is no ambiguity, everything is OK.
>
> You think that a program that compiles without errors is OK?

Certainly yes, if its semantics corresponds to the programmer's intention.
The converse is wrong.

If the programmer's intention was to index a UCS-2 string using a Latin-1
index, why should the language forbid this? Consider it as a task:

The task: create a map of Latin-1 characters to the Unicode UCS-2 code
positions. Then create a To_Lower map.

In Ada this task can be accomplished in a way I used above:

type Latin_to_UCS is array (Character) of Wide_Character;
To_Lower_Case : Latin_to_UCS :=
( 'a' | 'A' => 'a', 'b' | 'B' => 'b', -- and so on
);

It is natural, obvious and elegant to denote Latin-1 'a' and Unicode 'a'
using the same literal. Why should it be otherwise?
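
For what it is worth, a compilable way to finish that sketch, filling the
positions not listed with an identity mapping and folding only the ASCII
letters (both simplifications are assumptions made here for brevity):

   with Ada.Characters.Conversions; use Ada.Characters.Conversions;
   procedure Build_To_Lower_Demo is
      type Latin_to_UCS is array (Character) of Wide_Character;
      To_Lower_Case : Latin_to_UCS;
   begin
      for C in Character loop                    -- identity by default
         To_Lower_Case (C) := To_Wide_Character (C);
      end loop;
      for C in Character range 'A' .. 'Z' loop   -- fold the ASCII letters
         To_Lower_Case (C) :=
            To_Wide_Character (Character'Val (Character'Pos (C) + 32));
      end loop;
   end Build_To_Lower_Demo;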

>>>>> People are instinctively aware that the bottom up overloading
>>>>> resolution determines the type of every expression and subexpression
>>>>> unambiguously.
>>
>>>> This is an unsupported claim.
>>
>>> Proof: When I asked you for the type of 1+2 you answered:
>>> "Technically in Ada it is Universal_Integer"
>>> You probably used a bottom up approach.
>>
>> No, I did not. Ada declares all integer literals to be of the type
>> Universal_Integer, which is automatically converted to the particular
>> integer or modular type. The effect is as if types had literals of their
>> own. That does not change the semantics. You can treat 1+2 as
>> Unsigned_64'(1) + Unsigned_64'(2).
>
> When the + is overloaded with
>
> function "+"(LEFT, RIGHT: INTEGER) return REAL;
>
> the expression 1+2 may return a 'REAL' result as well as an
> 'INTEGER' one.

And Unsigned_8, and Integer_16, and Long_Integer, and a potentially infinite
number of types. Why should I care without a context?



>> There is a subtle but important difference, which can illustrate the
>> advantages of Ada's model. The standard requires all static numeric
>> expressions to be evaluated exactly. Consider the following:
>>
>> type T is range 1..2; -- Has only 1 and 2 values
>> X : T := 1024 / 512; -- This is OK!
>>
>> Though neither 1024 nor 512 belong to T, the compiler is required to accept
>> this program because 1024 / 512 is statically 2, which is in T. Observe that
>> an attempt to qualify the types involved, in a bottom-up manner, would
>> produce an illegal program,
>
> There is a cast involved when the integer 1024 / 512 is assigned
> to T. Seed7 would just require that this cast is explicit instead
> of implicit.

So casting to a known type is supposed to be readable? I prefer a language
where I am not forced to cast obvious expressions.

>> 1+2 perfectly fits the concept. You want to add things at the higher
>> abstraction level (on the TOP). If the compiler grasps your idea,
>> everything is OK and you both are happy. If it complains, you descend one
>> level below (DOWN) and consider what the types involved are, etc. It is
>> just a matter of productivity, comfort and SAFETY. Consider the awful
>> integer literals of different lengths in C. You have to specify 1L, and if
>> the type gets changed later you will have to revise the program. That is error prone.
>
> A strongly typed language would at least tell you where you need
> to change something.

Yes, and this argument equally refutes what you wrote about "ambiguities".

> Attribute parameters are a feature that can be used for different
> purposes.

Overloading is such a feature. Programmers are not advised to extensively
use it, but in certain cases they have to.

Bart

unread,
Aug 29, 2009, 8:16:43 AM8/29/09
to
On Aug 28, 5:48 pm, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
wrote:

> On Fri, 28 Aug 2009 06:20:45 -0700 (PDT), tm wrote:

> Consider this type declaration in Ada:
>
>    type FSM_Alphabet is ('a', '1', '\');
>
> It is absolutely ungrounded to imagine that 'a' of FSM_Alphabet is a
> character. It is NOT.

What's the purpose of the quotes then? Usually they designate a
literal value.

And if you're going to say what I think you are, then the following
must also be allowed:

type Whatever = (10,20,30);

with 10, 20, 30 now meaning something other than the obvious. That
can't be right!

> But no integers, reals etc. BTW, already with enumerations there are lots
> of cases where it is required to have something like:
>
>    type T is (A, B, C);
>    type S is (C, D, E); -- It is fully legal in Ada to share C between T and S

And just about everywhere else; C is just a name, and overloading
names is well understood. Although with the following:

type U is (U); -- assuming I can have just one thing in the list

I would expect some problems, depending on whether a type name can
appear in an expression (in my syntax, I think U.U is needed to
resolve this.)

--
Bartc

Dmitry A. Kazakov

unread,
Aug 29, 2009, 9:54:57 AM8/29/09
to
On Sat, 29 Aug 2009 05:16:43 -0700 (PDT), Bart wrote:

> On Aug 28, 5:48 pm, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
> wrote:
>> On Fri, 28 Aug 2009 06:20:45 -0700 (PDT), tm wrote:
>
>> Consider this type declaration in Ada:
>>
>>    type FSM_Alphabet is ('a', '1', '\');
>>
>> It is absolutely ungrounded to imagine that 'a' of FSM_Alphabet is a
>> character. It is NOT.
>
> What's the purpose of the quotes then? Usually they designate a
> literal value.

They do. 'a' is a character literal (Ada Reference Manual (RM) 2.5). But
that does not imply anything about its semantics. A more insightful example
could be:

type EBCDIC is ('A', 'B', 'C', ...);
for EBCDIC use ('A'=>193, 'B'=>194, 'C'=>195, ...);

The above defines a character type with EBCDIC encoding. The 'for' clause
assigns EBCDIC representation for the literals. The obtained characters
have nothing to do with the standard character type. Naturally you can have
EBCDIC strings:

type IBM_String is array (Positive range <>) of EBCDIC;

This string type will have literals like "ABC", again having nothing to do
with the standard string type.

All character and string types in Ada are equal, even user-defined ones.

> And if you're going to say what I think you are, then the following
> must also be allowed:
>
> type Whatever = (10,20,30);
>
> with 10, 20, 30 now meaning something other than the obvious. That
> can't be right!

type Whatever is (10, 20, 30);

is illegal in Ada because 10, 20, 30 are integer literals. An enumeration
type may not have integer literals.

>> But no integers, reals etc. BTW, already with enumerations there are lots
>> of cases where it is required to have something like:
>>
>>    type T is (A, B, C);
>>    type S is (C, D, E); -- It is fully legal in Ada to share C between T and S
>
> And just about everywhere else; C is just a name, and overloading
> names is well understood.

Exactly. Note that A, B, C above are literals. 'a', 'b', 'c' are
literals as well. There is no reason why different literals should get
different treatment. In Ada all literals are handled equally. You can
overload A, so you can do the same with 'A', 1 and 1.5e5.

> Although with the following:
>
> type U is (U); -- assuming I can have just one thing in the list
>
> I would expect some problems, depending on whether a type name can
> appear in an expression (in my syntax, I think U.U is needed to
> resolve tbis.)

Yes, Ada does not allow overloading of types and variables being declared
in the same declaration scope. So U will conflict with the literal U.

The following trick is also illegal:

type U_1 is (U);
subtype U is U_1; -- Attempted to get the same effect

This is still illegal because the subtype name U will conflict with the
literal U. You can continue tricking the compiler by:

package P is
type U_1 is (U); -- declared in P (another scope also)
end P;
use P;
subtype U is P.U_1; -- this is OK now

But it won't really help because in any context where both U are visible,
either one hides another, or else both hide each other. For example with
the latter declarations:

X : U := P.U; -- X of subtype U with the value U

To summarize Ada's behavior:

1. functions, procedures, literals and operators are overloadable, other
named objects are not

2. as usual overloading may result in ambiguity

3. non-overloadable things hide each other and hide overloadable things

There is no way to do it substantially differently. All languages with
user-definable operators allow overloading. I.e. position 1 is always
present. The only question is which non-operators should be overloadable as
well.

James Harris

unread,
Aug 30, 2009, 6:58:05 AM8/30/09
to

This could be a big question. As there are no replies yet I've started a new
thread with the query above.

James

Marco van de Voort

unread,
Oct 4, 2009, 6:15:31 PM10/4/09
to
On 2009-08-27, bartc <ba...@freeuk.com> wrote:
>> *Far* better, IMHO, is simple string slicing treating the string as an
>> array of characters.
> Suppose you had a string say s="ABCDEF", and you indexed it using:
>
> s[3]
>
> would the result be a character, or a string of length 1?
>
> (For years I've been using a language with the latter approach, and it's
> worked well (after all why should s[3] be that different from the slice
> s[3..4]), with asc(s[3]) to get the character value.)
>
> But which is better?

Depends on your definition of character. Is s[3] really a character, a
codepoint, or just the granularity of your encoding (e.g. 2 bytes with
UTF-16, while characters can be multiple 32-bit codepoints in theory)?

Dmitry A. Kazakov

unread,
Oct 5, 2009, 3:50:04 AM10/5/09
to

I think that a character string should obviously consist of characters, where
each character is a code point independently of the encoding. Or better to
say, it does not matter which encoding String has. That is an implementation
detail.

To bring a particular encoding into the picture one should have strings of
octets (for UTF-8), strings of words (UTF-16), etc. These are different types
which basically have nothing to do with String.

Now there could be conversions between Strings (of characters) and strings
of octets in UTF-8.

In a more elaborate type system a String of characters may implement the
interface of a string of octets etc. This means that operator [] is
overloaded in its result, so s[3] is a character when a character is
expected, and an octet when an octet is. Not a big problem.
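
In Ada-like terms the result overloading could look like the following
interface sketch (package and subprogram names invented; bodies omitted):

   package Text_Views is
      type Octet is mod 2**8;
      type UTF8_Text (<>) is private;
      -- the expected result type selects which view s[3] denotes:
      function Element (S : UTF8_Text; N : Positive)
         return Wide_Wide_Character;    -- the character view
      function Element (S : UTF8_Text; N : Positive)
         return Octet;                  -- the raw octet view
   private
      type UTF8_Text is new String;     -- placeholder: UTF-8 octets
   end Text_Views;

Then C : Wide_Wide_Character := Element (S, 3); reads a character, while
B : Octet := Element (S, 3); reads an octet.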

Marco van de Voort

unread,
Oct 5, 2009, 4:07:11 AM10/5/09
to
On 2009-10-05, Dmitry A. Kazakov <mai...@dmitry-kazakov.de> wrote:
>>>
>>> But which is better?
>>
>> Depends on your definition of character. Is s[3] really a character, a
>> codepoint, or just the granularity of your encoding (e.g. 2 bytes with
>> UTF-16, while characters can be multiple 32-bit codepoints in theory)?
>
> I think that a character string should obviously consist of characters, where
> each character is a code point independently of the encoding. Or better to
> say, it does not matter which encoding String has. That is an implementation
> detail.

But in Unicode, (printable) characters can be multiple codepoints. Especially
languages that allow combining accents need this, IIRC.

The trouble with using codepoints as characters is that to access s[n] you
have to parse the entire string until you find character n. Might be fine for
a scripting language, but can be a performance killer.

> To bring a particular encoding into the picture one should have strings of
> octets (for UTF-8), strings of words (UTF-16), etc. These are different types
> which basically have nothing to do with String.

I'm talking about any realistic choice to be used as internal storage inside
"String", and how it translates to String's properties. Other types not
included.

Dmitry A. Kazakov

unread,
Oct 5, 2009, 4:52:25 AM10/5/09
to
On Mon, 5 Oct 2009 08:07:11 +0000 (UTC), Marco van de Voort wrote:

> On 2009-10-05, Dmitry A. Kazakov <mai...@dmitry-kazakov.de> wrote:
>>>>
>>>> But which is better?
>>>
>>> Depends on your definition of character. Is s[3] really a character, a
>>> codepoint, or just the granularity of your encoding (e.g. 2 bytes with
>>> UTF-16, while characters can be multiple 32-bit codepoints in theory)?
>>
>> I think that a character string should obviously consist of characters, where
>> each character is a code point independently of the encoding. Or better to
>> say, it does not matter which encoding String has. That is an implementation
>> detail.
>
> But in Unicode, (printable) characters can be multiple codepoints. Especially
> languages that allow combining accents need this, IIRC.

You are right, but that is an insanity the Unicode guys have introduced
upon us. IMO it is hopeless to maintain the "character = glyph" idea. I
would ignore that stuff and stop at the code point.

> The trouble with using codepoints as characters is that to access s[n] you
> have to parse the entire string until you find character n. Might be fine for
> a scripting language, but can be a performance killer.

>> To bring a particular encoding into the picture one should have strings of
>> octets (for UTF-8), strings of words (UTF-16), etc. These are different types
>> which basically have nothing to do with String.
>
> I'm talking about any realistic choice to be used as internal storage inside
> "String", and how it translates to String's properties. Other types not
> included.

The internal representation can be UCS-4 (memory is cheap) or UTF-8 with an
index to speed up search. The language should not specify this. In practice
I would favor a UTF-8 internal representation with a cache of the last
fetched character's octet index. This will suffice for almost all cases of
indexing. Normally characters are scanned either forward or backward.
Random access to string characters is practically never used.

There could be issues with concurrent access to the string cache, but I
presume that all objects, including strings, are allocated on the stack;
otherwise the language should properly handle shared objects anyway.

Marco van de Voort

unread,
Oct 5, 2009, 8:48:28 AM10/5/09
to
On 2009-10-05, Dmitry A. Kazakov <mai...@dmitry-kazakov.de> wrote:
>>> each character is a code point independently of the encoding. Or better to
>>> say, it does not matter which encoding String has. That is an implementation
>>> detail.
>>
>> But in Unicode, (printable) characters can be multiple codepoints. Especially
>> languages that allow combining accents need this, IIRC.
>
> You are right, but that is an insanity the Unicode guys have introduced
> upon us. IMO it is hopeless to maintain the "character = glyph" idea. I
> would ignore that stuff and stop at the code point.

Perfectly sensible for a higher level language. We chose encoding
granularity to keep basic support fast, and provide separate functions to
iterate over codepoints.

>> I'm talking about any realistic choice to be used as internal storage inside
>> "String", and how it translates to String's properties. Other types not
>> included.
>
> The internal representation can be UCS-4 (memory is cheap) or UTF-8 with an
> index to speed up search.

Memory is cheap, but cache is expensive ;-)

The problem with UCS-4 is that no system uses it, which means you are putting
yourself on an island and must convert/marshal to call any external function.
If your runtime is itself encoded using the standard string type, this
hurts (which can be seen in e.g. processing XML based DB exports). And of
course you are also lugging a lot more bytes along.

Which is why we chose to add both UTF-8 and UTF-16 (in the native
endianness). Partially we had no choice because Delphi did this too,
partially because we really think we need it.

Inside string routines 32-bit arrays are sometimes used as a pseudo 32-bit
type, but it is rarely needed, so for now we decided against making it a
type of its own.

We are still thinking about what to make the default on *nix. (Delphi 2009+
has UTF-16 as the standard string type, but it doesn't support platforms
like *nix that use UTF-8 as the default string type.)

> The language should not specify this. In practice
> I would favor a UTF-8 internal representation with a cache of the last
> fetched character's octet index. This will suffice for almost all cases of
> indexing. Normally characters are scanned either forward or backward.
> Random access to string characters is practically never used.

Provided that you can of course distinguish iteration from single character
access. If people iterate using for loops this means a lot of analysis.

> There could be issues with concurrent access to the string cache, but I
> presume that all objects, including strings, are allocated on the stack;
> otherwise the language should properly handle shared objects anyway.

In FPC/Delphi strings are first-class types, not objects: heap-based,
refcounted, and threadsafe using fairly cheap (CPU level) primitives, aided
by the fact that nesting is not possible. Character access is safe even when
shared due to copy-on-write semantics, which could be extended to the char
level indexes. I'm more worried about the constant scanning of strings to
build these indexes. Maybe in Delphi char level access is more common.

Making them local is not always a solution to performance problems, since it
means making local copies first (or checking global state on every access),
throwing performance out of one window while saving a bit on the other
side.

To my knowledge FPC and Delphi don't make use of knowledge of single
strings. IOW if the compiler knows it has a locally referenced string only,
it still uses the global routines that check ref count and copy-on-write
status. These calls are cheap, but can hurt in tight loops. (usually avoided
by using pointer arithmetic inside the string's memory)

Dmitry A. Kazakov

unread,
Oct 5, 2009, 9:52:31 AM10/5/09
to
On Mon, 5 Oct 2009 12:48:28 +0000 (UTC), Marco van de Voort wrote:

> On 2009-10-05, Dmitry A. Kazakov <mai...@dmitry-kazakov.de> wrote:
>>>> each character is a code point independently of the encoding. Or better to
>>>> say, it does not matter which encoding String has. That is an implementation
>>>> detail.
>>>
>>> But in Unicode, (printable) characters can be multiple codepoints. Especially
>>> languages that allow combining accents need this, IIRC.
>>
>> You are right, but that is an insanity the Unicode guys have introduced
>> upon us. IMO it is hopeless to maintain the "character = glyph" idea. I
>> would ignore that stuff and stop at the code point.
>
> Perfectly sensible for a higher level language. We chose encoding
> granularity to keep basic support fast, and provide separate functions to
> iterate over codepoints.

But that still does not define "character"; if not a code point then what?
It is like trying to define a printer language or terminal escape sequences
in terms of what appears on the paper or display screen. The case is lost
before we start. Consider something like 'A' & backspace & 'E' = "E"
or maybe "Æ"... (:-))

>>> I'm talking about any realistic choice to be used as internal storage inside
>>> "String", and how it translates to String's properties. Other types not
>>> included.
>>
>> The internal representation can be UCS-4 (memory is cheap) or UTF-8 with an
>> index to speed up search.
>
> Memory is cheap, but cache is expensive ;-)
>
> The problem with UCS-4 is that no system uses it, which means you are putting
> yourself on an island and must convert/marshal to call any external function.
> If your runtime is itself encoded using the standard string type, this
> hurts (which can be seen in e.g. processing XML based DB exports). And of
> course you are also lugging a lot more bytes along.

I agree that UCS-4 looks too heavy. And nobody knows if it would not turn
into UTF-32 one day. The downfall of UCS-2 should warn us.

> Which is why we chose to add both UTF-8 and UTF-16 (in the native
> endianness). Partially we had no choice because Delphi did this too,
> partially because we really think we need it.
>
> Inside of string routines sometimes 32-bit arrays are used as pseudo 32-bit
> type, but it is rarely needed, so for now we decided against making it an
> own type.
>
> We are still thinking about what to make default on *nix. (Delphi2009+ has
> the standard string type UTF-16, but it doesn't support platforms like *nix
> that do UTF-8 as default stringtype)
>
>> The language should not specify this. In practice
>> I would favor a UTF-8 internal representation with a cache of the last
>> fetched character's octet index. This will suffice for almost all cases of
>> indexing. Normally characters are scanned either forward or backward.
>> Random access to string characters is practically never used.
>
> Provided that you can of course distinguish iteration from single character
> access.

I do not. The string object just keeps the last octet index computed by the
s[n] operation. That gives a translation n=index (or better,
next-to-n=index). If s[m] is then asked for and |n - m| < m, then n is taken
as the starting position for the search for the index of m. This is just the
old scheme of linear search improvement. When this code is inlined the
optimizer might remove all checks for calls with consecutive indices, maybe
even in a loop. One can elaborate this scheme with a tree of position-to-index
translation nodes, built on demand when indexing happens.
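
A rough sketch of that caching scheme in Ada (forward scanning only;
well-formed UTF-8 and invented names assumed throughout):

   type Cached_String (Length : Natural) is record
      Data       : String (1 .. Length);   -- UTF-8 octets
      Last_Char  : Natural := 0;           -- character index of the last fetch
      Last_Octet : Natural := 0;           -- octet where that character starts
   end record;

   function Width (Lead : Character) return Positive is
      B : constant Natural := Character'Pos (Lead);
   begin   -- encoded length of a character, read off its lead octet
      if    B < 16#80# then return 1;
      elsif B < 16#E0# then return 2;
      elsif B < 16#F0# then return 3;
      else                  return 4;
      end if;
   end Width;

   procedure Locate                        -- octet index of character N
     (S : in out Cached_String; N : Positive; Octet : out Positive)
   is
      Char  : Natural  := 0;
      Start : Positive := 1;
   begin
      if S.Last_Char in 1 .. N then        -- resume from the cached position
         Char  := S.Last_Char - 1;
         Start := S.Last_Octet;
      end if;
      loop
         Char := Char + 1;
         exit when Char = N;
         Start := Start + Width (S.Data (Start));
      end loop;
      S.Last_Char  := N;                   -- refresh the cache
      S.Last_Octet := Start;
      Octet := Start;
   end Locate;

Decoding the character found at the returned octet, and the backward scan
mentioned above, are left out of the sketch.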

> If people iterate using for loops this means a lot of analysis.

But considering a typical application like searching in a string map, we
can see that indexing is actually almost never used. A search will instead
use "=" and "<". These operations are equivalent to ones defined on the
octets. So I wouldn't much worry about the performance of s[n]. I would
take UTF-8 + plain search. Nobody would ever notice a difference! (:-))

>> There could be issues with concurrent access to the string cache, but I
>> presume that all objects, including strings, are allocated on the stack;
>> otherwise the language should properly handle shared objects anyway.
>
> In FPC/Delphi strings are first-class types, not objects: heap-based,
> refcounted, and threadsafe using fairly cheap (CPU level) primitives, aided
> by the fact that nesting is not possible. Character access is safe even when
> shared due to copy-on-write semantics, which could be extended to the char
> level indexes. I'm more worried about the constant scanning of strings to
> build these indexes. Maybe in Delphi char level access is more common.

(I think that all objects should be objects independently of whether their
types are built in or not. But that is another story.)

> Making them local is not always a solution to performance problems, since it
> means making local copies first (or checking global state on every access),
> throwing performance out of one window while saving a bit on the other
> side.

The problem is that the stored last search index makes the object mutable
where its semantics is immutable. When accessed from concurrent threads
this index I/O must be atomic or else the compiler must lock each access.

> To my knowledge FPC and Delphi don't make use of knowledge of single
> strings. IOW if the compiler knows it has a locally referenced string only,
> it still uses the global routines that check ref count and copy-on-write
> status. These calls are cheap, but can hurt in tight loops. (usually avoided
> by using pointer arithmetic inside the string's memory)

I think it is a general language design issue, which goes beyond strings.

Marco van de Voort

unread,
Oct 5, 2009, 10:22:57 AM10/5/09
to
On 2009-10-05, Dmitry A. Kazakov <mai...@dmitry-kazakov.de> wrote:
>>> You are right, but that is an insanity the Unicode guys have introduced
>>> upon us. IMO it is hopeless to maintain the "character = glyph" idea. I
>>> would ignore that stuff and stop at the code point.
>>
>> Perfectly sensible for a higher level language. We chose encoding
>> granularity to keep basic support fast, and provide separate functions to
>> iterate over codepoints.
>
> But that still does not define "character"; if not a code point then what?

Well, we already established that a codepoint is not perfectly a character
either :-)

IMHO it is a choice to take encoding granularity or codepoint as the base unit.

> It is like trying to define a printer language or terminal escape sequences
> in terms of what appears on the paper or display screen. The case is lost
> before we start. Consider something like 'A' & backspace & 'E' = "E"
> or maybe "Æ"... (:-))

It is with codepoints too, since a codepoint is not necessarily exactly one
glyph.

>> The problem with UCS-4 is that no system uses it, which means you are putting
>> yourself on an island and must convert/marshal to call any external
>> function. If your runtime is itself encoded using the standard
>> string type, this hurts (which can be seen in e.g. processing XML based
>> DB exports). And of course you are also lugging a lot more bytes along.
>
> I agree that UCS-4 looks too heavy. And nobody knows if it would not turn
> into UTF-32 one day. The downfall of UCS-2 should warn us.

Well. That is less of a problem. It is just that "one codepoint (whatever the
range) = one glyph" already doesn't work.

>> Provided that you can of course distinguish iteration from single character
>> access.
>
> I do not. The string object just keeps the last octet index computed by the
> s[n] operation. That gives a translation n=index (or better,
> next-to-n=index). If s[m] is then asked for and |n - m| < m, then n is taken
> as the starting position for the search for the index of m. This is just the
> old scheme of linear search improvement. When this code is inlined the
> optimizer might remove all checks for calls with consecutive indices, maybe
> even in a loop. One can elaborate this scheme with a tree of position-to-index
> translation nodes, built on demand when indexing happens.

Possible. At least the state is fairly cheap then (two integers, one for
codepoint index, one for granularity index). Not perfect, but would fix the
worst of the problem.

>> If people iterate using for loops this means a lot of analysis.
>
> But considering a typical application like searching in a string map, we
> can see that indexing is actually almost never used. A search will instead
> use "=" and "<". These operations are equivalent to ones defined on the
> octets. So I wouldn't much worry about the performance of s[n]. I would
> take UTF-8 + plain search. Nobody would ever notice a difference! (:-))

IMHO that is not such a typical operation, since it is often performed by
optimized library routines, not by end users.

>> refcounted, and threadsafe using fairly cheap (CPU level) primitives, aided
>> by the fact that nesting is not possible. Character access is safe even when
>> shared due to copy-on-write semantics, which could be extended to the char
>> level indexes. I'm more worried about the constant scanning of strings to
>> build these indexes. Maybe in Delphi char level access is more common.
>
> (I think that all objects should be objects independently of whether their
> types are built in or not. But that is another story.)

I left that behind in Oberon times :-) Too often one sees it watered down
later (by final objects, value objects and related marshalling) so that
the whole principle confuses more than it adds.

I always deemed the resulting "Everything is an object, but some objects are
more objects than other objects" more complicated than simply having a few
base types that are not objects.

In the string case for instance, the inability to inherit (and thus nest)
makes cheap automated memory management using refcounts possible, to optimize
statements using strings etc.

>> Making them local is not always a solution to performance problems, since it
>> means making local copies first (or checking global state on every access),
>> throwing performance out of one window while saving a bit on the other
>> side.
>
> The problem is that the stored last search index makes the object mutable
> where its semantics is immutable. When accessed from concurrent threads
> this index I/O must be atomic or else the compiler must lock each access.

Hmm, yes. Stupid me. One must now write to it for read support too. Do you
see a good solution for that? Make that context a local variable for every
string used in a loop? Complex in the compiler, but possible I guess.

>> To my knowledge FPC and Delphi don't make use of knowledge of single
>> strings. IOW if the compiler knows it has a locally referenced string only,
>> it still uses the global routines that check ref count and copy-on-write
>> status. These calls are cheap, but can hurt in tight loops. (usually avoided
>> by using pointer arithmetic inside the string's memory)
>
> I think it is a general language design issue, which goes beyond strings.

It was more to demonstrate that it still performs nicely without even
investing massive time in optimizing it.

Dmitry A. Kazakov

unread,
Oct 6, 2009, 4:57:28 AM10/6/09
to
On Mon, 5 Oct 2009 14:22:57 +0000 (UTC), Marco van de Voort wrote:

> On 2009-10-05, Dmitry A. Kazakov <mai...@dmitry-kazakov.de> wrote:

>> It is like trying to define a printer language or terminal escape sequences
>> in terms what appears on the paper or display screen. The case is lost
>> before we'd start. Considering something like 'A' & backspace & 'E' = "E"
>> or maybe "�"... (:-))
>
> It is with codepoint too, since a codepoint is not necessarily exactly one
> glyph.

Yes, the idea is broken anyway. It was already broken in ASCII with its
format control characters. One should not have mixed rendering with encoding.
Although probably any user would intuitively try to associate characters
with rendering elements, rather than with their "true" semantic meaning,
like a phonetic or a punctuation element (in European languages).
Whatever... (:-))

>>> The problem with UCS-4 is that no system uses it, which means you are putting
>>> yourself on an island and must convert/marshal to call any external
>>> function. If your runtime is itself encoded using the standard
>>> string type, this hurts (which can be seen in e.g. processing XML based
>>> DB exports). And of course you are also lugging a lot more bytes along.
>>
>> I agree that UCS-4 looks too heavy. And nobody knows if it would not turn
>> into UTF-32 one day. The downfall of UCS-2 should warn us.
>
> Well. That is less of a problem. It is just that one codepoint (whatever the
> range), one glyph already doesn't work.

But as I said, I would not try to go after glyphs. Code points as
characters look perfect to me. At least it moves the responsibility for the
choice to the Unicode guys. (:-)) Whatever mess they make, it would not
influence the language semantics.

>>> If people iterate using for loops this means a lot of analysis.
>>
>> But considering a typical application like searching in a string map, we
>> can see that indexing is actually almost never used. A search will instead
>> use "=" and "<". These operations are equivalent to ones defined on the
>> octets. So I wouldn't much worry about the performance of s[n]. I would
>> take UTF-8 + plain search. Nobody would ever notice a difference! (:-))
>
> IMHO that is not such a typical operation, since it is often performed by
> optimized library routines, not by end users.

Maybe, but s[n] is even less frequent. The point is that almost everything
where s[n] could come into question can be expressed directly in terms of a
UTF-8 encoded octet string (with roughly the same computational complexity).
I.e. if the language provided higher level primitives (like string
comparison) and the interfaces of an octet string and a character stream,
then the problem of inefficient s[n] would vanish. Other examples: string
parsing rather requires a character input stream; formatted string output
requires a character output stream, etc.

>>> refcounted, and threadsafe using fairly cheap (CPU level) primitives, aided
>>> by the fact that nesting is not possible. Character access is safe even when
>>> shared due to copy-on-write semantics, which could be extended to the char
>>> level indexes. I'm more worried about the constant scanning of strings to
>>> build these indexes. Maybe in Delphi char level access is more common.
>>
>> (I think that all objects should be objects independently of whether their
>> types are built in or not. But that is another story.)
>
> I left that behind in Oberon times :-) Too often one sees it watered down
> later (by final objects, value objects and related marshalling) so that
> the whole principle confuses more than it adds.

I think it is too early to judge. IMO no language implements that
consistently. I mean down to Boolean objects of value semantics.

> I always deemed the resulting "Everything is an object, but some objects are
> more objects than other objects" more complicated than simply having a few
> base types that are not objects.

Right, "everything is object" has damaged the idea of a uniform type system
enormously. It is just inconsistent. Obviously there must be first and
second class things. The problem is that now this margin goes between
built-in and user-defined or sometimes between by-value and by-reference,
which absolutely artificial and has nothing to do with the semantics of
things. Event if taking into account the OO peoples' usual claim that
objects have identity, that does not imply that "identity" = memory address
(reference). There are too many old myths and poor implementations here.

> In the string case for instance, the inability to inherit (and thus nest)
> makes cheap automated memory management using refcounts possible, to optimize
> statements using strings etc.

But does this not hint that maybe there is something wrong with the
implementation of inheritance (and classes), if it prohibits vital
optimizations and memory management schemes?

>>> Making them local is not always a solution to performance problems, since it
>>> means making local copies first (or checking global state on every access),
>>> throwing performance out of one window while saving a bit on the other
>>> side.
>>
>> The problem is that the stored last search index makes the object mutable
>> where its semantics is immutable. When accessed from concurrent threads
>> this index I/O must be atomic or else the compiler must lock each access.
>
> Hmm, yes. Stupid me. One must now write to it for read support too. Do you
> see a good solution for that? Make that context a local variable for every
> string used in a loop? Complex in the compiler, but possible I guess.

Ah, it would be nice if the compiler would allocate a supplementary
local-context object.

BTW, it would be extremely useful to provide this feature for all types,
user-defined as well. Consider some immutable data structure which needs
to be navigated or hashed. It is always a huge problem to do this
concurrently. If the user could provide some "hash" object referencing the
original and tell the compiler to create it as necessary...
