My scripting language - any suggestions?

lican

Aug 25, 2008, 7:41:28 PM

Hi!

I've been writing my own scripting language for 6 months (with some
small breaks). I wrote the lexer, parser (similar to recursive
descent, but extended; LL grammar) and now I'm writing the
interpreter. The syntax is similar to C/C++ and the language is mostly
influenced by PHP and Lua. Sample code:

[code]
// a comment

class Child extends Parent
{
public x;
protected y;
private z;

static public function Run()
{
/*
other stuff
*/
super.Run();
}
}

t = [ 1, 2, 3 ]; // array, like array() in PHP
a =
[
x = t,
'y' = 15,
'z' = 'to jest co6',
];

foreach( a as k,v ): petla
{
if( is_array(v) )
foreach( v as v2 )
print(v2);
else
print(v);
}

for( i = 0; i < 5; ++i )
print(i);

k = new Child a;
[/code]

Yeah. It's just like PHP, but without the '$'. One significant
difference will be that you can do

[code]
t.DoSth();
[/code]

instead of

[code]
array_DoSth(t);
[/code]

etc. Upgraded PHP? Something like that... Anyway I'm having some
difficulties. One small decision is how variables should be declared:

[code]
var a = 5; // or
local a = 5; // like in lua or unreal script, or
global a = 5; // declaration of a global variable in given scope, not
              // like referencing global variables in PHP, or
a = 5;
[/code]

And the second problem. What kind of scope to implement. In PHP you
have either global or local (function) scope. So writing:

[code]
if( variable )
{
a = 5;
}

print(a);
[/code]

gives the result '5'. Something like this would mean a compiler error
in C++. But I'm willing to implement per-block scope (like in C/C++).
Does anyone see any pros/cons? The interpreter is a simple AST
walking class, but when some problems are fixed I will replace it with
a bytecode VM (like in Lua). And as for the VM itself... stack or
register based? :)

That's all of my questions (doubts) so far. Thanks for your help. But
if you're just gonna write 'use (f)lex/yacc/whatever' or 'why another
language, python/ruby/php/whatever is great' please don't :P I'm not
even going to read it. Everyone else is invited to this discussion.
Help me build my first scripting language! :)

Johannes

Aug 27, 2008, 6:38:53 AM

On Aug 26, 1:41 am, lican <lica...@gmail.com> wrote:
> Anyway I'm having some
> difficulties. One small decision, should variables should be declared:
>
> [code]
> var a = 5; // or
> local a = 5; // like in lua or unreal script, or
> global a = 5; // declaration of a global variable in given scope, not
> like referencing global variables in PHP, or
> a = 5;
> [/code]

IMO, dynamic languages need to require variable declarations. I still
loathe the waste of time PHP caused because I missed a typo in a
variable name. That's something the compiler should catch. Or at least
make unit testing as simple as it is in Ruby. I believe typos don't
bite Ruby people so much because they discover them easily via unit
tests. Also, Ruby complains if you access a name which it hasn't
seen before. I can't remember what PHP does in that situation.

Regarding how to declare the variables: it depends on your
needs. If you plan to have proper class support, you need to be able
to say which members are public, protected or private. There are two
ways: either you put the modifier in front of each member (like
Java/C#), or you put the modifier on a separate line and say that
everything afterwards has this modifier (like C++/Ruby). It depends
partly on the need for proper declarations and partly on
aesthetics, in other words, whether it looks ugly or not.

> And the second problem. What kind of scope to implement. In PHP you
> have either global or local (function) scope. So writing:
>
> [code]
> if( variable )
> {
> a = 5;
>
> }
>
> print(a);
> [/code]
>
> gives the result '5'. Something like this would mean a compiler error
> in C++. But I'm willing to implement it as per block (like in C/C++)
> scope. Anyone seeing any prons/cons?

Block scopes have the advantage of reducing variable lifetime. If
you only need the variable for a certain number of lines, then accessing
it later should give an error, as that goes against the declared intent.
On the other hand, separating declaration and assignment is a tad ugly
(even if you don't use "var a", you have to put the symbol into the
symbol table somehow). Also, you can shadow variables if declaration
before use is required. If you don't like that, you can prohibit
total shadowing (as opposed to shadowing member variables, which can
still be accessed via "this.a") like C# does (read its spec, as it is a
bit more involved than I hinted at).
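
The block-scope and shadowing behaviour described above can be sketched as a chain of symbol tables. This is only an illustration in Python, not the poster's interpreter; `BlockScope`, `declare` and `lookup` are made-up names.

```python
class BlockScope:
    """One symbol table per block, linked to the enclosing block."""

    def __init__(self, parent=None):
        self.parent = parent
        self.names = {}

    def declare(self, name, value):
        # A declaration always lands in the current block, so it may
        # shadow a variable of the same name in an outer block.
        self.names[name] = value

    def lookup(self, name):
        # Walk outwards until the name is found.
        scope = self
        while scope is not None:
            if name in scope.names:
                return scope.names[name]
            scope = scope.parent
        raise NameError(name)


function_scope = BlockScope()
function_scope.declare("x", 1)

if_block = BlockScope(parent=function_scope)
if_block.declare("a", 5)   # visible only inside the if-block
if_block.declare("x", 2)   # shadows the outer x

inner_a = if_block.lookup("a")
inner_x = if_block.lookup("x")
outer_x = function_scope.lookup("x")

try:
    function_scope.lookup("a")   # like print(a) after the block
    a_visible_outside = True
except NameError:
    a_visible_outside = False
```

With function scope instead, both blocks would simply share one table, and the lookup after the block would succeed.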

> The interpreter is a simple AST
> walking class, but when some problems are fixed I will replace it with
> a bytecode VM (like in Lua). And as for the VM itself... stack or
> register based? :)

IIRC, .NET uses a stack because it makes it easier to verify bytecode.
But I haven't looked into VM design myself, so I can't say any more on
this subject.

Johannes

Christoffer Lernö

Aug 27, 2008, 11:52:34 AM

On 26 Aug, 01:41, lican <lica...@gmail.com> wrote:
> I've been writing my own scripting language for 6 months (with some
> small breaks). I wrote the lexer, parser (similar to recursive
> descent, but extended; LL grammar) and now I'm writing the
> interpreter. The syntax is similar to C/C++ and the language is mostly
> influenced by PHP and Lua. Sample code:

This is a bit similar to what I am playing around with, let me give
you my own very subjective opinions.

> foreach( a as k,v ): petla
> {
> if( is_array(v) )
> foreach( v as v2 )
> print(v2);
> else
> print(v);
>
> }

If you want purer OO, v.is_array() would make more sense
than the functional is_array(v).

Also consider "Child.new()" instead of "new Child()", since the former
allows you to easily create class clusters
(http://developer.apple.com/documentation/Cocoa/Conceptual/CocoaFundamentals/CocoaObjects/chapter_3_section_9.html)

> etc. Upgraded PHP? Something like that... Anyway I'm having some
> difficulties. One small decision, should variables should be declared:
>

> var a = 5; // or
> local a = 5; // like in lua or unreal script, or
> global a = 5; // declaration of a global variable in given scope, not
> like referencing global variables in PHP, or
> a = 5

All of these have advantages and drawbacks.
Using something like "var" signals to the reader that the variable is
actually created at this point and it allows you to catch errors like:

var someVal = 5
if (someThing) someVar = 7 // Incorrect spelling of "someVal"
                           // immediately detected by compiler.

On the other hand "a = 5" is very convenient and reduces the amount of
text you both have to read and write. So it is a trade-off.
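
A minimal sketch of the check that explicit declarations make possible, written in Python rather than the language under discussion. `Scope`, `declare`, `assign` and `UndeclaredVariable` are hypothetical names, and a real compiler would do this at parse time rather than at run time.

```python
class UndeclaredVariable(Exception):
    """Raised when a script assigns to a name that was never declared."""


class Scope:
    def __init__(self):
        self.values = {}

    def declare(self, name, value):
        # Corresponds to `var name = value`.
        self.values[name] = value

    def assign(self, name, value):
        # Corresponds to plain `name = value`; requires a declaration.
        if name not in self.values:
            raise UndeclaredVariable(name)
        self.values[name] = value


def run_example():
    scope = Scope()
    scope.declare("someVal", 5)
    try:
        scope.assign("someVar", 7)   # misspelled on purpose
    except UndeclaredVariable as err:
        return f"undeclared: {err}"
    return "no error"
```

Without the `var` requirement, `assign` would have to create `someVar` silently, which is exactly the typo that goes unnoticed.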

> And the second problem. What kind of scope to implement. In PHP you
> have either global or local (function) scope. So writing:
>

> if( variable )
> {
> a = 5;
> }
> print(a);

This would be more clear-cut if variable declarations were
explicit. In that case, the scope would be expected to be the block
where the variable is declared.
Without declarations, function scope is more reasonable, for
this reason:

// If enforcing block scope, we need to make a pseudo-declaration here:
a = 616; // dummy value to move "a" outside of the if-blocks.
if (variable)
{
a = 5;
}
else
{
a = 6;
}

> walking class, but when some problems are fixed I will replace it with
> a bytecode VM (like in Lua). And as for the VM itself... stack or
> register based? :)

This paper argues that register based VMs are better:
http://www.usenix.org/events/vee05/full_papers/p153-yunhe.pdf
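
To make the difference concrete, here is a toy sketch in Python of how `a = b + c` might be encoded for each kind of VM. The opcodes and both tiny interpreters are invented for illustration and are not taken from the paper or from Lua.

```python
def run_stack(code, env):
    """Stack VM: operands are pushed and popped implicitly."""
    stack = []
    for op, arg in code:
        if op == "LOAD":
            stack.append(env[arg])
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "STORE":
            env[arg] = stack.pop()
    return env


def run_register(code, regs):
    """Register VM: every instruction names its operands explicitly."""
    for op, dst, src1, src2 in code:
        if op == "ADD":
            regs[dst] = regs[src1] + regs[src2]
    return regs


# a = b + c
stack_code = [("LOAD", "b"), ("LOAD", "c"), ("ADD", None), ("STORE", "a")]
register_code = [("ADD", 0, 1, 2)]   # r0 = r1 + r2

stack_env = run_stack(stack_code, {"b": 2, "c": 3})
regs = run_register(register_code, [0, 2, 3])
```

The register version dispatches fewer instructions, but each one is wider; this sketch only shows the instruction counts and says nothing about real performance.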

/Christoffer

lican

Aug 29, 2008, 11:41:41 AM

Thanks. As Johannes said, it's rather a matter of taste whether someone
wants to declare variables with or without a keyword. I'm also aware
that, depending on the method of declaration, the scope question becomes
rather straightforward. I think I'll go with the var keyword. As
for class fields, a declaration like "public someVar;" would be
sufficient, without "var public someVar;". This also kind of solves
the scope problem: I chose the per-block type. I also forgot to write
that I am in fact planning to do something like
"a.is_array();" (almost pure OO). The same goes for strings and any
other class:

[code]
s = "some text";
if( s.Length() < 5 )
s.Replace('s','t');

if( s.Is(string) )
// sth
[/code]

etc. I believe it would look (and work) better. I read somewhere
(don't remember where really) that there's no significant difference
when it comes to bytecode verification. It's generally done by a
separate (slower) bytecode reader/interpreter. Some time ago I read
that paper you sent, Christoffer (a similar paper can also be found on
the Lua page regarding their transition from a stack to a register VM).
They claim that the register one is faster, so I'll go with that. I
have some spare time now, so I'm willing to experiment.

The OO code is one of my priorities. I think that even the simple
types like int should have some class for let's say conversion (a = 5;
a.ToFloat()) and such. It really simplifies some things like
a.ToFloat().Floor().ToString() all done in one line ;) I know it's an
extreme example, but I think you get my point.
ToString(Floor((float)a)) doesn't look so good (or maybe it's also a
matter of taste). To be honest I never really heard of class clusters,
but surely I'll look into it.

Thanks for your help.

Mark

Johannes

Aug 30, 2008, 6:53:57 AM

On Aug 29, 5:41 pm, lican <lica...@gmail.com> wrote:
> Also forgot to write
> that I am in fact planning to do something like
> "a.is_array();" (almost pure OO). The same for strings and any other
> class:

If you create an array class (like in .NET), you can write
a.Is(Array), too. Would be more consistent and can be expanded to
cover similar cases (like b.Is(Complex<T>) or b.Is(Complex<float>)).
...

> The OO code is one of my priorities. I think that even the simple
> types like int should have some class for let's say conversion (a = 5;
> a.ToFloat()) and such. It really simplifies some things like
> a.ToFloat().Floor().ToString() all done in one line ;) I know it's an
> extreme example, but I think you get my point.

You could create a conversion operator "to" instead of using "ToClass"
functions. Then pipelining would go like:

a to Float.Floor() to String

Hmm... The dot disturbs the aesthetics here, and the only way to get
rid of it is a different syntax for method invocation. Then
you would have to remove the dot in general or live with two
equivalent ways to do calls. Or it could be that the syntax is just
unfamiliar. Anyway, the advantage of the "to" keyword would be that
you wouldn't write "((MyObject) object).Calculate()" like in C#/Java,
but could write "object to MyObject.Calculate()". You directly know
which expression is converted and don't have to add extra
parentheses just to get the priorities right. Also, if the expression
is a bit longer, you don't have to keep the entire thing in your head
or remember that the closing parenthesis still belongs to the
conversion. That being said, you haven't specified how you cast objects
in your language, so I'm only speculating.

Johannes

Dmitry A. Kazakov

Aug 31, 2008, 5:26:03 AM

On Fri, 29 Aug 2008 08:41:41 -0700 (PDT), lican wrote:

> The OO code is one of my priorities. I think that even the simple
> types like int should have some class for let's say conversion (a = 5;
> a.ToFloat()) and such. It really simplifies some things like
> a.ToFloat().Floor().ToString() all done in one line ;) I know it's an
> extreme example, but I think you get my point.
> ToString(Floor((float)a)) doesn't look so good (or maybe it's also a
> matter of taste). To be honest I never really heard of class clusters,
> but surely I'll look into it.

Postfix notation X.Y is merely sugar for Y(X); it is not necessarily
related to classes.

The problem with ToFloat etc. is that it is irregular: for every possible
pair of types you have to decide whether or not to define a conversion. How
are you going to do this, especially in the presence of user-defined types?

In a language with an elaborate type system, Integer and Float would have
a subtyping relation, making explicit conversions unnecessary. For instance,
if Integer were a subtype of Float, then it could inherit contravariant
Floor from Float:

Floor : Integer -> Float (contravariant in the result)

(a bad example, because Floor on integers is an identity function)

Conversion to string is also not that straightforward. Actually, from the OO
standpoint, it is rather an operation defined on the class Serializable, to
which interesting types like Integer belong. This operation should deal with
some object from the class Persistent, of which the String type is a member,
TCP_Stream is another, XML_File is yet another, etc. Beware that this
option would require double dispatch.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

Aleksey Demakov

Aug 31, 2008, 10:04:15 AM

On Sun, Aug 31, 2008 at 4:26 PM, Dmitry A. Kazakov
<mai...@dmitry-kazakov.de> wrote:
> In a language with an elaborated types system Integer and Float would have
> subtyping relation making explicit conversions unnecessary, for instance
> when Integer were a subtype of Float, then it could inherit contravariant
> Floor from Float:
>
> Floor : Integer -> Float (contravariant in the result)

If Integer is a subtype of Float then how would you deal with the
representation of floating point numbers?

If you use the hardware-supported 32-bit representation of floats then
there will be a problem with precision. Some Int values cannot be
precisely represented as floats.
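
A quick way to see this, sketched in Python: the standard `struct` module can round a value through the IEEE 754 32-bit format. The helper name `to_float32` is made up for this example.

```python
import struct


def to_float32(x):
    """Round a number to the nearest 32-bit float and return it."""
    return struct.unpack("f", struct.pack("f", float(x)))[0]


# A 32-bit float has a 24-bit significand, so every integer up to
# 2**24 survives the round trip, but 2**24 + 1 does not.
last_exact = 2**24
exact = to_float32(last_exact)
rounded = to_float32(last_exact + 1)
survives = (rounded == last_exact + 1)
```

Here `rounded` comes back as 16777216.0, i.e. the neighbouring integer, so `survives` is false.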

If you use your own representation of Floats then you will have
inefficient floating point ops.

Regards,
Aleksey

Dmitry A. Kazakov

Aug 31, 2008, 10:58:20 AM

On Sun, 31 Aug 2008 21:04:15 +0700, Aleksey Demakov wrote:

> On Sun, Aug 31, 2008 at 4:26 PM, Dmitry A. Kazakov
> <mai...@dmitry-kazakov.de> wrote:
>> In a language with an elaborated types system Integer and Float would have
>> subtyping relation making explicit conversions unnecessary, for instance
>> when Integer were a subtype of Float, then it could inherit contravariant
>> Floor from Float:
>>
>> Floor : Integer -> Float (contravariant in the result)
>
> If Integer is a subtype of Float then how would you deal with the
> representation of floating point numbers?

Subtypes are not required to share representations of their values.

> If you use hardware-supported 32-bit representation of floats then
> there will be a problem with precision. Some Int values cannot not be
> precisely represented as floats.

That is up to the inherited operations. Basically, if Integer inherits
anything from Float, it also inherits the property of a Float being an
interval of [real] numbers, with all the consequences of that. If
Integer can do an operation better, then it should override it. The
third alternative is adding ideal values to the class in the form of
NaN, or else propagating an exception.

> If you use your own representation of Floats then you will have
> inefficient floating point ops.

No, the operations defined on the common class may have distinct
implementations for different types (from the class).

Only inherited operations composed with an implicit conversion of the
representation will be slower. But that is exactly what OP wished to
do, using explicit conversions instead... My point was that explicit
conversions are usually bad. They suggest either some subtyping
relation (which has to be articulated), or else a manifestation of
some design problem.

lican

Aug 31, 2008, 1:05:00 PM

Yeah, I thought about it. You're right. ToFloat is not scalable. Maybe
something like: To(Float), To(Type)? It's something between
my .ToType() and the 'to' operator proposed by Johannes. Every solution
is better than the ugly ((SomeClass)var).SomeMethod() ;) The preference
is to use as few (key)word operators as possible. I'm also thinking
about changing new Class to Class.New() or Class.Create(). It would
create a rather consistent interface with methods like object.Clone()
and maybe object.Destroy(). Also, the general idea is that all objects
inherit some general methods from the base object called
'Object' (like Java and C#). The methods can be overridden depending
on the type:

- bool Is( Type )
- bool Instance( Type ) or Of( Type ) or InstanceOf( Type )
- Object To( Type )
- String Serialize();
- bool Unserialize();
- Object Clone();
- void Destroy();
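
One possible shape for a generic To(Type), sketched in Python: a table of conversions keyed by (source type, target type), which user-defined types could extend. The names `CONVERSIONS`, `register` and `to` are hypothetical, and truncation for float-to-int is an assumption, since the thread does not say how ToInt rounds.

```python
CONVERSIONS = {}


def register(source, target, func):
    """Declare how to convert a value of type `source` to `target`."""
    CONVERSIONS[(source, target)] = func


def to(value, target):
    """Generic replacement for ToFloat(), ToInt(), ToString(), ..."""
    if type(value) is target:
        return value
    try:
        convert = CONVERSIONS[(type(value), target)]
    except KeyError:
        raise TypeError(
            f"no conversion from {type(value).__name__} to {target.__name__}"
        ) from None
    return convert(value)


register(int, float, float)
register(float, int, int)     # assumption: truncates towards zero
register(int, str, str)
register(float, str, repr)

chained = to(to(5, float), str)   # like a.To(Float).To(String)
```

Only the pairs that were registered are convertible; anything else raises `TypeError` instead of guessing.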

As for the int and float representation... the Value class takes care
of that stuff. It's written in C++ and goes something like this:

[code]
class String; // defined elsewhere
class Object; // defined elsewhere

class Value
{
public:
    Value();
    Value( const Value& value );

    Value& operator =( const Value& value );

    void SetNull();
    void SetBool( bool b );
    void SetInt( int i );
    void SetFloat( float f );
    void SetString( String* s );
    // ...

public:
    int type; // NULL, BOOL, INT, FLOAT, STRING, ARRAY, REF, OBJECT, FUNC, etc.
    union     // inline storage for the simple types
    {
        bool b;
        int i;
        float f;
    };
    Object* o; // everything else
};
[/code]

It's rather simple, but it works. Most scripting VMs work that way.

As for the Serialize and To(String) methods, I find them distinct.
E.g. if someone wants to display a float to the user, they do
var.To(String) and get '1234.0987'. But if someone wants to write the
data to a file, Serialize would return 'f:1234.0987' or
'float:1234.0987'. The thing is, I think type:value can be parsed more
easily than just the value.
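
A rough sketch of the type:value idea in Python. The one-letter tags follow the 'f:1234.0987' example; the tags for int and string, and the function names, are assumptions made for the illustration.

```python
TAGS = {int: "i", float: "f", str: "s"}
PARSERS = {"i": int, "f": float, "s": str}


def serialize(value):
    """Return 'tag:payload', e.g. 'f:1234.0987'."""
    payload = value if isinstance(value, str) else repr(value)
    return f"{TAGS[type(value)]}:{payload}"


def unserialize(text):
    """Split at the first colon and let the tag pick the parser."""
    tag, _, payload = text.partition(":")
    return PARSERS[tag](payload)


written = serialize(1234.0987)
read_back = unserialize(written)
```

Because the tag names the type, the reader never has to guess whether '15' was meant as an int, a float or a string.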

Dmitry A. Kazakov

Sep 1, 2008, 5:52:48 AM

On Sun, 31 Aug 2008 10:05:00 -0700 (PDT), lican wrote:

> Yeah, I thought about it. You're right. ToFloat is not scalable. Maybe
> something like: To(Float), To(Type)? It's something between
> my .ToType() and the 'to' operator proposed Johannes. Every solution
> is better than the ugly (SomeClass)var).SomeMethod() ;)

Well, C/C++ has problems it created all by itself. If you want to have the
type name involved, you do not need to twist the language. Make it
straight:

Method (Float (var))

or/and an equivalent postfix sugar

var.Float.Method

You could put "to" around (Float) (or in postfix sugar: Float.to), all this
semantically changes absolutely nothing. The problem of conversions is of
semantic nature.

> The preference is to use as few (key)word operators as possible.

The conversion operation is an operation as any other. It does not require
any special syntax and keywords. BUT this normality implies double
dispatch, unless conversions have to be explicitly defined by the user. The
latter is a lot easier.

> I'm also thinking
> about changing new Class to Class.New() or Class.Create();

The type of the object should be specified in its declaration. Otherwise,
having to specify the type again indicates immaturity of the language (when
statically typed). It must be clear from the context what the type
is.

> It would
> create a rather consistent interface with methods like object.Clone()
> and maybe object.Destroy(). Also the general idea is that all objects
> inherit some general methods from the base object called
> 'Object' (like Java and C#). The methods can be overridden depending
> on the type:
>
> - bool Is( Type )
> - bool Instance( Type ) or Of( Type ) or InstanceOf( Type )

When type is a first-class object then you can get it from an object and
then define necessary membership tests on the type's type. Note that in
this case the model of common base shall somewhere break.

Anyway, for type comparisons equality is not enough (Is is an equality).
Types form a tree or maybe a more general graph. You need operations
between types and sets of types (classes). For example, in order to test
whether an object X has a type A such that A is a descendant of B.
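
The two kinds of test can be shown with Python's built-ins; the class names below reuse `Parent` and `Child` from the original sample, and the two helper functions are hypothetical stand-ins for Is and InstanceOf.

```python
class Object:
    pass


class Parent(Object):
    pass


class Child(Parent):
    pass


def is_exactly(obj, cls):
    """Equality test: the object's type is precisely `cls`."""
    return type(obj) is cls


def is_descendant_of(obj, cls):
    """Membership test: the object's type is `cls` or derives from it."""
    return isinstance(obj, cls)


k = Child()
exact_match = is_exactly(k, Parent)
in_class = is_descendant_of(k, Parent)
type_relation = issubclass(Child, Object)   # a test between two types
```

The first test fails for a `Child` instance while the second succeeds, and `issubclass` answers the question without needing an object at all.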

> - Object To( Type )

This is equivalent to double dispatch. It is hard to bite...

> - String Serialize();
> - bool Unserialize();

The reverse to Serialize is an abstract factory.

> - Object Clone();

Not this. An object can be non-copyable, a clock, a hardware port for
instance.

> - void Destroy();

This is a difficult issue. A destructor (like a constructor) is not a
method. It must be prevented from being called explicitly.

> As for the int and float representation... the Value class takes care
> of that stuff. It's written in C++ and goes something like this:
>

[...]

>
> It's rather simple, but it works. Most scripting VM work that way.

This is representation sharing (which in your case is a union). This is
IMO a bad idea, because it is inefficient (distributed overhead). Further,
it makes it impossible to control the representation when that is necessary
(low-level I/O, hardware support, communication protocol
implementation etc).

> As for the Serialize and To(String) methods, I find them distinct.
> I.e. someone wants to display a float to the user, they do
> var.To(Float) and get '1234.0987'. But if someone wants to write the
> data to a file Serialize would return 'f:1234.0987' or 'float:
> 1234.0987'. The thing is I think the type:value can be parsed more
> easily than just value.

This is why it needs to be doubly dispatched. The dispatch goes along two
axes: the hierarchy of the source types and the hierarchy of the types of
the medium. If the target is Human_Readable_Left_To_Right_String, then
Serialize spits out 1234.0987 or maybe "about a thousand" (:-)). When the
target is GTK_Cell_Renderer, then it does something quite different.
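
A table-driven approximation of that two-axis dispatch, sketched in Python. `HumanReadable` and `FileRecord` are invented stand-ins for the medium types, and a flat lookup table ignores inheritance, which a real doubly dispatching operation would have to respect.

```python
class HumanReadable:
    """Stand-in for a medium meant to be read by a person."""


class FileRecord:
    """Stand-in for a medium that must be parsed back later."""


# One writer per (source type, medium type) pair.
WRITERS = {
    (float, HumanReadable): lambda v: repr(v),
    (float, FileRecord): lambda v: f"float:{v!r}",
    (int, HumanReadable): lambda v: str(v),
    (int, FileRecord): lambda v: f"int:{v}",
}


def serialize(value, medium):
    """Pick the implementation from both argument types."""
    writer = WRITERS[(type(value), type(medium))]
    return writer(value)


shown = serialize(1234.0987, HumanReadable())
stored = serialize(1234.0987, FileRecord())
```

The same value produces different output depending on the medium, which is the behaviour that a single-dispatch method on the value alone cannot express.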

BTW, putting the type into the output is another issue. I don't go into it,
because this is already too close to off-topic.

Aleksey Demakov

Sep 1, 2008, 1:54:12 PM

On Sun, Aug 31, 2008 at 9:58 PM, Dmitry A. Kazakov
<mai...@dmitry-kazakov.de> wrote:
>> If you use hardware-supported 32-bit representation of floats then
>> there will be a problem with precision. Some Int values cannot not be
>> precisely represented as floats.
>
> That is up to inherited operations. Basically, if Integer inherits
> anything from Float it also does the property of Float being an
> interval of [real] numbers, with the consequences of. If Integer can
> do this operation better, then it should override. The third
> alternative is adding ideal values to the class in the form of NaN or
> else an exception propagation.
>

Sorry, I still don't get how the inheritance thing alone could
automagically resolve all the subtle numeric issues.

Suppose we have a method that simply returns the sum of two args:

m(a, b) { return a + b; }

I could understand what happens if both args have the same type.
For instance if ints and floats both use 32-bit representation then

m(1, 2000000000)

will quite obviously result in 2000000001

while

m(1.0, 2000000000.0)

will result in 2000000000.0 - 32-bit floats do not have enough
precision to keep the '1' in the end.

Now what will happen if one argument is int and another
is float ?

m(1, 2000000000.0)

will it be 2000000001, or 2000000000.0, or a runtime error?

In any case the user might prefer one way or another. In
the last example if the user wants to preserve the precision
then this could be done by converting float to int. On the
other hand for m(1, 1.0e15) conversion to int will not work
so it should not be done.
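
The two same-type cases above can be checked with a small Python sketch that emulates 32-bit float arithmetic through the `struct` module. `f32`, `m_int` and `m_float32` are names made up for the example.

```python
import struct


def f32(x):
    """Round a number to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", float(x)))[0]


def m_int(a, b):
    # Both arguments are ints: plain integer addition.
    return a + b


def m_float32(a, b):
    # Both arguments are treated as 32-bit floats, and so is the result.
    return f32(f32(a) + f32(b))


int_sum = m_int(1, 2000000000)
float_sum = m_float32(1.0, 2000000000.0)

# The mixed call m(1, 2000000000.0) has to pick one of the two rules;
# neither choice is dictated by the arithmetic itself.
mixed_as_int = m_int(1, int(2000000000.0))
mixed_as_float = m_float32(1, 2000000000.0)
```

Near 2000000000 adjacent 32-bit floats are 128 apart, which is why the added 1 disappears in the float version.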

So do you reserve the possibility for a user to do the explicit
conversion? Or never is never and explicit conversion is always
a design problem?

Regards,
Aleksey

Dmitry A. Kazakov

Sep 2, 2008, 1:22:47 PM

On Tue, 2 Sep 2008 00:54:12 +0700, Aleksey Demakov wrote:

> On Sun, Aug 31, 2008 at 9:58 PM, Dmitry A. Kazakov
> <mai...@dmitry-kazakov.de> wrote:

>>> If you use hardware-supported 32-bit representation of floats then
>>> there will be a problem with precision. Some Int values cannot not be
>>> precisely represented as floats.
>>
>> That is up to inherited operations. Basically, if Integer inherits
>> anything from Float it also does the property of Float being an
>> interval of [real] numbers, with the consequences of. If Integer can
>> do this operation better, then it should override. The third
>> alternative is adding ideal values to the class in the form of NaN or
>> else an exception propagation.
>
> Sorry, I still don't get how the inheritance thing alone could
> automagically resolve all the subtle numeric issues.

It cannot. But the question was about shared vs separate representations,
which is not a numerical issue.

> Suppose we have a method that simply returns the sum of two args:
>
> m(a, b) { return a + b; }

First of all, this is not a specification of the method. The types of the
arguments and of the result are unspecified, as is their covariance.
So one cannot tell which combinations of argument and result types are
involved.

> I could understand what happens If both args have the same type.
> For instance if ints and floats both use 32-bit representation then
>
> m(1, 2000000000)
>
> will quite obviously result in 2000000001
>
> while
>
> m(1.0, 2000000000,0)
>
> will result in 2000000000.0 - 32-bit floats do not have enough
> precision to keep the '1' in the end.
>
> Now what will happen if one argument is int and another
> is float ?
>
> m(1, 2000000000.0)

That depends on how the question above is answered and of course on the
semantics of m. Note that the language is not to specify the semantics of
m.

As for multimethods, yes there are six combinations of 3 (2 arguments + 1
result) x 2 types. The semantics of m shall unambiguously define all six.
But again it is not the language business, except for the predefined
operations of course. The language shall merely allow an implementation of
the desired semantics for all combinations in question.

> will it be 2000000001, or 2000000000,0, or a runtime error?
>
> In any case the user might prefer one way or another. In
> the last example if the user wants to preserve the precision
> then this could be done by converting float to int. On the
> other hand for m(1, 1.0e15) conversion to int will not work
> so it should not be done.

The result is involved, provided that m is covariant, or else when Float is
the ancestor of Integer and the result is contravariant. As for the
semantics (numerical value of the result), see above.

> So do you reserve the possibility for a user to do the explicit
> conversion?

Certainly yes. An explicit conversion is merely a subprogram. But I would
also allow user-defined ad-hoc subtypes, so that one could tie two
originally independent hierarchies.

> Or never is never and explicit conversion is always
> a design problem?

Close to that. When I analyse the cases where I used conversions, most of
them looked suspicious or else were introduced by language limitations.

lican

Sep 4, 2008, 7:59:35 AM

I think we're all getting a bit ahead of ourselves. The thing is, I
specify what type gets cast and when. The default result of 100.0 + 1
would be 101.0, so as you see, for a float and int operation the default
type is float. If a programmer wants the operation done in a different
way, they can do:

a = 100.0;
b = 1;
s = "string text";
c = a.ToInt() + b; // or c = (int)a + b; or c = a.To(int) + b;
d = s + a; // d == "string text100.0"

If you don't like the float I can make it double or even long double.
That really doesn't matter right now. I'll get to that while
implementing operators in the VM.
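
The default rule could look roughly like this inside the VM's `+` handler. This is a Python sketch under the assumption that string operands win over numbers and float wins over int, as the examples above suggest; `add` and `to_text` are hypothetical names.

```python
def to_text(value):
    """Text form used when a value is joined to a string."""
    return value if isinstance(value, str) else repr(value)


def add(left, right):
    """Predefined behaviour of `+` for the built-in types."""
    if isinstance(left, str) or isinstance(right, str):
        return to_text(left) + to_text(right)   # concatenation
    if isinstance(left, float) or isinstance(right, float):
        return float(left) + float(right)       # float wins over int
    return left + right                         # int + int stays int


a = 100.0
b = 1
s = "string text"

default_sum = add(a, b)        # float result
forced_int = add(int(a), b)    # like a.ToInt() + b
d = add(s, a)
```

Keeping the whole rule in one function makes it easy to document exactly which operand type decides the result.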

And please do remember that this is meant to be a scripting language
with dynamic typing, running on a VM. There will be no low-level magic
or machine code generation. Just bytecode stuff. If you have ever read
the PHP or Lua manual, you should know what kind of predefined types
exist there, what their limitations are and how operators react given
certain types. Without operator overloading, ClassA + ClassB will
always print an error 'cannot do something with this...'.

The thing is, I just wanted to ask about general preferences when it
comes to the syntax. Also, I'm interested in whether anyone can see any
penalties when it comes to per-block vs global/local scopes. Maybe
performance or any other problems. These are my problems for now;
leave the conversion problem alone ;) As I said earlier, it WILL be
predefined so that the user knows what to expect and when.

Johannes

Sep 6, 2008, 6:43:11 AM

On Sep 4, 1:59 pm, lican <lica...@gmail.com> wrote:
> a = 100.0
> s = "string text";

> d = s + a; // d == "string text100.0"

Personally, I'd prefer a different operator for string concatenations
(like D's "~"). That way you can prevent the following code from
misbehaving:

print("The round is: " + round + 1);

Instead of calculating round + 1 first, round is translated into a
string and 1 is translated into a string, which obviously gives wrong
results. Of course one can put parentheses around round + 1, but this
kind of bug is pretty much a newbie-trap and this can be avoided with:

print("The round is: " ~ round + 1);

Just make the priority of ~ low enough that it is called last before =
and everything should sort itself out.
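
The trap can be reproduced in a few lines of Python by writing out the two parse trees as explicit calls. `plus` models an overloaded "+" that concatenates when either side is a string, and `concat` models a separate "~" operator; both are hypothetical helpers.

```python
def to_text(value):
    return value if isinstance(value, str) else str(value)


def plus(left, right):
    """Overloaded '+': concatenates as soon as one side is a string."""
    if isinstance(left, str) or isinstance(right, str):
        return to_text(left) + to_text(right)
    return left + right


def concat(left, right):
    """Dedicated '~': always concatenates."""
    return to_text(left) + to_text(right)


round_number = 3

# "The round is: " + round + 1 parses left to right:
overloaded = plus(plus("The round is: ", round_number), 1)

# "The round is: " ~ round + 1, with '~' binding more loosely than '+':
separate = concat("The round is: ", plus(round_number, 1))
```

With the overloaded operator the message ends in "31"; with the separate operator the addition happens first and the message ends in "4".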

Aleksey Demakov

Sep 6, 2008, 1:02:32 PM

On Wed, Sep 3, 2008 at 12:22 AM, Dmitry A. Kazakov
<mai...@dmitry-kazakov.de> wrote:
> On Tue, 2 Sep 2008 00:54:12 +0700, Aleksey Demakov wrote:
>> Suppose we have a method that simply returns the sum of two args:
>>
>> m(a, b) { return a + b; }
>
> First of all, this is not a specification of the method. The types of the
> arguments and of the result are unspecified, as well as the covariance of.
> So one cannot tell which combinations of arguments and result types are
> involved.
>

We were talking in the context of a scripting language. As you may have
noticed, the variables in this language are declared as "var a = 5;". No
type tag whatsoever. I don't see why parameters in such a language
would have to be declared with a type tag. But if you wish, I
could write the method definition like this:

Float m(Float a, Float b) { return a + b; }

> As for multimethods, yes there are six combinations of 3 (2 arguments + 1
> result) x 2 types. The semantics of m shall unambiguously define all six.
> But again it is not the language business, except for the predefined
> operations of course. The language shall merely allow an implementation of
> the desired semantics for all combinations in question.
>

What do multimethods have to do with this? You say that a language
"with an elaborated types system" should make integer a subtype
of float. My understanding of subtyping is that a subtype might go
anywhere the supertype could go. I conclude that there is no need for
any 6 combinations. The method defined for float arguments should
somehow handle int arguments too.

> The result is involved, provided that m is covariant, or else when Float is
> the ancestor of Integer and the result is contravariant. As for the
> semantics (numerical value of the result), see above.
>

I don't get what "covariant" or "contravariant" mean. I asked a simple
question. Please tell me what a language "with an elaborated types
system" will do for m(1, 2000000000.0).

Regards,
Aleksey

Robert A Duff

Sep 7, 2008, 12:31:49 PM

lican <lic...@gmail.com> writes:

> If you don't like the float I can make it double or even long double.
> That really doesn't matter right now. I'll get to that while
> implementing operators in the VM.
>
> And please do remember that this is meant to be a scripting language
> with dynamic typing, running on a VM.

Does "scripting language" mean "efficiency does not matter much"?
If so, then why bother with floating point? Why not use exact
rational arithmetic instead?
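
For a feel of what exact rational arithmetic buys, here is a short Python comparison using the standard `fractions` module; it is only an illustration and says nothing about how the proposed language would implement it.

```python
from fractions import Fraction

# Binary floating point cannot represent 0.1, 0.2 or 0.3 exactly.
float_equal = (0.1 + 0.2 == 0.3)

# Rationals keep numerator and denominator, so the sum is exact.
exact_equal = (Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))

# Division followed by multiplication loses nothing either.
third = Fraction(1, 3)
back_to_one = third * 3
```

The price is speed and memory, since numerators and denominators can grow without bound during a long computation.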

- Bob

Dmitry A. Kazakov

Sep 8, 2008, 3:44:31 AM

On Sun, 7 Sep 2008 00:02:32 +0700, Aleksey Demakov wrote:

> On Wed, Sep 3, 2008 at 12:22 AM, Dmitry A. Kazakov
> <mai...@dmitry-kazakov.de> wrote:
>> On Tue, 2 Sep 2008 00:54:12 +0700, Aleksey Demakov wrote:
>>> Suppose we have a method that simply returns the sum of two args:
>>>
>>> m(a, b) { return a + b; }
>>
>> First of all, this is not a specification of the method. The types of the
>> arguments and of the result are unspecified, as well as the covariance of.
>> So one cannot tell which combinations of arguments and result types are
>> involved.
>
> We were talking in the context of a scripting language. If you notice
> the variables in this language are declared as "var a = 5;". No type
> tag whatsoever.

The language is typed, thus the variable a must have a type. Whether that
type is manifested in the declaration or inferred from the type of the
initializing expression does not matter.

> I don't see a reason why for such a language
> parameters are to be declared with the type tag. But if you wish I
> could write the method definition like this:
>
> Float m(Float a, Float b) { return a + b; }

You should also define covariance. When Integer is derived from Float, which
parameters (arguments and the result) become Integer and which do not? In
other words, in which parameters is m a method of Float? Method = covariant.

>> As for multimethods, yes there are six combinations of 3 (2 arguments + 1
>> result) x 2 types. The semantics of m shall unambiguously define all six.
>> But again it is not the language business, except for the predefined
>> operations of course. The language shall merely allow an implementation of
>> the desired semantics for all combinations in question.
>
> What do multimethods have to do with this? You say that a language
> "with an elaborated types system" should make integer a subtype
> of float. My understanding of subtyping is that a subtype might go
> anywhere the supertype could go.

Right, but this does not mean that you can pass Integer where Float is
expected. That is a type error. Substitutability is achieved:

1. by introducing new instances of polymorphic m. For example, when m is a
method in the argument a, that means that m is defined on the class Float
and for each type from the class there exists an instance of m with a of
this type:

... m (Float a, ...);
... m (Integer a, ...);

When Integer inherits m without overriding, then

... m (Integer a, ...);

is defined as a composition of Float m and a conversion from Integer to
Float.

2. by using operations defined on the whole class.
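
Case 1 can be sketched in Python with `functools.singledispatch`, which dispatches on the type of the first argument only. This is an illustration under assumptions, not a model of any particular language: Python's `int` is not derived from `float`, so the explicit registration for `int` stands in for "Integer inherits m without overriding", and the helper names `_m_float` and `_m_integer` are made up for the example.

```python
from functools import singledispatch


@singledispatch
def m(a, b):
    """m is a method in the argument a: one instance per type in the class."""
    raise TypeError(f"m is not defined for {type(a).__name__}")


@m.register(float)
def _m_float(a, b):
    # The instance of m defined for Float.
    return a + b


@m.register(int)
def _m_integer(a, b):
    # The inherited instance: a conversion from Integer to Float
    # composed with the Float instance of m.
    return _m_float(float(a), b)


print(m(1, 2.5))    # 3.5
print(m(2.0, 0.5))  # 2.5
```

Passing an Integer never reaches the Float body directly; it always goes through the Integer instance, which is what keeps the call typed.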

> I conclude that there is no need for
> any 6 combinations. The method defined for float arguments should
> somehow handle int arguments too.

It cannot; that would be untyped. It must be either case 1 or case 2. You
have to choose which case m represents in each parameter.

>> The result is involved, provided that m is covariant, or else when Float is
>> the ancestor of Integer and the result is contravariant. As for the
>> semantics (numerical value of the result), see above.
>
> I don't get what "covariant" or "contravariant" mean.

See above. Covariant = method, which is inherited and can be overridden.

> I asked a simple
> question. Please tell me what a language "with an elaborated types
> system" will do for m(1, 2000000000.0).

It will allow the user to define the semantics of the above in accordance
with the meaning of m. I don't know what m is supposed to do. Does it model
addition? In R? N? C?

Again, the point is that if m is a method in all arguments and the result
then overriding it gives you an opportunity to implement any semantics you
want:

Integer m (Integer a, Float b) { ... } // When integer expected
Float m (Integer a, Float b) { ... } // When float expected
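
A rough Python sketch of that idea follows. Python cannot dispatch on the expected result type, so the expected type is passed explicitly as the first argument; the registry `IMPLEMENTATIONS` and the decorator `define_m` are hypothetical names invented for this illustration, and the two bodies are just one possible choice of semantics.

```python
# Maps (result type, type of a, type of b) to an implementation of m.
IMPLEMENTATIONS = {}


def define_m(result_type, a_type, b_type):
    """Register an override of m for one combination of types."""
    def decorator(func):
        IMPLEMENTATIONS[(result_type, a_type, b_type)] = func
        return func
    return decorator


@define_m(int, int, float)
def m_integer_result(a, b):
    # When an integer is expected: convert b and add as integers.
    return a + int(b)


@define_m(float, int, float)
def m_float_result(a, b):
    # When a float is expected: convert a and add as floats.
    return float(a) + b


def m(expected, a, b):
    """Select the override matching the expected result and argument types."""
    key = (expected, type(a), type(b))
    if key not in IMPLEMENTATIONS:
        raise TypeError(f"m is not defined for {key}")
    return IMPLEMENTATIONS[key](a, b)


print(m(int, 1, 2000000000.0))    # 2000000001
print(m(float, 1, 2000000000.0))  # 2000000001.0
```

Combinations that nobody defined are rejected as type errors rather than silently converted.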

Felipe Angriman

Sep 8, 2008, 10:54:10 AM

On Sat, Sep 6, 2008 at 2:02 PM, Aleksey Demakov <adem...@gmail.com> wrote:
>
> I don't get what "covariant" or "contravariant" mean. I asked a simple
> question. Please tell me what a language "with an elaborated types
> system" will do for m(1, 2000000000.0).
>

Type theory discussions always get a bit dense. Explanations of what
covariant and contravariant mean are better treated in books.
If you would like to get deeper into the subject you should check this book:

Benjamin C. Pierce
Types and Programming Languages
The MIT Press

This book requires some mathematical background in order to be read.
IMO it should give you a great introduction to type theory.

With respect to

> Now what will happen if one argument is int and another
> is float ?
>
> m(1, 2000000000.0)
>

> will it be 2000000001, or 2000000000.0, or a runtime error?

If types are evaluated at runtime, I would expect it to return
the most restrictive type capable of holding the result,
in this case an integer (2000000001).
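
Whether the float alternative can even hold that value depends on its width. The sketch below uses Python's `struct` module to round a double to single precision; `to_float32` is a helper written for this example, and it assumes the "float" discussed in this thread is a 32-bit IEEE 754 single.

```python
import struct


def to_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


as_integer = 1 + 2000000000        # exact integer arithmetic
as_double = 1.0 + 2000000000.0     # a double holds the sum exactly
as_single = to_float32(as_double)  # a single has only 24 significant bits

print(as_integer)  # 2000000001
print(as_double)   # 2000000001.0
print(as_single)   # 2000000000.0
```

Near two billion, adjacent single-precision values are 128 apart, so the added 1 is lost; that is where the 2000000000.0 answer comes from.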

Otherwise I would expect it to return a float. A runtime error would
not be prudent if you are using IEEE 754 floating point arithmetic.
If an overflow occurred you would get +Inf as the result (I think), and
of course you could check this value to see whether an overflow
has indeed taken place.
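
As a small sketch of that check in Python, whose float addition follows IEEE 754 doubles here: `checked_add` is a hypothetical helper showing how a VM could turn the silent +Inf into an error if it wanted to.

```python
import math
import sys


def checked_add(a, b):
    """Add two floats and raise if finite operands overflowed to infinity."""
    result = a + b
    if math.isinf(result) and not (math.isinf(a) or math.isinf(b)):
        raise OverflowError("floating point addition overflowed")
    return result


largest = sys.float_info.max

# Plain addition does not raise; it quietly produces +Inf.
overflowed = largest + largest
print(math.isinf(overflowed))  # True

# The checked version reports the overflow instead.
try:
    checked_add(largest, largest)
except OverflowError as exc:
    print("overflow detected:", exc)
```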

I hope I was useful.

Regards,
Felipe

Sammy

Sep 10, 2008, 11:34:36 AM

: I don't get what "covariant" or "contravariant" mean.

There is some info on Wikipedia:

http://en.wikipedia.org/wiki/Covariance_and_contravariance_(computer_science)