Don't expect too much as the site is new. I'll move some of the
external stuff into it over time and will get round to improving the
wording of some of the principles. But it does show the basics. Thanks
to Robbert for providing the domain name.
For reference, discussions that led to the site are in the threads
linked below.
http://groups.google.com/group/comp.lang.misc/browse_frm/thread/93501b369a148fd6
http://groups.google.com/group/comp.lang.misc/browse_frm/thread/4500919f2cd56c96
http://groups.google.com/group/comp.lang.misc/browse_frm/thread/1b0a27386dd6aa67
James
I have only had a brief glance at it.
IMHO the most important hint, for somebody who wants to design his
own language, is missing:
DON'T DO IT
People should explore their motives for designing a new language
before they start (to learn about language design or compiler /
interpreter writing, to make the world a better place, to become
famous). They should be aware of the alternatives (preprocessor,
macros or library for an existing language, extension for an existing
compiler/interpreter, joining another new-language project). They
should be aware of the goal of their language project (will be
thrown away after the exam, only for private use, to be released on
the internet, to reach world domination). They should consider
whether they will be able to implement the design (projects with
great ideas, which "just" need somebody to implement them, usually
fail). And they should ask themselves: why should anybody switch to
this language? What makes it unique?
Your page about source portability concentrates on hardware
portability issues. There are other portability issues as well, which
IMHO have become more important than hardware portability:
portability between operating systems and libraries.
Many libraries are not available everywhere.
Some people distinguish between the language and its runtime library
and consider a language portable when its core part is portable. But a
programmer is only interested in whether his program can be moved
to another computer without effort.
A program which uses Win32 calls will not run on Linux and vice
versa (I know about Wine, but that would require the language to
integrate with Wine somehow).
Even when no OS-specific functions are called, some OS-specific
things, like path delimiters and drive letters, can still lead to
unportable programs. Even system calls which seem portable can still
cause problems (e.g. under Linux you can open a directory as a file,
while under Windows this fails).
The pldev page about Error location led me to a problem:
What should happen when you read numbers from a file and there is
no number? Most people will probably agree that throwing an
exception is the right thing to do. But when the file is connected
to a keyboard, the program should probably not terminate when
the user types a letter instead of a digit. Requiring every
read from the keyboard to be followed by an exception handler seems
a little bit heavy.
For that reason I decided to put an exception handler in the read
function. This exception handler sets a flag when the conversion
(to an integer or some other type) fails. That way an interactive
read of a number either reads the number or leaves the number
unchanged and sets the flag. This seems elegant for interactive I/O,
but could lead to hard-to-find errors when reading data from files.
I see two possible solutions:
- Use some file mode (conversion exceptions caught or not),
which needs to be set up.
- Use different read functions with and without catching of
conversion exceptions.
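A minimal Python sketch of the two read styles (the names
read_int_strict and read_int_flagged are mine, not Seed7's): the strict
variant raises on bad input, as one would want for files; the flagged
variant leaves the value unchanged and reports failure through a flag,
as described above for interactive reads.

```python
def read_int_strict(text):
    # File-style read: malformed input raises, so errors cannot slip by.
    return int(text)

def read_int_flagged(text, current):
    # Interactive-style read: on failure, leave the value unchanged
    # and report the problem through a flag instead of an exception.
    try:
        return int(text), True
    except ValueError:
        return current, False

value, ok = read_int_flagged("abc", 42)
print(value, ok)   # 42 False  (value unchanged, flag set)
value, ok = read_int_flagged("7", 42)
print(value, ok)   # 7 True
```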
I would be pleased to get some feedback about this.
Greetings Thomas Mertes
--
Seed7 Homepage: http://seed7.sourceforge.net
Seed7 - The extensible programming language: User defined statements
and operators, abstract data types, templates without special
syntax, OO with interfaces and multiple dispatch, statically typed,
interpreted or compiled, portable, runs under linux/unix/windows.
>> Because it's yours! That won't be an advantage to anyone else of
>> course...
>
> The implementer of a language will probably use his
> language. :-) But he (she) might not switch to it.
(I would; it's just so much more 'comfortable' than anything else. Although
that also applies to other kinds of software..)
> The question (Why should anybody ...) is tailored for
> people who want to release their language and who want
> that their language is actually used.
I created a scripting language to go with my applications. The only way to
extend the application was to use that language. So I did have 'users', even
if they didn't exactly choose it of their own free will... I wasn't bothered,
however, about whether the language (which was powerful enough to do general
stuff) was used much elsewhere; it's other people's loss if I could be more
productive than them!
>> So this leads to someone creating their own abstract library (for
>> graphics,
>> files, whatever) which is not tied to an OS and which can be made to run
>> on
>> a range of systems, ...
>
> This is exactly the strategy I use for Seed7.
OK, so why can't someone create a new language for exactly the same reasons?
Or are you saying you wouldn't have created Seed7 in hindsight...?
>> ... rather than relying on someone's huge library which only
>> runs on machine X, or someone else's even huger system which is
>> cross-platform, but has to be programmed in C++...
>
> Functions from the C++ library could be called from the
> newly designed language. To use this strategy some conditions
> must be fulfilled:
>
> 1. The C++ library must be available on the supported
> platforms (ideally under a free license). And it
> should be released together with the language
> implementation.
...
C++ is pretty much impossible if you don't use, understand, or like C++,
especially trying to interface from a different language.
I've wasted plenty of time trying to link into C++ libraries via C-style
interfaces (GDI+ for example), now I don't even bother. If it's in C++, then
forget it.
--
Bartc
> The first piece of code I ever saw in action was the following Basic
> program
>
> 10 for i=1 to 10
> 20 print i,sqr(i)
> 30 next i
>
> Now let's take a piece of equivalent C code.
> int i;
> for (i=1; i<=10; ++i)
> printf("%d %f\n",i, sqrt(i));
>
> The syntax I used was along the lines of:
>
> for i:=1 to 10 do
> println i,sqrt(i)
> end
And you think that is now good syntax? It's cryptic, verbose, not easily
comprehensible, inelegant.
(The C header thing is a separate issue not to confuse it with the syntax
of a language proper).
The Basic I used first had left$, mid$ and right$ which, IMHO, are
*awful*. What did you have in mind?
> > Basic loops are great
> > for the simple 1 to 10 or 10 to 0 in steps of 2 etc, but are not
> > general. If you want to express a more complex loop in Basic you end
> > up making the code MORE complex than it can be in, say, C.
> > Is it a case of: you can have it simple or complete but not both?
>
> I keep my iterative loops simple: they only ever iterate over an integer
> range, and always in steps of 1, either up or down (steps other than one are
> so rare, I've eliminated that option).
This leads to some familiar questions:
1. What happens to the loop control variable outside the loop? If it
is in scope after the loop finishes what value does it have?
2. What happens if the programmer changes or tries to change the
control variable value in the body of the loop?
3. What happens if the programmer changes or tries to change the far
bound?
4. What should programmers generally do if they want step sizes other
than one and does the compiler cope well if they make inefficient
choices? In other words, when a more complex loop control is needed do
you encourage a programmer to write (for i ... ; real_index = equation
involving i; ...) or to write (real_index := start; for i ...;
real_index := real_index + step; ...), i.e. to maintain real_index
manually? Or, at what point should they switch to a while loop?
All of these are complications which can occur when a Basic-style for
loop is used. It looks friendly in the, er, basic case. And it may
look simple. But it isn't. It has complexities that are hidden. This
is a case of a non-manifest interface. Certain choices are made by the
language designer but they are not at all apparent in the syntax.
C has its own plethora of complex hidden rules in many areas but it
deals fairly well with *all* of the above questions in its loop
construct.
One additional benefit of C's for loop: it makes fine adjustments
simple by allowing such tests as less than (<) as well as less than or
equal to (<=). Thus it avoids some of the "minus one" bounds
adjustments such as "for i = near to far - 1" that a Basic-type for
loop sometimes/often requires.
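Python's half-open range behaves like C's exclusive (<) test, so the
difference can be sketched briefly (Python used here purely for
illustration):

```python
items = ["a", "b", "c", "d"]
n = len(items)

# Exclusive upper bound, like C's  for (i = 0; i < n; i++):
visited = [items[i] for i in range(n)]   # covers 0 .. n-1, no "- 1" needed

# An inclusive upper bound, like BASIC's  FOR I = 0 TO LAST,
# forces the familiar "minus one" adjustment:
last = n - 1
visited_basic = [items[i] for i in range(0, last + 1)]

print(visited == visited_basic)   # True
```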
> Anything more complex, is a different kind of statement, for example C's
> general purpose loop 'for(a;b;c)' I write as 'while a,b,c do'.
Do you mean you write 'for(a;b;c)d' as 'a;while(b)d;c do'?
James
I am not sure of the point of all this...
maybe at the time they were right, as pretty much the only other options
were FORTRAN and COBOL (mostly in the 60s), which were not necessarily
"clearly better" than assembler.
C didn't come around until the 70s... (so its later existence or
non-existence was of no relevance to them).
in any case, it is not clear at the moment that there is anything
clearly better (who knows about 10 or 20 years from now, that is the
future, not "right now").
BASIC-like, Python-like, or Lua-like syntax is not clearly better, and
on various fronts I might contend that it is actually worse.
hence, the issue is as it is.
for the moment, C-family syntax (extended to include the likes of Java,
C#, JavaScript, ActionScript, ... as well) seems "reasonably close" to
optimal (the rest then is "fine-tuning").
not that one has to treat syntactic details or language trivia as a
religion or anything, but rather refrain from making significant changes
unless there is a good reason for doing so (for example, my choice of
"/expr/ as /type/" and "/expr/ as! /type/" cast syntax over
"(/type/)/expr/" syntax for sake of avoiding parser ambiguity related to
the use of function-returning-expressions and curried functions and
similar...).
the rest is pedantry:
some people don't like ';' so they make line-breaks significant... then
you have all of these funky (and often subtle) issues related to
formatting (the parser doesn't always parse the code as intended, ...);
some people like keywords more (making very "wordy" languages), and
others prefer to avoid keywords wherever possible and have nearly
everything as operator glyphs;
...
so, at this point, things are as they are...
> BASIC-like, Python-like, or Lua-like syntax is not clearly better,
No? It's certainly more informal and therefore more friendly.
> for the moment, C-family syntax (extended to include the likes of Java,
> C#, JavaScript, ActionScript, ... as well) seems "reasonably close" to
> optimal (the rest then is "fine-tuning").
There's one thing wrong with thinking C-family syntax is 'optimal': if you
ask anyone to write an algorithm in pseudo-code, the chances are they will
write something that looks like Basic, Algol, Pascal, Lua, Python ... in
fact pretty much anything except C!
This might give a clue that perhaps C-family languages aren't
the most natural way of writing code. (In fact, even the C pre-processor
prefers to use #if ... #endif!)
(Understandably, C-family programmers will defend their syntax to the death,
even to the extent of pretending that there's nothing really wrong with C's
type-declaration syntax, a format so convoluted that it's necessary to
employ third-party utilities to disentangle declarations' meaning!)
However, syntax is just syntax, it's merely a superficial layer over the
language proper, and it ought to be possible to just switch from one syntax
style to another, but that hasn't really happened. (It's not a wild idea:
I've been pretty much writing C code, but with Algol-like syntax, for
years.)
> the rest is pedantry:
> some people don't like ';' so they make line-breaks significant... then
> you have all of these funky (and often subtle) issues related to
> formatting (the parser doesn't always parse the code as intended, ...);
> some people like keywords more (making very "wordy" languages), and others
> prefer to avoid keywords wherever possible and have nearly everything as
> operator glyphs;
Exactly, so why not have a choice? A choice of selecting a preferred syntax
instead of having to completely switch languages.
--
Bartc
>> No? It's certainly more informal and therefore more friendly.
>>
>
> BASIC code often ends up requiring explicit line-continuation characters;
C uses line-continuation too.
> Python's indentation-based syntax is a source of many problems and much
> controversy; Lua has lots of keywords and generally looks funky;
(Is 'funky' good or bad?)
>
> also, many other attempts at "soft line-break" languages have ended up
> creating lots of awkward ambiguities, and often require funky rules so
> that the parser doesn't get confused.
For years I used this rule for line-breaks: "End-of-line is converted to a
semicolon, unless preceded by a comma or a "\" line continuation".
(Although the syntax has to be tolerant of superfluous semicolons.)
This has worked well so far: even though my syntax is semicolon-separated,
you can look at a thousand lines of code and have trouble finding even one!
In fact I'm thinking of adding tokens like "(" and "[" to the comma, where
further input is obviously expected.
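The rule is simple enough to sketch in a few lines of Python (a toy
splitter of my own, not Bartc's actual implementation): end-of-line ends a
statement unless the line ends with a comma or a backslash continuation.

```python
def logical_lines(source):
    # End-of-line acts as ';' unless the line ends with ',' or '\'.
    statements, pending = [], ""
    for raw in source.splitlines():
        line = raw.strip()
        if line.endswith("\\"):
            pending += line[:-1].rstrip() + " "   # explicit continuation
        elif line.endswith(","):
            pending += line + " "                 # more input obviously expected
        else:
            statements.append((pending + line).strip())
            pending = ""
    if pending.strip():
        statements.append(pending.strip())
    return statements

print(logical_lines("f(a,\nb)\nx := 1"))
# ['f(a, b)', 'x := 1']
```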
>> if
>> you ask anyone to write an algorithm in pseudo-code, the chances are
>> they will write something that looks like Basic, Algol, Pascal, Lua,
>> Python ... in fact pretty much anything except C!
> my own pseudo-code tends to look mostly C-like (actually, my own
> pseudo-code tends to mostly resemble something Java-like, but sometimes
> with features which wouldn't map well to real languages, such as
> conditionals or procedural logic in structures, ...).
That's fair enough, but bear in mind that pseudo-code often has to impart
an idea to someone without knowledge of the idiosyncrasies of your syntax.
(Pseudo-code also tends to use higher-level expressions than might be
available in an actual language, although the higher level the language, the
closer it will map.)
> as-is, my own language uses:
> ifdef(...) { ... }
> and:
> ifndef(...) { ... }
That's good, at least the syntax is consistent!
>> However, syntax is just syntax, it's merely a superficial layer over the
>> language proper
> for many people though, the syntax is the language (many people express
> their thoughts/memories/... more in syntax than in semantics).
I agree there might be issues in putting the concept across.
>>> the rest is pedantry:
>>> some people don't like ';' so they make line-breaks significant...
>>> then you have all of these funky (and often subtle) issues related to
>>> formatting (the parser doesn't always parse the code as intended, ...);
>>> some people like keywords more (making very "wordy" languages), and
>>> others prefer to avoid keywords wherever possible and have nearly
>>> everything as operator glyphs;
>>
>> Exactly, so why not have a choice? A choice of selecting a preferred
>> syntax instead of having to completely switch languages.
>>
>
> well, there are issues with this:
> one would likely need a standardized AST or some kind of standardized
> meta-parser system;
> it could lead to an inability for people to nearly so readily copy-paste
> code between projects or between programmers (this is why standardized
> syntax, APIs, and coding conventions, are often so much emphasized).
I had in mind some switch on an IDE that would instantly convert from one
form to another. Possibly copy-and-paste might be a problem when dealing
with partial code-fragments (which can't be converted).
I can see further problems too, as I prefer case-insensitive syntax, while
C-style syntax (and quite a few others) is case-sensitive, and converting
between the two is not trivial.
> as-is, I have seen people essentially get into fights over things which
> the compiler doesn't give a crap about, such as "where the brace goes" and
> similar...
Don't IDEs already have some way of reformatting code according to one style
or another?
> (in many of my own parsers, the ';' is in-fact optional, as are most uses
> of commas, as the parsers generally involve "whitespace heuristics", but
> they are generally recommended).
If you get rid of commas, some parsing possibilities disappear, for example
I can write "25 cm" (a constant length), which is otherwise parsed as "25,
cm".
--
Bartc
Suck? Suck the most? Or, suck, but suck much less than the other choices?
Let's start with the two most widespread languages: C and Forth.
C - concisely uses the following for all blocks
{ }
Forth - has no syntax, so a variety of Forth words are used to delimit
blocks
BEGIN AGAIN
BEGIN UNTIL
BEGIN WHILE REPEAT
IF ELSE THEN
DO LOOP
: ;
etc.
I had to look these up. It's either been too long since I coded them, or I
don't program in them.
Pascal -
begin end
Fortran -
do enddo
Cobol - word delimited
PERFORM VARYING
EVALUATE WHEN
etc.
Lisp -
( )
Lua - similar to Forth, delimited by various words ...
function end
for do end
if elseif else end
Perl -
{ }
AWK -
{ }
Java -
{ }
Javascript -
{ }
It seems the curly braces might be winning, and where they aren't parens are
... ;)
Rod Pemberton
> OK how about some code comparisons? Using no language in particular,
>
> left$(a$, 7)
> vs
> a$[.. 7]
>
> mid$(a$, 2, 3)
> vs
> a$[2 .. 3]
> So, do you dislike the second of each example? If so, why? They look
> clear to me and are essentially a single notation, which I think is a
> benefit.
OK, you're using slicing notation.
That's fine, but how do you select, say, the last 4 characters of a string?
Or everything *except* the first character?
Don't know how Basic's right$() deals with that second example, but with
*my* left/right functions, that would be right(a,4) and right(a,-1).
And my left()/right() can also specify a substring *longer* than the
original, and can pad out with another string, by adding an extra argument.
So functionality has been easily extended using this format.
With slicing notation, I would have to write those examples as follows (for
strings, I needed a 'dot' to break apart an object normally considered a
single entity):
a.[a.len-4..a.len]
a.[2..a.len].
With a tidier way of writing the end-bound, say by writing "$", or just
omitting it, and getting rid of that ".", the examples might become:
a[$-4..]
a[2..]
Definitely shorter, but much more cryptic too. These forms also depend on
the zero-based or one-based nature of the subscripts. In fact I created
special notation for these left/right slices, and the examples become:
a[:4]
a[:-1]
But even here, it's not clear whether that last one should be a[-1:] or
a[:-1] (probably the latter, as the slice is to the right in both cases).
And when specifying a larger slice, this form seems to demand that a bounds
error should be raised.
Finally, when porting an algorithm to another language without these
features, then left()/right() functions can be more easily emulated than
dedicated syntax. So all good reasons for retaining function format!
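For concreteness, here is a Python sketch of left/right with the
extensions described above: negative counts excluding characters, and an
optional pad argument. The semantics are inferred from the examples in
this thread (a single-character pad is assumed), so the details are my
guess, not Bartc's actual definitions.

```python
def left(s, n, pad=None):
    if n < 0:
        return s[:n]                       # left(s, -k): all but the last k
    if pad is not None and n > len(s):
        return s + pad * (n - len(s))      # pad on the right to width n
    return s[:n]

def right(s, n, pad=None):
    if n < 0:
        return s[-n:]                      # right(s, -k): all but the first k
    if pad is not None and n > len(s):
        return pad * (n - len(s)) + s      # pad on the left to width n
    return s[len(s) - n:]

print(right("abcdef", 4))    # cdef
print(right("abcdef", -1))   # bcdef
print(right("123", 6, "*"))  # ***123
```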
--
Bartc
suck. No additional qualifiers needed.
> Lets start with the two most widespread languages: C and Forth.
Forth widespread? Where, in Apple firmware? :)
> I had to look these up. It's either been too long since I coded them, or I
> don't program in them.
>
> Pascal -
> begin end
That's what I'm most familiar with. Though I like the Modula-2 system more,
since it solves dangling-else problems.
> It seems the curly braces might be winning, and where they aren't parens are
> ... ;)
Not all popular things are good things. Take for instance Berlusconi :-)
So my remark was about quality, not quantity.
a$ = left$(z$,4)
> Or everything *except* the first character?
a$ = left$(z$,len(z$)-1)
>
> Don't know how Basic's right$() deals with that second example, but with
For every right$() there is (obviously) a complementary left$().
> *my* left/right functions, that would be right(a,4) and right(a,-1).
>
> And my left()/right() can also specify a substring *longer* than the
> original,
Something longer than the original would not be a substring but a superset
of the original string. Give me an example of what you want and I will
show you how it would be done using the BASIC String Functions.
> and can pad out with another string, by adding an extra argument.
> So functionality has been easily extended using this format.
See above. Easily done with the BASIC string functions (which consist
of much more than right$(), left$() and mid$()).
>
> With slicing notation, I would have to write those examples as follows (for
> strings, I needed a 'dot' to break apart an object normally considered a
> single entity):
>
> a.[a.len-4..a.len]
> a.[2..a.len].
>
> With a tidier way of writing the end-bound, say by writing "$", or just
> omitting it, and getting rid of that ".", the examples might become:
>
> a[$-4..]
> a[2..]
It is probably a matter of taste, but IMHO the BASIC examples are probably
a lot easier to understand for someone coming in from the cold. :-)
>
> Definitely shorter, but much more cryptic too. These forms also depend on
> the zero-based or one-based nature of the subscripts. In fact I created
> special notation for these left/right slices, and the examples become:
>
> a[:4]
> a[:-1]
>
> But even here, it's not clear whether that last one should be a[-1:] or
> a[:-1] (probably the latter, as the slice is to the right in both cases).
> And when specifying a larger slice, this form seems to demand that a bounds
> error should be raised.
>
> Finally, when porting an algorithm to another language without these
> features, then left()/right() functions can be more easily emulated than
> dedicated syntax. So all good reasons for retaining function format!
But they assume that all strings are arrays of characters. While that is
true in C, it is not a universal convention, and it is one that most non-C
programmers consider a C weakness, not a strength.
bill
--
Bill Gunshannon | de-moc-ra-cy (di mok' ra see) n. Three wolves
bill...@cs.scranton.edu | and a sheep voting on what's for dinner.
University of Scranton |
Scranton, Pennsylvania | #include <std.disclaimer.h>
>> That's fine, but how do you select, say, the last 4 characters of a
>> string?
>
> a$ = left$(z$,4)
That's fine (although you need right$ there!), but the discussion was
about using function style as opposed to indexing and slicing notation.
>
>> Or everything *except* the first character?
>
> a$ = left$(z$,len(z$)-1)
(Again, you need right$ here. And it's no longer quite as sweet as writing
right$(z$,-1) as a convention for excluding rather than including characters.)
>> and can pad out with another string, by adding an extra
>> argument.
>> So functionality has been easily extended using this format.
>
> See above. Easily done with the BASIC string functions (which consist
> of much more than right$(), left$() and mid$()).
Example (no longer Basic syntax):
a:="123"
println right(a,6,"*")
Output:
***123
It was just an example of something awkward with slice notation.
>> a[$-4..]
>> a[2..]
>
> It is probably a matter of taste, but IMHO the BASIC examples are probably
> a lot easier to understand for someone coming in from the cold. :-)
Exactly my point of view! And they can be more functional too.
--
Bartc
> Rod Pemberton wrote:
>> I was just reminding James
>> that strings are not a fundamental type in some languages.
>
> More vague verbiage: "fundamental". Isn't a string always a "fundamental"
> type? ;) It's just a holder for data. How much more fundamental can a type be?
The string type is normally defined in terms of other types, e.g. as an
array of characters, which makes the index type, the character type and
array type more fundamental than strings. Not all arrays are strings, not
even all character arrays are.
>> I put arrays in quotes for a reason: C doesn't actually have them
>> either.
>
> C "arrays" are just strings!
C "arrays" decay to pointers in most expressions. C does not have a string
type. It has string literals of array type, which cannot be used as a
proper type by the programmer and are implicitly converted to char*.
First, I rarely ever type the 'end', btw, and the 'begin' not that much
either. They are typically completed by the (Delphi or Lazarus) IDE.
And if you are a keystroke counter, the IDE also matters for the
number of keys needed for indentation.
> if ... then begin .... end else ...
>
> as 'then' and 'else' are already perfectly good delimiters.
No, they are not redundant per se if not every IF necessarily has an ELSE,
and you allow empty blocks and/or blocks without begin..end. That gets you
into trouble with nesting, and the problem is called the dangling ELSE.
Both Pascal and C suffer from it and have workarounds.
As said earlier, the successor to Pascal fixes this in the way I like best:
* begin..end is kept for the function scope. The "end" is followed by the
function name, like end functionname;
* For other blocks "begin" is dropped and end is mandatory.
The second bit is the nice part. Readable, reasonably slim, and never again
expand a single line to a block (e.g. to add a log msg). IMHO still the
best system. Unfortunately, compiler quality on Modula-2 was bad, so I ended
up with the next best thing, Pascal.
For the people interested in the math, the number of characters is C: 3 vs
M2: 3-4.
C has { and } plus a mandatory semicolon before the }; Modula-2 has three
(end), followed by a semicolon only if the next line is a statement (and
not e.g. another end).
The first bit is a mixed blessing, meant to make sure that an error wrt
block nesting will always be detected at the function's end, and never
beyond. Very secure, but back then I considered it a nuisance when renaming
functions, and in the relatively safe Wirthian languages the error was
99.9% of the time on the next line (the next function declaration) anyway.
So I considered the tradeoff bad.
OTOH, with a decent IDE (with editing based on syntax parsing) it might
not be so much of a problem (since you could easily create an editor that
would rename the function name after "end" too when renaming the function,
assuming that the general block structure is intact).
>> if ... then begin .... end else ...
>>
>> as 'then' and 'else' are already perfectly good delimiters.
>
> No, they are not redundant per se if not every IF mandatory has an ELSE,
> and
> you allow empty blocks and/or blocks without begin..end. It gets you into
> trouble with nesting, and the problem is called dangling ELSE. Both
> Pascal
> and C suffer from it and have workarounds.
>
> As said earlier, the successor to Pascal fixes this in the way I like
> best:
>
> * begin..end is kept for the function scope. The "end" is followed by the
> function name, like end functionname;
> * For other blocks "begin" is dropped and end is mandatory.
>
> The second bit is the nice part. Readable, reasonably slim, and never again
> expand a single line to a block (e.g. to add a log msg). IMHO still the
> best system. Unfortunately, compiler quality on Modula-2 was bad, so I
> ended up with the next best thing, Pascal.
I thought Algol68 sorted this out pretty well.
However, every 'if' statement needs to end with 'fi', not to everyone's
taste. (In my versions of the syntax, I also allow 'end', 'endif' and 'end
if', just to provide a choice.)
Anyway, it means many places where you might sometimes write more than one
statement are already delimited, or bracketed. So you can change from 1 to
2 or N statements, and back again, without inserting and removing blocks.
Quite luxurious! Of course not all of us use fancy IDEs..
Also there are a few places where you wouldn't usually expect multiple
statements, where they will also work thanks to being naturally delimited:
in between 'if' and 'then' for example. It all helps to give a nice,
expressive, orthogonal language.
--
bartc