
A website or wiki on language design


James Harris

Dec 12, 2010, 5:56:18 PM
I've been thinking of this for some time but wasn't sure there was
much interest. Given the surprising recent activity on comp.lang.misc,
now is possibly a good time to raise the idea. Two questions:

1. Does anyone else here have an interest in setting up a website or
wiki on the topic of programming language design, or is there an
existing site that's good enough to join and welcomes new authors?

2. Whether you would want write access or not, do you have any thoughts
on what info should be included?

Whether a web site or a wiki, my feeling is that it should probably be
for a closed group of people to update rather than world-writable.
Other opinions welcome.

James

BGB

Dec 13, 2010, 1:26:29 AM
On 12/12/2010 3:56 PM, James Harris wrote:
> I've been thinking of this for some time but wasn't sure there was
> much interest. Given the surprising recent activity on comp.lang.misc
> now is possibly a good time to raise the idea. Two questions:
>
> 1. Does anyone else here have an interest in setting up a website or
> wiki on the topic of programming language design, or is there an
> existing site that's good enough to join and welcomes new authors?
>

cool idea...

I don't know of any like this, and personally lack either the expertise
or resources to manage something like this.

a wiki running off an old laptop (serving as the server) on a slow DSL
connection with dynamic DNS where half the time either the router
doesn't route HTTP or the DNS is stale because the ISP causes the IP to
change often... yeah, this would turn out well...

but, yeah, if something like this was around, this would be nifty...


well, there is FONC / VPRI:
http://en.wikipedia.org/wiki/Viewpoints_Research_Institute

but, activity there seems far less than even here on comp.lang.misc...


and, also, their stuff is in "academic paper format", which is IMO far
more effort than I am willing to bother with (it is asking enough of me
to even summarize thoughts into a form which would make a good article
or blog post...).

hell, maybe I almost need a blog?... it would probably still be a better
way of getting some of my ideas out there than either usenet or my own
projects are proving to be...

> 2. Whether you would want write access or not do you have any thoughts
> on what info should be included?
>

well, at this point I am a little more interested in matters of VM
design than of language design per-se, as at this point much of language
design boils down to "how can I make something absurd and push it off as
a good idea" (since most of what would make a "good" language design, in
general, is apparent mostly through observation of the trends in
existing technologies).

as it so happens, thinking in terms of "the language" seems to be the
wrong level of abstraction, with the language more serving as a means of
most usably exposing functionality.

the VM is then a more interesting problem IMO, since this is the
manifestation of the core sets of functionality which may be exposed by
a language.


meanwhile, the endless treadmill about matters of syntax and parser
generators is no longer of much personal interest (stale...).

> Whether a web site or a wiki my feeling is that it should probably be
> for a closed group of people to update - rather than world-writeable.
> Other opinions welcome.
>

either way, but ideas should be easy to express, as any kind of long
awkward process to publicize ideas limits the variety of ideas which may
be expressed.

granted, making things slower and more awkward does increase the "signal
to noise ratio", but many good ideas may be lost, and it is a question
of which is better.

an extreme example of this would be something like the JVM, where
advancement is very slow and the process very bureaucratic (where the
implementation of *any* new core features takes easily 5-10 years or more).


the other extreme is where all the parts are simple and specifications
and standards are loose, so people can more easily beat something
together and experiment with different ideas, but often efforts are
sufficiently fragmentary as to limit overall progress...

everyone and their dog has their own custom Scheme implementation, and
with a sufficient amount of luck, small code fragments may even still
work between them...


at this point, I am more in favor of "micro-specs", where a single piece
of functionality is documented in a reasonable level of detail (and is
ideally both elegantly simple and general purpose), but where the usage
of the feature (and its relation to other features) is more open-ended.


to some extent, C is a micro-spec (though on the border), since enough
freedom is left in the "C architecture" to basically use it for damn
near any task one sees fit (and there is no "one true C").

this is unlike, say, Java, where the only "true" implementation is the
Sun/Oracle JDK, and all others are essentially regarded as copy-cats,
and there is no real community architecture apart from sucking-up to
Oracle and the JCP (one can't really officially have a feature until it
both goes through the process and Oracle implements it in the JDK).


for example, to some extent, I like many parts of both JVM and .NET
design (both have done some things well), but both impose a certain
macro-structure which subsequently hinders what can be done with it (in
the general sense, one has to pick "one or the other", with all of the
baggage that comes with them, and by no means can one "mix and match"
parts to their own liking).

the result then is that I am left not really liking either of them,
since both have limitations with no real provision for addressing them.


and, in all of this, I am finding that the bigger issues are not
technical, but psychological:
people just can't see these things as components which could be used
freely, but only in terms of the "platforms", and any implementation of
a feature is seen as an attempt to emulate said platform.

this is, as I see it, wrong...


the platforms gloss over one problem and in the process give us a much
bigger set of problems... this whole architecture is, IMO, flawed...

and then LLVM and Mono arise, and basically fall into the same sorts of
traps...

LLVM is open, at least as far as the code itself goes, but fails in that
LLVM *is* the implementation (its components are only really seen in
relation to the whole).

Mono, OTOH, falls into the trap that the code is basically poor and
tangled enough that it makes using it more of an "all or nothing"
proposition in many respects (I am not certain how people see it, but
the parts of the codebase I have looked at don't give me a whole lot of
room for optimism...).


I would like a "platform" free of centralized control (and ideally, even
of a centralized implementation), where experimentation and alternate
implementations are readily allowed, where there is no "one true
implementation", and there is no real "authority" apart from possibly
that of providing information and implementation advice (and where one
has an option other than "either live with it or go elsewhere" if they
have a problem with the architecture).


I don't expect such an architecture can be created in exactly the same
way as one of the major platforms, but one can at least have some hope.


as I see it, there is no real need to entirely discard or replace the
existing platforms either, but rather one can change how they are
viewed, from being do everything monolithic platforms, to being
interconnected collections of components and technologies.


just as COFF and ELF are not the sole property of any specific compiler
or OS, so should languages, bytecodes, and IL's be liberated from their
platforms.

where a bytecode exists apart from the languages which run on top of
it, from the means by which it is loaded and executed, and from the
run-time libraries with which it is associated.

so, one can get something like JBC or MSIL without having to drag all
of the JVM or .NET along with it...


for example, C runs on many systems and CPUs, each with a different
implementation and architecture.

and, JavaScript runs on many different web-browsers, each in turn with
their own implementations (and no one really bows down before Mozilla's
SpiderMonkey in the same way as they do for the JVM and for .NET).

there is no more "one true C implementation" than there is "one true
JavaScript implementation".


languages are part of the problem, yes, but components, APIs, file
formats, ... are another part of the problem.

granted, being a single person, I have little idea what the whole
"should" be, only that in time I am getting a better idea what sorts of
things it "should not" be...


or such...

Robbert Haarman

Dec 13, 2010, 2:58:38 AM
Hi James,

On Sun, Dec 12, 2010 at 02:56:18PM -0800, James Harris wrote:
>
> 1. Does anyone else here have an interest in setting up a website or
> wiki on the topic of programming language design, or is there an
> existing site that's good enough to join and welcomes new authors?
>
> 2. Whether you would want write access or not do you have any thoughts
> on what info should be included?

I think it could be useful to have a website that tracks the state of the
art in programming languages. Programming language design requires making
a lot of decisions about a wide range of topics, and not everyone can be
an expert on every topic. The result is lots of recurring discussions, and
perhaps languages that could have been better, had their designers known
more.

So what would be interesting to me is a website that divides the subject
of programming language design into various subtopics, and for each topic,
lists the considerations, languages that have made the choice one way or
another, pointers to more information (e.g. research papers), and the
current state of the art for various dimensions.

For example, one aspect of programming language design is type checking.
This is a large and important topic that generates lots of discussion. It
can be broken down into various subtopics, e.g. static vs. dynamic typing,
weak vs. strong typing, manifest vs. implicit typing, type safety,
polymorphism (ad-hoc as well as parametric), subtypes, dependent types,
etc.

Regarding state of the art: some topics in programming language design are
the subject of active research, and once in a while a programming language
or paper comes out that pushes the boundary of what has been achieved.
Memory management, for example. Real-time garbage collectors guarantee
minimum mutator utilization, that is, a minimum percentage of time that a
program can spend on doing things other than memory management. But what
is the currently highest minimum mutator utilization that has been achieved,
and what algorithm was used?
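
(For readers who have not met the term, a rough statement of the usual
definition -- my paraphrase, not a quote from any particular paper:

    MMU(dt) = min over all start times t0 of
              (mutator time in [t0, t0 + dt]) / dt

i.e. the worst-case fraction of any window of length dt in which the
program itself, rather than the collector, gets to run.)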

Of course, there are also a lot of places that touch on the subject of
programming language design. This newsgroup, for one. But also:

- The C2 Wiki: http://c2.com/cgi/wiki?IdealProgrammingLanguage

- Lambda the Ultimate: http://lambda-the-ultimate.org/

- Ulf Schünemann's Programming Language Design page:
http://web.cs.mun.ca/~ulf/pld/

- Jason Voegele's Programming Language Comparison page:
http://www.jvoegele.com/software/langcomp.html

- Mike Austin's page about language design:
http://mike-austin.com/impulse/languages.html

There also used to be a page by someone who was attempting to chart all the
dimensions and choices involved in designing a programming language, but
I can't seem to find the link anymore.

As you can see, there is a lot of material already out there. I still think
another site would be interesting. Even if it's Yet Another Page About
Programming Languages, that's how the Web grows. But it would be really
useful to aggregate and summarize the enormous amount of material that is
already out there. This is an enormous task which I suspect exceeds what
a lone person can do in his copious free time, so it would probably be
a good idea to have multiple editors, and perhaps make it publicly
editable.

Regards,

Bob

--
"The nice thing about standards is that you have so many to choose from."
-- Andrew S. Tanenbaum


James Harris

Dec 13, 2010, 6:03:33 AM
On Dec 12, 10:56 pm, James Harris <james.harri...@googlemail.com>
wrote:

...

> 1. Does anyone else here have an interest in setting up a website or
> wiki on the topic of programming language design, or is there an
> existing site that's good enough to join and welcomes new authors?
>
> 2. Whether you would want write access or not do you have any thoughts
> on what info should be included?

Not ignoring the responses so far, but just to add a couple of
suggestions for content:

* Document the key principles of good languages. The book Principles
of Programming Languages by Bruce MacLennan is an excellent start and
I think others could be added.

* Make a list of useful books on the topic of language design (maybe
with cover photos?) and include opinions about each publication.

James

manso...@rediffmail.com

Dec 13, 2010, 8:38:59 AM
Nice idea.

Actually, I have recently started working on such a website. If you guys
are interested, we can join up and work together. I have already set up a
website and started adding content to it.
I have written on C-to-assembly code generation and the C++ object model.
I have called these "C Internals" and "C++ Internals".

Here is what the website could possibly contain:
- The books/tutorials/articles that we can write.
- Link and references to other useful websites.
- FAQ (frequently asked questions) or wiki
- And possibly a forum.

Actually, you can find most of these things on the Internet, but my idea
is to make a website that can become the one place for getting all this
information.

By the way, here is the link to my website-
http://www.avabodh.com/

And here is the link to on-line books that I have written -
http://www.avabodh.com/cin/cin.html
http://www.avabodh.com/cxxin/cxx.html

BGB

Dec 13, 2010, 2:41:13 PM, to James Harris

meanwhile, this has got me thinking about partly reviving an older idea
of mine from around 5 years ago:
creating a BGBScript2 language, which I am currently imagining as a sort
of hybrid between BGBScript, ActionScript3, and C#.

(the original idea was more BGBScript+C+Java, but BS+AS3+C# is not that
much different).


partial reason:
basically, I am frustrated with Java and the JVM architecture, ...
(do one's own implementation, deal with Java-heads, deal with piles of
poorly-designed VM features, ...).
at the moment, I am thinking "hell with it, back to doing my own design"
(at least I won't have to go through contortions to work around VM design
limitations).

I am probably going to use a class/package syntax similar to AS3.

probably, I will use C#-style syntax for method and variable
declarations (personal aesthetic preference, albeit a slight bit more
awkward to parse than JS/AS style declarations).

it will probably also include user-defined value types (IOW: structs).


type-model:
likely soft-static (type handling in general will be closer to C and
C++, vs Java/C# "complain about the smallest thing" behavior, and
support for dynamic types will also exist);
the type declarations will use manifest types (dynamic types will be a
base type, likely 'var');
will likely use Class/Instance OO with some prototype features (cloning,
ability to modify live objects, and support for field and method
delegation like in Self);
...


so, say:
package myapp.foo
{
    import myapp.bar;

    public struct Vec3
    {
        double x, y, z;    //default=public

        public Vec3() { this(0, 0, 0); }
        public Vec3(double vx, vy, vz)
            { x=vx; y=vy; z=vz; }
    }

    public class Bar extends Object
    {
        public Vec3 org;    //field, default=protected
    }

    public class Baz extends Object
    {
        delegate Bar bar;    //delegated object
        //note: delegate will be a modifier

        public Baz()    //constructor
        {
            bar=new Bar();    //new instance of Bar
            //bar.org: initialized with default constructor
        }

        public ~Baz()    //finalizer
        {
        }

        public void someMethod(int x)
        {
            Vec3 v;
            int z;
            var bif;

            bif=x*y;    //x*y, converted to a fixnum
            org.x=bif;    //coerce

            v=bar[#org];    //named field access, #org=symbol

            //modifies v, bar.org.x unchanged:
            v.y=x;
        }
    }
}


I may eventually properly JIT the thing, but my current leaning is to
initially just use threaded-code or similar (for an interpreter).

bytecode will probably use MSIL-like type handling (as I have come to
the opinion that JBC-style statically-typed bytecode is a PITA, and is
rendered moot when using either JIT or threaded code...).

...


or such...

Jacko

Dec 13, 2010, 3:50:44 PM
> 1. Does anyone else here have an interest in setting up a website or
> wiki on the topic of programming language design, or is there an
> existing site that's good enough to join and welcomes new authors?

I've seen many language comparison sites.

> 2. Whether you would want write access or not do you have any thoughts
> on what info should be included?

Many good things have been pointed out so far. Here's how I would
roughly add an entry.

I am designing my own language at the moment.

Language: pHone.

* Mainly LISP/FORTH inspired.
* Fully one typed as chained value symbols.
* Combining the idea of pass by value or reference into pass by
reference to a symbol, which itself has an encoding for the name's
characters and a value-chain encoding. This removes the command VAR; LIT
is used instead, and some other commands will be changed slightly.
* It's preparsed to threaded value chains, making it fast.
* It's not going to use the number 0 as a number, as 0 is NaN in many
contexts.
* Hooks will be many for user extension.
* Extending the primitive base will be allowed, but will throw a
warning.
* All errors can be configured to halt or to only terminate the
current subroutine.
* Will support namespaces/modules to reduce name chaff.
* Will support templating named instances of modules.

Cheers Jacko

BartC

Dec 13, 2010, 7:29:24 PM
"Jacko" <jacko...@gmail.com> wrote in message
news:22f1dd43-c488-4736...@m20g2000prc.googlegroups.com...

> Many good things have been pointed out so far. Here's how I would
> roughly add an entry.
>
> I am designig my own language at the moment.
>
> Language: pHone.

> * It's not goig to use the number 0 as a number. As 0 is NaN in many
> contexts.

This language doesn't have zero? I can see that causing some difficulties
for those used to languages that do have it...

--
Bartc

Rod Pemberton

Dec 14, 2010, 12:50:46 AM
"BartC" <b...@freeuk.com> wrote in message
news:ie6dpt$9mt$1...@news.eternal-september.org...

> "Jacko" <jacko...@gmail.com> wrote in message
> news:22f1dd43-c488-4736...@m20g2000prc.googlegroups.com...
>
> > * It's not goig to use the number 0 as a number. As 0 is NaN in many
> > contexts.
>
> This language doesn't have zero? I can see that causing some difficulties
> for those used to languages that do have it...
>

How do you represent "false" and "true"? Does anything use them? e.g.,
conditionals like if else while etc.

AIUI, most languages represent "false" as zero, and "true" as not false.


Rod Pemberton


Rod Pemberton

Dec 14, 2010, 12:51:04 AM
"BGB" <cr8...@hotmail.com> wrote in message
news:4D0676D9...@hotmail.com...

>
> basically, I am frustrated with Java and the JVM architecture, ...

Limitations? Pointers?

> I may eventually properly JIT the thing, but my current leaning is to
> initially just use threaded-code or similar (for an interpreter).

The threading works well for interpreters. It's a bit slow. I'm not so
sure about languages built using a threaded interpreter though... Forth is
the only one I'm aware of, and it has issues IMO.

> bytecode will probably use MSIL-like type handling (as I have come to
> the opinion that JBC-style statically-typed bytecode is a PITA, and is
> rendered moot when using either JIT or threaded code...).

Does one really need 256 bytecodes? While that doesn't seem like much, it
is enough to give one headaches. You have to write 256 routines, maintain
256 routines, figure out when and where it's optimal to use all 256
routines... I think you'll find that only a small quantity are used
frequently. ISTM that around 32 to 40 or less is goal. Nybble code,
perhaps? One trade-off is a slower interpreter, but the fewer codes there
are, the less work there should be. Another trade-off is more work programming,
if code is being written for it by hand. If a compiler is generating code
for the interpreter, then there is no increased work trade-off. One way to
reduce the number of bytecodes is to reduce the number of types being used,
e.g., only one integer size. Those that are too small get promoted. Those
that are too large get truncated or eliminated from the language. This
reduces the number of operations, i.e., only one "add" routine, instead of
an "add" routine for various sized types.


Rod Pemberton


BGB

Dec 14, 2010, 4:27:19 AM
On 12/13/2010 10:51 PM, Rod Pemberton wrote:
> "BGB"<cr8...@hotmail.com> wrote in message
> news:4D0676D9...@hotmail.com...
>>
>> basically, I am frustrated with Java and the JVM architecture, ...
>
> Limitations? Pointers?
>

hacked on structs and pointers, these are ugly but they work...

Java's scope model is a PITA though...
producing JBC is also much more of a pain than it needs to be IMO.


>> I may eventually properly JIT the thing, but my current leaning is to
>> initially just use threaded-code or similar (for an interpreter).
>
> The threading works well for interpreters. It's a bit slow. I'm not so
> sure about languages built using a threaded interpreter though... Forth is
> the only one I'm aware of, and it has issues IMO.
>

I tried using something akin to threaded code in implementing my x86
interpreter, with good success...

in an x86 interpreter, instruction decoding and the huge nested switches
involved are expensive, and using a semi-threaded design allowed a much
faster interpreter overall.


threading is IME faster than using a while loop and a big switch
statement...

something like:
    while(*ctx->ip)
        (*ctx->ip++)(ctx);

or:
    while(*ctx->ip)
        ((*ctx->ip++)->fcn)(ctx);

seems to be (IME) generally somewhat faster than:
    while(!ctx->ret)
    {
        switch(*ctx->ip++)
        {
            case ...:
                ...
                ...
        }
    }


also threading allows more type specialization, without needing to
commit to doing a full JIT (since the code to transcribe the bytecode
into the threaded form may pick functions which are already specialized
to the known argument types).
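
as a concrete (if contrived) illustration of the function-pointer style of
dispatch, here is a tiny self-contained toy in C; the instruction set and
names are made up for the example, and it is not lifted from any of my
actual interpreters:

#include <stdio.h>

/* toy threaded-code interpreter: the "bytecode" is just an array of
   function pointers, so the dispatch loop is a single indirect call.
   the thread is terminated by a NULL entry. */

typedef struct Ctx Ctx;
typedef void (*Op)(Ctx *);

struct Ctx {
    Op *ip;            /* instruction pointer into the thread */
    int stack[32];
    int sp;
};

static void op_push1(Ctx *ctx) { ctx->stack[ctx->sp++] = 1; }
static void op_push2(Ctx *ctx) { ctx->stack[ctx->sp++] = 2; }
static void op_add(Ctx *ctx)
{
    ctx->sp--;
    ctx->stack[ctx->sp - 1] += ctx->stack[ctx->sp];
}
static void op_print(Ctx *ctx) { printf("%d\n", ctx->stack[--ctx->sp]); }

int main(void)
{
    Op prog[] = { op_push1, op_push2, op_add, op_print, 0 };
    Ctx ctx = { prog, {0}, 0 };

    while (*ctx.ip)            /* the entire dispatch loop */
        (*ctx.ip++)(&ctx);
    return 0;
}

the same program done switch()-style would have to decode an opcode number
on every step before it ever gets to the actual work.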

>> bytecode will probably use MSIL-like type handling (as I have come to
>> the opinion that JBC-style statically-typed bytecode is a PITA, and is
>> rendered moot when using either JIT or threaded code...).
>
> Does one really need 256 bytecodes? While that doesn't seem like much, it
> is enough to give one headaches. You have to write 256 routines, maintain
> 256 routines, figure out when and where it's optimal to use all 256
> routines... I think you'll find that only a small quantity are used
> frequently. ISTM that around 32 to 40 or less is goal. Nybble code,
> perhaps? One trade-off is a slower interpreter, but the fewer codes there
> are the work their should be. Another trade-off is more work programming,
> if code is being written for it by hand. If a compiler is generating code
> for the interpreter, then there is no increased work trade-off. One way to
> reduce the number of bytecodes is to reduce the number of types being used,
> e.g., only one integer size. Those that are too small get promoted. Those
> that are too large get truncated or eliminated from the language. This
> reduces the number of operations, i.e., only one "add" routine, instead of
> an "add" routine for various sized types.
>

MSIL uses a single 'add' opcode which is type-neutral...
JBC has opcodes specialized for each type...


oddly, my bytecode formats have tended towards a largish number of
different operations, and frequently a single byte is not enough.

granted, this may be because my operations tend to often represent
higher-level and more complex operations (vs a smaller number of primitive
operations).

so, I typically allow multi-byte opcodes (usually with a scheme which
can handle 12-14 opcode bits...). I have personally never exceeded the
limits of a 12 or 14 bit opcode space.


I plan to skip out on type-specialized bytecode operations as they tend
to be a lot more painful and hinder functionality much worse than would
using type-generic bytecodes.

another reason:
often, a lot of type info will not be readily visible until JIT time,
and requiring this info to be fully visible prior to JIT hinders what
can be done in the HLL (particularly when interacting with different
languages, where providing for complete type info at this stage is...
difficult).


the cost is that it does put a limitation on a naive direct interpreter:
it would have to implement dynamic type semantics in order to work.

a JIT or threaded implementation can, however, figure out the types, and
use a more efficient implementation.

say:
int i, j, k;
...
k=i+j;

may be something like:
load #i
load #j
add
store #k


a naive interpreter would need to handle dynamic types in all of these
operations, whereas a threaded interpreter could notice that each uses a
statically-determinable type, and so use type-specific handlers
internally (it picks out the function pointers most appropriate for the
given types).

hence, the matter of being able to support type specialization (without
type-specific opcodes or a JIT) is the main merit for the interpreter at
present.
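
a contrived sketch of what the translator side of that could look like
(the type enum, names, and handlers here are invented purely for
illustration, not taken from my actual code):

#include <stdio.h>

/* at bytecode -> threaded-code translation time, pick a type-specialized
   handler when both operand types are statically known, otherwise fall
   back to the generic dynamically-typed one. */

typedef enum { T_INT, T_DOUBLE, T_DYNAMIC } Type;

typedef void (*Op)(void);

static void op_add_int(void)     { puts("add (int fast path)"); }
static void op_add_double(void)  { puts("add (double fast path)"); }
static void op_add_generic(void) { puts("add (generic/dynamic)"); }

static Op select_add(Type a, Type b)
{
    if (a == T_INT && b == T_INT)       return op_add_int;
    if (a == T_DOUBLE && b == T_DOUBLE) return op_add_double;
    return op_add_generic;
}

int main(void)
{
    /* "int i, j, k; k = i + j;" -- both types known, so the translator
       would thread in the int-specific handler for the 'add'. */
    Op add_ij = select_add(T_INT, T_INT);
    add_ij();

    /* mixing a declared double with a dynamic 'var' falls back. */
    Op add_dyn = select_add(T_DOUBLE, T_DYNAMIC);
    add_dyn();
    return 0;
}

in a real translator the selected pointer would simply be appended to the
threaded form, so the dispatch loop never re-checks the types at run time.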


a JIT will likely handle the types simply as part of the JIT process
(since dynamic types are notably expensive to handle in the produced
machine code).

however, in many cases dynamic types can be eliminated, as they often
"shake out of the mix" somewhere along the path (even in many cases
where the code would otherwise appear to be dynamically typed...).


or such...

Torben Ægidius Mogensen

Dec 14, 2010, 4:58:05 AM
"Rod Pemberton" <do_no...@notreplytome.cmm> writes:


> Does one really need 256 bytecodes? While that doesn't seem like much, it
> is enough to give one headaches. You have to write 256 routines, maintain
> 256 routines, figure out when and where it's optimal to use all 256
> routines... I think you'll find that only a small quantity are used
> frequently. ISTM that around 32 to 40 or less is goal.

You can certainly do with less than 256 byte codes, but while it is true
that your byte-code interpreter needs 256 routines, this can be an
optimisation. For example, you can have add1 and sub1 instructions in
addition to a general add instruction. These would be quite common in a
lot of code and are faster to interpret than a sequence of a constant
load and a normal add.
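
(Purely as an illustration of the loader-side half of this, here is a toy
peephole pass in C that fuses a constant-1 load followed by an add into a
single add1. The opcode numbers are arbitrary and not from any particular
VM.)

#include <stdio.h>

/* rewrite "push constant 1; add" into a single add1 instruction
   at load time; opcode numbers are made up for the example. */

enum { OP_HALT = 0, OP_LDC1 = 1, OP_ADD = 2, OP_ADD1 = 3 };

static int peephole(unsigned char *code, int n)
{
    int i, j = 0;
    for (i = 0; i < n; i++) {
        if (code[i] == OP_LDC1 && i + 1 < n && code[i + 1] == OP_ADD) {
            code[j++] = OP_ADD1;   /* fuse the pair */
            i++;                   /* skip the consumed ADD */
        } else {
            code[j++] = code[i];
        }
    }
    return j;   /* new (shorter) length */
}

int main(void)
{
    unsigned char code[] = { OP_LDC1, OP_ADD, OP_LDC1, OP_ADD, OP_HALT };
    int n = peephole(code, 5);
    for (int i = 0; i < n; i++)
        printf("%d ", code[i]);
    printf("\n");   /* prints: 3 3 0 */
    return 0;
}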

I have seen 16-bit "bytecode" that exploits the extra bits to encode a
lot of common special cases to get faster interpretation.

Of course, if you intend to compile your byte code to machine code
instead of interpreting it, I agree that it is better to use a few
general instructions and not add special cases -- the code you generate
from add1 and a combination of a constant load and an add should be the
same, so there is no need for special-casing.

Torben

Marco van de Voort

Dec 14, 2010, 5:58:26 AM
On 2010-12-12, James Harris <james.h...@googlemail.com> wrote:
> I've been thinking of this for some time but wasn't sure there was
> much interest. Given the surprising recent activity on comp.lang.misc
> now is possibly a good time to raise the idea. Two questions:
>
> 1. Does anyone else here have an interest in setting up a website or
> wiki on the topic of programming language design, or is there an
> existing site that's good enough to join and welcomes new authors?
>
> 2. Whether you would want write access or not do you have any thoughts
> on what info should be included?

I would not make it a wiki in the Wikipedia sense, with lemmas that try
to combine the opinions of all. The matter is too subjective.

I would rather work from independent case studies in a similar format
(e.g. recurring paragraphs like objectives of the language, design of the
language, implementation aspects, practical issues during use, lessons
learned).

> Whether a web site or a wiki my feeling is that it should probably be
> for a closed group of people to update - rather than world-writeable.

I'd lean towards a static website generated from a revision system. This
allows users to write the bulk offline, and keeps all the data in something
that can be backed up tightly.

But a wiki is possible too if somebody is deeply versed in administrating
one (I'm not).

Marco van de Voort

Dec 14, 2010, 6:54:53 AM
On 2010-12-14, Rod Pemberton <do_no...@notreplytome.cmm> wrote:
> AIUI, most languages represent "false" as zero, and "true" as not false.

False as zero is very common. "true" as not false is way less common. E.g.
all Wirthian languages define true as "1", and the other values are
undefined.

I think that many strongly typed languages can have such a scheme (since
such languages don't allow anything but 0,1 without explicit overrides).

BGB

Dec 14, 2010, 12:29:02 PM
On 12/14/2010 2:58 AM, Torben Ægidius Mogensen wrote:
> "Rod Pemberton"<do_no...@notreplytome.cmm> writes:
>
>
>> Does one really need 256 bytecodes? While that doesn't seem like much, it
>> is enough to give one headaches. You have to write 256 routines, maintain
>> 256 routines, figure out when and where it's optimal to use all 256
>> routines... I think you'll find that only a small quantity are used
>> frequently. ISTM that around 32 to 40 or less is goal.
>
> You can certainly do with less than 256 byte codes, but while it is true
> that your byte-code interpreter needs 256 routines, this can be an
> optimisation. For example, you can have add1 and sub1 instructions in
> addition to a general add instruction. These would be quite common in a
> lot of code and are faster to interpret than a sequence of a constant
> load and a normal add.
>

things such as the level of abstraction may also affect things.
for example, a bytecode representing operations at a level of abstraction
relatively similar to the language syntax, and supporting multiple
programming languages, may end up with a few more opcodes than one which
instead focuses on providing a more detailed computational model and
decomposing most complex operations into primitive operations (or which
deals with a language with fewer syntax-level features).

for example, it would only require a small number of operations to
represent most of Scheme in a bytecode (maybe 'apply' and a few others),
but a bit more to represent the entire C-style operation set ('foo->x',
'foo.x', 'foo[x]', 'foo(x)', ..., 'x++', 'x--', '++x', '--x', 'x+=1',
'x-=1', ..., '(*(&x))++', ...)


when one does start type-specializing in this case they can end up with
a large number of opcodes (although in this case I am unlikely to type
specialize, as this would be of little benefit to the interpreter design
I am imagining...).


> I have seen 16-bit "bytecode" that exploits the extra bits to encode a
> lot of common special cases to get faster interpretation.
>

yeah.
one of my prior interpreters used 16 bit units, but this was (IIRC)
mostly 14 bits for the opcode number, and 2 bits to encode the number of
opcode arguments.


since then I have mostly just used bytes, usually:
0x00..0xBF: single-byte opcodes
0xC0..0xFF: multi-byte opcodes (2-byte typically)

this usually allows a 14 bit opcode space (a variant could use
0xF0..0xFF to encode 3-byte opcodes, but this is likely not needed).

most later formats don't encode the number of arguments, since usually I
use an opcode list to manage this (to know the argument format of an
opcode, it simply checks its entry in the list).
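
a simplified sketch of how such a 1-or-2-byte opcode scheme can be decoded
(this is a generic illustration of the 0xC0+ prefix idea, not the exact
encoding from any of my formats):

#include <stdio.h>

/* 0x00..0xBF are single-byte opcodes; 0xC0..0xFF are prefix bytes that
   combine with the following byte, giving roughly a 14-bit opcode space. */

static int decode_opcode(const unsigned char **pip)
{
    const unsigned char *ip = *pip;
    int op = *ip++;
    if (op >= 0xC0)
        op = 0xC0 + (((op - 0xC0) << 8) | *ip++);   /* 2-byte opcode */
    *pip = ip;
    return op;
}

int main(void)
{
    unsigned char code[] = { 0x10, 0xC1, 0x05, 0xBF };
    const unsigned char *ip = code;
    printf("%d\n", decode_opcode(&ip));   /* 0x10       -> 16  */
    printf("%d\n", decode_opcode(&ip));   /* 0xC1 0x05  -> 453 */
    printf("%d\n", decode_opcode(&ip));   /* 0xBF       -> 191 */
    return 0;
}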


> Of course, if you intend to compile your byte code to machine code
> instead of interpreting it, I agree that it is better to use a few
> general instructions and not add special cases -- the code you generate
> from add1 and a combination of a constant load and an add should be the
> same, so there is no need for special-casing.
>

yes.

however, some depends on the complexity and smartness of the JIT.

one could encode 'x++' as, say:
load #x
dup
ldc 1
add
store #x
pop

or they could encode it as, say:
postinc_s #x

a smart JIT could infer that the former could be compiled to, say:
inc dword [ebp-8]

but, the latter can go directly to handler code which is more like:
is the type of X one that I know about?
if yes, handle via specific ASM sequence;
else decompose into more primitive operations.


one could then start doing things like:
'foo[x]++', 'foo[x]--', ... as single opcodes as well, ...

say:
ldiipostinc_ss #foo, #x
ldiipostdec_ss #foo, #x
...

vs:
load #x
ldipostinc_s #foo

or vs:
load #x
load #foo
loadindex
dup
ldc 1
add
load #x
load #foo
storeindex
pop

a JIT could figure out how to optimize more primitive instructions, yes,
but with complex operations it doesn't need to, as it can jump right to
the intended place in the codegen which includes the special case logic
in question.


in the past (~ 2006), this allowed a very naive JIT to compile BGBScript
code into a form performance competitive with what I was getting out of
GCC, which was impressive at the time... (basically, a big switch table
which spit out various pre-form ASM sequences, using a few flags fields
for simplistic peephole optimizations, typically allowing the top 2
stack items to reside in registers, ...).


then, I went and wrote my C compiler, and then ended up fighting many
battles with complexity and bugs (as C proved a somewhat more complex
language to compile and work with than BGBScript).


but, yeah, higher-level opcodes do tend to lead to a larger total number
of opcodes...

tm

Dec 15, 2010, 2:33:36 AM
On 14 Dec., 10:58, torb...@diku.dk (Torben Ægidius Mogensen) wrote:

> "Rod Pemberton" <do_not_h...@notreplytome.cmm> writes:
> > Does one really need 256 bytecodes?  While that doesn't seem like much, it
> > is enough to give one headaches.  You have to write 256 routines, maintain
> > 256 routines, figure out when and where it's optimal to use all 256
> > routines...  I think you'll find that only a small quantity are used
> > frequently.  ISTM that around 32 to 40 or less is goal.
>
> You can certainly do with less than 256 byte codes, but while it is true
> that your byte-code interpreter needs 256 routines, this can be an
> optimisation.  For example, you can have add1 and sub1 instructions in
> addition to a general add instruction.  These would be quite common in a
> lot of code and are faster to interpret than a sequence of a constant
> load and a normal add.
>
> I have seen 16-bit "bytecode" that exploits the extra bits to encode a
> lot of common special cases to get faster interpretation.

This discussion is a good example of what I call "bottom
down" design: a discussion about programming language
design which quickly leads to the size and the number of
bytecodes in a VM implementing this language.

I know that discussions often go off topic, but the design
of a language should really not be determined by such
implementation details.

Does it really make sense to determine that a language
must be implemented by a VM?


Greetings Thomas Mertes

--
Seed7 Homepage: http://seed7.sourceforge.net
Seed7 - The extensible programming language: User defined statements
and operators, abstract data types, templates without special
syntax, OO with interfaces and multiple dispatch, statically typed,
interpreted or compiled, portable, runs under linux/unix/windows.

BGB

Dec 15, 2010, 3:29:44 AM
On 12/15/2010 12:33 AM, tm wrote:
> On 14 Dez., 10:58, torb...@diku.dk (Torben Ægidius Mogensen) wrote:
>> "Rod Pemberton"<do_not_h...@notreplytome.cmm> writes:
>>> Does one really need 256 bytecodes? While that doesn't seem like much, it
>>> is enough to give one headaches. You have to write 256 routines, maintain
>>> 256 routines, figure out when and where it's optimal to use all 256
>>> routines... I think you'll find that only a small quantity are used
>>> frequently. ISTM that around 32 to 40 or less is goal.
>>
>> You can certainly do with less than 256 byte codes, but while it is true
>> that your byte-code interpreter needs 256 routines, this can be an
>> optimisation. For example, you can have add1 and sub1 instructions in
>> addition to a general add instruction. These would be quite common in a
>> lot of code and are faster to interpret than a sequence of a constant
>> load and a normal add.
>>
>> I have seen 16-bit "bytecode" that exploits the extra bits to encode a
>> lot of common special cases to get faster interpretation.
>
> This discussion is a good example of, what I call: Bottom
> down design. A discussion about programming language
> design which quickly leads to the size and the number of
> bytecodes in a VM implementing this language.
>
> I know that discussion often go off topic, but the desingn
> of a language should really not be determined by such
> implementation details.
>

however, this type of concern does often somewhat influence how the
thing will behave...

for example, with a statically-typed language, the question of when and
how types are handled has a notable impact on the overall behavior of
what sorts of languages may be implemented, where for example:
earlier handling of types tends to be more rigid, but often catches
issues sooner;
later handling of types tends to allow more flexibility, but may in some
cases also delay detecting type errors.


the number of opcodes is not so much of an issue IMO, but some people
are really into minimizing the number of opcodes:
a smaller number leads to a simpler design with more "elegance" (I don't
entirely agree);
a larger number means it is easier to get more speed out of a naive
interpreter or JIT, but does usually add some complexity and rarely are
these features entirely orthogonal (usually, the "common special case"
is king).


all this may still have more overall effect than does even the matters
of syntax, ...


> Does it really make sense to determine that a language
> must be implemented by a VM?
>

very often a language will be...

very often, if it isn't, a bytecode is a good stopping point between the
language-specific compiler and the logic to target a specific output CPU
or similar...

hence, everything above the bytecode is mostly concerned with the source
language, and everything below is concerned with the target architecture.


my current architecture actually has several intermediate stages:
source -> GAST;
GAST -> bytecode;
bytecode -> ASM;
ASM -> ...

Aaron Gray

Dec 15, 2010, 1:38:22 PM
"Rod Pemberton" <do_no...@notreplytome.cmm> wrote in message
news:ie70fu$tio$1...@speranza.aioe.org...

> "BGB" <cr8...@hotmail.com> wrote in message
> news:4D0676D9...@hotmail.com...
>>
>> basically, I am frustrated with Java and the JVM architecture, ...
>
> Limitations? Pointers?

Lock-in technology.

Self has 7. Basically different types of message send.

Aaron

Aaron Gray

Dec 15, 2010, 1:40:33 PM
"BGB" <cr8...@hotmail.com> wrote in message
news:ie7da4$4iv$1...@news.albasani.net...

> On 12/13/2010 10:51 PM, Rod Pemberton wrote:
>> "BGB"<cr8...@hotmail.com> wrote in message
>> news:4D0676D9...@hotmail.com...

Depends on caching and how many different VMs you have running on the
machine.

Aaron

BartC

Dec 15, 2010, 3:20:49 PM

"Rod Pemberton" <do_no...@notreplytome.cmm> wrote in message
news:ie70fu$tio$1...@speranza.aioe.org...

> Does one really need 256 bytecodes? While that doesn't seem like much, it
> is enough to give one headaches. You have to write 256 routines, maintain
> 256 routines, figure out when and where it's optimal to use all 256
> routines... I think you'll find that only a small quantity are used
> frequently. ISTM that around 32 to 40 or less is goal.

256 is not a big number. If the code has to exist somewhere, it will just
exist as 256 smallish chunks, instead of 32 bigger ones (which then have to
waste time determining which of 8 possibilities they should be executing).

And bytecode handlers will all have the same pattern so that an empty
framework can be created automatically.

I really can't see the number of bytecodes used by an interpreter as being
any sort of issue. I liked my interpreters to work briskly, so I tended to
use hundreds (until the number threatened to go into the thousands, then it
was clear I should really have been generating native code).

--
Bartc

Jacko

Dec 15, 2010, 5:36:05 PM
On 14 Dec, 05:50, "Rod Pemberton" <do_not_h...@notreplytome.cmm>
wrote:

> "BartC" <b...@freeuk.com> wrote in message
>
> news:ie6dpt$9mt$1...@news.eternal-september.org...
>
> > "Jacko" <jackokr...@gmail.com> wrote in message

> >news:22f1dd43-c488-4736...@m20g2000prc.googlegroups.com...
>
> > > * It's not goig to use the number 0 as a number. As 0 is NaN in many
> > > contexts.
>
> > This language doesn't have zero? I can see that causing some difficulties
> > for those used to languages that do have it...

Maybe, but a good break from convention is needed.

> How do you represent "false" and "true"?  Does anything use them?  e.g.,
> conditionals like if else while etc.

+ is true, - is false, 0 is a null value.

> AIUI, most languages represent "false" as zero, and "true" as not false.

true is not false, and without zero in the way "not" is just a sign
change.

cheers jacko

BGB

Dec 15, 2010, 11:46:13 PM

yep...

my bytecode formats have also often tended towards a moderately largish
number of opcodes.

either way, my formats tend to be bigger than JBC but smaller than x86,
typically having a few hundred opcodes.


recently, I am designing a new language specifically to work with a
threaded-code interpreter (basically, it is ending up looking mostly
like a hybrid of Java, C#, and ActionScript3).

note that the language design is statically typed...


a quick example of currently imagined syntax:

package myapp.foo
{
    import BS2.IO;

    struct Point
    {
        double x, y;
    }

    delegate void DoSomethingFunc();

    public class Bar extends Object
    {
        private Point org;

        public Bar(double x, double y)
            { org.x=x; org.y=y; }
        public Bar()
            { this(0, 0); }

        public Point get Org()
            { return org; }
        public Point set Org(value)
            { org=value; }

        public double get X()
            { return org.x; }
        public double get Y()
            { return org.y; }

        public double set X(value)
            { org.x=value; }
        public double set Y(value)
            { org.y=value; }
    }

    public interface IFoo
    {
        public double get X();
        public double set X(value);
        public double get Y();
        public double set Y(value);
        public Point get Org();
        public Point set Org(value);

        public DoSomethingFunc doSomething;
    }

    public class Foo extends Object implements IFoo
    {
        delegate Bar bar;

        public DoSomethingFunc doSomething;

        public Foo()
        {
            bar=new Bar();
            doSomething=doSomethingDefault;
        }

        private void doSomethingDefault()
        {
            this.X=32;
            this.Y=64;

            doSomething=doSomethingElse;
        }

        private void doSomethingElse()
        {
            this.X=0;
            this.Y=0;

            doSomething = fun void()
                { doSomething=doSomethingDefault; };
        }

        public void doSomethingDifferent(ref Point pt)
        {
            Point tmp;
            tmp=org;
            org=pt;
            pt=tmp;
        }
    }

    public class Baz
    {
        public static void main(string[] args)
        {
            IFoo foo;
            Foo foo1;
            Point pt0, pt1, pt2;
            *Point ppt;    //pointer

            foo=new Foo();
            pt0=foo.Org;
            foo.doSomething();
            pt1=foo.Org;
            foo.doSomething();
            pt2=foo.Org;

            foo1=foo;
            foo1["doSomethingDifferent"](ref pt0);

            Console.WriteLine("Pt0 {0} {1}", pt0.x, pt0.y);
            Console.WriteLine("Pt1 {0} {1}", pt1.x, pt1.y);
            Console.WriteLine("Pt2 {0} {1}", pt2.x, pt2.y);

            ppt=&pt0;    //get address

            Console.WriteLine("PPt {0} {1}", ppt.x, ppt.y);
        }
    }
}


language should probably be within my ability to implement.
the spec does include generics though which are a bit uncertain.
I left out unions as they pose issues...


or such...


Marco

Dec 16, 2010, 8:12:57 AM
How about http://en.wikibooks.org/wiki/Subject:Computing ?

Start by creating a table of contents

Since this is meant to be a truly shared site, it would be a good choice.

For general language design discussion - I am partial to comp.lang.misc and/or comp.compilers

Rod Pemberton

Dec 16, 2010, 3:59:46 PM
"BartC" <b...@freeuk.com> wrote in message
news:ieb80k$mrv$1...@news.eternal-september.org...

> "Rod Pemberton" <do_no...@notreplytome.cmm> wrote in message
> news:ie70fu$tio$1...@speranza.aioe.org...
>
> > Does one really need 256 bytecodes? While that doesn't seem like much, it
> > is enough to give one headaches. You have to write 256 routines, maintain
> > 256 routines, figure out when and where it's optimal to use all 256
> > routines... I think you'll find that only a small quantity are used
> > frequently. ISTM that around 32 to 40 or less is goal.
>
> 256 is not a big number. If the code has to exist somewhere, it will just
> exist as 256 smallish chunks, instead of 32 bigger ones (which then have to
> waste time determining which of 8 possibilities they should be executing).
>
> And bytecode handlers will all have the same pattern so that an empty
> framework can be created automatically.
>
> I really can't see the number of bytecodes used by an interpreter as being
> any sort of issue. I liked my interpreters to work briskly,

Did using more speed them up? Or, did using more just allow you locate an
optimal small set of instructions out of the many?

> so I tended to
> use hundreds (until the number threatened to go into the thousands, then
it
> was clear I should really have been generating native code).
>

It's likely that many of these instructions represent optimized combinations
of other instructions. In which case, they are infrequently used and offer
a very small speed gain when used, i.e., negligible. I guess I'm not clear on
how instructions which have very low frequency of usage will speed up an
interpreter when they aren't used that much. It's the most frequently used
instructions that consume most of the time, right? That set is likely to be
much smaller.


Rod Pemberton


BartC

Dec 16, 2010, 5:46:28 PM
"Rod Pemberton" <do_no...@notreplytome.cmm> wrote in message
news:iedufj$m0v$1...@speranza.aioe.org...

> "BartC" <b...@freeuk.com> wrote in message

>> I really can't see the number of bytecodes used by an interpreter as being
>> any sort of issue. I liked my interpreters to work briskly,
>
> Did using more speed them up? Or, did using more just allow you locate an
> optimal small set of instructions out of the many?

>> so I tended to
>> use hundreds (until the number threatened to go into the thousands, then it
>> was clear I should really have been generating native code).
>
> It's likely that many of these instructions represent optimized combinations
> of other instructions. In which case, they are infrequently used and offer
> a very small speed gain when used, i.e., negligible. I guess I'm not clear on
> how instructions which have very low frequency of usage will speed up an
> interpreter when they aren't used that much. It's the most frequently used
> instructions that consume most of the time, right? That set is likely to be
> much smaller.

In one design (using dynamic typing so only about 150 bytecodes), memory
operands were of two kinds: static and frame. Each bytecode accessing memory
needed two versions. Some bytecodes with two memory accesses might need
four!

I think there were 8 bytecodes just for handling for-loop iterations:
selected combinations of up/down, memory/const, memory/memory, with
static/frame variations. This meant an empty for-loop was very fast; in fact
somewhat faster than unoptimised C using gcc (otherwise 4 x as slow, which
is not bad for an interpreted, dynamic language).

Then, many arithmetic operators would have their own bytecode, as well as
common instructions such as push 0 or push 1.

Sometimes I used profiling to find which bytecodes needed speeding up.
Perhaps the number of bytecodes could be doubled, but it was clear the
returns were limited (as you say, many would be common combinations, which
just served to reduce dispatch overheads).

There was another design, with over 300 *base* bytecodes, but the language
could add type-hints to the dynamic variables, so that each bytecode could
expand at loadtime to one of several more specialised codes. This was the
one where I ended up with 700-800 'bytecodes' before I decided to try another
approach.

(It was pretty fast however: about 3 to 10 x slower than *optimised* C, on
integer benchmarks, but using some static type hints. Dynamic typing
generally halved the speed.)

--
Bartc

BGB

Dec 16, 2010, 10:36:00 PM

very often, an interpreter will start bogging down at the main switch()
statement.

once this happens, the only real option left (while still using a giant
switch statement) is to start using compound opcodes to deal with many
common sequences of opcodes.

hence, by using compound opcodes, the amount of time dependent on
jumping through the switch is reduced (and more power is available for
actual logic).


an example of this is consider one has a sequence of several opcodes:
load #x
load #y
cmp_le
jmp_true L0

if this sequence takes place often, then some speed can be gained by
using shortcuts, say:
load #x
load #y
jmp_le L0

or, even:
jldcmple #x, #y, L0

now, as opposed to needing 3 or 4 dispatches, one only needs a single
dispatch...


although, since then I have discovered that there is another strategy:
one can translate the bytecode into a sequence of function pointers.

in itself, this wouldn't amount to much, but it offers a nifty trick:
it is possible to recognize special cases in the interpreter, and then
start writing special-case function pointers specifically for these cases.

for example, one can recognize types, and use a type-specialized version
of an operation rather than having to type-check the thing, or start
merging opcodes during the rewrite.


and the big switch of opcode interpretation logic and generic operations
becomes a big switch to find the most appropriate function pointer to
handle a given instruction trace...


so, this way, one can just do something like:
load #x
load #y
cmp_le
jmp_true L0


and then the interpreter may recognize this sequence for the integer and
floating-point cases, and may instead simply emit a single function
pointer (and fall back to the generic case if it doesn't recognize the
types or operations involved).
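
a rough sketch of what that recognition step could look like (all the
opcode names, types, and handlers here are invented for the example; a
real translator would append the chosen pointer to the thread rather
than call it directly):

#include <stdio.h>

/* if a "load; load; cmp_le; jmp_true" window is seen and the loaded
   variables are known to be integers, emit one fused handler instead
   of four generic ones. */

enum { OP_LOAD, OP_CMP_LE, OP_JMP_TRUE, OP_ADD, OP_END };
enum { T_INT, T_DYNAMIC };

typedef void (*Handler)(void);

static void h_fused_jmp_le_int(void) { puts("emit: fused int cmp_le+jmp"); }
static void h_generic(void)          { puts("emit: generic op"); }

/* returns how many source opcodes the emitted handler consumed */
static int translate_one(const int *ops, int n, const int *var_type,
                         Handler *out)
{
    if (n >= 4 &&
        ops[0] == OP_LOAD && ops[1] == OP_LOAD &&
        ops[2] == OP_CMP_LE && ops[3] == OP_JMP_TRUE &&
        var_type[0] == T_INT && var_type[1] == T_INT)
    {
        *out = h_fused_jmp_le_int;
        return 4;
    }
    *out = h_generic;   /* fall back: one generic handler per opcode */
    return 1;
}

int main(void)
{
    int ops[] = { OP_LOAD, OP_LOAD, OP_CMP_LE, OP_JMP_TRUE, OP_ADD, OP_END };
    int types[] = { T_INT, T_INT };   /* declared types of #x and #y */
    int i = 0, n = 6;

    while (i < n) {
        Handler h;
        i += translate_one(ops + i, n - i, types, &h);
        h();   /* here we just show which handler got picked */
    }
    return 0;
}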

I effectively discovered a lot of this while implementing an x86
interpreter, since there is a lot of crap going on in x86 which can
simply be left out or short-cut in many cases.

another advantage is that the micro-optimizing logic can be kept much
closer to the interpreter, rather than having to have the compiler know
about all of it, and it more cleanly deals with the issue that what are
good design tradeoffs for an interpreter are not necessarily the same as
those for a JIT compiler, ...


for example, an interpreter might need to handle all of these compound
cases to be efficient, but a JIT may instead rely on simpler operations
and peephole optimization, for example...


more so, it also simplifies implementing a hybrid strategy, whereby
certain opcodes and traces could be replaced by chunks of
custom-generated machine code, without the interpreter needing to care
as much which parts are simply plain C code and which parts were JITed, ...


or such...

BartC

Dec 17, 2010, 4:16:16 PM
"BGB" <cr8...@hotmail.com> wrote in message
news:ieelrj$onf$1...@news.albasani.net...

> although, since then I have discovered that is another strategy:
> one can translate the bytecode into a sequence of function pointers.
>
> in itself, this wouldn't amount to much, but it offers a nifty trick:
> it is possible to recognize special cases in the interpreter, and then
> start writing special-case function pointers specifically for these cases.
>
> for example, one can recognize types, and use a type-specialized version
> of an operation rather than having to type-check the thing, or start
> merging opcodes during the rewrite.
>
>
> and the big switch of opcode interpretation logic and generic operations
> becomes a big switch to find the most appropriate function pointer to
> handle a given instruction trace...
>
>
> so, this way, one can just do something like:
> load #x
> load #y
> cmp_le
> jmp_true L0
>
>
> and then the interpreter may recognize this sequence for the integer and
> floating-point cases, and may instead simply emit a single function
> pointer (and fall back to the generic case if it doesn't recognize the
> types or operations involved).

How does the recognition work? Wouldn't it still need to test operands for
both integer or both floating point? How can it avoid those tests next time
round, when the operands could be different? (I'm assuming type inference,
at any level, is not being used.)

--
Bartc

BGB

Dec 17, 2010, 9:33:43 PM

type inference is possible...

more likely, it would be because x and y were declared as having certain
types (IOW: manifest types...).

or, sub-versions of the function can be generated for each set of
argument types it encounters, causing types in many cases to be
statically-determined (I guess some SmallTalk implementations did this).

...


if it really can't be determined, then dynamic types may be used
(although super-opcodes may still be usable...).


or such...

James Harris

Dec 18, 2010, 8:31:51 AM
On Dec 13, 11:03 am, James Harris <james.harri...@googlemail.com>
wrote:

* As another suggestion for what would make appropriate content for a
website on programming language design, how about topical notes
somewhat like the tutorials on operating systems which are hosted at

http://www.osdever.net/tutorials/index.

The notes on programming language issues could be similarly attributed
to whoever wrote them. I guess that topics could include anything
relevant to design or implementation of programming languages.

Would they be called tutorials? I think of a tutorial as describing
something mechanical - i.e. how to do something. When talking about
design, maybe "essays" is a better term.

They don't need to be long and wordy. Thinking about it, there have
been various words of wisdom written on this newsgroup that I think
have been good enough for a more stable medium. (Although Google
retains a Usenet archive it's not indexed by topic and the insightful
comments can get lost in among other posts.) Placing such information
on a website would make it stand out more.

If the website kept to using URLs that don't need to change then
Usenet posts could link to them when appropriate.

Any opinions on the above? Feel free to say.

James

James Harris

Dec 18, 2010, 9:02:43 AM
On Dec 13, 6:26 am, BGB <cr88...@hotmail.com> wrote:

> On 12/12/2010 3:56 PM, James Harris wrote:
>
> > I've been thinking of this for some time but wasn't sure there was
> > much interest. Given the surprising recent activity on comp.lang.misc
> > now is possibly a good time to raise the idea. Two questions:
>
> > 1. Does anyone else here have an interest in setting up a website or
> > wiki on the topic of programming language design, or is there an
> > existing site that's good enough to join and welcomes new authors?
>
> cool idea...

Thanks.

> I don't know of any like this, and personally lack either the expertise
> or resources to manage something like this.
>
> a wiki running off an old laptop (serving as the server) on a slow DSL
> connection with dynamic DNS where half the time either the router
> doesn't route HTTP or the DNS is stale because the ISP causes the IP to
> change often... yeah, this would turn out well...
>
> but, yeah, if something like this was around, this would be nifty...

It doesn't need to be technical. We could put the content on something
like either Wikispaces or Google Sites. In both cases they carry out
the hosting and DNS etc. And both seem to have a stable business model
that wouldn't involve any costs to us.

> well, there is FONC / VPRI:
> http://en.wikipedia.org/wiki/Viewpoints_Research_Institute
>
> but, activity there seems far less than even here on comp.lang.misc...
>
> and, also, their stuff is in "academic paper format", which is IMO far
> more effort than I am willing to bother with (it asks enough for me to
> even summarize thoughts into a form which would make a good article of
> blog post...).
>
> hell, maybe I almost need a blog?... it would probably still do a better
> way of getting some of my ideas out there than either usenet or my own
> projects is proving to be...

You know, thinking about the way you've described your thoughts in
your posts as they have developed, a blog might very well suit you.
A blog would give you and anyone else a way to look back over how your
ideas have evolved. As I just mentioned in another post, stuff posted
to Usenet tends to get lost amongst everything else.

I have never looked for one but there are probably free blog-hosting
sites out there.

> > 2. Whether you would want write access or not do you have any thoughts
> > on what info should be included?
>

> well, at this point I am a little more interested in matters of VM
> design than of language design per-se, as at this point much of language
> design boils down to "how can I make something absurd and push it off as
> a good idea" (since most of what would make a "good" language design, in
> general, is apparent mostly through observation of the trends in
> existing technologies).
>
> as it so happens, thinking in terms of "the language" seems to be the
> wrong level of abstraction, with the language more serving as a means of
> most usably exposing functionality.
>
> the VM is then a more interesting problem IMO, since this is the
> manifestation of the core sets of functionality which may be exposed by
> a language.
>
> meanwhile, the endless treadmill about matters of syntax and parser
> generators is no longer of much personal interest (stale...).


>
> > Whether a web site or a wiki my feeling is that it should probably be
> > for a closed group of people to update - rather than world-writeable.

> > Other opinions welcome.
>
> either way, but ideas should be easy to express, as any kind of long
> awkward process to publicize ideas limits the variety of ideas which may
> be expressed.

Yes, you are right. I've worked with Wikispaces and can report that
it's easy to use. I don't know about Google Sites but given their
market I suspect it's not too complex. Having said that, realistically
I guess any new system will have some learning associated with it.

James

James Harris

unread,
Dec 18, 2010, 11:23:41 AM12/18/10
to
On Dec 13, 7:58 am, Robbert Haarman <comp.lang.m...@inglorion.net>
wrote:

> On Sun, Dec 12, 2010 at 02:56:18PM -0800, James Harris wrote:
>
> > 1. Does anyone else here have an interest in setting up a website or
> > wiki on the topic of programming language design, or is there an
> > existing site that's good enough to join and welcomes new authors?
>
> > 2. Whether you would want write access or not do you have any thoughts
> > on what info should be included?
>
> I think it could be useful to have a website that tracks the state of the
> art in programming languages. Programming language design requires making
> a lot of decisions about a wide range of topics,

So true!

> and not everyone can be
> an expert on every topic. The result is lots of recurring discussions, and
> perhaps languages that could have been better, had their designers known
> more.
>
> So what would be interesting to me is a website that divides the subject
> of programming language design into various subtopics, and for each topic,
> lists the considerations, languages that have made the choice one way or
> another, pointers to more information (e.g. research papers), and the
> current state of the art for various dimensions.

It would be useful. Some design choices might be distinct and can thus
be expressed alone. Where they are not, and one choice relates to
others, the web should, at least in theory, be a good medium as URLs
could be used to link related topics together.

Having said that, don't you think that a top-down approach could
become a burden? The overall topic of language design is large, and
whatever structure was decided upon, there would be pressure to write
content to fill it.

What do you think about building the site bottom up, at least
initially? As we find out something useful it could go up on the web
site. Or as issues are discussed on comp.lang.misc the findings could
go on the site. I think we can thereby add something genuinely new -
i.e. new content - to the web.

As we come to a view on this newsgroup about what makes a language
good, the web site is somewhere it can be documented.

If we take this approach, any indices could be built as the content
appears and would thus always be complete in themselves.

That's not to say that anyone who wants to couldn't do as you suggest;
I just want to avoid saying that we *should* work that way. If the site
ever becomes mature enough, that may be a good time to organise the
topics and fill in the gaps.

> For example, one aspect of programming language design is type checking.
> This is a large and important topic that generates lots of discussion. It
> can be broken down into various subtopics, e.g. static vs. dynamic typing,
> weak vs. strong typing, manifest vs. implicit typing, type safety,
> polymorphism (ad-hoc as well as parametric), subtypes, dependent types,
> etc.
>
> Regarding state of the art: some topics in programming language design are
> the subject of active research, and once in a while a programming language
> or paper comes out that pushes the boundary of what has been achieved.
> Memory management, for example. Real-time garbage collectors guarantee
> minimum mutator utilization, that is, a minimum percentage of time that a
> program can spend on doing things other than memory management. But what
> is the currently highest minimum mutator utilization that has been achieved,
> and what algorithm was used?
>
> Of course, there are also a lot of places that touch on the subject of
> programming language design. This newsgroup, for one. But also:
>
>  - The C2 Wiki: http://c2.com/cgi/wiki?IdealProgrammingLanguage

Interesting link. A good goal but the ideas seem to be mixed in
together. Although it appears to be a wiki (served via CGI, which is
weird) it doesn't seem to welcome input unless it goes via the editor,
though I could be wrong. I'd like *anyone* to feel he could edit
directly. I think we could set up something that treats people more
equally. Oh no, as I wrote that it reminded me of the one-time French
aim: Liberte, Egalite, Fraternite. A good principle I guess but we'll
have to agree not to go around cutting people's heads off.

>  - Lambda the Ultimate: http://lambda-the-ultimate.org/
>
>  - Ulf Schünemann's Programming Language Design page:
>    http://web.cs.mun.ca/~ulf/pld/
>
>  - Jason Voegele's Programming Language Comparison page:
>    http://www.jvoegele.com/software/langcomp.html

Again, good sites, though they seem to put a lot of content in a blob
on the main page (and don't seem to welcome edits).

>  - Mike Austin's page about language design:
>    http://mike-austin.com/impulse/languages.html

I'm not just saying this because Mike reads the newsgroup, but ISTM
this is a better approach. It links to the fundamentals first (and is
better laid out on screen).

> There also used to be a page by someone who was attempting to chart all the
> dimensions and choices involved in designing a programming language, but
> I can't seem to find the link anymore.
>
> As you can see, there is a lot of material already out there.

I know. I've searched for info on language design a few times and
there's an amazing quantity of information out there. Lots of energy
being put into it by our peers around the world. Maybe programmers, by
their nature, are always looking to make a better machine rather than
accept an inferior one.

> I still think
> another site would be interesting. Even if it's Yet Another Page About
> Programming Languages, that's how the Web grows.

That's good. I can't see anything comparable out there in ether land.

> But it would be really
> useful to aggregate and summarize the enormous amount of material that is
> already out there. This is an enormous task which I suspect exceeds what
> a lone person can do in his copious free time,

How do you feel about the incremental approach?

> so it would probably be
> a good idea to have multiple editors, and perhaps make it publically
> editable.

Yes, I'd welcome any who want to edit. From my experience of setting
up a wiki about source code fragments,

http://codewiki.wikispaces.com

occasionally people make changes and all that's logged is their IP
address. I think it might be better to link changes to userids. Then
at least a reader can see that user X changed the page on such-and-
such a date. (The changes made can be seen in either case.)

James

James Harris

unread,
Dec 18, 2010, 12:35:31 PM12/18/10
to
On Dec 13, 1:38 pm, mansoon_...@rediffmail.com wrote:

> Nice Idea..

Thanks.

> Actually I have recently started working on such website. If you guys
> are interested we can join and work together. I have already setup a
> website and started adding content to it.

Yes, we could definitely work together. Should we share one site or
link between separate ones? My feeling is that links between them would
be best, for two reasons.

1. One is longevity of the domain name. We could set up something on a
public hosting service that we could reasonably expect to exist in ten
or twenty years' time. And nothing needs to be paid to maintain the
name. This is important if we want to link to it from Usenet postings
and elsewhere.

2. The second is that whatever we set up should be a public resource.
A private web site (such as yours, http://www.avabodh.com/) is, well,
yours! If we created a wiki on Wikispaces it would not 'belong' to any
one of us. Thus people who added content to it would have almost as
much ownership as whoever set the site up. I say "almost" as someone
has to be an admin of the site. But even that can be changed. Say I'd
set it up but I later died. Edits could continue and someone who was
willing to take over admin duties could e-mail Wikispaces support who,
after checking that I was no longer responding!, could reasign that
task.

I think that's important to encourage contributions. I guess a person
would rather contribute to a public resource than to a private one.

> I have written on C to assembly code generation and C++ object model.
> I have called these as "C Internals" and "C++ Internals".
>
> Here is what the website can possibly contains -
> - The books/tutorials/articles that we can write.
> - Link and references to other useful websites.
> - FAQ (frequently asked questions) or wiki

Good suggestions.

> - And possibly a forum.

A forum is useful but don't forget we've already got comp.lang.misc
for that.

> Actually you can find most of the things on the Internet. But my idea
> is make a website that can become one place for getting all
> informations.
> By the way, here is the link to my website-
> http://www.avabodh.com/
>
> And here is the link to on-line books that I have written -
> http://www.avabodh.com/cin/cin.html
> http://www.avabodh.com/cxxin/cxx.html

I think you are right to add your own content. Well done! Too many
places are just links to each other.

James

BGB

unread,
Dec 18, 2010, 12:40:54 PM12/18/10
to
On 12/18/2010 7:02 AM, James Harris wrote:
> On Dec 13, 6:26 am, BGB<cr88...@hotmail.com> wrote:
>> On 12/12/2010 3:56 PM, James Harris wrote:
>>
>>> I've been thinking of this for some time but wasn't sure there was
>>> much interest. Given the surprising recent activity on comp.lang.misc
>>> now is possibly a good time to raise the idea. Two questions:
>>
>>> 1. Does anyone else here have an interest in setting up a website or
>>> wiki on the topic of programming language design, or is there an
>>> existing site that's good enough to join and welcomes new authors?
>>
>> cool idea...
>
> Thanks.
>
>> I don't know of any like this, and personally lack either the expertise
>> or resources to manage something like this.
>>
>> a wiki running off an old laptop (serving as the server) on a slow DSL
>> connection with dynamic DNS where half the time either the router
>> doesn't route HTTP or the DNS is stale because the ISP causes the IP to
>> change often... yeah, this would turn out well...
>>
>> but, yeah, if something like this was around, this would be nifty...
>
> It doesn't need to be technical. We could put the content on something
> like either Wikispaces or Google Sites. In both cases they carry out
> the hosting and DNS etc. And both seem to have a stable business model
> that wouldn't involve any costs to us.
>

yeah, that is probably a decent idea...


>> well, there is FONC / VPRI:
>> http://en.wikipedia.org/wiki/Viewpoints_Research_Institute
>>
>> but, activity there seems far less than even here on comp.lang.misc...
>>
>> and, also, their stuff is in "academic paper format", which is IMO far
>> more effort than I am willing to bother with (it asks enough for me to
>> even summarize thoughts into a form which would make a good article of
>> blog post...).
>>
>> hell, maybe I almost need a blog?... it would probably still do a better
>> way of getting some of my ideas out there than either usenet or my own
>> projects is proving to be...
>
> You know, thinking about the way in your posts that you've described
> your thoughts as they have developed a blog might very well suit you.
> A blog would give you and anyone else a way to look back over how your
> ideas have evolved. As I just mentioned in another post stuff posted
> to Usenet tends to get lost in amongst everything else.
>
> I have never looked for one but there are probably free blog-hosting
> sites out there.
>

yeah...

there were several sites I have accounts on, that were listed as having
blogging capability.

namely, there was MySpace and LinkedIn...

tried using the former before, but it wasn't really good, and I had more
been using the site for personal stuff than for programming stuff. also
the new site design (Facebook-like but with much worse performance,
because apparently one really needs the web-browser to lag... well, and
the terrible memory use...).

for the latter, I couldn't find any blogging facilities...

Tumblr is also an option...

still considering though.

yeah...


did get around to mostly designing a new language, but the design is
still ongoing. this time I decided to use HTML for the spec, and this is
working out fairly well (since one can express syntax forms using
formatting, and tables are nifty for, errm, tables...).

however, unless I use HTML posts I can't post a copy/pasted spec, so
would need to put it on my site or something and provide a link.

was recently using SeaMonkey Composer as the HTML editor, since this
works fairly well...

vs MS Office or OpenOffice which produce HTML which doesn't format
correctly in ordinary browsers (and the HTML is filled with garbage),
and Visual Studio which includes an IMO somewhat awkward HTML editor.


well, was just going to send it over to the server and post a link here,
but for whatever reason my ability to copy files to it isn't working.
currently installing Windows Updates and after this will reboot the
thing and see if it will be more cooperative...

done, server not being responsive...
noted that apparently it was giving dyndns the wrong IP address (giving
it the IP from the wrong internet connection...). now I have to wait for
current IP to make its way through DNS again, grr...

grr...

my uptime sucks it seems...


when it works:
http://cr88192.dyndns.org/2010-12-15_BGBScript2.html
http://cr88192.dyndns.org/2010-12-15_BS2BC.html

for site:
http://cr88192.dyndns.org/

for the moment (should work):
http://184.99.146.104/2010-12-15_BGBScript2.html
http://184.99.146.104/2010-12-15_BS2BC.html

note: my ISP has the IP addresses changing often, with one having to pay
lots extra to not have their IP address jump all over the place...

update, setting change, now new IP is:
71.214.176.85

grr, qwest...

or such...

James Harris

unread,
Dec 18, 2010, 12:57:47 PM12/18/10
to
On Dec 14, 10:58 am, Marco van de Voort <mar...@turtle.stack.nl>
wrote:

> On 2010-12-12, James Harris <james.harri...@googlemail.com> wrote:
>
> > I've been thinking of this for some time but wasn't sure there was
> > much interest. Given the surprising recent activity on comp.lang.misc
> > now is possibly a good time to raise the idea. Two questions:
>
> > 1. Does anyone else here have an interest in setting up a website or
> > wiki on the topic of programming language design, or is there an
> > existing site that's good enough to join and welcomes new authors?
>
> > 2. Whether you would want write access or not do you have any thoughts
> > on what info should be included?
>
> I would not make it a wiki in the wikipedia sense, with lemma's that try to
> to combine the opinions of all. The matter is too subjective.

I had to look "lemma" up and having done so I'm not sure how it
applies. Can you elucidate?

> I rather would work from independant casestudies in a simular format (e.g.
> recurring paragraphs like objectives of the language, design of the
> language, implementation aspects, practical issues during use, lessons
> learned)

I looked up "simular" too. It is a word but again can't see how it
applies. You mean similar?

> > Whether a web site or a wiki my feeling is that it should probably be
> > for a closed group of people to update - rather than world-writeable.
>
> I'd lean towards a static website generated from a revision system. This
> allows users to write the bulk offline, keeps all data in something that can
> be backuped tightly.

Are you thinking of just source code or are you saying you would use
the revision system for text too? Why not just paste in a new document
to overwrite the current one?
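
To make sure I understand the suggestion, here is a minimal sketch in
Python of the sort of thing I take "a static website generated from a
revision system" to mean; the directory names, file layout and page
format are all invented for illustration:

# minimal sketch: turn a directory of plain-text pages (kept under any
# revision control system) into a static HTML site. "pages", "site"
# and the page layout are invented for the example.
import html
from pathlib import Path

SRC = Path("pages")   # a checked-out working copy of the text
OUT = Path("site")    # the generated site, ready to upload

def build():
    OUT.mkdir(exist_ok=True)
    for page in sorted(SRC.glob("*.txt")):
        title = page.stem.replace("_", " ")
        body = html.escape(page.read_text(encoding="utf-8"))
        (OUT / (page.stem + ".html")).write_text(
            "<html><head><title>%s</title></head>"
            "<body><h1>%s</h1><pre>%s</pre></body></html>" % (title, title, body),
            encoding="utf-8",
        )

if __name__ == "__main__":
    build()

The idea, if I've read you right, is that contributors edit the
plain-text pages offline in the revision system, and the HTML is
regenerated and uploaded whenever the repository changes - which would
also give the tight backups you mention.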

> But wiki is possible too if sb is deeply versed into administrating one (I'm
> not).

IMHO either a wiki or a website would be fine as long as any who
want to edit are able to do so. I have tried wiki-site.com. I like
their page editor but the site has (or at least had) graphical adverts
which I think detract from the material. So the two options I am
aware of are Wikispaces and Google Sites. I am happy to try another,
though.

James

James Harris

unread,
Dec 18, 2010, 12:58:37 PM12/18/10
to
On Dec 16, 1:12 pm, Marco <prenom_no...@yahoo.com> wrote:
> How about http://en.wikibooks.org/wiki/Subject:Computing ?

>
> Start by creating a table of contents
>
> Since this is meant to be truly a shared site it would be a good choice.
>
> For general language design discussion - I am partial to comp.lang.misc and/or comp.compilers

Yes, I'm not proposing any other forum. We could still use
comp.lang.misc and comp.compilers for any discussions.

James

BGB

unread,
Dec 18, 2010, 1:00:01 PM12/18/10
to

not thought much into it, but it seems to make sense...


> James

Jacko

unread,
Dec 19, 2010, 6:10:07 PM12/19/10
to
I decided the best forum for the language pHone will be Facebook. I
think a dropbox.com public file could be the best distribution method
for 'simple' users. All the other distro methods require too much
inside knowledge.

http://www.facebook.com/home.php?sk=lf#!/home.php?sk=group_134993063226228&ap=1

I've tried Google Sites and, although it's not bad, it does have
organization issues. It does allow open comments, but not open edits.

Cheers Jacko

http://sites.google.com/site/jackokring

James Harris

unread,
Dec 19, 2010, 7:15:50 PM12/19/10
to
On Dec 19, 11:10 pm, Jacko <jackokr...@gmail.com> wrote:
> I decided the best forumfor the language pHone will be facebook. I
> think a dropbox.com public file could be the best distribution method
> for 'simple' users. All the other distro methods require too much
> inside knowledge.
>
> http://www.facebook.com/home.php?sk=lf#!/home.php?sk=group_1349930632...

I've never used Facebook for anything. I'm not even sure what it is! I
tried the link but it says "You must log in to see this page."

James

Robbert Haarman

unread,
Dec 20, 2010, 1:55:36 AM12/20/10
to
Hi James,

On Sat, Dec 18, 2010 at 08:23:41AM -0800, James Harris wrote:

> What do you think about building the site bottom up, at least
> initially? As we find out something useful it could go up on the web
> site. Or as issues are discussed on comp.lang.misc the findings could
> go on the site. I think we can thereby add something genuinely new -
> i.e. new content - to the web.

I think we should do whatever works. As you pointed out, working from a
structure that has been set out up front is quite limiting. I think
structure is very useful, but the most important thing is having valuable
content. So I am all in favor of letting users add whatever they feel
like adding.

On the other hand, lots of interesting content without the structure
is already out there. In c.l.m, and also on the sites I linked to. So my
idea would be to add value by providing organization: the content is out
there, now let's get it organized so one can quickly get an overview of
what has already been written.

In my view, we can have it both ways: allow people to add whatever they
want and, in parallel, try to organize and condense things to provide
a quick overview of the state of the art. And we could bootstrap this by
using the material that is already out there.

I've found C2 Wiki very valuable for that purpose, although it's a bit
too discussion-like for what I have in mind. They do try to summarize
and organize, and it works to the extent that I get an overview of what
has been figured out more quickly from C2 than from anywhere else. It's
also good that they keep the discussion accessible, but IMO Wikipedia
does that better by keeping it out of the main article. In any case,
if we can all edit, we can summarize what's already there, contribute
our own new content, and provide whatever organization we want, all
at the same time.

> Yes, I'd welcome any who want to edit. From the experience of setting
> up a wiki about source code fragments,
>
> http://codewiki.wikispaces.com
>
> occasionally people make changes and all that's logged is their IP
> address. I think it might be better to link changes to userids. Then
> at least a reader can see that user X changed the page on such-and-
> such a date. (The changes made can be seen in either case.)

There is a trade-off here. I prefer having people use handles rather
than being completely anonymous, but requiring authentication does present
a barrier to entry. Perhaps this can be sweetened somewhat by interoperating
with OpenID.

Regards,

Bob

--
For a list of the ways which technology has failed to improve our
quality of life, press 3.


James Harris

unread,
Dec 21, 2010, 5:22:07 AM12/21/10
to
On Dec 20, 6:55 am, Robbert Haarman <comp.lang.m...@inglorion.net>
wrote:

> On Sat, Dec 18, 2010 at 08:23:41AM -0800, James Harris wrote:

> > What do you think about building the site bottom up, at least
> > initially? As we find out something useful it could go up on the web
> > site. Or as issues are discussed on comp.lang.misc the findings could
> > go on the site. I think we can thereby add something genuinely new -
> > i.e. new content - to the web.
>
> I think we should do whatever works. As you pointed out, working from a
> structure that has been set out up front is quite limiting. I think
> structure is very useful, but the most important thing is having valuable
> content. So I am all in favor of letting users add of whatever they feel
> like adding.

I agree. As long as it's about programming language design (and legal,
decent, honest and truthful! - is that the phrase?) I'd say it would
be appropriate. In this context "design" would include implementation
and anything else that goes along with designing a new language.

> On the other hand, lots of interesting content without the structure
> is already out there. In c.l.m, and also on the sites I linked to. So my
> idea would be to add value by providing organization: the content is out
> there, now let's get it organized so one can quickly get an overview of
> what has already been written.

Yes, that sounds good. It's a lot of work - but I know you know that.

> In my view, we can have it both ways: allow people to add whatever they
> want, and, parallelly, try to organize and condense things to provide
> a quick overview of the state of the art. And we could bootstrap this by
> using the material that is already out there.

Agreed. I'd be quite happy for the site to be used for whatever is on
topic.

The one thing I'd like to avoid is the site having lots of markers
saying "under construction". They are a favourite of sites which never
get completed and always look to me like someone had an idea but never
found the energy to complete it. Many of the under construction
notices have got cobwebs on them!

I don't mind wiki-style red links (or whatever colour) that link to
possibly-future content. They are different in that at least they
provide information in themselves by appearing in the context of the
text in which they are found. (Hope that makes sense.)

At any rate, is it OK with you if we avoid too many under construction
signs by building up indexing - if there is any - as we go along?

> I've found C2 Wiki very valuable for that purpose, although it's a bit
> too discussion-like for what I have in mind. They do try to summarize
> and organize, and it works to the extent that I get an overview of what
> has been figured out more quickly from C2 than from anywhere else. It's
> also good that they keep the discussion accessible, but IMO Wikipedia
> does that better by keeping it out of the main article. In any case,
> if we can all edit, we can summarize what's already there, contribute
> our own new content, and provide whatever organization we want, all
> at the same time.

The C2 wiki is growing on me. It's better than I thought but takes a
while to work out what's where. It's more organic than organised. Not
necessarily a bad thing but takes a bit of getting used to. Like you,
though, I think it's a bit too discussion-like and it's somewhat
anarchic.

> > Yes, I'd welcome any who want to edit. From the experience of setting
> > up a wiki about source code fragments,
>
> >  http://codewiki.wikispaces.com
>
> > occasionally people make changes and all that's logged is their IP
> > address. I think it might be better to link changes to userids. Then
> > at least a reader can see that user X changed the page on such-and-
> > such a date. (The changes made can be seen in either case.)
>
> There is a trade-off here. I prefer having people use handles rather
> than being completely anonymous, but requiring authentication does present
> a barrier to entry. Perhaps this can be sweetened somewhat by interoperating
> with OpenID.

Fine with me.

James
