Fundamental types! What to do with them?

SeanBDurkin

Sep 6, 2009, 4:52:31 AM

to PascalScriptServerPages

Colleagues,

In my Pascali vision wiki pages, I suggested that the language should
treat all fundamental types as objects. This would require the
language to make a distinction between value assignment (think Delphi
assignment statement on an Integer variable), and reference assignment
(think Delphi assignment statement on a TObject variable).
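
As a rough illustration of the two kinds of assignment, here is a minimal
Python sketch. IntBox is a hypothetical wrapper class invented for this
example; it is not part of Pascali or Delphi, and the use of copy.copy to
stand in for value assignment is an assumption of the sketch.

```python
import copy


class IntBox:
    """Hypothetical object wrapper around an integer (illustration only)."""

    def __init__(self, value):
        self.value = value


a = IntBox(1)

b = a              # reference assignment: a and b name the same object
b.value = 2        # so the change is also visible through a

c = copy.copy(a)   # value assignment: c is an independent copy of a
c.value = 3        # a is not affected
```

If every fundamental type were an object, the language would need distinct
syntax (or distinct rules) for the "b = a" case and the "c = copy of a" case.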

A contrary mode of thinking would be to allow Delphi-style
fundamental types, which fundamentally (no pun intended) operate
differently from objects. In this mode of thinking, there is often a
need for variants.

Both forms could work. The first is cleaner and more object-oriented,
and only possible in an interpreted language. But the second is closer
to Delphi and maybe Delphi developers will be more comfortable with it
because it is more familiar.

What do you think, guys?

Faithfully,
Sean B. Durkin

Sean

Sep 10, 2009, 5:48:42 AM

to PascalScriptServerPages

Hi Colleagues.

In answer to my own question, I think I can have both. The more I think
about it, the more I want to make Pascali closer to Delphi and not try
to take great leaps away from it for the sake of engineering beauty.

So Pascali will offer non-object fundamental types just like Delphi,
but ALSO object wrappers around fundamental types that can be specified
easily with literal expressions.

I said before that my starting point should be the formal syntax
declaration. And I do plan to do that soon, but actually, right now, I am
all fired up to write a small proof-of-concept dummy script engine for IIS.
So that is what I am going to do, as my first priority. With any luck, I
should do it over the weekend. :-)

Oh - by the way, we now have a code repository. It is located at
"http://code.google.com/p/project-pascali/", but is currently empty.
I will let you know when I have developed some content.

Faithfully,
Sean.

Jonathan Bishop

Sep 10, 2009, 8:46:23 AM

to pascalscrip...@googlegroups.com

Umm - can I suggest we get the language definition right first? Seriously,
the script engine is a lot simpler with a clear definition of the
language...

If you haven't done this before, I suggest that the easiest/fastest approach
is to create a function that builds a node for each BNF clause in the
language with a pointer to an associated real Delphi function that actually
implements the clause, and use the Delphi stack as your temporary parse tree
(populating a couple of helper structures like compiler directives table,
function pointer table, constants tables, string tables as you go, and
variable tables), then collapse out of the functions sewing the nodes into
an interpretable btree. Then simply traverse the btree, executing the
function referenced by each node as you go, using the Delphi stack to be
your interpreter's stack.
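
A minimal sketch of this approach, in Python rather than Delphi: each grammar
clause has a parsing method that returns a node, each node holds a reference
to the function that implements it, and execution is a traversal of the
resulting tree. The toy arithmetic grammar and all names (Node, Parser,
compile_source) are assumptions made for illustration, not Pascali.

```python
import operator
import re


class Node:
    """One tree node: a handler function plus up to two child nodes."""

    def __init__(self, handler, left=None, right=None, value=None):
        self.handler = handler
        self.left = left
        self.right = right
        self.value = value

    def run(self):
        # Executing a node means calling the function it points at.
        return self.handler(self)


def run_number(node):
    return node.value


def make_binary(op):
    def run_binary(node):
        return op(node.left.run(), node.right.run())
    return run_binary


ADD_OPS = {"+": make_binary(operator.add), "-": make_binary(operator.sub)}
MUL_OPS = {"*": make_binary(operator.mul), "/": make_binary(operator.truediv)}


class Parser:
    # Toy grammar (an assumption, not Pascali):
    #   expr   = term   { ("+" | "-") term } ;
    #   term   = factor { ("*" | "/") factor } ;
    #   factor = number | "(" expr ")" ;
    def __init__(self, source):
        # Characters outside the toy alphabet are silently skipped here.
        self.tokens = re.findall(r"\d+|[-+*/()]", source)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        node = self.term()
        while self.peek() in ADD_OPS:
            handler = ADD_OPS[self.take()]
            node = Node(handler, node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() in MUL_OPS:
            handler = MUL_OPS[self.take()]
            node = Node(handler, node, self.factor())
        return node

    def factor(self):
        token = self.take()
        if token == "(":
            node = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token is None or not token.isdigit():
            raise SyntaxError(f"unexpected token {token!r}")
        return Node(run_number, value=int(token))


def compile_source(source):
    """Parse the whole source first; any syntax error is raised here."""
    parser = Parser(source)
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


tree = compile_source("2 + 3 * (4 - 1)")
result = tree.run()
```

The host language's call stack does double duty here: it holds the partial
parse while the parsing methods recurse, and it acts as the interpreter's
stack while run() walks the tree.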

The disadvantage of this approach is that the entire app must be
syntactically correct to execute, and start up is slightly slower than a
lazy evaluator, whereas an interpreter that tokenizes and interprets as it
goes doesn't care about the bits that it doesn't actually have to execute -
and therefore starts faster, but then it also results in slower total
execution, and allows bugs through that only show in production when obscure
code paths are executed.

Class polymorphism and inheritance are going to be a little trickier, but the
process is made a lot easier with a clear language definition that includes
inheritance and interfacing rules.

But I guess it isn't the end of the world to jump in feet first...You will
need a tokeniser first. In my experience the tokeniser is best designed to
break a token on every character that is not alphanumeric or '_'. This
makes the design of layer interpreter functions following the language
grammar more straightforward.
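
A minimal sketch of such a "pure" tokeniser, in Python. The regular
expression and the name tokenise are assumptions for illustration, and this
version simply drops white space rather than storing it with each token.

```python
import re

# A word is a run of letters, digits or '_'. Every other non-space
# character becomes a token on its own.
TOKEN_RE = re.compile(r"[A-Za-z0-9_]+|[^A-Za-z0-9_\s]")


def tokenise(source):
    """Split source into word tokens and single-character tokens."""
    return TOKEN_RE.findall(source)


tokens = tokenise("total_1 := count+2;")
# tokens == ['total_1', ':', '=', 'count', '+', '2', ';']
```

Note that ":=" comes out as two tokens, so compound symbols have to be
reassembled by the grammar functions rather than by the tokeniser.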

One consideration is how strings should be handled. If the pure tokeniser
is adopted, it means strings are tokenized to the individual components,
which also means that you need to store the spacing with each token so you
can reassemble a string exactly. Alternatively the tokeniser can know about
strings and present them as a single token - but then you need to have the
string rules as part of the tokeniser. The latter is more powerful as a
generalized string handler, but the former is more robust and flexible in
that it maintains true language independence between the tokeniser and the
language.

In spite of this I suggest the tokeniser should know about strings and the
string parsing rules. Now, this is not as automatic as it sounds, as we have
to decide how line breaks in strings are to be parsed. As an embedded
scripting language you might decide to allow the string to spread over
multiple lines because this can be an important simplification when
embedding blocks of HTML or JavaScript into the Pascal script, but as a
Delphi mirror you would break over the line break.
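
To make the choice concrete, here is a minimal Python sketch of a
string-aware tokeniser with a switch for multi-line strings. It assumes
Delphi-style single-quoted literals in which '' stands for an embedded quote;
the function name make_tokeniser and the flag are hypothetical, and the actual
Pascali string rules may differ.

```python
import re


def make_tokeniser(multiline_strings):
    """Build a tokeniser that returns each string literal as one token."""
    # Inside a literal, allow any character except the quote; in
    # single-line mode a line break is not allowed either.
    body = r"[^']" if multiline_strings else r"[^'\n]"
    pattern = re.compile(
        rf"'(?:{body}|'')*'"      # string literal; '' is an embedded quote
        r"|[A-Za-z0-9_]+"         # word
        r"|[^A-Za-z0-9_\s]"       # any other single character
    )
    return pattern.findall


tokenise_multiline = make_tokeniser(True)
tokenise_single_line = make_tokeniser(False)

source = "s := '<p>\nhi</p>';"
multi = tokenise_multiline(source)
single = tokenise_single_line(source)
# multi  == ['s', ':', '=', "'<p>\nhi</p>'", ';']
# single breaks the literal apart, because no closing quote is found
# before the line break.
```

In single-line mode a real tokeniser would more likely report an
"unterminated string" error at the line break than fall back to
single-character tokens as this sketch does.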

This gets messier when you think about quoting rules.

It is admirable to want to stick with the Delphi syntax, but what
object-centric Delphi is good at is not necessarily what a text-centric web
app requires to minimize coding and maximize clarity.

This is one of the reasons why I suggest a clear written definition of the
language with sample scripts is an extremely wise first step.

Your call, of course.

Regards

Jonathan Bishop
Managing Director

Bishop Phillips Consulting | Melbourne, Australia – Vancouver, Canada
Mobile +61 411.404.483 | Office +61 (3) 9525.7066 | Fax +61 (3) 9525.6080
bis...@bishopphillips.com | www.bishopphillips.com

SeanBDurkin

Sep 13, 2009, 9:03:57 PM

to PascalScriptServerPages

All,

I've been working over the weekend on a formal syntax definition for
Pascali
(http://www.seanbdurkin.id.au/psstiki/tiki-index.php?page=Pascali+Syntax).
There is still a long way to go, but it is a good start. So I will add a
page or so every few days. I am expecting that this endeavour will take
3 or 4 weeks just to get to a first draft. But it's interesting work for
me. My plan of work is to define from the bottom up. That is, I start with
the simplest low-level concepts like literal values and work my way up to
units, tag fragments and concepts of scope.

Next week's goal is to cover expressions and statements.

Faithfully,
Sean B. Durkin

Responses interleaved below.

On Sep 10, 10:46 pm, "Jonathan Bishop" <bish...@bishopphillips.com>
wrote:

> Umm - can I suggest we get the language definition right first? Seriously,
> the script engine is a lot simpler with a clear definition of the
> language...

Agreed.

>
> If you haven't done this before, I suggest that the easiest/fastest approach
> is to create a function that builds a node for each BNF clause in the
> language with a pointer to an associated real Delphi function that actually
> implements the clause, and use the Delphi stack as your temporary parse tree
> (populating a couple of helper structures like compiler directives table,
> function pointer table, constants tables, string tables as you go, and
> variable tables), then collapse out of the functions sewing the nodes into
> an interpretable btree. Then simply traverse the btree, executing the
> function referenced by each node as you go, using the Delphi stack to be
> your interpreter's stack.
>

It's been more than 2 decades since I wrote my last compiler and my
memories of the green dragon grow dim. You're right about leveraging the
BNF clauses, but at first I didn't understand where a btree comes into it.
After doing a bit of reading
(http://en.wikipedia.org/wiki/Recursive_descent_parser),
I think you are describing a recursive descent parser.

> The disadvantage of this approach is that the entire app must be
> syntactically correct to execute, and start up is slightly slower than a
> lazy evaluator, whereas an interpreter that tokenizes and interprets as it
> goes doesn't care about the bits that it doesn't actually have to execute -
> and therefore starts faster, but then it also results in slower total
> execution, and allows bugs through that only show in production when obscure
> code paths are executed.

Yeah - there appear to be two competing schools of thought on
interpreters:
(1) The JIT model, where source is read, tokenised and executed
just-in-time and all in one hit; and
(2) The compiler model, where all the source is read and tokenised, and
some kind of structured output is produced which is then independently
executed.

Remember that in a web application (generally) the user doesn't get any
information until the whole page has been produced. This means that,
unlike a desktop application, start-up times are relatively
unimportant, and total time becomes the overriding factor in
determining the user's level of satisfaction. In the back of my mind
is the memory of how painfully slow PHP websites can be. PHP uses JIT,
and the JIT model appears to be favoured in modern times. However, I
would prefer to use the compiler model because, at the end of the day,
I want Pascali to compare favourably to PHP in terms of page rendering
time.

>
> Class polymorphism and inheritance is going to be a little trickier, but the
> process is made a lot easier with a clear language definition that includes
> inheritance and interfacing rules.
>
> But I guess it isn't the end of the world to jump in feet first...You will
> need a tokeniser first. In my experience the tokeniser is best designed to
> break a token on every non alpha-numeric+ '_' character. This makes the
> design of layer interpreter functions following the language grammar more
> straight forward.
>
> One consideration is how strings should be handled. If the pure tokeniser
> is adopted, it means strings are tokenized to the individual components,
> which also means that you need to store the spacing with each token so you
> can reassemble a string exactly. Alternatively the tokeniser can know about
> strings and present them as a single token - but then you need to have the
> string rules as part of the tokeniser. The latter is more powerful as a
> generalized string handler, but the former is more robust and flexible in
> that it maintains true language independence between the tokeniser and the
> language.
>
> In spite of this I suggest the tokeniser should know about strings and the
> string parsing rules. Now, this is not as automatic as it sounds as we have
> to decide how line breaks in strings are to be parsed. As an embedded
> scripting language you might decide to allow the string to spread over
> multiple lines because this can be an important simplification when
> embedding blocks of HTML or javascript into the pascal script, but as a

I like the idea of allowing multiple line strings. In a general
purpose language, this would not be worth the trouble, but for a web
scripting language, I can see how useful that will be. In fact I have
already described the syntax of normal string literals and multiline
("embedded") string literals in the first section of the Pascali syntax
definition.

> Delphi mirror you would break over the line break.
>
> This gets messier when you think about quoting rules.
>
> It is admirable to want to stick with the Delphi syntax, but what object
> centric Delphi is good at is not necessarily the same as a text centric web
> app requires to minimize coding and maximize clarity.

Agreed. Deviations from Delphi syntax should be made where it makes
sense for this particular problem domain.

>
> This is one of the reasons why I suggest a clear written definition of the
> language with sample scripts is an extremely wise first step.
>
> Your call, of course.

Sean Durkin

Sep 16, 2009, 1:21:07 AM

to pascalscrip...@googlegroups.com

Colleagues,

Should mention of White Space be explicitly included in syntax
definitions?

It seems that "No" is the favoured answer in other people's definitions,
including Dragonkiller's.

The idea seems to be that you can fit white space in almost anywhere, and
the reader understands the exceptions. That has always bothered me a
bit. What about when you DON'T want to allow the script writer to put
white space between the tokens? For example, you can't have arbitrary
white space inside a quoted string (in Delphi or Pascali), or inside a
guid (in Pascali).

For example, here is the EBNF for a literal constant guid value in
Pascali.
EBNF:
  digit     = "0" .. "9";
  guid char = digit | ("A" .. "F");
  guid      = "[*" , 8 * guid char , "-" ,
              4 * guid char , "-" ,
              4 * guid char , "-" ,
              4 * guid char , "-" ,
              12 * guid char , "*]";

And here is what it looks like ....

Example Pascali:
  [*1FB62321-44A7-11D0-9E93-0020AF3D82DA*]   // No white spaces allowed!
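
One way to look at it is to treat the whole guid as a single lexical token,
so that the white space question never arises inside it. A minimal Python
sketch of the EBNF above as a regular expression follows; the names GUID_RE
and is_guid_literal are hypothetical, and, following the EBNF, only upper
case A to F is accepted.

```python
import re

# "[*" 8 hex "-" 4 hex "-" 4 hex "-" 4 hex "-" 12 hex "*]", no white space.
GUID_RE = re.compile(
    r"\[\*[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}\*\]"
)


def is_guid_literal(text):
    """Return True if text is exactly one guid literal."""
    return GUID_RE.fullmatch(text) is not None


good = is_guid_literal("[*1FB62321-44A7-11D0-9E93-0020AF3D82DA*]")
bad = is_guid_literal("[* 1FB62321-44A7-11D0-9E93-0020AF3D82DA *]")
# good is True; bad is False because of the embedded spaces.
```

With this reading, the syntax definition could stay silent about white space
between tokens and only define token-level rules, like this one, as
white-space free.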

I don't know what the best thing to do is here. Should Optional White
Space be included in the syntax definition? Or left implicit?

Faithfully,
Sean.
