Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

DataStructure representing SyntaxDiagrams?

9 views
Skip to first unread message

Avoi...@gmail.com

unread,
Oct 20, 2012, 1:10:45 PM10/20/12
to
In the 70's there was a 2 or 3 issue article in the BYTE periodical,
developing a PASCAL compiler in BASIC. Since I had just then learned about
Finite-State-Machines represented by diagrams equivalent to syntax-diagrams
and since I wanted to make the HP [IEEE488-instrument-driver] desktop
calculator into a PASCAL-like language controller for a suite of
IEEE488 instruments, I was able to use the FSM ideas.

It's amazingly simple, and very maintainable.
You just let the source code 'walk through the syntax diagrams'
[which are nested: blok, statement, exprn.. -- I just guessing now]
and at the relevant nodes [like stations on a railwayline] you
do the action: push, pop, access SymblTbl, generate-code ...

I used to 'code' mainly from my paper-written syntax diagrams IIRC.
Even later when I ported this 'PASCAL' to 8 & 16 bit microprocessors,
running 'its own PASCAL' I prefered to use the FSM method.

Since the HP machine had a BASIC-like [ROM based] language, it had only
numbers and characters and 1-dimensional arrays of both.
Consequently, I never learned to use records. I can read them, but
I've never written any and totally lack a feeling for their use.

The FSM was represented by arrays. IIRC, one per each column of
my paper diagram. BTW this is one BIG reason why I oppose
Oberon's abandonment of <data-arrays>.

This may finally be an opportunity for me to learn to THINK
in terms of records? But it seems absurd to use 1 record per
node of the syntax diagram?

Can someone suggest a suiatble data stucture for the syntax diagrams?
The correspondence between the code and the paper diagrams must be
easy and obvious, for ease of maintenance, because the intended application
is structured editors for 'your next new languages'.
Fill in the table and let it guide you, when writing code,
instead of having to remember a new syntax.

BTW, for google: what is the name [L, LR, ?] of the family of grammars
like PASCAL which need only one-token look-ahead to parse?

== TIA


Richard

unread,
Oct 21, 2012, 6:27:57 AM10/21/12
to
On 10/20/12 19:10, Avoi...@gmail.com wrote:

> This may finally be an opportunity for me to learn to THINK
> in terms of records? But it seems absurd to use 1 record per
> node of the syntax diagram?

Some productions of the grammar are usually very similar. E.g., there
is probably one called Factor for multiplication and division, and one
called Term for addition and subtraction. Both have the same number of
subexpressions; thus, a single record type containing a flag
specifying the actual operation will be sufficient for all binary
operations.

Richard


Avoi...@gmail.com

unread,
Oct 23, 2012, 5:42:22 AM10/23/12
to
Seeking structures which are similar, in order to factorise,
is too much like low-level optimising.

It's not the computer that matters, it's the human.
A 2-dimensional representastion [spreadsheet-like], with perhaps
an extra dimension given by color seems impossible to beat.

With the system that I used, I could immediately see if/what
code was generated at the last N nodes/tokens before the
error token % in the code:
Total := Max(BestOf(A,B,C), Min(B,C)) + % Y ;

> Richard
BTW I'm ad-libbing this from memory.

The reaon why ETH-Oberon & V4 are marvelous, is that they
are visual: allowing you to recognise instead of needing to
remember. Plus the minimalist cognitive load of eg.
each TextFrame is labeled with what-it-can-do.

Avoi...@gmail.com

unread,
Oct 23, 2012, 5:42:38 AM10/23/12
to
In article <7d7efe1c-47c8-432a...@googlegroups.com>, rug...@gmail.com wrote:
>
> So you already have the railroad diagrams? For original Pascal, they are
> also found in Wirth's _Algorithms and Data Structures_ (1976 edition)
> in the appendix.
>
Well no. I want to apply the same principle to a completely
different application:
<traversing syntax diagrams to parse other languages,
eg. to make a syntax-aware editor, so that it shows you, what's
the next possible valid entry, like a menu shows you,
so that you just need to recognise, instead of remember>.

> I don't offhand know of any online on the web except the Pascal-ish
> PL/0 offshoot XPL0:
>
> http://www.xpl0.org/syntax.html
>
Ok, that may be good to have on-file.

> > The FSM was represented by arrays. IIRC, one per each column of
> > my paper diagram. BTW this is one BIG reason why I oppose
> > Oberon's abandonment of <data-arrays>.
>
> I don't understand what this means. AFAICT, Oberon arrays are the same as
> Pascal. Sure, for dynamic structured data, you'd be better off with POINTER
> TO RECORD, but ARRAY OF CHAR, INTEGER, etc. is still available.
> Or maybe you mean Oberon-07 restrictions? Dunno.
>
IIRC in Turbo-PASCAL [and definitely in my PLO-derivative] I could write:
CharY[ "B","S","A","I","R"];
IntR[1,4,7,3,5];
and by suitably spacing/formatting the source code, it looked like
a 2-dim-array. Which is a massively powerfull tool. Which is why
spreadseets were the killier-app. because you can easily map
cause-and-effect, from eg. a bus/train time-table.
Oberon doesn't allow this.




> > This may finally be an opportunity for me to learn to THINK
> > in terms of records? But it seems absurd to use 1 record per
> > node of the syntax diagram?
>
> Just try it out and see. If it doesn't work, use something else. There's no=
> fixed answer, just use whatever works.
>
> > Can someone suggest a suiatble data stucture for the syntax diagrams?
>
> In Wirth's 1976 A+D book, chapter five was the mini PL/0 compiler example. =
> Later he ripped that out in lieu of a separate book entirely: _Compiler Con=
> struction_
>
> http://www.inf.ethz.ch/personal/wirth/books/CBEAll.pdf
>
> > The correspondence between the code and the paper diagrams must be
> > easy and obvious, for ease of maintenance, because the intended
> > application
> > is structured editors for 'your next new languages'.
> > Fill in the table and let it guide you, when writing code,
> > instead of having to remember a new syntax.
>
> > BTW, for google: what is the name [L, LR, ?] of the family of grammars
> > like PASCAL which need only one-token look-ahead to parse?
>
> I believe Pascal's grammar is called LL(1), but most other languages use
> more complicated ones, e.g. LALR or LR (or even GLR). I suppose Pat Terry's
> adaptation of Coco/R is probably the easiest non-YACC/LEX toolset to use.
>
> http://en.wikipedia.org/wiki/LL_parser
>
> http://www.scifac.ru.ac.za/compilers/
> http://www.scifac.ru.ac.za/coco/

I've looked at coco previously. It lacks: what I had was MAGIC!
You just look at the paper, and at any node you can see immediately:
= what the valid Next-Tokens are,
= what action is done at that node.

So you don't need to write a top-down-compiler, with the
syntax of the langauge interwoven in the compiler's code.

You write a utility to read the syntax diagrams, where the reader
is completely separate from the particular syntax. The syntax is in the
diagrams. And the code-generation is also separate, and represented
by notation next the the corresponding node.

So, IIRC, I needed a dummy node, so that when exiting from an
assignment statement: X := 2+3
the action at the dummy-node was:
pop [the SymbtblPointer (which was pushed at node "ID=X"]
and use it to show where to store [block & offset] the
value of the expression: 2+3.
==============

Avoi...@gmail.com

unread,
Oct 23, 2012, 11:07:49 AM10/23/12
to
In article <slrnk8crnn...@toad.stack.nl>, Marco van de Voort <mar...@toad.stack.nl> wrote:

> On 2012-10-20, rug...@gmail.com <rug...@gmail.com> wrote:
> >> and at the relevant nodes [like stations on a railwayline] you
> >> do the action: push, pop, access SymblTbl, generate-code ...
> >
> > So you already have the railroad diagrams? For original Pascal, they are
> > also found in Wirth's _Algorithms and Data Structures_ (1976 edition) in
> > the appendix.
>
> Free Pascal "ref" manual has railroad diagrams:
>
> http://www.freepascal.org/docs-html/ref/refse90.html#x190-20000016.2
>
Is it SOO difficult to understand me?!

I'm asking people who know about the syntax diagram method of
syntax specification, if they have ideas about suitable data-structures
for representing such <railroad diagrams> ?

And I'm noting that a previous very successfull method which I used
decades ago [so I'd have to re-invent the details now] used several
arrays: representing [IIRC] one-per-column of my diagrams.
Instead of left to right, I drew them top-to-bottom.

And then I'm commenting on the inconvenience that Oberon,
does NOT provide for such data-arrays.

-> ipNews.Send *

Richard

unread,
Oct 23, 2012, 1:35:24 PM10/23/12
to
On 10/23/12 11:42, Avoi...@gmail.com wrote:

> Seeking structures which are similar, in order to factorise,
> is too much like low-level optimising.

I think, it is an interesting challenge to find common patterns and to
limit complexity this way.

> A 2-dimensional representastion [spreadsheet-like], with perhaps
> an extra dimension given by color seems impossible to beat.

Well, you could store this information in a text file and read it into
appropriate data structures at runtime. The Oberon System already
provides a text scanner which makes this easy.

Of course, this is not as elegant as built-in array literals, but such
is the price of simplicity.

Richard


Marco van de Voort

unread,
Oct 23, 2012, 3:24:08 PM10/23/12
to
On 2012-10-23, Avoi...@gmail.com <Avoi...@gmail.com> wrote:
>> > also found in Wirth's _Algorithms and Data Structures_ (1976 edition) in
>> > the appendix.
>>
>> Free Pascal "ref" manual has railroad diagrams:
>>
>> http://www.freepascal.org/docs-html/ref/refse90.html#x190-20000016.2
>>
> Is it SOO difficult to understand me?!


> I'm asking people who know about the syntax diagram method of
> syntax specification, if they have ideas about suitable data-structures
> for representing such <railroad diagrams> ?

Well, let's do a first attempt:

Basically it is EBNF. The devil is in the possible recursion.

If you require that a repititeve term always has a separate definition, one
could probably structure each production as an array of record.

The record would have tokentype, a variant part (to store literal values),
and some flags (to indicate repetition)

> And I'm noting that a previous very successfull method which I used
> decades ago [so I'd have to re-invent the details now] used several
> arrays: representing [IIRC] one-per-column of my diagrams.
> Instead of left to right, I drew them top-to-bottom.

I think if you really want a comment on such vague descriptions, better work
out a detailed example.

> And then I'm commenting on the inconvenience that Oberon,
> does NOT provide for such data-arrays.

Well, I have never seen a requirement of "Oberon", since you crossposted
this to M2 and Pascal.misc. And afaik both support multi dimensional arrays
just fine (but not ragged ones).

However I couldn't make heads or tails of your example, that was most
definitely not Pascal or M2.

Bernhard Treutwein

unread,
Jun 8, 2013, 10:01:02 AM6/8/13
to
> Can someone suggest a suiatble data stucture for the syntax diagrams?
> The correspondence between the code and the paper diagrams must be
> easy and obvious, for ease of maintenance, because the intended
> application
> is structured editors for 'your next new languages'.
> Fill in the table and let it guide you, when writing code,
> instead of having to remember a new syntax.
>
yet another month later, I was able to track down my memories ...
there is an EBNF visualizer, which creates syntax diagrams from
EBNF code at http://dotnet.jku.at/applications/visualizer/. It is
implemnted in C# and has been generated with the help of Coco
(the Compiler Compiler Toolkit) (see
http://www.scifac.ru.ac.za/cspt/modula2.htm#Compilertools) and/or
http://www.ssw.uni-linz.ac.at/coco/).

Bernhard


Richard

unread,
Jun 8, 2013, 4:16:26 PM6/8/13
to
On 06/08/13 16:01, Bernhard Treutwein wrote:
> there is an EBNF visualizer, which creates syntax diagrams from
> EBNF code at http://dotnet.jku.at/applications/visualizer/. It is
> implemnted in C# and has been generated with the help of Coco
> (the Compiler Compiler Toolkit) (see
> http://www.scifac.ru.ac.za/cspt/modula2.htm#Compilertools) and/or
> http://www.ssw.uni-linz.ac.at/coco/).

A similar tool, which also works in other operating systems than
Windows, is the Java-based ANTLRWorks:

http://www.antlr3.org/works

It contains an editor for ANTLR grammars (slightly different syntax
than EBNF: e.g. (...)? instead of [...]) and can show syntax diagrams
immediately during editing.

Richard
0 new messages