AST (Parsed RB code)


Thomas Tempelmann

Feb 5, 2006, 3:43:15 PM

to REALs...@googlegroups.com

Scott and others:

I am having a really hard time with Morphe parsing RB code. I am slowly
getting the impression that Morphe has one or two serious problems (meaning
it's not just my incomprehension of how things should work with it).
But Jonathan, the author of Morphe, is very helpful and perhaps we'll
get it all working in a while.

My goal is to create an AST (abstract syntax tree) of a piece of RB
code. I need your feedback to decide whether it's going the right way for
you:

1. As far as I can tell, syntactically invalid source code will not be
handled well. If the source has a syntax error, you will not get an AST
that's mostly complete; it may well be totally useless, because parsing
stops at the error entirely. I wonder if you can live with that?

2. The tree will tell you only about the syntactic layout of the
code, and hardly anything about its semantics.
For instance, it will say that there is an identifier followed by
parentheses containing some value (e.g. "foo(4)"), which could mean
either an array element access or a function call. To figure out which
it is, you have to analyse the scopes of the identifiers in the code
(i.e. find out where "foo" comes from). The syntax tree won't do that
for you (although you can find all identifiers from the syntax tree,
then find out that there is a "foo" variable and maybe also
a "foo" method, and then it's up to you to know which one is in
reach at that point).
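
To make point 2 concrete, here is a minimal Python sketch of what "the tree only knows the syntax" means for "foo(4)". Nothing here is Morphe's API: the names ParenExpr, Scope and classify are hypothetical, and the sketch assumes that the innermost declaration wins and that identifiers are compared case-insensitively.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ParenExpr:
    """All a purely syntactic tree knows about foo(4): a name and its arguments."""
    name: str
    args: list


@dataclass
class Scope:
    """One scope level mapping names to a kind ('array' or 'method'). Hypothetical."""
    symbols: dict = field(default_factory=dict)
    parent: Optional["Scope"] = None

    def declare(self, name, kind):
        # Assumption: identifiers are case-insensitive, so store them lower-cased.
        self.symbols[name.lower()] = kind

    def lookup(self, name):
        # Walk outwards from the innermost scope until the name is found.
        scope = self
        while scope is not None:
            kind = scope.symbols.get(name.lower())
            if kind is not None:
                return kind
            scope = scope.parent
        return None


def classify(node, scope):
    """Decide what the syntactically ambiguous node means in the given scope."""
    kind = scope.lookup(node.name)
    if kind == "array":
        return "array element access"
    if kind == "method":
        return "function call"
    return "unresolved"


# A module-level method foo, shadowed inside one method by a local array foo.
module_scope = Scope()
module_scope.declare("foo", "method")
method_scope = Scope(parent=module_scope)
method_scope.declare("foo", "array")

node = ParenExpr("foo", [4])
print(classify(node, method_scope))  # meaning inside the method
print(classify(node, module_scope))  # meaning at module level
```

The same ParenExpr node yields a different answer depending on the scope it is looked up in, which is exactly the work the syntax tree leaves to its user.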

Does that work for you?

--
Thomas Tempelmann - exaggerating over a million times a day!
http://www.tempel.org/rb/ -- The primary source of outdated REALbasic
plugins and examples
Skype: tempel.org AIM: superTempel

Scott Steinman

Feb 5, 2006, 6:09:57 PM

to REALs...@googlegroups.com

Hi Thomas-

> I have a really hard time with Morphe parsing RB code. I get slowly
> the impression that Morphe has one or two serious problems (meaning
> it's not just my incomprehension of how things should work with it).

For me, it's mostly incomprehension. :) I'm having a very difficult
time understanding how Morphe's code works, but then again I'm a
self-taught programmer. I took a few college-level courses in computer
science a few years ago, but never got to the courses on theory of
computation or compiler design. That may be why I'm having
difficulty understanding the code.

> 1. as far as I can tell, dealing with syntactically invalid source
> code will not be very good. Meaning that if the source has a syntax
> error, you will not get an AST that's mostly complete but it may as
> well just be totally useless because it stops at the error entirely. I
> wonder if you can live with that?

I've already listed as one of the requirements for Reality Check that
projects must be compilable code, i.e., syntactically correct. I
don't think that it's an unreasonable requirement. Reality Check is
not meant to replace the REALbasic compiler's syntax checking.
Therefore, we can live with displaying an error message if the code
is not syntactically correct.

> 2. the tree will tell you only about the syntactical layout of the
> code, hardly any semantics, though.
> For instance, it will say that there is an identifier followed by
> parentheses, containing some value (e.g. "foo(4)"), which could mean
> either an array element access or a function call. To figure out what
> it is, you have to analyse the scopes of the identifiers in the code
> (i.e. find out where "foo" comes from). The syntax tree won't do that
> for you (although you can find all identifiers from the syntax tree,
> and then find out that there is a "foo" variable, and maybe you also
> find a "foo" method, and then it's up to you to know which one is in
> reach at that time).
>
> Does that work for you?

This could pose a problem. There are two stages in abstract syntax
tree construction: (1) creating the tree itself based upon the syntax
of the code, and (2) "decorating" the tree based upon the semantics,
so that each node containing an identifier holds a reference to that
identifier's entry in the symbol table. You're correct that we
need to access information in the symbol table and analyze the scope
of each symbol in order to resolve the references to each identifier.
This information has to be preserved in the abstract syntax tree
too. For example, if foo(4) is a method call, we need to know what
type of information it returns, to check whether it is a reference to
a data type used in the project, or an object which will be used to
call a method (foo(4).doSomething). When Reality Check does its
reference search to look for unused methods or data, or to see if
data or methods can have a more restricted scope, it relies on
information such as this.
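
The two stages described above can be sketched roughly as follows in Python. This is an illustration only, not how Morphe or Reality Check is implemented: Symbol, Node, decorate and unused are hypothetical names, the type name "Widget" is invented, and the symbol table is a single flat dictionary, whereas a real one would be organised by scope.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Symbol:
    """One symbol table entry (hypothetical layout)."""
    name: str
    kind: str            # "method", "array", "variable", ...
    type_name: str       # declared type, or return type for a method
    references: int = 0  # number of identifier nodes resolved to this entry


@dataclass
class Node:
    """One syntax tree node as produced by stage 1."""
    kind: str                         # "ident", "call", "literal", ...
    text: str = ""
    children: list = field(default_factory=list)
    symbol: Optional[Symbol] = None   # filled in by stage 2


def decorate(node, table):
    """Stage 2: link every identifier node to its symbol table entry."""
    if node.kind == "ident":
        entry = table.get(node.text.lower())
        if entry is not None:
            node.symbol = entry
            entry.references += 1
    for child in node.children:
        decorate(child, table)


def unused(table):
    """Names of symbols that no identifier node refers to."""
    return sorted(s.name for s in table.values() if s.references == 0)


# Stage 1 result for foo(4): a call node with an identifier and a literal.
tree = Node("call", children=[Node("ident", "foo"), Node("literal", "4")])

# Flat symbol table (assumption: a real one would be nested by scope).
table = {
    "foo": Symbol("foo", "method", "Widget"),
    "bar": Symbol("bar", "method", "Integer"),
}

decorate(tree, table)
callee = tree.children[0]
print(callee.symbol.kind, callee.symbol.type_name)  # what foo(4) is and returns
print(unused(table))                                # candidates for "unused method"
```

Once each identifier node carries its symbol, both the return-type question and the unused-method search become simple lookups on the decorated tree.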

It is also the "decorated" tree that I need in order to do
refactoring. I need to know at each location of the program whether
an identifier is a method call or an array, etc., and which specific
one it is, because if I rename that identifier throughout the
program, making an error would be disastrous. In addition, if I
rearrange code when refactoring, all of the references within the
rearranged code must be correctly identified.
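
As a rough illustration of why renaming needs the decorated tree, the Python sketch below renames only those occurrences that resolve to one particular symbol. It assumes the decorated tree can report each identifier occurrence as (offset, length, symbol_id); that tuple layout and the rename function are hypothetical.

```python
def rename(source, occurrences, target_id, new_name):
    """Rename only the occurrences bound to the symbol target_id.

    occurrences: list of (offset, length, symbol_id) tuples, assumed to come
    from a decorated syntax tree.
    """
    hits = [occ for occ in occurrences if occ[2] == target_id]
    # Replace from right to left so earlier offsets stay valid.
    for offset, length, _ in sorted(hits, reverse=True):
        source = source[:offset] + new_name + source[offset + length:]
    return source


# Two textually identical "foo" identifiers that resolve to different symbols:
# symbol 1 is a local array, symbol 2 is a method.
source = "x = foo(1)\ny = foo(2)\n"
occurrences = [(4, 3, 1), (15, 3, 2)]

renamed = rename(source, occurrences, 2, "computeFoo")
print(renamed)
```

A plain text search for "foo" would have changed both lines; with the symbol reference attached to each occurrence, only the method call is touched.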

Am I making sense? Today's _another_ bad migraine/vertigo day. My
wife had to help me walk out of a restaurant because I was so dizzy I
couldn't stand up by myself. I feel better now that I'm sitting at
my computer desk, but it's hard to concentrate so I may not be
communicating effectively.

-Scott

Dr. Scott Steinman
Brought to you by a grant from the Steinman Foundation (Thanks, Mom
and Dad!)
Recommended by Major University Studies Over the Leading Brand
steinman at midsouth dot rr dot com

I hope I die peacefully in my sleep like my grandfather. . .not
screaming in terror like his passengers. -- "Deep Thoughts", Jack Handy

Ed Kleban

Feb 7, 2006, 12:07:21 PM

to REALs...@googlegroups.com

On 2/5/06 5:09 PM, "Scott Steinman" <stei...@midsouth.rr.com> wrote:

>
> Hi Thomas-
>
>> I have a really hard time with Morphe parsing RB code. I get slowly
>> the impression that Morphe has one or two serious problems (meaning
>> it's not just my incomprehension of how things should work with it).
>
> For me, it's mostly incomprehension. :) I'm having a very difficult
> time understanding how Morphe's code works, but then again I'm a self-
> taught programmer. I took a few college-level courses in computer
> science a few years ago, but never got to the courses on theory of
> computation or compiler design. That may be why I'm having
> difficulty understanding the code.

I'm clueless regarding Morphe. If you have any documentation, clues, email
correspondence with Jonathan, or other useful docs, I'd love to learn more
so I can better understand it.

>> 1. as far as I can tell, dealing with syntactically invalid source
>> code will not be very good. Meaning that if the source has a syntax
>> error, you will not get an AST that's mostly complete but it may as
>> well just be totally useless because it stops at the error entirely. I
>> wonder if you can live with that?
>
> I've already listed as one of the requirements for Reality Check that
> projects must be compilable code, i.e., syntactically correct. I
> don't think that it's an unreasonable requirement. Reality Check is
> not meant to replace the REALbasic compiler's syntax checking.
> Therefore, we can live with displaying an error message if the code
> is not syntactically correct.

It won't fully work for my needs as I need to work on code as it is being
written, but certainly something is better than nothing and every little
clue helps.

>> 2. the tree will tell you only about the syntactical layout of the
>> code, hardly any semantics, though.
>> For instance, it will say that there is an identifier followed by
>> parentheses, containing some value (e.g. "foo(4)"), which could mean
>> either an array element access or a function call. To figure out what
>> it is, you have to analyse the scopes of the identifiers in the code
>> (i.e. find out where "foo" comes from). The syntax tree won't do that
>> for you (although you can find all identifiers from the syntax tree,
>> and then find out that there is a "foo" variable, and maybe you also
>> find a "foo" method, and then it's up to you to know which one is in
>> reach at that time).
>>
>> Does that work for you?

One question I do have, however, is how to access the tree's information
content. Specifically, suppose I know there are three occurrences of the
string "Foo" on line 27. How possible or difficult will it be for me to use
the Morphe result if I want to investigate "what's the usage of the 2nd
occurrence of foo on line 27"? Basically, I first want to know whether it's
in code, a comment, or a string -- which the simple context scanner algorithm
I described for you previously will tell me immediately. Given that I
discover that it's in a piece of code, I guess the key thing I care about
most is scope, which, as you've pointed out, Morphe isn't going to tell me
directly. Once I know those, I'm not sure I care much about its usage for
my main-line purpose. If I did know more, then I, or we, could do some cute
things like more rainbowific text coloring, I suppose. But I guess my basic
conclusion is that I don't have the same "understand the code" level
requirement that Scott does.
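
The context scanner mentioned above is not reproduced in this thread, so the following Python sketch is only a guess at the general idea, not that algorithm. It assumes strings are delimited by double quotes (with "" as an escaped quote) and that a comment starts with ' or // outside a string; REM comments and whole-word matching are left out for brevity. The function name classify_occurrences is hypothetical.

```python
def classify_occurrences(line, word):
    """Return 'code', 'string' or 'comment' for each occurrence of word in line.

    Matching is case-insensitive and does not check word boundaries.
    """
    contexts = []
    state = "code"
    low_line = line.lower()
    low_word = word.lower()
    i = 0
    while i < len(line):
        ch = line[i]
        if state == "code":
            if ch == '"':
                state = "string"
            elif ch == "'" or line.startswith("//", i):
                state = "comment"   # a comment runs to the end of the line
        elif state == "string" and ch == '"':
            # A doubled quote ("") flips to code and straight back to string.
            state = "code"
        if low_line.startswith(low_word, i):
            contexts.append(state)
            i += len(word)
            continue
        i += 1
    return contexts


line = "foo = \"foo\" + bar ' foo again"
contexts = classify_occurrences(line, "Foo")
print(contexts[1])  # context of the 2nd occurrence on this line
```

Only the occurrences reported as 'code' would then need to be looked up in the syntax tree at all.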
