REALsource Parse Tree Terminology

Ed Kleban

Dec 21, 2005, 12:41:47 AM
to REALs...@googlegroups.com

Below is the text of a discussion that was started about the same time as the REALsource mailing list.  I'm copying the first couple of private messages between Thomas and myself here so all can participate.  

=====

On 12/14/05 3:43 PM, "Thomas Tempelmann" <t...@tempel.org> wrote:

>
> BTW, I'm currently once again redesigning some of the classes. I believe
> I'll now steer it in the way that I make the "RBPrjItemsTree" (or
> formerly called RBPrjTree, I believe) the central container.

It might be worthwhile to spend a few words here addressing terminology. Having been up and down all these hierarchies several times, and having changed my terminology a good many times over the last several years, I'll offer the following background info for your collective consideration. It gets a bit deep, so if you feel like you're getting lost, just take what you like and ignore what confuses:

Once we iron this out and clean it up, this can be some great stuff to add to the Wiki as well:

*) The terms "Project", "File", "Project File", "Project Item", and "Folder" all have very well-defined meanings in the RB literature and documentation. Most notably, Project Items are the named entities appearing on the "Project" tab in RB 2005 (or in the Project Window before 2005), in an arrangement that I whimsically refer to as the "Folderarchy" and that should probably be formally referred to as the Project Item Hierarchy, or Project Hierarchy for short.

*) The Project Item Hierarchy is independent of the Class Inheritance Hierarchy and the Interface Inheritance Hierarchy, or Class Hierarchy and Interface Hierarchy for short. The latter is rarely talked about, but it is NOT the same as the Class Hierarchy.


*) Despite a massive amount of reluctance on my part, I have bowed to intuition, to the large existing documentation base, and to what is probably inevitable, and used the term Project Item "type" as the characteristic that distinguishes among the different kinds of Project Items that can be included in the Project and arranged in Folders. These of course include "Global Modules", "Classes", "Interfaces", "Windows", "Pictures", and a few other "Internal" and "External" Project Item Types.

*) As far as I am aware, there is no standard term for the various pieces of a Project Item, such as properties, methods, constants, and menu handlers, that are grouped by kind and presented alphabetically in the Class, Module, Window, and Interface tabs of an IDE Window. I have chosen to refer to entities at this level as "Project Item Parts", and to the components they break down into, such as constant name and constant value, as "subparts".

*) I don't yet have a standard term I'm enthralled with for the various part classifications such as "Methods" and "Properties". However, there are some specific words I definitely want to avoid for these, including the terms "Groups", "Types", and "Kinds". I welcome suggestions on this front.

*) On the XML file format side of the world I've endeavored to stick with standard XML terminology and use the terms XML Element, Tag, and, especially once rendered into a parse tree, Node. A distinction that Theo makes in his XML Engine, which I use, is that the tree structure is constructed from "XML Nodes" as the lowest-level conceptual node (or highest level in the class hierarchy for implementing the tree structure). Two subclasses are XML_Element and Text_Node. In the end, whatever package one is using to parse the XML will dictate these terms. The only point I really want to make is that since "Element" and "Node" are used extensively here, I try to avoid overloading these terms by choosing not to also use them elsewhere.

*) On the RBP file format side the term "Group" is in use and will have to remain one of the overloaded uses of this term. I refer to the 8-character identifiers here as RBP tags; essentially all of them have counterparts in the XML format realm, which I refer to as XML Tag Identifiers or simply XML Tags.

*) In both the RBP and XML formats for RB Project files the term "block" has a very evident presence, as represented by the "block" XML tag. Blocks have a "block type" as designated by the "type" attribute in the XML tag. The "ID" tag attribute of blocks is used to knit together the Folderarchy; these IDs are temporary values assigned to blocks and, for all practical purposes, should probably be considered strictly local to a given file. The jury is still out on that one, based on some feedback Thomas has offered me.

*) The term "Module" is unfortunately overloaded in the context of the XML file format since it is used as a "block type" that can represent either what I call a "Global Module" to avoid ambiguity, or a Class, or a (Class) Interface.

*) In rendering all of the XML element nodes, down to a very deep level, into representations either within a parse tree or in a database record representation thereof, I formerly used the term "item" for each of these. That, I have concluded, was a very serious mistake. When you're writing code that translates "Project Items" into arbitrary ... let's call them "node items", the code can get real ambiguous and messy looking real quick. In retrospect, things would have been much cleaner if I had picked some other term. My current favorite for these is "Member", and I plan to do a massive renaming edit to clean up this mess sometime soon. This is probably a little vague and should be explained more clearly, but in essence each XML Element node effectively gets mapped to some database record "member" in my current implementation, or would have been mapped to an in-memory member Object in my pre-database implementation.

*) I'll also toss out two more terms that I have used quite successfully: "Rep" and "Stem". In my former implementations, "Stem" (as in the stem or branch of a tree) was a term I used for an object that represented a parsed structure from some provider source such as a file. Thus I had (and still have) XMLStems (or xStems) that correspond to each of the parsed nodes from an XML parse tree. Using my new terminology, I suppose each of these stems could be called an XML member. A similar, but different, concept is the "Rep" (for representation) of a project item, part, or subpart at some arbitrary level in the structural hierarchy. In my early implementation there was essentially a 1:1 mapping between stems and reps. When I parsed an XML file, I got a parse tree of XML Elements. I would then build stems that mapped to these nodes. The stems extracted the attributes I wanted to use from the XML and stored them as easily accessible properties. Stems would also provide access to other XML-specific metadata, such as the start and length of the source span for an XML Element in the original source file. I had a whole class hierarchy of different flavors of xStems resulting from analyzing an XML parse tree derived from an XML file, and I had always intended at some point to create a class hierarchy of various rbpStems resulting from analyzing the parse tree I'd get back from parsing an RBP file. But the Reps that were constructed from stems to represent the project item members were completely independent of whether the parsed representation happened to be stored as an xStem, an rbpStem, or one of several other kinds of stems I could imagine. Essentially there is a Rep interface and a Stem interface, and as long as your classes implement a compatible set of methods it simply doesn't matter what structures they use or what files they parse.
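A minimal Python sketch of the Stem/Rep split described in the last point. The `Stem` protocol, its single `attribute()` method, and the `XMLStem` fields are assumptions made for illustration, not Ed's actual interfaces. What it shows is that `Rep` depends only on the protocol, so an rbpStem (or any other stem exposing the same method) could be dropped in without touching `Rep`.

```python
from dataclasses import dataclass, field
from typing import Dict, Protocol
import xml.etree.ElementTree as ET


class Stem(Protocol):
    """Parsed structure from some provider source (XML file, RBP file, ...)."""

    def attribute(self, name: str) -> str: ...


@dataclass
class XMLStem:
    """Hypothetical stem wrapping one parsed XML element."""

    tag: str
    attributes: Dict[str, str] = field(default_factory=dict)
    source_start: int = 0   # offset of the element's span in the source file
    source_length: int = 0  # length of that span

    @classmethod
    def from_element(cls, elem: ET.Element, start: int = 0, length: int = 0):
        # ElementTree does not report source offsets, so the caller supplies them.
        return cls(elem.tag, dict(elem.attrib), start, length)

    def attribute(self, name: str) -> str:
        return self.attributes.get(name, "")


class Rep:
    """Representation of a project item, part, or subpart.

    Depends only on the Stem protocol, never on the file format.
    """

    def __init__(self, stem: Stem):
        self.stem = stem

    def block_type(self) -> str:
        return self.stem.attribute("type")


elem = ET.fromstring('<block type="Module" ID="100"/>')
rep = Rep(XMLStem.from_element(elem, start=120, length=30))
print(rep.block_type())  # Module
```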

====

So then, as you were saying Thomas:
 
>
> BTW, I'm currently once again redesigning some of the classes. I believe
> I'll now steer it in the way that I make the "RBPrjItemsTree" (or
> formerly called RBPrjTree, I believe) the central container.

You may want to figure out what terms you want to use in all the various domains, and to what depth your tree that is going to contain Project Items is going to go, when choosing RBPrjItemsTree (or anything else) as a name.

There is no requirement that we agree on a standardized terminology and all use it. But if we do, it will probably be helpful and allow us to work toward defining a standard.

I look forward to your feedback.

Thanks!
--Ed

Ed Kleban

Dec 21, 2005, 12:49:56 AM
to REALsource
Thomas then replied:

===

Ed Kleban wrote:

>It might be worthwhile to spend a few words here addressing terminology.

Good - I always have a hard time with those.

In summary:

'Project Items' are those first-level "blocks" in a prj file

'Project Item Hierarchy' is the way a user groups the Project Items using
Folders in the IDE's left browser pane.

'Project Item Types' differentiate between Classes, Windows, Modules, etc.

'Project Item Parts' denote things such as methods, properties, controls, ...
(yes, that's not a name I'm too happy with, either)

'Member' - hmm, is that some part of a prj file? Apparently not, it's
rather something you use internally only, right? So, I can forget that
term right away again :)

> The only point I really want to make
>is that since "Element" and "Node" are used extensively here I
>try to avoid overloading these terms

Why? If you keep saying "XML Element" and "XML Node", you could use node
and elem in other areas with other prefixes as well.

> On the RBP file format side the term "Group" is in use

I avoid that term altogether. It has no self-explanatory meaning to me in
regards to understanding Prj parts.

>In both the RBP and XML formats for RB Project files the term "block" has
>a very evident presence

That would be, as pointed out above by me, identical to 'Project Item',
right?
So, let's just avoid "block" as well and use Project Item instead, when
we talk about Prj parts.
Or call them alternatively "Project Block", which should be
non-redundant, isn't it?

Thomas


--
Thomas Tempelmann. Mad, mad scientist. <http://www.tempel.org/>

======================================

And Thomas also replied:

===

Ed Kleban wrote:

>> BTW, I'm currently once again redesigning some of the classes. I believe
>> I'll now steer it in the way that I make the "RBPrjItemsTree" (or
>> formerly called RBPrjTree, I believe) the central container.
>

>You may want to figure out what terms you want to in all the various domains
>and to what depth your tree that is going to contain Project Items is going
>to go when choosing RBPrjItemsTree -- or anything else as a name.

I could use some help with these terms I'm using so far:

RBPrjItem, RBObj, RBPrjIdent

RBPrjItem:
Items (both for nodes and leaves) in a tree that hold an abstract copy of a
project file (coming from either a binary or XML file). Since Ed suggests
using "Project Item" only for the parts of the outer "block" level (Module,
Class, Window, ...), I should find a different name for this. Perhaps
RBPrjMember? RBPrjThing? RBPrjDoodah? :)

RBObj and RBObjEntity:
A base name for items in a tree of code-related parts, such as classes,
modules, methods, properties etc.
But since "obj" = Object, and since objects have a specific meaning in RB, a
better name would be nice here as well. Perhaps it should also be prefixed
with "RBPrj" like almost all of my other classes.

RBPrjIdent:
Ident is short for Identifier - it's used to build a tree for the class
browser, in which we have a top (global) level and then these idents are
methods, classes, controls, properties etc. that are visible on that or
deeper scope levels.

I think this latest one, Ident, is the one that's the most unique and
self-explanatory. I could use "Symbol" instead, but decided against it
because Symbol also includes RB language elements such as "if", "dim" etc,
right? Or did I get confused here?

Any suggestions?



Ed Kleban

Dec 21, 2005, 2:01:38 AM
to REALs...@googlegroups.com
And now I think we're in synch in the REALsource thread and can proceed:

=======

Hey Thomas,

Thanks for the positive feedback and questions. My apologies for not
answering sooner, but I've been swamped on other fronts.

My comments below.

>> It might be worthwhile to spend a few words here addressing terminology.
>
> Good - I always have a hard time with those.
>
> In summary:
>
> 'Project Items' are those first-level "blocks" in a prj file
>
> 'Project Item Hierarchy' is the way a user groups the Project Items
> using Folders in the IDE's left browser pane.
>
> 'Project Item Types' differentiate between Classes, Windows, Modules, etc.

Umm. Yes, but see below.



> 'Project Item Parts' denote things such as methods, properties, controls, ...
> (yes, that's not a name I'm too happy with, either)

Pardon? You are happy or you are not happy with this? Actually I like the
term "parts" so far.

I've just spent the last two days coding and debugging in this space and I
have a few afterthoughts to offer:

*) There are several distinct concepts that are worth giving separate
names. My current names for these are:

"Project Item Types"
These are indeed the various kinds of items that appear in the "Project"
Tab of the RB 2005 IDE, or in the old "Project" window of the pre-2005
interface. The term should also correspond, both in usage and definition,
with usage in the RB User's Guide. No problem there.


"Block Types"
are different, however. I use this term to refer specifically to the
various enumerated strings that occur in the XML Elements preceded by:

<block type="

Thus "Window", "Folder", "Module", and "Picture" are all proper kinds both
of Block Types and of Project Item Types.

HOWEVER:

1) There is no "Class" or "Interface" block type. Rather, these are
represented by "Module" block types that have an IsClass or IsInterface tag,
respectively, with a text string value of "1".

2) Thus the word "Module" means something very different in the context of a
Block Type and a Project Item Type. To keep these terms clear I therefore
refer to these by the terms "Block Module" and "Global Module".

[Note also that although there is indeed a "Window" block type corresponding
to the "Window" Project Item Type, the XML for a Window block also contains
IsClass and IsInterface XML Elements. In the Project Item world, there is a
class hierarchy where Window is a subclass of Object, each separate "Class"
(including Windows) has a superclass, but Interface Items are not classes and
therefore have no SuperClass. In the XML Block world, however, there is a
separate implicit block hierarchy as defined by the presence of the IsClass
and IsInterface tags, and I have found the following arrangement of classes
representing block types in my parsers to be most useful: In my parser,
subclasses of the abstract, generic BlockRep include FolderRep, PictureRep,
and CodedRep. Subclasses of CodedRep include (global) ModuleRep and
InterfaceRep; then ClassRep is a subclass of ModuleRep and WindowRep is a
subclass of ModuleRep. Others may of course choose a different way to
implement a parser, but I find this approach most convenient for sharing
common methods that manage various common inherited properties for these
Project Item variations.]

3) There are some special cases of Block Types for which there is not really
a similar pristine mapping to Project Item Types. Namely:

a) The first block always appears to be of block type "Project". This block
contains a lot of XML Elements such as version numbers and build parameters
that are normally associated with the blessed App Class in RB 2005 or with
various preference dialogs in RB pre-2005 versions. The blessed App Class is
designated in the XML file with an <IsApplicationObject> Tag.

b) In RB 2005 XML files, the last block has a Block Type of "UIState" which
doesn't correspond with any Project Item Type.

c) In RB 2005 XML files, file type information is contained in a block of
block type "FileTypes", which makes sense now that this shows up as a
"FileTypes" Project Item Type in the RB 2005 "Project" tab of the IDE. In
pre-2005 XML files this information shows up in <FileType> tags under the
"Project" block.
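The mapping from Block Types to Project Item Types in items 1) through 3) can be sketched with Python's standard `xml.etree.ElementTree`. The `IsClass`/`IsInterface` child elements and the "1" value come from the description above; the function name, and the assumption that every remaining block type maps to the Project Item Type of the same name, are illustrative only.

```python
import xml.etree.ElementTree as ET

# Block Types with no matching Project Item Type, per items 3a) and 3b).
NON_ITEM_BLOCK_TYPES = {"Project", "UIState"}


def _flag(block, tag):
    """True if the child element <tag> holds the text "1"."""
    return (block.findtext(tag) or "").strip() == "1"


def project_item_type(block):
    """Translate a <block> element's Block Type into a Project Item Type.

    Returns None for blocks that do not correspond to any Project Item.
    """
    block_type = block.get("type")
    if block_type in NON_ITEM_BLOCK_TYPES:
        return None
    if block_type != "Module":
        # "Window", "Folder", "Picture", "FileTypes", ... map by name.
        # Window blocks carry IsClass/IsInterface too, but stay Windows.
        return block_type
    # A "Block Module" may be a Class, an Interface, or a Global Module.
    if _flag(block, "IsClass"):
        return "Class"
    if _flag(block, "IsInterface"):
        return "Interface"
    return "Global Module"


blk = ET.fromstring('<block type="Module"><IsClass>1</IsClass></block>')
print(project_item_type(blk))  # Class
```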

Finally, another detail I realized I forgot to mention. At the top of the
XML file hierarchy there is the "File Root", if you will. It includes the
XML Format line as the first text line of the XML file, immediately followed
on the second line by the "RBProject" root, which contains all of the
"block" elements. One of the hairiest pieces of terminology for me to get
straight in my code is to properly distinguish the "File Root" from the
"RBProject" root, and both of those from the "Project" block, which is the
first block underneath the "RBProject" root.
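Python's `xml.etree.ElementTree` makes these three levels easy to tell apart. The sample text is a hypothetical, heavily trimmed project file rather than real RB output, and `split_roots` is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed project file used only for illustration.
PROJECT_XML = """<?xml version="1.0"?>
<RBProject>
<block type="Project" ID="0"></block>
<block type="Module" ID="100"><IsClass>1</IsClass></block>
</RBProject>
"""


def split_roots(text):
    """Return (XML declaration line, RBProject root element, Project block)."""
    # File Root: the whole file. Its first text line is the XML declaration,
    # which ElementTree consumes and does not keep as an element.
    declaration = text.splitlines()[0]
    # The parsed document element is the RBProject root.
    rbproject_root = ET.fromstring(text)
    # The "Project" block is the first <block> child, distinct from both roots.
    project_block = rbproject_root.find("block")
    return declaration, rbproject_root, project_block


decl, root, proj = split_roots(PROJECT_XML)
print(root.tag, proj.get("type"))  # RBProject Project
```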


> 'Member' - hmm, is that some part of a prj file? Apparently not, it's
> rather something you use internally only, right? So, I can forget that
> term right away again :)
>

Yeah, this is really pushing into the meta terminology realm. The term
"member" came to mind as the result of a regret. I like and use Project
Item, just as you and I have both used it in the text above. But when I
went to make my latest implementation I referred to the database entries
that corresponded to XML_Elements as "items" and therefore ended up
overloading the term "item" and soon came to wish I had not. At some point
I'll fix all those to make it more clear. But yeah, you can probably forget
it for your needs. I just recommend you learn from my mistake and avoid
overloading any terms. It keeps the code much more readable.
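One way to picture "one member per XML element" is a sketch using Python's built-in `sqlite3` with an in-memory database. The `member` table and its columns are hypothetical (no actual schema is given in this thread), and XML attributes are left out for brevity.

```python
import sqlite3
import xml.etree.ElementTree as ET


def store_members(xml_text):
    """Store each XML element as one "member" row; return the connection."""
    db = sqlite3.connect(":memory:")
    # Hypothetical schema: parent_id rebuilds the tree, tag/text hold content.
    db.execute(
        "CREATE TABLE member ("
        " id INTEGER PRIMARY KEY, parent_id INTEGER, tag TEXT, text TEXT)"
    )

    def add_member(elem, parent_id):
        # One row per XML element; the row id names a member, not a Project Item.
        cur = db.execute(
            "INSERT INTO member (parent_id, tag, text) VALUES (?, ?, ?)",
            (parent_id, elem.tag, (elem.text or "").strip() or None),
        )
        member_id = cur.lastrowid
        for child in elem:
            add_member(child, member_id)

    add_member(ET.fromstring(xml_text), None)
    db.commit()
    return db


db = store_members(
    "<RBProject><block type='Module'><IsClass>1</IsClass></block></RBProject>"
)
print(db.execute("SELECT COUNT(*) FROM member").fetchone()[0])  # 3
```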

>> The only point I really want to make
>> is that since "Element" and "Node" are used extensively here I
>> try to avoid overloading these terms
>
> Why? If you keep saying "XML Element" and "XML Node", you could use
> node and elem in other areas with other prefixes as well.
>

Yes I could. But I chose not to, for the exact same reason I wish I had not
overloaded the term "item", even though I can use qualifiers such as
"Project Item" and "Table Item". It's really nice to be able to use an
abbreviation such as "elm", "mbr", "prt", or "itm" in a variable name and
know precisely what kind of element, member, part, or item you are implying,
rather than leaving it ambiguous or having to use an abbreviation such as
"prjItm" or "xmlElm" to make it clear.

>> On the RBP file format side the term "Group" is in use
>
> I avoid that term altogether. It has no self-explanatory meaning to me
> in regards to understanding Prj parts.
>

Good :)

>> In both the RBP and XML formats for RB Project files the term "block" has
>> a very evident presence
>
> That would be, as pointed out above by me, identical to 'Project Item',
> right?

In my opinion, wrong; for all of the reasons I have enumerated above.


> So, let's just avoid "block" as well and use Project Item instead, when
> we talk about Prj parts.

You can do that just fine when you are talking about parts, which are
components of source code in the context of the IDE. You can NOT avoid
talking about blocks, however, when you are discussing either the XML or RBP
formats. They are an integral piece of the puzzle that has to be
acknowledged, named, parsed, and accommodated in code.
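As one example of a block-level parsing step, the sketch below collects a file's blocks into a lookup table keyed by their "ID" attribute, treating those IDs as local to one file as suggested earlier in the thread. The helper name is hypothetical, and because this thread does not say how a block refers to its containing folder, the sketch stops short of rebuilding the Folderarchy.

```python
import xml.etree.ElementTree as ET


def index_blocks(xml_text):
    """Map each block's ID attribute to its <block> element, for ONE file.

    IDs are treated as local to the file, so tables built from different
    files should never be merged.
    """
    rbproject_root = ET.fromstring(xml_text)
    blocks_by_id = {}
    for blk in rbproject_root.findall("block"):  # top-level blocks only
        blk_id = blk.get("ID")
        if blk_id is None:
            continue  # no ID attribute: nothing to index
        if blk_id in blocks_by_id:
            raise ValueError(f"duplicate block ID {blk_id!r}")
        blocks_by_id[blk_id] = blk
    return blocks_by_id


idx = index_blocks(
    '<RBProject><block type="Project" ID="0"/><block type="Folder" ID="7"/></RBProject>'
)
print(idx["7"].get("type"))  # Folder
```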

> Or call them alternatively "Project Block", which should be
> non-redundant, isn't it?
>
> Thomas

To me the term "block" is unambiguous, whether used in the context of an XML
block in an XML file or RBP block in the context of an RBP file.

"Project Block" on the other hand is asking for trouble I think, considering
there is a block of block type "Project", and an "RBProject" XML_Element
which is not a block.

> ======================================
>
> And Thomas also replied:
>
> ===
>
> Ed Kleban wrote:
>
>>> BTW, I'm currently once again redesigning some of the classes. I believe
>>> I'll now steer it in the way that I make the "RBPrjItemsTree" (or
>>> formerly called RBPrjTree, I believe) the central container.
>>
>> You may want to figure out what terms you want to in all the various domains
>> and to what depth your tree that is going to contain Project Items is going
>> to go when choosing RBPrjItemsTree -- or anything else as a name.
>
> I could need some help with these terms I'm using so far:
>
> RBPrjItem, RBObj, RBPrjIdent
>

> RBPrjItem:
> Items (both for nodes and leafs) in a tree that hold an abstract
> copy of a project file (coming from either a binary or XML file). Since
> Ed suggests to use "Project Item" for only the parts of the outer
> "block" level (Module, Class, Window, ...), I should find a different name
> for this. Perhaps RBPrjMember? RBPrjThing? RBPrjDoodah? :)

Bingo! This is the specific purpose for which I propose using the term
"Member" so as to avoid confusion with "Project Items" which is a well
established term in the RB User's Guide.

So RBPrjMember, PrjMember, Member, Mbr all work fine for me.

> RBObj and RBObjEntity:
> A base name for items in a tree of code-related parts, such as
> classes, modules, methods, properties etc.
> But since "obj" = Object, and since objects have a specific meaning
> in RB, a better name would be nice here as well. Perhaps it should also
> be prefixed with "RBPrj" like almost all of my other classes

This is essentially what I used the term "Rep" for, namely to refer to an
object in a parse tree that "represents" some Project Item or Part. In my
code, a Rep is necessarily an instantiated RB object that I can communicate
with through messages (i.e. by calling its methods) to get and set its
attributes, and that uses polymorphism to manage all of the tricks it can
perform.



> RBPrjIdent:
> Ident is short for Identifier - it's used to build a tree for the
> class browser, in which we have a top (global) level and then these
> idents are methods, classes, controls, properties etc. that are
> visible on that or deeper scope levels.

Umm... I'm not exactly sure what you are referring to here and how it may
differ from what you suggested could be called "RBObj and/or RBObjEntity"
and that I suggested you consider using "Rep" for.

I believe that what you may be referring to here is a string that can be
used to identify a rep. If so, I used the term "repName" for this, and
indeed every rep is capable of telling you its repName if it has one. I
have a whole rep protocol defined in terms of an interface, and one of the
supported calls is to return the repName. Note that this is used to
designate the name by which the object is referred to in some local context.
It is NOT a unique identifier in the parse tree. Thus if you have two
different methods of the same class that take a different number of
arguments, they will both claim to have the same repName.
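The non-uniqueness of repName can be illustrated in a few lines of Python. `MethodRep` and `rep_name()` are hypothetical stand-ins for a method rep and the repName call; grouping by name shows why two overloads end up under one key, so a repName cannot serve as a parse-tree key.

```python
from collections import defaultdict


class MethodRep:
    """Hypothetical rep for one method, reduced to what this example needs."""

    def __init__(self, name, params):
        self.name = name
        self.params = params  # parameter list; overloads differ only here

    def rep_name(self):
        # The name used to refer to the method locally, not a unique ID.
        return self.name


def reps_by_name(reps):
    """Group reps by repName; one name may map to several reps."""
    groups = defaultdict(list)
    for rep in reps:
        groups[rep.rep_name()].append(rep)
    return dict(groups)


# Two overloads of the same method in one class share a repName.
draw1 = MethodRep("Draw", [])
draw2 = MethodRep("Draw", ["g As Graphics"])
groups = reps_by_name([draw1, draw2])
print(len(groups["Draw"]))  # 2
```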



> I think this latest one, Ident, is the one that's the most unique and
> self-explanatory. I could use "Symbol" instead, but decided against it
> because Symbol also includes RB language elements such as "if", "dim"
> etc, right? Or did I get confused here?

I would DEFINITELY avoid use of the term "symbol". That's just asking for
trouble.

> Any suggestions?
>

Just those.

Good luck!

--Ed


Ed Kleban

Dec 21, 2005, 2:22:21 AM
to REALsource
Ooops. I said:

" In my parser, subclasses of the abstract, generic BlockRep
include FolderRep, PictureRep, and CodedRep. Subclasses of CodedRep include
(global) ModuleRep and InterfaceRep; then ClassRep is a subclass of
ModuleRep and WindowRep is a subclass of ModuleRep."

The last line should instead read: "... and WindowRep is a subclass of
ClassRep"
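With that correction, the BlockRep hierarchy can be rendered as Python classes. The class names are Ed's; the `can_have_superclass()` method is an illustrative assumption, showing how a method defined on CodedRep and overridden once in ClassRep is inherited by WindowRep, while Global Modules and Interfaces keep the default.

```python
class BlockRep:
    """Abstract, generic rep for any block."""


class FolderRep(BlockRep):
    pass


class PictureRep(BlockRep):
    pass


class CodedRep(BlockRep):
    """Blocks that carry code (methods, properties, ...)."""

    def can_have_superclass(self):
        return False  # Global Modules and Interfaces have no SuperClass


class ModuleRep(CodedRep):
    """A (global) Module."""


class InterfaceRep(CodedRep):
    pass


class ClassRep(ModuleRep):
    def can_have_superclass(self):
        return True


class WindowRep(ClassRep):
    """Per the correction, a subclass of ClassRep; inherits the override."""


print(WindowRep().can_have_superclass(), InterfaceRep().can_have_superclass())
# True False
```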

--Ed
