Tony
Scope is a set of rules about how variable references are resolved. When
implementing your symbol table management, you have to encode rules about
where to look to get values of variable names. Those rules are your scope
rules.
A given name in your code may refer to many different variables. When
your compiler finds the variable name, it has to map it to the memory
address of the correct variable.
For example, you have a variable name "foo" in a function, and your
symbol table says that name refers to an automatic variable in the
current invocation of that function, to a file-scope (static) variable,
and to a global variable. You need to resolve the conflict using scope
rules - and in this case, in C or C++ your compiler would map the name
to the automatic variable.
In most languages, scope rules require the variable name, the function
name, and a unique identifier for the activation frame. In C and C++,
however (and a few other languages) you also need the code address of
the reference, because the baroque scoping rules include variables whose
scope is a smaller unit than a full function.
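For example, in C++ (just an illustration; the names mean nothing):

int foo = 1;              // global variable

void f() {
    int foo = 2;          // automatic variable; hides the global
    {
        int foo = 3;      // block-scoped; hides both of the above
        // A reference to "foo" right here resolves to the innermost
        // declaration, which is why you need the code address of the
        // reference, not just the function name.
    }
}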
Hope that helps.
Bear
I'm not sure how other compilers do it, but here's how my own compiler
does it. First a short introduction to scoping in my language:
* There's the global scope. In this language that's the same thing as
the translation unit scope, since as of yet, there can be only one
source file. It is unnamed. It can contain variables, functions and
classes.
* There are functions. They introduce a named, opaque scope. They can
contain variables, functions, classes and local scopes.
* There are classes. They introduce a named, transparent scope. They
can contain variables, functions and classes.
* There are local scopes. They may be named or unnamed. They are
opaque. They can contain variables, functions, classes and local
scopes.
There's more to it than that, since there are also preconditions and
postconditions in the language, which require special visibility
rules. But for simplicity, let's ignore those for now.
The main purpose of these scopes is to determine symbol visibility and
variable lifetime. In this language, a symbol may be referenced by
absolute or relative path. For example:
void f() {
    int a;
    void g() {
        int a;
        class C {
            int a;
        }

        f.a
        .f.a

        a
        g.a
        f.g.a
        .f.g.a

        C.a
        g.C.a
        f.g.C.a
        .f.g.C.a
    }
}
The first group of references here refers to the variable a directly
in function f. The second group refers to the variable a directly in
function g. The third group refers to the variable a inside class C.
In general, absolute paths (those starting with .) start at the global
scope. For relative paths, the first element (x) refers to the most
deeply nested declaration of x in a scope that encloses the reference.
Each subsequent element in the path must be a direct child of its
predecessor. And each must also enclose the reference, except possibly
for the last element and any elements referring to transparent scopes
directly before the last element.
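A minimal C++ sketch of that relative-path rule (the enclosure and
transparency checks are left out, and variables are treated as named
scopes just to keep it short):

#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct Scope {
    std::string name;
    Scope* parent = nullptr;
    std::map<std::string, Scope*> children;  // named nested declarations
};

// Resolve a relative path such as {"g", "C", "a"} starting from the
// scope that contains the reference.
Scope* resolve(Scope* from, const std::vector<std::string>& path) {
    // First element: the most deeply nested enclosing declaration.
    Scope* node = nullptr;
    for (Scope* s = from; s != nullptr && node == nullptr; s = s->parent) {
        auto it = s->children.find(path.front());
        if (it != s->children.end())
            node = it->second;
    }
    if (node == nullptr)
        return nullptr;                       // unresolved reference
    // Each later element must be a direct child of its predecessor.
    for (std::size_t i = 1; i < path.size(); ++i) {
        auto it = node->children.find(path[i]);
        if (it == node->children.end())
            return nullptr;
        node = it->second;
    }
    return node;
}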
Did that make any sense? :-) It sure took a while to put this into
words.
Well, you may only reference a symbol where it is visible; otherwise
it's a compile error. Ensuring that all references are resolved
properly will make sure that they work in the target language. If
you're going for assembly, they'll be present in the current stack
frame or in another stack frame you can reach by pointers. But this
language is currently translated to C++, so... :-)
--
Michiel Helvensteijn
Tony
"Tony" <to...@my.net> wrote in message news:08-1...@comp.compilers...
The last time I looked at a (Burroughs Extended) ALGOL compiler, it used
a linked list of symbol tables. When it entered a scope, it linked its
symbol table to the head of the list; when it left a scope, it removed
its table and pointed the list head at the previous scope.
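Sketched in C++ (a minimal sketch of that scheme, not the Burroughs
code, of course):

#include <map>
#include <string>

struct Symbol { /* type, offset, ... */ };

struct SymbolTable {
    std::map<std::string, Symbol> entries;
    SymbolTable* enclosing;            // next table in the list
};

SymbolTable* current = nullptr;        // head of the list = innermost scope

void enterScope() { current = new SymbolTable{{}, current}; }

void leaveScope() {
    SymbolTable* inner = current;
    current = current->enclosing;      // point the head at the previous scope
    delete inner;
}

Symbol* lookup(const std::string& name) {
    for (SymbolTable* t = current; t != nullptr; t = t->enclosing) {
        auto it = t->entries.find(name);
        if (it != t->entries.end())
            return &it->second;        // found in the nearest enclosing scope
    }
    return nullptr;                    // undeclared
}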
Louis
Yes it does, especially the last paragraph. It seems that scope is a
very, very key concept of a language. If I were building a toolkit to
implement languages with (I wish there was such a thing, btw), there
would be data structures/mechanisms/algorithms to make scope "a first
class citizen", so to speak. From one of the other replies, a potential
starting point for the aforementioned could be having a separate symbol
table for each scope. Or at least hierarchical ones per "major"
(translation unit?) scope.
Tony
"Alex L.K" <liangk...@gmail.com> wrote in message
> Basically, you can implement it with a tree. Each node in this tree
> corresponds to a scope and contains the declarations in this scope.
> When you search for a declaration, you begin with the current node in
> the tree; if you do not find it, you continue in the parent and so on.
> In practice, nobody would actually maintain this tree explicitly; they
> use a stack to keep a path from the root to the current node of this tree.
How does a compiler keep track of which scope it is currently under?
Naively, I could think that it maintains something like a call-stack
trace noting the hierarchical scope level, but that seems too processor
intensive to do at each scope entry/exit.
Tony
[This topic is covered well in texts like the old Dragon Book. In the
object code you don't usually need to know what scope you're in, because
the compiler can figure out what names correspond to what variables.
If you have variable sized arrays and scope across routine boundaries,
i.e., code in a routine can see non-static data outside the routine, it
gets somewhat hairier. But this was all figured out in the early 1960s
for Algol60. -John]
I like the multiple table method, or some kind of
hierarchical/multi-data-structure thing. It seems to me though that
keeping track of the current scope can be a potential source of
inefficiency. For example:
Developer writes:

void my_func()
{
    // func code
}

Compiler generates:

void my_func()
{
    EnterScope(my_func);   // compiler generates
    // func code
    LeaveScope(my_func);   // compiler generates
}
Tony
For static variables:
The same way as for C - a tree of stacks. The scope in which you are
executing gives the node where you are in the tree. If a symbol is
not found within a node, you look above (in the enclosing scope) until
you hit the root node.
For member functions/variables,
there is something called RTTI (run-time type identification), where
the C++ runtime library resolves the scope at runtime and enables you
to call the appropriate member function. The instance of the class that
is used to call a member function is represented by an implicit pointer
(this), and that pointer is used to access variables when a member
function accesses them.
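Roughly, the implicit this pointer amounts to this (a sketch of the
idea, not literal compiler output): the member function becomes an
ordinary function with one extra hidden parameter.

struct Counter {
    int n;
    void bump() { n += 1; }              // what the developer writes
};

// ...which the compiler treats approximately like:
struct CounterData { int n; };

void Counter_bump(CounterData* this_ptr) {  // hidden 'this' parameter
    this_ptr->n += 1;                       // member access goes through it
}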
Hope this answers your question.
regards
-kamal
> I like the multiple table method, or some kind of
> hierarchical/multi-data-structure thing. It seems to me though that
> keeping track of the current scope can be a potential source of
> inefficiency. For example:
[...]
> Compiler generates:
>
> void my_func()
> {
>     EnterScope(my_func);   // compiler generates
>
>     // func code
>
>     LeaveScope(my_func);   // compiler generates
> }
I can't see anything inefficient here. When a scope requires allocation
of memory, on the stack or somewhere else, it doesn't matter whether
that memory is allocated and released by the coder or by the compiler.
In C++, Delphi etc. the compiler inserts the construction and
destruction of local objects at the beginning and end of every scope block.
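For example (a sketch of the idea, not literal compiler output):

#include <cstdio>

struct Lock {
    Lock()  { std::puts("acquire"); }
    ~Lock() { std::puts("release"); }
};

void f() {
    Lock guard;    // construction at the start of the block
    // ... func code ...
}                  // the compiler inserts the call to ~Lock() here, at
                   // the end of the scope block; nothing keeps track of
                   // "the current scope" at run time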
Perhaps you missed that the scopes (as symbol tables) are required only
during compilation, not at runtime. Typical object files only contain
one symbol table, for use by the linker. More detailed debug information
may be created, too, but only for use by an interpreter or debugger.
DoDi
> EnterScope(my_func);

In my understanding, the purpose of scope is to help the code
generator generate code. After the code generation phase, I don't
think it is necessary to keep the scopes in memory. You could just
discard them and go on with the further optimization work.
I think the compiler keeps the multiple table method only during
compilation time, to *fix* the addresses. In your example:
>
> Developer writes:
>
> void my_func()
> {
>     // func code
> }
>
Say we have an `int a;' inside my_func(); the compiler knows that its
offset is x bytes from the stack pointer (say). It needn't know the
absolute address and needn't output code to compute the address at
run time. So, I doubt there is much *computation* (at run time) for
scope resolution in statically scoped languages.
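Concretely, something like this (the offsets are made up; the real
layout depends entirely on the compiler and the target):

void my_func() {
    int a;        // the compiler might place a at [sp - 4]
    int b;        // ...and b at [sp - 8]; both decided at compile time
    a = 1;        // emitted as a store to a fixed offset: no name lookup,
    b = a + 1;    // no scope bookkeeping, no address computation at run time
}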
Yes, in dynamically scoped languages, there will be issues. I am
curious -- are there any dynamically scoped compiled languages?
--
Vimal
[Some versions of Lisp can compile dynamically scoped code, albeit not
all that efficiently. -John]
I'm still not used to that concept: inserting stuff at the entry and exit of
every function, because to me (as a higher level developer), "a function
call is expensive" rings in my ear. I have a tracer class that I use during
development that maintains a call stack list, but I turn it off for
"release" code.
Obviously I'm missing the common knowledge on the subject, but I do
appreciate the responses because I am learning things. My texts on the
subject matter haven't arrived yet, but I have them on order to help me to
decide whether or not to pursue implementing (not designing, I know WHAT I
want) a new (not just "another"?) language.
>
> Perhaps you missed that the scopes (as symbol tables) are required only
> during compilation, not at runtime.
Definitely "missed" that. As a developer, of course I know what scope is,
but now I'm trying to get my mind around how that is implemented, and now
you just taught me WHEN it applies.
> Typical object files only contain
> one symbol table, for use by the linker. More detailed debug information
> may be created, too, but only for use by an interpreter or debugger.
To me (take this with a lot of salt), it seems that at some level, MULTIPLE
symbol tables would be appropriate: at the "module" level, for example.
I'm "way out of the league" as far as compiler implementation goes, so I
apologize to everyone in advance for asking "dumb" questions. All of this
info and my ponderings will surely help me in reading what has been
described, mildly, as "hard to approach" tomes (e.g., "The Dragon Book").
Tony
That's 3! I feel so dumb! This is great! (I like learning and knowing
stuff).
>> Developer writes:
>>
>> void my_func()
>> {
>>     // func code
>> }
>>
>
> Say we have an `int a;' inside my_func(); the compiler knows that its
> offset is x bytes from the stack pointer (say). It needn't know the
> absolute address and needn't output code to compute the address at
> run time. So, I doubt there is much *computation* (at run time) for
> scope resolution in statically scoped languages.
I just recently "got a clue" about assembly language that I could relate to
my HP15C RPN calculator. As someone who has done a lot of programming at a
high level (read, above assembly), "the stack" has been rather nebulous
until just recently for me. I DO understand what you are saying though.
To a "high level" developer like me, I just ACCEPT the opening brace of a
function, but that opening (and closing) brace represents all the hidden
details of what the code is doing, courtesy of the compiler, and while my
real goal is being able to code in the language I envision, I am curious
enough about what's going on there to potentially "render it into
submission". ( ;) on the last analogy).
Tony
I can relate to trees (I've implemented a b+-tree for what I hope will
become my in-house embedded database). I think the biggest unknowns to me at
this time are those chasms between parsing and code generation.
>
> For member functions/variables,
> there is something called RTTI (run-time type identification), where
> the C++ runtime library resolves the scope at runtime and enables you
> to call the appropriate member function. The instance of the class that
> is used to call a member function is represented by an implicit pointer
> (this), and that pointer is used to access variables when a member
> function accesses them.
Separate topic stuff. I've been "round-n-round" with the language guys about
that kind of stuff: RTTI, exceptions, templates. Currently, I AVOID using
those things in C++ because I don't know how they are implemented and I may
want to exclude those things from "my language" (and find a better solution
perhaps or just do without) if they are "too messy". (Again, separate
topic).
That said, the question becomes: Is RTTI THE (uppercase) solution to the
problem? Is the "problem" polymorphism? If that IS the problem that RTTI
solves: I AM ALL EARS!!! (Because I have BIG issues with vptrs).
Tony
Tony, Tony, Tony.
Learn one thing at a time.
If you want to learn about how to implement scope, learn how to
implement scope. Download the source to a free Pascal compiler and see
how one implementation does it for a fairly straightforward language.
As you've correctly observed, C++ is complicated. If you want to learn
how some of that language is implemented, get an old copy of the
Annotated C++ Reference Manual. There are no new copies, because, as
far as I know, the book was never updated to reflect the latest version
of C++. Which is unfortunate, but there you are.
Reading source listings is how I learned what little I know about system
software -- compilers, operating systems, etc. When I was in high
school, we used a Burroughs B5500 at a local university, and us kiddies
got a hold of a copy of an ALGOL compiler listing. The compiler was
written in ALGOL. It was a single-pass compiler with a bounded context
recursive descent parser, and every time I looked through the listing, a
little more of it made sense.
So go find a Pascal compiler and see how it's done. Just be warned that
in my experience, once you do systems stuff, you'll never be truly happy
as an applications programmer again.
Louis