Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Compiling C++ Templates as opposed to Preprocessing them.

13 views
Skip to first unread message

Seima Rao

unread,
Jul 31, 2010, 10:34:09 AM7/31/10
to
Hi,


Can readers of this forum advise on whether it would be
fruitful to consider compiling C++ templates to
IR instead of preprocessing them?


Consider the following snippet:


template<class T>
void
foo(const T &t)
{
T::i; // #1
}


Since the compiler cannot tell if `i' is a static member of
`T',
an extra operator needs to be introduced in the IR so that
the `::' operator can be mapped to it. This operator would
have no use in a non-generic scope, therefore, that operator
is there merely to represent templates in the IR.


I am sure professional compiler writers would have
(many) non-trivial issues to tell.


What I want to know is the following:


1) Has C++ been defined to accomodate compilation of
templates?
2) Is name resolution unambiguously doable in
template scopes
so that IRs that represent templates dont
have to worry about
it at all?


Sincerely,
Seima Rao.

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

CornedBee

unread,
Aug 1, 2010, 10:32:59 AM8/1/10
to
On Jul 31, 7:34 am, Seima Rao <seima...@gmail.com> wrote:
> Hi,
>
> Can readers of this forum advise on whether it would be
> fruitful to consider compiling C++ templates to
> IR instead of preprocessing them?

This is a question without an answer, really, because it makes
assumptions about the way compilers work that, as far as I can
understand your question, are simply wrong. So let me start out by
describing how templates are processed by the compiler.

GCC and Clang (I cannot really talk about the others) first preprocess
the source (as any other) and lex and parse it. When they encounter a
template, they will build an AST for it, just as they would do for any
other source code. The thing is that this AST contains some nodes that
carry some things that only appear in templates. For example, the type
of an expression might be dependent.
I am pretty sure, from observing its behaviour, that MSVC caches the
token stream instead of building an AST, but that's not a good
approach: there are various bugs that can be traced to this
implementation choice.

Anyway, when a template is instantiated, the AST for it is duplicated,
but with the template parameters filled in. The resulting template
instantiation is just like a normal function or class. In particular,
code generation doesn't really treat them differently.

Now I believe you're asking whether the compiler could, instead of
instantiating a template, generate code where a type can be
substituted in at runtime? The answer is no. There is too much
semantic validation that has to be done when instantiating a template,
which includes name lookup, overload resolution, and more. This cannot
be deferred until runtime. First of all, template instantiation could
fail, and you really don't want such failures to result in runtime
errors. You want them at compile time. Second, preserving all the
information necessary for runtime template instantiation means you
might as well carry the whole program source code with you and
recompile it on demand. In other words, embed a complete C++
interpreter in your program.

So while in theory, this might be doable, it's simply not practical.

Mathias Gaunard

unread,
Aug 1, 2010, 10:48:33 AM8/1/10
to
On Jul 31, 3:34 pm, Seima Rao <seima...@gmail.com> wrote:
> Consider the following snippet:
>
> template<class T>
> void
> foo(const T &t)
> {
> T::i; // #1
> }
>
> Since the compiler cannot tell if `i' is a static member of
> `T',
> an extra operator needs to be introduced in the IR so that
> the `::' operator can be mapped to it. This operator would
> have no use in a non-generic scope, therefore, that operator
> is there merely to represent templates in the IR.
>
> I am sure professional compiler writers would have
> (many) non-trivial issues to tell.

The AST of a C++ frontend is the IR you're looking for.
Whether you can make it simpler and more low-level or not, I don't
know.


>
> What I want to know is the following:
>
> 1) Has C++ been defined to accomodate compilation of
> templates?

Typically, templates are directly compiled to low-level code by the C+
+ frontend.

Walter Bright

unread,
Aug 1, 2010, 10:48:33 AM8/1/10
to
Seima Rao wrote:
> Can readers of this forum advise on whether it would be
> fruitful to consider compiling C++ templates to
> IR instead of preprocessing them?

Frankly, precompiling them is a waste of time and effort, in my not-so-humble
opinion. The Digital Mars C++ compiler stores template definitions as a sequence
of tokens. At instantiation time, it does the syntactic and semantic analysis in
one go.

The Standard allows precompiling them into some syntactical intermediate form,
and there are some crutches in the syntax to allow this to work. It's my
understanding that this is how all the other C++ compilers handle it. The
advantage of this approach is that some syntactical errors can be diagnosed
before instantiating the template. The disadvantage is it's significantly more
complex to implement.

The Standard allows for either method.

The D programming language has complete separation of syntax from semantics, and
so the template definitions are converted to and stored as syntax trees.

Seima Rao

unread,
Aug 1, 2010, 7:53:31 PM8/1/10
to
>
> the source (as any other) and lex and parse it. When they encounter a
> template, they will build an AST for it, just as they would do for any
> other source code. The thing is that this AST contains some nodes that
> carry some things that only appear in templates.

The IR(AST as is cited by you) has extra node types just for
templates,
if I understand you correctly?

>
> Now I believe you're asking whether the compiler could, instead of
> instantiating a template, generate code where a type can be
> substituted in at runtime? The answer is no.

You assumed incorrectly. What I am interested in is the two
scenarios that a compiler has to contend with:

1) template declaration
2) template instantiation

Your reply seems to suggest that compilers piggyback on
preprocessing technology(aka macro substitution) to
get template instantations to work admittedly
in a transparent fashion to the rest of the toolchain.

Any one with suggestions for compiling templates
only during declaration and skipping name lookup
and other sundries during instantiation?
[C++ makes it impossible, but still...]

Sincerely,
Seima Rao.

Walter Bright

unread,
Aug 2, 2010, 1:51:21 AM8/2/10
to
CornedBee wrote:
> I am pretty sure, from observing its behaviour, that MSVC caches the
> token stream instead of building an AST, but that's not a good
> approach: there are various bugs that can be traced to this
> implementation choice.

Please give an example.

Edward Rosten

unread,
Aug 2, 2010, 2:50:35 PM8/2/10
to
On Aug 1, 3:48 pm, Walter Bright <newshou...@digitalmars.com> wrote:
> Seima Rao wrote:
> > Can readers of this forum advise on whether it would be
> > fruitful to consider compiling C++ templates to
> > IR instead of preprocessing them?
>
> Frankly, precompiling them is a waste of time and effort, in my not-so-humble
> opinion. The Digital Mars C++ compiler stores template definitions as a sequence
> of tokens. At instantiation time, it does the syntactic and semantic analysis in
> one go.
>
> The Standard allows precompiling them into some syntactical intermediate form,
> and there are some crutches in the syntax to allow this to work. It's my
> understanding that this is how all the other C++ compilers handle it.

I assume by this that you are referring to the need to
scatter .template within templated code? I presume then that DMC++
doesn't need them (much like GCC 3.x series of compilers), and 4.x
with appropriate compiler flags?

> The
> advantage of this approach is that some syntactical errors can be diagnosed
> before instantiating the template. The disadvantage is it's significantly more
> complex to implement.

Not to mention making the template code somewhat nastier to read, if
my above assumption is correct.

-Ed

Walter Bright

unread,
Aug 3, 2010, 8:47:37 AM8/3/10
to
Edward Rosten wrote:
> On Aug 1, 3:48 pm, Walter Bright <newshou...@digitalmars.com> wrote:
>> Frankly, precompiling them is a waste of time and effort, in my not-so-humble
>> opinion. The Digital Mars C++ compiler stores template definitions as a sequence
>> of tokens. At instantiation time, it does the syntactic and semantic analysis in
>> one go.
>>
>> The Standard allows precompiling them into some syntactical intermediate form,
>> and there are some crutches in the syntax to allow this to work. It's my
>> understanding that this is how all the other C++ compilers handle it.
> I assume by this that you are referring to the need to
> scatter .template within templated code? I presume then that DMC++
> doesn't need them (much like GCC 3.x series of compilers), and 4.x
> with appropriate compiler flags?

Right, it doesn't need them, though in order to be Standard compliant the parser gives an error if they aren't there!

Nikolay Ivchenkov

unread,
Aug 3, 2010, 6:17:41 PM8/3/10
to
On 1 Aug, 18:48, Walter Bright <newshou...@digitalmars.com> wrote:
> Seima Rao wrote:
> > Can readers of this forum advise on whether it would be
> > fruitful to consider compiling C++ templates to
> > IR instead of preprocessing them?
>
> Frankly, precompiling them is a waste of time and effort, in my not-so-humble
> opinion. The Digital Mars C++ compiler stores template definitions as a sequence
> of tokens. At instantiation time, it does the syntactic and semantic analysis in
> one go.

According to syntactic rules, a sequence of tokens can be considered
as a template declaration (and possibly a template definition) _only
if_ the sequence satisfies the syntactic rules for template
declarations (see below). Otherwise if the program would be
syntactically incorrect, a diagnostic message shall be issued in most
cases. (Of course, a compiler - like yours - can violate standard
requirements and, in particular, do not provide diagnostics when it is
required by the standard).

template-declaration:
export opt template < template-parameter-list > declaration

A declaration cannot consist of arbitrary sequence of tokens (see C+
+03 - [dcl.dcl]). There are no special exceptions for templates in
this respect. Thus, the syntax of any template must be completely
checked regardless of whether the template is instantiated.

Walter Bright

unread,
Aug 4, 2010, 12:30:58 AM8/4/10
to
Nikolay Ivchenkov wrote:
> A declaration cannot consist of arbitrary sequence of tokens (see C+
> +03 - [dcl.dcl]). There are no special exceptions for templates in
> this respect. Thus, the syntax of any template must be completely
> checked regardless of whether the template is instantiated.


This is the template definition part which is stored as a sequence of tokens
until instantiation time, not the declaration. The declarations are always
parsed and checked, even if never used.

My reading of C++98 was that the compiler was not required to issue a diagnostic
for a malformed definition if it was never instantiated.

Nikolay Ivchenkov

unread,
Aug 4, 2010, 3:18:35 PM8/4/10
to
On 4 Aug, 08:30, Walter Bright <newshou...@digitalmars.com> wrote:
> Nikolay Ivchenkov wrote:
> > A declaration cannot consist of arbitrary sequence of tokens (see C+
> > +03 - [dcl.dcl]). There are no special exceptions for templates in
> > this respect. Thus, the syntax of any template must be completely
> > checked regardless of whether the template is instantiated.
>
> This is the template definition part which is stored as a sequence of tokens
> until instantiation time, not the declaration.

Every template definition is a template declaration. For example, in
the following template definition

template <class T>
void f() { T::g(); }

every its token is a part of the template-declaration. According to
the syntactic rules, the following sequence of tokens

template <class T>
void f() { T + 1; }

cannot be considered as a template-declaration at all.

> My reading of C++98 was that the compiler was not required to issue a diagnostic
> for a malformed definition if it was never instantiated.

What's the malformed definition? (I want to see normative criteria if
they exist). For example, can we consider the following sequence of
tokens

%+*!^~

as a malformed template definition (according to your interpretation)?

Felipe Magno de Almeida

unread,
Aug 4, 2010, 3:18:34 PM8/4/10
to
On Aug 4, 1:30 am, Walter Bright <newshou...@digitalmars.com> wrote:
> Nikolay Ivchenkov wrote:
> > A declaration cannot consist of arbitrary sequence of tokens (see C+
> > +03 - [dcl.dcl]). There are no special exceptions for templates in
> > this respect. Thus, the syntax of any template must be completely
> > checked regardless of whether the template is instantiated.
>
> This is the template definition part which is stored as a sequence of tokens
> until instantiation time, not the declaration. The declarations are always
> parsed and checked, even if never used.
>
> My reading of C++98 was that the compiler was not required to issue a diagnostic
> for a malformed definition if it was never instantiated.

Unless I'm missing something, I think it is required to diagnose.
After all, if you implement Two Phase Lookup then you must make the
first lookup
when you first parse the template.

> --

Regards,
--
Felipe Magno de Almeida

Walter Bright

unread,
Aug 4, 2010, 10:15:59 PM8/4/10
to
Nikolay Ivchenkov wrote:
> On 4 Aug, 08:30, Walter Bright <newshou...@digitalmars.com> wrote:
>> My reading of C++98 was that the compiler was not required to issue a diagnostic
>> for a malformed definition if it was never instantiated.
>
> What's the malformed definition?

One that doesn't follow the grammar.

> (I want to see normative criteria if
> they exist). For example, can we consider the following sequence of
> tokens
>
> %+*!^~
>
> as a malformed template definition (according to your interpretation)?

Yes.

Walter Bright

unread,
Aug 4, 2010, 10:15:26 PM8/4/10
to
Felipe Magno de Almeida wrote:
> On Aug 4, 1:30 am, Walter Bright <newshou...@digitalmars.com> wrote:
>> My reading of C++98 was that the compiler was not required to issue a diagnostic
>> for a malformed definition if it was never instantiated.
>
> Unless I'm missing something, I think it is required to diagnose.

At this point, I'll require a specific quote from C++98 <g>.


> After all, if you implement Two Phase Lookup then you must make the
> first lookup
> when you first parse the template.

The Digital Mars C++ does implement two phase lookup correctly, so
preparsing
the template definitions is not required in order to get that right.

--

CornedBee

unread,
Aug 5, 2010, 5:13:45 AM8/5/10
to
On Aug 4, 7:15 pm, Walter Bright <newshou...@digitalmars.com> wrote:
> Felipe Magno de Almeida wrote:
>
>> On Aug 4, 1:30 am, Walter Bright <newshou...@digitalmars.com> wrote:
>>> My reading of C++98 was that the compiler was not required to issue a diagnostic
>>> for a malformed definition if it was never instantiated.
>
>> Unless I'm missing something, I think it is required to diagnose.
>
> At this point, I'll require a specific quote from C++98 <g>.

[temp.res]p8 gives an explicit example of completely nonsensical code
(not even syntactically correct), with the note that it *may* be
diagnosed even if the template is never instantiated. In other words,
as long as a template is not instantiated, you can have whatever you
want in there, as long as the parser is still able to find the end of
the template. Getting diagnostics early is simply a QoI issue.
Clang aims at diagnosing things as early as possible and as thoroughly
as possible (there are many things in the area that are ill-formed, no
diagnostic required; we want to diagnose them), so blindly caching
tokens is simply not an option. We would have to cache them and parse
them at definition time, and then parse them again at instantiation
time. (For every instantiation, of course.) Now *that* is a waste of
effort.

Sebastian

CornedBee

unread,
Aug 5, 2010, 6:18:36 AM8/5/10
to
On Aug 1, 4:53 pm, Seima Rao <seima...@gmail.com> wrote:
> The IR(AST as is cited by you) has extra node types just for
> templates,
> if I understand you correctly?

Yes. In Clang's AST, we have various declaration nodes for templates,
we have the dependent type, we have dependent-sized array types, and
we have various pseudo-types for templates and parameters. All in all,
I'd say there's about a dozen AST node types specific to templates
(out of about 200).

> You assumed incorrectly. What I am interested in is the two
> scenarios that a compiler has to contend with:
>
> 1) template declaration
> 2) template instantiation
>
> Your reply seems to suggest that compilers piggyback on
> preprocessing technology(aka macro substitution) to
> get template instantations to work admittedly
> in a transparent fashion to the rest of the toolchain.

No, Clang does tree substitution on its ASTs. It has nothing to do
with the preprocessor. Token-caching implementations like VC++ and DMC+
+ use mechanics that are similar to the preprocessor, but I doubt
there's any code sharing going on.

> Any one with suggestions for compiling templates
> only during declaration and skipping name lookup
> and other sundries during instantiation?
> [C++ makes it impossible, but still...]

I have no clue what you want to know.

Sebastian

CornedBee

unread,
Aug 5, 2010, 6:18:34 AM8/5/10
to
On Aug 2, 11:50 am, Edward Rosten <edward.ros...@gmail.com> wrote:
> I assume by this that you are referring to the need to
> scatter .template within templated code? I presume then that DMC++
> doesn't need them (much like GCC 3.x series of compilers), and 4.x
> with appropriate compiler flags?

Template and typename, yes. Clang needs them.

Sebastian

CornedBee

unread,
Aug 5, 2010, 6:18:36 AM8/5/10
to
On Aug 1, 10:51 pm, Walter Bright <newshou...@digitalmars.com> wrote:
> CornedBee wrote:
>> I am pretty sure, from observing its behaviour, that MSVC caches the
>> token stream instead of building an AST, but that's not a good
>> approach: there are various bugs that can be traced to this
>> implementation choice.
>
> Please give an example.

I don't have any way to try out code in VC++ at the moment, so I can
only guess. Besides, token-caching doesn't mean it's impossible to get
a correct implementation; it's just my opinion that Clang's tree
substitution is a lot simpler than what I come up with when I think
about token-caching implementations. VC++ has improved a lot ever
since VC++6, and most of its bugs are gone. (Its name lookup is still
broken, but that's kinda by design.)

Here's an interesting bug, though. It demonstrates that VC++ most
likely uses token-caching, and also some other details about how it
works. I'm recreating it from memory here, so I can't guarantee this
really demonstrates the bug I encountered.

namespace detail {
struct S {};
}
namespace foo {
template <typename X>
void f(X, detail::S *) {
std::cerr << "Template\n";
}
void f(int, bool) {
std::cerr << "Plain\n";
}

namespace detail {
}
}
int main() {
detail::S s;
foo::f(1, &s);
}

A conforming compiler will compile this program to print "Template". VC
++ compiled it to print "Plain". (Except of course that in my original
program, the difference was much more insidious. Obviously.) What
happened?
VC++, when caching tokens, also validates declarations. That is, it
performs name lookup to make sure everything is alright. So when
parsing fn<T>, it looked up detail::S and found ::detail::S.
But it doesn't save the result of the lookup. Instead, it does lookup
again when instantiating. And while it does that within the correct
scope, it does it with full knowledge of that scope. OK, then, the
namespace detail should refer to ::foo::detail and thus detail::S
should be not found, right? Have an error?
No, actually. I believe that VC++'s SFINAE is implemented such that
any error in the function declaration that occurs during instantiation
is taken as a SFINAE, since they validated the declaration when first
parsing it. In other words, the name lookup failure for detail::S,
which is completely non-dependent, is not a hard error, but SFINAE'd
away. The f<int> instantiation is removed from the overload set, and
plain f is called.

This is what surprises me about your statement that token-caching is
easier than tree substitution. I would have thought that really
getting name lookup (in particular two-phase name lookup) right for
token caching is devilishly hard. You either have to attach the lookup
result to the cached identifier tokens (but you can't do that, because
you need a proper parse to determine if you're even supposed to look
up a name), or you need to be able to recreate the exact lookup
context at the template definition point. I don't know about DMC++,
but in Clang that would be a major challenge.
If you have a better idea, I'd be interested in hearing about it.

Nikolay Ivchenkov

unread,
Aug 5, 2010, 2:53:29 PM8/5/10
to
On 5 Aug, 06:15, Walter Bright <newshou...@digitalmars.com> wrote:
> Nikolay Ivchenkov wrote:
> > On 4 Aug, 08:30, Walter Bright <newshou...@digitalmars.com> wrote:
> >> My reading of C++98 was that the compiler was not required to issue a diagnostic
> >> for a malformed definition if it was never instantiated.
>
> > What's the malformed definition?
>
> One that doesn't follow the grammar.
>
> > (I want to see normative criteria if
> > they exist). For example, can we consider the following sequence of
> > tokens
>
> > %+*!^~
>
> > as a malformed template definition (according to your interpretation)?
>
> Yes.

If any ill-formed according to syntactic rules sequence of tokens
might be interpreted as an ill-formed template definition for which no
diagnostic is required, then syntactic rules would be non-diagnosable
at all. Actually the set of diagnosable rules includes syntactic rules
- see C++03 - 1.4:

[quote]
The set of diagnosable rules consists of all syntactic and semantic
rules in this International Standard except for those rules containing
an explicit notation that "no diagnostic is required" or which are
described as resulting in "undefined behavior."

Although this International Standard states only requirements on C++
implementations, those requirements are often easier to understand if
they are phrased as requirements on programs, parts of programs, or
execution of programs. Such requirements have the following meaning:
- If a program contains no violations of the rules in this
International Standard, a conforming implementation shall, within its
resource limits, accept and correctly execute that program.
- If a program contains a violation of any diagnosable rule, a
conforming implementation shall issue at least one diagnostic message,
except that
- If a program contains a violation of a rule for which no diagnostic
is required, this International Standard places no requirement on
implementations with respect to that program.
[/quote]

A compiler is allowed to consider a sequence of tokens as an ill-
formed template definition for which no diagnostic is required only if
it can prove that the sequence of tokens is really a template
definition. The only way to do this is to check whether the sequence
of tokens matches the syntax of possible template definitions.

On 5 Aug, 13:13, CornedBee <wasti.r...@gmx.net> wrote:
>
> [temp.res]p8 gives an explicit example of completely nonsensical code
> (not even syntactically correct), with the note that it *may* be
> diagnosed even if the template is never instantiated. In other words,
> as long as a template is not instantiated, you can have whatever you
> want in there, as long as the parser is still able to find the end of
> the template.

Lets consider the following ill-formed program:

template <class T>
void f()
{

(
} // is this the end of the template definition?
)
} // is this the end of the template definition?

int main()
{
} // is this the end of the template definition?

There are two mutually exclusive options:
* no diagnostic is required for this program;
* a diagnostic message shall be issued.

There shall be standard criteria for determining which option is
right. Can you show these criteria?

Walter Bright

unread,
Aug 5, 2010, 3:48:35 PM8/5/10
to
CornedBee wrote:
> On Aug 4, 7:15 pm, Walter Bright <newshou...@digitalmars.com> wrote:
>> Felipe Magno de Almeida wrote:
>>
>>> On Aug 4, 1:30 am, Walter Bright <newshou...@digitalmars.com> wrote:
>>>> My reading of C++98 was that the compiler was not required to issue a
diagnostic
>>>> for a malformed definition if it was never instantiated.
>>> Unless I'm missing something, I think it is required to diagnose.
>> At this point, I'll require a specific quote from C++98 <g>.
>
> [temp.res]p8 gives an explicit example of completely nonsensical code
> (not even syntactically correct), with the note that it *may* be
> diagnosed even if the template is never instantiated. In other words,
> as long as a template is not instantiated, you can have whatever you
> want in there, as long as the parser is still able to find the end of
> the template.

Right, so it's not required.


> Getting diagnostics early is simply a QoI issue.
> Clang aims at diagnosing things as early as possible and as thoroughly
> as possible (there are many things in the area that are ill-formed, no
> diagnostic required; we want to diagnose them), so blindly caching
> tokens is simply not an option. We would have to cache them and parse
> them at definition time, and then parse them again at instantiation
> time. (For every instantiation, of course.) Now *that* is a waste of
> effort.

It's a worthy goal to try and do that.

Felipe Magno de Almeida

unread,
Aug 5, 2010, 3:55:09 PM8/5/10
to
On Aug 5, 7:18 am, CornedBee <wasti.r...@gmx.net> wrote:
> On Aug 1, 10:51 pm, Walter Bright <newshou...@digitalmars.com> wrote:
>
> > CornedBee wrote:
> >> I am pretty sure, from observing its behaviour, that MSVC caches the
> >> token stream instead of building an AST, but that's not a good
> >> approach: there are various bugs that can be traced to this
> >> implementation choice.
>
> > Please give an example.

[snipped example]

> This is what surprises me about your statement that token-caching is
> easier than tree substitution. I would have thought that really
> getting name lookup (in particular two-phase name lookup) right for
> token caching is devilishly hard. You either have to attach the lookup
> result to the cached identifier tokens (but you can't do that, because
> you need a proper parse to determine if you're even supposed to look
> up a name), or you need to be able to recreate the exact lookup
> context at the template definition point.

That's what I was thinking. How can you make two-phase lookup in just
one phase?
Specially since the lookup must be done in the template's defintion as
well, not just
its signature, AFAIU.
You can't pick context that comes after the template definition.
Though I see I was wrong about the diagnose part.

> I don't know about DMC++,
> but in Clang that would be a major challenge.
> If you have a better idea, I'd be interested in hearing about it.

So am I.

> --

Regards,
--
Felipe Magno de Almeida

Walter Bright

unread,
Aug 5, 2010, 11:29:26 PM8/5/10
to
CornedBee wrote:
> Token-caching implementations like VC++ and DMC+
> + use mechanics that are similar to the preprocessor, but I doubt
> there's any code sharing going on.

Digital Mars C++ does not use token caching for the preprocessor, it stores the
macros as text. I was told that it was impossible to do a correct preprocessor
implementation this way, but DMC++ is 100% correct (passes Mensonidas' suite).

(BTW, it was far harder to get a correct pp implementation than to do the two
phase name lookup. My main issue with the pp was getting it to be super fast,
which is devilishly hard while maintaining all the weird rules of the pp. I
figured the only way to get the speed was by making it a text based one.)

Walter Bright

unread,
Aug 5, 2010, 11:28:09 PM8/5/10
to
CornedBee wrote:
> On Aug 1, 10:51 pm, Walter Bright <newshou...@digitalmars.com> wrote:
>> CornedBee wrote:
>>> I am pretty sure, from observing its behaviour, that MSVC caches the
>>> token stream instead of building an AST, but that's not a good
>>> approach: there are various bugs that can be traced to this
>>> implementation choice.
>> Please give an example.
> Here's an interesting bug, though. It demonstrates that VC++ most
> likely uses token-caching, and also some other details about how it
> works. I'm recreating it from memory here, so I can't guarantee this
> really demonstrates the bug I encountered.

#include <iostream>

> namespace detail {
> struct S {};
> }
> namespace foo {
> template <typename X>
> void f(X, detail::S *) {
> std::cerr << "Template\n";
> }
> void f(int, bool) {
> std::cerr << "Plain\n";
> }
>
> namespace detail {
> }
> }
> int main() {
> detail::S s;
> foo::f(1, &s);
> }
>
> A conforming compiler will compile this program to print "Template". VC
> ++ compiled it to print "Plain".

Digital Mars C++ compiles it and prints "Template", yet uses token caching.


> This is what surprises me about your statement that token-caching is
> easier than tree substitution. I would have thought that really
> getting name lookup (in particular two-phase name lookup) right for
> token caching is devilishly hard.

No, it's straightforward, though it took me a while to figure it out.

> You either have to attach the lookup
> result to the cached identifier tokens (but you can't do that, because
> you need a proper parse to determine if you're even supposed to look
> up a name), or you need to be able to recreate the exact lookup
> context at the template definition point. I don't know about DMC++,
> but in Clang that would be a major challenge.
> If you have a better idea, I'd be interested in hearing about it.

Just keep track of the source locations of each definition in the symbol table.
You probably do that already for the symbolic debug info.

Walter Bright

unread,
Aug 5, 2010, 11:30:01 PM8/5/10
to
Nikolay Ivchenkov wrote:
> On 5 Aug, 06:15, Walter Bright <newshou...@digitalmars.com> wrote:
>> Nikolay Ivchenkov wrote:
>>> On 4 Aug, 08:30, Walter Bright <newshou...@digitalmars.com> wrote:
>>>> My reading of C++98 was that the compiler was not required to issue a diagnostic
>>>> for a malformed definition if it was never instantiated.
>>> What's the malformed definition?
>> One that doesn't follow the grammar.
>>
>>> (I want to see normative criteria if
>>> they exist). For example, can we consider the following sequence of
>>> tokens
>>> %+*!^~
>>> as a malformed template definition (according to your interpretation)?
>> Yes.

[...]

> There shall be standard criteria for determining which option is
> right. Can you show these criteria?

Consider this quote from C++98 14.6.7 which I believe overrides the quote from
the introduction:

"Knowing which names are type names allows the syntax of every template
definition to be checked. No diagnostic shall be issued for a template
definition for which a valid specialization can be generated. If no
valid specialization can be generated for a template definition, and that
template is not instantiated, the template definition is illformed,
no diagnostic required. [Note: if a template is instantiated, errors will be
diagnosed according to the other rules in this Standard. Exactly when these
errors are diagnosed is a quality of
implementation issue. ]"

This suggests that if the template is never instantiated, errors in it do not
have to be diagnosed.

BTW, the token collection scheme for templates counts the braces. Your example
won't compile with DMC++. The template definition "parser" does just enough to
correctly find the end of the definition.

Nikolay Ivchenkov

unread,
Aug 5, 2010, 11:36:10 PM8/5/10
to
On 5 Aug, 23:48, Walter Bright <newshou...@digitalmars.com> wrote:
> CornedBee wrote:
> > On Aug 4, 7:15 pm, Walter Bright <newshou...@digitalmars.com> wrote:
> >> Felipe Magno de Almeida wrote:
>
> >>> On Aug 4, 1:30 am, Walter Bright <newshou...@digitalmars.com> wrote:
> >>>> My reading of C++98 was that the compiler was not required to issue a
> diagnostic
> >>>> for a malformed definition if it was never instantiated.
> >>> Unless I'm missing something, I think it is required to diagnose.
> >> At this point, I'll require a specific quote from C++98 <g>.
>
> > [temp.res]p8 gives an explicit example of completely nonsensical code
> > (not even syntactically correct), with the note that it *may* be
> > diagnosed even if the template is never instantiated. In other words,
> > as long as a template is not instantiated, you can have whatever you
> > want in there, as long as the parser is still able to find the end of
> > the template.
>
> Right, so it's not required.

Examples and notes are not normative part of the standard. Moreover,
some examples are provided with inappropriate comments.

1) 3.4.1/3:
[quote]
typedef int f;
struct A {
friend void f(A &);
operator int();
void g(A a) {
f(a);
}
};

The expression f(a) is a cast-expression equivalent to int(a).
[/quote]

f(a) is not an expression, since f(a); is treated as a declaration of
a variable with name "a". The issue is fixed in C++0x.

2) 5/4:
[quote]
i = v[i++]; // the behavior is unspecified
i = 7, i++, i++; // i becomes 9
i = ++i + 1; // the behavior is unspecified
i = i + 1; // the value of i is incremented
[/quote]

Here both occurrences of "the behavior is unspecified" must be
replaced with "the behavior is undefined". The issue is fixed in C+
+0x.

3) 5.2.7/9:
[quote]
class A { virtual void f(); };
class B { virtual void g(); };
class D : public virtual A, private B {};
void g()
{
D d;
B* bp = (B*)&d; // cast needed to break protection
A* ap = &d; // public derivation, no cast needed
D& dr = dynamic_cast<D&>(*bp); // fails
ap = dynamic_cast<A*>(bp); // fails
bp = dynamic_cast<B*>(ap); // fails
ap = dynamic_cast<A*>(&d); // succeeds
bp = dynamic_cast<B*>(&d); // fails
}
class E : public D, public B {};
class F : public E, public D {};
void h()
{
F f;
A* ap = &f; // succeeds: finds unique A
D* dp = dynamic_cast<D*>(ap); // fails: yields 0
// f has two D sub-objects
E* ep = (E*)ap; // ill-formed:
// cast from virtual base
E* ep1 = dynamic_cast<E*>(ap); // succeeds
}
[/quote]

The comment in the line

bp = dynamic_cast<B*>(&d); // fails

shall be changed to "ill-formed". The issue is fixed in C++0x.

Similarly, in 14.6/7

int j;
template<class T> class X {
// ...
void f(T t, int i, char* p)
{
t = i; // diagnosed if X::f is instantiated
// and the assignment to t is an error
p = i; // may be diagnosed even if X::f is
// not instantiated
p = j; // may be diagnosed even if X::f is
// not instantiated
}
void g(T t) {
+; // may be diagnosed even if X::g is
// not instantiated
}
};

the sequence of tokens void g(T t) { +; } shall be completely removed
from the example, because otherwise the entire code does not contain a
template definition. This issue is not fixed yet. In any case examples
do not define standard conformance.

Nikolay Ivchenkov

unread,
Aug 6, 2010, 9:01:33 PM8/6/10
to
On 6 Aug, 07:30, Walter Bright <newshou...@digitalmars.com> wrote:
> Nikolay Ivchenkov wrote:
> > On 5 Aug, 06:15, Walter Bright <newshou...@digitalmars.com> wrote:
> >> Nikolay Ivchenkov wrote:
> >>> On 4 Aug, 08:30, Walter Bright <newshou...@digitalmars.com> wrote:
> >>>> My reading of C++98 was that the compiler was not required to issue a diagnostic
> >>>> for a malformed definition if it was never instantiated.
> >>> What's the malformed definition?
> >> One that doesn't follow the grammar.
>
> >>> (I want to see normative criteria if
> >>> they exist). For example, can we consider the following sequence of
> >>> tokens
> >>> %+*!^~
> >>> as a malformed template definition (according to your interpretation)?
> >> Yes.
>
> [...]
>
> > There shall be standard criteria for determining which option is
> > right. Can you show these criteria?
>
> Consider this quote from C++98 14.6.7 which I believe overrides the quote from
> the introduction:
>
> "Knowing which names are type names allows the syntax of every template
> definition to be checked.

It allows to distinguish template definitions from any other sequences
of tokens.

> If no
> valid specialization can be generated for a template definition, and that
> template is not instantiated, the template definition is illformed,
> no diagnostic required.

We may apply this rule only to a sequence of tokens that can be
recognized as a template definition according to syntactic rules. A
template definition can be semantically incorrect, but every template
definition is syntactically correct _by definition_ (there are no
criteria to distinguish template definitions from any other sequences
of tokens other than syntactic rules).

> This suggests that if the template is never instantiated, errors in it do not
> have to be diagnosed.

That's right only for semantic errors.

> BTW, the token collection scheme for templates counts the braces.

Look at the grammar of C++. Do you see something like

template-declaration:
export opt template < template-parameter-list > potential-
declaration

potential-declaration:
potential-function-declaration
potential-class-declaration
potential-static-data-member-definition

potential-function-declaration:
simple-declaration
potential-function-definition
template-declaration

potential-function-definition:
decl-specifier-seq opt declarator p-ctor-initializer opt potential-
function-body
decl-specifier-seq opt declarator potential-function-try-block

potential-function-body:
{ potential-statement-seq opt }

potential-statement-seq:
potential-statement-seq opt { potential-statement-seq opt }
not-a-brace-seq

not-a-brace-seq:
not-a-brace-seq opt not-a-brace

not-a-brace:
token

(where not-a-brace is any token except { and })

in the standard? I don't see. So if your interpretation would be
correct, how exactly the bounds of a template should be determined?

> Your example
> won't compile with DMC++. The template definition "parser" does just enough to
> correctly find the end of the definition.

If your interpretation would be correct, the end of this template
should be strictly defined, because it affects whether the diagnostic
is required.

Nikolay Ivchenkov

unread,
Aug 7, 2010, 6:09:42 AM8/7/10
to
Nikolay Ivchenkov wrote:

> potential-statement-seq:
> potential-statement-seq opt { potential-statement-seq opt }
> not-a-brace-seq

Correction:


potential-statement-seq:
potential-statement-seq-lhs-seq
not-a-brace-seq
potential-statement-seq-lhs-seq not-a-brace-seq

potential-statement-seq-lhs-seq:
potential-statement-seq-lhs-seq opt potential-statement-seq-
lhs

potential-statement-seq-lhs:
not-a-brace-seq opt { potential-statement-seq opt }

CornedBee

unread,
Aug 7, 2010, 9:20:31 PM8/7/10
to
On Aug 5, 8:28 pm, Walter Bright <newshou...@digitalmars.com> wrote:
> CornedBee wrote:
>
> > A conforming compiler will compile this program to print "Template". VC
> > ++ compiled it to print "Plain".
>
> Digital Mars C++ compiles it and prints "Template", yet uses token caching.
>

I get it, your implementation uses token caching and works. Stop being
so defensive ;)

> > If you have a better idea, I'd be interested in hearing about it.
>
> Just keep track of the source locations of each definition in the symbol table.
> You probably do that already for the symbolic debug info.

Ah, so you filter the by location? Yeah, I can see how that is
possible, although it doesn't sound too efficient to me. (But then,
comparing source locations for absolute position is something that is
inefficient in Clang's implementation, but needn't be in any other.)

Sebastian

Walter Bright

unread,
Aug 7, 2010, 11:03:37 PM8/7/10
to
Nikolay Ivchenkov wrote:
> in the standard? I don't see. So if your interpretation would be
> correct, how exactly the bounds of a template should be determined?

It doesn't matter as long as it correctly parses the rest of the file, and diagnoses template definition errors when the template definition is instantiated. I don't see any conflict, and believe that interpretation is in conformance with the Standard. For your interpretation to be correct, we have to agree that the examples in the Standard are invalid, and have to impute extra conditions into "if no valid specialization can be generated", etc.

If the Standard means something, it should come out and say it rather than relying on such, especially something as straightforward as this should be. I should add that the C++0x Concepts proposal did specifically require that diagnostics be generated even without instantiations - I requested that this requirement be removed, and it was, and I thank the Committee members for their indulgence of my request.

Furthermore, I don't see any *point* to the Standard requiring a diagnostic on this. It's not a problem that needs fixing. It's purely a QoI issue, and the Standard should not involve itself with that.

Walter Bright

unread,
Aug 7, 2010, 11:02:24 PM8/7/10
to
{This article seems to be merely a quote of the original article and it is thus rejected
from comp.lang.c++.moderated. -mod/dk}

Walter Bright

unread,
Aug 8, 2010, 5:48:40 PM8/8/10
to
CornedBee wrote:
> On Aug 5, 8:28 pm, Walter Bright <newshou...@digitalmars.com> wrote:
>> CornedBee wrote:
>>
>>> A conforming compiler will compile this program to print "Template". VC
>>> ++ compiled it to print "Plain".
>> Digital Mars C++ compiles it and prints "Template", yet uses token caching.
>>
>
> I get it, your implementation uses token caching and works. Stop being
> so defensive ;)

Read on...


>>> If you have a better idea, I'd be interested in hearing about it.
>> Just keep track of the source locations of each definition in the symbol table.
>> You probably do that already for the symbolic debug info.
>
> Ah, so you filter the by location? Yeah, I can see how that is
> possible, although it doesn't sound too efficient to me.

Now you see why I sound a bit defensive. First it can't work, then when I show
it does, it can't work very well :-)

BTW, it works very well. Digital Mars C++ is very fast at compiling, far faster
than any other C++ compiler.

Johannes Schaub (litb)

unread,
Aug 8, 2010, 5:49:57 PM8/8/10
to
Walter Bright wrote:

> Nikolay Ivchenkov wrote:
>> in the standard? I don't see. So if your interpretation would be
>> correct, how exactly the bounds of a template should be determined?
>
> It doesn't matter as long as it correctly parses the rest of the file, and
> diagnoses template definition errors when the template definition is
> instantiated. I don't see any conflict, and believe that interpretation is
> in conformance with the Standard. For your interpretation to be correct,
> we have to agree that the examples in the Standard are invalid, and have
> to impute extra conditions into "if no valid specialization can be
> generated", etc.
>
> If the Standard means something, it should come out and say it rather than
> relying on such, especially something as straightforward as this should
> be.

It already does, as Nikolay pointed out in great detail. How is "if no valid
specialization can be generated for a template definition..." not clearly
stating that all the text following applies only to template definitions,
and not to any other random sequence of tokens? I think the Standard just
has a wrong example, but clear normative text that seems to render your
compiler non-conforming.

Walter Bright

unread,
Aug 9, 2010, 10:24:09 PM8/9/10
to
Johannes Schaub (litb) wrote:
> Walter Bright wrote:
>
>> Nikolay Ivchenkov wrote:
>>> in the standard? I don't see. So if your interpretation would be
>>> correct, how exactly the bounds of a template should be determined?
>> It doesn't matter as long as it correctly parses the rest of the file, and
>> diagnoses template definition errors when the template definition is
>> instantiated. I don't see any conflict, and believe that interpretation is
>> in conformance with the Standard. For your interpretation to be correct,
>> we have to agree that the examples in the Standard are invalid, and have
>> to impute extra conditions into "if no valid specialization can be
>> generated", etc.
>>
>> If the Standard means something, it should come out and say it rather than
>> relying on such, especially something as straightforward as this should
>> be.
>
> It already does, as Nikolay pointed out in great detail. How is "if no valid
> specialization can be generated for a template definition..." not clearly
> stating that all the text following applies only to template definitions,
> and not to any other random sequence of tokens?

That is a good point.

> I think the Standard just has a wrong example, but clear normative text that seems to render your
> compiler non-conforming.

Perhaps. I should do more reading on this. Even so, it's irrelevant to
generating a working program from source text, as I mentioned.

Nikolay Ivchenkov

unread,
Aug 9, 2010, 10:25:47 PM8/9/10
to
On 8 Aug, 07:03, Walter Bright <newshou...@digitalmars.com> wrote:
> Nikolay Ivchenkov wrote:
> > in the standard? I don't see. So if your interpretation would be
> > correct, how exactly the bounds of a template should be determined?
>
> It doesn't matter as long as it correctly parses the rest of the file

What is the rest of the file? You can't determine it without applying
the syntactic rules.

> For your interpretation to be correct, we have to agree that the examples
in the Standard
> are invalid,

Yes.

> and have to impute extra conditions into "if no valid specialization can
be generated",

No. "If no valid specialization can be generated for a template
definition..." already refers to a template definition, not to an
arbitrary sequence of tokens. A template definition is a template-
declaration whose declaration (see the syntax of template-declaration)
defines a function, a class, or a static data member - see 14/1. The
syntax of declarations is defined in the clause 7. If you or compiler
can't prove that some sequence of tokens is a template-declaration
then you/compiler can't prove that it is a template definition for
which no diagnostic is required. Is it so difficult to understand
this?

> etc.

What's this?

> If the Standard means something, it should come out and say it

The standard contains a formal description of requirements for
implementations. If you can't understand a consequences of a correct
formal description, that's your problem.

> Furthermore, I don't see any *point* to the Standard requiring a
diagnostic on this.

That's for people who like early diagnostics for errors. Though, this
approach has disadvantages.

Walter Bright

unread,
Aug 10, 2010, 4:29:20 PM8/10/10
to
Nikolay Ivchenkov wrote:
> On 8 Aug, 07:03, Walter Bright <newshou...@digitalmars.com> wrote:
>> If the Standard means something, it should come out and say it
>
> The standard contains a formal description of requirements for
> implementations. If you can't understand a consequences of a correct
> formal description, that's your problem.

It isn't just my problem. Even the examples in the Standard are
incorrect, so
obviously plenty of language lawyers are misunderstanding it.

Furthermore, any non-trivial Standard contains mistakes. Trying to
figure out
the consequences of low level stuff on higher order stuff can produce
unintended
higher level behavior, hence it is quite reasonable to spell out the
intended
higher order behavior in an example or in a footnote. Such can save much
agony
not only for compiler implementors, but users as well.

Johannes Schaub (litb)

unread,
Aug 12, 2010, 3:36:58 AM8/12/10
to
Walter Bright wrote:

> Nikolay Ivchenkov wrote:
>> On 8 Aug, 07:03, Walter Bright <newshou...@digitalmars.com> wrote:
>>> If the Standard means something, it should come out and say it
>>
>> The standard contains a formal description of requirements for
>> implementations. If you can't understand a consequences of a correct
>> formal description, that's your problem.
>
> It isn't just my problem. Even the examples in the Standard are incorrect, so
> obviously plenty of language lawyers are misunderstanding it.
>
> Furthermore, any non-trivial Standard contains mistakes. Trying to figure out
> the consequences of low level stuff on higher order stuff can produce unintended
> higher level behavior, hence it is quite reasonable to spell out the intended
> higher order behavior in an example or in a footnote. Such can save much agony
> not only for compiler implementors, but users as well.
>

Actually, the '97 drafts have different wording. It's interesting to see how
it evolved. It read

"Knowing which names are type names allows the syntax of every template

definition to be checked. A diagnostic shall be issued for a syntactically
ill-formed template definition. [Note: that means that a diagnostic
must be issued even if the template is never instantiated. ..."

And it has the same example with the "+" and says a diagnostic shall be
generated. Now, it seems the standard's body just took this paragraph and
reworded it to allow accepting "syntactically ill-formed template
definitions", whatever that might be. :) Apparently, people are very open to
what can be left without diagnostics - your compiler for instance seems to
accept a TU solely consisting of "%+*!^~" without emitting a diagnostic. I
always wanted a compiler that doesn't moan all that much at me, great!

Felipe Magno de Almeida

unread,
Aug 12, 2010, 3:51:31 PM8/12/10
to
On Aug 12, 4:36 am, "Johannes Schaub (litb)" <schaub-johan...@web.de>
wrote:
>

[snip]

> And it has the same example with the "+" and says a diagnostic shall be
> generated. Now, it seems the standard's body just took this paragraph and
> reworded it to allow accepting "syntactically ill-formed template
> definitions", whatever that might be. :) Apparently, people are very open to
> what can be left without diagnostics - your compiler for instance seems to
> accept a TU solely consisting of "%+*!^~" without emitting a diagnostic. I
> always wanted a compiler that doesn't moan all that much at me, great!

I do prefer one that checks as much as it can. That easies testing
every
template my library might contain.

> --

--
Felipe Magno de Almeida

Walter Bright

unread,
Aug 13, 2010, 10:18:39 AM8/13/10
to
Felipe Magno de Almeida wrote:
> I do prefer one that checks as much as it can. That easies testing
> every template my library might contain.


I'd argue that a template test suite is pitifully inadequate if it
doesn't, at a
bare minimum, try to instantiate each template.

What you need is a coverage analyzer, which will tell you which lines of
code
(including lines in template definitions) that are never executed.

--

Miles Bader

unread,
Aug 14, 2010, 4:22:04 AM8/14/10
to
Walter Bright <newsh...@digitalmars.com> writes:
>> I do prefer one that checks as much as it can. That easies testing
>> every template my library might contain.
>
> I'd argue that a template test suite is pitifully inadequate if it
> doesn't, at a bare minimum, try to instantiate each template.
>
> What you need is a coverage analyzer, which will tell you which lines
> of code (including lines in template definitions) that are never
> executed.

Still it's nice to have one's usual compiler being very picky, just
makes the whole process easier.

[At work, I spend tons of time fixing bogus code that VC++ lets
through without a peep; sometimes it seems that not a single change
the MS-using devs make is actually valid C++...]

-Miles

--
Success, n. The one unpardonable sin against one's fellows.

0 new messages