The >> token

Peter Ammon

Dec 15, 2003, 2:49:05 PM

As we know, due to C++'s "longest match" rule, the >> token causes
headaches when working with nested templates, e.g.

vector<vector<int>>

will not parse correctly without inserting a space between the two >
signs. Why have a >> token at all? Why not have > be the token, and
handle >> in the grammar as two > tokens?

This would permit code like 3 > > 1, but that seems harmless to me.

Dave

Dec 15, 2003, 2:59:50 PM

"Peter Ammon" <peter...@rocketmail.com> wrote in message
news:brl37f$sud$1...@news.apple.com...

Hmmm, then what would you use for "greater than"? If it's "overloaded", you
then end up with context sensitivity issues, which makes grammars much
harder to deal with...

Andrey Tarasevich

Dec 15, 2003, 4:14:28 PM

Peter Ammon wrote:
> Why have a >> token at all? Why not have > be the token, and
> handle >> in the grammar as two > tokens?

That would make the grammar much more complex than it is now. It is not
worth it.

--
Best regards,
Andrey Tarasevich

Rolf Magnus

Dec 15, 2003, 5:04:28 PM

Dave wrote:

There are already context sensitivity issues. That's the reason why you
can't write vector<vector<int>>. The "greater" token already is
"overloaded".

M. Akkerman

Dec 16, 2003, 4:04:04 PM

On Mon, 15 Dec 2003 11:49:05 -0800, Peter Ammon
<peter...@rocketmail.com> wrote:

>> Why have a >> token at all?

Because it's pretty useful. I'm sure a desktop programmer won't
bother much with stuff like individual bits, but if you're going to
code for a lower-level layer (example: device driver) then
manipulating bits is your only friend.

Unforgiven

Dec 16, 2003, 5:56:11 PM

I don't believe that's what the OP meant. He wants to keep bitshift, of
course, but wants compilers not to treat '>>' as a separate token and
instead determine by context what two consecutive '>' tokens mean. If
that had been done, it would have been possible to write >> (without a
space) at the end of nested templates, because the compiler would see the
two separate '>' tokens, determine that they can't be a bitshift in that
context, and so correctly treat them as the end of the template
instantiation. Currently, the grammatical analyzer gets a '>>' token from
the lexical analyzer in that situation, and concludes that that token is
invalid in that context.
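
To make the idea concrete, here is a rough sketch of my own, not anything a real compiler does. It assumes a hypothetical lexer that only ever emits single '>' tokens, and it makes the toy assumption that every '<' in the input opens a template argument list (so there is no less-than operator at all). The function name resolveGreater is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy sketch: the input contains only single '>' tokens. A '>' that
// closes an open '<' stays a '>'; two consecutive '>' tokens outside
// any open '<' are merged into one '>>' (shift) token.
// ASSUMPTION: every '<' opens a template argument list.
std::vector<std::string> resolveGreater(const std::vector<std::string>& toks) {
    std::vector<std::string> out;
    int depth = 0;  // number of template argument lists still open
    for (std::size_t i = 0; i < toks.size(); ++i) {
        const std::string& t = toks[i];
        if (t == "<") {
            ++depth;
            out.push_back(t);
        } else if (t == ">" && depth > 0) {
            --depth;            // closes the innermost argument list
            out.push_back(t);
        } else if (t == ">" && i + 1 < toks.size() && toks[i + 1] == ">") {
            out.push_back(">>");  // two '>' in expression context: a shift
            ++i;                  // consume the second '>'
        } else {
            out.push_back(t);     // plain greater-than or any other token
        }
    }
    return out;
}
```

With this, the tokens of vector<vector<int>> come back unchanged (two closing '>' tokens), while a > > 1 comes back as a >> 1. Since whitespace is ignored, this also accepts the 3 > > 1 case the OP mentioned.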

--
Unforgiven

Jerry Coffin

Dec 17, 2003, 4:28:18 AM

In article <brlbfs$cdb$01$1...@news.t-online.com>, rama...@t-online.de
says...

[ ... ]

> There are already context sensitivity issues. That's the reason why you
> can't write vector<vector<int>>. The "greater" token already is
> "overloaded".

That's not context sensitivity. Context sensitivity is when your
grammar contains at least one production like:

xA ::= whatever

where an 'A' is recognized as a particular syntactic element ONLY in the
context of an 'x'. Otherwise, it's recognized as some other syntactic
element.

In the case of '<<' or '>>', there's no such thing -- distinguishing
between '>' and '>>' is done entirely at the lexical level, before the
grammar sees either one at all. By the time the parser sees any of
these, the lexer has converted each one to a token. The lexer doesn't
use any context sensitivity either -- it just creates a token out of the
longest sequence of input characters that it can. I.e. it reads in
characters until it encounters one that can't possibly be part of any
token that started with the characters that have already been read. At
that point, it does one of two things: returns the characters it's
already read as a token, or else signals an error because what it's read
isn't a token, and the next character in the input can't be part of any
token that could start with the characters that have already been read
either.
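
As a minimal sketch of that longest-match behaviour (a toy, not a real C++ lexer): the function lex below is hypothetical and only knows identifiers, integers, and the operators '<', '>', '<<', '>>', '<=' and '>='. It tries the two-character operators before the one-character ones, which is enough to get longest match for this tiny operator set.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Toy longest-match lexer. ASSUMPTION: only identifiers/integers and
// the six operators listed in 'ops' exist.
std::vector<std::string> lex(const std::string& src) {
    // Longer operators come first so they win over their prefixes.
    static const std::vector<std::string> ops = {"<<", ">>", "<=", ">=", "<", ">"};
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) {  // whitespace only separates tokens
            ++i;
            continue;
        }
        if (std::isalnum(c) || c == '_') {
            // Keep reading while the next character can extend the token.
            std::size_t j = i;
            while (j < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[j])) || src[j] == '_')) {
                ++j;
            }
            tokens.push_back(src.substr(i, j - i));
            i = j;
            continue;
        }
        bool matched = false;
        for (const std::string& op : ops) {
            if (src.compare(i, op.size(), op) == 0) {
                tokens.push_back(op);
                i += op.size();
                matched = true;
                break;
            }
        }
        if (!matched) {
            // What was read is not a token and cannot become one.
            throw std::runtime_error("unexpected character: " + std::string(1, src[i]));
        }
    }
    return tokens;
}
```

Feeding it vector<vector<int>> yields a single '>>' token at the end, while vector<vector<int> > yields two '>' tokens. The lexer never looks at what came before, which is the point: no context is involved.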

There are a few parts of C++ that involve context sensitivity, but
they're mostly there to resolve ambiguities in the grammar proper --
e.g. in some cases, the choice between a declaration and a definition is
context sensitive.

--
Later,
Jerry.

The universe is a figment of its own imagination.

Rolf Magnus

Dec 17, 2003, 5:18:59 AM

Jerry Coffin wrote:

> In article <brlbfs$cdb$01$1...@news.t-online.com>, rama...@t-online.de
> says...
>
> [ ... ]
>
>> There are already context sensitivity issues. That's the reason why
>> you can't write vector<vector<int>>. The "greater" token already is
>> "overloaded".
>
> That's not context sensitivity. Context sensitivity is when your
> grammar contains at least one production like:
>
> xA ::= whatever
>
> where an 'A' is recognized as a particular syntactic element ONLY in the
> context of an 'x'. Otherwise, it's recognized as some other syntactic
> element.
>
> In the case of '<<' or '>>', there's no such thing --

I was talking about something like 'if (a<3)' vs. 'vector<int>', not
about the '<<' token. Sorry, I should have said that more clearly.

Jerry Coffin

Dec 17, 2003, 10:53:15 PM

In article <brpatv$msd$02$1...@news.t-online.com>, rama...@t-online.de
says...

[ talking about context sensitivity ]

> I was talking about something like 'if (a<3)' vs. 'vector<int>', not
> about the '<<' token. Sorry, I should have said that more clearly.

That's still not really context sensitivity, at least in the way the
term is normally defined. Basically, the grammar just has something
like (simplifying drastically):

cmpop: '=' | '<' | '>' | '<=' | '>='
    ;

expression: operand cmpop operand
    | lots of other possibilities elided
    ;

/* ... */
template_instantiation: template_name '<' templ_args '>' name ';'
    ;

and a given input will only match one of these. There is (usually) a
bit of trickery involved in recognizing whether 'x' is the name of a
template or some other name (e.g. of a variable), and while this means
the parser needs access to the symbol table, it still isn't context
sensitivity in the classic sense.
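
The symbol-table trickery can be sketched roughly like this. Everything here is hypothetical: classifyLess and LessMeaning are made-up names, and a plain std::set of template names stands in for a real compiler's symbol table, which is far more involved.

```cpp
#include <cassert>
#include <set>
#include <string>

// What a '<' means, decided from the name that precedes it.
enum class LessMeaning { TemplateOpen, LessThan };

// ASSUMPTION: 'templateNames' is a stand-in for the symbol table and
// holds every name currently declared as a template.
LessMeaning classifyLess(const std::string& nameBefore,
                         const std::set<std::string>& templateNames) {
    return templateNames.count(nameBefore) != 0 ? LessMeaning::TemplateOpen
                                                : LessMeaning::LessThan;
}
```

With a table containing only "vector", the '<' after vector is classified as opening a template argument list, and the '<' after a in 'if (a<3)' as a comparison. The grammar itself stays the same; only the kind of name token handed to it differs.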

In the end, none of this is really new or different with C++ though --
in C, the compiler runs into the same kinds of things, such as '&' being
both a unary operator to take an address and a binary operator to do a
bitwise AND. Here again, the parser has to

Jerry Coffin

Dec 18, 2003, 9:54:52 AM

In article <MPG.1a4ad3501...@news.clspco.adelphia.net>,
jco...@taeus.com says...

[ ... ]

> cmpop: '=' | '<' | '>' | '<=' | '>='

Oops -- that should be '==' not '=', of course...