hello, world\n
I'm writing a lexer that aims to tokenize all the variants of the C and
C++ languages. My knowledge of the C++ side of things is likely
incomplete, so I'm asking the C++ community to tell me where I'm
missing something. So far I have come up with the following matrix of
tokens added since K&R.
Language     K&R  C89  C99  C11  cfront  C++98  C++11  Comment
Tokens:
...           -    Y    Y    Y     ?       Y      Y
Trigraphs     -    Y    Y    Y     ?       Y      Y
Digraphs      -    -    Y    Y     ?       Y      Y
#stringify    -    Y    Y    Y     ?       Y      Y
gl##ue        -    Y    Y    Y     ?       Y      Y
//Comment     -    -    Y    Y     Y       Y      Y
Hexfloats     -    -    Y    Y     -       -      -    0xABC.DEFp+42
::            -    -    -    -     Y       Y      Y
.*            -    -    -    -     ?       Y      Y
->*           -    -    -    -     ?       Y      Y
L"String"     -    Y    Y    Y     ?       Y      Y    Wide string literals
L'C'          -    Y    Y    Y     ?       Y      Y    Wide character constants
\[uU] IDs     -    -    Y    Y     -       Y      Y    \uABCD in identifiers
U|u|u8"S"     -    -    -    Y     -       -      Y    Unicode prefixes
U|u'C'        -    -    -    Y     -       -      Y    Unicode prefixes
R"String"     -    -    -    -     -       -      Y    Raw strings
"String"x     -    -    -    -     -       ?      Y    User-defined suffix
Notes:
- Digraphs are actually a C94 feature.
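Trigraphs and digraphs both provide alternative spellings for the same characters, but a lexer has to deal with them at different stages. Trigraphs are replaced in translation phase 1, before any tokenization, so they also change the contents of string literals and comments. Digraphs, by contrast, are genuine punctuator tokens recognized during tokenization. Below is a minimal C sketch of both; the function names `replace_trigraphs` and `match_digraph` are made up for illustration, and the sketch ignores line splicing as well as the C++11 special case for `<::`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Translation phase 1 (sketch): replace trigraphs in place. A trigraph is
 * two question marks followed by one of the characters in `from`; it is
 * replaced by the character at the same index in `to`. Because this runs
 * before tokenization, it also rewrites string literals and comments.
 * Returns the new length of buf. */
size_t replace_trigraphs(char *buf)
{
    static const char from[] = "=(/)'<!>-";
    static const char to[]   = "#[\\]^{|}~";
    size_t r = 0, w = 0;

    assert(buf != NULL);
    while (buf[r] != '\0') {
        if (buf[r] == '?' && buf[r + 1] == '?' && buf[r + 2] != '\0') {
            const char *p = strchr(from, buf[r + 2]);
            if (p != NULL) {
                buf[w++] = to[p - from];
                r += 3;
                continue;
            }
        }
        buf[w++] = buf[r++];
    }
    buf[w] = '\0';
    return w;
}

/* Digraphs are real punctuator tokens. If one starts at s, store the
 * primary token it stands for in *out and return its length; otherwise
 * return 0. "%:%:" is listed before "%:" so the longest match wins. */
size_t match_digraph(const char *s, const char **out)
{
    static const struct { const char *spelling, *primary; } table[] = {
        {"%:%:", "##"}, {"<:", "["}, {":>", "]"},
        {"<%", "{"},    {"%>", "}"}, {"%:", "#"}
    };
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        size_t n = strlen(table[i].spelling);
        if (strncmp(s, table[i].spelling, n) == 0) {
            *out = table[i].primary;
            return n;
        }
    }
    return 0;
}
```

In this design `replace_trigraphs` would run once over the raw source buffer, while `match_digraph` would be called from the punctuator scanner. Ordering `%:%:` ahead of `%:` in the table is what gives the longest-match behavior the standards require.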
I'm especially interested in the cells marked '?'. Are there tokens in
C++03 calling for a separate column? Thanks for your comments!
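If C++03 (or a fully resolved cfront column) does need its own column, it may help to encode the matrix so that adding a column is a one-line change. Below is a minimal sketch, assuming a hypothetical bitmask-per-dialect design: each dialect is one bit, each token feature stores the set of dialects that have it, and the scanner asks `has_feature` before forming a multi-character token. Only a few rows of the matrix are encoded, cfront is omitted because several of its cells are still open, and all names (`enum dialect`, `has_feature`, `colon_token_length`) are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One bit per column of the matrix (hypothetical design). Adding a
 * C++03 column would mean adding one more bit here. */
enum dialect {
    D_KR    = 1u << 0,
    D_C89   = 1u << 1,
    D_C99   = 1u << 2,
    D_C11   = 1u << 3,
    D_CXX98 = 1u << 4,
    D_CXX11 = 1u << 5
};

/* A few rows of the matrix; cfront is left out on purpose. */
enum feature {
    F_ELLIPSIS,       /* ...            */
    F_LINE_COMMENT,   /* // comments    */
    F_SCOPE_OP,       /* ::             */
    F_UNICODE_PREFIX, /* u8"", u"", U"" */
    F_RAW_STRING,     /* R"(...)"       */
    F_COUNT
};

/* For each feature, the set of dialects whose lexer must recognize it. */
static const unsigned feature_dialects[F_COUNT] = {
    [F_ELLIPSIS]       = D_C89 | D_C99 | D_C11 | D_CXX98 | D_CXX11,
    [F_LINE_COMMENT]   = D_C99 | D_C11 | D_CXX98 | D_CXX11,
    [F_SCOPE_OP]       = D_CXX98 | D_CXX11,
    [F_UNICODE_PREFIX] = D_C11 | D_CXX11,
    [F_RAW_STRING]     = D_CXX11
};

bool has_feature(enum dialect d, enum feature f)
{
    assert((unsigned)f < (unsigned)F_COUNT);
    return (feature_dialects[f] & (unsigned)d) != 0;
}

/* Length of the ':'-initial punctuator at s in dialect d: "::" is a single
 * token only where F_SCOPE_OP is set. Returns 0 if s does not start with
 * ':'. (The ":>" digraph is ignored here for brevity.) */
size_t colon_token_length(const char *s, enum dialect d)
{
    if (s[0] != ':')
        return 0;
    if (s[1] == ':' && has_feature(d, F_SCOPE_OP))
        return 2;
    return 1;
}
```

For example, `colon_token_length("::x", D_C11)` returns 1, so a C11 lexer built this way would produce two separate `:` tokens where a C++ lexer produces a single `::`.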
Regards,
Jens
--
Jens Schweikhardt
http://www.schweikhardt.net/
SIGSIG -- signature too long (core dumped)
[ comp.std.c++ is moderated. To submit articles, try posting with your ]
[ newsreader. If that fails, use mailto:std-cpp...@vandevoorde.com ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html ]