You perhaps need to take this somewhere that deals with regular
expressions. A Unix newsgroup, maybe?
Very little.
A token is a keyword, identifier, constant, string literal or
punctuator. Each of these has its own definition. Preprocessing tokens,
as opposed to normal, "post-processing" tokens, can be a few other
things as well.
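
To make those categories concrete, here is a minimal sketch that labels each
token of two ordinary declarations, using the category names listed above.
The variable names greeting and answer are made up for illustration.

```c
/* Tokens of the first declaration, in order:
 *   static    keyword
 *   const     keyword
 *   char      keyword
 *   *         punctuator
 *   greeting  identifier
 *   =         punctuator
 *   "hi"      string literal
 *   ;         punctuator
 */
static const char *greeting = "hi";

/* Tokens of the second declaration, in order:
 *   static    keyword
 *   int       keyword
 *   answer    identifier
 *   =         punctuator
 *   42        constant
 *   ;         punctuator
 */
static int answer = 42;
```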
Given the inclusion of [a-zA-Z_], I suspect that what those regexes try
to define, in a particularly opaque manner even for regexes, is valid
identifiers. If so, AFAICT they fail, because they do not seem to take
into account that identifiers may not start with, but may contain,
digits[1].
In C89, an identifier starts with a "non-digit", which is indeed a letter
from a through z[2], or from A through Z, or an underscore; that is followed
by zero or more characters from the same set plus the digits 0 through 9.
In C99, each of those
(including the first one) may also be either a universal character
name[3], or what the C99 Standard calls "other implementation-defined
characters".
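
For what it's worth, the C89 rule can be sketched as a small checker. This is
only a minimal sketch under stated assumptions: the name is_c89_identifier is
made up for illustration, it does not reject keywords, and it deliberately
leaves out the C99 universal character names and implementation-defined
characters. The letters are spelled out explicitly rather than tested with
isalpha(), so the answer does not depend on the locale.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if s is spelled like a C89 identifier,
 * 0 otherwise. Keywords are not rejected; UCNs are not accepted. */
static int is_c89_identifier(const char *s)
{
    /* The "non-digit" set: the 52 English letters plus underscore. */
    static const char nondigit[] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ_";
    static const char digit[] = "0123456789";

    /* The first character must exist and must be a non-digit.
     * The empty-string test comes first because strchr() would
     * otherwise match the terminating null of the set. */
    if (*s == '\0' || strchr(nondigit, *s) == NULL)
        return 0;

    /* Every later character may be a non-digit or a digit. */
    for (++s; *s != '\0'; ++s) {
        if (strchr(nondigit, *s) == NULL && strchr(digit, *s) == NULL)
            return 0;
    }
    return 1;
}
```

With this sketch, "foo" and "_x9" are accepted, while "9x", "a-b" and the
empty string are rejected.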
Richard
[1] They also seem to be pre-C99, since they don't contain universal
character names; but then, anyone who uses a UCN in an identifier
(as opposed to in a string or character literal) deserves a
kippering.
[2] In the normal English alphabet, not in the implementation charset.
[3] Except that the UCN for the first character may not represent a
digit.
Here is my guess: The two expressions describe an (optionally wide) character
constant and an (optionally wide) string literal. In that case L would be the
character L, not a symbol for any letter.
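
If that guess is right, the L in question is the literal prefix shown in the
sketch below. The variable names are made up for illustration; the only point
is what the optional L changes about the type of each constant or literal.

```c
#include <stddef.h>   /* wchar_t */

/* Without the prefix: a string literal is an array of char,
 * and a character constant has type int. */
static const char    narrow_str[] = "abc";
static const int     narrow_chr   = 'a';

/* With the L prefix: a wide string literal is an array of wchar_t,
 * and a wide character constant has type wchar_t. */
static const wchar_t wide_str[]   = L"abc";
static const wchar_t wide_chr     = L'a';
```

Both arrays hold four elements (three characters plus the terminating null);
only the element type differs.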
--
Thad