Help with regular expressions

jk

Mar 17, 2010, 1:02:22 AM
Hello everyone, I would like to know what the following regular
expressions, which describe tokens in the C language, mean:
L?'(\\.|[^\\'\n])+'
and
L?\"(\\.|[^\\"\n])*\"
in which
L [a-zA-Z_]
Thanks a lot.


Nick Keighley

Mar 17, 2010, 4:42:30 AM

You perhaps need to take this somewhere that deals with regular
expressions. A Unix newsgroup, maybe?

Richard Bos

Mar 17, 2010, 9:03:34 AM
"jk" <jik...@gmail.com> wrote:

Very little.

A token is a keyword, identifier, constant, string literal or
punctuator. Each of these has its own definition. Preprocessing tokens,
as opposed to normal, "post-processing" tokens, can be a few other
things as well.
Given the inclusion of [a-zA-Z_], I suspect that what those regexes try
to define, in a particularly opaque manner even for regexes, is valid
identifiers. If so, AFAICT they fail, because they do not seem to take
into account that identifiers may not start with, but may contain,
digits[1].
In C89, an identifier is a "non-digit", which is indeed a letter from a
through z[2], or from A through Z, or an underscore; followed by zero or
more of the same set plus the digits 0 through 9. In C99, each of those
(including the first one) may also be either a universal character
name[3], or what the C99 Standard calls "other implementation-defined
characters".

Richard

[1] They also seem to be pre-C99, since they don't contain universal
character names; but then, anyone who uses a UCN in an identifier
(as opposed to in a string or character literal) deserves a
kippering.
[2] In the normal English alphabet, not in the implementation charset
[3] Except that the UCN for the first character may not represent a
digit
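
As an illustration of the C89 identifier rule described above (a non-digit
followed by zero or more non-digits or digits), here is a minimal sketch in
Python. The names C89_IDENTIFIER and is_c89_identifier are made up for this
example and do not come from the thread. It ignores the C99 additions
(universal character names and implementation-defined characters), and it
does not exclude keywords, which have the same shape as identifiers.

```python
import re

# C89 identifier: one non-digit (letter or underscore), then zero or more
# non-digits or digits. C99 universal character names are not handled.
C89_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")


def is_c89_identifier(text: str) -> bool:
    """Return True if the whole string has the shape of a C89 identifier.

    Keywords such as "int" also pass, because this checks shape only.
    """
    return C89_IDENTIFIER.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["foo_1", "_x", "1foo", "a-b", ""]:
        print(repr(candidate), is_c89_identifier(candidate))
```

Neither of the two expressions in the original question has this shape: both
require quote characters around their contents.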

Thad Smith

Mar 21, 2010, 5:28:49 PM

Here is my guess: The two expressions describe an (optionally wide) character
constant and an (optionally wide) string literal. In that case L would be the
character L, not a symbol for any letter.


--
Thad
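
The guess above can be checked with a short sketch. It assumes the two
patterns come from a lex-style grammar, where a named definition is referenced
in braces as {L}, so a bare L in a pattern is the literal letter L. The
patterns are copied unchanged into Python's re module, which happens to read
this particular syntax the same way. The names CHAR_CONSTANT and
STRING_LITERAL are invented for this example.

```python
import re

# Optional L prefix, a single quote, one or more items, a single quote.
# Each item is either a backslash followed by any character (an escape),
# or any character other than backslash, single quote, or newline.
CHAR_CONSTANT = re.compile(r"L?'(\\.|[^\\'\n])+'")

# Same idea with double quotes, but zero or more items, so "" is allowed.
STRING_LITERAL = re.compile(r'L?\"(\\.|[^\\"\n])*\"')


def is_char_constant(text: str) -> bool:
    return CHAR_CONSTANT.fullmatch(text) is not None


def is_string_literal(text: str) -> bool:
    return STRING_LITERAL.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["'a'", "L'a'", r"'\n'", "''"]:
        print(sample, is_char_constant(sample))
    for sample in ['"hello"', 'L"hi"', '""', r'"a\"b"', '"abc']:
        print(sample, is_string_literal(sample))
```

The + in the first pattern rejects the empty '' but accepts more than one
character between the quotes, such as 'ab'. The * in the second pattern
accepts the empty string literal "".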
