Skip C like comments

rober...@gmail.com

Jan 7, 2005, 4:06:52 AM
I have written a small program to search all files in a directory for a
particular string. I want to add the feature that it should not search
for the string inside comments. I was able to skip single line comments
but I need help in skipping multiple line C style comments /* blah
blah...*/
Currently while looking for the search string I do a line by line
search.

Brian McCauley

Jan 7, 2005, 7:11:11 AM

rober...@gmail.com wrote:
> I have written a small program to search all files in a directory for a
> particular string. I want to add the feature that it should not search
> for the string inside comments. I was able to skip single line comments
> but I need help in skipping multiple line C style comments /* blah
> blah...*/

See FAQ "How do I use a regular expression to strip C style comments
from a file?"

> Currently while looking for the search string I do a line by line
> search.

That will make life a lot harder, since you will have to track the state
(inside/not inside quotes, inside/not inside comments) at each line.
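
To illustrate the kind of state tracking a line-by-line approach needs, here is a minimal sketch that carries an "inside comment" flag from one line to the next. The name strip_block_comments_by_line is hypothetical, and the sketch deliberately ignores string literals and // comments, so a "/*" inside a quoted string will fool it. Handling that properly would need a second flag for quotes, which is the extra difficulty mentioned above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the FAQ): blank out /* ... */ comments
# while processing the input one line at a time.
# Assumption: string literals and // comments are NOT handled here.
sub strip_block_comments_by_line {
    my @lines = @_;
    my $in_comment = 0;          # state carried across lines
    my @out;
    for my $line (@lines) {
        my $kept = '';
        while (length $line) {
            if ($in_comment) {
                # Look for the end of the comment on this line.
                if ($line =~ s{^.*?\*/}{}) { $in_comment = 0 }
                else                       { $line = '' }
            }
            else {
                # Keep everything up to the next comment opener.
                if ($line =~ s{^(.*?)/\*}{}) { $kept .= $1; $in_comment = 1 }
                else                         { $kept .= $line; $line = '' }
            }
        }
        push @out, $kept;
    }
    return @out;
}

my @clean = strip_block_comments_by_line(
    'int x; /* start',
    'needle here',
    'end */ int y;',
);
print "[$_]\n" for @clean;
```

Only the text outside the comment survives, so a search for "needle" in @clean finds nothing.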

Stephane CHAZELAS

Jan 7, 2005, 8:06:00 AM
2005-01-07, 12:11(+00), Brian McCauley:

>
>
> rober...@gmail.com wrote:
>> I have written a small program to search all files in a directory for a
>> particular string. I want to add the feature that it should not search
>> for the string inside comments. I was able to skip single line comments
>> but I need help in skipping multiple line C style comments /* blah
>> blah...*/
>
> See FAQ "How do I use a regular expression to strip C style comments
> from a file?"
[...]

Note that the solution provided:

$/ = undef;
$_ = <>;

s#/\*[^*]*\*+([^/*][^*]*\*+)*/|([^/"']*("[^"\\]*(\\[\d\D][^"\\]*)*"[^/"']*|'[^'\\]*(\\[\d\D][^'\\]*)*'[^/"']*|/+[^*/][^/"']*)*)#$2#g;


is unnecessarily complicated. One can make use of Perl's non-greedy
quantifiers and the special way alternation works in Perl (alternatives
are tried left to right and the first one that matches wins, instead of
the longest match as in ERE).
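
As a small illustration of those two behaviours (the sample strings here are made up for the example, not taken from the FAQ):

```perl
use strict;
use warnings;

# Alternation: the leftmost alternative that matches wins,
# even though "foobar" would be a longer match.
my ($alt) = "foobar" =~ /(foo|foobar)/;
print "$alt\n";        # foo

# Non-greedy .*? stops at the first closing */ it can reach,
# so two comments on one line are not merged into one match.
my ($first) = "a /* one */ b /* two */ c" =~ m{(/\*.*?\*/)};
print "$first\n";      # /* one */
```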

The FAQ solution also fails for inputs like:

foo/* comment */bar

which cpp turns into "foo bar", not "foobar" as that Perl solution does.

s{
    /\*.*?\*/
  | //[^\n]*
  | (
        "(?:\\.|.)*?"
      | '(?:\\.)?.*?'
      | \?\?'
      | .[^'"/]*
    )
}{if ($1 eq ""){" "}else{$1}}exsg

should be enough and work as well on valid C code.

(??' is the trigraph for ^).
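
For the original question (searching for a string outside comments), a sketch of how the substitution above might be used: slurp the source, strip the comments, then search what is left. The wrapper name strip_c_comments and the sample input are hypothetical. The replacement is written as defined $1 ? $1 : " " only to stay quiet under use warnings; this assumes it is equivalent to the $1 eq "" test, since the capture group cannot match the empty string.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution shown above.
sub strip_c_comments {
    my ($src) = @_;
    $src =~ s{
        /\*.*?\*/              # block comment
      | //[^\n]*               # line comment
      | (
            "(?:\\.|.)*?"      # string literal, kept as is
          | '(?:\\.)?.*?'      # character literal, kept as is
          | \?\?'              # trigraph
          | .[^'"/]*           # any other run of code
        )
    }{defined $1 ? $1 : " "}exsg;
    return $src;
}

print strip_c_comments("foo/* comment */bar"), "\n";   # foo bar

# Made-up sample: "needle" appears once in code and twice in comments.
my $code = join "\n",
    'int needle = 1; /* needle in',
    '   a comment */',
    'char *s = "/* not a comment */"; // needle',
    '';
my $clean = strip_c_comments($code);
my $hits  = () = $clean =~ /needle/g;
print "$hits\n";                                       # 1
```

The text inside the string literal is left alone, because the quoted-string alternative consumes it before the comment alternatives get a chance to match there.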

--
Stephane
