rober...@gmail.com wrote:
> I have written a small program to search all files in a directory for a
> particular string. I want to add the feature that it should not search
> for the string inside comments. I was able to skip single line comments
> but I need help in skipping multiple line C style comments /* blah
> blah...*/
See the FAQ "How do I use a regular expression to strip C style
comments from a file?"
> Currently while looking for the search string I do a line by line
> search.
That will make life a lot harder, since you will have to track the
state (inside/not inside quotes, inside/not inside comments) at each
line.
Note that the solution provided:
    $/ = undef;
    $_ = <>;
    s#/\*[^*]*\*+([^/*][^*]*\*+)*/|([^/"']*("[^"\\]*(\\[\d\D][^"\\]*)*"[^/"']*|'[^'\\]*(\\[\d\D][^'\\]*)*'[^/"']*|/+[^*/][^/"']*)*)#$2#g;
is unnecessarily complicated. One can make use of Perl's non-greedy
quantifiers and the special way alternation works in Perl (the
leftmost alternative that matches wins, instead of the longest match
as in ERE).
And it fails for inputs like:
foo/* comment */bar (which cpp turns into "foo bar", not
"foobar" as with that perl solution).
    s{
        /\*.*?\*/
      | //[^\n]*
      | (
            "(?:\\.|.)*?"
          | '(?:\\.)?.*?'
          | \?\?'
          | .[^'"/]*
        )
    }{if ($1 eq ""){" "}else{$1}}exsg
should be enough and work as well on valid C code.
(??' is a trigraph).
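For what it's worth, here is a small sketch that runs both
substitutions on the foo/* comment */bar case and then uses the
shorter one for the original task of searching outside comments. The
wrapper names strip_faq and strip_short, the sample C text and the
search word "needle" are all made up for the example. Treat it as an
illustration under those assumptions, not a tested tool.

```perl
use strict;
use warnings;
no warnings 'uninitialized';  # $1/$2 are undef when a comment alternative matched

# Hypothetical wrapper around the FAQ substitution quoted above.
sub strip_faq {
    my ($src) = @_;
    $src =~ s#/\*[^*]*\*+([^/*][^*]*\*+)*/|([^/"']*("[^"\\]*(\\[\d\D][^"\\]*)*"[^/"']*|'[^'\\]*(\\[\d\D][^'\\]*)*'[^/"']*|/+[^*/][^/"']*)*)#$2#g;
    return $src;
}

# Hypothetical wrapper around the shorter substitution.
sub strip_short {
    my ($src) = @_;
    $src =~ s{
        /\*.*?\*/
      | //[^\n]*
      | (
            "(?:\\.|.)*?"
          | '(?:\\.)?.*?'
          | \?\?'
          | .[^'"/]*
        )
    }{if ($1 eq ""){" "}else{$1}}exsg;
    return $src;
}

print strip_faq('foo/* comment */bar'),   "\n";  # foobar
print strip_short('foo/* comment */bar'), "\n";  # foo bar

# Made-up sample input: "needle" appears once in a comment, once in code.
my $code = <<'END_C';
int foo/* comment */bar;
char *s = "/* not a comment */"; // needle here
int needle = 1;
END_C

# Strip first on the whole text, then search line by line.
my @hits = grep { /needle/ } split /\n/, strip_short($code);
print "$_\n" for @hits;                          # int needle = 1;
```

Stripping the whole text first and only then splitting it into lines
means the search itself needs no state at all. Line numbers in
the stripped text will not match the original file whenever a comment
spans several lines, since such a comment collapses to a single space.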
--
Stephane