delete comments in .c file

Timex

Oct 30, 2003, 8:40:50 PM
I want to delete all comments in .c file.

Size of .c file is very big.

Any good idea to do this?

Please show me example code.

Jem Berkes

Oct 30, 2003, 9:47:39 PM
> I want to delete all comments in .c file.
>
> Size of .c file is very big.
>
> Any good idea to do this?

Assuming you have C comments /* like this, right? */
You can use UNIX sed (stream editor). You're not going to believe this,
but: cat input.c | sed -e 's/\/\*.*\*\///g' > output.c

Do a diff to make sure it's working correctly.

--
Jem Berkes
http://www.sysdesign.ca/

Ed Morton

Oct 30, 2003, 10:23:12 PM

Jem Berkes wrote:
>>I want to delete all comments in .c file.
>>
>>Size of .c file is very big.
>>
>>Any good idea to do this?
>
>
> Assuming you have C comments /* like this, right? */
> You can use UNIX sed (stream editor). You're not going to believe this,
> but: cat input.c | sed -e 's/\/\*.*\*\///g' > output.c
>
> Do a diff to make sure it's working correctly.
>

That would only work if all comments are in the form you describe, which
isn't likely. e.g. it won't work if the comments are:

/* Start of a fairly common form
* of comment block.
*/

or:

/* set x: */ x = 7; /* now more stuff... */

In the first case it won't delete the comment while in the second
unusual case it'll delete both the comments plus the "x = 7;" assignment
between them.

You need to Google around or if you want a UNIX tool solution, post this
to a UNIX NG (e.g. comp.unix.questions or comp.unix.shell).

Ed.

Keith Thompson

Oct 30, 2003, 10:56:01 PM
Jem Berkes <j...@users.pc9__org> writes:
> > I want to delete all comments in .c file.
> >
> > Size of .c file is very big.
> >
> > Any good idea to do this?
>
> Assuming you have C comments /* like this, right? */
> You can use UNIX sed (stream editor). You're not going to believe this,
> but: cat input.c | sed -e 's/\/\*.*\*\///g' > output.c
>
> Do a diff to make sure it's working correctly.

That won't detect multi-line comments. It also fails to properly
ignore comment delimiters inside string and character literals. It
deletes everything from the first "/*" on a line to the last "*/" on
the same line; for example, it transforms this:
x = /* one comment */ 42; /* another comment */
to this:
x =
And it replaces each comment by nothing rather than by a blank, so the
following valid C fragment:
x = sizeof/*comment*/int;
is replaced with this:
x = sizeofint;

If you're dealing with C99 code you'll have to worry about "//"
comments (many pre-C99 compilers support these as an extension).

Stripping C comments is a lot more complex than it looks; you almost
have to duplicate most of the functionality of the preprocessor to get
it right.
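
To make the failure modes listed above concrete, here is a minimal sketch in C of a scanner that avoids them: it tracks whether it is inside a comment, a string literal, or a character literal, and it replaces each comment with a single space. The function name strip_comments_str, its c99 flag, and the in-memory string interface are assumptions made for illustration; none of them come from a post in this thread. It deliberately ignores trigraphs and backslash-newline splicing, so it is not a complete solution.

```c
#include <stddef.h>

/* Hypothetical helper: copies src to dst, replacing each comment with one
 * space. dst must hold at least strlen(src) + 1 bytes. A nonzero c99 also
 * treats // as the start of a comment. Trigraphs and backslash-newline
 * splices are NOT handled. Returns the number of characters written. */
size_t strip_comments_str(const char *src, char *dst, int c99)
{
    enum { IN_CODE, IN_BLOCK, IN_LINE, IN_STR, IN_CHR } state = IN_CODE;
    size_t i = 0, n = 0;

    while (src[i] != '\0') {
        char c = src[i];
        switch (state) {
        case IN_CODE:
            if (c == '/' && src[i + 1] == '*') {
                dst[n++] = ' ';        /* the whole comment becomes one space */
                state = IN_BLOCK;
                i += 2;                /* skip both delimiter characters */
                continue;
            }
            if (c99 && c == '/' && src[i + 1] == '/') {
                dst[n++] = ' ';
                state = IN_LINE;
                i += 2;
                continue;
            }
            if (c == '"')  state = IN_STR;
            if (c == '\'') state = IN_CHR;
            dst[n++] = c;
            break;
        case IN_BLOCK:
            if (c == '*' && src[i + 1] == '/') {
                state = IN_CODE;
                i += 2;
                continue;
            }
            break;
        case IN_LINE:
            if (c == '\n') {           /* the newline is not part of the comment */
                dst[n++] = c;
                state = IN_CODE;
            }
            break;
        case IN_STR:
        case IN_CHR:
            dst[n++] = c;
            if (c == '\\' && src[i + 1] != '\0')
                dst[n++] = src[++i];   /* copy the escaped character verbatim */
            else if (c == (state == IN_STR ? '"' : '\''))
                state = IN_CODE;
            break;
        }
        i++;
    }
    dst[n] = '\0';
    return n;
}
```

With c99 set to 0, the input "x = sizeof/*comment*/int;" comes out as "x = sizeof int;", and comment delimiters inside "..." or '...' are left alone.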

I really have to ask the original poster: why do you want to do this?

--
Keith Thompson (The_Other_Keith) k...@cts.com <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://www.sdsc.edu/~kst>
Schroedinger does Shakespeare: "To be *and* not to be"

Derk Gwen

Oct 30, 2003, 11:36:17 PM
"Timex" <sugar...@hotmail.com> wrote:
# I want to delete all comments in .c file.
#
# Size of .c file is very big.
#
# Any good idea to do this?
#
# Please show me example code.

tclsh <<':eof'
set c [open something.c]
set b [read $c]
close $c

regsub -all {//[^\n]*\n} $b \n b
regsub -all {/[*]([^*]|[*](?!/))*[*]/} $b {} b

set c [open something-1.c w]
puts $c $b
close $c
:eof

--
Derk Gwen http://derkgwen.250free.com/html/index.html
The whole world's against us.

Keith Thompson

Oct 31, 2003, 1:18:29 AM
Derk Gwen <derk...@HotPOP.com> writes:
> "Timex" <sugar...@hotmail.com> wrote:
> # I want to delete all comments in .c file.
> #
> # Size of .c file is very big.
> #
> # Any good idea to do this?
> #
> # Please show me example code.
>
> tclsh <<':eof'
> set c [open something.c]
> set b [read $c]
> close $c
>
> regsub -all {//[^\n]*\n} $b \n b
> regsub -all {/[*]([^*]|[*](?!/))*[*]/} $b {} b
>
> set c [open something-1.c w]
> puts $c $b
> close $c
> :eof

This seems to add an extra blank line at the end of the output file.
It transforms "token/**/pasting" to "tokenpasting", which doesn't
violate the original poster's requirements, but it doesn't match the
way comments are treated in C.

It also doesn't ignore comment delimiters in string and character
literals.

Colin Newell

Oct 31, 2003, 7:27:45 AM
Assuming you have access to the C preprocessor (cpp), you can use it as a
quick hack. Isolate the file from its include files and run something like:

cpp -nostdinc file.c > new_file.c

It should warn that it cannot find all the include files. Now put your
#includes back in and you should have a comment-free file.

This is a quick and dirty solution because the C preprocessor does a whole
load of other things as well as stripping comments. You need to make sure
that there are no #defines, because otherwise they will be expanded.
Put them back in after you have run the preprocessor.

I have only come across the C preprocessor as a separate program called
'cpp' on *nix boxes. It may be available on Windows, but I do not know where
or how.


Colin.

"Timex" <sugar...@hotmail.com> wrote in message
news:bnsegb$pi1$1...@news.kreonet.re.kr...

Tim Hagan

Oct 31, 2003, 11:01:45 AM

This will replace each /* ... */ style comment with a single space:

#include<stdio.h>
int main(void){i\
nt c,p=-1,k=0,s=0
;while((c=getchar
())!=EOF){if(s==0
){if(p=='/'&&c==
'*'){s=1;k=2;}el\
se if(c=='\"'&&p
!='\\'&&p!='\'')s
=2;}else if(s==1)
{if (p=='*'&&c==
'/')s=0;}else if(
s==2){if(c=='\"'
&&p!='\\')s=0;}if
(k==1)putchar(' '
);if(p>0&&s!=1){
if(!k)putchar(p);
if(--k<0)k=0;}p=c
;}putchar(p);ret\
urn 0;}

--
Tim Hagan

Jeremy Yallop

Oct 31, 2003, 11:52:51 AM
Tim Hagan wrote:
> This will replace each /* ... */ style comment with a single space:
>
> #include<stdio.h>
> int main(void){i\
> nt c,p=-1,k=0,s=0
> ;while((c=getchar
> ())!=EOF){if(s==0
> ){if(p=='/'&&c==
> '*'){s=1;k=2;}el\
> se if(c=='\"'&&p
> !='\\'&&p!='\'')s
> =2;}else if(s==1)
> {if (p=='*'&&c==
> '/')s=0;}else if(
> s==2){if(c=='\"'
> &&p!='\\')s=0;}if
> (k==1)putchar(' '
> );if(p>0&&s!=1){
> if(!k)putchar(p);
> if(--k<0)k=0;}p=c
> ;}putchar(p);ret\
> urn 0;}


It doesn't handle line-splicing. Also, putchar(-1) is not portable.

Jeremy.

Halleyscomet

Oct 31, 2003, 12:35:13 PM
> I want to delete all comments in .c file.
>
> Size of .c file is very big.

You work for SCO's Linux division, don't you?

Tim Hagan

Oct 31, 2003, 12:44:40 PM

putchar(-1) is never executed in the above code, but you're right
about the line-splicing. Oh, well, back to the drawing board ...

--
Tim Hagan

Nudge

Oct 31, 2003, 1:04:20 PM
>> I want to delete all comments in .c file.
>>
>> Size of .c file is very big.
>
> You work for SCO's Linux division, don't you?

<GRIN>

Jeremy Yallop

Oct 31, 2003, 3:40:47 PM

putchar(-1) is executed if EOF is encountered immediately.

Jeremy.

Christopher Benson-Manica

Oct 31, 2003, 3:56:03 PM
Jeremy Yallop <jer...@jdyallop.freeserve.co.uk> spoke thus:

> putchar(-1) is executed if EOF is encountered immediately.

Why would one want to write putchar(-1) anyway...?

--
Christopher Benson-Manica | I *should* know what I'm talking about - if I
ataru(at)cyberspace.org | don't, I need to know. Flames welcome.

Mark A. Odell

Oct 31, 2003, 4:03:38 PM
"Timex" <sugar...@hotmail.com> wrote in
news:bnsegb$pi1$1...@news.kreonet.re.kr:

Just grab an evaluation copy of Codewright from Borland. Do a
search-replace on . <-- regexp and restrict "to comments" with a
replacement string of nothing. I just did it to a very large file in about
5 seconds. Now what does this have to do with the C language? The C
language does not specify how to delete comments.

--
- Mark ->
--

Tim Hagan

Oct 31, 2003, 5:25:26 PM

... unless one tries to remove the comments from an empty file. :-)

> putchar(-1) is executed if EOF is encountered immediately.

So just insert 'if (p > 0)' before the final putchar.

--
Tim Hagan

Richard Heathfield

Oct 31, 2003, 5:47:18 PM
Mark A. Odell wrote:

> Now what does this have to do with the C language? The C
> language does not specify how to delete comments.

The Standard says: "Each comment is replaced by one space character." If
that doesn't specify how to delete comments, I don't know what does.

--
Richard Heathfield : bin...@eton.powernet.co.uk
"Usenet is a strange place." - Dennis M Ritchie, 29 July 1999.
C FAQ: http://www.eskimo.com/~scs/C-faq/top.html
K&R answers, C books, etc: http://users.powernet.co.uk/eton

Mark McIntyre

Oct 31, 2003, 6:25:04 PM
On Fri, 31 Oct 2003 22:47:18 +0000 (UTC), in comp.lang.c , Richard
Heathfield <dont...@address.co.uk.invalid> wrote:

>Mark A. Odell wrote:
>
>> Now what does this have to do with the C language? The C
>> language does not specify how to delete comments.
>
>The Standard says: "Each comment is replaced by one space character." If
>that doesn't specify how to delete comments, I don't know what does.

I guess Mark's point is that to be sure of being syntactically
identical to the original you should replace all comments by a space.
For instance
int i = 23/* */12;
should still generate a syntax error . :-)

What the OP would expect it to do with
double i = 32 //* */ 4
;
is anyone's guess. I guess you'd have to have a C99 and a C89 mode.
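
The dialect dependence can be observed from inside a program. The sketch below reuses the same construct; the function name slash_slash_is_comment is hypothetical. Under C99 rules the initializer is 32, because everything after // is a comment. Under strict C89 rules the line reads as 32 divided by 4, with an ordinary /* */ comment in between, so the initializer is 8.

```c
/* Hypothetical probe: returns 1 when the compiler treats // as the start of
 * a comment (C99 and later, or C89 compilers that accept it as an
 * extension), and 0 when it follows strict C89 rules. */
int slash_slash_is_comment(void)
{
    int i = 32 //* */ 4
        ;
    /* C99 reading: int i = 32 ;    C89 reading: int i = 32 / 4 ; */
    return i == 32;
}
```

A comment stripper has to make the same choice, which is why a C89/C99 switch is a reasonable thing to want.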
--
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>
CLC readme: <http://www.angelfire.com/ms3/bchambless0/welcome_to_clc.html>


Stephen Samuel

Nov 2, 2003, 8:49:18 AM
Here's a perl script which will handle *MOST* sane C code...

Some things that it will miss (scan manually for these first):

a double quote inside of single quotes (e.g.)
char confusion = '"';

C-99 // comments like this

I'm sure that some people can come up with other convoluted counter-examples.

It reads and plays with the entire file, so it will need to hold
at least two or three copies of it in RAM. (for today's computers,
that would be some number of megabytes).

If you want any of the above fixed, feel free to send me a cheque.
____________________________________________________
#!/usr/bin/perl
$s=join("",<>);
# printf "[[%s]]\n\n",$s;
$s=~ s/("(\\\\|\\"|[^"])*")|(\/\*([^*]|\*(?=[^\/]))*\*\/)|(\/\/.*)/[[$1 ]]/g;
printf "[[%s]]\n\n",$s;
____________________________________________________
Yep, That's it... 5 lines including the shell header.


--
Stephen Samuel +1(604)876-0426 sam...@bcgreen.com
http://www.bcgreen.com/~samuel/
Powerful committed communication. Transformation touching
the jewel within each person and bringing it to light.

Irrwahn Grausewitz

Nov 2, 2003, 9:27:45 AM
Stephen Samuel <stephen...@telus.net> wrote:

> Here's a perl script which will handle *MOST* sane C code...

<snip>

Since when is perl topical in c.l.c?

BTW:
Does your "solution" account for comment delimiters inside string
literals? (I'm unfortunately unable to decrypt the line-noise
provided.)

Regards
--
Irrwahn
(irrw...@freenet.de)

CBFalconer

Nov 2, 2003, 10:56:12 AM
*** rude top-posting fixed ***

Please do not top-post.

The following AFAIK does not have the above faults, and does not
need to store any file copies, in fact not even any line copies.
It will probably be at least an order of magnitude faster.

/* File uncmntc.c - demo of a text filter
   Strips C comments. Tested to strip itself
   by C.B. Falconer. 2002-08-15
   Public Domain. Attribution appreciated
   report bugs to <mailto:cbfal...@worldnet.att.net>
*/

/* With gcc3.1, must omit -ansi to compile eol comments */

#include <stdio.h>
#include <stdlib.h>

static int ch, lastch;

/* ---------------- */

static void putlast(void)
{
   if (0 != lastch) fputc(lastch, stdout);
   lastch = ch;
   ch = 0;
} /* putlast */

/* ---------------- */

/* gobble chars until star slash appears */
static int stdcomment(void)
{
   int ch, lastch;

   ch = 0;
   do {
      lastch = ch;
      if (EOF == (ch = fgetc(stdin))) return EOF;
   } while (!(('*' == lastch) && ('/' == ch)));
   return ch;
} /* stdcomment */

/* ---------------- */

/* gobble chars until EOLine or EOF. i.e. // comments */
static int eolcomment(void)
{
   int ch, lastch;

   ch = '\0';
   do {
      lastch = ch;
      if (EOF == (ch = fgetc(stdin))) return EOF;
   } while (!(('\n' == ch) && ('\\' != lastch)));
   return ch;
} /* eolcomment */

/* ---------------- */

/* echo chars until '"' or EOF */
static int echostring(void)
{
   putlast();
   if (EOF == (ch = fgetc(stdin))) return EOF;
   do {
      putlast();
      if (EOF == (ch = fgetc(stdin))) return EOF;
   } while (!(('"' == ch) && ('\\' != lastch)));
   return ch;
} /* echostring */

/* ---------------- */

int main(void)
{
   lastch = '\0';
   while (EOF != (ch = fgetc(stdin))) {
      if ('/' == lastch)
         if (ch == '*') {
            lastch = '\0';
            if (EOF == stdcomment()) break;
            ch = ' ';
            putlast();
         }
         else if (ch == '/') {
            lastch = '\0';
            if (EOF == eolcomment()) break;
            ch = '\n';
            putlast(); // Eolcomment here
            // Eolcomment line \
               with continuation line.
         }
         else {
            putlast();
         }
      else if (('"' == ch) && ('\\' != lastch)
                           && ('\'' != lastch)) {
         if ('"' != (ch = echostring())) {
            fputs("\"Unterminated\" string\n", stderr);
            fputs("checking for\
continuation line string\n", stderr);
            fputs("checking for" "concat string\n", stderr);
            return EXIT_FAILURE;
         }
         putlast();
      }
      else {
         putlast();
      }
   } /* while */
   putlast(/* embedded comment */);
   return 0;
} /* main */


--
Chuck F (cbfal...@yahoo.com) (cbfal...@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!


Ed Morton

Nov 2, 2003, 11:15:29 AM

Try "ncsl": http://www.lucentssg.com/displayProduct.cfm?prodid=33
It strips all comments and indentation so just run an indenter (e.g.
"indent") or a C beautifier (e.g. "cb" - google for "cb download
beautifier" and take your pick) on the output to get it back in readable
format. Disclaimer - I've never used this specific download of "ncsl",
I've just used the version provided on UNIX boxes within Lucent.

Ed.

Stephen Samuel

Nov 2, 2003, 1:52:37 PM
One bug: Quoted strings have a space inserted after them.
Again: fixable, but not worth the trouble for free.

Stephen Samuel

Nov 2, 2003, 2:08:06 PM
Irrwahn Grausewitz wrote:
> Stephen Samuel <stephen...@telus.net> wrote:
>
>
>>Here's a perl script which will handle *MOST* sane C code...
>
> <snip>
>
> Since when is perl topical in c.l.c?
It's a C solution .. But Perl is written in C, so if you like,
I can just
#include <perl-source.c>

> BTW:
> Does your "solution" account for comment delimiters inside string
> literals? (I'm unfortunately unable to decrypt the line-noise
> provided.)

Yes. It accounts for comment delimiters in quotes and quote
delimiters in comments (One side effect is that double quote
strings have a space added after them. Given the way that I
wrote it, it was a choice between that, replacing comments with
Nothing (possible to cause syntax errors) or added complexity.)

It also handles quoted double-quotes inside of strings.

It does NOT handle double-quote or comment-start delimiters inside
of single-quotes (char literals), but that would be easy enough to add.

Joona I Palaste

Nov 2, 2003, 2:42:12 PM
Stephen Samuel <stephen...@telus.net> scribbled the following:

> Irrwahn Grausewitz wrote:
>> Stephen Samuel <stephen...@telus.net> wrote:
>>>Here's a perl script which will handle *MOST* sane C code...
>>
>> <snip>
>>
>> Since when is perl topical in c.l.c?
> It's a C solution .. But Perl is written in C, so if you like,
> I can just
> #include <perl-source.c>

Are Perl implementations *required* to be written in C? And are
Perl implementations *required* to ship with the source code?

--
/-- Joona Palaste (pal...@cc.helsinki.fi) ------------- Finland --------\
\-- http://www.helsinki.fi/~palaste --------------------- rules! --------/
"'So called' means: 'There is a long explanation for this, but I have no
time to explain it here.'"
- JIPsoft

Stephen Samuel

Nov 2, 2003, 2:49:56 PM
CBFalconer wrote:
> *** rude top-posting fixed ***
>
Hmm.. This must be a relatively recent addition to usenet
etiquette (i.e. in the last decade or so).

Apologies. I'm an old fogey, and it's probably been a decade
since I've posted here.

Irrwahn Grausewitz

Nov 2, 2003, 3:18:37 PM
"Stephen Samuel" wrote:

> Irrwahn Grausewitz wrote:
> > Since when is perl topical in c.l.c?
> It's a C solution

Err, no.

> .. But Perl is written in C, so if you like,
> I can just
> #include <perl-source.c>

Non-standard header file. ;-)

> > Does your "solution" account for comment delimiters inside string
> > literals?
>

> Yes.

Nice.

> It accounts for comment delimiters in quotes and quote
> delimiters in comments (One side effect is that double quote
> strings have a space added after them. Given the way that I
> wrote it, it was a choice between that, replacing comments with
> Nothing (possible to cause syntax errors) or added complexity.)

Hm. AFAICT that shouldn't cause much trouble, OK.

> It also handles quoted double-quotes inside of strings.

ITYM something like "\""?

> It does NOT handle double-quote or comment-start delimiters inside
> of single-quotes (char literals), but that would be easy enough to
> add.

Fair enough.
But there may still be "strange" cases where your script fails.
Consider:

/* gotcha! *\
/

A C preprocessor would have deleted the <backslash><new-line> sequence
in translation phase 2 *before* the tokenization and comment replacement
takes place in phase 3. And if the backslash is written as a trigraph
sequence we need to "fake" translation phase 1 as well... :-(

Admittedly, these are rare situations, but you see: sophisticated
comment replacement in C files isn't /that/ easy after all, you have to
provide quite an amount of preprocessor functionality to get it right.
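
As a rough illustration of the phase 2 step described above, the following C sketch deletes every backslash-newline pair from a buffer before any comment scanning is done. The function name splice_lines and the in-memory interface are assumptions for illustration only. Trigraphs (phase 1) are not handled, and because the splice removes newlines, the line structure of the output no longer matches the input.

```c
#include <stddef.h>

/* Hypothetical helper: copies src to dst with every backslash-newline
 * sequence removed, as translation phase 2 would. dst must hold at least
 * strlen(src) + 1 bytes. Returns the number of characters written. */
size_t splice_lines(const char *src, char *dst)
{
    size_t n = 0;

    while (*src != '\0') {
        if (src[0] == '\\' && src[1] == '\n')
            src += 2;              /* drop the backslash and the newline */
        else
            dst[n++] = *src++;     /* everything else is copied unchanged */
    }
    dst[n] = '\0';
    return n;
}
```

After this pass the "gotcha" comment ends in an ordinary star-slash pair, so a scanner that only knows about adjacent delimiter characters can recognise it.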

Best Regards
--
Irrwahn

PS: Please don't email me if you already posted
your reply to the newsgroup; thank you.

Keith Thompson

Nov 2, 2003, 3:31:18 PM
Joona I Palaste <pal...@cc.helsinki.fi> writes:
[...]

> Are Perl implementations *required* to be written in C? And are
> Perl implementations *required* to ship with the source code?

<OT>
Perl is pretty much defined by its implementation, not by a language
standard. The implementation (there's basically only one) is written
in C. It's distributed under one of two open source licenses, both of
which require the source to be available (but not necessarily shipped
with the binaries).

This is probably incorrect in some minor details. If I had posted to
a more appropriate newsgroup, someone would jump in and correct me.
</OT>

Joona I Palaste

Nov 2, 2003, 3:34:39 PM
Keith Thompson <k...@cts.com> scribbled the following:

> Joona I Palaste <pal...@cc.helsinki.fi> writes:
> [...]
>> Are Perl implementations *required* to be written in C? And are
>> Perl implementations *required* to ship with the source code?

> <OT>
> Perl is pretty much defined by its implementation, not by a language
> standard. The implementation (there's basically only one) is written
> in C. It's distributed under one of two open source licenses, both of
> which require the source to be available (but not necessarily shipped
> with the binaries).

> This is probably incorrect in some minor details. If I had posted to
> a more appropriate newsgroup, someone would jump in and correct me.
> </OT>

OK, I have to concede that, but Samuel's answer still wasn't
sufficient. Writing #include <perl_source.h> at the top of the Perl
file will change the program into a mix-and-match of C and Perl,
which will not compile as either language.

--
/-- Joona Palaste (pal...@cc.helsinki.fi) ------------- Finland --------\
\-- http://www.helsinki.fi/~palaste --------------------- rules! --------/

"Roses are red, violets are blue, I'm a schitzophrenic and so am I."
- Bob Wiley

Keith Thompson

Nov 2, 2003, 3:35:24 PM
Irrwahn Grausewitz <irrw...@freenet.de> writes:
> Stephen Samuel <stephen...@telus.net> wrote:
>
> > Here's a perl script which will handle *MOST* sane C code...
> <snip>
>
> Since when is perl topical in c.l.c?

This is an interesting edge case with respect to topicality. One could
argue that we're talking *about* C (which is clearly topical), but
we're using a mixture of Perl and English to discuss it. Think of the
Perl regular expression as a description of how to strip comments from
C source code.

On the other hand, not everyone here can be expected to speak Perl
regexps fluently.

Irrwahn Grausewitz

Nov 2, 2003, 3:43:27 PM
Stephen Samuel <stephen...@telus.net> wrote:

> CBFalconer wrote:
> > *** rude top-posting fixed ***
> >
> Hmm.. This must be a relatively recent addition to usenet
> etiquette (i.e. in the last decade or so).

It's a convention in comp.lang.c (and several other technical
newsgroups) to place your comments after the part of the original
post you are responding to, in order to retain context. Thus
top-posting is discouraged in c.l.c.



> Apologies. I'm an old fogey, and it's probably been a decade
> since I've posted here.

Again, please do not send email copies of your replies; thank you.

Regards
--
Irrwahn
(irrw...@freenet.de)

Irrwahn Grausewitz

Nov 2, 2003, 7:07:00 PM
Keith Thompson <k...@cts.com> wrote:

> Irrwahn Grausewitz <irrw...@freenet.de> writes:
> >
> > Since when is perl topical in c.l.c?
>
> This is an interesting edge case with respect to topicality. One could
> argue that we're talking *about* C (which is clearly topical), but
> we're using a mixture of Perl and English to discuss it. Think of the
> Perl regular expression as a description of how to strip comments from
> C source code.

That would make any solution to manipulate C sources implemented in
any language other than C topical in c.l.c. IMHO that would not be a
Good Thing[tm].



> On the other hand, not everyone here can be expected to speak Perl
> regexps fluently.

Indeed.

Regards
--
Irrwahn
(irrw...@freenet.de)

Message has been deleted

CBFalconer

Nov 4, 2003, 11:20:51 AM
"Arthur J. O'Dwyer" wrote:
> On Sun, 2 Nov 2003, CBFalconer wrote:
> >
.... snip ...

>
> > /* File uncmntc.c - demo of a text filter
> > Strips C comments. Tested to strip itself
> > by C.B. Falconer. 2002-08-15
> > Public Domain. Attribution appreciated
> > report bugs to <mailto:cbfal...@worldnet.att.net>
> > */
> <snip code>
>
> I ran your program through some hurdles, and found that
> it couldn't handle multibyte character constants for some
> reason. I didn't bother to track down why; I just re-wrote
> the filter from scratch. ;-) Here's my version, whose
> algorithm may be completely different from yours.
... snip ...

A known failing. It also fails miserably with trigraphs. The
multibyte char is probably easily handled analogously to handling
quoted strings.

>
> /* File uncmntc2.c - demo of a different text filter


> Strips C comments. Tested to strip itself

> Improves on CBFalconer's design by correctly handling '/*'
> and by having a C89/C99 switch, but doesn't handle the /\
> * delimiter correctly.
> by Arthur O'Dwyer, 2002-11-03
                     ^^^^
That is the year I wrote mine :-)

All of which shows that there are multiple ways to implement a
black box. I omitted any reference to cats because I happen to
like them.

Patrick Foley

Nov 4, 2003, 1:07:15 PM
In <Pine.LNX.4.58-035....@unix42.andrew.cmu.edu> "Arthur J. O'Dwyer" <a...@nospam.andrew.cmu.edu> writes:

> [snip] Comment removal really is tricky in the most general case!

Since this is exercise 1-23 in K&R2, there are several solutions
available at Richard's site:

http://users.powernet.co.uk/eton/kandr2/index.html

including a 556-line entry from Chris Torek that I think also brews
coffee...

Pat

BTW, Richard: Would you consider adding a plaintext version of the
"naming conventions" page to the zipfile as a sort of "README"?

Richard Heathfield

Nov 4, 2003, 2:31:46 PM
Patrick Foley wrote:

> BTW, Richard: Would you consider adding a plaintext version of the
> "naming conventions" page to the zipfile as a sort of "README"?

I am currently re-evaluating the Answers section of my site. I'll get back
to you when I have a bit more time.

Chris Torek

Nov 6, 2003, 3:53:16 AM
>In <Pine.LNX.4.58-035....@unix42.andrew.cmu.edu> "Arthur J. O'Dwyer" <a...@nospam.andrew.cmu.edu> writes:
>> [snip] Comment removal really is tricky in the most general case!

In article <nc6l71-...@myname.my.domain>


Patrick Foley <pfo...@earthlink.net> writes:
>Since this is exercise 1-23 in K&R2, there are several solutions
>available at Richard's site:
>
>http://users.powernet.co.uk/eton/kandr2/index.html
>
>including a 556-line entry from Chris Torek that I think also brews
>coffee...

But it has (gasp!) a *bug*. :-) The "level 2 state machine" for
handling comments fails to reconsider characters in a few cases.
I think the main (only?) problem can be fixed without too much
fuss:

    case L2_SLASH:
        if (c == '*')
            l2state = L2_COMM;
        else if (c99 && c == '/')
            l2state = L2_SLASHSLASH;
        else {
            SYNCLINES();
            OUTPUT('/', 0);
-->         if (c != '/') {
-->             if (c != EOF)
-->                 COPY();
-->             l2state = L2_NORMAL;
-->         }
        }
        break;

The bug is in the marked lines, which output the first slash
and then change the level-2 state. But the new state should
be "that which results in seeing character c as if the initial
state had been L2_NORMAL", so we could replace all of them with:

l2state = L2_NORMAL;
goto l2_normal_case;

and add an "l2_normal_case" label under case L2_NORMAL: above.
Alternatively, the assignment to l2state can be changed to:

l2state = c == '\'' ? L2_CC :
c == '"' ? L2_SC : L2_NORMAL;

which avoids the dreaded "goto", and simply duplicates what would
have happened in L2_NORMAL state (except of course that instead of
replacing l2state with L2_SLASH for '/', we have to replace it with
L2_NORMAL for characters that are not in [/'"]).
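
The state choice described above can be isolated in a small C sketch. The L2_* state names are taken from the fragments in this post; the enum ordering, the function name state_after_slash, and its signature are assumptions for illustration, and the output actions (SYNCLINES, OUTPUT, COPY) are left out because the full program is not shown here.

```c
/* State names follow the post; their order here is an assumption. */
enum l2 { L2_NORMAL, L2_SLASH, L2_COMM, L2_SLASHSLASH, L2_CC, L2_SC };

/* Hypothetical helper: the state to enter when character c arrives while
 * the machine is in L2_SLASH, i.e. right after an unclassified '/'. */
enum l2 state_after_slash(int c, int c99)
{
    if (c == '*')
        return L2_COMM;                        /* slash-star opens a comment */
    if (c == '/')
        return c99 ? L2_SLASHSLASH : L2_SLASH; /* C89: may still open one */
    if (c == '\'')
        return L2_CC;                          /* character constant begins */
    if (c == '"')
        return L2_SC;                          /* string literal begins */
    return L2_NORMAL;
}
```

The two quote cases are the ones the marked lines missed: a string or character constant that starts immediately after a division sign must still be treated as a literal.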
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://67.40.109.61/torek/index.html (for the moment)
Reading email is like searching for food in the garbage, thanks to spammers.

Thomas Matthews

Nov 6, 2003, 12:52:05 PM
Timex wrote:

> I want to delete all comments in .c file.
>
> Size of .c file is very big.
>
> Any good idea to do this?
>
> Please show me example code.
>
>
>

Perhaps a better idea is to break the file into
smaller pieces organized around clearer themes.

I believe that deleting all the comments is a crime
against programming ethics. After all, one of
the greatest ideals to achieve is to make a
program readable by a programming-illiterate
person.

--
Thomas Matthews

C++ newsgroup welcome message:
http://www.slack.net/~shiva/welcome.txt
C++ Faq: http://www.parashift.com/c++-faq-lite
C Faq: http://www.eskimo.com/~scs/c-faq/top.html
alt.comp.lang.learn.c-c++ faq:
http://www.raos.demon.uk/acllc-c++/faq.html
Other sites:
http://www.josuttis.com -- C++ STL Library book

Gary E. Ansok

Nov 6, 2003, 7:37:44 PM
In article <Pine.LNX.4.58-035....@unix42.andrew.cmu.edu>,

Arthur J. O'Dwyer <a...@nospam.andrew.cmu.edu> wrote:
>> > Timex wrote:
>> > >
>> > > I want to delete all comments in .c file.
>

I tested Arthur's program and, despite its claim, it couldn't
even strip its own comments (it left in the comment in
put_carefully()). The bug is that it thought the backslash
meant that '\\' was not a complete character constant (nor
would it think "\\" was a complete string).
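
The underlying rule is that a quote only closes the literal when it is not itself escaped, and a backslash that has been escaped does not escape anything. A minimal C sketch of that rule, separate from the filter itself (the function name literal_end and its interface are hypothetical):

```c
#include <stddef.h>

/* Hypothetical helper: s[0] must be the opening quote, either ' or ".
 * Returns the index of the matching closing quote, or 0 if the literal
 * is unterminated. */
size_t literal_end(const char *s)
{
    char quote = s[0];
    int escaped = 0;
    size_t i;

    if (quote == '\0')
        return 0;
    for (i = 1; s[i] != '\0'; i++) {
        if (escaped)
            escaped = 0;       /* this character was escaped: take it as is */
        else if (s[i] == '\\')
            escaped = 1;       /* the next character is escaped */
        else if (s[i] == quote)
            return i;          /* unescaped quote closes the literal */
    }
    return 0;
}
```

For the two-backslash character constant the closing quote is found at index 3, whereas a check that looks only at the immediately preceding character would skip past it.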

Is this a complete C99-style comment?
// \\
If it is, a similar fix may be needed in that part of the code.

Lesson:


> Comment removal really is tricky in the most general case!

Agreed.

-- Gary

My attempt at a bug fix:
/* File uncmntc2.c - demo of a different text filter


Strips C comments. Tested to strip itself

Improves on CBFalconer's design by correctly handling '/*'
and by having a C89/C99 switch, but doesn't handle the /\
* delimiter correctly.
by Arthur O'Dwyer, 2002-11-03

bug fix by Gary Ansok, 2003-11-06 to handle '\\' and "\\"
Public Domain. Attribution appreciated
don't bother reporting bugs, just fix 'em...
*/

#include <stdio.h>
#include <stdlib.h>

/* Strip C99-style end-of-line comments? */
int AllowEOLComments = 1;

int strip_comments(FILE *fp, FILE *outfp);
static int put_carefully(int lastch, int ch, FILE *outfp);


int main(void)
{
    strip_comments(stdin, stdout);
    return 0;
}


int strip_comments(FILE *fp, FILE *outfp)
{
    int ch;
    int lastch;
    int inchotes = 0;
    int inquotes = 0;
    int incomment = 0;
    int ineolcomment = 0;
    int backslashed = 0;

    for (lastch = ' '; (ch = getc(fp)) != EOF; lastch = ch)
    {
        if (!incomment && !ineolcomment)
        {
            if (inquotes || inchotes)
                putc(ch, outfp);
            else
                put_carefully(lastch, ch, outfp);
        }

        if (inchotes) {
            if (lastch == '\\')
                backslashed ^= 1;
            else
                backslashed = 0;
            if (ch == '\'' && !backslashed)
                inchotes = 0;
        } else if (inquotes) {
            if (lastch == '\\')
                backslashed ^= 1;
            else
                backslashed = 0;
            if (ch == '"' && !backslashed)
                inquotes = 0;
        } else if (incomment) {
            if (ch == '/' && lastch == '*')
                incomment = 0, ch = ' ';
        } else if (ineolcomment) {
            if (ch == '\n' && lastch != '\\')
                ineolcomment = 0;
        } else {
            if (ch == '\'')
                inchotes = 1;
            else if (ch == '"')
                inquotes = 1;
            else if (lastch == '/' && ch == '*') {
                putc(' ', outfp);
                incomment = 1;
            }
            else if (AllowEOLComments && lastch == '/' && ch == '/')
                ineolcomment = 1;
        }
    }

    if (lastch == '/')
        putc(lastch, outfp);

    return 0;
}


static int put_carefully(int lastch, int ch, FILE *outfp)
{
    /* Print out 'ch', but be very careful not to print
     * any characters that might be part of a comment
     * delimiter. Contrariwise, if 'lastch' is now
     * definitely *not* a comment delimiter, we must now
     * print it, too.
     */

    if (AllowEOLComments) {
        if (lastch == '/' && ch == '/')
            return 0;
    }
    if (lastch == '/' && ch == '*')
        return 0;
    if (lastch == '/')
        putc(lastch, outfp);
    if (ch != '/')
        putc(ch, outfp);
    return 0;
}
