codelines.awk

mss

Nov 15, 2009, 9:10:03 PM

Wrote a quick & dirty line counter for my scripts, but noted:

If the final line of the file that's being counted contains
an empty newline, it's excluded.

So... either only consecutive empty lines (excepting the final
line) are considered, or I'm not understanding something with
this script:

{
    CMT = "#"

    if (NF > 0) {
        if (index($0, CMT) == 1) {
            y++
        } else {
            x++
        }
    } else {
        z++
    }
}

END {
    system("clear")
    print "source: " FILENAME
    print "lines total: " NR
    print "lines of code: " x
    print "lines of comments: " y
    print "lines empty: " z
    print "code ratio: " int(((x + y) / NR) * 100) "%"
}

--
later on,
Mike

Ed Morton

Nov 16, 2009, 9:52:43 AM

mss wrote:
> Wrote a quick & dirty line counter for my scripts, but noted:
>
> If the final line of the file that's being counted contains
> an empty newline, it's excluded.

What do you want to happen with a line that is not empty but just
contains white space? What about a line that starts with white space and
then has a comment start character?
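To make those two questions concrete, here is a minimal sketch that runs the same NF / index() test from the script over a small made-up sample (the four input lines and the classify helper are assumptions for illustration, not anything from the original file). With the default field separator, a line holding only spaces and tabs has NF == 0, and an indented "#" is not at position 1, so it lands in the "code" bucket:

```shell
# classify: hypothetical helper that labels each input line the way the
# posted script would count it (comment, code, or empty).
classify() {
    awk '{
        if (NF > 0) {
            if (index($0, "#") == 1) { kind = "comment" } else { kind = "code" }
        } else {
            kind = "empty"
        }
        print NR ": " kind
    }'
}

# Made-up sample: flush-left comment, indented comment,
# a line with only a space and a tab, and one line of code.
result=$(printf '# top\n  # indented\n \t\nx = 1\n' | classify)
printf '%s\n' "$result"
```

On this sample the indented comment is reported as code and the white-space-only line as empty, which may or may not be what is wanted.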

> So... either only consecutive empty lines (excepting the final
> line) are considered, or I'm not understanding something with
> this script:

I think it's more likely there's something unexpected in your input
file. Maybe it was created on DOS?

>
> {
>
> CMT="#"
>
> if (NF > 0) {
> if (index($0, CMT) == 1) {
> y++} else {x++}
> } else {z++}
> }
>
> END{
> system("clear")
> print "source: " FILENAME
> print "lines total: " NR
> print "lines of code: " x
> print "lines of comments: " y
> print "lines empty " z
> print "code ratio: " int(((x + y) / NR) * 100) "%"
> }
>

If I understand you correctly, the above should do what you want given
simple input. Please post some sample input and the output you get from
running the script on it.

Ed.

mss

Nov 16, 2009, 12:49:46 PM

On 2009-11-16, Ed Morton <morto...@gmail.com> wrote:

>> So... either only consecutive empty lines (excepting the final
>> line) are considered, or I'm not understanding something with
>> this script:
>
> I think it's more likely there's something unexpected in your input
> file. Maybe it was created on DOS?

Hey Ed.

Here's another example...(GNU Awk 3.1.6 - Slackware/Linux)

file 'foo' contains four lines, the final line is a linefeed by itself:

--cut below this line--
x
y
z

--cut above this line--

invoking cat foo | gawk '{print NR}' produces:

1
2
3

file 'baz' contains five lines, the final two lines each contain a linefeed by
itself:

--cut below this line--
x
y
z


--cut above this line--

invoking cat baz | gawk '{print NR}' produces:

1
2
3
4

In both cases, the final line is ignored. Yet note that in
the second example, the next-to-last empty line is not ignored.

Is this behavior canonical?

--
later on,
Mike

pk

Nov 16, 2009, 1:29:57 PM

mss wrote:

> On 2009-11-16, Ed Morton <morto...@gmail.com> wrote:
>
>>> So... either only consecutive empty lines (excepting the final
>>> line) are considered, or I'm not understanding something with
>>> this script:
>>
>> I think it's more likely there's something unexpected in your input
>> file. Maybe it was created on DOS?
>
> Hey Ed.
>
> Here's another example...(GNU Awk 3.1.6 - Slackware/Linux)
>
> file 'foo' contains four lines, the final line is a linefeed by itself:
>
> --cut below this line--
> x
> y
> z
>
> --cut above this line--
>
> invoking cat foo | gawk '{print NR}' produces:
>
> 1
> 2
> 3

Works fine for me with 3.1.7:

$ cat tempfile
aa
bb
cc
dd

$ awk '{print NR}' tempfile
1
2
3
4
5
$

Aharon Robbins

Nov 16, 2009, 3:00:58 PM

Something is broken somewhere. You should get output for each line,
even the empty ones.
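A quick way to check this without depending on any editor is to generate the input with printf, so the exact bytes are known. This is only a sketch; the two byte sequences below are assumed stand-ins for the 'foo' and 'baz' files described earlier in the thread:

```shell
# Four newline-terminated lines, the last one empty (like 'foo' as described).
four=$(printf 'x\ny\nz\n\n' | awk 'END { print NR }')

# Five newline-terminated lines, the last two empty (like 'baz' as described).
five=$(printf 'x\ny\nz\n\n\n' | awk 'END { print NR }')

echo "foo-like input: $four records, baz-like input: $five records"
```

If awk reports 4 and 5 here but fewer for the real files, the files most likely do not contain the bytes they are assumed to contain.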

Try rebuilding from source using the latest released version, 3.1.7.

Arnold

--
Aharon (Arnold) Robbins arnold AT skeeve DOT com
P.O. Box 354 Home Phone: +972 8 979-0381
Nof Ayalon Cell Phone: +972 50 729-7545
D.N. Shimshon 99785 ISRAEL

Ed Morton

Nov 16, 2009, 3:08:06 PM

On Nov 16, 11:49 am, mss <m...@dev.null> wrote:

I think you may be confusing carriage return with linefeed. As I said
earlier, there's probably something wrong with your input file. Try
this:

cat -v file
awk '{printf "%d:<%s>\n",NR,$0}' file | cat -v

and post what it produces.
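As a rough illustration of what such a check can reveal, the sketch below builds a deliberately damaged temporary file (CRLF line endings and no newline after the last line; the file contents are made up) and inspects it. The awk | cat -v pipeline is the one suggested above; the CR count and the tail -c test for a final newline are extra checks added here as assumptions about what is worth looking at:

```shell
tmp=$(mktemp)
# Hypothetical damaged file: DOS line endings, no newline after the last line.
printf 'x\r\ny\r\nz' > "$tmp"

# Same pipeline as suggested above; cat -v makes a carriage return visible as ^M.
shown=$(awk '{printf "%d:<%s>\n",NR,$0}' "$tmp" | cat -v)
printf '%s\n' "$shown"

# Count the lines that end in a carriage return.
crlines=$(awk '/\r$/ { n++ } END { print n+0 }' "$tmp")

# wc -l prints 1 if the last byte of the file is a newline, 0 otherwise.
lastnl=$(tail -c 1 "$tmp" | wc -l | tr -d ' ')

echo "lines ending in CR: $crlines, final newline present: $lastnl"
rm -f "$tmp"
```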

Ed.

mss

Nov 16, 2009, 4:08:01 PM

Thanks for the help pk, Aharon, & Ed.

Problem solved, here's the lowdown...

I typically use vi (vim), but have been using joe of late,
and it's somehow leaving any files it edits, err...
"transparently trashed". What's more, a couple of KDE
editors that I used to verify the integrity of the file,
by re-saving it, failed to catch the incomplete final line.
vim, however, complained loudly about the file containing
an "incomplete newline"...

So...

- downloaded the tarball for 3.1.7, built a new binary

- re-saved the file in question in vim

- rm /path/joe

And now I correctly have:

$ cat foo
x
y
z

$ gawk '{printf "%d:<%s>\n",NR,$0}' foo | cat -v
1:<x>
2:<y>
3:<z>
4:<>
$ wc -l foo
4 foo
$

Tested good with v3.1.7 under Linux, and good as well
with v3.1.6 under Vista.
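For anyone hitting something similar: one way tools can disagree about the same file is an incomplete final line. wc -l counts newline characters, while the common awk implementations still treat trailing text without a newline as a record. A small sketch with made-up input (this does not claim to reproduce the exact bytes joe left behind):

```shell
# Three lines of text, but no newline after the last one.
nl_count=$(printf 'x\ny\nz' | wc -l | tr -d ' ')
rec_count=$(printf 'x\ny\nz' | awk 'END { print NR }')

echo "wc -l sees $nl_count newlines, awk sees $rec_count records"
```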

Sorry for any hassle there, my bad. And finally, here's
the *next iteration* of the script with which I discovered
the issue (not gawk's or Aharon's fault):

#!/bin/awk -f

BEGIN {
    c = "#"
    x = y = z = 0
    if (ARGC != 2) {
        q++
        exit
    }
}

{
    if (NF > 0) {
        gsub(/^ +/, "", $0)
        if (index($0, c) == 1) {
            x++
        } else {
            y++
        }
    } else {
        z++
    }
}

END {
    if (! q) {
        print "source: " FILENAME
        print "comment character: " c
        print "lines total: " NR
        print "lines of comments: " x
        print "lines of code: " y
        print "lines empty: " z
        print "code density: " int(((y + x) / NR) * 100) "%\n"
    }
}

--
later on,
Mike

Ed Morton

Nov 16, 2009, 5:21:27 PM

You might want to use "[[:blank:]]" in your gsub() instead of " " so
you catch any sequence of blank characters (spaces and tabs) rather
than just a sequence of space chars.
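The difference shows up with a tab-indented comment. In this sketch the strip helper and the sample line are made up for illustration, and it assumes an awk that supports POSIX character classes such as [[:blank:]] (gawk does; some very old awks may not):

```shell
# strip: hypothetical helper that removes a leading run matching the given
# regex from one tab-indented comment line, then reports where "#" ends up.
strip() {
    printf '\t# tab-indented comment\n' |
        awk -v re="^$1" '{ gsub(re, "", $0); print index($0, "#") }'
}

with_space=$(strip ' +')             # leaves the tab in place
with_blank=$(strip '[[:blank:]]+')   # removes the tab as well

echo "space only: index=$with_space, [[:blank:]]: index=$with_blank"
```

Only an index of 1 would be counted as a comment by the script, so the space-only version files this line under code.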

Actually, instead of:

{
    if (NF > 0) {
        gsub(/^ +/, "", $0)
        if (index($0, c) == 1) {x++} else {y++}
    } else {z++}
}

I'd use something more like:

{
    if ($0 ~ "^[[:blank:]]*" c) {
        x++
    } else if ($0 ~ "^[[:blank:]]*$") {
        z++
    } else {
        y++
    }
}

so you don't need to call 2 functions (gsub() and index()), but mainly
so it's clear that your code will always increment exactly one of the
variables.
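Here is a small sketch of that if/else chain run end to end. The six sample lines and the printf summary format are made up for illustration, and c is passed in with -v rather than set in a BEGIN block. Note that c is used as part of a regular expression, so a comment character that is also a regex metacharacter would need escaping:

```shell
# Made-up sample: 2 comments (one tab-indented), 2 empty lines
# (one holding only a space and a tab), 2 lines of code.
counts=$(printf '# header\n\t# indented\n\nx = 1\n \t\ny = 2 # trailing\n' |
    awk -v c='#' '
    {
        if ($0 ~ "^[[:blank:]]*" c) {
            x++          # optional blanks, then the comment character
        } else if ($0 ~ "^[[:blank:]]*$") {
            z++          # nothing but blanks
        } else {
            y++          # everything else counts as code
        }
    }
    END { printf "comments=%d code=%d empty=%d total=%d\n", x, y, z, NR }')

echo "$counts"
```

Because every record goes through exactly one branch, the three counters always add up to NR.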

Ed.

Janis Papanagnou

Nov 16, 2009, 7:11:09 PM

Would that be equivalent to...? (one of many possible variants)

!NF { z++ ; next }
/^[[:blank:]]*#/ { x++ }
END { y = NR-(x+z) }

Just to foster awk's condition {action} paradigm and reduce the
expressions a bit more.
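One way to check the equivalence on a given input is simply to run both forms and compare. The sketch below does that on a made-up sample with "#" hard-coded in both versions; it only shows that the counts agree for this input (spaces and tabs as the only white space), not for every possible file:

```shell
# Made-up sample, used as a printf format string (it contains no % signs).
sample='# header\n\t# indented\n\nx = 1\n \t\ny = 2 # trailing\n'

# if/else chain, as in the previous post.
chain=$(printf "$sample" | awk '
{
    if ($0 ~ /^[[:blank:]]*#/)      { x++ }
    else if ($0 ~ /^[[:blank:]]*$/) { z++ }
    else                            { y++ }
}
END { printf "comments=%d code=%d empty=%d\n", x, y, z }')

# Pattern-action variant: code lines are whatever is left over.
rules=$(printf "$sample" | awk '
!NF              { z++; next }
/^[[:blank:]]*#/ { x++ }
END { y = NR - (x + z); printf "comments=%d code=%d empty=%d\n", x, y, z }')

echo "chain: $chain"
echo "rules: $rules"
```

The !NF test relies on the default field separator, so a script that changes FS would need the explicit regex for empty lines instead.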

Janis

mss

Nov 17, 2009, 8:33:45 AM

On 2009-11-16, Ed Morton <morto...@gmail.com> wrote:

> You might want to use "[[:blank:]]" in your gsub() instead of " " so
> you catch any sequence of blank characters (spaces and tabs) rather
> than just a sequence of space chars.
>
> Actually, instead of:
>
> {
> if (NF > 0) {
> gsub(/^ +/, "", $0)
> if (index($0, c) == 1) {x++} else {y++}
> } else {z++}
> }
>
> I'd use something more like:
>
> {
> if ($0 ~ "^[[:blank:]]*" c) {
> x++
> } else if ($0 ~ "^[[:blank:]]*$") {
> z++
> } else {
> y++
> }
> }
>
> so you don't need to call 2 functions (gsub() and index()), but mainly
> so it's clear that your code will always increment exactly one of the
> variables.
>
> Ed.

Noted, I'll use that in fact. Thanks for the insight,
starting to learn AWK...

--
later on,
Mike