finding lines in source code longer than 80 chars

Russ P.

Nov 2, 2016, 3:25:17 PM

What is a quick way to find lines in source code that are longer than 80 chars? I would like to be able to run it on multiple files using a wild-card (e.g. *.scala) and list each file and line number that exceed 80 chars. I could write a python or scala script to do this, but I figure there is probably a one-line (or otherwise very short) sed or awk command that can do it. Any ideas? Thanks.

Ed Morton

Nov 2, 2016, 3:37:32 PM

On 11/2/2016 2:25 PM, Russ P. wrote:
> What is a quick way to find lines in source code that are longer than 80 chars? I would like to be able to run it on multiple files using a wild-card (e.g. *.scala) and list each file and line number that exceed 80 chars. I could write a python or scala script to do this, but I figure there is probably a one-line (or otherwise very short) sed or awk command that can do it. Any ideas? Thanks.
>

grep -E -n '.{81}' *.scala

awk '/.{81}/{print FILENAME, FNR}' *.scala

Russ P.

Nov 2, 2016, 3:43:13 PM

Nice. Thanks!

Kaz Kylheku

Nov 2, 2016, 3:50:19 PM

On 2016-11-02, Ed Morton <morto...@gmail.com> wrote:
> awk '/.{81}/{print FILENAME, FNR}' *.scala

awk 'length ~ /^(8[1-9]|9.|...+)$/ { print FILENAME, FNR }'

This is better: it says "Yes, I *know* about the goddamned length
function, but my cold, dead hands will still be clutching a regex."

Ed Morton

Nov 2, 2016, 4:15:08 PM

No, if you wanted to use the length() function for some reason then you wouldn't
need a regexp at all, you'd just do:

awk 'length($0) > 80 { print FILENAME, FNR }'

Your regexp could be improved upon btw IF you really wanted to compare the
numeric output of length() to a regexp but that seems pointless.

Ed.

Kaz Kylheku

Nov 2, 2016, 4:27:17 PM

On 2016-11-02, Ed Morton <morto...@gmail.com> wrote:
> On 11/2/2016 2:50 PM, Kaz Kylheku wrote:
>> On 2016-11-02, Ed Morton <morto...@gmail.com> wrote:
>>> awk '/.{81}/{print FILENAME, FNR}' *.scala
>>
>> awk 'length ~ /^(8[1-9]|9.|...+)$/ { print FILENAME, FNR }'
>>
>> This is better: it says "Yes, I *know* about the goddamned length
>> function, but my cold, dead hands will still be clutching a regex."
>>
>
> No, if you wanted to use the length() function for some reason then you wouldn't
> need a regexp at all, you'd just do:

What's that thing Kenny said once?

"I will take missing the point for $800, Alex."

> awk 'length($0) > 80 { print FILENAME, FNR }'

"length" is already "length($0)".

Kenny McCormack

Nov 2, 2016, 4:33:44 PM

In article <201611021...@kylheku.com>,

Bravo! Of course, Ed won't get it...

Anyway, I was going to suggest using length, like this:

gawk 'length > 80' *.scala

But note that using 'length' by itself like that, with no parens and no
arg, may or may not work in any given version of AWK. It is known to work
in GAWK (hence my specifying gawk above), but it is noted in the GAWK
manual that it is an anachronism. The portable form is, of course:

gawk 'length($0) > 80' *.scala

FWIW, I think that just: length()
might also work...
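
A minimal sketch of the portable form, extended to print the file name and line number that the original question asked for. The function name `long_lines` is made up for illustration and is not from the thread:

```shell
# Hypothetical helper: print "file:line" for every line longer than 80
# characters. length($0) is spelled out so the script does not depend on
# the awk in use accepting a bare "length".
long_lines() {
    awk 'length($0) > 80 { print FILENAME ":" FNR }' "$@"
}
```

It would be called as `long_lines *.scala`.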

--
Shikata ga nai...

Ed Morton

Nov 2, 2016, 4:35:28 PM

On 11/2/2016 3:27 PM, Kaz Kylheku wrote:
> On 2016-11-02, Ed Morton <morto...@gmail.com> wrote:
>> On 11/2/2016 2:50 PM, Kaz Kylheku wrote:
>>> On 2016-11-02, Ed Morton <morto...@gmail.com> wrote:
>>>> awk '/.{81}/{print FILENAME, FNR}' *.scala
>>>
>>> awk 'length ~ /^(8[1-9]|9.|...+)$/ { print FILENAME, FNR }'
>>>
>>> This is better: it says "Yes, I *know* about the goddamned length
>>> function, but my cold, dead hands will still be clutching a regex."
>>>
>>
>> No, if you wanted to use the length() function for some reason then you wouldn't
>> need a regexp at all, you'd just do:
>
> What's that thing Kenny said once?
>
> "I will take missing the point for $800, Alex."

No, I got your point, I'm just acknowledging your childishness in making it that
way by treating you like a child in return. Thanks for playing along.

>> awk 'length($0) > 80 { print FILENAME, FNR }'
>
> "length" is already "length($0)".
>

I know that's in the current POSIX standard but it doesn't apply to all awks and
has no appreciable value vs the clearer alternative of just specifying the
argument so it's not a useful abbreviation to use, especially in a NG.

Ed.

Kenny McCormack

Nov 2, 2016, 4:35:40 PM

In article <nvdhfe$umv$1...@dont-email.me>,
Ed Morton <morto...@gmail.com> wrote:
...

>No, if you wanted to use the length() function for some reason then you
>wouldn't need a regexp at all, you'd just do:

What did I just post about Ed being likely to not get it?

I'm good. I tell ya; I'm good.

--
> No, I haven't, that's why I'm asking questions. If you won't help me,
> why don't you just go find your lost manhood elsewhere.

CLC in a nutshell.

Ed Morton

Nov 2, 2016, 4:49:19 PM

You're welcome. Ignore the noise from the 2 resident buffoons, their moms
probably let them eat too much Halloween candy. Once they've been put down with
their blankets and some warm milk they'll be fine.

Cheers,

Ed.

Kaz Kylheku

Nov 2, 2016, 4:57:20 PM

On 2016-11-02, Ed Morton <morto...@gmail.com> wrote:

> On 11/2/2016 2:43 PM, Russ P. wrote:
>> Nice. Thanks!
>>
>> On Wednesday, November 2, 2016 at 12:37:32 PM UTC-7, Ed Morton wrote:
>>> On 11/2/2016 2:25 PM, Russ P. wrote:
>>>> What is a quick way to find lines in source code that are longer
>>>> than 80 chars?
>>>

>>> grep -E -n '.{81}' *.scala
>>>
>>> awk '/.{81}/{print FILENAME, FNR}' *.scala
>>
>
> You're welcome. Ignore the noise from the 2 resident buffoons, their moms
> probably let them eat too much Halloween candy. Once they've been put
> down with their blankets and some warm milk they'll be fine.

Now, the delivery could be worse here, folks; "same day and free"
is nothing to sneeze at.

Kenny McCormack

Nov 2, 2016, 5:04:08 PM

In article <201611021...@kylheku.com>,
Kaz Kylheku <221-50...@kylheku.com> wrote:

Achooo!

Incidentally, Ed's AWK solution won't work on an out-of-the-box Debian
Linux system. Points to those who can tell me why this is so.

--
I love the poorly educated.

Janis Papanagnou

Nov 2, 2016, 5:24:28 PM

What version is that on your Debian?

I was interested in performance - suspected that the length of $0 may be
stored by awk so it might be slightly faster than a regexp - and checked
with a 4.1.2 version I currently use and the 3.1.8 version from /usr/bin.

I was surprised that the regexp version is only _slightly_ slower than the
length() based version in that recent GNU awk version. Yet more astonished
I was when I saw that the length() based version was nearly 7 times slower
than the regexp based version with that old GNU awk version!

Given that result, Ed may have a point - a weak one, maybe, given the age
of that version, but anyway. It would be interesting what performance other
awks are showing with those two script variants.

Janis

Russ P.

Nov 2, 2016, 5:29:01 PM

On Wednesday, November 2, 2016 at 12:37:32 PM UTC-7, Ed Morton wrote:

I like this grep solution, but I am having a surprisingly hard time converting it to a bash alias or function due to the ticks. Having it as an alias or function is just a way to remember it.

I tried

alias showLinesOver80chars=$(grep -E -n '.{81}')

and a couple other variations, but nothing seems to work. Any ideas?

Janis Papanagnou

Nov 2, 2016, 5:43:39 PM

On 02.11.2016 22:28, Russ P. wrote:
> On Wednesday, November 2, 2016 at 12:37:32 PM UTC-7, Ed Morton wrote:
>> On 11/2/2016 2:25 PM, Russ P. wrote:
>>> What is a quick way to find lines in source code that are longer than
>>> 80 chars? I would like to be able to run it on multiple files using a
>>> wild-card (e.g. *.scala) and list each file and line number that exceed
>>> 80 chars. I could write a python or scala script to do this, but I
>>> figure there is probably a one-line (or otherwise very short) sed or
>>> awk command that can do it. Any ideas? Thanks.
>>>
>>
>> grep -E -n '.{81}' *.scala
>
> I like this grep solution,

It is not exactly what you asked for, is it?
("using a wild-card (e.g. *.scala) and list each file and line number")

> but I am having a surprisingly hard time
> converting it to a bash alias or function due to the ticks. Having it as an
> alias or function is just a way to remember it.
>
> I tried
>
> alias showLinesOver80chars=$(grep -E -n '.{81}')
>
> and a couple other variations, but nothing seems to work. Any ideas?

Have you tried one of

alias showLinesOver80chars="grep -E -n '.{81}'"
alias showLinesOver80chars='grep -E -n ".{81}"'

But better forget aliases, put the command in a function.
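
As a rough sketch of the function route: unlike an alias, a function can take the width as a parameter. The name `lines_over` and its argument order are assumptions for illustration only:

```shell
# Usage: lines_over WIDTH FILE...
# Prints grep-style "file:line:text" (or "line:text" for a single file)
# for every line with more than WIDTH characters.
# Note: "width" is an ordinary (global) shell variable here, to stay
# within plain POSIX sh.
lines_over() {
    width=$1
    shift
    # A line longer than WIDTH contains at least WIDTH+1 characters.
    grep -E -n ".{$((width + 1))}" "$@"
}
```

For the case discussed here it would be called as `lines_over 80 *.scala`.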

Janis

Russ P.

Nov 2, 2016, 5:53:38 PM

OK, that works. Thanks.

> But better forget aliases, put the command in a function.

A function could be a bit more flexible I suppose, but I don't think I need one for my purposes.

Rakesh Sharma

Nov 2, 2016, 6:12:05 PM

On Thursday, 3 November 2016 00:55:17 UTC+5:30, Russ P. wrote:

> What is a quick way to find lines in source code that are longer than 80 chars? I would like to be able to run it on multiple files using a wild-card (e.g. *.scala) and list each file and line number that exceed 80 chars.

grep -nE '.{80}.' /dev/null *.scala

The null file is there in case only one .scala file matches: given a single file operand, "grep" would not put the filename on the selected lines.
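
A small sketch of the same command wrapped in a function, so the effect is easy to try on a single file. The function name is made up for illustration:

```shell
# Hypothetical wrapper: the extra /dev/null operand never matches
# anything, but it makes grep see more than one file, so every hit is
# prefixed with its file name.
long_lines_named() {
    grep -nE '.{80}.' /dev/null "$@"
}
```

Called as `long_lines_named Foo.scala`, hits come out as `Foo.scala:LINE:text`, whereas the bare grep on that one file would print only `LINE:text`.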

Also, you might want to treat tabs separately, since each one occupies more than one column. Something along these lines:

perl -lne 's/\t/" "x8/eg;print "$ARGV:$.: $_" if 80 < +length' *.scala

Geoff Clare

Nov 3, 2016, 9:41:07 AM

Rakesh Sharma wrote:

> also you might want to consider tabs separately since they occupy multi-char space. something along the lines:
>
> perl -lne 's/\t/" "x8/eg;print "$ARGV:$.: $_" if 80 < +length' *.scala

Replacing each tab character by 8 spaces is not the right thing to
do if any tabs do not occur at a tab position. There is a standard
command for expanding tabs the right way, called .... "expand".

$ printf 'abc\tx\n' | perl -lne 's/\t/" "x8/eg;print length'
12
$ printf 'abc\tx\n' | expand | awk '{print length}'
9

--
Geoff Clare <net...@gclare.org.uk>

Russ P.

Nov 3, 2016, 3:38:53 PM

Using tabs in source code can cause problems, and it is wise to avoid them. In Scala, the convention is to use only two spaces for each indentation level to save space on a line and avoid the need for wrapped and continued lines. That took some getting used to at first, but now it seems reasonable to me.

Kaz Kylheku

Nov 3, 2016, 4:18:36 PM

I won't indent in any other way unless required to. E.g.:

http://www.kylheku.com/cgit/c-snippets/tree/autotab.c

I got used to two character indents from coding in Lisp.

Incidentally, autotab.c is a utility that calculates Vim indentation
settings based on sampling the file you're about to edit.

With this, I instantly conform to the style of the given file
without having to fiddle with expandtab, tabstop and shiftwidth.

Kenny McCormack

Nov 3, 2016, 4:35:09 PM

In article <201611031...@kylheku.com>,
Kaz Kylheku <221-50...@kylheku.com> wrote:
...

>
>http://www.kylheku.com/cgit/c-snippets/tree/autotab.c
>
>I got used to two character indents from coding in Lisp.
>
>Incidentally, autotab.c is a utility that calculates Vim indentation
>settings based on sampling the file you're about to edit.
>
>With this, I instantly conform to the style of the given file
>without having to fiddle with expandtab, tabstop and shiftwidth.

This looks interesting. However, a couple of questions:

1) Do you provide any easier way to get (download) it? I tried first
with 'wget' and got a file full of HTML garbage. Then I did it
over with 'lynx -dump', which was better but not ideal. I still
had to do a fair amount of editing to get it down to just the C
source. Here is the wc output of the resulting file:
$ wc autotab.c
766 2795 19479 autotab.c
$
Is this the correct size/info?

2) I'm a little surprised at how long/complex the code is. Care to say
anything about how it came to be in its current form?

--
The randomly chosen signature file that would have appeared here is more than 4
lines long. As such, it violates one or more Usenet RFCs. In order to remain in
compliance with said RFCs, the actual sig can be found at the following web address:
http://www.xmission.com/~gazelle/Sigs/Infallibility

Ben Bacarisse

Nov 3, 2016, 5:19:39 PM

gaz...@shell.xmission.com (Kenny McCormack) writes:

> In article <201611031...@kylheku.com>,
> Kaz Kylheku <221-50...@kylheku.com> wrote:
> ...
>>
>>http://www.kylheku.com/cgit/c-snippets/tree/autotab.c
>>
>>I got used to two character indents from coding in Lisp.
>>
>>Incidentally, autotab.c is a utility that calculates Vim indentation
>>settings based on sampling the file you're about to edit.

<snip>

> This looks interesting. However, a couple of questions:
>
> 1) Do you provide any easier way to get (download) it? I tried first
> with 'wget' and got a file full of HTML garbage.

There's a "plain" link near the top of the listing. It goes to

http://www.kylheku.com/cgit/c-snippets/plain/autotab.c

<snip>
--
Ben.

Kaz Kylheku

Nov 3, 2016, 5:20:27 PM

On 2016-11-03, Kenny McCormack <gaz...@shell.xmission.com> wrote:
> In article <201611031...@kylheku.com>,
> Kaz Kylheku <221-50...@kylheku.com> wrote:
> ...
>>
>>http://www.kylheku.com/cgit/c-snippets/tree/autotab.c
>>
>>I got used to two character indents from coding in Lisp.
>>
>>Incidentally, autotab.c is a utility that calculates Vim indentation
>>settings based on sampling the file you're about to edit.
>>
>>With this, I instantly conform to the style of the given file
>>without having to fiddle with expandtab, tabstop and shiftwidth.
>
> This looks interesting. However, a couple of questions:
>
> 1) Do you provide any easier way to get (download) it? I tried first
> with 'wget' and got a file full of HTML garbage.

Try following CGIT's "plain" link: that serves up a raw form of it,
by navigating you to this URL:

http://www.kylheku.com/cgit/c-snippets/plain/autotab.c
                                       ^^^^^

CGIT also lets you download tarball snapshots from the main
page of a repo:

http://www.kylheku.com/cgit/c-snippets/

See the autotab-5.tar.gz links (also .zip and .bz2). These are generated
for all tags. Also any version you navigate to has download links of the
form <160bit-hex-SHA1>.tar.gz, etc.

It's very friendly for people who don't have git installed or don't know
how/care to use it.

> $ wc autotab.c
> 766 2795 19479 autotab.c

That's what I get in my dev sandbox of that repo.

>
> 2) I'm a little surprised at how long/complex the code is. Care to say
> anything about how it came to be in its current form?

Shrug. The first git version (003) is 742 lines already and not
that different from the current version. The first 200-300 lines of it
is support code: linked list of strings stuff, and tokenizing and
whatnot.

The logic is the way it is because this is a task which requires
a modicum of cunning. From the very beginning, it resembled its
current form; it didn't start out as some 100 line hack that grew.

At the time I wrote it, I was putting together a from-scratch embedded
Linux distro with quite a few packages, and had to go "down to the
elbows" in many of them, and the kernel of course, to patch many things.
It was a PITA always adjusting the editor to match the indentation
style, not to mention figuring out what to adjust it to.

Amidst this distro work, I had a wide range of real-world test cases for
autotab. Over a period of some time, I tweaked it based on not getting
the right sort of behavior on this file or that until those situations
were rare enough not to care about. Those tweaks weren't large additions
to the code or rewrites.

Kaz Kylheku

Nov 3, 2016, 5:55:19 PM

On 2016-11-03, Kaz Kylheku <221-50...@kylheku.com> wrote:
> On 2016-11-03, Kenny McCormack <gaz...@shell.xmission.com> wrote:
>> In article <201611031...@kylheku.com>,
>> Kaz Kylheku <221-50...@kylheku.com> wrote:
>> ...
>>>
>>>http://www.kylheku.com/cgit/c-snippets/tree/autotab.c
>>>
>>>I got used to two character indents from coding in Lisp.
>>>
>>>Incidentally, autotab.c is a utility that calculates Vim indentation
>>>settings based on sampling the file you're about to edit.
>>>
>>>With this, I instantly conform to the style of the given file
>>>without having to fiddle with expandtab, tabstop and shiftwidth.
>>
>> This looks interesting. However, a couple of questions:
>>
>> 1) Do you provide any easier way to get (download) it? I tried first
>> with 'wget' and got a file full of HTML garbage.
>
> Try following CGIT's "plain" link: that serves up a raw form of it,
> by navigating you to this URL:
>
> http://www.kylheku.com/cgit/c-snippets/plain/autotab.c
>

By the way, seeing that it's not OSS licensed properly, I pushed out a
new revision, adding the BSD 2-Clause license to the block comment.

Kenny McCormack

Nov 3, 2016, 7:05:32 PM

In article <201611031...@kylheku.com>,
Kaz Kylheku <221-50...@kylheku.com> wrote:
...

>>> This looks interesting. However, a couple of questions:
>>>
>>> 1) Do you provide any easier way to get (download) it? I tried first
>>> with 'wget' and got a file full of HTML garbage.
>>
>> Try following CGIT's "plain" link: that serves up a raw form of it,
>> by navigating you to this URL:
>>
>> http://www.kylheku.com/cgit/c-snippets/plain/autotab.c
>>
>
>By the way, seeing that it's not OSS licensed properly, I pushed out a
>new revision, adding the BSD 2-Clause license to the block comment.

Thanks. Got it.

Keith Thompson

Nov 5, 2016, 11:10:07 PM

Ed Morton <morto...@gmail.com> writes:
> On 11/2/2016 3:27 PM, Kaz Kylheku wrote:
>> On 2016-11-02, Ed Morton <morto...@gmail.com> wrote:

[...]

>>> awk 'length($0) > 80 { print FILENAME, FNR }'
>>
>> "length" is already "length($0)".
>
> I know that's in the current POSIX standard but it doesn't apply to
> all awks and has no appreciable value vs the clearer alternative of
> just specifying the argument so it's not a useful abbreviation to use,
> especially in a NG.

Most implementations of awk do support length without an argument to
mean length($0), and given that POSIX requires it one could argue that
all implementations *should* -- but a quick experiment shows that the
BusyBox version of awk does not (at least as of BusyBox v1.22.1).

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Rakesh Sharma

Nov 6, 2016, 7:25:28 PM

On Thursday, 3 November 2016 19:11:07 UTC+5:30, Geoff Clare wrote:

>
> Replacing each tab character by 8 spaces is not the right thing to
> do if any tabs do not occur at a tab position. There is a standard
> command for expanding tabs the right way, called .... "expand".
>
> $ printf 'abc\tx\n' | perl -lne 's/\t/" "x8/eg;print length'
> 12
> $ printf 'abc\tx\n' | expand | awk '{print length}'
> 9
>

That's correct...
But then we lose the filename info.
We can whip up something along these lines:

perl -lne '
    ##### emulate <<expand>>
    my $x; # will hold the expanded line
    $_ = "\t$_";
    while ( /\t/g ) {
        /\G([^\t]*)(?=\t)/ && do{
            my $l = length($x .= $1);
            $x .= " " x (8*(1+int($l/8))-$l);
            next;
        };
        /\G([^\t]*)$/ and $x .= $1;
    }
    $_ = $x;
    ###########
    print "$ARGV:$.:$_" if +length > 80;
    close ARGV if eof; # reset $. so line numbers restart for each file
' ./*.scala
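
Another way to keep the file name, sketched here as an alternative rather than as the approach above, is to run expand(1) on one file at a time and hand the name to awk separately. The function name and the `LL_FILE` environment variable are hypothetical:

```shell
# Sketch: expand each file on its own, so the name is still known when
# awk reports a line. expand uses tab stops every 8 columns by default.
# The name travels through the environment (ENVIRON) so that awk does not
# interpret backslashes in it, as it would with -v.
long_lines_expanded() {
    for f in "$@"; do
        expand "$f" |
            LL_FILE=$f awk 'length($0) > 80 {
                print ENVIRON["LL_FILE"] ":" NR ":" $0
            }'
    done
}
```

Called as `long_lines_expanded ./*.scala`, it prints `file:line:text` with the tabs already expanded.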

victorma...@gmail.com

Nov 18, 2016, 12:25:03 PM

With SED:
sed -rn '/^.{81,}/p' *.scala

Rakesh Sharma

Nov 18, 2016, 6:41:24 PM

On Friday, 18 November 2016 22:55:03 UTC+5:30, victorma...@gmail.com wrote:
> With SED:
> sed -rn '/^.{81,}/p' *.scala

This will just report those lines without the filenames where they come from.
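
A possible workaround, sketched under the assumption that a per-file loop is acceptable: sed's `=` command prints the number of each matching line, and the shell adds the file name. The function name is hypothetical, and the regexp is written as a basic regular expression so that no `-r` option is needed:

```shell
# Sketch: sed itself has no notion of the current file name, so loop over
# the files and let the shell prefix each reported line number.
sed_long_lines() {
    for f in "$@"; do
        sed -n '/.\{81\}/=' "$f" |
            while IFS= read -r n; do
                printf '%s:%s\n' "$f" "$n"
            done
    done
}
```

Called as `sed_long_lines *.scala`, it prints one `file:line` pair per overlong line.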

Joerg.S...@fokus.fraunhofer.de

Nov 19, 2016, 7:36:12 AM

In article <695c61e9-732c-4ff5...@googlegroups.com>,

BTW: IIRC, all solutions mentioned before just count chars but do not compute
the number of columns in the output.

"cstyle" will report lines that do not fit on a 80 column printout.

cstyle is e.g. part of the schilytools at:

http://sourceforge.net/projects/schilytools/

and recent versions allow you to specify the line width with the -l option.

If you are interested in the line width only, you need to grep for lines
like:

test.c: 44: line (len 86) > 80 characters

--
EMail:jo...@schily.net (home) Jörg Schilling D-13353 Berlin
joerg.s...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL: http://cdrecord.org/private/ http://sourceforge.net/projects/schilytools/files/

Nathan Wagner

Nov 21, 2016, 7:52:41 PM

On 2016-11-19, <Joerg.S...@fokus.fraunhofer.de> wrote:
> In article <695c61e9-732c-4ff5...@googlegroups.com>,
> Rakesh Sharma <shar...@hotmail.com> wrote:
>>On Friday, 18 November 2016 22:55:03 UTC+5:30, victorma...@gmail.com wrote:
>>> With SED:
>>> sed -rn '/^.{81,}/p' *.scala
>>
>>This will just report those lines without the filenames where they come from.
>
> BTW: IIRC, all solutions mentioned before just count chars but do not compute
> the number of columns in the output.

Assuming you're thinking about tabs, one could run the source files
through expand(1) before running whatever test seems appropriate.

Amusingly, my news-reader complained about the >80 character attribution
line caused by (who I assume to be) Mr Schilling's rather long From
header.

--
nw