Awk? Undent source code file...

delr...@gmail.com

Feb 19, 2006, 5:04:33 PM

I need a simple script or command that I can execute from the command
line (to be executed inside an Ant script) that will undent a source
code file, ie, basically remove leading whitespace (spaces and tabs
only) from each line. Nothing more... I'm not sure which tool is the
right one for the job. I imagine this would be a couple line script at
maximum... any suggestions?

Thanks!

Cliff.
delr...@gmail.com

William James

Feb 19, 2006, 5:14:07 PM

awk '{sub(/^[ \t]+/,"");print}' oldfile >newfile
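A quick way to try this one-liner, sketched under the assumption of a POSIX shell and any standard awk. The sample lines and variable names are made up for illustration:

```shell
# Hypothetical sample input: one line indented with spaces, one with a tab,
# and one with no indentation at all.
input=$(printf '    int x;\n\treturn x;\nno indent\n')

# Strip the leading run of spaces and tabs. Lines without one are printed
# unchanged, because print runs whether or not sub() replaced anything.
undented=$(printf '%s\n' "$input" | awk '{sub(/^[ \t]+/,"");print}')

printf '%s\n' "$undented"
```

Each line comes out starting at its first non-blank character, and the unindented line passes through untouched.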

Chris F.A. Johnson

Feb 19, 2006, 5:18:50 PM

awk '{$1 = $1; print}'

--
Chris F.A. Johnson, author | <http://cfaj.freeshell.org>
Shell Scripting Recipes: | My code in this post, if any,
A Problem-Solution Approach | is released under the
2005, Apress | GNU General Public Licence

William James

Feb 19, 2006, 5:27:58 PM

Chris F.A. Johnson wrote:
> On 2006-02-19, delr...@gmail.com wrote:
> > I need a simple script or command that I can execute from the command
> > line (to be executed inside an Ant script) that will undent a source
> > code file, ie, basically remove leading whitespace (spaces and tabs
> > only) from each line. Nothing more... I'm not sure which tool is the
> > right one for the job. I imagine this would be a couple line script at
> > maximum... any suggestions?
>
> awk '{$1 = $1; print}'

This does more than it should. Not only does it remove leading
whitespace, but it also changes the whitespace between each
pair of fields to " ".
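The difference is easy to see on a line with aligned columns. A minimal sketch, assuming a POSIX shell and the default FS and OFS; the sample line is hypothetical:

```shell
# Hypothetical line: indented, with extra spacing between tokens.
line='    int   x  =  1;'

# Assigning to a field makes awk rebuild $0 from its fields, joined by OFS
# (a single space by default), so the inner spacing is lost as well.
rebuilt=$(printf '%s\n' "$line" | awk '{$1 = $1; print}')

# sub() with an anchored pattern only touches the leading run.
stripped=$(printf '%s\n' "$line" | awk '{sub(/^[ \t]+/,""); print}')

printf '[%s]\n[%s]\n' "$rebuilt" "$stripped"
```

The first bracketed line has every gap squeezed to one space; the second keeps the spacing after the first token exactly as it was.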

Joel Reicher

Feb 19, 2006, 10:35:58 PM

"William James" <w_a_...@yahoo.com> writes:

> delr...@gmail.com wrote:
> > I need a simple script or command that I can execute from the command
> > line (to be executed inside an Ant script) that will undent a source
> > code file, ie, basically remove leading whitespace (spaces and tabs
> > only) from each line. Nothing more... I'm not sure which tool is the
> > right one for the job. I imagine this would be a couple line script at
> > maximum... any suggestions?
>

> awk '{sub(/^[ \t]+/,"");print}' oldfile >newfile

Should be /[ \t]*/

* instead of + because there might not be any leading whitespace, and
with * don't need the ^ anchor because regexp matching is greedy.

Cheers,

- Joel

Kenny McCormack

Feb 19, 2006, 10:39:39 PM

In article <rnaccm9...@succubus.panacea.null.org>,
Joel Reicher <jo...@panacea.null.org> wrote:
...

>> awk '{sub(/^[ \t]+/,"");print}' oldfile >newfile
>
>Should be /[ \t]*/
>
>* instead of + because there might not be any leading whitespace, and
>with * don't need the ^ anchor because regexp matching is greedy.

Yes. And not only that, but, because this substitution will always
succeed, you can reduce the program to

sub(/[ \t]*/,"")

(Yes, that's the whole program!)

cutting your golf score by 9 strokes.
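Why the bare expression works as a whole program can be sketched as follows, assuming a POSIX shell. The sample input is invented, and the second command is added only for contrast:

```shell
input=$(printf '  two spaces\n\tone tab\nnone\n')

# sub() returns the number of substitutions made. /[ \t]*/ can match the
# empty string, so it returns 1 on every line; used as a pattern with no
# action, that makes awk print each (modified) line.
golfed=$(printf '%s\n' "$input" | awk 'sub(/[ \t]*/,"")')

# With + the same trick silently drops lines that have no leading
# whitespace, because sub() returns 0 for them.
dropped=$(printf '%s\n' "$input" | awk 'sub(/^[ \t]+/,"")')

printf '%s\n---\n%s\n' "$golfed" "$dropped"
```

The first result keeps all three lines; the second loses the unindented one, which is why the pattern-only form depends on the `*` version.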

William James

Feb 19, 2006, 11:42:34 PM

Joel Reicher wrote:
> "William James" <w_a_...@yahoo.com> writes:
>
> > delr...@gmail.com wrote:
> > > I need a simple script or command that I can execute from the command
> > > line (to be executed inside an Ant script) that will undent a source
> > > code file, ie, basically remove leading whitespace (spaces and tabs
> > > only) from each line. Nothing more... I'm not sure which tool is the
> > > right one for the job. I imagine this would be a couple line script at
> > > maximum... any suggestions?
> >
> > awk '{sub(/^[ \t]+/,"");print}' oldfile >newfile
>
> Should be /[ \t]*/

Wrong. * means zero or more of the preceding item.
If there are zero of them, then there's nothing to replace.
So + is easier for the o.p. to understand.

>
> * instead of + because there might not be any leading whitespace, and
> with * don't need the ^ anchor because regexp matching is greedy.

This has nothing to do with regular-expression greediness, which
you evidently don't understand.

BEGIN {
    $0 = "abaaaaa"
    sub(/a*/, "")
    print
}

The output is "baaaaa", not "ab". Awk picks the first match, not the
longest.

BEGIN {
    $0 = "the <b>only</b> way"
    sub(/<.*>/, "")
    print
}

The output is "the way". Awk takes the first match and GREEDILY
makes it as long as possible.

The goal here is not to make the program as short as possible,
but to help the o.p. by providing code that is easy to understand.
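One way to see where each match lands is to replace it with a visible marker instead of deleting it. This is only a sketch, assuming a POSIX shell; the marker "X", the variable names, and the extra input "baaa" are illustrative additions:

```shell
# Leftmost match starts at position 1 and covers the single leading "a".
first=$(echo 'abaaaaa' | awk '{sub(/a*/, "X"); print}')

# No "a" at position 1, so the leftmost match is the empty string there;
# the later run of "a" characters is never reached.
empty=$(echo 'baaa' | awk '{sub(/a*/, "X"); print}')

# Leftmost "<", then as long as possible: through the last ">".
greedy=$(echo 'the <b>only</b> way' | awk '{sub(/<.*>/, "X"); print}')

printf '%s\n%s\n%s\n' "$first" "$empty" "$greedy"
```

The same leftmost-then-longest rule applied to `/[ \t]*/` means the match always begins at the start of the line, whether or not any whitespace is there.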

Joel Reicher

Feb 20, 2006, 12:29:18 AM

"William James" <w_a_...@yahoo.com> writes:

> Joel Reicher wrote:
> > "William James" <w_a_...@yahoo.com> writes:
> >
> > > delr...@gmail.com wrote:
> > > > I need a simple script or command that I can execute from the command
> > > > line (to be executed inside an Ant script) that will undent a source
> > > > code file, ie, basically remove leading whitespace (spaces and tabs
> > > > only) from each line. Nothing more... I'm not sure which tool is the
> > > > right one for the job. I imagine this would be a couple line script at
> > > > maximum... any suggestions?
> > >
> > > awk '{sub(/^[ \t]+/,"");print}' oldfile >newfile
> >
> > Should be /[ \t]*/
>
> Wrong. * means zero or more of the preceding item.
> If there are zero of them, then there's nothing to replace.
> So + is easier for the o.p. to understand.

Could be that /^[ \t]+/ is easier to understand, or not. Nothing wrong
with an alternative, but I admit I was wrong to say "should be".

> > * instead of + because there might not be any leading whitespace, and
> > with * don't need the ^ anchor because regexp matching is greedy.
>
> This has nothing to do with regular-expression greediness, which
> you evidently don't understand.
>
> BEGIN {
>     $0 = "abaaaaa"
>     sub(/a*/, "")
>     print
> }
>
> The output is "baaaaa", not "ab". Awk picks the first match, not the
> longest.

I never said anything about longest.

> BEGIN {
>     $0 = "the <b>only</b> way"
>     sub(/<.*>/, "")
>     print
> }
>
> The output is "the way". Awk takes the first match and GREEDILY
> makes it as long as possible.

There's no surprise here, so I'm not sure what your point is, or what
you think my misunderstanding is. Regexps take the first match
*because* they're greedy. That was my only point above when I said the
^ could be omitted. The * matches the 0-char case at the start of the
line.

> The goal here is not to make the program as short as possible,
> but to help the o.p. by providing code that is easy to understand.

I agree completely. I didn't post /[ \t]*/ because it is shorter, but
because I find it easier to understand.

Cheers,

- Joel

Joel Reicher

Feb 20, 2006, 2:10:40 AM

Joel Reicher <jo...@panacea.null.org> writes:

> > The goal here is not to make the program as short as possible,
> > but to help the o.p. by providing code that is easy to understand.
>
> I agree completely. I didn't post /[ \t]*/ because it is shorter, but
> because I find it easier to understand.

I should perhaps mention that if I were doing this kind of thing in
awk at all, I would do

{ print substr($0, match($0, /[^ \t]/)) }
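A small sketch of how this version behaves, assuming a POSIX shell and a common awk such as gawk or mawk. The sample input, the brackets around the output, and the guarded variant are illustrative additions, not part of the original post:

```shell
input=$(printf '\tindented\nplain\n')

# match() returns the position of the first character that is neither a
# space nor a tab; substr() prints the line from there onward.
filtered=$(printf '%s\n' "$input" |
    awk '{ print substr($0, match($0, /[^ \t]/)) }')

# Edge case: on a line made only of blanks, match() returns 0 and
# substr($0, 0) yields the whole line, so the blanks survive.
blank=$(printf '  \n' |
    awk '{ print "[" substr($0, match($0, /[^ \t]/)) "]" }')

# A possible guard (an assumption about the desired behavior): print an
# empty line when there is no non-blank character at all.
guarded=$(printf '  \n' |
    awk '{ print "[" (match($0, /[^ \t]/) ? substr($0, RSTART) : "") "]" }')

printf '%s\n%s\n%s\n' "$filtered" "$blank" "$guarded"
```

Whether whitespace-only lines matter depends on the source files being processed; the sub() versions earlier in the thread empty such lines without any special handling.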

Cheers,

- Joel

Kenny McCormack

Feb 20, 2006, 4:47:23 AM

In article <1140410554.4...@g47g2000cwa.googlegroups.com>,
William James <w_a_...@yahoo.com> wrote a bunch of gibberish that doesn't
stand up to analysis:
<...>

Joel is right on all counts. Get over it. Move on.

Ed Morton

Feb 20, 2006, 8:28:23 AM

I'd have intuitively done what Kenny suggested, i.e. just:

sub(/[ \t]*/,"")

Is there any benefit to calling match() and substr()?

Ed.

Joel Reicher

Feb 20, 2006, 9:16:14 AM

Ed Morton <mor...@lsupcaemnt.com> writes:

> Joel Reicher wrote:
> > awk at all, I would do
> > { print substr($0, match($0, /[^ \t]/)) }
>
> I'd have intuitively done what Kenny suggested, i.e. just:
>
> sub(/[ \t]*/,"")
>
> Is there any benefit to calling match() and substr()?

This more than likely comes down to personal taste, but I believe so,
yes. The problem (computation) being solved is not by nature one of
editing the input, but of filtering it, so doing a sub() isn't really
faithful. To put that in concrete terms, consider that without any of
these higher level constructs you would probably write something like

{
    i = 0    # reset for each input line
    while (i++ < length())
        if (" " != substr($0, i, 1) && "\t" != substr($0, i, 1))
            break
    print substr($0, i)
}

Using sub() is really not much like this more explicit code, whereas
match() does almost exactly that. And because $0 is not being changed
this has the added advantage of potentially being much more efficient,
which I believe usually happens when code is faithful to the nature of
the problem.

This is far too much analysis for such a small problem though. :)
Besides, it doesn't apply if you like reducing keystrokes for
one-liners (but then you might not use awk for this problem...).
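The post does not name the other tool. As an assumption only, sed is one plausible candidate for a shorter one-liner; this sketch uses the POSIX `[[:blank:]]` class (space and tab) and invented sample input:

```shell
input=$(printf '    spaces\n\ttab\nnone\n')

# Delete the leading run of blanks, if any, on every line.
with_sed=$(printf '%s\n' "$input" | sed 's/^[[:blank:]]*//')

printf '%s\n' "$with_sed"
```

On this input the result matches what the awk versions above produce.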

Cheers,

- Joel