Thanks!
Cliff.
delr...@gmail.com
awk '{sub(/^[ \t]+/,"");print}' oldfile >newfile
awk '{$1 = $1; print}'
--
Chris F.A. Johnson, author | <http://cfaj.freeshell.org>
Shell Scripting Recipes: | My code in this post, if any,
A Problem-Solution Approach | is released under the
2005, Apress | GNU General Public Licence
This does more than it should. Not only does it remove leading
whitespace, but it also changes the whitespace between each
pair of fields to a single space (" ").
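To see the difference, here is a small sketch that runs both one-liners over the same made-up line. The shell function names undent_sub and undent_fields are only labels for this illustration. Assigning to $1 makes awk rebuild the record from its fields joined by OFS (a single space by default), which is where the collapsing comes from.

```shell
#!/bin/sh
# Hypothetical helper names; the awk programs are the two from the thread.

# Removes only the leading spaces and tabs; the rest of the line is untouched.
undent_sub() { awk '{ sub(/^[ \t]+/, ""); print }'; }

# Assigning to $1 makes awk rebuild $0 from its fields joined by OFS
# (a single space by default), so every interior run of blanks collapses too.
undent_fields() { awk '{ $1 = $1; print }'; }

printf '    x  =  1;\t# comment\n' | undent_sub      # x  =  1;<TAB># comment
printf '    x  =  1;\t# comment\n' | undent_fields   # x = 1; # comment
```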
> delr...@gmail.com wrote:
> > I need a simple script or command that I can execute from the command
> > line (to be executed inside an Ant script) that will undent a source
> > code file, ie, basically remove leading whitespace (spaces and tabs
> > only) from each line. Nothing more... I'm not sure which tool is the
> > right one for the job. I imagine this would be a couple line script at
> > maximum... any suggestions?
>
> awk '{sub(/^[ \t]+/,"");print}' oldfile >newfile
Should be /[ \t]*/
* instead of + because there might not be any leading whitespace, and
with * don't need the ^ anchor because regexp matching is greedy.
Cheers,
- Joel
Yes. And not only that, but, because this substitution will always
succeed, you can reduce the program to
sub(/[ \t]*/,"")
(Yes, that's the whole program!)
cutting your golf score by 9 strokes.
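For anyone puzzled by a program with no braces: a bare expression in awk is a pattern, and when it is true the default action (print) runs. sub() returns the number of substitutions it made, and because /[ \t]*/ can match the empty string at the start of any line, it reports one substitution per line, which is Kenny's point. The sketch below puts the golfed program next to a spelled-out equivalent; the function names are hypothetical.

```shell
#!/bin/sh
# Kenny's whole program. sub() is the pattern; its return value (the number
# of substitutions, here 0 or 1) decides whether the default action, print, runs.
undent_golf() { awk 'sub(/[ \t]*/, "")'; }

# The same thing with the implicit action written out.
undent_explicit() { awk '{ if (sub(/[ \t]*/, "") > 0) print }'; }

# "b" has no leading blanks, yet it is still printed: the empty match counts.
printf '  a\nb\n\tc d\n' | undent_golf
printf '  a\nb\n\tc d\n' | undent_explicit
```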
Wrong. * means zero or more of the preceding item.
If there are zero of them, then there's nothing to replace.
So + is easier for the o.p. to understand.
>
> * instead of + because there might not be any leading whitespace, and
> with * don't need the ^ anchor because regexp matching is greedy.
This has nothing to do with regular-expression greediness, which
you evidently don't understand.
BEGIN {
$0 = "abaaaaa"
sub(/a*/, "")
print
}
The output is "baaaaa", not "ab". Awk picks the first match, not the
longest.
BEGIN {
$0 = "the <b>only</b> way"
sub(/<.*>/, "")
print
}
The output is "the way". Awk takes the first match and GREEDILY
makes it as long as possible.
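Both BEGIN blocks above run unchanged; the sketch below just captures their output so it can be compared with the text. One small detail: the second example keeps the spaces on both sides of the removed tag, so the exact output is "the  way" with two spaces.

```shell
#!/bin/sh
# Leftmost match: at position 1, /a*/ matches the single "a" there, so the
# later, longer run of a's is never considered.
first=$(awk 'BEGIN { $0 = "abaaaaa"; sub(/a*/, ""); print }')

# Leftmost, then longest: the match starts at the first "<" and extends to
# the last ">", removing "<b>only</b>" but not the spaces around it.
second=$(awk 'BEGIN { $0 = "the <b>only</b> way"; sub(/<.*>/, ""); print }')

printf '%s\n' "$first"    # baaaaa
printf '%s\n' "$second"   # the  way
```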
The goal here is not to make the program as short as possible,
but to help the o.p. by providing code that is easy to understand.
> Joel Reicher wrote:
> > "William James" <w_a_...@yahoo.com> writes:
> >
> > > delr...@gmail.com wrote:
> > > > I need a simple script or command that I can execute from the command
> > > > line (to be executed inside an Ant script) that will undent a source
> > > > code file, ie, basically remove leading whitespace (spaces and tabs
> > > > only) from each line. Nothing more... I'm not sure which tool is the
> > > > right one for the job. I imagine this would be a couple line script at
> > > > maximum... any suggestions?
> > >
> > > awk '{sub(/^[ \t]+/,"");print}' oldfile >newfile
> >
> > Should be /[ \t]*/
>
> Wrong. * means zero or more of the preceding item.
> If there are zero of them, then there's nothing to replace.
> So + is easier for the o.p. to understand..
Could be that /^[ \t]+/ is easier to understand, or not. Nothing wrong
with an alternative, but I admit I was wrong to say "should be".
> > * instead of + because there might not be any leading whitespace, and
> > with * don't need the ^ anchor because regexp matching is greedy.
>
> This has nothing to do with regular-expression greediness, which
> you evidently don't understand.
>
> BEGIN {
> $0 = "abaaaaa"
> sub(/a*/, "")
> print
> }
>
> The output is "baaaaa", not "ab". Awk picks the first match, not the
> longest.
I never said anything about longest.
> BEGIN {
> $0 = "the <b>only</b> way"
> sub(/<.*>/, "")
> print
> }
>
> The output is "the way". Awk takes the first match and GREEDILY
> makes it as long as possible.
There's no surprise here, so I'm not sure what your point is, or what
you think my misunderstanding is. Regexps take the first match
*because* they're greedy. That was my only point above when I said the
^ could be omitted. The * matches the 0-char case at the start of the
line.
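The claim that the ^ can be dropped with * (but not with +) is easy to check on a line with no indentation. With * the leftmost match always starts at position 1, empty if need be, so an unanchored /[ \t]*/ can only ever remove leading blanks. Drop the ^ from the + version and the first run of blanks anywhere in the line is removed instead. The function names are hypothetical.

```shell
#!/bin/sh
# Unanchored *: the leftmost match is always at position 1 (possibly empty),
# so only leading blanks are ever removed.
star_unanchored() { awk '{ sub(/[ \t]*/, ""); print }'; }

# Unanchored +: the leftmost match is the first run of blanks anywhere,
# which on an unindented line is interior whitespace.
plus_unanchored() { awk '{ sub(/[ \t]+/, ""); print }'; }

printf 'foo bar\n' | star_unanchored   # foo bar
printf 'foo bar\n' | plus_unanchored   # foobar
```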
> The goal here is not to make the program as short as possible,
> but to help the o.p. by providing code that is easy to understand.
I agree completely. I didn't post /[ \t]*/ because it is shorter, but
because I find it easier to understand.
Cheers,
- Joel
> > The goal here is not to make the program as short as possible,
> > but to help the o.p. by providing code that is easy to understand.
>
> I agree completely. I didn't post /[ \t]*/ because it is shorter, but
> because I find it easier to understand.
I should perhaps mention that if I were doing this kind of thing in
awk at all, I would do
{ print substr($0, match($0, /[^ \t]/)) }
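Here is that filtering one-liner wrapped as a small shell function (the name is hypothetical). match() returns the position of the first character that is not a space or tab, and substr() prints from there to the end of the line. One behaviour worth knowing about, as I understand the common awks: on a line containing only blanks, match() returns 0 and substr($0, 0) yields the whole line, so such lines keep their whitespace, whereas the sub() versions empty them.

```shell
#!/bin/sh
# match() gives the 1-based position of the first character that is not a
# space or tab (0 if there is none); substr() prints from that position on.
undent_match() { awk '{ print substr($0, match($0, /[^ \t]/)) }'; }

printf '  \tfoo  bar\n' | undent_match   # foo  bar

# A blanks-only line: match() returns 0 and substr($0, 0) returns the whole
# line in common awks, so the blanks are left in place.
printf '  \t\n' | undent_match
```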
Cheers,
- Joel
Joel is right on all counts. Get over it. Move on.
I'd have intuitively done what Kenny suggested, i.e. just:
sub(/[ \t]*/,"")
Is there any benefit to calling match() and substr()?
Ed.
> Joel Reicher wrote:
> > awk at all, I would do
> > { print substr($0, match($0, /[^ \t]/)) }
>
> I'd have intuitively done what Kenny suggested, i.e. just:
>
> sub(/[ \t]*/,"")
>
> Is there any benefit to calling match() and substr()?
This more than likely comes down to personal taste, but I believe so,
yes. The problem (computation) being solved is not by nature one of
editing the input but of filtering it, so doing a sub() isn't really
faithful to it. To put that in concrete terms, consider that without any of
these higher-level constructs you would probably write something like
{
    i = 0    # i must start from 0 again on every line
    while (i++ < length())
        if (" " != substr($0, i, 1) && "\t" != substr($0, i, 1))
            break
    print substr($0, i)
}
Using sub() is really not much like this more explicit code, whereas
match() does almost exactly that. And because $0 is not being changed,
the match() version has the added advantage of potentially being much
more efficient, which I believe usually happens when code is faithful
to the nature of the problem.
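To make the comparison concrete, the sketch below runs the explicit loop (with i reset for each line) and the match() version over the same input. The function names are hypothetical. On every line that contains a non-blank character they should agree; on a blanks-only line the loop prints an empty line, while the match() version, in the common awks, prints the line unchanged.

```shell
#!/bin/sh
# The explicit character-by-character scan.
undent_loop() {
  awk '{
    i = 0                              # reset the scan position for each line
    while (i++ < length())             # step i through 1..length
      if (" " != substr($0, i, 1) && "\t" != substr($0, i, 1))
        break                          # stop at the first non-blank
    print substr($0, i)                # print from there to end of line
  }'
}

# The match()/substr() version from earlier in the thread.
undent_match() { awk '{ print substr($0, match($0, /[^ \t]/)) }'; }

printf '  foo\n\tbar baz\nqux\n' | undent_loop
printf '  foo\n\tbar baz\nqux\n' | undent_match
```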
This is far too much analysis for such a small problem though. :)
Besides, it doesn't apply if you like reducing keystrokes for
one-liners (but then you might not use awk for this problem...).
Cheers,
- Joel