Don't let awk truncate whitespace when printing lines.

Hongyi Zhao

Jan 21, 2016, 10:03:53 AM

Hi all,

See the following tests:

$ echo " a n" | awk '{$1="";print}'
n
$ echo " a n" | awk '{sub(" ","");$1="";print}'
n

It seems awk trims some of the whitespace from the input. How can I
stop awk from dropping whitespace when it prints lines, i.e., leave it
exactly as it was in the original input?

Regards
--
.: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.

Ben Bacarisse

Jan 21, 2016, 10:34:37 AM

Hongyi Zhao <hongy...@gmail.com> writes:

> See the following tests:
>
> $ echo " a n" | awk '{$1="";print}'
> n
> $ echo " a n" | awk '{sub(" ","");$1="";print}'
> n
>
> It seems awk trims some of the whitespace from the input. How can I
> stop awk from dropping whitespace when it prints lines, i.e., leave it
> exactly as it was in the original input?

You are almost certainly better off using some other tool. Awk does not
"truncate white spaces when printing lines" but it does (by default)
treat each run of blanks as a single field separator, and altering a
field causes $0 to be re-constructed from the fields, joined by the
output field separator (OFS).
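A minimal sketch of that rebuild, for anyone who wants to see it happen. The sample record with two-space runs is an assumption made for illustration, as are the shell variable names:

```shell
# Sample record with runs of two spaces; the exact spacing is an assumption.
line='  a  n'

# Only reading fields (here NF) leaves $0 untouched.
untouched=$(printf '%s\n' "$line" | awk '{ n = NF; print }')

# Assigning to any field makes awk rebuild $0 from the fields joined by OFS.
rebuilt=$(printf '%s\n' "$line" | awk '{ $1 = ""; print }')

# The same rebuild with a visible OFS shows where the separator comes from.
marked=$(printf '%s\n' "$line" | awk 'BEGIN { OFS = "-" } { $1 = ""; print }')

printf '[%s]\n[%s]\n[%s]\n' "$untouched" "$rebuilt" "$marked"
```

The brackets in the output only make leading blanks visible. The second and third results differ from the input because the original separators are discarded once $0 is rebuilt.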

The simplest way to remove the first space-delimited field from every
line, without altering the spaces, is to use sed:

echo " a n" | sed -e 's/[^ ]\+//'

You can do that sort of substitution in Awk but it's not a natural use
of the tool.
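One portability note on the sed command, offered as a sketch rather than a correction: \+ in a basic regular expression is a GNU extension, so a sed without it may need the longer spelling below. The sample input and the variable names are assumptions for illustration.

```shell
# Sample record with runs of two spaces; the exact spacing is an assumption.
line='  a  n'

# [^ ][^ ]* means "one or more non-spaces" in a basic regular expression
# and avoids \+, which is a GNU extension.
result=$(printf '%s\n' "$line" | sed -e 's/[^ ][^ ]*//')

# Brackets make the preserved blanks visible.
printf '[%s]\n' "$result"
```

Only the first run of non-space characters is removed; the blanks on both sides of it stay where they were.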

--
Ben.

Janis Papanagnou

Jan 21, 2016, 11:14:50 AM

On 21.01.2016 16:34, Ben Bacarisse wrote:
> Hongyi Zhao <hongy...@gmail.com> writes:
>
>> See the following tests:
>>
>> $ echo " a n" | awk '{$1="";print}'
>> n
>> $ echo " a n" | awk '{sub(" ","");$1="";print}'
>> n
>>
>> It seems awk trims some of the whitespace from the input. How can I
>> stop awk from dropping whitespace when it prints lines, i.e., leave it
>> exactly as it was in the original input?

Use a regexp in sub(). E.g. (depending on what your fields may contain)...

$ echo " a n" | awk 'sub(/[[:space:]]*[[:alnum:]]+/,"")'
n

Or in the simpler case of optional leading blanks and a sequence of
non-blanks:

$ echo " a n" | awk 'sub(/ *[^ ]+/,"")'
n
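A point worth keeping in mind with the pattern-only form: sub() returns the number of substitutions made, so a line on which nothing matches (an empty line, say) is not printed at all. The sketch below compares it with an explicit action; the three sample lines and the variable names are assumptions for illustration.

```shell
# Pattern form: a line is printed only when sub() made a substitution,
# so the empty second line is dropped.
pattern_form=$(printf '%s\n' '  a  n' '' 'x  y' |
    awk 'sub(/ *[^ ]+/, "")')

# Action form: every line is printed, whether or not anything was removed.
action_form=$(printf '%s\n' '  a  n' '' 'x  y' |
    awk '{ sub(/ *[^ ]+/, ""); print }')

printf 'pattern form:\n%s\n' "$pattern_form"
printf 'action form:\n%s\n' "$action_form"
```

Which of the two is wanted depends on whether lines without a first field should survive in the output.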

Janis

> [...]

Kenny McCormack

Jan 21, 2016, 11:23:25 AM

In article <87io2ny...@bsb.me.uk>,
Ben Bacarisse <ben.u...@bsb.me.uk> wrote:
...

>The simplest way to remove the first space-delimited field from every
>line, without altering the spaces, is to use sed:
>
> echo " a n" | sed -e 's/[^ ]\+//'
>
>You can do that sort of substitution in Awk but it's not a natural use
>of the tool.

I disagree with that. AWK is a perfectly fine tool for this, and it is
better to just leave sed (and many of the rest of those cute little Unix-y
commands like join, comm, etc) back in the 20th century where it/they
belong.

That said, OP has identified one of the slightly weird bits in AWK, one
that does catch out newcomers.

If, indeed, the OP's goal is to safely remove the first field from the record
and print what remains, he could try either of these approaches (alert: may
be gawk-specific):

# This is reg-exp-proof
{ print substr($0,index($0,$1)+length($1)+1) }

or

# Uses reg-exps
{ print gensub($1,"",1) }

Whether or not you want reg-exps in play is up to you; arguments could
favor either position.
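For what it's worth, the substr() approach uses only index(), length() and substr(), which are standard awk; of the two, only gensub() is a gawk extension. Here is a runnable sketch of the first approach next to a variant without the +1. The sample record and the variable names are assumptions for illustration.

```shell
# Sample record with runs of two spaces; the exact spacing is an assumption.
line='  a  n'

# As posted: start one character past the end of $1, which also drops
# the first separator character that follows the field.
skip_one=$(printf '%s\n' "$line" |
    awk '{ print substr($0, index($0, $1) + length($1) + 1) }')

# Variant without the +1: everything after $1 is kept, separators included.
keep_all=$(printf '%s\n' "$line" |
    awk '{ print substr($0, index($0, $1) + length($1)) }')

# Brackets make the remaining blanks visible.
printf '[%s]\n[%s]\n' "$skip_one" "$keep_all"
```

Both forms discard whatever precedes the first field, so leading blanks are lost either way; the sed and sub() suggestions elsewhere in the thread differ on that point.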

--
People seem to think that Youtube (and Facebook and Twitter and so on) is
(are) some sort of resource created for the public good. Why they delude
themselves into believing this is beyond my ability to comprehend.

Usenet will always be here. Longer than people remember what port 80 was for.

Ed Morton

Jan 21, 2016, 12:07:53 PM

On 1/21/2016 9:01 AM, Hongyi Zhao wrote:
> Hi all,
>
> See the following tests:
>
> $ echo " a n" | awk '{$1="";print}'
> n
> $ echo " a n" | awk '{sub(" ","");$1="";print}'
> n
>
> It seems awk trims some of the whitespace from the input. How can I
> stop awk from dropping whitespace when it prints lines, i.e., leave it
> exactly as it was in the original input?
>
> Regards
>

This is ridiculous. You've been told a dozen times to read the book Effective
awk Programming, 4th Edition, by Arnold Robbins. Just do it and stop cluttering
up Usenet with questions you wouldn't be asking if you just did the tiniest bit
of research yourself.

Ed.

Barry Margolin

Jan 21, 2016, 1:59:22 PM

In article <n7r33q$iou$1...@dont-email.me>,

Ed Morton <morto...@gmail.com> wrote:

> This is ridiculous. You've been told a dozen times to read the book
> Effective awk Programming, 4th Edition, by Arnold Robbins. Just do it
> and stop cluttering up Usenet with questions you wouldn't be asking if
> you just did the tiniest bit of research yourself.

He's long since demonstrated that he's incapable of learning this stuff
on his own. No book is going to solve that problem.

--
Barry Margolin, bar...@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***

Thomas 'PointedEars' Lahn

Jan 30, 2016, 1:21:34 AM

Kenny McCormack wrote:

> If, indeed, the OP's goal is to safely remove the first field from the
> record and print what remains, he could try either of these approaches
> (alert: may be gawk-specific):
> […]

> # Uses reg-exps
> { print gensub($1,"",1) }

Verbatim this is error-prone because $1 can contain regular expression
special characters that are not escaped. That can be fixed:

{
gsub(/[]|*+?{}()[.[.].^$\\-]/, "\\\\&", $1);
print gensub($1, "", 1);
}

(See <https://www.gnu.org/software/gawk/manual/html_node/Gory-Details.html#Gory-Details> for why the more intuitive "\\&" does not suffice
with either GNU awk or POSIX awk.)
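One caveat with escaping $1 in place: a gsub() that changes $1 is an assignment to a field, so $0 gets rebuilt with OFS (the effect Ben described upthread) and then contains the escaped text. A hedged sketch of an alternative follows. It swaps in a different technique: it escapes a copy of the field in a character loop instead of using the "\\\\&" replacement, and it uses the standard sub() instead of gensub(). The function name re_escape, the sample record, and the variable names are hypothetical.

```shell
# Sample record whose first field contains a regexp operator (assumed input).
line='  a+b  n'

# Unescaped: a+b is read as "one or more a, then b", which does not occur
# in the record, so nothing is removed.
naive=$(printf '%s\n' "$line" | awk '{ sub($1, ""); print }')

# Escaped copy: re_escape (a hypothetical helper) puts a backslash in front
# of each ERE special character; $1 itself is never assigned, so $0 is not
# rebuilt and the original blanks survive.
escaped=$(printf '%s\n' "$line" | awk '
function re_escape(s,    i, c, out) {
    for (i = 1; i <= length(s); i++) {
        c = substr(s, i, 1)
        if (index("\\.[()*+?{|^$", c))
            out = out "\\"
        out = out c
    }
    return out
}
{ sub(re_escape($1), ""); print }')

# Brackets make the remaining blanks visible.
printf '[%s]\n[%s]\n' "$naive" "$escaped"
```

The escape list is the set of characters that are special outside bracket expressions in extended regular expressions; whether that is sufficient for a given awk implementation is an assumption worth testing.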

--
PointedEars

Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.

Janis Papanagnou

Jan 30, 2016, 1:30:00 AM

On 30.01.2016 07:21, Thomas 'PointedEars' Lahn wrote:
> Kenny McCormack wrote:
>
>> If, indeed, the OP's goal is to safely remove the first field from the
>> record and print what remains, he could try either of these approaches
>> (alert: may be gawk-specific):
>> […]
>> # Uses reg-exps
>> { print gensub($1,"",1) }
>
> Verbatim this is error-prone because $1 can contain regular expression
> special characters that are not escaped. That can be fixed:

Indeed.

>
> {
> gsub(/[]|*+?{}()[.[.].^$\\-]/, "\\\\&", $1);
> print gensub($1, "", 1);
> }

But then you don't need the two (partly non-standard) substitutions; one
sub() suffices, as depicted upthread in my response.

Janis

Thomas 'PointedEars' Lahn

Jan 30, 2016, 3:11:32 AM

Janis Papanagnou wrote:

> On 30.01.2016 07:21, Thomas 'PointedEars' Lahn wrote:
>> Kenny McCormack wrote:
>>> If, indeed, the OP's goal is to safely remove the first field from the
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>> record and print what remains, he could try either of these approaches
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

>>> (alert: may be gawk-specific):
>>> […]
>>> # Uses reg-exps
>>> { print gensub($1,"",1) }
>>
>> Verbatim this is error-prone because $1 can contain regular expression
>> special characters that are not escaped. That can be fixed:

> […]

>>
>> {
>> gsub(/[]|*+?{}()[.[.].^$\\-]/, "\\\\&", $1);
>> print gensub($1, "", 1);
>> }
>
> But then you don't need the two (partly non-standard) substitutions; one
> sub() suffices, as depicted upthread in my response.

Your suggestion works well in specific cases. It does not work so well in
general, for which solutions were discussed in this subthread.