problem using cut with backslash delimiter


Cary

Jan 10, 2013, 6:47:23 AM
Here is a sample of a file, example.txt, containing a few lines of data:

files data=foo {} rel\path\to\01-song_name.ogg
files data=bar {} rel\path\to\02-song_name.ogg
files data=bas {} rel\path\to\03-song_name.ogg

EOF


When invoking cut to process the file its output is what I would expect.

#>>cut -d\\ -f4 example.txt

01-song_name.ogg
02-song_name.ogg
03-song_name.ogg

#>>


But when I use cut in a pipe with output from read cut seems
to behave more like a sed substitution. It removes the backslashes from
the line.

#>> while read; do echo $REPLY | cut -d\\ -f4; done <example.txt

files data=foo {} relpathto01-song_name.ogg
files data=bar {} relpathto02-song_name.ogg
files data=bas {} relpathto03-song_name.ogg

#>>


Using cut in the same kind of pipe, but with different
arguments, such as:
#>> while read; do echo $REPLY | cut -d= -f1; done <example.txt
shows the expected results: "files data" is output for each line.

I would be glad to learn what the mistake is that I am making here.
I was trying to write a script using read and cut to extract values from
a similar file when I encountered the issue.

I am using cut version 8.19 (GNU).

Marc Girod

Jan 10, 2013, 7:27:10 AM
On Jan 10, 11:47 am, Cary <notva...@nmail.com> wrote:

> I would be glad to learn what the mistake is that I am making here.

Your problem is not with cut, but with echo.
It is where your backslashes get evaled.

Marc

Marc Girod

Jan 10, 2013, 7:33:01 AM
On Jan 10, 12:27 pm, Marc Girod <marc.gi...@gmail.com> wrote:

> It is where your backslashes get evaled.

$ perl -ple 's%\\%\\\\%g' example.txt | \
while read; do echo $REPLY | cut -d\\ -f4; done

Marc

Dave Gibson

Jan 10, 2013, 7:59:22 AM
Cary <notv...@nmail.com> wrote:
> Here is a sample of a file, example.txt, containing a few lines of data:
>
> files data=foo {} rel\path\to\01-song_name.ogg
> files data=bar {} rel\path\to\02-song_name.ogg
> files data=bas {} rel\path\to\03-song_name.ogg
>
> EOF
>
>
> When invoking cut to process the file its output is what I would expect.
>
> #>>cut -d\\ -f4 example.txt
>
> 01-song_name.ogg
> 02-song_name.ogg
> 03-song_name.ogg
>
> #>>
>
>
> But when I use cut in a pipe with output from read cut seems
> to behave more like a sed substitution. It removes the backslashes from
> the line.
>
> #>> while read; do echo $REPLY | cut -d\\ -f4; done <example.txt

read interprets backslashes. Use read -r.

while read -r ; do echo "$REPLY" | cut -d\\ -f4 ; done < example.txt

Ed Morton

Jan 10, 2013, 9:36:41 AM
That would still fail, as the read without a -r argument will interpret the
backslashes, and without setting IFS= leading and trailing spaces will be
stripped off, which may or may not be desirable, and the unquoted echo argument
will be expanded by the shell.

Try this instead:

while IFS= read -r REPLY; do printf "%s\n" "$REPLY" | cut -d\\ -f4; done

Ed.

Lew Pitcher

Jan 10, 2013, 11:08:33 AM
On Thursday 10 January 2013 06:47, in comp.unix.shell, notv...@nmail.com
wrote:

> Here is a sample of a file, example.txt, containing a few lines of data:
>
> files data=foo {} rel\path\to\01-song_name.ogg
> files data=bar {} rel\path\to\02-song_name.ogg
> files data=bas {} rel\path\to\03-song_name.ogg
>
> EOF
>
>
> When invoking cut to process the file its output is what I would expect.
>
> #>>cut -d\\ -f4 example.txt
>
> 01-song_name.ogg
> 02-song_name.ogg
> 03-song_name.ogg
>
> #>>
>
>
> But when I use cut in a pipe with output from read cut seems
> to behave more like a sed substitution. It removes the backslashes from
> the line.
>
> #>> while read; do echo $REPLY | cut -d\\ -f4; done <example.txt
>
> files data=foo {} relpathto01-song_name.ogg
> files data=bar {} relpathto02-song_name.ogg
> files data=bas {} relpathto03-song_name.ogg
>
> #>>

The problem isn't with cut(1), it is with your /echo $REPLY/

Because the echo parses an unquoted argument, the shell escape processing
takes effect, effectively removing the backslashes for you. Thus, the
cut(1) never receives a backslash-delimited string, and cannot "cut" it at
the fourth backslash. Instead, cut(1) presents back to you everything it
got, as it got it.

To see this in action, try these two tests:

1) echo without quotes (permits shell parsing of echo arguments)
~ $ echo files data=foo {} rel\path\to\01-song_name.ogg | cut -d\\ -f4
files data=foo {} relpathto01-song_name.ogg

Notice that we get the results you already see.

2) echo with single quotes to disable shell parsing of echo arguments
~ $ echo 'files data=foo {} rel\path\to\01-song_name.ogg' | cut -d\\ -f4
01-song_name.ogg

Notice that we get your expected results.

To fix your problem, restructure your script to eliminate the shell parsing
of the text prior to input into cut(1).

HTH
--
Lew Pitcher
"In Skills, We Trust"

Cary

Jan 10, 2013, 12:41:18 PM
On Thu, 10 Jan 2013 11:08:33 -0500, Lew Pitcher wrote:

> The problem isn't with cut(1), it is with your /echo $REPLY/

I thank all who responded.

Yes, I was only looking at cut. Just adding the -r switch to read
allowed cut to receive the desired input. My little script worked
after that change.

Cary

Dave Gibson

Jan 10, 2013, 2:24:58 PM
Lew Pitcher <lpit...@teksavvy.com> wrote:
> On Thursday 10 January 2013 06:47, in comp.unix.shell, notv...@nmail.com
> wrote:
>
>> Here is a sample of a file, example.txt, containing a few lines of data:
>>
>> files data=foo {} rel\path\to\01-song_name.ogg
>> files data=bar {} rel\path\to\02-song_name.ogg
>> files data=bas {} rel\path\to\03-song_name.ogg
>>
>> EOF

>> #>> while read; do echo $REPLY | cut -d\\ -f4; done <example.txt
>>
>> files data=foo {} relpathto01-song_name.ogg
>> files data=bar {} relpathto02-song_name.ogg
>> files data=bas {} relpathto03-song_name.ogg
>>
>> #>>
>
> The problem isn't with cut(1), it is with your /echo $REPLY/

The backslashes are being removed during input (read).

$ read REPLY <<< 'a\b\c\d\e\f\g'
$ echo $REPLY
abcdefg

$ read -r REPLY <<< 'a\b\c\d\e\f\g'
$ echo $REPLY
a\b\c\d\e\f\g
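
A minimal sketch of the same comparison with cut added at the end, for anyone who wants to check it. It uses printf rather than echo and avoids here-strings, so it should also run under a plain POSIX sh. The variable names (line, plain, raw and so on) are only illustrative and are not from the original script.

```shell
# One line in the format of example.txt; single quotes keep the backslashes literal.
line='files data=foo {} rel\path\to\01-song_name.ogg'

# read without -r treats each backslash as an escape character and drops it.
plain=$(printf '%s\n' "$line" | { IFS= read v; printf '%s\n' "$v"; })

# read -r leaves the backslashes alone.
raw=$(printf '%s\n' "$line" | { IFS= read -r v; printf '%s\n' "$v"; })

# cut prints a line unchanged when it contains no delimiter at all,
# which is why the first result looks like a substitution.
plain_field=$(printf '%s\n' "$plain" | cut -d\\ -f4)
raw_field=$(printf '%s\n' "$raw" | cut -d\\ -f4)

printf '%s\n' "$plain_field" "$raw_field"
```

The first line printed is the whole input with the backslashes gone; the second is just 01-song_name.ogg.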

Cary

Jan 10, 2013, 6:19:47 PM
Thank you again.

Dave, your suggestion to add the optional argument to
read was just what I had missed.
In the script I was trying to set two variables from each
line of data and then run a cp(1) command for every iteration using
the two variables as filename arguments.
To set the first variable I used cut(1) once as a filter. For the
second variable I called cut again two more times in a pipe. What
confused me was that the first and the third filters gave me most
of the output I expected.

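#!/bin/bash

while read -r; do
    # 4th "-delimited field: the target name, e.g. D_E1M1
    second_name="$(echo "$REPLY" | cut -d\" -f4)"
    # 4th \-delimited field, up to the closing quote: the source file name
    first_name="$(echo "$REPLY" | cut -d\\ -f4 | cut -d\" -f1)"

    echo "cp $first_name $second_name"
    # cp "$first_name" "$second_name"
done <example2.txt
#EOF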


bash-4.2$ ./my_script.sh
cp 01-song_title.ogg D_E1M1
cp 02-song_title.ogg D_E1M2
cp 03-song_title.ogg D_E1M3
cp 04-song_title.ogg D_E1M4
bash-4.2$



example2.txt [4 line file]:

Music { ID = "e1m1"; foo = "D_E1M1"; Ext = "rel\path\to\01-song_title.ogg"; }
Music { ID = "e1m2"; bar = "D_E1M2"; Ext = "rel\path\to\02-song_title.ogg"; }
Music { ID = "e1m3"; bas = "D_E1M3"; Ext = "rel\path\to\03-song_title.ogg"; }
Music { ID = "e1m4"; foo = "D_E1M4"; Ext = "rel\path\to\04-song_title.ogg"; }


Output of the same script without the -r switch:

bash-4.2$ ./my_script.sh
cp Music { ID = D_E1M1
cp Music { ID = D_E1M2
cp Music { ID = D_E1M3
cp Music { ID = D_E1M4
bash-4.2$

Dave Gibson

Jan 11, 2013, 2:51:59 PM
Cary <notv...@nmail.com> wrote:

[snip: read with and without -r]

> In the script I was trying to set two variables from each
> line of data and then run a cp(1) command for every iteration using
> the two variables as filename arguments.
> To set the first variable I used cut(1) once as a filter. For the
> second variable I called cut again two more times in a pipe. What
> confused me was that the first and the third filters gave me most
> of the output I expected.
>
> #!/bin/bash
>
> while read -r; do
>
> second_name="$(echo "$REPLY" | cut -d\" -f4)"
> first_name="$(echo "$REPLY" | cut -d\\ -f4 | cut -d\" -f1)"
>
> echo "cp $first_name $second_name"
> # cp "$first_name" "$second_name"
>
> done <example2.txt
> #EOF

> example2.txt [4 line file]:
>
> Music { ID = "e1m1"; foo = "D_E1M1"; Ext = "rel\path\to\01-song_title.ogg"; }
> Music { ID = "e1m2"; bar = "D_E1M2"; Ext = "rel\path\to\02-song_title.ogg"; }
> Music { ID = "e1m3"; bas = "D_E1M3"; Ext = "rel\path\to\03-song_title.ogg"; }
> Music { ID = "e1m4"; foo = "D_E1M4"; Ext = "rel\path\to\04-song_title.ogg"; }

It's possible, by splitting on both " and \, to have the input tokenised
by read:

while IFS='"\' read -r j j j tgt j j j j src j ; do
    echo cp -- "$src" "$tgt"
done < example2.txt
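
To make the field positions easier to see, here is a small sketch that splits one line of example2.txt the same way but gives every field its own variable. The names f1 to f9 and rest are hypothetical stand-ins for the throwaway j above, and the here-document is used only so the snippet needs no input file.

```shell
# One line in the format of example2.txt; single quotes keep it literal.
sample='Music { ID = "e1m1"; foo = "D_E1M1"; Ext = "rel\path\to\01-song_title.ogg"; }'

# Split on double quote and backslash; -r stops read from consuming the backslashes first.
IFS='"\' read -r f1 f2 f3 f4 f5 f6 f7 f8 f9 rest <<EOF
$sample
EOF

# f4 is the second quoted value (the target name),
# f9 is the last backslash-separated path component (the source file).
printf 'tgt=%s\n' "$f4"
printf 'src=%s\n' "$f9"
```

Fields 6 to 8 hold rel, path and to, which is why four placeholders sit between tgt and src in the loop above.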

Marc Girod

Jan 11, 2013, 4:50:50 PM
On Jan 10, 2:36 pm, Ed Morton <mortons...@gmail.com> wrote:

> That would still fail

Apart that it does not.
I take both IFS= and read -r as valid points, but it is another story.

Marc

Cary

Jan 12, 2013, 3:58:29 AM
On Thu, 10 Jan 2013 08:36:41 -0600, Ed Morton wrote:

> On 1/10/2013 6:33 AM, Marc Girod wrote:
>> On Jan 10, 12:27 pm, Marc Girod <marc.gi...@gmail.com> wrote:
>>
>>> It is where your backslashes get evaled.
>>
>> $ perl -ple 's%\\%\\\\%g' example.txt | \
>> while read; do echo $REPLY | cut -d\\ -f4; done
>>
>
> without setting IFS= leading and trailing spaces will be
> stripped off which may or may not be desirable and the unquoted echo argument
> will be expanded by the shell.
>
> Try this instead:
>
> while IFS= read -r REPLY; do printf "%s\n" "$REPLY" | cut -d\\ -f4; done
>
> Ed.



Does the IFS= have the effect of unsetting the
default "space-tab-newline" value of IFS? And how would that
change what is passed to read?

Ben Bacarisse

Jan 12, 2013, 7:38:16 AM
Cary <notv...@nmail.com> writes:

> On Thu, 10 Jan 2013 08:36:41 -0600, Ed Morton wrote:
>
>> On 1/10/2013 6:33 AM, Marc Girod wrote:
>>> On Jan 10, 12:27 pm, Marc Girod <marc.gi...@gmail.com> wrote:
>>>
>>>> It is where your backslashes get evaled.
>>>
>>> $ perl -ple 's%\\%\\\\%g' example.txt | \
>>> while read; do echo $REPLY | cut -d\\ -f4; done
>>
>> without setting IFS= leading and trailing spaces will be stripped off
>> which may or may not be desirable and the unquoted echo argument will
>> be expanded by the shell.
>>
>> Try this instead:
>>
>> while IFS= read -r REPLY; do printf "%s\n" "$REPLY" | cut -d\\ -f4; done
>
> Does the IFS= have the effect of unsetting the
> default "space-tab-newline" value of IFS?

Yes, but only for the read command.

> And how would that change what is passed to read?

Ed Morton explained that: "without setting IFS= leading and trailing
spaces will be stripped off". 'IFS= read -r var' is the way to read a
line unaltered.

--
Ben.

Marc Girod

Jan 12, 2013, 9:53:49 AM
On Jan 11, 9:50 pm, Marc Girod <marc.gi...@gmail.com> wrote:

> Apart that it does not.

I mean, should I report it as a bug to Cygwin 1.7.16(0.262/5/3),
bash 4.1.10(4)?

Marc

Marc Girod

Jan 12, 2013, 10:00:18 AM
On Jan 12, 2:53 pm, Marc Girod <marc.gi...@gmail.com> wrote:

> I mean, should I report it as a bug [...]?

Of course not.

tmp> perl -ple 's%\\%\\\\%g' example.txt | \
while read REPLY; do echo $REPLY; done
files data=foo {} rel\path\to\01-song_name.ogg
files data=bar {} rel\path\to\02-song_name.ogg
files data=bas {} rel\path\to\03-song_name.ogg
tmp> perl -ple 's%\\%\\\\%g' example.txt | \
while read -r REPLY; do echo $REPLY; done
files data=foo {} rel\\path\\to\\01-song_name.ogg
files data=bar {} rel\\path\\to\\02-song_name.ogg
files data=bas {} rel\\path\\to\\03-song_name.ogg

Marc

Dave Gibson

Jan 12, 2013, 10:03:32 AM
Cary <notv...@nmail.com> wrote:
> On Thu, 10 Jan 2013 08:36:41 -0600, Ed Morton wrote:
>
>> On 1/10/2013 6:33 AM, Marc Girod wrote:
>>> On Jan 10, 12:27 pm, Marc Girod <marc.gi...@gmail.com> wrote:
>>>
>>>> It is where your backslashes get evaled.
>>>
>>> $ perl -ple 's%\\%\\\\%g' example.txt | \
>>> while read; do echo $REPLY | cut -d\\ -f4; done
>>>
>>
>> without setting IFS= leading and trailing spaces will be
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> stripped off which may or may not be desirable and the unquoted
   ^^^^^^^^^^^^
>> echo argument will be expanded by the shell.
>>
>> Try this instead:
>>
>> while IFS= read -r REPLY; do printf "%s\n" "$REPLY" | cut -d\\ -f4; done
>>
>
> Does the IFS= have the effect of unsetting the
> default "space-tab-newline" value of IFS?

It sets IFS to an empty string (equivalent to IFS="").

> And how would that change what is passed to read?

It changes what read does to its input (word splitting).

Note what happens to the leading and trailing spaces:

$ read a <<< ' one two '
$ echo ".$a."
.one two.

$ IFS= read a <<< ' one two '
$ echo ".$a."
. one two .

Cary

Jan 12, 2013, 2:33:16 PM
It bears repeating for me, then.
Is it accurate to say that the value of
$IFS is null in that case?

Cary

Jan 12, 2013, 2:41:07 PM
Noted, thanks.

Ben Bacarisse

Jan 12, 2013, 2:49:42 PM
Ah, you were asking what IFS= does to IFS, not what the effect of the
setting is on the read command. Yes, IFS= has the effect of making IFS
null.

--
Ben.

Cary

Jan 12, 2013, 3:16:16 PM
The internal variable is reset before read is forked. When read exits
the script continues with the default setting for $IFS.

Jon LaBadie

Jan 12, 2013, 5:06:50 PM
One minor adjustment, "the script continues with the previous setting for
IFS". It may not have been default before the read.

Ben Bacarisse

Jan 12, 2013, 5:31:53 PM
read is not forked, because it's a built-in shell command, but yes, the
effect of the setting lasts only for the duration of the read command.
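
A short sketch that shows the scoping, assuming IFS holds its usual space-tab-newline value when the snippet starts. The names padded, saved_ifs, trimmed, kept and ifs_state are made up for the illustration.

```shell
# A value with leading and trailing spaces.
padded='  one two  '
saved_ifs=$IFS

# With the current IFS, read strips the surrounding whitespace.
read -r trimmed <<EOF
$padded
EOF

# With IFS empty for this one command, the line is kept as it is.
IFS= read -r kept <<EOF
$padded
EOF

# The assignment applied only to that read; IFS still has its previous value.
if [ "$IFS" = "$saved_ifs" ]; then ifs_state=unchanged; else ifs_state=changed; fi

printf '[%s]\n[%s]\nIFS %s\n' "$trimmed" "$kept" "$ifs_state"
```

This prints [one two], then [  one two  ], then "IFS unchanged".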

--
Ben.