read Built-in Parameter Behavior -- Null Byte Delimiter

Adam Danischewski

Jan 16, 2016, 5:13:20 PM
to bug-...@gnu.org
It seems the delimiter parameter of the read built-in behaves differently in the NULL case, which is a very useful case. I found this after a difficult-to-track-down bug appeared in some of my code, so I thought I would pass it on to you.

If it is expected behavior I didn't see it in the documents. Most other options seem to follow the getopts model and allow the option argument to be attached with no intervening space.

That seems to work for the read built-in in general, yet not for the NULL case, read -d''.

E.g.
$> touch file_{1..40..3}.txt
$> while IFS= read -r -d'.' a; do echo "got $a"; done < <(find . -type f -print0)
                       has the same results as
$> while IFS= read -r -d '.' a; do echo "got $a"; done < <(find . -type f -print0)
got
got /file_40
got txt
got /file_37
...
got txt
got /file_1

Yet if we look for the null byte:
$> while IFS= read -r -d'' a; do echo "got $a"; done < <(find . -type f -print0)
                       returns nothing
$> while IFS= read -r -d '' a; do echo "got $a"; done < <(find . -type f -print0)
                       returns the expected results
got ./file_40.txt
got ./file_37.txt
got ./file_34.txt
...
got ./file_16.txt


Piotr Grzybowski

Jan 16, 2016, 6:25:07 PM
to Adam Danischewski, bug-bash
hi,

On Sat, Jan 16, 2016 at 8:28 PM, Adam Danischewski
<adam.dan...@gmail.com> wrote:
> ..
> Yet if we look for the null byte:
> $> while IFS= read -r -d'' a; do echo "got $a"; done < <(find . -type f
> -print0)
> returns nothing

this is because it is the same as:

read -r -d a;

I think there is no way to distinguish the two.
The documentation clearly says:

-d delim

so without going into details of how getopt works, it seems to me
that this is not a bug.

> $> while IFS= read -r -d '' a; do echo "got $a"; done < <(find . -type f
> -print0)
> returns the expected results

pg

Martijn Dekker

Jan 16, 2016, 7:02:24 PM
to bug-...@gnu.org, Adam Danischewski
Adam Danischewski schreef op 16-01-16 om 20:28:
> Yet if we look for the null byte:

bash, like most UNIX shells, cannot store or handle null bytes. (zsh is
the only exception I know of.)

> $> while IFS= read -r -d'' a; do echo "got $a"; done < <(find . -type f
> -print0)
> * returns nothing *

This is because of how shell quoting works. The "read" command doesn't
even enter into it. It's all happening before "read" is even executed.

'' does not represent a null byte; it's an empty string, as in zero bytes,
surrounded by single quotes.

Before passing the arguments to the command (such as 'read'), the shell
performs quote removal. This removes the quotes from the empty string.

That leaves the 'read' command with no way to distinguish between

read -r -d'' a

and

read -r -d a

Meaning, you were actually executing "read" with "a" as the delimiter,
and no variable, so your "echo" command naturally returned nothing.
Meanwhile "read" would have stored whatever it got in the variable REPLY
by default.

Separating the quoted empty string from the -d makes it possible to
distinguish it as an empty argument (which is still different from a null byte).

Hope this helps.

- M.
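
The explanation above can be checked directly. The following is a minimal sketch, assuming bash; `show_args` is a hypothetical helper (not part of the thread) that prints the arguments exactly as a command receives them after quote removal. The second half shows where the data goes when `read` is handed "a" as the delimiter and no variable name.

```bash
# Hypothetical helper: print the argument count and each argument
# as the command would receive it after quote removal.
show_args() {
  printf '%d args:' "$#"
  printf ' <%s>' "$@"
  printf '\n'
}

show_args -r -d'' a    # 3 args: <-r> <-d> <a>
show_args -r -d '' a   # 4 args: <-r> <-d> <> <a>

# With -d'' a, read takes "a" as the delimiter and gets no variable name,
# so the data lands in REPLY and the variable a is never set.
unset a REPLY
read -r -d'' a <<< 'banana'
printf 'REPLY=<%s> a=<%s>\n' "$REPLY" "${a-unset}"   # REPLY=<b> a=<unset>
```

The attached form loses the empty argument entirely, while the separated form keeps it as a distinct (empty) word.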


Greg Wooledge

Jan 18, 2016, 8:15:31 AM
to Adam Danischewski, bug-...@gnu.org
On Sat, Jan 16, 2016 at 02:28:27PM -0500, Adam Danischewski wrote:
> If it is expected behavior I didn't see it in the documents.

I suspect the interpretation of -d '' to mean a NUL byte delimiter
may have been an accident originally (but that's a guess). But it's
an incredibly useful feature, and has been used in countless real
life scripts. At this point, while it is still undocumented, it is
nevertheless a feature whose omission would be considered a regression.

Others have correctly described the difference between -d'' and -d ''
so I won't repeat that, except to say that they are right.
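
As an illustration of why the NUL delimiter is so widely used, here is a small sketch, assuming bash and a find that supports -print0. The temporary directory and file names are made up for the example. A file name may contain a newline but never a NUL byte, so only the NUL-delimited loop counts the files correctly.

```bash
# Create two files in a scratch directory; one name contains a newline.
tmpdir=$(mktemp -d)
touch "$tmpdir/plain.txt" "$tmpdir/with"$'\n'"newline.txt"

# NUL-delimited: each file name arrives intact, one per iteration.
files=()
while IFS= read -r -d '' f; do
  files+=("$f")
done < <(find "$tmpdir" -type f -print0)
echo "NUL-delimited: ${#files[@]} files"        # NUL-delimited: 2 files

# Newline-delimited: the embedded newline splits one name into two lines.
lines=$(find "$tmpdir" -type f | wc -l)
echo "newline-delimited: $((lines)) lines"      # newline-delimited: 3 lines

rm -rf "$tmpdir"
```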

Chet Ramey

Jan 18, 2016, 11:16:45 AM
to Greg Wooledge, Adam Danischewski, bug-...@gnu.org, chet....@case.edu
On 1/18/16 8:14 AM, Greg Wooledge wrote:
> On Sat, Jan 16, 2016 at 02:28:27PM -0500, Adam Danischewski wrote:
>> If it is expected behavior I didn't see it in the documents.
>
> I suspect the interpretation of -d '' to mean a NUL byte delimiter
> may have been an accident originally (but that's a guess).

I guess that depends on what you mean by `accident'. There's nothing
special in the code that checks for NUL; it's just another delimiter
and another value returned by read(2). There's no reason it should
not just work.

> But it's
> an incredibly useful feature, and has been used in countless real
> life scripts. At this point, while it is still undocumented, it is
> nevertheless a feature whose omission would be considered a regression.

It's not a special case that needs to be documented as such. It's a
straightforward application of the rules for assigning the delimiter
and reading it. You do have to understand how C strings work and how
they are used when passing arguments to commands for it to make sense.

Chet


--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU ch...@case.edu http://cnswww.cns.cwru.edu/~chet/

Greg Wooledge

Jan 18, 2016, 11:25:57 AM
to Chet Ramey, Adam Danischewski, bug-...@gnu.org
On Mon, Jan 18, 2016 at 11:16:06AM -0500, Chet Ramey wrote:
> On 1/18/16 8:14 AM, Greg Wooledge wrote:
> > I suspect the interpretation of -d '' to mean a NUL byte delimiter
> > may have been an accident originally (but that's a guess).
>
> I guess that depends on what you mean by `accident'.

Happy coincidence, then.

> It's not a special case that needs to be documented as such. It's a
> straightforward application of the rules for assigning the delimiter
> and reading it. You do have to understand how C strings work and how
> they are used when passing arguments to commands for it to make sense.

Other shells must go out of their way to suppress it, then.

wooledg@wooledg:~$ while IFS= read -r -d '' foo; do echo "<$foo>"; done < <(printf 'one\0two\0')
<one>
<two>
wooledg@wooledg:~$ ksh
$ while IFS= read -r -d '' foo; do echo "<$foo>"; done < <(printf 'one\0two\0')
$ while IFS= read -r -d x foo; do echo "<$foo>"; done < <(printf 'onextwox')
<one>
<two>

Chet Ramey

Jan 18, 2016, 1:44:43 PM
to Greg Wooledge, Adam Danischewski, bug-...@gnu.org, chet....@case.edu
On 1/18/16 11:25 AM, Greg Wooledge wrote:
> On Mon, Jan 18, 2016 at 11:16:06AM -0500, Chet Ramey wrote:
>> On 1/18/16 8:14 AM, Greg Wooledge wrote:
>>> I suspect the interpretation of -d '' to mean a NUL byte delimiter
>>> may have been an accident originally (but that's a guess).
>>
>> I guess that depends on what you mean by `accident'.
>
> Happy coincidence, then.

Sure.

>
>> It's not a special case that needs to be documented as such. It's a
>> straightforward application of the rules for assigning the delimiter
>> and reading it. You do have to understand how C strings work and how
>> they are used when passing arguments to commands for it to make sense.
>
> Other shells must go out of their way to suppress it, then.

Most of the other shells remove NUL bytes from `read's input. They
probably do this before checking the delimiter.

Stephane Chazelas

Jan 18, 2016, 5:00:06 PM
to bug-...@gnu.org
2016-01-18 11:16:06 -0500, Chet Ramey:
[...]
> > But it's
> > an incredibly useful feature, and has been used in countless real
> > life scripts. At this point, while it is still undocumented, it is
> > nevertheless a feature whose omission would be considered a regression.
>
> It's not a special case that needs to be documented as such. It's a
> straightforward application of the rules for assigning the delimiter
> and reading it. You do have to understand how C strings work and how
> they are used when passing arguments to commands for it to make sense.
[...]

You should not have to understand C strings to be able to use a
shell.

C strings are *encoded* with a NUL delimiter. Strictly
speaking, that NUL is not part of the line's content. "read -d
something" sets the delimiter to the first character of
"something"; with read -d "", there's no character in that
string.

I think there's no way anyone that has no notion of C could
guess that

read -d '' var

reads until the first NUL character.

A more intuitive reading of that would be that it disables
delimiters altogether (reads until end of file).

I'd agree it should be documented and that it's a useful feature.
And I see the same feature is coming to readarray/mapfile in 4.4,
which is welcome as well.
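
For reference, a minimal sketch of the mapfile form, assuming bash 4.4 or later (earlier versions do not accept -d for mapfile/readarray). The array name `items` is chosen only for this example.

```bash
# Read NUL-delimited records into an array; -t strips the delimiter
# from each element. Requires bash 4.4 or later.
mapfile -d '' -t items < <(printf 'one\0two\0')
printf '<%s>\n' "${items[@]}"
# <one>
# <two>
```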

Also, we see people doing:

read -d $'\0' var

That is actually doing what it seems to be saying on the can
(read until a $'\0'), except that what it says on the can is
wrong, since

read -d $'\0' var

actually calls "read" with ("read", "-d", "", "var") arguments.

It may be worth mentioning that command line arguments, here
documents and variables in bash don't support the NUL character
(and how it behaves in various contexts).

--
Stephane
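
The point about $'\0' can be verified with a short sketch, assuming bash; `show_args` is a hypothetical helper that prints the arguments a command actually receives. Because bash words cannot contain a NUL byte, $'\0' collapses to the empty string, and anything after an embedded \0 is dropped.

```bash
# Hypothetical helper: show the arguments as the command receives them.
show_args() {
  printf '%d args:' "$#"
  printf ' <%s>' "$@"
  printf '\n'
}

show_args -d $'\0' var   # 3 args: <-d> <> <var>
show_args -d '' var      # 3 args: <-d> <> <var>

# An embedded \0 truncates the word: only "a" survives.
s=$'a\0b'
printf '%d\n' "${#s}"    # 1
```

So in bash the two spellings are interchangeable, and `-d ''` is simply the more honest one.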


Stephane Chazelas

Jan 18, 2016, 5:00:07 PM
to bug-...@gnu.org
2016-01-18 11:25:49 -0500, Greg Wooledge:
[...]
> Other shells must go out of their way to suppress it, then.
>
> wooledg@wooledg:~$ while IFS= read -r -d '' foo; do echo "<$foo>"; done < <(printf 'one\0two\0')
> <one>
> <two>
> wooledg@wooledg:~$ ksh
> $ while IFS= read -r -d '' foo; do echo "<$foo>"; done < <(printf 'one\0two\0')
> $ while IFS= read -r -d x foo; do echo "<$foo>"; done < <(printf 'onextwox')
> <one>
> <two>
[...]

zsh behaves like bash here (with respect to '', not in corner cases like:
$ echo 'aéb' | bash -c "read -d $'\xc3' a; echo \$a"
a
$ echo 'aéb' | zsh -c "read -d $'\xc3' a; echo \$a"
aéb
)

Note that zsh supports passing NUL characters to its builtins, so

read -d $'\0' var

does do what it says on the tin there.

ksh93's one seems to be broken with multi-byte characters:

~$ echo 'aéb' | ksh -c 'read -d é a; echo $a'
aéb
~$ echo 'aéb' | ksh -c "read -d $'\xc3' a; echo \$a"
zsh: done echo 'aéb' |
zsh: segmentation fault ksh -c "read -d $'\xc3' a; echo $a"

--
Stephane


Adam Danischewski

Jan 19, 2016, 11:39:14 AM
to Chester Ramey, Greg Wooledge, bug-...@gnu.org
>> Other shells must go out of their way to suppress it, then.
>
> Most of the other shells remove NUL bytes from `read's input. They
> probably do this before checking the delimiter.

Bash also removes the single quotes before it hits read when the single quotes are attached to the delimiter option (-d'').

I know it's possible not to think of it as a bug, but it can cause very difficult-to-find errors, because the other behavior (putting the quoted argument next to the option) works when the option value is not null. A null delimiter option for read is common and useful, so I think it's something worth addressing.

Perhaps it would be possible to put a filter on the Bash interpreter to check for the case of empty single quotes next to an option and handle it separately as a special case?

E.g. my.bsh

#!/bin/bash
echo "$#"
while getopts d: OPTION "$@"; do
 case "$OPTION" in
   d)
     echo "the optarg is ${OPTARG##*=}, optind is ${OPTIND}"
     [[ -z "${OPTARG}" ]] && echo "Let's set the null byte as the delim."
    ;;
 esac
done
exit 0

$> ./my.bsh -d'' arg1 
## Incorrectly interprets arg1 as the option argument.
2
the optarg is arg1, optind is 3

## If Bash instead would do something like this to preserve the presence of the null argument I think it would alleviate a potential source of bugs:

$> eval $(echo "./my.bsh -d'' arg1" | sed "s/-d''/-d ''/g")
3
the optarg is , optind is 3
Let's set the null byte as the delim.


+AMD 

Greg Wooledge

Jan 19, 2016, 11:50:01 AM
to Adam Danischewski, bug-...@gnu.org
On Tue, Jan 19, 2016 at 11:39:07AM -0500, Adam Danischewski wrote:
> Bash also removes the single quotes before it hits read when the single
> quotes are attached to the delimiter option (-d'').

And in EVERY OTHER COMMAND. This is how quotes work. This is utterly
fundamental to bash and the entire Bourne shell syntax.

echo ""'a'''$''""''

Is *exactly* the same as

echo a

Echo has no way to know that you typed a and not "a" or a"" or 'a' or ....

Adam Danischewski

Jan 19, 2016, 12:09:59 PM
to Greg Wooledge, bug-...@gnu.org
Right, the programs don't see those empty strings because the shell is stripping them when it parses the line. But for options, for the shell parser to strip away the empty string presumes that the user made a mistake and has needlessly/erroneously provided a useless empty string.

In the case for read -d'' very probably the user has attempted to supply the null delimiter, but read quietly takes the next argument. I think it makes sense to bump empty strings away from the option when they occur after options in the Bash interpreter parsing of the commandline.
-d''  --> -d ''

Greg Wooledge

Jan 19, 2016, 12:13:25 PM
to Adam Danischewski, bug-...@gnu.org
On Tue, Jan 19, 2016 at 12:09:54PM -0500, Adam Danischewski wrote:
> In the case for read -d'' very probably the user has attempted to supply
> the null delimiter, but read quietly takes the next argument. I think it
> makes sense to bump empty strings away from the option when they occur
> after options in the Bash interpreter parsing of the commandline.
> -d'' --> -d ''

I strongly disagree. Making special exceptions in an already horrendously
complex and obtuse syntax is not progress.
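
One way to sidestep the pitfall without any change to the shell's parsing is to write the option once, correctly, inside a function. This is only a sketch, assuming bash; the name `read0` is hypothetical and was not proposed by anyone in the thread.

```bash
# Hypothetical wrapper: always passes a separate empty argument to -d,
# so callers cannot accidentally write -d''.
read0() {
  IFS= read -r -d '' "$@"
}

names=()
while read0 name; do
  names+=("$name")
done < <(printf 'one\0two words\0')
printf '<%s>\n' "${names[@]}"
# <one>
# <two words>
```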

Reuti

Jan 25, 2016, 1:10:12 PM
to Greg Wooledge, Adam Danischewski, bug-...@gnu.org
Hi,

> Am 19.01.2016 um 17:49 schrieb Greg Wooledge <woo...@eeg.ccf.org>:
>
> On Tue, Jan 19, 2016 at 11:39:07AM -0500, Adam Danischewski wrote:
>> Bash also removes the single quotes before it hits read when the single
>> quotes are attached to the delimiter option (-d'').
>
> And in EVERY OTHER COMMAND. This is how quotes work. This is utterly
> fundamental to bash and the entire Bourne shell syntax.
>
> echo ""'a'''$''""''
>
> Is *exactly* the same as
>
> echo a
>
> Echo has no way to know that you typed a and not "a" or a"" or 'a' or ....

Sometimes I miss a feature in Bash to get access to the plain command line the user typed, including all quotes and other redirections - just as someone would press the cursor up key on the command line to get the last line. E.g. like an environment variable INVOCATION_LINE which I can check in a started script.

The reason for this is to remember the exact command the user typed to submit a job to the queuing systems in our clusters (there is a field for comments in the queuing system, and the users requested a feature to look up what they typed initially at a later point in time, i.e. when the job has finished, for exactly this particular job - so the bash history is no option here).

Sure, I can take the command line and enclose all options in quotes and put this in the comment field. But this won't work if pathname expansion, process substitution, input/output redirection, or quoting in arguments was involved.

-- Reuti

Greg Wooledge

Jan 25, 2016, 1:52:44 PM
to Reuti, bug-...@gnu.org
On Mon, Jan 25, 2016 at 07:09:27PM +0100, Reuti wrote:
> Sometimes I miss a feature in Bash to get access to the plain command line
> the user typed, including all quotes and other redirections

There is a way to do this, but it is not a path for the sane.

http://www.chiark.greenend.org.uk/~sgtatham/aliases.html

Reuti

Jan 30, 2016, 5:24:11 PM
to Greg Wooledge, bug-...@gnu.org
Greg,
Thanks a bunch. This hack was exactly what I was looking for - looks like I'm not the first one to need it. Inside the function I had to use () to get a subshell, so that "str" is not left set after the issued command, but this way I now get the exact command line in the job's comment.

-- Reuti