Bug#1059873: glibc-doc-reference: 12.14.5 String Input Conversions: The ‘%[’ conversion requires 1 match

Christopher Yeleighton
Jan 2, 2024, 1:40:03 PM

Package: glibc-doc-reference
Version: 2.31-1
Severity: normal

Dear Maintainer,

* What led up to the situation?

I used %s conversion to print a string to a file
and %7[^] conversion to scan it back.

* What exactly did you do (or not do) that was effective (or
ineffective)?

fscanf (tf, "%7[^]", b) == 01

* What was the outcome of this action?

false

* What outcome did you expect instead?

true

-- System Information:
Distributor ID: Bunsenlabs
Description: BunsenLabs GNU/Linux 11 (Beryllium)
Release: 11
Codename: bullseye
Architecture: x86_64

Kernel: Linux 5.10.0-26-amd64 (SMP w/2 CPU threads)
Locale: LANG=pl_PL.UTF-8, LC_CTYPE=pl_PL.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

-- no debconf information

Krzysztof Żelechowski
Jan 3, 2024, 10:30:04 AM

It is not clear whether the format string "%[\0a]" matches the input
string "a" or not.  I interpret the documentation as saying that once
the selector "%[" has been opened, the format interpreter must find a
matching "]", so it must not interpret the embedded "\0" as the end of
the format string. However, using such a format string causes the
compiler to warn "format contains NUL", which suggests that it involves
undefined behaviour.

Aurelien Jarno
Jan 4, 2024, 10:40:04 AM

Hi,

On 2024-01-03 16:13, Krzysztof Żelechowski wrote:
> It is not clear whether the format string "%[\0a]" matches the input string
> "a" or not.  I interpret the documentation as saying that once the selector
> "%[" has been opened, the format interpreter must find a matching "]",
> therefore it must not interpret the embedded "\0" as the end of the format
> string.

According to the C standard, the null character (\0) marks the end of a
string. The string passed to fscanf is therefore simply "%[", which is
invalid.

> However, trying to use such a format string causes the compiler to
> report the warning "format contains NUL", which suggests that using such a
> format string involves undefined behaviour.

As explained above, the compiler is right about that.

Regards
Aurelien

--
Aurelien Jarno GPG: 4096R/1DDD8C9B
aure...@aurel32.net http://aurel32.net

Aurelien Jarno
Jan 4, 2024, 10:40:05 AM

Hi,

On 2024-01-02 19:24, Christopher Yeleighton wrote:
> Package: glibc-doc-reference
> Version: 2.31-1
> Severity: normal
>
> Dear Maintainer,
>
> *** Reporter, please consider answering these questions, where appropriate ***
>
> * What led up to the situation?
>
> I used %s conversion to print a string to a file
> and %7[^] conversion to scan it back.

%7[^] is not a valid format string. Quoting the manual, section 12.14.5:

To read in characters that belong to an arbitrary set of your choice,
use the ‘%[’ conversion. You specify the set between the ‘[’ character
and a following ‘]’ character, using the same syntax used in regular
expressions for explicit sets of characters. As special cases:

- A literal ‘]’ character can be specified as the first character of the set.

[...]

- If a caret character ‘^’ immediately follows the initial ‘[’, then
the set of allowed input characters is everything except the
characters listed.

In your case, you want to match up to 7 characters, but since no
character or list of characters follows the caret, it is not clear what
you want to achieve. If you want to match everything but a ‘]’, you
should use %7[^]].

Regards,

Aurelien Jarno
Jan 4, 2024, 11:30:04 AM

Hi,

On 2024-01-04 17:11, Krzysztof Żelechowski wrote:
> So we have conflicting information here.
> 1. The character '\0' is the end of a string.

This is a basic rule of the C language. It has nothing to do with the
GNU libc.

> 2. The character ']' is the end of this scan set.
> Information (2) is more specific than information (1), and a format string
> is not a normal string, and a scan set is not a substring but a data
> island embedded within a string (for example, the format specifier
> "%[\1-\377]" is valid as a format string but invalid as a string under
> UTF-8 locale).  Therefore it is natural to assume that rule (2) should
> prevail.  Since it apparently does not, this oddity should be documented.

I disagree there. This documentation is about the GNU C library, not a
basic C course.

Aurelien Jarno
Jan 4, 2024, 2:50:04 PM

Hi,

On 2024-01-04 17:59, Krzysztof Żelechowski wrote:
> The fact that the NUL character ends a string data structure is a library
> convention rather than a language feature, except for the "" literal
> syntactic sugar. 

We are talking here about *string* functions of the C standard. In this
context, strings are defined as null-terminated. The beginning of
section 12.14 makes clear that the argument of the fscanf function is
a format *string*.

> But it is not a programming error to have a string
> literal with embedded NULs; in fact, if your narrowing interpretation were
> universally correct, the argz API would not be possible.  You can write

The argz functions do not operate on strings. They operate on *vectors
of strings*. See section 5.15.

> valid programs in C without using this convention, e.g. using BSTR
> everywhere.  You were even supposed to do that when programming in C for
> Apple Macintosh, to the extent that their C compiler used to provide an
> alternative string literal syntax for this purpose as an extension.

Indeed, you are allowed to use your own convention in a library, but a
string is well defined by the C standard in the context of the string
functions.

Aurelien Jarno
Jan 4, 2024, 3:00:04 PM

Hi,

On 2024-01-04 17:38, Krzysztof Żelechowski wrote:
> I was trying to read up to 7 characters, including blanks.  I assumed that
> the specifier "%[^]" would mean any character except an empty set, i.e.
> any character whatsoever.  I can see now that it can also be an incomplete

Your definition of "empty set" is quite vague. What you want is clearly
defined in section 12.14.3, that is, %7s or %7S.

> format specifier excluding the character ']'.  This is my misunderstanding
> but I think the documentation could be improved to prevent such mistakes
> in future.

I don't see how things can be improved further. But if you have a
wording suggestion, please submit one. Otherwise, I'll just close the
bug.

Regards