
Replace all non-printable/control characters with plain space except keeping the `\n' as they are.


hongy...@gmail.com

Mar 20, 2023, 11:00:32 AM
I want to replace all the non-printable/control characters with plain space except keeping the `\n' as they are in the following string:

```
str:="""#g2 % point group to the space group of group
3 % generator
0 -1 0
0 0 -1
-1 0 0
3 /8 % generator
-30 58 -30
-33 55 -25
-25 55 -33
% order of the group unknown""";
```

Is there a convenient way to do this?

Regards,
Zhao

Janis Papanagnou

Mar 20, 2023, 12:33:06 PM
On 20.03.2023 16:00, hongy...@gmail.com wrote:
> I want to replace all the non-printable/control characters with plain
> space except keeping the `\n' as they are in the following string:

I don't see any control characters in your data below.

>
> ```
> str:="""#g2 % point group to the space group of group
> 3 % generator
> 0 -1 0
> 0 0 -1
> -1 0 0
> 3 /8 % generator
> -30 58 -30
> -33 55 -25
> -25 55 -33
> % order of the group unknown""";
> ```
>
> Is there a convenient way to do this?

Of course. Use variable substitution with patterns containing
the respective character classes; for example

str="..." # any string
printf "%s" "${str//[^[:print:]$'\n']/ }"

to replace all occurrences of non-printable and also not '\n'.
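
A minimal runnable sketch of the substitution above, assuming bash (neither `$'...'` nor `${var//pattern/replacement}` is POSIX sh). The sample string is hypothetical and `LC_ALL=C` is set only so that `[:print:]` does not depend on the reader's locale:

```shell
#!/usr/bin/env bash
# Assumption: bash. LC_ALL=C pins down what [:print:] means.
LC_ALL=C

# Hypothetical sample: two lines containing tab characters.
str=$'3\t% generator\n3\t/8\t\t% generator'

# Replace every character that is neither printable nor a newline with a space.
cleaned=${str//[^[:print:]$'\n']/ }

printf '%s\n' "$cleaned"
```

Each tab becomes one space and the newline between the two lines is left alone, so the two adjacent tabs turn into two spaces.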

Janis

>
> Regards,
> Zhao
>

hongy...@gmail.com

Mar 20, 2023, 7:59:13 PM
On Tuesday, March 21, 2023 at 12:33:06 AM UTC+8, Janis Papanagnou wrote:
> On 20.03.2023 16:00, hongy...@gmail.com wrote:
> > I want to replace all the non-printable/control characters with plain
> > space except keeping the `\n' as they are in the following string:
> I don't see any control characters in your data below.

If I first store the string here into a file named as `strfile' and then check it as follows, you will see them:

werner@X10DAi:~$ cat -A strfile
#g2 % point group to the space group of group$
3^I% generator$
0 -1 0$
0 0 -1$
-1 0 0$
3^I/8^I^I% generator$
-30 58 -30$
-33 55 -25$
-25 55 -33$
% order of the group unknown$


> >
> > ```
> > str:="""#g2 % point group to the space group of group
> > 3 % generator
> > 0 -1 0
> > 0 0 -1
> > -1 0 0
> > 3 /8 % generator
> > -30 58 -30
> > -33 55 -25
> > -25 55 -33
> > % order of the group unknown""";
> > ```
> >
> > Is there a convenient way to do this?
> Of course. Use variable substitution with patterns containing
> the respective character classes; for example
>
> str="..." # any string
> printf "%s" "${str//[^[:print:]$'\n']/ }"
>
> to replace all occurrences of non-printable and also not '\n'.

But your description above is inconsistent with the answer given by ChatGPT:

```
User:
str="..." # any string
printf "%s" "${str//[^[:print:]$'\n']/ }"

ChatGPT:
This code replaces any non-printable characters in the string "str" with a space. The "printf" command is used to print the modified string.

The "${str//[^[:print:]$'\n']/ }" part of the code is a parameter expansion that uses pattern substitution. The pattern "[^[:print:]$'\n']" matches any character that is not a printable character or a newline character. The substitution "/ " replaces each matched character with a space.

Overall, this code is useful for cleaning up strings and removing any non-printable characters that may cause issues in further processing or display.

User:
explain it in more detail

ChatGPT:
Sure, I'd be happy to explain this code in more detail!

The code you provided is a Bash script that uses the `printf` command to print a modified version of a string. The string is stored in the variable `str`, which is defined as a placeholder in the code you provided.

The `${str//pattern/replacement}` syntax is a parameter expansion that performs pattern substitution on the `str` variable. In this case, the pattern is `[^[:print:]$'\n']`, which matches any character that is not a printable character or a newline character. The `[:print:]` character class matches any printable character, which includes letters, numbers, punctuation, and whitespace. The `$'\n'` syntax is a Bash escape sequence that represents a newline character.

The replacement in this pattern substitution is a space character, denoted by the single space between the forward slashes. This means that any non-printable characters or newline characters in the `str` variable will be replaced with a space character.

The `printf` command then prints the modified string using the `%s` format specifier. This format specifier tells `printf` to print the argument as a string. The modified string is passed as an argument to `printf`, using the `${str//pattern/replacement}` syntax to substitute the modified string for the original `str` variable.

Overall, this code is a useful way to clean up strings and remove any non-printable characters or newline characters that may cause issues in further processing or display.
```

Finally, I only want to replace all occurrences of non-printable with one space and also keep '\n' as they are. The following is the desired result when applied on the file whose content is the string discussed here:

werner@X10DAi:~$ sed -e 's/[^[:print:]]/ /g' strfile | cat -A
#g2 % point group to the space group of group$
3 % generator$
0 -1 0$
0 0 -1$
-1 0 0$
3 /8 % generator$
-30 58 -30$
-33 55 -25$
-25 55 -33$
% order of the group unknown$

But my concern here is that if the whole file is represented in a string, I should do the above string operations on this string rather than on a file.

> Janis
>
> >
> > Regards,
> > Zhao
> >

Janis Papanagnou

Mar 20, 2023, 9:28:39 PM
On 21.03.2023 00:59, hongy...@gmail.com wrote:
> On Tuesday, March 21, 2023 at 12:33:06 AM UTC+8, Janis Papanagnou
> wrote:
>> On 20.03.2023 16:00, hongy...@gmail.com wrote:
>>> I want to replace all the non-printable/control characters with
>>> plain space except keeping the `\n' as they are in the following
>>> string:
>> I don't see any control characters in your data below.
>
> If I first store the string here into a file named as `strfile' and
> then check it as follows, you will see them:

No. If _you_ do that *you* will see them. _I_ just see spaces, tabs,
and newlines as the only control characters.

>
> werner@X10DAi:~$ cat -A strfile
> #g2 % point group to the space group of group$
> 3^I% generator$
> 0 -1 0$
> 0 0 -1$
> -1 0 0$
> 3^I/8^I^I% generator$
> -30 58 -30$
> -33 55 -25$
> -25 55 -33$
> % order of the group unknown$
>
>
>>>
>>> ```
>>> str:="""#g2 % point group to the space group of group
>>> 3 % generator
>>> 0 -1 0
>>> 0 0 -1
>>> -1 0 0
>>> 3 /8 % generator
>>> -30 58 -30
>>> -33 55 -25
>>> -25 55 -33
>>> % order of the group unknown""";
>>> ```
>>>
>>> Is there a convenient way to do this?
>> Of course. Use variable substitution with patterns containing the
>> respective character classes; for example
>>
>> str="..." # any string
>> printf "%s" "${str//[^[:print:]$'\n']/ }"
>>
>> to replace all occurrences of non-printable and also not '\n'.
>
> But your description above is inconsistent with the answer given by
> ChatGPT:

I suggest to discuss that with ChatGPT then, if you think there's
more expertise, and if you prefer chatting with that tool instead
of just trying the suggestion on your data.

[ big snip of chat protocol spam ]

>
> Overall, this code is a useful way to clean up strings and remove any
> non-printable characters or newline characters that may cause issues
> in further processing or display. ```

Are you saying that or your chat tool?

The character set [^[:print:]$'\n'] specifies a pattern defined
by the negated ('^') sets comprising printables and newlines.

That is what you said you need. No?

>
> Finally, I only want to replace all occurrences of non-printable with
> one space and also keep '\n' as they are.

Newlines are not touched with the code I presented.

You didn't say in your original post that you want multiple occurrences
"compressed" to a single character replacement.

To replace a run of _multiple_ consecutive control characters with a
_single_ character, adjust your regexp. Depending on what tool (what
shell type, sed, whatever) you want to use, it's one of
    [^[:print:]$'\n']+                     (extended regexp)
    [^[:print:]$'\n'][^[:print:]$'\n']*    (basic regexp)
    +([^[:print:]$'\n'])                   (ksh-style extended shell pattern)
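
As a sketch of how two of these forms might be applied, assuming bash with `extglob` enabled for the `+(...)` shell pattern and a `sed` that accepts `-E` for the `+` regexp form (the sample string is hypothetical):

```shell
#!/usr/bin/env bash
# Assumptions: bash with extglob; sed supporting -E (GNU and BSD sed do).
export LC_ALL=C
shopt -s extglob   # enables the +(...) "one or more" pattern operator

# Hypothetical sample with a run of two consecutive tabs.
str=$'3\t/8\t\t% generator\n0 -1 0'

# Shell pattern form: +(...) matches a whole run, so each run becomes one space.
squeezed=${str//+([^[:print:]$'\n'])/ }
printf '%s\n' "$squeezed"

# Regexp form on a stream: sed works line by line, so the newline
# never has to be excluded from the bracket expression.
from_sed=$(printf '%s\n' "$str" | sed -E 's/[^[:print:]]+/ /g')
printf '%s\n' "$from_sed"
```

Both variants print the same two lines, with the double tab collapsed to a single space.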

> The following is the
> desired result when applied on the file whose content is the string
> discussed here:
>
> werner@X10DAi:~$ sed -e 's/[^[:print:]]/ /g' strfile | cat -A
> #g2 % point group to the space group of group$
> 3 % generator$
> 0 -1 0$
> 0 0 -1$
> -1 0 0$
> 3 /8 % generator$
> -30 58 -30$
> -33 55 -25$
> -25 55 -33$
> % order of the group unknown$
>
> But my concern here is that if the whole file is represented in a
> string, I should do the above string operations on this string
> instead of a file.

That's what the shell's string substitution ${str//.../...} is for.
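
For the file-versus-string concern, one possible sketch, assuming bash: read the file into a variable and apply the same substitution. The temporary file here is a hypothetical stand-in for `strfile`:

```shell
#!/usr/bin/env bash
# Assumption: bash ($(<file) and ${var//pat/rep} are not POSIX sh).
export LC_ALL=C

# Hypothetical stand-in for strfile, containing one tab.
tmpfile=$(mktemp)
printf '%s\n' $'3\t% generator' '0 -1 0' > "$tmpfile"

# Read the whole file into a string.
# Note: command substitution drops trailing newlines.
str=$(<"$tmpfile")

# Same substitution as for any other string; newlines inside are kept.
cleaned=${str//[^[:print:]$'\n']/ }
printf '%s\n' "$cleaned"

rm -f "$tmpfile"
```

The final `printf '%s\n'` adds back the one trailing newline that the command substitution removed.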

In other words, just apply the solution, or go chatting with ChatGPT.

>
>> Janis
>>
>>>
>>> Regards, Zhao
>>>

hongy...@gmail.com

Mar 21, 2023, 2:35:43 AM
In fact, in the analysis of your regex, ChatGPT is indeed correct. However, its final summary is wrong:

The replacement in this pattern substitution is a space character, denoted by the single space between the forward slashes. This means that any non-printable characters or newline characters in the `str` variable will be replaced with a space character.

"any non-printable characters or newline characters" should be "any non-printable characters other than newline characters".

> [ big snip of chat protocol spam ]
> >
> > Overall, this code is a useful way to clean up strings and remove any
> > non-printable characters or newline characters that may cause issues
> > in further processing or display. ```
> Are you saying that or your chat tool?
>
> The character set [^[:print:]$'\n'] specifies a pattern defined
> by the negated ('^') sets comprising printables and newlines.
>
> That is what you said you need. No?

Yes.

> >
> > Finally, I only want to replace all occurrences of non-printable with
> > one space and also keep '\n' as they are.
> Newlines are not touched with the code I presented.
>
> You didn't say in your original post that you want multiple occurrences
> "compressed" to a single character replacement.
>
> To transform _multiple_ consecutive control characters by a _single_
> character adjust your regexp. Depending on what tool (what shell type,
> sed, whatever) you want to use it's either
> [^[:print:]$'\n']+
> [^[:print:]$'\n'][^[:print:]$'\n']*
> +([^[:print:]$'\n'])

They all work as follows, with `grep -E':

werner@X10DAi:~$ grep -E '[^[:print:]$'\n']+' strfile | cat -A
3^I% generator$
3^I/8^I^I% generator$
werner@X10DAi:~$ grep -E '[^[:print:]$'\n'][^[:print:]$'\n']*' strfile | cat -A
3^I% generator$
3^I/8^I^I% generator$
werner@X10DAi:~$ grep -E '[^[:print:]$'\n']+' strfile | cat -A
3^I% generator$
3^I/8^I^I% generator$
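
A caveat on the quoting in these commands: inside single quotes the shell does not interpret `$'\n'`. The first quote of `$'\n'` closes the single-quoted string, the unquoted `\n` becomes a plain `n`, and the last quote opens a new string. Note also that the third pattern from the list, `+(...)`, is shell pattern syntax rather than a regexp, and that the third command above repeats the first. The sketch below, assuming bash and a hypothetical two-line input, shows what `grep` actually receives:

```shell
#!/usr/bin/env bash
export LC_ALL=C

# The argument as typed in the transcript, after shell quote removal:
pattern='[^[:print:]$'\n']+'
printf '%s\n' "$pattern"      # prints: [^[:print:]$n]+

# grep matches one line at a time, so a newline never reaches the pattern
# and does not need to be excluded. Count the lines containing a
# non-printable character in a hypothetical two-line input:
count=$(printf '%s\n' $'3\t% generator' '0 -1 0' | grep -cE '[^[:print:]]+')
printf '%s\n' "$count"
```

The commands in the transcript still select the lines with tabs, but only because a tab is also outside the set that the shell really passed to `grep`.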

> > The following is the
> > desired result when applied on the file whose content is the string
> > discussed here:
> >
> > werner@X10DAi:~$ sed -e 's/[^[:print:]]/ /g' strfile | cat -A
> > #g2 % point group to the space group of group$
> > 3 % generator$
> > 0 -1 0$
> > 0 0 -1$
> > -1 0 0$
> > 3 /8 % generator$
> > -30 58 -30$
> > -33 55 -25$
> > -25 55 -33$
> > % order of the group unknown$
> >
> > But my concern here is that if the whole file is represented in a
> > string, I should do the above string operations on this string
> > instead of a file.
> That's what the shell's string substitution ${str//.../...} is for.

In my example, the more portable usage should be as follows:

werner@X10DAi:~$ sed -Ee 's/[^[:print:]]/ /g' strfile | cat -A
#g2 % point group to the space group of group$
3 % generator$
0 -1 0$
0 0 -1$
-1 0 0$
3 /8 % generator$
-30 58 -30$
-33 55 -25$
-25 55 -33$
% order of the group unknown$

> In other words, just apply the solution, or go chatting with ChatGPT.

Agreed. But isn't it better to combine the advantages of both to a certain extent?

Zhao
> >
> >> Janis
> >>
> >>>
> >>> Regards, Zhao
> >>>

hongy...@gmail.com

Mar 21, 2023, 9:46:43 AM
Another question:

[[:^print:]] and [^[:print:]], can they both be used here?

Janis Papanagnou

Mar 21, 2023, 9:20:52 PM
You should be aware what ChatGPT is, how it basically works. It is not
an all-knowing deity or somesuch technical counterpart. And this group
is not about discussions whether some "conclusion" of some chat tool
is ("by accident") correct or not; for that speak with the developers
and promoters of that tool.

Generally, on your posting habits here there's certainly a lot to say.
In case you appreciate the support you get here I suggest to at least
invest in the quality of your questions and your responses. If you get
hints, e.g., to consult the man page don't post that man page content
to show us that you've read it; that's noise - folks here either know
what there's in the man pages or they know how they can call them. The
chat protocol logs are in that respect also noise (see above). Usually
there's rarely one who cares what their output is if they know how that
information is actually generated. And any bugs in the database of any
tool (whether chatGPT or else) or any buggy generation process is just
OT here.

If you understood that I can try to address your question above...
Make use of their respective advantages. But don't mix responsibilities
(if you now understand what I mean). Use information from these sources
and make up your mind. Compare sources. Experiment. Judge. For yourself,
primarily. If you think you have something precious or worthwhile to
contribute to this group then do it. But don't expect that the insights
you've personally got are of any interest, even if they don't fail the
on-topic test.

Just a few suggestions and things to ponder about. Feel free to ignore
them.

Janis

Janis Papanagnou

Mar 21, 2023, 9:35:40 PM
[ please snip the 140 lines of previous context if all you have is a
simple question ]

On 21.03.2023 14:46, hongy...@gmail.com wrote:
>
> Another question:
>
> [[:^print:]] and [^[:print:]], can they both be used here?

If you mean whether they are interchangeable, then No.
(Where did you get _that idea_ from, from ChatGPT?)
But this you could also have easily tested yourself.

Here's some basics for your convenience...

[...] defines a character set ...
[^...] defines the complement of the character set ...
[:...:] defines a (predefined) character class ...

The latter can take the position of any character in a character set,
say, [:lower:], [:upper:], or [:digit:] may be in a character set as
in (for example)

[[:lower:][:digit:]]

(to match any lowercase character or character representing a digit)
and as

[^a-z0-9]

matches the complement also

[^[:lower:][:digit:]]

matches that complement.
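
A small sketch of the complement forms above, assuming bash and the C locale, where `a-z` and `[:lower:]` cover the same characters (the sample string is hypothetical). For reference, the `[[:^print:]]` spelling is a Perl-style regex extension and is not part of POSIX bracket expressions or shell patterns:

```shell
#!/usr/bin/env bash
# Assumption: bash; LC_ALL=C so that a-z and [:lower:] coincide.
LC_ALL=C

s='abc-DEF 123'   # hypothetical sample

# Complement written with predefined character classes.
with_classes=${s//[^[:lower:][:digit:]]/_}

# The same complement written with explicit ranges.
with_ranges=${s//[^a-z0-9]/_}

printf '%s\n' "$with_classes" "$with_ranges"
```

Both lines of output are `abc_____123`: the hyphen, the three uppercase letters, and the space are each replaced by an underscore.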

This is very basic and you should inspect some contemporary source
describing the Unix'y form of regular expressions and their syntax.

Janis

hongy...@gmail.com

Mar 21, 2023, 10:42:46 PM
On Wednesday, March 22, 2023 at 9:20:52 AM UTC+8, Janis Papanagnou wrote:
> [...]
> Just a few suggestions and things to ponder about. Feel free to ignore
> them.

Thank you very much for your helpful comments and suggestions!

> Janis

Zhao

hongy...@gmail.com

Mar 21, 2023, 10:49:07 PM
On Wednesday, March 22, 2023 at 9:35:40 AM UTC+8, Janis Papanagnou wrote:
> [ please snip the 140 lines of previous context if all you have is a
> simple question ]
> On 21.03.2023 14:46, hongy...@gmail.com wrote:
> >
> > Another question:
> >
> > [[:^print:]] and [^[:print:]], can they both be used here?
> If you mean whether they are interchangeable, then No.
> (Where did you get _that idea_ from, from ChatGPT?)

It did come from ChatGPT, but ChatGPT subsequently acknowledged that this is a non-existent pattern it had fabricated, and retracted it.

> But this you could also have easily tested yourself.

I've also checked it.

> This is very basic and you should inspect some contemporary source
> describing the Unix'y form of regular expressions and their syntax.

Thank you for telling me the correct route and method in tackling problems.

> Janis

Zhao
