
sed for search and Replace for large string in a file


bubunia...@gmail.com

Apr 2, 2019, 3:54:23 PM
Hi all,

I have 100 files in a directory and I am trying to search and replace a large string in all of them using sed. I noticed that sed is not able to replace the string in all the files successfully. In some files the string was replaced successfully and in others it was not.

for i in `ls $dir`
do
sed -i '/s/'"large string"'/'"SECOND LARGE STRING"/g' $i

done

Can anyone help me in this regard? I am stuck here


Regards
Pradeep


Kenny McCormack

Apr 2, 2019, 4:06:20 PM
In article <b4ee7bb4-258f-4181...@googlegroups.com>,
Trying out my crystal ball here...

Does "large string" contain reg exp magic characters?

--
I'm building a wall.

Lew Pitcher

Apr 2, 2019, 4:13:32 PM
To add to that...

Given the script above, does "large string" or "SECOND LARGE STRING" contain
shell globbing or shell substitution characters?

--
Lew Pitcher
"In Skills, We Trust"

Ed Morton

Apr 2, 2019, 4:37:26 PM
sed doesn't work on strings; it works on regexps for searching and on
back-reference-enabled replacement text. See
https://stackoverflow.com/q/29613304/1745001 for the hoops you have to
jump through to make sed behave as if it were operating on strings, but
you're better off just using a tool like awk that does understand
strings. Also, never parse the output of ls (see
https://mywiki.wooledge.org/ParsingLs).
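
To give a feel for those hoops, here is a minimal sketch of escaping both strings before handing them to sed. It is not taken from the linked answer. It assumes single-line strings, basic regular expressions (no -E), and `/` as the delimiter of the final command; the function names are made up for illustration.

```shell
# escape_sed_pattern STRING: backslash-escape the characters that are
# special in a basic regular expression, plus the "/" delimiter.
escape_sed_pattern() {
    printf '%s\n' "$1" | sed 's|[\/.*^$[]|\\&|g'
}

# escape_sed_replacement STRING: backslash-escape the characters that are
# special in the replacement text: backslash, "&" and the "/" delimiter.
escape_sed_replacement() {
    printf '%s\n' "$1" | sed 's|[\/&]|\\&|g'
}

# replace_literal_sed OLD NEW: filter stdin, replacing every literal OLD
# with NEW. Both strings are assumed to contain no newlines.
replace_literal_sed() {
    pat=$(escape_sed_pattern "$1")
    rep=$(escape_sed_replacement "$2")
    sed "s/$pat/$rep/g"
}
```

For example, `replace_literal_sed 'a.b/c' 'x&y' < input.txt` treats the dot, the slash and the ampersand as ordinary characters.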

This MIGHT be all you need depending on your input and strings which you
haven't shown us:

awk -i inplace '
    BEGIN {
        old = "large string"
        new = "SECOND LARGE STRING"
        lgth = length(old)
    }
    # index() is a literal search; only the first match on each line is replaced
    s = index($0, old) {
        $0 = substr($0, 1, s-1) new substr($0, s+lgth)
    }
    { print }
' "$dir"/*

The above uses GNU awk for "inplace" editing just like you're using GNU
sed for that but it's using literal strings in all cases so it'll work
no matter what characters are in your strings.

Ed.

Kaz Kylheku

Apr 2, 2019, 5:05:54 PM
Trying my new made-in-China acrylic ball here from the Psychic Friends web store.

It says, "not all the 'large strings' are actually the same; there is a small
difference somewhere."

Dmitry Alexandrov

Apr 2, 2019, 5:35:27 PM
bubunia...@gmail.com wrote:
> I have 100 files in a directory and I am trying to search and replace a large string in all of them using sed.

That’s impossible: sed(1) is a ‘_stream_ editor’, so it cannot edit files in place; it can only remove them and create them anew. If that’s okay for you, then keep going. Otherwise, try a plain editor with no ‘stream’ in it, that is, ed(1).

> I noticed that sed is not able to replace the string in all the files successfully. In some files the string was replaced successfully and in others it was not.
>
> for i in `ls $dir`

This is the first of the most common mistakes in Bash coding. [0]

> do
> sed -i '/s/'"large string"'/'"SECOND LARGE STRING"/g'

This sed expression is invalid. It reads: find lines that contain ‘s’ and execute ‘large string/SECOND LARGE STRING/g’ on them, but there is no ‘large string’ command in sed.

> $i

This is another of the most common mistakes in Bash coding. [1]

> done

[0] https://mywiki.wooledge.org/BashPitfalls#for_f_in_.24.28ls_.2A.mp3.29
[1] https://mywiki.wooledge.org/BashPitfalls#cp_.24file_.24target
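
The two pitfalls flagged above can be avoided with a glob and quoted expansions. The following is only a sketch: the helper name and its arguments are invented for illustration, and it writes to a temporary file and renames it rather than relying on `sed -i`.

```shell
# apply_sed_to_dir DIR SCRIPT  (hypothetical helper, not from the thread)
# Runs the sed SCRIPT over every regular file directly inside DIR.
apply_sed_to_dir() {
    dir=$1
    script=$2
    for f in "$dir"/*; do            # glob instead of parsing ls output
        [ -f "$f" ] || continue      # skip directories and an unmatched glob
        # Quote every expansion so names with spaces stay in one piece.
        if sed "$script" "$f" > "$f.tmp"; then
            mv "$f.tmp" "$f"         # replace the original with the new copy
        else
            rm -f "$f.tmp"           # leave the original alone on failure
        fi
    done
}
```

Called as `apply_sed_to_dir "$dir" 's|old|new|g'`, it also handles file names that contain spaces. Note that the rewritten file gets the permissions of the newly created temporary file.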

Allodoxaphobia

Apr 2, 2019, 5:43:30 PM
On Tue, 02 Apr 2019 16:13:29 -0400, Lew Pitcher wrote:
> Kenny McCormack wrote:
>> <bubunia...@gmail.com> wrote:
>>>
>>>I have 100 files in a directory and I am trying to search and replace a
>>>large string in all of them using sed. I noticed that sed is not able to
>>>replace the string in all the files successfully. In some files the string
>>>was replaced successfully and in others it was not.
>>>
>>>for i in `ls $dir`
>>>do
>>> sed -i '/s/'"large string"'/'"SECOND LARGE STRING"/g' $i
>>>
>>>done
>>>
>>>Can anyone help me in this regard? I am stuck here
>>
>> Trying out my crystal ball here...
>>
>> Does "large string" contain reg exp magic characters?
>
> To add to that...
>
> Given the script above, does "large string" or "SECOND LARGE STRING" contain
> shell globbing or shell substitution characters?

The mind-reading team here wonders if there be line breaks
in the Large String amongst the different files. They also
sent out a note asking if the Large String is sited in a
collection of html files.

bubunia...@gmail.com

Apr 2, 2019, 9:29:12 PM
Sorry for not clarifying in my question. There are special characters like = etc. in the strings. The string is a base64 encoded string, to be specific.

echo "large string" | base64 > test.txt

Lew Pitcher

Apr 2, 2019, 9:43:10 PM
bubunia...@gmail.com wrote:

> Sorry for not clarifying in my question. There are special characters like = etc.
> in the strings. The string is a base64 encoded string, to be specific.

And, thus, may contain a '/' character. Please note that the '/' is the
character that sed(1) uses to separate the regex string from the replacement
string in the 's' command.

> echo "large string" | base64 > test.txt


Janis Papanagnou

Apr 3, 2019, 3:41:26 AM
On 03.04.2019 03:43, Lew Pitcher wrote:
> bubunia...@gmail.com wrote:
>
>> Sorry for not clarifying in my question. There are special characters like = etc.
>> in the strings. The string is a base64 encoded string, to be specific.
>
> And, thus, may contain a '/' character. Please note that the '/' is the
> character that sed(1) uses to separate the regex string from the replacement
> string in the 's' command.

Since sed accepts other characters as separator just use something
different from '/', e.g. sed 's|/path/file|/otherpath/newname|g'
to avoid all the escapes.
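
A minimal sketch of that idea for the base64 case. The base64 alphabet is A-Z, a-z, 0-9, `+`, `/` and `=`, so `|` never occurs in such a string, and none of those characters is special in a basic regular expression or in the replacement text. The function name and the sample strings are made up, and the strings are assumed to be single-line base64.

```shell
# replace_base64 OLD NEW: filter stdin, replacing every OLD with NEW.
# Both arguments are assumed to be single-line base64 text, so "|" is a
# safe delimiter and no escaping is needed.
replace_base64() {
    sed "s|$1|$2|g"
}

# Sample strings, invented for illustration; note the "/" in both.
printf 'token=ab/cd+ef==\n' | replace_base64 'ab/cd+ef==' 'QUJD/+8='
```

This relies on sed's default basic regular expressions; with -E the `+` would become a repetition operator and need escaping.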

Janis

Kenny McCormack

Apr 3, 2019, 3:54:48 AM
In article <q81o33$1im$1...@news-1.m-online.net>,
Janis Papanagnou <janis_pa...@hotmail.com> made an obvious comment:
Yeah, but the problem is that you have to pick something that is guaranteed
not to occur in your string(s), and it isn't always possible to do that.

Especially since many of our crystal balls seem to be in the shop these days.

Comments:
1) Can you use control characters (e.g., ^a) as sed delimiters? Anyone
ever try that?
2) Ed's AWK solution looks best, although there will probably be
difficulties encountered by OP if/when he actually tries to use it.
I.e., he will have to figure out how to get his string(s) into the
AWK program and will probably run up against the usual litany of
"shell beginner" quoting problems. Not that any of these issues
aren't solvable, but he will have to endure it. Everybody does, at
some point in their careers.
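
One way around the quoting problems in point 2 is to pass the strings to awk through the environment, since values read from ENVIRON are not subject to escape processing or awk-level quoting. The sketch below is a filter and not an in-place editor; the function name is made up, and unlike the program posted earlier it replaces every occurrence on a line.

```shell
# replace_literal OLD NEW: copy stdin to stdout, replacing every literal
# occurrence of OLD with NEW. The strings reach awk via the environment,
# so quotes, backslashes and "&" in them need no escaping.
replace_literal() {
    old=$1 new=$2 awk '
    BEGIN { old = ENVIRON["old"]; new = ENVIRON["new"]; n = length(old) }
    n == 0 { print; next }              # nothing to search for
    {
        out = ""
        while ((s = index($0, old)) > 0) {
            out = out substr($0, 1, s - 1) new   # text before the match, then NEW
            $0 = substr($0, s + n)               # keep scanning after the match
        }
        print out $0
    }'
}
```

The shell-side quoting is then the ordinary kind, for example `replace_literal "$old" "$new" < file > file.new`.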

--
The only thing Trump's made great again is Saturday Night Live.

Janis Papanagnou

Apr 3, 2019, 3:59:08 AM
On 03.04.2019 09:54, Kenny McCormack wrote:
> In article <q81o33$1im$1...@news-1.m-online.net>,
> Janis Papanagnou <janis_pa...@hotmail.com> made an obvious comment:
>> On 03.04.2019 03:43, Lew Pitcher wrote:
>>> bubunia...@gmail.com wrote:
>>>
>>>> Sorry for not clarifying in my question. There are special characters like = etc.
>>>> in the strings. The string is a base64 encoded string, to be specific.
>>>
>>> And, thus, may contain a '/' character. Please note that the '/' is the
>>> character that sed(1) uses to separate the regex string from the
>>> replacement string in the 's' command.
>>
>> Since sed accepts other characters as separator just use something
>> different from '/', e.g. sed 's|/path/file|/otherpath/newname|g'
>> to avoid all the escapes.
>
> Yeah, but the problem is that you have to pick something that is guaranteed
> not to occur in your string(s), and it isn't always possible to do that.

With that feature you have at least many options. In this case they spoke
of base64 data so you have a fairly large set of available delimiters.

>
> Especially since many of our crystal balls seem to be in the shop these days.
>
> Comments:
> 1) Can you use control characters (e.g., ^a) as sed delimiters? Anyone
> ever try that?

I seem to recall that this is possible.

Janis

Josef Moellers

Apr 3, 2019, 10:21:27 AM
On 03.04.19 03:43, Lew Pitcher wrote:
> bubunia...@gmail.com wrote:
>
>> Sorry for not clarifying in my question. There are special characters like = etc.
>> in the strings. The string is a base64 encoded string, to be specific.
>
> And, thus, may contain a '/' character. Please note that the '/' is the
> character that sed(1) uses to separate the regex string from the replacement
> string in the 's' command.

Wouldn't "sed" complain?

$ sed 's/a/b/g/'
sed: -e expression #1, char 8: unknown option to `s'
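
The same check can be scripted by looking at sed's exit status. This is a small sketch with a made-up helper name; the exact wording of the error message differs between sed implementations, so only the status is tested.

```shell
# sed_accepts SCRIPT: succeed if sed can parse SCRIPT, fail otherwise.
# Input comes from /dev/null, so nothing is actually edited.
sed_accepts() {
    sed -n "$1" </dev/null >/dev/null 2>&1
}

# A stray "/" from the data leaves junk where the flags belong.
if sed_accepts 's/a/b/c/d/g'; then
    echo "accepted"
else
    echo "rejected"
fi
```

So an unescaped `/` in the data typically makes sed stop with an error instead of quietly skipping the replacement.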

Josef

Ed Morton

Apr 3, 2019, 11:03:37 AM
"no matter what characters" isn't exactly true with the above - you'd
need to convert single quotes to \047 or '\'' (unless you put the script
in a file and executed with awk -f ...) and escape double quotes.

If your strings are multi-line and can contain literally any character
then the simplest thing would be to save them in files named old and new
and then do:

awk -i inplace '
    # read the first two files (old, new) whole; read the rest line by line
    BEGINFILE { RS = (ARGIND<3 ? "^$" : ORS) }
    ARGIND == 1 { old = $0; lgth = length(old); next }
    ARGIND == 2 { new = $0; next }
    s = index($0, old) {
        $0 = substr($0, 1, s-1) new substr($0, s+lgth)
    }
    { print }
' old new "$dir"/*

Regards,

Ed.

Ben Bacarisse

Apr 3, 2019, 11:16:29 AM
Janis Papanagnou <janis_pa...@hotmail.com> writes:

> On 03.04.2019 09:54, Kenny McCormack wrote:
<cut>
>> Comments:
>> 1) Can you use control characters (e.g., ^a) as sed delimiters? Anyone
>> ever try that?
>
> I seem to recall that this is possible.

Yes. POSIX sed allows "any character other than <backslash> or
<newline>" and I think it is widely implemented. Modern Linux sed
appears to complain only when the delimiter is not a single-byte
character.
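
A minimal sketch of a control character used as the delimiter, here SOH (octal 001). The function name is made up. The strings are still interpreted as a regular expression and as replacement text; only the delimiter clash goes away.

```shell
# replace_with_soh OLD NEW: filter stdin using the control character SOH
# (octal 001) as the delimiter of the s command. OLD and NEW must not
# contain SOH themselves.
replace_with_soh() {
    d=$(printf '\001')
    sed "s${d}$1${d}$2${d}g"
}

# Neither "/" nor "|" needs escaping now.
printf 'a|b/c\n' | replace_with_soh 'b/c' 'x|y'
```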

--
Ben.

Janis Papanagnou

Apr 3, 2019, 4:58:20 PM
I am not surprised that multi-byte characters make problems.

And (just checked) ASCII NUL seems also not a possible option.
(Not surprising as well.)

Janis

Ben Bacarisse

Apr 3, 2019, 5:58:37 PM
Janis Papanagnou <janis_pa...@hotmail.com> writes:

> On 03.04.2019 17:16, Ben Bacarisse wrote:
>> Janis Papanagnou <janis_pa...@hotmail.com> writes:
>>
>>> On 03.04.2019 09:54, Kenny McCormack wrote:
>> <cut>
>>>> Comments:
>>>> 1) Can you use control characters (e.g., ^a) as sed delimiters? Anyone
>>>> ever try that?
>>>
>>> I seem to recall that this is possible.
>>
>> Yes. POSIX sed allows "any character other than <backslash> or
>> <newline>" and I think it is widely implemented. Modern Linux sed
>> appears to complain only when the delimiter is not a single-byte
>> character.
>
> I am not surprised that multi-byte characters make problems.

No, nor I.

> And (just checked) ASCII NUL seems also not a possible option.
> (Not surprising as well.)

That's a bit more surprising to me. What's very surprising is that I
get the same error message for a null as I do for a UTF-8 encoded
multi-byte character:

$ sed -f s.sed
sed: file s.sed line 1: delimiter character is not a single-byte character
$ cat -A s.sed
s^@x^@X^@g$
$

I wonder why sed would think that null announces the start of a
multi-byte character.

--
Ben.

Brian Patrie

Apr 4, 2019, 2:40:51 AM
On 03/04/2019 16.58, Ben Bacarisse wrote:
> What's very surprising is that I get the same error message for a null
> as I do for a UTF-8 encoded multi-byte character:
>
> $ sed -f s.sed
> sed: file s.sed line 1: delimiter character is not a
> single-byte character
> $ cat -A s.sed
> s^@x^@X^@g$
> $
>
> I wonder why sed would think that null announces the start of a
> multi-byte character.

Perhaps it thinks it's a zero-byte character. ;)

Janis Papanagnou

Apr 4, 2019, 12:21:41 PM
On 03.04.2019 23:58, Ben Bacarisse wrote:
> Janis Papanagnou <janis_pa...@hotmail.com> writes:
>> [ sed substitution delimiters ]
>
>> And (just checked) ASCII NUL seems also not a possible option.
>> (Not surprising as well.)
>
> That's a bit more surprising to me. What's very surprising is that I
> get the same error message for a null as I do for a UTF-8 encoded
> multi-byte character:
>
> $ sed -f s.sed
> sed: file s.sed line 1: delimiter character is not a single-byte character
> $ cat -A s.sed
> s^@x^@X^@g$
> $
>
> I wonder why sed would think that null announces the start of a
> multi-byte character.

Interesting. I get: "... unterminated `s' command", obviously terminating
the $'s\0x\0X\0g' string interpretation (as a C-string?) at the first \0.

Janis

Ben Bacarisse

Apr 4, 2019, 12:45:56 PM
I get that, but only when I run sed like this:

$ sed -e $'s\0x\0X\0g'
sed: -e expression #1, char 1: unterminated `s' command
$ echo $'s\0x\0X\0g' | cat -A
s^@x^@X^@g$
$ /bin/echo $'s\0x\0X\0g' | cat -A
s$
$

I think there is something going on in the shell.

--
Ben.

Ivan Shmakov

Apr 4, 2019, 12:57:35 PM
Huh? Since when the exec family of POSIX calls (one of which
the shell by necessity uses to start Sed) is NUL-safe?

--
FSF associate member #7257 http://am-1.org/~ivan/

Janis Papanagnou

Apr 4, 2019, 2:00:12 PM
I've tried it in two contexts, with $'...' ANSI strings and using a binary
file as you did; both with the same result, ksh and bash.

Janis

Janis Papanagnou

Apr 4, 2019, 2:19:55 PM
On 04.04.2019 18:57, Ivan Shmakov wrote:
> [...]
>
> Huh? Since when the exec family of POSIX calls (one of which
> the shell by necessity uses to start Sed) is NUL-safe?

I had got exactly the same result with a binary data file as Ben did.

Janis

Ivan Shmakov

Apr 4, 2019, 3:10:27 PM
>>>>> Janis Papanagnou <janis_pa...@hotmail.com> writes:
>>>>> On 04.04.2019 18:57, Ivan Shmakov wrote:

BB> sed: file s.sed line 1: delimiter character is not a single-byte character

>> Huh? Since when the exec family of POSIX calls (one of which the
>> shell by necessity uses to start Sed) is NUL-safe?

> I had got exactly the same result with a binary data file as Ben did.

My point is that when you use $'s\0<whatever>' in the
command line, Sed only gets "s\0" -- which explains the
"... unterminated `s' command" message.

Reading a file does not have such a limitation, so a different
behavior (as observed) is also explainable. As to why the
specific message... well, my guess is that has something to do
with the locale. Consider, e. g.:

$ LC_ALL=eo.UTF-8 sed -f <(printf s\\0a\\0b\\0\\n)
sed: dosiero /dev/fd/63 linio 1: delimiter character is not a single-byte character
$ LC_ALL=C sed -f <(printf s\\0a\\0b\\0\\n)
a
b
$

Janis Papanagnou

Apr 4, 2019, 4:19:55 PM
On 04.04.2019 21:10, Ivan Shmakov wrote:
>>>>>> Janis Papanagnou <janis_pa...@hotmail.com> writes:
>>>>>> On 04.04.2019 18:57, Ivan Shmakov wrote:
>
> BB> sed: file s.sed line 1: delimiter character is not a single-byte character
>
> >> Huh? Since when the exec family of POSIX calls (one of which the
> >> shell by necessity uses to start Sed) is NUL-safe?
>
> > I had got exactly the same result with a binary data file as Ben did.
>
> My point is that when you use $'s\0<whatever>' in the
> command line, Sed only gets "s\0" -- which explains the
> "... unterminated `s' command" message.

Yes, that's why I tested the binary as well.

> Reading a file does not have such a limitation, so a different
> behavior (as observed) is also explainable. As to why the
> specific message... well, my guess is that has something to do
> with the locale. Consider, e. g.:

That indeed changes behaviour.

Janis

Barry Margolin

Apr 6, 2019, 2:14:29 PM
In article <87sguxh...@bsb.me.uk>,
You can't pass a string containing null characters as arguments to
external programs. argv is an array of C strings, which use \0 as the
terminator.

Since echo is a shell built-in, it isn't limited like that, it has
direct access to the shell's parse tree.

--
Barry Margolin, bar...@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***

Ben Bacarisse

Apr 6, 2019, 7:31:18 PM
The shell /could/ use the same information to pass a char array with an
embedded null in it. Whether the external program would know what to do
with it is another matter, of course. sed definitively would not.

--
Ben.

Jorgen Grahn

Apr 7, 2019, 3:24:11 AM
On Sat, 2019-04-06, Ben Bacarisse wrote:
> Barry Margolin <bar...@alum.mit.edu> writes:
...
>> You can't pass a string containing null characters as arguments to
>> external programs. argv is an array of C strings, which use \0 as the
>> terminator.
>>
>> Since echo is a shell built-in, it isn't limited like that, it has
>> direct access to the shell's parse tree.
>
> The shell /could/ use the same information to pass a char array with an
> embedded null in it. Whether the external program would know what to do
> with it is another matter, of course. sed definitively would not.

Isn't Barry's point that it couldn't? The shell has to call execve(2) or
similar, and it takes a char *const argv[], i.e. array of C strings, with
no additional length information.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

Ben Bacarisse

Apr 7, 2019, 7:47:00 AM
Yes, I think he was making that point. I needed an explicit mention of
execve to make me think of what must happen on a Unix system.

I was focusing on what the shell knows and what main could be presented
with, not on what happens in between. The problem is not that a program
can't run with an argv[x] with an embedded null in it, it's that execve
has to copy the argument vector so you can't set that up on a Unix
system. Running

execv(path, (char *[]){ path, "abc\0def", 0});

does not run the program with an 8-byte argv[1].

--
Ben.

Barry Margolin

Apr 7, 2019, 10:10:47 PM
In article <87h8bag...@bsb.me.uk>,
There's a standard for how command line arguments are passed to
programs. It was designed with C strings in mind.

Barry Margolin

Apr 7, 2019, 10:13:32 PM
In article <87bm1if...@bsb.me.uk>,
And even if execv() could copy the whole string, there's no way for the
target program to know that it should process anything beyond the null.

Ben Bacarisse

Apr 7, 2019, 10:21:18 PM
That's something of a chicken-and-egg issue. If there were a way to
pass nulls in arguments, some programs might get written that make use
of them. I've seen, for example, the double null convention where a
list of non-empty strings is passed as one with the end marked by \0\0.
And then the null might be part of binary numeric data.

--
Ben.