How To Insert an Escaped Apostrophe With Sed?

Rick

Feb 1, 2012, 11:16:26 AM

The problem to solve is UNIX filesystems full of user files with
special characters in their names. Typical offenders are ) ( * ' and
so on.

The approach I've taken is to use the find command to produce an input
list of filenames. Then manipulate those entries and produce a script
full of move commands to rename the offending files. The script full
of move commands ends up with the special character escaped in the
source field and the destination field with no special character at
all.

For example, an original file named:
/home/rick/file(name).txt

is renamed as follows:
/usr/bin/mv /home/rick/file\(name\).txt /home/rick/filename.txt

Here is an example of the input list creation and processing; we'll
look for left parens and remove them altogether:
find . -type f -name \*\(\* >> list

awk '{ print $1, $1 }' list | while read a b; do
echo $b | sed -e 's/(//g' | read c
echo $a | sed -e 's/(/\\\\(/g' | read d
echo /usr/bin/mv $d $c >> fix-it
done

The fix-it script ends up with a bunch of move entries all escaped and
ready to execute.

I have all the special character substitutions solved except the
apostrophe. I can get sed to remove the apostrophe from an input
line. What I cannot figure out is how to get the escaped apostrophe
to echo into the script of move commands. I have messed with every
conceivable combination of escapes and quotes, any I can conceive
anyway.

I want to end up with a move command something like:
/usr/bin/mv /home/rick/filename\'s.txt /home/rick/filenames.txt

I cannot get the \' to work.

Thanks,
Rick

pk

Feb 1, 2012, 11:24:53 AM

On Wed, 1 Feb 2012 08:16:26 -0800 (PST), Rick <rickre...@gmail.com>
wrote:

For the specific problem:

echo "fo'o'bar" | sed 's/'\''/\\&/g'

or

echo "fo'o'bar" | sed 's/'\''/\\'\''/g'

or

echo "fo'o'bar" | sed "s/'/\\\\&/g"

or

echo "fo'o'bar" | sed "s/'/\\\\'/g"

plus probably others.
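A minimal sketch of how one of these substitutions might be used to build an escaped word and then check it. The variable names (name, escaped, roundtrip) are illustrative only and not from the original post; the eval line simply lets the shell parse the escaped word the way it would inside a generated script.

```shell
name="filename's.txt"

# Same substitution as the double-quoted forms above:
# put a backslash in front of every apostrophe.
escaped=$(printf '%s\n' "$name" | sed "s/'/\\\\&/g")
printf '%s\n' "$escaped"      # prints: filename\'s.txt

# Round trip: the shell removes the backslash again when it parses the word.
eval "roundtrip=$escaped"
printf '%s\n' "$roundtrip"    # prints: filename's.txt
```

This only covers the apostrophe; spaces and other special characters in a name would still need their own escaping or quoting.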

for the general issue of mass renaming, see

http://mywiki.wooledge.org/BashFAQ/030

Janis Papanagnou

Feb 1, 2012, 11:26:39 AM

Am 01.02.2012 17:16, schrieb Rick:
> [...]

>
> awk '{ print $1, $1 }' list | while read a b; do
> echo $b | sed -e 's/(//g' | read c
> echo $a | sed -e 's/(/\\\\(/g' | read d
> echo /usr/bin/mv $d $c>> fix-it
> done

One point with using awk is that it has the loop built in; so you
don't need a shell loop, nor all those pipes or many internal and
external commands.

Instead of print $1, $1 assign the field(s) to awk variables, use
awk function gsub() to do the substitution, and compose the output
by awk's string concatenation, finally print the "mv" command with
the awk variables as arguments. That way awk produces the whole
sequence of shell commands that you can inspect. If all is fine
pipe that awk command into sh, and all is done.
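A rough sketch of that workflow, under a few assumptions: it runs in a scratch directory made with mktemp -d (widely available, though not required by POSIX), it calls plain mv rather than /usr/bin/mv, it wraps both names in double quotes, and the function name gen is made up for the illustration.

```shell
# Scratch directory so the sketch touches nothing real.
dir=$(mktemp -d)
touch "$dir/file(name).txt"

# awk does the looping: copy the name into variables, strip the
# parentheses from the destination with gsub(), and concatenate the
# pieces into one mv command per input line.
gen() {
    find "$dir" -type f -name '*[()]*' |
    awk '{ src = $0; dst = $0
           gsub(/[()]/, "", dst)
           print "mv \"" src "\" \"" dst "\"" }'
}

gen          # first pass: inspect the generated commands
gen | sh     # second pass: run them once they look right
ls "$dir"    # prints: filename.txt
```

Double quotes do not protect names containing ", $, ` or a backslash, so those would need separate handling before the output is piped into sh.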

Janis

>
> [...]

Ed Morton

Feb 1, 2012, 11:55:40 AM

On 2/1/2012 10:16 AM, Rick wrote:
> The problem to solve are UNIX filesystems full of user files with
> special characters in the names. These are the typical characters )
> (*', etc.
>
> The approach I've taken is to use the find command to produce an input
> list of filenames. Then manipulate those entries and produce a script
> full of move commands to rename the offending files. The script full
> of move commands ends up with the special character escaped in the
> source field and the destination field with no special character at
> all.
>
> For example, an original file named:
> /home/rick/file(name).txt
>
> is renamed as follows:
> /usr/bin/mv /home/rick/file\(name\).txt /home/rick/filename.txt
>
> Here is an example of the input list creation and processing, we'll
> look for left parens and remove them altogether:
> find . -type f -name \*\(\*>> list
>
> awk '{ print $1, $1 }' list | while read a b; do
> echo $b | sed -e 's/(//g' | read c
> echo $a | sed -e 's/(/\\\\(/g' | read d
> echo /usr/bin/mv $d $c>> fix-it
> done

You must quote your variables (e.g. "$b" not $b) unless you have a very specific
reason not to and fully understand what you're doing.

The above could be re-written as:

awk '{ tgt=$1; gsub(/[(]/,"",tgt)
       printf "/usr/bin/mv \"%s\" \"%s\"\n", $1, tgt }' list

> The fix-it script ends up with a bunch of move entries all escaped and
> ready to execute.
>
> I have all the special character substitutions solved except the
> apostrophe. I can get sed to remove the apostrophe from an input
> line. What I cannot figure out is how to get the escaped apostrophe
> to echo into the script of move commands. I have messed with every
> conceivable combination of escapes and quotes, any I can conceive
> anyway.
>
> I want to end up with a move command something like:
> /usr/bin/mv /home/rick/filename\'s.txt /home/rick/filenames.txt

No, you want something like:

/usr/bin/mv "/home/rick/filename's.txt" "/home/rick/filenames.txt"

>
> I can not get the \' to work.
>
> Thanks,
> Rick

awk '{ tgt=$1; gsub(/[(*\047]/,"",tgt)
       printf "/usr/bin/mv \"%s\" \"%s\"\n", $1, tgt }' list

Just add whatever characters you want removed to the list inside the [...] of
the gsub() command. You'll want to add at least " to the list or handle it
otherwise. \047 is a special case - it represents a single ' within a
command-line awk script that's enclosed in 's.
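As a quick check of the \047 trick, here is a small sketch run on a single made-up name instead of the list file. It differs from the command above in two assumed ways: ) is added to the bracket list, and $0 is used instead of $1 so that a name containing spaces would not be cut at the first blank. The variable out is just for the illustration.

```shell
out=$(printf '%s\n' "/home/rick/file(name's).txt" |
      awk '{ tgt = $0; gsub(/[()*\047]/, "", tgt)
             printf "/usr/bin/mv \"%s\" \"%s\"\n", $0, tgt }')
printf '%s\n' "$out"
# prints: /usr/bin/mv "/home/rick/file(name's).txt" "/home/rick/filenames.txt"
```

The apostrophe stays in the source name, where the surrounding double quotes make it harmless, and disappears from the target.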

Ed.

Rick

Feb 2, 2012, 12:20:34 PM

Thanks for all the responses, learned something from all of them. I
implemented the suggestion by Ed as my first interest was to enclose
with quotes; somehow I got distracted with the escape thing. Also
discovered that you have to use nawk on Solaris as its awk doesn't
have gsub.

I've been a UNIX administrator for many years and can do 99% of my
automated work with simple C shell routines. I can get some pretty
tortured routines going, wish I had a job earlier in the career that
forced more scripting.

Thanks again,
Rick

Ed Morton

Feb 2, 2012, 5:36:55 PM

Rick <rickre...@gmail.com> wrote:

> Thanks for all the responses, learned something from all of them. I
> implemented the suggestion by Ed as my first interest was to enclose
> with quotes, somehow I got distracted with the escape thing. Also
> discovered that you have to use nawk on Solaris as its awk doesn't
> have gsub.

That's not quite correct. Solaris comes with 3 awks and only one of them
(/usr/bin/awk) is tragically broken and inadequate (see
http://groups.google.com/group/comp.lang.awk/msg/d97448c38e830202).

You can happily use /usr/xpg4/bin/awk. I'd use that over nawk though in
reality I just have GNU awk (gawk) installed and use that religiously.

>
> I've been a UNIX administrator for many years and can do 99% of my
> automated work with simple C shell routines. I can get some pretty
> tortured routines going, wish I had a job earlier in the career that
> forced more scripting.

The tortured scripts are probably due to you using C shell. Do not write
scripts in C shell - google "why not C shell" or search the archives.

Ed.
>
> Thanks again,
> Rick

Posted using www.webuse.net

Geoff Clare

Feb 3, 2012, 8:23:45 AM

Ed Morton wrote:

> Rick <rickre...@gmail.com> wrote:
>
>> Thanks for all the responses, learned something from all of them. I
>> implemented the suggestion by Ed as my first interest was to enclose
>> with quotes, somehow I got distracted with the escape thing. Also
>> discovered that you have to use nawk on Solaris as its awk doesn't
>> have gsub.

> You can happily use /usr/xpg4/bin/awk. I'd use that over nawk

Yes, /usr/xpg4/bin/awk has better standards conformance than nawk.
However, the way to use it is not to use that full pathname but
to ensure that the shell's PATH variable has /usr/xpg4/bin ahead
of /usr/bin (and /bin if it's there) and just use the name "awk".
That way you get conforming versions of all the POSIX/UNIX utilities,
not just awk.

--
Geoff Clare <net...@gclare.org.uk>

Kees Nuyt

Feb 3, 2012, 2:33:58 PM

Just as a sidenote / spinoff:

Or use
getconf PATH
to retrieve the path to the POSIX compliant utilities in a
portable way (AFAICT).

This one-liner composes the PATH variable from getconf and $PATH
and deduplicates it:

PATH=$(printf "%s" "$(getconf PATH):$PATH"|awk 'BEGIN{RS=":"}{printf "%s",($0 in x)?"":($0?(y==0?"":":")$0:"");x[$0]=1;y=1}')
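The deduplication idea can be spelled out a little more readably. This is only a sketch of the same technique, not the one-liner itself: dedup_path is a hypothetical helper name, it works on a sample string rather than the real PATH, and it silently drops empty entries.

```shell
# dedup_path is a made-up helper name for this illustration.
# Split on ":", keep the first occurrence of each non-empty entry,
# and join the survivors with ":" again.
dedup_path() {
    printf '%s' "$1" |
    awk 'BEGIN { RS = ":" }
         $0 != "" && !($0 in seen) { printf "%s%s", sep, $0; sep = ":"; seen[$0] = 1 }'
}

dedup_path "/usr/bin:/bin:/usr/bin:/usr/local/bin:/bin"; echo
# prints: /usr/bin:/bin:/usr/local/bin
```

With that in place, something like PATH=$(dedup_path "$(getconf PATH):$PATH") would give the POSIX directories priority while keeping the rest of the existing search path.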

Best regards,
--
( Kees Nuyt
)
c[_]

Eric_DL

Feb 4, 2012, 9:08:30 PM

Hi all,

Based on the following considerations, my solution for this problem is a
bit different from Ed's above:

- "awk" is far more powerful than "find" for regexp filtering
- it's far simpler to list the authorized characters than to parse through
all filenames searching for the forbidden ones

I chose not to create the "list" file and instead to pipe the "find" output
directly into "awk". I also assumed that only files had to be renamed,
not directories, so I applied modifications only to the last field of each
line:

find . -type f | nawk 'BEGIN {FS = "/"; OFS = "/"; authorized = "[^a-zA-Z0-9._-]"};
  $NF ~ authorized {wrong = $0; gsub(authorized, "", $NF);
  printf("/usr/bin/mv \"%s\" \"%s\"\n", wrong, $0)}'

I've listed the usual basic characters used for UNIX filenames, but you
can just update the list of authorized characters (between the brackets in
"authorized" variable declaration) to your convenience and "awk" will do
the rest.

NB 1: When updating the list, pay attention to keep "^" as first character
and "-" as last, because in other positions, they would have different
meanings in the regexp.

NB 2: If modifications also concern directory names, remove the declarations
of "FS" and "OFS", include "/" inside the brackets of "authorized"
characters and replace all occurrences of "$NF" by "$0".

NB 3: If your "/usr/xpg4/bin/awk" is up to date in POSIX compliance, you
should be able to rewrite the "authorized" variable declaration like this:
authorized = "[^[:alnum:]._-]" ([:alnum:] meaning all alphanumeric
characters; [:print:] would not do here, since it also matches spaces and
punctuation)
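A small worked sketch of the whitelist approach on two made-up paths, fed in with printf instead of find so the result is easy to follow. The assumptions: plain awk and mv stand in for nawk and /usr/bin/mv, and the variable is called bad here because the bracket expression actually matches the characters that are not authorized.

```shell
out=$(printf '%s\n' "./docs/rick's file(1).txt" "./docs/ok-name_1.txt" |
      awk 'BEGIN { FS = OFS = "/"; bad = "[^a-zA-Z0-9._-]" }
           $NF ~ bad { old = $0; gsub(bad, "", $NF)
                       printf "mv \"%s\" \"%s\"\n", old, $0 }')
printf '%s\n' "$out"
# prints: mv "./docs/rick's file(1).txt" "./docs/ricksfile1.txt"
```

Only the last path component is rewritten, and the second name produces no command because every character in it is on the allowed list.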

Cheers
Eric

--
PM : http://www.webuse.net/pm.php?u=2832


Kaz Kylheku

Feb 4, 2012, 10:13:29 PM

On 2012-02-05, Eric_DL <2832i...@webuse.net> wrote:
> - "awk" is far more powerful than "find" for regexp filtering
> - it's far simpler to list the authorized characters than to parse through
> all filenames searching for the forbidden ones

Also, looking for authorized characters rather than forbidden ones is more
secure. Suppose you're dealing with UTF-8 (but as raw bytes), which is destined
for some other program whose decoder has security issues. Suppose that a
particularly bad input is the word "delete" that you must not pass. To do it
100% safely (not trusting the handling of UTF-8 in other programs), you would
have to look not only for the ASCII string "delete" but also for all
combinations of invalid UTF-8 that could conceivably produce "delete".