My file contains octal characters, e.g. \342\200\231,
as in can\342\200\231t or don\342\200\231t.
Is there a way to replace these with their ASCII equivalents
from the shell with sed, perl or awk?
Thanks.
I fear there might be no ASCII equivalent if the text uses an encoding
of a different character set rather than ASCII.
You'll have to find out which encoding was used in the first
place. Then the program iconv may help you convert the data.
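One quick way to test the UTF-8 hypothesis, as a sketch assuming an iconv that reports invalid input through its exit status (glibc's does): converting from UTF-8 to UTF-8 fails on any byte sequence that is not valid UTF-8. Success does not prove the file is UTF-8 (plain ASCII passes too), but failure rules it out. The helper name `is_utf8` and the file name are placeholders.

```shell
#!/bin/sh
# is_utf8 (hypothetical helper): exit status 0 if stdin decodes cleanly
# as UTF-8, non-zero otherwise. Output is discarded; only the status matters.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Usage (file name is a placeholder):
#   is_utf8 < my-input-file && echo "valid UTF-8"
```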
Janis
The example is a useful one. \342\200\231 is the UTF-8 encoding of a
"right single quotation mark" (U+2019), which Unicode recommends as the
character to use for an apostrophe. It is therefore very likely that the
file is UTF-8 encoded.
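You can confirm the byte values yourself: printf interprets \NNN octal escapes, and od dumps the result in hex. The octets 342 200 231 (octal) are e2 80 99 (hex), the UTF-8 form of U+2019.

```shell
#!/bin/sh
# Emit the three octets written in octal and show them in hex.
printf '\342\200\231' | od -An -tx1
# prints " e2 80 99"

# In a UTF-8 terminal the same bytes render as the curly apostrophe.
printf 'can\342\200\231t\n'
```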
When you say the file contains octal characters, it is not clear whether
you are showing us the octal values of the characters or whether the file
really contains a backslash followed by three digits. In other words,
does \342\200\231 represent 3 octets or 12?
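A quick way to answer that is to look at the raw bytes. Real octets show up in `od -c` as three octal values, while literal text shows a backslash followed by digits, and a byte count makes the difference plain. The two printf lines below simply fabricate both cases for comparison; on a real file you would run od on the file itself.

```shell
#!/bin/sh
# Case 1: three real octets, as a UTF-8 file would contain them.
printf '\342\200\231' | od -c
# Case 2: twelve literal characters: \ 3 4 2 \ 2 0 0 \ 2 3 1
printf '%s' '\342\200\231' | od -c

# Byte counts: 3 for the real octets, 12 for the literal text.
printf '\342\200\231' | wc -c
printf '%s' '\342\200\231' | wc -c

# On a real file (name is a placeholder): od -c my-input-file | head
```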
If (as is likely) it is the former, then iconv (with //TRANSLIT) is the
place to start. You may run into trouble when the file contains
characters that have no obvious ASCII equivalent, but that is another
problem.
iconv --from=utf-8 --to=ascii//translit my-input-file
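If you would rather stay with sed, as originally asked, here is a minimal sketch that handles only this one character. It covers both cases: real UTF-8 octets, and the literal 12-character text. The function names are hypothetical; LC_ALL=C makes sed treat the input as plain bytes.

```shell
#!/bin/sh
# The three octets of U+2019, built with printf so the script stays ASCII.
rsquo=$(printf '\342\200\231')

# Case 1: the file holds real UTF-8 bytes; replace each with an apostrophe.
fix_utf8_quote() {
    LC_ALL=C sed "s/$rsquo/'/g"
}

# Case 2: the file holds the literal text \342\200\231.
# Inside double quotes \\\\ becomes \\, which sed reads as one literal backslash.
fix_literal_quote() {
    sed "s/\\\\342\\\\200\\\\231/'/g"
}

# Usage (file names are placeholders):
#   fix_utf8_quote < my-input-file > my-output-file
```

Unlike iconv with //TRANSLIT, this fixes only one character; the left single quotation mark (\342\200\230), for example, would need its own expression.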
--
Ben.