> "Byte" is the only viable alternative, but that leaves the burden of
> counting bytes on the user.
If you use a very long sed script, this can be a problem.
I happen to write sed scripts that are more than 1000 characters long.
My last one is currently 1762 characters long and still growing.
That's why I wrote a little bash function that shows me the n-th
character, but it gave wrong positions for scripts containing UTF-8
characters.
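A hypothetical sketch of such a helper follows. The name `show_char` is made up; this is not the actual function from this message. It prints the script up to the position sed reports. Bash's `${var:offset:length}` counts characters in the current locale, which is why it drifts from sed's byte count once the script contains UTF-8 characters.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print SCRIPT up to and including the 1-based
# position POS, so the output ends where sed reported the error.
show_char() {
  local script=$1 pos=$2
  # In a UTF-8 locale this slices by *characters*, while sed's
  # "char N" counts bytes, so the two disagree on multibyte input.
  printf '%s\n' "${script:0:pos}"
}

# Example (script.sed is a hypothetical file name):
# show_char "$(cat script.sed)" 1762
```

In a UTF-8 locale, every multibyte character before the reported position makes the printed prefix end later than the spot sed actually meant.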
The work-around is to set LC_CTYPE to C around the string-processing
part of my bash function. But since sed supports multibyte characters,
I believe the error message should count characters, not bytes.
By the way, the error message itself calls the position "char", not
"byte".
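Here is a minimal sketch of that work-around, again using a hypothetical function name. It sets LC_ALL rather than LC_CTYPE because a non-empty LC_ALL overrides LC_CTYPE and would otherwise silently cancel the change. The function body is a `( ... )` subshell, so the locale switch does not leak back into the caller.

```bash
#!/usr/bin/env bash
# Hypothetical helper: force the C locale so that offsets are counted
# in bytes, matching the position sed reports.
show_byte() (
  # Subshell body: this assignment only affects this function call.
  # LC_ALL is used because, when set, it takes precedence over LC_CTYPE.
  LC_ALL=C
  script=$1
  pos=$2
  # In the C locale, bash treats every byte as one character.
  printf '%s\n' "${script:0:pos}"
)
```

A position that falls in the middle of a multibyte character leaves a truncated byte sequence at the end of the output.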
> How do you expect Sed to know what character set is being used for the
> command line? Are we again going to limit ourselves to the current
> locale's charset?
John Cowan wrote:
> I think it is a reasonable assumption that the command line uses the
> same encoding that is used for the input files.
Eli Zaretskii wrote:
> Yes, mostly. But how do you know what is the encoding of the input
> files?
What are you talking about?!
This is about neither the current character set nor the processed file.
The only way to do this is to have sed respect the current locale
variables, LANG and LC_*, which can be changed as needed.
That's what every localized program does.
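For reference, POSIX resolves the locale that governs character handling (the LC_CTYPE category) in this order: a non-empty LC_ALL, then a non-empty LC_CTYPE, then a non-empty LANG, and otherwise an implementation-defined default. Below is a minimal shell sketch of that lookup. The function name is hypothetical, and it assumes "C" as the default. It inspects the shell's variables, while sed only sees the exported ones.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the locale name that governs character
# handling, following POSIX precedence. ":-" treats unset *and* empty
# variables as absent, as POSIX requires. "C" stands in for the
# implementation-defined default.
effective_ctype() {
  printf '%s\n' "${LC_ALL:-${LC_CTYPE:-${LANG:-C}}}"
}
```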