Of course, at some level, any digital content is just a matter of
bytes. That's not what is being discussed here.
The approach Python chose is to push character conversion to the
process boundaries, that is, to input and output time. When you get
some string input, you have to know (or guess) its encoding, and the
idea is to convert it to Unicode immediately. Then, for the whole
lifetime of the string in your program, it is Unicode (the Python 3 str
type). At some point you have to produce some output, and that's the
time to convert back to bytes, using the encoding the output consumer
expects.
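As a minimal sketch of that idea (the helper names read_text and
write_text are made up for illustration, they are not part of any
library):

```python
def read_text(raw: bytes, encoding: str = "utf-8") -> str:
    # Input boundary: decode bytes to str as soon as they enter the program.
    return raw.decode(encoding)


def write_text(text: str, encoding: str = "utf-8") -> bytes:
    # Output boundary: encode only when handing data to the consumer.
    return text.encode(encoding)


incoming = b"d\xe9j\xe0 vu"                # Latin-1 bytes from an external source
text = read_text(incoming, "latin-1")      # str from here on
shouted = text.upper()                     # all processing happens on str
outgoing = write_text(shouted, "utf-8")    # this consumer is assumed to want UTF-8
```

Everything between the two calls only ever sees str, so no type checks
are needed in the middle of the program.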
This simplifies things *a lot* compared to the Python 2 world, where
you never knew whether you were manipulating pure bytes or unicode, and
had to constantly test content in many parts of your code, as you can
see in ReportLab with the many uses of isStr, isBytes, isUnicode,
asNative, etc. throughout the code base. I'm not criticizing that, it
was a "normal" consequence of the status of strings on Python 2.
> If python said it was abandoning byte strings then that would be a
> reason to drop all support for them. That would really annoy the gene
> analysts though :)
This won't happen. Bytes, whether byte strings or any other content
type, have legitimate use cases, of course.
> I don't think I would like to apply this patch anytime soon. If others
> have an opinion please speak up.
I totally respect your choice as maintainer. It was a (first-step)
proposal to simplify string handling and also to improve performance
through fewer function calls. I'm not angry if you refuse it, we can
agree to disagree :-)
Regards,
Claude
--
www.2xlibre.net
No, the isUnicode check would force text input to be Unicode (a normal
Python string). The encoding parameter should be deprecated/removed at
some point.
So instead of String(b'd\xe9j\xe0', encoding='latin-1'), users should
pass String(b'd\xe9j\xe0'.decode('latin-1')).
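
To illustrate the caller-side change, here is a small sketch. Note that
draw_string is a hypothetical stand-in for a text-only API, not
ReportLab's actual String class:

```python
def draw_string(text):
    # Hypothetical text-only API: accepts str and rejects bytes.
    if not isinstance(text, str):
        raise TypeError("text must be str; decode bytes before calling")
    return text


latin1_bytes = b"d\xe9j\xe0"

# The caller decodes explicitly, with the encoding it knows about.
label = draw_string(latin1_bytes.decode("latin-1"))

# Passing raw bytes is refused instead of being guessed at.
try:
    draw_string(latin1_bytes)
    rejected = False
except TypeError:
    rejected = True
```

The point is that the knowledge of the encoding stays with the caller,
which is the only place where it is really known.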
Claude
--
www.2xlibre.net