Hi Joachim,
In Smalltalk, an (almost) ideal world of objects, we tend to forget the real world: the CFS file stream hierarchy is based on Strings.
Historically, this hierarchy already had to be duplicated (CfsLeadEncodedFileStream) to handle DBStrings (double-byte strings). I won't go back any further: display code was 6 bit, ASCII 7 bit, and EBCDIC was the first 8-bit (= byte) code.
Now, with UnicodeStrings supporting UTF-8, UTF-16, etc., there is no corresponding (third) hierarchy yet.
To stay philosophical: this reminds me to halt and think about a redesign, instead of duplicating something yet again without a sound argument for doing so.
The problem to be solved is marshalling, that is, converting representations from the external world (files of something) into the image (streams of something).
Streams are inherently "of something", i.e. already object oriented, so conceptually the design of streams is clean.
The design of the closely related file streams, however, is not clean, because the information "of what", which has to come from the external world, is missing. We open a file of what?
Traditionally (see the open dialogs of many programs), this missing information is supplied from somewhere else. Sometimes it is even hidden in the content of the file (a byte order mark, BOM), or it is left to trial and error or to inspection on the fly while reading (as in most editors).
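For illustration, here is a minimal sketch of the BOM route: reading the "of what" from the first bytes of the file. It is written in Python rather than VA Smalltalk. The helper name sniff_encoding and the Latin-1 fallback are hypothetical choices, not part of any CFS API. A file without a BOM still carries no answer, which is exactly the gap described above.

```python
import codecs
from typing import Tuple

# Hypothetical helper: known byte order marks, longest first, because the
# UTF-32-LE mark (FF FE 00 00) begins with the UTF-16-LE mark (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_encoding(head: bytes, default: str = "latin-1") -> Tuple[str, int]:
    """Return (encoding, bom_length) guessed from the leading bytes of a file.

    Without a BOM the file says nothing about itself, so the caller's
    default (an assumption supplied from outside) is returned.
    """
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name, len(bom)
    return default, 0
```

The fallback parameter mirrors the traditional situation: when the file is silent, the information has to come from the outside (a dialog, a locale, a hard-coded choice).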
In Smalltalk, however, this missing information is hard coded (see CfsFileStream>>#initialize, String new: buffersize).
The decision to use DBStrings (via CfsLeadEncodedFileStream) was historically delegated to the locale, so it was configured from the outside.
There is even a little-known third variant, concerning bytes versus characters, when using streams with the CfsFileStream>>#isBytes: protocol (reflecting the historical difference between binary and text file representations; see the 6-, 7- and 8-bit codes I mentioned earlier).
Now all of this has become insufficient. To stay competitive, it cannot simply be made configurable via the locale again, as was done for DBStrings.
We already have wide characters (larger than a byte) under Windows, and this has to be extended again to support UTF.
Besides, any extension has to be checked carefully against the ANSI Smalltalk standard, which imposes rules on classes such as streams.
My conceptual idea is something like ReadStream on: UnicodeString new, which supplies the missing information "of something".
This is going to become complex.
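To make the idea concrete, here is a rough sketch in Python, offered as an analogy rather than a proposal for the CFS classes. The element kind is passed in when the stream is created instead of being hard-coded in an initialize method. The class FileStreamOf and its methods are hypothetical. encoding=None plays the role of the bytes (isBytes:) variant, and an encoding name gives a stream of characters. Python's own io.TextIOWrapper, which takes an encoding over a binary stream, follows a similar principle.

```python
import codecs
import io
from typing import BinaryIO, Optional, Union

class FileStreamOf:
    """Hypothetical read stream that is told "of what" it is at creation time.

    encoding=None  -> stream of bytes (ints), like the binary / isBytes: case
    encoding="..." -> stream of characters decoded with that codec
    """

    def __init__(self, raw: BinaryIO, encoding: Optional[str] = None, chunk: int = 4096):
        self._raw = raw
        self._chunk = chunk
        # An incremental decoder keeps incomplete multi-byte sequences between
        # reads, so one character may span two chunks of the underlying file.
        self._decoder = codecs.getincrementaldecoder(encoding)() if encoding else None
        self._buffer: Union[bytes, str] = b"" if self._decoder is None else ""
        self._pos = 0
        self._exhausted = False

    def _fill(self) -> bool:
        """Ensure an element is buffered; return False at end of stream."""
        while self._pos >= len(self._buffer):
            if self._exhausted:
                return False
            data = self._raw.read(self._chunk)
            self._exhausted = not data
            if self._decoder is None:
                self._buffer = data
            else:
                # final=True on the last (empty) read flushes the decoder and
                # raises UnicodeDecodeError if a sequence was left incomplete.
                self._buffer = self._decoder.decode(data, final=self._exhausted)
            self._pos = 0
        return True

    def at_end(self) -> bool:
        return not self._fill()

    def next(self) -> Union[int, str, None]:
        """Return the next element (int for bytes, str for characters) or None."""
        if not self._fill():
            return None
        element = self._buffer[self._pos]
        self._pos += 1
        return element

# Usage: a tiny chunk size forces multi-byte characters to straddle reads.
chars = FileStreamOf(io.BytesIO("héllo€".encode("utf-8")), "utf-8", chunk=2)
text = ""
while not chars.at_end():
    text += chars.next()
print(text)  # héllo€
```

The buffer type follows from the element kind rather than being fixed in advance. The String new: buffersize in CfsFileStream>>#initialize is the opposite: there, the buffer type is decided before the file is known.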
My experiment was to simply replace the hard-coded
CfsFileStream>>#initialize, String new: buffersize
with
CfsFileStream>>#initialize, UnicodeString new: buffersize
This causes recursive walkbacks, because the whole underlying character support around locales in turn depends on old-style streams.
Kind regards
M