On 29/10/2022 16:30, Bart wrote:
> On 29/10/2022 15:16, James Harris wrote:
>> On 29/10/2022 13:24, Bart wrote:
>>> On 29/10/2022 12:23, James Harris wrote:
>>
>>
>>>> I guess there would be these kinds of string argument:
>>>>
>>>> 1. Read-write string. Anything could be done to the string by the
>>>> callee. (Would have to be a real string.)
>>>>
>>>> 2. Read-write fixed-length string. The string's contents could be
>>>> altered but it could not be made longer or shorter. (Could be a real
>>>> string or a slice.)
>>>>
>>>> 3. Read-only string. Neither its length nor its contents could be
>>>> altered by the callee. (Could be a real string or a slice.)
>>>
>>> 4. Extensible string. This is not quite the same as your (1) which
>>> requires only a mutable string.
>>
>> You mean a string which can be made longer but the existing contents
>> could not be changed? I cannot think of a use case for that.
>
> That's a pattern I used all the time to incrementally build strings, for
> example to generate C or ASM source files from a language app.
>
> Or it can be as simple as this:
>
> errormess +:= " on line "+tostr(linenumber)
>
> Once extended, the existing parts of the string are never modified.

Good examples. The 'extend' permission seems a bit specific although I
accept that the uses you mention are common. I suppose it adds to the
security of the language to be able to designate a string as
extensible/inextensible separately from designating whether its existing
contents can be changed or not.

How would it be used? Thinking about functions which take a string as
input, most strings would be purely inputs. They would therefore be both
read-only and inextensible within the called function. Such arguments
could be strings or slices.

Further, functions which /return/ a string would create the string and
return it whole.

It is only functions which /modify/ a string, i.e. take it as an inout
parameter, where it would matter whether the string was read/write or
extensible. For an inout string what should be the defaults? If we say
an inout string defaults to immutable and inextensible then that would
lead to the following ways to specify a string, s, as a parameter:

   f: function(s: inout string char)
   f: function(s: inout string char rw)
   f: function(s: inout string char ext rw)
   f: function(s: inout string char ext)

Note the "ext" and "rw" attributes. The idea is that they would specify
how the string could be modified in the function. Adding rw would allow
the string's existing contents to be taken as read-write rather than
read-only. Adding ext would allow the string to be extended.
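
To make the two attributes concrete, here is a rough model in Python
rather than in the language itself. The class StringParam and its rw/ext
flags are invented purely for this illustration, and the checks happen at
run time here whereas a real compiler would presumably make them at
compile time.

```python
class StringParam:
    """Hypothetical run-time model of an inout string parameter.

    rw  -- existing elements may be overwritten
    ext -- the string may be made longer by appending
    """

    def __init__(self, data, rw=False, ext=False):
        self._data = data      # the caller's list, shared rather than copied
        self._rw = rw
        self._ext = ext

    def __len__(self):
        return len(self._data)

    def __getitem__(self, i):
        return self._data[i]   # reading is always allowed

    def __setitem__(self, i, value):
        if not self._rw:
            raise PermissionError("string was not passed as rw")
        self._data[i] = value

    def append(self, value):
        if not self._ext:
            raise PermissionError("string was not passed as ext")
        self._data.append(value)


def add_line_number(s, n):
    """Callee needing only ext: it appends and never touches old contents."""
    for ch in " on line " + str(n):
        s.append(ch)


msg = list("syntax error")
add_line_number(StringParam(msg, ext=True), 42)
print("".join(msg))
```

With ext but not rw, the callee above can grow the caller's string but any
attempt to assign to an existing element is refused.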

That's effectively me thinking out loud and trying out some ideas. How
does it look to you?

What about other permissions such as prepend, split, insert, delete,
etc? Having a separate qualifier for each might be too fine-grained,
although I can see value in using such info to help match caller and
callee. For example, given the above one could say that as long as the
callee doesn't specify the string as ext then the argument could be
either a string or a slice. That is appealing from a security
perspective.

That said, can a compiler ensure that a string is not used in a way
which breaks the contract indicated by its keywords? You raise some big
issues!
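
The caller/callee matching part, at least, looks easy enough to check
ahead of time if every argument carries a note of what it can support.
A rough Python model follows. Arg, Param and compatible are names made up
for this sketch, and treating ext as requiring a writable argument is my
assumption. It covers only the binding of argument to parameter, not
what the callee's body then does with the string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arg:
    """What the caller is offering (hypothetical descriptor)."""
    is_slice: bool      # slices have a fixed length
    writable: bool      # caller allows the string to be modified


@dataclass(frozen=True)
class Param:
    """What the callee declares it needs (mirrors the rw/ext keywords)."""
    rw: bool = False
    ext: bool = False


def compatible(arg, param):
    """Return True if arg may be bound to param."""
    if param.ext and arg.is_slice:
        return False            # a slice cannot be extended
    if (param.rw or param.ext) and not arg.writable:
        return False            # callee wants to modify a read-only argument
    return True


whole = Arg(is_slice=False, writable=True)
part = Arg(is_slice=True, writable=True)

print(compatible(whole, Param(ext=True)))
print(compatible(part, Param(rw=True)))
print(compatible(part, Param(ext=True)))
```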
>
> Perhaps you can give an example of where mutating the characters of a
> string, extensible or otherwise, comes in useful.
I intend a string to be simply an array whose length can be changed. The
idea being that a program could have a string of integers, a string of
floats etc just as easily as having a string of characters. As such,
anything which changes the content of an array should also work on
strings. For example, one might want to sort an array in place. As a
string of characters one might want to convert lower case to upper case,
etc.
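
To illustrate, here is the idea modelled in Python, with a list standing
in for a string of chars or a string of integers. The helper name
upper_in_place is made up, and it handles only ASCII letters to keep the
sketch short.

```python
def upper_in_place(chars):
    """Convert ASCII lower case to upper case, modifying chars in place.

    chars is a list of integer code values, standing in for a string of chars.
    """
    for i, c in enumerate(chars):
        if ord("a") <= c <= ord("z"):
            chars[i] = c - 32   # 'a'..'z' sit 32 above 'A'..'Z' in ASCII


text = [ord(ch) for ch in "hello, world"]
upper_in_place(text)
print("".join(chr(c) for c in text))

numbers = [5, 3, 9, 1]      # a "string of integers"
numbers.sort()              # same in-place treatment as any array
numbers.append(12)          # and, being a string, it can also grow
print(numbers)
```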
>
> (My strings generally are mutable, but it's not a feature I use a great
> deal.
>
> For applications like text editors, I use a list of strings, one per
> line. And editing within each line creates a new string for each edit.
> Efficiency here is not critical, and the needs are diverse, like
> deleting within the string, or insertion. It's just easier to construct
> a new one.)
OK.
..
>>>
>>> (You might further split that into mutable/non-mutable extensible
>>> strings. Usually if growing a string by appending to it, you don't
>>> want to also alter existing parts of the string.)
>>
>> Mutable and extensible are good descriptions though as above I don't
>> yet see the value in allowing a string to be extensible but its
>> existing contents to be immutable.
>>
>> A slice would be inextensible but could be mutable or immutable, AISI.
>>
>>>
>>> (You probably need to consider Unicode strings too, especially if
>>> represented as UTF8, as the meaning of 'length' needs pinning down.)
>>
>> I haven't mentioned it but ATM my chars are 32-bit and any 32-bit
>> value can be stored in them, including zero. It also means there's no
>> way to reserve a value for EOF so that condition has to be handled a
>> different way from what C programmers are used to where EOF is a value
>> which is outside the range permitted for chars. Challenges aplenty!
>
> But you're not using all 2**32 bit patterns? It could reserve -1 or all
> 1s for EOF just like C does. Because EOF would generally be used for
> character-at-a-time streaming, which is typically 8-bit anyway.

As above, the language is meant to treat strings as arrays. So AISI it
should not ascribe any particular meaning to their contents.

There are other ways. For example, my plan for EOF is twofold:

1. to have it as an attribute of a file object

2. to have an attempt to read at EOF throw a weak exception which would
   be a catchable way to end an iteration.
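
Roughly, and in Python rather than my own language, the two parts could
fit together like this. CharReader and EndOfFile are hypothetical names,
and Python has no notion of a 'weak' exception, so an ordinary one stands
in for it.

```python
class EndOfFile(Exception):
    """Stands in for the proposed weak exception."""


class CharReader:
    """Hypothetical file object over an in-memory sequence of char values."""

    def __init__(self, values):
        self._values = list(values)
        self._pos = 0

    @property
    def eof(self):
        # Part 1: EOF is an attribute of the file object, not a char value.
        return self._pos >= len(self._values)

    def read(self):
        # Part 2: reading at EOF raises instead of returning a sentinel.
        if self.eof:
            raise EndOfFile
        value = self._values[self._pos]
        self._pos += 1
        return value


# 0 and 0xFFFFFFFF are ordinary data here; neither is mistaken for EOF.
f = CharReader([72, 0, 0xFFFFFFFF, 105])
seen = []
try:
    while True:
        seen.append(f.read())
except EndOfFile:
    pass                      # the exception simply ends the iteration
print(seen, f.eof)
```

Because end of file is never encoded as a char value, all 2**32 bit
patterns remain available as data.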
>
> Or have you developed a binary file system which works with 32-bit-wide
> 'bytes'?
No, my system is nothing like that advanced. At present all bytes
(octets) I read from disk are zero extended to 32 bits. And all chars I
write to disk have their top 24 zero bits chopped off. Though please
don't think that's by design. It's only a temporary measure while I get
the compiler up and running properly. (The compiler and the compilable
language are, at present, rather limited.) In the long term IO streams
should be via typed channels where chars of octets (or some other size)
could be handled natively.
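
For what it's worth, the temporary scheme amounts to the following, shown
in Python. The function names are made up for this sketch, and rejecting
values above 255 on output is my choice here; all the scheme itself says
is that the top 24 bits are zero and get dropped.

```python
def octets_to_chars(data):
    """Zero-extend each octet read from disk to a wider char value."""
    # Iterating over bytes yields ints 0..255, i.e. already zero-extended.
    return list(data)


def chars_to_octets(chars):
    """Drop the top 24 bits of each char, which are expected to be zero."""
    for c in chars:
        if c >> 8:
            raise ValueError(f"char {c:#x} does not fit in an octet")
    return bytes(chars)


raw = b"EOF?\x00\xff"
chars = octets_to_chars(raw)
assert chars_to_octets(chars) == raw    # the round trip is lossless
```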
--
James Harris