Stephan H.M.J. Houben  
Newsgroups: comp.lang.lisp, comp.lang.scheme
From: steph...@wsan03.win.tue.nl (Stephan H.M.J. Houben)
Date: 30 Mar 2002 09:38:30 GMT
Local: Sat, Mar 30 2002 4:38 am
Subject: Re: Back to character set implementation thinking

In article <usn6kh477....@globalgraphics.com>, Pekka P. Pirinen wrote:
>> Basically then we would have strings which are UCS-4, UCS-2 and
>> Latin-1 restricted (internally, not visibly to users). [...]
>> Procedures like string-set! therefore might have to inflate (and
>> thus copy) the entire string if a value outside the range is stored.
>> But that's ok with me; I don't think it's a serious lose.

>I suppose that is a viable implementation strategy, but I don't think
>it's the right option.  The language should expose the range of string
>data types to the programmer, and let them choose, because the range
>of memory usage is just too great to sweep under the mat.  Also,
>having strings automatically reallocated means an extra indirection
>for access which cannot always be optimized away.

If you have more than one string type anyway, then you can have
both directly and indirectly represented strings. It is then
possible to arrange that any directly represented string can
be replaced with an indirectly represented string. Then,
arrange for the garbage collector to remove all indirections.
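
As a rough illustration of the scheme above, here is a minimal sketch in Python (not Lisp or Scheme, purely for brevity). All names in it (DirectString, resolve, string_set, snap_indirections) are hypothetical, and the "collector" is only a stand-in that rewrites a list of roots. A string that is too narrow for a new character is copied into a wider one and left behind as a forwarding pointer.

```python
from array import array

# Storage classes, narrowest first: (array typecode, largest code point held).
# 'B', 'H' and 'L' are at least 1, 2 and 4 bytes wide respectively.
WIDTHS = [("B", 0xFF), ("H", 0xFFFF), ("L", 0x10FFFF)]


def level_for(code_point):
    """Return the index of the narrowest storage class that fits code_point."""
    for level, (_, limit) in enumerate(WIDTHS):
        if code_point <= limit:
            return level
    raise ValueError("not a Unicode code point")


class DirectString:
    """Characters stored inline at one fixed width (hypothetical layout)."""

    def __init__(self, codes, level):
        self.level = level
        self.cells = array(WIDTHS[level][0], codes)
        self.forward = None  # set once this object has become an indirection


def make_string(text):
    codes = [ord(c) for c in text]
    return DirectString(codes, level_for(max(codes, default=0)))


def resolve(s):
    """Follow forwarding pointers to the object that really holds the data."""
    while s.forward is not None:
        s = s.forward
    return s


def string_ref(s, i):
    return chr(resolve(s).cells[i])


def string_set(s, i, ch):
    target = resolve(s)
    code = ord(ch)
    if code > WIDTHS[target.level][1]:
        # Inflate: copy into a wider string, leave a forwarding pointer behind.
        wider = DirectString(list(target.cells), level_for(code))
        target.forward = wider
        target.cells = None
        target = wider
    target.cells[i] = code


def snap_indirections(roots):
    """Stand-in for the GC pass: repoint every root at the real string."""
    return [resolve(s) for s in roots]
```

Until the collector runs, every access pays for the resolve loop; afterwards the roots point directly at the wide string again.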

Again, this is not that much more complex once you have decided to
go for multiple string types anyway. Moreover, it is
completely transparent to the programmer and it can provide
other useful features, e.g. growing of strings. Indeed, it is
even possible for the implementation to dynamically decide to
overallocate storage once a string has been grown, so that
naively building a string character-by-character will be
O(n).
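
The overallocation argument can be sketched the same way. The class below is hypothetical and uses a doubling policy with an initial capacity of 4, both of which are assumptions rather than anything proposed in the thread. It counts how many characters get moved during reallocation, so the linear bound can be checked directly.

```python
class GrowableString:
    """Append-only string buffer that doubles its capacity when full."""

    def __init__(self):
        self.cells = [None] * 4  # assumed initial capacity
        self.length = 0
        self.copied = 0          # characters moved during reallocations

    def append(self, ch):
        if self.length == len(self.cells):
            # Overallocate: double the storage instead of growing by one.
            bigger = [None] * (2 * len(self.cells))
            bigger[:self.length] = self.cells
            self.copied += self.length
            self.cells = bigger
        self.cells[self.length] = ch
        self.length += 1

    def to_text(self):
        return "".join(self.cells[:self.length])


def build(text):
    """Build a string naively, one character at a time."""
    s = GrowableString()
    for ch in text:
        s.append(ch)
    return s
```

With doubling, the total number of characters copied stays below twice the final length, so n appends cost O(n) overall. Growing by a fixed increment instead would make the same loop quadratic.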

All this adds implementation complexity, but it makes string handling
much easier on the programmer.

To go even further: one could provide lazy string copying with
copy-on-write, optimised string concatenation in which
substrings are shared, and since the OP wants to replace files
by strings, he could even consider having the GC dynamically
compress and uncompress large strings.
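
Of these, lazy copying is the easiest to sketch. The following is one possible arrangement, with hypothetical names (Buffer, CowString) and an explicit owner count standing in for whatever bookkeeping a real implementation would use. Copies share storage until one of them is written to.

```python
class Buffer:
    """Shared character storage plus a count of strings that refer to it."""

    def __init__(self, chars):
        self.chars = chars
        self.owners = 1


class CowString:
    """String whose copies share storage until the first write."""

    def __init__(self, text=""):
        self._buf = Buffer(list(text))

    def copy(self):
        # O(1): the clone points at the same buffer.
        clone = CowString.__new__(CowString)
        clone._buf = self._buf
        self._buf.owners += 1
        return clone

    def set(self, i, ch):
        if self._buf.owners > 1:
            # Someone else still sees this buffer: take a private copy first.
            self._buf.owners -= 1
            self._buf = Buffer(list(self._buf.chars))
        self._buf.chars[i] = ch

    def __str__(self):
        return "".join(self._buf.chars)
```

The price is the same extra indirection as before: every access goes through the shared buffer object.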

OK, this is really overengineered, but anyway...

Greetings,

Stephan

>I note that offering multiple string types is exactly what all the CL
>implementations seem to have done.  This doesn't preclude having
>features that automatically select the smallest feasible type, e.g.,
>for "" read syntax or a STRING-APPEND function.
>--
>Pekka P. Pirinen
>The gap between theory and practice is bigger in practice than in theory.


 