broadcasting request to LIMY Scheme maintainers: char-infty

Marco Maggi

Jun 12, 2009, 9:59:34 AM

This is a request to the Larceny, Ikarus, Mosh, Ypsilon
(LIMY) maintainers. When dealing with ranges of values, it
is useful to model data with half-open intervals:

[included-left, excluded-right)
(excluded-left, included-right]

With reals it is easy to represent a half-open interval that
extends all the way up or down to infinity: just use "+inf.0"
and "-inf.0" as open bounds.
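
A minimal sketch of the two half-open membership tests for reals, assuming an R6RS implementation. The helper names in-co? and in-oc? are hypothetical and chosen only for this illustration:

```scheme
(import (rnrs))

;; Hypothetical helper: #t when x lies in [lo, hi).
(define (in-co? x lo hi)
  (and (<= lo x) (< x hi)))

;; Hypothetical helper: #t when x lies in (lo, hi].
(define (in-oc? x lo hi)
  (and (< lo x) (<= x hi)))

(in-co? 42 0 +inf.0)       ; => #t, every real >= 0 is in [0, +inf.0)
(in-oc? -1e300 -inf.0 0)   ; => #t, every real <= 0 is in (-inf.0, 0]
(in-co? +inf.0 0 +inf.0)   ; => #f, the open bound itself is excluded
```

The infinities serve only as bounds here; no finite real is ever lost from a range because of them.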

Characters have no such values, so they push towards a
closed range model. I do not like this. So I am asking to
add a couple of values that act as infinities for
characters.

TIA
--
Marco Maggi

marcomaggi

Jun 12, 2009, 10:33:47 AM

On Jun 12, 3:59 pm, Marco Maggi <mrc....@gmail.com> wrote:
> So I am asking to add a couple of values that act
> as infinities for characters.

The easiest way is probably to add a library, let's call it
(char-bounds), that defines extended char predicates and adds two
bindings, one for char-overflow and one for char-underflow. So that:

(import (rnrs)
        (char-bounds))

(char<*? char-underflow #\a) => #t
(char<*? #\a char-overflow) => #t
(char*? char-underflow) => #t
(char*? char-overflow) => #t

Of course anyone can write such a library, but it is not clear how
to portably define new values for char-underflow and char-overflow
that do not conflict with existing values; it is easy, instead, for
an implementation to reserve an uninterned symbol or something
like that.

Sam TH

Jun 12, 2009, 10:59:54 AM

On Jun 12, 10:33 am, marcomaggi <mrc....@gmail.com> wrote:

> of course everyone can write such library, but it is not clear how
> to portably define new values for char-underflow and char-overflow
> that do not conflict with already existent values; instead, it is easy
> for an implementation to reserve an uninterned symbol or something
> like that.

This is exactly what structs (in particular, opaque ones) are for.

sam th

marcomaggi

Jun 12, 2009, 11:34:30 AM

Do you mean R6RS's nongenerative records?

Sam TH

Jun 12, 2009, 12:26:02 PM

No, the generative ones. It's important that they be generative so
that they are unforgeable.

sam th
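
One way the record-based approach might look, as a sketch only. It assumes an R6RS implementation; the record type name bound and the helper rank are hypothetical, and the exported names follow the (char-bounds) example earlier in the thread. Because the record type is generative and its constructor is not exported, no other code can create a value that passes for one of the two bounds:

```scheme
(library (char-bounds)
  (export char-underflow char-overflow char*? char<*?)
  (import (rnrs))

  ;; Generative, opaque, sealed record type (hypothetical name).
  ;; make-bound is deliberately not exported.
  (define-record-type bound
    (fields sign)
    (opaque #t)
    (sealed #t))

  (define char-underflow (make-bound -1))
  (define char-overflow (make-bound +1))

  ;; Extended character: an ordinary character or one of the two bounds.
  (define (char*? obj)
    (or (char? obj) (bound? obj)))

  ;; Hypothetical helper: map an extended character to an exact integer
  ;; so that underflow < every character < overflow.
  (define (rank x)
    (cond ((char? x) (char->integer x))
          ((eq? x char-underflow) -1)
          ((eq? x char-overflow) #x110000)
          (else (assertion-violation 'char<*?
                                     "not an extended character" x))))

  (define (char<*? a b)
    (< (rank a) (rank b))))
```

With this library installed, (char<*? char-underflow #\a) and (char<*? #\a char-overflow) both return #t, while (char? char-overflow) remains #f, so code that imports only (rnrs) is unaffected.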

marcomaggi

Jun 12, 2009, 2:37:16 PM

On Jun 12, 6:26 pm, Sam TH <sam...@gmail.com> wrote:
> > > This is exactly what structs (in particular, opaque ones) are for.
>
> > > sam th
>
> > Do you mean R6RS's nongenerative records?
>
> No, the generative ones. It's important that they be generative so
> that they are unforgeable.
>
> sam th

I give up. I will try to understand records another day. What I meant
is that, for example, Ypsilon uses "uintptr_t" as the internal
representation of characters. Changing it to "intptr_t" should allow
it to accept, as internal representations, the special values -1 and
1+#x10FFFF, which would be char-underflow and char-overflow. With this
change all the predicates work automatically; what is left is to make
a (char-bounds) library exporting the char-underflow and char-overflow
bindings. Of course, integer->char still rejects numbers outside the
range [0, #x10FFFF].

Sam TH

Jun 12, 2009, 4:41:58 PM

If what you want is a user implemented library, then the internal
representation of Ypsilon doesn't matter. If what you want is to
change the primitive behavior of R6RS characters, then what is the
library suggestion for?

sam th

marcomaggi

Jun 13, 2009, 2:22:48 AM

On Jun 12, 10:41 pm, Sam TH <sam...@gmail.com> wrote:
> If what you want is a user implemented library, then the internal
> representation of Ypsilon doesn't matter. If what you want is to
> change the primitive behavior of R6RS characters, then what is the
> library suggestion for?

Sorry, I did not explain myself clearly. What I would like is to let
everything work as it is now when we just import (rnrs). And
I would like the feature of general half-open ranges to be available
by just importing (char-bounds); that is, the only way to have the
two special characters is to access their values predefined in
the (char-bounds) library.

The small change in the internal representation and a small
change in the char->integer function (which should reject
the special characters) should be the only things needed. I am
asking for this change because, if the implementations work
as I understand or guess, the change is very small and a small
number of tests is enough to verify its correctness.

My post on the custom library was just to say that I know the
feature can be added by a custom library, and a custom library
can be used to partly prototype the change. I messed up my
explanation there.

William D Clinger

Jun 13, 2009, 6:42:23 AM

The simplest way to do this is not to add two new characters,
but to subtract two existing characters by defining the least
and greatest characters as your infinities:

(define char-negative-infinity #\x0)
(define char-positive-infinity #\x10ffff)

Those two definitions will give you a close analogue of -inf.0
and +inf.0, which are regarded as reals by both the IEEE-754
standard and by R6RS Scheme and also, for all non-NaN reals x,
satisfy (<= -inf.0 x +inf.0).

Will
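
A short sketch of how the two definitions above behave as bounds of a half-open range, assuming an R6RS implementation. The helper char-in-range? is hypothetical and not part of any standard library:

```scheme
(import (rnrs))

(define char-negative-infinity #\x0)
(define char-positive-infinity #\x10ffff)

;; Hypothetical helper: #t when c lies in the half-open range [lo, hi).
(define (char-in-range? c lo hi)
  (and (char<=? lo c) (char<? c hi)))

;; Every character satisfies the closed comparison, as with the reals.
(char<=? char-negative-infinity #\a char-positive-infinity)     ; => #t

;; As the excluded right bound, #\x10ffff itself falls outside the range.
(char-in-range? #\a char-negative-infinity char-positive-infinity)        ; => #t
(char-in-range? #\x10ffff char-negative-infinity char-positive-infinity)  ; => #f
```

The trade-off is that the character used as an excluded bound can never be a member of any range.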

Sam TH

Jun 13, 2009, 9:10:22 AM

On Jun 13, 2:22 am, marcomaggi <mrc....@gmail.com> wrote:

> My post on the custom library was just to say that I know the
> feature can be added by a custom library, and a custom library
> can be used to partly prototype the change. I messed up my
> explanation there.

But if it can be added as a library, why not do it that way? Then
everyone would have a portable implementation, and no one would have
to change the core.

Alternatively, Will's suggestion is by far the simplest in all
situations when you don't need the null character or Unicode character
10FFFF, which I think is in the "private use range", suggesting that
it's perfect for this use.

sam th

marcomaggi

Jun 13, 2009, 9:40:53 AM

First (damn!) I forgot the forbidden range [#xD800, #xDFFF], which
also needs to be taken into account.

On Jun 13, 3:10 pm, Sam TH <sam...@gmail.com> wrote:
> But if it can be added as a library, why not do it that way? Then
> everyone would have a portable implementation, and no one would have
> to change the core.

Because the change is simple and the resulting code would be faster.

> Alternatively, Will's suggestion is by far the simplest in all
> situations when you don't need the null character or Unicode character
> 10FFFF, which I think is in the "private use range", suggesting that
> it's perfect for this use.

Yes. This is what I do now in my rewrite of the char-set library
from the SRFI. It is a two-level library whose lower layer handles
lists like:

((start1 . past1) (start2 . past2) (start3 . past3) ...)

with the constraints:

start1 < past1 < start2 < past2 < start3 < past3 < ...

This layer abstracts over the type of the items: it works with any
type for which we can define =, <, <=, and next-of. In the upper,
char-specific layer I am using #x10FFFF as the upper bound, but it
seems incorrect to exclude a character (or two, if I also exclude
#xD800-1) because of an implementation issue.
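
The lower layer described above could be sketched roughly as follows, assuming an R6RS implementation. The procedure names valid-ranges? and ranges-contain? are hypothetical and are not taken from the actual library; the comparison procedures are passed in so the same code serves characters, numbers, or any other ordered type:

```scheme
(import (rnrs))

;; Hypothetical: #t when start1 < past1 < start2 < past2 < ... holds.
;; prev-past is #f only before the first pair; this assumes #f is never
;; a legitimate item, which is true for characters and numbers.
(define (valid-ranges? ranges item<?)
  (let loop ((rs ranges) (prev-past #f))
    (or (null? rs)
        (let ((start (caar rs))
              (past (cdar rs)))
          (and (or (not prev-past) (item<? prev-past start))
               (item<? start past)
               (loop (cdr rs) past))))))

;; Hypothetical: #t when item lies in some half-open range [start, past).
(define (ranges-contain? ranges item item<? item<=?)
  (exists (lambda (r)
            (and (item<=? (car r) item) (item<? item (cdr r))))
          ranges))

;; Digits and lowercase ASCII letters; #\: follows #\9 and #\{ follows #\z.
(define digits-and-lower
  (list (cons #\0 #\:) (cons #\a #\{)))

(valid-ranges? digits-and-lower char<?)                 ; => #t
(ranges-contain? digits-and-lower #\7 char<? char<=?)   ; => #t
(ranges-contain? digits-and-lower #\A char<? char<=?)   ; => #f
```

A range that should include #\x10ffff has no character available as its past value, which is the gap a dedicated upper bound would fill.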

Besides, the implementation with half-open ranges seems to me the
most usable and natural to think about.