Character codes for different spaces?


Ron Kaplan

Mar 12, 2025, 5:54:59 PM
to Interlisp
In working through the various places that keys are bound to Tedit actions, I came across a list of space mappings:

(MSPACE 153) =231Q emquad?
(NSPACE 152) =230Q enquad?
(THINSPACE 159) =237Q
(FIGSPACE 154) =232Q

These are not legal XCCS codes, and they don't appear to be Alto-font codes. Any idea of where they might be defined, and what kinds of documents they might appear in?
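The Q suffix in the list above is Interlisp's octal notation, so each entry gives the same number twice, once in decimal and once in octal. A quick check, written in Python purely for illustration (the table and function names are invented for this sketch):

```python
# Decimal codes from the list above, paired with the octal digits given after "=".
SPACE_CODES = {
    "MSPACE": (153, "231"),
    "NSPACE": (152, "230"),
    "THINSPACE": (159, "237"),
    "FIGSPACE": (154, "232"),
}

def octal_matches(decimal_code, octal_digits):
    """Return True when the octal digit string denotes the decimal code."""
    return int(octal_digits, 8) == decimal_code

for name, (dec, octal) in SPACE_CODES.items():
    print(name, dec, octal + "Q", octal_matches(dec, octal))
```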

Tedit seems just to mark these spaces as TEXT characters so that they are passed over when looking for word boundaries to operate on.
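A rough sketch of what that classification implies, in Python rather than Lisp. This is not Tedit's code: the class names, the table, and find_word_end are all hypothetical. The only facts taken from the message are the four character codes and that they are classed as TEXT.

```python
# Hypothetical character classes; Tedit's real tables are not shown in this thread.
TEXT, BREAK = "TEXT", "BREAK"

# The four space codes from the list above, all marked as TEXT.
SPECIAL_SPACES = {153, 152, 159, 154}  # MSPACE, NSPACE, THINSPACE, FIGSPACE

def char_class(code):
    """Classify a character code: ordinary whitespace breaks words, the rest is text."""
    if code in SPECIAL_SPACES:
        return TEXT
    if code in (9, 10, 13, 32):  # tab, LF, CR, space
        return BREAK
    return TEXT

def find_word_end(codes, start):
    """Return the index just past the word that begins at start."""
    i = start
    while i < len(codes) and char_class(codes[i]) == TEXT:
        i += 1
    return i

# "ab<MSPACE>cd ef": the scan passes over code 153 and stops at the ordinary space.
codes = [ord("a"), ord("b"), 153, ord("c"), ord("d"), 32, ord("e"), ord("f")]
print(find_word_end(codes, 0))
```

In this sketch the em space does not end the word, so "ab", the em space, and "cd" are treated as a single word.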

(Also, 159 is one of the interrupt codes that I use in wheelscroll--a separate set of constraints.)


Matt Heffron

Mar 12, 2025, 6:43:21 PM
to Medley Interlisp core
They appear to be Press character codes. 
In the file PRESS, in the function \PRESS.CONVERT.NSCHARACTER, three of these four spaces are generated from the corresponding NS character codes (along with some others). 
The FIGSPACE is not handled here.

Ron Kaplan

Mar 12, 2025, 8:09:21 PM
to Matt Heffron, Medley Interlisp core
Interesting.  If that's where they originated, there doesn't seem to be any reason at all for Tedit to know about them.

It may be reasonable to suppress word-breaking around those spaces, but then they should be assigned their XCCS codes (or eventually Unicode).

--
You received this message because you are subscribed to the Google Groups "Medley Interlisp core" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lispcore+u...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/lispcore/055c5a18-675c-4803-a0b7-22551a37d6b5n%40googlegroups.com.

Ron Kaplan

Mar 12, 2025, 8:10:19 PM
to John Cowan, Interlisp
Yes, but the question is why those conceptual characters are associated with those particular numbers.

On Mar 12, 2025, at 4:49 PM, John Cowan <co...@ccil.org> wrote:

They have fixed widths relative to the font size: one em, one en (usually half an em), the space between adjacent quotation marks (usually 1/5 or 1/6 of an em), and the width of a European digit.  They correspond to U+2003, U+2002,  U+2009, and U+2007 respectively.  They also have XCCS codes. See WP articles for details.


Nick Briggs

Mar 12, 2025, 9:27:29 PM
to Ron Kaplan, Matt Heffron, Lisp Core
If you do a FONTSAMPLER and look at, say, TimesRoman 12, with

(FontSample (FONTCREATE 'TIMESROMAN 12) 0 NIL 'DISPLAY)

you'll see (actually, you won't, since they're spaces...) those positions filled with the characters described.

As Matt points out --

(\PRESS.CONVERT.NSCHARACTER
  [LAMBDA (CHARCODE)                                         (* jds " 4-Nov-85 08:02")

          (* Provide backward compatibility for extended-language characters in the PRESS 
          printing environment. Converts certain of the NS characters into their 
          equivalent PARC-internal charcodes)

    (SELCHARQ CHARCODE
         (357,55                                             (* em quad)
                 153)
         (357,54                                             (* en quad)
                 152)
         (357,57                                             (* Thin space)
                 159)
         (357,44                                             (* en dash / figure dash)
                 155)
         (357,45                                             (* em dash)
                 156)
         (357,146                                            (* bullet)
                 183)
         (0,251                                              (* left single quote)
                 96)
         (0,271                                              (* right single quote)
                 39)
         (\CHAR8CODE CHARCODE])
)


If TEDIT is working with an Alto/Press font on the display, does it need to know about those particular characters?
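
The SELCHARQ clauses quoted above can be restated as a lookup table. The Python sketch below rests on two assumptions that this thread does not confirm: that a code written as 357,55 means character set 357Q and character 55Q (both octal) packed as (charset << 8) | char, and that \CHAR8CODE keeps the low eight bits of the code. The function and table names are invented for the sketch.

```python
def ns_code(charset_octal, char_octal):
    """Pack an octal charset and octal character into one 16-bit code (assumed layout)."""
    return (int(charset_octal, 8) << 8) | int(char_octal, 8)

# NS character -> Press code, copied from the SELCHARQ clauses.
NS_TO_PRESS = {
    ns_code("357", "55"): 153,   # em quad
    ns_code("357", "54"): 152,   # en quad
    ns_code("357", "57"): 159,   # thin space
    ns_code("357", "44"): 155,   # en dash / figure dash
    ns_code("357", "45"): 156,   # em dash
    ns_code("357", "146"): 183,  # bullet
    ns_code("0", "251"): 96,     # left single quote
    ns_code("0", "271"): 39,     # right single quote
}

def press_convert_ns_character(charcode):
    """Map an NS code to its Press code, else fall back to the low 8 bits (assumed)."""
    return NS_TO_PRESS.get(charcode, charcode & 0xFF)

print(press_convert_ns_character(ns_code("357", "55")))
print(press_convert_ns_character(ord("A")))
```

No clause produces 154, which agrees with Matt's observation that FIGSPACE is not handled in this function.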


Ron Kaplan

Mar 12, 2025, 11:01:25 PM
to Nick Briggs, Matt Heffron, Lisp Core
Those characters were assigned different mappings in the font description in the old Bravo documentation.  I use the \ASCII2XCCSMAP in INTERPRESS to fix things up when Tedit coerces ASCII-font characters to NS encodings, and that table maps (most of) these characters the way that Press does.  So there is still a little confusion.

I think I will fix up the Interpress table (maybe with multiple mappings in one direction) to bridge, and then use the XCCS names and codes for these characters, and base Tedit word-breaking on the XCCS codes.  An unconverted Timesroman Tedit file just won't select, move over, or delete the same "words".

John Cowan

Mar 14, 2025, 7:13:40 PM
to Ron Kaplan, Interlisp
They have fixed widths relative to the font size: one em, one en (usually half an em), the space between adjacent quotation marks (usually 1/5 or 1/6 of an em), and the width of a European digit.  They correspond to U+2003, U+2002,  U+2009, and U+2007 respectively.  They also have XCCS codes. See WP articles for details.
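
The Unicode side of that correspondence can be checked against Python's built-in Unicode database. This is only a lookup of the four code points named above; it says nothing about the XCCS or Press codes, and the helper name is invented for the sketch.

```python
import unicodedata

# Code points as listed in the message above, in the same order.
SPACES = [0x2003, 0x2002, 0x2009, 0x2007]

def describe(code_point):
    """Return 'U+XXXX NAME' using Python's built-in Unicode database."""
    return f"U+{code_point:04X} {unicodedata.name(chr(code_point))}"

for cp in SPACES:
    print(describe(cp))
```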

On Wed, Mar 12, 2025, 5:54 PM Ron Kaplan <ron.k...@post.harvard.edu> wrote: