case-sensitivity and identifiers (was Re: Wide character implementation)
Brian Spilsbury  
Newsgroups: comp.lang.lisp
From: br...@designix.com.au (Brian Spilsbury)
Date: 29 Mar 2002 11:02:02 -0800
Local: Fri, Mar 29 2002 2:02 pm
Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)

No, kanji are not used for syntax; syntax in Japanese is mediated via
hiragana/katakana modifiers and particles (also represented in
hiragana, although 'wo' is distinct, and the particle 'ha' is
pronounced 'wa').

Kanji give direct semantic forms. These semantic forms are distinct
from any pronunciation, and the pronunciation of a particular
sequence of kanji is determined by the phonetic modifiers trailing it,
and/or the combination of a sequence of kanji through the On (Chinese
derived) or Kun (Japanese) readings, which are not mixed in a given
sequence [a bit like composing Latin with Latin, and Greek with
Greek].

You might think of kanji as giving a particular root form.

[watakushi]-ha [ni][hon][go]-wo [ben][kyou] shite-imashita.
[I]-topic [japanese][language]-object [study] doing-was.

[boku]-ga basu-de [kou][kou]-e [i](ki)mashita.
[I]-subject bus-by [highschool]-to [go](phonetic kanji
modifier)-past-tense.

the [bracketed] forms are kanji in these examples.

my japanese is a bit rusty, so please excuse any error.

as a side note, a study found that quite different areas of the brain
are used to process the hiragana/katakana forms and the kanji forms,
and a different study found that reasonably severely dyslexic american
(english speaking) children were able to learn several hundred chinese
characters without undue difficulty, although they were unable to read
roman characters.

as a final note, a certain jesuit missionary declared that the
japanese written language was designed by the devil, and I think that
anyone who is familiar with it would be inclined to agree. ^^


 
Matthias Blume  
Newsgroups: comp.lang.lisp
From: Matthias Blume <matth...@shimizu-blume.com>
Date: 29 Mar 2002 15:52:31 -0500
Local: Fri, Mar 29 2002 3:52 pm
Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)

Just in case you care what Brian's examples look like when rendered
in actual Kanji and hiragana (requires sufficient MIME support in
your newsreader):

> [watakushi]-ha [ni][hon][go]-wo [ben][kyou] shite-imashita.
> [I]-topic [japanese][language]-object [study] doing-was.

pure hiragana: わたくしにほんごをべんきょうしていました。
with kanji:    私は日本語を勉強していました。

> [boku]-ga basu-de [kou][kou]-e [i](ki)mashita.
> [I]-subject bus-by [highschool]-to [go](phonetic kanji
> modifier)-past-tense.

pure hiragana: ぼくがばすでこうこうへいきました。
with kanji:    僕がバスで高校へ行きました。

マティアス


 
Matthias Blume  
Newsgroups: comp.lang.lisp
From: Matthias Blume <matth...@shimizu-blume.com>
Date: 29 Mar 2002 16:05:59 -0500
Local: Fri, Mar 29 2002 4:05 pm
Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)

Matthias Blume <matth...@shimizu-blume.com> writes:
> Just in case you care what Brian's examples look like when rendered
> in actual Kanji and hiragana (requires sufficient MIME support in
> your newsreader):

> > [watakushi]-ha [ni][hon][go]-wo [ben][kyou] shite-imashita.
> > [I]-topic [japanese][language]-object [study] doing-was.

> pure hiragana: わたくしにほんごをべんきょうしていました。

Oops, typo.  Should have been:

                 わたくしはにほんごをべんきょうしていました。

> > [boku]-ga basu-de [kou][kou]-e [i](ki)mashita.
> > [I]-subject bus-by [highschool]-to [go](phonetic kanji
> > modifier)-past-tense.

> pure hiragana: ぼくがばすでこうこうへいきました。
> with kanji:    僕がバスで高校へ行きました。

By the way, notice how "bus" (basu) is spelled in _katakana_.

Anyway, this is perhaps getting a bit off-topic... :-)

Matthias


 
Torsten  
Newsgroups: comp.lang.lisp
From: Torsten <vi...@fraqz.archeron.dk>
Date: Sat, 30 Mar 2002 00:11:03 +0000 (UTC)
Local: Fri, Mar 29 2002 7:11 pm
Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)

Nils Goesche <n...@cartan.de> wrote:
> I don't know Danish.  Maybe it doesn't make a difference there.
> Maybe Danes only used to capitalize nouns because the Germans
> did, and as the Germans weren't exactly very popular in 1948,
> that might have been a good opportunity to give up on it.

The capitalization tradition had the same origin as the German
one. It started as a fad among printers back when printing was a
relatively new craft. It is also true that the lack of popularity
of anything German in the late forties made it possible to
pass the bill that put the spelling reform into effect, but it
wasn't the reason for doing it, it just provided the necessary
leverage in the general public. The idea goes back to the 19th
century, but was for a long time met with scorn, partly due to
inertia and conservatism and partly due to skepticism. Why change
something that has worked well for centuries? Well, it hadn't
worked well. Many people couldn't figure out which words should
be capitalized. That was the real reason for the reform. The
main arguments against it invariably consisted of examples, like
the ones you gave in German, where the capitalization eliminates
ambiguity. In isolation, that was - and is - correct, but what
the critics overlooked is that sentences don't occur in total
isolation. They are always part of a larger conversational
context, a discourse, and that is what implicitly removes the
ambiguity.

> Or maybe it wasn't. Who is supposed to know anymore? You said
> there was a controversy about it; maybe the people who were
> against it were right? Who would remember? How could you tell?

The discussions ended decades ago when people realized that
nothing really had been lost and the new system made it easier
for people with a more modest formal knowledge of grammar to
write in a way that reasonably conforms to the official norm. A
net gain for everybody. And, even today, there are still quite a
few people around who originally learned the old system.

I think we should end this subthread now as it has nothing to do
with Lisp anymore.

--
Torsten


 
Nils Goesche  
Newsgroups: comp.lang.lisp
From: Nils Goesche <n...@cartan.de>
Date: 30 Mar 2002 01:28:55 +0100
Local: Fri, Mar 29 2002 7:28 pm
Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)

Torsten <vi...@fraqz.archeron.dk> writes:
> I think we should end this subthread now as it has nothing to do
> with Lisp anymore.

So be it.  I disagree of course, but so what.  I've made my
point, you've made your point, everything is clear.

Regards,
--
Nils Goesche
Ask not for whom the <CONTROL-G> tolls.

PGP key ID 0xC66D6E6F


 
Hartmann Schaffer  
Newsgroups: comp.lang.lisp
From: h...@heaven.nirvananet (Hartmann Schaffer)
Date: 29 Mar 2002 23:53:37 -0500
Local: Fri, Mar 29 2002 11:53 pm
Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)
In article <a81d1j$25j...@news.cybercity.dk>,
        Torsten <vi...@fraqz.archeron.dk> writes:

i remember a feuilleton article/editorial in the frankfurter
allgemeine quite a while ago (around 1970).  it was during one of the
ever returning discussions in the german language area about a reform
of the capitalisation rules.  in the article, the
author suggested that they should not be simplified, since some of the
more obscure rules would help distinguish between better and less well
educated persons.  iirc, he didn't put forward his position tongue in
cheek

hs

--

don't use malice as an explanation when stupidity suffices


 
Hartmann Schaffer  
Newsgroups: comp.lang.lisp
From: h...@heaven.nirvananet (Hartmann Schaffer)
Date: 30 Mar 2002 00:01:44 -0500
Local: Sat, Mar 30 2002 12:01 am
Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)
In article <fo3cyjfm6j....@trex10.cs.bell-labs.com>,
        Matthias Blume <matth...@shimizu-blume.com> writes:

> ...
> There are on the order of 50 hiragana, but there are several thousands
> of Kanji -- which means that learning just hiragana is immensely easier
> than learning both.  According to the above, one would expect that
> someone without prior exposure to either system would have an easier
> time reading pure hiragana text.

> I, having not been raised in Japan, fall into this category of having
> no prior exposure.  But what can I tell you?  The moment I managed to
> memorize even just a tiny number of Kanji, sentences that actually
> used them (in place of their hiragana spellings) became *vastly*
> easier to read for me.  I am not a psychologist or linguist, so I
> won't speculate on why that is.

does japanese have many homophones?  from what i remember having read a
while ago, the kanji characters quite often are taken over from
chinese to designate the japanese word for the chinese word the
character was developed for.  if the language is rich in homophones,
this would help distinguish between identically sounding words with
totally different meanings

hs

--

don't use malice as an explanation when stupidity suffices


 
Discussion subject changed to "Back to character set implementation thinking" by Brian Spilsbury
Brian Spilsbury  
Newsgroups: comp.lang.lisp
From: br...@designix.com.au (Brian Spilsbury)
Date: 30 Mar 2002 00:42:40 -0800
Local: Sat, Mar 30 2002 3:42 am
Subject: Re: Back to character set implementation thinking

I think that this approach separates things which do not require it.

If we view a string as a sequence rather than a vector, I believe that
most of these problems evaporate.

A sequence contains things which have both
vector-access-characteristics and list-access-characteristics.

The problem is that sequences in CL have relatively poor iteration
support.

One of the more complex things that we might want to do with a string
is to tokenise it.

(let ((last-point nil))
  (dosequence (char point string)
    (when (char= char #\,)
       (if last-point
           (collect (subseq string :start-point last-point :end-point point))
           (setq last-point point)))))

for a half-baked example, to break up a string into a list of comma
delimited strings.
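
A rough equivalent in standard Common Lisp, for comparison: `dosequence`, `collect`, and the `:start-point`/`:end-point` keywords above are hypothetical, so this sketch falls back on integer indices as the saved points and uses only `position` and `subseq`. The name `split-on-commas` is made up for illustration.

```lisp
;; Sketch only: integer indices stand in for the "points" described above.
(defun split-on-commas (string)
  "Return a list of the comma-delimited substrings of STRING."
  (loop with start = 0
        for comma = (position #\, string :start start)
        ;; When COMMA is NIL, SUBSEQ runs to the end of the string.
        collect (subseq string start comma)
        while comma
        do (setf start (1+ comma))))
```

For instance, `(split-on-commas "a,b,,c")` returns `("a" "b" "" "c")`. Integer indices are cheap on a vector-backed string; on a linear-access substrate each `position` or `subseq` call would have to walk from the start, which is the cost the proposed points are meant to avoid.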

The key here is the ability to access a sequence from a stored point
in the sequence, and to use these points to delimit sequence actions.

Given this a string can easily have either kind of substrate - a
random access, or linear access implementation, and this behaviour
extends naturally to lists.

There are some issues with points and the mutation of the string, as
well as the usable life-time of the points, but I think that these can
be addressed with some thought.

This also does not preclude the (expensive) random access of a
variable-width character string, and would also tie into the lazy
construction of sequences (whereby you might deal with a file as a
lazy sequence, something like a lisp version of mmap).

Anyhow, given that variable-width-character strings would tend to be
immutable (or perhaps extensible and truncatable) points should have
few problems there. I don't see any issues with points into lists
either.

Regards,

Brian


 
Discussion subject changed to "case-sensitivity and identifiers (was Re: Wide character implementation)" by Torsten
Torsten  
Newsgroups: comp.lang.lisp
From: Torsten <vi...@fraqz.archeron.dk>
Date: Sat, 30 Mar 2002 09:37:28 +0000 (UTC)
Local: Sat, Mar 30 2002 4:37 am
Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)

Nils Goesche <n...@cartan.de> wrote:
> So be it.  I disagree of course, but so what.  I've made my
> point, you've made your point, everything is clear.

Yup :)

Have fun,
--
Torsten


 
Discussion subject changed to "Back to character set implementation thinking" by Stephan H.M.J. Houben
Stephan H.M.J. Houben  
Newsgroups: comp.lang.lisp, comp.lang.scheme
From: steph...@wsan03.win.tue.nl (Stephan H.M.J. Houben)
Date: 30 Mar 2002 09:38:30 GMT
Local: Sat, Mar 30 2002 4:38 am
Subject: Re: Back to character set implementation thinking

In article <usn6kh477....@globalgraphics.com>, Pekka P. Pirinen wrote:
>> Basically then we would have strings which are UCS-4, UCS-2 and
>> Latin-1 restricted (internally, not visibly to users). [...]
>> Procedures like string-set! therefore might have to inflate (and
>> thus copy) the entire string if a value outside the range is stored.
>> But that's ok with me; I don't think it's a serious lose.

>I suppose that is a viable implementation strategy, but I don't think
>it's the right option.  The language should expose the range of string
>data types to the programmer, and let them choose, because the range
>of memory usage is just too great to sweep under the mat.  Also,
>having strings automatically reallocated means an extra indirection
>for access which cannot always be optimized away.

If you have more than one string type anyway, then you can have
both directly and indirectly represented strings. It is then
possible to arrange that any directly represented string can
be replaced with an indirectly represented string. Then,
arrange for the garbage collector to remove all indirections.

Again, this is not that much more complex once you have decided to
go for multiple string types anyway. Moreover, it is
completely transparent to the programmer and it can provide
other useful features, e.g. growing of strings. Indeed, it is
even possible for the implementation to dynamically decide to
overallocate storage once a string has been grown, so that
naively building a string character-by-character will be
O(n).

All this adds implementation complexity, but it makes string handling
much easier on the programmer.
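
Standard Common Lisp already offers something close to the growth scheme described above, through adjustable vectors with fill pointers. A minimal sketch follows; the function name `build-string` and the doubling policy are illustrative assumptions, not anything specified in this thread, and the automatic inflation between character widths is not modelled.

```lisp
(defun build-string (chars)
  "Append each character in the list CHARS to an initially empty, growable string."
  (let ((buffer (make-array 0 :element-type 'character
                              :adjustable t
                              :fill-pointer 0)))
    (dolist (c chars buffer)
      ;; The third argument is the minimum extension requested when the
      ;; storage is full.  Asking for the current capacity roughly doubles
      ;; it, which keeps the total copying proportional to the final length.
      (vector-push-extend c buffer (max 1 (array-dimension buffer 0))))))
```

`(build-string (coerce "hello" 'list))` yields a string that is `string=` to `"hello"`.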

To go even further: one could provide lazy string copying with
copy-on-write, optimised string concatenation in which
substrings are shared, and since the OP wants to replace files
by strings, he could even consider having the GC dynamically
compress and uncompress large strings.

OK, this is really overengineered, but anyway...

Greetings,

Stephan


 
Erik Naggum  
Newsgroups: comp.lang.lisp
From: Erik Naggum <e...@naggum.net>
Date: Sat, 30 Mar 2002 13:12:52 GMT
Local: Sat, Mar 30 2002 8:12 am
Subject: Re: Back to character set implementation thinking
* Brian Spilsbury
| I think that this approach separates things which do not require it.
|
| If we view a string as a sequence rather than a vector, I believe that
| most of these problems evaporate.

  I think we have a terminological problem here.  What you call a sequence
  is not the Common Lisp concept of "sequence" since all of list, string,
  vector are sequences.  I think you mean something very close to what I
  mean by stream-string with your non-Common Lisp "sequence" concept.

| A sequence contains things which have both vector-access-characteristics
| and list-access-characteristics.

  This would also be a new invention because this is currently foreign to
  Common Lisp.  What I _think_ you mean is very close to what I have tried
  to explain in (more) Common Lisp terminology.

| The problem is that sequences in CL have relatively poor iteration
| support.

  Well, there is nothing in Common Lisp that has both O(1) and O(n) access
  characteristics, and nothing in Common Lisp that has both support for
  random access and sequential access.  I propose that stream-string
  support sequential access, with string remaining the random-access type.

| One of the more complex things that we might want to do with a string is
| to tokenise it.

  Precisely, but this is a problem that has many different kinds of
  solutions, not just one.

| (let ((last-point nil))
|   (dosequence (char point string)
|     (when (char= char #\,)
|        (if last-point
|            (collect (subseq string :start-point last-point :end-point point))
|            (setq last-point point)))))
|
| for a half-baked example, to break up a string into a list of comma
| delimited strings.

  I prefer a design that has an opaque mark in a stream-string iterator,
  but this should also be in regular streams.  Extracting the string
  between mark and point (in Emacs terminology) may re-establish some
  context in the new string if it is merely a sub-stream-string, but could
  also copy characters into a string (vector).

| The key here is the ability to access a sequence from a stored point in
| the sequence, and to use these points to delimit sequence actions.

  I think the key is that you do not want the string itself to know
  anything about how it is being read sequentially, but a simple pointer
  into the string is not enough.  (C has certainly shown us the folly of
  such a design.)  Specifically, I want a stream-string to be processed
  both with read-byte and read-char.

| Given this a string can easily have either kind of substrate - a random
| access, or linear access implementation, and this behaviour extends
| naturally to lists.

  Well, I have implemented a few processors for weird and stateful
  encodings, and I can tell you that it is not easily done.

| This also does not preclude the (expensive) random access of a
| variable-width character string, and would also tie into the lazy
| construction of sequences (whereby you might deal with a file as a
| lazy sequence, something like a lisp version of mmap).

  I think random access into a variable-width string is simply wrong, like
  using nth to do more than grab exactly one element of a list.

| Anyhow, given that variable-width-character strings would tend to be
| immutable (or perhaps extensible and truncatable) points should have few
| problems there.  I don't see any issues with points into lists either.

  Except that you generally need quite a lot of state, which a stream
  implementation would be fully able to support for you.

///
--
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.


 
Brian Spilsbury  
Newsgroups: comp.lang.lisp
From: br...@designix.com.au (Brian Spilsbury)
Date: 30 Mar 2002 10:55:59 -0800
Local: Sat, Mar 30 2002 1:55 pm
Subject: Re: Back to character set implementation thinking

Erik Naggum <e...@naggum.net> wrote in message <news:3226482787784866@naggum.net>...
> * Brian Spilsbury
> | I think that this approach separates things which do not require it.
> |
> | If we view a string as a sequence rather than a vector, I believe that
> | most of these problems evaporate.

>   I think we have a terminological problem here.  What you call a sequence
>   is not the Common Lisp concept of "sequence" since all of list, string,
>   vector are sequences.  I think you mean something very close to what I
>   mean by stream-string with your non-Common Lisp "sequence" concept.

My point is that string is defined as vector in CL.

It is only due to being a vector that a string is a sequence.

A string cannot use non-vector substrate in CL, if it were
fundamentally a sequence, then it could, as long as that substrate
satisfied sequence.

(although, from memory, vectors are not necessarily O(1) random access
in CL, so you might produce such a primitive type as a kind of vector,
except that vector types don't have the expressivity for noting
encodings, etc...)

> | A sequence contains things which have both vector-access-characteristics
> | and list-access-characteristics.

>   This would also be a new invention because this is currently foreign to
>   Common Lisp.  What I _think_ you mean is very close to what I have tried
>   to explain in (more) Common Lisp terminology.

I think the issue here is the distinction between a primitive
data-type in CL and a type-definition.

When I say sequence, I mean the type-definition, rather than a
particular data-type.

> | The problem is that sequences in CL have relatively poor iteration
> | support.

>   Well, there is nothing in Common Lisp that has both O(1) and O(n) access
>   characteristics, and nothing in Common Lisp that has both support for
>   random access and sequential access.  I propose that stream-string
>   support sequential access, with string remaining the random-access type.

Lists have support for random access implemented via sequential
accessors.
Vectors have support for linear access implemented via random
accessors.

I don't see a problem with providing a unified interface which at
least brings continuing iteration from saved positions to O(1) [which
would include simply fetching the value at that point, although that
doesn't seem very useful].

The real problem is that sequence doesn't define any iterative
operators, only cons [as list] does via cdr/rest and dolist, and the
ad-hoc support via loop.

I do not think that limiting yourself to a single mark/point pair, nor
keeping a mark/point in the container, where any modification
propagates side-effects, is a particularly good strategy for lisp.

I think that this makes sense for a Text-Buffer type object (which is
what emacs uses that approach for), though. A Stream interface to a
Text-Buffer would make perfect sense imho.

> | The key here is the ability to access a sequence from a stored point in
> | the sequence, and to use these points to delimit sequence actions.

>   I think the key is that you do not want the string itself to know
>   anything about how it is being read sequentially, but a simple pointer
>   into the string is not enough.  (C has certainly shown us the folly of
>   such a design.)  Specifically, I want a stream-string to be processed
>   both with read-byte and read-char.

I don't think that this is particularly relevant to strings, although
for a string-stream, certainly.

> | Given this a string can easily have either kind of substrate - a random
> | access, or linear access implementation, and this behaviour extends
> | naturally to lists.

>   Well, I have implemented a few processors for weird and stateful
>   encodings, and I can tell you that it is not easily done.

I think it is relatively straightforward, in some encodings the amount
of state might be annoyingly large, though.

In UTF-8, euc-kr, euc-jp, etc there is no state to be saved except for
the octet-position.
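
To illustrate why an octet position is enough for UTF-8, here is a minimal decoding sketch. The function name `utf-8-char-at` and the representation of the string as a vector of octets are assumptions made for the example; it assumes well-formed input and does no validation.

```lisp
(defun utf-8-char-at (octets start)
  "Decode the character beginning at index START in OCTETS, a vector of
UTF-8 octets.  Return two values: the code point and the index of the
next character.  Assumes well-formed input."
  (let* ((lead (aref octets start))
         ;; Number of continuation octets, read off the lead octet.
         (extra (cond ((< lead #x80) 0)
                      ((< lead #xE0) 1)
                      ((< lead #xF0) 2)
                      (t 3)))
         ;; Payload bits carried by the lead octet itself.
         (code (if (zerop extra)
                   lead
                   (ldb (byte (- 6 extra) 0) lead))))
    ;; Each continuation octet contributes its low six bits.
    (loop for i from 1 to extra
          do (setf code (logior (ash code 6)
                                (ldb (byte 6 0) (aref octets (+ start i))))))
    (values code (+ start 1 extra))))
```

The second return value is the only thing a caller has to keep in order to resume reading later, so a saved point into such a string can be a single integer.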

In the standard compression scheme for unicode you need to save
Single-Byte-Mode-P, Current-Window, and the 8 Dynamic-Window-Offsets,
and Locking-Shift-P, I've only glanced over the spec, so please excuse
omission or error.

The unicode SCS is pretty heavy on state, I'll agree, that's 11 words
in the most conservative form, although there are various
optimisations you could apply, I might expect to represent that in 5
32-bit words with packing.

The other advantage is that we don't need to store the state in the
string at all, the transitory state is kept in the iterator (ie,
dosequence, map, subseq, etc), and this means that we can share the
string freely between readers, as we currently expect to be able to.
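
A small sketch of keeping the transitory state in the iterator: the closure below owns its index, so any number of readers can share one string. `make-char-reader` is a hypothetical name, and a plain integer index stands in for whatever state a real encoding would need.

```lisp
(defun make-char-reader (string)
  "Return a closure that yields successive characters of STRING, then NIL.
The read index lives in the closure, not in STRING."
  (let ((index 0))
    (lambda ()
      (when (< index (length string))
        (prog1 (char string index)
          (incf index))))))
```

Two readers created over the same string advance independently, and the string itself is never modified.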

> | This also does not preclude the (expensive) random access of a
> | variable-width character string, and would also tie into the lazy
> | construction of sequences (whereby you might deal with a file as a
> | lazy sequence, something like a lisp version of mmap).

>   I think random access into a variable-width string is simply wrong, like
>   using nth to do more than grab exactly one element of a list.

> | Anyhow, given that variable-width-character strings would tend to be
> | immutable (or perhaps extensible and truncatable) points should have few
> | problems there.  I don't see any issues with points into lists either.

>   Except that you generally need quite a lot of state, which a stream
>   implementation would be fully able to support for you.

I think that a lot of state is the exception rather than the rule.

I also think that as shown above, we can externalise that state into
points, at an acceptable cost for reasonable encodings.

Better sequence iteration support might also facilitate a general
sequence-stream mechanism.

It may be that I am unaware of some more complex common encodings, if
there are any that you are thinking of in specific, please let me
know.

Regards,

Brian


 
Erik Naggum  
Newsgroups: comp.lang.lisp
From: Erik Naggum <e...@naggum.net>
Date: Sun, 31 Mar 2002 02:59:34 GMT
Local: Sat, Mar 30 2002 9:59 pm
Subject: Re: Back to character set implementation thinking
* Brian Spilsbury
| A string cannot use non-vector substrate in CL, if it were
| fundamentally a sequence, then it could, as long as that substrate
| satisfied sequence.

  As I said, we have a terminological problem here.  vector and list are
  disjoint subclasses of sequence.  string is a subclass of vector.

| from memory vectors are not necessarily O(1) random access in CL,

  This might be at the core of your confusion.

| When I say sequence, I mean the type-definition, rather than a particular
| data-type.

  I know Common Lisp too well to understand what you mean.

| Lists have support for random access implemented via sequential
| accessors.  Vectors have support for linear access implemented via random
| accessors.

  No, this is really fundamentally confused.  Random access _means_ O(1).
  Linear access means that you have a first-class pointer to each element,
  required to access the next.  Both the cons cell and the stream satisfy
  the latter.

| The real problem is that sequence doesn't define any iterative operators,
| only cons [as list] does via cdr/rest and dolist, and the ad-hoc support
| via loop.

  What is "ad-hoc" about it?  This is very puzzling.

| I do not think that limiting yourself to a single mark/point pair, nor
| keeping a mark/point in the container, where any modification propagates
| side-effects, is a particularly good strategy for lisp.

  I think you should read what I write a little better.  It is vital that
  mark and point are _not_ part of the string, but of the iterator.  I have
  said as much.  Please do not rudely ask me to waste my time to refute
  conclusions based on things I have not said.

| I think it is relatively straightforward, in some encodings the amount
| of state might be annoyingly large, though.

  Well, we just appear to have different tolerance of necessities, or you
  know some encodings I do not, which I kind of doubt.  An example of a
  stateful encoding with an annoyingly large amount of state would be
  useful so I know where the amount becomes annoyingly large.

| In the standard compression scheme for unicode you need to save
| Single-Byte-Mode-P, Current-Window, and the 8 Dynamic-Window-Offsets, and
| Locking-Shift-P, I've only glanced over the spec, so please excuse
| omission or error.

  Seems pretty accurate.

| The unicode SCS is pretty heavy on state, I'll agree, that's 11 words
| in the most conservative form, although there are various
| optimisations you could apply, I might expect to represent that in 5
| 32-bit words with packing.

  This is so heavy on state you want to optimize the storage?  My good man,
  this is nothing and not worth optimizing.

| The other advantage is that we don't need to store the state in the
| string at all, the transitory state is kept in the iterator (ie,
| dosequence, map, subseq, etc), and this means that we can share the
| string freely between readers, as we currently expect to be able to.

  I am really curious now.  You _always_ store the state in the object that
  modifies it, _never_ in the object it refers to.  A peculiar C++ disease
  which I had the good fortune of discussing with a project leader who just
  had to vent his frustration with some of his programmers and their sheer
  inability to write threadsafe code precisely because they were hell-bent
  on "optimizing" data storage and stored the state of an iterator in the
  object iterated over.  I wondered how anyone could even think of such an
  obviously boneheaded thing, but these people, he told me, were so deeply
  concerned with not using dynamic memory and conserving memory in general
  that they made this idiotic coding practice a matter of _pride_ and would
  therefore not consider changing it, even when ordered to fix the problem.
  Thread safety or, more generally, the ability to have multiple references
  to the same object, is the Lisp way, and being anal about memory usage is
  not the Lisp way.
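
  A minimal sketch of the arrangement argued for above, assuming a decoder
  whose state matches the SCS fields listed earlier.  SCS-STATE,
  STRING-ITERATOR and NEXT-ELEMENT are hypothetical names, and no real
  decoding is done; the point is only that position and decoder state live in
  the iterator, so any number of readers can share one string.

```lisp
;; Hypothetical names throughout: nothing here is standard Common Lisp
;; beyond DEFSTRUCT, CHAR and friends, and no actual SCS decoding happens.
(defstruct scs-state
  (single-byte-mode-p t)
  (current-window 0)
  (dynamic-window-offsets (make-array 8 :initial-element 0))
  (locking-shift-p nil))

(defstruct string-iterator
  source                       ; the shared string, never modified by iteration
  (point 0)                    ; the position belongs to the iterator
  (state (make-scs-state)))    ; so does the decoder state

(defun next-element (iterator)
  "Return the next element and T, or NIL and NIL at the end of the string."
  (let ((source (string-iterator-source iterator))
        (point (string-iterator-point iterator)))
    (if (< point (length source))
        (progn (incf (string-iterator-point iterator))
               (values (char source point) t))
        (values nil nil))))
```

  Two iterators made with (make-string-iterator :source s) over the same S
  advance independently, because neither writes anything into S.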

| I think that a lot of state is the exception rather than the rule.

  You are actually wrong about this.  The ideal of statelessness is
  generally a very bad idea, as it tries to hide state under the rug.
  Generally, state can be layered, and this is good, but it is therefore
  extremely important to layer it correctly.  I mean, I thought this would
  be exceptionally obvious when we have a string-stream concept that can
  iterate over a string with stream operators, but you have to be explicit
  about setting up these iterators.  (It should have been more general,
  so one could iterate over the elements of a vector with read-byte.)

| I also think that as shown above, we can externalise that state into
| points, at an acceptable cost for reasonable encodings.

  I truly wonder how you could have thought that anyone would want to store
  the iteration state in the object iterated over.  That is such a classic
  mistake that I am annoyed that I have to argue against it.

| It may be that I am unaware of some more complex common encodings, if
| there are any that you are thinking of in specific, please let me know.

  Try implementing a full ISO 2022 processor, try representing the device
  that ISO 6429 (informally known as "ANSI escape sequences") writes to, or
  consider the amount of state in a fully fledged MIME processor.  Side-
  effects and modifying state is a good thing, but it must, of course, be
  localized with the functions that maintain the state, not with the
  object that is being referenced incidentally.  Or maybe this is just that
  annoyingly stupid Object Oriented Programming thing, again, where the
  object itself is supposed to know something about how it is used.  This
  is just plain bad design.  Stuffing "next" pointers into a structure to
  build a linked list is equally nuts, but many believe this is good and
  cannot fathom the point of using a vector or a linked list that points to
  the objects in question.  Such people should be kept away from computers.

///
--
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.


 
Thomas Bushnell, BSG
Newsgroups: comp.lang.lisp, comp.lang.scheme
From: tb+use...@becket.net (Thomas Bushnell, BSG)
Date: 30 Mar 2002 22:11:30 -0800
Local: Sun, Mar 31 2002 1:11 am
Subject: Re: Back to character set implementation thinking
steph...@wsan03.win.tue.nl (Stephan H.M.J. Houben) writes:

> To go even further: one could provide lazy string copying with
> copy-on-write, optimised string concatenation in which
> substrings are shared, and since the OP wants to replace files
> by strings, he could even consider to have the GC dynamically
> compress and uncompress large strings.

I don't know about compressing (though it's not a bogus idea).  Doing
lazy sharing by copy-on-write is certainly a good approach for large
strings, and that will probably be a necessary feature of the system
to make various user-interface tweaks work right.  Thanks for the
idea.
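
A rough sketch of what lazy sharing with copy-on-write could look like,
assuming a simple wrapper around an ordinary string. COW-STRING, COW-COPY and
COW-REF are hypothetical names for illustration, not an existing library.

```lisp
;; Hypothetical wrapper: storage is shared until the first write.
(defstruct cow-string
  data             ; underlying string, possibly shared with other copies
  (shared-p nil))  ; true while DATA may also be referenced by another copy

(defun cow-copy (original)
  "Return a lazy copy that shares storage with ORIGINAL."
  (setf (cow-string-shared-p original) t)
  (make-cow-string :data (cow-string-data original) :shared-p t))

(defun cow-ref (cow index)
  (char (cow-string-data cow) index))

(defun (setf cow-ref) (new-char cow index)
  ;; Copy the storage before the first write if another copy may still see it.
  (when (cow-string-shared-p cow)
    (setf (cow-string-data cow) (copy-seq (cow-string-data cow))
          (cow-string-shared-p cow) nil))
  (setf (char (cow-string-data cow) index) new-char))
```

The flag is deliberately conservative: the original stays marked as shared
even after its copy has diverged, so it may copy once more than strictly
necessary. A reference count would avoid that at the cost of more bookkeeping.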

 
Brian Spilsbury
Newsgroups: comp.lang.lisp
From: br...@designix.com.au (Brian Spilsbury)
Date: 31 Mar 2002 00:09:06 -0800
Local: Sun, Mar 31 2002 3:09 am
Subject: Re: Back to character set implementation thinking

Erik Naggum <e...@naggum.net> wrote in message <news:3226532389569746@naggum.net>...
> * Brian Spilsbury
> | from memory vectors are not necessarily O(1) random access in CL,

>   This might be at the core of your confusion.

It's possible, but you have provided no reasoning or references.

"System Class ARRAY:

An array contains objects arranged according to a Cartesian coordinate
system. An array provides mappings from a set of fixnums
{i_0, i_1, ..., i_(r-1)} to corresponding elements of the array, where
0 <= i_j < d_j, r is the rank of the array, and d_j is the size of
dimension j of the array."

Vectors are defined in terms of arrays.

The definition of an array is such that you could implement an array
via a hash-bucket which accepted only integers in the specified range.

> | Lists have support for random access implemented via sequential
> | accessors.  Vectors have support for linear access implemented via random
> | accessors.

>   []  Random access _means_ O(1). []

No, random access means that the interface allows access to elements
in a random order.

This does not necessarily imply an O(1) access characteristic,
although this might be commonly expected.

As an example:
 * Does a hash-bucket object provide a random-access accessor?
 * Is it O(1) to access?
 * Does the degenerate case of a hash-bucket containing only one
bucket implemented with a list give O(n) access?
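
To make the questions concrete, here is a small sketch of an integer-indexed
bucket structure. MAKE-BUCKET-TABLE, BUCKET-PUT and BUCKET-GET are
hypothetical names; the second return value of BUCKET-GET counts how many
entries were examined, which is the cost being argued about.

```lisp
;; Hypothetical integer-keyed bucket table; each bucket is an association list.
(defun make-bucket-table (bucket-count)
  (make-array bucket-count :initial-element '()))

(defun bucket-put (table index value)
  (push (cons index value) (aref table (mod index (length table))))
  value)

(defun bucket-get (table index)
  "Return the value stored under INDEX and the number of entries examined."
  (let ((steps 0))
    (dolist (entry (aref table (mod index (length table)))
                   (values nil steps))
      (incf steps)
      (when (= (car entry) index)
        (return (values (cdr entry) steps))))))
```

With a single bucket every lookup walks one list, so the cost grows with the
number of stored elements; with enough buckets each lookup examines only a
few entries. The interface is the same in both cases.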

> | The real problem is that sequence doesn't define any iterative operators,
> | only cons [as list] does via cdr/rest and dolist, and the ad-hoc support
> | via loop.

>   What is "ad-hoc" about it? []

What is ad-hoc is that loop is a nice baroque flow-control language
which happens to have some support for iterating sequences in certain
circumstances.

Loop is not an iteration primitive for sequences, and CL does not
contain such a primitive to my knowledge.

> | I do not think that limiting yourself to a single mark/point pair, nor
> | keeping a mark/point in the container, where any modification propagates
> | side-effects, is a particularly good strategy for lisp.

>   [] It is vital that mark and point are _not_ part of the string, but of the iterator.  []

I'm glad that you agree.

> | I think it is relatively straightforward, in some encodings the amount
> | of state might be annoyingly large, though.

>   Well, we just appear to have different tolerance of necessities, or you
>   know some encodings I do not, which I kind of doubt.  An example of a
>   stateful encoding with an annoyingly large amount of state would be
>   useful so I know where the amount becomes annoyingly large.

This depends on how easily annoyed you are. The example of the SCS
encoding is one that I would consider to have a relatively large
amount of state carried between elements.

> | The unicode SCS is pretty heavy on state, I'll agree, that's 11 words
> | in the most conservative form, although there are various
> | optimisations you could apply, I might expect to represent that in 5
> | 32-bit words with packing.

>   This is so heavy on state you want to optimize the storage? []

I did not say that it was necessary or desirable, merely possible.

I can imagine some cases in which it would be desirable to sacrifice
speed for reduced consing, although they would be unusual.

> | The other advantage is that we don't need to store the state in the
> | string at all, the transitory state is kept in the iterator (ie,
> | dosequence, map, subseq, etc), and this means that we can share the
> | string freely between readers, as we currently expect to be able to.

>   I am really curious now.  You _always_ store the state in the object that
>   modifies it, _never_ in the object it refers to. []

Yes, that is what I'm advocating.

> | I think that a lot of state is the exception rather than the rule.

>   You are actually wrong about this.  []

I may be wrong about this, but you would need to provide statistics to
demonstrate that a lot of state is the rule rather than the exception.

> | I also think that as shown above, we can externalise that state into
> | points, at an acceptable cost for reasonable encodings.

>   I truly wonder how you could have thought that anyone would want to store
>   the iteration state in the object iterated over. []

Probably because of a reference to Emacs and mark/point.

> | It may be that I am unaware of some more complex common encodings, if
> | there are any that you are thinking of in specific, please let me know.

>   Try implementing a full ISO 2022 processor, try representing the device
>   that ISO 6429 (informally known as "ANSI escape sequences") writes to, or
>   consider the amount of state in a fully fledged MIME processor. []

From a quick glance ISO-2022 doesn't seem enormously different to the
Unicode SCS, set-selection, lock-shift, character-escaping, etc.
Unfortunately the specification doesn't appear available on-line. If
you have a reference to such,  please provide it.

I'm not sure how display control sequences and MIME processing relate
to string encoding.

Regards,

Brian


 
Kent M Pitman
Newsgroups: comp.lang.lisp
From: Kent M Pitman <pit...@world.std.com>
Date: Mon, 1 Apr 2002 08:51:03 GMT
Local: Mon, Apr 1 2002 3:51 am
Subject: Re: Back to character set implementation thinking

br...@designix.com.au (Brian Spilsbury) writes:
> A sequence contains things which have both
> vector-access-characteristics and list-access-characteristics.

No.

A sequence contains things which have EITHER
vector-access-characteristics OR list-access-characteristics.

The SET of all sequences admits BOTH things that have
vector-access-characteristics AND things that have
list-access-characteristics.

These are different statements.


 
Erik Naggum
Newsgroups: comp.lang.lisp
From: Erik Naggum <e...@naggum.net>
Date: Mon, 01 Apr 2002 11:11:04 GMT
Local: Mon, Apr 1 2002 6:11 am
Subject: Re: Back to character set implementation thinking
* Brian Spilsbury
| It's possible, but you have provided no reasoning or references.

  I generally do not consider it my job to unconfuse people who make claims
  that something untrue is true.  In fact, I take part in a discussion with
  the premise that those I talk to have done their own homework.  If they
  have not and are not inclined to do it upon request, there can be no
  discussion.

| The definition of an array is such that you could implement an array
| via a hash-bucket which accepted only integers in the specified range.

> Random access _means_ O(1).

| No, random access means that the interface allows access to elements in a
| random order.

  OK, so our terminology problem has just been compounded with
  stubbornness.

| This does not necessarily imply an O(1) access characteristic, although
| this might be commonly expected.

  If an implementation offers arrays that have anything other than O(1)
  access characteristics, it will be so resoundingly trashed that even
  inventing such silly interpretations indicates that you come here to
  quibble, not understand anything.

| What is ad-hoc is that loop is a nice baroque flow-control language which
| happens to have some support for iterating sequences in certain
| circumstances.

  (incf *troll-indicator*)

> Well, we just appear to have different tolerance of necessities

| This depends on how easily annoyed you are.

  Really?

> An example of a stateful encoding with an annoyingly large amount of
> state would be useful so I know where the amount becomes annoyingly
> large.

| The example of the SCS encoding is one that I would consider to have a
| relatively large amount of state carried between elements.

  SCS is nice and small by all standards.

| I can imagine some cases in which it would be desirable to sacrifice
| speed for reduced consing, although they would be unusual.

  Huh?  Why would anyone sacrifice speed for reduced consing?  Are you sure
  you know what you are talking about here?  Do you think using more memory
  leads to _slower_ code?  It is usually the opposite that is true.

| Yes, that is what I'm advocating.

  So you are just agreeing with me by arguing against what I suggest?

| > | I think that a lot of state is the exception rather than the rule.
| >
| >   You are actually wrong about this.  []
|
| I may be wrong about this, but you would need to provide statistics to
| demonstrate that a lot of state is the rule rather than the exception.

  How about you cough up some statistics to support your own claim!?

  (incf *troll-indicator*)

| > | I also think that as shown above, we can externalise that state into
| > | points, at an acceptable cost for reasonable encodings.
| >
| >   I truly wonder how you could have thought that anyone would want to
| >   store the iteration state in the object iterated over. []
|
| Probably because of a reference to Emacs and mark/point.

  OK, I see that this simile/analogy/metaphor thing is too complex for
  communication with you.  I shall adjust accordingly.

| I'm not sure how display control sequences and MIME processing relate
| to string encoding.

  Just think about it.  This kind of statefulness is also found in input
  editing, which may occur at different times.

  But I think you are a literate troll, and will probably not respond if
  you do not do any work on your own and only demand work of others when
  they doubt your statements.

///
--
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.


 
ozan s. yigit
Newsgroups: comp.lang.lisp
From: o...@cs.yorku.ca (ozan s. yigit)
Date: 1 Apr 2002 07:39:28 -0800
Local: Mon, Apr 1 2002 10:39 am
Subject: Re: Back to character set implementation thinking
br...@designix.com.au (Brian Spilsbury):

> No, random access means that the interface allows access to elements
> in a random order.

the term originally meant "uniform/unit-cost access" for any element in
any order. vectors have this property. hash tables in general do not.

oz


 
Brian Spilsbury
Newsgroups: comp.lang.lisp
From: br...@designix.com.au (Brian Spilsbury)
Date: 1 Apr 2002 08:10:04 -0800
Local: Mon, Apr 1 2002 11:10 am
Subject: Re: Back to character set implementation thinking

I'm glad that you've recanted your position about random access
requiring O(1) access characteristics.

This is not a silly interpretation nor a quibble, it is essential to
the understanding of data-type interfaces and performance
characteristics.

Beyond which you have taken an aside note, and blown it out of all
proportion.

Accept that you made an incorrect assertion and move on.

The point that was being raised was that the spirit of the CL
definition of string in terms of vector severely hampered any variable
width encoding.

The aside point was that given the word of the CL definition of array
and therefore vector, you could actually implement such a variable
width encoding as a vector type and remain compliant.

Does this clarify the situation?

> | What is ad-hoc is that loop is a nice baroque flow-control language which
> | happens to have some support for iterating sequences in certain
> | circumstances.

>   (incf *troll-indicator*)

Do you engage in personal attack in lieu of actual reasoning?

Can you provide meaningful disagreement with that assessment of loop?

> > An example of a stateful encoding with an annoyingly large amount of
> > state would be useful so I know where the amount becomes annoyingly
> > large.

> | The example of the SCS encoding is one that I would consider to have a
> | relatively large amount of state carried between elements.

>   SCS is nice and small by all standards.

Give an example which is average by your standards.

> | I can imagine some cases in which it would be desirable to sacrifice
> | speed for reduced consing, although they would be unusual.

>   Huh?  Why would anyone sacrifice speed for reduced consing?  Are you sure
>   you know what you are talking about here?  Do you think using more memory
>   leads to _slower_ code?  It is usually the opposite that is true.

Someone might be concerned with latency spikes from a non real-time
garbage-collector.

Again, this would be unusual. (As a side note, if one thing is usually
true, then it being false in an unusual situation is not in any way
conflicting.)

> | Yes, that is what I'm advocating.

>   So you are just agreeing with me by arguing against what I suggest?

No. You misunderstood what I was saying.

> | > | I think that a lot of state is the exception rather than the rule.
> | >
> | >   You are actually wrong about this.  []
> |
> | I may be wrong about this, but you would need to provide statistics to
> | demonstrate that a lot of state is the rule rather than the exception.

>   How about you cough up some statistics to support your own claim!?

>   (incf *troll-indicator*)

Firstly I offered an opinion.

Secondly you rebutted this in harsh terms without any relevant
information supplied.

Thirdly you engaged in personal attacks when asked for justification
for your unsupported rebuttal.

Perhaps you need to re-think what trolling means.

Secondly, all of the examples that I showed have quite small amounts
of contextual state. utf-8, shift-jis, euc-jp, euc-kr. The one with
the most state is SCS. ISO 2022 doesn't look much heavier than SCS,
however I do not have access to the ISO 2022 specification.

You have failed to provide any reference to any character-stream
protocol which is heavier in such state. MIME and terminal control
sequences do not qualify.

Please do so, and do not make empty complaints about being forced to
do homework. This is called 'backing up your own argument'.

> | > | I also think that as shown above, we can externalise that state into
> | > | points, at an acceptable cost for reasonable encodings.
> | >
> | >   I truly wonder how you could have thought that anyone would want to
> | >   store the iteration state in the object iterated over. []
> |
> | Probably because of a reference to Emacs and mark/point.

>   OK, I see that this simile/analogy/metaphor thing is too complex for
>   communication with you.  I shall adjust accordingly.

Try to avoid personal attack if you want to be taken seriously.

> | I'm not sure how display control sequences and MIME processing relate
> | to string encoding.

>   Just think about it.  This kind of statefulness is also found in input
>   editing, which may occur at different times.

Input editing deals largely with intermediate state, as opposed to
contextual state, and is not within the domain of the problem of
string representation and accessing.

If you mean something else, then please clarify, without personal
attacks.

>   But I think you are a literate troll, and will probably not respond if
>   you do not do any work on your own and only demand work of others when
>   they doubt your statements

The weight of the onus with disagreeing statements falls upon the
person making the stronger claim. (for example 'You are actually wrong
about this.', in contrast with 'I think that a lot of state is the
exception rather than the rule.' which is a far weaker claim)

Secondly, what work have you done here apart from demand of myself
when you disagree? Avoid hypocritical positions.

Please also avoid engaging in personal attack.

It is no substitute for reasoned discussion.

At this point it does not appear likely that it will be profitable to
continue.

Regards,

Brian


 
Erik Naggum
Newsgroups: comp.lang.lisp
From: Erik Naggum <e...@naggum.net>
Date: Mon, 01 Apr 2002 16:28:07 GMT
Local: Mon, Apr 1 2002 11:28 am
Subject: Re: Back to character set implementation thinking
* Brian Spilsbury
| I'm glad that you've recanted your position about random access
| requiring O(1) access characteristics.

  Troll.

///
--
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.


 
Thomas F. Burdick
Newsgroups: comp.lang.lisp
From: t...@conquest.OCF.Berkeley.EDU (Thomas F. Burdick)
Date: 01 Apr 2002 11:21:18 -0800
Local: Mon, Apr 1 2002 2:21 pm
Subject: Re: Back to character set implementation thinking

br...@designix.com.au (Brian Spilsbury) writes:
> Erik Naggum <e...@naggum.net> wrote in message <news:3226532389569746@naggum.net>...
> >   []  Random access _means_ O(1). []

> No, random access means that the interface allows access to elements
> in a random order.

> This does not necessarily imply an O(1) access characteristic,
> although this might be commonly expected.

I cannot think of any way of having a random-access data structure
where lookups weren't O(1).  If you have some exceptional data
structure in mind, please say what it is, because no one else has
heard of it.

> As an example:
>  * Does a hash-bucket object provide a random-access accessor?

Yes, probably.

>  * Is it O(1) to access?

To the extent that it provides random access, yes.  In really
degenerate cases, hash tables can only provide linear access, which
means they're O(n), but in that case, they're not random access; but
then, you probably knew this.

>  * Does the degenerate case of a hash-bucket containing only one
> bucket implemented with a list give O(n) access?

Of course.  But it does not give random access, just a crappy
interface to a list.

> > | The real problem is that sequence doesn't define any iterative operators,
> > | only cons [as list] does via cdr/rest and dolist, and the ad-hoc support
> > | via loop.

> >   What is "ad-hoc" about it? []

> What is ad-hoc is that loop is a nice baroque flow-control language
> which happens to have some support for iterating sequences in certain
> circumstances.

True, but the support for sequences in LOOP is not ad-hoc, it's nicely
integrated into the rest of LOOP.

> Loop is not an iteration primitive for sequences, and CL does not
> contain such a primitive to my knowledge.

Sure it does, MAP.  IMHO, CL could have used a DOSEQUENCE to go along
with MAP, but CL certainly gives you a general sequence iteration
facility.
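
For what it's worth, a DOSEQUENCE along those lines can be sketched on top of
MAP in a few lines. It is not part of standard CL; this definition is only one
possible shape, modelled on DOLIST.

```lisp
;; DOSEQUENCE is hypothetical here: a DOLIST-like macro built on MAP.
(defmacro dosequence ((var sequence &optional result) &body body)
  `(block nil
     ;; MAP with a NIL result type iterates purely for effect.
     (map nil (lambda (,var) ,@body) ,sequence)
     ,result))
```

It works on lists, vectors and strings alike, and RETURN exits early as it
does in DOLIST.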

--
           /|_     .-----------------------.                        
         ,'  .\  / | No to Imperialist war |                        
     ,--'    _,'   | Wage class war!       |                        
    /       /      `-----------------------'                        
   (   -.  |                              
   |     ) |                              
  (`-.  '--.)                              
   `. )----'                              


 
Brian Spilsbury
Newsgroups: comp.lang.lisp
From: br...@designix.com.au (Brian Spilsbury)
Date: 1 Apr 2002 21:02:47 -0800
Local: Tues, Apr 2 2002 12:02 am
Subject: Re: Back to character set implementation thinking
t...@conquest.OCF.Berkeley.EDU (Thomas F. Burdick) wrote in message <news:xcvu1qvns29.fsf@conquest.OCF.Berkeley.EDU>...

Well, this is a consistent position to take.

However there are some implications which might not be obvious.

If we define random-access to be uniform time access, then the
addition of a cache mechanism to an otherwise random-access structure
causes it to stop being random-access (or at least become less
random-access).

Beyond this, it begs the question 'why is random-access called
random-access rather than uniform-time access?'

My understanding is that it is random-access in the sense that random
elements are necessarily unrelated, and therefore random-accesses are
likewise independent of one another, but may well be dependent upon
their own individual differences.

I think that it makes little sense to tie independent element access
back to uniform access time.

As an example, is your random-access memory random-access if we have
added a cache to it? By the definition which you have given, we would
at least have to say that it is 'less random-access' than uncached RAM
would be.

This does not seem particularly reasonable.

As a second example; Is a hard-drive random-access? The underlying
implementation certainly is not. The interface that we use to a
hard-drive tends to be.

This is a more interesting example, since the implementation's access
characteristics for different elements are not independent, but we
ignore this factor in the higher level interface, ie we deal with the
sequential access implementation of the hard-drive through an
abstraction which provides a random access interface.

My feeling is that for a consistent view of random-access we need to
consider whether access to a given element is dependent upon access to
another element at the level of the interface that is exposed.

This means that I need to accept a hash-bucket structure as
random-access, but I can still talk about lousy degenerate
performance.

As a final note you've ended up with a hash-bucket's random-access
nature being undefined.

> > > | The real problem is that sequence doesn't define any iterative operators,
> > > | only cons [as list] does via cdr/rest and dolist, and the ad-hoc support
> > > | via loop.

> > >   What is "ad-hoc" about it? []

> > What is ad-hoc is that loop is a nice baroque flow-control language
> > which happens to have some support for iterating sequences in certain
> > circumstances.

> True, but the support for sequences in LOOP is not ad-hoc, it's nicely
> integrated into the rest of LOOP.

Yes, but not into the rest of CL :)

I'm not saying that loop is a bad thing, which is why I added nice.

> > Loop is not an iteration primitive for sequences, and CL does not
> > contain such a primitive to my knowledge.

> Sure it does, MAP.  IMHO, CL could have used a DOSEQUENCE to go along
> with MAP, but CL certainly gives you a general sequence iteration
> facility.

Map and the associated functions do iterate across sequences.

There are two things that are lacking in this regard though, imho.

One is an ability to iterate a subsequence.

The other is the ability to provide access to the sequence being
iterated from the current position.

As an example, consider using map to implement a LALR(1) parser.

We can have no look-ahead at all, so we must look backward, which we
can do.

We cannot know when we're about to terminate (unless we track our
position and the length manually).

We could implement a string parser like;

(let ((last nil) (state (make-state)))
  ;; LAST holds the previous character, CHAR the one after it.
  (map nil (lambda (char) (build-state state last char) (setf last char))
       buffer)
  ; handle the last element
  (build-state state last (elt buffer (- (length buffer) 1)))
  state)

I do not think that it is reasonable to view map as being a general
iteration mechanism.

I think that map is quite sufficient as a mapping mechanism without
trying to shoehorn things like this in. :)
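
For comparison, a sketch of the kind of facility I have in mind, written with
an explicit index so that both a subsequence and one element of look-ahead are
available. MAP-WITH-LOOK-AHEAD is a hypothetical name, and using ELT makes it
quadratic on lists, so treat it as an illustration of the interface only.

```lisp
;; Hypothetical helper: iterate part of a sequence with one element of look-ahead.
(defun map-with-look-ahead (function sequence &key (start 0) end)
  "Call FUNCTION on each element of SEQUENCE between START and END and on
the element that follows it, or NIL when there is no following element."
  (let ((end (or end (length sequence))))
    (loop for i from start below end
          for next = (if (< (1+ i) end) (elt sequence (1+ i)) nil)
          do (funcall function (elt sequence i) next))))
```

The callback sees NIL as the look-ahead for the final element, so it knows
when it is about to terminate without tracking position and length itself.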

Regards,

Brian


 
Brian Spilsbury
Newsgroups: comp.lang.lisp
From: br...@designix.com.au (Brian Spilsbury)
Date: 1 Apr 2002 21:27:32 -0800
Local: Tues, Apr 2 2002 12:27 am
Subject: Re: Back to character set implementation thinking
Kent M Pitman <pit...@world.std.com> wrote in message <news:sfw1ydziyyw.fsf@shell01.TheWorld.com>...

> br...@designix.com.au (Brian Spilsbury) writes:

> > A sequence contains things which have both
> > vector-access-characteristics and list-access-characteristics.

> No.

> A sequence contains things which have EITHER
> vector-access-characteristics OR list-access-characteristics.

> The SET of all sequences admits BOTH things that have
> vector-access-characteristics AND things that have
> list-access-characteristics.

> These are different statements.

Sequence is not just defined by the type-restriction.

"Sequences are ordered collections of objects, called the elements of
the sequence.

The types vector and the type list are disjoint subtypes of type
sequence, but are not necessarily an exhaustive partition of
sequence."

The not necessarily exhaustive clause is important imho, since it
allows for things which are sequences, but neither vector nor list.

Given this we need to look at the operations defined upon objects of
type sequence.

elt, length, subseq, copy-seq, fill, replace, count, position, ...

Some of these operations use independent element access, some of
these use interdependent element access, i.e. elt vs. position.

The unexhaustive partition note indicates that you cannot reduce
sequence to list XOR vector, and must consider sequence to be an ADT
of its own, with two common implementations.
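
As a small illustration of treating sequence as an ADT, the function below
uses only sequence operations, so it accepts a list or a vector without
caring which it was given. FIRST-VOWEL is a hypothetical example name.

```lisp
;; Uses only sequence functions (POSITION-IF, FIND, ELT), so it works on
;; strings, other vectors and lists of characters alike.
(defun first-vowel (sequence)
  "Return the first vowel in SEQUENCE and its index, or NIL if there is none."
  (let ((index (position-if (lambda (c) (find c "aeiou")) sequence)))
    (and index (values (elt sequence index) index))))
```

POSITION-IF walks the elements in order while ELT addresses one element
directly; on a list the second is as sequential as the first, on a vector it
is not, and the caller sees the same interface either way.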

I do agree that my statement above was problematic, thank you for
pointing this out.

Regards,

Brian


 
Kent M Pitman
Newsgroups: comp.lang.lisp
From: Kent M Pitman <pit...@world.std.com>
Date: Tue, 2 Apr 2002 05:49:40 GMT
Local: Tues, Apr 2 2002 12:49 am
Subject: Re: Back to character set implementation thinking

br...@designix.com.au (Brian Spilsbury) writes:
> The types vector and the type list are disjoint subtypes of type
> sequence, but are not necessarily an exhaustive partition of
> sequence."

Yes, for better or worse, this is left to _vendor_ experimentation.

A vendor, of course, can pass through experimentation capability to you.

As the NBS rep pointed out early on in the standards process, it's not the
role of a standards committee to do design.  We did it sometimes, but always
as a last resort in order to achieve consensus when the options were in
conflict.  The first choice, though, is to have one or more vendors with a
happy experience to report.

So I'd work on convincing my vendor if I were you...


 
Tim Moore
Newsgroups: comp.lang.lisp
From: tmo...@sea-tmoore-l.dotcast.com (Tim Moore)
Date: 2 Apr 2002 06:12:45 GMT
Local: Tues, Apr 2 2002 1:12 am
Subject: Re: Back to character set implementation thinking
On Tue, 2 Apr 2002 05:49:40 GMT, Kent M Pitman <pit...@world.std.com> wrote:

Brian *is* assuming the role of vendor here, i.e., SBCL hacker.

Tim


 