d...@goldshoe.gte.com (Dorai Sitaram) wrote in message <news:a822bj$o2s$1@news.gte.com>...
> In article <fo3cyjfm6j....@trex10.cs.bell-labs.com>,
> Matthias Blume <matth...@shimizu-blume.com> wrote:
> >So if it were true that either way would be equally easy to read for
> >someone without prior training, why would an utterly untrained person
> >such as I (and pretty much all of my fellow students as well, BTW) see
> >this effect? In other words, there is certainly more going on than
> >just a "trained dog effect".
> Do kanji perhaps serve as some sort of abbreviation,
> or, I should rather say, syntactic abstraction? If so,
> their appeal may have the same reason as why
> no-longer-newbie users of a programming language prefer
> to extend the language with (their own or
> others') procedural and textual abstractions rather
> than sticking to core procedures and core syntax. I'm
> speculating only.
No, kanji are not used for syntax. Syntax in Japanese is mediated via hiragana/katakana modifiers and particles (also represented in hiragana, although 'wo' is distinct, and the particle 'ha' is pronounced 'wa').
Kanji give direct semantic forms. These semantic forms are distinct from any pronunciation, and the pronunciation of a particular sequence of kanji is determined by the phonetic modifiers trailing it, and/or by combining the kanji in the sequence through their On (Chinese-derived) or Kun (Japanese) readings, which are not mixed in a given sequence [a bit like composing Latin with Latin, and Greek with Greek].
You might think of kanji as giving a particular root form.
[boku]-ga basu-de [kou][kou]-e [i](ki)mashita.
[I]-subject bus-by [highschool]-to [go](phonetic kanji modifier)-past-tense.
The [bracketed] forms are kanji in these examples.
My Japanese is a bit rusty, so please excuse any error.
As a side note, a study found that quite different areas of the brain are used to process the hiragana/katakana forms and the kanji forms, and a different study found that reasonably severely dyslexic American (English-speaking) children were able to learn several hundred Chinese characters without undue difficulty, although they were unable to read Roman characters.
As a final note, a certain Jesuit missionary declared that the Japanese written language was designed by the devil, and I think that anyone who is familiar with it would be inclined to agree. ^^
Just in case you care what Brian's examples look like when rendered in actual Kanji and hiragana (requires sufficient MIME support in your newsreader):
Matthias Blume <matth...@shimizu-blume.com> writes: > Just in case you care what Brian's examples looks like when rendered > in actual Kanji and hiragana (requires sufficient MIME support in > you newsreader):
Nils Goesche <n...@cartan.de> wrote: > I don't know Danish. Maybe it doesn't make a difference there. > Maybe Danes only used to capitalize nouns because the Germans > did, and as the Germans weren't exactly very popular in 1948, > that might have been a good opportunity to give up on it.
The capitalization tradition had the same origin as the German one. It started as a fad among printers back when printing was a relatively new craft. It is also true that the lack of popularity of anything German in the late forties made it possible to pass the bill that put the spelling reform into effect, but it wasn't the reason for doing it, it just provided the necessary leverage in the general public. The idea goes back to the 19th century, but was for a long time met with scorn, partly due to inertia and conservatism and partly due to skepticism. Why change something that has worked well for centuries? Well, it hadn't worked well. Many people couldn't figure out which words should be capitalized. That was the real reason for the reform. The main arguments against it invariably consisted of examples, like the ones you gave in German, where the capitalization eliminates ambiguity. In isolation, that was - and is - correct, but what the critics overlooked is that sentences don't occur in total isolation. They are always part of a larger conversational context, a discourse, and that is what implicitly removes the ambiguity.
> Or maybe it wasn't. Who is supposed to know anymore? You said > there was a controversy about it; maybe the people who were > against it were right? Who would remember? How could you tell?
The discussions ended decades ago when people realized that nothing really had been lost and the new system made it easier for people with a more modest formal knowledge of grammar to write in a way that reasonably conforms to the official norm. A net gain for everybody. And, even today, there are still quite a few people around who originally learned the old system.
I think we should end this subthread now, as it has nothing to do with Lisp anymore.
> ... >> I don't have much time in the morning, but still manage to read >> large parts of the Frankfurter Allgemeine every morning, in >> very little time. I sometimes ``observe'' myself how I read >> that fast, and I found out that when looking at a whole block >> of text at a time, the visual structure of sentences indicated >> by capitalized words is a very useful help for the reader.
> Most likely because that's what you are used to.
>> I don't know Danish. [...] You said there was a controversy >> about it; maybe the people who were against it were right?
> The most vocal opponents were the kind of people who always think > the world is coming to an end if anything changes. They are now > spending their energy on how to, or not to, place commas. In > general, they are surprisingly clueless about the subjects they > make sarcastic remarks about.
i remember a feuilleton article/editorial in the frankfurter allgemeine quite a while ago (around 1970). it was during one of the ever-returning discussions in the german language area about a reform of the capitalisation rules. in the article, the author suggested that they should not be simplified, since some of the more obscure rules would help distinguish between better and less well educated persons. iirc, he didn't put forward his position tongue in cheek.
hs
--
don't use malice as an explanation when stupidity suffices
In article <fo3cyjfm6j....@trex10.cs.bell-labs.com>, Matthias Blume <matth...@shimizu-blume.com> writes:
> ...
> There are on the order of 50 hiragana, but there are several thousands
> of Kanji -- which means that learning just hiragana is immensely easier
> than learning both. According to the above, one would expect that
> someone without prior exposure to either system would have an easier
> time reading pure hiragana text.
> I, having not been raised in Japan, fall into this category of having > no prior exposure. But what can I tell you? The moment I managed to > memorize even just a tiny number of Kanji, sentences that actually > used them (in place of their hiragana spellings) became *vastly* > easier to read for me. I am not a psychologist or linguist, so I > won't speculate on why that is.
does japanese have many homophones? from what i remember having read a while ago, the kanji characters quite often are taken over from chinese to designate the japanese word for the chinese word the character was developed for. if the language is rich in homophones, this would help distinguish between identically sounding words with totally different meanings.
hs
Erik Naggum <e...@naggum.net> wrote in message <news:3226402339496495@naggum.net>... > * Brian Spilsbury > | I think you need to differentiate between mutable and immutable > | strings.
> I have suggested that strings need to be separated into two more basic
> types: a stream which you read one element at a time, and a vector which
> provides random access. The former maps directly to files and is
> suitable for parsing and formatting, while a vector of characters is more
> useful for repeated access to the same characters.
> We have the system class string-stream today, which offers stream access > to a string, but I think we need a subclass of string like stream-string, > which may contain such things as the octets from another stream such as > directly from an input file, and be processed sequentially, and therefore > should also be able to use stateful encodings such that reading through > them with the string-stream functions would maintain that state.
I think that this approach separates things which do not require it.
If we view a string as a sequence rather than a vector, I believe that most of these problems evaporate.
A sequence contains things which have both vector-access-characteristics and list-access-characteristics.
The problem is that sequences in CL have relatively poor iteration support.
One of the more complex things that we might want to do with a string is to tokenise it.
For a half-baked example, consider breaking up a string into a list of comma-delimited strings.
The key here is the ability to access a sequence from a stored point in the sequence, and to use these points to delimit sequence actions.
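The comma-splitting idea can be approximated in standard Common Lisp today. This is only a sketch under stated assumptions: `split-on-commas` is a hypothetical name, and plain integer indices stand in for the opaque "points" discussed here, which works for vector-backed strings but says nothing about other substrates.

```lisp
;; Sketch only: SPLIT-ON-COMMAS is a hypothetical helper, and integer
;; indices play the role of the stored "points" in the discussion.
(defun split-on-commas (string)
  "Return a list of the comma-delimited substrings of STRING."
  (loop with start = 0
        ;; Resume the search from the last stored position.
        for comma = (position #\, string :start start)
        ;; A NIL end means "up to the end of the string".
        collect (subseq string start comma)
        while comma
        do (setf start (1+ comma))))
```

For instance, `(split-on-commas "a,b,,c")` yields the four strings "a", "b", "" and "c".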
Given this a string can easily have either kind of substrate - a random access, or linear access implementation, and this behaviour extends naturally to lists.
There are some issues with points and the mutation of the string, as well as the usable life-time of the points, but I think that these can be addressed with some thought.
This also does not preclude the (expensive) random access of a variable-width character string, and would also tie into the lazy construction of sequences (whereby you might deal with a file as a lazy sequence, something like a lisp version of mmap).
Anyhow, given that variable-width-character strings would tend to be immutable (or perhaps extensible and truncatable) points should have few problems there. I don't see any issues with points into lists either.
In article <usn6kh477....@globalgraphics.com>, Pekka P. Pirinen wrote: >> Basically then we would have strings which are UCS-4, UCS-2 and >> Latin-1 restricted (internally, not visibly to users). [...] >> Procedures like string-set! therefore might have to inflate (and >> thus copy) the entire string if a value outside the range is stored. >> But that's ok with me; I don't think it's a serious lose.
>I suppose that is a viable implementation strategy, but I don't think >it's the right option. The language should expose the range of string >data types to the programmer, and let them choose, because the range >of memory usage is just too great to sweep under the mat. Also, >having strings automatically reallocated means an extra indirection >for access which cannot always be optimized away.
If you have more than one string type anyway, then you can have both directly and indirectly represented strings. It is then possible to arrange that any directly represented string can be replaced with an indirectly represented string. Then, arrange for the garbage collector to remove all indirections.
Again, this is not much more complex once you have decided to go for multiple string types anyway. Moreover, it is completely transparent to the programmer and it can provide other useful features, e.g. growing of strings. Indeed, it is even possible for the implementation to dynamically decide to overallocate storage once a string has been grown, so that naively building a string character-by-character will be O(n).
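The overallocation argument can be illustrated at user level with an adjustable string and a fill pointer. This is a sketch, not a description of any implementation's internals: `append-char` and `build-alphabet-string` are hypothetical names, and the capacity-doubling policy is an assumption chosen because it gives amortised O(1) appends.

```lisp
;; Sketch only: both function names and the doubling policy are assumptions.
(defun append-char (buffer char)
  "Append CHAR to BUFFER, doubling the capacity when BUFFER is full."
  (when (= (fill-pointer buffer) (array-dimension buffer 0))
    ;; Overallocate: growing geometrically keeps the total copying work O(n).
    (setf buffer (adjust-array buffer
                               (max 1 (* 2 (array-dimension buffer 0))))))
  (vector-push char buffer)
  buffer)

(defun build-alphabet-string (n)
  "Build a string of N characters one at a time, cycling through a..z."
  (let ((buffer (make-array 1 :element-type 'character
                              :adjustable t :fill-pointer 0)))
    (dotimes (i n)
      (setf buffer
            (append-char buffer
                         (char "abcdefghijklmnopqrstuvwxyz" (mod i 26)))))
    buffer))
```

The standard `vector-push-extend` does the same job in one call; how much it overallocates is left to the implementation.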
All this adds implementation complexity, but it makes string handling much easier on the programmer.
To go even further: one could provide lazy string copying with copy-on-write, optimised string concatenation in which substrings are shared, and since the OP wants to replace files by strings, he could even consider having the GC dynamically compress and uncompress large strings.
>I note that offering multiple string types is exactly what all the CL >implementations seem to have done. This doesn't preclude having >features that automatically select the smallest feasible type, e.g., >for "" read syntax or a STRING-APPEND function. >-- >Pekka P. Pirinen >The gap between theory and practice is bigger in practice than in theory.
* Brian Spilsbury | I think that this approach separates things which do not require it. | | If we view a string as a sequence rather than a vector, I believe that | most of these problems evaporate.
I think we have a terminological problem here. What you call a sequence is not the Common Lisp concept of "sequence" since all of list, string, vector are sequences. I think you mean something very close to what I mean by stream-string with your non-Common Lisp "sequence" concept.
| A sequence contains things which have both vector-access-characteristics | and list-access-characteristics.
This would also be a new invention because this is currently foreign to Common Lisp. What I _think_ you mean is very close to what I have tried to explain in (more) Common Lisp terminology.
| The problem is that sequences in CL have relatively poor iteration | support.
Well, there is nothing in Common Lisp that has both O(1) and O(n) access characteristics, and nothing in Common Lisp that has both support for random access and sequential access. I propose that stream-string support sequential access and that string remain the random-access type.
| One of the more complex things that we might want to do with a string is | to tokenise it.
Precisely, but this is a problem that has many different kinds of solutions, not just one.
| (let ((last-point nil))
|   (dosequence (char point string)
|     (when (char= char #\,)
|       (if last-point
|           (collect (subseq string :start-point last-point
|                                   :end-point point))
|           (setq last-point point)))))
|
| for a half-baked example, to break up a string into a list of comma
| delimited strings.
I prefer a design that has an opaque mark in a stream-string iterator, but this should also be in regular streams. Extracting the string between mark and point (in Emacs terminology) may re-establish some context in the new string if it is merely a sub-stream-string, but could also copy characters into a string (vector).
| The key here is the ability to access a sequence from a stored point in | the sequence, and to use these points to delimit sequence actions.
I think the key is that you do not want the string itself to know anything about how it is being read sequentially, but a simple pointer into the string is not enough. (C has certainly shown us the folly of such a design.) Specifically, I want a stream-string to be processed both with read-byte and read-char.
| Given this a string can easily have either kind of substrate - a random | access, or linear access implementation, and this behaviour extends | naturally to lists.
Well, I have implemented a few processors for weird and stateful encodings, and I can tell you that it is not easily done.
| This also does not preclude the (expensive) random access of a | variable-width character string, and would also tie into the lazy | construction of sequences (whereby you might deal with a file as a | lazy sequence, something like a lisp version of mmap).
I think random access into a variable-width string is simply wrong, like using nth to do more than grab exactly one element of a list.
| Anyhow, given that variable-width-character strings would tend to be | immutable (or perhaps extensible and truncatable) points should have few | problems there. I don't see any issues with points into lists either.
Except that you generally need quite a lot of state, which a stream implementation would be fully able to support for you.
/// -- In a fight against something, the fight has value, victory has none. In a fight for something, the fight is a loss, victory merely relief.
Erik Naggum <e...@naggum.net> wrote in message <news:3226482787784866@naggum.net>... > * Brian Spilsbury > | I think that this approach separates things which do not require it. > | > | If we view a string as a sequence rather than a vector, I believe that > | most of these problems evaporate.
> I think we have a terminological problem here. What you call a sequence > is not the Common Lisp concept of "sequence" since all of list, string, > vector are sequences. I think you mean something very close to what I > mean by stream-string with your non-Common Lisp "sequence" concept.
My point is that string is defined as vector in CL.
It is only due to being a vector that a string is a sequence.
A string cannot use a non-vector substrate in CL; if it were fundamentally a sequence, then it could, as long as that substrate satisfied sequence.
(although, from memory, vectors are not necessarily O(1) random access in CL, so you might produce such a primitive type as a kind of vector, except that vector types don't have the expressivity for noting encodings, etc...)
> | A sequence contains things which have both vector-access-characteristics > | and list-access-characteristics.
> This would also be a new invention because this is currently foreign to
> Common Lisp. What I _think_ you mean is very close to what I have tried
> to explain in (more) Common Lisp terminology.
I think the issue here is the distinction between a primitive data-type in CL and a type-definition.
When I say sequence, I mean the type-definition, rather than a particular data-type.
> | The problem is that sequences in CL have relatively poor iteration > | support.
> Well, there is nothing in Common Lisp that has both O(1) and O(n) access > characteristics, and nothing in Common Lisp that has both support for > random access and sequential access. I propose that stream-string > support sequential access and string remaining the random access.
Lists have support for random access implemented via sequential accessors. Vectors have support for linear access implemented via random accessors.
I don't see a problem with providing a unified interface which at least brings continuing iteration from saved positions to O(1) [which would include simply fetching the value at that point, although that doesn't seem very useful].
The real problem is that sequence doesn't define any iterative operators, only cons [as list] does via cdr/rest and dolist, and the ad-hoc support via loop.
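To make the idea of a unified iteration interface concrete, here is a minimal sketch in standard Common Lisp. `make-iterator` and `collect-all` are hypothetical names, not part of CL; the closure holds the saved position (a cons for lists, an index for vectors), so continuing from that position is O(1) for both.

```lisp
;; Sketch only: MAKE-ITERATOR and COLLECT-ALL are hypothetical helpers.
(defun make-iterator (sequence)
  "Return a closure that yields (values element t) for each element of
SEQUENCE in order, and (values nil nil) once the sequence is exhausted."
  (etypecase sequence
    (list
     (let ((tail sequence))             ; saved position is a cons cell
       (lambda ()
         (if (consp tail)
             (values (pop tail) t)
             (values nil nil)))))
    (vector
     (let ((index 0))                   ; saved position is an index
       (lambda ()
         (if (< index (length sequence))
             (multiple-value-prog1 (values (aref sequence index) t)
               (incf index))
             (values nil nil)))))))

(defun collect-all (sequence)
  "Drain an iterator over SEQUENCE into a fresh list."
  (let ((next (make-iterator sequence))
        (result '()))
    (loop
      (multiple-value-bind (element more-p) (funcall next)
        (unless more-p
          (return (nreverse result)))
        (push element result)))))
```

The same `collect-all` works unchanged on `'(1 2 3)`, `#(1 2)` and `"abc"`, which is the kind of substrate independence being argued for.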
> | One of the more complex things that we might want to do with a string is > | to tokenise it.
> Precisely, but this is a problem that has many different kinds of > solutions, not just one.
> | (let ((last-point nil))
> |   (dosequence (char point string)
> |     (when (char= char #\,)
> |       (if last-point
> |           (collect (subseq string :start-point last-point
> |                                   :end-point point))
> |           (setq last-point point)))))
> |
> | for a half-baked example, to break up a string into a list of comma
> | delimited strings.
> I prefer a design that has an opaque mark in a stream-string iterator, > but this should also be in regular streams. Extracting the string > between mark and point (in Emacs terminology) may re-establish some > context in the new string if it is merely a sub-stream-string, but could > also copy characters into a string (vector).
I do not think that limiting yourself to a single mark/point pair, nor keeping a mark/point in the container, where any modification propagates side-effects, is a particularly good strategy for lisp.
I think that this makes sense for a Text-Buffer type object (which is what emacs uses that approach for), though. A Stream interface to a Text-Buffer would make perfect sense imho.
> | The key here is the ability to access a sequence from a stored point in > | the sequence, and to use these points to delimit sequence actions.
> I think the key is that you do not want the string itself to know
> anything about how it is being read sequentially, but a simple pointer
> into the string is not enough. (C has certainly shown us the folly of
> such a design.) Specifically, I want a stream-string to be processed
> both with read-byte and read-char.
I don't think that this is particularly relevant to strings, although for a string-stream, certainly.
> | Given this a string can easily have either kind of substrate - a random > | access, or linear access implementation, and this behaviour extends > | naturally to lists.
> Well, I have implemented a few processors for weird and stateful > encodings, and I can tell you that it is not easily done.
I think it is relatively straightforward, in some encodings the amount of state might be annoyingly large, though.
In UTF-8, euc-kr, euc-jp, etc there is no state to be saved except for the octet-position.
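The UTF-8 case can be sketched as follows: decoding needs only the octet vector and an octet position, and it returns the position of the next character. `utf8-decode-at` is a hypothetical name, and the sketch assumes well-formed input; it performs no validation of lead or continuation octets.

```lisp
;; Sketch only: UTF8-DECODE-AT is a hypothetical helper that assumes
;; well-formed UTF-8 and does no error checking.
(defun utf8-decode-at (octets position)
  "Decode one UTF-8 character from OCTETS starting at POSITION.
Return two values: the code point and the position of the next character."
  (let* ((lead (aref octets position))
         ;; Number of continuation octets implied by the lead octet.
         (extra (cond ((< lead #x80) 0)
                      ((< lead #xE0) 1)
                      ((< lead #xF0) 2)
                      (t 3)))
         ;; Payload bits carried by the lead octet itself.
         (code (case extra
                 (0 lead)
                 (1 (logand lead #x1F))
                 (2 (logand lead #x0F))
                 (t (logand lead #x07)))))
    ;; Each continuation octet contributes its low six bits.
    (loop for i from 1 to extra
          do (setf code (logior (ash code 6)
                                (logand (aref octets (+ position i)) #x3F))))
    (values code (+ position extra 1))))
```

The returned position is the only thing a reader has to remember between characters, so it can serve directly as a saved "point".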
In the standard compression scheme for unicode you need to save Single-Byte-Mode-P, Current-Window, and the 8 Dynamic-Window-Offsets, and Locking-Shift-P, I've only glanced over the spec, so please excuse omission or error.
The Unicode SCS is pretty heavy on state, I'll agree: that's 11 words in the most conservative form, although there are various optimisations you could apply. I might expect to represent that in 5 32-bit words with packing.
The other advantage is that we don't need to store the state in the string at all, the transitory state is kept in the iterator (ie, dosequence, map, subseq, etc), and this means that we can share the string freely between readers, as we currently expect to be able to.
> | This also does not preclude the (expensive) random access of a > | variable-width character string, and would also tie into the lazy > | construction of sequences (whereby you might deal with a file as a > | lazy sequence, something like a lisp version of mmap).
> I think random access into a variable-width string is simply wrong, like > using nth to do more than grab exactly one element of a list.
> | Anyhow, given that variable-width-character strings would tend to be > | immutable (or perhaps extensible and truncatable) points should have few > | problems there. I don't see any issues with points into lists either.
> Except that you generally need quite a lot of state, which a stream > implementation would be fully able to support for you.
I think that a lot of state is the exception rather than the rule.
I also think that as shown above, we can externalise that state into points, at an acceptable cost for reasonable encodings.
Better sequence iteration support might also facilitate a general sequence-stream mechanism.
It may be that I am unaware of some more complex common encodings, if there are any that you are thinking of in specific, please let me know.
* Brian Spilsbury
| A string cannot use non-vector substrate in CL, if it were
| fundamentally a sequence, then it could, as long as that substrate
| satisfied sequence.
As I said, we have a terminological problem here. vector and list are disjoint subclasses of sequence. string is a subclass of vector.
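The type relationships stated here can be checked directly with the standard `subtypep` and `typep`. The wrapper name `sequence-type-facts` is hypothetical; everything it calls is standard Common Lisp.

```lisp
;; Sketch only: SEQUENCE-TYPE-FACTS is a hypothetical wrapper around
;; standard SUBTYPEP and TYPEP queries.
(defun sequence-type-facts ()
  "Return a list of booleans describing the standard sequence hierarchy."
  (list (subtypep 'string 'vector)      ; string is a kind of vector
        (subtypep 'vector 'sequence)    ; vector is a kind of sequence
        (subtypep 'list 'sequence)      ; list is a kind of sequence
        (typep "abc" 'list)             ; a string is never a list
        (typep '(1 2 3) 'vector)))      ; a list is never a vector
```

Calling `(sequence-type-facts)` returns `(T T T NIL NIL)`.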
| from memory vectors are not necessarily O(1) random access in CL,
This might be at the core of your confusion.
| When I say sequence, I mean the type-definition, rather than a particular | data-type.
I know Common Lisp too well to understand what you mean.
| Lists have support for random access implemented via sequential | accessors. Vectors have support for linear access implemented via random | accessors.
No, this is really fundamentally confused. Random access _means_ O(1). Linear access means that you have a first-class pointer to each element, required to access the next. Both the cons cell and the stream satisfy the latter.
| The real problem is that sequence doesn't define any iterative operators, | only cons [as list] does via cdr/rest and dolist, and the ad-hoc support | via loop.
What is "ad-hoc" about it? This is very puzzling.
| I do not think that limiting yourself to a single mark/point pair, nor | keeping a mark/point in the container, where any modification propagates | side-effects, is a particularly good strategy for lisp.
I think you should read what I write a little better. It is vital that mark and point are _not_ part of the string, but of the iterator. I have said as much. Please do not rudely ask me to waste my time to refute conclusions based on things I have not said.
| I think it is relatively straightforward, in some encodings the amount | of state might be annoyingly large, though.
Well, we just appear to have different tolerance of necessities, or you know some encodings I do not, which I kind of doubt. An example of a stateful encoding with an annoyingly large amount of state would be useful so I know where the amount becomes annoyingly large.
| In the standard compression scheme for unicode you need to save | Single-Byte-Mode-P, Current-Window, and the 8 Dynamic-Window-Offsets, and | Locking-Shift-P, I've only glanced over the spec, so please excuse | omission or error.
Seems pretty accurate.
| The unicode SCS is pretty heavy on state, I'll agree, that's 11 words
| in the most conservative form, although there are various
| optimisations you could apply, I might expect to represent that in 5
| 32-bit words with packing.
This is so heavy on state you want to optimize the storage? My good man, this is nothing and not worth optimizing.
| The other advantage is that we don't need to store the state in the | string at all, the transitory state is kept in the iterator (ie, | dosequence, map, subseq, etc), and this means that we can share the | string freely between readers, as we currently expect to be able to.
I am really curious now. You _always_ store the state in the object that modifies it, _never_ in the object it refers to. A peculiar C++ disease which I had the good fortune of discussing with a project leader who just had to vent his frustration with some of his programmers and their sheer inability to write threadsafe code precisely because they were hell-bent on "optimizing" data storage and stored the state of an iterator in the object iterated over. I wondered how anyone could even think of such an obviously boneheaded thing, but these people, he told me, were so deeply concerned with not using dynamic memory and conserving memory in general that they made this idiotic coding practice a matter of _pride_ and would therefore not consider changing it, even when ordered to fix the problem. Thread safety or, more generally, the ability to have multiple references to the same object, is the Lisp way, and being anal about memory usage is not the Lisp way.
| I think that a lot of state is the exception rather than the rule.
You are actually wrong about this. The ideal of statelessness is generally a very bad idea, as it tries to hide state under the rug. Generally, state can be layered, and this is good, but it is therefore extremely important to layer it correctly. I mean, I thought this would be exceptionally obvious when we have a string-stream concept that can iterate over a string with stream operators, but you have to be explicit about setting up these iterators. (It should have been more general, so one could iterate over the elements of a vector with read-byte.)
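The existing string-stream facility already shows iteration state living in the iterator and not in the string. A minimal sketch, using the standard `make-string-input-stream` and `read-char`; the function name `read-interleaved` is hypothetical.

```lisp
;; Sketch only: READ-INTERLEAVED is a hypothetical helper built on the
;; standard string-stream operators.
(defun read-interleaved (string)
  "Read STRING through two independent string streams.
Return a list of two characters read from the first stream followed by
one character read from the second."
  (let ((a (make-string-input-stream string))
        (b (make-string-input-stream string)))
    ;; Each stream keeps its own position; STRING itself is untouched.
    (let* ((a1 (read-char a))
           (a2 (read-char a))
           (b1 (read-char b)))
      (list a1 a2 b1))))
```

With the input "xyz" the result is `(#\x #\y #\x)`: advancing the first stream has no effect on the second, so the string can be shared freely between readers.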
| I also think that as shown above, we can externalise that state into | points, at an acceptable cost for reasonable encodings.
I truly wonder how you could have thought that anyone would want to store the iteration state in the object iterated over. That is such a classic mistake that I am annoyed that I have to argue against it.
| It may be that I am unaware of some more complex common encodings, if | there are any that you are thinking of in specific, please let me know.
Try implementing a full ISO 2022 processor, try representing the device that ISO 6429 (informally known as "ANSI escape sequences") writes to, or consider the amount of state in a fully fledged MIME processor. Side effects and modifying state is a good thing, but it must, of course, be localized with the functions that maintain the state, not with the object that is being referenced incidentally. Or maybe this is just that annoyingly stupid Object Oriented Programming thing, again, where the object itself is supposed to know something about how it is used. This is just plain bad design. Stuffing "next" pointers into a structure to build a linked list is equally nuts, but many believe this is good and cannot fathom the point of using a vector or a linked list that points to the objects in question. Such people should be kept away from computers.
> To go even further: one could provide lazy string copying with > copy-on-write, optimised string concatenation in which > substrings are shared, and since the OP wants to replace files > by strings, he could even consider to have the GC dynamically > compress and uncompress large strings.
I don't know about compressing (though it's not a bogus idea). Doing lazy sharing by copy-on-write is certainly a good approach for large strings, and that will probably be a necessary feature of the system to make various user-interface tweaks work right. Thanks for the idea.
Erik Naggum <e...@naggum.net> wrote in message <news:3226532389569746@naggum.net>... > * Brian Spilsbury > | from memory vectors are not necessarily O(1) random access in CL,
> This might be at the core of your confusion.
It's possible, but you have provided no reasoning or references.
"System Class ARRAY:
An array contains objects arranged according to a Cartesian coordinate system. An array provides mappings from a set of fixnums {i_0, i_1, ..., i_{r-1}} to corresponding elements of the array, where 0 <= i_j < d_j, r is the rank of the array, and d_j is the size of dimension j of the array."
Vectors are defined in terms of arrays.
The definition of an array is such that you could implement an array via a hash-bucket which accepted only integers in the specified range.
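To illustrate the kind of structure meant here, the following sketch maps a bounded range of integer indices to elements through a hash table. It is not a conforming Common Lisp array and makes no claim about what implementations do; `sparse-vector`, `make-sparse-vector` and `sparse-ref` are hypothetical names.

```lisp
;; Sketch only: a hash-table-backed index mapping, not a real CL array.
(defstruct (sparse-vector (:constructor %make-sparse-vector))
  (size 0 :type fixnum)
  (table (make-hash-table) :type hash-table))

(defun make-sparse-vector (size)
  "Create a mapping that accepts integer indices from 0 below SIZE."
  (%make-sparse-vector :size size))

(defun sparse-ref (sv index)
  "Return the element stored at INDEX, or NIL if nothing was stored."
  (assert (< -1 index (sparse-vector-size sv)))
  (values (gethash index (sparse-vector-table sv))))

(defun (setf sparse-ref) (value sv index)
  "Store VALUE at INDEX; indices outside the declared range are rejected."
  (assert (< -1 index (sparse-vector-size sv)))
  (setf (gethash index (sparse-vector-table sv)) value))
```

The accessor allows elements to be reached in any order, while its cost is whatever the hash table provides, which is the distinction between the interface and its access characteristics.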
> | Lists have support for random access implemented via sequential > | accessors. Vectors have support for linear access implemented via random > | accessors.
> [] Random access _means_ O(1). []
No, random access means that the interface allows access to elements in a random order.
This does not necessarily imply an O(1) access characteristic, although this might be commonly expected.
As an example:
* Does a hash-bucket object provide a random-access accessor?
* Is it O(1) to access?
* Does the degenerate case of a hash-bucket containing only one bucket implemented with a list give O(n) access?
> | The real problem is that sequence doesn't define any iterative operators, > | only cons [as list] does via cdr/rest and dolist, and the ad-hoc support > | via loop.
> What is "ad-hoc" about it? []
What is ad-hoc is that loop is a nice baroque flow-control language which happens to have some support for iterating sequences in certain circumstances. Loop is not an iteration primitive for sequences, and CL does not contain such a primitive to my knowledge.
> | I do not think that limiting yourself to a single mark/point pair, nor > | keeping a mark/point in the container, where any modification propagates > | side-effects, is a particularly good strategy for lisp.
> [] It is vital that mark and point are _not_ part of the string, but of the iterator. []
I'm glad that you agree.
> | I think it is relatively straightforward, in some encodings the amount > | of state might be annoyingly large, though.
> Well, we just appear to have different tolerance of necessities, or you > know some encodings I do not, which I kind of doubt. An example of a > stateful encoding with an annoyingly large amount of state would be > useful so I know where the amount becomes annoyingly large.
This depends on how easily annoyed you are. The example of the SCS encoding is one that I would consider to have a relatively large amount of state carried between elements.
> | The unicode SCS is pretty heavy on state, I'll agree, that's 11 words
> | in the most conservative form, although there are various
> | optimisations you could apply, I might expect to represent that in 5
> | 32-bit words with packing.
> This is so heavy on state you want to optimize the storage? []
I did not say that it was necessary or desirable, merely possible.
I can imagine some cases in which it would be desirable to sacrifice speed for reduced consing, although they would be unusual.
> | The other advantage is that we don't need to store the state in the string at all, the transitory state is kept in the iterator (ie, dosequence, map, subseq, etc), and this means that we can share the string freely between readers, as we currently expect to be able to.
> I am really curious now. You _always_ store the state in the object that modifies it, _never_ in the object it refers to. []
Yes, that is what I'm advocating.
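A minimal sketch of what keeping the state in the iterator might look like, assuming well-formed UTF-8 held in a vector of octets. The name make-utf8-reader is hypothetical and not part of CL; the only point is that the position lives in the closure, so the same octet vector can be shared between independent readers.

```lisp
;; Hypothetical helper, not part of CL.  Assumes OCTETS is well-formed
;; UTF-8; no validation is attempted.
(defun make-utf8-reader (octets)
  "Return a closure yielding successive code points decoded from OCTETS,
then NIL once the input is exhausted."
  (let ((pos 0))                 ; decoding state lives here, not in OCTETS
    (lambda ()
      (when (< pos (length octets))
        (let* ((lead (aref octets pos))
               ;; number of continuation octets implied by the lead octet
               (extra (cond ((< lead #x80) 0)
                            ((< lead #xE0) 1)
                            ((< lead #xF0) 2)
                            (t 3)))
               ;; payload bits carried by the lead octet itself
               (code (if (zerop extra)
                         lead
                         (logand lead (ash #xFF (- (+ extra 2)))))))
          ;; fold in six payload bits from each continuation octet
          (loop for i from 1 to extra
                do (setf code (logior (ash code 6)
                                      (logand (aref octets (+ pos i)) #x3F))))
          (incf pos (1+ extra))
          code)))))
```

Two readers created over the same vector advance independently, and the vector itself is never modified.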
> | I think that a lot of state is the exception rather than the rule.
> You are actually wrong about this. []
I may be wrong about this, but you would need to provide statistics to demonstrate that a lot of state is the rule rather than the exception.
> | I also think that as shown above, we can externalise that state into points, at an acceptable cost for reasonable encodings.
> I truly wonder how you could have thought that anyone would want to store the iteration state in the object iterated over. []
Probably because of a reference to Emacs and mark/point.
> | It may be that I am unaware of some more complex common encodings, if there are any that you are thinking of in specific, please let me know.
> Try implementing a full ISO 2022 processor, try representing the device that ISO 6429 (informally known as "ANSI escape sequences") writes to, or consider the amount of state in a fully fledged MIME processor. []
From a quick glance, ISO 2022 doesn't seem enormously different from the Unicode SCS: set-selection, lock-shift, character-escaping, etc. Unfortunately the specification does not appear to be available on-line. If you have a reference to such, please provide it.
I'm not sure how display control sequences and MIME processing relate to string encoding.
br...@designix.com.au (Brian Spilsbury) writes:
> A sequence contains things which have both vector-access-characteristics and list-access-characteristics.
No.
A sequence contains things which have EITHER vector-access-characteristics OR list-access-characteristics.
The SET of all sequences admits BOTH things that have vector-access-characteristics AND things that have list-access-characteristics.
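The distinction can be checked at the REPL. The sketch below uses a hypothetical helper, sequence-kind, which is not part of CL. Its :other-sequence branch is only reachable in an implementation that provides sequence types beyond list and vector.

```lisp
;; Hypothetical helper: classify OBJECT by which kind of sequence it is.
(defun sequence-kind (object)
  (typecase object
    (list :list)                 ; list access characteristics
    (vector :vector)             ; vector access characteristics
    (sequence :other-sequence)   ; only if the implementation adds more
    (t :not-a-sequence)))

;; Any one object is a list or a vector, never both:
;; (sequence-kind '(1 2 3))  is :LIST
;; (sequence-kind "abc")     is :VECTOR
;; while the type SEQUENCE admits both:
;; (subtypep 'list 'sequence) and (subtypep 'vector 'sequence) are true.
```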
* Brian Spilsbury
| It's possible, but you have provided no reasoning or references.
I generally do not consider it my job to unconfuse people who make claims that something untrue is true. In fact, I take part in a discussion with the premise that those I talk to have done their own homework. If they have not and are not inclined to do it upon request, there can be no discussion.
| The definition of an array is such that you could implement an array via a hash-bucket which accepted only integers in the specified range.
> Random access _means_ O(1).
| No, random access means that the interface allows access to elements in a random order.
OK, so our terminology problem has just been compounded with stubbornness.
| This does not necessarily imply an O(1) access characteristic, although this might be commonly expected.
If an implementation offers arrays that have anything other than O(1) access characteristics, it will be so resoundingly trashed that even inventing such silly interpretations indicates that you come here to quibble, not understand anything.
| What is ad-hoc is that loop is a nice baroque flow-control language which happens to have some support for iterating sequences in certain circumstances.
(incf *troll-indicator*)
> Well, we just appear to have different tolerance of necessities
| This depends on how easily annoyed you are.
Really?
> An example of a stateful encoding with an annoyingly large amount of state would be useful so I know where the amount becomes annoyingly large.
| The example of the SCS encoding is one that I would consider to have a relatively large amount of state carried between elements.
SCS is nice and small by all standards.
| I can imagine some cases in which it would be desirable to sacrifice speed for reduced consing, although they would be unusual.
Huh? Why would anyone sacrifice speed for reduced consing? Are you sure you know what you are talking about here? Do you think using more memory leads to _slower_ code? It is usually the opposite that is true.
| Yes, that is what I'm advocating.
So you are just agreeing with me by arguing against what I suggest?
| > | I think that a lot of state is the exception rather than the rule.
| > You are actually wrong about this. []
| I may be wrong about this, but you would need to provide statistics to demonstrate that a lot of state is the rule rather than the exception.
How about you cough up some statistics to support your own claim!?
(incf *troll-indicator*)
| > | I also think that as shown above, we can externalise that state into points, at an acceptable cost for reasonable encodings.
| > I truly wonder how you could have thought that anyone would want to store the iteration state in the object iterated over. []
| Probably because of a reference to Emacs and mark/point.
OK, I see that this simile/analogy/metaphor thing is too complex for communication with you. I shall adjust accordingly.
| I'm not sure how display control sequences and MIME processing relate to string encoding.
Just think about it. This kind of statefulness is also found in input editing, which may occur at different times.
But I think you are a literate troll, and will probably not respond if you do not do any work on your own and only demand work of others when they doubt your statements.
/// -- In a fight against something, the fight has value, victory has none. In a fight for something, the fight is a loss, victory merely relief.
> | No, random access means that the interface allows access to elements in a random order.
> OK, so our terminology problem has just been compounded with stubbornness.
> | This does not necessarily imply an O(1) access characteristic, although this might be commonly expected.
> If an implementation offers arrays that have anything other than O(1) access characteristics, it will be so resoundingly trashed that even inventing such silly interpretations indicates that you come here to quibble, not understand anything.
I'm glad that you've recanted your position about random access requiring O(1) access characteristics.
This is not a silly interpretation nor a quibble, it is essential to the understanding of data-type interfaces and performance characteristics.
Beyond which you have taken an aside note, and blown it out of all proportion.
Accept that you made an incorrect assertion and move on.
The point that was being raised was that the spirit of the CL definition of string in terms of vector severely hampered any variable width encoding.
The aside point was that given the word of the CL definition of array and therefore vector, you could actually implement such a variable width encoding as a vector type and remain compliant.
Does this clarify the situation?
> | What is ad-hoc is that loop is a nice baroque flow-control language which happens to have some support for iterating sequences in certain circumstances.
> (incf *troll-indicator*)
Do you engage in personal attack in lieu of actual reasoning?
Can you provide meaningful disagreement with that assessment of loop?
> > An example of a stateful encoding with an annoyingly large amount of state would be useful so I know where the amount becomes annoyingly large.
> | The example of the SCS encoding is one that I would consider to have a relatively large amount of state carried between elements.
> SCS is nice and small by all standards.
Give an example which is average by your standards.
> | I can imagine some cases in which it would be desirable to sacrifice speed for reduced consing, although they would be unusual.
> Huh? Why would anyone sacrifice speed for reduced consing? Are you sure you know what you are talking about here? Do you think using more memory leads to _slower_ code? It is usually the opposite that is true.
Someone might be concerned with latency spikes from a non real-time garbage-collector.
Again, this would be unusual. (As a side note, if one thing is usually true, then it being false in an unusual situation is not in any way conflicting.)
> | Yes, that is what I'm advocating.
> So you are just agreeing with me by arguing against what I suggest?
No. You misunderstood what I was saying.
> | > | I think that a lot of state is the exception rather than the rule.
> | > You are actually wrong about this. []
> | I may be wrong about this, but you would need to provide statistics to demonstrate that a lot of state is the rule rather than the exception.
> How about you cough up some statistics to support your own claim!?
> (incf *troll-indicator*)
Firstly I offered an opinion.
Secondly you rebutted this in harsh terms without any relevant information supplied.
Thirdly you engaged in personal attacks when asked for justification for your unsupported rebuttal.
Perhaps you need to re-think what trolling means.
Secondly, all of the examples that I showed have quite small amounts of contextual state. utf-8, shift-jis, euc-jp, euc-kr. The one with the most state is SCS. ISO 2022 doesn't look much heavier than SCS, however I do not have access to the ISO 2022 specification.
You have failed to provide any reference to any character-stream protocol which is heavier in such state. MIME and terminal control sequences do not qualify.
Please do so, and do not make empty complaints about being forced to do homework. This is called 'backing up your own argument'.
> | > | I also think that as shown above, we can externalise that state into points, at an acceptable cost for reasonable encodings.
> | > I truly wonder how you could have thought that anyone would want to store the iteration state in the object iterated over. []
> | Probably because of a reference to Emacs and mark/point.
> OK, I see that this simile/analogy/metaphor thing is too complex for communication with you. I shall adjust accordingly.
Try to avoid personal attack if you want to be taken seriously.
> | I'm not sure how display control sequences and MIME processing relate to string encoding.
> Just think about it. This kind of statefulness is also found in input editing, which may occur at different times.
Input editing deals largely with intermediate state, as opposed to contextual state, and is not within the domain of the problem of string representation and accessing.
If you mean something else, then please clarify, without personal attacks.
> But I think you are a literate troll, and will probably not respond if you do not do any work on your own and only demand work of others when they doubt your statements.
With disagreeing statements, the weight of the onus falls upon the person making the stronger claim (for example 'You are actually wrong about this.', in contrast with 'I think that a lot of state is the exception rather than the rule.', which is a far weaker claim).
Secondly, what work have you done here apart from demand of myself when you disagree? Avoid hypocritical positions.
Please also avoid engaging in personal attack.
It is no substitute for reasoned discussion.
At this point it does not appear likely that it will be profitable to continue.
br...@designix.com.au (Brian Spilsbury) writes:
> Erik Naggum <e...@naggum.net> wrote in message <news:3226532389569746@naggum.net>...
> > [] Random access _means_ O(1). []
> No, random access means that the interface allows access to elements in a random order.
> This does not necessarily imply an O(1) access characteristic, although this might be commonly expected.
I cannot think of any way of having a random-access data structure where lookups weren't O(1). If you have some exceptional data structure in mind, please say what it is, because no one else has heard of it.
> As an example:
> * Does a hash-bucket object provide a random-access accessor?
Yes, probably.
> * Is it O(1) to access?
To the extent that it provides random access, yes. In really degenerate cases, hash tables can only provide linear access, which means they're O(n), but in that case, they're not random access; but then, you probably knew this.
> * Does the degenerate case of a hash-bucket containing only one bucket implemented with a list give O(n) access?
Of course. But it does not give random access, just a crappy interface to a list.
> > | The real problem is that sequence doesn't define any iterative operators, only cons [as list] does via cdr/rest and dolist, and the ad-hoc support via loop.
> > What is "ad-hoc" about it? []
> What is ad-hoc is that loop is a nice baroque flow-control language which happens to have some support for iterating sequences in certain circumstances.
True, but the support for sequences in LOOP is not ad-hoc, it's nicely integrated into the rest of LOOP.
> Loop is not an iteration primitive for sequences, and CL does not contain such a primitive to my knowledge.
Sure it does, MAP. IMHO, CL could have used a DOSEQUENCE to go along with MAP, but CL certainly gives you a general sequence iteration facility.
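For illustration, a DOSEQUENCE along the lines of DOLIST can be sketched on top of MAP. This macro is hypothetical, is not part of CL, and is only one of several ways it could be written.

```lisp
;; Hypothetical macro, modelled on DOLIST but accepting any sequence.
;; Unlike DOLIST, VAR is not bound while RESULT is evaluated.
(defmacro dosequence ((var sequence &optional result) &body body)
  `(block nil                       ; lets RETURN exit early, as in DOLIST
     (map nil (lambda (,var) ,@body) ,sequence)
     ,result))

;; Works on lists, vectors and strings alike:
;; (dosequence (c "abc") (print c))
;; (dosequence (x #(1 2 3)) (when (= x 2) (return :found)))
```

Because it expands into MAP, it inherits MAP's limitations: no START/END bounds and no access to the current position.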
> > No, random access means that the interface allows access to elements in a random order.
> > This does not necessarily imply an O(1) access characteristic, although this might be commonly expected.
> I cannot think of any way of having a random-access data structure where lookups weren't O(1). If you have some exceptional data structure in mind, please say what it is, because no one else has heard of it.
> > As an example:
> > * Does a hash-bucket object provide a random-access accessor?
> Yes, probably.
> > * Is it O(1) to access?
> To the extent that it provides random access, yes. In really degenerate cases, hash tables can only provide linear access, which means they're O(n), but in that case, they're not random access; but then, you probably knew this.
> > * Does the degenerate case of a hash-bucket containing only one bucket implemented with a list give O(n) access?
> Of course. But it does not give random access, just a crappy interface to a list.
Well, this is a consistent position to take.
However there are some implications which might not be obvious.
If we define random-access to be uniform time access, then the addition of a cache mechanism to an otherwise random-access structure causes it to stop being random-access (or at least become less random-access).
Beyond this, it begs the question 'why is random-access called random-access rather than uniform-time access?'
My understanding is that it is random-access in the sense that random elements are necessarily unrelated, and therefore random accesses are likewise independent of one another, but may well be dependent upon their own individual differences.
I think that it makes little sense to tie independent element access back to uniform access time.
As an example, is your random-access memory random-access if we have added a cache to it? By the definition which you have given, we would at least have to say that it is 'less random-access' than uncached RAM would be.
This does not seem particularly reasonable.
As a second example; Is a hard-drive random-access? The underlying implementation certainly is not. The interface that we use to a hard-drive tends to be.
This is a more interesting example, since the implementation's access characteristics for different elements are not independent, but we ignore this factor in the higher level interface, i.e. we deal with the sequential access implementation of the hard-drive through an abstraction which provides a random access interface.
My feeling is that for a consistent view of random-access we need to consider whether access to a given element is dependent upon access to another element at the level of the interface that is exposed.
This means that I need to accept a hash-bucket structure as random-access, but I can still talk about lousy degenerate performance.
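A minimal sketch of that position, using a hypothetical integer-indexed structure (bucket-array, make-bucket-array and bucket-ref are invented names, not CL). The interface lets any index be read or written independently of any other, while the cost of an access depends on how many buckets there are. With a single bucket every access degenerates to a linear search of one list.

```lisp
;; Hypothetical integer-indexed container backed by buckets of alists.
(defstruct (bucket-array (:constructor %make-bucket-array (buckets)))
  buckets)

(defun make-bucket-array (bucket-count)
  (%make-bucket-array (make-array bucket-count :initial-element '())))

(defun bucket-ref (ba index)
  "Return the value stored at INDEX, or NIL if none."
  (let* ((buckets (bucket-array-buckets ba))
         (bucket (aref buckets (mod index (length buckets)))))
    (cdr (assoc index bucket))))       ; linear search within one bucket

(defun (setf bucket-ref) (value ba index)
  (let* ((buckets (bucket-array-buckets ba))
         (slot (mod index (length buckets)))
         (cell (assoc index (aref buckets slot))))
    (if cell
        (setf (cdr cell) value)
        (push (cons index value) (aref buckets slot)))
    value))
```

(make-bucket-array 1) and (make-bucket-array 1024) expose exactly the same accessor; only the performance differs.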
As a final note you've ended up with a hash-bucket's random-access nature being undefined.
> > > | The real problem is that sequence doesn't define any iterative operators, only cons [as list] does via cdr/rest and dolist, and the ad-hoc support via loop.
> > > What is "ad-hoc" about it? []
> > What is ad-hoc is that loop is a nice baroque flow-control language which happens to have some support for iterating sequences in certain circumstances.
> True, but the support for sequences in LOOP is not ad-hoc, it's nicely integrated into the rest of LOOP.
Yes, but not into the rest of CL :)
I'm not saying that loop is a bad thing, which is why I added nice.
> > Loop is not an iteration primitive for sequences, and CL does not contain such a primitive to my knowledge.
> Sure it does, MAP. IMHO, CL could have used a DOSEQUENCE to go along with MAP, but CL certainly gives you a general sequence iteration facility.
Map and the associated functions do iterate across sequences.
There are two things that are lacking in this regard though, imho.
One is an ability to iterate a subsequence.
The other is the ability to provide access to the sequence being iterated from the current position.
As an example, consider using map to implement a LALR(1) parser.
We can have no look-ahead at all, so we must look backward, which we can do.
We cannot know when we're about to terminate (unless we track our position and the length manually).
We could implement a string parser like;
  (let ((last nil)
        (state (make-state)))
    (map nil
         (lambda (char)
           (build-state state last char)
           (setf last char))
         buffer)
    ;; handle the last element
    (build-state state last (elt buffer (- (length buffer) 1)))
    state)
I do not think that it is reasonable to view map as being a general iteration mechanism.
I think that map is quite sufficient as a mapping mechanism without trying to shoehorn things like this in. :)
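For contrast, here is a sketch of the kind of facility described as missing: iteration over a subsequence with one element of look-ahead. The function map-with-lookahead is hypothetical and not part of CL. It uses ELT for brevity, which makes it quadratic on lists, and it signals the end with NIL, which is ambiguous if the sequence itself contains NIL.

```lisp
;; Hypothetical helper, not part of CL.
(defun map-with-lookahead (function sequence &key (start 0) end)
  "Call FUNCTION on each element of SEQUENCE from START below END,
passing the following element (or NIL at the end) as a second argument."
  (let ((end (or end (length sequence))))
    (loop for i from start below end
          do (funcall function
                      (elt sequence i)
                      (if (< (1+ i) end)
                          (elt sequence (1+ i))
                          nil)))))

;; (map-with-lookahead (lambda (char next) (print (list char next))) "abc")
```

With this shape the caller sees the end coming, because the look-ahead argument is NIL on the final call, so no separate "handle the last element" step is needed.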
Some of these operations use independent element access, some of these use interdependent element access, i.e. elt vs. position.
The unexhaustive partition note indicates that you cannot reduce sequence to list XOR vector, and must consider sequence to be an ADT of its own, with two common implementations.
I do agree that my statement above was problematic, thank you for pointing this out.
br...@designix.com.au (Brian Spilsbury) writes:
> The types vector and the type list are disjoint subtypes of type sequence, but are not necessarily an exhaustive partition of sequence."
Yes, for better or worse, this is left to _vendor_ experimentation.
A vendor, of course, can pass through experimentation capability to you.
As the NBS rep pointed out early on in the standards process, it's not the role of a standards committee to do design. We did it sometimes, but always as a last resort in order to achieve consensus when the options were in conflict. The first choice, though, is to have one or more vendors with a happy experience to report.
So I'd work on convincing my vendor if I were you...
>> The types vector and the type list are disjoint subtypes of type sequence, but are not necessarily an exhaustive partition of sequence."
>Yes, for better or worse, this is left to _vendor_ experimentation.
>A vendor, of course, can pass through experimentation capability to you.
>As the NBS rep pointed out early on in the standards process, it's not the role of a standards committee to do design. We did it sometimes, but always as a last resort in order to achieve consensus when the options were in conflict. The first choice, though, is to have one or more vendors with a happy experience to report.
>So I'd work on convincing my vendor if I were you...
Brian *is* assuming the role of vendor here, i.e., SBCL hacker.