Hi, could someone recommend any good, independent sites for the above? In particular, I'm looking for product comparisons between Harlequin's and Franz's implementations. I tried the ALU site (and the rest of the web) for a few hours without any success. Thanks in advance.
> In particular, I'm looking for product comparisons between > Harlequin's and Franz's implementations. I had tried the ALU site > (and the rest of the web) for a few hours without > any success.
Apart from Usenet snippets, the only product comparison I'm aware of is by David Lamkins (http://www.teleport.com/~dlamkins). Unfortunately, it's probably too old to be of any use.
I've waited two days for people with more experience to shed some light here. But, apparently, nobody is willing to burn his fingers on a comparison between Harlequin and Franz. So here's my (very personal and very subjective) impression, based on about 1000 hours of working with Harlequin's Lispworks, 50 hours of experiments with Franz' previous version (don't remember version number) for Windows and about 5 hours of playing with Franz' current version. All of this on Windows 95/98.
* Price
Franz is a lot more expensive than Harlequin (at least a few thousand vs. less than one thousand dollars). Also, Franz wants royalties for programs that you distribute; Harlequin doesn't (unless you use their Enterprise Edition). For personal use, both companies have a free version.
* Conformance to standards.
My impression is that both companies are pretty good at conforming to the ANSI spec, but that Harlequin takes it a bit more seriously than Franz. Harlequin seems to be much better at supporting Unicode and other character sets. Franz still seems to think that 256 characters is more than enough (just like Bill Gates thought that 640K is more than anyone would ever need).
* Integration with underlying platform
My impression is that Franz puts more effort into this than Harlequin. For Windows, Franz seems to support more platform-specific stuff (e.g. multimedia extensions, tree views). Also, their development environment has a more 'natural' feel.
* Performance
I haven't run any benchmarks, but Lispworks feels a bit more sluggish (both in space and speed) than Allegro CL.
If money didn't matter, I would use Allegro for platform-dependent stuff and Lispworks for everything else. Personally, I can't afford Allegro and I've settled for Lispworks. I've never regretted buying it.
I'll be happy to have my impressions corrected by people who know better.
* Arthur Lemmens <lemm...@simplex.nl> | I've waited two days for people with more experience to shed some | light here. But, apparently, nobody is willing to burn his fingers | on a comparison between Harlequin and Franz.
that's because this is the kind of stuff lawsuits are made of. you need a protective wrapper of serious legal quality to dive into this matter of comparing products in general. not that I think Franz or Harlequin will sue anyone, but most professionals are aware of the problems of comparing products, and consequently avoid it, at least in public.
| So here's my (very personal and very subjective) impression, based on | about 1000 hours of working with Harlequin's Lispworks, 50 hours of | experiments with Franz' previous version (don't remember version number) | for Windows and about 5 hours of playing with Franz' current version. | All of this on Windows 95/98.
although extremely important to inform your readers of (thanks), this makes your comparison "weak". (I wouldn't be able to provide a stronger comparison, by the way.)
| * Price
price comparisons are more dangerous than any other comparisons.
| * Conformance to standards.
this comparison should be performed by someone very familiar with the standard and its semantics, because impressions of non-conformance may actually be within the bounds of conformance, and some non-conformances may be insignificant and easily fixed if the vendor is alerted to them.
| My impression is that both companies are pretty good at conforming to the | ANSI spec, but that Harlequin takes it a bit more seriously than Franz.
this is _very_ difficult to establish from watching the products, as it refers to intentions and future, not to the past. it _is_ fair to say that Harlequin's LispWorks conforms better to the specification in some areas than Franz's Allegro CL does, but it has to be an area-by-area comparison to be fair, and the severity of the non-conformance is also important for a fair comparison. e.g., _my_ impression is that Allegro CL has a weaker safe mode (not all errors signal errors as they should) than one could hope for, but this is not an area where I need it, so it may or may not matter to a particular programmer. (incidentally, I know that Franz Inc _is_ taking conformance seriously and I'm working with them to help us all get there.)
| Harlequin seems to be much better at supporting Unicode and other | character sets.
although very valuable for a user, this is not about conformance to the ANSI Common Lisp standard. it is therefore important to state what you expect from a product.
| Franz still seems to think that 256 characters is more than enough (just | like Bill Gates thought that 640K is more than anyone would ever need).
such parenthetical remarks, however, make your "comparison" nigh useless.
incidentally, Franz Inc has an "international" (= Japanese) version that covers the need of most present non-Latin speakers. (I have had to do a little home-brewing to get ISO 8859-1 working as I want it to in Allegro CL, but I don't know whether LispWorks is any better.)
| * Integration with underlying platform
this is a valuable comment to a user.
| * Performance
comparisons here are fraught with danger and should be performed with published code and all sorts of things. e.g., some property that makes it feel "sluggish" could be extremely easy to fix, and other properties can be very hard to change because they are endemic to the design. I think performance comparisons are _generally_ unfair, because after you have decided on a product, you learn how to make it faster.
| I'll be happy to have my impressions corrected by people who know | better.
I don't want to snap at you, but it's a _lot_ safer to talk to the person requesting a comparison and let it be a personal exchange, rather than post impressions and request correction; it usually requires a huge effort to correct simple misimpressions. this is why comparisons often produce a tremendous amount of noise on the newsgroups. also, most user impressions are exceedingly hard to quantify, and a lot of factors come into play.
incidentally, I haven't had the opportunity to compare Allegro CL with much anything else. (I went from CMUCL 17f to Allegro CL 4.3 and it was a world of difference, so I don't even consider CMUCL possible to compare in the area I think matters the most: the development environment.) I get the performance I need, and I get the support I need from Franz Inc whenever I wonder about something or find a problem, and I see no reason to go look for a competing product. now, this is more an accident of history than anything else, so it does in no way preclude similar experiences with Harlequin -- it just didn't happen to me. my guess is that this is how most user impressions are formed: luck and good timing.
#:Erik -- environmentalists are much too concerned with planet earth. their geocentric attitude prevents them from seeing the greater picture -- lots of planets are much worse off than earth is.
Arthur Lemmens wrote: > I've waited two days for people with more experience to shed some > light here. But, apparently, nobody is willing to burn his fingers > on a comparison between Harlequin and Franz. So here's my (very
OK. I've used both Allegro (up to 4.2) and Harlequin (up to 1995), and I must say, FFI issues aside, that I would go with Allegro. I just feel safer (some might argue that Harlequin's other businesses make it safer, but for me it gives a feeling (just an impression, but that's what marketing is all about) of lack of commitment). Maybe more importantly, I never `got' the Harlequin way. I'm not a fan of IDEs. The primitive (but then, you have those cool menus) Xemacs interface of Allegro feels good. I've never used ACL for Windows but it looks pretty neat, so maybe it's just that I don't like _this_ Harlequin IDE.
I'd really like a Genera for SGI to play around. I have no use for it right now, but I'd gladly pay my own $800 for the manuals (upgradeable to a commercial license).
-- Fernando D. Mato Mira Real-Time SW Eng & Networking Advanced Systems Engineering Division CSEM Jaquet-Droz 1 email: matomira AT acm DOT org CH-2007 Neuchatel tel: +41 (32) 720-5157 Switzerland FAX: +41 (32) 720-5720
Erik Naggum wrote: > price comparisons are more dangerous than any other comparisons.
Would you care to explain? I would think that price is just about the only thing you can compare without the risk of giving "misimpressions".
> this comparison should be performed by someone very familiar with the > standard and its semantics > [...] > but it has to be an area-by-area comparison to be fair, and the > severity of the non-conformance is also important for a fair comparison.
I can't disagree with this, of course. But I tried to make it clear that I was giving my personal impression and not attempting to make a fair comparison. (I couldn't possibly find the time for a fair comparison, but I didn't want to leave the original question unanswered.)
> | Franz still seems to think that 256 characters is more than enough (just > | like Bill Gates thought that 640K is more than anyone would ever need).
> such parenthetical remarks, however, make your "comparison" nigh useless.
Sorry, I shouldn't have said that. This wasn't the right place to vent my frustration about the slow acceptance of a decent international character set.
> (I have had to do a little home-brewing to get ISO 8859-1 working > as I want it to in Allegro CL, but I don't know whether LispWorks > is any better.)
I'm so glad that I only need to type
  (code-char #x41A)
to actually get a Russian K that I've forgiven Lispworks for returning NIL when I ask
  (alpha-char-p *)
But it _does_ know something about Latin 1:
CL-USER 17 > (code-char #xF0)
#\ð

CL-USER 18 > (char-upcase *)
#\Ð
> it's a _lot_ safer to talk to the person requesting a comparison > and let it be a personal exchange, rather than post impressions > and request correction;
Thanks for the advice. I don't know if I will actually follow it, though. Having a public discussion increases the chance that _I_ can learn something as well. E.g., if I had sent my remarks privately, I wouldn't have learnt from you that Franz has an international version of Allegro CL.
> (I went from CMUCL 17f to Allegro CL 4.3 and it was a world of > difference, so I don't even consider CMUCL possible to compare > in the area I think matters the most: the development environment.)
Let's hope the CMUCL maintainers won't sue you for this remark ;-)
* Arthur Lemmens <lemm...@simplex.nl> | Would you care to explain? I would think that price is just about the | only thing you can compare without the risk of giving "misimpressions".
there are all sorts of pricing policies around, depending on who you are (student, commercial, educational), where you are (United States, Europe, Asia), how much you want to buy (trial, student, professional, enterprise edition), etc, etc. it's actually difficult to compare the price that you would have to pay unless you're the person buying something and in a position to weigh alternatives. for instance, one might find that some add-on product is not worth the price from one vendor and roll one's own, while from another vendor the price is acceptable. the result may be that the former costs less from the vendor than the latter, but more once life-cycle costs for the new code are accounted for, which a comparison of initial prices alone does not show. stuff like this is why large companies have acquisitions departments who work like hell to get good package deals.
| But it _does_ know something about Latin 1:
|
| CL-USER 17 > (code-char #xF0)
| #\ð
|
| CL-USER 18 > (char-upcase *)
| #\Ð
good. Allegro CL does this correctly only with my personal fixes. (which, incidentally, supports the entire ISO 8859 family, one by one, once properly invoked.)
| E.g., if I had sent my remarks privately, I wouldn't have learnt from | you that Franz has an international version of Allegro CL.
valid point. however, it would have been prudent to ask Franz Inc if you tried to write a fair comparison.
#:Erik
In article <37164EF3.F1598...@simplex.nl>, Arthur Lemmens <lemm...@simplex.nl> wrote:
(...)
> I'm so glad that I only need to type > (code-char #x41A) > to actually get a Russian K
Cyrillic K, please. There are a number of nations besides the Russians, including Byelorussians, Macedonians, Serbs, Ukrainians, as well as Bulgarians, who use (different versions of) this alphabet.
I am too lazy to look it up, but I believe the ISO 10646 name of this character is CYRILLIC CAPITAL LETTER KA or something.
Historical Note: The original version of the Cyrillic alphabet was developed in the 9th century on the basis of the Greek alphabet. Its name is a tribute to St. Cyril, the Eastern Roman scholar and missionary who captured the phonetics of the (then common) Slavonic language into a writing system (using a different alphabet, now extinct) and who was a translator of the Bible, and its development is credited to St. Climent of Okhrid, one of St. Cyril's disciples.
By the way, I am not aware of another alphabet besides the contemporary Russian version of Cyrillics where the number of letters is a power of 2 (2^5).
(...)
> But it _does_ know something about Latin 1:
> CL-USER 17 > (code-char #xF0)
> #\ð
>
> CL-USER 18 > (char-upcase *)
> #\Ð
(...)
I'd rather have had the above as
(char-code (char-upcase (code-char #xF0)))
instead of, or in addition to, the above, which makes little apparent sense on my Macintosh with a Cyrillic font selected.
-- Vassil Nikolov <vniko...@poboxes.com>
-----------== Posted via Deja News, The Discussion Network ==---------- http://www.dejanews.com/ Search, Read, Discuss, or Start Your Own
> By the way, I am not aware of another alphabet besides the > contemporary Russian version of Cyrillics where the number > of letters is a power of 2 (2^5).
[mumbling the alphabet]... hmm, last time I checked - it was 33. :-)
You probably forgot cyrillic-letter-io, which is rarely used in printed Russian (mostly in texts for children and foreigners and in ambiguous cases) and is substituted with cyrillic-letter-ie. Still it's part of the alphabet/orthography.
I wrote: > I'm so glad that I only need to type > (code-char #x41A) > to actually get a Russian K
Vassil Nikolov replied:
> Cyrillic K, please. There are a number of nations besides the > Russians, including Byelorussians, Macedonians, Serbs, Ukrainians, > as well as Bulgarians, who use (different versions of) this alphabet.
Uhm, yes. Sorry. In my situation, (code-char #x41A) is usually a Russian K. But next time I'll call it Cyrillic. I sometimes forget I'm not the only Cyrillic speaking (;-) Lisp programmer on Usenet.
> I am too lazy to look it up, but I believe the ISO 10646 name of > this character is CYRILLIC CAPITAL LETTER KA or something.
I looked it up. You're right.
> I'd rather have had the above as
> (char-code (char-upcase (code-char #xF0)))
> instead of, or in addition to, the above, which makes little > apparent sense on my Macintosh with a Cyrillic font selected.
Before sending, I verified that the content-type header included "charset=iso8859-1" to increase the probability of readers seeing what I meant.
Arthur Lemmens <lemm...@simplex.nl> writes: > > (I have had to do a little home-brewing to get ISO 8859-1 working > > as I want it to in Allegro CL, but I don't know whether LispWorks > > is any better.)
LispWorks uses ISO 8859-1 for files by default. Currently users need to do some configuration to use other encodings for files.
The internal encoding of LispWorks 4.x is Unicode. It has just one executable.
> I'm so glad that I only need to type > (code-char #x41A) > to actually get a Russian K that I've forgiven Lispworks for > returning NIL when I ask > (alpha-char-p *)
> But it _does_ know something about Latin 1:
> CL-USER 17 > (code-char #xF0)
> #\ð
>
> CL-USER 18 > (char-upcase *)
> #\Ð
Yes, in LispWorks we added the alphabetic property and case-pairs (beyond those required by the ANSI standard) for Latin-1 only. I should admit that this is rather half-baked, but allow me to explain one of the technical problems...
Recall that BASE-STRINGs contain only BASE-CHARs. LispWorks provides also a 16bit string type (TEXT-STRING) which can contain all of Unicode.
There is a particular difficulty (for LispWorks at least) with U+00FF LATIN SMALL LETTER Y WITH DIAERESIS which is a BASE-CHAR in LispWorks yet its uppercase pair (as defined by Unicode) is an EXTENDED-CHAR. Thus if we were to make these particular characters BOTH-CASE-P then STRING-UPCASE etc. could not be relied upon to preserve string types. That might be acceptable by the ANSI standard (though potentially dangerous to users whose code used specialized accessors) but the real killer was NSTRING-UPCASE.
I suppose we could have defined a larger set of alphabetic characters without such problems, but we didn't. Sorry!
There was some attempt to define extended character case (and other) functions in the JEIDA Common Lisp Guideline. I don't know if anyone actually implemented that.
LispWorks users needing case converters beyond Latin-1 should exploit the fact that the internal encoding is Unicode to write their own functions using range checks.
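As a rough illustration of the range-check approach, here is a minimal sketch for the Cyrillic small letters U+0430..U+045F. It is not Harlequin's code: the function names are made up for this example, and it assumes an implementation in which CHAR-CODE and CODE-CHAR work with Unicode code points, as described above for LispWorks 4.x. Whether MAP with result type STRING yields a string wide enough for these characters is also an assumption.

```lisp
;; Hypothetical helpers, for illustration only.
;; Assumes CHAR-CODE / CODE-CHAR use Unicode code points.

(defun cyrillic-upcase-char (char)
  "Return the capital partner of CHAR if it lies in U+0430..U+045F, else CHAR."
  (let ((code (char-code char)))
    (cond ((<= #x430 code #x44F)          ; basic small letters
           (code-char (- code #x20)))     ; capitals are at U+0410..U+042F
          ((<= #x450 code #x45F)          ; extension row, e.g. U+0451 (io)
           (code-char (- code #x50)))     ; capitals are at U+0400..U+040F
          (t char))))

(defun cyrillic-upcase-string (string)
  "Return a fresh string with the Cyrillic small letters of STRING upcased."
  (map 'string #'cyrillic-upcase-char string))
```

Characters outside the two ranges are returned unchanged, so the functions can be composed with CHAR-UPCASE for the Latin part.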
-- Dave Fox Email: da...@harlequin.com Harlequin Ltd, Barrington Hall, Tel: +44 1223 873879 Barrington, Cambridge CB2 5RG, England. Fax: +44 1223 873873 These opinions are not necessarily those of Harlequin.
* David Fox | | There is a particular difficulty (for LispWorks at least) with | U+00FF LATIN SMALL LETTER Y DIARESIS which is a BASE-CHAR in | LispWorks yet its uppercase pair (as defined by Unicode) is an | EXTENDED-CHAR.
This problem is solved in the recently-approved ISO 8859-15, so providing that as an alternative to 8859-1 may make sense.
David Fox <da...@harlequin.co.uk> wrote: >Yes, in LispWorks we added the alphabetic property and case-pairs >(beyond those required by the ANSI standard) for Latin-1 only. I >should admit that this is rather half-baked, ...
If you need the case pairs for some other codepages, you can grab these. It's not the full set but you get the idea. I haven't found them on the net anywhere so I had to work them out on my own. I wrote it for AutoLISP; in CL you could perhaps use vectors instead.
I post this also (instead of linking to it) to let you see how weirdly some codepages were designed, considering case predicates and case conversions. I guess most OSes do it by precalculating the tables, wasting a lot of bytes.
Note: the third element <tolower> of each triple is the numeric difference from the uppercase to the lowercase char. So (65 90 32) means that there are 26 uppercase chars from 65 to 90 with the lower brothers 32 above (65+32 up to 90+32).
;;; Hardcoded charset capital letter ranges per codepage,
;;; kind of LC_CTYPE info. Format: list of: (<from> <to> <tolower>)
;;; Found the differences in toupper, tolower, isupper, islower
;;; by scanning the descriptive character names for upper and lower,
;;; unified the pairs into groups and came up with redefinitions
;;; of the upper/lower predicates and conversions.
(setq std:cp-cap-ascii '((65 90 32))) ; this is simple
;; there's a hole at 215, and 223 (sharp s) has no capital partner,
;; so the last range stops at 222
(setq std:cp-cap-iso8859-1 '((65 90 32)(192 214 32)(216 222 32)))
(setq std:cp-cap-iso8859-2 '((65 90 32)(192 214 32)(216 222 32)
                             (161 161 16)(163 163 16)
                             (165 166 16)(169 172 16)(174 175 16)
                             ))
(setq std:cp-cap-iso8859-3 '((65 90 32)(192 214 32)(216 222 32)
                             (161 161 16)(166 166 16)
                             (169 172 16)(175 175 16)
                             ; 0xAE, 0xBE seem to be missing
                             ))
;; Beware: Dynamic Autolisp code, just to get the idea.
;; you really should store the pairs in bitfield for the
;; predicates and vectors for the converters.
(defun STD-ISUPPER (_i)
  (if (stringp _i) (setq _i (ascii _i)))
  (apply 'or
         (mapcar (function (lambda (l) (<= (car l) _i (cadr l))))
                 std:actual-cp-cap)))

(defun STD-TOUPPER (i / cp x)
  (setq x (car (setq cp std:actual-cp-cap)))
  (while x
    (if (<= (+ (caddr x) (car x)) i (+ (caddr x) (cadr x)))
      (setq i (- i (caddr x)) x nil)
      (setq cp (cdr cp) x (car cp))
    )
  )
  i
)
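The suggestion to use a bitfield for the predicate and a vector for the converter could look roughly like this in Common Lisp. This is only a sketch under assumptions: all names are invented for the example, it works on integer codes rather than characters, and the ISO 8859-1 ranges used here stop at 222 because 223 is the sharp s, which has no capital in that set.

```lisp
;; Hypothetical names throughout; codes are plain integers 0..255.

(defparameter *latin-1-capital-ranges*
  '((65 90 32) (192 214 32) (216 222 32))
  "Triples (LO HI OFFSET): capitals LO..HI have small letters OFFSET higher.")

(defun make-isupper-bits (ranges)
  "Return a 256-bit vector with bit N set when code N is a capital letter."
  (let ((bits (make-array 256 :element-type 'bit :initial-element 0)))
    (loop for (lo hi) in ranges
          do (fill bits 1 :start lo :end (1+ hi)))
    bits))

(defun make-toupper-table (ranges)
  "Return a 256-element vector mapping each code to its uppercase code."
  (let ((table (make-array 256 :element-type '(unsigned-byte 8))))
    (dotimes (code 256)
      (setf (aref table code) code))      ; default: code maps to itself
    (loop for (lo hi offset) in ranges
          do (loop for upper from lo to hi
                   do (setf (aref table (+ upper offset)) upper)))
    table))
```

With the tables built once per codepage, both the predicate and the conversion become a single AREF or BIT lookup, at a cost of 256 bytes plus 256 bits per codepage.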
In article <7f6l5s$cn...@news.ptc.spbu.ru>, "Valeriy E. Ushakov" <u...@ptc.spbu.ru> wrote:
> Vassil Nikolov <vniko...@poboxes.com> wrote: (...) > > By the way, I am not aware of another alphabet besides the > > contemporary Russian version of Cyrillics where the number > > of letters is a power of 2 (2^5).
> [mumbling the alphabet]... hmm, last time I checked - it was 33. :-)
> You probably forgot cyrillic-letter-io, which is rarely used in > printed Russian (mostly in texts for children and foreigners and in > ambiguous cases) and is substituted with cyrillic-letter-ie. Still > it's part of the alphabet/orthography.
No, I had not forgotten it, I forgot to write something like `mainstream use,' and I apologise for that. (I believe I have (almost) never seen this letter in a publication (and I have read a _lot_ of Russian texts, still do from time to time) which was not a children's book, a textbook, a dictionary, or some such.) Specifically, in the context of the thread, I was thinking of the 32-character block that one sees in 8859-5 etc.
Of course, cyrillic letter io _is_ an integral part of the Russian alphabet (Ukrainian? Byelorussian?), and I should have mentioned it. By the way, with all those language reforms, the phrase `last time I checked' is very appropriate...
-- Vassil Nikolov <vniko...@poboxes.com> www.poboxes.com/vnikolov (You may want to cc your posting to me if I _have_ to see it.) LEGEMANVALEMFVTVTVM (Ancient Roman programmers' adage.)
In article <3716EA3D.22D9B...@simplex.nl>, Arthur Lemmens <lemm...@simplex.nl> wrote: (...)
> In my situation, (code-char #x41A) is usually a > Russian K. But next time I'll call it Cyrillic.
That sounds like an interesting situation. If it is _usually_ that, what is it _sometimes_? Does it ever happen to be a Bulgarian K? And what would the difference be, for your purposes, between a Russian K and a Bulgarian K? (I'd be hard pressed to think of such a difference in terms of characters and their codes.)
(Or do you sometimes use another 16-bit-per-character encoding where #x41A is the code of some Chinese or Japanese ideogram?)
My point was that unless the context is appropriately specific, the generic name (Cyrillic) should be used in preference to the language-specific name (Russian). In the same way, outside of a specific context, it is appropriate to say `Roman K' (or `Latin K'), rather than `English K' (or `Italian K' etc.).
If only the world had simply stuck to the good old Phoenician alphabet as it was...
(...)
> > I'd rather have had the above as
> > (char-code (char-upcase (code-char #xF0)))
> > instead of, or in addition to, the above, which makes little > > apparent sense on my Macintosh with a Cyrillic font selected.
> Before sending, I verified that the content-type header included > "charset=iso8859-1" to increase the probability of readers seeing > what I meant.
I _did_ see what you meant---but not with my _eyes_ (with the mind's eye, perhaps, if my mind has one---I have never seen it).
Well, I know I deserve to lose... Having struggled on too many occasions with all those 4-5 different Cyrillic encodings that are in _active_ use around myself (and that are mutually exclusive with the Roman letters with diacritical marks, for happiness to be complete), and with all those different EBCDIC-ASCII mappings, etc.^1, I have become somewhat hypersensitive to not having the character code itself on such occasions. I wish ``charset=...'' did work, always. In a perfect world, maybe.
__________
^1 the law of perverse solutions (`every problem has one') is also applicable here: there are character sets where the codes for the Roman _and_ Cyrillic letters A, C, E, etc. (that have the same glyphs) are the same... KOI-8 (and even DKOI) is a blessing by comparison.
-- Vassil Nikolov <vniko...@poboxes.com> www.poboxes.com/vnikolov (You may want to cc your posting to me if I _have_ to see it.) LEGEMANVALEMFVTVTVM (Ancient Roman programmers' adage.)
In article <3717b01c.47909860@judy>, rur...@sbox.tu-graz.ac.at (Reini Urban) wrote: (...)
> I post this also (instead of linking to it) to let you see how weird > some codepages had been designed, considering case predicates and case > conversions. I guess most OS do it by precalculating the tables, wasting > a lot of bytes.
(...)
First of all, it was nice of you to post a useful piece of data.
Second, I would like to make a few points, not to criticise, but to show there are different ways to look at this.
* The sets you identified as weird all contain Cyrillic characters that by themselves look rather strange, even to one who knows the Greek alphabet (which helps a little). Regarding the layout, weirdness comes at least in part from the fact that only the 32 `mainstream' Cyrillic characters are in contiguous positions (even with `well-behaved' sets like 8859-5). Since there are other characters in addition to these 32, they had to be fit elsewhere, while deciding which other characters (like left/right single/double quotes) to keep and which to sacrifice.
(By the way, even limiting ourselves to ()[]{}<>, there isn't a simple operation like toggling a bit to convert an `opener' into a `closer,' so even 7-bit ASCII is not absolutely regular (not that it could have been, I believe).)
* Keeping tables to support case conversions etc. does not take up that much memory (especially now that memory does not come so expensive as a couple of decades ago), and improves speed a lot; besides, with some sets like KOI-8 and effectively Macintosh Cyrillics^1 as well, tables are a must in order to do sorting even if we limit ourselves to the `mainstream' characters (because (< (CHAR-CODE a) (CHAR-CODE b)) does not produce alphabetical order).
__________
^1 uppercase: 80-9F, lowercase: E0-FE,DF
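To make the sorting point concrete, here is a minimal sketch of a rank table for the 32 small letters of KOI8-R, which occupy codes C0-DF in an order that is not alphabetical. The names are invented for this example, the code works on integer byte values rather than Lisp characters, and it covers only the `mainstream' small letters (no io, no capitals).

```lisp
;; Hypothetical names; codes are KOI8-R byte values as integers.

(defparameter *koi8-small-letters-in-order*
  '(#xC1 #xC2 #xD7 #xC7 #xC4 #xC5 #xD6 #xDA    ; a be ve ghe de ie zhe ze
    #xC9 #xCA #xCB #xCC #xCD #xCE #xCF #xD0    ; i short-i ka el em en o pe
    #xD2 #xD3 #xD4 #xD5 #xC6 #xC8 #xC3 #xDE    ; er es te u ef ha tse che
    #xDB #xDD #xDF #xD9 #xD8 #xDC #xC0 #xD1)   ; sha shcha hard yeru soft e yu ya
  "KOI8-R codes of the 32 small letters, listed in alphabetical order.")

(defun make-rank-table (codes-in-order)
  "Return a 256-element vector giving a sort rank for every byte code."
  (let ((table (make-array 256 :element-type 'fixnum)))
    (dotimes (code 256)
      (setf (aref table code) code))      ; default: rank equals the code
    ;; The letters fill C0-DF exactly, so ranks C0..DF stay unique.
    (loop for code in codes-in-order
          for rank from #xC0
          do (setf (aref table code) rank))
    table))

(defun koi8-code< (a b table)
  "True when byte code A sorts before byte code B according to TABLE."
  (< (aref table a) (aref table b)))
```

For instance, the raw codes put yu (C0) before a (C1), while the table ranks a first and yu second to last.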
Third, if anyone needs assistance with making sense out of Cyrillic characters and sets (in particular, Bulgarian, Russian, and Serbian <silly remarks deleted>), I'd be happy to be of any help, just send me a private e-mail.
Good luck with character sets, Vassil.
-- Vassil Nikolov <vniko...@poboxes.com> www.poboxes.com/vnikolov (You may want to cc your posting to me if I _have_ to see it.) LEGEMANVALEMFVTVTVM (Ancient Roman programmers' adage.)
In article <wk3e20ify6....@ifi.uio.no>, Lars Marius Garshol <lar...@ifi.uio.no> wrote:
> * David Fox > | > | There is a particular difficulty (for LispWorks at least) with > | U+00FF LATIN SMALL LETTER Y DIARESIS which is a BASE-CHAR in > | LispWorks yet its uppercase pair (as defined by Unicode) is an > | EXTENDED-CHAR.
> This problem is solved in the recently-approved ISO 8859-15, so > providing that as an alternative to 8859-1 may make sense.
It's good that it has been solved (well, I shouldn't say that when I don't know how). I was never able to understand what made them use M-DEL for a printable character in the first place.
-- Vassil Nikolov <vniko...@poboxes.com> www.poboxes.com/vnikolov (You may want to cc your posting to me if I _have_ to see it.) LEGEMANVALEMFVTVTVM (Ancient Roman programmers' adage.)
* Vassil Nikolov <vniko...@poboxes.com> | It's good that it has been solved (well, I shouldn't say that when I | don't know how). I was never able to understand what made them use M-DEL | for a printable character in the first place.
ISO character sets come in 94-character and 96-character flavors, apart from ISO 10646. the ISO 8859 family uses the ISO 4873 8-bit template, with a 94-character set in the left half and a 96-character set in the right half.
in the 94-character set, 2/0 is SPACE and 7/15 is DELETE, both of which sort of double as control and data characters. in the 96-character set, 2/0 and 7/15 are data characters.
if you have a 94-character set and only 7 bits worth of data, the last bit is free to be used for other purposes, such as constant zero, parity, an application flag, or constant one. most modern uses are constant zero and an application flag. however, if you use an 8-bit character set, the only chance you have at using an application flag is with 10/0 and 15/15, in which case you'd probably want a non-breaking space and what IBM calls EO (eight ones), used as an "end of whatever" signal. referring to 15/15 as "M-DEL" regardless of whether it is a character or EO betrays a serious conceptual confusion about the usage of the code space.
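For readers unfamiliar with the column/row notation used above: a position written c/r names the code 16*c + r, so 2/0 is 32, 7/15 is 127, 10/0 is 160 and 15/15 is 255. A tiny sketch of that arithmetic follows; the function name is invented for this example.

```lisp
;; Hypothetical helper: convert column/row notation to a code value.
(defun position-code (column row)
  "Return the code named by COLUMN/ROW in a 16-column, 16-row code table."
  (check-type column (integer 0 15))
  (check-type row (integer 0 15))
  (+ (* 16 column) row))
```

So (position-code 15 15) is the same position in the right half that 7/15 is in the left half, which is where the "M-DEL" reading comes from.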
incidentally, there _is_ no upper-case version of ÿ, just as there is no upper-case version of ß. pining for LATIN CAPITAL LETTER Y WITH DIAERESIS is like pining for LATIN CAPITAL LETTER SHARP S -- a symptom of a strong inability to deal with practical matters and to understand the sometimes _very_ erratic history of writing systems.
not that Vassil or anyone here is particularly to blame for this, but the history of the æ, oe (not in 8859-1 because some French moron told ECMA it wasn't needed and shouldn't be there, and then we got × and ÷ stuck in the middle of the O's, only to have the smart French guy who designed this stuff return fully recuperated after some serious accident or other, only the voting had completed, to demand an 8859 member with OE and oe -- which they got from ISO after a few years, but which nobody uses, not even the French¹), and ÿ is one of diphthongs that merged over the course of centuries and then assumed phonemes of their own. ae -> æ in Denmark and Norway is almost the same as ä in Sweden, but different from ä in Germany (and the decoration used to be different, too, until ECMA had enough of it). the French oe has a long and arduous story I don't know in detail, but it's not unlike ö in Germany.
now, ÿ is not a y with diaeresis at all. it has more in common with et (&) and ad (@) than y, since it's "ij" written together. in Belgium and the Netherlands, it is pronounced like the English long I. of course, as time goes by, various stupid people will do all kinds of stupid things, and in this case, we have the _reverse_ of what happened in France when some genius² decided that capital letters should not have accents because that was too hard to do with early typewriters and printers -- this has since been reversed when computers learned how to handle French. so now that we have these nifty computerized thingamajigs, let's just forget that neither I nor J have dots on them, even though i and j do (despite the linguist³ who decided that Turkish i should upcase to I with a dot, but I should downcase to i without a dot, which I think is at least part of the reason awful movies get Turkey awards), so the nifty computers should produce a _really_ historically moronic letter that nobody in their right mind would ever want to use.
so, the single cluon in danger of being annihilated by swarms of morons upon contact is that just as ß is upcased to SS, ÿ is upcased to IJ.
[ this article was best viewed with an ISO 8859-1 capable font. ]
#:Erik
-------
¹ the moral of this story is either to keep the morons away from standards bodies or not to have serious accidents if you're the only smart guy in France.
² read: moron -- it wasn't the only smart guy in France alluded to above.
³ another moron; wouldn't surprise me if he was French.
--
environmentalists are much too concerned with planet earth. their geocentric attitude prevents them from seeing the greater picture -- lots of planets are much worse off than earth is.
On 17 Apr 1999 17:23:24 +0000, "Erik" == Erik Naggum <e...@naggum.no> writes:
Erik> now, ÿ is not a y with diaeresis at all. it has more in common with et Erik> (&) and ad (@) than y, since it's "ij" written together.
Being Dutch, I probably should have known or figured this out, but I didn't; I always thought it was a Turkish letter. I don't know who invented the graphical form of this letter (ÿ), but it probably wasn't a Dutchman. In actual practice, "ij", although one letter (actually, a diphthong), is *always* typed and typeset as an i followed by a j. As far as I'm concerned, I'd be happy to cede this ASCII value to more important purposes (capital sharp s?). When upcased, both i and j have to be upcased (which is rare, but a good example is 'IJsselmeer', the big watery hole in the middle of Holland^H^H^H^H^H^H^H^HThe Netherlands). However, most dictionaries sort the 'ij' as two separate letters. Confusing, sort of.
Philip
--
To accurately forge this signature, use a lucidatypewriter-medium-12 font
---------------------------------------------------------------------------
Philip Lijnzaad, lijnz...@ebi.ac.uk
European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton
Cambridgeshire CB10 1SD, GREAT BRITAIN
+44 (0)1223 49 4639
+44 (0)1223 49 4468 (fax)
PGP fingerprint: E1 03 BF 80 94 61 B6 FC 50 3D 1F 64 40 75 FB 53
Erik Naggum <e...@naggum.no> writes: > now, ÿ is not a y with diaeresis at all. it has more in common with et > (&) and ad (@) than y, since it's "ij" written together. in Belgia and > the Netherlands, it is pronounced like the English long I. of course, as > time goes by, various stupid people will do all kinds of stupid things,
Except that in the Dutch speaking parts of Belgium and the Netherlands, everybody writes it as ij. The confusion could have been started because some morons (this time not even French) collated the ij combination with the y, although modern dictionaries stopped this a long time ago. There is also some difference of opinion about how to write an uppercase version of this. Some people use Ij, but most, especially in handwriting, will use a variant of uppercase Y with diaeresis.
BTW: if Gordon's Introduction to Old Norse is accurate and can be extrapolated to the modern variant, it's rather pronounced as the ei diphthong in 'bein'.
-- Lieven Marchand <m...@bewoner.dma.be> If there are aliens, they play Go. -- Lasker
In article <3133358604132...@naggum.no>, Erik Naggum <e...@naggum.no> wrote:
> * Vassil Nikolov <vniko...@poboxes.com> > | It's good that it has been solved (well, I shouldn't say that when I > | don't know how). I was never able to understand what made them use M-DEL > | for a printable character in the first place. (...) > however, if you use an 8-bit character set, the > only chance you have at using an application flag is with 10/0 and 15/15, > in which case you'd probably want a non-breaking space and what IBM calls > EO (eight ones), used as an "end of whatever" signal. referring to 15/15 > as "M-DEL" regardless of whether it is a character or EO betrays a > serious conceptual confusion about the usage of the code space.
I don't know if what it _betrays_ is true (don't have such introspective capabilities), but what it _is_ is inappropriate use of technical jargon. Sorry for that.
Correct me if I am wrong, but the above (quoted) paragraph does not contradict a statement that using 15/15 for a printable character is inappropriate. Or did I miss anything?
(...)
> _very_ erratic history of writing systems.
But very interesting, and from an information technology point of view too. (Writing is an information technology in my book as this phrase does not necessarily mean computer technology.)
It is hard to encode the barely encodable. (I.e. to transform human speech into a sequence of signs.) I find it interesting that the same language can be used for speaking and writing.
> not that Vassil or anyone here is particularly to blame for
[inadequacies in standardised character sets]
:-)
(This reminded me of some Russian who allegedly said, `Cyril and Methodius did such a bad thing to us...' (meaning that otherwise Russians would be using the Roman alphabet, like e.g. the Polish or the Czech, and be saved from many headaches, perhaps).) __________ For the Russian-speaking: `Kiril i Metodij nam takoe nadelali...'; St. Methodius was St. Cyril's brother and co-developer/co-translator.
(...)
> neither I nor J have dots on them, even though i and j do (despite > the linguist3 who decided that Turkish i and j should upcase to I and J > with dots, but I and J should downcase to i and j without dots, which I
I don't understand your point here. In the version of the Roman alphabet as used in Turkey (and adopted by an Act of Parliament from 1928, by the way), there are two I's: one has dots both in the small and capital case (and is pronounced as the `i' in `fit') and the other has no dots either in the small or capital case (and is pronounced as the `i' in `fir' but short and without any `r' of course). Whether this is moronic is not for me to say, but this is the way the Turkish alphabet is. (As to J in that alphabet, it has a dot in the small case only.)
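The two-I scheme described above can be sketched in a few lines. This is only an illustration: Python's built-in `str.upper()` and `str.lower()` are not locale-aware, so the helper names `turkish_upper` and `turkish_lower` below are hypothetical, and the mapping covers only the dotted and dotless I.

```python
# Hypothetical helpers; not part of any standard library.
# U+0130 is capital I with dot above, U+0131 is small dotless i.
_TR_UPPER = str.maketrans({"i": "\u0130", "\u0131": "I"})
_TR_LOWER = str.maketrans({"\u0130": "i", "I": "\u0131"})


def turkish_upper(text: str) -> str:
    """Upcase so that dotted stays dotted and dotless stays dotless."""
    return text.translate(_TR_UPPER).upper()


def turkish_lower(text: str) -> str:
    """Downcase so that dotted stays dotted and dotless stays dotless."""
    return text.translate(_TR_LOWER).lower()


print(turkish_upper("fit"))   # dotted i keeps its dot in upper case
print(turkish_lower("FIR"))   # plain I becomes dotless small i
```

The two I's are remapped before the generic case conversion runs, so every other letter is handled by the default rules.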
(Turkish is a very rich language, having incorporated a lot from Arabic and Persian; until Ataturk's reforms in the 1920's, Arabic script (or some variety thereof) was used for writing. I do not know Turkish (apart from a few words), but I have a dictionary and I know a few facts about its history (of the language, not the dictionary).)
> think is at least part of the reason awful movies get Turkey awards), so > the nifty computers should produce a _really_ historically moronic letter > that nobody in their right mind would ever want to use.
I.e. a small minority would never want to use it, and the majority will just accept it as the latest and the greatest benefit coming from computer technology.
> so, the single cluon in danger of being annihilated by swarms of morons > upon contact is that just as ß is upcased to SS, ÿ is upcased to IJ.
I wondered (as an academic exercise) what should CHAR-UPCASE and NSTRING-UPCASE do about LATIN SMALL LETTER Y WITH DIAERESIS (assuming STRING-UPCASE is allowed to return a longer string which isn't especially nice either). Signal an error? Or the implementation would state that the character sets it uses do not include this letter? (Making CHAR-UPCASE return two values, like #\I and #\J in this case, appears more than perverse, though who knows.)
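For comparison, here is how one present-day string library answers the same question. The sketch uses Python rather than Common Lisp, and it reflects Unicode's case mappings, not anything the ANSI Common Lisp standard requires. Note that Unicode maps ÿ to a capital Y with diaeresis (U+0178), not to IJ.

```python
# String-level upcasing is allowed to change the length of the result.
sharp_s = "stra\u00dfe"            # "straße"
print(sharp_s.upper(), len(sharp_s), len(sharp_s.upper()))

# U+00FF (y with diaeresis) has a one-character upper case in Unicode.
y_diaeresis_upper = "\u00ff".upper()

# The Dutch ij ligature U+0133 upcases to U+0132.
ij_upper = "\u0133".upper()

print(hex(ord(y_diaeresis_upper)), hex(ord(ij_upper)))
```

A character-level function with a single return value cannot express the ß case at all, which is why the one-to-many mapping lives at the string level here.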
> [ this article was best viewed with an ISO 8859-1 capable font. ]
I did use one this time, on a different machine.
(...)
-- Vassil Nikolov <vniko...@poboxes.com> www.poboxes.com/vnikolov (You may want to cc your posting to me if I _have_ to see it.) LEGEMANVALEMFVTVTVM (Ancient Roman programmers' adage.)
* Vassil Nikolov <vniko...@poboxes.com> | Correct me if I am wrong, but the above (quoted) paragraph does not | contradict a statement that using 15/15 for a printable character is | inappropriate. Or did I miss anything?
yes. 10/0 and 15/15 are characters when the right-hand side of an 8-bit character set (GR) is filled with a 96-character set. (the other 32 are control characters (C1).) if you had filled it with a 94-character set, it would have been inappropriate to use 15/15 at all.
the reason for this is that 10/0 and 15/15 are characters in their own right and must be coded with 8 bits, but if you use a shifting coding with only 7 bits and codes to swap between G0 and G1 (both now in GL) with the codes SO and SI, then it's important that 2/0 and 7/15 remain their usual semi-control characters even when G1 is invoked.
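The column/row notation used above (10/0, 15/15, 2/0, 7/15) maps to byte values as column times 16 plus row. The following is a minimal sketch of that arithmetic and of the C0/GL/C1/GR areas; the function names are made up for illustration.

```python
def code_value(column: int, row: int) -> int:
    """Convert column/row notation such as 15/15 to a byte value."""
    if not (0 <= column <= 15 and 0 <= row <= 15):
        raise ValueError("column and row must be in 0..15")
    return column * 16 + row


def area(column: int, row: int) -> str:
    """Name the area of the 8-bit code table a position falls in."""
    value = code_value(column, row)
    if value < 0x20:
        return "C0"
    if value < 0x80:
        return "GL"
    if value < 0xA0:
        return "C1"
    return "GR"


def usable_in_94_set(column: int, row: int) -> bool:
    """A 94-character set leaves the first and last position of GL or GR unused."""
    if area(column, row) not in ("GL", "GR"):
        return False
    return (code_value(column, row) & 0x7F) not in (0x20, 0x7F)


# In ISO 8859-1, whose right-hand side is a 96-character set,
# 10/0 is the no-break space and 15/15 is y with diaeresis.
print(hex(code_value(10, 0)), hex(code_value(15, 15)))
```

With a 96-character set in GR all of 10/0 through 15/15 are available; with a 94-character set the two corner positions are excluded, which is the distinction drawn above.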
| I don't understand your point here.
seems I was mistaken about the up/downcasing of I with/without dots. (shoot, gotta check and go back and fix those files for Emacs.)
| I wondered (as an academic exercise) what should CHAR-UPCASE and | NSTRING-UPCASE do about LATIN SMALL LETTER Y WITH DIAERESIS (assuming | STRING-UPCASE is allowed to return a longer string which isn't especially | nice either). Signal an error? Or the implementation would state that | the character sets it uses do not include this letter? (Making | CHAR-UPCASE return two values, like #\I and #\J in this case, appears | more than perverse, though who knows.)
I have come to think that people who use sick writing systems should pay for their own mistakes so they will have reason to fix them. forcing everybody else to pay for them only causes software not to be available. e.g., the Spanish purportedly undid the silly sorting requirements of ll (treated as a separate "letter" between k and l, I think it was) due to the force of simplicity and logic of computers (or was it marketing :). a German spelling reform (which people seem to hate rather strongly) does away with the sharp s and spells it "ss" in lowercase, too. the Norwegian and Danish sillitude of sorting "aa" as equivalent to "å" (a ring), and the hysterical requirement that German spelled out with "ue" instead of "ü" should be sorted as if it wasn't spelled out are examples of morons who got into standards bodies. (now, the right way to do this is to store a sort key and a print string, but since people don't use tools easily extendible that way, forcing stupid people to do this causes a lot of grief and problems when they try to print the sort key or vice versa.)
anyway, let's just ignore the issue and ask them to spell it out as ij, like the Dutch correctly do. (the ÿ is Belgian, _from_ Dutch ij.) (I'm not sure upcasing "ij" to "IJ" is all that great an idea, although it is obvious if you look at fonts designed in or for The Netherlands: they sport "ij" and "IJ" ligatures, just as fonts designed for Norway have a ligature for "fj" just like "fi", because of "fjord" and "fjell".)
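The "sort key plus print string" idea can be sketched as a record with two fields, where ordering only ever looks at one field and output only ever looks at the other. The `Entry` type and the key values below are hypothetical and chosen by hand for illustration; they are not taken from any collation standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    print_string: str  # what the reader sees
    sort_key: str      # what the sorter compares; never printed


# Keys are assigned by hand here: "Mueller" gets the same key as "Müller",
# while the "ue" in "Feuer" is left alone, which no blind substitution could know.
ENTRIES = [
    Entry("Mutter", "mutter"),
    Entry("Müller", "muller"),
    Entry("Feuer", "feuer"),
    Entry("Mueller", "muller"),
]


def display_order(entries):
    """Sort on the key, then hand back only the print strings."""
    return [e.print_string for e in sorted(entries, key=lambda e: e.sort_key)]


print(display_order(ENTRIES))
```

Keeping the two fields separate is what avoids the failure mode mentioned above, where a sort key ends up printed or a print string ends up compared.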
anyway. 8 bits would have been enough if we had been using floating diacritics and upcasing and downcasing would have needed to worry about A-Z, only. ISO tried that, too, (ISO 6937) but computer people were not able to appreciate it, because they were thinking fonts, not character sets. sigh.
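The point about floating diacritics can be illustrated with Unicode's decomposed form, used here as a stand-in: ISO 6937 places the non-spacing diacritic before the base letter, whereas Unicode combining marks follow it, so this is an analogy and not an ISO 6937 implementation. The function name is made up.

```python
import unicodedata


def upcase_base_letters(text: str) -> str:
    """Upcase by touching only a-z once accents are split off as separate marks."""
    decomposed = unicodedata.normalize("NFD", text)
    shifted = "".join(
        chr(ord(ch) - 32) if "a" <= ch <= "z" else ch
        for ch in decomposed
    )
    # Recompose only so that the result is easy to compare and display.
    return unicodedata.normalize("NFC", shifted)


print(upcase_base_letters("\u00e9t\u00e9"))  # "été"
```

Letters that do not decompose into base plus mark, such as ø or ß, pass through unchanged in this sketch, so the approach covers accented letters only.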
if there's reincarnation, I hope I won't remember any of this the next time around.
#:Erik -- environmentalists are much too concerned with planet earth. their geocentric attitude prevents them from seeing the greater picture -- lots of planets are much worse off than earth is.
* Erik Naggum | | now, ÿ is not a y with diaeresis at all. it has more in common with et | (&) and ad (@) than y, since it's "ij" written together.
* Philip Lijnzaad | | [...] In actual practice, "ij", although one letter (actually, | diftong), is *always* typed and typeset as an i followed by a j. As | far as I'm concerned, i'd be happy to ceede this ascii value to more | important purposes (capital sharp s?) When upcased, both i and j | have to be upcased [...]. However, most dictionaries sort the 'ij' | as two separate letters. Confusing, sortof.
Most? From what I've heard (from Dutch sources, BTW) IJ is sorted as a separate letter after Z. Can you elaborate on whether both happen or whether I've been misinformed?
And if it's really sorted separately then I think it makes sense to consider it a separate character, as Unicode more or less does (although it calls it a ligature): U+0132 and U+0133.
* Lars Marius Garshol <lar...@ifi.uio.no> | And if it's really sorted separately then I think makes sense to | consider it a separate character, as Unicode more or less does | (although it calls it a ligature): U+0132 and U+0133.
this is getting a bit far afield, but collation order, characterness, and glyphness are distinct properties of a writing system element. for one thing, there is no _single_ correct collation order. character sets do _not_ imply collation order. characterness of a writing system element is a fairly fundamental concept and is strongly associated with meaning. glyphness of a writing system element is strongly associated with looks. finally, fonts are made up of instantiations of glyphs.
e.g., a writing system element may exhibit such different meanings that they deserve to be separate characters, although this is very rare. in general, there is also one glyph per character, although some have more (the German short and long s, the open and baggy a, the open and broken vertical line), but more frequent is a glyph for a sequence of characters (ligatures in Latin scripts, but includes vowels in Indic scripts and Hebrew) or a character in context (the connectives (single, initial, medial, final) in Arabic scripts), etc. collation order is tightly coupled with character, but for hysterical raisins many languages collate sequences of characters as a single unit.
to represent all of this correctly, you need a whole bunch of tables. there are therefore glyph set standards that are very separate from character set standards, and their mapping is non-trivial. there are huge tables of correct collation orders for different scripts and languages (French requires a five-level deep collation system in full name and dictionary sorting), and conflation of representation makes up most of it (e.g., no significance is attached to the ring in "Ångstrøm" in an English dictionary, where it is sorted with Angst, but you'll find it at the end of a Norwegian one because Å is a separate character).
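A small sketch of the "Ångstrøm" example: the same three strings, two collation tables, two different orders. Both key functions are simplified assumptions covering just enough to show the effect; real collation tables are far larger, as noted above.

```python
# Simplified, hypothetical collation keys for illustration only.
NORWEGIAN_ALPHABET = "abcdefghijklmnopqrstuvwxyz\u00e6\u00f8\u00e5"  # ... x y z æ ø å
ENGLISH_FOLD = str.maketrans({"\u00e5": "a", "\u00f8": "o", "\u00e6": "ae"})


def english_key(word: str) -> str:
    """English dictionary view: the ring and the slash carry no weight."""
    return word.lower().translate(ENGLISH_FOLD)


def norwegian_key(word: str) -> list:
    """Norwegian view: æ, ø and å are separate letters after z."""
    return [NORWEGIAN_ALPHABET.index(ch) for ch in word.lower()]


WORDS = ["Zebra", "\u00c5ngstr\u00f8m", "Angst"]

print(sorted(WORDS, key=english_key))
print(sorted(WORDS, key=norwegian_key))
```

The character data is identical in both calls; only the table consulted by the sort differs, which is the sense in which a character set does not imply a collation order.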
Unicode is a hybrid of a character and a glyph set. the reason for this is fairly obvious when you consider its major proponents: Xerox and Microsoft. Xerox makes printers and wanted a simple standard for which they could make huge fonts. Microsoft are just too damn stupid to get it right or to respect any traditions. (Xerox didn't want it to replace the first ISO 10646 draft, however, so they may be excused.) in typical "is this a font or what?"-misunderstanding, æ was a ligature in Unicode, but I complained about it, so ISO 10646-1 has amended it to be a letter, and "ij" is a character, not a presentation form, which it should have been.
#:Erik -- environmentalists are much too concerned with planet earth. their geocentric attitude prevents them from seeing the greater picture -- lots of planets are much worse off than earth is.
On 18 Apr 1999 09:43:07 +0000, Erik Naggum <e...@naggum.no> wrote:
> e.g., the Spanish purportedly undid the silly sorting requirements > of ll (treated as a separate "letter" between k and l, I think it > was) due to the force of simplicity and logic of computers (or was > it marketing :).
Between "l" and "m".
What is stupid, IMHO, is not the fact of having "ll" as a single letter, but having it so, and the same with "ch" (between "c" and "d") and then having "rr" as r+r and "qu" as q+u. The sound of most of those characters is not related to their spelling ("ll" is not an l+l, etc., and "q" is *never* used in isolation in Spanish, it is *always* q+u, the only case in Spanish where "u" is mute) so in a coherent world either "ch", "ll", "rr" and "qu" should each be treated as a single entity, or none of them at all (perhaps the best solution).
Regarding the reform of the sorting requirement, the Spanish RAE ("Real Academia Española de la Lengua") did it, but I think some Latin American academies objected and the issue was dropped. Not sure, though.
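The traditional ordering described above, with "ch" between "c" and "d" and "ll" between "l" and "m", amounts to collating a two-character sequence as one unit. A minimal sketch follows; the function name is hypothetical, and accented vowels are ignored for brevity.

```python
# Traditional Spanish alphabet with "ch" and "ll" as single collation units.
TRADITIONAL = [
    "a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l", "ll",
    "m", "n", "\u00f1", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z",
]
RANK = {unit: position for position, unit in enumerate(TRADITIONAL)}


def traditional_key(word: str) -> list:
    """Split a word into collation units, taking 'ch' and 'll' greedily."""
    word = word.lower()
    ranks = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in ("ch", "ll"):
            ranks.append(RANK[pair])
            i += 2
        else:
            # Unknown characters sort after the alphabet in this sketch.
            ranks.append(RANK.get(word[i], len(RANK)))
            i += 1
    return ranks


WORDS = ["luz", "llama", "loco", "chico", "cosa", "dama"]
print(sorted(WORDS, key=traditional_key))  # digraphs as units
print(sorted(WORDS))                       # plain letter-by-letter order
```

Note that "rr" and "qu" get no entry in the table, which mirrors the inconsistency complained about above.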
* Erik Naggum | | now, ÿ is not a y with diaeresis at all. it has more in common with et | (&) and ad (@) than y, since it's "ij" written together.
* Philip Lijnzaad | | [...] In actual practice, "ij", although one letter (actually, | diftong), is *always* typed and typeset as an i followed by a j. As | far as I'm concerned, i'd be happy to ceede this ascii value to more | important purposes (capital sharp s?) When upcased, both i and j | have to be upcased [...]. However, most dictionaries sort the 'ij' | as two separate letters. Confusing, sortof.
* Lars Marius Garshol | | Most? From what I've heard (from Dutch sources, BTW) IJ is sorted as a | separate letter after Z.
Not that any of this has much to do with Lisp, but:
- U+00FF (LATIN SMALL LETTER Y DIAERESIS) is described in the Unicode standard as being French, not Dutch. This probably explains why Philip didn't recognize it as a Dutch letter. It also casts some doubt on Erik's explanation that it's "ij" written together. I suppose we have to wait for the French to tell us more about this (I read some French from time to time, but I don't recall ever having seen a ÿ.)
- The Unicode version of Dutch 'ij', which _is_ "ij" written together and is probably what Erik had in mind, is U+0133. Its upper case equivalent is U+0132.
- IJ is _never_ sorted as a separate letter after Z. Maybe, sometimes, it has been sorted as Y (between X and Z). Modern dictionaries sort it as I followed by J. So you have '("iets" "ijdel" "ijsje" "ik").
- When a Dutchman doesn't have a U+0133 handy (which is very likely), he just uses #\i followed by #\j. As in "ijsje". If this needs capitalizing, he'll use #\I followed by #\J. Capitalizing the above list would result in '("Iets" "IJdel" "IJsje" "Ik").
* Lars Marius Garshol | | And if it's really sorted separately then I think makes sense to | consider it a separate character, as Unicode more or less does | (although it calls it a ligature): U+0132 and U+0133.
For _capitalization_ it makes some sense to consider it a separate character. But _sorting_ will be much more likely to go wrong when you use a separate character.
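Both halves of this can be sketched briefly. The helper `dutch_capitalize` is hypothetical and handles only a leading "ij"; the sorting part uses plain code-point order, which is an assumption about the sorter and not a statement about any particular dictionary.

```python
def dutch_capitalize(word: str) -> str:
    """Capitalize a word, upcasing both letters of a leading 'ij'."""
    if word[:2].lower() == "ij":
        return "IJ" + word[2:]
    return word[:1].upper() + word[1:]


WORDS = ["ik", "ijsje", "iets", "ijdel"]

# Spelled as i followed by j, a naive sort already gives the dictionary order.
print(sorted(WORDS))
print([dutch_capitalize(w) for w in sorted(WORDS)])

# Spelled with the single character U+0133, the same naive sort misplaces the word.
WITH_LIGATURE = ["ik", "\u0133sje", "iets"]
print(sorted(WITH_LIGATURE))
```

With two separate letters the capitalization needs a special case but sorting comes for free; with the single character it is the other way round.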