спешиал для российских питонщиков))) [Special for Russian Pythonistas)))]

Garber

unread,
Jul 13, 2003, 1:32:04 PM7/13/03
to
народ, полскажите плз где достать прогу фризер(кажеться, короче надо в
exe-шник перегнать).

Irmen de Jong

unread,
Jul 13, 2003, 3:01:11 PM7/13/03
to
Garber wrote:

> народ, полскажите плз где достать прогу фризер(кажеться, короче надо в
> exe-шник перегнать).


Yes!
I think.

;-)

On second thought, what's he saying?

Bruno Desthuilliers

unread,
Jul 13, 2003, 6:12:26 PM7/13/03
to

He said :


"народ, полскажите плз где достать прогу фризер(кажеться, короче надо в
exe-шник перегнать)."

It does seem clear enough, doesn't it ?-)

Martin v. Loewis

unread,
Jul 13, 2003, 4:22:31 PM7/13/03
to Bruno Desthuilliers
Bruno Desthuilliers wrote:
>> On second thought, what's he saying?
>>
>
> He said :
> "íàðîä, ïîëñêàæèòå ïëç ãäå äîñòàòü ïðîãó ôðèçåð(êàæåòüñÿ, êîðî÷å íàäî â
> exe-øíèê ïåðåãíàòü)."

That's not what he said. Instead, he said

"народ, полскажите плз где достать прогу фризер(кажеться, короче надо в
exe-шник перегнать)."

To see that, you have to read the message in windows-1251.
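
A minimal sketch of the mix-up, added for illustration and assuming Python 3 (the function name `fix_mojibake` is hypothetical, not something from the thread): bytes written as windows-1251 but displayed as Latin-1 give the garbled quote above, and reversing the two steps recovers the Cyrillic.

```python
def fix_mojibake(garbled: str) -> str:
    """Undo windows-1251 text that was wrongly decoded as Latin-1."""
    # Recover the original bytes, then decode them with the intended charset.
    return garbled.encode("latin-1").decode("cp1251")


original = "народ"
# Simulate a newsreader that assumes Latin-1 for windows-1251 bytes.
garbled = original.encode("cp1251").decode("latin-1")

print(garbled)                # íàðîä
print(fix_mojibake(garbled))  # народ
```

This only works while every byte survives the trip; once a gateway has replaced characters, the original text cannot be recovered this way.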

I think this roughly translates into "Can you please tell me where to
find the proggie freezer (which appears to be needed to create exes)?"

The program is called freeze, and it is located in the Tools directory,
at least of the source distribution. See the FAQ for other options to
create exes.

Regards,
Martin

Irmen de Jong

unread,
Jul 13, 2003, 6:34:48 PM7/13/03
to
Martin v. Loewis wrote:

> That's not what he said. Instead, he said
>
> "народ, полскажите плз где достать прогу фризер(кажеться, короче надо в
> exe-шник перегнать)."
>
> To see that, you have to read the message in windows-1251.

Wonderful, I can "read" *your* rendition of it... probably because you
posted in UTF-8 :-) The original message was posted in regular latin-1...

> I think this roughly translates into "Can you please tell me where to
> find the proggie freezer (which appears to be needed to create exes)?"

Wow, you apparently understand what Garber wrote, too...

'nuff said
--Irmen

Dan Bishop

unread,
Jul 13, 2003, 9:18:53 PM7/13/03
to
Irmen de Jong <irmen@-NOSPAM-REMOVETHIS-xs4all.nl> wrote in message news:<3f11ac74$0$49101$e4fe...@news.xs4all.nl>...
> Garber wrote:
>
> > ?????, ?????????? ??? ??? ??????? ????? ??????(????????, ?????? ???? ?
> > exe-???? ?????????).

>
> Yes!
> I think.
>
> ;-)
>
> On second thought, what's he saying?

The Babelfish translation is

"People, you polskazhite plz where to reach progu to
frizer(kazhet'sya, shorter must be into yekhe-shnik outdistanced)."
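
The rows of question marks in the text quoted above are typical of Cyrillic being forced into a charset that has no Cyrillic letters. A small sketch, assuming Python 3 and assuming (the thread does not confirm it) that the posting software substituted `?` for each character it could not encode:

```python
text = "народ, exe"

# errors="replace" substitutes "?" for every character Latin-1 cannot encode,
# so the Cyrillic letters are lost while the ASCII part survives.
lossy = text.encode("latin-1", errors="replace").decode("latin-1")

print(lossy)  # ?????, exe
```

Unlike the windows-1251/Latin-1 confusion, this loss is irreversible: every Cyrillic letter maps to the same `?`.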

Bengt Richter

unread,
Jul 13, 2003, 10:34:47 PM7/13/03
to

narod, polskaschitye plye postatb progou fribyer(kaschyetbsya, korouye nado b exe-shink peregnatb)

or somethink like that. No idea, just applying cyrillic font and picking out greek ;-P

Roman can tell us ;-)

Regards,
Bengt Richter

Martin v. Löwis

unread,
Jul 14, 2003, 1:47:40 AM7/14/03
to
Irmen de Jong <irmen@-NOSPAM-REMOVETHIS-xs4all.nl> writes:

> > I think this roughly translates into "Can you please tell me where to
> > find the proggie freezer (which appears to be needed to create exes)?"
>
> Wow, you apparently understand what Garber wrote, too...

The joys of having grown up in an east-block country... These days, I
wish my Russian was better than it is.

Regards,
Martin

Alan Kennedy

unread,
Jul 14, 2003, 6:06:03 AM7/14/03
to
Dan Bishop wrote:

> The Babelfish translation is
>
> "People, you polskazhite plz where to reach progu to
> frizer(kazhet'sya, shorter must be into yekhe-shnik outdistanced)."

My favourite "phrase designed to mess up machine translation" (I'll
bet there's one big long word for that in German) is

"Time flies like an arrow, but fruit flies like a banana".

:-D

I'd love to hear of other similar phrases. And somehow I intuit
there's people in this ng who know lots of them :-)

If people think it's too much newsgroup abuse to discuss such a
non-python subject, then email them to me privately, and I'll collect
and list them.

monday-morning-coffee-hasn't-kicked-in-yet-ly yrs.

--
alan kennedy
-----------------------------------------------------
check http headers here: http://xhaus.com/headers
email alan: http://xhaus.com/mailto/alan

Francois Pinard

unread,
Jul 14, 2003, 9:20:36 AM7/14/03
to
[Alan Kennedy]

> My favourite "phrase designed to mess up machine translation" [...] is

> "Time flies like an arrow, but fruit flies like a banana".

The "Time flies" is a classic. I heard it a few dozens of years ago. :-)
Another similar classic, for French, is "Le pilote ferme la porte.".

> If people think it's too much newsgroup abuse to discuss such a

> non-python subject, [...]

It is Pythonic enough to me :-), as I long intended to rewrite a previous
word categoriser of mine, in Python. The categorical ambiguities are
always a challenge, and for French at least, there are plenty of those,
especially for texts which are loose on diacritical marks.

My previous categoriser was written in MoSTex, an (unknown) language and
system implemented in Scheme, made in such a way that it was compiling
to optimised machine code. I'm quite curious about how speedy, legible
and maintainable a Python implementation would be, by comparison.

--
François Pinard http://www.iro.umontreal.ca/~pinard

Alan Kennedy

unread,
Jul 14, 2003, 11:36:55 AM7/14/03
to
Francois,

>The "Time flies" is a classic. I heard it a few dozens of years ago. :-)
>Another similar classic, for French, is "Le pilote ferme la porte.".

I presume that is a play on "pilote ferme", as in "experimental farm",
as well as the obvious meaning "the pilot/driver closes the
door/gate". Is this meaning betrayed by the "le" matching "pilote",
rather than the "la" it should have for "ferme"? Or have I missed the
point completely (I don't consider myself a strong French speaker)?

Joyeux Le Quatorze!

Skip Montanaro

unread,
Jul 14, 2003, 11:54:14 AM7/14/03
to

>> The "Time flies" is a classic. I heard it a few dozens of years
>> ago. :-) Another similar classic, for French, is "Le pilote ferme la
>> porte.".

Alan> ... Or have I missed the point completely (I don't consider
Alan> myself a strong french speaker)?

Not being any sort of French speaker, I don't understand the "pilote ferme"
one at all. François?

Skip

Andrew Dalke

unread,
Jul 14, 2003, 12:44:59 PM7/14/03
to
Alan Kennedy:

> I'd love to hear of other similar phrases. And somehow I intuit
> there's people in this ng who know lots of them :-)

Related, for speech recognition:

"It's hard to recognize speech"
"It's hard to wreck a nice beach"

(! Only 9 Google hits for "wreak a nice beach" and none for the full
phrase?)

And I tell the story about how in 9th grade we did sentence diagrams
(a form of parse tree developed by english majors :) and the first one
was
"Have you ever seen a pilot fish?"
Turns out this is ambiguous. I have two uncles who are (or were) pilots,
so I was thinking - have I ever seen them fish? Not, "ahh, a 'pilot fish',
like on sharks." The rest of the assignment was equally difficult if you
didn't realize the difference.

That was 20 years ago, and I'm still annoyed :)

Andrew
da...@dalkescientific.com


Mike Rovner

unread,
Jul 14, 2003, 2:37:11 PM7/14/03
to
Martin v. Loewis wrote:

> "народ, полскажите плз где достать прогу фризер(кажеться, короче надо
> в exe-шник перегнать)."
>
> To see that, you have to read the message in windows-1251.
>
> I think this roughly translates into "Can you please tell me where to
> find the proggie freezer (which appears to be needed to create exes)?"

You are absolutely right.

> The program is called freeze, and it is located in the Tools
> directory, at least of the source distribution. See the FAQ for other
> options to create exes.

I'm afraid he is not fluent in English, so I'll repeat your answer in Russian:

Программа называется freeze и лежит в каталоге Tools в дистрибутиве.
FAQ описывает и другие способы создания exe-шников.

Mike


Irmen de Jong

unread,
Jul 14, 2003, 4:42:19 PM7/14/03
to
Dan Bishop wrote:

> The Babelfish translation is
>
> "People, you polskazhite plz where to reach progu to
> frizer(kazhet'sya, shorter must be into yekhe-shnik outdistanced)."

Ah thanks! That clears it all up. If we didn't have the Babelfish...

--Irmen

Francois Pinard

unread,
Jul 14, 2003, 4:12:45 PM7/14/03
to
[Skip Montanaro]

> >> Another similar classic, for French, is "Le pilote ferme la
> >> porte.".

> Not being any sort of French speaker, I don't understand the "pilote ferme"
> one at all. François?

Just to repeat publicly the content of a private reply, the two meanings
could be:

"The driver closes the door."
"The firm driver is carrying her."

The second meaning is clearly wrong to most listeners, but an automated
system has no real clue about this (unless it does wider context analysis
with the surrounding sentences, but this is fairly difficult.)

Alan Kennedy

unread,
Jul 15, 2003, 6:05:02 AM7/15/03
to
[Francois]

>Just to repeat publicly the content of a private reply, the two meanings
>could be:
>
> "The driver closes the door."
> "The firm driver is carrying her."

Aha!

That's very clever. I like the dual use of "la porte" (the door) as a noun
and "la porte" (carries her) as a verb. That makes it similar to the "time
flies" one, where "like" is used as a comparative (french "comme") and as a
verb (french "aime").

I was obviously completely off base with my interpretation of "pilote
ferme"/"experimental farm". :#)

Thanks Francois.

Regards,

Al.



John J. Lee

unread,
Jul 15, 2003, 7:20:01 AM7/15/03
to
Alan Kennedy <ala...@hotmail.com> writes:

> Dan Bishop wrote:
>
> > The Babelfish translation is
> >
> > "People, you polskazhite plz where to reach progu to
> > frizer(kazhet'sya, shorter must be into yekhe-shnik outdistanced)."
>
> My favourite "phrase designed to mess up machine translation" (I'll
> bet there's one big long word for that in German) is
>
> "Time flies like an arrow, but fruit flies like a banana".
>
> :-D
>
> I'd love to hear of other similar phrases. And somehow I intuit
> there's people in this ng who know lots of them :-)

[...]

There are so many...

Matthew 25:35 I was a stranger, and you took me in.

In English, there are two obvious meanings (the second, which is
amusingly in almost exact contradiction to the real meaning, is only
obvious to a native speaker once somebody sufficiently perverse has
pointed out its existence ;). Automated translation usually misses
both of them, of course.

What surprised me was *how many* possible interpretations of some of
these phrases there are: IIRC, one program extracted tens of subtly
different possible meanings from "Time flies...".


PS just to take things even further OT, does anybody remember the
words coined by Steven Pinker in one of his books (probably to
illustrate how English does something similar to German in forming new
words)? One meaning "fear of peanut butter sticking to the roof of
one's mouth", and another one which was, IIRC, some kind of
self-referential joke? Couldn't find it in "The Language Instinct",
so I guess it must have been in "How The Mind Works", which I don't
have a copy of.


John

Colin S. Miller

unread,
Jul 15, 2003, 9:43:48 AM7/15/03
to
>
> Matthew 25:35 I was a stranger, and you took me in.
Care to enlighten us with the second meaning?
I'm a native English speaker, but can only see one meaning
'I was unknown to you, yet you let me stay in your house'
Although 'took me in' could also mean 'accept as a friend'


Colin S. Miller

Duncan Booth

unread,
Jul 15, 2003, 10:24:45 AM7/15/03
to
"Colin S. Miller" <colinsm.s...@picsel.com> wrote in
news:4701fb...@195.171.216.1:

To "take someone in" means to trick or deceive them.

From
http://www.chambersharrap.co.uk/chambers/chref/chref.py/main?query=take&title=21st
(Must be a good dictionary if they use Python):

take someone in 1 to include them. 2 to give them accommodation or shelter.
3 to deceive or cheat them.

--
Duncan Booth dun...@rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?

Harvey Thomas

unread,
Jul 15, 2003, 10:07:34 AM7/15/03
to
Colin S. Miller wrote

> >
> > Matthew 25:35 I was a stranger, and you took me in.
> Care to enlighten us with the second meaning?
> I'm a native English speaker, but can only see one meaning
> 'I was unknown to you, yet you let me stay in your house'
> Although 'took me in' could also mean 'accept as a friend'
>
>
> Colin S. Miller
>
Also 'I was unknown to you and you deceived me'. Slightly colloquial



Francois Pinard

unread,
Jul 15, 2003, 10:24:17 AM7/15/03
to
[John J. Lee]

> Alan Kennedy <ala...@hotmail.com> writes:

> > I'd love to hear of other similar phrases. And somehow I intuit
> > there's people in this ng who know lots of them :-)

> There are so many...

Automated translators which ignore punctuation are pretty fragile, too.
Here is a case where the exact same words are used, besides punctuation.
I read in one of Audouard's books ("De la connerie et des cons", if
I remember correctly), about Montcalm, a French Canadian military of
old times. The history reports that he said:

"Messieurs les Anglais, tirez les premiers!"

but Audouard wrote that he fears the correct writing should have been
something like:

"Messieurs! Les Anglais... Tirez les premiers!"

P.S. - I confess I would have more difficulty relating this one to Python.
The word categorisation disambiguation program that I intend to rewrite
in Python, one of these days, would (correctly) yield the same results
for both sentences, so Python-wise for me, this is a non-issue! :-)

Jack Diederich

unread,
Jul 15, 2003, 10:40:39 AM7/15/03
to
On Tue, Jul 15, 2003 at 10:24:17AM -0400, Francois Pinard wrote:
> [John J. Lee]
>
> > Alan Kennedy <ala...@hotmail.com> writes:
>
> > > I'd love to hear of other similar phrases. And somehow I intuit
> > > there's people in this ng who know lots of them :-)
>
> > There are so many...
>
> Automated translators which ignore punctuation are pretty fragile, too.
> Here is a case where the exact same words are used, besides punctuation.

My favorite example is comifying a list.
"1, 2, and 3" vs "1, 2 and 3" (journalists seem to prefer the latter)

"I dedicate this book to my parents, Jane, and God."
"I dedicate this book to my parents, Jane and God."
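
For the Python-minded, the two conventions can be sketched as a tiny helper. This is an illustration only; the name `join_items` and its `serial_comma` flag are made up for this example, and Python 3 is assumed.

```python
def join_items(items, serial_comma=True):
    """Join items into an English list, with or without the final comma."""
    items = list(items)
    if len(items) < 3:
        # One or two items never take a comma: "Jane and God".
        return " and ".join(items)
    last_separator = ", and " if serial_comma else " and "
    return ", ".join(items[:-1]) + last_separator + items[-1]


dedication = ["my parents", "Jane", "God"]
print(join_items(dedication))                      # my parents, Jane, and God
print(join_items(dedication, serial_comma=False))  # my parents, Jane and God
```

The second output is the one that can be read as naming Jane and God as the parents.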

-jack

Roy Smith

unread,
Jul 15, 2003, 11:22:18 AM7/15/03
to
Harvey Thomas <h...@empolis.co.uk> wrote:
> Also 'I was unknown to you and you deceived me'. Slightly colloquial

Given the biblical meaning of "known", this could have even more than
two meanings :-)

Syver Enstad

unread,
Jul 15, 2003, 11:51:55 AM7/15/03
to
r...@panix.com (Roy Smith) writes:

Does "to know" in english also mean to feel someone? In my own language
the direct translation of the english know also means to feel. I could
say (translated) "I know the cold", meaning I feel the cold
weather.

Michael Hudson

unread,
Jul 15, 2003, 11:52:11 AM7/15/03
to
Francois Pinard <pin...@iro.umontreal.ca> writes:

> [John J. Lee]
>
> > Alan Kennedy <ala...@hotmail.com> writes:
>
> > > I'd love to hear of other similar phrases. And somehow I intuit
> > > there's people in this ng who know lots of them :-)
>
> > There are so many...
>
> Automated translators which ignore punctuation are pretty fragile, too.

Oh, if we're allowed to play punctuation games, I like:

I travelled on[,] my face towards home.

Cheers,
mwh

--
Like most people, I don't always agree with the BDFL (especially
when he wants to change things I've just written about in very
large books), ...
-- Mark Lutz, http://python.oreilly.com/news/python_0501.html

Alan Kennedy

unread,
Jul 15, 2003, 12:22:48 PM7/15/03
to
Syver Enstad wrote:

> > Given the biblical meaning of "known", this could have even more than
> > two meanings :-)
>
> Does "to know" in english also mean to feel someone? In my own language
> the direct translation of the english know also means to feel. I could
> say (translated) "I know the cold", meaning I feel the cold
> weather.

To "know" someone, in the biblical sense, is to have "carnal
knowledge" of them, i.e. "knowledge of the flesh", i.e. to have had
sexual relations with them.

Some of the English translations of the bible use terms such as "And
Adam knew Eve, and Eve begat 2 children", etc, etc. These translations
are probably from the middle ages, or earlier.

Harvey Thomas

unread,
Jul 15, 2003, 12:06:57 PM7/15/03
to
Syver Enstad

> r...@panix.com (Roy Smith) writes:
>
> > Harvey Thomas <h...@empolis.co.uk> wrote:
> > > Also 'I was unknown to you and you deceived me'. Slightly
> colloquial
> >
> >
> > Given the biblical meaning of "known", this could have even
> more than
> > two meanings :-)
>
> Does "to know" in english also mean to feel someone? In my
> own language
> the direct translation of the english know also means to feel. I could
> say (translated) "I know the cold", meaning I feel the cold
> weather.
>
Not really. The sense Roy is referring to is in the King James (Authorised Version) of the bible, "know" is often used in the sense "to know sexually" i.e. to have sex with. It's archaic now, so if you asked someone "Do you know Guido van Rossum", no-one would think you were asking "Have you had sex with GvR"

Syver Enstad

unread,
Jul 15, 2003, 1:20:17 PM7/15/03
to
Alan Kennedy <ala...@hotmail.com> writes:

> Syver Enstad wrote:
>
> > > Given the biblical meaning of "known", this could have even more
> than
>
> > > two meanings :-)
> >
> > Does "to know" in english also mean to feel someone? In my own
> language
>
> > the direct translation of the english know also means to feel. I
> could
>
> > say (translated) "I know the cold", meaning I feel the cold
> > weather.
>
> To "know" someone, in the biblical sense, is to have "carnal
> knowledge" of them, i.e. "knowledge of the flesh", i.e. to have had
> sexual relations with them.

Yes, I know that, I think it sounds great. It just didn't make sense
the way I understand the English verb "to know".

--

Vennlig hilsen

Syver Enstad

Borcis

unread,
Jul 15, 2003, 3:33:47 PM7/15/03
to
Francois Pinard wrote:
>
> Automated translators which ignore punctuation are pretty fragile, too.
> Here is a case where the exact same words are used, besides punctuation.
> I read in one of Audouard's books ("De la connerie et des cons", if
> I remember correctly), about Montcalm, a French Canadian military of
> old times. The history reports that he said:
>
> "Messieurs les Anglais, tirez les premiers!"
>
> but Audouard wrote that he fears the correct writing should have been
> something like:
>
> "Messieurs! Les Anglais... Tirez les premiers!"
>
> P.S. - I confess I would have more difficulty relating this one to Python.
> The word categorisation disambiguation program that I intend to rewrite
> in Python, one of these days, would (correctly) yield the same results
> for both sentences, so Python-wise for me, this is a non-issue! :-)

Reminds me of sitting, a dozen years ago, through an amateur's exposition
of the theory of word stemming he had spent years polishing in his
corner, applied to machine translation, which was the research activity
of the institute I happened to work for at the time.

Someone had invited a local university's professor of linguistics,
specialized in morphology. That person apparently felt he needed to
debunk the amateur's proposal, and started pedantically expounding:
"Your proposal to decompose prefix-stem-suffix to translate them
separately and recompose the result won't work if the stems are
not etymologically related (somehow the topic was translation between
closely related languages). Take for instance the pair <faire>, <do>..."

-- Of course it's possible : "faisable", "doable" !

Cheers, Boris Borcic
--
python >>> filter(lambda W : W not in "ILLITERATE","BULLSHIT")


Brendan Hahn

unread,
Jul 15, 2003, 4:15:28 PM7/15/03
to
dun...@rcp.co.uk wrote:

>"Colin S. Miller" <colinsm.s...@picsel.com> wrote:
>To "take someone in" means to trick or deceive them.

"take in" can also mean to observe.

--
brendan DOT hahn AT hp DOT com

John J. Lee

unread,
Jul 15, 2003, 5:15:54 PM7/15/03
to
bh...@spam-spam.g0-away.com (Brendan Hahn) writes:

> dun...@rcp.co.uk wrote:
> >"Colin S. Miller" <colinsm.s...@picsel.com> wrote:
> >To "take someone in" means to trick or deceive them.
>
> "take in" can also mean to observe.

But "take someone in" never means that. It really is a wonder that we
manage to communicate this way...


John

Peter Hansen

unread,
Jul 15, 2003, 5:30:15 PM7/15/03
to

Never say never. I could picture a writer better than I, carefully
crafting a sentence in a book involving a very, very large woman (or
man, let's not be sexist here), saying something to the effect of
"She was so large, I couldn't take her all in."

But you're write (sic) about it being hard to communicate sometimes.

(Did I mean "sometimes you're right", or did I mean "sometimes it's
hard to communicate"? ;-)

-Peter

John J. Lee

unread,
Jul 15, 2003, 5:49:12 PM7/15/03
to
Syver Enstad <syver-e...@online.no> writes:
[...]

> Does "to know" in english also mean to feel someone? In my own language
> the direct translation of the english know also means to feel. I could
> say (translated) "I know the cold", meaning I feel the cold
> weather.

There's a similar meaning in English. Isn't used often, probably
because it implies some kind of seriousness, usually as "to know <some
emotion>". "I know the cold" works, but it'd sound like you were
about to tell us that you did 30 years hard labour in Siberia (OK, not
necessarily *quite* that extreme ;-). Or maybe it sounds serious
because it isn't used often <0.5 wink>.

Not usually used about people, though, because "to know <some person>"
in a context which implies anything other than the everyday meaning is
associated with "the biblical sense" (which is almost an idiomatic
phrase in itself!).

How the hell did nature sneak all this subtlety of English usage into
my brain without me noticing it?

making-a-link-with-Python-would-be-easy-but-pointless-ly y'rs,


John

Borcis

unread,
Jul 15, 2003, 7:02:51 PM7/15/03
to
John J. Lee wrote:

>>>To "take someone in" means to trick or deceive them.
>>
>>"take in" can also mean to observe.
>
> But "take someone in" never means that.

Except of course that in the course of deceiving someone,
it is usual to observe that person.

Erik Max Francis

unread,
Jul 15, 2003, 8:34:31 PM7/15/03
to
Syver Enstad wrote:

> Does "to know" in english also mean to feel someone? In my own
> language
> the direct translation of the english know also means to feel. I could
> say (translated) "I know the cold", meaning I feel the cold
> weather.

I don't know that usage. (Har, har.)

--
Erik Max Francis && m...@alcyone.com && http://www.alcyone.com/max/
__ San Jose, CA, USA && 37 20 N 121 53 W && &tSftDotIotE
/ \ Nine worlds I remember.
\__/ Icelandic Edda of Snorri Sturluson

JanC

unread,
Jul 15, 2003, 10:20:30 PM7/15/03
to
"Alan Kennedy" <ala...@hotmail.com> schreef:

> I was obviously completely off base with my interpretation of "pilote
> ferme"/"experimental farm". :#)

"Experimental farm" or "pilot farm" would have been "ferme pilote" or
"ferme expérimentale" or something like that...

--
JanC

"Be strict when sending and tolerant when receiving."
RFC 1958 - Architectural Principles of the Internet - section 3.9

JanC

unread,
Jul 15, 2003, 10:24:33 PM7/15/03
to
Francois Pinard <pin...@iro.umontreal.ca> schreef:

> "The firm driver is carrying her."

Which is a nice example of something that could be misunderstood in
English. ;-)

(Is the driver strong or does he work for a firm?)

Courageous

unread,
Jul 16, 2003, 2:01:01 AM7/16/03
to

>It's archaic now, so if you asked someone "Do you know Guido van Rossum", no-one would think you
>were asking "Have you had sex with GvR"

I bet that, when first authoring Python, Mr. van Rossum never imagined that as a
consequence he might hear those particular words in that particular order. :)

C//

John Machin

unread,
Jul 16, 2003, 7:59:45 AM7/16/03
to
Francois Pinard <pin...@iro.umontreal.ca> wrote in message news:<mailman.1058279187...@python.org>...
> ... Montcalm, a French Canadian military of

> old times. The history reports that he said:
>
> "Messieurs les Anglais, tirez les premiers!"
>
> but Audouard wrote that he fears the correct writing should have been
> something like:
>
> "Messieurs! Les Anglais... Tirez les premiers!"
>

There's a much wider disparity between reported versions of
Cambronne's retort to the English offer that he chuck it in at the end
at Waterloo :-)

Paul Boddie

unread,
Jul 16, 2003, 12:21:22 PM7/16/03
to
Duncan Booth <dun...@NOSPAMrcp.co.uk> wrote in message news:<Xns93B99C8B951...@127.0.0.1>...

>
> To "take someone in" means to trick or deceive them.

[Dictionary reference]

> take someone in 1 to include them. 2 to give them accommodation or shelter.
> 3 to deceive or cheat them.

English is a great language to confuse people with when one considers
different verb/preposition combinations, especially when some of them
are used for slang purposes. Anyway, to elaborate on the above:

A. "I couldn't take it (all) in," refers to the observation of
events or quite commonly some kind of sensory experience. This
is only ever used with passive objects or events, though.

B. "The vicar was completely taken in by the deception." (Note that
this has subtle differences from...

"The vicar was completely taken by the idea."

...which may indicate enthusiasm or obsession.)

C. "After trying all his other acquaintances, it was the bishop who
finally took him in." (This means that the bishop offered
accommodation or shelter, and not that the bishop was behind an
elaborate or ambitious deception.)

I'm sure other alternatives exist, some with dubious meanings. :-)

I suppose this goes to show that "modifiers" which change the
behaviour of known "operations" (frequently in subtle ways) can be
impediments to the understanding of a language. It could be
interesting to consider whether Python, as an artificial language,
manages to successfully avoid such possibilities for confusion.

Paul

Anton Vredegoor

unread,
Jul 16, 2003, 3:32:41 PM7/16/03
to
Alan Kennedy <ala...@hotmail.com> wrote:

>"Time flies like an arrow, but fruit flies like a banana".
>
>:-D
>

>I'd love to hear of other similar phrases. And somehow I intuit
>there's people in this ng who know lots of them :-)

I don't know about automated translators, but I dare any non-Dutch or
German native speaker to interpret the following valid Dutch
sentence:

"Als achter vliegen vliegen vliegen vliegen vliegen vliegen achterna."

Anton

Irmen de Jong

unread,
Jul 16, 2003, 4:06:36 PM7/16/03
to
Anton Vredegoor wrote:

> I don't know about automated translators, but I dare any non Dutch or
> German native speaker to interprete the following valid Dutch
> sentence:
>
> "Als achter vliegen vliegen vliegen vliegen vliegen vliegen achterna."
>
> Anton

You should really add a comma at the right place, otherwise even Dutch
people themselves tend to get confused rather quickly too:

"Als achter vliegen vliegen vliegen, vliegen vliegen vliegen achterna."

(btw, a tip: 'vliegen' is a verb and also a noun in Dutch.)

--Irmen

Michele Simionato

unread,
Jul 16, 2003, 8:20:52 PM7/16/03
to
Alan Kennedy <ala...@hotmail.com> wrote in message news:<3F142A58...@hotmail.com>...

> Syver Enstad wrote:
>
> > > Given the biblical meaning of "known", this could have even more than
> > > two meanings :-)
> >
> > Does "to know" in english also mean to feel someone? In my own language
> > the direct translation of the english know also means to feel. I could
> > say (translated) "I know the cold", meaning I feel the cold
> > weather.
>
> To "know" someone, in the biblical sense, is to have "carnal
> knowledge" of them, i.e. "knowledge of the flesh", i.e. to have had
> sexual relations with them.
>
> Some of the English translations of the bible use terms such as "And
> Adam knew Eve, and Eve begat 2 children", etc, etc. These translations
> are probably from the middle ages, or earlier.


It comes at least from the Latin version and I would not be surprised
if the double sense of "known" was in the Greek version too (any Greek
here?).

Now, Latin had two verbs for "to know": "scire" and "cognoscere".
Only the second one had the sexual double meaning. The double meaning
has been preserved in modern latin languages:

Italian -> conoscere
French -> connaitre
Spanish -> conocer

The other verb "scire" has generated (if I am not mistaken) "sapere",
"savoir", "saber" and of course "science", which are sexually clean,
at least as far as I know ;)


P.S. according to http://www.freedict.com/cgi-bin/onldict.cgi

scio -> to know, understand.
cognosco -> to examine, inquire, learn

Greg Ewing (using news.cis.dfn.de)

unread,
Jul 16, 2003, 9:58:13 PM7/16/03
to
JanC wrote:
> Francois Pinard <pin...@iro.umontreal.ca> schreef:
>
>
>> "The firm driver is carrying her."
>
>
> Which is a nice example of something that could be misunderstood in
> English. ;-)
>
> (Is the driver strong or does he work for a firm?)

Or should we be directing followups to soc.sexuality.general?-)

JanC

unread,
Jul 16, 2003, 9:35:02 PM7/16/03
to
mi...@pitt.edu (Michele Simionato) schreef:

> It comes at least from the Latin version and I would not be surprised
> if the double sense of "known" was in the Greek version too (any Greek
> here?).

I'm not Greek, but I still have an ancient Greek dictionary from school. :-)

The verb "gignooskoo" (trying to write it with Latin letters ;) does indeed
have the same "double" meaning in ancient Greek. (Of course this is not
really a "double" meaning, if you think about it.)

In Dutch the normal translation of "to know" is "kennen" or "weten", the
sexual meaning is translated as "bekennen" (but, just like in English, it
is not really common in contemporary Dutch). Interesting is that
"bekennen" also has another meaning in Dutch: "to profess", "to confess"...

> Now, Latin had to verbs for "to know": "scire" and "cognoscere".

And "cognovisse". (I still have a latin dictionary too... :)

Tim Roberts

unread,
Jul 16, 2003, 11:02:29 PM7/16/03
to
Jack Diederich <ja...@performancedrivers.com> wrote:
>
>My favorite example is comifying a list.
>"1, 2, and 3" vs "1, 2 and 3" (journalist seem to prefer the later)
>
>"I dedicate this book to my parents, Jane, and God."
>"I dedicate this book to my parents, Jane and God."

This is actually something that is changing over time. It used to be that
"no final comma" was a hard and fast rule, but many of the more recent
style guides now suggest the comma.

My 7th grader's English textbook advocates the final comma, and it led me
to get her into trouble in one assignment.
--
- Tim Roberts, ti...@probo.com
Providenza & Boekelheide, Inc.

Harvey Thomas

unread,
Jul 17, 2003, 3:43:20 AM7/17/03
to
Tim Roberts wrote

Ah, the "Oxford comma". See http://www.askoxford.com/asktheexperts/faq/aboutother/oxfordcomma
for a bit more about it.

Alan Kennedy

unread,
Jul 17, 2003, 6:53:05 AM7/17/03
to
JanC wrote:

> The verb "gignooskoo" (trying to write it with Latin letters ;)

Why limit yourself to that nasty little us-ascii alphabet? >;-)

Here it is in a format where almost everybody will be able to see the
original greek verb on their screen.

#---------
<?xml version="1.0" encoding="utf-8"?>
<verb>&#x3b3;&#x3af;&#x3b3;&#x3bd;&#x3c9;&#x3c3;&#x3ba;&#x3c9;</verb>
#---------

For anybody who has MS Internet Explorer 5+, Netscape 6+, Mozilla 1+,
i.e. any browser that supports XML, simply save this to a disk file
and open it in your chosen browser.
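
For readers who would rather check the snippet without a browser, here is a minimal sketch, assuming Python 3 and its standard `xml.etree.ElementTree` parser (neither is part of the original post). It only shows that the numeric character references in the pure-ASCII snippet resolve to the eight Greek letters:

```python
import xml.etree.ElementTree as ET

# The snippet from the post, as pure ASCII bytes.
snippet = (
    b'<?xml version="1.0" encoding="utf-8"?>\n'
    b"<verb>&#x3b3;&#x3af;&#x3b3;&#x3bd;&#x3c9;&#x3c3;&#x3ba;&#x3c9;</verb>"
)

# The parser expands each &#x...; reference into one Unicode character.
verb = ET.fromstring(snippet).text

# Print the code points rather than the glyphs, so this works on any terminal.
print(len(verb), [hex(ord(ch)) for ch in verb])
```

The markup survives any 7-bit transport because only the parser, at the receiving end, turns the references back into Greek characters.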

Of course, I could also have used charset "iso-8859-7", in which case
the character codes would be one-byte-only. But I don't think that
would have travelled well over UseNet to most of you.

Or I could have used UTF-16, in which case every character would have
been two-bytes. But the same UseNet problems apply.
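
The byte counts mentioned here can be checked with a small sketch, assuming Python 3 and its built-in codecs (an illustration only, not something from the original post). Note that in UTF-8 each of these Greek letters also takes two bytes, while ASCII letters would take one:

```python
# The verb, written with escapes so the source itself stays ASCII.
word = "\u03b3\u03af\u03b3\u03bd\u03c9\u03c3\u03ba\u03c9"

sizes = {
    "iso-8859-7": len(word.encode("iso-8859-7")),  # one byte per letter
    "utf-16-be": len(word.encode("utf-16-be")),    # two bytes per letter, no BOM
    "utf-8": len(word.encode("utf-8")),            # two bytes per Greek letter
}
print(sizes)
```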

So, the challenge to the ASCII proponents is: put the greek word
"gignooskoo" on everybody's screen, originating from a usenet message,
in the original greek, where "oo" -> greek letter omega.

Obviously, it could also be represented in python itself. But I think
it is fair to exclude python, given that not everyone reading this
message will have python available to them (think of people stumbling
across this posting while searching the archives for information about
the origin of the word "science" for example).

I expect you won't find it as simple as the XML above, although I'm
also completely prepared to be proven wrong (Alan tries to cover his
a** in advance ;-).

Michele Simionato

unread,
Jul 17, 2003, 8:16:50 AM7/17/03
to
JanC <usene...@janc.invalid> wrote in message news:<Xns93BB246...@213.118.75.80>...
> mi...@pitt.edu (Michele Simionato) schreef:

>
> > Now, Latin had two verbs for "to know": "scire" and "cognoscere".
>
> And "cognovisse". (I still have a latin dictionary too... :)


Huh? "cognovisse" cannot be a verb; what would be the paradigm?


Michele

Ben Finney

unread,
Jul 17, 2003, 10:03:23 PM7/17/03
to
On Thu, 17 Jul 2003 11:53:05 +0100, Alan Kennedy wrote:
> JanC wrote:
>> The verb "gignooskoo" (trying to write it with Latin letters ;)
> Why limit yourself to that nasty little us-ascii alphabet? >;-)

Because it will display reliably on any computer.

> Here it is in a format where almost everybody will be able to see the
> original greek verb on their screen.

> [instructions to cut and paste to a file, then open in a limited range
> of programs, on computers possessing the appropriate font]


>
> So, the challenge to the ASCII proponents is: put the greek word
> "gignooskoo" on everybody's screen, originating from a usenet message,
> in the original greek, where "oo" -> greek letter omega.

Challenge accepted:

Open any drawing program. Draw, in order from left to right, the Greek
characters gamma, iota, gamma, nu, omega, sigma, kappa, omega.

Done. The desired word now appears on the screen.

Oh, what's that -- you say that's cheating because the user has to use
particular programs? Perform manual steps? Have some existing
knowledge about the process? That the process may fail for any of these
reasons?

Those are attributes of the "simple" process of manually manipulating
XML content you gave.

Not every computer is capable of automatically displaying Greek
characters. Even for those which can, there's not yet a universal way
to instruct them to do so. Hence, it is not possible to have any
computer automatically display a word with Greek characters.

But you already knew that, so why the silly challenge?

--
\ "Behind every successful man is a woman, behind her is his |
`\ wife." -- Groucho Marx |
_o__) |
http://bignose.squidly.org/ 9CFE12B0 791A4267 887F520C B7AC2E51 BD41714B

Alan Kennedy

unread,
Jul 17, 2003, 6:22:58 PM7/17/03
to
JanC:

>>> The verb "gignooskoo" (trying to write it with Latin letters ;)

Alan Kennedy wrote:

>> Why limit yourself to that nasty little us-ascii alphabet? >;-)

Ben Finney wrote:

> Because it will display reliably on any computer.

I'm not so sure. Depends on what you mean by display I suppose. I have
a luddite classics professor friend who would deride the assertion
that the above is an accurate representation of the original greek
word.

> > Here it is in a format where almost everybody will be able to see the
> > original greek verb on their screen.
> > [instructions to cut and paste to a file, then open in a limited range
> > of programs, on computers possessing the appropriate font]
> >
> > So, the challenge to the ASCII proponents is: put the greek word
> > "gignooskoo" on everybody's screen, originating from a usenet message,
> > in the original greek, where "oo" -> greek letter omega.
>
> Challenge accepted:
>
> Open any drawing program. Draw, in order from left to right, the Greek
> characters gamma, iota, gamma, nu, omega, sigma, kappa, omega.
>
> Done. The desired word now appears on the screen.

OK, now that we've solved that problem, let's move it up a level.

Now we want our greek to be indexable and searchable, so that, for
example, I can go to google and have it returned as a hit for the word
"gignooskoo". (apologies to greek people and greek scholars for the
poor rendering, if you've found this message at all).

And we want our greek to be accessible to visually disabled people.
Hacks like displaying bitmaps instead of glyphs work for visual
rendering. But what about non-visual renderings? Aural renderings?
Braille renderings?

> Oh, what's that -- you say that's cheating because the user has to use
> particular programs? Perform manual steps? Have some existing
> knowledge about the process? That the process may fail for any of these
> reasons?

Not necessarily cheating. Just not scalable (to say, "The Iliad").
And not searchable. Or accessible. Yes, I could automate the process,
by say generating a series of vector commands, which results in
drawing the glyph on the users screen. But it still isn't searchable.

As for particular programs: we all use a limited set of software that
fits our personal paradigm for information modelling. But my process
only involved OS and software-independent concepts, listed below under
"existing knowledge". Also, I think browsers are pretty universal
these days. Note also that your proposed process requires the
availability of drawing software.

Perform manual steps: There'll always be manual steps. I count 12
mouse presses to follow my process. How many for yours?

Existing knowledge: All I needed was knowledge of copy&paste, file
creation and file viewing. Pretty basic and universal computer
knowledge, in these days of GUIs.

> Those are attributes of the "simple" process of manually manipulating
> XML content you gave.

Fair enough, if copying and pasting is a complex and error-prone
operation. But it's not. And even the copying and pasting would be
eliminated if I could have the usenet transport protocol encode its
data and metadata in XML.

And yes, it would also help if I could encode protocol
metadata in UTF-8. But I can't. HTTP and MIME restrict me to 8-bit
character sets like iso-8859-1.

> Not every computer is capable of automatically displaying Greek
> characters. Even for those which can, there's not yet a universal way
> to instruct them to do so. Hence, it is not possible to have any
> computer automatically display a word with Greek characters.

> But you already knew that, so why the silly challenge?

To raise the stakes once somebody has "ante'd up" >;-)

Martin v. Loewis

unread,
Jul 17, 2003, 6:33:38 PM7/17/03
to
Alan Kennedy wrote:
> For anybody who has MS Internet Explorer 5+, Netscape 6+, Mozilla 1+,
> i.e. any browser that supports XML, simply save this to a disk file
> and open it in your chosen browser.
[...]

> So, the challenge to the ASCII proponents is: put the greek word
> "gignooskoo" on everybody's screen, originating from a usenet message,
> in the original greek, where "oo" -> greek letter omega.

[...]


> I expect you won't find it as simple as the XML above, although I'm
> also completely prepared to be proven wrong (Alan tries to cover his
> a** in advance ;-).

So what do you think about this message?:

γίγνωσκω

Look Ma, no markup. And not every character uses two bytes, either.
And I can use Umlauts (äöü) and Arabic (ءﺎﻣ.ﺔﻛﺮﺷ) if I want to.

I don't know for whom this renders well, but I guess MSIE5+, NS6+
and Mozilla 1+ are good candidates - without the need for saving
things into files.

Regards,
Martin

Irmen de Jong

unread,
Jul 17, 2003, 6:47:50 PM7/17/03
to
Martin v. Loewis wrote:

> γίγνωσκω
>
> Look Ma, no markup. And not every character uses two bytes, either.
> And I can use Umlauts (äöü) and Arabic (ءﺎﻣ.ﺔﻛﺮﺷ) if I want to.
>
> I don't know for whom this renders well, but I guess MSIE5+, NS6+
> and Mozilla 1+ are good candidates - without the need for saving
> things into files.

Exactly, it renders perfectly okay for me (mozilla 1.4).
I wonder one thing: how did you type it in?

--Irmen

Ben Finney

unread,
Jul 17, 2003, 11:30:11 PM7/17/03
to
On Fri, 18 Jul 2003 00:33:38 +0200, Martin v. Loewis wrote:
> So what do you think about this message?:
> ????????

I think it looks like a series of identical question marks. Presumably
you wrote it using a character set my terminal isn't using, and you had
no way of instructing my computer to use.

Even if you could, it's folly to assume that my computer has a font
containing the character set you used.

The only characters you know are in my computer's fonts are the
printable ASCII characters (ASCII 32-126). Anything else is a minefield
of disparate character sets, font mappings and incomplete implementations.

Beyond that, even if I could read your message, it's even greater folly
to assume that I have some way of responding in kind, or of searching
for those characters. I have no obvious way to generate them.

I'm not sure what Alan Kennedy's point is, but the current state of
play, while improving, does not allow for a universal way of generating,
displaying and searching international character sets.

--
\ "Experience is that marvelous thing that enables you to |
`\ recognize a mistake when you make it again." -- Franklin P. |
_o__) Jones |

Florian Schulze

unread,
Jul 17, 2003, 6:53:29 PM7/17/03
to
On Fri, 18 Jul 2003 00:33:38 +0200, Martin v. Loewis <mar...@v.loewis.de>
wrote:

> So what do you think about this message?:
>
> γίγνωσκω
>
> Look Ma, no markup. And not every character uses two bytes, either.
> And I can use Umlauts (äöü) and Arabic (ءﺎﻣ.ﺔﻛﺮﺷ) if I want to.
>
> I don't know for whom this renders well, but I guess MSIE5+, NS6+
> and Mozilla 1+ are good candidates - without the need for saving
> things into files.

And Opera (7.11). I would also like to know what's the trick.

Florian

Martin v. Loewis

unread,
Jul 17, 2003, 7:14:30 PM7/17/03
to
Irmen de Jong wrote:

> Exactly, it renders perfectly okay for me (mozilla 1.4).
> I wonder one thing: how did you type it in?

The Greek one, I copied from the XML file that Alan sent:
I did as he said. Save file to disk, open it in the browser.
I then copied the characters from one browser window to another.

For the umlauts, I just pressed the relevant characters on
my German keyboard.

For the Arabic, I copied some string from a web page
(the demo page of worldnames.net). Notice that this is
right-to-left, so don't be surprised if your cursor is
on the left end of the text at the end of pasting. Just
pressing the closing parenthesis will switch back to
left-to-right mode.

Regards,
Martin


Aahz

unread,
Jul 17, 2003, 7:46:31 PM7/17/03
to
In article <3F172442...@v.loewis.de>,

Martin v. Loewis <mar...@v.loewis.de> wrote:
>
>So what do you think about this message?:
>
>γίγνωσκω

Well, that renders as

.......<1/2 symbol>.....o..

in trn3.6 running in a vt100 emulator window, and it renders as

.......<1/2 symbol>.~I.~C.o.~I

when I started up vi to follow up (again in the vt100 emulator).
--
Aahz (aa...@pythoncraft.com) <*> http://www.pythoncraft.com/

A: No.
Q: Is top-posting okay?

Bengt Richter

unread,
Jul 17, 2003, 11:11:56 PM7/17/03
to
On Thu, 17 Jul 2003 11:53:05 +0100, Alan Kennedy <ala...@hotmail.com> wrote:

>JanC wrote:
>
>> The verb "gignooskoo" (trying to write it with Latin letters ;)
>
>Why limit yourself to that nasty little us-ascii alphabet? >;-)
>
>Here it is in a format where almost everybody will be able to see the
>original greek verb on their screen.
>
>#---------
><?xml version="1.0" encoding="utf-8"?>
><verb>&#x3b3;&#x3af;&#x3b3;&#x3bd;&#x3c9;&#x3c3;&#x3ba;&#x3c9;</verb>
>#---------
>
>For anybody who has MS Internet Explorer 5+, Netscape 6+, Mozilla 1+,
>i.e. any browser that supports XML, simply save this to a disk file
>and open it in your chosen browser.
>

Sorry, that doesn't work for my old browser (NS4.5 ;-) Try this:

====< gignooskoo.html >=======================================================
<html><head>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=iso-8859-7">
<style> H1 {font-size: 72pt} </style>
<title>gignooskoo</title></head><body>
<h1>&#947;&#943;&#947;&#957;&#969;&#963;&#954;&#969;</h1>
</body></html>
==============================================================================

>Of course, I could also have used charset "iso-8859-7", in which case
>the character codes would be one-byte-only. But I don't think that
>would have travelled well over UseNet to most of you.
>

The above seems to work for me. Does it for you? Windows-1253 as char set should also work, I think.
(I made the char numeric entities decimal, as some old browsers don't do &#x...;)
(There's also some unnecessary formatting ;-)

Regards,
Bengt Richter

Erik Max Francis

unread,
Jul 18, 2003, 12:54:10 AM7/18/03
to
"Martin v. Loewis" wrote:

> So what do you think about this message?:
>

> [non ASCII characters]
>
> Look Ma, no markup.

Yeah, but that only works if everyone's expecting the same encoding. I
just see garbage non-ASCII characters, for instance, with my lowly
Netscape 4 newsreader.

> And not every character uses two bytes, either.

Looked like it probably was here; I saw what looked very strongly
like eight double-byte characters (two bytes each).

--
Erik Max Francis && m...@alcyone.com && http://www.alcyone.com/max/
__ San Jose, CA, USA && 37 20 N 121 53 W && &tSftDotIotE

/ \ Wretches hang that jurymen may dine.
\__/ Alexander Pope

Erik Max Francis

unread,
Jul 18, 2003, 12:57:36 AM7/18/03
to
Ben Finney wrote:

> I think it looks like a series of identical question marks.
> Presumably
> you wrote it using a character set my terminal isn't using, and you
> had
> no way of instructing my computer to use.

When you see the right number of question marks, that usually means
whatever's processing it knows it's dealing with Unicode, but can't
display the glyphs. So your terminal knows the character set, it just
doesn't have the glyphs in the font it's using.

Oren Tirosh

unread,
Jul 18, 2003, 2:57:40 AM7/18/03
to
On Fri, Jul 18, 2003 at 03:11:56AM +0000, Bengt Richter wrote:
> ====< gignooskoo.html >=======================================================
> <html><head>
> <META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=iso-8859-7">
> <style> H1 {font-size: 72pt} </style>
> <title>gignooskoo</title></head><body>
> <h1>&#947;&#943;&#947;&#957;&#969;&#963;&#954;&#969;</h1>
> </body></html>
> ==============================================================================

Actually, you don't need the "CHARSET=iso-8859-7". It would be required
if you used the bytes 227, 223, 227, 237, 249, 243, 234, 249 to represent
the characters. With numeric character references you can embed any
character from the UCS repertoire regardless of the charset used.
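
Oren's distinction can be sketched as follows, assuming Python 3 with its standard `html` module and built-in codecs (none of which appear in the original post). Raw bytes need a charset label to be interpreted, whereas numeric character references name UCS code points directly. `html.unescape` follows the standards-based reading; an older browser may behave differently:

```python
import html

# Raw bytes: meaningless until a charset is named.
raw = bytes([227, 223, 227, 237, 249, 243, 234, 249])
as_greek = raw.decode("iso-8859-7")   # the labelled, intended reading
as_latin1 = raw.decode("latin-1")     # the same bytes under the wrong label

# Numeric character references: code points, independent of any charset.
refs = "&#947;&#943;&#947;&#957;&#969;&#963;&#954;&#969;"
from_refs = html.unescape(refs)

print(as_greek == from_refs, ascii(as_latin1))
```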

Oren

Hallvard B Furuseth

unread,
Jul 18, 2003, 3:30:32 AM7/18/03
to
Oren Tirosh wrote:
>On Fri, Jul 18, 2003 at 03:11:56AM +0000, Bengt Richter wrote:

>> <META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=iso-8859-7">

Needs to be charset=utf-8. iso-8859-7 has no character number 947.

>> <h1>&#947;(...)

> Actually, you don't need the "CHARSET=iso-8859-7". It would be
> required if you used the bytes 227, 223, 227, 237, 249, 243, 234, 249
> to represent the characters. With numeric character references you can
> embed any character from the UCS repertoire regardless of the charset
> used.

&#<num>; seems to mean character number NUM in the current character
set, not in UCS. At least on NS 4.79.

--
Hallvard

Alan Kennedy

unread,
Jul 18, 2003, 5:45:56 AM7/18/03
to
"Martin v. Loewis" wrote:

> So what do you think about this message?:
>
> γίγνωσκω
>
> Look Ma, no markup. And not every character uses two bytes, either.
> And I can use Umlauts (äöü) and Arabic (ءﺎﻣ.ﺔﻛﺮﺷ)
> if I want to.

Martin,

I can see from other people's messages that this has been successful
for some people with modern software.

However, it failed for me on my old Netscape 4.x Messenger. Which is
acceptable, I suppose, because I intentionally use ancient email and
usenet software. It is also worth noting that although my poor old
usenet client failed to display the sequence of characters, the
"Navigator" component to which it belongs correctly displayed the
greek text when fed Bengt's "gignooskoo.html" file (although it failed
on my xml snippet).

More worrying however is the failure of modern browsers to display the
characters when accessed through Google Groups.

http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&selm=3F172442.2040907%40v.loewis.de

I tried to view this in IE6.0 and Netscape 6.2, and all I saw was
"?????s??".

Whereas that thread still shows my XML snippet intact, still
copy&paste-able.

kind regards,

Martin v. Löwis

unread,
Jul 18, 2003, 11:10:06 AM7/18/03
to
Erik Max Francis <m...@alcyone.com> writes:

> Yeah, but that only works if everyone's expecting the same encoding. I
> just see garbage non-ASCII characters, for instance, with my lowly
> Netscape 4 newsreader.

No, that is not a prerequisite. Instead, the prerequisite is that the
news reader/MUA knows what MIME (Multipurpose Internet Mail
Extensions) is. In the MIME header, I clearly identified the encoding
of this message as UTF-8, so any news reader *should* be capable of
converting this to the local encoding (perhaps using replacement
characters where glyphs are missing).

> Looked like it probably was here; I saw what looked very strongly
> like eight double-byte characters (two bytes each).

But then, you were able to read the English parts of my message just
fine, right? The ASCII letters in this message did not take up two
bytes per letter, as they would have if the message was encoded in
UTF-16.
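
What a MIME-aware reader does with the charset label can be sketched like this, assuming Python 3 and its standard `email` package. The message bytes below are a made-up stand-in, not the headers of the actual post, and the address is hypothetical:

```python
from email import message_from_bytes

# A hypothetical message: ASCII headers, UTF-8 body, charset declared in MIME.
raw = (
    b"From: sender@example.invalid\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
) + "Greek: \u03b3\u03af\u03b3\u03bd\u03c9\u03c3\u03ba\u03c9\r\n".encode("utf-8")

msg = message_from_bytes(raw)

# The reader looks up the declared charset instead of guessing.
charset = msg.get_content_charset()
body = msg.get_payload(decode=True).decode(charset)

print(charset, len(body.strip()))
```

A reader that ignores the `Content-Type` header has to guess, and that is where the garbage characters come from.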

Regards,
Martin

Martin v. Löwis

unread,
Jul 18, 2003, 11:13:13 AM7/18/03
to
Alan Kennedy <ala...@hotmail.com> writes:

> More worrying however is the failure of modern browsers to display the
> characters when accessed through Google Groups.

It's not the browsers that display it incorrectly; it is Google
rendering it incorrectly. Fortunately, they keep the original data at

http://groups.google.com/groups?selm=3F172442.2040907%40v.loewis.de&oe=UTF-8&output=gplain

Regards,
Martin

Brian McErlean

unread,
Jul 18, 2003, 12:03:25 PM7/18/03
to
Alan Kennedy <ala...@hotmail.com> wrote in message news:<3F17C1D4...@hotmail.com>...

> "Martin v. Loewis" wrote:
>
> > So what do you think about this message?:
> >
> > γίγνωσκω
> >
> > Look Ma, no markup. And not every character uses two bytes, either.
> > And I can use Umlauts (äöü) and Arabic (ءﺎﻣ.ﺔﻛﺮﺷ)

> > if I want to.
>
> Martin,
>
> I can see from other people's messages that this has been successful
> for some people with modern software.
>
> However, it failed for me on my old Netscape 4.x Messenger. Which is
> acceptable, I suppose, because I intentionally use ancient email and
> usenet software. It is also worth noting that although my poor old
> usenet client failed to display the sequence of characters, the
> "Navigator" component to which it belongs correctly displayed the
> greek text when fed Bengt's "gignooskoo.html" file (although it failed
> on my xml snippet).
>
> More worrying however is the failure of modern browsers to display the
> characters when accessed through Google Groups.
>
> >http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&selm=3F172442.2040907%40v.loewis.de
>
> I tried to view this in IE6.0 and Netscape 6.2, and all I saw was
> "?????s??".
>
> Whereas that thread still shows my XML snippet intact, still
> copy&paste-able.
>
> kind regards,

I saw the same as you with that URL, but viewing the thread in google,
or going to the link it gave for "View this article only":
http://groups.google.com/groups?dq=&hl=en&lr=&ie=UTF-8&oe=UTF-8&selm=3F172442.2040907%40v.loewis.de

displayed OK (Using Mozilla firebird 0.6)

The key difference seems to be the "oe=UTF-8" argument in the URL.
Adding this to your URL displays it correctly.
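
Adding the parameter by hand is easy enough, but for completeness here is a minimal sketch of the same edit, assuming Python 3 and its standard `urllib.parse` module. It only manipulates the URL string; whether the server still honours `oe=UTF-8` is not something the code can check:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

url = ("http://groups.google.com/groups"
       "?hl=en&lr=&ie=UTF-8&selm=3F172442.2040907%40v.loewis.de")

parts = urlsplit(url)
# keep_blank_values preserves the empty "lr=" parameter.
query = parse_qsl(parts.query, keep_blank_values=True)

# Request UTF-8 output only if no output encoding is given yet.
if not any(key == "oe" for key, _ in query):
    query.append(("oe", "UTF-8"))

fixed = urlunsplit(parts._replace(query=urlencode(query)))
print(fixed)
```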

Cliff Wells

unread,
Jul 18, 2003, 12:44:21 PM7/18/03
to
On Thu, 2003-07-17 at 15:33, Martin v. Loewis wrote:

> So what do you think about this message?:
>
> γίγνωσκω
>
> Look Ma, no markup. And not every character uses two bytes, either.
> And I can use Umlauts (äöü) and Arabic (ءﺎﻣ.ﺔﻛﺮﺷ) if I want to.
>
> I don't know for whom this renders well, but I guess MSIE5+, NS6+
> and Mozilla 1+ are good candidates - without the need for saving
> things into files.

Looks fine in Evolution 1.4.3. Well, that is, it looks like a bunch of
gibberish, which is what I expected to see <wink>

--
Cliff Wells, Software Engineer
Logiplex Corporation (www.logiplex.net)
(503) 978-6726 (800) 735-0555


Bengt Richter

unread,
Jul 18, 2003, 2:35:42 PM7/18/03
to
On 18 Jul 2003 09:30:32 +0200, Hallvard B Furuseth <h.b.furuseth(nospam)@usit.uio(nospam).no> wrote:

>Oren Tirosh wrote:
>>On Fri, Jul 18, 2003 at 03:11:56AM +0000, Bengt Richter wrote:
>
>>> <META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=iso-8859-7">
>
>Needs to be charset=utf-8. iso-8859-7 has no character number 947.

You're right. I think the iso-8859-7 just served as a font hint, in effect.
You can't leave out the <META ... line on my browser (NS4.5, English font defaults),
but I would guess one could with Greek defaults. I would also guess everything is
converted to Windows wchars internally either way, according to some best-guess
rules if things aren't consistent.

If we add a space and the character &#1046; after the Greek, the difference
will show up: with utf-8 you get the Cyrillic, and with iso-8859-7 you get a question mark.
(or at least you do with NS4.5). Just tried IE5 -- it seems to fake it either way, but screws up
the presentation with a change in font weight after two characters, and then spaces between chars.
Don't know what that's about. I don't use IE5, (comma required ;-) normally.
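
The reason the Cyrillic test character is a useful probe can be sketched as follows, assuming Python 3 (the browser behaviour itself cannot be reproduced here). The reference &#1046; names a UCS code point that simply has no slot in iso-8859-7:

```python
import html

# &#1046; is code point U+0416, the Cyrillic capital letter ZHE.
ch = html.unescape("&#1046;")

# iso-8859-7 covers Greek only, so this character cannot be encoded in it.
try:
    ch.encode("iso-8859-7")
    fits = True
except UnicodeEncodeError:
    fits = False

print(hex(ord(ch)), fits)
```

A browser that maps numeric references into the declared charset has nothing to show for this character, which would explain the question mark.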


>
>>> <h1>&#947;(...)
>
>> Actually, you don't need the "CHARSET=iso-8859-7". It would be
>> required if you used the bytes 227, 223, 227, 237, 249, 243, 234, 249
>> to represent the characters. With numeric character references you can
>> embed any character from the UCS repertoire regardless of the charset
>> used.
>
>&#<num>; seems to mean character number NUM in the current character
>set, not in UCS. At least on NS 4.79.

That seems to be confirmed by the Cyrillic experiment above, now at least
for NS4.5 and NS4.79.

Regards,
Bengt Richter

Robin Munn

unread,
Jul 18, 2003, 2:35:26 PM7/18/03
to
(Restoring the [OT] marker in the subject because this is definitely
off-topic for a Python newsgroup. Well, I guess it might have a bearing
on Python's use of Unicode. Somehow. Remotely.)

Martin v. Loewis <mar...@v.loewis.de> wrote:

> Alan Kennedy wrote:
>> For anybody who has MS Internet Explorer 5+, Netscape 6+, Mozilla 1+,
>> i.e. any browser that supports XML, simply save this to a disk file
>> and open it in your chosen browser.
> [...]
>
>> So, the challenge to the ASCII proponents is: put the greek word
>> "gignooskoo" on everybody's screen, originating from a usenet message,
>> in the original greek, where "oo" -> greek letter omega.
> [...]
>> I expect you won't find it as simple as the XML above, although I'm
>> also completely prepared to be proven wrong (Alan tries to cover his
>> a** in advance ;-).
>
> So what do you think about this message?:
>

> ????????

I see eight identical question marks.

>
> Look Ma, no markup. And not every character uses two bytes, either.

> And I can use Umlauts (äöü) and Arabic (???.????) if I want to.

I see an a-umlaut, o-umlaut, u-umlaut. But for the Arabic, I see three
question marks, a period, and four question marks.

Running slrn version 0.9.7.4 on Linux. My terminal is a PuTTY SSH
connection from a Windows box. slrn --version produces:

Slrn 0.9.7.4 [2002-03-13]
S-Lang Library Version: 1.4.7
Compiled at: Jan 12 2003 08:31:04
Operating System: Linux

COMPILE TIME OPTIONS:
Backends: +nntp +slrnpull +spool
External programs / libs: -inews -ssl -uudeview
Features: +charset_mapping +decoding +emphasized_text +end_of_thread
+fake_refs +gen_msgid -grouplens +mime -msgid_cache +piping +rnlock
+slang +spoilers -strict_from +verbatim_marks
DEFAULTS:
Default server object: nntp
Default posting mechanism: nntp
Default character set: isolatin
SUPPORTED CHARACTER SETS:
isolatin ibm850 ibm852 ibm737 NeXT koi8

Looking at a hex dump of this post, I see that your Unicode characters
have become ASCII question marks in my reply. Bad slrn! No biscuit!
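
The effect described above matches what happens when text is forced into a Latin-1 character set with substitution for anything that does not fit. This is a minimal sketch assuming Python 3; it imitates the symptom and makes no claim about how slrn is implemented:

```python
greek = "\u03b3\u03af\u03b3\u03bd\u03c9\u03c3\u03ba\u03c9"  # the Greek verb
umlauts = "\u00e4\u00f6\u00fc"                                # a-, o-, u-umlaut

def through_latin1(text: str) -> str:
    """Round-trip text through Latin-1, replacing what cannot be represented."""
    return text.encode("latin-1", errors="replace").decode("latin-1")

shown_greek = through_latin1(greek)
shown_umlauts = through_latin1(umlauts)

print(shown_greek, ascii(shown_umlauts))
```

The umlauts survive because they exist in Latin-1; each Greek letter becomes one question mark, so the count of question marks matches the count of letters.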

--
Robin Munn <rm...@pobox.com> | http://www.rmunn.com/ | PGP key 0x6AFB6838
-----------------------------+-----------------------+----------------------
"Remember, when it comes to commercial TV, the program is not the product.
YOU are the product, and the advertiser is the customer." - Mark W. Schumann

Alan Kennedy

unread,
Jul 18, 2003, 3:25:20 PM7/18/03
to
Alan Kennedy

>> More worrying however is the failure of modern browsers to display
>> the characters when accessed through Google Groups.

Martin v. Löwis:

> It's not the browsers that display it incorrectly; it is Google
> rendering it incorrectly. Fortunately, they keep the original data
> at
>
http://groups.google.com/groups?selm=3F172442.2040907%40v.loewis.de&oe=UTF-8&output=gplain

Thanks Martin, a virtuoso demonstration.

It is also worth noting that your message and messages quoting it are
the only hits that turn up in a Google Groups search using the
original greek text as a search term: i.e. I go to Google Groups and
paste in the greek letters. This is true of both "global" Google
Groups and the Greek version as well:

groups.google.com: http://tinyurl.com/hd58
groups.google.com.gr: http://tinyurl.com/hd5l

Bravo! (These kudos exchangable for food+beers should you ever decide
to visit Dublin :-)

To everyone else: Why does this stuff get so complicated? Why does it
take a multi-lingual + encoding-guru + protocol-guru + markup-guru +
python-bot like Martin von L to get stuff like this done? Does it have
to require somebot who writes better quality software (i.e. less
defective) than the world's leading search engine, Google, who got it
slightly wrong?

The idea of raising this came to me when that Russian individual
posted a message a few days ago that got very garbled in the
transmission, both subject and content. Again, it was only Martin who
was able to figure out its content: I, being an ordinary mortal, was
left saying "¿Qué?"

Computers should be about making it easier for people to communicate
with each other. And yes I fully realise python's excellence in that
regard, thanks in large part to Martin.

To me, the "structure data using ASCII" argument seems very similar to
the human language position: "English is now universal, therefore all
people must learn and speak it if they want to communicate." What if I
want to have an irish gaelic word in the subject line of my emails or
usenet posts?

slán libh,

--
aláin ó cinnéide

Alan Kennedy

unread,
Jul 18, 2003, 3:25:39 PM7/18/03
to
Brian McErlean wrote:
> I saw the same as you with that URL, but viewing the thread in
> google, or going to the link it gave for "View this article only":

> http://groups.google.com/groups?dq=&hl=en&lr=&ie=UTF-8&oe=UTF-8&selm=3F172442.2040907%40v.loewis.de
>
> displayed OK (Using Mozilla firebird 0.6)
>
> The key difference seems to be the "oe=UTF-8" argument in the URL.
> Adding this to your URL displays it correctly.

Yep, we're definitely not in ASCII-land anymore.

Martin v. Löwis

unread,
Jul 18, 2003, 8:52:38 PM7/18/03
to
Alan Kennedy <ala...@hotmail.com> writes:

> To everyone else: Why does this stuff get so complicated? Why does it
> take a multi-lingual + encoding-guru + protocol-guru + markup-guru +
> python-bot like Martin von L to get stuff like this done? Does it have
> to require somebot who writes better quality software (i.e. less
> defective) than the world's leading search engine, Google, who got it
> slightly wrong?

Indeed, it is the bugs in the software that make it so
hard. Fortunately, people like myself have worked hard over the last
10 years or so to get us where we are: writing software, testing
software, reporting bugs. Now much new software is unicode aware, and
may even support it to a large degree. Still, a lot needs to be done.
Most of this is in the minds of developers, to recognize "Unicode
good, byte string bad, Unicode good, byte string bad" :-)

> The idea of raising this came to me when that Russian individual
> posted a message a few days ago that got very garbled in the
> transmission, both subject and content. Again, it was only Martin who
> was able to figure out its content

I made a number of guesses, I admit. It had to be a language which
rarely uses ASCII, and whose encodings don't use bytes < 128. So most
likely it was Greek or Russian - this is expert knowledge one collects
over time. I tried three Russian encodings (again, which one to try
are expert knowledge, and I thought of Windows only last).
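
The guessing process can be sketched roughly as follows, assuming Python 3. The garbled text is the subject line of the Russian post as it appeared when read as Latin-1; the list of candidate encodings and their order are assumptions for illustration, not necessarily the ones I tried:

```python
# The garbled subject line as it appears when the bytes are read as Latin-1.
garbled = "ñïåøèàë äëÿ ðîññèéñêèõ ïèòîíùèêîâ"

# Recover the original bytes, assuming the damage was a plain Latin-1 decode.
raw = garbled.encode("latin-1")

# Try each candidate encoding and let a human judge which one reads as Russian.
candidates = {
    name: raw.decode(name, errors="replace")
    for name in ("koi8-r", "iso-8859-5", "cp1251")
}
for name, text in candidates.items():
    print(f"{name:>10}: {text}")
```

Only the cp1251 (windows-1251) line comes out as lowercase Russian words; the other two decode without error but produce nonsense, which is why a human eye is still needed.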

Regards,
Martin

Martin v. Löwis

unread,
Jul 18, 2003, 8:57:29 PM7/18/03
to
Robin Munn <rm...@pobox.com> writes:

> Running slrn version 0.9.7.4 on Linux. My terminal is a PuTTY SSH
> connection from a Windows box. slrn --version produces:

For this to work, you need to
a) tell slrn that your console uses UTF-8, and
b) tell PuTTY to use UTF-8 in the console.

Then, slrn should determine that all characters of the message are
supported.

I don't know how to do either a) or b) with PuTTY and slrn; for a),
setting LANG to en_US.UTF-8 might be sufficient.
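
To see what the environment currently advertises, a small sketch like the following may help, assuming Python 3. The helper `codeset_from_lang` is a hypothetical name introduced here; it only parses the usual language_TERRITORY.codeset@modifier form, and it does not consider LC_ALL or LC_CTYPE, which take precedence over LANG:

```python
import os

def codeset_from_lang(lang: str) -> str:
    """Return the codeset part of a POSIX locale name, or '' if absent."""
    # Locale names look like language[_territory][.codeset][@modifier].
    without_modifier = lang.partition("@")[0]
    return without_modifier.partition(".")[2]

lang = os.environ.get("LANG", "")
print("LANG =", repr(lang), "codeset =", repr(codeset_from_lang(lang)))
```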

Regards,
Martin

Alan Kennedy

unread,
Jul 19, 2003, 10:59:16 AM7/19/03
to
Martin v. Loewis

>> So what do you think about this message?:
>>
>> γίγνωσκω
>>
>> Look Ma, no markup. And not every character uses two bytes, either.
>> And I can use Umlauts (äöü) and Arabic (ءﺎﻣ.ﺔﻛﺮﺷ)
>> if I want to.

Florian Schulze wrote:

> And [it displays properly on] Opera (7.11).

> I would also like to know what's the trick.

The final point I'd like to make [explicit] is: nobody had to ask me
how or why my xml snippet worked: there were no tricks. Nobody asked
for debugging information, or for reasons why they couldn't see it:
everyone saw/heard/felt it. Thus saving me a large amount of
(time|effort|embarrassment)+, helping people perceive what I was
saying, or apologising to those that it slipped past.

One person, Bengt, said that he couldn't see it, but posted another
piece of very similar markup, i.e. (ht|x)ml, that worked for him and
everybody else.

simple-and-universally-interchangeable-encodings-and-structures-are-a-honking-great-idea:-let's-do-more-of-those-ly
y'rs.

Ben Finney

unread,
Jul 20, 2003, 12:25:49 AM7/20/03
to
On Sat, 19 Jul 2003 15:59:16 +0100, Alan Kennedy wrote:
> One person, Bengt, said that he couldn't see it

This is identical to the justification of "$BIGNUM percent of our target
users use browser $BROWSER, so we can ignore the rest and use methods
only viewable by browser $BROWSER."

Which quickly leads to "You must use $BROWSER to view this site". No
thanks.

Provide a method that degrades gracefully to ASCII, the current
standard; then I'll be interested.
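
One possible reading of "degrades gracefully to ASCII" is sketched below, assuming Python 3 (the choice of technique is an assumption, not something proposed in the thread in these terms). Characters outside ASCII are replaced by decimal numeric character references, which any 7-bit channel carries unchanged and which entity-aware software can turn back into the original text:

```python
import html

word = "\u03b3\u03af\u03b3\u03bd\u03c9\u03c3\u03ba\u03c9"  # the Greek verb

# Non-ASCII characters become &#NNN; references; ASCII text is untouched.
ascii_safe = word.encode("ascii", errors="xmlcharrefreplace")

# Software that understands the references recovers the original text.
restored = html.unescape(ascii_safe.decode("ascii"))

print(ascii_safe.decode("ascii"))
```

A reader without such software still sees plain, unambiguous ASCII rather than garbage bytes.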

--
\ "I'd like to see a nude opera, because when they hit those high |
`\ notes, I bet you can really see it in those genitals." -- Jack |
_o__) Handey |

eltr...@juno.com

unread,
Jul 19, 2003, 5:29:52 PM7/19/03
to
 
On Sat, 19 Jul 2003 15:59:16 +0100 Alan Kennedy <ala...@hotmail.com> writes maybe none of the below:

> Martin v. Loewis
> >> So what do you think about this message?:
> >>
> >> γίγνωσκω

> >>
> >> Look Ma, no markup. And not every character uses two bytes,
> either.
> >> And I can use Umlauts (äöü) and Arabic (ءﺎﻣ.ﺔﻛﺮﺷ)
> >> if I want to.

>
>
> The final point I'd like to make [explicit] is: nobody had to ask me
> how or why my xml snippet worked: there were no tricks. Nobody asked
> for debugging information, or for reasons why they couldn't see it:
 

Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit

is how my email picked it up.

other messages and replies were inconsistently rendered.
isn't this the kind of thing the test groups were designed for?

IIRC, 7 bits is the standard in email & usenet.
no one should expect more than 7 from any program or relay.
the proper way would seem to be html using character entities.

html appears to be no one's favorite format in email or usenet,
so let the flames begin.

I hope I haven't been hoodwinked into replying in 8bit as well.

 

 

e

Alan Kennedy

unread,
Jul 20, 2003, 10:16:48 AM7/20/03
to
Alan Kennedy wrote:

>> One person, Bengt, said that he couldn't see it

Ben Finney wrote:

> This is identical to the justification of "$BIGNUM percent of our
> target users use browser $BROWSER, so we can ignore the rest and
> use methods only viewable by browser $BROWSER."

Hmm, I fail to see the connection here. Fair enough, I made a mistake
in structuring my original xml snippet. I didn't attempt to address
the fact there are still some browsers out there that don't do XML.
Bengt corrected that mistake by providing an HTML snippet that works
in non-XML browsers as well, i.e. a superset of the set I covered. Given
the current market breakdown for browsers, I guesstimate that Bengt's
snippet worked for > 99.9% of recipients.

> Which quickly leads to "You must use $BROWSER to view this site".
> No thanks.

No, that's the precise opposite of the point I was making. My position
is "You must use markup-capable software to perceive what I've
written. Your choice of software is entirely up to you: the only
requirement is the ability to process (x|ht)ml". I try to avoid
platform/language/os/browser dependent anything: that was the whole
point of the post.

> Provide a method that degrades gracefully to ASCII, the current
> standard; then I'll be interested.

#------------------------------------------------------------
snippet = """<?xml version="1.0" encoding="utf-8"?>


<verb>&#x3b3;&#x3af;&#x3b3;&#x3bd;&#x3c9;&#x3c3;&#x3ba;&#x3c9;</verb>
"""

def is7bitclean(s):
    # A string is 7-bit clean when every character code is below 128.
    for c in s:
        if ord(c) > 127:
            return 0
    return 1

if is7bitclean(snippet):
    print("Yep, it's clean.")
else:
    print("Thou hast broken the rules.")
#------------------------------------------------------------

Is that what you meant by "graceful degradation to ASCII"?

The 7-bit cleanness of my original snippet was the reason why it
arrived safely in everyone's "inbox".

Bengt's even-further-travelling HTML snippet is also 7-bit clean.

And if the message structures used in the protocol transporting the
messages were encoded in XML, you wouldn't even have seen any encoding
declarations or pointy brackets, or had to copy&paste.
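
A minimal sketch of how such a 7-bit clean form can be produced, assuming Python 3 and its standard "xmlcharrefreplace" error handler (the post itself does not say how the snippet was generated). The word is written with \u escapes so that the example source is itself pure ASCII.

```python
import html

# The Greek word from the thread, written with escapes so this source is pure ASCII.
word = "\u03b3\u03af\u03b3\u03bd\u03c9\u03c3\u03ba\u03c9"

# Replace every non-ASCII character with a decimal numeric character reference.
encoded = word.encode("ascii", "xmlcharrefreplace")

# Every byte is below 128, so the result survives 7-bit transports.
is_7bit_clean = all(byte < 128 for byte in encoded)

# A markup-aware reader can reverse the step and recover the original text.
decoded = html.unescape(encoded.decode("ascii"))

print(encoded.decode("ascii"))
print(is_7bit_clean, decoded == word)
```

The decimal references printed here denote the same code points as the hexadecimal ones in the snippet above; only the notation differs.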

Alan Kennedy

unread,
Jul 20, 2003, 10:49:46 AM7/20/03
to
Martin v. Loewis:
>>>> So what do you think about this message?:
>>>>
>>>> γίγνωσκω

>>>>
>>>> Look Ma, no markup. And not every character uses two bytes,
>>>> either.
>>>> And I can use Umlauts (äöü) and Arabic (ءﺎﻣ.ﺔﻛﺮﺷ)
>>>> if I want to.

Alan Kennedy:

> The final point I'd like to make [explicit] is: nobody had to ask
> me how or why my xml snippet worked: there were no tricks. Nobody
> asked for debugging information, or for reasons why they couldn't
> see it:


eltr...@juno.com wrote:

> Content-Type: text/plain; charset=iso-8859-1
>
> Content-Transfer-Encoding: 8bit
>
> is how my email picked it up.
>
> other messages and replys were inconsistently rendered.
>
> isn't this the kind of thing the test groups were designed for?
>
> IIR, 7 bits is the standard in email & usenet.
>
> no one should expect more than 7
>
> from any program or relay. the proper way
>
> would seem to be html using char entity's
>
> html appears to be no ones favorite format in email or usenet,
>
> so let the flames begin.
>
> I hope I haven't been hoodwinked into replying in 8bit as well.

Hmm, on reading and re-reading your points, the only way I can make
sense of them is if I assume that you didn't read the thread from the
start, which is highly recommended

http://tinyurl.com/hhhs

In summary:

1. I managed to make a greek word, using the original greek glyphs,
appear on everyone's "rendering surface", by posting a 7-bit clean XML
snippet. Another poster widened the software coverage even further by
posting a 7-bit clean HTML snippet. Both of our 7-bit markup snippets
travelled safely throughout the entirety of UseNet, including all the
7-bit relays and gateways.

2. The only other person who managed it, without using markup, was
Martin von Loewis, who is so good at this stuff that he confidently
makes statements like "what I did was right: it was Google that got it
wrong". Martin used the UTF-8 character set, i.e. a non-ASCII,
non-7-bit-clean character set, to achieve this. Although I'm sure
Martin could have managed it with UTF-7 as well.

3. If anybody else was willing to give it a try, they don't seem to
have had enough confidence in their knowledge of encodings, MIME,
transports, NNTP, etc, etc, to have actually hit the "send" button, in
case it didn't work. Which doesn't bode well for the average person in
the street: if the technology specialists in this newsgroup don't feel
in command of the issue, what hope for everyone else?

Maybe I should have a poke around the UseNet test groups to see how
many people tried and failed ;-)

Martin v. Löwis

unread,
Jul 20, 2003, 4:56:47 PM7/20/03
to
Alan Kennedy <ala...@hotmail.com> writes:

> Hmm, I fail to see the connection here. Fair enough, I made a mistake
> in structuring my original xml snippet. I didn't attempt to address
> the fact there are still some browsers out there that don't do XML.

You should consider that this is Usenet, though (at least for those of
us who read comp.lang.python instead of pytho...@python.org). I
don't even use a Web browser to read your message, or to reply to it.
And I definitely don't want to have HTML in email or usenet messages,
since my software cannot display it at all.

Regards,
Martin

Martin v. Löwis

unread,
Jul 20, 2003, 5:04:41 PM7/20/03
to
Alan Kennedy <ala...@hotmail.com> writes:

> 2. The only other person who managed it, without using markup, was
> Martin von Loewis, who is so good at this stuff that he confidently
> makes statements like "what I did was right: it was Google that got it
> wrong". Martin used the UTF-8 character set, i.e. a non-ASCII,
> non-7-bit-clean character set, to achieve this. Although I'm sure
> Martin could have managed it with UTF-7 as well.

It wasn't that hard to do: I only had to ask my newsreader to send the
message as UTF-8. If my newsreader had chosen a
content-transfer-encoding of base64 or quoted-printable, it even would
have been 7-bit clean.
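
A minimal sketch of that last point, assuming Python 3 and its standard base64 and quopri modules. It only shows that the two transfer encodings turn UTF-8 bytes into 7-bit data and back; it does not build a complete MIME message, and the helper name is made up for the illustration.

```python
import base64
import quopri

# The Greek word from the thread, encoded as UTF-8.
word = "\u03b3\u03af\u03b3\u03bd\u03c9\u03c3\u03ba\u03c9"
utf8_bytes = word.encode("utf-8")


def is_7bit_clean(data):
    """Return True when no byte has its high bit set (hypothetical helper)."""
    return all(byte < 128 for byte in data)


# The two transfer encodings named in the post.
as_base64 = base64.b64encode(utf8_bytes)
as_qp = quopri.encodestring(utf8_bytes)

print(is_7bit_clean(utf8_bytes))  # raw UTF-8 uses bytes above 127
print(is_7bit_clean(as_base64), is_7bit_clean(as_qp))

# Both encodings are reversible, so the receiver recovers the same text.
print(base64.b64decode(as_base64).decode("utf-8") == word)
print(quopri.decodestring(as_qp).decode("utf-8") == word)
```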

> 3. If anybody else was willing to give it a try, they don't seem to
> have had enough confidence in their knowledge of encodings, MIME,
> transports, NNTP, etc, etc, to have actually hit the "send" button, in
> case it didn't work. Which doesn't bode well for the average person in
> the street: if the technology specialists in this newsgroup don't feel
> in command of the issue, what hope for everyone else?

It's a matter of time. Web browsers are ahead of all other software,
here, as they first hit the problem of displaying content in a wide
variety of languages. Over time, most email agents and news readers
will catch up, getting it right without bothering the user.

It is not that MIME is more complicated than XML, on the contrary.
It is just that authors of MIME software, for some reason, don't
care that much about these issues.

Regards,
Martin

Mel Wilson

unread,
Jul 20, 2003, 3:28:16 PM7/20/03
to
In article <2259b0e2.0307...@posting.google.com>,
mi...@pitt.edu (Michele Simionato) wrote:
>JanC <usene...@janc.invalid> wrote in message news:<Xns93BB246...@213.118.75.80>...
>> mi...@pitt.edu (Michele Simionato) schreef:
>> > Now, Latin had to verbs for "to know": "scire" and "cognoscere".
>>
>> And "cognovisse". (I still have a latin dictionary too... :)
>
>Huh? "cognovisse" cannot be a verb, what would be the paradigma?

Me too, but it's in Bantam's _The New College Latin &
English Dictionary_. 'Cognovisse' seems to be some
modification of cognoscere. No guidance as to use. Could
it be some non-verbal form like 'To know it is to
misunderstand it'?

Regards. Mel.

Steven D'Aprano

unread,
Jul 20, 2003, 9:56:51 PM7/20/03
to
Alan Kennedy <ala...@hotmail.com> wrote in message news:<3F1AAC0A...@hotmail.com>...

> Alan Kennedy:
>
> > The final point I'd like to make [explicit] is: nobody had to ask
> > me how or why my xml snippet worked: there were no tricks. Nobody
> > asked for debugging information, or for reasons why they couldn't
> > see it:

Sorry Alan, but when I follow your instructions and save your XML to
disk and open it in Opera 6.01 on Win 98, I get this:

XML parsing failed: not well-formed (1:0)

At least it renders visibly in my browser, although I don't think it's
rendering the way you wished. <grin>

(For the record, this is the contents of the XML file, triple-quoted
for your convenience:


"""<?xml version="1.0" encoding="utf-8"?>

<verb>&#x3b3;&#x3af;&#x3b3;&#x3bd;&#x3c9;&#x3c3;&#x3ba;&#x3c9;</verb>""")


[snip]
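
One possible explanation for the parse failure reported above, offered only as an assumption since the post does not say exactly what was saved to disk: an XML parser rejects a file whose first characters are not the XML declaration, for instance if the surrounding triple quotes were saved along with the snippet. A minimal sketch with Python 3's standard xml.etree.ElementTree:

```python
import xml.etree.ElementTree as ET

# The snippet as posted, without the surrounding triple quotes.
snippet = (
    b'<?xml version="1.0" encoding="utf-8"?>\n'
    b"\n"
    b"<verb>&#x3b3;&#x3af;&#x3b3;&#x3bd;&#x3c9;&#x3c3;&#x3ba;&#x3c9;</verb>"
)

# Parsed from its first byte, the document is well-formed.
root = ET.fromstring(snippet)
print(root.tag, len(root.text))

# With stray characters in front of the declaration, parsing fails.
try:
    ET.fromstring(b'"""' + snippet)
    stray_prefix_parsed = True
except ET.ParseError:
    stray_prefix_parsed = False
print(stray_prefix_parsed)
```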


> In summary:
>
> 1. I managed to make a greek word, using the original greek glyphs,
> appear on everyone's "rendering surface", by posting a 7-bit clean XML
> snippet. Another poster widened the software coverage even further by
> posting a 7-bit clean HTML snippet. Both of our 7-bit markup snippets
> travelled safely throughout the entirety of UseNet, including all the
> 7-bit relays and gateways.

I couldn't see either rendered correctly in either Opera's newsreader
or the Google archive.

> 2. The only other person who managed it, without using markup, was
> Martin von Loewis, who is so good at this stuff that he confidently
> makes statements like "what I did was right: it was Google that got it
> wrong". Martin used the UTF-8 character set, i.e. a non-ASCII,
> non-7-bit-clean character set, to achieve this. Although I'm sure
> Martin could have managed it with UTF-7 as well.

Martin's effort did work for me in Opera's newsreader, but not in the
Google Groups archive. But we already knew that Google broke it.

> 3. If anybody else was willing to give it a try, they don't seem to
> have had enough confidence in their knowledge of encodings, MIME,
> transports, NNTP, etc, etc, to have actually hit the "send" button, in
> case it didn't work. Which doesn't bode well for the average person in
> the street: if the technology specialists in this newsgroup don't feel
> in command of the issue, what hope for everyone else?

Exactly. Which brings us back to Ben's suggestion: when writing for a
general audience using unknown systems, stick to ASCII, or at least
follow your rich text with a description of what your reader should
see:

"""And I can use Umlauts (äöü) -- you should see a, o and u all in
lowercase with two dots on top."""

It's a mess and I despair. It would be nice if everyone used bug-free
XML-aware newsreaders, browsers and mail clients, but the majority
don't. That's why I always practice defensive writing whenever I use
any character I can't see on my keyboard, and spell it out in ASCII.
That's not very satisfactory, but it's better than some random
percentage of your audience seeing "?????".
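
The defensive approach described above can be partly automated. This is a minimal sketch, assuming Python 3 and its standard unicodedata module; the function names are made up for the illustration, and the fallback is lossy by design.

```python
import unicodedata


def describe(text):
    """Spell out each character by its official Unicode name."""
    return [unicodedata.name(ch) for ch in text]


def ascii_fallback(text):
    """Strip accents where a decomposition exists; use '?' for the rest."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.encode("ascii", "replace").decode("ascii")


umlauts = "\u00e4\u00f6\u00fc"
print(describe(umlauts))
print(ascii_fallback(umlauts))
print(ascii_fallback("Dj\u00f8nne"))  # the o-with-stroke has no decomposition
```

Letters with no decomposition still come out as "?", so a human-written description remains the safer fallback for them.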


--
Steven D'Aprano

Carl Banks

unread,
Jul 20, 2003, 11:51:59 PM7/20/03
to
Alan Kennedy <ala...@hotmail.com> wrote in message news:<3F168011...@hotmail.com>...

> So, the challenge to the ASCII proponents is: put the greek word
> "gignooskoo" on everybody's screen, originating from a usenet message,
> in the original greek, where "oo" -> greek letter omega.


I accept.

____ ____ ___ _____ ___
| | | |\ | / \ \ | / / \
| | | | \ | | | \ | / | |
| | | | \ | | | > |< | |
| | | | \| \ / / | \ \ /
| | | | | _\ /_ /____ | \ _\ /_

:-)

--
CARL BANKS

Bengt Richter

unread,
Jul 21, 2003, 12:03:39 AM7/21/03
to

Here's a way that's been around a while (you have ghostscript, right?)

====< gignooskoo.ps >====================================
gsave 72 72 scale
/Symbol findfont 1.0 scalefont setfont
1.0 10.0 moveto (\147\151\147\156\167\163\153\167) show
showpage grestore
=========================================================


Of course, if you use tools (ms word, pdfwriter) to get that done,
you'll wind up with 24,655 bytes of resources and font info and privacy compromise
instead of 135 bytes of native PS level 1 ;-)

Or a 102-byte one-liner that may not be multipage context friendly, but should show in ghostscript:

/Symbol findfont 72 scalefont setfont 72 720 moveto (\147\151\147\156\167\163\153\167) show showpage
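
For readers without ghostscript, a minimal Python 3 sketch of what the octal escapes in the PostScript string stand for. The table below is a partial, hand-written excerpt of the Symbol font's letter assignments covering only this word, so treat it as an illustration and not as a full encoding table.

```python
# Octal escapes copied from the PostScript string above.
ps_string = r"\147\151\147\156\167\163\153\167"

# Each escape is an ordinary ASCII letter once decoded.
latin = "".join(chr(int(code, 8)) for code in ps_string.split("\\") if code)

# Partial Symbol-font mapping: only the letters this word needs.
SYMBOL_TO_GREEK = {
    "g": "\u03b3",  # gamma
    "i": "\u03b9",  # iota (unaccented)
    "n": "\u03bd",  # nu
    "w": "\u03c9",  # omega
    "s": "\u03c3",  # sigma
    "k": "\u03ba",  # kappa
}
greek = "".join(SYMBOL_TO_GREEK[ch] for ch in latin)

print(latin)
print(greek)
```

The file is therefore plain ASCII on the wire; the Greek appears only when the Symbol font draws the glyphs, and the iota comes out without its accent.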

Regards,
Bengt Richter

Mark Hadfield

unread,
Jul 21, 2003, 12:20:48 AM7/21/03
to
Carl Banks wrote:

I am old enough to remember ASCII art, though I won't admit to ever
participating in this activity myself. (It used to be said that it makes
one go blind.) I think that answer has to take first prize for originality.

I hope you won't think I'm a spoilsport in pointing out that it fails
the "everybody" test, as many people use proportional fonts in their
news readers.

--
Mark Hadfield "Ka puwaha te tai nei, Hoea tatou"
m.had...@niwa.co.nz
National Institute for Water and Atmospheric Research (NIWA)

Ben Finney

unread,
Jul 21, 2003, 5:15:46 AM7/21/03
to
On Mon, 21 Jul 2003 16:20:48 +1200, Mark Hadfield wrote:
> I hope you won't think if I'm a spoilsport in pointing out that it
> fails the "everybody" test, as many people use proportional fonts in
> their news readers.

*All* solutions fail the "everybody" test. Newsreaders use the NNTP
protocol, which (IIRC) only requires 7-bit ASCII. (Regardless, most
newsreaders only expect 7-bit ASCII, and sometimes implement 8-bit in
inconsistent ways).

"International characters" and "everybody" are, at present, incompatible
criteria.

--
\ "Quidquid latine dictum sit, altum viditur." ("Whatever is |
`\ said in Latin, sounds profound.") -- Anonymous |
_o__) |

Ben Finney

unread,
Jul 21, 2003, 5:45:08 AM7/21/03
to
On Sun, 20 Jul 2003 15:16:48 +0100, Alan Kennedy wrote:
> Ben Finney wrote:
>> Which quickly leads to "You must use $BROWSER to view this site".
>> No thanks.
>
> No, that's the precise opposite of the point I was making. My position
> is "You must use markup-capable software to perceive what I've
> written. Your choice of software is entirely up to you: the only
> requirement is the ability to process (x|ht)ml". I try to avoid
> platform/language/os/browser dependent anything: that was the whole
> point of the post.

You also stipulated "... from a Usenet post". Most Usenet readers do
not handle markup, nor should they. There are many benefits from the
fact that posts are plain text, readable by any software that can handle
character streams; parsing a markup tree for an article is a whole order
of complexity that I'd rather not have in my newsreader.

Expecting people to use a news reader that attempts to parse markup and
render the result, is like expecting people to use an email reader that
attempts to parse markup and render the result. Don't.

--
\ "I was in the grocery store. I saw a sign that said 'pet |
`\ supplies'. So I did. Then I went outside and saw a sign that |
_o__) said 'compact cars'." -- Steven Wright |

Alan Kennedy

unread,
Jul 21, 2003, 5:51:32 AM7/21/03
to
I don't want to go on and on about this, and I'm happy to concede that
some of my points are far from proven, and others are disproven.
However, there are one or two small points I'd like to make.

Ben Finney wrote:

>>> Which quickly leads to "You must use $BROWSER to view this site".
>>> No thanks.

Alan Kennedy wrote:

>> No, that's the precise opposite of the point I was making.

Ben Finney wrote:

> You also stipulated "... from a Usenet post". Most Usenet readers
> do not handle markup, nor should they. There are many benefits from
> the fact that posts are plain text, readable by any software that can
> handle character streams;

1. While there may be benefits from posts being plain text, there are
also costs. The cost is a "semantic disconnect", where related
concepts are not searchable, linkable or matchable, because their
character representations are not comparable.

2. I chose the "from a usenet post" restriction precisely because of
the 7-bit issue, because I knew that 8-bit character sets would break
in some places. It was an obstacle course.

> parsing a markup tree for an article is a whole order
> of complexity that I'd rather not have in my newsreader.
>
> Expecting people to use a news reader that attempts to parse markup
> and render the result, is like expecting people to use an email reader
> that attempts to parse markup and render ther result. Don't.

I don't expect people's newsreaders or email clients to start parsing
embedded XML (I nearly barfed when I saw Microsoft's "XML Data
Islands" for the first time).

What I'm really concerned about is the cultural impact. I voluntarily
maintain a web site for an organisation that has members in 26
countries, who not surprisingly have lots of non-ASCII characters in
their names. Here's one:

http://www.paratuberculosis.org/members/pavlik.htm

Because of the ASCII restriction in URLs, I was only able to offer Dr.
Pavlík the above uri, or this:

http://www.paratuberculosis.org/members/pavl%EDk.htm

which sucks.
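
A minimal sketch of where the "%ED" comes from, assuming Python 3's standard urllib.parse module: it is the percent-encoded Latin-1 byte for the accented i. The UTF-8 form is shown only for comparison and is not something the site above is known to use.

```python
from urllib.parse import quote, unquote

# File name with the accented character written as an escape.
filename = "pavl\u00edk.htm"

# Percent-encode the Latin-1 byte, as in the second URI above.
latin1_form = quote(filename, encoding="latin-1")

# For comparison: the same name percent-encoded from its UTF-8 bytes.
utf8_form = quote(filename, encoding="utf-8")

print(latin1_form)
print(utf8_form)

# Decoding only works if the reader assumes the same character set.
print(unquote(latin1_form, encoding="latin-1") == filename)
```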

Little wonder then that the next generation are choosing to explicitly
remove the accents from their names, i.e. his colleague Dr. Machackova
explicitly asked to have the accents in her name removed. Although I
assured her that her name would be correctly spelled, on web sites
that I maintain, the fact that her name breaks continually with
various ASCII centric technologies makes her think it's not worth the
hassle, or worth the risk of searches for her name failing.

http://www.paratuberculosis.org/members/machackova.htm

And what about Dr. Sigurðardóttir, Dr. Djønne, and Dr. de la Cruz
Domínguez Punaro? Are they destined to be passed over more often than
ASCII-named people?

[BTW, I've written the above in "windows-1252", apologies if it gets
mangled]

Solely because of technical inertia, and unwillingness to address the
(perhaps excessive) complexity of our various communications layers,
i.e. our own "Tower of 7-bit Babel", we're suppressing cultural
diversity, for no technically valid reason.

I personally don't have the slightest problem with reformulating NNTP
and POP to use XML instead: In a way, I think it's almost inevitable,
given how poor our existing "ascii" technologies are at dealing with
i18n and l10n issues. Emails and usenet posts are all just documents
after all.

Would something like this really be so offensive (the Gaelic isn't, I
promise :-)? Or inefficient?

#begin---------
<?xml version="1.0" encoding="windows-1252"?>
<xnntp>
<subject>An mhaith l'éinne dul go dtí an nGaillimh Dé
Domhnaigh?</subject>
<from>aláin ó cinnéide</from>
<to>na cailíní agus na buachaillí</to>
</xnntp>
#end-----------

Alan Kennedy

unread,
Jul 21, 2003, 5:55:14 AM7/21/03
to
Alan Kennedy:

>> So, the challenge to the ASCII proponents is: put the greek word
>> "gignooskoo" on everybody's screen, originating from a usenet message,
>> in the original greek, where "oo" -> greek letter omega.

Carl Banks wrote:

> I accept.
>
> ____ ____ ___ _____ ___
> | | | |\ | / \ \ | / / \
> | | | | \ | | | \ | / | |
> | | | | \ | | | > |< | |
> | | | | \| \ / / | \ \ /
> | | | | | _\ /_ /____ | \ _\ /_
>

Very clever!

You win a prize: a nice pint of cold frothy beer!

oOoOo
,==|||||
|| |||||
|| |||||
'==HHHHH
"""""

Sláinte mhaith! :-)

Bengt Richter

unread,
Jul 21, 2003, 5:03:17 PM7/21/03
to
On Mon, 21 Jul 2003 10:51:32 +0100, Alan Kennedy <ala...@hotmail.com> wrote:

>I don't want to go on and on about this, and I'm happy to concede that
>some of my points are far from proven, and others are disproven.
>However, there are one or two small points I'd like to make.
>

Ditto ;-)

>Ben Finney wrote:
>
>>>> Which quickly leads to "You must use $BROWSER to view this site".
>>>> No thanks.
>
>Alan Kennedy wrote:
>
>>> No, that's the precise opposite of the point I was making.
>
>Ben Finney wrote:
>
>> You also stipulated "... from a Usenet post". Most Usenet readers
>> do not handle markup, nor should they. There are many benefits from
>> the fact that posts are plain text, readable by any software that can
>> handle character streams;
>
>1. While there may be benefits from posts being plain text, there are
>also costs. The cost is a "semantic disconnect", where related
>concepts are not searchable, linkable or matchable, because their
>character representations are not comparable.

<author>I</author><verb>hope</verb><addressee>you</addressee>
don't want some awful garbage like the above line in postings?
Especially since the bulk of it would probably be automatically
generated by an MS NLP feature ;-/

>
>2. I chose the "from a usenet post" restriction precisely because of
>the 7-bit issue, because I knew that 8-bit character sets would break
>in some places. It was an obstacle course.

I see this as a separate issue from semantics, though. Encoding consistent
signs for identical things is a different problem from handling and encoding
the *meaning* of the things indicated.

>
>> parsing a markup tree for an article is a whole order
>> of complexity that I'd rather not have in my newsreader.
>>
>> Expecting people to use a news reader that attempts to parse markup
>> and render the result, is like expecting people to use an email reader
>> that attempts to parse markup and render ther result. Don't.
>
>I don't expect people's newsreaders or email clients to start parsing
>embedded XML (I nearly barfed when I saw Microsoft's "XML Data
>Islands" for the first time).
>
>What I'm really concerned about is the cultural impact. I voluntarily
>maintain a web site for an organisation that has members in 26
>countries, who not surprisingly have lots of non-ASCII characters in
>their names. Here's one:

Ok, but, do we need to embed a full markup language to handle small
encoding exceptions, if that's the real concern? (IMO, no ;-)

>
>http://www.paratuberculosis.org/members/pavlik.htm
>
>Because of the ASCII restriction in URLs, I was only able to offer Dr.
>Pavlík the above uri, or this:
>
>http://www.paratuberculosis.org/members/pavl%EDk.htm
>
>which sucks.

Which part, though? The encoding, or the fact that you see the encoding
in the above instead of its being rendered with the intended appearance?

IOW, any solution will involve *some* encoding and the possibility of
rendering it "raw" or interpreted. A smart GUI might have a default
mode showing everything interpreted, and have a "view source" button.

But "any solution" is not where we are. I think most people would object to
getting this kind of stuff in what appears to be just visually enhanced mail,
if they were aware every time it happened:
(I hope nobody's too-smart-for-its-own-good viewer tries to see the following
as actual MIME content ;-/)

-----
...
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="----000000000000000000000000000000000000000000000000000000000000000"

<x-html>
<html>
<body>
<A HREF="http://ad.doubleclick.net/jump/N2870.or/B914513.8;sz=1x1;ord=[timestamp]?">
<IMG SRC="http://ad.doubleclick.net/ad/N2870.or/B914513.8;sz=1x1;ord=[timestamp]?" BORDER=0 WIDTH=1 HEIGHT=1 ALT=""></A>
...
-----
The trouble is that any automated following of references out of received information
is effectively a stealth channel for info, if only to inform that your computer processed
the message. Not to mention cookie stuff, and exploitation of real security holes.

The defense of filtering out requests to alternate server sources may be a reasonable
compromise for web viewing, but IMO such defenses should not be necessary in email.

But the large majority of users will use MS stuff with whatever defaults MS decided
good for something. So an email winds up doing a lot more than presenting different
language encodings and fonts well.

That's why alternatives have appeared, but what happens if an html chunk is handed to
a MS system DLL to do rendering? How limited is the interpretation? Should it all be
rewritten from scratch and duplicate lots of stuff already available? What can safely
be handed off? A whole email preview pane presentation?

>
>Little wonder then that the next generation are choosing to explicitly
>remove the accents from their names, i.e. his colleague Dr. Machackova
>explicitly asked to have the accents in her name removed. Although I
>assured her that her name would be correctly spelled, on web sites
>that I maintain, the fact that her name breaks continually with
>various ASCII centric technologies makes her think it's not worth the
>hassle, or worth the risk of searches for her name failing.
>

That is a complaint about the current state of affairs, though.
The problem is migration to new tools without unacceptable backwards breakage.
Unfortunately, the lowest common denominator tends to be a breakage solution.

>http://www.paratuberculosis.org/members/machackova.htm
>
>And what about Dr. Sigurðardóttir, Dr. Djønne, and Dr. de la Cruz
>Domínguez Punaro? Are they destined to be passed over more often than
>ASCII-named people?
>
>[BTW, I've written the above in "windows-1252", apologies if it gets
>mangled]
>
>Solely because of technical inertia, and unwillingness to address the
>(perhaps excessive) complexity of our various communications layers,
>i.e. our own "Tower of 7-bit Babel", we're suppressing cultural
>diversity, for no technically valid reason.
>
>I personally don't have the slightest problem with reformulating NNTP
>and POP to use XML instead: In a way, I think it's almost inevitable,
>given how poor our existing "ascii" technologies are at dealing with
>i18n and l10n issues. Emails and usenet posts are all just documents
>after all.

Right, but are they multimedia presentations? I like the option to have
the latter, but only by optional and intentional linkage following, and
rendered by a separate invoked tool, not as an email reader built-in.

That might seem like a fine distinction, but it is easier to trust a limited
tool with well-defined and manual control transfers to other functionality
than it is to trust an automated doitall. That is the comfort of plain ascii, IMO,
and for many that comfort is worth a fair amount of other discomfort.

If the information for the other tool has to be embedded, we have MIME attachments,
but IMO they should not be delivered by default. ISTM having a selection of simple
separate tools that do limited things only on manual command would be better for email
than having it be an instance of general purpose XHTML processing.

>
>Would something like this really be so offensive (the Gaelic isn't, I
>promise :-)? Or inefficient?
>
>#begin---------
><?xml version="1.0" encoding="windows-1252"?>
><xnntp>
> <subject>An mhaith l'éinne dul go dtí an nGaillimh Dé
>Domhnaigh?</subject>
> <from>aláin ó cinnéide</from>
> <to>na cailíní agus na buachaillí</to>
></xnntp>
>#end-----------

Hm, your post arrived to me with this in the header:
--


Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit

--
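
The mismatch between the "windows-1252" mentioned in the post and the iso-8859-1 label in the header happens to be harmless for the names quoted above, because the two character sets agree on those letters. A minimal Python 3 sketch of where they agree and where they do not; the curly apostrophe is a made-up example and does not appear in the post.

```python
# Names from the quoted post, written with escapes.
names = "Sigur\u00f0ard\u00f3ttir, Dj\u00f8nne, Dom\u00ednguez"

# Both character sets put these letters at the same byte values.
same_bytes = names.encode("cp1252") == names.encode("iso-8859-1")
print(same_bytes)

# A curly apostrophe exists in windows-1252 but not in iso-8859-1.
curly = "\u2019"
cp1252_byte = curly.encode("cp1252")

# Read under the iso-8859-1 label, that byte becomes an invisible control code.
misread = cp1252_byte.decode("iso-8859-1")
print(cp1252_byte, repr(misread))
```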

ISTM the problem is the dual aspect of some meta-data (e.g. headers).
I.e., to recognize meta-data, something has to assume an encoding for *it*.
The simplest assumption is a fixed standard assumption, like ascii.
This dual-aspect problem appears also in file systems, where a box handle
also serves as a box label and loose content-type indicator.

If you want meta-data also to be GUI-presentation-encoded data you
are getting away from a standard assumption, or perhaps substituting
the utf-8 standard assumption of XML. If the latter, why not just let
that be it, without involving the tagged markup cruft of XML, unless
you have specific goals for the markup per se? I can see such goals,
but I don't think they belong in normal email, and to do it just to
identify header elements is IMO an unclean solution to what rfc2822
already does much more readably (taking glyph encoding as a separate issue).

The problem with multiple encoding declarations is that they have to be
recognized as meta-data. They belong *on* the box, not in it. The only
way to be in a box is to be on a box inside a box that can contain others
in a way that separates the contained boxes, like multi-part MIME sections.

Otherwise a nested encoding declaration would have to be *itself* encoded
in whatever the current encoding was. But then you have to decide that it
wasn't just peculiar data, and you have to invent an escape, etc...

Just switching to utf-8 as a standard assumption for "box labels"
(e.g., email headers and file names etc.) would IMO go a long way
towards avoiding XML for email bodies. Thus

...
From: Alan Kennedy <ala...@hotmail.com>
Organization: just me.
...


Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit

...
would be encoded in utf-8, but after the blank line that ends the header,
it would be assumed text/plain; charset=iso-8859-1.
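
For comparison with the proposal above: mail headers can already carry non-ASCII text in 7-bit form through RFC 2047 encoded-words, at the cost of being unreadable when shown raw. A minimal sketch, assuming Python 3's standard email.header module; the sender name is borrowed from the quoted xnntp example.

```python
from email.header import Header, decode_header, make_header

# Sender name from the quoted example, written with escapes.
sender = "al\u00e1in \u00f3 cinn\u00e9ide"

# Encode as an RFC 2047 encoded-word; the result is plain ASCII.
encoded = Header(sender, "utf-8").encode()
print(encoded)

# A header-aware reader reverses the encoding.
decoded = str(make_header(decode_header(encoded)))
print(decoded == sender)
```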

I'm sure people like Martin have given this a lot more thought (not to mention
relevant implementation work ;-) than I have, so I would not be surprised to
have important issues I've disregarded pointed out.

For special effects in emails, I could see borrowing the XML "processing instruction"
escape, of which <?xml ... ?> itself is an example. I.e., that syntax means invoke
the named processing, passing ... up to the ?> as arguments to the processing program.

I wouldn't want just any of these to take off automatically, though. Imagine, e.g.,

<?python
import os
os.system( '...nastiness...')
?>

A smart email reader could hide <?xxx ...?> as a clickable xxx and let you right-click
to see the source, but an old reader would just show it as above.

Embedded pictures could e.g. be specific to a base64-encoded (and *that* represented in the
current encoding, along with the entire <? ... ?>) gif, like
<?gif ...base64 stuff...?> Clicking the highlighted "gif" might pop up a picture
in a child window, etc.

<?svg ... ?> Might be interesting also. All these things could be designed to operate
on immediate data vs referred-to data, where the latter could refer to attachments or
urls or whatever.

This mechanism could also be used to solve the gignooskoo problem, something like
(here with immediate data):

<?utf8 &#947;&#943;&#947;&#957;&#969;&#963;&#954;&#969;?>

Where that whole thing is encoded within the current message in its current encoding
(which can't be violated by including actual utf-8 characters not in common)
but is seen by the smart email reader as invoking "utf8" processing. That might be one
that one would elect to have the reader do automatically.

Again, for email readers that don't understand <?xxx ... ?> you just see it as above.
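
A minimal sketch of how a reader might honour only the "utf8" instruction, assuming Python 3. The function name and the regular expression are hypothetical; the proposal above does not specify a grammar, so this is one possible reading of it and not a definitive implementation.

```python
import html
import re

# Matches only the hypothetical <?utf8 ...?> instruction, nothing else.
UTF8_PI = re.compile(r"<\?utf8\s+(.*?)\?>", re.DOTALL)


def render_utf8_instructions(body):
    """Replace each <?utf8 ...?> with the characters its references name."""
    return UTF8_PI.sub(lambda match: html.unescape(match.group(1)), body)


message = (
    "word: <?utf8 &#947;&#943;&#947;&#957;&#969;&#963;&#954;&#969;?> "
    "and <?python import os ?> stays as it is"
)
rendered = render_utf8_instructions(message)
print(rendered)
```

Any other instruction, such as the <?python ...?> example, passes through untouched and stays visible as source, which matches the "show it, don't run it" default argued for above.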

But converting email/news wholesale to an instance of XHTML, please no! ;-)

Regards,
Bengt Richter

Ben Finney

unread,
Jul 21, 2003, 9:46:12 PM7/21/03
to
On Mon, 21 Jul 2003 10:51:32 +0100, Alan Kennedy wrote:
> Ben Finney wrote:
> > Expecting people to use a news reader that attempts to parse markup
> > and render the result, is like expecting people to use an email
> > reader that attempts to parse markup and render ther result. Don't.
>
> Solely because of technical inertia, and unwillingness to address the
> (perhaps excessive) complexity of our various communications layers,
> i.e. our own "Tower of 7-bit Babel", we're suppressing cultural
> diversity, for no technically valid reason.

Yes. The solutions must involve a significant sociological element,
since that is a large part of the current situation.

> I personally don't have the slightest problem with reformulating NNTP
> and POP to use XML instead: In a way, I think it's almost inevitable,
> given how poor our existing "ascii" technologies are at dealing with
> i18n and l10n issues. Emails and usenet posts are all just documents
> after all.

I've no idea, though, why you keep banging on about XML for simple,
plain-text documents. Substitute XML with UTF-8 in the above, and I
agree entirely. This is a problem of character encodings, yet you keep
wanting to apply a heavy, structural markup solution. That way lies
HTML/XML email, and it's totally unnecessary and unhelpful.

Email and NNTP are lightweight, freeform, unstructured document formats,
and they're good that way. Nothing you've said so far has offered even
a pretence of a reason for abandoning freeform text formats for heavy,
markup-oriented formats.

Where character encoding is the problem, Unicode is the current best
solution. But that in no way necessitates a markup format.

--
\ "If you're not part of the solution, you're part of the |
`\ precipitate." -- Steven Wright |
_o__) |

Alan Kennedy

Jul 21, 2003, 5:25:19 PM
[Much discussion elided, because it's definitely way off topic]

Bengt Richter wrote:

> Which part, though? The encoding, or the fact that you see the encoding
> in the above instead of its being rendered with the intended appearance?
>
> IOW, any solution will involve *some* encoding and the possibility of
> rendering it "raw" or interpreted. A smart GUI might have a default
> mode showing everything interpreted, and have a "view source" button.

I did just want to make one quick point about this.

I believe that a major reason for the success of URIs is their
simplicity and transcribability, especially in situations that may
not immediately involve a computer. I think most people have been
through one or more of the following:

1. Spoken a URI over the telephone.
2. Seen URIs on passing taxis/trucks/trains/planes/automobiles
3. Scribbled a URI on a business card, or scrap of paper
4. Sent a URI by SMS text message, i.e. tapping it out on a 10-key
phone keypad.
5. Printed a URI on their own business card.
6. "Handwritten" a URI into PDA handwriting recognition software.

I have a little difficulty with the insistence that 2 character
iso-latin-1 escape codes go easily in all the above situations. Should
I have to say

http://xhaus.com/%F3cinn%E9ide

instead of

http://xhaus.com/ócinnéide

If I use the latter, then that's an illegal URI.
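
As an illustration of the escaping in question, here is a small sketch using Python's standard urllib.parse module. It assumes the %F3 and %E9 escapes stand for ISO-8859-1 bytes, as the post says. It also shows that the escape itself does not record which charset was used, so a reader that guesses differently gets a different result.

```python
from urllib.parse import quote, unquote

name = "ócinnéide"

# Percent-encode the ISO-8859-1 bytes of the name.
escaped = quote(name, encoding="latin-1")
print("http://xhaus.com/" + escaped)

# Decoding with the same charset recovers the name.
print(unquote(escaped, encoding="latin-1"))

# Decoding the same escapes as UTF-8 does not: the lone bytes are invalid
# UTF-8, so each is replaced with U+FFFD.
print(unquote(escaped, encoding="utf-8", errors="replace"))
```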

It's a wart.

But it's not a python wart, so I'll shut up now :-L

regards,

Martin v. Löwis

Jul 21, 2003, 6:36:23 PM
Alan Kennedy <ala...@hotmail.com> writes:

> http://xhaus.com/ócinnéide
>
> If I use the latter, then that's an illegal URI.

However, it is a valid IRI.
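
For reference, IRIs were later standardized as RFC 3987, which maps an IRI to a URI by percent-encoding the UTF-8 bytes of its non-ASCII characters. The following is a rough sketch of that mapping for the path and query only. The function name iri_to_uri is made up here, and non-ASCII host names, which need separate IDNA handling, are left out.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri):
    parts = urlsplit(iri)
    # Percent-encode the UTF-8 bytes of non-ASCII path and query characters.
    # "%" is kept safe so existing escapes are not escaped a second time.
    path = quote(parts.path, safe="/%", encoding="utf-8")
    query = quote(parts.query, safe="=&%", encoding="utf-8")
    # The host is passed through unchanged (assumed ASCII in this sketch).
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


print(iri_to_uri("http://xhaus.com/ócinnéide"))
```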

Regards,
Martin

Martin v. Löwis

Jul 21, 2003, 6:38:25 PM
Alan Kennedy <ala...@hotmail.com> writes:

> I personally don't have the slightest problem with reformulating NNTP
> and POP to use XML instead: In a way, I think it's almost inevitable,
> given how poor our existing "ascii" technologies are at dealing with
> i18n and l10n issues. Emails and usenet posts are all just documents
> after all.
>
> Would something like this really be so offensive (the Gaelic isn't, I
> promise :-)? Or inefficient?

It would be pointless doing so. All the infrastructure needed to
communicate different encodings in NNTP is already there - no need to
change any protocol at all. The specification part has already been
done. It is called MIME.

What is lacking is the implementations. Just saying "Use XML then"
won't magically make implementations appear.
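
To make the MIME point concrete, here is a sketch using Python's standard email package (the modern EmailMessage API, which postdates this thread). It builds an article whose body and subject are Greek, serializes it, and parses it back. The Newsgroups value and the message text are placeholders chosen for the example.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

msg = EmailMessage()
msg["Newsgroups"] = "comp.lang.python"
msg["Subject"] = "γιγνώσκω"
# The charset is declared in the Content-Type header, not guessed by the reader.
msg.set_content("γιγνώσκω", charset="utf-8")

print(msg["Content-Type"])
print(msg["Content-Transfer-Encoding"])

# Serialize to bytes as it would travel, then parse it on the receiving side.
wire = msg.as_bytes()
received = message_from_bytes(wire, policy=policy.default)

print(received.get_content_charset())
print(received["Subject"])
print(received.get_content())
```

The receiving side needs no out-of-band knowledge: the declared charset and the encoded-word form of the subject header carry everything required to recover the text.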

Regards,
Martin

Alan Kennedy

Jul 21, 2003, 6:51:58 PM
Alan Kennedy:

>> Solely because of technical inertia, and unwillingness to address
>> the (perhaps excessive) complexity of our various communications
>> layers, i.e. our own "Tower of 7-bit Babel", we're suppressing
>> cultural diversity, for no technically valid reason.

Ben Finney wrote:

> Yes. The solutions must involve a significant sociological element,
> since that is a large part of the current situation.

The only point I would like to add is that I think all of our
sociological situations are going to change rather rapidly over the
next few years. The business landscape has changed pretty radically
over the last few years, and I think it's going to change even more
once global initiatives such as ebXML, etc, kick in.

>> I personally don't have the slightest problem with reformulating
>> NNTP and POP to use XML instead: In a way, I think it's almost
>> inevitable, given how poor our existing "ascii" technologies are
>> at dealing with i18n and l10n issues. Emails and usenet posts are
>> all just documents after all.
>
> I've no idea, though, why you keep banging on about XML for simple,
> plain-text documents. Substitute XML with UTF-8 in the above, and I
> agree entirely. This is a problem of character encodings, yet you
> keep wanting to apply a heavy, structural markup solution. That way
> lies HTML/XML email, and it's totally unnecessary and unhelpful.

You quite probably put all of your email through a virus scanner? If
it's full of nasty XML, why not put it through an XML transform instead,
that removes the images/webbugs for example, or "downcasts" it to
ASCII? It's just a stage in the processing chain. Write your own
transform if you wish, it will interoperate seamlessly with your
gateway processing software, regardless of implementation language.
It's only XML.
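
A rough sketch of such a transform stage, using Python's standard xml.etree.ElementTree. The message structure, the element names and the strip_images helper are invented for the example, and no real XML mail format is implied. It drops img elements and then serializes to ASCII, which turns non-ASCII characters into numeric character references.

```python
import xml.etree.ElementTree as ET

# A made-up XML message. The element names are illustrative only.
SOURCE = (
    "<message><body>ócinnéide"
    "<img src='http://example.invalid/bug.gif'/></body></message>"
)


def strip_images(root):
    # Snapshot the element list first so removal does not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "img":
                # Note: any text following the image (its tail) is dropped too.
                parent.remove(child)
    return root


root = strip_images(ET.fromstring(SOURCE))

# "Downcast" to ASCII: characters outside ASCII become &#NNN; references.
ascii_bytes = ET.tostring(root, encoding="us-ascii")
print(ascii_bytes.decode("ascii"))
```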

BTW, Bengt, <subject>I</subject><verb person="firstsingular">hate</verb>
<object type="anonymous">this</object><conjunctive>too</conjunctive>.

Ugh!

> Email and NNTP are lightweight, freeform, unstructured document
> formats, and they're good that way.

POP and NNTP were great for what they were designed for: sending ascii
messages in the age of uucp, uuencode, 2.4K modems, acoustic couplers
and phone-phreakers. These days, there's so much protocol wrapping and
unwrapping going on that mistakes happen all the time. Witness the
wrong URIs episode today in "Python-URL".

These days, you and I could probably have a reasonable
telephone^H^H^H^H^H^H^H^H^H voice conversation over the net. You're in
Australia (g'day) and I'm in Ireland. Should be only 0.3 to 0.5 second
delay. I have a 256Kbit/1024Kbit connection. You? We have much more
reliable connections now, and perhaps we need to focus a little more
on what information we're sending, rather than how we're sending it.

> Nothing you've said so far has offered even
> a pretence of a reason for abandoning freeform text formats for
> heavy, markup-oriented formats.

In my simplistic view, I see an increasing need in the modern
day for people to communicate meaningfully with each other:
to work together on complicated projects, across barriers of time
zones. Working together requires good communications. But really
achieving anything requires structuring ideas and information, and
structuring ideas means (to me) either using the same software or
forming a common framework within which to operate.

In the wonderfully diverse world of open source, interoperability
among diversity is a highly desired quality, i.e. everybody uses their
own preferred combination of software. I see XML as offering a simple
way to form those same common frameworks, which will allow
NON-technical people to work together on structuring and coordinating
their efforts, and to also enjoy the benefits of interoperability
among diversity. It will allow the less IT-literate PhDs in
microbiology, medical science, genomics, renewable energy,
anthropology, etc, etc, etc, to better coordinate their efforts. I
expect an explosion of data interchange among non-IT scientists, now
that all the serious office packages, i.e. Microsoft and Open*Office,
are using XML as their formats. All the world's data, suddenly
free(-ish) to move between software and between people and their
preferences.

The kind of protocols we need in the modern day are protocols like
UDDI: Universal Description, Discovery and Integration of Web Services.

http://www.uddi.org/

And I expect that 99% of the time, people who use UDDI services will
never see an angle-bracket, publishing or subscribing. But if they
want to become publishers on minimal resources, they'll be able to
simply put their IT together, using wonderful language tools such as
python processing XML.

No barrier, technical or financial, to entry.

> Where character encoding is the problem, Unicode is the current best
> solution. But that in no way necessitates a markup format.

Let's tear down all the old copper wires, and replace them with
microwave towers. They were an eyesore anyway ;-)

regards,

Steven Taschuk

Jul 21, 2003, 8:36:40 PM
Quoth Mel Wilson:
[Latin verbs for 'to know']

> Me too, but it's in Bantam's _The New College Latin &
> English Dictionary_. 'Cognovisse' seems to be some
> modification of cognoscere. [...]

It is, in fact, the perfect infinitive of <cognoscere>.

According to Wheelock, <cognoscere> means "to learn" (etc.), and
*in perfect tenses* means "to know". Note that you know
something (present tense) iff you have learned it (perfect tense).

So if Bantam lists <cognovisse> as a translation of "to know", I
see no reason to object. It would be surprising if they list it
as a head word, as if it were the infinitive of a (purely
notional) verb
cognovi, cognovisse, cognoveram, cognitus eram
(which are analogues of the usual principal parts under the
mapping present -> perfect).

--
Steven Taschuk stas...@telusplanet.net
"Please don't damage the horticulturalist."
-- _Little Shop of Horrors_ (1960)

Christos TZOTZIOY

Jul 21, 2003, 10:20:57 PM
On Thu, 17 Jul 2003 11:53:05 +0100, rumours say that Alan Kennedy
<ala...@hotmail.com> might have written:

>Here it is in a format where almost everybody will be able to see the
>original greek verb on their screen.
>
>#---------


><?xml version="1.0" encoding="utf-8"?>
><verb>&#x3b3;&#x3af;&#x3b3;&#x3bd;&#x3c9;&#x3c3;&#x3ba;&#x3c9;</verb>

>#---------

if I may... that should be

<?xml version="1.0" encoding="utf-8"?>

<verb>&#x3b3;&#x3b9;&#x3b3;&#x3bd;&#x3ce;&#x3c3;&#x3ba;&#x3c9;</verb>

since a word ending in a long syllable could never have an accent on the
third syllable from the end (that still applies in modern Greek).
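
The difference between the two versions can be checked by decoding the character references. This sketch uses Python's standard html and unicodedata modules. The helper name accented is made up here.

```python
import html
import unicodedata

original = "&#x3b3;&#x3af;&#x3b3;&#x3bd;&#x3c9;&#x3c3;&#x3ba;&#x3c9;"
corrected = "&#x3b3;&#x3b9;&#x3b3;&#x3bd;&#x3ce;&#x3c3;&#x3ba;&#x3c9;"


def accented(refs):
    word = html.unescape(refs)
    # Report the position and Unicode name of each letter carrying a tonos.
    return [
        (index, unicodedata.name(ch))
        for index, ch in enumerate(word)
        if "TONOS" in unicodedata.name(ch)
    ]


for refs in (original, corrected):
    print(html.unescape(refs), accented(refs))
```

The original places the accent on the iota (second letter), the corrected form on the first omega.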
--
TZOTZIOY, I speak England very best,
Microsoft Security Alert: the Matrix began as open source.

JanC

Jul 21, 2003, 11:18:03 PM
Paul Foley <s...@below.invalid> schreef:

> betacode

Thanks, I didn't know about that.
Have now searched Google for "greek betacode" and have bookmarked some of
the results. :-)

--
JanC

"Be strict when sending and tolerant when receiving."
RFC 1958 - Architectural Principles of the Internet - section 3.9

Christos TZOTZIOY

Jul 22, 2003, 12:59:08 PM
On Fri, 18 Jul 2003 20:25:20 +0100, rumours say that Alan Kennedy
<ala...@hotmail.com> might have written:

>It is also worth noting that your message and messages quoting it are
>the only hits that turn up in a Google Groups search using the
>original greek text as a search term: i.e. I go to Google Groups and
>paste in the greek letters.

If I may say it sideways, try searching Google for the French word
"càfe", as in "un càfe s'il vous plaît". Wanna bet my message will be
the only one to be found? :)

(I'd lose that bet. There would be two more posts, another one with the
word incorrectly written --but it's excusable judging by the newsgroup
name ;-) -- and one in Swedish).

The word you should be looking for is 'gignw/skw', not 'gi/gnwskw'.

http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&q=%CE%B3%CE%B9%CE%B3%CE%BD%CF%8E%CF%83%CE%BA%CF%89&btnG=Google+Search

(one huge line)
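
The q parameter in that URL is the UTF-8 percent-encoding of the search term, and it matches the 'gignw/skw' transliteration rather than 'gi/gnwskw'. A small sketch to check this, using only the standard library. The BETA table is a deliberately partial Beta Code mapping covering just the letters of this word, and beta_to_greek is a name invented for the example.

```python
import unicodedata
from urllib.parse import parse_qs, urlsplit

URL = (
    "http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8"
    "&q=%CE%B3%CE%B9%CE%B3%CE%BD%CF%8E%CF%83%CE%BA%CF%89&btnG=Google+Search"
)

# parse_qs decodes percent-escapes as UTF-8 by default.
query = parse_qs(urlsplit(URL).query)["q"][0]
print(query)

# Partial table: only the letters this word needs. "/" marks an acute accent
# on the preceding vowel, represented here as a combining character.
BETA = {"g": "γ", "i": "ι", "n": "ν", "w": "ω", "s": "σ", "k": "κ",
        "/": "\u0301"}


def beta_to_greek(text):
    decomposed = "".join(BETA[ch] for ch in text)
    # NFC composes each vowel + combining accent into one accented letter.
    return unicodedata.normalize("NFC", decomposed)


print(beta_to_greek("gignw/skw") == query)
print(beta_to_greek("gi/gnwskw") == query)
```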
