
Comments on Microsoft Open Source document


Chip Salzenberg

Nov 8, 1998
According to Alex Belits:
> UTF-8 can't be handled well with regexps, used widely in Perl [...]

No Perl FUD, please. Perl's development source tree already includes
fully functional support for UTF-8, including regexes.
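
For anyone who wants to see the idea: once the bytes are decoded into
characters, a regex engine matches multibyte letters like any others. A
minimal sketch of the concept in modern Python rather than Perl, with a
made-up sample string:

  import re

  # What arrives on the wire is UTF-8 bytes; decode once, then the
  # regex engine works on characters, multibyte or not.
  data = "naïve café".encode("utf-8")
  text = data.decode("utf-8")
  print(re.findall(r"\w+", text))   # ['naïve', 'café']
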
--
Chip Salzenberg - a.k.a. - <ch...@perlsupport.com>
"There -- we made them swerve slightly!" //MST3K


Olaf Titz

Nov 9, 1998
> > Makes me think of writing application proxies that trash any attempt to
> > use M$ proprietary extensions.

> A more productive thing to do would be to define and implement open extensions
> that have the same functionality as any proprietary MS extensions to standard
> protocols.

That has not worked in the past, e.g. when Netscape designed frames
in order to be incompatible with the accepted standard. I see no reason
why people should use the non-Microsoft thing, in the face of
"everyone" (read: the M$-influenced press) telling them that because
it's not M$, it must be inferior.

(Which is the only possible reason to use Windows anyway.)

> In many cases, the standard protocols aren't optimal. E.g., if one were
> designing the web from scratch, one could do a lot better than HTTP.

Most older protocols are even less optimal (I have in mind RFC 822 and
FTP as the worst offenders), but everyone keeps using them. The only
protocol that has ever successfully been abandoned since the
introduction of TCP is Gopher. Even HTTP/1.0 can do everything that
FTP does and can do it better (much easier to implement in certain
regards), but this hasn't stopped FTP from being used.

olaf

Riley Williams

Nov 9, 1998
Hi Olaf.

>> In many cases, the standard protocols aren't optimal. E.g., if
>> one were designing the web from scratch, one could do a lot better
>> than HTTP.

> Most older protocols are even less optimal (I have in mind RFC 822
> and FTP as the worst offenders), but everyone keeps using them. The
> only protocol that has ever successfully been abandoned since the
> introduction of TCP is Gopher.

Nodz...

> Even HTTP/1.0 can do everything that FTP does and can do it better
> (much easier to implement in certain regards), but this hasn't
> stopped FTP from being used.

One thing http/1.0 certainly does better than ftp is to freeze and
drop a link in mid transfer - something that happens with such
monotonous regularity for me that I've stopped even considering using
http for file transfers...

Best wishes from Riley.

Alan Cox

Nov 9, 1998
> Most older protocols are even less optimal (I have in mind RFC 822 and
> FTP as the worst offenders), but everyone keeps using them. The only
> protocol that has ever successfully been abandoned since the

FTP is dying; the main things that keep it alive are the fact that http
daemons are bad at handing out large files, and the fact that http clients
don't use byte ranges on broken file transfer retries.
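
For illustration: resuming a broken transfer needs nothing more than an
HTTP Range header, which range-capable servers answer with a 206 Partial
Content reply. A minimal sketch in modern Python; the URL and filename
are hypothetical:

  import os
  import urllib.request

  url = "http://example.com/big.tar.gz"    # hypothetical
  dest = "big.tar.gz"

  # Ask the server only for the bytes we don't have yet.
  have = os.path.getsize(dest) if os.path.exists(dest) else 0
  req = urllib.request.Request(url, headers={"Range": "bytes=%d-" % have})

  with urllib.request.urlopen(req) as resp, open(dest, "ab") as out:
      if have and resp.status != 206:   # server ignored the Range header
          raise RuntimeError("resume not supported; restart from scratch")
      while True:
          chunk = resp.read(65536)
          if not chunk:
              break
          out.write(chunk)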

Alan

Alex Belits

Nov 9, 1998
On Mon, 9 Nov 1998, Theodore Y. Ts'o wrote:

> 4. Unicode makes the display problem a non-issue (all characters are in
> one huge font) at the price of modifying all string-handling routines.
> That however includes complete incompatibility with existing charsets,
> and lack of language-labeling.
>
> The reality is that you have to modify all string-handling routines
> anyway, because of languages like Chinese where 8-bit characters simply
> aren't enough.

And it was already done without Unicode. And surprise, in cases where
UTF-8 can be processed transparently as a byte stream, so can every
other charset.
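
To make that concrete: a routine that treats text as opaque bytes, such
as splitting a path on '/', behaves identically for UTF-8 and for a
single-byte charset like KOI8-R, since neither ever hides an ASCII byte
inside a non-ASCII character. A small sketch in modern Python, with a
made-up string:

  # In UTF-8 every byte of a non-ASCII character is >= 0x80, and in
  # KOI8-R the national characters are single bytes >= 0x80, so an
  # ASCII delimiter can never fall in the middle of a character.
  for charset in ("utf-8", "koi8-r"):
      raw = "каталог/файл.txt".encode(charset)
      dirname, filename = raw.split(b"/")   # charset-blind byte split
      print(charset, dirname, filename)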

> You also want to be able to handle multiple languages using different
> character sets inside one particular document,

First, in most cases it's a strawman argument, because such documents
(ones that don't fit into any existing charset, considering that English
can always be used in all national standards) are extremely rare, and
the linguists who actually use such documents in most cases still need
something more complex than that.

Second, I consider language/charset labeling to be a real problem that
should be solved regardless of how often it appears in one document,
and not because of languages mixing in documents but because of the need
for better language labeling in general; the case of a mixed-language
document is an example that should be considered to produce the most
clean, convenient solution. Unicode is simple, but not convenient from the
language-handling point of view.

> which is in fact
> *simpler* to do with Unicode, since it's all (as you put it) one
> gigantic font. If you don't do this, you end up needing to have magic
> character-set switching escape sequences (or MIME-style headers, or some
> other complex solution), and your string and display routines end up
> getting just as complex, if not more so.

I only claim that the problem is complex enough to require such a
complex solution -- a simple one solves a tiny part of it, but cuts off
any way back to labeling, because the whole purpose of it is to remove all
labeling. With language labeling the purpose of Unicode is lost --
and of course, Unicode supporters have never proposed any language labeling
or language-dependent processing for UTF-8 strings.

> The bottom line is that doing internationalization is hard.

Exactly my statement, minus the part that "simple" solutions cause
enormous harm.

> As one I18N
> expert was heard to say, "It would be easier to teach them all English."

I see that phrase as, umm... a radicalized version of the decision
to use Unicode and UTF-8.

> Any solution will end up impacting some people more than others. It is
> no doubt true that UTF-8 may end up impacting certain people more than
> others. But the backwards compatibility aspects of UTF-8,

It's certainly backward compatible with ASCII (see above) and trivial
for iso8859-1.
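
Concretely: pure ASCII encodes byte-for-byte identically in UTF-8, and
every iso8859-1 code point maps to a fixed two-byte sequence computed
from its 8-bit code. A quick check in modern Python:

  # ASCII text is unchanged under UTF-8.
  assert "hello".encode("utf-8") == "hello".encode("ascii")

  # An iso8859-1 code point c in 0x80..0xFF becomes exactly two bytes:
  # 0xC0 | (c >> 6), then 0x80 | (c & 0x3F).
  c = ord("é")                                    # 0xE9 in iso8859-1
  two = bytes([0xC0 | (c >> 6), 0x80 | (c & 0x3F)])
  assert "é".encode("utf-8") == two == b"\xc3\xa9"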

> combined with
> the undeniable preponderance of where computer systems are deployed
> (i.e., U.S. and Europe) means that it was inevitable that UTF-8 would be
> chosen as the most pragmatic solution which impacts the smallest number
> of people and allows for the easiest transition to full I18N support.

Computers are produced in a lot of places and used everywhere. Pressure
from companies on standards committees, though, is mostly applied in the
US and Europe.

> From where I sit, Microsoft wasn't the only company pushing Unicode; the
> push for Unicode and UTF-8 came from all directions, not just Microsoft.
> Or are you going to claim that the developers of Perl and X are pawns of
> Microsoft?

X did not adopt UTF-8; it uses multi-charset i18n support.

> Instead, it seems pretty clear that Perl and X chose UTF-8
> because

...the IETF loudly declared UTF-8 to be The Only Standard, defined XML as
UTF-8-only, and Larry Wall, not being familiar with the issue, was
misled into believing that UTF-8 is indeed the only standard that must be
supported instead of "legacy" charsets and their labeling.

> it's the sanest way to make the very hard transition from 8-bit
> characters to supporting internationalization, including character sets
> that simply won't fit in 256 character slots.

The nature of Perl and its data handling does not directly impose
restrictions of that kind.

> Finally, what in the world does this have to do with the Linux kernel?

Certainly it has more to do with the Linux kernel than with the Microsoft
marketing/business strategy that is the main topic of this thread. The Linux
kernel has some rudimentary Unicode support, and there were proposals to
use it as "the" standard for filesystems, and claims that ext2 is
"designed" to support UTF-8 when it merely can use UTF-8 filenames just
like any other sanely designed filesystem.
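
Here "sanely designed" amounts to treating a filename as an opaque byte
string in which only '/' and NUL are special, so UTF-8 names need nothing
from the filesystem. A small sketch in modern Python, with a made-up name:

  import os

  # These happen to be the UTF-8 bytes for "файл.txt"; the kernel
  # stores and returns them untouched, never interpreting the charset.
  name = "файл.txt".encode("utf-8")
  with open(name, "wb") as f:
      f.write(b"hello\n")
  print(name in os.listdir(b"."))   # True: the bytes round-trip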

> Followups to /dev/null, please.

If you don't want to hear answers, there is no point in arguing.

--
Alex

Khimenko Victor

Nov 9, 1998
9-Nov-98 10:47 you wrote:
>> Even HTTP/1.0 can do everything that FTP does and can do it better
>> (much easier to implement in certain regards), but this hasn't
>> stopped FTP from being used.

> One thing http/1.0 certainly does better than ftp is to freeze and
> drop a link in mid transfer - something that happens with such
> monotonous regularity for me that I've stopped even considering using
> http for file transfers...

This is not a problem at all if you use the appropriate tools. For example,
if you use Apache and wget, this will not be a problem at all. But HTTP/1.0
is FAR slower if you want to download a complex file structure with a lot of
small files. HTTP/1.1 perhaps could solve this problem, but you'd need to add
a yet-to-be-developed file format to handle directory indexes (and no, HTML
will not work -- file attributes, times, symlinks, etc). For FTP there is no
such format either, but there are tools with tricks for a lot of popular
ftp servers, and these tools WORK! Not always, of course, but most of the
time... While HTTP/1.1 does not work. For now.

P.S. For example, wget was unable to download a directory with filenames like
"Who is this?.jpeg" via HTTP but was able to download the same directory via
FTP ...

-- cut --
AC> FTP is dying; the main things that keep it alive are the fact that http
AC> daemons are bad at handing out large files, and the fact that http clients
AC> don't use byte ranges on broken file transfer retries.
-- cut --
In fact, Apache handles big files reliably, and Netscape (4.05+ at least) can
use byte ranges for file downloads...

Francisco Rodrigo Escobedo Robles

Nov 10, 1998
On Mon, 9 Nov 1998, Alan Cox wrote:

> > Most older protocols are even less optimal (I have in mind RFC 822 and
> > FTP as the worst offenders), but everyone keeps using them. The only
> > protocol that has ever successfully been abandoned since the
>
> FTP is dying; the main things that keep it alive are the fact that http
> daemons are bad at handing out large files, and the fact that http clients
> don't use byte ranges on broken file transfer retries.

Hopefully, HTTP/1.1 and compliant clients will get rid of this nasty bug.
As for large files, I suppose we'll have to wait a little longer...

I work as a BOFH^H^H^H^HSystem Administrator at an ISP, and we use Linux
(of course) and Apache. I am always looking for optimizations, and a
secure way to do things. It would be great to switch from ftp to http if
it really is more secure.

At home I use the 2.1.x series, and I haven't experienced the tcp stalls that
some people report. Using 2 NE2000 PCI clones, I usually get around 1MB/s
transfers (sometimes a little less) through ftp (haven't tried with http)
and nfs (which I don't use very often). Both are Pentium machines, no bells
and whistles. The kernel is now 2.1.127, with the same results since 2.1.119
(I used previous versions, but can't remember those tests).

As a last comment, while 2.1.127 performs well for me, I couldn't
compile ftape support due to errors (I am at work and can't remember
exactly, but I think it's a missing or duplicated symbol).

Regards.

---
Francisco Rodrigo Escobedo Robles - mailto:fr...@vnet.es
System Administrator, Virtual Net - Hipernet
This message expresses only my opinion at this moment

Brandon S. Allbery KF8NH

Nov 10, 1998
In message <1998111007...@tantalophile.demon.co.uk>, Jamie Lokier writes:
+-----
| Now if NFS weren't so slow over the phone I'd use that where available.
| I use NFS almost exclusively for getting files from a mirror
| (sunsite.doc.ic.ac.uk) when at work -- it's just about perfect.
|
| Maybe a caching httpfs (with some kind of directory listing) would do
| the trick.
+--->8

Why do I think "cachefs+WebNFS" when I read the above? :-)

--
brandon s. allbery [os/2][linux][solaris][japh] all...@kf8nh.apk.net
system administrator [WAY too many hats] all...@ece.cmu.edu
carnegie mellon / electrical and computer engineering KF8NH
Kiss my bits, Billy-boy.
