function inclusion problem


vlya...@gmail.com

unread,
Feb 10, 2015, 6:38:12 PM2/10/15
to
I defined function Fatalln in "mydef.py" and it works fine if I call it from "mydef.py", but when I try to call it from "test.py" in the same folder:
import mydef
...
Fatalln "my test"
I get NameError: name 'Fatalln' is not defined
I also tried include('mydef.py') with the same result...
What is the right syntax?
Thanks

sohca...@gmail.com

unread,
Feb 10, 2015, 6:55:48 PM2/10/15
to
It would help us help you a lot if you copy/paste your code from both mydef.py and test.py so we can see exactly what you're doing.

Don't re-type what you entered, because people (especially new programmers) are prone to either making typos or leaving out certain things because they don't think they're important. Copy/paste the code from the two files and then copy/paste the error you're getting.

Steven D'Aprano

unread,
Feb 10, 2015, 6:58:03 PM2/10/15
to
Preferred:

import mydef
mydef.Fatalln("my test")



Also acceptable:


from mydef import Fatalln
Fatalln("my test")





--
Steven

Michael Torrie

unread,
Feb 10, 2015, 7:00:37 PM2/10/15
to pytho...@python.org
Almost.

Try this:

mydef.Fatalln()

Unless you import the symbols from your mydef module into your program,
they have to be referenced by the module name. This is a good thing and it
helps keep your code separated and clean. It is possible to import
individual symbols from a module like this:

from mydef import Fatalln

Avoid the temptation to import *all* symbols from a module into the
current program's namespace. Better to type out the extra bit.
Alternatively, you can alias imports like this:

import somemodule.submodule as foo

Frequently this idiom is used when working with numpy to save a bit of
time, while preserving the separate namespaces.
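
For example, the conventional numpy alias looks like this (a minimal
sketch; it assumes numpy is installed, but the same pattern works for
any module):

import numpy as np   # conventional short alias

a = np.arange(10)    # calls stay qualified, so it's obvious where arange comes from
print(np.mean(a))    # 4.5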

sohca...@gmail.com

unread,
Feb 10, 2015, 7:03:06 PM2/10/15
to
On Tuesday, February 10, 2015 at 3:38:12 PM UTC-8, vlya...@gmail.com wrote:
If you only do `import mydef`, then it creates a module object called `mydef` which contains all the global members of mydef.py. When you want to call a function from that module, you need to qualify it with the module name followed by a period, then the function name. For example:

mydef.Fatalln("my test")

If you wanted to be able to call Fatalln without using the module name, you could import just the Fatalln function:

from mydef import Fatalln
Fatalln("my test")

If you had a lot of functions in mydef.py and wanted to be able to access them all without that pesky module name, you could also do:

from mydef import *

However, this is often considered a bad practice as you're polluting your global namespace, though it can be acceptable in specific scenarios.

For more information, check https://docs.python.org/3/tutorial/modules.html

Ian Kelly

unread,
Feb 10, 2015, 7:03:36 PM2/10/15
to Python
import mydef
mydef.Fatalln("my test")

or

from mydef import Fatalln
Fatalln("my test")

Laura Creighton

unread,
Feb 10, 2015, 7:06:55 PM2/10/15
to vlya...@gmail.com, pytho...@python.org, l...@openend.se

from mydef import Fatalln

Laura Creighton

unread,
Feb 10, 2015, 7:17:00 PM2/10/15
to Laura Creighton, pytho...@python.org, vlya...@gmail.com, l...@openend.se
In a message of Wed, 11 Feb 2015 01:06:00 +0100, Laura Creighton writes:
>In a message of Tue, 10 Feb 2015 15:38:02 -0800, vlya...@gmail.com writes:
>>--
>>https://mail.python.org/mailman/listinfo/python-list
>
>from mydef import Fatalln
>

Also, please be warned. If you use a unix or linux system, there are
lots of problems you can get into if you expect something named 'test'
to run your code, because the shell already has a 'test' command of its
own, and that one wins, and so ... well, test.py is safe. But if you
rename it as a script and call the binary file test ...

Bad and unexpected things happen.

Name it 'testme' or something like that. Never have that problem again.
:)

Been there, done that!
Laura


Victor L

unread,
Feb 11, 2015, 8:28:24 AM2/11/15
to Laura Creighton, pytho...@python.org
Laura, thanks for the answer - it works. Is there some equivalent of "include" to expose every function in that script?
Thanks again,
-V

On Tue, Feb 10, 2015 at 7:16 PM, Laura Creighton <l...@openend.se> wrote:
In a message of Wed, 11 Feb 2015 01:06:00 +0100, Laura Creighton writes:
>In a message of Tue, 10 Feb 2015 15:38:02 -0800, vlya...@gmail.com writes:

Dave Angel

unread,
Feb 11, 2015, 10:07:59 AM2/11/15
to pytho...@python.org
On 02/11/2015 08:27 AM, Victor L wrote:
> Laura, thanks for the answer - it works. Is there some equivalent of
> "include" to expose every function in that script?
> Thanks again,
> -V
>
Please don't top-post, and please use text email, not html. Thank you.

yes, as sohca...@gmail.com pointed out, you can do

from mydef import *

But this is nearly always bad practice. If there are only a few
functions you need access to, you should do

from mydef import Fatalln, func2, func42

and if there are tons of them, you do NOT want to pollute your local
namespace with them, and should do:

import mydef

x = mydef.func2() # or whatever

The assumption is that when the code is in an importable module, it'll
be maintained somewhat independently of the calling script/module. For
example, if you're using a third party library, it could be updated
without your having to rewrite your own calling code.

So what happens if the 3rd party adds a new function, and you happen to
have one by the same name? If you used the import * semantics, you could
suddenly have broken code, with the complaint "But I didn't change a thing."

Similarly, if you import from more than one module, and use the import *
form, they could conflict with each other. And the order of importing
will (usually) determine which names override which ones.

The time that it's reasonable to use import * is when the third-party
library already recommends it. They should only do so if they have
written their library to only expose a careful subset of the names
declared, and documented all of them. And when they make new releases,
they're careful to hide any new symbols unless carefully documented in
the release notes, so you can manually check for interference.

Incidentally, this is also true of the standard library. There are
symbols that are importable from multiple places, and sometimes they
have the same meanings, sometimes they don't. An example (in Python
2.7) of the latter is os.walk and os.path.walk

When I want to use one of those functions, I spell it out:
for dirpath, dirnames, filenames in os.walk(topname):

That way, there's no doubt in the reader's mind which one I intended.

--
DaveA

Tim Chase

unread,
Feb 11, 2015, 10:36:54 AM2/11/15
to pytho...@python.org
On 2015-02-11 10:07, Dave Angel wrote:
> if there are tons of them, you do NOT want to pollute your local
> namespace with them, and should do:
>
> import mydef
>
> x = mydef.func2() # or whatever

or, if that's verbose, you can give a shorter alias:

import Tkinter as tk
root = tk.Tk()
root.mainloop()

-tkc




Chris Angelico

unread,
Feb 11, 2015, 10:37:31 AM2/11/15
to pytho...@python.org
On Thu, Feb 12, 2015 at 2:07 AM, Dave Angel <d...@davea.name> wrote:
> Similarly, if you import from more than one module, and use the import*
> form, they could conflict with each other. And the order of importing will
> (usually) determine which names override which ones.

Never mind about conflicts and order of importing... just try figuring
out code like this:

from os import *
from sys import *
from math import *

# Calculate the total size of all files in a directory
tot = 0
for path, dirs, files in walk(argv[1]):
    # We don't need to sum the directories separately
    for f in files:
        # getsizeof() returns a value in bytes
        tot += getsizeof(f)/1024.0/1024.0

print("Total directory size:", floor(tot), "MB")

Now, I'm sure some of the experienced Python programmers here can see
exactly what's wrong. But can everyone? I doubt it. Even if you run
it, I doubt you'd get any better clue. But if you could see which
module everything was imported from, it'd be pretty obvious.
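
For contrast, here is the same calculation with explicit imports (a
sketch, not from the original post). Spelling the modules out makes the
error visible: the version above picks up sys.getsizeof, the in-memory
size of the filename string, when what it wants is os.path.getsize, the
file's size on disk.

import os
import os.path
import sys
import math

# Calculate the total size of all files in a directory, in MB
tot = 0
for path, dirs, files in os.walk(sys.argv[1]):
    for f in files:
        # os.path.getsize() returns the file's size in bytes
        tot += os.path.getsize(os.path.join(path, f)) / 1024.0 / 1024.0

print("Total directory size:", math.floor(tot), "MB")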

ChrisA

Laura Creighton

unread,
Feb 24, 2015, 2:58:21 PM2/24/15
to Laura Creighton, pytho...@python.org, l...@openend.se
Dave Angel
are you another Native English speaker living in a world where ASCII
is enough?

Laura

Dave Angel

unread,
Feb 24, 2015, 3:42:09 PM2/24/15
to pytho...@python.org
I'm a native English speaker, and 7 bits is not nearly enough. Even if
I didn't currently care, I have some history:

No. CDC display code is enough. Who needs lowercase?

No. Baudot code is enough.

No, EBCDIC is good enough. Who cares about other companies.

No, the "golf-ball" only holds this many characters. If we need more,
we can just get the operator to switch balls in the middle of printing.

No. 2 digit years is enough. This world won't last till the millennium
anyway.

No. 2k is all the EPROM you can have. Your code HAS to fit in it, and
only 1.5k RAM.

No. 640k is more than anyone could need.

No, you cannot use a punch card made on a model 26 keypunch in the same
deck as one made on a model 29. Too bad, many of the codes are
different. (This one cost me travel back and forth between two
different locations with different model keypunches)

No. 8 bits is as much as we could ever use for characters. Who could
possibly need names or locations outside of this region? Or from
multiple places within it?

35 years ago I helped design a serial terminal that "spoke" Chinese,
using a two-byte encoding. But a single worldwide standard didn't come
until much later, and I cheered Unicode when it was finally unveiled.

I've worked with many printers that could only print 70 or 80 unique
characters. The laser printer, and even the matrix printer are
relatively recent inventions.

Getting back on topic:

According to:
http://support.esri.com/cn/knowledgebase/techarticles/detail/27345

"""ArcGIS Desktop applications, such as ArcMap, are Unicode based, so
they support Unicode to a certain level. The level of Unicode support
depends on the data format."""

That page was written about 2004, so there was concern even then.

And according to another, """In the header of each shapefile (.DBF), a
reference to a code page is included."""

--
DaveA

Steven D'Aprano

unread,
Feb 24, 2015, 8:19:58 PM2/24/15
to
ASCII was never enough. Not even for Americans, who couldn't write things
like "I bought a comic book for 10¢ yesterday", let alone interesting
things from maths and science.

I missed the whole 7-bit ASCII period, my first computer (Mac 128K) already
had an extended character set beyond ASCII. But even that never covered the
full range of characters I wanted to write, and then there was the horrible
mess that you got whenever you copied text files from a Mac to a DOS or
Windows PC or vice versa. Yes, even in 1984 we were transferring files and
running into encoding issues.



--
Steven

Marcos Almeida Azevedo

unread,
Feb 24, 2015, 11:54:50 PM2/24/15
to Steven D'Aprano, pytho...@python.org
On Wed, Feb 25, 2015 at 9:19 AM, Steven D'Aprano <steve+comp....@pearwood.info> wrote:
Laura Creighton wrote:

> Dave Angel
> are you another Native English speaker living in a world where ASCII
> is enough?

> ASCII was never enough. Not even for Americans, who couldn't write things
> like "I bought a comic book for 10¢ yesterday", let alone interesting
> things from maths and science.


ASCII was a necessity back then because RAM and storage were too small.
 
> I missed the whole 7-bit ASCII period, my first computer (Mac 128K) already
> had an extended character set beyond ASCII. But even that never covered the

I miss the days when I was coding with my XT computer (640kb RAM) too.  Things were so simple back then.
 
> full range of characters I wanted to write, and then there was the horrible
> mess that you got whenever you copied text files from a Mac to a DOS or
> Windows PC or vice versa. Yes, even in 1984 we were transferring files and
> running into encoding issues.



--
Marcos | I love PHP, Linux, and Java

Rustom Mody

unread,
Feb 26, 2015, 7:40:25 AM2/26/15
to
Wrote something up on why we should stop using ASCII:
http://blog.languager.org/2015/02/universal-unicode.html

(Yeah the world is a bit larger than a small bunch of islands off a half-continent.
But this is not that discussion!)

Rustom Mody

unread,
Feb 26, 2015, 8:15:56 AM2/26/15
to
Dave's list above of instances of 'poverty is a good idea' turning out stupid and narrow-minded in hindsight is neat. Thought I'd ack that explicitly.

Chris Angelico

unread,
Feb 26, 2015, 8:24:34 AM2/26/15
to pytho...@python.org
On Thu, Feb 26, 2015 at 11:40 PM, Rustom Mody <rusto...@gmail.com> wrote:
> Wrote something up on why we should stop using ASCII:
> http://blog.languager.org/2015/02/universal-unicode.html

From that post:

"""
5.1 Gibberish

When going from the original 2-byte unicode (around version 3?) to the
one having supplemental planes, the unicode consortium added blocks
such as

* Egyptian hieroglyphs
* Cuneiform
* Shavian
* Deseret
* Mahjong
* Klingon

To me (a layman) it looks unprofessional – as though they are playing
games – that billions of computing devices, each having billions of
storage words should have their storage wasted on blocks such as
these.
"""

The shift from Unicode as a 16-bit code to having multiple planes came
in with Unicode 2.0, but the various blocks were assigned separately:
* Egyptian hieroglyphs: Unicode 5.2
* Cuneiform: Unicode 5.0
* Shavian: Unicode 4.0
* Deseret: Unicode 3.1
* Mahjong Tiles: Unicode 5.1
* Klingon: Not part of any current standard

However, I don't think historians will appreciate you calling all of
these "gibberish". To adequately describe and discuss old texts
without these Unicode blocks, we'd have to either do everything with
images, or craft some kind of reversible transliteration system and
have dedicated software to render the texts on screen. Instead, what
we have is a well-known and standardized system for transliterating
all of these into numbers (code points), and rendering them becomes a
simple matter of installing an appropriate font.

Also, how does assigning meanings to codepoints "waste storage"? As
soon as Unicode 2.0 hit and 16-bit code units stopped being
sufficient, everyone needed to allocate storage - either 32 bits per
character, or some other system - and the fact that some codepoints
were unassigned had absolutely no impact on that. This is decidedly
NOT unprofessional, and it's not wasteful either.
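
A quick check with the unicodedata module makes both points concrete (a
minimal sketch; it only assumes a Python whose bundled Unicode database
includes these blocks, which any recent 3.x does):

import unicodedata

# Historic scripts are ordinary, named code points...
print(unicodedata.name('\U00013000'))   # EGYPTIAN HIEROGLYPH A001
print(unicodedata.name('\U00012000'))   # CUNEIFORM SIGN A

# ...and what a character costs to store depends only on its code point
# value, not on how many other code points happen to be assigned.
for ch in ('A', '\u00e9', '\u20ac', '\U00013000'):
    print(hex(ord(ch)), len(ch.encode('utf-8')), 'bytes in UTF-8')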

ChrisA

Sam Raker

unread,
Feb 26, 2015, 11:46:11 AM2/26/15
to
I'm 100% in favor of expanding Unicode until the sun goes dark. Doing so helps solve the problems affecting speakers of "underserved" languages--access and language preservation. Speakers of Mongolian, Cherokee, Georgian, etc. all deserve to be able to interact with technology in their native languages as much as we speakers of ASCII-friendly languages do. Unicode support also makes writing papers on, dictionaries of, and new texts in such languages much easier, which helps the fight against language extinction, which is a sadly pressing issue.

Also, like, computers are big. Get an external drive for your high-resolution PDF collection of Medieval manuscripts if you feel like you're running out of space. A few extra codepoints aren't going to be the straw that breaks the camel's back.


On Thursday, February 26, 2015 at 8:24:34 AM UTC-5, Chris Angelico wrote:
> On Thu, Feb 26, 2015 at 11:40 PM, Rustom Mody <rusto...@gmail.com> wrote:
> > Wrote something up on why we should stop using ASCII:
> > http://blog.languager.org/2015/02/universal-unicode.html
>
> From that post:
>
> """
> 5.1 Gibberish
>
> When going from the original 2-byte unicode (around version 3?) to the
> one having supplemental planes, the unicode consortium added blocks
> such as
>
> * Egyptian hieroglyphs
> * Cuneiform
> * Shavian
> * Deseret
> * Mahjong
> * Klingon
>
> To me (a layman) it looks unprofessional - as though they are playing
> games - that billions of computing devices, each having billions of

Terry Reedy

unread,
Feb 26, 2015, 12:03:44 PM2/26/15
to pytho...@python.org
On 2/26/2015 8:24 AM, Chris Angelico wrote:
> On Thu, Feb 26, 2015 at 11:40 PM, Rustom Mody <rusto...@gmail.com> wrote:
>> Wrote something up on why we should stop using ASCII:
>> http://blog.languager.org/2015/02/universal-unicode.html

I think that the main point of the post, that many Unicode chars are
truly planetary rather than just national/regional, is excellent.

> From that post:
>
> """
> 5.1 Gibberish
>
> When going from the original 2-byte unicode (around version 3?) to the
> one having supplemental planes, the unicode consortium added blocks
> such as
>
> * Egyptian hieroglyphs
> * Cuneiform
> * Shavian
> * Deseret
> * Mahjong
> * Klingon
>
> To me (a layman) it looks unprofessional – as though they are playing
> games – that billions of computing devices, each having billions of
> storage words should have their storage wasted on blocks such as
> these.
> """
>
> The shift from Unicode as a 16-bit code to having multiple planes came
> in with Unicode 2.0, but the various blocks were assigned separately:
> * Egyptian hieroglyphs: Unicode 5.2
> * Cuneiform: Unicode 5.0
> * Shavian: Unicode 4.0
> * Deseret: Unicode 3.1
> * Mahjong Tiles: Unicode 5.1
> * Klingon: Not part of any current standard

You should add emoticons, but not call them or the above 'gibberish'.
I think that this part of your post is more 'unprofessional' than the
character blocks. It is very jarring and seems contrary to your main point.

> However, I don't think historians will appreciate you calling all of
> these "gibberish". To adequately describe and discuss old texts
> without these Unicode blocks, we'd have to either do everything with
> images, or craft some kind of reversible transliteration system and
> have dedicated software to render the texts on screen. Instead, what
> we have is a well-known and standardized system for transliterating
> all of these into numbers (code points), and rendering them becomes a
> simple matter of installing an appropriate font.
>
> Also, how does assigning meanings to codepoints "waste storage"? As
> soon as Unicode 2.0 hit and 16-bit code units stopped being
> sufficient, everyone needed to allocate storage - either 32 bits per
> character, or some other system - and the fact that some codepoints
> were unassigned had absolutely no impact on that. This is decidedly
> NOT unprofessional, and it's not wasteful either.

I agree.

--
Terry Jan Reedy


Rustom Mody

unread,
Feb 26, 2015, 12:08:37 PM2/26/15
to
On Thursday, February 26, 2015 at 10:16:11 PM UTC+5:30, Sam Raker wrote:
> I'm 100% in favor of expanding Unicode until the sun goes dark. Doing so helps solve the problems affecting speakers of "underserved" languages--access and language preservation. Speakers of Mongolian, Cherokee, Georgian, etc. all deserve to be able to interact with technology in their native languages as much as we speakers of ASCII-friendly languages do. Unicode support also makes writing papers on, dictionaries of, and new texts in such languages much easier, which helps the fight against language extinction, which is a sadly pressing issue.


Agreed -- correcting the inequities caused by ASCII-bias is a good thing.

In fact the whole point of my post was to say just that: by carving out and
focussing on a 'universal' subset of unicode that is considerably larger than
ASCII but smaller than unicode, we stand to reduce ASCII-bias.

As also other posts like
http://blog.languager.org/2014/04/unicoded-python.html
http://blog.languager.org/2014/05/unicode-in-haskell-source.html

However my example listed

> > * Egyptian hieroglyphs
> > * Cuneiform
> > * Shavian
> > * Deseret
> > * Mahjong
> > * Klingon

Ok, Chris has corrected me re. Klingon-in-unicode. So let's drop that.
Of the others, which do you think is in the 'underserved' category?

More generally which of http://en.wikipedia.org/wiki/Plane_%28Unicode%29#Supplementary_Multilingual_Plane
are underserved?

Chris Angelico

unread,
Feb 26, 2015, 12:29:43 PM2/26/15
to pytho...@python.org
On Fri, Feb 27, 2015 at 4:02 AM, Terry Reedy <tjr...@udel.edu> wrote:
> On 2/26/2015 8:24 AM, Chris Angelico wrote:
>>
>> On Thu, Feb 26, 2015 at 11:40 PM, Rustom Mody <rusto...@gmail.com>
>> wrote:
>>>
>>> Wrote something up on why we should stop using ASCII:
>>> http://blog.languager.org/2015/02/universal-unicode.html
>
>
> I think that the main point of the post, that many Unicode chars are truly
> planetary rather than just national/regional, is excellent.

Agreed. Like you, though, I take exception to the "Gibberish" section.

Unicode offers us a number of types of character needed by linguists:

1) Letters[1] common to many languages, such as the unadorned Latin
and Cyrillic letters
2) Letters specific to one or very few languages, such as the Turkish dotless i
3) Diacritical marks, ready to be combined with various letters
4) Precomposed forms of various common "letter with diacritical" combinations
5) Other precomposed forms, eg ligatures and Hangul syllables
6) Symbols, punctuation, and various other marks
7) Spacing of various widths and attributes

Apart from #4 and #5, which could be avoided by using the decomposed
forms everywhere, each of these character types is vital. You can't
typeset a document without being able to adequately represent every
part of it. Then there are additional characters that aren't strictly
necessary, but are extremely convenient, such as the emoticon
sections. You can talk in text and still put in a nice little picture
of a globe, or the monkey-no-evil set, etc.

Most of these characters - in fact, all except #2 and maybe a few of
the diacritical marks - are used in multiple places/languages. Unicode
isn't about taking everyone's separate character sets and numbering
them all so we can reference characters from anywhere; if you wanted
that, you'd be much better off with something that lets you specify a
code page in 16 bits and a character in 8, which is roughly the same
size as Unicode anyway. What we have is, instead, a system that brings
them all together - LATIN SMALL LETTER A is U+0061 no matter whether
it's being used to write English, French, Malaysian, Turkish,
Croatian, Vietnamese, or Icelandic text. Unicode is truly planetary.

ChrisA

[1] I use the word "letter" loosely here; Chinese and Japanese don't
have a concept of letters as such, but their glyphs are still
represented.

Rustom Mody

unread,
Feb 26, 2015, 12:59:24 PM2/26/15
to
On Thursday, February 26, 2015 at 10:33:44 PM UTC+5:30, Terry Reedy wrote:
> On 2/26/2015 8:24 AM, Chris Angelico wrote:
Emoticons (or is it emoji?) seem to have some (regional?) takeup?? Dunno…
In any case I'd like to stay clear of political(izable) questions.


> I think that this part of your post is more 'unprofessional' than the
> character blocks. It is very jarring and seems contrary to your main point.

Ok I need a word for
1. I have no need for this
2. 99.9% of the (living) on this planet also have no need for this

>
> > However, I don't think historians will appreciate you calling all of
> > these "gibberish". To adequately describe and discuss old texts
> > without these Unicode blocks, we'd have to either do everything with
> > images, or craft some kind of reversible transliteration system and
> > have dedicated software to render the texts on screen. Instead, what
> > we have is a well-known and standardized system for transliterating
> > all of these into numbers (code points), and rendering them becomes a
> > simple matter of installing an appropriate font.
> >
> > Also, how does assigning meanings to codepoints "waste storage"? As
> > soon as Unicode 2.0 hit and 16-bit code units stopped being
> > sufficient, everyone needed to allocate storage - either 32 bits per
> > character, or some other system - and the fact that some codepoints
> > were unassigned had absolutely no impact on that. This is decidedly
> > NOT unprofessional, and it's not wasteful either.
>
> I agree.

I clearly am more enthusiastic than knowledgeable about unicode.
But I know my basic CS well enough (as I am sure you and Chris also do)

So I dont get how 4 bytes is not more expensive than 2.
Yeah I know you can squeeze a unicode char into 3 bytes or even 21 bits
You could use a clever representation like UTF-8 or FSR.
But I dont see how you can get out of this that full-unicode costs more than
exclusive BMP.

eg consider the case of 32 vs 64 bit executables.
The 64 bit executable is generally larger than the 32 bit one
Now consider the case of a machine that has say 2GB RAM and a 64-bit processor.
You could -- I think -- make a reasonable case that all those all-zero hi-address-words are 'waste'.

And you've got the general sense best so far:
> I think that the main point of the post, that many Unicode chars are
> truly planetary rather than just national/regional,

If the general tone/tenor of what I have written is not getting
across because of some words (like 'gibberish'?) then I'll try and reword.

However let me try and clarify that the whole of section 5 is 'iffy', with 5.1 being only more extreme. I've not written these up because the point of that
post is not to criticise unicode but to highlight the universal(isable) parts.

Still if I were to expand on the criticisms here are some examples:

Math-Greek: Consider the math-alpha block
http://en.wikipedia.org/wiki/Mathematical_operators_and_symbols_in_Unicode#Mathematical_Alphanumeric_Symbols_block

Now imagine a beginning student not getting the difference between font, glyph,
character. To me this block represents this same error cast into concrete and
dignified by the (supposed) authority of the unicode consortium.

There are probably dozens of other such stupidities like distinguishing kelvin K from latin K as if that is the business of the unicode consortium

My real reservations about unicode come from their work in areas that I happen to know something about

Music: To put music simply as a few mostly-meaningless 'dingbats' like ♩ ♪ ♫ is perhaps ok
However all this stuff http://xahlee.info/comp/unicode_music_symbols.html
makes no sense (to me) given that music (ie standard western music written in staff notation) is inherently 2 dimensional -- multi-voiced, multi-staff, chordal

Sanskrit/Devanagari:
Consists of bogus letters that dont exist in devanagari
The letter ऄ (0904) is found here http://unicode.org/charts/PDF/U0900.pdf
But not here http://en.wikipedia.org/wiki/Devanagari#Vowels
So I call it bogus-devanagari

Contrariwise an important letter in vedic pronunciation the double-udatta is missing
http://list.indology.info/pipermail/indology_list.indology.info/2000-April/021070.html

All of which adds up to the impression that the unicode consortium occasionally fails to do due diligence

In any case, all of the above is contrary to / irrelevant to my post, which is about
identifying the more universal parts of unicode.

Rustom Mody

unread,
Feb 26, 2015, 2:59:38 PM2/26/15
to
On Thursday, February 26, 2015 at 10:33:44 PM UTC+5:30, Terry Reedy wrote:
> You should add emoticons, but not call them or the above 'gibberish'.

Done -- and of course not under gibberish.
I don't really know much about how emoji are used, but I understand they are.
JFTR I consider it necessary to be respectful to all (living) people.
For that matter even dead people(s) - no need to be disrespectful to the Egyptians who created the hieroglyphs or the Sumerians who wrote cuneiform.

I only find it crosses a line when the two-millennia-dead creations are made to take
the space of the living.

Chris wrote:
> * Klingon: Not part of any current standard

Thanks Removed.

wxjm...@gmail.com

unread,
Feb 26, 2015, 3:20:55 PM2/26/15
to
On Thursday, February 26, 2015 at 6:59:24 PM UTC+1, Rustom Mody wrote:
>
> ...To me this block represents this same error cast into concrete and
> dignified by the (supposed) authority of the unicode consortium.
>

Unicode does not prescribe, it registers.

Eg. The inclusion of
U+1E9E, 'LATIN CAPITAL LETTER SHARP S'
has been officially proposed by the "German
Federal Government".
(I have a pdf copy somewhere).

Chris Angelico

unread,
Feb 26, 2015, 5:14:18 PM2/26/15
to pytho...@python.org
On Fri, Feb 27, 2015 at 4:59 AM, Rustom Mody <rusto...@gmail.com> wrote:
> On Thursday, February 26, 2015 at 10:33:44 PM UTC+5:30, Terry Reedy wrote:
>> I think that this part of your post is more 'unprofessional' than the
>> character blocks. It is very jarring and seems contrary to your main point.
>
> Ok I need a word for
> 1. I have no need for this
> 2. 99.9% of the (living) on this planet also have no need for this

So what, seven million people need it? Sounds pretty useful to me. And
your figure is an exaggeration; a lot more people than that use
emoji/emoticons.

>> > Also, how does assigning meanings to codepoints "waste storage"? As
>> > soon as Unicode 2.0 hit and 16-bit code units stopped being
>> > sufficient, everyone needed to allocate storage - either 32 bits per
>> > character, or some other system - and the fact that some codepoints
>> > were unassigned had absolutely no impact on that. This is decidedly
>> > NOT unprofessional, and it's not wasteful either.
>>
>> I agree.
>
> I clearly am more enthusiastic than knowledgeable about unicode.
> But I know my basic CS well enough (as I am sure you and Chris also do)
>
> So I dont get how 4 bytes is not more expensive than 2.
> Yeah I know you can squeeze a unicode char into 3 bytes or even 21 bits
> You could use a clever representation like UTF-8 or FSR.
> But I dont see how you can get out of this that full-unicode costs more than
> exclusive BMP.

Sure, UCS-2 is cheaper than the current Unicode spec. But Unicode 2.0
was when that changed, and the change was because 65536 characters
clearly wouldn't be enough - and that was due to the number of
characters needed for other things than those you're complaining
about. Every spec since then has not changed anything that affects
storage. There are still, today, quite a lot of unallocated blocks of
characters (we're really using only about two planes' worth so far,
maybe three), but even if Unicode specified just two planes of 64K
characters each, you wouldn't be able to save much on transmission
(UTF-8 is already flexible and uses only what you need; if a future
Unicode spec allows 64K planes, UTF-8 transmission will cost exactly
the same for all existing characters), and on an eight-bit-byte
system, the very best you'll be able to do is three bytes - which you
can do today, too; you already know 21 bits will do. So since the BMP
was proven insufficient (back in 1996), no subsequent changes have had
any costs in storage.

> Still if I were to expand on the criticisms here are some examples:
>
> Math-Greek: Consider the math-alpha block
> http://en.wikipedia.org/wiki/Mathematical_operators_and_symbols_in_Unicode#Mathematical_Alphanumeric_Symbols_block
>
> Now imagine a beginning student not getting the difference between font, glyph,
> character. To me this block represents this same error cast into concrete and
> dignified by the (supposed) authority of the unicode consortium.
>
> There are probably dozens of other such stupidities like distinguishing kelvin K from latin K as if that is the business of the unicode consortium

A lot of these kinds of characters come from a need to unambiguously
transliterate text stored in other encodings. I don't personally
profess to understand the reasoning behind the various
indistinguishable characters, but I'm aware that there are a lot of
tricky questions to be decided; and once the Consortium decides to
allocate a character, that character must remain forever allocated.

> My real reservations about unicode come from their work in areas that I happen to know something about
>
> Music: To put music simply as a few mostly-meaningless 'dingbats' like ♩ ♪ ♫ is perhaps ok
> However all this stuff http://xahlee.info/comp/unicode_music_symbols.html
> makes no sense (to me) given that music (ie standard western music written in staff notation) is inherently 2 dimensional -- multi-voiced, multi-staff, chordal

The placement on the page is up to the display library. You can
produce a PDF that places the note symbols at their correct positions,
and requires no images to render sheet music.

> Sanskrit/Devanagari:
> Consists of bogus letters that dont exist in devanagari
> The letter ऄ (0904) is found here http://unicode.org/charts/PDF/U0900.pdf
> But not here http://en.wikipedia.org/wiki/Devanagari#Vowels
> So I call it bogus-devanagari
>
> Contrariwise an important letter in vedic pronunciation the double-udatta is missing
> http://list.indology.info/pipermail/indology_list.indology.info/2000-April/021070.html
>
> All of which adds up to the impression that the unicode consortium occasionally fails to do due diligence

Which proves that they're not perfect. Don't forget, they can always
add more characters later.

ChrisA

Steven D'Aprano

unread,
Feb 26, 2015, 6:09:55 PM2/26/15
to
Chris Angelico wrote:

> Unicode
> isn't about taking everyone's separate character sets and numbering
> them all so we can reference characters from anywhere; if you wanted
> that, you'd be much better off with something that lets you specify a
> code page in 16 bits and a character in 8, which is roughly the same
> size as Unicode anyway.

Well, except for the approximately 25% of people in the world whose native
language has more than 256 characters.

It sounds like you are referring to some sort of "shift code" system. Some
legacy East Asian encodings use a similar scheme, and depending on how they
are implemented they have great disadvantages. For example, Shift-JIS
suffers from a number of weaknesses including that a single byte corrupted
in transmission can cause large swaths of the following text to be
corrupted. With Unicode, a single corrupted byte can only corrupt a single
code point.
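
A minimal sketch of that self-synchronising property, using UTF-8 since
it's the Unicode encoding most people meet first:

# Corrupt one byte in the middle of some UTF-8 text and see how far the
# damage spreads: only the code point containing that byte is lost.
data = bytearray('αβγδε'.encode('utf-8'))
data[4] = 0xFF                                    # clobber one byte
print(bytes(data).decode('utf-8', errors='replace'))
# -> 'αβ��δε': only the clobbered character turns into U+FFFD replacement
#    characters; the neighbouring characters survive intact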


--
Steven

Chris Angelico

unread,
Feb 26, 2015, 6:23:45 PM2/26/15
to pytho...@python.org
On Fri, Feb 27, 2015 at 10:09 AM, Steven D'Aprano
<steve+comp....@pearwood.info> wrote:
> Chris Angelico wrote:
>
>> Unicode
>> isn't about taking everyone's separate character sets and numbering
>> them all so we can reference characters from anywhere; if you wanted
>> that, you'd be much better off with something that lets you specify a
>> code page in 16 bits and a character in 8, which is roughly the same
>> size as Unicode anyway.
>
> Well, except for the approximately 25% of people in the world whose native
> language has more than 256 characters.

You could always allocate multiple code pages to one language. But
since I'm not advocating this system, I'm only guessing at solutions
to its problems.

> It sounds like you are referring to some sort of "shift code" system. Some
> legacy East Asian encodings use a similar scheme, and depending on how they
> are implemented they have great disadvantages. For example, Shift-JIS
> suffers from a number of weaknesses including that a single byte corrupted
> in transmission can cause large swaths of the following text to be
> corrupted. With Unicode, a single corrupted byte can only corrupt a single
> code point.

That's exactly what I was hinting at. There are plenty of systems like
that, and they are badly flawed compared to a simple universal system
for a number of reasons. One is the corruption issue you mention;
another is that a simple memory-based text search becomes utterly
useless (to locate text in a document, you'd need to do a whole lot of
stateful parsing - not to mention the difficulties of doing
"similar-to" searches across languages); concatenation of text also
becomes a stateful operation, and so do all sorts of other simple
manipulations. Unicode may demand a bit more storage in certain
circumstances (where an eight-bit encoding might have handled your
entire document), but it's so much easier for the general case.

ChrisA

Steven D'Aprano

unread,
Feb 26, 2015, 8:05:38 PM2/26/15
to
Rustom Mody wrote:

> Emoticons (or is it emoji) seems to have some (regional?) takeup?? Dunno…
> In any case I'd like to stay clear of political(izable) questions

Emoji is the term used in Japan, gradually spreading to the rest of the
world. Emoticons, I believe, should be restricted to the practice of using
ASCII-only digraphs and trigraphs such as :-) (colon, hyphen, right-parens)
to indicate "smileys".

I believe that emoji will eventually lead to Unicode's victory. People will
want smileys and piles of poo on their mobile phones, and from there it
will gradually spread to everywhere. All they need to do to make victory
inevitable is add cartoon genitals...


>> I think that this part of your post is more 'unprofessional' than the
>> character blocks. It is very jarring and seems contrary to your main
>> point.
>
> Ok I need a word for
> 1. I have no need for this
> 2. 99.9% of the (living) on this planet also have no need for this

0.1% of the living is seven million people. I'll tell you what, you tell me
which seven million people should be relegated to second-class status, and
I'll tell them where you live.

:-)


[...]
> I clearly am more enthusiastic than knowledgeable about unicode.
> But I know my basic CS well enough (as I am sure you and Chris also do)
>
> So I dont get how 4 bytes is not more expensive than 2.

Obviously it is. But it's only twice as expensive, and in computer science
terms that counts as "close enough". It's quite common for data structures
to "waste" space by using "no more than twice as much space as needed",
e.g. Python dicts and lists.

The whole Unicode range U+0000 to U+10FFFF needs only 21 bits, which fits
into three bytes. Nevertheless, there's no three-byte UTF encoding, because
on modern hardware it is more efficient to "waste" an entire extra byte per
code point and deal with an even multiple of bytes.
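
You can see the arithmetic directly from Python (a minimal sketch;
UTF-32 is the fixed-width encoding, UTF-8 the variable-width one):

# The highest code point fits in 21 bits...
print(hex(0x10FFFF), (0x10FFFF).bit_length(), 'bits')   # 21

# ...but the fixed-width encoding rounds that up to 4 bytes per code
# point, while UTF-8 spends 1 to 4 bytes depending on the character.
for ch in ('A', '\u00e9', '\u4e2d', '\U0001F4A9'):
    print(hex(ord(ch)),
          len(ch.encode('utf-32-le')), 'bytes in UTF-32,',
          len(ch.encode('utf-8')), 'bytes in UTF-8')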


> Yeah I know you can squeeze a unicode char into 3 bytes or even 21 bits
> You could use a clever representation like UTF-8 or FSR.
> But I dont see how you can get out of this that full-unicode costs more
> than exclusive BMP.

Are you missing a word there? Costs "no more" perhaps?


> eg consider the case of 32 vs 64 bit executables.
> The 64 bit executable is generally larger than the 32 bit one
> Now consider the case of a machine that has say 2GB RAM and a 64-bit
> processor. You could -- I think -- make a reasonable case that all those
> all-zero hi-address-words are 'waste'.

Sure. The whole point of 64-bit processors is to enable the use of more than
2GB of RAM. One might as well say that using 32-bit processors is wasteful
if you only have 64K of memory. Yes it is, but the only things which use
16-bit or 8-bit processors these days are embedded devices.


[...]
> Math-Greek: Consider the math-alpha block
>
http://en.wikipedia.org/wiki/Mathematical_operators_and_symbols_in_Unicode#Mathematical_Alphanumeric_Symbols_block
>
> Now imagine a beginning student not getting the difference between font,
> glyph,
> character. To me this block represents this same error cast into concrete
> and dignified by the (supposed) authority of the unicode consortium.

Not being privy to the internal deliberations of the Consortium, it is
sometimes difficult to tell why two symbols are sometimes declared to be
mere different glyphs for the same character, and other times declared to
be worthy of being separate characters.

E.g. I think we should all agree that the English "A" and the French "A"
shouldn't count as separate characters, although the Greek "Α" and
Russian "А" do.

In the case of the maths symbols, it isn't obvious to me what the deciding
factors were. I know that one of the considerations they use is to consider
whether or not users of the symbols have a tradition of treating the
symbols as mere different glyphs, i.e. stylistic variations. In this case,
I'm pretty sure that mathematicians would *not* consider:

U+2115 DOUBLE-STRUCK CAPITAL N "ℕ"
U+004E LATIN CAPITAL LETTER N "N"

as mere stylistic variations. If you defined a matrix called ℕ, you would
probably be told off for using the wrong symbol, not for using the wrong
formatting.

On the other hand, I'm not so sure about

U+210E PLANCK CONSTANT "ℎ"

versus a mere lowercase h (possibly in italic).
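
Unicode itself records most of these as compatibility variants: NFKC
normalisation folds them back onto the plain letters. A minimal sketch:

import unicodedata

for ch in ('\u2115', '\u210e', '\U0001d400'):    # ℕ, ℎ, 𝐀
    print(unicodedata.name(ch), '->', unicodedata.normalize('NFKC', ch))
# DOUBLE-STRUCK CAPITAL N -> N
# PLANCK CONSTANT -> h
# MATHEMATICAL BOLD CAPITAL A -> A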


> There are probably dozens of other such stupidities like distinguishing
> kelvin K from latin K as if that is the business of the unicode consortium

But it *is* the business of the Unicode consortium. They have at least two
important aims:

- to be able to represent every possible human-language character;

- to allow lossless round-trip conversion to all existing legacy encodings
(for the subset of Unicode handled by that encoding).


The second reason is why Unicode includes code points for degree-Celsius and
degree-Fahrenheit, rather than just using °C and °F like sane people.
Because some idiot^W code-page designer back in the 1980s or 90s decided to
add single character ℃ and ℉. So now Unicode has to be able to round-trip
(say) "°C℃" without loss.

I imagine that the same applies to U+212A KELVIN SIGN K.
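
The kelvin sign and the degree signs behave the same way under NFKC,
which is consistent with them existing for round-trip fidelity rather
than because anyone thinks they are distinct letters (a small sketch):

import unicodedata

print(unicodedata.name('\u212a'))                        # KELVIN SIGN
print(unicodedata.normalize('NFKC', '\u212a') == 'K')    # True
print(unicodedata.normalize('NFKC', '\u2103'))           # ℃ -> °C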


> My real reservations about unicode come from their work in areas that I
> happen to know something about
>
> Music: To put music simply as a few mostly-meaningless 'dingbats' like ♩ ♪
> ♫ is perhaps ok However all this stuff
> http://xahlee.info/comp/unicode_music_symbols.html
> makes no sense (to me) given that music (ie standard western music written
> in staff notation) is inherently 2 dimensional -- multi-voiced,
> multi-staff, chordal

(1) Text can also be two dimensional.
(2) Where you put the symbol on the page is a separate question from whether
or not the symbol exists.


> Consists of bogus letters that dont exist in devanagari
> The letter ऄ (0904) is found here http://unicode.org/charts/PDF/U0900.pdf
> But not here http://en.wikipedia.org/wiki/Devanagari#Vowels
> So I call it bogus-devanagari

Hmm, well I love Wikipedia as much as the next guy, but I think that even
Jimmy Wales would suggest that Wikipedia is not a primary source for what
counts as Devanagari vowels. What makes you think that Wikipedia is right
and Unicode is wrong?

That's not to say that Unicode hasn't made some mistakes. There are a few
deprecated code points, or code points that have been given the wrong name.
Oops. Mistakes happen.


> Contrariwise an important letter in vedic pronunciation the double-udatta
> is missing
>
http://list.indology.info/pipermail/indology_list.indology.info/2000-April/021070.html

I quote:


I do not see any need for a "double udaatta". Perhaps "double
ANudaatta" is meant here?


I don't know Sanskrit, but if somebody suggested that Unicode doesn't
support English because the important letter "double-oh" (as
in "moon", "spoon", "croon" etc.) was missing, I wouldn't be terribly
impressed. We have a "double-u" letter, why not "double-oh"?


Another quote:

I should strongly recommend not to hurry with a standardization
proposal until the text collection of Vedic texts has been finished


In other words, even the experts in Vedic texts don't yet know all the
characters which they may or may not need.



--
Steven

Dave Angel

unread,
Feb 26, 2015, 8:58:13 PM2/26/15
to pytho...@python.org
On 02/26/2015 08:05 PM, Steven D'Aprano wrote:
> Rustom Mody wrote:
>

>
>> eg consider the case of 32 vs 64 bit executables.
>> The 64 bit executable is generally larger than the 32 bit one
>> Now consider the case of a machine that has say 2GB RAM and a 64-bit
>> processor. You could -- I think -- make a reasonable case that all those
>> all-zero hi-address-words are 'waste'.
>
> Sure. The whole point of 64-bit processors is to enable the use of more than
> 2GB of RAM. One might as well say that using 32-bit processors is wasteful
> if you only have 64K of memory. Yes it is, but the only things which use
> 16-bit or 8-bit processors these days are embedded devices.

But having only 2GB means electrical address lines out of the CPU are wasted,
not address space. A 64-bit processor and 64-bit OS mean you can have
more than 4GB in a process space, even if over half of it has to be in
the swap file. Linear versus physical makes a big difference.

(Although I believe Seymour Cray was quoted as saying that virtual
memory is a crock, because "you can't fake what you ain't got.")




--
DaveA

Steven D'Aprano

unread,
Feb 27, 2015, 12:58:56 AM2/27/15
to
Dave Angel wrote:

> (Although I believe Seymour Cray was quoted as saying that virtual
> memory is a crock, because "you can't fake what you ain't got.")

If I recall correctly, disk access is about 10000 times slower than RAM, so
virtual memory is *at least* that much slower than real memory.



--
Steven

Dave Angel

unread,
Feb 27, 2015, 2:31:10 AM2/27/15
to pytho...@python.org
It's so much more complicated than that, that I hardly know where to
start. I'll describe a generic processor/OS/memory/disk architecture;
there will be huge differences between processor models even from a
single manufacturer.

First, as soon as you add swapping logic to your
processor/memory-system, you theoretically slow it down. And in the
days of that quote, Cray's memory was maybe 50 times as fast as the
memory used by us mortals. So adding swapping logic would have slowed
it down quite substantially, even when it was not swapping. But that
logic is inside the CPU chip these days, and presumably thoroughly
optimized.

Next, statistically, a program uses a small subset of its total program
& data space in its working set, and the working set should reside in
real memory. But when the program greatly increases that working set,
and it approaches the amount of physical memory, then swapping becomes
more frenzied, and we say the program is thrashing. Simple example, try
sorting an array that's about the size of available physical memory.

Next, even physical memory is divided into a few levels of caching, some
on-chip and some off. And the caching is done in what I call strips,
where accessing just one byte causes the whole strip to be loaded from
non-cached memory. I forget the current size for that, but it's maybe
64 to 256 bytes or so.

If there are multiple processors (not multicore, but actual separate
processors), then each one has such internal caches, and any writes on
one processor may have to trigger flushes of all the other processors
that happen to have the same strip loaded.

The processor not only prefetches the next few instructions, but decodes
and tentatively executes them, subject to being discarded if a
conditional branch doesn't go the way the processor predicted. So some
instructions execute in zero time, some of the time.

Every address of instruction fetch, or of data fetch or store, goes
through a couple of layers of translation. Segment register plus offset
gives linear address. Lookup those in tables to get physical address,
and if table happens not to be in on-chip cache, swap it in. If
physical address isn't valid, a processor exception causes the OS to
potentially swap something out, and something else in.

Once we're paging from the swapfile, the size of the read is perhaps 4k.
And that read is regardless of whether we're only going to use one
byte or all of it.

The ratio between an access which was in the L1 cache and one which
required a page to be swapped in from disk? Much bigger than your
10,000 figure. But hopefully it doesn't happen a big percentage of the
time.

Many, many other variables, like the fact that RAM chips are not
directly addressable by bytes, but instead count on rows and columns.
So if you access many bytes in the same row, it can be much quicker than
random access. So simple access time specifications don't mean as much
as it would seem; the controller has to balance the RAM spec with the
various cache requirements.
--
DaveA

wxjm...@gmail.com

unread,
Feb 27, 2015, 4:07:20 AM2/27/15
to
On Friday, February 27, 2015 at 2:05:38 AM UTC+1, Steven D'Aprano wrote:
>
> E.g. I think we should all agree that the English "A" and the French "A"
> shouldn't count as separate characters, although the Greek "Α" and
> Russian "А" do.
>


Yes. Simple and logical explanation.
Unicode does not handle languages per se, it encodes scripts
for languages.

Steven D'Aprano

unread,
Feb 27, 2015, 6:55:10 AM2/27/15
to
Dave Angel wrote:

> On 02/27/2015 12:58 AM, Steven D'Aprano wrote:
>> Dave Angel wrote:
>>
>>> (Although I believe Seymour Cray was quoted as saying that virtual
>>> memory is a crock, because "you can't fake what you ain't got.")
>>
>> If I recall correctly, disk access is about 10000 times slower than RAM,
>> so virtual memory is *at least* that much slower than real memory.
>>
>
> It's so much more complicated than that, that I hardly know where to
> start.

[snip technical details]

As interesting as they were, none of those details will make swap faster,
hence my comment that virtual memory is *at least* 10000 times slower than
RAM.




--
Steven

Dave Angel

unread,
Feb 27, 2015, 9:03:21 AM2/27/15
to pytho...@python.org
The term "virtual memory" is used for many aspects of the modern memory
architecture. But I presume you're using it in the sense of "running in
a swapfile" as opposed to running in physical RAM.

Yes, a page fault takes on the order of 10,000 times as long as an
access to a location in L1 cache. I suspect it's a lot smaller though
if the swapfile is on an SSD drive. The first byte is that slow.

But once the fault is resolved, the nearby bytes are in physical memory,
and some of them are in L3, L2, and L1. So you're not running in the
swapfile any more. And even when you run off the end of the page,
fetching the sequentially adjacent page from a hard disk is much faster.
And if the disk has well designed buffering, faster yet. The OS tries
pretty hard to keep the swapfile unfragmented.

The trick is to minimize the number of page faults, especially to random
locations. If you're getting lots of them, it's called thrashing.

There are tools to help with that. To minimize page faults on code,
linking with a good working-set-tuner can help, though I don't hear of
people bothering these days. To minimize page faults on data, choosing
one's algorithm carefully can help. For example, in scanning through a
typical matrix, row order might be adjacent locations, while column
order might be scattered.
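
A rough way to see the effect from Python (a sketch only; the absolute
numbers depend entirely on the machine and the interpreter, but on a
large enough list-of-lists the row-order scan is typically the faster
of the two):

import time

N = 2000
matrix = [[0] * N for _ in range(N)]

def sum_rows(m):
    # visit elements in the order the rows are laid out
    return sum(m[i][j] for i in range(N) for j in range(N))

def sum_cols(m):
    # same elements, but hopping to a different row on every access
    return sum(m[i][j] for j in range(N) for i in range(N))

for f in (sum_rows, sum_cols):
    start = time.perf_counter()
    f(matrix)
    print(f.__name__, time.perf_counter() - start, 'seconds')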

Not really much different than reading a text file. If you can arrange
to process it a line at a time, rather than reading the whole file into
memory, you generally minimize your round-trips to disk. And if you
need to randomly access it, it's quite likely more efficient to memory
map it, in which case it temporarily becomes part of the swapfile system.

--
DaveA

Chris Angelico

unread,
Feb 27, 2015, 9:22:42 AM2/27/15
to pytho...@python.org
On Sat, Feb 28, 2015 at 1:02 AM, Dave Angel <da...@davea.name> wrote:
> The term "virtual memory" is used for many aspects of the modern memory
> architecture. But I presume you're using it in the sense of "running in a
> swapfile" as opposed to running in physical RAM.

Given that this started with a quote about "you can't fake what you
ain't got", I would say that, yes, this refers to using hard disk to
provide more RAM.

If you're trying to use the pagefile/swapfile as if it's more memory
("I have 256MB of memory, but 10GB of swap space, so that's 10GB of
memory!"), then yes, these performance considerations are huge. But
suppose you need to run a program that's larger than your available
RAM. On MS-DOS, sometimes you'd need to work with program overlays (a
concept borrowed from older systems, but ones that I never worked on,
so I'm going back no further than DOS here). You get a *massive*
complexity hit the instant you start using them, whether your program
would have been able to fit into memory on some systems or not. Just
making it possible to have only part of your code in memory places
demands on your code that you, the programmer, have to think about.
With virtual memory, though, you just write your code as if it's all
in memory, and some of it may, at some times, be on disk. Less code to
debug = less time spent debugging. The performance question is largely
immaterial (you'll be using the disk either way), but the savings on
complexity are tremendous. And then when you do find yourself running
on a system with enough RAM? No code changes needed, and full
performance. That's where virtual memory shines.

It's funny how the world changes, though. Back in the 90s, virtual
memory was the key. No home computer ever had enough RAM. Today? A
home-grade PC could easily have 16GB... and chances are you don't need
all of that. So we go for the opposite optimization: disk caching.
Apart from when I rebuild my "Audio-Only Frozen" project [1] and the
caches get completely blasted through, heaps and heaps of my work can
be done inside the disk cache. Hey, Sikorsky, got any files anywhere
on the hard disk matching *Pastel*.iso case insensitively? *chug chug
chug* Nope. Okay. Sikorsky, got any files matching *Pas5*.iso case
insensitively? *zip* Yeah, here it is. I didn't tell the first search
to hold all that file system data in memory; the hard drive controller
managed it all for me, and I got the performance benefit. Same as the
above: the main benefit is that this sort of thing requires zero
application code complexity. It's all done in a perfectly generic way
at a lower level.

ChrisA

Dave Angel

unread,
Feb 27, 2015, 10:25:39 AM2/27/15
to pytho...@python.org
In 1973, I did manual swapping to an external 8k ramdisk. It was a box
that sat on the floor and contained 8k of core memory (not
semiconductor). The memory was non-volatile, so it contained the
working copy of my code. Then I built a small swapper that would bring
in the set of routines currently needed. My onboard RAM (semiconductor)
was 1.5k, which had to hold the swapper, the code, and the data. I was
writing a GPS system for shipboard use, and the final version of the
code had to fit entirely in EPROM, 2k of it. But debugging EPROM code
is a pain, since every small change took half an hour to make new chips.

Later, I built my first PC with 512k of RAM, and usually used much of it
as a ramdisk, since programs didn't use nearly that amount.


--
DaveA

alister

unread,
Feb 27, 2015, 11:01:04 AM2/27/15
to
On Sat, 28 Feb 2015 01:22:15 +1100, Chris Angelico wrote:

>
> If you're trying to use the pagefile/swapfile as if it's more memory ("I
> have 256MB of memory, but 10GB of swap space, so that's 10GB of
> memory!"), then yes, these performance considerations are huge. But
> suppose you need to run a program that's larger than your available RAM.
> On MS-DOS, sometimes you'd need to work with program overlays (a concept
> borrowed from older systems, but ones that I never worked on, so I'm
> going back no further than DOS here). You get a *massive* complexity hit
> the instant you start using them, whether your program would have been
> able to fit into memory on some systems or not. Just making it possible
> to have only part of your code in memory places demands on your code
> that you, the programmer, have to think about. With virtual memory,
> though, you just write your code as if it's all in memory, and some of
> it may, at some times, be on disk. Less code to debug = less time spent
> debugging. The performance question is largely immaterial (you'll be
> using the disk either way), but the savings on complexity are
> tremendous. And then when you do find yourself running on a system with
> enough RAM? No code changes needed, and full performance. That's where
> virtual memory shines.
> ChrisA

I think there is a case for bringing back the overlay file, or at least
loading larger programs in sections.
Only loading the routines as they are required could speed up the start
time of many large applications.
For example libre office: I rarely need the mail merge function, the word
count and many other features; those could be added into the running
application on demand rather than all at once.

obviously with large memory & virtual mem there is no need to un-install
them once loaded.



--
Ralph's Observation:
It is a mistake to let any mechanical object realise that you
are in a hurry.

Chris Angelico

unread,
Feb 27, 2015, 11:12:57 AM2/27/15
to pytho...@python.org
On Sat, Feb 28, 2015 at 3:00 AM, alister
<alister.n...@ntlworld.com> wrote:
> I think there is a case for bringing back the overlay file, or at least
> loading larger programs in sections
> only loading the routines as they are required could speed up the start
> time of many large applications.
> examples libre office, I rarely need the mail merge function, the word
> count and may other features that could be added into the running
> application on demand rather than all at once.

Downside of that is twofold: firstly the complexity that I already
mentioned, and secondly you pay the startup cost on first usage. So
you might get into the program a bit faster, but as soon as you go to
any feature you didn't already hit this session, the program pauses
for a bit and loads it. Sometimes startup cost is the best time to do
this sort of thing.

Of course, there is an easy way to implement exactly what you're
asking for: use separate programs for everything, instead of expecting
a megantic office suite[1] to do everything for you. Just get yourself
a nice simple text editor, then invoke other programs - maybe from a
terminal, or maybe from within the editor - to do the rest of the
work. A simple disk cache will mean that previously-used programs
start up quickly.

ChrisA

[1] It's slightly less bloated than the gigantic office suite sold by
a top-end software company.

alister

unread,
Feb 27, 2015, 11:46:01 AM2/27/15
to
On Sat, 28 Feb 2015 03:12:16 +1100, Chris Angelico wrote:

> On Sat, Feb 28, 2015 at 3:00 AM, alister
> <alister.n...@ntlworld.com> wrote:
>> I think there is a case for bringing back the overlay file, or at least
>> loading larger programs in sections only loading the routines as they
>> are required could speed up the start time of many large applications.
>> examples libre office, I rarely need the mail merge function, the word
>> count and may other features that could be added into the running
>> application on demand rather than all at once.
>
> Downside of that is twofold: firstly the complexity that I already
> mentioned, and secondly you pay the startup cost on first usage. So you
> might get into the program a bit faster, but as soon as you go to any
> feature you didn't already hit this session, the program pauses for a
> bit and loads it. Sometimes startup cost is the best time to do this
> sort of thing.
>
If the modules are small enough this may not be noticeable, but yes, I do
accept there may be delays on first usage.

As to the complexity, it has been my observation that as the memory
footprint available to programmers has increased, they have become less and
less skilled at writing code.

Of course, my time as a professional programmer was over 20 years ago on
8-bit microcontrollers with 8k of ROM (eventually; originally I only had 2k
to play with) and 128 bytes (yes, bytes!) of RAM, so I am very out of date.

I now play with Python because it is so much less demanding of me, which
probably makes me just as guilty :-)

> Of course, there is an easy way to implement exactly what you're asking
> for: use separate programs for everything, instead of expecting a
> megantic office suite[1] to do everything for you. Just get yourself a
> nice simple text editor, then invoke other programs - maybe from a
> terminal, or maybe from within the editor - to do the rest of the work.
> A simple disk cache will mean that previously-used programs start up
> quickly.
LibreOffice was cited as just one example.
Video editing suites are another that could be used as an example
(perhaps more so: does the rendering engine need to be loaded until you
start generating the output? A small delay here would be insignificant.)
>
> ChrisA
>
> [1] It's slightly less bloated than the gigantic office suite sold by a
> top-end software company.





--
You don't sew with a fork, so I see no reason to eat with knitting
needles.
-- Miss Piggy, on eating Chinese Food

Chris Angelico

unread,
Feb 27, 2015, 12:46:20 PM2/27/15
to pytho...@python.org
On Sat, Feb 28, 2015 at 3:45 AM, alister
<alister.n...@ntlworld.com> wrote:
> On Sat, 28 Feb 2015 03:12:16 +1100, Chris Angelico wrote:
>
>> On Sat, Feb 28, 2015 at 3:00 AM, alister
>> <alister.n...@ntlworld.com> wrote:
>>> I think there is a case for bringing back the overlay file, or at least
>>> loading larger programs in sections only loading the routines as they
>>> are required could speed up the start time of many large applications.
>>> examples libre office, I rarely need the mail merge function, the word
>>> count and may other features that could be added into the running
>>> application on demand rather than all at once.
>>
>> Downside of that is twofold: firstly the complexity that I already
>> mentioned, and secondly you pay the startup cost on first usage. So you
>> might get into the program a bit faster, but as soon as you go to any
>> feature you didn't already hit this session, the program pauses for a
>> bit and loads it. Sometimes startup cost is the best time to do this
>> sort of thing.
>>
> If the modules are small enough this may not be noticeable but yes I do
> accept there may be delays on first usage.
>
> As to the complexity it has been my observation that as the memory
> footprint available to programmers has increase they have become less &
> less skilled at writing code.

Perhaps, but on the other hand, the skill of squeezing code into less
memory is being replaced by other skills. We can write code that takes
the simple/dumb approach, let it use an entire megabyte of memory, and
not care about the cost... and we can write that in an hour, instead
of spending a week fiddling with it. Reducing the development cycle
time means we can add all sorts of cool features to a program, all
while the original end user is still excited about it. (Of course, a
comparison between today's World Wide Web and that of the 1990s
suggests that these cool features aren't necessarily beneficial, but
still, we have the option of foregoing austerity.)

> Video editing suites are another that could be used as an example
> (perhaps more so, does the rendering engine need to be loaded until you
> start generating the output? a small delay here would be insignificant)

Hmm, I'm not sure that's actually a big deal, because your *data* will
dwarf the code. I can fire up sox and avconv, both fairly large
programs, and their code will all sit comfortably in memory; but then
they get to work on my data, and suddenly my hard disk is chewing
through 91GB of content. Breaking up avconv into a dozen pieces
wouldn't make a dent in 91GB.

ChrisA

Grant Edwards

unread,
Feb 27, 2015, 12:46:23 PM2/27/15
to
Nonsense. On all of my machines, virtual memory _is_ RAM almost all
of the time. I don't do the type of things that force the usage of
swap.

--
Grant Edwards grant.b.edwards Yow! ... I want FORTY-TWO
at TRYNEL FLOATATION SYSTEMS
gmail.com installed within SIX AND A
HALF HOURS!!!

Grant Edwards

unread,
Feb 27, 2015, 12:47:22 PM2/27/15
to
On 2015-02-27, Grant Edwards <inv...@invalid.invalid> wrote:
> On 2015-02-27, Steven D'Aprano <steve+comp....@pearwood.info> wrote:
>> Dave Angel wrote:
>>> On 02/27/2015 12:58 AM, Steven D'Aprano wrote:
>>>> Dave Angel wrote:
>>>>
>>>>> (Although I believe Seymour Cray was quoted as saying that virtual
>>>>> memory is a crock, because "you can't fake what you ain't got.")
>>>>
>>>> If I recall correctly, disk access is about 10000 times slower than RAM,
>>>> so virtual memory is *at least* that much slower than real memory.
>>>>
>>> It's so much more complicated than that, that I hardly know where to
>>> start.
>>
>> [snip technical details]
>>
>> As interesting as they were, none of those details will make swap faster,
>> hence my comment that virtual memory is *at least* 10000 times slower than
>> RAM.
>
> Nonsense. On all of my machines, virtual memory _is_ RAM almost all
> of the time. I don't do the type of things that force the usage of
> swap.

And on some of the embedded systems I work on, _all_ virtual memory is
RAM 100.000% of the time.

--
Grant Edwards grant.b.edwards Yow! Don't SANFORIZE me!!
at
gmail.com

MRAB

unread,
Feb 27, 2015, 2:14:54 PM2/27/15
to pytho...@python.org
On 2015-02-27 16:45, alister wrote:
> On Sat, 28 Feb 2015 03:12:16 +1100, Chris Angelico wrote:
>
>> On Sat, Feb 28, 2015 at 3:00 AM, alister
>> <alister.n...@ntlworld.com> wrote:
>>> I think there is a case for bringing back the overlay file, or at least
>>> loading larger programs in sections only loading the routines as they
>>> are required could speed up the start time of many large applications.
>>> examples libre office, I rarely need the mail merge function, the word
>>> count and may other features that could be added into the running
>>> application on demand rather than all at once.
>>
>> Downside of that is twofold: firstly the complexity that I already
>> mentioned, and secondly you pay the startup cost on first usage. So you
>> might get into the program a bit faster, but as soon as you go to any
>> feature you didn't already hit this session, the program pauses for a
>> bit and loads it. Sometimes startup cost is the best time to do this
>> sort of thing.
>>
> If the modules are small enough this may not be noticeable but yes I do
> accept there may be delays on first usage.
>
I suppose you could load the basic parts first so that the user can
start working, and then load the additional features in the background.
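
A minimal sketch of that idea in Python -- assuming the extra features live
in ordinary modules, and using stand-in module names:

import importlib
import threading

def preload_in_background(module_names):
    # Import the named modules on a daemon thread so the user can start
    # working immediately; the results land in sys.modules for later use.
    def worker():
        for name in module_names:
            importlib.import_module(name)
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread

preload_in_background(["json", "difflib", "email"])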

blue

unread,
Feb 27, 2015, 3:13:33 PM2/27/15
to
On Wednesday, February 11, 2015 at 1:38:12 AM UTC+2, vlya...@gmail.com wrote:
> I defined function Fatalln in "mydef.py" and it works fine if i call it from "mydef.py", but when i try to call it from "test.py" in the same folder:
> import mydef
> ...
> Fatalln "my test"
> i have NameError: name 'Fatalln' is not defined
> I also tried include('mydef.py') with the same result...
> What is the right syntax?
> Thanks

...try setting your Python source encoding to UTF-8,
and read the FAQ or the Python manual.

Dave Angel

unread,
Feb 27, 2015, 3:52:45 PM2/27/15
to pytho...@python.org
I can't say how Linux handles it (I'd like to know, but haven't needed
to yet), but in Windows (NT, XP, etc), a DLL is not "loaded", but rather
mapped. And it's not copied into the swapfile, it's mapped directly
from the DLL. The mapping mode is "copy-on-write" which means that
read-only portions are swapped directly from the DLL, on first usage,
while read-write portions (e.g. static/global variables, relocation
modifications) are copied on first use to the swap file. I presume
EXE's are done the same way, but never had a need to know.

If that's the case on the architectures you're talking about, then the
problem of slow loading is not triggered by the memory usage, but by
lots of initialization code. THAT's what should be deferred for
seldom-used portions of code.

The main point of a working-set-tuner is to group sections of code
together that are likely to be used together. To take an extreme case,
all the fatal exception handlers should be positioned adjacent to each
other in linear memory, as it's unlikely that any of them will be
needed, and the code takes up no time or space in physical memory.

Also (in Windows), a DLL can be pre-relocated, so that it has a
preferred address to be loaded into memory. If that memory is available
when it gets loaded (actually mapped), then no relocation needs to
happen, which saves time and swap space.

In the X86 architecture, most code is self-relocating, everything is
relative. But references to other DLL's and jump tables were absolute,
so they needed to be relocated at load time, when final locations were
nailed down.

Perhaps the authors of bloated applications have forgotten how to do
this, as the defaults in the linker put all DLLs in the same
location, meaning all but the first will need relocating. But system
DLLs are (were) each given unique addresses.

On one large project, I added the build step of assigning these base
addresses. Each DLL had to start on a 64k boundary, and I reserved some
fractional extra space between them in case one would grow. Then every
few months, we double-checked that they didn't overlap, and if necessary
adjusted the start addresses. We didn't just automatically assign
closest addresses, because frequently some of the DLL's would be updated
independently of the others.
--
DaveA

Chris Angelico

unread,
Feb 27, 2015, 4:05:12 PM2/27/15
to pytho...@python.org
On Sat, Feb 28, 2015 at 7:52 AM, Dave Angel <da...@davea.name> wrote:
> If that's the case on the architectures you're talking about, then the
> problem of slow loading is not triggered by the memory usage, but by lots of
> initialization code. THAT's what should be deferred for seldom-used
> portions of code.

s/should/can/

It's still not a clear case of "should", as it's all a big pile of
trade-offs. A few weeks ago I made a very deliberate change to a
process to force some code to get loaded and initialized earlier, to
prevent an unexpected (and thus surprising) slowdown on first use. (It
was, in fact, a Python 'import' statement, so all I had to do was add
a dummy import in the main module - with, of course, a comment making
it clear that this was necessary, even though the name wasn't used.)
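
In Python terms, the change amounted to something like this (using
'decimal' purely as a stand-in for whichever module was actually involved):

# At the top of the main module: a deliberate, otherwise-unused import.
# Pulling the module in at startup moves its load/initialise cost away
# from the first time the feature is actually used.
import decimal  # noqa: F401 -- intentionally unused here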

But yes, seldom-used code can definitely have its initialization
deferred if you need to speed up startup.

ChrisA

alister

unread,
Feb 27, 2015, 5:09:56 PM2/27/15
to
On Fri, 27 Feb 2015 19:14:00 +0000, MRAB wrote:

>>
> I suppose you could load the basic parts first so that the user can
> start working, and then load the additional features in the background.
>
Quite possibly; my opinion on this is very fluid.
It may work for some applications; it probably wouldn't for others.

With Python it is generally considered good practice to import all
modules at the start of a program, but there are valid cases for only
importing a module if it is actually needed.
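
For example, a rarely used feature can defer its import until first use
(json here is just a stand-in for some heavyweight module):

def export_report(data, path):
    # Deferred import: the module is only loaded the first time this
    # feature is used; subsequent calls find it already in sys.modules.
    import json
    with open(path, "w") as f:
        json.dump(data, f, indent=2)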




--
Some people have parts that are so private they themselves have no
knowledge of them.

alister

unread,
Feb 27, 2015, 5:13:23 PM2/27/15
to
On Sat, 28 Feb 2015 04:45:04 +1100, Chris Angelico wrote:
> Perhaps, but on the other hand, the skill of squeezing code into less
> memory is being replaced by other skills. We can write code that takes
> the simple/dumb approach, let it use an entire megabyte of memory, and
> not care about the cost... and we can write that in an hour, instead of
> spending a week fiddling with it. Reducing the development cycle time
> means we can add all sorts of cool features to a program, all while the
> original end user is still excited about it. (Of course, a comparison
> between today's World Wide Web and that of the 1990s suggests that these
> cool features aren't necessarily beneficial, but still, we have the
> option of foregoing austerity.)
>
>
> ChrisA

Again, I am fluid on this. 'Clever' programming is often counterproductive
and unmaintainable, but then again lazy programming can be just as bad
for this; there is no "one size fits all" solution, but the modern
environment does make lazy programming very easy.



--
After all, all he did was string together a lot of old, well-known
quotations.
-- H.L. Mencken, on Shakespeare

Rustom Mody

unread,
Mar 3, 2015, 1:04:06 PM3/3/15
to
On Thursday, February 26, 2015 at 10:33:44 PM UTC+5:30, Terry Reedy wrote:
> On 2/26/2015 8:24 AM, Chris Angelico wrote:
> > On Thu, Feb 26, 2015 at 11:40 PM, Rustom Mody wrote:
> >> Wrote something up on why we should stop using ASCII:
> >> http://blog.languager.org/2015/02/universal-unicode.html
>
> I think that the main point of the post, that many Unicode chars are
> truly planetary rather than just national/regional, is excellent.

<snipped>

> You should add emoticons, but not call them or the above 'gibberish'.
> I think that this part of your post is more 'unprofessional' than the
> character blocks. It is very jarring and seems contrary to your main point.

Ok Done

References to gibberish removed from
http://blog.languager.org/2015/02/universal-unicode.html

What I was trying to say expanded here
http://blog.languager.org/2015/03/whimsical-unicode.html
[Hope the word 'whimsical' is less jarring and more accurate than 'gibberish']

wxjm...@gmail.com

unread,
Mar 3, 2015, 1:37:06 PM3/3/15
to
========

Emoji and Dingbats are now part of Unicode.
They should be treated just like a "1", an "a",
or a "mathematical alpha".
So there is nothing special to say about them.

jmf

Chris Angelico

unread,
Mar 3, 2015, 1:44:11 PM3/3/15
to pytho...@python.org
On Wed, Mar 4, 2015 at 5:03 AM, Rustom Mody <rusto...@gmail.com> wrote:
> What I was trying to say expanded here
> http://blog.languager.org/2015/03/whimsical-unicode.html
> [Hope the word 'whimsical' is less jarring and more accurate than 'gibberish']

Re footnote #4: ½ is a single character for compatibility reasons.
⅟₁₀₀ doesn't need to be a single character, because there are
countably infinite vulgar fractions and only 0x110000 Unicode
characters.
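
A quick check with Python's unicodedata module shows the compatibility
nature of that character:

import unicodedata

half = "\u00bd"                         # ½
print(unicodedata.name(half))           # VULGAR FRACTION ONE HALF
print(unicodedata.numeric(half))        # 0.5
print(unicodedata.decomposition(half))  # <fraction> 0031 2044 0032, i.e. 1⁄2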

ChrisA

Terry Reedy

unread,
Mar 3, 2015, 6:30:54 PM3/3/15
to pytho...@python.org
I agree with both.

--
Terry Jan Reedy

Rustom Mody

unread,
Mar 3, 2015, 9:53:52 PM3/3/15
to
On Wednesday, March 4, 2015 at 12:14:11 AM UTC+5:30, Chris Angelico wrote:
> On Wed, Mar 4, 2015 at 5:03 AM, Rustom Mody wrote:
> > What I was trying to say expanded here
> > http://blog.languager.org/2015/03/whimsical-unicode.html
> > [Hope the word 'whimsical' is less jarring and more accurate than 'gibberish']
>
> Re footnote #4: ½ is a single character for compatibility reasons.
> ⅟₁₀₀ ...
^^^

Neat, thanks.
[And I figured out some of the quopri module along the way.]

Steven D'Aprano

unread,
Mar 3, 2015, 9:54:40 PM3/3/15
to
Rustom Mody wrote:

> On Thursday, February 26, 2015 at 10:33:44 PM UTC+5:30, Terry Reedy wrote:
>> On 2/26/2015 8:24 AM, Chris Angelico wrote:
>> > On Thu, Feb 26, 2015 at 11:40 PM, Rustom Mody wrote:
>> >> Wrote something up on why we should stop using ASCII:
>> >> http://blog.languager.org/2015/02/universal-unicode.html
>>
>> I think that the main point of the post, that many Unicode chars are
>> truly planetary rather than just national/regional, is excellent.
>
> <snipped>
>
>> You should add emoticons, but not call them or the above 'gibberish'.
>> I think that this part of your post is more 'unprofessional' than the
>> character blocks. It is very jarring and seems contrary to your main
>> point.
>
> Ok Done
>
> References to gibberish removed from
> http://blog.languager.org/2015/02/universal-unicode.html

I consider it unethical to make semantic changes to a published work in
place without acknowledgement. Fixing minor typos or spelling errors, or
dead links, is okay. But any edit that changes the meaning should be
commented on, either by an explicit note on the page itself, or by striking
out the previous content and inserting the new.

As for the content of the essay, it is currently rather unfocused. It
appears to be more of a list of "here are some Unicode characters I think
are interesting, divided into subgroups, oh and here are some I personally
don't have any use for, which makes them silly" than any sort of discussion
about the universality of Unicode. That makes it rather idiosyncratic and
parochial. Why should obscure maths symbols be given more importance than
obscure historical languages?

I think that the universality of Unicode could be explained in a single
sentence:

"It is the aim of Unicode to be the one character set anyone needs to
represent every character, ideogram or symbol (but not necessarily distinct
glyph) from any existing or historical human language."

I can expand on that, but in a nutshell that is it.


You state:

"APL and Z Notation are two notable languages APL is a programming language
and Z a specification language that did not tie themselves down to a
restricted charset ..."


but I don't think that is correct. I'm pretty sure that neither APL nor Z
allowed you to define new characters. They might not have used ASCII alone,
but they still had a restricted character set. It was merely less
restricted than ASCII.

You make a comment about Cobol's relative unpopularity, but (1) Cobol
doesn't require you to write out numbers as English words, and (2) Cobol is
still used, there are uncounted billions of lines of Cobol code being used,
and if the number of Cobol programmers is less now than it was 16 years
ago, there are still a lot of them. Academics and FOSS programmers don't
think much of Cobol, but it has to count as one of the most amazing success
stories in the field of programming languages, despite its lousy design.

You list ideographs such as Cuneiform under "Icons". They are not icons.
They are a mixture of symbols used for consonants, syllables, and
logophonetic, consonantal alphabetic and syllabic signs. That sits them
firmly in the same categories as modern languages with consonants, ideogram
languages like Chinese, and syllabary languages like Cheyenne.

Just because native readers of Cuneiform are all dead doesn't make Cuneiform
unimportant. There are probably more people who need to write Cuneiform
than people who need to write APL source code.

You make a comment:

"To me – a unicode-layman – it looks unprofessional… Billions of computing
devices world over, each having billions of storage words having their
storage wasted on blocks such as these??"

But that is nonsense, and it contradicts your earlier quoting of Dave Angel.
Why are you so worried about an (illusionary) minor optimization?

Whether code points are allocated or not doesn't affect how much space they
take up. There are millions of unused Unicode code points today. If they
are allocated tomorrow, the space your documents take up will not increase
one byte.

Allocating code points to Cuneiform has not increased the space needed by
Unicode at all. Two bytes alone is not enough for even existing human
languages (thanks China). For hardware related reasons, it is faster and
more efficient to use four bytes than three, so the obvious and "dumb" (in
the simplest thing which will work) way to store Unicode is UTF-32, which
takes a full four bytes per code point, regardless of whether there are
65537 code points or 1114112. That makes it less expensive than floating
point numbers, which take eight. Would you like to argue that floating
point doubles are "unprofessional" and wasteful?

As Dave pointed out, and you apparently agreed with him enough to quote him
TWICE (once in each of two blog posts), history of computing is full of
premature optimizations for space. (In fact, some of these may have been
justified by the technical limitations of the day.) Technically Unicode is
also limited, but it is limited to over one million code points, 1114112 to
be exact, although some of them are reserved as invalid for technical
reasons, and there is no indication that we'll ever run out of space in
Unicode.

In practice, there are three common Unicode encodings that nearly all
Unicode documents will use.

* UTF-8 will use between one and (by memory) four bytes per code
point. For Western European languages, that will be mostly one
or two bytes per character.

* UTF-16 uses a fixed two bytes per code point in the Basic Multilingual
Plane, which is enough for nearly all Western European writing and
much East Asian writing as well. For the rest, it uses a fixed four
bytes per code point.

* UTF-32 uses a fixed four bytes per code point. Hardly anyone uses
this as a storage format.


In *all three cases*, the existence of hieroglyphs and cuneiform in Unicode
doesn't change the space used. If you actually include a few hieroglyphs to
your document, the space increases only by the actual space used by those
hieroglyphs: four bytes per hieroglyph. At no time does the existence of a
single hieroglyph in your document force you to expand the non-hieroglyph
characters to use more space.
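
This is easy to confirm from Python (the last character below is an
arbitrary cuneiform code point, U+12000, standing in for any SMP character):

for ch in ("a", "\u00bd", "\u4e2d", "\U00012000"):
    print(hex(ord(ch)),
          len(ch.encode("utf-8")),      # 1, 2, 3 and 4 bytes respectively
          len(ch.encode("utf-16-le")),  # 2, 2, 2 and 4 bytes (no BOM)
          len(ch.encode("utf-32-le")))  # always 4 bytes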


> What I was trying to say expanded here
> http://blog.languager.org/2015/03/whimsical-unicode.html

You have at least two broken links, referring to a non-existent page:

http://blog.languager.org/2015/03/unicode-universal-or-whimsical.html

This essay seems to be even more rambling and unfocused than the first. What
does the cost of semi-conductor plants have to do with whether or not
programmers support Unicode in their applications?

Your point about the UTF-8 "BOM" is valid only if you interpret it as a Byte
Order Mark. But if you interpret it as an explicit UTF-8 signature or mark,
it isn't so silly. If your text begins with the UTF-8 mark, treat it as
UTF-8. It's no more silly than any other heuristic, like HTML encoding tags
or text editor's encoding cookies.
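
For what it's worth, Python's codec machinery already implements that
heuristic as the 'utf-8-sig' codec:

data = b"\xef\xbb\xbfhello"      # bytes beginning with the UTF-8 mark
print(data.decode("utf-8"))      # '\ufeffhello' -- mark kept as U+FEFF
print(data.decode("utf-8-sig"))  # 'hello' -- mark recognised and dropped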

Your discussion of "complexifiers and simplifiers" doesn't seem to be
terribly relevant, or at least if it is relevant, you don't give any reason
for it. The whole thing about Moore's Law and the cost of semi-conductor
plants seems irrelevant to Unicode except in the most over-generalised
sense of "things are bigger today than in the past, we've gone from
five-bit Baudot codes to 21-bit Unicode". Yeah, okay. So what's your point?

You agree that 16 bits are not enough, and yet you criticise Unicode for using
more than 16 bits on wasteful, whimsical gibberish like Cuneiform? That is
an inconsistent position to take.

UTF-16 is not half-arsed Unicode support. UTF-16 is full Unicode support.

The problem is when your language treats UTF-16 as a fixed-width two-byte
format instead of a variable-width, two- or four-byte format. (That's more
or less like the old, obsolete, UCS-2 standard.) There are all sorts of
good ways to solve the problem of surrogate pairs and the SMPs in UTF-16.
If some programming languages or software fails to do so, they are buggy,
not UTF-16.
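
The surrogate-pair arithmetic itself is tiny -- a sketch of what any
conforming UTF-16 encoder does for code points above U+FFFF:

def to_surrogate_pair(cp):
    # Split a supplementary-plane code point into a UTF-16 surrogate pair.
    assert 0x10000 <= cp <= 0x10FFFF
    cp -= 0x10000
    return 0xD800 | (cp >> 10), 0xDC00 | (cp & 0x3FF)

print([hex(u) for u in to_surrogate_pair(0x12000)])  # ['0xd808', '0xdc00']
print("\U00012000".encode("utf-16-be").hex())        # d808dc00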

After explaining that 16 bits are not enough, you then propose a 16 bit
standard. /face-palm

UTF-16 cannot break the fixed-width invariant, because it has no fixed-width
invariant. That's like arguing against UTF-8 because it breaks the
fixed-width invariant "all characters are single-byte ASCII characters".

If you cannot handle SMP characters, you are not supporting Unicode.


You suggest that Chinese users should be looking at Big5 or GB. I really,
really don't think so.

- Neither is universal. What makes you think that Chinese writers need
to use maths symbols, or include (say) Thai or Russian in their work
any less than Western writers do?

- Neither even support all of Chinese. Big5 supports Traditional
Chinese, but not Simplified Chinese. GB supports Simplified
Chinese, but not Traditional Chinese.

- Big5 likewise doesn't support placenames, many people's names, and
other less common parts of Chinese.

- Big5 is a shift-system, like Shift-JIS, and suffers from the same sort
of data corruption issues.

- There is no one single Big5 standard, but a whole lot of vendor
extensions.


You say:

"I just want to suggest that the Unicode consortium going overboard in
adding zillions of codepoints of nearly zero usefulness, is in fact
undermining unicode’s popularity and spread."

Can you demonstrate this? Can you show somebody who says "Well, I was going
to support full Unicode, but since they added a snowman, I'm going to stick
to ASCII"?

The "whimsical" characters you are complaining about were important enough
to somebody to spend significant amounts of time and money to write up a
proposal, have it go through the Unicode Consortium bureaucracy, and
eventually have it accepted. That's not easy or cheap, and people didn't
add a snowman on a whim. They did it because there are a whole lot of
people who want a shared standard for map symbols.

It is easy to mock what is not important to you. I daresay kids adding emoji
to their 10 character tweets would mock all the useless maths symbols in
Unicode too.




--
Steven

Chris Angelico

unread,
Mar 3, 2015, 10:15:13 PM3/3/15