function inclusion problem


vlya...@gmail.com

unread,
Feb 10, 2015, 6:38:12 PM2/10/15
to
I defined the function Fatalln in "mydef.py" and it works fine if I call it from "mydef.py", but when I try to call it from "test.py" in the same folder:

import mydef
...
Fatalln("my test")

I get NameError: name 'Fatalln' is not defined.
I also tried include('mydef.py') with the same result...
What is the right syntax?
Thanks

sohca...@gmail.com

unread,
Feb 10, 2015, 6:55:48 PM2/10/15
to
It would help us help you a lot if you copy/paste your code from both mydef.py and test.py so we can see exactly what you're doing.

Don't re-type what you entered, because people (especially new programmers) are prone to making typos or leaving out things they don't think are important. Copy/paste the code from the two files and then copy/paste the error you're getting.

Steven D'Aprano

unread,
Feb 10, 2015, 6:58:03 PM2/10/15
to
Preferred:

import mydef
mydef.Fatalln("my test")



Also acceptable:


from mydef import Fatalln
Fatalln("my test")





--
Steven

Michael Torrie

unread,
Feb 10, 2015, 7:00:37 PM2/10/15
to pytho...@python.org
Almost.

Try this:

mydef.Fatalln()

Unless you import the symbols from your mydef module into your program
they have to be referenced by the module name. This is a good thing and it
helps keep your code separated and clean. It is possible to import
individual symbols from a module like this:

from mydef import Fatalln

Avoid the temptation to import *all* symbols from a module into the
current program's namespace. Better to type out the extra bit.
Alternatively, you can alias imports like this:

import somemodule.submodule as foo

Frequently this idiom is used when working with numpy to save a bit of
time, while preserving the separate namespaces.
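The same aliasing idiom works with any module; a minimal sketch using a standard-library submodule (so it runs anywhere, no numpy required):

```python
# Alias a submodule at import time: the functions stay in their own
# namespace, just under a shorter name.
import os.path as op

print(op.splitext("notes.txt"))   # ('notes', '.txt')
```

With numpy the convention is `import numpy as np`.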

sohca...@gmail.com

unread,
Feb 10, 2015, 7:03:06 PM2/10/15
to
On Tuesday, February 10, 2015 at 3:38:12 PM UTC-8, vlya...@gmail.com wrote:
If you only do `import mydef`, then it creates a module object called `mydef` which contains all the global members in mydef.py. When you want to call a function from that module, you need to specify that you're calling a function from that module by putting the module name followed by a period, then the function. For example:

mydef.Fatalln("my test")

If you wanted to be able to call Fatalln without using the module name, you could import just the Fatalln function:

from mydef import Fatalln
Fatalln("my test")

If you had a lot of functions in mydef.py and wanted to be able to access them all without that pesky module name, you could also do:

from mydef import *

However, this is often considered bad practice, as it pollutes your global namespace, though it can be acceptable in specific scenarios.

For more information, check https://docs.python.org/3/tutorial/modules.html
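Putting both files together, a minimal self-contained sketch (the body of Fatalln is an assumption, since the original mydef.py was never posted; the example writes the module to a temporary directory just so it can be run as-is):

```python
import pathlib
import sys
import tempfile

# Create a stand-in mydef.py (hypothetical body; the real one wasn't posted).
tmp = tempfile.mkdtemp()
pathlib.Path(tmp, "mydef.py").write_text(
    "def Fatalln(msg):\n"
    "    print('FATAL:', msg)\n"
)
sys.path.insert(0, tmp)  # normally test.py and mydef.py just share a folder

import mydef                 # binds the module object to the name 'mydef'
mydef.Fatalln("my test")     # qualified call: module.function(args)

from mydef import Fatalln    # or bind the function name directly...
Fatalln("my test")           # ...and call it unqualified
```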

Ian Kelly

unread,
Feb 10, 2015, 7:03:36 PM2/10/15
to Python
import mydef
mydef.Fatalln("my test")

or

from mydef import Fatalln
Fatalln("my test")

Laura Creighton

unread,
Feb 10, 2015, 7:06:55 PM2/10/15
to vlya...@gmail.com, pytho...@python.org, l...@openend.se

from mydef import Fatalln

Laura Creighton

unread,
Feb 10, 2015, 7:17:00 PM2/10/15
to Laura Creighton, pytho...@python.org, vlya...@gmail.com, l...@openend.se
In a message of Wed, 11 Feb 2015 01:06:00 +0100, Laura Creighton writes:
>In a message of Tue, 10 Feb 2015 15:38:02 -0800, vlya...@gmail.com writes:
>
>from mydef import Fatalln
>

Also, please be warned. If you use a Unix or Linux system, there are
lots of problems you can get into if you expect something named 'test'
to run your code, because the shell already has a 'test' builtin, and
that one wins, and so ... well, test.py is safe. But if you rename it
as a script and call the binary file test ...

Bad and unexpected things happen.

Name it 'testme' or something like that, and you'll never have that problem again.
:)

Been there, done that!
Laura
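A quick way to see the collision Laura describes (sketched for a POSIX shell such as bash; output wording varies by shell):

```shell
# 'test' is a shell builtin (and usually also /usr/bin/test), so a
# script named plain 'test' on your PATH will never be the one that runs.
type test       # shows what would actually run, e.g. "test is a shell builtin"
test hello      # the builtin evaluates an expression and prints nothing
echo $?         # 0 -- your script never executed
```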


Victor L

unread,
Feb 11, 2015, 8:28:24 AM2/11/15
to Laura Creighton, pytho...@python.org
Laura, thanks for the answer - it works. Is there some equivalent of "include" to expose every function in that script?
Thanks again,
-V

On Tue, Feb 10, 2015 at 7:16 PM, Laura Creighton <l...@openend.se> wrote:

Dave Angel

unread,
Feb 11, 2015, 10:07:59 AM2/11/15
to pytho...@python.org
On 02/11/2015 08:27 AM, Victor L wrote:
> Laura, thanks for the answer - it works. Is there some equivalent of
> "include" to expose every function in that script?
> Thanks again,
> -V
>
Please don't top-post, and please use text email, not html. Thank you.

yes, as sohca...@gmail.com pointed out, you can do

from mydef import *

But this is nearly always bad practice. If there are only a few
functions you need access to, you should do

from mydef import Fatalln, func2, func42

and if there are tons of them, you do NOT want to pollute your local
namespace with them, and should do:

import mydef

x = mydef.func2() # or whatever

The assumption is that when the code is in an importable module, it'll
be maintained somewhat independently of the calling script/module. For
example, if you're using a third party library, it could be updated
without your having to rewrite your own calling code.

So what happens if the 3rd party adds a new function, and you happen to
have one by the same name? If you used the import-* form, you could
suddenly have broken code, with the complaint "But I didn't change a thing."

Similarly, if you import from more than one module, and use the import*
form, they could conflict with each other. And the order of importing
will (usually) determine which names override which ones.
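A sketch of that ordering hazard, using two synthetic modules (built in-memory here so the example is self-contained; real code would have two .py files):

```python
import sys
import types

# Two stand-in modules that happen to define the same name.
mod_a = types.ModuleType("mod_a")
mod_a.flatten = lambda seq: "mod_a version"
mod_b = types.ModuleType("mod_b")
mod_b.flatten = lambda seq: "mod_b version"
sys.modules["mod_a"] = mod_a
sys.modules["mod_b"] = mod_b

from mod_a import *   # brings in 'flatten'
from mod_b import *   # silently replaces it: import order decides

print(flatten([]))    # mod_b version
```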

The time that it's reasonable to use import* is when the third-party
library already recommends it. They should only do so if they have
written their library to only expose a careful subset of the names
declared, and documented all of them. And when they make new releases,
they're careful to hide any new symbols unless carefully documented in
the release notes, so you can manually check for interference.

Incidentally, this is also true of the standard library. There are
symbols that are importable from multiple places, and sometimes they
have the same meanings, sometimes they don't. An example (in Python
2.7) of the latter is os.walk and os.path.walk.

When I want to use one of those functions, I spell it out:
for dirpath, dirnames, filenames in os.walk(topname):

That way, there's no doubt in the reader's mind which one I intended.
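A runnable version of that fully qualified call, walking a small temporary tree (the tree itself is made up for the example):

```python
import os
import pathlib
import tempfile

# Build a tiny directory tree to walk.
top = tempfile.mkdtemp()
pathlib.Path(top, "sub").mkdir()
pathlib.Path(top, "sub", "file.txt").write_text("hello")

# os.walk, spelled out, so there's no doubt which walk is meant.
for dirpath, dirnames, filenames in os.walk(top):
    for name in filenames:
        print(os.path.join(dirpath, name))
```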

--
DaveA

Tim Chase

unread,
Feb 11, 2015, 10:36:54 AM2/11/15
to pytho...@python.org
On 2015-02-11 10:07, Dave Angel wrote:
> if there are tons of them, you do NOT want to pollute your local
> namespace with them, and should do:
>
> import mydef
>
> x = mydef.func2() # or whatever

or, if that's too verbose, you can give the module a shorter alias:

import Tkinter as tk
root = tk.Tk()
root.mainloop()

-tkc




Chris Angelico

unread,
Feb 11, 2015, 10:37:31 AM2/11/15
to pytho...@python.org
On Thu, Feb 12, 2015 at 2:07 AM, Dave Angel <d...@davea.name> wrote:
> Similarly, if you import from more than one module, and use the import*
> form, they could conflict with each other. And the order of importing will
> (usually) determine which names override which ones.

Never mind about conflicts and order of importing... just try figuring
out code like this:

from os import *
from sys import *
from math import *

# Calculate the total size of all files in a directory
tot = 0
for path, dirs, files in walk(argv[1]):
    # We don't need to sum the directories separately
    for f in files:
        # getsizeof() returns a value in bytes
        tot += getsizeof(f)/1024.0/1024.0

print("Total directory size:", floor(tot), "MB")

Now, I'm sure some of the experienced Python programmers here can see
exactly what's wrong. But can everyone? I doubt it. Even if you run
it, I doubt you'd get any better clue. But if you could see which
module everything was imported from, it'd be pretty obvious.
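For the record, the trap in the snippet above: `walk` resolves to `os.walk`, but `getsizeof` comes from `sys` and measures the Python string object, not the file; the file-size function is `os.path.getsize`, which none of those star-imports even provide as a bare name, and the filename is never joined to its directory. A corrected, explicitly qualified sketch:

```python
import math
import os
import sys

def total_size_mb(top):
    """Sum the sizes of all files under 'top', in whole megabytes."""
    tot = 0
    for path, dirs, files in os.walk(top):
        for f in files:
            # os.path.getsize: bytes on disk, unlike sys.getsizeof
            tot += os.path.getsize(os.path.join(path, f)) / 1024.0 / 1024.0
    return math.floor(tot)

if __name__ == "__main__":
    print("Total directory size:", total_size_mb(sys.argv[1]), "MB")
```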

ChrisA

Laura Creighton

unread,
Feb 24, 2015, 2:58:21 PM2/24/15
to Laura Creighton, pytho...@python.org, l...@openend.se
Dave Angel
are you another Native English speaker living in a world where ASCII
is enough?

Laura

Dave Angel

unread,
Feb 24, 2015, 3:42:09 PM2/24/15
to pytho...@python.org
I'm a native English speaker, and 7 bits is not nearly enough. Even if
I didn't currently care, I have some history:

No. CDC display code is enough. Who needs lowercase?

No. Baudot code is enough.

No, EBCDIC is good enough. Who cares about other companies.

No, the "golf-ball" only holds this many characters. If we need more,
we can just get the operator to switch balls in the middle of printing.

No. 2 digit years is enough. This world won't last till the millennium
anyway.

No. 2k is all the EPROM you can have. Your code HAS to fit in it, and
only 1.5k RAM.

No. 640k is more than anyone could need.

No, you cannot use a punch card made on a model 26 keypunch in the same
deck as one made on a model 29. Too bad, many of the codes are
different. (This one cost me travel back and forth between two
different locations with different model keypunches)

No. 8 bits is as much as we could ever use for characters. Who could
possibly need names or locations outside of this region? Or from
multiple places within it?

35 years ago I helped design a serial terminal that "spoke" Chinese,
using a two-byte encoding. But a single worldwide standard didn't come
until much later, and I cheered Unicode when it was finally unveiled.

I've worked with many printers that could only print 70 or 80 unique
characters. The laser printer, and even the matrix printer are
relatively recent inventions.

Getting back on topic:

According to:
http://support.esri.com/cn/knowledgebase/techarticles/detail/27345

"""ArcGIS Desktop applications, such as ArcMap, are Unicode based, so
they support Unicode to a certain level. The level of Unicode support
depends on the data format."""

That page was written about 2004, so there was concern even then.

And according to another, """In the header of each shapefile (.DBF), a
reference to a code page is included."""

--
DaveA

Steven D'Aprano

unread,
Feb 24, 2015, 8:19:58 PM2/24/15
to
ASCII was never enough. Not even for Americans, who couldn't write things
like "I bought a comic book for 10¢ yesterday", let alone interesting
things from maths and science.

I missed the whole 7-bit ASCII period, my first computer (Mac 128K) already
had an extended character set beyond ASCII. But even that never covered the
full range of characters I wanted to write, and then there was the horrible
mess that you got whenever you copied text files from a Mac to a DOS or
Windows PC or vice versa. Yes, even in 1984 we were transferring files and
running into encoding issues.



--
Steven

Marcos Almeida Azevedo

unread,
Feb 24, 2015, 11:54:50 PM2/24/15
to Steven D'Aprano, pytho...@python.org
On Wed, Feb 25, 2015 at 9:19 AM, Steven D'Aprano <steve+comp....@pearwood.info> wrote:
Laura Creighton wrote:

> Dave Angel
> are you another Native English speaker living in a world where ASCII
> is enough?

ASCII was never enough. Not even for Americans, who couldn't write things
like "I bought a comic book for 10¢ yesterday", let alone interesting
things from maths and science.


ASCII was a necessity back then because RAM and storage were too small.
 
I missed the whole 7-bit ASCII period, my first computer (Mac 128K) already
had an extended character set beyond ASCII. But even that never covered the

I miss the days when I was coding on my XT computer (640 KB RAM) too.  Things were so simple back then.
 
full range of characters I wanted to write, and then there was the horrible
mess that you got whenever you copied text files from a Mac to a DOS or
Windows PC or visa versa. Yes, even in 1984 we were transferring files and
running into encoding issues.



--
Marcos | I love PHP, Linux, and Java

Rustom Mody

unread,
Feb 26, 2015, 7:40:25 AM2/26/15
to
Wrote something up on why we should stop using ASCII:
http://blog.languager.org/2015/02/universal-unicode.html

(Yeah the world is a bit larger than a small bunch of islands off a half-continent.
But this is not that discussion!)

Rustom Mody

unread,
Feb 26, 2015, 8:15:56 AM2/26/15
to
Dave's list above of instances of 'poverty is a good idea' turning out stupid and narrow-minded in hindsight is neat. Thought I'd ack that explicitly.

Chris Angelico

unread,
Feb 26, 2015, 8:24:34 AM2/26/15
to pytho...@python.org
On Thu, Feb 26, 2015 at 11:40 PM, Rustom Mody <rusto...@gmail.com> wrote:
> Wrote something up on why we should stop using ASCII:
> http://blog.languager.org/2015/02/universal-unicode.html

From that post:

"""
5.1 Gibberish

When going from the original 2-byte unicode (around version 3?) to the
one having supplemental planes, the unicode consortium added blocks
such as

* Egyptian hieroglyphs
* Cuneiform
* Shavian
* Deseret
* Mahjong
* Klingon

To me (a layman) it looks unprofessional – as though they are playing
games – that billions of computing devices, each having billions of
storage words should have their storage wasted on blocks such as
these.
"""

The shift from Unicode as a 16-bit code to having multiple planes came
in with Unicode 2.0, but the various blocks were assigned separately:
* Egyptian hieroglyphs: Unicode 5.2
* Cuneiform: Unicode 5.0
* Shavian: Unicode 4.0
* Deseret: Unicode 3.1
* Mahjong Tiles: Unicode 5.1
* Klingon: Not part of any current standard

However, I don't think historians will appreciate you calling all of
these "gibberish". To adequately describe and discuss old texts
without these Unicode blocks, we'd have to either do everything with
images, or craft some kind of reversible transliteration system and
have dedicated software to render the texts on screen. Instead, what
we have is a well-known and standardized system for transliterating
all of these into numbers (code points), and rendering them becomes a
simple matter of installing an appropriate font.

Also, how does assigning meanings to codepoints "waste storage"? As
soon as Unicode 2.0 hit and 16-bit code units stopped being
sufficient, everyone needed to allocate storage - either 32 bits per
character, or some other system - and the fact that some codepoints
were unassigned had absolutely no impact on that. This is decidedly
NOT unprofessional, and it's not wasteful either.

ChrisA

Sam Raker

unread,
Feb 26, 2015, 11:46:11 AM2/26/15
to
I'm 100% in favor of expanding Unicode until the sun goes dark. Doing so helps solve the problems affecting speakers of "underserved" languages--access and language preservation. Speakers of Mongolian, Cherokee, Georgian, etc. all deserve to be able to interact with technology in their native languages as much as we speakers of ASCII-friendly languages do. Unicode support also makes writing papers on, dictionaries of, and new texts in such languages much easier, which helps the fight against language extinction, which is a sadly pressing issue.

Also, like, computers are big. Get an external drive for your high-resolution PDF collection of Medieval manuscripts if you feel like you're running out of space. A few extra codepoints aren't going to be the straw that breaks the camel's back.


On Thursday, February 26, 2015 at 8:24:34 AM UTC-5, Chris Angelico wrote:
> On Thu, Feb 26, 2015 at 11:40 PM, Rustom Mody <rusto...@gmail.com> wrote:
> > Wrote something up on why we should stop using ASCII:
> > http://blog.languager.org/2015/02/universal-unicode.html
>
> >From that post:
>
> """
> 5.1 Gibberish
>
> When going from the original 2-byte unicode (around version 3?) to the
> one having supplemental planes, the unicode consortium added blocks
> such as
>
> * Egyptian hieroglyphs
> * Cuneiform
> * Shavian
> * Deseret
> * Mahjong
> * Klingon
>
> To me (a layman) it looks unprofessional - as though they are playing
> games - that billions of computing devices, each having billions of

Terry Reedy

unread,
Feb 26, 2015, 12:03:44 PM2/26/15
to pytho...@python.org
On 2/26/2015 8:24 AM, Chris Angelico wrote:
> On Thu, Feb 26, 2015 at 11:40 PM, Rustom Mody <rusto...@gmail.com> wrote:
>> Wrote something up on why we should stop using ASCII:
>> http://blog.languager.org/2015/02/universal-unicode.html

I think that the main point of the post, that many Unicode chars are
truly planetary rather than just national/regional, is excellent.

> From that post:
>
> """
> 5.1 Gibberish
>
> When going from the original 2-byte unicode (around version 3?) to the
> one having supplemental planes, the unicode consortium added blocks
> such as
>
> * Egyptian hieroglyphs
> * Cuneiform
> * Shavian
> * Deseret
> * Mahjong
> * Klingon
>
> To me (a layman) it looks unprofessional – as though they are playing
> games – that billions of computing devices, each having billions of
> storage words should have their storage wasted on blocks such as
> these.
> """
>
> The shift from Unicode as a 16-bit code to having multiple planes came
> in with Unicode 2.0, but the various blocks were assigned separately:
> * Egyptian hieroglyphs: Unicode 5.2
> * Cuneiform: Unicode 5.0
> * Shavian: Unicode 4.0
> * Deseret: Unicode 3.1
> * Mahjong Tiles: Unicode 5.1
> * Klingon: Not part of any current standard

You should add emoticons, but not call them or the above 'gibberish'.
I think that this part of your post is more 'unprofessional' than the
character blocks. It is very jarring and seems contrary to your main point.

> However, I don't think historians will appreciate you calling all of
> these "gibberish". To adequately describe and discuss old texts
> without these Unicode blocks, we'd have to either do everything with
> images, or craft some kind of reversible transliteration system and
> have dedicated software to render the texts on screen. Instead, what
> we have is a well-known and standardized system for transliterating
> all of these into numbers (code points), and rendering them becomes a
> simple matter of installing an appropriate font.
>
> Also, how does assigning meanings to codepoints "waste storage"? As
> soon as Unicode 2.0 hit and 16-bit code units stopped being
> sufficient, everyone needed to allocate storage - either 32 bits per
> character, or some other system - and the fact that some codepoints
> were unassigned had absolutely no impact on that. This is decidedly
> NOT unprofessional, and it's not wasteful either.

I agree.

--
Terry Jan Reedy


Rustom Mody

unread,
Feb 26, 2015, 12:08:37 PM2/26/15
to
On Thursday, February 26, 2015 at 10:16:11 PM UTC+5:30, Sam Raker wrote:
> I'm 100% in favor of expanding Unicode until the sun goes dark. Doing so helps solve the problems affecting speakers of "underserved" languages--access and language preservation. Speakers of Mongolian, Cherokee, Georgian, etc. all deserve to be able to interact with technology in their native languages as much as we speakers of ASCII-friendly languages do. Unicode support also makes writing papers on, dictionaries of, and new texts in such languages much easier, which helps the fight against language extinction, which is a sadly pressing issue.


Agreed -- Correcting the inequities caused by ASCII-bias is a good thing.

In fact the whole point of my post was to say just that by carving out and
focussing on a 'universal' subset of unicode that is considerably larger than
ASCII but smaller than unicode, we stand to reduce ASCII-bias.

As also other posts like
http://blog.languager.org/2014/04/unicoded-python.html
http://blog.languager.org/2014/05/unicode-in-haskell-source.html

However my example listed

> > * Egyptian hieroglyphs
> > * Cuneiform
> > * Shavian
> > * Deseret
> > * Mahjong
> > * Klingon

Ok, Chris has corrected me re. Klingon-in-unicode, so let's drop that.
Of the others, which do you think is in the 'underserved' category?

More generally, which of http://en.wikipedia.org/wiki/Plane_%28Unicode%29#Supplementary_Multilingual_Plane
are underserved?

Chris Angelico

unread,
Feb 26, 2015, 12:29:43 PM2/26/15
to pytho...@python.org
On Fri, Feb 27, 2015 at 4:02 AM, Terry Reedy <tjr...@udel.edu> wrote:
> On 2/26/2015 8:24 AM, Chris Angelico wrote:
>>
>> On Thu, Feb 26, 2015 at 11:40 PM, Rustom Mody <rusto...@gmail.com>
>> wrote:
>>>
>>> Wrote something up on why we should stop using ASCII:
>>> http://blog.languager.org/2015/02/universal-unicode.html
>
>
> I think that the main point of the post, that many Unicode chars are truly
> planetary rather than just national/regional, is excellent.

Agreed. Like you, though, I take exception at the "Gibberish" section.

Unicode offers us a number of types of character needed by linguists:

1) Letters[1] common to many languages, such as the unadorned Latin
and Cyrillic letters
2) Letters specific to one or very few languages, such as the Turkish dotless i
3) Diacritical marks, ready to be combined with various letters
4) Precomposed forms of various common "letter with diacritical" combinations
5) Other precomposed forms, eg ligatures and Hangul syllables
6) Symbols, punctuation, and various other marks
7) Spacing of various widths and attributes

Apart from #4 and #5, which could be avoided by using the decomposed
forms everywhere, each of these character types is vital. You can't
typeset a document without being able to adequately represent every
part of it. Then there are additional characters that aren't strictly
necessary, but are extremely convenient, such as the emoticon
sections. You can talk in text and still put in a nice little picture
of a globe, or the monkey-no-evil set, etc.

Most of these characters - in fact, all except #2 and maybe a few of
the diacritical marks - are used in multiple places/languages. Unicode
isn't about taking everyone's separate character sets and numbering
them all so we can reference characters from anywhere; if you wanted
that, you'd be much better off with something that lets you specify a
code page in 16 bits and a character in 8, which is roughly the same
size as Unicode anyway. What we have is, instead, a system that brings
them all together - LATIN SMALL LETTER A is U+0061 no matter whether
it's being used to write English, French, Malaysian, Turkish,
Croatian, Vietnamese, or Icelandic text. Unicode is truly planetary.
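That unification is easy to check from Python's own unicodedata module:

```python
import unicodedata

# One code point, one name, regardless of the language of the text.
ch = "a"
print(hex(ord(ch)))            # 0x61
print(unicodedata.name(ch))    # LATIN SMALL LETTER A
```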

ChrisA

[1] I use the word "letter" loosely here; Chinese and Japanese don't
have a concept of letters as such, but their glyphs are still
represented.

Rustom Mody

unread,
Feb 26, 2015, 12:59:24 PM2/26/15
to
On Thursday, February 26, 2015 at 10:33:44 PM UTC+5:30, Terry Reedy wrote:
> On 2/26/2015 8:24 AM, Chris Angelico wrote:
Emoticons (or is it emoji?) seem to have some (regional?) uptake. I don't know.
In any case I'd like to steer clear of political(izable) questions.


> I think that this part of your post is more 'unprofessional' than the
> character blocks. It is very jarring and seems contrary to your main point.

Ok, I need a word for:
1. I have no need for this
2. 99.9% of the (living) people on this planet also have no need for this

>
> > However, I don't think historians will appreciate you calling all of
> > these "gibberish". To adequately describe and discuss old texts
> > without these Unicode blocks, we'd have to either do everything with
> > images, or craft some kind of reversible transliteration system and
> > have dedicated software to render the texts on screen. Instead, what
> > we have is a well-known and standardized system for transliterating
> > all of these into numbers (code points), and rendering them becomes a
> > simple matter of installing an appropriate font.
> >
> > Also, how does assigning meanings to codepoints "waste storage"? As
> > soon as Unicode 2.0 hit and 16-bit code units stopped being
> > sufficient, everyone needed to allocate storage - either 32 bits per
> > character, or some other system - and the fact that some codepoints
> > were unassigned had absolutely no impact on that. This is decidedly
> > NOT unprofessional, and it's not wasteful either.
>
> I agree.

I clearly am more enthusiastic than knowledgeable about unicode.
But I know my basic CS well enough (as I am sure you and Chris also do).

So I don't get how 4 bytes is not more expensive than 2.
Yeah, I know you can squeeze a unicode char into 3 bytes or even 21 bits.
You could use a clever representation like UTF-8 or the FSR.
But I don't see how you escape the conclusion that full unicode costs more
than exclusive BMP.

eg consider the case of 32 vs 64 bit executables.
The 64 bit executable is generally larger than the 32 bit one.
Now consider the case of a machine that has, say, 2GB RAM and a 64-bit processor.
You could -- I think -- make a reasonable case that all those all-zero hi-address-words are 'waste'.

And you've captured the general sense best so far:
> I think that the main point of the post, that many Unicode chars are
> truly planetary rather than just national/regional,

If the general tone/tenor of what I have written is not getting
across because of some words (like 'gibberish'?), I'll try to reword.

However, let me clarify that the whole of section 5 is 'iffy', with 5.1 being only more extreme. I've not written these up because the point of that
post is not to criticise unicode but to highlight the universal(isable) parts.

Still, if I were to expand on the criticisms, here are some examples:

Math-Greek: Consider the math-alpha block
http://en.wikipedia.org/wiki/Mathematical_operators_and_symbols_in_Unicode#Mathematical_Alphanumeric_Symbols_block

Now imagine a beginning student not getting the difference between font, glyph,
character. To me this block represents this same error cast into concrete and
dignified by the (supposed) authority of the unicode consortium.

There are probably dozens of other such stupidities, like distinguishing the kelvin K from the Latin K, as if that were the business of the unicode consortium.

My real reservations about unicode come from their work in areas that I happen to know something about.

Music: To put music simply as a few mostly-meaningless 'dingbats' like ♩ ♪ ♫ is perhaps ok.
However, all this stuff http://xahlee.info/comp/unicode_music_symbols.html
makes no sense (to me) given that music (ie standard western music written in staff notation) is inherently 2-dimensional -- multi-voiced, multi-staff, chordal.

Sanskrit/Devanagari:
Consists of bogus letters that don't exist in Devanagari.
The letter ऄ (0904) is found here: http://unicode.org/charts/PDF/U0900.pdf
But not here: http://en.wikipedia.org/wiki/Devanagari#Vowels
So I call it bogus-devanagari.

Contrariwise, an important letter in Vedic pronunciation, the double udatta, is missing:
http://list.indology.info/pipermail/indology_list.indology.info/2000-April/021070.html

All of which adds up to the impression that the unicode consortium occasionally fails to do due diligence.

In any case, all of the above is contrary to/irrelevant to my post, which is about
identifying the more universal parts of unicode.

Rustom Mody

unread,
Feb 26, 2015, 2:59:38 PM2/26/15
to
On Thursday, February 26, 2015 at 10:33:44 PM UTC+5:30, Terry Reedy wrote:
> You should add emoticons, but not call them or the above 'gibberish'.

Done -- and of course not under gibberish.
I don't really know much about how emoji are used, but I understand they are.
JFTR I consider it necessary to be respectful to all (living) people.
For that matter even dead people(s) -- no need to be disrespectful to the Egyptians who created the hieroglyphs or the Sumerians who wrote cuneiform.

I only find it crosses a line when two-millennia-dead creations are made to take
the space of the living.

Chris wrote:
> * Klingon: Not part of any current standard

Thanks Removed.

wxjm...@gmail.com

unread,
Feb 26, 2015, 3:20:55 PM2/26/15
to
Le jeudi 26 février 2015 18:59:24 UTC+1, Rustom Mody a écrit :
>
> ...To me this block represents this same error cast into concrete and
> dignified by the (supposed) authority of the unicode consortium.
>

Unicode does not prescribe, it registers.

Eg. the inclusion of
U+1E9E, 'LATIN CAPITAL LETTER SHARP S'
was officially proposed by the German
Federal Government.
(I have a pdf copy somewhere.)

Chris Angelico

unread,
Feb 26, 2015, 5:14:18 PM2/26/15
to pytho...@python.org
On Fri, Feb 27, 2015 at 4:59 AM, Rustom Mody <rusto...@gmail.com> wrote:
> On Thursday, February 26, 2015 at 10:33:44 PM UTC+5:30, Terry Reedy wrote:
>> I think that this part of your post is more 'unprofessional' than the
>> character blocks. It is very jarring and seems contrary to your main point.
>
> Ok I need a word for
> 1. I have no need for this
> 2. 99.9% of the (living) on this planet also have no need for this

So what, seven million people need it? Sounds pretty useful to me. And
your figure is an exaggeration; a lot more people than that use
emoji/emoticons.

>> > Also, how does assigning meanings to codepoints "waste storage"? As
>> > soon as Unicode 2.0 hit and 16-bit code units stopped being
>> > sufficient, everyone needed to allocate storage - either 32 bits per
>> > character, or some other system - and the fact that some codepoints
>> > were unassigned had absolutely no impact on that. This is decidedly
>> > NOT unprofessional, and it's not wasteful either.
>>
>> I agree.
>
> I clearly am more enthusiastic than knowledgeable about unicode.
> But I know my basic CS well enough (as I am sure you and Chris also do)
>
> So I dont get how 4 bytes is not more expensive than 2.
> Yeah I know you can squeeze a unicode char into 3 bytes or even 21 bits
> You could use a clever representation like UTF-8 or FSR.
> But I dont see how you can get out of this that full-unicode costs more than
> exclusive BMP.

Sure, UCS-2 is cheaper than the current Unicode spec. But Unicode 2.0
was when that changed, and the change was because 65536 characters
clearly wouldn't be enough - and that was due to the number of
characters needed for other things than those you're complaining
about. Every spec since then has not changed anything that affects
storage. There are still, today, quite a lot of unallocated blocks of
characters (we're really using only about two planes' worth so far,
maybe three), but even if Unicode specified just two planes of 64K
characters each, you wouldn't be able to save much on transmission
(UTF-8 is already flexible and uses only what you need; if a future
Unicode spec allows 64K planes, UTF-8 transmission will cost exactly
the same for all existing characters), and on an eight-bit-byte
system, the very best you'll be able to do is three bytes - which you
can do today, too; you already know 21 bits will do. So since the BMP
was proven insufficient (back in 1996), no subsequent changes have had
any costs in storage.
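The pay-for-what-you-use property of UTF-8 is easy to demonstrate; a character costs a fourth byte only when you actually use one outside the BMP (U+12000 is the first Cuneiform sign):

```python
# UTF-8 charges per character actually used: unassigned code points
# elsewhere in the standard cost nothing.
for ch in ("A", "\u00a2", "\u20ac", "\U00012000"):
    print(f"U+{ord(ch):04X} -> {len(ch.encode('utf-8'))} byte(s)")
# U+0041 -> 1, U+00A2 -> 2, U+20AC -> 3, U+12000 -> 4
```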

> Still if I were to expand on the criticisms here are some examples:
>
> Math-Greek: Consider the math-alpha block
> http://en.wikipedia.org/wiki/Mathematical_operators_and_symbols_in_Unicode#Mathematical_Alphanumeric_Symbols_block
>
> Now imagine a beginning student not getting the difference between font, glyph,
> character. To me this block represents this same error cast into concrete and
> dignified by the (supposed) authority of the unicode consortium.
>
> There are probably dozens of other such stupidities like distinguishing kelvin K from latin K as if that is the business of the unicode consortium

A lot of these kinds of characters come from a need to unambiguously
transliterate text stored in other encodings. I don't personally
profess to understand the reasoning behind the various
indistinguishable characters, but I'm aware that there are a lot of
tricky questions to be decided; and if once the Consortium decides to
allocate a character, that character must remain forever allocated.

> My real reservations about unicode come from their work in areas that I happen to know something about
>
> Music: To put music simply as a few mostly-meaningless 'dingbats' like ♩ ♪ ♫ is perhaps ok
> However all this stuff http://xahlee.info/comp/unicode_music_symbols.html
> makes no sense (to me) given that music (ie standard western music written in staff notation) is inherently 2 dimensional -- multi-voiced, multi-staff, chordal

The placement on the page is up to the display library. You can
produce a PDF that places the note symbols at their correct positions,
and requires no images to render sheet music.

> Sanskrit/Devanagari:
> Consists of bogus letters that dont exist in devanagari
> The letter ऄ (0904) is found here http://unicode.org/charts/PDF/U0900.pdf
> But not here http://en.wikipedia.org/wiki/Devanagari#Vowels
> So I call it bogus-devanagari
>
> Contrariwise an important letter in vedic pronunciation the double-udatta is missing
> http://list.indology.info/pipermail/indology_list.indology.info/2000-April/021070.html
>
> All of which adds up to the impression that the unicode consortium occasionally fails to do due diligence

Which proves that they're not perfect. Don't forget, they can always
add more characters later.

ChrisA

Steven D'Aprano

Feb 26, 2015, 6:09:55 PM
to
Chris Angelico wrote:

> Unicode
> isn't about taking everyone's separate character sets and numbering
> them all so we can reference characters from anywhere; if you wanted
> that, you'd be much better off with something that lets you specify a
> code page in 16 bits and a character in 8, which is roughly the same
> size as Unicode anyway.

Well, except for the approximately 25% of people in the world whose native
language has more than 256 characters.

It sounds like you are referring to some sort of "shift code" system. Some
legacy East Asian encodings use a similar scheme, and depending on how they
are implemented they have great disadvantages. For example, Shift-JIS
suffers from a number of weaknesses including that a single byte corrupted
in transmission can cause large swaths of the following text to be
corrupted. With Unicode, a single corrupted byte can only corrupt a single
code point.
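The corruption-locality point can be demonstrated from the UTF-8 side (a sketch; it doesn't attempt to model Shift-JIS decoders, which vary): flipping a single byte damages at most one code point, because lead bytes and continuation bytes are always distinguishable.

```python
# A single corrupted byte in UTF-8 damages only the code point it
# belongs to; every later character still decodes correctly.
good = "αβγδ".encode("utf-8")   # two bytes per Greek letter
bad = bytearray(good)
bad[2] = 0xFF                   # corrupt the lead byte of β
decoded = bytes(bad).decode("utf-8", errors="replace")
print(decoded)                  # α is intact, γ and δ still follow
```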


--
Steven

Chris Angelico

Feb 26, 2015, 6:23:45 PM
to pytho...@python.org
On Fri, Feb 27, 2015 at 10:09 AM, Steven D'Aprano
<steve+comp....@pearwood.info> wrote:
> Chris Angelico wrote:
>
>> Unicode
>> isn't about taking everyone's separate character sets and numbering
>> them all so we can reference characters from anywhere; if you wanted
>> that, you'd be much better off with something that lets you specify a
>> code page in 16 bits and a character in 8, which is roughly the same
>> size as Unicode anyway.
>
> Well, except for the approximately 25% of people in the world whose native
> language has more than 256 characters.

You could always allocate multiple code pages to one language. But
since I'm not advocating this system, I'm only guessing at solutions
to its problems.

> It sounds like you are referring to some sort of "shift code" system. Some
> legacy East Asian encodings use a similar scheme, and depending on how they
> are implemented they have great disadvantages. For example, Shift-JIS
> suffers from a number of weaknesses including that a single byte corrupted
> in transmission can cause large swaths of the following text to be
> corrupted. With Unicode, a single corrupted byte can only corrupt a single
> code point.

That's exactly what I was hinting at. There are plenty of systems like
that, and they are badly flawed compared to a simple universal system
for a number of reasons. One is the corruption issue you mention;
another is that a simple memory-based text search becomes utterly
useless (to locate text in a document, you'd need to do a whole lot of
stateful parsing - not to mention the difficulties of doing
"similar-to" searches across languages); concatenation of text also
becomes a stateful operation, and so do all sorts of other simple
manipulations. Unicode may demand a bit more storage in certain
circumstances (where an eight-bit encoding might have handled your
entire document), but it's so much easier for the general case.
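The "memory-based text search" point deserves a concrete illustration. UTF-8 is self-synchronizing: the byte sequence of one character can never begin in the middle of another character's bytes, so a plain byte-level substring search finds only real matches, with no stateful parsing. A small sketch:

```python
# Byte-level search over UTF-8 text works without decoding, because
# a multi-byte needle always starts with a lead byte and lead bytes
# never occur inside another character's sequence.
haystack = "αβγδε".encode("utf-8")
needle = "γδ".encode("utf-8")
pos = haystack.find(needle)
print(pos)                              # lands on a character boundary
print(haystack[:pos].decode("utf-8"))   # the prefix decodes cleanly
```

A shift-code encoding cannot offer this: the same byte sequence means different characters depending on which page is currently active.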

ChrisA

Steven D'Aprano

Feb 26, 2015, 8:05:38 PM
to
Rustom Mody wrote:

> Emoticons (or is it emoji) seems to have some (regional?) takeup?? Dunno…
> In any case I'd like to stay clear of political(izable) questions

Emoji is the term used in Japan, gradually spreading to the rest of the
world. Emoticons, I believe, should be restricted to the practice of using
ASCII-only digraphs and trigraphs such as :-) (colon, hyphen, right-parens)
to indicate "smileys".

I believe that emoji will eventually lead to Unicode's victory. People will
want smileys and piles of poo on their mobile phones, and from there it
will gradually spread to everywhere. All they need to do to make victory
inevitable is add cartoon genitals...


>> I think that this part of your post is more 'unprofessional' than the
>> character blocks. It is very jarring and seems contrary to your main
>> point.
>
> Ok I need a word for
> 1. I have no need for this
> 2. 99.9% of the (living) on this planet also have no need for this

0.1% of the living is seven million people. I'll tell you what, you tell me
which seven million people should be relegated to second-class status, and
I'll tell them where you live.

:-)


[...]
> I clearly am more enthusiastic than knowledgeable about unicode.
> But I know my basic CS well enough (as I am sure you and Chris also do)
>
> So I dont get how 4 bytes is not more expensive than 2.

Obviously it is. But it's only twice as expensive, and in computer science
terms that counts as "close enough". It's quite common for data structures
to "waste" space by using "no more than twice as much space as needed",
e.g. Python dicts and lists.

The whole Unicode range U+0000 to U+10FFFF needs only 21 bits, which fits
into three bytes. Nevertheless, there's no three-byte UTF encoding, because
on modern hardware it is more efficient to "waste" an entire extra byte per
code point and deal with an even multiple of bytes.
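The trade-off is easy to see by counting bytes per code point under the three UTF encodings (the "-le" variants are used here so a BOM doesn't inflate the count):

```python
# UTF-32 pays four bytes for everything; UTF-8 and UTF-16 pay only
# for what each character needs.
chars = ("A", "\u00e9", "\u4f60", "\U0001F600")  # ASCII, accented, CJK, emoji
for ch in chars:
    sizes = {enc: len(ch.encode(enc))
             for enc in ("utf-8", "utf-16-le", "utf-32-le")}
    print(f"U+{ord(ch):06X}: {sizes}")
```

Note that for CJK text UTF-16 (2 bytes) actually beats UTF-8 (3 bytes), which is part of why there's no appetite for a hypothetical three-byte fixed-width encoding.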


> Yeah I know you can squeeze a unicode char into 3 bytes or even 21 bits
> You could use a clever representation like UTF-8 or FSR.
> But I dont see how you can get out of this that full-unicode costs more
> than exclusive BMP.

Are you missing a word there? Costs "no more" perhaps?


> eg consider the case of 32 vs 64 bit executables.
> The 64 bit executable is generally larger than the 32 bit one
> Now consider the case of a machine that has say 2GB RAM and a 64-bit
> processor. You could -- I think -- make a reasonable case that all those
> all-zero hi-address-words are 'waste'.

Sure. The whole point of 64-bit processors is to enable the use of more than
2GB of RAM. One might as well say that using 32-bit processors is wasteful
if you only have 64K of memory. Yes it is, but the only things which use
16-bit or 8-bit processors these days are embedded devices.


[...]
> Math-Greek: Consider the math-alpha block
>
> http://en.wikipedia.org/wiki/Mathematical_operators_and_symbols_in_Unicode#Mathematical_Alphanumeric_Symbols_block
>
> Now imagine a beginning student not getting the difference between font,
> glyph,
> character. To me this block represents this same error cast into concrete
> and dignified by the (supposed) authority of the unicode consortium.

Not being privy to the internal deliberations of the Consortium, it is
sometimes difficult to tell why two symbols are sometimes declared to be
mere different glyphs for the same character, and other times declared to
be worthy of being separate characters.

E.g. I think we should all agree that the English "A" and the French "A"
shouldn't count as separate characters, although the Greek "Α" and
Russian "А" do.

In the case of the maths symbols, it isn't obvious to me what the deciding
factors were. I know that one of the considerations they use is to consider
whether or not users of the symbols have a tradition of treating the
symbols as mere different glyphs, i.e. stylistic variations. In this case,
I'm pretty sure that mathematicians would *not* consider:

U+2115 DOUBLE-STRUCK CAPITAL N "ℕ"
U+004E LATIN CAPITAL LETTER N "N"

as mere stylistic variations. If you defined a matrix called ℕ, you would
probably be told off for using the wrong symbol, not for using the wrong
formatting.

On the other hand, I'm not so sure about

U+210E PLANCK CONSTANT "ℎ"

versus a mere lowercase h (possibly in italic).


> There are probably dozens of other such stupidities like distinguishing
> kelvin K from latin K as if that is the business of the unicode consortium

But it *is* the business of the Unicode consortium. They have at least two
important aims:

- to be able to represent every possible human-language character;

- to allow lossless round-trip conversion to all existing legacy encodings
(for the subset of Unicode handled by that encoding).


The second reason is why Unicode includes code points for degree-Celsius and
degree-Fahrenheit, rather than just using °C and °F like sane people.
Because some idiot^W code-page designer back in the 1980s or 90s decided to
add single character ℃ and ℉. So now Unicode has to be able to round-trip
(say) "°C℃" without loss.

I imagine that the same applies to U+212A KELVIN SIGN K.
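The round-trip-compatibility story is visible in the character data itself: these characters carry compatibility decompositions back to the ordinary spellings, which NFKC normalization applies. A quick check with the standard-library `unicodedata` module:

```python
import unicodedata

# Characters that exist largely for legacy round-tripping decompose
# back to the plain spelling under NFKC normalization.
for ch in ("\u2115", "\u210e", "\u212a", "\u2103"):
    folded = unicodedata.normalize("NFKC", ch)
    print(f"{ch} ({unicodedata.name(ch)}) -> {folded!r}")
```

So KELVIN SIGN folds to an ordinary "K", and DEGREE CELSIUS folds to the two-character "°C" that sane people would have written in the first place.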


> My real reservations about unicode come from their work in areas that I
> happen to know something about
>
> Music: To put music simply as a few mostly-meaningless 'dingbats' like ♩ ♪
> ♫ is perhaps ok However all this stuff
> http://xahlee.info/comp/unicode_music_symbols.html
> makes no sense (to me) given that music (ie standard western music written
> in staff notation) is inherently 2 dimensional -- multi-voiced,
> multi-staff, chordal

(1) Text can also be two dimensional.
(2) Where you put the symbol on the page is a separate question from whether
or not the symbol exists.


> Consists of bogus letters that dont exist in devanagari
> The letter ऄ (0904) is found here http://unicode.org/charts/PDF/U0900.pdf
> But not here http://en.wikipedia.org/wiki/Devanagari#Vowels
> So I call it bogus-devanagari

Hmm, well I love Wikipedia as much as the next guy, but I think that even
Jimmy Wales would suggest that Wikipedia is not a primary source for what
counts as Devanagari vowels. What makes you think that Wikipedia is right
and Unicode is wrong?

That's not to say that Unicode hasn't made some mistakes. There are a few
deprecated code points, or code points that have been given the wrong name.
Oops. Mistakes happen.


> Contrariwise an important letter in vedic pronunciation the double-udatta
> is missing
>
> http://list.indology.info/pipermail/indology_list.indology.info/2000-April/021070.html

I quote:


I do not see any need for a "double udaatta". Perhaps "double
ANudaatta" is meant here?


I don't know Sanskrit, but if somebody suggested that Unicode doesn't
support English because the important letter "double-oh" (as
in "moon", "spoon", "croon" etc.) was missing, I wouldn't be terribly
impressed. We have a "double-u" letter, why not "double-oh"?


Another quote:

I should strongly recommend not to hurry with a standardization
proposal until the text collection of Vedic texts has been finished


In other words, even the experts in Vedic texts don't yet know all the
characters which they may or may not need.



--
Steven

Dave Angel

Feb 26, 2015, 8:58:13 PM
to pytho...@python.org
On 02/26/2015 08:05 PM, Steven D'Aprano wrote:
> Rustom Mody wrote:
>

>
>> eg consider the case of 32 vs 64 bit executables.
>> The 64 bit executable is generally larger than the 32 bit one
>> Now consider the case of a machine that has say 2GB RAM and a 64-bit
>> processor. You could -- I think -- make a reasonable case that all those
>> all-zero hi-address-words are 'waste'.
>
> Sure. The whole point of 64-bit processors is to enable the use of more than
> 2GB of RAM. One might as well say that using 32-bit processors is wasteful
> if you only have 64K of memory. Yes it is, but the only things which use
> 16-bit or 8-bit processors these days are embedded devices.

But the 2 gig means electrical address lines out of the CPU are wasted,
not address space. A 64-bit processor and 64-bit OS mean you can have
more than 4 gig in a process space, even if over half of it has to be in
the swap file. Linear versus physical makes a big difference.

(Although I believe Seymour Cray was quoted as saying that virtual
memory is a crock, because "you can't fake what you ain't got.")




--
DaveA

Steven D'Aprano

Feb 27, 2015, 12:58:56 AM
to
Dave Angel wrote:

> (Although I believe Seymour Cray was quoted as saying that virtual
> memory is a crock, because "you can't fake what you ain't got.")

If I recall correctly, disk access is about 10000 times slower than RAM, so
virtual memory is *at least* that much slower than real memory.
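To put rough numbers on that ratio (the latencies below are assumed ballpark figures from the usual "latency numbers" folklore, not measurements):

```python
# Assumed order-of-magnitude latencies, in nanoseconds.
latency_ns = {
    "RAM access": 100,                  # ~100 ns
    "SSD read": 100_000,                # ~100 us
    "spinning-disk seek": 10_000_000,   # ~10 ms
}
ram = latency_ns["RAM access"]
for name, ns in latency_ns.items():
    print(f"{name}: ~{ns // ram:,}x RAM")
```

On those figures a page fault to spinning disk is ~100,000 times a RAM access, which is why a thrashing program feels dead rather than merely slow.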



--
Steven

Dave Angel

Feb 27, 2015, 2:31:10 AM
to pytho...@python.org
It's so much more complicated than that, that I hardly know where to
start. I'll describe a generic processor/OS/memory/disk architecture;
there will be huge differences between processor models even from a
single manufacturer.

First, as soon as you add swapping logic to your
processor/memory-system, you theoretically slow it down. And in the
days of that quote, Cray's memory was maybe 50 times as fast as the
memory used by us mortals. So adding swapping logic would have slowed
it down quite substantially, even when it was not swapping. But that
logic is inside the CPU chip these days, and presumably thoroughly
optimized.

Next, statistically, a program uses a small subset of its total program
& data space in its working set, and the working set should reside in
real memory. But when the program greatly increases that working set,
and it approaches the amount of physical memory, then swapping becomes
more frenzied, and we say the program is thrashing. Simple example, try
sorting an array that's about the size of available physical memory.

Next, even physical memory is divided into a few levels of caching, some
on-chip and some off. And the caching is done in what I call strips,
where accessing just one byte causes the whole strip to be loaded from
non-cached memory. I forget the current size for that, but it's maybe
64 to 256 bytes or so.
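The cost of ignoring those strips (cache lines) shows up even from a high-level language, though CPython's pointer-chasing and interpreter overhead mute the effect compared to C. A sketch comparing row-major with column-major traversal of the same data:

```python
import time

# Row-major traversal touches memory in cache-line order; column-major
# traversal strides across rows, pulling a fresh line far more often
# once the matrix is larger than the caches.
N = 1000
matrix = [[1] * N for _ in range(N)]

t0 = time.perf_counter()
row_sum = sum(matrix[i][j] for i in range(N) for j in range(N))
t1 = time.perf_counter()
col_sum = sum(matrix[i][j] for j in range(N) for i in range(N))
t2 = time.perf_counter()

print(f"row-major: {t1 - t0:.3f}s  column-major: {t2 - t1:.3f}s")
assert row_sum == col_sum == N * N   # same work, different access order
```

The timing gap varies by machine and is modest in CPython; rewrite the same loops in C over a contiguous array and the difference becomes dramatic.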

If there are multiple processors (not multicore, but actual separate
processors), then each one has such internal caches, and any writes on
one processor may have to trigger flushes of all the other processors
that happen to have the same strip loaded.

The processor not only prefetches the next few instructions, but decodes
and tentatively executes them, subject to being discarded if a
conditional branch doesn't go the way the processor predicted. So some
instructions execute in zero time, some of the time.

Every address of instruction fetch, or of data fetch or store, goes
through a couple of layers of translation. Segment register plus offset
gives linear address. Lookup those in tables to get physical address,
and if table happens not to be in on-chip cache, swap it in. If
physical address isn't valid, a processor exception causes the OS to
potentially swap something out, and something else in.

Once we're paging from the swapfile, the size of the read is perhaps 4k.
And that read is regardless of whether we're only going to use one
byte or all of it.

The ratio between an access which was in the L1 cache and one which
required a page to be swapped in from disk? Much bigger than your
10,000 figure. But hopefully it doesn't happen a big percentage of the
time.

Many, many other variables, like the fact that RAM chips are not
directly addressable by bytes, but instead count on rows and columns.
So if you access many bytes in the same row, it can be much quicker than
random access. So simple access time specifications don't mean as much
as it would seem; the controller has to balance the RAM spec with the
various cache requirements.
--
DaveA