Python usage numbers

Eric Snow
Newsgroups: comp.lang.python
From: Eric Snow <ericsnowcurren...@gmail.com>
Date: Sat, 11 Feb 2012 14:02:47 -0700
Local: Sat, Feb 11 2012 4:02 pm
Subject: Python usage numbers
Does anyone have (or know of) accurate totals and percentages on how
Python is used?  I'm particularly interested in the following
groupings:

- new development vs. stable code-bases
- categories (web, scripts, "big data", computation, etc.)
- "bare metal" vs. on top of some framework
- regional usage

I'm thinking about this partly because of the discussion on
python-ideas about the perceived challenges of Unicode in Python 3.
All the rhetoric, anecdotal evidence, and use-cases there have little
meaning to me, with regard to Python as a whole, without an
understanding of who is actually affected.

For instance, if frameworks (like django and numpy) could completely
hide the arguable challenges of Unicode in Python 3--and most projects
were built on top of frameworks--then general efforts for making
Unicode easier in Python 3 should go toward helping framework writers.

Such usage numbers aren't only useful for the Unicode discussion
(which I wish would get resolved and die so we could move on to more
interesting stuff :) ); they also help us know where efforts could be
focused in general to make Python more powerful and easier to use
where it's already used extensively.  They can show us the areas where
Python isn't used much, thus exposing a targeted opportunity to change
that.

Realistically, it's not entirely feasible to compile such information
at a comprehensive level, but even generally accurate numbers would be
a valuable resource.  If the numbers aren't out there, what would be some
good approaches to discovering them?  Thanks!

-eric


 
Stefan Behnel
Newsgroups: comp.lang.python
From: Stefan Behnel <stefan...@behnel.de>
Date: Sat, 11 Feb 2012 22:28:01 +0100
Local: Sat, Feb 11 2012 4:28 pm
Subject: Re: Python usage numbers
Eric Snow, 11.02.2012 22:02:

> - categories (web, scripts, "big data", computation, etc.)

No numbers, but from where I stand, the four largest areas where Python is used
appear to be (in increasing line length order):

a) web applications
b) scripting and tooling
c) high-performance computation
d) testing (non-Python/embedded/whatever code)

I'm sure others will manage to remind me of the one or two I forgot...

Stefan


 
Andrew Berg
Newsgroups: comp.lang.python
From: Andrew Berg <bahamutzero8...@gmail.com>
Date: Sat, 11 Feb 2012 15:51:49 -0600
Local: Sat, Feb 11 2012 4:51 pm
Subject: Re: Python usage numbers
On 2/11/2012 3:02 PM, Eric Snow wrote:

> I'm thinking about this partly because of the discussion on
> python-ideas about the perceived challenges of Unicode in Python 3.
> For instance, if frameworks (like django and numpy) could completely
> hide the arguable challenges of Unicode in Python 3--and most projects
> were built on top of frameworks--then general efforts for making
> Unicode easier in Python 3 should go toward helping framework writers.

Huh? I'll admit I'm a novice, but isn't Unicode mostly trivial in py3k
compared to 2.x? Or are you referring to porting 2.x to 3.x? I've been
under the impression that Unicode in 2.x can be painful at times, but
easy in 3.x.
I've been using 3.2 and Unicode hasn't been much of an issue.
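A minimal sketch of the Python 3 model behind that impression, assuming the usual habit of decoding bytes once where they enter the program: text (`str`) and raw data (`bytes`) are separate types, and mixing them raises `TypeError` rather than being silently coerced the way Python 2 did. The variable names are illustrative only.

```python
# Raw bytes as they might arrive from a file or socket (UTF-8 assumed).
raw = b"caf\xc3\xa9"

# Decode once, at the input boundary, into text (str).
text = raw.decode("utf-8")
print(len(raw), len(text))   # 5 4

# Python 3 refuses to mix text and bytes implicitly.
try:
    text + raw
    mixed_ok = True
except TypeError as exc:
    mixed_ok = False
    print("TypeError:", exc)

# Encode again only when the text leaves the program.
assert text.encode("utf-8") == raw
```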
--
CPython 3.2.2 | Windows NT 6.1.7601.17640

 
Mark Lawrence
Newsgroups: comp.lang.python
From: Mark Lawrence <breamore...@yahoo.co.uk>
Date: Sat, 11 Feb 2012 22:17:20 +0000
Local: Sat, Feb 11 2012 5:17 pm
Subject: Re: Python usage numbers
On 11/02/2012 21:02, Eric Snow wrote:

As others have said on other Python newsgroups, it ain't a problem.  The
only time I've ever had a problem was with matplotlib, which couldn't
print a £ sign.  I used a u prefix to force Unicode, job done.  If I had a
major problem I reckon that a search on c.l.p would give me an answer
easy peasy.

--
Cheers.

Mark Lawrence.


 
Eric Snow
Newsgroups: comp.lang.python
From: Eric Snow <ericsnowcurren...@gmail.com>
Date: Sat, 11 Feb 2012 18:21:01 -0700
Local: Sat, Feb 11 2012 8:21 pm
Subject: Re: Python usage numbers

On Sat, Feb 11, 2012 at 2:51 PM, Andrew Berg <bahamutzero8...@gmail.com> wrote:
> On 2/11/2012 3:02 PM, Eric Snow wrote:
>> I'm thinking about this partly because of the discussion on
>> python-ideas about the perceived challenges of Unicode in Python 3.

>> For instance, if frameworks (like django and numpy) could completely
>> hide the arguable challenges of Unicode in Python 3--and most projects
>> were built on top of frameworks--then general efforts for making
>> Unicode easier in Python 3 should go toward helping framework writers.
> Huh? I'll admit I'm a novice, but isn't Unicode mostly trivial in py3k
> compared to 2.x? Or are you referring to porting 2.x to 3.x? I've been
> under the impression that Unicode in 2.x can be painful at times, but
> easy in 3.x.
> I've been using 3.2 and Unicode hasn't been much of an issue.

My expectation is that yours is the common experience.  However, in at
least one current thread (on python-ideas) and at a variety of times
in the past, _some_ people have found Unicode in Python 3 to make more
work.  So that got me thinking about whose experience is the
general case, and whether any concerns broadly apply to more than
framework/library writers (like django, jinja, twisted, etc.).  Having
usage statistics would be helpful in identifying the impact of things
like Unicode in Python 3.

-eric


 
Chris Angelico
Newsgroups: comp.lang.python
From: Chris Angelico <ros...@gmail.com>
Date: Sun, 12 Feb 2012 12:28:30 +1100
Local: Sat, Feb 11 2012 8:28 pm
Subject: Re: Python usage numbers

On Sun, Feb 12, 2012 at 12:21 PM, Eric Snow <ericsnowcurren...@gmail.com> wrote:
> However, in at
> least one current thread (on python-ideas) and at a variety of times
> in the past, _some_ people have found Unicode in Python 3 to make more
> work.

If Unicode in Python is causing you more work, isn't it most likely
that the issue would have come up anyway? For instance, suppose you
have a web form and you accept customer names, which you then store in
a database. You could assume that the browser submits it in UTF-8 and
that your database back-end can accept UTF-8, and then pretend that
it's all ASCII, but if you then want to upper-case the name for a
heading, somewhere you're going to need to deal with Unicode; and when
your programming language has facilities like str.upper(), that's
going to make it easier, not harder. Sure, the simple case is easier if
you pretend it's all ASCII, but it's still better to have language
facilities.
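The upper-casing example can be sketched like this, assuming the form data arrives as UTF-8 (the name is made up): `bytes.upper()` only touches the ASCII letters a-z, while `str.upper()` applies Unicode case mapping, including cases where the result gets longer (on Python 3.3 and later).

```python
# A customer name as UTF-8 bytes, e.g. from a web form (hypothetical value).
submitted = "zoë straße".encode("utf-8")

# Pretending it's all ASCII: only a-z are upper-cased, so the
# UTF-8 bytes for 'ë' and 'ß' pass through unchanged.
byte_heading = submitted.upper().decode("utf-8")   # -> 'ZOë STRAßE'

# Decoding first lets str.upper() apply Unicode case rules.
name = submitted.decode("utf-8")
text_heading = name.upper()                          # -> 'ZOË STRASSE'

# 'ß' upper-cases to 'SS', so the heading is one character longer.
print(len(name), len(text_heading))                  # 10 11
```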

ChrisA


 
Eric Snow
Newsgroups: comp.lang.python
From: Eric Snow <ericsnowcurren...@gmail.com>
Date: Sat, 11 Feb 2012 18:38:53 -0700
Local: Sat, Feb 11 2012 8:38 pm
Subject: Re: Python usage numbers

Yeah, that's how I see it too.  However, my sample size is much too
small to have any sense of the broader Python 3 experience.  That's
what I'm going for with those Python usage statistics (if it's even
feasible).

-eric


 
Steven D'Aprano
Newsgroups: comp.lang.python
From: Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info>
Date: 12 Feb 2012 02:23:24 GMT
Local: Sat, Feb 11 2012 9:23 pm
Subject: Re: Python usage numbers

On Sun, 12 Feb 2012 12:28:30 +1100, Chris Angelico wrote:
> On Sun, Feb 12, 2012 at 12:21 PM, Eric Snow
> <ericsnowcurren...@gmail.com> wrote:
>> However, in at
>> least one current thread (on python-ideas) and at a variety of times in
>> the past, _some_ people have found Unicode in Python 3 to make more
>> work.

> If Unicode in Python is causing you more work, isn't it most likely that
> the issue would have come up anyway?

The argument being made is that in Python 2, if you try to read a file
that contains Unicode characters encoded with some unknown codec, you
don't have to think about it. Sure, you get mojibake rubbish in your
database, but that's the fault of people who insist on not being
American. Or who spell Zoe with an umlaut.

In Python 3, if you try the same thing, you get an error. Fixing the
error requires thought, and even if that is only a minuscule amount of
thought, that's too much for some developers who are scared of Unicode.
Hence the FUD that Python 3 is too hard because it makes you learn
Unicode.
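A small sketch of that situation, assuming a file written in Latin-1 but read as UTF-8 (the byte string stands in for the file contents): Python 3 raises `UnicodeDecodeError` instead of passing the bytes through, and naming the right codec fixes it.

```python
# "Zoë" as written by a Latin-1 editor: the ë is the single byte 0xEB.
data = "Zo\u00eb".encode("latin-1")
print(data)                      # b'Zo\xeb'

try:
    data.decode("utf-8")         # 0xEB would start a 3-byte UTF-8 sequence
    decoded_ok = True
except UnicodeDecodeError as exc:
    decoded_ok = False
    print("UnicodeDecodeError:", exc.reason)

# Naming the codec the file was actually written in fixes it.
name = data.decode("latin-1")
```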

I know this isn't exactly helpful, but I wish they'd just HTFU. I'm with
Joel Spolsky on this one: if you're a programmer in 2003 who doesn't have
at least a basic working knowledge of Unicode, you're the equivalent of a
doctor who doesn't believe in germs.

http://www.joelonsoftware.com/articles/Unicode.html

Getting a basic working knowledge of Unicode is not that hard. You don't
need to be an expert, and it's just not that scary.

The use-case given is:

"I have a file containing text. I can open it in an editor and see it's
nearly all ASCII text, except for a few weird and bizarre characters like
£ © ± or ö. In Python 2, I can read that file fine. In Python 3 I get an
error. What should I do that requires no thought?"

Obvious answers:

- Try decoding with UTF8 or Latin1. Even if you don't get the right
characters, you'll get *something*.

- Use open(filename, encoding='ascii', errors='surrogateescape')

(Or possibly errors='ignore'.)
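Both suggestions can be tried on a throwaway file. This is a sketch assuming the "unknown" encoding happens to be UTF-8; the file names are arbitrary. Latin-1 never fails but can give the wrong characters, while `errors='surrogateescape'` carries each undecodable byte through as a lone surrogate, so writing back with the same settings reproduces the original bytes exactly.

```python
import os
import tempfile

# Mostly-ASCII text whose encoding we pretend not to know (it is UTF-8 here).
raw = "Price: \u00a310, r\u00e9sum\u00e9, Zo\u00eb\n".encode("utf-8")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "wb") as f:
        f.write(raw)

    # Latin-1 maps every byte 0-255 to a code point, so it never raises,
    # but the UTF-8 sequences come out as mojibake ('Â£', 'Ã©', 'Ã«').
    with open(path, encoding="latin-1") as f:
        latin1_text = f.read()

    # ASCII + surrogateescape: each byte >= 0x80 becomes a lone surrogate
    # (U+DC80..U+DCFF) instead of raising UnicodeDecodeError.
    with open(path, encoding="ascii", errors="surrogateescape", newline="") as f:
        escaped_text = f.read()

    # Writing with the same settings turns the surrogates back into bytes.
    copy_path = os.path.join(tmp, "copy.txt")
    with open(copy_path, "w", encoding="ascii", errors="surrogateescape",
              newline="") as f:
        f.write(escaped_text)
    with open(copy_path, "rb") as f:
        round_tripped = f.read()

print(round_tripped == raw)   # True
```

`newline=""` keeps Windows from turning "\n" into "\r\n" on the way out, which would spoil the byte-for-byte comparison.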

--
Steven


 
Rick Johnson
Newsgroups: comp.lang.python
From: Rick Johnson <rantingrickjohn...@gmail.com>
Date: Sat, 11 Feb 2012 18:36:52 -0800 (PST)
Local: Sat, Feb 11 2012 9:36 pm
Subject: Re: Python usage numbers
On Feb 11, 8:23 pm, Steven D'Aprano <steve

That's not the worst of it... I have many times had a block of text
that was valid ASCII except for some intermixed Unicode white-space.
Who the hell would even consider inserting Unicode white-space!!!
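Stray Unicode whitespace is easy to find and normalise once the data is decoded. A minimal sketch with a made-up sample string: `str.isspace()` and argument-less `str.split()` treat characters such as NO-BREAK SPACE (U+00A0) and EM SPACE (U+2003) as whitespace, whereas `bytes.split()` only recognises ASCII whitespace.

```python
import unicodedata

# Mostly ASCII text with a no-break space and an em space mixed in.
text = "total\u00a0=\u20033 items"

# List the non-ASCII whitespace characters and where they occur.
odd_spaces = [(i, unicodedata.name(c)) for i, c in enumerate(text)
              if c.isspace() and ord(c) > 127]
print(odd_spaces)   # [(5, 'NO-BREAK SPACE'), (7, 'EM SPACE')]

# str.split() with no arguments splits on any Unicode whitespace...
cleaned = " ".join(text.split())           # -> 'total = 3 items'

# ...but splitting the UTF-8 bytes only recognises ASCII whitespace.
print(text.encode("utf-8").split())
```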

> "I have a file containing text. I can open it in an editor and see it's
> nearly all ASCII text, except for a few weird and bizarre characters like
> £ © ± or ö. In Python 2, I can read that file fine. In Python 3 I get an
> error. What should I do that requires no thought?"

> Obvious answers:

the most obvious answer would be to read the file WITHOUT worrying
about asinine encoding.

 
Chris Angelico
Newsgroups: comp.lang.python
From: Chris Angelico <ros...@gmail.com>
Date: Sun, 12 Feb 2012 15:38:37 +1100
Local: Sat, Feb 11 2012 11:38 pm
Subject: Re: Python usage numbers
On Sun, Feb 12, 2012 at 1:36 PM, Rick Johnson
<rantingrickjohn...@gmail.com> wrote:
> On Feb 11, 8:23 pm, Steven D'Aprano <steve
> +comp.lang.pyt...@pearwood.info> wrote:
>> "I have a file containing text. I can open it in an editor and see it's
>> nearly all ASCII text, except for a few weird and bizarre characters like
>> £ © ± or ö. In Python 2, I can read that file fine. In Python 3 I get an
>> error. What should I do that requires no thought?"

>> Obvious answers:

> the most obvious answer would be to read the file WITHOUT worrying
> about asinine encoding.

What this statement misunderstands, though, is that ASCII is itself an
encoding. Files contain bytes, and it's only what's external to those
bytes that gives them meaning. The famous "bush hid the facts" trick
with Windows Notepad shows the folly of trying to use internal
evidence to identify meaning from bytes.
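The "bush hid the facts" problem comes down to the same bytes being valid under more than one encoding. This sketch shows that the 18 ASCII bytes also decode without error as UTF-16LE, into nine CJK ideographs; nothing in the bytes themselves says which reading was intended. It does not reproduce Notepad's actual detection heuristic.

```python
data = b"bush hid the facts"         # 18 bytes, valid ASCII

as_ascii = data.decode("ascii")       # the intended reading
as_utf16 = data.decode("utf-16-le")   # also valid: each byte pair -> one code point

print(len(as_ascii), len(as_utf16))   # 18 9
print([hex(ord(c)) for c in as_utf16][:3])   # ['0x7562', '0x6873', '0x6820']
```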

Everything that displays text to a human needs to translate bytes into
glyphs, and the usual way to do this conceptually is to go via
characters. Pretending that it's all the same thing really means
pretending that one byte represents one character and that each
character is depicted by one glyph. And that's doomed to failure,
unless everyone speaks English with no foreign symbols - so, no
mathematical notations.
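Both halves of that point, one byte is not one character and one character is not one glyph, can be checked with the standard library. In this minimal sketch the same visible "é" is either one code point or two (a base letter plus a combining accent), its byte count depends on the encoding, and `unicodedata.normalize` converts between the two forms.

```python
import unicodedata

precomposed = "\u00e9"       # é as a single code point
decomposed = "e\u0301"       # e + COMBINING ACUTE ACCENT, renders the same

print(len(precomposed), len(decomposed))   # 1 2
print(precomposed == decomposed)           # False

# Bytes per character depend on the encoding, not on the character.
print(len(precomposed.encode("utf-8")), len(precomposed.encode("latin-1")))   # 2 1

# Normalising to NFC composes the pair back into one code point.
print(unicodedata.normalize("NFC", decomposed) == precomposed)   # True
```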

ChrisA


 
Steven D'Aprano
Newsgroups: comp.lang.python
From: Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info>
Date: 12 Feb 2012 05:51:03 GMT
Local: Sun, Feb 12 2012 12:51 am
Subject: Re: Python usage numbers

On Sun, 12 Feb 2012 15:38:37 +1100, Chris Angelico wrote:
> Everything that displays text to a human needs to translate bytes into
> glyphs, and the usual way to do this conceptually is to go via
> characters. Pretending that it's all the same thing really means
> pretending that one byte represents one character and that each
> character is depicted by one glyph. And that's doomed to failure, unless
> everyone speaks English with no foreign symbols - so, no mathematical
> notations.

Pardon me, but you can't even write *English* in ASCII.

You can't say that it cost you £10 to courier your résumé to the head
office of Encyclopædia Britannica to apply for the position of Staff
Coördinator. (Admittedly, the umlaut on the second "o" looks a bit stuffy
and old-fashioned, but it is traditional English.)

Hell, you can't even write in *American*: you can't say that the recipe
for the 20¢ WobblyBurger™ is © 2012 WobblyBurgerWorld Inc.
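Those sentences make a handy test case. This sketch uses a sample loosely based on the examples above, lists every character outside ASCII with its Unicode name, and shows that encoding to ASCII fails at the first one.

```python
import unicodedata

# Sample text loosely based on the examples above.
sample = ("It cost £10 to courier my résumé to Encyclopædia Britannica "
          "for the Staff Coördinator job; the 20¢ WobblyBurger™ is © 2012.")

# Every character above code point 127 lies outside ASCII.
non_ascii = sorted({c for c in sample if ord(c) > 127})
for c in non_ascii:
    print(f"U+{ord(c):04X} {unicodedata.name(c)}")

# Encoding to ASCII fails at the first such character.
try:
    sample.encode("ascii")
    first_bad = None
except UnicodeEncodeError as exc:
    first_bad = exc.start
print(first_bad)   # 8
```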

ASCII truly is a blight on the world, and the sooner it fades into
obscurity, like EBCDIC, the better.

Even if everyone did change to speak ASCII, you still have all the
historical records and documents and files to deal with. Encodings are
not going away.

--
Steven


 
Chris Angelico
Newsgroups: comp.lang.python
From: Chris Angelico <ros...@gmail.com>
Date: Sun, 12 Feb 2012 17:08:24 +1100
Local: Sun, Feb 12 2012 1:08 am
Subject: Re: Python usage numbers
On Sun, Feb 12, 2012 at 4:51 PM, Steven D'Aprano
<steve+comp.lang.pyt...@pearwood.info> wrote:
> You can't say that it cost you £10 to courier your résumé to the head
> office of Encyclopædia Britannica to apply for the position of Staff
> Coördinator.

True, but if it cost you $10 (or 10 GBP) to courier your curriculum
vitae to the head office of Encyclopaedia Britannica to become Staff
Coordinator, then you'd be fine. And if it cost you $10 to post your
work summary to Britannica's administration to apply for this Staff
Coordinator position, you could say it without 'e' too. Doesn't mean
you don't need Unicode!

ChrisA


 
Steven D'Aprano
Newsgroups: comp.lang.python
From: Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info>
Date: 12 Feb 2012 06:10:20 GMT
Local: Sun, Feb 12 2012 1:10 am
Subject: Re: Python usage numbers

On Sat, 11 Feb 2012 18:36:52 -0800, Rick Johnson wrote:
>> "I have a file containing text. I can open it in an editor and see it's
>> nearly all ASCII text, except for a few weird and bizarre characters
>> like £ © ± or ö. In Python 2, I can read that file fine. In Python 3 I
>> get an error. What should I do that requires no thought?"

>> Obvious answers:

> the most obvious answer would be to read the file WITHOUT worrying about
> asinine encoding.

Your mad leet reading comprehension skillz leave me in awe, Rick.

If you try to read a file containing non-ASCII characters encoded using
UTF8 on Windows without explicitly specifying either UTF8 as the
encoding, or an error handler, you will get an exception.
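A sketch of what can happen on a Western-European Windows machine, assuming its default code page is cp1252 (the codec is passed explicitly so the example behaves the same everywhere). Whether you see an exception or silent mojibake depends on which bytes the file contains, because cp1252 leaves only a handful of byte values undefined. "Ýmir" is just a convenient sample word.

```python
# UTF-8 encoded text, as written by a UTF-8-aware editor.
ok_bytes = "caf\u00e9".encode("utf-8")     # b'caf\xc3\xa9'
bad_bytes = "\u00ddmir".encode("utf-8")    # 'Ýmir' -> b'\xc3\x9dmir'

# Decoding as cp1252 maps each byte to some character...
mojibake = ok_bytes.decode("cp1252")       # -> 'cafÃ©', no error raised

# ...except the few byte values cp1252 leaves undefined, such as 0x9D.
try:
    bad_bytes.decode("cp1252")
    raised = False
except UnicodeDecodeError:
    raised = True

print(raised)                              # True
```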

It's not just UTF8 either, but nearly all encodings. You can't even
expect to avoid problems if you stick to nothing but Windows, because
Windows' default encoding is localised: a file generated in (say) Israel
or Japan or Germany will use a different code page (encoding) by default
than one generated in (say) the US, Canada or UK.
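The localisation point can be shown with a single byte. This sketch decodes 0xE9 under three Windows code pages (cp1252 Western, cp1251 Cyrillic, cp1255 Hebrew), then prints the locale-dependent default that Python 3's `open()` falls back to when no encoding is given; that last value depends on the machine running it.

```python
import locale
import unicodedata

# One byte, three Windows code pages, three different characters.
for codec in ("cp1252", "cp1251", "cp1255"):
    char = b"\xe9".decode(codec)
    print(codec, unicodedata.name(char))
# cp1252 LATIN SMALL LETTER E WITH ACUTE
# cp1251 CYRILLIC SMALL LETTER SHORT I
# cp1255 HEBREW LETTER YOD

# The machine-dependent default used when open() gets no encoding argument.
print(locale.getpreferredencoding(False))
```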

--
Steven


 
Andrew Berg
Newsgroups: comp.lang.python
From: Andrew Berg <bahamutzero8...@gmail.com>
Date: Sun, 12 Feb 2012 01:05:35 -0600
Local: Sun, Feb 12 2012 2:05 am
Subject: Re: Python usage numbers
On 2/12/2012 12:10 AM, Steven D'Aprano wrote:
> It's not just UTF8 either, but nearly all encodings. You can't even
> expect to avoid problems if you stick to nothing but Windows, because
> Windows' default encoding is localised: a file generated in (say) Israel
> or Japan or Germany will use a different code page (encoding) by default
> than one generated in (say) the US, Canada or UK.

Generated by what? Windows will store a locale value for programs to
use, but programs use Unicode internally by default (i.e., API calls are
Unicode unless they were built for old versions of Windows), and the
default filesystem (NTFS) uses Unicode for file names. AFAIK, only the
terminal has a localized code page by default.
Perhaps Notepad will write text files with the localized code page by
default, but that's an application choice...

--
CPython 3.2.2 | Windows NT 6.1.7601.17640


 
Matej Cepl
Newsgroups: comp.lang.python
From: Matej Cepl <mc...@redhat.com>
Date: Sun, 12 Feb 2012 09:14:44 +0100
Local: Sun, Feb 12 2012 3:14 am
Subject: Re: Python usage numbers
On 12.2.2012 03:23, Steven D'Aprano wrote:

> The use-case given is:

> "I have a file containing text. I can open it in an editor and see it's
> nearly all ASCII text, except for a few weird and bizarre characters like
> £ © ± or ö. In Python 2, I can read that file fine. In Python 3 I get an
> error. What should I do that requires no thought?"

> Obvious answers:

> - Try decoding with UTF8 or Latin1. Even if you don't get the right
> characters, you'll get *something*.

> - Use open(filename, encoding='ascii', errors='surrogateescape')

> (Or possibly errors='ignore'.)

These are not good answers, IMHO. The only answer I can think of, really, is:

- pack your luggage, your submarine is waiting for you to peel onions in it
(with reference to Joel's article). Meaning, really, you should
learn your craft and pull your head out of the sand. There is a wider
world around you.

(and yes, I am Czech, so I need at least latin-2 for my language).

Best,

Matěj


 
Matej Cepl
Newsgroups: comp.lang.python
From: Matej Cepl <mc...@redhat.com>
Date: Sun, 12 Feb 2012 09:26:57 +0100
Local: Sun, Feb 12 2012 3:26 am
Subject: Re: Python usage numbers
On 12.2.2012 09:14, Matej Cepl wrote:

>> Obvious answers:

>> - Try decoding with UTF8 or Latin1. Even if you don't get the right
>> characters, you'll get *something*.

>> - Use open(filename, encoding='ascii', errors='surrogateescape')

>> (Or possibly errors='ignore'.)

> These are not good answers, IMHO. The only answer I can think of, really,
> is:

A slightly less flameish answer to the question “What should I do,
really?” is a tough one: all these suggested answers are bad because
they don’t deal with the fact that your input data are obviously
broken. The rest is just pure GIGO … without fixing (and I mean, really,
fixing, not ignoring the problem, which is what the previous answers
suggest) your input, you’ll get garbage on output. And you should be
thankful to py3k that it showed you the issue.
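Fixing the input starts with finding where it is broken. A minimal sketch, with a hypothetical helper `find_bad_bytes`: decode strictly and report the position and reason carried by the exception, instead of throwing the evidence away with `errors='ignore'`.

```python
from typing import Optional, Tuple


def find_bad_bytes(data: bytes, encoding: str = "utf-8") -> Optional[Tuple[int, int, str]]:
    """Return (start, end, reason) for the first undecodable span, or None."""
    try:
        data.decode(encoding)          # 'strict' is the default error handler
    except UnicodeDecodeError as exc:
        return exc.start, exc.end, exc.reason
    return None


good = "Příliš žluťoučký kůň".encode("utf-8")
broken = good[:3] + b"\xff" + good[3:]   # inject a byte that is never valid UTF-8

print(find_bad_bytes(good))     # None
print(find_bad_bytes(broken))   # (3, 4, 'invalid start byte')
```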

BTW, can you display the following line?

Příliš žluťoučký kůň úpěl ďábelské ódy.

Best,

Matěj


 
Steven D'Aprano
Newsgroups: comp.lang.python
From: Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info>
Date: 12 Feb 2012 09:12:57 GMT
Local: Sun, Feb 12 2012 4:12 am
Subject: Re: Python usage numbers

On Sun, 12 Feb 2012 01:05:35 -0600, Andrew Berg wrote:
> On 2/12/2012 12:10 AM, Steven D'Aprano wrote:
>> It's not just UTF8 either, but nearly all encodings. You can't even
>> expect to avoid problems if you stick to nothing but Windows, because
>> Windows' default encoding is localised: a file generated in (say)
>> Israel or Japan or Germany will use a different code page (encoding) by
>> default than one generated in (say) the US, Canada or UK.
> Generated by what? Windows will store a locale value for programs to
> use, but programs use Unicode internally by default

Which programs? And we're not talking about what they use internally, but
what they write to files.

> (i.e., API calls are
> Unicode unless they were built for old versions of Windows), and the
> default filesystem (NTFS) uses Unicode for file names.

No. File systems do not use Unicode for file names. Unicode is an
abstract mapping between code points and characters. File systems are
written using bytes.

Suppose you're a fan of Russian punk band Наӥв and you have a directory
of their music. The file system doesn't store the Unicode code points
1053 1072 1253 1074, it has to be encoded to a sequence of bytes first.

NTFS by default uses the UTF-16 encoding, which means the actual bytes
written to disk are \x1d\x040\x04\xe5\x042\x04 (possibly with a leading
byte-order mark \xff\xfe).

Windows has two separate APIs, one for "wide" characters, the other for
single bytes. Depending on which one you use, the directory will appear
to be called Наӥв or 0å2.
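The byte sequence above can be reproduced in Python. This sketch encodes the name as UTF-16 little-endian (the encoding Steven describes NTFS using) and checks the code points he lists. Purely to show where a string like "0å2" could come from, it then reads the same bytes one per character as Latin-1 and drops the control bytes. That last step is not a model of what the Windows single-byte API actually does.

```python
name = "\u041d\u0430\u04e5\u0432"      # Наӥв

print([ord(c) for c in name])          # [1053, 1072, 1253, 1074]

utf16 = name.encode("utf-16-le")       # UTF-16LE, without a byte-order mark
print(utf16)                           # b'\x1d\x040\x04\xe5\x042\x04'

# The plain "utf-16" codec prepends a BOM in the machine's native byte order.
print(name.encode("utf-16")[:2] in (b"\xff\xfe", b"\xfe\xff"))   # True

# Reading the same bytes one per character and keeping the printable ones.
as_single_bytes = "".join(c for c in utf16.decode("latin-1") if c.isprintable())
# as_single_bytes -> '0å2'
```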

But in any case, we're not talking about the file name encoding. We're
talking about the contents of files.

> AFAIK, only the
> terminal has a localized code page by default. Perhaps Notepad will
> write text files with the localized code page by default, but that's an
> application choice...

Exactly. And unless you know what encoding the application chooses, you
will likely get an exception trying to read the file.

--
Steven


 
Andrew Berg
Newsgroups: comp.lang.python
From: Andrew Berg <bahamutzero8...@gmail.com>
Date: Sun, 12 Feb 2012 05:11:30 -0600
Local: Sun, Feb 12 2012 6:11 am
Subject: Re: Python usage numbers
On 2/12/2012 3:12 AM, Steven D'Aprano wrote:
> NTFS by default uses the UTF-16 encoding, which means the actual bytes
> written to disk are \x1d\x040\x04\xe5\x042\x04 (possibly with a leading
> byte-order mark \xff\xfe).

That's what I meant. Those bytes will be interpreted consistently across
all locales.

> Windows has two separate APIs, one for "wide" characters, the other for
> single bytes. Depending on which one you use, the directory will appear
> to be called Наӥв or 0å2.

Yes, and AFAIK, the wide API is the default. The other one only exists
to support programs that don't support the wide API (generally, such
programs were intended to be used on older platforms that lack that API).

> But in any case, we're not talking about the file name encoding. We're
> talking about the contents of files.

Okay then. As I stated, this has nothing to do with the OS since
programs are free to interpret bytes any way they like.

--
CPython 3.2.2 | Windows NT 6.1.7601.17640


 
Mark Lawrence
Newsgroups: comp.lang.python
From: Mark Lawrence <breamore...@yahoo.co.uk>
Date: Sun, 12 Feb 2012 12:11:01 +0000
Local: Sun, Feb 12 2012 7:11 am
Subject: Re: Python usage numbers
On 12/02/2012 08:26, Matej Cepl wrote:

Yes, in Thunderbird, Notepad, Wordpad and Notepad++ on Windows Vista,
can't be bothered to try any other apps.

--
Cheers.

Mark Lawrence.


 
Roy Smith
Newsgroups: comp.lang.python
From: Roy Smith <r...@panix.com>
Date: Sun, 12 Feb 2012 10:13:07 -0500
Local: Sun, Feb 12 2012 10:13 am
Subject: Re: Python usage numbers
In article <mailman.5715.1329021524.27778.python-l...@python.org>,
 Chris Angelico <ros...@gmail.com> wrote:

Exactly.  <soapbox class="wise-old-geezer">.  ASCII was so successful at
becoming a universal standard, one which lasted for decades, that people
who grew up with it don't realize there was once any other way.  Not just
EBCDIC, but also SIXBIT, RAD-50, tilt/rotate, packed card records, and so
on.  Transcoding was a way of life, and if you didn't know what you were
starting with and aiming for, it was hopeless.  Kind of like where we are
again now with Unicode.  </soapbox>
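A small taste of that transcoding, using the `cp037` codec that ships with Python. It is one of several EBCDIC code pages, so picking it here is an assumption. The same letters have entirely different byte values, and converting means decoding from one encoding and encoding to the other.

```python
ebcdic = "HELLO".encode("cp037")
print(ebcdic)                          # b'\xc8\xc5\xd3\xd3\xd6'

# Transcode: decode from the source encoding, encode to the target one.
ascii_bytes = ebcdic.decode("cp037").encode("ascii")
print(ascii_bytes)                     # b'HELLO'

# Guess the source encoding wrong and the bytes mean nothing useful.
print(ebcdic.decode("latin-1") == "HELLO")   # False
```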

 
Roy Smith
Newsgroups: comp.lang.python
From: Roy Smith <r...@panix.com>
Date: Sun, 12 Feb 2012 10:48:36 -0500
Local: Sun, Feb 12 2012 10:48 am
Subject: Re: Python usage numbers
In article <4f375347$0$29986$c3e8da3$54964...@news.astraweb.com>,
 Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> wrote:

> ASCII truly is a blight on the world, and the sooner it fades into
> obscurity, like EBCDIC, the better.

That's a fair statement, but it's also fair to say that at the time it
came out (49 years ago!) it was a revolutionary improvement on the
extant state of affairs (every manufacturer inventing their own code,
and often different codes for different machines).  Given the cost of
both computer memory and CPU cycles at the time, sticking to a 7-bit
code (the 8th bit was for parity) was a necessary evil.
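The parity bit can be sketched as follows. Systems differed on even versus odd parity; this example assumes even parity, with the eighth bit set so that every transmitted byte has an even number of 1 bits. The function names are illustrative.

```python
def add_even_parity(code: int) -> int:
    """Put an even-parity bit in bit 7 of a 7-bit ASCII code."""
    if not 0 <= code < 0x80:
        raise ValueError("not a 7-bit code")
    ones = bin(code).count("1")
    return code | 0x80 if ones % 2 else code


def parity_ok(byte: int) -> bool:
    """True if the received byte still has an even number of 1 bits."""
    return bin(byte).count("1") % 2 == 0


framed = [add_even_parity(ord(c)) for c in "CAT"]
print([hex(b) for b in framed])              # ['0xc3', '0x41', '0xd4']
print(all(parity_ok(b) for b in framed))     # True
print(parity_ok(framed[0] ^ 0x01))           # False: a single flipped bit is caught
```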

As Steven D'Aprano pointed out, it was missing some commonly used US
symbols such as ¢ or ©.  This was a small price to pay for the
simplicity ASCII afforded.  It wasn't a bad encoding.  It was a very good
encoding.  But the world has moved on and computing hardware has become
cheap enough that supporting richer encodings and character sets is
realistic.

And, before people complain about the character set being US-Centric,
keep in mind that the A in ASCII stands for American, and it was
published by ANSI (whose A also stands for American).  I'm not trying to
wave the flag here, just pointing out that it was never intended to be
anything other than a national character set.

Part of the complexity of Unicode is that when people switch from
working with ASCII to working with Unicode, they're really having to
master two distinct things at the same time (and often conflate them
into a single confusing mess).  One is the Unicode character set.  The
other is a specific encoding (UTF-8, UTF-16, etc).  Not to mention silly
things like BOM (Byte Order Mark).  I expect that some day, storage
costs will become so cheap that we'll all just be using UTF-32, and
programmers of the day will wonder how their poor parents and
grandparents ever managed in a world where nobody quite knew what you
meant when you asked, "how long is that string?".
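"How long is that string?" really does have several defensible answers. This sketch assumes Python 3.3 or later, where `len()` counts code points on every build. It measures one short string in code points and in bytes under several encodings; note that the plain `utf-16` and `utf-32` codecs also prepend a BOM.

```python
# Z, o, e-with-diaeresis, euro sign, MUSICAL SYMBOL G CLEF (outside the BMP)
s = "Zo\u00eb\u20ac\U0001d11e"

print("code points:", len(s))        # code points: 5

for codec in ("utf-8", "utf-16-le", "utf-16", "utf-32-le", "utf-32"):
    print(codec, len(s.encode(codec)))
# utf-8 11 | utf-16-le 12 (the clef needs a surrogate pair) |
# utf-16 14 and utf-32 24 (both include a BOM) | utf-32-le 20

# Even UTF-32 counts code points, not what a reader sees as characters.
print(len("e\u0301"))                # 2, although it displays as one letter
```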


 
Dan Sommers
Newsgroups: comp.lang.python
From: Dan Sommers <d...@tombstonezero.net>
Date: Sun, 12 Feb 2012 15:55:17 +0000 (UTC)
Local: Sun, Feb 12 2012 10:55 am
Subject: Re: Python usage numbers

On Sun, 12 Feb 2012 17:08:24 +1100, Chris Angelico wrote:
> On Sun, Feb 12, 2012 at 4:51 PM, Steven D'Aprano
> <steve+comp.lang.pyt...@pearwood.info> wrote:
>> You can't say that it cost you £10 to courier your résumé to the head
>> office of Encyclopædia Britannica to apply for the position of Staff
>> Coördinator.

> True, but if it cost you $10 (or 10 GBP) to courier your curriculum
> vitae to the head office of Encyclopaedia Britannica to become Staff
> Coordinator, then you'd be fine. And if it cost you $10 to post your
> work summary to Britannica's administration to apply for this Staff
> Coordinator position, you could say it without 'e' too. Doesn't mean you
> don't need Unicode!
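Chris rewrites Steven's sentence into ASCII by hand. A common mechanical shortcut is to decompose accented letters with Unicode NFKD normalization and then drop anything that isn't ASCII. As this Python 3 sketch shows, that handles the accents but silently loses "æ" (which has no decomposition) and "£". That is why spelling them "ae" or "GBP" still takes a human or a hand-made table. The helper name `naive_ascii` is hypothetical.

```python
import unicodedata


def naive_ascii(text: str) -> str:
    """Hypothetical helper: split accented letters into base letter + accent, then drop non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", errors="ignore").decode("ascii")


for word in ["résumé", "Coördinator", "Encyclopædia", "£10"]:
    print(f"{word} -> {naive_ascii(word)}")
# résumé -> resume
# Coördinator -> Coordinator
# Encyclopædia -> Encyclopdia
# £10 -> 10
```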

Back in the late 1970's, the economy and the outlook in the USA sucked,
and the following joke made the rounds:

Mr. Smith:  Good morning, Mr. Jones.  How are you?

Mr. Jones:  I'm fine.

(The humor is that Mr. Jones had his head so far [in the sand] that he
thought that things were fine.)

American English is my first spoken language, but I know enough French,
Greek, math, and other languages that I am very happy to have more than
ASCII these days.  I imagine that even Steven's surname should be spelled
D’Aprano rather than D'Aprano.

Dan
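Dan's point about D’Aprano versus D'Aprano is more than typography. The two apostrophes are different code points, so naive string comparison or search treats the names as different. This Python 3 sketch checks them with the standard `unicodedata` module. The explicit replacement at the end is just one simple way to fold them together.

```python
import unicodedata

curly = "D\u2019Aprano"     # D’Aprano, with a typographic apostrophe
straight = "D'Aprano"       # plain ASCII apostrophe

print(curly == straight)    # False
for ch in (curly[1], straight[1]):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+2019 RIGHT SINGLE QUOTATION MARK
# U+0027 APOSTROPHE

# Comparisons or searches that should treat them alike need an explicit mapping.
print(curly.replace("\u2019", "'") == straight)  # True
```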


 
rusi  
Newsgroups: comp.lang.python
From: rusi <rustompm...@gmail.com>
Date: Sun, 12 Feb 2012 08:50:28 -0800 (PST)
Local: Sun, Feb 12 2012 11:50 am
Subject: Re: Python usage numbers
On Feb 12, 10:51 am, Steven D'Aprano <steve

[Quite OT but...] How do you type all this?
[Note: I grew up on APL so unlike Rick I am genuinely asking :-) ]

 
Roy Smith  
Newsgroups: comp.lang.python
From: Roy Smith <r...@panix.com>
Date: Sun, 12 Feb 2012 12:11:46 -0500
Local: Sun, Feb 12 2012 12:11 pm
Subject: Re: Python usage numbers
In article <mailman.5730.1329065268.27778.python-l...@python.org>,
 Dennis Lee Bieber <wlfr...@ix.netcom.com> wrote:

> On Sun, 12 Feb 2012 10:48:36 -0500, Roy Smith <r...@panix.com> wrote:

> >As Steven D'Aprano pointed out, it was missing some commonly used US
> >symbols such as ¢ or ©.

That's interesting.  When I wrote that, it showed on my screen as a cent
symbol and a copyright symbol.  What I see in your response is an upper
case "A" with a hat accent (circumflex?) over it followed by a cent
symbol, and likewise an upper case "A" with a hat accent over it
followed by copyright symbol.

Oh, for the days of ASCII again :-)
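What Roy describes is the classic symptom of UTF-8 text being decoded as Latin-1 (or a similar single-byte code page) somewhere along the way. That is an inference; the thread doesn't say which software did it. Here is a minimal Python 3 reproduction:

```python
original = "¢ ©"

# Sent as UTF-8: ¢ (U+00A2) -> C2 A2, © (U+00A9) -> C2 A9.
wire = original.encode("utf-8")
print(wire)                  # b'\xc2\xa2 \xc2\xa9'

# A reader assuming Latin-1 shows each byte as its own character; 0xC2 is "Â".
garbled = wire.decode("latin-1")
print(garbled)               # Â¢ Â©

# If no bytes were altered along the way, the mistake is reversible.
print(garbled.encode("latin-1").decode("utf-8") == original)  # True
```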

Not to mention, of course, that I wrote <colon><dash><close-paren>, but
I fully expect some of you will be reading this with absurd clients
which turn that into some kind of smiley-face image.

>    Any volunteers to create an Extended Baudot... Instead of "letter
> shift" and "number shift" we could have a generic "encoding shift" which
> uses the following characters to identify which 7-bit subset of Unicode
> is to be represented <G>

I think that's called UTF-8.
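The joke holds up in one respect: as with Baudot's shift characters, a UTF-8 byte tells the reader how to interpret what follows. Unlike Baudot, UTF-8 has no persistent shift state. Each character's first byte says how many continuation bytes (`10xxxxxx`) follow, and 7-bit ASCII bytes pass through unchanged. This Python 3 sketch decodes the length prefix. The helper name is hypothetical, and it is simplified: it does not reject every invalid lead byte.

```python
def utf8_length_from_lead_byte(lead: int) -> int:
    """Hypothetical helper: read a UTF-8 sequence's length from its first byte.

    Simplified: it does not reject every invalid lead byte (e.g. 0xC0 or 0xF5).
    """
    if lead < 0x80:
        return 1        # 0xxxxxxx: plain 7-bit ASCII, unchanged
    if lead >> 5 == 0b110:
        return 2        # 110xxxxx, then one 10xxxxxx continuation byte
    if lead >> 4 == 0b1110:
        return 3        # 1110xxxx, then two continuation bytes
    if lead >> 3 == 0b11110:
        return 4        # 11110xxx, then three continuation bytes
    raise ValueError(f"0x{lead:02X} is a continuation byte or invalid lead byte")


for ch in ["A", "¢", "€", "😀"]:
    encoded = ch.encode("utf-8")
    bits = " ".join(format(b, "08b") for b in encoded)
    assert utf8_length_from_lead_byte(encoded[0]) == len(encoded)
    print(f"U+{ord(ch):04X}: {bits}")
# U+0041: 01000001
# U+00A2: 11000010 10100010
# U+20AC: 11100010 10000010 10101100
# U+1F600: 11110000 10011111 10011000 10000000
```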

 
Roy Smith  
Newsgroups: comp.lang.python
From: Roy Smith <r...@panix.com>
Date: Sun, 12 Feb 2012 12:21:27 -0500
Local: Sun, Feb 12 2012 12:21 pm
Subject: Re: Python usage numbers
In article
<e7f457b3-7d49-4c95-bd95-e0f27fa66...@s8g2000pbj.googlegroups.com>,

What I do (on a Mac) is open the Keyboard Viewer thingie and try various
combinations of shift-control-option-command-function until the thing
I'm looking for shows up on a keycap.  A few of them I've got memorized
(for example, option-8 gets you a bullet •).  I would imagine if you
commonly type in a language other than English, you would quickly
memorize the ones you use a lot.

Or, open the Character Viewer thingie and either hunt around the various
drill-down menus (North American Scripts / Canadian Aboriginal
Syllabics, for example) or type in some guess at the official Unicode
name into the search box.
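Inside Python code there is a related route that echoes Roy's "type in the official Unicode name" suggestion. Python 3 can produce a character from its Unicode name, either in a string literal or at runtime through the standard `unicodedata` module. A short sketch follows; the character names are real Unicode names, and the variable names are arbitrary.

```python
import unicodedata

# In source code, \N{...} spells a character by its official Unicode name.
bullet = "\N{BULLET}"
cent = "\N{CENT SIGN}"

# At runtime, lookup() maps a name to a character and name() goes the other way.
copyright_sign = unicodedata.lookup("COPYRIGHT SIGN")
print(bullet, cent, copyright_sign)   # • ¢ ©
print(unicodedata.name("é"))          # LATIN SMALL LETTER E WITH ACUTE

# A wrong guess at the name raises KeyError, much like an empty search result.
try:
    unicodedata.lookup("COPYRIGHT SYMBOL")
except KeyError as exc:
    print("no match:", exc)
```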


 