From: Paul Rubin <no.em...@nospam.invalid>
Newsgroups: comp.lang.python
Subject: Re: Article on the future of Python
Date: Wed, 26 Sep 2012 10:32:40 -0700
Message-ID: <7xipb0g05j.fsf@ruckus.brouhaha.com>
Organization: Nightsong/Fort GNOX
Chris Angelico <ros...@gmail.com> writes:
> So, I don't actually have any stats for you, because it's really easy
> to just not index strings at all.
Right, that's why I think the O(n) indexing issue of UTF-8 may be
overblown. Haskell 98 was mentioned earlier as a language that did
Unicode "correctly", but its strings are linked lists of code points.
They are a performance pig, to be sure, but the O(n) indexing is usually
not the bottleneck. These days there is a "Text" module that I think is
basically UTF-16 arrays. I have been meaning to find out what happens
with non-BMP characters.
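
To make the two costs concrete, here is a rough Python 3 sketch. The
helper names utf8_index and utf16_units are made up for illustration and
are not from any library. The first finds the i-th code point in UTF-8
bytes by scanning from the start, which is the O(n) indexing under
discussion. The second counts 16-bit code units, showing that a non-BMP
character takes two of them (a surrogate pair) in UTF-16, so an array of
code units cannot be indexed by code point in constant time either.

```python
def utf8_index(data: bytes, i: int) -> str:
    """Return code point number i from UTF-8 bytes by a linear scan (O(n))."""
    if i < 0:
        raise IndexError(i)
    count = -1      # index of the code point currently being scanned
    start = None    # byte offset where code point i begins, once found
    for pos, byte in enumerate(data):
        # Continuation bytes look like 0b10xxxxxx; any other byte
        # starts a new code point.
        if byte & 0xC0 != 0x80:
            count += 1
            if count == i:
                start = pos
            elif count == i + 1:
                return data[start:pos].decode("utf-8")
    if start is None:
        raise IndexError(i)
    # Code point i was the last one in the buffer.
    return data[start:].decode("utf-8")


def utf16_units(s: str) -> int:
    """Number of 16-bit code units needed to encode s as UTF-16."""
    return len(s.encode("utf-16-le")) // 2


if __name__ == "__main__":
    # 'a', e-acute, euro sign, musical G clef (non-BMP), 'z'
    s = "a\u00e9\u20ac\U0001D11Ez"
    data = s.encode("utf-8")
    print(repr(utf8_index(data, 3)))   # the non-BMP character
    print(utf16_units("\U0001D11E"))   # code units for one non-BMP character
    print(utf16_units(s))              # code units for the whole string
```

This only illustrates the encodings themselves. It says nothing about
how Haskell's Text or any particular Python build actually stores or
indexes strings.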