Received: by 10.231.10.198 with SMTP id q6mr1806514ibq.3.1291059011705; Mon, 29 Nov 2010 11:30:11 -0800 (PST) X-BeenThere: shedskin-discuss@googlegroups.com Received: by 10.231.76.165 with SMTP id c37ls2100478ibk.3.p; Mon, 29 Nov 2010 11:30:11 -0800 (PST) Received: by 10.231.35.131 with SMTP id p3mr1745851ibd.11.1291059011112; Mon, 29 Nov 2010 11:30:11 -0800 (PST) Received: by 10.231.35.131 with SMTP id p3mr1745848ibd.11.1291059010741; Mon, 29 Nov 2010 11:30:10 -0800 (PST) Return-Path: Received: from mail-iw0-f178.google.com (mail-iw0-f178.google.com [209.85.214.178]) by gmr-mx.google.com with ESMTP id m30si1627137ibu.6.2010.11.29.11.30.09; Mon, 29 Nov 2010 11:30:09 -0800 (PST) Received-SPF: pass (google.com: domain of mark.duf...@gmail.com designates 209.85.214.178 as permitted sender) client-ip=209.85.214.178; Authentication-Results: gmr-mx.google.com; spf=pass (google.com: domain of mark.duf...@gmail.com designates 209.85.214.178 as permitted sender) smtp.mail=mark.duf...@gmail.com; dkim=pass (test mode) header...@gmail.com Received: by iwn1 with SMTP id 1so6169108iwn.9 for ; Mon, 29 Nov 2010 11:30:09 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:content-type; bh=lMEcsEYqZhBe10EPCCsGD5QNRXvcx/3V2j2lKxE3128=; b=PLWbaLO8CmzdO9d1mWmvoGfojg0aKb1ikD+/2FOcpCRBoAhXN/RbhZmd/Iii2UGb0r WzChrMQcjohY7dYBW/RTNXuCPay10yK8mNTO7+GVcIhct2fDcL15hFU7eVsokdt+2ewJ INvTmV8E7XCsoXPyYMloKJdm3JEnHHt5IrDUk= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; b=hhp0whs4P1J2ZNAkS44Eh7wPDs1fxS9r7dWufUyw6AElUnljgLCgbrx+t6ZhLDpwKO Wf8AvbSmWI63l5uYr1g4QsHQ5wgibBKAkzwjo2R4C8n7SI8/oDSU5qGcTJ22ayWBMZkR qBX3WoFU3ZKqi9InzbIdv+VychEra/Q9Itxys= MIME-Version: 1.0 Received: by 10.231.13.137 with SMTP id c9mr5998165iba.76.1291059008953; Mon, 29 Nov 2010 11:30:08 -0800 (PST) Received: by 10.231.166.14 with HTTP; Mon, 29 Nov 2010 11:30:08 -0800 (PST) In-Reply-To: References: <66af70ab-8519-4e7b-9f6c-2b688c275...@s5g2000yqm.googlegroups.com> Date: Mon, 29 Nov 2010 20:30:08 +0100 Message-ID: Subject: Re: links From: Mark Dufour To: shedskin-discuss@googlegroups.com Content-Type: multipart/alternative; boundary=0022150467ef9cdf610496361912 --0022150467ef9cdf610496361912 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable Since Python strings are immutable, slices should never lead to memory > allocations, am I wrong? > well, if we keep everything copy-by-reference, we'd still have to store the slice start and end somewhere.. but if we use copy-by-value, that would avoid any allocations. though I never really liked copy-by-value. it's ugly and complicates code generation in several ways.. Yes, I entirely agree. That's something I've already thought about > several times=85 std::string is of little help there, and it has a > serious overhead for several simple tasks : > - slicing > - copying > - resizing to a shorter string > - (maybe with some additional work) working with simple `char' where > strings are not needed (eg. for expressions like > for set and dict it turned out to be a very good idea to just do it the cpython way.. I also like this approach because it gives the same behaviour= , so for example making it faster in cpython makes it faster with shedskin as well. > x =3D 'a' > or > if foo[42] =3D=3D 'z': > or even > x =3D 'a' > if foo[42] =3D=3D x: > ) > shedskin currently caches all 256 possible 1-length strings, so indexing a string for example doesn't cause an allocation, and working with "characters" is really fast. that's also how I optimized geet's program for shedskin, by simply comparing three characters instead of slicing and then comparing. I played a bit with an alternative 'char' type some time ago, but that ende= d up complicating too many things, and because the current approach works practically just as well in many cases, in the end that clearly wasn't wort= h the effort. > When the ticket for issue 74 (bz2) was opened, I wanted to write a > quick and dirty implementation. Unfortunately, the current internal > representation of strings prevents us from simply returning pointers > to arbitrary locations in the memory. Same problem for the `easy' task > `improve IO speed': being able to mmap files and return pointers on > the memory without any copying would help a lot. > yeah, that's something to keep in mind. though the main problem with IO speed is that currently everything happens per-character. > As you may already know, the CPython representation for strings is a > pair of pointers: one for the beginning of the string, the other for > the end. I think nothing is more efficient than this. > no, but it sounds logical at this point.. ;) I'm guessing though that it still allocates a new object for each slice, to contain these pointers.. what I'm hoping is that we can somehow efficiently manage such objects, without needing copy-by-value.. it should at least be possible to allocate many of them at once. deallocation and recycling may be a bit trickier.. :P thanks, mark. --=20 http://www.youtube.com/watch?v=3DE6LsfnBmdnk --0022150467ef9cdf610496361912 Content-Type: text/html; charset=windows-1252 Content-Transfer-Encoding: quoted-printable
Since Python strings are immutable, slices should never lead to memory<= br>
allocations, am I wrong?

well, if we keep everythi= ng copy-by-reference, we'd still have to store the slice start and end = somewhere.. but if we use copy-by-value, that would avoid any allocations. = though I never really liked copy-by-value. it's ugly and complicates co= de generation in several ways..

Yes, I = entirely agree. That's something I've already thought about
several times=85 std::string is of little help there, and it has a
serious overhead for several simple tasks :
=A0- slicing
=A0- copying
=A0- resizing to a shorter string
=A0- (maybe with some additional work) working with simple `char' where=
strings are not needed (eg. for expressions like

f= or set and dict it turned out to be a very good idea to just do it the cpyt= hon way.. I also like this approach because it gives the same behaviour, so= for example making it faster in cpython makes it faster with shedskin as w= ell.
=A0
=A0x =3D 'a'
or
=A0if foo[42] =3D=3D 'z':
or even
=A0x =3D 'a'
=A0if foo[42] =3D=3D x:
)

shedskin currently caches all 256 possible 1-len= gth strings, so indexing a string for example doesn't cause an allocati= on, and working with "characters" is really fast. that's also= how I optimized geet's program for shedskin, by simply comparing three= characters instead of slicing and then comparing.

I played a bit with an alternative 'char' type some time ago, b= ut that ended up complicating too many things, and because the current appr= oach works practically just as well in many cases, in the end that clearly = wasn't worth the effort.
=A0
When the ticket for issue 74 (bz2) was opened, I wanted to write a
quick and dirty implementation. Unfortunately, the current internal
representation of strings prevents us from simply returning pointers
to arbitrary locations in the memory. Same problem for the `easy' task<= br> `improve IO speed': being able to mmap files and return pointers on
the memory without any copying would help a lot.

y= eah, that's something to keep in mind. though the main problem with IO = speed is that currently everything happens per-character.
=A0
As you may already know, the CPython representation for strings is a
pair of pointers: one for the beginning of the string, the other for
the end. I think nothing is more efficient than this.
=
no, but it sounds logical at this point.. ;) I'm guessing though th= at it still allocates a new object for each slice, to contain these pointer= s.. what I'm hoping is that we can somehow efficiently manage such obje= cts, without needing copy-by-value.. it should at least be possible to allo= cate many of them at once. deallocation and recycling may be a bit trickier= .. :P

thanks,
mark.
--
http://www.youtube= .com/watch?v=3DE6LsfnBmdnk

--0022150467ef9cdf610496361912--