
save dictionary to a file without brackets.


giuseppe...@gmail.com

Aug 9, 2012, 4:11:19 PM
to
Hi,
I have a dict() named unique, like this:
{(4, 5): 1, (5, 4): 1, (4, 4): 2, (2, 3): 1, (4, 3): 2}
and I want to print it to a file without the brackets, commas, and colons, in order to obtain something like this:
4 5 1
5 4 1
4 4 2
2 3 1
4 3 2
Any ideas?
Thanks in advance
Giuseppe

Roman Vashkevich

Aug 9, 2012, 4:22:57 PM
to giuseppe...@gmail.com, pytho...@python.org
for key in dict:
    print key[0], key[1], dict[key]

On 10.08.2012 at 0:11, giuseppe...@gmail.com wrote:
> --
> http://mail.python.org/mailman/listinfo/python-list

Tim Chase

Aug 9, 2012, 4:35:32 PM
to Roman Vashkevich, pytho...@python.org, giuseppe...@gmail.com
On 08/09/12 15:22, Roman Vashkevich wrote:
>> {(4, 5): 1, (5, 4): 1, (4, 4): 2, (2, 3): 1, (4, 3): 2}
>> and i want to print to a file without the brackets comas and semicolon in order to obtain something like this?
>> 4 5 1
>> 5 4 1
>> 4 4 2
>> 2 3 1
>> 4 3 2
>
> for key in dict:
>     print key[0], key[1], dict[key]

This might read more cleanly with tuple unpacking:

for (edge1, edge2), cost in d.iteritems(): # or .items()
    print edge1, edge2, cost

(I'm making the assumption that this is an edge/cost graph...use
appropriate names according to what they actually mean)

-tkc



Giuseppe Amatulli

Aug 9, 2012, 4:35:54 PM
to Oscar Benjamin, pytho...@python.org
Thanks for the fast replies.
My tests were very close to yours, but I did not know how

On 9 August 2012 15:25, Oscar Benjamin <oscar.j....@gmail.com> wrote:
> How's this?
>
> from __future__ import print_function
>
> output = open("out.txt", "w")
>
> for (a, b), c in d.items():
>     print(a, b, c, file=output)
>
> output.close()
>
> Oscar.



--
Giuseppe Amatulli
Web: www.spatial-ecology.net

Gelonida N

Aug 9, 2012, 4:35:49 PM
to pytho...@python.org
Boring explicit solution:

d = {(4, 5): 1, (5, 4): 1, (4, 4): 2, (2, 3): 1, (4, 3): 2}
fout = open("out.txt", "w")   # fout needs to be an open file object
for key, val in d.items():
    v1, v2 = key
    fout.write("%d %d %d\n" % (v1, v2, val))
fout.close()


Giuseppe Amatulli

Aug 9, 2012, 4:38:50 PM
to Oscar Benjamin, pytho...@python.org
Thanks for the fast replies.
My tests were very close to yours, but I did not know how to print
the number after the colon!
Thanks!


On 9 August 2012 15:25, Oscar Benjamin <oscar.j....@gmail.com> wrote:
>
> On Aug 9, 2012 9:17 PM, <giuseppe...@gmail.com> wrote:
>>

Roman Vashkevich

Aug 9, 2012, 4:41:32 PM
to Tim Chase, pytho...@python.org, giuseppe...@gmail.com
dict.items() is a list - linear access time whereas with 'for key in dict:' access time is constant: http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html#use-in-where-possible-1

On 10.08.2012 at 0:35, Tim Chase wrote:

> On 08/09/12 15:22, Roman Vashkevich wrote:
>>> {(4, 5): 1, (5, 4): 1, (4, 4): 2, (2, 3): 1, (4, 3): 2}
>>> and i want to print to a file without the brackets comas and semicolon in order to obtain something like this?
>>> 4 5 1
>>> 5 4 1
>>> 4 4 2
>>> 2 3 1
>>> 4 3 2
>>

Mark Lawrence

Aug 9, 2012, 5:17:50 PM
to pytho...@python.org
On 09/08/2012 21:41, Roman Vashkevich wrote:
> dict.items() is a list - linear access time whereas with 'for key in dict:' access time is constant: http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html#use-in-where-possible-1
>
> On 10.08.2012 at 0:35, Tim Chase wrote:
>
>> On 08/09/12 15:22, Roman Vashkevich wrote:
>>>> {(4, 5): 1, (5, 4): 1, (4, 4): 2, (2, 3): 1, (4, 3): 2}
>>>> and i want to print to a file without the brackets comas and semicolon in order to obtain something like this?
>>>> 4 5 1
>>>> 5 4 1
>>>> 4 4 2
>>>> 2 3 1
>>>> 4 3 2
>>>
>>> for key in dict:
>>>     print key[0], key[1], dict[key]
>>
>> This might read more cleanly with tuple unpacking:
>>
>> for (edge1, edge2), cost in d.iteritems(): # or .items()
>>     print edge1, edge2, cost
>>
>> (I'm making the assumption that this is a edge/cost graph...use
>> appropriate names according to what they actually mean)
>>
>> -tkc
>>
>>
>>
>

I'm impressed, the OP gives a dict with five entries and already we're
optimising, a cunning plan if ever there was one. Hum, I think I'll
start on the blast proof ferro-concrete bunker tonight just in case
WWIII starts tomorrow.

--
Cheers.

Mark Lawrence.

Tim Chase

Aug 9, 2012, 5:21:07 PM
to Roman Vashkevich, pytho...@python.org, giuseppe...@gmail.com
On 08/09/12 15:41, Roman Vashkevich wrote:
> On 10.08.2012 at 0:35, Tim Chase wrote:
>> On 08/09/12 15:22, Roman Vashkevich wrote:
>>>> {(4, 5): 1, (5, 4): 1, (4, 4): 2, (2, 3): 1, (4, 3): 2}
>>>> and i want to print to a file without the brackets comas and semicolon in order to obtain something like this?
>>>> 4 5 1
>>>> 5 4 1
>>>> 4 4 2
>>>> 2 3 1
>>>> 4 3 2
>>>
>>> for key in dict:
>>>     print key[0], key[1], dict[key]
>>
>> This might read more cleanly with tuple unpacking:
>>
>> for (edge1, edge2), cost in d.iteritems(): # or .items()
>>     print edge1, edge2, cost
>>
>> (I'm making the assumption that this is a edge/cost graph...use
>> appropriate names according to what they actually mean)
>
> dict.items() is a list - linear access time whereas with 'for
> key in dict:' access time is constant:
> http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html#use-in-where-possible-1

That link doesn't actually discuss dict.{iter}items()

Both are O(N) because you have to touch each item in the dict--you
can't iterate over N entries in less than O(N) time. For small
data-sets, building the list and then iterating over it may be
faster; for larger data-sets, the cost of building the list
overshadows the (minor) overhead of a generator. Either way, the
iterate-and-fetch-the-associated-value of .items() & .iteritems()
can (should?) be optimized in Python's internals to the point I
wouldn't think twice about using the more readable version.

-tkc


Roman Vashkevich

Aug 9, 2012, 5:34:05 PM
to Tim Chase, pytho...@python.org, giuseppe...@gmail.com
Actually, they are different.
Put a dict.{iter}items() in an O(k^N) algorithm and make it a hundred thousand entries, and you will feel the difference.
Dict uses hashing to get a value from the dict and this is why it's O(1).

On 10.08.2012 at 1:21, Tim Chase wrote:

> On 08/09/12 15:41, Roman Vashkevich wrote:
>> On 10.08.2012 at 0:35, Tim Chase wrote:
>>> On 08/09/12 15:22, Roman Vashkevich wrote:
>>>>> {(4, 5): 1, (5, 4): 1, (4, 4): 2, (2, 3): 1, (4, 3): 2}
>>>>> and i want to print to a file without the brackets comas and semicolon in order to obtain something like this?
>>>>> 4 5 1
>>>>> 5 4 1
>>>>> 4 4 2
>>>>> 2 3 1
>>>>> 4 3 2
>>>>

Terry Reedy

Aug 9, 2012, 5:46:29 PM
to pytho...@python.org
On 8/9/2012 5:21 PM, Tim Chase wrote:
> On 08/09/12 15:41, Roman Vashkevich wrote:
>> On 10.08.2012 at 0:35, Tim Chase wrote:
>>> On 08/09/12 15:22, Roman Vashkevich wrote:
>>>>> {(4, 5): 1, (5, 4): 1, (4, 4): 2, (2, 3): 1, (4, 3): 2}
>>>>> and i want to print to a file without the brackets comas and semicolon in order to obtain something like this?
>>>>> 4 5 1
>>>>> 5 4 1
>>>>> 4 4 2
>>>>> 2 3 1
>>>>> 4 3 2
>>>>
>>>> for key in dict:
>>>>     print key[0], key[1], dict[key]
>>>
>>> This might read more cleanly with tuple unpacking:
>>>
>>> for (edge1, edge2), cost in d.iteritems(): # or .items()
>>>     print edge1, edge2, cost
>>>
>>> (I'm making the assumption that this is a edge/cost graph...use
>>> appropriate names according to what they actually mean)
>>
>> dict.items() is a list - linear access time whereas with 'for
>> key in dict:' access time is constant:
>> http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html#use-in-where-possible-1
>
> That link doesn't actually discuss dict.{iter}items()
>
> Both are O(N) because you have to touch each item in the dict--you
> can't iterate over N entries in less than O(N) time. For small
> data-sets, building the list and then iterating over it may be
> faster faster; for larger data-sets, the cost of building the list
> overshadows the (minor) overhead of a generator. Either way, the
> iterate-and-fetch-the-associated-value of .items() & .iteritems()
> can (should?) be optimized in Python's internals to the point I
> wouldn't think twice about using the more readable version.

In 3.x, .keys, .values, and .items are set-like read-only views
specifically designed for iteration. So in 3.x they are THE way to do so
for whichever alternative is appropriate. Iterating by keys and then
looking up values instead of yielding the values at the same time is
extra work.
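A minimal Python 3 sketch of the views described above, using the OP's sample dict:

```python
# Python 3: d.items() is a set-like view onto the dict, not a list copy.
d = {(4, 5): 1, (5, 4): 1, (4, 4): 2, (2, 3): 1, (4, 3): 2}

# Keys and values are yielded together -- no second lookup per key.
for (a, b), count in d.items():
    print(a, b, count)

# Views are set-like: membership tests work without building a list.
assert ((4, 4), 2) in d.items()
```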

--
Terry Jan Reedy


Dave Angel

Aug 9, 2012, 5:47:45 PM
to Roman Vashkevich, pytho...@python.org, giuseppe...@gmail.com
On 08/09/2012 05:34 PM, Roman Vashkevich wrote:
> Actually, they are different.
> Put a dict.{iter}items() in an O(k^N) algorithm and make it a hundred thousand entries, and you will feel the difference.
> Dict uses hashing to get a value from the dict and this is why it's O(1).

Sure, that's why

for key in dict:
    print key[0], key[1], dict[key]

is probably slower than

for (edge1, edge2), cost in d.iteritems(): # or .items()
    print edge1, edge2, cost


So, the latter is both faster and easier to read. Why are you arguing against it?

Also, please stop top-posting. It's impolite here, and makes it much harder to figure out who is saying what, in what order.



--

DaveA

Chris Kaynor

Aug 9, 2012, 5:49:03 PM
to Roman Vashkevich, pytho...@python.org, giuseppe...@gmail.com
On Thu, Aug 9, 2012 at 2:34 PM, Roman Vashkevich <vashke...@gmail.com> wrote:
>
> Actually, they are different.
> Put a dict.{iter}items() in an O(k^N) algorithm and make it a hundred thousand entries, and you will feel the difference.
> Dict uses hashing to get a value from the dict and this is why it's O(1).
>

Using "in" as an operator, such as "if key in dict" or "result = key
in dict", is O(1) as you say. Iterating over the dictionary requires
touching every item, and so is O(n), even though it also uses "in" in
the statement.

Here are a few quick timing tests I just ran with Python 2.6:

>>> timeit.timeit('for i in d: pass', 'd=dict.fromkeys(range(1))')
0.078683853332734088
>>> timeit.timeit('for i in d: pass', 'd=dict.fromkeys(range(10))')
0.17451784110969015
>>> timeit.timeit('for i in d: pass', 'd=dict.fromkeys(range(100))')
1.1708168159579486

>>> timeit.timeit('for i in d.iteritems(): pass', 'd=dict.fromkeys(range(1))')
0.14186911440299355
>>> timeit.timeit('for i in d.iteritems(): pass', 'd=dict.fromkeys(range(10))')
0.33836512561802579
>>> timeit.timeit('for i in d.iteritems(): pass', 'd=dict.fromkeys(range(100))')
2.2544262854249268

>>> timeit.timeit('for i in d: v=d[i]', 'd=dict.fromkeys(range(1))')
0.10009793211446549
>>> timeit.timeit('for i in d: v=d[i]', 'd=dict.fromkeys(range(10))')
0.38825072496723578
>>> timeit.timeit('for i in d: v=d[i]', 'd=dict.fromkeys(range(100))')
3.3020098061049339


As can be seen here, a 1-item dictionary iterated in 0.07 seconds, 10
items in 0.17 seconds, and 100 items in 1.17 seconds. That is fairly
close to linear, especially when considering the overhead of a
complete no-op loop.

Using iteritems, it appears to actually scale slightly better than
linear, though it is slower than just the plain iteration.

Doing a plain iteration, then looking up the keys to get the values
also appears to be linear, and is even slower than iteritems.

Chris Kaynor

Aug 9, 2012, 5:51:17 PM
to pytho...@python.org
I realized, I should have done 10, 100, 1000 rather than 1, 10, 100
for better results, so here are the results for 1000 items. It still
maintains the same pattern:

>>> timeit.timeit('for i in d: pass', 'd=dict.fromkeys(range(1000))')
10.166595947685153
>>> timeit.timeit('for i in d.iteritems(): pass', 'd=dict.fromkeys(range(1000))')
19.922474218828711
>>> timeit.timeit('for i in d: v=d[i]', 'd=dict.fromkeys(range(1000))')
31.007666660415282

Chris

Giuseppe Amatulli

Aug 9, 2012, 5:53:49 PM
to pytho...@python.org
Thanks a lot for the clarification.
Actually my problem is: given two raster datasets in geo-tif format, find
the unique pair combinations and count the number of observations; the
unique values in rast1, counting the number of observations; and the
unique values in rast2, counting the number of observations.

I tried different solutions and this seems to me the fastest.


Rast00 = dsRast00.GetRasterBand(1).ReadAsArray()
Rast10 = dsRast10.GetRasterBand(1).ReadAsArray()

mask = (Rast00 != 0) & (Rast10 != 0)   # maybe this masking operation can be included in the for loop

Rast00_mask = Rast00[mask]
Rast10_mask = Rast10[mask]

array2D = np.array(zip(Rast00_mask, Rast10_mask))

unique_u = dict()
unique_k1 = dict()
unique_k2 = dict()

for key1, key2 in array2D:
    row = (key1, key2)
    if row in unique_u:
        unique_u[row] += 1
    else:
        unique_u[row] = 1
    if key1 in unique_k1:
        unique_k1[key1] += 1
    else:
        unique_k1[key1] = 1
    if key2 in unique_k2:
        unique_k2[key2] += 1
    else:
        unique_k2[key2] = 1

output = open(dst_file_rast0010, "w")
for (a, b), c in unique_u.items():
    print(a, b, c, file=output)
output.close()

output = open(dst_file_rast00, "w")
for a, b in unique_k1.items():
    print(a, b, file=output)
output.close()

output = open(dst_file_rast10, "w")
for a, b in unique_k2.items():
    print(a, b, file=output)
output.close()

What do you think? is there a way to speed up the process?
Thanks
Giuseppe
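One possible speedup: the three counting loops above collapse into collections.Counter calls (available since Python 2.7), which do the hashing and incrementing in C. The short lists here are hypothetical stand-ins for the masked raster arrays:

```python
from collections import Counter

# Hypothetical stand-ins for Rast00_mask and Rast10_mask.
rast00 = [4, 5, 4, 2, 4, 4]
rast10 = [5, 4, 4, 3, 3, 4]

unique_u = Counter(zip(rast00, rast10))   # counts per (rast00, rast10) pair
unique_k1 = Counter(rast00)               # counts per rast00 value
unique_k2 = Counter(rast10)               # counts per rast10 value

print(unique_u[(4, 4)])   # 2
```

The writing loops stay the same; Counter is a dict subclass, so .items() works unchanged.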





On 9 August 2012 16:34, Roman Vashkevich <vashke...@gmail.com> wrote:
> Actually, they are different.
> Put a dict.{iter}items() in an O(k^N) algorithm and make it a hundred thousand entries, and you will feel the difference.
> Dict uses hashing to get a value from the dict and this is why it's O(1).
>
> On 10.08.2012 at 1:21, Tim Chase wrote:
>
>> On 08/09/12 15:41, Roman Vashkevich wrote:
>>> On 10.08.2012 at 0:35, Tim Chase wrote:
>>>> On 08/09/12 15:22, Roman Vashkevich wrote:
>>>>>> {(4, 5): 1, (5, 4): 1, (4, 4): 2, (2, 3): 1, (4, 3): 2}
>>>>>> and i want to print to a file without the brackets comas and semicolon in order to obtain something like this?
>>>>>> 4 5 1
>>>>>> 5 4 1
>>>>>> 4 4 2
>>>>>> 2 3 1
>>>>>> 4 3 2
>>>>>
>>>>> for key in dict:
>>>>>     print key[0], key[1], dict[key]
>>>>
>>>> This might read more cleanly with tuple unpacking:
>>>>
>>>> for (edge1, edge2), cost in d.iteritems(): # or .items()
>>>>     print edge1, edge2, cost
>>>>
>>>> (I'm making the assumption that this is a edge/cost graph...use
>>>> appropriate names according to what they actually mean)
>>>
>>> dict.items() is a list - linear access time whereas with 'for
>>> key in dict:' access time is constant:
>>> http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html#use-in-where-possible-1
>>
>> That link doesn't actually discuss dict.{iter}items()
>>
>> Both are O(N) because you have to touch each item in the dict--you
>> can't iterate over N entries in less than O(N) time. For small
>> data-sets, building the list and then iterating over it may be
>> faster faster; for larger data-sets, the cost of building the list
>> overshadows the (minor) overhead of a generator. Either way, the
>> iterate-and-fetch-the-associated-value of .items() & .iteritems()
>> can (should?) be optimized in Python's internals to the point I
>> wouldn't think twice about using the more readable version.
>>
>> -tkc

Roman Vashkevich

Aug 9, 2012, 6:02:00 PM
to d...@davea.name, pytho...@python.org, giuseppe...@gmail.com
On 10.08.2012 at 1:47, Dave Angel wrote:

> On 08/09/2012 05:34 PM, Roman Vashkevich wrote:
>> Actually, they are different.
>> Put a dict.{iter}items() in an O(k^N) algorithm and make it a hundred thousand entries, and you will feel the difference.
>> Dict uses hashing to get a value from the dict and this is why it's O(1).
>
> Sure, that's why
>
> for key in dict:
>     print key[0], key[1], dict[key]
>
> is probably slower than
>
> for (edge1, edge2), cost in d.iteritems(): # or .items()
>     print edge1, edge2, cost
>
>
> So, the latter is both faster and easier to read. Why are you arguing against it?
>
> Also, please stop top-posting. It's impolite here, and makes it much harder to figure out who is saying what, in what order.
>
>
>
> --
>
> DaveA
>

I'm not arguing at all. Sorry if it sounded like I was arguing.
Thanks for notifying me of the way messages should be sent.

Roman

Andrew Cooper

Aug 9, 2012, 6:03:26 PM
to
On 09/08/2012 22:34, Roman Vashkevich wrote:
> Actually, they are different.
> Put a dict.{iter}items() in an O(k^N) algorithm and make it a hundred thousand entries, and you will feel the difference.
> Dict uses hashing to get a value from the dict and this is why it's O(1).
>

Slightly off topic, but looking up a value in a dictionary is actually
O(n) for all other entries in the dict which suffer a hash collision
with the searched entry.

True, a sensible choice of hash function will reduce n to 1 in common
cases, but it becomes an important consideration for larger datasets.

~Andrew
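The worst case described above is easy to provoke deliberately. Here is a sketch (the BadKey class is invented for illustration) where every key hashes into the same bucket, so each lookup degenerates into a linear scan of the colliding entries:

```python
import timeit

class BadKey(object):
    """A key whose instances all collide in the hash table."""
    def __init__(self, n):
        self.n = n
    def __hash__(self):
        return 0                     # every instance lands in one bucket
    def __eq__(self, other):
        return self.n == other.n

small = {BadKey(i): i for i in range(10)}
big = {BadKey(i): i for i in range(500)}

# Both are single-key lookups, but the second scans ~50x more
# colliding entries per probe.
t_small = timeit.timeit(lambda: small[BadKey(9)], number=200)
t_big = timeit.timeit(lambda: big[BadKey(499)], number=200)
print(t_small < t_big)   # almost always True
```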

Dave Angel

Aug 9, 2012, 6:26:53 PM
to Andrew Cooper, pytho...@python.org
I'm glad you're wrong for CPython's dictionaries. The only time the
lookup would degenerate to O(n) would be if the hash table had only one
slot. CPython sensibly increases the hash table size when it becomes
too small for efficiency.


Where have you seen dictionaries so poorly implemented?

--

DaveA

Tim Chase

Aug 9, 2012, 6:39:22 PM
to d...@davea.name, pytho...@python.org, Andrew Cooper

Chris Kaynor

Aug 9, 2012, 6:37:59 PM
to pytho...@python.org
On Thu, Aug 9, 2012 at 3:26 PM, Dave Angel <d...@davea.name> wrote:
> On 08/09/2012 06:03 PM, Andrew Cooper wrote:
> I'm glad you're wrong for CPython's dictionaries. The only time the
> lookup would degenerate to O[n] would be if the hash table had only one
> slot. CPython sensibly increases the hash table size when it becomes
> too small for efficiency.
>
>
> Where have you seen dictionaries so poorly implemented?

There are plenty of ways to make a pathological hash function that
will have that issue in CPython.

The very simple (and stupid):

class O(object):
    def __hash__(self):
        return 0
    def __eq__(self, other):  # I am aware this is the default equals method.
        return self is other

Start adding those to a dictionary to get O(n) lookups.

Any case where the hash return value modulo the dictionary hash table
size is constant will have similar results; powers of 2 are likely to
result in such behavior as well.

>
> --
>
> DaveA
>

Chris Angelico

Aug 9, 2012, 6:53:43 PM
to pytho...@python.org
On Fri, Aug 10, 2012 at 8:26 AM, Dave Angel <d...@davea.name> wrote:
> On 08/09/2012 06:03 PM, Andrew Cooper wrote:
>> O(n) for all other entries in the dict which suffer a hash collision
>> with the searched entry.
>>
>> True, a sensible choice of hash function will reduce n to 1 in common
>> cases, but it becomes an important consideration for larger datasets.
>
> I'm glad you're wrong for CPython's dictionaries. The only time the
> lookup would degenerate to O[n] would be if the hash table had only one
> slot. CPython sensibly increases the hash table size when it becomes
> too small for efficiency.
>
> Where have you seen dictionaries so poorly implemented?

In vanilla CPython up to version (I think) 3.3, where it's possible to
DoS the hash generator. Hash collisions are always possible, just
ridiculously unlikely unless deliberately exploited.

(And yes, I know an option was added to older versions to randomize
the hashes there too. It's not active by default, so "vanilla CPython"
is still vulnerable.)

ChrisA

Andrew Cooper

Aug 9, 2012, 6:54:20 PM
to
Different n, which I should have made more clear. I was using it for
consistency with O() notation. My statement was O(n) where n is the
number of hash collisions.

The choice of hash algorithm (or several depending on the
implementation) should specifically be chosen to reduce collisions to
aid in efficient space utilisation and lookup times, but any
implementation must allow for collisions. There are certainly runtime
methods of improving efficiency using amortized operations.

As for poor implementations,

class Foo(object):

    ...

    def __hash__(self):
        return 0

I seriously found that in some older code I had the misfortune of
reading. It didn't remain in that state for long.

~Andrew

Chris Angelico

Aug 9, 2012, 7:01:17 PM
to pytho...@python.org
On Fri, Aug 10, 2012 at 8:39 AM, Tim Chase
<pytho...@tim.thechases.com> wrote:
> On 08/09/12 17:26, Dave Angel wrote:
>> On 08/09/2012 06:03 PM, Andrew Cooper wrote:
>> I'm glad you're wrong for CPython's dictionaries. The only time the
>> lookup would degenerate to O[n] would be if the hash table had only one
>> slot. CPython sensibly increases the hash table size when it becomes
>> too small for efficiency.
>>
>> Where have you seen dictionaries so poorly implemented?
>
> PHP?
>
> http://www.phpclasses.org/blog/post/171-PHP-Vulnerability-May-Halt-Millions-of-Servers.html

That's the same hash collision attack that I alluded to above, and it
strikes *many* language implementations. Most released a patch fairly
quickly and quietly (Pike, Lua, V8 (JavaScript/ECMAScript), PHP), but
CPython dared not, on account of various applications depending on
hash order (at least for tests). It's not (for once) an indictment of
PHP (maybe that should be an "inarrayment"?), it's a consequence of a
hashing algorithm that favored simplicity over cryptographic
qualities.

(It feels weird to be defending PHP...)

ChrisA

Roy Smith

Aug 9, 2012, 7:05:38 PM
to
In article <ucXUr.1030527$2z2.3...@fx19.am4>,
Andrew Cooper <am...@cam.ac.uk> wrote:

> As for poor implementations,
>
> class Foo(object):
>     def __hash__(self):
>         return 0
>
> I seriously found that in some older code I had the misfortune of
> reading.

Python assumes you are a consenting adult. If you wish to engage in
activities which are hazardous to your health, so be it. But then
again, you could commit this particular stupidity just as easily in C++
or any other language which lets you define your own hash() function.

Chris Angelico

Aug 9, 2012, 7:14:30 PM
to pytho...@python.org
On Fri, Aug 10, 2012 at 9:05 AM, Roy Smith <r...@panix.com> wrote:
> Python assumes you are a consenting adult. If you wish to engage in
> activities which are hazardous to your health, so be it.

... you mean, Python lets you make a hash of it?

*ducks for cover*

ChrisA

Roy Smith

Aug 9, 2012, 7:24:22 PM
to
In article <mailman.3135.1344554...@python.org>,
Chris Angelico <ros...@gmail.com> wrote:

> On Fri, Aug 10, 2012 at 9:05 AM, Roy Smith <r...@panix.com> wrote:
> > Python assumes you are a consenting adult. If you wish to engage in
> > activities which are hazardous to your health, so be it.
>
> ... you mean, Python lets you make a hash of it?
>

Only if you order it with spam, spam, spam, spam, spam, spam, and spam.

Mark Lawrence

Aug 9, 2012, 7:33:04 PM
to pytho...@python.org
Now now gentlemen we're getting slightly off topic here and wouldn't
want to upset the people who insist on staying on topic. Or would we? :)

--
Cheers.

Mark Lawrence.

Dave Angel

Aug 9, 2012, 7:38:13 PM
to Andrew Cooper, pytho...@python.org
On 08/09/2012 06:54 PM, Andrew Cooper wrote:
> On 09/08/2012 23:26, Dave Angel wrote:
>> On 08/09/2012 06:03 PM, Andrew Cooper wrote:
>>> On 09/08/2012 22:34, Roman Vashkevich wrote:
>>>> Actually, they are different.
>>>> Put a dict.{iter}items() in an O(k^N) algorithm and make it a hundred thousand entries, and you will feel the difference.
>>>> Dict uses hashing to get a value from the dict and this is why it's O(1).
>>>>
>>> Sligtly off topic, but looking up a value in a dictionary is actually
>>> O(n) for all other entries in the dict which suffer a hash collision
>>> with the searched entry.
>>>
>>> True, a sensible choice of hash function will reduce n to 1 in common
>>> cases, but it becomes an important consideration for larger datasets.
>>>
>>> ~Andrew
>> I'm glad you're wrong for CPython's dictionaries. The only time the
>> lookup would degenerate to O[n] would be if the hash table had only one
>> slot. CPython sensibly increases the hash table size when it becomes
>> too small for efficiency.
>>
>>
>> Where have you seen dictionaries so poorly implemented?
>>
> Different n, which I should have made more clear. I was using it for
> consistency with O() notation. My statement was O(n) where n is the
> number of hash collisions.
That's a little like doing a survey, and reporting the results as
showing that 100% of the women hit their husbands, among the population
of women who hit their husbands.

In your original message, you already stated the assumption that a
proper hash algorithm would be chosen, then went on to apparently claim
that large datasets would still have an order n problem. That last is
what I was challenging.

The rest of your message here refers to client code, not the system.
> The choice of hash algorithm (or several depending on the
> implementation) should specifically be chosen to reduce collisions to
> aid in efficient space utilisation and lookup times, but any
> implementation must allow for collisions. There are certainly runtime
> methods of improving efficiency using amortized operations.
>
> As for poor implementations,
>
> class Foo(object):
>
>     ...
>
>     def __hash__(self):
>         return 0
>
> I seriously found that in some older code I had the misfortune of
> reading. It didn't remain in that state for long.
>
> ~Andrew


--

DaveA

Dave Angel

Aug 9, 2012, 7:42:48 PM
to Chris Angelico, pytho...@python.org
On 08/09/2012 06:53 PM, Chris Angelico wrote:
> On Fri, Aug 10, 2012 at 8:26 AM, Dave Angel <d...@davea.name> wrote:
>> On 08/09/2012 06:03 PM, Andrew Cooper wrote:
>>> O(n) for all other entries in the dict which suffer a hash collision
>>> with the searched entry.
>>>
>>> True, a sensible choice of hash function will reduce n to 1 in common
>>> cases, but it becomes an important consideration for larger datasets.
>> I'm glad you're wrong for CPython's dictionaries. The only time the
>> lookup would degenerate to O[n] would be if the hash table had only one
>> slot. CPython sensibly increases the hash table size when it becomes
>> too small for efficiency.
>>
>> Where have you seen dictionaries so poorly implemented?
> In vanilla CPython up to version (I think) 3.3, where it's possible to
> DoS the hash generator. Hash collisions are always possible, just
> ridiculously unlikely unless deliberately exploited.
>
> (And yes, I know an option was added to older versions to randomize
> the hashes there too. It's not active by default, so "vanilla CPython"
> is still vulnerable.)
>
> ChrisA

Thank you to you and others, who have corrected my over-general
response. I was not intending to claim anything about either a
deliberate DOS, nor a foolishly chosen hash function. But the message I
was replying to seemed to me to claim that for large datasets, even a
good hash algorithm would end up giving O(n) performance.



--

DaveA

Tim Chase

Aug 9, 2012, 8:16:58 PM
to Mark Lawrence, pytho...@python.org
On 08/09/12 18:33, Mark Lawrence wrote:
> On 10/08/2012 00:24, Roy Smith wrote:
>>> ... you mean, Python lets you make a hash of it?
>>
>> Only if you order it with spam, spam, spam, spam, spam, spam, and spam.
>
> Now now gentlemen we're getting slightly off topic here and wouldn't
> want to upset the people who insist on staying on topic. Or would we? :)

We apologise for the off-topicness in the thread. Those responsible
have been sacked...

-tkc



Dave Angel

Aug 9, 2012, 8:27:23 PM
to Tim Chase, Mark Lawrence, pytho...@python.org
On 08/09/2012 08:16 PM, Tim Chase wrote:
> On 08/09/12 18:33, Mark Lawrence wrote:
>> On 10/08/2012 00:24, Roy Smith wrote:
>>>> ... you mean, Python lets you make a hash of it?
>>> Only if you order it with spam, spam, spam, spam, spam, spam, and spam.
>> Now now gentlemen we're getting slightly off topic here and wouldn't
>> want to upset the people who insist on staying on topic. Or would we? :)
> We apologise for the off-topicness in the thread. Those responsible
> have been sacked...
>
> -tkc
>
>
>
Paper or plastic?


--

DaveA

Chris Angelico

Aug 9, 2012, 8:31:59 PM
to pytho...@python.org
On Fri, Aug 10, 2012 at 10:16 AM, Tim Chase
<pytho...@tim.thechases.com> wrote:
> We apologise for the off-topicness in the thread. Those responsible
> have been sacked...

So if you take every mapping variable in your program and name them
"dFoo", "dBar", "dQuux", etc, for "dict"... would that be a dirty
Hungarian dictionary?

Excuse me, I'll go and sack myself now.

ChrisA

88888 Dihedral

Aug 10, 2012, 1:35:27 AM
to
On Friday, August 10, 2012 at 6:03:26 AM UTC+8, Andrew Cooper wrote:
This is the classical problem of storing hash-colliding items one by one.
Those items should be stored in some order.

Steven D'Aprano

Aug 10, 2012, 4:54:42 AM
to
Sacked? They were beaten to death with a large halibut!


--
Steven

Mark Lawrence

Aug 10, 2012, 5:37:12 AM
to pytho...@python.org
Well whatever you do *DON'T* mention Cython. I mentioned it just now but
I think I've got away with it.

--
Cheers.

Mark Lawrence.

Roy Smith

Aug 10, 2012, 8:29:32 AM
to
In article <mailman.3147.1344591...@python.org>,
What if I spell it Kython?

Mark Lawrence

Aug 10, 2012, 10:47:59 AM
to pytho...@python.org
What a silly bunt!!!

--
Cheers.

Mark Lawrence.


88888 Dihedral

Aug 10, 2012, 1:46:09 PM
to comp.lan...@googlegroups.com, d...@davea.name, giuseppe...@gmail.com, pytho...@python.org
On Friday, August 10, 2012 at 5:47:45 AM UTC+8, Dave Angel wrote:
> On 08/09/2012 05:34 PM, Roman Vashkevich wrote:
> > Actually, they are different.
> > Put a dict.{iter}items() in an O(k^N) algorithm and make it a hundred thousand entries, and you will feel the difference.
> > Dict uses hashing to get a value from the dict and this is why it's O(1).
>
> Sure, that's why
>
> for key in dict:
>     print key[0], key[1], dict[key]
>
> is probably slower than
>
> for (edge1, edge2), cost in d.iteritems(): # or .items()
>     print edge1, edge2, cost
>
> So, the latter is both faster and easier to read. Why are you arguing against it?
>
> Also, please stop top-posting. It's impolite here, and makes it much harder to figure out who is saying what, in what order.
>
> --
> DaveA

OK, let's estimate the hash collision rate first.

For those items hashed to the same key, I'll store a sorted list with a
known length m to be accessed in O(log m).

Of course another hash can be attached.

88888 Dihedral

Aug 10, 2012, 1:46:09 PM
to Roman Vashkevich, pytho...@python.org, giuseppe...@gmail.com, d...@davea.name

Steven D'Aprano

Aug 11, 2012, 7:26:37 AM
to
On Fri, 10 Aug 2012 08:53:43 +1000, Chris Angelico wrote:

> On Fri, Aug 10, 2012 at 8:26 AM, Dave Angel <d...@davea.name> wrote:
>> On 08/09/2012 06:03 PM, Andrew Cooper wrote:
>>> O(n) for all other entries in the dict which suffer a hash collision
>>> with the searched entry.
>>>
>>> True, a sensible choice of hash function will reduce n to 1 in common
>>> cases, but it becomes an important consideration for larger datasets.
>>
>> I'm glad you're wrong for CPython's dictionaries. The only time the
>> lookup would degenerate to O[n] would be if the hash table had only one
>> slot. CPython sensibly increases the hash table size when it becomes
>> too small for efficiency.
>>
>> Where have you seen dictionaries so poorly implemented?
>
> In vanilla CPython up to version (I think) 3.3, where it's possible to
> DoS the hash generator. Hash collisions are always possible, just
> ridiculously unlikely unless deliberately exploited.

Not so unlikely actually.

py> hash(3)
3
py> hash(2**64 + 2)
3

py> hash(-1)
-2
py> hash(-2)
-2


By its very nature, a hash function cannot fail to have collisions. The
problem is that in general you have an essentially unlimited number of
objects being mapped to a large but still finite number of hash values.



--
Steven
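[The collisions Steven shows follow from CPython hashing ints modulo a fixed prime, which Python 3.2+ exposes as sys.hash_info.modulus. A portable way to reproduce one, and to see that a dict still works despite it, since equality rather than the hash alone decides key identity:]

```python
import sys

m = sys.hash_info.modulus   # 2**61 - 1 on 64-bit CPython builds

# Two ints that differ by the modulus hash identically...
print(hash(3), hash(3 + m))              # 3 3 (CPython)

# ...yet a dict still keeps both keys: after a hash collision,
# an equality comparison decides whether the keys are the same.
d = {3: "small", 3 + m: "huge"}
print(len(d), d[3 + m])                  # 2 huge
```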

88888 Dihedral

unread,
Aug 12, 2012, 7:59:53 AM8/12/12
to
Steven D'Aprano wrote on Saturday, August 11, 2012 at 7:26:37 PM UTC+8:
Let's check the basic operations of a hash table, or so-called dictionary, first.

If the dictionary is implemented for faster searching of items,
then the insertion and deletion operations of (key, value) pairs
are slightly different.
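[Deletion is indeed the asymmetric operation in a table tuned for fast search. In an open-addressing table (the approach CPython actually uses, though with a perturbed probe sequence rather than the linear probing sketched here), a deleted slot must become a tombstone so later lookups can probe past it. A minimal illustrative sketch, not CPython's implementation:]

```python
_EMPTY, _TOMB = object(), object()   # sentinels: never-used slot vs deleted slot

class OpenAddressDict:
    """Toy open-addressing table with tombstones; no resizing."""

    def __init__(self, size=8):
        self.slots = [(_EMPTY, None)] * size

    def _probe(self, key):
        # Simple linear probing; CPython perturbs the sequence instead.
        i = hash(key) % len(self.slots)
        for _ in range(len(self.slots)):
            yield i
            i = (i + 1) % len(self.slots)

    def put(self, key, value):
        first_tomb = None
        for i in self._probe(key):
            k, _ = self.slots[i]
            if k is _TOMB:
                if first_tomb is None:
                    first_tomb = i            # remember it, but keep scanning for key
            elif k is _EMPTY:
                target = i if first_tomb is None else first_tomb
                self.slots[target] = (key, value)
                return
            elif k == key:
                self.slots[i] = (key, value)  # overwrite in place
                return
        if first_tomb is not None:            # table full of live + dead entries
            self.slots[first_tomb] = (key, value)
            return
        raise RuntimeError("table full (this sketch does not resize)")

    def get(self, key):
        for i in self._probe(key):
            k, v = self.slots[i]
            if k is _EMPTY:                   # never-used slot ends the probe chain
                break
            if k is not _TOMB and k == key:
                return v
        raise KeyError(key)

    def delete(self, key):
        for i in self._probe(key):
            k, _ = self.slots[i]
            if k is _EMPTY:
                break
            if k is not _TOMB and k == key:
                self.slots[i] = (_TOMB, None) # tombstone keeps probe chains intact
                return
        raise KeyError(key)
```

Without the tombstone, deleting a key would cut the probe chain and make any colliding key inserted after it unreachable.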


alex23

unread,
Aug 12, 2012, 8:15:12 PM8/12/12
to
On Aug 10, 7:37 pm, Mark Lawrence <breamore...@yahoo.co.uk> wrote:
> Well whatever you do *DON'T* mention Cython. I mentioned it just now but
> I think I've got away with it.

While I'm not against threads straying off topic, you're beginning to
come across as a bit of an asshole now.

Just let it go.

Steven D'Aprano

unread,
Aug 13, 2012, 4:05:34 AM8/13/12
to
Chill out Alex, it's all good. Mark was channelling a famous scene from
"Fawlty Towers", starring Monty Python's own John Cleese, hence it is on-
topic, for the sillier definitions of on-topic.

After making a German tourist cry with his repeated insensitive comments
about World War Two, Basil Fawlty (Cleese) -- who is an obnoxious git at
the best of times but is currently suffering from a concussion -- remarks
to his staff, "Don't mention the war, I mentioned it once but I think I
got away with it."

http://www.youtube.com/watch?v=7xnNhzgcWTk



--
Steven

Mark Lawrence

unread,
Aug 13, 2012, 8:37:04 AM8/13/12
to pytho...@python.org
Why on your say so?

--
Cheers.

Mark Lawrence.

alex23

unread,
Aug 13, 2012, 12:14:13 PM8/13/12
to
On Aug 13, 10:37 pm, Mark Lawrence <breamore...@yahoo.co.uk> wrote:
> Why on your say so?

My mistake, I didn't realise you wanted to sound so tedious. Knock
yourself out.


alex23

unread,
Aug 13, 2012, 12:16:46 PM8/13/12
to
On Aug 13, 6:05 pm, Steven D'Aprano <steve
+comp.lang.pyt...@pearwood.info> wrote:
> Chill out Alex, it's all good. Mark was channelling a famous scene from
> "Fawlty Towers", starring Monty Python's own John Cleese, hence it is on-
> topic, for the sillier definitions of on-topic.

Thank you, yes, I get that. However, Mark has repeatedly been
directing this dickishness at Stefan Behnel ever since he was asked to
not stray off topic. While Mark doesn't have to listen to anyone else
about his behaviour, he can't expect not to be called a dick when
acting like one.

rusi

unread,
Aug 13, 2012, 11:55:33 AM8/13/12
to
On Aug 13, 1:05 pm, Steven D'Aprano <steve
+comp.lang.pyt...@pearwood.info> wrote:
>
> Chill out Alex, it's all good. Mark was channelling a famous scene from
> "Fawlty Towers", starring Monty Python's own John Cleese, hence it is on-
> topic, for the sillier definitions of on-topic.

Ha! Thanks for that connection.
Watched and enjoyed Fawlty towers as a kid but have never seen a Monty
Python.

Mark Lawrence

unread,
Aug 13, 2012, 1:07:26 PM8/13/12
to pytho...@python.org
Yes m'lud. Do I lick your boots or polish them?

--
Cheers.

Mark Lawrence.

Mark Lawrence

unread,
Aug 13, 2012, 1:43:40 PM8/13/12
to pytho...@python.org
On 13/08/2012 01:15, alex23 wrote:
http://mail.python.org/pipermail/pypy-dev/2012-February/009277.html

--
Cheers.

Mark Lawrence.

alex23

unread,
Aug 13, 2012, 9:32:48 PM8/13/12
to
Yeah, you're really coming across as holding the moral high ground
here.

Plonk.

Steven D'Aprano

unread,
Aug 13, 2012, 10:54:53 PM8/13/12
to
On Mon, 13 Aug 2012 18:07:26 +0100, Mark Lawrence wrote:

> On 13/08/2012 17:14, alex23 wrote:
>> On Aug 13, 10:37 pm, Mark Lawrence <breamore...@yahoo.co.uk> wrote:
>>> Why on your say so?
>>
>> My mistake, I didn't realise you wanted to sound so tedious. Knock
>> yourself out.
>>
>>
>>
> Yes m'lud. Do I lick your boots or polish them?


Children children, if you won't play nice don't play at all. You're
scaring away the people who are here to learn about Python.




--
Steven

Mark Lawrence

unread,
Aug 14, 2012, 7:33:20 AM8/14/12
to pytho...@python.org
Steven thanks for your comments, seriously for once, you've helped get
my feet back on the ground where they belong.

Everybody please accept my apologies for having gone OTT once again :(
In my defence, as mitigating circumstances I offer a combination of
Asperger Syndrome and a new girlfriend.

--
Cheers.

Mark Lawrence.

Alister

unread,
Aug 14, 2012, 11:05:14 AM8/14/12
to
If you have a new girlfriend, why on earth are you posting here? I can
think of much more interesting things to do.
(apologies for continuing off topic)




--
Don't despair; your ideal lover is waiting for you around the corner.

Mark Lawrence

unread,
Aug 14, 2012, 2:15:27 PM8/14/12
to pytho...@python.org
Nothing is off topic here, and I take your point about why I'm posting
here; I haven't played pat-a-cake in years :)

--
Cheers.

Mark Lawrence.
