Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Help with sets

3,148 views
Skip to first unread message

B. M. Whealton

unread,
Oct 4, 2010, 10:31:50 PM10/4/10
to pytho...@python.org
Hello all,
So, I started learning python just recently. I got inspired
by a project that related to the semantic web. I can see why this would
be a language chosen for the applications that help to build the
semantic web. It is interesting to see that indeed python does have
structures that make it well suited for this kind of application, or I
seem to be able to see how certain structures in python do stand out as
unique and related to concepts I am reading about in relation to the
Semantic web. I did get a bit confused in reading about the concept of
sets in python and why you would use them instead of a dictionary for
example.
For example, here we have:
self._pos = {predicate:{object:set([subject])}}
This is said to be a dictionary, containing dictionaries, which in turn
contain sets. I don't know if it would be possible to explain why one
would use a set, especially in this context without showing more. I was
a bit stumped by this for some odd reason. I don't know if other
languages lack certain structures and features. It's been a long time
since I did C programming too.
Anyway, this would be used an a Class as follows:

class SimpleGraph:
def __init__(self):
self._spo = {}
self._pos = {}
self._osp = {}

This is to create indexes that are permutations of the pattern subject,
predicate object. So, using this:

self._pos = {predicate:{object:set([subject])}}

We have the first dictionary keyed off the first term, the
second dictionary keyed off the second term, and the set containing the
third terms(note terms plural). I guess I somewhat understand that sets
are used to test for membership. Cannot that be done with dictionaries,
though?
Also, if you ran this for loop, below, what is being yielded,
the key or the value in the dictionary? I refer to this for loop here:
for subject in self._pos[predicate][ojbect]: yield (subject, predicate,
object)

Thanks for any help and insights into this,
Bruce
--

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
B. M. Whealton - Web Developer, Programmer, Owner:
Future Wave Designs: http://FutureWaveDesigns.com
"On Being a Poet and Other Existential Ideas":>
http://WordSaladPoetryMagazine.com/words/
Publisher of Word Salad Poetry Magazine:
http://WordSaladPoetryMagazine.com/ - Submit your poetry
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Chris Rebert

unread,
Oct 5, 2010, 3:06:53 AM10/5/10
to Ian Kelly, Python
On Tue, Oct 5, 2010 at 12:01 AM, Ian Kelly <ian.g...@gmail.com> wrote:

> On Mon, Oct 4, 2010 at 8:31 PM, B. M. Whealton
> <bwhe...@futurewavedesigns.com> wrote:
>>
>> self._pos = {predicate:{object:set([subject])}}
>>
>>       We have the first dictionary keyed off the first term, the second
>> dictionary keyed off the second term, and the set containing the third
>> terms(note terms plural). I guess I somewhat understand that sets are used
>> to test for membership.  Cannot that be done with dictionaries, though?
>
> You could.  I suspect the pertinent difference here is that dict keys have
> values associated with them.  The same code using a dict as the innermost
> collection would look something like this:
>
> self._pos = {predicate: {object: {subject: None}}}
>
> That's a bit ugly because the None serves no purpose here; the value
> associated with the subject has no meaning in this context.  It also
> uselessly takes up space in memory that could add up to significant wastage
> if the data structure grows to be large.

Indeed, CPython's sets are implemented as something like dictionaries
with dummy values (the keys being the members of the set), with some
optimization(s) that exploit this lack of values.

Cheers,
Chris
--
http://blog.rebertia.com

Steven D'Aprano

unread,
Oct 5, 2010, 3:39:54 AM10/5/10
to
On Mon, 04 Oct 2010 22:31:50 -0400, B. M. Whealton wrote:

> I did get a bit confused in reading about the concept of
> sets in python and why you would use them instead of a dictionary for
> example.

Why would you use a spoon instead of a paper clip? Why would you use a
hat-stand instead of a pencil sharpener?

Sets aren't an alternative to dictionaries. They have a completely
different purpose.

It so happens that sets and dicts *in Python* happen to share some
implementation details, and they happen to share similar syntax, but
don't be fooled by these incidental details. They could be implemented
differently.

Dicts are used when you need a 1:1 mapping between a key and a value:

{name: address} # each person has exactly one home address

Sets are used when you have a list of items, but you don't care about the
order of the items, only whether or not something is in the list.

Since lists are ordered, searching a list to see if something is in it
can be slow. Sets give up order so that they can make membership testing
blindingly fast:


>>> from timeit import Timer
>>> Timer("100042 in L", "L=range(100000)").timeit(number=10000)
53.221637010574341
>>> Timer("100042 in S", "S=set(range(100000))").timeit(number=10000)
0.0022180080413818359

> For example, here we have:
> self._pos = {predicate:{object:set([subject])}} This is said to be a
> dictionary, containing dictionaries, which in turn contain sets. I don't
> know if it would be possible to explain why one would use a set,
> especially in this context without showing more. I was a bit stumped by
> this for some odd reason. I don't know if other languages lack certain
> structures and features.

Of course they do. That's why we have more than one programming language.
Earlier versions of Python didn't have sets. Standard Pascal doesn't have
dicts. Neither Fortran nor COBOL supported recursion, at least early on.


> It's been a long time since I did C
> programming too.
> Anyway, this would be used an a Class as follows:
>
> class SimpleGraph:
> def __init__(self):
> self._spo = {}
> self._pos = {}
> self._osp = {}
>
> This is to create indexes that are permutations of the pattern subject,
> predicate object. So, using this:
>
> self._pos = {predicate:{object:set([subject])}}

Sounds complicated. What are the indexes for? Why not just have the
permutations directly?


> We have the first dictionary keyed off the first term, the
> second dictionary keyed off the second term, and the set containing the
> third terms(note terms plural). I guess I somewhat understand that sets
> are used to test for membership. Cannot that be done with dictionaries,
> though?
> Also, if you ran this for loop, below, what is being yielded,
> the key or the value in the dictionary? I refer to this for loop here:
> for subject in self._pos[predicate][ojbect]: yield (subject, predicate,
> object)

What is yielded is exactly what you tell Python to yield: a tuple
containing three items (subject, predicate, object), whatever they are.


--
Steven

Paul Rudin

unread,
Oct 5, 2010, 3:56:39 AM10/5/10
to
"B. M. Whealton" <bwhe...@futurewavedesigns.com> writes:

> I did get a bit confused in reading about the concept of sets in
> python and why you would use them instead of a dictionary for example.

Use a set when something is naturally modelled as a set... it's a
collection of unordered objects that you can test for membership, form
unions, intersections etc. reasonably efficiently.

Use a dictionary when you want something that's naturally modelled as a
dictionary - i.e. you have a unordered collection of keys each of which
has a value associated with it and you want reasonably efficient lookup
of those values from the keys.

Certainly you can model a set as a dictionary, but that's likely to be
less efficient than using a set, and quite possibly you'll need to roll
your own operations on your sets, whereas they're provided for the built
in implementation.

(Of course if you know more information about your problem domain you
might have a better representation than the built in set... for example
you might imagine an application where you want to form sets of objects
from a fixed, relatively small, universe. Then you could perhaps model a
set as a bit array, and lots of set operations can then be done by very
fast bit fiddling operations.)

nn

unread,
Oct 5, 2010, 12:13:54 PM10/5/10
to
> Semantic web.  I did get a bit confused in reading about the concept of
> sets in python and why you would use them instead of a dictionary for

Sets are faster and more convenient to do intersections, unions,
differences. They also use less space than dictionaries. Finally they
also help conveying the intent of the code.

Many things can be emulated using dictionaries. In languages like Lua
there are no sets, and no tuples, and no lists*. This makes sense if
your goal is to create a very small embedded language. Using specific
structures on the other hand has the benefit that operations common to
that structure are more convenient to express and that those
operations are optimized for space and time.


*"Tables are the sole data structuring mechanism in Lua; they can be
used to represent ordinary arrays, symbol tables, sets, records,
graphs, trees, etc." http://www.lua.org/manual/5.1/manual.html#2.2

Terry Reedy

unread,
Oct 5, 2010, 4:45:40 PM10/5/10
to pytho...@python.org
On 10/5/2010 3:01 AM, Ian Kelly wrote:

> self._pos = {predicate: {object: {subject: None}}}
>
> That's a bit ugly because the None serves no purpose here; the value
> associated with the subject has no meaning in this context.

This is what we did in Python before sets were added.

> It also
> uselessly takes up space in memory that could add up to significant
> wastage if the data structure grows to be large.
>

> The other major difference between dicts and sets is that sets provide
> mathematical operations such as intersection and difference that dicts
> do not, but that doesn't appear to be relevant in this context.

These two reasons are why we added sets.

--
Terry Jan Reedy

Terry Reedy

unread,
Oct 5, 2010, 4:56:11 PM10/5/10
to pytho...@python.org
On 10/5/2010 3:39 AM, Steven D'Aprano wrote:

> Sets aren't an alternative to dictionaries. They have a completely
> different purpose.

A dict/mapping is a specialized set -- a set of ordered pairs in which
each first member (the 'key') only appears once as a first member. The
set union of two mappings may not be a mapping, which is why dicts have
a specialized .update method instead of .union.

> It so happens that sets and dicts *in Python* happen to share some
> implementation details, and they happen to share similar syntax, but
> don't be fooled by these incidental details. They could be implemented
> differently.
>
> Dicts are used when you need a 1:1 mapping between a key and a value:

The mapping does not have to be 1:1, which means that each value also
appears only once, so that the mapping can be inverted. Values can
appear more than once, which is to say, many keys can map to one value.
so the mapping is many:1. Collections (sets or lists) are used as values
to implement 1:many mappings.

--
Terry Jan Reedy

Lawrence D'Oliveiro

unread,
Oct 7, 2010, 1:48:17 AM10/7/10
to
In message <87bp79q...@rudin.co.uk>, Paul Rudin wrote:

> Certainly you can model a set as a dictionary, but that's likely to be
> less efficient than using a set, and quite possibly you'll need to roll
> your own operations on your sets, whereas they're provided for the built
> in implementation.

Did you know that applying the “set” or “frozenset” functions to a dict
return a set of its keys?

Seems a bit dodgy, somehow.

Gregory Ewing

unread,
Oct 8, 2010, 7:37:46 PM10/8/10
to
Lawrence D'Oliveiro wrote:

> Did you know that applying the “set” or “frozenset” functions to a dict
> return a set of its keys?

> Seems a bit dodgy, somehow.

That's just a consequence of the fact that dicts produce their
keys when iterated over, and the set constructor iterates over
whatever you give it.

--
Greg

Lawrence D'Oliveiro

unread,
Oct 11, 2010, 7:11:03 AM10/11/10
to

Hmm. It seems that “iter(<dict>)” iterating over the keys has been around a
long time. But a dict has both keys and values: why are language constructs
treating them so specially as to grab the keys and throw away the values?

Is this an artifact of older directions in the language evolution, before
sets were introduced?

Ethan Furman

unread,
Oct 11, 2010, 8:39:08 AM10/11/10
to pytho...@python.org
Lawrence D'Oliveiro wrote:
> In message <8h9ob9...@mid.individual.net>, Gregory Ewing wrote:
>
>>Lawrence D'Oliveiro wrote:
>>
>>>Did you know that applying the “set” or “frozenset” functions to a dict
>>>return a set of its keys?
>>
>>>Seems a bit dodgy, somehow.
>>
>>That's just a consequence of the fact that dicts produce their
>>keys when iterated over, and the set constructor iterates over
>>whatever you give it.
>
> Hmm. It seems that “iter(<dict>)” iterating over the keys has been around a
> long time. But a dict has both keys and values: why are language constructs
> treating them so specially as to grab the keys and throw away the values?

What's so special about it? If you want the value, ask for it;
iterating over the dict without asking specifically for the values does
not give them. Furthermore, if you did get the key:value pairs how
would you store them in a set such that you could query the set to see
if a key were there?

~Ethan~

Robert Kern

unread,
Oct 11, 2010, 11:09:19 AM10/11/10
to pytho...@python.org
On 10/11/10 6:11 AM, Lawrence D'Oliveiro wrote:
> In message<8h9ob9...@mid.individual.net>, Gregory Ewing wrote:
>
>> Lawrence D'Oliveiro wrote:
>>
>>> Did you know that applying the “set” or “frozenset” functions to a dict
>>> return a set of its keys?
>>
>>> Seems a bit dodgy, somehow.
>>
>> That's just a consequence of the fact that dicts produce their
>> keys when iterated over, and the set constructor iterates over
>> whatever you give it.
>
> Hmm. It seems that “iter(<dict>)” iterating over the keys has been around a
> long time. But a dict has both keys and values: why are language constructs
> treating them so specially as to grab the keys and throw away the values?

Language constructs are not treating anything specially much less "throwing
away" anything. The language construct in question does exactly the same thing
with every object nowadays: call the .__iter__() method to get the iterator and
call .next() on that iterator until it raises StopIteration. It is the
responsibility of the dict object itself to decide how it wants to be iterated over.

The reasoning for this decision is spelled out in the PEP introducing the
iterator feature:

http://www.python.org/dev/peps/pep-0234/

"""
- There has been a long discussion about whether

for x in dict: ...

should assign x the successive keys, values, or items of the
dictionary. The symmetry between "if x in y" and "for x in y"
suggests that it should iterate over keys. This symmetry has been
observed by many independently and has even been used to "explain"
one using the other. This is because for sequences, "if x in y"
iterates over y comparing the iterated values to x. If we adopt
both of the above proposals, this will also hold for
dictionaries.

The argument against making "for x in dict" iterate over the keys
comes mostly from a practicality point of view: scans of the
standard library show that there are about as many uses of "for x
in dict.items()" as there are of "for x in dict.keys()", with the
items() version having a small majority. Presumably many of the
loops using keys() use the corresponding value anyway, by writing
dict[x], so (the argument goes) by making both the key and value
available, we could support the largest number of cases. While
this is true, I (Guido) find the correspondence between "for x in
dict" and "if x in dict" too compelling to break, and there's not
much overhead in having to write dict[x] to explicitly get the
value.

For fast iteration over items, use "for key, value in
dict.iteritems()". I've timed the difference between

for key in dict: dict[key]

and

for key, value in dict.iteritems(): pass

and found that the latter is only about 7% faster.

Resolution: By BDFL pronouncement, "for x in dict" iterates over
the keys, and dictionaries have iteritems(), iterkeys(), and
itervalues() to return the different flavors of dictionary
iterators.
"""

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

Gregory Ewing

unread,
Oct 11, 2010, 8:11:18 PM10/11/10
to
Robert Kern wrote:

> The reasoning for this decision is spelled out in the PEP introducing
> the iterator feature:
>
> http://www.python.org/dev/peps/pep-0234/

There's also the question of whether 'if x in dict' should
compare keys only or both keys and values. This was also
hotly debated back when dicts were given an 'in' operator
(before that you had to use dict.haskey(x)). Can't remember
all the arguments, but it was definitely thought about.

--
Greg

Steve Howell

unread,
Oct 12, 2010, 1:00:11 AM10/12/10
to

The PEP seems to refer to such discussion, although no there is no
specific footnote to the mailing list archives.

- Regarding "if key in dict": there is no doubt that the
dict.has_key(x) interpretation of "x in dict" is by far the
most useful interpretation, probably the only useful one. There
has been resistance against this because "x in list" checks
whether x is present among the values, while the proposal makes
"x in dict" check whether x is present among the keys. Given
that the symmetry between lists and dictionaries is very weak,
this argument does not have much weight.

Symmetry is always a tricky balance in programming languages. Python
discounts symmetry considerations with respect to lists and
dictionaries when it comes to "in" (see above), while it goes for a
certain symmetry between "if" and "for" with respect to the "in"
keyword and dictionaries (see below):

While this is true, I (Guido) find the correspondence between
"for x in
dict" and "if x in dict" too compelling to break, and there's
not
much overhead in having to write dict[x] to explicitly get the
value.

There is no major inconsistency here; "symmetry" is subjective and
needs to be applied with a heavy dose of pragmatism.

Lawrence D'Oliveiro

unread,
Oct 12, 2010, 6:34:00 AM10/12/10
to
In message
<e70027a2-045b-4327...@a4g2000prm.googlegroups.com>, Steve
Howell wrote:

> Symmetry is always a tricky balance in programming languages.

Is that what we used to call “orthogonality”?

D'Arcy J.M. Cain

unread,
Oct 12, 2010, 9:15:49 AM10/12/10
to Lawrence D'Oliveiro, pytho...@python.org
On Tue, 12 Oct 2010 23:34 +1300
Lawrence D'Oliveiro <l...@geek-central.gen.new_zealand> wrote:
> > Symmetry is always a tricky balance in programming languages.
>
> Is that what we used to call “orthogonality”?

No, orthogonality is something else. "Orthogonal" means "perpendicular
to." See the Wikipedia article for a discussion of the word's
application to computer science.

--
D'Arcy J.M. Cain <da...@druid.net> | Democracy is three wolves
http://www.druid.net/darcy/ | and a sheep voting on
+1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.

Lawrence D'Oliveiro

unread,
Oct 12, 2010, 8:54:05 PM10/12/10
to
In message <mailman.1606.1286889...@python.org>, D'Arcy
J.M. Cain wrote:

> On Tue, 12 Oct 2010 23:34 +1300
> Lawrence D'Oliveiro <l...@geek-central.gen.new_zealand> wrote:
>
>>> Symmetry is always a tricky balance in programming languages.
>>
>> Is that what we used to call “orthogonality”?
>
> No, orthogonality is something else. "Orthogonal" means "perpendicular
> to."

The appropriate meaning is ‘being able to combine independently” (as in the
orthogonal decomposition of a Fourier transform). An example of contemporary
usage, from the “Revised Report on the Algorithmic Language ALGOL 68” (1974,
I think):

0.1.2 Orthogonal design
The number of independent primitive concepts has been minimized in
order that the language be easy to describe, to learn, and to implement.
On the other hand, these concepts have been applied “orthogonally” in
order to maximize the expressive power of the language while trying to
avoid deleterious superfluities.

So “orthogonality” has to do with use of minimum number of primitive
components (operators, object types) in a maximum number of meaningful
combinations.

Steve Howell

unread,
Oct 12, 2010, 9:40:08 PM10/12/10
to
On Oct 12, 5:54 pm, Lawrence D'Oliveiro <l...@geek-
central.gen.new_zealand> wrote:
> In message <mailman.1606.1286889135.29448.python-l...@python.org>, D'Arcy

Lawrence, I was actually talking about symmetry, not orthogonality. I
was making the observation that Python doesn't always strive for
symmetry as its #1 driving concern (as well it shouldn't). I think
your definition of orthogonality is more what Python is about. I'd
say that symmetry and orthogonality go hand in hand for most
situations. Symmetry often helps minimize the number of primitive
components, for example. But sometimes symmetry arguments can be
forced or just be made on a weak foundation.

MRAB

unread,
Oct 12, 2010, 10:24:20 PM10/12/10
to pytho...@python.org

Someone (Niklaus Wirth?) once said something about how a simpler
language isn't necessarily easier to learn. Pascal, for example, makes
a distinction between procedures and function, which can help to
prevent some bugs, and Python forbids assignment in expressions, which
also helps to prevent some bugs.

Lawrence D'Oliveiro

unread,
Oct 13, 2010, 7:27:37 PM10/13/10
to
In message
<57a322df-8e42-4da5...@b14g2000pro.googlegroups.com>, Steve
Howell wrote:

> Lawrence, I was actually talking about symmetry, not orthogonality.

So what’s the difference?

Steve Howell

unread,
Oct 13, 2010, 9:38:13 PM10/13/10
to
On Oct 13, 4:27 pm, Lawrence D'Oliveiro <l...@geek-
central.gen.new_zealand> wrote:
> In message
> <57a322df-8e42-4da5-af96-0c21c5733...@b14g2000pro.googlegroups.com>, Steve

>
> Howell wrote:
> > Lawrence, I was actually talking about symmetry, not orthogonality.
>
> So what’s the difference?

I don't think there's a big difference between the two--you snipped
away my comment about them often going hand in hand, so I will repeat
it here that they do often go hand in hand.

I guess a lot depends on how you define "symmetry." Is your
definition of "symmetry" equivalent to your definition of
"orthogonality"?

I don't have a precise definition of "symmetry" myself, I will admit.
Perhaps for lack of a better word, I throw around the term "symmetry"
for situations where analogous constructs get used in slightly
different contexts. The example in this discussion is the "in"
keyword. You can use "in" for strings, sets, lists, and dictionaries,
which in and of itself is an example of "symmetry" in my mind.
Another example is the relationship of the "+" operator to the "*"
operator. With numbers "*" connotes repeated use of "+", as it does
with strings, even though the underlying operations for "+" are not
exactly equivalent. String star-ification is repeated concatenation,
whereas integer star-ification is repeated addition, but since
concatenation and addition are both represented by "+", it creates
some sort of symmetry to have "*" as the string-concatenation-
repetition operator, even if it is not something folks commonly
encounter outside of programming languages.

Lawrence D'Oliveiro

unread,
Oct 13, 2010, 10:25:15 PM10/13/10
to
In message
<aa6eafa0-7075-424c...@w19g2000yqb.googlegroups.com>, Steve
Howell wrote:

> I guess a lot depends on how you define "symmetry." Is your
> definition of "symmetry" equivalent to your definition of
> "orthogonality"?

No idea. It’s just that the example being discussed in this thread seemed to
come under the old term “orthogonality”, so I wondered why a different term
was being used.

So far no-one has come up with a good excuse for using a different term.

Steve Howell

unread,
Oct 13, 2010, 10:48:53 PM10/13/10
to
On Oct 13, 7:25 pm, Lawrence D'Oliveiro <l...@geek-
central.gen.new_zealand> wrote:
> In message
> <aa6eafa0-7075-424c-abef-79cbc0dd3...@w19g2000yqb.googlegroups.com>, Steve

Ask the authors of PEP 234 why they use the term symmetry:

http://www.python.org/dev/peps/pep-0234/

That was the original context of my comment. The term "symmetry" gets
used a couple times in that PEP, and I think we're in violent
agreement that the concept of "symmetry" is wishy-washy at best.

Here is just one example from the PEP:

The symmetry between "if x in y" and "for x in y"
suggests that it should iterate over keys. This symmetry has
been
observed by many independently and has even been used to
"explain"
one using the other.

I think I'm just making the same observation as you coming from a
different angle. Why talk about "symmetry" when it's such a tricky
balance?

Gregory Ewing

unread,
Oct 14, 2010, 7:09:13 PM10/14/10
to
Steve Howell wrote:

> That was the original context of my comment. The term "symmetry" gets
> used a couple times in that PEP, and I think we're in violent
> agreement that the concept of "symmetry" is wishy-washy at best.
>
> Here is just one example from the PEP:
>
> The symmetry between "if x in y" and "for x in y"
> suggests that it should iterate over keys.

Maybe "analogy" or "similarity" would be a better word here.

--
Greg

Steve Howell

unread,
Oct 14, 2010, 9:13:06 PM10/14/10
to

Agreed. "Analogy" seems particularly appropriate.

Lawrence D'Oliveiro

unread,
Oct 15, 2010, 12:22:25 AM10/15/10
to
In message
<12fcd67a-774d-42f0...@s24g2000pri.googlegroups.com>, Steve
Howell wrote:

Except that, while analogies can be handy for illustrating things, they are
useless for actually supporting arguments.

Steve Howell

unread,
Oct 15, 2010, 12:46:21 AM10/15/10
to
On Oct 14, 9:22 pm, Lawrence D'Oliveiro <l...@geek-
central.gen.new_zealand> wrote:
> In message
> <12fcd67a-774d-42f0-851a-9c3497df9...@s24g2000pri.googlegroups.com>, Steve

And only a fool would try to support his argument by actually
illustrating a point, right? ;)

0 new messages