Are sets ever likely to be a built in type?

556 views
Skip to first unread message

Peter Williams

unread,
Apr 5, 2010, 8:20:34 PM4/5/10
to golang-nuts
Sets are a useful tool for many computing tasks and an efficient built
in mechanism would make them more useful. Is there any likelihood of
their being added to Go?

The bitwise operators for integers already allow unsigned integers to be
used as SMALL integer sets and the operators to be viewed as set
operators. E.g:

| becomes the union operator
& becomes the intersection operator

So, if sets became a built in type, these operators could be used for
them without undue confusion. Unfortunately, the fact that {} (which
are commonly use to enclose sets) are used as block delimiters means
that they are unavailable as a notation for set constants but I'm sure
something other notation could be devised.

The alternative to built in set functionality is to provide a set
package as part of the container package group. The main disadvantage
of this alternative (to built ins) is the need to use functions/methods
instead of operators for the basic operations on sets.

Peter

Kevin Ballard

unread,
Apr 5, 2010, 8:25:55 PM4/5/10
to Peter Williams, golang-nuts

I would actually prefer to have a package for sets rather than
overloading the bitwise operators.

-Kevin

--
Kevin Ballard
http://kevin.sb.org
kbal...@gmail.com

Peter Williams

unread,
Apr 6, 2010, 8:43:58 PM4/6/10
to Kevin Ballard, golang-nuts

If no one else is working on an implementation of a package for sets, I
would like to have a go. So, if I don't here someone say they're
already doing it, I'll get started.

Cheers
Peter

Peter Froehlich

unread,
Apr 14, 2010, 12:06:24 AM4/14/10
to Peter Williams, Kevin Ballard, golang-nuts
Hi all,

On Tue, Apr 6, 2010 at 8:43 PM, Peter Williams <pwil...@gmail.com> wrote:
> If no one else is working on an implementation of a package for sets, I
> would like to have a go.  So, if I don't here someone say they're already
> doing it, I'll get started.

I've started my own little hack for this. If you care to look:

http://github.com/phf/go-intset

I currently have simplistic bitsets as well as sparse sets following
Briggs/Torzcon. The latter seem to really rock, but of course they eat
a lot of memory...

Cheers,
Peter
--
Peter H. Froehlich <http://www.cs.jhu.edu/~phf/>
Senior Lecturer | Director, Johns Hopkins Gaming Lab

Peter Williams

unread,
Apr 14, 2010, 1:44:01 AM4/14/10
to Peter Froehlich, Kevin Ballard, golang-nuts
On 14/04/10 14:06, Peter Froehlich wrote:
> Hi all,
>
> On Tue, Apr 6, 2010 at 8:43 PM, Peter Williams<pwil...@gmail.com> wrote:
>> If no one else is working on an implementation of a package for sets, I
>> would like to have a go. So, if I don't here someone say they're already
>> doing it, I'll get started.
>
> I've started my own little hack for this. If you care to look:
>
> http://github.com/phf/go-intset
>
> I currently have simplistic bitsets as well as sparse sets following
> Briggs/Torzcon. The latter seem to really rock, but of course they eat
> a lot of memory...

For those just joining the conversation, my work (still experimental and
previously unannounced) is at:

<http://github.com/pwil3058/gosets>

I'm contemplating two main classes of set: one for sets of the various
integer types and another for sets of arbitrary types that satisfy a
specified interface.

So far all the work has been on the integer type sets which so far has a
working "engine" and a trial version for uint32 sets which will probably
undergo a fair bit of change as I try various ideas.

My latest idea is to change to an implementation of the higher level
sets (with minor changes to the "engine") that will allow set operations
between compatible set types. This will be possible as all types will
share the same engine. Most set operations would be compatible between
set types with the exception being the union and symmetric difference
operators where one set is a signed integer set and the other an
unsigned integer set.

This problem could be overcome by providing a cast operator to create a
new set (of the proscribed type) using the members of an existing set
minus, of course, those members that are not of valid values for the
type of the new set.

Peter
PS Work is slow as I'm still getting my head around Go.

Peter Williams

unread,
Jun 3, 2010, 2:06:40 AM6/3/10
to golang-nuts

Implementations (still being debugged and tested but hopefully ready to
play with) of two set packages are available in the "mudlark" Go package
project at:

<https://code.google.com/p/mudlark-go-pkgs/>

The "mudlark/set/bitset" package contains a type that implements sets of
whole numbers in the range min(int64) to max(uint64) using a sparse
bitmap implemented using maps.

The "mudlark/set/heteroset" package contains a type that implements
heterogeneous sets of objects of mixed types.

If anyone is interested and has the time, I'd appreciate review and
feedback on these packages.

Thanks
Peter
PS there are also packages for alternate sort functions ("mudlark/sort")
and a Left Leaning Red/Black Tree ("mudlark/tree/llrb_tree") in the project.
PPS The "mudlark" part is just a device for staking out a name space so
that packages don't clash with other packages with similar functionality.

Serge Hulne

unread,
Jun 4, 2010, 5:46:08 AM6/4/10
to golang-nuts
Overloading operators is one of the features which make C++
unreadable.
(because it encourages programmers to make up their own personal
syntax).

This is why Python, Go (and even C) are much nore readable than C++
(or Pearl or Ruby ...).

Serge.


On Apr 6, 2:25 am, Kevin Ballard <kball...@gmail.com> wrote:
> kball...@gmail.com

chris dollin

unread,
Jun 4, 2010, 5:53:32 AM6/4/10
to Serge Hulne, golang-nuts
On 4 June 2010 10:46, Serge Hulne <serge...@gmail.com> wrote:
Overloading operators is one of the features which make C++
unreadable.
(because it encourages programmers to make up their own personal
syntax).

Not because it encourages programmers to make up their own syntax
(because it just reuses existing syntax), but perhaps because it
means intuitions about the meaning of those operators are no
longer reliable.

(And maybe because it's hard for a C++ IDE to show the overloadings
easily? If a use of an overloaded operator was displayed in radioactive blue,
one would be less likely to miss it ...)

Chris

--
Chris "happily ignorant of C++ IDEs" Dollin

Russel Winder

unread,
Jun 4, 2010, 7:07:28 AM6/4/10
to chris dollin, Serge Hulne, golang-nuts
On Fri, 2010-06-04 at 10:53 +0100, chris dollin wrote:

>
> Not because it encourages programmers to make up their own syntax
> (because it just reuses existing syntax), but perhaps because it
> means intuitions about the meaning of those operators are no
> longer reliable.

Of course the facility is also the one that makes writing internal DSLs
easy. So there is no black and white here just shades of grey.

--
Russel.
=============================================================================
Dr Russel Winder t: +44 20 7585 2200 voip: sip:russel...@ekiga.net
41 Buckmaster Road m: +44 7770 465 077 xmpp: rus...@russel.org.uk
London SW11 1EN, UK w: www.russel.org.uk skype: russel_winder

signature.asc

Paulo Pinto

unread,
Jun 4, 2010, 9:07:52 AM6/4/10
to golang-nuts
Well it might make the code unreadable if badly used, but it sure
makes the code with complex
mathematical expressions.

Just in case you don't know it, Python also supports overloading
operators. Actuall all of the
following languages do support such a feature:

- C++
- C#
- D
- Python
- Ruby
- Groovy
- Smalltalk
- Scala
- Haskell
- F#
- ML

Maybe there are even more I just forgot.

But it is true that they can be easily used in bad ways. I already saw
lots of strange code that was making a big usage
of operator overloading in very strange ways (yes this was in C++).

chris dollin

unread,
Jun 4, 2010, 9:13:33 AM6/4/10
to Paulo Pinto, golang-nuts
On 4 June 2010 14:07, Paulo Pinto <paulo....@gmail.com> wrote:
Actuall all of the
following languages do support such a feature:
 
- ML

ML doesn't have overloading, not the last time I looked,
anyway.

(ML-style polymorphicsm isn't overloading.)

Chris

--
Chris "allusive" Dollin

Shawn

unread,
Jun 10, 2010, 1:53:38 AM6/10/10
to golang-nuts
You can create a kind of "poor man's" set by using maps with the keys
being the elements, and ignoring the values. For example, for a set of
strings:

make a set:
mySet := make(map[string] struct{})

add an element:
mySet[x] = struct{}{}

test whether an element is contained:
_, isContained := mySet[x]

remove an element:
mySet[x] = struct{}{}, false

size of set:
len(mySet)

iterate over the set:
for x := range mySet { }

Of course, you can replace "string" with any other type that can be
used as a key in a map.

Shawn

On Jun 4, 6:13 am, chris dollin <ehog.he...@googlemail.com> wrote:

Benny Siegert

unread,
Jun 10, 2010, 3:55:59 AM6/10/10
to Shawn, golan...@googlegroups.com
On Thu, Jun 10, 2010 at 07:53, Shawn <fake....@gmail.com> wrote:
> You can create a kind of "poor man's" set by using maps with the keys
> being the elements, and ignoring the values.

This is a good idea. This pattern is quite often used in Perl.

For simplicity, I would suggest making the value an int and just
writing 1 (or whatever) in there each time.

--Benny.

chris dollin

unread,
Jun 10, 2010, 4:18:02 AM6/10/10
to Benny Siegert, Shawn, golan...@googlegroups.com

What's wrong with making the value a struct{} and putting that
in each value? It shouldn't take up any extra space for the
value then.

Shouldn't it?

Chris

--
Chris "allusive" Dollin

Xuan Luo

unread,
Jun 10, 2010, 4:30:15 AM6/10/10
to chris dollin, Benny Siegert, golan...@googlegroups.com

Hmm, it appears that struct{} does take up space, because
unsafe.Sizeof(struct{}{}) returns 4. :(
Shawn

chris dollin

unread,
Jun 10, 2010, 4:36:32 AM6/10/10
to xua...@ucla.edu, Benny Siegert, golan...@googlegroups.com
On 10 June 2010 09:30, Xuan Luo <fake....@gmail.com> wrote:

Hmm, it appears that struct{} does take up space, because
unsafe.Sizeof(struct{}{}) returns 4. :(

Myself I would count that as a bug. What do the Go team think?

roger peppe

unread,
Jun 10, 2010, 4:52:19 AM6/10/10
to chris dollin, xua...@ucla.edu, Benny Siegert, golan...@googlegroups.com
this has come up before. i think some parts of the
runtime assume that sizeof(T) > 0 for all T.

personally, i'd use map[KeyType]bool,
or perhaps big.Int if KeyType is a smallish
integer.

the former has the advantage that you can
test for membership in an expression, because
maps return the zero value (false) when a member
isn't present; the latter will be very much faster for
set operations, and allows representation of complementary
sets, which can be useful.

Ian Lance Taylor

unread,
Jun 10, 2010, 9:54:05 AM6/10/10
to chris dollin, xua...@ucla.edu, Benny Siegert, golan...@googlegroups.com
chris dollin <ehog....@googlemail.com> writes:

> On 10 June 2010 09:30, Xuan Luo <fake....@gmail.com> wrote:
>
>>
>> Hmm, it appears that struct{} does take up space, because
>> unsafe.Sizeof(struct{}{}) returns 4. :(
>>
>
> Myself I would count that as a bug. What do the Go team think?

I think that zero length types lead to some odd situations. E.g., if
you have an array of them, then pointers to different elements of the
array are ==.

Ian

chris dollin

unread,
Jun 10, 2010, 10:04:05 AM6/10/10
to Ian Lance Taylor, xua...@ucla.edu, Benny Siegert, golan...@googlegroups.com

Which is amusing, but is it likely to be a problem?

(Since there's exactly one value of type struct{}, even if all those
pointers were different they'd point to the same value and be
unable to change it. I suppose this leads to similar metaphysical
issues as whether two functions with the same written representation
are to be treated as equal or not, as seen elsewhere recently.)

--
Chris "allusive" Dollin

Peter Williams

unread,
Jun 10, 2010, 8:45:26 PM6/10/10
to Shawn, golang-nuts
On 10/06/10 15:53, Shawn wrote:
> You can create a kind of "poor man's" set by using maps with the keys
> being the elements, and ignoring the values. For example, for a set of
> strings:
>
> make a set:
> mySet := make(map[string] struct{})
>
> add an element:
> mySet[x] = struct{}{}
>
> test whether an element is contained:
> _, isContained := mySet[x]
>
> remove an element:
> mySet[x] = struct{}{}, false
>
> size of set:
> len(mySet)
>
> iterate over the set:
> for x := range mySet { }
>
> Of course, you can replace "string" with any other type that can be
> used as a key in a map.

You forgot Union(), Intersection(), Difference(), etc. :-)

My bitset package for sets ("mudlark-go-pkgs" at Google Code) of integer
numbers (int, uint, int32, ...) uses maps but instead of using one entry
per set member packs them into a uint as one bit per member
(bits[member / sizeof(uint)] | member % sizeof(uint) roughly). How much
gain (memory efficiency) you get from this depends on how dense your set
is but it will be no worse than one map entry per member. Cost is
slight increase in computational overhead some of which may be regained
doing set operations (Union, etc.) depending on previous qualification
re set density.

At the moment my iterator sends members in partial sorted order (due to
map keys not being sorted by "range") but I'm working on this. I tried
using a binary tree instead of maps (because you get the ordering for
free) but most operations were 3 to 5 times slower than with maps. A
factor of 2 would have been predicted for my implementation (due to many
operations requiring two look ups instead of one) so the indications are
that maps are very efficient and I decided to go with them.

For more general sets, maps are a good option (with a suitable wrapper)
as long as the item's types are suitable for use as map keys otherwise
you need to look elsewhere.

Peter

Reply all
Reply to author
Forward
0 new messages