beg for Bag

TSa

unread,

Nov 28, 2006, 12:12:25 PM11/28/06

to p6l

HaloO,

as a spin-off of the 'Set-returning .keys (was Re: Smart Matching
clarification)' thread I want to propose the addition of a Bag
type that completes the set of immutable types. It shall have the
following properties.

1) It is a multiset generalization of Set
2) It is a supertype of Set and Seq (a Set can of course be build
from a Seq). That is 'Set does Bag' and 'Seq does Bag'. Note
that a Seq is a ready-made Bag and if it happens to have no
duplicates it behaves like a Set.
3) It has set operations as generalizations of the Set operations
4) It provides some Bag specific ops like (+) that return a Bag
even when called with Sets
5) It provides the iteration interface (which in turn is applicable
to the subtypes Set and Seq, of course)
6) %hash.values returns a Bag
7) %hash.keys returns a Set

The wording in section 'Immutable Types' in S06 concerning Set as
"Unordered Seqs that allow no duplicates" is a bit misleading because
it hints as Set being a subtype of Seq. The Mapping could be explained
as Set of Pair.

The Bag type could be implemented in a module but then we need a
way to do supertyping. The added Bag type would need to add a
multiplicity accessor that returns 1 to the Set implementation and
add lazy multiplicity counting to Seq. And I don't know how Hash::values
would be augmented to return a Bag. This Bag return type would guarantee
%h1.values === %h2.values to yield true when the two hashes happen to
return their values in different orders.

Comments?
--

Smylers

unread,

Nov 28, 2006, 2:08:50 PM11/28/06

to perl6-l...@perl.org

TSa writes:

> I want to propose the addition of a Bag type

Different from the C<Bag> that's already mentioned in Synopsis 3?

Smylers

Darren Duncan

unread,

Nov 28, 2006, 6:33:23 PM11/28/06

to perl6-l...@perl.org

TSa wasn't the first person to ask for an explicit Bag type. I did
too, a few weeks ago. And one reason for that was exactly what you
mention. Various other parts of the Synopsis documents mention Bag
in examples and such, but the list of built-in types in Synopsis 6
does not include it. In my mind, unless Bag appears in the Synopsis
6 list, its references elsewhere count as nothing more than an
example stand-in for some arbitrary user-defined type. -- Darren
Duncan

Jonathan Lang

unread,

Nov 28, 2006, 9:26:17 PM11/28/06

to p6l

TSa wrote:
> 1) It is a multiset generalization of Set
> 2) It is a supertype of Set and Seq (a Set can of course be build
> from a Seq). That is 'Set does Bag' and 'Seq does Bag'. Note
> that a Seq is a ready-made Bag and if it happens to have no
> duplicates it behaves like a Set.
> 3) It has set operations as generalizations of the Set operations

Note that this would mean that Seq would also have set operations.

> 4) It provides some Bag specific ops like (+) that return a Bag
> even when called with Sets

or Seqs.

--
Jonathan "Dataweaver" Lang

TSa

unread,

Nov 29, 2006, 3:56:48 AM11/29/06

to p6l

HaloO,

Jonathan Lang wrote:
> Note that this would mean that Seq would also have set operations.

I count this as an advantage. So one can write (1,2,3) (|) (2,2,3,4,4)
to get a result of (1,2,2,3,4,4). As long as the Seq is a Set, that is
it has no duplicates, you get Set behavior through the Bag ops:
(1,2,3) (|) (2,3,4) === (1,2,3,4); (1,2,3) (&) (2,3,4) === (2,3).

BTW, the set/bag operations are not yet mentioned in S03 as new
operators. Here's a list what I think they should be:

(|) union
(&) intersection
(^) symmetric difference
(/) disjoint union?
(!) complement, this is difficult because you need the surrounding set
(-) difference
(+) join, returns a bag
(*) cartesian product
(**) powerset
(in) membership
(!in) negated membership
(<) proper subset
(>) proper superset
(<=) subset
(>=) superset
(=) equality, also with ===
(!=) inequality, also with !===

Did I forget something?

Regards, TSa.
--

TSa

unread,

Nov 30, 2006, 6:38:40 AM11/30/06

to p6l

HaloO,

Jonathan Lang wrote:
> Would (1,2,2,3,4,4) be a Seq or a Bag?

Comma constructs a Seq, of course.

> IMHO, the _only_ way this
> could work would be if it's a Bag: if it's a Seq, I see no way that
> one could resolve '(1,2,3) ∪ (3,1,2)'.

This is not any different from '3' + '4' resulting in numeric 7.
It is the operator that builds Bags from (1,2,3) and (3,1,2) and
then calculates the union.

> Mind you, I'm still not sold on the idea of performing set operations
> on Seqs - it may be technically feasible to do so, but it strikes me
> as fundamentally unintuitive.

But you have no problem with doing numeric stuff? E.g. (1,2,3) + (3,4)
actually means +(1,2,3) + +(3,4) == 3 + 2 == 5. It is the operator that
indicates Set/Bag operations. And 'Seq does Bag' comes in handy here.
Actually, no set operation is performed on Seq. The Seq is just used to
build a Bag. This is like there is no string concatenation for numbers,
arrays etc. They are stringified first.

One could also write @a (|) @b and the two arrays would be flattened to
Seq and then used to build a Bag. The only drawback with Bag being the
Seq supertype is that people might want @a (|) @b to mean Set(@a) (|)
Set(@b). The Set enforcement could also come after the Bag union:
Set(@a (|) @b). Perhaps @a (|) @b delivers the result in an Array
instead of a Bag value.

So in general the parens ops should not be called set ops but unordered
ops because this is their core meaning. They are overloaded for Bag and
Set.

> I'm still bothered by the idea that you have to wrap _every_ ASCII
> representative of a set operation in parentheses - something which is
> only necessary when you start applying the full range of set
> operations to non-Set entities. In particular, I want 'Set - Set' to
> produce the difference of the two Sets.

But that should be the numeric difference of the cardinalities of the
Sets. Plain - means numeric. It's the same thing that we don't do string
concatenation with + but have a stringish ~ for the task. Hence it works
to say 3 ~ 4 and end up with '34'.

> If set operations also apply to Seq, then (=) is not the same as ===.
> The former ignores the order of the terms; the latter only does so for
> Sets and Bags. In a way, this is what started the whole debate.

You are correct. This emphasizes the need for (=).

> You mention a single "disjoint union" operator: is it supposed to be
> the "disjoint union" comparison operator (i.e., it returns true if the
> sets are disjoint), or the "disjoint union" composition operator
> (which returns a Set of Pairs, with each element being keyed according
> to the Set that it was originally in)?

Perhaps we need both. I've chosen (/) for it's visual similarity with
(|) like || and //. We could use (/?) for the disjointness test. Or
doing visual games again: (%). You see the disjoint sets in the glyph.
(:) might work, too.

> Saying that "complement" is difficult is an understatement. I suppose
> you _could_ get it to work by having a 'complemented Set' would keep
> track of which elements it _doesn't_ have; but this opens a can of
> worms that I really don't think we want to get into (e.g., "A ∪
> (!)B"). And the notion of a complement with regard to the surrounding
> set is already handled by the difference operator.

Isn't that similar in semantics to the none junction? If you have e.g.
a Set of Int (1,2,3) you can express (!)(1,2,3) as Set(*..0,4..*).
But I agree that it is difficult in general. Well, (!)$x might actually
create a large set of almost all currently known objects in some scope.
This might just be written as * (-) $x. IOW, Set(*) means all known
objects. As long as a program is not wielding large data sets complement
might just work---and the complement of a large set is small, of course.

Note that $a (-) $b is equivalent to $a (&) (!)$b, you have it as the
union. Or did you mean $a (|) (!)$b === (!)($b (-) $a)?

Hmm, how is Bag complement defined in the first place? Is it just
complementing the set and goes for multiplicities of 1?

> You did highlight some things - for instance, a Set of Pairs is _not_
> a Hash: a Hash has a further requirement that every key must be
> unique, whereas a Set of Pairs allows for duplicate keys (but not
> duplicate key-value pairs; go with a Bag of Pairs for that).

Correct.

Regards, TSa.
--