Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

using the newer collection types

10 views
Skip to first unread message

Darren Duncan

unread,
May 4, 2006, 5:44:42 PM5/4/06
to perl6-l...@perl.org, dar...@darrenduncan.net
As I carry on in my spare time to implement a Relation type for Perl
6, I would like to use some of the simpler types that were added to
the Synopsis recently and seem to lack a lot of explanatory details
that older types have, and moreover they don't seem to be implemented
yet in Pugs.

So I have a few questions whose answers should clarify the intended
meaning and features of these newer types, as well as the syntax for
declaring them.

Some relevant example types from Synopsis 6:

Immutable types

Objects with these types behave like values, i.e. C<$x === $y> is true
if and only if their types and contents are identical.

List Lazy Perl list (composed of Seq and Range parts)
Seq Completely evaluated (hence immutable) sequence
Range Incrementally generated (hence lazy) sequence
Set Unordered Seqs that allow no duplicates
Junction Sets with additional behaviours
Pair Seq of two elements that serves as an one-element Mapping
Mapping Pairs with no duplicate keys
Signature Function parameters (left-hand side of a binding)
Capture Function call arguments (right-hand side of a binding)

Mutable types

Objects with these types have distinct C<.id> values.

Array Perl array
Hash Perl hash

The intended new Relation type could be described like this, if I
correctly understand the meaning of the existing types:

Immutable types

Relation Set of Mappings where all Mappings have the same keys

Speaking a little more technically, a Relation has 2 main components,
its heading and its body. The heading is a set of 0..N keys (called
"attributes" in relation-land), and the body is a set of 0..N
Mappings (called "tuples" in relation-land), where they set of keys
of each Mapping is identical to the Relation's heading. Its very
likely that a language-embedded Relation implementation would
actually not repeat the keys for each member Mapping, but we can
conceptualize as if they were present for simplicity.

The operations that you can do with a Relation are a proper super-set
of those you can do with a Set. So, the Relation type supports all
the same Set operators, with the same meanings, such as: equal(),
subset(), superset(), union(), intersection(), difference(),
symmetric_difference(), none(), any(), all(), member_exists(),
members(), member_count(). Moreover, the Relation type has these
operators that the Set type doesn't have: rename(), project(),
restrict(), extend(), join(), divide(), summarize(), group(),
ungroup(), wrap(), unwrap(), matching(), etc. Moreover, there would
probably be convenience wrapper functions over combinations of the
above operators such as insert(), update(), delete(), etc, though
they aren't essential (those examples are not mutators, despite their
name-sakes). Some extra operators like sort() would also be
provided, which convert Relations to Seqs or Arrays.

Now, some of the questions:

1. Are Sets or Junctions allowed to contain undefined elements? Can
undef be a key of a Mapping or Hash?

2. What actually is the practical distinction between a Set and a
Junction? Why would someone use one over the other? I recognize
that the use of Junctions is supposed to make parallelism easier, as
iterating through one is known to be order independent. But,
conceptually a Set and a Relation are exactly the same; you could
process their members in any order and/or in parallel as well. So is
the use of a Junction effectively like a compiler flag to make
certain kinds of Set ops faster at the expense of others?

3. Is a Signature like the keys of a Mapping but that it has extra
stuff like associated types and such? Can one declare and use a
Signature separately from declaring a function?

4. What is the syntax for declaring anonymous Sets and Mappings? I
am already aware of these syntax for other types (correct me if I'm
wrong):

$a = [1,2,3]; # Array
$b = {'x'=>2,'y'=>4}; # Hash
$c = (1=>2); # Pair
$d = (1,2,3); # Seq
$e = 1..5; # Range
$f = all(1,2,3); # Junction

If this hasn't yet been decided, might I suggest the following?:

$g = set(1,2,3); # Set
$h = ('x'=>2,'y'=>4); # Mapping

If that works, then perhaps an anonymous Relation declartion could look like:

$r = relation( set( 'x', 'y' ): ('x'=>2,'y'=>4), ('x'=>5,'y'=>6) );

I'm not particular with the exact syntax; it just needs to be something good.

Note that a terse form of this could leave out the heading
declaration if at least one Mapping/tuple is provided, since that
contains the same key list.

$r = relation( ('x'=>2,'y'=>4), ('x'=>5,'y'=>6) );

Then the heading declaration is only needed if the Relation has no
Mappings/tuples.

$r = relation( set( 'x', 'y' ): );

5. What is the syntax for subscripting or extracting Mapping
components? Eg, can we use the same .keys, .values, .pairs, etc that
we use for Hashes? Also, is it possible to directly get the keys of
a Mapping and/or a Hash as a Set, or is it more ideal to do the likes
of all($mapping.keys) to get that behaviour?

6. Can I declare with named Set (or Junction) and Mapping typed
variables and/or parameters that their members are restricted to
particular types, such as Str, as I can with Arrays and Hashes, so
that Perl itself will catch violations? Eg, can I say as a parameter
"Set of Str :$heading?" or "Set of Mapping(Str) of Any :$body?" so
Perl will check that arguments are suchwise correct?

7. Can we add some operators to Mapping that are like the Relation
ones, so that implementing a Relation over Mappings is easier (or,
see the end of #8)? Eg, these would be useful: rename(), project(),
extend(), join(). In particular, implementing join() into Mapping
would help save CPU cycles:

a. join() is an N-ary operator taking 0..N Mappings and returning 1 Mapping.
b. If given zero Mappings, it returns an empty Mapping (no keys or values).
c. If given any undefined arguments, the result is undef;
otherwise, continue as follows:
d. If given one Mapping, it returns that Mapping.
e. If given 2..N Mappings, it merges each pair in turn (order
doesn't matter) as follows.
f. If the two input Mappings have no keys in common, the new
Mapping, consists of all keys and values of the sources.
g. If the two input mappings have any keys in common, and their
corresponding values are also the same (as defined by the identity
operator ===), then the new Mapping contains one copy of each
key/value in common, plus, the key/values where the keys were
different; as a trivial case of this, if all keys and values are the
same, aka the 2 Mappings as a whole are identical, the result is
identical to either of the inputs.
h. If the two input mappings have any keys in common, but any
corresponding values are different, the output is undef.

8. While I avoided it so far for simplicity, I would like for it to
be possible to declare that each attribute of a Relation-valued
variable or parameter, or anonymous value for that matter, is
restricted to a specific type (eg: Str, Int, ...) as we can do for
Arrays and Hashes; until now, the above descriptions assumed that
type was implicitly Any. In that case, such a declaration could look
like this, where the heading is a Mapping rather than a Set:

$r = relation( ( 'x' => Int, 'y' => Int ): ('x'=>2,'y'=>4),
('x'=>5,'y'=>6) );

But if we do this, then I'm not sure that a Mapping would be
appropriate any more as a separable Relation element, since I'm not
aware that you can declare a Mapping variable or parameter with a
predefined set of typed keys; eg:

Mapping('x'=>Int,'y'=>Int) $x;
$x = ('x'=>2,'y'=>4); # succeeds
$x = ('x'=>'Hello'); # fails
$x = ('z'=>5); # fails

So I'm wondering whether this may be a good excuse to have a
relation-land Tuple type, which is like the Relation but whose body
has just one element:

Tuple('x'=>Int,'y'=>Int) $x;
$x = ('x'=>2,'y'=>4); # succeeds
$x = ('x'=>'Hello'); # fails
$x = ('z'=>5); # fails

Or:

$t = tuple( set( 'x'=>Int,'y'=>Int ): 'x'=>2,'y'=>4 );
$t = tuple( 'x'=>2,'y'=>4 );

If we have an actual relation-land Tuple type, then Mapping can be
kept a lot simpler and/or the way it is now.

I'm inclined to rule out the Signature and Capture types to be used
here, since while they have some similar properties to what I'm
looking for (a Relation header could be a Signature and its body a
set of Captures), they seem function specific. But maybe you think
differently?

So, any answers to my questions and/or feedback on my ideas is appreciated.

Thank you in advance.

-- Darren Duncan

Sam Vilain

unread,
May 4, 2006, 6:51:07 PM5/4/06
to Darren Duncan, perl6-l...@perl.org
Darren Duncan wrote:

>Speaking a little more technically, a Relation has 2 main components,
>its heading and its body. The heading is a set of 0..N keys (called
>"attributes" in relation-land), and the body is a set of 0..N
>Mappings (called "tuples" in relation-land), where they set of keys
>of each Mapping is identical to the Relation's heading. Its very
>likely that a language-embedded Relation implementation would
>actually not repeat the keys for each member Mapping, but we can
>conceptualize as if they were present for simplicity.
>
>

I don't think this terminology or these restrictions are particularly
useful.

I do think that a Pair should be a sub-type of a more general "Tuple"
type, with the 'where' clause being { .items == 2 } or something like that.

I think that the most flexible arrangement is to define;

- a Collection as a Bag of Tuples
- a Relation as a Collection where the tuples have a shape and no
duplicate tuples are allowed (but Relation does not need to be a core type)

Then, Mappings, Sequences, etc, become sub-types of one of the above two
types. For instance, a sequence is a Collection of (Int, Any) where the
first Int is unique across the collection. Similarly a Mapping is a
Collection of (Any, Any) where Unique(0).

something like

role Tuple { has @.items };
role Collection { has Tuple @.tuples };
subset Pair of Tuple where { .items.items == 2 };
subset Bag of Collection where { ! .tuples.grep:{.items > 1 } }
subset Set of Bag where {
all( .tuples.map:{ .items } ) == one( .tuples.map:{ .items } )
}
subset Mapping of Collection where { ! .tuples.grep:{ .items != 2 } }
subset Array of Mapping where { .tuples.grep:{ .items[0].isa(Int) } }
subset Hash of Mapping where { .tuples.grep:{ .items[0].does(Str) } }

The above should probably all be written in terms of parametric roles
(see S12), but for now the above run-time checking versions should
hopefully express the relationships between the core collection-like
types as I see them.

That sounds like it might bite, but you wouldn't normally access an
Array as a Collection of (Int, Any), you'd access it as an Array, so you
get the nice .post_circumfix:<[ ]> method that makes array access easy.
You don't care that it has this higher order type as a parent class, and
you certainly wouldn't care for the 'bare' Collection interface (as for
one, you don't want to have to deal with the Integer keys). And it is
probably all backed by native methods.

I'm prototyping much of this using Moose in Perl 5, however Hubris is
delaying its release :-)

>Moreover, the Relation type has these
>operators that the Set type doesn't have: rename(), project(),
>restrict(), extend(), join(), divide(), summarize(), group(),
>ungroup(), wrap(), unwrap(), matching(), etc.
>

Is there a reference for the meaning of these methods?

>1. Are Sets or Junctions allowed to contain undefined elements? Can
>undef be a key of a Mapping or Hash?
>
>

undef.isa(Object), so you should be able to use it as you would any
other object. I would definitely not think of it as the absence of a
value in this context.

>2. What actually is the practical distinction between a Set and a
>Junction? Why would someone use one over the other? I recognize
>that the use of Junctions is supposed to make parallelism easier, as
>iterating through one is known to be order independent. But,
>conceptually a Set and a Relation are exactly the same; you could
>process their members in any order and/or in parallel as well. So is
>the use of a Junction effectively like a compiler flag to make
>certain kinds of Set ops faster at the expense of others?
>
>

Well one side effect at the moment is that Junctions are immutable,
whilst Sets are mutable. This is perhaps a deficiency in my original
Set.pm design; all of the mutating functions should be in a seperate
role, really (or just not be mutators).

>6. Can I declare with named Set (or Junction) and Mapping typed
>variables and/or parameters that their members are restricted to
>particular types, such as Str, as I can with Arrays and Hashes, so
>that Perl itself will catch violations? Eg, can I say as a parameter
>"Set of Str :$heading?" or "Set of Mapping(Str) of Any :$body?" so
>Perl will check that arguments are suchwise correct?
>
>

These are variously called "Generics" (ada I think, Java 1.5+),
"Parametric Types", "Higher Order Types" (Pierce et al), "Generic
Algebraic Data Types" (Haskell)

In Perl 6 they are parametric roles (as in S12 mentioned above)

>7. Can we add some operators to Mapping that are like the Relation
>ones, so that implementing a Relation over Mappings is easier (or,
>see the end of #8)? Eg, these would be useful: rename(), project(),
>extend(), join(). In particular, implementing join() into Mapping
>would help save CPU cycles:
>
>

Again, a reference to a prototype of the behaviour would be useful.

Sam.

Darren Duncan

unread,
May 4, 2006, 8:11:53 PM5/4/06
to perl6-l...@perl.org
At 10:51 AM +1200 5/5/06, Sam Vilain wrote:
> >Moreover, the Relation type has these
>>operators that the Set type doesn't have: rename(), project(),
>>restrict(), extend(), join(), divide(), summarize(), group(),
> >ungroup(), wrap(), unwrap(), matching(), etc.
>
>Is there a reference for the meaning of these methods?
>
> >7. Can we add some operators to Mapping that are like the Relation
>>ones, so that implementing a Relation over Mappings is easier (or,
>>see the end of #8)? Eg, these would be useful: rename(), project(),
>>extend(), join(). In particular, implementing join() into Mapping
>>would help save CPU cycles:
>
>Again, a reference to a prototype of the behaviour would be useful.

There are many written references to these methods; just type
"relational algebra" into Google.

That said, some of those search results may not explain things in the
same way, so I specifically prefer the definitions in Date and
Darwen's book "Databases, Types, and The Relational Model";
equivalent definitions are probably on the 'net' but I'm not sure
where.

I also defined part of the join() operator in my last email. How
that definition, for Tuples, extends to a Relation is that you pair
every tuple in each relation being joined to every tuple in each of
the others, and apply my earlier definition with each pairing; the
output Relation contains a tuple where the earlier definition
returned a tuple, and no tuple where it returned undef.

Alternately, if you want to wait a week, I will be coding up
documented implementations as soon as possible within Pugs'
ext/Relation/ dir.

But in order for me to do this, I was needing some answers about the
nature of existing types like Set and Mapping and Junction etc, which
I asked in this email.

Regarding your other comments:


>I don't think this terminology or these restrictions are particularly
>useful.
>
>I do think that a Pair should be a sub-type of a more general "Tuple"
>type, with the 'where' clause being { .items == 2 } or something like that.
>
>I think that the most flexible arrangement is to define;
>
>- a Collection as a Bag of Tuples
>- a Relation as a Collection where the tuples have a shape and no
>duplicate tuples are allowed (but Relation does not need to be a core type)
>
>Then, Mappings, Sequences, etc, become sub-types of one of the above two
>types. For instance, a sequence is a Collection of (Int, Any) where the
>first Int is unique across the collection. Similarly a Mapping is a
>Collection of (Any, Any) where Unique(0).

You may be on to something here, but I'll withold comments for now.

My main concerns with this whole Relation thing are that I want an
efficient way in Perl 6 to represent what relation-land's concept of
tuples and relations are, as well as an efficient implementations of
relational algebra that are just as easy to use and fast in Perl 6 as
are the language's other collection types such as Sets.

If Perl 6 is to be the choice language of the singularity, it needs
to be easy and fast to do any common type of work with it, and
relational algebra is an extremely common kind of work being done.

-- Darren Duncan

Darren Duncan

unread,
May 4, 2006, 8:47:44 PM5/4/06
to perl6-l...@perl.org
Actually, I'll add a few more things to my reply, which should be helpful ...

At 5:11 PM -0700 5/4/06, Darren Duncan wrote:
>At 10:51 AM +1200 5/5/06, Sam Vilain wrote:
>> >Moreover, the Relation type has these
>>>operators that the Set type doesn't have: rename(), project(),
>>>restrict(), extend(), join(), divide(), summarize(), group(),
>> >ungroup(), wrap(), unwrap(), matching(), etc.
>>
>>Is there a reference for the meaning of these methods?
>

>There are many written references to these methods; just type
>"relational algebra" into Google.

I will add that the first hit on such a search, the Wikipedia page on
relational algebra ( http://en.wikipedia.org/wiki/Relational_algebra
), is a perfectly good primer on what relational algebra is and what
its importance is.

The article's introduction says:

Relational algebra, an offshoot of first-order logic, is a
set of relations closed under operators. Operators operate on one or
more relations to yield a relation. Relational algebra is a part of
computer science.

Relation algebra in pure mathematics is an algebraic
structure, relevant to mathematical logic and set theory.

That article also explains many of the most important relational operators.

Note that there is a related set of operators comprising relational
calculus, and you can do everything in one that you can in the other,
though less or more verbosely as the case may be.

At 10:51 AM +1200 5/5/06, Sam Vilain wrote:
>I do think that a Pair should be a sub-type of a more general "Tuple"
>type, with the 'where' clause being { .items == 2 } or something like that.
>
>I think that the most flexible arrangement is to define;
>
>- a Collection as a Bag of Tuples
>- a Relation as a Collection where the tuples have a shape and no
>duplicate tuples are allowed (but Relation does not need to be a core type)
>
>Then, Mappings, Sequences, etc, become sub-types of one of the above two
>types. For instance, a sequence is a Collection of (Int, Any) where the
>first Int is unique across the collection. Similarly a Mapping is a
>Collection of (Any, Any) where Unique(0).
>

>something like
>
>role Tuple { has @.items };
>role Collection { has Tuple @.tuples };
>subset Pair of Tuple where { .items.items == 2 };
>subset Bag of Collection where { ! .tuples.grep:{.items > 1 } }
>subset Set of Bag where {
>all( .tuples.map:{ .items } ) == one( .tuples.map:{ .items } )
>}
>subset Mapping of Collection where { ! .tuples.grep:{ .items != 2 } }
>subset Array of Mapping where { .tuples.grep:{ .items[0].isa(Int) } }
>subset Hash of Mapping where { .tuples.grep:{ .items[0].does(Str) } }

While this may not actually change anything, I should point out that
every collection type can also be expressed in terms of a Relation
definition and/or they can all be implemented over a Relation (whose
members are actually always unique). For example:

1. A Set of Any is a Relation with one Any attribute.
2. A Bag of N Any attributes is a Relation of N+1 attributes, where
the extra attribute is an Int (constrained >= 1) that counts
occurrances of the distinct other attributes.
3. A Mapping can be a Relation of 2 Any attributes.
4. A Hash is a Relation of 2 attributes, Str (key) and Any (value),
where the key has a unique constraint.
5. A Seq is a Relation of 2 attributes, typed Int (>= 0) and Any,
where the first shows their ordinal position and the second is the
actual value; the first has a unique constraint.
6. An Array is the same, assuming it is a sparse; if it is not
sparse, there is an additional constraint that the greatest Int value
is the same / one less than the count of Relation members.

Suffice it to say that I'm sure you would implement a bag using some
other type, whether a relation or a hash or an array, where the
member is stored once with an occurrance count.

-- Darren Duncan

Sam Vilain

unread,
May 5, 2006, 4:01:36 AM5/5/06
to Darren Duncan, perl6-l...@perl.org
Darren Duncan wrote:

>>>Is there a reference for the meaning of these methods?
>>>
>>>
>>There are many written references to these methods; just type
>>"relational algebra" into Google.
>>
>>
>
>I will add that the first hit on such a search, the Wikipedia page on
>relational algebra ( http://en.wikipedia.org/wiki/Relational_algebra
>), is a perfectly good primer on what relational algebra is and what
>its importance is.
>
>

Thanks for the pointer.

>While this may not actually change anything, I should point out that
>every collection type can also be expressed in terms of a Relation
>definition and/or they can all be implemented over a Relation (whose
>members are actually always unique). For example:
>
>1. A Set of Any is a Relation with one Any attribute.
>2. A Bag of N Any attributes is a Relation of N+1 attributes, where
>the extra attribute is an Int (constrained >= 1) that counts
>occurrances of the distinct other attributes.
>3. A Mapping can be a Relation of 2 Any attributes.
>4. A Hash is a Relation of 2 attributes, Str (key) and Any (value),
>where the key has a unique constraint.
>5. A Seq is a Relation of 2 attributes, typed Int (>= 0) and Any,
>where the first shows their ordinal position and the second is the
>actual value; the first has a unique constraint.
>6. An Array is the same, assuming it is a sparse; if it is not
>sparse, there is an additional constraint that the greatest Int value
>is the same / one less than the count of Relation members.
>
>

I don't know if anyone will care, but you can't emulate the raw
"Collection" type with this fixed Relation type. That is, a collection
of tuples, each of which may be of differing length and type.

This is what leads me to think that Collection is the more generic role.
I'm not saying Relations are not useful, perhaps they are more useful
than Collections in the practical case, but they are a sub-type.

Also, I don't agree with the notion of a "header" of each relation. It
has a type for each tuple item, sure, but "header" just sounds like the
sort of thing you want in a ResultSet, not a Relation.

Sam.

Darren Duncan

unread,
May 5, 2006, 4:11:45 PM5/5/06
to perl6-l...@perl.org
At 8:01 PM +1200 5/5/06, Sam Vilain wrote:
>Also, I don't agree with the notion of a "header" of each relation. It
>has a type for each tuple item, sure, but "header" just sounds like the
>sort of thing you want in a ResultSet, not a Relation.
>Sam.

A relation's heading is essentially the definition of the relation's
structure, and is not redundant if the relation has no tuples in it,
which is valid.

A comment I back-logged on #perl6 yesterday suggested that what I was
proposing with relations is sort of a retread of what classes are,
where each is defined in terms of zero or more named and typed
attributes.

I just want to weigh in that I see that as partly true (the same
could also be said that it retreads what C structs are), and
therefore perhaps some language constructs related to class
definitions could be reused with relation definitions.

Perhaps "Relation" in Perl 6 would best be a role and/or meta-class
with factory methods that produce other classes with common interface
elements, such as all the relational algebra methods, and/or objects
each of which has its own class. Under this system, each time you
perform an operation like a join, you are potentially making a new
class, because its attributes are likely different than its
predecessors.

Frankly, with Perl 6 being what it is, there are probably numerous
ways to implement what I'm looking for, and I don't exactly know what
would be best.

For now I'll just follow the simple path with a single Relation class
in Pugs' ext/Relation/ directory for demonstration purposes and
evaluate changes later.

-- Darren Duncan

Darren Duncan

unread,
May 5, 2006, 11:20:28 PM5/5/06
to perl6-l...@perl.org
First of all, Sam Vilain, thank you for your responses.

Giving these issues more thought, I'm am now leaning towards the idea
that the best way to provide relational algebra in Perl 6 is that the
relation-land Tuple and Relation each be a Role which various other
classes can provide to their users.

For one thing, this lets people use more types of data collections in
the forms where they currently sit, rather than requiring them to
first convert the data into separate Tuple or Relation objects, in
order to apply relational algebra to them. And I suppose that's "the
Perl way". Having things just work the way people intuit that they
ought to, given conceptually similar examples.

Moreover, some implementations of the Tuple and Relation roles can go
beyond the basics and provide features often associated with those
roles but that aren't part of the core algebra (such as various kinds
of type based or whole-set constraints, or various tied means of
persistence), while other implementations of those roles don't have
to.

Moreover, one key distinction that some implementations can have from
others is to whether they are mutable. Some implementations could be
mutable and others not. All of the generic relational algebra
operators are not mutators, so they will work equally well with
immutable or mutable implementations. That said, for this to work
properly in all cases, Perl 6 will need to provide a way (and it
probably does) for a routine to examine the signature of an anonymous
routine passed in as an argument, and make sure that its parameters
are all read-only; some generic relational operators, such as
restrict() and extend(), would take routines as arguments, like
grep() and map() do, and we don't want them to try mutating their
arguments.

As an aside, you'll recall that I said any other type could be
implemented over relations, such as sets or arrays etc. However,
every attribute ("column") of a relation has a name, but things like
sets or arrays conceptually don't need them and so having sets or
arrays or hashes etc implement the Relation role could be difficult,
one might think. But I say there is no problem here. We can simply
reuse the names of "methods" we already have on those types. For
example, a Hash object seen through a Relation role would have 2
attributes named "keys" and "values". With sets, its just "members"
or "values" or whatever. Though I suppose the names may need more
creativity with a shaped hash or array, where each dimension should
be referrable to by a distinct name (a dimension index number if
necessary), unless we cop out and make "keys" for a shaped hash or
array to be of type Seq, where the one value encompasses all
dimensions; such an approach is perfectly valid for a Relation role,
but it may or may not be less useful in practice.

Getting back to the main topic, if we want to implement the Tuple or
Relation role in a way that each attribute ("column") of such is
specifically typed (because that tends to be the most common or
useful practice), it probably would work best to employ a class
factory (implementing said roles) where each distinct heading of a
Tuple or Relation corresponds to a distinct actual class definition,
where the relation-land attributes correspond 1:1 with actual class
attributes, and all the features for addressing, manipulating, and
defining said attributes with types can be reused. In this
situation, nearly every generic relational algebra operator would
necessarily be a factory for new classes; and there needs to be a way
to examine existing classes in enough detail to know what attributes
they have and/or clone parts of them into new ones (as I assume
Perl's various .meta provide). This approach is probably the most
powerful of all as an implementation.

This email is by no means exhaustive, but feedback is appreciated.

And keep in mind that the most important thing I'm suggesting here is
both that Tuple and Relation are roles, and that as many built-ins as
reasonable "do" (appropriately restricted versions of) them; if
people have disagreements, its mainly with that sub-suggestion that I
want to know about them.

Thank you. -- Darren Duncan

mAsterdam

unread,
May 6, 2006, 9:06:58 AM5/6/06
to perl6-l...@perl.org
Prompted by Darren Duncan's proposal on Relation type objects
I looked at http://dev.perl.org/perl6/doc/design/syn/S06.html
and wondered how Interval type objects would fit in.

I couldn't imagine how. Now that isn't a surprise
(not for lack of imagination but for lack of perl6 knowledge).
I tried to find anything relevant but didn't succeed.

Could somebody provide some clues, please?

Darren Duncan

unread,
May 6, 2006, 4:41:41 PM5/6/06
to perl6-l...@perl.org

Can we first clarify that this describes what you are referring to:

http://en.wikipedia.org/wiki/Interval_(mathematics)

From what I've read there, I don't see an existing equivalent Perl 6 type.

Some people may confuse it with a Range, but I don't think so since a
Range progresses in discrete increments, while an Interval would be
continuous.

Moreover, the existing Set wouldn't overlap it because Set deals with
a finite collection of discrete members. (Similarly, a Relation is
explicitly discrete and finite by definition.)

I assume you aren't referring to the temporal specific thing called
an interval.

-- Darren Duncan

Larry Wall

unread,
May 6, 2006, 5:03:05 PM5/6/06
to perl6-l...@perl.org
On Sat, May 06, 2006 at 01:41:41PM -0700, Darren Duncan wrote:
: Some people may confuse it with a Range, but I don't think so since a
: Range progresses in discrete increments, while an Interval would be
: continuous.

No, Range objects in Perl 6 are defined to be intervals unless used
in a context that implies discrete increments, such as counting in
list context. But if you say

$x ~~ 1.2 ..^ 3.4

it is exactly equivalent to

1.2 <= $x < 3.4

The main point of context is to avoid an explosion of types.

Larry

James Mastros

unread,
May 6, 2006, 5:19:01 PM5/6/06
to Darren Duncan, perl6-l...@perl.org
On Sat, May 06, 2006 at 01:41:41PM -0700, Darren Duncan wrote:
> Some people may confuse it with a Range, but I don't think so since a
> Range progresses in discrete increments, while an Interval would be
> continuous.
A range listifies to a (potentially) finite list of discrete elements, but
it compares as a range. 1.1 should ~~ 1..2; pugs thinking that's false is a
bug, not a feature.

Of course, that doesn't mean implementing range in a subset of perl6 without
it isn't interesting, and possibly useful for bootstrapping.

-=- James Mastros

Darren Duncan

unread,
May 6, 2006, 6:17:34 PM5/6/06
to perl6-l...@perl.org
At 2:03 PM -0700 5/6/06, Larry Wall wrote (in reply):

>No, Range objects in Perl 6 are defined to be intervals unless used
>in a context that implies discrete increments, such as counting in
>list context. But if you say
>
> $x ~~ 1.2 ..^ 3.4
>
>it is exactly equivalent to
>
> 1.2 <= $x < 3.4
>
>The main point of context is to avoid an explosion of types.

At 10:19 PM +0100 5/6/06, James Mastros wrote (also in reply):


>A range listifies to a (potentially) finite list of discrete elements, but
>it compares as a range. 1.1 should ~~ 1..2; pugs thinking that's false is a
>bug, not a feature.

Okay, thank you both for clarifying this.

Conceptually in my mind, a Range is entirely appropriate to represent
a mathematical interval, but I was mistaken about Range being more
constrained than it actually is.

So, there you go mAsterdam; Range is indeed the interval you are looking for.

-- Darren Duncan

mAsterdam

unread,
May 6, 2006, 6:35:09 PM5/6/06
to perl6-l...@perl.org
Darren Duncan wrote:
> mAsterdam wrote:
>
>> Prompted by Darren Duncan's proposal on Relation type objects
>> I looked at http://dev.perl.org/perl6/doc/design/syn/S06.html
>> and wondered how Interval type objects would fit in.
>>
>> I couldn't imagine how. Now that isn't a surprise
>> (not for lack of imagination but for lack of perl6 knowledge).
>> I tried to find anything relevant but didn't succeed.
>>
>> Could somebody provide some clues, please?
>
> Can we first clarify that this describes what you are referring to:
>
> http://en.wikipedia.org/wiki/Interval_(mathematics)

Mathematical concepts never seem to completely
survive implementation in computing, but basically yes,
that was what I was thinking about.

> From what I've read there, I don't see an existing equivalent Perl 6 type.
>
> Some people may confuse it with a Range, but I don't think so since a
> Range progresses in discrete increments, while an Interval would be
> continuous.
>
> Moreover, the existing Set wouldn't overlap it because Set deals with a
> finite collection of discrete members. (Similarly, a Relation is
> explicitly discrete and finite by definition.)
>
> I assume you aren't referring to the temporal specific thing called an
> interval.

Date, Darwen and Lorentzos (Temporal data and the relational
model, 2003) build their temporal reasoning on top of
point types and interval types - they require them
allright - but I don't see how interval types (and point
types) require temporal stuff, i.o.w. I don't really
see how they are temporal specific (unless you are
thinking of SQL intervals).

mAsterdam

unread,
May 6, 2006, 6:45:12 PM5/6/06
to perl6-l...@perl.org

I hope it is also the appropriate type for implementing
relations with temporal attributes.

Thank you all for your prompt discussion.

Darren Duncan

unread,
May 6, 2006, 7:34:50 PM5/6/06
to perl6-l...@perl.org
At 12:45 AM +0200 5/7/06, mAsterdam wrote:
>>Okay, thank you both for clarifying this.
>>
>>Conceptually in my mind, a Range is entirely appropriate to
>>represent a mathematical interval, but I was mistaken about Range
>>being more constrained than it actually is.
>>
>>So, there you go mAsterdam; Range is indeed the interval you are looking for.
>
>I hope it is also the appropriate type for implementing
>relations with temporal attributes.
>
>Thank you all for your prompt discussion.

It should work just fine.

Keep in mind that your concern about relations is orthogonal to the
concern about intervals or temporal data.

An attribute of a relation can be any arbitrary data type at all
(including another relation).

So the only real concern here is whether there is a data type that
can represent a single piece of temporal data. But one can easily be
defined using Perl's standard class definition abilities if it isn't
pre-defined.

Note that I am of the opinion that Perl probably should not have
built-in data types that are specific to temporal or spacial data; it
is better for these to be extensions (like "DateTime" etc) defined
over built-ins like numbers or ranges or collections. Temporal or
spacial data in common use today is just too complicated and
non-generic, I think.

(Whereas, the existing built-ins, and relations, are very generic and simple.)

-- Darren Duncan

mAsterdam

unread,
May 6, 2006, 8:17:38 PM5/6/06
to perl6-l...@perl.org
Darren Duncan wrote:
> At 12:45 AM +0200 5/7/06, mAsterdam wrote:
>
>>> Okay, thank you both for clarifying this.
>>>
>>> Conceptually in my mind, a Range is entirely appropriate to represent
>>> a mathematical interval, but I was mistaken about Range being more
>>> constrained than it actually is.
>>>
>>> So, there you go mAsterdam; Range is indeed the interval you are
>>> looking for.
>>
>> I hope it is also the appropriate type for implementing
>> relations with temporal attributes.
>>
>> Thank you all for your prompt discussion.
>
> It should work just fine.
>
> Keep in mind that your concern about relations is orthogonal to the
> concern about intervals or temporal data.

I hope (and think) you are right about that regarding
implementing relations. Using them correctly is another
story though. I don't think Date, Darwen & Lorentzos
lightly took the step of introducing 6NF in 2003.

> An attribute of a relation can be any arbitrary data type at all
> (including another relation).

Aside, about RVA (relation valued attibutes): I read at
comp.database.theory that Hugh Darwen has introduced
gu(group/ungroup)NF a month ago.

> So the only real concern here is whether there is a data type that can
> represent a single piece of temporal data. But one can easily be
> defined using Perl's standard class definition abilities if it isn't
> pre-defined.

I really don't yet see how to define point types and interval
(range) types based on those. I think you (we) /will/ need them
(*not* neccesarily Perl 6 built-in) if ...

> Note that I am of the opinion that Perl probably should not have
> built-in data types that are specific to temporal or spacial data; it is
> better for these to be extensions (like "DateTime" etc) defined over
> built-ins like numbers or ranges or collections. Temporal or spacial
> data in common use today is just too complicated and non-generic, I think.
>
> (Whereas, the existing built-ins, and relations, are very generic and
> simple.)

... you, like I, want temporal and spacial data as simple
and generic as possible.

Darren Duncan

unread,
May 6, 2006, 9:06:49 PM5/6/06
to perl6-l...@perl.org
At 2:17 AM +0200 5/7/06, mAsterdam wrote:
>I hope (and think) you are right about that regarding
>implementing relations. Using them correctly is another
>story though. I don't think Date, Darwen & Lorentzos
>lightly took the step of introducing 6NF in 2003.
>
>Aside, about RVA (relation valued attibutes): I read at
>comp.database.theory that Hugh Darwen has introduced
>gu(group/ungroup)NF a month ago.

mAsterdam, I think we should end this sub-thread as I see these
points you are now bringing up are well beyond the scope of what the
Perl 6 language designers need to know or care about so I don't see a
need to continue it on list.

Even for me personally, I don't think this is anything to worry about.

But perhaps to explain why I think this ...

Ignoring 1NF, which relations are always in by definition (they
contain no duplicate tuples/rows), but things like SQL tables or
non-relation collections could possibly not be, the 2NF+ have nothing
to do with the actual definitions of relations themselves or the
ability to perform relational algebra, which is all that the Perl
language and/or extension classes to it need to know about.

The 2nd and higher "normal forms" are just formal labels applied to
certain best practices that one can follow when designing a
relational database. They are efforts to further reduce redundancy
in the collection of relations making up a relational database. Best
left to the users to make decisions about rather than the language
designers.

>>So the only real concern here is whether there is a data type that
>>can represent a single piece of temporal data. But one can easily
>>be defined using Perl's standard class definition abilities if it
>>isn't pre-defined.
>
>I really don't yet see how to define point types and interval
>(range) types based on those. I think you (we) /will/ need them
>(*not* neccesarily Perl 6 built-in) if ...
>

>... you, like I, want temporal and spacial data as simple
>and generic as possible.

You can do it simply, kind of like this:

class Point { has Real $x; has Real $y; };

subset Interval of Range where { all( .items ).does(Real) };

-- Darren Duncan

Darren Duncan

unread,
May 6, 2006, 9:15:34 PM5/6/06
to perl6-l...@perl.org
At 6:06 PM -0700 5/6/06, Darren Duncan wrote:
>You can do it simply, kind of like this:
>
>class Point { has Real $x; has Real $y; };
>
>subset Interval of Range where { all( .items ).does(Real) };

Er, you should read 'Real' as 'Num' (I originally meant Rational,
which no longer exists in the newest S06); I meant to say:

class Point { has Num $x; has Num $y; };

subset Interval of Range where } all( .items ).does(Num) };

-- Darren Duncan

mAsterdam

unread,
May 6, 2006, 9:41:18 PM5/6/06
to perl6-l...@perl.org
Darren Duncan wrote:
> At 2:17 AM +0200 5/7/06, mAsterdam wrote:
>
>> I hope (and think) you are right about that regarding
>> implementing relations. Using them correctly is another
>> story though. I don't think Date, Darwen & Lorentzos
>> lightly took the step of introducing 6NF in 2003.
>>
>> Aside, about RVA (relation valued attibutes): I read at
>> comp.database.theory that Hugh Darwen has introduced
>> gu(group/ungroup)NF a month ago.
>
>
> mAsterdam, I think we should end this sub-thread as I see these points
> you are now bringing up are well beyond the scope of what the Perl 6
> language designers need to know or care about so I don't see a need to
> continue it on list.
>
> Even for me personally, I don't think this is anything to worry about.

Ok.

> But perhaps to explain why I think this ...
>
> Ignoring 1NF, which relations are always in by definition (they contain
> no duplicate tuples/rows), but things like SQL tables or non-relation
> collections could possibly not be, the 2NF+ have nothing to do with the
> actual definitions of relations themselves or the ability to perform
> relational algebra, which is all that the Perl language and/or extension
> classes to it need to know about.
>
> The 2nd and higher "normal forms" are just formal labels applied to
> certain best practices that one can follow when designing a relational
> database. They are efforts to further reduce redundancy in the

No. Common misconception - but indeed not relevant to p.p6.l

> collection of relations making up a relational database. Best left to
> the users to make decisions about rather than the language designers.
>
>>> So the only real concern here is whether there is a data type that
>>> can represent a single piece of temporal data. But one can easily be
>>> defined using Perl's standard class definition abilities if it isn't
>>> pre-defined.
>>
>>
>> I really don't yet see how to define point types and interval
>> (range) types based on those. I think you (we) /will/ need them
>> (*not* neccesarily Perl 6 built-in) if ...
>>
>> ... you, like I, want temporal and spacial data as simple
>> and generic as possible.
>
>
> You can do it simply, kind of like this:
>
> class Point { has Real $x; has Real $y; };
>
> subset Interval of Range where { all( .items ).does(Real) };

How to go about about having cyclic intervals over
(weekday) points e.g. Fri..Mon ?

If you want to discuss this off-list - the email-address
simply works :-) (huge amounts of spam though :-( )

Larry Wall

unread,
May 6, 2006, 9:30:06 PM5/6/06
to perl6-l...@perl.org
On Sat, May 06, 2006 at 06:15:34PM -0700, Darren Duncan wrote:
: Er, you should read 'Real' as 'Num' (I originally meant Rational,
: which no longer exists in the newest S06);

Rational still exists in S02--we just don't automatically promote
anything to it currently. (A pragma could change that default in
a particular lexical scope, of course.) The main problem with
Rationals is that they tend to waste a lot of bits maintaining
precision that is of little or no use in the real world, and they
can't easily be sized in advance of the calculation if you're planning
to store them in a particular spot. Floaters win on both counts.

Larry

Sam Vilain

unread,
May 8, 2006, 1:30:44 AM5/8/06
to Darren Duncan, perl6-l...@perl.org
Darren Duncan wrote:

>>Also, I don't agree with the notion of a "header" of each relation. It
>>has a type for each tuple item, sure, but "header" just sounds like the
>>sort of thing you want in a ResultSet, not a Relation.
>>Sam.
>>
>>
>A relation's heading is essentially the definition of the relation's
>structure, and is not redundant if the relation has no tuples in it,
>which is valid.
>
>

The Relation has a type, which is a Relation of Tuples of something. The
"Header" you refer to is the higher order part of the Tuple, the
"something".

Sam.

0 new messages