On 23 Jul 2009, at 18:26, Alex Cruise wrote:
> I'm playing with implementing a toy FRelP system
Cool!
> ("FRP" has been taken
> over--damn you Conal Elliott! ;)
To be fair, Functional Reactive Programming did actually come first!
> and am curious about the repeated use
> of "address" fields (aliased for string) as candidate/foreign keys in
> most of the relations in the real estate example.
>
> In the modern mainstream version of relational dogma this raises red
> flags; we're supposed to refactor the repeated data into a combination
> of a new relation and some number of foreign keys pointing back to
> it.
"refactor the repeated data" - I think it's interesting to dig into
what you mean here.
You probably have in mind something like an "AddressID" which might be
some integer or another and using that instead of the "address" type
(alias) in the relvars of the example, maybe along with another relvar
to record (AddressID, String).
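To make the comparison concrete, here is a minimal Python sketch. It assumes relations are modelled as sets of immutable rows, and the relvar and attribute names (`Offer`, `AddressRow`, `OfferS`, `address_id`) are hypothetical stand-ins rather than the exact relvars of the real estate example. Counting how often each referencing value occurs shows that the surrogate design still repeats a value, just a smaller one.

```python
from collections import Counter
from dataclasses import dataclass

# Natural-key design: the address string is repeated in every
# relation that refers to a property.
@dataclass(frozen=True)
class Offer:
    address: str  # refers to a property by its address
    bidder: str
    amount: int

# Surrogate-key design (hypothetical): an arbitrary integer AddressID
# plus one extra relation recording (AddressID, String).
@dataclass(frozen=True)
class AddressRow:
    address_id: int
    address: str

@dataclass(frozen=True)
class OfferS:
    address_id: int  # still repeated, just a smaller value
    bidder: str
    amount: int

natural_offers = {
    Offer("1 High St", "ann", 300_000),
    Offer("1 High St", "bob", 310_000),
}

addresses = {AddressRow(1, "1 High St")}
surrogate_offers = {
    OfferS(1, "ann", 300_000),
    OfferS(1, "bob", 310_000),
}

def repeats(rows, attr):
    """Count how many rows mention each value of `attr`."""
    return Counter(getattr(r, attr) for r in rows)

print(repeats(natural_offers, "address"))       # Counter({'1 High St': 2})
print(repeats(surrogate_offers, "address_id"))  # Counter({1: 2})
```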
One thing to consider, though, is that what you've done is to replace
one piece of repeated data with another (albeit 'smaller') one.
As an aside, I've often heard people refer to designs such as this
example as "unnormalized", which is inaccurate (all normalizations are
defined with respect to some set of functional dependencies and we're
not talking about any fundeps here).
I think that this issue arises because genuine lack of normalization
leads to a database in which fewer single-tuple updates are logically
acceptable (you'd need to update all functionally dependent values
simultaneously to avoid violating the fundep), and (I believe) you're
pointing out that you couldn't "change" an address in a single tuple
without violating constraints.
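The first situation, genuine lack of normalization, can be sketched as follows. This assumes a hypothetical relation `Listing(address, agent, agent_phone)` that is deliberately not normalized with respect to the fundep agent → agent_phone. Changing the phone in a single tuple violates the fundep, and only a multi-tuple update preserves it.

```python
from collections import namedtuple

# Hypothetical relation, deliberately not normalized with respect to
# the functional dependency agent -> agent_phone.
Listing = namedtuple("Listing", "address agent agent_phone")

def fd_holds(rel, lhs, rhs):
    """True if the functional dependency lhs -> rhs holds in rel."""
    seen = {}
    for t in rel:
        key = tuple(getattr(t, a) for a in lhs)
        val = tuple(getattr(t, a) for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

def update_one(rel, old, **changes):
    """Replace a single tuple with an updated copy."""
    return (rel - {old}) | {old._replace(**changes)}

def update_where(rel, pred, **changes):
    """Update every tuple satisfying pred in one step."""
    return {t._replace(**changes) if pred(t) else t for t in rel}

listings = {
    Listing("1 High St", "carol", "555-0100"),
    Listing("2 Low Rd", "carol", "555-0100"),
}
FD = (["agent"], ["agent_phone"])

one = update_one(listings, Listing("1 High St", "carol", "555-0100"),
                 agent_phone="555-0199")
both = update_where(listings, lambda t: t.agent == "carol",
                    agent_phone="555-0199")

print(fd_holds(listings, *FD))  # True
print(fd_holds(one, *FD))       # False: single-tuple update breaks the fundep
print(fd_holds(both, *FD))      # True: all dependent values changed together
```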
My feeling is that the similarities between these two situations are,
however, largely superficial. In the former we're concerned with
information represented by the relations themselves (formally,
relations over uninterpreted symbols). In the latter you're thinking
about information "represented" by a value itself (i.e. an address in
String form). I would argue that this latter kind of
thing is strictly outside the relational model.
Clearly there are some practical issues here, and for some time I've
wondered whether it would make sense to have some kind of "stratified"
system whereby lower tiers could define universes of values over which
higher tiers would define relations. In such a system "address" would
be defined in a base "tier", and the relvars in the example would all
live in a higher tier. Such a system would need to provide
increasingly tight restrictions on updates to lower tiers, as any
changes to a given tier could potentially invalidate arbitrary
constraints at higher tiers.
> Is it an intention of FRelP that there be no arbitrarily assigned
> identifiers, such as the primary/foreign keys we typically use today?
It is certainly my feeling that systems shouldn't impose arbitrarily
assigned identifiers on users /unless/ they actually add value to the
user interface (e.g. vehicle registrations, social security numbers).
I also think that often the underlying reasons for using such
arbitrary identifiers stem from various technical limitations in
current systems.
> It somehow seems wrong to use a candidate key (i.e. some data that
> happens to be unique) as a foreign key;
I don't at all see why. This seems to me to be a natural way to model
many things.
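Enforcing a key with business meaning as a foreign key is no harder than enforcing a surrogate one. A minimal sketch, assuming hypothetical `Property(address, price)` and `Offer(address, bidder, amount)` relations represented as Python sets. Note that `is_unique` checks only the uniqueness half of being a candidate key, not irreducibility.

```python
from collections import namedtuple

Property = namedtuple("Property", "address price")
Offer = namedtuple("Offer", "address bidder amount")

def project(rel, attrs):
    """Set of the value-tuples that rel's tuples take on attrs."""
    return {tuple(getattr(t, a) for a in attrs) for t in rel}

def is_unique(rel, attrs):
    """True if no two tuples of rel agree on all of attrs
    (uniqueness only; irreducibility is not checked)."""
    return len(project(rel, attrs)) == len(rel)

def fk_holds(child, child_attrs, parent, parent_attrs):
    """True if every child tuple's values on child_attrs appear
    as some parent tuple's values on parent_attrs."""
    return project(child, child_attrs) <= project(parent, parent_attrs)

properties = {Property("1 High St", 250_000), Property("2 Low Rd", 180_000)}
offers = {Offer("1 High St", "ann", 240_000)}
dangling = {Offer("9 Nowhere Ln", "bob", 1_000)}

print(is_unique(properties, ["address"]))                      # True
print(fk_holds(offers, ["address"], properties, ["address"]))  # True
print(fk_holds(dangling, ["address"], properties, ["address"]))  # False
```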
Maybe you could expand on your concerns?
I'd be interested in collaborating on developing an open source FRP
system in Haskell - let me know if you'd be interested in being
involved in that.
Cheers,
--Ben
Actually, my other question with respect to arbitrary identifiers is an
indication of my desire to avoid the ID as well. It seems to me that
there should be a concept of first-class, *typed* references to rows.
For instance, the system shouldn't let you store a PropertyRef value
in a RoomRef attribute, even though their internal representations
might be identical.
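One way to picture typed references is the Python sketch below. It is an illustration under stated assumptions, not a proposed FRelP design: `Ref`, `check_ref` and the `Fixture(room, item)` relvar are all hypothetical. Python can only reject the mismatched reference at runtime. A statically typed language could reject it at compile time, for example with a phantom type parameter.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ref:
    """Hypothetical first-class reference to a row. The payload is a
    plain int, but the target relvar is part of the reference's type."""
    relvar: str
    ident: int

def check_ref(value, target):
    """Reject anything that is not a reference into `target`."""
    if not isinstance(value, Ref) or value.relvar != target:
        raise TypeError(f"expected a {target} reference, got {value!r}")

@dataclass(frozen=True)
class Fixture:
    room: Ref  # a RoomRef attribute
    item: str

    def __post_init__(self):
        # Only references into the Room relvar may be stored here.
        check_ref(self.room, "Room")

room_ref = Ref("Room", 7)
property_ref = Ref("Property", 7)  # same internal int, different type

Fixture(room_ref, "radiator")      # accepted
try:
    Fixture(property_ref, "radiator")
except TypeError as e:
    print(e)  # expected a Room reference, got Ref(relvar='Property', ident=7)
```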
> I think that this issue arises because genuine lack of normalization
> leads to a database in which fewer single-tuple updates are logically
> acceptable (you'd need to update all functionally dependent values
> simultaneously to avoid violating the fundep), and (I believe) you're
> pointing out that you couldn't "change" an address in a single tuple
> without violating constraints.
It's possible that the choice of the address attribute is at the root
of my unease, but I don't think so. An address has obvious semantic
content to humans ("keys with business meaning"), is very likely to be
subject to edits, is quite likely to become non-unique for business
reasons (e.g. multiple buyers and/or sellers at the same address), and
is even somewhat likely to be broken apart into separate attributes
(e.g. house number, street name...).
To my mind, the half-typed foreign key values in wide use today are a
necessary (but not sufficient) layer of abstraction between tuple
values and references to them.
I think it's a very desirable
property of normalized systems that one can change most aspects of a
relation (e.g. data representation, constraints, etc.) without
modifying any of the relations that refer to it.
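The property described above can be illustrated with a minimal Python sketch. It assumes references are opaque `Ref` values and rows are looked up through a store keyed by reference; all names (`Ref`, `PropertyV1`, `PropertyV2`, `asking_price`) are hypothetical. The `Property` relation changes from a single address string to separate attributes, while the `Offer` relation that refers to it stays untouched.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ref:
    relvar: str
    ident: int

@dataclass(frozen=True)
class Offer:
    property_ref: Ref  # refers to a Property row, not to its address
    bidder: str
    amount: int

# Version 1 of the hypothetical Property relation: one address string.
@dataclass(frozen=True)
class PropertyV1:
    address: str
    price: int

# Version 2: the address is broken into separate attributes.
@dataclass(frozen=True)
class PropertyV2:
    house_number: str
    street: str
    price: int

p = Ref("Property", 1)
offer = Offer(p, "ann", 240_000)

# Hypothetical row stores keyed by reference, one per schema version.
store_v1 = {p: PropertyV1("1 High St", 250_000)}
store_v2 = {p: PropertyV2("1", "High St", 250_000)}

def asking_price(store, offer):
    """Dereference the offer's property; works with either version."""
    return store[offer.property_ref].price

print(asking_price(store_v1, offer), asking_price(store_v2, offer))
# 250000 250000
```

The same `offer` value is used unchanged with both versions of `Property`. Under the natural-key design, splitting the address would also have changed the key that `Offer` holds.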
> Clearly there are some practical issues here, and for some time I've
> wondered whether it would make sense to have some kind of "stratified"
> system whereby lower tiers could define universes of values over which
> higher tiers would define relations. In such a system "address" would
> be defined in a base "tier", and the relvars in the example would all
> live in a higher tier. Such a system would need to provide
> increasingly tight restrictions on updates to lower tiers, as any
> changes to a given tier could potentially invalidate arbitrary
> constraints at higher tiers.
It's an interesting thought, but I don't think Address is a
qualitatively different kind of thing from Property; it just happens
to be used by it and by several other relations. If you could identify
some useful abstractions over relations I think those might be a good
starting point for thinking about an addition to the type system.
> I also think that often the underlying reasons for using such
> arbitrary identifiers stem from various technical limitations in
> current systems.
I agree, but I strongly believe that first-class references are a
better choice than making direct use of candidate key values, whether
they're human-readable or generated.
>> It somehow seems wrong to use a candidate key (i.e. some data that
>> happens to be unique) as a foreign key;
> I don't at all see why. This seems to me to be a natural way to model
> many things.
I think that from an intuitive standpoint it makes sense to make use
of unique candidate/foreign key values with business meaning, but I
still buy the argument against them, which I think has come about much
more from bitter experience than from theoretical concerns.