A prudent question. I believe so.
That is, with a description of a Class, its Attributes, Associations and
Methods I can generate Perl classes, a Tangram schema to map them to a SQL
Database, and ECMAScript `classes' that have the same intrinsic structure.
I can then use a JavaScript::Dumper class I've written to dump objects in
a form compatible with those ECMAScript classes 8-).
I'd like to be able to do that with Classes others have written, too.
> What are the traps and pitfalls that'll need to be avoided in Parrot's
> design to make object persistence not just possible but realistically
> feasible? For example: serializing objects without capturing their
> associations.
Here's a summary that I hope backs up my assertion that associations
will make it sufficient. Let's consider three different cases of
persistence:
- Mapping to a Table structure, either via an RDBMS or an ISAM library
- Mapping to a Hash structure, either via a Berkeley DB or a filesystem
- Serialising a structure to a stream - either a Data::Dumper/Storable
object or an XML document
1. Table structures.
OO -> RDBMS/ISAM mapping in a nutshell: You generally map each Class to a
table, and each attribute to a column in the database. You add a `type'
column - unique to each Class in the storage. Extra attributes in
sub-classes are either put in separate tables (``vertical'' mapping) or
extra columns (``horizontal'' mapping). To a SQL database, vertical
mapping is equivalent to horizontal mapping once both tables are joined on
their ID column.
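
A minimal sketch of the two layouts, to make the equivalence concrete. It
assumes a hypothetical Person class with an Employee sub-class that adds a
salary attribute; the table names, column names and type numbers are made
up for illustration, and SQLite merely stands in for whatever RDBMS you
use. This is not the schema any particular mapper generates.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Vertical mapping: the sub-class attribute lives in its own table,
    -- sharing the ID of the row in the base class table.
    CREATE TABLE person_v   (id INTEGER PRIMARY KEY, type INTEGER, name VARCHAR(255));
    CREATE TABLE employee_v (id INTEGER PRIMARY KEY, salary INTEGER);
    INSERT INTO person_v   VALUES (1, 1, 'Alice'), (2, 2, 'Bob');
    INSERT INTO employee_v VALUES (2, 50000);

    -- Horizontal mapping: the sub-class attribute is an extra, nullable column.
    CREATE TABLE person_h (id INTEGER PRIMARY KEY, type INTEGER,
                           name VARCHAR(255), salary INTEGER);
    INSERT INTO person_h VALUES (1, 1, 'Alice', NULL), (2, 2, 'Bob', 50000);
""")

# Joining the vertical tables on their ID column gives the horizontal view.
vertical = db.execute("""
    SELECT p.id, p.type, p.name, e.salary
      FROM person_v p LEFT JOIN employee_v e ON e.id = p.id
     ORDER BY p.id
""").fetchall()
horizontal = db.execute(
    "SELECT id, type, name, salary FROM person_h ORDER BY id").fetchall()
```

Both queries return the same rows, which is the sense in which the two
mappings are equivalent from the database's point of view.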
So, you can see that Association entities are different. They won't
always get mapped to a single column. One to many relationships are
(well, possibly two columns - an ID field and a `type' field, to allow for
arbitrary inheritance on the target). Many to one relationships are mapped
in the *destination* class' table. Many to Many relationships are mapped
in a *separate* link table.
Any of these relationships may have the addition of a slot *key* (ie a hash
key) and/or a slot *number* (ie, an array key). Possibly both.
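
The three shapes can be sketched as table definitions. Everything here is
an assumption made for illustration: the Book, Chapter and Tag classes, the
column names, and the use of SQLite. The point is only where the linking
columns end up and how a slot number or slot key rides along with them.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Single reference: an ID column plus a type column in the referring table.
    CREATE TABLE book (id INTEGER PRIMARY KEY, title VARCHAR(255),
                       publisher_id INTEGER, publisher_type INTEGER);

    -- Collection stored in the destination table: each chapter row points
    -- back at its book, with a slot number to preserve array order.
    CREATE TABLE chapter (id INTEGER PRIMARY KEY, heading VARCHAR(255),
                          book_id INTEGER, book_slot INTEGER);

    -- Many to many: a separate link table, here carrying a slot key (hash key).
    CREATE TABLE tag (id INTEGER PRIMARY KEY, label VARCHAR(255));
    CREATE TABLE book_tag (book_id INTEGER, tag_id INTEGER, slot_key VARCHAR(255));

    INSERT INTO book VALUES (1, 'Mapping Objects', 7, 3);
    INSERT INTO chapter VALUES (10, 'Tables', 1, 0), (11, 'Hashes', 1, 1);
    INSERT INTO tag VALUES (20, 'databases'), (21, 'perl');
    INSERT INTO book_tag VALUES (1, 20, 'topic'), (1, 21, 'language');
""")

# Rebuild the ordered collection from the slot number.
chapters = [row[0] for row in db.execute(
    "SELECT heading FROM chapter WHERE book_id = 1 ORDER BY book_slot")]

# Rebuild the keyed collection from the link table's slot key.
tags = dict(db.execute("""
    SELECT l.slot_key, t.label
      FROM book_tag l JOIN tag t ON t.id = l.tag_id
     WHERE l.book_id = 1
"""))
```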
So, you need to know at least this about the classes:
- All of their allowed attributes, and enough information on each to
practically map that to a column type.
There's the sticky issue of the maximum length of scalars; the
approach taken by Tangram is to default to VARCHAR(255) columns and
force the user to specify columns which should be allowed more. But
that is an implementation detail.
Practically speaking, you have:
- string fields of N octet maximum length
- integers, floats, fixed point numbers
- times & dates
- enumerated types
- sets of enumerated types (ie, bitwise mappable fields)
I see other types as derivatives :-)
It is up to the mapper to map these standard types to SQL column
types. It is up to the user of the mapper to provide mapping
functions for their own custom, non-structural attribute types to a
storage representation.
- Enough information about the structure of their associations to
generate database linking information.
So, we're talking about:
- the source class of the association.
- the minimum/maximum destination multiplicity
- whether the association has an order, and/or key associated with
it.
- a flag to say whether this is an aggregation or a composition
In addition, if you want to be able to place constraints on the source
multiplicity of the relationship (such as specifying a 1 to many
relationship, which might be implemented with a foreign key), or to
navigate back the other way, you need to make it a 2-way association,
so need:
- the destination class of the association.
- the minimum/maximum source multiplicity.
- whether the reverse association has an order, and/or key
associated with it *independent* of the one coming *to* it.
- you also need to know all of the superclass relationships.
- I'm considering object methods as unimportant for data storage.
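
The list above can be sketched as a small metadata structure. All of the
names here (Attribute, Association, ClassInfo, column_type) are hypothetical
and chosen for this example only; they are not Tangram's schema format. The
SQL types picked for each standard attribute kind are likewise just
plausible defaults, with VARCHAR(255) as the fallback for strings.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Attribute:
    name: str
    kind: str                          # "string", "int", "float", "fixed", "date", "enum", "set"
    max_length: Optional[int] = None   # only meaningful for strings

@dataclass
class Association:
    name: str
    source: str                        # source class of the association
    dest_min: int = 0
    dest_max: Optional[int] = None     # None means unbounded ("many")
    ordered: bool = False
    keyed: bool = False
    composition: bool = False          # False means aggregation
    # Only needed for a 2-way association:
    destination: Optional[str] = None
    source_min: int = 0
    source_max: Optional[int] = None
    reverse_ordered: bool = False
    reverse_keyed: bool = False

@dataclass
class ClassInfo:
    name: str
    superclasses: list = field(default_factory=list)
    attributes: list = field(default_factory=list)
    associations: list = field(default_factory=list)

def column_type(attr):
    """Map a standard attribute kind to a SQL column type (illustrative choices)."""
    if attr.kind == "string":
        return "VARCHAR(%d)" % (attr.max_length or 255)
    return {"int": "INTEGER", "float": "DOUBLE PRECISION", "fixed": "DECIMAL(18,4)",
            "date": "TIMESTAMP", "enum": "INTEGER", "set": "INTEGER"}[attr.kind]

book = ClassInfo(
    name="Book",
    attributes=[Attribute("title", "string"),
                Attribute("blurb", "string", 4000),
                Attribute("pages", "int")],
    associations=[Association("chapters", source="Book", dest_min=1,
                              ordered=True, composition=True)],
)
columns = {a.name: column_type(a) for a in book.attributes}
```

Note that methods do not appear anywhere in the structure, in keeping with
the last point in the list.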
2. Hash structures.
[note: I'm making this bit up as I go along, as I don't know of any mappers
that actually take this approach. Please take with as many grains of salt
as you wish and feel free to pipe up and correct me where appropriate.]
Strict Hash databases, such as Berkeley, are a slightly different beast.
You'd generally map chunks of your data structure to a single hash entity.
A filesystem directory could also be considered a variant of a hash.
A straightforward approach would be to map from the store's `object ID' to
a serialisation of just that object. I will consider this.
In the store, you would serialise the object's immediate properties and
attributes as normal, along with any unblessed data structures. For
associations, you would serialise the collection hashes/arrays used to
implement them as normal hashes/arrays, but replace each value with the
object id (probably a combination of the object type, so you know which
hash to look in, and its ID). For references, no collection is needed,
just an object ID.
The Hash mapping engine would need to know the structure of the
relationships to know when to interpret a structure it is thaw'ing as a
value, and when to interpret it as an object reference (to avoid messy
in-band signalling). And when it is freeze'ing, where to stop in its
traversal of the input data structure.
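
A rough sketch of that idea, under several assumptions: a plain dict stands
in for the hash database, JSON stands in for the serialiser, objects are
simple dicts carrying an "id" field, and SCHEMA is a hypothetical table
saying which fields are associations. Only single references are handled;
collections of object IDs would follow the same pattern.

```python
import json

# Hypothetical schema: for each class, which fields are associations and
# which class they point to. Every other field is a plain value.
SCHEMA = {
    "Book":   {"author": "Author"},
    "Author": {},
}

store = {}   # stands in for the hash database: key -> serialised record

def freeze(kind, obj, seen=None):
    """Write one object to the store and return its "Class:id" key."""
    seen = set() if seen is None else seen
    key = "%s:%s" % (kind, obj["id"])
    if key in seen:                      # already written during this traversal
        return key
    seen.add(key)
    record = {}
    for name, value in obj.items():
        target = SCHEMA[kind].get(name)
        if target is None:
            record[name] = value         # plain value, stored inline
        else:
            # Stop traversing here: store the target separately, keep only its ID.
            record[name] = freeze(target, value, seen)
    store[key] = json.dumps(record)
    return key

def thaw(key, cache=None):
    """Rebuild an object, following IDs only where the schema says to."""
    cache = {} if cache is None else cache
    if key in cache:
        return cache[key]
    kind = key.split(":", 1)[0]
    obj = cache[key] = {}
    for name, value in json.loads(store[key]).items():
        obj[name] = thaw(value, cache) if name in SCHEMA[kind] else value
    return obj

author = {"id": 1, "name": "Sam"}
# The title deliberately looks like an object ID, to show it is left alone.
book = {"id": 2, "title": "Author:1", "author": author}
book_key = freeze("Book", book)
restored = thaw(book_key)
```

Because the schema decides what is a reference, a string value that happens
to look like an object ID is still thawed as a plain string, so no in-band
signalling is needed.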
3. Serialisation.
Relatively easy :-). I don't think I can say anything here that people
wouldn't already guess.
As long as mechanisms are put in place to allow modules to bypass object
encapsulation and private/public constraints, and given that Parrot will
have no XS, this allows for all manner of serialisation tools - including
tools that allow you to pass a tree describing which nodes you want to
serialise, important for XML and persistence option number 2, above.
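
As an illustration of the "tree describing which nodes you want" idea, here
is a small sketch. The dump function, the tree format (attribute name
mapped to None or to a nested tree) and the Book and Author classes are all
invented for this example. Reading instance state through vars() plays the
role of bypassing encapsulation.

```python
class Author:
    def __init__(self, name):
        self._name = name

class Book:
    def __init__(self, title, author):
        self._title = title
        self._author = author
        self._cache = {"rendered": "<h1>...</h1>"}   # state we do not want dumped

def dump(obj, tree):
    """Serialise only the attributes named in `tree`.

    `tree` maps an attribute name to None (take the value as is) or to a
    nested tree (descend into that object).
    """
    state = vars(obj)        # reads instance state directly, ignoring accessors
    out = {}
    for name, subtree in tree.items():
        value = state[name]
        out[name] = value if subtree is None else dump(value, subtree)
    return out

book = Book("Mapping Objects", Author("Sam"))
dumped = dump(book, {"_title": None, "_author": {"_name": None}})
```

The resulting nested dict contains the title and the author's name but not
the cache, and could then be handed to any stream or XML writer.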
> Will the absence of associations force us to rely on attributes and
> convention to serialize an object's associations? I.e., as I thought you
> were implying, a messy hack that'll come back to haunt us?
Yes and no. It is a messy hack that has already engulfed everyone.
I really mean this - I see an awful lot of code in the world that is
manually dealing with associations. I've found having ready-to-go
associations is absolutely whipuptitudalicious.
[ poop-group list members: this is your cue to highlight the inadequacies
in what I have just stated, so that we can all model the input to our
object persistence tools in the same way in Perl 6. Speak now or hold
your peace for another generation of Perl. ]
--
Sam Vilain, s...@vilain.net
Two Commandments for the Molecular Age
1. Thou shalt not alter the consciousness of thy fellow men.
2. Thou shalt not prevent thy fellow man from altering his or her
own consciousness.
- Dr. Timothy Leary, Ph.D.
OK. Perhaps those structures should have a method/PMC that they must
export which will dump their internal state into a near equivalent Parrot
data structure rather than just having serialisation methods, for the sake
of the tools that want to traverse it rather than just freeze/thaw it to a
stream. ie, something that extracts their state and can be passed to
`bless' et al to reconstruct the original object. The structure freezer &
heater can ask objects to present themselves as core types for
serialisation. I think this would be a great debugging win as well.
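
A sketch of what such an export hook might look like, with every name
(BitSet, dump_state, from_state, REGISTRY, rebuild) made up for the
example. The class keeps its state in an opaque internal form, but can
present it as core types and be reconstructed from them.

```python
class BitSet:
    """Stores small non-negative integers as bits in one int (opaque state)."""

    def __init__(self, members=()):
        self._bits = 0
        for m in members:
            self._bits |= 1 << m

    def dump_state(self):
        # Present ourselves as core types: a class name plus a plain list.
        members = [i for i in range(self._bits.bit_length())
                   if (self._bits >> i) & 1]
        return {"class": "BitSet", "state": members}

    @classmethod
    def from_state(cls, state):
        # The counterpart of handing the state back to `bless'.
        return cls(state)

# Hypothetical lookup from class name to class, used when reconstructing.
REGISTRY = {"BitSet": BitSet}

def rebuild(dumped):
    return REGISTRY[dumped["class"]].from_state(dumped["state"])

original = BitSet([1, 4, 6])
dumped = original.dump_state()
copy = rebuild(dumped)
```

A traversal tool or debugger can inspect the dumped form without knowing
anything about bit twiddling, and a freezer can write it to a stream as is.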
--
Sam Vilain, s...@vilain.net
Real computer scientists like C's structured constructs, but they are
suspicious of it because it's compiled. (Only Batch freaks and
efficiency weirdos bother with compilers, they're soooo un-dynamic.)