Questions related to migration / schema changes.


Stephane Le Dorze

Jul 23, 2009, 9:15:40 AM
to fenix-f...@googlegroups.com, joao.c...@gmail.com, damian....@mimesis-republic.com
Here is the extract; please feel free to react on this subject.


On Thu, Jul 23, 2009 at 1:55 PM, Joao Cachopo <joao.c...@ist.utl.pt> wrote:
Stéphane Le Dorze <stephane...@mimesis-republic.com> writes:

> On Mon, Jul 20, 2009 at 11:58 AM, Joao Cachopo <joao.c...@ist.utl.pt> wrote:
>
>     Well, what I want, ultimately, is to have freedom for changing the
>     persistence layer of the fenix-framework, and make that completely
>     transparent to the programming model.
>
> For that, I need a way to completely clean up a Fenix backend today
> (the equivalent of an SQL drop tables).  Indeed, we need to make
> artists use our tools (remember, they're remote), which include a
> local Fenix database, without requiring them to reset their DB by
> hand; this will happen each time they trigger the merging process or
> change the schema.  We only realized this this week (we were so
> involved in technical stuff that we didn't see that this mandatory
> feature was missing).  Is it possible to add this to the Fenix
> interface really soon?  (I am thinking in day(s).)

Can't you do this entirely at the Java level?  I mean, if your domain
model has a "root" domain object that points to all of your objects, and
you access your objects through this root domain object, then it is just
a matter of creating a new root domain object afresh to clean up your
"database", right?

Garbage collection would then be required (we could have a lot of these fresh database recreations in some of our workflows).
 

Of course, this will not delete the old objects from the database, but
is that important?  Actually, it seems to me that this is what you would
want if you want to clean up the domain objects at run-time.  That is
done with a transaction, after which there are no objects, but
already-running transactions are able to access the old data, as usual.

As I said, it is semantically OK, but our database will start to explode; we need a means to clean things up.
 


>     We've been discussing more about this lately and some people in the
>     Fenix team have come up with a set of good software-development
>     practices (at the process level) that I believe will help in managing
>     the "schema" changes.  But we need to work more on that.
>
> I would be interested to have such information.

The people with whom I've talked about this are subscribed to the Google
group.  So, please raise the discussion there so that they may comment
on it.

Ok, I will.
 

> We're thinking of adding the DML definition as part of the database
> content, to ease version-mismatch detection and then pick the right
> transformers (not done yet).

We would like to have more info on this, also.

Ok :)
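
To make the idea concrete, here is a minimal sketch of the check we have
in mind: the store keeps a digest of the DML that the data was written
with, and the application compares it against the DML shipped with the
code before picking a transformer. The SchemaInfo domain class and its
accessors are hypothetical, not existing Fenix Framework API:

import java.security.MessageDigest;

// Hypothetical singleton domain object storing the digest of the DML
// that the data currently in the store was written with.
final class SchemaInfo {
    private String dmlDigest;
    String getDmlDigest() { return dmlDigest; }
    void setDmlDigest(String digest) { dmlDigest = digest; }
}

final class SchemaCheck {
    // Compares the digest of the DML shipped with the code against the
    // one stored in the database; a mismatch means the right transformer
    // must run before the application may touch the data.
    static boolean schemaMatches(SchemaInfo stored, String currentDml)
            throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        byte[] hash = md.digest(currentDml.getBytes("UTF-8"));
        StringBuilder hex = new StringBuilder();
        for (byte b : hash) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString().equals(stored.getDmlDigest());
    }
}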
 


>     We will not change significantly the structure of the persistence until
>     we have a solution that we are comfortable with.  In the Fenix webapp
>     development, we have schema changes almost on a daily basis...
>
> Daily: are the changes made at the Java level or the SQL one?

Currently, they are done at the SQL level, mostly.

Mmmmmmmm.. the dark side of the force is so attractive.. :)
 

>     I'm not sure I understood you here, but regardless of how persistence
>     will be supported, I don't plan to require that all objects fit in
>     memory.  On the contrary, with the changes that we have in mind, we may
>     be able to reduce the memory footprint of the application compared to
>     what we have today.
>
> I mean that during a transformation you can pass through some invalid /
> inconsistent states (cf. predicates).  If things become consistent only
> at the end of the transformation process, then you cannot commit your
> transaction before then.  So you keep all modified objects in memory,
> which does not scale.

Oh, I see what you mean.

Yes, if your conversion needs to do everything in a single atomic step,
that may be a problem, indeed.  There are several ways of dealing with
this.  One is to allow writing to the "database" during the transaction,
without committing the "database" transaction until the STM transaction
is validated.

Eh... can we do that right now?  Really?  How?  If not, is this planned?
 



> Talking about that, there's another need we have: the ability to add /
> rename / remove some DML definitions to ease transformations (to
> support merging the two DMLs to do the transform and then throwing away
> the old ones).

We also talked, internally, about the need for a rename.  What we need is
something along the lines of having an entity type A in one version of
the code, with the corresponding instances of this type, and then, in a
subsequent version, being able to say that type A was renamed to B and
still being able to load the old instances of A as instances of B.  Of
course, the relations that existed between A and other types are now
relations between B and those same types (eventually renamed, also).  Is
this what you had in mind?

However, I was imagining this as a deploy-time operation, rather than a
run-time operation.  That is, as something that we run between two runs
of the application.  Doing this concurrently with other transactions
running seems complicated to me.
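
As a sketch of what such a deploy-time step could look like, assuming a
hypothetical Migration API (invented for illustration; the Fenix
Framework has no such feature yet), run between two versions of the
application while no transactions are active:

import java.util.ArrayList;
import java.util.List;

final class Migration {

    // Hypothetical handle over the persisted metadata: the class names of
    // stored instances and the endpoints of their relations.
    interface MetaStore {
        void renameClass(String from, String to);
        void renameRelationEndpoint(String relation, String from, String to);
    }

    interface Step {
        void apply(MetaStore store);
    }

    private final List<Step> steps = new ArrayList<Step>();

    // Records that entity type `from` is now called `to`: old instances
    // load under the new name, and relations follow the rename.
    Migration renameEntity(final String from, final String to) {
        steps.add(new Step() {
            public void apply(MetaStore store) {
                store.renameClass(from, to);
                store.renameRelationEndpoint("*", from, to);
            }
        });
        return this;
    }

    // Runs all recorded steps as a deploy-time operation, with no
    // transactions active.
    void run(MetaStore store) {
        for (Step s : steps) {
            s.apply(store);
        }
    }
}

A deployment script would then do something like
new Migration().renameEntity("A", "B").run(store) before starting the new
version of the code.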
 
Deploy time is fine, as long as it is also done at the Java level (not SQL); that's what I had thought.  Can it be done?
What about the old objects and their 'tables'?  They should no longer be accessible from the root (GC at the type level?).
 



>     I've been cleaning up some of the code of the fenix-framework, to
>     conclude the transition from the old 32-bit object ids to the new 64-bit
>     oids.  In the process, I've further reduced the dependency on the OJB
>     that we still have, making us closer to have all the versions on the DB.
>
> Let me clarify something: by "all the versions", you mean to what
> extent?  (To cover open transactions' version ranges?)  "Infinite"
> lifetime?  (If so, is there a mechanism to garbage-collect stuff?)

Infinite lifetime.

At least with our applications, given the rate of changes (on average,
fewer than 30,000 write transactions per day), storing all of the changes
is not a problem.  Moreover, having all of the changes will enable (not
by itself, but still a step in the right direction) very interesting
features, such as traveling back in time (useful for many things, such as
reproducing a given bug).

Of course, for other applications, it may be necessary to GC old values,
but that should be trivial.  The JVSTM already has a GC mechanism in
place for memory.  So, it is just a matter of extending it to the
persistent store also.
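
For a rough, purely illustrative sense of scale: at 30,000 write
transactions per day, and assuming each write-set serializes to about
1 KB (an assumption, not a measured number), keeping every version costs
roughly 30 MB per day, or about 11 GB per year; at ten times that write
rate, the store grows by about 110 GB per year, which is where a
persistent-store GC starts to matter.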

Wow! OK, I understand; I think we'll have a fairly larger number of changes within a day.
When is GC planned?  With the multi-version release?
 


>     I want to conclude this and to incorporate the indexed and ordered
>     relations into the framework before I move onto the multiple versions
>     and then to the integration with the work of the other people working on
>     the Pastramy project.
>
> A release date for the indexed / ordered relations?

I don't know for sure.  The guy doing that has been ironing out some
small problems, and it should be done by now.  Still, I want to look into
that code and merge it back myself to double-check it.  As I will leave
for a one-week holiday next week, I will look into this only after the
3rd of August, when I come back.


> Do you have a feature comparison or measurements, a kind of one-page
> tech selling sheet, available?  If you have one, it will help me save
> some time; otherwise, do not lose time making one.

No, sorry, I don't have such a thing.


Ok; that's fine.

Looks like there are some features which need to be done before we can
rely on Fenix without accessing it at the SQL level.
Those missing features prevent us from doing a backend-agnostic
implementation of schema transformation, namely:
- Rename.
- Write before the real commit, to free memory (don't know yet if this is
critical; potentially, it depends on the transformation).
- Remove objects, or GC (not critical in the mid-term).
- Remove old types (not critical in the mid-term, I hope).

The customer question: potential dates?  Visibility on these subjects?

Joao Cachopo

Aug 7, 2009, 5:54:55 AM
to fenix-f...@googlegroups.com

Stephane Le Dorze <stephane...@gmail.com> writes:

> Here is the extract; please feel free to react on this subject.
>
> On Thu, Jul 23, 2009 at 1:55 PM, Joao Cachopo <joao.c...@ist.utl.pt> wrote:
>
> Can't you do this entirely at the Java level?  I mean, if your domain
> model has a "root" domain object that points to all of your objects, and
> you access your objects through this root domain object, then it is just
> a matter of creating a new root domain object afresh to clean up your
> "database", right?
>
> Garbage collection would then be required (we could have a lot of
> these fresh database recreations in some of our workflows).

OK, even though we don't have it yet (a persistent store GC), I don't
see that as particularly difficult to implement. It is just a matter of
having people available to do it ;)


> Oh, I see what you mean.
>
> Yes, if your conversion needs to do everything in a single atomic step,
> that may be a problem, indeed.  There are several ways of dealing with
> this.  One is to allow writing to the "database" during the transaction,
> without committing the "database" transaction until the STM transaction
> is validated.
>
> Eh... can we do that right now?  Really?  How?  If not, is this planned?

No, that is not possible currently, but off the top of my head I would
say that this would be quite easy to do in the HBase backend developed
by Sérgio Fernandes.  He may confirm it.

Supporting this in the SQL backend would be more challenging, though...
At least with our current implementation based on the OJB.

On the other hand, if we assume that the conversion occurs isolated from
all remaining activity (meaning that no other transaction is allowed to
run during the conversion), then it becomes much simpler.


> Wow! OK, I understand; I think we'll have a fairly larger number of
> changes within a day.
> When is GC planned?  With the multi-version release?

Actually, implementing this type of GC was not one of our top priorities,
because in all of our use cases it doesn't seem to be a problem.

In fact, given that we want to be able to look into the past, we may not
want to have GC at all.

Would you be interested in working on that?

> Looks like there are some features which need to be done before we can
> rely on Fenix without accessing it at the SQL level.
> Those missing features prevent us from doing a backend-agnostic
> implementation of schema transformation, namely:
> - Rename.

I'll bring up this "rename" topic in another post, shortly...

> - Write before the real commit, to free memory (don't know yet if this
> is critical; potentially, it depends on the transformation).
> - Remove objects, or GC (not critical in the mid-term).
> - Remove old types (not critical in the mid-term, I hope).
>
> The customer question: potential dates?  Visibility on these subjects?

I can't give you an answer to this.  None of these is on our shortlist,
and I don't have anyone working on them right now...

--
João Cachopo

Sergio Miguel Fernandes

Aug 7, 2009, 6:57:10 AM
to fenix-f...@googlegroups.com
Hi,

Joao Cachopo <joao.c...@ist.utl.pt> writes:

> Stephane Le Dorze <stephane...@gmail.com> writes:
>
>> Yes, if your conversion needs to do everything in a single atomic step,
>> that may be a problem, indeed.  There are several ways of dealing with
>> this.  One is to allow writing to the "database" during the transaction,
>> without committing the "database" transaction until the STM transaction
>> is validated.
>>
>> Eh... can we do that right now?  Really?  How?  If not, is this planned?
>
> No, that is not possible currently, but from the top of my head I would
> say that this would be quite easy to do in the HBase backend developed
> by Sérgio Fernandes. He may confirm it.

Yes, this is indeed possible in the HBase implementation.  It is
possible to write to the database at any time, as these values will be
ignored by all other transactions until this transaction in fact
commits.  During the commit operation, the list of valid version numbers
is updated, and only then will other transactions see the new version.

This would allow some values to be "unloaded" from memory and reloaded
later if necessary, thus allowing a transaction to manipulate read-sets
and write-sets bigger than the available RAM.
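
A minimal sketch of that "write early, publish late" idea, against a
hypothetical versioned key-value store rather than the real HBase client
API:

import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;

final class EarlyWriteStore {

    // Hypothetical versioned backend, standing in for an HBase-style
    // store where each cell can hold multiple timestamped versions.
    interface Backend {
        void put(String key, long version, byte[] value);
    }

    private final Backend backend;
    private final Set<Long> committedVersions =
            new ConcurrentSkipListSet<Long>();

    EarlyWriteStore(Backend backend) {
        this.backend = backend;
    }

    // May be called repeatedly during a long transformation to free
    // memory: the value becomes durable, but stays invisible to readers
    // because its version is not yet in the committed set.
    void writeEarly(String key, long tentativeVersion, byte[] value) {
        backend.put(key, tentativeVersion, value);
    }

    // Called only after the STM transaction validates; this single step
    // publishes all the early writes of `version` atomically.
    void commit(long version) {
        committedVersions.add(version);
    }

    // Readers consult this before trusting a stored version.
    boolean isVisible(long version) {
        return committedVersions.contains(version);
    }
}

In the real implementation, the list of valid version numbers would live
in the store itself, as described above, rather than in process memory;
the single commit step is what preserves atomicity, however many early
writes a long transformation issues.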

But I would have to think a bit more about such an implementation. Here
are some issues that came to my mind just now:

- Given that the read-set (and write-set) might not fit in memory, we
might have to persist these sets as well, or else find an alternate
(more compact) way to represent them.

- After writing anything to the database, all other write transactions
cannot commit until this transaction decides to commit or abort.

But, unfortunately, as has been mentioned before on this list, we don't
consider the HBase implementation production-ready yet:

------------------------------------------------------------------------------
Joao Cachopo <joao.c...@ist.utl.pt> writes:
>
> - I'm not sure that HBase itself is production-ready
> - It does not support multiple servers because the distribution
> aspects are not merged into this implementation
------------------------------------------------------------------------------

Regards,
--
Sérgio Miguel Fernandes
