Indexed by not working (not talking about the lazy loading).

sledorze

unread,

Jul 30, 2009, 7:53:05 AM7/30/09

to Fénix Framework

Today, when I use the indexBy feature; it generates some code that do
not compile properly; I understand it was not used before.

However for the sake of future compatibility; could it be possible to
provide a working interface that later would be backed by the new
implementation?
if you have a release date for the new implementation it would be nice
to hear about it.

For the curious one; I need it to provide a mapping between local and
globalIds of objects for migration purpose.
Without that I have some O(n^2) id resolution; which do not scale.
Here's the dml part (with the commented indexed by):

class Info ..

class IdMappings;

class IdMapping {
String globalId;
Long localId;
}

relation IdMappingsHasLocalIds {
IdMappings playsRole mappingFromLocal {
multiplicity 1;
}
IdMapping playsRole localId {
multiplicity *;
// indexed by localId;
}
}

relation IdMappingsHasGlobalIds {
IdMappings playsRole mappingFromGlobal {
multiplicity 1;
}
IdMapping playsRole globalId {
multiplicity *;
// indexed by globalId;
}
}

relation InfoHasIdMappings {
IdMappings playsRole mappings {
multiplicity 1;
}
Info playsRole info {
multiplicity 1;
}
}

Joao Cachopo

unread,

Aug 7, 2009, 5:30:21 AM8/7/09

to fenix-f...@googlegroups.com

sledorze <stephane...@gmail.com> writes:

> Today, when I use the indexBy feature; it generates some code that do
> not compile properly; I understand it was not used before.

Yes, indexed relations were once implemented in a prototype (and
described in the ICWE 2006 paper), but never for the FenixEDU webapp.
So, they are not properly implemented yet in the fenix-framework.

> However for the sake of future compatibility; could it be possible to
> provide a working interface that later would be backed by the new
> implementation?
> if you have a release date for the new implementation it would be nice
> to hear about it.

I've not talked with Sérgio Silva recently, but I would say that they
should be ready real soon now...

--
João Cachopo

Stephane Le Dorze

unread,

Aug 10, 2009, 8:12:22 AM8/10/09

to fenix-framework

Damn; I am just understanding there's two Sergio in the place!! :) (is there only two of them?)

I messed myself with between what Sergio and Sergio were working on..

So now it appears clearer for me; the work concerning the lazy (binary tree based) relations implementation is done by Sergio Silva is is by no means tight to the work of Sergio Fernandez on HBase. (correct me if I'm wrong here).

Once you know I would be happy to know also; we will make good use of this (highly desired) feature!

Stephane

2009/8/7 Joao Cachopo <joao.c...@ist.utl.pt>

Joao Cachopo

unread,

Aug 10, 2009, 9:47:51 AM8/10/09

to fenix-f...@googlegroups.com

Stephane Le Dorze <stephane...@gmail.com> writes:

> Damn; I am just understanding there's two Sergio in the place!! :) (is
> there only two of them?)

Yes, there are two (and only two) at the moment:

- Sérgio Fernandes
- Sérgio Silva

> So now it appears clearer for me; the work concerning the lazy (binary
> tree based) relations implementation is done by Sergio Silva is is by
> no means tight to the work of Sergio Fernandez on HBase. (correct me
> if I'm wrong here).

Yes, that's right.

--
João Cachopo

Damián

unread,

Sep 21, 2009, 5:58:37 AM9/21/09

to Fénix Framework

Hello everybody,

Are there any news on the work by both Sérgios on lazy relations and
HBase backend? We're on a pretty tight schedule here and it would be
nice to know if/when we could rely on those new features.

Cheers,

Damián

On 10 août, 15:47, Joao Cachopo <joao.cach...@ist.utl.pt> wrote:

Joao Cachopo

unread,

Sep 22, 2009, 4:56:28 PM9/22/09

to fenix-f...@googlegroups.com

Damián <damian....@gmail.com> writes:

> Are there any news on the work by both Sérgios on lazy relations and
> HBase backend? We're on a pretty tight schedule here and it would be
> nice to know if/when we could rely on those new features.

Hi Damián,

The indexed relations by Sérgio Silva were delayed because, during
testing, Sérgio Silva found out that the performance of this new
solution was not what we were expecting. So, now he is performing some
profiling to see where is the problem and solve it.

The HBase backend, done by Sérgio Fernandes, is stalled. Actually, it
is done, even though Sérgio wants to test it further and write a report
on this work. Yet, as I told before, the work on this backend was not
meant to be production ready soon, because it does not support
clustering of the application servers. We expect to start integrating
the work done in the Pastramy project that addresses specifically the
implementation of a Distributed STM as a means to address this issue,
but that will take a while.

Still, none of these developments should affect significantly your code,
right? They will affect performance, mostly, rather than the
programming model. Moving from the current fenix-framework to a new
version should be transparent (not exactly in the case of the indexed
relations...).

Are you having or expecting any performance problems with the current
version of the fenix-framework?

Best regards,
--
João Cachopo

Stephane Le Dorze

unread,

Sep 23, 2009, 8:37:59 AM9/23/09

to fenix-f...@googlegroups.com

Hi Joao!

We're willing to be able to iterate over relations without loading them all in memory (we could have 10000(0) of these for now and a lot more later).

As we've expected the implementation to become available soon we have not incorporated a pattern to split the realtions with intermediate objects (which to be frank is a burden the end user (gameplay programmers) would be happy to avoid).

Another issue will be the activation of predicates to ensure safety (we have not looked at how to perform it yet) and the change of data schema vs the JVM memory exhaustion (we'll need to do atomic changes on the whole graph to prevent violating predicates and then, ensure safety). Here the HBase backend could have helped from what I understood from previous exhanges.

About these points priorities;

- The former (Sergio Silva's work) is critical for us.

- The second could be delayed without too much impact (however do you have a timescale idea? weeks? months? more?)

Thanks for the quick reply!

Stephane

P.S.: With the HBase backend; is the problem that:

- Several computer cannot do some transactions concurrently.

- Several computer cannot share a same transaction? (if so; didn't know you were using this pattern - it is not mandatory for us (just to let you know) )

Joao Cachopo

unread,

Sep 28, 2009, 5:40:06 PM9/28/09

to fenix-f...@googlegroups.com

Hi!

First of all, I apologize for the long delay in answering but things
have been complicated lately...

Stephane Le Dorze <stephane...@gmail.com> writes:

> We're willing to be able to iterate over relations without loading
> them all in memory (we could have 10000(0) of these for now and a lot
> more later).

When you say that you will iterate over the relations, does that mean
that you want to traverse the entire collection? If so, then indexed
relations would not help there. Actually, in that case the current
approach of loading all of the objects of the collection at once is
probably the best.

Besides, 10000 (or even 100000) of objects are not that many objects
(depending on their sizes, of course).

Do you have any idea already about the size of your database?
And how large are your operations?
And how many do you expect to have?
And what's the ratio of read/write?

Having that information may help in determining whether you may have
performance problems or not, based on our experience.

Premature optimization may be, well, premature...

> As we've expected the implementation to become available soon we have
> not incorporated a pattern to split the realtions with intermediate
> objects (which to be frank is a burden the end user (gameplay
> programmers) would be happy to avoid).

Yes, I agree that we should remove unnecessary burden from programmers
and make the programming model the simplest possible.

But that is somewhat orthogonal to the implementation of indexed
relations, because they will not affect, for the most part, the
programmer. Except that with them you get a new generated method that
fetches an element in a relation given a certain attribute.

So, if this proves to be urgent to you, we may provide a version of the
DML compiler that generates that code, but that iterates over the entire
collection in search of the element. Then, that code is replaced by
indexed relations. But note that indexed relations are almost done...

> Another issue will be the activation of predicates to ensure safety
> (we have not looked at how to perform it yet)

Are you talking about the "consistency predicates" as they exist in the
fenix-framework or about something else?

> and the change of data schema vs the JVM memory exhaustion (we'll need
> to do atomic changes on the whole graph to prevent violating
> predicates and then, ensure safety). Here the HBase backend could have
> helped from what I understood from previous exhanges.

Yes, maybe, but the HBase backend will still take some time until I'm
confident to tell that it is production-ready.

> About these points priorities;
> - The former (Sergio Silva's work) is critical for us.

I've talked with Sérgio Silva today and he has been profiling the code
and made already some improvements, but a little more work is needed
before it is done.

But, again, until you do have a problem with performance (which you
don't, yet, right?), his work will not influence much your own
development.

> - The second could be delayed without too much impact (however do you
> have a timescale idea? weeks? months? more?)

Probably months, yes.

Currently we don't have many resources to allocate to that task and we
have some higher priority tasks that we want to push further.

> Thanks for the quick reply!

Ooops, sorry...

> P.S.: With the HBase backend; is the problem that:
> - Several computer cannot do some transactions concurrently.

Yes. They are not aware of each other. The code presently assumes that
the database is being accessed by a single JVM.

As a final comment, I want to stress out that it would really help if
you could tell us a little more about your workload, and where are you
having problems. Also, in what kind of machines are you planning to run
your application.

Stephane Le Dorze

unread,

Sep 29, 2009, 5:51:44 AM9/29/09

to fenix-f...@googlegroups.com

On Mon, Sep 28, 2009 at 11:40 PM, Joao Cachopo <joao.c...@ist.utl.pt> wrote:

Hi!

First of all, I apologize for the long delay in answering but things
have been complicated lately...

Stephane Le Dorze <stephane...@gmail.com> writes:

> We're willing to be able to iterate over relations without loading
> them all in memory (we could have 10000(0) of these for now and a lot
> more later).

When you say that you will iterate over the relations, does that mean
that you want to traverse the entire collection? If so, then indexed
relations would not help there. Actually, in that case the current
approach of loading all of the objects of the collection at once is
probably the best.

Besides, 10000 (or even 100000) of objects are not that many objects
(depending on their sizes, of course).

Do you have any idea already about the size of your database?
And how large are your operations?
And how many do you expect to have?
And what's the ratio of read/write?

The database will grow based on our customers; we could expect customers counted in millions in the months following the launch (we have a default free service which could generate a lot of accounts quickly with a viral launch).

Our operations should not be very large but retrieving an objet via one of its attribute (i.e. prilary key in some jargon) which will happened a lot and schema migration.

Aside this it is very difficult for me to predict the real workload as it will be business dependent and based on how the customers will react to our adapting product. So I would better have a pessimistic approach (even with optimistic concurrency :) ).

Reads will dominate clearly.

Having that information may help in determining whether you may have
performance problems or not, based on our experience.

Premature optimization may be, well, premature...

Understand; but we need to be able to adapt (very) quickly then.

> As we've expected the implementation to become available soon we have
> not incorporated a pattern to split the realtions with intermediate
> objects (which to be frank is a burden the end user (gameplay
> programmers) would be happy to avoid).

Yes, I agree that we should remove unnecessary burden from programmers
and make the programming model the simplest possible.

Otherwise the abstraction leaks..

But that is somewhat orthogonal to the implementation of indexed
relations, because they will not affect, for the most part, the
programmer. Except that with them you get a new generated method that
fetches an element in a relation given a certain attribute.

So, if this proves to be urgent to you, we may provide a version of the
DML compiler that generates that code, but that iterates over the entire
collection in search of the element. Then, that code is replaced by
indexed relations. But note that indexed relations are almost done...

I prefer you work on the real stuff rather than loose time if the delivery is imminent (will it be avaible within a week / month / 2 months) ? It will be packed with the "one VBox per object under the hood"?

> Another issue will be the activation of predicates to ensure safety
> (we have not looked at how to perform it yet)

Are you talking about the "consistency predicates" as they exist in the
fenix-framework or about something else?

Yes, the consistency predicates; how to define them?

> and the change of data schema vs the JVM memory exhaustion (we'll need
> to do atomic changes on the whole graph to prevent violating
> predicates and then, ensure safety). Here the HBase backend could have
> helped from what I understood from previous exhanges.

Yes, maybe, but the HBase backend will still take some time until I'm
confident to tell that it is production-ready.

And this need the distributed transactions; I understand.

> About these points priorities;
> - The former (Sergio Silva's work) is critical for us.

I've talked with Sérgio Silva today and he has been profiling the code
and made already some improvements, but a little more work is needed
before it is done.

But, again, until you do have a problem with performance (which you
don't, yet, right?), his work will not influence much your own
development.

The potential grow of the number of customers make me nervous; aside the memory problems; it takes time to search for an object in an unsorted collection (until Sergio Silva work is integrated); some web service needs may kill our performances.

> - The second could be delayed without too much impact (however do you
> have a timescale idea? weeks? months? more?)

Probably months, yes.

Currently we don't have many resources to allocate to that task and we
have some higher priority tasks that we want to push further.

That sounds strange to me as I thought it was enabling a lot of scalability and remove a point of failure (which in my mind is very crucial).

> Thanks for the quick reply!

Ooops, sorry...

> P.S.: With the HBase backend; is the problem that:
> - Several computer cannot do some transactions concurrently.

Yes. They are not aware of each other. The code presently assumes that
the database is being accessed by a single JVM.

As a final comment, I want to stress out that it would really help if
you could tell us a little more about your workload, and where are you
having problems. Also, in what kind of machines are you planning to run
your application.

We have no problem right now.

The base machine we have is:

Quad core Xeon 2.33Ghz

250 Go (Raid)

4Go Ram

The JVM is 64bits and the memory will (more than) probably be increased.

Joao Cachopo

unread,

Sep 30, 2009, 4:01:40 PM9/30/09

to fenix-f...@googlegroups.com

Stephane Le Dorze <stephane...@gmail.com> writes:

> The database will grow based on our customers; we could expect
> customers counted in millions in the months following the launch (we
> have a default free service which could generate a lot of accounts
> quickly with a viral launch).
> Our operations should not be very large but retrieving an objet via
> one of its attribute (i.e. prilary key in some jargon) which will
> happened a lot and schema migration.

Therefore the need for the indexed relations...

> Aside this it is very difficult for me to predict the real workload as
> it will be business dependent and based on how the customers will
> react to our adapting product. So I would better have a pessimistic
> approach (even with optimistic concurrency :) ).

OK, I understand and sympathize with your concerns.

> I prefer you work on the real stuff rather than loose time if the
> delivery is imminent (will it be avaible within a week / month / 2
> months) ?

It is not really in my hands, so I can't say for sure. Sérgio is
working on it and only he may give a better estimate of when things will
be ready.

Actually, I believe that if you want to start using it, it is already
sufficiently stable for you to start testing it, even if not
production-ready. Any changes that Sérgio may make to the code will
change only its performance, rather than its functionality, I guess.

If you're interested in starting using it, Sérgio may give you the info
needed for you to get started.

> It will be packed with the "one VBox per object under the hood"?

That is an orthogonal development. I expect to resume the work on that
sometime soon (one or two weeks). But this change should affect only
the memory required for the application. It will have no influence on
the database schema.

> Yes, the consistency predicates; how to define them?

You may define them as I describe in my PhD thesis.

The difference, however, is that, because of persitence, the consistency
predicates that are currently supported by the Fénix Framework can
access only attributes of the class to which the consistency predicate
belongs. This is a severe limitation, I know, but removing it is not
easy, unfortunately.

> Probably months, yes.
>
> Currently we don't have many resources to allocate to that task and we
> have some higher priority tasks that we want to push further.
>
> That sounds strange to me as I thought it was enabling a lot of
> scalability and remove a point of failure (which in my mind is very
> crucial).

Yes, it is, but the change to a persistent store such as HBase is
probably too radical to be made in a project as critical as the FénixEDU
webapp.

So, we're aiming to a more smooth upgrade path, by addressing the same
problems while retaining a relational database in the interim.

As strange as it may seem (or maybe not), ditching entirely relational
databases is not always possible...

And, as a matter of fact, another path that I find promising is using
BerkeleyDB as a persistent store, instead of HBase. The solution that
we envisage for the distribution of the STM would apply equally well to
both solutions and I believe that BerkeleyDB is a more mature solution
for production critical systems. I may be wrong, though, and if you
have information to the contrary, I would be very glad to hear it...

Stephane Le Dorze

unread,

Oct 14, 2009, 11:16:19 AM10/14/09

to fenix-framework

Big delay; Reality check (beta users increase on the real system) - we had a lot of issues (many remote debug tuning - hard time - good pressure - obvious bugs removed) :)

(The problems were not Fenix related)

Ok, so back to the discussion;

2009/9/30 Joao Cachopo <joao.c...@ist.utl.pt>

Stephane Le Dorze <stephane...@gmail.com> writes:

> The database wi ll grow based on our customers; we could expect

> customers counted in millions in the months following the launch (we
> have a default free service which could generate a lot of accounts
> quickly with a viral launch).
> Our operations should not be very large but retrieving an objet via
> one of its attribute (i.e. prilary key in some jargon) which will
> happened a lot and schema migration.

Therefore the need for the indexed relations...

Yes; even the lazy aspect is a second order; the first being having sorted relations.

> Aside this it is very difficult for me to predict the real workload as
> it will be business dependent and based on how the customers will
> react to our adapting product. So I would better have a pessimistic
> approach (even with optimistic concurrency :) ).

OK, I understand and sympathize with your concerns.

> I prefer you work on the real stuff rather than loose time if the
> delivery is imminent (will it be avaible within a week / month / 2
> months) ?

It is not really in my hands, so I can't say for sure. Sérgio is
working on it and only he may give a better estimate of when things will
be ready.

Do you think talking directly to Sergio would help? would you mind if I do so? if not do you want to be in cc? (or I post on the Fenix group?)

Actually, I believe that if you want to start using it, it is already
sufficiently stable for you to start testing it, even if not
production-ready. Any changes that Sérgio may make to the code will
change only its performance, rather than its functionality, I guess.

If you're interested in starting using it, Sérgio may give you the info
needed for you to get started.

Indeed.

> It will be packed with the "one VBox per object under the hood"?

That is an orthogonal development. I expect to resume the work on that
sometime soon (one or two weeks). But this change should affect only
the memory required for the application. It will have no influence on
the database schema.

Ok; just let me know when it is available.

> Yes, the consistency predicates; how to define them?

You may define them as I describe in my PhD thesis.

The difference, however, is that, because of persitence, the consistency
predicates that are currently supported by the Fénix Framework can
access only attributes of the class to which the consistency predicate
belongs. This is a severe limitation, I know, but removing it is not
easy, unfortunately.

Mmm.. I understand; we cannot encode complex predicates; anyway; that's a start.

> Probably months, yes.
>

> Currently we don't have many resources to allocate to that task and we
> have some higher priority tasks that we want to push further.
>
> That sounds strange to me as I thought it was enabling a lot of
> scalability and remove a point of failure (which in my mind is very
> crucial).

Yes, it is, but the change to a persistent store such as HBase is
probably too radical to be made in a project as critical as the FénixEDU
webapp.

So, we're aiming to a more smooth upgrade path, by addressing the same
problems while retaining a relational database in the interim.

As strange as it may seem (or maybe not), ditching entirely relational
databases is not always possible...

It seem strange yes; but I don't know the details were the devil lives.

And, as a matter of fact, another path that I find promising is using
BerkeleyDB as a persistent store, instead of HBase. The solution that
we envisage for the distribution of the STM would apply equally well to
both solutions and I believe that BerkeleyDB is a more mature solution
for production critical systems. I may be wrong, though, and if you
have information to the contrary, I would be very glad to hear it...

I don't know so much but have found this about multi threading and DBs:

http://diaswww.epfl.ch/shore-mt/papers/edbt09johnson.pdf

I don't know about Shore mt, the maturity of the solution and/or the viability of the benchmarks (I am currently writing offline).

I think that going to a solid, mature DB like Berkeley is a very good choice; less variables make things more manageable.

Additionally, when looking for some documentation for the Web site guys about DB & scalability; I found a blog that you may find interesting:

http://highscalability.com/drop-acid-and-think-about-data

It would be nice to have Fenix / Pastramy appear in this list.

As a side work; we're now generating the dml files from internal specifications, allowing us to start the work we've discussed in previous emails:

- annotations to automatically generate of roots domain objects.

- idiomatic Scala and improved interface (no more 'null')

- type safe serialization with implicit versionning

- support for schema transformation - probably using relational algebra;

I lack ad'hoc intersection/union types in Scala to make it type safe; however there could be a type system switch at one point to make that happen - in the future )

Joao Cachopo

unread,

Oct 20, 2009, 4:38:29 PM10/20/09

to fenix-f...@googlegroups.com

Stephane Le Dorze <stephane...@gmail.com> writes:

> Reality check (beta users increase on the real system)

Just a quick question (I'll have to leave the rest of the email for
another day): is there any public info about your system available.

As you may imagine, there is some curiosity among the people on this
side what is it that you're developing.

Stephane Le Dorze

unread,

Oct 21, 2009, 6:58:03 AM10/21/09

to fenix-f...@googlegroups.com