> Today, when I use the indexBy feature, it generates some code that does
> not compile properly; I understand it was not used before.
Yes, indexed relations were once implemented in a prototype (and
described in the ICWE 2006 paper), but never for the FenixEDU webapp.
So, they are not properly implemented yet in the fenix-framework.
> However for the sake of future compatibility; could it be possible to
> provide a working interface that later would be backed by the new
> implementation?
> if you have a release date for the new implementation it would be nice
> to hear about it.
I've not talked with Sérgio Silva recently, but I would say that they
should be ready real soon now...
--
João Cachopo
> Damn; I am just realizing there are two Sérgios around!! :) (are
> there only two of them?)
Yes, there are two (and only two) at the moment:
- Sérgio Fernandes
- Sérgio Silva
> So now it appears clearer to me; the work concerning the lazy (binary
> tree based) relations implementation is done by Sérgio Silva and is by
> no means tied to the work of Sérgio Fernandes on HBase. (correct me
> if I'm wrong here).
Yes, that's right.
--
João Cachopo
> Are there any news on the work by both Sérgios on lazy relations and
> HBase backend? We're on a pretty tight schedule here and it would be
> nice to know if/when we could rely on those new features.
Hi Damián,
The indexed relations by Sérgio Silva were delayed because, during
testing, he found out that the performance of this new solution was
not what we were expecting. So, now he is doing some profiling to see
where the problem is and solve it.
The HBase backend, done by Sérgio Fernandes, is stalled. Actually, it
is done, even though Sérgio wants to test it further and write a report
on this work. Yet, as I said before, the work on this backend was not
meant to be production ready soon, because it does not support
clustering of the application servers. We expect to start integrating
the work done in the Pastramy project that addresses specifically the
implementation of a Distributed STM as a means to address this issue,
but that will take a while.
Still, none of these developments should significantly affect your code,
right? They will mostly affect performance, rather than the
programming model. Moving from the current fenix-framework to a new
version should be transparent (not exactly in the case of the indexed
relations...).
Are you having or expecting any performance problems with the current
version of the fenix-framework?
Best regards,
--
João Cachopo
First of all, I apologize for the long delay in answering but things
have been complicated lately...
Stephane Le Dorze <stephane...@gmail.com> writes:
> We would like to be able to iterate over relations without loading
> them all in memory (we could have 10000(0) of these for now and a lot
> more later).
When you say that you will iterate over the relations, does that mean
that you want to traverse the entire collection? If so, then indexed
relations would not help there. Actually, in that case the current
approach of loading all of the objects of the collection at once is
probably the best.
Besides, 10000 (or even 100000) objects are not that many
(depending on their sizes, of course).
Do you have any idea already about the size of your database?
And how large are your operations?
And how many do you expect to have?
And what's the ratio of read/write?
Having that information may help in determining whether you may have
performance problems or not, based on our experience.
Premature optimization may be, well, premature...
> As we've expected the implementation to become available soon we have
> not incorporated a pattern to split the relations with intermediate
> objects (which to be frank is a burden the end user (gameplay
> programmers) would be happy to avoid).
Yes, I agree that we should remove unnecessary burden from programmers
and make the programming model as simple as possible.
But that is somewhat orthogonal to the implementation of indexed
relations, because they will not affect, for the most part, the
programmer. Except that with them you get a new generated method that
fetches an element in a relation given a certain attribute.
So, if this proves to be urgent for you, we may provide a version of the
DML compiler that generates that method, but implemented by iterating
over the entire collection in search of the element. Later, that code
would be replaced by the real indexed relations, without changing the
method that you call. But note that indexed relations are almost done...
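To make the interim approach above concrete, here is a minimal sketch in Java. It is not the code that the DML compiler generates: the classes Game and Player, the nickname attribute, and both method names are hypothetical, and a TreeMap is used only as a stand-in for whatever structure the indexed relations use internally. The point is just that the caller sees the same kind of lookup method in both variants, and only the cost changes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical domain classes: these names are illustrative only and are
// not what the DML compiler actually generates.
class Player {
    private final String nickname;

    Player(String nickname) {
        this.nickname = nickname;
    }

    String getNickname() {
        return nickname;
    }
}

class Game {
    private final List<Player> players = new ArrayList<>();
    // Stand-in for an index; the choice of TreeMap is an assumption.
    private final Map<String, Player> playersByNickname = new TreeMap<>();

    void addPlayer(Player player) {
        players.add(player);
        playersByNickname.put(player.getNickname(), player);
    }

    // Interim variant: scans the whole collection, so cost grows linearly.
    Player getPlayerByNicknameScan(String nickname) {
        for (Player player : players) {
            if (player.getNickname().equals(nickname)) {
                return player;
            }
        }
        return null; // no such element in the relation
    }

    // Indexed variant: same result, but a logarithmic lookup in the map.
    Player getPlayerByNicknameIndexed(String nickname) {
        return playersByNickname.get(nickname);
    }
}

class IndexBySketch {
    public static void main(String[] args) {
        Game game = new Game();
        game.addPlayer(new Player("ana"));
        game.addPlayer(new Player("rui"));
        // Both variants return the same object; only the cost differs.
        System.out.println(game.getPlayerByNicknameScan("rui")
                == game.getPlayerByNicknameIndexed("rui"));
    }
}
```

Code written against the scanning variant would keep working once the lookup is backed by an index, which is why the interim version would mostly be a compatibility measure rather than a performance one.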
> Another issue will be the activation of predicates to ensure safety
> (we have not looked at how to perform it yet)
Are you talking about the "consistency predicates" as they exist in the
fenix-framework or about something else?
> and the change of data schema vs the JVM memory exhaustion (we'll need
> to do atomic changes on the whole graph to prevent violating
> predicates and then, ensure safety). Here the HBase backend could have
> helped from what I understood from previous exchanges.
Yes, maybe, but the HBase backend will still take some time before I'm
confident enough to say that it is production-ready.
> About these points priorities;
> - The former (Sergio Silva's work) is critical for us.
I've talked with Sérgio Silva today. He has been profiling the code
and has already made some improvements, but a little more work is
needed before it is done.
But, again, until you do have a problem with performance (which you
don't, yet, right?), his work will not have much influence on your own
development.
> - The second could be delayed without too much impact (however do you
> have a timescale idea? weeks? months? more?)
Probably months, yes.
Currently we don't have many resources to allocate to that task and we
have some higher priority tasks that we want to push further.
> Thanks for the quick reply!
Ooops, sorry...
> P.S.: With the HBase backend; is the problem that:
> - Several computers cannot do some transactions concurrently.
Yes. They are not aware of each other. The code presently assumes that
the database is being accessed by a single JVM.
As a final comment, I want to stress that it would really help if
you could tell us a little more about your workload, and where you are
having problems. Also, on what kind of machines are you planning to run
your application?
> The database will grow based on our customers; we could expect
> customers counted in millions in the months following the launch (we
> have a default free service which could generate a lot of accounts
> quickly with a viral launch).
> Our operations should not be very large, apart from retrieving an
> object via one of its attributes (i.e. a primary key in some jargon),
> which will happen a lot, and schema migration.
Therefore the need for the indexed relations...
> Aside this it is very difficult for me to predict the real workload as
> it will be business dependent and based on how the customers will
> react to our adapting product. So I would better have a pessimistic
> approach (even with optimistic concurrency :) ).
OK, I understand and sympathize with your concerns.
> I prefer you work on the real stuff rather than lose time if the
> delivery is imminent (will it be available within a week / month / 2
> months) ?
It is not really in my hands, so I can't say for sure. Sérgio is
working on it and only he may give a better estimate of when things will
be ready.
Actually, I believe that if you want to start using it, it is already
sufficiently stable for you to start testing it, even if not
production-ready. Any changes that Sérgio may make to the code will
change only its performance, rather than its functionality, I guess.
If you're interested in starting to use it, Sérgio may give you the info
needed for you to get started.
> It will be packed with the "one VBox per object under the hood"?
That is an orthogonal development. I expect to resume the work on that
sometime soon (one or two weeks). But this change should affect only
the memory required for the application. It will have no influence on
the database schema.
> Yes, the consistency predicates; how to define them?
You may define them as I describe in my PhD thesis.
The difference, however, is that, because of persistence, the consistency
predicates that are currently supported by the Fénix Framework can
access only attributes of the class to which the consistency predicate
belongs. This is a severe limitation, I know, but removing it is not
easy, unfortunately.
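As a rough illustration of that restriction, here is a minimal sketch in Java. It does not use the framework's real API: the @Invariant annotation, the Account class, and the InvariantChecker are all hypothetical and written only for this example. What it tries to show is the shape of a predicate that is acceptable under the current limitation, namely a boolean method that reads only the attributes of its own class.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical marker annotation, defined here only for the sketch.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Invariant {
}

// Hypothetical domain class with a predicate over its own attributes only.
class Account {
    private long balance;
    private final long overdraftLimit;

    Account(long balance, long overdraftLimit) {
        this.balance = balance;
        this.overdraftLimit = overdraftLimit;
    }

    void withdraw(long amount) {
        balance -= amount;
    }

    // Reads only balance and overdraftLimit, both declared in this class.
    @Invariant
    boolean balanceWithinLimit() {
        return balance >= -overdraftLimit;
    }
}

// Hypothetical checker: runs every annotated predicate of one object.
class InvariantChecker {
    static boolean holds(Object object) {
        for (Method method : object.getClass().getDeclaredMethods()) {
            if (!method.isAnnotationPresent(Invariant.class)) {
                continue;
            }
            try {
                method.setAccessible(true);
                if (!(Boolean) method.invoke(object)) {
                    return false; // at least one predicate is violated
                }
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("cannot evaluate predicate", e);
            }
        }
        return true;
    }
}
```

A predicate that needed to look at another object (for instance, the owner of the account) would fall outside what is supported at the moment. In a transactional setting, a check like this would be expected to run before commit, with a violation aborting the transaction; the sketch leaves that part out.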
> > Probably months, yes.
> >
> > Currently we don't have many resources to allocate to that task and we
> > have some higher priority tasks that we want to push further.
>
> That sounds strange to me as I thought it was enabling a lot of
> scalability and removing a point of failure (which in my mind is very
> crucial).
Yes, it is, but the change to a persistent store such as HBase is
probably too radical to be made in a project as critical as the FénixEDU
webapp.
So, we're aiming for a smoother upgrade path, by addressing the same
problems while retaining a relational database in the interim.
As strange as it may seem (or maybe not), ditching relational
databases entirely is not always possible...
And, as a matter of fact, another path that I find promising is using
BerkeleyDB as a persistent store, instead of HBase. The solution that
we envisage for the distribution of the STM would apply equally well to
both solutions and I believe that BerkeleyDB is a more mature solution
for production critical systems. I may be wrong, though, and if you
have information to the contrary, I would be very glad to hear it...
> Reality check (beta users increase on the real system)
Just a quick question (I'll have to leave the rest of the email for
another day): is there any public info about your system available?
As you may imagine, there is some curiosity among the people on this
side about what it is that you're developing.