Hi Philip, I've been working on a Postgres extension of the DbImplementation interface and was wondering how we should go about committing DB extensions. The first issue is that unit tests use the 'sensorbase.db.impl' property found in the sensorbase.properties file. The Sensorbase unit tests will only be run on the default DerbyImplementation if the 'sensorbase.db.impl' property is set to it's default. The second is that any changes to the DbImplementation interface, like what has happened recently, will cause the author of the change to update all of the DbImplementations. That doesn't sound too hard except that it forces the maintainer to know how to update all of the DB-specific queries.
Should developers just commit extensions to the trunk? Or should we be committing to branches and merging Sensorbase updates as we go.
--On Monday, June 16, 2008 10:07 PM -1000 Austen Ito <austen....@gmail.com> wrote:
> I've been working on a Postgres extension of the DbImplementation > interface
Yeah!
> and was wondering how we should go about committing DB > extensions. The first issue is that unit tests use the > 'sensorbase.db.impl' property found in the sensorbase.properties file. > The Sensorbase unit tests will only be run on the default > DerbyImplementation if the 'sensorbase.db.impl' property is set to > it's default. The second is that any changes to the DbImplementation > interface, like what has happened recently, will cause the author of > the change to update all of the DbImplementations. That doesn't sound > too hard except that it forces the maintainer to know how to update > all of the DB-specific queries.
First off, just in case someone doesn't know about this, there's a wiki page that goes through the development of a different backend db:
Second, yes, you're right, changes to DbImplementation do ripple across all of the implementations. I try to make sure to describe any changes to DbImplementation when I make a release so that alternative DbImplementation maintainers are aware of what's going on.
At Expedia, where they are maintaining an SQL Server DbImplementation, what they've done is commit an internal copy of the DerbyImplementation.java source. Then, whenever they need to do an update, they just download the most recent copy and do an SVN diff to see what's changed since the last time they sync'd. You can do something very similar just using the Browse capability in Google Project hosting.
> Should developers just commit extensions to the trunk? Or should we > be committing to branches and merging Sensorbase updates as we go.
Actually, I'd like to recommend a third approach: create a new google code project to host your code, such as:
hackystat-sensorbase-postgres
Building this project simply creates a small jar file containing the PostgresImplementation class. To compile it, you will need sensorbase.jar on your classpath. Someone wanting to use this Postgres just puts hackystat-postgres.jar on the classpath and changes the property in sensorbase.properties. Testing involves running the JUnit tests for the sensorbase with this alternative backend installed.
I think this is preferable because: - It keeps the sensorbase sources "lightweight" with just a single implementation. - It gives you a convenient place to put Postgres specific documentation. - It doesn't force me to learn Postgres or get you to work on that code in order to make a new sensorbase release. So, development can proceed more quickly. - The Postgres implementation can get slightly out of sync without impacting on the rest of the Hackystat ecosystem.
On a related DB note, I have been continuing to fiddle with the Derby implementation, in particular with indexing. There turns out to be a cool feature in Derby called query plan logging:
Hi Philip, I've followed your advice on how to commit and distribute the Postgres Sensorbase implementation. I created a project page for the hackystat-sensorbase-postgres module at http://code.google.com/p/hackystat-sensorbase-postgres/ . I've added the necessary User and Developer documentation so anyone can feel free to download it and start using it. ;)
The distributions contain a compatible sensorbase.jar file so user's don't have to worry about which Sensorbase version they should be using. The sensorbase-postgres.jar file contains the PostgresDbImplementation class that is automatically added to the classpath when the included sensorbase.jar file is executed.
The database schema that is included is still a work in progress with some changes ahead. I've talked with Aaron about updating the schema to include a ProjectUri table to create relationships between projects and their URIs. We are eating our own dog food at Referentia by using this sensorbase implementation, so there will most likely be schema changes in the future. Hopefully we can nail down the schema sooner rather than later to avoid the problems of users having to update their schemas.
Aaron has expressed interest in the server performance in his recent emails, so I'm leaning on him to come up with some performance numbers to compare the UH implementation vs. the Postgres implementation.
On Tue, Jun 17, 2008 at 9:04 AM, Philip Johnson <john...@hawaii.edu> wrote:
> --On Monday, June 16, 2008 10:07 PM -1000 Austen Ito <austen....@gmail.com> wrote:
>> I've been working on a Postgres extension of the DbImplementation >> interface
> Yeah!
>> and was wondering how we should go about committing DB >> extensions. The first issue is that unit tests use the >> 'sensorbase.db.impl' property found in the sensorbase.properties file. >> The Sensorbase unit tests will only be run on the default >> DerbyImplementation if the 'sensorbase.db.impl' property is set to >> it's default. The second is that any changes to the DbImplementation >> interface, like what has happened recently, will cause the author of >> the change to update all of the DbImplementations. That doesn't sound >> too hard except that it forces the maintainer to know how to update >> all of the DB-specific queries.
> First off, just in case someone doesn't know about this, there's a wiki page that goes > through the development of a different backend db:
> Second, yes, you're right, changes to DbImplementation do ripple across all of the > implementations. I try to make sure to describe any changes to DbImplementation when I > make a release so that alternative DbImplementation maintainers are aware of what's going > on.
> At Expedia, where they are maintaining an SQL Server DbImplementation, what they've done > is commit an internal copy of the DerbyImplementation.java source. Then, whenever they > need to do an update, they just download the most recent copy and do an SVN diff to see > what's changed since the last time they sync'd. You can do something very similar just > using the Browse capability in Google Project hosting.
>> Should developers just commit extensions to the trunk? Or should we >> be committing to branches and merging Sensorbase updates as we go.
> Actually, I'd like to recommend a third approach: create a new google code project to > host your code, such as:
> hackystat-sensorbase-postgres
> Building this project simply creates a small jar file containing the > PostgresImplementation class. To compile it, you will need sensorbase.jar on your > classpath. Someone wanting to use this Postgres just puts hackystat-postgres.jar on the > classpath and changes the property in sensorbase.properties. Testing involves running > the JUnit tests for the sensorbase with this alternative backend installed.
> I think this is preferable because: > - It keeps the sensorbase sources "lightweight" with just a single implementation. > - It gives you a convenient place to put Postgres specific documentation. > - It doesn't force me to learn Postgres or get you to work on that code in order to make > a new sensorbase release. So, development can proceed more quickly. > - The Postgres implementation can get slightly out of sync without impacting on the rest > of the Hackystat ecosystem.
> On a related DB note, I have been continuing to fiddle with the Derby implementation, in > particular with indexing. There turns out to be a cool feature in Derby called query plan > logging:
--On June 28, 2008 4:37:30 PM -1000 Austen Ito <austen....@gmail.com> wrote:
> Hi Philip, > I've followed your advice on how to commit and distribute the > Postgres Sensorbase implementation. I created a project page for the > hackystat-sensorbase-postgres module at > http://code.google.com/p/hackystat-sensorbase-postgres/ . I've added > the necessary User and Developer documentation so anyone can feel free > to download it and start using it. ;)
Yippee! That's great news.
> The distributions contain a compatible sensorbase.jar file so user's > don't have to worry about which Sensorbase version they should be > using.
Unfortunately, they do have to worry. The problem is that, over time, the DbImplementation class (that all database implementations must extend) has been evolving. These changes and additions to the low level API are exploited by the SensorBaseClient class (or any other users of the REST API) to provide better service.
When you provide a distribution with a prebuilt sensorbase.jar file, you essentially create a version of the sensorbase REST API that is locked in time and may not be in sync with the version required by other Hackystat services. This will cause problems for users: they don't know which distribution of the higher level service(s) is compatible with your sensorbase.jar file.
The solution to this problem is to incorporate hackystat-sensorbase-postgres into the daily build and distribution mechanism, so that when distributions are built, the postgres implementation is compiled and unit tested against the latest version of the DbImplementation class. That way users truly "won't have to worry": they just use the Postgres implementation that comes along with the distribution they downloaded.
There are a couple of things I noticed from the project home page:
"The UH Sensorbase uses Derby as it's storage mechanism, which allows only one client connection."
This is a little misleading. Derby has two modes: "network", which is the standard client-server style that you are used to, and "embedded", which offers higher performance with the trade-off that only one client connection can be made at a time. So, please edit this line to:
"The default database implementation of the UH Sensorbase uses Apache Derby in embedded mode, which allows only one client connection."
Note that I am assuming that you still want your implementation to conform to the UH SensorBase REST API. Of course, you can do this while providing a superset of those capabilities, which is fine. If you want to fork with respect to the UH SensorBase REST API, that is a much bigger step, since it means you might not any longer be compatible with higher level services. I hope that's not your intent, because it makes this extension less useful to us, but it's up to you (perhaps you want to write your own higher level services).
One other important thing:
"Having multiple connections allows other RESTful services to be built on top of the Postgres database. One could imagine having the Sensorbase service as the collection mechanism and having additional services consuming data from the database, via efficient queries, and serving data with a REST API."
The current SensorBase implementation does in-memory caching of certain database data (such as project instances). The implementation assumes that this cached data will not be changed by some other process while the SensorBase is running. Thus, to avoid problems with the current implementation, your extensions to the REST API should be limited to GET methods. Put another way, feel free to implement any SELECT statements you want, but if you start doing INSERTS or DELETES, there will be problems.
I am open to changing the SensorBase implementation to get rid of these caches if that will provide important capabilities for alternative db implementations, but would rather wait until the need is clear.
> The solution to this problem is to incorporate hackystat-sensorbase-postgres > into the daily build and distribution mechanism, so that when distributions are > built, the postgres implementation is compiled and unit tested against the > latest version of the DbImplementation class.
Once we get through the testing phase, shall I get started on incorporating mechanism? Is there is a wiki on how to do this?
>So, please edit this line to: > "The default database implementation of the UH Sensorbase uses Apache Derby > in embedded mode, which allows only one client connection."
Sure thing.
> Note that I am assuming that you still want your implementation to conform to > the UH SensorBase REST API.
Absolutely. We wanted to write the extension to conform to the Sensorbase REST API so that higher level services would still work, but still have the flexibility to go in and extend the API to fit our needs. We don't want to throw away all of the hard work done on higher level services.
> The current SensorBase implementation does in-memory caching of certain > database data (such as project instances). The implementation assumes that > this cached data will not be changed by some other process while the SensorBase > is running. Thus, to avoid problems with the current implementation, your > extensions to the REST API should be limited to GET methods. Put another way, > feel free to implement any SELECT statements you want, but if you start doing > INSERTS or DELETES, there will be problems.
It's funny you should mention this. Last week, Aaron went ahead and did an UPDATE to change his user password, but it didn't work. He restarting the Sensorbase and the updated password started working. He came to the conclusion that login information was stored in memory. Another problem was that we had to update one of the Xml* columns (XmlUser I believe) in addition to the Password column in the HackyUser table. That could be a problem in the long run if we have to update data in more than one place.
When we start moving away from the in-memory caching, we might want to think about removing the Xml* and Xml*Ref columns. I remember you said in a previous email that it will add a lot of redundant JAXB code, but that may be better than having to enforce the rule to update data in multiple places.
On Mon, Jun 30, 2008 at 7:58 AM, Philip Johnson <john...@hawaii.edu> wrote:
> --On June 28, 2008 4:37:30 PM -1000 Austen Ito <austen....@gmail.com> wrote:
>> Hi Philip, >> I've followed your advice on how to commit and distribute the >> Postgres Sensorbase implementation. I created a project page for the >> hackystat-sensorbase-postgres module at >> http://code.google.com/p/hackystat-sensorbase-postgres/ . I've added >> the necessary User and Developer documentation so anyone can feel free >> to download it and start using it. ;)
> Yippee! That's great news.
>> The distributions contain a compatible sensorbase.jar file so user's >> don't have to worry about which Sensorbase version they should be >> using.
> Unfortunately, they do have to worry. The problem is that, over time, the > DbImplementation class (that all database implementations must extend) has been > evolving. These changes and additions to the low level API are exploited by the > SensorBaseClient class (or any other users of the REST API) to provide better > service.
> When you provide a distribution with a prebuilt sensorbase.jar file, you > essentially create a version of the sensorbase REST API that is locked in time > and may not be in sync with the version required by other Hackystat services. > This will cause problems for users: they don't know which distribution of the > higher level service(s) is compatible with your sensorbase.jar file.
> The solution to this problem is to incorporate hackystat-sensorbase-postgres > into the daily build and distribution mechanism, so that when distributions are > built, the postgres implementation is compiled and unit tested against the > latest version of the DbImplementation class. That way users truly "won't have > to worry": they just use the Postgres implementation that comes along with the > distribution they downloaded.
> There are a couple of things I noticed from the project home page:
> "The UH Sensorbase uses Derby as it's storage mechanism, which allows only > one client connection."
> This is a little misleading. Derby has two modes: "network", which is the > standard client-server style that you are used to, and "embedded", which offers > higher performance with the trade-off that only one client connection can be > made at a time. So, please edit this line to:
> "The default database implementation of the UH Sensorbase uses Apache Derby > in embedded mode, which allows only one client connection."
> Note that I am assuming that you still want your implementation to conform to > the UH SensorBase REST API. Of course, you can do this while providing a > superset of those capabilities, which is fine. If you want to fork with > respect to the UH SensorBase REST API, that is a much bigger step, since it > means you might not any longer be compatible with higher level services. I > hope that's not your intent, because it makes this extension less useful to us, > but it's up to you (perhaps you want to write your own higher level services).
> One other important thing:
> "Having multiple connections allows other RESTful services to be built on top > of the Postgres database. One could imagine having the Sensorbase service as > the collection mechanism and having additional services consuming data from the > database, via efficient queries, and serving data with a REST API."
> The current SensorBase implementation does in-memory caching of certain > database data (such as project instances). The implementation assumes that > this cached data will not be changed by some other process while the SensorBase > is running. Thus, to avoid problems with the current implementation, your > extensions to the REST API should be limited to GET methods. Put another way, > feel free to implement any SELECT statements you want, but if you start doing > INSERTS or DELETES, there will be problems.
> I am open to changing the SensorBase implementation to get rid of these caches > if that will provide important capabilities for alternative db implementations, > but would rather wait until the need is clear.
--On Monday, June 30, 2008 9:49 AM -1000 Austen Ito <austen....@gmail.com> wrote:
> Hi Philip,
>> The solution to this problem is to incorporate hackystat-sensorbase-postgres >> into the daily build and distribution mechanism, so that when distributions are >> built, the postgres implementation is compiled and unit tested against the >> latest version of the DbImplementation class.
> Once we get through the testing phase, shall I get started on > incorporating mechanism? Is there is a wiki on how to do this?
Here's how I would approach it.
First, obviously, the hackystat-sensorbase-postgres project depends upon hackystat-sensorbase-uh. You need the sensorbase.jar file in order to compile the postgres code.
Second, the best and easiest way to test the postgres project is to simply re-run the current set of sensorbase tests, but using the alternative DB implementation.
We could imagine adding a junit.postgres.build.xml file to the hackystat-sensorbase-uh project, which is almost identical to junit.build.xml but sets up the alternative DB. The reason we shouldn't do this is because it creates a circular dependency between the two projects.
So, what we need is a junit.build.xml file in the hackystat-sensorbase-postgres project that invokes the junit tests from the sensorbase project. To help you do this, I've just committed a tiny change to the sensorbase setTestProperties() method that will set the database to the class specified by the system property sensorbase.db.impl, if such a property exists.
This allows you to use the <sysproperty> element in the <junit> task to set the database implementation to the postgres version. You'll also, of course, have it plus the sensorbase.jar on your classpath. The actual task should invoke tests from the sensorbase-uh project (use HACKYSTAT_SENSORBASE_UH).
Once you can build and run your system, the next step is to edit the *.xml files in the hackystat project to include this project. Just look through them and make the appropriate changes; it should be straightforward.
Let me know if you run into any problems or have any other questions.