Nakamura, post v1.0.0

Zach Thomas

Aug 30, 2011, 5:04:43 PM

to Sakai Nakamura

Hello, nakamurians.

I'd like to start a bit of a brainstorming session. We've got plenty
of room for improvement in the server, and I want to hear everyone's
ideas. My stream-of-consciousness notes are up in typewith.me:
http://typewith.me/ox4dNjGQ5z

Here are my personal top four:
1. migration, migration, migration, migration
2. integration tests that match the real world
3. clustering (horizontal scaling, failover)
4. smart load tests that will start letting us measure the
(performance) effectiveness of any architectural decision

Migration is a unique challenge for us because our content system is
schema-less, but there is an implicit schema embedded throughout the
code. Assumptions about the structure of the data can live both
in UX code and in server code, and we desperately need to isolate and
encapsulate those assumptions.

I'm envisioning as part of our integration tests that every bundle
that reads or writes data (which is to say virtually every bundle) has
tests where we create sample data from the previous OAE release, and
then fail the test if the bundle does not successfully transform the
data to match the new assumptions of the test. It's important that we
figure out how to do this soon, so that we don't wind up with a
monolithic (and nondeterministic) migration problem right before a
release. This is a good place to "move the pain forward," as the
agilists like to say.
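
A minimal sketch of what one such per-bundle test could look like, written in Python for brevity rather than in the server's own language. Everything named here is hypothetical: the sample record, the `migrate_content` function, and the `schemaVersion` field are assumptions made for illustration, not anything that exists in the code base.

```python
import unittest

# Hypothetical sample of a record as the previous release would have stored it.
OLD_RELEASE_SAMPLE = {"title": "Syllabus", "tags": "physics,intro"}


def migrate_content(record):
    """Hypothetical bundle migration.

    Copies the record, splits a comma-separated tag string into a list,
    and stamps the (assumed) new schema version.
    """
    migrated = dict(record)
    tags = migrated.get("tags", "")
    if isinstance(tags, str):
        migrated["tags"] = [t.strip() for t in tags.split(",") if t.strip()]
    migrated["schemaVersion"] = 2
    return migrated


class MigrationTest(unittest.TestCase):
    def test_previous_release_data_is_migrated(self):
        migrated = migrate_content(OLD_RELEASE_SAMPLE)
        # The test encodes the new assumptions and fails if they are not met.
        self.assertEqual(migrated["tags"], ["physics", "intro"])
        self.assertEqual(migrated["schemaVersion"], 2)
        # The old sample itself must not be modified in place.
        self.assertEqual(OLD_RELEASE_SAMPLE["tags"], "physics,intro")

    def test_migration_is_idempotent(self):
        # Running the migration on already-migrated data changes nothing.
        once = migrate_content(OLD_RELEASE_SAMPLE)
        self.assertEqual(migrate_content(once), once)
```

Saved as a module, this could be run with `python -m unittest`. The idempotence check is there because a migration that can safely run twice is much easier to reason about than one that must run exactly once.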

Please reply with your ideas about this and your other favorite
problems.

cheers,
Zach

David Roma

Aug 30, 2011, 8:11:55 PM

to sakai-...@googlegroups.com

Thanks for bringing this up, Zach. I agree totally. (I assume you want on-list discussion, not etherpad updates.) We had to abandon our data migration in the end: because of the complications, it just wasn’t worth the effort for the minimal amount of pilot user data. I think unit tests would be a great step forward, though I'm not sure how this can be supported on the UI side as well if the UI code is also making assumptions about data structures.

Re: system administration: definitely become-user and/or super-user. We will be provisioning non-visible course sites, so when a user has an issue it would be really helpful to log in as them (without knowing their password) and see what they are seeing. Perhaps also an interface to all the useful curl commands, e.g. create user, change password, create group, etc.

Audit queries would also be very important. I remember at a conference this was presented as one of OAE's best features: a timeline showing basically every 'event' that occurred in the system. I am not sure if this was only when we were using JCR. Are 'events' currently being recorded in the system, waiting to be tapped into, or will we essentially lose this information for our early pilots running on v1?

Monitoring system health? Is there anything useful we could show, like memory usage, number of sessions, or requests per minute? Any early warning signs, like average response time, max response times, index data out of sync, etc.?

Happy brainstorming, guys,

Dave.

P.S. Clustering is essential for us at CSU.

Zach Thomas

Aug 30, 2011, 10:19:41 PM

to Sakai Nakamura

Thanks, Dave. Clustering supposedly works now, but someone has to try
to set it up. :-) Do you have any trials of that underway at CSU? The
tricky part is probably the master/slave setup for Solr nodes. But
maybe it's not tricky; we just need to try it. Another question for
you: is clustering more important to you for failover, or for scaling?

The clustering thing ties into the system monitoring thing somewhat:
where we eventually want to go with this is to make it easy to
provision additional nodes, such as virtual machines, with a single
command from a console. The current state of the art is application
stacks that can autoscale, like various features of Amazon AWS, or a
platform-as-a-service like Heroku or CloudFoundry. Incidentally, click
around at http://www.heroku.com/how for some good food for thought
about web architecture. Maybe it's just marketing hot air, but the pictures
sure are pretty!

As far as auditing goes, the best information we're currently persisting
is activities: we save a little activity record to sparsemapcontent every
time content or groups are updated. This will give you some crude
reporting capability, but in the long run what we want is to keep an
audit trail of select system events. There are plenty of events being
generated, but they're not currently being saved anywhere, unless you
count the logs. The best way to keep the events would be something
like Cassandra, MongoDB, or Redis. Some work on this has been done,
but I don't remember the status of it. Carl would be able to say more
on this subject.
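
To make the idea of keeping select events concrete, here is a rough sketch in Python, not taken from the Nakamura code. The topic names and the `AuditTrail` class are made up for illustration, and an in-memory text stream stands in for a real store such as Cassandra, MongoDB, or Redis.

```python
import io
import json
import time

# Hypothetical topic names; the real event topics are not listed in this thread.
AUDITED_TOPICS = {"user/login", "content/updated", "group/updated"}


class AuditTrail:
    """Append selected events as JSON lines to any writable text stream.

    The stream is a stand-in for a real event store.
    """

    def __init__(self, sink, topics=AUDITED_TOPICS, clock=time.time):
        self.sink = sink
        self.topics = set(topics)
        self.clock = clock

    def handle_event(self, topic, properties):
        """Record the event if its topic is audited; return True if recorded."""
        if topic not in self.topics:
            return False
        entry = {"time": self.clock(), "topic": topic, "properties": properties}
        self.sink.write(json.dumps(entry, sort_keys=True) + "\n")
        return True


def read_trail(text):
    """Parse the JSON-lines trail back into a list of event dicts."""
    return [json.loads(line) for line in text.splitlines() if line]


# Usage: one audited event is kept, one unaudited event is dropped.
sink = io.StringIO()
trail = AuditTrail(sink)
trail.handle_event("content/updated", {"path": "/p/abc", "user": "zach"})
trail.handle_event("session/heartbeat", {})
entries = read_trail(sink.getvalue())
```

The filter on topics is the point: only the events someone has decided are worth auditing get written, so the trail stays much smaller than the logs.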

I think some of the groundwork for become-user is in place already,
and in other systems it hasn't been too hard to solve, so that goal
is within reach.

cheers,
Zach
