Q1 Server Post Mortem


Ian Boston

Nov 17, 2010, 6:10:29 AM
to Nakamura List
Hi,

So the Q1 release is out of the way (in 4h) and we are moving on to working on Q2.
You know that we should be moving to a more agile approach soon, although the details of that, how it will work and, critically, what we, the server developers, will work on are unknown. I almost used "server team", but that won't really exist in the new structure since the teams will be cross-functional. Until that happens I want to take a moment to look at where we are and what is important to us.

This is all with hindsight, in the cold light of day, and brutally honest.

The Q1 release was (IMVHO) a disaster. We were targeting support for 4K users on a single JVM and we struggled to support 30. Most of the server was OK, but sparse and complex searches were diabolical, as was anything that tried to write while other things were reading. Read performance was/is fantastic, but our target audience has write permission, so the server workload is not 100% read.

We should have discovered this months before release, but we did not, due to the single-threaded nature of our integration tests and the total absence of load testing except in the last few weeks before the scheduled release. This situation was made 100 times worse by nearly all of the UI-driven data feeds appearing in the last 8 weeks of work, forcing us either to make a mad scramble to implement them or to damage the breadth of features in the release.
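
To illustrate the gap described above (this sketch is not from the original thread): a single-threaded integration test never exercises writes contending with reads. A minimal mixed read/write load test might look like the Python below. FakeStore, run_load and all parameters are hypothetical stand-ins, not Nakamura code; a real test would drive the server over HTTP rather than an in-memory object.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class FakeStore:
    """Hypothetical stand-in for the server: one lock guards reads and writes."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def read(self, key):
        with self._lock:
            return self._data.get(key)

    def write(self, key, value):
        with self._lock:
            self._data[key] = value


def run_load(store, users, ops_per_user, write_every):
    """Run `users` concurrent workers; every `write_every`-th op is a write.

    Returns the per-operation latencies in seconds.
    """
    latencies = []
    latencies_lock = threading.Lock()

    def worker(user_id):
        local = []
        for i in range(ops_per_user):
            start = time.perf_counter()
            if i % write_every == 0:
                store.write(f"user{user_id}", i)
            else:
                store.read(f"user{user_id}")
            local.append(time.perf_counter() - start)
        with latencies_lock:
            latencies.extend(local)

    with ThreadPoolExecutor(max_workers=users) as pool:
        # list() forces completion and re-raises any worker exception.
        list(pool.map(worker, range(users)))
    return latencies


if __name__ == "__main__":
    lat = run_load(FakeStore(), users=30, ops_per_user=200, write_every=5)
    print(f"ops={len(lat)} max latency={max(lat) * 1000:.2f} ms")
```

Raising `users` and lowering `write_every` shifts the mix toward concurrent writers, which is the case the single-threaded tests could not reach.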

In development, feature scope, available resources, delivery timescale and quality are finely balanced. You cannot increase or decrease one without affecting the others. We allowed feature scope to increase without changing resources or timescale. This left us with no place to go but to slash quality... which we did with great efficiency. That has had an effect that will live with us for many iterations: either we do something about it, or we will have to increase resources/timescale or reduce feature scope to accommodate it. Slashing quality has a cumulative effect on everything, forever, until it's raised.

Where do we go now?
We have to increase quality again.
We have to solve the performance problems posthaste and make the server support a much larger number of concurrent writing users.
We have to fix the development process (in progress, re agile, cross-functional) so we don't get forced to slash quality again.


I am writing this email because I want to get your opinions. If you don't express your opinion, others will, and they will tell you how you are going to work, so please do.

Ian

Alan Marks

Nov 17, 2010, 12:41:53 PM
to sakai-kernel
Ian - thanks for writing this honest email. I think the keys to improving from where we are now are: 1) a collaborative effort from the server team on identifying key areas of technical improvement, 2) better cross-functional involvement in feature development, and 3) greater emphasis on testing the fundamental pillars.

On 1, we should build a list of areas to target for investigation and development. What needs to be fixed? I've created a Confluence page to brainstorm, please add your thoughts.

On 2, those on this list who are on the project team should have seen my email on dev process changes that I think will help. Please respond to that thread if you have comments, as I do want feedback (as Ian says so nicely, if you don't express your opinion....). And, to be clear, cross-functional teams don't mean Design and UIDev only; it's key that the Server team is heavily involved.

To those on list but not on the project, I'll be providing details via my blog soon, but want the project team to have some input first.

On 3, I am always optimistic that we'll get more project resources for test-leadership in pillars like perf, security, etc. But in the absence of that happening, like, yesterday, I wonder if individuals on the development team would step up to be drivers in particular areas. Not to do all the work but to be the individuals to push us to excellence in those pillar areas such as performance and security. If you are interested, please take a step forward....


--
You received this message because you are subscribed to the Google Groups "Sakai Nakamura" group.
To post to this group, send email to sakai-...@googlegroups.com.
To unsubscribe from this group, send email to sakai-kernel...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/sakai-kernel?hl=en.




--

Alan Marks | Sakai 3 Project Director
tele: 425-785-3284 | skype: skramnala

Ray Davis

Nov 19, 2010, 1:41:26 PM
to sakai-...@googlegroups.com
From my point of view, the biggest problems the Nakamura subproject had
delivering Q1 had to do with unrealistic assumptions about what could be
"thrown over the wall." I've never once encountered any hints of "not my
problem" or "take what you're given" attitudes among *individuals* on
the project -- we're gifted with an incredibly good bunch of
collaborators in Sakai 3! But the development process encouraged the UX
team to try to figure out wireframes which might cover everything that
was needed, and the client-side team to try to figure out a
specification which might do everything that was needed, and the
server-side team to try to figure out services which might do everything
that was needed. And none of us got a chance to actually check our
assumptions against Real Life until too late in the schedule.

I hope that the new development structure will be able to reduce the
length of these cycles. One reason for cross-functional teams is to make
everyone equal and immediate owners of a working solution, so that if a
server-side developer finds that they just can't make the original idea
work, or if a UX designer sees essential capabilities being lost or a
veer into unusability, everyone is on board to re-think things.

As Ian says, the scary thing is that we also have a big mess to clean
up, and I don't know of a magic formula to figure out the balance there.
For example, I would rather put time into re-thinking the conceptual
basis of Pooled Content than into trying to make the existing Pooled
Content specification perform better. But I know there's disagreement on
that score.

Alan, in terms of stepping up for ownership, I certainly expect to
continue to focus on integration issues, particularly around groups and
roles -- at this point I've got too much experience in that area to be
able to walk away from it. But whether that matters depends on what
stories get into the sprint.

Best,
Ray


Alan Marks

Nov 19, 2010, 9:37:23 PM
to sakai-kernel
Good thoughts, Ray, thanks. I think there definitely needs to be a cross-cutting discussion on pooled content. I do not think we are done discussing that issue and the time is about right to dig into some more.

I am not sure quite what you mean by "integration issues". That's pretty broad.

Ian Boston

Nov 20, 2010, 5:07:23 AM
to sakai-...@googlegroups.com

On 19 Nov 2010, at 18:41, Ray Davis wrote:

> Alan, in terms of stepping up for ownership, I certainly expect to continue to focus on integration issues, particularly around groups and roles -- at this point I've got too much experience in that area to be able to walk away from it.


Ray,
I would prefer it if you would mentor in this area rather than take ownership so that you share the knowledge, distribute the decision making process and minimise the risk.

That goes as a general comment for everyone, myself included. If any one individual holds all the knowledge then we won't survive if that individual leaves the project for any reason. At some point funding *will* run out. The reason we are doing community source/open source is to ensure that what we produce remains sustainable when that funding runs out.

Ian


Ray Davis

Nov 22, 2010, 12:19:31 PM
to sakai-...@googlegroups.com
On 11/19/10 6:37 PM, Alan Marks wrote:
> Good thoughts, Ray, thanks. I think there definitely needs to be a
> cross-cutting discussion on pooled content. I do not think we are done
> discussing that issue and the time is about right to dig into some more.
>
> I am not sure quite what you mean by "integration issues". That's pretty
> broad.

True enough. Narrowing just a bit, integrations of personal data, new
application access controls, and organizational roles are areas in which
I'm at least likely to "review a lot more vigorously" (to quote Tony
Stevenson).

Best,
Ray
