On Sep 19, 2012, at 12:14 PM, Sheeri Cabral <sca...@mozilla.com> wrote:
> Why sample and not real data? The db as it is now isn't that large; we could do the whole thing. My guess is there are security issues around that?
Yeah, real data has people's email addresses and hashed passwords. We cannot move that data out from under lock and key.
Also, real data is too small. We want to be able to create and prepare for a much larger data set than what we're currently supporting.
> Also, a static sample is not like real production...for one, it wouldn't have the fragmentation we had yesterday.
No, but we were able to reproduce, in a different environment, the same behavior we saw in production (a data set of size n, fragmented about 400%) just by using a data set of size 4n.
So in yesterday's case, we were able to account for fragmentation in production simply by increasing the data set size. This is not perfect, but I cannot think of a better approach. Can you?
> On another note, there are tools that can digest the slow query logs, and we can set the slow query logs to log every query (say, for 24 hours and then analyze). We can build up a meta database of types of queries that are actually run (and frequencies too, I believe), so that we can routinely test regular actual load, as opposed to what we think load is.
I love this approach; however, it might be difficult to execute in production for privacy and user data safety reasons.
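For reference, the kind of digest described in the quoted text can be sketched roughly as follows. This is only an illustration, not the output of any particular log tool: the function names `fingerprint` and `query_mix` are made up here, and the regexes are a simplification that would miss many SQL edge cases. The idea is to collapse each logged statement to its "shape" by replacing literal values with `?`, then count how often each shape occurs.

```python
import re
from collections import Counter


def fingerprint(sql: str) -> str:
    """Collapse a SQL statement to its shape by replacing literals with '?'.

    Hypothetical helper for illustration; real digest tools handle far
    more cases (comments, hex literals, value lists, and so on).
    """
    s = sql.strip().lower()
    s = re.sub(r"'(?:[^'\\]|\\.)*'", "?", s)  # single-quoted strings
    s = re.sub(r'"(?:[^"\\]|\\.)*"', "?", s)  # double-quoted strings
    s = re.sub(r"\b\d+\b", "?", s)            # standalone numbers
    s = re.sub(r"\s+", " ", s)                # collapse whitespace
    return s


def query_mix(queries):
    """Return {query shape: fraction of all queries seen}."""
    counts = Counter(fingerprint(q) for q in queries)
    total = sum(counts.values())
    return {shape: n / total for shape, n in counts.items()}
```

Because the literal values are dropped from each shape, the resulting table holds query types and frequencies rather than user data, though the raw log it is built from would still contain that data.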
The spirit of this approach, though (using empirical data to prioritize optimization), can be preserved by using the current mix of API requests in production combined with some math.
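As a rough illustration of "API mix plus some math" (a sketch under assumptions, not the crew's actual method): suppose we know what share of production traffic each API call represents, and how many queries of each type one call issues. Multiplying the two gives an expected relative load per query type, which can be ranked to decide what to optimize first. The function name `query_load` and the shape of its two inputs are assumptions made for this example.

```python
def query_load(api_mix, queries_per_call):
    """Rank query types by expected relative volume.

    api_mix:          {api call name: share of production requests}
    queries_per_call: {api call name: {query type: queries issued per call}}
    Both inputs are hypothetical structures chosen for illustration.
    """
    load = {}
    for call, share in api_mix.items():
        for query, count in queries_per_call.get(call, {}).items():
            # Weight each query type by how often its API call is made.
            load[query] = load.get(query, 0.0) + share * count
    # Heaviest query types first.
    return sorted(load.items(), key=lambda kv: kv[1], reverse=True)
```

Fed with a measured request mix, the top of the returned list is where tuning effort would likely pay off first; the per-call query counts would have to come from reading the code or tracing in a non-production environment.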
> -------------
> I could audit/advise, and even lead, but I'd want to know more about the "crew" structure. That's probably a 5-10 minute call…
Ping me on IRC if you'd like to talk. Otherwise, the crew structure is simple. It's a group of people working together for some amount of time to attain a specific set of goals.
Crew goals are simply to:
1. Get a stable yet modern version of MySQL into production
2. Tune the database deliberately to make maximal use of available resources
3. Stretch: develop understanding and guidelines for how to change the database structure (approach, cost, etc.)
lloyd