John wrote:
> Surely if they are active pads they are held in memory already so no need to query database?
Hi John, that's a good point. I'm new to the codebase, so I don't know much about how it works... can you help?
What specifically is kept in memory? (e.g. pads? authors? changesets?)
How long is the info kept in memory? What would happen if there were 90,000 pads -- would they all be in memory?
What happens when a Pad has not been updated in 30 days -- is there still a session for it in memory?
How does the server track the version numbers for a pad and generate a new version number for a changeset?
What happens to the active pads, cached stuff, and version numbers after the server restarts?
I threw out an idea about storing all of a pad's changesets in one big row or one big document (I was thinking of MongoDB), and you asked if I had the perception that would be better than millions/billions of rows. Not a DB expert here -- I just asked because anytime I've seen a DB do a row scan over tens of millions of rows, things have gotten sloooow. ;-)
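For what it's worth, one way I could check whether a lookup really scans rows is to ask the database for its query plan. Here is a minimal sketch with SQLite standing in for MySQL (just because it ships with Python). The table name `store`, the key layout `pad:demo:revs:<n>`, and the helper `query_plan` are my own assumptions for illustration, not EtherpadLite's actual schema:

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return SQLite's query plan for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each row is (id, parent, notused, detail); detail holds the readable text.
    return "; ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
# Hypothetical key/value table; the PRIMARY KEY gives the key column an index.
conn.execute("CREATE TABLE store (key TEXT PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO store VALUES (?, ?)",
    [(f"pad:demo:revs:{n}", f"changeset {n}") for n in range(1000)],
)

# Exact lookup on the indexed key column.
by_key = query_plan(
    conn, "SELECT value FROM store WHERE key = ?", ("pad:demo:revs:42",)
)
# Lookup on the unindexed value column.
by_value = query_plan(
    conn, "SELECT key FROM store WHERE value = ?", ("changeset 42",)
)
print(by_key)
print(by_value)
```

If I understand the output correctly, the plan for the key lookup should mention a SEARCH using the index, while the lookup on the unindexed column should report a SCAN. MySQL has its own `EXPLAIN`, so the same kind of check should be possible there, though the output format differs.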
I've been looking at how many rows various DBs can handle, and most of the benchmark articles dive into how to partition data to improve performance. Partitioning is essentially making smaller tables out of a huge table so that the B-tree indexes are smaller, but with the composite keys, how would I partition the data?
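To make the question concrete, here is a rough Python sketch of what I imagine partitioning on a composite key could look like. I'm assuming keys shaped like `pad:<padId>:revs:<n>`, which is my guess at the layout and not something I've confirmed in the code. The function name `partition_for` and the partition count are made up for illustration:

```python
import zlib

NUM_PARTITIONS = 8  # hypothetical partition count


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Pick a partition from the pad ID portion of a composite key.

    Assumes keys look like "pad:<padId>:revs:<n>" (an assumed layout).
    Hashing only the pad ID keeps all of one pad's rows in the same partition.
    """
    parts = key.split(":")
    # Fall back to the whole key when it does not follow the assumed layout.
    pad_id = parts[1] if len(parts) >= 2 and parts[0] == "pad" else key
    # crc32 is stable across runs, unlike Python's built-in hash() for str.
    return zlib.crc32(pad_id.encode("utf-8")) % num_partitions


# Two revisions of the same pad land in the same partition.
print(partition_for("pad:foo:revs:1") == partition_for("pad:foo:revs:2"))
```

The catch is that this only works if the DB (or the application) can pull the pad ID out of the key string, which is exactly the part I don't see how to do with a single key column.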
Purpose-built document stores like MongoDB seem to have extra magic to make it easier/faster/possible to work with composite keys and values, but since MySQL does not have that, it feels like EtherpadLite may be forcing a square peg into a round hole. Said another way, if you know MySQL or some other SQL DB will be used, why *not* create more than one table? Or if pads are stored as "documents", then why not require something like MongoDB?
FYI - I'm not trying to be all smarty-pants and ask rhetorical questions. I am genuinely interested in better understanding the thoughts behind the storage design, so thanks for the feedback!
Cheers!
-Tim