Hello dear people,
There's this thing that has been bothering me for a while. I need to work on an application that we expect to scale, and I have trouble reconciling loudly stated best practices with baseline requirements.
Almost all "web2" media relies on chronological order. When I browse Facebook, or Google+, or YouTube, it's not posts and videos from 10 years ago that I want to (or do) see. Even though Facebook and Google never seem to present a "fully chronologically ordered" list, the worst that can happen is that I see a post from two hours ago after one from two days ago. Never after one from 2008.
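However, it would seem that distributed NoSQL databases *hate* timestamps. And not only timestamps, but chronological order *in general*, because monotonically increasing keys send every new write to the same end of the index and create a hot spot. (see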
https://ikaisays.com/2011/01/25/app-engine-datastore-tip-monotonically-increasing-values-are-bad/)
So... I've looked into the alternatives, and the *only* solution I found is to prefix timestamps with random "bucket ids" (their number potentially scaled to the "write heat" of each entity) and run a separate query for each bucket. But that makes managing pagination beyond ridiculous, and I worry that it would make the common queries expensive, like someone just randomly navigating to the front page or hitting reload. My gut feeling is that making the most frequent query type more expensive is a bad idea.
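To make the bucketing idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: an in-memory dict stands in for the datastore index, and `NUM_BUCKETS`, `write`, `query_bucket` and `fetch_page` are hypothetical names, not any real database's API. It shows that one global cursor of the form `(timestamp, post_id)` can serve all buckets, so pagination stays manageable, while each page still costs `NUM_BUCKETS` queries.

```python
import heapq
import itertools
import random
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical; could be scaled with the "write heat" of an entity

# Stand-in for a datastore index: bucket id -> list of (timestamp, post_id).
# In a real store the key would be something like "<bucket>|<timestamp>".
index = defaultdict(list)

def write(timestamp, post_id):
    """Spread writes over random buckets so keys are not monotonically increasing."""
    bucket = random.randrange(NUM_BUCKETS)
    index[bucket].append((timestamp, post_id))

def query_bucket(bucket, before, limit):
    """One per-bucket query: the newest `limit` rows strictly older than `before`."""
    rows = sorted((row for row in index[bucket] if row < before), reverse=True)
    return rows[:limit]

def fetch_page(limit, cursor=None):
    """Merge the per-bucket results into one newest-first page.

    `cursor` is the last (timestamp, post_id) of the previous page; the same
    cursor is passed to every bucket, so no per-bucket state is needed.
    """
    before = cursor if cursor is not None else (float("inf"), "")
    per_bucket = [query_bucket(b, before, limit) for b in range(NUM_BUCKETS)]
    merged = heapq.merge(*per_bucket, reverse=True)  # inputs are already newest-first
    page = list(itertools.islice(merged, limit))
    next_cursor = page[-1] if len(page) == limit else None
    return page, next_cursor

if __name__ == "__main__":
    for t in range(1, 11):
        write(t, f"post-{t}")
    page, cursor = fetch_page(3)
    print(page)
    print(fetch_page(3, cursor)[0])
```

The cost concern does not go away: every page load fans out to all buckets and fetches up to `limit` rows from each, most of which are thrown away after the merge. This is only a sketch of the workaround described above, not a claim about how any large site actually does it.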
The problem is still some way in the future, but I'm really interested in how the big players do it. I mean, just think of all the writes Facebook must handle...