How to structure data for a social networking app for scale?


Bill Feng

Feb 16, 2021, 5:13:02 AM
to fireba...@googlegroups.com
Hi all,

I started working with Firestore recently, and after going through some of the official guides, it looks like in NoSQL databases it's desirable to optimize data organization for easy reads. In this video, it's suggested that a small chunk of user information can be duplicated and stored with the review written by the user, so that when retrieving the data, we only need to retrieve the review document and still have enough information to render the author's name and profile picture. The argument is that far more reads are going to happen on the review, whereas the user will change their name or profile picture much less frequently.
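A minimal sketch of the duplication pattern described above. Plain Python dicts stand in for Firestore documents, and every field and function name (`author_snippet`, `photo_url`, and so on) is a hypothetical choice for illustration, not something taken from the video or the Firestore API.

```python
# In-memory stand-ins for Firestore collections; all names are hypothetical.
users = {
    "u1": {
        "name": "Ada",
        "photo_url": "https://example.com/ada.png",
        "email": "ada@example.com",
    },
}


def author_snippet(user_id: str) -> dict:
    """Copy only the fields needed to render a review's byline."""
    user = users[user_id]
    return {"id": user_id, "name": user["name"], "photo_url": user["photo_url"]}


def create_review(user_id: str, text: str, rating: int) -> dict:
    """Build a review document that embeds a small copy of its author."""
    return {"text": text, "rating": rating, "author": author_snippet(user_id)}


review = create_review("u1", "Great tacos", 5)

# Rendering needs only the review document, with no second lookup in `users`.
byline = f'{review["author"]["name"]} rated this {review["rating"]}/5'
```

The embedded copy keeps the user's id, so the copies can still be found and refreshed later if the profile changes.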

This all makes sense, but my question is: to what extent can this scale? For instance, on a social networking app like Instagram, a user might go around making many comments on various posts. The number of comments could easily grow into the thousands or tens of thousands, so when the user does eventually update their name or profile picture, will you then have to go and update every comment the user has ever made? This seems inefficient, impractical, and error-prone to someone used to relational databases. Is this just the way it is for NoSQL, and it doesn't actually cause much of a problem in practice? Or, in situations where the number gets large, should we still store the user info separately and combine it with the comment info when retrieving?

Another, more extreme example: a user on Instagram can see a list of the people they follow. Following the same logic, I imagine I'd want to store who the user follows in the user document. When one of the users I follow changes their profile, they'll then update their entry in my list of people I follow. However, a celebrity could have millions of followers, so when they update their profile, they'll need to update it in the "following" list of each of their millions of followers as well. Does this actually work in practice, or does it break down as things scale? And if this doesn't work, then what is the proper way to structure data in Firestore for a feature like this?


Thanks in advance for your help!

Sam Stern

Feb 16, 2021, 6:52:23 AM
to Firebase Google Group
Hi Bill,

Thanks for the detailed and thoughtful question. We get some variation of this question a lot.  I always call this question "The Justin Bieber Problem" since Instagram famously struggled to scale their service to keep up with Justin Bieber's posts.

The answer is that you can't build an Instagram or Twitter-scale social network on any single database. These applications present so many unique challenges that if you really want to be the Next Big Thing you will almost certainly have to creatively use multiple databases to solve the task. Cloud Firestore is a great general-purpose NoSQL database for mobile apps with a number of unique strengths, but as you've said there are clearly parts of this problem where it is not well-suited.

For some evidence of this, let's look at some posts that Twitter and Instagram made about their infrastructure:
Those two posts are from 2015-2017, so they're almost certainly out of date, but I think most of us would be happy to be able to scale our app to the size of Instagram in 2015 or Twitter in 2017. Both had many millions of users.

Here's how Twitter visualized their database stack:
[Image: Screen Shot 2021-02-16 at 11.43.28 AM.png]

And here's how Instagram visualized theirs:

[Image: Screen Shot 2021-02-16 at 11.44.23 AM.png]

We can see that there are some common themes:
  • Both apps use a SQL database for some, but not all, of their data.
  • Both apps use a low-latency key-value store (Redis, Manhattan).
  • Both have some non-relational database (Cassandra, Graph, etc.).
I use these examples not to show you how you must structure your app, but to show that even if you have all the money, talent, and computers in the world, you won't have a one-size-fits-all data storage solution. However, Firestore can almost definitely be a productive part of your app infrastructure today and also when you reach a larger scale.

I would encourage you to consider these problems "good problems" to have. Today, pick an architecture that works for your current scale and gives you some breathing room to grow maybe 2x or 5x. If Beyoncé or Elon Musk signs up for your social network, you're going to have to make new decisions! Just as Instagram and Twitter have re-architected themselves over the years, you will too if you match their success.

- Sam



Bill Feng

Feb 17, 2021, 12:10:44 PM
to Firebase Google Group
Hi Sam,

Wow, thanks for the detailed response!
I figured I most definitely wasn't the first person to think of this question, but unfortunately my Googling skills failed me and I couldn't find a satisfying answer.
So, thank you VERY much for explaining the "Justin Bieber problem", despite likely having had to repeat yourself.

I hadn't considered different types of databases, or what Firestore's limitations are and what it is best suited for.
Somehow I thought there must be a way to mitigate this problem and I just wasn't modeling the data properly.

I agree that scaling problems are good problems to have, and thinking about them prematurely can be counterproductive.
I guess what I'm really looking for, since I'm new to Firebase, is just the "typical" or "commonly accepted" way to model data for an app like Instagram, so I can avoid trying to reinvent the wheel.

A better version of the question I asked might be:
Given that I want to start with only Firestore to keep things simple, how should I model my data for a social networking app?

Specifically, is it acceptable to build many features whose write cost grows linearly?
(e.g., having the user's name and avatar stored in every follower's "following" list, as well as with every post/comment made, and having to update them all when the user updates their profile)

If so, roughly at what point (e.g., when a user gets more than X followers, or has accumulated X comments) would this start to face scaling issues, so that a more sophisticated setup with more than just Firestore would be required?

Alternatively, would it be acceptable (or better) to only store the user id, and fetch the user details separately using the id, much like in a traditional RDBMS?

If you could offer some insights or even just point me to some good reference material, that would be amazing!
Thanks again for your reply, it is much appreciated.

Kind Regards,
Bill

Sam Stern

Feb 17, 2021, 12:25:37 PM
to Firebase Google Group
Hey Bill,

Glad you found that useful! I normally tell people to optimize for the more common operation.  So in a social media app reading a user's profile picture / username is extremely common while a user changing their display name is very uncommon!  So in most cases you're better off making the read operation simple and dealing with the pain of the occasional fan-out write. So in the example of comments, I'd duplicate the user's name and avatar into each one.  You don't want to do N user lookups every time a user loads a comment thread, and it's also probably OK if a user sees a slightly out of date avatar for another user.
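To make the "occasional fan-out write" concrete, here is a rough sketch of what a name change might look like when the author's name is duplicated into every comment. Dicts stand in for Firestore documents, all names are hypothetical, and `BATCH_SIZE` is an assumed per-batch cap used only to show the chunking; check the current Firestore limits before relying on any particular number.

```python
from itertools import islice

BATCH_SIZE = 500  # assumed cap on writes per batch, for illustration only

# 1,200 comments split evenly between two authors, each with an embedded name.
comments = {
    f"c{i}": {
        "text": f"comment {i}",
        "author": {"id": "u1" if i % 2 == 0 else "u2", "name": "Old Name"},
    }
    for i in range(1200)
}


def chunks(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def fan_out_name_change(user_id: str, new_name: str) -> int:
    """Rewrite the embedded name on every comment by `user_id`.

    Returns the number of batches used.
    """
    targets = [cid for cid, c in comments.items() if c["author"]["id"] == user_id]
    batches = 0
    for batch in chunks(targets, BATCH_SIZE):
        # With a real database, each chunk would be committed as one batched write.
        for cid in batch:
            comments[cid]["author"]["name"] = new_name
        batches += 1
    return batches


batches_used = fan_out_name_change("u1", "New Name")
```

In practice this kind of job would probably run server-side (for example in a background function) rather than on the user's device, and readers may briefly see the old name while it runs, which matches the "slightly out of date avatar" trade-off above.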

However maybe for something like a "following list" the calculation is different. Do users really view this page all that often? Does it need to be lightning fast?  Do you need to load every follower at once or could you paginate? Maybe for that page you can just store a list of user IDs (which never change) and load the names/photos of the user on demand.
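A sketch of that alternative for the following list: store only user IDs, paginate, and look up names and photos for just the page being shown. Again, dicts stand in for documents, and the function and field names are hypothetical.

```python
# In-memory stand-ins for Firestore collections; all names are hypothetical.
users = {
    f"u{i}": {"name": f"User {i}", "photo_url": f"https://example.com/{i}.png"}
    for i in range(10)
}

# The following list holds only IDs, which never change.
following = {"me": ["u3", "u1", "u7", "u4", "u9"]}


def following_page(user_id: str, page_size: int, start: int = 0):
    """Return one page of followed profiles plus the next start index (or None)."""
    all_ids = following[user_id]
    page_ids = all_ids[start:start + page_size]
    # Profiles are loaded on demand, only for the IDs on this page.
    profiles = [{"id": uid, **users[uid]} for uid in page_ids]
    next_start = start + page_size if start + page_size < len(all_ids) else None
    return profiles, next_start


first_page, cursor = following_page("me", page_size=2)

# A profile change touches exactly one document; no follower lists are rewritten.
users["u1"]["name"] = "Renamed"
first_page_again, _ = following_page("me", page_size=2)
```

The cost moves to the read side: each page costs one extra lookup per entry, which is acceptable if the page is viewed rarely and loaded a screen at a time.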

There is not going to be a specific "X" for which getting "X+1" users will cause a scaling issue. It depends on your app's usage profile and your own personal tolerance for dealing with cost (performance and monetary).
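One way to reason about your own usage profile is to count document operations for each design. The helper below is a back-of-the-envelope sketch; the function, its parameters, and the sample numbers are all hypothetical, and it ignores pricing differences between reads and writes as well as caching.

```python
def document_ops(thread_views, comments_per_thread, authors_per_thread,
                 profile_changes, comments_per_user, denormalized):
    """Rough count of document reads and writes over some period."""
    # Every thread view reads each comment document in the thread.
    reads = thread_views * comments_per_thread
    # Every profile change writes the profile document itself.
    writes = profile_changes
    if denormalized:
        # Fan-out: each profile change also rewrites the user's comments.
        writes += profile_changes * comments_per_user
    else:
        # Normalized: each thread view also reads each distinct author's profile.
        reads += thread_views * authors_per_thread
    return {"reads": reads, "writes": writes}


# Made-up numbers for illustration only.
usage = dict(thread_views=100_000, comments_per_thread=20, authors_per_thread=15,
             profile_changes=50, comments_per_user=2_000)

normalized = document_ops(**usage, denormalized=False)
duplicated = document_ops(**usage, denormalized=True)
```

With these made-up numbers, duplication saves 1.5 million reads at the cost of about 100,000 extra writes; a different usage profile can tip the balance the other way, which is why there is no single threshold.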

I wish I had reference material for your use case, but unfortunately I don't. If there are any big social networks built on Firestore, I am not personally aware of them (which isn't saying much!), so I can't say how they architect their data. And our sample applications don't really address scalability, since they're meant as more of a "Hello World" experience.

- Sam

Kato Richardson

Feb 17, 2021, 12:25:46 PM
to Firebase Google Group
Here's another great talk on the topic, covering Twitter infrastructure and how they handle realtime updates and feeds through fanout.


☼, Kato



--

Kato Richardson | Developer Programs Eng | kato...@google.com | 775-235-8398

Bill Feng

Feb 18, 2021, 5:11:05 AM
to Firebase Google Group
Sam,

Thanks for the explanation, that makes sense!
Sounds like for features with few reads, keeping data normalized could be good to avoid having to perform many writes.
I'll keep that in mind when building.

Kato,

Thanks for the link to the talk about Twitter timeline fanout writes, it was extremely helpful!
I suppose that, while not an in-memory database like Redis, Firestore can be used in a very similar way to build a social network feed using fanout writes.
If a setup with Redis can handle 20 million followers for Twitter, I imagine a couple of thousand users would be no problem for Firestore, which is more than enough to get started.
I'll give it a shot at building something based on this.
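For reference, a minimal sketch of a feed built with fanout writes, as discussed above: each new post's id is pushed into every follower's feed at write time, so reading a feed is a single cheap lookup. In-memory structures stand in for Firestore documents, and `FEED_LIMIT` and all other names are hypothetical.

```python
from collections import defaultdict, deque

FEED_LIMIT = 3  # hypothetical cap on how many entries each feed keeps

# author -> list of follower ids
followers = {"alice": ["bob", "carol"], "bob": ["carol"]}
posts = {}
# One precomputed feed per user, newest first, holding post ids only.
feeds = defaultdict(lambda: deque(maxlen=FEED_LIMIT))


def publish(author: str, post_id: str, text: str) -> None:
    """Store the post once, then fan its id out to every follower's feed."""
    posts[post_id] = {"author": author, "text": text}
    for follower in followers.get(author, []):
        # When a feed is full, the oldest entry falls off the other end.
        feeds[follower].appendleft(post_id)


def read_feed(user: str) -> list:
    """Reading touches only the user's own feed plus the posts it lists."""
    return [posts[pid] for pid in feeds[user]]


publish("alice", "p1", "hello")
publish("bob", "p2", "hi")
publish("alice", "p3", "again")
publish("alice", "p4", "more")
```

The write cost grows with the author's follower count, which is the same trade-off as duplicating profile data: cheap reads in exchange for more expensive writes.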

Thank you both for the help!! :)

Regards,
Bill

Kato Richardson

Feb 18, 2021, 1:39:08 PM
to Firebase Google Group
Indeed, and what's really useful here is understanding where you would end up at massive scale, so you can design accordingly. Firestore can of course cover you until you get to Twitter-level usage, at which point you hire a team of engineers and have them throw a Redis equivalent in front of the hotspots, and open your pockets wide because you've already figured out monetization :D

☼, Kato
