Firestore Write-Limit

Andrew Decker

Jan 13, 2018, 2:59:52 PM
to Firebase Google Group
I'm excited to try out Firestore! One thing I'm not clear about is how to handle the rate limit for documents:

Maximum write rate to a collection in which documents contain sequential values in an indexed field

This is discussed on StackOverflow here, and Gil Gilbert clarifies the statement nicely.  But his response leaves me wondering how to handle time-related data.  In my experience, that kind of data has been important in scheduling, say, periodic push notifications.  If storing timestamps is a limiting factor, what are some alternative approaches?

For push notifications, I suppose I could split the collection storing device tokens down further so that there are fewer documents per collection. 

What about the case where I just want to store something like "last_updated" on a user document? Will this be a problem? 500 logins per second seems like a lot, so maybe it's not an issue. But then again, 100,000 concurrent users also seems like a lot, and Firestore is working to break that limitation.

Thanks for your input!
 

Gil Gilbert

Jan 16, 2018, 1:05:29 PM
to Firebase Google Group
On Saturday, January 13, 2018 at 11:59:52 AM UTC-8, Andrew Decker wrote:
I'm excited to try out Firestore! One thing I'm not clear about is how to handle the rate limit for documents:

Maximum write rate to a collection in which documents contain sequential values in an indexed field

This is discussed on StackOverflow here, and Gil Gilbert clarifies the statement nicely.  But his response leaves me wondering how to handle time-related data.  In my experience, that kind of data has been important in scheduling, say, periodic push notifications.  If storing timestamps is a limiting factor, what are some alternative approaches?

Hi again!

To understand how to work around the consecutive value limitation, it's useful to understand how we store data. Firestore (just like nearly every Google storage system back to Bigtable) allocates ranges of rows to specific servers. To the extent that your writes hit the same server they'll be limited. To gain higher throughput than the default you need to scatter your writes across different row ranges.

In general, this means adding some prefix to the consecutive value and creating a composite index on the combination. Then, instead of the limit applying to all documents in the collection, it applies to the domain created by the prefix.
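
A minimal sketch of the prefix idea, in plain JavaScript with no Firestore dependency. The `shard` field name, the shard count of 4, and both helper functions are illustrative assumptions, not part of any Firestore API. Each document gets a random shard prefix; a composite index on (shard, timestamp) would then spread sequential timestamps over several index ranges, and a reader merges the per-shard results to recover the global order.

```javascript
// Hypothetical shard count: more shards spread writes further,
// but reads then need one query per shard.
const NUM_SHARDS = 4;

// Attach a shard prefix to a document before writing it.
// `random` is injectable so the function can be tested deterministically.
function withShard(doc, random = Math.random) {
  const shard = Math.floor(random() * NUM_SHARDS);
  return { ...doc, shard };
}

// Merge per-shard result lists into one list ordered by timestamp.
function mergeByTimestamp(perShardResults) {
  return perShardResults
    .flat()
    .sort((a, b) => a.timestamp - b.timestamp);
}
```

Under this layout the limit would apply per shard rather than per collection, at the cost of one query per shard on reads. As the next paragraph explains, this only helps once the default single-field index on the timestamp can be disabled.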

An important consideration while Firestore is in beta is that there's no way to disable the default per-field indexes. This is a limitation we're actively working on removing. However, this means that if you need very high throughput *today* you need to scatter your writes through subcollections (effectively creating a composite index by creating a longer primary key).
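
As a sketch of that subcollection layout, the helper below writes each notification under its device's document, so the sequential `notification_time` values are indexed per device rather than across one large collection. The collection names `devices` and `notifications`, the field name, and the function itself are assumptions for illustration; `db` is expected to be an initialized Firestore instance supplied by the caller.

```javascript
// Write a scheduled notification under devices/{deviceToken}/notifications.
// The device token becomes part of the document path, which lengthens the
// primary key and scatters writes across row ranges.
function scheduleNotification(db, deviceToken, notificationTime, payload) {
  return db
    .collection('devices')
    .doc(deviceToken)
    .collection('notifications')
    .add({ notification_time: notificationTime, ...payload });
}
```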

For push notifications, I suppose I could split the collection storing device tokens down further so that there are fewer documents per collection.

Correct: if you prefix the notification time by device token then the limit applies per device token.

When we allow single-field index disablement you'll be able to do this without restructuring your data. You could disable the index on notification-time and create a composite index on (device-token, notification-time) in a single collection of notifications.
 
What about the case where I just want to store something like "last_updated" on a user document? Will this be a problem? 500 logins per second seems like a lot, so maybe it's not an issue. But then again, 100,000 concurrent users also seems like a lot, and Firestore is working to break that limitation.

So in today's Firestore beta, you're stuck. If you want a last_updated field on a user document in a single users collection then yes, you'd be rate limited to 500 user updates per second.

Once index disablement is possible, so long as you don't need to query users by last_updated (or ordered by last_updated), you could disable the index on last_updated and the limit is gone.

On a separate note, this notion of last update time comes up enough that it's worth coming up with an API for it. We already have this value, and so long as you don't need to query by the last update time the value we have is already equivalent to what you'd get by keeping a server timestamp for yourself.
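
A hedged sketch of reading that server-maintained value, assuming the Node.js server SDK, whose `DocumentSnapshot` carries an `updateTime` timestamp. The helper name is hypothetical, and other client SDKs may not expose this field.

```javascript
// Read the server-maintained update time from a document snapshot.
// Assumes `snapshot` came from the Node.js server SDK, where an existing
// document's snapshot has an `updateTime` Timestamp with a toDate() method.
function lastUpdatedOf(snapshot) {
  if (!snapshot.exists) return null; // missing documents have no update time
  return snapshot.updateTime.toDate();
}
```

Because this value is not an indexed field, it cannot be used in a query or an orderBy, which matches the caveat above.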

Cheers,
-Gil

Andrew Decker

Jan 16, 2018, 8:36:18 PM
to Firebase Google Group
Brilliant, thanks Gil!  I currently have 0 users on my app, so I don't anticipate any scalability issues in the next few months. :)

Daniel Dimitrov

Jan 30, 2018, 10:06:51 AM
to Firebase Google Group
Thanks for the explanation, Gil. I think I understand what you said, but would you mind commenting on this again? Let's imagine a comments collection.
Each comment has created and modified timestamps and an article id.

As far as I understand, today, since we can't disable the indexes on created and modified, this would create congestion, and one would be limited to 500 updates per second on the whole comments collection? Once disabling an index is possible, we could create a composite index on article id + created, and this way the 500 updates per second won't apply to the whole collection, but just to the article id + modified prefix/domain?

But then you say "Once index disablement is possible, so long as you don't need to query users by last_updated (or ordered by last_updated), you could disable the index on last_updated and the limit is gone."

So I don't understand - why would one have last_updated, created, modified and not want to order by it? With most data you want to show the latest :)

Marek Gilbert

Jan 30, 2018, 11:10:07 AM
to fireba...@googlegroups.com
On Mon, Jan 29, 2018 at 10:53 PM, Daniel Dimitrov <dan...@compojoom.com> wrote:
Thanks for the explanation, Gil. I think I understand what you said, but would you mind commenting on this again? Let's imagine a comments collection.
Each comment has created and modified timestamps and an article id.

As far as I understand, today, since we can't disable the indexes on created and modified, this would create congestion, and one would be limited to 500 updates per second on the whole comments collection? Once disabling an index is possible, we could create a composite index on article id + created, and this way the 500 updates per second won't apply to the whole collection, but just to the article id + modified prefix/domain?

That's correct.
 
But then you say "Once index disablement is possible, so long as you don't need to query users by last_updated (or ordered by last_updated), you could disable the index on last_updated and the limit is gone."

So I don't understand - why would one have last_updated, created, modified and not want to order by it? With most data you want to show the latest :)

What I meant was that you lose the ability to query on/order by these fields without supplying the prefix. Let's consider your example: a comments collection with last_updated and article_id. By default you'll get the following indexes:
  • article_id
  • last_updated
The plan (again depending upon a feature that hasn't shipped yet) is to disable the last_updated index and create a composite index, so you'll end up with indexes like these:
  • article_id
  • article_id, last_updated
This gets you up to 500 writes/sec per article. 
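
To make the "per article" scope concrete, here is a small plain-JavaScript model (not Firestore code) that counts writes per second under each index prefix. The event shape and the function names are assumptions for illustration; the figure of 500 is the limit discussed in this thread.

```javascript
const LIMIT_PER_SECOND = 500; // limit discussed in this thread

// Given write events { article_id, second }, return the highest number of
// writes that land in any one second under a single index prefix.
// prefixFn = () => 'all' models the single-field last_updated index;
// prefixFn = (e) => e.article_id models the (article_id, last_updated) index.
function peakWritesPerPrefix(events, prefixFn) {
  const counts = new Map();
  for (const e of events) {
    const key = `${prefixFn(e)}@${e.second}`;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return Math.max(0, ...counts.values());
}

// True when a peak rate would run into the limit.
function exceedsLimit(peak) {
  return peak > LIMIT_PER_SECOND;
}
```

For example, 600 writes in one second split evenly over two articles give a peak of 600 under the single-field index but only 300 under the composite one.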

The trade-off you're making is that while you can issue this query:

db.collection('comments')
    .where('article_id', '==', 'composite-indexing')
    .orderBy('last_updated');

... what you're giving up is that you can't order by last_updated without an equality constraint on article_id. That means this query becomes impossible:

db.collection('comments')
    .orderBy('last_updated');

In this domain querying across all comments regardless of article doesn't seem that useful so perhaps this trade-off is worth it.

Again note that this arrangement happens naturally if you nest comments as subcollections within articles. This gets you the higher rate limit today at the expense of only being able to query for comments within an article.
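
A sketch of that nested layout, using the same query style as the examples above. The `articles` collection name and the helper function are assumptions; `db` is an initialized Firestore instance supplied by the caller.

```javascript
// Build a query for one article's comments, ordered by last_updated.
// Comments live at articles/{articleId}/comments, so the article id is part
// of the document path instead of a field that needs a composite index.
function commentsForArticle(db, articleId) {
  return db
    .collection('articles')
    .doc(articleId)
    .collection('comments')
    .orderBy('last_updated');
}
```

Calling `.get()` on the returned query would fetch the comments for that one article; there is no equivalent single query across all articles in this layout.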

Cheers,
-Gil
 


Daniel Dimitrov

Jan 31, 2018, 12:05:20 PM
to Firebase Google Group
Well, that sounds nice! In my case people are commenting within companies, so each comment has the company_id. We don't expect a lot of users in a single company, so with the composite index we should never run into the issue, as a few people can't create 500 entries per second...


db.collection('comments')
    .orderBy('last_updated');

In this domain querying across all comments regardless of article doesn't seem that useful so perhaps this trade-off is worth it.

Well, if you have a management interface and you want to show the latest comments, then you definitely need last_updated...

Thanks for the explanation!