Firebase Firestore Composite Index Limit of 200

888 views
Skip to first unread message

jeffreysoboe

unread,
Apr 27, 2018, 10:07:44 AM4/27/18
to Firebase Google Group
Hi group,

I have loved working with Firestore so far, but now, deep into my project, I am worried about Composite Index Limit of 200, and whether or not I will be hitting that.

Can someone clarify what this means?

For example, let's say that I have an app with thousands of users.   Each user wants the ability to sort through his posts.  

He can order them by topic, and then they would appear in order of date_created.  Or by emoji, and they would appear by date_created.

Would this then actually create thousands of composite indexes, because they happen in sub collections of the users?

Take a look at the example below:

Users:
      {user_id}
           username:  {George}
           posts:
                date_created:    {date}
                length:  {length}
                emoji:   {emoji}
                topic:   {topic}

So I would do create the query like this, in Android:

Query query = fsDB.collection("Users").document("userID").collection("posts").orderBy("topic").orderBy("date_created");

Now, ordering on 2 fields requires a composite index, so in the Firebase Console I would need to create an index on "posts", with the fields "topic" and "date_created".

Would that created index count as just ONE composite index?  Or would it be multiplied by the number of Users I have?  

If it is multiplied by the number of users I have, how could I restructure this to avoid hitting the composite index limit of 200?


Thanks so much!!

Jeff


            

Frank van Puffelen

unread,
Apr 27, 2018, 10:09:23 AM4/27/18
to Firebase Google Group

Samuel Stern

unread,
Apr 27, 2018, 11:38:12 AM4/27/18
to fireba...@googlegroups.com
Hi Jeff,

From what I can see in your example, indexes are not multiplied by the number of users. Indexes actually on depend on the collection name so all of the "posts" subcollections of your "users" documents can share the same indexes.

You'd only approach the 200 index limit if you had 200+ different field combinations you wanted to index.  Hope that makes sense!

- Sam

--
You received this message because you are subscribed to the Google Groups "Firebase Google Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to firebase-tal...@googlegroups.com.
To post to this group, send email to fireba...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/firebase-talk/1eb89e52-18b1-4408-a12e-4ccba3c6c924%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jeff Padgett

unread,
Apr 28, 2018, 12:20:01 AM4/28/18
to fireba...@googlegroups.com

Hi Sam,


Ok, thank you for a quick reply!


Jeff




From: 'Samuel Stern' via Firebase Google Group <fireba...@googlegroups.com>
Sent: Friday, April 27, 2018 10:37 AM
To: fireba...@googlegroups.com
Subject: Re: [Firebase] Re: Firebase Firestore Composite Index Limit of 200
 

jeffreysoboe

unread,
Apr 29, 2018, 11:54:08 AM4/29/18
to Firebase Google Group
Hi Sam,

Just to clarify, can you check this again?

In this following structure:

Users:
      {user_id}
           username:  {George}
           posts:
                {post_id}
                   date_created:    {date}
                   length:  {length}
                   emoji:   {emoji}
                   topic:   {topic}
               {post_id}
                   date_created:    {date}
                   length:  {length}
                   emoji:   {emoji}
                   topic:   {topic}
      {user_id}
           username:  {Sam}
           posts:
                {post_id}
                   date_created:    {date}
                   length:  {length}
                   emoji:   {emoji}
                   topic:   {topic}
               {post_id}
                   date_created:    {date}
                   length:  {length}
                   emoji:   {emoji}
                   topic:   {topic}

I can create the index on "posts", based any of the underlying fields (like date_created, length, etc) , and the indexes will not multiply based on the number of users?

Thanks!

Jeff

Samuel Stern

unread,
Apr 29, 2018, 1:56:28 PM4/29/18
to fireba...@googlegroups.com
Yes that's correct. You'll still be limited to querying within a single user's posts in each query, but all of the posts subcollections can share indexes. 

Sam

Jeff Padgett

unread,
May 1, 2018, 11:17:47 PM5/1/18
to fireba...@googlegroups.com

Thanks Sam.





Is there a way to test whether or not the data I'm retrieving is actually coming from the firestore index?  I am not receiving the warnings about needing an index, but it is taking a long time to get the data.


It is taking a long time to retrieve data, and it takes longer the larger the dataset is... which is not supposed to happen with firestore. 

 

Even if I use "limit" and keep the number of records the same.  



For example,  I create a query like:


Query query = fsDB.collection("Users").document(user_id).collection("posts").orderBy("date_created).orderBy("topic").limit(100)



Here are the results of some tests I did on my Android device:


If the "posts" collection has:


20 docs:  300-500 ms.

200 docs:  2000-3000 ms. , with one outlier of 6000 ms.

1200 docs:  3000-6000 ms. 

5000 docs:  85000-95000 ms. on the first time I make the query, and had an outlier of 151000 ms.  Immediate subsequent returns take only 30-60 ms. !  (I assume the device is accessing the cache?)  But this does not happen for the earlier levels of tests (1200) for some reason. 


Again, this is all using the "limit" feature.  I also tried removing the "orderBy" elements, and the results are very similar.  


Without the limit feature,  it takes:


20 docs:   280 - 400 ms. 

200 docs:  2500-3300ms, with averages closer to 3000ms.

1200 docs:  14,300 - 21,000ms,  with an outlier of 23,000ms.

5000 docs: 



The documents are really not very big, I don't think...


Using 'com.google.firebase:firebase-firestore:11.6.0'


Thanks,  

Jeff




From: 'Samuel Stern' via Firebase Google Group <fireba...@googlegroups.com>
Sent: Sunday, April 29, 2018 12:56 PM

To: fireba...@googlegroups.com
Subject: Re: [Firebase] Re: Firebase Firestore Composite Index Limit of 200

Samuel Stern

unread,
May 2, 2018, 12:06:35 PM5/2/18
to fireba...@googlegroups.com
Hi Jeff,

Before we continue I want to clarify that all firestore queries use an index.  If your query does not hit an index, it will fail.  So if you're seeing results coming from the backend, your indexes are set up properly.  Indexes on single fields are automatically created which is why your queries are succeeding without needing to explicitly create any indexes.

Now we can dig into your issue.  When you say "without the limit feature" do you mean you're just doing a get() on the entire collection and that's how many documents exists?  Or are you reducing the number of results some other way?

- Sam

Jeff Padgett

unread,
May 3, 2018, 5:11:07 PM5/3/18
to fireba...@googlegroups.com


Hi Sam, Ok, thanks for clarifying that all firestore queries use an index.  

Technically, I am actually using Firestore Recycler Adapter from the FirebaseUI, which uses .addSnapshotListener.  


By the "limit feature" I meant using .limit().


Situation 1:

*User is connected to internet
*Offline Persistence is enabled
*5,000 documents in the user's "posts" collection

query = fsDB.collection("Users").document(user_id).collection("posts").orderBy("date_created).orderBy("topic")

It takes between 60 - 280 seconds for onDataChange to be executed (in other words, for the data to be received.) 

Question 1)  Isn't that excessively long to retrieve all 5000 of the documents?  The documents are not very big.  15 key-value pairs per doc, just holding short strings.

Situation 2:

*User is connected to internet
*Offline Persistence is enabled:
*5,000 documents in the user's "posts" collection

query = fsDB.collection("Users").document(user_id).collection("posts").orderBy("date_created).orderBy("topic").limit(100);

It takes an average of 90 seconds to retrieve the 100 documents from the 5,000 doc collection the first time it is tried.  Every subsequent request is about 50 ms. 

I assume that this is because the first return is writing those documents in the cache.  The subsequent 50ms return is because the data is in the cache or memory.

Question 2)  Why does the first request take so long, when I am limiting the retrieval to the first 100 documents by .limit(100)?  

Question 3)  This behavior - long first return, super fast subsequent returns - did not happen with tests where the "posts" collection had 20, 200, or 1,200 documents.  In those tests, the return value stayed relatively consistent.  In other words, for 1,200 docs, the return times would be 3-5 seconds, which seems long, considering I was using .limit(100).  Why is the behavior different when I tested for 5,000 docs?


Situation 3:

*User is connected to internet
*Offline Persistence is disabled
*5,000 documents in the user's "posts" collection

query = fsDB.collection("Users").document(user_id).collection("posts").orderBy("date_created).orderBy("topic").limit(100);

Here, the fetch of 100 docs takes 2-3 seconds!  So, in this situation we are using the index.  I also just read that mike lehen, on Jan. 13, said that firestore does not support indexes on the client... so, what I think is happening is that with .addsnapshotlistener(), my device is checking the client device first, then sorting through the client side documents (since there is no index) before returning the data.  

Here is his post: 

mikelehen commented on Jan 13

@izakfilmalter It would be helpful if you could be more specific about what exactly is taking longer and provide actual timing information rather than "a year." :-)

Your comment about startup taking forever on spotty networks makes me wonder if you are doing get() calls. We implement get() to get up-to-date data from the backend when online, else fall back to cache. So that could explain the issue and could likely be worked around by using onSnapshot() instead. We're also reworking some timeouts in the future which could help.

As for startup taking longer with offline enabled, there's some known performance work in the pipeline. In particular, as you build up a large number of offline documents, even small queries may become slower as we do not yet implement indexing in the clients, so it has to scan all documents in the collection you're querying against. So that may explain what you're seeing.


Question 4)  Is Firebase going to build the indexes on the clients as well?

Question 5)  How can I keep offline persistence enabled, and also get a quick return of data?  Is there an option for .addSnapshotListener(checkOnlineFirst) option?  Or is taking a long time because the Snapshot Listener is writing all of the data to memory?  It seems like this could be solved if .addSnapshot Listener got the data from the index online, THEN wrote the data to the cache/memory on the device.  

Thanks,
Jeff



 





From: 'Samuel Stern' via Firebase Google Group <fireba...@googlegroups.com>
Sent: Wednesday, May 2, 2018 11:06 AM

Samuel Stern

unread,
May 7, 2018, 5:15:21 PM5/7/18
to fireba...@googlegroups.com
Hi Jeff, 

Apologies for the slow response here.  Responses inline:

On Thu, May 3, 2018 at 2:11 PM Jeff Padgett <jeff_p...@hotmail.com> wrote:


Hi Sam, Ok, thanks for clarifying that all firestore queries use an index.  

Technically, I am actually using Firestore Recycler Adapter from the FirebaseUI, which uses .addSnapshotListener.  


By the "limit feature" I meant using .limit().


Situation 1:

*User is connected to internet
*Offline Persistence is enabled
*5,000 documents in the user's "posts" collection

query = fsDB.collection("Users").document(user_id).collection("posts").orderBy("date_created).orderBy("topic")

It takes between 60 - 280 seconds for onDataChange to be executed (in other words, for the data to be received.) 

Question 1)  Isn't that excessively long to retrieve all 5000 of the documents?  The documents are not very big.  15 key-value pairs per doc, just holding short strings.

60-280s is 12-56ms per document.  The low end of that (12ms) is bad but not unheard of.  The higher end is definitely too slow, I will find out what we can do to investigate.
 

Situation 2:

*User is connected to internet
*Offline Persistence is enabled:
*5,000 documents in the user's "posts" collection

query = fsDB.collection("Users").document(user_id).collection("posts").orderBy("date_created).orderBy("topic").limit(100);

It takes an average of 90 seconds to retrieve the 100 documents from the 5,000 doc collection the first time it is tried.  Every subsequent request is about 50 ms. 

90s to get 100 documents is certainly a bug.  Again, I need to find out what the best next step is so we can debug without using too much of your time.

The subsequent requests are just a database read so tens of milliseconds sounds right.
 

I assume that this is because the first return is writing those documents in the cache.  The subsequent 50ms return is because the data is in the cache or memory.

Question 2)  Why does the first request take so long, when I am limiting the retrieval to the first 100 documents by .limit(100)?  

Question 3)  This behavior - long first return, super fast subsequent returns - did not happen with tests where the "posts" collection had 20, 200, or 1,200 documents.  In those tests, the return value stayed relatively consistent.  In other words, for 1,200 docs, the return times would be 3-5 seconds, which seems long, considering I was using .limit(100).  Why is the behavior different when I tested for 5,000 docs?


Were there any other differences besides the number of documents in the collection?  Maybe different sized documents or more/less random document IDs?
 

Situation 3:

*User is connected to internet
*Offline Persistence is disabled
*5,000 documents in the user's "posts" collection

query = fsDB.collection("Users").document(user_id).collection("posts").orderBy("date_created).orderBy("topic").limit(100);

Here, the fetch of 100 docs takes 2-3 seconds!  So, in this situation we are using the index.  I also just read that mike lehen, on Jan. 13, said that firestore does not support indexes on the client... so, what I think is happening is that with .addsnapshotlistener(), my device is checking the client device first, then sorting through the client side documents (since there is no index) before returning the data.  

So just to clarify, the only difference between this and the 90s experiment is that in this case persistence was disabled?  What sort of device are you testing on?
 

Here is his post: 

mikelehen commented on Jan 13

@izakfilmalter It would be helpful if you could be more specific about what exactly is taking longer and provide actual timing information rather than "a year." :-)

Your comment about startup taking forever on spotty networks makes me wonder if you are doing get() calls. We implement get() to get up-to-date data from the backend when online, else fall back to cache. So that could explain the issue and could likely be worked around by using onSnapshot() instead. We're also reworking some timeouts in the future which could help.

As for startup taking longer with offline enabled, there's some known performance work in the pipeline. In particular, as you build up a large number of offline documents, even small queries may become slower as we do not yet implement indexing in the clients, so it has to scan all documents in the collection you're querying against. So that may explain what you're seeing.


Question 4)  Is Firebase going to build the indexes on the clients as well?


Yes this is definitely on the roadmap, can't share a timeline.
 

Samuel Stern

unread,
May 11, 2018, 1:18:07 PM5/11/18
to fireba...@googlegroups.com
Hi Jeff,

I asked around and we have some ideas as to which part of the system your delays are coming from.  Could you email me personally and provide your project ID and some logs from the device with Firestore debug logging enabled?

On Android you can use this method to enable logging, and there are similar methods on iOS and Web:

- Sam

Jeff Padgett

unread,
May 12, 2018, 12:39:24 AM5/12/18
to fireba...@googlegroups.com

Hi Sam,


Thanks, will do ASAP.  Sorry have been and will be a little busy with something else next week, but will be as engaged as I can with this, really appreciate your help.  


Jeff




From: 'Samuel Stern' via Firebase Google Group <fireba...@googlegroups.com>
Sent: Friday, May 11, 2018 12:17 PM

Jeff Padgett

unread,
May 22, 2018, 9:55:18 PM5/22/18
to fireba...@googlegroups.com


Hi Sam, 

Yes, the only difference was whether or not offline Persistence was enabled.  With help of Frank P.,  I discovered that when I cleared the Application's Cache, the speed of the requests was greatly improved when using  .limit(100)

However, in reality the user will not be clearing the cache of the app on the device... Frank mentioned they may build some garbage collection into Firebase in the future.

In the meantime,

Can you send me your personal email so I can correspond with you as you requested?  I can send some of the new tests to you with project ID.


Thanks,
Jeff

 


From: Jeff Padgett <jeff_p...@hotmail.com>
Sent: Friday, May 11, 2018 10:22 PM
Reply all
Reply to author
Forward
0 new messages