Best way to query a large amount of data from Firestore?

Alex Egli

Apr 1, 2021, 1:00:06 PM
to Firebase Google Group

I'm trying to build a CSV export feature using Firestore as my database and a Cloud Function to generate the CSV for the client to download, and I'm running into serious performance issues. I'm looking for advice on alternative Google Cloud products I could use instead of a Cloud Function, or on a different way to use Cloud Functions or Firestore that would make this work. Essentially, I need to query and download thousands of large JSON docs (~35 kB each), process them into CSV according to my business logic, and save the result to a CSV file the client can download. I have this working fine with a different database (Couchbase), but I'm trying to migrate my projects to Firebase, and this is a blocker. My current Couchbase implementation takes about 10 s to query, parse, and generate the CSV for 5,600 of these documents, with the resulting CSV being about 70 MB.

My current flow: the user sends a request to my Cloud Function with the date range and the number of users they want to export data for. The Cloud Function queries two collections for each user over the selected date range, converts all the JSON docs returned by those queries into CSV using my business logic, and writes the result to a CSV file in Firebase Storage. I then return the download link for that file, and the client opens the link in a new window to auto-download it.

The problem is that there can be thousands of documents to find and process in those collections, and each one is large (about 35 kB). The Firestore Node.js Admin SDK can only handle downloading a few hundred docs at once before my function crashes, so I tried retrieving the data in chunks of 500 docs using Firestore's cursor-based pagination, with a 20 s delay between queries, but it runs so slowly that it takes over 9 minutes for just 1,400 documents, and the function times out.
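
Roughly, the pagination loop looks like this (a simplified sketch rather than my exact code; "sessions", "userId", and "createdAt" stand in for my real collection and field names):

const admin = require('firebase-admin');
admin.initializeApp();

// Simplified version of my chunked query loop. My real code also sleeps
// 20 s between pages; this combination of filters needs a composite index.
async function fetchAllInChunks(userId, start, end) {
  const db = admin.firestore();
  const pageSize = 500;
  const docs = [];
  let last = null;

  for (;;) {
    let query = db.collection('sessions')
      .where('userId', '==', userId)
      .where('createdAt', '>=', start)
      .where('createdAt', '<=', end)
      .orderBy('createdAt')
      .limit(pageSize);
    if (last) query = query.startAfter(last);

    const snap = await query.get();
    snap.docs.forEach((d) => docs.push(d.data()));
    if (snap.size < pageSize) break;     // last page reached
    last = snap.docs[snap.size - 1];     // cursor for the next page
  }
  return docs;
}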

Kato Richardson

Apr 5, 2021, 5:11:47 PM
to Firebase Google Group
Hi Alex,

There's not going to be a cheap and efficient way to regularly export large chunks of your data to CSV. Depending on your implementation, you might get some mileage out of the managed export/import feature, or out of the BigQuery export (the latter seems promising).
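
If you go the BigQuery route, pulling the rows back out and flattening them to CSV is pretty simple. A rough sketch, assuming the docs are already mirrored into a BigQuery table (e.g. via the firestore-bigquery-export extension) and using made-up table and column names:

const { BigQuery } = require('@google-cloud/bigquery');

// Illustrative only: table/column names below are placeholders for
// whatever your mirrored Firestore data actually looks like.
async function exportRangeToCsv(start, end) {
  const bigquery = new BigQuery();
  const [rows] = await bigquery.query({
    query: `
      SELECT user_id, created_at, payload
      FROM \`my-project.firestore_export.sessions_latest\`
      WHERE created_at BETWEEN @start AND @end
      ORDER BY created_at`,
    params: { start, end },
  });

  // Real code would need proper CSV quoting/escaping.
  const header = 'user_id,created_at,payload';
  const lines = rows.map((r) =>
    [r.user_id, r.created_at, JSON.stringify(r.payload)].join(','));
  return [header, ...lines].join('\n');
}

You'd still write the result to Cloud Storage the way you do now; BigQuery just takes the heavy querying off of Firestore.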

Another approach is to reverse your functions so that they incrementally write to a CSV file somewhere (say, a daily/weekly output file kept in Cloud Storage); when the client requests an export, just point them at the pre-existing URL. This is a common strategy for scaling in NoSQL: do the work at write time instead of read time.
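
As an illustration, a scheduled function could pre-build the file once a day. A rough sketch (collection and field names are placeholders, and it assumes createdAt is a Firestore Timestamp):

const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

// Builds yesterday's CSV once and parks it in Cloud Storage.
exports.buildDailyCsv = functions.pubsub
  .schedule('every day 02:00')
  .onRun(async () => {
    const db = admin.firestore();
    const end = new Date();
    const start = new Date(end.getTime() - 24 * 60 * 60 * 1000);

    const snap = await db.collection('sessions')
      .where('createdAt', '>=', start)
      .where('createdAt', '<', end)
      .orderBy('createdAt')
      .get();

    const header = 'id,userId,createdAt';
    const lines = snap.docs.map((d) => {
      const data = d.data();
      return [d.id, data.userId, data.createdAt.toDate().toISOString()].join(',');
    });

    // The finished file sits in the default bucket; clients just download it.
    await admin.storage().bucket()
      .file(`exports/daily-${start.toISOString().slice(0, 10)}.csv`)
      .save([header, ...lines].join('\n'), { contentType: 'text/csv' });
  });

Then your HTTP endpoint (or the client directly) only has to hand out a download URL for a file that already exists, instead of generating 70 MB on demand.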

I hope that helps!

☼, Kato

--

Kato Richardson | Developer Programs Eng | kato...@google.com | 775-235-8398
