I have a webapp where visitors fill out a short survey, the results of which are stored in Firebase. A sample set of responses may look like this:
{
"results": {
"randomid1": {
"age": 29,
"height": 57,
"weight": 42,
"submit_date": 1442339543203
},
"randomid2": {
"age": 34,
"height": 63,
"weight": 53,
"submit_date": 1442246673772
}
}
}
At the moment I have probably 2.1+ million records just like this. I store the aggregate counters in a separate branch:
{
"aggregates": {
"age": {
"19": 0,
"29": 1,
"39": 1
},
"height": {
"40": 0,
"50": 0,
"60": 1,
"70": 1
}
}
}
This is great for reporting - when a customer completes the survey, the actual results are written to
results and the appropriate aggregate counters are updated. All fine and well, and for the overview report, where we're showing the aggregate totals, it's perfect. But the customer also wants to search and summarize the raw results in a one-off way, which makes things difficult, because individual searches can take over three minutes to return data, which doesn't seem right. Firebase should be faster at that, right? The client wants to search ranges of
submit_date and
age at the same time, for example. So what I find myself doing is this:
var matches = {};
ref.child('results').orderByChild('submit_date').startAt(startDate).endAt(endDate).once('value', function(s) {
s.forEach(function(result) {
if (result.val().age >= startAge && result.val().age <= endAge)
matches[result.key()] = result.val();
});
});
What's taking forever (according to the web socket results panel in Chrome developer tools) is the actual retrieval of data. My current index is set up like this:
{
"rules": {
"results": {
".indexOn": [
"submit_date",
"age"
]
}
}
}
The way I understand the docs is, that should allow me to sort or search the entire
results collection by
submit_date pretty quickly. However, like I said, with over 2.1 million results, searching by those dates is taking several minutes at a time. And it looks like it's the actual search; the dev tools show data streaming in pretty quick once the search response is received. Is there any way to modify the indexes or restructure the data somehow so that searches don't take so long to return results?