Hi Matt,
So, if I wanted to know the sales by day over the past week, I would look for ALL “goods & services transaction” documents that are “revenue” (as opposed to expense) and a “transaction” (as opposed to a proposition or instruction), filter them on transaction_date (from today minus 7 days to today), and then sum transaction_net_value (the transaction total excluding taxes), grouped by party_uuid (the business identifier) and transaction_date. The values I want from this are party_uuid, transaction_date, the summed value and a category called “sales_by_day”.
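A minimal sketch of that query as an aggregation pipeline, written as plain Python data so it can be passed to PyMongo's Collection.aggregate(). The field names document_type, flow and stage (and the output field total_net_value) are hypothetical, because the thread gives the classification values but not the fields that hold them. It also assumes transaction_date is a BSON date that may carry a time of day, so it buckets by calendar day with $dateToString (UTC by default).

```python
from datetime import datetime, timedelta

# Hypothetical field names: the thread names the values ("goods & services
# transaction", "revenue", "transaction") but not the fields holding them.
DOCUMENT_TYPE_FIELD = "document_type"
FLOW_FIELD = "flow"
STAGE_FIELD = "stage"


def sales_by_day_pipeline(now: datetime, days: int = 7) -> list:
    """Build a pipeline that sums net sales per business per calendar day."""
    start = now - timedelta(days=days)
    return [
        # Keep only revenue-side goods & services transactions in the window.
        {"$match": {
            DOCUMENT_TYPE_FIELD: "goods & services transaction",
            FLOW_FIELD: "revenue",
            STAGE_FIELD: "transaction",
            "transaction_date": {"$gte": start, "$lte": now},
        }},
        # One row per (business, day); the date is truncated to YYYY-MM-DD so
        # sales at different times on the same day land in the same bucket.
        {"$group": {
            "_id": {
                "party_uuid": "$party_uuid",
                "transaction_date": {"$dateToString": {
                    "format": "%Y-%m-%d", "date": "$transaction_date"}},
            },
            "total_net_value": {"$sum": "$transaction_net_value"},
        }},
        # Flatten the group key and tag each row with its report category.
        {"$project": {
            "_id": 0,
            "party_uuid": "$_id.party_uuid",
            "transaction_date": "$_id.transaction_date",
            "total_net_value": 1,
            "category": {"$literal": "sales_by_day"},
        }},
    ]
```

Keeping the pipeline as a function of `now` makes the seven-day window easy to recompute on every scheduled run.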
I would then $out the results to a run-time environment that makes them available to each business. In other words, the data is aggregated and calculated before it is needed and stored as documents (in a different collection) that the customer can pick up when they need them, rather than running the query in real time when the customer asks for it. The query would run every x minutes and replace all of those documents with a refreshed set. I want to create these cache-type documents so that I can control how often queries run against the database.
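The scheduled “replace the cache” step could look roughly like the sketch below. It assumes PyMongo (any object with an aggregate(pipeline) method works) and uses a plain sleep loop as a stand-in for a real scheduler such as cron; the function names are hypothetical. According to the MongoDB manual, $out must be the last stage of the pipeline, and when the target collection already exists it is replaced with the new results once the aggregation completes.

```python
import time
from typing import Callable, Optional


def refresh_cache(source, pipeline: list, target: str) -> None:
    """Run `pipeline` on `source` and write the results to collection `target`.

    `source` is assumed to be a PyMongo Collection. The $out stage is appended
    last, as MongoDB requires, so the target is rebuilt as a whole set.
    """
    # Build a new list so the caller's pipeline is not mutated.
    source.aggregate(list(pipeline) + [{"$out": target}])


def run_every(interval_seconds: float, job: Callable[[], None],
              iterations: Optional[int] = None) -> None:
    """Call `job` every `interval_seconds`; loop forever if iterations is None."""
    done = 0
    while iterations is None or done < iterations:
        job()
        done += 1
        # Skip the final sleep when a fixed number of runs has completed.
        if iterations is None or done < iterations:
            time.sleep(interval_seconds)
```

For example, `run_every(300, lambda: refresh_cache(db["transactions"], pipeline, "sales_by_day"))` would rebuild a hypothetical sales_by_day collection every five minutes, with `db` and `pipeline` supplied by the caller.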
If I understand correctly, you have 65,000 businesses, all of which you intend to give weekly, monthly and yearly aggregated reports. Currently you are storing detailed documents recording each sale for each business. However, you don’t want to perform the aggregation on demand; you want to run it as a scheduled job instead.
If my understanding is correct, then you may be able to use the pre-aggregated report design pattern. The main idea is that whenever you insert into the main collection, you also update a document that keeps a running tally of the inserted data. This is similar to your $out method, although it doesn’t use the aggregation framework to achieve it.
For more information, please see the Pre-Aggregated Reports page.
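As a rough illustration of the pre-aggregated pattern, the sketch below writes each sale and then upserts a per-business, per-day tally with $inc. It assumes PyMongo collections; the tally fields day, total_net_value and transaction_count are hypothetical, and the caller is assumed to invoke it only for revenue transactions.

```python
from datetime import datetime


def day_key(ts: datetime) -> str:
    """Calendar-day key used to bucket tallies (hypothetical format)."""
    return ts.strftime("%Y-%m-%d")


def tally_update(txn: dict) -> tuple:
    """Build the (filter, update) pair that adds one sale to its daily tally."""
    filter_doc = {"party_uuid": txn["party_uuid"],
                  "day": day_key(txn["transaction_date"]),
                  "category": "sales_by_day"}
    # $inc on a missing field sets it to the increment, so with upsert=True
    # the first sale of the day creates the tally document.
    update_doc = {"$inc": {"total_net_value": txn["transaction_net_value"],
                           "transaction_count": 1}}
    return filter_doc, update_doc


def record_sale(transactions, daily_tallies, txn: dict) -> None:
    """Insert the raw transaction, then bump its pre-aggregated tally.

    Both collection arguments are assumed to be PyMongo collections.
    """
    transactions.insert_one(txn)
    filter_doc, update_doc = tally_update(txn)
    daily_tallies.update_one(filter_doc, update_doc, upsert=True)
```

The two writes are independent, so a failure between them would leave the tally short; if that matters, a periodic recount could reconcile it, or on MongoDB 4.0 and later both writes could be wrapped in a multi-document transaction.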
What I want to understand is: is this child’s play for Mongo, or am I pushing its performance considerably? Would you $out those documents to another collection and use Mongo for both the data warehouse AND run-time, or do you think the data produced by the aggregation should be pushed off Mongo and into something like Cassandra? My main purpose for MongoDB in this project is to ETL transactions from multiple sources into the standardised data schema (as shown above) and keep them all, then apply queries to that data set to produce values to be consumed by reports and dashboard widgets.
Any opinion and recommendation will be very much appreciated. I would also like some sort of statistics; I’ve looked everywhere and found nothing. For example: how many documents can a collection hold? How many documents can share a matching value in an index before it impacts performance? How will my queries perform? If I am using a single collection for all these transactions, should I consider a collection per year? What’s the best way to set up MongoDB to handle my use case (sharding, etc.)?
One of the reasons you don’t see many statistics on the internet is that, unlike relational databases, where the design focuses on how the data is stored, MongoDB design focuses on how the data is used. For example, given the same input data as yours, you require an aggregated report. Someone else might require fast reads, others might require fast writes, and still others may require a balanced read/write workload. All four use cases also depend on the available budget and hardware, and thus could lead to radically different document designs, even though the use cases look similar at a glance.
You might find the following links helpful for your consideration:
Best regards,
Kevin
Thanks Kevin, I understand your logic here. I’ll investigate the pre-aggregated reports you mentioned. Thanks again, Matt
--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.
For other MongoDB technical support options, see: https://docs.mongodb.com/manual/support/