Should I keep all the info in both MongoDB and HDFS?
Hi Gonzalo,
This is a decision you would have to make as the domain expert of your system. You could store the 'channel' metadata and data in MongoDB, or in a combination of MongoDB and HDFS.
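As a rough illustration of the "combination" option, here is a minimal sketch of a metadata document you might keep in MongoDB while the raw channel data lives in HDFS. The function name, the field names and the HDFS path are all hypothetical; adapt them to your own schema.

```python
from datetime import datetime, timezone


def make_channel_doc(channel_id, name, hdfs_path, size_bytes):
    """Build a MongoDB metadata document that points at raw data stored in HDFS.

    All field names are hypothetical placeholders for illustration only.
    """
    return {
        "_id": channel_id,
        "name": name,
        # Only a reference is stored in MongoDB; the bytes themselves stay in HDFS.
        "storage": {"system": "hdfs", "path": hdfs_path},
        "size_bytes": size_bytes,
        "created_at": datetime.now(timezone.utc),
    }


doc = make_channel_doc(
    "ch-001", "camera-1", "hdfs:///data/channels/ch-001/video.mp4", 1_073_741_824
)
# With PyMongo you could then insert it, e.g.:
#   MongoClient(uri)["mydb"]["channels"].insert_one(doc)
```

With this split you can query and index the metadata in MongoDB and hand the HDFS path to whatever job processes the raw data.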
You may find the following resources useful:
Should I keep all the info in MongoDB and copy it to HDFS at the moment I want to process it? (I think that might take some time.)
This depends on the processing use case. For example, using the MongoDB Connector for Spark you could load collection data from MongoDB and process it in Apache Spark. You could even store the results of the computation back in MongoDB.
See also:
This would be suitable for document manipulation or aggregation.
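A minimal PySpark sketch of that load, process, write-back round trip, assuming the MongoDB Spark Connector 10.x (data source name "mongodb", options `connection.uri`, `database`, `collection`) is on the Spark classpath. Older 2.x/3.x connectors use the "mongo" format and different option keys. The database, collection and `channel` field names are hypothetical.

```python
def mongo_options(uri, database, collection):
    """Read/write options for the MongoDB Spark Connector 10.x ("mongodb" format)."""
    return {"connection.uri": uri, "database": database, "collection": collection}


def load_collection(spark, uri, database, collection):
    # Load a MongoDB collection into a Spark DataFrame (no copy to HDFS needed).
    return (
        spark.read.format("mongodb")
        .options(**mongo_options(uri, database, collection))
        .load()
    )


def save_collection(df, uri, database, collection):
    # Append the DataFrame's rows to a MongoDB collection.
    (
        df.write.format("mongodb")
        .mode("append")
        .options(**mongo_options(uri, database, collection))
        .save()
    )


def run_job(spark, uri):
    # Hypothetical job: count documents per "channel" and store the result back.
    events = load_collection(spark, uri, "mydb", "events")
    counts = events.groupBy("channel").count()
    save_collection(counts, uri, "mydb", "channel_counts")
    return counts
```

You would call `run_job(spark, "mongodb://localhost:27017")` from a `SparkSession` started with the `org.mongodb.spark:mongo-spark-connector_<scala version>` package matching your Spark build.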
From what I understood, the info doesn't go to HDFS if I use the connector. Is it the same (in performance and so on) if I run MapReduce on something in MongoDB (through the connector) as when it is done in HDFS? I used the connector to do video processing using the GridFSInputFormat.
Although you can store data in both MongoDB and HDFS, they are two different things: HDFS is a distributed file system, while MongoDB is a document-oriented database. It's not a straightforward comparison, as it depends on whether you require the other features/characteristics each of them provides.
If your video processing is mostly about processing large binary files, HDFS would probably perform better, as it works closer to the file management level. See also When to use GridFS.
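One way to see the difference is simple arithmetic on how each system splits a large file. This sketch assumes the default sizes (255 KiB GridFS chunks, 128 MiB HDFS blocks in Hadoop 2.x and later); both are configurable, and this is only an illustration, not a benchmark.

```python
GRIDFS_CHUNK_BYTES = 255 * 1024        # GridFS default chunk size (255 KiB)
HDFS_BLOCK_BYTES = 128 * 1024 * 1024   # HDFS default dfs.blocksize (128 MiB)


def pieces(file_bytes, piece_bytes):
    # Number of chunks/blocks needed for a file (integer ceiling division).
    return (file_bytes + piece_bytes - 1) // piece_bytes


video = 1024 ** 3  # a hypothetical 1 GiB video file
print(pieces(video, GRIDFS_CHUNK_BYTES))  # 4113 chunk documents in fs.chunks
print(pieces(video, HDFS_BLOCK_BYTES))    # 8 HDFS blocks
```

Reading the whole video back from GridFS means fetching thousands of chunk documents through the database, whereas HDFS streams a handful of large blocks directly from the file system.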
As always, you should also perform some tests under your specific environment and use cases.
It's for research (I'm learning).
I would recommend enrolling in a free online course at MongoDB University to learn more about MongoDB. A new session has just started today, so you can join straight away. In particular, the M101 courses cover data modelling/schema topics.
Regards,
Wan.