Do you guys have any efficient logic to convert a 5-level nested Mongo collection into multiple flat files? This seems to be my bottleneck.
Hi Prashanth,
It’s been a while since you posted the question. Have you found a solution yet?
Depending on your use case, you can use mongoexport to export the data as CSV files, then use the Amazon Redshift COPY command to load multiple split data files in parallel.
You can also pass the --query option to mongoexport to export only specific documents, e.g. filtering by time for nightly data.
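As a rough sketch of the splitting step, assuming the export is a single CSV file with a header line: the helper below distributes rows round-robin across a fixed number of part files. The function name split_csv, the part_NNNN.csv naming and the sample rows are made up for illustration and are not part of mongoexport or Redshift.

```python
import csv
import os
import tempfile


def split_csv(src_path, out_dir, parts):
    """Split one CSV into `parts` files, round-robin, repeating the header."""
    paths = [os.path.join(out_dir, f"part_{i:04d}.csv") for i in range(parts)]
    with open(src_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)  # assumes the export has a header line
        files = [open(p, "w", newline="") for p in paths]
        try:
            writers = [csv.writer(f) for f in files]
            for writer in writers:
                writer.writerow(header)
            for i, row in enumerate(reader):
                writers[i % parts].writerow(row)
        finally:
            for f in files:
                f.close()
    return paths


# Demo with hypothetical data written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "export.csv")
    with open(src, "w", newline="") as f:
        csv.writer(f).writerows(
            [["fieldA", "fieldB"], [1, "a"], [2, "b"], [3, "c"]]
        )
    part_paths = split_csv(src, tmp, 2)
    part_rows = []
    for path in part_paths:
        with open(path, newline="") as f:
            part_rows.append(list(csv.reader(f)))
```

Each part repeats the header line so it stays self-describing; the loader would need to skip or strip that line per file.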
For example:
mongoexport --db dbName --collection collName --type csv --fields="fieldA,fieldB,fieldC.nested1.nested2" --query '{"timestamp":{"$gt":1462781256010}}'
If you require customised export behaviour, you could also develop an exporter using any of the MongoDB-supported drivers.
If you have further questions, please provide an example document to be exported/flattened, and clarify the ‘bottleneck’ problem that you are having.
If the bottleneck is a conversion or loading process within Amazon Redshift, you may get faster responses and reach a wider audience by posting a question on Stack Overflow with the tag amazon-redshift.
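A minimal sketch of what the flattening part of such an exporter might look like, assuming nested sub-documents should become dotted column names (matching the fieldC.nested1.nested2 style above). The names flatten and write_csv and the sample documents are hypothetical; in a real exporter the documents would come from a driver cursor rather than a hard-coded list.

```python
import csv
import io


def flatten(doc, parent_key="", sep="."):
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in doc.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            # Lists and scalars are kept as-is; arrays are not expanded here.
            flat[full_key] = value
    return flat


def write_csv(docs, out):
    """Write flattened documents to `out`, using the union of keys as columns."""
    rows = [flatten(d) for d in docs]
    fieldnames = sorted({key for row in rows for key in row})
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)  # missing fields are written as empty strings


# Hypothetical sample documents standing in for a driver cursor.
sample_docs = [
    {"fieldA": 1, "fieldC": {"nested1": {"nested2": "x"}}},
    {"fieldA": 2, "fieldB": "b"},
]
buffer = io.StringIO()
write_csv(sample_docs, buffer)
```

This only handles nested sub-documents. Embedded arrays would usually need their own flat file with a key back to the parent document, which is not shown here.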
Best regards,
Wan.