Hello All,
First, thanks for open-sourcing Dr. Elephant; it is a wonderful tool.
However, I have been struggling to fit Dr. Elephant into our big data ecosystem. Our setup spins up ephemeral EMR clusters, runs the processing, stores the processed data in S3, and then terminates the cluster. Each pipeline has its own cluster, which is terminated once processing finishes.
The only way I have been able to use it so far is to manually install Dr. Elephant on an EMR cluster, run the job on that cluster, keep the cluster alive, and then check the Dr. Elephant dashboard. This is cumbersome and not feasible in a production environment. All the logs from the cluster go into Kibana, where they are accessible.
Can Dr. Elephant read the logs from Kibana (or any location other than the history server) and provide insights into the jobs?
Has anyone set up Dr. Elephant in this kind of environment? I am looking for thoughts and ideas from the community.
Thanks in advance.