Hi Team,
We are working on a POC to build a data lake on Hadoop. As part of this, we are evaluating ETL tools (Talend, Sqoop), Pig scripts, Hive, Oozie, Spark, etc. on top of Hadoop.
The points I need clarity on are:
1. How should we plan effort/code releases of Hadoop-based applications from the development environment to the production environment?
2. How do we move Pig scripts, HiveQL, and Oozie configurations from the development environment to production? (See the sketch just after this list for the kind of thing I imagine.)
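To make question 2 concrete: is promotion essentially just a matter of copying the script and configuration files into the production cluster's HDFS? Below is a minimal sketch of what I imagine, using the Hadoop FileSystem API; the class name, paths, and file names are only placeholders, not our actual layout.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DeployArtifacts {
    public static void main(String[] args) throws Exception {
        // Assumes the target (production) cluster's core-site.xml/hdfs-site.xml
        // are on the classpath so FileSystem.get() resolves to that cluster.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical local build outputs and HDFS target locations.
        fs.copyFromLocalFile(new Path("build/scripts/wordcount.pig"),
                             new Path("/apps/datalake/pig/wordcount.pig"));
        fs.copyFromLocalFile(new Path("build/oozie/workflow.xml"),
                             new Path("/apps/datalake/oozie/wordcount/workflow.xml"));

        fs.close();
    }
}

Or is the usual practice to script this with hdfs dfs -put and the Oozie CLI rather than Java code? That is exactly the kind of guidance I am looking for.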
I have been looking at the hello-world examples for Pig, Hive, Oozie, etc. on the Hortonworks/Cloudera sites, where the Pig scripts/HiveQL are created and deployed from the UI.
Also, most importantly: are any Java components required apart from Hive/Pig/Oozie? If yes, how do we move these Java components into Hadoop?
Is there any possibility we can write custom utilities in Java and make them available in the Hadoop context? Consider that we would write these Java components locally in Eclipse and then want them to be available in the Hadoop context.
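As an example of the kind of custom Java utility I have in mind, here is a rough sketch of a simple Hive UDF; the package and class names are just placeholders.

package com.example.hive.udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// A trivial UDF that upper-cases a string column; Hive finds the
// evaluate() method by reflection on the old-style UDF API.
public final class ToUpper extends UDF {
    public Text evaluate(final Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().toUpperCase());
    }
}

My understanding is that we would build this in Eclipse into a jar, copy the jar to the cluster (or HDFS), and register it in Hive with ADD JAR and CREATE TEMPORARY FUNCTION ... AS 'com.example.hive.udf.ToUpper'. What I am unclear about is how such jars should be versioned and promoted from development to production alongside the Pig/Hive/Oozie artifacts.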
Any replies and guidance would be appreciated.
-Prakhyat M M