I'm a newbie in Big Data, and I'm starting with Hadoop.
I have installed Hortonworks HDP 3.1.
I have to design a Big Data layer that ingests large IoT and social media datasets, processes the data with MapReduce jobs, and produces aggregations to store in HBase tables.
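To make the aggregation step concrete, here is a minimal pure-Python sketch of the map/shuffle/reduce pattern I want to run at scale; the sensor records and field names are invented placeholders, not my real schema:

```python
from collections import defaultdict

# Toy IoT records: (sensor_id, temperature reading).
# These values are made up for illustration only.
readings = [
    ("sensor-a", 20.0),
    ("sensor-b", 31.0),
    ("sensor-a", 22.0),
    ("sensor-b", 29.0),
]

# Map phase: emit (key, value) pairs -- here simply the identity mapping.
mapped = [(sensor, temp) for sensor, temp in readings]

# Shuffle phase: group all values belonging to the same key.
grouped = defaultdict(list)
for sensor, temp in mapped:
    grouped[sensor].append(temp)

# Reduce phase: aggregate each group (average temperature per sensor).
averages = {sensor: sum(temps) / len(temps) for sensor, temps in grouped.items()}

print(averages)  # {'sensor-a': 21.0, 'sensor-b': 30.0}
```

In the real pipeline, each aggregated row would be written to an HBase table keyed by sensor id instead of printed.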
For now, my focus is on the data processing part. I'm investigating the Hadoop ecosystem to find a suitable tool for batch data processing.
I have found several candidate tools: Apache Beam, Cascalog, Scalding, and Spark. What do you think of them?
Cascalog's learning curve is not simple. I need your help to understand whether Cascalog is suitable for this use case and whether it is still actively maintained and supported.
I would appreciate some help.