Enterprise Apache Hadoop: The Ecosystem of Projects

Mar 2, 2016, 6:54:57 AM
to UW Big Data Class
Apache Hadoop® is an open-source framework for distributed storage and processing of large data sets on commodity hardware. Hadoop enables businesses to quickly gain insight from massive amounts of structured and unstructured data.

  • Gain in-depth understanding of Big Data and Hadoop concepts
  • Excel in the concepts of Hadoop big data architecture and Hadoop Distributed File System (HDFS)
  • Implement HBase and MapReduce Integration
  • Understand the Apache Hadoop 2.7 Framework and Architecture
  • Learn to write complex Hadoop MapReduce programs in both MRv1 and MRv2
  • Design and develop applications of big data using Hadoop Ecosystem
  • Set up Hadoop infrastructure with single- and multi-node clusters using Amazon EC2 (CDH4)
  • Monitor a Hadoop cluster and execute routine administration procedures
  • Learn ETL connectivity with Hadoop big data, ETL tools, real-time case studies
  • Learn advanced big data technologies, write Hive and Apache Pig Scripts and work with Sqoop
  • Perform big data analytics using YARN
  • Schedule jobs through Oozie
  • Master Impala to work on real-time queries on Hadoop
  • Deal with Hadoop component failures and recoveries
  • Optimize Hadoop cluster for the best performance based on specific job requirements
  • Learn to work with complex big data analytics tools in real-world applications and make use of the Hadoop Distributed File System (HDFS), which is modeled on the Google File System (GFS)
  • Derive insight into the field of Data Science and advanced data analytics
  • Gain insights into real-time processes happening in several big data companies
  • Work on a real-time project on Big Data Analytics and gain hands-on Big Data and Hadoop Project Experience
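To make the MapReduce objective above concrete, here is a minimal word-count sketch of the map/shuffle/reduce pattern in plain Python. In a real Hadoop job the map and reduce phases would run on separate cluster nodes and the framework would perform the shuffle; here the shuffle is simulated locally with a dictionary, and the function names are illustrative rather than part of any Hadoop API.

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework
    does between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["hadoop stores data", "hadoop processes data"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)  # {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

The same three-stage structure underlies both MRv1 and MRv2 jobs; only the cluster resource management (JobTracker vs. YARN) differs.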
Start learning Apache Hadoop® from basic to advanced levels here...