Course content


Apache Hadoop is open-source data management software that helps organizations analyze huge volumes of structured and unstructured data, and it is a very hot topic across the tech industry. Through technical sessions and hands-on labs, students can quickly learn to take advantage of the MapReduce framework.

Training Objectives of Hadoop:

Hadoop Course will provide the basic concepts of MapReduce applications developed using Hadoop, including a close look at framework components, use of Hadoop for a variety of data analysis tasks, and numerous examples of Hadoop in action. This course will further examine related technologies such as Hive, Pig, and Apache Accumulo.

Target Students / Prerequisites:

Students should have an IT background and be familiar with basic concepts of Java and Linux.

Introduction: The Motivation for Hadoop
  • Problems with traditional large-scale systems
  • Requirements for a new approach
Hadoop Basic Concepts
  • An Overview of Hadoop
  • The Hadoop Distributed File System
  • Hands on Exercise
  • How MapReduce Works
  • Hands-on Exercise
  • Anatomy of a Hadoop Cluster
  • Other Hadoop Ecosystem Components
Writing a MapReduce Program
  • Examining a Sample MapReduce Program
  • With several examples
  • Basic API Concepts
  • The Driver Code
  • The Mapper
  • The Reducer
  • Hadoop’s Streaming API
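The Streaming API above lets map and reduce logic be written in any language that reads stdin and writes stdout. Below is a minimal local sketch of the word-count pattern in Python; the function names are illustrative, not part of the Hadoop API:

```python
# Word count in the style of Hadoop Streaming: the mapper emits
# (word, 1) pairs and the reducer sums counts per sorted key.
from itertools import groupby

def mapper(lines):
    """Emit (word, 1) for every word, like a streaming mapper writing to stdout."""
    for line in lines:
        for word in line.strip().lower().split():
            yield (word, 1)

def reducer(pairs):
    """Sum values per key; Hadoop guarantees pairs arrive sorted by key."""
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield (word, sum(count for _, count in group))

counts = dict(reducer(mapper(["the quick brown fox", "the lazy dog"])))
print(counts["the"])  # 2
```

In a real job the sort between mapper and reducer is done by Hadoop's shuffle, not by the reducer itself.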
Delving Deeper Into The Hadoop API
  • More About ToolRunner
  • Testing with MRUnit
  • Reducing Intermediate Data With Combiners
  • The configure and close methods for Map/Reduce Setup and Teardown
  • Writing Partitioners for Better Load Balancing
  • Hands-On Exercise
  • Directly Accessing HDFS
  • Using the Distributed Cache
  • Hands-On Exercise
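As a sketch of how a combiner reduces intermediate data: it runs reducer-style aggregation on each mapper's local output before the shuffle, so fewer pairs cross the network. A small illustration (the names and data are invented):

```python
# A combiner pre-aggregates one mapper's local output, shrinking
# the data shuffled across the network. Illustrative only.
from collections import Counter

map_output = [("hadoop", 1), ("hive", 1), ("hadoop", 1), ("hadoop", 1)]

# Without a combiner, all 4 pairs are shuffled to the reducers.
shuffled_without = len(map_output)

# With a combiner, local sums are shuffled: one pair per distinct key.
combined = list(Counter(k for k, _ in map_output).items())
shuffled_with = len(combined)

print(shuffled_without, shuffled_with)  # 4 2
```

This is why combiners only apply to operations like sums and counts that can be computed in partial, associative steps.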
Performing Several Hadoop Jobs
  • The configure and close Methods
  • Sequence Files
  • Record Reader
  • Record Writer
  • Role of Reporter
  • Output Collector
  • Processing video files and audio files
  • Processing image files
  • Processing XML files
  • Counters
  • Directly Accessing HDFS
  • ToolRunner
  • Using The Distributed Cache
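Counters and the Reporter listed above let a job keep global statistics, such as a count of malformed input records, aggregated across all tasks. A local sketch of the idea (the record format and counter names are illustrative):

```python
# Simulating Hadoop counters: a job-wide tally of malformed records,
# incremented as each record is processed, much as a Mapper would call
# context.getCounter(...).increment(1). Illustrative only.
from collections import Counter

counters = Counter()

def parse_record(line):
    """Parse 'id,value' records, counting good and bad ones."""
    parts = line.split(",")
    if len(parts) != 2 or not parts[1].isdigit():
        counters["MALFORMED_RECORDS"] += 1
        return None
    counters["GOOD_RECORDS"] += 1
    return (parts[0], int(parts[1]))

records = [parse_record(l) for l in ["a,1", "b,x", "c,3", "broken"]]
print(dict(counters))  # {'GOOD_RECORDS': 2, 'MALFORMED_RECORDS': 2}
```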
Common MapReduce Algorithms
  • Sorting and Searching
  • Indexing
  • Classification/Machine Learning
  • Term Frequency – Inverse Document Frequency
  • Word Co-Occurrence
  • Hands-On Exercise: Creating an Inverted Index
  • Identity Mapper
  • Identity Reducer
  • Exploring well-known problems using MapReduce applications
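As a local warm-up for the inverted-index exercise above: map each word to the set of documents that contain it. A sketch (the documents are invented):

```python
# Building an inverted index: word -> set of documents containing it.
from collections import defaultdict

docs = {
    "doc1": "hadoop stores data in hdfs",
    "doc2": "mapreduce processes data in parallel",
}

index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():
        index[word].add(doc_id)

print(sorted(index["data"]))  # ['doc1', 'doc2']
```

In the MapReduce version, the mapper emits (word, doc_id) pairs and the reducer collects the document lists.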
Using HBase
  • What is HBase?
  • HBase API
  • Managing large data sets with HBase
  • Using HBase in Hadoop applications
  • Hands-on Exercise

1.    In-depth explanation of the concepts of the HDFS & MapReduce frameworks

2.    What is Hadoop 2.X Architecture & How to set up Hadoop Cluster

3.    How to write complex MapReduce Programs

4.    In-depth explanation of how to load data using tools like Sqoop, Flume & Solr

5.    How to perform data analysis using tools like PIG, HIVE & YARN

6.    How to implement & integrate HBASE & MapReduce

7.    How to perform advanced usage and indexing

8.    How to schedule jobs using Oozie

9.    What are the best practices for overall Hadoop development

10. RTAs on Data Analytics

11. What Spark is, a brief on its ecosystem & how to work on RDDs using Spark

Programming languages: Java & Scala

Frameworks: Hadoop Distributed File System (HDFS), MapReduce & Spark

Loading Tools: Sqoop & Flume

Analytical Tools: Pig, Hive and YARN

Scheduling Tools: Oozie



S No | Concepts | Syllabus Objectives | Topics | RTAs
1. Understanding Big Data and Hadoop

This lecture covers:

1.      Big Data

2.      Big Data problems & solutions, their limitations

3.      Hadoop’s solutions that handle the Big Data issue

4.      Common Hadoop Ecosystem and its Architecture

5.      Introduction to HDFS

6.      What a file is in HDFS and how files are written & read

7.      Brief on MapReduce Framework and its working style.

1.      Big Data, Limitations and Solutions of existing Data Analytics Architecture,

2.      Hadoop,

3.      Hadoop Features,

4.      Hadoop Ecosystem,

5.      Hadoop 2.x core components,

6.      Hadoop Storage: HDFS,

7.      Hadoop Processing

8.      MapReduce Framework,

9.      Different Hadoop Distributions.


Hadoop requirements

This lecture covers:

Prerequisites to learn Hadoop

10.  Linux commands

30 Essential Linux Basic Commands You Must Know

11.  VMware

·         Basics

·         Installations

·         Backups

12.  SQL basics

·         Introduction to SQL

·         MySQL Essentials

·         Database Fundamentals

13.  Hands on exercise and Assignments
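The SQL basics above can be tried without a MySQL install using Python's built-in sqlite3; the CREATE/INSERT/SELECT syntax shown is essentially the same in MySQL (the table and data are invented):

```python
# SQL fundamentals (CREATE, INSERT, SELECT with GROUP BY) demonstrated
# with Python's built-in sqlite3 in-memory database.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
cur.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                [("asha", "eng", 90), ("ravi", "eng", 80), ("meena", "hr", 60)])
cur.execute("SELECT dept, COUNT(*), AVG(salary) FROM emp "
            "GROUP BY dept ORDER BY dept")
rows = cur.fetchall()
print(rows)  # [('eng', 2, 85.0), ('hr', 1, 60.0)]
```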


Hadoop Architecture and HDFS

This lecture covers:

1.      What is Hadoop Cluster Architecture

2.      What are the important Configuring files in a Hadoop Cluster

3.      What are the various Data loading techniques

4.      What are Single node and Multi nodes and their setups

14.  Hadoop 2.x Cluster Architecture

15.  Federation and High Availability,

16.  A Typical Production Hadoop Cluster,

17.  Hadoop Cluster Modes,

18.  Common Hadoop Shell Commands,

19.  Hadoop 2.x Configuration Files,

20.  Single-node and multi-node cluster setup, Hadoop Administration.

21.  Hands on exercise and Assignments
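The Hadoop 2.x configuration files listed above are plain XML property files. A minimal core-site.xml for a single-node setup might look like the following; the host and port are illustrative:

```xml
<!-- core-site.xml: points clients at the NameNode (illustrative values) -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```

Block size, replication, and similar HDFS settings live in hdfs-site.xml, and MapReduce/YARN settings in mapred-site.xml and yarn-site.xml.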


Hadoop MapReduce Framework

This lecture covers:

1.      In-depth analysis on Hadoop MapReduce Framework

2.      How MapReduce works on data stored in HDFS.

3.      What are Splits, Combiner & Partitioner.

4.      How to work on MapReduce using different data sets

22.  MapReduce Use Cases,

23.  Traditional way Vs MapReduce way,

24.  Why MapReduce,

25.  Hadoop 2.x MapReduce Architecture,

26.  Hadoop 2.x MapReduce Components,

27.  YARN MR Application Execution Flow,

28.  YARN Workflow,

29.  Anatomy of MapReduce Program,

30.  Demo on MapReduce.

31.  Input Splits,

32.  Relation between Input Splits and HDFS Blocks,

33.  MapReduce Combiner & Partitioner,

34.  Hands on exercise and Assignments
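On the relation between input splits and HDFS blocks: with the default FileInputFormat, split size roughly equals block size, so the number of map tasks is about ceil(file size / block size). A quick illustration (the sizes are examples):

```python
# Input splits vs HDFS blocks: number of map tasks ~= ceil(file / block).
import math

block_size = 128 * 1024 * 1024   # 128 MB, the Hadoop 2.x default
file_size = 300 * 1024 * 1024    # a 300 MB input file

num_splits = math.ceil(file_size / block_size)
print(num_splits)  # 3 map tasks: 128 MB + 128 MB + 44 MB
```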



Pig

This lecture covers:

1.      What Pig is, its types of use, and a demo case

2.      How to couple Pig with MapReduce

3.      What Pig Latin scripting is

4.      What Pig running modes are, plus Pig UDFs, Pig streaming, and testing Pig scripts.

35.  About Pig,

36.  MapReduce Vs Pig,

37.  Pig Use Cases,

38.  Programming Structure in Pig,

39.  Pig Running Modes,

40.  Pig components,

41.  Pig Execution,

42.  Pig Latin Program,

43.  Data Models in Pig,

44.  Pig Data Types,

45.  Shell and Utility Commands,

46.  Pig Latin Relational Operators,

47.  File Loaders,

48.  Group Operator,

49.  COGROUP Operator,

50.  Joins and COGROUP,

51.  Union,

52.  Diagnostic Operators,

53.  Specialized joins in Pig,

54.  Hands on exercise and Assignments
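As a feel for Pig's GROUP and FOREACH…COUNT operators from the list above, here is the same dataflow sketched in plain Python; the commented Pig Latin is illustrative:

```python
# What Pig's GROUP operator does, simulated in Python. The equivalent
# Pig Latin (illustrative) would be:
#   words  = LOAD 'input' AS (word:chararray);
#   grps   = GROUP words BY word;
#   counts = FOREACH grps GENERATE group, COUNT(words);
from collections import defaultdict

words = ["pig", "hive", "pig", "pig", "hive"]

groups = defaultdict(list)   # GROUP BY: one bag of tuples per key
for w in words:
    groups[w].append(w)

counts = {k: len(bag) for k, bag in groups.items()}  # FOREACH ... COUNT
print(counts)  # {'pig': 3, 'hive': 2}
```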




Hive

This lecture covers:

1.      What are Hive concepts

2.      What are Hive data types

3.      How loading & querying work in Hive

4.      How to run Hive scripts

5.      What are Hive UDFs


55.  Hive Background,

56.  Hive Use Case,

57.  About Hive,

58.  Hive Vs Pig,

59.  Hive Architecture and Components,

60.  Metastore in Hive,

61.  Limitations of Hive,

62.  Comparison with Traditional Database,

63.  Hive Data Types and Data Models,

64.  Partitions and Buckets,

65.  Hive Tables (Managed and External Tables),

66.  Importing Data,

67.  Querying Data,

68.  Managing Outputs,

69.  Hive Script,

70.  Hive UDF,

71.  Retail use case in Hive,

72.  Hands on exercise and Assignments
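The partitions and buckets topic above is worth a concrete picture: Hive stores each partition as a directory such as .../table/dt=2024-01-01/, and a WHERE clause on the partition column lets Hive read only the matching directories (partition pruning). A sketch of that layout (the table name and dates are invented):

```python
# Hive partitions are directories: a table partitioned by dt keeps each
# day's data under .../sales/dt=YYYY-MM-DD/. Illustrative sketch.
import os
import tempfile

warehouse = tempfile.mkdtemp()
for dt in ["2024-01-01", "2024-01-02", "2024-01-03"]:
    os.makedirs(os.path.join(warehouse, "sales", f"dt={dt}"))

# "SELECT ... WHERE dt = '2024-01-02'" only touches one partition directory.
pruned = [d for d in os.listdir(os.path.join(warehouse, "sales"))
          if d == "dt=2024-01-02"]
print(pruned)  # ['dt=2024-01-02']
```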


Advanced Hive and HBase

This lecture covers:

1.      What are Advanced HIVE concepts

2.      What are UDF, Dynamic Partitioning, HIVE indexes & Views

3.      What are Optimizations in HIVE

4.      In-depth analysis on HBase, its Architecture, components and its running modes

73.  Hive QL: Joining Tables,

74.  Dynamic Partitioning,

75.  Custom Map/Reduce Scripts,

76.  Hive Indexes and views

77.  Hive query optimizers,

78.  User Defined Functions,

79.  HBase:

80.  Introduction to NoSQL

81.  Databases and HBase,

82.  HBase v/s RDBMS,

83.  HBase Components,

84.  HBase Architecture,

85.  Run Modes & Configuration,

86.  HBase Cluster Deployment.

87.  Hands on exercise and Assignments


Advanced HBase

This lecture covers:

1.      What are Advanced HBase Concepts

2.      How to perform bulk loading

3.      What are filters

4.      What ZooKeeper is and how it helps in cluster monitoring

5.      Why HBase utilizes ZooKeeper

88.  HBase Data Model,

89.  HBase Shell,

90.  HBase Client API,

91.  Data Loading Techniques,

92.  ZooKeeper

93.  Demos on Bulk Loading,

94.  Getting and Inserting Data,

95.  Filters in HBase.

96.  Hands on exercise and Assignments
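The HBase data model above can be pictured as nested maps: row key → column family:qualifier → timestamped versions, with reads returning the newest version. A sketch in Python (the real client API is the Java Put/Get/Scan interface; these helper functions are invented):

```python
# The HBase data model as nested maps: each cell keeps timestamped
# versions, and a read returns the latest one. Illustrative only.
from collections import defaultdict

table = defaultdict(dict)   # row key -> {"family:qualifier": [(ts, value), ...]}

def put(row, column, value, ts):
    table[row].setdefault(column, []).append((ts, value))
    table[row][column].sort(reverse=True)   # newest version first

def get(row, column):
    return table[row][column][0][1]         # latest version wins

put("row1", "info:name", "asha", ts=1)
put("row1", "info:name", "asha k", ts=2)    # a newer version
print(get("row1", "info:name"))  # 'asha k'
```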



Sqoop

This lecture covers:

1. Importing data from other databases into HDFS

2. Importing data from other databases into Hive

3. Exporting data from Hadoop to other databases

97.  Introduction.

98.  Import Data.

99.  Export Data.

100.  Sqoop Syntax.

101.  Database connections.

102.  Hands on exercise and Assignments
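Sqoop commands follow a tool-plus-options pattern. An illustrative import of a MySQL table into HDFS follows; the connection string, credentials, table name, and paths are placeholders:

```
sqoop import \
  --connect jdbc:mysql://dbhost/shop \
  --username dbuser -P \
  --table orders \
  --target-dir /user/hadoop/orders \
  --num-mappers 4
```

The export direction uses `sqoop export` with `--export-dir` pointing at the HDFS data to push back into the database.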



Impala

This lecture covers:

103.  Introduction to Impala

104.  Impala Configuration

105.  Comparison between Hive and Impala

106.  Impala Commands

107.  Hands on exercise and Assignments


Processing Distributed Data with Apache Spark

This lecture covers:

1.      What is Spark Ecosystem

2.      What is Scala and its utility in Spark

3.      What is SparkContext

4.      How to work on RDD in Spark

5.      How to run a Spark Cluster

6.      Comparison of MapReduce vs Spark

108.  What is Apache Spark,

109.  Spark Ecosystem,

110.  Spark Components,

111.  History of Spark

112.  Spark Versions/Releases,

113.  What is Scala?,

114.  Why Scala?,

115.  SparkContext,

116.  Spark SQL

117.  Hands on exercise and Assignments.
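The RDD operations from the objectives above (flatMap, map, reduceByKey) can be sketched with plain Python collections; real Spark code would chain the same steps on an RDD obtained from a SparkContext (the data here is invented):

```python
# RDD-style word count with plain Python: flatMap, map, and reduceByKey
# as Spark would chain them. Illustrative simulation only.
from collections import defaultdict

lines = ["spark is fast", "spark is fun"]

flat_mapped = [w for line in lines for w in line.split()]   # flatMap
mapped = [(w, 1) for w in flat_mapped]                      # map
reduced = defaultdict(int)                                  # reduceByKey
for k, v in mapped:
    reduced[k] += v

print(dict(reduced))  # {'spark': 2, 'is': 2, 'fast': 1, 'fun': 1}
```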


Flume & Solr

This lecture covers:

Flume and Solr

118.  Introduction.

119.  Configuration and Setup

120.  Flume Sink with example

121.  Channel

122.  Flume Source with example

123.  Complex Flume architecture

124.  Storing streaming data into Solr

125.  Customization of Solr

126.  Hands on exercise and Assignments
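A Flume agent is defined in a properties file naming its sources, channels, and sinks, which matches the source/channel/sink topics above. An illustrative single-agent configuration (the agent and component names are examples):

```properties
# Illustrative Flume agent: a netcat source feeding a memory channel
# drained by a logger sink.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

a1.sinks.k1.type = logger

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```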



Hue

This lecture covers:

127.  Introduction to Hue

128.  Advantages of Hue

129.  Hue Web Interface

130.  Ecosystems in Hue

131.  Hands on exercise and Assignments



Oozie

This lecture covers:

1.      How multiple Hadoop ecosystem components work

2.      How they should be implemented to solve Big Data Issues

132.  Oozie,

133.  Oozie Components,

134.  Oozie Workflow,

135.  Scheduling with Oozie,

136.  Demo on Oozie Workflow,

137.  Oozie Co-ordinator,

138.  Oozie Commands,

139.  Oozie Web Console,

140.  Oozie for MapReduce,

141.  Pig, Hive, and Sqoop,

142.  Combined flow of MR, Pig, and Hive in Oozie

143.  Hands on exercise and Assignments
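An Oozie workflow is an XML graph of actions, as in the workflow topics above. An illustrative workflow with a single Hive action (the names, paths, and script are placeholders):

```xml
<!-- Illustrative Oozie workflow: one Hive action, then end. -->
<workflow-app name="daily-report" xmlns="uri:oozie:workflow:0.4">
  <start to="run-hive"/>
  <action name="run-hive">
    <hive xmlns="uri:oozie:hive-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>report.q</script>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Hive action failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

An Oozie coordinator then schedules this workflow on a time or data trigger.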



Tableau

This lecture covers:



144.  Tableau Fundamentals

145.  Tableau Analytics.

146.  Visual Analytics.

147.  Hands on exercise and Assignments




Hadoop Project 1

Hadoop -Tableau live integration

Topics: This project gives you the opportunity to work on retail data analytics.


1.Hadoop Integration with Tableau 



Hadoop Project 2

Multi-node cluster setup

Topics: This project gives you the opportunity to work on a real-world Hadoop multi-node cluster setup in a distributed environment.

·         Running a Hadoop multi-node setup using a 4-node cluster

·         Deploying a MapReduce job on the Hadoop cluster

·         You will get a complete demonstration of working with various Hadoop cluster master and slave nodes, installing Java as a prerequisite for running Hadoop, installing Hadoop, and mapping the nodes in the Hadoop cluster.


Hadoop Project 3

Social media analytics

Topics: This project gives you the opportunity to work on social media analytics.

·         Streaming Twitter data

·         Storing data into Hadoop

·         Processing social media data

·         Sentiment analysis on Twitter data

·         Storing the final result in a table

·         Connecting a BI tool.
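The sentiment-analysis step above could start from a simple lexicon-based scorer like the sketch below; the word lists are illustrative, and real pipelines would use richer lexicons or trained models:

```python
# Minimal lexicon-based sentiment scoring: count positive and negative
# words and compare. Word lists are illustrative placeholders.
POSITIVE = {"great", "love", "good"}
NEGATIVE = {"bad", "hate", "awful"}

def sentiment(tweet):
    words = tweet.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("I love this great product"))  # positive
print(sentiment("awful service"))              # negative
```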


Mode of Training


Total duration of the course

5 to 7 weeks

Training duration per day

50 mins - 90 mins

Communication Mode

GoToMeeting, WebEx

Software access:

Software will be installed or server access will be provided, whichever is possible.


Soft copy of the material will be provided during the training.


Both weekdays and weekends

Training Fee