Category: BIG DATA

Big Data Hadoop Training


About Proerp Academy

Proerp Academy is a leading e-learning platform offering both online and instructor-led (offline) training. We have trained students and professionals across the globe in technologies such as Big Data Hadoop, Amazon Web Services (AWS), Business Intelligence and Analytics, Big Data Analytics, and Data Science Analytics.

We offer efficient and affordable learning solutions that are accessible to millions of interested students and professionals across the globe.

About the Course:

Proerp Academy’s Big Data Hadoop online and offline training teaches students and professionals how to use Big Data effectively and efficiently.

This Big Data Hadoop training also helps students and professionals upgrade their skills to the latest technologies.

The course provides the skills and knowledge expected of experienced Big Data Hadoop developers and administrators.

The course will give you the necessary skills to implement Big Data Hadoop and cloud computing on this platform.

The course also gives you an overview of Hadoop Big Data and the use of the Hadoop system.

Training Objectives:

  • Introduction to Big Data and Hadoop Ecosystem
  • HDFS Architecture
  • Hadoop Architecture
  • MapReduce and Sqoop
  • Basics of Impala
  • Basics of Hive
  • Working with Impala
  • Working with Hive
  • Type of Data Formats
  • Advanced Hive Concepts
  • Data File Partitioning
  • Apache Flume
  • HBase
  • Apache Pig
  • Basics of Apache Spark
  • RDDs in Spark
  • Spark Parallel Processing
  • Implementation of Spark Applications
  • Spark SQL
  • Spark RDD Optimization Techniques
  • Spark Algorithms

Module 1

Bigdata and Hadoop-Introduction and Overview

Goal:

In this module, we will discuss traditional systems and the problems associated with traditional large-scale systems. You will also learn about Hadoop and its ecosystem.

Objective:

After completing this module 1,you should be able to:

  • Understand Traditional models
  • Understand Hadoop
  • Understand and Describe the problems with Traditional Large Scale Systems
  • Understand the Hadoop Ecosystem

Topics:

  • Traditional models
  • Hadoop
  • Problems with Traditional Large Scale Systems
  • Hadoop Ecosystem

Hands on:

  • Hadoop Ecosystem
  • Hadoop and its advantages
  • Traditional model
  • Problems with Traditional Large Scale Systems

Module 2

HDFS and Hadoop Architecture

Goal:

In this module, you will learn about distributed processing on a cluster and the HDFS architecture. You will also learn how to use HDFS, YARN as a resource manager, the YARN architecture, and how to work with YARN.

Objective:

After completing this module 2,you should be able to:

  • Understand Distributed Processing on a Cluster
  • Understand and Describe Storage: HDFS Architecture
  • Understand and Explain Storage Using HDFS
  • Understand Resource Management: YARN
  • Understand and Work with Resource Management: Working with YARN

Topics:

  • Distributed Processing on a Cluster
  • Storage: HDFS Architecture
  • Storage: Using HDFS
  • Resource Management: YARN
  • Resource Management: YARN Architecture
  • Resource Management: Working with YARN

Hands on:

  • Storage: Using HDFS
  • Distributed Processing on a Cluster
  • Resource Management: YARN Architecture
  • Resource Management: Working with YARN
  • Resource Management: YARN
  • Storage: HDFS Architecture

Module 3

MapReduce and Sqoop

Goal:

In this module, you will learn about MapReduce and its characteristics, advanced MapReduce concepts, an overview of Sqoop, basic imports and exports in Sqoop, improving Sqoop’s performance, the limitations of Sqoop, and Sqoop 2.

Objective:

After completing this module 3,you should be able to:

  • Understand MapReduce
  • Understand and Explain MapReduce Characteristics
  • Understand Sqoop Overview
  • Understand Advanced MapReduce Concepts
  • Understand and Describe Basic Imports and Exports
  • Understand Limitations of Sqoop
  • Understand Sqoop 2

Topics:

  • MapReduce
  • MapReduce Characteristics
  • Advanced MapReduce Concepts
  • Basic Imports and Exports
  • Sqoop Overview
  • Limitations of Sqoop
  • Improving Sqoop’s Performance
  • Sqoop 2

Hands on:

  • Basic Imports and Exports
  • Sqoop Overview
  • MapReduce
  • MapReduce and its Characteristics
  • Limitations of Sqoop
  • Improving Sqoop’s Performance
  • Sqoop 2
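The map/shuffle/reduce flow covered in this module can be sketched in plain Python. This is a hedged illustration of the model only, not the Hadoop Java API: a mapper emits (word, 1) pairs, the framework groups them by key, and a reducer sums each group.

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word, like a Hadoop Mapper
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle/sort: group values by key, as the framework does between phases
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word, like a Hadoop Reducer
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data big ideas", "data flows in"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)  # {'big': 2, 'data': 2, 'ideas': 1, 'flows': 1, 'in': 1}
```

In real Hadoop the shuffle happens across machines; here it is a single in-memory dictionary, which is enough to show why the reducer only ever sees all values for one key at a time.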

Module 4

Basics of Impala and Hive

Goal:

In this module, you will learn about Hive and Impala: why to use them, the differences between them, how they work, and how Hive compares to traditional databases.

Objective:

After completing this module 4,you should be able to:

  • Describe Impala and Hive
  • Understand why to use Impala and Hive
  • Understand and Describe the difference between Hive and Impala
  • Understand and Explain how Hive and Impala work
  • Understand and Explain how Hive compares to traditional databases

Topics:

  • Introduction to Impala and Hive
  • Difference between Hive and Impala
  • Why Use Impala and Hive?
  • How Hive and Impala Work
  • Comparing Hive to Traditional Databases

Hands on:

  • Impala and Hive
  • Understand the use of Impala
  • Understand the use of Hive
  • Difference between Hive and Impala
  • Comparing Hive to Traditional Databases
  • Introduction to Hive and Impala

Module 5

Working with Impala and Hive

Goal:

In this module, you will learn about the metastore, how to create databases and tables in Hive and Impala, how to load data into Hive and Impala tables, HCatalog, and how Impala works on a cluster.

Objective:

After completing this module 5,you should be able to:

  • Understand the Metastore
  • Understand how to create databases and tables
  • Understand and Describe how to load data into tables
  • Understand HCatalog
  • Understand and Describe Impala on a Cluster

Topics:

  • Metastore
  • Creating Databases
  • Creating Tables
  • Loading Data into Tables
  • Impala on a Cluster
  • HCatalog

Hands on:

  • Creating Databases
  • Creating Tables
  • Loading Data into Tables
  • Impala on a Cluster
  • HCatalog and Its Uses

Module 6

Type of Data Formats

Goal:

In this module, you will learn about the different file formats available, Hadoop tool support for file formats, Avro schemas, using Avro with Hive and Sqoop, and Avro schema evolution.

Objective:

After completing this module 6,you should be able to:

  • Understand File Format
  • Understand and Describe Hadoop Tool Support for File Formats
  • Understand Avro Schemas
  • Understand how to use Avro with Hive and Sqoop
  • Understand and Describe Avro Schema Evolution

Topics:

  • File Format
  • Avro Schemas
  • Hadoop Tool Support for File Formats
  • How to use Avro with Hive and Sqoop
  • Avro Schema Evolution

Hands on:

  • Creating File Format
  • Understand Avro Schemas
  • Understand how to use Avro with Hive and Sqoop
  • Avro Schema Evolution
  • Understand Hadoop Tool Support for different file formats
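An Avro schema is simply a JSON document, so it can be sketched without any Avro library at all. The record below is a hypothetical example of the kind of schema Sqoop can generate on an Avro import; the field names are illustrative, not from any real dataset.

```python
import json

# A minimal Avro "record" schema expressed as plain JSON.
# The nullable union type on "email" (with a default) is what makes
# adding or removing the field safe under Avro schema evolution.
user_schema = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id",    "type": "int"},
        {"name": "email", "type": ["null", "string"], "default": None},
    ],
}

schema_json = json.dumps(user_schema)   # serialize, as stored in an .avsc file
parsed = json.loads(schema_json)        # parse it back, as a reader would
print(parsed["name"], [f["name"] for f in parsed["fields"]])
```

Tools like Hive and Sqoop read this same JSON to decide how to map Avro fields to table columns, which is why schema evolution rules (defaults, union with null) matter in practice.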

Module 7

Advanced Hive Concepts and Data File Partitioning

Goal:

In this module, you will learn about partitioning in Hive and Impala, when to use partitioning, bucketing in Hive, and more advanced Hive concepts.

Objective:

After completing this module 7,you should be able to:

  • Understand the Partitioning Overview
  • Understand when to use partitioning
  • Understand and Describe partitioning in Impala and Hive
  • Understand Bucketing in Hive
  • Understand and Explain advanced concepts in Hive

Topics:

  • Partitioning Overview
  • When to Use Partitioning
  • Bucketing in Hive
  • Partitioning in Impala and Hive
  • Advanced Concepts in Hive

Hands on:

  • Bucketing in Hive
  • Partitioning in Impala and Hive
  • Partitioning Overview
  • Advanced Concepts in Hive
  • When to Use Partitioning
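The bucketing idea from this module can be sketched in a few lines: Hive assigns each row to a bucket by hashing the bucketing column and taking the result modulo the number of buckets. This sketch uses CRC32 as a stand-in hash for illustration; Hive's actual hash function differs.

```python
import zlib

def bucket_for(key: str, num_buckets: int) -> int:
    # A row lands in bucket hash(column) % num_buckets.
    # CRC32 is a deterministic stand-in, NOT Hive's real hash function.
    return zlib.crc32(key.encode("utf-8")) % num_buckets

# Four rows spread over 4 buckets, as a CLUSTERED BY (name) INTO 4 BUCKETS
# table would do on write.
rows = ["alice", "bob", "carol", "dave"]
buckets = {name: bucket_for(name, 4) for name in rows}
print(buckets)
```

The useful property is that the same key always hashes to the same bucket, which is what lets Hive prune buckets on lookups and perform efficient bucketed map-side joins.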

Module 8

Apache Flume and HBase

Goal:

In this module, you will learn about Apache Flume, the Flume architecture, Flume sources, sinks, channels, and configuration, along with an introduction to HBase, the HBase architecture, data storage in HBase, and HBase vs RDBMS.

Objective:

After completing this module 8,you should be able to:

  • Understand Apache Flume
  • Understand and Explain Basic Flume Architecture
  • Understand and Analyze Flume Sources
  • Understand Flume Channels
  • Understand and Describe Flume Sinks
  • Understand and Describe Flume Configuration
  • Understand and Explain HBase
  • Understand HBase Architecture
  • Understand Data Storage in HBase
  • Understand and Analyze HBase vs RDBMS
  • Understand working with HBase

Topics:

  • What is Apache Flume?
  • Basic Flume Architecture
  • Flume Channels
  • Flume Sinks
  • Flume Configuration
  • HBase
  • HBase Architecture
  • Data Storage in HBase
  • HBase vs RDBMS
  • Working with HBase

Hands on:

  • Flume Configuration
  • HBase Architecture
  • Data Storage in HBase
  • Understand Flume Channels
  • Flume Sinks
  • Flume Architecture
  • HBase vs RDBMS
  • Working with HBase

Module 9

Apache Pig

Goal:

In this module, you will learn about Pig, the components of Pig, Pig vs SQL, and how to work with Pig.

Objective:

After completing this module 9,you should be able to:

  • Understand what Pig is
  • Understand the various components of Pig
  • Understand the difference between Pig and SQL
  • Understand how to work with Pig

Topics:

  • Pig Overview
  • Various components of Pig
  • Pig vs SQL
  • Working with Pig

Hands on:

  • Understanding of Various components of Pig
  • Understand the difference between Pig and SQL
  • Working with Pig

Module 10

Basics of Apache Spark

Goal:

In this module, you will learn about Apache Spark, how to use the Spark shell, RDDs, and functional programming in Spark.

Objective:

After completing this module 10,you should be able to:

  • Understand Apache Spark
  • Understand how to use Spark Shell
  • Understand and Describe Resilient Distributed Datasets (RDDs)
  • Understand the functional programming in Spark

Topics:

  • Apache Spark
  • Resilient Distributed Datasets (RDDs)
  • Spark Shell
  • Functional Programming in Spark

Hands on:

  • Apache Spark
  • Resilient Distributed Datasets (RDDs)
  • Functional Programming in Spark
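The functional-programming style this module introduces can be previewed with Python's built-in `map` and `filter`, which are lazy just like Spark transformations. This is a plain-Python stand-in so the example runs without a cluster; with PySpark the equivalent chain would be `sc.parallelize(data).map(...).filter(...).collect()`.

```python
data = [1, 2, 3, 4, 5]

# Transformations are lazy: nothing is computed yet, just like rdd.map / rdd.filter.
squared = map(lambda x: x * x, data)
evens = filter(lambda x: x % 2 == 0, squared)

# The "action" forces evaluation of the whole chain, like rdd.collect().
result = list(evens)
print(result)  # [4, 16]
```

The key habit to notice is that functions are passed as values into the pipeline; Spark ships those functions to the worker nodes holding each partition of the data.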

Module 11

Resilient Distributed Datasets (RDDs)

Goal:

In this module, you will learn about RDDs in detail and the operations associated with them, key-value pair RDDs, and a few other pair RDD operations.

Objective:

After completing this module 11,you should be able to:

  • Understand Key-Value Pair RDDs
  • Understand and Describe Other Pair RDD Operations

Topics:

  • A closer look at RDDs
  • Key Value Pair RDDs
  • RDD Operations

Hands on:

  • Key-Value Pair RDDs
  • Other Pair RDD Operations
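The central pair-RDD operation, `reduceByKey`, can be sketched in plain Python. This is an in-memory stand-in for illustration, not the PySpark API: it merges all values sharing a key with a user-supplied function, which is exactly the contract of `rdd.reduceByKey(fn)`.

```python
def reduce_by_key(pairs, fn):
    # Plain-Python stand-in for PySpark's rdd.reduceByKey(fn):
    # combine every pair of values that share a key using fn.
    acc = {}
    for key, value in pairs:
        acc[key] = fn(acc[key], value) if key in acc else value
    return acc

# Total sales per region from (region, amount) pairs.
sales = [("us", 3), ("eu", 5), ("us", 2), ("eu", 1)]
totals = reduce_by_key(sales, lambda a, b: a + b)
print(totals)  # {'us': 5, 'eu': 6}
```

Because `fn` must be associative and commutative, Spark can apply it within each partition first and only shuffle the partial results, which is why `reduceByKey` is preferred over `groupByKey` for aggregations.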

Module 12

Implementation of Spark Applications

Goal:

In this module, you will learn about Spark applications vs the Spark shell, how to create a SparkContext, building a Spark application, how Spark runs on YARN in client and cluster mode, dynamic resource allocation, and configuring Spark properties.

Objective:

After completing this module 12,you should be able to:

  • Understand Spark Applications vs the Spark Shell
  • Understand and Describe how to Build a Spark Application (Scala and Java)
  • Creating the SparkContext
  • Understand how Spark Runs on YARN: Client Mode and Cluster Mode
  • Understand and Describe Dynamic Resource Allocation
  • Understand the Configuration of Spark Properties

Topics:

  • Creating the SparkContext
  • Spark Applications vs the Spark Shell
  • Building a Spark Application (Scala and Java)
  • How Spark Runs on YARN: Client Mode
  • How Spark Runs on YARN: Cluster Mode
  • Configuring Spark
  • Dynamic Resource Allocation

Hands on:

  • Creating the SparkContext
  • Understand Dynamic Resource Allocation
  • Understand the difference between Spark Applications and the Spark Shell
  • Configuration of Spark
  • Building a Spark Application (Scala and Java)
  • Understand how Spark runs on YARN in Client Mode
  • Understand and Explain how Spark runs on YARN in Cluster Mode

Module 13

Spark Parallel Processing

Goal:

In this module, you will learn how Spark runs on a cluster, RDD partitions, how to partition file-based RDDs, HDFS and data locality, parallel operations on partitions, stages and tasks, and how to control the level of parallelism.

Objective:

After completing this module 13,you should be able to:

  • Understand RDD Partitions
  • Understand and Explain Spark on a cluster
  • Understand and Explain the Partitioning of File-Based RDDs
  • Understand and Explain Parallel Operations on Partitions
  • Understand and Describe Stages and Tasks
  • Understand and Explain controlling the Level of Parallelism
  • Understand HDFS and Data Locality

Topics:

  • Spark on a Cluster
  • RDD Partitions
  • HDFS and Data Locality
  • Stages and Tasks
  • Controlling the Level of Parallelism
  • Partitioning of File-Based RDDs
  • Parallel Operations on Partitions

Hands on:

  • RDD Partitions
  • Stages and Tasks
  • Controlling the Level of Parallelism
  • Spark on Cluster
  • Partitioning of File-Based RDDs
  • HDFS and Data Locality
  • Parallel Operations on Partitions
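The partition-then-process pattern this module describes can be sketched with Python's thread pool. This is a single-machine stand-in for illustration: the threads play the role of Spark tasks, one task per partition, and the driver combines the partial results.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(data, n):
    # Split the dataset into n roughly equal partitions, as Spark does
    # when it parallelizes a collection across the cluster.
    return [data[i::n] for i in range(n)]

def process_partition(part):
    # Each task operates on one partition independently
    # (Spark runs one task per partition).
    return sum(x * x for x in part)

data = list(range(10))
parts = partition(data, 4)

# Threads stand in for cluster executors running tasks in parallel.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(process_partition, parts))

total = sum(partials)  # the driver combines per-partition results
print(total)  # 285
```

Controlling the level of parallelism in Spark amounts to controlling `n` here: too few partitions leaves executors idle, too many adds scheduling overhead per tiny task.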

Module 14

Spark RDD Optimization Techniques

Goal:

In this module, you will learn about RDD lineage, an overview of caching, distributed persistence, the storage levels of RDD persistence, how to choose the correct RDD persistence storage level, and RDD fault tolerance.

Objective:

After completing this module 14,you should be able to:

  • Understand RDD Lineage
  • Understand Distributed Persistence
  • Understand and Describe Storage levels of RDD Persistence
  • Understand and Explain RDD Fault Tolerance
  • Understand Caching Overview
  • Understand and Explain how to choose the correct RDD Persistence Storage Level

Topics:

  • Caching Overview
  • RDD Lineage
  • Storage Levels of RDD Persistence
  • Distributed Persistence
  • RDD Fault Tolerance
  • Choosing the Correct RDD Persistence Storage Level

Hands on:

  • RDD Fault Tolerance
  • Understanding of Distributed Persistence
  • Understanding the storage levels of RDD Persistence
  • Understanding of RDD Lineage
  • Caching Overview
  • Choosing the Correct RDD Persistence Storage Level

Module 15

Spark Algorithms

Goal:

In this module, you will learn about common Spark use cases, iterative algorithms in Spark, graph processing and analysis, machine learning, and the k-means algorithm.

Objective:

After completing this module 15,you should be able to:

  • Understand Iterative Algorithms in Spark
  • Understand and Describe Graph Processing and Analysis
  • Understand and Explain Machine Learning
  • Understand Common Spark use Cases
  • Understand and Describe K-means Algorithm

Topics:

  • Graph Processing and Analysis
  • Machine Learning
  • K-means Algorithm
  • Common Spark Use Cases
  • Iterative Algorithms in Spark

Hands on:

  • Machine Learning
  • K-Means Algorithm
  • Iterative Algorithms in Spark
  • Common Spark Use Cases
  • Graph Processing and Analysis
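K-means, the iterative algorithm highlighted in this module, fits Spark well because each iteration is a map (assign points to the nearest center) followed by a reduce (recompute centers). A minimal single-machine sketch of the algorithm itself, with a tiny hand-picked dataset for illustration:

```python
import math

def kmeans(points, centers, iterations=10):
    # Each iteration: (1) assign every point to its nearest center,
    # (2) move each center to the mean of its assigned points.
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers

# Two obvious clusters near (0, 0) and (10, 10).
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
final = kmeans(points, centers=[(0, 0), (10, 10)])
print(final)  # [(0.0, 0.5), (10.0, 10.5)]
```

In Spark the same loop would cache the points RDD once and reuse it every iteration, which is precisely the workload pattern (iterative reuse of a dataset) that motivated RDD persistence.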

Module 16

Spark SQL

Goal:

In this module, you will learn about Spark SQL and the SQLContext, creating DataFrames, transforming and querying DataFrames, and comparing Spark SQL with Impala.

Objective:

After completing this module 16,you should be able to:

  • Understand Spark SQL and the SQLContext
  • Understand and Explain the Comparison between Spark SQL and Impala
  • Creating DataFrames
  • Understand Transforming and Querying DataFrames

Topics:

  • Comparing Spark SQL with Impala
  • Creating DataFrames
  • Spark SQL and the SQLContext
  • Transforming and Querying DataFrames

Hands on:

  • Creating DataFrames
  • Understanding the comparison between Spark SQL and Impala
  • Understanding Spark SQL and the SQLContext
  • Transforming and Querying DataFrames