Big Data Applications using Hadoop

Hadoop has emerged as one of the most popular frameworks for storing and processing huge volumes of data. This course covers various aspects of how Hadoop works.

  • 24 hours of live, expert instructor-led training
  • 15 hours of self-paced learning
  • 15 hours of capstone project
  • Verified certificate from NIIT

Online Instructor-Led


About Big Data Applications using Hadoop

The field of Big Data is exploding every day, with new software tools and technologies enabling us to answer questions that we could not have answered even a few years ago. Its impact is felt in almost every sphere of life: transportation, health, education, commerce, communications, and politics. Hadoop has emerged as one of the most popular frameworks for storing and processing huge volumes of data. Together with the tools that have grown up around it, such as Pig, Hive, Sqoop, Oozie, Flume, HBase, and Spark, and with its ability to process streaming data and to apply machine learning algorithms to massive datasets, Hadoop is becoming the standard framework that developers want to learn and work with. This course covers in detail various aspects of how Hadoop works, along with an introduction to Spark.

Course Objectives
  • Acquire conceptual knowledge on Big Data, Data Engineering and Data Science
  • Learn the Hadoop architecture and develop MapReduce programs in Java using an IDE such as Eclipse
  • Build queries and process data residing in HDFS using tools like Pig, Hive, Sqoop and HBase
  • Understand the limitations of Hadoop and get to know where Spark shines
Module 1:

Why Big Data and Hadoop?

  • Role Big Data plays in our lives
  • History of Hadoop and its role in the Big Data Space
Module 2:

Setting up a Cloudera VM on the local machine

  • Downloading/Copying and Installing Virtual Box
  • Downloading / Copying the "right" Cloudera VM installer
  • Installing Cloudera bits on the VM
  • Preparing the VM instance for the course (enabling shared folders, bidirectional copy-paste, etc.)
  • Brief discussion on modes of operation in Hadoop (local vs cluster based)
  • Running a sample Hadoop JAR
Module 3:

HDFS

  • Role of HDFS in the Big Data world
  • Comparison of Local FS and HDFS
  • Discussion on Name Node and Data Node
  • Anatomy of File Read / Write in HDFS
  • Failures – NameNode / Data Node
  • HDFS Federation – What / Why
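The NameNode / DataNode split above can be sketched with a toy model. This is illustrative Python only, not the HDFS API; the 128 MB block size and replication factor of 3 are the real HDFS defaults, but the round-robin placement below ignores the rack awareness a real NameNode applies.

```python
# Toy model of how HDFS splits a file into blocks and places replicas.
# Illustrative only -- real HDFS is a distributed Java service.
import itertools

BLOCK_SIZE = 128 * 1024 * 1024  # HDFS default block size (128 MB)
REPLICATION = 3                  # HDFS default replication factor

def plan_blocks(file_size, datanodes):
    """Return a NameNode-style plan: one entry per block, each with the
    DataNodes holding its replicas (round-robin here; real HDFS is rack-aware)."""
    n_blocks = max(1, -(-file_size // BLOCK_SIZE))  # ceiling division
    nodes = itertools.cycle(datanodes)
    plan = []
    for block_id in range(n_blocks):
        replicas = [next(nodes) for _ in range(REPLICATION)]
        plan.append({"block": block_id, "replicas": replicas})
    return plan

# A 300 MB file needs three 128 MB blocks; each block gets three replicas.
plan = plan_blocks(300 * 1024 * 1024, ["dn1", "dn2", "dn3", "dn4"])
for entry in plan:
    print(entry)
```

The point of the sketch: the NameNode only keeps this metadata (which blocks exist, where their replicas live), while the DataNodes hold the actual bytes.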
Module 4:

MapReduce and YARN

  • MapReduce explained with a simple real-life example
  • MapReduce introduced with the WordCount program (Demo)
  • YARN – Introduction
    • Components, Failures, Scheduling
    • Tool / Tool Runner Classes
    • Key- Value Pairs
    • Input / Output File Formats
    • Combiners / Partitioners
    • Shuffle and Sort
    • Counters
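The map → shuffle/sort → reduce flow that WordCount follows can be sketched in plain Python. This is a single-machine illustration of the phases, not the Hadoop Java API:

```python
# WordCount expressed as the three MapReduce phases, simulated in plain
# Python for illustration (real Hadoop jobs are written against the Java
# MapReduce API and run distributed across a cluster).
from collections import defaultdict

def map_phase(line):
    # Mapper: emit a (word, 1) key-value pair for every word.
    for word in line.split():
        yield (word.lower(), 1)

def shuffle_sort(pairs):
    # Shuffle & sort: group values by key, keys returned in sorted order.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reduce_phase(key, values):
    # Reducer: sum the counts for each word.  A combiner would run this
    # same logic on each mapper's local output before the shuffle.
    return (key, sum(values))

lines = ["the quick brown fox", "the lazy dog"]
pairs = [kv for line in lines for kv in map_phase(line)]
counts = dict(reduce_phase(k, vs) for k, vs in shuffle_sort(pairs))
print(counts)  # 'the' appears twice, every other word once
```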
Module 5:

Pig

  • Need / History of Pig
  • Installing / Configuring Pig - Modes of Execution
  • Discussion on Pig Latin
    • Writing UDFs
Module 6:

Hive

  • Need / History of Hive
  • Installing / Configuring / Modes of Execution
  • HiveQL vs SQL (RDBMS) – a very brief discussion
  • Deep Dive into HiveQL
    1. Managed / External Tables
    2. Buckets / partitions
    3. CTAS / Multiple Inserts
    4. Different ways of interacting with Hive (shell / hue)
    5. Writing UDFs and using them in Hive
Module 7:

Sqoop

  • Why Sqoop?
  • Installing / Configuring
  • Importing Data into Hadoop using Sqoop
    1. Import into HDFS
    2. Direct import into Hive
  • Exporting Data from Hadoop using Sqoop
Module 8:

Oozie

  • What purpose does Oozie serve?
  • Discussion on the workflow.xml and job.properties files for various tools
  • Discussion on Action / Control nodes
  • Demo involving multiple operations using Hue Editor / Commands (from terminal)
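To give a flavour of what the workflow.xml discussion covers, here is a minimal single-action workflow skeleton. The workflow name `demo-wf`, the action name, and the property placeholders are illustrative, and the exact schema version (`uri:oozie:workflow:0.5` here) depends on the Oozie release on your cluster:

```xml
<workflow-app xmlns="uri:oozie:workflow:0.5" name="demo-wf">
  <start to="count-words"/>
  <action name="count-words">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <!-- job configuration properties go here -->
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Action failed, error: [${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

The `<action>` element is an action node (it does work), while `<start>`, `<kill>`, and `<end>` are control nodes that steer the flow, which is exactly the distinction the module draws.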
Module 9:

HBase

  • Downloading / Configuring and Installing HBase
  • Why HBase? HBase vs RDBMS
  • Structure of a Table, Regions, Region Servers, Meta Tables
  • Basic Commands (get, put, scan, etc.)
  • Loading Data into an HBase table (bulk loading using MapReduce)
  • Integration with Pig / Hive Tools
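The table structure above boils down to HBase's data model: a map, sorted by row key, from row key to column-family:qualifier cells. A toy in-memory sketch of get/put/scan (illustrative Python, not the HBase shell or Java client; `ToyHBaseTable` is a made-up name):

```python
# HBase models a table as a sorted map: row key -> column family ->
# qualifier -> value.  A toy in-memory version, for illustration only.
from bisect import insort

class ToyHBaseTable:
    def __init__(self, families):
        self.families = set(families)  # families are declared up front
        self.rows = {}                 # row key -> {"family:qualifier": value}
        self.row_keys = []             # kept sorted, like HBase row ordering

    def put(self, row, family, qualifier, value):
        assert family in self.families, "column family must exist up front"
        if row not in self.rows:
            self.rows[row] = {}
            insort(self.row_keys, row)
        self.rows[row][f"{family}:{qualifier}"] = value

    def get(self, row):
        return self.rows.get(row, {})

    def scan(self, start="", stop=None):
        # Rows come back in row-key order, like an HBase scan.
        for key in self.row_keys:
            if key >= start and (stop is None or key < stop):
                yield key, self.rows[key]

t = ToyHBaseTable(families=["info"])
t.put("row2", "info", "city", "Pune")
t.put("row1", "info", "name", "Asha")
print(list(t.scan()))  # row1 comes back before row2, regardless of put order
```

The sorted row-key order is what makes range scans cheap in HBase, and it is why row-key design matters so much in practice.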
Module 10:

Introduction to Spark

  • Limitations of Hadoop
  • Discussion on Spark Architecture
  • Purpose of Spark Libraries
  • Demo using Streaming and Machine Learning modules of Spark
  • Where to go from here?
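One idea worth previewing from the Spark architecture discussion: transformations (map, filter) are lazy and only record a lineage, while actions (collect, count) trigger execution. A toy single-machine sketch of that behaviour (illustrative Python, not PySpark; `ToyRDD` is a made-up class):

```python
# Sketch of Spark's lazy-evaluation idea: transformations build a plan,
# an action replays it.  Not PySpark -- a single-machine illustration.
class ToyRDD:
    def __init__(self, data, ops=()):
        self._data = data
        self._ops = ops          # the lineage: a chain of deferred steps

    def map(self, fn):           # transformation: nothing runs yet
        return ToyRDD(self._data, self._ops + (("map", fn),))

    def filter(self, pred):      # transformation: nothing runs yet
        return ToyRDD(self._data, self._ops + (("filter", pred),))

    def collect(self):           # action: replay the lineage now
        items = iter(self._data)
        for kind, fn in self._ops:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)

rdd = ToyRDD(range(10)).map(lambda x: x * x).filter(lambda x: x % 2 == 0)
print(rdd.collect())  # the even squares of 0..9
```

Because the lineage is kept rather than intermediate results, a lost partition in real Spark can be recomputed from it, which is the fault-tolerance story the architecture discussion builds on.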

Instructors are handpicked industry experts and mentors, trained to deliver the best online learning experience. All instructors have at least ten years of industry experience and extensive functional expertise in the fields they train.


A test will be conducted at the end of the course. On completion of the test with a minimum of 70% marks, NIIT will issue a certificate of successful completion.

One re-attempt will be provided in case the candidate scores less than 70%.

A Participation certificate will be issued if the candidate does not score 70% after five attempts.


The prerequisites for the program are:

  • The learner should have basic programming knowledge in any software programming language.
  • The learner should have good working knowledge of SQL Server.

Who should join this course?

This course is for software programmers keen on making a career in Big Data Analytics. Big data analytics unveils market trends, customer preferences and other useful business information. Big Data is the biggest game-changing opportunity for marketing and sales.

Will there be any project in the program?

Yes, you will be implementing a project during the course. The project will help you apply what you have learnt. The details of the project will be shared in the first orientation session of the course.

What happens if I miss a session?

All the live sessions are recorded and available for later view. Learners can refer to recordings of a missed session at their convenience.

What is your refund policy?

Upon registering for the course, if for some reason you are unable or unwilling to continue, you can apply for a refund. You can initiate the refund any time before the start of the second session of the course by sending an email to , with your enrolment details and the bank account details where you want the amount to be transferred. Once you initiate a refund request, you will receive the amount within 21 days of confirmation and verification by our team, provided you have not downloaded any courseware after registration.

What are the minimum system requirements to attend the program?

    Minimum system requirements for accessing the courses are:

  • Personal computer or Laptop with web camera
  • Headphone with Mic
  • Broadband connection with a minimum bandwidth of 1 Mbps; 2 Mbps is recommended.
  • A self-diagnostic test to check that your system meets the necessary requirements is available at

    Please note that the webcam, mic, and internet speed cannot be verified through this link.

Is there an official support desk for technical guidance during the training program?

Yes. For immediate technical support, you can reach out on 91-9717992809 or 0124-4917203 between 9:00 AM and 8:00 PM IST. You can write to for all other queries and our team will be happy to help you.


Related Courses

AI and Deep Learning with TensorFlow
AWS Certification and Training Program
Administration Essentials for New Admins- Salesforce
Advanced Data Mining projects with R
Advanced Pay Per Click
Advanced Program in Data Sciences
Advanced Social Media Marketing
Analyzing and Visualizing Data with Excel
Analyzing and Visualizing Data with Power BI
Android Game Development for Beginners
Application Development with Swift 2
Automated UI Testing in Java
Big Data Analytics with R
Big Data Applications using Hadoop
Building Android Games with OpenGL ES
Building Applications with Ext JS
Building Applications with
Building a Data Mart with Pentaho Data Integration
Building iOS 10 Applications with Swift
Building Web Applications with Spring MVC
Business Analytics using R from KPMG
Certified Digital Marketing Professional
Complete Web and Social Media Analytics
Data Quality 9.x: Developer, Level 1
Data Science Orientation
Data Science with R
Data Science with Spark
DevOps Certification Training
Developing Microsoft SharePoint® Server 2013
Enabling and Managing Microsoft Office 365
Executive Program in Applied Finance
Executive Program in Digital and Social Media Marketing Strategy
Getting Started with R for Data Science
Getting started with Apache Solr Search Server
IBM Cognos Connection and Workspace Advanced
Implementing Microsoft Azure Solutions-70-533
Informatica PowerCenter 9.x Level 1
Introducing Rails 5 Learning Web Development the Ruby Way
Introduction to ITIL
Java Enterprise Apps with DevOps
Joomla Certification Training Program
Julia for Data Science
LEAD (Learn. Enhance. Aspire. Deliver)
Learning Android N Application Development
Learning Data Mining with R
Learning Joomla 3 Extension Development
Learning MongoDB
Learning R for Data Visualization
Learning Spring Boot
Learning Swift 2
Linux shell scripting solution
Machine Learning with Python
Master AngularJS 2
Mastering Magento
Open Source Web App Development using MEAN Stack
PMI® Agile Certified Practitioner Training
Pentaho Reporting
Post Graduate Certificate in General Management (PGCGM)
Programming Using Python
Programming with Python for Data Sciences
Project Management Professional (PMP®) Training
R Data Mining Projects
R for Data Science Solutions
Reactive Java 9
SAS Certification Training Program
Secrets of Viral Video Marketing
Selenium with Java
Six Sigma Certification Training Program
Spring Security
Supply Chain Management(SCM) Training Program
Teradata Certification Training
Test Driven Android
UNIX Shell Scripting Training
Web Apps Development using Node.js along with Express.js and MongoDB
Web Apps Development with HTML5, CSS3, jQuery & Bootstrap
Web Development with Node.JS and MongoDB
iOS App Development Certification Training
jQuery UI Development