Big Data Applications using Hadoop

Hadoop has emerged as one of the most popular frameworks for storing and processing huge volumes of data. This course covers in detail how Hadoop works.

  • 24 hours of live, expert instructor-led training
  • 15 hours of self-paced learning
  • 15 hours of capstone project
  • Verified certificate from NIIT

Online Instructor-Led


About Big Data Applications using Hadoop

The field of Big Data is exploding every day, with new software tools and technologies enabling us to answer questions that we could not have answered even a few years ago. Its impact is felt in almost every sphere of life: transportation, health, education, commerce, communications and politics. Hadoop has emerged as one of the most popular frameworks for storing and processing huge volumes of data. Together with the tools that have grown up around it, such as Pig, Hive, Sqoop, Oozie, Flume, HBase and Spark, and with its ability to process streaming data and apply machine learning algorithms to massive datasets, Hadoop has become the standard framework that developers want to learn and work with. This course covers in detail various aspects of how Hadoop works, along with an introduction to Spark.

Course Objectives
  • Acquire conceptual knowledge of Big Data, Data Engineering and Data Science
  • Learn Hadoop architecture and develop MapReduce programs in Java using an IDE such as Eclipse
  • Build queries and process data residing in HDFS using tools like Pig, Hive, Sqoop and HBase
  • Understand the limitations of Hadoop and get to know where Spark shines
Module 1:

Why Big Data and Hadoop?

  • Role Big Data plays in our lives
  • History of Hadoop and its role in the Big Data Space
Module 2:

Setting up a Cloudera VM on the local machine

  • Downloading/Copying and Installing Virtual Box
  • Downloading / Copying the "right" Cloudera VM installer
  • Installing Cloudera bits on the VM
  • Prepping the VM instance for the course (enabling shared folders, bidirectional copying, etc.)
  • Brief discussion on modes of operation in Hadoop (local vs cluster based)
  • Running a sample hadoop jar
Module 3:

HDFS (Hadoop Distributed File System)

  • Role of HDFS in the Big Data world
  • Comparison of Local FS and HDFS
  • Discussion on Name Node and Data Node
  • Anatomy of File Read / Write in HDFS
  • Failures – NameNode / Data Node
  • HDFS Federation – What / Why
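
The division of labour above (the NameNode holds only metadata; DataNodes hold the blocks) can be illustrated with a back-of-the-envelope calculation. A minimal sketch, assuming the common Hadoop defaults of a 128 MB block size and a replication factor of 3 (both configurable via dfs.blocksize and dfs.replication; the function name here is ours, not a Hadoop API):

```python
import math

# Common Hadoop defaults, assumed for illustration only.
BLOCK_SIZE = 128 * 1024 * 1024   # 128 MB per block
REPLICATION = 3                  # each block stored on 3 DataNodes

def hdfs_storage_estimate(file_size_bytes):
    """Estimate HDFS layout for a file: how many blocks the NameNode
    must track, and raw cluster storage consumed after replication."""
    blocks = math.ceil(file_size_bytes / BLOCK_SIZE)
    raw_bytes = file_size_bytes * REPLICATION
    return blocks, raw_bytes

# A 1 GB file: 8 blocks of metadata on the NameNode, 3 GB of raw storage.
blocks, raw = hdfs_storage_estimate(1024 * 1024 * 1024)
```

The NameNode keeps this block map in memory, which is one reason HDFS favours a modest number of large files over millions of small ones.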
Module 4:

MapReduce and YARN

  • Map Reduce explained with a simple real-life example
  • Map Reduce introduced with the WordCount program (Demo)
  • YARN – Introduction
    • Components, Failures, Scheduling
    • Tool / Tool Runner Classes
    • Key- Value Pairs
    • Input / Output File Formats
    • Combiners / Partitioners
    • Shuffle and Sort
    • Counters
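
The map / shuffle-and-sort / reduce flow listed above can be sketched in plain Python, Hadoop Streaming style, with no cluster required. This is a conceptual simulation of WordCount, not Hadoop's actual API:

```python
from collections import defaultdict

def mapper(line):
    # Map phase: emit a (word, 1) key-value pair for every word.
    for word in line.split():
        yield (word.lower(), 1)

def shuffle_and_sort(pairs):
    # The framework groups all values for the same key and sorts by key
    # before handing them to reducers.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())

def reducer(key, values):
    # Reduce phase: aggregate all values seen for one key.
    return (key, sum(values))

lines = ["the quick brown fox", "the lazy dog"]
pairs = [p for line in lines for p in mapper(line)]
counts = dict(reducer(k, v) for k, v in shuffle_and_sort(pairs))
# counts["the"] == 2; every other word appears once
```

A combiner would apply the same summing logic on each mapper's local output before the shuffle, and a partitioner would decide (typically by hashing the key) which reducer receives each group.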
Module 5:

Pig

  • Need / History of Pig
  • Installing / Configuring Pig - Modes of Execution
  • Discussion on Pig Latin
    • Writing UDFs
Module 6:

Hive

  • Need / History of Hive
  • Installing / Configuring / Modes of Execution
  • HiveQL Vs SQL (RDBMS) – a very brief discussion
  • Deep Dive into HiveQL
    1. Managed / External Tables
    2. Buckets / partitions
    3. CTAS / Multiple Inserts
    4. Different ways of interacting with Hive (shell / hue)
    5. Writing UDFs and using them in Hive
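
Besides Java UDFs, Hive can pipe rows through an external script with SELECT TRANSFORM ... USING, a common way to plug Python logic into HiveQL. A minimal sketch (the table layout and script name are hypothetical): rows arrive on stdin as tab-separated lines, and the script upper-cases the second column.

```python
import sys

def transform(line):
    # Hive streams each row as a tab-separated line; emit the same
    # layout back, with the second column upper-cased.
    fields = line.rstrip("\n").split("\t")
    fields[1] = fields[1].upper()
    return "\t".join(fields)

if __name__ == "__main__":
    for line in sys.stdin:
        print(transform(line))
```

After registering the script with ADD FILE, it could be invoked roughly as: SELECT TRANSFORM (id, name) USING 'python upper_name.py' AS (id, name) FROM people;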
Module 7:

Sqoop

  • Why Sqoop?
  • Installing / Configuring
  • Importing Data into Hadoop using Sqoop
    1. Import into HDFS
    2. Direct import into Hive
  • Exporting Data from Hadoop using Sqoop
Module 8:

Oozie

  • What purpose does Oozie serve?
  • Discussion on Workflow.xml and Job.Properties files for various tools
  • Discussion on Action / Control nodes
  • Demo involving multiple operations using Hue Editor / Commands (from terminal)
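
To make the workflow.xml discussion concrete, here is a minimal, hypothetical workflow with a single Hive action wired between start, end and kill nodes. Schema versions, names and paths are illustrative, not values from this course:

```xml
<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.5">
  <start to="run-hive"/>
  <action name="run-hive">
    <hive xmlns="uri:oozie:hive-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>query.hql</script>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Hive action failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

start, end and kill are control nodes; the hive element is an action node. The ${jobTracker} and ${nameNode} placeholders are resolved from the accompanying job.properties file.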
Module 9:

HBase

  • Downloading / Configuring and Installing HBase
  • Why HBase? HBase Vs RDBMS
  • Structure of a Table, Regions, Region Servers, Meta Tables
  • Basic Commands (get, put, scan, etc.)
  • Loading Data into an HBase table (bulk loading using MapReduce)
  • Integration with Pig / Hive Tools
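
The table structure and basic commands above can be mimicked with a toy in-memory model. This is a deliberate simplification (real HBase also versions each cell by timestamp and splits the sorted row space into regions served by region servers); the class and method names are ours:

```python
from collections import defaultdict

class TinyHBaseTable:
    """Toy model of an HBase table: row key -> 'family:qualifier' -> value."""

    def __init__(self):
        self.rows = defaultdict(dict)

    def put(self, row_key, column, value):
        # put overwrites the latest value of one cell.
        self.rows[row_key][column] = value

    def get(self, row_key):
        # get fetches every cell of a single row.
        return dict(self.rows.get(row_key, {}))

    def scan(self, start=None, stop=None):
        # Rows are kept sorted by key, so a scan walks a key range.
        for key in sorted(self.rows):
            if (start is None or key >= start) and (stop is None or key < stop):
                yield key, dict(self.rows[key])

t = TinyHBaseTable()
t.put("row1", "cf:name", "alice")
t.put("row2", "cf:name", "bob")
```

Because rows are stored sorted by key, scanning a key range is cheap; this is why row-key design matters so much in HBase.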
Module 10:

Introduction to Spark

  • Limitations of Hadoop
  • Discussion on Spark Architecture
  • Purpose of Spark Libraries
  • Demo using Streaming and Machine Learning modules of Spark
  • Where to go from here?

Instructors are handpicked from a select group of industry experts and mentors, and trained to deliver the best online learning experience. All instructors have at least ten years of industry experience and extensive functional expertise in the fields they teach.


A test will be conducted at the end of the course. Candidates who complete the test with a minimum of 70% marks will be issued a certificate of successful completion from NIIT.

One re-attempt will be provided in case the candidate scores less than 70%.

A Participation certificate will be issued if the candidate does not score 70% after five attempts.


The prerequisites for the program are:

  • The learner should have basic programming knowledge in any software programming language.
  • The learner should have a good working knowledge of SQL Server.

Who should join this course?

This course is for software programmers keen on making a career in Big Data Analytics. Big data analytics unveils market trends, customer preferences and other useful business information. Big Data is the biggest game-changing opportunity for marketing and sales.

Will there be any project in the program?

Yes, you will implement a project during the course. The project will help you apply what you have learned. The details will be shared in the first orientation session of the course.

What happens if I miss a session?

All the live sessions are recorded and available for later view. Learners can refer to recordings of a missed session at their convenience.

What is your refund policy?

Upon registering for the course, if for some reason you are unable or unwilling to participate further, you can apply for a refund. You can initiate the refund any time before the start of the second session of the course by sending an email to , with your enrolment details and bank account details (where you want the amount to be transferred). Once you initiate a refund request, you will receive the amount within 21 days of confirmation and verification by our team, provided you have not downloaded any courseware after registration.

What are the minimum system requirements to attend the program?

    Minimum system requirements for accessing the courses are:

  • Personal computer or Laptop with web camera
  • Headphone with Mic
  • Broadband connection with a minimum bandwidth of 1 Mbps (2 Mbps recommended)
  • A self-diagnostic test to check whether your system meets the necessary requirements is available at

    Please note that the webcam, mic and internet speed cannot be verified through this link.

Is there an official support desk for technical guidance during the training program?

Yes. For immediate technical support, you can reach out on 91-9717992809 or 0124-4917203 between 9:00 AM and 8:00 PM IST. You can write to for all other queries and our team will be happy to help you.
