Big Data Applications using Hadoop

Hadoop has emerged as one of the most popular frameworks for storing and processing huge volumes of data. This course covers various aspects of how Hadoop works.

  • 24 hours of live, expert instructor-led training
  • 15 hours of self-paced learning
  • 15 hours of capstone project
  • Verified certificate from NIIT

Online Instructor-Led

About Big Data Applications using Hadoop

The field of Big Data is exploding every day, with new software tools and technologies enabling us to answer questions that we could not have answered even a few years ago. Its impact has been felt in almost every sphere of life, including transportation, health, education, commerce, communications and politics. Hadoop has probably emerged as one of the most popular frameworks for storing and processing huge volumes of data. Together with the tools developed around it, such as Pig, Hive, Sqoop, Oozie, Flume, HBase and Spark (which can process streaming data and apply machine learning algorithms to massive datasets), Hadoop is becoming the standard framework that developers want to learn and work with. This course covers in detail various aspects of how Hadoop works, along with an introduction to Spark.

Course Objectives
  • Acquire conceptual knowledge of Big Data, Data Engineering and Data Science
  • Learn Hadoop architecture and develop MapReduce programs in Java using an IDE such as Eclipse
  • Build queries and process data residing in HDFS using tools like Pig, Hive, Sqoop and HBase
  • Understand the limitations of Hadoop and get to know where Spark shines
Curriculum
Module 1:

Why Big Data and Hadoop?

  • Role Big Data plays in our lives
  • History of Hadoop and its role in the Big Data space
Module 2:

Setting up a Cloudera VM on the local machine

  • Downloading / Copying and Installing VirtualBox
  • Downloading / Copying the "right" Cloudera VM installer
  • Installing Cloudera bits on the VM
  • Preparing the VM instance for the course (enabling shared folders, bidirectional copying, etc.)
  • Brief discussion on modes of operation in Hadoop (local vs cluster based)
  • Running a sample Hadoop JAR
Module 3:

HDFS

  • Role of HDFS in the Big Data world
  • Comparison of Local FS and HDFS
  • Discussion on NameNode and DataNode
  • Anatomy of File Read / Write in HDFS
  • Failures – NameNode / DataNode
  • HDFS Federation – What / Why
Module 4:

MapReduce

  • MapReduce explained with a simple real-life example
  • MapReduce introduced with the WordCount program (Demo)
  • YARN – Introduction
    • Components, Failures, Scheduling
  • Tool / ToolRunner Classes
  • Key-Value Pairs
  • Input / Output File Formats
  • Combiners / Partitioners
  • Shuffle and Sort
  • Counters
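The WordCount flow covered in this module (map, combine, partition, shuffle and sort, reduce) can be sketched as a small in-memory simulation in plain Python. This is only an illustration under stated assumptions: it is not Hadoop's API and not the course's demo code (the course develops MapReduce programs in Java), all function names are hypothetical, each input line stands in for one mapper's input split, and the partitioner uses CRC32 as an illustrative stand-in for Hadoop's default hash-based partitioning.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Pair = Tuple[str, int]


def map_phase(line: str) -> Iterator[Pair]:
    """Mapper: emit (word, 1) for every whitespace-separated, lower-cased token."""
    for word in line.lower().split():
        yield word, 1


def combine(pairs: Iterable[Pair]) -> List[Pair]:
    """Combiner: sum counts locally on one mapper's output before the shuffle."""
    local: Dict[str, int] = defaultdict(int)
    for word, count in pairs:
        local[word] += count
    return list(local.items())


def partition(key: str, num_reducers: int) -> int:
    """Partitioner (hypothetical): pick a reducer by hashing the key; CRC32 keeps it deterministic."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle_and_sort(pairs: Iterable[Pair], num_reducers: int) -> List[List[Tuple[str, List[int]]]]:
    """Shuffle: route pairs to reducers; sort: group each reducer's values by key, in key order."""
    buckets: List[Dict[str, List[int]]] = [defaultdict(list) for _ in range(num_reducers)]
    for word, count in pairs:
        buckets[partition(word, num_reducers)][word].append(count)
    return [sorted(bucket.items()) for bucket in buckets]


def reduce_phase(word: str, counts: List[int]) -> Pair:
    """Reducer: add up all partial counts for one word."""
    return word, sum(counts)


def word_count(lines: Iterable[str], num_reducers: int = 2) -> Dict[str, int]:
    # Each input line plays the role of one mapper's input split.
    mapped: List[Pair] = []
    for line in lines:
        mapped.extend(combine(map_phase(line)))

    result: Dict[str, int] = {}
    for reducer_input in shuffle_and_sort(mapped, num_reducers):
        for word, counts in reducer_input:
            key, total = reduce_phase(word, counts)
            result[key] = total
    return result


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    for word, total in sorted(word_count(docs).items()):
        print(word, total)
```

Running the script prints each word with its total count. Because the combiner sums counts per mapper before the shuffle, fewer (word, count) pairs reach the reducers, which is the same motivation behind combiners in a real Hadoop job.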
Module 5:

Pig

  • Need / History of Pig
  • Installing / Configuring Pig - Modes of Execution
  • Discussion on Pig Latin
    • LOAD/DESCRIBE/ILLUSTRATE/DUMP commands
    • FILTER / FOREACH / GENERATE / GROUP commands
    • UNION / SPLIT
    • Writing UDFs
Module 6:

Hive

  • Need / History of Hive
  • Installing / Configuring / Modes of Execution
  • HiveQL vs SQL (RDBMS) – a very brief discussion
  • Deep Dive into HiveQL
    1. Managed / External Tables
    2. Buckets / Partitions
    3. CTAS / Multiple Inserts
    4. Different ways of interacting with Hive (shell / Hue)
    5. Writing UDFs and using them in Hive
Module 7:

Sqoop

  • Why Sqoop?
  • Installing / Configuring
  • Importing Data into Hadoop using Sqoop
    1. Import into HDFS
    2. Direct import into Hive
  • Exporting Data from Hadoop using Sqoop
Module 8:

Oozie

  • What purpose does Oozie serve?
  • Discussion on workflow.xml and job.properties files for various tools
  • Discussion on Action / Control nodes
  • Demo involving multiple operations using the Hue editor / commands (from the terminal)
Module 9:

HBase

  • Downloading / Configuring and Installing HBase
  • Why HBase? HBase vs RDBMS
  • Structure of a Table, Regions, Region Servers, Meta Tables
  • Basic Commands (get, put, scan, etc.)
  • Loading Data into an HBase table (Bulk loading using MapReduce)
  • Integration with Pig / Hive Tools
Module 10:

Introduction to Spark

  • Limitations of Hadoop
  • Discussion on Spark Architecture
  • Purpose of Spark Libraries
  • Demo using Streaming and Machine Learning modules of Spark
  • Where to go from here?
Instructor
 

Instructors are handpicked from a select group of industry experts and mentors and are trained to deliver the best online learning experience. All training.com instructors have at least ten years of industry experience and extensive functional expertise in the fields they teach.

Certification

A test will be conducted at the end of the course. Candidates who score at least 70% on the test will receive a certificate of successful completion from NIIT, issued by training.com.

One re-attempt will be provided in case the candidate scores less than 70%.

A participation certificate will be issued if the candidate does not score 70% after five attempts.

Pre-requisites

The prerequisites for the program are:

  • The learner should have basic programming knowledge in at least one programming language.
  • The learner should have a good working knowledge of SQL Server.
FAQs

Will there be any project in the program?

Yes, you will be implementing a project during the course. The project will help you apply what you have learned during the course. The details of the project will be shared in the first orientation session of the course.

Who should join this course?

This course is for software programmers keen on making a career in Big Data Analytics. Big data analytics unveils market trends, customer preferences and other useful business information. Big Data is the biggest game-changing opportunity for marketing and sales.

Why should I join this course?

Big data analytics contributed about one-fifth of the nation’s KPO market, which is considered to be worth almost $5.6 billion. Big Data Hadoop architects are in great demand, with average salaries ranging from USD 85,000 to USD 115,000. A survey by McKinsey projected a shortage of 1.7 million professionals with big data skills in the U.S. alone by 2018, providing ample new opportunities for Hadoop architects to grow and expand their careers.

What happens if I miss a session?

All live sessions are recorded and available for later viewing. Learners can refer to the recording of a missed session at their convenience.

Where can I find my session schedule?

The session schedule will be available in the Learning Plan section of the training.com Student portal. You can log in to your training.com account to view it.

Do you provide any study materials?

The study material will be available in the Resources section of the training.com Student portal. You can log in to your training.com account to view it.

What is your refund policy?

If, after registering for the course, you are unable or unwilling to continue for any reason, you can apply for a refund. You can initiate the refund any time before the start of the second session of the course by sending an email to support@training.com with your enrolment details and bank account details (the account to which the amount should be transferred). Once your refund request has been confirmed and verified by our team, you will receive the amount within 21 days. Refunds are provided only if you have not downloaded any courseware after registration.

What are the minimum system requirements to attend the program?

Minimum system requirements for accessing the courses are:

  • Personal computer or laptop with a web camera
  • Headphones with a microphone
  • Broadband connection with a minimum bandwidth of 1 Mbps (2 Mbps recommended)
  • A self-diagnostic test to check that your system meets the necessary requirements is available at:

    https://na1cps.adobeconnect.com/common/help/en/support/meeting_test.htm

    Please note that the webcam, microphone and internet speed cannot be verified through this link.

Is there an official support desk for technical guidance during the training program?

Yes. For immediate technical support during the live online classroom sessions, you can call 91-9717992809 or 0124-4917203 between 9:00 AM and 8:00 PM IST. You can write to support@training.com for all other queries and our team will be happy to help you.

Related Courses

AI and Deep Learning with TensorFlow
AWS Certification and Training Program
Active Directory® Services with Windows Server®
Administering Microsoft Exchange Server 2016
Administering Microsoft® SQL Server® 2014 Databases
Administering System Center Configuration Manager and Intune
Administering Windows Server® 2012
Administering the Web Server IIS Role of Windows Server
Administration Essentials for New Admins- Salesforce
Advanced Automated Administration with Windows PowerShell®
Advanced Data Mining projects with R
Advanced Pay Per Click
Advanced Program in Data Sciences
Advanced Social Media Marketing
Advanced Solutions of Microsoft Exchange Server 2013
Advanced Solutions of Microsoft® SharePoint® Server 2013
Analyzing Data with Power BI
Analyzing and Visualizing Data with Excel
Analyzing and Visualizing Data with Power BI
Android Game Development for Beginners
Application Development with Swift 2
Automated UI Testing in Java
Automating Administration with Windows PowerShell®
Big Data Analytics with R
Big Data Applications using Hadoop
Building Android Games with OpenGL ES
Building Applications with Ext JS
Building Applications with Force.com
Building a Data Mart with Pentaho Data Integration
Building iOS 10 Applications with Swift
Building Web Applications with Spring MVC
Business Analytics using R from KPMG
Certified Digital Marketing Professional
Cloud and Datacenter Monitoring with System Center Operations Manager
Complete Web and Social Media Analytics
Configuring Advanced Windows Server® 2012 Services
Core Solutions of Microsoft® Exchange Server 2013
Core Solutions of Microsoft® SharePoint® Server 2013
Core Solutions of Skype for Business 2015
Data Quality 9.x: Developer, Level 1
Data Science Orientation
Data Science with R
Data Science with Spark
Deploying Windows Desktops and Enterprise Applications
Designing and Deploying Microsoft Exchange Server 2016
Designing and Implementing a Server Infrastructure
DevOps Certification Training
Developing Microsoft Azure Solutions
Developing Microsoft SharePoint® Server 2013
Developing Microsoft SharePoint® Server 2013 Core Solutions
Developing SQL Databases
Enabling and Managing Microsoft Office 365
Executive Program in Applied Finance
Executive Program in Digital and Social Media Marketing Strategy
Fundamentals of a Windows Server® Infrastructure
GNIIT Foundation
Getting Started with R for Data Science
Getting started with Apache Solr Search Server
IBM Cognos Connection and Workspace Advanced
IT Service Management with System Center Service Manager
Implementing Microsoft Azure Infrastructure Solutions
Implementing Microsoft Azure Solutions-70-533
Implementing a Data Warehouse with Microsoft® SQL Server® 2014
Informatica PowerCenter 9.x Level 1
Installing and Configuring Windows 10
Installing and Configuring Windows Server® 2012
Introducing Rails 5 Learning Web Development the Ruby Way
Introduction to ITIL
Introduction to SQL Databases
Introduction to Web Development with Microsoft Visual Studio 2010
Java Enterprise Apps with DevOps
Joomla Certification Training Program
Julia for Data Science
LEAD (Learn. Enhance. Aspire. Deliver)
Learning Android N Application Development
Learning Data Mining with R
Learning Joomla 3 Extension Development
Learning MongoDB
Learning R for Data Visualization
Learning Spring Boot
Learning Swift 2
Linux shell scripting solution
Machine Learning with Python
Marketing Analytics Data Tools and Techniques
Master AngularJS 2
Mastering Magento
Open Source Web App Development using MEAN Stack
PMI® Agile Certified Practitioner Training
Pentaho Reporting
Performance Tuning and Optimizing SQL Databases
Planning and Deploying System Center 2012 Configuration Manager
Post Graduate Certificate in General Management (PGCGM)
Programming Using Python
Programming in C Sharp
Programming in HTML5 with JavaScript and CSS3
Programming with Python for Data Sciences
Project Management Professional (PMP®) Training
Querying Data with Transact SQL
Querying Microsoft SQL Server® 2014
R Data Mining Projects
R for Data Science Solutions
Reactive Java 9
SAS Certification Training Program
Secrets of Viral Video Marketing
Selenium with Java
Six Sigma Certification Training Program
Spring Security
Supply Chain Management (SCM) Training Program
Supporting and Troubleshooting Windows 10
Teradata Certification Training
Test Driven Android
UNIX Shell Scripting Training
Upgrading Your Skills to MCSA Windows Server 2016
Upgrading Your Skills to MCSA Windows Server® 2012
Web Apps Development using Node.js along with Express.js and MongoDB
Web Apps Development with HTML5, CSS3, jQuery & Bootstrap
Web Development with Node.JS and MongoDB
iOS App Development Certification Training
jQuery UI Development