The field of Big Data is growing every day, with new software tools and technologies that let us answer questions we could not have answered even a few years ago. Its impact has been felt in almost every sphere of life, including transportation, health, education, commerce, communications and politics. Hadoop has emerged as one of the most popular frameworks for storing and processing huge volumes of data. The ecosystem of tools built around Hadoop, such as Pig, Hive, Sqoop, Oozie, Flume, HBase and Spark (with Spark's ability to process streaming data and apply machine learning algorithms to massive data sets), is becoming the standard framework that developers want to learn and work with. This course covers in detail how Hadoop works, along with an introduction to Spark.
- Acquire conceptual knowledge of Big Data, Data Engineering and Data Science
- Learn the Hadoop architecture and develop MapReduce programs in Java using an IDE such as Eclipse
- Build queries and process data residing in HDFS using tools like Pig, Hive, Sqoop and HBase
- Understand the limitations of Hadoop and get to know where Spark shines
Why Big Data and Hadoop?
- The role Big Data plays in our lives
- History of Hadoop and its role in the Big Data space
Setting up a Cloudera VM on the local machine
- Downloading/copying and installing VirtualBox
- Downloading/copying the "right" Cloudera VM installer
- Installing Cloudera bits on the VM
- Preparing the VM instance for the course (enabling shared folders, bidirectional copying, etc.)
- Brief discussion of Hadoop's modes of operation (local vs. cluster-based)
- Running a sample Hadoop JAR
Hadoop Distributed File System (HDFS)
- The role of HDFS in the Big Data world
- Comparison of the local file system and HDFS
- Discussion of the NameNode and DataNode
- Anatomy of a file read/write in HDFS
- NameNode and DataNode failures
- HDFS Federation: what it is and why it is needed
MapReduce and YARN
- MapReduce explained with a simple real-life example
- MapReduce introduced with the WordCount program (demo)
- Introduction to YARN
- YARN components, failures and scheduling
- The Tool and ToolRunner classes
- Key-value pairs
- Input/output file formats
- Combiners and partitioners
- Shuffle and sort
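To make the WordCount flow concrete, here is a minimal single-process sketch in Python of the map, shuffle-and-sort and reduce steps. It is an illustration only, not Hadoop's API: the function names (`map_phase`, `shuffle_and_sort`, `reduce_phase`, `word_count`) are hypothetical, and a real Hadoop job would implement Mapper and Reducer classes in Java and let the framework perform the shuffle across the cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper (hypothetical name): emit a (word, 1) key-value pair per word in one line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_and_sort(pairs: Iterable[Tuple[str, int]]) -> List[Tuple[str, List[int]]]:
    """Group values by key and sort by key, mimicking what the framework does
    between the map and reduce phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer (hypothetical name): sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Run map -> shuffle/sort -> reduce over the input lines in a single process."""
    mapped = (pair for line in lines for pair in map_phase(line))
    return [reduce_phase(key, values) for key, values in shuffle_and_sort(mapped)]


if __name__ == "__main__":
    sample = ["the quick brown fox", "the lazy dog", "The fox"]
    for word, count in word_count(sample):
        print(word, count)
```

In a real cluster, many mappers and reducers run in parallel on different nodes, a partitioner decides which reducer receives each key, and a combiner can pre-sum the `(word, 1)` pairs on the map side to reduce the data sent over the network during the shuffle.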
Pig
- The need for Pig and its history
- Installing and configuring Pig; modes of execution
- Discussion of Pig Latin
- LOAD / DESCRIBE / ILLUSTRATE / DUMP commands
- FILTER / FOREACH ... GENERATE / GROUP commands
- UNION / SPLIT
- Writing UDFs
Hive
- The need for Hive and its history
- Installing and configuring Hive; modes of execution
- HiveQL vs. SQL (RDBMS): a brief comparison
- Deep dive into HiveQL
- Managed and external tables
- Buckets and partitions
- CTAS (CREATE TABLE AS SELECT) and multiple inserts
- Different ways of interacting with Hive (shell / Hue)
- Writing UDFs and using them in Hive
Sqoop
- Why Sqoop?
- Installing and configuring Sqoop
- Importing data into Hadoop using Sqoop
  - Import into HDFS
  - Direct import into Hive
- Exporting data from Hadoop using Sqoop
Oozie
- What purpose does Oozie serve?
- Discussion of the workflow.xml and job.properties files for various tools
- Discussion of action and control nodes
- Demo of multiple operations using the Hue editor and terminal commands
HBase
- Downloading, installing and configuring HBase
- Why HBase? HBase vs. RDBMS
- Structure of a table, regions, region servers and meta tables
- Basic commands (get, put, scan, etc.)
- Loading data into an HBase table (bulk loading using MapReduce)
- Integration with Pig and Hive
Introduction to Spark
- Limitations of Hadoop
- Discussion on Spark Architecture
- Purpose of Spark Libraries
- Demo using Streaming and Machine Learning modules of Spark
- Where to go from here?
Instructors are handpicked from a select group of industry experts and mentors and are trained to deliver the best online learning experience. All training.com instructors have at least ten years of industry experience and extensive functional expertise in the fields they teach.
A test will be conducted at the end of the course. Candidates who complete the test with a score of at least 70% will receive a certificate of successful completion from NIIT, issued by training.com.
One re-attempt will be provided if the candidate scores less than 70%.
A participation certificate will be issued if the candidate does not score 70% after five attempts.
The prerequisites for the program are:
- The learner should have basic knowledge of any programming language.
- The learner should have a good working knowledge of SQL Server.
Will there be any project in the program?
Yes, you will implement a project during the course. The project will help you apply what you have learned. Details of the project will be shared in the first orientation session of the course.
Who should join this course?
This course is for software programmers keen on building a career in Big Data analytics. Big Data analytics reveals market trends, customer preferences and other useful business information, which makes it a major opportunity for marketing and sales.
Why should I join this course?
Big Data analytics accounts for about one-fifth of the nation's knowledge process outsourcing (KPO) market, which is estimated to be worth almost $5.6 billion. Big Data Hadoop architects are in great demand, with average salaries ranging from USD 85,000 to USD 115,000. According to a survey by McKinsey, there is expected to be a shortage of 1.7 million professionals with big data skills in the U.S. alone by 2018, providing ample opportunities for Hadoop architects to grow and expand their careers.
What happens if I miss a session?
All live sessions are recorded and available for later viewing. Learners can watch the recording of a missed session at their convenience.
Where can I find my session schedule?
The session schedule will be available in the Learning Plan section of the training.com Student portal. You can log in to your training.com account to view it.
Do you provide any study materials?
The study material will be available in the Resources section of the training.com Student portal. You can log in to your training.com account to view it.
What is your refund policy?
If, after registering for the course, you are unable or unwilling to continue for any reason, you can apply for a refund. You can request a refund any time before the start of the second session of the course by sending an email to firstname.lastname@example.org with your enrolment details and the details of the bank account to which the amount should be transferred. Once our team has confirmed and verified your refund request, you will receive the amount within 21 days. Refunds are available only if you have not downloaded any courseware after registration.
What are the minimum system requirements to attend the program?
- Personal computer or laptop with a webcam
- Headphones with a microphone
- Broadband connection with a minimum bandwidth of 1 Mbps (2 Mbps recommended)
Minimum system requirements for accessing the courses are:
A self-diagnostic test to check whether your system meets the necessary requirements is available at
Please note that the webcam, microphone and internet speed cannot be verified through this link.
Is there an official support desk for technical guidance during the training program?
Yes. For immediate technical support during the live online classroom sessions, you can call 91-9717992809 or 0124-4917203 between 9:00 AM and 8:00 PM IST. You can write to email@example.com for all other queries and our team will be happy to help you.