The field of Big Data is exploding with each passing day, with new software tools and technologies enabling us to answer questions that we could not have answered even a few years ago. Its impact has been felt in almost every sphere of life: transportation, health, education, commerce, communications and politics, to name a few. Hadoop has emerged as one of the most popular frameworks for storing and processing huge volumes of data. The tools that have grown up around Hadoop — Pig, Hive, Sqoop, Oozie, Flume and HBase, to name a few — have allowed engineers to put Hadoop to great use with tools and languages they are already familiar with. Spark is a more recent development; with its ability to process streaming data and apply machine learning algorithms to massive datasets, it is fast becoming the framework that developers want to learn and work with. This course covers in detail how Hadoop works, and also introduces you to Spark.
At the end of the course you will be able to:
- Define Big Data and understand Data Engineering and Data Science concepts and roles
- Understand Hadoop Architecture and develop MapReduce programs using Java and an IDE like Eclipse
- Use tools like Pig, Hive, Sqoop and HBase to store, query and process data residing in HDFS
- Understand the limitations of Hadoop and get to know where Spark shines.
Why Big Data and Hadoop?
- Role Big Data plays in our lives.
- History of Hadoop and its role in the Big Data Space.
Setting up a Cloudera VM on the local machine
- Downloading / copying and installing VirtualBox
- Downloading / copying the "right" Cloudera VM installer
- Installing the Cloudera bits on the VM
- Preparing the VM instance for the course (enabling shared folders, bidirectional copying, etc.)
- Brief discussion on modes of operation in Hadoop (local vs cluster based)
- Running a sample Hadoop jar
HDFS
- Role of HDFS in the Big Data world.
- Comparison of Local FS and HDFS.
- Discussion on NameNode and DataNode.
- Anatomy of File Read / Write in HDFS.
- Failures – NameNode / DataNode
- HDFS Federation – What / Why
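To make the Local FS vs HDFS comparison concrete, day-to-day HDFS interaction mirrors familiar file-system commands. A few representative `hdfs dfs` commands (the paths are illustrative only):

```
hdfs dfs -mkdir -p /user/cloudera/demo       # create a directory in HDFS
hdfs dfs -put notes.txt /user/cloudera/demo  # copy a local file into HDFS
hdfs dfs -ls /user/cloudera/demo             # list the directory
hdfs dfs -cat /user/cloudera/demo/notes.txt  # print file contents
```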
MapReduce and YARN
- MapReduce explained with a simple real-life example
- MapReduce introduced with the WordCount program (Demo)
- YARN – Introduction
- Components, Failures, Scheduling
- Tool / ToolRunner Classes
- Key- Value Pairs
- Input / Output File Formats
- Combiners / Partitioners
- Shuffle and Sort
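The WordCount idea behind the demo above can be sketched in plain Java — no Hadoop APIs, and all class and variable names here are illustrative — to show how (word, 1) key-value pairs emitted by the map phase are grouped (shuffle and sort) and summed in the reduce phase:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {
    // "Map" phase: emit a (word, 1) pair for every word in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // "Shuffle and sort" + "Reduce": group pairs by key, then sum each group.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>(); // keys kept sorted, like the shuffle phase
        for (Map.Entry<String, Integer> p : pairs) {
            counts.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : new String[] {"big data is big", "hadoop handles big data"}) {
            pairs.addAll(map(line));
        }
        System.out.println(reduce(pairs)); // {big=3, data=2, hadoop=1, handles=1, is=1}
    }
}
```

In real Hadoop code the same logic is split across `Mapper` and `Reducer` classes, and the framework performs the shuffle across the cluster.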
Pig
- Need / History of Pig
- Installing / Configuring Pig - Modes of Execution
- Discussion on Pig Latin
- LOAD/DESCRIBE/ILLUSTRATE/DUMP commands
- FILTER / FOREACH / GENERATE / GROUP commands
- UNION / SPLIT
- Writing UDFs
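As a taste of how the Pig Latin commands listed above fit together, here is an illustrative script (the file name and field names are hypothetical):

```pig
-- Load a hypothetical tab-separated file of (name, age) records from HDFS
people  = LOAD '/user/cloudera/people.txt' AS (name:chararray, age:int);
DESCRIBE people;                          -- show the schema
adults  = FILTER people BY age >= 18;     -- keep only adult records
grouped = GROUP adults BY age;            -- group records by age
counts  = FOREACH grouped GENERATE group AS age, COUNT(adults) AS n;
DUMP counts;                              -- print results to the console
```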
Hive
- Need / History of Hive
- Installing / Configuring / Modes of Execution
- HiveQL vs SQL (RDBMS) – a very brief discussion
- Deep Dive into HiveQL
- Managed / External Tables
- Buckets / partitions
- CTAS / Multiple Inserts
- Different ways of interacting with Hive (shell / hue)
- Writing UDFs and using them in Hive
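To illustrate a few of the HiveQL topics listed above (managed vs external tables, partitions, CTAS), a short sketch with hypothetical table and column names:

```sql
-- Managed table: Hive owns the data; DROP TABLE deletes it
CREATE TABLE sales (id INT, amount DOUBLE)
  PARTITIONED BY (yr INT)                 -- partitions map to HDFS subdirectories
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- External table: Hive tracks only metadata; the data stays in HDFS on DROP
CREATE EXTERNAL TABLE raw_sales (id INT, amount DOUBLE)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  LOCATION '/user/cloudera/raw_sales';

-- CTAS: create a table directly from a query result
CREATE TABLE big_sales AS
  SELECT id, amount FROM raw_sales WHERE amount > 1000;
```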
Sqoop
- Why Sqoop?
- Installing / Configuring
- Importing Data into Hadoop using Sqoop
- Import into HDFS
- Direct import into Hive
- Exporting Data from Hadoop using Sqoop
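By way of illustration, typical Sqoop import and export invocations look like the following (the connection string, database, table names and paths are all placeholders):

```
# Import an RDBMS table into HDFS
sqoop import \
  --connect jdbc:mysql://dbhost/shop --username dbuser -P \
  --table orders --target-dir /user/cloudera/orders

# Import directly into a Hive table instead
sqoop import \
  --connect jdbc:mysql://dbhost/shop --username dbuser -P \
  --table orders --hive-import

# Export processed results from HDFS back to the RDBMS
sqoop export \
  --connect jdbc:mysql://dbhost/shop --username dbuser -P \
  --table order_totals --export-dir /user/cloudera/order_totals
```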
Oozie
- What purpose does Oozie serve?
- Discussion on workflow.xml and job.properties files for various tools
- Discussion on Action / Control nodes
- Demo involving multiple operations using Hue Editor / Commands (from terminal)
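For orientation, a minimal workflow.xml wiring a single MapReduce action between control nodes might look roughly like this (the names, paths and schema version are illustrative):

```xml
<workflow-app name="wordcount-wf" xmlns="uri:oozie:workflow:0.5">
  <start to="wordcount"/>            <!-- control node: entry point -->
  <action name="wordcount">          <!-- action node: runs the MR job -->
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>mapred.input.dir</name>
          <value>/user/cloudera/input</value>
        </property>
        <property>
          <name>mapred.output.dir</name>
          <value>/user/cloudera/output</value>
        </property>
      </configuration>
    </map-reduce>
    <ok to="end"/>                   <!-- control node: where to go on success -->
    <error to="fail"/>               <!-- control node: where to go on failure -->
  </action>
  <kill name="fail">
    <message>Job failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
```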
HBase
- Downloading / Configuring and Installing HBase
- Why HBase? HBase vs RDBMS
- Structure of a Table, Regions, Region Servers, Meta Tables
- Basic Commands (get, put, scan, etc.)
- Loading Data into an HBase table (bulk loading using MapReduce)
- Integration with Pig / Hive Tools
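The basic HBase shell commands covered above look like this in practice (the table and column-family names are made up):

```
hbase> create 'users', 'info'                    # table with one column family
hbase> put 'users', 'row1', 'info:name', 'Asha'  # write one cell
hbase> get 'users', 'row1'                       # read one row
hbase> scan 'users'                              # read all rows
```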
Introduction to Spark
- Limitations of Hadoop
- Discussion on Spark Architecture
- Purpose of Spark Libraries
- Demo using Streaming and Machine Learning modules of Spark
- Where to go from here?
Mr. Sriram R will be the instructor for the batch. He has 20+ years of experience in varied roles in the IT industry, having worked in organizations like Microsoft, Citibank and Ramco Systems, to name a few. His areas of expertise are Big Data, Data Science and the Internet of Things.
After successful completion of the training, you will get a verified certificate from NIIT.
The learner should have basic programming knowledge (in any language), and a good working knowledge of SQL is essential.
Is there any prerequisite for this program?
Familiarity with any programming language is desired as you will be introduced to writing MapReduce programs using Java. Decent knowledge of SQL is essential.
Who should join this course?
This course is suitable for Software Developers who want to grow in the field of Big Data.
Will there be any project in the program?
Yes, you will be implementing a project during the course. The project will help you apply what you have learnt during the course. The details of the project will be shared in the first orientation session.
What happens if I miss a session?
In addition to the study material, we provide you with recordings of each session that you can view anytime you want. So if you miss a live class, you can refer to the recording of that session.
What is your refund policy?
After registering for the program, if for some reason you are unable or unwilling to participate, you can apply for a refund. You can initiate a refund any time before the second session of the course by sending a mail to Support@training.com with your enrolment details and bank account details (where you want the amount to be transferred). Once you initiate a refund request, you will receive the amount within 21 days of confirmation of the request, provided you have not downloaded any courseware for the course.
What are the minimum system requirements to attend the program?
- A PC/Laptop
- Web Cam
- Headphone with Mic
- Minimum 1 Mbps broadband connection
- This self-diagnostic test will verify if you meet the necessary requirements (webcam, mic and internet speed cannot be verified through this link)