Data Science with Spark

This practical, hands-on tutorial covers the fundamentals of Spark needed to get to grips with Data Science through a single data set. It covers the next stage of the learning curve for those who are comfortable with Spark programming and want to apply Spark in the field of Data Science.

  • Comprehensive training through 40 video sessions.
  • Understand the Spark programming model and its ecosystem of packages for Data Science
  • Understand Spark's machine learning algorithms to build a simple pipeline
  • Apply data mining techniques to the available data sets
    Self-Paced

    About Data Science with Spark

    The real power and value proposition of Apache Spark lie in its speed and in its platform for executing Data Science tasks. Spark is distinctive in combining ETL, batch analytics, real-time stream analysis, machine learning, graph processing, and visualization, which allows Data Scientists to tackle the complexities that come with raw, unstructured data sets. Spark also aims to ease the transition from working on a single machine to working on a cluster, which makes data science tasks a lot more agile.

    In this course, you’ll get a hands-on technical resource that will enable you to become comfortable and confident working with Spark for Data Science. We won't just explore Spark’s Data Science libraries; we’ll dive deeper and expand on the topics.

    This course starts by taking you through Spark and the steps needed to build machine learning applications. You will learn to collect, clean, and visualize data coming from Twitter with Spark Streaming. Then, you will get acquainted with Spark machine learning algorithms and different machine learning techniques. You will also learn to apply statistical analysis and mining operations to the Tweet dataset. Finally, the course ends with some ideas on how to perform more advanced analysis, including graph processing. By the end of the course, you will be able to present your data science work in a visual way that is comprehensive and appealing to business and other stakeholders.

    Course Objectives
    • Understand the Spark programming model and its ecosystem of packages for Data Science
    • Obtain and clean data before processing it
    • Understand Spark's machine learning algorithms to build a simple pipeline
    • Work with interactive visualization packages in Spark
    • Apply data mining techniques to the available data sets
    • Build a recommendation engine
    Curriculum
    Module 1:

    Your Spark And Visualization Toolkit

    • The Course Overview
    • Spark: Origins and Ecosystem for Big Data Scientists, the Scala, Python, and R flavors
    • Install Spark on Your Laptop with Docker, or Scale Fast in the Cloud
    • Apache Zeppelin, a Web-Based Notebook for Spark with matplotlib and ggplot2
    Module 2:

    Your Next Data Challenges

    • Manipulating Data with the Core RDD API
    • Using Dataframe, Dataset, and SQL – Natural and Easy!
    • Manipulating Rows and Columns
    Module 3:

    First Steps With Spark Visualization

    • Discovering spark.ml and spark.mllib - and Other Libraries
    • Wrapping Up Basic Statistics and Linear Algebra
    • Cleansing Data and Engineering the Features
    • Reducing the Dimensionality
    Module 4:

    Collecting And Cleansing The Dirty Tweets

    • Streaming Tweets to Disk
    • Streaming Tweets on a Map
    • Cleansing and Building Your Reference Dataset
    Module 5:

    Statistical Analysis On Tweets

    • Indicators, Correlations, and Sampling
    • Validating Statistical Relevance
    • Running SVD and PCA
    Module 6:

    Extracting Features From The Tweets

    • Analyzing Free Text from the Tweets
    • Dealing with Stemming, Syntax, Idioms and Hashtags
    • Detecting Tweet Sentiment
    Module 7:

    Mine Data And Share Results

    • Word Cloudify Your Dataset
    • Locating Users and Displaying Heatmaps with GeoHash
    • Collaborating on the Same Note with Peers
    Module 8:

    Classifying The Tweets

    • Building the Training and Test Datasets
    • Training a Logistic Regression Model
    • Evaluating Your Classifier
    Module 9:

    Clustering Users

    • Clustering Users by Followers and Friends
    • Clustering Users by Location
    • Running KMeans on a Stream
    Module 10:

    Your Next Data Challenges

    • Recommending Similar Users
    • Analyzing Mentions with GraphX
    • Where to Go from Here
    Instructor

    Eric Charles has 10 years’ experience in the field of Data Science and is the founder of Datalayer (http://datalayer.io/docker), a social network for Data Scientists. He is passionate about using software and mathematics to help companies get insights from data.

    His typical day includes building efficient processing with advanced machine learning algorithms, easy SQL, streaming and graph analytics. He also focuses a lot on visualization and result sharing.

    He is passionate about open source and is an active Apache Member. He regularly gives talks to corporate clients and at open source events. He can be contacted on Twitter at @echarles.

    Certification

    A test will be conducted at the end of the course. On completion of the test with a minimum of 70% marks, training.com will issue a certificate of successful completion from NIIT.

    Five re-attempts will be provided in case the candidate scores less than 70%.

    A Participation certificate will be issued if the candidate does not score 70% after five attempts.

    Pre-requisites

    Math or Statistics studied at school or college level is preferred.

    Familiarity with databases, MIS, and data analysis will make learning easier and quicker.

    FAQs

    Who should go for this Course?

    Software professionals aspiring to a career in Data Science who want to use their analytical ability to interpret rich data stores.

    Data visualisation and statistics aspirants with a keen interest in programming can also join this course and build on what they learn for promising career growth.

    Where can I find my session schedule?

    The session schedule is available in the Learning Plan section of the training.com Student portal. You can log in to your training.com account to view it.

    What is your refund policy?

    If, after registering for the course, you are unable or unwilling to continue, you can apply for a refund. You can initiate the refund any time before the start of the second session of the course by sending an email to support@training.com with your enrolment details and bank account details (where you want the amount to be transferred). Once you initiate a refund request, you will receive the amount within 21 days after confirmation and verification by our team. This applies only if you have not downloaded any courseware after registration.

    Why is it called a Self-Paced course?

    Self-Paced courses consist of several learning videos organised into a course structure broken down into Learning Modules and Sessions. The learner is required to go through the videos topic by topic, in the sequence set by the course structure, to learn the concepts. Because the course is Self-Paced, no external faculty or additional mentor is involved in the learning.

    Since this is a Self-Paced course, how will my attendance be tracked and marked?

    When you log in to your training.com account to watch the videos, your attendance is marked automatically.

    How will the assessment be conducted for my certification?

    After each module, an online multiple-choice assessment will be conducted. Five attempts will be allowed to complete the assessment. The minimum pass percentage for each assessment is 70%. On successfully clearing the assessment, a verified certificate from NIIT will be awarded; otherwise, a certificate of participation will be issued.

    What are the minimum system requirements to attend the course?

    The minimum system requirements for accessing the courses are:

    • Personal computer or laptop with a web camera
    • Headphones with a microphone
    • Minimum 4 Mbps broadband connection

    Is there an official support desk for technical guidance during the training program?

    Yes. For immediate technical support during the live online classroom sessions, you can call 91-9717992809 or 0124-4917203 between 9:00 AM and 8:00 PM IST. For all other queries, you can write to support@training.com and our team will be happy to help you.
