Hadoopexpress - Big Data Training, Consulting and Development

Introduction to Apache Spark

Duration: 2 to 4 Weeks
Flex Timing and Duration
$1499
This course is an introduction to Apache Spark, a widely used open source cluster computing system for fast reading, writing and analysis of large datasets. The course also includes an introduction to Scala, a language well suited to working with Spark. Learn the architectural concepts of Spark, including its core components, as well as the Scala constructs you need to use them. Understand how to use the Spark APIs from Scala to handle big datasets. Learn how to run parallel jobs and how to use batch processing, stream processing and machine learning with Spark.

Course Syllabus


The syllabus for this course includes an introduction to Apache Spark that covers the history of Spark, its uses and applications in industry, its advantages over other tools in the Big Data ecosystem, and how the Spark architecture is built around components such as Spark Streaming, MLlib, GraphX, Spark SQL and SparkR. Because Spark is written in Scala, the course also includes an introduction to Scala, followed by RDD fundamentals, common operations, and aggregations using pair RDDs. The runtime architecture is discussed in terms of driver and executor processes in common deployment environments. Tuning and debugging, Spark SQL, Spark Streaming and machine learning with Spark are also covered.

Course Structure


  • Introduction to Data Analysis with Spark
  • Intro to Scala
  • Installation and Overview
  • Spark RDDs
  • RDD Fundamentals
  • Common Operations
  • RDD Conversions
  • Aggregations by using Pair RDDs
  • Loading and Saving Data
  • Runtime Architecture
  • Common Deployment Environments
    • Built-in Cluster Manager
    • Running with Yarn
    • Running with Mesos
    • Running on Amazon EC2
  • Tuning and Debugging
  • Spark SQL
  • Spark Streaming
  • Machine learning with Spark

Course Logistics


The course is spread over five days, with four hours of learning in each session. There is a short break of ten minutes after every hour to hour and a half.

Opportunities after the course


Spark has created a huge stir in the market, and over a thousand companies have gone into production with it. Spark skills are in demand at companies adopting Big Data technology across sectors such as banking, finance, retail, life sciences and telecom. There is a shortage of Spark developers because the technology is new and on-the-job training is rarely available. Those who combine knowledge of Hadoop with Spark have great opportunities in the job market and strong potential for career advancement.

Sessions

[ET] 9 am - 1.30 pm

Delivery Method
Instructor Based $ 1499

Additional Batches
Course at a Glance
  • English
  • Skill Level: Intermediate
Online Classes
Assignments: 5
Project: 1
Lifetime Access
Certificates
System Requirements


This course can be taken in a classroom or remotely. For those attending in a classroom, a computer can be provided on request for hands-on labs. Those attending remotely need a laptop or desktop with a built-in microphone and speaker, or an external headset, so that they can hear and speak with the instructor. A high-speed internet connection is required when connecting remotely.

Prerequisites

  • Basic knowledge and understanding of computer systems. Please see the free video on Spark Seminar on the home page of hadoopexpress.com before attending the class.
Testimonials

" The course was very interactive and easy to understand even for a beginner like me! It helped me prepare and pass my certification soon after completing the course!! "

- Priyam

" I really loved this course. It was fast paced, very hands on with fun filled exercises. Not only do I have lifetime access to lectures and notes, I can also email the instructor any time for help! Awesome!! "

- Samuel Adlekha

" Loved the course. The instructor was patient and provided great demos and examples. I am new to programming but felt so comfortable since it was well explained. Awesome! "

- Shveta

" It was a pleasure and great learning experience with Net Serpents under the guidance of Mr. Shashi Prakash. "

- Aijaz

Hadoop is a registered trademark of the Apache Software Foundation (ASF), and Hadoop is a product owned by Apache. Hadoop Express is not affiliated in any way with the ASF. All educational material, resources, videos and other content available on this site are created and owned by Net Serpents and are intended only to provide training. This website does not own any of the products on which it provides training; many are owned by Apache, while others, such as SAS, Python and Oracle products, belong to their respective owners. Net Serpents LLC is committed to education and online learning. All recognizable terms and names of software, tools and programming languages that appear on this site belong to their respective copyright and/or trademark owners.