Workshop: Hands on with Apache Spark

Location:

Level: Beginner
Time: 9:00am - 4:00pm

Prerequisites

  • High-level knowledge of Hadoop
  • Laptop with Java 7 or above and Spark 1.5.1 installed

Apache Spark is a fast and powerful open source data processing engine built for sophisticated analytics and ease of use. It is widely regarded as the next-generation successor to MapReduce, and it was designed from the ground up to support batch, iterative, streaming, and graph processing. Apache Spark abstracts large data sets through the concept of Resilient Distributed Datasets (RDDs) and provides elegant APIs for manipulating them.
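To make the RDD abstraction concrete, here is a minimal word-count sketch in Scala (the language used in the exercises), assuming a local master and the Spark 1.5.x API from the prerequisites; the input lines are illustrative:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // Run locally with all available cores; no cluster required.
    val conf = new SparkConf().setAppName("WordCount").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // An RDD built from an in-memory collection stands in for a real data set.
    val lines = sc.parallelize(Seq("spark makes big data simple", "big data with spark"))

    // Classic word count: each transformation returns a new (lazy) RDD;
    // collect() is the action that triggers the actual computation.
    val counts = lines
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.collect().foreach(println)
    sc.stop()
  }
}
```

Note that `flatMap`, `map`, and `reduceByKey` build up a lineage of transformations without moving any data; only the `collect()` action causes Spark to execute the job.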

This tutorial will cover the core concepts of Apache Spark and include hands-on exercises that use the Spark APIs to solve common data processing problems. The exercises use the Spark Scala APIs, so the tutorial will also cover the essential parts of Scala that are relevant to the exercises.
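Since the exercises are in Scala, here is a quick taste of the Scala features that appear most often in Spark code: immutable values, anonymous functions, and collection transformations, which mirror the RDD API one-for-one.

```scala
object ScalaBasics {
  def main(args: Array[String]): Unit = {
    // Immutable value binding; the type (List[Int]) is inferred.
    val numbers = List(1, 2, 3, 4, 5)

    // Anonymous functions and collection transformations -- the same
    // map/filter/reduce style used with Spark RDDs.
    val doubledEvens = numbers.filter(_ % 2 == 0).map(_ * 2)
    val sum = numbers.reduce(_ + _)

    println(doubledEvens) // List(4, 8)
    println(sum)          // 15
  }
}
```

Everything done here on a local `List` carries over directly to a distributed RDD, which is why a small amount of Scala goes a long way in Spark.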

Here is what you can expect to learn from this tutorial:

  • The basics of Scala
  • Apache Spark architecture, core concepts, execution and programming model
  • Data exploration and analysis with Spark RDD APIs
  • Data exploration and analysis with Spark DataFrame & Spark SQL APIs
  • Understanding Spark Streaming concepts
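As a taste of the DataFrame and Spark SQL topics above, the following sketch runs the same query through both APIs. It assumes the Spark 1.5.x API from the prerequisites, where DataFrames are created through a `SQLContext`; the data set and column names are illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object DataFrameExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("DataFrameExample").setMaster("local[*]"))

    // In Spark 1.5.x, DataFrames are created via SQLContext.
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // A small in-memory data set; the column names here are illustrative.
    val people = Seq(("Alice", 34), ("Bob", 45), ("Carol", 29)).toDF("name", "age")

    // DataFrame API: declarative column expressions instead of raw functions.
    people.filter($"age" > 30).select("name").show()

    // The same query expressed in Spark SQL against a registered temp table.
    people.registerTempTable("people")
    sqlContext.sql("SELECT name FROM people WHERE age > 30").show()

    sc.stop()
  }
}
```

Because both forms compile down to the same query plan, choosing between the DataFrame API and SQL strings is largely a matter of style.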

Target Audience:
Architects and software engineers who are interested in big data processing

Presenter: Hien Luu

Day: Thursday [Full Day]
