Mon, 2 Mar




Participants should have some high-level knowledge of Hadoop.

Tutorial: Hands on with Apache Spark [Sold Out]

This tutorial is sold out.

Apache Spark is a new open source data processing engine, widely regarded as the next-generation successor to MapReduce. It was designed from the ground up to support streaming data processing, graph processing, and complex iterative data processing. Spark abstracts large data sets as Resilient Distributed Datasets (RDDs) and provides elegant APIs for manipulating them.


This tutorial will cover the core concepts of Apache Spark and include hands-on exercises that use the RDD APIs to solve common data processing problems. The exercises use the Apache Spark Scala APIs, so the tutorial will also cover the parts of Scala that are relevant to them.


Here is what you can expect to learn from this tutorial:

  • The basics of Scala
  • Apache Spark architecture, core concepts, and programming model
  • Using the Spark shell for interactive data analysis
  • Parallel programming with Spark RDD APIs
  • Developing standalone Spark applications
  • Developing Spark Streaming applications


Target Audience

Architects and software engineers who are interested in big data processing

Hien Luu