Workshop: Hands on with Apache Spark
Location:
- St James, 4th flr.
Prerequisites
- High-level knowledge of Hadoop
- Laptop with Java 7 or above and Spark 1.5.1 installed
Apache Spark is a fast, powerful open source data processing engine built for sophisticated analytics and ease of use. It is often regarded as the next-generation successor to MapReduce. It was designed from the ground up to support batch processing, iterative processing, stream processing, and graph processing. Apache Spark abstracts large data sets through Resilient Distributed Datasets (RDDs) and provides elegant APIs for manipulating them.
This tutorial covers the core concepts in Apache Spark and includes hands-on exercises that use Spark APIs to solve common data processing problems. The exercises use the Apache Spark Scala APIs, so the tutorial also covers the parts of Scala that are relevant to the exercises.
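To give a flavor of the kind of exercise described above, here is a minimal word-count sketch using the Spark Scala RDD API. It assumes Spark 1.5.x is on the classpath and runs against a local master. The object name WordCountSketch, the helper countWords, and the sample sentences are illustrative assumptions, not taken from the workshop material.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative sketch only: names and sample data are assumptions.
object WordCountSketch {

  // Counts how often each word occurs, using RDD transformations.
  def countWords(sc: SparkContext, lines: Seq[String]): Map[String, Int] = {
    sc.parallelize(lines)            // turn the local collection into an RDD
      .flatMap(_.split("\\s+"))      // split each line into words
      .filter(_.nonEmpty)            // drop empty tokens from leading whitespace
      .map(word => (word, 1))        // pair each word with a count of 1
      .reduceByKey(_ + _)            // sum the counts per word
      .collect()                     // bring the (small) result back to the driver
      .toMap
  }

  def main(args: Array[String]): Unit = {
    // "local[*]" runs Spark in-process using all available cores.
    val conf = new SparkConf().setAppName("WordCountSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val counts = countWords(sc, Seq("to be or not to be", "to see"))
      counts.toSeq.sortBy(_._1).foreach { case (word, n) => println(s"$word\t$n") }
    } finally {
      sc.stop()                      // always release the context
    }
  }
}
```

The transformations (flatMap, filter, map, reduceByKey) are lazy; nothing is computed until the collect() action runs.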
Here is what you can expect to learn from this tutorial:
- The basics of Scala
- Apache Spark architecture, core concepts, execution and programming model
- Data exploration and analysis with Spark RDD APIs
- Data exploration and analysis with Spark DataFrame & Spark SQL APIs
- Understanding Spark Streaming concepts
Target Audience:
Architects and software engineers who are interested in big data processing
Hien Luu
Tracks
Covering innovative topics
Monday, 7 March
- Back to Java: What to expect in Java 9 and Spring 5
- Stream Processing @ Scale: Big data, fast-moving data. Practical implementation lessons on real-time data
- DevOps & CI/CD: Lessons and stories on optimizing the deployment pipeline
- Head-to-Tail Functional Languages: Free-range Monads, tackling immutability, tales from production, and more
- Architecting for Failure: Your system will fail. Take control before it takes you with it
- 21st Century Culture from Geeks on the Ground: New ways to organise technology companies and workplace culture
Tuesday, 8 March
- Architectures You've Always Wondered about: In-depth technical case studies from giants like Microsoft, Netflix, Google, Twitter, and more
- Close to the Metal: Get efficiency back into your code with concepts like cache-efficient algorithms and lock-free data structures
- Containers (in production): Real-world lessons on scalability and reliability in production container deployments
- Modern CS in the real world: Real-world industry adoption of modern CS ideas
- Security, Incident Response & Fraud Detection: Master-level classes on building security into your system and responding to incidents when things go wrong
- Optimizing You: Keeping life in balance is always a challenge. Learning lifehacks
Wednesday, 9 March
- Disrupting Finance: Technology advances in finance (blockchain, P2P, machine learning, APIs)
- Modern Native Languages: Safe efficiency with Go, Rust, Swift
- Full Stack Javascript: Level up JavaScript with topics like Angular, React/ReactNative, Node, Mongo/Couch/Other, Falcor, GraphQL, etc.
- Data Science & Machine Learning Methods: A developer's data science and machine learning toolkit
- Microservices for Mega-Architectures: Practical lessons on microservices success
- Modern Agile Development: Revisiting Agile today and tackling challenges we are seeing in the wild