Apache Spark Hands-On Training (In-Company)


Want to learn Spark fast, practise it, and get off to a flying start?

ON REQUEST
Location: In-company (YOUR COMPANY)
Presented in English by Geert Van Landeghem
Price: ASK FOR PRICE QUOTE (excl. 21% VAT)



Full Programme:
WELCOME
Introducing the speaker, participants and workshop
INTRO TO SPARK
What is Apache Spark?
  • Where does Spark come from?
  • Why has it grown so quickly into the most popular cluster computing framework?
  • What are its advantages compared to Hadoop and MapReduce?
  • What is new in Spark 2.0?
  • Data Engineering vs Data Science
 
Notebooks
  • Notebooks: interactive programs that allow you to do data analysis and visualise the results
  • Writing Spark programs using notebooks (Zeppelin, Spark Notebook, Databricks Cloud)
 
Just Enough Scala and Python

Spark was developed in Scala, a high-level programming language that combines object-oriented and functional programming. Programming Spark applications in Scala is straightforward for anyone who is already familiar with another programming language. We look at the definition of variables, functions and the use of collections in Scala.

However, because a lot of data science and statistical applications are currently programmed in Python, the open source community has developed a wonderful toolkit called PySpark, to expose the Spark programming model to Python.

We make sure that you are very familiar with the programming environment, so that you can start solving increasingly complex exercises.
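As an illustration of the "just enough" level meant here, the following is a minimal sketch in plain Python (no Spark required) of the three building blocks named above: variables, functions and collections. All names and values are made up for illustration; the workshop's own exercises may look different.

```python
# Variables: names bound to values; types are determined at runtime.
course = "Apache Spark"
days = 2

# Functions: a named function and an anonymous one (lambda).
def square(x: int) -> int:
    return x * x

is_even = lambda x: x % 2 == 0

# Collections: lists and dictionaries, processed in a functional style.
numbers = [1, 2, 3, 4, 5, 6]
even_squares = [square(n) for n in numbers if is_even(n)]

# The same computation written with map and filter, the style that
# carries over most directly to Spark's programming model.
same_with_map = list(map(square, filter(is_even, numbers)))

# A dictionary built from a collection of words.
word_lengths = {word: len(word) for word in course.split()}

print(even_squares)
print(word_lengths)
```

Passing small functions to operations such as map and filter is the habit that matters most when moving on to Spark programs.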

GETTING STARTED
Spark Basics

We look at the Spark Core API from the perspective of the "Data Developer": from prototyping in the Spark Shell to compiling and packaging a Spark application, and how such an application is executed efficiently on a cluster.

The following topics will be covered:

  • Spark Shell: the interactive shell for exploring and analysing data with Spark
  • RDDs (Resilient Distributed Datasets): distributed collections of objects, the most important concept in Spark
  • Transformations & Actions: operations on RDDs
  • Job Execution
  • Clustering
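The difference between transformations and actions can be sketched without a cluster. The example below is an analogy in plain Python, not the Spark API: lazy `map` and `filter` stand in for transformations, and `reduce` stands in for an action that triggers the actual work. The `log` list and the `traced_double` helper are hypothetical names added only to show when elements are really processed.

```python
from functools import reduce

log = []  # records each element at the moment it is actually processed


def traced_double(x):
    log.append(x)
    return x * 2


data = range(1, 6)  # stand-in for a distributed collection: 1, 2, 3, 4, 5

# "Transformations": describe a pipeline lazily; nothing is computed yet.
doubled = map(traced_double, data)
large = filter(lambda x: x > 4, doubled)
processed_before_action = len(log)

# "Action": forces the pipeline to run and returns a result to the caller.
total = reduce(lambda a, b: a + b, large)
processed_after_action = len(log)

print(processed_before_action, processed_after_action, total)
```

In Spark the same idea applies at cluster scale: transformations only record what should happen, and a job is executed once an action asks for a result.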
END OF DAY 1
End of Day 1 of this Workshop
DAY 2: WELCOME BACK
Welcome to Day 2 with Coffee/Tea
DATAFRAMES and DATASETS
Spark SQL
  • SQL
  • DataFrames: a distributed collection of data organized into named columns
  • Datasets
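To make "data organized into named columns" concrete, here is a small sketch in plain Python, again an analogy and not the Spark SQL API. The rows, column names and amounts are invented sample data. The loop computes roughly what the SQL query in the comment expresses.

```python
from collections import defaultdict

# Rows with named columns; in Spark these would be spread over a cluster.
rows = [
    {"name": "Ann", "city": "Ghent", "amount": 30},
    {"name": "Bob", "city": "Brussels", "amount": 20},
    {"name": "Cleo", "city": "Ghent", "amount": 50},
]

# Roughly: SELECT city, SUM(amount) FROM rows WHERE amount > 25 GROUP BY city
totals = defaultdict(int)
for row in rows:
    if row["amount"] > 25:
        totals[row["city"]] += row["amount"]

result = dict(totals)
print(result)
```

Because every row shares the same named columns, a query can refer to columns by name instead of to positions inside opaque objects, which is the main practical difference from working with plain RDDs.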
LUNCH
MORE ADVANCED EXERCISES
Putting it all together

More extended, guided exercises in which most of the Spark modules are combined, showing the true power of Spark

FINISH
End of this two-day Workshop



This is a very brief overview of the programme of this unique two-day workshop:

  • WELCOME - Registration, Coffee/Tea and Croissants
  • INTRO - What is Apache Spark ?
  • DEVELOPMENT MADE EASY - Notebooks
  • LANGUAGES - Just Enough Scala and Python
  • LUNCH - Lunch
  • BASICS - Spark Basics
  • END OF DAY 1 - End of Day 1 of this Workshop
  • START of DAY 2 - Welcome to Day 2 with Coffee/Tea and Croissants
  • DATAFRAMES and DATASETS - Spark SQL
  • LUNCH - Lunch
  • ADVANCED EXERCISES - Putting it all together
  • FINISH - End of this two-day Workshop


Questions about this workshop? Interested but unable to attend? Send us an email!