Introducing the speaker, participants and workshop
INTRO TO SPARK
What is Apache Spark?
- Where does Spark come from?
- Why has it grown so quickly into the most popular cluster computing framework?
- What are its advantages compared to Hadoop and MapReduce?
- What is new in Spark 2.0?
- Data Engineering vs Data Science
- Notebooks: interactive programs that allow you to do data analysis and visualise the results
- Writing Spark programs using notebooks (Zeppelin, Spark Notebook, Databricks Cloud)
Just Enough Scala and Python
Spark was developed in Scala, a high-level programming language that combines object-oriented and functional programming. Programming Spark applications in Scala is straightforward for anyone who is already familiar with another programming language. We look at the definition of variables and functions and at the use of collections in Scala.
However, because a lot of data science and statistical applications are currently programmed in Python, the open source community has developed a wonderful toolkit called PySpark, to expose the Spark programming model to Python.
We make sure that you are very familiar with the programming environment, so that you can start solving increasingly complex exercises.
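As a flavour of the language warm-up, here is a minimal sketch of the three building blocks named above (variables, functions and collections), written in plain Python. The names `greeting`, `square`, `is_even` and `numbers` are made up for illustration and are not taken from the workshop material; the Scala equivalents look very similar.

```python
from functools import reduce

# Variables: a name bound to a value (Scala would write: val greeting = "Hello").
greeting = "Hello"


# Functions: a named function ...
def square(x: int) -> int:
    return x * x


# ... and an anonymous function bound to a name.
is_even = lambda x: x % 2 == 0

# Collections: a list processed with higher-order functions.
numbers = [1, 2, 3, 4, 5, 6]
even_squares = list(map(square, filter(is_even, numbers)))
total = reduce(lambda a, b: a + b, even_squares)

print(greeting, even_squares, total)  # prints: Hello [4, 16, 36] 56
```

The same filter, map and reduce style of chaining is what the Spark programming model builds on, which is why the workshop starts with it.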
We look at the Spark Core API from the perspective of the "Data Developer": from prototyping in the Spark Shell to compiling and packaging Spark applications, and how these applications are executed efficiently on a cluster.
The following topics will be covered:
- Spark Shell: the interactive shell for exploring and analysing data with Spark
- RDDs (Resilient Distributed Datasets): distributed collections of objects, the most important concept in Spark
- Transformations & Actions: operations on RDDs
- Job Execution
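The difference between transformations and actions can be sketched without a cluster. The block below is a single-machine toy, assuming a hypothetical class `ToyRDD` that is not part of Spark or PySpark; it only mimics the idea that transformations (`map`, `filter`) describe a computation lazily, while actions (`collect`, `count`, `reduce`) actually run it.

```python
import functools
from typing import Any, Callable, Iterable, Iterator, List


class ToyRDD:
    """Hypothetical single-machine stand-in for an RDD (not Spark's class)."""

    def __init__(self, source: Callable[[], Iterator[Any]]) -> None:
        # Store a recipe for producing the data, not the data itself.
        self._source = source

    @classmethod
    def parallelize(cls, data: Iterable[Any]) -> "ToyRDD":
        items = list(data)
        return cls(lambda: iter(items))

    # Transformations: return a new ToyRDD and compute nothing yet.
    def map(self, f: Callable[[Any], Any]) -> "ToyRDD":
        return ToyRDD(lambda: (f(x) for x in self._source()))

    def filter(self, pred: Callable[[Any], bool]) -> "ToyRDD":
        return ToyRDD(lambda: (x for x in self._source() if pred(x)))

    # Actions: run the whole chain and return a result to the caller.
    def collect(self) -> List[Any]:
        return list(self._source())

    def count(self) -> int:
        return sum(1 for _ in self._source())

    def reduce(self, f: Callable[[Any, Any], Any]) -> Any:
        return functools.reduce(f, self._source())


calls = []  # records every element the mapped function is applied to


def traced_square(x: int) -> int:
    calls.append(x)
    return x * x


rdd = ToyRDD.parallelize(range(1, 6))
big_squares = rdd.map(traced_square).filter(lambda x: x > 5)

calls_before_action = len(calls)  # still 0: no action has run yet
result = big_squares.collect()    # the action triggers the computation

print(calls_before_action, result)  # prints: 0 [9, 16, 25]
```

In this toy, calling a second action on `big_squares` re-runs the whole chain from the start. Real Spark adds partitioning, distribution across a cluster and fault tolerance on top of this basic idea.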
END OF DAY 1
End of Day 1 of this Workshop
DAY 2: WELCOME BACK
Welcome to Day 2 with Coffee/Tea
DATAFRAMES and DATASETS
- DataFrames: a distributed collection of data organized into named columns
MORE ADVANCED EXERCISES
Putting it all together
More extended, guided exercises in which most of the Spark modules are combined, showing the true power of Spark
End of this two-day Workshop
This is a very brief overview of the programme of this unique two-day workshop:
- WELCOME - Registration, Coffee/Tea and Croissants
- INTRO - What is Apache Spark?
- DEVELOPMENT MADE EASY - Notebooks
- LANGUAGES - Just Enough Scala and Python
- LUNCH - Lunch
- BASICS - Spark Basics
- END OF DAY 1 - End of Day 1 of this Workshop
- START of DAY 2 - Welcome to Day 2 with Coffee/Tea and Croissants
- DATAFRAMES and DATASETS - Spark SQL
- LUNCH - Lunch
- ADVANCED EXERCISES - Putting it all together
- FINISH - End of this two-day Workshop