Highlights of this Workshop:
Why do we organise this workshop about Apache Spark?
Who should attend this workshop?
This workshop is mainly aimed at developers, data analysts, data scientists, architects, software engineers and IT operations staff who want to develop Apache Spark applications. It uses a hands-on approach to teach you the basics of Spark and give you a flying start.
You get an introduction to all Spark components from the perspective of "the data developer". Some experience with programming is necessary to get the most out of this course.
Please bring a laptop to the course. We'll run the exercises in a browser-based notebook environment on the Databricks cloud platform, so no additional software is needed on the laptop. Exercises range from easy to complex, gradually adding functionality. Scala is our language of choice, but Python is possible as well.
We also offer this training as an in-house course for a minimum of 6 people from your company. The typical cost of in-house training is 3,500 euros per day, excluding VAT, preparation, travel and hotel accommodation (if applicable).
Spark was developed in Scala, a high-level programming language that combines object-oriented and functional programming. Programming Spark applications in Scala is straightforward for anyone who is familiar with a programming language. We look at the definition of variables, functions and the use of collections in Scala.
However, because a lot of data science and statistical applications are currently programmed in Python, the open source community has developed a wonderful toolkit called PySpark, to expose the Spark programming model to Python.
We make sure that you are very familiar with the programming environment, so that you can start solving increasingly complex exercises.
We look at the Spark Core API from the perspective of the "Data Developer": from prototyping in the Spark Shell to the compilation and packaging of Spark applications for a cluster, and how this application is efficiently executed on a cluster.
The following topics will be covered:
More extended, guided exercises in which most of the Spark modules are combined, showing the true power of Spark
Geert Van Landeghem is a Big Data consultant with 25 years of experience working for companies across industries. He worked on his first big data project in 2011, and still advises companies on how to adopt big data within their organisation.
He has worked as the Head of BI for a gambling company in Belgium, where he led a team of 8 people. He has been an Apache Spark Certified Developer since November 2014, and has worked as an instructor for IBM and Datacrunchers, teaching Hadoop and Spark-related courses.
He is currently examining how Artificial Intelligence can be applied to business use cases, and to that end attended the first IBM Watson and O'Reilly AI conferences abroad.
Questions about this? Interested but you can't attend? Send us an email!