Highlights of this Workshop:
- We focus on Spark 2, the latest version
- We combine theory and practice through realistic and increasingly complex exercises
- You develop and run these exercises in the Databricks Cloud Community Edition, through a Spark Notebook environment in the browser, so nothing needs to be installed on your laptop
- The Databricks Cloud environment lets you use Scala, Python, SQL and R interactively
- Presented by an expert in big data, Hadoop, Spark and the Databricks environment
Why do we organise this workshop about Apache Spark?
- Big Data lays the foundation for data-driven business, the future of business
- Apache Hadoop is great for storing and processing big data volumes in batch, but has its limitations
- Apache Spark is a fast, more general-purpose engine for large-scale data processing
- IBM called Apache Spark the "most important new open source project in a decade"
- Spark programs can run up to 100 times faster than Hadoop/MapReduce in memory, or up to 10 times faster on disk
- Spark is easy to use, because you can write applications quickly in Java, Scala, Python or R
- Spark is an open source cluster computing framework for data analytics that supports different types of data analysis within the same technology stack: fast interactive queries, streaming analysis, graph analysis and machine learning
- During this two-day hands-on workshop, we discuss the theory and practice of several data analysis applications, and make sure you understand the framework, the environment and how to successfully run your own Spark projects
Who should attend this workshop?
This workshop is mainly aimed at developers, data analysts, data scientists, architects, software engineers and IT operations staff who want to develop Apache Spark applications. This course uses a hands-on approach to teach you the basics of Spark and give you a flying start.
You get an introduction to all Spark components from the perspective of "the data developer". Some experience with programming is necessary to get the most out of this course.
Please bring a laptop to the course. We'll run the exercises in a Notebook environment in the browser (no additional software needed on the laptop) via the Databricks cloud platform. Exercises vary from easy to complex, gradually adding functionality. Scala is our language of choice, but Python is possible as well.
We also offer this training as an in-house course for a minimum of 6 people from your company. The typical cost for an in-house training is 3.500 euro per day, excluding VAT, preparation, travel and hotel accommodation (if applicable).
This is a very brief overview of the programme of this unique two-day workshop:
- WELCOME - Registration, Coffee/Tea and Croissants
- INTRO - What is Apache Spark?
- DEVELOPMENT MADE EASY - Notebooks
- LANGUAGES - Just Enough Scala and Python
- LUNCH - Lunch
- BASICS - Spark Basics
- END OF DAY 1 - End of Day 1 of this Workshop
- START OF DAY 2 - Welcome to Day 2 with Coffee/Tea and Croissants
- DATAFRAMES AND DATASETS - Spark SQL
- LUNCH - Lunch
- ADVANCED EXERCISES - Putting it all together
- FINISH - End of this two-day Workshop