We combine theory and practice through realistic and increasingly complex exercises
These exercises can be developed and run in the Databricks Cloud Community Edition through a Spark Notebook environment in the browser, so there is no need for local installations on your laptop - a browser is all you need
The Databricks Cloud environment lets you use Scala, Python, SQL and R interactively
Presented by an expert in big data, Hadoop, Spark and the Databricks environment
Why this workshop on Apache Spark?
Big Data lays the foundation for data-driven business, the future of business
Apache Hadoop is great for storing and processing big data volumes in batch, but has its limitations
Apache Spark is a fast, more general-purpose engine for large-scale data processing
IBM called Apache Spark the "most important new open source project in a decade"
Spark programs can run up to 100 times faster than Hadoop MapReduce in memory, or up to 10 times faster on disk
Spark is easy to use, because you can write applications quickly in Java, Scala, Python or R
It is an open source cluster computing framework for data analytics that supports different types of analysis within the same technology stack: fast interactive queries, streaming analysis, graph analysis and machine learning
During this two-day hands-on workshop, we discuss the theory and practice of several data analysis applications, and make sure you understand the framework, the environment and how to successfully run your own Spark projects
Who should attend this workshop?
This workshop is mainly aimed at developers, data analysts, data scientists, architects, software engineers and IT operations who want to develop Apache Spark applications. This course uses a hands-on approach to teach you the basics of Spark and give you a flying start.
You get an introduction to all Spark components from the perspective of the "data developer". Some experience with programming is necessary to get the most out of this course.
Please bring a laptop to the course. We'll run the exercises in a Notebook environment in the browser (no additional software needed on the laptop) via the Databricks cloud platform. Exercises vary from easy to complex, gradually adding functionality. Scala is our language of choice, but Python is possible as well.
We also offer this training as an in-house course for a minimum of 6 people from your company.