The Hadoop Ecosystem: a Practical Workshop


Explore and understand the power of the Hadoop ecosystem

3-4 July 2012 (10:00-18:00)
Location: Golden Tulip Brussels Airport (Diegem)
Presented in English by DataCrunchers
Price: 1050 EUR (excl. 21% VAT)

This event is history; please check out the List of Upcoming Seminars, or send us an email.


 Learning Objectives

Why do we organize this workshop?

The rise of the Internet, social media and mobile technologies has dramatically increased our digital footprint. When companies like Google and Facebook were confronted with this, they started to look at massive amounts of data ("big data") in a completely different way. But you don't have to be Google or Facebook to benefit from these new possibilities.

Hadoop offers an open source solution based on the same technology that is used within Google. It allows you to store massive amounts of data and to analyze it in a scalable way to gain new insights.

During this workshop, you have the chance to explore and understand this revolutionary technology and the concepts behind it.

Why should you attend this workshop about the Hadoop ecosystem?

During this workshop, you will get real answers to these and other questions:

  • What is Hadoop and how does it fit in the big data (r)evolution?
  • Which components do we find in the Hadoop ecosystem?
  • How do you import data from a relational database into a Hadoop environment?
  • In what ways can you store information in a Hadoop environment?
  • How do you integrate existing scripts in MapReduce?
  • How do you manage a Hadoop cluster?

Who should attend this workshop?

This workshop is aimed at everyone who wants to explore big data technology. It is presented in English, and a basic understanding of the following topics will make it easier to do the exercises:

  • Linux/Unix commands - We will occasionally use basic commands like less, ls, cat, ...
  • SQL - We will execute simple queries to extract data from relational databases

The number of participants is limited to 16 to guarantee an optimal interaction and learning experience.

What do you need during this workshop?

Exercises will be done using a virtual machine which needs to be installed on your own laptop.

 Full Programme

9.30h - 10.00h
Registration, coffee/tea and croissants
10.00h (DAY 1)
Introduction

During the introduction, we will go deeper into the concepts used within the Big Data world. We'll explore the history behind Hadoop and get a feeling for what can be done with a Hadoop environment.

Additionally, we will get our virtual machine up and running so we can use it during the exercises later on. We will introduce Hue, a web-based desktop that serves as the gateway to our cluster.

Hadoop Storage Technologies

The first main topic we will cover is the Hadoop Distributed File System (HDFS). We will go into detail on how it differs from a "regular" file system and what the consequences are of choosing this approach.
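
To make this concrete, here is a minimal Java sketch of writing and reading a file through the Hadoop FileSystem API. This is a sketch only; the namenode address and paths are illustrative assumptions, not part of the workshop setup.

    // Minimal HDFS round-trip, assuming a namenode at localhost:8020
    // and write access to /user/workshop (both illustrative).
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:8020"); // assumed namenode
            FileSystem fs = FileSystem.get(conf);

            // Write a small file; HDFS splits files into large blocks
            // and replicates each block across datanodes.
            Path file = new Path("/user/workshop/hello.txt");
            FSDataOutputStream out = fs.create(file);
            out.writeUTF("Hello, HDFS!");
            out.close();

            // Read the file back.
            FSDataInputStream in = fs.open(file);
            System.out.println(in.readUTF());
            in.close();
            fs.close();
        }
    }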

The second topic deals with HBase, a distributed key-value store on top of HDFS. We will go deeper into why you would need HBase when you already have HDFS, also touching on HBase data modeling.
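
To give a taste of such operations, here is a minimal Java sketch using the classic HBase client API. The "users" table, its "info" column family and the stored values are illustrative assumptions, and the table is assumed to already exist.

    // Store and fetch one cell in HBase; table and column family
    // names are hypothetical and must exist beforehand.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "users");

            // Write: row key "row1", column family "info", qualifier "name".
            Put put = new Put(Bytes.toBytes("row1"));
            put.add(Bytes.toBytes("info"), Bytes.toBytes("name"),
                    Bytes.toBytes("Alice"));
            table.put(put);

            // Read the same cell back by row key.
            Result result = table.get(new Get(Bytes.toBytes("row1")));
            byte[] value = result.getValue(Bytes.toBytes("info"),
                                           Bytes.toBytes("name"));
            System.out.println(Bytes.toString(value));
            table.close();
        }
    }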

During the exercises we will install Hadoop and get a feeling for this file system, and you'll get hands-on experience doing simple operations with HBase.

18.00h (DAY 1)
End of Day 1 of this Workshop
9.30h - 10.00h (DAY 2)
Coffee/Tea and croissants
10.00h (DAY 2)
Hadoop Processing Technologies

Next to the storage of data, this second main part will deal with the processing of this stored information. MapReduce will be explained as the main programming model used to process large amounts of information.
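
To illustrate the model, here is the canonical word-count example in the Hadoop MapReduce Java API: the map phase emits a (word, 1) pair for every word, and the reduce phase sums those counts per word. Input and output paths come from the command line.

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // Map phase: emit (word, 1) for every word in the input split.
        public static class TokenizerMapper
                extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        // Reduce phase: sum all counts emitted for each word.
        public static class IntSumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            public void reduce(Text key, Iterable<IntWritable> values,
                               Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class); // local pre-aggregation
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }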

We will also show that you don't need to know Java to invoke MapReduce jobs: tools like Hive and Pig greatly simplify this task.

The exercises will require the same datasets to be processed with Hive as well as Pig, showing the similarities and differences between the two technologies.
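
As a flavour of the Hive side of that comparison, here is a minimal Java sketch that runs a SQL-like query over JDBC. It assumes a HiveServer2 instance on localhost:10000, the Hive JDBC driver on the classpath, and a hypothetical "words" table; none of these are part of the workshop material.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveQueryExample {
        public static void main(String[] args) throws Exception {
            // Register the Hive JDBC driver and connect to HiveServer2.
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            Connection conn = DriverManager.getConnection(
                    "jdbc:hive2://localhost:10000/default", "", "");
            Statement stmt = conn.createStatement();

            // Hive compiles this SQL-like query down to MapReduce jobs.
            ResultSet rs = stmt.executeQuery(
                    "SELECT word, COUNT(*) FROM words GROUP BY word");
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
            }
            rs.close();
            stmt.close();
            conn.close();
        }
    }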

Importing and Exporting Data

Most likely, you will want to integrate your Hadoop environment with your existing infrastructure, which means you will need to import and export information from and to your existing systems.

This part of the course deals with importing and exporting data, using tools like Sqoop for relational databases and Flume for streaming log data.
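
As an illustration, here is a minimal Java sketch that launches a Sqoop import through its command-line entry point (assuming Sqoop 1.x on the classpath; the JDBC URL, credentials, table and target directory are illustrative assumptions):

    import org.apache.sqoop.Sqoop;

    public class SqoopImportExample {
        public static void main(String[] args) {
            // Equivalent to running on the shell:
            //   sqoop import --connect ... --table ... --target-dir ...
            String[] sqoopArgs = {
                "import",
                "--connect", "jdbc:mysql://localhost/shop",
                "--username", "workshop",
                "--table", "orders",
                "--target-dir", "/user/workshop/orders"
            };
            System.exit(Sqoop.runTool(sqoopArgs));
        }
    }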

Managing a Hadoop Cluster

So you've got your cluster up and running, but how do you manage it and make sure it stays up? We'll introduce Cloudera Manager, a tool for managing your Hadoop environment. For monitoring, we'll have a look at Ganglia.

18.00h
End of this Workshop


 Speakers


Geert Van Landeghem (DataCrunchers)

Geert Van Landeghem is a Big Data consultant with over 20 years of experience. He got interested in Big Data in 2010 and implemented his first Big Data project in 2011. Many big data projects later, he currently works as the Head of the BI team and Big Data architect for an online gambling company that uses Spark. He is always eager to learn new big data technologies and to translate them into new business solutions. He is also the co-organiser of the bigdata.be meetup group.

Geert was an instructor for IBM and has developed many courses for datacrunchers.eu.

In November 2014, he received the "Developer Certification for Apache Spark" from Databricks and O'Reilly.

Daan Gerits (DataCrunchers)

Daan Gerits is a Big Data and open source enthusiast at heart. During his education he not only participated in, but also launched, several initiatives to promote open source.

As the amount of available data rises, he is convinced that storing data is the easy part; processing it is what makes the difference. That is why projects like Storm, Hadoop and Mahout really spark his imagination.

Nathan Bijnens (DataCrunchers)

Nathan Bijnens is a Hadoop and Big Data developer at datacrunchers.eu with a passion for great code, the web and big data. He is interested in programming and system administration, especially where they meet: from scaling platforms to designing the architecture of new and existing products, and everything in between. He is a Hadoop and HBase user, in combination with Pig and Hive. He focuses on the infrastructure side, with a lot of interest in Business Intelligence and visualizing big data.

Questions about this? Interested but you can't attend? Send us an email!
