During the introduction, we will dive into the concepts used in the Big Data world. We'll explore the history behind Hadoop and get a feeling for what can be done with a Hadoop environment.
Additionally, we will get our virtual machine up and running so we can use it during the exercises later on. We will introduce Hue, an online desktop serving as the gateway to our cluster.
The first main topic we will cover is the Hadoop Distributed File System (HDFS). We will go into detail on how it differs from a "regular" file system and what the consequences of choosing this approach are.
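To give a flavour of that difference: HDFS does not store a file as one unit, but as large fixed-size blocks, each replicated on several datanodes, with a namenode keeping only the metadata. The sketch below is a conceptual illustration of this idea in plain Python, not the real HDFS API; the node names and the round-robin placement are our own simplification (real HDFS placement is rack-aware).

```python
# Conceptual sketch, NOT the real HDFS API: a file becomes a list of
# fixed-size blocks, each assigned to several datanodes. The namenode's
# job is essentially to remember this mapping.

BLOCK_SIZE = 128 * 1024 * 1024   # HDFS default block size: 128 MB
REPLICATION = 3                  # HDFS default replication factor

def plan_blocks(file_size, datanodes):
    """Split a file of file_size bytes into blocks and assign each block
    to REPLICATION distinct datanodes (simplified round-robin placement)."""
    num_blocks = -(-file_size // BLOCK_SIZE)  # ceiling division
    plan = []
    for i in range(num_blocks):
        replicas = [datanodes[(i + r) % len(datanodes)] for r in range(REPLICATION)]
        plan.append({"block": i, "replicas": replicas})
    return plan

# A 300 MB file on a 4-node cluster needs 3 blocks (128 + 128 + 44 MB),
# and each block lives on 3 different nodes, so losing one node loses no data.
layout = plan_blocks(300 * 1024 * 1024, ["node1", "node2", "node3", "node4"])
```

The consequence of this design is the one the course explores: reads and writes are cheap and fault-tolerant at massive scale, but HDFS behaves nothing like a POSIX file system for small, random updates.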
The second topic deals with HBase, which serves as a distributed key-value store on top of HDFS. We will look at why you would need HBase when you already have HDFS, and also touch on HBase data modeling.
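The essence of that data model can be sketched in a few lines: an HBase table is a sorted map from row key to column families of cells, and because rows are kept sorted by key, row-key design determines which data is cheap to scan together. This is a conceptual illustration only, not the real HBase client API, and the sensor-reading keys are a hypothetical example of our own.

```python
# Conceptual sketch of the HBase data model, NOT the real client API:
# table: row key -> {column family -> {qualifier -> value}},
# with rows kept sorted by row key.

table = {}

def put(row, family, qualifier, value):
    table.setdefault(row, {}).setdefault(family, {})[qualifier] = value

def scan(start, stop):
    """Range scan over sorted row keys, as an HBase scan does."""
    return {k: table[k] for k in sorted(table) if start <= k < stop}

# Hypothetical sensor readings: prefixing the row key with the sensor id
# keeps all readings of one sensor contiguous, so one range scan fetches them.
put("sensor1#2014-01-01", "data", "temp", "21.5")
put("sensor1#2014-01-02", "data", "temp", "22.0")
put("sensor2#2014-01-01", "data", "temp", "19.8")

readings = scan("sensor1#", "sensor1#~")  # all rows for sensor1 only
```

This is exactly the kind of trade-off the data modeling part of the course covers: unlike a relational schema, you design the row key around your access pattern.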
During the exercises we will install Hadoop and get a feel for its file system, and you will work with HBase to perform simple operations.
Besides the storage of data, this second main part deals with the processing of the stored information. MapReduce will be explained as the core programming model used to process large amounts of information.
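The model itself is simple enough to sketch locally without any Hadoop at all: a map phase emits key-value pairs, a shuffle groups them by key, and a reduce phase aggregates each group. The classic word count, in plain Python as a minimal illustration of the three phases:

```python
# Minimal local sketch of the MapReduce model (no Hadoop involved):
# map emits (key, value) pairs, shuffle groups them by key,
# reduce aggregates each group.

from collections import defaultdict

def map_phase(line):
    """Map: emit (word, 1) for every word in the line."""
    for word in line.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle: group all values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: sum the counts for one word."""
    return (key, sum(values))

lines = ["Hadoop stores data", "Hadoop processes data"]
pairs = [p for line in lines for p in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
# counts["hadoop"] == 2, counts["data"] == 2, counts["stores"] == 1
```

On a real cluster the map and reduce functions run in parallel across the datanodes holding the blocks, which is what makes the model scale; the logic stays this simple.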
We will also show that you don't need to know Java to run MapReduce jobs: tools like Hive and Pig greatly simplify this task.
In the exercises you will process the same datasets with both Hive and Pig, highlighting the similarities and differences between the two technologies.
Most likely, you will want to integrate your Hadoop environment into your existing infrastructure, which means importing and exporting information from and to your existing systems. This part of the course covers moving data in and out of Hadoop, using tools like Sqoop for relational databases and Flume for log and event data.
So you have your cluster up and running, but how do you manage it and make sure it stays up? We'll introduce Cloudera Manager, a tool for managing your Hadoop environment. For monitoring, we'll have a look at Ganglia.
The number of participants is limited to 16 to guarantee an optimal interaction and learning experience.