get ready

BIG DATA HADOOP HANDS-ON WORKSHOP
MEDIA ROTANA

BECOME AN EXPERT IN HDFS, MAPREDUCE, HBASE, HIVE, PIG, YARN, FLUME, SQOOP AND CLOUDERA PLATFORM

13-15 OCT 2017

overview

Hadoop is no longer a technology for tech enthusiasts and bleeding-edge Internet startups. Research shows that it’s becoming an integral part of the enterprise data strategy as users are gaining new insights into customers and their business.

Hadoop adoption is driven by several rising needs: to handle exploding data volumes, to scale existing IT systems in warehousing, archiving, and content management, and finally to get BI value out of unstructured data. And with analytics as the primary path to extracting business value from Big Data, Hadoop adoption is rapidly increasing.

The world of Hadoop and "Big Data" can be intimidating - hundreds of different technologies with cryptic names form the Hadoop ecosystem. With this course, you'll not only understand what those systems are and how they fit together - but you'll go hands-on and learn how to use them to solve real business problems!

The Big Data Hadoop Workshop is designed to give you in-depth knowledge of the Big Data framework using Hadoop, including HDFS, YARN, and MapReduce. You will learn to use Pig and Hive to process and analyze large datasets stored in HDFS, and to use Sqoop and Flume for data ingestion.


This course is best suited for IT, data management, and analytics professionals looking to gain expertise in Big Data.

11 REASONS TO ATTEND THIS WORKSHOP

  Design distributed systems that manage “big data” using Hadoop and related technologies

  Use Pig and Hive to create scripts to process data on a Hadoop cluster in more complex ways

  Analyze data using HBase (NoSQL) and MapReduce programs

  Choose an appropriate data storage technology for your application

  Publish data to your Hadoop cluster using Sqoop and Flume

  Use HDFS and MapReduce for storing and analyzing data at scale

  Analyze relational data using Hive and MySQL

  Understand how Hadoop Clusters are managed by YARN, Hue, and Cloudera Manager

  Perform batch (offline) and online analysis

  Begin your journey in Data Science using Hadoop and other technologies

  Get trained for Cloudera Certification for Developers


INSTRUCTOR PROFILE

Venkat is an experienced technology professional specializing in Data Science, Data Warehousing, Big Data Analytics, Machine Learning, and Artificial Intelligence.
Venkat has close to a decade and a half of experience in the technology industry, in roles spanning strategy, planning, application development, consulting, and product management across verticals such as smart cities, education, and retail.

SPECIALTIES

• Analytical & conceptual skills to understand key business needs and design tailored solutions to solve specific business problems.
• Large data sets and distributed computing (Hive/Hadoop)
• Knowledge in areas of information retrieval, natural language processing, data analytics and information visualization
• Hadoop stack - MapReduce, Pig, Hive, HCatalog, Oozie, R programming, Spark
• Statistical and reporting tools such as R, SQL, Tableau, Python
• JIRA, Jenkins, Python, Unix, Agile methodology, SVN, Git

Venkata Satyanarayana Billa
Senior Big Data Trainer/ Architect


ATTENDEE PROFILE

  •   Software engineers and programmers who want to understand the larger Hadoop ecosystem, and use it to store, analyze, and vend "big data" at scale.
  •   Project, program, or product managers who want to understand the lingo and high-level architecture of Hadoop.
  •   Data analysts and database administrators who are curious about Hadoop and how it relates to their work.
  •   System architects who need to understand the components available in the Hadoop ecosystem, and how they fit together.

Agenda

08:30 AM

COFFEE AND REGISTRATION


 
09:00 AM

BIG DATA, HADOOP, INTRODUCTION TO HADOOP ARCHITECTURE AND HDFS

• Why did big data suddenly become so prominent?
• Limitations of traditional large-scale systems
• Compare Hadoop architecture with traditional architecture
• Core components of Hadoop
• Understanding Hadoop Master-Slave architecture
• Understanding HDFS architecture
• Learn about NameNode, DataNode, Secondary NameNode
• Learn about JobTracker, TaskTracker
• Anatomy of read and write data on HDFS
• Hadoop deployment modes – Standalone, Single Node
• Important web URLs for Hadoop
• Run HDFS and Linux commands
 

10:30 AM

NETWORKING BREAK


 
10:50 AM

HADOOP 2.0, YARN, MRV2

• Hadoop 1.0 Limitations
• MapReduce Limitations
• MRv1 vs MRv2
• HDFS 2: Architecture
• HDFS 2: High Availability
• HDFS 2: YARN

 

12:30 PM

NETWORKING LUNCH


 
01:30 PM

UNDERSTANDING APACHE SQOOP

• Sqoop – How Sqoop works
• Import/ Export data
• Sqoop architecture
 

03:00 PM

COFFEE BREAK


 
03:20 PM

UNDERSTANDING APACHE FLUME

• Flume – How Flume works
• Import/ Export data
• Flume architecture
 

04:50 PM

CLOSING REMARKS


 
09:00 AM

UNDERSTANDING HADOOP MAPREDUCE - PART 1

• Overview of the MapReduce Framework
• Use cases of MapReduce
• MapReduce Architecture
• Understand the concept of Mappers, Reducers
• Anatomy of a MapReduce Program
• MapReduce Components - Mapper Class, Reducer Class, Driver Code
• Splits and Blocks
• Understanding the Combiner
• Input/Output Formats
• MapReduce API and Hadoop Data Types

 

10:30 AM

COFFEE BREAK


 
10:50 AM

UNDERSTANDING HADOOP MAPREDUCE – PART 2

• Understanding Combiner
• Partitioner
• MR joins
 

12:30 PM

NETWORKING LUNCH


 
01:30 PM

APACHE HIVE – HIVEQL – PART 1

• What is Hive
• Hive DDL-Create/Show/ Drop Database
• Hive DDL-Create/Show/ Drop Tables
• Hive DML-Load Files into Tables
• Hive DML-Inserting Data into Tables
• Hive SQL- Select, Filter, Join, Group
 

03:00 PM

COFFEE BREAK


 
03:20 PM

APACHE HIVE – HIVEQL – PART 2

• Partitions
• Buckets
• External tables
• Hive Data Model and Data
• SerDes
• ORC files

 

04:50 PM

CLOSING REMARKS


 

09:00 AM

APACHE PIG – PART 1

• PIG vs MapReduce
• PIG components
• PIG execution
• PIG Datatypes
• PIG Architecture
• PIG Latin Relational Operators

 

10:30 AM

COFFEE BREAK


 
10:50 AM

APACHE PIG – PART 2

• PIG Latin Join and COGROUP
• PIG Latin Group and Union
• Describe, Explain, Illustrate
• PIG Latin: File Loaders
 

12:30 PM

NETWORKING LUNCH


 
01:30 PM

HBASE & NOSQL DATABASES – PART 1

• Introduction to NoSQL
• RDBMS vs NoSQL
• Analytical (OLAP)
• Why HBASE
• HBASE Architecture
• HBASE Data Model
• HBASE Families
• HBASE Master
• HBASE vs RDBMS
• Column Families
• Access HBASE Data
• HBASE API
• Runtime modes
• Running HBASE

 

03:00 PM

COFFEE BREAK


 
03:20 PM

HBASE & NOSQL DATABASES – PART 2

• Get
• Scan
• Analysis
• Filters
• Column families
• Put
• Drop
• Sqoop: SQL to HBASE

 

04:50 PM

CLOSING REMARKS


 

 


For further information, please contact us: info@theinfinityconferences.com or call us on +971 55 875 2588

Venue


Media Rotana | Hessa Street | Barsha Heights | Dubai | United Arab Emirates