get ready

Data Science And Machine Learning
Workshop

APACHE SPARK | PYTHON

28 - 29 Jan 2018 | MEDIA ROTANA

overview

Data Science has been a trending term in the industry for a long time. It sits midway between the business and technology aspects of decision making: it analyzes data to provide actionable insights. At its core, data science uses automated methods to analyze massive amounts of data and extract knowledge from them, drawing on computer science, data modelling, statistics, analytics, and mathematics.

BUT NOT EVERY BUSINESS CAN MAKE SENSE OF THE ENORMOUS AMOUNT OF DATA IT HOLDS, AND MAKING SENSE OF DATA IS AS CRUCIAL AS COLLECTING IT. WITH DATA SOURCES SUCH AS MOBILE APPS, WEB APPS, WEBSITES, POINT-OF-SALE SYSTEMS, AND IOT DEVICES GROWING GEOMETRICALLY, THE ROLE AND IMPACT OF DATA SCIENCE CAN ONLY GROW IN THE FUTURE.


One of the biggest misconceptions is that you need a Ph.D. in science or mathematics to become a legitimate data scientist. Data scientists use many technologies, such as Apache Spark and Python, and these technologies do not require a Ph.D. Mastering them can open up avenues for an aspiring data scientist.

Join the Data Science & Machine Learning Workshop and learn how to analyze data to gain insights, develop new strategies, and cultivate actionable business intelligence.


WHY ATTEND?

At the end of the workshop, participants will be able to:
  Differentiate between supervised and unsupervised Machine Learning problems
  Apply various regression and classification models
  Train analytical models with Spark MLlib’s DataFrame-based estimators
  Implement linear regression, decision trees, logistic regression, and k-means
  Understand the purpose of Transformers, which pre-process a dataset prior to training
  Write code to implement Transformers using standardization, normalization, one-hot encoding, and binarization
  Create a processing pipeline that includes transformations, estimations, and evaluation of analytical models
  Evaluate model accuracy with Spark MLlib evaluators by dividing data into training and test datasets and computing metrics
  Tune training hyper-parameters by integrating cross-validation into Spark MLlib Pipelines.
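
To make the pre-processing outcomes above more concrete, here is a minimal sketch in plain Python (standard library only) of the four transformations named in the list. The workshop itself works with Spark MLlib Transformers; the function names and sample values below are hypothetical and only illustrate the arithmetic behind each step. The sketch assumes sample standard deviation for standardization and unit L2 norm for normalization.

```python
import math
from statistics import mean, stdev


def standardize(values):
    """Rescale a column to zero mean and unit (sample) standard deviation."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]


def normalize(row):
    """Scale one feature vector so that its L2 norm equals 1."""
    norm = math.sqrt(sum(x * x for x in row))
    return [x / norm for x in row]


def one_hot(labels):
    """Map each category to a 0/1 indicator vector (categories sorted)."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]


def binarize(values, threshold):
    """Map values above the threshold to 1 and all other values to 0."""
    return [1 if v > threshold else 0 for v in values]


# Made-up sample data, for illustration only.
ages = [20.0, 30.0, 40.0]
channels = ["web", "mobile", "web"]

print(standardize(ages))
print(normalize([3.0, 4.0]))
print(one_hot(channels))
print(binarize(ages, threshold=25.0))
```

Each function takes a plain list and returns a new list, so the steps can be chained in any order before a model is trained.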

ATTENDEE PROFILE
  Technical Managers / Team Leads
  Software engineers with some level of coding experience
  Entry-level data scientists


WORKSHOP OBJECTIVE

This 2-day hands-on Apache Spark & Python workshop targets Technical Managers, enthusiastic Software Engineers with some level of coding experience, and Entry-Level Data Scientists. The program helps participants perform data analysis at scale using Apache Spark. The course is practical and hands-on, covering an overview of Python and Apache Spark and the Machine Learning libraries available in Spark.
PARTICIPANTS WILL BE ABLE TO PERFORM HANDS-ON PROJECTS WITH THE FOLLOWING OBJECTIVES:
• Using Python scripts for web scraping and data wrangling
• Using Apache Spark to perform quick analysis on structured and unstructured data sets, including file formats such as CSV, JSON, Avro, and Parquet
• Utilizing extract-transform-load (ETL) operations
• Employing exploratory data analysis (EDA)
• Building machine learning models, evaluating models, and performing cross-validation

PREREQUISITES
  Basic experience with Python or another programming language is required
  Basic understanding of Big Data
SOFTWARE INSTALLATION
  We will provide instructions to install Apache Spark and Python


INSTRUCTOR PROFILE

Kumar D U S
Big Data Architect

Kumar is a Big Data Architect and Practice Lead for Artificial Intelligence and Machine Learning. He has 19 years of experience in Information Technology and has delivered solutions for reputed organizations in countries including the USA, UK, Australia, New Zealand, India, and Singapore. He has worked on large Data Warehousing and Big Data projects, and has authored books on Hadoop development, Hadoop administration, Python, and Spark. He enjoys sharing knowledge and helping customers with critical requirements.





Agenda

Day 1

08:30 AM

COFFEE AND REGISTRATION


 
09:00 AM

Introduction to BIG DATA


 
09:15 AM

BIG DATA & Machine Learning

• How to implement BIG DATA tools
• How to implement SPARK
• Demonstration and discussion of one project; participants will see the complete code execution.
 

09:45 AM

Machine Learning – Algorithms and Terminology

• Classification
• Regression
• Cross-Validation
• Decision Trees
* We will provide a complete cheat sheet of Machine Learning algorithms, along with a glossary and definitions.
 

10:15 AM

Resource Planning, Hardware Considerations, and Client Interactions in Such Projects (FAQ)


 
10:45 AM

NETWORKING BREAK


 
11:00 AM

BIG DATA – Tools & Frameworks FAST FORWARD

• Apache Hadoop (Live Practical examples)
• Hive, Pig, Sqoop, Flume
• Apache Spark, Python
 
11:30 AM

Machine Learning LIVE Example Walkthrough:

• Featurizing a DataFrame using Transformers
• Training a Linear Regression model using Estimators
• Evaluating the Model using Evaluators
• Putting it all together using MLlib Pipelines
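
The four walkthrough steps above follow a featurize, fit, evaluate pattern. The sketch below mirrors that pattern in plain Python (standard library only) so the flow can be read without a Spark cluster; it is not the workshop's code, and the class name, function names, and data are hypothetical. It assumes a single numeric feature and ordinary least squares.

```python
import math


def min_max_scale(values):
    """Featurizing step: rescale a feature column to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


class SimpleLinearRegression:
    """Training step: fit() learns a slope and intercept by least squares."""

    def fit(self, xs, ys):
        n = len(xs)
        x_bar, y_bar = sum(xs) / n, sum(ys) / n
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        sxx = sum((x - x_bar) ** 2 for x in xs)
        self.slope = sxy / sxx
        self.intercept = y_bar - self.slope * x_bar
        return self

    def predict(self, xs):
        return [self.intercept + self.slope * x for x in xs]


def rmse(actual, predicted):
    """Evaluation step: root mean squared error of the predictions."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up data that follows score = 2 * hours + 1 exactly.
hours = [1.0, 2.0, 3.0, 4.0]
scores = [3.0, 5.0, 7.0, 9.0]

# Putting it together: featurize, fit, predict, evaluate.
features = min_max_scale(hours)
model = SimpleLinearRegression().fit(features, scores)
error = rmse(scores, model.predict(features))
print(model.slope, model.intercept, error)
```

Because the toy data lies on a straight line, the error is close to zero; real data would leave a residual that the evaluation step reports.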
 
12:00 PM

Python – Fundamentals & Programming Techniques

• Demonstration and discussion of major end-to-end Python coding solutions.
• We will provide 2 examples, which you can practice after the course
 
01:00 PM

NETWORKING LUNCH


 
02:00 PM

Spark – Fundamentals

Hands-on
RDDs, DataFrames, Datasets

Spark SQL
• Demonstration and discussion of an end-to-end Spark project coding solution.
• We will provide 2 additional examples, which you can practice after the course
 

04:00 PM

Machine Learning with SPARK

Understanding the ML and MLlib packages
Classification
• Binary Classification
• Multinomial Classification
 

05:00 PM

Q & A – Closing session


 
Day 2

09:00 AM

Recap of Machine Learning with Spark


 
09:15 AM

The Components of Machine Learning

• Supervised Learning
• Unsupervised Learning
 

09:45 AM

Understanding Different Kinds of Data and File Formats

• JSON
• Avro
• Parquet
• CSV
 

10:15 AM

Spark MLlib Libraries: Purpose, Functions and Methods


 
10:45 AM

NETWORKING BREAK


 
11:00 AM

Understanding process flow of Machine Learning algorithms

• Purpose of Training sets
• Purpose of Testing sets
• How to identify Features
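
As a small illustration of the three points above, the following plain-Python sketch (standard library only) splits a dataset into training and testing sets and then separates features from the label. Spark DataFrames offer `randomSplit` for the same purpose; this sketch does not use Spark, and the helper name, the 80/20 ratio, the seed, and the column names are all assumptions made for the example.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for repeatable splits
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Made-up records: "sqft" is the feature, "price" is the label to predict.
rows = [{"sqft": 50 + 10 * i, "price": 100 + 20 * i} for i in range(10)]

train, test = train_test_split(rows)

# The model learns from the training features and labels only;
# the testing set is held back to measure how well it generalizes.
train_features = [[r["sqft"]] for r in train]
train_labels = [r["price"] for r in train]

print(len(train), len(test))
```

Keeping the testing rows out of training is what makes the later accuracy metrics an honest estimate of performance on unseen data.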
 

11:30 AM

Understanding process flow of Machine Learning algorithms

• How to prepare models
• How to fit a model
 

12:00 PM

Machine Learning algorithms: Classification & Regression

• Binary Classification
• Multinomial Classification
• Logistic Regression


 
01:00 PM

NETWORKING LUNCH


 
02:00 PM

Machine Learning algorithms

• Linear Regression
• Decision Trees
• Caching & Persistence
• Hands-on practice for participants: predicting with regression models
 

04:00 PM

End to End project execution

• Demonstration and discussion of an end-to-end Spark Machine Learning project.
• We will provide 2 additional examples, which you can practice after the course
 

05:00 PM

Q & A – Closing session


 
We request all participants to be at the venue by 8:15 AM sharp. Breakfast will finish by 8:45 AM. We will do a quick check of the required software installation between 8:45 AM and 9:00 AM, and the session will start at 9:00 AM sharp. We will send instructions for installing the required software before the session. Participants are expected to bring their own laptops.

register



For further information, please contact us: info@theinfinityconferences.com or call us on +971 55 875 2588


Venue


Media Rotana | Hessa Street | Barsha Heights | Dubai | United Arab Emirates