Phone: (416) 332-8727
Big Data Hadoop Development

The Big Data era has arrived. This course is for those who want to become conversant with the terminology and core concepts behind Big Data problems, applications, and systems, and for those who want to start thinking about how Big Data might be useful in their business or career. During this course, you will learn:


• How to identify Big Data strategies, industry expectations, and current opportunities in the First Step towards Job workshop

• Which methods help you develop valid project documentation, and which techniques help you review it, in Project Documentation Development and Review

• How to identify, analyze, and design process models that enhance information flow in the Business Process Modeling workshop

• How to discover real business needs with a structured approach in Facilitation with Role Play

• How to develop superior strategies for gathering, documenting, and reviewing requirements

• Why it is important to understand Big Data tools such as Hadoop, Hive, Pig, Spark, and MapReduce, in Requirement Management Tools

• How to present yourself as a high-value candidate and pass certification on the first attempt with our Mock Exams and Resume Preparation

 

Course Outline:

Lesson 1: An Introduction to Big Data

Understanding Big Data and Hadoop
Introduction to Big Data
Importance of Big Data
Big Data and its Hype
Structured vs Unstructured Data
Big Data users and Scenarios
Challenges of Big Data
Why Distributed Processing

Lesson 2: Hadoop Architecture and HDFS

History of Hadoop
Hadoop Ecosystem
Hadoop Animal Planet
When to use & when not to use Hadoop
What is Hadoop?
Key Distinctions of Hadoop
Hadoop Components/Architecture
Understanding Storage Components
Understanding Processing Components
Anatomy of a File Write
Anatomy of a File Read

Lesson 3: Hadoop MapReduce Framework

Meet MapReduce
Word Count Algorithm – Traditional approach
Traditional approach on a Distributed system
Traditional approach – Drawbacks
MapReduce approach
Input & Output Forms of an MR program
Map, Shuffle & Sort, Reduce Phases
Workflow & Transformation of Data
Word Count Code walkthrough
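
As a rough preview of the Map, Shuffle & Sort, and Reduce phases listed above, the sketch below imitates them in plain Python on a single machine. It is not Hadoop code and not the walkthrough used in class; the function names (map_phase, shuffle_and_sort, reduce_phase, word_count) and the sample lines are assumptions made up for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_and_sort(pairs):
    """Shuffle & Sort: group the values by key, then order the keys."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in order over an iterable of text lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    return [reduce_phase(key, values) for key, values in shuffle_and_sort(mapped)]


if __name__ == "__main__":
    # Hypothetical sample input, standing in for lines of a file in HDFS.
    sample = ["big data big ideas", "data moves fast"]
    for word, count in word_count(sample):
        print(word, count)
```

In a real cluster each phase would run in parallel across many machines; here everything runs in one process so the data flow between the phases is easy to follow.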

Lesson 4: Advanced MapReduce

Combiner
Partitioner
Counters
Hadoop Data Types
Custom Data Types
Input Format & Hierarchy
Output Format & Hierarchy
Side Data distribution – Distributed cache
Joins
Map side Join using Distributed cache
Reduce side Join
MRUnit – A unit testing framework

Lesson 5: Pig

What is Pig?
Why Pig?
Pig vs SQL
Execution Types or Modes
Running Pig
Pig Data types
Pig Latin relational Operators
Multi Query execution
Pig Latin Diagnostic Operators
Pig Latin Macro & UDF statements
Pig Latin Commands
Pig Latin Expressions
Schemas
Pig Functions
Pig Latin File Loaders
Pig UDF & executing a Pig UDF

Lesson 6: Hive

Introduction to Hive
Pig vs Hive
Hive Limitations & Possibilities
Hive Architecture
Metastore
Hive Data Organization
HiveQL
SQL vs HiveQL
Hive Data types
Data Storage
Managed & External Tables
Partitions & Buckets
Storage Formats
Built-in SerDes
Importing Data
Alter & Drop Commands
Data Querying

Lesson 7: Advanced Hive and HBase

Introduction to NoSQL & HBase
Row & Column oriented storage
Characteristics of a huge DB
What is HBase?
HBase Data-Model
HBase vs RDBMS
HBase architecture
HBase in operation
Loading Data into HBase
HBase shell commands
HBase operations through Java
HBase operations through MR

The Trainers
Ms. Jun Guan
Senior Data Analyst
Senior Biostatistician
Master's Degree in Statistics, U of T

Ms. Joan Lin
Ph.D. in Computer Application
Senior Scientist
Senior Statistician

OCOT Advantages

100% Instructor-Led Class
State-of-the-Art Facilities
Unlimited Lab Time
Labs Open 7 Days a Week
Free Repeat
Free Job Placement
Financial Aid Possible
Resume Writing
Interview Skills


© 2008 Ontario College of Technology