The Big Data era has arrived. This course is for those who want to become conversant with the terminology and core concepts behind big data problems, applications, and systems, and who want to start thinking about how Big Data might be useful in their business or career. During this course, you will learn:
• How to identify Big Data strategies, industry expectations, and current opportunities, in the First Step towards Job workshop
• Which methods and techniques help you develop and review valid project documentation, in Project Documentation Development and Review
• How to identify, analyze, and design process models that enhance information flow, in the Business Process Modeling workshop
• How to discover real business needs with a structured approach, in Facilitation with Role Play
• How to develop superior strategies for gathering, documenting, and reviewing requirements
• Why it is important to understand big data tools such as Hadoop, Hive, Pig, Spark, and MapReduce, in Requirement Management Tools
• How to present yourself as a high-value candidate and pass certification on the first attempt, with Mock Exams and Resume Preparation
Course Outline:
Lesson 1: An Introduction to Big Data
Understanding Big Data and Hadoop
Introduction to Big Data
Importance of Big Data
Big data and its Hype
Structured vs Unstructured Data
Big Data users and Scenarios
Challenges of Big Data
Why Distributed Processing
Lesson 2: Hadoop Architecture and HDFS
History of Hadoop
Hadoop Ecosystem
Hadoop Animal Planet
When to use & when not to use Hadoop
What is Hadoop?
Key Distinctions of Hadoop
Hadoop Components/Architecture
Understanding Storage Components
Understanding Processing Components
Anatomy of a File Write
Anatomy of a File Read
Lesson 3: Hadoop MapReduce Framework
Meet MapReduce
Word Count Algorithm – Traditional approach
Traditional approach on a Distributed system
Traditional approach – Drawbacks
MapReduce approach
Input & Output Forms of an MR program
Map, Shuffle & Sort, Reduce Phases
Workflow & Transformation of Data
Word Count Code walkthrough
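As a preview of the Map, Shuffle & Sort, and Reduce phases listed above, here is a minimal single-machine sketch of word count in plain Python. It is not Hadoop code and is not the course's own walkthrough; the function names (map_phase, shuffle_and_sort, reduce_phase, word_count) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_and_sort(pairs):
    """Shuffle & Sort: group the values by key and order the keys."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an iterable of text lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values)
                for key, values in shuffle_and_sort(mapped))
```

In a real cluster the map and reduce calls run in parallel on different nodes and the framework performs the shuffle; this sketch only shows how the data is transformed from lines to (word, 1) pairs to per-word totals.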
Lesson 4: Advanced MapReduce
Combiner
Partitioner
Counters
Hadoop Data Types
Custom Data Types
Input Format & Hierarchy
Output Format & Hierarchy
Side Data distribution – Distributed cache
Joins
Map side Join using Distributed cache
Reduce side Join
MRUnit – A unit testing framework
Lesson 5: Pig
What is Pig?
Why Pig?
Pig vs SQL
Execution Types or Modes
Running Pig
Pig Data types
Pig Latin relational Operators
Multi Query execution
Pig Latin Diagnostic Operators
Pig Latin Macro & UDF statements
Pig Latin Commands
Pig Latin Expressions
Schemas
Pig Functions
Pig Latin File Loaders
Pig UDF & executing a Pig UDF
Lesson 6: Hive
Introduction to Hive
Pig vs Hive
Hive Limitations & Possibilities
Hive Architecture
Metastore
Hive Data Organization
HiveQL
SQL vs HiveQL
Hive Data types
Data Storage
Managed & External Tables
Partitions & Buckets
Storage Formats
Built-in SerDes
Importing Data
Alter & Drop Commands
Data Querying
Lesson 7: Advanced Hive and HBase
Introduction to NoSQL & HBase
Row & Column oriented storage
Characteristics of a huge DB
What is HBase?
HBase Data-Model
HBase vs RDBMS
HBase architecture
HBase in operation
Loading Data into HBase
HBase shell commands
HBase operations through Java
HBase operations through MR