Phone (416) 332-8727
Data Mining
1. Introduction

What is data mining?
Predominant areas in the computing history
What is this course about?
Course structure
Association analysis, data classification, and clustering.

2. Decision Tree Construction

On-line references:
Structure of decision trees
Data input
Decision tree construction: A simplified example
The Concept Learning System (CLS)
Information Gain
Training, testing and predictive accuracy
Information gain vs the gain ratio criterion
Difficulties with decision tree construction
Overfitting
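
As an illustration of the information gain criterion listed above, here is a minimal Python sketch. It is not taken from the course material: the function names `entropy` and `information_gain` and the six-row toy table are assumptions made up for this example. Gain is computed as the entropy of the class labels minus the weighted entropy of the partitions produced by splitting on one attribute.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attr_index, label_index=-1):
    """Entropy of the labels minus the weighted entropy after splitting on one attribute."""
    labels = [row[label_index] for row in rows]
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attr_index], []).append(row[label_index])
    remainder = sum(len(part) / len(rows) * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder


# Toy data, made up for illustration: (outlook, windy, play)
rows = [
    ("sunny", "false", "no"),
    ("sunny", "true", "no"),
    ("overcast", "false", "yes"),
    ("rainy", "false", "yes"),
    ("rainy", "true", "no"),
    ("overcast", "true", "yes"),
]

for index, name in enumerate(["outlook", "windy"]):
    print(name, round(information_gain(rows, index), 4))
```

On this toy table a tree builder that uses information gain would split on outlook first, since it separates the classes far better than windy does.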

3. Association Analysis

A mathematical model for association analysis
Large itemsets and association rules
Apriori: constructing large itemsets with minimum support (minsup) by iteration
Interestingness of Discovered Association Rules
Application examples
Association analysis vs. classification
Machine Learning Software in Java at the University of Waikato
Experiments/exercises with weka.associations.Apriori
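
The level-wise construction of large itemsets in Apriori can be sketched in plain Python as follows. This is a simplified illustration and not Weka's `weka.associations.Apriori`: the function names `apriori` and `confidence`, the use of an absolute support count for `min_support`, and the four toy transactions are all assumptions made for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every itemset whose count is at least min_support."""
    transactions = [frozenset(t) for t in transactions]
    candidates = {frozenset([item]) for t in transactions for item in t}
    large = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {c: n for c, n in counts.items() if n >= min_support}
        large.update(current)
        k += 1
        # Join step: merge pairs of large (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be large.
        candidates = {
            c for c in joined
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
    return large


def confidence(large, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent, computed from support counts."""
    a = frozenset(antecedent)
    return large[a | frozenset(consequent)] / large[a]


# Toy market-basket data, made up for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
]
large = apriori(transactions, min_support=2)
print(sorted(sorted(itemset) for itemset in large))
print(confidence(large, {"butter"}, {"bread"}))
```

The prune step is what keeps the search tractable: the three-item candidate is discarded without being counted because one of its two-item subsets is not large.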


4. Clustering

Clustering: unsupervised learning
Types of clusters
Different clustering methods
k-means: iterative distance-based clustering
Dealing with discrete values in k-means
Constructing a hierarchical clustering using k-means
Incremental clustering/classification: pros and cons
Steps in COBWEB to construct a clustering tree
Category utility
4 choices at each level when inserting a new instance
The COBWEB algorithm in Weka
The cutoff parameter (-C percentage)
How to combine clustering and classification?
How to measure the quality of clustering?
Density-based clustering methods
Outlier analysis
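
To make the iterative, distance-based nature of k-means concrete, here is a small sketch in plain Python. It assumes numeric attributes, squared Euclidean distance, and caller-supplied starting centroids. The function name `kmeans` and the toy points are hypothetical, and this is not the Weka implementation.

```python
def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm on tuples of floats, starting from the given centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        updated = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
        if updated == centroids:  # no centroid moved, so the assignment is stable
            break
        centroids = updated
    return centroids, clusters


# Toy points, made up for illustration: two well-separated groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, centroids=[(1.0, 1.0), (8.0, 8.0)])
print(centroids)
print(clusters)
```

Results depend on the starting centroids, which is one reason clustering quality has to be measured separately rather than assumed.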

5. Rule Induction

Classification rules
Decision lists and disjunctive normal form (DNF)
1R ("1-rule")
Steps in c4.5rules
Running c4.5rules on Mansfield
The default (or no information) rule
Rule Induction by Covering
PRISM: Constructing correct and "perfect" rules
Divide-and-Conquer vs Separate-and-Conquer
Rule induction algorithms in Weka
Classification vs prediction
Lazy vs eager learning
The k-nearest neighbor algorithm
Genetic algorithms (GA)
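
The 1R ("1-rule") idea from the list above can be sketched as follows: for each attribute, predict the majority class for each of its values, then keep the attribute whose rule makes the fewest errors on the training data. The function name `one_r` and the toy table are assumptions for this example. It handles nominal attributes only and ignores missing values.

```python
from collections import Counter, defaultdict


def one_r(rows, label_index=-1):
    """Return (attribute index, rule, errors) for the attribute with the fewest training errors."""
    n_cols = len(rows[0])
    label_index = label_index % n_cols
    best = None
    for attr in range(n_cols):
        if attr == label_index:
            continue
        # Count class labels for each value of this attribute.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attr]][row[label_index]] += 1
        # The rule maps each attribute value to its majority class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(sum(c.values()) - c.most_common(1)[0][1] for c in counts.values())
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best


# Toy data, made up for illustration: (outlook, windy, play)
rows = [
    ("sunny", "false", "no"),
    ("sunny", "true", "no"),
    ("overcast", "false", "yes"),
    ("rainy", "false", "yes"),
    ("rainy", "true", "no"),
    ("overcast", "true", "yes"),
]
attr, rule, errors = one_r(rows)
print(attr, rule, errors)
```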


6. Bayesian Methods

Alternative hypotheses
Prior knowledge
Imperfect data indicators
Conditional probability
Bayes theorem
Maximum A Posteriori (MAP)
Naive Bayes Classifier
The PlayTennis data with Naive Bayes
Attributes: Day, Outlook, Temperature, Humidity, Windy
Belief networks
Network topology
Conditional probability tables (CPT)
Joint probability distribution in a belief network
Training belief networks
Incremental construction of belief networks
Inference in belief networks
The Naive Bayes algorithm in Weka
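
As a rough illustration of how a Naive Bayes classifier combines Bayes theorem with the attribute-independence assumption, here is a minimal sketch for nominal attributes. It is not the Weka implementation: the function names `train` and `posterior`, the use of Laplace (add-one) smoothing, and the six toy examples are assumptions for this example, and the toy rows are not the PlayTennis data.

```python
from collections import Counter, defaultdict


def train(examples):
    """Collect class counts and per-class attribute-value counts from (features, label) pairs."""
    class_counts = Counter(label for _, label in examples)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    values = defaultdict(set)            # attribute index -> distinct values seen
    for features, label in examples:
        for i, v in enumerate(features):
            value_counts[(i, label)][v] += 1
            values[i].add(v)
    return class_counts, value_counts, values


def posterior(model, features):
    """Normalized P(class | features) under the naive independence assumption."""
    class_counts, value_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # prior P(class)
        for i, v in enumerate(features):
            # Laplace-smoothed estimate of P(value | class)
            score *= (value_counts[(i, label)][v] + 1) / (n + len(values[i]))
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}


# Toy data, made up for illustration: ((outlook, windy), play)
examples = [
    (("sunny", "true"), "no"),
    (("sunny", "false"), "no"),
    (("overcast", "false"), "yes"),
    (("rainy", "false"), "yes"),
    (("rainy", "true"), "no"),
    (("overcast", "true"), "yes"),
]
model = train(examples)
post = posterior(model, ("sunny", "true"))
print(post)
```

The class with the largest posterior is the MAP prediction. Smoothing keeps a single unseen attribute value from forcing a class probability to zero.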

7. Dealing with Noise and Real-Valued Attributes

Artificial vs. real-world databases
The Monk's Problems: An example
Sources Of Noise
Noise Handling
Cross validation
Dealing with contradictions and redundancy
Expansion of Don't Care values
Handling of ? values
Generation of nonexistent examples
Light-weight leaves/rules
Stopping criteria to avoid overfitting
Overfitting vs underfitting
Occam's Razor
Reduced error pruning with a separate pruning set
Truncation of rules - TRUNC
"No match" and "multiple match" during deduction with induction results
Measure of fit
Estimate of probability
Dealing with real-valued attributes: Discretization
Criteria to stop the recursive splitting
Discretization in C4.5
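
One common way to discretize a real-valued attribute is to pick the binary threshold that maximizes information gain. The sketch below illustrates that idea under simplifying assumptions: it tests midpoints between adjacent distinct values, so it is in the spirit of C4.5-style splitting but is not a reproduction of C4.5 itself. The names `entropy` and `best_threshold` and the toy values are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, gain) for the split value <= threshold with the highest information gain."""
    pairs = sorted(zip(values, labels))
    base = entropy([label for _, label in pairs])
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no boundary between equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = base - remainder
        if gain > best[1]:
            best = ((lo + hi) / 2, gain)
    return best


# Toy data, made up for illustration: a numeric attribute and a class label.
values = [64, 65, 70, 75, 80, 85]
labels = ["yes", "yes", "yes", "no", "no", "no"]
threshold, gain = best_threshold(values, labels)
print(threshold, gain)
```

Applying the same search recursively to each side of the chosen threshold yields a multi-interval discretization, which is where a stopping criterion becomes necessary.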

8. Data Mining from Very Large Databases

Why large databases?
Data partitioning
Sampling techniques
Cross validation
Windowing in C4.5
Integrative windowing
Bagging, boosting, and their differences
Boosting in C5.0
Incremental batch learning
Aggregation of rules from different data sources
Leading data mining tools

The Trainers
Ms. Jun Guan
Senior Data Analyst
Senior Biostatistician
Master's Degree in Statistics, U of T

Ms. Joan Lin
Ph.D. in Computer Application
Senior Scientist
Senior Statistician

OCOT Advantages

100% Instructor-Led Class
State-of-the-Art Facilities
Unlimited Lab Time
Labs Open 7-days a Week
Free Repeat
Free Job Placement
Financial Aid Possible
Resume Writing
Interview Skills


© 2008 Ontario College of Technology