INTRODUCTION TO MACHINE LEARNING
Measuring model performance or error
Is our model any good?
● Depends on the context of the task:
● Accuracy
● Computation time
● Interpretability

3 types of tasks
● Classification
● Regression
● Clustering
Classification: Accuracy and Error
● The system is either right or wrong
● Accuracy goes up when error goes down
● Accuracy = correctly classified instances / total number of classified instances
● Error = 1 - Accuracy (see the sketch below)
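As a minimal sketch (the slides prescribe no language, so Python is an assumption here), these two quantities can be computed for a handful of hypothetical predictions:

```python
# Hypothetical truth and predictions; 1 = positive class, 0 = negative class.
truth       = [1, 1, 0, 1, 0]
predictions = [1, 0, 0, 0, 0]

# Accuracy = correctly classified instances / total classified instances.
correct  = sum(t == p for t, p in zip(truth, predictions))
accuracy = correct / len(truth)
error    = 1 - accuracy   # Error = 1 - Accuracy

print(accuracy)  # 0.6
print(error)     # 0.4
```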
Example
● Squares with 2 features: small/big and solid/dotted
● Label: colored/not colored
● Binary classification problem
Example
[Figure: the five squares, truth vs. predicted; 3 of the 5 squares are classified correctly]
● Accuracy = 3/5 = 60%
Limits of accuracy
● Classifying a very rare heart disease
● Classify all patients as negative (not sick)
● Predict 99 correct (not sick) and miss the 1 sick patient
● Accuracy: 99%
● Bogus… you miss every positive case!
Confusion matrix
● Rows and columns contain all available labels
● Each cell contains the frequency of instances classified in a certain way
Confusion matrix
● Binary classifier: positive or negative (1 or 0)

              Prediction
               P     N
Truth   p     TP    FN
        n     FP    TN

● True Positives (TP): prediction P, truth p
● True Negatives (TN): prediction N, truth n
● False Negatives (FN): prediction N, truth p
● False Positives (FP): prediction P, truth n
Ratios in the confusion matrix
● Accuracy = (TP + TN) / (TP + FP + FN + TN)
● Precision = TP / (TP + FP)
● Recall = TP / (TP + FN)

              Prediction
               P     N
Truth   p     TP    FN
        n     FP    TN
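A short sketch of these three ratios as code; the counts are the ones from the squares example that follows, and the variable names are my own:

```python
# Confusion-matrix counts: TP, FN, FP, TN.
TP, FN, FP, TN = 1, 1, 1, 2

accuracy  = (TP + TN) / (TP + FN + FP + TN)   # 0.6
precision = TP / (TP + FP)                    # 0.5
recall    = TP / (TP + FN)                    # 0.5
```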
Back to the squares
● Accuracy: (TP + TN) / (TP + FP + FN + TN) = (1 + 2) / (1 + 2 + 1 + 1) = 60%
● Precision: TP / (TP + FP) = 1 / (1 + 1) = 50%
● Recall: TP / (TP + FN) = 1 / (1 + 1) = 50%

              Prediction
               P     N
Truth   p      1     1
        n      1     2
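The same numbers can be checked with scikit-learn, assuming that library (the slides name no tooling); the label vectors below are one hypothetical encoding of the five squares:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

truth     = [1, 1, 0, 0, 0]   # 1 = colored, 0 = not colored
predicted = [1, 0, 1, 0, 0]   # yields TP=1, FN=1, FP=1, TN=2

print(confusion_matrix(truth, predicted))  # [[TN FP], [FN TP]] = [[2 1], [1 1]]
print(accuracy_score(truth, predicted))    # 0.6
print(precision_score(truth, predicted))   # 0.5
print(recall_score(truth, predicted))      # 0.5
```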
Rare heart disease
● Accuracy: 99/(99+1) = 99%
● Recall: 0/1 = 0%
● Precision: undefined (no positive predictions were made)

              Prediction
               P     N
Truth   p      0     1
        n      0    99
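A sketch of this pitfall in code (scikit-learn assumed; the 100 patients are synthetic):

```python
from sklearn.metrics import accuracy_score, recall_score

truth     = [1] + [0] * 99   # 1 sick patient among 100
predicted = [0] * 100        # model predicts "not sick" for everyone

print(accuracy_score(truth, predicted))  # 0.99: looks impressive
print(recall_score(truth, predicted))    # 0.0: every sick patient is missed
```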
Regression: RMSE
● Root Mean Squared Error (RMSE)
● Roughly, the mean distance between your estimates and the regression line:
  RMSE = sqrt( (1/N) * sum_i (y_i - ŷ_i)^2 )
[Figure: scatter plot of X2 against X1 with a fitted regression line; RMSE measures the spread of the points around the line]
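A minimal sketch of the RMSE computation with NumPy; the values are made up, since the slides give no data:

```python
import numpy as np

y_true = np.array([10.2,  9.1, 11.4,  8.7])   # hypothetical observations
y_pred = np.array([ 9.8,  9.6, 11.0,  9.2])   # hypothetical model estimates

# Square the residuals, average them, then take the square root.
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
print(rmse)
```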
Clustering
● No label information
● Need a distance metric between points
Clustering
● The performance measure consists of 2 elements:
● Similarity within each cluster
● Similarity between clusters
Within cluster similarity
● Within sum of squares (WSS)
● Diameter
● Minimize
[Figure: scatter plot of X2 against X1 with two compact clusters; the within-cluster spread is what we minimize]
Between cluster similarity
● Between cluster sum of squares (BSS)
● Intercluster distance
● Maximize
[Figure: scatter plot of X2 against X1 with two clusters; the distance between the clusters is what we maximize]
Dunn's index
● Dunn's index = minimal intercluster distance / maximal diameter
[Figure: scatter plot of X2 against X1 with two clusters, annotated with the minimal intercluster distance and the maximal diameter]
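Under the definition above (minimal intercluster distance over maximal diameter), a from-scratch sketch might look as follows; the function and variable names are my own, not from the slides:

```python
import numpy as np

def dunn_index(points, labels):
    """points: (n, d) array; labels: length-n array of cluster ids."""
    clusters = [points[labels == c] for c in np.unique(labels)]

    def pairwise(a, b):
        # Euclidean distances between every point of a and every point of b.
        return np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(-1))

    # Maximal diameter: largest within-cluster distance over all clusters.
    max_diameter = max(pairwise(c, c).max() for c in clusters)

    # Minimal intercluster distance over all pairs of distinct clusters.
    min_intercluster = min(pairwise(clusters[i], clusters[j]).min()
                           for i in range(len(clusters))
                           for j in range(i + 1, len(clusters)))

    return min_intercluster / max_diameter
```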
INTRODUCTION TO MACHINE LEARNING
Let’s practice!
INTRODUCTION TO MACHINE LEARNING
Training set and test set
Machine learning vs. statistics
● Predictive power vs. descriptive power
● Supervised learning: the model must predict unseen observations
● Classical statistics: the model must fit the data, to explain or describe it
Predictive model
● Training is done not on the complete dataset, but on a training set
● A test set is used to evaluate the performance of the model
● The sets are disjoint: NO OVERLAP
● The model is tested on unseen observations -> generalization!
Split the dataset
● N instances (= observations): X
● K features: F
● Class labels: y

      x1     x2     …   xr     xr+1    xr+2    …   xN
f1    x1,1   x2,1   …   xr,1   xr+1,1  xr+2,1  …   xN,1
f2    x1,2   x2,2   …   xr,2   xr+1,2  xr+2,2  …   xN,2
…     …      …      …   …      …       …       …   …
fK    x1,K   x2,K   …   xr,K   xr+1,K  xr+2,K  …   xN,K
y     y1     y2     …   yr     yr+1    yr+2    …   yN

      (x1 … xr: training set; xr+1 … xN: test set)

● Use the test-set features to predict y, giving ŷ, then compare ŷ with the real y
When to use a training/test set?
● Supervised learning
● Not for unsupervised learning (clustering): the data is not labeled
Predictive power of the model
● Train the model on the training set
● Use and test the model on the test set
● The resulting performance measure reflects the predictive power
How to split the sets?
● Which observations go where?
● The training set should be larger than the test set
● Typically about a 3/1 ratio, as sketched below
● Quite arbitrary
● Generally: more data = better model
● But the test set should not be too small
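As a sketch with scikit-learn (an assumption; any tooling works), a 3/1 split corresponds to a test_size of 0.25:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(40).reshape(20, 2)    # hypothetical features
y = np.array([0, 1] * 10)           # hypothetical labels

# 3/1 split: 15 training instances, 5 test instances.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1)
```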
Distribution of the sets
● Classification:
  ● the classes must have similar distributions in both sets
  ● avoid a class not being available in one of the sets
● Classification & regression:
  ● shuffle the dataset before splitting (see the sketch below)
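In the scikit-learn sketch (again an assumption about tooling), shuffling is on by default, and the stratify argument keeps the class distributions of the two sets similar:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(40).reshape(20, 2)
y = np.array([0] * 15 + [1] * 5)    # imbalanced hypothetical labels

# shuffle=True is the default; stratify=y preserves the 3:1 class ratio
# in both the training set and the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1)
```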
Effect of sampling
● Sampling can affect the performance measures
● Add robustness to these measures: cross-validation
● Idea: sample multiple times, with different separations
Cross-validation
● E.g. 4-fold cross-validation: the dataset is split into 4 folds
● Each fold acts once as the test set, with the remaining 3 folds as the training set
● Aggregate the results for a robust measure
n-fold cross-validation
● Fold the test set over the dataset n times
● Each test set is 1/n the size of the total dataset (sketch below)
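A sketch of n-fold cross-validation with scikit-learn (assumed; the classifier and data are placeholders):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.random((100, 2))            # hypothetical features
y = rng.integers(0, 2, size=100)    # hypothetical labels

# 4-fold CV: each fold is the test set once; aggregate for a robust measure.
scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=4)
print(scores.mean())
```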
INTRODUCTION TO MACHINE LEARNING
Let’s practice!
INTRODUCTION TO MACHINE LEARNING
Bias and Variance
What you've learned
● Accuracy and other performance measures
● Training and test set
Knitting it all together
● Effect of splitting the dataset (train/test) on accuracy
● Over- and underfitting
Introducing bias and variance
Bias and Variance
● Main goal of supervised learning: prediction
● Prediction error ~ reducible + irreducible error
Irreducible and reducible error
● Irreducible error: noise; don't try to minimize it
● Reducible error: error due to an unfit model; minimize this
● The reducible error splits into bias and variance
Bias
● Error due to bias: wrong assumptions
● The difference between predictions and truth
● Measured using models trained by a specific learning algorithm
Example
● Quadratic data
● Assumption: the data is linear, so use linear regression
● The error due to bias is high: there are more restrictions on the model
Bias
● Tied to the complexity of the model
● More restrictions lead to high bias
Variance
● Error due to variance: error due to the sampling of the training set
● A model with high variance fits the training set closely
Example
● Quadratic data
● Few restrictions: fit a polynomial perfectly through the training set
● If you change the training set, the model will change completely
● High variance: generalizes poorly to the test set (see the sketch below)
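A small sketch of that instability (all data synthetic): refit a high-degree polynomial on two fresh samples of the same noisy quadratic and the coefficients change completely, while a degree-2 fit stays stable:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n=15):
    x = rng.uniform(-2, 2, n)
    return x, x ** 2 + rng.normal(0, 0.5, n)   # noisy quadratic data

x1, y1 = sample()
x2, y2 = sample()

# High-variance model: degree-9 coefficients differ wildly between samples.
print(np.polyfit(x1, y1, 9))
print(np.polyfit(x2, y2, 9))

# Low-variance model: degree-2 coefficients barely move.
print(np.polyfit(x1, y1, 2))
print(np.polyfit(x2, y2, 2))
```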
Bias-variance tradeoff
● Low bias comes with high variance; low variance comes with high bias
Overfitting
● Accuracy depends on the dataset split (train/test)
● A model with high variance depends heavily on that split
● Overfitting = the model fits the training set a lot better than the test set
● Too specific
Underfitting
● Restricting your model too much
● High bias
● Too general
Example - spam or not? (truth)
● Emails in the training set have two features: capital letters and exclamation marks
● The true rule: a lot of capital letters? If yes: a lot of exclamation marks? If yes: spam; otherwise no spam
● One exception: an email with 50 capital letters and 30 exclamation marks is no spam
Example - spam or not? (overfit)
● The overfit tree chases the single exception: a lot of capital letters? A lot of exclamation marks? 50 capital letters? 30 exclamation marks? Only then: no spam
● Too specific!
Example - spam or not? (underfit)
● A single question: more than 10 capital letters? If yes: spam; if no: no spam
● Too general! (see the sketch below)
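As a sketch of how a tree's depth controls this tradeoff (scikit-learn assumed; the email feature counts below are invented), limiting max_depth keeps the tree from memorizing the exception:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical emails: [capital letters, exclamation marks]; 1 = spam.
X = np.array([[60, 35], [55, 28], [50, 30], [8, 1], [5, 0], [3, 2]])
y = np.array([1, 1, 0, 0, 0, 0])   # (50, 30) is the "no spam" exception

deep    = DecisionTreeClassifier(random_state=0).fit(X, y)   # can memorize the exception: overfit risk
shallow = DecisionTreeClassifier(max_depth=1,                # a single question: underfit risk
                                 random_state=0).fit(X, y)
```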
INTRODUCTION TO MACHINE LEARNING
Let’s practice!