INTRODUCTION TO MACHINE LEARNING

Measuring model performance or error

Introduction to Machine Learning

Is our model any good?

● Context of task
● Accuracy
● Computation time
● Interpretability

3 types of tasks

● Classification
● Regression
● Clustering


Classification

● Accuracy and Error
● System is right or wrong
● Accuracy goes up when Error goes down

Accuracy = (correctly classified instances) / (total number of classified instances)
Error = 1 - Accuracy
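As a minimal sketch, with hypothetical counts, the two quantities can be computed like this:

```python
# Accuracy and Error from classification counts (hypothetical numbers).
correct = 3   # correctly classified instances
total = 5     # total number of classified instances

accuracy = correct / total
error = 1 - accuracy

print(accuracy)  # 0.6
print(error)     # 0.4
```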


Example

● Squares with 2 features: small/big and solid/dotted
● Label: colored/not colored
● Binary classification problem

Introduction to Machine Learning

Example

● Comparing Truth and Predicted labels for the 5 squares: 3 correct (✔), 2 wrong (✘)

Accuracy = 3 / 5 = 60%


Limits of accuracy

● Classifying a very rare heart disease
● Classify all as negative (not sick)
● Predict 99 correct (not sick) and miss 1
● Accuracy: 99%
● Bogus… you miss every positive case!


Confusion matrix

● Rows and columns contain all available labels
● Each cell contains the frequency of instances that are classified in a certain way

Confusion matrix

● Binary classifier: positive or negative (1 or 0)

              Prediction
              P      N
Truth   p     TP     FN
        n     FP     TN

● True Positives (TP): Prediction P, Truth p
● True Negatives (TN): Prediction N, Truth n
● False Negatives (FN): Prediction N, Truth p
● False Positives (FP): Prediction P, Truth n
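A minimal sketch of filling in these four cells from label lists; the truth/prediction labels below are hypothetical (they mirror the squares example):

```python
from collections import Counter

# Build a binary confusion matrix from truth/prediction lists (hypothetical labels).
truth     = ["p", "p", "n", "n", "n"]
predicted = ["p", "n", "p", "n", "n"]

# counts[(truth_label, predicted_label)] -> frequency of that cell
counts = Counter(zip(truth, predicted))

tp = counts[("p", "p")]  # true positives
fn = counts[("p", "n")]  # false negatives
fp = counts[("n", "p")]  # false positives
tn = counts[("n", "n")]  # true negatives

print(tp, fn, fp, tn)  # 1 1 1 2
```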

Ratios in the confusion matrix

              Prediction
              P      N
Truth   p     TP     FN
        n     FP     TN

● Accuracy: (TP + TN) / (TP + FP + FN + TN)
● Precision: TP / (TP + FP)
● Recall: TP / (TP + FN)
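These ratios can be sketched in plain Python; the TP/FN/FP/TN counts below are taken from the squares example on the following slides:

```python
# Accuracy, precision and recall from confusion-matrix cells.
tp, fn, fp, tn = 1, 1, 1, 2

accuracy  = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)

print(accuracy, precision, recall)  # 0.6 0.5 0.5
```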

Introduction to Machine Learning

Back to the squares

              Prediction
              P      N
Truth   p     1      1
        n     1      2

● Accuracy: (TP + TN) / (TP + FP + FN + TN) = (1 + 2) / (1 + 2 + 1 + 1) = 60%
● Precision: TP / (TP + FP) = 1 / (1 + 1) = 50%
● Recall: TP / (TP + FN) = 1 / (1 + 1) = 50%

Rare heart disease

              Prediction
              P      N
Truth   p     0      1
        n     0      99

● Accuracy: 99 / (99 + 1) = 99%
● Recall: 0 / 1 = 0%
● Precision: 0 / 0, undefined: no positive predictions are made

Regression: RMSE

● Root Mean Squared Error (RMSE)
● Mean distance between estimates and regression line

[Scatter plot: X1 vs X2 with a fitted regression line]
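A minimal RMSE sketch; the target values and estimates below are hypothetical:

```python
import math

# RMSE between true target values and model estimates (hypothetical values).
y_true = [10.0, 8.5, 7.0, 11.0]
y_pred = [9.5, 9.0, 7.5, 10.0]

# Mean of the squared residuals, then the square root.
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
rmse = math.sqrt(mse)

print(rmse)
```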

Clustering

● No label information
● Need distance metric between points

● Performance measure consists of 2 elements:
  ● Similarity within each cluster
  ● Similarity between clusters

Within cluster similarity

● Within sum of squares (WSS)
● Diameter
● Minimize

[Scatter plot: two clusters of points in the X1/X2 plane]

Between cluster similarity

● Between cluster sum of squares (BSS)
● Intercluster distance
● Maximize

[Scatter plot: two clusters of points in the X1/X2 plane]

Dunn’s index

● Dunn’s index = minimal intercluster distance / maximal diameter

[Scatter plot: clusters annotated with intercluster distance and cluster diameters]
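These quantities can be sketched from pairwise distances; the two 2-D clusters below are hypothetical toy data:

```python
import math

def dist(a, b):
    # Euclidean distance between two 2-D points
    return math.hypot(a[0] - b[0], a[1] - b[1])

# Two hypothetical clusters of 2-D points
clusters = [
    [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    [(5.0, 5.0), (6.0, 5.0)],
]

def centroid(c):
    # Mean point of a cluster
    return (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))

def wss(c):
    # Within sum of squares: squared distances to the cluster centroid
    m = centroid(c)
    return sum(dist(p, m) ** 2 for p in c)

def diameter(c):
    # Largest pairwise distance within a cluster
    return max(dist(a, b) for a in c for b in c)

def min_intercluster(c1, c2):
    # Smallest distance between points of two different clusters
    return min(dist(a, b) for a in c1 for b in c2)

# Dunn's index = minimal intercluster distance / maximal diameter
dunn = min_intercluster(clusters[0], clusters[1]) / max(diameter(c) for c in clusters)
print(dunn)
```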

INTRODUCTION TO MACHINE LEARNING

Let’s practice!

INTRODUCTION TO MACHINE LEARNING

Training set and test set


Machine learning vs. statistics

● Predictive power vs. descriptive power
● Supervised learning: model must predict unseen observations
● Classical statistics: model must fit data, i.e. explain or describe the data


Predictive model

● Training: not on the complete dataset, but on a training set
● Test set to evaluate performance of the model
● Sets are disjoint: NO OVERLAP
● Model tested on unseen observations -> Generalization!

Introduction to Machine Learning

Split the dataset

● N instances (= observations): X
● K features: F
● Class labels: y

        x1      x2      …   xr      |  xr+1     xr+2     …   xN
f1      x1,1    x2,1    …   xr,1    |  xr+1,1   xr+2,1   …   xN,1
f2      x1,2    x2,2    …   xr,2    |  xr+1,2   xr+2,2   …   xN,2
…       …       …       …   …       |  …        …        …   …
fK      x1,K    x2,K    …   xr,K    |  xr+1,K   xr+2,K   …   xN,K
y       y1      y2      …   yr      |  yr+1     yr+2     …   yN

        Training set (x1 … xr)      |  Test set (xr+1 … xN)

● Train on the training set, then use the model to predict the labels of the test set: ŷ
● Compare ŷ with the real y to evaluate performance

When to use training/test set?

● Supervised learning
● Not for unsupervised learning (clustering): data not labeled

Predictive power of model

● Train model on the training set
● Use and test the model on the test set
● Performance measure on the test set estimates the predictive power

How to split the sets?

● Which observations go where?
● Training set larger than test set
● Typically about 3/1
● Quite arbitrary
● Generally: more data = better model
● Test set not too small
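A shuffled, roughly 3/1 split can be sketched as follows; the dataset is hypothetical toy data and the seed is only there for reproducibility:

```python
import random

# Hypothetical toy dataset: (features, label) pairs
dataset = [((i, i * 2), i % 2) for i in range(100)]

random.seed(42)             # reproducibility only
random.shuffle(dataset)     # shuffle before splitting

cut = int(len(dataset) * 0.75)   # ~3/1 train/test ratio
training_set = dataset[:cut]
test_set = dataset[cut:]

print(len(training_set), len(test_set))  # 75 25
```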

Distribution of the sets

● Classification: classes must have similar distributions; avoid a class not being available in a set
● Classification & regression: shuffle dataset before splitting

Effect of sampling

● Sampling can affect performance measures
● Add robustness to these measures: cross-validation
● Idea: sample multiple times, with different separations

Cross-validation

● E.g.: 4-fold cross-validation

Fold 1:  [Test set]      [Training set]  [Training set]  [Training set]
Fold 2:  [Training set]  [Test set]      [Training set]  [Training set]
Fold 3:  [Training set]  [Training set]  [Test set]      [Training set]
Fold 4:  [Training set]  [Training set]  [Training set]  [Test set]

● Aggregate results for a robust measure

n-fold cross-validation

● Fold the test set over the dataset n times
● Each test set is 1/n the size of the total dataset
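Generating the n folds can be sketched as pure index bookkeeping, no model involved:

```python
# n-fold cross-validation: each fold's test set is 1/n of the data,
# the remaining indices form the training set.
def n_fold_indices(n_instances, n_folds):
    indices = list(range(n_instances))
    fold_size = n_instances // n_folds
    for k in range(n_folds):
        test = indices[k * fold_size:(k + 1) * fold_size]
        train = indices[:k * fold_size] + indices[(k + 1) * fold_size:]
        yield train, test

for train, test in n_fold_indices(8, 4):
    print(test)  # each test fold holds 8/4 = 2 indices
```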

INTRODUCTION TO MACHINE LEARNING

Let’s practice!

INTRODUCTION TO MACHINE LEARNING

Bias and Variance


What you’ve learned

● Accuracy and other performance measures
● Training and test set


Knitting it all together

● Effect of splitting dataset (train/test) on accuracy
● Over- and underfitting

Introducing BIAS and VARIANCE


Bias and Variance

● Main goal of supervised learning: prediction
● Prediction error ~ reducible + irreducible error


Irreducible vs. reducible error

● Irreducible error: noise; cannot be minimized
● Reducible error: error due to an unfit model; minimize this
● Reducible error splits into bias and variance


Bias

● Error due to bias: wrong assumptions
● Difference between predictions and truth, using models trained by a specific learning algorithm

Introduction to Machine Learning

Example

● Quadratic data
● Assumption: data is linear; use linear regression
● Error due to bias is high: more restrictions on the model

Bias

● Complexity of model
● More restrictions lead to high bias

Variance

● Error due to variance: error due to the sampling of the training set
● Model with high variance fits the training set closely

Example

● Quadratic data
● Few restrictions: fit a polynomial perfectly through the training set
● If you change the training set, the model will change completely
● High variance: generalizes badly to the test set

Bias-variance tradeoff

● Low bias comes with high variance; low variance comes with high bias
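A toy sketch of both extremes on quadratic data: a straight line (high bias, wrong linearity assumption) versus a hypothetical "memorizing" model that predicts via the nearest training point, standing in for an unrestricted, high-variance model:

```python
# Quadratic ground truth: y = x^2 (toy data)
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [x ** 2 for x in train_x]
test_x = [0.5, 1.5, 2.5, 3.5]
test_y = [x ** 2 for x in test_x]

# High-bias model: simple linear regression (assumes the data is linear)
n = len(train_x)
mean_x = sum(train_x) / n
mean_y = sum(train_y) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y))
         / sum((x - mean_x) ** 2 for x in train_x))
intercept = mean_y - slope * mean_x

def linear(x):
    return intercept + slope * x

# High-variance model: memorizes the training set (nearest training point),
# so it fits the training set perfectly but changes completely with the sample
def memorize(x):
    nearest = min(train_x, key=lambda t: abs(t - x))
    return nearest ** 2

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Linear model: nonzero error even on the training set (bias)
print(mse(linear, train_x, train_y), mse(linear, test_x, test_y))
# Memorizer: zero training error, but larger test error (variance)
print(mse(memorize, train_x, train_y), mse(memorize, test_x, test_y))
```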

Overfitting

● Accuracy will depend on the dataset split (train/test)
● A model with high variance depends heavily on that split
● Overfitting = model fits the training set a lot better than the test set
● Too specific

Underfitting

● Restricting your model too much
● High bias
● Too general

Example - spam or not?

● Emails in the training set have two features: capital letters and exclamation marks
● One exception: an email with 50 capital letters and 30 exclamation marks is no spam

Truth:
  A lot of capital letters?
    no  -> no spam
    yes -> A lot of exclamation marks?
             no  -> no spam
             yes -> spam

Overfit (too specific!):
  A lot of capital letters?
    no  -> no spam
    yes -> A lot of exclamation marks?
             no  -> no spam
             yes -> 30 exclamation marks?
                      no  -> spam
                      yes -> 50 capital letters?
                               no  -> spam
                               yes -> no spam

Underfit (too general!):
  More than 10 capital letters?
    no  -> no spam
    yes -> spam

INTRODUCTION TO MACHINE LEARNING

Let’s practice!