INTRODUCTION TO MACHINE LEARNING
Measuring model performance or error
Is our model any good?
● Depends on the context of the task:
● Accuracy
● Computation time
● Interpretability

3 types of tasks
● Classification
● Regression
● Clustering
Classification: Accuracy and Error
● The system is either right or wrong
● Accuracy goes up when error goes down
● Accuracy = correctly classified instances / total number of classified instances
● Error = 1 - Accuracy (see the sketch below)
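As a minimal sketch (the slides prescribe no language, so Python is an assumption here), these two quantities can be computed for a handful of hypothetical predictions:

```python
# Hypothetical truth and predictions; 1 = positive class, 0 = negative class.
truth       = [1, 1, 0, 1, 0]
predictions = [1, 0, 0, 0, 0]

# Accuracy = correctly classified instances / total classified instances.
correct  = sum(t == p for t, p in zip(truth, predictions))
accuracy = correct / len(truth)
error    = 1 - accuracy   # Error = 1 - Accuracy

print(accuracy)  # 0.6
print(error)     # 0.4
```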
Example
● Squares with 2 features: small/big and solid/dotted
● Label: colored/not colored
● Binary classification problem
Example
[Figure: the five squares, truth vs. predicted; 3 of the 5 squares are classified correctly]
● Accuracy = 3/5 = 60%
Limits of accuracy
● Classifying a very rare heart disease
● Classify all patients as negative (not sick)
● Predict 99 correct (not sick) and miss the 1 sick patient
● Accuracy: 99%
● Bogus… you miss every positive case!
Confusion matrix
● Rows and columns contain all available labels
● Each cell contains the frequency of instances classified in a certain way
Confusion matrix
● Binary classifier: positive or negative (1 or 0)

              Prediction
               P     N
Truth   p     TP    FN
        n     FP    TN

● True Positives (TP): prediction P, truth p
● True Negatives (TN): prediction N, truth n
● False Negatives (FN): prediction N, truth p
● False Positives (FP): prediction P, truth n
Ratios in the confusion matrix
● Accuracy = (TP + TN) / (TP + FP + FN + TN)
● Precision = TP / (TP + FP)
● Recall = TP / (TP + FN)

              Prediction
               P     N
Truth   p     TP    FN
        n     FP    TN
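A short sketch of these three ratios as code; the counts are the ones from the squares example that follows, and the variable names are my own:

```python
# Confusion-matrix counts: TP, FN, FP, TN.
TP, FN, FP, TN = 1, 1, 1, 2

accuracy  = (TP + TN) / (TP + FN + FP + TN)   # 0.6
precision = TP / (TP + FP)                    # 0.5
recall    = TP / (TP + FN)                    # 0.5
```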
Back to the squares
● Accuracy: (TP + TN) / (TP + FP + FN + TN) = (1 + 2) / (1 + 2 + 1 + 1) = 60%
● Precision: TP / (TP + FP) = 1 / (1 + 1) = 50%
● Recall: TP / (TP + FN) = 1 / (1 + 1) = 50%

              Prediction
               P     N
Truth   p      1     1
        n      1     2
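The same numbers can be checked with scikit-learn, assuming that library (the slides name no tooling); the label vectors below are one hypothetical encoding of the five squares:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

truth     = [1, 1, 0, 0, 0]   # 1 = colored, 0 = not colored
predicted = [1, 0, 1, 0, 0]   # yields TP=1, FN=1, FP=1, TN=2

print(confusion_matrix(truth, predicted))  # [[TN FP], [FN TP]] = [[2 1], [1 1]]
print(accuracy_score(truth, predicted))    # 0.6
print(precision_score(truth, predicted))   # 0.5
print(recall_score(truth, predicted))      # 0.5
```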
Rare heart disease
● Accuracy: 99/(99+1) = 99%
● Recall: 0/1 = 0%
● Precision: undefined (no positive predictions were made)

              Prediction
               P     N
Truth   p      0     1
        n      0    99
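A sketch of this pitfall in code (scikit-learn assumed; the 100 patients are synthetic):

```python
from sklearn.metrics import accuracy_score, recall_score

truth     = [1] + [0] * 99   # 1 sick patient among 100
predicted = [0] * 100        # model predicts "not sick" for everyone

print(accuracy_score(truth, predicted))  # 0.99: looks impressive
print(recall_score(truth, predicted))    # 0.0: every sick patient is missed
```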
Regression: RMSE
● Root Mean Squared Error (RMSE)
● Roughly, the mean distance between your estimates and the regression line:
  RMSE = sqrt( (1/N) * sum_i (y_i - ŷ_i)^2 )
[Figure: scatter plot of X2 against X1 with a fitted regression line; RMSE measures the spread of the points around the line]
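A minimal sketch of the RMSE computation with NumPy; the values are made up, since the slides give no data:

```python
import numpy as np

y_true = np.array([10.2,  9.1, 11.4,  8.7])   # hypothetical observations
y_pred = np.array([ 9.8,  9.6, 11.0,  9.2])   # hypothetical model estimates

# Square the residuals, average them, then take the square root.
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
print(rmse)
```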
Clustering
● No label information
● Need a distance metric between points
Clustering
● The performance measure consists of 2 elements:
● Similarity within each cluster
● Similarity between clusters
Within cluster similarity
● Within sum of squares (WSS)
● Diameter
● Minimize
[Figure: scatter plot of X2 against X1 with two compact clusters; the within-cluster spread is what we minimize]
Between cluster similarity
● Between cluster sum of squares (BSS)
● Intercluster distance
● Maximize
[Figure: scatter plot of X2 against X1 with two clusters; the distance between the clusters is what we maximize]
Dunn's index
● Dunn's index = minimal intercluster distance / maximal diameter
[Figure: scatter plot of X2 against X1 with two clusters, annotated with the minimal intercluster distance and the maximal diameter]
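Under the definition above (minimal intercluster distance over maximal diameter), a from-scratch sketch might look as follows; the function and variable names are my own, not from the slides:

```python
import numpy as np

def dunn_index(points, labels):
    """points: (n, d) array; labels: length-n array of cluster ids."""
    clusters = [points[labels == c] for c in np.unique(labels)]

    def pairwise(a, b):
        # Euclidean distances between every point of a and every point of b.
        return np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(-1))

    # Maximal diameter: largest within-cluster distance over all clusters.
    max_diameter = max(pairwise(c, c).max() for c in clusters)

    # Minimal intercluster distance over all pairs of distinct clusters.
    min_intercluster = min(pairwise(clusters[i], clusters[j]).min()
                           for i in range(len(clusters))
                           for j in range(i + 1, len(clusters)))

    return min_intercluster / max_diameter
```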
INTRODUCTION TO MACHINE LEARNING
Let’s practice!
INTRODUCTION TO MACHINE LEARNING
Training set and test set
Machine learning vs. statistics
● Predictive power vs. descriptive power
● Supervised learning: the model must predict unseen observations
● Classical statistics: the model must fit the data, to explain or describe it
Predictive model
● Training is done not on the complete dataset, but on a training set
● A test set is used to evaluate the performance of the model
● The sets are disjoint: NO OVERLAP
● The model is tested on unseen observations -> generalization!
Split the dataset
● N instances (= observations): X
● K features: F
● Class labels: y

      x1     x2     …   xr     xr+1    xr+2    …   xN
f1    x1,1   x2,1   …   xr,1   xr+1,1  xr+2,1  …   xN,1
f2    x1,2   x2,2   …   xr,2   xr+1,2  xr+2,2  …   xN,2
…     …      …      …   …      …       …       …   …
fK    x1,K   x2,K   …   xr,K   xr+1,K  xr+2,K  …   xN,K
y     y1     y2     …   yr     yr+1    yr+2    …   yN

      (x1 … xr: training set; xr+1 … xN: test set)

● Use the test-set features to predict y, giving ŷ, then compare ŷ with the real y
When to use a training/test set?
● Supervised learning
● Not for unsupervised learning (clustering): the data is not labeled
Predictive power of the model
● Train the model on the training set
● Use and test the model on the test set
● The resulting performance measure reflects the predictive power
How to split the sets?
● Which observations go where?
● The training set should be larger than the test set
● Typically about a 3/1 ratio, as sketched below
● Quite arbitrary
● Generally: more data = better model
● But the test set should not be too small
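As a sketch with scikit-learn (an assumption; any tooling works), a 3/1 split corresponds to a test_size of 0.25:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(40).reshape(20, 2)    # hypothetical features
y = np.array([0, 1] * 10)           # hypothetical labels

# 3/1 split: 15 training instances, 5 test instances.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1)
```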
Distribution of the sets
● Classification:
  ● the classes must have similar distributions in both sets
  ● avoid a class not being available in one of the sets
● Classification & regression:
  ● shuffle the dataset before splitting (see the sketch below)
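In the scikit-learn sketch (again an assumption about tooling), shuffling is on by default, and the stratify argument keeps the class distributions of the two sets similar:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(40).reshape(20, 2)
y = np.array([0] * 15 + [1] * 5)    # imbalanced hypothetical labels

# shuffle=True is the default; stratify=y preserves the 3:1 class ratio
# in both the training set and the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1)
```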
Effect of sampling
● Sampling can affect the performance measures
● Add robustness to these measures: cross-validation
● Idea: sample multiple times, with different separations
Cross-validation
● E.g. 4-fold cross-validation: the dataset is split into 4 folds
● Each fold acts once as the test set, with the remaining 3 folds as the training set
● Aggregate the results for a robust measure
n-fold cross-validation
● Fold the test set over the dataset n times
● Each test set is 1/n the size of the total dataset (sketch below)
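A sketch of n-fold cross-validation with scikit-learn (assumed; the classifier and data are placeholders):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.random((100, 2))            # hypothetical features
y = rng.integers(0, 2, size=100)    # hypothetical labels

# 4-fold CV: each fold is the test set once; aggregate for a robust measure.
scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=4)
print(scores.mean())
```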
INTRODUCTION TO MACHINE LEARNING
Let’s practice!
INTRODUCTION TO MACHINE LEARNING
Bias and Variance
What you've learned
● Accuracy and other performance measures
● Training and test set
Knitting it all together
● Effect of splitting the dataset (train/test) on accuracy
● Over- and underfitting
Introducing bias and variance
Bias and Variance
● Main goal of supervised learning: prediction
● Prediction error ~ reducible + irreducible error
Irreducible and reducible error
● Irreducible error: noise; don't try to minimize it
● Reducible error: error due to an unfit model; minimize this
● The reducible error splits into bias and variance
Bias
● Error due to bias: wrong assumptions
● The difference between predictions and truth
● Measured using models trained by a specific learning algorithm
Example
● Quadratic data
● Assumption: the data is linear, so use linear regression
● The error due to bias is high: there are more restrictions on the model
Bias
● Tied to the complexity of the model
● More restrictions lead to high bias
Variance
● Error due to variance: error due to the sampling of the training set
● A model with high variance fits the training set closely
Example
● Quadratic data
● Few restrictions: fit a polynomial perfectly through the training set
● If you change the training set, the model will change completely
● High variance: generalizes poorly to the test set (see the sketch below)
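A small sketch of that instability (all data synthetic): refit a high-degree polynomial on two fresh samples of the same noisy quadratic and the coefficients change completely, while a degree-2 fit stays stable:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n=15):
    x = rng.uniform(-2, 2, n)
    return x, x ** 2 + rng.normal(0, 0.5, n)   # noisy quadratic data

x1, y1 = sample()
x2, y2 = sample()

# High-variance model: degree-9 coefficients differ wildly between samples.
print(np.polyfit(x1, y1, 9))
print(np.polyfit(x2, y2, 9))

# Low-variance model: degree-2 coefficients barely move.
print(np.polyfit(x1, y1, 2))
print(np.polyfit(x2, y2, 2))
```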
Bias-variance tradeoff
● Low bias comes with high variance; low variance comes with high bias
Overfitting
● Accuracy depends on the dataset split (train/test)
● A model with high variance depends heavily on that split
● Overfitting = the model fits the training set a lot better than the test set
● Too specific
Underfitting
● Restricting your model too much
● High bias
● Too general
Example - spam or not? (truth)
● Emails in the training set have two features: capital letters and exclamation marks
● The true rule: a lot of capital letters? If yes: a lot of exclamation marks? If yes: spam; otherwise no spam
● One exception: an email with 50 capital letters and 30 exclamation marks is no spam
Example - spam or not? (overfit)
● The overfit tree chases the single exception: a lot of capital letters? A lot of exclamation marks? 50 capital letters? 30 exclamation marks? Only then: no spam
● Too specific!
Example - spam or not? (underfit)
● A single question: more than 10 capital letters? If yes: spam; if no: no spam
● Too general! (see the sketch below)
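As a sketch of how a tree's depth controls this tradeoff (scikit-learn assumed; the email feature counts below are invented), limiting max_depth keeps the tree from memorizing the exception:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical emails: [capital letters, exclamation marks]; 1 = spam.
X = np.array([[60, 35], [55, 28], [50, 30], [8, 1], [5, 0], [3, 2]])
y = np.array([1, 1, 0, 0, 0, 0])   # (50, 30) is the "no spam" exception

deep    = DecisionTreeClassifier(random_state=0).fit(X, y)   # can memorize the exception: overfit risk
shallow = DecisionTreeClassifier(max_depth=1,                # a single question: underfit risk
                                 random_state=0).fit(X, y)
```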
INTRODUCTION TO MACHINE LEARNING
Let’s practice!