Bayesian Estimation

Probabilistic Graphical Models

Learning: Parameter Estimation

Daphne Koller

Limitations of MLE

• Two teams play 10 times, and the first wins 7 of the 10 matches ⇒ probability of first team winning = 0.7

• A coin is tossed 10 times, and comes out ‘heads’ 7 of the 10 tosses ⇒ probability of heads = 0.7

• A coin is tossed 10000 times, and comes out ‘heads’ 7000 of the 10000 tosses ⇒ probability of heads = 0.7

In all three cases the maximum likelihood estimate is the same, even though the much larger sample should make us far more confident in the last estimate.
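This limitation is easy to check numerically. The sketch below (not from the lecture; the uniform `Beta(1,1)` prior is an assumption) contrasts the MLE, which ignores sample size, with the spread of a Bayesian posterior, which does not:

```python
# Hypothetical coin-toss data: the MLE is identical for both sample sizes.
def mle(heads, tosses):
    return heads / tosses

print(mle(7, 10))        # 0.7
print(mle(7000, 10000))  # 0.7

# Under an assumed uniform Beta(1,1) prior, the posterior after MH heads and
# MT tails is Beta(MH+1, MT+1); its standard deviation shrinks with more data.
def beta_std(a, b):
    return (a * b / ((a + b) ** 2 * (a + b + 1))) ** 0.5

# The small sample leaves far more posterior uncertainty than the large one.
print(beta_std(8, 4) > beta_std(7001, 3001))  # True
```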


Parameter Estimation as a PGM

[Figure: plate model — a node θ is a parent of each data node X[m]; the nodes X[1], ..., X[M] sit inside a plate indexed by m over the data]

• Given a fixed θ, tosses are independent

• If θ is unknown, tosses are not marginally independent — each toss tells us something about θ
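The marginal dependence can be verified directly. A minimal sketch, assuming a uniform prior over θ: integrating out θ gives P(X[1]=H) = ∫θ dθ = 1/2, but P(X[2]=H | X[1]=H) = ∫θ² dθ / ∫θ dθ = 2/3, so observing one toss changes our prediction for the next.

```python
# Midpoint-rule integration over a grid of θ values with a uniform prior.
N = 100_000
dx = 1.0 / N
thetas = [(i + 0.5) * dx for i in range(N)]

p_h1 = sum(t * dx for t in thetas)          # P(X[1]=H) = ∫ θ dθ = 1/2
p_h1h2 = sum(t * t * dx for t in thetas)    # P(X[1]=H, X[2]=H) = ∫ θ² dθ = 1/3

print(p_h1)            # ≈ 0.5
print(p_h1h2 / p_h1)   # P(X[2]=H | X[1]=H) ≈ 0.667 ≠ 0.5: the tosses are dependent
```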


Bayesian Inference

• Joint probabilistic model over the data and the parameter:

  P(x[1], ..., x[M], θ) = P(x[1], ..., x[M] | θ) P(θ)
                        = P(θ) ∏_{i=1}^{M} P(x[i] | θ)
                        = P(θ) θ^{M_H} (1 − θ)^{M_T}

  where M_H and M_T are the numbers of heads and tails in the data.

[Figure: network with θ as a parent of X[1], ..., X[M]]

• Bayes' rule then gives the posterior:

  P(θ | x[1], ..., x[M]) = P(x[1], ..., x[M] | θ) P(θ) / P(x[1], ..., x[M])
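The posterior above can be computed on a grid. A sketch under an assumed uniform prior P(θ) = 1, with 7 heads and 3 tails as hypothetical data:

```python
# Unnormalized posterior P(θ) θ^MH (1-θ)^MT on a grid, then normalize by the
# evidence P(x[1],...,x[M]) obtained by summing over the grid.
N = 10_000
dx = 1.0 / N
grid = [(i + 0.5) * dx for i in range(N)]

MH, MT = 7, 3                                     # 7 heads, 3 tails (assumed data)
unnorm = [t**MH * (1 - t)**MT for t in grid]      # P(θ) ∏ P(x[i]|θ) with P(θ)=1
Z = sum(u * dx for u in unnorm)                   # the evidence P(x[1],...,x[M])
post = [u / Z for u in unnorm]                    # posterior density on the grid

mean = sum(t * p * dx for t, p in zip(grid, post))
print(round(mean, 3))  # ≈ 0.667 = (MH+1)/(M+2), not the MLE 0.7
```

The posterior mean differs from the MLE of 0.7 because the uniform prior acts like one extra imaginary head and one extra imaginary tail.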

Dirichlet Distribution

• θ parameterizes a multinomial distribution over k values

• Dirichlet distribution θ ~ Dirichlet(α1, ..., αk):

  P(θ) = (1/Z) ∏_{i=1}^{k} θ_i^{α_i − 1},  where  Z = ∏_{i=1}^{k} Γ(α_i) / Γ(∑_{i=1}^{k} α_i)

  and Γ is the Gamma function:  Γ(x) = ∫_0^∞ t^{x−1} e^{−t} dt

• Intuitively, the hyperparameters correspond to the number of samples we have seen
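The density formula above can be implemented directly with the standard library's `math.gamma`. A sketch (the choice α = (2, 5) is an arbitrary assumption) that also checks, for the k = 2 case where the Dirichlet reduces to a Beta distribution, that the density integrates to 1:

```python
from math import gamma

def dirichlet_pdf(theta, alpha):
    """Dirichlet density (1/Z) ∏ θ_i^(α_i - 1) with Z = ∏ Γ(α_i) / Γ(Σ α_i)."""
    Z = 1.0
    for a in alpha:
        Z *= gamma(a)
    Z /= gamma(sum(alpha))
    dens = 1.0 / Z
    for t, a in zip(theta, alpha):
        dens *= t ** (a - 1)
    return dens

# For k=2, θ = (t, 1-t); integrate over t in (0,1) with a midpoint rule.
alpha = (2.0, 5.0)
N = 10_000
dx = 1.0 / N
total = sum(dirichlet_pdf(((i + 0.5) * dx, 1 - (i + 0.5) * dx), alpha) * dx
            for i in range(N))
print(round(total, 4))  # ≈ 1.0, confirming the normalizer Z
```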

Dirichlet Distributions

[Figure: densities of Dirichlet(1,1), Dirichlet(2,2), Dirichlet(0.5,0.5), and Dirichlet(5,5) plotted over θ ∈ [0, 1]]

Dirichlet Priors & Posteriors

  P(θ | D) ∝ P(D | θ) P(θ)

  P(D | θ) = ∏_{i=1}^{k} θ_i^{M_i}        P(θ) ∝ ∏_{i=1}^{k} θ_i^{α_i − 1}

• If P(θ) is Dirichlet and the likelihood is multinomial, then the posterior is also Dirichlet
  – Prior is Dir(α1, ..., αk)
  – Data counts are M1, ..., Mk
  – Posterior is Dir(α1+M1, ..., αk+Mk)

• Dirichlet is a conjugate prior for the multinomial
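The conjugate update is a one-liner: add the data counts to the prior hyperparameters. A sketch with hypothetical counts for a three-valued variable:

```python
# Conjugate update: Dir(α1,...,αk) prior + counts (M1,...,Mk)
# gives the Dir(α1+M1,...,αk+Mk) posterior in closed form.
def dirichlet_posterior(alpha, counts):
    return tuple(a + m for a, m in zip(alpha, counts))

prior = (1.0, 1.0, 1.0)     # Dir(1,1,1): uniform prior over a 3-valued variable
counts = (7, 2, 1)          # assumed data counts M1, M2, M3
post = dirichlet_posterior(prior, counts)
print(post)                 # (8.0, 3.0, 2.0)

# Posterior predictive for value i: (α_i + M_i) / (Σ_j α_j + M)
pred = [a / sum(post) for a in post]
print([round(p, 3) for p in pred])  # [0.615, 0.231, 0.154]
```

Note how the hyperparameters act as pseudo-counts: the uniform prior contributes one imaginary observation per value, pulling the predictive slightly away from the raw frequencies.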

Summary

• Bayesian learning treats parameters as random variables – Learning is then a special case of inference

• Dirichlet distribution is conjugate to multinomial – Posterior has same form as prior – Can be updated in closed form using sufficient statistics from data
