
Statistical Methods and Analysis Techniques in Experimental Physics

• fitting with constraints
• kernel density estimation

Course FS11 (ETH + UZH) Lecture 9 19.4.2011

Christoph Grab

Summary of last lecture

“Statistics show that of those who contract the habit of eating, very few survive.” (G. B. Shaw)


Method of Least Squares  

Goal: from a set of measurements (yᵢ, σᵢ), find the best estimate of the unknown parameters a of a function f(x|a) that describes the data. Assume the data yᵢ are independent and Gaussian distributed, with errors σᵢ.

Calculate the chi-square χ²(a) with parameters [a] for a model function f(x|a), which predicts a y-value for every x-value. The best estimate of the parameters [a] is obtained by minimizing χ²(a):

\chi^2(a) = \sum_{i=1}^{N} \frac{[y_i - f(x_i; a)]^2}{\sigma_i^2}

The residuals (= deviations from the “calculated values”) are weighted by the errors!

→ The sum obeys a χ² distribution with N − p degrees of freedom, where N is the number of data points and p is the number of parameters to determine in f(x|a).

Note: we have taken the errors σᵢ of the different points into account for the weighting.
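As a small illustration (not part of the original slides), here is a minimal Python sketch of this χ² sum, assuming numpy is available; the model and the data values are hypothetical placeholders.

```python
import numpy as np

def chi2(a, x, y, sigma, f):
    """Chi-square of the model f(x; a) against data points (y_i, sigma_i)."""
    weighted_residuals = (y - f(x, a)) / sigma   # residuals weighted by the errors
    return np.sum(weighted_residuals ** 2)

# toy usage with a hypothetical straight-line model f(x|a) = a0 + a1*x
f = lambda x, a: a[0] + a[1] * x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
sigma = np.array([0.2, 0.2, 0.2, 0.2])
print(chi2(np.array([0.0, 2.0]), x, y, sigma, f))
```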


Example: straight line

\chi^2(a) = \sum_{i=1}^{N} \frac{[y_i - f(x_i; a)]^2}{\sigma_i^2}

[Figure: straight-line fit f(xᵢ|a) through the data points yᵢ; the differences yᵢ − f(xᵢ|a) are the so-called residuals.]


Method of Least Squares

The LSQ procedure to find the parameter â which minimizes χ² is to set the derivative dχ²/da = 0:

\frac{\partial \chi^2}{\partial a} = \frac{\partial}{\partial a} \sum_{i=1}^{N} \frac{[y_i - f(x_i; a)]^2}{\sigma_i^2} = -2 \sum_{i=1}^{N} \frac{1}{\sigma_i^2}\,[y_i - f(x_i; a)]\,\frac{\partial f(x_i; a)}{\partial a} = 0

• If a is a p-component vector a = (a₁, …, a_p), then solve the p equations simultaneously → the “normal equations”.
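For illustration only (not from the slides): when the derivative equations cannot be solved analytically, χ²(a) can simply be minimized numerically. A hedged sketch, assuming scipy is available, using the same hypothetical straight-line model as above.

```python
import numpy as np
from scipy.optimize import minimize

# hypothetical data and model f(x|a) = a0 + a1*x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
sigma = np.full_like(y, 0.2)

def chi2(a):
    return np.sum(((y - (a[0] + a[1] * x)) / sigma) ** 2)

# numerical minimization stands in for solving d(chi2)/da = 0
result = minimize(chi2, x0=[0.0, 1.0])
print("a_hat =", result.x, "  chi2_min =", result.fun)  # expect chi2_min ~ N - p
```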


LSQ - Connection to Likelihood

Given a data set yᵢ(xᵢ) with the xᵢ known exactly, i = 1…N: interpret the yᵢ as measurements of N independent Gaussian random variables, distributed around the theory function f(xᵢ|a), which is assumed known. Assume large enough statistics so that the errors on the yᵢ are Gaussian. Then the probability to measure a particular yᵢ for given (xᵢ, a) is

p(y_i|a) = \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\!\left( -\frac{[y_i - f(x_i; a)]^2}{2\sigma_i^2} \right), \qquad L(y|a) = \prod_i p(y_i|a)

Maximize the likelihood by minimizing −ln L:

-\ln L(a, y) = \frac{1}{2} \sum_i \frac{[y_i - f(x_i; a)]^2}{\sigma_i^2} + \sum_i \ln\!\left(\sqrt{2\pi}\,\sigma_i\right) = \frac{\chi^2(a)}{2} + \text{const}

The constant term is irrelevant for the minimization.
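A quick numerical check of this equivalence (my own toy example, assuming numpy): for Gaussian errors, differences of −2 ln L and of χ² between two parameter sets agree, because the extra term is a constant.

```python
import numpy as np

# toy Gaussian data around a hypothetical model f(x|a) = a0 + a1*x
rng = np.random.default_rng(1)
x = np.linspace(0.0, 5.0, 20)
sigma = np.full_like(x, 0.3)
y = 1.0 + 2.0 * x + rng.normal(0.0, sigma)

def chi2(a):
    return np.sum(((y - (a[0] + a[1] * x)) / sigma) ** 2)

def neg2lnL(a):
    # -2 ln L for independent Gaussian measurements
    return chi2(a) + np.sum(np.log(2.0 * np.pi * sigma**2))

a_ref, a_test = np.array([1.0, 2.0]), np.array([0.5, 2.2])
# the constant term cancels in the difference
print(neg2lnL(a_test) - neg2lnL(a_ref), chi2(a_test) - chi2(a_ref))
```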


LSQ: linear matrix case (recap)

If the function f is linear in a (not necessarily in x), then the normal equations can be solved exactly. Linear means f(x|a) = Σ_r c_r(x) a_r, or f = C a with C_{ir} = c_r(xᵢ), where
y = column vector of the N measured data (y_j, j = 1…N);
a = column vector of the n parameters (aᵢ, i = 1…n);
C = N×n rectangular matrix (fixed);
r = y − Ca = y − f = column vector of the N residuals (j = 1…N);
V[y] = N×N covariance matrix; W = V[y]⁻¹ = weight matrix.

LSQ principle: minimize with respect to a

\chi^2(a) = (y - Ca)^T V^{-1} (y - Ca) = r^T W r

Note: f(x|a) = a₁ + a₂x + a₃x² + a₄√x is linear in a! But f(x|a) = a₁ + sin(x + a₂) is not!
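For concreteness (my own sketch, not from the slides), the normal equations of this matrix form can be solved directly with numpy; the closed-form estimate â = (CᵀWC)⁻¹CᵀWy used below is the standard textbook solution of the weighted linear least-squares problem.

```python
import numpy as np

# hypothetical straight-line example: f(x|a) = a1 + a2*x, so c_1(x)=1, c_2(x)=x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
V = np.diag(np.full(5, 0.2**2))            # covariance matrix V[y]
W = np.linalg.inv(V)                        # weight matrix W = V^-1

C = np.column_stack([np.ones_like(x), x])   # N x n matrix with C_ir = c_r(x_i)
# normal equations: (C^T W C) a_hat = C^T W y
a_hat = np.linalg.solve(C.T @ W @ C, C.T @ W @ y)
r = y - C @ a_hat                           # residual vector r = y - C a_hat
print("a_hat =", a_hat, "  chi2 =", r @ W @ r)
```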


LSQ: linear correlated case

In the linear case with correlated measurements, the normal equations ∂χ²/∂aᵢ = 0 for N data points and n coefficients aᵢ (n < N) …

Examples of physical constraints:
• Energy and momentum are conserved in the reaction (4C fits).
• So-called “beam constraints”: e.g. at e⁺e⁻ colliders the total energy in the event (s = E_CMS²) is given by the beams.
• Particle decays: the mass of the mother particle is known once the decay is identified, e.g. the K_S mass.

First look at linear constraints; if needed, one can linearise the problem. Numerically, non-linear constraints are no longer a real issue.


Example: Tau-lepton decays

The goal of the constrained fit is to make optimal use of the experimental data to determine the τ branching fractions. A total of 119 conventional branching fractions are listed, plus some 28 upper limits. There are 31 basis modes, whose sum is constrained to 1. The constrained fit has a χ² of 95.7 for 100 degrees of freedom [prob(χ², n) = 0.60].


Inclusion of constraints

Idea: reformulate the χ² minimization as an unconstrained one, then fit as before! Constraints can be included in different ways.

Elimination method: use the constraint equations to eliminate a number of variables analytically, and then fit the data in terms of the reduced number of unknowns.

Method of Lagrangian multipliers: increase the number of unknowns in the fit by adding a set of Lagrangian multipliers, one for each constraint equation.

Pragmatic approaches: mimicking very precise measurements; iterative fitting, fixing parameters one by one, … These can be useful in unstable conditions.


Example: Angles in a Triangle

Measure the 3 angles yᵢ of a triangle, with an identical error σ = 1° for all of them (for the sake of argument): y₁ = 63°, y₂ = 34°, y₃ = 85°. Wanted: estimates nᵢ of the true values of the angles.

Observation: Σᵢ yᵢ − 180° = 2°, i.e. two degrees off.

LSQ fit: find the solution of

\chi^2(n_1, n_2, n_3) = \sum_{i=1}^{3} \frac{(y_i - n_i)^2}{\sigma_i^2} = \min

subject to the constraint

\sum_i n_i - 180^\circ = 0


Example Triangle: Elimination method

Elimination method: use the constraint to eliminate one nᵢ. Eliminate n₃, substitute it in χ², and minimize with respect to the remaining two variables n₁ and n₂:

\chi^2(n_1, n_2) = \frac{(y_1 - n_1)^2}{\sigma_1^2} + \frac{(y_2 - n_2)^2}{\sigma_2^2} + \frac{\left(y_3 - (180^\circ - n_1 - n_2)\right)^2}{\sigma_3^2} = \min

Find the solutions

\hat{n}_1 = \tfrac{1}{3}\,(180^\circ + 2y_1 - y_2 - y_3) = 62\tfrac{1}{3}^\circ, \qquad \hat{n}_2 = \tfrac{1}{3}\,(180^\circ - y_1 + 2y_2 - y_3) = 33\tfrac{1}{3}^\circ,

and from the constraint

\hat{n}_3 = 180^\circ - \hat{n}_1 - \hat{n}_2 = 84\tfrac{1}{3}^\circ.

Find: the improvement comes from subtracting the measured excess of 2° from all three observations equally, i.e. −2°/3 each. No surprise.
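A short numerical cross-check of the elimination method (my own sketch, assuming scipy): eliminate n₃ via the constraint and minimize the remaining two-parameter χ².

```python
import numpy as np
from scipy.optimize import minimize

y = np.array([63.0, 34.0, 85.0])     # measured angles, sigma = 1 degree each

def chi2(n12):
    n1, n2 = n12
    n3 = 180.0 - n1 - n2             # constraint used to eliminate n3
    return (y[0] - n1)**2 + (y[1] - n2)**2 + (y[2] - n3)**2

res = minimize(chi2, x0=[60.0, 30.0])
n1, n2 = res.x
print(n1, n2, 180.0 - n1 - n2)       # -> about 62.33, 33.33, 84.33
```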

J.-L. Lagrange

Joseph-Louis Lagrange, born Giuseppe Lodovico Lagrangia (Turin 1736 - Paris 1813), later lived in Berlin and Paris. He succeeded Euler as the director of mathematics at the Prussian Academy of Sciences in Berlin, and was the first professor of analysis at the École Polytechnique in Paris.

Many important contributions: algebra, astronomy, classical and analytical mechanics (e.g. the Mécanique Analytique, the Euler-Lagrange equations), the calculus of variations.

→ introduced the method of Lagrangian multipliers.


Principle of Lagrange Multipliers

Find the parameter â which minimizes

\chi^2(a) = \sum_{i=1}^{N} \frac{[y_i - f(x_i; a)]^2}{\sigma_i^2}

• Add constraints in the form of an equation (which may depend on a): g(x, a) = c.

• Redefine χ² to include the constraint, adding a parameter λ (add one λ per constraint):

\chi^2(a, \lambda) = \sum_{i=1}^{N} \frac{[y_i - f(x_i; a)]^2}{\sigma_i^2} + 2\lambda\,[\,g(x, a) - c\,]

• Find the solutions for both (â, λ̂) from the derivatives with respect to the two variables:

\frac{\partial \chi^2}{\partial a} = -2 \sum_{i=1}^{N} \frac{1}{\sigma_i^2}\,[y_i - f(x_i; a)]\,\frac{\partial f(x_i; a)}{\partial a} + 2\lambda\,\frac{\partial [\,g(x, a) - c\,]}{\partial a} = 0, \qquad \frac{\partial \chi^2}{\partial \lambda} = 2\,[\,g(x, a) - c\,] = 0


Triangle: Lagrangian Multiplier (1)

Lagrangian multiplier method: reformulate the problem in an equivalent form with a Lagrangian multiplier λ:

\chi^2(n_1, n_2, n_3, \lambda) = \sum_{i=1}^{3} \frac{(y_i - n_i)^2}{\sigma_i^2} + 2\lambda \left( \sum_{i=1}^{3} n_i - 180^\circ \right) = \min

This is a linear LSQ problem for the four unknowns nᵢ, λ. The four normal equations ∂χ²/∂nᵢ = 0 and ∂χ²/∂λ = 0 become

0 = -\frac{2}{\sigma_i^2}\,(y_i - n_i) + 2\lambda \quad \text{for } i = 1, 2, 3, \qquad 0 = 2\left( \sum_{i=1}^{3} n_i - 180^\circ \right).


Triangle: Lagrangian Multiplier (2)

This can be solved analytically for λ. Summing the equations (−I − II − III + (1/σ²)·IV) leads to

0 = \frac{2}{\sigma^2}\left( \sum_{i=1}^{3} y_i - 180^\circ \right) - 6\lambda \quad\Rightarrow\quad \hat{\lambda} = \frac{2}{6\sigma^2}\left( \sum_{i=1}^{3} y_i - 180^\circ \right)

Then the estimates for the angles become (with Eqs. I-III)

\hat{n}_j = y_j - \sigma^2 \hat{\lambda} = y_j - \frac{1}{3}\left( \sum_{i=1}^{3} y_i - 180^\circ \right)

We find the same result: subtract the measured excess of 2° from all three observations equally. The symmetry in all three indices is obvious.
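Since the Lagrangian-multiplier version is linear, the four normal equations can also be solved as a small linear system; a sketch of that check (my own, assuming numpy):

```python
import numpy as np

y = np.array([63.0, 34.0, 85.0])
sigma2 = 1.0                                  # sigma = 1 degree for each angle

# normal equations: n_i + sigma^2 * lambda = y_i (i = 1,2,3),  n1 + n2 + n3 = 180
A = np.array([[1.0, 0.0, 0.0, sigma2],
              [0.0, 1.0, 0.0, sigma2],
              [0.0, 0.0, 1.0, sigma2],
              [1.0, 1.0, 1.0, 0.0   ]])
b = np.array([y[0], y[1], y[2], 180.0])

n1, n2, n3, lam = np.linalg.solve(A, b)
print(n1, n2, n3, lam)                        # -> 62.33, 33.33, 84.33 and lambda = 2/3
```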


Comments on Lagrangian Multipliers

In general the method of Lagrange multipliers is useful if the number of constraints is not too large. The number of Lagrange multipliers equals the number of constraints, and all variables are treated symmetrically. The factor of 2 introduced above is a convention; it drops out after taking the derivatives. Linear conditions can be solved analytically; non-linear conditions need to be solved iteratively, using reasonable starting values for the parameters to converge. [For a more explicit treatment see e.g. V. Blobel.]


Generalisation: Fitting with constraints

General case of χ² minimization with a set of additional constraints H(y) = 0. Linearize first (first-order Taylor expansion) around some given expansion point y_ex:

H(y) \approx H(y_{ex}) + \frac{\partial H(y_{ex})}{\partial y}\,(y - y_{ex}) \equiv D\,\Delta y + d = 0

D: matrix of derivatives, one line per constraint equation (number of constraint equations × number of parameters); d: vector of the values of the constraints at the expansion point. The function to minimize with respect to (y, λ) is

\chi^2 = (y_{ref} - y)^T\, V^{-1}\, (y_{ref} - y) + 2\lambda^T (D\,\Delta y + d) = \min

In the linearized case the minimization problem has an analytical solution, independent of the constraint equations. One may need to iterate if the initial expansion point is too far from the minimum.
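A hedged sketch of this constrained minimization in numpy, written for the simple case of constraints that are already linear in y (D y = c); the block ("KKT") system below is a standard way to obtain the analytical solution, not necessarily the exact matrix layout used in the lecture.

```python
import numpy as np

# constrained LSQ: minimize (y_ref - y)^T V^-1 (y_ref - y)  subject to  D y = c
y_ref = np.array([63.0, 34.0, 85.0])        # measurements (triangle example again)
V = np.eye(3)                                # covariance matrix of the measurements
D = np.array([[1.0, 1.0, 1.0]])              # one row per constraint equation
c = np.array([180.0])                        # constraint values

Vinv = np.linalg.inv(V)
n_par, n_con = len(y_ref), len(c)
# stationarity + constraints:  [[V^-1, D^T], [D, 0]] (y, lambda) = (V^-1 y_ref, c)
K = np.block([[Vinv, D.T],
              [D, np.zeros((n_con, n_con))]])
rhs = np.concatenate([Vinv @ y_ref, c])
sol = np.linalg.solve(K, rhs)
y_hat, lam = sol[:n_par], sol[n_par:]
print(y_hat, lam)                            # corrected angles and multiplier 2/3
```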


Example of constraints without error …

Consider fitting charged-particle tracks in a collider experiment that come from one common event vertex.

1) The standard way is to assign a certain combination of measured hits to one “track” and fit them to a helical track using some standard parametrisation.

2) This can be improved: if I know that the particle originates from the primary interaction, I can require that the track comes from (0,0,0). For this, just ADD (0,0,0) as one additional “measured” hit.

3) Improve further, i.e. make it more realistic: add an error to the origin (0,0,0).

4) Improve further by replacing the origin (0,0,0) with a realistic average run vertex, i.e. the beam spot.


Example of constraints: mimicking very precise measurements

Consider finding the origin of an interaction in a collider experiment, the so-called event (or primary) vertex, based on a number of measured tracks. Using information from the beams, the expected distribution of the reactions in the interaction region is known: the locus of the collisions, given in terms of a “beam profile”. E.g. at HERA it is close to Gaussian in x and y, with σx = 150 μm and σy = 25 μm; in z it is a flattened non-Gaussian distribution with a length of some 30 cm.

In addition a “run vertex” can be determined by averaging over many events in the run. All this information produces a “razor-like” shape of the interaction region, with errors.


LSQ Fitting with Weighted Events

Events may be weighted, i.e. the weight wᵢ assigned to a particular event i is equal to the inverse probability of observing this event. Weights may come from: prescales of hardware trigger levels (L1, …, HLT); online selection processes (skims, …); weights in the Monte Carlo simulation process, e.g. for steeply falling distributions one may generate N events separately in different regions, but with relative weights.

Problem: how can these individually weighted events be taken properly into account in the fit? See Barlow 93, Zech 95.


LSQ Fitting with Weighted Events

Note: the weights of different events should not be TOO different, or curious results may be obtained. BUT: the procedure is then no longer exact; exactness is only reached in the limit n → ∞. Judgement is the key!


LSQ Fits: Residuals and Pulls

Goodness-of-fit checks:

Residuals: εᵢ = yᵢ − n̂ᵢ (measurement minus fitted value).

Pulls (or stretch functions): zᵢ = εᵢ / σᵢ.

If the observations are uncorrelated and the overall estimation problem is sufficiently linear, then we expect a Gaussian G(0,1) distribution. Plot the pull distribution and compare with G(0,1).
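A toy illustration of a pull check (my own example, assuming numpy): fit a constant to ten Gaussian measurements many times and look at the pull of one measurement with respect to the fitted value. The factor √(1 − 1/N) accounts for the correlation between the measurement and the fitted mean; without it the pull width comes out slightly below 1.

```python
import numpy as np

rng = np.random.default_rng(7)
pulls = []
for _ in range(2000):
    y = rng.normal(5.0, 1.0, size=10)       # 10 measurements with sigma = 1
    fitted = y.mean()                        # LSQ estimate of a constant
    residual = y[0] - fitted                 # residual of the first point
    pulls.append(residual / np.sqrt(1.0 - 1.0 / len(y)))  # sigma(residual) = sigma*sqrt(1-1/N)
pulls = np.array(pulls)
print(pulls.mean(), pulls.std())             # should be close to 0 and 1
```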


Some notes about fitting/using weighted events:

• Avoid data sets with largely varying weights.
• If you need weights, be careful when combining events.
• If you have weights, be extremely careful when quoting errors on “combined samples”.


How to fit multiple signals and background

The different sources are the decays; assume they all have the same weight.


Binned ML fit without weights (1)

Observed data events dᵢ in bins i = 1…n, from sources j = 1…m.

pⱼ = fraction of source j (= scale factor) → the searched parameters.

The distribution of each source is given by MC templates, with Aⱼᵢ = expected number of MC events of source j in bin i, interpreted as Poisson distributed (they fluctuate due to the limited MC statistics); the Aⱼᵢ also represent the means of the generated MC events aⱼᵢ.

fᵢ = predicted number of data events in bin i:

f_i = \sum_{j=1}^{m} p_j A_{ji}

The data are assumed to be Poisson distributed:

P(d_i | f_i) = \frac{f_i^{\,d_i}\, e^{-f_i}}{d_i!}

Estimate the fractions pⱼ (searched) and the Aⱼᵢ (uninteresting) by an ML fit:

-\ln L = \sum_{i=1}^{n} \left( f_i - d_i \ln f_i \right) + \sum_{i=1}^{n} \sum_{j=1}^{m} \left( A_{ji} - a_{ji} \ln A_{ji} \right)
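A simplified sketch of such a binned ML template fit (my own, assuming scipy); to keep it short it treats the templates Aⱼᵢ as exact, i.e. it drops the second sum that accounts for the MC fluctuations (the full treatment is the Barlow-Beeston method cited in the literature list below).

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
A = np.array([[80.0, 60.0, 30.0, 10.0],    # template of source 1 (e.g. signal)
              [20.0, 25.0, 35.0, 40.0]])   # template of source 2 (e.g. background)
p_true = np.array([1.2, 0.8])              # true scale factors
d = rng.poisson(p_true @ A)                # toy "observed" data counts per bin

def nll(p):
    f = p @ A                              # predicted counts f_i = sum_j p_j A_ji
    if np.any(f <= 0.0):                   # guard against unphysical predictions
        return np.inf
    return np.sum(f - d * np.log(f))       # -lnL up to a constant (Poisson terms)

res = minimize(nll, x0=[1.0, 1.0], method="Nelder-Mead")
print(res.x)                               # fitted scale factors p_j
```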

How to fit multiple signals and background

The different sources are the decays, with different branching ratios; different efficiencies mean they all have different weights.


LSQ Fitting with Weighted Events (opt)

Example: a histogram with nᵢ = observed number of events in bin i, fᵢ = predicted number of events, and wᵢⱼ = inverse detection probability of event j in bin i. Define

n_i' = \sum_{j=1}^{n_i} w_{ij}, \qquad D_i = \frac{1}{n_i} \sum_{j=1}^{n_i} w_{ij}, \qquad f_i' = f_i \, D_i

Various alternatives for the LSQ fit are

\chi^2_1 = \sum_{i=1}^{N} \frac{(n_i' - f_i)^2}{f_i}, \qquad \chi^2_2 = \sum_{i=1}^{N} \frac{(n_i' - f_i)^2}{n_i'}, \qquad \chi^2_3 = \sum_{i=1}^{N} \frac{(n_i' - f_i)^2}{f_i'}

The alternatives differ in the assumed errors (!):
1) “exact”, assuming a Poisson distribution for nᵢ;
2) “approximate”: substitute the variance by nᵢ' instead of fᵢ;
3) using the estimated detection efficiency.

Note: 3) is equivalent to 1) for identical weights.
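A toy comparison of the three variants (my own sketch, assuming numpy, and taking fᵢ as the prediction for the weighted counts):

```python
import numpy as np

rng = np.random.default_rng(5)
f = np.array([40.0, 55.0, 70.0, 60.0, 45.0, 30.0])     # predicted weighted counts f_i
# toy weighted events: each bin holds n_i events with individual weights w_ij
weights = [rng.uniform(0.8, 1.6, size=rng.poisson(fi / 1.2)) for fi in f]

n = np.array([len(w) for w in weights])                # raw counts n_i
n_prime = np.array([w.sum() for w in weights])         # n_i' = sum_j w_ij
D = n_prime / np.maximum(n, 1)                         # D_i = (1/n_i) sum_j w_ij
f_prime = f * D                                        # f_i' = f_i * D_i

chi2_1 = np.sum((n_prime - f)**2 / f)          # 1) variance f_i   ("exact" Poisson)
chi2_2 = np.sum((n_prime - f)**2 / n_prime)    # 2) variance n_i'  (approximate)
chi2_3 = np.sum((n_prime - f)**2 / f_prime)    # 3) variance f_i'  (uses efficiency)
print(chi2_1, chi2_2, chi2_3)
```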


Literature on fitting etc.

R. Barlow, C. Beeston: “Fitting using finite Monte Carlo samples”, Computer Physics Communications 77 (1993) 219-228; http://lss.fnal.gov/archive/other/man-hep-93-1.pdf

G. Zech: “Comparing statistical data to Monte Carlo simulation”, DESY Report 95-113; http://ccdb4fs.kek.jp/cgi-bin/img/allpdf?199508210

A. G. Frodesen, O. Skjeggestad, H. Toefte: “Probability and Statistics in Particle Physics”, 1979, p. 295

V. Lendermann et al.: note on combining weighted events in experiments, http://arxiv.org/abs/0901.4118


Kernel density estimation

Kernel density estimation is a way to estimate the PDF of a random variable x in a smooth way – without knowing the true PDF.


Kernel density estimation

Definition: A kernel is a non-negative, real-valued, integrable function K, used to estimate the density function (PDF) of a random variable, which satisfies the following requirements:

K(u) ≥ 0                  positive
K(−u) = K(u)              symmetric around 0
∫ K(u) du = 1             normalised (makes it a PDF)
∫ u K(u) du = 0           expectation value is 0
∫ u² K(u) du < ∞          finite second moment


Non-parametric kernel density estimation

Consider the simplest case of a few measured points. The simplest non-parametric density estimate is a histogram; one needs to determine the bin width and the bin end-points.

→ bins: 2-2.5, 2.5-3, 3-3.5, 3.5-4, …   bin width = 0.5


Non-parametric kernel density estimation

The same data points and the same bin width, but with shifted bin centres, give a different picture; the distribution is NOT smooth.

→ shifted bins: 2.25-2.75, 2.75-3.25, …   bin width = 0.5


Idea of kernel density estimation

Kernel density estimation is a way to estimate the PDF of a random variable x in a smooth way. It is determined by the kernel K, which defines the weighting around each data point (centred at the data point itself), and the bandwidth b, which defines the width of the weighting. Sum the kernels centred around every data point:

sample of events → apply a kernel to each event → the summed kernels for all events give the “pdf”.


Other example: kernel density estimation

• Six data points are measured.
• Six Gaussians (red) and their sum (blue).
• The kernel density estimate f̂(x) is obtained by taking the sum of all 6 Gaussians and dividing by 6 (= number of Gaussians).
• The variance of the Gaussians was set to 0.5.
• Note: where the points are denser, the density estimate has higher values.


Non-parametric density estimation

Generalized kernel density estimation is a procedure to estimate an unknown PDF in a non-parametric way. Let xᵢ, i = 1…n, be a set of independent and identically distributed samples of a random variable x. The relative occurrence 1/n of each data point xᵢ in the sample is seen as a probability density; smear/smooth this probability density around xᵢ according to a certain pattern (= the kernel); the width of the smoothing around xᵢ is called the bandwidth b. The general expression f̂ for the kernel density approximation of the data's probability density function f, with kernel K(u) and u = (x − xᵢ)/b, is

\hat{f}_n^K(x) = \frac{1}{n b} \sum_{i=1}^{n} K\!\left( \frac{x - x_i}{b} \right)
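A minimal implementation of exactly this estimator with a Gaussian kernel (my own sketch, assuming numpy; the data points are made up):

```python
import numpy as np

def kde(x_eval, data, b):
    """f_hat(x) = 1/(n*b) * sum_i K((x - x_i)/b) with a Gaussian kernel K."""
    u = (x_eval[:, None] - data[None, :]) / b
    K = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return K.sum(axis=1) / (len(data) * b)

data = np.array([2.1, 2.4, 3.0, 3.1, 3.6, 4.2])   # a few hypothetical measurements
x = np.linspace(1.0, 5.5, 200)
f_hat = kde(x, data, b=0.4)
print(np.trapz(f_hat, x))                          # integrates to ~1 (normalisation)
```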


Finite Kernel choices

Several choices for finite kernels K(u):

Rectangular kernel:              K_R(u) = 0.5                       for |u| ≤ 1, else 0
Triangular kernel:               K_T(u) = 1 − |u|                   for |u| ≤ 1, else 0
Bi-square (or quartic) kernel:   K_B(u) = \frac{15}{16}(1 - u^2)^2  for |u| ≤ 1, else 0


Finite Kernel choices

The Epanechnikov kernel, one of the most often used kernels:

K(u) = \frac{3}{4}(1 - u^2)   for |u| ≤ 1, else 0


Infinite Kernels

Another often used (infinite) kernel is the Gaussian kernel:

K_G(u) = \frac{1}{\sqrt{2\pi}} \exp(-u^2/2)
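The kernels listed on the last three slides, written out in numpy (my own sketch), together with a quick numerical check that each one integrates to 1:

```python
import numpy as np

def rectangular(u):  return np.where(np.abs(u) <= 1, 0.5, 0.0)
def triangular(u):   return np.where(np.abs(u) <= 1, 1.0 - np.abs(u), 0.0)
def bisquare(u):     return np.where(np.abs(u) <= 1, 15.0 / 16.0 * (1 - u**2)**2, 0.0)
def epanechnikov(u): return np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)
def gaussian(u):     return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

u = np.linspace(-6.0, 6.0, 100001)
for K in (rectangular, triangular, bisquare, epanechnikov, gaussian):
    print(K.__name__, np.trapz(K(u), u))    # each normalisation integral is ~1
```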


Comments on density estimation

A nearly optimal choice of kernel is the Epanechnikov kernel, since it yields efficient estimates (see below). The Gaussian kernel is also good and frequently used. In practice, the choice of the bandwidth parameter b is much more important than the choice of kernel:

Too small a bandwidth b: the density fluctuates a lot, with many local maxima/minima (“undersmoothing”).

Too large a bandwidth: information in the data is lost and substructures are washed out (“oversmoothing”).


Example: bandwidth influence

Histogram, centred at the bins, with a) a box-kernel density estimate, and b), c) a Gaussian kernel, but under- or oversmoothed.


Example: bandwidth influence

Kernel density estimation of 100 normally distributed random numbers, using different smoothing bandwidth values b.


Example: Old Faithful

• Old Faithful geyser in Yellowstone Park.
• The eruption period is regular, but the duration of the eruptions is irregular.
• A “textbook” example for applying non-parametric kernel density estimation.


Choice of bandwidth b?

Task: how to choose the optimal bandwidth? Solution: choose the bandwidth b which minimises the so-called AMISE (Asymptotic Mean Integrated Squared Error). In general the AMISE depends on the true underlying density (which we don't have!), so we need to estimate the AMISE from our data as well.

This means that the chosen bandwidth is an estimate of an asymptotic approximation. BUT: this particular choice of bandwidth recovers all the important features whilst maintaining smoothness.


Choice of optimal bandwidth b (opt)

To choose the optimal bandwidth, use the mean integrated squared error (MISE) to quantify the accuracy of the estimate (f = unknown PDF to estimate, f̂ = kernel density estimate of the data). For kernel density estimation with kernel K(z), the MISE can be approximated as the sum of a bias term and a variance term (the variance dominates for small b). The bias depends on the second derivative of f and disappears for b → 0: there is a trade-off.


Choice of optimal bandwidth b

The bandwidth b is optimal if the MISE is minimal, dMISE/db = 0:

b_{opt} = \left[ \frac{\int K(z)^2 \, dz}{n \, \sigma_K^4 \int f''(x)^2 \, dx} \right]^{1/5}, \qquad \sigma_K^2 = \int z^2 K(z)\, dz

The optimal b depends on the sample size n and on f'' (the true PDF). But f is unknown, and may not even exist analytically.

Rule of thumb: replace the unknown f by a simple reference distribution and use the corresponding b_opt. Plug-in approach: start with a best guesstimate of b, calculate ∫ f''(x)² dx from the resulting estimate, then calculate b again; iterate towards b_opt.
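As one concrete rule-of-thumb instance (my own sketch, not necessarily the prescription used on the slide): for a Gaussian kernel and a roughly Gaussian underlying f, the formula above reduces to Silverman's well-known b ≈ 1.06 σ n^(−1/5).

```python
import numpy as np

def rule_of_thumb_bandwidth(data):
    """b_opt for a Gaussian kernel, assuming the unknown f is roughly Gaussian."""
    n = len(data)
    sigma = np.std(data, ddof=1)
    return 1.06 * sigma * n ** (-1.0 / 5.0)

rng = np.random.default_rng(2)
data = rng.normal(0.0, 1.0, size=100)
print(rule_of_thumb_bandwidth(data))   # roughly 0.4 for 100 standard-normal samples
```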


Example: optimal bandwidth

For our histogram the optimal smoothing parameter is found to be a bandwidth of b ≈ 0.25. The estimate is now smooth, does not depend on the bin end-points, and depends only on the bandwidth.


Adaptive kernel density estimation

The bandwidth b is no longer constant: increase (decrease) b in regions with a small (high) data population. This preserves small features in high-statistics regions and minimizes jitter in low-statistics regions.

\hat{f}_n^K(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{b_i} K\!\left( \frac{x - x_i}{b_i} \right)

Static kernel: the widths of all Gaussians are identical. Adaptive kernel: the width of each Gaussian depends on the local density of events.
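A minimal sketch of the adaptive idea (my own, assuming numpy): compute a fixed-bandwidth “pilot” estimate first, then shrink bᵢ where the pilot density is high and widen it where it is low; bᵢ ∝ (pilot density)^(−1/2) is one common choice.

```python
import numpy as np

def gaussian_kde(x_eval, data, b):
    """Gaussian KDE; b may be a scalar or one bandwidth b_i per data point."""
    u = (x_eval[:, None] - data[None, :]) / b
    return np.mean(np.exp(-0.5 * u**2) / (np.sqrt(2.0 * np.pi) * b), axis=1)

def adaptive_kde(x_eval, data, b0):
    pilot = gaussian_kde(data, data, b0)          # pilot density at the data points
    b_i = b0 * np.sqrt(pilot.mean() / pilot)      # local (per-event) bandwidths
    return gaussian_kde(x_eval, data, b_i)

rng = np.random.default_rng(4)
data = np.concatenate([rng.normal(0.0, 1.0, 300),   # broad, well-populated region
                       rng.normal(5.0, 0.3, 60)])   # narrow, sparsely populated peak
x = np.linspace(-4.0, 7.0, 400)
print(adaptive_kde(x, data, b0=0.5).max())
```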
