EFFICIENCY OF FUNCTIONAL REGRESSION ESTIMATORS FOR ...

Report 4 Downloads 143 Views
EFFICIENCY OF FUNCTIONAL REGRESSION ESTIMATORS FOR COMBINING MULTIPLE LASER SCANS OF cDNA MICROARRAYS

Chris Glasbey & Mizan Khondoker

Biomathematics and Statistics Scotland

cDNA array to study ingestion of apoptotic cells by rodent macrophages

2

40000

Spot summary data from 4 laser scans

20000

30000

Y i4

10000

Y i3

0

Y i2

0

2000

4000

8000

Y i1

Functional regression model Yij ∼ N (µiβj , σj2)

6000

for gene i = 1, . . . , n, 3

scan j = 1, . . . , m

12 9 0

3

6

Y i2

6 0

3

Y i2

9

12

Functional regression is intractable if there are 2 variables and an intercept term, because there is insufficient information to distinguish σ1 from σ2

0

2

4

6

8

0

2

Y i1

4

Y i1

But what if there are > 2 variables and no intercept term? 4

6

8

Functional regression appears in many guises: • Ultrastructural relationship • One-factor factor analysis model • Instrumental- or latent-variable model • Errors-in-variables or measurement error model

5

c °Helge Roider, Chalmers University, Sweden

6

Stages in analysing microarray data: A. Estimation of gene expression level from scan data B. Normalisation to standardise means and variances C. Detection of differential expression D. Multivariate methods E. Dynamic models

7

Yij ∼ N (µiβj , σj2)

Model:

(with β1 ≡ 1 for identifiability)

The minimum variance unbiased estimator of gene i expression level is µˆ i =

X

βj2/σj2 wj = X 2 2 k βk /σk

Yij , j wj βj

But how do we estimate βs and σs? Maximum likelihood fails, because likelihood → ∞ as σj2 → 0 for any j However, for m ≥ 3 there is sufficient information in second moments to estimate βs and σs, for example by minimising: trace{(S−V )2} :

Y TY 2} S= , V = E(S) = µ2ββ T +diag{σ12, . . . , σm n 8

2nd moment estimates for

j 1 2 3 4 all

Yij ∼ N (µiβj , σj2)

βˆj 1.00 1.56 2.75 4.29

σˆ j 27.0 30.6 49.6 149.5

sd(ˆ µ) 27.0 19.6 18.1 34.8 11.3

These estimators are consistent, but are they efficient?

9

We simulated 1000 datasets similar to the observed data, then fitted the model 106×rmse Estimation method 2nd moments centred 2nd moments 1st and 2nd moments 1st and centred 2nd moments

βˆ2 562 826 768 768

βˆ3 961 1406 1316 1316

βˆ4 2110 3003 2858 2858

103×rmse σˆ 1 344 344 13002 12907

σˆ 2 507 507 18210 18216

Are these the smallest possible root-mean-square errors?

10

σˆ 3 920 920 30121 30045

σˆ 4 1389 1389 31226 31024

Maximum likelihood theory fails because the number of parameters increases with sample size One way around this is to eliminate µ ¶ βj Yij − Yik ≡ Zijk ∼ N (0 , βk

the µs. For example: 2 β 2 ≡ σ2 + j σ2 where ωjk j βk2 k

2 ), ωjk

We can estimate the βs and σs by maximising the product of likelihoods of all such pairwise differences: "

2 Y 1 Y Zijk exp − 2 i j6=k ωjk 2ωjk

11

#

More simulation results: 106×rmse Estimation method 2nd moments Pairwise differences or (βk Yij − βj Yik )

βˆ2 562 601 2856

βˆ3 961 1073 5075

None of these methods is efficient!

12

103×rmse βˆ4 2110 2351 7516

σˆ 1 344 270 274

σˆ 2 507 398 398

σˆ 3 920 707 706

σˆ 4 1389 1392 1392

Chan & Mak (Biometrika, 1983) proposed consistent estimators for the more general model Yij ∼ N (αj +µiβj , σj2), which they showed to be identical to maximum likelihood estimators for the Gaussian structural model µi ∼ N (ν, τ 2)



2 }) Yi. ∼ Nm(α+νβ , τ 2ββ T + diag{σ12, . . . , σm

We omit αs We could also omit ν, because Yi. and −Yi. are equivalent in the functional regression model, and minimise log |V | + trace(SV −1) If the set of values of µ is compatible with a Gaussian distribution, we would expect these estimators to be reasonably efficient. But, what if they are far from Gaussian?

13

More simulation results: 106×rmse Estimation method 2nd moments Pairwise differences Gaussian Gaussian (with ν ≡ 0)

βˆ2 562 601 562 562

βˆ3 961 1073 961 962

βˆ4 2110 2351 2109 2110

103×rmse σˆ 1 344 270 269 269

σˆ 2 507 398 396 396

σˆ 3 920 707 707 707

σˆ 4 1389 1392 1389 1390

Are the Gaussian methods efficient? If we assume the µs are drawn from a distribution, then this can be approximated by a mixture of Gaussians

14

0.0

0.2

0.4

p

0.6

0.8

1.0

Empirical and fitted cumulative distribution function of µ

200

Where µ ∼ N (νk , τk2) k 1 2 3 4 5

500

1000

2000

5000 10000

µ

with probability πk νˆk 320 453 808 1530 3379

τˆk 48 90 241 536 1632 15

for k = 1, . . . , K πˆ k 0.25 0.27 0.28 0.17 0.03

If µ is distributed as a mixture of Gaussians, then Yi. is a mixture of mdimensional Gaussians: 2 }) prob. π for k = 1, . . . , K Yi. ∼ Nm(νk β , τk2ββ T +diag{σ12, . . . , σm k

We can estimate βs and σs by maximising the likelihood of this mixture distribution

If we assume that the distribution of µ is known, the remaining parameters should be estimated with super efficiency

16

We simulated 1000 datasets from the mixture model, then fitted by the five methods: 106×rmse Estimation method 2nd moments Pairwise differences Gaussian Gaussian (with ν ≡ 0) Known Gaussian mixture

βˆ2 562 601 562 562 563

βˆ3 961 1073 961 962 961

βˆ4 2110 2351 2109 2110 2115

103×rmse σˆ 1 344 270 269 269 269

σˆ 2 507 398 396 396 395

σˆ 3 920 707 707 707 701

rmse σˆ 4 1389 1392 1389 1390 1387

µˆ 11.3 11.3 11.3 11.3 11.3

Using a known Gaussian mixture should give a lower bound on what is achievable by any other method, and hence a lower bound on efficiencies We see that the Gaussian methods, with or without ν, are efficient Still true if we include intercept terms (α) However, all methods estimate µs efficiently 17

To see what happens with noisier data, we repeated the simulations with 10 × σs: 105×rmse Estimation method 2nd moments Pairwise differences Gaussian Gaussian (with ν ≡ 0) Known Gaussian mixture

βˆ2 579 2292 571 571 537

βˆ3 998 4742 979 979 909

102×rmse βˆ4 2145 8972 2128 2126 2071

σˆ 1 344 271 268 269 254

Gaussian methods are less efficient in this case

18

σˆ 2 507 393 392 396 344

σˆ 3 920 701 696 706 609

rmse σˆ 4 1389 1565 1382 1390 1345

µˆ 113 114 113 113 113

Summary Combining multiple laser scans of cDNA microarrays can be formulated as a functional regression problem Maximum likelihood cannot be used to fit such models, even though in ≥ 2 dimensions there is sufficient information in the data to estimate all parameters However, we have shown that fitting a seemingly inappropriate Gaussian structural model leads to efficient estimators in our region of parameter space Matching second moments and using pairwise differences are less efficient, though better than several other methods we have used For further details, see paper on http://www.bioss.ac.uk/~chris 19