EFFICIENCY OF FUNCTIONAL REGRESSION ESTIMATORS FOR COMBINING MULTIPLE LASER SCANS OF cDNA MICROARRAYS
Chris Glasbey & Mizan Khondoker
Biomathematics and Statistics Scotland
cDNA array to study ingestion of apoptotic cells by rodent macrophages
2
40000
Spot summary data from 4 laser scans
20000
30000
Y i4
10000
Y i3
0
Y i2
0
2000
4000
8000
Y i1
Functional regression model Yij ∼ N (µiβj , σj2)
6000
for gene i = 1, . . . , n, 3
scan j = 1, . . . , m
12 9 0
3
6
Y i2
6 0
3
Y i2
9
12
Functional regression is intractable if there are 2 variables and an intercept term, because there is insufficient information to distinguish σ1 from σ2
0
2
4
6
8
0
2
Y i1
4
Y i1
But what if there are > 2 variables and no intercept term? 4
6
8
Functional regression appears in many guises: • Ultrastructural relationship • One-factor factor analysis model • Instrumental- or latent-variable model • Errors-in-variables or measurement error model
5
c °Helge Roider, Chalmers University, Sweden
6
Stages in analysing microarray data: A. Estimation of gene expression level from scan data B. Normalisation to standardise means and variances C. Detection of differential expression D. Multivariate methods E. Dynamic models
7
Yij ∼ N (µiβj , σj2)
Model:
(with β1 ≡ 1 for identifiability)
The minimum variance unbiased estimator of gene i expression level is µˆ i =
X
βj2/σj2 wj = X 2 2 k βk /σk
Yij , j wj βj
But how do we estimate βs and σs? Maximum likelihood fails, because likelihood → ∞ as σj2 → 0 for any j However, for m ≥ 3 there is sufficient information in second moments to estimate βs and σs, for example by minimising: trace{(S−V )2} :
Y TY 2} S= , V = E(S) = µ2ββ T +diag{σ12, . . . , σm n 8
2nd moment estimates for
j 1 2 3 4 all
Yij ∼ N (µiβj , σj2)
βˆj 1.00 1.56 2.75 4.29
σˆ j 27.0 30.6 49.6 149.5
sd(ˆ µ) 27.0 19.6 18.1 34.8 11.3
These estimators are consistent, but are they efficient?
9
We simulated 1000 datasets similar to the observed data, then fitted the model 106×rmse Estimation method 2nd moments centred 2nd moments 1st and 2nd moments 1st and centred 2nd moments
βˆ2 562 826 768 768
βˆ3 961 1406 1316 1316
βˆ4 2110 3003 2858 2858
103×rmse σˆ 1 344 344 13002 12907
σˆ 2 507 507 18210 18216
Are these the smallest possible root-mean-square errors?
10
σˆ 3 920 920 30121 30045
σˆ 4 1389 1389 31226 31024
Maximum likelihood theory fails because the number of parameters increases with sample size One way around this is to eliminate µ ¶ βj Yij − Yik ≡ Zijk ∼ N (0 , βk
the µs. For example: 2 β 2 ≡ σ2 + j σ2 where ωjk j βk2 k
2 ), ωjk
We can estimate the βs and σs by maximising the product of likelihoods of all such pairwise differences: "
2 Y 1 Y Zijk exp − 2 i j6=k ωjk 2ωjk
11
#
More simulation results: 106×rmse Estimation method 2nd moments Pairwise differences or (βk Yij − βj Yik )
βˆ2 562 601 2856
βˆ3 961 1073 5075
None of these methods is efficient!
12
103×rmse βˆ4 2110 2351 7516
σˆ 1 344 270 274
σˆ 2 507 398 398
σˆ 3 920 707 706
σˆ 4 1389 1392 1392
Chan & Mak (Biometrika, 1983) proposed consistent estimators for the more general model Yij ∼ N (αj +µiβj , σj2), which they showed to be identical to maximum likelihood estimators for the Gaussian structural model µi ∼ N (ν, τ 2)
⇒
2 }) Yi. ∼ Nm(α+νβ , τ 2ββ T + diag{σ12, . . . , σm
We omit αs We could also omit ν, because Yi. and −Yi. are equivalent in the functional regression model, and minimise log |V | + trace(SV −1) If the set of values of µ is compatible with a Gaussian distribution, we would expect these estimators to be reasonably efficient. But, what if they are far from Gaussian?
13
More simulation results: 106×rmse Estimation method 2nd moments Pairwise differences Gaussian Gaussian (with ν ≡ 0)
βˆ2 562 601 562 562
βˆ3 961 1073 961 962
βˆ4 2110 2351 2109 2110
103×rmse σˆ 1 344 270 269 269
σˆ 2 507 398 396 396
σˆ 3 920 707 707 707
σˆ 4 1389 1392 1389 1390
Are the Gaussian methods efficient? If we assume the µs are drawn from a distribution, then this can be approximated by a mixture of Gaussians
14
0.0
0.2
0.4
p
0.6
0.8
1.0
Empirical and fitted cumulative distribution function of µ
200
Where µ ∼ N (νk , τk2) k 1 2 3 4 5
500
1000
2000
5000 10000
µ
with probability πk νˆk 320 453 808 1530 3379
τˆk 48 90 241 536 1632 15
for k = 1, . . . , K πˆ k 0.25 0.27 0.28 0.17 0.03
If µ is distributed as a mixture of Gaussians, then Yi. is a mixture of mdimensional Gaussians: 2 }) prob. π for k = 1, . . . , K Yi. ∼ Nm(νk β , τk2ββ T +diag{σ12, . . . , σm k
We can estimate βs and σs by maximising the likelihood of this mixture distribution
If we assume that the distribution of µ is known, the remaining parameters should be estimated with super efficiency
16
We simulated 1000 datasets from the mixture model, then fitted by the five methods: 106×rmse Estimation method 2nd moments Pairwise differences Gaussian Gaussian (with ν ≡ 0) Known Gaussian mixture
βˆ2 562 601 562 562 563
βˆ3 961 1073 961 962 961
βˆ4 2110 2351 2109 2110 2115
103×rmse σˆ 1 344 270 269 269 269
σˆ 2 507 398 396 396 395
σˆ 3 920 707 707 707 701
rmse σˆ 4 1389 1392 1389 1390 1387
µˆ 11.3 11.3 11.3 11.3 11.3
Using a known Gaussian mixture should give a lower bound on what is achievable by any other method, and hence a lower bound on efficiencies We see that the Gaussian methods, with or without ν, are efficient Still true if we include intercept terms (α) However, all methods estimate µs efficiently 17
To see what happens with noisier data, we repeated the simulations with 10 × σs: 105×rmse Estimation method 2nd moments Pairwise differences Gaussian Gaussian (with ν ≡ 0) Known Gaussian mixture
βˆ2 579 2292 571 571 537
βˆ3 998 4742 979 979 909
102×rmse βˆ4 2145 8972 2128 2126 2071
σˆ 1 344 271 268 269 254
Gaussian methods are less efficient in this case
18
σˆ 2 507 393 392 396 344
σˆ 3 920 701 696 706 609
rmse σˆ 4 1389 1565 1382 1390 1345
µˆ 113 114 113 113 113
Summary Combining multiple laser scans of cDNA microarrays can be formulated as a functional regression problem Maximum likelihood cannot be used to fit such models, even though in ≥ 2 dimensions there is sufficient information in the data to estimate all parameters However, we have shown that fitting a seemingly inappropriate Gaussian structural model leads to efficient estimators in our region of parameter space Matching second moments and using pairwise differences are less efficient, though better than several other methods we have used For further details, see paper on http://www.bioss.ac.uk/~chris 19