Trajectory Modeling by Shape

Trajectory Modeling by Shape
Nicholas P. Jewell
Departments of Statistics & School of Public Health (Biostatistics), University of California, Berkeley
San Diego Chapter, American Statistical Association
Pfizer, La Jolla, October 16, 2013

Thanks
•  Joint work with Brianna Heggeseth (Williams College)

References
•  Heggeseth, BC and Jewell, NP. The impact of covariance misspecification in multivariate Gaussian mixtures on estimation and inference: an application to longitudinal modeling. Statistics in Medicine, 2013, 32, 2790-2803.
•  Heggeseth, BC and Jewell, NP. Clustering trajectories by shape. To appear.

“Understanding our world requires conceptualizing the similarities and differences between the entities that compose it” Robert Tryon and Daniel Bailey, 1970


How does BMI change with age?

[Figure: BMI trajectories by age from the National Longitudinal Study of Youth (NLSY), 1979-2008.]

Typical Longitudinal Analysis
•  Use Generalized Estimating Equations (GEE) to estimate the mean outcome, and how it changes over time, adjusting for covariates:
  -  regression parameter estimation is consistent despite potential covariance misspecification
  -  efficiency can be gained through a more appropriate working correlation structure
  -  robust (sandwich) standard error estimators are available
•  But, with a heterogeneous population,
  -  BMI does not change much for some people as they age
  -  BMI changes considerably for others
•  We don't wish to average out these separate trajectories by modeling only the mean over time.

Finite Mixture Models
•  Data for n individuals: $y_i = (y_{i1}, \ldots, y_{im_i})$, measured at times $t_i = (t_{i1}, \ldots, t_{im_i})$.
•  We assume K latent trajectories in the population, distributed with frequencies $\pi_1, \ldots, \pi_K$.
•  The (conditional) mixture density is
$$f(y \mid t, \theta) = \pi_1 f(y \mid t, \beta_1, \Sigma_1) + \cdots + \pi_K f(y \mid t, \beta_K, \Sigma_K),$$
where $\pi_k > 0$, $\sum_{k=1}^{K} \pi_k = 1$, and $f(y \mid t, \beta_k, \Sigma_k)$ is multivariate Gaussian with mean $\mu_k$ and covariance $\Sigma_k$; here $\theta = (\pi_1, \ldots, \pi_K; \beta_1, \ldots, \beta_K; \Sigma_1, \ldots, \Sigma_K)$.
•  In most trajectory software, (conditional) independence is assumed as a working correlation structure: $\Sigma_k = \sigma_k^2 I$.
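When all individuals share a common set of observation times, a working-independence mixture of this form can be fitted with off-the-shelf software. The sketch below is ours, not the talk's software: it uses scikit-learn's GaussianMixture with spherical covariances, i.e. $\Sigma_k = \sigma_k^2 I$, on data simulated from two hypothetical linear-mean groups.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
t = np.arange(1, 11)                       # common observation times, m = 10

# Simulate two hypothetical linear-mean groups (values chosen for illustration).
Y = np.vstack([5 + 0.5 * t + rng.normal(0, 1, (100, t.size)),
               15 - 0.5 * t + rng.normal(0, 1, (100, t.size))])

# Working independence: Sigma_k = sigma_k^2 I corresponds to covariance_type="spherical".
gm = GaussianMixture(n_components=2, covariance_type="spherical", random_state=0)
labels = gm.fit_predict(Y)
print("mixing proportions:", gm.weights_.round(2))
print("mean trajectories:\n", gm.means_.round(1))
```

Note that this fits an unconstrained mean at each time point; the polynomial mean models on the next slide constrain $\mu_k$ to a regression in t, which requires a purpose-built EM rather than this off-the-shelf call.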

Finite Mixture Models
•  The mean vector $\mu_k$ is related to the observation times as follows:
  -  Linear: $(\mu_k)_j = \beta_{k0} + \beta_{k1} t_{ij}$
  -  Quadratic: $(\mu_k)_j = \beta_{k0} + \beta_{k1} t_{ij} + \beta_{k2} t_{ij}^2$
  -  Splines in the observation times
where the form of the regression model is assumed the same for each cluster (with cluster-specific coefficients), and $t_{ij}$ is the jth observation time for the ith individual, $1 \le j \le m_i$.

Finite Mixture Models
•  Group membership:
$$\pi_k = \frac{\exp(\gamma_k z)}{\sum_{j=1}^{K} \exp(\gamma_j z)}$$
where z is a set of the same or different covariates. This expands $\theta$ to include the $\gamma$'s as well.
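A small numerical sketch of this multinomial-logit membership model (the coefficients are made up; one group's $\gamma$ is pinned at zero for identifiability):

```python
import numpy as np

def membership_probs(z, gammas):
    """pi_k(z) = exp(gamma_k' z) / sum_j exp(gamma_j' z)."""
    scores = gammas @ z             # one linear score per group
    scores = scores - scores.max()  # stabilize the exponentials
    w = np.exp(scores)
    return w / w.sum()

# Hypothetical coefficients for K = 3 groups and covariates z = (1, exposure);
# the last row is the reference group, fixed at zero.
gammas = np.array([[0.5, -1.0],
                   [0.2,  0.8],
                   [0.0,  0.0]])
print(membership_probs(np.array([1.0, 2.0]), gammas))
```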

Estimation for Mixture Models
•  Maximum likelihood estimation for θ via the EM algorithm.
•  K is pre-specified; it can be chosen using the BIC.
•  Parameter estimators are not consistent under covariance misspecification (White, 1982; Heggeseth and Jewell, 2013).
•  Robust (sandwich) standard error estimators are available.
•  How bad can the bias in regression estimators be? What influences its size?
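The BIC-based choice of K can be sketched as follows, reusing the GaussianMixture setup from the earlier snippet (an illustration only, with stand-in data, not the talk's software):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
t = np.arange(1, 11)
# Stand-in data: two linear-mean groups, as in the earlier sketch.
Y = np.vstack([5 + 0.5 * t + rng.normal(0, 1, (100, t.size)),
               15 - 0.5 * t + rng.normal(0, 1, (100, t.size))])

# Fit mixtures over a range of K and select the one with the smallest BIC.
fits = {k: GaussianMixture(n_components=k, covariance_type="spherical",
                           n_init=5, random_state=0).fit(Y)
        for k in range(1, 7)}
best_k = min(fits, key=lambda k: fits[k].bic(Y))
print({k: round(m.bic(Y), 1) for k, m in fits.items()}, "-> K =", best_k)
```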

Misspecified Covariance Structure: Bias and Separation of Trajectories
•  Separated components lead to little bias even when you wrongly assume independence.

[Figure: black dashed lines are the true means; solid lines are the estimated means. $\widehat{SE}_I(\beta_{01}) = 0.01$ and $\widehat{SE}_R(\beta_{01}) = 0.01$ in one panel; $\widehat{SE}_I(\beta_{01}) = 0.02$ and $\widehat{SE}_R(\beta_{01}) = 0.06$ in the other (I: independence-based, R: robust).]

Misspecified Covariance Structure: Bias and Level of Dependence
•  Components with little dependence lead to small bias even when you wrongly assume independence.

[Figure: black dashed lines are the true means; solid lines are the estimated means. $\widehat{SE}_I(\beta_{01}) = 0.02$ and $\widehat{SE}_R(\beta_{01}) = 0.06$ in one panel; $\widehat{SE}_I(\beta_{01}) = 0.03$ and $\widehat{SE}_R(\beta_{01}) = 0.04$ in the other.]

NLSY Data Analysis

Covariance structure makes a difference to the estimated trajectories
  -  it is hard to estimate the bias arising from a misspecified covariance

How Do We Group These Blocks?


Group by Color


Group by Shape


How Do We Group These Blocks?


Group by Color or Shape


How Do We Group These (Regression) Lines?

[Figure: a collection of regression lines, y plotted against x.]

Group by Intercept

[Figure: the regression lines grouped by intercept, y versus x.]

Group by Level

[Figure: the regression lines grouped by level, y versus x.]

Group by Shape (Slope)

[Figure: the regression lines grouped by shape (slope), y versus x.]

Simulated Data

[Figure: simulated individual trajectories, Y versus Time.]

How could we group these individuals?


Real Longitudinal Data
•  Center for the Health Assessment of Mothers and Children of Salinas (CHAMACOS) Study
  -  In 1999-2000, enrolled 601 pregnant women in the agricultural Salinas Valley, CA
  -  Mostly Hispanic agricultural workers
  -  Aim: determine whether exposure to pesticides and other chemicals affects children's growth patterns (BMI, neurological measures, etc.)
•  First, focus on studying/estimating the growth patterns of children.
•  Second, determine whether early-life predictors are related to the patterns:
  -  pesticide exposure
  -  BPA (bisphenol A)

CHAMACOS Data

[Figure: children's BMI trajectories, BMI (15-40) versus age in months (10-120).]

How could we group these individuals?

Cluster Analyses
•  Clustering is the task of assigning a set of objects to groups so that objects in the same group are more similar to each other than to objects in other groups.
•  What does it mean for objects to be more similar or more dissimilar?
  -  A distance (dissimilarity) matrix
•  Why do we cluster objects?

Standard Clustering Methods
•  Partition methods
  -  Partition objects into K groups so that an objective function of dissimilarities is minimized or maximized.
  -  Example: the K-means algorithm
•  Model-based methods
  -  Assume a model that includes a grouping structure and estimate its parameters.
  -  Example: finite mixture models

K-means Algorithm
•  Input: data for n individuals in vector form. For individual i, the observed data vector is $y_i = (y_{i1}, \ldots, y_{im})$.
•  Measure of dissimilarity: squared Euclidean distance. The dissimilarity between the 1st and 2nd individuals is
$$d(y_1, y_2) = \|y_1 - y_2\|^2 = (y_{11} - y_{21})^2 + \cdots + (y_{1m} - y_{2m})^2$$

K-means Algorithm
•  Goal: partition individuals into K sets $C = \{C_1, C_2, \ldots, C_K\}$ so as to minimize the within-cluster sum of squares
$$\sum_{k=1}^{K} \sum_{y_i \in C_k} \|y_i - \mu_k\|^2$$
where $\mu_k$ is the mean vector of the individuals in $C_k$.

(K must be known before starting K-means. There are many ways to choose K from the data that try to minimize the dissimilarity within each cluster while maximizing the dissimilarity between clusters: for example, the use of silhouettes.)
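A minimal sketch of K-means on trajectory vectors, with K chosen by the average silhouette width (scikit-learn names; the data matrix is a stand-in built as in the earlier snippets):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
t = np.arange(1, 11)
Y = np.vstack([5 + 0.5 * t + rng.normal(0, 1, (100, t.size)),      # stand-in data,
               15 - 0.5 * t + rng.normal(0, 1, (100, t.size))])    # two groups

best = None
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(Y)
    score = silhouette_score(Y, labels)          # average silhouette width
    if best is None or score > best[0]:
        best = (score, k, labels)
print("K chosen by silhouette:", best[1])
```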

Application to Simulated Data: K-means

[Figure: the simulated trajectories with their K-means cluster assignments, Y versus Time.]

How would you describe and interpret the group trajectories?

Finite Mixture Model Applied to CHAMACOS Data

[Figure: estimated group trajectories from a mixture model with working independence; BMI versus age in months.]

Finite Mixture Model Applied to CHAMACOS Data

[Figure: estimated group trajectories from a mixture model with exponential covariance; BMI versus age in months.]

Clustering by Shape
•  Interested in shape, not just level (which appears to dominate clustering techniques).
•  Want a method that:
  -  works with irregularly sampled data
  -  includes a way to estimate the relationship between baseline risk factors and group membership
  -  groups individuals according to the outcome pattern over time, ignoring the level

Clustering by Shape: Options
•  Estimate slopes between neighboring observations and cluster on the "derived" observations.
•  Fit splines for each individual, differentiate, and cluster on the coefficients of the resulting derivative.
•  Use partition-based clustering methods (like PAM) with (i) the Pearson correlation dissimilarity
$$d_{corr}(x, y) = 1 - \mathrm{Corr}(x, y)$$
or (ii) the cosine-angle dissimilarity
$$d_{cos}(x, y) = 1 - \frac{\big(\sum_{j=1}^{m} x_j y_j\big)^2}{\big(\sum_{j=1}^{m} x_j^2\big)\big(\sum_{j=1}^{m} y_j^2\big)}$$
(a computational sketch follows this list).
•  Vertically shift the individual trajectories.
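Both dissimilarities are straightforward to compute directly. A minimal sketch with our own helper functions; since PAM itself is not available in scipy or scikit-learn, average-linkage hierarchical clustering on the precomputed dissimilarity matrix stands in for the partitioning step:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def d_corr(x, y):
    """Correlation dissimilarity: 1 - Corr(x, y)."""
    return 1.0 - np.corrcoef(x, y)[0, 1]

def d_cos(x, y):
    """Cosine-angle dissimilarity: 1 - (x.y)^2 / (|x|^2 |y|^2)."""
    return 1.0 - np.dot(x, y) ** 2 / (np.dot(x, x) * np.dot(y, y))

def shape_cluster(Y, k, dissim=d_corr):
    """Cluster the rows of Y into k groups using a precomputed shape dissimilarity."""
    n = Y.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = dissim(Y[i], Y[j])
    # Average-linkage hierarchical clustering stands in for PAM here.
    Z = linkage(squareform(D, checks=False), method="average")
    return fcluster(Z, t=k, criterion="maxclust")

# Usage: labels = shape_cluster(Y, k=3, dissim=d_cos)
```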

Vertical Shifting
•  For each individual, calculate
$$y_i^* = y_i - m_i^{-1} \sum_{j=1}^{m_i} y_{ij}.$$
•  Each individual now has mean zero, and so level is removed from any resulting clustering.
•  Apply a clustering technique to the shifted data, e.g. a finite mixture model (a one-line sketch follows below).
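With a common observation grid, the shift is a single row-demeaning step (a sketch; the matrix Y is a stand-in):

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.normal(size=(200, 10))    # stand-in trajectory matrix (rows = individuals)

# Row-demean: each individual's trajectory now has mean zero, removing its level.
Y_shifted = Y - Y.mean(axis=1, keepdims=True)

# Any of the earlier clustering sketches can now be applied to Y_shifted;
# only shape information remains.
```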

Correlation Models for Vertically Shifted Data
•  Without specifying the group, suppose
$$y_i = \alpha_i \mathbf{1}_{m_i} + \mu_i + \epsilon_i, \qquad \alpha_i \sim F, \quad \epsilon_i \sim N(0, \Sigma),$$
where $\mathbf{1}_{m_i}$ is an $m_i$-length vector of 1s and $\mu_{ij} = \mu_k(t_{ij})$ is the jth element of the vector of mean values for the kth group evaluated at the observation times $t_i$. Writing $A_i = I_{m_i} - m_i^{-1}\mathbf{1}\mathbf{1}^T$ for the centering matrix that performs the vertical shift, we have
$$y_i^* = A_i y_i = (\mu_i - \bar{\mu}_i \mathbf{1}) + (\epsilon_i - \bar{\epsilon}_i \mathbf{1}),$$
so the random level $\alpha_i$ is removed by the shift.

Correlation Models for Vertically Shifted Data
$$\mathrm{Cov}(Y_i^* - \mu_i) = \mathrm{Cov}\big((A - I_{m_i})\mu_i + A\epsilon\big)$$
Two components of the covariance:
  -  one induced by the averaging process
  -  one induced by the (random) observation times

Correlation Models for Vertically Shifted Data: Observation Times Fixed
$$\mathrm{Cov}(Y^* - \mu) = A \Sigma A^T,$$
suppressing the individual/group indices for simplicity (Σ is allowed to vary across clusters).

This covariance matrix is singular, since $\det(A) = 0$; this naturally reflects the "loss" of one dimension.

Correlation Models for Vertically Shifted Data: Observation Times Fixed
$$\mathrm{Cov}(Y^* - \mu) = A \Sigma A^T$$
•  If $\Sigma = \sigma^2 I$ (conditional independence with constant variance), then the induced covariance is exchangeable, with negative correlation given by $-1/(m-1)$ and variance decreased to $\sigma^2 (m-1)/m$.
•  If the original covariance is exchangeable with constant variance and correlation ρ, then the induced covariance remains exchangeable, with negative correlation and reduced variance $\sigma^2 (1-\rho)(m-1)/m$.
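These facts are easy to verify numerically; a quick sketch with the centering matrix $A = I - m^{-1}\mathbf{1}\mathbf{1}^T$ defined above:

```python
import numpy as np

m, sigma2 = 5, 2.0
A = np.eye(m) - np.ones((m, m)) / m          # centering matrix
induced = A @ (sigma2 * np.eye(m)) @ A.T     # Cov(Y* - mu) under Sigma = sigma^2 I

var = induced[0, 0]
corr = induced[0, 1] / var
print(var, sigma2 * (m - 1) / m)             # variances match: sigma^2 (m-1)/m
print(corr, -1 / (m - 1))                    # correlation matches: -1/(m-1)
print(np.linalg.matrix_rank(induced))        # rank m-1: one dimension lost
```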

Correlation Models for Vertically Shifted Data: Observation Times Fixed
•  As above, if $\Sigma = \sigma^2 I$, the induced covariance is exchangeable, with negative correlation $-1/(m-1)$ and variance $\sigma^2 (m-1)/m$.
•  This induced correlation, $-1/(m-1)$, is the lower bound for the correlation in an exchangeable matrix.
•  Thus, if "estimated", the (true) parameter is on the boundary of the parameter space.

Correlation Models for Vertically Shifted Data: Observation Times Random (μ is random)
$$\mathrm{Cov}(Y^* - \mu) = m^{-2} \sum_{j=1}^{m} \mathrm{Var}(t_j)\,[\mu'(E(t_j))]^2\, \mathbf{1}\mathbf{1}^T + A \Sigma A^T$$
This is the sum of two non-invertible matrices, but the positive magnitude of the first may counteract the negative correlations of the second.

Correlation Models for Vertically Shifted Data: Observation Times Random (μ is random)
$$\mathrm{Cov}(Y^* - \mu) = m^{-2} \sum_{j=1}^{m} \mathrm{Var}(t_j)\,[\mu'(E(t_j))]^2\, \mathbf{1}\mathbf{1}^T + A \Sigma A^T$$
500 simulations of
$$Y_i = \mu_i + \epsilon_i, \qquad \epsilon_i \sim N(0, \Sigma_\rho),$$
where the error covariance matrix is of exponential form with range ρ, and
$$t = T + \tau, \qquad \tau \sim N(0, \sigma_\tau^2 I), \qquad T = (1, 2, \ldots, 9, 10), \qquad \mu_{ij} = \mu(t_{ij}).$$
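One replicate of this design can be sketched as follows; the mean function μ and the parameter values are our own placeholders, and the exponential covariance is taken as $\Sigma_\rho[j,k] = \sigma^2 \exp(-|t_j - t_k|/\rho)$:

```python
import numpy as np

rng = np.random.default_rng(0)
T = np.arange(1.0, 11.0)                  # fixed target times (1, 2, ..., 10)
m, rho, sigma2, tau2 = T.size, 2.0, 1.0, 0.1

def mu(t):
    return 1.0 + 0.5 * t                  # placeholder mean function

def one_replicate():
    t = T + rng.normal(0.0, np.sqrt(tau2), size=m)   # jittered observation times
    Sigma = sigma2 * np.exp(-np.abs(np.subtract.outer(t, t)) / rho)
    y = mu(t) + rng.multivariate_normal(np.zeros(m), Sigma)
    return (y - y.mean()) - mu(t)         # Y* - mu: shifted data minus the mean

# Empirical Cov(Y* - mu) over 500 replicates; compare with the formula above.
devs = np.array([one_replicate() for _ in range(500)])
print(np.cov(devs, rowvar=False).round(3))
```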

Correlation Models for Vertically Shifted Data

[Figure: simulation results for the induced covariance under random observation times.]

Simulated Data

[Figure: the simulated trajectories again, Y versus Time.]

How could we group these individuals?

Application to Simulated Data: K-means

[Figure: the K-means clustering of the simulated trajectories, as before.]

How would you describe and interpret the group trajectories?

Vertical Shifting Applied to Simulated Data

[Figure: the simulated trajectories clustered by a vertically shifted mixture model with exponential covariance, Y versus Time.]

500 Simulations
$$\mu_1(t) = -1 - t \qquad \text{(negative slope, low level)}$$
$$\mu_2(t) = 11 - t \qquad \text{(negative slope, high level)}$$
$$\mu_3(t) = 0 \qquad \text{(zero slope, middle level)}$$
$$\mu_4(t) = -11 + t \qquad \text{(positive slope, low level)}$$
$$\mu_5(t) = 1 + t \qquad \text{(positive slope, high level)}$$
Mean functions are evaluated at five equidistant points that span [1, 10], including the ends of the interval.

500 Simulations
•  Same five mean functions as above.
•  Two components to the noise:
  -  a random individual-level perturbation, $N(0, \sigma^2)$
  -  random measurement error across times (exchangeable correlation), $N(0, \sigma_\epsilon^2)$
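Generating one dataset under this design might look like the following sketch (group sizes and noise scales are our own placeholders; the shared individual level plus i.i.d. measurement error yields the exchangeable within-individual correlation described above):

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(1, 10, 5)           # five equidistant points spanning [1, 10]
means = [lambda t: -1 - t, lambda t: 11 - t, lambda t: 0 * t,
         lambda t: -11 + t, lambda t: 1 + t]
n_per_group, sigma, sigma_eps = 100, 2.0, 1.0   # placeholder sizes and scales

Y = np.vstack([
    mu(t) + rng.normal(0, sigma)                    # random individual level
          + rng.normal(0, sigma_eps, size=t.size)   # measurement error
    for mu in means for _ in range(n_per_group)
])
labels_true = np.repeat(np.arange(5), n_per_group)  # true group memberships
```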

500 Simulations

[Figure: simulation results.]

Vertical Shifting with CHAMACOS

[Figure: estimated group trajectories from a vertically shifted mixture model with exponential covariance; BMI versus age in months.]

Further Thoughts
•  Interest often focuses on the role of explanatory covariates as well as on the trajectories themselves:
  -  two-part models: (i) a model for the overall level; (ii) a model for the shape of the trajectory
•  Time-dependent covariates