Vol. 18 Suppl. 2 2002 Pages S211–S218

BIOINFORMATICS

Genome segmentation using piecewise constant intensity models and reversible jump MCMC

Marko Salmenkivi 1, Juha Kere 2 and Heikki Mannila 1

1 HIIT Basic Research Unit, Department of Computer Science, University of Helsinki, PO Box 26, FIN-00014 Helsinki, Finland and 2 Molecular Genetics, Department of Biosciences at Novum, Karolinska Institute, S-141 57 Huddinge, Sweden

Received on April 8, 2002; accepted on June 15, 2002

ABSTRACT

The existence of whole genome sequences makes it possible to search for global structure in the genome. We consider modeling the occurrence frequencies of discrete patterns (such as starting points of ORFs or other interesting phenomena) along the genome. We use piecewise constant intensity models with a varying number of pieces, and show how a reversible jump Markov chain Monte Carlo (RJMCMC) method can be used to obtain a posterior distribution on the intensity of the patterns along the genome. We apply the method to modeling the occurrence of ORFs in the human genome. The results show that the chromosomes consist of 5–35 clearly distinct segments, and that the posterior number and length of the segments show significant variation. On the other hand, for the yeast genome the intensity of ORFs is nearly constant.

Contact: [email protected]; [email protected]; [email protected]

INTRODUCTION

When whole genomes of organisms are available, it becomes possible to model their global structure. A relatively simple question is: given a class of discrete patterns that occur in various parts of the genome, how are the occurrences distributed? Is the distribution uniform, or is there significant evidence of clear segments?

More generally, the occurrence of patterns in the genome can be considered as an example of an event sequence. Intuitively, such a sequence consists of pairs (e, t), where t is the occurrence location of the pattern or event, and e is the type of the event. For the genomic application, the variable t is the location along the chromosome, and e is the type of the pattern. Event sequences also occur in other applications, where the attribute t typically is the occurrence time of the event. Examples include telecommunication network management, web access logs, and biostatistics (survival data); see Mannila et al. (1997) for examples.

Stochastically, a sequence consisting of (conditionally) independent events can be seen as being produced by a Poisson process, which has an intensity describing the frequency of events (Papoulis, 1991). Location- (or time-) dependent intensity functions can be used in modeling the frequencies of occurrences of patterns and the interactions between occurrences of different types of patterns.

Segmentation of sequences can be done in various ways. Dynamic programming approaches have been used in many different application areas, such as line segment approximation (Bellman, 1961), phoneme recognition (Xiong et al., 1994), and paleoecology (Bennett, 1996). The problem with the dynamic programming methods is that, as maximum likelihood methods, they always provide a segmentation with a given number of segments, whether or not the data supports one. This can lead to spurious or downright misleading results, unless care is taken to carefully control the significance of the output.

In this paper we use Bayesian modeling with reversible jump Markov chain Monte Carlo (RJMCMC) simulation methods (Green, 1995) for finding posterior distributions of piecewise constant intensity functions for the data. While the models used to describe the data are piecewise constant, the posterior distribution can have an arbitrarily complex form. This means that by simple inspection of the posterior distribution we can determine fairly accurately whether or not there are segments in the data. The running time of the algorithm is O(n + I), where n is the number of occurrences of events and I is the number of iterations; that is, each iteration can be done in constant time given linear preprocessing.

The Bayesian approach to segmentation has been used before; see, e.g. Booth and Smith (1992). Liu and Lawrence (1999) describe a general Bayesian approach to finding changepoints of the distribution of letters along a sequence, i.e. the underlying likelihood is multinomial. Computationally, they use dynamic programming to speed up the Bayesian analysis. A further application of Bayesian techniques to DNA segmentation is given in Ramensky et al. (2000).


Compared to Liu and Lawrence (1999), in this paper we use Poisson intensity models as the underlying model. Computationally, we apply the MCMC approach to estimate the posterior probabilities on the whole space of segmentations. In particular, the use of RJMCMC methods makes it possible to study segmentations containing different numbers of segments.

We apply the method to modeling the occurrence of ORFs along the human genome. The results show that the chromosomes typically consist of relatively distinct segments, and that the posterior number and length of the segments show significant variation. On the other hand, for the yeast genome the intensity of ORFs is remarkably constant. The results are quite independent of the choice of the prior probabilities for the number of piecewise constant components.

We also compare our approach to dynamic programming methods. It is straightforward to see that if the best-fitting piecewise constant function (instead of the posterior distribution of functions) is to be computed, the optimal solution can be found in time O(n²k), for n occurrences of events and k segments, by using dynamic programming. The resulting piecewise constant intensity functions correspond closely to the posterior averages produced by the MCMC methods, giving further support to the robustness of the results. While in this paper we apply our methods only to ORF location data, the methods can be applied to any set of discrete events along the genome.

The rest of this paper is organized as follows. We first introduce the concepts of event sequence and intensity function. We then describe the Bayesian modeling approach and reversible jump Markov chain Monte Carlo simulation methods. Next, the methods are applied to analyzing the distribution of ORF occurrences in the human and yeast genomes. Finally, we give brief concluding remarks.
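The O(n²k) dynamic program mentioned above can be sketched as follows. This is an illustrative implementation rather than the authors' code: as a simplifying assumption, candidate changepoints are restricted to event positions, and each piece is scored by the Poisson log-likelihood at its maximum likelihood rate.

```python
import math
from bisect import bisect_left

def ml_segmentation(events, t_start, t_end, k):
    """Maximum-likelihood segmentation into exactly k constant-intensity
    pieces via dynamic programming over candidate changepoints."""
    events = sorted(events)
    bounds = [t_start] + events + [t_end]   # candidate segment borders
    B = len(bounds)

    def seg_ll(a, b):
        # Poisson log-likelihood of one piece [bounds[a], bounds[b]) at its
        # ML rate m/L: m*log(m/L) - m  (0 if the piece contains no events).
        L = bounds[b] - bounds[a]
        if L <= 0:
            return float("-inf")
        m = bisect_left(events, bounds[b]) - bisect_left(events, bounds[a])
        return m * math.log(m / L) - m if m else 0.0

    NEG = float("-inf")
    # best[j][b]: best log-likelihood covering [t_start, bounds[b]) with j pieces
    best = [[NEG] * B for _ in range(k + 1)]
    back = [[0] * B for _ in range(k + 1)]
    best[0][0] = 0.0
    for j in range(1, k + 1):
        for b in range(j, B):
            for a in range(j - 1, b):
                cand = best[j - 1][a] + seg_ll(a, b)
                if cand > best[j][b]:
                    best[j][b], back[j][b] = cand, a

    # Backtrack the k-1 changepoints of the optimal k-piece function.
    cps, b = [], B - 1
    for j in range(k, 0, -1):
        a = back[j][b]
        if j > 1:
            cps.append(bounds[a])
        b = a
    return sorted(cps), best[k][B - 1]
```

With n events there are O(n) candidate borders, so the triple loop runs in O(n²k) time, matching the bound quoted in the text.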

INTENSITY MODELS FOR EVENT SEQUENCE DATA

Consider an event sequence consisting of events of a single event type, and consider a fixed starting time for observing the process. Let T be the random variable indicating the waiting time for the next event. Denote by G(t) = Prob(T ≤ t) the distribution function of T. The interpretation of G is the probability that (after the starting time ts) the waiting time before the next occurrence is not longer than t. Assume that the density function of G exists, and denote it by g(t). The function Ḡ(t) = 1 − G(t) expresses the probability that the waiting time is longer than t; Ḡ(t) is called the survival function. The intensity function λ(t) is defined as follows (Arjas, 1989; Papoulis, 1991):

λ(t) = g(t) / Ḡ(t)

The intensity function λ(t) can be interpreted as the 'immediate' probability of occurrence at time t, given that no event occurred before t. Thus λ(t)dt expresses the probability of exactly one event during a 'short' time interval dt. The intensity function uniquely determines the distribution function G(t). On the other hand, λ(t) can also be interpreted as the expected number of occurrences per time unit.

Assume that events of an event type occur at a constant intensity λc. Then the survival function is Ḡ(t) = exp(−∫_0^t λc ds) = e^(−λc t), and the distribution function of the time intervals T between the occurrences is G(t) = 1 − e^(−λc t). This is the distribution function of the exponential distribution; thus T ∼ Exp(λc).

Piecewise constant representations

In this paper we use piecewise constant functions for modeling intensities, i.e. the functions λ(t) are assumed to have the following form:

λ(t) = λ1 if Ts ≤ t < c1,
       λ2 if c1 ≤ t < c2,
       ...
       λi if c_{i−1} ≤ t ≤ Te,
       0  elsewhere.

Here Ts, Te ∈ R are the start and end times of the observation period, the values λ1, . . . , λi ∈ R+ are the intensity values of the i pieces, and c1, . . . , c_{i−1} ∈ [Ts, Te] are the change points of the function.

There are several reasons for using piecewise constant functions. They are relatively simple, and arithmetic operations between such functions again result in piecewise constant functions. These operations, as well as integration, can be done easily and efficiently. Such functions can be used in a flexible way in subtle data analysis and complex statistical modeling, and they are still easy to interpret.

Poisson likelihood

Assume that an intensity function λe(t) is defined on the time range [Ts, Te], and consider an event sequence Se containing only events of a single event type e: Se = {(e, t1), . . . , (e, tn)}, where ti ∈ [Ts, Te] for all i = 1, . . . , n. Then the Poisson likelihood of the data Se, i.e. the likelihood when the intervals between the occurrence times Tk = tk − t_{k−1} are assumed to be exponentially distributed, is given by

L(Se | λe) = ∏_{j=1}^n [ λe(tj) · exp(−∫_{t_{j−1}}^{tj} λe(t) dt) ] · exp(−∫_{tn}^{Te} λe(t) dt),

where t0 = Ts. Because integration is done over the whole time range [Ts, Te], the likelihood can also be written in the form

L(Se | λe) = exp(−∫_{Ts}^{Te} λe(t) dt) · ∏_{j=1}^n λe(tj).    (1)
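As a concrete illustration, Equation (1) can be evaluated directly for a piecewise constant intensity. The representation below, a list of (start, end, level) triples tiling the observation window, is our own choice for the sketch, not notation from the paper.

```python
import math

def log_likelihood(pieces, events):
    """Log of the Poisson likelihood of Equation (1):
    log L(S | λ) = -∫[Ts,Te] λ(t) dt + Σ_j log λ(t_j).
    `pieces` is a list of (start, end, level) triples tiling [Ts, Te];
    `events` are the occurrence locations t_1, ..., t_n."""
    # Integral of a piecewise constant function: sum of level * length.
    total = -sum(level * (end - start) for start, end, level in pieces)
    for t in events:
        # Level of the (half-open) piece containing t; 0 if none contains it.
        level = next((lv for s, e, lv in pieces if s <= t < e), 0.0)
        if level <= 0.0:          # an event where λ = 0 has likelihood 0
            return float("-inf")
        total += math.log(level)
    return total
```

Working in log space avoids the underflow that the raw product in Equation (1) would cause for realistic event counts.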

BAYESIAN APPROACH AND MCMC METHODS FOR FINDING PIECEWISE CONSTANT INTENSITY MODELS

Bayesian analysis typically writes the joint probability of data Y and parameters θ as the product P(θ, Y) = P(θ)P(Y | θ) of the prior probability P(θ) of the parameters and the likelihood P(Y | θ) of the data given the parameters. Bayes' rule follows by conditioning the joint probability distribution on the known data; it gives the posterior distribution of θ:

P(θ | Y) = P(θ)P(Y | θ) / P(Y)

As the probability of the data P(Y) does not depend on θ, the posterior distribution is proportional to the product P(θ)P(Y | θ) of the prior distribution and the likelihood.

The basic idea in Markov chain Monte Carlo (MCMC) methods (Gilks et al., 1996) is to approximate the target distribution f(θ) = P(θ)P(Y | θ)/P(Y) by constructing a Markov chain which has f as its equilibrium distribution. Denote by T(θ, θ′) the probability of moving from state θ to state θ′ in the chain. The reversibility condition states that for all states θ and θ′ we have

f(θ)T(θ, θ′) = f(θ′)T(θ′, θ).    (2)

Intuitively, this means that the flow from θ to θ′ is the same as the flow from θ′ to θ. If this holds, then the chain has f as its equilibrium distribution (assuming the technical conditions of aperiodicity and irreducibility; Guttorp (1995)). To move from state to state one uses a proposal distribution q(θ, θ′), which gives the probability of proposing a move from state θ to state θ′. The move is accepted with probability

α(θ, θ′) = min( 1, [f(θ′) q(θ′, θ)] / [f(θ) q(θ, θ′)] ).

This enforces the reversibility condition (2) (see Gilks et al. (1996) for details).
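To make the acceptance rule concrete, here is a minimal Metropolis–Hastings sampler for a toy version of the problem: a single constant intensity λ with a Gamma(a, b) prior (shape–rate, our choice for the sketch) and the Poisson likelihood exp(−λT)·λ^n for n events observed over a window of length T. A symmetric random-walk proposal is used, so the q(θ′, θ)/q(θ, θ′) factor cancels. None of this code is from the paper.

```python
import math
import random

def mh_constant_intensity(n_events, T, a=1.0, b=1.0, iters=20000, seed=0):
    """Sample the posterior of a constant intensity λ by Metropolis-Hastings
    and return the posterior-mean estimate."""
    rng = random.Random(seed)

    def log_post(lam):
        # log Gamma(a, b) prior + log Poisson likelihood, up to constants
        if lam <= 0.0:
            return float("-inf")
        return (a - 1.0) * math.log(lam) - b * lam \
            + n_events * math.log(lam) - lam * T

    lam = n_events / T            # start near the maximum likelihood estimate
    samples = []
    for i in range(iters):
        prop = lam + rng.gauss(0.0, 0.1)   # symmetric random-walk proposal
        # Accept with probability min(1, f(θ')/f(θ)); the q-ratio cancels here.
        if math.log(rng.random()) < log_post(prop) - log_post(lam):
            lam = prop
        if i >= iters // 5:       # discard a burn-in period
            samples.append(lam)
    return sum(samples) / len(samples)
```

Because the Gamma prior is conjugate to the Poisson likelihood, the exact posterior here is Gamma(a + n, b + T), which gives a simple way to check the sampler.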

The Metropolis–Hastings algorithm starts from a random state θ, and repeatedly draws candidate states θ′ and accepts them with the above probability. It is straightforward to show that the method gives a Markov chain whose equilibrium distribution is f. Note that in order to run the simulation we only need to be able to determine the quotient

f(θ′)/f(θ) = [P(θ′)P(Y | θ′)/P(Y)] / [P(θ)P(Y | θ)/P(Y)] = P(θ′)P(Y | θ′) / [P(θ)P(Y | θ)],

in which the probabilities P(Y) cancel; i.e. the distribution need only be known up to a constant. In the case of the Bayesian posterior distribution, the posterior density function is proportional to the product of the prior and likelihood densities. Thus the Metropolis–Hastings algorithm can be applied provided these densities can be calculated.

To prevent the choice of the initial state from distorting the empirical distribution of the states, a number of initial iterations should be run without storing the parameter values. In practice, the sufficient length of this initial period (the so-called burn-in) may be difficult to determine. Few analytical results exist for the problem, but a large number of heuristic methods have been developed to assess the convergence of the Markov chain in practice (see e.g. Brooks (1998)).

For the case of piecewise constant functions we want to consider functions having different numbers of pieces. Thus the number of parameters of a model is not fixed, and deciding what the proposal distribution q(θ, θ′) should be takes some care. Consider a situation where the dimension of the current state θ is m and a candidate state θ′ of a higher dimension n is generated. To ensure that both sides of the reversibility equation (Equation 2) have densities on spaces of equal dimension, it must hold that m + dim(u) = n. Here u is the vector of random numbers used when proposing the candidate state θ′ in state θ. Let us denote by g the function that determines the candidate state in state θ, given u; i.e. θ′ = g(θ, u). Then the acceptance rate of the Metropolis–Hastings algorithm, generalized by Green to state spaces of variable dimension, becomes (Green, 1995):

α = min( 1, [f(θ′) q(θ′, θ)] / [f(θ) q(θ, θ′)] · |∂g(θ, u)/∂(θ, u)| ).

Here the last term is the Jacobian of the transformation of the random variables θ and u. The generalized algorithm is called the reversible jump Metropolis–Hastings algorithm, or reversible jump Markov chain Monte Carlo (RJMCMC). For a detailed description of variable dimension MCMC, see Green (1995) and Waagepetersen and Sorensen (2001).

Piecewise constant functions provide a flexible way to represent intensity functions. Even if the real intensities are continuous, not step functions, they can still be represented accurately as averages of piecewise constant functions, provided we allow variable dimension, i.e. the number of segments in the functions is not fixed. Thus reversible jump MCMC methods are quite suitable for our needs.

To use RJMCMC we need to specify proposal distributions that update the existing parameter vector. The updates are divided into two groups: those that maintain the dimension and those that do not. The proposal distributions that maintain the dimension can change the intensity of one segment, or modify the start or end time of a segment. The operations that modify the dimension can create a new segment by splitting an existing one into two, or merge two adjacent segments into one. For details of the proposal distributions, see Salmenkivi (2001), Section 3.2.5.

We model the intensity λ(t) by using a piecewise constant function as described above. The parameters of the model are the number k of segments, the levels λ1, . . . , λk of the function, and the change points c1, . . . , c_{k−1}. These have to be given prior distributions.

An approximation of the posterior expectation of the intensity can be computed from the simulation results in the following way. The intensity value λ^(i)(tj) is recorded at each pre-determined location (or time instant) tj for each stored simulation round i = 1, . . . , N. Let us denote the posterior distribution of the intensity at tj by f(λ, tj). The Monte Carlo approximation of the posterior expectation of the intensity at tj is the average of the N sampled intensity values at tj, i.e.

E_f(λ(tj)) = ∫_0^∞ λ f(λ, tj) dλ ≈ (1/N) Σ_{i=1}^N λ^(i)(tj).

The convergence of the MCMC simulation is monitored by using the Gelman and Rubin diagnostic (Gelman and Rubin, 1992). For each proposed piecewise constant intensity function we have to be able to compute the likelihood (Equation 1).
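The Monte Carlo average above is straightforward to compute from stored samples. The piecewise representation used here, lists of (start, end, level) triples, is our own illustration:

```python
def posterior_mean_intensity(samples, grid):
    """Approximate E[λ(t_j)] at each grid location t_j by averaging the
    sampled intensity functions: (1/N) Σ_i λ^(i)(t_j).
    `samples` is a list of N piecewise constant functions, each a list of
    (start, end, level) triples."""
    def value_at(pieces, t):
        # Level of the (half-open) piece containing t; 0 outside all pieces.
        for s, e, lv in pieces:
            if s <= t < e:
                return lv
        return 0.0
    return [sum(value_at(p, t) for p in samples) / len(samples) for t in grid]
```

Note that even though each individual sample is a step function, the pointwise average over many samples can approximate a smoothly varying intensity, which is the point made in the text.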
All events belonging to the same segment of the piecewise constant function contribute in the same way to the likelihood. Given a sequence S = {(e, t1), . . . , (e, tn)}, we can compute in a linear preprocessing stage the value A(t) = |{(e, tj) | tj < t, j = 1, . . . , n}| for each t = t1, . . . , tn. Then the likelihood of a proposed function can be computed in constant time, and the running time of the algorithm is O(n + I), where n is the number of occurrences of events and I is the number of iterations.
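This preprocessing idea can be sketched as follows: after one sort of the n event locations, the count of events inside any proposed piece, and hence that piece's contribution to the log-likelihood, is available by binary search (and in O(1) when the counts A(t) are precomputed at the candidate changepoints, as in the text). The function below is our illustration, not the paper's code.

```python
import math
from bisect import bisect_left

def piece_loglik(sorted_events, start, end, level):
    """Log-likelihood contribution of one piece [start, end) with intensity
    `level`: m*log(level) - level*(end - start), where
    m = A(end) - A(start) is the number of events in the piece, read off the
    sorted event array by binary search."""
    m = bisect_left(sorted_events, end) - bisect_left(sorted_events, start)
    if m > 0 and level <= 0.0:
        return float("-inf")   # events in a zero-intensity piece: likelihood 0
    return (m * math.log(level) if m else 0.0) - level * (end - start)
```

Summing `piece_loglik` over the pieces of a proposal reproduces the log of Equation (1), and an MCMC move that touches only one or two pieces needs to recompute only those terms.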

ORF DISTRIBUTION IN THE HUMAN GENOME

We considered the locations of ORFs on human chromosomes 1–22, obtained from http://www.ensembl.org/. (It should be noted that known factors, such as CG content, can be used to explain the gene density. For brevity we discuss only the case of looking at the ORF locations in isolation.) We applied the above method for finding the posterior distribution of the intensity of occurrence of ORFs in the chromosomes. The goals were to find out whether there are significant differences between the chromosomes as to the number of segments into which they can be decomposed, to find out whether the segment boundaries are sharp, to investigate the effect of the prior distribution on the results, and to search for connections with other types of segment structure in the genome.

For the initial study, we used the following prior distributions. For the number k of segments we used k ∼ Geom(0.5), for the intensity levels λj ∼ Gamma(0.02, 0.75), and for the change points cj ∼ Unif(0, Se), where Se is the length of the DNA sequence in the chromosome. These priors are relatively uninformative in the sense that they do not pose hard restrictions on the allowed values. The prior on k indicates that solutions with fewer segments are preferred.

The RJMCMC simulations were run for approximately 10 000 000 iterations with a burn-in period of 10 000 000 iterations. The parameter values were recorded at approximately every tenth iteration.

Figure 1 shows the posterior average and deviation of the number of segments for the different chromosomes, as a function of the size of the chromosome and of the number of ORFs in the chromosome. We see that there are clear differences between the chromosomes. For example, chromosomes 16 and 18 have about the same size, but the number of segments needed to model the distribution of ORFs on chromosome 16 is about twice the number for chromosome 18. We also see that the differences are not explained by the sizes of the chromosomes or the number of ORFs in them.

Figure 2 shows the posterior averages and deviations of the intensities of ORF occurrence frequency for chromosomes 1, 4, 10, 20, 21, and 22.
We see that there are clear segment boundaries in the chromosomes, indicating that various parts of the chromosomes are qualitatively different. As an example, in chromosome 21 there are three major changepoints. For the most part the intensity is constant, while the transition close to 40 Mb shows that the method can also describe nonconstant intensities. The results on the other chromosomes are similar (data not shown).

To study the effect of the organism on the result, we also analyzed the yeast (Saccharomyces cerevisiae) genome, obtained from http://www.mips.biochem.mpg.de/. The intensity of ORFs in the yeast chromosomes is remarkably constant, with the average number of pieces being close to 1; see Figure 3.

We also experimented with the effect of the prior distribution on the segmentation and the number of pieces.
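The priors above can be made concrete with a small sampling sketch. The Gamma parameterization (shape, scale) and the geometric support starting at k = 1 are our assumptions; the paper does not state them.

```python
import random

def sample_prior_intensity(seq_len, p=0.5, shape=0.02, scale=0.75, rng=None):
    """Draw one piecewise constant intensity from the priors:
    k ~ Geom(p) segments, levels λ_j ~ Gamma(shape, scale)   (assumed
    shape/scale parameterization), change points c_j ~ Unif(0, seq_len).
    Returns a list of (start, end, level) triples tiling [0, seq_len]."""
    rng = rng or random.Random()
    k = 1
    while rng.random() >= p:       # geometric: P(k) = p * (1 - p)^(k - 1)
        k += 1
    bounds = [0.0] + sorted(rng.uniform(0.0, seq_len)
                            for _ in range(k - 1)) + [seq_len]
    levels = [rng.gammavariate(shape, scale) for _ in range(k)]
    return [(bounds[i], bounds[i + 1], levels[i]) for i in range(k)]
```

Drawing many such functions shows why the priors are weak: the expected number of segments under Geom(0.5) is only 2, and the Gamma levels are spread over a wide range, so the data dominates the posterior.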

Fig. 1. (a) The posterior average and standard deviation of the number of segments on human chromosomes 1–22. (b) The posterior average and standard deviation of the number of segments as a function of the number of ORFs in the chromosome. (c) The posterior average and standard deviation of the number of segments as a function of the size of the chromosome in base pairs.

Fig. 2. Posterior averages and 99% intervals of the intensities of ORF occurrence frequency for human chromosomes 1, 4, 10, 20, 21, and 22. Units on the x axis: 1000 bp.

Figure 4 shows the results. They indicate that the prior distribution does not have a large influence on the resulting segmentations; the number of segments is about the same for k ∼ Geom(0.1) and for k ∼ Geom(0.9). Further experiments (data not shown) indicated that the other priors had even less influence on the results.

Figure 5 shows the maximum likelihood segmentation for human chromosomes 3, 9, 13, and 19, produced by a straightforward dynamic programming algorithm (Mannila and Salmenkivi, 2001). The number of segments used was fixed by using the MCMC results. We see that the Bayesian analysis results are quite close to the maximum likelihood solutions. Again, the results are similar for the other chromosomes.

Fig. 3. Posterior averages of the intensities of ORF occurrence frequency for yeast chromosomes 8, 13, and 15.

Fig. 4. Posterior averages and deviations of the number of segments as a function of the prior parameter p for human chromosomes 10, 20, 21, and 22.

DISCUSSION

We have described a method for segmenting genomes using a Bayesian technique for finding piecewise constant intensity models that describe the occurrences of patterns in the sequence. The running time of the method is linear in the number of occurrences of events and in the number of iterations. In practice, the reversible jump MCMC method converges reasonably fast, although the technique is, of course, computationally intensive.

We have applied the method to the occurrences of ORFs in human and yeast sequences. The results show that there are clear segments in the human genome, whereas the yeast genome is quite uniform. The biological significance of the segments is not clear; there are some indications that the segments might be connected to human–mouse synteny maps. As an example, in chromosome 21 two out of the three changepoints coincide with segment boundaries in the human–mouse comparative maps (12.4 Mb and 39.9–41.7 Mb). Similarly, the chromosome 21 segments correspond to some extent with the isochore structure (Pavlicek et al., 2002).

As discussed above, the method applies to occurrences of any type of pattern in sequences. One interesting direction for further development is investigating the relationship of intensities between several types of events. The intensity modeling approach could also be used in this case, but it is not clear what the best and biologically relevant interaction mechanisms between the occurrences of different types of events are.

Fig. 5. Posterior averages of the intensities (MCMC) of ORF occurrence frequency for human chromosomes 3, 9, 13, and 19, and optimal intensity functions with a fixed number of pieces (dynamic programming).

REFERENCES

Arjas,E. (1989) Survival models and martingale dynamics. Scandinavian Journal of Statistics, 16, 177–225.
Bellman,R. (1961) On the approximation of curves by line segments using dynamic programming. Communications of the ACM, 4, 384.
Bennett,K.D. (1996) Determination of the number of zones in a biostratigraphical sequence. New Phytol., 132, 155–170.
Booth,N.B. and Smith,A.F.M. (1992) A Bayesian approach to retrospective identification of change-points. J. Econometr., 19, 7–22.
Brooks,S.P. (1998) Markov chain Monte Carlo and its applications. Statistician, 47, 69–100.
Gelman,A. and Rubin,D. (1992) Inference from iterative simulation using multiple sequences. Statistical Sci., 7, 473–511.
Gilks,W., Richardson,S. and Spiegelhalter,D. (eds) (1996) Markov Chain Monte Carlo in Practice. Chapman and Hall, London.
Green,P. (1995) Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82, 711–732.
Guttorp,P. (1995) Stochastic Modeling of Scientific Data. Chapman and Hall, London.
Liu,J.S. and Lawrence,C.E. (1999) Bayesian inference on biopolymer models. Bioinformatics, 15, 38–52.
Mannila,H. and Salmenkivi,M. (2001) Finding simple intensity descriptions from event sequence data. In Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'01), San Francisco, pp. 341–346.
Mannila,H., Toivonen,H. and Verkamo,A.I. (1997) Discovery of frequent episodes in event sequences. Data Mining and Knowledge Discovery, 1, 259–289.
Papoulis,A. (1991) Probability, Random Variables, and Stochastic Processes. McGraw-Hill.
Pavlicek,A., Paces,J., Clay,O. and Bernardi,G. (2002) A compact view of isochores in the draft human genome sequence. FEBS Lett., 511, 165–169.
Ramensky,V.E., Makeev,V.J., Roytberg,M.A. and Tumanyan,V.G. (2000) DNA segmentation through the Bayesian approach. J. Comput. Biol., 7, 215–231.
Salmenkivi,M. (2001) Computational Methods for Intensity Models. Ph.D. Thesis, University of Helsinki, Department of Computer Science, Series A-2001-2, http://www.cs.Helsinki.FI/u/salmenki/publications/Ph.Thesis.ps
Waagepetersen,R. and Sorensen,D. (2001) A tutorial on reversible jump MCMC with a view toward applications in QTL-mapping. International Statistical Review, 69, 49–62.
Xiong,Z., Herley,C., Ramchandran,K. and Orchard,M. (1994) Flexible time segmentations for time-varying wavelet packets. In IEEE Proceedings of the International Symposium on Time-Frequency and Time-Scale Analysis, Philadelphia.