Methods for Quantifying the Causal Structure of Bivariate Time Series

M. Lungarella*,1, K. Ishiguro*, Y. Kuniyoshi*, and N. Otsu*,**

* Intelligent Systems and Informatics Laboratory, Department of Mechano-Informatics, University of Tokyo, 113-8656 Tokyo, Japan
** National Institute of Advanced Industrial Science and Technology, 305-8568 Tsukuba, Japan

Abstract

In the study of complex systems, one of the major concerns is the detection and characterization of causal interdependencies and couplings between different subsystems. The nature of such dependencies is typically not only nonlinear but also asymmetric, which makes the use of symmetric and linear methods ineffective. Moreover, signals sampled from real-world systems are noisy and short, posing additional constraints on the estimation of the underlying couplings. In this article, we compare a set of six recently introduced methods for quantifying the causal structure of bivariate time series extracted from systems with complex dynamical behavior. We discuss the usefulness of the methods for detecting asymmetric couplings and directional flow of information in the context of uni- and bidirectionally coupled deterministic chaotic systems.

Key words: causal structure, nonlinear time series analysis, coupled systems, information flow

1 Corresponding author. Tel/Fax: +81-3-5841-6388. E-mail address: [email protected]
1 Introduction

A fundamental issue in the study of complex systems is the identification and quantification of causal relationships among the parts of a dynamical system. The rationale is that the detection of causal structure - measured in terms of which parts of the system drive which other parts - not only furthers the understanding of the system's internal dynamics, but also simplifies the design of control strategies for its spatio-temporal evolution (e.g. by exploiting redundant information to reduce the number of measurement channels). Unfortunately, in most natural as opposed to laboratory settings, we are not granted direct access to the dynamics of the physical processes underlying the system, but possess only a set of simultaneously recorded (typically multivariate) variables. The problem can thus be mapped onto measuring to what extent the time series corresponding to such variables contribute to the generation of information, and at what rate they exchange information. In other words, given a set of time series observations, is it possible to assess whether they originate from coupled or decoupled systems and to detect the hidden causal dependencies between them? A popular method for disclosing relationships between time series extracted from coupled systems is cross-correlation (or the covariance index). Closely related methods are the cross-spectrum and the coherency-spectrum, which measure dependencies in the frequency domain [Priestley, 1989]. The higher the values of these indices, the higher the degree of correlation or linear coupling between the time series observations. The biggest problem with such methods is that linear dependencies are rarely present in real time series sampled from (typically) nonlinear systems. Moreover, they cannot discriminate between unidirectional and bidirectional interactions.
Another (special) kind of functional interdependence between coupled systems, which basic linear methods are not able to detect, is synchronization [Parlitz and Kocarev, 1999; Pikovsky et al., 2001]: e.g. two coupled periodic oscillators synchronize if their frequencies become the same or lock with a rational ratio. One difficulty is the assessment of causal relationships between synchronized coupled systems in the sense of which system is the driver and which is the responder. Methods to deal with this issue have been proposed, e.g. time-lagged phase synchronization [Rybski et al., 2003] and time delay estimation based on coherence maximization [Govindan et al., 2005]. A mathematically general and statistically rigorous approach to the detection of interdependence between time series is mutual information. It is a nonparametric and model-free function which depends on all higher order moments of the probability distribution. Mutual information does not suffer from the restrictions of cross-correlation (which is a second order method), because it is sensitive to all higher order correlations, both linear and nonlinear. Yet mutual information, as a symmetric measure, cannot detect directional flow of information and causal relationships unless one of the time series is time delayed. Although the introduction of a time delay is an important improvement, the resulting measure is still affected by a set of problems. One important problem is that its estimation imposes substantial requirements on the amount and the quality of the data (especially if the expected causal effect is small), calling for a large quantity of noise-free stationary data – a condition rarely met in real world situations: experimental signals are typically short and likely to be contaminated by measurement noise. Practical analyses are further complicated by finite sample effects that make the assessment of the significance of the obtained values difficult [e.g. Palus and Stefanovska, 2003; Roulston, 1999]. A second issue is that a priori causal relationships might exist without detectable time delays, and conversely delays might exist which do not reflect the naïvely expected causal structure in the time series. A last problem – affecting also other interdependence measures, e.g. synchrony detectors – relates to the fact that time-lagged mutual information does not explicitly distinguish information actually exchanged between two systems from information due to the influence of an unobserved (hidden) third common signal or system (we refer to [Pearl, 1988] for a discussion, in the context of Bayesian nets, of how genuine causal dependencies can be distinguished from spurious correlations coordinated by some common, often unknown, causal mechanism). In this paper, we review some recently introduced measures for detecting asymmetric (causal) interdependences between time series which are aimed at overcoming some of the difficulties described above.
We discuss the usefulness of the reviewed measures for detecting nonlinear dependencies as well as causal structure in the context of uni- and bidirectionally coupled deterministic chaotic systems with known patterns of interaction. In the following section, we give some mathematical background and introduce six recent measures for detecting asymmetries and estimating their strength: transfer entropy, extended and nonlinear Granger causality, predictability improvement, and two similarity indices. We then describe the experimental models used to compare the estimators of causal structure and report the results of our simulations. In the last section, we discuss our results in the light of the insights gained from our simulation studies.
2 Methods

In this section we give the mathematical background for the estimators of dependency used in this article. We first outline some formalism and then describe the methods individually. All methods were implemented using standard Matlab procedures on the same computer equipped with a Pentium P4 (3.2GHz) processor and 2GB of memory.
2.1 Notation

Given a complex system, it is often the case that the amount and direction of the coupling between its elements as well as their internal dynamics are a priori unknown. There is also no knowledge of whether the system is deterministic or stochastic. Here, we assume that all we can do is to measure two or more variables observed separately from different subsystems of the complex system. For simplicity, let us consider the bivariate case, and let X = (x_1, x_2, ..., x_N) and Y = (y_1, y_2, ..., y_N) denote two different time series of N simultaneously measured scalar quantities. The state spaces 𝒳 and 𝒴 corresponding to the two time series are reconstructed by the method of time delay embedding [Packard et al., 1980; Takens, 1981], in which the state (or embedding) vectors are formed by delayed "past" scalar observations of the time series. The m-dimensional embedding vector is defined as

    x_t^{m,\tau} \triangleq (x_t, x_{t-\tau}, x_{t-2\tau}, \ldots, x_{t-(m-1)\tau})^T    (1)

where m is the embedding dimension, τ is the time delay (lag) between successive elements of the vector, and t = η, ..., N with embedding window η = (m − 1)τ. If not otherwise stated, in this article τ = 1, and we use the following shorthand:

    x_t^m \triangleq x_t^{m,1} = (x_t, x_{t-1}, x_{t-2}, \ldots, x_{t-m+1})^T.    (2)

Similarly, we can reconstruct the state space trajectory of the time series Y as

    y_t^{m,\tau} \triangleq (y_t, y_{t-\tau}, y_{t-2\tau}, \ldots, y_{t-(m-1)\tau})^T.    (3)
Note that the state vectors x_t^{m,τ} and y_t^{m,τ} are points in the m-dimensional spaces 𝒳 and 𝒴. To reconstruct the state spaces of the time series, the embedding dimension m and the time delay τ have to be determined. The choice of both parameters depends on the dynamics of the underlying data. In this article, first guesses for
the embedding dimension m and the time lag τ were obtained, respectively, by the false nearest neighbor technique [Kennel et al., 1992], and by estimating the first minimum of the mutual information function [Fraser and Swinney, 1986]. Because some well-known issues with such "optimal" a priori estimates of embedding parameters exist [e.g. Arnhold et al., 1999], all experiments were also conducted with different ("empirical") combinations of m and τ. We did not observe any significant differences between the results obtained using "optimal" or "empirical" parameters.
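The original implementations were in Matlab (Section 2). As a language-neutral illustration, the delay embedding of Eqs. (1)-(3) can be sketched in NumPy as follows (the function name `delay_embed` is ours, not from the paper):

```python
import numpy as np

def delay_embed(x, m, tau=1):
    """Return the matrix whose rows are the embedding vectors
    (x_t, x_{t-tau}, ..., x_{t-(m-1)tau}) for t = eta, ..., N-1."""
    x = np.asarray(x)
    eta = (m - 1) * tau                      # embedding window
    return np.column_stack([x[eta - j * tau : len(x) - j * tau]
                            for j in range(m)])

# example: N = 10 samples, m = 3, tau = 2
X = delay_embed(np.arange(10.0), m=3, tau=2)
# first row is (x_4, x_2, x_0), last row is (x_9, x_7, x_5)
```

Each row of the returned array is one state vector; the `m = 1` case degenerates to the raw scalar series, which is the setting used for several of the experiments below.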
2.2 Transfer entropy

A first approach to obtaining knowledge about asymmetric dependencies between coupled systems (or the direction and strength of the coupling between its subsystems) is to measure to what extent its individual components contribute to information production and at what rate they exchange information among each other. The transfer entropy seems to be an appropriate starting point because it is an information-theoretic measure which not only shares some of the properties of mutual information but also takes the dynamics of information transport into account [Schreiber, 2000]. It is a specific version of the mutual information for conditional probabilities, but unlike mutual information it is designed to detect the directed exchange of information between two systems, separately for both directions, and conditional on common history and input signals. Essentially, transfer entropy measures the deviation from the generalized Markov property

    p(x_{t+1} \mid x_t^k) = p(x_{t+1} \mid x_t^k, y_t^l),    (4)

where p denotes the transition probability. If the deviation from the generalized Markov property is small, then we can safely assume that the state of space 𝒴 has no (or little) relevance for the transition probabilities of the state vectors of space 𝒳. If the deviation is large, however, then the Markov assumption is not valid. The incorrectness of the assumption can be quantified by the transfer entropy, which is formulated as the Kullback-Leibler entropy between p(x_{t+1} | x_t^k) and p(x_{t+1} | x_t^k, y_t^l):

    T_{Y \to X} = \sum_{x_{t+1}, x_t^k, y_t^l} p(x_{t+1}, x_t^k, y_t^l) \log \frac{p(x_{t+1} \mid x_t^k, y_t^l)}{p(x_{t+1} \mid x_t^k)}.    (5)
The transfer entropy is explicitly nonsymmetric under the exchange of x_t and y_t
(a similar expression exists for T_{X→Y}). It can thus be used to detect the direction of the coupling between two time series. In other words, the transfer entropy represents the information about a future observation of variable x_t obtained from the simultaneous observation of past values of both x_t and y_t, after discarding the information about the future of x_t obtained from the past of x_t alone. If not stated otherwise, in the subsequent parts of this article k = l = 1. We calculated the conditional probabilities from the joint probabilities. The problem is how to compute the joint probability p(x_{t+1}, x_t^k, y_t^l). One possibility is to estimate the probability density through kernel estimation. In this article, we approximate the probabilities by using histograms of the embedding vectors (naïve histogram technique). The only parameter is the number of bins, denoted r. When the available data is limited and the coupling between the time series is small, transfer entropy (as well as the other measures reviewed here) suffers from a finite sample effect which makes the assessment of the significance of the obtained values difficult. To counter this problem, a slightly modified "finite sample"-corrected variant of transfer entropy, called effective transfer entropy, has been proposed [Marschinski and Kantz, 2002]. Note that this article does not aim at suggesting solutions for the finite sample effect for any of the measures reviewed. Our (less ambitious) goal is to compare different measures in terms of their robustness against variations of the sample size N. For all our experiments N > 1000, and as our results demonstrate, all reviewed measures suffer from a finite sample effect to a greater or lesser degree.
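As an illustration of Eq. (5) with k = l = 1, the following NumPy sketch estimates T_{Y→X} with the naive histogram technique (this is our own minimal implementation, not the code used for the experiments; r is the number of bins):

```python
import numpy as np

def transfer_entropy(x, y, r=8):
    """Naive-histogram estimate of T_{Y->X} (Eq. 5) with k = l = 1."""
    def binned(s):
        s = np.asarray(s, float)
        edges = np.linspace(s.min(), s.max() + 1e-12, r + 1)
        return np.clip(np.digitize(s, edges) - 1, 0, r - 1)
    bx, by = binned(x), binned(y)
    # joint histogram over the triples (x_{t+1}, x_t, y_t)
    idx = bx[1:] * r * r + bx[:-1] * r + by[:-1]
    p = np.bincount(idx, minlength=r ** 3).astype(float)
    p = (p / p.sum()).reshape(r, r, r)     # axes: x_{t+1}, x_t, y_t
    p_xy = p.sum(axis=0)                   # p(x_t, y_t)
    p_xx = p.sum(axis=2)                   # p(x_{t+1}, x_t)
    p_x = p_xx.sum(axis=0)                 # p(x_t)
    te = 0.0
    for a in range(r):
        for b in range(r):
            for c in range(r):
                if p[a, b, c] > 0:
                    # ratio p(x_{t+1}|x_t,y_t) / p(x_{t+1}|x_t)
                    te += p[a, b, c] * np.log(
                        (p[a, b, c] / p_xy[b, c]) / (p_xx[a, b] / p_x[b]))
    return te
```

Because all marginals are derived from the same empirical joint histogram, the estimate is nonnegative; the finite sample bias discussed above shows up as a nonzero value even for uncoupled series.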
2.3 Extended and nonlinear Granger causality

Another approach to evaluating causal relations between bivariate time series is to examine whether the prediction of one series can be improved by incorporating information about the past values of the other. The idea was originally proposed by Wiener [1959], and was later formalized by Granger [1969] in the context of linear regression models of stochastic processes. If the variance of the prediction error for the current value of x_t is significantly reduced by including past observations of y_t, then y_t can be said to cause x_t [Granger, 1969]. Consider the following set of autoregressive predictions of the current values x_t and y_t based on m past observations of the time series:

    x_t = \alpha^T x_{t-1}^m + \epsilon_x    (6)
    y_t = \beta^T y_{t-1}^m + \epsilon_y    (7)

where ε_x and ε_y represent the prediction errors, and α and β are determined by autoregression so as to minimize the squared norms of ε_x and ε_y. By combining the two equations, we can express the prediction of the current values of x_t and y_t as the weighted sum of their past values:

    x_t = a^T x_{t-1}^m + b^T y_{t-1}^m + \epsilon_{x|y}    (8)
    y_t = c^T y_{t-1}^m + d^T x_{t-1}^m + \epsilon_{y|x}.    (9)
The magnitude of the prediction errors ε_{x|y} and ε_{y|x} can be evaluated by their variances. If the prediction of x_t improves by incorporating values of y_t, that is, if var[ε_{x|y}] < var[ε_x], then y_t is said to have a causal influence on x_t. Similarly, if var[ε_{y|x}] < var[ε_y], then x_t has a causal influence on y_t. Here, we are interested in nonlinear systems. Granger causality, however, is formulated for linear models, and its direct application to nonlinear systems may not be appropriate in the general case. An extension to nonlinear multivariate time series was proposed in [Chen et al., 2004]. The idea is to divide the data into a set of local neighborhoods where the data can be approximated by a linear model, so that a locally linear prediction scheme can be used [Farmer and Sidorowich, 1987]. The joint dynamics of the time series is expressed as

    z_t^{m,\tau} = \begin{pmatrix} x_t^{m,\tau} \\ y_t^{m,\tau} \end{pmatrix}.    (10)

The state vector z_t^{m,τ} can be considered as a point in the 2m-dimensional space 𝒵. Given L < N randomly chosen points z_{t(k)}^{m,τ} in the reconstructed phase space 𝒵 (t(k) = 1, ..., N), we define L clusters of local neighbors Θ_k = {z_t^{m,τ} : ||z_{t(k)}^{m,τ} − z_t^{m,τ}|| < δ}, where δ is the size of the neighborhood and k = 1, ..., L. The parameter δ is chosen to be of the order of the resolution of the measurements. If no neighborhood closer than δ exists, the value of δ has to be increased. Granger causality can be applied to each local neighborhood Θ_k by using the method of time delays [Takens, 1981]. Averaging over the neighborhoods sampling the entire reconstructed phase space (i.e. L neighboring clusters) gives the extended Granger causality indices:

    \delta_{Y \to X} = \frac{1}{L} \sum_i \left( 1 - \frac{var[\epsilon_{x|y}^i]}{var[\epsilon_x^i]} \right)    (11)
    \delta_{X \to Y} = \frac{1}{L} \sum_i \left( 1 - \frac{var[\epsilon_{y|x}^i]}{var[\epsilon_y^i]} \right)    (12)
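Both the global linear test of Eqs. (6)-(9) and the local indices of Eqs. (11)-(12) reduce to comparing the residual variances of nested least-squares fits. A minimal NumPy sketch of the linear case (our own illustration; the paper's experiments used Matlab):

```python
import numpy as np

def granger_variances(x, y, m=2):
    """Residual variances var[e_x] and var[e_x|y] of Eqs. (6) and (8),
    fitted by ordinary least squares.  Names and defaults are ours."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    N = len(x)
    # rows are (s_{t-1}, ..., s_{t-m}) for t = m, ..., N-1
    past = lambda s: np.column_stack([s[m - 1 - j : N - 1 - j]
                                      for j in range(m)])
    Xp, Yp, target = past(x), past(y), x[m:]
    def resid_var(A):
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        return np.var(target - A @ coef)
    return resid_var(Xp), resid_var(np.hstack([Xp, Yp]))
```

If var[ε_{x|y}] < var[ε_x], Y is said to Granger-cause X; the extended index applies the same comparison within each local neighborhood Θ_k and averages the relative improvements.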
A different approach was taken by Ancona et al. [2004], who extended Granger causality by altering the regressions of Eqs. (6)-(9) as follows:

    x_t = \alpha^T \Phi(x_{t-1}^m) + \epsilon_x    (13)
    y_t = \beta^T \Psi(y_{t-1}^m) + \epsilon_y    (14)

and

    x_t = a^T \Phi(x_{t-1}^m) + b^T \Psi(y_{t-1}^m) + \epsilon_{x|y}    (15)
    y_t = c^T \Phi(y_{t-1}^m) + d^T \Psi(x_{t-1}^m) + \epsilon_{y|x}    (16)
where Φ and Ψ are P-dimensional real vectors whose elements are nonlinear radial basis functions (RBFs) centered at x_ρ^m and y_ρ^m (1 ≤ ρ ≤ P). In contrast to extended Granger causality, which works locally, nonlinear Granger causality acts globally by performing a "global" nonlinear regression over the reconstructed state spaces. As in [Ancona et al., 2004], we employed Gaussian RBFs whose variances σ² were a priori fixed parameters, and whose P centers were chosen by fuzzy c-means clustering; other clustering algorithms can be applied as well. By introducing

    c_{Y \to X} \triangleq \epsilon_x - \epsilon_{x|y}    (17)
    c_{X \to Y} \triangleq \epsilon_y - \epsilon_{y|x}    (18)

we can measure the causal relation between the two processes. If c_{X→Y} > 0, then the prediction of y_t improves by incorporating past values of x_t, and X can be said to causally influence Y. Analogously, if c_{Y→X} > 0, then Y has a causal influence on X.
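A sketch of the index c_{Y→X} of Eq. (17), read as a residual-variance difference, is given below. For simplicity the RBF centers are drawn by random subsampling rather than the fuzzy c-means clustering used in the paper, and all parameter defaults are our own choices:

```python
import numpy as np

def rbf_features(E, centers, sigma):
    """Gaussian RBF feature map (the Phi and Psi of Eqs. 13-16)."""
    d2 = ((E[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def nlgc(x, y, m=2, P=20, sigma=1.0, seed=0):
    """Sketch of c_{Y->X}: drop in residual variance when predicting x_t
    from RBF features of both pasts instead of x's past alone."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, float), np.asarray(y, float)
    N = len(x)
    embed = lambda s: np.column_stack([s[m - 1 - j : N - 1 - j]
                                       for j in range(m)])
    Ex, Ey, target = embed(x), embed(y), x[m:]
    cx = Ex[rng.choice(len(Ex), P, replace=False)]   # centers for Phi
    cy = Ey[rng.choice(len(Ey), P, replace=False)]   # centers for Psi
    Phi, Psi = rbf_features(Ex, cx, sigma), rbf_features(Ey, cy, sigma)
    def resid_var(A):
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        return np.var(target - A @ coef)
    return resid_var(Phi) - resid_var(np.hstack([Phi, Psi]))
```

Since the second regression nests the first, the index is nonnegative in-sample; a value well above the level obtained for the uncoupled direction indicates a causal influence of Y on X.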
2.4 Similarity index

The method introduced by Arnhold et al. [1999] differs from the previous ones. Given the R < N nearest vectors of x_t = x_t^m in state space 𝒳, with time indices r_t(i), and the R < N nearest vectors of y_t = y_t^m in state space 𝒴, with time indices s_t(i) (i = 1, 2, ..., R), two distance measures can be defined:

    d_t^R(X) = \frac{1}{R} \sum_{i=1}^R \| x_t - x_{r_t(i)} \|^2    (19)
    d_t^R(X|Y) = \frac{1}{R} \sum_{i=1}^R \| x_t - x_{s_t(i)} \|^2.    (20)

If the two systems are strongly synchronized, both sets of neighbors – the original one (Eq. 19) and the mutual one (Eq. 20) – coincide and d_t^R(X) ≈ d_t^R(X|Y). By contrast, for independent systems, the mutual neighbors are more spread out and d_t^R(X) ≪ d_t^R(X|Y). A naïve influence index quantifying the degree of similarity between x_t^m and y_t^m can be defined as (see [Arnhold et al., 1999]):

    S(X|Y) \triangleq \frac{1}{N} \sum_{t=1}^N \frac{d_t^R(X)}{d_t^R(X|Y)}.    (21)
A related measure in which the arithmetic average is replaced by the geometric average was also proposed by Arnhold et al. [1999]:

    H(X|Y) \triangleq \frac{1}{N} \sum_{t=1}^N \log \frac{d_t^N(X)}{d_t^R(X|Y)}.    (22)
Here, the Y-conditioned mean-squared distances d_t^R(X|Y) are compared to the mean-squared distances d_t^N(X) of x_i^m to all x_j^m (with i ≠ j). We call this measure Similarity Index 1. The major drawback of this index is that for weak causal structure and noisy time series, the detection of a coupling becomes difficult. To suppress the influence of the original time series and emphasize the contribution of the other system, a new index can be introduced [Bhattacharya et al., 2003] (subsequently referred to as Similarity Index 2). For each reference vector x_t, 2R neighbors bearing the time indices r_t'(i) are considered (with i = 1, 2, ..., 2R). Out of these 2R state vectors, only the farthest R vectors are chosen. The average radius of the modified point cloud is given by:

    d_t'(X) = \frac{1}{R} \sum_{i=R+1}^{2R} \| x_t - x_{r_t'(i)} \|^2.    (23)

In other words, the R nearest vectors are discarded to reduce the influence of the original neighbors. The influence of Y on X is represented by the following equation:

    E(X|Y) \triangleq \frac{1}{N} \sum_{t=1}^N \frac{d_t'(X)}{d_t^R(X|Y)}.    (24)

Similarly, it is possible to estimate the influence of X on Y by means of E(Y|X).
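A brute-force NumPy sketch of the index S(X|Y) of Eq. (21) is shown below (O(N²) neighbor search; the function name and defaults are ours):

```python
import numpy as np

def similarity_index(X, Y, R=5):
    """S(X|Y) of Eq. (21) for embedding-vector arrays X, Y of shape (N, m)."""
    def sq_dists(A):
        d = ((A[:, None, :] - A[None, :, :]) ** 2).sum(-1)
        np.fill_diagonal(d, np.inf)        # exclude self-matches
        return d
    dX = sq_dists(np.asarray(X, float))
    dY = sq_dists(np.asarray(Y, float))
    nX = np.argsort(dX, axis=1)[:, :R]     # time indices r_t(i)
    nY = np.argsort(dY, axis=1)[:, :R]     # time indices s_t(i)
    num = np.take_along_axis(dX, nX, axis=1).mean(axis=1)   # d_t^R(X)
    den = np.take_along_axis(dX, nY, axis=1).mean(axis=1)   # d_t^R(X|Y)
    return float((num / den).mean())
```

For strongly synchronized systems the index approaches 1, since original and mutual neighbors coincide; for independent systems it is close to 0.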
2.5 Predictability improvement

Another index quantifying the directional influence between two time series is the predictability improvement [Feldmann and Bhattacharya, 2004]. As for the similarity index, the R nearest vectors of the reference vector x_t ≜ x_t^m are denoted by their time indices r_t(i), i = 1, 2, ..., R. The prediction error is the difference between the future value x_{t+h} of a given point in state space (h is the prediction horizon) and the value predicted from the mean of the future values of the R nearest neighbors. Assuming a locally constant prediction model, the error is:

    E_x(x_t) = x_{t+h} - \frac{1}{R} \sum_{i=1}^R x_{r_t(i)+h}.    (25)

The mean squared error is computed by summing the square of the error over all chosen time steps t (here, N time steps):

    MSE_x(X) = \frac{1}{N} \sum_{t=1}^N E_x(x_t)^2.    (26)
Subsequently, the indices of the R nearest vectors in the coupled embedding space 𝒵 are identified. By replacing the indices r_t(i) in Eq. (25) with these indices and averaging over all x_t, we obtain MSE_x(X ⊕ Y) as the prediction error in the coupled space. Finally, the predictability improvement PI(X|Y) of X by taking into account Y is defined as:

    PI(X|Y) \triangleq MSE_x(X) - MSE_x(X \oplus Y).    (27)

Analogously, we can define the predictability improvement PI(Y|X) of Y by considering X. At first sight, this measure looks like the similarity index. The predictability improvement, however, is closer to Granger causality because it estimates regression errors from state vectors; a significant difference is the absence of autoregressive modeling. Our implementation of predictability improvement uses a locally constant prediction model [Feldmann and Bhattacharya, 2004].
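The locally constant scheme of Eqs. (25)-(27) can be sketched as follows (a brute-force illustration; parameter defaults and names are our own):

```python
import numpy as np

def predictability_improvement(x, y, m=2, R=5, h=1):
    """PI(X|Y) of Eq. (27) with a locally constant predictor:
    MSE of x_{t+h} predicted from neighbours in X alone, minus the
    MSE using neighbours found in the joint space X (+) Y."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    N = len(x)
    embed = lambda s: np.column_stack([s[m - 1 - j : N - h - j]
                                       for j in range(m)])
    Ex, Ey = embed(x), embed(y)
    target = x[m - 1 + h :]                # the future values x_{t+h}
    def mse(A):
        d = ((A[:, None, :] - A[None, :, :]) ** 2).sum(-1)
        np.fill_diagonal(d, np.inf)        # exclude self-matches
        nbrs = np.argsort(d, axis=1)[:, :R]
        pred = target[nbrs].mean(axis=1)   # mean future value of neighbours
        return float(np.mean((target - pred) ** 2))
    return mse(Ex) - mse(np.hstack([Ex, Ey]))
```

A clearly positive value indicates that neighborhoods in the joint space predict the future of X better than neighborhoods in X alone, i.e. a causal influence of Y on X.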
2.6 Intermediate summary

The six reviewed methods and the required common and method-specific parameters are summarized in Table 1. The methods fall into three classes:
1. The first class includes measures of the deviation of time series from the generalized Markov property (Eq. 4). We describe a method that expresses such deviation in terms of the Kullback-Leibler entropy: transfer entropy (TE).

2. The key idea of the measures of the second class is to express the mutual influence of the time series on each other in terms of prediction errors. The errors are computed by applying regression models to the mixed state vectors consisting of a varying number of delayed samples from both systems. The methods reviewed here are: (a) the extended Granger causality index (EGCI), based on local linear regression; (b) nonlinear Granger causality (NLGC), based on global nonlinear regression; and (c) the predictability improvement (PI), based on locally constant regression.

3. The basic assumption of the third class of methods is that interrelations manifest themselves as neighborhood relations between states reconstructed from the individual time series of each system. The two measures using neighborhood relations in state space are similarity indices 1 and 2 (SI-1 and SI-2).

[*** INSERT TABLE 1 HERE ***]
3 Numerical Studies

We illustrate the behavior and the effectiveness of the reviewed methods by means of five model systems with built-in causal structure: a system consisting of coupled linear processes, two systems composed of unidirectionally coupled chaotic maps, and two systems of bidirectionally coupled chaotic maps (see Table 2). We start by studying a system of coupled linear autoregressive processes, which, being mathematically simple, will hopefully help the reader better understand the six reviewed methods. The second model system is a spatio-temporal system: a one-dimensional Ulam map lattice (i.e. a chain of unidirectionally coupled chaotic units known as Ulam maps). The third system considered is composed of two unidirectionally coupled Henon maps [Henon, 1976]. Finally, we study two types of bidirectionally coupled Henon maps, identical and nonidentical ones. All simulations were done in Matlab (source code available upon request).

[*** INSERT TABLE 2 HERE ***]
3.1 Unidirectional coupling 1: linear processes

As a first experiment, for simplicity, we studied the system formed of two unidirectionally coupled autoregressive processes of first order:

    x_i = 0.8 x_{i-1} + n_i^{(x)}(\sigma_x^2) + e\, y_{i-1}    (28)
    y_i = 0.4 y_{i-1} + n_i^{(y)}(\sigma_y^2)    (29)
where the parameter e ∈ [0, 1] regulates the coupling strength, and n_i^{(x)} and n_i^{(y)} are independent Gaussian random processes with zero mean and variances σ_x² = σ_y² = 0.2. The numerical experiments were performed with N = 10^4 samples (to assure stationarity, the initial 10^4 iterations were discarded). For each coupling, we generated a bivariate time series {x_i, y_i} (random initial conditions extracted from a normal distribution with zero mean and unit variance). By construction, the two time series are unidirectionally coupled, with time series y_i influencing time series x_i.
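For concreteness, Eqs. (28)-(29) can be simulated as follows (a NumPy sketch; the original experiments used Matlab, and the seed handling is ours):

```python
import numpy as np

def coupled_ar(N, e, sigma2=0.2, seed=0, discard=10**4):
    """Simulate Eqs. (28)-(29): y unidirectionally drives x with
    strength e; the initial `discard` iterations are dropped."""
    rng = np.random.default_rng(seed)
    s = np.sqrt(sigma2)
    T = N + discard
    x, y = np.zeros(T), np.zeros(T)
    x[0], y[0] = rng.normal(size=2)        # random normal initial conditions
    for i in range(1, T):
        y[i] = 0.4 * y[i - 1] + rng.normal(scale=s)
        x[i] = 0.8 * x[i - 1] + rng.normal(scale=s) + e * y[i - 1]
    return x[discard:], y[discard:]
```

With e > 0, the lagged cross-correlation between x_i and y_{i−1} exceeds that between y_i and x_{i−1}, reflecting the built-in direction of the coupling.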
3.2 Unidirectional coupling 2: Ulam map lattice

For our second experiment we used a one-dimensional ring lattice of unidirectionally coupled Ulam maps with periodic boundary conditions [Schreiber, 2000]:

    x_i^{(l)} = f(e\, x_{i-1}^{(l-1)} + (1 - e)\, x_{i-1}^{(l)}),  l = 1, \ldots, L    (30)
    f(x) = 2 - x^2,    (31)
with the strength of the coupling e varying from 0 to 1, and L (the number of Ulam maps forming the lattice) fixed for all experiments to L = 100. We conducted numerical experiments for two sample sizes: N = 10^3 and N = 10^5 (after discarding the initial 10^5 points as transients in order to absorb the influence of the initial conditions). For each coupling, a bivariate time series {x_i^{(1)}, x_i^{(2)}} was generated (random initial conditions). By the way the coupled map lattice was constructed, x_i^{(1)} causally affects x_i^{(2)}, while the causal relation in the backward direction is negligible. In other words, the coupled equations form a driver-response system.
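The lattice of Eqs. (30)-(31) can be simulated compactly, since the ring topology maps onto a circular shift of the state vector (a sketch; the initialization, uniform on [−2, 2], is our own choice):

```python
import numpy as np

def ulam_lattice(N, e, L=100, discard=10**5, seed=0):
    """Simulate the ring of unidirectionally coupled Ulam maps of
    Eqs. (30)-(31); returns an (N, L) array of post-transient states."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-2.0, 2.0, size=L)
    out = np.empty((N, L))
    for i in range(discard + N):
        # site l is driven by site l-1 (periodic boundary via np.roll)
        x = 2.0 - (e * np.roll(x, 1) + (1.0 - e) * x) ** 2
        if i >= discard:
            out[i - discard] = x
    return out
```

Note that f maps [−2, 2] into itself and the coupling is a convex combination, so the lattice state remains bounded for all e ∈ [0, 1].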
3.3 Unidirectional coupling 3: Henon maps

In our third experiment, we considered the same system studied in [Schmitz, 2000], namely, two unidirectionally coupled Henon maps:

    x_i = a - x_{i-1}^2 + b_x x_{i-2}    (32)
    y_i = a - \{ e\, x_{i-1} + (1 - e)\, y_{i-1} \}\, y_{i-1} + b_y y_{i-2}    (33)
where e ∈ [0, 1] denotes the strength of the coupling. Throughout all experiments the coefficients were fixed to a = 1.4, b_x = 0.3 and b_y = 0.3 (identical maps). The experiments were conducted with N = 10^3, 10^4 and 10^5 samples. Again, a sufficiently high number of iterations (10^5) was discarded as transients and only the subsequent samples were used. Because by construction the systems are unidirectionally coupled (that is, X drives Y), the causal influence of Y on X is expected to be close to zero.
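A direct simulation of Eqs. (32)-(33) can be sketched as follows (the small random initial conditions and the seed are our own choices, not from the paper):

```python
import numpy as np

def coupled_henon(N, e, a=1.4, bx=0.3, by=0.3, discard=10**5, seed=0):
    """Simulate the unidirectionally coupled Henon maps of
    Eqs. (32)-(33); X drives Y with strength e."""
    rng = np.random.default_rng(seed)
    x = list(rng.uniform(-0.1, 0.1, 2))
    y = list(rng.uniform(-0.1, 0.1, 2))
    for _ in range(discard + N):
        x.append(a - x[-1] ** 2 + bx * x[-2])
        # after the append above, x[-2] is x_{i-1} as required by Eq. (33)
        y.append(a - (e * x[-2] + (1 - e) * y[-1]) * y[-1] + by * y[-2])
    return np.array(x[-N:]), np.array(y[-N:])
```

For e = 0 the two maps evolve independently; as e grows the response Y is increasingly slaved to the driver X, with (identical) synchronization setting in for large coupling.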
3.4 Bidirectional coupling: Henon maps

For our fourth and fifth simulations we used bidirectionally coupled Henon maps [e.g. Wiesenfeldt et al., 2001]:

    x_i = a - x_{i-1}^2 + b_x x_{i-2} + c_{yx} (x_{i-1}^2 - y_{i-1}^2)    (34)
    y_i = a - y_{i-1}^2 + b_y y_{i-2} + c_{xy} (y_{i-1}^2 - x_{i-1}^2)    (35)
where a = 1.4. In our numerical experiments, we varied the coupling strengths c_{yx} and c_{xy} independently from 0.0 to 0.4. The initial 10^5 samples were discarded as transients, and the following N = 10^4 samples were used for estimating the strength and direction of the coupling. By appropriately choosing b_x and b_y, we can simulate identical maps (b_x = b_y) and nonidentical maps (b_x > b_y or b_x < b_y). In the case of identical maps, we set b_x = b_y = 0.3 (case 1). For nonidentical maps, we set b_x = 0.3 and b_y = 0.1 (case 2). The aim of the latter experiment was to assess how well each method can capture the causal structure determined by the asymmetry. Generally speaking, given two arbitrary time series, the strength and direction of their coupling are not identical because the dynamics of the underlying processes differ. This experiment has important implications for the application of the measures to "real world" problems, where asymmetries of coupling will inevitably be present.
4 Results

In this section, we present the results of our numerical studies. We first present the results for unidirectionally coupled linear processes. We proceed with the ones obtained for unidirectionally coupled maps (Ulam and Henon maps), and end with the ones for bidirectionally coupled identical and nonidentical Henon maps.
4.1 Unidirectional coupling 1: Linear processes

The results for the two unidirectionally coupled autoregressive processes are reproduced in Fig. 1. The embedding dimension for all methods was m = 2 (except for transfer entropy and predictability improvement, for which m = 1). The time lag was set to τ = 1. Other parameters are listed in Table 3. Figure 1 shows the estimated coupling strength as a function of the real coupling strength, averaged over five experimental runs (the error bars denote the standard deviation). As evident from the graphs, all methods give acceptable results despite the large variance of the noise. Note that both Granger causality based measures perform very well (even for higher levels of noise); in fact, they represent the best choice for coupled linear processes. The predictability improvement PI(Y|X) of Y by taking into account X is affected by a rather large variance. An increase in the number of samples yields, quite obviously, a more robust estimate of the coupling between the two linear processes.

[*** INSERT TABLE 3 HERE ***]

[*** INSERT FIGURE 1 HERE ***]
4.2 Unidirectional coupling 2: Ulam map lattice

Figure 2 shows the results for the Ulam map lattice with N = 10^3 samples. The results for N = 10^5 samples are displayed in Fig. 3. The embedding dimension was m = 1 and the time lag τ = 1 for all methods. Other parameters are shown in Table 4. Note that to keep the computations stable, for some of the methods we had to carefully tune some of the critical parameters (such as the size of the neighborhood and the number of nearest neighbors; Table 1). We conducted five experimental runs for each method with randomly chosen initial conditions. The graphs show the average of the estimated coupling strength (roughly equivalent to the amount of exchanged "information") between the time series Y (x_i^{(1)}) and X (x_i^{(2)}) as a function
of the real coupling strength. The error bars designate standard deviations. Intuitively, the larger the coupling between two systems, the larger the information flow between them.

[*** INSERT TABLE 4 HERE ***]

[*** INSERT FIGURE 2 HERE ***]

[*** INSERT FIGURE 3 HERE ***]

By construction, the Ulam map lattice is expected to have the following properties:

1. The information flow from Y to X increases as the coupling strength increases.

2. The information flow from X to Y is vanishingly small (i.e. close to zero) because the coupling is unidirectional.

3. For e = 0.18 and e = 0.82, there is a sharp drop in the flow of information and the estimated coupling is zero. For these two values of e both systems are in a (generally) synchronized state, that is, driver and response are indistinguishable from each other.

Figure 2 clearly illustrates that transfer entropy, extended Granger causality, and predictability improvement give consistent results and successfully disclose the strength and the direction of the coupling. Given an adequate number of RBF kernels (expressed by the parameter P), nonlinear Granger causality gives arguably the best results. The index, however, collapses for values of e for which the two coupled systems enter a (generally) synchronized state; in particular, for e = 0.825 the method proved to be computationally unstable. The two similarity indices correctly detect the two synchronized states but fail to capture the amount of information propagating through the one-dimensional lattice (due to a lack of neighbors in their R-neighborhoods). This result is not surprising considering that both methods have similar algorithmic properties. The results for N = 10^5 are displayed in Fig. 3. Trend-wise they resemble the ones for N = 10^3; the major difference lies in the fact that with more samples the plots are less noisy. We point out that, computationally speaking, nonlinear Granger causality was the heaviest method (for instance, with our not too
optimized algorithm, and for P > 300 and N = 10^5, the estimation of the coupling between the Ulam maps would have taken several days or even weeks on our Pentium P4-3.2GHz). We further note that in both experiments (N = 10^3 and N = 10^5) the similarity indices show "similar" results – apologies for the pun. We also observe that for all methods the standard deviations are negligibly small compared with the average information propagating through the system. In other words, by discarding 10^5 initial samples it is indeed possible to neglect the influence of the initial conditions on future states of the system.
4.3 Unidirectional coupling 3: Henon maps

Figure 4 displays the results for unidirectionally coupled Henon maps for sample sizes N = {10^3, 10^4, 10^5}. The embedding dimension in this case was m = 2 and the time lag τ = 1 (for computational reasons we used m = 1 and τ = 1 for transfer entropy). Other parameter settings are shown in Table 5.

[*** INSERT TABLE 5 HERE ***]

For uni- and bidirectionally coupled Henon maps we define the following index:

    C(i_{X,Y}) = C(i) = i(X \to Y) - i(Y \to X),    (36)

where i(X → Y) and i(Y → X) denote any of the measures of interdependence introduced before. In the context of unidirectionally coupled maps, the following points are noteworthy: (1) C(i) > 0 due to the unidirectionality of the map, i.e. the information flows from X to Y and not in the opposite direction; (2) C(i) drops sharply at e = 0.7 and remains vanishingly small for e > 0.7 as a result of a transition to a state of synchronization between the two maps. As evident from Fig. 4, transfer entropy gives rather noise-free and consistent results even for N = 1000 samples. The extended Granger causality index and similarity index 1 also succeed in capturing the coupling between the two maps. As opposed to transfer entropy, however, the performance of both indices depends quite strongly on the number of samples. As for the influence of X on Y estimated by nonlinear Granger causality, the difference between the unidirectional couplings seems to be reasonably consistent with the theoretical predictions (in particular for N = 10^5). From our results, however, we cannot conclude that nonlinear Granger causality adequately captures the coupling between the two Henon maps.
We further note that the two similarity indices as well as the predictability improvement displayed a somewhat peculiar behavior (results not shown). In both cases, C(i) showed a sudden increase around e = 0.7 – a threshold value leading to a state of synchronization between the two Henon maps. Interestingly, while SI-1 gave consistent results, SI-2 failed to do so. The results for predictability improvement are displayed in separate graphs (Fig. 5) because the variation in the scale of the index across sample sizes would have obscured the real trends if plotted in the same graph. As illustrated by Fig. 5, larger sample sizes (N = 10^4, 10^5) produced good – although slightly noisy – results, as opposed to smaller sample sizes. [*** INSERT FIGURE 4 HERE ***] [*** INSERT FIGURE 5 HERE ***]
4.4 Bidirectional coupling: Henon maps

The results for identical and nonidentical bidirectionally coupled Henon maps are reproduced in Figs. 6 and 7. The time delay was τ = 1 and the embedding dimension was m = 2 (except for transfer entropy, for which m = 1). Other parameters are shown in Table 6. [*** INSERT TABLE 6 HERE ***] In the case of bidirectionally coupled Henon maps, C(i) is deeply affected by the coupling constants cyx and cxy (Eqs. 34 and 35). We make two predictions: (1) the difference between the flows of information C(i) is large and positive if cyx has a low value and cxy has a high value; (2) C(i) is low and negative if cyx has a high value and cxy has a low value.

4.4.1 Case 1: identical maps

Figure 6 displays the results for identically coupled Henon maps. Note that some (cxy, cyx) pairs had to be rejected for computational reasons. In case of synchronization, both similarity indices and the extended Granger causality index failed to converge in a reasonable time. Moreover, close to the center of every map an area exists in which the coupled systems exhibited generalized synchronization. The area is defined by cxy > 0.1, cyx < 0.25, and cxy + cyx < 0.4. In this region, every method had difficulty estimating the coupling constants due to
numerical instabilities – in particular, similarity index 1. Trends of directionality were reliably and consistently detected by transfer entropy, the extended Granger causality index, predictability improvement, and similarity index 1. On the other hand, nonlinear Granger causality was overly noisy, and similarity index 2 showed little directional trend. [*** INSERT FIGURE 6 HERE ***]

4.4.2 Case 2: nonidentical maps

Figure 7 reproduces the results obtained with nonidentically coupled Henon maps. We note that for most of the methods some of the couplings estimated in the right-hand side of the plots (i.e. for large values of cxy) led to numerical instabilities. We suppose that such instabilities are not due to the methods, but are a result of the intrinsic nature of the maps: these points were obtained under strong coupling, where the data are chaotic and the coupling is difficult to estimate. [*** INSERT FIGURE 7 HERE ***] The most striking difference from the case of identical maps (case 1) is the shift of the synchronized area. The shape of the area also morphed from a triangle to a rectangle. Both changes are attributable to the asymmetric structure of the coupled system. In the synchronized area, all methods delivered results similar to the ones obtained in the case of identical maps. Another interesting point is the variation in the size of the areas reflecting unidirectional coupling (red and blue areas). In previous experiments, SI-1, EGCI, PI, and TE delivered consistent results in the sense that the sizes of those areas were almost identical. For the nonidentical maps, however, SI-1 and TE produced a larger area of coupling strengths leading to high values of C(i_{X,Y}) (i.e. information flows from X to Y), and a narrower area of low values of C(i_{X,Y}) (information flows from Y to X).
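The two predictions above can be checked numerically by scanning the (cxy, cyx) plane and evaluating C(i) at each grid point. The Python sketch below shows the structure of such a scan; both the bidirectional coupling scheme and the cheap lagged-correlation proxy standing in for the full indices are illustrative assumptions, not the paper's actual Eqs. 34 and 35 or measures.

```python
import numpy as np

def henon_bi(cxy, cyx, n=4000, trans=1000, seed=1):
    # Bidirectionally coupled Henon maps.  NOTE: this coupling scheme is
    # an illustrative assumption; the paper's maps are given by Eqs. 34-35.
    rng = np.random.default_rng(seed)
    x1, x2, y1, y2 = rng.uniform(0, 0.5, 4)
    X, Y = np.empty(n), np.empty(n)
    for i in range(trans + n):
        nx = 1.4 - (cyx * x1 * y1 + (1 - cyx) * x1 ** 2) + 0.3 * x2
        ny = 1.4 - (cxy * x1 * y1 + (1 - cxy) * y1 ** 2) + 0.3 * y2
        x1, x2, y1, y2 = nx, x1, ny, y1
        if i >= trans:
            X[i - trans], Y[i - trans] = nx, ny
    return X, Y

def c_index(X, Y):
    # Cheap stand-in for C(i): difference of squared lagged
    # cross-correlations, i(X -> Y) ~ corr(x_n, y_{n+1})^2.
    i_xy = np.corrcoef(X[:-1], Y[1:])[0, 1] ** 2
    i_yx = np.corrcoef(Y[:-1], X[1:])[0, 1] ** 2
    return i_xy - i_yx

# Scan a small (cxy, cyx) grid; divergent pairs simply yield NaN,
# mirroring the rejected points in Figs. 6 and 7.
grid = np.linspace(0.0, 0.4, 5)
C = np.array([[c_index(*henon_bi(cxy, cyx)) for cxy in grid] for cyx in grid])
print(C.shape)  # a 5 x 5 directionality map over the coupling plane
```

Replacing c_index with any of the six measures reproduces the layout of Figs. 6 and 7: positive entries above the diagonal of the plane, negative below.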
The extended Granger causality index unearthed non-directional trends in mutually coupled areas, that is, for high cxy and cyx. By contrast, the predictability improvement showed a large window in which the influence of Y on X was dominant. As in the case of identical maps, nonlinear Granger causality and similarity index 2 showed weak directionality. An additional point of interest is the high information flow from X to Y in mutually coupled areas.
5 Discussion

Asymmetric (causal) interdependencies between coupled systems are common in practice. In this article, we compared six measures for quantifying causal relations between bivariate time series. Although none of the measures is general enough to detect asymmetric relations unambiguously, we can make a few remarks. The first one concerns the computational stability of the indices. In all experiments, transfer entropy and predictability improvement delivered results consistent with the theoretical predictions. Both measures proved to be numerically stable even for a limited number of samples (N = 1000). The measures based on Granger causality did not perform quite as well. Both were affected by instabilities, in particular when applied to bidirectionally coupled Henon maps (except for the case of coupled linear processes, for which both measures performed very well). In the case of nonlinear Granger causality, we suppose that such instabilities are due to the small number of RBF kernels used (P = 10). As the method was applied to a two-dimensional parameter space (cxy, cyx), it was computationally impractical to employ a larger number of RBFs. From the point of view of machine learning, to take full advantage of kernel-based methods it is important to have a sample pool which is large enough compared to the rank of the processed data. More precisely, if the sample pool is too small relative to that rank, kernel-based methods tend to over-fit the data set. This is even more of a concern if such methods are applied to real world data (e.g. physiological data), which are typically characterized by high-dimensional nonlinear relationships between the variables. We also experienced some problems with the neighborhood-based approaches. The reason might be that these methods are somewhat heuristic, and only provide a rather ad hoc way of evaluating the statistical relationship between time series.
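The RBF-based Granger comparison can be made concrete with a small regression experiment in the spirit of [Ancona et al., 2004]: fit y_{n+1} once from y_n alone and once from (y_n, x_n), and compare the residual variances. The Python sketch below is illustrative; the random center selection, ridge term, and the toy driven process are assumptions of this sketch, not details taken from the paper.

```python
import numpy as np

def rbf_features(Z, centers, sigma):
    # Gaussian RBF design matrix plus a constant bias column.
    d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.c_[np.exp(-d2 / (2 * sigma ** 2)), np.ones(len(Z))]

def granger_rbf(x, y, P=10, sigma=1.0, ridge=1e-4, seed=0):
    # Nonlinear Granger index: compare ridge-regularized RBF-regression
    # residuals of y_{n+1} with and without x_n in the conditioning set.
    rng = np.random.default_rng(seed)

    def resid_var(Z, t):
        centers = Z[rng.choice(len(Z), P, replace=False)]
        Phi = rbf_features(Z, centers, sigma)
        w = np.linalg.solve(Phi.T @ Phi + ridge * np.eye(Phi.shape[1]),
                            Phi.T @ t)
        return np.var(t - Phi @ w)

    t = y[1:]
    v_self = resid_var(y[:-1, None], t)
    v_joint = resid_var(np.stack([y[:-1], x[:-1]], axis=1), t)
    return np.log(v_self / v_joint)  # > 0 when x_n helps predict y_{n+1}

# Toy driven pair: x is i.i.d. noise, y is nonlinearly driven by x.
rng = np.random.default_rng(2)
n = 2000
x = 0.5 * rng.standard_normal(n)
y = np.zeros(n)
for k in range(n - 1):
    y[k + 1] = 0.5 * y[k] + 0.8 * x[k] ** 2 + 0.05 * rng.standard_normal()
print(granger_rbf(x, y), granger_rbf(y, x))  # the X -> Y index should dominate
```

With only P = 10 centers the fit is coarse, and with far fewer samples than used here the index becomes unreliable – the over-fitting behavior discussed above.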
In fact, as our experimental results show, both similarity indices could not capture well the direction and the strength of the coupling between chaotic maps – in particular for unidirectionally coupled maps. This result reflects the nature of neighborhood-based approaches: they are adequate for detecting "synchrony" between two time series, but not for measuring the direction of the causal interaction (i.e. the asymmetry of the interdependence). We omit the results here, but in the case of unidirectionally coupled Henon maps, both indices showed a sudden increase for coupling strengths e > 0.7. This critical value of e marks the boundary between the "synchronized area" and the "directional area." Although many of our examples indicate that the reviewed methods are able to unearth interdependencies and directional flow of information between coupled systems, given a set of time series it is in principle impossible to make detailed
statements about the processes that gave rise to them. It is thus important to avoid overinterpretations of the outcome of the data analysis. The interpretation of two observed systems as "driver" and "response", for instance, is only possible if the existence of a third external driving system can be excluded (e.g. by carefully designing the coupled systems). Again, we do not imply that the output of any of the methods has causal meaning a priori (for additional discussion of this crucial issue we refer to [Arnhold et al., 1999; Quian Quiroga et al., 2000]). In this context, an important question is whether the relationship between a more active system (e.g. an independent system with many excited degrees of freedom) and a passive system (e.g. a dependent system with fewer excited degrees of freedom) can have a causal driver-response interpretation in particular circumstances. For unidirectionally coupled Henon maps and bidirectionally coupled systems, we presented results displaying the difference C(i) between the unidirectional transfers of information (Eq. 36). It is important to assess whether this difference is large enough to be considered significant or not. It might well be – as in the case of predictability improvement applied to unidirectionally coupled Henon maps (Fig. 5) – that relevant but small-scale causality structure is neglected (due to an inadequate choice of the scale or the resolution of the presented results). One way to avoid this problem is to use the (normalized) directionality index introduced in [Rosenblum and Pikovsky, 2001]:

d(X, Y) = [i(X → Y) − i(Y → X)] / [i(X → Y) + i(Y → X)],   (37)
where i(X → Y) and i(Y → X) should be replaced by any of the indices introduced before. The index varies from 1 in the case of unidirectional coupling (system X drives system Y) to −1 in the opposite case (system Y drives system X), with intermediate values corresponding to a bidirectional influence between the two systems. In preliminary experiments we tested the directionality index. We do not describe the results in detail here, but we found that the index works well as long as the numerical stability of the computations is guaranteed. If this is not the case, the index collapses. Problems also occurred when the information flow was negative while the denominator had a small value; in this case d(X, Y) jumped to an extraordinarily large value. Considering the outcome of this "preliminary" study, we decided not to use the directionality index. The problem of relevant small-scale causality structure hidden by less relevant structure (noise) remains open. A last point concerns the aforementioned computational issues. All methods were implemented using standard Matlab procedures and were executed on the
same computer. The most time-consuming method was the nonlinear Granger causality index. The reason is that it is a kernel-based method for which, for every sample, we need to evaluate a radial basis function and calculate the location of its center (e.g. using c-means clustering). An additional drawback of NLGC is its heavy memory cost. Transfer entropy showed the lightest computational load, its only apparent drawback (at least in our implementation) being the estimation of the joint probabilities as normalized histograms (note that it would be possible to use kernel density estimators instead). As for the interdependence indices based on neighborhood relationships, they rely on the calculation of nearest neighbors. This operation is rather time consuming because the number of points in the neighborhood needs to be large enough to establish good statistics but also small enough to ensure linearization. This requires a large pool of samples, entailing heavy computational costs.
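Returning to the directionality index of Eq. (37), the collapse described above can be guarded against explicitly. The sketch below returns NaN whenever the total flow in the denominator is too small for the ratio to be meaningful; the threshold eps is an illustrative choice, not a value from the paper.

```python
def directionality(i_xy, i_yx, eps=1e-3):
    """Normalized directionality index d(X, Y) of Eq. (37).

    Returns +1 for pure X -> Y driving, -1 for pure Y -> X driving,
    intermediate values for bidirectional influence, and NaN when the
    total flow is too small (or negative) for the ratio to be reliable
    -- the numerical failure mode discussed in the text.
    """
    denom = i_xy + i_yx
    if denom < eps:  # guard against the collapse of the index
        return float('nan')
    return (i_xy - i_yx) / denom

print(directionality(0.4, 0.0))        # -> 1.0 (X drives Y)
print(directionality(0.1, 0.1))        # -> 0.0 (symmetric influence)
print(directionality(0.005, -0.0049))  # -> nan (unreliable regime)
```

A negative estimate i(Y → X) with a near-zero denominator is exactly the case that made d(X, Y) explode in our preliminary tests; the guard turns it into an explicit "no answer" instead.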
6 Conclusion

The components of a complex system rarely display linear interdependencies. The identification of nonlinear relationships between the individual components provides crucial insights into the hidden dynamics of the system and sheds light on the underlying mechanisms. En route to a substantive theory of causality analysis, a variety of methods capable of detecting information flow between two systems have been proposed to address this important task. This article presented a survey of several such measures of interdependence or "causality structure." To compare and evaluate the measures and assess their performance, we conducted numerical experiments with linear and nonlinear model systems (unidirectionally coupled linear processes, as well as unidirectionally and bidirectionally coupled chaotic maps). Our results show that all indices are able to unearth nontrivial statistical structure in the data which might be exploited to monitor, characterize, control, and predict a system's future dynamics. For the examined chaotic maps, however, some methods seem to perform "better" than others – in terms of consistency with theoretical predictions and computational stability. From our review, we conclude that given a complex system with a priori unknown dynamics, the first method of choice might be transfer entropy or its "finite sample"-corrected variant, effective transfer entropy [Marschinski and Kantz, 2002]. If a large enough pool of samples is available, both nonlinear extensions of Granger causality and the predictability improvement might provide adequate alternatives. We end by remarking that the quantification of causal structure via data analysis has to be accompanied by a substantive theory of causal analysis. The establishment of such a converging explanatory framework will have far-reaching effects, not only in physics, but also in fields such as artificial intelligence and robotics.
7 Acknowledgements M.L. was supported by research grant no. 04413 of the Japan Society for the Promotion of Science. We would like to thank two anonymous reviewers for comments on a draft of this manuscript.
References

Ancona, N., Marinazzo, D., Stramaglia, S., [2004] "Radial basis function approach to nonlinear Granger causality of time series". Physical Review E 70, 056221.
Arnhold, J., Grassberger, P., Lehnertz, K., Elger, C. E., [1999] "A robust method for detecting interdependences: application to intracranially recorded EEG". Physica D 134, 419–430.
Bhattacharya, J., Pereda, E., Petsche, H., [2003] "Effective detection of coupling in short and noisy bivariate data". IEEE Trans. on Systems, Man, and Cybernetics – Part B: Cybernetics 33 (1), 85–95.
Chen, Y., Rangarajan, G., Feng, J., Ding, M., [2004] "Analyzing multiple nonlinear time series with extended Granger causality". Physics Letters A 324, 26–35.
Farmer, J. D., Sidorowich, J. J., [1987] "Predicting chaotic time series". Physical Review Letters 59 (8), 845–848.
Feldmann, U., Bhattacharya, J., [2004] "Predictability improvement as an asymmetrical measure of interdependence in bivariate time series". Int. J. of Bifurcation and Chaos 14 (2), 505–514.
Fraser, A. M., Swinney, H. L., [1986] "Independent coordinates for strange attractors from mutual information". Physical Review A 33, 1134–1140.
Govindan, R. B., Raethjen, J., Kopper, F., Claussen, J. C., Deuschl, G., [2005] "Estimation of time delay by coherence analysis". Physica A 350, 277–295.
Granger, C. W. J., [1969] "Investigating causal relations by econometric models and cross-spectral methods". Econometrica 37 (3), 424–438.
Henon, M., [1976] "A two-dimensional mapping with a strange attractor". Commun. Math. Phys. 50 (1), 69–77.
Kennel, M. B., Brown, R., Abarbanel, H. D. I., [1992] "Determining embedding dimension for phase-space reconstruction using a geometrical construction". Physical Review A 45, 3403–3411.
Marschinski, R., Kantz, H., [2002] "Analysing the information flow between financial time series". European Physical J. B 30 (2), 275–281.
Packard, N. H., Crutchfield, J. P., Farmer, J. D., Shaw, R. S., [1980] "Geometry from a time series". Physical Review Letters 45 (9), 712–716.
Palus, M., Stefanovska, A., [2003] "Direction of coupling from phases of interacting oscillators: an information-theoretic approach". Physical Review E 67, 055201.
Parlitz, U., Kocarev, L., [1999] "Synchronization of chaotic systems". In: Schuster, H. G. (Ed.), Handbook of Chaos Control, pp. 271–303.
Pearl, J., [1988] Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers: San Francisco, CA.
Pikovsky, A., Rosenblum, M., Kurths, J., [2001] Synchronization: A Universal Concept in Nonlinear Science. Cambridge University Press: Cambridge, UK.
Priestly, M. B., [1989] Spectral Analysis and Time Series. Academic Press: London.
Quian Quiroga, R., Arnhold, J., Grassberger, P., [2000] "Learning driver-response relationships from synchronization patterns". Physical Review E 61 (5), 5142–5148.
Rosenblum, M. G., Pikovsky, A. S., [2001] "Detecting direction of coupling in interacting oscillators". Physical Review E 64, 045202.
Roulston, M. S., [1999] "Estimating the errors on measured entropy and mutual information". Physica D 125, 285–294.
Rybski, D., Havlin, S., Bunde, A., [2003] "Phase synchronization in temperature and precipitation records". Physica A 320, 601–610.
Schmitz, A., [2000] "Measuring statistical dependence and coupling of subsystems". Physical Review E 62 (5), 7508–7511.
Schreiber, T., [2000] "Measuring information transfer". Physical Review Letters 85 (2), 461–464.
Takens, F., [1981] "Detecting strange attractors in turbulence". In: Rand, D., Young, L. (Eds.), Lecture Notes in Mathematics: Dynamical Systems and Turbulence. Vol. 898. Springer Verlag: Berlin, pp. 366–381.
Wiener, N., [1959] "The theory of prediction". In: Beckenbach, E. F. (Ed.), Modern Mathematics for Engineers. McGraw-Hill: New York.
Wiesenfeldt, M., Parlitz, U., Lauterborn, W., [2001] "Mixed state analysis of multivariate time series". Int. J. of Bifurcation and Chaos 11 (8), 2217–2226.
Table 1: Methods at a glance. Approach and method-specific parameters. Common parameters: N (number of samples), m (embedding dimension), τ (time delay).

Method                      | Approach                | Method-specific parameters
Transfer Entropy (TE)       | Information theory      | r (kernel parameter)
Extended G. C. (EGCI)       | Local linear regression | L (number of neighbors), δ (neighborhood size)
Nonlinear G. C. (NLGC)      | Regression with RBF     | P (number of RBFs), σ (variance of Gaussians)
Predictability improv. (PI) | Constant regression     | h (prediction horizon), R (nearest neighbors)
Similarity index (SI-1/2)   | Neighborhood analysis   | R (nearest neighbors)
Table 2: Studied model systems.

Model system             | Coupling direction | Coupling scheme | Dynamics
Coupled linear processes | unidirectional     | linear          | linear
Ulam map lattice         | unidirectional     | linear          | nonlinear
Henon maps               | unidirectional     | nonlinear       | nonlinear
Identical Henon maps     | bidirectional      | nonlinear       | nonlinear
Nonidentical Henon maps  | bidirectional      | nonlinear       | nonlinear
Table 3: Method-specific parameters used for the study of coupled linear autoregressive processes. Parameters common to all methods: m = 2, τ = 1.

Method                 | Parameters
Transfer Entropy       | r = 8, m = 1
Extended G. C.         | L = 20, δ = 0.8
Nonlinear G. C.        | P = 10, σ = 0.05
Predictability improv. | h = 1, R = 10, m = 1
Similarity index 1     | R = 10
Similarity index 2     | R = 30
Table 4: Method-specific parameters used for the study of the coupled Ulam map lattice. Parameters common to all methods: m = 1, τ = 1.

Method                 | Parameters
Transfer Entropy       | r = 8
Extended G. C.         | L = 100, δ = 0.5, 0.2
Nonlinear G. C.        | P = 50, 50, σ = 0.05
Predictability improv. | h = 1, R = 1
Similarity index 1     | R = 20
Similarity index 2     | R = 20
Table 5: Method-specific parameters for the analysis of unidirectionally coupled Henon maps. Common parameters: m = 2, τ = 1.

Method                 | Parameters
Transfer Entropy       | r = 8, m = 1
Extended G. C.         | L = 100, δ = 0.5, 0.3, 0.2
Nonlinear G. C.        | P = 50, 50, 100, σ = 0.05
Predictability improv. | h = 1, R = 1
Similarity index 1     | R = 20
Similarity index 2     | R = 20
Table 6: Method-specific parameters used for the analysis of bidirectionally coupled Henon maps. Common parameters: m = 2, τ = 1.

Method                 | Parameters
Transfer Entropy       | r = 8, m = 1
Extended G. C.         | L = 100, δ = 0.6
Nonlinear G. C.        | P = 10, σ = 0.05
Predictability improv. | h = 1, R = 1
Similarity index 1     | R = 20
Similarity index 2     | R = 100
[Figure 1 here: six panels – estimated coupling (TE, EGCI, NLGC, PI, SI-1, SI-2) versus real coupling e; see the caption of Figure 1 below.]
[Figure 2 here: six panels – estimated coupling (TE, EGCI, NLGC, PI, SI-1, SI-2) versus real coupling e; see the caption of Figure 2 below.]
[Figure 3 here: six panels – estimated coupling (TE, EGCI, NLGC, PI, SI-1, SI-2) versus real coupling e; see the caption of Figure 3 below.]
[Figure 4 here: six panels – C(TE), C(EGCI), C(NLGC), C(PI), C(SI-1), C(SI-2) versus coupling e for N = 1000, 10000, 100000; see the caption of Figure 4 below.]
[Figure 5 here: two panels – C(PI) versus coupling e; see the caption of Figure 5 below.]
[Figure 6 here: six color-map panels – C(TE), C(EGCI), C(NLGC), C(PI), C(SI-1), C(SI-2) over the (cxy, cyx) plane; see the caption of Figure 6 below.]
[Figure 7 here: six color-map panels – C(TE), C(EGCI), C(NLGC), C(PI), C(SI-1), C(SI-2) over the (cxy, cyx) plane; see the caption of Figure 7 below.]
Figure 1: Unidirectionally coupled linear processes for N = 10^4. Influence of autoregressive process Y on process X (solid), influence of process X on process Y (dashed). Top left: transfer entropy (TE); top right: extended Granger causality index (EGCI); center left: nonlinear Granger causality (NLGC); center right: predictability improvement (PI); bottom left: similarity index 1 (SI-1); bottom right: similarity index 2 (SI-2). The horizontal axis denotes the real coupling strength e; the vertical axis denotes the directional flow of information between the two processes. Error bars represent the standard deviations. For the choice of parameters, see Table 3.
Figure 2: Unidirectionally coupled Ulam maps for N = 10^3. Influence of data series Y(x1) on series X(x2) (solid), influence of X(x2) on Y(x1) (dashed). Top left: transfer entropy (TE); top right: extended Granger causality index (EGCI); center left: nonlinear Granger causality (NLGC); center right: predictability improvement (PI); bottom left: similarity index 1 (SI-1); bottom right: similarity index 2 (SI-2). The horizontal axis denotes the coupling strength e; the vertical axis denotes the difference between the flows of information (amount of "exchanged" information). Error bars represent the standard deviations. For the choice of parameters, see Table 4.
Figure 3: Unidirectionally coupled Ulam maps for N = 10^5. Influence of data series Y(x1) on series X(x2) (solid), influence of X(x2) on Y(x1) (dashed). Top left: transfer entropy (TE); top right: extended Granger causality index (EGCI); center left: nonlinear Granger causality (NLGC); center right: predictability improvement (PI); bottom left: similarity index 1 (SI-1); bottom right: similarity index 2 (SI-2). The horizontal axis denotes the coupling strength e, and the vertical axis the difference between the flows of information. Results are averaged over five realizations, and error bars represent the standard deviations. For the choice of parameters, see Table 4.
Figure 4: Influence of different choices of sample size on measures of causality for unidirectionally coupled Henon maps for N = 10^3 (dashed), N = 10^4 (thin solid line), and N = 10^5 (thick solid line). Top left: transfer entropy (TE); top right: extended Granger causality index (EGCI); center left: nonlinear Granger causality (NLGC); center right: predictability improvement (PI); bottom left: similarity index 1 (SI-1); bottom right: similarity index 2 (SI-2). For the choice of parameters, see Table 5.
Figure 5: Predictability improvement for unidirectionally coupled Henon maps. N = 10^4 (left), N = 10^5 (right).
Figure 6: Measures of causality for bidirectionally coupled identical Henon maps (N = 10^4). Top left: transfer entropy (TE); top right: extended Granger causality index (EGCI); center left: nonlinear Granger causality (NLGC); center right: predictability improvement (PI); bottom left: similarity index 1 (SI-1); bottom right: similarity index 2 (SI-2). The horizontal axis denotes the coupling cxy of map X on map Y. The vertical axis denotes the coupling cyx of map Y on map X. Red colors indicate information flow from X to Y, and blue colors indicate the opposite flow. Outliers rejected due to computational problems are in white. For the choice of parameters, see Table 6.
Figure 7: Measures of causality for bidirectionally coupled nonidentical Henon maps (N = 10^4). Top left: transfer entropy (TE); top right: extended Granger causality index (EGCI); center left: nonlinear Granger causality (NLGC); center right: predictability improvement (PI); bottom left: similarity index 1 (SI-1); bottom right: similarity index 2 (SI-2). The horizontal axis denotes the coupling cxy of map X on map Y. The vertical axis denotes the coupling cyx of map Y on map X. Red colors indicate that information flows from X to Y, and blue colors indicate the opposite flow. Outliers rejected due to computational problems are in white. For the choice of parameters, see Table 6.