International Journal of Bifurcation and Chaos, Vol. 14, No. 2 (2004) 773–791
© World Scientific Publishing Company
BLIND SOURCE SEPARATION TECHNIQUES FOR DECOMPOSING EVENT-RELATED BRAIN SIGNALS

KLAUS-ROBERT MÜLLER∗,†
Department of Computer Science, University of Potsdam, August-Bebel-Strasse 89, 14482 Potsdam, Germany
[email protected]

RICARDO VIGÁRIO†
Neural Networks Research Centre, HUT, P.O. Box 5400, FIN-02015 HUT, Finland
[email protected]

FRANK MEINECKE†
Department of Physics, University of Potsdam, Am Neuen Palais 10, 14469 Potsdam, Germany
[email protected]

ANDREAS ZIEHE
Fraunhofer FIRST.IDA, Kekuléstr. 7, 12489 Berlin, Germany
[email protected]

Received May 15, 2002; Revised November 1, 2002

Recently, blind source separation (BSS) methods have been highly successful when applied to biomedical data. This paper reviews the concept of BSS and demonstrates its usefulness in the context of event-related MEG measurements. In a first experiment we apply BSS to artifact identification in raw MEG data and discuss how the quality of the resulting independent component projections can be evaluated. The second part of our study considers averaged data of event-related magnetic fields. Here it is particularly important to monitor, and thus avoid, possible overfitting due to the limited sample size. A stability assessment of the BSS decomposition allows us to solve this task, and an additional grouping of the BSS components reveals interesting structure that could ultimately be used to obtain better physiological models of the data.

Keywords: Blind source separation (BSS); independent component analysis (ICA); temporal decorrelation; TDSEP; higher-order statistics; MEG; evoked responses; reliability; bootstrap.
∗Author for correspondence.

1. Introduction

Recent advances in brain imaging techniques make it possible to monitor the active brain noninvasively at high spatial and temporal resolution. Processing this vast amount of data calls for efficient and robust mathematical algorithms that extract meaningful features of interest while suppressing disturbances due to technical or physiological noise sources. Latent variable models combined with projection techniques are good candidates to achieve this goal. Well-known approaches of this kind are
e.g. principal component analysis (PCA), projection pursuit (PP) or factor analysis (FA). In this paper we focus on a similar concept known as blind source separation [Jutten & Herault, 1991; Comon, 1994; Hyvärinen et al., 2001]. This method has proven successful in many application areas, including hyperspectral data analysis [Parra et al., 2000], separation of acoustic signals [Lee et al., 1997; Lee et al., 1998; Parra & Spence, 2000; Murata et al., 2001] and, most successfully so far, biomedical data such as electroencephalograms (EEG) [Makeig et al., 1997; Vigário, 1997; Barros et al., 2000], magnetoencephalograms (MEG) [Vigário, 1999; Vigário et al., 2000; Wübbeler et al., 2000; Ziehe et al., 2001; Sander et al., 2002], magnetoneurograms (MNG) [Ziehe et al., 2000a], functional magnetic resonance imaging (fMRI) [McKeown et al., 1998; Biswal & Ulmer, 1999; Petersen et al., 2000] and optical (calcium) imaging [Stetter et al., 2000] (see also [Hyvärinen et al., 2001] for further applications). However, as for any unsupervised method, care has to be taken to evaluate the results of such an analysis, especially when studying large complex systems like the human brain. In this paper we briefly review¹ the BSS concept and discuss its practical use in two biomedical applications: (a) artifact identification in raw MEG measurements and (b) analysis of event-related averaged MEG data. In Sec. 2 we give a short description of the blind source separation (BSS) problem. The theory of independent component analysis (ICA) is then introduced, together with some popular ICA algorithms. Section 3.2 explains a further interesting type of BSS algorithm, which uses the temporal structure of the data for unmixing. In Sec. 4 we discuss built-in limits of BSS algorithms (model assumptions and statistical overlearning problems) and subsequently compare ICA and PCA briefly.
Since BSS is an unsupervised method where one rarely has access to the ground truth, it is mandatory to provide self-supervising techniques that quantify the reliability of the resulting projections. After this mathematical and algorithmic preparation we address the two biomedical applications mentioned above: (a) artifact
identification in Sec. 6, which demonstrates the separation quality that can be achieved for an MEG data set where typical artifacts have been produced deliberately, and (b) decomposition of event-related activity, presented in Sec. 7. We conclude with a brief discussion.
2. The Blind Source Separation Model

In many signal processing applications one only has access to measurements of mixed, i.e. superimposed, signals, and the question is how to construct suitable projections that demix them and thus recover the underlying (unmixed) signals of interest. Blind source separation (BSS) techniques aim exactly at revealing the unknown sources, using two ingredients: (a) a model of the mixing process (typically a linear superposition, cf. Eq. (1)) and (b) the assumption of statistical independence. As opposed to other signal processing techniques like beamforming (e.g. [Veen & Buckley, 1988]), BSS uses no geometrical information about the sensor array or the underlying sources; this is why BSS is called "blind". Let us assume that we measure a multivariate time series and that at time instant k the observed n-dimensional data vector x(k) = [x_1(k), ..., x_n(k)]^T is given by the superposition

    x(k) = Σ_{j=1}^{m} a_j s_j(k) = A s(k) .    (1)
As stated above, the source signals s_1(k), ..., s_m(k) are unknown, along with the coefficients of the mixing process (the matrix A = [a_1, ..., a_m]). The goal is therefore to estimate both unknowns from the measurements x(k), and in principle all there is to do is to invert the mixing process:

    y(k) = ŝ(k) = B x(k) = B A s(k) ,    (2)

where B ≈ A^{-1} is called the separating matrix.² The general BSS problem requires A to be an n×m matrix of full rank, with n ≥ m (i.e. there are at least as many mixtures as independent sources). In most algorithmic derivations, an equal number of sources and sensors is assumed. In the model summarized by Eq. (1), and schematically illustrated in Fig. 1, we neglect
¹ Note that due to the reviewing nature of this paper, previously published material is used (cf. [Vigário, 1999; Vigário et al., 2000; Vigário & Oja, 2000; Meinecke et al., 2002; Vigário et al., 2002]).
² Unmixing is unique up to scaling and permutation, i.e. BA = PD, where D is a diagonal matrix and P is a permutation matrix.
[Figure 1: block diagram s → A (mixing) → x → V (whitening) → v → W^T (separation) → y = ŝ.]

Fig. 1. Schematic illustration of the mathematical model used to perform the ICA decomposition.
any noise contribution. Some approaches to take additive noise into account can be found in [Hyv¨arinen, 1999; Vig´ario, 1999; M¨ uller et al., 1999; Kawanabe & Murata, 2000; S¨arel¨a et al., 2001; Hyv¨arinen et al., 2001]. Note that for decomposing brain data both the estimates of underlying source signals and the mixing matrix provide important information as the former reveals the temporal dynamics of a component (corresponding to a particular brain source), whereas the latter will give the corresponding spatial (stationary) field patterns, facilitating localization of the respective cortex activation.
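The mixing and separation model of Eqs. (1) and (2) can be illustrated with a small numerical sketch. All choices below (the toy sources, the matrix A) are invented for illustration and have nothing to do with the MEG data discussed later:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 10_000

# Two invented, independent, non-Gaussian toy sources s_1(k), s_2(k)
s = np.vstack([
    rng.laplace(size=T),      # super-Gaussian source
    rng.uniform(-1, 1, T),    # sub-Gaussian source
])

# An arbitrary full-rank mixing matrix A; Eq. (1): x(k) = A s(k)
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
x = A @ s

# With B = A^{-1} (which BSS must estimate blindly), Eq. (2)
# recovers the sources exactly: y(k) = B x(k) = B A s(k) = s(k)
B = np.linalg.inv(A)
y = B @ x
print(np.allclose(y, s))  # True
```

In practice B is of course unknown; the remainder of the paper is about estimating it, up to the scaling and permutation ambiguity of footnote 2, from the independence assumption alone.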
3. Blind Source Separation Methods

A powerful method to perform BSS is independent component analysis (ICA). The literature on this subject is large and steadily growing (see [Comon, 1994; Cardoso, 1998, 1999; Roberts & Everson, 2001; Hyvärinen et al., 2001] for reviews). Most of these methods try to estimate the mixing matrix A, or respectively its inverse B, by optimizing an appropriate cost function which measures the degree of statistical independence among the signal components. A necessary condition for statistical independence is uncorrelatedness; therefore most algorithms first force the data to be uncorrelated and of unit variance by a whitening (or sphering) transformation. This whitening step may also include a data compression, which can help to reduce or avoid overlearning (overfitting) effects that occur due to the limited number of sample points [Särelä & Vigário, 2001]. Whitening may be accomplished by a PCA projection: v(k) = V x(k), with E{v(k) v(k)^T} = I. The whitening matrix V is given by V = Λ^{-1/2} U^T, where Λ = diag[λ_1, ..., λ_m] is a diagonal matrix with the eigenvalues of the data covariance matrix E{x(k) x(k)^T}, and U is a matrix with the corresponding eigenvectors as its
columns. In terms of v(k), the model (1) becomes v(k) = V A s(k), and it is clear that W = V A is an orthogonal matrix, since otherwise the whitening condition would be violated [Hyvärinen et al., 2001]. Therefore, the solution is now sought in the form

    ŝ(k) = W^T v(k) .    (3)
Even if whitening itself is not enough to achieve statistical independence (except for Gaussian signals), the problem has now been reduced to the determination of an orthogonal transformation (rotation). One possibility to obtain the additional information needed to determine this "missing" rotation is to take higher-order statistics (HOS) into account and enforce independence by zeroing all higher-order cross-moments [Comon, 1994; Cardoso, 1999].
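The whitening step described above translates almost directly into code. The following is a minimal NumPy sketch of V = Λ^{-1/2} U^T; the toy covariance structure used for the demonstration is an arbitrary choice:

```python
import numpy as np

def whiten(x):
    """Whiten zero-mean data x (channels x samples): v = V x with E{v v^T} = I."""
    x = x - x.mean(axis=1, keepdims=True)
    C = np.cov(x)                    # sample estimate of E{x x^T}
    lam, U = np.linalg.eigh(C)       # eigenvalues Lambda and eigenvectors U
    V = np.diag(lam ** -0.5) @ U.T   # whitening matrix V = Lambda^{-1/2} U^T
    return V @ x, V

rng = np.random.default_rng(1)
x = np.array([[2.0, 0.5],
              [0.5, 1.0]]) @ rng.standard_normal((2, 5000))
v, V = whiten(x)
print(np.allclose(np.cov(v), np.eye(2)))  # True: unit covariance after whitening
```

The data compression mentioned in the text corresponds to keeping only the eigenvectors with the largest eigenvalues before forming V.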
3.1. Exploiting higher-order statistics

3.1.1. Gradient-based techniques

Many popular algorithms for ICA are gradient-based techniques that optimize a contrast function which measures statistical independence or the deviation from a Gaussian distribution, using the fact that a mixture of independent non-Gaussian sources becomes more Gaussian. For example, the FastICA algorithm [Hyvärinen & Oja, 1997] uses the fourth-order cumulant, also called the kurtosis, as a measure of non-Gaussianity. The kurtosis of the i-th source signal is defined as kurt(s_i) = E{s_i^4} − 3[E{s_i^2}]^2, where E{·} denotes the mathematical expectation of the bracketed quantity. The kurtosis is negative for source signals whose amplitudes have sub-Gaussian probability densities (distributions flatter than Gaussian), positive for super-Gaussian densities (sharper than Gaussian, and with longer tails), and zero for Gaussian densities. Maximizing the absolute value of the kurtosis leads to the identification of non-Gaussian sources. Consider a linear combination y = w^T v of the white random vector v, with ||w|| = 1. Then E{y^2} = 1 and kurt(y) = E{y^4} − 3, and the gradient of the kurtosis with respect to w is 4 E{v (w^T v)^3}. FastICA is a deflation algorithm that extremizes the kurtosis by finding one column of the separating matrix W (denoted w) at a time, thus
identifying the respective independent source.³ In the l-th iteration FastICA computes

    w_l^* = E{v (w_{l-1}^T v)^3} − 3 w_{l-1} ,
    w_l = w_l^* / ||w_l^*|| .    (4)
In order to estimate up to m components, the algorithm is run repeatedly, removing the information contained in the components that have already been extracted (for Matlab code see http://www.cis.hut.fi/projects/ica/fastica/ and cf. [Hyvärinen & Oja, 1997; Vigário, 1999; Hyvärinen et al., 2001] for further details). In general, gradient-based ICA algorithms consider cost functions of the form J(B) = E{G(y)}, where G is some nonlinear function, and determine the solution by iterative updates of B using

    ∂J(B)/∂B = E{g(y) x^T} ,

where g(y) is the gradient of G(y) and y = Bx. Assuming further that B is invertible, we know that x = B^{-1} y and obtain

    ∂J(B)/∂B = E{g(y) y^T} (B^T)^{-1} .

Provided that the nonlinear function G(y) is suitably chosen, the algorithm converges to a stationary point where the components of y are mutually independent [Amari et al., 1996]. A Matlab™ package that implements algorithms of this type (e.g. INFOMAX [Bell & Sejnowski, 1995]) can be found at: http://www.sccn.ucsd.edu/~scott/ica-downloadform.html
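A minimal one-unit sketch of the fixed-point iteration (4) on whitened data might look as follows. This is an illustrative reimplementation, not the official FastICA package linked above, and the toy sources and mixing matrix are invented:

```python
import numpy as np

def fastica_one_unit(v, n_iter=200, seed=0):
    """Extract one column w of W via the kurtosis fixed point of Eq. (4)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(v.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        w_star = (v * (w @ v) ** 3).mean(axis=1) - 3 * w  # E{v (w^T v)^3} - 3 w
        w = w_star / np.linalg.norm(w_star)               # renormalize
    return w

# Toy data: a sub-Gaussian and a super-Gaussian source, mixed and whitened
rng = np.random.default_rng(2)
s = np.vstack([rng.uniform(-1, 1, 50_000), rng.laplace(size=50_000)])
x = np.array([[1.0, 0.5], [0.3, 1.0]]) @ s
x -= x.mean(axis=1, keepdims=True)
lam, U = np.linalg.eigh(np.cov(x))
v = np.diag(lam ** -0.5) @ U.T @ x   # whitening as in Sec. 3

w = fastica_one_unit(v)
y = w @ v   # matches one source up to sign and scale
```

Running the iteration again on the orthogonal complement of w yields the next component; that repetition is the deflation scheme mentioned above.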
3.1.2. JADE

In contrast to these gradient-based techniques, there are also algebraic methods for BSS. For example, Cardoso et al. developed the JADE algorithm [Cardoso & Souloumiac, 1993], based on the joint diagonalization of matrices. After whitening and a possible dimension reduction, a set of matrices obtained from "parallel slices" (eigenmatrices) of the fourth-order cumulant tensor is approximately diagonalized with a single orthogonal transformation [Cardoso & Souloumiac, 1993; Comon, 1994]. Usually JADE provides excellent results on low-dimensional data if sufficiently many sample points are available. However, for high-dimensional problems like MEG, the effort for storing and processing the fourth-order cumulants is O(m^4), which may be prohibitively heavy. For such applications JADE therefore needs to project the data onto a lower-dimensional subspace first. A Matlab™ package that performs JADE can be found at: ftp://sig.enst.fr/pub/jfc/Algo/Jade
3.2. Exploiting time structure

In the above ICA approaches to BSS, the signals were assumed to be independently and identically distributed (i.i.d.). However, many "natural" signals, like speech and, in particular, EEG and MEG signals, are characterized by a rich dynamical time structure. Therefore it seems natural to directly exploit temporal (resp. spectral) information for blind source separation. An attractive algorithmic framework for this problem is, similar to the JADE case above, the simultaneous diagonalization of appropriately defined matrices by algebraic methods (cf. [Tong et al., 1991; Molgedey & Schuster, 1994; Belouchrani et al., 1997; Pham & Garat, 1997; Ziehe & Müller, 1998; Wu & Principe, 1999; Ziehe et al., 2000b]). In particular, the TDSEP algorithm [Ziehe & Müller, 1998] finds an estimate of the mixing matrix A by simultaneously diagonalizing a set of symmetrized time-lagged correlation matrices defined as

    C(x, τ) = (1/2T) Σ_t [ x(t) x^T(t − τ) + x(t − τ) x^T(t) ] .

Under the assumption that the source signals s(t) are mutually uncorrelated (over time) and possess pronounced auto-correlations [Ziehe & Müller, 1998], this solves the BSS problem, since the covariance matrices of the source signals transform according to C(x, τ) = A C(s, τ) A^T. The algorithm can also be seen as an efficient way to minimize the cost function

    J(B) = Σ_τ Σ_{i≠j} ( C_ij(Bx, τ) )^2 ,    (5)
that measures the cross-correlation of the outputs y = Bx for several time-lags τ . Note that though in theory the quality of source separation depends on
³ The corresponding independent source signal can then be found using Eq. (3).
the selection of the lag parameters τ, which should ideally be adapted to the auto-correlation structure of the signals, in practice TDSEP turns out to be quite robust with respect to the choice of these parameters, as long as sufficiently many matrices are used for the simultaneous diagonalization (for details see [Müller et al., 1999]). As with the ICA methods presented in Sec. 3.1.1, the simultaneous diagonalization procedure employed here first performs a sphering or whitening step, followed by a rotation. The rotation matrix W is determined by the extended Jacobi-type method of Cardoso and Comon [1996], applied to the set of time-lagged correlation matrices. This is in analogy to the simultaneous diagonalization of the eigenmatrices of the cumulant tensor in JADE, and in fact both algorithms can be combined in a straightforward manner by enlarging the set of matrices to be diagonalized (cf. [Müller et al., 1999]). From a computational point of view, TDSEP is highly efficient and robust, because the temporal structure is used and thus higher-order moments do not need to be estimated or stored. Therefore TDSEP is particularly well suited for processing the high-dimensional multichannel data typically encountered in physiological recordings. A Matlab™ implementation of this algorithm is available (see http://www.first.fhg.de/~ziehe/research.html).
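The temporal-decorrelation idea can be sketched in a strongly simplified two-matrix variant (whiten with C(x, 0), then diagonalize a single lagged matrix, in the spirit of AMUSE). TDSEP proper jointly diagonalizes a whole set of lags with Jacobi rotations; the AR(1) toy sources below are invented for illustration:

```python
import numpy as np

def lagged_cov(x, tau):
    """Symmetrized time-lagged correlation matrix C(x, tau) defined above."""
    T = x.shape[1]
    x0, xt = x[:, tau:], x[:, : T - tau]
    C = x0 @ xt.T / (T - tau)
    return 0.5 * (C + C.T)

def tdsep_like(x, tau=1):
    """Two-matrix sketch: whiten with C(x, 0), rotate with eigenvectors
    of one lagged matrix (a simplification of TDSEP's joint diagonalization)."""
    x = x - x.mean(axis=1, keepdims=True)
    lam, U = np.linalg.eigh(lagged_cov(x, 0))
    V = np.diag(lam ** -0.5) @ U.T          # whitening
    v = V @ x
    _, W = np.linalg.eigh(lagged_cov(v, tau))
    return W.T @ v, W.T @ V                 # estimated sources, separating matrix B

# Toy sources with distinct autocorrelation: two AR(1) processes
rng = np.random.default_rng(3)
def ar1(phi, T):
    s = np.zeros(T)
    for k in range(1, T):
        s[k] = phi * s[k - 1] + rng.standard_normal()
    return s

s = np.vstack([ar1(0.9, 20_000), ar1(-0.5, 20_000)])
x = np.array([[1.0, 0.7], [0.4, 1.0]]) @ s
y, B = tdsep_like(x, tau=1)   # y recovers s up to permutation, sign and scale
```

The separation works here because the two sources have clearly different lag-1 autocorrelations (0.9 vs. −0.5), which is exactly the "pronounced auto-correlation" assumption of the text.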
4. Limits and Problems of Source Separation Algorithms

4.1. How good is the underlying model assumption?

Although BSS uses a "weak" model, it must be emphasized that it can only yield results within its model class. This holds for all unsupervised learning algorithms: if we run PCA (see e.g. [Duda et al., 2001]), then all we can expect is an orthogonal decomposition that describes the data optimally in the least-squares sense (i.e. orthogonal directions of the data cloud with maximal variance or energy are found). This does not mean, however, that PCA provides a good basis for understanding the data if, say, a nonorthogonal transformation into an ICA basis would have been more appropriate to describe the intrinsic nature of the data.⁴
It is instructive to explain the difference between PCA and ICA with Fig. 2, which shows toy data constructed to make this difference clear. In Fig. 2(a) we see a scatterplot of the data points given by two mixed super-Gaussian sources. The projection of the data points onto the coordinate axes yields the observed signals. It can easily be seen that the two signals are not independent. Figure 2(b) shows the same data set in the PCA basis (i.e. the eigenvector basis of the covariance matrix C(x)). Note that PCA results in an orthogonal transformation; therefore Fig. 2(b) is a rotated version of Fig. 2(a). Clearly PCA does not provide a good basis for describing the data, since the signals are still mixed. Figures 2(c) and 2(d) show the two steps of source separation by JADE: first a whitening step rescales the data set such that the covariance matrix is the identity. The signals are now uncorrelated but not independent. A rotation then transforms the data set into the ICA basis. It is easy to see that this basis is an excellent one for further understanding the given data. The efficiency of higher-order statistics in producing good ICA algorithms (the kurtosis-based FastICA, JADE, . . .) comes at a price: a sensitivity to outliers.⁵ In fact, outliers often turn out to be the leading factors in the search for the underlying sources. Another limitation of the presented BSS approaches is that one can extract at most as many underlying sources as there are sensors. In the brain, however, the multitude of microscopic sources by far outnumbers the sensors. Nevertheless, it can be argued that the total number of macroscopically observable sources active at a given time is much smaller.
4.2. The necessary amount of data: Overfitting

When the number of samples of the observed data is insufficient to explain the high-dimensional independent source signals, we may encounter overlearning or overfitting effects in BSS algorithms. This is true for practically any type of contrast-based linear ICA implementation [Särelä & Vigário, 2001]. In the extreme case of an equal
⁴ The same argument holds, of course, for nonlinear decompositions like kernel PCA [Schölkopf et al., 1998].
⁵ PCA is also nonrobust to outliers, but the use of robust methods allows one to obtain less sensitive PCA algorithms, e.g. [Huber, 1981].
Fig. 2. Schematic comparison between PCA and ICA. (a) shows the observed (mixed) signals; (b) PCA does not provide a good basis for understanding the data; (c) whitening yields uncorrelated, but not independent, signals; (d) the ICA basis describes the intrinsic nature of the data.
number of independent source signals, mixtures and samples, it is shown that the optimal solution is one that produces source signals that are zero almost everywhere except for a single spike or bump. In the cited work, experimental evidence of this overfitting effect is given using both simulated and real MEG data. Temporal decorrelation algorithms such as TDSEP may suffer from a different type of overlearning, yielding a series of sinusoidal components of various frequency contents. Furthermore, channel noise, explicitly ignored in the model of Eq. (1), effectively doubles the number of independent sources, rendering the overlearning problem even more acute. In some applications, we may construct an approximate noise model. Projections onto signal spaces orthogonal to the noise space can then be performed [Hyvärinen, 1998; Müller et al., 1999; Hyvärinen et al., 2001; Särelä et al., 2001]. When using algorithms that require prewhitening, this preparatory stage can be used to reduce the dimensionality of the data. The compression reduces the overlearning effect, especially if the rejected components contain mainly noise (i.e. unnecessary information). Ultimately, a good reliability measure for the components found may help to determine whether we are in the presence of clear overlearning, or of an overlearning-like "true" source. Furthermore, such measures could help determine whether the model is appropriate at all, or which model, i.e. which BSS algorithm, should be chosen. The first such measure, which computes the stability under bootstrap resampling,
was recently proposed by Meinecke et al. [2002] and will be discussed in more detail in the next section.
5. Reliability of ICA Projections

5.1. Bootstrap resampling

The reliability of ICA projections can be estimated by resampling techniques. Resampling is a statistical method which, by virtue of modern computer power, yields e.g. the variance of an estimator from only the single data set x at hand. Among such procedures, the jackknife and the bootstrap are the best known (see e.g. [Efron & Tibshirani, 1993; Shao & Tu, 1995]). The jackknife produces surrogate data sets by deleting one datum at a time from the original data set; generalizations such as the delete-k jackknife delete more than one datum at a time. The bootstrap is a more general approach and has recently been widely used in data analysis. Let us consider the standard case where a sample x_1, ..., x_T of i.i.d. observations from a distribution F is given. We write the data as the vector x = (x_1, ..., x_T). A scalar parameter θ is estimated with an estimator θ̂(x). To get some information about the accuracy of this estimate, we build B bootstrap samples x^{*b} = (x_1^{*b}, ..., x_T^{*b}), b = 1, ..., B, by drawing T data points with replacement from the given sample. Note that some data points may occur several times, while others may not occur at all in a particular bootstrap sample. On each surrogate x^{*b}, the estimator θ̂^{*b} = θ̂(x^{*b}) is calculated,
so we have B estimators⁶ θ̂^{*1}, ..., θ̂^{*B}. We can now, for example, estimate the variance of θ̂ by calculating the variance of the bootstrap estimators θ̂^{*b}. There is a wide literature on the statistical properties of the bootstrap and its extensions, which supports the use of resampling procedures. For example, it can be shown that for i.i.d. data the bootstrap estimators of the distributions of many commonly used statistics are consistent [Shao & Tu, 1995].
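The basic bootstrap variance estimate described above can be sketched in a few lines. The sample, the estimator θ̂ (here the median) and all sizes below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_normal(500) * 2.0 + 1.0     # one observed i.i.d. sample, T = 500

def theta_hat(sample):
    return np.median(sample)                 # the estimator under study

B = 1000
theta_star = np.array([
    theta_hat(rng.choice(x, size=x.size, replace=True))  # one bootstrap sample x^{*b}
    for _ in range(B)
])

var_hat = theta_star.var()   # bootstrap estimate of Var(theta_hat)
```

Note that the variance estimate is obtained from the single observed sample alone, exactly as the text describes.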
5.2. Resampling for BSS/ICA

In the ICA case the parameters to be estimated are the one-dimensional source signal subspaces (since the mixing matrix is defined only up to scaling and permutation), and a natural distance measure between two such subspaces is the angle between them. It is straightforward to apply bootstrap resampling to i.i.d. data and to algorithms that do not use temporal structure. Less obvious is how to construct a resampling that preserves time structure. A simple bootstrap approach would clearly destroy temporal structure, but it can be generalized such that methods like TDSEP that use temporal correlations remain applicable. Consider a time series of length T. The bootstrap resampling defines a series {a_t} with Σ_t a_t = T, where each a_{t'} tells how often the data point x(t') has been drawn. Using this, we can calculate the resampled time-lagged correlation matrices as

    Ĉ*_ij(τ) = (1/2T) Σ_t a_t [ x_i(t) x_j(t − τ) + x_i(t − τ) x_j(t) ] .    (6)
In the general, non-i.i.d. case, this resampling procedure cannot be proven to provide consistent estimators like the simple i.i.d. bootstrap; however, the deviation from the optimal estimator can be bounded [Meinecke et al., 2002].⁷ In this paper, we use a resampling procedure that can even detect multidimensional independent components (or clusters of signals that cannot reliably be separated from one another); see [Meinecke et al., 2002] for details. In our experiments we use the joint-diagonalization algorithms JADE and TDSEP and produce our surrogate data by bootstrap resampling or by Eq. (6), respectively. An important and
useful feature is that this resampling is done after a prior source separation, so the bootstrap estimates of the mixing matrix on this (already separated) data set will be small rotations R̂^{*b}, i.e. orthogonal matrices in the vicinity of the identity matrix. This prior separation of the sources corresponds to a change of coordinate system and thus does not affect the joint approximate diagonalization carried out later on the bootstrap replicates. The crucial idea for finding stable higher-dimensional source signal subspaces is to calculate not the overall rotation for each projection direction, but to decompose each rotation into N(N−1)/2 elementary rotations within all two-dimensional planes spanned by the coordinate axes. This can be carried out by taking the matrix logarithm

    α̂^{*b} = ln(R̂^{*b}) .    (7)

Here, each component α_ij of α is the angle of a rotation in the i–j plane (see Appendix). If we now calculate the variance of the α-matrix component-wise, we obtain a separability matrix

    Ŝ_ij = sqrt( (1/B) Σ_{b=1}^{B} (α̂_ij^{*b})^2 ) .    (8)
The component Ŝ_ij of this matrix measures how unstable the BSS solution is with respect to a rotation in the i–j plane, i.e. how reliably the respective one-dimensional subspaces i and j can be separated from each other. Note that a low Ŝ_ij corresponds to good separability. If the BSS algorithm used was successful in separating the independent subspaces, the separability matrix should have a block structure that groups together one-dimensional ICA projections which cannot be reliably separated from each other but which form a multidimensional stable subspace. This method therefore reveals structural dependencies, which provides highly important and relevant information for the subsequent biomedical interpretation.
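Given the bootstrap rotations R̂^{*b}, Eqs. (7) and (8) translate almost directly into code. The simulated rotations below merely stand in for real bootstrap output, and SciPy's matrix logarithm is used for ln(·):

```python
import numpy as np
from scipy.linalg import logm

def separability(rotations):
    """Separability matrix S of Eq. (8) from bootstrap rotations R^{*b}."""
    alphas = np.array([np.real(logm(R)) for R in rotations])  # Eq. (7)
    return np.sqrt((alphas ** 2).mean(axis=0))                # RMS per plane

def plane_rotation(n, i, j, angle):
    """Elementary rotation by `angle` in the i-j plane of R^n."""
    R = np.eye(n)
    c, s = np.cos(angle), np.sin(angle)
    R[i, i] = R[j, j] = c
    R[i, j], R[j, i] = -s, s
    return R

# Simulated bootstrap output: components 1 and 2 rotate into each other
# (an unseparable pair); all other directions are stable.
rng = np.random.default_rng(5)
rots = [plane_rotation(4, 1, 2, rng.normal(scale=0.3)) for _ in range(200)]
S = separability(rots)
# S[1, 2] is large (unstable plane); all other off-diagonal entries are ~0,
# reproducing exactly the block structure discussed in the text.
```

With real data, `rots` would be the small rotations obtained by rerunning the joint diagonalization on each bootstrap replicate after the prior separation.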
6. Artifact-Contaminated MEG Data

6.1. Description of the data

As a first example, we illustrate the efficient analysis of artifact-contaminated MEG data with BSS
⁶ These estimators from the bootstrap samples are called the bootstrap replications of θ̂.
⁷ Resampling Techniken für ICA und ihre Anwendungen in der biomedizinischen Datenanalyse by Frank Meinecke, Diploma Thesis, University of Potsdam, May 2003 (in German).
[Figure 3: raw MEG traces at the indicated sensor positions; marked segments show saccades, blinking and biting artifacts; scale bar 1000 fT/cm, 10 s.]

Fig. 3. A sample of the 122-channel MEG recordings. The time courses of the magnetic fields measured by sensors at the indicated positions in two orthogonal directions are plotted (from [Vigário et al., 2000]).
algorithms. The data under study were recorded at the Brain Research Unit of the Low Temperature Laboratory of the Helsinki University of Technology, in a magnetically shielded room, using a 122-channel whole-scalp Neuromag-122 SQUID magnetometer [Hämäläinen et al., 1993]. The subject was asked to blink and make horizontal saccades deliberately, in order to produce typical ocular artifacts. Moreover, to produce myographic artifacts, the subject was asked to bite his teeth for as long as 20 seconds. Yet another artifact was created by placing a digital watch one meter away from the helmet inside the shielded room. Further information on the data can be found in [Vigário et al., 1998]. Figure 3 presents a sample of the measured signals, with a duration of approximately one minute
(i.e. 8000 samples), showing brain activity blurred by the aforementioned artifacts.
6.2. Results

In earlier studies, FastICA as well as topographic ICA have been applied to these data; results can be found in [Vigário et al., 1998; Hyvärinen et al., 2001]. In the experiments reported here, we used the source separation algorithms JADE and TDSEP (with τ = {0, 1, ..., 20}). In order to mitigate the overlearning effect, we applied BSS not to all 122 signals but to the first 20 principal components. Furthermore, following the recommendation in [Särelä & Vigário, 2001], the data was high-pass filtered with a cut-off frequency of 0.5 Hz (using the same setting as in [Hyvärinen et al., 2001]).
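This preprocessing (0.5 Hz high-pass filtering followed by reduction to the 20 strongest principal components) can be sketched as follows. The sampling rate and filter order below are placeholder assumptions for illustration, not values taken from the recording protocol:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def preprocess(x, fs, fc=0.5, n_keep=20):
    """High-pass filter each channel at fc Hz, then keep the n_keep
    strongest principal components."""
    b, a = butter(4, fc / (fs / 2), btype="highpass")  # 4th order: an assumption
    x = filtfilt(b, a, x, axis=1)                      # zero-phase filtering
    x = x - x.mean(axis=1, keepdims=True)
    lam, U = np.linalg.eigh(np.cov(x))
    idx = np.argsort(lam)[::-1][:n_keep]               # largest-variance directions
    return U[:, idx].T @ x                             # n_keep x T reduced data

# e.g. 122 channels, ~8000 samples; the sampling rate is a placeholder value
x = np.random.default_rng(6).standard_normal((122, 8000))
x20 = preprocess(x, fs=133.0)
print(x20.shape)  # (20, 8000)
```

The reduced 20-channel data would then be fed to JADE or TDSEP as described in the text.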
[Figure 4: left panel, estimated source signals of the 20 extracted components over 0–50 s; right panel, separability matrix (values from 0 to about 0.4).]
Fig. 4. JADE decomposition of the artifact time series. 11: horizontal eye movements, 12: vertical eye movements (blink), 13: cardiac cycle, 14–20: muscle activity (biting).
Fig. 5. Field patterns found for a set of components identified by JADE on artifacts.
Figure 4 shows the JADE decomposition. The separability matrix, on the right, highlights the existence of three reliable one-dimensional independent components. The temporal structure of these components allows us to give them clear physical interpretations: while signals 11 and 12 capture the horizontal and vertical eye movements (or blinks),
component 13 reflects mainly the cardiac contamination present in the MEG recordings. The block of independent components 14–20 corresponds to muscle activity. Hence, it can be said that the muscle artifacts need at least a six-dimensional space to fully explain their contribution to this data set. These results agree with and corroborate
the findings of topographic ICA [Hyv¨arinen et al., 2001]. The signal of the digital watch, completely invisible in the raw data, now appears concentrated in component 5. Figure 5 shows the front, left, top and right views of the three one-dimensional artifact components found by the JADE algorithm (cf. Fig. 4). The
top set corresponds to the horizontal eye activity. The middle one is characteristic for the vertical and blink eye artifacts, whereas the last row agrees with the expected pattern of the cardiac signal [Jousm¨aki & Hari, 1996; Vig´ario et al., 1998]. In Fig. 6 we see the TDSEP decomposition of the same data. Here the separability matrix reveals that TDSEP is able to find more independent
[Figure 6: left panel, estimated source signals of the 20 extracted components over 0–50 s; right panel, separability matrix (values from 0 to about 0.4).]
Fig. 6. TDSEP decomposition of the artifact time series. 1: horizontal eye movements, 2: vertical eye movements (blink), 3: watch, 12: cardiac cycle, 15–20: muscle activity.
[Figure 7: (a) time courses of PC 1 and PC 2 over 0–50 s; (b) scatterplot of PC 1 vs. PC 2; (c) time courses of IC 11 and IC 12; (d) scatterplot of IC 11 vs. IC 12.]

Fig. 7. (a), (b) The first two principal components display the eye movement artifacts. (c), (d) A better representation of these artifacts is found in an ICA basis (JADE).
components with one-dimensional reliability than JADE. Among the estimated source signals with clear interpretations are both eye artifacts (Nos. 1 and 2), the digital watch (No. 3) and the cardiac cycle (No. 12). Note that, once again, the muscle activity is grouped as a six-dimensional block of components 14–20. Because of the strong historical and methodological relations between PCA and ICA, an exemplary comparison of the separations achieved by the two techniques is shown in Fig. 7. The upper panels ((a) and (b)) show the time courses and a scatterplot of the two strongest principal components. Coincidentally, they capture mainly the eye movement artifacts. For comparison, the lower panels ((c) and (d)) display the time courses and scatterplot of the corresponding independent components found by JADE. A very significant improvement of the signal-to-noise ratio is observed in the ICA decomposition. Several blinks that were buried under muscle activity and other sources of noise now become visible. Although not shown here, these denoising/separation differences are even more prominent for the weaker signals of the mixture (e.g. the watch): here PCA fails completely to find them, since it looks only at the power of the signals, while ICA is able to find reasonably good source reconstructions.
7. Analysis of Multimodal Evoked Fields

In a second example, we analyze event-related MEG recordings, averaged over many trials. While averaging is needed to improve the signal-to-noise ratio, it considerably decreases the effective number of data points available for BSS. This dataset is therefore well suited to study the overlearning behavior that can occur when the BSS methods presented in this paper are applied imprudently. Furthermore, we will show how the unreliability of such overlearned solutions can be turned to our advantage. The ultimate goal is again to derive good strategies for identifying interpretable components of brain responses.
7.1. Description of the data The data consists of averaged responses of a single subject to vibrotactile stimulation, generated with a bass-reflex loudspeaker and a tube delivering the tone into the magnetically shielded room. The tube
was coupled to a balloon which was held by the subject with both hands. Both tactile and the concomitant auditory stimuli were thus present (further details on the stimuli and the setup can be found in [Jousmäki & Hari, 1999]; an application of ICA to these data can be found in [Vigário et al., 2000], and references therein). Figure 8 shows the complete set of 122 averaged evoked responses. The two inserts in the figure indicate roughly the shape of the signals of interest present in the measurements. The left one shows a sharp and early response, originating in the primary somatosensory cortex, while the right insert shows a broader signal with longer latency, which is associated with the primary auditory responses.
7.2. Results

The first results we show are left here as a "word of warning" against overlearning. In fact, we used the same strategy for these data as in the previous section, reducing the dimension from 122 to 20. Yet, the number of available samples is now much smaller. Therefore, strong overlearning is produced by both JADE and TDSEP (see Figs. 9 and 10). Higher-order statistics methods, such as JADE, typically produce bumpy components, whereas the overlearning regime of TDSEP is characterized by sinusoidal source estimates [Särelä & Vigário, 2001; Ziehe & Müller, 1998]. Note that nearly no overlearned component shows a good degree of reliability. The exceptions are the 13th JADE component, as well as the last three TDSEP components. The latter, with a frequency content of 100 Hz and 50 Hz, are in fact reasonable components for this study, as they account for the powerline artifacts present in the whole dataset. For the overlearned JADE component, it will become clear from the subsequent analysis that it is a simplified version of a reliable brain response to the vibrotactile stimulation. A more reasonable data compression is achieved when retaining only the strongest seven principal components. Figure 11 shows the result of applying JADE to such compressed data. Here we see that only the first three components are clearly reliable. The first two, with evident blocking, peak around 100 ms after the onset of the vibrotactile stimulation, and are therefore expected to be mainly the response of the brain to the auditory stimulus. The third has a much shorter duration
Fig. 8. Averaged responses to vibrotactile stimuli measured with a 122-channel MEG system (from [Vigário et al., 2000]).

Fig. 9. TDSEP decomposition of the vibrotactile data in an overlearning regime (estimated source signals and separability matrix).
Fig. 10. JADE decomposition of the vibrotactile data in an overlearning regime (estimated source signals and separability matrix).

Fig. 11. JADE decomposition of the vibrotactile data, after compression to the seven strongest principal components (estimated source signals and separability matrix).
and latency, very typical for somatosensory evoked responses (for further considerations on these matters, please refer to [Jousmäki & Hari, 1999; Vigário & Oja, 2000]).
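The compression step used above, retaining only the strongest principal components before running BSS, can be sketched as follows. The channel and sample counts are merely illustrative of the averaged setting (many sensors, few samples); the latent time courses and sensor projections are synthetic.

```python
import numpy as np

rng = np.random.default_rng(3)
channels, samples, k = 122, 300, 7      # illustrative: few samples after averaging
# hypothetical data: a few latent time courses projected onto many sensors
latent = rng.standard_normal((3, samples))
mixing = rng.standard_normal((channels, 3))
X = mixing @ latent + 0.1 * rng.standard_normal((channels, samples))

# retain only the k strongest principal components before running BSS
X0 = X - X.mean(axis=1, keepdims=True)
evals, E = np.linalg.eigh(np.cov(X0))
order = np.argsort(evals)[::-1][:k]
X_red = E[:, order].T @ X0              # k x samples input for JADE/TDSEP

retained = evals[order].sum() / evals.sum()
print(X_red.shape, f"variance retained: {retained:.3f}")
```

When the signal of interest lives in a low-dimensional subspace, such a projection discards mostly noise dimensions and reduces the number of parameters the BSS algorithm has to estimate from the few available samples.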
Figures 12 and 13 show the mixing vectors corresponding to the first two JADE ICs, displayed as three-dimensional field patterns. Several views of these patterns are supplied to simplify their
Fig. 12. Field patterns of the first JADE component. The auditory response is clear on both hemispheres, although showing some predominance to the left one.
inspection. Furthermore, the localization of the equivalent current dipoles (ECDs) that best explain the aforementioned patterns is marked on MRI scans of the measured subject. It is clearly visible that those dipoles lie within the Sylvian fissure, as should be expected for such early auditory brain responses. Figure 14 presents plots similar to the previous ones, now for the third component. Note that the field patterns have a very clear dipolar structure, and that the ECDs that best fit these patterns fall on the expected primary somatosensory cortex area. Once again, these results agree with the ones observed in [Jousmäki & Hari, 1999; Vigário & Oja, 2000]. According to our prior knowledge, we expect that the components of interest in this vibrotactile evoked response study are transient and very focal, which results in a high sparsity. Furthermore,
due to the averaging, the number of data points becomes very small and almost all periodicity of the data is lost. In such cases we anticipate that HOS-based methods like JADE are much better qualified to separate the signals than temporal decorrelation methods like TDSEP. In fact, Fig. 15 shows that the only somewhat reliable components are the 6th and the 7th, which could be seen as depicting trends. The signals of interest, coincidentally still the first three, are not clearly separated from each other, as can be seen from the blocking effect in the separability matrix. Furthermore, it seems evident that the first TDSEP component, in an attempt to produce its preferred periodic solutions, accounts for most of the information contained in the first and third JADE components, i.e. the somatosensory and some of the auditory responses. The field patterns of Fig. 16 clearly show contributions from both signals: the left and top
Fig. 13. Field patterns of the second JADE component. Mainly the right-hand-side component of the auditory brain response.
Fig. 14. Field patterns of the third JADE component, corresponding to the somatosensory response.
Fig. 15. TDSEP decomposition of the vibrotactile data, after compression to the seven strongest principal components (estimated source signals and separability matrix).
Fig. 16. Field pattern of the first TDSEP component, accounting for most of the somatosensory as well as some of the auditory brain responses; i.e. the auditory and somatosensory signals could not be separated.
views show a pattern that is clearly related to the somatosensory response of the left hemisphere. Yet, the very strong auditory response, illustrated in the right view of the field pattern, overshadows the right somatosensory response. Clearly, the reliability analysis suggests using JADE rather than TDSEP for this particular application.
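For readers unfamiliar with the mechanics of temporal decorrelation, a minimal sketch in its spirit follows. This is a single-lag variant (essentially the AMUSE algorithm; TDSEP jointly diagonalizes covariances at several lags), and the sources, mixing matrix and lag are invented for illustration. It succeeds precisely because these sources have distinct autocorrelations, the structure that is largely destroyed by averaging in the evoked data above.

```python
import numpy as np

rng = np.random.default_rng(1)
n, tau = 5000, 1
t = np.arange(n)
# two sources with clearly different autocorrelation structure
s1 = np.sin(2 * np.pi * 0.013 * t)                 # slow sinusoid
s2 = np.sign(np.sin(2 * np.pi * 0.057 * t))        # faster square wave
A = np.array([[1.0, 0.7], [0.4, 1.0]])             # hypothetical mixing
X = A @ np.vstack([s1, s2]) + 0.01 * rng.standard_normal((2, n))

# step 1: whiten the mixtures
X0 = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(X0))
Z = (E @ np.diag(d ** -0.5) @ E.T) @ X0

# step 2: diagonalize one symmetrized time-lagged covariance
C_tau = Z[:, :-tau] @ Z[:, tau:].T / (n - tau)
C_tau = 0.5 * (C_tau + C_tau.T)
_, V = np.linalg.eigh(C_tau)
Y = V.T @ Z        # estimated sources, up to order and sign

corr = lambda a, b: abs(np.corrcoef(a, b)[0, 1])
print("recovery of s1:", max(corr(Y[0], s1), corr(Y[1], s1)))
print("recovery of s2:", max(corr(Y[0], s2), corr(Y[1], s2)))
```

The separation works because the two sources have different lag-tau autocorrelations, so the whitened lagged covariance has distinct eigenvalues; with identical temporal structure (or none, as in short averaged epochs), the eigenvectors become arbitrary and the method fails.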
8. Concluding Remarks

What makes blind source separation an appealing method for the analysis of neurobiological data is that it uses a "weak" model, i.e. statistical independence (e.g. JADE) or independence over time (e.g. TDSEP). Although no strong model (such as the ones based on physiological
models) is imposed on the data, note that ICA algorithms still extract components with very good neurophysiological plausibility. So far, we could count on a few measures to assess the quality of our BSS decomposition of neurobiological recordings, such as the inspection of the time/frequency courses of the components and of their respective field patterns. Now, from a signal processing viewpoint, we introduce an additional unsupervised evaluation method for the goodness of the analyses: the resampling-based reliability measure. The basis for this assessment is the concept of stability: small distortions due to resampling should not change the projections found. If they do change, this is a good indicator that the underlying assumption of the BSS algorithm has failed.
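The resampling idea can be sketched in a few lines. This is a toy two-channel version in which a simple kurtosis-contrast separator stands in for JADE/TDSEP, and all signals and the mixing matrix are synthetic: the unmixing direction is re-estimated on bootstrap resamples and compared with the full-data estimate, and a stable direction indicates a reliable component.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4000
s1 = np.zeros(n)
s1[rng.choice(n, 200, replace=False)] = rng.choice([-4.0, 4.0], 200)
s2 = rng.standard_normal(n)
X = np.array([[1.0, 0.7], [0.3, 1.0]]) @ np.vstack([s1, s2])

def unmix_vector(X):
    """Whiten, rotate to maximize |excess kurtosis|, and return the
    normalized sensor-space unmixing vector of the super-Gaussian component."""
    X0 = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(X0))
    W = E @ np.diag(d ** -0.5) @ E.T
    Z = W @ X0
    grid = np.linspace(0, np.pi, 720, endpoint=False)
    k = [abs(((np.cos(t) * Z[0] + np.sin(t) * Z[1]) ** 4).mean() - 3)
         for t in grid]
    t = grid[int(np.argmax(k))]
    w = np.array([np.cos(t), np.sin(t)]) @ W
    return w / np.linalg.norm(w)

w_full = unmix_vector(X)
# stability under resampling: re-estimate the direction on bootstrap samples
sims = [abs(unmix_vector(X[:, rng.integers(0, n, n)]) @ w_full)
        for _ in range(20)]
print("worst-case similarity across resamples:", min(sims))  # near 1 => stable
```

In this well-separable example the direction barely moves under resampling; for an overlearned or non-identifiable component the bootstrap estimates would scatter, flagging the projection as unreliable.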
In the reported experiments, we have used raw and averaged biomagnetic brain signals and demonstrated the merits and pitfalls of different BSS approaches. Special emphasis was given to the validation of the ICA model. Interestingly, we find that the reliability method can detect solutions that are due to overfitting, and thus allows algorithm and even hyperparameter selection in the unsupervised learning scenario of BSS. ICA was able to isolate artifacts in MEG. In the event-related study, ICA differentiated efficiently between somatosensory and auditory brain responses in the case of combined auditory and vibrotactile stimulation. In addition, the independent components, found with no other modeling assumption than their statistical independence (or independence over time), exhibit field patterns in good agreement with the conventional current dipole models. The equivalent current dipoles corresponding to the independent components are located in brain regions expected to be activated by the respective stimuli. Future research will be devoted to incorporating prior knowledge into BSS models. In some preliminary studies, we have observed that, as we depart from purely statistically based assumptions, we might get closer to physiologically more plausible decompositions of electromagnetic brain signals [Vigário, 2000; Vigário et al., 2000; Särelä et al., 2001]. On the other hand, fitting neural sources in a classical framework may be hard if some temporal overlap is present in their activations. A well-balanced use of both the model and appropriate priors is therefore needed in order to ultimately obtain a good exploratory decomposition technique for the application at hand.
Acknowledgments

Valuable discussions with M. Kawanabe, C. Schäfer, J. Kohlmorgen, G. Nolte, B. Blankertz, G. Curio, J. Särelä, Harri Valpola and participants of the Potsdam workshop on ERPs 2001 are gratefully acknowledged. The authors thank the Low Temperature Laboratory at HUT for the permission to use their data in this work. R. Vigário was funded by the EU (Marie Curie fellowship HPMF-CT-2000-00813). K.-R. Müller and A. Ziehe were partly funded by the EU (IST-1999-14190 – BLISS) and by the BMBF in the BCI project.
References

Amari, S., Cichocki, A. & Yang, H. [1996] "A new learning algorithm for blind source separation," in Advances in Neural Information Processing 8 (Proc. NIPS'95), eds. Touretzky, D. S., Mozer, M. C. & Hasselmo, M. E. (MIT Press, Cambridge, MA), pp. 757–763.
Attias, H. [1998] "Independent factor analysis," Neural Comput. 11, 803–851.
Barros, A. K., Vigário, R., Jousmäki, V. & Ohnishi, N. [2000] "Extraction of periodic signals from multichannel bioelectrical measurements," IEEE Trans. Biomed. Eng. 47, 583–588.
Bell, A. & Sejnowski, T. [1995] "An information-maximization approach to blind separation and blind deconvolution," Neural Comput. 7, 1129–1159.
Belouchrani, A., Abed Meraim, K., Cardoso, J.-F. & Moulines, E. [1997] "A blind source separation technique based on second order statistics," IEEE Trans. SP 45, 434–444.
Biswal, B. B. & Ulmer, J. L. [1999] "Blind source separation of multiple signal sources of fMRI data sets using independent component analysis," J. Comput. Assist. Tomogr. 23, 265–271.
Cardoso, J.-F. & Souloumiac, A. [1993] "Blind beamforming for non Gaussian signals," IEE Proc.-F 140, 362–370.
Cardoso, J.-F. & Comon, P. [1996] "Independent component analysis, a survey of some algebraic methods," Proc. ISCAS'96, pp. 93–96.
Cardoso, J.-F. [1998] "Blind signal separation: Statistical principles," Proc. IEEE 86, 2009–2025.
Cardoso, J.-F. [1999] "High-order contrasts for independent component analysis," Neural Comput. 11, 157–192.
Comon, P. [1994] "Independent component analysis — a new concept?" Sign. Process. 36, 287–314.
Duda, R., Hart, P. E. & Stork, D. G. [2001] Pattern Classification, 2nd edition (John Wiley).
Efron, B. & Tibshirani, R. J. [1993] An Introduction to the Bootstrap, 1st edition (Chapman and Hall).
Hämäläinen, M., Hari, R., Ilmoniemi, R., Knuutila, J. & Lounasmaa, O. [1993] "Magnetoencephalography — theory, instrumentation, and applications to noninvasive studies of the working human brain," Rev. Mod. Phys. 65, 413–497.
Huber, P. J. [1981] Robust Statistics (John Wiley, NY).
Hyvärinen, A. & Oja, E. [1997] "A fast fixed-point algorithm for independent component analysis," Neural Comput. 9, 1483–1492.
Hyvärinen, A. [1998] "Independent component analysis in the presence of Gaussian noise by maximizing joint likelihood," Neurocomputing 22, 49–67.
Hyvärinen, A. [1999] "Gaussian moments for noisy independent component analysis," IEEE Sign. Process. Lett. 6, 145–147.
Hyvärinen, A., Hoyer, P. O. & Inki, M. [2001] "Topographic independent component analysis," Neural Comput. 13, 1527–1558.
Hyvärinen, A., Karhunen, J. & Oja, E. [2001] Independent Component Analysis (Wiley-Interscience).
Jousmäki, V. & Hari, R. [1996] "Cardiac artifacts in magnetoencephalogram," J. Clin. Neurophysiol. 13, 172–176.
Jousmäki, V. & Hari, R. [1999] "Somatosensory evoked fields to large-area vibrotactile stimuli," Clin. Neurophysiol. 110, 905–909.
Jutten, C. & Herault, J. [1991] "Blind separation of sources, Part I: An adaptive algorithm based on neuromimetic architecture," Sign. Process. 24, 1–10.
Kawanabe, M. & Murata, N. [2000] "Independent component analysis in the presence of Gaussian noise," Technical Report METR 2000-02, Department of Mathematical Engineering and Information Physics, University of Tokyo.
Lee, T., Bell, A. J. & Lambert, R. H. [1997] "Blind separation of delayed and convolved sources," in Advances in Neural Information Processing Systems, Vol. 9, eds. Mozer, M. C., Jordan, M. I. & Petsche, T. (MIT Press), p. 758.
Lee, T.-W., Ziehe, A., Orglmeister, R. & Sejnowski, T. [1998] "Combining time-delayed decorrelation and ICA: Towards solving the cocktail party problem," Proc. ICASSP98, Vol. 2, Seattle, pp. 1249–1252.
Makeig, S., Jung, T.-P., Bell, A., Ghahremani, D. & Sejnowski, T. [1997] "Blind separation of auditory event-related brain responses into independent components," Proc. Natl. Acad. Sci. USA 94, 10979–10984.
McKeown, M., Makeig, S., Brown, S., Jung, T.-P., Kindermann, S., Bell, A., Iragui, V. & Sejnowski, T. [1998] "Blind separation of functional magnetic resonance imaging (fMRI) data," Human Brain Mapp. 6, 368–372.
Meinecke, F., Ziehe, A., Kawanabe, M. & Müller, K.-R. [2002] "A resampling approach to estimate the stability of one- or multidimensional independent components," IEEE Trans. Biomed. Eng., submitted.
Molgedey, L. & Schuster, H. [1994] "Separation of a mixture of independent signals using time delayed correlations," Phys. Rev. Lett. 72, 3634–3637.
Müller, K.-R., Philips, P. & Ziehe, A. [1999] "JADE_TD: Combining higher-order statistics and temporal information for blind source separation (with noise)," Proc. Int. Workshop on Independent Component Analysis and Blind Source Separation (ICA'99), Aussois, pp. 87–92.
Murata, N., Ikeda, S. & Ziehe, A. [2001] "An approach to blind source separation based on temporal structure of speech signals," Neurocomput. 41, 1–24.
Parra, L. & Spence, C. [2000] "Convolutive blind separation of non-stationary sources," IEEE Trans. Speech Audio Process. 8, 320–327.
Parra, L., Spence, C., Sajda, P., Ziehe, A. & Müller, K.-R. [2000] "Unmixing hyperspectral data," Advances in Neural Information Processing Systems, Vol. 12, eds. Solla, S. A., Leen, T. K. & Müller, K.-R. (MIT Press), pp. 942–948.
Petersen, K., Hansen, L., Kolenda, T., Rostrup, E. & Strother, S. [2000] "On the independent component of functional neuroimages," Proc. Int. Workshop on Independent Component Analysis and Blind Signal Separation (ICA2000), Helsinki, Finland, pp. 251–256.
Pham, D. & Garat, P. [1997] "Blind separation of mixture of independent sources through a quasimaximum likelihood approach," IEEE Trans. Sign. Process. 45, 1712–1725.
Roberts, S. & Everson, R. (eds.) [2001] Independent Component Analysis: Principles and Practice (Cambridge University Press).
Sander, T., Wübbeler, G., Lueschow, A., Curio, G. & Trahms, L. [2002] "Cardiac artifact subspace identification and elimination in cognitive MEG data using time-delayed decorrelation," IEEE Trans. Biomed. Eng. 49, 345–354.
Särelä, J., Valpola, H., Vigário, R. & Oja, E. [2001] "Dynamical factor analysis of rhythmic magnetoencephalographic activity," Proc. Third Int. Workshop on Independent Component Analysis and Blind Signal Separation (ICA'01), San Diego, CA, USA.
Särelä, J. & Vigário, R. [2001] "The problem of overlearning in high-order ICA approaches: Analysis and solutions," 6th Int. Work-Conf. Artificial Neural Networks (IWANN'2001), Granada, Spain.
Schölkopf, B., Smola, A. & Müller, K.-R. [1998] "Nonlinear component analysis as a kernel eigenvalue problem," Neural Comput. 10, 1299–1319.
Shao, J. & Tu, D. [1995] The Jackknife and Bootstrap (Springer, NY).
Stetter, M., Schießl, I., Otto, T., Sengpiel, F., Hübener, M., Bonhoeffer, T. & Obermayer, K. [2000] "Principal component analysis and blind separation of sources for optical imaging of intrinsic signals," Neuroimage 11, 482–490.
Tong, L., Liu, R., Soon, V. C. & Huang, Y. [1991] "Indeterminacy and identifiability of blind identification," IEEE Trans. Circuits Syst. 38, 499–509.
Veen, B. D. V. & Buckley, K. M. [1988] "Beamforming: A versatile approach to spatial filtering," IEEE ASSP Magazine.
Vigario, R. [1997] "Extraction of ocular artefacts from EEG using independent component analysis," Electroencephalogr. Clin. Neurophysiol. 103, 395–404.
Vigário, R., Jousmäki, V., Hämäläinen, M., Hari, R. & Oja, E. [1998] "Independent component analysis for identification of artifacts in magnetoencephalographic recordings," in Neural Information Processing Systems 10 (Proc. NIPS'97), eds. Jordan, M. I., Kearns, M. J. & Solla, S. A. (MIT Press, Cambridge, MA).
Vigário, R. [1999] Independent Component Approach to the Analysis of EEG and MEG Recordings, Ph.D. thesis, Helsinki University of Technology.
Vigário, R. [2000] "Dipole modeling in FastICA decomposition of evoked responses," Proc. 2nd Int. Workshop on Independent Component Analysis and Blind Separation of Signals (ICA'00), Helsinki, Finland.
Vigário, R. & Oja, E. [2000] "Independence: A new criterion for the analysis of the electromagnetic fields in the global brain?" Neural Networks 13, 891–907.
Vigário, R., Särelä, J., Jousmäki, V., Hämäläinen, M. & Oja, E. [2000] "Independent component approach to the analysis of EEG and MEG recordings," IEEE Trans. Biomed. Eng. 47, 589–593.
Vigário, R., Ziehe, A., Müller, K.-R., Särelä, J., Oja, E., Jousmäki, V., Wübbeler, G., Trahms, L., Mackert, B.-M. & Curio, G. [2002] "Blind decomposition of multimodal and dc evoked responses," in Advances in Exploratory Analysis and Data Modeling in Functional Neuroimaging, eds. Sommer, F. & Wichert, A. (MIT Press, Cambridge, MA).
Wu, H.-C. & Principe, J. [1999] "Simultaneous diagonalization in the frequency domain (SDIF) for source separation," Proc. Int. Workshop on Independent Component Analysis and Blind Source Separation (ICA'99), Aussois, France, pp. 245–250.
Wübbeler, G., Ziehe, A., Mackert, B.-M., Müller, K.-R., Trahms, L. & Curio, G. [2000] "Independent component analysis of noninvasively recorded cortical magnetic DC-fields in humans," IEEE Trans. Biomed. Eng. 47, 594–599.
Ziehe, A. & Müller, K.-R. [1998] "TDSEP — an effective algorithm for blind separation using time structure," Proc. Int. Conf. Artificial Neural Networks (ICANN'98), Skövde, Sweden, pp. 675–680.
Ziehe, A., Müller, K.-R., Nolte, G., Mackert, B.-M. & Curio, G. [2000a] "Artifact reduction in magnetoneurography based on time-delayed second order correlations," IEEE Trans. Biomed. Eng. 47, 75–87.
Ziehe, A., Nolte, G., Curio, G. & Müller, K.-R. [2000b] "OFI: Optimal filtering algorithms for source separation," Proc. ICA 2000, eds. Pajunen, P. & Karhunen, J. (Helsinki University Press), pp. 127–132.
Ziehe, A., Nolte, G., Sander, T., Müller, K.-R. & Curio, G. [2001] "A comparison of ICA-based artifact reduction methods for MEG," in Recent Advances in Biomagnetism, Proc. 12th Int. Conf. Biomagnetism, ed. Nenonen, J., Espoo, Finland (Helsinki University of Technology), pp. 895–898.
Appendix A. Rotation Angles and Rotation Matrices

The rotation matrices in N-dimensional space form a representation of the special orthogonal group SO(N). This means that for all R ∈ SO(N),

    R R^T = 1,    (A.1)
    det(R) = +1.    (A.2)

Any rotation matrix R can be written as the exponential of a single antisymmetric matrix α:

    R = e^α = \sum_{n=0}^{\infty} \frac{1}{n!} α^n.

To see this, just transpose R = e^α:

    R^T = (e^α)^T = e^{α^T} = e^{−α} = R^{−1}.

A basis {M_{ij}} of the space of all antisymmetric matrices can be defined by

    (M_{ij})_{ab} = δ_{ai} δ_{bj} − δ_{aj} δ_{bi},    (A.3)

where δ_{ai} denotes the Kronecker symbol: δ_{ai} = 1 if i = a and δ_{ai} = 0 otherwise. Note that each M_{ij} is an N × N matrix; a and b specify the matrix entries, i.e. a is the row index and b the column index. Using this, every rotation matrix can be written as

    R = \exp\left( \frac{1}{2} \sum_{i,j=1}^{N} α_{ij} M_{ij} \right).    (A.4)

Due to the antisymmetry of the M_{ij}, one can choose the α_{ij} to be antisymmetric, too, without loss of generality. If we do so, each α_{ij} corresponds to the angle of a basic rotation within the i–j plane. More precisely, exp(α_{ij} M_{ij}) is an orthogonal transformation that rotates the i-axis towards the j-axis by the angle α_{ij}. So, given an arbitrary rotation matrix R, it is always straightforward to decompose it into elementary rotations within different planes by taking the matrix logarithm:

    α = ln(R).    (A.5)
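This decomposition can be checked numerically. The following sketch (assuming SciPy's `expm`/`logm` routines are available; the three angle values are arbitrary) builds a 3D rotation from plane-rotation angles in the spirit of (A.4) and recovers them via the matrix logarithm of (A.5):

```python
import numpy as np
from scipy.linalg import expm, logm

def M(i, j, N):
    """Basis matrix of (A.3): (M_ij)_ab = delta_ai delta_bj - delta_aj delta_bi."""
    m = np.zeros((N, N))
    m[i, j], m[j, i] = 1.0, -1.0
    return m

N = 3
# antisymmetric generator from three plane-rotation angles (0-based indices);
# specifying each plane once absorbs the factor 1/2 of (A.4)
alpha = 0.3 * M(0, 1, N) - 0.7 * M(0, 2, N) + 1.1 * M(1, 2, N)
R = expm(alpha)

# R is a proper rotation: R R^T = 1 and det R = +1, cf. (A.1), (A.2)
print(np.allclose(R @ R.T, np.eye(N)), np.isclose(np.linalg.det(R), 1.0))

# the matrix logarithm recovers the plane angles, cf. (A.5)
# (valid here since the total rotation angle stays below pi)
alpha_rec = np.real(logm(R))
print(np.allclose(alpha_rec, alpha))
```

Note that the recovery via the principal matrix logarithm is only unique for rotation angles below π; beyond that, several antisymmetric generators map to the same rotation.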