On the use of sparse signal decomposition in the analysis of multi-channel surface electromyograms

Fabian J. Theis a,∗, Gonzalo A. García b

a Institute of Biophysics, University of Regensburg, 93040 Regensburg, Germany
b Department of Bioinformatic Engineering, Osaka University, Osaka, Japan

Abstract

The decomposition of surface electromyogram data sets (s-EMG) is studied using blind source separation techniques based on sparseness, namely independent component analysis, sparse nonnegative matrix factorization, and sparse component analysis. When applied to artificial signals, we find noticeable differences in algorithm performance depending on the source assumptions. In particular, sparse nonnegative matrix factorization outperforms the other methods with regard to increasing additive noise. In the case of real s-EMG signals, however, we show that despite the fundamental differences between the various models, the methods yield rather similar results and can successfully separate the source signals. This can be explained by the fact that the different sparseness assumptions (super-Gaussianity, positivity together with minimal 1-norm, and a fixed number of zeros, respectively) are all only approximately fulfilled, apparently forcing the algorithms towards similar results, but from different initial conditions.

Key words: surface EMG, blind source separation, sparse component analysis, independent component analysis, sparse nonnegative matrix factorization

1 Introduction

A basic question in data analysis, signal processing, data mining as well as neuroscience is how to represent a large data set X (observed as an (m × N)-matrix) in different ways. ∗

∗ Corresponding author. Email addresses: [email protected] (Fabian J. Theis), [email protected] (Gonzalo A. García).

Preprint submitted to Elsevier Science, 8 April 2005

One of the simplest approaches lies in a linear decomposition, X = AS, with A an (m × n)-matrix (the mixing matrix) and S an (n × N)-matrix storing the sources. Both A and S are unknown, hence this problem is often described as blind source separation (BSS). To obtain a well-defined problem, A and S have to satisfy additional properties, such as:

• the source components Si (rows of S) are assumed to be realizations of stochastically independent random variables — this method is called independent component analysis (ICA) [1,2];
• the sources S are required to contain as many zeros as possible — we then speak of sparse component analysis (SCA) [3,4];
• A and S are assumed to be nonnegative, which is denoted nonnegative matrix factorization (NMF) [5].

The above-mentioned models, as well as their interplay, have recently been the focus of much research, for instance concerning how ICA and sparseness are related to each other and how they can be integrated into BSS algorithms [6–10], how to deal with nonnegativity in the ICA case [11,12], or how to extend NMF to include sparseness [13,14]. Much work has already been devoted to these subjects, and applications to various fields are currently emerging. Indeed, linear representations such as the above have several potential applications, including the decomposition of objects into 'natural' components [5], redundancy and dimensionality reduction [2], biomedical data analysis, micro-array data mining or enhancement, feature extraction of images in nuclear medicine, etc. [1,2]. In this study, we analyze and compare the above models, not from a theoretical point of view but on a concrete, real-world example, namely the analysis of surface electromyogram (s-EMG) data sets.
An electromyogram (EMG) denotes the electric signal generated by a contracting muscle [15]; its study is relevant to the diagnosis of motoneuron diseases [16] as well as to neurophysiological research [17]. In general, EMG measurements make use of invasive, painful needle electrodes. An alternative is s-EMG, which is measured using non-invasive, painless surface electrodes. In this case, however, the signals are considerably more difficult to interpret due to noise and the overlap of several source signals, as shown in Fig. 1(a). We have previously applied ICA to the s-EMG decomposition problem [18]; however, performance on real-world noisy s-EMG is still problematic, and it is not yet known how well the assumption of independent sources holds in the s-EMG setting. In the present work, we apply sparse BSS methods based on various model assumptions to s-EMG signals. We first outline each of these methods and the corresponding performance indices used for their comparison. We then present the decompositions obtained with each method, and finally discuss the results in section 4.

Fig. 1. Electromyogram; (a) Example of a signal obtained from a surface electrode, showing the superposition of the activity of several motor units (MU#1, MU#2; scale bars: 0.5 mV, 50 ms). (b) α-motoneurons convey the commands from the central nervous system (CNS) to the muscles. A motor unit consists of an α-motoneuron and the muscle fibres it innervates.

2 Method

2.1 Signal origin and acquisition

The central nervous system conveys commands to the muscles by trains of electric impulses (firings) via α-motoneurons, whose cell bodies are located in the spinal cord. The terminal axons of an α-motoneuron innervate a group of muscle fibres. A motor unit (MU) consists of an α-motoneuron and the muscle fibres it innervates, Fig. 1(b). When a MU is active, it produces a train of electric impulses called a motor unit action potential train (MUAPT). The s-EMG signal is composed of the weighted summation of several MUAPTs; an example is given in Fig. 1(a).
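As a toy illustration of this summation (not the generator used in this work; the MUAP shape and firing times below are made up), one s-EMG channel can be sketched as a weighted sum of impulse trains convolved with a MUAP waveform:

```python
import numpy as np

fs = 10_000                                   # sampling rate [Hz]
t = np.arange(0, 1.0, 1 / fs)                 # 1 s of signal

# hypothetical biphasic MUAP waveform, roughly 10 ms long
tau = np.arange(-0.005, 0.005, 1 / fs)
muap = np.sin(2 * np.pi * 300 * tau) * np.exp(-(tau / 0.002) ** 2)

def muapt(firing_times_s):
    """MUAPT: an impulse train at the given firing times (in seconds)
    convolved with the MUAP waveform."""
    train = np.zeros_like(t)
    train[(np.asarray(firing_times_s) * fs).astype(int)] = 1.0
    return np.convolve(train, muap, mode="same")

# one channel of s-EMG: weighted sum of two motor units' MUAPTs
x = 1.0 * muapt([0.10, 0.15, 0.21]) + 0.4 * muapt([0.12, 0.18])
```

The weights play the role of one row of the mixing matrix introduced later; overlapping firings simply add, which is what makes the raw channel hard to read.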

2.1.1 Artificial signal generator

We developed a multi-channel s-EMG generator based on Disselhorst-Klug et al.'s model [19], employing Andreassen and Rosenfalck's conductivity parameters [20] and the firing-rate statistics described by Clamann [21]. The volume conductor between a muscle fibre and the skin surface where the electrodes are placed acts as a spatial low-pass filter; hence, a source signal is attenuated in direct proportion to its distance from the electrode. Using Griep's tripole model [22], we can calculate the spatial potential distribution

Fig. 2. Artificially created signals. (a) Five synthetic motor unit action potential trains (source signals, MUAPTs) generated by the model. (b) Artificial surface electromyogram (s-EMG); locations of the sources (motor units, encircled asterisks) with respect to the electrode locations, marked by an x; the y-axis represents the depth inside the upper limb. (c) Artificially created mixture signal — only the first 5000 samples (out of 15000) are shown. The eight-channel signal is generated from the five source signals (a), using the mixing matrix illustrated in (d).

generated on the skin surface by the excitation of a single muscle fibre. The potential distribution generated by the firing of a single motor unit can be calculated as the linear superposition of the contributions of all its constituent muscle fibres, which are not located in one plane. The aforementioned model can be considered a linear instantaneous mixing process. In reality, however, the model could exhibit slightly convolutive behaviour: the media crossed by the s-EMG sources are anisotropic, producing delayed mixtures of the same signals (convolutions) rather than an instantaneous mixture. However, in the muscle model used, the distances between the different muscle fibres have been increased

name          meaning                                 value
num           number of MUs                           5
siz           size of each MU, in number of fibres    20–30 (uniformly distr.)
cfr           central firing rate                     20 Hz
ED            electrode inter-pole distance           2.54 mm
CH            number of generated channels            8
meanIPI       mean inter-pulse interval               1/cfr · 1000 ms
IPIstd        Clamann formula for IPI deviation       9.1 · 10^−4 · meanIPI^2 + 4
SY            radial conductivity                     0.063 S/m
SZ            axial conductivity                      0.328 S/m
A             anisotropy ratio                        SZ/SY = 5.2
SI            intracellular conductivity              1.010 S/m
D             fibre diameter                          50 · 10^−6 m
CV            conduction velocity                     4.0 m/s
FatHeight     fat-skin layer thickness                3 mm
MUterritory   XY-plane radius of each MU              1.0 mm
DistEndplate  endplate-to-electrode distance          20 mm
FibZsegment   length of the endplate in the Z axis    0.4 mm
CondVolCond   conductivity of the volume conductor    0.08 S/m

Table 1. Parameters used for artificial s-EMG generation.

by the anisotropy factor A, calculated as the ratio between the axial conductivity (0.328 S/m) and the radial conductivity (0.063 S/m); see Tab. 1 for parameter details. Afterwards, the potential distribution can be calculated as if the volume conductor were isotropic. The channels composing the synthetic s-EMG contain the same MUAP without any time delay among themselves, hence producing an instantaneous mixture with the other source signals. We generated 100 synthetic, eight-channel s-EMG signals of 1.5-second duration. Each s-EMG was composed of five MUs randomly located under a 3 mm fat and skin layer in the detection area of the electrode array (up to ±2 mm from the sides of the electrode array and up to a depth of 6 mm). Each MU in turn consisted of 20 to 30 fibres (uniform distribution) of infinite length, located randomly in a 1 mm circle in the XY plane. The conduction velocity of the fibres was 4.0 m/s. Note that an average biceps brachii MU is generally composed of at least 200–300 muscle fibres [23]; however, for the sake of computational speed, that number was reduced to one tenth, which

only resulted in a reduction of the amplitude of the final MUAPs. This has no influence in our case, as the algorithms are not affected by the overall amplitude of the signals, only by their respective proportions, if at all. The simulated electrode array used to generate the mixed recordings had the same dimensions and characteristics as the real one and was virtually located 20 mm from the endplate region, which had a thickness of 0.4 mm. The X axis corresponded to the length of the electrode, the Y axis to the depth, and the Z axis to the fibre direction. The mean instantaneous firing rate of each MU was 20 firings/second (mean inter-pulse interval meanIPI of 50 ms), with a standard deviation calculated following Clamann's model as 9.1 · 10^−4 · meanIPI^2 + 4 [21]. The s-EMG signals were generated as follows: each muscle fibre composing a firing MU generates an action potential following the moving tripole model ([24] and references therein); the final MUAP appearing in each channel is the summation of those action potentials after applying the anisotropy coefficient A and the fading produced by the media crossed by the signal ([19] and references therein). Note that the model generally holds for any skeletal muscle; however, some of the parameters (such as the fat layer and firing rate) were chosen following the average, normal values of the biceps brachii [25]. A single such data set is visualized in Fig. 2: the five underlying MUAPTs are shown in Fig. 2(a), their locations are depicted in Fig. 2(b), and the eight-dimensional s-EMG observations generated from them (Fig. 2(c)) were obtained using the mixing matrix shown in Fig. 2(d). We will use several (100) of these signals with randomly chosen source locations in batch runs for statistical comparisons.
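A hedged sketch of the firing statistics just described — inter-pulse intervals with mean meanIPI = 1000/cfr ms and the Clamann standard deviation; this is an illustrative NumPy re-implementation, not the original Matlab generator:

```python
import numpy as np

rng = np.random.default_rng(42)

def firing_times(duration_ms, cfr_hz=20.0):
    """Draw MU firing times whose inter-pulse intervals (IPIs) have mean
    1/cfr * 1000 ms and the Clamann standard deviation
    9.1e-4 * meanIPI**2 + 4 (both in ms)."""
    mean_ipi = 1000.0 / cfr_hz              # 50 ms for cfr = 20 Hz
    std_ipi = 9.1e-4 * mean_ipi ** 2 + 4.0  # Clamann model, about 6.3 ms
    times, t = [], 0.0
    while True:
        t += max(rng.normal(mean_ipi, std_ipi), 0.0)  # clip pathological draws
        if t >= duration_ms:
            return np.array(times)
        times.append(t)

spikes = firing_times(1500.0)  # one 1.5-second train, as in the simulations
```

Each such train, convolved with the channel-specific MUAP shape, yields one MUAPT source.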

2.1.2 Real s-EMG signal acquisition

Fig. 3 shows the experimental setting for the acquisition of real s-EMG recordings. The subject (after giving informed consent) sat on a chair and was strapped to it with nylon belts, as shown in Fig. 3(a). The subject's forearm was put in a cast fixed to the table, the shoulder angle being approximately 30°. To measure the torque exerted by the subject, a strain gauge was placed between the forearm cast and the table fixation. An electrode array was placed on the biceps short head muscle belly, outside the innervation zone and transverse to the direction of the muscle fibres, after cleaning the skin with alcohol and SkinPure (Nihon Kohden Corp., Tokyo, Japan). The electrode array consists of 16 stainless-steel poles, 1 mm in diameter, 3 mm in height, and 2.54 mm apart in both plane directions, attached to a rigid acrylic plate, see Fig. 3(b). The poles of each bipolar electrode were placed at the shortest possible distance to obtain sharper MUAPs.

Fig. 3. Experimental setting; (a) Eight-channel s-EMG was recorded during an isometric, constant-force contraction at 30% of the subject's MVC. (b) Detail of the s-EMG bipolar electrode array used for the recordings (eight channels CH1–CH8 on a 41 mm plate; 2.54 mm inter-pole distance).

Eight-channel bipolar EMG signals were measured (input impedance above 10^12 Ω, CMRR of 110 dB), filtered by a first-order band-pass Butterworth filter (cut-off frequencies of 70 Hz and 1 kHz), and then amplified (gain of 80 dB). The target torque and the measured torque were displayed on an oscilloscope as a thick and a thin line, respectively, which the subject was asked to match. The eight-channel s-EMG was sampled at a frequency of 10 kHz, 12 bits per sample, by a PCI-MIO-16E-1 A/D converter card (National Instruments, Austin, Texas, USA) installed in a PC, which was also equipped with a LabView (National Instruments) program in charge of controlling the experiments. The signals were recorded during a 5-s period of constant-force isometric contractions at 30% of the maximum voluntary contraction (MVC). Before applying any BSS method, the signal was preprocessed using a modified dead-zone filter: a nonlinear filter developed to retain the MUAPs belonging to the target MUAPT and to remove both the noise and the low-amplitude MUAPs that did not reach given thresholds. In former works, these thresholds were initially set at a level three times above the noise level (recording at 0% MVC) and then

adjusted with the aid of an incorporated graphical user interface [18]. In the present work, the thresholds were set to 10% for all the signals so that all obtained results can be compared easily. The algorithm looks for zero-crossings in the signal and defines a waveform as the samples comprised between two consecutive zero-crossings. If the maximum (respectively, the minimum in the case of a negative waveform) is above (respectively, below) the threshold, that waveform is kept in the output signal; otherwise, the waveform is replaced by a zero-voltage signal over that period. Such a filter is quite important because, although BSS algorithms are generally able to separate the noise as a component, they are usually unable to separate more source signals than available channels. In our experiments there were generally more MUs than available channels even at low levels of contraction, but only a few of them had amplitudes above the noise level. In order to eliminate the activity of some MUs, we delete the MUAPTs belonging to distant MUs, whose MUAPs are less powerful. This is enhanced by PCA preprocessing and dimension reduction, as discussed later in section 3.2.
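The zero-crossing rule just described can be sketched as follows — an illustrative NumPy re-implementation, not the authors' code; the threshold is passed in absolute units here, so a "10%" setting would be computed relative to the signal's peak beforehand:

```python
import numpy as np

def dead_zone_filter(x, threshold):
    """Keep each waveform (samples between two consecutive zero-crossings)
    whose extremum reaches the threshold; replace the rest by zeros."""
    y = np.zeros_like(x)
    # indices where the sign changes mark the zero-crossings
    crossings = np.where(np.diff(np.signbit(x)))[0] + 1
    bounds = np.concatenate(([0], crossings, [x.size]))
    for a, b in zip(bounds[:-1], bounds[1:]):
        segment = x[a:b]
        if segment.size and np.max(np.abs(segment)) >= threshold:
            y[a:b] = segment  # waveform survives the dead zone
    return y
```

Because each segment has a single sign, testing max(|segment|) implements exactly the "maximum above / minimum below" rule from the text.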

2.2 Blind source separation

Linear blind source separation (BSS) describes the task of blindly recovering A and S in the equation

X = AS + N    (1)

where X consists of m signals with N observations each, put together into an (m × N)-matrix. A is a full-rank (m × n)-matrix, and we typically assume that m ≥ n (complete and under-complete case). Moreover, N models additive noise, which is commonly assumed to be white Gaussian. Depending on the assumptions on A and S we get different models. Our goal is to estimate the mixing matrix A; the sources can then be recovered by applying the pseudoinverse A+ of A to the mixtures X (which is optimal in the maximum-likelihood sense when Gaussian noise is assumed). In the case of s-EMG data, the original signals are the MUAPTs generated by the motor units active during a sustained contraction; in this setting, A quantifies how much each source contributes to each observation. Of course, additional requirements have to be imposed on the model to guarantee a satisfactorily small space of solutions, and depending on the assumptions on A and S we obtain the different models below.
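A minimal numerical sketch of model (1) and the pseudoinverse recovery; the dimensions, sparsity level and noise level are made up for illustration, not those of the actual recordings:

```python
import numpy as np

rng = np.random.default_rng(0)

n, m, N = 3, 8, 1000                                          # sources, channels, samples
S = rng.standard_normal((n, N)) * (rng.random((n, N)) < 0.1)  # sparse source rows
A = np.abs(rng.standard_normal((m, n)))                       # nonnegative mixing matrix
X = A @ S + 0.01 * rng.standard_normal((m, N))                # Eq. (1): X = AS + N

# with A known (or estimated), recover the sources via the pseudoinverse,
# the maximum-likelihood solution under Gaussian noise
S_hat = np.linalg.pinv(A) @ X
```

The whole difficulty of BSS is of course estimating A blindly; the methods below differ precisely in which assumption on A and S they exploit for that step.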

2.2.1 Independent component analysis

Independent component analysis (ICA) describes the task of (here: linearly) transforming a given multivariate random vector such that its transform is stochastically independent. In our setting, the random vector is given by N realizations, and ICA is applied to solve the BSS problem (1) under the assumption that S is independent. As in all BSS problems, a key issue lies in the identifiability of the model; it can be shown that A (and hence S) is already uniquely — except for column permutation and scaling — determined by X if S contains at most one Gaussian component and is square integrable [26,27]. This enables us to apply ICA to the BSS problem and to recover the original sources. The idea of ICA was first expressed by Jutten and Hérault [28], while the term ICA was later coined by Comon [26]. In contrast to principal component analysis (PCA), ICA uses higher-order statistics to fully separate the data. Typical algorithms are based on contrasts such as minimum mutual information, maximum entropy, diagonal cumulants or non-Gaussianity. For more details on ICA we refer to the two excellent available textbooks [1,2]. In the following we will use the so-called JADE (joint approximate diagonalization of eigenmatrices) algorithm, which identifies the sources using the fact that, due to independence, the cross-cumulants of the sources vanish. Furthermore, fixing two indices of the 4th-order cumulants, it is easy to see that such cumulant matrices of the mixtures are diagonalized by A. After pre-processing by PCA, we can assume that A is orthogonal. Then diagonalization of one mixture cumulant matrix already yields A, provided that its eigenvalues are pairwise different. In practice, however, this is not always the case; furthermore, estimation errors can result in a bad estimate of the cumulant matrix and hence of A. Therefore, joint diagonalization of a whole set of cumulant matrices yields an improved estimate of A.
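JADE itself jointly diagonalizes several cumulant matrices; as a hedged stand-in illustrating the same principle, the following toy sketch separates two whitened super-Gaussian sources by brute-forcing the residual rotation angle with a kurtosis contrast (illustrative only, not the JADE implementation used in this work):

```python
import numpy as np

rng = np.random.default_rng(1)

# two sparse, hence super-Gaussian, toy sources and a fixed mixing matrix
S = rng.standard_normal((2, 5000)) * (rng.random((2, 5000)) < 0.1)
A = np.array([[1.0, 0.6], [0.4, 1.0]])
X = A @ S

# whitening by PCA: afterwards the remaining mixing is an orthogonal rotation
Xc = X - X.mean(axis=1, keepdims=True)
eigval, eigvec = np.linalg.eigh(np.cov(Xc))
Z = np.diag(eigval ** -0.5) @ eigvec.T @ Xc

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def kurtosis_sum(Y):
    # sum of |excess kurtosis| over rows; large for super-Gaussian components
    return np.abs(np.mean(Y ** 4, axis=1) - 3 * np.mean(Y ** 2, axis=1) ** 2).sum()

# grid-search the rotation maximizing non-Gaussianity (the contrast is
# pi/2-periodic up to permutation and sign of the components)
theta = max(np.linspace(0, np.pi / 2, 180), key=lambda a: kurtosis_sum(rotation(a) @ Z))
S_hat = rotation(theta) @ Z
```

This mirrors the central-limit-theorem argument made below for ICA as a sparse decomposition: mixtures are more Gaussian than the sources, so maximizing non-Gaussianity undoes the mixing.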
Algorithms for actually performing joint diagonalization include gradient descent on the sum of off-diagonal terms, iterative construction of A by Givens rotations in two coordinates [29], an iterative two-step recovery of A [30], or — more recently — a linear least-squares algorithm for diagonalization [31]; the latter two algorithms can also search for non-orthogonal matrices A. One method we use for analyzing the s-EMG signals is JADE-based ICA. Confirming results from [32], we show that ICA can indeed extract the underlying sources. In the case of s-EMG signals, all sources are strongly super-Gaussian and can therefore safely be assumed to be non-Gaussian, so identifiability holds. However, due to the nonnegativity of A, the scaling indeterminacy is reduced to multiplication with a positive scalar in each column. If we additionally use the common assumption of unit variance of the sources, this

already eliminates the scaling indeterminacy. In order to use an ordinary ICA algorithm, we simply have to add a 'post-processing' stage: to guarantee a nonnegative matrix, column signs are flipped to obtain only (or as many as possible) nonnegative column entries. Also note that statistical independence (meaning that the multivariate probability density factorizes) is not related to synchrony in the firing of MUs [33]; otherwise overlapping MUs could not be separated. We finally remark that ICA can also be interpreted as a sparse signal decomposition method in the case of super-Gaussian sources. This follows from the fact that a good and often-used contrast for ICA is the maximization of non-Gaussianity [34]: by the central limit theorem, a mixture of independent sources tends to be more Gaussian than the sources, so the mixing process can be inverted by maximizing non-Gaussianity. In our setting the sources are very sparse, hence strongly non-Gaussian. An ICA decomposition is therefore closely related to a decomposition into parts of maximal sparseness — at least if sparseness is measured using kurtosis.

2.2.2 (Sparse) nonnegative matrix factorization

In contrast to other matrix factorization models such as PCA, ICA or SCA, nonnegative matrix factorization (NMF) strictly requires both matrices A and S to have nonnegative entries, which means that the data can be described using only additive components. Such a constraint has many physical realizations and applications, for instance in object decomposition [5]. If additional sparseness constraints are put on A and S, we speak of sparse NMF; see [14] for more details. Typically, NMF is performed by minimizing the least-squares (Euclidean) contrast

E(A, S) = ‖X − AS‖²    (2)

This optimization problem, albeit convex in each variable separately, is not convex in both at the same time, and hence direct estimation is not possible. Paatero [35] minimizes (2) using a gradient algorithm, whereas Lee and Seung [36] developed a multiplicative update rule that increases algorithm performance considerably. Although NMF has recently gained popularity due to its simplicity and power in various applications, its solutions frequently fail to exhibit the desired sparse object decomposition. Therefore, Hoyer [14] proposed a modification of the NMF model to include sparseness. A simple modification of the cost function (2) could yield undesirable local minima, so instead he chooses to minimize (2) under the constraint of fixed sparseness of both A and S. Here,

sparseness is measured by combining the Euclidean norm ‖·‖₂ and the 1-norm ‖x‖₁ := Σᵢ |xᵢ| as follows:

sparseness(x) := (√n − ‖x‖₁/‖x‖₂) / (√n − 1)    (3)

for x ∈ Rⁿ \ {0}. Thus sparseness(x) = 1 (maximal) if x contains n − 1 zeros, and it reaches zero if the absolute values of all coefficients of x coincide. The devised algorithm is based on the iterative application of a gradient descent step and a projection step, thus restricting the search to the subspace of sparse solutions. We perform the factorization using the publicly available Matlab library nmfpack (http://www.cs.helsinki.fi/u/phoyer/), which is used in [14]. So NMF decomposes X into nonnegative A and nonnegative S. The assumption that A has nonnegative coefficients is very well fulfilled by s-EMG recordings; however, as seen before, the sources also have negative entries. In order to be able to apply the algorithms, we therefore preprocess the data using the function

κ(x) = 0 if x < 0,   κ(x) = x if x ≥ 0    (4)

to cut off negative values; this yields the new random vector (sample matrix) X⁺ := (κ(X₁), …, κ(Xₙ))⊤. For comparison, we also construct a new sample set by simply leaving out samples that have at least one negative value; we model this by the random vector X*.

2.2.3 Sparse component analysis

Sparse component analysis (SCA) [3,4] requires strong sparseness of the sources only — this is then sufficient to decompose the observations. In order to define the SCA model, a vector x ∈ Rⁿ is said to be k-sparse if x has at most k non-zero entries; k-sparseness implies k₀-sparseness for k₀ ≥ k. If an n-dimensional vector is k-sparse for k = n − 1, it is simply said to be sparse. The goal of sparse component analysis of level k (k-SCA) is to decompose X into X = AS as above such that each sample (i.e. column) of S is k-sparse. In the following we will assume k = n − 1. Note that, in contrast to the ICA model, the above model is not translation invariant.
However, it is easy to see that if instead of A we allow an affine linear transformation, the translation constant can be determined from X alone, as long as the sources are non-deterministic. In other words, instead of assuming k-sparseness of the sources we could equivalently assume that at any time instant only k source components are allowed to vary from a previously fixed constant (which may be different for each source). In [3] we showed that under mild conditions k-sparseness already guarantees identifiability of the model, even in the case of fewer observations than sources. In the setting of s-EMG, however, we are in the comfortable situation of having more observations than sources, so as in the ICA case we preprocess our data by PCA projection — this dimension reduction can be applied even in our case of non-decorrelated sources since (given low or no noise) the first three principal components span the source signal subspace; see the comment in section 3.2. The above uniqueness result is based on the fact that, due to the assumed sparseness, the data cluster into a fixed number of hyperplanes. This fact can also be used in an algorithm to actually reconstruct A by identifying the set of hyperplanes; from the hyperplanes, A can be recovered by simply taking intersections. However, multiple hyperplane identification is non-trivial, and the involved cost function

σ(A) = (1/N) Σ_{t=1}^{N} min_{i=1,…,n} |aᵢ⊤ X(t)| / ‖X(t)‖    (5)

where the aᵢ denote the columns of A, is highly non-convex. In order to improve the robustness of the proposed stochastic identifier, we developed an identification algorithm using a generalized Hough transform [37]. Alternatively, a generalization of k-means clustering can be used, which iteratively clusters the data into groups belonging to the different hyperplanes, and then identifies a hyperplane within each cluster by regression [38]. In this paper, we assume sparseness of the sources S in the sense that at least one coefficient of S at each time instant has to be zero. In the case of s-EMG, the maximum natural firing rate of a motor unit is about 30 pulses/second, with each pulse lasting less than 15 ms [39]. Therefore, a motor unit is active less than 450 ms per second; that is, each source signal is zero at least 55% of the time. In addition, the firings of different motor units are not synchronized (only their respective firing rates show a tendency to change together, the so-called common drive [40]). For these reasons, the probability of all n sources firing at a given time instant is bounded by 0.45ⁿ, which quickly approaches zero for increasing n; even in the case of only n = 3 sources, at most 9% of the samples are fully active. Hence the source conditions for SCA should be rather well fulfilled, and we can find isolated MUAPs inside an s-EMG signal using SCA with high probability.
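A hedged NumPy sketch of the hyperplane criterion (5) and of the activity bound just derived (illustrative only; following the notation of (5), with the aᵢ taken as the columns of A):

```python
import numpy as np

def sca_cost(A, X):
    """Cost (5): for each sample X(t), the normalized distance to the
    nearest hyperplane orthogonal to a column a_i of A; it vanishes when
    every sample lies on one of the hyperplanes, i.e. when at every time
    instant at least one source is zero."""
    Xn = X / np.linalg.norm(X, axis=0, keepdims=True)  # normalize each sample
    return np.mean(np.min(np.abs(A.T @ Xn), axis=0))

def p_all_active(n, duty_cycle=0.45):
    """Probability bound of all n unsynchronized sources firing at once."""
    return duty_cycle ** n  # n = 3 gives roughly 0.09
```

In the identity-mixing case, samples with at least one zero coordinate give a cost of exactly zero, which is what the Hough-transform and clustering identifiers exploit.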

2.3 Measures used for comparison

In order to compare the recovered signals with the artificial sources, we simply compare the mixing matrices. For this we employ Amari's separation performance index [41], given by

E₁(P) = Σᵢ₌₁ⁿ ( Σⱼ₌₁ⁿ |pᵢⱼ| / maxₖ |pᵢₖ| − 1 ) + Σⱼ₌₁ⁿ ( Σᵢ₌₁ⁿ |pᵢⱼ| / maxₖ |pₖⱼ| − 1 )

where P = (pᵢⱼ) = Â⁺A, A being the real mixing matrix and Â⁺ the pseudoinverse of its estimate Â. Note that E₁(P) ≤ 2n(n − 1). For both the artificial and the real signals, we calculate E₁(Â₁⁺Â₂), where the Âᵢ are the two recovered mixing matrices. Furthermore, we also want to study the recovered signals. In order to be able to compare between different methods, we apply the following equivalence measures to each pair of components obtained with each method, as well as to the source signals and the synthetic s-EMG channels: Principe's quadratic mutual information (QMI) [42], Kullback-Leibler information distance (K-LD) [43], Renyi's entropy measure [43], mutual information measure (MuIn) [42], Rosenblatt's squared distance functional (RSD) [43], Skaug and Tjøstheim's weighted difference (STW) [43], and cross-correlation (Xcor). All measures are normalized with respect to the maximum value obtained when applying each component to itself; that is, we divide by the maximum of each comparison matrix diagonal (maximum mutual information). The calculation of the above-mentioned indices requires estimating both the joint and the marginal probability density functions of the signals; we have decided to use the data-driven Kn-nearest-neighbour (KnNN) method [44] (for details, refer to [32]). Furthermore, in order to compare separation performance in the presence of noise, we measure the strength of a one-dimensional signal S versus additive noise S̃ using the signal-to-noise ratio (SNR) defined by

SNR(S, S̃) := 20 log₁₀ ( ‖S‖ / ‖S − S̃‖ )
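Both indices are straightforward to implement; a NumPy sketch (illustrative, matching the formulas above — `amari_index` vanishes exactly when P is a permutation matrix up to scaling):

```python
import numpy as np

def amari_index(P):
    """Amari separation performance index E1(P); zero iff P is a scaled
    permutation matrix, and bounded by 2n(n-1)."""
    P = np.abs(np.asarray(P, dtype=float))
    rows = (P / P.max(axis=1, keepdims=True)).sum(axis=1) - 1.0  # row term
    cols = (P / P.max(axis=0, keepdims=True)).sum(axis=0) - 1.0  # column term
    return rows.sum() + cols.sum()

def snr_db(S, S_tilde):
    """Signal-to-noise ratio of S versus its noisy version S_tilde, in dB."""
    return 20.0 * np.log10(np.linalg.norm(S) / np.linalg.norm(S - S_tilde))
```

Since E₁ is applied to P = Â₁⁺Â₂, a value near zero means the two methods recovered essentially the same mixing matrix up to permutation and scaling.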

3 Results

In this section we compare the various models for source separation when applied to both toy and real s-EMG recordings.

Fig. 4. Mixing matrices recovered for the synthetic s-EMG using different methods; (a) ICA using joint approximate diagonalization of eigenmatrices (JADE), (b, c) nonnegative matrix factorization with different preprocessing (NMF, NMF*), (d, e) sparse NMF (sNMF, sNMF*) with the same two preprocessing methods as NMF, and (f) sparse component analysis (SCA).

3.1 Artificial signals

In the first example, we compare performance in the well-known setting of artificially created toy signals.

3.1.1 Single s-EMG

For visualization, we will first analyze a single artificial s-EMG recording, and only later present batch runs over multiple separate realizations to test for statistical robustness. As data set, we use toy signals as in section 3.1, but with only three source components for easier visualization. The ICA result is produced using JADE after PCA to 3 components. Note that here and in the following we perform dimension reduction because not many MUs are present in the small sensor volumes in question. This is confirmed by the eigenvalue structure of the covariance matrix: taking the mean over 10 real s-EMG data sets — further discussed in section 3.2 — the ratio of the third-largest to the largest eigenvalue is only 0.11, and the ratio of the fourth to the first only 0.04. Taking sums, on average we lose only 5.7% of the raw data

Fig. 5. One of the three recovered sources after applying the pseudoinverse A⁺ of the estimated mixing matrices to the synthetic s-EMG mixtures X; (a–f) results obtained using the different methods, see Fig. 4; (g) the original source signal, shown for comparison.

by dimension reduction to the first three eigenvalues, which lies easily in the range of the noise level of typical data sets. In order to reduce the ever-present permutation and scaling ambiguity, the columns of the recovered mixing matrix AJADE are normalized to unit length; furthermore, column sign is chosen to give a positive sum of coefficients (because A is assumed to have only positive coefficients), and the columns are permuted such that the maxima index was increasing, Fig. 4(a). The three components are mostly active in channels 3, 4 or 5 respectively, which coincides with our construction (the real sources lie closest to sensor 4, 3 and 5 respectively). Fig. 5(a) shows one of the recovered sources, and for comparison, Fig. 5(g) shows the original source signal. We subsequently apply NMF and sparse NMF to both X+ and X∗ (denoted by NMF, NMF∗ , sNMF and sNMF∗ respectively); in all cases we achieve fast convergence. The mixing matrices obtained are shown in Fig. 4(b-e). In all four cases we observe a good recovery of the source locations. Cross-multiplication of these matrices with their pseudoinverses shows a high similarity, and the recovered source signals are similar and match well the original source, see Fig. 5(b-e). 15
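The normalization of the recovered mixing matrix described above (unit-length columns, positive column sums, columns permuted by the row index of their maximum) can be sketched as follows; the matrix `A_hat` is a made-up example, not a result from the paper:

```python
import numpy as np

def normalize_mixing_matrix(A):
    """Reduce the scaling/sign/permutation ambiguity of an estimated
    mixing matrix: unit-norm columns, positive column sums, and columns
    sorted so that the row index of each column's maximum increases."""
    A = np.asarray(A, dtype=float).copy()
    # unit-length columns
    A /= np.linalg.norm(A, axis=0, keepdims=True)
    # choose the sign so that each column sums to a positive value
    # (A is assumed to have only positive coefficients)
    A *= np.sign(A.sum(axis=0, keepdims=True))
    # permute columns by the row index of their maximal entry
    order = np.argsort(np.argmax(A, axis=0))
    return A[:, order]

A_hat = np.array([[0.1, -2.0, 0.3],
                  [0.9, -0.5, 0.2],
                  [0.2, -0.1, 1.5]])
A_norm = normalize_mixing_matrix(A_hat)
print(A_norm)
```

After normalization, the columns can be compared across algorithms without the arbitrary scale, sign, and ordering that every BSS method leaves undetermined.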

Fig. 6. Inter-component performance index comparisons. (a) Comparison of the Amari index of the matrices obtained from each pair of methods in the case of synthetic s-EMG decomposition; as the comparison matrix is symmetric, half of its values have been omitted for clarity. (b) Mean of the inter-component mutual information for each method, estimated by different measures; for comparison, the measures corresponding to the source signals (minimal indices) and to the channels of the mixed s-EMG (maximal indices) are also shown.

Finally, we perform SCA using a generalized Hough transform [37]. Note that other algorithms are possible for such an extraction, and model generalizations exist, see for example [45]. After whitening and dimension reduction, we perform Hough hyperplane identification with bin size 180 and manually identify the maxima in the obtained Hough accumulator. The recovered mixing matrix is again visualized in Fig. 4(f). Similar to the previous results, the three components are roughly most active at locations 2 to 3, 4 and 5 respectively. Multiplication with the pseudoinverses of the matrices recovered in the above experiments shows that the result coincides quite well with the matrices from NMF, but differs slightly more from the ICA recovery. We calculate and compare the Amari index for each method; as shown in Fig. 6(a), no major differences can be detected. A comparison of the recovered sources using the indices from section 2.3 is given in Fig. 6(b), where the indices corresponding to the original source signals (minimal mutual information values) and to the channels of the s-EMG (maximal values) have also been added. All methods separate the mixtures (improvement in terms of the indices), but they yield somewhat different results, with JADE giving quite different sources than the rest of the methods, and NMF and NMF∗ performing rather similarly. In terms of source independence, JADE of course scores the best ratings in the indices, as can be confirmed by calculating the Amari index of the recovered matrix with the original mixing matrix:
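The Amari index used for these matrix comparisons can be computed as below. The exact normalization varies in the literature; this is one common variant, and whether it matches the paper's E1 exactly is an assumption:

```python
import numpy as np

def amari_index(P):
    """Amari performance index of a matrix P (e.g. P = pinv(A_est) @ A_true).
    It vanishes exactly when P is a scaled permutation matrix, i.e. when the
    sources are perfectly recovered up to order and scale. Several
    normalizations exist in the literature; this is one common variant."""
    P = np.abs(np.asarray(P, dtype=float))
    rows = (P / P.max(axis=1, keepdims=True)).sum(axis=1) - 1.0
    cols = (P / P.max(axis=0, keepdims=True)).sum(axis=0) - 1.0
    return rows.sum() + cols.sum()

# a scaled permutation matrix gives index 0 (perfect separation)
perm = np.array([[0.0, 2.0, 0.0],
                 [0.0, 0.0, -1.5],
                 [3.0, 0.0, 0.0]])
print(amari_index(perm))             # 0.0
print(amari_index(np.ones((3, 3))))  # 12.0, complete mixing
```

Lower values therefore mean that the product of the estimated unmixing matrix and the true mixing matrix is closer to a scaled permutation.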

                JADE    NMF     NMF∗    sNMF    sNMF∗   SCA     sources
mean kurtosis   10.80   8.48    8.80    8.85    9.64    7.14    11.4
sparseness      0.286   0.268   0.295   0.270   0.302   0.201   0.252
σ(A2)           2.91    2.02    2.11    1.99    2.36    0.96    3.24

Table 2. Sparseness measures of the recovered sources using the various methods. The first row gives the mean kurtosis (the higher, the more 'spiky' the signal), the second row the sparseness index from equation (3) (the higher, the sparser the signal), and the third row the cost function (5) employed in SCA (the lower, the better the SCA criterion is fulfilled). Optimal values are printed in bold face.

            AJADE   ANMF    ANMF∗   AsNMF   AsNMF∗   ASCA
E1(A2+ A)   1.39    3.27    2.96    3.41    2.56     3.86

JADE clearly outperforms the other methods; this will be explained in the next paragraph. However, we have to add that the signal generation additionally involves a slightly nonlinear filter, so we can only estimate the real mixing matrix A from the sources and the mixtures, which yields a non-negligible error. Hence this result indicates that JADE best approximates the linearized system. One key question in this work is how the different models induce sparseness. Clearly, sparseness is a rather ambiguous term, so we calculate three indices (kurtosis, 'sparseness' from equation (3) and σ from (5)) for the real as well as the recovered sources, see Tab. 2. As expected, ICA gives the highest kurtosis among all methods, whereas sparse NMF yields the highest values of the sparseness criterion, and SCA has the lowest, i.e. best, value of the k-sparseness measure σ(A). We further see that the kurtosis and the sparseness criterion appear somewhat related on this data set, as high values in both indices are achieved by both JADE and sNMF. The k-sparseness criterion, which fixes only the zero (semi-)norm, i.e. requires a fixed number of zeros in the data without additional requirements on the other values, does not induce equally high sparseness when measured by kurtosis or the sparseness index, and vice versa. Finally, by looking at the sparseness indices of the real sources, we can now understand why JADE outperformed the other methods in this toy data setting: the kurtosis of the sources is high and their mutual information low, see Fig. 6(b), but in terms of sparseness, and especially σ, the sources are not as sparse as expected. Hence the (sparse) NMF and especially the SCA algorithm could not perform as well as JADE when compared to the original sources, as noted above.
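The two distribution-based sparseness measures can be sketched as follows. Equation (3) is not reproduced in this section, so we assume it refers to the l1/l2-based sparseness index of Hoyer [14]:

```python
import numpy as np

def excess_kurtosis(x):
    """Sample excess kurtosis: large positive values indicate 'spiky',
    super-Gaussian signals; a Gaussian signal gives approximately 0."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    return np.mean(z**4) / np.mean(z**2)**2 - 3.0

def hoyer_sparseness(x):
    """Sparseness index in [0, 1] based on the l1/l2 norm ratio
    (Hoyer, 2004): 0 for a flat vector, 1 for a single spike. We assume
    equation (3) of the text refers to this measure."""
    x = np.asarray(x, dtype=float).ravel()
    n = x.size
    l1, l2 = np.abs(x).sum(), np.sqrt((x**2).sum())
    return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1.0)

spike = np.array([0.0, 0.0, 5.0, 0.0])  # maximally sparse vector
flat = np.ones(4)                       # not sparse at all
print(hoyer_sparseness(spike), hoyer_sparseness(flat))  # 1.0 0.0
```

The k-sparseness cost σ, in contrast, only counts zeros and is therefore insensitive to the shape of the nonzero values, which is consistent with the discrepancies observed in Tab. 2.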
However, we will see that in the case of real s-EMG signals this distinction breaks down; furthermore, sparse NMF turns out to be more robust against noise than JADE, as shown in the following.


Fig. 7. Boxplot of the separation performance when identifying 5 sources in 8-dimensional observed artificial s-EMG recordings. (a) shows the mean Amari index of the product of the identified separation and the real mixing matrix, and (b) depicts the SNR of the recovered sources versus the real sources. Mean and variance were taken over 100 runs.

3.1.2 Multiple s-EMG experiments

We now show the performance of the above algorithms when applied to 100 different realizations of artificial s-EMG data sets. The data consist of 8 channels with 5 underlying source activities; more details about the data generation are given in section 2.1.1. The algorithm parameters are the same as above, with the exception that automated SCA is performed using Mangasarian-style clustering [38] after PCA dimension reduction to 5 components. The ICA and sparse BSS algorithms from above are applied to these data sets, and the Amari index as well as the SNR of the recovered sources versus the original sources is stored. In Fig. 7, the means of these two indices as well as their deviations are shown in a box plot, separately for each algorithm. As in the single s-EMG experiment, these statistics confirm that JADE performs best, both in terms of matrix and in terms of source recovery (which is more or less the same because we are dealing with the noiseless case so far). The NMF algorithms identify the mixing matrix with acceptable performance; however, (sparse) NMF taking only positive samples (sNMF∗) tends to separate the data slightly better than sample preprocessing using κ from equation (4). SCA cannot detect the source matrix as well as the other BSS algorithms (again the SCA conditions seem to be somewhat violated); however, it performs adequately well at recovering the sources, because some sources are recovered very well, resulting in a higher SNR than the NMF algorithms. For practical purposes, it is important to check to what extent the signal-to-interference ratio (SIR) between the channels is improved by applying the BSS algorithms. For each run, we monitor the SIR of the original sources and of the recoveries by taking the mean over all channels. The two SIR means are divided to give a mean improvement ratio. Taking again the mean over the 100 runs yields the following improvements:

                       JADE   NMF    NMF∗   sNMF    sNMF∗   SCA
mean SIR improvement   4.15   2.54   2.87   0.701   1.90    3.08
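The paper does not spell out its exact SIR definition; one plausible formalization, in which each channel is matched to its best-correlated source and the residual counts as interference, is sketched below (the mixing matrix and source model are made-up stand-ins):

```python
import numpy as np

def mean_sir(signals, sources):
    """Mean signal-to-interference ratio over channels: each row of
    `signals` is matched to its best-correlated source; the least-squares
    projection onto that source counts as signal, the residual as
    interference. A plausible formalization, not the paper's exact one."""
    sirs = []
    for y in signals:
        c = np.array([np.corrcoef(y, s)[0, 1] for s in sources])
        s = sources[np.argmax(np.abs(c))]
        a = np.dot(y, s) / np.dot(s, s)      # projection coefficient
        target, interference = a * s, y - a * s
        sirs.append(np.sum(target**2) / np.sum(interference**2))
    return np.mean(sirs)

rng = np.random.default_rng(0)
S = rng.laplace(size=(2, 1000))              # two sparse toy 'sources'
A = np.array([[1.0, 0.3], [0.2, 1.0]])       # made-up mixing matrix
X = A @ S + 0.05 * rng.standard_normal((2, 1000))
Y = np.linalg.pinv(A) @ X                    # ideal linear unmixing
improvement = mean_sir(Y, S) / mean_sir(X, S)
print(improvement)  # substantially greater than 1 for a good unmixing
```

Dividing the post-separation mean SIR by the pre-separation one yields the improvement ratio tabulated above.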

This confirms our results: JADE works very well for preprocessing, but so do NMF∗ and, interestingly, SCA.

In order to test the robustness of the methods against noise, we recorded s-EMG at 0% MVC, that is, with the muscle totally relaxed; in this way we obtained recordings of noise only. As expected, the resulting recordings have nearly identical means and variances, and are close to Gaussian. This is confirmed by a Jarque-Bera test, which asymptotically tests the goodness-of-fit of an observed signal to a normal distribution. We recorded two different noise signals, and the test was positive in 11 out of 16 cases (at significance level α = 5%); the 5 exceptions had p-values no lower than 0.001. Furthermore, the noise channels are close to independent, as expected: their covariance matrix is close to diagonal, with an Amari index of 2.1, which is quite low for 8-dimensional signals. The noise is not fully i.i.d. but exhibits slight non-stationarity. Nonetheless, we take these findings as justification for assuming additive Gaussian noise in the following. We will show mean algorithm performance over 50 runs for varying noise levels.
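The Gaussianity check can be sketched with a hand-rolled Jarque-Bera statistic; the synthetic signals below are stand-ins for the real 0% MVC recordings, which are not available here:

```python
import numpy as np

def jarque_bera_stat(x):
    """Jarque-Bera statistic: n/6 * (skewness^2 + (excess kurtosis)^2 / 4).
    Under normality it is asymptotically chi-square with 2 degrees of
    freedom; the 5% critical value is about 5.99."""
    x = np.asarray(x, dtype=float)
    n = x.size
    z = x - x.mean()
    m2, m3, m4 = (z**2).mean(), (z**3).mean(), (z**4).mean()
    skew = m3 / m2**1.5
    kurt = m4 / m2**2 - 3.0
    return n / 6.0 * (skew**2 + kurt**2 / 4.0)

rng = np.random.default_rng(1)
gauss = rng.standard_normal(5000)   # stand-in for a relaxed 0% MVC channel
spiky = rng.laplace(size=5000)      # super-Gaussian, like active s-EMG
print(jarque_bera_stat(gauss) < 5.99)  # usually True: Gaussianity not rejected
print(jarque_bera_stat(spiky) < 5.99)  # False: clearly non-Gaussian
```

A statistic below the chi-square critical value means normality cannot be rejected at the 5% level, mirroring the "positive" test outcomes reported for the noise recordings.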

Note that due to the Gaussian noise, the models (especially the NMF model, which already uses such a Gaussian error term) hold well under the additive noise. We multiplied this noise signal progressively by 0, 0.01, 0.05, 0.1, 0.5, 1 and 5 (which corresponds to mean source SNRs of ∞, 36, 22, 16, 2.1, -3.9 and -18 dB) and then added each of the obtained signals to a randomly generated synthetic s-EMG containing 5 sources as above. The Amari index was calculated for each method and each noise level, which yields the comparative graph shown in Fig. 8. Interestingly, sparse NMF∗ outperforms JADE in all cases, which indicates that the sNMF model (which already includes noise) works best in the presence of slight to stronger additive noise, making it very well adapted to real applications. Again SCA performs somewhat problematically; however, it separates the data well at the noise level of -3.9 dB. We believe that this is due to the thresholding parameter involved in SCA hyperplane detection; apparently an adaptive choice of this parameter needs to be implemented in order to improve separation as in the case of an SNR of -3.9 dB.
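The noise-scaling procedure can be sketched as follows; the Laplacian sources and Gaussian noise are synthetic stand-ins, so the resulting SNRs differ from the paper's reported values:

```python
import numpy as np

def snr_db(signal, noisy):
    """SNR in dB of a noisy version of `signal`."""
    err = noisy - signal
    return 10.0 * np.log10(np.sum(signal**2) / np.sum(err**2))

rng = np.random.default_rng(0)
clean = rng.laplace(size=(5, 2000))        # 5 synthetic source signals
noise = rng.standard_normal((5, 2000))     # stand-in for the 0% MVC noise
for scale in (0.01, 0.05, 0.1, 0.5, 1.0, 5.0):
    noisy = clean + scale * noise
    print(scale, round(snr_db(clean.ravel(), noisy.ravel()), 1))
```

Each scale factor yields one noise level at which the separation algorithms are re-run and the Amari index re-evaluated, producing one point per method in Fig. 8.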

Fig. 8. Result of the experiment testing the robustness of the different methods against noise (median Amari index versus mean SNR in dB). Plotted is the mean over 50 runs.


Fig. 9. Real s-EMG obtained from a healthy subject performing a sustained contraction at 30% MVC.



Fig. 10. Mixing matrices recovered for the real s-EMG using different methods; (a) JADE, (b, c) NMF with different preprocessing, (d, e) sparse NMF with the same two preprocessing methods as NMF, and (f) SCA.

3.2 Real s-EMG signals

In this section we analyze real s-EMG recordings obtained from ten healthy subjects. We first again study a single s-EMG, plotted for visual inspection, and then show statistics over multiple subjects. The first data set was obtained from a single subject performing a sustained contraction at 30% MVC, see Fig. 9. The signal acquisition and preprocessing are described in section 2.1.2. We initially use JADE as the ICA algorithm of choice. The estimated mixing matrix is visualized in Fig. 10(a). As in the case of synthetic signals, we then apply NMF and sparse NMF to both X+ and X∗ and obtain the recovered mixing matrices visualized in Fig. 10(b-e). When recovering the source signals by SCA, we face the following problem: if we use PCA reduction to n = 3 dimensions, we cannot achieve convergence, and a generalized Hough plot [37] does not reveal such structure either. Hence we choose dimension reduction to n = 2. In the two-dimensional projected mixtures, the data clearly cluster along two lines, so the assumption of sparseness holds in 2 dimensions. We use Mangasarian-style clustering for SCA (similar to k-means) [38] to recover these directions. The mixing matrix thus recovered in two dimensions is plotted in Fig. 10(f). Note that the SCA matrix columns match two columns of the mixing matrices found by the other methods.

Fig. 11. Inter-component performance index comparisons; (a) comparison of the Amari index; (b) mean of various source dependence measures.

                JADE    NMF     NMF∗    sNMF    sNMF∗   SCA
mean kurtosis   4.97    4.74    4.80    4.81    4.80    4.82
sparseness      0.387   0.424   0.413   0.408   0.407   0.405
σ(A2)           0.76    0.50    0.55    0.60    0.62    0.70

Table 3. Comparison of the sparseness of the recovered sources (real s-EMG data).

Similar to the toy data set, we compare the recoveries of the different methods using the various indices from section 2.3 for both the mixing matrices and the recovered sources, Fig. 11. In contrast to the artificial signals, here all methods yield rather similar performance; the mean Amari indices are roughly half the value of the indices in the toy data setting. This confirms that the methods recover rather similar sources. As in the case of artificial signals, we again compare the sparseness of the recovered sources, see Tab. 3. Due to the noise present in the real data, the signals are clearly less sparse than the artificial ones. Furthermore, a comparison between the various methods yields noticeably fewer differences than in the case of artificial signals. At first glance it seems unclear why SCA performs worse in terms of k-sparseness (using σ(A)) than the NMF methods, but better than JADE. This can be explained by the fact that the PCA dimension reduction reduces the number of parameters, so SCA cannot search the whole space and hence performs worse than NMF in that respect; it still outperforms JADE, which also uses PCA preprocessing.
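The Mangasarian-style clustering step can be sketched in 2-D as a k-plane algorithm specialized to lines through the origin: alternate between assigning each sample to its nearest line and re-fitting each line as the dominant eigenvector of its cluster's scatter matrix. The toy directions `a1`, `a2` and the greedy initialization are our own assumptions, not the paper's exact procedure:

```python
import numpy as np

def k_line_clustering(X, k, iters=20):
    """k-plane clustering (Bradley & Mangasarian style) for lines through
    the origin in 2-D. X is a (2, N) data matrix; returns (2, k) unit
    line directions."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=0)
    # greedy init: start with the largest sample, then repeatedly take
    # the sample farthest from all lines chosen so far
    W = [X[:, np.argmax(norms)] / norms.max()]
    for _ in range(1, k):
        d2 = np.min([(X**2).sum(axis=0) - (w @ X)**2 for w in W], axis=0)
        i = np.argmax(d2)
        W.append(X[:, i] / norms[i])
    W = np.stack(W, axis=1)
    for _ in range(iters):
        # squared distance of each sample to each line
        proj = W.T @ X
        d2 = (X**2).sum(axis=0)[None, :] - proj**2
        labels = np.argmin(d2, axis=0)
        for j in range(k):
            C = X[:, labels == j]
            if C.shape[1]:
                # re-fit: dominant eigenvector of the cluster scatter
                _, vecs = np.linalg.eigh(C @ C.T)
                W[:, j] = vecs[:, -1]
    return W

# toy data concentrated on two lines (sparse sources: one active at a time)
rng = np.random.default_rng(3)
a1, a2 = np.array([1.0, 0.2]), np.array([0.3, 1.0])
X = np.concatenate([np.outer(a1, rng.laplace(size=300)),
                    np.outer(a2, rng.laplace(size=300))], axis=1)
W = k_line_clustering(X, k=2)
print(np.round(W, 3))  # columns approximate the two line directions (up to order/sign)
```

Each recovered line direction corresponds to one column of the two-dimensional mixing matrix in Fig. 10(f).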


Fig. 12. One of the three recovered sources after applying A+ to s-EMG data recorded at 30% MVC; (a-f) results obtained using the different methods; (g) for comparison, the s-EMG channel in which the chosen source is dominant.

In order to draw statistically more relevant conclusions, we compare the various BSS algorithms on s-EMGs of nine subjects, recorded at 30% MVC. Fig. 12 plots a single extracted source for each BSS algorithm; Fig. 12(g) shows the s-EMG channel in which the dominant source signal is chosen for comparison. One problem with batch comparisons, however, is that separation performance is commonly evaluated by visual inspection or at most by comparison with other separation methods, because the original sources are of course unknown. Plots similar to Fig. 12 cannot be produced for each subject, so in order to provide a more objective measure, we consider the main application of BSS in real s-EMG analysis: preprocessing in order to obtain 'cleaner' data for template matching. A common measure for this is the number of zero-crossings of the sources (after combining them into a full eight-dimensional observation vector by taking only the maximally active source in each channel). Note that this zero-crossing count is already output by the MDZ filter. It is directly related to the number of MUAPs present in an s-EMG signal [46], and by comparing this index before and after applying the BSS algorithms, we can analyze whether and to what extent they actually enhance the signal. With the aid of the MDZ filter, we count the number of waves (the excursions of the signal between two consecutive zero crossings) and take the mean over all channels before and after applying each of the BSS algorithms. We then subtract the latter from the former and divide by the initial number of waves in order to obtain an index that can be compared between different signals. Tab. 4 shows the resulting ratios for the nine subjects. All BSS algorithms result in a reduction of zero-crossings, and the best results per run are achieved by NMF∗, sNMF∗ and SCA. In the mean, all algorithms perform somewhat similarly, with sNMF∗ being best and SCA worst. Hence the best algorithm for this data set, sNMF∗, achieved a mean reduction of the number of waves of 25.8%, meaning that on average roughly one fourth of the waves in each channel are removed, making the template-matching technique easier to apply.

subject ID   JADE    NMF     NMF∗    sNMF    sNMF∗   SCA
b            9%      7%      9%      10%     11%     10%
f            5%      3%      5%      4%      5%      4%
g            37%     35%     35%     35%     36%     30%
k            35%     35%     35%     35%     34%     38%
m            16%     16%     17%     16%     16%     13%
og           9%      8%      8%      8%      8%      10%
ok           3%      7%      9%      9%      10%     7%
s            41%     42%     42%     41%     41%     37%
y            71%     71%     73%     71%     73%     74%
means        25.0%   25.1%   25.7%   25.4%   25.8%   24.6%
std. dev.    21.4%   21.4%   21.3%   20.9%   21.0%   21.4%

Table 4. Negative zero-crossing mean ratios, i.e. relative enhancements, for each subject, together with the mean performance of each algorithm.
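The wave count underlying this index can be sketched as a plain zero-crossing count; the paper uses its MDZ filter for this step, so the function below is a simplified stand-in:

```python
import numpy as np

def count_waves(x):
    """Count 'waves': excursions of the signal between two consecutive
    zero crossings. Simplified stand-in for the MDZ filter's output."""
    x = np.asarray(x, dtype=float)
    s = np.sign(x)
    s[s == 0] = 1                       # treat exact zeros as positive
    crossings = np.flatnonzero(np.diff(s) != 0)
    # each pair of consecutive crossings delimits one complete wave
    return max(len(crossings) - 1, 0)

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 1000, endpoint=False)
clean = np.sin(2 * np.pi * 5 * t)                  # 5 Hz tone, 9 interior crossings
noisy = clean + 0.3 * rng.standard_normal(1000)    # noise adds spurious crossings
print(count_waves(clean))   # 8
print(count_waves(noisy))   # many more waves than the clean signal
```

A successful BSS preprocessing step should bring the wave count of the separated signals closer to that of the clean underlying activity, which is exactly the reduction reported in Tab. 4.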

4 Discussion

The main focus of this work lies in the application of three different sparse BSS models (source independence, sparse nonnegativity, and k-sparseness) to the analysis of s-EMG signals. This application is motivated by the fact that the underlying MUAPTs exhibit properties (mainly sparseness) that fit quite well to these three in principle different models. Furthermore, we are interested in how well the models behave under slightly perturbed initial conditions (ICA, for instance, is known to be quite robust against small errors in the independence assumption) and in how they deal with additional noise. In the first example of artificially created mixtures, we demonstrated that a decomposition analysis using the three models is possible, and gave comparisons over larger data sets. Although the recovered sources are all rather alike, Fig. 5 and Fig. 7, we found that ICA outperformed the other methods with respect to the distance to the real solution, mainly because the artificial sources best fitted the ICA model (this does not hold for real signals, see below), Tab. 2. However, under additive noise of increasing power, sparse NMF turned out to be more robust than the ICA model, Fig. 8. We then applied the BSS methods to real s-EMG data sets, and the three different models yielded surprisingly similar results, although in theory they do not fully overlap. We speculate that this similarity is due to the fact that, as most probably in all applications, the models do not fully hold. This allows the various algorithms only to approximate the model solutions, and hence to arrive at similar solutions, but from different directions. Furthermore, this indicates that the three models look for different properties of the sources, and that these properties are fulfilled to varying extents, see Tab. 3 for numerical details. Comparisons over s-EMG data sets from multiple subjects again confirmed this similarity of performance, where once more sparse NMF slightly outperformed the other algorithms in the mean (in nice correspondence with the noise result from above). Note that the aim of the present work is not the full recovery of a target MUAPT (source signal) in its original form. In fact, as shown previously [18], it suffices to increase the amplitude of a target MUAPT so that on average it lies above the noise and above the levels of the other MUAPTs. Then the interfering signals can be cut with a modified dead-zone filter, thus isolating the target MUAPT. Indeed, all of the employed sparse BSS methods fulfill this requirement, which is confirmed by the decrease in the number of zero-crossings of the separated signals, see Tab. 4.

5 Conclusion

We have compared the effectiveness of various sparse BSS methods for signal decomposition, namely ICA, NMF, sparse NMF and SCA, when applied to s-EMG signals. Surface EMG signals represent an attractive test signal as they approximately fulfill the requirements of all these methods and are of major importance in medical diagnosis and basic neurophysiological research.

All methods, in spite of being based on very different approaches, gave similar results in the case of real data and decomposed them sufficiently well. We therefore suggest using sparse BSS as an important preprocessing tool before applying the common template-matching technique. In terms of algorithm comparison, ICA performed better than the other algorithms in the noiseless case; however, sparse NMF∗ outperformed the other methods when noise was added, and slightly so in the case of multiple real s-EMG recordings. In later work on s-EMG decomposition, we therefore want to focus on properties and possible improvements of sparse NMF regarding parameter choice (which level of sparseness to choose) and signal preprocessing (in order to deal with positive signals). Preprocessing to improve sparseness is currently being studied. In order to better compare methods, we also plan to apply them to artificial sources generated using other available s-EMG generators [47]. Finally, extensions to convolutive mixing situations will have to be analyzed.

Acknowledgements

We would like to thank Dr. S. Rainieri for helpful discussion and K. Maekawa for the first version of the s-EMG generator. This work was partly supported by the Ministry of Education, Culture, Sports, Science, and Technology of Japan (Grant-in-Aid for Scientific Research). G.A.G. is supported by a grant from the same Ministry. F.T. gratefully acknowledges partial financial support by the DFG (GRK 638) and the BMBF (project ‘ModKog’).

References

[1] A. Hyvärinen, J. Karhunen, E. Oja, Independent Component Analysis, John Wiley & Sons, 2001.
[2] A. Cichocki, S. Amari, Adaptive Blind Signal and Image Processing, John Wiley & Sons, 2002.
[3] P. Georgiev, F. Theis, A. Cichocki, Blind source separation and sparse component analysis of overcomplete mixtures, in: Proc. ICASSP 2004, Vol. 5, Montreal, Canada, 2004, pp. 493–496.
[4] P. Georgiev, F. Theis, A. Cichocki, Sparse component analysis and blind source separation of underdetermined mixtures, IEEE Trans. Neural Networks, in press.
[5] D. Lee, H. Seung, Learning the parts of objects by non-negative matrix factorization, Nature 401 (1999) 788–791.
[6] S. Chen, D. Donoho, M. Saunders, Atomic decomposition by basis pursuit, SIAM J. Sci. Comput. 20 (1) (1998) 33–61.
[7] F. Theis, E. Lang, C. Puntonet, A geometric algorithm for overcomplete linear ICA, Neurocomputing 56 (2004) 381–398.
[8] M. Zibulevsky, B. Pearlmutter, Blind source separation by sparse decomposition in a signal dictionary, Neural Computation 13 (4) (2001) 863–882.
[9] M. Lewicki, T. Sejnowski, Learning overcomplete representations, Neural Computation 12 (2) (2000) 337–365.
[10] B. Olshausen, D. Field, Emergence of simple-cell receptive field properties by learning a sparse code for natural images, Nature 381 (6583) (1996) 607–609.
[11] M. D. Plumbley, E. Oja, A 'non-negative PCA' algorithm for independent component analysis, IEEE Transactions on Neural Networks 15 (1) (2004) 66–76.
[12] E. Oja, M. D. Plumbley, Blind separation of positive sources using non-negative PCA, in: Proc. ICA 2003, Nara, Japan, 2003, pp. 11–16.
[13] W. Liu, N. Zheng, X. Lu, Non-negative matrix factorization for visual coding, in: Proc. ICASSP 2003, Vol. III, 2003, pp. 293–296.
[14] P. Hoyer, Non-negative matrix factorization with sparseness constraints, Journal of Machine Learning Research 5 (2004) 1457–1469.
[15] J. Basmajian, C. De Luca, Muscles Alive: Their Functions Revealed by Electromyography, 5th Edition, Williams & Wilkins, Baltimore, 1985.
[16] A. Halliday, S. Butler, R. Paul, A Textbook of Clinical Neurophysiology, John Wiley & Sons, New York, 1987.
[17] D. Farina, R. Merletti, R. Enoka, The extraction of neural strategies from the surface EMG, Journal of Applied Physiology 96 (2004) 1486–1495.
[18] G. García, R. Okuno, K. Akazawa, Decomposition algorithm for surface electrode-array electromyogram in voluntary isometric contraction, IEEE Engineering in Medicine and Biology Magazine, in press.
[19] C. Disselhorst-Klug, J. Silny, G. Rau, Estimation of the relationship between the noninvasively detected activity of single motor units and their characteristic pathological changes by modelling, Journal of Electromyography and Kinesiology 8 (1998) 323–335.
[20] S. Andreassen, A. Rosenfalck, Relationship of intracellular and extracellular action potentials of skeletal muscle fibers, CRC Critical Reviews in Bioengineering 6 (4) (1981) 267–306.
[21] H. Clamann, Statistical analysis of motor unit firing patterns in a human skeletal muscle, Biophysical Journal 9 (1969) 1233–1251.
[22] P. Griep, F. Gielen, H. Boom, K. Boon, L. Hoogstraten, C. Pool, W. Wallinga-De-Jonge, Calculation and registration of the same motor unit action potential, Electroenceph. Clin. Neurophysiol. 53 (1973) 388–404.
[23] E. Stålberg, J. Trontelj, Single Fiber Electromyography, The Mirvalle Press, Old Woking (UK), 1979.
[24] S. Maekawa, T. Arimoto, M. Kotani, Y. Fujiwara, Motor unit decomposition of surface EMG using multichannel blind deconvolution, in: Proc. ISEK 2002, Vienna, Austria, 2002, pp. 38–39.
[25] K. McGill, K. Cummins, L. Dorfman, Automatic decomposition of the clinical electromyogram, IEEE Transactions on Biomedical Engineering 32 (7) (1985) 470–477.
[26] P. Comon, Independent component analysis - a new concept?, Signal Processing 36 (1994) 287–314.
[27] F. Theis, A new concept for separability problems in blind source separation, Neural Computation 16 (2004) 1827–1850.
[28] J. Hérault, C. Jutten, Space or time adaptive signal processing by neural network models, in: J. Denker (Ed.), Neural Networks for Computing, Proceedings of the AIP Conference, American Institute of Physics, New York, 1986, pp. 206–211.
[29] J.-F. Cardoso, A. Souloumiac, Jacobi angles for simultaneous diagonalization, SIAM J. Mat. Anal. Appl. 17 (1) (1995) 161–164.
[30] A. Yeredor, Non-orthogonal joint diagonalization in the least-squares sense with application in blind source separation, IEEE Trans. Signal Processing 50 (7) (2002) 1545–1553.
[31] A. Ziehe, P. Laskov, K.-R. Müller, G. Nolte, A linear least-squares algorithm for joint diagonalization, in: Proc. ICA 2003, Nara, Japan, 2003, pp. 469–474.
[32] G. García, K. Maekawa, K. Akazawa, Decomposition of synthetic multi-channel surface-electromyogram using independent component analysis, in: Proc. ICA 2004, Vol. 3195 of Lecture Notes in Computer Science, Granada, Spain, 2004, pp. 985–991.
[33] S. Takahashi, Y. Sakurai, M. Tsukada, Y. Anzai, Classification of neural activities from tetrode recordings using independent component analysis, Neurocomputing 49 (2002) 289–298.
[34] A. Hyvärinen, E. Oja, A fast fixed-point algorithm for independent component analysis, Neural Computation 9 (1997) 1483–1492.
[35] P. Paatero, U. Tapper, Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values, Environmetrics 5 (1994) 111–126.
[36] D. Lee, H. Seung, Algorithms for non-negative matrix factorization, in: Advances in Neural Information Processing Systems (Proc. NIPS 2000), Vol. 13, MIT Press, 2000, pp. 556–562.
[37] F. Theis, P. Georgiev, A. Cichocki, Robust overcomplete matrix recovery for sparse sources using a generalized Hough transform, in: Proc. ESANN 2004, d-side, Evere, Belgium, Bruges, Belgium, 2004, pp. 343–348.
[38] P. Bradley, O. Mangasarian, k-plane clustering, Journal of Global Optimization 16 (1) (2000) 23–32.
[39] C. De Luca, Physiology and mathematics of myoelectric signals, IEEE Transactions on Biomedical Engineering 26 (6) (1979) 313–325.
[40] C. De Luca, R. LeFever, M. McCue, A. Xenakis, Control scheme governing concurrently active human motor units during voluntary contractions, Journal of Physiology 329 (1982) 129–142.
[41] S. Amari, A. Cichocki, H. Yang, A new learning algorithm for blind signal separation, Advances in Neural Information Processing Systems 8 (1996) 757–763.
[42] D. Xu, J. Principe, J. Fisher III, H.-C. Wu, A novel measure for independent component analysis (ICA), in: Proc. ICASSP 1998, Vol. 2, Seattle, 1998, pp. 1161–1164.
[43] D. Tjøstheim, Measures of dependence and tests of independence, Statistics 28 (1996) 249–282.
[44] R. Duda, P. Hart, D. Stork, Pattern Classification, 2nd Edition, Wiley, New York, 2001.
[45] F. Theis, S. Amari, Postnonlinear overcomplete blind source separation using sparse sources, in: Proc. ICA 2004, Vol. 3195 of Lecture Notes in Computer Science, Granada, Spain, 2004, pp. 718–725.
[46] P. Zhou, Z. Erim, W. Rymer, Motor unit action potential counts in surface electrode array EMG, in: Proc. IEEE EMBS 2003, Cancun, Mexico, 2003, pp. 2067–2070.
[47] B. Freriks, H. Hermens, European Recommendations for Surface Electromyography: Results of the SENIAM Project, Roessingh Research and Development b.v. (CD-ROM), 1999.