Multiresolution Representations and Data Mining of Neural Spikes for Brain-Machine Interfaces

Sung-Phil Kim1, Jose M. Carmena2,3, Miguel A. Nicolelis2,3,4, Jose C. Principe1

1Computational NeuroEngineering Laboratory, University of Florida, Gainesville, FL 32611
2Department of Neurobiology, 3Center for Neuroengineering, 4Department of Biomedical Engineering, Duke University, Durham, NC 27710

Abstract—In brain-machine interface (BMI) applications, neural firing activity has been represented by spike counts within a fixed-width time bin. Adaptive models have been designed to map these bin counts onto the associated behavior, which is typically 2D or 3D hand movement. However, the representation of the firing activity can be enriched by binning neural spikes at multiple time scales based on multiresolution analysis. This multiresolution representation of neural activity can provide more accurate prediction of the hand movement parameters. Data mining techniques must be applied to models using the multiresolution representation in order to avoid overfitting. In this paper, we demonstrate that the multiresolution representation improves the performance of the linear model for BMIs compared to the model with fixed-width time bins.

I. INTRODUCTION

The main function of the signal processing module in brain-machine interfaces (BMIs) is to learn mappings between neural activity patterns and motor parameters. To represent the neural activity, most experimental designs use an estimate of the local mean firing rate of neurons. The local mean firing rate has been estimated by binning neural spikes with a non-overlapping sliding time window whose length ranges from 50ms to 100ms [1-5]. These representations of the firing rate have been used to model the relationship with the corresponding motor parameters. Adaptive models (including the simple Wiener filter [6]) based on this estimate have predicted motor parameters with correlation coefficients between 0.6 and 0.8. It has also been shown that all the proposed models reach roughly the same basic performance level for a target reaching task, which may not be sufficient for more demanding real applications [7]. This led us to explore different representations of neural activity to help improve the accuracy of the models. One possibility is to estimate the firing rate with multi-scale binning, which can be referred to as multiresolution analysis of neural spike trains [8]. In this analysis, both the short-time and the large-scale neural activity can be represented. Multi-scale binning may disclose information that is not available with fixed-width binning.

With the multiresolution representation of neural spikes, however, the dimensionality of the inputs to the mapping model increases considerably. The input variables are also likely to be collinear, which increases the condition number of the input covariance matrix. Therefore, regularization methods are appropriate to avoid poor generalization. Among the regularization techniques used in data mining, a method based on the L1-norm penalty is particularly suitable, since it is able to select input variables. It also enables us to understand the structure of the neural activity by selecting the neurons most correlated with the associated behavior. In this paper, we present the multiresolution analysis of neural spikes and the modeling of these representations using L1-norm based regularization.

II. MULTIRESOLUTION ANALYSIS OF NEURAL SPIKES

The multiresolution representation of a neural spike train can be computed via the wavelet transform. To facilitate the computation, we apply the discrete wavelet transform with dyadic Haar wavelets. The dyadic Haar wavelet underlies the à trous wavelet transform, which can be implemented very efficiently in hardware [9]. The smallest scale should be larger than 1ms because of the refractory period of a neuron. The largest scale should not exceed 1sec, since it has been reported that neural activity up to 1sec in the past is correlated with the associated behavior [1]. In our model, we select eight scales starting at 5ms¹ with dyadic scaling: 5, 10, 20, 40, 80, 160, 320, and 640ms. With this set of scales, the wavelet transform is computed on each neuronal spike train. The Haar wavelet transform can be regarded as multi-scale binning: at any given time, neural spikes are binned with different time window widths. The multi-scale binning is repeated at each time instance at a sampling rate of 200Hz, so the time windows slide with overlap except for the 5ms bin. Larger scales, with wider overlaps, therefore represent smoother temporal firing patterns. This procedure is illustrated in Fig. 1. Fig. 2 shows an example of the wavelet transform coefficients (multi-scale binned data) for one particular neuron. The coefficients (for 5sec of data) are presented along with the associated hand movement trajectories in the bottom panel. To present the association between the wavelet coefficients and the movement, the temporal pattern of the wavelet coefficients at each scale is plotted on top of the hand position trajectory (x-coordinate) in Fig. 3. The figure shows that temporal patterns at larger scales seem to be more correlated with hand trajectories than those at smaller scales.

¹ The minimum scale (5ms) was determined empirically as the smallest scale for which binning yielded time series significantly different from the raw spike trains.
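As a rough sketch of the multi-scale binning just described (illustrative Python, not the authors' implementation; the spike times below are made up), each neuron's spike train is counted in the eight dyadic windows ending at the current time instance:

```python
import numpy as np

def multiscale_bin_counts(spike_times_ms, t0_ms,
                          scales_ms=(5, 10, 20, 40, 80, 160, 320, 640)):
    """Spike counts in causal windows of the eight dyadic widths ending at t0."""
    spikes = np.asarray(spike_times_ms)
    return np.array([np.sum((spikes > t0_ms - w) & (spikes <= t0_ms))
                     for w in scales_ms])

# Slide the windows at 200 Hz (one feature vector every 5 ms), as in the text.
spike_times = np.array([3.1, 7.4, 12.0, 55.2, 301.7, 633.9])  # hypothetical
features = np.vstack([multiscale_bin_counts(spike_times, t0)
                      for t0 in np.arange(5.0, 1000.0, 5.0)])
print(features.shape)  # (time instances, scales)
```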

Fig. 1. An illustration of the multiresolution representation of a spike train. At a given time instance t0, the spikes are binned with eight different widths (5ms to 640ms). The number in each bin denotes the spike count.

Fig. 2. The wavelet transforms of a neuronal spike train. The top plot presents the wavelet coefficients at eight scales (5ms to 640ms); darker pixels represent coefficients of larger magnitude. The bottom plots show the corresponding x- (solid) and y- (dotted) coordinates of the hand position (top) and velocity (bottom).

Fig. 3. The temporal trajectory of neural spike counts with multi-scale binning, plotted along with the x-coordinate of the hand position. The solid line depicts the wavelet coefficients and the dotted line the hand trajectory. Note that each coefficient trace is normalized to fit within the dynamic range of the hand trajectory for visualization purposes.

III. DATA MINING OF MULTIRESOLUTION REPRESENTATION

The multiresolution representation of the firing activity from a large number of neurons leads to difficulties in model training due to: 1) a very large model (many degrees of freedom), and 2) collinearity between inputs. For instance, with over one hundred neurons and eight scales, the input dimension of a model reaches 1000. This leads us to employ data mining to select both neurons and resolutions for model fitting. A popular data mining technique is to add an Lp-norm penalty to the mean squared error (MSE) cost function,

J(w) = E[e²] + λ Σᵢ |wᵢ|^p    (1)

where e denotes the error, λ is the regularization parameter, and wᵢ is a model parameter. p is typically set to 1 or 2: p=2 leads to ridge regression [10] in statistics, or weight decay [11] in neural networks, while p=1 leads to the LASSO (Least Absolute Shrinkage and Selection Operator) [12]. In a Bayesian framework, p=2 corresponds to a Gaussian prior in parameter space and p=1 to a Laplacian prior [13]. It has been shown that the L1-norm penalty generates sparse models and thus provides better generalization. The L1-norm penalty also enables the selection of variables, which is useful for the analysis of the data. Hence, we choose to regularize models with the L1-norm penalty in the multiresolution representation. The LASSO can be implemented by the least angle regression (LAR) algorithm recently proposed by Efron et al. [14]. The LAR procedure solves the minimization of cost function (1) with much less computational complexity than the original LASSO algorithm.
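As a toy illustration (not from the paper) of why the L1 penalty is preferred here, the sketch below fits ridge (p=2) and LASSO (p=1) regressions to synthetic, partly collinear data with scikit-learn; only the L1-penalized model drives most irrelevant weights exactly to zero, which is what enables variable selection:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 40))
X[:, 1] = X[:, 0] + 0.01 * rng.standard_normal(500)      # collinear pair
w_true = np.zeros(40)
w_true[[0, 5, 9]] = [1.5, -2.0, 0.7]                      # only 3 relevant inputs
y = X @ w_true + 0.1 * rng.standard_normal(500)

ridge = Ridge(alpha=1.0).fit(X, y)    # p = 2: shrinks weights, rarely zeros them
lasso = Lasso(alpha=0.05).fit(X, y)   # p = 1: sets most irrelevant weights to 0

print(np.sum(np.abs(ridge.coef_) < 1e-8))  # exact zeros under the L2 penalty
print(np.sum(np.abs(lasso.coef_) < 1e-8))  # exact zeros under the L1 penalty
```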

The LAR procedure starts with all coefficients at zero. The input variable most correlated with the desired response is selected first, and we proceed in the direction of that variable with a step size chosen so that some other input variable achieves as much correlation with the current residual as the first. We then move in the equiangular direction between these two inputs until a third input reaches the same correlation. This procedure is repeated until either all input variables have joined the selection or the sum of the absolute parameters reaches a preset threshold. In this way, the maximum correlation between the inputs and the residual decreases at each selection step.

IV. BRAIN-MACHINE INTERFACE DESIGN

In this section, we present the BMI design based on the multiresolution representation of neural spikes and a linear model learned by the LAR algorithm. We first demonstrate the analysis of neural activity in BMIs based on variable selection. Then, the generalization performance of prediction models using the multiresolution representation is compared with that of a model using fixed-width binning.

Data Descriptions

The neural recordings were collected in Dr. Nicolelis' primate laboratory at Duke University. Microwire electrode arrays [5] were chronically implanted in the dorsal premotor cortex (PMd), supplementary motor area (SMA), primary motor cortex (M1, both hemispheres), and primary somatosensory cortex (S1) of an adult female monkey (Macaca mulatta) performing a target hitting task. The task involved the presentation of a randomly placed target on a computer monitor in front of the monkey. The monkey used a hand-held manipulandum (joystick) to move the computer cursor so that it intersected the target. While the monkey performed the motor task, the hand position was recorded at 50Hz in real time along with the corresponding neural activity from multiple channels [5], and was then upsampled to 200Hz using cubic spline interpolation. After spike detection and sorting [5], a total of 185 units/multiunits were extracted in this particular recording session. The distribution of these neurons over cortical areas is summarized in Table I (numerical indices label individual neurons). A 320-sec dataset was used for training and the consecutive 240-sec dataset for testing.

Multiresolution-Based Modeling and Analyses

Each neural spike train was binned at multiple scales from 5ms to 640ms. The binned data at each scale were normalized to have zero mean and a maximum magnitude of 1, so that the model does not become biased toward the larger-scale inputs. The 185 neurons with 8 scales yielded an input dimension of 1480. The multiresolution binning procedure applied to the 320-sec training dataset generated an input data matrix X (64000x1480), where each row represents the input vector at a given time instance.
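A minimal sketch of this normalization step (illustrative only; the array name and shapes are assumptions based on the numbers quoted above): per-neuron, per-scale bin counts are flattened into columns, centered, and scaled to a maximum magnitude of 1:

```python
import numpy as np

def build_design_matrix(counts):
    """counts: array of shape (T, n_neurons, n_scales) holding the multi-scale
    bin counts at each of T time instances (e.g. T = 64000, n_neurons = 185,
    n_scales = 8). Returns the (T, n_neurons * n_scales) design matrix X with
    zero-mean columns whose maximum magnitude is 1."""
    T = counts.shape[0]
    X = counts.reshape(T, -1).astype(float)   # flatten neuron x scale features
    X -= X.mean(axis=0)                       # zero mean per column
    peak = np.abs(X).max(axis=0)
    peak[peak == 0] = 1.0                     # guard against silent channels
    X /= peak                                 # maximum magnitude of 1
    return X
```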

TABLE I
DISTRIBUTION OF NEURONS IN CORTICAL AREAS

Area             PMd    M1      S1       SMA      M1_ipsilateral
Neuron indices   1~66   67~123  124~161  162~180  181~185
Number of units  66     57      38       19       5

Then, a linear model was built to predict the desired response vector d (64000x1; the x- or y-coordinate of hand position or velocity) as a linear combination of X,

d = d̂ + e = Xw + e    (2)

where w is the model weight vector and e is the error vector. The desired responses were normalized to have zero mean. The weight vector w was learned by the LAR algorithm. To determine the L1-norm constraint, we used hold-out cross-validation [15]: the last 10% of the training data was held out as a validation set, and the threshold was chosen to minimize the MSE on this validation set. The LAR algorithm stopped learning when the L1-norm of the weights reached this threshold.

The LAR algorithm selected a different subset of input variables for each desired response (there were four responses: the x- and y-coordinates of hand position and of velocity). The cortical distributions of the numbers of selected neurons are summarized in Table II. Fig. 4 shows the selection results for each desired response; the black pixels denote the selected variables, with neuron indices along the x-axis and scales along the y-axis. These graphs show that LAR tends to select the larger scales, since the temporal trajectories of the larger scales exhibited more correlation with the movement, as shown in Fig. 3.

Model Comparison with Fixed-Width Binning

We designed two linear models with different input data. In the first linear model, a fixed-width time window of length 80ms was used for binning. The binned data for each neuron were then embedded with a tapped delay line of five delays (six taps). These binning and embedding procedures yielded 185×6=1110 input variables. The same delay line (five delays) was applied to each scale of the multiresolution data in the second model, yielding eight times as many input variables (185×8×6=8880) as the first model. The LAR algorithm was used in both models to estimate the weights for 2D hand position and velocity. The number of nonzero weights after training is listed in Table III. It is noteworthy that the weights for the multiresolution data were pruned much more heavily than in the single-resolution case.
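A possible sketch of this fitting procedure, assuming X and d are NumPy arrays and substituting scikit-learn's lars_path for the authors' LAR implementation; the 10% hold-out and the MSE-based choice of stopping point mirror the description above:

```python
import numpy as np
from sklearn.linear_model import lars_path

def fit_lar_with_holdout(X, d, val_fraction=0.1):
    """Compute the LASSO path by least angle regression and keep the weight
    vector that minimizes MSE on the held-out last block of the training data.
    Assumes the columns of X and d are already zero-mean (no intercept)."""
    n_val = int(len(d) * val_fraction)
    X_tr, d_tr = X[:-n_val], d[:-n_val]      # first 90% for fitting the path
    X_val, d_val = X[-n_val:], d[-n_val:]    # last 10% for validation

    _, _, coef_path = lars_path(X_tr, d_tr, method='lasso')

    # Validation MSE at every step of the path; keep the sparsest best model.
    val_mse = np.mean((X_val @ coef_path - d_val[:, None]) ** 2, axis=0)
    best = np.argmin(val_mse)
    return coef_path[:, best], val_mse[best]
```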

TABLE II
THE NUMBER OF SELECTED NEURONS FROM EACH AREA

             PMd       M1        S1        SMA       M1_ips.
Position-X   18 (27%)  27 (47%)   9 (24%)   7 (37%)   2 (40%)
Position-Y   20 (30%)  26 (39%)  15 (39%)   7 (37%)   0 (0%)
Velocity-X   50 (76%)  46 (81%)  30 (79%)  15 (79%)   5 (100%)
Velocity-Y   40 (61%)  42 (74%)  30 (79%)  13 (79%)   3 (60%)

Percentages give the ratio of selected neurons to the total number of neurons in each area.

TABLE III
THE NUMBER OF NONZERO PARAMETERS

              Position-X    Position-Y    Velocity-X    Velocity-Y
Single-scale  568 (51.1%)   675 (60.8%)   359 (32.3%)   379 (34.1%)
Multi-scale   344 (3.9%)    449 (5.1%)    379 (4.3%)    580 (6.5%)

Percentages give the ratio of nonzero parameters to the total number of parameters.

Fig. 4. The distribution of the selected input variables for (a) the x-coordinate and (b) the y-coordinate of position, and (c) the x-coordinate and (d) the y-coordinate of velocity. Black pixels denote the selected inputs. Neuron indices are aligned along the x-axis and scales along the y-axis.

We used two performance measures: the correlation coefficient (CC) and the normalized MSE. The CC lies in [-1, 1], where 0 indicates no correlation and 1 indicates perfect correlation between the desired response and the model output. The normalized MSE was computed by dividing the MSE by the power of the desired signal. These measures were evaluated on the test dataset to assess generalization performance. This analysis showed superior performance of the linear model with multiresolution data compared to the one with fixed-width binned data, as summarized in Table IV. To assess the statistical significance of the difference in performance between the two models, we performed a one-tailed t-test based on the MSE, as proposed in [7]. The null hypothesis of no difference between the model performances was rejected at the 0.05 and 0.01 significance levels.
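Both figures of merit are simple to compute; a short sketch (not from the paper), assuming the desired response and model output are 1-D NumPy arrays:

```python
import numpy as np

def correlation_coefficient(d, y):
    """Pearson correlation between the desired response d and model output y."""
    return np.corrcoef(d, y)[0, 1]

def normalized_mse(d, y):
    """MSE divided by the power of the desired signal."""
    return np.mean((d - y) ** 2) / np.mean(d ** 2)
```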

V. CONCLUSIONS

In this paper, we have introduced a multiresolution representation of neural firing activity in BMIs based on the discrete wavelet transform. This representation builds a richer input feature space for the linear prediction model than the traditional fixed-width time window binning. The empirical results show that the multiresolution-based linear model slightly improves generalization performance compared to the single-resolution model. We have also proposed using a data mining technique to reduce the dimension of the multiresolution input space, which grows linearly with the number of scales. The LAR algorithm, which selects variables using the L1-norm penalty, effectively regularized the linear model. Furthermore, this variable selection has enabled interesting analyses of the neural activity related to behavior. Further studies will pursue this topic.

ACKNOWLEDGEMENT

This work was supported by DARPA sponsored grant #ONR-450595112 to MALN and by the CRPF to JMC.

REFERENCES