Temporal Pattern Classification with Spiking Neural Networks [1]
Alex Rowbottom, André Grüning, Brian Gardner
Department of Computing, University of Surrey, UK
5. Audio to Spike Encoding Process
Create a method of audio-to-spike pattern conversion – We develop a method of converting speech into spike patterns based on Fourier transforms, which serve as inputs to a spiking neural network.
Classify speech – We run classification experiments on real speech data using the network, exploring both single- and multi-spike classification methods to assess the potential of such a system.
These images are taken from processing the spoken word ‘zero’.
Neuron model – We define the total membrane potential u(t) at time t using an adaptation of Pfister's spike response neuron model [2][3]. We use a deterministic rather than stochastic model for this system: when the post-synaptic potential u(t) is equal to or greater than the firing threshold ϑ, the neuron fires a spike.
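As an illustration, this deterministic threshold mechanism can be sketched as follows (a minimal sketch only: the kernel constants ε0, τm, τs and the threshold value are placeholder assumptions, not the values used in our experiments):

```python
import numpy as np

def psp_kernel(s, tau_m=10.0, tau_s=2.5, eps0=1.0):
    """Double-exponential post-synaptic potential kernel, zero for s < 0.
    All constants here are illustrative placeholders."""
    s = np.asarray(s, dtype=float)
    k = eps0 / (tau_m - tau_s) * (np.exp(-s / tau_m) - np.exp(-s / tau_s))
    return np.where(s >= 0, k, 0.0)

def membrane_potential(t, weights, input_spikes):
    """u(t) = sum over inputs j of w_j * sum over spikes f of kernel(t - t_j^f)."""
    u = 0.0
    for w, spikes in zip(weights, input_spikes):
        u += w * psp_kernel(t - np.asarray(spikes)).sum()
    return u

def fires(t, weights, input_spikes, threshold=0.1):
    """Deterministic rule: a spike is emitted when u(t) >= threshold."""
    return membrane_potential(t, weights, input_spikes) >= threshold
```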
Tempotron Learning [3] – The Tempotron learning rule trains the network to fire only on the output node associated with the input's class; for a successful classification the other output nodes must remain silent. The change in the weight of synapse j is defined as follows:

\[
\Delta w_j = \lambda \sum_{t_i^f < t_{\max}} \frac{\epsilon_0}{\tau_m - \tau_s}\left[\exp\left(-\frac{t_{\max} - t_i^f}{\tau_m}\right) - \exp\left(-\frac{t_{\max} - t_i^f}{\tau_s}\right)\right] \tag{2}
\]

where the t_i^f are the input spike times, t_max is the time of maximal post-synaptic potential, and the sign of the update depends on the type of error (a missed spike or an erroneous spike).
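The Tempotron update can be sketched as follows (a minimal, hypothetical implementation; the learning rate and kernel constants are placeholders, and the error-dependent sign convention is noted in the comments):

```python
import numpy as np

def tempotron_dw(input_spikes, t_max, lam=0.01,
                 tau_m=10.0, tau_s=2.5, eps0=1.0):
    """Tempotron weight change for each input neuron j:
    dw_j = lam * sum over spikes t < t_max of the PSP kernel at (t_max - t).
    Use +lam when the neuron should have fired but stayed silent,
    and -lam when it fired but should have stayed silent."""
    dw = np.zeros(len(input_spikes))
    for j, spikes in enumerate(input_spikes):
        for t in spikes:
            s = t_max - t
            if s > 0:  # only input spikes before t_max contribute
                dw[j] += lam * eps0 / (tau_m - tau_s) * (
                    np.exp(-s / tau_m) - np.exp(-s / tau_s))
    return dw
```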
Unlike the Tempotron, the Pfister rule can classify multiple classes with just one neuron, by using a different target spike-train configuration per class. We implement a deterministic version as follows:

\[
\Delta w_j = \lambda \left( \sum_f \mathrm{PSP}_j(\tilde{t}_0^{\,f}) - \sum_f \mathrm{PSP}_j(t_0^f) \right) \tag{3}
\]

where the first sum runs over the target output spike times and the second over the actual output spike times.
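This deterministic update can be sketched in the same way (again with placeholder constants; `pfister_dw` and its argument names are illustrative, not from the paper):

```python
import numpy as np

def psp(s, tau_m=10.0, tau_s=2.5, eps0=1.0):
    """Double-exponential PSP kernel, zero for s <= 0 (placeholder constants)."""
    s = np.asarray(s, dtype=float)
    k = eps0 / (tau_m - tau_s) * (np.exp(-s / tau_m) - np.exp(-s / tau_s))
    return np.where(s > 0, k, 0.0)

def pfister_dw(pre_spikes, target_spikes, actual_spikes, lam=0.01):
    """dw_j = lam * (sum of PSP_j at the target output spike times
                     - sum of PSP_j at the actual output spike times)."""
    dw = []
    for spikes in pre_spikes:
        spikes = np.asarray(spikes)
        pot = sum(psp(t0 - spikes).sum() for t0 in target_spikes)
        dep = sum(psp(t0 - spikes).sum() for t0 in actual_spikes)
        dw.append(lam * (pot - dep))
    return np.array(dw)
```

When the actual output train already matches the target train, the two sums cancel and the weights stop changing.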
3 - Threshold and scale – We set each data point to either the minimum or maximum amplitude, reducing each frequency/time point to a binary value. We also remove approximately the top 10,000 Hz, as these frequencies hold barely any vocal information.
4 - Convert to spikes – We take each frequency bin as an input neuron, and each binary point as a spike, leaving a spike train for each input neuron. To give more effective inputs, we thin out runs of subsequent spikes and balance the low and high frequency bins to mimic the logarithmic nature of human hearing.
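Steps 3 and 4 can be sketched as follows (a toy illustration only: the mid-point threshold, the 10 kHz cut-off handling and the thinning factor are assumptions, and real speech spectrograms replace the toy array):

```python
import numpy as np

def spectrogram_to_spikes(spec, freqs, times, fmax=10_000.0, thin=2):
    """Binarise a magnitude spectrogram and read out one spike train per
    frequency bin (input neuron). `freqs` and `times` are NumPy arrays."""
    # 3 - Threshold and scale: drop bins above fmax (little vocal
    #     information), then snap each point to a binary value using an
    #     assumed mid-point threshold.
    keep = freqs <= fmax
    spec = spec[keep]
    binary = spec >= spec.mean()
    # 4 - Convert to spikes: each 'on' time bin becomes a spike; thin
    #     consecutive spikes to every `thin`-th one.
    trains = []
    for row in binary:
        ts = times[row]
        trains.append(ts[::thin])
    return trains
```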
We train the network using a set of male spoken digits from 0 to 9, each converted into a series of spike trains and fed through a single-layer spiking neural network. We then test the network's ability to differentiate between varying numbers of digits (classes), achieving the following results:
Tempotron – The Tempotron rule performed exceptionally well with the classification of up to four classes, and reasonably well for five and six classes. For noisy, variable data such as speech, using only one layer of neurons, this is an impressive result.
For these experiments we use feedforward single-layer configurations: a layer of input neurons connected to one or more output neurons.
Classes   Accuracy
2         100.00%
3         100.00%
4         100.00%
5          85.33%
6          81.36%
7          61.90%
Pfister – Although on the surface it looks like Tempotron learning performed better, we must remember that the Pfister rule has a much harder task, as it uses only a single neuron. We find that three target spikes are optimal when differentiating between two classes, and two spikes when differentiating between three or four.
Single Output Neuron
We use multiple output neurons for Tempotron learning, where we assign each to a specific class. We train the network to fire on the output node associated with the input’s class, whilst the others must remain silent.
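This one-output-neuron-per-class scheme gives a simple readout, which might look like the following (an illustrative helper, assuming each output neuron reports whether it fired on the trial):

```python
def predict_class(fired):
    """fired[k] is True if output neuron k spiked. A trial counts as a
    correct, unambiguous classification only when exactly one neuron
    fired; otherwise no class is assigned."""
    hits = [k for k, f in enumerate(fired) if f]
    return hits[0] if len(hits) == 1 else None  # None = ambiguous/silent
```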
6. Results
4. Network Configurations
Multiple Output Neurons
We use a single output neuron for Pfister learning, matching the output spike train to the closest generated target train for each class using the van Rossum distance metric. These target spike trains can have multiple spikes, and part of this research investigates how the number of spikes affects learning.
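The van Rossum matching can be sketched as follows (each train is filtered with a causal exponential kernel and the traces are compared in L2; the time step, horizon and time constant here are assumptions):

```python
import numpy as np

def van_rossum_distance(train_a, train_b, tau=10.0, dt=0.1, t_end=100.0):
    """Convolve each spike train with a causal exponential kernel and
    return the L2 distance between the filtered traces."""
    t = np.arange(0.0, t_end, dt)

    def filtered(train):
        f = np.zeros_like(t)
        for s in train:
            f += np.where(t >= s, np.exp(-(t - s) / tau), 0.0)
        return f

    diff = filtered(train_a) - filtered(train_b)
    return np.sqrt(np.sum(diff ** 2) * dt / tau)

# The stored target train with the smallest distance to the output
# train then determines the predicted class.
```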
Pfister Learning [2][3] – The Pfister rule is able to optimise the probability of outputting a spike train with precise target spike times.
3. Spiking Network Models
2 - Perform short-time Fourier transforms – We take very small windows of the clip and transform each from the time domain to the frequency domain. We then join these windows to produce a spectrogram, where red represents high amplitude in a frequency bin at a given time, and blue represents low amplitude.
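Step 2 can be sketched with a plain windowed FFT (a minimal short-time Fourier transform; the window length, hop size and the synthetic test tone are illustrative, not our experimental settings):

```python
import numpy as np

def stft_magnitude(signal, fs, win=256, hop=128):
    """Short-time Fourier transform magnitudes: rows are frequency bins,
    columns are time windows (the spectrogram)."""
    window = np.hanning(win)
    frames = []
    for start in range(0, len(signal) - win + 1, hop):
        seg = signal[start:start + win] * window
        frames.append(np.abs(np.fft.rfft(seg)))
    spec = np.array(frames).T
    freqs = np.fft.rfftfreq(win, d=1.0 / fs)
    return spec, freqs

# Example: a 1 kHz tone sampled at 16 kHz should peak near the 1 kHz bin.
fs = 16_000
t = np.arange(fs // 4) / fs
spec, freqs = stft_magnitude(np.sin(2 * np.pi * 1000 * t), fs)
```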
Here we propose a biologically-inspired method for audio-to-spike encoding and the recognition of speech using spiking neural networks. We train the network with sets of spoken words and test it with different sets.
References
The process by which auditory neurons encode and learn to recognise speech signals is still an open question. Humans learn to differentiate between even very slight variations of tone, identifying words with ease, whilst current machine speech-recognition methods either perform poorly in comparison or are biologically implausible.