Converting an Image into Spike Trains for a Spiking Neural Network

A spiking neural network (SNN) is one whose artificial neurons model biological neurons more closely than the neurons in a traditional artificial neural network (ANN) do. An SNN accepts a stream of 0s and 1s called a spike train, whereas an ANN accepts floating-point values such as 0.3827.

Suppose you want to create a neural classifier for images. Suppose the image is 28 by 28 = 784 pixels and each pixel value is a number (or possibly three numbers representing red, green, blue). If you use a traditional ANN and the image is grayscale, you can just create a network with 784 input nodes, one per pixel. An RGB image would need 3 * 784 = 2,352 input nodes.
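To make the ANN case concrete, here is a minimal sketch of turning a 28 x 28 grayscale image into 784 input values. The function name `flatten_image`, the toy image, and the scaling of each pixel to the range 0.0 to 1.0 are my assumptions for illustration; scaling is a common preprocessing choice, not something the ANN strictly requires.

```python
ROWS, COLS = 28, 28

def flatten_image(image):
    """Flatten a ROWS x COLS grid of 0..255 ints into a flat list of
    floats in [0.0, 1.0], one value per ANN input node."""
    return [px / 255.0 for row in image for px in row]

# Hypothetical image: all background (0) except one dark pixel (255).
image = [[0] * COLS for _ in range(ROWS)]
image[14][14] = 255

inputs = flatten_image(image)
print(len(inputs))  # 784
```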

But if you want to use an SNN, it’s not entirely obvious how to represent the image input data as a spike train. A well-known benchmark dataset for image recognition is the MNIST (Modified National Institute of Standards and Technology) handwritten digit database. The dataset consists of 70,000 images, each 28 x 28 pixels, and each showing a single handwritten digit (0 to 9). Each pixel is a grayscale intensity value between 0 = 00h (no stroke) and 255 = FFh (dark stroke). The dataset is divided into a 60,000-item set for training (roughly 6,000 instances of each digit) and a 10,000-item set for testing (roughly 1,000 instances of each digit). An example of a ‘5’ digit is shown below.


This is the first image in the MNIST images training data.

One approach for representing an MNIST image using spike trains is to create a set of 28 * 28 = 784 spike train sequences, one per pixel, where each sequence has a firing rate that is proportional to the pixel value, as demonstrated in my blog post at https://jamesmccaffreyblog.com/2019/12/13/generating-artificial-spike-trains-for-spiking-neural-networks/. For example, pixels with small values will be assigned a small firing rate and generate mostly 0 values in their associated spike train. Pixel values that are moderate in magnitude, such as 126 = 7Eh, will be assigned a moderate firing rate and generate roughly 50% spiking 1 values and 50% non-spiking 0 values. Pixel values that are large, such as 255 = FFh, will be assigned a large firing rate and generate spike trains that are mostly 1 values.
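The rate-based idea can be sketched in a few lines of Python. This is only an illustration under a simple assumption: at each time step a pixel spikes independently with probability pixel / 255. It is not necessarily the technique used in the linked post, and the function names `pixel_to_spike_train` and `image_to_spike_trains` are made up for this example.

```python
import random

def pixel_to_spike_train(pixel, n_steps, rng):
    """Return n_steps 0/1 values. At each step the pixel spikes (1)
    with probability pixel / 255 (an assumed rate-coding scheme)."""
    p = pixel / 255.0
    return [1 if rng.random() < p else 0 for _ in range(n_steps)]

def image_to_spike_trains(pixels, n_steps, seed=0):
    """Return one spike train per pixel; a full MNIST image gives 784."""
    rng = random.Random(seed)
    return [pixel_to_spike_train(px, n_steps, rng) for px in pixels]

# Three sample pixel values: background, mid-gray, and dark stroke.
pixels = [0, 126, 255]
trains = image_to_spike_trains(pixels, n_steps=20)
for px, train in zip(pixels, trains):
    print(f"{px:>3}: {''.join(map(str, train))}")
```

With this scheme a pixel value of 0 never spikes and a value of 255 spikes at every step, while 126 produces a random mix in which about half the values are 1.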

Because the resulting spike trains have a random component, you’d need to create very long spike trains for each pixel, rather than trains of just a handful of values, so that the observed proportion of 1 values reliably reflects the pixel’s firing rate. This is loosely analogous to the way the retina of a person with their eyes open continuously processes a stream of input information.
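A quick experiment shows why length matters. The sketch below assumes the same per-step spiking probability of pixel / 255 as a stand-in for "firing rate proportional to pixel value", and the helper name `empirical_rate` is hypothetical. It measures the fraction of 1 values in trains of different lengths for a pixel value of 126.

```python
import random

def empirical_rate(pixel, n_steps, rng):
    """Generate n_steps random spikes with P(spike) = pixel / 255
    and return the observed fraction of 1 values."""
    p = pixel / 255.0
    spikes = sum(1 for _ in range(n_steps) if rng.random() < p)
    return spikes / n_steps

rng = random.Random(1)
target = 126 / 255.0
for n_steps in (10, 100, 10_000):
    est = empirical_rate(126, n_steps, rng)
    print(f"{n_steps:>6} steps: observed rate {est:.3f} (target {target:.3f})")
```

Short trains can land far from the target rate purely by chance, while the observed rate of a long train settles close to it.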



Three images that were used for the covers of 1960s-era science fiction magazines by artist Ed Emshwiller (1925 – 1990). He won four Hugo awards for science fiction art. It’s unlikely that anyone in the 1960s could have predicted the image recognition technologies that exist today.
