Neural Anomaly Detection Using Keras

I wrote an article titled “Neural Anomaly Detection Using Keras” in the March 2019 issue of Visual Studio Magazine. See https://visualstudiomagazine.com/articles/2019/03/01/neural-anomaly-detection-using-keras.aspx.

Anomaly detection, also called outlier detection, is the process of finding rare items in a dataset. Examples include finding fraudulent login events and fake news items. Anomaly detection is often extremely difficult, and there are many different techniques you can employ.

Demo run showing an anomalous ‘2’ data item

In my article, I demonstrate a neural network-based approach. The idea is to create a neural autoencoder for a set of data. An autoencoder learns to predict its input, or, put another way, an autoencoder replicates data. The main principle of the technique presented in the article is that the more difficult a data item is to replicate, the more likely it is to be an anomaly.
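
To make the principle concrete, here is a minimal NumPy sketch, not the code from the article. It assumes the error measure is the per-item mean squared difference between an item and its reconstruction; the function names and the toy data are made up for illustration.

```python
import numpy as np

def reconstruction_errors(originals, reconstructions):
    """Per-item mean squared error between each item and its reconstruction."""
    diff = np.asarray(originals, dtype=float) - np.asarray(reconstructions, dtype=float)
    return np.mean(diff ** 2, axis=1)

def most_anomalous(originals, reconstructions, top_n=1):
    """Indices of the top_n items with the largest reconstruction error."""
    errors = reconstruction_errors(originals, reconstructions)
    return np.argsort(errors)[::-1][:top_n]

# Toy data standing in for real images: three items with four values each.
originals = np.array([[0.0, 0.5, 1.0, 0.5],
                      [1.0, 1.0, 0.0, 0.0],
                      [0.2, 0.2, 0.2, 0.2]])

# Pretend autoencoder output: the middle item is replicated poorly.
reconstructions = np.array([[0.0, 0.5, 0.9, 0.5],
                            [0.5, 0.5, 0.5, 0.5],
                            [0.2, 0.2, 0.2, 0.3]])

errors = reconstruction_errors(originals, reconstructions)
worst = most_anomalous(originals, reconstructions)
print(errors, worst)
```

The item with the largest error is the one flagged as the most likely anomaly.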

The article uses the well-known MNIST image dataset. Each data item has 784 pixel values (a 28 x 28 grid) representing a hand-drawn image of a digit from ‘0’ to ‘9’. For simplicity, instead of the entire 60,000 training items, I used just 1,000 images.
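
A rough sketch of how such a 1,000-item subset might be prepared is shown here. It is not the article's code. It assumes raw pixel values from 0 to 255 scaled to [0, 1] and 100 images per digit (the per-digit count is only tentatively confirmed in the comments below), and it uses random numbers in place of the real MNIST files. The helper names are hypothetical.

```python
import numpy as np

def normalize_pixels(images):
    """Scale raw 0-255 pixel values to floats in [0.0, 1.0]."""
    return np.asarray(images, dtype=np.float32) / 255.0

def balanced_subset(images, labels, per_class):
    """Keep the first per_class items of each label, grouped by label."""
    labels = np.asarray(labels)
    keep = np.concatenate(
        [np.flatnonzero(labels == c)[:per_class] for c in np.unique(labels)])
    return images[keep], labels[keep]

# Random stand-in for MNIST: 5,000 fake images of 784 pixels each.
rng = np.random.default_rng(0)
raw_images = rng.integers(0, 256, size=(5000, 784))
raw_labels = np.arange(5000) % 10  # 500 fake items per digit

x, y = balanced_subset(normalize_pixels(raw_images), raw_labels, per_class=100)
print(x.shape, np.bincount(y))
```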

A typical, non-anomalous ‘2’ image.

So, there are several parts to the puzzle. First, the raw data has to be prepared. Next, you must define a neural autoencoder. I did so using the Keras code library, which is a wrapper over the difficult-to-use TensorFlow library. Finally, you must define a metric that measures the discrepancy between a predicted output and the actual data item it is supposed to replicate.
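
For readers who have not seen one, a small autoencoder in Keras might look roughly like the sketch below. This is not the article's code: the hidden layer sizes (100, 50, 100), the tanh and sigmoid activations, the Adam optimizer, and the mean squared error loss are all assumptions chosen for illustration, and random data stands in for the prepared MNIST items.

```python
import numpy as np
from tensorflow import keras

def build_autoencoder(n_inputs=784, hidden=(100, 50, 100)):
    """Dense autoencoder whose output layer has the same size as its input."""
    model = keras.Sequential()
    model.add(keras.Input(shape=(n_inputs,)))
    for units in hidden:
        model.add(keras.layers.Dense(units, activation="tanh"))
    # Sigmoid keeps each output in [0, 1], matching normalized pixel values.
    model.add(keras.layers.Dense(n_inputs, activation="sigmoid"))
    model.compile(optimizer="adam", loss="mean_squared_error")
    return model

# Random stand-in for normalized image data.
rng = np.random.default_rng(0)
x = rng.random((32, 784)).astype("float32")

model = build_autoencoder()
# The training target is the input itself: the network learns to replicate it.
model.fit(x, x, epochs=1, batch_size=8, verbose=0)
recon = model.predict(x, verbose=0)
print(recon.shape)
```

The reconstructions in `recon` are what the error metric compares against the original items.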

Anomaly detection using a deep neural autoencoder is not a well-known technique. An advantage of using a neural technique compared to a standard clustering technique is that neural techniques can handle non-numeric data by encoding that data. Most clustering techniques depend on a distance measure which means the source data must be strictly numeric.
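
As one example of encoding non-numeric data, the sketch below uses one-hot encoding. The post does not say which encoding is meant, so this is an assumption, and the `one_hot` helper and the color values are made up for illustration.

```python
import numpy as np

def one_hot(values, categories):
    """Encode each categorical value as a vector with a single 1.0."""
    index = {c: i for i, c in enumerate(categories)}
    encoded = np.zeros((len(values), len(categories)), dtype=np.float32)
    for row, value in enumerate(values):
        encoded[row, index[value]] = 1.0
    return encoded

colors = ["red", "blue", "green", "blue"]
encoded = one_hot(colors, categories=["red", "green", "blue"])
print(encoded)
```

Each encoded row can then be fed to a neural network alongside ordinary numeric inputs.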

Three images from an Internet search for “Dickens London”. Left: unknown setting and artist. Center: London Bridge under construction (Joseph Josiah Dodd). Right: unknown setting and artist but looks to me like it’s from a video game.

3 Responses to Neural Anomaly Detection Using Keras

sau001 says:

March 7, 2019 at 9:53 am

Nice one. I have a few questions:
1) What is the size of the feature vector that was presented to the Autoencoder? Was it 784 or did you apply some feature engineering?

2) How many training samples did you use for the digit 2? You mentioned 1000 training images. Does that mean 100 images of the digit 2?

- jamesdmccaffrey says:
  
  March 8, 2019 at 2:53 pm
  
  1.) Yes, I used 784 as the input size. With neural techniques, feature engineering is normally not needed because the network weight and bias values will adjust to give less weight to less relevant inputs.
  2.) Yes, the training data had 100 of each digit. (I think — I did this several weeks ago. But I can’t imagine any reason why I’d not use 100 of each digit.)
  
  - sau001 says:
    
    March 10, 2019 at 3:22 am
    
    Cheers
    