I Track Down a Nasty Bug in a CNTK Classification Program

I’m a big fan of the CNTK library. But . . .

CNTK is a powerful library of machine learning code. Whenever I look at a machine learning library (such as TensorFlow, Keras, Caffe, Theano, scikit-learn, or Torch), my first example is usually a neural network classifier for Fisher’s Iris Dataset. The dataset has 150 items, where each item represents one of three species of iris (setosa, versicolor, virginica). Each item has four predictor variables (sepal length and width, and petal length and width), and there are 50 items of each species.

Version 2.0 of CNTK was released in June 2017, and I had very little trouble getting a nice NN classifier up and running. But when version 2.1 was released, my demo program no longer worked. Odd. Then, a couple of days ago, CNTK version 2.2 was released, and my NN classification program that had worked perfectly on v2.0 still didn’t work.

When I say “didn’t work”, I mean that the program ran but it just didn’t learn. After training, the classification accuracy was 0.3333, or one-third. Because there are only three species, in equal proportions, 33% accuracy means the classifier was doing no better than guessing.

I spent hours and hours trying to track down the problem. Eventually I determined that the problem occurred in this code:

import cntk as C

print("Creating a 4-10-3 tanh softmax NN for Iris data ")
with C.layers.default_options(init=C.initializer.glorot_uniform()):
  hLayer = C.layers.Dense(hidden_dim, activation=C.ops.tanh,
    name='hidLayer')(input_Var)
  oLayer = C.layers.Dense(output_dim, activation=C.ops.softmax,
    name='outLayer')(hLayer)
nnet = oLayer

This code defines the NN architecture. It may look a bit tricky, but it’s fairly simple, and I spent most of my time looking at other parts of the code, especially the data reading routines.

It turns out that the problem was in the init = glorot_uniform() initialization. When creating a NN you must initialize all the weights and biases to small random values. The Glorot technique is one of many supported by CNTK.
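To make the idea concrete, here is a minimal NumPy sketch of the standard Glorot (Xavier) uniform formula. This is just the textbook recipe, not CNTK’s actual internal implementation, and the function name and seed are my own choices:

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, seed=1):
    # Glorot/Xavier uniform: draw weights from [-limit, +limit] where
    # limit = sqrt(6 / (fan_in + fan_out)). Scaling by the layer's
    # fan-in and fan-out keeps activation variance roughly stable
    # across layers.
    rng = np.random.default_rng(seed)
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

# weight matrix for the 4-10 input-to-hidden layer of the Iris network
w = glorot_uniform(4, 10)
```

For the 4-10 layer, the limit works out to sqrt(6/14), roughly 0.65, so Glorot weights can start noticeably larger than those from a fixed small-range uniform initializer.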

Well, after hours of trying almost everything else, I finally replaced the initializer, and everything magically worked:

import cntk as C

print("Creating a 4-10-3 tanh softmax NN for Iris data ")
with C.layers.default_options(init=C.initializer.uniform(scale=0.10, seed=1)):
  hLayer = C.layers.Dense(hidden_dim, activation=C.ops.tanh,
    name='hidLayer')(input_Var)
  oLayer = C.layers.Dense(output_dim, activation=C.ops.softmax,
    name='outLayer')(hLayer)
nnet = oLayer

The uniform initializer is the simplest and most primitive initialization technique.
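For comparison, a plain uniform initializer with scale=0.10 just draws every weight from a fixed range, ignoring layer sizes entirely. Again, this is a NumPy sketch of the general technique, not CNTK’s internal code:

```python
import numpy as np

def uniform_init(shape, scale=0.10, seed=1):
    # Plain uniform init: every weight drawn from [-scale, +scale],
    # regardless of the layer's fan-in or fan-out.
    rng = np.random.default_rng(seed)
    return rng.uniform(-scale, scale, size=shape)

# weight matrix for the 4-10 input-to-hidden layer of the Iris network
w = uniform_init((4, 10))
```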

I don’t know if there is a bug in the CNTK code, or if there was just a design change in the Glorot code.

The moral of the story is that using a machine learning code library usually saves a lot of time, but debugging library code is much more difficult than debugging raw code.


“Discovery” – by Sandra Bauser
