Neural Network Library Training Algorithms

For people who are new to neural network libraries such as Keras, CNTK, PyTorch, and TensorFlow, selecting a training algorithm can be a bit confusing. All the libraries support the five main algorithms: stochastic gradient descent (SGD), Adagrad, Adadelta, Adam, and RMSprop. But there are many other algorithms that are variations of the basic five.

Before I go any further, let me summarize rules of thumb for the main five. Use SGD for networks with a single hidden layer. Don't use Adagrad except to replicate somebody else's result. Use Adadelta or RMSprop for recurrent neural networks. Use Adam for general deep neural networks. These rules of thumb are only a starting point; every problem requires a lot of experimentation.
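To make the first rule of thumb concrete, here is a minimal sketch of plain SGD, the simplest of the five, written in library-free Python. It minimizes a toy one-parameter function f(w) = (w - 3)^2 rather than a real network loss, and the function and learning rate are my own illustrative choices, not anything from a particular library.

```python
def sgd_step(w, grad, lr=0.1):
    # Plain stochastic gradient descent update:
    # move the weight opposite to the gradient, scaled by the learning rate.
    return w - lr * grad

# Toy problem: minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w = 0.0
for _ in range(100):
    w = sgd_step(w, 2.0 * (w - 3.0))

print(w)  # converges toward the minimum at w = 3
```

The real library implementations add options on top of this core update (momentum, Nesterov acceleration, weight decay), but the update rule above is the heart of SGD.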

Every algorithm has many parameters, which makes the total number of possible trainer configurations astronomically large — far too large to try them all. So training a neural network really is part art and part science, where the art mostly means educated guesses based on experience.
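To see why the configuration count explodes, consider Adam, which already has four tunable parameters (learning rate, two decay rates, and an epsilon term) before any library-specific options are added. Below is a library-free sketch of the Adam update rule on the same kind of toy one-parameter problem; the parameter defaults shown are the commonly published ones, and the toy loss function is my own illustrative choice.

```python
import math

def adam_minimize(grad_fn, w=0.0, steps=1000,
                  lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam keeps running averages of the gradient (m) and the
    # squared gradient (v), bias-corrects both, and scales the step.
    # Four hyperparameters here, and that's before momentum variants,
    # weight decay, AMSGrad, etc.
    m, v = 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)   # bias correction
        v_hat = v / (1.0 - beta2 ** t)
        w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

# Toy problem: minimize f(w) = (w - 3)^2, gradient 2 * (w - 3).
w = adam_minimize(lambda w: 2.0 * (w - 3.0))
print(w)  # settles near the minimum at w = 3
```

Even this bare-bones version has four knobs; multiply that across five base algorithms and their variants, and exhaustive search over trainers quickly becomes infeasible.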

I scanned through the libraries’ documentation and put together a chart. One of the problems with libraries is that they often have too much functionality. This is mostly a matter of psychology. If the developers of one library implement a feature, developers of the other libraries feel obliged to do the same.



Four paintings by German artist Carl Spitzweg (1808–1885). His work strikes me as a combination of (mostly) art and (some) science.
