I’ve been slowly but surely implementing core neural network functionality using the JavaScript language. My most recent exploration was modifying a back-propagation function so that it used momentum. The motivation is to speed up training. The idea is best explained using code. Here’s a snippet of the training code:
// update input-to-hidden weights
for (let i = 0; i < this.ni; ++i) {
  for (let j = 0; j < this.nh; ++j) {
    const delta = -1.0 * lrnRate * ihGrads[i][j];  // plain gradient step
    this.ihWeights[i][j] += delta;
    this.ihWeights[i][j] += momRate * ihPrevWtsDelta[i][j];  // momentum term
    ihPrevWtsDelta[i][j] = delta; // save delta for next time
  }
}
This code updates ihWeights[i][j], the weight that connects input node [i] to hidden node [j]. Each weight is incremented by a quantity delta, which is -1 times the gradient for the i-j weight times a small value such as 0.01, the learning rate. Adding delta moves the weight in the direction that decreases the error of the network, at least when the learning rate is small enough. (The gradients are computed earlier, and that's the tricky part.)
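The snippet relies on class fields (this.ni, this.nh, this.ihWeights), so it can't run on its own. Here is a minimal standalone sketch of the same update rule, assuming the weights, gradients, and previous deltas are plain 2-D arrays. The function name updateWeightsWithMomentum and the sample numbers are hypothetical, not part of the original code.

```javascript
// Hypothetical standalone version of the input-to-hidden update above.
// weights, grads, and prevDeltas are ni x nh arrays of numbers;
// weights and prevDeltas are modified in place.
function updateWeightsWithMomentum(weights, grads, prevDeltas, lrnRate, momRate) {
  const ni = weights.length;
  const nh = weights[0].length;
  for (let i = 0; i < ni; ++i) {
    for (let j = 0; j < nh; ++j) {
      const delta = -1.0 * lrnRate * grads[i][j];  // plain gradient step
      weights[i][j] += delta;
      weights[i][j] += momRate * prevDeltas[i][j]; // momentum term
      prevDeltas[i][j] = delta;                    // save plain delta for next call
    }
  }
}

// Usage: 2 input nodes, 1 hidden node, made-up gradients.
const wts = [[0.5], [-0.3]];
const grads = [[2.0], [-1.0]];
const prev = [[0.0], [0.0]];
updateWeightsWithMomentum(wts, grads, prev, 0.01, 0.6); // prev is all zeros, so no momentum yet
updateWeightsWithMomentum(wts, grads, prev, 0.01, 0.6); // now the momentum term contributes
console.log(wts);
```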
Momentum adds a second quantity to each weight: the value of the previous delta times a small value such as 0.6, the momentum rate. The effect is that as long as successive updates move a weight in the same direction, each update is larger than delta alone, so the weight moves toward a good value faster.
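To see what the momentum term does to the size of each update, the sketch below applies the same rule to a single weight whose gradient is held constant. That is an artificial assumption, since real gradients change every iteration, and the helper name stepSizes is hypothetical. Because the code saves the plain delta rather than the full step, the change grows from delta on the first update to (1 + momRate) times delta on later updates, then levels off.

```javascript
// Returns the total change applied to one weight at each of n iterations,
// using the same update rule as the snippet above with a constant gradient.
function stepSizes(grad, lrnRate, momRate, n) {
  const steps = [];
  let prevDelta = 0.0;
  for (let k = 0; k < n; ++k) {
    const delta = -1.0 * lrnRate * grad;      // plain gradient step
    const step = delta + momRate * prevDelta; // total change to the weight
    steps.push(step);
    prevDelta = delta;                        // saved value excludes the momentum term
  }
  return steps;
}

// Without momentum every step would be -0.01.
console.log(stepSizes(1.0, 0.01, 0.6, 4));
```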
Eventually, a weight will typically overshoot its optimal value; the updates then change sign and move the weight back. But experience has shown that this process speeds up training in the long run.
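The overshoot is easy to reproduce on a toy problem. The sketch below assumes a single weight with error E(w) = (w - 1)^2, so the gradient is 2(w - 1) and the best value is 1. The learning rate 0.2 and momentum rate 0.9 are deliberately large, illustrative assumptions chosen so the effect appears within a few iterations; the function name trainOneWeight is hypothetical.

```javascript
// Minimizes E(w) = (w - target)^2 with the update rule from the snippet above
// and returns the weight value after each iteration.
function trainOneWeight(lrnRate, momRate, iters) {
  const target = 1.0;
  let w = 0.0;
  let prevDelta = 0.0;
  const history = [];
  for (let k = 0; k < iters; ++k) {
    const grad = 2.0 * (w - target);  // dE/dw
    const delta = -1.0 * lrnRate * grad;
    w += delta;
    w += momRate * prevDelta;         // momentum term (zero when momRate is 0)
    prevDelta = delta;
    history.push(w);
  }
  return history;
}

const plain = trainOneWeight(0.2, 0.0, 20);
const withMom = trainOneWeight(0.2, 0.9, 20);
console.log("plain max:   ", Math.max(...plain));   // approaches 1 from below
console.log("momentum max:", Math.max(...withMom)); // goes past 1, then comes back
```

Setting momRate to 0.0 turns the same function into plain gradient descent, which is why one function serves both runs.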
Suppose you're blindfolded and are trying to reach a goal that is, unknown to you, 100 feet ahead. You could take one step and check to see if you're at the goal yet, then take another step, and so on until you get close enough to the goal. This is like regular back-propagation training.
Or you could take one step and determine that you're not at the goal yet. Next you take two steps, then four steps, then eight steps, and so on. Eventually (rather quickly) you would overshoot the goal that's 100 feet ahead, and then you'd turn around and go back until you reach it. This is like momentum. Your progress might look something like:
at 0, take 1 step, at 1
at 1, take 2 steps, at 3
at 3, take 4 steps, at 7
at 7, take 8 steps, at 15
at 15, take 16 steps, at 31
at 31, take 32 steps, at 63
at 63, take 64 steps, at 127
at 127, go back 32 steps, at 95
at 95, take 4 steps, at 99
close enough, stop.
This is highly simplified, and just an analogy of course, but it should help explain how momentum speeds up training.
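A rough way to check the speed-up claim is to count iterations on a toy error function, E(w) = (w - 1)^2, using the learning rate 0.01 and momentum rate 0.6 mentioned earlier. The starting weight 0, the stopping rule (within 0.01 of the best value), and the helper name itersToConverge are all assumptions for illustration; on a real network the gain will differ.

```javascript
// Counts the iterations needed to bring one weight within tol of the optimum
// of E(w) = (w - 1)^2, using the update rule from the snippet above.
// Returns -1 if the weight does not get that close within maxIters.
function itersToConverge(lrnRate, momRate, tol, maxIters) {
  const target = 1.0;
  let w = 0.0;
  let prevDelta = 0.0;
  for (let k = 1; k <= maxIters; ++k) {
    const delta = -1.0 * lrnRate * 2.0 * (w - target); // -lrnRate * dE/dw
    w += delta + momRate * prevDelta;                  // gradient step plus momentum term
    prevDelta = delta;
    if (Math.abs(w - target) < tol) return k;
  }
  return -1;
}

console.log("without momentum:", itersToConverge(0.01, 0.0, 0.01, 10000));
console.log("with momentum:   ", itersToConverge(0.01, 0.6, 0.01, 10000));
```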

Four famous pinball machines. "Baffle Ball" (1931) – First widely successful machine, no flippers. "Humpty Dumpty" (1947) – First machine with flippers. "Knock Out" (1950) – Very high quality machine for its time. "Creature from the Black Lagoon" (1992) – Considered by fans to be one of the 50 best modern machines.
