I have implemented quite a few different types of neural networks using several different programming languages. The other day I took a close look at implementing a neural network with dropout training, using JavaScript.
Dropout is a technique that is used during NN training, which is intended to reduce model overfitting. The idea of the technique is quite simple but implementation is a bit tricky. To implement dropout, on each training item (or mini-batch of items), randomly select approximately one-half of the hidden nodes and then act as if the selected nodes just aren’t there. Put slightly differently, during training you drop nodes, not by physically removing them, but by ignoring their presence.
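The selection step can be sketched roughly as follows. This is a minimal illustration rather than the demo's actual code: the function name makeDropMask, the rng parameter, the 0.5 default, and the guard against dropping every node are all assumptions made for the example.

```javascript
// Hypothetical helper: returns an array of length nh where 1 marks a
// hidden node to drop for the current training item and 0 marks a node to keep.
function makeDropMask(nh, dropProb = 0.5, rng = Math.random) {
  const mask = new Array(nh).fill(0);
  for (let j = 0; j < nh; ++j) {
    if (rng() < dropProb) {
      mask[j] = 1; // each node is dropped independently
    }
  }
  // Guard (an assumption, not from the article): never drop every node,
  // otherwise no signal would reach the output layer for this item.
  if (nh > 0 && mask.every((m) => m === 1)) {
    mask[Math.floor(rng() * nh)] = 0;
  }
  return mask;
}

// A fresh mask would be generated for each training item (or mini-batch).
const mask = makeDropMask(8);
console.log(mask);
```

Because each node is chosen independently, the number of dropped nodes varies from item to item and is only approximately one-half on average.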
Why dropout sometimes creates a better NN prediction model is not fully understood but there are two key ideas. First, by randomly dropping hidden nodes, the hidden nodes can’t rely on the presence of other hidden nodes (such reliance is called “co-adaptation”). Second, by randomly dropping nodes, you are generating many different NN sub-models which are, in effect, averaged together.
In my demo code, I used the most straightforward approach possible. There are two functions that use hidden nodes – the forward-pass function (often called eval() or compute() or similar), and the backward pass training function. Such code always looks something like:
// compute hidden node values
for (let j = 0; j < this.nh; ++j) {
  for (let i = 0; i < this.ni; ++i) {
    hSums[j] += this.iNodes[i] * this.ihWeights[i][j];
  }
  hSums[j] += this.hBiases[j];
  this.hNodes[j] = hyperTan(hSums[j]);
}
So, to pretend that a particular hidden node is not there, you can maintain a list of the indices of nodes that have been picked as drop-nodes, and then modify the normal non-dropout code to this:
// compute hidden node values
for (let j = 0; j < this.nh; ++j) {
  if (useDropOut == true && this.dNodeIndices[j] == 1) {
    continue;
  }
  for (let i = 0; i < this.ni; ++i) {
    hSums[j] += this.iNodes[i] * this.ihWeights[i][j];
  }
  hSums[j] += this.hBiases[j];
  this.hNodes[j] = hyperTan(hSums[j]);
}
This approach is relatively simple, although there are several details to deal with.
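Two of those details can be illustrated with a stand-alone sketch. It is not the demo code: the function computeHidden and its parameters are hypothetical, and the evaluation-time scaling by the keep probability is the common dropout convention, assumed here rather than taken from the demo. The first detail is that a dropped node should hold 0 so that it adds nothing to the output-layer sums. The second is that when the trained network is used with all nodes present, the hidden values are usually scaled down to compensate.

```javascript
// Hypothetical stand-alone version of the hidden-layer computation.
// dropMask[j] === 1 means hidden node j is dropped for this training item.
// Pass null for dropMask when evaluating a trained network.
function computeHidden(iNodes, ihWeights, hBiases, dropMask, dropProb = 0.5) {
  const nh = hBiases.length;
  const hNodes = new Array(nh).fill(0.0);
  for (let j = 0; j < nh; ++j) {
    if (dropMask !== null && dropMask[j] === 1) {
      continue; // hNodes[j] stays 0, so it adds nothing downstream
    }
    let sum = hBiases[j];
    for (let i = 0; i < iNodes.length; ++i) {
      sum += iNodes[i] * ihWeights[i][j];
    }
    hNodes[j] = Math.tanh(sum);
    if (dropMask === null) {
      // Evaluation: every node is present, so scale by the keep probability
      // (assumed convention) to match the average signal seen in training.
      hNodes[j] *= 1.0 - dropProb;
    }
  }
  return hNodes;
}

const x = [1.0, 2.0];
const ihW = [[0.1, 0.2], [0.3, 0.4]];
const hB = [0.0, 0.0];
console.log(computeHidden(x, ihW, hB, [1, 0])); // training, node 0 dropped
console.log(computeHidden(x, ihW, hB, null));   // evaluation, all nodes scaled
```

An alternative with the same effect is to scale the kept nodes up by 1 / (1 - dropProb) during training and leave evaluation untouched.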
Apart from implementing dropout, there are also a couple of questions related to using dropout. Based on my experience, when using dropout you must often use more hidden nodes and also use more training iterations. When dropout works, sometimes the prediction accuracy on the training data is a bit worse, but prediction accuracy on the test data (which is the ultimate goal) is a bit better. Finally, based on my experience, using dropout doesn’t always help, and sometimes produces a worse NN model than not using dropout.

Internet search for people who dropped out of school. Successful college dropouts: Bill Gates, Paul Allen, Michael Dell, David Neeleman (JetBlue), Steve Jobs, Steven Spielberg, Mark Zuckerberg. Well-known high school dropouts: Gisele Bundchen (model), Hilary Swank (actress), Katy Perry (music), Jessica Simpson (music), Cameron Diaz (actress), Paris Hilton (TV), Jennifer Lawrence (actress).

Puh, I love Dropout, DropConnect and ResidualNets. One good approach is to tie the drop to the number of neurons in each layer, so that layers with few neurons don’t get big drops. If u visualize the drops, u can find the problems much more easily.
Take DropConnect: if u have no control and u drop all connections of one neuron, u create a ghost with a net input of 0. Activated with the sigmoid, the result is a troublesome 0.5 value in the network, but this is a problem of DropConnect only.
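The ghost-neuron effect is easy to reproduce in a few lines of JavaScript. This is only a sketch under assumptions: the helper neuronOutput and the connMask convention are made up for the illustration, and the bias is left out so that the net input is exactly 0 when every connection is dropped.

```javascript
const sigmoid = (x) => 1.0 / (1.0 + Math.exp(-x));

// Output of one neuron under a DropConnect-style mask:
// connMask[i] === 1 means the connection from input i is dropped.
function neuronOutput(inputs, weights, connMask, activation) {
  let net = 0.0; // bias omitted (assumption) to keep the example small
  for (let i = 0; i < inputs.length; ++i) {
    if (connMask[i] === 1) continue;
    net += inputs[i] * weights[i];
  }
  return activation(net);
}

const inputs = [0.8, -0.3, 0.5];
const weights = [0.4, 0.9, -0.7];
const allDropped = [1, 1, 1];

console.log(neuronOutput(inputs, weights, allDropped, sigmoid));   // 0.5
console.log(neuronOutput(inputs, weights, allDropped, Math.tanh)); // 0
```

With tanh the fully disconnected neuron outputs 0, so in this sketch the problem only shows up for activations that are non-zero at 0, such as the sigmoid.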
My approach is a one-index NN, better described as an “all is a perceptron” concept, so u can control every drop (layer = i, neuron = j, weight = m).
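int u[] = {3,5,5,5,2};        // neurons per layer: input, three hidden, output
int input = u[0];
int dnn = ArraySize(u) - 1;   // -1: the i loop reads u[i+1], so it must stop one layer early
for(int i=0, j=0, t=0, w=0; i < dnn; i++, t+=u[i-1], w+=u[i-1]*u[i])
  for(int k=0; k < u[i+1]; k++, j++){
    if(dropped[j]) continue;  // skip a dropped neuron
    double net = bias[j];
    for(int n=t, m=w+k; n < t+u[i]; n++, m+=u[i+1])
      net += neuron[n] * weight[m];
    neuron[j+input] = activation(net, actFunc[i]);  // or actFunc[j] for a per-neuron activation
  }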
The code is a little bit tricky; just ignore the black coding magic in the i loop. I used it for a short demo, without helper step arrays, but it worked in my old dead language MQL4.
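For readers who prefer JavaScript, the same single-index idea can be sketched with explicit step arrays instead of the loop-increment trick. This is a loose port under assumptions, not the commenter's code: buildOffsets and feedForward are hypothetical names, one activation function is used for every layer, and a dropped neuron is explicitly set to 0 instead of being left unchanged.

```javascript
// Start offsets of each layer in the flat neuron and weight arrays.
function buildOffsets(u) {
  const ust = [0]; // neuron steps, e.g. [0,3,8,13,18,20] for u = [3,5,5,5,2]
  const wst = [0]; // weight steps, e.g. [0,15,40,65,75]
  for (let n = 0; n < u.length; ++n) ust.push(ust[n] + u[n]);
  for (let n = 1; n < u.length; ++n) wst.push(wst[n - 1] + u[n - 1] * u[n]);
  return { ust, wst };
}

// Feed-forward over flat arrays. bias and dropped are indexed by j, which
// counts non-input neurons; neuron is indexed by j + u[0].
function feedForward(u, neuron, weight, bias, dropped, activation) {
  const { ust, wst } = buildOffsets(u);
  const input = u[0];
  let j = 0;
  for (let i = 0; i < u.length - 1; ++i) {
    for (let k = 0; k < u[i + 1]; ++k, ++j) {
      if (dropped[j]) {
        neuron[j + input] = 0.0; // assumption: a dropped neuron outputs 0
        continue;
      }
      let net = bias[j];
      // Weights from layer i into target k sit u[i+1] apart.
      for (let n = ust[i], m = wst[i] + k; n < ust[i + 1]; ++n, m += u[i + 1]) {
        net += neuron[n] * weight[m];
      }
      neuron[j + input] = activation(net);
    }
  }
  return neuron;
}

const u = [3, 5, 5, 5, 2];
const { ust, wst } = buildOffsets(u);
const neuron = new Array(ust[ust.length - 1]).fill(0.0);
neuron[0] = 1.0; neuron[1] = 0.5; neuron[2] = -0.5; // input values
const weight = new Array(wst[wst.length - 1]).fill(0.1);
const bias = new Array(neuron.length - u[0]).fill(0.0);
const dropped = new Array(neuron.length - u[0]).fill(false);
dropped[1] = true; // drop the second neuron of the first hidden layer
feedForward(u, neuron, weight, bias, dropped, Math.tanh);
console.log(neuron.slice(ust[4])); // the two output values
```

Precomputing the offsets costs two small arrays but makes the inner loop bounds readable, which is the trade-off the original snippet avoids.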
Hope u like the idea behind it. With Dropout we can also take the dropped neurons out of the backprop if we multiply with 0.
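The multiply-by-0 remark can be read as follows; this is an interpretation, and the helper updateOutgoingWeights is a hypothetical name used only for this sketch. A weight update is proportional to the output of the source neuron, so if a dropped neuron's output is stored as 0, the updates for its outgoing weights vanish without any extra branching.

```javascript
// Hypothetical update of the weights leaving one source neuron.
// downstreamGradients[k] is the gradient at target neuron k.
function updateOutgoingWeights(weights, downstreamGradients, sourceOutput, learnRate) {
  return weights.map(
    (w, k) => w - learnRate * downstreamGradients[k] * sourceOutput
  );
}

const w = [0.2, -0.4];
const grads = [0.5, -0.3];

// Active neuron with output 0.8: the weights move.
console.log(updateOutgoingWeights(w, grads, 0.8, 0.1));

// Dropped neuron with output 0: the weights stay exactly as they were.
console.log(updateOutgoingWeights(w, grads, 0.0, 0.1)); // [ 0.2, -0.4 ]
```

The gradient flowing back into the dropped neuron itself would need the same treatment, for example by multiplying it with a 0/1 keep flag.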
I’ll look this code over when I get some free time, but it looks interesting. Like you, I’ve examined different implementation strategies for dropout and have discovered (as your comment suggests) that although dropout is very simple in concept, implementing dropout is very, very tricky — in the sense that there are many alternative approaches. JM
I hope you’ll think a little bit about how to build the backprop; thinking it through cost me months. Take a look at the picture, that’s how I explain the process in the code to myself.
https://photos.app.goo.gl/2fvCnnE5Hpshbi2W6
You can see how the code works, but with some lag because of my old laptop.
https://youtu.be/jZgb3-W7BpQ
Green stands for the FF, red for BP, gold is our delta (“weight update”).
FF = take the products and the bias and activate.
BP = calc gradient and update weights.
Keep in mind that we don’t need a netinput array; a single variable is enough for summing up in FF and BP. We can control every activation through the actFunc list. The outputs should not be activated, so we have a clear process for the moment.
FF -> error -> BP -> update…
One thing to understand is the j index. The bias array (or a netinput array, if we want to use one) can leave out the inputs, but the neuron index is built with them. In the picture you see both indices, neuron in green and gradient in red; you can also use the gradient index as the netinput or bias index.
So I hope you got the FF. If we use the max trick with Softmax we get the maxPosition too, and this tweak leads to just one calculation for the cross-entropy.
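One way to read the max trick, sketched in JavaScript: the largest raw output is subtracted before exponentiating, which keeps exp() from overflowing, and the position of that largest value falls out of the same pass. The names softmaxWithMax and crossEntropy are hypothetical, and the reading of “one calculation” as “one log term for a one-hot target” is an assumption.

```javascript
// Numerically stable softmax: subtract the largest value before exp().
// Also returns the index of the largest value (the predicted class).
function softmaxWithMax(outputs) {
  let maxPos = 0;
  for (let k = 1; k < outputs.length; ++k) {
    if (outputs[k] > outputs[maxPos]) maxPos = k;
  }
  const exps = outputs.map((z) => Math.exp(z - outputs[maxPos]));
  const sum = exps.reduce((a, b) => a + b, 0.0);
  return { probs: exps.map((e) => e / sum), maxPos };
}

// For a one-hot target only the target class contributes to the
// cross-entropy, so a single log is needed.
function crossEntropy(probs, targetIndex) {
  return -Math.log(probs[targetIndex]);
}

const result = softmaxWithMax([1.0, 3.0, 2.0]);
const loss = crossEntropy(result.probs, 1);
console.log(result.maxPos, result.probs, loss);
```

Without the subtraction, raw outputs around 1000 would already make Math.exp return Infinity.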
Keep in mind that the code is meant for every common language in the C family. That’s why some decisions are not what you might expect.
So, backprop: what I show you here is a raw, clean version without dropout, because if you want to understand it well, the basics are important.
int nns = 0, wnn = 0, dnn = ArraySize(u)-1; // i have to correct the first init, its -1
for(int n=0;n<dnn+1;n++)ust[n+1] = nns += u[n]; // nn steps {0,3,8,13,18,20}
for(int n=1;n<dnn+1;n++)wst[n] = wnn += u[n-1] * u[n]; // weight steps {0,15,40,65,75}
ArrayResize(ust,cnn+2); ArrayResize(wst,cnn+1);
for(int i=dnn, j=nns-1, m=wnn-1; i != 0; i--)
for(int k=1; k ust[i-1]; w-=u[i])
delta[w] = weight[w] + (-eps * gra * neuron[n]);
gradient[j-input] = gra;
}
//--- update weights and bias
Those were the essentials. There are more details, and I have a nice parity pattern for the drop, but for now that’s enough. For me you are a living legend, so this is what I want to give back to you, and maybe it helps make a better world. 🙂
TK
Sorry, something got left out of the code:
for(int i=dnn, j=nns-1, m=wnn-1; i != 0; i--)
for(int k=1; k ust[i-1]; w-=u[i])
delta[w] = weight[w] + (-eps * gra * neuron[n]);
gradient[j-input] = gra;
}