I gave a talk on fundamentals of neural networks recently. My approach is to use a combination of pictures plus code. When I hit the part about initializing weights and biases, I mentioned that initialization is surprisingly important, but I didn’t have time to go into details.
The most common approach for weight initialization is to use uniform random values in some fixed range, for example [-0.01, +0.01], but several deep learning libraries now use Glorot initialization as the default. I coded up a program to demonstrate.
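A minimal sketch of the fixed-range approach, assuming a NumPy-based implementation (the function name and seed are my own, not from the post):

```python
import numpy as np

def init_uniform(n_in, n_out, lo=-0.01, hi=0.01, seed=0):
    # weights drawn uniformly from [lo, hi]; biases start at 0.0
    rng = np.random.default_rng(seed)
    W = rng.uniform(lo, hi, size=(n_in, n_out))
    b = np.zeros(n_out)
    return W, b
```

Note that the range here is arbitrary and does not depend on the layer sizes, which is exactly the limitation Glorot initialization addresses.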
Actually, the term “Glorot initialization” is ambiguous because there are two variations. In both cases the idea is to compute a standard deviation based on the architecture of the network. In one variation of Glorot, the standard deviation is used with a uniform random distribution to generate initial weight values. In the second variation, the standard deviation is used with a Normal (Gaussian) distribution. But somewhat confusingly, both variations are sometimes called “normalized initialization”.
My demo program uses the Glorot uniform variation, where for each layer the initial weights are drawn uniformly from [-x, +x] with x = sqrt(6.0 / (fan-in + fan-out)). It’s usual practice to leave initial bias values at 0.0.
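The two variations can be sketched as follows, assuming a NumPy implementation (function names and the seed parameter are mine, not from the demo program):

```python
import numpy as np

def init_glorot_uniform(n_in, n_out, seed=0):
    # Glorot uniform: sample from [-limit, +limit]
    # where limit = sqrt(6 / (fan_in + fan_out))
    rng = np.random.default_rng(seed)
    limit = np.sqrt(6.0 / (n_in + n_out))
    W = rng.uniform(-limit, limit, size=(n_in, n_out))
    b = np.zeros(n_out)  # biases left at 0.0
    return W, b

def init_glorot_normal(n_in, n_out, seed=0):
    # Glorot normal: mean 0.0, std = sqrt(2 / (fan_in + fan_out))
    rng = np.random.default_rng(seed)
    sd = np.sqrt(2.0 / (n_in + n_out))
    W = rng.normal(0.0, sd, size=(n_in, n_out))
    b = np.zeros(n_out)
    return W, b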
The paper considered to be one of the original sources for Glorot initialization is at: http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf
The moral of the story is that when getting up to speed with deep learning, it’s important to know lots of little details but also be able to work at higher levels of abstraction.

“Victoria Harbor”, Kwan Yeuk Pang. Interesting combination of detail and abstraction.
