“Neural Network Regression from Scratch Using C#” in Visual Studio Magazine

I wrote an article titled “Neural Network Regression from Scratch Using C#” in the October 2023 edition of Microsoft Visual Studio Magazine. See https://visualstudiomagazine.com/articles/2023/10/18/neural-network-regression.aspx.

The goal of a machine learning regression problem is to predict a single numeric value. For example, you might want to predict the annual income of a person based on their sex (male or female), age, State of residence, and political leaning (conservative, moderate, liberal).

There are roughly a dozen major regression techniques, and each technique has several variations. Among the most common techniques are linear regression, linear ridge regression, k-nearest neighbors regression, kernel ridge regression, Gaussian process regression, decision tree regression and neural network regression. My article gives a complete end-to-end demo of neural network regression from scratch, using the C# language.

The demo uses one of my standard synthetic datasets that looks like:

0, 0.24, 1,0,0, 0.2950, 0,0,1
1, 0.39, 0,0,1, 0.5120, 0,1,0
0, 0.63, 0,1,0, 0.7580, 1,0,0
. . .

The fields are sex (0 = male, 1 = female), age (divided by 100), State (100 = Michigan, 010 = Nebraska, 001 = Oklahoma), income (divided by $100,000) and political leaning (100 = conservative, 010 = moderate, 001 = liberal). The goal is to predict income from the other four variables.
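As a rough sketch of how one raw record maps to this encoding (this is illustrative only, not the article's actual data-preparation code, and the raw record values are made up to match the second data line), the normalization and one-hot encoding might look like:

```csharp
using System;
using System.Globalization;

class EncodeDemo
{
  // Encode a hypothetical raw record: female, age 39, Oklahoma, $51,200, moderate
  public static string Encode()
  {
    string sex = "female";
    double age = 39.0;
    string state = "Oklahoma";
    double income = 51200.0;
    string leaning = "moderate";

    double sexEnc = (sex == "male") ? 0.0 : 1.0;  // 0 = male, 1 = female
    double ageEnc = age / 100.0;                  // age divided by 100
    double[] stateEnc = state switch              // one-hot: Michigan, Nebraska, Oklahoma
    {
      "Michigan" => new double[] { 1, 0, 0 },
      "Nebraska" => new double[] { 0, 1, 0 },
      _          => new double[] { 0, 0, 1 },
    };
    double incomeEnc = income / 100000.0;         // income divided by $100,000 (the target)
    double[] leanEnc = leaning switch             // one-hot: conservative, moderate, liberal
    {
      "conservative" => new double[] { 1, 0, 0 },
      "moderate"     => new double[] { 0, 1, 0 },
      _              => new double[] { 0, 0, 1 },
    };

    return FormattableString.Invariant(
      $"{sexEnc}, {ageEnc:F2}, {string.Join(",", stateEnc)}, " +
      $"{incomeEnc:F4}, {string.Join(",", leanEnc)}");
  }

  static void Main()
  {
    Console.WriteLine(Encode());  // matches the second data line above
  }
}
```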

The demo neural network has architecture 8-100-1 with tanh hidden activation, identity output activation, and uniform [-0.01, +0.01) weight initialization. The network is trained using basic SGD optimization for 2,000 epochs with a batch size of 10, and a constant learning rate of 0.01.
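To illustrate just the SGD training-loop mechanics (epochs, shuffled batches, a constant learning rate), here is a simplified sketch that trains a trivial one-weight model y = w * x instead of the full 8-100-1 network. The loop shape is the same idea as in the demo; the data, model, and helper names here are made up for illustration:

```csharp
using System;

// Simplified SGD sketch: learn w for the model y = w * x on synthetic
// data generated from y = 2x, so the learned w should approach 2.0.
class SgdSketch
{
  public static double Train()
  {
    double[] xs = new double[100];
    double[] ys = new double[100];
    var rnd = new Random(0);
    for (int i = 0; i < 100; ++i) { xs[i] = rnd.NextDouble(); ys[i] = 2.0 * xs[i]; }

    double w = 0.0;           // the single model weight
    double lrnRate = 0.01;    // constant learning rate, as in the demo
    int batSize = 10;         // batch size, as in the demo
    int maxEpochs = 2000;     // number of epochs, as in the demo

    int[] indices = new int[100];
    for (int i = 0; i < 100; ++i) indices[i] = i;

    for (int epoch = 0; epoch < maxEpochs; ++epoch)
    {
      Shuffle(indices, rnd);  // visit training items in scrambled order
      for (int start = 0; start < 100; start += batSize)
      {
        double grad = 0.0;    // gradient of 1/2 * (pred - y)^2 w.r.t. w, summed over batch
        for (int k = start; k < start + batSize; ++k)
        {
          int i = indices[k];
          double yPred = w * xs[i];
          grad += (yPred - ys[i]) * xs[i];
        }
        w -= lrnRate * (grad / batSize);  // one SGD update per batch
      }
    }
    return w;
  }

  static void Shuffle(int[] arr, Random rnd)  // Fisher-Yates shuffle
  {
    for (int i = 0; i < arr.Length; ++i)
    {
      int j = rnd.Next(i, arr.Length);
      (arr[i], arr[j]) = (arr[j], arr[i]);
    }
  }

  static void Main()
  {
    Console.WriteLine($"learned w = {Train():F4}");  // close to 2.0
  }
}
```

The article applies this same epoch/batch/update structure to the full network, with the gradient computed by backpropagation rather than the closed-form expression used here.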


The article explains neural network input-output using a small 3-4-1 network.
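As a concrete sketch of the input-output computation for a 3-4-1 regression network with tanh hidden activation and identity output activation (the weight and bias values below are made up for illustration and are not from the article):

```csharp
using System;

// Input-output for a tiny 3-4-1 regression network:
// tanh hidden activation, identity output activation.
class IoDemo
{
  public static double ComputeOutput()
  {
    double[] x = { 1.0, 2.0, 3.0 };  // 3 input values

    // input-to-hidden weights [3 inputs, 4 hidden nodes] and hidden biases
    double[,] ihWts = {
      { 0.01, 0.02, 0.03, 0.04 },
      { 0.05, 0.06, 0.07, 0.08 },
      { 0.09, 0.10, 0.11, 0.12 }
    };
    double[] hBiases = { 0.13, 0.14, 0.15, 0.16 };

    // hidden-to-output weights [4 hidden nodes] and the output bias
    double[] hoWts = { 0.17, 0.18, 0.19, 0.20 };
    double oBias = 0.21;

    double[] h = new double[4];
    for (int j = 0; j < 4; ++j)
    {
      double sum = hBiases[j];
      for (int i = 0; i < 3; ++i)
        sum += x[i] * ihWts[i, j];
      h[j] = Math.Tanh(sum);   // tanh hidden activation
    }

    double y = oBias;          // identity output activation: for regression,
    for (int j = 0; j < 4; ++j) // the output is just the weighted sum, no squashing
      y += h[j] * hoWts[j];
    return y;
  }

  static void Main()
  {
    Console.WriteLine($"predicted y = {ComputeOutput():F4}");
  }
}
```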

The demo neural network uses a single hidden layer. It is possible to extend the demo architecture to multiple hidden layers, but doing so would require a significant amount of additional code. In theory, a neural network with a single hidden layer and enough hidden nodes can compute anything that a neural network with multiple hidden layers can compute. This result is known as the Universal Approximation Theorem.



“Haute couture” is French, meaning “high dressmaking” — exclusive from-scratch fashion. A lot of haute couture is absurd, ugly, and impractical. But some are nice, like the three shown here. I like to think that I sometimes create “haute systems” (not really). Left: By designer Iris van Herpen. Center: By Ashi Studio. Right: By Julien Fournie.


This entry was posted in Machine Learning.