I wrote an article titled “Random Forest Regression Using C#” in the March 2026 edition of Microsoft Visual Studio Magazine. See https://visualstudiomagazine.com/articles/2026/03/18/random-forest-regression-using-csharp.aspx.
The goal of a machine learning regression problem is to predict a single numeric value. For example, you might want to predict the price of a house based on its square footage, number of bedrooms, property tax rate, and so on.
A simple decision tree regressor encodes a virtual set of if-then rules to make a prediction. For example, “if house-age is greater than 10.0 and house-age is less than or equal to 13.5 and bedrooms is greater than 3.0 then price is $525,665.38.” Simple decision trees usually overfit their training data, and an overfitted tree predicts poorly on new, previously unseen data.
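The if-then view of a tree can be illustrated with a tiny sketch. The article's demo is in C#; this is an illustrative Python version, and the thresholds and leaf values are hypothetical, not from the article:

```python
# A trained decision tree regressor is equivalent to nested if-then rules.
# The thresholds and leaf predictions below are hypothetical illustrations.
def tree_predict(house_age, bedrooms):
    if house_age > 10.0:
        if house_age <= 13.5:
            if bedrooms > 3.0:
                return 525665.38  # leaf: predicted price for this rule path
    return 400000.00  # stand-in for the tree's other leaves (hypothetical)

print(tree_predict(12.0, 4))  # matches the example rule path
```

A deep tree has many such rule paths, which is exactly why it can memorize (overfit) its training data.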
A random forest is a collection of simple decision tree regressors that have been trained on different random subsets of the source training data. This process usually goes a long way toward limiting the overfitting problem.
To make a prediction for an input vector x, each tree in the forest makes a prediction and the final predicted y value is the average of the predictions. A bagging (“bootstrap aggregation”) regression system is a specific type of random forest system where all columns/predictors of the source training data are always used to construct the training data subsets.
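The two mechanisms just described, bootstrap row sampling during training and averaging at prediction time, can be sketched as follows. This is a minimal illustration, not the article's implementation; the trivial "stump" learner that predicts the mean of its training targets is a hypothetical stand-in for a real tree learner:

```python
import random

# Sketch of bagging-style training and prediction: each "tree" is trained
# on a bootstrap sample of the rows, and the forest prediction is the
# average of the individual tree predictions.

def bootstrap_sample(X, y, n_rows, rnd):
    # sample rows with replacement (all columns are kept, as in bagging)
    idxs = [rnd.randrange(len(X)) for _ in range(n_rows)]
    return [X[i] for i in idxs], [y[i] for i in idxs]

def train_stump(X, y):
    # trivial stand-in learner: predicts the mean of its training targets
    m = sum(y) / len(y)
    return lambda x: m

def forest_predict(trees, x):
    # final y = average of the per-tree predictions
    return sum(t(x) for t in trees) / len(trees)

rnd = random.Random(0)
X = [[0.1], [0.2], [0.3], [0.4]]
y = [1.0, 2.0, 3.0, 4.0]
trees = []
for _ in range(50):
    Xs, ys = bootstrap_sample(X, y, n_rows=4, rnd=rnd)
    trees.append(train_stump(Xs, ys))
pred = forest_predict(trees, [0.25])
print(round(pred, 2))  # near the overall mean of y
```

Because each tree sees a slightly different sample, their individual errors tend to cancel when averaged.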
The VSM article presents a complete demo of random forest regression using the C# language. The output of the demo is:
Begin C# Random Forest regression demo

Loading synthetic train (200) and test (40) data
Done

First three train X:
-0.1660  0.4406 -0.9998 -0.3953 -0.7065
 0.0776 -0.1616  0.3704 -0.5911  0.7562
-0.9452  0.3409 -0.1654  0.1174 -0.7192

First three train y:
0.4840
0.1568
0.8054

Setting nTrees = 100
Setting maxDepth = 6
Setting minSamples = 2
Setting minLeaf = 1
Setting nCols = 5
Setting nRows = 150

Creating and training random forest regression model
Done

Accuracy train (within 0.10) = 0.8050
Accuracy test (within 0.10) = 0.5750

MSE train = 0.0006
MSE test = 0.0016

Predicting for x =
-0.1660  0.4406 -0.9998 -0.3953 -0.7065
y = 0.4728

End demo
The demo data is synthetic. It was generated by a 5-10-1 neural network with random weights and bias values. The idea here is that the synthetic data has an underlying, but complex, structure that can be predicted.
When using decision trees for regression, it’s not necessary to normalize the training data predictor values because no distance between data items is computed. However, it’s not a bad idea to normalize the predictors just in case you want to send the data to other regression algorithms that do require normalization.
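For completeness, one common normalization scheme is per-column min-max scaling to [0, 1]. As noted above this is not required for tree-based regression; the sketch below is illustrative only:

```python
# Min-max normalization of one predictor column to the range [0, 1].
# Not needed for decision trees, but harmless, and useful if the same
# data will later feed algorithms that do require normalized predictors.
def minmax_normalize(col):
    lo, hi = min(col), max(col)
    return [(v - lo) / (hi - lo) for v in col]

print(minmax_normalize([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
```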
Random forest regression is most often used with data that has strictly numeric predictor variables. It is possible to use random forest regression with mixed categorical and numeric data, by using ordinal encoding on the categorical data. In theory, ordinal encoding shouldn’t work well. For example, if you have a predictor variable color with possible encoded values red = 0, blue = 1, green = 2, then red will always be less-than-or-equal to any other color value in the decision tree construction process. However, in practice, ordinal encoding for random forest regression often works well.
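The ordinal encoding idea is simple to show. The color-to-integer mapping below is the illustration from the paragraph above, written out as a sketch:

```python
# Ordinal encoding maps each categorical value to an integer. The implied
# ordering (red < blue < green) is arbitrary, which is why the technique
# shouldn't work well in theory -- yet for tree-based models it often
# works acceptably in practice.
color_map = {"red": 0, "blue": 1, "green": 2}
raw = ["blue", "green", "red", "blue"]
encoded = [color_map[c] for c in raw]
print(encoded)  # [1, 2, 0, 1]
```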
The motivation for combining many simple decision tree regressors into a forest is the fact that a simple decision tree will always overfit training data if the tree is deep enough. By using a collection of trees that have been trained on different random subsets of the source data, the averaged prediction of the collection is much less likely to overfit.

Random forest regression is conceptually related to bagging (bootstrap aggregation) regression, adaptive boosting regression, and gradient boosting regression.

I’m a big fan of 1950s science fiction movies. To my eye, several of the actresses in my favorite films seem to have somewhat similar physical appearances.
Left: Actress Julie Adams (1926-2019) is best known for her lead role in “Creature from the Black Lagoon” (1954).
Center: Actress Joan Taylor (1929-2012) had lead roles in “Earth vs. the Flying Saucers” (1956) and “20 Million Miles to Earth” (1957).
Right: Actress Joan Weldon (1930-2021) was in “Them!” (1954) — the well-known giant ant movie.

