Decision tree regression is a machine learning technique that encodes a set of if-then rules in a tree data structure to predict a single numeric value. For example, a decision tree regression model prediction might be, “If employee age is greater than 28.0 and age is less than or equal to 33.5 and years-experience is less than or equal to 7.0 and height is greater than 66.0, then bank account balance is $754.14.”
I have implemented decision tree regression from scratch many times. Decision trees are conceptually simple, but there are a huge number of design alternatives. My latest version is implemented from scratch with C#, using list storage for tree nodes (no pointers/references), a sequential iterative algorithm to build the tree (no recursion, no stack), weighted variance minimization for the node split function, and saves associated row indices in each node for high interpretability. And that just begins to touch on all the design details. Whew!
The demo data looks like:
-0.1660, 0.4406, -0.9998, -0.3953, -0.7065, 0.4840
0.0776, -0.1616, 0.3704, -0.5911, 0.7562, 0.1568
-0.9452, 0.3409, -0.1654, 0.1174, -0.7192, 0.8054
0.9365, -0.3732, 0.3846, 0.7528, 0.7892, 0.1345
. . .
The first five values on each line are the x predictors. The last value on each line is the target y variable to predict. The data is synthetic and was generated by a 5-10-1 neural network with random weights and bias. Therefore, the data has an underlying structure which can be predicted. There are 200 training items and 40 test items.
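The exact network details aren't shown here, but here is a minimal sketch of how that kind of data generation looks, assuming a tanh hidden layer, a logistic sigmoid output node, and weights and biases drawn uniformly from [-1.0, +1.0] (all of those are my assumptions, not the actual generation code):

using System;

// Sketch: generate synthetic regression data using a
// 5-10-1 neural network with random weights and biases.
// ASSUMPTIONS: tanh hidden activation, logistic sigmoid
// output activation, weights/biases uniform in [-1, +1].
public class MakeSyntheticData
{
  public static void Main(string[] args)
  {
    Random rnd = new Random(0);
    int nIn = 5; int nHid = 10;
    double[][] ihWts = new double[nIn][]; // input-to-hidden
    for (int i = 0; i < nIn; ++i)
    {
      ihWts[i] = new double[nHid];
      for (int j = 0; j < nHid; ++j)
        ihWts[i][j] = 2.0 * rnd.NextDouble() - 1.0;
    }
    double[] hBiases = new double[nHid];
    double[] hoWts = new double[nHid]; // hidden-to-output
    for (int j = 0; j < nHid; ++j)
    {
      hBiases[j] = 2.0 * rnd.NextDouble() - 1.0;
      hoWts[j] = 2.0 * rnd.NextDouble() - 1.0;
    }
    double oBias = 2.0 * rnd.NextDouble() - 1.0;
    for (int r = 0; r < 200; ++r) // one line per data item
    {
      double[] x = new double[nIn];
      for (int i = 0; i < nIn; ++i)
        x[i] = 2.0 * rnd.NextDouble() - 1.0;
      double oSum = oBias;
      for (int j = 0; j < nHid; ++j)
      {
        double hSum = hBiases[j];
        for (int i = 0; i < nIn; ++i)
          hSum += x[i] * ihWts[i][j];
        oSum += Math.Tanh(hSum) * hoWts[j];
      }
      double y = 1.0 / (1.0 + Math.Exp(-oSum)); // in (0, 1)
      string[] xs = Array.ConvertAll(x,
        v => v.ToString("F4"));
      Console.WriteLine(string.Join(", ", xs) +
        ", " + y.ToString("F4"));
    }
  }
}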
The output of the demo program is:
Begin decision tree regression (simple version)
Loading synthetic train (200) and test (40) data
Done
First three train X:
-0.1660 0.4406 -0.9998 -0.3953 -0.7065
0.0776 -0.1616 0.3704 -0.5911 0.7562
-0.9452 0.3409 -0.1654 0.1174 -0.7192
First three train y:
0.4840
0.1568
0.8054
Setting maxDepth = 3
Setting minSamples = 2
Setting minLeaf = 18
Using default numSplitCols = -1
Creating and training tree
Done
Tree:
ID 0 | idx 0 | v -0.2102 | L 1 | R 2 | y 0.3493 | leaf F
ID 1 | idx 4 | v 0.1431 | L 3 | R 4 | y 0.5345 | leaf F
ID 2 | idx 0 | v 0.3915 | L 5 | R 6 | y 0.2382 | leaf F
ID 3 | idx 0 | v -0.6553 | L 7 | R 8 | y 0.6358 | leaf F
ID 4 | idx -1 | v 0.0000 | L 9 | R 10 | y 0.4123 | leaf T
ID 5 | idx 4 | v -0.2987 | L 11 | R 12 | y 0.3032 | leaf F
ID 6 | idx 2 | v 0.3777 | L 13 | R 14 | y 0.1701 | leaf F
ID 7 | idx -1 | v 0.0000 | L -1 | R -1 | y 0.6952 | leaf T
ID 8 | idx -1 | v 0.0000 | L -1 | R -1 | y 0.5598 | leaf T
ID 11 | idx -1 | v 0.0000 | L -1 | R -1 | y 0.4101 | leaf T
ID 12 | idx -1 | v 0.0000 | L -1 | R -1 | y 0.2613 | leaf T
ID 13 | idx -1 | v 0.0000 | L -1 | R -1 | y 0.1882 | leaf T
ID 14 | idx -1 | v 0.0000 | L -1 | R -1 | y 0.1381 | leaf T
Rows associated with node [14]:
1 5 10 11 12 13 15 17 28 30 33 37 39 41 46 52 55 65 71 74 78 82
Evaluating model
Accuracy train (within 0.10) = 0.3750
Accuracy test (within 0.10) = 0.4750
MSE train = 0.0048
MSE test = 0.0054
Predicting for trainX[0] =
-0.1660 0.4406 -0.9998 -0.3953 -0.7065
Predicted y = 0.4101
IF
column 0 > -0.2102 AND
column 0 <= 0.3915 AND
column 4 <= -0.2987 AND
THEN
predicted = 0.4101
End demo
There is a lot going on in the output. If you compare this output with the visual representation of the tree, most of the ideas of decision tree regression should become clear:
The demo decision tree has maxDepth = 3. This creates a tree with 15 nodes ([0] through [14], some of which could be empty). In general, if a tree has maxDepth = n, then a full tree has 2^(n+1) – 1 nodes. These tree nodes could be stored as C# program-defined Node objects and connected via pointers/references. Instead, the architecture used by the demo decision tree is to create a List collection with size/count = 15, where each item in the List is a Node object.
With this design, if the root node is at index [0], then the left child is at [1] and the right child is at [2]. In general, if a node is at index [i], then the left child is at index [2*i+1] and the right child is at index [2*i+2].
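Expressed as code, the index arithmetic is just a few one-liners (the helper method names here are mine, not part of the demo program):

// implicit binary tree stored in a List -- index math
static int LeftChild(int i) { return (2 * i) + 1; }
static int RightChild(int i) { return (2 * i) + 2; }
static int Parent(int i) { return (i - 1) / 2; } // integer division

One consequence of this implicit-indexing design is that the full List of 2^(maxDepth+1) - 1 slots is allocated up front, and slots that never receive a node simply remain null.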
The tree splitting part of constructing a decision tree regression system is perhaps best understood by looking at a concrete example. The root node of the demo decision tree is associated with all 200 rows of the training data. The splitting process selects some of the 200 rows to be assigned to the left child node, and the remaining rows to be assigned to the right child node.
For simplicity, suppose that instead of the demo training data with 200 rows and 5 columns of predictors, a tree node is associated with just 5 rows of data, and the data has just 3 columns of predictors:
        X              y
     [0]   [1]   [2]
[0]  0.99  0.22  0.77  3.0
[1]  0.55  0.44  0.00  9.0
[2]  0.88  0.66  0.88  7.0
[3]  0.11  0.33  0.22  1.0
[4]  0.55  0.88  0.33  5.0
The split algorithm will select some of the five rows to go to the left child and the remaining rows to go to the right child. The idea is to select the two sets of rows so that, within each set, the associated y values are close together. The demo implementation does this by finding the split that minimizes the weighted variance of the target y values. (There are several other split algorithms.)
The splitting algorithm examines every possible x value in every column. For each possible candidate split value (also called the threshold), the weighted variance of the y values of the resulting split is computed. For example, for x = 0.33 in column [1], the rows where the values in column [1] are less than or equal to 0.33 are rows [3] and [0] with associated y values (1.0, 3.0). The other rows are [1], [2], [4] with associated y values (9.0, 7.0, 5.0).
The variance of a set of y values is the average of the squared differences between the values and their mean (average).
For this proposed split, the variance of the left y values (1.0, 3.0) is computed as 1.00. The mean of (1.0, 3.0) is (1.0 + 3.0) / 2 = 2.0, and so the variance is ((1.0 - 2.0)^2 + (3.0 - 2.0)^2) / 2 = 1.00.
The mean of the right y values (9.0, 7.0, 5.0) is (9.0 + 7.0 + 5.0) / 3 = 7.0, and the variance is ((9.0 - 7.0)^2 + (7.0 - 7.0)^2 + (5.0 - 7.0)^2) / 3 = 2.67.
Because the two sides of the proposed split have different numbers of rows (two rows for the left child and three rows for the right child), a weighted variance is used: ((2 * 1.00) + (3 * 2.67)) / 5 = 2.00.
This process is repeated for every x value in every column. The x value and its column that generate the smallest weighted variance define the split.
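Here is a minimal, self-contained sketch of the weighted-variance computation for the proposed split described above. The Variance() helper is hypothetical; in the demo program the equivalent work is done by the BestSplit() and TreeTargetVariance() methods:

using System;

// Sketch: weighted variance of the split at x = 0.33
// in column [1] of the small 5-row example.
public class WeightedVarianceDemo
{
  static double Variance(double[] vals)
  {
    double mean = 0.0;
    foreach (double v in vals) mean += v;
    mean /= vals.Length;
    double sum = 0.0;
    foreach (double v in vals)
      sum += (v - mean) * (v - mean);
    return sum / vals.Length; // population variance
  }

  public static void Main(string[] args)
  {
    double[] leftY = new double[] { 1.0, 3.0 }; // rows [3], [0]
    double[] rightY = new double[] { 9.0, 7.0, 5.0 }; // rows [1], [2], [4]
    double leftVar = Variance(leftY); // 1.00
    double rightVar = Variance(rightY); // 2.67
    int n = leftY.Length + rightY.Length; // 5
    double weightedVar =
      ((leftY.Length * leftVar) +
      (rightY.Length * rightVar)) / n; // 2.00
    Console.WriteLine("weighted variance = " +
      weightedVar.ToString("F2"));
  }
}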
The demo does not use recursion to build the tree. Recursion is a common approach in university computer science data structures classes, but it is not a good fit for a production environment. Another common approach is to use an explicit stack, pushing and popping nodes as they are created. The demo in this blog post instead uses sequential iteration, constructing the tree nodes in index order. The details are extremely tricky and can only be completely understood by carefully dissecting the demo code. Implementation took me many hours of coding and debugging.
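To make the sequential idea concrete without all the splitting logic, here is a toy sketch that builds a complete tree of slot labels in a single forward pass. It relies on the key invariant that a node's children always live at larger indices (2*i+1 and 2*i+2) than the node itself, so by the time the loop reaches index i, node i has already been created (or left null) by an earlier iteration. The real MakeTree() method follows this same loop shape, but may leave slots null when a node becomes a leaf or cannot be split:

using System;
using System.Collections.Generic;

// Sketch: sequential, index-order tree construction.
public class SequentialBuildDemo
{
  public static void Main(string[] args)
  {
    int maxDepth = 3;
    int numNodes = (int)Math.Pow(2, maxDepth + 1) - 1; // 15
    List<string> tree = new List<string>();
    for (int i = 0; i < numNodes; ++i) tree.Add(null);
    tree[0] = "root";
    for (int i = 0; i < numNodes; ++i) // one forward pass
    {
      if (tree[i] == null) continue; // empty slot
      int left = (2 * i) + 1;
      int right = (2 * i) + 2;
      if (left >= numNodes) continue; // max depth: a leaf
      tree[left] = "left child of " + i;
      tree[right] = "right child of " + i;
    }
    for (int i = 0; i < numNodes; ++i)
      Console.WriteLine("[" + i + "] " + tree[i]);
  }
}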
Decision trees are rarely used by themselves because they usually overfit the training data and therefore predict poorly on new, previously unseen data. Multiple decision trees are usually combined into an ensemble technique such as bagged trees (“bootstrap aggregation”), random forest regression, adaptive boosting regression, or gradient boosting regression.
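As a rough illustration, here is a minimal sketch of bagging built on top of the demo DecisionTreeRegressor class presented below. The bootstrap-sampling details (sample size equal to n, per-tree seeds) are my assumptions; a true random forest would additionally set numSplitCols to a value smaller than the number of predictor columns so that each split examines a random subset of columns:

using System;

// Sketch: bagging ("bootstrap aggregation") regression.
// Train numTrees trees, each on n rows sampled with
// replacement from the training data, then predict by
// averaging the individual tree predictions.
public class BaggingSketch
{
  public static double PredictBagged(double[][] trainX,
    double[] trainY, double[] x, int numTrees)
  {
    Random rnd = new Random(0);
    int n = trainX.Length;
    double sum = 0.0;
    for (int t = 0; t < numTrees; ++t)
    {
      double[][] sampX = new double[n][];
      double[] sampY = new double[n];
      for (int i = 0; i < n; ++i)
      {
        int r = rnd.Next(0, n); // sample with replacement
        sampX[i] = trainX[r];
        sampY[i] = trainY[r];
      }
      DecisionTreeRegressor dtr =
        new DecisionTreeRegressor(maxDepth: 3,
        minSamples: 2, minLeaf: 1,
        numSplitCols: -1, seed: t);
      dtr.Train(sampX, sampY);
      sum += dtr.Predict(x);
    }
    return sum / numTrees; // average prediction
  }
}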

Each node in a decision tree has a left child and a right child. Tree children are neither males nor females. I’m a big fan of early science fiction movies. A surprising number of these movies feature the daughter of a scientist.
Left: In “Beyond the Time Barrier” (1960), U.S. test pilot Bill Allison accidentally flies 64 years into the future. A nuclear war has forced civilization underground. The pilot falls in love with Trirene who is the granddaughter of a leading scientist. Things don’t turn out well for the two lovers. My personal grade = C.
Center: “Reptilicus” (1962) is essentially a Danish version of “Godzilla”. Danish miners uncover a section of a frozen dinosaur tail. It regenerates into the full beast. Professor Martens devises a way to deal with the monster. He has two daughters, Lise (left) and Karen (right). My personal grade = C+.
Right: In “The Man from Planet X” (1951), an alien spaceship lands in the Scottish moors. It is a scout for a full-scale invasion. Professor Elliot is aided by his daughter Enid to defeat the aliens’ plans. My grade = B.
Demo code. The complete program is presented below, followed by the training and test data.
using System;
using System.IO;
using System.Collections.Generic;
// full List storage with indexes (no pointers)
// iteration-based construction (no stack, no recursion)
// Nodes hold associated row idxs (interpretability)
namespace DecisionTreeRegressionSimple
{
internal class DecisionTreeRegressionProgram
{
static void Main(string[] args)
{
Console.WriteLine("\nBegin decision tree" +
" regression (simple version) ");
// 1. load data
Console.WriteLine("\nLoading synthetic train" +
" (200) and test (40) data");
string trainFile = "..\\..\\..\\Data\\" +
"synthetic_train_200.txt";
int[] colsX = new int[] { 0, 1, 2, 3, 4 };
int colY = 5;
double[][] trainX =
MatLoad(trainFile, colsX, ',', "#");
double[] trainY =
MatToVec(MatLoad(trainFile,
new int[] { colY }, ',', "#"));
string testFile = "..\\..\\..\\Data\\" +
"synthetic_test_40.txt";
double[][] testX =
MatLoad(testFile, colsX, ',', "#");
double[] testY =
MatToVec(MatLoad(testFile,
new int[] { colY }, ',', "#"));
Console.WriteLine("Done ");
Console.WriteLine("\nFirst three train X: ");
for (int i = 0; i < 3; ++i)
VecShow(trainX[i], 4, 8);
Console.WriteLine("\nFirst three train y: ");
for (int i = 0; i < 3; ++i)
Console.WriteLine(trainY[i].ToString("F4").
PadLeft(8));
// 2. create and train/build tree
// deliberately small tree for the blog demo
int maxDepth = 3;
int minSamples = 2;
int minLeaf = 18;
int numSplitCols = -1; // all columns
Console.WriteLine("\nSetting maxDepth = " +
maxDepth);
Console.WriteLine("Setting minSamples = " +
minSamples);
Console.WriteLine("Setting minLeaf = " +
minLeaf);
Console.WriteLine("Using default numSplitCols = -1 ");
Console.WriteLine("\nCreating and training tree ");
DecisionTreeRegressor dtr =
new DecisionTreeRegressor(maxDepth, minSamples,
minLeaf, numSplitCols, seed: 0);
dtr.Train(trainX, trainY);
Console.WriteLine("Done ");
Console.WriteLine("\nTree: ");
dtr.Display(showRows: false);
Console.WriteLine("\nRows associated with node [14]: ");
for (int i = 0; i < dtr.tree[14].rows.Count; ++i)
Console.Write(dtr.tree[14].rows[i] + " ");
Console.WriteLine("");
// 3. evaluate model
Console.WriteLine("\nEvaluating model ");
double accTrain = dtr.Accuracy(trainX, trainY, 0.10);
Console.WriteLine("Accuracy train (within 0.10) = " +
accTrain.ToString("F4"));
double accTest = dtr.Accuracy(testX, testY, 0.10);
Console.WriteLine("Accuracy test (within 0.10) = " +
accTest.ToString("F4"));
double mseTrain = dtr.MSE(trainX, trainY);
Console.WriteLine("\nMSE train = " +
mseTrain.ToString("F4"));
double mseTest = dtr.MSE(testX, testY);
Console.WriteLine("MSE test = " +
mseTest.ToString("F4"));
// 4. use model
Console.WriteLine("\nPredicting for trainX[0] = ");
double[] x = trainX[0];
VecShow(x, 4, 9);
double predY = dtr.Predict(x);
Console.WriteLine("Predicted y = " +
predY.ToString("F4"));
dtr.Explain(x);
Console.WriteLine("\nEnd demo ");
Console.ReadLine();
} // Main()
// ------------------------------------------------------
// helpers for Main():
// MatLoad(), MatToVec(), VecShow().
// ------------------------------------------------------
static double[][] MatLoad(string fn, int[] usecols,
char sep, string comment)
{
List<double[]> result =
new List<double[]>();
string line = "";
FileStream ifs = new FileStream(fn, FileMode.Open);
StreamReader sr = new StreamReader(ifs);
while ((line = sr.ReadLine()) != null)
{
if (line.StartsWith(comment) == true)
continue;
string[] tokens = line.Split(sep);
List<double> lst = new List<double>();
for (int j = 0; j < usecols.Length; ++j)
lst.Add(double.Parse(tokens[usecols[j]]));
double[] row = lst.ToArray();
result.Add(row);
}
sr.Close(); ifs.Close();
return result.ToArray();
}
static double[] MatToVec(double[][] mat)
{
int nRows = mat.Length;
int nCols = mat[0].Length;
double[] result = new double[nRows * nCols];
int k = 0;
for (int i = 0; i < nRows; ++i)
for (int j = 0; j < nCols; ++j)
result[k++] = mat[i][j];
return result;
}
static void VecShow(double[] vec, int dec, int wid)
{
for (int i = 0; i < vec.Length; ++i)
Console.Write(vec[i].ToString("F" + dec).
PadLeft(wid));
Console.WriteLine("");
}
} // class Program
// ========================================================
public class DecisionTreeRegressor
{
public int maxDepth;
public int minSamples; // aka min_samples_split
public int minLeaf; // min number of values in a leaf
public int numSplitCols; // mostly for random forest
public List<Node> tree = new List<Node>();
public Random rnd; // order in which cols are searched
public double[][] trainX; // store data by ref
public double[] trainY;
// ------------------------------------------------------
public class Node
{
public int id;
public int colIdx; // aka featureIdx
public double thresh;
public int left; // index into List
public int right;
public double value;
public bool isLeaf;
public List<int> rows; // rows train data
public Node()
{
this.id = -1;
this.colIdx = -1;
this.thresh = 0.0; // aka split value
this.left = -1;
this.right = -1;
this.value = 0.0; // aka pred y
this.isLeaf = false;
this.rows = null;
}
} // class Node
// --------------------------------------------
public DecisionTreeRegressor(int maxDepth = 2,
int minSamples = 2, int minLeaf = 1,
int numSplitCols = -1, int seed = 0)
{
// if maxDepth = 0, tree has just a root node
// if maxDepth = 1, at most 3 nodes (root, l, r)
// if maxDepth = n, at most 2^(n+1) - 1 nodes
this.maxDepth = maxDepth;
this.minSamples = minSamples;
this.minLeaf = minLeaf;
this.numSplitCols = numSplitCols; // for ran. forest
// create full tree List with null nodes
int numNodes = (int)Math.Pow(2, (maxDepth + 1)) - 1;
for (int i = 0; i < numNodes; ++i)
{
this.tree.Add(null); // empty nodes
}
this.rnd = new Random(seed);
}
// ------------------------------------------------------
// public: Train(), Predict(), Explain(), Accuracy(),
// MSE(), Display().
// helpers: MakeTree(), BestSplit(), TreeTargetMean(),
// TreeTargetVariance().
// ------------------------------------------------------
public void Train(double[][] trainX, double[] trainY)
{
this.trainX = trainX;
this.trainY = trainY;
this.MakeTree();
// at this point the rows List in each node can be
// nulled out to save space, especially if the
// decision tree is part of an ensemble
}
// ------------------------------------------------------
public double Predict(double[] x)
{
int p = 0;
Node currNode = this.tree[p];
while (currNode != null &&
currNode.isLeaf == false &&
p < this.tree.Count)
{
if (x[currNode.colIdx] <= currNode.thresh)
p = currNode.left;
else
p = currNode.right;
currNode = this.tree[p];
}
return this.tree[p].value;
}
// ------------------------------------------------------
public void Explain(double[] x)
{
int p = 0;
Console.WriteLine("\nIF ");
Node currNode = this.tree[p];
while (currNode.isLeaf == false)
{
Console.Write("column ");
Console.Write(currNode.colIdx + " ");
if (x[currNode.colIdx] <= currNode.thresh)
{
Console.Write(" <= ");
Console.WriteLine(currNode.thresh.
ToString("F4").PadLeft(8) + " AND ");
p = currNode.left;
currNode = this.tree[p];
}
else
{
Console.Write(" > ");
Console.WriteLine(currNode.thresh.
ToString("F4").PadLeft(8) + " AND ");
p = currNode.right;
currNode = this.tree[p];
}
}
Console.WriteLine("THEN \npredicted = " +
currNode.value.ToString("F4"));
}
// ------------------------------------------------------
public double Accuracy(double[][] dataX, double[] dataY,
double pctClose)
{
int numCorrect = 0; int numWrong = 0;
for (int i = 0; i < dataX.Length; ++i)
{
double actualY = dataY[i];
double predY = this.Predict(dataX[i]);
if (Math.Abs(predY - actualY) <
Math.Abs(pctClose * actualY))
++numCorrect;
else
++numWrong;
}
return (numCorrect * 1.0) / (numWrong + numCorrect);
}
// ------------------------------------------------------
public double MSE(double[][] dataX,
double[] dataY)
{
// standard machine learning MSE
int n = dataX.Length;
double sum = 0.0;
for (int i = 0; i < n; ++i)
{
double actualY = dataY[i];
double predY = this.Predict(dataX[i]);
sum += (actualY - predY) * (actualY - predY);
}
return sum / n;
}
// ------------------------------------------------------
public void Display(bool showRows)
{
for (int i = 0; i < this.tree.Count; ++i)
{
Node n = this.tree[i];
// check for empty nodes
if (n == null) continue;
string s1 = "ID " +
n.id.ToString().PadRight(3) + " | ";
string s2 = "idx " +
n.colIdx.ToString().PadLeft(3) + " | ";
string s3 = "v " +
n.thresh.ToString("F4").PadLeft(8) + " | ";
string s4 = "L " +
n.left.ToString().PadLeft(3) + " | ";
string s5 = "R " +
n.right.ToString().PadLeft(3) + " | ";
string s6 = "y " +
n.value.ToString("F4").PadLeft(8) + " | ";
string s7 = "leaf " + (n.isLeaf == true ? "T":"F");
//n.isLeaf.ToString().PadLeft(6);
string s8;
if (showRows == true)
{
s8 = "\nRows: \n";
for (int j = 0; j < n.rows.Count; ++j)
{
if (j > 0 && j % 20 == 0)
s8 += "\n";
s8 += n.rows[j].ToString().PadLeft(4) + " ";
}
}
else
{
s8 = "";
}
Console.WriteLine(s1 + s2 + s3 + s4 +
s5 + s6 + s7 + s8);
if (showRows == true) Console.WriteLine("");
}
} // Display()
// ------------------------------------------------------
// helpers: MakeTree(), BestSplit(),
// TreeTargetMean(), TreeTargetVariance()
private void MakeTree()
{
// no recursion, no pointers, List storage, no stack
if (this.numSplitCols == -1) // use all cols
this.numSplitCols = this.trainX[0].Length;
// prepare root node
List<int> allRows = new List<int>();
for (int i = 0; i < this.trainX.Length; ++i)
allRows.Add(i);
double grandMean = this.TreeTargetMean(allRows);
// wait to supply colIdx and thresh in loop
Node root = new Node();
root.id = 0;
root.left = 1;
root.right = 2;
root.value = grandMean;
root.isLeaf = false; // already set
root.rows = allRows;
this.tree[0] = root;
for (int i = 0; i < this.tree.Count; ++i)
{
Node currNode = this.tree[i];
// curr node has values for everything
// except colIdx and thresh
// curr node too deep to have children OR
// curr node not enough rows to split then
// leave both children as null
if (currNode == null ||
currNode.rows.Count == 0) { continue; }
// if parent cannot be split, make parent a leaf
if (currNode.id >= (int)Math.Pow(2,
(this.maxDepth)) - 1 ||
currNode.rows.Count < this.minSamples)
{
currNode.isLeaf = true;
continue;
}
// parent has enough rows to try to split
double[] splitInfo = this.BestSplit(currNode.rows);
int colIdx = (int)splitInfo[0];
double splitVal = splitInfo[1]; //split value
if (colIdx == -1) // unable to split, parent is leaf
{
currNode.isLeaf = true;
continue;
}
// complete the fields for curr node
currNode.colIdx = colIdx;
currNode.thresh = splitVal;
// construct the children, except colIdx and thresh
Node leftNode = new Node();
Node rightNode = new Node();
// construct the children rows using split info
// all info except colIdx and thresh
List<int> leftIdxs = new List<int>();
List<int> rightIdxs = new List<int>();
for (int k = 0; k < currNode.rows.Count; ++k)
{
int r = currNode.rows[k];
if (this.trainX[r][colIdx] <= splitVal)
leftIdxs.Add(r);
else
rightIdxs.Add(r);
}
leftNode.id = currNode.id * 2 + 1;
if (leftNode.id > (int)Math.Pow(2,
(maxDepth + 1)) - 2) leftNode.id = -1;
leftNode.left = leftNode.id * 2 + 1;
if (leftNode.left > (int)Math.Pow(2,
(maxDepth + 1)) - 2) leftNode.left = -1;
leftNode.right = leftNode.id * 2 + 2;
if (leftNode.right > (int)Math.Pow(2,
(maxDepth + 1)) - 2) leftNode.right = -1;
leftNode.rows = leftIdxs;
leftNode.value =
this.TreeTargetMean(leftNode.rows);
this.tree[leftNode.id] = leftNode;
rightNode.id = currNode.id * 2 + 2;
if (rightNode.id > (int)Math.Pow(2,
(maxDepth + 1)) - 2) rightNode.id = -1;
rightNode.left = rightNode.id * 2 + 1;
if (rightNode.left > (int)Math.Pow(2,
(maxDepth + 1)) - 2) rightNode.left = -1;
rightNode.right = rightNode.id * 2 + 2;
if (rightNode.right > (int)Math.Pow(2,
(maxDepth + 1)) - 2) rightNode.right = -1;
rightNode.rows = rightIdxs;
rightNode.value =
this.TreeTargetMean(rightNode.rows);
this.tree[rightNode.id] = rightNode;
} // i
return;
}
// ------------------------------------------------------
private double[] BestSplit(List<int> rows)
{
// implicit params: numSplitCols, minLeaf
// result[0] = best col idx (as double)
// result[1] = best split value
rows.Sort();
int bestColIdx = -1; // indicates bad split
double bestThresh = 0.0;
double bestVar = double.MaxValue; // smaller is better
int nRows = rows.Count; // or dataY.Length
int nCols = this.trainX[0].Length;
if (nRows == 0)
{
throw new Exception("empty data in BestSplit()");
}
// process cols in scrambled order
int[] colIndices = new int[nCols];
for (int k = 0; k < nCols; ++k)
colIndices[k] = k;
// shuffle, inline Fisher-Yates
int n = colIndices.Length;
for (int i = 0; i < n; ++i)
{
int ri = rnd.Next(i, n); // be careful
int tmp = colIndices[i];
colIndices[i] = colIndices[ri];
colIndices[ri] = tmp;
}
// numSplitCols is usually all columns (-1)
for (int j = 0; j < this.numSplitCols; ++j)
{
int colIdx = colIndices[j];
HashSet<double> examineds =
new HashSet<double>();
for (int i = 0; i < nRows; ++i) // each row
{
// if curr thresh been seen, skip it
double thresh = this.trainX[rows[i]][colIdx];
if (examineds.Contains(thresh)) continue;
examineds.Add(thresh);
// get row idxs where x is lte, gt thresh
List<int> leftIdxs = new List<int>();
List<int> rightIdxs = new List<int>();
for (int k = 0; k < nRows; ++k)
{
if (this.trainX[rows[k]][colIdx] <= thresh)
leftIdxs.Add(rows[k]);
else
rightIdxs.Add(rows[k]);
}
// Check if proposed split has too few values
if (leftIdxs.Count < this.minLeaf ||
rightIdxs.Count < this.minLeaf)
continue; // to next row
double leftVar =
this.TreeTargetVariance(leftIdxs);
double rightVar =
this.TreeTargetVariance(rightIdxs);
double weightedVar = (leftIdxs.Count * leftVar +
rightIdxs.Count * rightVar) / nRows;
if (weightedVar < bestVar)
{
// if this never happens, bestColIdx remains -1
// which means a bad split. used in MakeTree()
bestColIdx = colIdx;
bestThresh = thresh;
bestVar = weightedVar;
}
} // each row
} // j each col
double[] result = new double[2]; // out params ugly
result[0] = 1.0 * bestColIdx;
result[1] = bestThresh;
return result;
} // BestSplit()
// ------------------------------------------------------
private double TreeTargetMean(List<int> rows)
{
// mean of rows items in trainY: for node prediction
double sum = 0.0;
for (int i = 0; i < rows.Count; ++i)
{
int r = rows[i];
sum += this.trainY[r];
}
return sum / rows.Count;
}
// ------------------------------------------------------
private double TreeTargetVariance(List<int> rows)
{
double mean = this.TreeTargetMean(rows);
double sum = 0.0;
for (int i = 0; i < rows.Count; ++i)
{
int r = rows[i];
sum += (this.trainY[r] - mean) *
(this.trainY[r] - mean);
}
return sum / rows.Count;
}
// ------------------------------------------------------
} // class DecisionTreeRegressor
} // ns
Training data:
# synthetic_train_200.txt
#
-0.1660, 0.4406, -0.9998, -0.3953, -0.7065, 0.4840
0.0776, -0.1616, 0.3704, -0.5911, 0.7562, 0.1568
-0.9452, 0.3409, -0.1654, 0.1174, -0.7192, 0.8054
0.9365, -0.3732, 0.3846, 0.7528, 0.7892, 0.1345
-0.8299, -0.9219, -0.6603, 0.7563, -0.8033, 0.7955
0.0663, 0.3838, -0.3690, 0.3730, 0.6693, 0.3206
-0.9634, 0.5003, 0.9777, 0.4963, -0.4391, 0.7377
-0.1042, 0.8172, -0.4128, -0.4244, -0.7399, 0.4801
-0.9613, 0.3577, -0.5767, -0.4689, -0.0169, 0.6861
-0.7065, 0.1786, 0.3995, -0.7953, -0.1719, 0.5569
0.3888, -0.1716, -0.9001, 0.0718, 0.3276, 0.2500
0.1731, 0.8068, -0.7251, -0.7214, 0.6148, 0.3297
-0.2046, -0.6693, 0.8550, -0.3045, 0.5016, 0.2129
0.2473, 0.5019, -0.3022, -0.4601, 0.7918, 0.2613
-0.1438, 0.9297, 0.3269, 0.2434, -0.7705, 0.5171
0.1568, -0.1837, -0.5259, 0.8068, 0.1474, 0.3307
-0.9943, 0.2343, -0.3467, 0.0541, 0.7719, 0.5581
0.2467, -0.9684, 0.8589, 0.3818, 0.9946, 0.1092
-0.6553, -0.7257, 0.8652, 0.3936, -0.8680, 0.7018
0.8460, 0.4230, -0.7515, -0.9602, -0.9476, 0.1996
-0.9434, -0.5076, 0.7201, 0.0777, 0.1056, 0.5664
0.9392, 0.1221, -0.9627, 0.6013, -0.5341, 0.1533
0.6142, -0.2243, 0.7271, 0.4942, 0.1125, 0.1661
0.4260, 0.1194, -0.9749, -0.8561, 0.9346, 0.2230
0.1362, -0.5934, -0.4953, 0.4877, -0.6091, 0.3810
0.6937, -0.5203, -0.0125, 0.2399, 0.6580, 0.1460
-0.6864, -0.9628, -0.8600, -0.0273, 0.2127, 0.5387
0.9772, 0.1595, -0.2397, 0.1019, 0.4907, 0.1611
0.3385, -0.4702, -0.8673, -0.2598, 0.2594, 0.2270
-0.8669, -0.4794, 0.6095, -0.6131, 0.2789, 0.4700
0.0493, 0.8496, -0.4734, -0.8681, 0.4701, 0.3516
0.8639, -0.9721, -0.5313, 0.2336, 0.8980, 0.1412
0.9004, 0.1133, 0.8312, 0.2831, -0.2200, 0.1782
0.0991, 0.8524, 0.8375, -0.2102, 0.9265, 0.2150
-0.6521, -0.7473, -0.7298, 0.0113, -0.9570, 0.7422
0.6190, -0.3105, 0.8802, 0.1640, 0.7577, 0.1056
0.6895, 0.8108, -0.0802, 0.0927, 0.5972, 0.2214
0.1982, -0.9689, 0.1870, -0.1326, 0.6147, 0.1310
-0.3695, 0.7858, 0.1557, -0.6320, 0.5759, 0.3773
-0.1596, 0.3581, 0.8372, -0.9992, 0.9535, 0.2071
-0.2468, 0.9476, 0.2094, 0.6577, 0.1494, 0.4132
0.1737, 0.5000, 0.7166, 0.5102, 0.3961, 0.2611
0.7290, -0.3546, 0.3416, -0.0983, -0.2358, 0.1332
-0.3652, 0.2438, -0.1395, 0.9476, 0.3556, 0.4170
-0.6029, -0.1466, -0.3133, 0.5953, 0.7600, 0.4334
-0.4596, -0.4953, 0.7098, 0.0554, 0.6043, 0.2775
0.1450, 0.4663, 0.0380, 0.5418, 0.1377, 0.2931
-0.8636, -0.2442, -0.8407, 0.9656, -0.6368, 0.7429
0.6237, 0.7499, 0.3768, 0.1390, -0.6781, 0.2185
-0.5499, 0.1850, -0.3755, 0.8326, 0.8193, 0.4399
-0.4858, -0.7782, -0.6141, -0.0008, 0.4572, 0.4197
0.7033, -0.1683, 0.2334, -0.5327, -0.7961, 0.1776
0.0317, -0.0457, -0.6947, 0.2436, 0.0880, 0.3345
0.5031, -0.5559, 0.0387, 0.5706, -0.9553, 0.3107
-0.3513, 0.7458, 0.6894, 0.0769, 0.7332, 0.3170
0.2205, 0.5992, -0.9309, 0.5405, 0.4635, 0.3532
-0.4806, -0.4859, 0.2646, -0.3094, 0.5932, 0.3202
0.9809, -0.3995, -0.7140, 0.8026, 0.0831, 0.1600
0.9495, 0.2732, 0.9878, 0.0921, 0.0529, 0.1289
-0.9476, -0.6792, 0.4913, -0.9392, -0.2669, 0.5966
0.7247, 0.3854, 0.3819, -0.6227, -0.1162, 0.1550
-0.5922, -0.5045, -0.4757, 0.5003, -0.0860, 0.5863
-0.8861, 0.0170, -0.5761, 0.5972, -0.4053, 0.7301
0.6877, -0.2380, 0.4997, 0.0223, 0.0819, 0.1404
0.9189, 0.6079, -0.9354, 0.4188, -0.0700, 0.1907
-0.1428, -0.7820, 0.2676, 0.6059, 0.3936, 0.2790
0.5324, -0.3151, 0.6917, -0.1425, 0.6480, 0.1071
-0.8432, -0.9633, -0.8666, -0.0828, -0.7733, 0.7784
-0.9444, 0.5097, -0.2103, 0.4939, -0.0952, 0.6787
-0.0520, 0.6063, -0.1952, 0.8094, -0.9259, 0.4836
0.5477, -0.7487, 0.2370, -0.9793, 0.0773, 0.1241
0.2450, 0.8116, 0.9799, 0.4222, 0.4636, 0.2355
0.8186, -0.1983, -0.5003, -0.6531, -0.7611, 0.1511
-0.4714, 0.6382, -0.3788, 0.9648, -0.4667, 0.5950
0.0673, -0.3711, 0.8215, -0.2669, -0.1328, 0.2677
-0.9381, 0.4338, 0.7820, -0.9454, 0.0441, 0.5518
-0.3480, 0.7190, 0.1170, 0.3805, -0.0943, 0.4724
-0.9813, 0.1535, -0.3771, 0.0345, 0.8328, 0.5438
-0.1471, -0.5052, -0.2574, 0.8637, 0.8737, 0.3042
-0.5454, -0.3712, -0.6505, 0.2142, -0.1728, 0.5783
0.6327, -0.6297, 0.4038, -0.5193, 0.1484, 0.1153
-0.5424, 0.3282, -0.0055, 0.0380, -0.6506, 0.6613
0.1414, 0.9935, 0.6337, 0.1887, 0.9520, 0.2540
-0.9351, -0.8128, -0.8693, -0.0965, -0.2491, 0.7353
0.9507, -0.6640, 0.9456, 0.5349, 0.6485, 0.1059
-0.0462, -0.9737, -0.2940, -0.0159, 0.4602, 0.2606
-0.0627, -0.0852, -0.7247, -0.9782, 0.5166, 0.2977
0.0478, 0.5098, -0.0723, -0.7504, -0.3750, 0.3335
0.0090, 0.3477, 0.5403, -0.7393, -0.9542, 0.4415
-0.9748, 0.3449, 0.3736, -0.1015, 0.8296, 0.4358
0.2887, -0.9895, -0.0311, 0.7186, 0.6608, 0.2057
0.1570, -0.4518, 0.1211, 0.3435, -0.2951, 0.3244
0.7117, -0.6099, 0.4946, -0.4208, 0.5476, 0.1096
-0.2929, -0.5726, 0.5346, -0.3827, 0.4665, 0.2465
0.4889, -0.5572, -0.5718, -0.6021, -0.7150, 0.2163
-0.7782, 0.3491, 0.5996, -0.8389, -0.5366, 0.6516
-0.5847, 0.8347, 0.4226, 0.1078, -0.3910, 0.6134
0.8469, 0.4121, -0.0439, -0.7476, 0.9521, 0.1571
-0.6803, -0.5948, -0.1376, -0.1916, -0.7065, 0.7156
0.2878, 0.5086, -0.5785, 0.2019, 0.4979, 0.2980
0.2764, 0.1943, -0.4090, 0.4632, 0.8906, 0.2960
-0.8877, 0.6705, -0.6155, -0.2098, -0.3998, 0.7107
-0.8398, 0.8093, -0.2597, 0.0614, -0.0118, 0.6502
-0.8476, 0.0158, -0.4769, -0.2859, -0.7839, 0.7715
0.5751, -0.7868, 0.9714, -0.6457, 0.1448, 0.1175
0.4802, -0.7001, 0.1022, -0.5668, 0.5184, 0.1090
0.4458, -0.6469, 0.7239, -0.9604, 0.7205, 0.0779
0.5175, 0.4339, 0.9747, -0.4438, -0.9924, 0.2879
0.8678, 0.7158, 0.4577, 0.0334, 0.4139, 0.1678
0.5406, 0.5012, 0.2264, -0.1963, 0.3946, 0.2088
-0.9938, 0.5498, 0.7928, -0.5214, -0.7585, 0.7687
0.7661, 0.0863, -0.4266, -0.7233, -0.4197, 0.1466
0.2277, -0.3517, -0.0853, -0.1118, 0.6563, 0.1767
0.3499, -0.5570, -0.0655, -0.3705, 0.2537, 0.1632
0.7547, -0.1046, 0.5689, -0.0861, 0.3125, 0.1257
0.8186, 0.2110, 0.5335, 0.0094, -0.0039, 0.1391
0.6858, -0.8644, 0.1465, 0.8855, 0.0357, 0.1845
-0.4967, 0.4015, 0.0805, 0.8977, 0.2487, 0.4663
0.6760, -0.9841, 0.9787, -0.8446, -0.3557, 0.1509
-0.1203, -0.4885, 0.6054, -0.0443, -0.7313, 0.4854
0.8557, 0.7919, -0.0169, 0.7134, -0.1628, 0.2002
0.0115, -0.6209, 0.9300, -0.4116, -0.7931, 0.4052
-0.7114, -0.9718, 0.4319, 0.1290, 0.5892, 0.3661
0.3915, 0.5557, -0.1870, 0.2955, -0.6404, 0.2954
-0.3564, -0.6548, -0.1827, -0.5172, -0.1862, 0.4622
0.2392, -0.4959, 0.5857, -0.1341, -0.2850, 0.2470
-0.3394, 0.3947, -0.4627, 0.6166, -0.4094, 0.5325
0.7107, 0.7768, -0.6312, 0.1707, 0.7964, 0.2757
-0.1078, 0.8437, -0.4420, 0.2177, 0.3649, 0.4028
-0.3139, 0.5595, -0.6505, -0.3161, -0.7108, 0.5546
0.4335, 0.3986, 0.3770, -0.4932, 0.3847, 0.1810
-0.2562, -0.2894, -0.8847, 0.2633, 0.4146, 0.4036
0.2272, 0.2966, -0.6601, -0.7011, 0.0284, 0.2778
-0.0743, -0.1421, -0.0054, -0.6770, -0.3151, 0.3597
-0.4762, 0.6891, 0.6007, -0.1467, 0.2140, 0.4266
-0.4061, 0.7193, 0.3432, 0.2669, -0.7505, 0.6147
-0.0588, 0.9731, 0.8966, 0.2902, -0.6966, 0.4955
-0.0627, -0.1439, 0.1985, 0.6999, 0.5022, 0.3077
0.1587, 0.8494, -0.8705, 0.9827, -0.8940, 0.4263
-0.7850, 0.2473, -0.9040, -0.4308, -0.8779, 0.7199
0.4070, 0.3369, -0.2428, -0.6236, 0.4940, 0.2215
-0.0242, 0.0513, -0.9430, 0.2885, -0.2987, 0.3947
-0.5416, -0.1322, -0.2351, -0.0604, 0.9590, 0.3683
0.1055, 0.7783, -0.2901, -0.5090, 0.8220, 0.2984
-0.9129, 0.9015, 0.1128, -0.2473, 0.9901, 0.4776
-0.9378, 0.1424, -0.6391, 0.2619, 0.9618, 0.5368
0.7498, -0.0963, 0.4169, 0.5549, -0.0103, 0.1614
-0.2612, -0.7156, 0.4538, -0.0460, -0.1022, 0.3717
0.7720, 0.0552, -0.1818, -0.4622, -0.8560, 0.1685
-0.4177, 0.0070, 0.9319, -0.7812, 0.3461, 0.3052
-0.0001, 0.5542, -0.7128, -0.8336, -0.2016, 0.3803
0.5356, -0.4194, -0.5662, -0.9666, -0.2027, 0.1776
-0.2378, 0.3187, -0.8582, -0.6948, -0.9668, 0.5474
-0.1947, -0.3579, 0.1158, 0.9869, 0.6690, 0.2992
0.3992, 0.8365, -0.9205, -0.8593, -0.0520, 0.3154
-0.0209, 0.0793, 0.7905, -0.1067, 0.7541, 0.1864
-0.4928, -0.4524, -0.3433, 0.0951, -0.5597, 0.6261
-0.8118, 0.7404, -0.5263, -0.2280, 0.1431, 0.6349
0.0516, -0.8480, 0.7483, 0.9023, 0.6250, 0.1959
-0.3212, 0.1093, 0.9488, -0.3766, 0.3376, 0.2735
-0.3481, 0.5490, -0.3484, 0.7797, 0.5034, 0.4379
-0.5785, -0.9170, -0.3563, -0.9258, 0.3877, 0.4121
0.3407, -0.1391, 0.5356, 0.0720, -0.9203, 0.3458
-0.3287, -0.8954, 0.2102, 0.0241, 0.2349, 0.3247
-0.1353, 0.6954, -0.0919, -0.9692, 0.7461, 0.3338
0.9036, -0.8982, -0.5299, -0.8733, -0.1567, 0.1187
0.7277, -0.8368, -0.0538, -0.7489, 0.5458, 0.0830
0.9049, 0.8878, 0.2279, 0.9470, -0.3103, 0.2194
0.7957, -0.1308, -0.5284, 0.8817, 0.3684, 0.2172
0.4647, -0.4931, 0.2010, 0.6292, -0.8918, 0.3371
-0.7390, 0.6849, 0.2367, 0.0626, -0.5034, 0.7039
-0.1567, -0.8711, 0.7940, -0.5932, 0.6525, 0.1710
0.7635, -0.0265, 0.1969, 0.0545, 0.2496, 0.1445
0.7675, 0.1354, -0.7698, -0.5460, 0.1920, 0.1728
-0.5211, -0.7372, -0.6763, 0.6897, 0.2044, 0.5217
0.1913, 0.1980, 0.2314, -0.8816, 0.5006, 0.1998
0.8964, 0.0694, -0.6149, 0.5059, -0.9854, 0.1825
0.1767, 0.7104, 0.2093, 0.6452, 0.7590, 0.2832
-0.3580, -0.7541, 0.4426, -0.1193, -0.7465, 0.5657
-0.5996, 0.5766, -0.9758, -0.3933, -0.9572, 0.6800
0.9950, 0.1641, -0.4132, 0.8579, 0.0142, 0.2003
-0.4717, -0.3894, -0.2567, -0.5111, 0.1691, 0.4266
0.3917, -0.8561, 0.9422, 0.5061, 0.6123, 0.1212
-0.0366, -0.1087, 0.3449, -0.1025, 0.4086, 0.2475
0.3633, 0.3943, 0.2372, -0.6980, 0.5216, 0.1925
-0.5325, -0.6466, -0.2178, -0.3589, 0.6310, 0.3568
0.2271, 0.5200, -0.1447, -0.8011, -0.7699, 0.3128
0.6415, 0.1993, 0.3777, -0.0178, -0.8237, 0.2181
-0.5298, -0.0768, -0.6028, -0.9490, 0.4588, 0.4356
0.6870, -0.1431, 0.7294, 0.3141, 0.1621, 0.1632
-0.5985, 0.0591, 0.7889, -0.3900, 0.7419, 0.2945
0.3661, 0.7984, -0.8486, 0.7572, -0.6183, 0.3449
0.6995, 0.3342, -0.3113, -0.6972, 0.2707, 0.1712
0.2565, 0.9126, 0.1798, -0.6043, -0.1413, 0.2893
-0.3265, 0.9839, -0.2395, 0.9854, 0.0376, 0.4770
0.2690, -0.1722, 0.9818, 0.8599, -0.7015, 0.3954
-0.2102, -0.0768, 0.1219, 0.5607, -0.0256, 0.3949
0.8216, -0.9555, 0.6422, -0.6231, 0.3715, 0.0801
-0.2896, 0.9484, -0.7545, -0.6249, 0.7789, 0.4370
-0.9985, -0.5448, -0.7092, -0.5931, 0.7926, 0.5402
Test data:
# synthetic_test_40.txt
#
0.7462, 0.4006, -0.0590, 0.6543, -0.0083, 0.1935
0.8495, -0.2260, -0.0142, -0.4911, 0.7699, 0.1078
-0.2335, -0.4049, 0.4352, -0.6183, -0.7636, 0.5088
0.1810, -0.5142, 0.2465, 0.2767, -0.3449, 0.3136
-0.8650, 0.7611, -0.0801, 0.5277, -0.4922, 0.7140
-0.2358, -0.7466, -0.5115, -0.8413, -0.3943, 0.4533
0.4834, 0.2300, 0.3448, -0.9832, 0.3568, 0.1360
-0.6502, -0.6300, 0.6885, 0.9652, 0.8275, 0.3046
-0.3053, 0.5604, 0.0929, 0.6329, -0.0325, 0.4756
-0.7995, 0.0740, -0.2680, 0.2086, 0.9176, 0.4565
-0.2144, -0.2141, 0.5813, 0.2902, -0.2122, 0.4119
-0.7278, -0.0987, -0.3312, -0.5641, 0.8515, 0.4438
0.3793, 0.1976, 0.4933, 0.0839, 0.4011, 0.1905
-0.8568, 0.9573, -0.5272, 0.3212, -0.8207, 0.7415
-0.5785, 0.0056, -0.7901, -0.2223, 0.0760, 0.5551
0.0735, -0.2188, 0.3925, 0.3570, 0.3746, 0.2191
0.1230, -0.2838, 0.2262, 0.8715, 0.1938, 0.2878
0.4792, -0.9248, 0.5295, 0.0366, -0.9894, 0.3149
-0.4456, 0.0697, 0.5359, -0.8938, 0.0981, 0.3879
0.8629, -0.8505, -0.4464, 0.8385, 0.5300, 0.1769
0.1995, 0.6659, 0.7921, 0.9454, 0.9970, 0.2330
-0.0249, -0.3066, -0.2927, -0.4923, 0.8220, 0.2437
0.4513, -0.9481, -0.0770, -0.4374, -0.9421, 0.2879
-0.3405, 0.5931, -0.3507, -0.3842, 0.8562, 0.3987
0.9538, 0.0471, 0.9039, 0.7760, 0.0361, 0.1706
-0.0887, 0.2104, 0.9808, 0.5478, -0.3314, 0.4128
-0.8220, -0.6302, 0.0537, -0.1658, 0.6013, 0.4306
-0.4123, -0.2880, 0.9074, -0.0461, -0.4435, 0.5144
0.0060, 0.2867, -0.7775, 0.5161, 0.7039, 0.3599
-0.7968, -0.5484, 0.9426, -0.4308, 0.8148, 0.2979
0.7811, 0.8450, -0.6877, 0.7594, 0.2640, 0.2362
-0.6802, -0.1113, -0.8325, -0.6694, -0.6056, 0.6544
0.3821, 0.1476, 0.7466, -0.5107, 0.2592, 0.1648
0.7265, 0.9683, -0.9803, -0.4943, -0.5523, 0.2454
-0.9049, -0.9797, -0.0196, -0.9090, -0.4433, 0.6447
-0.4607, 0.1811, -0.2389, 0.4050, -0.0078, 0.5229
0.2664, -0.2932, -0.4259, -0.7336, 0.8742, 0.1834
-0.4507, 0.1029, -0.6294, -0.1158, -0.6294, 0.6081
0.8948, -0.0124, 0.9278, 0.2899, -0.0314, 0.1534
-0.1323, -0.8813, -0.0146, -0.0697, 0.6135, 0.2386