Saving and Loading a C# Decision Tree Regression Model

It’s relatively rare to want to save a decision tree regression model to file. Decision trees are usually components of an ensemble of trees (bagged trees, random forest, adaptive boosting, gradient boosting), and you want to save the overall ensemble model, not the individual decision tree components. Also, in most cases, decision trees can be trained very quickly, so it’s often easier to just retrain the tree model.

But I recently refactored my standard C# implementation of a decision tree regression system, and just for fun, I figured I’d implement Save() and Load() methods. See my post at jamesmccaffreyblog.com/2025/10/20/decision-tree-regression-without-recursion-from-scratch-using-csharp/ for details of the system.

Saving and loading any machine learning regression model is highly dependent on the underlying architecture. Saving and loading a decision tree is easy if the tree is implemented using a List data structure where children are integer indexes into the List. But if a tree is implemented using pointers/references, saving and loading is a bit tricky because references are essentially memory addresses, and they vanish when a program terminates.
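For context, here is a sketch of what a pointer-based node looks like. The field names match those used by the Save() and Load() methods, but this is just a sketch; the actual Node class in my system may have additional members:

```csharp
// Sketch of a pointer-based tree node. The field names are the
// ones the Save() and Load() methods rely on; the real Node
// class may differ slightly.
public class Node
{
  public int id;         // unique ID, assigned breadth-first
  public int colIdx;     // feature column to split on (-1 for a leaf)
  public double thresh;  // split threshold
  public Node left;      // reference to left child, or null
  public Node right;     // reference to right child, or null
  public double value;   // predicted value (used when isLeaf is true)
  public bool isLeaf;

  public Node(int id, int colIdx, double thresh,
    Node left, Node right, double value, bool isLeaf)
  {
    this.id = id; this.colIdx = colIdx; this.thresh = thresh;
    this.left = left; this.right = right;
    this.value = value; this.isLeaf = isLeaf;
  }
}
```

The key point is that the left and right fields are references, which is exactly what a saved file cannot store directly.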

It took me a few hours to implement a Save() and a Load() for my decision tree regression system. The output of a demo is:

Loading synthetic train (200) and test (40) data
Done

Setting maxDepth = 3
Setting minSamples = 2
Setting minLeaf = 18
Creating and training tree
Done

Tree:
ID 0      0  -0.2102   not_null   not_null   0.0000  False
ID 1      4   0.1431   not_null   not_null   0.0000  False
ID 2      0   0.3915   not_null   not_null   0.0000  False
ID 3      0  -0.6553   not_null   not_null   0.0000  False
ID 4     -1   0.0000   null       null       0.4123   True
ID 5      4  -0.2987   not_null   not_null   0.0000  False
ID 6      2   0.3777   not_null   not_null   0.0000  False
ID 7     -1   0.0000   null       null       0.6952   True
ID 8     -1   0.0000   null       null       0.5598   True
ID 11    -1   0.0000   null       null       0.4101   True
ID 12    -1   0.0000   null       null       0.2613   True
ID 13    -1   0.0000   null       null       0.1882   True
ID 14    -1   0.0000   null       null       0.1381   True

Predicting for trainX[0] =
  -0.1660   0.4406  -0.9998  -0.3953  -0.7065
Predicted y = 0.4101

Saving model to file
Done

Loading saved model to new tree
Done

New tree:
ID 0      0  -0.2102   not_null   not_null   0.0000  False
ID 1      4   0.1431   not_null   not_null   0.0000  False
ID 2      0   0.3915   not_null   not_null   0.0000  False
ID 3      0  -0.6553   not_null   not_null   0.0000  False
ID 4     -1   0.0000   null       null       0.4123   True
ID 5      4  -0.2987   not_null   not_null   0.0000  False
ID 6      2   0.3777   not_null   not_null   0.0000  False
ID 7     -1   0.0000   null       null       0.6952   True
ID 8     -1   0.0000   null       null       0.5598   True
ID 11    -1   0.0000   null       null       0.4101   True
ID 12    -1   0.0000   null       null       0.2613   True
ID 13    -1   0.0000   null       null       0.1882   True
ID 14    -1   0.0000   null       null       0.1381   True

Using saved model to predict for:
  -0.1660   0.4406  -0.9998  -0.3953  -0.7065
Predicted y = 0.4101

End demo

To save a trained decision tree model, I traverse the tree breadth-first using a queue data structure. In pseudo-code:

initialize an empty queue with root node
open file to write to
while queue is not empty
  node = dequeue()
  fetch node values (replacing pointers with node IDs)
  write values to file as a comma-separated line
  if node left child not null, add to queue
  if node right child not null, add to queue
end-loop
close file

The implementation is:

public void Save(string fn)
{
  FileStream ofs = new FileStream(fn, FileMode.Create);
  StreamWriter sw = new StreamWriter(ofs);

  Queue<Node> q = new Queue<Node>();
  q.Enqueue(this.root);
  while (q.Count > 0) {
    Node n = q.Dequeue();
    string s = "";
    s += n.id + ",";
    s += n.colIdx + ",";
    s += n.thresh.ToString("F4") + ",";
    if (n.left == null) s += "-1,";  // -1 encodes null
    else s += n.left.id + ",";
    if (n.right == null) s += "-1,";
    else s += n.right.id + ",";
    s += n.value.ToString("F4") + ",";
    s += n.isLeaf.ToString();

    sw.WriteLine(s);

    if (n.left != null) q.Enqueue(n.left);
    if (n.right != null) q.Enqueue(n.right);
  } // while

  sw.Close(); ofs.Close();
}

To load a tree, I first create a List of dummy nodes. A tree with maxDepth = n has at most 2^(n+1) - 1 nodes (for the demo, maxDepth = 3 gives at most 15 nodes). Then I iterate through the saved file, parse the values from each line, and feed them into the corresponding node in the List. In pseudo-code:

initialize a List with 2^(n+1)-1 dummy nodes
open saved file
while file not finished
  read a line
  parse values from line (replacing node IDs with pointers)
  feed values to curr node in List
end-loop
close file
set root to List[0]

The implementation is:

public void Load(string fn)
{
  FileStream ifs = new FileStream(fn, FileMode.Open);
  StreamReader sr = new StreamReader(ifs);

  int maxNodes = (int)Math.Pow(2, this.maxDepth + 1) - 1;
  List<Node> lst = new List<Node>();
  for (int i = 0; i < maxNodes; ++i)
    lst.Add(new Node(-1, -1, 0.0,
      null, null, 0.0, false)); // dummy node
 
  string line = "";
  string[] tokens = null;
  while ((line = sr.ReadLine()) != null)
  {
    tokens = line.Split(',');
    int idx = int.Parse(tokens[0]); // node ID doubles as List index
    lst[idx].id = idx;
    lst[idx].colIdx = int.Parse(tokens[1]);
    lst[idx].thresh = double.Parse(tokens[2]);

    int leftIdx = int.Parse(tokens[3]);
    if (leftIdx != -1)
      lst[idx].left = lst[leftIdx]; // index back to reference

    int rightIdx = int.Parse(tokens[4]);
    if (rightIdx != -1)
      lst[idx].right = lst[rightIdx];

    lst[idx].value = double.Parse(tokens[5]);
    lst[idx].isLeaf = bool.Parse(tokens[6]);
  }

  sr.Close(); ifs.Close();
  this.root = lst[0];
}

Notice that Load() assumes this.maxDepth has been set to match the saved tree, so that enough dummy nodes are allocated.
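To make the reference-to-index-to-reference round trip concrete, here is a tiny stand-alone toy program. It uses a stripped-down TNode type (not the full Node class from my system) and serializes a three-node tree to strings rather than a file, but the save and load logic mirrors the two methods above:

```csharp
using System;
using System.Collections.Generic;

// A stripped-down node type for the toy demo.
public class TNode { public int id; public TNode left; public TNode right; }

public class RoundTripDemo
{
  // Serialize a tree breadth-first to comma-separated lines
  // (child references replaced by IDs, -1 for null), then
  // rebuild it from dummy nodes (IDs wired back to references).
  public static TNode RoundTrip(TNode root, int maxNodes)
  {
    // "save" phase
    List<string> lines = new List<string>();
    Queue<TNode> q = new Queue<TNode>();
    q.Enqueue(root);
    while (q.Count > 0)
    {
      TNode n = q.Dequeue();
      int li = (n.left == null) ? -1 : n.left.id;
      int ri = (n.right == null) ? -1 : n.right.id;
      lines.Add(n.id + "," + li + "," + ri);
      if (n.left != null) q.Enqueue(n.left);
      if (n.right != null) q.Enqueue(n.right);
    }

    // "load" phase: create dummy nodes, then wire up references
    List<TNode> lst = new List<TNode>();
    for (int i = 0; i < maxNodes; ++i)
      lst.Add(new TNode { id = -1 });
    foreach (string line in lines)
    {
      string[] toks = line.Split(',');
      int idx = int.Parse(toks[0]);
      lst[idx].id = idx;
      int li = int.Parse(toks[1]);
      if (li != -1) lst[idx].left = lst[li];
      int ri = int.Parse(toks[2]);
      if (ri != -1) lst[idx].right = lst[ri];
    }
    return lst[0];
  }

  public static void Main()
  {
    TNode root = new TNode { id = 0 };   // tiny tree: 0 -> (1, 2)
    root.left = new TNode { id = 1 };
    root.right = new TNode { id = 2 };
    TNode newRoot = RoundTrip(root, 3);
    Console.WriteLine(newRoot.left.id + " " + newRoot.right.id);  // 1 2
  }
}
```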

A fun and interesting experiment.



Decision tree regression models have a certain beauty to them (in my mind anyway). Illustrator Joseph Bolegard (1889-1963) was well-known for his advertising art in the 1940s and 1950s, which often featured attractive models. Until recently, advertising used models that ordinary people could admire and want to be like, as opposed to the modern trend of using models that seem to be the least common denominators of society. But there is still some good, non-pandering advertising around.


This entry was posted in Machine Learning.