Checking Machine Learning Training Data for Multicollinearity Using VIF (Variance Inflation Factor) from Scratch Python

In machine learning, if training data is multicollinear, the resulting model will likely be poor. The most common way to analyze training data for multicollinearity is to compute the VIF (variance inflation factor) for each column of the data.

VIF is a value between 1.0 and positive infinity (in unusual scenarios, for example when R2 comes out negative, a VIF value could be less than one). Briefly, if all column VIF values are less than about 7.0, the data is probably OK.

If VIF is close to 1.0, the column is not correlated with the other columns.
If VIF is between 1.0 and 5.0, the column is mildly correlated.
If VIF is between 5.0 and 10.0, the column is highly correlated.
If VIF is greater than 10.0, the column is extremely correlated.
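
The rule-of-thumb bands above can be written as a small helper. This is only a sketch: the name vif_label, the label strings, and the near_one cutoff for "close to 1.0" are my own assumptions, and the handling of values that fall exactly on 5.0 or 10.0 is a judgment call.

```python
def vif_label(vif, near_one=1.05):
  # near_one is an assumed cutoff for "close to 1.0"
  if vif <= near_one:
    return "not correlated"
  elif vif < 5.0:
    return "mildly correlated"
  elif vif <= 10.0:
    return "highly correlated"
  else:
    return "extremely correlated"

for v in [1.0, 1.25, 7.0, 25546262.9389]:
  print("vif = %0.4f : %s" % (v, vif_label(v)))
```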

To compute the VIF for a specified column of training data, you use the specified column as the dependent y variable and the remaining columns as the independent predictor variables. You then fit a linear regression model and compute the R2 (coefficient of determination) for the model. The VIF value for the column is 1.0 / (1.0 - R2).
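
The procedure just described can be sketched with NumPy alone. This is a minimal illustration, not the demo program: the function name vif_lstsq, the random seed, and the random test data are assumptions, and np.linalg.lstsq stands in for a full linear regression class.

```python
import numpy as np

def vif_lstsq(data, i):
  # regress column i on the remaining columns (plus an intercept)
  y = data[:, i]
  X = np.delete(data, i, axis=1)
  A = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
  coef = np.linalg.lstsq(A, y, rcond=None)[0]
  ss_res = np.sum((y - A @ coef) ** 2)       # residual sum of squares
  ss_tot = np.sum((y - np.mean(y)) ** 2)     # total sum of squares
  r2 = 1.0 - ss_res / ss_tot
  return 1.0 / (1.0 - r2)

rng = np.random.default_rng(0)  # arbitrary seed (assumption)
data = rng.uniform(-1.0, 1.0, size=(20, 5))
vifs = [vif_lstsq(data, c) for c in range(data.shape[1])]
for c, v in enumerate(vifs):
  print("col = %3d | vif = %0.4f" % (c, v))
```

The function does not guard against R2 being exactly 1.0 (a perfectly collinear column), which would cause a division by zero.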

Suppose that you have a set of training data X predictor values, and you use some column c as the dependent y variable and all the other columns as predictors for c. After training the linear regression model, you compute R2 and it is 0.90, which means column c is predicted very well by the other columns. The VIF value for column c is 1.0 / (1.0 - R2) = 1.0 / 0.10 = 10.0, which is large. That is bad because column c is close to a linear combination of the other columns, so the data is somewhat multicollinear. Now, with the same setup, suppose R2 is 0.20, which means column c cannot be predicted well by the other columns. The VIF value is 1.0 / (1.0 - 0.20) = 1.0 / 0.80 = 1.25, which is a small value. That is good because column c is not close to a linear combination of the other columns, and therefore, at least with respect to column c, the data is not multicollinear.
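
The arithmetic in this example is easy to check in code. The helper name vif_from_r2 is hypothetical, and returning infinity when R2 reaches 1.0 is my own choice for the perfectly collinear case.

```python
def vif_from_r2(r2):
  # VIF = 1 / (1 - R2); R2 = 1 means perfect collinearity
  if r2 >= 1.0:
    return float("inf")
  return 1.0 / (1.0 - r2)

for r2 in [0.20, 0.50, 0.80, 0.90, 0.99]:
  print("R2 = %0.2f | vif = %0.4f" % (r2, vif_from_r2(r2)))
```

Notice how quickly VIF grows as R2 approaches 1.0.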

I put together a demo using Python and the scikit-learn library. I created two datasets. The first dataset has five columns of predictors, followed by a column of target y values. The data is “normal” in the sense that there’s no multicollinearity. There are 20 items. It looks like:

-0.1660,  0.4406, -0.9998, -0.3953, -0.7065,  0.4840
 0.0776, -0.1616,  0.3704, -0.5911,  0.7562,  0.1568
-0.9452,  0.3409, -0.1654,  0.1174, -0.7192,  0.8054
. . .

The second dataset is highly multicollinear, where the third column is 2 times the first column, plus the second column, plus a small random value between 0.000 and 0.001. It looks like:
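
A dataset with this kind of built-in dependency could be generated along the following lines. This is a sketch under assumptions: the seed and the uniform(-1, 1) range for the predictors are my guesses, so it will not reproduce the exact values shown in this post.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed (assumption)
n = 20
X = rng.uniform(-1.0, 1.0, size=(n, 5))    # five predictor columns
noise = rng.uniform(0.0, 0.001, size=n)    # small random value
X[:, 2] = 2.0 * X[:, 0] + X[:, 1] + noise  # overwrite the third column

# the third column is now almost exactly a linear combination
err = X[:, 2] - (2.0 * X[:, 0] + X[:, 1])
print("max deviation from exact combination = %0.6f" % np.max(err))
```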

-0.1660,  0.4406,  0.1096, -0.3953, -0.7065, 0.4840
 0.0776, -0.1616, -0.0045, -0.5911,  0.7562, 0.1568
-0.9452,  0.3409, -1.5482,  0.1174, -0.7192, 0.8054
. . .

The output of the demo program is:

Begin variance inflation factor demo

Loading synthetic (20) normal data

First two X:
[-0.1660  0.4406 -0.9998 -0.3953 -0.7065]
[ 0.0776 -0.1616  0.3704 -0.5911  0.7562]

Begin VIF analysis
col =   0 | vif = 1.1980
col =   1 | vif = 1.4591
col =   2 | vif = 1.2345
col =   3 | vif = 1.3025
col =   4 | vif = 1.2120

Loading synthetic (20) multicollinear data
(col[2] = 2.0 * col[0] + col[1] + rnd)

First two X:
[-0.1660  0.4406  0.1096 -0.3953 -0.7065]
[ 0.0776 -0.1616 -0.0045 -0.5911  0.7562]

Begin VIF analysis
col =   0 vif = 25546262.9389
col =   1 vif = 6023299.3886
col =   2 vif = 30951889.1370
col =   3 vif = 1.2937
col =   4 vif = 1.2117

End demo

As expected, the first dataset didn’t have any bad VIF values, but the huge VIF values for the second dataset show that columns [0], [1], and [2] are extremely correlated.
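
As a cross-check on the regression approach, there is a standard identity: the VIF values are the diagonal entries of the inverse of the correlation matrix of the columns. The demo doesn't use this; the sketch below (function name vif_all and test data are assumptions) just verifies it on a two-column case, where the VIF should equal 1 / (1 - r^2) for correlation r.

```python
import numpy as np

def vif_all(data):
  # diagonal of the inverse correlation matrix gives every VIF at once
  corr = np.corrcoef(data, rowvar=False)  # columns are the variables
  return np.diag(np.linalg.inv(corr))

rng = np.random.default_rng(0)  # arbitrary seed (assumption)
a = rng.uniform(-1.0, 1.0, size=50)
b = 0.8 * a + 0.2 * rng.uniform(-1.0, 1.0, size=50)
two_cols = np.column_stack([a, b])

r = np.corrcoef(a, b)[0, 1]
print("vif via inverse  =", vif_all(two_cols))
print("1 / (1 - r^2)    =", 1.0 / (1.0 - r * r))
```

For nearly singular data like the second dataset, the matrix inverse is numerically delicate, so the column-by-column regression is probably the safer choice.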

No moral to this blog post. Just an interesting exploration.



In machine learning, you don’t want a relationship between two columns in your training data. But in science fiction movies, you absolutely want a good relationship between the hero and the main actress.

I’m a huge fan of science fiction movies from the 1950s and 1960s. Here are posters of two films that were good, but they could have been great if the chemistry between the hero and the main lady were better.

Left: In “Crack in the World” (1965), scientists create a project to drill to the Earth’s magma center to gain a source of unlimited heat, and therefore unlimited energy. The plan involves firing a thermonuclear missile into a hole. This was not a good idea, to put it mildly. The chemistry between Dr. Rampion (actor Kieron Moore) and the wife of his boss, Dr. Sorensen (actress Janette Scott) was, well, nonexistent. But I give the movie a B grade anyway.

Right: In “The Day the Earth Caught Fire” (1961), the U.S. and the Soviets unknowingly explode nuclear test weapons at the same time on the same day. This was not a good idea, to put it mildly. The Earth is knocked out of orbit, towards the Sun. Only exploding every nuclear device on the planet simultaneously might save humanity. The chemistry between newspaper reporter Peter Stenning (actor Edward Judd) and office worker Jeannie Craig (actress Janet Munro, who is one of my sci fi favorites) was awkward and unconvincing. But I give the movie a B- grade anyway.


Demo program:

# variance_inflation_factor.py

import numpy as np
from sklearn.linear_model import LinearRegression

np.set_printoptions(precision=4, suppress=True,
    floatmode='fixed')

def vif(data, i):
  # vif = 1.0 / (1.0 - R2) if col [i] is dependent variable
  X = np.delete(data, i, axis=1) # all cols except i
  y = data[:,i]

  model = LinearRegression()
  model.fit(X, y)
  r2 = model.score(X, y)
  result = 1.0 / (1.0 - r2)
  return result

# -----------------------------------------------------------

print("\nBegin variance inflation factor demo ")

print("\nLoading synthetic (20) normal data ")
train_X = \
  np.loadtxt(".\\Data\\synthetic_train_20.txt",
  usecols=[0,1,2,3,4], delimiter=",")
 
print("\nFirst two X: ")
for i in range(2):
  print(train_X[i])

print("\nBegin VIF analysis ")

for c in range(len(train_X[0])):
  z = vif(train_X, c)
  print("col = %3d | vif = %0.4f " % (c, z))

print("\nLoading synthetic (20) multicollinear data ")
print("(col[2] = 2.0 * col[0] + col[1] + rnd) ")
train_X = \
  np.loadtxt(".\\Data\\synthetic_train_20_collinear.txt",
  usecols=[0,1,2,3,4], delimiter=",")

print("\nFirst two X: ")
for i in range(2):
  print(train_X[i])

print("\nBegin VIF analysis ")

for c in range(len(train_X[0])):
  z = vif(train_X, c)
  print("col = %3d vif = %0.4f " % (c, z))
print("\nEnd demo ")

First, normal, dataset:

# synthetic_train_20.txt
#
-0.1660,  0.4406, -0.9998, -0.3953, -0.7065,  0.4840
 0.0776, -0.1616,  0.3704, -0.5911,  0.7562,  0.1568
-0.9452,  0.3409, -0.1654,  0.1174, -0.7192,  0.8054
 0.9365, -0.3732,  0.3846,  0.7528,  0.7892,  0.1345
-0.8299, -0.9219, -0.6603,  0.7563, -0.8033,  0.7955
 0.0663,  0.3838, -0.3690,  0.3730,  0.6693,  0.3206
-0.9634,  0.5003,  0.9777,  0.4963, -0.4391,  0.7377
-0.1042,  0.8172, -0.4128, -0.4244, -0.7399,  0.4801
-0.9613,  0.3577, -0.5767, -0.4689, -0.0169,  0.6861
-0.7065,  0.1786,  0.3995, -0.7953, -0.1719,  0.5569
 0.3888, -0.1716, -0.9001,  0.0718,  0.3276,  0.2500
 0.1731,  0.8068, -0.7251, -0.7214,  0.6148,  0.3297
-0.2046, -0.6693,  0.8550, -0.3045,  0.5016,  0.2129
 0.2473,  0.5019, -0.3022, -0.4601,  0.7918,  0.2613
-0.1438,  0.9297,  0.3269,  0.2434, -0.7705,  0.5171
 0.1568, -0.1837, -0.5259,  0.8068,  0.1474,  0.3307
-0.9943,  0.2343, -0.3467,  0.0541,  0.7719,  0.5581
 0.2467, -0.9684,  0.8589,  0.3818,  0.9946,  0.1092
-0.6553, -0.7257,  0.8652,  0.3936, -0.8680,  0.7018
 0.8460,  0.4230, -0.7515, -0.9602, -0.9476,  0.1996

Second, multicollinear, dataset:

# synthetic_train_20_collinear.txt
# col [2] = 2*[0] + [1] + rand(0.001)
#
-0.1660,  0.4406,  0.1096, -0.3953, -0.7065, 0.4840
 0.0776, -0.1616, -0.0045, -0.5911,  0.7562, 0.1568
-0.9452,  0.3409, -1.5482,  0.1174, -0.7192, 0.8054
 0.9365, -0.3732,  1.5016,  0.7528,  0.7892, 0.1345
-0.8299, -0.9219, -2.5800,  0.7563, -0.8033, 0.7955
 0.0663,  0.3838,  0.5179,  0.3730,  0.6693, 0.3206
-0.9634,  0.5003, -1.4245,  0.4963, -0.4391, 0.7377
-0.1042,  0.8172,  0.6100, -0.4244, -0.7399, 0.4801
-0.9613,  0.3577, -1.5636, -0.4689, -0.0169, 0.6861
-0.7065,  0.1786, -1.2325, -0.7953, -0.1719, 0.5569
 0.3888, -0.1716,  0.6073,  0.0718,  0.3276, 0.2500
 0.1731,  0.8068,  1.1544, -0.7214,  0.6148, 0.3297
-0.2046, -0.6693, -1.0770, -0.3045,  0.5016, 0.2129
 0.2473,  0.5019,  0.9980, -0.4601,  0.7918, 0.2613
-0.1438,  0.9297,  0.6435,  0.2434, -0.7705, 0.5171
 0.1568, -0.1837,  0.1313,  0.8068,  0.1474, 0.3307
-0.9943,  0.2343, -1.7528,  0.0541,  0.7719, 0.5581
 0.2467, -0.9684, -0.4732,  0.3818,  0.9946, 0.1092
-0.6553, -0.7257, -2.0345,  0.3936, -0.8680, 0.7018
 0.8460,  0.4230,  2.1166, -0.9602, -0.9476, 0.1996
Posted in Machine Learning, Scikit | Leave a comment

Machine Learning Regression Showdown: Kernel Ridge Regression vs. Support Vector Regression Using Scikit

Bottom line: I compared kernel ridge regression (KRR) and support vector regression (SVR), using a small set of synthetic data. The results were essentially the same.

Over the past few years, based on several machine learning regression projects that I worked on, I have noticed that kernel ridge regression and support vector regression usually give pretty much the same results. This suggests that the kernel mechanism is more important than the error function (mean squared error for KRR, epsilon-insensitive loss for SVR). Put another way, KRR and SVR are really the same except they penalize incorrect predictions in different ways.
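
To make the difference between the two error functions concrete, here is a sketch of each as a per-item loss. The function names and the made-up predictions are assumptions for illustration, and both techniques also add a regularization term that is omitted here.

```python
import numpy as np

def squared_loss(y, y_pred):
  # per-item squared error, the loss behind KRR
  return (y - y_pred) ** 2

def eps_insensitive_loss(y, y_pred, epsilon=0.01):
  # per-item epsilon-insensitive loss, the loss behind SVR:
  # errors smaller than epsilon cost nothing
  return np.maximum(0.0, np.abs(y - y_pred) - epsilon)

y = np.array([0.4840, 0.1568, 0.8054])
y_pred = np.array([0.4890, 0.1568, 0.7054])  # made-up predictions

print("squared loss     =", squared_loss(y, y_pred))
print("eps-insens. loss =", eps_insensitive_loss(y, y_pred))
```

The first prediction is off by 0.005, which is inside the epsilon tube, so SVR ignores it while KRR still charges a small penalty.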

One morning before work, I figured I’d run a short investigation, using the scikit-learn KernelRidge and SVR modules. The key results were nearly identical:

# KRR:
# gamma = 0.1000
# alpha = 0.0010
# Accuracy (within 0.10) train = 0.9700
# Accuracy (within 0.10) test = 0.9500

# SVR:
# gamma = 0.1000
# C = 10.0000
# epsilon = 0.0100
# Accuracy (within 0.10) train = 0.9650
# Accuracy (within 0.10) test = 0.9500

The synthetic demo data has only 200 training items and 40 test items, so the difference between the accuracy of the KRR model (97.00% = 194 out of 200 correct) and the SVR model (96.50% = 193 out of 200 correct) is not significant.

In my opinion, KRR is preferable to SVR in most scenarios because KRR has one fewer hyperparameter to tune, and the underlying KRR training/optimization algorithm is much simpler (either closed form with matrix inverse, or SGD for very large datasets) than SVR training/optimization (hideously complicated quadratic programming). Also, because you can use SGD training, KRR scales better than SVR for large datasets.
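
To show how simple the closed form version of KRR training is, here is a from-scratch sketch using NumPy. The function names, the seed, and the made-up target are assumptions; it solves the linear system (K + alpha * I) w = y instead of forming an explicit matrix inverse, which gives the same weights.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
  # K[i,j] = exp(-gamma * ||A[i] - B[j]||^2)
  sq = (np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T)
  return np.exp(-gamma * np.maximum(sq, 0.0))

def krr_fit(X, y, gamma, alpha):
  # closed form training: one weight per training item
  K = rbf_kernel(X, X, gamma)
  return np.linalg.solve(K + alpha * np.eye(len(X)), y)

def krr_predict(X_train, w, X_new, gamma):
  # prediction needs all training items, not just the weights
  return rbf_kernel(X_new, X_train, gamma) @ w

rng = np.random.default_rng(0)  # arbitrary seed (assumption)
X = rng.uniform(-1.0, 1.0, size=(40, 5))
y = 0.5 + 0.2 * X[:, 0] - 0.1 * X[:, 1]  # made-up target
gamma, alpha = 0.1, 0.001
w = krr_fit(X, y, gamma, alpha)
preds = krr_predict(X, w, X, gamma)
print("max train error = %0.6f" % np.max(np.abs(preds - y)))
```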

The only advantage of SVR over KRR relates to storage. Both techniques must store training items in order to make predictions, but KRR stores all of them, while SVR discards some items during training and keeps just the “support vectors”. This means SVR requires less memory after training than KRR, and SVR is a bit faster than KRR when making predictions (although the difference in speed between SVR and KRR is often tiny).

Output of the KRR demo:

Begin scikit kernel ridge regression demo

Loading synthetic train (200) and test (40) data
Done

First three train X:
[-0.1660  0.4406 -0.9998 -0.3953 -0.7065]
[ 0.0776 -0.1616  0.3704 -0.5911  0.7562]
[-0.9452  0.3409 -0.1654  0.1174 -0.7192]

First three train y:
0.4840
0.1568
0.8054

Creating scikit KRR (with RBF) model
Setting gamma = 0.1000
Setting alpha = 0.0010
Done

Training scikit KRR model
Done.

Evaluating model

Accuracy (within 0.10) train = 0.9700
Accuracy (within 0.10) test = 0.9500

MSE train = 0.0001
MSE test = 0.0002

End demo

Output of the SVR demo:

Begin scikit SVR demo

Loading synthetic train (200) and test (40) data
Done

First three train X:
[-0.1660  0.4406 -0.9998 -0.3953 -0.7065]
[ 0.0776 -0.1616  0.3704 -0.5911  0.7562]
[-0.9452  0.3409 -0.1654  0.1174 -0.7192]

First three train y:
0.4840
0.1568
0.8054

Creating scikit SVR model
Setting gamma = 0.1000
Setting C = 10.0000
Setting epsilon = 0.0100
Done

Training SVR model
Done.

support vectors:
[  1   2   3   6   7  10  16  17  18  19  20  21  23  25
  26  32  37  39  40  41  42  43  46  47  48  52  57  58
  59  60  61  64  68  69  71  73  76  78  81  83  87  88
  89  90  92  93  94  95  97 100 101 106 109 110 112 118
 119 124 125 136 137 139 140 141 143 152 154 156 157 159
 161 163 164 165 167 169 170 171 173 176 179 180 181 185
 186 189 191 196 198 199]

Number support vectors = 90

Evaluating model

Accuracy (within 0.10) train = 0.9650
Accuracy (within 0.10) test = 0.9500

MSE train = 0.0001
MSE test = 0.0002

End demo

The KRR model stores all 200 training items, but SVR stores only 90 items. Does this matter? It depends on the scenario.
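
The storage difference can be inspected directly on trained scikit-learn models via the X_fit_ attribute of KernelRidge and the support_vectors_ attribute of SVR. The sketch below uses made-up random data (the seed and target function are assumptions), so the support vector count will differ from the 90 in the demo.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.svm import SVR

rng = np.random.default_rng(0)  # arbitrary seed (assumption)
X = rng.uniform(-1.0, 1.0, size=(200, 5))
y = 1.0 / (1.0 + np.exp(X[:, 0] - 0.5 * X[:, 1]))  # made-up smooth target

krr = KernelRidge(kernel='rbf', gamma=0.1, alpha=0.001).fit(X, y)
svr = SVR(kernel='rbf', gamma=0.1, C=10.0, epsilon=0.01).fit(X, y)

n_krr = krr.X_fit_.shape[0]            # KRR keeps every training item
n_svr = svr.support_vectors_.shape[0]  # SVR keeps only support vectors
print("KRR stored items = %d" % n_krr)
print("SVR stored items = %d" % n_svr)
```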



Kernel ridge regression and support vector regression are quite similar. During the golden age of pinball machines, there were several machines with a similar playing card theme.

Left: “Drop-a-Card” by Gottlieb (1971). A very well-known machine. About 2600 were produced. This was the last “short flipper” machine from Gottlieb. I played this game often at the UC Irvine Student Center when I was an undergraduate there.

Right: “Straight Flush” by Williams (1970). About 2400 were produced. I preferred the Williams style long flippers over the Gottlieb style short flippers.


The KRR demo program:

# krr_scikit.py
# kernel ridge regression on synthetic data

import numpy as np
from sklearn.kernel_ridge import KernelRidge

# KernelRidge(alpha=1, *, kernel='linear', gamma=None,
# degree=3, coef0=1, kernel_params=None)

# -----------------------------------------------------------

np.set_printoptions(precision=4, suppress=True,
  floatmode='fixed', linewidth=60)

# -----------------------------------------------------------

def accuracy(model, data_X, data_y, pct_close):
  n = len(data_X)
  n_correct = 0; n_wrong = 0
  for i in range(n):
    x = data_X[i].reshape(1,-1)
    y = data_y[i]
    y_pred = model.predict(x)[0]

    # correct if prediction is within pct_close of true y
    if np.abs(y - y_pred) < np.abs(y * pct_close):
      n_correct += 1
    else: 
      n_wrong += 1
  return n_correct / (n_correct + n_wrong)

def mse(model, data_X, data_y):
  n = len(data_X)
  sum = 0.0
  for i in range(n):
    actual_y = data_y[i]
    pred_y = model.predict(data_X[i].reshape(1, -1))[0]
    diff = actual_y - pred_y
    sum += diff * diff
  return sum /n

# -----------------------------------------------------------
# -----------------------------------------------------------

print("\nBegin scikit kernel ridge regression demo ")

np.set_printoptions(precision=4, suppress=True,
    floatmode='fixed')

print("\nLoading synthetic train (200) and test (40) data ")
train_Xy = np.loadtxt(".\\Data\\synthetic_train_200.txt",
  usecols=[0,1,2,3,4,5], delimiter=",")
train_X = train_Xy[:,[0,1,2,3,4]]
train_y = train_Xy[:,5]

test_Xy = np.loadtxt(".\\Data\\synthetic_test_40.txt",
  usecols=[0,1,2,3,4,5], delimiter=",")
test_X = test_Xy[:,[0,1,2,3,4]]
test_y = test_Xy[:,5]
print("Done ")

print("\nFirst three train X: ")
for i in range(3):
  print(train_X[i])
print("\nFirst three train y: ")
for i in range(3):
  print("%0.4f " % train_y[i])

print("\nCreating scikit KRR (with RBF) model ")
gamma = 0.1000  # found by grid search below
alpha = 0.0010
print("Setting gamma = %0.4f " % gamma)
print("Setting alpha = %0.4f " % alpha)
model = KernelRidge(kernel='rbf', gamma=gamma, alpha=alpha)
print("Done ")

print("\nTraining scikit KRR model ")
model.fit(train_X, train_y)
print("Done. ")

print("\nEvaluating model ")
acc_train = accuracy(model, train_X, train_y, 0.10)
acc_test = accuracy(model, test_X, test_y, 0.10)
print("\nAccuracy (within 0.10) train = %0.4f " % \
  acc_train)
print("Accuracy (within 0.10) test = %0.4f " % \
  acc_test)

mse_train = mse(model, train_X, train_y)
mse_test = mse(model, test_X, test_y)
print("\nMSE train = %0.4f " % mse_train)
print("MSE test = %0.4f " % mse_test)

# gamma_vals = [0.001, 0.01, 0.05, 0.10, 1.0, 10.0]
# alpha_vals = [0.001, 0.01, 0.05, 0.10, 1.0, 10.0]

# for i in range(len(gamma_vals)):
#   for j in range(len(alpha_vals)):
#     print("\n============")
#     print("gamma = %0.4f " % gamma_vals[i])
#     print("alpha = %0.4f " % alpha_vals[j])
#     model = KernelRidge(kernel='rbf', 
#       gamma=gamma_vals[i], alpha=alpha_vals[j])
#     model.fit(train_X, train_y)
#     acc_train = accuracy(model, train_X, train_y, 0.10)
#     print("Accuracy (within 0.10) train = %0.4f " % \
#      acc_train)
#     acc_test = accuracy(model, test_X, test_y, 0.10)
#     print("Accuracy (within 0.10) test = %0.4f " % \
#      acc_test)

# KRR:
# gamma = 0.1000
# alpha = 0.0010
# Accuracy (within 0.10) train = 0.9700
# Accuracy (within 0.10) test = 0.9500

# SVR:
# gamma = 0.1000
# C = 10.0000
# epsilon = 0.0100
# Accuracy (within 0.10) train = 0.9650
# Accuracy (within 0.10) test = 0.9500

print("\nEnd demo ")
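
The accuracy() and mse() helpers in the program call predict() once per item, which is easy to follow but slow for large datasets. A vectorized alternative might look like the sketch below. The names accuracy_vec, mse_vec, and the MeanModel stand-in class are hypothetical; any object with a scikit-style predict() method should work in place of MeanModel.

```python
import numpy as np

def accuracy_vec(model, data_X, data_y, pct_close):
  preds = model.predict(data_X)  # one call for all rows
  close = np.abs(data_y - preds) < np.abs(data_y * pct_close)
  return np.mean(close)          # fraction of items that are close

def mse_vec(model, data_X, data_y):
  preds = model.predict(data_X)
  return np.mean((data_y - preds) ** 2)

class MeanModel:
  # hypothetical stand-in that always predicts the same value
  def __init__(self, value):
    self.value = value
  def predict(self, X):
    return np.full(len(X), self.value)

X = np.zeros((4, 5))                    # dummy predictors
y = np.array([0.50, 0.52, 0.60, 0.40])  # made-up targets
m = MeanModel(0.50)
acc = accuracy_vec(m, X, y, 0.10)
err = mse_vec(m, X, y)
print("accuracy = %0.4f" % acc)
print("mse = %0.4f" % err)
```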

The SVR demo program:

# svr_scikit.py
# SVR on synthetic data

import numpy as np
from sklearn.svm import SVR

# SVR(*, kernel='rbf', degree=3, gamma='scale',
# coef0=0.0, tol=0.001, C=1.0, epsilon=0.1, shrinking=True,
# cache_size=200, verbose=False, max_iter=-1)

# -----------------------------------------------------------

np.set_printoptions(precision=4, suppress=True,
  floatmode='fixed', linewidth=60)

# -----------------------------------------------------------

def accuracy(model, data_X, data_y, pct_close):
  n = len(data_X)
  n_correct = 0; n_wrong = 0
  for i in range(n):
    x = data_X[i].reshape(1,-1)
    y = data_y[i]
    y_pred = model.predict(x)[0]

    # correct if prediction is within pct_close of true y
    if np.abs(y - y_pred) < np.abs(y * pct_close):
      n_correct += 1
    else: 
      n_wrong += 1
  return n_correct / (n_correct + n_wrong)

def mse(model, data_X, data_y):
  n = len(data_X)
  sum = 0.0
  for i in range(n):
    actual_y = data_y[i]
    pred_y = model.predict(data_X[i].reshape(1, -1))[0]
    diff = actual_y - pred_y
    sum += diff * diff
  return sum /n

# -----------------------------------------------------------
# -----------------------------------------------------------

print("\nBegin scikit SVR demo ")

np.set_printoptions(precision=4, suppress=True,
    floatmode='fixed')

print("\nLoading synthetic train (200) and test (40) data ")
train_Xy = np.loadtxt(".\\Data\\synthetic_train_200.txt",
  usecols=[0,1,2,3,4,5], delimiter=",")
train_X = train_Xy[:,[0,1,2,3,4]]
train_y = train_Xy[:,5]

test_Xy = np.loadtxt(".\\Data\\synthetic_test_40.txt",
  usecols=[0,1,2,3,4,5], delimiter=",")
test_X = test_Xy[:,[0,1,2,3,4]]
test_y = test_Xy[:,5]
print("Done ")

print("\nFirst three train X: ")
for i in range(3):
  print(train_X[i])
print("\nFirst three train y: ")
for i in range(3):
  print("%0.4f " % train_y[i])

print("\nCreating scikit SVR model ")
gamma = 0.1000  # found by grid search (below)
C = 10.0000
epsilon = 0.0100
print("Setting gamma = %0.4f " % gamma)
print("Setting C = %0.4f " % C)
print("Setting epsilon = %0.4f " % epsilon)
model = SVR(kernel='rbf', gamma=gamma, C=C, epsilon=epsilon)
print("Done ")

print("\nTraining SVR model ")
model.fit(train_X, train_y)
print("Done. ")

print("\nsupport vectors: ")
print(model.support_)
print("\nNumber support vectors = %d " % \
  len(model.support_))

print("\nEvaluating model ")
acc_train = accuracy(model, train_X, train_y, 0.10)
acc_test = accuracy(model, test_X, test_y, 0.10)
print("\nAccuracy (within 0.10) train = %0.4f " % \
  acc_train)
print("Accuracy (within 0.10) test = %0.4f " % \
  acc_test)

mse_train = mse(model, train_X, train_y)
mse_test = mse(model, test_X, test_y)
print("\nMSE train = %0.4f " % mse_train)
print("MSE test = %0.4f " % mse_test)

# gamma_vals = [0.01, 0.10, 1.0, 10.0]
# C_vals = [0.01, 0.10, 1.0, 10.0]
# epsilon_vals = [0.01, 0.10, 1.0, 10.0]

# for i in range(len(gamma_vals)):
#   for j in range(len(C_vals)):
#     for k in range(len(epsilon_vals)):
#       print("\n============")
#       print("gamma = %0.4f " % gamma_vals[i])
#       print("C = %0.4f " % C_vals[j])
#       print("epsilon = %0.4f " % epsilon_vals[k])
#       model = SVR(gamma=gamma_vals[i], C=C_vals[j],
#         epsilon=epsilon_vals[k])
#       model.fit(train_X, train_y)
#       acc_train = accuracy(model, train_X, train_y, 0.10)
#       print("Accuracy (within 0.10) train = %0.4f " % \
#         acc_train)
#       acc_test = accuracy(model, test_X, test_y, 0.10)
#       print("Accuracy (within 0.10) test = %0.4f " % \
#         acc_test)

# SVR:
# gamma = 0.1000
# C = 10.0000
# epsilon = 0.0100
# Accuracy (within 0.10) train = 0.9650
# Accuracy (within 0.10) test = 0.9500

# KRR:
# gamma = 0.1000
# alpha = 0.0010
# Accuracy (within 0.10) train = 0.9700
# Accuracy (within 0.10) test = 0.9500

print("\nEnd demo ")

Training data:

# synthetic_train_200.txt
#
-0.1660,  0.4406, -0.9998, -0.3953, -0.7065,  0.4840
 0.0776, -0.1616,  0.3704, -0.5911,  0.7562,  0.1568
-0.9452,  0.3409, -0.1654,  0.1174, -0.7192,  0.8054
 0.9365, -0.3732,  0.3846,  0.7528,  0.7892,  0.1345
-0.8299, -0.9219, -0.6603,  0.7563, -0.8033,  0.7955
 0.0663,  0.3838, -0.3690,  0.3730,  0.6693,  0.3206
-0.9634,  0.5003,  0.9777,  0.4963, -0.4391,  0.7377
-0.1042,  0.8172, -0.4128, -0.4244, -0.7399,  0.4801
-0.9613,  0.3577, -0.5767, -0.4689, -0.0169,  0.6861
-0.7065,  0.1786,  0.3995, -0.7953, -0.1719,  0.5569
 0.3888, -0.1716, -0.9001,  0.0718,  0.3276,  0.2500
 0.1731,  0.8068, -0.7251, -0.7214,  0.6148,  0.3297
-0.2046, -0.6693,  0.8550, -0.3045,  0.5016,  0.2129
 0.2473,  0.5019, -0.3022, -0.4601,  0.7918,  0.2613
-0.1438,  0.9297,  0.3269,  0.2434, -0.7705,  0.5171
 0.1568, -0.1837, -0.5259,  0.8068,  0.1474,  0.3307
-0.9943,  0.2343, -0.3467,  0.0541,  0.7719,  0.5581
 0.2467, -0.9684,  0.8589,  0.3818,  0.9946,  0.1092
-0.6553, -0.7257,  0.8652,  0.3936, -0.8680,  0.7018
 0.8460,  0.4230, -0.7515, -0.9602, -0.9476,  0.1996
-0.9434, -0.5076,  0.7201,  0.0777,  0.1056,  0.5664
 0.9392,  0.1221, -0.9627,  0.6013, -0.5341,  0.1533
 0.6142, -0.2243,  0.7271,  0.4942,  0.1125,  0.1661
 0.4260,  0.1194, -0.9749, -0.8561,  0.9346,  0.2230
 0.1362, -0.5934, -0.4953,  0.4877, -0.6091,  0.3810
 0.6937, -0.5203, -0.0125,  0.2399,  0.6580,  0.1460
-0.6864, -0.9628, -0.8600, -0.0273,  0.2127,  0.5387
 0.9772,  0.1595, -0.2397,  0.1019,  0.4907,  0.1611
 0.3385, -0.4702, -0.8673, -0.2598,  0.2594,  0.2270
-0.8669, -0.4794,  0.6095, -0.6131,  0.2789,  0.4700
 0.0493,  0.8496, -0.4734, -0.8681,  0.4701,  0.3516
 0.8639, -0.9721, -0.5313,  0.2336,  0.8980,  0.1412
 0.9004,  0.1133,  0.8312,  0.2831, -0.2200,  0.1782
 0.0991,  0.8524,  0.8375, -0.2102,  0.9265,  0.2150
-0.6521, -0.7473, -0.7298,  0.0113, -0.9570,  0.7422
 0.6190, -0.3105,  0.8802,  0.1640,  0.7577,  0.1056
 0.6895,  0.8108, -0.0802,  0.0927,  0.5972,  0.2214
 0.1982, -0.9689,  0.1870, -0.1326,  0.6147,  0.1310
-0.3695,  0.7858,  0.1557, -0.6320,  0.5759,  0.3773
-0.1596,  0.3581,  0.8372, -0.9992,  0.9535,  0.2071
-0.2468,  0.9476,  0.2094,  0.6577,  0.1494,  0.4132
 0.1737,  0.5000,  0.7166,  0.5102,  0.3961,  0.2611
 0.7290, -0.3546,  0.3416, -0.0983, -0.2358,  0.1332
-0.3652,  0.2438, -0.1395,  0.9476,  0.3556,  0.4170
-0.6029, -0.1466, -0.3133,  0.5953,  0.7600,  0.4334
-0.4596, -0.4953,  0.7098,  0.0554,  0.6043,  0.2775
 0.1450,  0.4663,  0.0380,  0.5418,  0.1377,  0.2931
-0.8636, -0.2442, -0.8407,  0.9656, -0.6368,  0.7429
 0.6237,  0.7499,  0.3768,  0.1390, -0.6781,  0.2185
-0.5499,  0.1850, -0.3755,  0.8326,  0.8193,  0.4399
-0.4858, -0.7782, -0.6141, -0.0008,  0.4572,  0.4197
 0.7033, -0.1683,  0.2334, -0.5327, -0.7961,  0.1776
 0.0317, -0.0457, -0.6947,  0.2436,  0.0880,  0.3345
 0.5031, -0.5559,  0.0387,  0.5706, -0.9553,  0.3107
-0.3513,  0.7458,  0.6894,  0.0769,  0.7332,  0.3170
 0.2205,  0.5992, -0.9309,  0.5405,  0.4635,  0.3532
-0.4806, -0.4859,  0.2646, -0.3094,  0.5932,  0.3202
 0.9809, -0.3995, -0.7140,  0.8026,  0.0831,  0.1600
 0.9495,  0.2732,  0.9878,  0.0921,  0.0529,  0.1289
-0.9476, -0.6792,  0.4913, -0.9392, -0.2669,  0.5966
 0.7247,  0.3854,  0.3819, -0.6227, -0.1162,  0.1550
-0.5922, -0.5045, -0.4757,  0.5003, -0.0860,  0.5863
-0.8861,  0.0170, -0.5761,  0.5972, -0.4053,  0.7301
 0.6877, -0.2380,  0.4997,  0.0223,  0.0819,  0.1404
 0.9189,  0.6079, -0.9354,  0.4188, -0.0700,  0.1907
-0.1428, -0.7820,  0.2676,  0.6059,  0.3936,  0.2790
 0.5324, -0.3151,  0.6917, -0.1425,  0.6480,  0.1071
-0.8432, -0.9633, -0.8666, -0.0828, -0.7733,  0.7784
-0.9444,  0.5097, -0.2103,  0.4939, -0.0952,  0.6787
-0.0520,  0.6063, -0.1952,  0.8094, -0.9259,  0.4836
 0.5477, -0.7487,  0.2370, -0.9793,  0.0773,  0.1241
 0.2450,  0.8116,  0.9799,  0.4222,  0.4636,  0.2355
 0.8186, -0.1983, -0.5003, -0.6531, -0.7611,  0.1511
-0.4714,  0.6382, -0.3788,  0.9648, -0.4667,  0.5950
 0.0673, -0.3711,  0.8215, -0.2669, -0.1328,  0.2677
-0.9381,  0.4338,  0.7820, -0.9454,  0.0441,  0.5518
-0.3480,  0.7190,  0.1170,  0.3805, -0.0943,  0.4724
-0.9813,  0.1535, -0.3771,  0.0345,  0.8328,  0.5438
-0.1471, -0.5052, -0.2574,  0.8637,  0.8737,  0.3042
-0.5454, -0.3712, -0.6505,  0.2142, -0.1728,  0.5783
 0.6327, -0.6297,  0.4038, -0.5193,  0.1484,  0.1153
-0.5424,  0.3282, -0.0055,  0.0380, -0.6506,  0.6613
 0.1414,  0.9935,  0.6337,  0.1887,  0.9520,  0.2540
-0.9351, -0.8128, -0.8693, -0.0965, -0.2491,  0.7353
 0.9507, -0.6640,  0.9456,  0.5349,  0.6485,  0.1059
-0.0462, -0.9737, -0.2940, -0.0159,  0.4602,  0.2606
-0.0627, -0.0852, -0.7247, -0.9782,  0.5166,  0.2977
 0.0478,  0.5098, -0.0723, -0.7504, -0.3750,  0.3335
 0.0090,  0.3477,  0.5403, -0.7393, -0.9542,  0.4415
-0.9748,  0.3449,  0.3736, -0.1015,  0.8296,  0.4358
 0.2887, -0.9895, -0.0311,  0.7186,  0.6608,  0.2057
 0.1570, -0.4518,  0.1211,  0.3435, -0.2951,  0.3244
 0.7117, -0.6099,  0.4946, -0.4208,  0.5476,  0.1096
-0.2929, -0.5726,  0.5346, -0.3827,  0.4665,  0.2465
 0.4889, -0.5572, -0.5718, -0.6021, -0.7150,  0.2163
-0.7782,  0.3491,  0.5996, -0.8389, -0.5366,  0.6516
-0.5847,  0.8347,  0.4226,  0.1078, -0.3910,  0.6134
 0.8469,  0.4121, -0.0439, -0.7476,  0.9521,  0.1571
-0.6803, -0.5948, -0.1376, -0.1916, -0.7065,  0.7156
 0.2878,  0.5086, -0.5785,  0.2019,  0.4979,  0.2980
 0.2764,  0.1943, -0.4090,  0.4632,  0.8906,  0.2960
-0.8877,  0.6705, -0.6155, -0.2098, -0.3998,  0.7107
-0.8398,  0.8093, -0.2597,  0.0614, -0.0118,  0.6502
-0.8476,  0.0158, -0.4769, -0.2859, -0.7839,  0.7715
 0.5751, -0.7868,  0.9714, -0.6457,  0.1448,  0.1175
 0.4802, -0.7001,  0.1022, -0.5668,  0.5184,  0.1090
 0.4458, -0.6469,  0.7239, -0.9604,  0.7205,  0.0779
 0.5175,  0.4339,  0.9747, -0.4438, -0.9924,  0.2879
 0.8678,  0.7158,  0.4577,  0.0334,  0.4139,  0.1678
 0.5406,  0.5012,  0.2264, -0.1963,  0.3946,  0.2088
-0.9938,  0.5498,  0.7928, -0.5214, -0.7585,  0.7687
 0.7661,  0.0863, -0.4266, -0.7233, -0.4197,  0.1466
 0.2277, -0.3517, -0.0853, -0.1118,  0.6563,  0.1767
 0.3499, -0.5570, -0.0655, -0.3705,  0.2537,  0.1632
 0.7547, -0.1046,  0.5689, -0.0861,  0.3125,  0.1257
 0.8186,  0.2110,  0.5335,  0.0094, -0.0039,  0.1391
 0.6858, -0.8644,  0.1465,  0.8855,  0.0357,  0.1845
-0.4967,  0.4015,  0.0805,  0.8977,  0.2487,  0.4663
 0.6760, -0.9841,  0.9787, -0.8446, -0.3557,  0.1509
-0.1203, -0.4885,  0.6054, -0.0443, -0.7313,  0.4854
 0.8557,  0.7919, -0.0169,  0.7134, -0.1628,  0.2002
 0.0115, -0.6209,  0.9300, -0.4116, -0.7931,  0.4052
-0.7114, -0.9718,  0.4319,  0.1290,  0.5892,  0.3661
 0.3915,  0.5557, -0.1870,  0.2955, -0.6404,  0.2954
-0.3564, -0.6548, -0.1827, -0.5172, -0.1862,  0.4622
 0.2392, -0.4959,  0.5857, -0.1341, -0.2850,  0.2470
-0.3394,  0.3947, -0.4627,  0.6166, -0.4094,  0.5325
 0.7107,  0.7768, -0.6312,  0.1707,  0.7964,  0.2757
-0.1078,  0.8437, -0.4420,  0.2177,  0.3649,  0.4028
-0.3139,  0.5595, -0.6505, -0.3161, -0.7108,  0.5546
 0.4335,  0.3986,  0.3770, -0.4932,  0.3847,  0.1810
-0.2562, -0.2894, -0.8847,  0.2633,  0.4146,  0.4036
 0.2272,  0.2966, -0.6601, -0.7011,  0.0284,  0.2778
-0.0743, -0.1421, -0.0054, -0.6770, -0.3151,  0.3597
-0.4762,  0.6891,  0.6007, -0.1467,  0.2140,  0.4266
-0.4061,  0.7193,  0.3432,  0.2669, -0.7505,  0.6147
-0.0588,  0.9731,  0.8966,  0.2902, -0.6966,  0.4955
-0.0627, -0.1439,  0.1985,  0.6999,  0.5022,  0.3077
 0.1587,  0.8494, -0.8705,  0.9827, -0.8940,  0.4263
-0.7850,  0.2473, -0.9040, -0.4308, -0.8779,  0.7199
 0.4070,  0.3369, -0.2428, -0.6236,  0.4940,  0.2215
-0.0242,  0.0513, -0.9430,  0.2885, -0.2987,  0.3947
-0.5416, -0.1322, -0.2351, -0.0604,  0.9590,  0.3683
 0.1055,  0.7783, -0.2901, -0.5090,  0.8220,  0.2984
-0.9129,  0.9015,  0.1128, -0.2473,  0.9901,  0.4776
-0.9378,  0.1424, -0.6391,  0.2619,  0.9618,  0.5368
 0.7498, -0.0963,  0.4169,  0.5549, -0.0103,  0.1614
-0.2612, -0.7156,  0.4538, -0.0460, -0.1022,  0.3717
 0.7720,  0.0552, -0.1818, -0.4622, -0.8560,  0.1685
-0.4177,  0.0070,  0.9319, -0.7812,  0.3461,  0.3052
-0.0001,  0.5542, -0.7128, -0.8336, -0.2016,  0.3803
 0.5356, -0.4194, -0.5662, -0.9666, -0.2027,  0.1776
-0.2378,  0.3187, -0.8582, -0.6948, -0.9668,  0.5474
-0.1947, -0.3579,  0.1158,  0.9869,  0.6690,  0.2992
 0.3992,  0.8365, -0.9205, -0.8593, -0.0520,  0.3154
-0.0209,  0.0793,  0.7905, -0.1067,  0.7541,  0.1864
-0.4928, -0.4524, -0.3433,  0.0951, -0.5597,  0.6261
-0.8118,  0.7404, -0.5263, -0.2280,  0.1431,  0.6349
 0.0516, -0.8480,  0.7483,  0.9023,  0.6250,  0.1959
-0.3212,  0.1093,  0.9488, -0.3766,  0.3376,  0.2735
-0.3481,  0.5490, -0.3484,  0.7797,  0.5034,  0.4379
-0.5785, -0.9170, -0.3563, -0.9258,  0.3877,  0.4121
 0.3407, -0.1391,  0.5356,  0.0720, -0.9203,  0.3458
-0.3287, -0.8954,  0.2102,  0.0241,  0.2349,  0.3247
-0.1353,  0.6954, -0.0919, -0.9692,  0.7461,  0.3338
 0.9036, -0.8982, -0.5299, -0.8733, -0.1567,  0.1187
 0.7277, -0.8368, -0.0538, -0.7489,  0.5458,  0.0830
 0.9049,  0.8878,  0.2279,  0.9470, -0.3103,  0.2194
 0.7957, -0.1308, -0.5284,  0.8817,  0.3684,  0.2172
 0.4647, -0.4931,  0.2010,  0.6292, -0.8918,  0.3371
-0.7390,  0.6849,  0.2367,  0.0626, -0.5034,  0.7039
-0.1567, -0.8711,  0.7940, -0.5932,  0.6525,  0.1710
 0.7635, -0.0265,  0.1969,  0.0545,  0.2496,  0.1445
 0.7675,  0.1354, -0.7698, -0.5460,  0.1920,  0.1728
-0.5211, -0.7372, -0.6763,  0.6897,  0.2044,  0.5217
 0.1913,  0.1980,  0.2314, -0.8816,  0.5006,  0.1998
 0.8964,  0.0694, -0.6149,  0.5059, -0.9854,  0.1825
 0.1767,  0.7104,  0.2093,  0.6452,  0.7590,  0.2832
-0.3580, -0.7541,  0.4426, -0.1193, -0.7465,  0.5657
-0.5996,  0.5766, -0.9758, -0.3933, -0.9572,  0.6800
 0.9950,  0.1641, -0.4132,  0.8579,  0.0142,  0.2003
-0.4717, -0.3894, -0.2567, -0.5111,  0.1691,  0.4266
 0.3917, -0.8561,  0.9422,  0.5061,  0.6123,  0.1212
-0.0366, -0.1087,  0.3449, -0.1025,  0.4086,  0.2475
 0.3633,  0.3943,  0.2372, -0.6980,  0.5216,  0.1925
-0.5325, -0.6466, -0.2178, -0.3589,  0.6310,  0.3568
 0.2271,  0.5200, -0.1447, -0.8011, -0.7699,  0.3128
 0.6415,  0.1993,  0.3777, -0.0178, -0.8237,  0.2181
-0.5298, -0.0768, -0.6028, -0.9490,  0.4588,  0.4356
 0.6870, -0.1431,  0.7294,  0.3141,  0.1621,  0.1632
-0.5985,  0.0591,  0.7889, -0.3900,  0.7419,  0.2945
 0.3661,  0.7984, -0.8486,  0.7572, -0.6183,  0.3449
 0.6995,  0.3342, -0.3113, -0.6972,  0.2707,  0.1712
 0.2565,  0.9126,  0.1798, -0.6043, -0.1413,  0.2893
-0.3265,  0.9839, -0.2395,  0.9854,  0.0376,  0.4770
 0.2690, -0.1722,  0.9818,  0.8599, -0.7015,  0.3954
-0.2102, -0.0768,  0.1219,  0.5607, -0.0256,  0.3949
 0.8216, -0.9555,  0.6422, -0.6231,  0.3715,  0.0801
-0.2896,  0.9484, -0.7545, -0.6249,  0.7789,  0.4370
-0.9985, -0.5448, -0.7092, -0.5931,  0.7926,  0.5402

Test data:

# synthetic_test_40.txt
#
 0.7462,  0.4006, -0.0590,  0.6543, -0.0083,  0.1935
 0.8495, -0.2260, -0.0142, -0.4911,  0.7699,  0.1078
-0.2335, -0.4049,  0.4352, -0.6183, -0.7636,  0.5088
 0.1810, -0.5142,  0.2465,  0.2767, -0.3449,  0.3136
-0.8650,  0.7611, -0.0801,  0.5277, -0.4922,  0.7140
-0.2358, -0.7466, -0.5115, -0.8413, -0.3943,  0.4533
 0.4834,  0.2300,  0.3448, -0.9832,  0.3568,  0.1360
-0.6502, -0.6300,  0.6885,  0.9652,  0.8275,  0.3046
-0.3053,  0.5604,  0.0929,  0.6329, -0.0325,  0.4756
-0.7995,  0.0740, -0.2680,  0.2086,  0.9176,  0.4565
-0.2144, -0.2141,  0.5813,  0.2902, -0.2122,  0.4119
-0.7278, -0.0987, -0.3312, -0.5641,  0.8515,  0.4438
 0.3793,  0.1976,  0.4933,  0.0839,  0.4011,  0.1905
-0.8568,  0.9573, -0.5272,  0.3212, -0.8207,  0.7415
-0.5785,  0.0056, -0.7901, -0.2223,  0.0760,  0.5551
 0.0735, -0.2188,  0.3925,  0.3570,  0.3746,  0.2191
 0.1230, -0.2838,  0.2262,  0.8715,  0.1938,  0.2878
 0.4792, -0.9248,  0.5295,  0.0366, -0.9894,  0.3149
-0.4456,  0.0697,  0.5359, -0.8938,  0.0981,  0.3879
 0.8629, -0.8505, -0.4464,  0.8385,  0.5300,  0.1769
 0.1995,  0.6659,  0.7921,  0.9454,  0.9970,  0.2330
-0.0249, -0.3066, -0.2927, -0.4923,  0.8220,  0.2437
 0.4513, -0.9481, -0.0770, -0.4374, -0.9421,  0.2879
-0.3405,  0.5931, -0.3507, -0.3842,  0.8562,  0.3987
 0.9538,  0.0471,  0.9039,  0.7760,  0.0361,  0.1706
-0.0887,  0.2104,  0.9808,  0.5478, -0.3314,  0.4128
-0.8220, -0.6302,  0.0537, -0.1658,  0.6013,  0.4306
-0.4123, -0.2880,  0.9074, -0.0461, -0.4435,  0.5144
 0.0060,  0.2867, -0.7775,  0.5161,  0.7039,  0.3599
-0.7968, -0.5484,  0.9426, -0.4308,  0.8148,  0.2979
 0.7811,  0.8450, -0.6877,  0.7594,  0.2640,  0.2362
-0.6802, -0.1113, -0.8325, -0.6694, -0.6056,  0.6544
 0.3821,  0.1476,  0.7466, -0.5107,  0.2592,  0.1648
 0.7265,  0.9683, -0.9803, -0.4943, -0.5523,  0.2454
-0.9049, -0.9797, -0.0196, -0.9090, -0.4433,  0.6447
-0.4607,  0.1811, -0.2389,  0.4050, -0.0078,  0.5229
 0.2664, -0.2932, -0.4259, -0.7336,  0.8742,  0.1834
-0.4507,  0.1029, -0.6294, -0.1158, -0.6294,  0.6081
 0.8948, -0.0124,  0.9278,  0.2899, -0.0314,  0.1534
-0.1323, -0.8813, -0.0146, -0.0697,  0.6135,  0.2386
Posted in Machine Learning, Scikit

Adding L2 Regularization to Linear Regression Trained Using MP Pseudo-Inverse via QR-Householder with C#

There are three ways to train a basic linear regression model: (1) using stochastic gradient descent, (2) using the left pseudo-inverse (normal equations) via Cholesky inverse, and (3) using the relaxed Moore-Penrose pseudo-inverse, computed via one of several matrix decomposition techniques.

I have implemented many of the training algorithms from scratch. For type (3) training, my approach-of-choice is QR-Householder. One of the (very) minor disadvantages of using MP pseudo-inverse training is that there’s no easy way to add L2 regularization. Before I go any further, let me state forcefully that I almost never use regularization with linear regression. It’s just not useful, but that’s another (surprisingly complicated) story.
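For reference, the algebra behind the QR route to the pseudo-inverse can be sketched as follows. The notation is an assumption made for illustration: A is the m-by-n design matrix with full column rank (m at least n), Q is the m-by-n factor with orthonormal columns, R is the n-by-n upper triangular factor, y is the vector of targets, and w holds the bias and weights.

```latex
\begin{aligned}
A &= QR, \qquad Q^{T}Q = I_n \\
A^{+} &= (A^{T}A)^{-1}A^{T} \\
      &= (R^{T}Q^{T}QR)^{-1}R^{T}Q^{T} \\
      &= R^{-1}R^{-T}R^{T}Q^{T} \\
      &= R^{-1}Q^{T} \\
\mathbf{w} &= A^{+}\mathbf{y}
\end{aligned}
```

Under these assumptions, only the inverse of an upper triangular matrix and a transpose are needed, and both are easy to compute.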

One evening, after I walked my dogs, just for fun, I figured I’d add L2 regularization to my implementation of linear regression trained with MP pseudo-inverse via QR-Householder.

Output of the demo:

Begin C# linear regression using MP pinv
 (QR-Householder) training with L2 regularization

Loading synthetic train (200) and test (40) data
Done

First three train X:
 -0.1660  0.4406 -0.9998 -0.3953 -0.7065
  0.0776 -0.1616  0.3704 -0.5911  0.7562
 -0.9452  0.3409 -0.1654  0.1174 -0.7192

First three train y:
  0.4840
  0.1568
  0.8054

Creating and training Linear Regression model
 using QR p-inverse
Setting L2 lamda = 1.0000
Done

Coefficients/weights:
-0.2618  0.0331  -0.0453  0.0353  -0.1132
Bias/constant: 0.3618

Evaluating model

Accuracy train (within 0.10) = 0.4700
Accuracy test (within 0.10) = 0.6500

MSE train = 0.0026
MSE test = 0.0019

Predicting for x =
  -0.1660   0.4406  -0.9998  -0.3953  -0.7065

Predicted y = 0.5311

End demo

The demo data is synthetic. It was generated by a neural network, and so linear regression cannot predict it very well. There are 200 training items and 40 test items so it’s a small dataset. There are 5 predictors.

Adding L2 regularization to a linear regression model that’s trained using MP pseudo-inverse is tricky. Expressed in a diagram:

The diagram illustrates an example with just 4 predictor variables and just 20 training items. First, the training data is converted to a design matrix by adding a leading column of 1.0 values. Next, a square regularization matrix, with one row and one column for each column of the design matrix, is appended to the bottom of the design matrix. Each diagonal cell of the regularization matrix holds the square root of the regularization constant, except for the upper-left cell, which corresponds to the model bias and stays at 0.0. All other cells are 0.0. Finally, the target y values have one 0.0 appended for each added row, as shown.

At this point the modified X matrix and y vector are trained using MP pinv as usual.
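The reason the augmentation works can be sketched with a short derivation. The symbols are assumptions made for illustration: X_d is the design matrix, w holds the bias w_0 followed by the weights, lambda is the L2 constant, and D is an identity matrix whose upper-left cell is set to 0 so that the bias is not penalized. The last line also assumes the augmented matrix has full column rank.

```latex
\begin{aligned}
\tilde{X} &= \begin{bmatrix} X_d \\ \sqrt{\lambda}\,D \end{bmatrix}, \qquad
\tilde{\mathbf{y}} = \begin{bmatrix} \mathbf{y} \\ \mathbf{0} \end{bmatrix} \\
\lVert \tilde{X}\mathbf{w} - \tilde{\mathbf{y}} \rVert^{2}
  &= \lVert X_d\mathbf{w} - \mathbf{y} \rVert^{2}
   + \lambda \lVert D\mathbf{w} \rVert^{2}
   = \lVert X_d\mathbf{w} - \mathbf{y} \rVert^{2}
   + \lambda \sum_{j \ge 1} w_j^{2} \\
\mathbf{w} &= \tilde{X}^{+}\tilde{\mathbf{y}}
   = (X_d^{T}X_d + \lambda D)^{-1} X_d^{T}\mathbf{y}
\end{aligned}
```

In other words, ordinary least squares on the augmented data minimizes the L2-penalized squared error, so the pseudo-inverse routine itself does not need to change.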

If you get the feeling that this is complicated, well it is, especially since regularization for linear regression is rarely useful.

This was an interesting experiment, but not really useful in a practical way. The technique can be used with any linear model, notably quadratic regression, where it can sometimes be mildly useful.



I am fascinated to the point of obsession by mathematics, computer science, and machine learning. All of these things represent a model of reality in some way.

When I was a young man, I loved building plastic models, especially those from the Revell company. Some of my all time favorites were models produced in the 1950s that were space related. Here’s one model that I remember with great fondness.


Demo program. Replace “lt”, “gt”, “lte”, “gte” with the corresponding less-than, greater-than, less-than-or-equal, and greater-than-or-equal symbols (my blog editor chokes on them).

using System;
using System.IO;
using System.Collections.Generic;

namespace LinearRegressionPinvQRHouseholder
{
  internal class LinearRegressionPinvProgram
  {
    static void Main(string[] args)
    {
      Console.WriteLine("\nBegin C# linear regression" +
        " using MP pinv (QR-Householder) training with" +
        " L2 regularization ");

      // 1. load data
      Console.WriteLine("\nLoading synthetic train" +
        " (200) and test (40) data");
      string trainFile =
        "..\\..\\..\\Data\\synthetic_train_200.txt";
      int[] colsX = new int[] { 0, 1, 2, 3, 4 };
      double[][] trainX =
        MatLoad(trainFile, colsX, ',', "#");
      double[] trainY =
        MatToVec(MatLoad(trainFile,
        new int[] { 5 }, ',', "#"));

      string testFile =
        "..\\..\\..\\Data\\synthetic_test_40.txt";
      double[][] testX =
         MatLoad(testFile, colsX, ',', "#");
      double[] testY =
        MatToVec(MatLoad(testFile,
        new int[] { 5 }, ',', "#"));
      Console.WriteLine("Done ");

      Console.WriteLine("\nFirst three train X: ");
      for (int i = 0; i "lt" 3; ++i)
        VecShow(trainX[i], 4, 8);

      Console.WriteLine("\nFirst three train y: ");
      for (int i = 0; i "lt" 3; ++i)
        Console.WriteLine(trainY[i].ToString("F4").
          PadLeft(8));

      // 2. create and train model using pseudo-inverse
      Console.WriteLine("\nCreating and training" +
        " Linear Regression model using QR p-inverse");
      LinearRegressor model = new LinearRegressor();

      // train no regularization
      //model.Train(trainX, trainY);
      //Console.WriteLine("Done ");

      // train with regularization
      double lamda = 1.0;
      Console.WriteLine("Setting L2 lamda = " +
        lamda.ToString("F4"));
      model.Train(trainX, trainY, lamda);
      Console.WriteLine("Done ");

      // 2b. show model parameters
      Console.WriteLine("\nCoefficients/weights: ");
      for (int i = 0; i "lt" model.weights.Length; ++i)
        Console.Write(model.weights[i].
          ToString("F4") + "  ");
      Console.WriteLine("\nBias/constant: " +
        model.bias.ToString("F4"));

      // 3. evaluate model
      Console.WriteLine("\nEvaluating model ");
      double accTrain = model.Accuracy(trainX, trainY, 0.10);
      Console.WriteLine("\nAccuracy train (within 0.10) = " +
        accTrain.ToString("F4"));
      double accTest = model.Accuracy(testX, testY, 0.10);
      Console.WriteLine("Accuracy test (within 0.10) = " +
        accTest.ToString("F4"));

      double mseTrain = model.MSE(trainX, trainY);
      Console.WriteLine("\nMSE train = " +
        mseTrain.ToString("F4"));
      double mseTest = model.MSE(testX, testY);
      Console.WriteLine("MSE test = " +
        mseTest.ToString("F4"));

      // 4. use model to predict first training item
      double[] x = trainX[0];
      Console.WriteLine("\nPredicting for x = ");
      VecShow(x, 4, 9);
      double predY = model.Predict(x);
      Console.WriteLine("\nPredicted y = " +
        predY.ToString("F4"));

      Console.WriteLine("\nEnd demo ");
      Console.ReadLine();
    } // Main()

    // ------------------------------------------------------
    // helpers for Main(): MatLoad(), MatToVec(), VecShow()
    // ------------------------------------------------------

    static double[][] MatLoad(string fn, int[] usecols,
      char sep, string comment)
    {
      List"lt"double[]"gt" result = new List"lt"double[]"gt"();
      string line = "";
      FileStream ifs = new FileStream(fn, FileMode.Open);
      StreamReader sr = new StreamReader(ifs);
      while ((line = sr.ReadLine()) != null)
      {
        if (line.StartsWith(comment) == true)
          continue;
        string[] tokens = line.Split(sep);
        List"lt"double"gt" lst = new List"lt"double"gt"();
        for (int j = 0; j "lt" usecols.Length; ++j)
          lst.Add(double.Parse(tokens[usecols[j]]));
        double[] row = lst.ToArray();
        result.Add(row);
      }
      sr.Close(); ifs.Close();
      return result.ToArray();
    }

    static double[] MatToVec(double[][] mat)
    {
      int nRows = mat.Length;
      int nCols = mat[0].Length;
      double[] result = new double[nRows * nCols];
      int k = 0;
      for (int i = 0; i "lt" nRows; ++i)
        for (int j = 0; j "lt" nCols; ++j)
          result[k++] = mat[i][j];
      return result;
    }

    static void VecShow(double[] vec, int dec, int wid)
    {
      for (int i = 0; i "lt" vec.Length; ++i)
        Console.Write(vec[i].ToString("F" + dec).
          PadLeft(wid));
      Console.WriteLine("");
    }

  } // class Program

  // ========================================================

  public class LinearRegressor
  {
    public double[] weights;
    public double bias;
    private Random rnd;

    // ------------------------------------------------------

    public LinearRegressor(int seed = 0)  // ctor
    {
      this.weights = new double[0];
      this.bias = 0;
      this.rnd = new Random(seed); // not used this version
    }

    // ------------------------------------------------------
    // primary: Train(), Predict(), Accuracy(), MSE()
    // helpers: MatToDesign(), MatVecProd()
    // ------------------------------------------------------

    public double Predict(double[] x)
    {
      double result = 0.0;
      for (int j = 0; j "lt" x.Length; ++j)
        result += x[j] * this.weights[j];
      result += this.bias;
      return result;
    }

    // ------------------------------------------------------

    public void Train(double[][] trainX, double[] trainY)
    {
      // no regularization.
      // wts = pinv(designX) * y
      int dim = trainX[0].Length;
      this.weights = new double[dim];
      double[][] X = MatToDesign(trainX);
      double[][] Xpinv = QRHouseholder.MatPinv(X);
      double[] biasAndWts = MatVecProd(Xpinv, trainY);
      this.bias = biasAndWts[0];
      for (int i = 1; i "lt" biasAndWts.Length; ++i)
        this.weights[i - 1] = biasAndWts[i];
      return;
    }

    // ------------------------------------------------------

    public void Train(double[][] trainX, double[] trainY,
      double lamda)
    {
      // train using MP pinv QR Householder with L2
      int nRows = trainX.Length;  // 200
      int dim = trainX[0].Length;  // 5
      this.weights = new double[dim];

      double[][] Xd = MatToDesign(trainX); // 200 by 6
      double[][] X = MatRegularize(Xd, lamda); // tricky
      
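      // targets: the original y values followed by dim+1
      // zeros, one for each row added by MatRegularize()
      double[] Y = new double[nRows + dim + 1];
      for (int j = 0; j "lt" nRows; ++j)
        Y[j] = trainY[j];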

      double[][] Xpinv = QRHouseholder.MatPinv(X);
      double[] biasAndWts = MatVecProd(Xpinv, Y);
      this.bias = biasAndWts[0];
      for (int i = 1; i "lt" biasAndWts.Length; ++i)
        this.weights[i - 1] = biasAndWts[i];
      return;
    }

    // ------------------------------------------------------

    private static double[][] MatRegularize(double[][] Xd,
      double lamda)
    {
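      // Xd is a design matrix with leading col of 1.0s.
      // append an nCols-by-nCols block to the bottom of Xd:
      // all 0.0s except sqrt(lamda) on the diagonal, and
      // the [0][0] cell (bias) is left at 0.0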
      int nRows = Xd.Length;  // src
      int nCols = Xd[0].Length;  // src
      double[][] result = MatMake(nRows + nCols, nCols);

      // copy top into result, inc leading 1.0s
      for (int i = 0; i "lt" nRows; ++i)
        for (int j = 0; j "lt" nCols; ++j)
          result[i][j] = Xd[i][j];
        
      // fill bottom starting at [1][1] (skip bias)
      int col = 1;
      for (int i = nRows + 1; i "lt" result.Length; ++i)
        result[i][col++] = Math.Sqrt(lamda);
      
      return result;
    }

    // ------------------------------------------------------

    private static double[][] MatMake(int nRows, int nCols)
    {
      double[][] result = new double[nRows][];
      for (int i = 0; i "lt" nRows; ++i)
        result[i] = new double[nCols];
      return result;
    }

    // ------------------------------------------------------

    public double Accuracy(double[][] dataX, double[] dataY,
      double pctClose)
    {
      int numCorrect = 0; int numWrong = 0;
      for (int i = 0; i "lt" dataX.Length; ++i)
      {
        double actualY = dataY[i];
        double predY = this.Predict(dataX[i]);
        if (Math.Abs(predY - actualY) "lt"
          Math.Abs(pctClose * actualY))
          ++numCorrect;
        else
          ++numWrong;
      }
      return (numCorrect * 1.0) / (numWrong + numCorrect);
    }

    // ------------------------------------------------------

    public double MSE(double[][] dataX, double[] dataY)
    {
      int n = dataX.Length;
      double sum = 0.0;
      for (int i = 0; i "lt" n; ++i)
      {
        double actualY = dataY[i];
        double predY = this.Predict(dataX[i]);
        sum += (actualY - predY) * (actualY - predY);
      }
      return sum / n;
    }

    // ------------------------------------------------------
    
    private static double[] MatVecProd(double[][] M,
      double[] v)
    {
      // return a regular vector
      int nRows = M.Length;
      int nCols = M[0].Length;
      int n = v.Length;
      if (nCols != n)
        throw new Exception("non-conform in MatVecProd");

      double[] result = new double[nRows];
      for (int i = 0; i "lt" nRows; ++i)
        for (int k = 0; k "lt" nCols; ++k)
          result[i] += M[i][k] * v[k];

      return result;
    }

    // ------------------------------------------------------

    private static double[][] MatToDesign(double[][] M)
    {
      // add a column of 1s
      int nRows = M.Length;
      int nCols = M[0].Length;
      double[][] result = new double[nRows][];
      for (int i = 0; i "lt" nRows; ++i)
        result[i] = new double[nCols + 1];

      for (int i = 0; i "lt" nRows; ++i)
      {
        result[i][0] = 1.0;
        for (int j = 1; j "lt" nCols + 1; ++j)
          result[i][j] = M[i][j - 1];
      }
      return result;
    }

  } // class LinearRegressor

  // ========================================================

  public class QRHouseholder
  {
    // container for MP pseudo-inverse via QR-Householder
    // A = Q * R  (reduced form: Q is m-by-n, R is n-by-n)
    // pinv(A) = inv(R) * trans(Q)  note order matters
    //         = inv upper tri (easy) * transpose (easy)

    public static double[][] MatPinv(double[][] M)
    {
      double[][] Q; double[][] R;
      MatDecompQR(M, out Q, out R);  // Householder
      double[][] Ri = MatInvUpperTri(R);
      double[][] Qi = MatTranspose(Q);
      double[][] result = MatProduct(Ri, Qi);
      return result;
    }

    // ------------------------------------------------------

    public static double[][] MatInvUpperTri(double[][] U)
    {
      int n = U.Length;  // must be square matrix

      double[][] result = MatMake(n, n);
      for (int i = 0; i "lt" n; ++i)
        result[i][i] = 1.0;
      for (int k = 0; k "lt" n; ++k)
      {
        for (int j = 0; j "lt" n; ++j)
        {
          for (int i = 0; i "lt" k; ++i)
          {
            result[j][k] -= result[j][i] * U[i][k];
          }
          result[j][k] /= U[k][k];
        }
      }
      return result;
    }

    // ------------------------------------------------------

    public static double[][] MatMake(int nRows, int nCols)
    {
      double[][] result = new double[nRows][];
      for (int i = 0; i "lt" nRows; ++i)
        result[i] = new double[nCols];
      return result;
    }

    // ------------------------------------------------------

    public static double[][] MatTranspose(double[][] M)
    {
      int nRows = M.Length;
      int nCols = M[0].Length;
      double[][] result = MatMake(nCols, nRows);
      for (int i = 0; i "lt" nRows; ++i)
        for (int j = 0; j "lt" nCols; ++j)
          result[j][i] = M[i][j];
      return result;
    }

    // ------------------------------------------------------

    public static double[][] MatProduct(double[][] A,
      double[][] B)
    {
      int aRows = A.Length; int aCols = A[0].Length;
      int bRows = B.Length; int bCols = B[0].Length;
      if (aCols != bRows)
        throw new Exception("Non-conformable matrices");

      double[][] result = new double[aRows][];
      for (int i = 0; i "lt" aRows; ++i)
        result[i] = new double[bCols];

      for (int i = 0; i "lt" aRows; ++i) // each row of A
        for (int j = 0; j "lt" bCols; ++j) // each col of B
          for (int k = 0; k "lt" aCols; ++k)
            result[i][j] += A[i][k] * B[k][j];

      return result;
    }

    // ------------------------------------------------------

    public static void MatDecompQR(double[][] A, 
      out double[][] Q,  out double[][] R)
    {
      int m = A.Length; int n = A[0].Length;
      if (m "lt" n)
        throw new Exception("nRows must be gte nCols");

      double[][] QQ = MatMake(m, m); // working full Q
      for (int i = 0; i "lt" m; ++i)
        QQ[i][i] = 1.0;  // identity matrix

      double[][] RR = MatMake(m, n);
      for (int i = 0; i "lt" m; ++i)
        for (int j = 0; j "lt" n; ++j)
          RR[i][j] = A[i][j]; // copy of A is working R

      int k = Math.Min(m, n);  // or just use n
      for (int j = 0; j "lt" k; ++j) // main processing loop
      {
        int xn = m - j;
        double[] x = new double[xn];
        for (int i = 0; i "lt" xn; ++i)
          x[i] = RR[j + i][j];

        double ss = 0.0;
        for (int i = 0; i "lt" xn; ++i)
          ss += x[i] * x[i];
        double normX = Math.Sqrt(ss);

        // if (normX == 0.0) continue;
        if (Math.Abs(normX) "lt" 1.0e-12) continue;

        double sign;
        if (x[0] "gte" 0.0) sign = -1.0;
        else sign = 1.0; // counter-intuitive
      
        // denominator x[0] - sign * normX cannot be 0 here:
        // sign is opposite to x[0] and normX exceeds 1.0e-12
        double[] u = new double[xn];
        for (int i = 0; i "lt" xn; ++i)
          u[i] = x[i] / (x[0] - sign * normX);
        u[0] = 1.0;

        // compute scaling factor tau = 2 / (u^T * u)
        double tau = -sign * (x[0] - sign * normX) / normX;

        // dimensions for sub-matrices
        int nRowsSubR = m - j;   int nColsSubR = n - j;
        int nRowsSubQ = m;       int nColsSubQ = m - j;

        double[] vr = new double[nColsSubR];
        for (int c = 0; c "lt" nColsSubR; ++c)
        {
          double acc = 0.0;
          for (int r = 0; r "lt" nRowsSubR; ++r)
            acc += u[r] * RR[j + r][j + c];
          vr[c] = acc;
        }

        double[] vq = new double[nRowsSubQ];
        for (int r = 0; r "lt" nRowsSubQ; ++r)
        {
          double acc = 0.0;
          for (int c = 0; c "lt" nColsSubQ; ++c)
            acc += u[c] * QQ[r][j + c];
          vq[r] = acc;
        }

        // update sub-R
        for (int r = 0; r "lt" nRowsSubR; ++r)
          for (int c = 0; c "lt" nColsSubR; ++c)
            RR[j + r][j + c] -= tau * u[r] * vr[c];

        // update sub-Q
        for (int r = 0; r "lt" nRowsSubQ; ++r)
          for (int c = 0; c "lt" nColsSubQ; ++c)
            QQ[r][j + c] -= tau * vq[r] * u[c];
       
      } // j

      // extract QQ RR into out params
      Q = MatMake(m, n);
      for (int i = 0; i "lt" m; ++i)
        for (int j = 0; j "lt" n; ++j)
          Q[i][j] = QQ[i][j];

      R = MatMake(n, n);
      for (int i = 0; i "lt" n; ++i)
        for (int j = 0; j "lt" n; ++j)
          R[i][j] = RR[i][j];

      return;
    } // MatDecompQR

  } // class QRHouseholder


  // ========================================================

} // ns

Training data:

# synthetic_train_200.txt
#
-0.1660,  0.4406, -0.9998, -0.3953, -0.7065,  0.4840
 0.0776, -0.1616,  0.3704, -0.5911,  0.7562,  0.1568
-0.9452,  0.3409, -0.1654,  0.1174, -0.7192,  0.8054
 0.9365, -0.3732,  0.3846,  0.7528,  0.7892,  0.1345
-0.8299, -0.9219, -0.6603,  0.7563, -0.8033,  0.7955
 0.0663,  0.3838, -0.3690,  0.3730,  0.6693,  0.3206
-0.9634,  0.5003,  0.9777,  0.4963, -0.4391,  0.7377
-0.1042,  0.8172, -0.4128, -0.4244, -0.7399,  0.4801
-0.9613,  0.3577, -0.5767, -0.4689, -0.0169,  0.6861
-0.7065,  0.1786,  0.3995, -0.7953, -0.1719,  0.5569
 0.3888, -0.1716, -0.9001,  0.0718,  0.3276,  0.2500
 0.1731,  0.8068, -0.7251, -0.7214,  0.6148,  0.3297
-0.2046, -0.6693,  0.8550, -0.3045,  0.5016,  0.2129
 0.2473,  0.5019, -0.3022, -0.4601,  0.7918,  0.2613
-0.1438,  0.9297,  0.3269,  0.2434, -0.7705,  0.5171
 0.1568, -0.1837, -0.5259,  0.8068,  0.1474,  0.3307
-0.9943,  0.2343, -0.3467,  0.0541,  0.7719,  0.5581
 0.2467, -0.9684,  0.8589,  0.3818,  0.9946,  0.1092
-0.6553, -0.7257,  0.8652,  0.3936, -0.8680,  0.7018
 0.8460,  0.4230, -0.7515, -0.9602, -0.9476,  0.1996
-0.9434, -0.5076,  0.7201,  0.0777,  0.1056,  0.5664
 0.9392,  0.1221, -0.9627,  0.6013, -0.5341,  0.1533
 0.6142, -0.2243,  0.7271,  0.4942,  0.1125,  0.1661
 0.4260,  0.1194, -0.9749, -0.8561,  0.9346,  0.2230
 0.1362, -0.5934, -0.4953,  0.4877, -0.6091,  0.3810
 0.6937, -0.5203, -0.0125,  0.2399,  0.6580,  0.1460
-0.6864, -0.9628, -0.8600, -0.0273,  0.2127,  0.5387
 0.9772,  0.1595, -0.2397,  0.1019,  0.4907,  0.1611
 0.3385, -0.4702, -0.8673, -0.2598,  0.2594,  0.2270
-0.8669, -0.4794,  0.6095, -0.6131,  0.2789,  0.4700
 0.0493,  0.8496, -0.4734, -0.8681,  0.4701,  0.3516
 0.8639, -0.9721, -0.5313,  0.2336,  0.8980,  0.1412
 0.9004,  0.1133,  0.8312,  0.2831, -0.2200,  0.1782
 0.0991,  0.8524,  0.8375, -0.2102,  0.9265,  0.2150
-0.6521, -0.7473, -0.7298,  0.0113, -0.9570,  0.7422
 0.6190, -0.3105,  0.8802,  0.1640,  0.7577,  0.1056
 0.6895,  0.8108, -0.0802,  0.0927,  0.5972,  0.2214
 0.1982, -0.9689,  0.1870, -0.1326,  0.6147,  0.1310
-0.3695,  0.7858,  0.1557, -0.6320,  0.5759,  0.3773
-0.1596,  0.3581,  0.8372, -0.9992,  0.9535,  0.2071
-0.2468,  0.9476,  0.2094,  0.6577,  0.1494,  0.4132
 0.1737,  0.5000,  0.7166,  0.5102,  0.3961,  0.2611
 0.7290, -0.3546,  0.3416, -0.0983, -0.2358,  0.1332
-0.3652,  0.2438, -0.1395,  0.9476,  0.3556,  0.4170
-0.6029, -0.1466, -0.3133,  0.5953,  0.7600,  0.4334
-0.4596, -0.4953,  0.7098,  0.0554,  0.6043,  0.2775
 0.1450,  0.4663,  0.0380,  0.5418,  0.1377,  0.2931
-0.8636, -0.2442, -0.8407,  0.9656, -0.6368,  0.7429
 0.6237,  0.7499,  0.3768,  0.1390, -0.6781,  0.2185
-0.5499,  0.1850, -0.3755,  0.8326,  0.8193,  0.4399
-0.4858, -0.7782, -0.6141, -0.0008,  0.4572,  0.4197
 0.7033, -0.1683,  0.2334, -0.5327, -0.7961,  0.1776
 0.0317, -0.0457, -0.6947,  0.2436,  0.0880,  0.3345
 0.5031, -0.5559,  0.0387,  0.5706, -0.9553,  0.3107
-0.3513,  0.7458,  0.6894,  0.0769,  0.7332,  0.3170
 0.2205,  0.5992, -0.9309,  0.5405,  0.4635,  0.3532
-0.4806, -0.4859,  0.2646, -0.3094,  0.5932,  0.3202
 0.9809, -0.3995, -0.7140,  0.8026,  0.0831,  0.1600
 0.9495,  0.2732,  0.9878,  0.0921,  0.0529,  0.1289
-0.9476, -0.6792,  0.4913, -0.9392, -0.2669,  0.5966
 0.7247,  0.3854,  0.3819, -0.6227, -0.1162,  0.1550
-0.5922, -0.5045, -0.4757,  0.5003, -0.0860,  0.5863
-0.8861,  0.0170, -0.5761,  0.5972, -0.4053,  0.7301
 0.6877, -0.2380,  0.4997,  0.0223,  0.0819,  0.1404
 0.9189,  0.6079, -0.9354,  0.4188, -0.0700,  0.1907
-0.1428, -0.7820,  0.2676,  0.6059,  0.3936,  0.2790
 0.5324, -0.3151,  0.6917, -0.1425,  0.6480,  0.1071
-0.8432, -0.9633, -0.8666, -0.0828, -0.7733,  0.7784
-0.9444,  0.5097, -0.2103,  0.4939, -0.0952,  0.6787
-0.0520,  0.6063, -0.1952,  0.8094, -0.9259,  0.4836
 0.5477, -0.7487,  0.2370, -0.9793,  0.0773,  0.1241
 0.2450,  0.8116,  0.9799,  0.4222,  0.4636,  0.2355
 0.8186, -0.1983, -0.5003, -0.6531, -0.7611,  0.1511
-0.4714,  0.6382, -0.3788,  0.9648, -0.4667,  0.5950
 0.0673, -0.3711,  0.8215, -0.2669, -0.1328,  0.2677
-0.9381,  0.4338,  0.7820, -0.9454,  0.0441,  0.5518
-0.3480,  0.7190,  0.1170,  0.3805, -0.0943,  0.4724
-0.9813,  0.1535, -0.3771,  0.0345,  0.8328,  0.5438
-0.1471, -0.5052, -0.2574,  0.8637,  0.8737,  0.3042
-0.5454, -0.3712, -0.6505,  0.2142, -0.1728,  0.5783
 0.6327, -0.6297,  0.4038, -0.5193,  0.1484,  0.1153
-0.5424,  0.3282, -0.0055,  0.0380, -0.6506,  0.6613
 0.1414,  0.9935,  0.6337,  0.1887,  0.9520,  0.2540
-0.9351, -0.8128, -0.8693, -0.0965, -0.2491,  0.7353
 0.9507, -0.6640,  0.9456,  0.5349,  0.6485,  0.1059
-0.0462, -0.9737, -0.2940, -0.0159,  0.4602,  0.2606
-0.0627, -0.0852, -0.7247, -0.9782,  0.5166,  0.2977
 0.0478,  0.5098, -0.0723, -0.7504, -0.3750,  0.3335
 0.0090,  0.3477,  0.5403, -0.7393, -0.9542,  0.4415
-0.9748,  0.3449,  0.3736, -0.1015,  0.8296,  0.4358
 0.2887, -0.9895, -0.0311,  0.7186,  0.6608,  0.2057
 0.1570, -0.4518,  0.1211,  0.3435, -0.2951,  0.3244
 0.7117, -0.6099,  0.4946, -0.4208,  0.5476,  0.1096
-0.2929, -0.5726,  0.5346, -0.3827,  0.4665,  0.2465
 0.4889, -0.5572, -0.5718, -0.6021, -0.7150,  0.2163
-0.7782,  0.3491,  0.5996, -0.8389, -0.5366,  0.6516
-0.5847,  0.8347,  0.4226,  0.1078, -0.3910,  0.6134
 0.8469,  0.4121, -0.0439, -0.7476,  0.9521,  0.1571
-0.6803, -0.5948, -0.1376, -0.1916, -0.7065,  0.7156
 0.2878,  0.5086, -0.5785,  0.2019,  0.4979,  0.2980
 0.2764,  0.1943, -0.4090,  0.4632,  0.8906,  0.2960
-0.8877,  0.6705, -0.6155, -0.2098, -0.3998,  0.7107
-0.8398,  0.8093, -0.2597,  0.0614, -0.0118,  0.6502
-0.8476,  0.0158, -0.4769, -0.2859, -0.7839,  0.7715
 0.5751, -0.7868,  0.9714, -0.6457,  0.1448,  0.1175
 0.4802, -0.7001,  0.1022, -0.5668,  0.5184,  0.1090
 0.4458, -0.6469,  0.7239, -0.9604,  0.7205,  0.0779
 0.5175,  0.4339,  0.9747, -0.4438, -0.9924,  0.2879
 0.8678,  0.7158,  0.4577,  0.0334,  0.4139,  0.1678
 0.5406,  0.5012,  0.2264, -0.1963,  0.3946,  0.2088
-0.9938,  0.5498,  0.7928, -0.5214, -0.7585,  0.7687
 0.7661,  0.0863, -0.4266, -0.7233, -0.4197,  0.1466
 0.2277, -0.3517, -0.0853, -0.1118,  0.6563,  0.1767
 0.3499, -0.5570, -0.0655, -0.3705,  0.2537,  0.1632
 0.7547, -0.1046,  0.5689, -0.0861,  0.3125,  0.1257
 0.8186,  0.2110,  0.5335,  0.0094, -0.0039,  0.1391
 0.6858, -0.8644,  0.1465,  0.8855,  0.0357,  0.1845
-0.4967,  0.4015,  0.0805,  0.8977,  0.2487,  0.4663
 0.6760, -0.9841,  0.9787, -0.8446, -0.3557,  0.1509
-0.1203, -0.4885,  0.6054, -0.0443, -0.7313,  0.4854
 0.8557,  0.7919, -0.0169,  0.7134, -0.1628,  0.2002
 0.0115, -0.6209,  0.9300, -0.4116, -0.7931,  0.4052
-0.7114, -0.9718,  0.4319,  0.1290,  0.5892,  0.3661
 0.3915,  0.5557, -0.1870,  0.2955, -0.6404,  0.2954
-0.3564, -0.6548, -0.1827, -0.5172, -0.1862,  0.4622
 0.2392, -0.4959,  0.5857, -0.1341, -0.2850,  0.2470
-0.3394,  0.3947, -0.4627,  0.6166, -0.4094,  0.5325
 0.7107,  0.7768, -0.6312,  0.1707,  0.7964,  0.2757
-0.1078,  0.8437, -0.4420,  0.2177,  0.3649,  0.4028
-0.3139,  0.5595, -0.6505, -0.3161, -0.7108,  0.5546
 0.4335,  0.3986,  0.3770, -0.4932,  0.3847,  0.1810
-0.2562, -0.2894, -0.8847,  0.2633,  0.4146,  0.4036
 0.2272,  0.2966, -0.6601, -0.7011,  0.0284,  0.2778
-0.0743, -0.1421, -0.0054, -0.6770, -0.3151,  0.3597
-0.4762,  0.6891,  0.6007, -0.1467,  0.2140,  0.4266
-0.4061,  0.7193,  0.3432,  0.2669, -0.7505,  0.6147
-0.0588,  0.9731,  0.8966,  0.2902, -0.6966,  0.4955
-0.0627, -0.1439,  0.1985,  0.6999,  0.5022,  0.3077
 0.1587,  0.8494, -0.8705,  0.9827, -0.8940,  0.4263
-0.7850,  0.2473, -0.9040, -0.4308, -0.8779,  0.7199
 0.4070,  0.3369, -0.2428, -0.6236,  0.4940,  0.2215
-0.0242,  0.0513, -0.9430,  0.2885, -0.2987,  0.3947
-0.5416, -0.1322, -0.2351, -0.0604,  0.9590,  0.3683
 0.1055,  0.7783, -0.2901, -0.5090,  0.8220,  0.2984
-0.9129,  0.9015,  0.1128, -0.2473,  0.9901,  0.4776
-0.9378,  0.1424, -0.6391,  0.2619,  0.9618,  0.5368
 0.7498, -0.0963,  0.4169,  0.5549, -0.0103,  0.1614
-0.2612, -0.7156,  0.4538, -0.0460, -0.1022,  0.3717
 0.7720,  0.0552, -0.1818, -0.4622, -0.8560,  0.1685
-0.4177,  0.0070,  0.9319, -0.7812,  0.3461,  0.3052
-0.0001,  0.5542, -0.7128, -0.8336, -0.2016,  0.3803
 0.5356, -0.4194, -0.5662, -0.9666, -0.2027,  0.1776
-0.2378,  0.3187, -0.8582, -0.6948, -0.9668,  0.5474
-0.1947, -0.3579,  0.1158,  0.9869,  0.6690,  0.2992
 0.3992,  0.8365, -0.9205, -0.8593, -0.0520,  0.3154
-0.0209,  0.0793,  0.7905, -0.1067,  0.7541,  0.1864
-0.4928, -0.4524, -0.3433,  0.0951, -0.5597,  0.6261
-0.8118,  0.7404, -0.5263, -0.2280,  0.1431,  0.6349
 0.0516, -0.8480,  0.7483,  0.9023,  0.6250,  0.1959
-0.3212,  0.1093,  0.9488, -0.3766,  0.3376,  0.2735
-0.3481,  0.5490, -0.3484,  0.7797,  0.5034,  0.4379
-0.5785, -0.9170, -0.3563, -0.9258,  0.3877,  0.4121
 0.3407, -0.1391,  0.5356,  0.0720, -0.9203,  0.3458
-0.3287, -0.8954,  0.2102,  0.0241,  0.2349,  0.3247
-0.1353,  0.6954, -0.0919, -0.9692,  0.7461,  0.3338
 0.9036, -0.8982, -0.5299, -0.8733, -0.1567,  0.1187
 0.7277, -0.8368, -0.0538, -0.7489,  0.5458,  0.0830
 0.9049,  0.8878,  0.2279,  0.9470, -0.3103,  0.2194
 0.7957, -0.1308, -0.5284,  0.8817,  0.3684,  0.2172
 0.4647, -0.4931,  0.2010,  0.6292, -0.8918,  0.3371
-0.7390,  0.6849,  0.2367,  0.0626, -0.5034,  0.7039
-0.1567, -0.8711,  0.7940, -0.5932,  0.6525,  0.1710
 0.7635, -0.0265,  0.1969,  0.0545,  0.2496,  0.1445
 0.7675,  0.1354, -0.7698, -0.5460,  0.1920,  0.1728
-0.5211, -0.7372, -0.6763,  0.6897,  0.2044,  0.5217
 0.1913,  0.1980,  0.2314, -0.8816,  0.5006,  0.1998
 0.8964,  0.0694, -0.6149,  0.5059, -0.9854,  0.1825
 0.1767,  0.7104,  0.2093,  0.6452,  0.7590,  0.2832
-0.3580, -0.7541,  0.4426, -0.1193, -0.7465,  0.5657
-0.5996,  0.5766, -0.9758, -0.3933, -0.9572,  0.6800
 0.9950,  0.1641, -0.4132,  0.8579,  0.0142,  0.2003
-0.4717, -0.3894, -0.2567, -0.5111,  0.1691,  0.4266
 0.3917, -0.8561,  0.9422,  0.5061,  0.6123,  0.1212
-0.0366, -0.1087,  0.3449, -0.1025,  0.4086,  0.2475
 0.3633,  0.3943,  0.2372, -0.6980,  0.5216,  0.1925
-0.5325, -0.6466, -0.2178, -0.3589,  0.6310,  0.3568
 0.2271,  0.5200, -0.1447, -0.8011, -0.7699,  0.3128
 0.6415,  0.1993,  0.3777, -0.0178, -0.8237,  0.2181
-0.5298, -0.0768, -0.6028, -0.9490,  0.4588,  0.4356
 0.6870, -0.1431,  0.7294,  0.3141,  0.1621,  0.1632
-0.5985,  0.0591,  0.7889, -0.3900,  0.7419,  0.2945
 0.3661,  0.7984, -0.8486,  0.7572, -0.6183,  0.3449
 0.6995,  0.3342, -0.3113, -0.6972,  0.2707,  0.1712
 0.2565,  0.9126,  0.1798, -0.6043, -0.1413,  0.2893
-0.3265,  0.9839, -0.2395,  0.9854,  0.0376,  0.4770
 0.2690, -0.1722,  0.9818,  0.8599, -0.7015,  0.3954
-0.2102, -0.0768,  0.1219,  0.5607, -0.0256,  0.3949
 0.8216, -0.9555,  0.6422, -0.6231,  0.3715,  0.0801
-0.2896,  0.9484, -0.7545, -0.6249,  0.7789,  0.4370
-0.9985, -0.5448, -0.7092, -0.5931,  0.7926,  0.5402

Test data:

# synthetic_test_40.txt
#
 0.7462,  0.4006, -0.0590,  0.6543, -0.0083,  0.1935
 0.8495, -0.2260, -0.0142, -0.4911,  0.7699,  0.1078
-0.2335, -0.4049,  0.4352, -0.6183, -0.7636,  0.5088
 0.1810, -0.5142,  0.2465,  0.2767, -0.3449,  0.3136
-0.8650,  0.7611, -0.0801,  0.5277, -0.4922,  0.7140
-0.2358, -0.7466, -0.5115, -0.8413, -0.3943,  0.4533
 0.4834,  0.2300,  0.3448, -0.9832,  0.3568,  0.1360
-0.6502, -0.6300,  0.6885,  0.9652,  0.8275,  0.3046
-0.3053,  0.5604,  0.0929,  0.6329, -0.0325,  0.4756
-0.7995,  0.0740, -0.2680,  0.2086,  0.9176,  0.4565
-0.2144, -0.2141,  0.5813,  0.2902, -0.2122,  0.4119
-0.7278, -0.0987, -0.3312, -0.5641,  0.8515,  0.4438
 0.3793,  0.1976,  0.4933,  0.0839,  0.4011,  0.1905
-0.8568,  0.9573, -0.5272,  0.3212, -0.8207,  0.7415
-0.5785,  0.0056, -0.7901, -0.2223,  0.0760,  0.5551
 0.0735, -0.2188,  0.3925,  0.3570,  0.3746,  0.2191
 0.1230, -0.2838,  0.2262,  0.8715,  0.1938,  0.2878
 0.4792, -0.9248,  0.5295,  0.0366, -0.9894,  0.3149
-0.4456,  0.0697,  0.5359, -0.8938,  0.0981,  0.3879
 0.8629, -0.8505, -0.4464,  0.8385,  0.5300,  0.1769
 0.1995,  0.6659,  0.7921,  0.9454,  0.9970,  0.2330
-0.0249, -0.3066, -0.2927, -0.4923,  0.8220,  0.2437
 0.4513, -0.9481, -0.0770, -0.4374, -0.9421,  0.2879
-0.3405,  0.5931, -0.3507, -0.3842,  0.8562,  0.3987
 0.9538,  0.0471,  0.9039,  0.7760,  0.0361,  0.1706
-0.0887,  0.2104,  0.9808,  0.5478, -0.3314,  0.4128
-0.8220, -0.6302,  0.0537, -0.1658,  0.6013,  0.4306
-0.4123, -0.2880,  0.9074, -0.0461, -0.4435,  0.5144
 0.0060,  0.2867, -0.7775,  0.5161,  0.7039,  0.3599
-0.7968, -0.5484,  0.9426, -0.4308,  0.8148,  0.2979
 0.7811,  0.8450, -0.6877,  0.7594,  0.2640,  0.2362
-0.6802, -0.1113, -0.8325, -0.6694, -0.6056,  0.6544
 0.3821,  0.1476,  0.7466, -0.5107,  0.2592,  0.1648
 0.7265,  0.9683, -0.9803, -0.4943, -0.5523,  0.2454
-0.9049, -0.9797, -0.0196, -0.9090, -0.4433,  0.6447
-0.4607,  0.1811, -0.2389,  0.4050, -0.0078,  0.5229
 0.2664, -0.2932, -0.4259, -0.7336,  0.8742,  0.1834
-0.4507,  0.1029, -0.6294, -0.1158, -0.6294,  0.6081
 0.8948, -0.0124,  0.9278,  0.2899, -0.0314,  0.1534
-0.1323, -0.8813, -0.0146, -0.0697,  0.6135,  0.2386

Using a Decision Tree for Anomaly Detection (Anomaly Forest) Implemented With C#

I came across an anomaly detection idea that I hadn’t seen before. It’s called an Isolation Forest. Briefly, if you send data to a decision tree regressor, data items that are anomalous will end up closer to the root node of the tree.

For example, suppose you have just four data items:

[0] xx  xx  xx  19
[1] xx  xx  xx  35
[2] xx  xx  xx  72
[3] xx  xx  xx  18

Each line/item represents a person. The first three values are things like income, debt, savings. The fourth item is person age. If you apply decision tree regression with age as the dependent variable, the first split would send items [0], [1], [3] to the root left child, and send item [2] to the root right child, because the ages for items [0], [1], [3] are relatively similar, but the age of 72 for item [2] is much different.
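To see why a variance-reduction split would separate age 72 from the rest, here is a small Python sketch that scores every possible two-way grouping of the four ages. It is only an illustration: a real tree can use just the groupings reachable by a threshold on one of the predictor columns (the xx values), so the sketch assumes some predictor allows the best grouping. The helper names are made up for this example.

```python
from itertools import combinations

ages = [19, 35, 72, 18]  # target (age) values for items [0]..[3]

def variance(vals):
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

def weighted_variance(left, right):
    # size-weighted average of the two group variances; smaller is better
    n = len(left) + len(right)
    return (len(left) * variance(left) +
            len(right) * variance(right)) / n

best_wv, best_left, best_right = float("inf"), None, None
others = [1, 2, 3]
for k in range(len(others)):             # how many items join item [0]
    for extra in combinations(others, k):
        left = (0,) + extra              # item [0] is always on the left,
        right = tuple(i for i in others  # which avoids mirrored duplicates
                      if i not in extra)
        wv = weighted_variance([ages[i] for i in left],
                               [ages[i] for i in right])
        print(left, right, round(wv, 2))
        if wv < best_wv:
            best_wv, best_left, best_right = wv, left, right

print("best grouping:", best_left, best_right)
```

The grouping with the smallest weighted variance puts items [0], [1], [3] together and leaves item [2] (age 72) on its own.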

The resulting tree might look like:

                  node0 (root)                  Level 0
             rows [0] [1] [2] [3]

     node1                        node2         Level 1
   rows [0] [1] [3]              row [2]

 node3         node4         node5       node6  Level 2
rows [0] [3]  row [1]        empty       empty

So if age is the dependent variable, a decision tree will place age 72, the anomalous age, in a node that is near the root. Now if you repeat the process, using each column in turn as the dependent variable, and track which data item ends up, on average, at the shallowest level (closest to the root), then you have identified the relatively most anomalous data item.
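The whole procedure can be sketched in a few dozen lines of Python. This is a simplified illustration of the idea, not a port of the C# demo: it builds each tree recursively instead of using flat List storage, searches columns in fixed order rather than scrambled order, and every function name here is hypothetical. The tiny dataset is made up so that the last row is obviously different from the others.

```python
def variance(vals):
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

def best_split(X, y, rows):
    # return (col, thresh) giving the smallest weighted target variance,
    # or None if no threshold separates the rows into two non-empty groups
    best, best_var = None, float("inf")
    for col in range(len(X[0])):
        for thresh in sorted({X[r][col] for r in rows}):
            left = [r for r in rows if X[r][col] <= thresh]
            right = [r for r in rows if X[r][col] > thresh]
            if not left or not right:
                continue
            wv = (len(left) * variance([y[r] for r in left]) +
                  len(right) * variance([y[r] for r in right])) / len(rows)
            if wv < best_var:
                best_var, best = wv, (col, thresh)
    return best

def leaf_depths(X, y, rows, depth, max_depth, out):
    # record, for each row, the depth of the leaf node it lands in
    split = None
    if depth < max_depth and len(rows) >= 2:
        split = best_split(X, y, rows)
    if split is None:                      # this node is a leaf
        for r in rows:
            out[r] = depth
        return
    col, thresh = split
    left = [r for r in rows if X[r][col] <= thresh]
    right = [r for r in rows if X[r][col] > thresh]
    leaf_depths(X, y, left, depth + 1, max_depth, out)
    leaf_depths(X, y, right, depth + 1, max_depth, out)

def anomaly_scores(data, max_depth=10):
    # average leaf depth per row, using each column in turn as the target
    n_rows, n_cols = len(data), len(data[0])
    scores = [0.0] * n_rows
    for c in range(n_cols):
        y = [row[c] for row in data]
        X = [[v for j, v in enumerate(row) if j != c] for row in data]
        depths = {}
        leaf_depths(X, y, list(range(n_rows)), 0, max_depth, depths)
        for r, d in depths.items():
            scores[r] += d
    return [s / n_cols for s in scores]

# made-up data: rows 0-3 are similar, row 4 is very different
data = [
    [1.0, 1.1, 0.9],
    [1.2, 0.9, 1.0],
    [0.9, 1.0, 1.1],
    [1.1, 1.2, 1.2],
    [9.0, 8.5, 9.5],
]
scores = anomaly_scores(data)
print(scores)
print("most anomalous row:", scores.index(min(scores)))
```

A smaller score means the item tends to be isolated closer to the root, so the row with the minimum score is the most anomalous one.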

The scikit-learn library has an IsolationForest module, but I didn’t like several of the details, especially the fact that the scikit module splits randomly instead of using variance reduction. So, I put together my own version that splits in the usual, non-random way. I used C#.

The output of my demo is:

Begin Anomaly Forest demo

Loading synthetic data (200)

First three data items:
 -0.1660  0.4406 -0.9998 -0.3953 -0.7065
  0.0776 -0.1616  0.3704 -0.5911  0.7562
 -0.9452  0.3409 -0.1654  0.1174 -0.7192

Creating Anomaly Forest object
Done

Analyzing dataset
Using col 0 as dependent variable
Using col 1 as dependent variable
Using col 2 as dependent variable
Using col 3 as dependent variable
Using col 4 as dependent variable
Done

First three anomaly scores:
[0]  8.4000
[1]  10.0000
[2]  9.8000

Most anomalous data item = [122]
Anomaly score = 6.0000

End demo

I called the technique Anomaly Forest to distinguish it from the scikit Isolation Forest.

Ultimately, my Anomaly Forest demo is OK, but because it treats each data column separately, the technique does not take into account interactions between the column variables — so it’s sort of a naive Bayes anomaly detection in a sense. But good fun.



Anomaly detection is like finding a hidden pattern in a dataset. The cover of every issue of Playboy Magazine (except the first one in December 1953), has a bunny logo somewhere. On most covers, the logo is in plain sight, but on some covers, the logo is cleverly hidden.

Left: The logo is disguised as holly leaves on the model’s hat. Right: The logo is disguised as light reflections on the model’s vinyl raincoat, on the lapel.


Demo program. Replace “lt” (less than), “gt”, “lte”, “gte” with Boolean operator symbols (my blog editor chokes on symbols).

using System;
using System.IO;
using System.Collections.Generic;

namespace AnomalyForest
{
  internal class AnomalyForestProgram
  {
    static void Main(string[] args)
    {
      Console.WriteLine("\nBegin Anomaly Forest demo ");

      // 1. load data
      Console.WriteLine("\nLoading synthetic data (200) ");
      string dataFile =
        "..\\..\\..\\Data\\synthetic_200.txt";
      int[] colsX = new int[] { 0, 1, 2, 3, 4 };
      double[][] data = MatLoad(dataFile, colsX, ',', "#");
      
      Console.WriteLine("\nFirst three data items: ");
      for (int i = 0; i "lt" 3; ++i)
        VecShow(data[i], 4, 8);

      Console.WriteLine("\nCreating Anomaly Forest object ");
      AnomalyDetector ad = new AnomalyDetector(10);
      Console.WriteLine("Done ");
      Console.WriteLine("\nAnalyzing dataset ");
      ad.Analyze(data);
      Console.WriteLine("Done ");

      Console.WriteLine("\nFirst three anomaly scores: ");
      for (int i = 0; i "lt" 3; ++i)
        Console.WriteLine("[" + i + "]  " +
          ad.scores[i].ToString("F4"));

      int[] indices = new int[data.Length];
      for (int i = 0; i "lt" indices.Length; ++i)
        indices[i] = i;

      int minIndx = 0;
      double minScore = ad.scores[0];
      for (int i = 0; i "lt" ad.scores.Length; ++i)
      {
        if (ad.scores[i] "lt" minScore)
        {
          minScore = ad.scores[i];
          minIndx = i;
        }
      }
      Console.WriteLine("\nMost anomalous data item = [" +
        minIndx + "]");
      Console.WriteLine("Anomaly score = " +
        minScore.ToString("F4"));

      Console.WriteLine("\nEnd demo ");
      Console.ReadLine();
    } // Main

    // ------------------------------------------------------
    // helpers for Main()
    // ------------------------------------------------------

    static double[][] MatLoad(string fn, int[] usecols,
      char sep, string comment)
    {
      List"lt"double[]"gt" result =
        new List"lt"double[]"gt"();
      string line = "";
      FileStream ifs = new FileStream(fn, FileMode.Open);
      StreamReader sr = new StreamReader(ifs);
      while ((line = sr.ReadLine()) != null)
      {
        if (line.StartsWith(comment) == true)
          continue;
        string[] tokens = line.Split(sep);
        List"lt"double"gt" lst = new List"lt"double"gt"();
        for (int j = 0; j "lt" usecols.Length; ++j)
          lst.Add(double.Parse(tokens[usecols[j]]));
        double[] row = lst.ToArray();
        result.Add(row);
      }
      sr.Close(); ifs.Close();
      return result.ToArray();
    }

    static void VecShow(double[] vec, int dec, int wid)
    {
      for (int i = 0; i "lt" vec.Length; ++i)
        Console.Write(vec[i].ToString("F" + dec).
        PadLeft(wid));
      Console.WriteLine("");
    }

    static void VecShow(int[] vec, int wid)
    {
      for (int i = 0; i "lt" vec.Length; ++i)
        Console.Write(vec[i].ToString().PadLeft(wid));
      Console.WriteLine("");
    }

  } // class AnomalyForestProgram

  // ========================================================

  public class AnomalyDetector
  {
    public int maxDepth;  // each tree
    public double[] scores; // one score per data item

    public AnomalyDetector(int maxDepth)
    {
      this.maxDepth = maxDepth;
      this.scores = new double[0];  // null-ish
    }

    public void Analyze(double[][] data)
    {
      int nRows = data.Length;
      int nCols = data[0].Length;
      this.scores = new double[nRows];

      for (int c = 0; c "lt" nCols; ++c)
      {
        Console.WriteLine("Using col " + c +
          " as dependent variable ");
        // make a trainX and trainY
        double[] trainY = new double[nRows];
        double[][] trainX = MatMake(nRows, nCols - 1);
        
        for (int i = 0; i "lt" nRows; ++i)
        {
          int p = 0;  // points into cols of trainX
          for (int j = 0; j "lt" nCols; ++j)
          {
            if (j == c) // at the special column
            {
              trainY[i] = data[i][j];
            }
            else
            {
              trainX[i][p++] = data[i][j];
            }
          } // j
        } // i

        // make a tree minSamples = 2, minLeaf = 1
        DecisionTreeRegressor t = 
          new DecisionTreeRegressor(this.maxDepth, 2, 1, -1);
        t.Train(trainX, trainY);

        // scan tree nodes for assigned rows in leaf nodes
        for (int id = 0; id "lt" t.tree.Count; ++id)
        {
          int level = 
            (int)Math.Truncate(Math.Log2((double)id + 1));
          if (t.tree[id] != null &&
            t.tree[id].rows != null &&
            t.tree[id].isLeaf == true &&
            t.tree[id].rows.Count "gte" 1)
          {
            for (int r = 0; r "lt" t.tree[id].rows.Count; ++r)
            {
              int currRow = t.tree[id].rows[r];
              this.scores[currRow] += level;
            }
          }
        } // each node

      } // c, each column

      // normalize scores relative to number cols
      for (int i = 0; i "lt" this.scores.Length; ++i)
        scores[i] /= nCols;

    } // Analyze()

    private static double[][] MatMake(int nRows, int nCols)
    {
      double[][] result = new double[nRows][];
      for (int i = 0; i "lt" nRows; ++i)
        result[i] = new double[nCols];
      return result;
    }

  } // class AnomalyDetector

  // ========================================================

  public class DecisionTreeRegressor
  {
    public int maxDepth;
    public int minSamples;  // aka min_samples_split
    public int minLeaf;  // min number of values in a leaf
    public int numSplitCols; // mostly for random forest
    public List"lt"Node"gt" tree = new List"lt"Node"gt"();
    public Random rnd;  // order in which cols are searched

    public double[][] trainX;  // store data by ref
    public double[] trainY;

    // ------------------------------------------------------

    public class Node
    {
      public int id;
      public int colIdx;  // aka featureIdx
      public double thresh;
      public int left;  // index into List
      public int right;
      public double value;
      public bool isLeaf;
      public List"lt"int"gt" rows;  // assoc rows in train data

      public Node()
      {
        this.id = -1;
        this.colIdx = -1;
        this.thresh = 0.0;  // aka split value
        this.left = -1;
        this.right = -1;
        this.value = 0.0;  // aka pred y
        this.isLeaf = false;
        this.rows = null;
      }
    } // class Node

    // --------------------------------------------

    public DecisionTreeRegressor(int maxDepth = 2,
      int minSamples = 2, int minLeaf = 1,
      int numSplitCols = -1, int seed = 0)
    {
      // if maxDepth = 0, tree has just a root node
      // if maxDepth = 1, at most 3 nodes (root, l, r)
      // if maxDepth = n, at most 2^(n+1) - 1 nodes
      this.maxDepth = maxDepth;
      this.minSamples = minSamples;
      this.minLeaf = minLeaf;
      this.numSplitCols = numSplitCols;  // for ran. forest

      // create full tree List with null nodes
      int numNodes = (int)Math.Pow(2, (maxDepth + 1)) - 1;
      for (int i = 0; i "lt" numNodes; ++i)
      {
        this.tree.Add(null);  // empty nodes
      }
      this.rnd = new Random(seed);
    }

    // ------------------------------------------------------
    // public: Train()
    // helpers: MakeTree(), BestSplit(), TreeTargetMean(),
    //   TreeTargetVariance().
    // ------------------------------------------------------

    public void Train(double[][] trainX, double[] trainY)
    {
      this.trainX = trainX; // by ref
      this.trainY = trainY;
      this.MakeTree();
    }

    // ------------------------------------------------------
    
    private void MakeTree()
    {
      // no recursion, no pointers, List storage, no stack
      if (this.numSplitCols == -1) // use all cols
        this.numSplitCols = this.trainX[0].Length;

      // prepare root node
      List"lt"int"gt" allRows = new List"lt"int"gt"();
      for (int i = 0; i "lt" this.trainX.Length; ++i)
        allRows.Add(i);
      double grandMean = this.TreeTargetMean(allRows);

      // wait to supply colIdx and thresh in loop
      Node root = new Node();
      root.id = 0;
      root.left = 1;
      root.right = 2;
      root.value = grandMean;
      root.isLeaf = false; // already set
      root.rows = allRows;
      this.tree[0] = root;

      for (int i = 0; i "lt" this.tree.Count; ++i)
      {
        Node currNode = this.tree[i];
        // curr node has values for everything
        // except colIdx and thresh

        // a null slot means no parent created this node
        // (the parent is a leaf or is itself missing);
        // a node with no rows has nothing to split
        if (currNode == null ||
          currNode.rows.Count == 0) { continue; }

        // if parent cannot be split, make parent a leaf
        if (currNode.id "gte" (int)Math.Pow(2,
          (this.maxDepth)) - 1 ||
          currNode.rows.Count "lt" this.minSamples)
        {
          currNode.isLeaf = true;
          continue;
        }

        // parent has enough rows to try to split
        double[] splitInfo = this.BestSplit(currNode.rows);
        int colIdx = (int)splitInfo[0];
        double splitVal = splitInfo[1]; //split value

        if (colIdx == -1)  // unable to split, is a leaf
        {
          currNode.isLeaf = true;
          continue;
        }

        // complete the fields for curr node
        currNode.colIdx = colIdx;
        currNode.thresh = splitVal;

        // construct the children, 
        // except for colIdx and thresh
        // which will be supplied in main loop
        Node leftNode = new Node();
        Node rightNode = new Node();

        // construct children rows using split info
        // all info except colIdx and thresh
        List"lt"int"gt" leftIdxs = new List"lt"int"gt"();
        List"lt"int"gt" rightIdxs = new List"lt"int"gt"();
        for (int k = 0; k "lt" currNode.rows.Count; ++k)
        {
          int r = currNode.rows[k];
          if (this.trainX[r][colIdx] "lte" splitVal)
            leftIdxs.Add(r);
          else
            rightIdxs.Add(r);
        }

        leftNode.id = currNode.id * 2 + 1;
        if (leftNode.id "gt" (int)Math.Pow(2,
          (maxDepth + 1)) - 2) leftNode.id = -1;
        leftNode.left = leftNode.id * 2 + 1;
        if (leftNode.left "gt" (int)Math.Pow(2,
          (maxDepth + 1)) - 2) leftNode.left = -1;
        leftNode.right = leftNode.id * 2 + 2;
        if (leftNode.right "gt" (int)Math.Pow(2,
          (maxDepth + 1)) - 2) leftNode.right = -1;

        leftNode.rows = leftIdxs;
        leftNode.value =
          this.TreeTargetMean(leftNode.rows);
        this.tree[leftNode.id] = leftNode;

        rightNode.id = currNode.id * 2 + 2;
        if (rightNode.id "gt" (int)Math.Pow(2,
          (maxDepth + 1)) - 2) rightNode.id = -1;
        rightNode.left = rightNode.id * 2 + 1;
        if (rightNode.left "gt" (int)Math.Pow(2,
          (maxDepth + 1)) - 2) rightNode.left = -1;
        rightNode.right = rightNode.id * 2 + 2;
        if (rightNode.right "gt" (int)Math.Pow(2,
          (maxDepth + 1)) - 2) rightNode.right = -1;
        rightNode.rows = rightIdxs;
        rightNode.value =
          this.TreeTargetMean(rightNode.rows);
        this.tree[rightNode.id] = rightNode;

      } // i
      return;
    }

    // ------------------------------------------------------

    private double[] BestSplit(List"lt"int"gt" rows)
    {
      // implicit params numSplitCols, minLeaf
      // result[0] = best col idx (as double)
      // result[1] = best split value
      rows.Sort();

      int bestColIdx = -1;  // indicates bad split
      double bestThresh = 0.0;
      double bestVar = double.MaxValue;  // smaller is better

      int nRows = rows.Count;  // or dataY.Length
      int nCols = this.trainX[0].Length;

      if (nRows == 0)
      {
        throw new Exception("empty data in BestSplit()");
      }

      // process cols in scrambled order
      int[] colIndices = new int[nCols];
      for (int k = 0; k "lt" nCols; ++k)
        colIndices[k] = k;
      // shuffle, inline Fisher-Yates
      int n = colIndices.Length;
      for (int i = 0; i "lt" n; ++i)
      {
        int ri = rnd.Next(i, n);  // be careful
        int tmp = colIndices[i];
        colIndices[i] = colIndices[ri];
        colIndices[ri] = tmp;
      }

      // numSplitCols is usually all columns (-1)
      for (int j = 0; j "lt" this.numSplitCols; ++j)
      {
        int colIdx = colIndices[j];
        HashSet"lt"double"gt" examineds =
          new HashSet"lt"double"gt"();

        for (int i = 0; i "lt" nRows; ++i) // each row
        {
          // if curr thresh been seen, skip it
          double thresh = this.trainX[rows[i]][colIdx];
          if (examineds.Contains(thresh)) continue;
          examineds.Add(thresh);

          // get row idxs where x is lte, gt thresh
          List"lt"int"gt" leftIdxs = new List"lt"int"gt"();
          List"lt"int"gt" rightIdxs = new List"lt"int"gt"();
          for (int k = 0; k "lt" nRows; ++k)
          {
            if (this.trainX[rows[k]][colIdx] "lte" thresh)
              leftIdxs.Add(rows[k]);
            else
              rightIdxs.Add(rows[k]);
          }

          // Check if proposed split has too few values
          if (leftIdxs.Count "lt" this.minLeaf ||
            rightIdxs.Count "lt" this.minLeaf)
            continue;  // to next row

          double leftVar =
            this.TreeTargetVariance(leftIdxs);
          double rightVar =
            this.TreeTargetVariance(rightIdxs);
          double weightedVar = (leftIdxs.Count * leftVar +
            rightIdxs.Count * rightVar) / nRows;

          if (weightedVar "lt" bestVar)
          {
            // if this never happens, bestColIdx remains -1
            // which means a bad split. used in MakeTree()
            bestColIdx = colIdx;
            bestThresh = thresh;
            bestVar = weightedVar;
          }

        } // each row
      } // j each col

      double[] result = new double[2];  // out params ugly
      result[0] = 1.0 * bestColIdx;
      result[1] = bestThresh;
      return result;

    } // BestSplit()

    // ------------------------------------------------------

    private double TreeTargetMean(List"lt"int"gt" rows)
    {
      // mean of rows items in trainY: for node prediction
      double sum = 0.0;
      for (int i = 0; i "lt" rows.Count; ++i)
      {
        int r = rows[i];
        sum += this.trainY[r];
      }
      return sum / rows.Count;
    }

    // ------------------------------------------------------

    private double TreeTargetVariance(List"lt"int"gt" rows)
    {
      double mean = this.TreeTargetMean(rows);
      double sum = 0.0;
      for (int i = 0; i "lt" rows.Count; ++i)
      {
        int r = rows[i];
        sum += (this.trainY[r] - mean) *
          (this.trainY[r] - mean);
      }
      return sum / rows.Count;
    }

    // ------------------------------------------------------

  } // class DecisionTreeRegressor

  
  // ========================================================

} // ns
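The demo code stores the tree in a flat List where the children of node i live at indices 2i + 1 and 2i + 2, and the level of a node is recovered from its index as floor(log2(i + 1)). A short Python sketch of that index arithmetic follows; the function names are invented for illustration.

```python
def level_of(node_id):
    # floor(log2(node_id + 1)) using integer arithmetic only
    return (node_id + 1).bit_length() - 1

def children_of(node_id):
    # indices of the left and right child in the flat list
    return 2 * node_id + 1, 2 * node_id + 2

def max_nodes(max_depth):
    # number of slots needed for a full tree of the given depth
    return 2 ** (max_depth + 1) - 1

for node_id in range(max_nodes(2)):
    print("node", node_id, "level", level_of(node_id),
          "children", children_of(node_id))
```

With this layout no pointers are needed, at the cost of reserving slots for nodes that are never created.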

Demo data

# synthetic_200.txt
#
-0.1660,  0.4406, -0.9998, -0.3953, -0.7065,  0.4840
 0.0776, -0.1616,  0.3704, -0.5911,  0.7562,  0.1568
-0.9452,  0.3409, -0.1654,  0.1174, -0.7192,  0.8054
 0.9365, -0.3732,  0.3846,  0.7528,  0.7892,  0.1345
-0.8299, -0.9219, -0.6603,  0.7563, -0.8033,  0.7955
 0.0663,  0.3838, -0.3690,  0.3730,  0.6693,  0.3206
-0.9634,  0.5003,  0.9777,  0.4963, -0.4391,  0.7377
-0.1042,  0.8172, -0.4128, -0.4244, -0.7399,  0.4801
-0.9613,  0.3577, -0.5767, -0.4689, -0.0169,  0.6861
-0.7065,  0.1786,  0.3995, -0.7953, -0.1719,  0.5569
 0.3888, -0.1716, -0.9001,  0.0718,  0.3276,  0.2500
 0.1731,  0.8068, -0.7251, -0.7214,  0.6148,  0.3297
-0.2046, -0.6693,  0.8550, -0.3045,  0.5016,  0.2129
 0.2473,  0.5019, -0.3022, -0.4601,  0.7918,  0.2613
-0.1438,  0.9297,  0.3269,  0.2434, -0.7705,  0.5171
 0.1568, -0.1837, -0.5259,  0.8068,  0.1474,  0.3307
-0.9943,  0.2343, -0.3467,  0.0541,  0.7719,  0.5581
 0.2467, -0.9684,  0.8589,  0.3818,  0.9946,  0.1092
-0.6553, -0.7257,  0.8652,  0.3936, -0.8680,  0.7018
 0.8460,  0.4230, -0.7515, -0.9602, -0.9476,  0.1996
-0.9434, -0.5076,  0.7201,  0.0777,  0.1056,  0.5664
 0.9392,  0.1221, -0.9627,  0.6013, -0.5341,  0.1533
 0.6142, -0.2243,  0.7271,  0.4942,  0.1125,  0.1661
 0.4260,  0.1194, -0.9749, -0.8561,  0.9346,  0.2230
 0.1362, -0.5934, -0.4953,  0.4877, -0.6091,  0.3810
 0.6937, -0.5203, -0.0125,  0.2399,  0.6580,  0.1460
-0.6864, -0.9628, -0.8600, -0.0273,  0.2127,  0.5387
 0.9772,  0.1595, -0.2397,  0.1019,  0.4907,  0.1611
 0.3385, -0.4702, -0.8673, -0.2598,  0.2594,  0.2270
-0.8669, -0.4794,  0.6095, -0.6131,  0.2789,  0.4700
 0.0493,  0.8496, -0.4734, -0.8681,  0.4701,  0.3516
 0.8639, -0.9721, -0.5313,  0.2336,  0.8980,  0.1412
 0.9004,  0.1133,  0.8312,  0.2831, -0.2200,  0.1782
 0.0991,  0.8524,  0.8375, -0.2102,  0.9265,  0.2150
-0.6521, -0.7473, -0.7298,  0.0113, -0.9570,  0.7422
 0.6190, -0.3105,  0.8802,  0.1640,  0.7577,  0.1056
 0.6895,  0.8108, -0.0802,  0.0927,  0.5972,  0.2214
 0.1982, -0.9689,  0.1870, -0.1326,  0.6147,  0.1310
-0.3695,  0.7858,  0.1557, -0.6320,  0.5759,  0.3773
-0.1596,  0.3581,  0.8372, -0.9992,  0.9535,  0.2071
-0.2468,  0.9476,  0.2094,  0.6577,  0.1494,  0.4132
 0.1737,  0.5000,  0.7166,  0.5102,  0.3961,  0.2611
 0.7290, -0.3546,  0.3416, -0.0983, -0.2358,  0.1332
-0.3652,  0.2438, -0.1395,  0.9476,  0.3556,  0.4170
-0.6029, -0.1466, -0.3133,  0.5953,  0.7600,  0.4334
-0.4596, -0.4953,  0.7098,  0.0554,  0.6043,  0.2775
 0.1450,  0.4663,  0.0380,  0.5418,  0.1377,  0.2931
-0.8636, -0.2442, -0.8407,  0.9656, -0.6368,  0.7429
 0.6237,  0.7499,  0.3768,  0.1390, -0.6781,  0.2185
-0.5499,  0.1850, -0.3755,  0.8326,  0.8193,  0.4399
-0.4858, -0.7782, -0.6141, -0.0008,  0.4572,  0.4197
 0.7033, -0.1683,  0.2334, -0.5327, -0.7961,  0.1776
 0.0317, -0.0457, -0.6947,  0.2436,  0.0880,  0.3345
 0.5031, -0.5559,  0.0387,  0.5706, -0.9553,  0.3107
-0.3513,  0.7458,  0.6894,  0.0769,  0.7332,  0.3170
 0.2205,  0.5992, -0.9309,  0.5405,  0.4635,  0.3532
-0.4806, -0.4859,  0.2646, -0.3094,  0.5932,  0.3202
 0.9809, -0.3995, -0.7140,  0.8026,  0.0831,  0.1600
 0.9495,  0.2732,  0.9878,  0.0921,  0.0529,  0.1289
-0.9476, -0.6792,  0.4913, -0.9392, -0.2669,  0.5966
 0.7247,  0.3854,  0.3819, -0.6227, -0.1162,  0.1550
-0.5922, -0.5045, -0.4757,  0.5003, -0.0860,  0.5863
-0.8861,  0.0170, -0.5761,  0.5972, -0.4053,  0.7301
 0.6877, -0.2380,  0.4997,  0.0223,  0.0819,  0.1404
 0.9189,  0.6079, -0.9354,  0.4188, -0.0700,  0.1907
-0.1428, -0.7820,  0.2676,  0.6059,  0.3936,  0.2790
 0.5324, -0.3151,  0.6917, -0.1425,  0.6480,  0.1071
-0.8432, -0.9633, -0.8666, -0.0828, -0.7733,  0.7784
-0.9444,  0.5097, -0.2103,  0.4939, -0.0952,  0.6787
-0.0520,  0.6063, -0.1952,  0.8094, -0.9259,  0.4836
 0.5477, -0.7487,  0.2370, -0.9793,  0.0773,  0.1241
 0.2450,  0.8116,  0.9799,  0.4222,  0.4636,  0.2355
 0.8186, -0.1983, -0.5003, -0.6531, -0.7611,  0.1511
-0.4714,  0.6382, -0.3788,  0.9648, -0.4667,  0.5950
 0.0673, -0.3711,  0.8215, -0.2669, -0.1328,  0.2677
-0.9381,  0.4338,  0.7820, -0.9454,  0.0441,  0.5518
-0.3480,  0.7190,  0.1170,  0.3805, -0.0943,  0.4724
-0.9813,  0.1535, -0.3771,  0.0345,  0.8328,  0.5438
-0.1471, -0.5052, -0.2574,  0.8637,  0.8737,  0.3042
-0.5454, -0.3712, -0.6505,  0.2142, -0.1728,  0.5783
 0.6327, -0.6297,  0.4038, -0.5193,  0.1484,  0.1153
-0.5424,  0.3282, -0.0055,  0.0380, -0.6506,  0.6613
 0.1414,  0.9935,  0.6337,  0.1887,  0.9520,  0.2540
-0.9351, -0.8128, -0.8693, -0.0965, -0.2491,  0.7353
 0.9507, -0.6640,  0.9456,  0.5349,  0.6485,  0.1059
-0.0462, -0.9737, -0.2940, -0.0159,  0.4602,  0.2606
-0.0627, -0.0852, -0.7247, -0.9782,  0.5166,  0.2977
 0.0478,  0.5098, -0.0723, -0.7504, -0.3750,  0.3335
 0.0090,  0.3477,  0.5403, -0.7393, -0.9542,  0.4415
-0.9748,  0.3449,  0.3736, -0.1015,  0.8296,  0.4358
 0.2887, -0.9895, -0.0311,  0.7186,  0.6608,  0.2057
 0.1570, -0.4518,  0.1211,  0.3435, -0.2951,  0.3244
 0.7117, -0.6099,  0.4946, -0.4208,  0.5476,  0.1096
-0.2929, -0.5726,  0.5346, -0.3827,  0.4665,  0.2465
 0.4889, -0.5572, -0.5718, -0.6021, -0.7150,  0.2163
-0.7782,  0.3491,  0.5996, -0.8389, -0.5366,  0.6516
-0.5847,  0.8347,  0.4226,  0.1078, -0.3910,  0.6134
 0.8469,  0.4121, -0.0439, -0.7476,  0.9521,  0.1571
-0.6803, -0.5948, -0.1376, -0.1916, -0.7065,  0.7156
 0.2878,  0.5086, -0.5785,  0.2019,  0.4979,  0.2980
 0.2764,  0.1943, -0.4090,  0.4632,  0.8906,  0.2960
-0.8877,  0.6705, -0.6155, -0.2098, -0.3998,  0.7107
-0.8398,  0.8093, -0.2597,  0.0614, -0.0118,  0.6502
-0.8476,  0.0158, -0.4769, -0.2859, -0.7839,  0.7715
 0.5751, -0.7868,  0.9714, -0.6457,  0.1448,  0.1175
 0.4802, -0.7001,  0.1022, -0.5668,  0.5184,  0.1090
 0.4458, -0.6469,  0.7239, -0.9604,  0.7205,  0.0779
 0.5175,  0.4339,  0.9747, -0.4438, -0.9924,  0.2879
 0.8678,  0.7158,  0.4577,  0.0334,  0.4139,  0.1678
 0.5406,  0.5012,  0.2264, -0.1963,  0.3946,  0.2088
-0.9938,  0.5498,  0.7928, -0.5214, -0.7585,  0.7687
 0.7661,  0.0863, -0.4266, -0.7233, -0.4197,  0.1466
 0.2277, -0.3517, -0.0853, -0.1118,  0.6563,  0.1767
 0.3499, -0.5570, -0.0655, -0.3705,  0.2537,  0.1632
 0.7547, -0.1046,  0.5689, -0.0861,  0.3125,  0.1257
 0.8186,  0.2110,  0.5335,  0.0094, -0.0039,  0.1391
 0.6858, -0.8644,  0.1465,  0.8855,  0.0357,  0.1845
-0.4967,  0.4015,  0.0805,  0.8977,  0.2487,  0.4663
 0.6760, -0.9841,  0.9787, -0.8446, -0.3557,  0.1509
-0.1203, -0.4885,  0.6054, -0.0443, -0.7313,  0.4854
 0.8557,  0.7919, -0.0169,  0.7134, -0.1628,  0.2002
 0.0115, -0.6209,  0.9300, -0.4116, -0.7931,  0.4052
-0.7114, -0.9718,  0.4319,  0.1290,  0.5892,  0.3661
 0.3915,  0.5557, -0.1870,  0.2955, -0.6404,  0.2954
-0.3564, -0.6548, -0.1827, -0.5172, -0.1862,  0.4622
 0.2392, -0.4959,  0.5857, -0.1341, -0.2850,  0.2470
-0.3394,  0.3947, -0.4627,  0.6166, -0.4094,  0.5325
 0.7107,  0.7768, -0.6312,  0.1707,  0.7964,  0.2757
-0.1078,  0.8437, -0.4420,  0.2177,  0.3649,  0.4028
-0.3139,  0.5595, -0.6505, -0.3161, -0.7108,  0.5546
 0.4335,  0.3986,  0.3770, -0.4932,  0.3847,  0.1810
-0.2562, -0.2894, -0.8847,  0.2633,  0.4146,  0.4036
 0.2272,  0.2966, -0.6601, -0.7011,  0.0284,  0.2778
-0.0743, -0.1421, -0.0054, -0.6770, -0.3151,  0.3597
-0.4762,  0.6891,  0.6007, -0.1467,  0.2140,  0.4266
-0.4061,  0.7193,  0.3432,  0.2669, -0.7505,  0.6147
-0.0588,  0.9731,  0.8966,  0.2902, -0.6966,  0.4955
-0.0627, -0.1439,  0.1985,  0.6999,  0.5022,  0.3077
 0.1587,  0.8494, -0.8705,  0.9827, -0.8940,  0.4263
-0.7850,  0.2473, -0.9040, -0.4308, -0.8779,  0.7199
 0.4070,  0.3369, -0.2428, -0.6236,  0.4940,  0.2215
-0.0242,  0.0513, -0.9430,  0.2885, -0.2987,  0.3947
-0.5416, -0.1322, -0.2351, -0.0604,  0.9590,  0.3683
 0.1055,  0.7783, -0.2901, -0.5090,  0.8220,  0.2984
-0.9129,  0.9015,  0.1128, -0.2473,  0.9901,  0.4776
-0.9378,  0.1424, -0.6391,  0.2619,  0.9618,  0.5368
 0.7498, -0.0963,  0.4169,  0.5549, -0.0103,  0.1614
-0.2612, -0.7156,  0.4538, -0.0460, -0.1022,  0.3717
 0.7720,  0.0552, -0.1818, -0.4622, -0.8560,  0.1685
-0.4177,  0.0070,  0.9319, -0.7812,  0.3461,  0.3052
-0.0001,  0.5542, -0.7128, -0.8336, -0.2016,  0.3803
 0.5356, -0.4194, -0.5662, -0.9666, -0.2027,  0.1776
-0.2378,  0.3187, -0.8582, -0.6948, -0.9668,  0.5474
-0.1947, -0.3579,  0.1158,  0.9869,  0.6690,  0.2992
 0.3992,  0.8365, -0.9205, -0.8593, -0.0520,  0.3154
-0.0209,  0.0793,  0.7905, -0.1067,  0.7541,  0.1864
-0.4928, -0.4524, -0.3433,  0.0951, -0.5597,  0.6261
-0.8118,  0.7404, -0.5263, -0.2280,  0.1431,  0.6349
 0.0516, -0.8480,  0.7483,  0.9023,  0.6250,  0.1959
-0.3212,  0.1093,  0.9488, -0.3766,  0.3376,  0.2735
-0.3481,  0.5490, -0.3484,  0.7797,  0.5034,  0.4379
-0.5785, -0.9170, -0.3563, -0.9258,  0.3877,  0.4121
 0.3407, -0.1391,  0.5356,  0.0720, -0.9203,  0.3458
-0.3287, -0.8954,  0.2102,  0.0241,  0.2349,  0.3247
-0.1353,  0.6954, -0.0919, -0.9692,  0.7461,  0.3338
 0.9036, -0.8982, -0.5299, -0.8733, -0.1567,  0.1187
 0.7277, -0.8368, -0.0538, -0.7489,  0.5458,  0.0830
 0.9049,  0.8878,  0.2279,  0.9470, -0.3103,  0.2194
 0.7957, -0.1308, -0.5284,  0.8817,  0.3684,  0.2172
 0.4647, -0.4931,  0.2010,  0.6292, -0.8918,  0.3371
-0.7390,  0.6849,  0.2367,  0.0626, -0.5034,  0.7039
-0.1567, -0.8711,  0.7940, -0.5932,  0.6525,  0.1710
 0.7635, -0.0265,  0.1969,  0.0545,  0.2496,  0.1445
 0.7675,  0.1354, -0.7698, -0.5460,  0.1920,  0.1728
-0.5211, -0.7372, -0.6763,  0.6897,  0.2044,  0.5217
 0.1913,  0.1980,  0.2314, -0.8816,  0.5006,  0.1998
 0.8964,  0.0694, -0.6149,  0.5059, -0.9854,  0.1825
 0.1767,  0.7104,  0.2093,  0.6452,  0.7590,  0.2832
-0.3580, -0.7541,  0.4426, -0.1193, -0.7465,  0.5657
-0.5996,  0.5766, -0.9758, -0.3933, -0.9572,  0.6800
 0.9950,  0.1641, -0.4132,  0.8579,  0.0142,  0.2003
-0.4717, -0.3894, -0.2567, -0.5111,  0.1691,  0.4266
 0.3917, -0.8561,  0.9422,  0.5061,  0.6123,  0.1212
-0.0366, -0.1087,  0.3449, -0.1025,  0.4086,  0.2475
 0.3633,  0.3943,  0.2372, -0.6980,  0.5216,  0.1925
-0.5325, -0.6466, -0.2178, -0.3589,  0.6310,  0.3568
 0.2271,  0.5200, -0.1447, -0.8011, -0.7699,  0.3128
 0.6415,  0.1993,  0.3777, -0.0178, -0.8237,  0.2181
-0.5298, -0.0768, -0.6028, -0.9490,  0.4588,  0.4356
 0.6870, -0.1431,  0.7294,  0.3141,  0.1621,  0.1632
-0.5985,  0.0591,  0.7889, -0.3900,  0.7419,  0.2945
 0.3661,  0.7984, -0.8486,  0.7572, -0.6183,  0.3449
 0.6995,  0.3342, -0.3113, -0.6972,  0.2707,  0.1712
 0.2565,  0.9126,  0.1798, -0.6043, -0.1413,  0.2893
-0.3265,  0.9839, -0.2395,  0.9854,  0.0376,  0.4770
 0.2690, -0.1722,  0.9818,  0.8599, -0.7015,  0.3954
-0.2102, -0.0768,  0.1219,  0.5607, -0.0256,  0.3949
 0.8216, -0.9555,  0.6422, -0.6231,  0.3715,  0.0801
-0.2896,  0.9484, -0.7545, -0.6249,  0.7789,  0.4370
-0.9985, -0.5448, -0.7092, -0.5931,  0.7926,  0.5402

An Example When a Machine Learning Regression Model Has a Negative R2 Score

If you have a machine learning regression model that predicts a single numeric value, the three most common ways to evaluate the model are mean squared error (MSE), accuracy, and coefficient of determination (R2).

Note that root mean squared error (RMSE) is just the square root of MSE. This is useful when the target variable to predict has units, such as dollars. MSE has units “dollars-squared” but RMSE has units “dollars”.

MSE is reasonably interpretable, for example, if MSE = 0, the model predicts perfectly. But MSE has no upper limit, and MSE depends on how the data items are scaled.

Accuracy is very interpretable, for example, if accuracy = 75% and there are 200 data items, the model predicts 150 out of 200 correctly. But accuracy requires an arbitrary percentage closeness (perhaps 10%) that determines if a prediction is correct or not.
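A minimal Python sketch of the first two metrics, plus RMSE, is shown here. The accuracy rule used, where a prediction counts as correct if it is within a given percentage of the target, is one reasonable reading of "percentage closeness" and is an assumption of this sketch. The four data values are made up.

```python
import math

def mse(y_true, y_pred):
    # mean of squared differences between targets and predictions
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    # same units as the target variable
    return math.sqrt(mse(y_true, y_pred))

def accuracy(y_true, y_pred, pct_close=0.10):
    # a prediction is "correct" if it is within pct_close of the target
    n_correct = sum(1 for t, p in zip(y_true, y_pred)
                    if abs(p - t) < abs(pct_close * t))
    return n_correct / len(y_true)

y_true = [10.0, 20.0, 30.0, 40.0]   # made-up targets
y_pred = [10.5, 25.0, 29.0, 40.0]   # made-up predictions

print("MSE  =", mse(y_true, y_pred))
print("RMSE =", round(rmse(y_true, y_pred), 4))
print("acc  =", accuracy(y_true, y_pred))
```

Changing pct_close changes the accuracy value, which is the arbitrariness mentioned above.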

R2 is sort of a cross between MSE and accuracy. It’s often stated that “R2 is a value between 0 and 1 where higher values indicate a more accurate model.” So, an R2 score of 1 means a model predicts perfectly (which is never possible in practice). R2 is calculated as 1 – (SSres / SStot). The SSres is the sum of the squared differences between target y and predicted y values. The SStot is the sum of the squared differences between each target y value and the average of the target y values. (Note that R2 is not the same as the classical statistics r2 for correlation.)

The R2 score isn’t affected by data scaling (good) but R2 can be negative (not good), and there’s no theoretical limit to how negative R2 can be.
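
Here is a small sketch of the scaling point, assuming the target values and the predictions are both multiplied by the same constant. The data values, the constant k, and the helper function names are made up for illustration.

```python
import numpy as np

def r2_scratch(y, pred):
  ss_res = np.sum((y - pred) ** 2)
  ss_tot = np.sum((y - np.mean(y)) ** 2)
  return 1.0 - (ss_res / ss_tot)

def mse_scratch(y, pred):
  return np.mean((y - pred) ** 2)

y = np.array([0.34, 0.86, 0.79, 0.56])     # made-up targets
pred = np.array([0.30, 0.80, 0.85, 0.50])  # made-up predictions

k = 1000.0  # hypothetical rescaling constant
r2_a = r2_scratch(y, pred)
r2_b = r2_scratch(k * y, k * pred)
mse_a = mse_scratch(y, pred)
mse_b = mse_scratch(k * y, k * pred)

print("R2  before = %0.4f  after = %0.4f" % (r2_a, r2_b))
print("MSE before = %0.4f  after = %0.4f" % (mse_a, mse_b))
```

The R2 value is unchanged by the rescaling, but the MSE grows by a factor of k squared.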



As far as I know, all scikit-learn library regression modules (LinearRegression, Ridge, KernelRidge, GradientBoostingRegressor, etc.) have a built in score() function that gives the R2 value for the trained model. Implementing R2 from scratch is easy.


Suppose you have a set of training data. The simplest possible regression model is to just predict the average of the target y values in the training data, for any input x. But if you have a really bad regression model that predicts even worse than just returning the average of the target y values, the R2 score will be negative.

Here’s an example:

x0    x1    y        pred y   ss res   ss tot
0.2   0.3   0.34      0.19    0.0225   0.0841
0.6   0.5   0.86      0.61    0.0625   0.0529
0.3   0.7   0.79      0.44    0.1225   0.0256
0.6   0.2   0.56      0.46    0.0100   0.0049
0.1   0.6   0.61      0.31    0.0900   0.0004
0.3   0.4   0.49      0.29    0.0400   0.0196
0.5   0.5   0.75      0.50    0.0625   0.0144
0.2   0.6   0.64      0.34    0.0900   0.0001
                  
sum         5.04              0.5000   0.2020
mean        0.63
                              R2 = -1.4752   

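The target y values are calculated by an unseen y = (x0 * x0) + x1. The predicted y values are computed by the terrible model y’ = (x0 * x0) + (0.5 * x1). R2 = 1 – (0.5000 / 0.2020) = 1 – 2.4752 = -1.4752. Because SSres is larger than SStot, the model predicts worse than always predicting the average y value of 0.63.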
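
The arithmetic in the table can be checked with a few lines of plain Python. This sketch uses the y and pred y columns exactly as listed in the table. The function name r2_from_lists is made up, and the predict-the-mean baseline is added only for comparison.

```python
# y and pred y columns copied from the table above
y = [0.34, 0.86, 0.79, 0.56, 0.61, 0.49, 0.75, 0.64]
pred = [0.19, 0.61, 0.44, 0.46, 0.31, 0.29, 0.50, 0.34]

def r2_from_lists(y, pred):
  mean_y = sum(y) / len(y)
  ss_res = sum((t - p) ** 2 for t, p in zip(y, pred))
  ss_tot = sum((t - mean_y) ** 2 for t in y)
  return 1.0 - (ss_res / ss_tot)

r2_bad = r2_from_lists(y, pred)

# baseline: always predict the mean of the target values
mean_y = sum(y) / len(y)
r2_mean = r2_from_lists(y, [mean_y] * len(y))

print("R2 of the terrible model = %0.4f" % r2_bad)
print("R2 of predict-the-mean   = %0.4f" % r2_mean)
```

The predict-the-mean baseline gives R2 = 0, so any model with a negative R2 is doing worse than that baseline on the data being scored.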

The moral of this blog post is that there’s no single best way to evaluate a regression model.



Evaluating a regression model is mostly objective. Evaluating the name of a hotel is subjective. But here are a couple of names of foreign hotels that don’t resonate well in English.


Demo program.

# ridge_scikit_scratch_r2.py
# scratch R2 with scikit Ridge
# just for fun

import numpy as np
from sklearn.linear_model import Ridge

# Ridge(alpha=1.0, *, fit_intercept=True, copy_X=True,
# max_iter=None, tol=0.0001, solver='auto', positive=False,
# random_state=None)

# -----------------------------------------------------------

def r2(model, data_X, data_y):
  n = len(data_X)
  ss_res = 0.0
  ss_tot = 0.0
  mean_y = np.mean(data_y)
  for i in range(n):
    x = data_X[i]
    y = data_y[i]
    pred_y = model.predict([x])[0]
    ss_res += (y - pred_y) * (y - pred_y) # inefficient . .
    ss_tot += (y - mean_y) * (y - mean_y)

  result = 1.0 - (ss_res / ss_tot) # assume non-zero . .
  return result

# -----------------------------------------------------------

print("\nBegin from-scratch R2 demo ")

np.set_printoptions(precision=4, suppress=True,
    floatmode='fixed')

print("\nLoading synthetic train (20) data ")
train_Xy = np.loadtxt(".\\Data\\synthetic_train_20.txt",
  usecols=[0,1,2,3,4,5], delimiter=",")
train_X = train_Xy[:,[0,1,2,3,4]]
train_y = train_Xy[:,5]

print("\nFirst three train X: ")
for i in range(3):
  print(train_X[i])
print("\nFirst three train y: ")
for i in range(3):
  print("%0.4f " % train_y[i])

print("\nCreating scikit Ridge model ")
alpha = 1.0
print("Using default L2 alpha = %0.4f " % alpha)
model = Ridge(alpha=alpha, solver='sag',
  fit_intercept=True, random_state=0)
print("Done ")

print("\nTraining scikit Ridge model ")
print("Using SAG with default training params ")
model.fit(train_X, train_y)
print("Done. Used " + str(model.n_iter_) + " iterations" )

print("\nModel weights: ")
print(model.coef_)
print("Model bias = %0.4f " % model.intercept_)

r2_scikit = model.score(train_X, train_y)
r2_scratch = r2(model, train_X, train_y)

print("\nR2 using built-in score() = %0.4f " % r2_scikit)
print("R2 using from-scratch = %0.4f " % r2_scratch)

print("\nEnd demo ")

Demo data:

# synthetic_train_20.txt
#
-0.1660,  0.4406, -0.9998, -0.3953, -0.7065,  0.4840
 0.0776, -0.1616,  0.3704, -0.5911,  0.7562,  0.1568
-0.9452,  0.3409, -0.1654,  0.1174, -0.7192,  0.8054
 0.9365, -0.3732,  0.3846,  0.7528,  0.7892,  0.1345
-0.8299, -0.9219, -0.6603,  0.7563, -0.8033,  0.7955
 0.0663,  0.3838, -0.3690,  0.3730,  0.6693,  0.3206
-0.9634,  0.5003,  0.9777,  0.4963, -0.4391,  0.7377
-0.1042,  0.8172, -0.4128, -0.4244, -0.7399,  0.4801
-0.9613,  0.3577, -0.5767, -0.4689, -0.0169,  0.6861
-0.7065,  0.1786,  0.3995, -0.7953, -0.1719,  0.5569
 0.3888, -0.1716, -0.9001,  0.0718,  0.3276,  0.2500
 0.1731,  0.8068, -0.7251, -0.7214,  0.6148,  0.3297
-0.2046, -0.6693,  0.8550, -0.3045,  0.5016,  0.2129
 0.2473,  0.5019, -0.3022, -0.4601,  0.7918,  0.2613
-0.1438,  0.9297,  0.3269,  0.2434, -0.7705,  0.5171
 0.1568, -0.1837, -0.5259,  0.8068,  0.1474,  0.3307
-0.9943,  0.2343, -0.3467,  0.0541,  0.7719,  0.5581
 0.2467, -0.9684,  0.8589,  0.3818,  0.9946,  0.1092
-0.6553, -0.7257,  0.8652,  0.3936, -0.8680,  0.7018
 0.8460,  0.4230, -0.7515, -0.9602, -0.9476,  0.1996

Why You Should Never Use L1 Regularization or Lasso Regression

There is absolutely no reason to ever use L1 regularization. Well, maybe that’s a bit too extreme of a statement. But, “I never use L1 regularization or lasso regression unless I’m forced to by a legacy system” is accurate.

I am mildly baffled that the L1 regularization technique even exists. L2 regularization is superior to L1 in almost every scenario that I have ever encountered in several decades of machine learning experience.

Note that L1 regularization applied to linear regression is called lasso (the somewhat pretentious “Least Absolute Shrinkage and Selection Operator”) regression, and L2 regularization applied to linear regression is called ridge regression. The ugly elastic net regression uses L1 and L2 regularization together.

The only argument for using L1 regularization instead of L2 is, “L1 regularization can drive model weight(s) to exactly zero but L2 cannot.” This is true, but because L2 regularization can drive weight(s) to arbitrarily close to zero, you can use L2 and then, only if absolutely necessary, manually set the very small weight(s) to zero.
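
The "use L2, then zero out tiny weights" idea might look something like the following sketch. The synthetic data, the alpha value, and the threshold are all made up, and the ridge weights are computed in closed form without an intercept to keep the example short.

```python
import numpy as np

rnd = np.random.RandomState(0)
n = 200
X = rnd.uniform(-1.0, 1.0, size=(n, 5))  # made-up predictors
# made-up target: column [3] has no effect on y
y = 0.8 * X[:,0] - 0.5 * X[:,1] + 0.3 * X[:,2] + \
  0.2 * X[:,4] + rnd.normal(0.0, 0.01, size=n)

alpha = 1.0  # hypothetical L2 regularization strength
# closed form ridge weights (no intercept):
# w = inverse(Xt * X + alpha * I) * Xt * y
w = np.linalg.solve(X.T @ X + alpha * np.eye(5), X.T @ y)

threshold = 0.02  # hypothetical cutoff for "very small"
w_pruned = np.where(np.abs(w) < threshold, 0.0, w)

np.set_printoptions(precision=4, suppress=True)
print("ridge weights  = ", w)
print("pruned weights = ", w_pruned)
```

The cutoff value is a judgment call. A threshold that is reasonable for one dataset can be too aggressive for another.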

Setting a model weight to zero throws away information, and it’s very rare to encounter a scenario where this was a good idea. A predictor variable has “zero importance” only if there is exactly zero correlation to the target variable, which almost never happens with real-life data.



An example of L1 regularization for linear regression (lasso regression) using the scikit Lasso module. Not recommended.


The only L1 scenario I can even remotely imagine is one where you have thousands of predictor variables and you absolutely must get rid of some. I have never seen this scenario in practice, and even if it did occur, L2 regularization can do the job. Or you can preprocess the data using PCA or similar dimensionality reduction.

There are three reasons why L2 regularization is technically better than L1.

1.) L2 regularization works with any kind of training: iterative SGD training and closed form training (left pseudo-inverse via normal equations, and Moore-Penrose pseudo-inverse). L1 has no closed form solution, so it must be trained iteratively, for example with coordinate descent (which the scikit Lasso module uses) or with a minor variation of SGD called sub-gradient stochastic gradient descent (SSGD). When closed form training works, it is usually much better than SGD because you don’t need to tune the learning rate and number of training iterations.

2.) L2 regularization is significantly better at dealing with multicollinearity in data. If two features are correlated, L1 will arbitrarily pick one feature and zero out the other, but L2 distributes weights evenly.

3.) Because L2 regularization doesn’t eliminate any predictor weights by setting them to zero, L2 regularized models often have better prediction accuracy than L1 models on new, non-training data. I have never seen L1 regularization out-perform L2 regularization (even though such scenarios must exist).
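
Points 1 and 2 above can be illustrated with a short sketch: closed form L2 (ridge) training via the normal equations, applied to data where one column is an exact copy of another. The data and the alpha value are made up, the function name ridge_closed_form is hypothetical, and the sketch covers only the extreme case of a perfectly duplicated column. It does not show lasso behavior.

```python
import numpy as np

def ridge_closed_form(X, y, alpha):
  # w = inverse(Xt * X + alpha * I) * Xt * y  (no intercept)
  dim = X.shape[1]
  A = X.T @ X + alpha * np.eye(dim)
  return np.linalg.solve(A, X.T @ y)

rnd = np.random.RandomState(1)
x0 = rnd.uniform(-1.0, 1.0, size=100)
x2 = rnd.uniform(-1.0, 1.0, size=100)
X = np.column_stack([x0, x0, x2])  # column [1] duplicates [0]
y = 1.0 * x0 + 0.5 * x2            # made-up noise-free target

w = ridge_closed_form(X, y, alpha=0.1)
np.set_printoptions(precision=4, suppress=True)
print(w)
```

The two duplicated columns receive the same weight, and the two weights together account for the effect of x0. Without the alpha term, the matrix Xt * X would be singular for this data and the plain normal equations would fail.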

In short, L2 regularization is significantly better technically than L1, and L2 can identify unimportant predictors better than L1 can. In other words, there is really no reason to ever use L1 regularization, unless you are forced to by some sort of architecture design, or a legacy requirement.

By the way, a minor disadvantage of both L1 and L2 regularization is that they both require the training data to be scaled, typically to mean = 0 and unit variance. This isn’t a huge deal but can be mildly annoying in practice.
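
Scaling to mean = 0 and unit variance can be sketched with plain NumPy. The data values are made up. The key detail is that the means and standard deviations are computed from the training data only and then reused for the test data, which is also what the scikit StandardScaler fit() and transform() functions do.

```python
import numpy as np

train_X = np.array([[0.5, 10.0],
                    [1.5, 30.0],
                    [2.5, 20.0],
                    [3.5, 40.0]])  # made-up data
test_X = np.array([[2.0, 25.0]])   # made-up data

# statistics come from the training data only
means = np.mean(train_X, axis=0)
stds = np.std(train_X, axis=0)  # population std (ddof=0)

scaled_train_X = (train_X - means) / stds
scaled_test_X = (test_X - means) / stds  # reuse train stats

print(scaled_train_X)
print(scaled_test_X)
```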



The vast majority of machine learning techniques are good, but there are techniques like L1 regularization that are technical embarrassments when used naively by beginners.

I know very little about, and have very little interest in, politics in my country (US). But I do know that the vast majority of former First Ladies of the United States were examples of poise, grace, charm, and class. Melania Trump, Jacqueline Kennedy, and Hillary Clinton come to mind. But then there was an absolute disgrace and embarrassment to the entire country, at least in terms of class and charm.


Demo program.

# lasso_scikit.py
# L1 regularization. Not recommended!

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn import linear_model

np.set_printoptions(precision=4, suppress=True,
  floatmode='fixed', linewidth=60)

# -----------------------------------------------------------

def accuracy(model, data_X, data_y, pct_close):
  n = len(data_X)
  n_correct = 0; n_wrong = 0
  for i in range(n):
    x = data_X[i].reshape(1,-1)
    y = data_y[i]
    y_pred = model.predict(x)[0]

    if np.abs(y - y_pred) < np.abs(y * pct_close):
      n_correct += 1
    else: 
      n_wrong += 1
  return n_correct / (n_correct + n_wrong)

def mse(model, data_X, data_y):
  n = len(data_X)
  sum = 0.0
  for i in range(n):
    actual_y = data_y[i]
    pred_y = model.predict(data_X[i].reshape(1, -1))[0]
    diff = actual_y - pred_y
    sum += diff * diff
  return sum /n

# -----------------------------------------------------------

print("\nBegin lasso (L1) regression using scikit ")
print("Note: Not a recommended technique ")
print("L2 regularization is better in all scenarios ")

print("\nLoading train (200) and test (40) data ")
train_Xy = np.loadtxt(".\\Data\\synthetic_train_200.txt",
  usecols=[0,1,2,3,4,5], delimiter=",")
train_X = train_Xy[:,[0,1,2,3,4]]
train_y = train_Xy[:,5]

test_Xy = np.loadtxt(".\\Data\\synthetic_test_40.txt",
  usecols=[0,1,2,3,4,5], delimiter=",")
test_X = test_Xy[:,[0,1,2,3,4]]
test_y = test_Xy[:,5]

print("\nFirst three train X: ")
for i in range(3):
  print(train_X[i])
print("\nFirst three train y: ")
for i in range(3):
  print("%0.4f " % train_y[i])

# Lasso(alpha=1.0, *, fit_intercept=True, precompute=False,
# copy_X=True, max_iter=1000, tol=0.0001, warm_start=False,
# positive=False, random_state=None, selection='cyclic')

print("\nCreating and using data scaler on train X ")
scaler = StandardScaler()
scaler.fit(train_X)
scaled_train_X = scaler.transform(train_X)
print("\nFirst three scaled train X: ")
for i in range(3):
  print(scaled_train_X[i])
scaled_test_X = scaler.transform(test_X)

alpha = 0.01
print("\nCreating lasso model, alpha = %0.4f" % alpha)
model = linear_model.Lasso(alpha=alpha, random_state=0)
print("Done ")

print("\nTraining model ")
model.fit(scaled_train_X, train_y)
print("Done. ")

print("\nModel weights: ")
print(model.coef_)
print("Model bias = %0.4f " % model.intercept_)

print("\nEvaluating model ")
acc_train = accuracy(model, scaled_train_X, train_y, 0.10)
acc_test = accuracy(model, scaled_test_X, test_y, 0.10)
print("\nAccuracy (within 0.10) train = %0.4f " % \
  acc_train)
print("Accuracy (within 0.10) test = %0.4f " % \
  acc_test)

mse_train = mse(model, scaled_train_X, train_y)
mse_test = mse(model, scaled_test_X, test_y)
print("\nMSE train = %0.4f " % mse_train)
print("MSE test = %0.4f " % mse_test)

x = train_X[0]
print("\nPredicting for x = ")
print(x)
scaled_x = scaler.transform([x])
pred_y = model.predict(scaled_x)[0]
print("Predicted y = %0.4f " % pred_y)

print("\nEnd demo ")

Training data:

# synthetic_train_200.txt
#
-0.1660,  0.4406, -0.9998, -0.3953, -0.7065,  0.4840
 0.0776, -0.1616,  0.3704, -0.5911,  0.7562,  0.1568
-0.9452,  0.3409, -0.1654,  0.1174, -0.7192,  0.8054
 0.9365, -0.3732,  0.3846,  0.7528,  0.7892,  0.1345
-0.8299, -0.9219, -0.6603,  0.7563, -0.8033,  0.7955
 0.0663,  0.3838, -0.3690,  0.3730,  0.6693,  0.3206
-0.9634,  0.5003,  0.9777,  0.4963, -0.4391,  0.7377
-0.1042,  0.8172, -0.4128, -0.4244, -0.7399,  0.4801
-0.9613,  0.3577, -0.5767, -0.4689, -0.0169,  0.6861
-0.7065,  0.1786,  0.3995, -0.7953, -0.1719,  0.5569
 0.3888, -0.1716, -0.9001,  0.0718,  0.3276,  0.2500
 0.1731,  0.8068, -0.7251, -0.7214,  0.6148,  0.3297
-0.2046, -0.6693,  0.8550, -0.3045,  0.5016,  0.2129
 0.2473,  0.5019, -0.3022, -0.4601,  0.7918,  0.2613
-0.1438,  0.9297,  0.3269,  0.2434, -0.7705,  0.5171
 0.1568, -0.1837, -0.5259,  0.8068,  0.1474,  0.3307
-0.9943,  0.2343, -0.3467,  0.0541,  0.7719,  0.5581
 0.2467, -0.9684,  0.8589,  0.3818,  0.9946,  0.1092
-0.6553, -0.7257,  0.8652,  0.3936, -0.8680,  0.7018
 0.8460,  0.4230, -0.7515, -0.9602, -0.9476,  0.1996
-0.9434, -0.5076,  0.7201,  0.0777,  0.1056,  0.5664
 0.9392,  0.1221, -0.9627,  0.6013, -0.5341,  0.1533
 0.6142, -0.2243,  0.7271,  0.4942,  0.1125,  0.1661
 0.4260,  0.1194, -0.9749, -0.8561,  0.9346,  0.2230
 0.1362, -0.5934, -0.4953,  0.4877, -0.6091,  0.3810
 0.6937, -0.5203, -0.0125,  0.2399,  0.6580,  0.1460
-0.6864, -0.9628, -0.8600, -0.0273,  0.2127,  0.5387
 0.9772,  0.1595, -0.2397,  0.1019,  0.4907,  0.1611
 0.3385, -0.4702, -0.8673, -0.2598,  0.2594,  0.2270
-0.8669, -0.4794,  0.6095, -0.6131,  0.2789,  0.4700
 0.0493,  0.8496, -0.4734, -0.8681,  0.4701,  0.3516
 0.8639, -0.9721, -0.5313,  0.2336,  0.8980,  0.1412
 0.9004,  0.1133,  0.8312,  0.2831, -0.2200,  0.1782
 0.0991,  0.8524,  0.8375, -0.2102,  0.9265,  0.2150
-0.6521, -0.7473, -0.7298,  0.0113, -0.9570,  0.7422
 0.6190, -0.3105,  0.8802,  0.1640,  0.7577,  0.1056
 0.6895,  0.8108, -0.0802,  0.0927,  0.5972,  0.2214
 0.1982, -0.9689,  0.1870, -0.1326,  0.6147,  0.1310
-0.3695,  0.7858,  0.1557, -0.6320,  0.5759,  0.3773
-0.1596,  0.3581,  0.8372, -0.9992,  0.9535,  0.2071
-0.2468,  0.9476,  0.2094,  0.6577,  0.1494,  0.4132
 0.1737,  0.5000,  0.7166,  0.5102,  0.3961,  0.2611
 0.7290, -0.3546,  0.3416, -0.0983, -0.2358,  0.1332
-0.3652,  0.2438, -0.1395,  0.9476,  0.3556,  0.4170
-0.6029, -0.1466, -0.3133,  0.5953,  0.7600,  0.4334
-0.4596, -0.4953,  0.7098,  0.0554,  0.6043,  0.2775
 0.1450,  0.4663,  0.0380,  0.5418,  0.1377,  0.2931
-0.8636, -0.2442, -0.8407,  0.9656, -0.6368,  0.7429
 0.6237,  0.7499,  0.3768,  0.1390, -0.6781,  0.2185
-0.5499,  0.1850, -0.3755,  0.8326,  0.8193,  0.4399
-0.4858, -0.7782, -0.6141, -0.0008,  0.4572,  0.4197
 0.7033, -0.1683,  0.2334, -0.5327, -0.7961,  0.1776
 0.0317, -0.0457, -0.6947,  0.2436,  0.0880,  0.3345
 0.5031, -0.5559,  0.0387,  0.5706, -0.9553,  0.3107
-0.3513,  0.7458,  0.6894,  0.0769,  0.7332,  0.3170
 0.2205,  0.5992, -0.9309,  0.5405,  0.4635,  0.3532
-0.4806, -0.4859,  0.2646, -0.3094,  0.5932,  0.3202
 0.9809, -0.3995, -0.7140,  0.8026,  0.0831,  0.1600
 0.9495,  0.2732,  0.9878,  0.0921,  0.0529,  0.1289
-0.9476, -0.6792,  0.4913, -0.9392, -0.2669,  0.5966
 0.7247,  0.3854,  0.3819, -0.6227, -0.1162,  0.1550
-0.5922, -0.5045, -0.4757,  0.5003, -0.0860,  0.5863
-0.8861,  0.0170, -0.5761,  0.5972, -0.4053,  0.7301
 0.6877, -0.2380,  0.4997,  0.0223,  0.0819,  0.1404
 0.9189,  0.6079, -0.9354,  0.4188, -0.0700,  0.1907
-0.1428, -0.7820,  0.2676,  0.6059,  0.3936,  0.2790
 0.5324, -0.3151,  0.6917, -0.1425,  0.6480,  0.1071
-0.8432, -0.9633, -0.8666, -0.0828, -0.7733,  0.7784
-0.9444,  0.5097, -0.2103,  0.4939, -0.0952,  0.6787
-0.0520,  0.6063, -0.1952,  0.8094, -0.9259,  0.4836
 0.5477, -0.7487,  0.2370, -0.9793,  0.0773,  0.1241
 0.2450,  0.8116,  0.9799,  0.4222,  0.4636,  0.2355
 0.8186, -0.1983, -0.5003, -0.6531, -0.7611,  0.1511
-0.4714,  0.6382, -0.3788,  0.9648, -0.4667,  0.5950
 0.0673, -0.3711,  0.8215, -0.2669, -0.1328,  0.2677
-0.9381,  0.4338,  0.7820, -0.9454,  0.0441,  0.5518
-0.3480,  0.7190,  0.1170,  0.3805, -0.0943,  0.4724
-0.9813,  0.1535, -0.3771,  0.0345,  0.8328,  0.5438
-0.1471, -0.5052, -0.2574,  0.8637,  0.8737,  0.3042
-0.5454, -0.3712, -0.6505,  0.2142, -0.1728,  0.5783
 0.6327, -0.6297,  0.4038, -0.5193,  0.1484,  0.1153
-0.5424,  0.3282, -0.0055,  0.0380, -0.6506,  0.6613
 0.1414,  0.9935,  0.6337,  0.1887,  0.9520,  0.2540
-0.9351, -0.8128, -0.8693, -0.0965, -0.2491,  0.7353
 0.9507, -0.6640,  0.9456,  0.5349,  0.6485,  0.1059
-0.0462, -0.9737, -0.2940, -0.0159,  0.4602,  0.2606
-0.0627, -0.0852, -0.7247, -0.9782,  0.5166,  0.2977
 0.0478,  0.5098, -0.0723, -0.7504, -0.3750,  0.3335
 0.0090,  0.3477,  0.5403, -0.7393, -0.9542,  0.4415
-0.9748,  0.3449,  0.3736, -0.1015,  0.8296,  0.4358
 0.2887, -0.9895, -0.0311,  0.7186,  0.6608,  0.2057
 0.1570, -0.4518,  0.1211,  0.3435, -0.2951,  0.3244
 0.7117, -0.6099,  0.4946, -0.4208,  0.5476,  0.1096
-0.2929, -0.5726,  0.5346, -0.3827,  0.4665,  0.2465
 0.4889, -0.5572, -0.5718, -0.6021, -0.7150,  0.2163
-0.7782,  0.3491,  0.5996, -0.8389, -0.5366,  0.6516
-0.5847,  0.8347,  0.4226,  0.1078, -0.3910,  0.6134
 0.8469,  0.4121, -0.0439, -0.7476,  0.9521,  0.1571
-0.6803, -0.5948, -0.1376, -0.1916, -0.7065,  0.7156
 0.2878,  0.5086, -0.5785,  0.2019,  0.4979,  0.2980
 0.2764,  0.1943, -0.4090,  0.4632,  0.8906,  0.2960
-0.8877,  0.6705, -0.6155, -0.2098, -0.3998,  0.7107
-0.8398,  0.8093, -0.2597,  0.0614, -0.0118,  0.6502
-0.8476,  0.0158, -0.4769, -0.2859, -0.7839,  0.7715
 0.5751, -0.7868,  0.9714, -0.6457,  0.1448,  0.1175
 0.4802, -0.7001,  0.1022, -0.5668,  0.5184,  0.1090
 0.4458, -0.6469,  0.7239, -0.9604,  0.7205,  0.0779
 0.5175,  0.4339,  0.9747, -0.4438, -0.9924,  0.2879
 0.8678,  0.7158,  0.4577,  0.0334,  0.4139,  0.1678
 0.5406,  0.5012,  0.2264, -0.1963,  0.3946,  0.2088
-0.9938,  0.5498,  0.7928, -0.5214, -0.7585,  0.7687
 0.7661,  0.0863, -0.4266, -0.7233, -0.4197,  0.1466
 0.2277, -0.3517, -0.0853, -0.1118,  0.6563,  0.1767
 0.3499, -0.5570, -0.0655, -0.3705,  0.2537,  0.1632
 0.7547, -0.1046,  0.5689, -0.0861,  0.3125,  0.1257
 0.8186,  0.2110,  0.5335,  0.0094, -0.0039,  0.1391
 0.6858, -0.8644,  0.1465,  0.8855,  0.0357,  0.1845
-0.4967,  0.4015,  0.0805,  0.8977,  0.2487,  0.4663
 0.6760, -0.9841,  0.9787, -0.8446, -0.3557,  0.1509
-0.1203, -0.4885,  0.6054, -0.0443, -0.7313,  0.4854
 0.8557,  0.7919, -0.0169,  0.7134, -0.1628,  0.2002
 0.0115, -0.6209,  0.9300, -0.4116, -0.7931,  0.4052
-0.7114, -0.9718,  0.4319,  0.1290,  0.5892,  0.3661
 0.3915,  0.5557, -0.1870,  0.2955, -0.6404,  0.2954
-0.3564, -0.6548, -0.1827, -0.5172, -0.1862,  0.4622
 0.2392, -0.4959,  0.5857, -0.1341, -0.2850,  0.2470
-0.3394,  0.3947, -0.4627,  0.6166, -0.4094,  0.5325
 0.7107,  0.7768, -0.6312,  0.1707,  0.7964,  0.2757
-0.1078,  0.8437, -0.4420,  0.2177,  0.3649,  0.4028
-0.3139,  0.5595, -0.6505, -0.3161, -0.7108,  0.5546
 0.4335,  0.3986,  0.3770, -0.4932,  0.3847,  0.1810
-0.2562, -0.2894, -0.8847,  0.2633,  0.4146,  0.4036
 0.2272,  0.2966, -0.6601, -0.7011,  0.0284,  0.2778
-0.0743, -0.1421, -0.0054, -0.6770, -0.3151,  0.3597
-0.4762,  0.6891,  0.6007, -0.1467,  0.2140,  0.4266
-0.4061,  0.7193,  0.3432,  0.2669, -0.7505,  0.6147
-0.0588,  0.9731,  0.8966,  0.2902, -0.6966,  0.4955
-0.0627, -0.1439,  0.1985,  0.6999,  0.5022,  0.3077
 0.1587,  0.8494, -0.8705,  0.9827, -0.8940,  0.4263
-0.7850,  0.2473, -0.9040, -0.4308, -0.8779,  0.7199
 0.4070,  0.3369, -0.2428, -0.6236,  0.4940,  0.2215
-0.0242,  0.0513, -0.9430,  0.2885, -0.2987,  0.3947
-0.5416, -0.1322, -0.2351, -0.0604,  0.9590,  0.3683
 0.1055,  0.7783, -0.2901, -0.5090,  0.8220,  0.2984
-0.9129,  0.9015,  0.1128, -0.2473,  0.9901,  0.4776
-0.9378,  0.1424, -0.6391,  0.2619,  0.9618,  0.5368
 0.7498, -0.0963,  0.4169,  0.5549, -0.0103,  0.1614
-0.2612, -0.7156,  0.4538, -0.0460, -0.1022,  0.3717
 0.7720,  0.0552, -0.1818, -0.4622, -0.8560,  0.1685
-0.4177,  0.0070,  0.9319, -0.7812,  0.3461,  0.3052
-0.0001,  0.5542, -0.7128, -0.8336, -0.2016,  0.3803
 0.5356, -0.4194, -0.5662, -0.9666, -0.2027,  0.1776
-0.2378,  0.3187, -0.8582, -0.6948, -0.9668,  0.5474
-0.1947, -0.3579,  0.1158,  0.9869,  0.6690,  0.2992
 0.3992,  0.8365, -0.9205, -0.8593, -0.0520,  0.3154
-0.0209,  0.0793,  0.7905, -0.1067,  0.7541,  0.1864
-0.4928, -0.4524, -0.3433,  0.0951, -0.5597,  0.6261
-0.8118,  0.7404, -0.5263, -0.2280,  0.1431,  0.6349
 0.0516, -0.8480,  0.7483,  0.9023,  0.6250,  0.1959
-0.3212,  0.1093,  0.9488, -0.3766,  0.3376,  0.2735
-0.3481,  0.5490, -0.3484,  0.7797,  0.5034,  0.4379
-0.5785, -0.9170, -0.3563, -0.9258,  0.3877,  0.4121
 0.3407, -0.1391,  0.5356,  0.0720, -0.9203,  0.3458
-0.3287, -0.8954,  0.2102,  0.0241,  0.2349,  0.3247
-0.1353,  0.6954, -0.0919, -0.9692,  0.7461,  0.3338
 0.9036, -0.8982, -0.5299, -0.8733, -0.1567,  0.1187
 0.7277, -0.8368, -0.0538, -0.7489,  0.5458,  0.0830
 0.9049,  0.8878,  0.2279,  0.9470, -0.3103,  0.2194
 0.7957, -0.1308, -0.5284,  0.8817,  0.3684,  0.2172
 0.4647, -0.4931,  0.2010,  0.6292, -0.8918,  0.3371
-0.7390,  0.6849,  0.2367,  0.0626, -0.5034,  0.7039
-0.1567, -0.8711,  0.7940, -0.5932,  0.6525,  0.1710
 0.7635, -0.0265,  0.1969,  0.0545,  0.2496,  0.1445
 0.7675,  0.1354, -0.7698, -0.5460,  0.1920,  0.1728
-0.5211, -0.7372, -0.6763,  0.6897,  0.2044,  0.5217
 0.1913,  0.1980,  0.2314, -0.8816,  0.5006,  0.1998
 0.8964,  0.0694, -0.6149,  0.5059, -0.9854,  0.1825
 0.1767,  0.7104,  0.2093,  0.6452,  0.7590,  0.2832
-0.3580, -0.7541,  0.4426, -0.1193, -0.7465,  0.5657
-0.5996,  0.5766, -0.9758, -0.3933, -0.9572,  0.6800
 0.9950,  0.1641, -0.4132,  0.8579,  0.0142,  0.2003
-0.4717, -0.3894, -0.2567, -0.5111,  0.1691,  0.4266
 0.3917, -0.8561,  0.9422,  0.5061,  0.6123,  0.1212
-0.0366, -0.1087,  0.3449, -0.1025,  0.4086,  0.2475
 0.3633,  0.3943,  0.2372, -0.6980,  0.5216,  0.1925
-0.5325, -0.6466, -0.2178, -0.3589,  0.6310,  0.3568
 0.2271,  0.5200, -0.1447, -0.8011, -0.7699,  0.3128
 0.6415,  0.1993,  0.3777, -0.0178, -0.8237,  0.2181
-0.5298, -0.0768, -0.6028, -0.9490,  0.4588,  0.4356
 0.6870, -0.1431,  0.7294,  0.3141,  0.1621,  0.1632
-0.5985,  0.0591,  0.7889, -0.3900,  0.7419,  0.2945
 0.3661,  0.7984, -0.8486,  0.7572, -0.6183,  0.3449
 0.6995,  0.3342, -0.3113, -0.6972,  0.2707,  0.1712
 0.2565,  0.9126,  0.1798, -0.6043, -0.1413,  0.2893
-0.3265,  0.9839, -0.2395,  0.9854,  0.0376,  0.4770
 0.2690, -0.1722,  0.9818,  0.8599, -0.7015,  0.3954
-0.2102, -0.0768,  0.1219,  0.5607, -0.0256,  0.3949
 0.8216, -0.9555,  0.6422, -0.6231,  0.3715,  0.0801
-0.2896,  0.9484, -0.7545, -0.6249,  0.7789,  0.4370
-0.9985, -0.5448, -0.7092, -0.5931,  0.7926,  0.5402

Test data:

# synthetic_test_40.txt
#
 0.7462,  0.4006, -0.0590,  0.6543, -0.0083,  0.1935
 0.8495, -0.2260, -0.0142, -0.4911,  0.7699,  0.1078
-0.2335, -0.4049,  0.4352, -0.6183, -0.7636,  0.5088
 0.1810, -0.5142,  0.2465,  0.2767, -0.3449,  0.3136
-0.8650,  0.7611, -0.0801,  0.5277, -0.4922,  0.7140
-0.2358, -0.7466, -0.5115, -0.8413, -0.3943,  0.4533
 0.4834,  0.2300,  0.3448, -0.9832,  0.3568,  0.1360
-0.6502, -0.6300,  0.6885,  0.9652,  0.8275,  0.3046
-0.3053,  0.5604,  0.0929,  0.6329, -0.0325,  0.4756
-0.7995,  0.0740, -0.2680,  0.2086,  0.9176,  0.4565
-0.2144, -0.2141,  0.5813,  0.2902, -0.2122,  0.4119
-0.7278, -0.0987, -0.3312, -0.5641,  0.8515,  0.4438
 0.3793,  0.1976,  0.4933,  0.0839,  0.4011,  0.1905
-0.8568,  0.9573, -0.5272,  0.3212, -0.8207,  0.7415
-0.5785,  0.0056, -0.7901, -0.2223,  0.0760,  0.5551
 0.0735, -0.2188,  0.3925,  0.3570,  0.3746,  0.2191
 0.1230, -0.2838,  0.2262,  0.8715,  0.1938,  0.2878
 0.4792, -0.9248,  0.5295,  0.0366, -0.9894,  0.3149
-0.4456,  0.0697,  0.5359, -0.8938,  0.0981,  0.3879
 0.8629, -0.8505, -0.4464,  0.8385,  0.5300,  0.1769
 0.1995,  0.6659,  0.7921,  0.9454,  0.9970,  0.2330
-0.0249, -0.3066, -0.2927, -0.4923,  0.8220,  0.2437
 0.4513, -0.9481, -0.0770, -0.4374, -0.9421,  0.2879
-0.3405,  0.5931, -0.3507, -0.3842,  0.8562,  0.3987
 0.9538,  0.0471,  0.9039,  0.7760,  0.0361,  0.1706
-0.0887,  0.2104,  0.9808,  0.5478, -0.3314,  0.4128
-0.8220, -0.6302,  0.0537, -0.1658,  0.6013,  0.4306
-0.4123, -0.2880,  0.9074, -0.0461, -0.4435,  0.5144
 0.0060,  0.2867, -0.7775,  0.5161,  0.7039,  0.3599
-0.7968, -0.5484,  0.9426, -0.4308,  0.8148,  0.2979
 0.7811,  0.8450, -0.6877,  0.7594,  0.2640,  0.2362
-0.6802, -0.1113, -0.8325, -0.6694, -0.6056,  0.6544
 0.3821,  0.1476,  0.7466, -0.5107,  0.2592,  0.1648
 0.7265,  0.9683, -0.9803, -0.4943, -0.5523,  0.2454
-0.9049, -0.9797, -0.0196, -0.9090, -0.4433,  0.6447
-0.4607,  0.1811, -0.2389,  0.4050, -0.0078,  0.5229
 0.2664, -0.2932, -0.4259, -0.7336,  0.8742,  0.1834
-0.4507,  0.1029, -0.6294, -0.1158, -0.6294,  0.6081
 0.8948, -0.0124,  0.9278,  0.2899, -0.0314,  0.1534
-0.1323, -0.8813, -0.0146, -0.0697,  0.6135,  0.2386

Support Vector Regression Using From-Scratch-Python SVR with SGD Applied to the Diabetes Dataset

I write code almost every day. Like most skills, writing code is something that must be practiced. And besides that, I enjoy writing code. I had recently implemented a version of kernel support vector regression (SVR), from-scratch, using Python. So, one evening after work, I figured I’d run the scikit Diabetes Dataset through my from-scratch support vector regression model.

Based on previous experiments with linear regression, quadratic regression, neural network regression, kernel ridge regression, random forest regression, and AdaBoost regression, I was certain that the from-scratch support vector regression model would give poor prediction accuracy. My goal was not to create a good regression model — I wanted to validate my from-scratch SVR implementation by comparing results with the scikit library SVR module.

The raw Diabetes Dataset looks like:

59, 2, 32.1, 101.00, 157,  93.2, 38, 4.00, 4.8598, 87, 151
48, 1, 21.6,  87.00, 183, 103.2, 70, 3.00, 3.8918, 69,  75
72, 2, 30.5,  93.00, 156,  93.6, 41, 4.00, 4.6728, 85, 141
. . .

The dataset has 442 items. Each item represents a patient and has 10 predictor values followed by a target value to predict. The 10 predictor variables are age in column [0], sex [1], body mass index [2], blood pressure [3], serum cholesterol [4], low-density lipoproteins [5], high-density lipoproteins [6], total cholesterol [7], triglycerides [8], blood sugar [9]. The stated (in the documentation) target value to predict in the last column is a measure of diabetes [10].

The sex encoding isn’t explained anywhere, but I suspect male = 1, female = 2 because there are 235 items with value 1 and 207 items with value 2.

Note that this Diabetes Dataset, which is included as an example dataset in the Python language scikit-learn library, is not the same as the Pima Diabetes Dataset from the UCI dataset repository. See https://jamesmccaffreyblog.com/2026/02/03/the-origin-and-history-of-scikit-learn-diabetes-dataset/.

When using SVR, it’s a good idea to normalize predictor values so that a predictor variable with very large magnitude values doesn’t overwhelm the other predictor variables.

I converted the sex values from 1,2 into 0,1. Then I applied divide-by-constant normalization by dividing the 10 predictor columns by (100, 1, 100, 1000, 1000, 1000, 100, 10, 10, 1000) and the target y values by 1000. The resulting encoded and normalized data looks like:

0.5900, 1.0000, 0.3210, . . . 0.1510
0.4800, 0.0000, 0.2160, . . . 0.0750
0.7200, 1.0000, 0.3050, . . . 0.1410
. . .
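
The encoding and divide-by-constant normalization can be sketched as follows, using the three raw items shown earlier. The variable names are made up for illustration, and the actual preprocessing code may differ.

```python
import numpy as np

# first three raw items, copied from the listing above
raw = np.array([
  [59, 2, 32.1, 101.00, 157,  93.2, 38, 4.00, 4.8598, 87, 151],
  [48, 1, 21.6,  87.00, 183, 103.2, 70, 3.00, 3.8918, 69,  75],
  [72, 2, 30.5,  93.00, 156,  93.6, 41, 4.00, 4.6728, 85, 141]],
  dtype=np.float64)

# divisors for the 10 predictors, followed by the target
divisors = np.array([100, 1, 100, 1000, 1000, 1000,
  100, 10, 10, 1000, 1000], dtype=np.float64)

norm = raw.copy()
norm[:,1] = norm[:,1] - 1.0  # sex: 1,2 becomes 0,1
norm = norm / divisors       # broadcast across rows

np.set_printoptions(precision=4, suppress=True,
  floatmode='fixed', linewidth=80)
print(norm)
```

The first row comes out as 0.5900, 1.0000, 0.3210 and so on, with a target of 0.1510, which matches the normalized listing.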

I split the 442 items into a 342-item training set and a 100-item test set.

There are two versions of support vector regression. The kernel version is much more powerful than the linear version — so much so that linear SVR is essentially useless. I use kernel SVR.
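
What makes kernel SVR different from linear SVR is the kernel function, which measures the similarity between two data items. Here is a minimal sketch, assuming the common radial basis function (RBF) kernel controlled by a gamma parameter. The two vectors are made up.

```python
import numpy as np

def rbf(a, b, gamma):
  # RBF kernel: exp(-gamma * squared Euclidean distance)
  diff = a - b
  return np.exp(-gamma * np.dot(diff, diff))

a = np.array([0.59, 1.00, 0.32])  # made-up vectors
b = np.array([0.48, 0.00, 0.22])

k_same = rbf(a, a, gamma=0.3)  # identical vectors
k_diff = rbf(a, b, gamma=0.3)  # different vectors

print("k(a, a) = %0.4f" % k_same)
print("k(a, b) = %0.4f" % k_diff)
```

Identical vectors give a kernel value of exactly 1, and the value shrinks toward 0 as the vectors get farther apart. Larger gamma values make the shrinking faster.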

First I ran the scikit SVR model. The key output is:

Begin scikit SVR on Diabetes Dataset demo

Loading diabetes train (342), test (100) data
Done

Creating scikit SVR model
Setting gamma = 0.3000
Setting C = 1.0000
Setting epsilon = 0.0100
Done

Training SVR model
Done.

Number support vectors =
299

Evaluating model

Accuracy (within 0.10) train = 0.1959
Accuracy (within 0.10) test = 0.1848

MSE train = 0.0029
MSE test = 0.0031

End demo

The number of support vectors generated by scikit SVR (299 items out of 342 training items) is determined implicitly by complex interactions between the gamma, C, and epsilon parameters. So I adjusted my epsilon value to 0.05 so that my from-scratch version ended up with roughly the same number of support vectors (292) as the scikit model (299).
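
One way to see why epsilon influences the number of support vectors is to look at the standard epsilon-insensitive loss used by SVR. Training items whose prediction error falls inside the epsilon "tube" contribute zero loss, and in standard SVR only items on or outside the tube end up as support vectors. The sketch below uses made-up target and predicted values and a hypothetical function name.

```python
import numpy as np

def eps_insensitive_loss(y, pred, epsilon):
  # errors smaller than epsilon cost nothing;
  # larger errors cost the amount beyond epsilon
  return np.maximum(0.0, np.abs(y - pred) - epsilon)

y = np.array([0.151, 0.075, 0.141, 0.206])     # made-up
pred = np.array([0.160, 0.120, 0.100, 0.200])  # made-up

loss_small = eps_insensitive_loss(y, pred, 0.01)
loss_large = eps_insensitive_loss(y, pred, 0.05)
n_small = np.count_nonzero(loss_small)
n_large = np.count_nonzero(loss_large)

print("epsilon = 0.01  items outside tube = %d" % n_small)
print("epsilon = 0.05  items outside tube = %d" % n_large)
```

A wider tube leaves fewer items outside it, so increasing epsilon usually reduces the number of support vectors. The exact count still depends on gamma and C too.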

The key output of my from-scratch SVR demo:

Begin scratch SVR using SGD on Diabetes Dataset

Loading diabetes train (342), test (100) data
Done

First three X predictors:
[0.5900 1.0000 0.3210 0.1010 0.1570 0.0932 0.3800 0.4000
 0.4860 0.0870]
[0.4800 0.0000 0.2160 0.0870 0.1830 0.1032 0.7000 0.3000
 0.3892 0.0690]
[0.7200 1.0000 0.3050 0.0930 0.1560 0.0936 0.4100 0.4000
 0.4673 0.0850]

First three y targets:
0.1510
0.0750
0.1410

Creating scratch Python SVR model
Setting gamma = 0.3000
Setting epsilon = 0.0500
Setting C = 1.0000
Setting lrn_rate = 0.0100
Setting max_epochs = 10000
Setting KKT tol = 0.000010

Training SVR model using SGD
epoch =    0  |  MSE = 0.0052
epoch = 2000  |  MSE = 0.0032
epoch = 4000  |  MSE = 0.0030
epoch = 6000  |  MSE = 0.0030
epoch = 8000  |  MSE = 0.0033
Done

Number support vectors = 292

Train accuracy (0.10) = 0.1696
Test accuracy (0.10) = 0.1522

MSE train = 0.0030
MSE test = 0.0031

End demo

My from-scratch SVR implementation gave similar results to the scikit SVR, so my experiment gave me some evidence that my from-scratch SVR implementation is probably correct.

If you use the built-in load_diabetes() function with a return_X_y=True parameter, column [10] is automatically the target to predict. But I discovered that column [10] cannot be predicted with any meaningful accuracy.

But columns [4], [5], [6], [7], and [8] can be predicted meaningfully. You can load the data as a DataFrame and then specify a different column as the target. Alternatively, you can use preprocessed training data.
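
Loading the data as a DataFrame and picking a different target column might look like the sketch below. It makes two assumptions that are not from the experiment above: it keeps the original diabetes measure as one of the predictors, and it uses the default load_diabetes() behavior, which returns predictor values that are already mean-centered and scaled rather than the raw values.

```python
from sklearn.datasets import load_diabetes

bunch = load_diabetes(as_frame=True)
df = bunch.frame  # 10 predictor columns plus "target"

target_col = df.columns[4]  # column [4]
y = df[target_col].to_numpy()
X = df.drop(columns=[target_col]).to_numpy()

print("target column = " + str(target_col))
print(X.shape, y.shape)
```

To exclude the original diabetes measure from the predictors, drop the "target" column as well.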

Good fun for me.



It’s common for me and my colleagues to refactor software systems many times. It’s rarely possible to get a non-trivial system completely correct on the first effort. Therefore, a system is usually a collection of software sequels, so to speak, where each sequel is (hopefully) a bit better than its predecessor.

I’m a big fan of old science fiction movies. There have been dozens of sci-fi sequels. “Star Wars” (1977) and “Star Wars: The Empire Strikes Back” (1980). “Alien” (1979) and “Aliens” (1986). And so on. But, in my opinion, most sci-fi movie sequels are worse than their predecessor. However, there are exceptions where the sequel is as good as, or even better than, the original.

Left: In “The Magnetic Monster” (1953), agents from the fictitious Office of Scientific Investigation (OSI) investigate strange magnetic phenomena. A rogue scientist has created a new element that feeds on energy. It doubles its mass every 11 hours. If the menace can’t be stopped, it will eventually grow so large that it will throw Earth out of orbit. My grade = B-

Right: In “Riders to the Stars” (1954), the fictitious Office of Scientific Investigation (OSI) recruits men for a top secret project. The goal: Take a manned spacecraft to capture a meteor to learn why meteors do not disintegrate like steel does, when under constant bombardment from cosmic radiation. My grade = B.


Demo program. Replace “lt” (less than) in the accuracy() function with the Boolean less-than operator symbol. (My blog editor chokes on symbols).

# svr_sgd_scratch_diabetes.py
# SGD training

import numpy as np
from sklearn.svm import SVR  # for validation

# -----------------------------------------------------------

np.set_printoptions(precision=4, suppress=True,
  floatmode='fixed', linewidth=60)

# -----------------------------------------------------------

def accuracy(model, data_X, data_y, pct_close):
  if data_X.size == 0: return 0.0
  n = len(data_X)
  n_correct = 0; n_wrong = 0
  for i in range(n):
    x = data_X[i].reshape(1,-1)
    y = data_y[i]
    pred_y = model.predict(x)[0]
    if np.abs(y - pred_y) "lt" np.abs(y * pct_close):
      n_correct += 1
    else: 
      n_wrong += 1
  return n_correct / (n_correct + n_wrong)

# -----------------------------------------------------------

def mse(model, data_X, data_y):
  if data_X.size == 0: return -1.0
  n = len(data_X)
  sum = 0.0
  for i in range(n):
    x = data_X[i].reshape(1,-1)
    y = data_y[i]
    pred_y = model.predict(x)[0]
    diff = pred_y - y
    sum += diff * diff
  return sum / n

# ===========================================================

class KernelSVR:
  def __init__(self, gamma=0.5, epsilon=0.001, C=1.0,
    lr=0.01, max_epochs=100, tol=1.0e-3, seed=0):
    self.gamma = gamma
    self.epsilon = epsilon
    self.C = C  
    self.lr = lr
    self.max_epochs = max_epochs
    self.tol = tol      # for in-epsilon tube
    self.alpha = None   # model weights
    self.b = 0.0
    self.supp_X = None  # pruned train X
    self.supp_y = None
    self.rnd = np.random.RandomState(seed)

  # ---------------------------------------------------------

  def kernel_matrix(self, X):
    n = len(X)
    result = np.zeros((n,n))
    for i in range(0,n):
      for j in range(i,n):
        z = self.rbf(X[i], X[j])
        result[i,j] = z
        result[j,i] = z
    return result

  # ---------------------------------------------------------

  def fit(self, X, y):
    n, dim = X.shape
    self.supp_X = X  # by ref
    self.supp_y = y
    
    self.alpha = np.zeros(n)
    lo = -0.01; hi = 0.01
    for i in range(n):
      self.alpha[i] = (hi - lo) * self.rnd.random() + lo
    self.b = 0.0
    
    # precompute kernel matrix and set regularization
    K = self.kernel_matrix(X)
    lamda = 1.0 / self.C         # do not allow 0
    freq = max(1, self.max_epochs // 5)  # progress messages; avoid mod by 0
    indices = np.arange(n)

    for epoch in range(self.max_epochs):
      self.rnd.shuffle(indices)
      
      for i in range(len(indices)):
        idx = indices[i]
        pred_y = np.dot(self.alpha, K[:, idx]) + self.b
        error = pred_y - y[idx]
        
        inside_tube = False
        if error "gt" self.epsilon:
          grad_loss = 1.0
        elif error "lt" -self.epsilon:
          grad_loss = -1.0
        else:
          grad_loss = 0.0
          inside_tube = True
          
        # local kernel regularization gradient
        grad_reg = self.alpha[idx] * K[idx, idx]
        
        # decoupled update to the active index
        self.alpha[idx] -= self.lr * \
          (lamda * grad_reg + grad_loss)
        self.b -= self.lr * grad_loss

        if inside_tube == True and \
          abs(self.alpha[idx]) "lt" self.tol:
          self.alpha[idx] = 0.0  # force small wt to zero
        
        # in-loop clip to bound updates mid-flight
        if self.alpha[idx] "lt" -self.C:
          self.alpha[idx] = -self.C
        elif self.alpha[idx] "gt" self.C:
          self.alpha[idx] = self.C

      if epoch % freq == 0:
        m = mse(self, X, y)
        print("epoch = %4d  |  MSE = %0.4f " % (epoch,m))
        pass

    # final global clip to all alphas
    self.alpha = np.clip(self.alpha, -self.C, self.C)

    # prune: store only explicit support vectors
    sv_mask = (np.abs(self.alpha) > 1.0e-5)
    self.supp_X = X[sv_mask]
    self.supp_y = y[sv_mask]
    self.alpha = self.alpha[sv_mask]

    return  # all done

  # ---------------------------------------------------------

  def rbf(self, v1, v2):
    sum = 0.0
    for i in range(len(v1)):
      sum += (v1[i] - v2[i]) * (v1[i] - v2[i])
    return np.exp(-1 * self.gamma * sum)

  def predict_one(self, x):
    # x is a vector
    n = len(self.supp_X)
    sum = 0.0
    for i in range(n):
      xx = self.supp_X[i]
      k = self.rbf(x, xx)
      sum += self.alpha[i] * k
    result = sum + self.b
    return result

  def predict(self, X):
    # X is a matrix
    n = len(X)
    result = np.zeros(n)
    for i in range(n):
      result[i] = self.predict_one(X[i])
    return result

  # ---------------------------------------------------------

  def r2_score(self, data_X, data_y):
    # coefficient of determination == scikit score()
    ss_res = 0.0; ss_tot = 0.0
    n = len(data_X)
    mean_y = np.mean(data_y)
    for i in range(n):
      x = data_X[i].reshape(1,-1)
      y = data_y[i]
      pred_y = self.predict(x)[0]
      ss_res += (y - pred_y) * (y - pred_y)
      ss_tot += (y - mean_y) * (y - mean_y)
    result = 1.0 - (ss_res / ss_tot)
    return result

# ===========================================================

def main():
  print("\nBegin scratch SVR using SGD on Diabetes Dataset ")

  print("\nLoading diabetes train (342), test (100) data ")
  train_file = ".\\Data\\diabetes_norm_train_342.txt"

  cols_X = [0,1,2,3,4,5,6,7,8,9]  
  col_y = 10  # target; cols 4 5 6 7 8 are predicted much better
  train_X = np.loadtxt(train_file, comments="#",
    usecols=cols_X, delimiter=",",  dtype=np.float64)
  train_y = np.loadtxt(train_file, comments="#", usecols=col_y,
    delimiter=",", dtype=np.float64)

  test_file = ".\\Data\\diabetes_norm_test_100.txt"
  test_X = np.loadtxt(test_file, comments="#",
    usecols=cols_X, delimiter=",",  dtype=np.float64)
  test_y = np.loadtxt(test_file, comments="#", usecols=col_y,
    delimiter=",",  dtype=np.float64)
  print("Done ")

  # alternative normalization and split
  # from sklearn.datasets import load_diabetes
  # from sklearn.model_selection import train_test_split
  # X, y = load_diabetes(return_X_y=True, scaled=True)
  # train_X, test_X, train_y, test_y = \
  #   train_test_split(X, y, random_state=0)  # 25% test

  print("\nFirst three X predictors: ")
  for i in range(3):
    print(train_X[i])
  print("\nFirst three y targets: ")
  for i in range(3):
    print("%0.4f" % train_y[i])

  # create and train model
  print("\nCreating scratch Python SVR model ")
  gamma = 0.30
  # epsilon = 0.001
  epsilon = 0.05 # 299 -- 
  C = 1.0
  lr = 0.01
  max_epochs = 10000
  tol = 0.00001

  print("Setting gamma = %0.4f " % gamma)
  print("Setting epsilon = %0.4f " % epsilon)
  print("Setting C = %0.4f " % C)
  print("Setting lrn_rate = %0.4f " % lr)
  print("Setting max_epochs = " + str(max_epochs))
  print("Setting KKT tol = %0.6f " % tol)

  print("\nTraining SVR model using SGD ")
  model = KernelSVR(gamma=gamma, epsilon=epsilon, C=C, lr=lr,
    max_epochs=max_epochs, tol=tol)
  model.fit(train_X, train_y)
  print("Done ")

  n_supp = len(model.alpha)
  print("\nNumber support vectors = " + str(n_supp))
    
  acc_train = accuracy(model, train_X, train_y, 0.10)
  print("\nTrain accuracy (0.10) = %0.4f" % acc_train)
  acc_test = accuracy(model, test_X, test_y, 0.10)
  print("Test accuracy (0.10) = %0.4f" % acc_test)

  mse_train = mse(model, train_X, train_y)
  mse_test = mse(model, test_X, test_y)
  print("\nMSE train = %0.4f " % mse_train)
  print("MSE test = %0.4f " % mse_test)

  print("\nEnd demo ")

# -----------------------------------------------------------

if __name__ == "__main__":
  main()
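
The kernel_matrix() method in the demo fills the matrix with nested Python loops, which is easy to follow but can be slow for larger datasets. A minimal vectorized sketch of the same RBF computation is shown here, under the assumption that X is a 2D NumPy array. The function names rbf_kernel_matrix and rbf_kernel_matrix_loops are hypothetical and not part of the demo; the loop version is included only to check that the two approaches agree.

```python
import numpy as np

def rbf_kernel_matrix(X, gamma):
  # hypothetical vectorized helper, not part of the demo
  # squared distances via ||a||^2 + ||b||^2 - 2 * (a . b)
  sq_norms = np.sum(X * X, axis=1)
  dist_sq = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
  dist_sq = np.maximum(dist_sq, 0.0)  # guard tiny negative round-off
  return np.exp(-gamma * dist_sq)

def rbf_kernel_matrix_loops(X, gamma):
  # loop version with the same logic as the demo, for comparison
  n = len(X)
  result = np.zeros((n, n))
  for i in range(n):
    for j in range(i, n):
      z = np.exp(-gamma * np.sum((X[i] - X[j]) ** 2))
      result[i, j] = z
      result[j, i] = z
  return result

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K_fast = rbf_kernel_matrix(X, gamma=0.5)
K_slow = rbf_kernel_matrix_loops(X, gamma=0.5)
print(np.allclose(K_fast, K_slow))
```

The vectorized form trades a little readability for speed, and the loop form is arguably better for explaining what the kernel matrix holds.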

Training data:


# diabetes_norm_train_342.txt
# cols [0] to [9] predictors. col [10] target
# norm division constants:
# 100, -1, 100, 1000, 1000, 1000, 100, 10, 10, 1000, 1000
#
0.5900, 1.0000, 0.3210, 0.1010, 0.1570, 0.0932, 0.3800, 0.4000, 0.4860, 0.0870, 0.1510
0.4800, 0.0000, 0.2160, 0.0870, 0.1830, 0.1032, 0.7000, 0.3000, 0.3892, 0.0690, 0.0750
0.7200, 1.0000, 0.3050, 0.0930, 0.1560, 0.0936, 0.4100, 0.4000, 0.4673, 0.0850, 0.1410
0.2400, 0.0000, 0.2530, 0.0840, 0.1980, 0.1314, 0.4000, 0.5000, 0.4890, 0.0890, 0.2060
0.5000, 0.0000, 0.2300, 0.1010, 0.1920, 0.1254, 0.5200, 0.4000, 0.4291, 0.0800, 0.1350
0.2300, 0.0000, 0.2260, 0.0890, 0.1390, 0.0648, 0.6100, 0.2000, 0.4190, 0.0680, 0.0970
0.3600, 1.0000, 0.2200, 0.0900, 0.1600, 0.0996, 0.5000, 0.3000, 0.3951, 0.0820, 0.1380
0.6600, 1.0000, 0.2620, 0.1140, 0.2550, 0.1850, 0.5600, 0.4550, 0.4249, 0.0920, 0.0630
0.6000, 1.0000, 0.3210, 0.0830, 0.1790, 0.1194, 0.4200, 0.4000, 0.4477, 0.0940, 0.1100
0.2900, 0.0000, 0.3000, 0.0850, 0.1800, 0.0934, 0.4300, 0.4000, 0.5385, 0.0880, 0.3100
0.2200, 0.0000, 0.1860, 0.0970, 0.1140, 0.0576, 0.4600, 0.2000, 0.3951, 0.0830, 0.1010
0.5600, 1.0000, 0.2800, 0.0850, 0.1840, 0.1448, 0.3200, 0.6000, 0.3584, 0.0770, 0.0690
0.5300, 0.0000, 0.2370, 0.0920, 0.1860, 0.1092, 0.6200, 0.3000, 0.4304, 0.0810, 0.1790
0.5000, 1.0000, 0.2620, 0.0970, 0.1860, 0.1054, 0.4900, 0.4000, 0.5063, 0.0880, 0.1850
0.6100, 0.0000, 0.2400, 0.0910, 0.2020, 0.1154, 0.7200, 0.3000, 0.4291, 0.0730, 0.1180
0.3400, 1.0000, 0.2470, 0.1180, 0.2540, 0.1842, 0.3900, 0.7000, 0.5037, 0.0810, 0.1710
0.4700, 0.0000, 0.3030, 0.1090, 0.2070, 0.1002, 0.7000, 0.3000, 0.5215, 0.0980, 0.1660
0.6800, 1.0000, 0.2750, 0.1110, 0.2140, 0.1470, 0.3900, 0.5000, 0.4942, 0.0910, 0.1440
0.3800, 0.0000, 0.2540, 0.0840, 0.1620, 0.1030, 0.4200, 0.4000, 0.4443, 0.0870, 0.0970
0.4100, 0.0000, 0.2470, 0.0830, 0.1870, 0.1082, 0.6000, 0.3000, 0.4543, 0.0780, 0.1680
0.3500, 0.0000, 0.2110, 0.0820, 0.1560, 0.0878, 0.5000, 0.3000, 0.4511, 0.0950, 0.0680
0.2500, 1.0000, 0.2430, 0.0950, 0.1620, 0.0986, 0.5400, 0.3000, 0.3850, 0.0870, 0.0490
0.2500, 0.0000, 0.2600, 0.0920, 0.1870, 0.1204, 0.5600, 0.3000, 0.3970, 0.0880, 0.0680
0.6100, 1.0000, 0.3200, 0.1037, 0.2100, 0.0852, 0.3500, 0.6000, 0.6107, 0.1240, 0.2450
0.3100, 0.0000, 0.2970, 0.0880, 0.1670, 0.1034, 0.4800, 0.4000, 0.4357, 0.0780, 0.1840
0.3000, 1.0000, 0.2520, 0.0830, 0.1780, 0.1184, 0.3400, 0.5000, 0.4852, 0.0830, 0.2020
0.1900, 0.0000, 0.1920, 0.0870, 0.1240, 0.0540, 0.5700, 0.2000, 0.4174, 0.0900, 0.1370
0.4200, 0.0000, 0.3190, 0.0830, 0.1580, 0.0876, 0.5300, 0.3000, 0.4466, 0.1010, 0.0850
0.6300, 0.0000, 0.2440, 0.0730, 0.1600, 0.0914, 0.4800, 0.3000, 0.4635, 0.0780, 0.1310
0.6700, 1.0000, 0.2580, 0.1130, 0.1580, 0.0542, 0.6400, 0.2000, 0.5293, 0.1040, 0.2830
0.3200, 0.0000, 0.3050, 0.0890, 0.1820, 0.1106, 0.5600, 0.3000, 0.4344, 0.0890, 0.1290
0.4200, 0.0000, 0.2030, 0.0710, 0.1610, 0.0812, 0.6600, 0.2000, 0.4234, 0.0810, 0.0590
0.5800, 1.0000, 0.3800, 0.1030, 0.1500, 0.1072, 0.2200, 0.7000, 0.4644, 0.0980, 0.3410
0.5700, 0.0000, 0.2170, 0.0940, 0.1570, 0.0580, 0.8200, 0.2000, 0.4443, 0.0920, 0.0870
0.5300, 0.0000, 0.2050, 0.0780, 0.1470, 0.0842, 0.5200, 0.3000, 0.3989, 0.0750, 0.0650
0.6200, 1.0000, 0.2350, 0.0803, 0.2250, 0.1128, 0.8600, 0.2620, 0.4875, 0.0960, 0.1020
0.5200, 0.0000, 0.2850, 0.1100, 0.1950, 0.0972, 0.6000, 0.3000, 0.5242, 0.0850, 0.2650
0.4600, 0.0000, 0.2740, 0.0780, 0.1710, 0.0880, 0.5800, 0.3000, 0.4828, 0.0900, 0.2760
0.4800, 1.0000, 0.3300, 0.1230, 0.2530, 0.1636, 0.4400, 0.6000, 0.5425, 0.0970, 0.2520
0.4800, 1.0000, 0.2770, 0.0730, 0.1910, 0.1194, 0.4600, 0.4000, 0.4852, 0.0920, 0.0900
0.5000, 1.0000, 0.2560, 0.1010, 0.2290, 0.1622, 0.4300, 0.5000, 0.4779, 0.1140, 0.1000
0.2100, 0.0000, 0.2010, 0.0630, 0.1350, 0.0690, 0.5400, 0.3000, 0.4094, 0.0890, 0.0550
0.3200, 1.0000, 0.2540, 0.0903, 0.1530, 0.1004, 0.3400, 0.4500, 0.4533, 0.0830, 0.0610
0.5400, 0.0000, 0.2420, 0.0740, 0.2040, 0.1090, 0.8200, 0.2000, 0.4174, 0.1090, 0.0920
0.6100, 1.0000, 0.3270, 0.0970, 0.1770, 0.1184, 0.2900, 0.6000, 0.4997, 0.0870, 0.2590
0.5600, 1.0000, 0.2310, 0.1040, 0.1810, 0.1164, 0.4700, 0.4000, 0.4477, 0.0790, 0.0530
0.3300, 0.0000, 0.2530, 0.0850, 0.1550, 0.0850, 0.5100, 0.3000, 0.4554, 0.0700, 0.1900
0.2700, 0.0000, 0.1960, 0.0780, 0.1280, 0.0680, 0.4300, 0.3000, 0.4443, 0.0710, 0.1420
0.6700, 1.0000, 0.2250, 0.0980, 0.1910, 0.1192, 0.6100, 0.3000, 0.3989, 0.0860, 0.0750
0.3700, 1.0000, 0.2770, 0.0930, 0.1800, 0.1194, 0.3000, 0.6000, 0.5030, 0.0880, 0.1420
0.5800, 0.0000, 0.2570, 0.0990, 0.1570, 0.0916, 0.4900, 0.3000, 0.4407, 0.0930, 0.1550
0.6500, 1.0000, 0.2790, 0.1030, 0.1590, 0.0968, 0.4200, 0.4000, 0.4615, 0.0860, 0.2250
0.3400, 0.0000, 0.2550, 0.0930, 0.2180, 0.1440, 0.5700, 0.4000, 0.4443, 0.0880, 0.0590
0.4600, 0.0000, 0.2490, 0.1150, 0.1980, 0.1296, 0.5400, 0.4000, 0.4277, 0.1030, 0.1040
0.3500, 0.0000, 0.2870, 0.0970, 0.2040, 0.1268, 0.6400, 0.3000, 0.4190, 0.0930, 0.1820
0.3700, 0.0000, 0.2180, 0.0840, 0.1840, 0.1010, 0.7300, 0.3000, 0.3912, 0.0930, 0.1280
0.3700, 0.0000, 0.3020, 0.0870, 0.1660, 0.0960, 0.4000, 0.4150, 0.5011, 0.0870, 0.0520
0.4100, 0.0000, 0.2050, 0.0800, 0.1240, 0.0488, 0.6400, 0.2000, 0.4025, 0.0750, 0.0370
0.6000, 0.0000, 0.2040, 0.1050, 0.1980, 0.0784, 0.9900, 0.2000, 0.4635, 0.0790, 0.1700
0.6600, 1.0000, 0.2400, 0.0980, 0.2360, 0.1464, 0.5800, 0.4000, 0.5063, 0.0960, 0.1700
0.2900, 0.0000, 0.2600, 0.0830, 0.1410, 0.0652, 0.6400, 0.2000, 0.4078, 0.0830, 0.0610
0.3700, 1.0000, 0.2680, 0.0790, 0.1570, 0.0980, 0.2800, 0.6000, 0.5043, 0.0960, 0.1440
0.4100, 1.0000, 0.2570, 0.0830, 0.1810, 0.1066, 0.6600, 0.3000, 0.3738, 0.0850, 0.0520
0.3900, 0.0000, 0.2290, 0.0770, 0.2040, 0.1432, 0.4600, 0.4000, 0.4304, 0.0740, 0.1280
0.6700, 1.0000, 0.2400, 0.0830, 0.1430, 0.0772, 0.4900, 0.3000, 0.4431, 0.0940, 0.0710
0.3600, 1.0000, 0.2410, 0.1120, 0.1930, 0.1250, 0.3500, 0.6000, 0.5106, 0.0950, 0.1630
0.4600, 1.0000, 0.2470, 0.0850, 0.1740, 0.1232, 0.3000, 0.6000, 0.4644, 0.0960, 0.1500
0.6000, 1.0000, 0.2500, 0.0897, 0.1850, 0.1208, 0.4600, 0.4020, 0.4511, 0.0920, 0.0970
0.5900, 1.0000, 0.2360, 0.0830, 0.1650, 0.1000, 0.4700, 0.4000, 0.4500, 0.0920, 0.1600
0.5300, 0.0000, 0.2210, 0.0930, 0.1340, 0.0762, 0.4600, 0.3000, 0.4078, 0.0960, 0.1780
0.4800, 0.0000, 0.1990, 0.0910, 0.1890, 0.1096, 0.6900, 0.3000, 0.3951, 0.1010, 0.0480
0.4800, 0.0000, 0.2950, 0.1310, 0.2070, 0.1322, 0.4700, 0.4000, 0.4935, 0.1060, 0.2700
0.6600, 1.0000, 0.2600, 0.0910, 0.2640, 0.1466, 0.6500, 0.4000, 0.5568, 0.0870, 0.2020
0.5200, 1.0000, 0.2450, 0.0940, 0.2170, 0.1494, 0.4800, 0.5000, 0.4585, 0.0890, 0.1110
0.5200, 1.0000, 0.2660, 0.1110, 0.2090, 0.1264, 0.6100, 0.3000, 0.4682, 0.1090, 0.0850
0.4600, 1.0000, 0.2350, 0.0870, 0.1810, 0.1148, 0.4400, 0.4000, 0.4710, 0.0980, 0.0420
0.4000, 1.0000, 0.2900, 0.1150, 0.0970, 0.0472, 0.3500, 0.2770, 0.4304, 0.0950, 0.1700
0.2200, 0.0000, 0.2300, 0.0730, 0.1610, 0.0978, 0.5400, 0.3000, 0.3829, 0.0910, 0.2000
0.5000, 0.0000, 0.2100, 0.0880, 0.1400, 0.0718, 0.3500, 0.4000, 0.5112, 0.0710, 0.2520
0.2000, 0.0000, 0.2290, 0.0870, 0.1910, 0.1282, 0.5300, 0.4000, 0.3892, 0.0850, 0.1130
0.6800, 0.0000, 0.2750, 0.1070, 0.2410, 0.1496, 0.6400, 0.4000, 0.4920, 0.0900, 0.1430
0.5200, 1.0000, 0.2430, 0.0860, 0.1970, 0.1336, 0.4400, 0.5000, 0.4575, 0.0910, 0.0510
0.4400, 0.0000, 0.2310, 0.0870, 0.2130, 0.1264, 0.7700, 0.3000, 0.3871, 0.0720, 0.0520
0.3800, 0.0000, 0.2730, 0.0810, 0.1460, 0.0816, 0.4700, 0.3000, 0.4466, 0.0810, 0.2100
0.4900, 0.0000, 0.2270, 0.0653, 0.1680, 0.0962, 0.6200, 0.2710, 0.3892, 0.0600, 0.0650
0.6100, 0.0000, 0.3300, 0.0950, 0.1820, 0.1148, 0.5400, 0.3000, 0.4190, 0.0740, 0.1410
0.2900, 1.0000, 0.1940, 0.0830, 0.1520, 0.1058, 0.3900, 0.4000, 0.3584, 0.0830, 0.0550
0.6100, 0.0000, 0.2580, 0.0980, 0.2350, 0.1258, 0.7600, 0.3000, 0.5112, 0.0820, 0.1340
0.3400, 1.0000, 0.2260, 0.0750, 0.1660, 0.0918, 0.6000, 0.3000, 0.4263, 0.1080, 0.0420
0.3600, 0.0000, 0.2190, 0.0890, 0.1890, 0.1052, 0.6800, 0.3000, 0.4369, 0.0960, 0.1110
0.5200, 0.0000, 0.2400, 0.0830, 0.1670, 0.0866, 0.7100, 0.2000, 0.3850, 0.0940, 0.0980
0.6100, 0.0000, 0.3120, 0.0790, 0.2350, 0.1568, 0.4700, 0.5000, 0.5050, 0.0960, 0.1640
0.4300, 0.0000, 0.2680, 0.1230, 0.1930, 0.1022, 0.6700, 0.3000, 0.4779, 0.0940, 0.0480
0.3500, 0.0000, 0.2040, 0.0650, 0.1870, 0.1056, 0.6700, 0.2790, 0.4277, 0.0780, 0.0960
0.2700, 0.0000, 0.2480, 0.0910, 0.1890, 0.1068, 0.6900, 0.3000, 0.4190, 0.0690, 0.0900
0.2900, 0.0000, 0.2100, 0.0710, 0.1560, 0.0970, 0.3800, 0.4000, 0.4654, 0.0900, 0.1620
0.6400, 1.0000, 0.2730, 0.1090, 0.1860, 0.1076, 0.3800, 0.5000, 0.5308, 0.0990, 0.1500
0.4100, 0.0000, 0.3460, 0.0873, 0.2050, 0.1426, 0.4100, 0.5000, 0.4673, 0.1100, 0.2790
0.4900, 1.0000, 0.2590, 0.0910, 0.1780, 0.1066, 0.5200, 0.3000, 0.4575, 0.0750, 0.0920
0.4800, 0.0000, 0.2040, 0.0980, 0.2090, 0.1394, 0.4600, 0.5000, 0.4771, 0.0780, 0.0830
0.5300, 0.0000, 0.2800, 0.0880, 0.2330, 0.1438, 0.5800, 0.4000, 0.5050, 0.0910, 0.1280
0.5300, 1.0000, 0.2220, 0.1130, 0.1970, 0.1152, 0.6700, 0.3000, 0.4304, 0.1000, 0.1020
0.2300, 0.0000, 0.2900, 0.0900, 0.2160, 0.1314, 0.6500, 0.3000, 0.4585, 0.0910, 0.3020
0.6500, 1.0000, 0.3020, 0.0980, 0.2190, 0.1606, 0.4000, 0.5000, 0.4522, 0.0840, 0.1980
0.4100, 0.0000, 0.3240, 0.0940, 0.1710, 0.1044, 0.5600, 0.3000, 0.3970, 0.0760, 0.0950
0.5500, 1.0000, 0.2340, 0.0830, 0.1660, 0.1016, 0.4600, 0.4000, 0.4522, 0.0960, 0.0530
0.2200, 0.0000, 0.1930, 0.0820, 0.1560, 0.0932, 0.5200, 0.3000, 0.3989, 0.0710, 0.1340
0.5600, 0.0000, 0.3100, 0.0787, 0.1870, 0.1414, 0.3400, 0.5500, 0.4060, 0.0900, 0.1440
0.5400, 1.0000, 0.3060, 0.1033, 0.1440, 0.0798, 0.3000, 0.4800, 0.5142, 0.1010, 0.2320
0.5900, 1.0000, 0.2550, 0.0953, 0.1900, 0.1394, 0.3500, 0.5430, 0.4357, 0.1170, 0.0810
0.6000, 1.0000, 0.2340, 0.0880, 0.1530, 0.0898, 0.5800, 0.3000, 0.3258, 0.0950, 0.1040
0.5400, 0.0000, 0.2680, 0.0870, 0.2060, 0.1220, 0.6800, 0.3000, 0.4382, 0.0800, 0.0590
0.2500, 0.0000, 0.2830, 0.0870, 0.1930, 0.1280, 0.4900, 0.4000, 0.4382, 0.0920, 0.2460
0.5400, 1.0000, 0.2770, 0.1130, 0.2000, 0.1284, 0.3700, 0.5000, 0.5153, 0.1130, 0.2970
0.5500, 0.0000, 0.3660, 0.1130, 0.1990, 0.0944, 0.4300, 0.4630, 0.5730, 0.0970, 0.2580
0.4000, 1.0000, 0.2650, 0.0930, 0.2360, 0.1470, 0.3700, 0.7000, 0.5561, 0.0920, 0.2290
0.6200, 1.0000, 0.3180, 0.1150, 0.1990, 0.1286, 0.4400, 0.5000, 0.4883, 0.0980, 0.2750
0.6500, 0.0000, 0.2440, 0.1200, 0.2220, 0.1356, 0.3700, 0.6000, 0.5509, 0.1240, 0.2810
0.3300, 1.0000, 0.2540, 0.1020, 0.2060, 0.1410, 0.3900, 0.5000, 0.4868, 0.1050, 0.1790
0.5300, 0.0000, 0.2200, 0.0940, 0.1750, 0.0880, 0.5900, 0.3000, 0.4942, 0.0980, 0.2000
0.3500, 0.0000, 0.2680, 0.0980, 0.1620, 0.1036, 0.4500, 0.4000, 0.4205, 0.0860, 0.2000
0.6600, 0.0000, 0.2800, 0.1010, 0.1950, 0.1292, 0.4000, 0.5000, 0.4860, 0.0940, 0.1730
0.6200, 1.0000, 0.3390, 0.1010, 0.2210, 0.1564, 0.3500, 0.6000, 0.4997, 0.1030, 0.1800
0.5000, 1.0000, 0.2960, 0.0943, 0.3000, 0.2424, 0.3300, 0.9090, 0.4812, 0.1090, 0.0840
0.4700, 0.0000, 0.2860, 0.0970, 0.1640, 0.0906, 0.5600, 0.3000, 0.4466, 0.0880, 0.1210
0.4700, 1.0000, 0.2560, 0.0940, 0.1650, 0.0748, 0.4000, 0.4000, 0.5526, 0.0930, 0.1610
0.2400, 0.0000, 0.2070, 0.0870, 0.1490, 0.0806, 0.6100, 0.2000, 0.3611, 0.0780, 0.0990
0.5800, 1.0000, 0.2620, 0.0910, 0.2170, 0.1242, 0.7100, 0.3000, 0.4691, 0.0680, 0.1090
0.3400, 0.0000, 0.2060, 0.0870, 0.1850, 0.1122, 0.5800, 0.3000, 0.4304, 0.0740, 0.1150
0.5100, 0.0000, 0.2790, 0.0960, 0.1960, 0.1222, 0.4200, 0.5000, 0.5069, 0.1200, 0.2680
0.3100, 1.0000, 0.3530, 0.1250, 0.1870, 0.1124, 0.4800, 0.4000, 0.4890, 0.1090, 0.2740
0.2200, 0.0000, 0.1990, 0.0750, 0.1750, 0.1086, 0.5400, 0.3000, 0.4127, 0.0720, 0.1580
0.5300, 1.0000, 0.2440, 0.0920, 0.2140, 0.1460, 0.5000, 0.4000, 0.4500, 0.0970, 0.1070
0.3700, 1.0000, 0.2140, 0.0830, 0.1280, 0.0696, 0.4900, 0.3000, 0.3850, 0.0840, 0.0830
0.2800, 0.0000, 0.3040, 0.0850, 0.1980, 0.1156, 0.6700, 0.3000, 0.4344, 0.0800, 0.1030
0.4700, 0.0000, 0.3160, 0.0840, 0.1540, 0.0880, 0.3000, 0.5100, 0.5199, 0.1050, 0.2720
0.2300, 0.0000, 0.1880, 0.0780, 0.1450, 0.0720, 0.6300, 0.2000, 0.3912, 0.0860, 0.0850
0.5000, 0.0000, 0.3100, 0.1230, 0.1780, 0.1050, 0.4800, 0.4000, 0.4828, 0.0880, 0.2800
0.5800, 1.0000, 0.3670, 0.1170, 0.1660, 0.0938, 0.4400, 0.4000, 0.4949, 0.1090, 0.3360
0.5500, 0.0000, 0.3210, 0.1100, 0.1640, 0.0842, 0.4200, 0.4000, 0.5242, 0.0900, 0.2810
0.6000, 1.0000, 0.2770, 0.1070, 0.1670, 0.1146, 0.3800, 0.4000, 0.4277, 0.0950, 0.1180
0.4100, 0.0000, 0.3080, 0.0810, 0.2140, 0.1520, 0.2800, 0.7600, 0.5136, 0.1230, 0.3170
0.6000, 1.0000, 0.2750, 0.1060, 0.2290, 0.1438, 0.5100, 0.4000, 0.5142, 0.0910, 0.2350
0.4000, 0.0000, 0.2690, 0.0920, 0.2030, 0.1198, 0.7000, 0.3000, 0.4190, 0.0810, 0.0600
0.5700, 1.0000, 0.3070, 0.0900, 0.2040, 0.1478, 0.3400, 0.6000, 0.4710, 0.0930, 0.1740
0.3700, 0.0000, 0.3830, 0.1130, 0.1650, 0.0946, 0.5300, 0.3000, 0.4466, 0.0790, 0.2590
0.4000, 1.0000, 0.3190, 0.0950, 0.1980, 0.1356, 0.3800, 0.5000, 0.4804, 0.0930, 0.1780
0.3300, 0.0000, 0.3500, 0.0890, 0.2000, 0.1304, 0.4200, 0.4760, 0.4927, 0.1010, 0.1280
0.3200, 1.0000, 0.2780, 0.0890, 0.2160, 0.1462, 0.5500, 0.4000, 0.4304, 0.0910, 0.0960
0.3500, 1.0000, 0.2590, 0.0810, 0.1740, 0.1024, 0.3100, 0.6000, 0.5313, 0.0820, 0.1260
0.5500, 0.0000, 0.3290, 0.1020, 0.1640, 0.1062, 0.4100, 0.4000, 0.4431, 0.0890, 0.2880
0.4900, 0.0000, 0.2600, 0.0930, 0.1830, 0.1002, 0.6400, 0.3000, 0.4543, 0.0880, 0.0880
0.3900, 1.0000, 0.2630, 0.1150, 0.2180, 0.1582, 0.3200, 0.7000, 0.4935, 0.1090, 0.2920
0.6000, 1.0000, 0.2230, 0.1130, 0.1860, 0.1258, 0.4600, 0.4000, 0.4263, 0.0940, 0.0710
0.6700, 1.0000, 0.2830, 0.0930, 0.2040, 0.1322, 0.4900, 0.4000, 0.4736, 0.0920, 0.1970
0.4100, 1.0000, 0.3200, 0.1090, 0.2510, 0.1706, 0.4900, 0.5000, 0.5056, 0.1030, 0.1860
0.4400, 0.0000, 0.2540, 0.0950, 0.1620, 0.0926, 0.5300, 0.3000, 0.4407, 0.0830, 0.0250
0.4800, 1.0000, 0.2330, 0.0893, 0.2120, 0.1428, 0.4600, 0.4610, 0.4754, 0.0980, 0.0840
0.4500, 0.0000, 0.2030, 0.0743, 0.1900, 0.1262, 0.4900, 0.3880, 0.4304, 0.0790, 0.0960
0.4700, 0.0000, 0.3040, 0.1200, 0.1990, 0.1200, 0.4600, 0.4000, 0.5106, 0.0870, 0.1950
0.4600, 0.0000, 0.2060, 0.0730, 0.1720, 0.1070, 0.5100, 0.3000, 0.4249, 0.0800, 0.0530
0.3600, 1.0000, 0.3230, 0.1150, 0.2860, 0.1994, 0.3900, 0.7000, 0.5472, 0.1120, 0.2170
0.3400, 0.0000, 0.2920, 0.0730, 0.1720, 0.1082, 0.4900, 0.4000, 0.4304, 0.0910, 0.1720
0.5300, 1.0000, 0.3310, 0.1170, 0.1830, 0.1190, 0.4800, 0.4000, 0.4382, 0.1060, 0.1310
0.6100, 0.0000, 0.2460, 0.1010, 0.2090, 0.1068, 0.7700, 0.3000, 0.4836, 0.0880, 0.2140
0.3700, 0.0000, 0.2020, 0.0810, 0.1620, 0.0878, 0.6300, 0.3000, 0.4025, 0.0880, 0.0590
0.3300, 1.0000, 0.2080, 0.0840, 0.1250, 0.0702, 0.4600, 0.3000, 0.3784, 0.0660, 0.0700
0.6800, 0.0000, 0.3280, 0.1057, 0.2050, 0.1164, 0.4000, 0.5130, 0.5493, 0.1170, 0.2200
0.4900, 1.0000, 0.3190, 0.0940, 0.2340, 0.1558, 0.3400, 0.7000, 0.5398, 0.1220, 0.2680
0.4800, 0.0000, 0.2390, 0.1090, 0.2320, 0.1052, 0.3700, 0.6000, 0.6107, 0.0960, 0.1520
0.5500, 1.0000, 0.2450, 0.0840, 0.1790, 0.1058, 0.6600, 0.3000, 0.3584, 0.0870, 0.0470
0.4300, 0.0000, 0.2210, 0.0660, 0.1340, 0.0772, 0.4500, 0.3000, 0.4078, 0.0800, 0.0740
0.6000, 1.0000, 0.3300, 0.0970, 0.2170, 0.1256, 0.4500, 0.5000, 0.5447, 0.1120, 0.2950
0.3100, 1.0000, 0.1900, 0.0930, 0.1370, 0.0730, 0.4700, 0.3000, 0.4443, 0.0780, 0.1010
0.5300, 1.0000, 0.2730, 0.0820, 0.1190, 0.0550, 0.3900, 0.3000, 0.4828, 0.0930, 0.1510
0.6700, 0.0000, 0.2280, 0.0870, 0.1660, 0.0986, 0.5200, 0.3000, 0.4344, 0.0920, 0.1270
0.6100, 1.0000, 0.2820, 0.1060, 0.2040, 0.1320, 0.5200, 0.4000, 0.4605, 0.0960, 0.2370
0.6200, 0.0000, 0.2890, 0.0873, 0.2060, 0.1272, 0.3300, 0.6240, 0.5434, 0.0990, 0.2250
0.6000, 0.0000, 0.2560, 0.0870, 0.2070, 0.1258, 0.6900, 0.3000, 0.4111, 0.0840, 0.0810
0.4200, 0.0000, 0.2490, 0.0910, 0.2040, 0.1418, 0.3800, 0.5000, 0.4796, 0.0890, 0.1510
0.3800, 1.0000, 0.2680, 0.1050, 0.1810, 0.1192, 0.3700, 0.5000, 0.4820, 0.0910, 0.1070
0.6200, 0.0000, 0.2240, 0.0790, 0.2220, 0.1474, 0.5900, 0.4000, 0.4357, 0.0760, 0.0640
0.6100, 1.0000, 0.2690, 0.1110, 0.2360, 0.1724, 0.3900, 0.6000, 0.4812, 0.0890, 0.1380
0.6100, 1.0000, 0.2310, 0.1130, 0.1860, 0.1144, 0.4700, 0.4000, 0.4812, 0.1050, 0.1850
0.5300, 0.0000, 0.2860, 0.0880, 0.1710, 0.0988, 0.4100, 0.4000, 0.5050, 0.0990, 0.2650
0.2800, 1.0000, 0.2470, 0.0970, 0.1750, 0.0996, 0.3200, 0.5000, 0.5380, 0.0870, 0.1010
0.2600, 1.0000, 0.3030, 0.0890, 0.2180, 0.1522, 0.3100, 0.7000, 0.5159, 0.0820, 0.1370
0.3000, 0.0000, 0.2130, 0.0870, 0.1340, 0.0630, 0.6300, 0.2000, 0.3689, 0.0660, 0.1430
0.5000, 0.0000, 0.2610, 0.1090, 0.2430, 0.1606, 0.6200, 0.4000, 0.4625, 0.0890, 0.1410
0.4800, 0.0000, 0.2020, 0.0950, 0.1870, 0.1174, 0.5300, 0.4000, 0.4419, 0.0850, 0.0790
0.5100, 0.0000, 0.2520, 0.1030, 0.1760, 0.1122, 0.3700, 0.5000, 0.4898, 0.0900, 0.2920
0.4700, 1.0000, 0.2250, 0.0820, 0.1310, 0.0668, 0.4100, 0.3000, 0.4754, 0.0890, 0.1780
0.6400, 1.0000, 0.2350, 0.0970, 0.2030, 0.1290, 0.5900, 0.3000, 0.4318, 0.0770, 0.0910
0.5100, 1.0000, 0.2590, 0.0760, 0.2400, 0.1690, 0.3900, 0.6000, 0.5075, 0.0960, 0.1160
0.3000, 0.0000, 0.2090, 0.1040, 0.1520, 0.0838, 0.4700, 0.3000, 0.4663, 0.0970, 0.0860
0.5600, 1.0000, 0.2870, 0.0990, 0.2080, 0.1464, 0.3900, 0.5000, 0.4727, 0.0970, 0.1220
0.4200, 0.0000, 0.2210, 0.0850, 0.2130, 0.1386, 0.6000, 0.4000, 0.4277, 0.0940, 0.0720
0.6200, 1.0000, 0.2670, 0.1150, 0.1830, 0.1240, 0.3500, 0.5000, 0.4788, 0.1000, 0.1290
0.3400, 0.0000, 0.3140, 0.0870, 0.1490, 0.0938, 0.4600, 0.3000, 0.3829, 0.0770, 0.1420
0.6000, 0.0000, 0.2220, 0.1047, 0.2210, 0.1054, 0.6000, 0.3680, 0.5628, 0.0930, 0.0900
0.6400, 0.0000, 0.2100, 0.0923, 0.2270, 0.1468, 0.6500, 0.3490, 0.4331, 0.1020, 0.1580
0.3900, 1.0000, 0.2120, 0.0900, 0.1820, 0.1104, 0.6000, 0.3000, 0.4060, 0.0980, 0.0390
0.7100, 1.0000, 0.2650, 0.1050, 0.2810, 0.1736, 0.5500, 0.5000, 0.5568, 0.0840, 0.1960
0.4800, 1.0000, 0.2920, 0.1100, 0.2180, 0.1516, 0.3900, 0.6000, 0.4920, 0.0980, 0.2220
0.7900, 1.0000, 0.2700, 0.1030, 0.1690, 0.1108, 0.3700, 0.5000, 0.4663, 0.1100, 0.2770
0.4000, 0.0000, 0.3070, 0.0990, 0.1770, 0.0854, 0.5000, 0.4000, 0.5338, 0.0850, 0.0990
0.4900, 1.0000, 0.2880, 0.0920, 0.2070, 0.1400, 0.4400, 0.5000, 0.4745, 0.0920, 0.1960
0.5100, 0.0000, 0.3060, 0.1030, 0.1980, 0.1066, 0.5700, 0.3000, 0.5148, 0.1000, 0.2020
0.5700, 0.0000, 0.3010, 0.1170, 0.2020, 0.1396, 0.4200, 0.5000, 0.4625, 0.1200, 0.1550
0.5900, 1.0000, 0.2470, 0.1140, 0.1520, 0.1048, 0.2900, 0.5000, 0.4511, 0.0880, 0.0770
0.5100, 0.0000, 0.2770, 0.0990, 0.2290, 0.1456, 0.6900, 0.3000, 0.4277, 0.0770, 0.1910
0.7400, 0.0000, 0.2980, 0.1010, 0.1710, 0.1048, 0.5000, 0.3000, 0.4394, 0.0860, 0.0700
0.6700, 0.0000, 0.2670, 0.1050, 0.2250, 0.1354, 0.6900, 0.3000, 0.4635, 0.0960, 0.0730
0.4900, 0.0000, 0.1980, 0.0880, 0.1880, 0.1148, 0.5700, 0.3000, 0.4394, 0.0930, 0.0490
0.5700, 0.0000, 0.2330, 0.0880, 0.1550, 0.0636, 0.7800, 0.2000, 0.4205, 0.0780, 0.0650
0.5600, 1.0000, 0.3510, 0.1230, 0.1640, 0.0950, 0.3800, 0.4000, 0.5043, 0.1170, 0.2630
0.5200, 1.0000, 0.2970, 0.1090, 0.2280, 0.1628, 0.3100, 0.8000, 0.5142, 0.1030, 0.2480
0.6900, 0.0000, 0.2930, 0.1240, 0.2230, 0.1390, 0.5400, 0.4000, 0.5011, 0.1020, 0.2960
0.3700, 0.0000, 0.2030, 0.0830, 0.1850, 0.1246, 0.3800, 0.5000, 0.4719, 0.0880, 0.2140
0.2400, 0.0000, 0.2250, 0.0890, 0.1410, 0.0680, 0.5200, 0.3000, 0.4654, 0.0840, 0.1850
0.5500, 1.0000, 0.2270, 0.0930, 0.1540, 0.0942, 0.5300, 0.3000, 0.3526, 0.0750, 0.0780
0.3600, 0.0000, 0.2280, 0.0870, 0.1780, 0.1160, 0.4100, 0.4000, 0.4654, 0.0820, 0.0930
0.4200, 1.0000, 0.2400, 0.1070, 0.1500, 0.0850, 0.4400, 0.3000, 0.4654, 0.0960, 0.2520
0.2100, 0.0000, 0.2420, 0.0760, 0.1470, 0.0770, 0.5300, 0.3000, 0.4443, 0.0790, 0.1500
0.4100, 0.0000, 0.2020, 0.0620, 0.1530, 0.0890, 0.5000, 0.3000, 0.4249, 0.0890, 0.0770
0.5700, 1.0000, 0.2940, 0.1090, 0.1600, 0.0876, 0.3100, 0.5000, 0.5333, 0.0920, 0.2080
0.2000, 1.0000, 0.2210, 0.0870, 0.1710, 0.0996, 0.5800, 0.3000, 0.4205, 0.0780, 0.0770
0.6700, 1.0000, 0.2360, 0.1113, 0.1890, 0.1054, 0.7000, 0.2700, 0.4220, 0.0930, 0.1080
0.3400, 0.0000, 0.2520, 0.0770, 0.1890, 0.1206, 0.5300, 0.4000, 0.4344, 0.0790, 0.1600
0.4100, 1.0000, 0.2490, 0.0860, 0.1920, 0.1150, 0.6100, 0.3000, 0.4382, 0.0940, 0.0530
0.3800, 1.0000, 0.3300, 0.0780, 0.3010, 0.2150, 0.5000, 0.6020, 0.5193, 0.1080, 0.2200
0.5100, 0.0000, 0.2350, 0.1010, 0.1950, 0.1210, 0.5100, 0.4000, 0.4745, 0.0940, 0.1540
0.5200, 1.0000, 0.2640, 0.0913, 0.2180, 0.1520, 0.3900, 0.5590, 0.4905, 0.0990, 0.2590
0.6700, 0.0000, 0.2980, 0.0800, 0.1720, 0.0934, 0.6300, 0.3000, 0.4357, 0.0820, 0.0900
0.6100, 0.0000, 0.3000, 0.1080, 0.1940, 0.1000, 0.5200, 0.3730, 0.5347, 0.1050, 0.2460
0.6700, 1.0000, 0.2500, 0.1117, 0.1460, 0.0934, 0.3300, 0.4420, 0.4585, 0.1030, 0.1240
0.5600, 0.0000, 0.2700, 0.1050, 0.2470, 0.1606, 0.5400, 0.5000, 0.5088, 0.0940, 0.0670
0.6400, 0.0000, 0.2000, 0.0747, 0.1890, 0.1148, 0.6200, 0.3050, 0.4111, 0.0910, 0.0720
0.5800, 1.0000, 0.2550, 0.1120, 0.1630, 0.1106, 0.2900, 0.6000, 0.4762, 0.0860, 0.2570
0.5500, 0.0000, 0.2820, 0.0910, 0.2500, 0.1402, 0.6700, 0.4000, 0.5366, 0.1030, 0.2620
0.6200, 1.0000, 0.3330, 0.1140, 0.1820, 0.1140, 0.3800, 0.5000, 0.5011, 0.0960, 0.2750
0.5700, 1.0000, 0.2560, 0.0960, 0.2000, 0.1330, 0.5200, 0.3850, 0.4318, 0.1050, 0.1770
0.2000, 1.0000, 0.2420, 0.0880, 0.1260, 0.0722, 0.4500, 0.3000, 0.3784, 0.0740, 0.0710
0.5300, 1.0000, 0.2210, 0.0980, 0.1650, 0.1052, 0.4700, 0.4000, 0.4159, 0.0810, 0.0470
0.3200, 1.0000, 0.3140, 0.0890, 0.1530, 0.0842, 0.5600, 0.3000, 0.4159, 0.0900, 0.1870
0.4100, 0.0000, 0.2310, 0.0860, 0.1480, 0.0780, 0.5800, 0.3000, 0.4094, 0.0600, 0.1250
0.6000, 0.0000, 0.2340, 0.0767, 0.2470, 0.1480, 0.6500, 0.3800, 0.5136, 0.0770, 0.0780
0.2600, 0.0000, 0.1880, 0.0830, 0.1910, 0.1036, 0.6900, 0.3000, 0.4522, 0.0690, 0.0510
0.3700, 0.0000, 0.3080, 0.1120, 0.2820, 0.1972, 0.4300, 0.7000, 0.5342, 0.1010, 0.2580
0.4500, 0.0000, 0.3200, 0.1100, 0.2240, 0.1342, 0.4500, 0.5000, 0.5412, 0.0930, 0.2150
0.6700, 0.0000, 0.3160, 0.1160, 0.1790, 0.0904, 0.4100, 0.4000, 0.5472, 0.1000, 0.3030
0.3400, 1.0000, 0.3550, 0.1200, 0.2330, 0.1466, 0.3400, 0.7000, 0.5568, 0.1010, 0.2430
0.5000, 0.0000, 0.3190, 0.0783, 0.2070, 0.1492, 0.3800, 0.5450, 0.4595, 0.0840, 0.0910
0.7100, 0.0000, 0.2950, 0.0970, 0.2270, 0.1516, 0.4500, 0.5000, 0.5024, 0.1080, 0.1500
0.5700, 1.0000, 0.3160, 0.1170, 0.2250, 0.1076, 0.4000, 0.6000, 0.5958, 0.1130, 0.3100
0.4900, 0.0000, 0.2030, 0.0930, 0.1840, 0.1030, 0.6100, 0.3000, 0.4605, 0.0930, 0.1530
0.3500, 0.0000, 0.4130, 0.0810, 0.1680, 0.1028, 0.3700, 0.5000, 0.4949, 0.0940, 0.3460
0.4100, 1.0000, 0.2120, 0.1020, 0.1840, 0.1004, 0.6400, 0.3000, 0.4585, 0.0790, 0.0630
0.7000, 1.0000, 0.2410, 0.0823, 0.1940, 0.1492, 0.3100, 0.6260, 0.4234, 0.1050, 0.0890
0.5200, 0.0000, 0.2300, 0.1070, 0.1790, 0.1237, 0.4250, 0.4210, 0.4159, 0.0930, 0.0500
0.6000, 0.0000, 0.2560, 0.0780, 0.1950, 0.0954, 0.9100, 0.2000, 0.3761, 0.0870, 0.0390
0.6200, 0.0000, 0.2250, 0.1250, 0.2150, 0.0990, 0.9800, 0.2000, 0.4500, 0.0950, 0.1030
0.4400, 1.0000, 0.3820, 0.1230, 0.2010, 0.1266, 0.4400, 0.5000, 0.5024, 0.0920, 0.3080
0.2800, 1.0000, 0.1920, 0.0810, 0.1550, 0.0946, 0.5100, 0.3000, 0.3850, 0.0870, 0.1160
0.5800, 1.0000, 0.2900, 0.0850, 0.1560, 0.1092, 0.3600, 0.4000, 0.3989, 0.0860, 0.1450
0.3900, 1.0000, 0.2400, 0.0897, 0.1900, 0.1136, 0.5200, 0.3650, 0.4804, 0.1010, 0.0740
0.3400, 1.0000, 0.2060, 0.0980, 0.1830, 0.0920, 0.8300, 0.2000, 0.3689, 0.0920, 0.0450
0.6500, 0.0000, 0.2630, 0.0700, 0.2440, 0.1662, 0.5100, 0.5000, 0.4898, 0.0980, 0.1150
0.6600, 1.0000, 0.3460, 0.1150, 0.2040, 0.1394, 0.3600, 0.6000, 0.4963, 0.1090, 0.2640
0.5100, 0.0000, 0.2340, 0.0870, 0.2200, 0.1088, 0.9300, 0.2000, 0.4511, 0.0820, 0.0870
0.5000, 1.0000, 0.2920, 0.1190, 0.1620, 0.0852, 0.5400, 0.3000, 0.4736, 0.0950, 0.2020
0.5900, 1.0000, 0.2720, 0.1070, 0.1580, 0.1020, 0.3900, 0.4000, 0.4443, 0.0930, 0.1270
0.5200, 0.0000, 0.2700, 0.0783, 0.1340, 0.0730, 0.4400, 0.3050, 0.4443, 0.0690, 0.1820
0.6900, 1.0000, 0.2450, 0.1080, 0.2430, 0.1364, 0.4000, 0.6000, 0.5808, 0.1000, 0.2410
0.5300, 0.0000, 0.2410, 0.1050, 0.1840, 0.1134, 0.4600, 0.4000, 0.4812, 0.0950, 0.0660
0.4700, 1.0000, 0.2530, 0.0980, 0.1730, 0.1056, 0.4400, 0.4000, 0.4762, 0.1080, 0.0940
0.5200, 0.0000, 0.2880, 0.1130, 0.2800, 0.1740, 0.6700, 0.4000, 0.5273, 0.0860, 0.2830
0.3900, 0.0000, 0.2090, 0.0950, 0.1500, 0.0656, 0.6800, 0.2000, 0.4407, 0.0950, 0.0640
0.6700, 1.0000, 0.2300, 0.0700, 0.1840, 0.1280, 0.3500, 0.5000, 0.4654, 0.0990, 0.1020
0.5900, 1.0000, 0.2410, 0.0960, 0.1700, 0.0986, 0.5400, 0.3000, 0.4466, 0.0850, 0.2000
0.5100, 1.0000, 0.2810, 0.1060, 0.2020, 0.1222, 0.5500, 0.4000, 0.4820, 0.0870, 0.2650
0.2300, 1.0000, 0.1800, 0.0780, 0.1710, 0.0960, 0.4800, 0.4000, 0.4905, 0.0920, 0.0940
0.6800, 0.0000, 0.2590, 0.0930, 0.2530, 0.1812, 0.5300, 0.5000, 0.4543, 0.0980, 0.2300
0.4400, 0.0000, 0.2150, 0.0850, 0.1570, 0.0922, 0.5500, 0.3000, 0.3892, 0.0840, 0.1810
0.6000, 1.0000, 0.2430, 0.1030, 0.1410, 0.0866, 0.3300, 0.4000, 0.4673, 0.0780, 0.1560
0.5200, 0.0000, 0.2450, 0.0900, 0.1980, 0.1290, 0.2900, 0.7000, 0.5298, 0.0860, 0.2330
0.3800, 0.0000, 0.2130, 0.0720, 0.1650, 0.0602, 0.8800, 0.2000, 0.4431, 0.0900, 0.0600
0.6100, 0.0000, 0.2580, 0.0900, 0.2800, 0.1954, 0.5500, 0.5000, 0.4997, 0.0900, 0.2190
0.6800, 1.0000, 0.2480, 0.1010, 0.2210, 0.1514, 0.6000, 0.4000, 0.3871, 0.0870, 0.0800
0.2800, 1.0000, 0.3150, 0.0830, 0.2280, 0.1494, 0.3800, 0.6000, 0.5313, 0.0830, 0.0680
0.6500, 1.0000, 0.3350, 0.1020, 0.1900, 0.1262, 0.3500, 0.5000, 0.4970, 0.1020, 0.3320
0.6900, 0.0000, 0.2810, 0.1130, 0.2340, 0.1428, 0.5200, 0.4000, 0.5278, 0.0770, 0.2480
0.5100, 0.0000, 0.2430, 0.0853, 0.1530, 0.0716, 0.7100, 0.2150, 0.3951, 0.0820, 0.0840
0.2900, 0.0000, 0.3500, 0.0983, 0.2040, 0.1426, 0.5000, 0.4080, 0.4043, 0.0910, 0.2000
0.5500, 1.0000, 0.2350, 0.0930, 0.1770, 0.1268, 0.4100, 0.4000, 0.3829, 0.0830, 0.0550
0.3400, 1.0000, 0.3000, 0.0830, 0.1850, 0.1072, 0.5300, 0.3000, 0.4820, 0.0920, 0.0850
0.6700, 0.0000, 0.2070, 0.0830, 0.1700, 0.0998, 0.5900, 0.3000, 0.4025, 0.0770, 0.0890
0.4900, 0.0000, 0.2560, 0.0760, 0.1610, 0.0998, 0.5100, 0.3000, 0.3932, 0.0780, 0.0310
0.5500, 1.0000, 0.2290, 0.0810, 0.1230, 0.0672, 0.4100, 0.3000, 0.4304, 0.0880, 0.1290
0.5900, 1.0000, 0.2510, 0.0900, 0.1630, 0.1014, 0.4600, 0.4000, 0.4357, 0.0910, 0.0830
0.5300, 0.0000, 0.3320, 0.0827, 0.1860, 0.1068, 0.4600, 0.4040, 0.5112, 0.1020, 0.2750
0.4800, 1.0000, 0.2410, 0.1100, 0.2090, 0.1346, 0.5800, 0.4000, 0.4407, 0.1000, 0.0650
0.5200, 0.0000, 0.2950, 0.1043, 0.2110, 0.1328, 0.4900, 0.4310, 0.4984, 0.0980, 0.1980
0.6900, 0.0000, 0.2960, 0.1220, 0.2310, 0.1284, 0.5600, 0.4000, 0.5451, 0.0860, 0.2360
0.6000, 1.0000, 0.2280, 0.1100, 0.2450, 0.1898, 0.3900, 0.6000, 0.4394, 0.0880, 0.2530
0.4600, 1.0000, 0.2270, 0.0830, 0.1830, 0.1258, 0.3200, 0.6000, 0.4836, 0.0750, 0.1240
0.5100, 1.0000, 0.2620, 0.1010, 0.1610, 0.0996, 0.4800, 0.3000, 0.4205, 0.0880, 0.0440
0.6700, 1.0000, 0.2350, 0.0960, 0.2070, 0.1382, 0.4200, 0.5000, 0.4898, 0.1110, 0.1720
0.4900, 0.0000, 0.2210, 0.0850, 0.1360, 0.0634, 0.6200, 0.2190, 0.3970, 0.0720, 0.1140
0.4600, 1.0000, 0.2650, 0.0940, 0.2470, 0.1602, 0.5900, 0.4000, 0.4935, 0.1110, 0.1420
0.4700, 0.0000, 0.3240, 0.1050, 0.1880, 0.1250, 0.4600, 0.4090, 0.4443, 0.0990, 0.1090
0.7500, 0.0000, 0.3010, 0.0780, 0.2220, 0.1542, 0.4400, 0.5050, 0.4779, 0.0970, 0.1800
0.2800, 0.0000, 0.2420, 0.0930, 0.1740, 0.1064, 0.5400, 0.3000, 0.4220, 0.0840, 0.1440
0.6500, 1.0000, 0.3130, 0.1100, 0.2130, 0.1280, 0.4700, 0.5000, 0.5247, 0.0910, 0.1630
0.4200, 0.0000, 0.3010, 0.0910, 0.1820, 0.1148, 0.4900, 0.4000, 0.4511, 0.0820, 0.1470
0.5100, 0.0000, 0.2450, 0.0790, 0.2120, 0.1286, 0.6500, 0.3000, 0.4522, 0.0910, 0.0970
0.5300, 1.0000, 0.2770, 0.0950, 0.1900, 0.1018, 0.4100, 0.5000, 0.5464, 0.1010, 0.2200
0.5400, 0.0000, 0.2320, 0.1107, 0.2380, 0.1628, 0.4800, 0.4960, 0.4913, 0.1080, 0.1900
0.7300, 0.0000, 0.2700, 0.1020, 0.2110, 0.1210, 0.6700, 0.3000, 0.4745, 0.0990, 0.1090
0.5400, 0.0000, 0.2680, 0.1080, 0.1760, 0.0806, 0.6700, 0.3000, 0.4956, 0.1060, 0.1910
0.4200, 0.0000, 0.2920, 0.0930, 0.2490, 0.1742, 0.4500, 0.6000, 0.5004, 0.0920, 0.1220
0.7500, 0.0000, 0.3120, 0.1177, 0.2290, 0.1388, 0.2900, 0.7900, 0.5724, 0.1060, 0.2300
0.5500, 1.0000, 0.3210, 0.1127, 0.2070, 0.0924, 0.2500, 0.8280, 0.6105, 0.1110, 0.2420
0.6800, 1.0000, 0.2570, 0.1090, 0.2330, 0.1126, 0.3500, 0.7000, 0.6057, 0.1050, 0.2480
0.5700, 0.0000, 0.2690, 0.0980, 0.2460, 0.1652, 0.3800, 0.7000, 0.5366, 0.0960, 0.2490
0.4800, 0.0000, 0.3140, 0.0753, 0.2420, 0.1516, 0.3800, 0.6370, 0.5568, 0.1030, 0.1920
0.6100, 1.0000, 0.2560, 0.0850, 0.1840, 0.1162, 0.3900, 0.5000, 0.4970, 0.0980, 0.1310
0.6900, 0.0000, 0.3700, 0.1030, 0.2070, 0.1314, 0.5500, 0.4000, 0.4635, 0.0900, 0.2370
0.3800, 0.0000, 0.3260, 0.0770, 0.1680, 0.1006, 0.4700, 0.4000, 0.4625, 0.0960, 0.0780
0.4500, 1.0000, 0.2120, 0.0940, 0.1690, 0.0968, 0.5500, 0.3000, 0.4454, 0.1020, 0.1350
0.5100, 1.0000, 0.2920, 0.1070, 0.1870, 0.1390, 0.3200, 0.6000, 0.4382, 0.0950, 0.2440
0.7100, 1.0000, 0.2400, 0.0840, 0.1380, 0.0858, 0.3900, 0.4000, 0.4190, 0.0900, 0.1990
0.5700, 0.0000, 0.3610, 0.1170, 0.1810, 0.1082, 0.3400, 0.5000, 0.5268, 0.1000, 0.2700
0.5600, 1.0000, 0.2580, 0.1030, 0.1770, 0.1144, 0.3400, 0.5000, 0.4963, 0.0990, 0.1640
0.3200, 1.0000, 0.2200, 0.0880, 0.1370, 0.0786, 0.4800, 0.3000, 0.3951, 0.0780, 0.0720
0.5000, 0.0000, 0.2190, 0.0910, 0.1900, 0.1112, 0.6700, 0.3000, 0.4078, 0.0770, 0.0960
0.4300, 0.0000, 0.3430, 0.0840, 0.2560, 0.1726, 0.3300, 0.8000, 0.5529, 0.1040, 0.3060
0.5400, 1.0000, 0.2520, 0.1150, 0.1810, 0.1200, 0.3900, 0.5000, 0.4701, 0.0920, 0.0910
0.3100, 0.0000, 0.2330, 0.0850, 0.1900, 0.1308, 0.4300, 0.4000, 0.4394, 0.0770, 0.2140
0.5600, 0.0000, 0.2570, 0.0800, 0.2440, 0.1516, 0.5900, 0.4000, 0.5118, 0.0950, 0.0950
0.4400, 0.0000, 0.2510, 0.1330, 0.1820, 0.1130, 0.5500, 0.3000, 0.4249, 0.0840, 0.2160
0.5700, 1.0000, 0.3190, 0.1110, 0.1730, 0.1162, 0.4100, 0.4000, 0.4369, 0.0870, 0.2630

Test data:


# diabetes_norm_test_100.txt
#
0.6400, 1.0000, 0.2840, 0.1110, 0.1840, 0.1270, 0.4100, 0.4000, 0.4382, 0.0970, 0.1780
0.4300, 0.0000, 0.2810, 0.1210, 0.1920, 0.1210, 0.6000, 0.3000, 0.4007, 0.0930, 0.1130
0.1900, 0.0000, 0.2530, 0.0830, 0.2250, 0.1566, 0.4600, 0.5000, 0.4719, 0.0840, 0.2000
0.7100, 1.0000, 0.2610, 0.0850, 0.2200, 0.1524, 0.4700, 0.5000, 0.4635, 0.0910, 0.1390
0.5000, 1.0000, 0.2800, 0.1040, 0.2820, 0.1968, 0.4400, 0.6000, 0.5328, 0.0950, 0.1390
0.5900, 1.0000, 0.2360, 0.0730, 0.1800, 0.1074, 0.5100, 0.4000, 0.4682, 0.0840, 0.0880
0.5700, 0.0000, 0.2450, 0.0930, 0.1860, 0.0966, 0.7100, 0.3000, 0.4522, 0.0910, 0.1480
0.4900, 1.0000, 0.2100, 0.0820, 0.1190, 0.0854, 0.2300, 0.5000, 0.3970, 0.0740, 0.0880
0.4100, 1.0000, 0.3200, 0.1260, 0.1980, 0.1042, 0.4900, 0.4000, 0.5412, 0.1240, 0.2430
0.2500, 1.0000, 0.2260, 0.0850, 0.1300, 0.0710, 0.4800, 0.3000, 0.4007, 0.0810, 0.0710
0.5200, 1.0000, 0.1970, 0.0810, 0.1520, 0.0534, 0.8200, 0.2000, 0.4419, 0.0820, 0.0770
0.3400, 0.0000, 0.2120, 0.0840, 0.2540, 0.1134, 0.5200, 0.5000, 0.6094, 0.0920, 0.1090
0.4200, 1.0000, 0.3060, 0.1010, 0.2690, 0.1722, 0.5000, 0.5000, 0.5455, 0.1060, 0.2720
0.2800, 1.0000, 0.2550, 0.0990, 0.1620, 0.1016, 0.4600, 0.4000, 0.4277, 0.0940, 0.0600
0.4700, 1.0000, 0.2330, 0.0900, 0.1950, 0.1258, 0.5400, 0.4000, 0.4331, 0.0730, 0.0540
0.3200, 1.0000, 0.3100, 0.1000, 0.1770, 0.0962, 0.4500, 0.4000, 0.5187, 0.0770, 0.2210
0.4300, 0.0000, 0.1850, 0.0870, 0.1630, 0.0936, 0.6100, 0.2670, 0.3738, 0.0800, 0.0900
0.5900, 1.0000, 0.2690, 0.1040, 0.1940, 0.1266, 0.4300, 0.5000, 0.4804, 0.1060, 0.3110
0.5300, 0.0000, 0.2830, 0.1010, 0.1790, 0.1070, 0.4800, 0.4000, 0.4788, 0.1010, 0.2810
0.6000, 0.0000, 0.2570, 0.1030, 0.1580, 0.0846, 0.6400, 0.2000, 0.3850, 0.0970, 0.1820
0.5400, 1.0000, 0.3610, 0.1150, 0.1630, 0.0984, 0.4300, 0.4000, 0.4682, 0.1010, 0.3210
0.3500, 1.0000, 0.2410, 0.0947, 0.1550, 0.0974, 0.3200, 0.4840, 0.4852, 0.0940, 0.0580
0.4900, 1.0000, 0.2580, 0.0890, 0.1820, 0.1186, 0.3900, 0.5000, 0.4804, 0.1150, 0.2620
0.5800, 0.0000, 0.2280, 0.0910, 0.1960, 0.1188, 0.4800, 0.4000, 0.4984, 0.1150, 0.2060
0.3600, 1.0000, 0.3910, 0.0900, 0.2190, 0.1358, 0.3800, 0.6000, 0.5421, 0.1030, 0.2330
0.4600, 1.0000, 0.4220, 0.0990, 0.2110, 0.1370, 0.4400, 0.5000, 0.5011, 0.0990, 0.2420
0.4400, 1.0000, 0.2660, 0.0990, 0.2050, 0.1090, 0.4300, 0.5000, 0.5580, 0.1110, 0.1230
0.4600, 0.0000, 0.2990, 0.0830, 0.1710, 0.1130, 0.3800, 0.4500, 0.4585, 0.0980, 0.1670
0.5400, 0.0000, 0.2100, 0.0780, 0.1880, 0.1074, 0.7000, 0.3000, 0.3970, 0.0730, 0.0630
0.6300, 1.0000, 0.2550, 0.1090, 0.2260, 0.1032, 0.4600, 0.5000, 0.5951, 0.0870, 0.1970
0.4100, 1.0000, 0.2420, 0.0900, 0.1990, 0.1236, 0.5700, 0.4000, 0.4522, 0.0860, 0.0710
0.2800, 0.0000, 0.2540, 0.0930, 0.1410, 0.0790, 0.4900, 0.3000, 0.4174, 0.0910, 0.1680
0.1900, 0.0000, 0.2320, 0.0750, 0.1430, 0.0704, 0.5200, 0.3000, 0.4635, 0.0720, 0.1400
0.6100, 1.0000, 0.2610, 0.1260, 0.2150, 0.1298, 0.5700, 0.4000, 0.4949, 0.0960, 0.2170
0.4800, 0.0000, 0.3270, 0.0930, 0.2760, 0.1986, 0.4300, 0.6420, 0.5148, 0.0910, 0.1210
0.5400, 1.0000, 0.2730, 0.1000, 0.2000, 0.1440, 0.3300, 0.6000, 0.4745, 0.0760, 0.2350
0.5300, 1.0000, 0.2660, 0.0930, 0.1850, 0.1224, 0.3600, 0.5000, 0.4890, 0.0820, 0.2450
0.4800, 0.0000, 0.2280, 0.1010, 0.1100, 0.0416, 0.5600, 0.2000, 0.4127, 0.0970, 0.0400
0.5300, 0.0000, 0.2880, 0.1117, 0.1450, 0.0872, 0.4600, 0.3150, 0.4078, 0.0850, 0.0520
0.2900, 1.0000, 0.1810, 0.0730, 0.1580, 0.0990, 0.4100, 0.4000, 0.4500, 0.0780, 0.1040
0.6200, 0.0000, 0.3200, 0.0880, 0.1720, 0.0690, 0.3800, 0.4000, 0.5784, 0.1000, 0.1320
0.5000, 1.0000, 0.2370, 0.0920, 0.1660, 0.0970, 0.5200, 0.3000, 0.4443, 0.0930, 0.0880
0.5800, 1.0000, 0.2360, 0.0960, 0.2570, 0.1710, 0.5900, 0.4000, 0.4905, 0.0820, 0.0690
0.5500, 1.0000, 0.2460, 0.1090, 0.1430, 0.0764, 0.5100, 0.3000, 0.4357, 0.0880, 0.2190
0.5400, 0.0000, 0.2260, 0.0900, 0.1830, 0.1042, 0.6400, 0.3000, 0.4304, 0.0920, 0.0720
0.3600, 0.0000, 0.2780, 0.0730, 0.1530, 0.1044, 0.4200, 0.4000, 0.3497, 0.0730, 0.2010
0.6300, 1.0000, 0.2410, 0.1110, 0.1840, 0.1122, 0.4400, 0.4000, 0.4935, 0.0820, 0.1100
0.4700, 1.0000, 0.2650, 0.0700, 0.1810, 0.1048, 0.6300, 0.3000, 0.4190, 0.0700, 0.0510
0.5100, 1.0000, 0.3280, 0.1120, 0.2020, 0.1006, 0.3700, 0.5000, 0.5775, 0.1090, 0.2770
0.4200, 0.0000, 0.1990, 0.0760, 0.1460, 0.0832, 0.5500, 0.3000, 0.3664, 0.0790, 0.0630
0.3700, 1.0000, 0.2360, 0.0940, 0.2050, 0.1388, 0.5300, 0.4000, 0.4190, 0.1070, 0.1180
0.2800, 0.0000, 0.2210, 0.0820, 0.1680, 0.1006, 0.5400, 0.3000, 0.4205, 0.0860, 0.0690
0.5800, 0.0000, 0.2810, 0.1110, 0.1980, 0.0806, 0.3100, 0.6000, 0.6068, 0.0930, 0.2730
0.3200, 0.0000, 0.2650, 0.0860, 0.1840, 0.1016, 0.5300, 0.4000, 0.4990, 0.0780, 0.2580
0.2500, 1.0000, 0.2350, 0.0880, 0.1430, 0.0808, 0.5500, 0.3000, 0.3584, 0.0830, 0.0430
0.6300, 0.0000, 0.2600, 0.0857, 0.1550, 0.0782, 0.4600, 0.3370, 0.5037, 0.0970, 0.1980
0.5200, 0.0000, 0.2780, 0.0850, 0.2190, 0.1360, 0.4900, 0.4000, 0.5136, 0.0750, 0.2420
0.6500, 1.0000, 0.2850, 0.1090, 0.2010, 0.1230, 0.4600, 0.4000, 0.5075, 0.0960, 0.2320
0.4200, 0.0000, 0.3060, 0.1210, 0.1760, 0.0928, 0.6900, 0.3000, 0.4263, 0.0890, 0.1750
0.5300, 0.0000, 0.2220, 0.0780, 0.1640, 0.0810, 0.7000, 0.2000, 0.4174, 0.1010, 0.0930
0.7900, 1.0000, 0.2330, 0.0880, 0.1860, 0.1284, 0.3300, 0.6000, 0.4812, 0.1020, 0.1680
0.4300, 0.0000, 0.3540, 0.0930, 0.1850, 0.1002, 0.4400, 0.4000, 0.5318, 0.1010, 0.2750
0.4400, 0.0000, 0.3140, 0.1150, 0.1650, 0.0976, 0.5200, 0.3000, 0.4344, 0.0890, 0.2930
0.6200, 1.0000, 0.3780, 0.1190, 0.1130, 0.0510, 0.3100, 0.4000, 0.5043, 0.0840, 0.2810
0.3300, 0.0000, 0.1890, 0.0700, 0.1620, 0.0918, 0.5900, 0.3000, 0.4025, 0.0580, 0.0720
0.5600, 0.0000, 0.3500, 0.0793, 0.1950, 0.1408, 0.4200, 0.4640, 0.4111, 0.0960, 0.1400
0.6600, 0.0000, 0.2170, 0.1260, 0.2120, 0.1278, 0.4500, 0.4710, 0.5278, 0.1010, 0.1890
0.3400, 1.0000, 0.2530, 0.1110, 0.2300, 0.1620, 0.3900, 0.6000, 0.4977, 0.0900, 0.1810
0.4600, 1.0000, 0.2380, 0.0970, 0.2240, 0.1392, 0.4200, 0.5000, 0.5366, 0.0810, 0.2090
0.5000, 0.0000, 0.3180, 0.0820, 0.1360, 0.0692, 0.5500, 0.2000, 0.4078, 0.0850, 0.1360
0.6900, 0.0000, 0.3430, 0.1130, 0.2000, 0.1238, 0.5400, 0.4000, 0.4710, 0.1120, 0.2610
0.3400, 0.0000, 0.2630, 0.0870, 0.1970, 0.1200, 0.6300, 0.3000, 0.4249, 0.0960, 0.1130
0.7100, 1.0000, 0.2700, 0.0933, 0.2690, 0.1902, 0.4100, 0.6560, 0.5242, 0.0930, 0.1310
0.4700, 0.0000, 0.2720, 0.0800, 0.2080, 0.1456, 0.3800, 0.6000, 0.4804, 0.0920, 0.1740
0.4100, 0.0000, 0.3380, 0.1233, 0.1870, 0.1270, 0.4500, 0.4160, 0.4318, 0.1000, 0.2570
0.3400, 0.0000, 0.3300, 0.0730, 0.1780, 0.1146, 0.5100, 0.3490, 0.4127, 0.0920, 0.0550
0.5100, 0.0000, 0.2410, 0.0870, 0.2610, 0.1756, 0.6900, 0.4000, 0.4407, 0.0930, 0.0840
0.4300, 0.0000, 0.2130, 0.0790, 0.1410, 0.0788, 0.5300, 0.3000, 0.3829, 0.0900, 0.0420
0.5500, 0.0000, 0.2300, 0.0947, 0.1900, 0.1376, 0.3800, 0.5000, 0.4277, 0.1060, 0.1460
0.5900, 1.0000, 0.2790, 0.1010, 0.2180, 0.1442, 0.3800, 0.6000, 0.5187, 0.0950, 0.2120
0.2700, 1.0000, 0.3360, 0.1100, 0.2460, 0.1566, 0.5700, 0.4000, 0.5088, 0.0890, 0.2330
0.5100, 1.0000, 0.2270, 0.1030, 0.2170, 0.1624, 0.3000, 0.7000, 0.4812, 0.0800, 0.0910
0.4900, 1.0000, 0.2740, 0.0890, 0.1770, 0.1130, 0.3700, 0.5000, 0.4905, 0.0970, 0.1110
0.2700, 0.0000, 0.2260, 0.0710, 0.1160, 0.0434, 0.5600, 0.2000, 0.4419, 0.0790, 0.1520
0.5700, 1.0000, 0.2320, 0.1073, 0.2310, 0.1594, 0.4100, 0.5630, 0.5030, 0.1120, 0.1200
0.3900, 1.0000, 0.2690, 0.0930, 0.1360, 0.0754, 0.4800, 0.3000, 0.4143, 0.0990, 0.0670
0.6200, 1.0000, 0.3460, 0.1200, 0.2150, 0.1292, 0.4300, 0.5000, 0.5366, 0.1230, 0.3100
0.3700, 0.0000, 0.2330, 0.0880, 0.2230, 0.1420, 0.6500, 0.3400, 0.4357, 0.0820, 0.0940
0.4600, 0.0000, 0.2110, 0.0800, 0.2050, 0.1444, 0.4200, 0.5000, 0.4533, 0.0870, 0.1830
0.6800, 1.0000, 0.2350, 0.1010, 0.1620, 0.0854, 0.5900, 0.3000, 0.4477, 0.0910, 0.0660
0.5100, 0.0000, 0.3150, 0.0930, 0.2310, 0.1440, 0.4900, 0.4700, 0.5252, 0.1170, 0.1730
0.4100, 0.0000, 0.2080, 0.0860, 0.2230, 0.1282, 0.8300, 0.3000, 0.4078, 0.0890, 0.0720
0.5300, 0.0000, 0.2650, 0.0970, 0.1930, 0.1224, 0.5800, 0.3000, 0.4143, 0.0990, 0.0490
0.4500, 0.0000, 0.2420, 0.0830, 0.1770, 0.1184, 0.4500, 0.4000, 0.4220, 0.0820, 0.0640
0.3300, 0.0000, 0.1950, 0.0800, 0.1710, 0.0854, 0.7500, 0.2000, 0.3970, 0.0800, 0.0480
0.6000, 1.0000, 0.2820, 0.1120, 0.1850, 0.1138, 0.4200, 0.4000, 0.4984, 0.0930, 0.1780
0.4700, 1.0000, 0.2490, 0.0750, 0.2250, 0.1660, 0.4200, 0.5000, 0.4443, 0.1020, 0.1040
0.6000, 1.0000, 0.2490, 0.0997, 0.1620, 0.1066, 0.4300, 0.3770, 0.4127, 0.0950, 0.1320
0.3600, 0.0000, 0.3000, 0.0950, 0.2010, 0.1252, 0.4200, 0.4790, 0.5130, 0.0850, 0.2200
0.3600, 0.0000, 0.1960, 0.0710, 0.2500, 0.1332, 0.9700, 0.3000, 0.4595, 0.0920, 0.0570

Using Trimmed Kernel Ridge Regression to Approximate Support Vector Regression With C#

Bottom line: I refactored a C# implementation of trimmed kernel ridge regression to approximate support vector regression. The old version had ugly repeated code in two different SGD training phases, so I created a helper method to remove the duplication.

As is often the case with interesting topics, explaining the problem takes longer than explaining the solution. So bear with me. The goal of a machine learning regression problem is to predict a single numeric value. For example, a bank might want to predict a maximum loan amount based on applicant age, sex, annual income, debt, and so on.

Common regression techniques include linear regression, quadratic regression, nearest neighbors regression, kernel ridge regression, Gaussian process regression, kernel support vector regression, neural network regression, random forest regression, and gradient boost regression. Each technique has dozens of variations, and each technique has pros and cons.

Two closely related machine learning regression techniques are kernel ridge regression (KRR) and support vector regression (SVR). Both techniques use a kernel function (usually the radial basis function, RBF) to compare two data items for similarity. Both techniques must store training data in order to make predictions. KRR must store all training items, while SVR eliminates some of the items during training, leaving just the “support vectors” that need to be stored. So, SVR uses less memory than KRR.
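
To make the kernel idea concrete, here is a minimal Python sketch of an RBF kernel and a KRR-style prediction, which is a weighted sum of the similarities between the input and every stored item. The function names, the stored items, and the weights are made up for illustration; they are not values from the demo.

```python
import math

def rbf(v1, v2, gamma):
    # RBF similarity: 1.0 for identical vectors, approaching 0.0 as they move apart
    sq_dist = sum((a - b) ** 2 for a, b in zip(v1, v2))
    return math.exp(-gamma * sq_dist)

def predict(x, stored_x, wts, gamma):
    # weighted sum of the similarities between x and every stored item
    return sum(w * rbf(x, sx, gamma) for w, sx in zip(wts, stored_x))

stored_x = [[0.0, 0.0], [1.0, 1.0]]  # made-up stored training items
wts = [0.5, -0.2]                    # made-up weights, one per stored item
print(predict([0.0, 0.0], stored_x, wts, gamma=0.30))
```

Because every stored item contributes one term to the sum, fewer stored items means a smaller model and a faster prediction, which is the motivation for keeping only support vectors.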

On the other hand, the epsilon-insensitive loss function used by SVR is not differentiable everywhere, so SVR is not normally trained using plain stochastic gradient descent, and therefore SVR does not scale well to very large datasets. So, KRR is easier to train than SVR.

Note: In addition to the kernel support vector regression technique discussed in this blog post, there is a linear support vector regression technique which is absolutely useless in practice.

One morning before work, I got the idea of combining KRR and SVR. Briefly, I train a KRR model as normal using all data. Then I identify training data items that are predicted “too well” and remove them, leaving just pseudo support vectors. The somewhat counterintuitive SVR idea is that items that are predicted too well don’t help model accuracy very much. Then I retrain a new KRR model using only the reduced training data. This gives a trimmed/sparse KRR model that approximates an SVR model.
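
The trimming step can be sketched in a few lines of Python. This is only an illustration of the idea, assuming the predictions from the preliminary model have already been computed; the function name and the data values are made up.

```python
def trim(train_x, train_y, preds, epsilon):
    # keep only the items whose absolute prediction error exceeds epsilon;
    # items predicted "too well" are dropped
    kept_x, kept_y = [], []
    for x, y, p in zip(train_x, train_y, preds):
        if abs(p - y) > epsilon:
            kept_x.append(x)
            kept_y.append(y)
    return kept_x, kept_y

# made-up data: three items plus the preliminary model's predictions
train_x = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
train_y = [0.50, 0.20, 0.80]
preds = [0.501, 0.26, 0.75]
kept_x, kept_y = trim(train_x, train_y, preds, epsilon=0.0035)
print(len(kept_x), "items kept of", len(train_x))
```

The retained items play the role of the pseudo support vectors, and a second KRR model is then trained on them alone.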

In my mind, this technique gives the advantages of KRR (ability to handle very large datasets via SGD training) and the advantages of SVR (fewer stored model items and weights than KRR). I first implemented the idea using the scikit KernelRidge module and the idea worked well. Then I implemented the idea using C#, just to see if I could. Using C#, I had to implement from scratch a lot of code that is built into Python and scikit. The first C# version worked but it was clunky, so I refactored it to make it more rational. The refactored version was still awkward because the SGD code was repeated in the preliminary training phase and in the second/final training phase, so I refactored again to remove the duplicated code.

Let me point out that the idea of trimming a KRR model to approximate an SVR model is fairly obvious, and so I’m reasonably sure other people have used the idea, but I could find no examples or discussion on the Internet.

The output of the new, re-re-factored C# demo is:

Begin kernel ridge regression trim approximation
 to SVR using C#

Loading train (200) and test (40) data
Done

First three X predictors:
  -0.1660   0.4406  -0.9998  -0.3953  -0.7065
   0.0776  -0.1616   0.3704  -0.5911   0.7562
  -0.9452   0.3409  -0.1654   0.1174  -0.7192

First three target y:
  0.4840
  0.1568
  0.8054

Creating KRR model
Setting RBF gamma = 0.30
Done

Setting lrnRate = 0.0500
Setting maxEpochs = 5000
Setting alpha decay =  0.00001
Setting auto-exit tol = 0.0010
Setting trim epsilon = 0.0035

Training model
epoch =      0 MSE = 0.0181 acc = 0.1700
Auto-exit at epoch 723
epoch =      0 MSE = 0.0377 acc = 0.1009
Auto-exit at epoch 707
Done

Final model weights:
 -0.7406  -0.4260  -0.3825 . . . 
. . .      0.3299   0.8800

Evaluating trimmed KRR model
Number model item/weights = 109

Train acc = 0.9817
Test acc = 0.9500

Train MSE = 0.0001
Test MSE = 0.0003

End KRR trim/prune demo

The demo data is synthetic. There are 200 training items and 40 test items. The demo data was generated by a neural network with random weights and biases. The data is normalized, which is strongly recommended for KRR and SVR because items are compared mathematically and you don’t want a column with huge magnitudes to dominate. Both KRR and SVR can handle non-numeric data, but that’s another story.

The key calling statements are:

double gamma = 0.30;
KRR krr = new KRR(gamma);

double lrnRate = 0.05;
int maxEpochs = 5000;
double alpha = 1.0e-5;  // wt decay regularization
double exitTol = 0.001;  // auto-exit
double epsilon = 0.0035; // trimming
// larger epsilon: fewer wts retained
// smaller epsilon: more wts retained
krr.Train(trainX, trainY, lrnRate, maxEpochs, alpha,
  exitTol, epsilon);

The large number of parameters is a significant weakness of KRR (and SVR too).
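
To show where each training parameter is used, here is a rough Python sketch of an SGD training helper with the same parameters as the C# Train() call. It is a simplified illustration, not a port of the demo: the names are hypothetical, the kernel matrix is precomputed for brevity, the tiny dataset is made up, and the trimming step is left out.

```python
import math
import random

def rbf(v1, v2, gamma):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(v1, v2)))

def train_sgd(train_x, train_y, gamma, lrn_rate, max_epochs,
              alpha, exit_tol, seed=1):
    rnd = random.Random(seed)
    n = len(train_x)
    # precompute item-to-item similarities so a prediction is a dot product
    K = [[rbf(train_x[i], train_x[j], gamma) for j in range(n)]
         for i in range(n)]
    wts = [rnd.uniform(-0.10, 0.10) for _ in range(n)]  # small random weights
    indices = list(range(n))
    epochs_run = 0
    for epoch in range(max_epochs):
        prev = wts[:]                # weights before this epoch, for auto-exit
        rnd.shuffle(indices)         # visit items in a new random order
        for idx in indices:
            pred = sum(w * k for w, k in zip(wts, K[idx]))
            wts[idx] -= lrn_rate * (pred - train_y[idx])
        epochs_run = epoch + 1
        # auto-exit: largest weight change relative to largest weight magnitude
        max_change = max(abs(p - w) for p, w in zip(prev, wts))
        max_mag = max(abs(w) for w in wts)
        if max_mag != 0.0 and max_change / max_mag < exit_tol:
            break
    # one final weight decay, instead of a small decay after every epoch
    wts = [w * (1.0 - alpha) for w in wts]
    return wts, epochs_run

xs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # made-up items
ys = [0.2, 0.5, 0.4, 0.9]                              # made-up targets
wts, epochs_run = train_sgd(xs, ys, gamma=0.30, lrn_rate=0.05,
                            max_epochs=5000, alpha=1.0e-5, exit_tol=0.001)
print("epochs run =", epochs_run, " weights =", [round(w, 4) for w in wts])
```

A larger exit tolerance stops training sooner at the cost of a looser fit, so the learning rate and exit tolerance have to be tuned together.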

After training on 200 data items, those items that were predicted with a small error of less than 0.0035 were removed. There were 91 such well-predicted items, leaving a reduced training dataset of 109 items. These would be called the support vectors in SVR. After training a new KRR model on the reduced dataset, the trimmed/sparse KRR model scored 98.17% accuracy on the reduced training data (107 out of 109 correct) and 95.00% accuracy on the test data (38 out of 40 correct), where a prediction is scored as correct if it is within 10% of the true target value. Very nice.
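
The accuracy metric and the arithmetic behind those figures can be checked with a short Python sketch. The scoring rule mirrors the Accuracy() method in the C# demo; the three target values are the first three targets shown in the demo output, but the predictions are made up.

```python
def accuracy(preds, actuals, pct_close):
    # a prediction is correct when it is within pct_close of the true target
    num_correct = sum(1 for p, y in zip(preds, actuals)
                      if abs(y - p) < abs(y * pct_close))
    return num_correct / len(actuals)

actuals = [0.4840, 0.1568, 0.8054]  # first three targets from the demo output
preds = [0.48, 0.20, 0.81]          # made-up predictions
print(accuracy(preds, actuals, 0.10))

print(200 - 91)               # items left after trimming
print(round(107 / 109, 4))    # train accuracy on the reduced data
print(round(38 / 40, 4))      # test accuracy
```

The second made-up prediction misses its target by more than 10%, so two of the three predictions are scored as correct.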

To validate my C# trimmed KRR approximation to SVR, I ran the data through the scikit SVR module. A grid-search optimized scikit SVR model created 90 support vectors (fewer than the 109 items kept by the trimmed KRR model), with 96.50% accuracy on the full 200-item training data (slightly lower than the trimmed KRR figure, although that figure was computed on only the 109 retained items), and 95.00% accuracy on the test data (identical to the trimmed KRR model). In short, the C# trimmed KRR model and the SVR model are very similar. The large number of hyperparameters involved with KRR and SVR makes an exact comparison impossible in practice.

An interesting experiment.



It’s common for me to refactor the software systems I create many times. It’s rarely possible to get a non-trivial system completely correct on the first effort, and so a system is usually a collection of software sequels, so to speak, where each sequel is a bit better than its predecessor.

I’m a big fan of science fiction movies. There have been many sci fi sequels. Star Wars (1977) and Star Wars: The Empire Strikes Back (1980). Alien (1979) and Aliens (1986). And so on. Most movie sequels are worse than their predecessor, but there are exceptions where the sequel is better than the original.

Left: In “Pitch Black” (2000), a group of people crash land on a planet that is infested by dangerous creatures. The survivors are picked off one by one. A classic movie theme. My grade = C+.

Right: In the sequel, “The Chronicles of Riddick” (2004), an alien race, called the Necromongers, are religious fanatics. The story is wildly creative, with very good special effects, great set design, and OK acting. The movie is not well-liked by critics, typically about 1 star out of 4. But I like it a lot and give it my personal B+ grade.


The (re-re-factored) C# trimmed KRR demo program. Replace “lt” (less than), “gt”, “lte”, “gte” with the corresponding comparison operator symbols. (My blog editor chokes on symbols).

using System;
using System.IO;
using System.Collections.Generic;

namespace KernelRidgeRegressionTrim
{
  internal class KernelRidgeRegressionProgram
  {
    static void Main(string[] args)
    {
      Console.WriteLine("\nBegin kernel ridge " +
        "regression trim approximation to SVR using C# ");

      Console.WriteLine("\nLoading train (200) and" +
        " test (40) data ");
      string trainFile =
        "..\\..\\..\\Data\\synthetic_train_200.txt";
      double[][] trainX =
        Utils.MatLoad(trainFile,
        new int[] { 0, 1, 2, 3, 4 }, ',', "#");
      double[] trainY =
        Utils.MatToVec(Utils.MatLoad(trainFile,
        new int[] { 5 }, ',', "#"));

      string testFile =
        "..\\..\\..\\Data\\synthetic_test_40.txt";
      double[][] testX =
        Utils.MatLoad(testFile,
        new int[] { 0, 1, 2, 3, 4 }, ',', "#");
      double[] testY =
        Utils.MatToVec(Utils.MatLoad(testFile,
        new int[] { 5 }, ',', "#"));
      Console.WriteLine("Done ");

      Console.WriteLine("\nFirst three X predictors: ");
      for (int i = 0; i "lt" 3; ++i)
        Utils.VecShow(trainX[i], 4, 9);
      Console.WriteLine("\nFirst three target y: ");
      for (int i = 0; i "lt" 3; ++i)
        Console.WriteLine(trainY[i].ToString("F4").
          PadLeft(8));

      Console.WriteLine("\nCreating KRR model ");
      double gamma = 0.3;    // RBF param
      Console.WriteLine("Setting RBF gamma = " +
        gamma.ToString("F2"));
      KRR krr = new KRR(gamma);
      Console.WriteLine("Done ");

      double lrnRate = 0.05;
      int maxEpochs = 5000;
      double alpha = 1.0e-5;  // wt decay regularization
      double exitTol = 0.001;  // auto-exit
      double epsilon = 0.0035; // trimming
      // larger epsilon: fewer wts retained
      // smaller epsilon: more wts retained
      Console.WriteLine("\nSetting lrnRate = " +
        lrnRate.ToString("F4"));
      Console.WriteLine("Setting maxEpochs = " + maxEpochs);
      Console.WriteLine("Setting alpha decay =  " +
        alpha.ToString("F5"));
      Console.WriteLine("Setting auto-exit tol = " +
        exitTol.ToString("F4"));
      Console.WriteLine("Setting trim epsilon = " +
        epsilon.ToString("F4"));
      Console.WriteLine("\nTraining model ");
      krr.Train(trainX, trainY, lrnRate, maxEpochs, alpha,
        exitTol, epsilon);
      Console.WriteLine("Done ");
      Console.WriteLine("\nFinal model weights: ");
      Utils.VecShow(krr.wts, 4, 9);

      Console.WriteLine("\nEvaluating trimmed KRR model ");
      Console.WriteLine("Number model item/weights = " +
        krr.wts.Length);
      double trainAcc = 
        krr.Accuracy(krr.trainX, krr.trainY, 0.10);
      double testAcc = krr.Accuracy(testX, testY, 0.10);

      Console.WriteLine("\nTrain acc = " +
        trainAcc.ToString("F4"));
      Console.WriteLine("Test acc = " +
        testAcc.ToString("F4"));

      double trainMSE = krr.MSE(krr.trainX, krr.trainY);
      double testMSE = krr.MSE(testX, testY);

      Console.WriteLine("\nTrain MSE = " +
        trainMSE.ToString("F4"));
      Console.WriteLine("Test MSE = " +
        testMSE.ToString("F4"));

      Console.WriteLine("\nEnd KRR trim/prune demo ");
      Console.ReadLine();
    } // Main()

  } // class Program

  /*
  Creating scikit SVR model
  Setting gamma = 0.1000
  Setting C = 10.0000
  Setting epsilon = 0.0100

  Number support vectors = 90
  Accuracy (within 0.10) train = 0.9650
  Accuracy (within 0.10) test = 0.9500
  MSE train = 0.0001
  MSE test = 0.0002 
  */

  // ========================================================

  public class KRR
  {
    public double gamma;  // for RBF kernel
    public double[][] trainX;  // need for prediction
    public double[] trainY; // targets for trainX items
    public double[] wts;  // one per trainX item
    public Random rnd; // for SGD

    // ------------------------------------------------------

    public KRR(double gamma, int seed = 1)
    {
      this.gamma = gamma;
      this.trainX = new double[0][];
      this.trainY = new double[0];
      this.wts = new double[0];
      this.rnd = new Random(seed);  // shuffle train order
    } // ctor

    // ------------------------------------------------------

    private void Shuffle(int[] indices)
    {
      // Fisher-Yates for SGD
      for (int i = 0; i "lt" indices.Length; ++i)
      {
        int ri = this.rnd.Next(i, indices.Length);
        int tmp = indices[i];
        indices[i] = indices[ri];
        indices[ri] = tmp;
      }
    } // Shuffle

    // ------------------------------------------------------

    private void TrainSGD(double[][] trainX, double[] trainY,
     double lrnRate, int maxEpochs, double alpha,
     double exitTol, double epsilon)
    {
      // helper for Train()
      // assign to this.trainX, this.trainY, this.wts

      // trainX, trainY by ref:
      this.trainX = trainX;
      this.trainY = trainY;

      int freq = maxEpochs / 5;  // when to show progress

      // 1. init weights
      this.wts = new double[trainX.Length];
      double lo = -0.10; double hi = 0.10;
      for (int i = 0; i "lt" this.wts.Length; ++i)
        this.wts[i] = (hi - lo) *
          this.rnd.NextDouble() + lo;

      // 2. set up indices for shuffling
      int[] indices = new int[trainX.Length];
      for (int i = 0; i "lt" indices.Length; ++i)
        indices[i] = i;

      // 3. set up prev weights for auto-exit
      double[] prevWeights = new double[trainX.Length];
      for (int j = 0; j "lt" trainX.Length; ++j)
        prevWeights[j] = this.wts[j];

      // 4. SGD train (prelim or final phase)
      for (int epoch = 0; epoch "lt" maxEpochs; ++epoch)
      {
        Shuffle(indices);
        for (int i = 0; i "lt" trainX.Length; ++i)
        {
          int idx = indices[i];
          double[] x = trainX[idx];
          double predY = this.Predict(x);
          double actualY = trainY[idx];

          // update wt assoc with x
          this.wts[idx] -= lrnRate * (predY - actualY);
        } // each item

        if (epoch % freq == 0)
        {
          double mse = this.MSE(trainX, trainY);
          double acc = this.Accuracy(trainX, trainY, 0.10);
          string s1 = "epoch = " +
            epoch.ToString().PadLeft(6);
          string s2 = " MSE = " +
            mse.ToString("F4");
          string s3 = " acc = " + acc.ToString("F4");
          Console.WriteLine(s1 + s2 + s3);
        }
                
        // if max_change_in_wts / max_weights "lt" tol
        int numWts = this.wts.Length;
        double[] weightDeltas = new double[numWts];
        for (int j = 0; j "lt" numWts; ++j)
          weightDeltas[j] = 
            Math.Abs(prevWeights[j] - this.wts[j]);
        double maxChange = weightDeltas[0];
        for (int j = 0; j "lt" numWts; ++j)
          if (weightDeltas[j] "gt" maxChange)
            maxChange = weightDeltas[j];

        double maxWeightMag = Math.Abs(this.wts[0]);
        for (int j = 0; j "lt" numWts; ++j)
          if (Math.Abs(this.wts[j]) "gt" maxWeightMag)
            maxWeightMag = Math.Abs(this.wts[j]);

        if (maxWeightMag != 0.0 &&
          (maxChange / maxWeightMag) "lt" exitTol)
        {
          Console.WriteLine("Auto-exit at epoch " + epoch);
          break;
        }

        // an auto-exit didn't happen
        for (int j = 0; j "lt" numWts; ++j)
          prevWeights[j] = this.wts[j];
      
      } // each epoch

      // 5. apply one final wt decay regularization
      // as opposed to small decay after every epoch
      for (int j = 0; j "lt" this.wts.Length; ++j)
        this.wts[j] *= (1.0 - alpha);
    }

    // ------------------------------------------------------

    public void Train(double[][] trainX, double[] trainY,
     double lrnRate, int maxEpochs, double alpha,
     double exitTol, double epsilon)
    {
      // 1. preliminary train
      this.TrainSGD(trainX, trainY, lrnRate,
        maxEpochs, alpha, exitTol, epsilon);

      // 2. construct trimmed data
      int[] isSupportVec = new int[trainX.Length]; 
      for (int i = 0; i "lt" trainX.Length; ++i)
      {
        if (Math.Abs(this.Predict(trainX[i]) - 
          trainY[i]) "gt" epsilon)
          isSupportVec[i] = 1; // just OK prediction
      }
      List"lt"double[]"gt" tmpTrainX = 
        new List"lt"double[]"gt"();
      List"lt"double"gt" tmpTrainY = 
        new List"lt"double"gt"();
      for (int i = 0; i "lt" trainX.Length; ++i)
      {
        if (isSupportVec[i] == 1)
        {
          tmpTrainX.Add(trainX[i]);
          tmpTrainY.Add(trainY[i]);
        }
      }
      double[][] newTrainX = tmpTrainX.ToArray();
      double[] newTrainY = tmpTrainY.ToArray();
      
      // 3. re-train using new small dataset
      this.TrainSGD(newTrainX, newTrainY, lrnRate,
        maxEpochs, alpha, exitTol, epsilon);
    }

    // ------------------------------------------------------

    private double Rbf(double[] v1, double[] v2)
    {
      // the gamma version aot len_scale version
      int dim = v1.Length;
      double sum = 0.0;
      for (int i = 0; i "lt" dim; ++i)
      {
        sum += (v1[i] - v2[i]) * (v1[i] - v2[i]);
      }
      return Math.Exp(-1 * this.gamma * sum);
    }

    // ------------------------------------------------------

    public double Predict(double[] x)
    {
      int N = this.trainX.Length;
      double sum = 0.0;
      for (int i = 0; i "lt" N; ++i)
      {
        double[] xx = this.trainX[i];
        double k = this.Rbf(x, xx);
        sum += this.wts[i] * k;
      }
      return sum;
    }

    // ------------------------------------------------------

    public double Accuracy(double[][] dataX,
      double[] dataY, double pctClose)
    {
      int numCorrect = 0; int numWrong = 0;
      int n = dataX.Length;
      for (int i = 0; i "lt" n; ++i)
      {
        double[] x = dataX[i];
        double actualY = dataY[i];
        double predY = this.Predict(x);
        if (Math.Abs(actualY - predY) "lt"
          Math.Abs(actualY * pctClose))
          ++numCorrect;
        else
          ++numWrong;
      }
      return (numCorrect * 1.0) / n;
    }

    // ------------------------------------------------------

    public double MSE(double[][] dataX, double[] dataY)
    {
      double sum = 0.0;
      int n = dataX.Length;
      for (int i = 0; i "lt" n; ++i)
      {
        double[] x = dataX[i];
        double actualY = dataY[i];
        double predY = this.Predict(x);
        sum += (actualY - predY) * (actualY - predY);
      }
      return sum / n;
    }
  } // class KRR

  // ========================================================

  public class Utils
  {
    // ------------------------------------------------------

    public static double[][] MatLoad(string fn,
      int[] usecols, char sep, string comment)
    {
      List"lt"double[]"gt" result = 
        new List"lt"double[]"gt"();
      string line = "";
      FileStream ifs = new FileStream(fn, FileMode.Open);
      StreamReader sr = new StreamReader(ifs);
      while ((line = sr.ReadLine()) != null)
      {
        if (line.StartsWith(comment) == true)
          continue;
        string[] tokens = line.Split(sep);
        List"lt"double"gt" lst = new List"lt"double"gt"();
        for (int j = 0; j "lt" usecols.Length; ++j)
          lst.Add(double.Parse(tokens[usecols[j]]));
        double[] row = lst.ToArray();
        result.Add(row);
      }
      sr.Close(); ifs.Close();
      return result.ToArray();
    }

    // ------------------------------------------------------

    public static double[] MatToVec(double[][] mat)
    {
      int nRows = mat.Length;
      int nCols = mat[0].Length;
      double[] result = new double[nRows * nCols];
      int k = 0;
      for (int i = 0; i "lt" nRows; ++i)
        for (int j = 0; j "lt" nCols; ++j)
          result[k++] = mat[i][j];
      return result;
    }

    // ------------------------------------------------------

    public static void MatShow(double[][] m, int dec,
      int wid)
    {
      int nRows = m.Length; int nCols = m[0].Length;
      double small = 1.0 / Math.Pow(10, dec);
      for (int i = 0; i "lt" nRows; ++i)
      {
        for (int j = 0; j "lt" nCols; ++j)
        {
          double v = m[i][j];
          if (Math.Abs(v) "lt" small) v = 0.0;
          Console.Write(v.ToString("F" + dec).
            PadLeft(wid));
        }
        Console.WriteLine("");
      }
    }

    // ------------------------------------------------------

    public static void VecShow(double[] vec, int dec,
      int wid)
    {
      for (int i = 0; i "lt" vec.Length; ++i)
        Console.Write(vec[i].ToString("F" + dec).
        PadLeft(wid));
      Console.WriteLine("");
    }

    // ------------------------------------------------------

  } // class Utils

  // ========================================================

} // ns

Training data:

# synthetic_train_200.txt
#
-0.1660,  0.4406, -0.9998, -0.3953, -0.7065,  0.4840
 0.0776, -0.1616,  0.3704, -0.5911,  0.7562,  0.1568
-0.9452,  0.3409, -0.1654,  0.1174, -0.7192,  0.8054
 0.9365, -0.3732,  0.3846,  0.7528,  0.7892,  0.1345
-0.8299, -0.9219, -0.6603,  0.7563, -0.8033,  0.7955
 0.0663,  0.3838, -0.3690,  0.3730,  0.6693,  0.3206
-0.9634,  0.5003,  0.9777,  0.4963, -0.4391,  0.7377
-0.1042,  0.8172, -0.4128, -0.4244, -0.7399,  0.4801
-0.9613,  0.3577, -0.5767, -0.4689, -0.0169,  0.6861
-0.7065,  0.1786,  0.3995, -0.7953, -0.1719,  0.5569
 0.3888, -0.1716, -0.9001,  0.0718,  0.3276,  0.2500
 0.1731,  0.8068, -0.7251, -0.7214,  0.6148,  0.3297
-0.2046, -0.6693,  0.8550, -0.3045,  0.5016,  0.2129
 0.2473,  0.5019, -0.3022, -0.4601,  0.7918,  0.2613
-0.1438,  0.9297,  0.3269,  0.2434, -0.7705,  0.5171
 0.1568, -0.1837, -0.5259,  0.8068,  0.1474,  0.3307
-0.9943,  0.2343, -0.3467,  0.0541,  0.7719,  0.5581
 0.2467, -0.9684,  0.8589,  0.3818,  0.9946,  0.1092
-0.6553, -0.7257,  0.8652,  0.3936, -0.8680,  0.7018
 0.8460,  0.4230, -0.7515, -0.9602, -0.9476,  0.1996
-0.9434, -0.5076,  0.7201,  0.0777,  0.1056,  0.5664
 0.9392,  0.1221, -0.9627,  0.6013, -0.5341,  0.1533
 0.6142, -0.2243,  0.7271,  0.4942,  0.1125,  0.1661
 0.4260,  0.1194, -0.9749, -0.8561,  0.9346,  0.2230
 0.1362, -0.5934, -0.4953,  0.4877, -0.6091,  0.3810
 0.6937, -0.5203, -0.0125,  0.2399,  0.6580,  0.1460
-0.6864, -0.9628, -0.8600, -0.0273,  0.2127,  0.5387
 0.9772,  0.1595, -0.2397,  0.1019,  0.4907,  0.1611
 0.3385, -0.4702, -0.8673, -0.2598,  0.2594,  0.2270
-0.8669, -0.4794,  0.6095, -0.6131,  0.2789,  0.4700
 0.0493,  0.8496, -0.4734, -0.8681,  0.4701,  0.3516
 0.8639, -0.9721, -0.5313,  0.2336,  0.8980,  0.1412
 0.9004,  0.1133,  0.8312,  0.2831, -0.2200,  0.1782
 0.0991,  0.8524,  0.8375, -0.2102,  0.9265,  0.2150
-0.6521, -0.7473, -0.7298,  0.0113, -0.9570,  0.7422
 0.6190, -0.3105,  0.8802,  0.1640,  0.7577,  0.1056
 0.6895,  0.8108, -0.0802,  0.0927,  0.5972,  0.2214
 0.1982, -0.9689,  0.1870, -0.1326,  0.6147,  0.1310
-0.3695,  0.7858,  0.1557, -0.6320,  0.5759,  0.3773
-0.1596,  0.3581,  0.8372, -0.9992,  0.9535,  0.2071
-0.2468,  0.9476,  0.2094,  0.6577,  0.1494,  0.4132
 0.1737,  0.5000,  0.7166,  0.5102,  0.3961,  0.2611
 0.7290, -0.3546,  0.3416, -0.0983, -0.2358,  0.1332
-0.3652,  0.2438, -0.1395,  0.9476,  0.3556,  0.4170
-0.6029, -0.1466, -0.3133,  0.5953,  0.7600,  0.4334
-0.4596, -0.4953,  0.7098,  0.0554,  0.6043,  0.2775
 0.1450,  0.4663,  0.0380,  0.5418,  0.1377,  0.2931
-0.8636, -0.2442, -0.8407,  0.9656, -0.6368,  0.7429
 0.6237,  0.7499,  0.3768,  0.1390, -0.6781,  0.2185
-0.5499,  0.1850, -0.3755,  0.8326,  0.8193,  0.4399
-0.4858, -0.7782, -0.6141, -0.0008,  0.4572,  0.4197
 0.7033, -0.1683,  0.2334, -0.5327, -0.7961,  0.1776
 0.0317, -0.0457, -0.6947,  0.2436,  0.0880,  0.3345
 0.5031, -0.5559,  0.0387,  0.5706, -0.9553,  0.3107
-0.3513,  0.7458,  0.6894,  0.0769,  0.7332,  0.3170
 0.2205,  0.5992, -0.9309,  0.5405,  0.4635,  0.3532
-0.4806, -0.4859,  0.2646, -0.3094,  0.5932,  0.3202
 0.9809, -0.3995, -0.7140,  0.8026,  0.0831,  0.1600
 0.9495,  0.2732,  0.9878,  0.0921,  0.0529,  0.1289
-0.9476, -0.6792,  0.4913, -0.9392, -0.2669,  0.5966
 0.7247,  0.3854,  0.3819, -0.6227, -0.1162,  0.1550
-0.5922, -0.5045, -0.4757,  0.5003, -0.0860,  0.5863
-0.8861,  0.0170, -0.5761,  0.5972, -0.4053,  0.7301
 0.6877, -0.2380,  0.4997,  0.0223,  0.0819,  0.1404
 0.9189,  0.6079, -0.9354,  0.4188, -0.0700,  0.1907
-0.1428, -0.7820,  0.2676,  0.6059,  0.3936,  0.2790
 0.5324, -0.3151,  0.6917, -0.1425,  0.6480,  0.1071
-0.8432, -0.9633, -0.8666, -0.0828, -0.7733,  0.7784
-0.9444,  0.5097, -0.2103,  0.4939, -0.0952,  0.6787
-0.0520,  0.6063, -0.1952,  0.8094, -0.9259,  0.4836
 0.5477, -0.7487,  0.2370, -0.9793,  0.0773,  0.1241
 0.2450,  0.8116,  0.9799,  0.4222,  0.4636,  0.2355
 0.8186, -0.1983, -0.5003, -0.6531, -0.7611,  0.1511
-0.4714,  0.6382, -0.3788,  0.9648, -0.4667,  0.5950
 0.0673, -0.3711,  0.8215, -0.2669, -0.1328,  0.2677
-0.9381,  0.4338,  0.7820, -0.9454,  0.0441,  0.5518
-0.3480,  0.7190,  0.1170,  0.3805, -0.0943,  0.4724
-0.9813,  0.1535, -0.3771,  0.0345,  0.8328,  0.5438
-0.1471, -0.5052, -0.2574,  0.8637,  0.8737,  0.3042
-0.5454, -0.3712, -0.6505,  0.2142, -0.1728,  0.5783
 0.6327, -0.6297,  0.4038, -0.5193,  0.1484,  0.1153
-0.5424,  0.3282, -0.0055,  0.0380, -0.6506,  0.6613
 0.1414,  0.9935,  0.6337,  0.1887,  0.9520,  0.2540
-0.9351, -0.8128, -0.8693, -0.0965, -0.2491,  0.7353
 0.9507, -0.6640,  0.9456,  0.5349,  0.6485,  0.1059
-0.0462, -0.9737, -0.2940, -0.0159,  0.4602,  0.2606
-0.0627, -0.0852, -0.7247, -0.9782,  0.5166,  0.2977
 0.0478,  0.5098, -0.0723, -0.7504, -0.3750,  0.3335
 0.0090,  0.3477,  0.5403, -0.7393, -0.9542,  0.4415
-0.9748,  0.3449,  0.3736, -0.1015,  0.8296,  0.4358
 0.2887, -0.9895, -0.0311,  0.7186,  0.6608,  0.2057
 0.1570, -0.4518,  0.1211,  0.3435, -0.2951,  0.3244
 0.7117, -0.6099,  0.4946, -0.4208,  0.5476,  0.1096
-0.2929, -0.5726,  0.5346, -0.3827,  0.4665,  0.2465
 0.4889, -0.5572, -0.5718, -0.6021, -0.7150,  0.2163
-0.7782,  0.3491,  0.5996, -0.8389, -0.5366,  0.6516
-0.5847,  0.8347,  0.4226,  0.1078, -0.3910,  0.6134
 0.8469,  0.4121, -0.0439, -0.7476,  0.9521,  0.1571
-0.6803, -0.5948, -0.1376, -0.1916, -0.7065,  0.7156
 0.2878,  0.5086, -0.5785,  0.2019,  0.4979,  0.2980
 0.2764,  0.1943, -0.4090,  0.4632,  0.8906,  0.2960
-0.8877,  0.6705, -0.6155, -0.2098, -0.3998,  0.7107
-0.8398,  0.8093, -0.2597,  0.0614, -0.0118,  0.6502
-0.8476,  0.0158, -0.4769, -0.2859, -0.7839,  0.7715
 0.5751, -0.7868,  0.9714, -0.6457,  0.1448,  0.1175
 0.4802, -0.7001,  0.1022, -0.5668,  0.5184,  0.1090
 0.4458, -0.6469,  0.7239, -0.9604,  0.7205,  0.0779
 0.5175,  0.4339,  0.9747, -0.4438, -0.9924,  0.2879
 0.8678,  0.7158,  0.4577,  0.0334,  0.4139,  0.1678
 0.5406,  0.5012,  0.2264, -0.1963,  0.3946,  0.2088
-0.9938,  0.5498,  0.7928, -0.5214, -0.7585,  0.7687
 0.7661,  0.0863, -0.4266, -0.7233, -0.4197,  0.1466
 0.2277, -0.3517, -0.0853, -0.1118,  0.6563,  0.1767
 0.3499, -0.5570, -0.0655, -0.3705,  0.2537,  0.1632
 0.7547, -0.1046,  0.5689, -0.0861,  0.3125,  0.1257
 0.8186,  0.2110,  0.5335,  0.0094, -0.0039,  0.1391
 0.6858, -0.8644,  0.1465,  0.8855,  0.0357,  0.1845
-0.4967,  0.4015,  0.0805,  0.8977,  0.2487,  0.4663
 0.6760, -0.9841,  0.9787, -0.8446, -0.3557,  0.1509
-0.1203, -0.4885,  0.6054, -0.0443, -0.7313,  0.4854
 0.8557,  0.7919, -0.0169,  0.7134, -0.1628,  0.2002
 0.0115, -0.6209,  0.9300, -0.4116, -0.7931,  0.4052
-0.7114, -0.9718,  0.4319,  0.1290,  0.5892,  0.3661
 0.3915,  0.5557, -0.1870,  0.2955, -0.6404,  0.2954
-0.3564, -0.6548, -0.1827, -0.5172, -0.1862,  0.4622
 0.2392, -0.4959,  0.5857, -0.1341, -0.2850,  0.2470
-0.3394,  0.3947, -0.4627,  0.6166, -0.4094,  0.5325
 0.7107,  0.7768, -0.6312,  0.1707,  0.7964,  0.2757
-0.1078,  0.8437, -0.4420,  0.2177,  0.3649,  0.4028
-0.3139,  0.5595, -0.6505, -0.3161, -0.7108,  0.5546
 0.4335,  0.3986,  0.3770, -0.4932,  0.3847,  0.1810
-0.2562, -0.2894, -0.8847,  0.2633,  0.4146,  0.4036
 0.2272,  0.2966, -0.6601, -0.7011,  0.0284,  0.2778
-0.0743, -0.1421, -0.0054, -0.6770, -0.3151,  0.3597
-0.4762,  0.6891,  0.6007, -0.1467,  0.2140,  0.4266
-0.4061,  0.7193,  0.3432,  0.2669, -0.7505,  0.6147
-0.0588,  0.9731,  0.8966,  0.2902, -0.6966,  0.4955
-0.0627, -0.1439,  0.1985,  0.6999,  0.5022,  0.3077
 0.1587,  0.8494, -0.8705,  0.9827, -0.8940,  0.4263
-0.7850,  0.2473, -0.9040, -0.4308, -0.8779,  0.7199
 0.4070,  0.3369, -0.2428, -0.6236,  0.4940,  0.2215
-0.0242,  0.0513, -0.9430,  0.2885, -0.2987,  0.3947
-0.5416, -0.1322, -0.2351, -0.0604,  0.9590,  0.3683
 0.1055,  0.7783, -0.2901, -0.5090,  0.8220,  0.2984
-0.9129,  0.9015,  0.1128, -0.2473,  0.9901,  0.4776
-0.9378,  0.1424, -0.6391,  0.2619,  0.9618,  0.5368
 0.7498, -0.0963,  0.4169,  0.5549, -0.0103,  0.1614
-0.2612, -0.7156,  0.4538, -0.0460, -0.1022,  0.3717
 0.7720,  0.0552, -0.1818, -0.4622, -0.8560,  0.1685
-0.4177,  0.0070,  0.9319, -0.7812,  0.3461,  0.3052
-0.0001,  0.5542, -0.7128, -0.8336, -0.2016,  0.3803
 0.5356, -0.4194, -0.5662, -0.9666, -0.2027,  0.1776
-0.2378,  0.3187, -0.8582, -0.6948, -0.9668,  0.5474
-0.1947, -0.3579,  0.1158,  0.9869,  0.6690,  0.2992
 0.3992,  0.8365, -0.9205, -0.8593, -0.0520,  0.3154
-0.0209,  0.0793,  0.7905, -0.1067,  0.7541,  0.1864
-0.4928, -0.4524, -0.3433,  0.0951, -0.5597,  0.6261
-0.8118,  0.7404, -0.5263, -0.2280,  0.1431,  0.6349
 0.0516, -0.8480,  0.7483,  0.9023,  0.6250,  0.1959
-0.3212,  0.1093,  0.9488, -0.3766,  0.3376,  0.2735
-0.3481,  0.5490, -0.3484,  0.7797,  0.5034,  0.4379
-0.5785, -0.9170, -0.3563, -0.9258,  0.3877,  0.4121
 0.3407, -0.1391,  0.5356,  0.0720, -0.9203,  0.3458
-0.3287, -0.8954,  0.2102,  0.0241,  0.2349,  0.3247
-0.1353,  0.6954, -0.0919, -0.9692,  0.7461,  0.3338
 0.9036, -0.8982, -0.5299, -0.8733, -0.1567,  0.1187
 0.7277, -0.8368, -0.0538, -0.7489,  0.5458,  0.0830
 0.9049,  0.8878,  0.2279,  0.9470, -0.3103,  0.2194
 0.7957, -0.1308, -0.5284,  0.8817,  0.3684,  0.2172
 0.4647, -0.4931,  0.2010,  0.6292, -0.8918,  0.3371
-0.7390,  0.6849,  0.2367,  0.0626, -0.5034,  0.7039
-0.1567, -0.8711,  0.7940, -0.5932,  0.6525,  0.1710
 0.7635, -0.0265,  0.1969,  0.0545,  0.2496,  0.1445
 0.7675,  0.1354, -0.7698, -0.5460,  0.1920,  0.1728
-0.5211, -0.7372, -0.6763,  0.6897,  0.2044,  0.5217
 0.1913,  0.1980,  0.2314, -0.8816,  0.5006,  0.1998
 0.8964,  0.0694, -0.6149,  0.5059, -0.9854,  0.1825
 0.1767,  0.7104,  0.2093,  0.6452,  0.7590,  0.2832
-0.3580, -0.7541,  0.4426, -0.1193, -0.7465,  0.5657
-0.5996,  0.5766, -0.9758, -0.3933, -0.9572,  0.6800
 0.9950,  0.1641, -0.4132,  0.8579,  0.0142,  0.2003
-0.4717, -0.3894, -0.2567, -0.5111,  0.1691,  0.4266
 0.3917, -0.8561,  0.9422,  0.5061,  0.6123,  0.1212
-0.0366, -0.1087,  0.3449, -0.1025,  0.4086,  0.2475
 0.3633,  0.3943,  0.2372, -0.6980,  0.5216,  0.1925
-0.5325, -0.6466, -0.2178, -0.3589,  0.6310,  0.3568
 0.2271,  0.5200, -0.1447, -0.8011, -0.7699,  0.3128
 0.6415,  0.1993,  0.3777, -0.0178, -0.8237,  0.2181
-0.5298, -0.0768, -0.6028, -0.9490,  0.4588,  0.4356
 0.6870, -0.1431,  0.7294,  0.3141,  0.1621,  0.1632
-0.5985,  0.0591,  0.7889, -0.3900,  0.7419,  0.2945
 0.3661,  0.7984, -0.8486,  0.7572, -0.6183,  0.3449
 0.6995,  0.3342, -0.3113, -0.6972,  0.2707,  0.1712
 0.2565,  0.9126,  0.1798, -0.6043, -0.1413,  0.2893
-0.3265,  0.9839, -0.2395,  0.9854,  0.0376,  0.4770
 0.2690, -0.1722,  0.9818,  0.8599, -0.7015,  0.3954
-0.2102, -0.0768,  0.1219,  0.5607, -0.0256,  0.3949
 0.8216, -0.9555,  0.6422, -0.6231,  0.3715,  0.0801
-0.2896,  0.9484, -0.7545, -0.6249,  0.7789,  0.4370
-0.9985, -0.5448, -0.7092, -0.5931,  0.7926,  0.5402

Test data:

# synthetic_test_40.txt
#
 0.7462,  0.4006, -0.0590,  0.6543, -0.0083,  0.1935
 0.8495, -0.2260, -0.0142, -0.4911,  0.7699,  0.1078
-0.2335, -0.4049,  0.4352, -0.6183, -0.7636,  0.5088
 0.1810, -0.5142,  0.2465,  0.2767, -0.3449,  0.3136
-0.8650,  0.7611, -0.0801,  0.5277, -0.4922,  0.7140
-0.2358, -0.7466, -0.5115, -0.8413, -0.3943,  0.4533
 0.4834,  0.2300,  0.3448, -0.9832,  0.3568,  0.1360
-0.6502, -0.6300,  0.6885,  0.9652,  0.8275,  0.3046
-0.3053,  0.5604,  0.0929,  0.6329, -0.0325,  0.4756
-0.7995,  0.0740, -0.2680,  0.2086,  0.9176,  0.4565
-0.2144, -0.2141,  0.5813,  0.2902, -0.2122,  0.4119
-0.7278, -0.0987, -0.3312, -0.5641,  0.8515,  0.4438
 0.3793,  0.1976,  0.4933,  0.0839,  0.4011,  0.1905
-0.8568,  0.9573, -0.5272,  0.3212, -0.8207,  0.7415
-0.5785,  0.0056, -0.7901, -0.2223,  0.0760,  0.5551
 0.0735, -0.2188,  0.3925,  0.3570,  0.3746,  0.2191
 0.1230, -0.2838,  0.2262,  0.8715,  0.1938,  0.2878
 0.4792, -0.9248,  0.5295,  0.0366, -0.9894,  0.3149
-0.4456,  0.0697,  0.5359, -0.8938,  0.0981,  0.3879
 0.8629, -0.8505, -0.4464,  0.8385,  0.5300,  0.1769
 0.1995,  0.6659,  0.7921,  0.9454,  0.9970,  0.2330
-0.0249, -0.3066, -0.2927, -0.4923,  0.8220,  0.2437
 0.4513, -0.9481, -0.0770, -0.4374, -0.9421,  0.2879
-0.3405,  0.5931, -0.3507, -0.3842,  0.8562,  0.3987
 0.9538,  0.0471,  0.9039,  0.7760,  0.0361,  0.1706
-0.0887,  0.2104,  0.9808,  0.5478, -0.3314,  0.4128
-0.8220, -0.6302,  0.0537, -0.1658,  0.6013,  0.4306
-0.4123, -0.2880,  0.9074, -0.0461, -0.4435,  0.5144
 0.0060,  0.2867, -0.7775,  0.5161,  0.7039,  0.3599
-0.7968, -0.5484,  0.9426, -0.4308,  0.8148,  0.2979
 0.7811,  0.8450, -0.6877,  0.7594,  0.2640,  0.2362
-0.6802, -0.1113, -0.8325, -0.6694, -0.6056,  0.6544
 0.3821,  0.1476,  0.7466, -0.5107,  0.2592,  0.1648
 0.7265,  0.9683, -0.9803, -0.4943, -0.5523,  0.2454
-0.9049, -0.9797, -0.0196, -0.9090, -0.4433,  0.6447
-0.4607,  0.1811, -0.2389,  0.4050, -0.0078,  0.5229
 0.2664, -0.2932, -0.4259, -0.7336,  0.8742,  0.1834
-0.4507,  0.1029, -0.6294, -0.1158, -0.6294,  0.6081
 0.8948, -0.0124,  0.9278,  0.2899, -0.0314,  0.1534
-0.1323, -0.8813, -0.0146, -0.0697,  0.6135,  0.2386

Support Vector Regression with SGD Training From Scratch Using C#

The goal of a machine learning regression problem is to predict a single numeric value. For example, a bank might want to predict the maximum safe loan amount for a customer, based on age, account balance, annual income, and so on.

One of about a dozen common regression techniques is (kernel) support vector regression (SVR). One day before work, I realized that I had not implemented SVR from scratch, using C#, for a long time. So I figured I’d take a stab at it.

There are three main ways to train a support vector regression model: quadratic programming (QP) optimization, the sequential minimal optimization (SMO) algorithm, and stochastic sub-gradient descent (SSGD / SGD). I used SGD, which is by far the simplest SVR training technique.

The output of my demo is:

Begin C# kernel support vector regression with SGD

Loading train (200) and test (40) data
Done

First three X predictors:
  -0.1660   0.4406  -0.9998  -0.3953  -0.7065
   0.0776  -0.1616   0.3704  -0.5911   0.7562
  -0.9452   0.3409  -0.1654   0.1174  -0.7192

First three target y:
  0.4840
  0.1568
  0.8054

Creating SVR object
Setting RBF gamma = 0.3000
Setting epsilon = 0.007500
Setting C = 1.00
Setting lrnRate = 0.0010
Setting maxEpochs = 5000
Setting tol = 0.000100
Done

Training SVR model using SGD
epoch =      0 MSE = 0.0430 acc = 0.1300
epoch =   1000 MSE = 0.0001 acc = 0.9850
epoch =   2000 MSE = 0.0001 acc = 0.9850
epoch =   3000 MSE = 0.0001 acc = 0.9800
epoch =   4000 MSE = 0.0001 acc = 0.9800
Done


Model alpha (weights):
 -0.9256  -0.0443  -0.0041  -0.6581  . . .   0.0049
  0.2983   0.4573  -0.0488   0.0956  . . .   0.1471
. . .
  0.0017   0.0023   0.3193   0.0063  . . .  -0.0078
 -0.0690   0.9895   . . .    0.3843
Model bias = 0.4030

Number supp vectors = 194

Evaluating model

Train acc (within 0.10) = 0.9900
Test acc (within 0.10) = 0.9250

Train MSE = 0.0001
Test MSE = 0.0001

Train R2 = 0.9985
Test R2 = 0.9949

Predicting for trainX[0]
Predicted y = 0.4922

End SVR with SGD training demo

The demo data is synthetic. It was generated by a 5-10-1 neural network with random weights and bias values. The idea here is that the synthetic data does have an underlying, but complex, non-linear structure which can be predicted.

During training, the SVR model assigns one weight value, into a vector called alpha, to each training item, plus a special weight called the bias. After training, the SVR model determined that 6 of the 200 alpha weights were very close to zero, and so those 6 alpha values and their 6 associated training items were removed. This left 194 weight values and training items, called the support vectors.
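
To make the role of the alpha weights and the bias concrete, here is a minimal sketch of how a prediction is formed from them: a weighted sum of RBF similarities between the input and each support vector, plus the bias. It mirrors the Predict() method in the demo code, but the two support vectors, their weights, and the bias value are made-up numbers for illustration only.

```csharp
using System;

// RBF similarity: exp(-gamma * squared Euclidean distance)
static double Rbf(double[] v1, double[] v2, double gamma)
{
  double sum = 0.0;
  for (int i = 0; i < v1.Length; ++i)
  {
    double d = v1[i] - v2[i];
    sum += d * d;
  }
  return Math.Exp(-gamma * sum);
}

// prediction = sum of weight[i] * similarity(x, support vector i), plus bias
static double Predict(double[] x, double[][] suppX,
  double[] alpha, double b, double gamma)
{
  double sum = 0.0;
  for (int i = 0; i < suppX.Length; ++i)
    sum += alpha[i] * Rbf(x, suppX[i], gamma);
  return sum + b;
}

// hypothetical support vectors, weights and bias (not from the demo run)
double[][] sv = {
  new double[] { 0.0, 0.0 },
  new double[] { 1.0, 0.0 } };
double[] wts = { 0.50, -0.25 };
double bias = 0.40;

double[] x0 = { 0.0, 0.0 };
double predicted = Predict(x0, sv, wts, bias, 0.30);
Console.WriteLine("Predicted y = " + predicted.ToString("F4"));
```

Because a training item whose weight is zero contributes nothing to the sum, it can be dropped without changing any prediction, which is why only the support vectors need to be stored.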

All of the parameter values must be determined by trial and error. The gamma parameter defines the RBF function that is used to measure the similarity between data item vectors. Larger values of gamma shrink the radius of influence of individual training points. This tends to increase model accuracy at the expense of increased risk of model overfitting.
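
A quick way to see the effect of gamma is to evaluate the RBF function for one fixed pair of vectors at several gamma values. The sketch below uses the same formula as the demo, exp(-gamma * squared distance). The two vectors, and the gamma values other than 0.30, are assumptions chosen for illustration.

```csharp
using System;

// RBF similarity, gamma version: exp(-gamma * squared Euclidean distance)
static double Rbf(double[] v1, double[] v2, double gamma)
{
  double sum = 0.0;
  for (int i = 0; i < v1.Length; ++i)
  {
    double d = v1[i] - v2[i];
    sum += d * d;
  }
  return Math.Exp(-gamma * sum);
}

// two hypothetical points whose squared distance is 2.0
double[] a = { 0.0, 0.0 };
double[] b = { 1.0, 1.0 };

foreach (double g in new double[] { 0.03, 0.30, 3.00 })
{
  double sim = Rbf(a, b, g);
  Console.WriteLine("gamma = " + g.ToString("F2") +
    "  similarity = " + sim.ToString("F4"));
}
```

For these two points the similarity falls from roughly 0.94 at gamma = 0.03 to roughly 0.0025 at gamma = 3.00, so with a large gamma a training point only influences predictions for inputs that are very close to it.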

The epsilon value defines how close to correct a prediction must be for its training item to be considered a non-support vector. Larger values of epsilon create fewer support vectors.
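
The idea behind epsilon can be sketched as an epsilon-insensitive loss: an error smaller than epsilon costs nothing, and a larger error costs only the amount by which it exceeds epsilon. The sub-gradient function below follows the same three-way test used in the demo Train() method. The function names and the three predicted values are assumptions for illustration.

```csharp
using System;

// epsilon-insensitive loss: zero inside the tube, linear outside it
static double EpsLoss(double predY, double actualY, double epsilon)
{
  return Math.Max(0.0, Math.Abs(predY - actualY) - epsilon);
}

// sub-gradient of the loss with respect to the prediction
static double EpsSubGrad(double predY, double actualY, double epsilon)
{
  double error = predY - actualY;
  if (error > epsilon) return 1.0;    // predicted too high
  if (error < -epsilon) return -1.0;  // predicted too low
  return 0.0;                         // inside the tube
}

double eps = 0.0075;  // epsilon value used in the demo
double y = 0.4840;    // first training target in the demo data

// three hypothetical predictions for that target
foreach (double p in new double[] { 0.4800, 0.4922, 0.4700 })
{
  Console.WriteLine("pred = " + p.ToString("F4") +
    "  loss = " + EpsLoss(p, y, eps).ToString("F4") +
    "  sub-grad = " + EpsSubGrad(p, y, eps).ToString("F1"));
}
```

The first prediction is within 0.0075 of the target, so its loss and sub-gradient are both zero and the loss term leaves the associated weight alone. A wider tube puts more items in that situation, which is why larger epsilon values tend to give fewer support vectors.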

The C value is used for model regularization, which prevents model alpha weights from becoming very large. Large weights often lead to model overfitting. Larger values of C have a smaller regularization effect.
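
The sketch below isolates the part of the demo update rule that involves C, to show why a larger C means weaker regularization. It assumes the rule used in the demo Train() method: the weight is reduced by lrnRate * ((1 / C) * alpha * kSelf + gradLoss) and then clipped to the range [-C, +C]. The helper name UpdateAlpha, the starting weight of 0.05, and the C values other than 1.0 are made up for illustration, and the step that forces tiny weights to zero is left out.

```csharp
using System;

// one update of a single alpha weight, then clip to [-C, +C]
// kSelf is the kernel value of the item with itself (1.0 for RBF)
static double UpdateAlpha(double alpha, double gradLoss,
  double kSelf, double C, double lrnRate)
{
  double lamda = 1.0 / C;           // regularization strength
  double gradReg = alpha * kSelf;   // regularization gradient
  alpha -= lrnRate * (lamda * gradReg + gradLoss);
  return Math.Max(-C, Math.Min(C, alpha));
}

double start = 0.05;  // hypothetical current weight
foreach (double c in new double[] { 0.1, 1.0, 10.0 })
{
  // gradLoss = 0.0 means the prediction is inside the epsilon tube,
  // so only the regularization term changes the weight
  double updated = UpdateAlpha(start, 0.0, 1.0, c, 0.001);
  Console.WriteLine("C = " + c.ToString("F1") +
    "  alpha: " + start.ToString("F6") +
    " to " + updated.ToString("F6"));
}
```

With C = 0.1 the weight shrinks toward zero 100 times faster per update than with C = 10.0, and the clip also caps every weight magnitude at C.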

The lrnRate value controls how much alpha weight values change at each update during training. Larger values of lrnRate increase the speed of training, at the risk of jumping over good weight values.

The maxEpochs value controls how many iterations are performed during training. The effect of larger values of SVR maxEpochs can vary greatly.

The tol (“tolerance”) value controls pruning away training vectors to leave just the support vectors, by defining how close to 0 an alpha weight value must be in order to be pruned away. Larger values of tol allow more alpha weights to be defined as zero, which reduces the number of support vectors.
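
The effect of the tolerance can be sketched by counting how many weights survive a magnitude cutoff. This is a simplified, stand-alone illustration rather than the demo's exact pruning logic. The helper name SupportIndices and the six weight values are assumptions made up for the example.

```csharp
using System;
using System.Collections.Generic;

// indices of weights whose magnitude exceeds the cutoff;
// the associated training items are kept as support vectors
static int[] SupportIndices(double[] alpha, double tol)
{
  List<int> keep = new List<int>();
  for (int i = 0; i < alpha.Length; ++i)
  {
    if (Math.Abs(alpha[i]) > tol)
      keep.Add(i);
  }
  return keep.ToArray();
}

// hypothetical weights, some of them very close to zero
double[] wts = { -0.9256, 0.00002, 0.2983, -0.00008, 0.0017, 0.0 };

foreach (double t in new double[] { 1.0e-5, 1.0e-4, 1.0e-2 })
{
  int count = SupportIndices(wts, t).Length;
  Console.WriteLine("tol = " + t.ToString("E1") +
    "  support vectors = " + count);
}
```

For these six weights the number of support vectors goes from 5 to 3 to 2 as the cutoff grows from 1.0e-5 to 1.0e-2.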

The biggest weakness of support vector regression is the difficulty of tuning the hyperparameters. Small changes in parameter values can cause extremely large changes in the model, and the hyperparameters interact in complex ways.

Support vector regression had a brief surge of popularity in the late 1990s and early 2000s. However, data scientists realized that the closely related kernel ridge regression (KRR) has several significant advantages over SVR, and so the use of SVR declined to the point where it is not used very much today.

SVR is more difficult to implement than KRR, SVR is much more difficult to tune than KRR (KRR can use true SGD, which is easier to tune than SVR sub-gradient descent), and SVR often gives slightly worse prediction accuracy than KRR (due mostly to the difficulty in parameter tuning). That said, there are some problem scenarios where kernel SVR is highly effective.



One of the advantages of implementing a machine learning regression system from scratch is that it gives you a full explanation and understanding of exactly how the system works.

I’m a big fan of science fiction movies from the 1950s and 1960s. I especially like scenes where a scientist explains what the threat is.

Left: In “Crack in the World” (1965), a scientist explains how his project will drill a hole into the Earth’s magma core (using a nuclear missile) to gain access to unlimited power. Well, based on the title of this movie, you can correctly guess that this idea did not work out very well. My grade for this movie = B.

Right: In “Gorath” (1962), a scientist explains that a runaway star named Gorath is on a collision course with Earth. All governments unite to build gigantic rocket thrusters at the South Pole to move Earth out of the way, and then back again. The plan succeeds. My grade for this movie = B.


Demo program. Replace “lt” (less than), “gt”, “lte”, “gte” with Boolean operator symbols (my lame blog editor often chokes on symbols).

using System;
using System.IO;
using System.Collections.Generic;

// kernel SVR with SGD training
// hard-wired RBF kernel function

namespace SupportVectorRegressionSGD
{
  internal class SupportVectorRegressionSGDProgram
  {
    static void Main(string[] args)
    {
      Console.WriteLine("\nBegin C# kernel support " +
        "vector regression with SGD ");

      Console.WriteLine("\nLoading train (200) and" +
        " test (40) data ");
      string trainFile =
        "..\\..\\..\\Data\\synthetic_train_200.txt";
      double[][] trainX =
        Utils.MatLoad(trainFile,
        new int[] { 0, 1, 2, 3, 4 }, ',', "#");
      double[] trainY =
        Utils.MatToVec(Utils.MatLoad(trainFile,
        new int[] { 5 }, ',', "#"));

      string testFile =
        "..\\..\\..\\Data\\synthetic_test_40.txt";
      double[][] testX =
        Utils.MatLoad(testFile,
        new int[] { 0, 1, 2, 3, 4 }, ',', "#");
      double[] testY =
        Utils.MatToVec(Utils.MatLoad(testFile,
        new int[] { 5 }, ',', "#"));
      Console.WriteLine("Done ");

      Console.WriteLine("\nFirst three X predictors: ");
      for (int i = 0; i "lt" 3; ++i)
        Utils.VecShow(trainX[i], 4, 9);
      Console.WriteLine("\nFirst three target y: ");
      for (int i = 0; i "lt" 3; ++i)
        Console.WriteLine(trainY[i].ToString("F4").
          PadLeft(8));

      Console.WriteLine("\nCreating SVR object");
      double gamma = 0.30;    // RBF param
      double epsilon = 0.0075;
      double C = 1.0;
      double lrnRate = 0.001;
      int maxEpochs = 5000;
      double tol = 1.0e-4;

      Console.WriteLine("Setting RBF gamma = " +
        gamma.ToString("F4"));
      Console.WriteLine("Setting epsilon = " +
        epsilon.ToString("F6"));
      Console.WriteLine("Setting C = " +
        C.ToString("F2"));
      Console.WriteLine("Setting lrnRate = " +
        lrnRate.ToString("F4"));
      Console.WriteLine("Setting maxEpochs = " +
        maxEpochs);
      Console.WriteLine("Setting tol = " +
        tol.ToString("F6"));

      SVR model = new SVR(gamma, epsilon, C,
        lrnRate, maxEpochs, tol, seed: 0);
      Console.WriteLine("Done ");

      Console.WriteLine("\nTraining SVR model using" +
        " SGD ");
      model.Train(trainX, trainY);
      Console.WriteLine("Done ");
      Console.WriteLine("\nModel alpha (weights): ");
      Utils.VecShow(model.alpha, 4, 9);
      Console.WriteLine("\nModel bias = " + 
        model.b.ToString("F4"));

      Console.WriteLine("\nNumber supp vectors = " +
        model.suppX.Length);

      Console.WriteLine("\nEvaluating model ");
      double trainAcc =
        model.Accuracy(trainX, trainY, 0.10);
      double testAcc =
        model.Accuracy(testX, testY, 0.10);

      Console.WriteLine("\nTrain acc (within 0.10) = " +
        trainAcc.ToString("F4"));
      Console.WriteLine("Test acc (within 0.10) = " +
        testAcc.ToString("F4"));

      double trainMSE = model.MSE(trainX, trainY);
      double testMSE = model.MSE(testX, testY);

      Console.WriteLine("\nTrain MSE = " +
        trainMSE.ToString("F4"));
      Console.WriteLine("Test MSE = " +
        testMSE.ToString("F4"));

      double trainR2 = model.R2(trainX, trainY);
      double testR2 = model.R2(testX, testY);

      Console.WriteLine("\nTrain R2 = " +
        trainR2.ToString("F4"));
      Console.WriteLine("Test R2 = " +
        testR2.ToString("F4"));

      double[] x = trainX[0];
      Console.WriteLine("\nPredicting for trainX[0] ");
      double y = model.Predict(x);
      Console.WriteLine("Predicted y = " +
        y.ToString("F4"));

      Console.WriteLine("\nEnd SVR with SGD training demo ");
      Console.ReadLine();
    } // Main()

  } // class Program

  // ========================================================

  public class SVR
  {
    public double gamma;  // for RBF kernel
    public double epsilon;
    public double C; // weight regularization
    public double[][] suppX;  // needed for pred
    public double[] suppY;
    public double[] alpha;  // one per trainX item
    public double b;   // bias
    public double lrnRate;  // for SGD training
    public int maxEpochs;
    public double tol;
    public Random rnd;

    // ------------------------------------------------------

    public SVR(double gamma, double epsilon, double C,
      double lrnRate, int maxEpochs, double tol,
      int seed = 0)
    {
      this.gamma = gamma;
      this.epsilon = epsilon;
      this.C = C;
      this.suppX = new double[0][]; // compiler happy
      this.suppY = new double[0];
      this.lrnRate = lrnRate;
      this.maxEpochs = maxEpochs;
      this.tol = tol;
      this.alpha = new double[0];
      this.b = 0.0;
      this.rnd = new Random(seed);  // shuffle train order
    } // ctor

    // ------------------------------------------------------

    public void Train(double[][] trainX, double[] trainY)
    {
      this.suppX = trainX;
      this.suppY = trainY;
      int n = trainX.Length;

      // init weights
      this.alpha = new double[n];
      double lo = -0.01; double hi = 0.01; // not needed
      for (int i = 0; i "lt" n; ++i)
        this.alpha[i] =
          (hi - lo) * this.rnd.NextDouble() + lo;
      this.b = 0.0;

      // precompute all rbf values to K for fast train
      // not feasible for huge datasets
      double[][] K = this.MakeK(trainX);

      // set up indices for random order SGD training
      int[] indices = Utils.VecRange(n); // 0, 1, 2, ..
      double lamda = 1.0 / this.C;
      int progressFreq = (int)(this.maxEpochs / 5);

      // main sub-gradient processing loop
      for (int epoch = 0; epoch "lt" this.maxEpochs; ++epoch)
      {
        this.Shuffle(indices);
        for (int i = 0; i "lt" indices.Length; ++i)
        {
          int idx = indices[i];
          double predY = 0.0;
          for (int j = 0; j "lt" this.alpha.Length; ++j)
            predY += this.alpha[j] * K[idx][j]; // fast
          predY += this.b;
          double error = predY - trainY[idx];

          double gradLoss;
          bool insideTube = false;
          if (error "gt" this.epsilon)
            gradLoss = 1.0;
          else if (error "lt" -this.epsilon)
            gradLoss = -1.0;
          else
          {
            gradLoss = 0.0;
            insideTube = true;
          }

          // local kernel regularization gradient
          double gradReg = this.alpha[idx] * K[idx][idx];

          //  decoupled updates to the active index
          this.alpha[idx] -= this.lrnRate *
            (lamda * gradReg + gradLoss);
          this.b -= this.lrnRate * gradLoss;

          // force tiny weights to 0
          if (insideTube == true &&
            Math.Abs(this.alpha[idx]) "lt" this.tol)
            this.alpha[idx] = 0.0;

          // in-loop clip to bound updates mid-flight
          if (this.alpha[idx] "lt" -this.C)
            this.alpha[idx] = -this.C;
          else if (this.alpha[idx] "gt" this.C)
            this.alpha[idx] = this.C;

        } // each item

        // show training progress every few epochs
        if (epoch % progressFreq == 0)
        {
          double mse =
            this.MSE(trainX, trainY);
          double acc =
            this.Accuracy(trainX, trainY, 0.10);
          string s1 = "epoch = " +
            epoch.ToString().PadLeft(6);
          string s2 = " MSE = " +
            mse.ToString("F4");
          string s3 = " acc = " + acc.ToString("F4");
          Console.WriteLine(s1 + s2 + s3);
        }

      } // each epoch

      // final global clip
      for (int i = 0; i "lt" n; ++i)
      {
        if (this.alpha[i] "lt" -this.C)
          this.alpha[i] = -this.C;
        else if (this.alpha[i] "gt" this.C)
          this.alpha[i] = this.C;
      }

      // prune: store only explicit support vectors
      List"lt"int"gt" svLst = new List"lt"int"gt"();
      for (int i = 0; i "lt" this.alpha.Length; ++i)
      {
        if (Math.Abs(this.alpha[i]) "gt" 1.0e-5)
          svLst.Add(i);
      }
      int[] svMask = svLst.ToArray();

      this.suppX = Utils.MatSelectRows(trainX, svMask);
      this.suppY = Utils.VecSelectItems(trainY, svMask);
      this.alpha = Utils.VecSelectItems(this.alpha, svMask);

      return;  // all done
    } // Train

    // ------------------------------------------------------

    private void Shuffle(int[] indices)
    {
      // Fisher-Yates helper for Train()
      for (int i = 0; i "lt" indices.Length; ++i)
      {
        int ri = this.rnd.Next(i, indices.Length);
        int tmp = indices[i];
        indices[i] = indices[ri];
        indices[ri] = tmp;
      }
    } // Shuffle

    // ------------------------------------------------------

    private double RBF(double[] v1, double[] v2)
    {
      int n = v1.Length;
      double sum = 0.0;
      for (int i = 0; i "lt" n; ++i)
      {
        double d = v1[i] - v2[i];
        sum += d * d;
      }
      double result = Math.Exp(-1 * this.gamma * sum);
      return result;
    }

    // ------------------------------------------------------

    private double[][] MakeK(double[][] X)
    {
      // Kernel-Gram matrix helper for Train()
      // pre-compute all similarities, to avoid re-computes
      int n = X.Length;
      double[][] result = Utils.MatMake(n, n);
      for (int i = 0; i "lt" n; ++i)
        for (int j = 0; j "lt" n; ++j)
          result[i][j] = this.RBF(X[i], X[j]);
      return result;
    }

    // ------------------------------------------------------

    public double Predict(double[] x)
    {
      int n = this.suppX.Length;
      double sum = 0.0;
      for (int i = 0; i "lt" n; ++i)
      {
        double[] xx = this.suppX[i];
        double k = this.RBF(x, xx);
        sum += this.alpha[i] * k;
      }
      return sum + this.b;
    }

    // ------------------------------------------------------

    public double Accuracy(double[][] dataX,
      double[] dataY, double pctClose)
    {
      int numCorrect = 0; int numWrong = 0;
      int n = dataX.Length;
      for (int i = 0; i "lt" n; ++i)
      {
        double[] x = dataX[i];
        double actualY = dataY[i];
        double predY = this.Predict(x);
        if (Math.Abs(actualY - predY) "lt"
          Math.Abs(actualY * pctClose))
          ++numCorrect;
        else
          ++numWrong;
      }
      return (numCorrect * 1.0) / n;
    }

    // ------------------------------------------------------

    public double MSE(double[][] dataX, double[] dataY)
    {
      double sum = 0.0;
      int n = dataX.Length;
      for (int i = 0; i "lt" n; ++i)
      {
        double[] x = dataX[i];
        double actualY = dataY[i];
        double predY = this.Predict(x);
        sum += (actualY - predY) * (actualY - predY);
      }
      return sum / n;
    }

    // ------------------------------------------------------

    public double R2(double[][] dataX, double[] dataY)
    {
      // coefficient of determination
      int n = dataX.Length;
      double ssRes = 0.0; double ssTot = 0.0;
      double meanY = Utils.VecMean(dataY);

      for (int i = 0; i < n; ++i)
      {
        double[] x = dataX[i];
        double actualY = dataY[i];
        double predY = this.Predict(x);
        ssRes += (actualY - predY) * (actualY - predY);
        ssTot += (actualY - meanY) * (actualY - meanY);
      }
      double result = 1.0 - (ssRes / ssTot);
      return result;
    }

  } // class KRR

  // ========================================================

  public class Utils
  {
    // ------------------------------------------------------

    public static double[][] MatLoad(string fn,
      int[] usecols, char sep, string comment)
    {
      List<double[]> result =
        new List<double[]>();
      string line = "";
      FileStream ifs = new FileStream(fn, FileMode.Open);
      StreamReader sr = new StreamReader(ifs);
      while ((line = sr.ReadLine()) != null)
      {
        if (line.StartsWith(comment) == true)
          continue;
        string[] tokens = line.Split(sep);
        List<double> lst = new List<double>();
        for (int j = 0; j < usecols.Length; ++j)
          lst.Add(double.Parse(tokens[usecols[j]]));
        double[] row = lst.ToArray();
        result.Add(row);
      }
      sr.Close(); ifs.Close();
      return result.ToArray();
    }

    // ------------------------------------------------------

    public static double[] MatToVec(double[][] X)
    {
      int nRows = X.Length;
      int nCols = X[0].Length;
      double[] result = new double[nRows * nCols];
      int k = 0;
      for (int i = 0; i < nRows; ++i)
        for (int j = 0; j < nCols; ++j)
          result[k++] = X[i][j];
      return result;
    }

    // ------------------------------------------------------

    //public static double[] MatGetColumn(double[][] X,
    //  int col)
    //{
    //  int n = X.Length;
    //  double[] result = new double[n];
    //    for (int i = 0; i < n; ++i)
    //    result[i] = X[i][col];
    //  return result;
    //}

    // ------------------------------------------------------

    public static double[][] MatSelectRows(double[][] X,
      int[] rows)
    {
      int nRowsSrc = X.Length;
      int nColsSrc = X[0].Length;
      int n = rows.Length;
      double[][] result = MatMake(n, nColsSrc);

      for (int i = 0; i < n; ++i) // i pts into result
      {
        int srcRow = rows[i];
        for (int j = 0; j < nColsSrc; ++j)
        {
          result[i][j] = X[srcRow][j];
        }
      }
      return result;
    }

    // ------------------------------------------------------

    public static double[][] MatMake(int nRows, int nCols)
    {
      double[][] result = new double[nRows][];
      for (int i = 0; i < nRows; ++i)
        result[i] = new double[nCols];
      return result;
    }

    // ------------------------------------------------------

    public static double VecMean(double[] vec)
    {
      int n = vec.Length;
      double sum = 0.0;
      for (int i = 0; i < n; ++i)
        sum += vec[i];
      double result = sum / n;
      return result;
    }

    // ------------------------------------------------------

    public static int[] VecRange(int n)
    {
      int[] result = new int[n];
      for (int i = 0; i < n; ++i)
        result[i] = i;
      return result;
    }

    // ------------------------------------------------------

    //public static double VecDot(double[] v1, double[] v2)
    //{
    //  int n = v1.Length;
    //  double sum = 0.0;
    //  for (int i = 0; i < n; ++i)
    //    sum += v1[i] * v2[i];
    //  return sum;
    //}

    // ------------------------------------------------------

    public static double[] VecSelectItems(double[] vec,
      int[] idxs)
    {
      int n = idxs.Length;
      double[] result = new double[n];
      for (int i = 0; i < n; ++i)
      {
        result[i] = vec[idxs[i]];
      }
      return result;
    }

    // ------------------------------------------------------

    public static void VecShow(int[] vec, int wid)
    {
      for (int i = 0; i < vec.Length; ++i)
        Console.Write(vec[i].ToString().PadLeft(wid));
      Console.WriteLine("");
    }

    // ------------------------------------------------------

    public static void MatShow(double[][] m, int dec,
      int wid)
    {
      int nRows = m.Length; int nCols = m[0].Length;
      double small = 1.0 / Math.Pow(10, dec);
      for (int i = 0; i < nRows; ++i)
      {
        for (int j = 0; j < nCols; ++j)
        {
          double v = m[i][j];
          if (Math.Abs(v) < small) v = 0.0;
          Console.Write(v.ToString("F" + dec).
            PadLeft(wid));
        }
        Console.WriteLine("");
      }
    }

    // ------------------------------------------------------

    public static void VecShow(double[] vec, int dec,
      int wid)
    {
      for (int i = 0; i < vec.Length; ++i)
        Console.Write(vec[i].ToString("F" + dec).PadLeft(wid));
      Console.WriteLine("");
    }

    // ------------------------------------------------------

  } // class Utils

  // ========================================================

} // ns
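
For readers who want to cross-check the C# RBF(), Predict(), Accuracy() and R2() methods in Python, here is a minimal NumPy sketch. The function names and the tiny actual/predicted arrays are my own illustrations, not part of the demo, and unlike the C# versions the accuracy() and r2() functions take already-computed predictions instead of calling Predict() internally.

```python
import numpy as np

def rbf(v1, v2, gamma):
    # radial basis function kernel: exp(-gamma * squared distance)
    d = np.asarray(v1, dtype=np.float64) - np.asarray(v2, dtype=np.float64)
    return np.exp(-gamma * np.dot(d, d))

def predict(x, supp_X, alpha, b, gamma):
    # weighted sum of kernel similarities to each stored item, plus bias
    return sum(a * rbf(x, sx, gamma) for a, sx in zip(alpha, supp_X)) + b

def accuracy(pred_y, actual_y, pct_close):
    # a prediction is correct if it is within pct_close of the actual value
    pred_y = np.asarray(pred_y, dtype=np.float64)
    actual_y = np.asarray(actual_y, dtype=np.float64)
    close = np.abs(actual_y - pred_y) < np.abs(actual_y * pct_close)
    return np.mean(close)

def r2(pred_y, actual_y):
    # coefficient of determination: 1 - (SS residual / SS total)
    pred_y = np.asarray(pred_y, dtype=np.float64)
    actual_y = np.asarray(actual_y, dtype=np.float64)
    ss_res = np.sum((actual_y - pred_y) ** 2)
    ss_tot = np.sum((actual_y - np.mean(actual_y)) ** 2)
    return 1.0 - ss_res / ss_tot

# made-up values, for illustration only
actual = [0.50, 0.20, 0.80]
pred = [0.52, 0.25, 0.80]
print("accuracy (within 10%) =", accuracy(pred, actual, 0.10))
print("R2 =", r2(pred, actual))
```

Passing predictions instead of the model keeps each metric independent of how the predictions were produced, which makes the functions easy to test by hand.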

Training data:

# synthetic_train_200.txt
#
-0.1660,  0.4406, -0.9998, -0.3953, -0.7065,  0.4840
 0.0776, -0.1616,  0.3704, -0.5911,  0.7562,  0.1568
-0.9452,  0.3409, -0.1654,  0.1174, -0.7192,  0.8054
 0.9365, -0.3732,  0.3846,  0.7528,  0.7892,  0.1345
-0.8299, -0.9219, -0.6603,  0.7563, -0.8033,  0.7955
 0.0663,  0.3838, -0.3690,  0.3730,  0.6693,  0.3206
-0.9634,  0.5003,  0.9777,  0.4963, -0.4391,  0.7377
-0.1042,  0.8172, -0.4128, -0.4244, -0.7399,  0.4801
-0.9613,  0.3577, -0.5767, -0.4689, -0.0169,  0.6861
-0.7065,  0.1786,  0.3995, -0.7953, -0.1719,  0.5569
 0.3888, -0.1716, -0.9001,  0.0718,  0.3276,  0.2500
 0.1731,  0.8068, -0.7251, -0.7214,  0.6148,  0.3297
-0.2046, -0.6693,  0.8550, -0.3045,  0.5016,  0.2129
 0.2473,  0.5019, -0.3022, -0.4601,  0.7918,  0.2613
-0.1438,  0.9297,  0.3269,  0.2434, -0.7705,  0.5171
 0.1568, -0.1837, -0.5259,  0.8068,  0.1474,  0.3307
-0.9943,  0.2343, -0.3467,  0.0541,  0.7719,  0.5581
 0.2467, -0.9684,  0.8589,  0.3818,  0.9946,  0.1092
-0.6553, -0.7257,  0.8652,  0.3936, -0.8680,  0.7018
 0.8460,  0.4230, -0.7515, -0.9602, -0.9476,  0.1996
-0.9434, -0.5076,  0.7201,  0.0777,  0.1056,  0.5664
 0.9392,  0.1221, -0.9627,  0.6013, -0.5341,  0.1533
 0.6142, -0.2243,  0.7271,  0.4942,  0.1125,  0.1661
 0.4260,  0.1194, -0.9749, -0.8561,  0.9346,  0.2230
 0.1362, -0.5934, -0.4953,  0.4877, -0.6091,  0.3810
 0.6937, -0.5203, -0.0125,  0.2399,  0.6580,  0.1460
-0.6864, -0.9628, -0.8600, -0.0273,  0.2127,  0.5387
 0.9772,  0.1595, -0.2397,  0.1019,  0.4907,  0.1611
 0.3385, -0.4702, -0.8673, -0.2598,  0.2594,  0.2270
-0.8669, -0.4794,  0.6095, -0.6131,  0.2789,  0.4700
 0.0493,  0.8496, -0.4734, -0.8681,  0.4701,  0.3516
 0.8639, -0.9721, -0.5313,  0.2336,  0.8980,  0.1412
 0.9004,  0.1133,  0.8312,  0.2831, -0.2200,  0.1782
 0.0991,  0.8524,  0.8375, -0.2102,  0.9265,  0.2150
-0.6521, -0.7473, -0.7298,  0.0113, -0.9570,  0.7422
 0.6190, -0.3105,  0.8802,  0.1640,  0.7577,  0.1056
 0.6895,  0.8108, -0.0802,  0.0927,  0.5972,  0.2214
 0.1982, -0.9689,  0.1870, -0.1326,  0.6147,  0.1310
-0.3695,  0.7858,  0.1557, -0.6320,  0.5759,  0.3773
-0.1596,  0.3581,  0.8372, -0.9992,  0.9535,  0.2071
-0.2468,  0.9476,  0.2094,  0.6577,  0.1494,  0.4132
 0.1737,  0.5000,  0.7166,  0.5102,  0.3961,  0.2611
 0.7290, -0.3546,  0.3416, -0.0983, -0.2358,  0.1332
-0.3652,  0.2438, -0.1395,  0.9476,  0.3556,  0.4170
-0.6029, -0.1466, -0.3133,  0.5953,  0.7600,  0.4334
-0.4596, -0.4953,  0.7098,  0.0554,  0.6043,  0.2775
 0.1450,  0.4663,  0.0380,  0.5418,  0.1377,  0.2931
-0.8636, -0.2442, -0.8407,  0.9656, -0.6368,  0.7429
 0.6237,  0.7499,  0.3768,  0.1390, -0.6781,  0.2185
-0.5499,  0.1850, -0.3755,  0.8326,  0.8193,  0.4399
-0.4858, -0.7782, -0.6141, -0.0008,  0.4572,  0.4197
 0.7033, -0.1683,  0.2334, -0.5327, -0.7961,  0.1776
 0.0317, -0.0457, -0.6947,  0.2436,  0.0880,  0.3345
 0.5031, -0.5559,  0.0387,  0.5706, -0.9553,  0.3107
-0.3513,  0.7458,  0.6894,  0.0769,  0.7332,  0.3170
 0.2205,  0.5992, -0.9309,  0.5405,  0.4635,  0.3532
-0.4806, -0.4859,  0.2646, -0.3094,  0.5932,  0.3202
 0.9809, -0.3995, -0.7140,  0.8026,  0.0831,  0.1600
 0.9495,  0.2732,  0.9878,  0.0921,  0.0529,  0.1289
-0.9476, -0.6792,  0.4913, -0.9392, -0.2669,  0.5966
 0.7247,  0.3854,  0.3819, -0.6227, -0.1162,  0.1550
-0.5922, -0.5045, -0.4757,  0.5003, -0.0860,  0.5863
-0.8861,  0.0170, -0.5761,  0.5972, -0.4053,  0.7301
 0.6877, -0.2380,  0.4997,  0.0223,  0.0819,  0.1404
 0.9189,  0.6079, -0.9354,  0.4188, -0.0700,  0.1907
-0.1428, -0.7820,  0.2676,  0.6059,  0.3936,  0.2790
 0.5324, -0.3151,  0.6917, -0.1425,  0.6480,  0.1071
-0.8432, -0.9633, -0.8666, -0.0828, -0.7733,  0.7784
-0.9444,  0.5097, -0.2103,  0.4939, -0.0952,  0.6787
-0.0520,  0.6063, -0.1952,  0.8094, -0.9259,  0.4836
 0.5477, -0.7487,  0.2370, -0.9793,  0.0773,  0.1241
 0.2450,  0.8116,  0.9799,  0.4222,  0.4636,  0.2355
 0.8186, -0.1983, -0.5003, -0.6531, -0.7611,  0.1511
-0.4714,  0.6382, -0.3788,  0.9648, -0.4667,  0.5950
 0.0673, -0.3711,  0.8215, -0.2669, -0.1328,  0.2677
-0.9381,  0.4338,  0.7820, -0.9454,  0.0441,  0.5518
-0.3480,  0.7190,  0.1170,  0.3805, -0.0943,  0.4724
-0.9813,  0.1535, -0.3771,  0.0345,  0.8328,  0.5438
-0.1471, -0.5052, -0.2574,  0.8637,  0.8737,  0.3042
-0.5454, -0.3712, -0.6505,  0.2142, -0.1728,  0.5783
 0.6327, -0.6297,  0.4038, -0.5193,  0.1484,  0.1153
-0.5424,  0.3282, -0.0055,  0.0380, -0.6506,  0.6613
 0.1414,  0.9935,  0.6337,  0.1887,  0.9520,  0.2540
-0.9351, -0.8128, -0.8693, -0.0965, -0.2491,  0.7353
 0.9507, -0.6640,  0.9456,  0.5349,  0.6485,  0.1059
-0.0462, -0.9737, -0.2940, -0.0159,  0.4602,  0.2606
-0.0627, -0.0852, -0.7247, -0.9782,  0.5166,  0.2977
 0.0478,  0.5098, -0.0723, -0.7504, -0.3750,  0.3335
 0.0090,  0.3477,  0.5403, -0.7393, -0.9542,  0.4415
-0.9748,  0.3449,  0.3736, -0.1015,  0.8296,  0.4358
 0.2887, -0.9895, -0.0311,  0.7186,  0.6608,  0.2057
 0.1570, -0.4518,  0.1211,  0.3435, -0.2951,  0.3244
 0.7117, -0.6099,  0.4946, -0.4208,  0.5476,  0.1096
-0.2929, -0.5726,  0.5346, -0.3827,  0.4665,  0.2465
 0.4889, -0.5572, -0.5718, -0.6021, -0.7150,  0.2163
-0.7782,  0.3491,  0.5996, -0.8389, -0.5366,  0.6516
-0.5847,  0.8347,  0.4226,  0.1078, -0.3910,  0.6134
 0.8469,  0.4121, -0.0439, -0.7476,  0.9521,  0.1571
-0.6803, -0.5948, -0.1376, -0.1916, -0.7065,  0.7156
 0.2878,  0.5086, -0.5785,  0.2019,  0.4979,  0.2980
 0.2764,  0.1943, -0.4090,  0.4632,  0.8906,  0.2960
-0.8877,  0.6705, -0.6155, -0.2098, -0.3998,  0.7107
-0.8398,  0.8093, -0.2597,  0.0614, -0.0118,  0.6502
-0.8476,  0.0158, -0.4769, -0.2859, -0.7839,  0.7715
 0.5751, -0.7868,  0.9714, -0.6457,  0.1448,  0.1175
 0.4802, -0.7001,  0.1022, -0.5668,  0.5184,  0.1090
 0.4458, -0.6469,  0.7239, -0.9604,  0.7205,  0.0779
 0.5175,  0.4339,  0.9747, -0.4438, -0.9924,  0.2879
 0.8678,  0.7158,  0.4577,  0.0334,  0.4139,  0.1678
 0.5406,  0.5012,  0.2264, -0.1963,  0.3946,  0.2088
-0.9938,  0.5498,  0.7928, -0.5214, -0.7585,  0.7687
 0.7661,  0.0863, -0.4266, -0.7233, -0.4197,  0.1466
 0.2277, -0.3517, -0.0853, -0.1118,  0.6563,  0.1767
 0.3499, -0.5570, -0.0655, -0.3705,  0.2537,  0.1632
 0.7547, -0.1046,  0.5689, -0.0861,  0.3125,  0.1257
 0.8186,  0.2110,  0.5335,  0.0094, -0.0039,  0.1391
 0.6858, -0.8644,  0.1465,  0.8855,  0.0357,  0.1845
-0.4967,  0.4015,  0.0805,  0.8977,  0.2487,  0.4663
 0.6760, -0.9841,  0.9787, -0.8446, -0.3557,  0.1509
-0.1203, -0.4885,  0.6054, -0.0443, -0.7313,  0.4854
 0.8557,  0.7919, -0.0169,  0.7134, -0.1628,  0.2002
 0.0115, -0.6209,  0.9300, -0.4116, -0.7931,  0.4052
-0.7114, -0.9718,  0.4319,  0.1290,  0.5892,  0.3661
 0.3915,  0.5557, -0.1870,  0.2955, -0.6404,  0.2954
-0.3564, -0.6548, -0.1827, -0.5172, -0.1862,  0.4622
 0.2392, -0.4959,  0.5857, -0.1341, -0.2850,  0.2470
-0.3394,  0.3947, -0.4627,  0.6166, -0.4094,  0.5325
 0.7107,  0.7768, -0.6312,  0.1707,  0.7964,  0.2757
-0.1078,  0.8437, -0.4420,  0.2177,  0.3649,  0.4028
-0.3139,  0.5595, -0.6505, -0.3161, -0.7108,  0.5546
 0.4335,  0.3986,  0.3770, -0.4932,  0.3847,  0.1810
-0.2562, -0.2894, -0.8847,  0.2633,  0.4146,  0.4036
 0.2272,  0.2966, -0.6601, -0.7011,  0.0284,  0.2778
-0.0743, -0.1421, -0.0054, -0.6770, -0.3151,  0.3597
-0.4762,  0.6891,  0.6007, -0.1467,  0.2140,  0.4266
-0.4061,  0.7193,  0.3432,  0.2669, -0.7505,  0.6147
-0.0588,  0.9731,  0.8966,  0.2902, -0.6966,  0.4955
-0.0627, -0.1439,  0.1985,  0.6999,  0.5022,  0.3077
 0.1587,  0.8494, -0.8705,  0.9827, -0.8940,  0.4263
-0.7850,  0.2473, -0.9040, -0.4308, -0.8779,  0.7199
 0.4070,  0.3369, -0.2428, -0.6236,  0.4940,  0.2215
-0.0242,  0.0513, -0.9430,  0.2885, -0.2987,  0.3947
-0.5416, -0.1322, -0.2351, -0.0604,  0.9590,  0.3683
 0.1055,  0.7783, -0.2901, -0.5090,  0.8220,  0.2984
-0.9129,  0.9015,  0.1128, -0.2473,  0.9901,  0.4776
-0.9378,  0.1424, -0.6391,  0.2619,  0.9618,  0.5368
 0.7498, -0.0963,  0.4169,  0.5549, -0.0103,  0.1614
-0.2612, -0.7156,  0.4538, -0.0460, -0.1022,  0.3717
 0.7720,  0.0552, -0.1818, -0.4622, -0.8560,  0.1685
-0.4177,  0.0070,  0.9319, -0.7812,  0.3461,  0.3052
-0.0001,  0.5542, -0.7128, -0.8336, -0.2016,  0.3803
 0.5356, -0.4194, -0.5662, -0.9666, -0.2027,  0.1776
-0.2378,  0.3187, -0.8582, -0.6948, -0.9668,  0.5474
-0.1947, -0.3579,  0.1158,  0.9869,  0.6690,  0.2992
 0.3992,  0.8365, -0.9205, -0.8593, -0.0520,  0.3154
-0.0209,  0.0793,  0.7905, -0.1067,  0.7541,  0.1864
-0.4928, -0.4524, -0.3433,  0.0951, -0.5597,  0.6261
-0.8118,  0.7404, -0.5263, -0.2280,  0.1431,  0.6349
 0.0516, -0.8480,  0.7483,  0.9023,  0.6250,  0.1959
-0.3212,  0.1093,  0.9488, -0.3766,  0.3376,  0.2735
-0.3481,  0.5490, -0.3484,  0.7797,  0.5034,  0.4379
-0.5785, -0.9170, -0.3563, -0.9258,  0.3877,  0.4121
 0.3407, -0.1391,  0.5356,  0.0720, -0.9203,  0.3458
-0.3287, -0.8954,  0.2102,  0.0241,  0.2349,  0.3247
-0.1353,  0.6954, -0.0919, -0.9692,  0.7461,  0.3338
 0.9036, -0.8982, -0.5299, -0.8733, -0.1567,  0.1187
 0.7277, -0.8368, -0.0538, -0.7489,  0.5458,  0.0830
 0.9049,  0.8878,  0.2279,  0.9470, -0.3103,  0.2194
 0.7957, -0.1308, -0.5284,  0.8817,  0.3684,  0.2172
 0.4647, -0.4931,  0.2010,  0.6292, -0.8918,  0.3371
-0.7390,  0.6849,  0.2367,  0.0626, -0.5034,  0.7039
-0.1567, -0.8711,  0.7940, -0.5932,  0.6525,  0.1710
 0.7635, -0.0265,  0.1969,  0.0545,  0.2496,  0.1445
 0.7675,  0.1354, -0.7698, -0.5460,  0.1920,  0.1728
-0.5211, -0.7372, -0.6763,  0.6897,  0.2044,  0.5217
 0.1913,  0.1980,  0.2314, -0.8816,  0.5006,  0.1998
 0.8964,  0.0694, -0.6149,  0.5059, -0.9854,  0.1825
 0.1767,  0.7104,  0.2093,  0.6452,  0.7590,  0.2832
-0.3580, -0.7541,  0.4426, -0.1193, -0.7465,  0.5657
-0.5996,  0.5766, -0.9758, -0.3933, -0.9572,  0.6800
 0.9950,  0.1641, -0.4132,  0.8579,  0.0142,  0.2003
-0.4717, -0.3894, -0.2567, -0.5111,  0.1691,  0.4266
 0.3917, -0.8561,  0.9422,  0.5061,  0.6123,  0.1212
-0.0366, -0.1087,  0.3449, -0.1025,  0.4086,  0.2475
 0.3633,  0.3943,  0.2372, -0.6980,  0.5216,  0.1925
-0.5325, -0.6466, -0.2178, -0.3589,  0.6310,  0.3568
 0.2271,  0.5200, -0.1447, -0.8011, -0.7699,  0.3128
 0.6415,  0.1993,  0.3777, -0.0178, -0.8237,  0.2181
-0.5298, -0.0768, -0.6028, -0.9490,  0.4588,  0.4356
 0.6870, -0.1431,  0.7294,  0.3141,  0.1621,  0.1632
-0.5985,  0.0591,  0.7889, -0.3900,  0.7419,  0.2945
 0.3661,  0.7984, -0.8486,  0.7572, -0.6183,  0.3449
 0.6995,  0.3342, -0.3113, -0.6972,  0.2707,  0.1712
 0.2565,  0.9126,  0.1798, -0.6043, -0.1413,  0.2893
-0.3265,  0.9839, -0.2395,  0.9854,  0.0376,  0.4770
 0.2690, -0.1722,  0.9818,  0.8599, -0.7015,  0.3954
-0.2102, -0.0768,  0.1219,  0.5607, -0.0256,  0.3949
 0.8216, -0.9555,  0.6422, -0.6231,  0.3715,  0.0801
-0.2896,  0.9484, -0.7545, -0.6249,  0.7789,  0.4370
-0.9985, -0.5448, -0.7092, -0.5931,  0.7926,  0.5402

Test data:

# synthetic_test_40.txt
#
 0.7462,  0.4006, -0.0590,  0.6543, -0.0083,  0.1935
 0.8495, -0.2260, -0.0142, -0.4911,  0.7699,  0.1078
-0.2335, -0.4049,  0.4352, -0.6183, -0.7636,  0.5088
 0.1810, -0.5142,  0.2465,  0.2767, -0.3449,  0.3136
-0.8650,  0.7611, -0.0801,  0.5277, -0.4922,  0.7140
-0.2358, -0.7466, -0.5115, -0.8413, -0.3943,  0.4533
 0.4834,  0.2300,  0.3448, -0.9832,  0.3568,  0.1360
-0.6502, -0.6300,  0.6885,  0.9652,  0.8275,  0.3046
-0.3053,  0.5604,  0.0929,  0.6329, -0.0325,  0.4756
-0.7995,  0.0740, -0.2680,  0.2086,  0.9176,  0.4565
-0.2144, -0.2141,  0.5813,  0.2902, -0.2122,  0.4119
-0.7278, -0.0987, -0.3312, -0.5641,  0.8515,  0.4438
 0.3793,  0.1976,  0.4933,  0.0839,  0.4011,  0.1905
-0.8568,  0.9573, -0.5272,  0.3212, -0.8207,  0.7415
-0.5785,  0.0056, -0.7901, -0.2223,  0.0760,  0.5551
 0.0735, -0.2188,  0.3925,  0.3570,  0.3746,  0.2191
 0.1230, -0.2838,  0.2262,  0.8715,  0.1938,  0.2878
 0.4792, -0.9248,  0.5295,  0.0366, -0.9894,  0.3149
-0.4456,  0.0697,  0.5359, -0.8938,  0.0981,  0.3879
 0.8629, -0.8505, -0.4464,  0.8385,  0.5300,  0.1769
 0.1995,  0.6659,  0.7921,  0.9454,  0.9970,  0.2330
-0.0249, -0.3066, -0.2927, -0.4923,  0.8220,  0.2437
 0.4513, -0.9481, -0.0770, -0.4374, -0.9421,  0.2879
-0.3405,  0.5931, -0.3507, -0.3842,  0.8562,  0.3987
 0.9538,  0.0471,  0.9039,  0.7760,  0.0361,  0.1706
-0.0887,  0.2104,  0.9808,  0.5478, -0.3314,  0.4128
-0.8220, -0.6302,  0.0537, -0.1658,  0.6013,  0.4306
-0.4123, -0.2880,  0.9074, -0.0461, -0.4435,  0.5144
 0.0060,  0.2867, -0.7775,  0.5161,  0.7039,  0.3599
-0.7968, -0.5484,  0.9426, -0.4308,  0.8148,  0.2979
 0.7811,  0.8450, -0.6877,  0.7594,  0.2640,  0.2362
-0.6802, -0.1113, -0.8325, -0.6694, -0.6056,  0.6544
 0.3821,  0.1476,  0.7466, -0.5107,  0.2592,  0.1648
 0.7265,  0.9683, -0.9803, -0.4943, -0.5523,  0.2454
-0.9049, -0.9797, -0.0196, -0.9090, -0.4433,  0.6447
-0.4607,  0.1811, -0.2389,  0.4050, -0.0078,  0.5229
 0.2664, -0.2932, -0.4259, -0.7336,  0.8742,  0.1834
-0.4507,  0.1029, -0.6294, -0.1158, -0.6294,  0.6081
 0.8948, -0.0124,  0.9278,  0.2899, -0.0314,  0.1534
-0.1323, -0.8813, -0.0146, -0.0697,  0.6135,  0.2386

Example of Anomaly Detection Using scikit IsolationForest

I came across a relatively obscure class in the scikit-learn library called IsolationForest, which is used for anomaly detection. In a nutshell, an isolation forest builds many trees, each of which repeatedly splits the data on a randomly selected column at a random value. Data items that end up alone in a leaf node close to the root were easy to isolate, and so they are likely to be anomalous.
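
To make the isolation idea concrete, here is a minimal sketch, assuming the usual isolation-forest scheme of splitting on a random column at a random value. It is not the scikit implementation. The function names, the 2-column data, and the planted outlier at (5.0, 5.0) are all hypothetical and exist only for illustration.

```python
import numpy as np

def isolation_depth(data, idx, rng, max_depth=50):
    # count the random splits needed until data[idx] is alone in its partition
    target = data[idx]
    subset = data
    depth = 0
    while len(subset) > 1 and depth < max_depth:
        col = rng.integers(data.shape[1])  # pick a random column
        lo, hi = subset[:, col].min(), subset[:, col].max()
        split = rng.uniform(lo, hi)  # pick a random split value
        # keep only the side of the split that contains the target item
        if target[col] < split:
            subset = subset[subset[:, col] < split]
        else:
            subset = subset[subset[:, col] >= split]
        depth += 1
    return depth

def mean_depth(data, idx, rng, n_trees=100):
    # average the isolation depth over many random trees
    return np.mean([isolation_depth(data, idx, rng) for _ in range(n_trees)])

rng = np.random.default_rng(0)
data = rng.uniform(-1.0, 1.0, size=(200, 2))  # made-up normal items
data[0] = [0.0, 0.0]  # an ordinary item near the center
data[1] = [5.0, 5.0]  # a planted, hypothetical outlier

print("mean depth, ordinary item [0] =", mean_depth(data, 0, rng))
print("mean depth, outlier item  [1] =", mean_depth(data, 1, rng))
```

The outlier should need far fewer splits on average than the ordinary item, which is the signal an isolation forest turns into an anomaly score.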

I decided to put a demo together.

For my demo I used a set of synthetic data that looks like:

-0.1660,  0.4406, -0.9998, -0.3953, -0.7065,  0.4840
 0.9776, -0.9616,  0.9704, -0.9911,  0.9562,  0.1568
-0.9452,  0.3409, -0.1654,  0.1174, -0.7192,  0.8054
 0.9365, -0.3732,  0.3846,  0.7528,  0.7892,  0.1345
-0.8299, -0.9219, -0.6603,  0.7563, -0.8033,  0.7955
. . . 

The last value on each line is a dependent variable, used for regression explorations, so I used only the first five values on each line. There are 200 data items.

I deliberately modified data item [1] so that every value has a very large magnitude, which makes item [1] anomalous. Notice that all the values in item [1] are very close to -1 or +1.

The output of my IsolationForest demo is:

Begin scikit IsolationForest demo

Loading synthetic data (200)

First three data items:
[-0.1660  0.4406 -0.9998 -0.3953 -0.7065]
[ 0.9776 -0.9616  0.9704 -0.9911  0.9562]
[-0.9452  0.3409 -0.1654  0.1174 -0.7192]

Creating IsolationForest
Done

Analyzing dataset
Done

First three anomaly scores
  0.0044
 -0.1226
  0.0079

Most anomalous item = [1]
Anomaly score = -0.1226

End demo

Smaller values (more negative) are more anomalous. The demo correctly identified item [1] as the most anomalous.
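
The sign convention can be checked directly. The sketch below uses made-up random data with a planted outlier at index [1], not the demo file, so the variable names and values are assumptions. It shows how decision_function() relates to the other two IsolationForest scoring methods: score_samples() returns the raw score, decision_function() subtracts the fitted offset_ from it, and predict() returns -1 for items whose decision_function() value is negative and +1 otherwise.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = rng.uniform(-0.8, 0.8, size=(200, 5))  # made-up data, not the demo file
X[1] = [3.0, -3.0, 3.0, -3.0, 3.0]  # planted, hypothetical outlier

detector = IsolationForest(n_estimators=50, random_state=0)
detector.fit(X)

scores = detector.decision_function(X)  # negative = anomalous
raw = detector.score_samples(X)  # raw score before the offset
labels = detector.predict(X)  # -1 = anomaly, +1 = normal

print("offset_ =", detector.offset_)
print("most anomalous item =", int(np.argmin(scores)))
print("number flagged as anomalies =", int(np.sum(labels == -1)))
```

In practice, the continuous scores are usually more informative than the -1/+1 labels because they let you rank items and choose your own cutoff.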

When I first discovered the scikit IsolationForest, I didn’t like some of the technical details and so I used the key ideas to implement a C# system that I called an Anomaly Forest. When I ran the data through the C# program, it also picked data item [1] as most anomalous. (See the screenshot above).

Different anomaly detection algorithms identify different types of anomalies. The scikit IsolationForest module is just one of many techniques.



In some sense, machine learning is all about finding hidden patterns in data. Every cover of Playboy Magazine (except for the very first issue in December 1953) has a logo of a bunny. On most covers, the bunny logo is fairly obvious, but sometimes the logo is very cleverly hidden.

Left: On the July 1976 cover (the 200th anniversary of the United States), the bunny logo is disguised as one of the stars on the U.S. flag.

Right: This is the most famous cover for the hidden logo. Many thousands of readers wrote to the magazine saying that they had examined every square inch of the cover and there is no logo. But there is one: the trickiest hidden logo ever, up until that time. The logo is hidden in the text “The Playmate of the Year”. The bunny head is the upper-left part of the “Y” of “Year” and the ears are the bottom part of the “P” and the “y” in “Playmate”. Because of this tricky cover, starting a few months later, the magazine used the Contents page of each issue to explain where the hidden logo is.


Demo program.

# isolation_forest_scikit.py
# anomaly detection

import numpy as np
from sklearn.ensemble import IsolationForest

np.set_printoptions(precision=4, suppress=True,
  floatmode='fixed', linewidth=120)

print("\nBegin scikit IsolationForest demo ")

print("\nLoading synthetic data (200) ")
data_file = ".\\Data\\synthetic_200.txt"

cols_X = [0,1,2,3,4]  
data_X = np.loadtxt(data_file, comments="#",
  usecols=cols_X, delimiter=",",  dtype=np.float64)

print("\nFirst three data items: ")
for i in range(3):
  print(data_X[i])

print("\nCreating IsolationForest ")
detector = IsolationForest(n_estimators=50, random_state=0)
print("Done ")

print("\nAnalyzing dataset ")
detector.fit(data_X)
print("Done ")

scores = detector.decision_function(data_X)
print("\nFirst three anomaly scores ")

for i in range(3):
  print("%8.4f " % scores[i])

min_indx = 0
min_score = scores[0]
for i in range(len(scores)):
  if scores[i] < min_score:
    min_score = scores[i]
    min_indx = i

print("\nMost anomalous item = [" + str(min_indx) + "]")
print("Anomaly score = %0.4f " % min_score)

# print("")
# for i in range(len(scores)):
#   print("%4d  %0.4f " % (i, scores[i]))

print("\nEnd demo ")
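
The explicit loop that finds the smallest score can also be written with NumPy. This is a minimal alternative sketch; the three scores are the first three values from the demo output, used here only as a small stand-in for the full scores array.

```python
import numpy as np

# first three anomaly scores from the demo output
scores = np.array([0.0044, -0.1226, 0.0079])

min_indx = int(np.argmin(scores))  # index of the most anomalous item
min_score = scores[min_indx]
order = np.argsort(scores)  # item indices, most anomalous first

print("Most anomalous item = [" + str(min_indx) + "]")
print("Anomaly score = %0.4f " % min_score)
print("Items ranked by anomaly score:", order)
```

The np.argsort() ranking is handy when you want to examine the top few anomalous items and not only the single most anomalous one.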

Data:

# synthetic_200.txt
#
-0.1660,  0.4406, -0.9998, -0.3953, -0.7065,  0.4840
 0.9776, -0.9616,  0.9704, -0.9911,  0.9562,  0.1568
-0.9452,  0.3409, -0.1654,  0.1174, -0.7192,  0.8054
 0.9365, -0.3732,  0.3846,  0.7528,  0.7892,  0.1345
-0.8299, -0.9219, -0.6603,  0.7563, -0.8033,  0.7955
 0.0663,  0.3838, -0.3690,  0.3730,  0.6693,  0.3206
-0.9634,  0.5003,  0.9777,  0.4963, -0.4391,  0.7377
-0.1042,  0.8172, -0.4128, -0.4244, -0.7399,  0.4801
-0.9613,  0.3577, -0.5767, -0.4689, -0.0169,  0.6861
-0.7065,  0.1786,  0.3995, -0.7953, -0.1719,  0.5569
 0.3888, -0.1716, -0.9001,  0.0718,  0.3276,  0.2500
 0.1731,  0.8068, -0.7251, -0.7214,  0.6148,  0.3297
-0.2046, -0.6693,  0.8550, -0.3045,  0.5016,  0.2129
 0.2473,  0.5019, -0.3022, -0.4601,  0.7918,  0.2613
-0.1438,  0.9297,  0.3269,  0.2434, -0.7705,  0.5171
 0.1568, -0.1837, -0.5259,  0.8068,  0.1474,  0.3307
-0.9943,  0.2343, -0.3467,  0.0541,  0.7719,  0.5581
 0.2467, -0.9684,  0.8589,  0.3818,  0.9946,  0.1092
-0.6553, -0.7257,  0.8652,  0.3936, -0.8680,  0.7018
 0.8460,  0.4230, -0.7515, -0.9602, -0.9476,  0.1996
-0.9434, -0.5076,  0.7201,  0.0777,  0.1056,  0.5664
 0.9392,  0.1221, -0.9627,  0.6013, -0.5341,  0.1533
 0.6142, -0.2243,  0.7271,  0.4942,  0.1125,  0.1661
 0.4260,  0.1194, -0.9749, -0.8561,  0.9346,  0.2230
 0.1362, -0.5934, -0.4953,  0.4877, -0.6091,  0.3810
 0.6937, -0.5203, -0.0125,  0.2399,  0.6580,  0.1460
-0.6864, -0.9628, -0.8600, -0.0273,  0.2127,  0.5387
 0.9772,  0.1595, -0.2397,  0.1019,  0.4907,  0.1611
 0.3385, -0.4702, -0.8673, -0.2598,  0.2594,  0.2270
-0.8669, -0.4794,  0.6095, -0.6131,  0.2789,  0.4700
 0.0493,  0.8496, -0.4734, -0.8681,  0.4701,  0.3516
 0.8639, -0.9721, -0.5313,  0.2336,  0.8980,  0.1412
 0.9004,  0.1133,  0.8312,  0.2831, -0.2200,  0.1782
 0.0991,  0.8524,  0.8375, -0.2102,  0.9265,  0.2150
-0.6521, -0.7473, -0.7298,  0.0113, -0.9570,  0.7422
 0.6190, -0.3105,  0.8802,  0.1640,  0.7577,  0.1056
 0.6895,  0.8108, -0.0802,  0.0927,  0.5972,  0.2214
 0.1982, -0.9689,  0.1870, -0.1326,  0.6147,  0.1310
-0.3695,  0.7858,  0.1557, -0.6320,  0.5759,  0.3773
-0.1596,  0.3581,  0.8372, -0.9992,  0.9535,  0.2071
-0.2468,  0.9476,  0.2094,  0.6577,  0.1494,  0.4132
 0.1737,  0.5000,  0.7166,  0.5102,  0.3961,  0.2611
 0.7290, -0.3546,  0.3416, -0.0983, -0.2358,  0.1332
-0.3652,  0.2438, -0.1395,  0.9476,  0.3556,  0.4170
-0.6029, -0.1466, -0.3133,  0.5953,  0.7600,  0.4334
-0.4596, -0.4953,  0.7098,  0.0554,  0.6043,  0.2775
 0.1450,  0.4663,  0.0380,  0.5418,  0.1377,  0.2931
-0.8636, -0.2442, -0.8407,  0.9656, -0.6368,  0.7429
 0.6237,  0.7499,  0.3768,  0.1390, -0.6781,  0.2185
-0.5499,  0.1850, -0.3755,  0.8326,  0.8193,  0.4399
-0.4858, -0.7782, -0.6141, -0.0008,  0.4572,  0.4197
 0.7033, -0.1683,  0.2334, -0.5327, -0.7961,  0.1776
 0.0317, -0.0457, -0.6947,  0.2436,  0.0880,  0.3345
 0.5031, -0.5559,  0.0387,  0.5706, -0.9553,  0.3107
-0.3513,  0.7458,  0.6894,  0.0769,  0.7332,  0.3170
 0.2205,  0.5992, -0.9309,  0.5405,  0.4635,  0.3532
-0.4806, -0.4859,  0.2646, -0.3094,  0.5932,  0.3202
 0.9809, -0.3995, -0.7140,  0.8026,  0.0831,  0.1600
 0.9495,  0.2732,  0.9878,  0.0921,  0.0529,  0.1289
-0.9476, -0.6792,  0.4913, -0.9392, -0.2669,  0.5966
 0.7247,  0.3854,  0.3819, -0.6227, -0.1162,  0.1550
-0.5922, -0.5045, -0.4757,  0.5003, -0.0860,  0.5863
-0.8861,  0.0170, -0.5761,  0.5972, -0.4053,  0.7301
 0.6877, -0.2380,  0.4997,  0.0223,  0.0819,  0.1404
 0.9189,  0.6079, -0.9354,  0.4188, -0.0700,  0.1907
-0.1428, -0.7820,  0.2676,  0.6059,  0.3936,  0.2790
 0.5324, -0.3151,  0.6917, -0.1425,  0.6480,  0.1071
-0.8432, -0.9633, -0.8666, -0.0828, -0.7733,  0.7784
-0.9444,  0.5097, -0.2103,  0.4939, -0.0952,  0.6787
-0.0520,  0.6063, -0.1952,  0.8094, -0.9259,  0.4836
 0.5477, -0.7487,  0.2370, -0.9793,  0.0773,  0.1241
 0.2450,  0.8116,  0.9799,  0.4222,  0.4636,  0.2355
 0.8186, -0.1983, -0.5003, -0.6531, -0.7611,  0.1511
-0.4714,  0.6382, -0.3788,  0.9648, -0.4667,  0.5950
 0.0673, -0.3711,  0.8215, -0.2669, -0.1328,  0.2677
-0.9381,  0.4338,  0.7820, -0.9454,  0.0441,  0.5518
-0.3480,  0.7190,  0.1170,  0.3805, -0.0943,  0.4724
-0.9813,  0.1535, -0.3771,  0.0345,  0.8328,  0.5438
-0.1471, -0.5052, -0.2574,  0.8637,  0.8737,  0.3042
-0.5454, -0.3712, -0.6505,  0.2142, -0.1728,  0.5783
 0.6327, -0.6297,  0.4038, -0.5193,  0.1484,  0.1153
-0.5424,  0.3282, -0.0055,  0.0380, -0.6506,  0.6613
 0.1414,  0.9935,  0.6337,  0.1887,  0.9520,  0.2540
-0.9351, -0.8128, -0.8693, -0.0965, -0.2491,  0.7353
 0.9507, -0.6640,  0.9456,  0.5349,  0.6485,  0.1059
-0.0462, -0.9737, -0.2940, -0.0159,  0.4602,  0.2606
-0.0627, -0.0852, -0.7247, -0.9782,  0.5166,  0.2977
 0.0478,  0.5098, -0.0723, -0.7504, -0.3750,  0.3335
 0.0090,  0.3477,  0.5403, -0.7393, -0.9542,  0.4415
-0.9748,  0.3449,  0.3736, -0.1015,  0.8296,  0.4358
 0.2887, -0.9895, -0.0311,  0.7186,  0.6608,  0.2057
 0.1570, -0.4518,  0.1211,  0.3435, -0.2951,  0.3244
 0.7117, -0.6099,  0.4946, -0.4208,  0.5476,  0.1096
-0.2929, -0.5726,  0.5346, -0.3827,  0.4665,  0.2465
 0.4889, -0.5572, -0.5718, -0.6021, -0.7150,  0.2163
-0.7782,  0.3491,  0.5996, -0.8389, -0.5366,  0.6516
-0.5847,  0.8347,  0.4226,  0.1078, -0.3910,  0.6134
 0.8469,  0.4121, -0.0439, -0.7476,  0.9521,  0.1571
-0.6803, -0.5948, -0.1376, -0.1916, -0.7065,  0.7156
 0.2878,  0.5086, -0.5785,  0.2019,  0.4979,  0.2980
 0.2764,  0.1943, -0.4090,  0.4632,  0.8906,  0.2960
-0.8877,  0.6705, -0.6155, -0.2098, -0.3998,  0.7107
-0.8398,  0.8093, -0.2597,  0.0614, -0.0118,  0.6502
-0.8476,  0.0158, -0.4769, -0.2859, -0.7839,  0.7715
 0.5751, -0.7868,  0.9714, -0.6457,  0.1448,  0.1175
 0.4802, -0.7001,  0.1022, -0.5668,  0.5184,  0.1090
 0.4458, -0.6469,  0.7239, -0.9604,  0.7205,  0.0779
 0.5175,  0.4339,  0.9747, -0.4438, -0.9924,  0.2879
 0.8678,  0.7158,  0.4577,  0.0334,  0.4139,  0.1678
 0.5406,  0.5012,  0.2264, -0.1963,  0.3946,  0.2088
-0.9938,  0.5498,  0.7928, -0.5214, -0.7585,  0.7687
 0.7661,  0.0863, -0.4266, -0.7233, -0.4197,  0.1466
 0.2277, -0.3517, -0.0853, -0.1118,  0.6563,  0.1767
 0.3499, -0.5570, -0.0655, -0.3705,  0.2537,  0.1632
 0.7547, -0.1046,  0.5689, -0.0861,  0.3125,  0.1257
 0.8186,  0.2110,  0.5335,  0.0094, -0.0039,  0.1391
 0.6858, -0.8644,  0.1465,  0.8855,  0.0357,  0.1845
-0.4967,  0.4015,  0.0805,  0.8977,  0.2487,  0.4663
 0.6760, -0.9841,  0.9787, -0.8446, -0.3557,  0.1509
-0.1203, -0.4885,  0.6054, -0.0443, -0.7313,  0.4854
 0.8557,  0.7919, -0.0169,  0.7134, -0.1628,  0.2002
 0.0115, -0.6209,  0.9300, -0.4116, -0.7931,  0.4052
-0.7114, -0.9718,  0.4319,  0.1290,  0.5892,  0.3661
 0.3915,  0.5557, -0.1870,  0.2955, -0.6404,  0.2954
-0.3564, -0.6548, -0.1827, -0.5172, -0.1862,  0.4622
 0.2392, -0.4959,  0.5857, -0.1341, -0.2850,  0.2470
-0.3394,  0.3947, -0.4627,  0.6166, -0.4094,  0.5325
 0.7107,  0.7768, -0.6312,  0.1707,  0.7964,  0.2757
-0.1078,  0.8437, -0.4420,  0.2177,  0.3649,  0.4028
-0.3139,  0.5595, -0.6505, -0.3161, -0.7108,  0.5546
 0.4335,  0.3986,  0.3770, -0.4932,  0.3847,  0.1810
-0.2562, -0.2894, -0.8847,  0.2633,  0.4146,  0.4036
 0.2272,  0.2966, -0.6601, -0.7011,  0.0284,  0.2778
-0.0743, -0.1421, -0.0054, -0.6770, -0.3151,  0.3597
-0.4762,  0.6891,  0.6007, -0.1467,  0.2140,  0.4266
-0.4061,  0.7193,  0.3432,  0.2669, -0.7505,  0.6147
-0.0588,  0.9731,  0.8966,  0.2902, -0.6966,  0.4955
-0.0627, -0.1439,  0.1985,  0.6999,  0.5022,  0.3077
 0.1587,  0.8494, -0.8705,  0.9827, -0.8940,  0.4263
-0.7850,  0.2473, -0.9040, -0.4308, -0.8779,  0.7199
 0.4070,  0.3369, -0.2428, -0.6236,  0.4940,  0.2215
-0.0242,  0.0513, -0.9430,  0.2885, -0.2987,  0.3947
-0.5416, -0.1322, -0.2351, -0.0604,  0.9590,  0.3683
 0.1055,  0.7783, -0.2901, -0.5090,  0.8220,  0.2984
-0.9129,  0.9015,  0.1128, -0.2473,  0.9901,  0.4776
-0.9378,  0.1424, -0.6391,  0.2619,  0.9618,  0.5368
 0.7498, -0.0963,  0.4169,  0.5549, -0.0103,  0.1614
-0.2612, -0.7156,  0.4538, -0.0460, -0.1022,  0.3717
 0.7720,  0.0552, -0.1818, -0.4622, -0.8560,  0.1685
-0.4177,  0.0070,  0.9319, -0.7812,  0.3461,  0.3052
-0.0001,  0.5542, -0.7128, -0.8336, -0.2016,  0.3803
 0.5356, -0.4194, -0.5662, -0.9666, -0.2027,  0.1776
-0.2378,  0.3187, -0.8582, -0.6948, -0.9668,  0.5474
-0.1947, -0.3579,  0.1158,  0.9869,  0.6690,  0.2992
 0.3992,  0.8365, -0.9205, -0.8593, -0.0520,  0.3154
-0.0209,  0.0793,  0.7905, -0.1067,  0.7541,  0.1864
-0.4928, -0.4524, -0.3433,  0.0951, -0.5597,  0.6261
-0.8118,  0.7404, -0.5263, -0.2280,  0.1431,  0.6349
 0.0516, -0.8480,  0.7483,  0.9023,  0.6250,  0.1959
-0.3212,  0.1093,  0.9488, -0.3766,  0.3376,  0.2735
-0.3481,  0.5490, -0.3484,  0.7797,  0.5034,  0.4379
-0.5785, -0.9170, -0.3563, -0.9258,  0.3877,  0.4121
 0.3407, -0.1391,  0.5356,  0.0720, -0.9203,  0.3458
-0.3287, -0.8954,  0.2102,  0.0241,  0.2349,  0.3247
-0.1353,  0.6954, -0.0919, -0.9692,  0.7461,  0.3338
 0.9036, -0.8982, -0.5299, -0.8733, -0.1567,  0.1187
 0.7277, -0.8368, -0.0538, -0.7489,  0.5458,  0.0830
 0.9049,  0.8878,  0.2279,  0.9470, -0.3103,  0.2194
 0.7957, -0.1308, -0.5284,  0.8817,  0.3684,  0.2172
 0.4647, -0.4931,  0.2010,  0.6292, -0.8918,  0.3371
-0.7390,  0.6849,  0.2367,  0.0626, -0.5034,  0.7039
-0.1567, -0.8711,  0.7940, -0.5932,  0.6525,  0.1710
 0.7635, -0.0265,  0.1969,  0.0545,  0.2496,  0.1445
 0.7675,  0.1354, -0.7698, -0.5460,  0.1920,  0.1728
-0.5211, -0.7372, -0.6763,  0.6897,  0.2044,  0.5217
 0.1913,  0.1980,  0.2314, -0.8816,  0.5006,  0.1998
 0.8964,  0.0694, -0.6149,  0.5059, -0.9854,  0.1825
 0.1767,  0.7104,  0.2093,  0.6452,  0.7590,  0.2832
-0.3580, -0.7541,  0.4426, -0.1193, -0.7465,  0.5657
-0.5996,  0.5766, -0.9758, -0.3933, -0.9572,  0.6800
 0.9950,  0.1641, -0.4132,  0.8579,  0.0142,  0.2003
-0.4717, -0.3894, -0.2567, -0.5111,  0.1691,  0.4266
 0.3917, -0.8561,  0.9422,  0.5061,  0.6123,  0.1212
-0.0366, -0.1087,  0.3449, -0.1025,  0.4086,  0.2475
 0.3633,  0.3943,  0.2372, -0.6980,  0.5216,  0.1925
-0.5325, -0.6466, -0.2178, -0.3589,  0.6310,  0.3568
 0.2271,  0.5200, -0.1447, -0.8011, -0.7699,  0.3128
 0.6415,  0.1993,  0.3777, -0.0178, -0.8237,  0.2181
-0.5298, -0.0768, -0.6028, -0.9490,  0.4588,  0.4356
 0.6870, -0.1431,  0.7294,  0.3141,  0.1621,  0.1632
-0.5985,  0.0591,  0.7889, -0.3900,  0.7419,  0.2945
 0.3661,  0.7984, -0.8486,  0.7572, -0.6183,  0.3449
 0.6995,  0.3342, -0.3113, -0.6972,  0.2707,  0.1712
 0.2565,  0.9126,  0.1798, -0.6043, -0.1413,  0.2893
-0.3265,  0.9839, -0.2395,  0.9854,  0.0376,  0.4770
 0.2690, -0.1722,  0.9818,  0.8599, -0.7015,  0.3954
-0.2102, -0.0768,  0.1219,  0.5607, -0.0256,  0.3949
 0.8216, -0.9555,  0.6422, -0.6231,  0.3715,  0.0801
-0.2896,  0.9484, -0.7545, -0.6249,  0.7789,  0.4370
-0.9985, -0.5448, -0.7092, -0.5931,  0.7926,  0.5402