An Efficient Accuracy Function for PyTorch Regression

A regression problem is one where the goal is to predict a numeric value. A classic example is the Boston Housing problem, where the goal is to predict the median house price of a small town near Boston. There are 13 predictor variables: crime rate in the town, large lot percentage, industry information, Charles River information, pollution, average number of rooms per house, house age information, distance to Boston, accessibility to highways, tax rate, pupil-teacher ratio, proportion of Black residents, and percentage of low (social) status residents.

Most neural network code libraries have a built-in accuracy function for classification problems because there's no ambiguity about whether a prediction is right or wrong. For example, if you're predicting the political party affiliation of a person (Democrat, Republican, Other), a prediction is either correct or it isn't.

But with any regression problem, you have to define what a correct prediction is. Typically you define correctness as being within a certain percentage of the true value. For example, if a correct median house price is one that’s within 10% of the true value, then if a town has median house price of $60,000 then a correct prediction is a value between $54,000 and $66,000.
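The "within a certain percentage of the true value" rule can be sketched as a one-line predicate (the function name and values here are my own illustration, not from the post):

```python
# Minimal sketch of the "correct if within pct_close of the true value" rule.
def is_correct(pred, actual, pct_close=0.10):
    return abs(pred - actual) < abs(pct_close * actual)

print(is_correct(55000.0, 60000.0))  # True: $55,000 is within 10% of $60,000
print(is_correct(53000.0, 60000.0))  # False: below the $54,000 lower bound
```

Taking the absolute value of `pct_close * actual` matters when target values can be negative, so the tolerance band is always a positive width.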

The PyTorch library is relatively low-level and works with Tensor objects. Just to stay in practice, I decided to code up a PyTorch accuracy function that’s efficient in the sense that it works directly on Tensors.

Here’s a relatively inefficient (but clear) accuracy function:

import numpy as np
import torch as T

def accuracy(model, data_x, data_y, pct_close):
  # data_x and data_y are numpy array-of-arrays matrices
  n_items = len(data_x)    # number of items
  n_correct = 0; n_wrong = 0
  for i in range(n_items):
    X = T.Tensor(data_x[i])    # one input row as a Tensor
    oupt = model(X)            # model output as a [1] Tensor
    pred_y = oupt.item()       # extract the scalar prediction

    if np.abs(pred_y - data_y[i]) < \
      np.abs(pct_close * data_y[i]):
      n_correct += 1
    else:
      n_wrong += 1
  return (n_correct * 100.0) / (n_correct + n_wrong)
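To show the calling convention, here's a self-contained sketch that pairs the loop version above with a deterministic toy "model" (a fixed-weight linear layer that predicts the sum of its two inputs). The weights and data are made up for illustration; real code would pass a trained network:

```python
import numpy as np
import torch as T

def accuracy(model, data_x, data_y, pct_close):
  # loop version from above, repeated so this sketch runs standalone
  n_items = len(data_x)
  n_correct = 0; n_wrong = 0
  for i in range(n_items):
    X = T.Tensor(data_x[i])
    pred_y = model(X).item()
    if np.abs(pred_y - data_y[i]) < np.abs(pct_close * data_y[i]):
      n_correct += 1
    else:
      n_wrong += 1
  return (n_correct * 100.0) / (n_correct + n_wrong)

model = T.nn.Linear(2, 1)
with T.no_grad():
    model.weight.copy_(T.tensor([[1.0, 1.0]]))  # predict x0 + x1
    model.bias.zero_()

data_x = np.array([[3.0, 7.0], [5.0, 5.0]])  # both predictions are 10.0
data_y = np.array([10.5, 20.0])              # first is within 10%, second isn't
print(accuracy(model, data_x, data_y, 0.10)) # 50.0
```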

And here’s a relatively efficient (but not-so clear) Tensor version:

import torch as T

def akkuracy(model, data_x, data_y, pct_close):
  # pure Tensor, efficient version
  n_items = len(data_y)
  X = T.Tensor(data_x)
  Y = T.Tensor(data_y)       # actual values as a [102] Tensor

  oupt = model(X)            # predictions as a [102,1] Tensor
  pred = oupt.view(n_items)  # reshape predictions to [102]

  n_correct = T.sum(T.abs(pred - Y) < T.abs(pct_close * Y))
  acc = n_correct.item() * 100.0 / n_items  # scalar percentage
  return acc
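The heart of the Tensor version is the vectorized comparison, which produces a Boolean Tensor that `T.sum()` then counts. Here's a small standalone sketch of just that mechanic, using made-up values:

```python
import torch as T

# Vectorized correctness check: one elementwise comparison, no Python loop.
actual = T.tensor([60000.0, 30000.0, 45000.0])
pred   = T.tensor([55000.0, 36000.0, 44000.0])
pct_close = 0.10

correct = T.abs(pred - actual) < T.abs(pct_close * actual)
# correct is tensor([True, False, True]): item 1 misses by 6,000 vs a
# 3,000 tolerance, the other two are within 10% of their true values
n_correct = T.sum(correct).item()   # 2
acc = n_correct * 100.0 / len(actual)
print(acc)
```

Because the comparison runs inside PyTorch's C++ backend over the whole batch at once, it avoids the per-item Tensor construction and `.item()` calls that make the loop version slow.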

There’s a lot going on in both versions and a full explanation would take several pages. But if you’re reading this post, you likely have some coding experience and if you examine the code closely, it should (eventually) make sense.

My friends who don’t work in technology sometimes ask me what I do, and sometimes I’ll explain that I work with machine learning and artificial intelligence and that I write code every day. They’ll often comment that coding is very intricate, low-level stuff and isn’t the Big Picture of ML and AI more important?

Well, yes, the Big Picture is more important. But an expert can't be an expert unless he really understands the nuts-and-bolts of ML/AI. The company I work for sometimes hosts external speakers. Over the past few weeks I've heard several talks on ML/AI policy by so-called experts. The talks were laughable: the speakers were parroting generalities about interpretability of results, bias, fairness, policy, and other related ML/AI Big Picture topics, but it was clear they really didn't understand ML/AI at anything other than a superficial blah-blah-blah level, making their opinions pretty much irrelevant.



Hollywood celebrities are experts on just about everything: philosophy, the world's most important problems, travel.

This entry was posted in Machine Learning, PyTorch.

1 Response to An Efficient Accuracy Function for PyTorch Regression

  1. Peter Boos says:

    Here are some more kernels (it's a competition):
    https://www.kaggle.com/c/house-prices-advanced-regression-techniques/kernels

    I don’t have all the time in the world, but Kaggle is a nice place to start to see how people tackle these kinds of problems, and it’s also very interesting to see that sometimes fairly simple networks are quite effective compared to deep networks (and with much shorter training time).

Comments are closed.