Using Inverse Softmax Weighting or Modified Uniform Weighting Instead of Inverse Distance Weighting

I devised what I think are two new and possibly useful micro-algorithms. Suppose you are doing k-nearest neighbors regression with k = 3. To predict the y value for an x value, you find the 3 closest training data points, and then return the average of the 3 known correct target values. For example, suppose x = (2.0, 7.0, 1.0) and:

closest train Xs  distance to x     target y

(2.0, 5.0, 1.0)   sqrt(4.0) = 2.00    0.50
(3.0, 8.0, 3.0)   sqrt(6.0) = 2.45    0.60
(4.0, 9.0, 2.0)   sqrt(9.0) = 3.00    0.90

The simplest prediction strategy is to just use the average of the 3 target values: predicted y = (0.50 + 0.60 + 0.90) / 3 = 0.67.

Note that this is equivalent to weighting each target y equally by 1 / k = 0.33: (0.33 * 0.50) + (0.33 * 0.60) + (0.33 * 0.90) = 0.67.
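The equal-weight arithmetic is easy to verify. A quick sketch in Python (Python just to keep the arithmetic short; the demo code later in this post is C#):

```python
# k-NN regression with uniform weighting: each of the k nearest
# neighbors contributes 1/k of its target value to the prediction.
ys = [0.50, 0.60, 0.90]   # target y values of the 3 nearest neighbors
k = len(ys)
pred = sum((1.0 / k) * y for y in ys)
print(round(pred, 2))     # 0.67
```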



[Figure: example of k-nearest neighbors regression using the modified uniform weights described in this post.]


Uniform weighting works pretty well, but conceptually it makes sense that closer data items should have more influence; with uniform weighting, all items contribute equally regardless of distance.

The standard improved technique (spoiler: it doesn’t work) is to weight each target y by the inverse of the distance from x to the associated training item. Smaller distances get larger weights.

closest train Xs  distance to x      inverse distance    target y

(2.0, 5.0, 1.0)   sqrt(4.0) = 2.00   1 / 2.00 = 0.5000   0.50
(3.0, 8.0, 3.0)   sqrt(6.0) = 2.45   1 / 2.45 = 0.4082   0.60
(4.0, 9.0, 2.0)   sqrt(9.0) = 3.00   1 / 3.00 = 0.3333   0.90
                                                ------
                                                1.2415

The weights are computed like so:

weights                    target y

0.5000 / 1.2415 = 0.4027   0.50
0.4082 / 1.2415 = 0.3288   0.60
0.3333 / 1.2415 = 0.2685   0.90
                  ------
                  1.0000     

And so the predicted y = (0.4027 * 0.50) + (0.3288 * 0.60) + (0.2685 * 0.90) = 0.64, which is less than the equal-weight predicted y value of 0.67.
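The inverse distance computation above can be sketched in a few lines of Python (same three distances and targets as the table):

```python
# Inverse distance weighting: weight_i = (1/d_i) / sum_j (1/d_j).
# Breaks down when any d_i is 0 or near 0 -- see below.
dists = [2.00, 2.45, 3.00]
ys = [0.50, 0.60, 0.90]

invs = [1.0 / d for d in dists]             # 0.5000, 0.4082, 0.3333
wts = [v / sum(invs) for v in invs]         # 0.4027, 0.3288, 0.2685
pred = sum(w * y for w, y in zip(wts, ys))
print(round(pred, 2))                       # 0.64
```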

Unfortunately, the inverse distance weighting technique is great in theory but just doesn’t work in practice. The problem occurs when a distance is 0 or very close to 0: you can’t divide by 0, and the inverse of a very small number is huge, so a single training item can completely dominate the prediction. Surprisingly, there’s no easy solution to this zero-or-near-zero distance problem for inverse weighting.

But I came up with two ideas for a different type of weighting.

The first idea is to compute the softmax of the nearest distances, and then use the inverses of those values. This works because softmax produces pseudo-probabilities that sum to 1 and are strictly positive, so no value is ever 0 or negative and the inverses are always safe to compute. Like this:

closest train Xs  distance to x      target y

(2.0, 5.0, 1.0)   sqrt(4.0) = 2.00   0.50
(3.0, 8.0, 3.0)   sqrt(6.0) = 2.45   0.60
(4.0, 9.0, 2.0)   sqrt(9.0) = 3.00   0.90

First, the 3 softmax values of the 3 closest distances are computed and then inverted:

distance  exp(distance)  exp / sum   inverse softmax

2.00       7.3891        0.1891      1 / 0.1891 = 5.2866
2.45      11.5884        0.2967      1 / 0.2967 = 3.3709
3.00      20.0856        0.5142      1 / 0.5142 = 1.9448
          -------        ------                   ------
          39.0629        1.0000                  10.6023

Then the weights are:

weights                     target y

5.2866 / 10.6023 = 0.4986   0.50
3.3709 / 10.6023 = 0.3179   0.60
1.9448 / 10.6023 = 0.1834   0.90
                  ------
                   1.0000     

And so the predicted y = (0.4986 * 0.50) + (0.3179 * 0.60) + (0.1834 * 0.90) = 0.61, which is less than the equal-weight predicted y value of 0.67.

To recap, instead of using inverse distances to compute weights, you can first compute the softmax of the distances, which avoids errors with small or zero distances, then use the inverses of the softmax values.
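The whole recipe is only a few lines. A minimal Python sketch that reproduces the worked example (no overflow protection yet; that comes next):

```python
import math

# Inverse softmax weighting: softmax the distances, invert the
# softmax values, then normalize the inverses to sum to 1.
dists = [2.00, 2.45, 3.00]
ys = [0.50, 0.60, 0.90]

exps = [math.exp(d) for d in dists]
sm = [e / sum(exps) for e in exps]      # softmax: positive, sums to 1
invs = [1.0 / s for s in sm]            # safe: softmax is never 0
wts = [v / sum(invs) for v in invs]     # 0.4986, 0.3179, 0.1834
pred = sum(w * y for w, y in zip(wts, ys))
print(round(pred, 2))                   # 0.61
```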

I’ve tried some experiments using this inverse softmax weighting technique and it works pretty well. There is one engineering pitfall: computing softmax values can blow up if a distance is very large, because exp(distance) overflows. But that problem can be dealt with too. See https://jamesmccaffreyblog.com/2016/03/04/the-max-trick-when-computing-softmax/.
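The max trick is just a shift: because softmax(x) = softmax(x - max), subtracting the largest distance before exponentiating leaves the results unchanged while keeping exp() from ever seeing a huge argument. A Python sketch:

```python
import math

def softmax(xs):
    # subtract the max first; softmax is shift-invariant, so the
    # results are identical, but exp() never gets a huge argument
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# a naive exp(1000.0) overflows a double, but this is fine:
sm = softmax([998.0, 999.0, 1000.0])
print([round(v, 4) for v in sm])   # [0.09, 0.2447, 0.6652]
```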

Here’s some example C# code. Note that the max trick is implemented by subtracting the max distance before calling Math.Exp() — because softmax is shift-invariant, this gives exactly the same softmax values as raw exponentiation, matching the worked example above:

  static double[] InverseSoftmaxWts(double[] dists)
  {
    // assumes distances are sorted low to high
    int n = dists.Length;
    double[] result = new double[n];

    // compute softmax using the max trick to avoid exp(huge)
    double max = dists[0];
    for (int i = 0; i < n; ++i)
      if (dists[i] > max) max = dists[i];

    double[] modifiedDists = new double[n];
    for (int i = 0; i < n; ++i)
      modifiedDists[i] = dists[i] - max;  // shift; softmax unchanged

    double sum = 0.0;
    for (int i = 0; i < n; ++i)
    {
      double x = Math.Exp(modifiedDists[i]);
      result[i] = x;
      sum += x;
    }

    for (int i = 0; i < n; ++i)
      result[i] /= sum;

    // invert so small softmax values get larger weights
    sum = 0.0;
    for (int i = 0; i < n; ++i)
    {
      double x = 1.0 / result[i];
      result[i] = x;
      sum += x;
    }

    for (int i = 0; i < n; ++i)
      result[i] /= sum;

    return result;
  } // InverseSoftmaxWts

The inverse softmax weighting technique gives the smallest distance a relatively high weight and the largest distance a very small weight. My second weighting micro-algorithm for k-NN regression modifies basic uniform weighting so that the shortest distance has the biggest weight, the longest distance has the smallest weight, and all the middle distances get the same intermediate weight (so that all the weights sum to 1.0). For example, if there are k = 4 distances, the four weights are (0.3750, 0.2500, 0.2500, 0.1250). If k = 5, the weights are (0.3000, 0.2000, 0.2000, 0.2000, 0.1000). And so on.

For k = 1 or 2 or 3 it’s necessary to hard-code the weight values. Here’s some C# code:

  static double[] ModifiedUniformWts(int k)
  {
    double[] result = new double[k];
    if (k == 1) result[0] = 1.0;
    else if (k == 2)
    {
      result[0] = 0.6000;
      result[1] = 0.4000;
    }
    else if (k == 3)
    {
      result[0] = 0.4000;
      result[1] = 0.3500;
      result[2] = 0.2500;
    }
    else if (k >= 4)
    {
      double big = 1.5 * (1.0 / k);  // ex: 1.5 * 0.25 = 0.3750
      double small = 0.5 * (1.0 / k);  // 0.5 * 0.25 = 0.1250
      double remainder = 1.0 - (big + small);  // 0.5000
      double x = remainder / (k - 2);  // 0.2500
      result[0] = big;
      result[k - 1] = small;
      for (int i = 1; i < k - 1; ++i)  // middle wts
        result[i] = x;
    }
    return result;  // ex: 0.3750, 0.2500, 0.2500, 0.1250
  }
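For reference, here is the same weight scheme sketched in Python (the function name modified_uniform_wts is mine, not from any library), handy for sanity-checking the C# version:

```python
def modified_uniform_wts(k):
    # first weight is 1.5/k, last is 0.5/k, and the k-2 middle
    # weights split the remaining mass evenly; k = 1..3 hard-coded
    if k == 1: return [1.0]
    if k == 2: return [0.60, 0.40]
    if k == 3: return [0.40, 0.35, 0.25]
    big = 1.5 / k
    small = 0.5 / k
    mid = (1.0 - big - small) / (k - 2)
    return [big] + [mid] * (k - 2) + [small]

print(modified_uniform_wts(4))   # [0.375, 0.25, 0.25, 0.125]
```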


The “Outer Limits” TV series ran from 1963 to 1965 (49 black and white episodes), and then again from 1995 to 2002 (152 color episodes).

Left: The alien in “The Galaxy Being” was filmed using inverse photography. The alien is accidentally transported to Earth. He is good but is met with suspicion. This was the very first Outer Limits episode.

Center: The alien in “The Bellero Shield” travels to Earth via a scientist’s laser experiment. The alien is good but the scientist’s wife is evil and kills the alien in order to try and get his force field device.

Right: The aliens in “Second Soul” contact Earth and ask if they can have dead bodies to inhabit. The story leads you to believe that the aliens are bad but it turns out they’re good.


This entry was posted in Machine Learning.

1 Response to Using Inverse Softmax Weighting or Modified Uniform Weighting Instead of Inverse Distance Weighting

  1. Thorsten Kleppe says:

    k-NN is currently making waves in combination with gzip in the paper:
    “Low Resource” Text Classification: A Parameter-Free Classification method with compressors

    With your idea, the whole thing could become even more interesting. I am very excited. But I have to let it all sink in and understand, great blog post James.
