While I was waiting in an airport to fly to a conference recently, I got the idea of using a technique called roulette wheel selection to solve multi-armed bandit problems. Bottom line: the technique appears to work very well.
Imagine you’re standing in front of three slot machines. You have 40 tokens. Each machine pays either $1 or $0 per play according to a probability distribution that is unknown to you. Your challenge is to use your 40 tokens optimally and find the best machine.
There are many techniques to solve this kind of multi-armed bandit problem: epsilon-first, epsilon-greedy, UCB1, and Thompson sampling (beta distribution) are among the best known.
In pseudo-code, most of these techniques look like:
track number of S = successes (wins), and F = failures (losses) for each machine loop 40 times use curr S and F values to pick a good machine play the selected machine update S or F for the played machine end-loop
You don’t want to pick the current best machine on each iteration because the machines pay out probabilistically and you may not have correctly identified the true best machine. You need to balance exploring for the best machine with exploiting the current best machine.
So my thought was to use a technique from evolutionary optimization called roulette wheel selection to pick a machine to play. Suppose at some point in time the situation is:
machine 0 : S = 4, F = 2 machine 1 : S = 3, F = 1 machine 2 : S = 2, F = 3
You can use S and F to estimate the probability of each machine winning:
machine 0 : P = 4/6 = 0.67 machine 1 : P = 3/4 = 0.75 machine 2 : P = 2/5 = 0.40
You normalize the probabilities into fitness values that sum to 1.0:
machine 0 : f = 0.67 / (0.67 + 0.75 + 0.40) = 0.37 machine 1 : f = 0.75 / (0.67 + 0.75 + 0.40) = 0.41 machine 2 : f = 0.40 / (0.67 + 0.75 + 0.40) = 0.22
You want to select a machine in a way that is proportional to its fitness value, in other words, machine 1 should have the best chance to be selected, followed by machine 0, then machine 2.
Now the accumulated fitness values and associated index values are:
0.00 0.37 0.78 1.00
0 1 2
You generate a random value between 0.0 and 1.0 which determines the selected machine. For example, if p = 0.30 (between 0.0 and 0.37), machine 0 is selected. If p = 0.55 (between 0.37 and 0.78), machine 1 is selected and so on.
I coded up a demo and the roulette wheel selection technique seems to work extremely well. The roulette wheel technique is closest to Thompson sampling. Thompson sampling has the drawback that as the values of S and F increase, the likelihood of picking the machine with the best current estimated probability of winning becomes close to 1. In other words, you don’t explore non-current-best machines. The roulette wheel selection technique does not have this disadvantage.
Wow. Maybe I’m smarter than I thought I was. Or maybe not.

Left: Roulette wheels are mechanical works of art. Center: Humphrey Bogart in “Casablanca” (1942). Right: Ursula Andress in “Casino Royale” (1967)

.NET Test Automation Recipes
Software Testing
SciPy Programming Succinctly
Keras Succinctly
R Programming
2026 Visual Studio Live
2025 Summer MLADS Conference
2026 DevIntersection Conference
2025 Machine Learning Week
2025 Ai4 Conference
2026 G2E Conference
2026 iSC West Conference
The roulette wheel selection algorithm sounds absolutely amazing. I have never heard about that, but your explanation was really fantastic.
The idea feels like a softmax classification to squeeze the space to make decisions what is the best to do. Crazy stuff!
I tried to code the algorithm, here is my short implementation:
int S[]={4,3,2}, F[]={2,1,3};
float machine[3]={};
float selected = 0.5f, step=0, sum=0;
for(int i=0;i<3;i++)
sum += machine[i] = (float)S[i] / (S[i]+F[i]);
for(int i=0;i=step&&selected<=machine[i] / sum + step)
Print("machine ",i," selected by ",selected," between ",step," and ",machine[i] / sum + step);
Hope the code is on a piece and you like it.
I once had the case that the rand() was too slow.
On the Intel site they showed a nice fast_rand() function on listing 1, that works around 2 times faster in my tests.
https://software.intel.com/en-us/articles/fast-random-number-generator-on-the-intel-pentiumr-4-processor
With a lot of iterations this could be very powerful like this:
selected = fast_rand() / 32767.0f;