Consider the problem of solving a maze. You have a “state” that describes your location in the maze (and possibly other things, such as energy level). For every state there are possible “actions,” such as move-left. So you can construct a state-action (SA) table that lists every possible state and every possible action from each state. For example, if state = cell23, the possible actions might be (move-left, move-up, move-down) because a wall blocks the move-right action.
If you repeatedly start at a random state and then move through the maze until you find the end, you can construct a Q table that assigns a “quality” value q to every combination of state and action. For example, q = Q(cell23, move-right) = 5.67 and q = Q(cell90, move-down) = 0.15. The Q table is constructed using something called the Bellman equation. With the Q table constructed, you can solve the maze by taking the action that gives the largest quality value at each state.
Left: A very simple maze, with the Bellman equation below it. Right: The Q table for the maze, computed using the Bellman equation.
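The table-based approach can be sketched in a few dozen lines of Python. This is a minimal example, not the maze from the figure: the 3x3 grid (with no interior walls, to keep the sketch short), the rewards, and the learning parameters are all invented for illustration. The key line is the Bellman update inside the loop, which nudges Q(s, a) toward r + gamma * max Q(s', a').

```python
import random

# Tabular Q-learning on a tiny 3x3 grid "maze". States are cells 0..8;
# the goal is cell 8. Actions: 0=up, 1=down, 2=left, 3=right.
N = 3
GOAL = 8

def legal_actions(s):
    """Actions that do not run into the outer wall from cell s."""
    r, c = divmod(s, N)
    acts = []
    if r > 0: acts.append(0)
    if r < N - 1: acts.append(1)
    if c > 0: acts.append(2)
    if c < N - 1: acts.append(3)
    return acts

def step(s, a):
    """Apply action a in cell s; reward +10 at the goal, -0.1 per move."""
    r, c = divmod(s, N)
    if a == 0: r -= 1
    elif a == 1: r += 1
    elif a == 2: c -= 1
    else: c += 1
    s2 = r * N + c
    return s2, (10.0 if s2 == GOAL else -0.1)

Q = [[0.0] * 4 for _ in range(N * N)]   # the Q table, Q[state][action]
gamma, lr, eps = 0.9, 0.5, 0.2          # discount, learning rate, exploration

random.seed(0)
for episode in range(1000):
    s = random.randint(0, N * N - 2)    # random non-goal start cell
    while s != GOAL:
        acts = legal_actions(s)
        a = random.choice(acts) if random.random() < eps \
            else max(acts, key=lambda x: Q[s][x])
        s2, reward = step(s, a)
        # Bellman update: nudge Q(s,a) toward reward + gamma * max_a' Q(s',a')
        best_next = 0.0 if s2 == GOAL else max(Q[s2][x] for x in legal_actions(s2))
        Q[s][a] += lr * (reward + gamma * best_next - Q[s][a])
        s = s2

# Greedy walk from cell 0 using the learned Q table.
s, path = 0, [0]
while s != GOAL and len(path) < 10:
    a = max(legal_actions(s), key=lambda x: Q[s][x])
    s, _ = step(s, a)
    path.append(s)
print(path)   # a shortest route from cell 0 to cell 8 (5 cells, 4 moves)
```

After training, following the largest q value at each cell traces a shortest path to the goal, which is exactly the "solve the maze from the Q table" step described above.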
This is (regular) reinforcement learning. The technique will work if a.) all states are discrete, b.) you don’t have an insanely large number of states, and c.) you can iterate through the maze zillions of times.
In practice, most interesting problems either have non-discrete states, or have an astronomically large number of states, or both. In such situations you can’t construct a state-action table, and therefore you can’t directly construct a Q table. To get around this problem, you can create a deep neural network that predicts the Q value for any possible state-action pair. For example, q = nn.predict(cell9204816537, move-left). With a way to estimate the q value for any possible state and action, you can use standard reinforcement learning algorithms. This is deep reinforcement learning.
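The idea of replacing the Q table with a trained network can be sketched as below. This is a deliberately tiny stand-in, not a real deep RL system: the two-layer network, its size, the one-hot features, and all hyperparameters are made up for the example, and production systems such as DQN add experience replay, target networks, and far larger models. The 3x3 maze is small enough for a table, of course; it is used here only so the sketch is runnable.

```python
import numpy as np

# Estimate q = predict(state, action) with a small neural network instead
# of looking it up in a Q table, on a tiny 3x3 grid maze (goal = cell 8).
N, GOAL = 3, 8

def legal_actions(s):
    r, c = divmod(s, N)
    # Actions: 0=up, 1=down, 2=left, 3=right (blocked by the outer wall).
    return [a for a, ok in enumerate([r > 0, r < N - 1, c > 0, c < N - 1]) if ok]

def step(s, a):
    r, c = divmod(s, N)
    r += (a == 1) - (a == 0)
    c += (a == 3) - (a == 2)
    s2 = r * N + c
    return s2, (10.0 if s2 == GOAL else -0.1)

rng = np.random.default_rng(0)
N_IN, N_HID = 13, 16                     # one-hot state (9) + one-hot action (4)
W1 = rng.normal(0, 0.3, (N_HID, N_IN))   # made-up tiny two-layer network
b1 = np.zeros(N_HID)
W2 = rng.normal(0, 0.3, N_HID)
b2 = 0.0

def features(s, a):
    x = np.zeros(N_IN)
    x[s] = 1.0
    x[9 + a] = 1.0
    return x

def predict(s, a):
    """Estimated q value for (state, action) -- replaces the Q table lookup."""
    h = np.tanh(W1 @ features(s, a) + b1)
    return float(W2 @ h + b2)

def train_step(s, a, target, lr=0.02):
    """One gradient step moving predict(s, a) toward the Bellman target."""
    global W1, b1, W2, b2
    x = features(s, a)
    h = np.tanh(W1 @ x + b1)
    err = (W2 @ h + b2) - target          # derivative of 0.5 * (q - target)^2
    dh = err * W2 * (1.0 - h * h)         # backprop through tanh
    W2 -= lr * err * h;  b2 -= lr * err
    W1 -= lr * np.outer(dh, x);  b1 -= lr * dh

gamma, eps = 0.9, 0.2
for episode in range(5000):
    s = int(rng.integers(0, GOAL))        # random non-goal start cell
    for _ in range(50):                   # cap episode length
        acts = legal_actions(s)
        a = int(rng.choice(acts)) if rng.random() < eps \
            else max(acts, key=lambda x: predict(s, x))
        s2, reward = step(s, a)
        target = reward if s2 == GOAL else \
            reward + gamma * max(predict(s2, x) for x in legal_actions(s2))
        train_step(s, a, target)
        s = s2
        if s == GOAL:
            break

# A move that reaches the goal (cell 7, move-right) should now score
# higher than a move far from the goal (cell 0, move-down).
print(round(predict(7, 3), 2), round(predict(0, 1), 2))
```

The structure of the learning loop is identical to the tabular version; the only change is that reads and writes of the Q table become calls to predict and train_step, which is the essential shift from regular to deep reinforcement learning.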
In other words, regular reinforcement learning uses a Q table to compute quality values for taking some action from some state. Deep reinforcement learning uses a deep neural network to compute an estimated quality value for taking some action from some state.
Deep reinforcement learning has produced some astonishing results, notably DeepMind’s AlphaZero chess program. However, it’s not at all clear whether deep reinforcement learning can do anything practical. So far the answer is no, but many very smart people believe that deep RL will eventually produce useful systems.
Paint-by-numbers kits were introduced in 1951 under the Craft Master brand by businessman Max Klein and artist Dan Robbins. The idea was hugely successful, and millions of kits were sold to households across America. Other companies quickly joined Craft Master, and paint by numbers is still popular today. Here are the box cover and two paintings from a 1958 space-themed kit from ToyKraft aimed at children. Paint by numbers is sort of a way to democratize art. There are efforts under way to democratize ML and deep reinforcement learning too, but it may take a while for these efforts to become reality.

