Implementing an LSTM Cell using Python

Just for fun, while I was eating breakfast one morning, I decided to code up an LSTM cell using Python. So I did.

An LSTM cell is a complex software module that accepts input (as a vector), generates output, and maintains cell state. If you connect an LSTM cell with some additional plumbing, you get an LSTM network. These networks can be used with sequence data, such as a sequence of words in a sentence.

As my base reference, I used the description given in the Wikipedia entry on LSTMs. There are many, many variations of LSTMs, and I used the simplest.
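For reference, the arithmetic in the compute_outputs() function of the demo program corresponds to the equations below. This is a sketch in the commonly used notation, assuming that h_{t-1} and c_{t-1} stand for the code's h_prev and c_prev, that \sigma is the logistic sigmoid, and that \odot is the element-wise (Hadamard) product.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```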

It was a good exercise and reinforced my understanding of LSTMs and of three NumPy operations: the dot() function (matrix multiplication), the multiply() function (Hadamard, element-wise multiplication), and element-wise addition, which is implemented with add() or the overloaded ‘+’ operator.
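To make the difference between the three operations concrete, here is a small sketch on two arbitrary 2x2 matrices. The matrices A and B are made up for illustration and are not part of the LSTM demo.

```python
import numpy as np

# Two small example matrices (illustrative values only)
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[5.0, 6.0],
              [7.0, 8.0]])

mat_prod = np.dot(A, B)       # matrix multiplication (rows times columns)
hadamard = np.multiply(A, B)  # element-wise product, same as A * B
elem_sum = np.add(A, B)       # element-wise sum, same as A + B

print(mat_prod)   # [[19. 22.] [43. 50.]]
print(hadamard)   # [[ 5. 12.] [21. 32.]]
print(elem_sum)   # [[ 6.  8.] [10. 12.]]
```

Note that dot() mixes values across rows and columns, while multiply() and add() only combine values that sit at the same position.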

LSTMs are very interesting. At some point I’ll take a stab at hooking up a full LSTM network, and then training the LSTM network, which will not be a trivial task.

# lstm_io.py

import numpy as np
np.set_printoptions(precision=4)

def sigmoid(x):
  return 1 / (1 + np.exp(-x))

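def compute_outputs(xt, h_prev, c_prev,
      Wf, Wi, Wo, Wc,
      Uf, Ui, Uo, Uc,
      bf, bi, bo, bc):
  # xt: input vector; h_prev, c_prev: previous output and cell state
  # W*: input weights; U*: recurrent weights; b*: biases
  # suffixes: f = forget gate, i = input gate, o = output gate, c = cell

  ft = sigmoid(np.dot(Wf,xt) + np.dot(Uf,h_prev) + bf)  # forget gate
  it = sigmoid(np.dot(Wi,xt) + np.dot(Ui,h_prev) + bi)  # input gate
  ot = sigmoid(np.dot(Wo,xt) + np.dot(Uo,h_prev) + bo)  # output gate
  # new cell state: keep part of the old state, add gated candidate values
  ct = np.multiply(ft, c_prev) + \
    np.multiply(it, np.tanh(np.dot(Wc,xt) + \
    np.dot(Uc, h_prev) + bc))
  ht = np.multiply(ot, np.tanh(ct))  # new output
  return (ht, ct)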

# =========================================================

def main():
  print("\nBegin LSTM demo\n")

  xt = np.array([[1.0], [2.0]], dtype=np.float32)
  h_prev = np.zeros(shape=(3,1), dtype=np.float32)
  c_prev = np.zeros(shape=(3,1), dtype=np.float32)

  W = np.array([[0.01, 0.02],
                [0.03, 0.04],
                [0.05, 0.06]], dtype=np.float32)

  U = np.array([[0.07, 0.08, 0.09],
                [0.10, 0.11, 0.12],
                [0.13, 0.14, 0.15]], dtype=np.float32)

  b = np.array([[0.16], [0.17], [0.18]], dtype=np.float32)

  Wf = np.copy(W); Wi = np.copy(W)
  Wo = np.copy(W); Wc = np.copy(W)

  Uf = np.copy(U); Ui = np.copy(U)
  Uo = np.copy(U); Uc = np.copy(U)
  
  bf = np.copy(b); bi = np.copy(b)
  bo = np.copy(b); bc = np.copy(b)

  print("Sending input = (1.0, 2.0) \n")

  (ht, ct) = compute_outputs(xt, h_prev, c_prev, Wf, Wi,
    Wo, Wc, Uf, Ui, Uo, Uc, bf, bi, bo, bc)
  print("output = ")
  print(ht)
  print("")
  print("new cell state = ")
  print(ct)
  print("\n")

  h_prev = np.copy(ht)
  c_prev = np.copy(ct)
  xt = np.array([[3.0], [4.0]], dtype=np.float32) 

  print("Sending input = (3.0, 4.0) \n")

  (ht, ct) = compute_outputs(xt, h_prev, c_prev, Wf, Wi,
    Wo, Wc, Uf, Ui, Uo, Uc, bf, bi, bo, bc)
  print("output = ")
  print(ht)
  print("")
  print("new cell state = ")
  print(ct)
  
  print("\nEnd \n")

if __name__ == "__main__":
  main()
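
The demo calls the cell twice by hand, feeding each output and cell state into the next call. That pattern could be generalized to a loop over a sequence of any length. The following is only a sketch of that idea, not a full LSTM network and not trained: the names lstm_step() and run_sequence(), and the dictionary layout for the weights, are my own assumptions for illustration, and the weight values are the same dummy values used in the demo.

```python
import numpy as np

def sigmoid(x):
  return 1 / (1 + np.exp(-x))

def lstm_step(xt, h_prev, c_prev, W, U, b):
  # W, U, b are dicts keyed by "f", "i", "o", "c" (hypothetical layout)
  ft = sigmoid(np.dot(W["f"], xt) + np.dot(U["f"], h_prev) + b["f"])
  it = sigmoid(np.dot(W["i"], xt) + np.dot(U["i"], h_prev) + b["i"])
  ot = sigmoid(np.dot(W["o"], xt) + np.dot(U["o"], h_prev) + b["o"])
  cand = np.tanh(np.dot(W["c"], xt) + np.dot(U["c"], h_prev) + b["c"])
  ct = ft * c_prev + it * cand   # element-wise, same as np.multiply
  ht = ot * np.tanh(ct)
  return ht, ct

def run_sequence(xs, W, U, b, n_hidden):
  # start from zero output and zero cell state, as in the demo
  h = np.zeros((n_hidden, 1), dtype=np.float32)
  c = np.zeros((n_hidden, 1), dtype=np.float32)
  outputs = []
  for xt in xs:
    h, c = lstm_step(xt, h, c, W, U, b)  # state carries over to next step
    outputs.append(h)
  return outputs, c

# same dummy weights as the demo, shared by all four gates
W0 = np.array([[0.01, 0.02],
               [0.03, 0.04],
               [0.05, 0.06]], dtype=np.float32)
U0 = np.array([[0.07, 0.08, 0.09],
               [0.10, 0.11, 0.12],
               [0.13, 0.14, 0.15]], dtype=np.float32)
b0 = np.array([[0.16], [0.17], [0.18]], dtype=np.float32)

gates = ("f", "i", "o", "c")
W = {g: W0.copy() for g in gates}
U = {g: U0.copy() for g in gates}
b = {g: b0.copy() for g in gates}

xs = [np.array([[1.0], [2.0]], dtype=np.float32),
      np.array([[3.0], [4.0]], dtype=np.float32)]

outputs, c_final = run_sequence(xs, W, U, b, n_hidden=3)
for t, h in enumerate(outputs):
  print("step", t, "output =", h.ravel())
print("final cell state =", c_final.ravel())
```

Because the loop only carries h and c forward, the same function works for a sequence of two inputs or two hundred.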

1 Response to Implementing an LSTM Cell using Python

  1. PGT-ART says:

    Yesterday I tried to use the C# example you gave of this.
    For predictions, it's a memory, but how would you use this with airplane traffic data?
    Does this learn an adjusting pattern over time, or is it rather a likely state decision model?
    ea state input state xxx some number result categorized towards a=64, B=16, C=73 and state e = 40 ,

    I have a related question to this model and prediction models, how to create and use loop-backs and signal delay in time series neural networks. (not for trading but personal medical reasons)
