I Simulate a PyTorch LSTM from Scratch

I’ve been investigating LSTM (long short-term memory) networks for quite a long time. LSTM networks are very, very complex. As part of my path to knowledge, I simulated a PyTorch version of an LSTM cell (there are many slight variations of LSTMs) using nothing but raw Python. Doing this was the only way for me to be sure that I absolutely understand LSTMs.

So, first I set up a PyTorch LSTM with 3 inputs and 5 outputs. This means it’s an LSTM cell designed to accept one word at a time, where each word is a vector of three values, like (0.98, 1.32, 0.87), and the cell emits five output values for each word. A complete LSTM network also has an embedding layer to convert words to their numeric values, and a dense layer to convert the output values into a form useful for the problem at hand.
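A minimal sketch of that setup, assuming the standard torch.nn.LSTM module (this isn't the author's actual code):

```python
import torch

# A single-layer LSTM that accepts 3-value word vectors and
# emits 5 output values per word (the sizes from the article).
lstm = torch.nn.LSTM(input_size=3, hidden_size=5)

# Sanity check on the parameter count: 4 gates * 5 * 3 (input weights)
# + 4 gates * 5 * 5 (hidden weights) + two bias vectors of length 20
# = 60 + 100 + 20 + 20 = 200 values.
n_params = sum(p.numel() for p in lstm.parameters())
print(n_params)  # 200
```

The 200-parameter total is why the author's initialization scheme runs from 0.01 up to exactly 2.00.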



Top image: A PyTorch LSTM cell. Bottom image: My from-scratch version of the same LSTM cell.

Then I initialized the PyTorch LSTM cell’s 200 weight and bias values to 0.01, 0.02, 0.03, . . . 1.99, 2.00.
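The post doesn't show how the 200 values were assigned; one way to do it, assuming PyTorch's parameter iteration order (weight_ih_l0, weight_hh_l0, bias_ih_l0, bias_hh_l0), is:

```python
import torch

lstm = torch.nn.LSTM(input_size=3, hidden_size=5)

# Overwrite all 200 weights and biases with 0.01, 0.02, ..., 2.00.
# parameters() yields weight_ih_l0, weight_hh_l0, bias_ih_l0,
# bias_hh_l0 in that order; a different fill order would give
# different (but still reproducible) results.
with torch.no_grad():
    k = 0
    for p in lstm.parameters():
        n = p.numel()
        vals = torch.arange(k + 1, k + n + 1, dtype=torch.float32) * 0.01
        p.copy_(vals.reshape(p.shape))
        k += n

print(round(lstm.weight_ih_l0[0, 0].item(), 2))  # 0.01, the first value
print(round(lstm.bias_hh_l0[-1].item(), 2))      # 2.0, the last value
```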

Next, I set up a dummy micro-sentence with two words where each word has three values — (1.0, 2.0, 3.0), (4.0, 5.0, 6.0).

I fed the two words to the PyTorch LSTM and captured the final outputs (ht) and the final internal cell state (ct) after the second word:

Final ht:
0.9618  0.9623  0.9626  0.9629  0.9631

Final ct:
1.9700  1.9760  1.9807  1.9843  1.9872 
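Putting the setup, initialization, and forward pass together, here is a sketch that should reproduce these numbers, assuming the weight-fill order follows PyTorch's parameter order (the post doesn't show the author's code):

```python
import torch

# Rebuild the experiment: a 3-input, 5-output LSTM whose 200 weight
# and bias values are 0.01, 0.02, ..., 2.00, filled in PyTorch's
# parameter order: weight_ih_l0, weight_hh_l0, bias_ih_l0, bias_hh_l0.
lstm = torch.nn.LSTM(input_size=3, hidden_size=5)
with torch.no_grad():
    k = 0
    for p in lstm.parameters():
        n = p.numel()
        p.copy_((torch.arange(k + 1, k + n + 1,
                              dtype=torch.float32) * 0.01).reshape(p.shape))
        k += n

# The two-word dummy sentence, shape (seq_len, batch, input_size).
x = torch.tensor([[[1.0, 2.0, 3.0]],
                  [[4.0, 5.0, 6.0]]])
output, (ht, ct) = lstm(x)
print(ht)  # final hidden state, approx 0.9618 ... 0.9631
print(ct)  # final cell state,   approx 1.9700 ... 1.9872
```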

Then I turned to my simulated PyTorch LSTM cell. I initialized it with the same 200 values, fed it the same two inputs, and . . . drum roll please . . . got identical output values.


The outputs of the PyTorch version and the from-scratch version are identical. Success.

My simulated PyTorch LSTM is simplified in the sense that it doesn’t do sentence-batching, doesn’t do bi-directional processing, and doesn’t allow cell stacking. Even so, the simulated LSTM cell is very complex.
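For reference, the core of such a from-scratch simulation is a single time-step function built from the standard LSTM gate equations. This is my own minimal pure-Python sketch, not the author's code; it assumes the 200 values are filled in PyTorch's parameter order:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h, c, W_ih, W_hh, b_ih, b_hh):
    # One time step of an LSTM cell with hidden size H = len(h).
    # W_ih is (4H x len(x)), W_hh is (4H x H), biases have length 4H,
    # with rows grouped in PyTorch's gate order: i, f, g, o.
    H = len(h)
    pre = [sum(W_ih[r][j] * x[j] for j in range(len(x))) +
           sum(W_hh[r][j] * h[j] for j in range(H)) +
           b_ih[r] + b_hh[r]
           for r in range(4 * H)]
    i = [sigmoid(pre[r]) for r in range(0, H)]            # input gate
    f = [sigmoid(pre[r]) for r in range(H, 2 * H)]        # forget gate
    g = [math.tanh(pre[r]) for r in range(2 * H, 3 * H)]  # candidate cell
    o = [sigmoid(pre[r]) for r in range(3 * H, 4 * H)]    # output gate
    c_new = [f[k] * c[k] + i[k] * g[k] for k in range(H)]
    h_new = [o[k] * math.tanh(c_new[k]) for k in range(H)]
    return h_new, c_new

# Fill the 200 values 0.01 .. 2.00 in PyTorch parameter order:
# weight_ih (20x3), weight_hh (20x5), bias_ih (20), bias_hh (20).
vals = [0.01 * (k + 1) for k in range(200)]
W_ih = [vals[3 * r: 3 * r + 3] for r in range(20)]
W_hh = [vals[60 + 5 * r: 60 + 5 * r + 5] for r in range(20)]
b_ih = vals[160:180]
b_hh = vals[180:200]

# Feed the two dummy words through the cell.
h, c = [0.0] * 5, [0.0] * 5
for x in ([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]):
    h, c = lstm_step(x, h, c, W_ih, W_hh, b_ih, b_hh)
print([round(v, 4) for v in h])  # should be close to 0.9618 ... 0.9631
print([round(v, 4) for v in c])  # should be close to 1.9700 ... 1.9872
```

Note that nothing beyond the Python standard library is needed, which is the point of a from-scratch simulation.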

I am now satisfied that I understand exactly how PyTorch LSTM cells work.

One of my character flaws is that once a technical problem enters my brain, I can’t rest until I solve the problem to my satisfaction. This is often a good thing, but it has a downside too, because some problems will stick in my head for months or even years. Such problems float around continuously in my head and emerge from my subconscious when I’m sleeping. But this is just how my brain works, so I don’t worry about it one way or another — it’s beyond my control for the most part.

I’ve never seen a really good but simple explanation, with code, of exactly how LSTM cells work. So, I intend to tidy up my demo code a bit, write up a (hopefully) good explanation, and then publish the code and explanation in Visual Studio Magazine, where I write a monthly column on data science: https://visualstudiomagazine.com/Articles/List/Neural-Network-Lab.aspx



Research suggests that men and women have different causes of sleeplessness. Women tend to worry about family and interpersonal relationships. Men tend to worry about work and money. What’s not clear is the extent to which these differences are biological. The consensus seems to be that the majority of the difference is biological, but there’s no way to come up with a definitive answer.
