Unraveling the Mysteries of a PyTorch LSTM Module

Update: See https://jamesmccaffreyblog.com/2019/06/28/the-pytorch-lstm-module-input-shape-tricked-me-again/

Because PyTorch is so new, there aren’t many code examples to be found on the Internet, and the documentation is frequently out of sync with the latest code. I’ve worked with very new, rapidly changing code libraries before and there’s no magic solution; you just have to dig away as best you can.

LSTM recurrent neural network modules are tricky. Very tricky. I’ve been probing away, perhaps an hour a day, for several weeks now. In my most recent investigation, I set up a hypothetical situation where I have a batch of three sentences, where each sentence has four words, and each word is represented by a vector of five values.

It would take pages of text to explain what is going on even in my tiny demo, so I won’t try. But the key thing I learned was how to correctly shape the various inputs to an LSTM module. It’s very tricky and not at all obvious. But I know from previous experience learning similarly immature technologies that every investigation adds a bit of knowledge to my brain, and that eventually I’ll unlock the conceptual hidden doors and master PyTorch LSTMs.
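To make the shaping issue concrete, here is a minimal sketch of the three-sentence, four-word, five-value scenario. It is not the demo from this post. The hidden size of 2, the single layer, the random input values, and the zero-filled initial states are all assumptions chosen just for illustration. With the default batch_first=False, nn.LSTM expects input shaped (sequence length, batch size, input size), and the initial hidden and cell states shaped (number of layers, batch size, hidden size).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # reproducible random values

seq_len = 4      # four words per sentence
batch_size = 3   # three sentences in the batch
input_size = 5   # each word is a vector of five values
hidden_size = 2  # assumption: arbitrary size, not taken from the post
num_layers = 1   # assumption: a single-layer LSTM

lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size,
               num_layers=num_layers)

# Default layout (batch_first=False): (seq_len, batch, input_size)
x = torch.randn(seq_len, batch_size, input_size)

# Initial hidden and cell states: (num_layers, batch, hidden_size)
h0 = torch.zeros(num_layers, batch_size, hidden_size)
c0 = torch.zeros(num_layers, batch_size, hidden_size)

output, (hn, cn) = lstm(x, (h0, c0))

print(output.shape)  # torch.Size([4, 3, 2]): one hidden vector per word per sentence
print(hn.shape)      # torch.Size([1, 3, 2]): final hidden state per sentence
print(cn.shape)      # torch.Size([1, 3, 2]): final cell state per sentence
```

The easy mistake is to put the batch dimension first. If the module is instead created with batch_first=True, the input and output become (batch, seq_len, ...), but the hidden and cell states keep the (num_layers, batch, hidden_size) layout.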

But it might take a long time.



I’ve always been fascinated by hidden doors and hidden rooms, ever since I read The Hardy Boys “The Secret Panel”. From left: A secret door cleverly disguised as a book shelf. Part of the famous Winchester House. A woman raises an entire stairway to reveal a hidden room. The Hardy Boys.


1 Response to Unraveling the Mysteries of a PyTorch LSTM Module

  1. Peter Boos says:

    LSTMs are kind of a bus structure for ordering by a learned likelihood index.
    I’ve been wondering, can the workings of LSTMs be altered so that they don’t use a bus structure but something more like a tree with long branches (connecting multiple bus structures)? So far I’ve not seen such a neural net model, and I wonder if it might be better able to store text data.
    (Maybe it learns by itself to create new branches, or not.) It’s just that so far I’ve not seen it, and I wonder if it’s possible.
