At last. I’ve been working for months (an hour or two a day, three or four times a week) on the problem of creating a prediction model for the IMDB movie review sentiment analysis problem. I finally got a fairly good system up and running.
Update: You can find a complete end-to-end demo of IMDB sentiment analysis using PyTorch at https://jamesmccaffreyblog.com/2022/01/17/imdb-movie-review-sentiment-analysis-using-an-lstm-with-pytorch/
The IMDB movie dataset has 50,000 movie reviews. They are divided into 25,000 reviews for training and 25,000 for testing the accuracy of a trained model. The training and test sets each have 12,500 positive reviews (“I like this movie a lot”) and 12,500 negative reviews (“This wasn’t a good movie”).
The first challenge was to get the movie data into a usable form. This involved things like converting to all lower case, removing most punctuation (but not single quote characters, because contractions like “don’t” are critically important), and so on.
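The post doesn’t show the cleanup code, but a minimal sketch of that kind of preprocessing (lowercasing, stripping punctuation except single quotes, splitting on whitespace) might look like this:

```python
import re

def clean_review(text):
    # Lowercase, then replace everything except letters, digits,
    # whitespace, and single-quote characters with a space, so
    # contractions like "don't" survive intact.
    text = text.lower()
    text = re.sub(r"[^a-z0-9'\s]", " ", text)
    return text.split()  # tokenize on whitespace

tokens = clean_review("This wasn't a GOOD movie!!")
# tokens is ['this', "wasn't", 'a', 'good', 'movie']
```

The real pipeline would also have to build a vocabulary and map each word to an integer ID before anything reaches the LSTM.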
The second challenge was to fully understand and master the PyTorch LSTM cell behavior. LSTM (long short-term memory) cells are extremely complex.
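Much of the complexity is just the calling convention. A bare-bones look at the PyTorch `nn.LSTM` interface (the dimensions here are made up for illustration, not the demo’s actual values) shows the three outputs and their shapes:

```python
import torch

# A single-layer LSTM: each word is a 4-dim embedding vector,
# and the cell carries a 3-dim hidden state.
lstm = torch.nn.LSTM(input_size=4, hidden_size=3)

# One review of 7 words, batch size 1: shape (seq_len, batch, input_size).
words = torch.randn(7, 1, 4)
output, (h_n, c_n) = lstm(words)

print(output.shape)  # hidden state at every word: (7, 1, 3)
print(h_n.shape)     # final hidden state: (1, 1, 3)
print(c_n.shape)     # final cell state: (1, 1, 3)
```

For sentiment classification, `h_n` (the hidden state after the last word) is the natural thing to feed into a final linear layer.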
The third challenge was to get a solid grasp of PyTorch tensors. There are hundreds of PyTorch tensor functions, and dealing with them is very tricky.
The fourth challenge was to learn many of the low-level nuances of the PyTorch library. Things like train vs. eval mode, how the loss functions work, and so on.
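The train-vs-eval distinction in particular bites almost everyone once. A tiny illustration (using a made-up two-layer network, not the demo model):

```python
import torch

net = torch.nn.Sequential(
    torch.nn.Linear(4, 2),
    torch.nn.Dropout(0.5))

net.train()            # training mode: dropout is active
net.eval()             # evaluation mode: dropout is disabled
with torch.no_grad():  # also skip gradient tracking while scoring
    out = net(torch.randn(1, 4))
```

Forgetting `eval()` before computing accuracy means dropout stays on during scoring, which silently degrades the measured results.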
And there were many other smaller challenges along the way.
So, my working demo uses movie reviews that have 50 words or fewer. I didn’t want to pad all reviews to the exact same length so that I could use mini-batch training, because that is a minor programming nightmare. Therefore, I had to use online training, which isn’t such a bad thing anyway. To keep things simple, I didn’t use any dropout on the LSTM network, which resulted in my demo model being somewhat overfitted: the trained model achieved 98.06% accuracy on the approximately 600 training reviews (50 words or fewer) but only 67.17% accuracy on the test reviews (also 50 words or fewer).
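The post doesn’t show the training loop, but an online (one-review-at-a-time) loop over unpadded, variable-length inputs might be sketched like this. The layer sizes and the two fabricated “reviews” are placeholders for illustration, not the demo’s actual values:

```python
import torch

# Hypothetical toy setup: a 10-word vocabulary, 5-dim embeddings,
# an LSTM, and a linear layer mapping the final hidden state to
# two classes (0 = negative, 1 = positive).
embed = torch.nn.Embedding(10, 5)
lstm = torch.nn.LSTM(input_size=5, hidden_size=6)
fc = torch.nn.Linear(6, 2)

params = list(embed.parameters()) + list(lstm.parameters()) + list(fc.parameters())
opt = torch.optim.SGD(params, lr=0.01)
loss_fn = torch.nn.CrossEntropyLoss()

# Two fabricated reviews as word-ID lists -- different lengths, no padding.
reviews = [([1, 4, 2], 1), ([3, 3, 7, 5], 0)]
for word_ids, label in reviews:
    x = embed(torch.tensor(word_ids)).unsqueeze(1)  # (seq_len, 1, 5)
    _, (h_n, _) = lstm(x)
    logits = fc(h_n[-1])                            # (1, 2)
    loss = loss_fn(logits, torch.tensor([label]))
    opt.zero_grad()
    loss.backward()
    opt.step()  # weights updated after every single review
```

Because each review goes through alone (batch size 1), sequences of different lengths need no padding at all; the price is slower, noisier training than mini-batches would give.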
But, at least I have a basic system up and running.
Whew! That was a lot of work.

Sometimes avoiding hard work pays off. Sometimes it doesn’t. [Images: lazy popcorn consumption; a lazy way of putting gas in a car; lazy-dog walking; lazy dog-walking.]

98.06% with 600 reviews sounds unreal, and the test result of 67.17% sounds really, really good to me.
The big question after this post: what are the results if you use the whole dataset?
It’s not feasible to use the entire dataset because a few reviews are over 2,500 words long. But with a reasonable limit, say 250 words, the system gets about 92% accuracy on the training data and about 81% accuracy on the test data. Most people, when they see an LSTM for the first time, are amazed. I know I was; it seemed like magic.
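A length cap like that is just a filter over the tokenized reviews. A minimal sketch, with fabricated reviews standing in for the real IMDB data:

```python
# Hypothetical tokenized reviews paired with labels; the real code
# would load and clean these from the IMDB files.
reviews = [
    (["great", "movie"], 1),
    (["terrible"] * 300, 0),   # 300 words -- over the limit
    (["not", "bad", "at", "all"], 1),
]

MAX_WORDS = 250
usable = [(words, label) for (words, label) in reviews
          if len(words) <= MAX_WORDS]

print(len(usable))  # 2 of the 3 reviews survive the 250-word cap
```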
Yeah, I am amazed too, and excited about what’s coming next from your LSTM project. 🙂
Let me summarize for my understanding: you can take all the reviews, but it only works well with a limit of 250 words per review.
But the limit confuses me a little. Letters make up words, words make up sentences, and all the sentences together make up the review. In the post about the jagged matrix you also use different sizes for the words, but for the review limit there must be a strong dependency between densely packed review data and better accuracy.
Another thought about LSTMs: in German, “the” can be “der”, “die”, or “das”. The English “the man” is “der Mann” in German, “the woman” = “die Frau”, “the house” = “das Haus”. That was the easy part.
I don’t understand how I could explain why we say “the butterfly” = “der Schmetterling”, “the snake” = “die Schlange”, and “the squirrel” = “das Eichhörnchen” in the singular, while in the plural the article is the same: “die Eichhörnchen”, and for snakes “die Schlangen”. Confusing. For me it’s more a feeling for what is right.
It seems an LSTM learns this right feeling, rather than just trivially storing the combinations. That’s part of the magic for me.
After 6 months, my best result with a usable NN on MNIST was 95% on the test data.
I was thinking about how to push it higher. Some experiments showed me that predicting whether a digit is even or odd can increase the accuracy. So my plan is to filter and classify the digits into new categories to separate them better.
An 8 and a 9 can look very similar, so even/odd can make a better prediction, and after that the wrong results will hopefully be filtered out.
For 1 and 7, the better prediction could be whether the digit is greater than 4 or not.
Parity also gives a binary result and cuts the candidate digits down to 50%.
This could be meaningful fun for an MNIST NN; for an LSTM, a solution like this would be even cooler.
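The even/odd idea above can be sketched as post-hoc masking: a (hypothetical) parity classifier's output zeroes out the probabilities of digits with the wrong parity before the final argmax. All numbers below are invented for illustration:

```python
# Invented digit probabilities from a (hypothetical) digit classifier.
# Unmasked, the even digit 8 would win with 0.32.
probs = [0.01, 0.02, 0.02, 0.05, 0.02, 0.03, 0.02, 0.28, 0.32, 0.23]

predicted_parity = 1  # suppose a separate parity network says "odd"

# Zero out digits whose parity disagrees, then renormalize.
masked = [p if d % 2 == predicted_parity else 0.0
          for d, p in enumerate(probs)]
total = sum(masked)
masked = [p / total for p in masked]

best = max(range(10), key=lambda d: masked[d])
print(best)  # 7 -- the 8 (an even digit) can no longer win
```

Whether this helps in practice depends on the parity classifier being more reliable than the digit classifier on the confusable pairs; when the parity prediction is wrong, the mask removes the correct answer entirely.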
My LSTM skills are still very limited, but given the great success you have achieved, it makes more and more sense to focus on this area.