A Big Step Closer to the IMDB Movie Sentiment Example Using PyTorch

I took a big step toward my goal of creating a PyTorch LSTM prediction system for the IMDB movie review data. The IMDB dataset has 50,000 real movie reviews: 25,000 training reviews (12,500 positive, 12,500 negative) and 25,000 test reviews.

Update: You can find a complete end-to-end demo of IMDB sentiment analysis using PyTorch at https://jamesmccaffreyblog.com/2022/01/17/imdb-movie-review-sentiment-analysis-using-an-lstm-with-pytorch/

I’m slowly coming to the realization that working with PyTorch LSTM networks is very, very tricky. You have to have a solid grasp of PyTorch tensors, near-expert-level skill with Python, a deep understanding of LSTM cells, awareness of PyTorch strangenesses (such as the invisible forward() method), and advanced knowledge of machine learning concepts such as cross entropy loss and dropout.

All that said, I finally put together a working demo that’s halfway to a solution for the IMDB problem. For simplicity, I use only 10 hard-coded dummy movie reviews, such as “the movie was excellent”. I use variable length reviews (no padding) which means I’m pretty much restricted to online training (i.e., processing one review at a time rather than batch processing multiple reviews). I use three classes (“negative”, “average/neutral”, “positive”) instead of just two because multiclass classification is (surprisingly if you’re new to ML) somewhat easier than binary classification.

But I’m satisfied the system works, and more than that, I completely understand exactly how the system works. With this information in my head, I’m confident I can tackle the IMDB problem.



Random images of Japanese TV commercials. Interesting strangenesses. Sometimes I’m glad that I don’t watch TV.

This entry was posted in Machine Learning, PyTorch.

1 Response to A Big Step Closer to the IMDB Movie Sentiment Example Using PyTorch

  1. Thorsten Kleppe says:

    That’s so cool how far you’ve come with the LSTM.
    It’s really exciting to watch your progress.

    A few days ago I was reading about some ML stuff and came across a nice explanation of hierarchical softmax, and my first thought went to your LSTM project.

    In honor of your work, I took the perceptron concept and turned it into a working demo example on MNIST (learned from your example) with the 60,000 training images, in C#.
    It’s a pseudo-randomly connected NN with ReLU activation, softmax + cross entropy as output, and batch-size training.
    The only technique I added is “bounce restriction,” with 2 more lines; it’s a simple method but it makes the NN much more robust for testing. Also, in the backpropagation the math version is commented out, so it’s an engineering version with better accuracy than the math version.

    https://github.com/grensen/perceptron_concept/blob/master/nn_001.cs
    I really hope you like it, and it would be so cool if you could evaluate this network.

    With this concept it was possible for me to try some really crazy ideas.
    I’m unsure whether presenting my work to you is good behavior; on the other hand, I don’t want to leave out something important, if it is, because I started with that. So for the moment, here is one last YouTube demo about interesting stuff.

    The video shows 5 examples, all initialized with the same weights:
    1. the reference, a common NN.
    2. a pseudo-randomly connected NN.
    3. the same NN as 2, but with common output connections, which seems like a good idea.
    4. a NN with pseudo-random connections on the next two layers.
    5. a NN with 50% pseudo-random connections only on the second layer and 50% common connections with parity distribution on its output layer.

    https://www.youtube.com/watch?v=ir6mgLMkezA&feature=youtu.be

    It was amazing for me to see how big the impact was, and with pruning it seems the disadvantage of the pseudo-random connections cancels out a bit.
    There was a NN with under 500 connections and it was still learning MNIST, absolutely crazy.

    So, I’m happy, and every day with a new topic from you is a better day.
    Almost every time, I search for ML topics on your blog first. 🙂

Comments are closed.