Sentiment Analysis based on IMDB Movie Review dataset

So the next step in analysing novels is to create a sentiment analyser. I got the IMDB movie review dataset from Kaggle.

There are a number of challenges here. The sentiment analysis I have done before was based on Twitter data, where the inputs are short. Some of the reviews in the movie review dataset are over 1000 words long. A histogram of the review lengths looks like this:
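
For reference, a minimal sketch of how the length distribution could be computed. The `length_histogram` helper, the bin width of 100 and the toy `reviews` list are all assumptions for illustration; the real input would be the review text column from the Kaggle file.

```python
from collections import Counter

BIN_WIDTH = 100  # assumed bin size, not taken from the original plot


def length_histogram(reviews, bin_width=BIN_WIDTH):
    """Count reviews per word-length bin; keys are the lower edge of each bin."""
    counts = Counter((len(text.split()) // bin_width) * bin_width for text in reviews)
    return dict(sorted(counts.items()))


# Hypothetical stand-in data, not real IMDB reviews.
reviews = [
    "great film",
    "terrible " * 150,
    "fine " * 120,
    "long " * 1050,
]

hist = length_histogram(reviews)
for start, n in hist.items():
    # Crude text histogram: one '#' per review in the bin.
    print(f"{start:>5}-{start + BIN_WIDTH - 1:<5} {'#' * n} ({n})")
```

Splitting on whitespace is only a rough word count, but it is good enough for picking a cutoff.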


A cutoff of 1000 words seems sensible as a first pass. That still leaves a lot of uninformative padding on the shorter reviews, but I can refine the method later.
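
To make the trade-off concrete, here is a rough sketch of the cutoff and of how much of a padded batch ends up being padding. The helper names, the pad id of 0 and the example lengths are assumptions. This version keeps the start of each review and pads at the end; Keras's `pad_sequences` pads and truncates at the front by default, so the choice here is just one option.

```python
MAX_LEN = 1000  # the cutoff discussed above


def pad_or_truncate(token_ids, max_len=MAX_LEN, pad_id=0):
    """Return exactly max_len ids: keep the first max_len tokens, pad the rest with pad_id."""
    clipped = token_ids[:max_len]
    return clipped + [pad_id] * (max_len - len(clipped))


def padding_fraction(sequences, max_len=MAX_LEN):
    """Fraction of positions in the padded batch that hold padding rather than tokens."""
    real = sum(min(len(seq), max_len) for seq in sequences)
    return 1 - real / (max_len * len(sequences))


# Hypothetical tokenised reviews of 200, 1000 and 1500 tokens (ids are dummies).
sequences = [[1] * 200, [1] * 1000, [1] * 1500]

padded = [pad_or_truncate(seq) for seq in sequences]
frac = padding_fraction(sequences)
print(f"padding fraction: {frac:.2%}")
```

If most reviews are much shorter than the cutoff, this fraction gets large, which is the uninformative padding mentioned above.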

So my first model looks like this:


This gives a categorical accuracy of about 84%. Not great, but could be worse.

I ran a bunch of predictions on movie reviews, some that I made up and others from the test set. The model was 'correct' most of the time, but the predicted score was very close to the middle of the distribution every time, which does not inspire much confidence.
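
One way to put a number on this is to report, alongside accuracy, how far the predictions sit from the decision threshold on average. This is a sketch under assumptions: it supposes the model's output can be read as a single probability of the positive class, and the `probs` and `labels` values are made up rather than real model output.

```python
def summarise_predictions(probs, labels, threshold=0.5):
    """Return (accuracy, mean distance of the predictions from the threshold)."""
    correct = sum((p >= threshold) == bool(y) for p, y in zip(probs, labels))
    accuracy = correct / len(probs)
    mean_margin = sum(abs(p - threshold) for p in probs) / len(probs)
    return accuracy, mean_margin


# Hypothetical predictions that are mostly right but hug the middle.
probs = [0.55, 0.48, 0.58, 0.45, 0.52]
labels = [1, 0, 1, 0, 0]

accuracy, mean_margin = summarise_predictions(probs, labels)
print(f"accuracy={accuracy:.2f}, mean margin from 0.5={mean_margin:.3f}")
```

A decent accuracy combined with a tiny mean margin matches what I saw: the right label most of the time, but never by much.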


Before I try to get this working on a long text like a novel, I need to improve the model quite a bit. The plan is to run inference for this model on the server side; I see no extra benefit in getting it running on TFJS.

