Modifying the POS tagger to predict on a large body of text - python

March 03, 2021

So the output from the TensorFlow.js model works well, and the predicted POS tags closely match those in the training text.

Next, I am trying to run predictions on a larger body of text. Extracting the named entities from a novel is one of the goals of this project. I have tested the prediction code on the Python side, but only with the training data as input. This is fine for rough testing, but I need the Python code to accept a large string of text and apply a POS tag to every word.
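As a rough sketch of what "accept a large string and tag every word" could look like: split the text into tokens, cut the tokens into fixed-size windows, and pass each window to the model. Everything here is an assumption for illustration. `tokenize`, `chunk`, `tag_text` and the `predict_tags` callback are hypothetical names, the regex tokeniser is a stand-in for whatever tokenisation the training data used, and the window size of 50 is arbitrary.

```python
import re

# Words (optionally with an internal apostrophe) or single punctuation marks.
# This is a simple stand-in tokeniser, not the one used for the training data.
TOKEN_RE = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def tokenize(text):
    """Split a raw string into word and punctuation tokens."""
    return TOKEN_RE.findall(text)


def chunk(tokens, max_len):
    """Cut a token list into consecutive windows of at most max_len tokens."""
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]


def tag_text(text, predict_tags, max_len=50):
    """Tag every token in text.

    predict_tags is a hypothetical callback that takes a list of tokens
    and returns one tag per token (for example a wrapper around the model).
    """
    tagged = []
    for window in chunk(tokenize(text), max_len):
        tags = predict_tags(window)
        tagged.extend(zip(window, tags))
    return tagged
```

Fixed-size windows ignore sentence boundaries, so a tagger trained on whole sentences may do better if the text is split into sentences first.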

I downloaded Bleak House from Project Gutenberg, stripped out some of the boilerplate before and after the novel itself, and tried to run some predictions on what was left.
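One way to strip the boilerplate automatically is to cut at the start and end marker lines. This is only a sketch: it assumes the file contains lines of the form `*** START OF THE PROJECT GUTENBERG EBOOK ... ***` and `*** END OF THE PROJECT GUTENBERG EBOOK ... ***`, which is worth checking against the downloaded file, and `strip_gutenberg_boilerplate` is a hypothetical helper name.

```python
import re

# Assumed marker lines; check them against the actual downloaded file.
START_RE = re.compile(
    r"^\*\*\* ?START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE
)
END_RE = re.compile(
    r"^\*\*\* ?END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE
)


def strip_gutenberg_boilerplate(raw):
    """Return the text between the start and end markers.

    If a marker is missing, fall back to the start or end of the string.
    """
    start = START_RE.search(raw)
    end = END_RE.search(raw)
    begin = start.end() if start else 0
    finish = end.start() if end else len(raw)
    return raw[begin:finish].strip()
```

Anything between the markers that is not part of the novel, such as a title page or table of contents, would still need to be removed by hand.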

I realised at this point that I had not made any allowance for unknown words. When you build your vocab from the training data and then test with the same data, every word is already in the vocab, so this never comes up. I used defaultdicts, which return a default value for missing keys, to get this working.
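A minimal sketch of the defaultdict idea, assuming index 0 is reserved for padding and index 1 for unknown words. Those indices, and the names `build_vocab` and `encode`, are illustrative assumptions rather than the values used in the actual tagger.

```python
from collections import defaultdict

PAD_INDEX = 0  # assumed: reserved for padding
UNK_INDEX = 1  # assumed: shared index for every unknown word


def build_vocab(training_words):
    """Map each distinct training word to an index, starting at 2.

    Looking up a word that was never seen returns UNK_INDEX
    instead of raising KeyError.
    """
    vocab = defaultdict(lambda: UNK_INDEX)
    for word in training_words:
        if word not in vocab:  # membership test does not insert the key
            vocab[word] = len(vocab) + 2
    return vocab


def encode(words, vocab):
    """Convert words to indices; unknown words map to UNK_INDEX."""
    return [vocab[word] for word in words]
```

One thing to watch with this approach: indexing a defaultdict with a missing key also stores that key, so the vocab grows as unknown words are looked up. Using `vocab.get(word, UNK_INDEX)` on a plain dict avoids that side effect.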

The tags being predicted look wrong at this stage, so debugging them is the next job. Once the POS tagger works on the Python side with a large body of text, I can get it working on the client side.
