Modifying the POS tagger to predict on a large body of text - Python

So the output from the TensorFlow.js model works well, and the predicted POS tags closely match those in the training text.

Next, I am trying to run predictions on a larger body of text, since extracting the named entities from a novel is one of the goals of this project. So far I have tested the prediction code on the Python side only with the training data as input. That is fine for rough testing, but the Python code needs to accept a large string of text and apply a POS tag to every word.
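Before a large string can be tagged, it has to be broken into the same kind of units the model was trained on. The sketch below assumes the tagger expects one list of tokens per sentence. The helper names and both regular expressions are hypothetical, a rough stand-in rather than the tokenization used to build the training data.

```python
import re

# Hypothetical patterns: a rough assumption, not the tokenizer used for the training data.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")   # whitespace that follows ., ! or ?
TOKEN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")   # words (with apostrophes) or single punctuation marks


def split_sentences(text: str) -> list[str]:
    """Split one large string into sentences, dropping empty pieces."""
    return [s for s in SENTENCE_END.split(text.strip()) if s]


def tokenize(sentence: str) -> list[str]:
    """Split a sentence into words and punctuation marks."""
    return TOKEN.findall(sentence)


def prepare_text(text: str) -> list[list[str]]:
    """Turn a large string into a list of token lists, one per sentence."""
    return [tokenize(s) for s in split_sentences(text)]


for tokens in prepare_text("Fog everywhere. Fog up the river! It's raw."):
    print(tokens)
```

A splitter this naive will break on abbreviations such as "Mr." (which appears constantly in Dickens), so a proper sentence tokenizer may be worth swapping in once the rest of the pipeline works.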

I downloaded Bleak House from Project Gutenberg, stripped out some of the header and footer boilerplate, and tried to run some predictions on it.
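One way to strip the Gutenberg header and footer in code rather than by hand is sketched below. It assumes the file marks the body with lines beginning "*** START OF" and "*** END OF", as Gutenberg plain-text files generally do. The function name is hypothetical, and if a marker is missing, that end of the text is left untouched.

```python
def strip_gutenberg_boilerplate(raw: str) -> str:
    """Return only the text between the Gutenberg START and END marker lines.

    Assumes marker lines begin with '*** START OF' and '*** END OF'.
    If a marker is not found, the corresponding end of the text is kept.
    """
    lines = raw.splitlines()
    # Index of the first line after the START marker (0 if there is no marker).
    start = next((i + 1 for i, line in enumerate(lines)
                  if line.startswith("*** START OF")), 0)
    # Index of the first END marker after the start (end of file if none).
    end = next((i for i, line in enumerate(lines[start:], start)
                if line.startswith("*** END OF")), len(lines))
    return "\n".join(lines[start:end]).strip()


sample = (
    "The Project Gutenberg eBook of Bleak House\n"
    "*** START OF THE PROJECT GUTENBERG EBOOK BLEAK HOUSE ***\n"
    "London. Michaelmas term lately over.\n"
    "*** END OF THE PROJECT GUTENBERG EBOOK BLEAK HOUSE ***\n"
    "Licence text..."
)
print(strip_gutenberg_boilerplate(sample))
```

In practice the input would come from reading the downloaded file (e.g. `open(path, encoding="utf-8").read()`, with `path` being wherever the text was saved). Any preface or contents list that sits inside the markers will still need trimming by hand.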

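I realised at this point that I had not made any allowance for unknown words. When you build your vocabulary from the training data and then test on that same data, every word is already in the vocabulary, so the problem never comes up. I used defaultdicts to get this working, so that any word missing from the vocabulary falls back to a default value instead of raising a KeyError.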
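A minimal sketch of the defaultdict approach, assuming index 0 is reserved for padding and index 1 is a shared "unknown word" index. Both conventions and the helper names are hypothetical, not necessarily how the original vocabulary was laid out.

```python
from collections import defaultdict

PAD_INDEX = 0  # assumed: reserved for padding sequences to a fixed length
UNK_INDEX = 1  # assumed: shared index for any word never seen in training


def build_word_index(training_sentences):
    """Map each training word to an int; unseen words fall back to UNK_INDEX."""
    vocab = {}
    for sentence in training_sentences:
        for word in sentence:
            if word not in vocab:
                vocab[word] = len(vocab) + 2  # skip the two reserved indices
    # Wrap the finished vocabulary so missing keys return UNK_INDEX.
    return defaultdict(lambda: UNK_INDEX, vocab)


def encode(sentence, word2idx):
    """Convert a list of tokens into a list of integer ids."""
    return [word2idx[word] for word in sentence]


word2idx = build_word_index([["The", "fog", "is", "thick"]])
print(encode(["The", "fog", "in", "Chancery"], word2idx))  # prints [2, 3, 1, 1]
```

One side effect worth knowing: looking up a missing key on a defaultdict inserts it, so after encoding the novel `len(word2idx)` will have grown. If the vocabulary size feeds the embedding layer, take it before encoding new text, or use `vocab.get(word, UNK_INDEX)` on a plain dict instead.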

The predicted tags look wrong at this stage, so debugging them is the next step. Once the POS tagger works on the Python side with a large body of text, I can get it working on the client side.
