Modifying the POS tagger to predict on a large body of text - Python
The output from the TensorFlow.js model works well, and the predicted POS tags closely match those in the training text.
Next, I am trying to run predictions on a larger body of text. Extracting the named entities from a novel is one of the goals of this project. So far I have only tested the prediction code on the Python side using the training data as input. That is fine for rough testing, but the Python code needs to accept a large string of text and apply a POS tag to every word.
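As a rough illustration of that step, here is a minimal sketch that turns one large string into word and punctuation tokens and then splits the token list into fixed-size chunks the model can process one at a time. It assumes the tagger works on separate word and punctuation tokens and on sequences of a bounded length; the names `tokenize`, `chunk` and `max_len` are hypothetical and are not taken from the project code.

```python
import re
from typing import List

# Assumption: words (optionally with a straight-apostrophe suffix such as "'s")
# and individual punctuation marks are separate tokens, as in many POS corpora.
TOKEN_RE = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def tokenize(text: str) -> List[str]:
    """Split raw text into word and punctuation tokens."""
    return TOKEN_RE.findall(text)


def chunk(tokens: List[str], max_len: int) -> List[List[str]]:
    """Split a long token list into consecutive chunks of at most max_len tokens."""
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]


if __name__ == "__main__":
    tokens = tokenize("It was a foggy day in London.")
    print(tokens)            # ['It', 'was', 'a', 'foggy', 'day', 'in', 'London', '.']
    print(chunk(tokens, 3))  # [['It', 'was', 'a'], ['foggy', 'day', 'in'], ['London', '.']]
```

Splitting on a fixed length is the simplest option. Splitting on sentence boundaries would give the model more natural context, but abbreviations like "Mr." make naive sentence splitting unreliable.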
I downloaded Bleak House from Project Gutenberg, stripped out some of the boilerplate before and after the novel, and tried to run predictions on it.
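The boilerplate can also be stripped in code instead of by hand. The sketch below keeps only the text between the start and end marker lines. It assumes the file uses the usual Gutenberg markers, such as `*** START OF THE PROJECT GUTENBERG EBOOK ... ***`, and that older files may say `THIS` instead of `THE`. If no markers are found, it falls back to the whole text. The function name `strip_gutenberg` is hypothetical.

```python
import re

# Assumption: standard Project Gutenberg marker lines; "THE" and "THIS" variants both accepted.
START_RE = re.compile(
    r"^\*\*\* ?START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$",
    re.MULTILINE | re.IGNORECASE,
)
END_RE = re.compile(
    r"^\*\*\* ?END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$",
    re.MULTILINE | re.IGNORECASE,
)


def strip_gutenberg(raw: str) -> str:
    """Return the text between the start and end markers, or the whole text if they are missing."""
    start = START_RE.search(raw)
    end = END_RE.search(raw)
    begin = start.end() if start else 0  # just after the start marker line
    finish = end.start() if end and end.start() > begin else len(raw)  # just before the end marker
    return raw[begin:finish].strip()
```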
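At this point I realised I had not made any allowance for unknown words. When you build the vocabulary from the training data and also test on that data, every word is already known, so the problem never comes up. I used defaultdicts to get this working, so that a word missing from the vocabulary maps to a default value instead of raising a KeyError.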
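A minimal sketch of the defaultdict approach follows. It assumes id 0 is reserved for padding and id 1 for unknown words. Those ids and the names `build_word_index` and `encode` are hypothetical and not taken from the project.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List

PAD_ID = 0  # hypothetical: id reserved for padding
UNK_ID = 1  # hypothetical: id returned for any word not seen in training


def build_word_index(training_words: Iterable[str]) -> DefaultDict[str, int]:
    """Assign an id to each training word; unseen words default to UNK_ID."""
    word2idx: DefaultDict[str, int] = defaultdict(lambda: UNK_ID)
    for word in training_words:
        if word not in word2idx:  # 'in' does not call the default factory
            word2idx[word] = len(word2idx) + 2  # ids start after PAD_ID and UNK_ID
    return word2idx


def encode(tokens: List[str], word2idx: DefaultDict[str, int]) -> List[int]:
    """Convert tokens to ids. Note: indexing a defaultdict with a missing key
    also inserts that key (mapped to UNK_ID), so the dict grows as unknown words are seen."""
    return [word2idx[tok] for tok in tokens]
```

Because the lookup inserts unknown keys, `len(word2idx)` no longer equals the training vocabulary size after encoding new text. If that matters, take the vocabulary size before prediction, or use `word2idx.get(tok, UNK_ID)`, which does not insert anything.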
The predicted tags look wrong at this stage, but I will debug that next. Once the POS tagger works with a large body of text on the Python side, I can get it working on the client side.
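One possible first check is to measure what fraction of the novel's tokens are missing from the training vocabulary. A high out-of-vocabulary rate would send many words through the unknown-word path and could explain poor tags. This is a minimal sketch that assumes the vocabulary is a plain mapping from word to id. The name `oov_rate` is hypothetical.

```python
from typing import Iterable, Mapping


def oov_rate(tokens: Iterable[str], vocab: Mapping[str, int]) -> float:
    """Fraction of tokens not present in vocab.

    Run this on a plain dict (or before encoding): a defaultdict that has already
    been indexed with unknown words will report them as present."""
    token_list = list(tokens)
    if not token_list:
        return 0.0
    unknown = sum(1 for tok in token_list if tok not in vocab)
    return unknown / len(token_list)
```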