Moving toward Named Entity Recognition

So the POS tagger works reasonably well at this stage. Building it into a web page is still in progress, but for now, a segue to the ultimate goal: Named Entity Recognition.

The NLTK book is our friend again here: this chapter covers in detail both how to build NER systems and how to use NLTK's built-in one.

So the process I will follow will be similar to the one I used for the POS tagger:

  1. Get some appropriately annotated data. This looks perfect for our needs.
  2. Pick a feature set. For the POS tagger I used the feature set from the NLTK tagger. I'm not sure what I will do here yet, but the dataset seems to offer some clues in the columns listed in its second table.
  3. Train a neural network (NN) to predict the IOB tag for each word, and therefore whether that word is part of a Named Entity.
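Steps 2 and 3 can be sketched in Python, the language NLTK itself uses. This is a minimal sketch under stated assumptions, not the final design: the helper names (`word_shape`, `ner_features`, `iob_to_entities`), the particular features and the CoNLL-style tags (`B-PER`, `I-PER`, `B-ORG`, `O`) are all hypothetical choices for illustration. In IOB tagging, `B-` begins an entity, `I-` continues it and `O` marks a word outside any entity. The features take a sentence as (word, POS) pairs, the same shape the POS tagger produces, and `iob_to_entities` shows how per-word IOB predictions turn back into whole Named Entities.

```python
from typing import Dict, List, Tuple


def word_shape(word: str) -> str:
    """Collapse a word into a coarse shape, e.g. 'London' -> 'Xx', 'B52' -> 'Xd'."""
    shape: List[str] = []
    for ch in word:
        if ch.isupper():
            c = "X"
        elif ch.islower():
            c = "x"
        elif ch.isdigit():
            c = "d"
        else:
            c = ch  # keep punctuation such as '-' or '.' as-is
        if not shape or shape[-1] != c:  # merge runs of the same character class
            shape.append(c)
    return "".join(shape)


def ner_features(tagged: List[Tuple[str, str]], i: int) -> Dict[str, str]:
    """Hypothetical feature set for word i of a POS-tagged sentence."""
    word, pos = tagged[i]
    prev_word, prev_pos = tagged[i - 1] if i > 0 else ("<s>", "<s>")
    next_word, next_pos = tagged[i + 1] if i + 1 < len(tagged) else ("</s>", "</s>")
    return {
        "word": word.lower(),
        "pos": pos,
        "shape": word_shape(word),
        "suffix3": word[-3:].lower(),
        "prev_word": prev_word.lower(),
        "prev_pos": prev_pos,
        "next_word": next_word.lower(),
        "next_pos": next_pos,
    }


def iob_to_entities(words: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group words into (entity text, entity type) spans using their IOB tags."""
    entities: List[Tuple[str, str]] = []
    current_words: List[str] = []
    current_type = None
    for word, tag in zip(words, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type
        )
        if starts_new:
            if current_words:  # close the entity that was open
                entities.append((" ".join(current_words), current_type))
            current_words, current_type = [word], tag[2:]
        elif tag.startswith("I-"):
            current_words.append(word)  # continue the open entity
        else:  # "O" closes any open entity
            if current_words:
                entities.append((" ".join(current_words), current_type))
            current_words, current_type = [], None
    if current_words:  # sentence ended inside an entity
        entities.append((" ".join(current_words), current_type))
    return entities
```

For example, `iob_to_entities(["Mark", "Smith", "works", "at", "Google"], ["B-PER", "I-PER", "O", "O", "B-ORG"])` returns `[("Mark Smith", "PER"), ("Google", "ORG")]`. A stray `I-` tag with no matching `B-` before it is treated leniently as the start of a new entity, which is one common convention rather than the only one.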

To incorporate this into the JavaScript code I will include two models: one to predict the POS tags, and a second that consumes these tags, along with the words/word fragments, to produce the Named Entities.
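The two-model design can be sketched as a small pipeline. Python is used here only for illustration; the real version would live in the JavaScript code. `tag_sentence`, `stub_pos_model` and `stub_ner_model` are hypothetical names, and the two stubs are crude rule-based stand-ins for the trained networks, included only so the example runs. The point is the contract between the models: the POS model maps words to one tag per word, and the NER model consumes (word, POS) pairs and returns one IOB tag per word.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces: in practice these would wrap the trained networks.
PosModel = Callable[[List[str]], List[str]]
NerModel = Callable[[List[Tuple[str, str]]], List[str]]


def tag_sentence(
    words: List[str], pos_model: PosModel, ner_model: NerModel
) -> List[Tuple[str, str, str]]:
    """Run the POS model, feed its output to the NER model, return (word, POS, IOB)."""
    pos_tags = pos_model(words)
    if len(pos_tags) != len(words):
        raise ValueError("POS model must return one tag per word")
    tagged = list(zip(words, pos_tags))
    iob_tags = ner_model(tagged)
    if len(iob_tags) != len(words):
        raise ValueError("NER model must return one tag per word")
    return [(w, p, t) for (w, p), t in zip(tagged, iob_tags)]


def stub_pos_model(words: List[str]) -> List[str]:
    # Stand-in only: capitalised words become proper nouns, everything else a common noun.
    return ["NNP" if w[:1].isupper() else "NN" for w in words]


def stub_ner_model(tagged: List[Tuple[str, str]]) -> List[str]:
    # Stand-in only: each run of consecutive NNP words becomes one generic ENT entity.
    tags: List[str] = []
    for i, (_, pos) in enumerate(tagged):
        if pos != "NNP":
            tags.append("O")
        elif i > 0 and tagged[i - 1][1] == "NNP":
            tags.append("I-ENT")
        else:
            tags.append("B-ENT")
    return tags
```

With the stubs, `tag_sentence(["Mark", "Smith", "visited", "London"], stub_pos_model, stub_ner_model)` returns `[("Mark", "NNP", "B-ENT"), ("Smith", "NNP", "I-ENT"), ("visited", "NN", "O"), ("London", "NNP", "B-ENT")]`. Keeping the models separate behind this kind of interface means either one can be retrained or swapped without touching the other.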
