Moving toward Named Entity Recognition
The POS tagger works reasonably well at this stage. Building it into a web page is still in progress, but for now a segue to the ultimate goal: Named Entity Recognition (NER).
The NLTK book is our friend again here: this chapter has extensive detail on how to build NER systems and how to use NLTK's built-in one.
The process I will follow to build this is similar to the one for the POS tagger:
- Get some appropriately annotated data. This looks perfect for our needs.
- Pick a feature set. For the POS tagger I took the feature set used to build the NLTK tagger. I'm not sure yet what I will use here, but the dataset offers some clues in the columns of its second table.
- Train a neural network (NN) to predict the IOB tag for each word, and therefore whether it is part of a Named Entity.
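To make the IOB idea concrete, here is a minimal sketch in Python of the two ends of that process: a per-word feature dictionary going in, and IOB tags being turned back into entities coming out. The feature names and the tag labels (`B-PER`, `I-LOC` and so on) are assumptions for illustration, not the columns or labels of the dataset above, and `word_features` and `iob_to_entities` are hypothetical helpers rather than anything from NLTK.

```python
from typing import Dict, List, Optional, Tuple


def word_features(tokens: List[str], pos_tags: List[str], i: int) -> Dict[str, object]:
    """Build a feature dictionary for the token at position i.

    The feature choice here is an assumption, not a tuned or final set.
    """
    word = tokens[i]
    return {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "word.isupper": word.isupper(),
        "word.isdigit": word.isdigit(),
        "suffix3": word[-3:],
        "pos": pos_tags[i],
        "prev_pos": pos_tags[i - 1] if i > 0 else "<START>",
        "next_pos": pos_tags[i + 1] if i < len(tokens) - 1 else "<END>",
    }


def iob_to_entities(tokens: List[str], iob_tags: List[str]) -> List[Tuple[str, str]]:
    """Group IOB-tagged tokens into (entity text, entity type) pairs."""
    entities: List[Tuple[str, str]] = []
    current_words: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Close the entity being built, if there is one.
        nonlocal current_words, current_type
        if current_words and current_type is not None:
            entities.append((" ".join(current_words), current_type))
        current_words = []
        current_type = None

    for token, tag in zip(tokens, iob_tags):
        if tag.startswith("B-"):
            # "B-" always begins a new entity.
            flush()
            current_words, current_type = [token], tag[2:]
        elif tag.startswith("I-") and tag[2:] == current_type:
            # "I-" of the same type continues the current entity.
            current_words.append(token)
        elif tag.startswith("I-"):
            # A stray "I-" (no matching "B-") is treated as a new entity.
            flush()
            current_words, current_type = [token], tag[2:]
        else:
            # "O" means the token is outside any entity.
            flush()
    flush()
    return entities


tokens = ["Barack", "Obama", "visited", "New", "York", "."]
pos_tags = ["NNP", "NNP", "VBD", "NNP", "NNP", "."]
iob_tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O"]

features = word_features(tokens, pos_tags, 0)
entities = iob_to_entities(tokens, iob_tags)
```

In this sketch the network would be trained on the output of `word_features` and would predict one IOB tag per word; `iob_to_entities` is only the final bookkeeping step that stitches the predicted tags back into whole entities.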
To incorporate this into the JavaScript code I will include two models: one to predict the POS tags, and a second that consumes these tags along with the words or word fragments to produce the Named Entities.
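The two-model arrangement can be sketched as a simple chain. The version below is in Python rather than JavaScript, purely to show the data flow, and both "models" are toy stand-ins (hypothetical rules, not trained networks): the first maps tokens to POS tags, and the second takes the tokens plus those tags and returns IOB tags. The generic `B-ENT`/`I-ENT` labels are likewise an assumption.

```python
from typing import Callable, List, Sequence, Tuple

# Type aliases for the two stages; any callable with this shape will do.
PosModel = Callable[[Sequence[str]], List[str]]
NerModel = Callable[[Sequence[str], Sequence[str]], List[str]]


def tag_entities(
    tokens: Sequence[str], pos_model: PosModel, ner_model: NerModel
) -> List[Tuple[str, str, str]]:
    """Run the POS model, then feed its output to the NER model."""
    pos_tags = pos_model(tokens)
    if len(pos_tags) != len(tokens):
        raise ValueError("POS model must return one tag per token")
    iob_tags = ner_model(tokens, pos_tags)
    if len(iob_tags) != len(tokens):
        raise ValueError("NER model must return one tag per token")
    return list(zip(tokens, pos_tags, iob_tags))


def toy_pos_model(tokens: Sequence[str]) -> List[str]:
    """Stand-in for the trained POS network: capitalised words are NNP."""
    return ["NNP" if t[:1].isupper() else "NN" for t in tokens]


def toy_ner_model(tokens: Sequence[str], pos_tags: Sequence[str]) -> List[str]:
    """Stand-in for the trained NER network: runs of NNP form one entity."""
    iob_tags: List[str] = []
    inside = False
    for pos in pos_tags:
        if pos == "NNP":
            iob_tags.append("I-ENT" if inside else "B-ENT")
            inside = True
        else:
            iob_tags.append("O")
            inside = False
    return iob_tags


result = tag_entities(
    ["Ada", "Lovelace", "wrote", "programs"], toy_pos_model, toy_ner_model
)
```

Keeping the two stages behind plain function signatures like this means either model can be retrained or swapped without touching the other, as long as each still returns one tag per token.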