
Showing posts from February, 2021

A JavaScript Refresher

Building the prediction part of the POS tagger has shown me that I need to brush up on my JavaScript. The list of resources given by Facebook on the React getting started page is very good. MDN is always my go-to, but I had not come across The Modern JavaScript Tutorial, which also looks excellent. So it is worth a little time over the next few days to tidy this up, so that I am not getting tangled in basic JS concepts when I should be getting the model working in the browser. At the moment I am writing the JS code to convert a set of input words into an appropriate form for the model to predict the part of speech. This is roughly how it works for now: given a set of words, split them and convert them into a windowed dataset of length 5. This covers all of the input features that I used to build the model, so for each word I just need the 2 before and the 2 after. Seems straightforward, but the tricky bit is getting started. Once the user has input 5 words I am ready to conve...
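Since the windowing logic is the tricky bit, here is a minimal sketch of how I picture it in plain JavaScript. The function and variable names are my own placeholders, not code from the project:

```javascript
// A minimal sketch of the windowing step described above, assuming a
// window of 5: the target word plus the 2 words before and 2 after.
function toWindows(text, windowSize = 5) {
  const words = text.trim().split(/\s+/);
  const half = Math.floor(windowSize / 2); // 2 words on each side
  const windows = [];
  // A window can only be produced once enough words are available.
  for (let i = half; i < words.length - half; i++) {
    windows.push(words.slice(i - half, i + half + 1));
  }
  return windows;
}

// Example: exactly five words yield one window, centred on "sat".
console.log(toWindows("the cat sat on the"));
// [["the", "cat", "sat", "on", "the"]]
```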

Moving toward Named Entity Recognition

So the POS tagger works reasonably well at this stage. The process of building it into a web page is ongoing, but for now a segue to the ultimate goal, which is Named Entity Recognition. The NLTK book is our friend again here, and this chapter has extensive detail on how to build NER systems and how to use their built-in one. The process I will follow to build this will be similar to that for the POS tagger:

- Get some appropriately annotated data. This looks perfect for our needs.
- Pick a feature set. For the POS tagger I took the feature set used to build the NLTK tagger. Not sure what I will do here, but the dataset seems to provide some clues based on the columns provided in the second table.
- Train a NN to predict the IOB tag for a word, and therefore whether it is a Named Entity.

To incorporate this into the JavaScript code I will include 2 models: one to predict the POS tags, and a second which will consume these tags and word/word fragments to produce the Named Ent...
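Nothing is built yet, but a rough sketch of how the two chained models might look in TensorFlow.js follows. The model paths, tensor shapes and input encoding are all placeholders of mine, not decisions that have been made:

```javascript
import * as tf from '@tensorflow/tfjs';

// Sketch of the planned two-model pipeline: a POS model feeding a NER
// model. All paths, shapes and encodings below are placeholders.
async function tagNamedEntities(encodedWords) {
  const posModel = await tf.loadGraphModel('models/pos/model.json');
  const nerModel = await tf.loadGraphModel('models/ner/model.json');

  // First model: integer-encoded words -> POS tag scores per word.
  const wordTensor = tf.tensor2d([encodedWords]);
  const posScores = posModel.predict(wordTensor);

  // Second model: the words plus the predicted POS tag ids -> IOB tags.
  const posIds = posScores.argMax(-1).cast('float32');
  const nerInput = tf.concat([wordTensor, posIds], 1);
  const iobScores = nerModel.predict(nerInput);
  return iobScores.argMax(-1).array();
}
```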

Using a Tensorflow model in the Browser

So the model from yesterday performs pretty well. The next step is to get it running in the browser for a simple demo app: as text is typed, the model will output the parts of speech. Converting TensorFlow models to TensorFlow.js is very straightforward. First install tensorflowjs at the command line using pip:

    pip install tensorflowjs

Then run a single command to convert your model. You can find the details here. There are plenty of options that you can set, but the vanilla version worked fine for me:

    tensorflowjs_converter data/model ./output_dir --input_format=tf_saved_model

Next I needed a bit of JavaScript to load the model into the page and make a prediction. You can see that I have stopped this in the Chrome debugger so that I could fiddle with the shape of the tensor for prediction. So I just chose some numbers to fill the tensor and ran a prediction. Just to check that everything is working as I expect, I ran the same tensor through the Python version of the model: S...
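For reference, the load-and-predict step looks roughly like the sketch below. The model path assumes the converter output is served from ./output_dir, and the [1, 5] input shape and token values are just numbers I picked for illustration:

```javascript
import * as tf from '@tensorflow/tfjs';

// Minimal sketch: load the converted model and run a test prediction.
async function run() {
  // loadGraphModel, since the converter was given a tf_saved_model.
  const model = await tf.loadGraphModel('output_dir/model.json');

  // Fill a tensor with some arbitrary token indices, as described above.
  const input = tf.tensor2d([[12, 7, 301, 42, 9]]);
  const prediction = model.predict(input);
  prediction.print(); // compare against the Python model's output
}

run();
```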

Simple POS tagger with 92% accuracy

My goal here is to build a POS tagger using Keras. I initially made some very poor attempts at the tagger with things like bigrams. Next I tried sequence models. Could not get them to work. They may not be inherently a bad idea for this, but everything about the code felt like overkill, and since I had yet to get it to work, that really convinced me to drop the idea. My most recent effort is based on using the features described here for building a tagger. The encoding of the tags and words had a few gotchas. Although the input features consist of a mixture of words, suffixes, prefixes and tags from other parts of the context, the target is just a POS tag. Leaving the one-hot encoding of the target at the overall vocab size (about 47k tokens) would have given very poor results: there are fewer than 50 tags. Needle-in-a-haystack stuff. So 2 vocabularies were needed. I needed to cover the full range of input features in the input vocab, so this involved adding tokens for the various wor...
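The two-vocabulary point is easier to see in code. Here is a toy sketch of the encoding step, written in JavaScript since that is where the model will eventually run; the token lists are made up for illustration:

```javascript
// A toy sketch of the two-vocabulary encoding described above. The
// input vocab indexes every feature token (words, prefixes, suffixes,
// context tags); the target vocab indexes only the ~50 POS tags, so
// one-hot target vectors stay small.
function buildVocab(tokens) {
  const vocab = new Map();
  for (const t of tokens) {
    if (!vocab.has(t)) vocab.set(t, vocab.size);
  }
  return vocab;
}

// Toy data: feature tokens, and the much smaller set of target tags.
const inputVocab = buildVocab(['run', 'runs', 'ing', 'pre:ru', 'NN', 'VB']);
const tagVocab = buildVocab(['NN', 'VB']);

// One-hot encode a target tag against the small tag vocabulary only.
function oneHotTag(tag) {
  const vec = new Array(tagVocab.size).fill(0);
  vec[tagVocab.get(tag)] = 1;
  return vec;
}

console.log(oneHotTag('VB')); // [0, 1]
```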

Sequence models are not the right fit for POS taggers

So I was trying to train a sequence model to predict the POS tag for a word. The idea I was going with was that POS tags in a body of text follow some sort of sequence, much the same way as words do. This sequential nature of natural language is a cornerstone of NLP. On reflection, and having tried unsuccessfully to train a sequence model, I have come to the conclusion that this does not work. Here are some of the things I tried:

- I changed the labelled training data into a long list of tuples of word and POS tag. Stuff like ('run', 'VB').
- I split this up into a windowed dataset, i.e. I created subsequences of n input tuples and 1 target tuple, n being the sequence length, which I arbitrarily set to 20.
- Then I created dictionaries for the various vocabularies. These were not word vocabularies but tuple vocabularies, so an index number for each of the (word, POS) tuples in the training data.
- I translated the training data into a set of integer lists in this way....
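For what it's worth, the tuple windowing described in the list above looked roughly like this. This is a reconstruction with my own names and toy data, not the original code:

```javascript
// A sketch of the tuple windowing: n input tuples followed by 1 target
// tuple, where n (seqLen) was arbitrarily set to 20 in the real run.
function windowed(pairs, seqLen) {
  // Index each distinct (word, tag) tuple, not each word.
  const vocab = new Map();
  const ids = pairs.map(p => {
    const key = p.join('|');
    if (!vocab.has(key)) vocab.set(key, vocab.size);
    return vocab.get(key);
  });
  // Slide a window of seqLen inputs plus 1 target over the id list.
  const examples = [];
  for (let i = 0; i + seqLen < ids.length; i++) {
    examples.push({ x: ids.slice(i, i + seqLen), y: ids[i + seqLen] });
  }
  return examples;
}

// Toy example with a short window so the output is visible.
const tagged = [['the', 'DT'], ['cat', 'NN'], ['sat', 'VBD'], ['down', 'RB']];
console.log(windowed(tagged, 2));
// [{ x: [0, 1], y: 2 }, { x: [1, 2], y: 3 }]
```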

Playing with Part of Speech (POS) taggers

Breaking text up into parts of speech is useful for a variety of tasks. When a query is sent to a search engine, for example, the part of speech of a word in the query will be a significant determinant of the results. Overall, the aim of this project is to build a Named Entity Recognition system, but one of the useful feeds into this is creating a POS tagger. This will demonstrate some of the related techniques for working with language, and the generated parts of speech will probably be useful features in the more advanced Named Entity Recognition model. So yesterday I said I would identify the data to use for training and testing and set up the shell of a GitHub project. Here is the project, and the data I am using is from the Wall Street Journal. It was supplied as part of the Coursera course; however, any POS-tagged dataset should be fine. I tried 3 approaches:

- Find the most common tag for each word in the training set. Just use these (and a tag for Unknown) to make...
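The first approach is simple enough to sketch. Here is a toy version of the most-common-tag baseline; the training data and the 'UNK' fallback token are placeholders for illustration:

```javascript
// A sketch of the most-common-tag baseline: count how often each tag
// appears per word in the training data, then tag each word with its
// most frequent tag, falling back to 'UNK' for unseen words.
function trainBaseline(tagged) {
  const counts = new Map(); // word -> Map(tag -> count)
  for (const [word, tag] of tagged) {
    if (!counts.has(word)) counts.set(word, new Map());
    const tags = counts.get(word);
    tags.set(tag, (tags.get(tag) || 0) + 1);
  }
  // Keep only the most frequent tag per word.
  const best = new Map();
  for (const [word, tags] of counts) {
    best.set(word, [...tags.entries()].sort((a, b) => b[1] - a[1])[0][0]);
  }
  return word => best.get(word) || 'UNK';
}

// Toy example: 'run' appears more often as a verb than a noun here.
const tag = trainBaseline([['run', 'VB'], ['run', 'VB'], ['run', 'NN']]);
console.log(tag('run'), tag('xylophone')); // VB UNK
```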

What will I do for the next 30 days?

I have been taking online courses in ML for several years now, mainly on Coursera. I do some ML in work too, but I want to use this space to push my skills further. I have a lot of ideas for things to do with ML, but usually fail to deliver a finished product. There are a bunch of reasons, but distraction and busyness are high on that list. So I will start simple and try to build a straightforward model to make some predictions. Ideally that will go into a web page so that the end result can be demonstrated. I have just completed this course in Natural Language Processing. Lots of interesting material in there, but one of the simpler models was for Named Entity Recognition. Given a body of text (a book, say) can a model find the named entities: people and places, for example? This type of model forms the basis for a bunch of other useful functions. How often does each character appear in the book? Using sentiment analysis across the span of the book, is it possible to determi...