Experimenting with pretrained embedding vectors

The GloVe word vectors from Stanford are a good place to start when using pretrained weights in an embedding layer. They cover a large vocabulary of about 400,000 words, with a vector for each one, and come in several dimensions ranging from 50 up to 300.
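To use them, the vectors are typically parsed into a dictionary and then packed into a matrix aligned with the tokenizer's vocabulary. Here is a minimal sketch of that step, assuming the 200-dimensional glove.6B.200d.txt file has been downloaded and that word_index comes from a Keras Tokenizer fitted on the text:

import numpy as np

embedding_dim = 200

# Parse the GloVe file into a {word: vector} dictionary
embeddings_index = {}
with open('glove.6B.200d.txt', encoding='utf-8') as f:
    for line in f:
        values = line.split()
        word = values[0]
        embeddings_index[word] = np.asarray(values[1:], dtype='float32')

# One row per word in the tokenizer's vocabulary; words missing from
# GloVe are left as all zeros
embeddings_matrix = np.zeros((len(word_index) + 1, embedding_dim))
for word, i in word_index.items():
    vector = embeddings_index.get(word)
    if vector is not None:
        embeddings_matrix[i] = vector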

When you add an embedding layer in Keras, you can pass these vectors in as the layer's weights and mark the layer as non-trainable so the pretrained values are not updated during training.

So this is my model:

from tensorflow import keras
from tensorflow.keras import layers

def get_pretrained_model():
    model = keras.Sequential()
    # Embedding layer initialised with the GloVe matrix and frozen so the
    # pretrained vectors stay fixed during training
    model.add(layers.Embedding(len(word_index) + 1, 200, input_length=max_len,
                               weights=[embeddings_matrix], trainable=False))
    # Bidirectional LSTM reads each sequence in both directions
    model.add(layers.Bidirectional(layers.LSTM(32)))
    model.add(layers.Dense(6, activation='relu'))
    # Single sigmoid unit for binary sentiment classification
    model.add(layers.Dense(1, activation='sigmoid'))
    return model
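
Training then follows the usual compile-and-fit pattern. A sketch, assuming x_train and y_train are the padded IMDB sequences and 0/1 labels prepared earlier (those names are placeholders, not from the post):

model = get_pretrained_model()
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
history = model.fit(x_train, y_train,
                    epochs=10,
                    batch_size=128,
                    validation_split=0.2)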


Despite training this on a Colab GPU, it still only reaches just above 80% accuracy.

I am using the IMDB sentiment dataset, so I may be coming up against the limits of the size of that dataset.
