NLP and Word Embeddings

Natural language processing with deep learning is a powerful combination. Using word vector representations and embedding layers, you can train recurrent neural networks that achieve outstanding performance across a wide variety of applications, including sentiment analysis, named entity recognition, and neural machine translation.
Learning Objectives
Explain how word embeddings capture relationships between words
Load pre-trained word vectors
Measure similarity between word vectors using cosine similarity
Use word embeddings to solve word analogy problems such as Man is to Woman as King is to ______.
Reduce bias in word embeddings
Create an embedding layer in Keras with pre-trained word vectors
Describe how negative sampling learns word vectors more efficiently than other methods
Explain the advantages and disadvantages of the GloVe algorithm
Build a sentiment classifier using word embeddings
Build and train a more sophisticated classifier using an LSTM
Introduction to Word Embeddings
Word Representation
One-hot representation: each word is a sparse vector with a single 1 at its vocabulary index; the drawback is that you cannot see the similarity between apple and orange, or king and queen.
Featurized representation: word embedding (each word is described by features such as Gender, Royal, Age, and Food, with e.g. 300 features in total).
Using Embeddings
Using embeddings (feature dimensions) in place of one-hot vectors allows the algorithm to generalize much better.
Transfer learning and word embeddings
Learn word embeddings from a large text corpus (1-100B words), or download pre-trained embeddings online.
Transfer the embeddings to a new task with a smaller training set.
Optional: continue to fine-tune the word embeddings with the new data.
Relation to face encoding: the terms encoding and embedding are often used interchangeably.
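As a concrete illustration of the three steps above, here is a minimal sketch of loading pre-trained GloVe vectors and transferring them into a Keras Embedding layer. The file name glove.6B.50d.txt, the toy word_index mapping, and the 50-dimensional size are assumptions for illustration, not part of these notes; the trainable flag corresponds to the optional fine-tuning step.

```python
import numpy as np
import tensorflow as tf

# Assumptions (for illustration only): a GloVe file in the standard
# "word v1 v2 ... v50" text format, and a word_index dict mapping each
# vocabulary word to an integer id (e.g. built by a Keras Tokenizer).
EMBED_DIM = 50
word_index = {"apple": 1, "orange": 2, "king": 3, "queen": 4}  # toy vocabulary

# 1. Learn or download: load pre-trained vectors into a dict.
glove = {}
with open("glove.6B.50d.txt", encoding="utf-8") as f:
    for line in f:
        parts = line.rstrip().split(" ")
        glove[parts[0]] = np.asarray(parts[1:], dtype="float32")

# 2. Transfer: build an embedding matrix where row i is the vector of word id i.
embedding_matrix = np.zeros((len(word_index) + 1, EMBED_DIM))
for word, i in word_index.items():
    if word in glove:
        embedding_matrix[i] = glove[word]

# 3. Optional fine-tuning: trainable=False freezes the embeddings,
#    trainable=True lets them continue to adapt to the new task's data.
embedding_layer = tf.keras.layers.Embedding(
    input_dim=embedding_matrix.shape[0],
    output_dim=EMBED_DIM,
    embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
    trainable=False,
)
```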
Properties of Word Embeddings
Help with analogy reasoning.
Analogies using word vectors:
If two vectors have the same direction and magnitude, they are mathematically the same vector; the following relation can therefore be used to capture the analogy between pairs of words:
$$e_{man} - e_{woman} ≈ e_{king} - e_{queen}$$
t-SNE (t-distributed Stochastic Neighbor Embedding) helps visualize high-dimensional data in a lower-dimensional space while preserving relationships between data points. While t-SNE provides a useful visual representation, the mapping is non-linear, so some relationships (such as the parallelogram structure of analogies) may not be preserved in the 2D projection.
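A minimal visualization sketch, assuming a glove dict of pre-trained vectors like the one loaded above; the word list and the t-SNE settings are purely illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Assumption: `glove` maps words to 50-d vectors (see the loading sketch above).
words = ["man", "woman", "king", "queen", "apple", "orange", "dog", "cat"]
X = np.stack([glove[w] for w in words])

# Project 50-d embeddings down to 2-D. The mapping is non-linear, so distances
# in the plot are only a rough guide to the geometry of the original space.
X_2d = TSNE(n_components=2, perplexity=3, random_state=0).fit_transform(X)

plt.scatter(X_2d[:, 0], X_2d[:, 1])
for (x, y), w in zip(X_2d, words):
    plt.annotate(w, (x, y))
plt.show()
```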
Cosine similarity
The cosine similarity between two vectors $u$ and $v$ is given by:
$$\text{CosineSimilarity}(u, v) = \frac{u \cdot v}{\|u\| \, \|v\|}$$
Where:
$u \cdot v$ is the dot product of the vectors $u$ and $v$.
$\|u\|$ and $\|v\|$ are the Euclidean norms (lengths) of $u$ and $v$, respectively.
This formula measures the cosine of the angle between the two vectors, providing a value between -1 and 1, where:
1 indicates that the vectors point in the same direction (maximum similarity),
0 indicates orthogonality (no similarity), and
-1 indicates that the vectors are diametrically opposed.
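A minimal NumPy sketch of this formula, plus a hypothetical complete_analogy helper that uses it to solve "man is to woman as king is to ___" by searching for the word whose offset from king best matches the woman-man offset. The word_to_vec dictionary is an assumption and would hold pre-trained vectors such as the GloVe dict loaded earlier.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between u and v: (u . v) / (||u|| ||v||)."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def complete_analogy(word_a, word_b, word_c, word_to_vec):
    """Find word_d such that a is to b as c is to d, i.e. e_b - e_a ≈ e_d - e_c."""
    e_a, e_b, e_c = word_to_vec[word_a], word_to_vec[word_b], word_to_vec[word_c]
    best_word, best_sim = None, -np.inf
    for w, e_w in word_to_vec.items():
        if w in (word_a, word_b, word_c):
            continue  # skip the input words themselves
        sim = cosine_similarity(e_b - e_a, e_w - e_c)
        if sim > best_sim:
            best_word, best_sim = w, sim
    return best_word

# Example usage, with `glove` holding pre-trained vectors as loaded earlier:
# complete_analogy("man", "woman", "king", glove)  # expected: "queen"
```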
Learning Word Embeddings
How are word embeddings learned? For example, with the Word2Vec model.
Learning Word Embeddings
How word embeddings are learned and used in a language model.
In the context of the lecture, the big E typically stands for the embedding matrix, in which each row is a word embedding. These embeddings capture semantic relationships, allowing the model to understand the meaning of words in context.
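A tiny NumPy sketch of what E does, using the row convention above; the toy vocabulary and random values are purely illustrative.

```python
import numpy as np

# Toy embedding matrix E: one row per vocabulary word (5 words, 3 features here).
vocab = ["man", "woman", "king", "queen", "apple"]
E = np.random.randn(len(vocab), 3)

# One-hot vector for "king" (index 2 in the toy vocabulary).
o_king = np.zeros(len(vocab))
o_king[2] = 1.0

# Multiplying the one-hot vector by E selects the matching row of E ...
e_king = o_king @ E
assert np.allclose(e_king, E[2])

# ... but in practice an embedding layer performs a direct index lookup (E[2]),
# which is far cheaper than the full matrix product.
```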
Context: the model predicts a target word in a sequence based on a given context, for example the previous four words, or four words on each side of the target.
Contextual learning is important for producing meaningful embeddings: different choices of context can all lead to effective word representations. As the model trains, it produces embeddings that place words in a continuous vector space where similar words lie closer together, which lets the model capture and use semantic relationships effectively.
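To make the context-to-target idea concrete, here is a minimal Keras sketch of a fixed-window neural language model; the vocabulary size, embedding dimension, window of four previous words, and the random toy batch are all assumptions for illustration. Training the softmax prediction task is what shapes the embedding matrix E.

```python
import numpy as np
from tensorflow.keras import Input, layers, models

# Illustrative hyperparameters (not from the original notes).
VOCAB_SIZE = 10_000   # vocabulary size
EMBED_DIM = 300       # embedding dimension
CONTEXT_SIZE = 4      # predict the target word from the previous 4 words

# Fixed-window neural language model: context word ids pass through a shared
# embedding matrix E, are concatenated, and feed a softmax over the vocabulary.
model = models.Sequential([
    Input(shape=(CONTEXT_SIZE,)),
    layers.Embedding(input_dim=VOCAB_SIZE, output_dim=EMBED_DIM,
                     name="word_embeddings"),
    layers.Flatten(),
    layers.Dense(VOCAB_SIZE, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Toy batch: each row holds 4 context word ids; the label is the target word id.
contexts = np.random.randint(0, VOCAB_SIZE, size=(32, CONTEXT_SIZE))
targets = np.random.randint(0, VOCAB_SIZE, size=(32,))
model.fit(contexts, targets, epochs=1, verbose=0)

# After training on a real corpus, the learned embedding matrix E is:
E = model.get_layer("word_embeddings").get_weights()[0]  # (VOCAB_SIZE, EMBED_DIM)
```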