NLP and Word Embeddings

Natural language processing with deep learning is a powerful combination. Using word vector representations and embedding layers, you can train recurrent neural networks that achieve outstanding performance across a wide variety of applications, including sentiment analysis, named entity recognition, and neural machine translation.
Learning Objectives
Explain how word embeddings capture relationships between words
Load pre-trained word vectors
Measure similarity between word vectors using cosine similarity
Use word embeddings to solve word analogy problems such as Man is to Woman as King is to ______.
Reduce bias in word embeddings
Create an embedding layer in Keras with pre-trained word vectors
Describe how negative sampling learns word vectors more efficiently than other methods
Explain the advantages and disadvantages of the GloVe algorithm
Build a sentiment classifier using word embeddings
Build and train a more sophisticated classifier using an LSTM
Introduction to Word Embeddings
Word Representation
One-hot representation: each word is a sparse vector with a single 1 at its vocabulary index; the drawback is that you cannot see the similarity between apple and orange, or king and queen.
Featurized representation: word embedding (each word is described by features such as Gender, Royal, Age, and Food, with e.g. 300 features in total).
Using Embeddings
Using embeddings (feature dimensions) in place of one-hot vectors allows the algorithm to generalize much better.
Transfer learning and word embeddings
Learn word embeddings from a large text corpus (1-100B words), or download pre-trained embeddings online.
Transfer the embeddings to a new task with a smaller training set.
Optional: continue to fine-tune the word embeddings with the new data.
Relation to face encoding: the terms encoding and embedding are often used interchangeably.
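As a concrete illustration of the three steps above, here is a minimal sketch of loading pre-trained GloVe vectors and transferring them into a Keras Embedding layer. The file name glove.6B.50d.txt, the toy word_index mapping, and the 50-dimensional size are assumptions for illustration, not part of these notes; the trainable flag corresponds to the optional fine-tuning step.

```python
import numpy as np
import tensorflow as tf

# Assumptions (for illustration only): a GloVe file in the standard
# "word v1 v2 ... v50" text format, and a word_index dict mapping each
# vocabulary word to an integer id (e.g. built by a Keras Tokenizer).
EMBED_DIM = 50
word_index = {"apple": 1, "orange": 2, "king": 3, "queen": 4}  # toy vocabulary

# 1. Learn or download: load pre-trained vectors into a dict.
glove = {}
with open("glove.6B.50d.txt", encoding="utf-8") as f:
    for line in f:
        parts = line.rstrip().split(" ")
        glove[parts[0]] = np.asarray(parts[1:], dtype="float32")

# 2. Transfer: build an embedding matrix where row i is the vector of word id i.
embedding_matrix = np.zeros((len(word_index) + 1, EMBED_DIM))
for word, i in word_index.items():
    if word in glove:
        embedding_matrix[i] = glove[word]

# 3. Optional fine-tuning: trainable=False freezes the embeddings,
#    trainable=True lets them continue to adapt to the new task's data.
embedding_layer = tf.keras.layers.Embedding(
    input_dim=embedding_matrix.shape[0],
    output_dim=EMBED_DIM,
    embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
    trainable=False,
)
```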
Properties of Word Embeddings
Help with analogy reasoning.
Analogies using word vectors:
If two vectors have the same direction and magnitude, they are mathematically the same vector; the following relation can therefore be used to capture the analogy between pairs of words:
$$e_{man} - e_{woman} ≈ e_{king} - e_{queen}$$
t-SNE (t-distributed Stochastic Neighbor Embedding) helps visualize high-dimensional data in a lower-dimensional space while preserving relationships between data points. While t-SNE provides a useful visual representation, the mapping is non-linear, so some relationships (such as the parallelogram structure of analogies) may not be preserved in the 2D projection.
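A minimal visualization sketch, assuming a glove dict of pre-trained vectors like the one loaded above; the word list and the t-SNE settings are purely illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Assumption: `glove` maps words to 50-d vectors (see the loading sketch above).
words = ["man", "woman", "king", "queen", "apple", "orange", "dog", "cat"]
X = np.stack([glove[w] for w in words])

# Project 50-d embeddings down to 2-D. The mapping is non-linear, so distances
# in the plot are only a rough guide to the geometry of the original space.
X_2d = TSNE(n_components=2, perplexity=3, random_state=0).fit_transform(X)

plt.scatter(X_2d[:, 0], X_2d[:, 1])
for (x, y), w in zip(X_2d, words):
    plt.annotate(w, (x, y))
plt.show()
```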
Cosine similarity
The cosine similarity between two vectors $u$ and $v$ is given by:
$$\text{CosineSimilarity}(u, v) = \frac{u \cdot v}{\|u\| \, \|v\|}$$
Where:
$u \cdot v$ is the dot product of the vectors $u$ and $v$.
$\|u\|$ and $\|v\|$ are the Euclidean norms (lengths) of $u$ and $v$, respectively.
This formula measures the cosine of the angle between the two vectors, providing a value between -1 and 1, where:
1 indicates that the vectors point in the same direction (maximum similarity),
0 indicates orthogonality (no similarity), and
-1 indicates that the vectors are diametrically opposed.
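A minimal NumPy sketch of this formula, plus a hypothetical complete_analogy helper that uses it to solve "man is to woman as king is to ___" by searching for the word whose offset from king best matches the woman-man offset. The word_to_vec dictionary is an assumption and would hold pre-trained vectors such as the GloVe dict loaded earlier.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between u and v: (u . v) / (||u|| ||v||)."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def complete_analogy(word_a, word_b, word_c, word_to_vec):
    """Find word_d such that a is to b as c is to d, i.e. e_b - e_a ≈ e_d - e_c."""
    e_a, e_b, e_c = word_to_vec[word_a], word_to_vec[word_b], word_to_vec[word_c]
    best_word, best_sim = None, -np.inf
    for w, e_w in word_to_vec.items():
        if w in (word_a, word_b, word_c):
            continue  # skip the input words themselves
        sim = cosine_similarity(e_b - e_a, e_w - e_c)
        if sim > best_sim:
            best_word, best_sim = w, sim
    return best_word

# Example usage, with `glove` holding pre-trained vectors as loaded earlier:
# complete_analogy("man", "woman", "king", glove)  # expected: "queen"
```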
Learning Word Embeddings
How are word embeddings learned? For example, with the Word2Vec model.
Learning Word Embeddings
How word embeddings are learned and used in a language model.
In the context of the lecture, the big E typically stands for the embedding matrix, in which each row is a word embedding. These embeddings capture semantic relationships, allowing the model to understand the meaning of words in context.
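A tiny NumPy sketch of what E does, using the row convention above; the toy vocabulary and random values are purely illustrative.

```python
import numpy as np

# Toy embedding matrix E: one row per vocabulary word (5 words, 3 features here).
vocab = ["man", "woman", "king", "queen", "apple"]
E = np.random.randn(len(vocab), 3)

# One-hot vector for "king" (index 2 in the toy vocabulary).
o_king = np.zeros(len(vocab))
o_king[2] = 1.0

# Multiplying the one-hot vector by E selects the matching row of E ...
e_king = o_king @ E
assert np.allclose(e_king, E[2])

# ... but in practice an embedding layer performs a direct index lookup (E[2]),
# which is far cheaper than the full matrix product.
```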
Context: the model predicts a target word in a sequence based on a given context, for example the previous four words, or four words on each side of the target.
Contextual learning is important for producing meaningful embeddings: different choices of context can all lead to effective word representations. As the model trains, it produces embeddings that place words in a continuous vector space where similar words lie closer together, which lets the model capture and use semantic relationships effectively.
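To make the context-to-target idea concrete, here is a minimal Keras sketch of a fixed-window neural language model; the vocabulary size, embedding dimension, window of four previous words, and the random toy batch are all assumptions for illustration. Training the softmax prediction task is what shapes the embedding matrix E.

```python
import numpy as np
from tensorflow.keras import Input, layers, models

# Illustrative hyperparameters (not from the original notes).
VOCAB_SIZE = 10_000   # vocabulary size
EMBED_DIM = 300       # embedding dimension
CONTEXT_SIZE = 4      # predict the target word from the previous 4 words

# Fixed-window neural language model: context word ids pass through a shared
# embedding matrix E, are concatenated, and feed a softmax over the vocabulary.
model = models.Sequential([
    Input(shape=(CONTEXT_SIZE,)),
    layers.Embedding(input_dim=VOCAB_SIZE, output_dim=EMBED_DIM,
                     name="word_embeddings"),
    layers.Flatten(),
    layers.Dense(VOCAB_SIZE, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Toy batch: each row holds 4 context word ids; the label is the target word id.
contexts = np.random.randint(0, VOCAB_SIZE, size=(32, CONTEXT_SIZE))
targets = np.random.randint(0, VOCAB_SIZE, size=(32,))
model.fit(contexts, targets, epochs=1, verbose=0)

# After training on a real corpus, the learned embedding matrix E is:
E = model.get_layer("word_embeddings").get_weights()[0]  # (VOCAB_SIZE, EMBED_DIM)
```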