Language Models: A Reader's Journey with Chip Huyen

I’m currently reading Chip Huyen’s AI Engineering book, and I plan to write about some of the concepts I come across along the way. This is the first in a series of articles, and in this one I’ll be discussing language models.

#What’s a Language Model?

A language model is a type of AI model designed to understand, generate, and manipulate human language. It encodes statistical information about one or more languages; this information tells us how likely a word is to occur in a given context.

A token is the base unit of a language model. It can be a character, a word, or part of a word. The process of breaking text down into tokens is called tokenization. The following examples illustrate how words and sentences are broken into tokens:

```
word -> word
chatting -> chat + ing
football -> football
the game is the game -> the + game + is + the + game
wouldn't -> would + n't
```

You can explore how the various OpenAI language models break a sentence down into tokens using OpenAI’s tokenizer tool.
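
If you’d prefer to do this programmatically, below is a minimal sketch using OpenAI’s open-source tiktoken library (my choice for illustration; install it with pip install tiktoken). The exact splits depend on the tokenizer’s learned vocabulary:

```python
import tiktoken

# Load the tokenizer used by several recent OpenAI models
enc = tiktoken.get_encoding("cl100k_base")

# The vocabulary is finite: every piece of text maps to ids from this set
print(enc.n_vocab)  # total number of distinct tokens in this vocabulary

for word in ["word", "chatting", "football", "wouldn't", "candiding"]:
    ids = enc.encode(word)
    # Decode each token id individually to reveal the subword pieces
    pieces = [enc.decode([i]) for i in ids]
    print(word, "->", pieces)
```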

#Why is a token used as the base unit instead of a word or a character?

  • Unlike whole words, tokens allow the model to break a word down into meaningful parts, so a word like cooking can be broken into cook + ing

  • There are fewer unique tokens compared to unique words, which allows the model's vocabulary size to be smaller

  • Tokens also help a model break down unknown words into meaningful parts, for example, a made-up word like candiding can be broken into candid + ing

#Types of Language Models

There are two types of language models; they differ based on the information they can use to predict a token:

  1. Masked Language Model

    A masked language model is trained to predict missing tokens anywhere in a sequence, using the context from the tokens both before and after the gap. It essentially fills in the blanks. A well-known example is BERT (Bidirectional Encoder Representations from Transformers).

    For example, given the following statement: My favourite _______ is Lionel Messi. A masked language model should be able to predict that the missing text is football player.

    Masked language models are commonly used for non-generative tasks such as sentiment analysis and text classification. They’re also useful for tasks that require an understanding of the overall context, such as code debugging. A short code sketch of this fill-in-the-blank behaviour follows this list.

  2. Autoregressive Language Model

    Autoregressive language models are trained to predict the next token in a sequence using only the preceding tokens. For example, such a model should be able to predict or suggest what comes next in the following sequence:

    ```
    My favourite color is ____
    ```

    It can continually generate one token after another, which makes it useful for text generation; autoregressive models are more popular than masked language models. A second sketch after this list shows this behaviour in action.
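
To make the difference concrete, here’s a minimal sketch of the masked-LM behaviour using the Hugging Face transformers library (my choice for illustration; pip install transformers). Note that BERT fills exactly one token per mask, so expect a single-token answer like player rather than the two-word football player:

```python
from transformers import pipeline

# A fill-mask pipeline backed by BERT, a masked language model
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# BERT ranks candidate tokens using context on BOTH sides of [MASK]
for prediction in unmasker("My favourite [MASK] is Lionel Messi."):
    print(prediction["token_str"], round(prediction["score"], 3))
```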
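
And here’s the autoregressive counterpart, again a hedged sketch with transformers: GPT-2, a classic autoregressive model, continues the earlier prompt by predicting one token at a time from left to right:

```python
from transformers import pipeline

# GPT-2 is autoregressive: it conditions only on the tokens to its left
generator = pipeline("text-generation", model="gpt2")

result = generator("My favourite color is", max_new_tokens=5)
print(result[0]["generated_text"])
```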

#Multimodal Models

Language models work mostly with text; multimodal models can work with more than one data modality, e.g., images, text, video, or audio, in any combination. A generative multimodal model is also called a large multimodal model (LMM). Where a language model generates the next token conditioned on text tokens alone, a multimodal model generates the next token conditioned on both text and image tokens, or whichever modalities the model supports.
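
As a small illustration, the sketch below uses the transformers image-to-text pipeline with the BLIP captioning model, so the generated text is conditioned on image tokens rather than text alone (the image path is a placeholder you’d swap for your own file):

```python
from transformers import pipeline

# BLIP generates text (a caption) conditioned on the tokens of an image
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

# "photo.jpg" is a placeholder path; point it at any local image
print(captioner("photo.jpg")[0]["generated_text"])
```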

#Large Language Models

Large language models, also known as LLMs, are very large deep learning models that are pre-trained on vast amounts of data. Most are built on the transformer architecture, a neural network that uses self-attention to extract meaning from a sequence of text and capture the relationships between the words and phrases in it. The original transformer pairs an encoder with a decoder, though many modern LLMs use only the decoder stack.
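
Self-attention is what lets the model weigh every token against every other token in the sequence. The sketch below is a bare-bones, NumPy-only version of scaled dot-product attention; it’s a teaching sketch that leaves out the learned projections and multi-head machinery a real transformer adds on top:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q @ K.T / sqrt(d_k)) @ V"""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # pairwise token-to-token affinities
    scores -= scores.max(axis=-1, keepdims=True)  # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V                            # weighted sum of value vectors

# Toy example: 4 tokens, each represented by an 8-dimensional vector
x = np.random.randn(4, 8)
print(scaled_dot_product_attention(x, x, x).shape)  # (4, 8)
```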

Large language models are incredibly flexible. A single model can perform completely different tasks, such as answering questions, summarizing documents, translating languages, and completing sentences.

#Final thoughts

Language models form the foundation of many modern AI systems. As I continue reading and exploring more from the AI Engineering book, I’m looking to write more about how these models are built and used within real-world applications.

Stay tuned for the next post in the series!
