Summary of Chapter 1 from Building LLMs from Scratch


Chapter 1 delves into the absolute basics of LLMs. Raschka begins by explaining that Large Language Models (LLMs) are advanced computer programs designed to read, understand, and generate human language. He then differentiates these newer systems from older ones by highlighting how LLMs use deep learning techniques, a subset of machine learning, to learn from massive amounts of text (think books, encyclopedias, etc.). Learning from massive amounts of text allows LLMs to perform complex tasks such as answering math questions, writing short stories, or even translating between languages. Raschka states that they do this by predicting which word will come next in a sentence. For example, if a language model were given the phrase “the dog ate the ___”, it would fill in the blank with the most likely word, such as “food”. He goes into more depth later in the book about this process.
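To make that idea concrete, here is a minimal sketch of next-word prediction (my own illustration, not the book’s code), assuming the Hugging Face transformers library and the small, publicly available GPT-2 model:

```python
# Minimal next-word prediction sketch (illustration only, not from the book).
# Assumes: pip install transformers torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer
import torch

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The dog ate the"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits        # shape: (1, seq_len, vocab_size)

next_token_logits = logits[0, -1]          # scores for the word that comes next
top5 = torch.topk(next_token_logits, 5).indices
print([tokenizer.decode([i]) for i in top5.tolist()])  # most likely continuations
```

The model assigns a score to every token in its vocabulary; picking (or sampling from) the highest-scoring ones is exactly the fill-in-the-blank behavior described above.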

A key innovation behind modern LLMs that Raschka discusses in this chapter is the transformer architecture. Transformers are excellent at paying attention to all the words in a sentence in relation to each other, rather than processing them one at a time. This helps LLMs understand the context and the semantic relationships between words, making the model far more flexible and powerful than prior approaches. Take the word “run”: it has multiple distinguishable meanings to us, such as running a café or running a mile, yet prior models were unable to capture this difference. The transformer architecture offers a solution to this problem.
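As a rough illustration of what “paying attention to all words in relation to each other” means, here is a minimal sketch of scaled dot-product attention, the core operation inside a transformer (my own simplified PyTorch example, not code from the book):

```python
# Simplified scaled dot-product attention (illustration only, not the book's code).
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (seq_len, d) tensors — one row per word in the sentence
    d = q.shape[-1]
    scores = q @ k.T / d ** 0.5            # how strongly each word relates to every other word
    weights = F.softmax(scores, dim=-1)    # normalize into attention weights
    return weights @ v                     # each word's output mixes in context from all words

# Toy example: 4 "words" with 8-dimensional embeddings
x = torch.randn(4, 8)
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # torch.Size([4, 8]) — same words, now informed by their context
```

Because each output row is a weighted mix of every word in the sentence, the representation of “run” can differ depending on whether its neighbors say “a café” or “a mile”.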

Thank you for reading,

Abhyut Tangri

