AI Sketchbook Series #1 - Text Representation - One-Hot Encoding

This series aims to demystify fundamental concepts in AI using a semi-visual approach that combines sketches, diagrams, and concise notes. The goal is to build intuition for more complex concepts by illustrating their foundational building blocks.
This first post of the series looks at how text is represented numerically, a foundational step in Natural Language Processing (NLP). One-Hot Encoding, while simple, provides a crucial first insight into this process: each word in a vocabulary is mapped to a vector that is all zeros except for a single 1 at that word's position. This visual exploration will lay the groundwork for grasping more complex techniques used in modern language models (like the initial stages of some LLM architectures) and information retrieval systems (where categorical features might be one-hot encoded before vectorization for database indexing). By visualizing this basic concept, we can build a clearer intuition for how machines begin to process and understand language. Stay tuned for the full sketch!
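As a rough illustration of the idea, here is a minimal Python sketch of one-hot encoding for a toy sentence. The sentence, the whitespace tokenization, the alphabetical ordering of the vocabulary, and the helper names `build_vocabulary` and `one_hot` are all assumptions made for this example; they are not taken from the upcoming sketch.

```python
def build_vocabulary(tokens):
    """Map each distinct token to an integer index (alphabetical order, by assumption)."""
    return {token: index for index, token in enumerate(sorted(set(tokens)))}


def one_hot(token, vocabulary):
    """Return a list of zeros with a single 1 at the token's vocabulary index."""
    vector = [0] * len(vocabulary)
    vector[vocabulary[token]] = 1
    return vector


# Toy input: a hypothetical sentence, split on whitespace.
sentence = "the cat sat on the mat"
tokens = sentence.split()

vocabulary = build_vocabulary(tokens)
encoded = [one_hot(token, vocabulary) for token in tokens]

for token, vector in zip(tokens, encoded):
    print(f"{token:>4} -> {vector}")
```

Each vector is as long as the vocabulary (five distinct words here), and repeated words such as "the" always receive the same vector. The vectors carry no notion of similarity between words, which is one reason more complex techniques build on top of this representation.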
Written by

Walid Hajeri (WalidHaj)
Customer Engineer with a passion for well-designed tech products. Tech side: interest in Cloud-native App Dev & AI. Other side: University of Paris 1 Sorbonne alumnus, grew up in a creative family, passionate about all things related to visual arts & design in general.