Demystifying GenAI and the Role of an AI Engineer

What is GenAI?
Generative AI (GenAI) is a type of artificial intelligence that creates new content such as text, images, music, or code by learning patterns from existing data, mimicking human creativity.
In machine learning, mathematical or statistical models are fed data to find patterns for predictions. Unlike traditional ML models that focus on prediction or classification, GenAI's biggest differentiating feature is its ability to create new text, images, videos or even code.
Until recently, AI was not seen as capable of replicating human creativity. GenAI has changed that perception in just a few years' time.
Now comes the big question: is GenAI truly successful?
In my view, if a technology can confidently answer "yes" to the following questions, then it's fair to call it successful. The internet is an example of a truly successful technology: it answers "yes" to all of them, unlike blockchain, which still falls short in several areas. Now, let’s see how GenAI measures up.
Does it solve real world problems?
Absolutely. From automating customer support to improving diagnostics in healthcare and enhancing accessibility in education, GenAI is being applied meaningfully across industries.
Is it useful on a daily basis?
Yes. I use coding assistants like Windsurf almost every day. Whether it’s writing code, drafting content, or getting quick answers, GenAI tools have become part of my routine.
Is it impacting world economics?
Undeniably. Take the Chinese model DeepSeek: its release earlier this year wiped roughly $1 trillion off US tech stocks in a single day. That’s a clear sign of the economic ripple effects GenAI can create.
Is it creating new jobs?
Yes again. While some argue that AI may displace jobs, GenAI has also created entirely new roles like "AI Engineer". Demand for such positions is rising quickly, and they may become as common as software or web developer roles within the next five years.
Is it accessible?
Completely. You don’t need to know how to code to use GenAI. People across the globe, speaking English, Hindi, or any other language, are interacting with these models effortlessly using natural language.
The Challenge of Learning GenAI
The major problem with studying GenAI is the field’s rapid pace of evolution. Every day, there’s a new model, research paper, or tool being released, making it nearly impossible to follow a structured learning path.
On top of that, there’s an overwhelming amount of information out there. Figuring out what’s actually useful is tough. Platforms like LinkedIn are filled with hype and FOMO, which only adds to the noise.
In truth, we’re all still figuring this field out together.
A Mental Model to Understand GenAI
At the core of GenAI are foundation models, which are massive models trained on huge datasets (often the entire internet) using powerful hardware like hundreds of GPUs.
Unlike traditional ML models that are task-specific, foundation models are general-purpose and can be fine-tuned for diverse tasks across different domains.
Large Language Models (LLMs) are a prime example of foundation models and are the backbone of GenAI today. LLMs can perform diverse tasks like text generation, sentiment analysis, summarization, and question answering because they are trained on massive datasets with large architectures and numerous parameters.
Beyond LLMs, there are also Large Multimodal Models (LMMs) that can handle inputs like images, videos, and sound, not just text. That is why the term "foundation models" is more accurate when referring to the central component of GenAI.
Two Sides of GenAI: Builder vs User
The GenAI ecosystem splits neatly into two perspectives:
Builder's Perspective – Focuses on building (developing, training and deploying) foundation models themselves.
User's Perspective – Focuses on using already built foundation models to develop applications.
Builder's Perspective: Creating Foundation Models
This side is more technical and typically involves roles like research scientists, data scientists, machine learning engineers, and MLOps engineers. It focuses on creating and deploying foundation models for global use.
Key Prerequisites:
Machine learning & deep learning fundamentals
Frameworks like PyTorch (preferred) or TensorFlow
Core Topics:
Transformer Architecture: Understanding the core architecture, including encoder/decoder sides, embeddings, self-attention mechanisms, layer normalization, and language modeling concepts.
Types of Transformers: Learning about encoder-only, decoder-only, and encoder-decoder based transformers, with specific focus on architectures like BERT and GPT.
Encoder Only (BERT)
Decoder Only (GPT)
Encoder & Decoder based (T5)
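To make the self-attention idea above concrete, here is a minimal pure-Python sketch of scaled dot-product attention. Real implementations use PyTorch tensors, batching, and multiple heads; the tiny 2-dimensional matrices here are made up purely for illustration:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q·K^T / sqrt(d_k)) · V."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Output is a weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out

# Two token embeddings of dimension 2 (toy numbers).
Q = K = V = [[1.0, 0.0], [0.0, 1.0]]
print(attention(Q, K, V))
```

Each output row is a convex combination of the value vectors, with each token attending most strongly to itself here because the queries and keys are identical.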
Pre-training: This involves training foundation models on vast datasets. Topics include -
Training Objectives
Tokenization Strategies
Training Strategies (on-machine, cloud, distributed)
Handling challenges of large-scale training
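Tokenization strategies are easiest to see with a toy example. The sketch below runs a few merge steps of a simplified byte-pair-encoding (BPE) procedure, the strategy behind most LLM tokenizers. The three-word corpus and the merge count are arbitrary; real tokenizers learn tens of thousands of merges from huge corpora:

```python
from collections import Counter

def most_frequent_pair(tokens):
    # Count adjacent symbol pairs across the corpus.
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0]

def merge_pair(tokens, pair):
    # Replace every occurrence of the pair with a single merged symbol.
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

tokens = list("low lower lowest")  # start from individual characters
for _ in range(3):                 # run three BPE merge steps
    tokens = merge_pair(tokens, most_frequent_pair(tokens))
print(tokens)
```

After a few merges, frequent character sequences like "low" become single tokens, which is exactly how BPE shrinks long texts into shorter token sequences. (Merging across spaces, as happens here, also occurs in byte-level tokenizers such as GPT-2's.)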
Optimization: Techniques to make large foundation models runnable on typical hardware.
Training Optimization
Model compression
Optimizing Inference
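As one example of model compression, post-training quantization stores weights as low-precision integers. Below is a rough pure-Python sketch of symmetric int8 quantization on a handful of made-up weights; production systems quantize whole tensors (often per channel) and handle activations as well:

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: map floats into integers in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Recover approximate floats from the stored integers.
    return [qi * scale for qi in q]

weights = [0.42, -1.30, 0.07, 0.99]  # toy float32 weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(q, scale, max_err)
```

The integers take a quarter of the memory of float32 values, at the cost of a small reconstruction error bounded by the scale factor.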
Fine-tuning: Adapting a generalized foundation model for specific tasks or types of tasks to enhance its performance.
Task specific tuning
Instruction tuning
Continual Pretraining
RLHF
PEFT
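To illustrate the PEFT idea, here is a sketch of LoRA: freeze the base weight matrix W and train only a low-rank update A·B, so this toy 4×4 layer needs 8 trainable numbers instead of 16. All matrices below are invented values; real LoRA is typically applied to the attention projections of a transformer:

```python
def matmul(A, B):
    # Plain-Python matrix multiply.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def lora_forward(x, W, A, B, alpha=1.0):
    """Effective weight is W + alpha * (A @ B); only A and B are trained."""
    base = matmul(x, W)
    delta = matmul(matmul(x, A), B)
    return [[b + alpha * d for b, d in zip(br, dr)] for br, dr in zip(base, delta)]

# Frozen 4x4 identity weight; rank-1 adapters: A is 4x1, B is 1x4.
W = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
A = [[0.1], [0.0], [0.0], [0.0]]
B = [[0.0, 0.5, 0.0, 0.0]]
x = [[1.0, 2.0, 3.0, 4.0]]
print(lora_forward(x, W, A, B))
```

Because the adapters are tiny relative to W, fine-tuning touches only a fraction of the parameters, which is what makes PEFT cheap enough to run on modest hardware.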
Evaluation: Applying thorough evaluation techniques and metrics to assess a model's performance after fine-tuning, much as LLM leaderboards compare models such as GPT-4, DeepSeek, and Claude using standardized benchmarks.
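At its simplest, such evaluation reduces to scoring model predictions against reference answers. The snippet below computes exact-match accuracy on three made-up question-answer pairs; real leaderboards aggregate many benchmarks and use more forgiving metrics alongside exact match:

```python
def exact_match_accuracy(predictions, references):
    """Fraction of predictions that exactly match the reference (case-insensitive)."""
    correct = sum(p.strip().lower() == r.strip().lower()
                  for p, r in zip(predictions, references))
    return correct / len(references)

# Toy predictions vs. gold answers: two of three match.
preds = ["Paris", "4", "blue whale"]
refs  = ["Paris", "5", "Blue Whale"]
print(exact_match_accuracy(preds, refs))
```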
Deployment: The final crucial step of making the trained models accessible for widespread use.
User’s Perspective: Using Foundation Models
This side is more accessible and appeals to developers interested in building apps using pre-trained models. It focuses on leveraging pre-built foundation models to create applications.
Core Tools and Concepts:
Building Basic LLM Applications: Learning how to use different types of available LLMs (closed-source via APIs, open-source via tools like Hugging Face or Ollama for local execution), and utilizing frameworks like LangChain to build LLM-based applications.
Open Source vs Closed Source LLMs
Using LLM APIs
LangChain
HuggingFace
Ollama
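As a sketch of what calling a closed-source LLM through its API involves, the function below builds a request body in the OpenAI-style chat-completions format. The model name, default temperature, and system prompt are illustrative assumptions; check your provider's documentation for the exact endpoint and fields:

```python
import json

def build_chat_request(model, user_prompt, system_prompt="You are a helpful assistant."):
    """Build the JSON body used by OpenAI-style chat-completion endpoints."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "temperature": 0.7,
    }

body = build_chat_request("gpt-4o-mini", "Summarize RAG in one sentence.")
print(json.dumps(body, indent=2))
# To actually call an API, you would POST this body with your API key in the
# Authorization header; endpoints and model names vary by provider.
```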
Prompt Engineering: The art and science of writing effective prompts to get better and more refined answers from an LLM.
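One common prompt-engineering pattern is few-shot prompting: show the model a couple of worked examples before the real query. A minimal template helper might look like this (the sentiment task and example reviews are made up):

```python
def few_shot_prompt(task, examples, query):
    """Assemble a few-shot prompt: instruction, worked examples, then the query."""
    lines = [task, ""]
    for inp, out in examples:
        lines.append(f"Input: {inp}")
        lines.append(f"Output: {out}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")          # trailing cue for the model to complete
    return "\n".join(lines)

prompt = few_shot_prompt(
    "Classify the sentiment of each review as positive or negative.",
    [("Loved every minute of it.", "positive"),
     ("Total waste of money.", "negative")],
    "The plot dragged, but the acting was superb.",
)
print(prompt)
```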
Retrieval Augmented Generation (RAG): A technique that lets an LLM answer questions about private or external documents it wasn't trained on, by retrieving the relevant passages and supplying them to the model alongside the question.
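Here is a deliberately tiny sketch of the RAG flow: score each document against the question, keep the best match, and paste it into the prompt as context. Real systems use vector embeddings and a vector database instead of the crude word-overlap score used here, and the documents below are invented:

```python
def score(query, doc):
    # Crude relevance: count of query words that also appear in the document.
    q = {w.strip("?.!,") for w in query.lower().split()}
    d = {w.strip("?.!,") for w in doc.lower().split()}
    return len(q & d)

def retrieve_and_prompt(query, documents, top_k=1):
    """Pick the most relevant documents and prepend them to the question."""
    ranked = sorted(documents, key=lambda d: score(query, d), reverse=True)
    context = "\n".join(ranked[:top_k])
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Cafeteria hours run from 8am to 3pm on weekdays.",
]
prompt = retrieve_and_prompt("What is the refund policy?", docs)
print(prompt)
```

The assembled prompt then goes to the LLM, which answers from the retrieved context rather than from whatever it memorized during training.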
Fine-tuning: A shallower level of fine-tuning compared to the builder's side, where users can adapt an LLM for their specific needs.
AI Agents: These are advanced systems built on top of LLMs that go beyond simple conversations. They can reason, make decisions, and take actions by using external tools or APIs. For example, an AI agent can chat with a user, check flight availability, and book tickets, all within the same interaction.
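The flight-booking example can be caricatured in a few lines. In a real agent the LLM itself decides which tool to call; here a keyword rule stands in for the model so the sketch stays runnable, and both "tools" are fakes that return canned strings:

```python
# Tools the agent can call (illustrative stand-ins for real APIs).
def check_flights(route):
    return f"3 flights found for {route}"

def book_ticket(route):
    return f"Ticket booked for {route}"

TOOLS = {"check_flights": check_flights, "book_ticket": book_ticket}

def fake_llm_decide(user_message):
    """Stand-in for the LLM's tool choice; a real agent asks the model itself."""
    if "book" in user_message.lower():
        return "book_ticket"
    return "check_flights"

def agent(user_message, route="BLR-DEL"):
    tool_name = fake_llm_decide(user_message)   # reason: choose a tool
    result = TOOLS[tool_name](route)            # act: call the chosen tool
    return f"[used {tool_name}] {result}"       # observe: report the result

print(agent("Are there flights tomorrow?"))
print(agent("Please book the 9am flight"))
```

Real agent frameworks add a loop around this reason-act-observe cycle so the model can chain several tool calls before replying.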
LLM Ops (LLM Operations): The process of developing, deploying, evaluating, and improving LLM-based applications for customers or clients. This involves technical handling throughout the application lifecycle.
Miscellaneous (Multimodal Models): Exploring how to work with foundation models that handle inputs and outputs beyond text, such as audio and video, including the study of diffusion-based models like Stable Diffusion.
So, Should an AI Engineer Learn Both?
Yes. The AI Engineer role sits at the intersection of both perspectives. Understanding how foundation models are built can significantly enhance one's ability to operate effectively from the user's perspective, leading to better career opportunities and salary potential.
Let’s not forget: GenAI is still evolving. But if you grasp both the builder and user mindsets, you're positioning yourself to lead in one of the most exciting tech revolutions of our time.
Written by Suraj Rao
Hi, I'm Suraj. I'm an aspiring AI/ML engineer passionate about building intelligent systems that learn, see, and adapt to the real world.