Understanding the Graphics Pipeline: A Deep Dive into Real-Time Rendering


Modern GPUs are built to execute a highly parallel and programmable pipeline designed for transforming vertex data into rendered pixels. Whether using OpenGL, Vulkan, or DirectX, the fundamental structure remains similar. The graphics pipeline consists of sequential programmable and fixed-function stages that work on geometry and pixel data.
This article walks through each stage from a technical standpoint, including C++ and GLSL examples.
Input Assembly: Structuring Geometry for the GPU
The process begins on the CPU side by defining geometry in terms of vertices, which form the basis for all rendering.
// Requires GLM (https://github.com/g-truc/glm)
#include <glm/glm.hpp>

struct Vertex {
    glm::vec3 position;  // object-space position
    glm::vec3 normal;    // surface normal, used for lighting
    glm::vec2 texCoord;  // UV coordinates for texturing
};
These vertices are uploaded to GPU memory using Vertex Buffer Objects (VBOs). A Vertex Array Object (VAO) is then created to describe the layout of these attributes and bind them efficiently.
// Assumes a C-style array: Vertex vertices[] = { ... };
// (sizeof(vertices) only works for arrays, not pointers or std::vector)
GLuint vao, vbo;
glGenVertexArrays(1, &vao);
glGenBuffers(1, &vbo);

glBindVertexArray(vao);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, sizeof(vertices), vertices, GL_STATIC_DRAW);

// Attribute 0: position
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex), (void*)offsetof(Vertex, position));
glEnableVertexAttribArray(0);

// Attribute 1: normal
glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex), (void*)offsetof(Vertex, normal));
glEnableVertexAttribArray(1);

// Attribute 2: texture coordinates
glVertexAttribPointer(2, 2, GL_FLOAT, GL_FALSE, sizeof(Vertex), (void*)offsetof(Vertex, texCoord));
glEnableVertexAttribArray(2);
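With the VAO configured, a draw call is what actually feeds this geometry into the pipeline. A minimal sketch (the vertex count here is hypothetical):

glBindVertexArray(vao);
glDrawArrays(GL_TRIANGLES, 0, 36);  // e.g., 36 vertices = 12 triangles for a cube
glBindVertexArray(0);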
Vertex Shader: Per-Vertex Transformation
The Vertex Shader is the first programmable stage in the GPU pipeline. It runs once per vertex, transforming each vertex from local object space into clip space via the model, view, and projection matrices.
#version 450

layout(location = 0) in vec3 inPosition;
layout(location = 1) in vec3 inNormal;
layout(location = 2) in vec2 inTexCoord;

// Note: set = N qualifiers are Vulkan-only; in OpenGL a binding point suffices.
layout(std140, binding = 0) uniform MVP {
    mat4 model;
    mat4 view;
    mat4 projection;
} mvp;

layout(location = 0) out vec3 fragNormal;
layout(location = 1) out vec2 fragTexCoord;

void main() {
    gl_Position = mvp.projection * mvp.view * mvp.model * vec4(inPosition, 1.0);
    // Normal matrix: the inverse-transpose keeps normals correct under non-uniform scaling.
    fragNormal = mat3(transpose(inverse(mvp.model))) * inNormal;
    fragTexCoord = inTexCoord;
}
Typical per-vertex work includes the space transformation itself, correcting normals with the inverse-transpose of the model matrix, and forwarding attributes that the rasterizer will later interpolate for the fragment stage.
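On the CPU side, the matching uniform block can be supplied through a uniform buffer object. A minimal sketch using GLM (the MVPBlock name and camera values are illustrative):

#include <glm/gtc/matrix_transform.hpp>  // glm::lookAt, glm::perspective

struct MVPBlock {
    glm::mat4 model;
    glm::mat4 view;
    glm::mat4 projection;
};  // matches the std140 layout above (mat4 members are tightly packed)

MVPBlock block;
block.model      = glm::mat4(1.0f);
block.view       = glm::lookAt(glm::vec3(0, 0, 3), glm::vec3(0), glm::vec3(0, 1, 0));
block.projection = glm::perspective(glm::radians(60.0f), 800.0f / 600.0f, 0.1f, 100.0f);

GLuint ubo;
glGenBuffers(1, &ubo);
glBindBuffer(GL_UNIFORM_BUFFER, ubo);
glBufferData(GL_UNIFORM_BUFFER, sizeof(MVPBlock), &block, GL_DYNAMIC_DRAW);
glBindBufferBase(GL_UNIFORM_BUFFER, 0, ubo);  // binding = 0, as declared in the shader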
Primitive Assembly and Clipping
After vertex processing, the GPU assembles vertices into primitives based on the drawing mode (e.g., GL_TRIANGLES). These primitives are then clipped against the view frustum to discard geometry outside the camera’s field of view.
Clipping happens in homogeneous clip space, where a vertex lies inside the frustum when -w ≤ x, y, z ≤ w. After clipping, the GPU performs the perspective division (dividing x, y, and z by w) to reach Normalized Device Coordinates (NDC).
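This is fixed-function hardware, but conceptually it amounts to a few lines of math. A sketch using GLM types (localPos and the 800x600 viewport are hypothetical):

// Clip-space position, as produced by the vertex shader.
glm::vec4 clip = projection * view * model * glm::vec4(localPos, 1.0f);

// Perspective division: clip space -> NDC, each axis now in [-1, 1].
glm::vec3 ndc = glm::vec3(clip) / clip.w;

// Viewport transform: NDC -> window coordinates.
float sx = (ndc.x * 0.5f + 0.5f) * 800.0f;
float sy = (ndc.y * 0.5f + 0.5f) * 600.0f;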
Rasterization: From Primitives to Fragments
Once primitives are assembled and clipped, the rasterizer maps them onto screen-space pixels. The result is a grid of fragments, each carrying interpolated data such as texture coordinates, normals, or custom attributes.
Rasterization does not produce colors yet. It determines which pixels each primitive covers and interpolates the per-vertex outputs across the triangle (perspective-correctly by default) to produce the inputs for the fragment shader.
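To make that interpolation concrete, here is a purely illustrative function mirroring what the hardware does (not any specific API): attributes are weighted by 1/w so that interpolation is linear in 3D space rather than in screen space.

#include <glm/glm.hpp>

// a0..a2: attribute at each triangle vertex; w0..w2: clip-space w at each vertex;
// b: screen-space barycentric weights of the fragment (b.x + b.y + b.z == 1).
glm::vec2 interpolatePerspective(glm::vec2 a0, glm::vec2 a1, glm::vec2 a2,
                                 float w0, float w1, float w2, glm::vec3 b) {
    // Interpolating attribute/w and 1/w linearly in screen space is exact;
    // dividing the two recovers the perspective-correct attribute value.
    glm::vec2 num = b.x * a0 / w0 + b.y * a1 / w1 + b.z * a2 / w2;
    float denom   = b.x / w0 + b.y / w1 + b.z / w2;
    return num / denom;
}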
Fragment Shader: Computing Final Pixel Colors
The Fragment Shader is executed once per fragment and outputs the final pixel color. It performs lighting, texturing, and any other per-pixel effects.
#version 450

layout(location = 0) in vec3 fragNormal;
layout(location = 1) in vec2 fragTexCoord;

layout(location = 0) out vec4 outColor;

// As above, the set qualifier is Vulkan-only; OpenGL uses a plain binding point.
layout(binding = 0) uniform sampler2D diffuseTexture;

void main() {
    vec3 normal   = normalize(fragNormal);
    vec3 lightDir = normalize(vec3(0.5, 0.8, 0.6));  // fixed directional light
    float diff    = max(dot(normal, lightDir), 0.0); // Lambertian diffuse term
    vec3 color    = texture(diffuseTexture, fragTexCoord).rgb;
    outColor      = vec4(color * diff, 1.0);
}
The shader’s output is not written to the framebuffer directly; it must first survive the per-fragment operations described next.
Per-Fragment Operations: Depth, Stencil, and Blending
Before a fragment becomes a pixel, several operations determine its fate.
- Depth Testing compares the fragment’s depth against the value already in the depth buffer; if the test fails (typically GL_LESS), the fragment is discarded.
- Stencil Testing masks out regions of the screen based on values previously written to the stencil buffer, enabling effects such as mirrors, portals, and outlines.
- Blending combines the incoming fragment color with the color already in the framebuffer; with the standard alpha-blend function the result is src.rgb * src.a + dst.rgb * (1 - src.a). This is crucial for transparency.
Example OpenGL blending setup:
glEnable(GL_BLEND);
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
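Depth testing is enabled in the same fashion. A typical setup keeps the nearest fragment and clears both buffers at the start of each frame:

glEnable(GL_DEPTH_TEST);
glDepthFunc(GL_LESS);                                // keep fragments closer than the stored depth
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);  // reset color and depth each frame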
Framebuffer: Writing the Final Output
An accepted fragment is written to the currently bound framebuffer. This may be the default framebuffer (presented directly to the display) or an application-created Framebuffer Object (FBO) used for off-screen rendering and post-processing.
Framebuffers can have multiple attachments for color, depth, and stencil. Multiple render targets (MRT) allow writing to several color attachments at once, useful for deferred shading.
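A minimal sketch of an off-screen FBO with one color texture and a depth renderbuffer (the 800x600 size is arbitrary):

GLuint fbo, colorTex, depthRbo;
glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_FRAMEBUFFER, fbo);

// Color attachment: a texture, so a later pass can sample it.
glGenTextures(1, &colorTex);
glBindTexture(GL_TEXTURE_2D, colorTex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, 800, 600, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, colorTex, 0);

// Depth attachment: a renderbuffer, since we never need to sample it.
glGenRenderbuffers(1, &depthRbo);
glBindRenderbuffer(GL_RENDERBUFFER, depthRbo);
glRenderbufferStorage(GL_RENDERBUFFER, GL_DEPTH_COMPONENT24, 800, 600);
glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, GL_RENDERBUFFER, depthRbo);

if (glCheckFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE) {
    // handle incomplete framebuffer
}
glBindFramebuffer(GL_FRAMEBUFFER, 0);  // back to the default framebuffer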
Optional Stages: Advanced Pipeline Features
Modern rendering pipelines support additional programmable stages:
Geometry Shader
Executed after the vertex shader, it can generate new primitives from existing ones. It’s useful for dynamic LOD, wireframe generation, or billboard creation. A minimal pass-through example:
#version 450

layout(triangles) in;
layout(triangle_strip, max_vertices = 3) out;

void main() {
    // Pass-through: re-emit the incoming triangle unchanged.
    for (int i = 0; i < 3; ++i) {
        gl_Position = gl_in[i].gl_Position;
        EmitVertex();
    }
    EndPrimitive();
}
Tessellation Shaders
Used in conjunction with patch primitives to subdivide geometry dynamically on the GPU. The Tessellation Control Shader (TCS) decides how finely each patch is subdivided, and the Tessellation Evaluation Shader (TES) positions the vertices the tessellator generates. These stages drive smooth-surface rendering in high-end applications; the host-side setup is sketched below.
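On the OpenGL side, tessellation means drawing patches instead of plain triangles. A minimal sketch (vertexCount is hypothetical):

glPatchParameteri(GL_PATCH_VERTICES, 3);   // each patch hands 3 control points to the TCS
glDrawArrays(GL_PATCHES, 0, vertexCount);  // requires a program with TCS and TES attached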
Compute Shaders
Though not part of the rasterization pipeline, compute shaders can pre-process textures, simulate particle systems, or do physics-based calculations entirely on the GPU.
#version 450

layout(local_size_x = 16, local_size_y = 16) in;
layout(rgba32f, binding = 0) uniform image2D outputImage;

void main() {
    ivec2 id = ivec2(gl_GlobalInvocationID.xy);
    // Write a simple UV gradient (assumes an 800x600 image).
    vec4 color = vec4(float(id.x) / 800.0, float(id.y) / 600.0, 0.0, 1.0);
    imageStore(outputImage, id, color);
}
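Dispatching it from C++ is a matter of binding the image and launching enough 16x16 work groups to cover it (computeProgram and outputTex are assumed to exist):

glUseProgram(computeProgram);
glBindImageTexture(0, outputTex, 0, GL_FALSE, 0, GL_WRITE_ONLY, GL_RGBA32F);
glDispatchCompute((800 + 15) / 16, (600 + 15) / 16, 1);  // round up to cover 800x600
glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT);     // make writes visible before sampling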
Conclusion
GPU rendering follows a structured sequence where vertex data is transformed, shaded, and tested before becoming final pixel output. Each frame reflects a pipeline of discrete, highly parallel stages operating with precision across the GPU.
As the internal flow becomes clearer, rendering techniques such as deferred shading, post-processing, and physically based lighting begin to align with the underlying mechanics. The process retains its complexity, but the architecture behind it becomes accessible and deliberate rather than opaque.