LLVM Instruction Selection: SelectionDAG Building

Yuanbo LiYuanbo Li
4 min read

LLVM's instruction selection (ISel) is a key stage in the LLVM backend that translates high-level LLVM IR into target-specific machine instructions. This post focuses on the frontend of instruction selection—the phase that builds the SelectionDAG from LLVM IR—and illustrates how special nodes like chain and glue enforce proper execution order in a functional representation.


Overview of LLVM Instruction Selection

LLVM's instruction selection pipeline consists of several stages:

  1. Frontend Interface: Converts LLVM IR into an initial SelectionDAG.

  2. Legalization: Adjusts the DAG to meet target-specific instruction constraints.

  3. Instruction Selection: Pattern-matches DAG nodes to actual machine instructions.

  4. Post-selection Cleanup: Prepares data structures for final emission.

  5. MachineInstr Emission: Emits final machine code instructions.

This post focuses on the frontend interface, the bridge between LLVM IR and the SelectionDAG.

Instruction Selection Frontend Interface

Purpose

  • What: Converts LLVM IR to a target-independent SelectionDAG representation.

  • Why: Enables mid-level and target-independent optimizations before lowering to machine code.

High-Level Behavior

LLVM builds a SelectionDAG for each basic block in the LLVM IR. Each DAG is composed of SDNode objects representing arithmetic, memory, and control flow operations. Data dependencies form a directed acyclic graph (DAG) that supports both optimization and instruction selection.

Key Data Structures

SelectionDAG

Defined in llvm/include/llvm/CodeGen/SelectionDAG.h, this class stores the DAG:

class SelectionDAG {
  SDValue Root;
  FoldingSet<SDNode> AllNodes;
  NodeAllocatorType NodeAllocator;
  ...
};
  • Root: Final output of the block.

  • FoldingSet: Supports node uniquing (hash-consing).

  • NodeAllocator: Bump-pointer allocator for fast memory allocation.

SDNode

Nodes in the DAG. Each represents an operation and tracks:

  • Opcode (ISD::ADD, ISD::CALL, etc.)

  • Operands (other SDValues)

  • Value types

  • Debug info and flags (e.g., nuw, exact)

class SDNode {
  int32_t NodeType;
  SDUse *UseList;
  SDValue *OperandList;
  const EVT *ValueList;
  DebugLoc debugLoc;
  ...
};

SDValue

A handle to a specific value produced by an SDNode, allowing each node to produce multiple outputs.


Side Effects and the Need for Order

SelectionDAG is a functional graph with just data dependencies. This means the backend needs extra help to enforce instruction ordering, especially for side-effecting operations:

Side effects are any changes to program state not captured in the return value, such as:

  • Function calls

  • Loads and stores

  • Inline assembly

To preserve ordering in a functional DAG, LLVM uses fake values:

  • chain: Enforces ordering among memory side effects

  • glue: Enforces ordering among non-memory effects, like register setup for calls or returns


Example: chain Enforcing Memory Order

Consider the IR fragment:

%call = tail call i32 @bar(i32 %0)
store i32 %call, ptr %p

DAG nodes:

t12: ch, glue = X86ISD::CALL ...
t13: ch = callseq_end t12
t16: ch = store t13, t15, %ir.p

Flow:

t12 (CALL)
 └── chain → t13 (callseq_end)
         └── chain → t16 (store)
  • CALL must occur before callseq_end

  • store must occur after CALL finishes

Without chaining, the DAG would allow reordering or elimination of CALL or store, breaking semantics.

Example: glue Enforcing Register Sequence

%call = tail call i32 @bar(i32 %0)

DAG nodes:

t8: ch = CopyToReg $edi, t4               ; pass argument in $edi
t12: ch, glue = X86ISD::CALL ..., glue t8 ; actual function call
t15: i32 = CopyFromReg $eax, glue t12     ; get return value

Glue Flow:

t8: ch = CopyToReg $edi, t4               ; prepare arg
 └── glue → t12: CALL
          └── glue → t15: CopyFromReg $eax ; get return value
  • CALL must happen after argument is in $edi

  • CopyFromReg must wait until CALL completes

Glue enables correct sequencing without memory chains or SSA data edges.


Node Allocation and Internals

  • DAG nodes are created via SelectionDAG::getNode, with CSE (common subexpression elimination) support.

  • Backed by a bump-pointer allocator (NodeAllocator) to efficiently handle thousands of short-lived nodes.

template <typename SDNodeT, typename... Args>
SDNodeT *newSDNode(Args &&... Args) {
  return new (NodeAllocator.template Allocate<SDNodeT>())
      SDNodeT(std::forward<Args>(Args)...);
}

This setup ensures:

  • Fast allocation

  • Memory locality

  • No deallocation overhead during DAG building


Conclusion

LLVM’s instruction selection frontend builds a functional DAG to represent IR computations, enhanced with artificial values like chain and glue to enforce ordering for side-effecting operations. This design gives LLVM flexibility to optimize and pattern-match efficiently, while ensuring correctness for memory and register effects.

0
Subscribe to my newsletter

Read articles from Yuanbo Li directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Yuanbo Li
Yuanbo Li