LLVM's instruction selection (ISel) is a key stage in the LLVM backend that translates high-level LLVM IR into target-specific machine instructions. This post focuses on the frontend of instruction selection—the phase that builds the SelectionDAG from LLVM IR—and illustrates how special nodes like chain and glue enforce proper execution order in a functional representation.

Overview of LLVM Instruction Selection

LLVM's instruction selection pipeline consists of several stages:

Frontend Interface: Converts LLVM IR into an initial SelectionDAG.
Legalization: Adjusts the DAG to meet target-specific instruction constraints.
Instruction Selection: Pattern-matches DAG nodes to actual machine instructions.
Post-selection Cleanup: Prepares data structures for final emission.
MachineInstr Emission: Emits final machine code instructions.

This post focuses on the frontend interface, the bridge between LLVM IR and the SelectionDAG.

Instruction Selection Frontend Interface

Purpose

What: Converts LLVM IR to a target-independent SelectionDAG representation.
Why: Enables mid-level and target-independent optimizations before lowering to machine code.

High-Level Behavior

LLVM builds a SelectionDAG for each basic block in the LLVM IR. Each DAG is composed of SDNode objects representing arithmetic, memory, and control flow operations. Data dependencies form a directed acyclic graph (DAG) that supports both optimization and instruction selection.

Key Data Structures

`SelectionDAG`

Defined in llvm/include/llvm/CodeGen/SelectionDAG.h, this class stores the DAG:

class SelectionDAG {
  SDValue Root;
  FoldingSet<SDNode> AllNodes;
  NodeAllocatorType NodeAllocator;
  ...
};

Root: Final output of the block.
FoldingSet: Supports node uniquing (hash-consing).
NodeAllocator: Bump-pointer allocator for fast memory allocation.

`SDNode`

Nodes in the DAG. Each represents an operation and tracks:

Opcode (ISD::ADD, ISD::CALL, etc.)
Operands (other SDValues)
Value types
Debug info and flags (e.g., nuw, exact)

class SDNode {
  int32_t NodeType;
  SDUse *UseList;
  SDValue *OperandList;
  const EVT *ValueList;
  DebugLoc debugLoc;
  ...
};

`SDValue`

A handle to a specific value produced by an SDNode, allowing each node to produce multiple outputs.

Side Effects and the Need for Order

SelectionDAG is a functional graph with just data dependencies. This means the backend needs extra help to enforce instruction ordering, especially for side-effecting operations:

Side effects are any changes to program state not captured in the return value, such as:

Function calls

Loads and stores

Inline assembly

To preserve ordering in a functional DAG, LLVM uses fake values:

chain: Enforces ordering among memory side effects
glue: Enforces ordering among non-memory effects, like register setup for calls or returns

Example: `chain` Enforcing Memory Order

Consider the IR fragment:

%call = tail call i32 @bar(i32 %0)
store i32 %call, ptr %p

DAG nodes:

t12: ch, glue = X86ISD::CALL ...
t13: ch = callseq_end t12
t16: ch = store t13, t15, %ir.p

Flow:

t12 (CALL)
 └── chain → t13 (callseq_end)
         └── chain → t16 (store)

CALL must occur before callseq_end
store must occur after CALL finishes

Without chaining, the DAG would allow reordering or elimination of CALL or store, breaking semantics.

Example: `glue` Enforcing Register Sequence

%call = tail call i32 @bar(i32 %0)

DAG nodes:

t8: ch = CopyToReg $edi, t4               ; pass argument in $edi
t12: ch, glue = X86ISD::CALL ..., glue t8 ; actual function call
t15: i32 = CopyFromReg $eax, glue t12     ; get return value

Glue Flow:

t8: ch = CopyToReg $edi, t4               ; prepare arg
 └── glue → t12: CALL
          └── glue → t15: CopyFromReg $eax ; get return value

CALL must happen after argument is in $edi
CopyFromReg must wait until CALL completes

Glue enables correct sequencing without memory chains or SSA data edges.

Node Allocation and Internals

DAG nodes are created via SelectionDAG::getNode, with CSE (common subexpression elimination) support.
Backed by a bump-pointer allocator (NodeAllocator) to efficiently handle thousands of short-lived nodes.

template <typename SDNodeT, typename... Args>
SDNodeT *newSDNode(Args &&... Args) {
  return new (NodeAllocator.template Allocate<SDNodeT>())
      SDNodeT(std::forward<Args>(Args)...);
}

This setup ensures:

Fast allocation
Memory locality
No deallocation overhead during DAG building

Conclusion

LLVM’s instruction selection frontend builds a functional DAG to represent IR computations, enhanced with artificial values like chain and glue to enforce ordering for side-effecting operations. This design gives LLVM flexibility to optimize and pattern-match efficiently, while ensuring correctness for memory and register effects.

LLVM Instruction Selection: SelectionDAG Building

Overview of LLVM Instruction Selection

Instruction Selection Frontend Interface

Purpose

High-Level Behavior

Key Data Structures

`SelectionDAG`

`SDNode`

`SDValue`

Side Effects and the Need for Order

Example: `chain` Enforcing Memory Order

Flow:

Example: `glue` Enforcing Register Sequence

Glue Flow:

Node Allocation and Internals

Conclusion

Subscribe to my newsletter

Yuanbo Li

Yuanbo Li

LLVM Instruction Selection: SelectionDAG Building

Overview of LLVM Instruction Selection

Instruction Selection Frontend Interface

Purpose

High-Level Behavior

Key Data Structures

SelectionDAG

SDNode

SDValue

Side Effects and the Need for Order

Example: chain Enforcing Memory Order

Flow:

Example: glue Enforcing Register Sequence

Glue Flow:

Node Allocation and Internals

Conclusion

Subscribe to my newsletter

Yuanbo Li

Yuanbo Li

`SelectionDAG`

`SDNode`

`SDValue`

Example: `chain` Enforcing Memory Order

Example: `glue` Enforcing Register Sequence