LLVM Instruction Selection: Pattern Matching in ISelDAGToDAG

Yuanbo LiYuanbo Li
8 min read

This is some of my notes for LLVM's ISelDAGToDAG instruction selector and how SelectionDAG patterns are matched and lowered to machine instructions.


Purpose

What: ISelDAGToDAG translates a legalized SelectionDAG into target-specific machine instructions.

Why: This is the final step before scheduling and register allocation, bridging high-level IR with concrete target assembly code.

  • Converts target-independent DAGs into optimal target-specific nodes

  • Enables machine-specific codegen optimizations


High-Level Flow

DoInstructionSelection()
├── PreprocessISelDAG()       // Prepares DAG
├── Traverse DAG in reverse    // ISelUpdater
│   ├── Skip dead nodes
│   ├── Enforce node invariants
│   ├── Convert StrictFP if unsupported
│   └── Select() → Calls target-specific logic
└── PostprocessISelDAG()

The Select() Function

The core per-node function to emit target instructions. Use AMDGPU target as example:

Select(SDNode *N)
├── Skip already selected machine opcodes
├── Memory ops (LOAD/STORE): glueCopyToM0LDSInit(), SelectCode()
├── Arithmetic ops: SelectADD_SUB_I64(), SelectUADDO_USUBO(), etc.
├── Vector ops: SelectBuildVector(), SelectVectorShuffle()
├── Constants: buildSMovImm64()
├── Bitfields: getBFE32()
├── AMDGPU specific: SelectDIV_SCALE(), SelectMAD_64_32()
├── Misc: SelectBRCOND(), SelectFP_EXTEND(), etc.
└── Fallback: SelectCode() → tablegen matcher

SelectCode & SelectCodeCommon

SelectCode() does a target-specific selection. It can also dispatch the TableGen-generated matcher. It prepares the matcher state and invokes SelectCodeCommon():

SelectCodeCommon(NodeToMatch) Outline:

1. Pre-checks for special nodes
2. Init matcher state
3. Pattern matching loop
4. Emit new machine nodes
5. State updates (chains/glue)
6. Fallback to CannotYetSelect()

Matcher Phases

1. Special Node Handling

case ISD::EntryToken: case ISD::BasicBlock: ...
  NodeToMatch->setNodeId(-1);
  return;
  • Nodes not needing selection (e.g., EntryToken, TokenFactor, and others)

2. State Initialization

There are several important data structures for pattern matching

  • Node stack: for traversal, to keep track of where we are during traversal. This is necessary because SelectionDAG does not keep parent SDNode (multiple parents due to hash-consed / folding set data structure)

  • Recorded nodes: to track match operand values

  • Input Chain/Glue, Chain nodes matched: keep track of chains and glues

3. Matcher Loop Mechanics

  • Traverse using MoveChild / MoveParent

  • Predicate checks: CheckOpcode, CheckType, etc.

  • Backtracking with MatchScopes

  • Emitting matched nodes via EmitNode, MorphNodeTo


Concrete Example: Matching INC32m on X86

Let's walk through a real example. Consider this LLVM IR:

define void @test(i32* %p) {
entry:
  %val = load i32, i32* %p
  %inc = add i32 %val, 1
  store i32 %inc, i32* %p
  ret void
}

Legalized SelectionDAG Before ISel

SelectionDAG has 10 nodes:
  t0: ch,glue = EntryToken
  t2: i64,ch = CopyFromReg t0, Register:i64 %0
  t5: i32,ch = load<(load (s32) from %ir.p)> t0, t2, undef:i64
      t7: i32 = add t5, Constant:i32<1>
    t8: ch = store<(store (s32) into %ir.p)> t5:1, t7, t2, undef:i64
  t10: ch = X86ISD::RET_GLUE t8, TargetConstant:i32<0>

Step-by-Step Breakdown

  • t0: EntryToken

    • Represents start of side-effect chain

    • Used to order operations like load/store

  • t2: CopyFromReg t0, Register:i64 %0

    • Reads first argument (i32* %p) from calling convention register

    • Returns a value and a new chain

  • t5: load t0, t2, undef

    • Loads from memory address in t2

    • Chain input is t0 (side-effect ordering)

  • t7: add t5, Constant<1>

    • Simple integer increment
  • t8: store t7, t2

    • Writes result of add back to memory

    • Chain input is t5:1 (load chain)

  • t10: RET_GLUE t8, TargetConstant<0>

    • Ensures store completes before returning

Pattern Matching Walkthrough (llc -debug-only=isel)

Matching RET_GLUE

Note that pattern matching is usually bottom up, because we first match SDNode with no uses.

ISEL: Starting selection on root node: t10
Morphed node: t10 = RET TargetConstant<0>, t8
  • Matches a target-defined pattern in X86InstrFragments.td:
def X86retglue : SDNode<"X86ISD::RET_GLUE", SDTX86Ret,
                        [SDNPHasChain, SDNPOptInGlue, SDNPVariadic]>;
  • Emits RET with the correct chain and glue

Matching Store+Add+Load into INC32m

ISEL: Matching t8: store(add(load %p), 1)
Morphed node: t8 = INC32m t2, imm1, $noreg, 0, $noreg, t0
  • Matches pattern:
(store (add (load addr:$src1), 1), addr:$src1)
  • Defined in X86InstrArithmetic.td:
---------
def INC32m : IncOpM_MF<Xi32>, OpSize32;
---------
class IncOpM_MF<X86TypeInfo t> : UnaryOpM_MF<0xFF, MRM0m, "inc", t, null_frag> {
  let Pattern = [(store (add (t.LoadNode addr:$src1), 1), addr:$src1)];
}
---------
  • Final lowered node: t8 = INC32m t2, 1, $noreg, 0, $noreg, t0
  • Breakdown of this node: (note that the index, offset, segment is X86 specific for memory addressing)

    • t2: base address pointer

    • 1: increment value

    • $noreg: no index

    • 0: offset

    • $noreg: no segment override

    • t0: chain input


Final Selected DAG

Selected SelectionDAG:
  t0: ch,glue = EntryToken
      t2: i64,ch = CopyFromReg t0, Register:i64 %0
    t8: i32,ch = INC32m<Mem:(store (s32) into %ir.p) (load (s32) from %ir.p)> t2, TargetConstant:i8<1>, Register:i64 $noreg, TargetConstant:i32<0>, Register:i16 $noreg, t0
  t10: ch = RET TargetConstant:i32<0>, t8:1
  • This is now ready for instruction scheduling and register allocation.

  • Shows how three IR-level instructions were lowered into two efficient X86 machine instructions.


Absolutely — you're right again. The LLVM IR is the critical bridge between C++ source and instruction selection (ISel). If the IR is structured poorly, even if semantically equivalent, it prevents ISel from emitting optimal code.

So let me restructure the explanation with strong emphasis on IR structure, why it matters, and how it shapes ISel behavior — using your complex_conditional_poor vs complex_conditional_good example.


Case Study: How C++ Shape of Conditions Steers x86 ISel (SETcc vs Jcc)

Two functions are semantically identical, but they lead to very different IR—and that difference cascades into instruction selection quality on x86.

The C++: “poor” vs “good”

// Suboptimal: materializes booleans and uses selects
int complex_conditional_poor(int *arr, int len, int t1, int t2) {
    int count = 0;
    for (int i = 0; i < len; i++) {
        bool cond1 = arr[i] > t1;
        bool cond2 = arr[i] < t2;
        bool should_update = cond1 && cond2;

        int diff = should_update ? (arr[i] - t1) : arr[i];
        arr[i] = diff;
        count += should_update ? 1 : 0;
    }
    return count;
}

// Better: preserves control-flow conditions
int complex_conditional_good(int *arr, int len, int t1, int t2) {
    int count = 0;
    for (int i = 0; i < len; i++) {
        if (arr[i] > t1 && arr[i] < t2) {
            arr[i] -= t1;
            count++;
        }
    }
    return count;
}

What the IR Looks Like

“Poor” IR (booleans + select)

%cmp1 = icmp sgt i32 %val, %t1
%cmp2 = icmp slt i32 %val, %t2
%cond = and i1 %cmp1, %cmp2

%sel_val = select i1 %cond, i32 %val - %t1, i32 %val
store i32 %sel_val, i32* %ptr

%cond_z = zext i1 %cond to i32
%count_new = add i32 %count, %cond_z

Why this hurts on x86

  • The icmp results become SSA booleans (i1), not live EFLAGS.

  • To reuse %cond multiple times, ISel must materialize it with SETcc into a GPR, combine with AND, and often feed a CMOV/SELECT—burning uops and registers.

  • EFLAGS set by CMP get discarded early, so later users must re-test.

Typical lowering sketch (SelectionDAG)

; Two compares produce EFLAGS (via SUB/CMP), then:
i8 = X86ISD::SETCC  ; for each compare
i8 = and            ; combine bools
i32 = X86ISD::CMP   ; TEST/compare materialized bool
i32 = X86ISD::CMOV  ; emulate 'select'

Triggering pattern

  def : Pat<(X86cmov GR32:$src1, 0, timm:$cond, EFLAGS),
            (CFCMOV32rr GR32:$src1, (inv_cond_XFORM timm:$cond))>;

Representative asm shape

cmpl   %edx, %r9d      ; arr[i] > t1
setg   %r10b
cmpl   %ecx, %r9d      ; arr[i] < t2
setl   %r11b
andb   %r10b, %r11b
movl   $0, %r10d
cmovnel %edx, %r10d    ; select t1 or 0 based on cond
subl   %r10d, %r9d     ; arr[i] -= selected
movl   %r9d, (%rdi,%r8,4)
movzbl %r11b, %r9d     ; count += cond
addl   %r9d, %eax

Key signals: SETcc, boolean and, CMOV, and a zero-extend of the materialized condition.


“Good” IR (branching preserves flags)

%cmp1  = icmp sgt i32 %val, %t1
%cmp2  = icmp slt i32 %val, %t2
%cond  = and i1 %cmp1, %cmp2
br i1 %cond, label %then, label %else

then:
  %sub = sub i32 %val, %t1
  store i32 %sub, i32* %ptr
  %count_new = add i32 %count, 1
  br label %cont

else:
  store i32 %val, i32* %ptr
  br label %cont

Why this helps on x86

  • The icmp feeds a br i1, so EFLAGS can be consumed directly by a Jcc.

  • No boolean materialization, no select ⇒ fewer uops, less register pressure.

  • Arithmetic happens only on the taken path.

Typical lowering sketch (SelectionDAG)

i32,i32 = X86ISD::SUB     ; compute compare, set EFLAGS
i8      = X86ISD::SETCC   ; (may appear transiently), often folded
ch      = X86ISD::BRCOND  ; Jcc consumes EFLAGS

Triggering pattern

let isBranch = 1, isTerminator = 1, Uses = [EFLAGS], SchedRW = [WriteJump],
    isCodeGenOnly = 1, ForceDisassemble = 1 in {
  def JCC_1 : Ii8PCRel <0x70, AddCCFrm, (outs),
                        (ins brtarget8:$dst, ccode:$cond),
                        "j${cond}\t$dst",
                        [(X86brcond bb:$dst, timm:$cond, EFLAGS)]>;

Representative asm shape

movl  (%rdi,%r8,4), %r10d
movl    %r10d, %r9d
subl  %edx, %r10d
setle %r11b            ; or folded pattern feeding branch
cmpl  %ecx, (%rdi,%r8,4)
setge %r10b
orb   %r11b, %r10b
jne   .Lthen
movl  %r9d, (%rdi,%r8,4)
incl    %eax
jmp   .Lcont

the final code streamlines to a single CMP + Jcc pattern.


What ISel Is Matching Under the Hood

  • “Poor” path uses target-specific nodes like:

    • X86ISD::SETCC (i8 0/1) consuming the EFLAGS producer,

    • boolean and,

    • X86ISD::CMOV to implement select.

  • “Good” path lets the TableGen pattern for:

    • [(X86brcond bb:$dst, timm:$cond, EFLAGS)]
      directly match Jcc, keeping the flags live and avoiding materialization.

Quick Comparison

Aspect“Poor” IR (select + i1 reuses)“Good” IR (branch)
Condition formMaterialized i1 in GPRsLive in EFLAGSJcc
PredicationselectCMOVReal control flow
Extra opsSETcc, AND, CMOV, movzblUsually none
Register pressureHigher (carry bools)Lower
Micro-opsMoreFewer

Practical Guidelines (for writing IR-friendly C++)

  • Prefer if (...) { ... } else { ... } over ternary select when the condition is reused.

  • Avoid building long boolean pipelines you’ll consume multiple times.

  • Let branches carry the condition so the backend can map to Jcc and reuse EFLAGS.

  • If you truly need predication, try to keep compares close to their use to avoid re-tests.


Summary

  • ISelDAGToDAG performs pattern-driven matching on SelectionDAG

  • TableGen patterns describe mappings from DAG to machine instrs

  • Matcher logic uses stacks, backtracking, predicates, and emits

  • Chain/glue nodes preserve correctness in side-effect ordering

  • Complex patterns can be matched and morphed in one phase

  • Excellent developers with hardware and compiler knowledge can be aware of the instruction selection to generate more performant code

0
Subscribe to my newsletter

Read articles from Yuanbo Li directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Yuanbo Li
Yuanbo Li