LLVM Instruction Selection: Pattern Matching in ISelDAGToDAG

This is some of my notes for LLVM's ISelDAGToDAG instruction selector and how SelectionDAG patterns are matched and lowered to machine instructions.
Purpose
What: ISelDAGToDAG translates a legalized SelectionDAG into target-specific machine instructions.
Why: This is the final step before scheduling and register allocation, bridging high-level IR with concrete target assembly code.
Converts target-independent DAGs into optimal target-specific nodes
Enables machine-specific codegen optimizations
High-Level Flow
DoInstructionSelection()
├── PreprocessISelDAG() // Prepares DAG
├── Traverse DAG in reverse // ISelUpdater
│ ├── Skip dead nodes
│ ├── Enforce node invariants
│ ├── Convert StrictFP if unsupported
│ └── Select() → Calls target-specific logic
└── PostprocessISelDAG()
The Select() Function
The core per-node function to emit target instructions. Use AMDGPU target as example:
Select(SDNode *N)
├── Skip already selected machine opcodes
├── Memory ops (LOAD/STORE): glueCopyToM0LDSInit(), SelectCode()
├── Arithmetic ops: SelectADD_SUB_I64(), SelectUADDO_USUBO(), etc.
├── Vector ops: SelectBuildVector(), SelectVectorShuffle()
├── Constants: buildSMovImm64()
├── Bitfields: getBFE32()
├── AMDGPU specific: SelectDIV_SCALE(), SelectMAD_64_32()
├── Misc: SelectBRCOND(), SelectFP_EXTEND(), etc.
└── Fallback: SelectCode() → tablegen matcher
SelectCode & SelectCodeCommon
SelectCode() does a target-specific selection. It can also dispatch the TableGen-generated matcher. It prepares the matcher state and invokes SelectCodeCommon()
:
SelectCodeCommon(NodeToMatch)
Outline:
1. Pre-checks for special nodes
2. Init matcher state
3. Pattern matching loop
4. Emit new machine nodes
5. State updates (chains/glue)
6. Fallback to CannotYetSelect()
Matcher Phases
1. Special Node Handling
case ISD::EntryToken: case ISD::BasicBlock: ...
NodeToMatch->setNodeId(-1);
return;
- Nodes not needing selection (e.g., EntryToken, TokenFactor, and others)
2. State Initialization
There are several important data structures for pattern matching
Node stack: for traversal, to keep track of where we are during traversal. This is necessary because SelectionDAG does not keep parent SDNode (multiple parents due to hash-consed / folding set data structure)
Recorded nodes: to track match operand values
Input Chain/Glue, Chain nodes matched: keep track of chains and glues
3. Matcher Loop Mechanics
Traverse using
MoveChild
/MoveParent
Predicate checks:
CheckOpcode
,CheckType
, etc.Backtracking with
MatchScopes
Emitting matched nodes via
EmitNode
,MorphNodeTo
Concrete Example: Matching INC32m
on X86
Let's walk through a real example. Consider this LLVM IR:
define void @test(i32* %p) {
entry:
%val = load i32, i32* %p
%inc = add i32 %val, 1
store i32 %inc, i32* %p
ret void
}
Legalized SelectionDAG Before ISel
SelectionDAG has 10 nodes:
t0: ch,glue = EntryToken
t2: i64,ch = CopyFromReg t0, Register:i64 %0
t5: i32,ch = load<(load (s32) from %ir.p)> t0, t2, undef:i64
t7: i32 = add t5, Constant:i32<1>
t8: ch = store<(store (s32) into %ir.p)> t5:1, t7, t2, undef:i64
t10: ch = X86ISD::RET_GLUE t8, TargetConstant:i32<0>
Step-by-Step Breakdown
t0:
EntryToken
Represents start of side-effect chain
Used to order operations like load/store
t2:
CopyFromReg t0, Register:i64 %0
Reads first argument (i32* %p) from calling convention register
Returns a value and a new chain
t5:
load t0, t2, undef
Loads from memory address in
t2
Chain input is
t0
(side-effect ordering)
t7:
add t5, Constant<1>
- Simple integer increment
t8:
store t7, t2
Writes result of add back to memory
Chain input is
t5:1
(load chain)
t10:
RET_GLUE t8, TargetConstant<0>
- Ensures store completes before returning
Pattern Matching Walkthrough (llc -debug-only=isel)
Matching RET_GLUE
Note that pattern matching is usually bottom up, because we first match SDNode with no uses.
ISEL: Starting selection on root node: t10
Morphed node: t10 = RET TargetConstant<0>, t8
- Matches a target-defined pattern in
X86InstrFragments.td
:
def X86retglue : SDNode<"X86ISD::RET_GLUE", SDTX86Ret,
[SDNPHasChain, SDNPOptInGlue, SDNPVariadic]>;
- Emits
RET
with the correct chain and glue
Matching Store+Add+Load into INC32m
ISEL: Matching t8: store(add(load %p), 1)
Morphed node: t8 = INC32m t2, imm1, $noreg, 0, $noreg, t0
- Matches pattern:
(store (add (load addr:$src1), 1), addr:$src1)
- Defined in
X86InstrArithmetic.td
:
---------
def INC32m : IncOpM_MF<Xi32>, OpSize32;
---------
class IncOpM_MF<X86TypeInfo t> : UnaryOpM_MF<0xFF, MRM0m, "inc", t, null_frag> {
let Pattern = [(store (add (t.LoadNode addr:$src1), 1), addr:$src1)];
}
---------
- Final lowered node:
t8 = INC32m t2, 1, $noreg, 0, $noreg, t0
Breakdown of this node: (note that the index, offset, segment is X86 specific for memory addressing)
t2
: base address pointer1
: increment value$noreg
: no index0
: offset$noreg
: no segment overridet0
: chain input
Final Selected DAG
Selected SelectionDAG:
t0: ch,glue = EntryToken
t2: i64,ch = CopyFromReg t0, Register:i64 %0
t8: i32,ch = INC32m<Mem:(store (s32) into %ir.p) (load (s32) from %ir.p)> t2, TargetConstant:i8<1>, Register:i64 $noreg, TargetConstant:i32<0>, Register:i16 $noreg, t0
t10: ch = RET TargetConstant:i32<0>, t8:1
This is now ready for instruction scheduling and register allocation.
Shows how three IR-level instructions were lowered into two efficient X86 machine instructions.
Absolutely — you're right again. The LLVM IR is the critical bridge between C++ source and instruction selection (ISel). If the IR is structured poorly, even if semantically equivalent, it prevents ISel from emitting optimal code.
So let me restructure the explanation with strong emphasis on IR structure, why it matters, and how it shapes ISel behavior — using your complex_conditional_poor
vs complex_conditional_good
example.
Case Study: How C++ Shape of Conditions Steers x86 ISel (SETcc vs Jcc)
Two functions are semantically identical, but they lead to very different IR—and that difference cascades into instruction selection quality on x86.
The C++: “poor” vs “good”
// Suboptimal: materializes booleans and uses selects
int complex_conditional_poor(int *arr, int len, int t1, int t2) {
int count = 0;
for (int i = 0; i < len; i++) {
bool cond1 = arr[i] > t1;
bool cond2 = arr[i] < t2;
bool should_update = cond1 && cond2;
int diff = should_update ? (arr[i] - t1) : arr[i];
arr[i] = diff;
count += should_update ? 1 : 0;
}
return count;
}
// Better: preserves control-flow conditions
int complex_conditional_good(int *arr, int len, int t1, int t2) {
int count = 0;
for (int i = 0; i < len; i++) {
if (arr[i] > t1 && arr[i] < t2) {
arr[i] -= t1;
count++;
}
}
return count;
}
What the IR Looks Like
“Poor” IR (booleans + select)
%cmp1 = icmp sgt i32 %val, %t1
%cmp2 = icmp slt i32 %val, %t2
%cond = and i1 %cmp1, %cmp2
%sel_val = select i1 %cond, i32 %val - %t1, i32 %val
store i32 %sel_val, i32* %ptr
%cond_z = zext i1 %cond to i32
%count_new = add i32 %count, %cond_z
Why this hurts on x86
The
icmp
results become SSA booleans (i1
), not live EFLAGS.To reuse
%cond
multiple times, ISel must materialize it withSETcc
into a GPR, combine withAND
, and often feed aCMOV
/SELECT
—burning uops and registers.EFLAGS set by
CMP
get discarded early, so later users must re-test.
Typical lowering sketch (SelectionDAG)
; Two compares produce EFLAGS (via SUB/CMP), then:
i8 = X86ISD::SETCC ; for each compare
i8 = and ; combine bools
i32 = X86ISD::CMP ; TEST/compare materialized bool
i32 = X86ISD::CMOV ; emulate 'select'
Triggering pattern
def : Pat<(X86cmov GR32:$src1, 0, timm:$cond, EFLAGS),
(CFCMOV32rr GR32:$src1, (inv_cond_XFORM timm:$cond))>;
Representative asm shape
cmpl %edx, %r9d ; arr[i] > t1
setg %r10b
cmpl %ecx, %r9d ; arr[i] < t2
setl %r11b
andb %r10b, %r11b
movl $0, %r10d
cmovnel %edx, %r10d ; select t1 or 0 based on cond
subl %r10d, %r9d ; arr[i] -= selected
movl %r9d, (%rdi,%r8,4)
movzbl %r11b, %r9d ; count += cond
addl %r9d, %eax
Key signals: SETcc
, boolean and
, CMOV
, and a zero-extend of the materialized condition.
“Good” IR (branching preserves flags)
%cmp1 = icmp sgt i32 %val, %t1
%cmp2 = icmp slt i32 %val, %t2
%cond = and i1 %cmp1, %cmp2
br i1 %cond, label %then, label %else
then:
%sub = sub i32 %val, %t1
store i32 %sub, i32* %ptr
%count_new = add i32 %count, 1
br label %cont
else:
store i32 %val, i32* %ptr
br label %cont
Why this helps on x86
The
icmp
feeds abr i1
, so EFLAGS can be consumed directly by aJcc
.No boolean materialization, no
select
⇒ fewer uops, less register pressure.Arithmetic happens only on the taken path.
Typical lowering sketch (SelectionDAG)
i32,i32 = X86ISD::SUB ; compute compare, set EFLAGS
i8 = X86ISD::SETCC ; (may appear transiently), often folded
ch = X86ISD::BRCOND ; Jcc consumes EFLAGS
Triggering pattern
let isBranch = 1, isTerminator = 1, Uses = [EFLAGS], SchedRW = [WriteJump],
isCodeGenOnly = 1, ForceDisassemble = 1 in {
def JCC_1 : Ii8PCRel <0x70, AddCCFrm, (outs),
(ins brtarget8:$dst, ccode:$cond),
"j${cond}\t$dst",
[(X86brcond bb:$dst, timm:$cond, EFLAGS)]>;
Representative asm shape
movl (%rdi,%r8,4), %r10d
movl %r10d, %r9d
subl %edx, %r10d
setle %r11b ; or folded pattern feeding branch
cmpl %ecx, (%rdi,%r8,4)
setge %r10b
orb %r11b, %r10b
jne .Lthen
movl %r9d, (%rdi,%r8,4)
incl %eax
jmp .Lcont
the final code streamlines to a single CMP
+ Jcc
pattern.
What ISel Is Matching Under the Hood
“Poor” path uses target-specific nodes like:
X86ISD::SETCC
(i8 0/1) consuming the EFLAGS producer,boolean
and
,X86ISD::CMOV
to implementselect
.
“Good” path lets the TableGen pattern for:
[(X86brcond bb:$dst, timm:$cond, EFLAGS)]
directly matchJcc
, keeping the flags live and avoiding materialization.
Quick Comparison
Aspect | “Poor” IR (select + i1 reuses) | “Good” IR (branch) |
Condition form | Materialized i1 in GPRs | Live in EFLAGS → Jcc |
Predication | select → CMOV | Real control flow |
Extra ops | SETcc , AND , CMOV , movzbl | Usually none |
Register pressure | Higher (carry bools) | Lower |
Micro-ops | More | Fewer |
Practical Guidelines (for writing IR-friendly C++)
Prefer
if (...) { ... } else { ... }
over ternaryselect
when the condition is reused.Avoid building long boolean pipelines you’ll consume multiple times.
Let branches carry the condition so the backend can map to
Jcc
and reuse EFLAGS.If you truly need predication, try to keep compares close to their use to avoid re-tests.
Summary
ISelDAGToDAG performs pattern-driven matching on SelectionDAG
TableGen patterns describe mappings from DAG to machine instrs
Matcher logic uses stacks, backtracking, predicates, and emits
Chain/glue nodes preserve correctness in side-effect ordering
Complex patterns can be matched and morphed in one phase
Excellent developers with hardware and compiler knowledge can be aware of the instruction selection to generate more performant code
Subscribe to my newsletter
Read articles from Yuanbo Li directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by
