Introduction to ISAs and SAIL

Hello, I’m Siddharth, due to my interest for low level stuff I’ve been researching a lot about the SAIL to CGEN project and well it has led me here. I like documenting stuff so here is a quick introduction of ISAs and SAIL for anyone contributing to the project or looking for knowledge around this in general.

Why Instruction Set Architectures Matter

An Instruction Set Architecture (ISA) is the interface between a CPU's hardware and software, defining how instructions are formatted, which registers are used, and how memory operations behave.Imagine an Instruction Set Architecture (ISA) as the secret handshake between a CPU and the software it runs which tells the hardware and software how to communicate exactly. It acts as the CPU’s “programmer interface”. A well-designed ISA ensures efficient execution, compatibility across hardware generations, and portability of software.

Extending an Instruction Set Architecture (ISA) by adding new instructions demands precise specification to ensure correct behavior of compilers and emulators. But extending an ISA gets tricky, and it’s not just a matter of tossing in a new instruction. Without clear considerations and instruction formats, register usage, and memory operations, compilers may generate incompatible or erroneous machine code, leading to software failures. Similarly, emulators rely on accurate ISA specifications to faithfully replicate hardware functionality, preventing misinterpretation of instructions that could disrupt program execution or compatibility across platforms.

Now that we somewhat get what ISAs are and why we might need them, let’s move onto SAIL.

Introduction to SAIL

SAIL (Safe And Intuitive Architecture Language) is designed to model the behavior of ISAs, which are the rules defining how a processor executes instructions (e.g., x86, ARM, RISC-V). It’s a formal language, meaning it’s rigorous enough to be used for mathematical proofs, simulations, or generating tools like compilers and emulators. Unlike ad-hoc pseudocode, SAIL provides a structured, unambiguous way to describe instruction semantics, ensuring that everyone—humans and machines—interprets the spec the same way.

But how exactly? Well, imagine you’re defining an ADD instruction for a simple RISC-V processor. In SAIL, you might write something like:

function add(rd: reg, rs1: reg, rs2: reg) = {
  let val1 = read_reg(rs1);
  let val2 = read_reg(rs2);
  let result = val1 + val2;
  set_flag(Zero, result == 0);
  write_reg(rd, result);
}

This takes two register values (rs1, rs2), add them, and store the result in register rd. It’s clean, typed, and explicit—no room for misinterpretation. Also I find it really close to general programming syntaxes, so yay on that too!

Why SAIL though? It’s just better, in terms of Precision, Type-Checking, Modularity, tool support and what not! Let’s take an example here, Pseudocode vs Sail for a LOAD instruction —

//PseudoCode
LOAD rd, [rs1 + imm]
  rd <- Mem[rs1 + imm]

// SAIL
function load(rd: reg, rs1: reg, imm: bits(12)) = {
  let addr = read_reg(rs1) + sign_extend(imm);
  if is_aligned(addr, 4) then {
    let value = mem_read(addr, 4);
    write_reg(rd, value);
  } else {
    raise_exception(UnalignedAccess);
  }
}

Difference is apparant - Vague on size, alignment, or errors vs clear, typed, and verifiable.

Why even go SAILING?

SAIL’s structured, type-checked approach to defining ISA semantics unlocks a range of powerful capabilities, transforming a single specification into multiple practical outputs. Here’s what it can do:

1. Automatic Emulator Generation

SAIL specifications can be compiled into executable emulators in languages like C or OCaml. This means you can take a SAIL model of, say, a RISC-V ADD instruction and generate a simulator that runs it. Let’s the previous example again -

function add(rd: reg, rs1: reg, rs2: reg) = {
  let val1 = read_reg(rs1);
  let val2 = read_reg(rs2);
  let result = val1 + val2;
  write_reg(rd, result);
}

The SAIL compiler can translate this into C code that simulates the instruction, letting you test or debug the ISA without manual coding.

2. Better Documentation

SAIL models can be processed to produce clean, human-readable documentation in formats like LaTeX or AsciiDoc. This is crucial for spec adoption, as it turns formal definitions into accessible manuals. For instance, our lovely ADD function could be rendered as a LaTeX table or AsciiDoc section describing its inputs, outputs, and behavior, ready for inclusion in an ISA reference manual.

3. Formal Verification Interfaces

SAIL integrates with theorem provers like Coq, HOL4, and Isabelle, enabling formal verification of ISA properties. You can prove that an instruction doesn’t violate memory consistency or that a pipeline handles hazards correctly. For example, a SAIL model of a LOAD instruction can be fed into Coq to verify alignment checks:

function load(rd: reg, rs1: reg, imm: bits(12)) = {
  let addr = read_reg(rs1) + sign_extend(imm);
  if is_aligned(addr, 4) then {
    let value = mem_read(addr, 4);
    write_reg(rd, value);
  } else {
    raise_exception(UnalignedAccess);
  }
}

This can be translated into a Coq model to prove the instruction behaves correctly under all inputs.

4. Symbolic Test Generators

Tools like Isla and Islaris can use SAIL models to generate symbolic tests, exploring edge cases like unaligned accesses or overflow conditions. What this means is we get test cases to check that our instruction can handle any edge cases and erroneous stuff properly.

SAIL’s ability to generate emulators, documentation, verification models, and tests from a single spec streamlines ISA development. It’s like writing one recipe that can be cooked, documented, verified for safety, and tested for flavor—all automatically. This makes SAIL indispensable for modern architecture design.

What’s Next?

Next Up: In our next blog, we’ve explored CGEN, a tool for generating CPU simulators and compiler backends, and how it integrates with compilers to bridge ISA specs and code generation. It’s a deep dive into its workflow, with examples showing how CGEN complements SAIL for end-to-end architecture development. Here’s the link to the next part -

A DEEP DIVE INTO GCC AND CGEN

Note that this blog series is my own research and there might be a few corrections to be done so don’t depend blindly on this and follow your own path taking this as a helping hand. For maintainers or expert readers, kindly reach out if there is anything that needs to be updated.

You can reach out to me on X - HERE. If you liked the work, consider dropping a follow @https://x.com/Sidd190b

Thanks for reading! Keep learning!

ISAs and SAIL : A quick introduction ( I )