SW/HW CoSim That Sticks: Three Reusable, Practical Patterns


SW/HW Co-Simulation—What It’s For and How We Do It
Software/hardware co-simulation links executable software to a hardware model to verify behavior before silicon or FPGA bring-up. Teams use it at two levels: unit-level (e.g., HLS IP correctness and I/O contracts) and system-level (firmware exercising resets, interrupts, backpressure, and peripheral models). Methodologies span a spectrum of control and productivity:
File-based (C/C++ vectors + vendor TB): functional checks with minimal setup; timing/handshakes live in the RTL testbench.
Verilator (C++): cycle-accurate, per-cycle control (`eval()`); the fastest loops for fuzzing, CI, and performance studies.
cocotb + Verilator (Python): cycle-accurate with Python ergonomics (async/await, easy randomization and analysis).
In this blog I will dive deep into the three flows above, using my open-source project `cosim_demo` as the demonstration, so you can pick the right co-sim lane for your project.
CoSim Without Tears: Picking the Right Flow for Your Project
If you build chips (or the stuff that talks to them), you eventually bump into software/hardware co-simulation. It shows up everywhere: pre-silicon bring-up, FPGA prototyping, “does this driver actually poke the right bits?” moments. But how you co-sim depends on what you’re trying to achieve:
HLS/FPGA prototyping: treat co-sim like a unit test harness for IP blocks. You’re proving the math and I/O early, fast, and often.
SoC firmware work: think system level. You’re not only checking the firmware; you’re validating that the modeled peripherals behave like the real ones—resets, interrupts, ready/valid, the whole dance.
The two dials: control and abstraction
Most flows fall along two axes: how much cycle-level control you need, and which language you want to write tests in.
1) File-based co-sim (C/C++ + vendor sim TB)
When you just want to know “does it work?” without micromanaging clocks, file-based is the lowest-friction path. Your C/C++ code pushes vectors and checks results; a prewritten RTL testbench handles timing and handshakes. It’s simple, portable, and great for golden-vector testing. The trade-off: backpressure or dynamic handshakes can feel scripted and clunky.
2) Verilator co-sim (pure C++)
Need to drive the design every cycle, tick clocks yourself, and profile performance? Verilator is your power tool. You link a Verilated model into C++, call `eval()` per cycle, and control ready/valid like you own the bus—because you do. It’s fast (no file I/O) and CI-friendly for big regressions and fuzzing.
3) cocotb + Verilator (Python)
Prefer writing tests in Python, but still want cycle accuracy? cocotb lets you `await` edges and randomize traffic with a couple of lines. You keep Verilator’s speed and tracing while tapping into Python’s ecosystem (hypothesis, numpy, quick data munging). Overhead exists, but for unit/regression scope it’s usually a non-issue.
The following table shows detailed tradeoffs among the three Co-Simulation approaches:
| Aspect | File-based CoSim | Verilator-based CoSim (C++) | cocotb + Verilator (Python) |
| --- | --- | --- | --- |
| Integration boundary | Files between C and Verilog TB | In-process C++ API ⇄ Verilated model | Python coroutines ⇄ Verilator via cocotb FFI |
| Authoring language | C/C++ for vector gen/check; Verilog TB for I/O | C++ testbench + Verilog DUT | Python testbench + Verilog DUT |
| Timing fidelity | Transaction/time-stepped; cycle sync is manual | Cycle-accurate (`eval()` per cycle) | Cycle-accurate (per-cycle drives/awaits) |
| Backpressure / handshakes | Awkward (pre-scripted) | Natural (toggle ready/valid each cycle) | Natural (await on `ready`, randomize `valid`) |
| Performance | Slower (disk I/O, process barriers) | Fast (no file I/O; multithreaded Verilator) | Fast; Python overhead acceptable for unit/regression scope |
| Tooling | Any vendor sim (Questa/ModelSim, etc.) | Verilator (OSS), C++17 toolchain | Verilator (OSS), Python 3.8+, cocotb |
| Waveforms | Vendor formats (WLF/VCD); viewer varies | FST with GTKWave | VCD/FST via Verilator tracing; GTKWave |
| Coverage | Vendor coverage tools | Verilator `--coverage` + `verilator_coverage` | Verilator coverage (enabled via extra args) + same tooling |
| Best for | Golden vectors, portability across simulators | High-iteration fuzzing, CI perf runs, native C/C++ | High-level tests, quick protos, Python ecosystem |
To make this concrete, the rest of the post walks through my open-source cosim_demo project. It drives the same tiny DUT across three harnesses:
(1) file-based C/C++ with a vendor-style RTL testbench
(2) Verilator with an in-process C++ testbench
(3) cocotb + Verilator in Python
That way, the only thing that changes is the methodology, not the design. The repo keeps setup friction low (Make targets, sample vectors, wave dumps), letting you compare timing control, backpressure handling, runtime, waveforms, and coverage side by side. We’ll use it to build each flow, run a few representative tests, peek at traces, and note where each approach shines.
What this repo focuses on
`cosim_demo` distills three patterns that cover 90% of practical needs while staying tiny and copy-pasteable:
File-based CoSim
Verilator-based CoSim (C++)
Verilator + Python cocotb (Make-based)
All three wrap the same simple DUT—a ready/valid adder—and emphasize scoreboard checking, directed + randomized traffic, and artifacts you can keep (waves, coverage).
The Adder DUT
The DUT, a simple ready/valid adder, owns a two-deep output queue (main buffer + spill) that sustains one result per cycle when `out_ready` is high and absorbs one cycle of backpressure without stalling inputs. It computes `in_a + in_b` in a single cycle, prioritizes draining the spill buffer, and asserts `in_ready` whenever the spill slot is free. An active-low reset (`rst_n`) clears both buffers and valid flags.
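To make that contract concrete before we touch any harness, here is a minimal, untimed C++ reference model of the queueing behavior. It is a sketch of my own, not code from the repo; the class and method names are invented, and it assumes the spill slot holds the older entry (which is why the output path drains spill first and `in_ready` tracks the spill slot):

```cpp
#include <cstdint>
#include <optional>

// Untimed reference model of the adder's two-deep output queue.
// Class/method names are illustrative, not taken from the repo.
class AdderRefModel {
public:
    // in_ready mirrors the DUT: accept input whenever the spill slot is free.
    bool in_ready() const { return !spill_.has_value(); }

    // One input handshake: compute in_a + in_b and enqueue the sum.
    // If the main slot is occupied, its (older) entry shifts into spill.
    bool push(uint32_t a, uint32_t b) {
        if (!in_ready()) return false;   // one cycle of backpressure already absorbed
        if (main_) { spill_ = main_; main_.reset(); }
        main_ = a + b;                   // single-cycle add
        return true;
    }

    // One output handshake (out_ready high): drain the spill slot first,
    // since it holds the older result; fall back to the main slot.
    std::optional<uint32_t> pop() {
        if (spill_) { auto v = spill_; spill_.reset(); return v; }
        auto v = main_;
        main_.reset();
        return v;
    }

private:
    std::optional<uint32_t> main_, spill_;  // two-deep output queue
};
```

Any of the three harnesses can use a model like this as the golden side of a scoreboard: push on every accepted input handshake, pop on every completed output handshake, and compare.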
The following flow diagram shows the internal adder architecture:
The examples in this repo
Example #1: file-based/simple_adder_rv
This example demonstrates file-based CoSim with a clean file contract: software emits inputs, the Verilog testbench reads them and drives the DUT, and outputs are captured and compared. It is designed for deterministic goldens and portability across simulators.
The C++ host testbench invokes QuestaSim via `std::system()`, using a fixed command string to run the simulation to completion (`run -all; quit -f`). Afterward, it validates results by comparing `outputs.txt` with the software-computed golden values. The `main()` exit code reports the outcome: `0` for pass, `5` for fail.
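For orientation, the host logic condenses to roughly the sketch below. It is illustrative, not the repo’s code: the file names, the vector count, and the vsim target `work.tb_adder` are placeholder assumptions; only the overall shape (emit vectors, run the simulator in batch mode, diff against goldens, exit 0 or 5) follows the description above.

```cpp
#include <cstdint>
#include <cstdlib>
#include <fstream>
#include <vector>

int main() {
    // 1) Software side: emit stimulus and compute the golden results.
    //    File names and vector count are illustrative.
    std::vector<uint32_t> golden;
    {
        std::ofstream in("inputs.txt");
        for (uint32_t i = 0; i < 100; ++i) {
            const uint32_t a = i, b = 2 * i + 1;
            in << a << ' ' << b << '\n';
            golden.push_back(a + b);
        }
    }

    // 2) Run QuestaSim in batch mode to completion; the RTL testbench reads
    //    inputs.txt, drives the DUT, and writes outputs.txt.
    //    "work.tb_adder" is a placeholder, not the repo's actual unit name.
    if (std::system("vsim -c -do \"run -all; quit -f\" work.tb_adder") != 0)
        return 5;

    // 3) Compare the DUT's outputs against the golden values.
    std::ifstream out("outputs.txt");
    uint32_t got = 0;
    for (const uint32_t expected : golden)
        if (!(out >> got) || got != expected) return 5;  // fail

    return 0;  // pass
}
```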
The example end-to-end CoSim flow is shown below:
You can build and run this example with `make run`; the simulation should produce the following output:
Example #2: verilator-based/rv_adder_example
This example shows a simple yet portable Verilator-based co-simulation that verifies the ready/valid adder (i.e., `adder_rv_simple`) using a C++ testbench. The test drives both directed vectors and random streaming with backpressure, checks results with a scoreboard, dumps an FST waveform, and writes coverage.
The testbench `main()` runs a directed smoke test first with no backpressure (it forces `top->out_ready = 1`), then flushes any leftover outputs to clear the adder’s internal buffers. Next it switches to randomized streaming using `std::mt19937_64 rng(1)`, randomizing `a`, `b`, and `top->in_valid`/`top->out_ready` to exercise backpressure cases. Stimulus is driven on the negedge; after `top->eval()` on the posedge, it performs FST tracing and checks results. A `std::queue` holds golden values, and a mismatch counter accumulates errors—its total determines the `main()` return code.
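The per-cycle loop condenses to roughly the following sketch. Treat it as an approximation of the flow described above rather than the repo’s exact code: the model header name, port widths, iteration count, and reset length are assumptions, and the directed smoke test is omitted for brevity.

```cpp
#include <cstdint>
#include <memory>
#include <queue>
#include <random>

#include <verilated.h>
#include <verilated_cov.h>
#include <verilated_fst_c.h>
#include "Vadder_rv_simple.h"  // header name assumed from the DUT's module name

int main(int argc, char** argv) {
    Verilated::commandArgs(argc, argv);
    Verilated::traceEverOn(true);  // model assumed built with --trace-fst

    auto top = std::make_unique<Vadder_rv_simple>();
    VerilatedFstC tfp;
    top->trace(&tfp, 99);
    tfp.open("wave.fst");

    std::mt19937_64 rng(1);       // fixed seed => reproducible random streams
    std::queue<uint32_t> golden;  // scoreboard of expected sums, in order
    int mismatches = 0;
    uint64_t t = 0;

    // Apply active-low reset for two cycles (directed smoke test omitted).
    top->rst_n = 0;
    for (int i = 0; i < 2; ++i) {
        top->clk = 0; top->eval(); tfp.dump(t++);
        top->clk = 1; top->eval(); tfp.dump(t++);
    }
    top->rst_n = 1;

    for (int cycle = 0; cycle < 1000; ++cycle) {
        // Negedge: drive fresh stimulus so it is stable before the posedge.
        top->clk = 0;
        top->in_valid  = rng() & 1;
        top->out_ready = rng() & 1;
        top->in_a = static_cast<uint32_t>(rng());
        top->in_b = static_cast<uint32_t>(rng());
        top->eval();
        tfp.dump(t++);

        // Handshakes that will complete at the coming posedge.
        const bool out_fire = top->out_valid && top->out_ready;
        const bool in_fire  = top->in_valid  && top->in_ready;
        if (out_fire) {  // check the oldest expected sum
            if (golden.empty() || top->out_sum != golden.front()) ++mismatches;
            if (!golden.empty()) golden.pop();
        }
        if (in_fire) golden.push(top->in_a + top->in_b);

        // Posedge: the DUT commits state; trace the updated signals.
        top->clk = 1;
        top->eval();
        tfp.dump(t++);
    }

    tfp.close();
    VerilatedCov::write("coverage.dat");  // requires --coverage at Verilation time
    top->final();
    return mismatches;  // nonzero on any scoreboard mismatch
}
```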
The following diagram demonstrates the end-to-end simulation flow:
You can build and run this example with `make run`; the simulation should produce the following output:
As shown above, the screenshot confirms the expected artifacts and shows the coverage summary (e.g., `Total coverage ... 82.00%`) and the annotation-directory hint.
Example #3: verilator-based/cocotb_rv_adder
This is a Make-based test that drives the same ready/valid adder on the Verilator backend. It shows how to verify a SystemVerilog ready/valid adder (`adder_rv_simple.sv`) using a cocotb Python testbench on a Verilated C++ simulator. It also includes FST waveform dumping and Verilator coverage with post-run source annotation.
The cocotb test coroutine `test_adder_rv_simple()` mirrors Example #2—running a directed smoke test followed by randomized traffic—but differs in stimulus and clock handling. Specifically, inputs are applied right after the negedge via `await FallingEdge(dut.clk)`, and outputs are sampled in the simulator’s Observed stage using `await ReadOnly()`. This ensures cycle-accurate observation without race conditions. The following code snippet demonstrates the input driving and output sampling:
```python
# drive only on the falling edge
await FallingEdge(dut.clk)
dut.in_valid.value = 1
dut.in_a.value = a
dut.in_b.value = b

# observe only after the rising edge
await RisingEdge(dut.clk)
await ReadOnly()
# sample outputs here; no writes are allowed in the read-only phase
dut_sum = dut.out_sum.value
```
For cocotb–Verilator communication, VPI is the glue. Cocotb registers its callbacks into the Verilated C++ model via VPI, which is the standard way for external code to hook into the simulator’s event schedule and design hierarchy. In fact, cocotb talks to Verilator entirely through VPI.
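If you have only consumed VPI through cocotb, here is what the producer side looks like: a minimal, self-contained sketch of the standard registration pattern from IEEE 1800’s `vpi_user.h`. This illustrates the mechanism cocotb’s GPI layer automates; it is not cocotb’s actual code. The hierarchical path `top.clk` is hypothetical, and with Verilator the design must be Verilated with `--vpi` (and the signal made public) for the lookup to work.

```cpp
#include <cstdio>
#include <vpi_user.h>  // standard VPI header (shipped with Verilator and vendor sims)

// Callback fired once at the start of simulation: look up a signal by its
// hierarchical path and read its value. "top.clk" is a hypothetical path.
static PLI_INT32 on_start_of_sim(p_cb_data /*unused*/) {
    char path[] = "top.clk";
    vpiHandle sig = vpi_handle_by_name(path, nullptr);
    if (sig) {
        s_vpi_value val;
        val.format = vpiIntVal;
        vpi_get_value(sig, &val);
        std::printf("top.clk = %d at start of simulation\n", val.value.integer);
    }
    return 0;
}

// Registration routine: schedule the callback for cbStartOfSimulation.
static void register_callbacks() {
    s_cb_data cb{};
    cb.reason = cbStartOfSimulation;
    cb.cb_rtn = on_start_of_sim;
    vpi_register_cb(&cb);
}

// Simulators scan this null-terminated table at load time to find the
// library's entry points; cocotb hooks itself into the simulator the same way.
void (*vlog_startup_routines[])() = {register_callbacks, nullptr};
```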
And if you’re not familiar with simulation time stages, here’s a quick primer. Basically, at a given simulation time t, the kernel doesn’t execute everything at once. Instead, it runs through a fixed sequence of event regions (queues). Processes schedule work into these queues; the kernel drains them in order until no more work remains (this may take multiple delta cycles at the same time t). Only then can time advance to t+Δ. This ordering lets RTL and testbench code read and write signals without races. The following are the nine regions in order:
Preponed – Snapshot values before updates (e.g., immediate assertion sampling on edges).
Active – Normal execution; blocking assigns update nets/vars; NBAs are scheduled here.
Inactive – Handles `#0` delays and events deferred within the same time slot.
NBA (Non-Blocking Assign) – Commits all scheduled NBAs “simultaneously” (sequential logic).
Re-NBA – A second NBA commit pass (for NBAs scheduled by later regions).
Observed – Read-only; assertions/coverage/sample points see final values for this time.
Reactive – Testbench/program/clocking-block drives; meant to avoid racing with the DUT.
Re-Active – Any re-evaluation triggered by reactive drives.
Postponed – Last read-only look; `$monitor`, VCD/FST dumps, PLI/VPI callbacks.
The following diagram demonstrates the end-to-end simulation flow:
You can build and run this example with `make run`; the simulation should produce the following output:
The above screenshot confirms the expected artifacts and shows the test summary, including the test result and simulation time.
Summary
SW/HW co-simulation meaningfully speeds pre-silicon work—both chip design and FPGA prototyping—by letting you verify at the abstraction and control level that best fits the task. cosim_demo isn’t a heavyweight framework; it’s a set of small, production-ready patterns you can drop into real codebases. Use file-based co-sim for portable, deterministic golden checks; choose Verilator + C++ when you need cycle-accurate control, high speed, and coverage; reach for cocotb + Verilator when Python’s ergonomics make test authoring faster. Same DUT, three viewpoints—pick what fits today, keep the others handy as your verification needs evolve.
Written by Shuran Xu
I am currently a senior software engineer at Microchip working on embedded applications in HLS C++. In my free time I love building a wide variety of side projects, ranging from RTL-level modular design to simple games in C++. I also enjoy writing technical blogs to share my knowledge and insights with everyone interested in the tech world.