SW/HW Co-Simulation Stack Design

Shuran Xu

1. Background & Motivation

Modern RISC-V SoC development often demands parallel hardware and firmware progress. Waiting for RTL completion or silicon fabrication delays firmware bring-up and can mask integration bugs until late in the project. Traditional RTL-only simulations, while accurate, are painfully slow for running substantial firmware workloads.

This project bridges that gap: by combining a high-speed software Instruction Set Simulator (ISS) with a cycle-accurate RTL testbench, we can validate early firmware against evolving hardware models. The design targets 64-bit RISC-V systems and has been validated in QuestaSim. It is aimed at scenarios where the software must interact with memory-mapped peripherals in a realistic way, without requiring a full hardware prototype.

In short, this stack accelerates pre-silicon software/hardware co-verification by giving firmware engineers realistic hardware responses and hardware engineers realistic software traffic.


2. Project Overview

This repository implements a SW/HW Co-Simulation environment for RISC-V that couples:

  • Unicorn Engine (RISC-V 64-bit ISS) for executing ELF binaries at near-native speed.

  • SystemVerilog Testbench running in QuestaSim for peripheral and bus cycle simulation.

  • DPI-C Bridge Layer that connects the ISS process with the HDL simulation process.

The test environment allows:

  • Loading arbitrary RISC-V ELF binaries into the ISS memory space.

  • Forwarding memory-mapped I/O (MMIO) accesses from ISS to RTL peripherals.

  • Synchronizing ISS instruction execution with RTL simulation time.

  • Capturing and analyzing CPU ↔ peripheral transactions.

Unlike a purely virtual platform, this approach gives us RTL-accurate behavior for selected hardware blocks while still executing real firmware at usable speeds.
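
To make the MMIO forwarding idea concrete, here is a minimal sketch in C of how an ISS-side layer might decide whether an access stays in simulated RAM or is forwarded to the RTL peripheral. The GPIO_BASE and GPIO_SIZE values, the access_target_t type, and the function names are assumptions made for illustration; the repository's actual address map and code structure may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical address map: these values are assumptions for illustration,
 * not taken from the repository. */
#define GPIO_BASE 0x10000000ULL
#define GPIO_SIZE 0x1000ULL

typedef enum { TARGET_RAM, TARGET_GPIO_RTL } access_target_t;

/* Return true when [addr, addr + len) lies fully inside the GPIO window.
 * The comparison is written to avoid overflow near the top of the
 * 64-bit address space. */
static bool in_gpio_window(uint64_t addr, uint32_t len)
{
    assert(len == 1 || len == 2 || len == 4 || len == 8);
    return addr >= GPIO_BASE && (addr - GPIO_BASE) <= (GPIO_SIZE - len);
}

/* Classify an access: GPIO accesses go to the RTL side, everything else
 * is served from the ISS's own simulated memory. */
access_target_t route_access(uint64_t addr, uint32_t len)
{
    return in_gpio_window(addr, len) ? TARGET_GPIO_RTL : TARGET_RAM;
}
```

With these assumed values, a 4-byte access at 0x10000000 would be routed to the RTL side, while an access at 0x80000000 would be served from ISS memory.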


3. Architecture & Module Design

The following diagram illustrates the project architecture:

As shown, the SystemVerilog testbench orchestrates the run, while the Unicorn ISS loads the firmware into RAM, registers MMIO hooks for the GPIO address range, and executes the firmware instruction by instruction. When the firmware touches GPIO via MMIO, the ISS forwards the access parameters (addr, data, and len) through DPI-C to the AXI-Lite driver. The driver lowers these transaction-level (TLM) fields into standard AXI-Lite handshakes and drives the GPIO RTL. On completion, the GPIO returns read data or a write response along the same path to the ISS, so every MMIO access sees hardware-accurate behavior. The responsibilities of each component are as follows:

  1. Firmware Application

    • Writes data 0xA5A5A5A5 and 0xB6B6B6B6 to a memory-mapped GPIO address.

    • Reads data back from that same memory-mapped GPIO address.

    • Exits using the ebreak instruction.

  2. RISC-V Unicorn ISS Layer

    • Loads and executes firmware binaries (riscv64 ELF).

    • Provides hooks for instruction execution and MMIO access trapping.

    • Maintains its own simulated memory space, with GPIO module address ranges marked as MMIO.

  3. DPI-C Bridge

    • Implements the glue between C-based ISS code and SystemVerilog testbench.

    • Uses import "DPI-C" functions to pass transactions, memory reads/writes, and interrupt events.

    • Encapsulates all ISS–RTL data exchanges so that the ISS code is not tightly coupled to HDL code.

  4. SystemVerilog Testbench in QuestaSim

    • Contains bus functional models (BFMs) and signal-driving logic to emulate the memory system and peripherals.

    • Implements clock and reset generation, transaction routing, and signal assertions.

    • Captures simulation waveforms for debugging in QuestaSim’s GUI.

  5. Target Peripheral (RTL)

    • A simple GPIO module that communicates with the Unicorn ISS over the AXI-Lite protocol.

    • Provides realistic read/write latency and responses.

    • Allows functional verification of hardware/firmware interaction via the ISS layer.

This modular design ensures that we can replace the ISS with another RISC-V model or extend the peripheral set without overhauling the simulation infrastructure.
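
As an illustration of the firmware application described in item 1, the following is a minimal sketch in C, not the repository's actual firmware. The register address GPIO_DATA_ADDR and the names gpio_write_read and firmware_main are hypothetical, and the sketch assumes a C library that provides assert.h (for example newlib on a bare-metal target).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical GPIO data register address; the real address map may differ. */
#define GPIO_DATA_ADDR 0x10000000UL

/* Write the two test patterns, then read the register back.
 * The register pointer is a parameter so the routine can also be exercised
 * on a host machine against an ordinary variable. */
uint32_t gpio_write_read(volatile uint32_t *gpio_data)
{
    assert(gpio_data != 0);
    *gpio_data = 0xA5A5A5A5u;  /* first MMIO write  */
    *gpio_data = 0xB6B6B6B6u;  /* second MMIO write */
    return *gpio_data;         /* MMIO read-back    */
}

#if defined(__riscv)
/* Bare-metal entry point. The name is an assumption and depends on the
 * startup code and linker script in use. */
void firmware_main(void)
{
    (void)gpio_write_read((volatile uint32_t *)GPIO_DATA_ADDR);
    __asm__ volatile("ebreak");  /* tell the ISS that the program is done */
}
#endif
```

The volatile qualifier matters here: without it, the compiler would be free to drop the first store or fold the read-back into a constant, and the ISS would never see all three MMIO accesses.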


4. Simulation Workflow

The following sequence diagram shows the simulation workflow of the design:

The SV testbench sets the pace, ticking the clock and providing the stage for each step. The Unicorn ISS starts by placing the firmware into RAM and begins executing instructions. When it encounters a GPIO access, it pauses mid-run and hands the request—complete with address, data, and length—to the DPI‑C bridge. The bridge acts as a translator, passing the request to the AXI‑Lite driver, which then performs the proper handshake dance with the GPIO RTL.

Once the RTL finishes the job—whether returning data or confirming a write—the result travels back along the same path to the ISS. The ISS resumes from where it left off, and the process continues. This steady back‑and‑forth ensures every memory‑mapped I/O in the firmware is exercised against the real hardware logic, making the simulation both realistic and precise.
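
The round trip described above can be sketched in plain C. This is a simplified model under stated assumptions, not the project's bridge code: the mmio_txn_t struct, bridge_attach, bridge_mmio_access, and fake_gpio_xfer are hypothetical names, and a function pointer stands in for the SystemVerilog AXI-Lite driver so the example runs without a simulator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Transaction fields named in the text: address, data, and length. */
typedef struct {
    uint64_t addr;
    uint64_t data;      /* write data in, read data out */
    uint32_t len;       /* access size in bytes */
    bool     is_write;
} mmio_txn_t;

/* Stand-in for the SystemVerilog AXI-Lite driver. In a real DPI-C setup
 * this would be a routine exposed by the testbench; here it is a plain
 * function pointer. A return value of 0 models an OKAY response. */
typedef int (*axi_lite_xfer_fn)(mmio_txn_t *txn);

static axi_lite_xfer_fn g_driver;

void bridge_attach(axi_lite_xfer_fn driver) { g_driver = driver; }

/* Called from the ISS MMIO hook: blocks until the "RTL" side has answered,
 * then hands the result back so the ISS can resume execution. */
int bridge_mmio_access(mmio_txn_t *txn)
{
    assert(g_driver != 0 && txn != 0);
    assert(txn->len == 4);  /* this sketch only models 32-bit accesses */
    return g_driver(txn);
}

/* Toy GPIO model standing in for the RTL: a single 32-bit register. */
static uint32_t g_gpio_reg;

int fake_gpio_xfer(mmio_txn_t *txn)
{
    if (txn->is_write)
        g_gpio_reg = (uint32_t)txn->data;   /* write channel + response */
    else
        txn->data = g_gpio_reg;             /* read channel returns data */
    return 0;
}
```

Keeping the driver behind a single function pointer mirrors the decoupling goal stated for the DPI-C bridge: the ISS side only sees a blocking call that takes a transaction and returns a response, regardless of what sits behind it.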

The following screenshot shows the simulation result when running with QuestaSim:

Feel free to explore the repository to build the design and tailor it to your needs. Detailed build and run instructions are provided, along with ready-to-use scripts.


5. Key Takeaways & Next Steps

This RISC-V-specific SW/HW co-simulation stack demonstrates that fast CPU emulation and cycle-accurate RTL simulation can be combined in a single test environment using DPI-C. This brings the following benefits:

  • Early firmware bring-up without waiting for FPGA or silicon.

  • Cycle-accurate hardware behavior for selected components.

  • Modular structure — ISS, DPI layer, and peripherals are independently replaceable.

Potential next steps:

  • Extend the ISS model to support privileged-mode features (e.g., custom CSRs).

  • Add more peripherals and DMA engines to broaden the test coverage.

  • Implement timing models for more accurate cycle estimation.


Written by

Shuran Xu

I am currently a senior software engineer at Microchip working on embedded applications in HLS C++. In my free time I love building a wide variety of side projects, ranging from RTL-level modular design to simple games in C++. I also enjoy writing technical blogs to share my knowledge and insights with everyone interested in the tech world.