Enhancing Retro Games - REX
Hi, I'm jsd1982. I've been playing around in the retro-gaming enhancement scene for the last 5 years or so. I'm primarily invested in making technologies to enable applications to enhance SNES games with unique experiences.
REX
REX, short for "Retro Extensions", is a project containing multiple subsystems that solve problems dealing with enhancing retro games. The goal is to level the playing field between emulators and hardware consoles by offering applications a single API that always works for both cases.
The current implementation of REX is being embedded into a custom fork of snes9x.
Before we dig into REX, we need to get some history and context out of the way first.
ALttPO
I suppose my most notable personal accomplishment to date has been ALttPO, A Link To The Past Online. Most of my recent work is tangentially related to that project.
ALttPO allows multiple players to play a single ALTTP game with each other over the public Internet. The main feature, and what the project started with, is "sprite sync" which enables players to see other players in the same area of the world when they happen to cross paths. We can see an example of that in the screenshot above where 10 players can all see each other in Link's house, each sporting a unique color and customized player sprite.
The remaining features of ALttPO deal with world-state synchronization among players so that teams can share progression, items, and even the current state of a room such as if pots are picked up, trap doors triggered, doors opened or unlocked, etc.
ALttPO works by customizing bsnes with a scripting language called AngelScript. The main logic is implemented in AngelScript. This is powered by C++ code inside bsnes that provides the scripting language with useful functions dealing with networking, file I/O, and UI window management.
To show the extra players on screen, the emulated PPU in bsnes was enhanced to allow AngelScript to render extra dynamic sprites into the frame just as if they were native SNES PPU sprites but using their own VRAM-like memory. The problem isn't a shortage of hardware sprite slots; it's that there isn't enough VRAM to store the tile graphics for all the players' sprites simultaneously. This restriction was lifted with the custom sprite engine.
O2, a.k.a. ALttPO v2
After ALttPO (v1) reached a stable-enough state, there came many requests for hardware console support. And so began the O2 project.
I looked into the SD2SNES / FX Pak Pro around that time since it was already a popular flash cart. I was pleasantly surprised by what I found. The main benefit of the FX Pak Pro for application developers is that it has a microcontroller with a USB port and simple protocol exposed to allow PC-side applications to read/write SNES memory. Even better - the microcontroller firmware is open source and can be modified, recompiled, and uploaded to the SD card.
As for O2's goal of console support, it would seem that the main "sprite sync" feature of ALttPO would be virtually impossible to implement on an unmodified SNES console primarily due to VRAM limitations. The best that could be done for hardware consoles is to offer only the ALttPO features dealing with game state synchronization and skip the sprite sync. This is what O2 does.
O2 uses the FX Pak Pro's USB protocol to read/write SNES memory but does so in a roundabout way due to some limitations we'll get into later. O2 can also connect to existing SNES emulators extended with network protocols to allow emulated memory access. Of note are RetroArch with its network-command feature, snes9x-rr with its embedded Lua 5.1 script host, and snes9x-nwa with its custom EmuNWA protocol.
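To give a feel for what emulator-side memory access looks like, here is a minimal sketch of the text framing used by RetroArch's network-command feature. It assumes the `READ_CORE_RAM` and `WRITE_CORE_RAM` commands with hexadecimal addresses and a reply that echoes the command name and address followed by hex bytes (or `-1` on failure). The helper names are hypothetical, no socket I/O is shown, and how an address is interpreted depends on the emulator core, so treat this as an illustration and not as how O2 actually talks to RetroArch.

```python
# Sketch only: builds and parses network-command text; sending it is left out.

def build_read_core_ram(addr: int, size: int) -> bytes:
    """Format a request to read `size` bytes starting at `addr`."""
    return f"READ_CORE_RAM {addr:x} {size}\n".encode("ascii")


def build_write_core_ram(addr: int, data: bytes) -> bytes:
    """Format a request to write `data` starting at `addr`."""
    payload = " ".join(f"{b:02x}" for b in data)
    return f"WRITE_CORE_RAM {addr:x} {payload}\n".encode("ascii")


def parse_read_core_ram(reply: bytes):
    """Return (address, data) from a reply, or raise ValueError."""
    parts = reply.decode("ascii").split()
    if len(parts) < 3 or parts[0] != "READ_CORE_RAM":
        raise ValueError("unexpected reply")
    if parts[2] == "-1":
        raise ValueError("emulator reported a failed read")
    addr = int(parts[1], 16)
    return addr, bytes(int(p, 16) for p in parts[2:])
```

Nothing in this framing limits which memory can be written, which is exactly the freedom that the hardware path cannot match.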
A thorny problem exists in how SNES memory is accessed between emulators and hardware consoles via the FX Pak Pro. Let's dig into it...
Emulator Memory Access
In SNES emulators, every memory access protocol and scripting language extension (listed above) gives applications an API with essentially unlimited access to all emulated SNES memory chips. While this certainly makes it easy for applications to read and write memory freely, it completely falls apart when those same applications want to support the hardware console via the FX Pak Pro's USB protocol using that same memory access API.
So what exactly falls apart here?
Well, for starters, applications cannot directly write to WRAM via the FX Pak Pro.
That's kind of a big deal since WRAM is the primary work RAM of the system and any modification you'd want to make to a game's running state probably wants to end up in WRAM. It's no fault of the FX Pak Pro itself; it's just a consequence of how the SNES works.
Writing to WRAM is trivial to do with emulators because there are no physical barriers within the emulator like there are with hardware that prevent such memory writes.
FX Pak Pro
Generally speaking, a cartridge plugged into the SNES only has read and write access to the memory chips in the cartridge itself: usually the game ROM chip and the traditionally battery-backed save SRAM chip which stores your saved game data.
Cartridges have no direct access to any of the memory chips internal to the SNES console such as WRAM (work RAM), VRAM (video RAM), CGRAM (palette), OAM (sprites), APURAM (audio), etc. These internal chips can only be written to by the SNES CPU. However, a cartridge with enough smarts in it can observe traffic that goes over the system bus. Such traffic includes reads and writes to most of these internal memory chips.
This is exactly what the FX Pak Pro does. It monitors the system bus activity via its FPGA and captures the writes made to the various internal SNES memory chips into a relatively large SRAM chip inside the cartridge.
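As a rough software model of that idea, the sketch below mirrors observed CPU writes into a shadow copy of WRAM. It relies on the standard SNES memory map (WRAM fills banks $7E-$7F, and its first 8 KiB is mirrored at $0000-$1FFF in banks $00-$3F and $80-$BF). The class and function names are hypothetical, and it deliberately ignores DMA and the $2180 WRAM data port, so it is far simpler than what an FPGA would really need to do.

```python
WRAM_SIZE = 0x20000  # 128 KiB of work RAM inside the console


def wram_offset(bus_addr: int):
    """Map a 24-bit CPU bus address to a WRAM offset, or None if not WRAM."""
    bank = (bus_addr >> 16) & 0xFF
    offs = bus_addr & 0xFFFF
    if bank in (0x7E, 0x7F):
        return ((bank - 0x7E) << 16) | offs
    # Banks $00-$3F and $80-$BF mirror the first 8 KiB of WRAM.
    if (bank & 0x7F) <= 0x3F and offs < 0x2000:
        return offs
    return None


class WramMirror:
    """Read-only shadow of WRAM built purely from observed bus writes."""

    def __init__(self):
        self.shadow = bytearray(WRAM_SIZE)

    def observe_write(self, bus_addr: int, value: int) -> None:
        off = wram_offset(bus_addr)
        if off is not None:
            self.shadow[off] = value & 0xFF
```

The mirror only ever learns about a byte when the console itself writes it, which is why changing the shadow copy has no way of flowing back into the real chip.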
A custom USB protocol (called usb2snes) is exposed by the FX Pak Pro firmware to allow applications to read from and write to this internal SRAM chip directly. (The design of this protocol is a little weird - that's a story for another post - but it gets the job done at least.)
Writing to this internal SRAM chip via the USB protocol can change the game ROM and save SRAM areas that are directly mapped into the system bus by the FPGA. However, writing to the other portions that capture the data in internal SNES memory chips (like WRAM) won't have any effect on the system because those areas are just read-only copies of observed data.
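An application can guard against this by checking where a write lands before sending it. The sketch below uses a small region table; the base addresses follow the layout commonly documented for usb2snes (ROM at 0x000000, save SRAM at 0xE00000, the WRAM capture at 0xF50000), but they and the sizes are assumptions here and should be checked against the firmware you are targeting. All names are hypothetical.

```python
from typing import NamedTuple, Optional


class Region(NamedTuple):
    name: str
    start: int
    size: int
    write_reaches_console: bool  # False for captured, read-only copies


# Assumed layout; verify against the firmware before relying on it.
REGIONS = [
    Region("ROM", 0x000000, 0xE00000, True),
    Region("SRAM", 0xE00000, 0x150000, True),
    Region("WRAM", 0xF50000, 0x020000, False),
]


def find_region(addr: int) -> Optional[Region]:
    """Return the region containing `addr`, or None if it is unmapped here."""
    for region in REGIONS:
        if region.start <= addr < region.start + region.size:
            return region
    return None


def write_takes_effect(addr: int) -> bool:
    """True only if a USB write at `addr` changes what the console sees."""
    region = find_region(addr)
    return region is not None and region.write_reaches_console
```

With this table, `write_takes_effect(0xF50010)` is false: the byte in the capture area changes, but the console's WRAM does not.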
The NMI EXE feature
To work around not being able to write directly to WRAM, the FX Pak Pro offers a feature I'm calling "NMI EXE" that lets applications dynamically inject a small amount of custom 65816 ASM code into the game on a per-frame basis.
This feature works by redirecting the NMI (non-maskable interrupt) vector from somewhere in the game ROM (where it normally is) to a normally-unused memory region at $2C00 mapped to a 512-byte BRAM buffer in the FPGA. NMI is triggered by the PPU when it reaches the end of the visible area of the frame to allow the game code to update the PPU state for the next frame to be drawn.
While not as flexible as direct write access to WRAM, this feature can allow an application to perform a few WRAM writes per frame or do other things via custom ASM code. The effective write bandwidth is very low and depends on how many cycles the whole NMI routine takes to complete before scanline 0 is reached. If NMI runs too long you'll start to see black bars at the top of the next frame.
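To make the size budget concrete, here is a minimal sketch that assembles a routine of this kind by hand. The 65816 opcodes are standard (REP/SEP to switch accumulator width, PHA/PLA to preserve A, LDA immediate, STA long, JML long). Everything else is an assumption for illustration: the function name is hypothetical, and ending with a JML to the game's original NMI handler is just one plausible way to hand control back, not necessarily what the FX Pak Pro firmware expects.

```python
NMI_EXE_BUFFER_SIZE = 512  # size of the BRAM buffer mentioned above


def assemble_wram_writes(writes, original_nmi: int) -> bytes:
    """Assemble code that stores each (address, byte) pair, then jumps on.

    `writes` is a list of (24-bit address, 8-bit value) pairs and
    `original_nmi` is the 24-bit address of the game's own NMI handler.
    """
    code = bytearray()
    code += bytes([0xC2, 0x20])  # REP #$20 ; 16-bit accumulator
    code += bytes([0x48])        # PHA      ; save the interrupted code's A
    code += bytes([0xE2, 0x20])  # SEP #$20 ; 8-bit accumulator
    for addr, value in writes:
        code += bytes([0xA9, value & 0xFF])                  # LDA #value
        code += bytes([0x8F]) + addr.to_bytes(3, "little")   # STA long addr
    code += bytes([0xC2, 0x20])  # REP #$20 ; back to 16-bit for the restore
    code += bytes([0x68])        # PLA      ; restore A
    code += bytes([0x5C]) + original_nmi.to_bytes(3, "little")  # JML handler
    if len(code) > NMI_EXE_BUFFER_SIZE:
        raise ValueError("routine does not fit in the NMI EXE buffer")
    return bytes(code)
```

Each write costs 6 bytes on top of 12 bytes of fixed overhead, so this particular layout tops out at 83 single-byte writes, and the cycle budget before scanline 0 will usually run out well before the buffer does.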
REX - NMI EXE
As part of the REX snes9x implementation, the NMI EXE feature of the FX Pak Pro is emulated directly in snes9x to make the emulator compatible with the FX Pak Pro. This helps applications target a single API that works for both emulator and console alike.
USB Latency
A problem for certain applications is the relatively high-latency access via USB 1.1 to the FX Pak Pro's internal SRAM chip that contains all the observed memory data. The microcontroller in the cart manages the USB hardware. The firmware configures the USB interface as a CDC-class device operating at "full speed", which is 12 Mbit/s. This is plenty of bandwidth, but the problem lies with the poll rate.
The poll rate for full-speed USB devices can be configured in the range of 1ms to 255ms. Even at the fastest poll rate of 1ms, USB data can be polled at most 16 times during one SNES game frame, which runs at ~60Hz and so lasts ~16.67ms. That doesn't offer much in the way of timing precision if an application wants to coordinate with a specific bit of game logic that may be writing to memory the application is interested in reading or vice versa.
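The arithmetic behind that figure is easy to check. The sketch below uses the ~60Hz frame rate from the text plus the 262 scanlines of an NTSC frame to show how coarse a single poll interval is; the function names are made up for this example.

```python
import math

FRAME_HZ = 60.0            # approximate SNES frame rate used in the text
SCANLINES_PER_FRAME = 262  # NTSC scanlines per frame


def polls_per_frame(poll_interval_ms: float, frame_hz: float = FRAME_HZ) -> int:
    """Whole USB polls that fit inside one game frame."""
    frame_ms = 1000.0 / frame_hz
    return math.floor(frame_ms / poll_interval_ms)


def scanlines_per_poll(poll_interval_ms: float, frame_hz: float = FRAME_HZ) -> float:
    """How many scanlines the console draws between two consecutive polls."""
    frame_ms = 1000.0 / frame_hz
    return SCANLINES_PER_FRAME * poll_interval_ms / frame_ms
```

At a 1ms interval `polls_per_frame(1)` gives 16, and `scanlines_per_poll(1)` comes out just under 16 scanlines, so an application polling over USB cannot reliably line itself up with any event shorter than that.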
To get around this latency problem, we can enhance the FX Pak Pro's firmware to do that complex waiting/reading/writing work on behalf of the application and send back the results to the application via USB. After all, the microcontroller in the cart has very low latency access to the internal SRAM chip and can easily spin in a tight loop issuing reads to wait for a specific condition to be met.
Now the problem is how an application would convey such complex logic to the firmware. That's where IOVM comes in.
IOVM
IOVM is a "virtual machine" (VM) that can only perform simple I/O tasks. It's not a general-purpose VM and has no branch instructions of any kind.
The design goals of IOVM are:
Express simple high-level memory operations: read chunk, write chunk, wait until a memory byte meets a condition.
Applications can easily construct IOVM programs dynamically.
Avoid any sort of state management as part of IOVM programs, i.e. no register reads/writes/copies, no stack management, no function calling.
IOVM programs proceed from start to finish in a linear order and do not branch around the program.
Early-abort of IOVM programs based on memory tests.
Have programs be compact in representation to not waste time/space.
Guarantee IOVM program termination at all costs.
Be fast to execute, especially in an embedded environment.
Be implemented in C.
Be embeddable in FX Pak Pro firmware.
Be embeddable in emulator software.
You can find the complete description of how IOVM works here.
With REX, an application can generate an IOVM program, upload it to the REX emulator or FX Pak Pro firmware via USB and execute the program. The application will receive notifications when a read operation completes so it can capture the data being read.
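To illustrate the general shape of such a program, here is a toy model in Python. It is not the real IOVM instruction set or encoding (which is implemented in C and described in the linked document); the operation names, the `max_polls` cap, and the `tick` callback are all invented for this sketch. It only shows the ideas from the goals list: a linear sequence of read, write, and wait operations, no branching, early abort, and guaranteed termination.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class Read:
    addr: int
    length: int


@dataclass
class Write:
    addr: int
    data: bytes


@dataclass
class WaitUntilEqual:
    addr: int
    value: int
    max_polls: int  # hard cap so the program always terminates


Op = Union[Read, Write, WaitUntilEqual]


def run(program: List[Op], mem: bytearray,
        on_read: Callable[[int, bytes], None],
        tick: Callable[[], None] = lambda: None) -> bool:
    """Run ops strictly in order; return False if a wait gave up."""
    for op in program:
        if isinstance(op, Read):
            # Notify the application with the bytes that were read.
            on_read(op.addr, bytes(mem[op.addr:op.addr + op.length]))
        elif isinstance(op, Write):
            mem[op.addr:op.addr + len(op.data)] = op.data
        else:
            for _ in range(op.max_polls):
                if mem[op.addr] == op.value:
                    break
                tick()  # stand-in for time passing on the console
            else:
                return False  # poll budget exhausted: abort the rest
    return True
```

An application would build a list such as `[WaitUntilEqual(0x10, 0x07, 1000), Read(0x20, 2)]`, hand it to the executor, and collect the data through `on_read`. Because the waiting happens next to the memory instead of across USB, the poll rate no longer limits timing precision.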