Code Generation in Brian2: From Math to Machine Code

Mrigesh Thakur

An exploration of the compilation pipeline that transforms high-level neural network descriptions into high-performance simulations

The Challenge: Making Python Fast Enough for Neuroscience

How does a neuroscience simulator convert mathematical equations written in Python into code that runs at near-C speeds? This question lies at the heart of Brian's architecture, where an elegant code generation system bridges the gap between user-friendly syntax and computational performance.

The Fundamental Challenge

Neural network simulations present a unique computational challenge. Consider a modest simulation with 10,000 neurons running for 1 second of biological time with 0.1ms timesteps—this requires 100 million computational updates. Pure Python execution would render such simulations impractical.
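To get a feel for the gap, here's a toy comparison between a pure Python update loop and its vectorized equivalent (illustrative only - absolute timings depend on your machine):

import numpy as np
import time

N = 10_000                      # neurons; a real 1 s run at 0.1 ms needs 10,000 steps
v = np.zeros(N)

t0 = time.time()
for _ in range(100):            # just 100 of those 10,000 steps, in pure Python
    for i in range(N):
        v[i] += 0.01 * (1.0 - v[i])
print(f'pure Python loop, 100 steps: {time.time() - t0:.2f} s')

t0 = time.time()
for _ in range(100):            # the same 100 steps, vectorized
    v += 0.01 * (1.0 - v)
print(f'vectorized numpy, 100 steps: {time.time() - t0:.2f} s')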

Brian's solution involves dynamic code generation: analyzing user equations at runtime and generating optimized, compiled code tailored to the specific mathematical operations required.

from brian2 import *

tau = 10*ms  # referenced by the equation (resolved from the namespace at run time)

# User input: high-level mathematical description
# (in a full model, I would likewise be defined, e.g. as a parameter)
G = NeuronGroup(1000, 'dv/dt = (I - v)/tau : volt')
G.v = -70*mV

# Brian's challenge: transform this into efficient execution

Architecture Overview: The Compilation Pipeline

Brian's code generation follows a systematic pipeline from abstract mathematical descriptions to executable machine code:

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   User Code     │───▶│  Code Generation │───▶│  Compiled Code  │
│                 │    │     Pipeline     │    │                 │
│ Mathematical    │    │                  │    │ Optimized       │
│ Equations       │    │ ┌──────────────┐ │    │ Machine Code    │
│                 │    │ │ Parsing      │ │    │                 │
└─────────────────┘    │ │ Translation  │ │    └─────────────────┘
                       │ │ Templates    │ │
                       │ │ Compilation  │ │
                       │ └──────────────┘ │
                       └──────────────────┘

Phase 1: Mathematical Parsing and Abstract Code Generation

Let's follow a differential equation through Brian's code generation pipeline. It's actually a pretty beautiful process:

Input Equation:

'dv/dt = (I - v)/tau : volt'

Abstract Code Generation:

Behind the scenes, Brian parses your equation and thinks: "Ah, this is a differential equation that I need to solve numerically." It converts this into abstract code:

┌─────────────────────────────────────────────────────────────┐
│                    Equation Processing                      │
├─────────────────────────────────────────────────────────────┤
│ Original: dv/dt = (I - v)/tau                               │
│     ↓                                                       │
│ Parsed: Differential equation with state variable 'v'       │
│     ↓                                                       │
│ Numerical: `v = v + dt * ((I - v)/tau)`                     │
│     ↓                                                       │
│ Abstract Code: Ready for language-specific translation      │
└─────────────────────────────────────────────────────────────┘
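The abstract code at the bottom of that box is just a forward-Euler step. As a hand-written sketch (plain NumPy, purely illustrative of what the generated code computes - not Brian's internals):

import numpy as np

def euler_step(v, I, tau, dt):
    # the abstract update: v = v + dt * ((I - v)/tau)
    return v + dt * ((I - v) / tau)

v = np.full(1000, -70e-3)       # membrane voltages, in volts
I = np.zeros(1000)              # input currents
v = euler_step(v, I, tau=10e-3, dt=0.1e-3)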

Phase 2: Memory Architecture and Variable Management

Brian creates numpy arrays to store your data. This is where the RuntimeDevice comes in - think of it as Brian's memory manager:

Runtime Device Memory Layout
┌─────────────────────────────────────────────────────────────┐
│                    RuntimeDevice.arrays                     │
├─────────────────────────────────────────────────────────────┤
│  Variable('v')  → np.array([-0.07, -0.07, ...], float64)    │
│  Variable('I')  → np.array([0.0, 0.0, ...], float64)        │
│  Variable('tau')→ np.array([0.01, 0.01, ...], float64)      │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│                   CodeObject.namespace                      │
├─────────────────────────────────────────────────────────────┤
│  '_array_neurongroup_v'   → [numpy array reference]         │
│  '_array_neurongroup_I'   → [numpy array reference]         │
│  '_array_neurongroup_tau' → [numpy array reference]         │
└─────────────────────────────────────────────────────────────┘
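You can peek at this mapping yourself. A minimal sketch using Brian 2's runtime device API (exact names may vary between versions):

from brian2 import *
from brian2.devices.device import get_device

tau = 10*ms
G = NeuronGroup(3, 'dv/dt = -v/tau : volt')
G.v = -70*mV

device = get_device()               # the active RuntimeDevice
var = G.variables['v']              # the Variable object behind 'v'
print(device.get_array_name(var))   # e.g. '_array_neurongroup_v'
print(device.get_value(var))        # the underlying numpy array: [-0.07 -0.07 -0.07]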

Phase 3: Template-Based Code Generation

Imagine you're a master chef who needs to prepare thousands of different meals very quickly. Instead of inventing each recipe from scratch every time, you keep a collection of proven recipe templates: one for pasta dishes, one for grilled meats, one for desserts, etc. When an order comes in, you quickly identify what type of meal it is and grab the appropriate template, then customize it with specific ingredients.

This is exactly what Brian does with code generation - but instead of recipes, it uses code templates.

Why Templates? The Smart Shortcut

When Brian encounters your neural network equations, it could theoretically generate completely custom code from scratch every time. But that would be like reinventing the wheel for every simulation. Instead, Brian's creators realized something clever: most neural computations follow predictable patterns.

Think about it:

  • Updating neuron voltages? That's always going to involve loops and mathematical operations

  • Detecting spikes? That's always going to involve checking thresholds

  • Transmitting spikes between neurons? That's always going to involve queue management

So instead of starting from zero, Brian keeps a library of battle-tested code templates, each optimized for specific types of neural operations.

The Template Library: Specialized Blueprints

Brian maintains several core templates, each serving as a blueprint for a different type of neural computation:

Brian's Template Collection
┌─────────────────────────────────────────────────────────────────┐
│                    Template Selection                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Your Code: 'dv/dt = (I-v)/tau'                                 │
│                     ↓                                           │
│              [Analysis Engine]                                  │
│                     ↓                                           │
│        "This is a differential equation                         │
│         that updates neuron states"                             │
│                     ↓                                           │
│          [Template Selection]                                   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│ State Update    │    │ Spike Detection │    │ Synaptic        │
│ Template        │    │ Template        │    │ Transmission    │
│                 │    │                 │    │ Template        │
│ "I know how to  │    │ "I know how to  │    │ "I know how to  │
│ efficiently     │    │ efficiently     │    │ efficiently     │
│ update voltage  │    │ detect when     │    │ pass spikes     │
│ equations"      │    │ neurons spike"  │    │ between neurons"│
│                 │    │                 │    │                 │
│ stateupdate.pyx │    │ threshold.pyx   │    │ synapses.pyx    │
└─────────────────┘    └─────────────────┘    └─────────────────┘

Here's what a Cython template looks like:

# State update template (a simplified version - the actual one is a bit scarier :)
def main(_namespace):
    # Template provides the structure:
    cdef double* [VARIABLE_NAME]_ptr = &_namespace['[ARRAY_NAME]'][0]

    for _idx in range([NUM_NEURONS]):
        # Your equation gets inserted here:
        [VARIABLE_NAME]_ptr[_idx] = [YOUR_MATH_EXPRESSION]

When you write 'dv/dt = (I-v)/tau', Brian fills in:

  • [VARIABLE_NAME] becomes v

  • [ARRAY_NAME] becomes _array_neurongroup_v

  • [NUM_NEURONS] becomes 1000 (from your NeuronGroup)

  • [YOUR_MATH_EXPRESSION] becomes v + dt * ((I - v)/tau)
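Brian's real templates are Jinja2 files, so the fill-in-the-blanks step is ordinary template rendering. A toy stand-in (the template text and names here are illustrative, not Brian's actual stateupdate.pyx):

from jinja2 import Template

template = Template(
    "for _idx in range({{ num_neurons }}):\n"
    "    {{ var }}[_idx] = {{ expr }}\n"
)

code = template.render(
    num_neurons=1000,
    var='v',
    expr='v[_idx] + dt * ((I[_idx] - v[_idx])/tau)',
)
print(code)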

Memory Sharing Architecture

Imagine you and your friend are working on a massive jigsaw puzzle together. In most computer systems, this would work like having two separate puzzle tables: every time your friend (the fast compiled code) wants to work on a piece, they'd have to copy it from your table (Python) to theirs, work on it, then copy it back. This copying back and forth would waste enormous amounts of time.

But Brian has figured out something much smarter: what if you both work on the exact same table?

The Traditional Problem: The Expensive Messenger

In typical programming scenarios, when Python wants to hand data to compiled code (like C++ or Cython), here's what usually happens:

Traditional Approach (Slow!)
┌─────────────────┐                    ┌─────────────────┐
│   Python Data   │                    │  Compiled Code  │
│                 │  1. Copy data →    │                 │
│ [1, 2, 3, 4...] │                    │ [1, 2, 3, 4...] │
│                 │                    │ ↓ Process ↓     │
│                 │                    │ [2, 4, 6, 8...] │
│                 │  ← 2. Copy back    │                 │
│ [2, 4, 6, 8...] │                    │                 │
└─────────────────┘                    └─────────────────┘

This copying is like having a messenger run back and forth between two offices every time they need to share a document. For large neural network simulations with millions of numbers, this messenger would be running marathons!

Brian's Brilliant Solution: Shared Memory

Brian eliminates the messenger entirely. Instead of copying data, both Python and the compiled code work directly on the same memory location. It's like having one shared workspace that everyone can access directly.

How the Magic Works: The Three-Layer Handoff

Let's trace through exactly how Brian achieves this memory sharing miracle:

Layer 1: Python Creates the Foundation

# When you create a NeuronGroup, Brian's RuntimeDevice says:
# "I'll create numpy arrays to store the neuron data"

device.arrays[Variable('v')] = np.array([-70.0, -70.0, -70.0, ...])  # Neuron voltages
device.arrays[Variable('I')] = np.array([0.0, 0.0, 0.0, ...])        # Input currents

These numpy arrays live at specific places in your computer's memory - let's say memory addresses 1000 and 2000.

Layer 2: The Namespace Bridge

# Brian creates a "namespace" - like a phone book that maps names to addresses
namespace = {
    '_array_neurongroup_v': device.arrays[Variable('v')],  # Points to address 1000
    '_array_neurongroup_I': device.arrays[Variable('I')],  # Points to address 2000
}

The namespace doesn't copy the data - it just creates references (like bookmarks) pointing to where the real data lives.
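You can see the "bookmark" behavior with nothing but a dict - entries hold references, not copies:

import numpy as np

data = np.full(3, -70.0)
namespace = {'_array_neurongroup_v': data}        # stores a reference, not a copy

namespace['_array_neurongroup_v'][0] = -55.0      # write through the namespace...
print(data[0])                                    # ...-55.0: same underlying array
print(namespace['_array_neurongroup_v'] is data)  # True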

Layer 3: Compiled Code Gets Direct Access

# The compiled Cython code receives the namespace and says:
# "Give me direct C pointers to that memory!"

cdef double* v_ptr = &_namespace['_array_neurongroup_v'][0]  # Pointer to address 1000
cdef double* I_ptr = &_namespace['_array_neurongroup_I'][0]  # Pointer to address 2000

# Now it can modify the data directly at C speed:
for _idx in range(N):
    v_ptr[_idx] = v_ptr[_idx] + dt * ((I_ptr[_idx] - v_ptr[_idx])/tau)

The Beautiful Result: True Collaboration

Here's what makes this so elegant:

Brian's Zero-Copy Architecture
┌─────────────────────────────────────────────────────────────┐
│                    Computer Memory                          │
│                                                             │
│  Address 1000: [-70.0, -70.0, -70.0, ...]  ← Voltage data   │
│  Address 2000: [0.0, 0.0, 0.0, ...]        ← Current data   │
│                     ↑           ↑                           │
│                     │           │                           │
│         ┌───────────┘           └───────────┐               │
│         │                                   │               │
│ ┌───────▼──────┐                   ┌───────▼──────┐         │
│ │ Python says: │                   │ Cython says: │         │
│ │ "My numpy    │                   │ "My C pointer│         │
│ │  array lives │                   │  points to   │         │
│ │  here"       │                   │  here"       │         │
│ └──────────────┘                   └──────────────┘         │
└─────────────────────────────────────────────────────────────┘

Both Python and the compiled code are looking at the exact same memory locations. When the Cython code changes a value, Python sees it instantly. When Python modifies an array, Cython sees it immediately. No copying, no delays, no waste.
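The same principle shows up in plain NumPy - two names, one buffer:

import numpy as np

v = np.full(5, -70.0)       # "Python's" array
view = v[:]                 # a view: no data copied
view[0] = -55.0             # write through one name...
print(v[0])                 # ...and the other sees -55.0 instantly
print(v.ctypes.data == view.ctypes.data)   # True: same memory address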

Complete Data Flow Analysis

The full compilation and execution cycle demonstrates the sophisticated coordination between all of these components - parsing, templating, memory sharing, and compilation working in concert.

But Wait, There's a Problem... 🤔

Now, here's where our discussion takes an interesting turn. This system, while clever, has some pain points that are bugging the Brian2 developers (and me, as I work on fixing them!).

The Cython Compilation Bottleneck

Every time Brian needs to generate new code, it has to:

  1. Generate Cython source code

  2. Call the Cython compiler

  3. Call the C++ compiler

  4. Link everything into a shared library

  5. Import the result back into Python

This compilation dance can take several seconds - which doesn't sound like much until you're iterating on a model and waiting for compilation every single time…
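You can feel this overhead directly. A minimal sketch (timings depend on your machine and on Brian's on-disk cache, which makes repeated runs of an unchanged model fast):

from brian2 import *
import time

prefs.codegen.target = 'cython'     # the default when Cython is available

tau = 10*ms
G = NeuronGroup(1000, 'dv/dt = -v/tau : volt')

t0 = time.time()
run(1*ms)      # first run: generate -> cythonize -> compile -> link -> import
print(f'first run (with compilation): {time.time() - t0:.2f} s')

t0 = time.time()
run(1*ms)      # same code objects: no recompilation
print(f'second run (cached): {time.time() - t0:.2f} s')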

The SpikeQueue: A Case Study in Inefficiency

Let me show you a specific example that really highlights the problem. One of the most critical components in neural network simulation is the SpikeQueue - it manages when spikes get delivered between neurons.

Here's what currently happens in Brian:

# 1. Generated Cython code calls a Python method
{% block maincode %}
    _owner.push_spikes()  # Python method call!
{% endblock %}

# 2. Python method extracts spikes and calls another Python method  
def push_spikes(self):
    events = self.eventspace[: self.eventspace[len(self.eventspace) - 1]]
    if len(events):
        self.queue.push(events)  # Another Python call!

# 3. Finally calls the C++ implementation through Cython
def push(self, np.ndarray[int32_t, ndim=1, mode='c'] spikes):
    self.thisptr.push(<int32_t*>spikes.data, spikes.shape[0])

So we have compiled code calling Python calling Cython calling C++. It's like ordering food through three different delivery apps when you could just walk to the kitchen!
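The cost is measurable even in miniature - every extra Python-level hop is a function call on every simulation step. With illustrative stand-ins (not Brian's actual classes):

import timeit

def cpp_push(spikes):          # stand-in for the C++ push
    pass

def cython_wrapper(spikes):    # stand-in for the Cython shim
    cpp_push(spikes)

def python_method(spikes):     # stand-in for push_spikes()
    cython_wrapper(spikes)

direct = timeit.timeit(lambda: cpp_push(None), number=1_000_000)
layered = timeit.timeit(lambda: python_method(None), number=1_000_000)
print(f'direct: {direct:.3f} s, via two extra layers: {layered:.3f} s')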

What we have

Compiled code calling Python calling Cython calling C++ - layers of indirection on the hottest path in the simulation.

What we want

A more efficient approach would let the generated code call the C++ implementation directly, cutting out all the Python middlemen.

So this is the challenge I’ll be tackling over the following week, documenting what I did and how … so stay tuned :)
