Using FIR Filters for Advanced Multi-band Processing in JUCE

This is an introduction to implementing a multi-band audio processor in C++ using FIR (Finite Impulse Response) filters in JUCE.

Getting Started

What's an FIR?

First, it might help to go over a couple definitions.

An FIR, or Finite Impulse Response filter, is a digital filter that processes audio by convolving the input signal with a finite sequence of coefficients, otherwise known as an impulse response. This 1-dimensional series of numbers completely characterizes how the system reacts to input audio.

Unlike IIR (Infinite Impulse Response) filters, whose responses can ring on indefinitely, FIR filters use only feedforward paths (no feedback loops), so their output always settles to zero in finite time. This makes them unconditionally stable.

FIR filters also provide linear phase response when their coefficients are symmetric, thereby eliminating phase distortion across frequencies. You get precise control over the frequency response, at the cost of computational complexity and some latency.

What does "filter kernel size" actually mean?

The filter kernel size (sometimes called filter length or number of taps) is just the number of coefficients in your FIR filter's impulse response. In code, it's the length of the array you're “convolving” with your audio.

  • Small kernel size (e.g. 32, 64, 128): The filter is "short." It's fast, low-latency, and light on CPU, but you won't get razor-sharp frequency cutoffs.

  • Large kernel size (e.g. 512, 1024, 2048+): The filter is "long." It'll give you steeper, more precise frequency separation at the cost of higher CPU usage and more latency.

What does this mean in practice?

  • The kernel size is the main thing that sets your filter's frequency resolution and how sharp the transitions are.

  • For a low-pass, a longer kernel gives you a steeper roll-off and less band "bleed."

  • The kernel size also sets the latency:
    For a linear-phase FIR, latency = (kernel size - 1) / 2 samples.

Why use an FIR?

FIR filters are invaluable for many reasons:

  • They provide perfectly linear phase response (with symmetric coefficients), which prevents phase distortion in multi-band processing and mastering.

  • They're inherently stable, with no risk of feedback oscillation.

  • They allow arbitrary frequency response shaping through impulse response design.

  • They enable perfect reconstruction when used in filter banks.

  • They have predictable latency that's easy to compensate for ((N-1)/2 samples for kernel size N).

  • They have no recursive state, which makes them robust against numerical issues like denormals and floating-point drift.

FIR filters are the go-to solution when you need transparent, phase-accurate, and stable filtering—especially for multiband splitters, linear-phase EQs, and mastering tools. The tradeoff is higher CPU usage and latency for sharp filters, but we can manage this easily with modern CPUs and FFT-based convolution in professional audio software.

Smaller Filter Kernels

JUCE provides a straightforward way to process smaller filter kernels using juce::dsp::FIR::Filter. [1]

//We use the ProcessorDuplicator to build a stereo filter from a mono FIR filter
juce::dsp::ProcessorDuplicator<juce::dsp::FIR::Filter<float>,
                               juce::dsp::FIR::Coefficients<float>> stereoFilter;

void prepareToPlay(double sampleRate, int samplesPerBlock) override
{
    juce::dsp::ProcessSpec spec;
    spec.sampleRate = sampleRate;
    spec.maximumBlockSize = (juce::uint32) samplesPerBlock;
    spec.numChannels = 2;

    // FilterDesign returns a reference-counted Coefficients::Ptr;
    // an order of 100 yields 101 taps.
    stereoFilter.state = juce::dsp::FilterDesign<float>::designFIRLowpassWindowMethod(
        2000.0f, sampleRate, 100, juce::dsp::WindowingFunction<float>::hamming);
    stereoFilter.prepare(spec);
}
//Called once per block
void processBlock(juce::AudioBuffer<float>& buffer, juce::MidiBuffer&) override
{
    juce::dsp::AudioBlock<float> block(buffer);
    juce::dsp::ProcessContextReplacing<float> context(block);
    stereoFilter.process(context);
}

In my case, I'm trying to build a multi-band effect, and I need the best kind of FIR filter for the job. To get the best separation between bands, it's generally recommended to use a Linear-Phase Filter, which is known for maintaining that sharp frequency control we want, without any phase distortion. And since these filters often have long impulse responses, sometimes hundreds to thousands of taps... You can imagine the problems we might run into when we want to process them in the time domain.

Thankfully, JUCE provides a better way to process FIR filters with larger filter kernels in the frequency domain, which is much faster due to some FFT mathematics I might talk about in the future.

Working with Larger Filter Kernels

The JUCE API simplifies things for us.[2] Any FIR filter is fundamentally a convolution operation. All you have to do is convert your filter to an impulse response, store it in a juce::AudioBuffer, and load it into the convolution engine in a way that's real-time safe by using the function loadImpulseResponse(). It's thread-safe, and will update the convolver without causing any noticeable cracks, clicks or pops.

Per the JUCE documentation, FIR::Filter is fast enough for coefficient sets shorter than about 128 samples. For longer filters, it's more efficient to use juce::dsp::Convolution, which performs the same processing in the frequency domain thanks to the FFT.
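As a minimal sketch of that workflow, assuming a juce::dsp::Convolution member named convolution and a sampleRate member in scope:

// Design a long linear-phase lowpass, copy its coefficients into a
// single-channel AudioBuffer, and hand that buffer to the engine.
auto coeffs = juce::dsp::FilterDesign<float>::designFIRLowpassWindowMethod(
    2000.0f, sampleRate, 1024, juce::dsp::WindowingFunction<float>::blackmanHarris);

juce::AudioBuffer<float> ir(1, coeffs->coefficients.size());
for (int i = 0; i < ir.getNumSamples(); ++i)
    ir.setSample(0, i, coeffs->coefficients[i]);

convolution.loadImpulseResponse(std::move(ir), sampleRate,
    juce::dsp::Convolution::Stereo::no,
    juce::dsp::Convolution::Trim::no,
    juce::dsp::Convolution::Normalise::no);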

Filter Design & Implementation

Filter Design Fundamentals

When designing FIR filters, our goal is to implement a filter with a specific frequency response—such as an ideal low-pass, high-pass, or band-pass. In theory, these ideal filters have perfectly sharp transitions in the frequency domain (for example, an ideal low-pass is 1 below the cutoff and 0 above). However, the impulse response of such an ideal filter is infinitely long and non-causal, which is not practical for real-time audio processing.

To make these filters usable, we must approximate the ideal frequency response with a finite-length (truncated) impulse response. This process involves taking the inverse Fourier transform of the desired frequency response to obtain the corresponding impulse response, and then truncating it to a manageable number of taps (filter length N). This truncation is equivalent to multiplying the infinite impulse response by a window function of length N, which is why windowing is a critical step in FIR filter design.

In summary, FIR filter design is fundamentally about approximating an ideal (infinite, non-causal) frequency response with a practical, finite-length filter that can be implemented efficiently in software or hardware.
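To ground this, here's a small framework-free sketch of the windowed-sinc recipe described above (a Hann window is used for illustration; JUCE's FilterDesign, shown next, packages all of this up for you):

#include <cmath>
#include <vector>

// Sample the ideal low-pass impulse response (a sinc centred on the
// middle tap), then taper it with a Hann window to suppress the
// truncation ripples.
std::vector<double> designWindowedSincLowpass(double cutoffHz, double sampleRate, int numTaps)
{
    constexpr double pi = 3.14159265358979323846;
    const double fc = cutoffHz / sampleRate;  // normalised cutoff, cycles per sample
    const int centre = (numTaps - 1) / 2;
    std::vector<double> h((size_t) numTaps);

    for (int n = 0; n < numTaps; ++n)
    {
        const int k = n - centre;
        const double ideal = (k == 0) ? 2.0 * fc
                                      : std::sin(2.0 * pi * fc * k) / (pi * k);
        const double window = 0.5 - 0.5 * std::cos(2.0 * pi * n / (numTaps - 1));
        h[(size_t) n] = ideal * window;
    }
    return h;
}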

JUCE's FilterDesign class

JUCE's dsp::FilterDesign class provides industry-standard algorithms for creating high-quality FIR filters. [3] However, there's an important limitation: it only provides low-pass filter design methods. Here's how we can work around this to create a complete multi-band system:

// JUCE only provides a lowpass FIR design
auto lowpassCoeffs = juce::dsp::FilterDesign<double>::designFIRLowpassWindowMethod(
    cutoffFreq, 
    sampleRate, 
    filterOrder, 
    juce::dsp::FilterDesign<double>::WindowingMethod::blackmanHarris
);

Creating High-pass and Band-pass Filters

Since JUCE doesn't provide direct high-pass or band-pass FIR design methods, we use classic DSP techniques to compensate. [10] For high-pass filters, we use spectral inversion: subtract the lowpass from a delta (unit impulse). For band-pass filters, we subtract two lowpass filters with different cutoff frequencies. Note that spectral inversion assumes an odd number of taps (an even filter order), so the unit impulse lands exactly on the centre tap.

// However, we can calculate the Highpass from the Lowpass with Spectral Inversion
auto lowpassCoeffs = juce::dsp::FilterDesign<double>::designFIRLowpassWindowMethod(
    cutoffFreq, sampleRate, filterOrder, windowMethod);

std::vector<double> highpassCoeffs(lowpassCoeffs->getFilterOrder() + 1);
for (size_t i = 0; i < highpassCoeffs.size(); ++i)
{
    if (i == highpassCoeffs.size() / 2)
        highpassCoeffs[i] = 1.0 - lowpassCoeffs->coefficients[i];
    else
        highpassCoeffs[i] = -lowpassCoeffs->coefficients[i];
}

// Similarly, we can easily derive the Bandpass from the Two Lowpass Filters
auto lowpass1 = juce::dsp::FilterDesign<double>::designFIRLowpassWindowMethod(
    highFreq, sampleRate, filterOrder, windowMethod);
auto lowpass2 = juce::dsp::FilterDesign<double>::designFIRLowpassWindowMethod(
    lowFreq, sampleRate, filterOrder, windowMethod);

std::vector<double> bandpassCoeffs(lowpass1->getFilterOrder() + 1);
for (size_t i = 0; i < bandpassCoeffs.size(); ++i)
    bandpassCoeffs[i] = lowpass1->coefficients[i] - lowpass2->coefficients[i];

Advanced Techniques

Smart Filter Caching

For a professional multi-band processor, we need to be efficient about handling the dynamic crossover frequencies. Designing FIR filters in real-time is pretty expensive on the CPU, so I'd recommend that you implement some sort of smart caching system. This way, previously designed filters can be instantly re-used for the same parameters, dramatically reducing CPU load.

class MultibandProcessor
{
private:
    struct FilterParams
    {
        float frequency1, frequency2;
        int order;
        WindowingMethod window;
        double sampleRate;

        bool operator==(const FilterParams& other) const
        {
            return std::abs(frequency1 - other.frequency1) < 0.1f &&
                   std::abs(frequency2 - other.frequency2) < 0.1f &&
                   order == other.order &&
                   window == other.window &&
                   std::abs(sampleRate - other.sampleRate) < 0.1;
        }
    };

    // Note: FilterParamsHash must quantize to the same 0.1-unit grid that
    // operator== uses, or keys that compare equal may hash to different
    // buckets (see the sketch below).
    std::unordered_map<FilterParams, std::vector<double>, FilterParamsHash> filterCache;
    std::mutex filterCacheMutex;

    std::atomic<int> cacheHits{0};
    std::atomic<int> cacheMisses{0};
};
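The class above leaves FilterParamsHash undefined. A minimal sketch (in real code it would sit next to FilterParams, since that struct is private); the crucial detail is quantizing to the same 0.1-unit tolerance grid as operator==, so keys that compare equal always hash equally:

struct FilterParamsHash
{
    size_t operator()(const FilterParams& p) const noexcept
    {
        // Quantise to the tolerance grid used by operator== before hashing.
        auto q = [](double v) { return (size_t) std::llround(v * 10.0); };

        size_t h = q(p.frequency1);
        h = h * 31 + q(p.frequency2);
        h = h * 31 + (size_t) p.order;
        h = h * 31 + (size_t) p.window;
        h = h * 31 + q(p.sampleRate);
        return h;
    }
};

Strictly speaking, a tolerance-based operator== isn't a true equivalence relation, so in production you'd quantize the key fields themselves when building a FilterParams; the sketch keeps the original tolerances for illustration.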

The Importance of Perfect Reconstruction

Perfect reconstruction ensures professional, artifact-free audio with flat response and matched band gains; without it, you get audible artifacts and nonstandard behavior.

This means that the content within the bands needs to be phase coherent and amplitude coherent.

We already ensure phase coherence between bands by using symmetric FIR coefficients, which, if you remember, naturally gives us a linear phase response. We ensure amplitude coherence by never normalizing the individual bands: normalizing each band separately would completely destroy perfect reconstruction and cause massive gain imbalances between bands.

//We use this to handle verbosity
using AudioBufferSet = std::tuple<juce::AudioBuffer<float>, juce::AudioBuffer<float>, juce::AudioBuffer<float>>;

AudioBufferSet designComplementaryFilterBank(float lowFreq, float highFreq, int length)
{
    DBG("Designing complementary filter bank for perfect reconstruction");
    auto lowpassLow = designCachedFIRFilter(lowFreq, 0.0f, length, currentWindowingMethod, "lowpass");
    auto lowpassHigh = designCachedFIRFilter(highFreq, 0.0f, length, currentWindowingMethod, "lowpass");

    std::vector<double> bassCoeffs = lowpassLow;
    std::vector<double> midCoeffs(lowpassLow.size());
    std::vector<double> trebleCoeffs(lowpassLow.size());
    for (size_t i = 0; i < lowpassLow.size(); ++i)
    {
        midCoeffs[i] = lowpassHigh[i] - lowpassLow[i];

        if (i == lowpassLow.size() / 2)
            trebleCoeffs[i] = 1.0 - lowpassHigh[i];
        else
            trebleCoeffs[i] = -lowpassHigh[i];
    }

    double reconstructionSum = 0.0;
    size_t centerTap = lowpassLow.size() / 2;
    reconstructionSum = bassCoeffs[centerTap] + midCoeffs[centerTap] + trebleCoeffs[centerTap];
    DBG("Perfect reconstruction verification: " << reconstructionSum << " (should be 1.0)");

    auto bassIR = juce::AudioBuffer<float>(1, (int)bassCoeffs.size());
    auto midIR = juce::AudioBuffer<float>(1, (int)midCoeffs.size());
    auto trebleIR = juce::AudioBuffer<float>(1, (int)trebleCoeffs.size());

    // Copy the designed coefficients into the impulse response buffers
    for (int i = 0; i < (int) bassCoeffs.size(); ++i)
    {
        bassIR.setSample(0, i, (float) bassCoeffs[i]);
        midIR.setSample(0, i, (float) midCoeffs[i]);
        trebleIR.setSample(0, i, (float) trebleCoeffs[i]);
    }

    return std::make_tuple(std::move(bassIR), std::move(midIR), std::move(trebleIR));
}

void loadComplementaryFilters(float lowFreq, float highFreq)
{
    auto [bassIR, midIR, trebleIR] = designComplementaryFilterBank(lowFreq, highFreq, filterOrder);

    bassConvolution.loadImpulseResponse(std::move(bassIR), sampleRate,
        juce::dsp::Convolution::Stereo::no, 
        juce::dsp::Convolution::Trim::no, 
        juce::dsp::Convolution::Normalise::no);

    midConvolution.loadImpulseResponse(std::move(midIR), sampleRate,
        juce::dsp::Convolution::Stereo::no, 
        juce::dsp::Convolution::Trim::no, 
        juce::dsp::Convolution::Normalise::no);

    trebleConvolution.loadImpulseResponse(std::move(trebleIR), sampleRate,
        juce::dsp::Convolution::Stereo::no, 
        juce::dsp::Convolution::Trim::no, 
        juce::dsp::Convolution::Normalise::no);
}

The key insight is that for perfect reconstruction, the sum of all filter impulse responses must equal a unit impulse: [7]

Bass(n) + Mid(n) + Treble(n) = δ(n)

Where:

  • Bass(n) = lowpass(f₁)

  • Mid(n) = lowpass(f₂) - lowpass(f₁)

  • Treble(n) = δ(n) - lowpass(f₂)

Summing these: lowpass(f₁) + [lowpass(f₂) - lowpass(f₁)] + [δ(n) - lowpass(f₂)] = δ(n)

This mathematical relationship only works if you don't normalize the filters individually!
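To make that identity concrete, here's a small offline check I'd run on the designed coefficient sets (a sketch, using the bass/mid/treble std::vector<double> sets from designComplementaryFilterBank before they're copied into AudioBuffers):

#include <cmath>
#include <vector>

// Every tap of the summed impulse responses should be 0, except the
// centre tap, which should be exactly 1.
bool verifyPerfectReconstruction(const std::vector<double>& bass,
                                 const std::vector<double>& mid,
                                 const std::vector<double>& treble)
{
    const size_t centre = bass.size() / 2;
    for (size_t i = 0; i < bass.size(); ++i)
    {
        const double sum = bass[i] + mid[i] + treble[i];
        const double expected = (i == centre) ? 1.0 : 0.0;
        if (std::abs(sum - expected) > 1.0e-9)
            return false;
    }
    return true;
}

This extends the centre-tap spot check from earlier to every tap, which is the actual condition the δ(n) identity demands.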

Production Optimization

Production-Grade Optimizations

The previous sections covered the mathematical foundations, but commercial multi-band processors require additional optimizations for professional performance. This level of implementation detail is what separates hobby projects from professional audio software that can compete with industry standards from SSL, Waves, and FabFilter.

Optimized Convolution Engine Configuration

JUCE's dsp::Convolution is a partitioned engine that defaults to a zero-latency "short head + long tail" scheme. [9] This destroys CPU performance when hosting multiple long kernels:

// This is wrong because it kills performance with long IRs
juce::dsp::Convolution bassConvolution;  // Uses zero-latency mode
juce::dsp::Convolution midConvolution;   // CPU explodes with tiny host buffers
juce::dsp::Convolution trebleConvolution;

// Instead, we can configure for optimal partitioning
class OptimizedMultibandProcessor
{
private:
    // One shared head size keeps all three engines' partitioning identical.
    // The head runs at this block size; the tail uses much larger FFTs.
    static constexpr int headSize = 256;

    juce::dsp::Convolution bassConvolution{juce::dsp::Convolution::NonUniform{headSize}};
    juce::dsp::Convolution midConvolution{juce::dsp::Convolution::NonUniform{headSize}};
    juce::dsp::Convolution trebleConvolution{juce::dsp::Convolution::NonUniform{headSize}};

public:
    void prepare(const juce::dsp::ProcessSpec& spec)
    {
        bassConvolution.prepare(spec);
        midConvolution.prepare(spec);
        trebleConvolution.prepare(spec);

        DBG("Convolution engines configured - Head size: " << headSize << " samples");
    }
};

Why does this matter? Zero-latency mode forces tiny FFT partitions, causing massive CPU overhead; a fixed head size lets the tail run at much larger FFT sizes for dramatically lower CPU usage in multi-kernel systems like multi-band processors.

Lock-Free IR Hot-Swapping

Traditionally, we block access to shared resources while we’re updating them by using a mutex. However, we don’t want to block the audio thread with mutex locks during filter updates, since this will cause audio dropouts in production:

// Now this is how NOT to do things. The following example blocks the audio thread.
std::lock_guard<std::mutex> lock(filterCacheMutex);  // BLOCKS AUDIO!
auto it = filterCache.find(params);
// This is a bit more sensible
class LockFreeIRManager
{
private:
    struct IRUpdate
    {
        enum Type { Bass, Mid, Treble };
        Type bandType;
        juce::AudioBuffer<float> impulseResponse;
        bool ready = false;
    };

    juce::AbstractFifo irUpdateFifo{32}; //lock-free for IR updates
    std::array<IRUpdate, 32> irUpdateBuffer;

    std::unique_ptr<juce::Thread> irDesignThread; //background thread for IR design
    std::atomic<bool> shouldExit{false};


    // The convolvers are "Double-buffered" so that we can switch between them seamlessly.
    struct ConvolverPair
    {
        juce::dsp::Convolution primary{juce::dsp::Convolution::NonUniform{256}};
        juce::dsp::Convolution secondary{juce::dsp::Convolution::NonUniform{256}};
        std::atomic<bool> usePrimary{true};
    };

    ConvolverPair bassConvolvers, midConvolvers, trebleConvolvers;

public:
    LockFreeIRManager()
    {
        // IRDesignThread is a juce::Thread subclass (defined elsewhere) that
        // pops design requests and fills entries in irUpdateBuffer.
        irDesignThread = std::make_unique<IRDesignThread>(*this);
        irDesignThread->startThread(juce::Thread::Priority::normal);
    }

    ~LockFreeIRManager()
    {
        shouldExit = true;
        irDesignThread->stopThread(1000);
    }

    void processBlock(juce::AudioBuffer<float>& buffer) //NO LOCKS!!!
    {
        // Check for completed IR updates
        processIRUpdates();

        // Process with active convolvers
        auto& bassConv = bassConvolvers.usePrimary ? 
                        bassConvolvers.primary : bassConvolvers.secondary;
        auto& midConv = midConvolvers.usePrimary ? 
                       midConvolvers.primary : midConvolvers.secondary;
        auto& trebleConv = trebleConvolvers.usePrimary ? 
                          trebleConvolvers.primary : trebleConvolvers.secondary;

        // Process bands accordingly from here...
    }

    // This is called from the UI thread
    void requestIRUpdate(float lowFreq, float highFreq, int order)
    {
        // IRDesignRequest and designRequestFifo (a second lock-free queue
        // feeding the background thread) are defined elsewhere.
        IRDesignRequest request{lowFreq, highFreq, order}; //queued
        designRequestFifo.push(request);
    }

private:
    void processIRUpdates()
    {
        const int numReady = irUpdateFifo.getNumReady();
        if (numReady == 0) return;

        // AbstractFifo hands back up to two contiguous regions,
        // since the read range can wrap around the end of the buffer.
        int start1, size1, start2, size2;
        irUpdateFifo.prepareToRead(numReady, start1, size1, start2, size2);

        auto consume = [this](int start, int size)
        {
            for (int i = 0; i < size; ++i)
            {
                auto& update = irUpdateBuffer[(size_t)(start + i)];
                if (update.ready)
                {
                    loadIRToInactiveConvolver(update);
                    swapConvolvers(update.bandType);
                    update.ready = false;
                }
            }
        };

        consume(start1, size1);
        consume(start2, size2);

        irUpdateFifo.finishedRead(size1 + size2);
    }

    void loadIRToInactiveConvolver(const IRUpdate& update)
    {
        // Load IR to inactive convolver for seamless switching
        switch (update.bandType)
        {
            case IRUpdate::Bass:
                {
                    auto& inactive = bassConvolvers.usePrimary ? 
                                   bassConvolvers.secondary : bassConvolvers.primary;
                    inactive.loadImpulseResponse(
                        juce::AudioBuffer<float>(update.impulseResponse), 
                        sampleRate, juce::dsp::Convolution::Stereo::no,
                        juce::dsp::Convolution::Trim::no, 
                        juce::dsp::Convolution::Normalise::no);
                }
                break;
            // Similar for Mid and Treble...
        }
    }

    void swapConvolvers(IRUpdate::Type bandType)
    {
        // Atomic swap for glitch-free switching
        switch (bandType)
        {
            case IRUpdate::Bass:
                bassConvolvers.usePrimary = !bassConvolvers.usePrimary;
                break;
            // Similar for other bands...
        }
    }
};

Dynamic Latency Reporting

When the filter order changes, we have to report the new latency to the host immediately.

class LatencyAwareProcessor : public juce::AudioProcessor
{
private:
    std::atomic<int> currentLatencySamples{0};

public:
    void updateFilterOrder(int newOrder)
    {
        int newLatency = (newOrder - 1) / 2;
        currentLatencySamples = newLatency;
        setLatencySamples(newLatency);
        DBG("Latency updated: " << newLatency << " samples (" 
            << (newLatency / getSampleRate() * 1000.0) << " ms)");
        requestFilterUpdate();
    }

    double getTailLengthSeconds() const override
    {
        return currentLatencySamples.load() / getSampleRate();
    }
};

The Gibbs Phenomenon & Windowing

The Gibbs phenomenon is a fundamental challenge in FIR filter design that occurs when we attempt to convert ideal frequency responses into finite-length filters. This mathematical phenomenon manifests as persistent oscillations and ripples that appear when a discontinuous function (like an ideal low-pass filter) is approximated by a finite series. [4]

When we design an ideal filter in the frequency domain and convert it to the time domain using the inverse Fourier transform, we get an infinite-length impulse response. To create a practical FIR filter, we must truncate this infinite response to a finite length N. However, direct truncation (equivalent to applying a rectangular window) causes the Gibbs phenomenon to appear as severe artifacts: strong ripples that don't decay, particularly pronounced near transition bands, resulting in poor stop-band attenuation and audible ringing.

By applying a smooth windowing function, we can taper the impulse response coefficients toward zero at the edges. This smoothing dramatically reduces the intensity of artifacts produced by the Gibbs phenomenon by gradually attenuating the coefficients rather than abruptly cutting them off. Different windowing functions offer various trade-offs between transition bandwidth, stop-band rejection, and ripple characteristics, allowing you to choose the optimal balance for your specific application.

Multiple Windowing Methods Support

JUCE provides several windowing methods, each with different characteristics:

enum class WindowingMethod
{
    rectangular = 0,   // Sharp but with ringing
    hann,              // Good general purpose
    hamming,           // Similar to Hann
    blackman,          // Better stopband rejection
    blackmanHarris,    // Excellent stopband (-92dB)
    kaiser             // Adjustable with beta parameter
};

Blackman-Harris is typically the best choice for multi-band processing due to its superior stop-band rejection (-92dB), minimal spectral leakage, clean frequency separation, and professional audio quality. [5]

The Kaiser window is also suitable because it offers a tunable beta parameter, allowing you to precisely adjust the balance between transition bandwidth and stop-band attenuation.

Ultimately, choosing the best windowing option for the job is situational, so I decided to provide access to all of the windows and their parameters, shedding some of that responsibility back onto the end user.
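To make the Kaiser trade-off concrete, JUCE's FilterDesign can derive the window from a target spec directly. A sketch, with placeholder values and sampleRate assumed in scope (see the dsp::FilterDesign docs [3] for the exact parameter conventions):

// The Kaiser-method designer takes the spec and derives the window's
// beta and the required order for you.
auto kaiserCoeffs = juce::dsp::FilterDesign<float>::designFIRLowpassKaiserMethod(
    2000.0f,     // cutoff frequency in Hz
    sampleRate,  // assumed in scope
    80.0f,       // desired stop-band attenuation in dB
    0.02f);      // normalised transition width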

Real-Time UI Integration

Here, I've integrated filter controls directly into the UI via the JUCE component system:

class MultibandControlPanel : public juce::Component
{
private:
    std::unique_ptr<juce::ComboBox> windowingMethodCombo;
    std::unique_ptr<juce::Slider> filterOrderSlider;

    void setupFilterControls()
    {
        // Windowing method dropdown
        windowingMethodCombo->addItem("Rectangular", 1);
        windowingMethodCombo->addItem("Hann", 2);
        windowingMethodCombo->addItem("Hamming", 3);
        windowingMethodCombo->addItem("Blackman", 4);
        windowingMethodCombo->addItem("Blackman-Harris", 5);
        windowingMethodCombo->addItem("Kaiser", 6);

        windowingMethodCombo->onChange = [this]()
        {
            auto selectedMethod = static_cast<MultibandProcessor::WindowingMethod>(
                windowingMethodCombo->getSelectedId() - 1);
            multibandProcessor.setWindowingMethod(selectedMethod);
        };

        // Filter order slider (128 to 8192 coefficients)
        filterOrderSlider->setRange(128, 8192, 128);
        filterOrderSlider->setValue(2048);
        filterOrderSlider->onValueChange = [this]()
        {
            multibandProcessor.setFilterOrder((int)filterOrderSlider->getValue());
        };
    }
};

The UI shows us the current windowing method and filter order, along with cache performance metrics, filter design timing, and professional-grade visual indicators, all in real-time.

Performance Considerations

Bear in mind that the FIR filter length trades quality for computational cost. 128 samples work for real-time applications, 512 for better quality, 1024 for professional-grade filtering... any more and you might hit significant real-time performance constraints.
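As a rough rule of thumb, here's the (N - 1) / 2 latency formula from earlier turned into milliseconds (a trivial helper of my own; 48 kHz assumed as the default rate):

// Linear-phase FIR latency in milliseconds for a given tap count.
constexpr double latencyMs(int numTaps, double sampleRate = 48000.0)
{
    return ((numTaps - 1) / 2) / sampleRate * 1000.0;
}
// latencyMs(128) ≈ 1.3 ms, latencyMs(1024) ≈ 10.6 ms, latencyMs(4096) ≈ 42.6 ms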

That's it! You now have everything you need to build a professional multi-band effect using JUCE's convolution engine. The combination of linear-phase FIR filters and frequency-domain processing gives you pristine audio quality and efficient real-time performance.

Why Linkwitz-Riley Filters for Crossovers?

You might wonder why we use Linkwitz-Riley filters specifically for multi-band crossovers instead of regular Butterworth or Chebyshev filters. The answer lies in perfect reconstruction - the fundamental requirement for transparent multi-band processing.

The Perfect Reconstruction Problem

When splitting audio into multiple bands and recombining them, we need:

  1. Flat magnitude response when all bands are active

  2. Phase coherence across the crossover region

  3. No amplitude dips or peaks at crossover frequencies

Regular filters fail this test. Here's why:

// A plain 2nd-order Butterworth pair: each output is -3 dB at the
// crossover, and their phases are 180 degrees apart there, so the sum
// notches out (or bumps +3 dB if you flip one band's polarity).
lowpass.coefficients = juce::dsp::IIR::Coefficients<float>::makeLowPass(sampleRate, crossoverFreq);
highpass.coefficients = juce::dsp::IIR::Coefficients<float>::makeHighPass(sampleRate, crossoverFreq);

Linkwitz-Riley filters solve this perfectly:

  • 4th-order design (two cascaded 2nd-order Butterworth sections)

  • -6dB crossover point for each filter

  • Complementary magnitude response (the bands sum flat to 0dB everywhere)

  • Matched phase response between the two outputs, so the bands stay aligned through the crossover (the combined response is a pure allpass)

juce::dsp::LinkwitzRileyFilter<float> bassFilter, trebleFilter;

bassFilter.setType(juce::dsp::LinkwitzRileyFilterType::lowpass);
trebleFilter.setType(juce::dsp::LinkwitzRileyFilterType::highpass);
bassFilter.setCutoffFrequency(crossoverFreq);
trebleFilter.setCutoffFrequency(crossoverFreq);

This is why professional mixing consoles, mastering processors, and high-end loudspeaker crossovers rely on Linkwitz-Riley designs almost universally. [8]

A Note on Filter Coefficient Functions

Throughout this article, I've focused on JUCE's built-in filter design methods rather than diving deep into the mathematical derivations of filter coefficients. Here's why:

The mathematics behind FIR filter design involves:

  • Complex frequency domain analysis (Z-transforms, DFT theory)

  • Window function mathematics (Fourier analysis, spectral leakage theory)

  • Numerical optimization algorithms (Parks-McClellan, Remez exchange)

  • Advanced signal processing concepts (impulse response, frequency response)

Each windowing method requires extensive mathematical background: Kaiser window (Bessel functions, ripple optimization), Blackman-Harris (4-term cosine series), Hamming/Hann (raised cosine functions with different DC components).

So, rather than reproduce textbook DSP theory, I've focused on practical implementation using JUCE's professionally-tested algorithms, such as those found in the FilterDesign class, which encapsulates years of DSP research into simple methods.

Implementation Details

N-Band Scalable Architecture

I used these principles to build a class that can handle any arbitrary number of bands. This architecture has several nifty features:

  • Template parameters enable compile-time safety checks for array sizes and band configurations.

  • Each band gets its own configuration structure containing a name, ID, frequency range and enabled state.

  • Bands are spaced logarithmically across the frequency spectrum to match human perception (albeit not perfectly... more on that later).

  • The filter bank is carefully designed to achieve perfect reconstruction. I even confirmed this... when testing, the bands sum to unity.

template<size_t NumBands>
class MultibandProcessor
{
    static_assert(NumBands >= 2, "Must have at least 2 bands");
    static_assert(NumBands <= 32, "Maximum 32 bands supported");

public:
    // Band configuration structure
    struct BandConfig 
    {
        std::string name;
        std::string id;
        float lowFreq = 20.0f;
        float highFreq = 20000.0f;
        bool enabled = true;
    };

    MultibandProcessor()
    {
        // Initialize with logarithmically-spaced bands
        float logMin = std::log10(20.0f);
        float logMax = std::log10(20000.0f);
        float logStep = (logMax - logMin) / NumBands;

        for (size_t i = 0; i < NumBands; ++i)
        {
            float lowFreq = std::pow(10.0f, logMin + i * logStep);
            float highFreq = std::pow(10.0f, logMin + (i + 1) * logStep);

            bandConfigs[i] = BandConfig{
                "Band " + std::to_string(i + 1),
                "band_" + std::to_string(i),
                lowFreq,
                highFreq
            };
        }
    }

    // 1. Process...
    void processBlock(juce::AudioBuffer<float>& buffer)
    {
        // 2. Split...
        for (size_t band = 0; band < NumBands; ++band)
        {
            if (!bandConfigs[band].enabled) continue;

            //3. Process...
            processBand(band, buffer);
        }

        //4. Perfectly reconstruct...
        reconstructOutput(buffer);
    }

private:
    // Per-band splitting and processing helpers (implementations omitted)
    void processBand(size_t bandIndex, juce::AudioBuffer<float>& buffer);
    void reconstructOutput(juce::AudioBuffer<float>& buffer);

    std::array<BandConfig, NumBands> bandConfigs;
    std::array<juce::dsp::Convolution, NumBands> bandConvolutions;
};

// Some convenient type aliases...
using ThreeBandProcessor = MultibandProcessor<3>;    // Bass, Mid, Treble
using FourBandProcessor = MultibandProcessor<4>;     // Sub, Low, Mid, High
using FiveBandProcessor = MultibandProcessor<5>;     // Sub, Bass, Low Mid, High Mid, Treble
using EightBandProcessor = MultibandProcessor<8>;    // Professional mastering
using SixteenBandProcessor = MultibandProcessor<16>; // Surgical precision

Here's how to use the scalable architecture in practice:

class MasteringProcessor : public juce::AudioProcessor
{
private:
    // Choose number of bands based on application
    EightBandProcessor processor;  // 8 bands for mastering

    void processBlock(juce::AudioBuffer<float>& buffer,
                     juce::MidiBuffer& midiMessages) override
    {
        processor.processBlock(buffer);
    }

    void prepare(double sampleRate, int samplesPerBlock)
    {
        // Load band-specific plugins
        processor.loadPlugin(0, "path/to/sub_enhancer.vst3");
        processor.loadPlugin(1, "path/to/bass_compressor.vst3");
        // ... load other band plugins
    }
};

class SimpleMixProcessor : public juce::AudioProcessor
{
private:
    // Basic 3-band processing for mixing
    ThreeBandProcessor processor;

    void processBlock(juce::AudioBuffer<float>& buffer,
                     juce::MidiBuffer& midiMessages) override
    {
        processor.processBlock(buffer);
    }
};

Advanced Configuration

See how easy it is to customize the band configuration?

void setupCustomBands()
{
    std::array<FourBandProcessor::BandConfig, 4> customBands = {
        BandConfig{"Sub", "sub", 20.0f, 60.0f},
        BandConfig{"Bass", "bass", 60.0f, 200.0f},
        BandConfig{"Mid", "mid", 200.0f, 2000.0f},
        BandConfig{"High", "high", 2000.0f, 20000.0f}
    };

    processor.setBandConfiguration(customBands);
}

This architecture scales from simple 3-band processing to complex mastering applications while maintaining perfect phase alignment and reconstruction.

FIR Filter Latency Compensation

One of the most critical aspects of FIR multi-band processing is proper latency compensation. Linear-phase FIR filters introduce inherent delay that must be:

  1. Calculated correctly

  2. Compensated between bands

  3. Reported accurately to the host DAW

The Critical Latency Formula

For a linear-phase FIR filter, the latency in samples is:

int getLatencySamples() const 
{ 
    if (filterType == FilterType::LinearPhase)
        return (currentFilterOrder - 1) / 2;
    else
        return 0;
}

Why (N-1)/2 and not just N/2? Linear-phase FIR filters have symmetric impulse responses around a center tap:

  • For N coefficients: [0, 1, 2, ..., N-1]

  • Center tap position: (N-1)/2

  • Required look-ahead: (N-1)/2 samples

// Example: 5-tap filter [a, b, c, b, a]
// Positions:              0  1  2  3  4
// Center tap at position: (5-1)/2 = 2
// Latency: 2 samples

Comprehensive Latency Management

Your plugin must handle three types of latency: [11]

  1. Per-band filter latency

  2. Oversampling latency (if used)

  3. Processing latency (from other effects)

class LatencyManager
{
private:
    static constexpr size_t NumBands = 3; // bass/mid/treble in this sketch
    std::array<juce::dsp::DelayLine<float>, NumBands> latencyCompensation;
    std::atomic<int> currentLatencySamples{0};

public:
    void prepare(const juce::dsp::ProcessSpec& spec)
    {
        const int maxLatency = 8192;
        for (auto& delay : latencyCompensation)
        {
            delay.prepare(spec);
            delay.setMaximumDelayInSamples(maxLatency);
        }
    }

    void updateLatency()
    {
        int filterLatency = (currentFilterOrder - 1) / 2;

        int oversamplingLatency = 0;
        if (oversamplingEnabled)
        {
            // Oversampling::getLatencyInSamples() returns a float; round to whole samples
            oversamplingLatency = (int) std::round(oversamplers[0].getLatencyInSamples());
            jassert(std::all_of(oversamplers.begin(), oversamplers.end(),
                [&](const auto& os) { return (int) std::round(os.getLatencyInSamples()) == oversamplingLatency; }));
        }

        int processingLatency = 0;
        for (const auto& plugin : bandPlugins)
            if (plugin) processingLatency = std::max(processingLatency, plugin->getLatencySamples());

        int totalLatency = filterLatency + oversamplingLatency + processingLatency;
        currentLatencySamples.store(totalLatency);

        if (hostLatencyCallback)
            hostLatencyCallback(totalLatency);

        DBG("Latency updated - Filter: " << filterLatency 
            << "ms, Oversampling: " << oversamplingLatency 
            << "ms, Processing: " << processingLatency << "ms");
    }

    void processBlock(juce::AudioBuffer<float>& buffer)
    {
        for (size_t band = 0; band < NumBands; ++band)
        {
            if (latencyCompensation[band].getDelay() > 0.0f)
            {
                juce::dsp::AudioBlock<float> block(bandBuffers[band]);
                juce::dsp::ProcessContextReplacing<float> context(block);
                latencyCompensation[band].process(context);
            }
        }
    }
};

Latency-Matched Bypass

When bypassing the plugin, you must delay the dry signal by the same amount:

class LatencyMatchedBypass
{
private:
    juce::dsp::DelayLine<float> bypassDelay;
    bool bypassed = false;

public:
    void prepare(const juce::dsp::ProcessSpec& spec)
    {
        bypassDelay.prepare(spec);
        bypassDelay.setMaximumDelayInSamples(8192);
    }

    void setBypassDelay(float delaySamples)
    {
        bypassDelay.setDelay(delaySamples);
    }

    void processBlock(juce::AudioBuffer<float>& buffer)
    {
        if (bypassed)
        {
            juce::dsp::AudioBlock<float> block(buffer);
            juce::dsp::ProcessContextReplacing<float> context(block);
            bypassDelay.process(context);
        }
        else
        {
            processMultibandFIR(buffer);
        }
    }
};

Verifying Latency Compensation

Use this test to verify your latency compensation is working:

bool testLatencyCompensation()
{
    juce::AudioBuffer<float> test(2, 4096);
    test.clear();
    test.setSample(0, 128, 1.0f);
    test.setSample(1, 128, 1.0f);

    auto original = test;
    juce::MidiBuffer midi; // processBlock takes a non-const MidiBuffer reference
    processor.processBlock(test, midi);

    int inputPos = 128;
    int outputPos = -1;
    for (int i = 0; i < test.getNumSamples(); ++i)
    {
        if (std::abs(test.getSample(0, i)) > 0.9f)
        {
            outputPos = i;
            break;
        }
    }

    int measuredLatency = outputPos - inputPos;
    int reportedLatency = processor.getLatencySamples();

    DBG("Latency Test - Measured: " << measuredLatency 
        << ", Reported: " << reportedLatency);

    return measuredLatency == reportedLatency;
}

Remember: Proper latency compensation is critical for:

  • Phase alignment between bands

  • Host plugin delay compensation (PDC)

  • Automation timing

  • Bypass functionality

  • Multi-instance plugin synchronization

And remember... Always verify your latency calculations with both unit tests and real-world testing in a DAW.

Critical Implementation Details

Lock-Free IR Loading Threading

Do not call loadImpulseResponse() from a background thread, even though it's wait-free. The function must be invoked on the audio thread only:

void designAndLoadFIRFilters(float lowFreq, float highFreq)
{
    currentPendingIR = std::make_unique<PendingIR>();
    auto [bassIR, midIR, trebleIR] = designComplementaryFilterBank(lowFreq, highFreq, currentFilterOrder);

    currentPendingIR->bassIR = std::move(bassIR);     // Store in pending structure
    currentPendingIR->midIR = std::move(midIR);
    currentPendingIR->trebleIR = std::move(trebleIR);
    currentPendingIR->ready = true;                   // Must be set before publishing

    pendingIRs.store(currentPendingIR.get()); // This is our lock-free handoff
}

void checkForPendingIRs()
{
    // MUST be called from audio thread only!
    auto* pending = pendingIRs.load();
    if (pending != nullptr && pending->ready)
    {
        // Load on audio thread (wait-free operation)
        bassConvolution.loadImpulseResponse(std::move(pending->bassIR), sampleRate,
            juce::dsp::Convolution::Stereo::no, juce::dsp::Convolution::Trim::no, 
            juce::dsp::Convolution::Normalise::no);
        pendingIRs.store(nullptr);
    }
}

Phase Alignment Verification

After IR swap, you MUST verify that all convolvers report identical latency:

void checkForPendingIRs()
{
    // ... load IRs ...

    // CRITICAL: Verify phase alignment after IR swap
    int bassLatency = bassConvolution.getLatency();
    int midLatency = midConvolution.getLatency();
    int trebleLatency = trebleConvolution.getLatency();

    // Assert latency matching (critical for phase alignment)
    if (bassLatency != midLatency || midLatency != trebleLatency)
    {
        DBG("WARNING: Latency mismatch detected! This will cause phase smearing.");
        // TODO: Add DelayLine compensation for mismatched latencies
    }
}
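One way to make good on that TODO is to pad the faster convolvers up to the slowest one with short delay lines. A minimal sketch, assuming bassDelay, midDelay and trebleDelay are prepared juce::dsp::DelayLine<float> members (these names are mine, not from the original code):

// Align every band to the slowest convolver so they stay phase coherent.
const int maxLatency = std::max({ bassLatency, midLatency, trebleLatency });
bassDelay.setDelay((float) (maxLatency - bassLatency));
midDelay.setDelay((float) (maxLatency - midLatency));
trebleDelay.setDelay((float) (maxLatency - trebleLatency));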

Comprehensive Latency Reporting

Report latency changes immediately after every structural change.

int getLatencySamples() const 
{ 
    int filterLatency = 0;
    if (filterType == FilterType::LinearPhase)
        filterLatency = (currentFilterOrder - 1) / 2;  // FIR latency

    // Add oversampling latency if enabled
    int oversamplingLatency = 0;
    if (oversamplingFactor > 1 && bassOversampling != nullptr)
    {
        // CRITICAL: Include oversampling delay for Ableton Live/Logic Pro PDC
        oversamplingLatency = bassOversampling->getLatencyInSamples();
    }

    return filterLatency + oversamplingLatency;
}

// Call setLatencySamples() immediately after ANY structural change
void setOversamplingFactor(int factor)
{
    oversamplingFactor = factor;
    audioProcessor.notifyLatencyChanged(); // Critical for Live/Logic PDC
}

De-normal Protection Commentary

void processBlock(juce::AudioBuffer<float>& buffer, juce::MidiBuffer& midiMessages)
{
    // Denormal protection prevents severe Intel/AMD CPU slowdowns [6]
    // when convolution tail partitions decay to denormal values.
    // It's really more of an issue with older processors... but you can't be too sure.
    juce::ScopedNoDenormals noDenormals; // FTZ/DAZ stay set for the rest of this scope

    // ... the entire processing chain runs inside this scope ...
}

The Lock-Free Queue System

Now, remember... filter updates are painfully slow. Every time a user tweaks a crossover frequency or changes the filter order, you're looking at 5-50ms of filter design time, which can break flow.

The naive approach is to design filters directly in the parameter change callback. This blocks the audio thread, causes dropouts, and makes your plugin feel sluggish. Users will notice immediately.

A smarter solution is to use a lock-free queue system that relegates all the heavy lifting to a background thread while keeping the audio thread responsive.

The lock-free queue uses memory ordering to ensure thread safety. There are three options we can use to control how atomic operations synchronize between threads:

  • memory_order_relaxed: Atomicity only; no ordering guarantees for surrounding reads and writes

  • memory_order_acquire: Used on a load; the reader sees every write the other thread made before its matching release store

  • memory_order_release: Used on a store; publishes every write made before it to any thread that acquires the same atomic

This is lock-free and wait-free. That means no mutexes, no blocking and no priority inversion.

Why This Matters for Professional Audio

Professional audio software needs to be exceptional and reliable. When a producer turns a knob or moves a fader in the studio, they expect immediate visual feedback, zero audio artifacts, seamless automation, and consistent performance regardless of plugin complexity. The lock-free queue system makes this possible by decoupling filter design from audio processing, with background threads handling the heavy lifting while the audio thread remains unblocked. This is the kind of implementation detail that separates good plugins from great ones - your users might not understand the technical details, but they'll definitely feel the difference.

Critical Implementation Details

Heap Allocation for FIFO: The lock-free FIFO uses std::unique_ptr<T[]> instead of std::array<T, Size>. Large queues (>32 entries) can crash by exceeding the plugin's stack limits on macOS.

Power-of-Two Queue Size: The FIFO size should be a power of two (2, 4, 8, 16, 32...). The index wrap-around (index % Size) then compiles down to a cheap bitwise AND instead of an integer division, which matters on the audio thread.

Extended De-normal Protection: ScopedNoDenormals must wrap the entire processing chain, not just convolution. De-normals frequently arise during the down-sample stage of over-sampled plugins, causing severe CPU performance degradation if not protected.

The Problem: Blocking Filter Design

With the naive approach, users will experience audio dropouts during parameter changes, sluggish UI response when adjusting controls, and poor user experience compared to IIR-based plugins.

// This is wrong because it blocks the audio thread...
void setCrossoverFrequency(float newFreq)
{
    crossoverFreq = newFreq;

    // So this ends up taking up to 10-50ms for large filter orders!
    auto [bassIR, midIR, trebleIR] = designComplementaryFilterBank(
        crossoverFreq, highFreq, 2048);  // 2048 taps = ~25ms design time

    // This will work fine, but it doesn't matter if we've already blocked the thread
    bassConvolution.loadImpulseResponse(std::move(bassIR), sampleRate, ...);

    // Don't forget... if the audio thread is blocked, you're going to get dropouts.
}

The Lock-Free Solution: Background Processing

The secret is to handle the filter design asynchronously. This means the impulse response should be calculated on a background thread, the actual loading should happen on the audio thread, and the juce::AudioBuffer conveying the impulse should be moved, never copied.

template<typename T, size_t Size>
class LockFreeFIFO
{
    static_assert((Size & (Size - 1)) == 0, "FIFO size must be a power of two for optimal cache performance");

public:
    LockFreeFIFO() : buffer(std::make_unique<T[]>(Size)) {} // indices default to 0

    bool push(T&& item)
    {
        const auto currentWrite = writeIndex.load(std::memory_order_relaxed);
        const auto nextWrite = (currentWrite + 1) % Size;

        if (nextWrite == readIndex.load(std::memory_order_acquire))
            return false; // The queue is full

        buffer[currentWrite] = std::move(item);
        writeIndex.store(nextWrite, std::memory_order_release);
        return true;
    }

    bool pop(T& item)
    {
        const auto currentRead = readIndex.load(std::memory_order_relaxed);

        if (currentRead == writeIndex.load(std::memory_order_acquire))
            return false; // The queue is empty

        item = std::move(buffer[currentRead]);
        readIndex.store((currentRead + 1) % Size, std::memory_order_release);
        return true;
    }

private:
    std::unique_ptr<T[]> buffer;
    std::atomic<size_t> writeIndex{0};
    std::atomic<size_t> readIndex{0};
};

This is a wait-free data structure. The audio thread never blocks, even if the background thread is busy designing filters.

IR Package: Your Data Container

We need a container to transfer the designed filters:

struct IRPackage
{
    juce::AudioBuffer<float> bassIR;
    juce::AudioBuffer<float> midIR;
    juce::AudioBuffer<float> trebleIR;
    double designTime{0.0};
    int filterOrder{0};

    // Move-only semantics. No Copying.
    IRPackage() = default;
    IRPackage(IRPackage&&) = default;
    IRPackage& operator=(IRPackage&&) = default;
};

The Background Design Thread

Here's where the magic happens. All filter design moves to a dedicated background thread. Its lifecycle is managed by our MultiBandProcessor class.

class MultibandProcessor
{
private:
    static constexpr size_t IR_QUEUE_SIZE = 8;
    LockFreeFIFO<IRPackage, IR_QUEUE_SIZE> irQueue;

    std::thread filterDesignThread;
    std::atomic<bool> shouldStopThread{false};
    std::atomic<bool> filterUpdateNeeded{false};
    std::mutex filterUpdateMutex;
    std::condition_variable filterUpdateCondition;

    std::atomic<float> currentLowFreq{250.0f};
    std::atomic<float> currentHighFreq{2500.0f};

public:
    MultibandProcessor()
    {
        startFilterDesignThread();
    }

    ~MultibandProcessor()
    {
        stopFilterDesignThread();
    }
};

Now parameter changes are instantaneous...

void setCrossoverFrequency(float newFreq)
{
    currentLowFreq.store(newFreq);
    filterUpdateNeeded.store(true);
    filterUpdateCondition.notify_one();
}
void startFilterDesignThread()
{
    filterDesignThread = std::thread([this]() {
        while (!shouldStopThread.load())
        {
            std::unique_lock<std::mutex> lock(filterUpdateMutex);
            filterUpdateCondition.wait(lock, [this] { 
                return filterUpdateNeeded.load() || shouldStopThread.load(); 
            });

            if (shouldStopThread.load())
                break;

            if (filterUpdateNeeded.load())
            {
                filterUpdateNeeded.store(false);
                lock.unlock();

                designFiltersAsync();
            }
        }
    });
}
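The destructor above calls stopFilterDesignThread(), which the snippets don't show. Here's a minimal matching sketch; the important detail is to set the flag and notify the condition variable before joining, or the design thread can sleep forever:

void stopFilterDesignThread()
{
    shouldStopThread.store(true);
    filterUpdateCondition.notify_one(); // wake the thread so it observes the flag

    if (filterDesignThread.joinable())
        filterDesignThread.join();
}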

void designFiltersAsync()
{
    auto startTime = juce::Time::getMillisecondCounterHiRes();

    auto [bassIR, midIR, trebleIR] = designComplementaryFilterBank(
        currentLowFreq.load(), currentHighFreq.load(), currentFilterOrder);

    auto endTime = juce::Time::getMillisecondCounterHiRes();
    double designTime = endTime - startTime;

    IRPackage package;
    package.bassIR = std::move(bassIR);
    package.midIR = std::move(midIR);
    package.trebleIR = std::move(trebleIR);
    package.designTime = designTime;
    package.filterOrder = currentFilterOrder;

    if (!irQueue.push(std::move(package)))
    {
        DBG("IR queue full - dropping filter update (rare and OK)");
    }
    else
    {
        DBG("Filter design completed in " << designTime << "ms - queued for audio thread");
    }
}

The audio thread just checks the queue and loads any available filters...

void processFIRFiltering(int numChannels, int numSamples)
{
    processIRQueue(); // pick up any freshly designed IRs first

    juce::dsp::AudioBlock<float> bassBlock(tempBufferBass);
    juce::dsp::ProcessContextReplacing<float> bassContext(bassBlock);
    bassConvolution.process(bassContext);

    // ... and likewise for the mid and treble band buffers
}

void processIRQueue()
{
    IRPackage pkg;
    while (irQueue.pop(pkg))
    {
        bassConvolution.loadImpulseResponse(
            std::move(pkg.bassIR),
            sampleRate,
            juce::dsp::Convolution::Stereo::no,
            juce::dsp::Convolution::Trim::no,
            juce::dsp::Convolution::Normalise::no);

        midConvolution.loadImpulseResponse(
            std::move(pkg.midIR), sampleRate, ...);

        trebleConvolution.loadImpulseResponse(
            std::move(pkg.trebleIR), sampleRate, ...);

        DBG("IRs loaded on audio thread (design time: " 
            << pkg.designTime << "ms)");
    }
}

Wrapping Up

To Conclude

Honestly, multi-band processing is one of those topics where every piece matters. JUCE doesn't provide linear-phase filters out of the box. Although that's inconvenient, it does provide the tools to build them yourself. It comes down to your ingenuity and willingness to piece together the information you need to get the job done.

That brings me to the most important part. Part of developing plugins is writing code; the rest is an exercise in testing and quality control. In the next post, I'll go over some of the automated and manual testing I did to ensure the filters operated the way I wanted them to.

There are certain things I'd like to explore... such as implementing the Confined Gaussian window, as it's suggested that it has the most focused time-frequency shape for a given window width. Although JUCE doesn't provide the function, the approximation is fairly simple to calculate.

Major thanks to Simon Weis from Dark Palace Studios for sparking my inspiration, going through the first draft and making some suggestions.

References

[1] JUCE documentation: dsp::FIR::Filter. https://docs.juce.com/master/classdsp_1_1FIR_1_1Filter.html
[2] JUCE documentation: dsp::Convolution. https://docs.juce.com/master/classdsp_1_1Convolution.html
[3] JUCE documentation: dsp::FilterDesign. https://docs.juce.com/master/structdsp_1_1FilterDesign.html
[4] O. Hinton, "Design of FIR Filters," University of Newcastle upon Tyne, Ch. 4. https://www.staff.ncl.ac.uk/oliver.hinton/eee305/Chapter4.pdf
[5] "Blackman Window – Overview," ScienceDirect Topics. https://www.sciencedirect.com/topics/engineering/blackman-window
[6] JUCE Forum, "When to use ScopedNoDenormals and when to not?" https://forum.juce.com/t/when-to-use-scopednodenormals-and-when-to-not/37112
[7] J. O. Smith, "Perfect Reconstruction Filter Banks," Spectral Audio Signal Processing. https://www.dsprelated.com/freebooks/sasp/Perfect_Reconstruction_Filter_Banks.html
[8] Rane Corporation, "Linkwitz-Riley Crossovers: A Primer." https://www.ranecommercial.com/legacy/note160.html
[9] JUCE Forum, "How do I implement a non-uniform partitioned convolution without extreme CPU usage?" https://forum.juce.com/t/how-do-i-implement-a-non-uniform-partitioned-convolution-without-extreme-cpu-usage/66203
[10] JUCE Forum, "Building high pass filter using low pass (FIR)." https://forum.juce.com/t/building-high-pass-filter-using-low-pass-fir/29986
[11] JUCE documentation: dsp::Oversampling class reference. https://docs.juce.com/master/classdsp_1_1Oversampling.html
