Adaptive Compression for Cache Systems with Hexagonal Architecture

Daniele Frasca

Any API at significant scale adopts some form of caching to reduce database load and speed up response times. However, as applications scale, the size and variety of cached data introduce their own challenges:

  • Large objects consume excessive memory in the cache

  • Network bandwidth becomes a bottleneck for data transfer

  • Cache costs increase with data size

As there is no one-size-fits-all solution, I had an idea:

  1. Design a caching system that could efficiently handle diverse data sizes and types

  2. Create a compression strategy that adapts to different data characteristics automatically

  3. Ensure the solution works across different caching services (Redis, Memcached...)

  4. Guarantee that compression benefits outweigh its processing costs

With hexagonal architecture, I can decouple the caching logic from specific providers:

// Sample architecture overview
interface CachePort {
  get(key: string): Promise<any>;
  set(key: string, value: any, ttl?: number): Promise<boolean>;
  getBatch(keys: string[]): Promise<any[]>;
  setBatch(items: {key: string, value: any}[], ttl?: number): Promise<boolean[]>;
}

class RedisAdapter implements CachePort { /* ... */ }
class MemcachedAdapter implements CachePort { /* ... */ }
class OtherAdapter implements CachePort { /* ... */ }

This architecture enables compression to be implemented at the port level, ensuring it is available regardless of which cache provider is used.
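
To make "compression at the port level" concrete, here is a minimal sketch of one possible wiring: a decorator that wraps any adapter behind the same CachePort. The CompressionCacheDecorator name and its blanket compress-everything behaviour are my own illustration here (the adaptive client later in this post compresses selectively), and it relies on the compress/decompress helpers shown in the next section.

// Illustrative only: compression applied uniformly at the port boundary.
class CompressionCacheDecorator implements CachePort {
  constructor(private inner: CachePort) {}

  async get(key: string): Promise<any> {
    const raw = await this.inner.get(key);
    return raw == null ? raw : JSON.parse(decompress(raw));
  }

  async set(key: string, value: any, ttl?: number): Promise<boolean> {
    return this.inner.set(key, compress(JSON.stringify(value)), ttl);
  }

  async getBatch(keys: string[]): Promise<any[]> {
    const raws = await this.inner.getBatch(keys);
    return raws.map((raw) => (raw == null ? raw : JSON.parse(decompress(raw))));
  }

  async setBatch(items: { key: string; value: any }[], ttl?: number): Promise<boolean[]> {
    return this.inner.setBatch(
      items.map((i) => ({ key: i.key, value: compress(JSON.stringify(i.value)) })),
      ttl,
    );
  }
}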

The adaptive compression module bases its decisions on the following:

  1. Data size (small objects aren't compressed)

  2. Operation type (single vs batch operations)

  3. Content type (some data compresses better than others)

The compression implementation looks like this:

import * as zlib from "zlib";

export enum CompressionType {
  BROTLI = "br",
  GZIP = "gzip",
  NONE = "none",
}

// Compression markers (first byte of the compressed data)
const COMPRESSION_MARKER = {
  NONE: 0,
  BROTLI: 1,
  GZIP: 2,
};

export const compressionStats = {
  enabled: process.env.ENV !== "prod", // Disable in production by default
  totalBytesInput: 0,
  totalBytesOutput: 0,
  totalSaved: 0,
  compressionRatio: 0,
  itemsCompressed: 0,
  reset: function () {
    this.totalBytesInput = 0;
    this.totalBytesOutput = 0;
    this.totalSaved = 0;
    this.itemsCompressed = 0;
    this.compressionRatio = 0;
  },
};

export function compress(value: string, options?: { preferredCompression?: CompressionType }): Uint8Array {
  const preferredCompression = options?.preferredCompression || CompressionType.GZIP;
  let compressedData: Uint8Array;
  let marker: number;
  const originalSize = Buffer.byteLength(value); // byte length, not character count

  switch (preferredCompression) {
    case CompressionType.BROTLI:
      console.debug(`Brotli compression: ${originalSize} bytes input`);
      compressedData = zlib.brotliCompressSync(Buffer.from(value));
      marker = COMPRESSION_MARKER.BROTLI;
      break;

    case CompressionType.GZIP:
      console.debug(`GZIP compression: ${originalSize} bytes input`);
      compressedData = zlib.gzipSync(Buffer.from(value));
      marker = COMPRESSION_MARKER.GZIP;
      break;

    default:
      // NONE type - no actual compression, add format marker
      console.debug(`No compression: ${originalSize} bytes`);
      compressedData = Buffer.from(value);
      marker = COMPRESSION_MARKER.NONE;
  }

  // Create a new buffer with marker byte at the beginning
  const result = new Uint8Array(compressedData.length + 1);
  result[0] = marker;
  result.set(compressedData, 1);

  // Update stats only if enabled (conditional)
  if (compressionStats.enabled) {
    compressionStats.totalBytesInput += originalSize;
    compressionStats.totalBytesOutput += result.length;
    compressionStats.totalSaved += originalSize - result.length;
    compressionStats.itemsCompressed += 1;
    compressionStats.compressionRatio = compressionStats.totalBytesOutput / compressionStats.totalBytesInput;

    // Periodic reset to avoid potential memory issues in long-running processes
    if (compressionStats.itemsCompressed > 1000000) {
      // Reset after 1 million items
      compressionStats.reset();
    }
  }

  console.debug(
    `Compression: ${originalSize} → ${result.length} bytes (${Math.round((result.length / originalSize) * 100)}%)`,
  );

  return result;
}

export function decompress(data: Uint8Array | string): string {
  if (typeof data === "string") {
    return data; // If it's a string, assume it's uncompressed
  }

  if (data.length === 0) {
    return ""; // Empty data
  }

  const marker = data[0];
  const compressedData = data.subarray(1); // Remove the marker byte

  try {
    let result: string;

    switch (marker) {
      case COMPRESSION_MARKER.BROTLI:
        result = zlib.brotliDecompressSync(compressedData).toString();
        break;

      case COMPRESSION_MARKER.GZIP:
        result = zlib.gunzipSync(compressedData).toString();
        break;

      case COMPRESSION_MARKER.NONE:
        result = Buffer.from(compressedData).toString();
        break;

      default:
        // For backwards compatibility and resiliency
        return Buffer.from(data).toString();
    }

    return result;
  } catch (error) {
    console.error(`Decompression error with marker ${marker}: ${error}`);
    // Last resort fallback
    return Buffer.from(compressedData).toString();
  }
}

The key aspect of this compression module is making the data "self-describing." By embedding a single-byte marker at the beginning of each compressed payload, the data contains information about how it was compressed, allowing me to determine which decompression algorithm to use. Uncompressed data can flow through the same pipeline as compressed data, and I can interchange my preferred compression at any moment. Another nice feature of this module is that I can add a new compression algorithm anytime. If, in the future, Node supports, for example, Zstd, I will add a new marker value. The marker-based compression system represents a simple and extendable pattern that brings significant power to data compression.
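
As a quick illustration (a hypothetical round trip I am adding here, not part of the module itself), the marker byte can be inspected directly:

// Round trip through the compression module above.
const payload = JSON.stringify({ user: "alice", roles: ["admin", "editor"] });

const packed = compress(payload, { preferredCompression: CompressionType.BROTLI });
console.log(packed[0]); // 1 → COMPRESSION_MARKER.BROTLI, the self-describing byte

const restored = decompress(packed);
console.log(restored === payload); // true — decompress picked Brotli from the marker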

The cache client uses the compression module with adaptive logic, and it can look like this:

interface Options {
  compressionThreshold?: number;
  preferredCompression?: CompressionType;
  batchSize?: number;
}

class CacheClient {
  private cacheAdapter: CachePort;
  private compressionThreshold: number;
  private preferredCompression: CompressionType;
  private batchSize: number;

  constructor(adapter: CachePort, options: Options) {
    this.cacheAdapter = adapter;
    this.compressionThreshold = options.compressionThreshold || 2048; // 2KB default
    this.preferredCompression = options.preferredCompression || CompressionType.GZIP;
    this.batchSize = options.batchSize || 100;
  }

  // Single item storage with adaptive compression
  async set(key: string, value: any, ttl?: number): Promise<boolean> {
    const stringValue = JSON.stringify(value);
    let compressedValue;

    // Apply adaptive compression logic based on data size
    if (stringValue.length < this.compressionThreshold) {
      // Small values skip compression - not worth the CPU cost
      compressedValue = value;
    } else if (stringValue.length > 50 * 1024) {
      // Large values (>50KB) use the configured preferred algorithm
      compressedValue = {
        compressed: true,
        data: compress(stringValue, { preferredCompression: this.preferredCompression })
      };
    } else {
      // Medium values always use GZIP for better speed/ratio balance
      compressedValue = {
        compressed: true,
        data: compress(stringValue, { preferredCompression: CompressionType.GZIP })
      };
    }

    return this.cacheAdapter.set(key, compressedValue, ttl);
  }

  // Retrieval with automatic decompression
  async get(key: string): Promise<any> {
    const result = await this.cacheAdapter.get(key);

    if (!result) return null;

    // Handle compressed data
    if (result.compressed && result.data) {
      const decompressedValue = decompress(result.data);
      return JSON.parse(decompressedValue);
    }

    return result;
  }

  // Batch storage with adaptive logic
  async setBatch(items: { key: string; value: any }[], ttl?: number): Promise<boolean[]> {
    const batchSize = items.length;
    const processedItems: { key: string; value: any }[] = [];

    for (const item of items) {
      const stringValue = JSON.stringify(item.value);

      // Adjust threshold based on batch size
      // As batch size increases, compression threshold decreases
      const effectiveThreshold = Math.max(1024, this.compressionThreshold / Math.sqrt(batchSize));

      // Determine best compression algorithm for this batch item
      let compressionType: CompressionType;
      if (stringValue.length < effectiveThreshold) {
        // Skip compression for very small values
        processedItems.push({ key: item.key, value: item.value });
        continue;
      } else if (batchSize >= 20 || stringValue.length > 10 * 1024) {
        // Use Brotli for large batches or large individual items
        compressionType = CompressionType.BROTLI;
      } else {
        // Use GZIP for smaller batches with medium-sized items
        compressionType = CompressionType.GZIP;
      }

      processedItems.push({
        key: item.key,
        value: {
          compressed: true,
          data: compress(stringValue, { preferredCompression: compressionType })
        }
        }
      });
    }

    // Process in chunks to avoid overwhelming the cache service
    const results: boolean[] = [];
    for (let i = 0; i < processedItems.length; i += this.batchSize) {
      const chunk = processedItems.slice(i, i + this.batchSize);
      const chunkResults = await this.cacheAdapter.setBatch(chunk, ttl);
      results.push(...chunkResults);
    }

    return results;
  }

  // Batch retrieval with automatic decompression
  async getBatch(keys: string[]): Promise<any[]> {
    const results: any[] = [];

    for (let i = 0; i < keys.length; i += this.batchSize) {
      const chunk = keys.slice(i, i + this.batchSize);
      const chunkResults = await this.cacheAdapter.getBatch(chunk);

      for (const result of chunkResults) {
        if (!result) {
          results.push(null);
        } else if (result.compressed && result.data) {
          // Decompress and parse
          const decompressedValue = decompress(result.data);
          results.push(JSON.parse(decompressedValue));
        } else {
          results.push(result);
        }
      }
    }

    return results;
  }
}

The setBatch operation adjusts the compression threshold based on batch size: as the batch size increases, I lower the size threshold at which compression activates. With a default threshold of 2KB (a short worked example follows this list):

  • Single operation: 2048 bytes

  • Batch of 4 items: ~1024 bytes

  • Batch of 25 items: ~410 bytes by the formula, floored to the 1024-byte minimum
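
Here is the worked example, using the effectiveThreshold formula from setBatch with its 2048-byte default:

// effectiveThreshold = Math.max(1024, compressionThreshold / Math.sqrt(batchSize))
const compressionThreshold = 2048;
for (const batchSize of [1, 4, 25]) {
  const effective = Math.max(1024, compressionThreshold / Math.sqrt(batchSize));
  console.log(`batch of ${batchSize}: ${Math.round(effective)} bytes`);
}
// batch of 1: 2048 bytes
// batch of 4: 1024 bytes
// batch of 25: 1024 bytes (409.6 by the raw formula, floored at the 1KB minimum)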

The batch logic also selects the compression algorithm: Brotli for large batches or large individual items, and GZIP for smaller batches with medium-sized items. Additionally, writes are chunked to avoid overwhelming the cache service.

I have run multiple tests (a minimal benchmark sketch follows this list):

  • Tests both compression types (GZIP, Brotli) and uncompressed operations

  • Measures performance metrics across different operations and data sizes
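
For reference, this is roughly the kind of micro-benchmark I mean; the payload shape, size and iteration count are illustrative, not my exact test harness:

import * as zlib from "zlib";

// Rough timing comparison between GZIP and Brotli on a synthetic JSON payload.
const payload = Buffer.from(
  JSON.stringify({ items: Array.from({ length: 1000 }, (_, i) => ({ id: i, name: `item-${i}` })) }),
);

function time(label: string, fn: () => void, runs = 50): void {
  const start = process.hrtime.bigint();
  for (let i = 0; i < runs; i++) fn();
  const ms = Number(process.hrtime.bigint() - start) / 1e6 / runs;
  console.log(`${label}: ${ms.toFixed(2)}ms avg over ${runs} runs`);
}

time("gzip", () => zlib.gzipSync(payload));
time("brotli", () => zlib.brotliCompressSync(payload));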

I have noticed the following:

  • Compression ratios: Both GZIP and Brotli achieve excellent compression (≈70-99%)

  • Single operations: GZIP is faster for SET operations (≈100ms vs 217ms for Brotli)

  • Batch operations: Brotli provides better read performance (≈18% faster than GZIP)

  • Memory efficiency: Compressed storage uses only ≈27-31% of the original size

In terms of size, my conclusions are:

  • Small values (<2KB): Not compressed, minimal overhead applied

  • Medium values (2KB-50KB): Compressed with GZIP for better speed

  • Large values (>50KB): Compressed with configured algorithm

  • Small batches (<20 items): Use GZIP for better speed

  • Large batches (≥20 items): Use Brotli for better compression

The compression ratios are the following:

| Data Type | Original Size | GZIP Size | Brotli Size | Compression Ratio |
| --- | --- | --- | --- | --- |
| Medium JSON | 4KB | 1.2KB | 1.1KB | 73-75% |
| Large batch | 5KB per item | 55 bytes | 29 bytes | 99.4-99.5% |
| String data | 2KB | 46 bytes | 27 bytes | 98-99% |
| Large object | 100KB | 149 bytes | - | 99.9% |

Compression decisions are made at write time only (set or setBatch). This means the following (a usage sketch follows this list):

  • If I set a 1KB item individually (uncompressed), then later include it in a getBatch operation, it remains uncompressed.

  • If I setBatch 25 items of 1KB each (compressed due to batch size), then later get a single item, it will be automatically decompressed.

  • The client automatically handles decompression regardless of which operation was used to retrieve the data.
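
To make the write-time behaviour concrete, here is a hypothetical usage sketch; the adapter construction, keys and payloads are placeholders, not my real setup:

// Hypothetical wiring — values chosen only to trigger the rules described above.
const cache = new CacheClient(new RedisAdapter(), {
  compressionThreshold: 2048,
  preferredCompression: CompressionType.BROTLI,
  batchSize: 100,
});

const smallProfile = { id: 1, name: "alice" }; // tiny payload, well under the 2KB threshold
const items = Array.from({ length: 25 }, (_, i) => ({
  key: `profile:${i}`,
  value: { id: i, bio: "x".repeat(1024) }, // ~1KB each
}));

// Inside an async function:
// Small item set individually → stored uncompressed (below the 2KB threshold).
await cache.set("profile:solo", smallProfile, 3600);

// 25 × ~1KB items set as a batch → compressed, because the larger batch
// lowers the effective threshold.
await cache.setBatch(items, 3600);

// Reads are transparent either way: the client inspects the stored shape and
// decompresses only when needed.
const solo = await cache.get("profile:solo");
const batch = await cache.getBatch(items.map((i) => i.key));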

Some scenarios observed while running the tests:

| Scenario | What Happens | Algorithm | Reason | Notes |
| --- | --- | --- | --- | --- |
| 10 items × 5KB each | All compressed individually | GZIP | Each item > 2KB, batch size < 20 | Each 5KB item → 55 bytes (99% reduction) |
| 25 items × 5KB each | All compressed individually | Brotli | Each item > 2KB, batch size ≥ 20 | Each 5KB item → 29 bytes (99.4% reduction) |
| 5 items × 15KB each | All compressed individually | Brotli | Individual items > 10KB | Better compression for large values |
| 25 items × 1KB each | All compressed individually | Brotli | Threshold adjusted for large batches | Items below standard threshold but compressed due to batch size |
| 10 items × 1KB each | May be compressed | GZIP | Effective threshold becomes ~1024 bytes | Compression depends on exact size vs threshold |
| 25 items × 1KB each | All compressed | Brotli | Large batch lowers threshold to ~1024 bytes | Demonstrates adaptive threshold adjustment |
| 10 items × 2KB each | All compressed | GZIP | At standard threshold, batch size < 20 | Standard compression behavior |
| 25 items × 2KB each | All compressed | Brotli | Above threshold, batch size ≥ 20 | Algorithm selection based on batch size |

In short, on the performance side I found the following:

  • GZIP is consistently faster for medium and large data sizes

  • GZIP is approximately 37.9% faster than Brotli for larger data

  • Decompression speeds are comparable between GZIP and Brotli (< 5ms difference)

| Data Size | GZIP Compress | Brotli Compress | GZIP Decompress | Brotli Decompress | Winner |
| --- | --- | --- | --- | --- | --- |
| Small (10240 bytes) | 101.10ms | 99.77ms | 21.53ms | 23.65ms | Brotli |
| Medium (102400 bytes) | 22.77ms | 36.24ms | 21.85ms | 21.05ms | GZIP |
| Large (512000 bytes) | 23.67ms | 38.63ms | 22.44ms | 24.47ms | GZIP |

I did some calculations (I hope they are correct), and I think I can achieve the following goals:

  1. Reduced Costs: cache storage costs decrease by 78% by compressing large objects

  2. Improved Performance: API response times improved by 32% for operations involving large cached objects

  3. Increased Cache Hit Ratio: More data could fit in the same cache size, increasing our hit ratio

  4. Cache Provider Independence: I could use different cache providers without changing the code

I have also picked up some practical guidelines:

  • Use GZIP for frequent write operations with medium/large data (faster compression)

  • Consider Brotli for read-heavy workloads where the compression ratio is critical

  • Don't compress small objects under 2KB, as the overhead isn't worth it

  • For batch operations, adjusting compression thresholds based on batch size is beneficial

There are some trade-offs, as usual, to consider:

  1. CPU/Memory Usage: Compression and decompression increase CPU and memory utilisation. In my tests the increase was below 5%, but depending on your workload it could still matter

  2. Complex Logic: The adaptive rules add complexity to the caching layer

  3. Debugging Challenges: Compressed data is not readable for debugging.

