Adaptive Compression for Cache Systems with Hexagonal Architecture

Daniele Frasca

Any API at significant scale adopts some form of caching to reduce database load and speed up response times. However, as applications scale, the size and variety of cached data introduce their own challenges:

  • Large objects consume excessive memory in the cache

  • Network bandwidth becomes a bottleneck for data transfer

  • Cache costs increase with data size

As there is no one-size-fits-all solution, I had an idea:

  1. Design a caching system that could efficiently handle diverse data sizes and types

  2. Create a compression strategy that adapts to different data characteristics automatically

  3. Ensure the solution works across different caching services (Redis, Memcached...)

  4. Guarantee that compression benefits outweigh its processing costs

With hexagonal architecture, I can decouple the caching logic from specific providers:

// Sample architecture overview
interface CachePort {
  get(key: string): Promise<any>;
  set(key: string, value: any, ttl?: number): Promise<boolean>;
  getBatch(keys: string[]): Promise<any[]>;
  setBatch(items: {key: string, value: any}[], ttl?: number): Promise<boolean[]>;
}

class RedisAdapter implements CachePort { /* ... */ }
class MemcachedAdapter implements CachePort { /* ... */ }
class OtherAdapter implements CachePort { /* ... */ }

This architecture enables compression to be implemented at the port level, ensuring it is available regardless of which cache provider is used.
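
To make "compression at the port level" concrete, here is a minimal sketch of one possible wiring: a decorator that wraps any adapter behind the same CachePort. The CompressionCacheDecorator name and its blanket compress-everything behaviour are my own illustration here (the adaptive client later in this post compresses selectively), and it relies on the compress/decompress helpers shown in the next section.

// Illustrative only: compression applied uniformly at the port boundary.
class CompressionCacheDecorator implements CachePort {
  constructor(private inner: CachePort) {}

  async get(key: string): Promise<any> {
    const raw = await this.inner.get(key);
    return raw == null ? raw : JSON.parse(decompress(raw));
  }

  async set(key: string, value: any, ttl?: number): Promise<boolean> {
    return this.inner.set(key, compress(JSON.stringify(value)), ttl);
  }

  async getBatch(keys: string[]): Promise<any[]> {
    const raws = await this.inner.getBatch(keys);
    return raws.map((raw) => (raw == null ? raw : JSON.parse(decompress(raw))));
  }

  async setBatch(items: { key: string; value: any }[], ttl?: number): Promise<boolean[]> {
    return this.inner.setBatch(
      items.map((i) => ({ key: i.key, value: compress(JSON.stringify(i.value)) })),
      ttl,
    );
  }
}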

The adaptive compression module bases its decisions on the following:

  1. Data size (small objects aren't compressed)

  2. Operation type (single vs batch operations)

  3. Content type (some data compresses better than others)

The compression implementation looks like this:

import * as zlib from "zlib";

export enum CompressionType {
  BROTLI = "br",
  GZIP = "gzip",
  NONE = "none",
}

// Compression markers (first byte of the compressed data)
const COMPRESSION_MARKER = {
  NONE: 0,
  BROTLI: 1,
  GZIP: 2,
};

export const compressionStats = {
  enabled: process.env.ENV !== "prod", // Disable in production by default
  totalBytesInput: 0,
  totalBytesOutput: 0,
  totalSaved: 0,
  compressionRatio: 0,
  itemsCompressed: 0,
  reset: function () {
    this.totalBytesInput = 0;
    this.totalBytesOutput = 0;
    this.totalSaved = 0;
    this.itemsCompressed = 0;
    this.compressionRatio = 0;
  },
};

export function compress(value: string, options?: { preferredCompression?: CompressionType }): Uint8Array {
  const preferredCompression = options?.preferredCompression || CompressionType.GZIP;
  let compressedData: Uint8Array;
  let marker: number;
  const originalSize = Buffer.byteLength(value); // byte length, not character count

  switch (preferredCompression) {
    case CompressionType.BROTLI:
      console.debug(`Brotli compression: ${originalSize} bytes input`);
      compressedData = zlib.brotliCompressSync(Buffer.from(value));
      marker = COMPRESSION_MARKER.BROTLI;
      break;

    case CompressionType.GZIP:
      console.debug(`GZIP compression: ${originalSize} bytes input`);
      compressedData = zlib.gzipSync(Buffer.from(value));
      marker = COMPRESSION_MARKER.GZIP;
      break;

    default:
      // NONE type - no actual compression, add format marker
      console.debug(`No compression: ${originalSize} bytes`);
      compressedData = Buffer.from(value);
      marker = COMPRESSION_MARKER.NONE;
  }

  // Create a new buffer with marker byte at the beginning
  const result = new Uint8Array(compressedData.length + 1);
  result[0] = marker;
  result.set(compressedData, 1);

  // Update stats only if enabled (conditional)
  if (compressionStats.enabled) {
    compressionStats.totalBytesInput += originalSize;
    compressionStats.totalBytesOutput += result.length;
    compressionStats.totalSaved += originalSize - result.length;
    compressionStats.itemsCompressed += 1;
    compressionStats.compressionRatio = compressionStats.totalBytesOutput / compressionStats.totalBytesInput;

    // Periodic reset to avoid potential memory issues in long-running processes
    if (compressionStats.itemsCompressed > 1000000) {
      // Reset after 1 million items
      compressionStats.reset();
    }
  }

  console.debug(
    `Compression: ${originalSize} → ${result.length} bytes (${Math.round((result.length / originalSize) * 100)}%)`,
  );

  return result;
}

export function decompress(data: Uint8Array | string): string {
  if (typeof data === "string") {
    return data; // If it's a string, assume it's uncompressed
  }

  if (data.length === 0) {
    return ""; // Empty data
  }

  const marker = data[0];
  const compressedData = data.subarray(1); // Remove the marker byte

  try {
    let result: string;

    switch (marker) {
      case COMPRESSION_MARKER.BROTLI:
        result = zlib.brotliDecompressSync(compressedData).toString();
        break;

      case COMPRESSION_MARKER.GZIP:
        result = zlib.gunzipSync(compressedData).toString();
        break;

      case COMPRESSION_MARKER.NONE:
        result = Buffer.from(compressedData).toString();
        break;

      default:
        // For backwards compatibility and resiliency
        return Buffer.from(data).toString();
    }

    return result;
  } catch (error) {
    console.error(`Decompression error with marker ${marker}: ${error}`);
    // Last resort fallback
    return Buffer.from(compressedData).toString();
  }
}

The key aspect of this compression module is making the data "self-describing." By embedding a single-byte marker at the beginning of each compressed payload, the data contains information about how it was compressed, allowing me to determine which decompression algorithm to use. Uncompressed data can flow through the same pipeline as compressed data, and I can interchange my preferred compression at any moment. Another nice feature of this module is that I can add a new compression algorithm anytime. If, in the future, Node supports, for example, Zstd, I will add a new marker value. The marker-based compression system represents a simple and extendable pattern that brings significant power to data compression.
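
As a quick illustration (a hypothetical round trip I am adding here, not part of the module itself), the marker byte can be inspected directly:

// Round trip through the compression module above.
const payload = JSON.stringify({ user: "alice", roles: ["admin", "editor"] });

const packed = compress(payload, { preferredCompression: CompressionType.BROTLI });
console.log(packed[0]); // 1 → COMPRESSION_MARKER.BROTLI, the self-describing byte

const restored = decompress(packed);
console.log(restored === payload); // true — decompress picked Brotli from the marker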

The cache client uses the compression module with adaptive logic, and it can look like this:

interface Options {
  compressionThreshold?: number;
  preferredCompression?: CompressionType;
  batchSize?: number;
}

class CacheClient {
  private cacheAdapter: CachePort;
  private compressionThreshold: number;
  private preferredCompression: CompressionType;
  private batchSize: number;

  constructor(adapter: CachePort, options: Options) {
    this.cacheAdapter = adapter;
    this.compressionThreshold = options.compressionThreshold || 2048; // 2KB default
    this.preferredCompression = options.preferredCompression || CompressionType.GZIP;
    this.batchSize = options.batchSize || 100;
  }

  // Single item storage with adaptive compression
  async set(key: string, value: any, ttl?: number): Promise<boolean> {
    const stringValue = JSON.stringify(value);
    let compressedValue;

    // Apply adaptive compression logic based on data size
    if (stringValue.length < this.compressionThreshold) {
      // Small values skip compression - not worth the CPU cost
      compressedValue = value;
    } else if (stringValue.length > 50 * 1024) {
      // Large values (>50KB) use the configured preferred algorithm
      compressedValue = {
        compressed: true,
        data: compress(stringValue, { preferredCompression: this.preferredCompression })
      };
    } else {
      // Medium values always use GZIP for better speed/ratio balance
      compressedValue = {
        compressed: true,
        data: compress(stringValue, { preferredCompression: CompressionType.GZIP })
      };
    }

    return this.cacheAdapter.set(key, compressedValue, ttl);
  }

  // Retrieval with automatic decompression
  async get(key: string): Promise<any> {
    const result = await this.cacheAdapter.get(key);

    if (!result) return null;

    // Handle compressed data
    if (result.compressed && result.data) {
      const decompressedValue = decompress(result.data);
      return JSON.parse(decompressedValue);
    }

    return result;
  }

  // Batch storage with adaptive logic
  async setBatch(items: { key: string; value: any }[], ttl?: number): Promise<boolean[]> {
    const batchSize = items.length;
    const processedItems: { key: string; value: any }[] = [];

    for (const item of items) {
      const stringValue = JSON.stringify(item.value);

      // Adjust threshold based on batch size
      // As batch size increases, compression threshold decreases
      const effectiveThreshold = Math.max(1024, this.compressionThreshold / Math.sqrt(batchSize));

      // Determine best compression algorithm for this batch item
      let compressionType: CompressionType;
      if (stringValue.length < effectiveThreshold) {
        // Skip compression for very small values
        processedItems.push({ key: item.key, value: item.value });
        continue;
      } else if (batchSize >= 20 || stringValue.length > 10 * 1024) {
        // Use Brotli for large batches or large individual items
        compressionType = CompressionType.BROTLI;
      } else {
        // Use GZIP for smaller batches with medium-sized items
        compressionType = CompressionType.GZIP;
      }

      processedItems.push({
        key: item.key,
        value: {
          compressed: true,
          data: compress(stringValue, { preferredCompression: compressionType })
        }
        }
      });
    }

    // Process in chunks to avoid overwhelming the cache service
    const results: boolean[] = [];
    for (let i = 0; i < processedItems.length; i += this.batchSize) {
      const chunk = processedItems.slice(i, i + this.batchSize);
      const chunkResults = await this.cacheAdapter.setBatch(chunk, ttl);
      results.push(...chunkResults);
    }

    return results;
  }

  // Batch retrieval with automatic decompression
  async getBatch(keys: string[]): Promise<any[]> {
    const results: any[] = [];

    for (let i = 0; i < keys.length; i += this.batchSize) {
      const chunk = keys.slice(i, i + this.batchSize);
      const chunkResults = await this.cacheAdapter.getBatch(chunk);

      for (const result of chunkResults) {
        if (!result) {
          results.push(null);
        } else if (result.compressed && result.data) {
          // Decompress and parse
          const decompressedValue = decompress(result.data);
          results.push(JSON.parse(decompressedValue));
        } else {
          results.push(result);
        }
      }
    }

    return results;
  }
}

The setBatch operation adjusts the compression threshold based on batch size: as the batch size increases, I lower the size threshold at which compression activates. With a default threshold of 2KB (a short worked example follows this list):

  • Single operation: 2048 bytes

  • Batch of 4 items: ~1024 bytes

  • Batch of 25 items: ~410 bytes by the formula, floored to the 1024-byte minimum
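
Here is the worked example, using the effectiveThreshold formula from setBatch with its 2048-byte default:

// effectiveThreshold = Math.max(1024, compressionThreshold / Math.sqrt(batchSize))
const compressionThreshold = 2048;
for (const batchSize of [1, 4, 25]) {
  const effective = Math.max(1024, compressionThreshold / Math.sqrt(batchSize));
  console.log(`batch of ${batchSize}: ${Math.round(effective)} bytes`);
}
// batch of 1: 2048 bytes
// batch of 4: 1024 bytes
// batch of 25: 1024 bytes (409.6 by the raw formula, floored at the 1KB minimum)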

The batch logic also selects the compression algorithm: Brotli for large batches or large individual items, and GZIP for smaller batches with medium-sized items. Additionally, writes are chunked to avoid overwhelming the cache service.

I have run multiple tests (a minimal benchmark sketch follows this list):

  • Tests both compression types (GZIP, Brotli) and uncompressed operations

  • Measures performance metrics across different operations and data sizes
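
For reference, this is roughly the kind of micro-benchmark I mean; the payload shape, size and iteration count are illustrative, not my exact test harness:

import * as zlib from "zlib";

// Rough timing comparison between GZIP and Brotli on a synthetic JSON payload.
const payload = Buffer.from(
  JSON.stringify({ items: Array.from({ length: 1000 }, (_, i) => ({ id: i, name: `item-${i}` })) }),
);

function time(label: string, fn: () => void, runs = 50): void {
  const start = process.hrtime.bigint();
  for (let i = 0; i < runs; i++) fn();
  const ms = Number(process.hrtime.bigint() - start) / 1e6 / runs;
  console.log(`${label}: ${ms.toFixed(2)}ms avg over ${runs} runs`);
}

time("gzip", () => zlib.gzipSync(payload));
time("brotli", () => zlib.brotliCompressSync(payload));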

I have noticed the following:

  • Compression ratios: Both GZIP and Brotli achieve excellent compression (≈70-99%)

  • Single operations: GZIP is faster for SET operations (≈100ms vs 217ms for Brotli)

  • Batch operations: Brotli provides better read performance (≈18% faster than GZIP)

  • Memory efficiency: Compressed storage uses only ≈27-31% of the original size

In terms of size, my conclusions are:

  • Small values (<2KB): Not compressed, minimal overhead applied

  • Medium values (2KB-50KB): Compressed with GZIP for better speed

  • Large values (>50KB): Compressed with configured algorithm

  • Small batches (<20 items): Use GZIP for better speed

  • Large batches (≥20 items): Use Brotli for better compression

The compression ratios are the following:

| Data Type | Original Size | GZIP Size | Brotli Size | Compression Ratio |
| --- | --- | --- | --- | --- |
| Medium JSON | 4KB | 1.2KB | 1.1KB | 73-75% |
| Large batch | 5KB per item | 55 bytes | 29 bytes | 99.4-99.5% |
| String data | 2KB | 46 bytes | 27 bytes | 98-99% |
| Large object | 100KB | 149 bytes | - | 99.9% |

Compression decisions are made at write time only (set or setBatch). This means the following (a usage sketch follows this list):

  • If I set a 1KB item individually (uncompressed), then later include it in a getBatch operation, it remains uncompressed.

  • If I setBatch 25 items of 1KB each (compressed due to batch size), then later get a single item, it will be automatically decompressed.

  • The client automatically handles decompression regardless of which operation was used to retrieve the data.
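
To make the write-time behaviour concrete, here is a hypothetical usage sketch; the adapter construction, keys and payloads are placeholders, not my real setup:

// Hypothetical wiring — values chosen only to trigger the rules described above.
const cache = new CacheClient(new RedisAdapter(), {
  compressionThreshold: 2048,
  preferredCompression: CompressionType.BROTLI,
  batchSize: 100,
});

const smallProfile = { id: 1, name: "alice" }; // tiny payload, well under the 2KB threshold
const items = Array.from({ length: 25 }, (_, i) => ({
  key: `profile:${i}`,
  value: { id: i, bio: "x".repeat(1024) }, // ~1KB each
}));

// Inside an async function:
// Small item set individually → stored uncompressed (below the 2KB threshold).
await cache.set("profile:solo", smallProfile, 3600);

// 25 × ~1KB items set as a batch → compressed, because the larger batch
// lowers the effective threshold.
await cache.setBatch(items, 3600);

// Reads are transparent either way: the client inspects the stored shape and
// decompresses only when needed.
const solo = await cache.get("profile:solo");
const batch = await cache.getBatch(items.map((i) => i.key));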

Some scenarios observed while running the tests:

| Scenario | What Happens | Algorithm | Reason | Notes |
| --- | --- | --- | --- | --- |
| 10 items × 5KB each | All compressed individually | GZIP | Each item > 2KB, batch size < 20 | Each 5KB item → 55 bytes (99% reduction) |
| 25 items × 5KB each | All compressed individually | Brotli | Each item > 2KB, batch size ≥ 20 | Each 5KB item → 29 bytes (99.4% reduction) |
| 5 items × 15KB each | All compressed individually | Brotli | Individual items > 10KB | Better compression for large values |
| 25 items × 1KB each | All compressed individually | Brotli | Threshold adjusted for large batches | Items below standard threshold but compressed due to batch size |
| 10 items × 1KB each | May be compressed | GZIP | Effective threshold becomes ~1024 bytes | Compression depends on exact size vs threshold |
| 25 items × 1KB each | All compressed | Brotli | Large batch lowers threshold to ~1024 bytes | Demonstrates adaptive threshold adjustment |
| 10 items × 2KB each | All compressed | GZIP | At standard threshold, batch size < 20 | Standard compression behavior |
| 25 items × 2KB each | All compressed | Brotli | Above threshold, batch size ≥ 20 | Algorithm selection based on batch size |

In short, on the performance side I found the following:

  • GZIP is consistently faster for medium and large data sizes

  • GZIP is approximately 37.9% faster than Brotli for larger data

  • Decompression speeds are comparable between GZIP and Brotli (< 5ms difference)

| Data Size | GZIP Compress | Brotli Compress | GZIP Decompress | Brotli Decompress | Winner |
| --- | --- | --- | --- | --- | --- |
| Small (10240 bytes) | 101.10ms | 99.77ms | 21.53ms | 23.65ms | Brotli |
| Medium (102400 bytes) | 22.77ms | 36.24ms | 21.85ms | 21.05ms | GZIP |
| Large (512000 bytes) | 23.67ms | 38.63ms | 22.44ms | 24.47ms | GZIP |

I did some calculations (I hope they are correct), and I think I can achieve the following goals:

  1. Reduced Costs: cache storage costs decrease by 78% by compressing large objects

  2. Improved Performance: API response times improved by 32% for operations involving large cached objects

  3. Increased Cache Hit Ratio: More data could fit in the same cache size, increasing our hit ratio

  4. Cache Provider Independence: I could use different cache providers without changing the code

I have also picked up some practical guidelines:

  • Use GZIP for frequent write operations with medium/large data (faster compression)

  • Consider Brotli for read-heavy workloads where the compression ratio is critical

  • Don't compress small objects under 2KB, as the overhead isn't worth it

  • For batch operations, adjusting compression thresholds based on batch size is beneficial

There are some trade-offs, as usual, to consider:

  1. CPU/Memory Usage: Compression and decompression increase CPU and memory utilisation. In my tests the increase was below 5%, but depending on your workload it could still matter

  2. Complex Logic: The adaptive rules add complexity to the caching layer

  3. Debugging Challenges: Compressed data is not readable for debugging.

