Adaptive Compression for Cache Systems with Hexagonal Architecture

Any API operating at significant scale adopts some form of caching to reduce database load and speed up response times. However, as applications scale, the size and variety of cached data introduce new challenges:
- Large objects consume excessive memory in the cache
- Network bandwidth becomes a bottleneck for data transfer
- Cache costs increase with data size
Since no single solution covers every case, I set myself the following goals:
- Design a caching system that can efficiently handle diverse data sizes and types
- Create a compression strategy that adapts to different data characteristics automatically
- Ensure the solution works across different caching services (Redis, Memcached...)
- Guarantee that compression benefits outweigh its processing costs
With hexagonal architecture, I can decouple the caching logic from specific providers:
```typescript
// Sample architecture overview
interface CachePort {
  get(key: string): Promise<any>;
  set(key: string, value: any, ttl?: number): Promise<boolean>;
  getBatch(keys: string[]): Promise<any[]>;
  setBatch(items: { key: string; value: any }[], ttl?: number): Promise<boolean[]>;
}

class RedisAdapter implements CachePort { /* ... */ }
class MemcachedAdapter implements CachePort { /* ... */ }
class OtherAdapter implements CachePort { /* ... */ }
```
This architecture lets me implement compression at the port level, so it is available regardless of which cache provider is used.
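For illustration, here is a minimal sketch of how the `RedisAdapter` stub above could be fleshed out. It assumes the `ioredis` client; how the `{ compressed, data }` envelope produced by the cache client is marshalled to bytes on the wire is deliberately left out, since this only shows the port wiring:

```typescript
import Redis from "ioredis";

// Minimal sketch of a Redis-backed adapter (assumes ioredis; value marshalling is simplified)
class RedisAdapter implements CachePort {
  constructor(private readonly redis: Redis = new Redis()) {}

  async get(key: string): Promise<any> {
    return this.redis.getBuffer(key); // return raw bytes; decompression happens above this layer
  }

  async set(key: string, value: any, ttl?: number): Promise<boolean> {
    const reply = ttl
      ? await this.redis.set(key, value, "EX", ttl)
      : await this.redis.set(key, value);
    return reply === "OK";
  }

  async getBatch(keys: string[]): Promise<any[]> {
    return this.redis.mgetBuffer(...keys);
  }

  async setBatch(items: { key: string; value: any }[], ttl?: number): Promise<boolean[]> {
    const pipeline = this.redis.pipeline();
    for (const { key, value } of items) {
      ttl ? pipeline.set(key, value, "EX", ttl) : pipeline.set(key, value);
    }
    const replies = (await pipeline.exec()) ?? [];
    return replies.map(([err, reply]) => !err && reply === "OK");
  }
}
```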
The adaptive compression module bases its decisions on the following:
- Data size (small objects aren't compressed)
- Operation type (single vs batch operations)
- Content type (some data compresses better than others)
The compression implementation looks like this:
```typescript
import * as zlib from "zlib";

export enum CompressionType {
  BROTLI = "br",
  GZIP = "gzip",
  NONE = "none",
}

// Compression markers (first byte of the compressed data)
const COMPRESSION_MARKER = {
  NONE: 0,
  BROTLI: 1,
  GZIP: 2,
};

export const compressionStats = {
  enabled: process.env.ENV !== "prod", // Disable in production by default
  totalBytesInput: 0,
  totalBytesOutput: 0,
  totalSaved: 0,
  compressionRatio: 0,
  itemsCompressed: 0,
  reset: function () {
    this.totalBytesInput = 0;
    this.totalBytesOutput = 0;
    this.totalSaved = 0;
    this.itemsCompressed = 0;
    this.compressionRatio = 0;
  },
};

export function compress(value: string, options?: { preferredCompression?: CompressionType }): Uint8Array {
  const preferredCompression = options?.preferredCompression || CompressionType.GZIP;
  let compressedData: Uint8Array;
  let marker: number;
  const originalSize = value.length;

  switch (preferredCompression) {
    case CompressionType.BROTLI:
      console.debug(`Brotli compression: ${originalSize} bytes input`);
      compressedData = zlib.brotliCompressSync(Buffer.from(value));
      marker = COMPRESSION_MARKER.BROTLI;
      break;
    case CompressionType.GZIP:
      console.debug(`GZIP compression: ${originalSize} bytes input`);
      compressedData = zlib.gzipSync(Buffer.from(value));
      marker = COMPRESSION_MARKER.GZIP;
      break;
    default:
      // NONE type - no actual compression, just add the format marker
      console.debug(`No compression: ${originalSize} bytes`);
      compressedData = Buffer.from(value);
      marker = COMPRESSION_MARKER.NONE;
  }

  // Create a new buffer with the marker byte at the beginning
  const result = new Uint8Array(compressedData.length + 1);
  result[0] = marker;
  result.set(compressedData, 1);

  // Update stats only if enabled
  if (compressionStats.enabled) {
    compressionStats.totalBytesInput += originalSize;
    compressionStats.totalBytesOutput += result.length;
    compressionStats.totalSaved += originalSize - result.length;
    compressionStats.itemsCompressed += 1;
    compressionStats.compressionRatio = compressionStats.totalBytesOutput / compressionStats.totalBytesInput;

    // Periodic reset to avoid potential memory issues in long-running processes
    if (compressionStats.itemsCompressed > 1000000) {
      // Reset after 1 million items
      compressionStats.reset();
    }
  }

  console.debug(
    `Compression: ${originalSize} → ${result.length} bytes (${Math.round((result.length / originalSize) * 100)}%)`,
  );

  return result;
}

export function decompress(data: Uint8Array | string): string {
  if (typeof data === "string") {
    return data; // If it's a string, assume it's uncompressed
  }
  if (data.length === 0) {
    return ""; // Empty data
  }

  const marker = data[0];
  const compressedData = data.subarray(1); // Remove the marker byte

  try {
    let result: string;
    switch (marker) {
      case COMPRESSION_MARKER.BROTLI:
        result = zlib.brotliDecompressSync(compressedData).toString();
        break;
      case COMPRESSION_MARKER.GZIP:
        result = zlib.gunzipSync(compressedData).toString();
        break;
      case COMPRESSION_MARKER.NONE:
        result = Buffer.from(compressedData).toString();
        break;
      default:
        // For backwards compatibility and resiliency
        return Buffer.from(data).toString();
    }
    return result;
  } catch (error) {
    console.error(`Decompression error with marker ${marker}: ${error}`);
    // Last resort fallback
    return Buffer.from(compressedData).toString();
  }
}
```
The key aspect of this compression module is that it makes the data "self-describing." By embedding a single-byte marker at the beginning of each payload, the data carries the information about how it was compressed, so the reader can pick the right decompression algorithm on its own. Uncompressed data can flow through the same pipeline as compressed data, and I can switch my preferred compression at any moment. Another nice feature is that I can add a new compression algorithm at any time: if, in the future, Node supports Zstd, for example, I will simply add a new marker value. The marker-based approach is a simple, extensible pattern that adds a lot of flexibility to data compression.
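To make the idea concrete, here is a small usage sketch of the two functions above. Whatever `preferredCompression` the writer happened to use, the reader decodes the payload through the same call, driven purely by the marker byte:

```typescript
// Round trip: the reader never needs to know how the writer compressed the payload
const payload = JSON.stringify({ id: 42, tags: Array(100).fill("cache") });

const asGzip = compress(payload, { preferredCompression: CompressionType.GZIP });
const asBrotli = compress(payload, { preferredCompression: CompressionType.BROTLI });
const asPlain = compress(payload, { preferredCompression: CompressionType.NONE });

console.log(decompress(asGzip) === payload);   // true
console.log(decompress(asBrotli) === payload); // true
console.log(decompress(asPlain) === payload);  // true
```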
The cache client uses the compression module with adaptive logic, and it can look like this:
```typescript
// compress, decompress, and CompressionType come from the compression module above
interface Options {
  compressionThreshold?: number;
  preferredCompression?: CompressionType;
  batchSize?: number;
}

class CacheClient {
  private cacheAdapter: CachePort;
  private compressionThreshold: number;
  private preferredCompression: CompressionType;
  private batchSize: number;

  constructor(adapter: CachePort, options: Options) {
    this.cacheAdapter = adapter;
    this.compressionThreshold = options.compressionThreshold || 2048; // 2KB default
    this.preferredCompression = options.preferredCompression || CompressionType.GZIP;
    this.batchSize = options.batchSize || 100;
  }

  // Single item storage with adaptive compression
  async set(key: string, value: any, ttl?: number) {
    const stringValue = JSON.stringify(value);
    let compressedValue;

    // Apply adaptive compression logic based on data size
    if (stringValue.length < this.compressionThreshold) {
      // Small values skip compression - not worth the CPU cost
      compressedValue = value;
    } else if (stringValue.length > 50 * 1024) {
      // Large values (>50KB) use the configured preferred algorithm
      compressedValue = {
        compressed: true,
        data: compress(stringValue, { preferredCompression: this.preferredCompression }),
      };
    } else {
      // Medium values always use GZIP for a better speed/ratio balance
      compressedValue = {
        compressed: true,
        data: compress(stringValue, { preferredCompression: CompressionType.GZIP }),
      };
    }

    return this.cacheAdapter.set(key, compressedValue, ttl);
  }

  // Retrieval with automatic decompression
  async get(key: string) {
    const result = await this.cacheAdapter.get(key);
    if (!result) return null;

    // Handle compressed data
    if (result.compressed && result.data) {
      const decompressedValue = decompress(result.data);
      return JSON.parse(decompressedValue);
    }
    return result;
  }

  // Batch storage with adaptive logic
  async setBatch(items: { key: string; value: any }[], ttl?: number) {
    const batchSize = items.length;
    const processedItems = [];

    for (const item of items) {
      const stringValue = JSON.stringify(item.value);

      // Adjust threshold based on batch size:
      // as batch size increases, the compression threshold decreases (floored at 1KB)
      const effectiveThreshold = Math.max(1024, this.compressionThreshold / Math.sqrt(batchSize));

      // Determine the best compression algorithm for this batch item
      let compressionType: CompressionType;
      if (stringValue.length < effectiveThreshold) {
        // Skip compression for very small values
        processedItems.push({ key: item.key, value: item.value });
        continue;
      } else if (batchSize >= 20 || stringValue.length > 10 * 1024) {
        // Use Brotli for large batches or large individual items
        compressionType = CompressionType.BROTLI;
      } else {
        // Use GZIP for smaller batches with medium-sized items
        compressionType = CompressionType.GZIP;
      }

      processedItems.push({
        key: item.key,
        value: {
          compressed: true,
          data: compress(stringValue, { preferredCompression: compressionType }),
        },
      });
    }

    // Process in chunks to avoid overwhelming the cache service
    const results = [];
    for (let i = 0; i < processedItems.length; i += this.batchSize) {
      const chunk = processedItems.slice(i, i + this.batchSize);
      const chunkResults = await this.cacheAdapter.setBatch(chunk, ttl);
      results.push(...chunkResults);
    }
    return results;
  }

  // Batch retrieval with automatic decompression
  async getBatch(keys: string[]) {
    const results = [];
    for (let i = 0; i < keys.length; i += this.batchSize) {
      const chunk = keys.slice(i, i + this.batchSize);
      const chunkResults = await this.cacheAdapter.getBatch(chunk);

      for (const result of chunkResults) {
        if (!result) {
          results.push(null);
        } else if (result.compressed && result.data) {
          // Decompress and parse
          const decompressedValue = decompress(result.data);
          results.push(JSON.parse(decompressedValue));
        } else {
          results.push(result);
        }
      }
    }
    return results;
  }
}
```
The `setBatch` operation adjusts the compression threshold based on batch size: as the batch size increases, I lower the size threshold at which compression activates, with a floor of 1KB. With the default threshold of 2KB (see the worked example after this list):
- Single operation: 2048 bytes
- Batch of 4 items: ~1024 bytes
- Batch of 25 items: ~1024 bytes (2048 / √25 ≈ 410, clamped to the 1KB floor)
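A quick check of the formula, using the same expression as in `setBatch` above:

```typescript
// effectiveThreshold = max(1024, compressionThreshold / sqrt(batchSize))
const threshold = (compressionThreshold: number, batchSize: number) =>
  Math.max(1024, compressionThreshold / Math.sqrt(batchSize));

console.log(threshold(2048, 1));  // 2048 - single item, full threshold
console.log(threshold(2048, 4));  // 1024 - 2048 / 2
console.log(threshold(2048, 25)); // 1024 - 2048 / 5 ≈ 410, clamped to the 1KB floor
```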
The `setBatch` operation also selects the compression algorithm: Brotli for large batches or large individual items, and GZIP for smaller batches with medium-sized items. Additionally, requests are chunked to avoid overwhelming the cache service.
I have run multiple tests (a rough sketch of the harness follows below) that:
- Exercise both compression types (GZIP, Brotli) as well as uncompressed operations
- Measure performance metrics across different operations and data sizes
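The harness itself is not shown in this post; conceptually it is just a micro-benchmark around the `compress`/`decompress` helpers. Something along these lines (payload shape and iteration count are illustrative, not the exact test setup) collects this kind of data:

```typescript
import { performance } from "perf_hooks";

// Hypothetical micro-benchmark around the compress/decompress helpers above
function bench(label: string, type: CompressionType, payload: string, iterations = 100) {
  const startC = performance.now();
  let compressed: Uint8Array = new Uint8Array();
  for (let i = 0; i < iterations; i++) {
    compressed = compress(payload, { preferredCompression: type });
  }
  const compressMs = performance.now() - startC;

  const startD = performance.now();
  for (let i = 0; i < iterations; i++) {
    decompress(compressed);
  }
  const decompressMs = performance.now() - startD;

  const ratio = ((compressed.length / payload.length) * 100).toFixed(1);
  console.log(`${label}: compress ${compressMs.toFixed(1)}ms, decompress ${decompressMs.toFixed(1)}ms, size ${ratio}% of original`);
}

const payload = JSON.stringify({ items: Array(1000).fill({ name: "cached-item", active: true }) });
bench("GZIP", CompressionType.GZIP, payload);
bench("Brotli", CompressionType.BROTLI, payload);
```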
I have noticed the following:
- Compression ratios: Both GZIP and Brotli achieve excellent compression (≈70-99%)
- Single operations: GZIP is faster for SET operations (≈100ms vs 217ms for Brotli)
- Batch operations: Brotli provides better read performance (≈18% faster than GZIP)
- Memory efficiency: Compressed storage uses only ≈27-31% of the original size
In terms of data size, my conclusions are:
- Small values (<2KB): Not compressed, minimal overhead applied
- Medium values (2KB-50KB): Compressed with GZIP for better speed
- Large values (>50KB): Compressed with the configured algorithm
- Small batches (<20 items): Use GZIP for better speed
- Large batches (≥20 items): Use Brotli for better compression
The compression ratios are the following:
| Data Type | Original Size | GZIP Size | Brotli Size | Compression Ratio |
|---|---|---|---|---|
| Medium JSON | 4KB | 1.2KB | 1.1KB | 73-75% |
| Large batch | 5KB per item | 55 bytes | 29 bytes | 99.4-99.5% |
| String data | 2KB | 46 bytes | 27 bytes | 98-99% |
| Large object | 100KB | 149 bytes | - | 99.9% |
Compression decisions are made at write time only (`set` or `setBatch`). This means:
- If I `set` a 1KB item individually (uncompressed), then later include it in a `getBatch` operation, it remains uncompressed.
- If I `setBatch` 25 items of 1KB each (compressed due to batch size), then later `get` a single item, it will be automatically decompressed.
- The client automatically handles decompression regardless of which operation was used to retrieve the data.
The short usage sketch after this list illustrates these cases.
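For illustration, here is a small usage sketch. The adapter is whichever `CachePort` implementation you plug in, and the key names and payloads are made up:

```typescript
const cache = new CacheClient(new RedisAdapter(), { compressionThreshold: 2048 });

const smallProfile = { id: 1, name: "Ada" };                       // well under 2KB serialised
const mediumProduct = { sku: "X", description: "a".repeat(1500) }; // ~1.5KB serialised

// Written individually: below the 2KB threshold, so stored uncompressed,
// and a later getBatch returns it as-is.
await cache.set("user:1", smallProfile);
const [profile] = await cache.getBatch(["user:1"]);

// Written as a batch of 25: the effective threshold drops to ~1KB, so the items
// are compressed with Brotli, yet a later single get() transparently decompresses.
await cache.setBatch(
  Array.from({ length: 25 }, (_, i) => ({ key: `product:${i}`, value: mediumProduct })),
  3600,
);
const product = await cache.get("product:7");
```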
Here are some scenarios observed while running the tests:
| Scenario | What Happens | Algorithm | Reason | Notes |
|---|---|---|---|---|
| 10 items × 5KB each | All compressed individually | GZIP | Each item > 2KB, batch size < 20 | Each 5KB item → 55 bytes (99% reduction) |
| 25 items × 5KB each | All compressed individually | Brotli | Each item > 2KB, batch size ≥ 20 | Each 5KB item → 29 bytes (99.4% reduction) |
| 5 items × 15KB each | All compressed individually | Brotli | Individual items > 10KB | Better compression for large values |
| 25 items × 1KB each | All compressed individually | Brotli | Threshold adjusted for large batches | Items below the standard threshold but compressed due to batch size |
| 10 items × 1KB each | May be compressed | GZIP | Effective threshold becomes ~1024 bytes | Compression depends on exact size vs threshold |
| 25 items × 1KB each | All compressed | Brotli | Large batch lowers threshold to ~1024 bytes | Demonstrates adaptive threshold adjustment |
| 10 items × 2KB each | All compressed | GZIP | At standard threshold, batch size < 20 | Standard compression behavior |
| 25 items × 2KB each | All compressed | Brotli | Above threshold, batch size ≥ 20 | Algorithm selection based on batch size |
In short, on the performance side I have found the following:
- GZIP is consistently faster for medium and large data sizes
- GZIP is approximately 37.9% faster than Brotli for larger data
- Decompression speeds are comparable between GZIP and Brotli (< 5ms difference)
| Data Size | GZIP Compress | Brotli Compress | GZIP Decompress | Brotli Decompress | Winner |
|---|---|---|---|---|---|
| Small (10240 bytes) | 101.10ms | 99.77ms | 21.53ms | 23.65ms | Brotli |
| Medium (102400 bytes) | 22.77ms | 36.24ms | 21.85ms | 21.05ms | GZIP |
| Large (512000 bytes) | 23.67ms | 38.63ms | 22.44ms | 24.47ms | GZIP |
I did some calculations (I hope they are correct), and I think I can achieve the following goals:
- Reduced Costs: decrease cache storage costs by 78% by compressing large objects
- Improved Performance: API response times improved by 32% for operations involving large cached objects
- Increased Cache Hit Ratio: more data could fit in the same cache size, increasing our hit ratio
- Cache Provider Independence: I could use different cache providers without changing the code
Along the way, I have also arrived at some practical recommendations (see the configuration sketch after this list):
- Use GZIP for frequent write operations with medium/large data (faster compression)
- Consider Brotli for read-heavy workloads where the compression ratio is critical
- Don't compress small objects under 2KB, as the overhead isn't worth it
- For batch operations, adjusting compression thresholds based on batch size is beneficial
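As a rough illustration of how these guidelines map onto the client's options (the option names are those defined in `Options` above; the values are just examples, and `preferredCompression` only kicks in for values above 50KB in the `set` logic shown earlier):

```typescript
// Write-heavy workload: favour GZIP for faster compression of large values
const writeHeavyCache = new CacheClient(new RedisAdapter(), {
  compressionThreshold: 2048,                  // skip compression under 2KB
  preferredCompression: CompressionType.GZIP,  // faster writes for >50KB values
  batchSize: 100,
});

// Read-heavy workload: favour Brotli so more data fits in the same cache
const readHeavyCache = new CacheClient(new RedisAdapter(), {
  compressionThreshold: 2048,
  preferredCompression: CompressionType.BROTLI, // better ratio for >50KB values
  batchSize: 100,
});
```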
There are some trade-offs, as usual, to consider:
- CPU/Memory Usage: compression and decompression increase CPU and memory utilisation; I saw under 5% in my tests, but depending on your workload it could still matter
- Complex Logic: the adaptive rules add complexity to the caching layer
- Debugging Challenges: compressed data is not human-readable when debugging (the marker byte helps a little; see the helper sketched below)
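To take some of the sting out of that last point, a tiny helper (hypothetical, not part of the module above, and it assumes the `COMPRESSION_MARKER` table is exported rather than module-private) can at least report how a cached payload was encoded:

```typescript
// Hypothetical debugging helper: describe how a cached payload was encoded
function describePayload(data: Uint8Array | string): string {
  if (typeof data === "string") return "plain string (never went through compress())";
  if (data.length === 0) return "empty payload";

  switch (data[0]) {
    case COMPRESSION_MARKER.BROTLI:
      return `Brotli, ${data.length} bytes on the wire`;
    case COMPRESSION_MARKER.GZIP:
      return `GZIP, ${data.length} bytes on the wire`;
    case COMPRESSION_MARKER.NONE:
      return `uncompressed (marker only), ${data.length} bytes`;
    default:
      return "unknown marker - likely legacy data written before compression was enabled";
  }
}
```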