From Pixels to Performance: Mastering Image Matrices, Compression & GPU Acceleration

Table of contents
- Key Terms for Beginners
- Understanding Image Formats
- Matrix vs File Size Example (250x250 RGBA)
- Grayscale Conversion: Concept
- Python Example: Grayscale with PIL + NumPy
- Apple Silicon GPU: Core Image Grayscale Example
- NVIDIA CUDA Example: Grayscale in C++
- Benchmarks (Approximate)
- When to Use What?
- Final Thoughts

Whether you're building a graphics editor, optimizing images for the web, or preprocessing data for machine learning, understanding how images are stored, compressed, and processed is essential. This post moves from the basics of image matrices to GPU-accelerated grayscale conversion on both Apple Silicon and NVIDIA CUDA.
Key Terms for Beginners
What Is an Image Matrix?
An image is essentially a matrix (array) of pixel values:
- Grayscale: H x W (1 channel: intensity)
- RGB: H x W x 3 (Red, Green, Blue)
- RGBA: H x W x 4 (RGB + Alpha for transparency)
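To make the shapes above concrete, here is a minimal sketch that builds the three layouts as NumPy arrays. The 250x250 size and the variable names are assumptions chosen for illustration.

```python
import numpy as np

h, w = 250, 250  # hypothetical image height and width

gray = np.zeros((h, w), dtype=np.uint8)     # one intensity value per pixel
rgb = np.zeros((h, w, 3), dtype=np.uint8)   # red, green, blue
rgba = np.zeros((h, w, 4), dtype=np.uint8)  # red, green, blue, alpha

# Set the top-left pixel of the RGBA image to opaque red
rgba[0, 0] = (255, 0, 0, 255)

for name, arr in (("gray", gray), ("rgb", rgb), ("rgba", rgba)):
    print(name, arr.shape, arr.nbytes, "bytes")
```

Note that the first index is the row (height) and the second is the column (width), which is why the shape reads H x W rather than W x H.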
What Is Compression?
Compression reduces file size by removing redundant or less important data.
Lossless: No data is lost. You can recover the exact original.
Lossy: Some data is discarded, prioritizing visual similarity over exact reconstruction.
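The difference is easy to see on a handful of bytes. The sketch below uses Python's standard `zlib` module for the lossless case and a deliberately crude stand-in for the lossy case (dropping the low 4 bits of each value). The pixel values are made up, and real lossy codecs are far more sophisticated than this.

```python
import zlib

# A hypothetical row of 8-bit pixel intensities
pixels = bytes([12, 13, 13, 14, 200, 201, 203, 204])

# Lossless: compress and decompress, then compare with the original
restored = zlib.decompress(zlib.compress(pixels))
print(restored == pixels)   # True: every byte survives the round trip

# Lossy (toy version): keep only the top 4 bits of each value
quantized = bytes(p & 0xF0 for p in pixels)
print(list(quantized))      # [0, 0, 0, 0, 192, 192, 192, 192]
print(quantized == pixels)  # False: the exact original cannot be recovered
```

The quantized row still separates dark pixels from bright ones, but the small differences between neighbors are gone for good.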
What Is Transparency (Alpha Channel)?
An alpha channel stores per-pixel opacity. With 8 bits per channel, 0 is fully transparent and 255 is fully opaque.
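One way to see what those values mean is to blend a foreground value over a background. This is a minimal sketch of standard "over" blending for a single 8-bit channel, assuming straight (non-premultiplied) alpha and an opaque background; the `blend` helper is a hypothetical name, not part of any library.

```python
def blend(fg, bg, alpha):
    """Composite one 8-bit channel value over an opaque background value."""
    a = alpha / 255  # map 0..255 to 0.0..1.0
    return round(fg * a + bg * (1 - a))

white_bg = 255
print(blend(0, white_bg, 0))    # 255: fully transparent, background shows
print(blend(0, white_bg, 255))  # 0: fully opaque, foreground shows
print(blend(0, white_bg, 128))  # 127: roughly half and half
```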
Understanding Image Formats
BMP (Bitmap)
- Raw pixel data, no compression
- Very large file size
PNG (Portable Network Graphics)
- Lossless compression using DEFLATE (zlib)
- Supports transparency
JPEG (Joint Photographic Experts Group)
- Lossy compression using DCT (Discrete Cosine Transform)
- Ideal for photos, not UIs or text
WebP (by Google)
- Supports both lossy and lossless compression
- Supports transparency
- Modern, web-optimized
Format | Lossless | Lossy | Alpha | Use Case |
BMP | Yes | No | Limited | Raw data, internal use |
PNG | Yes | No | Yes | UI, icons, screenshots |
JPG | No | Yes | No | Photos |
WebP (lossy) | No | Yes | Yes | Photos with transparency |
WebP (lossless) | Yes | No | Yes | Replacement for PNG |
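For a feel of how the DCT makes JPEG lossy, here is a toy one-dimensional sketch: transform a row of pixels, round the coefficients to a coarse step, and transform back. Real JPEG works on 8x8 blocks with a two-dimensional DCT, a per-frequency quantization table, and entropy coding, so treat this only as an illustration of the idea. The pixel row and the step size are assumptions.

```python
import math

def dct(x):
    """Unnormalized 1-D DCT-II."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n))
            for k in range(n)]

def idct(c):
    """Inverse of dct() above (a scaled DCT-III)."""
    n = len(c)
    return [c[0] / n + 2 / n * sum(c[k] * math.cos(math.pi / n * (i + 0.5) * k)
                                   for k in range(1, n))
            for i in range(n)]

row = [52, 55, 61, 66, 70, 61, 64, 73]  # hypothetical 8-pixel row
step = 20                                # hypothetical quantization step

coeffs = dct(row)
quantized = [round(c / step) * step for c in coeffs]  # the lossy step
restored = [round(v) for v in idct(quantized)]

print([round(c, 1) for c in coeffs])
print(quantized)
print(restored)
```

Without the rounding step the transform is reversible; the information is lost only when the coefficients are snapped to multiples of `step`. A larger step gives smaller files and a rougher reconstruction.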
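Matrix vs File Size Example (250x250 RGBA)
Format | Raw Matrix Size | Compressed Size (Approx) |
BMP | 250 KB | ~250 KB |
PNG | 250 KB | ~0.5–0.8 KB |
JPG | 250 KB | ~2 KB |
WebP Lossy | 250 KB | ~1–2 KB |
WebP Lossless | 250 KB | ~0.3–0.5 KB |
Note: The raw size is 250 x 250 x 4 bytes = 250,000 bytes. Compressed sizes this small are realistic only for a very simple image, such as a flat color; a detailed photo of the same dimensions compresses far less. JPG has no alpha channel, so that figure can only cover the RGB data.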
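The gap between matrix size and file size depends heavily on the image content. The sketch below computes the raw size and then runs DEFLATE (via the standard `zlib` module) over two synthetic 250x250 RGBA buffers: one flat color and one random noise. This covers only the DEFLATE stage that PNG builds on, not a full PNG encoder, and both test images are assumptions.

```python
import random
import zlib

w, h, channels = 250, 250, 4
raw_size = w * h * channels
print(raw_size)  # 250000 bytes, about 250 KB

# A flat image: every pixel is the same opaque red
flat = bytes([255, 0, 0, 255]) * (w * h)

# A noisy image: every byte is random (seeded so the run is repeatable)
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(raw_size))

print(len(zlib.compress(flat)))   # tiny compared with the raw size
print(len(zlib.compress(noisy)))  # roughly the raw size
```

Both buffers hold the same number of pixels, yet the flat one shrinks to almost nothing while the noisy one barely shrinks at all. Real images fall somewhere in between.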
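Grayscale Conversion: Concept
To convert RGB or RGBA to grayscale, take a weighted sum of the color channels. These are the ITU-R BT.601 luma weights, which favor green because the eye is most sensitive to it:
Gray = 0.299*R + 0.587*G + 0.114*B
If there's an alpha channel, we usually preserve it unchanged.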
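Plugging a few pure colors into the formula shows how the weights behave. This is a small sketch; `to_gray` is a hypothetical helper, and it rounds to the nearest integer rather than truncating so that floating-point error cannot nudge a result down by one.

```python
def to_gray(r, g, b):
    """Weighted sum of 8-bit R, G, B values, rounded to the nearest integer."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)

print(to_gray(255, 0, 0))      # 76: pure red
print(to_gray(0, 255, 0))      # 150: pure green looks brightest
print(to_gray(0, 0, 255))      # 29: pure blue looks darkest
print(to_gray(255, 255, 255))  # 255: white stays white
```

The three weights add up to 1.0, so the result always stays within the 0 to 255 range.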
Python Example: Grayscale with PIL + NumPy
from PIL import Image
import numpy as np

# Force four channels so the alpha slice below always exists
img = Image.open("input.png").convert("RGBA")
data = np.array(img)

# Split into R, G, B, and alpha planes (each H x W, dtype uint8)
r, g, b, a = data[:, :, 0], data[:, :, 1], data[:, :, 2], data[:, :, 3]

# Weighted sum; round before casting so float error cannot turn 255 into 254
gray = np.rint(0.299 * r + 0.587 * g + 0.114 * b).astype(np.uint8)

# Replicate gray into R, G, B and keep the original alpha
result = np.stack((gray, gray, gray, a), axis=-1)
Image.fromarray(result).save("gray_output.png")
Apple Silicon GPU: Core Image Grayscale Example
import Foundation
import CoreImage
import CoreImage.CIFilterBuiltins // Needed for CIFilter.photoEffectMono()
import AppKit // For macOS

let input = "/path/to/input.png"
let output = "/path/to/output.png"

// Load the source image and apply the built-in black-and-white photo effect
guard let ciImage = CIImage(contentsOf: URL(fileURLWithPath: input)) else {
    fatalError("Could not load \(input)")
}
let filter = CIFilter.photoEffectMono()
filter.inputImage = ciImage

// Render through a context that avoids the software renderer
let context = CIContext(options: [.useSoftwareRenderer: false])
guard let outputCI = filter.outputImage,
      let cgImage = context.createCGImage(outputCI, from: outputCI.extent) else {
    fatalError("Could not render the filtered image")
}

// Encode as PNG and write to disk
let rep = NSBitmapImageRep(cgImage: cgImage)
guard let pngData = rep.representation(using: .png, properties: [:]) else {
    fatalError("Could not encode PNG")
}
try pngData.write(to: URL(fileURLWithPath: output))
- Uses the GPU under the hood (Metal)
- Transparent PNG supported
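NVIDIA CUDA Example: Grayscale in C++
#include <cstdio>
#include <cuda_runtime.h>

// One thread per pixel: reads interleaved RGB, writes one gray byte
__global__ void rgbToGray(const unsigned char* in, unsigned char* out, int w, int h) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    // The grid is rounded up to whole blocks, so skip threads outside the image
    if (x < w && y < h) {
        int idx = (y * w + x) * 3;
        unsigned char r = in[idx];
        unsigned char g = in[idx + 1];
        unsigned char b = in[idx + 2];
        // The cast truncates the float result to an 8-bit value
        out[y * w + x] = static_cast<unsigned char>(0.299f * r + 0.587f * g + 0.114f * b);
    }
}

int main() {
    int w = 250, h = 250;
    int imgSize = w * h * 3;  // interleaved RGB bytes
    int graySize = w * h;     // one byte per pixel

    unsigned char* h_in = new unsigned char[imgSize];
    unsigned char* h_out = new unsigned char[graySize];
    // Fill the test image with pure red
    for (int i = 0; i < imgSize; i += 3) {
        h_in[i] = 255; h_in[i + 1] = 0; h_in[i + 2] = 0;
    }

    unsigned char *d_in, *d_out;
    cudaMalloc(&d_in, imgSize);
    cudaMalloc(&d_out, graySize);
    cudaMemcpy(d_in, h_in, imgSize, cudaMemcpyHostToDevice);

    // 16x16 threads per block; round the grid up to cover every pixel
    dim3 block(16, 16);
    dim3 grid((w + 15) / 16, (h + 15) / 16);
    rgbToGray<<<grid, block>>>(d_in, d_out, w, h);
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "Kernel launch failed: %s\n", cudaGetErrorString(err));
    }

    // This copy also waits for the kernel to finish
    cudaMemcpy(h_out, d_out, graySize, cudaMemcpyDeviceToHost);

    cudaFree(d_in); cudaFree(d_out);
    delete[] h_in; delete[] h_out;
    return 0;
}
- Requires an NVIDIA GPU and the CUDA toolkit
- Ideal for massively parallel image/data processing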
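The launch configuration in the CUDA example (16x16 threads per block, with the grid rounded up) can be checked with a little arithmetic. The sketch below redoes that calculation in Python for the 250x250 image, showing why the kernel needs its bounds check.

```python
w, h = 250, 250
block = 16  # threads per block along each axis

# Ceiling division, written the same way as (w + 15) / 16 in the C++ code
grid_x = (w + block - 1) // block
grid_y = (h + block - 1) // block

launched = grid_x * block * grid_y * block
pixels = w * h

print(grid_x, grid_y)     # 16 16
print(launched)           # 65536 threads launched
print(launched - pixels)  # 3036 threads fail the bounds check and do nothing
```

Because 250 is not a multiple of 16, the grid covers a 256x256 area, and the `if (x < w && y < h)` test keeps the extra threads from touching memory outside the image.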
Benchmarks (Approximate)
Method | Input Size | Execution Time | GPU Utilization |
PIL (CPU, Python) | 250x250 | ~12 ms | No |
Core Image (Mac) | 250x250 | ~2–3 ms | Yes |
CUDA (NVIDIA) | 250x250 | ~0.5–1 ms | Yes (heavy) |
Note: Real benchmarks vary with hardware. For an image this small, GPU timings are often dominated by setup and memory transfers rather than the computation itself.
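Since the numbers depend so much on the machine, it is worth measuring on your own hardware. Here is a minimal timing sketch for the CPU path, assuming a random 250x250 RGB array in place of a real file. It times only the NumPy conversion, not image loading or saving, and `to_gray` is a hypothetical helper.

```python
import time
import numpy as np

def to_gray(rgb):
    """Weighted grayscale of an H x W x 3 uint8 array."""
    weights = np.array([0.299, 0.587, 0.114])
    return np.rint(rgb @ weights).astype(np.uint8)

# Hypothetical 250x250 test image filled with random values
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(250, 250, 3), dtype=np.uint8)

to_gray(image)  # warm-up run, excluded from the timing

runs = 100
start = time.perf_counter()
for _ in range(runs):
    gray = to_gray(image)
elapsed_ms = (time.perf_counter() - start) / runs * 1000

print(f"{elapsed_ms:.3f} ms per conversion")
```

Averaging over many runs and discarding a warm-up pass smooths out one-off costs. The same approach applies to GPU code, as long as the timer stops only after the GPU work has actually finished.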
When to Use What?
Use Case | Best Format / Method |
Web UI, icons | PNG / WebP Lossless |
Photography | JPG / WebP Lossy |
Transparent graphics | PNG / WebP |
ML input pipelines | PNG / BMP (exact pixels) |
GPU image filters | Apple Core Image / CUDA |
Final Thoughts
- All images are just matrices
- Choosing between lossy and lossless depends on your use case
- Use GPU (Apple Silicon / CUDA) for performance-heavy image processing
- Use PNG/WebP when transparency or precision matters
Written by

Santosh Mahto