🎨 From Pixels to Performance: Mastering Image Matrices, Compression & GPU Acceleration

Santosh Mahto

Whether you're building a graphics editor, optimizing images for the web, or preprocessing data for machine learning, it helps to understand how images are stored, compressed, and processed. This post works up from the basics of image matrices to GPU-accelerated grayscale conversion on both Apple Silicon and NVIDIA CUDA.


📚 Key Terms for Beginners

What Is an Image Matrix?

An image is essentially a matrix (array) of pixel values, with height H, width W, and one or more channels:

  • Grayscale: H x W (1 channel: intensity)

  • RGB: H x W x 3 (Red, Green, Blue)

  • RGBA: H x W x 4 (RGB + Alpha for transparency)
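As a rough illustration of these shapes, the sketch below stores an H x W x C image as a flat, row-major byte buffer in plain Python and works out where a given pixel lives. The helper names `make_image` and `pixel_offset` are made up for this example, and real libraries such as NumPy handle the indexing for you.

```python
def make_image(height, width, channels, fill=0):
    """Allocate a flat, row-major buffer holding height x width x channels bytes."""
    return bytearray([fill]) * (height * width * channels)


def pixel_offset(x, y, width, channels):
    """Byte offset of the first channel of pixel (x, y) in a row-major buffer."""
    return (y * width + x) * channels


H, W = 4, 6
gray = make_image(H, W, 1)   # H x W
rgb = make_image(H, W, 3)    # H x W x 3
rgba = make_image(H, W, 4)   # H x W x 4

# Write an opaque red pixel at column 2, row 1 of the RGBA image
off = pixel_offset(2, 1, W, 4)
rgba[off:off + 4] = bytes([255, 0, 0, 255])

print(len(gray), len(rgb), len(rgba))  # 24 72 96
print(off)                             # 32
```

The same `(y * width + x) * channels` arithmetic is what a GPU kernel uses to find its pixel in an interleaved buffer.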

What Is Compression?

Compression reduces file size by removing redundant or less important data.

  • Lossless: No data is lost. You can recover the exact original.

  • Lossy: Some data is discarded, prioritizing visual similarity over exact reconstruction.
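To make the difference concrete, here is a toy sketch using Python's standard `zlib` module. The lossless half is a real DEFLATE round trip; the lossy half is only a stand-in (it simply drops the low 4 bits of each value) and is not how JPEG or WebP actually work.

```python
import zlib

# A tiny stand-in for pixel data: a smooth 0..255 ramp repeated four times
original = bytes(range(256)) * 4

# Lossless: compress, decompress, and get the identical bytes back
restored = zlib.decompress(zlib.compress(original))
print(restored == original)  # True

# Lossy (toy version): keep only the top 4 bits of every value
step = 16
quantized = bytes((v // step) * step for v in original)
max_error = max(abs(a - b) for a, b in zip(original, quantized))
print(quantized == original)  # False
print(max_error)              # 15

# The coarser data has more repetition, so it deflates to fewer bytes
print(len(zlib.compress(quantized)) < len(zlib.compress(original)))  # True
```

The trade is visible in the numbers: the lossy version can never be restored exactly, but every value is off by at most 15 and the result compresses better.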

What Is Transparency (Alpha Channel)?

An alpha channel defines per-pixel transparency. With 8 bits per channel, 0 is fully transparent and 255 is fully opaque.
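One way to see what those values mean is to blend a pixel over a background. The sketch below assumes straight (non-premultiplied) 8-bit alpha and uses the standard "over" blend; `blend_over` is a hypothetical helper written for this post, not a library function.

```python
def blend_over(fg, bg, alpha):
    """Composite one 8-bit foreground pixel over a background pixel.

    fg and bg are (R, G, B) tuples; alpha is 0 (transparent) to 255 (opaque).
    """
    a = alpha / 255
    return tuple(round(f * a + b * (1 - a)) for f, b in zip(fg, bg))


red = (255, 0, 0)
white = (255, 255, 255)

print(blend_over(red, white, 0))    # (255, 255, 255): background shows through
print(blend_over(red, white, 255))  # (255, 0, 0): foreground fully covers
print(blend_over(red, white, 128))  # (255, 127, 127): roughly half and half
```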


💡 Understanding Image Formats

BMP (Bitmap)

  • Raw pixel data, typically stored uncompressed

  • Very large file size

PNG (Portable Network Graphics)

  • Lossless compression using DEFLATE (zlib)

  • Supports transparency

JPEG (Joint Photographic Experts Group)

  • Lossy compression using DCT (Discrete Cosine Transform)

  • Ideal for photos, not UIs or text
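The DCT idea can be sketched in a few lines. JPEG itself works on 8x8 blocks with a 2D transform and a tuned quantization table; the version below is a deliberately simplified 1D sketch on a single row of 8 pixels, with a made-up quantization step of 10, just to show where the loss comes from.

```python
import math


def dct(signal):
    """Orthonormal DCT-II of a list of numbers."""
    n = len(signal)
    out = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        total = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, x in enumerate(signal))
        out.append(scale * total)
    return out


def idct(coeffs):
    """Inverse of dct() (an orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt((1 if k == 0 else 2) / n)
            total += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(total)
    return out


# One row of 8 pixel intensities from a smooth gradient
row = [52, 55, 61, 66, 70, 74, 79, 83]
coeffs = dct(row)

# Lossy step: divide by a step size, round, multiply back (hypothetical step of 10)
step = 10
quantized = [round(c / step) * step for c in coeffs]
approx = [round(v) for v in idct(quantized)]

print([round(c, 1) for c in coeffs])  # energy concentrates in the first few coefficients
print(quantized)                      # many small coefficients collapse to 0
print(approx)                         # close to the original row, but not identical
```

Smooth content, like most photos, ends up with many near-zero coefficients after quantization, which is where the size savings come from. Sharp edges in text and UI elements do not compress as gracefully, which is why JPEG is a poor fit for them.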

WebP (by Google)

  • Supports both lossy and lossless compression

  • Supports transparency

  • Modern, web-optimized

| Format | Lossless | Lossy | Alpha | Use Case |
| --- | --- | --- | --- | --- |
| BMP | ✅ | ❌ | ❌ | Raw data, internal use |
| PNG | ✅ | ❌ | ✅ | UI, icons, screenshots |
| JPG | ❌ | ✅ | ❌ | Photos |
| WebP (lossy) | ❌ | ✅ | ✅ | Photos with transparency |
| WebP (lossless) | ✅ | ❌ | ✅ | Replacement for PNG |

📊 Matrix vs File Size Example (250x250 RGBA)

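| Format | Raw Matrix Size | Compressed Size (Approx.) |
| --- | --- | --- |
| BMP | 250 KB | ~250 KB |
| PNG | 250 KB | ~0.5–0.8 KB |
| JPG | 250 KB | ~2 KB |
| WebP Lossy | 250 KB | ~1–2 KB |
| WebP Lossless | 250 KB | ~0.3–0.5 KB |

Note: Compressed sizes depend heavily on image content. Figures this small are plausible only for very simple images, such as a flat color; a detailed photo compresses far less. JPG cannot store alpha, so it encodes only the RGB data.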
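The raw size is simple arithmetic, and the effect of content on the compressed size is easy to check. The sketch below compresses two 250x250 RGBA buffers with `zlib` (the same DEFLATE algorithm PNG builds on). It is not a real PNG encoder, which also applies per-row filters and adds headers, so treat the numbers as indicative only.

```python
import os
import zlib

W, H, CHANNELS = 250, 250, 4
raw_size = W * H * CHANNELS
print(raw_size)  # 250000 bytes, i.e. about 250 KB

# Flat image: every pixel is the same opaque red
flat = bytes([255, 0, 0, 255]) * (W * H)

# Noisy image: every byte is random, so there is no redundancy to remove
noise = os.urandom(raw_size)

flat_packed = len(zlib.compress(flat, 9))
noise_packed = len(zlib.compress(noise, 9))

print(flat_packed)   # well under 1 KB
print(noise_packed)  # close to the raw 250000 bytes
```

Real images sit somewhere between these two extremes, which is why any single "compressed size" figure should be read as an example rather than a guarantee.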

[Figure: Image Compression Comparison]


📷 Grayscale Conversion: Concept

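To convert RGB or RGBA to grayscale, take a weighted sum of the color channels. The weights below are the ITU-R BT.601 luma coefficients, which reflect the eye's greater sensitivity to green: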

Gray = 0.299*R + 0.587*G + 0.114*B

If there's an alpha channel, it is usually preserved unchanged.
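Before bringing in any libraries, it can help to run the formula by hand on a few pixels. This is a minimal per-pixel sketch in plain Python; `to_gray` and `to_gray_rgba` are names chosen for this example, and rounding to the nearest integer is one reasonable choice among several.

```python
def to_gray(r, g, b):
    """Weighted luma of one 8-bit RGB pixel, rounded to the nearest integer."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def to_gray_rgba(pixel):
    """Gray out an (R, G, B, A) pixel while leaving alpha untouched."""
    r, g, b, a = pixel
    y = to_gray(r, g, b)
    return (y, y, y, a)


print(to_gray(255, 0, 0))      # 76
print(to_gray(0, 255, 0))      # 150
print(to_gray(0, 0, 255))      # 29
print(to_gray(255, 255, 255))  # 255
print(to_gray_rgba((255, 0, 0, 128)))  # (76, 76, 76, 128)
```

The three weights add up to 1, so white stays white and black stays black; pure green comes out much brighter than pure blue.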


📄 Python Example: Grayscale with PIL + NumPy

from PIL import Image
import numpy as np

img = Image.open("input.png").convert("RGBA")
data = np.array(img)

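# Extract the R, G, B, and alpha channels
r, g, b, a = data[:, :, 0], data[:, :, 1], data[:, :, 2], data[:, :, 3]

# Weighted sum, rounded to the nearest integer (a plain cast would truncate)
gray = np.rint(0.299 * r + 0.587 * g + 0.114 * b).astype(np.uint8)

# Combine grayscale with the original alpha
result = np.stack((gray, gray, gray, a), axis=-1)

# RGBA mode is inferred from the H x W x 4 uint8 array
Image.fromarray(result).save("gray_output.png")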

🌟 Apple Silicon GPU: Core Image Grayscale Example

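import Foundation
import CoreImage
import CoreImage.CIFilterBuiltins  // Needed for CIFilter.photoEffectMono()
import AppKit  // For macOS (NSBitmapImageRep)

let input = "/path/to/input.png"
let output = "/path/to/output.png"

// Load the source image; stop early if it cannot be read
guard let ciImage = CIImage(contentsOf: URL(fileURLWithPath: input)) else {
    fatalError("Could not load \(input)")
}

// Built-in monochrome effect: a stylized black-and-white look,
// not the exact weighted formula shown earlier
let filter = CIFilter.photoEffectMono()
filter.inputImage = ciImage

// Render with a GPU-backed context
let context = CIContext(options: [.useSoftwareRenderer: false])
guard let outputCI = filter.outputImage,
      let cgImage = context.createCGImage(outputCI, from: outputCI.extent) else {
    fatalError("Rendering failed")
}

// Encode as PNG and write to disk
let rep = NSBitmapImageRep(cgImage: cgImage)
guard let pngData = rep.representation(using: .png, properties: [:]) else {
    fatalError("PNG encoding failed")
}
try pngData.write(to: URL(fileURLWithPath: output))

  • Uses the GPU under the hood (via Metal)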

  • Transparent PNG supported


⚙️ NVIDIA CUDA Example: Grayscale in C++

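#include <cstdio>
#include <cuda_runtime.h>

// One thread per pixel: read interleaved RGB, write a single gray byte
__global__ void rgbToGray(const unsigned char* in, unsigned char* out, int w, int h) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    // The grid is rounded up, so threads past the image edge must do nothing
    if (x < w && y < h) {
        int idx = (y * w + x) * 3;
        unsigned char r = in[idx];
        unsigned char g = in[idx + 1];
        unsigned char b = in[idx + 2];
        // Add 0.5 so the cast rounds to nearest instead of truncating
        out[y * w + x] = static_cast<unsigned char>(0.299f * r + 0.587f * g + 0.114f * b + 0.5f);
    }
}

int main() {
    int w = 250, h = 250;
    int imgSize = w * h * 3;
    int graySize = w * h;

    // Host buffers; fill the input with solid red as test data
    unsigned char* h_in = new unsigned char[imgSize];
    unsigned char* h_out = new unsigned char[graySize];
    for (int i = 0; i < imgSize; i += 3) {
        h_in[i] = 255; h_in[i + 1] = 0; h_in[i + 2] = 0;
    }

    // Device buffers
    unsigned char *d_in, *d_out;
    cudaMalloc(&d_in, imgSize);
    cudaMalloc(&d_out, graySize);
    cudaMemcpy(d_in, h_in, imgSize, cudaMemcpyHostToDevice);

    // 16x16 threads per block; round the grid up to cover the whole image
    dim3 block(16, 16);
    dim3 grid((w + 15) / 16, (h + 15) / 16);
    rgbToGray<<<grid, block>>>(d_in, d_out, w, h);

    // Report launch failures instead of silently reading garbage
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "Kernel launch failed: %s\n", cudaGetErrorString(err));
    }

    // This copy waits for the kernel to finish before transferring the result
    cudaMemcpy(h_out, d_out, graySize, cudaMemcpyDeviceToHost);
    printf("First gray value: %d\n", h_out[0]);  // 76 for solid red

    cudaFree(d_in); cudaFree(d_out);
    delete[] h_in; delete[] h_out;
    return 0;
}

  • Requires an NVIDIA GPU and the CUDA toolkit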

  • Ideal for massively parallel image/data processing
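The launch configuration in the CUDA example is worth a quick sanity check. The plain-Python sketch below redoes the grid arithmetic for a 250x250 image with 16x16 blocks; `grid_dim` is a hypothetical helper that mirrors the `(w+15)/16` expression in the kernel launch.

```python
def grid_dim(size, block):
    """Number of blocks needed to cover `size` items (ceiling division)."""
    return (size + block - 1) // block


w, h, block = 250, 250, 16
gx, gy = grid_dim(w, block), grid_dim(h, block)

launched = gx * block * gy * block   # threads actually started
needed = w * h                       # pixels that exist
idle = launched - needed             # threads the bounds check must reject

print(gx, gy)    # 16 16
print(launched)  # 65536
print(idle)      # 3036
```

Because 250 is not a multiple of 16, a few thousand threads land outside the image. That is why the kernel checks `x < w && y < h` before touching memory.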


🚀 Benchmarks (Approximate)

| Method | Input Size | Execution Time | GPU Utilization |
| --- | --- | --- | --- |
| PIL (CPU, Python) | 250x250 | ~12 ms | ❌ |
| Core Image (Mac) | 250x250 | ~2–3 ms | ✅ |
| CUDA (NVIDIA) | 250x250 | ~0.5–1 ms | ✅ ✅ |

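Note: These are rough figures. Real benchmarks vary with hardware, and for an image this small, data transfer and launch overhead can dominate GPU timings.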
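If you want numbers for your own machine, a small timing harness is enough to get started. The sketch below times a pure-Python CPU baseline; `gray_pixels` and `median_ms` are hypothetical helpers written for this post, and the result will not match the table above because it measures a different implementation.

```python
import statistics
import time


def gray_pixels(pixels):
    """CPU baseline: weighted grayscale over a list of (R, G, B) tuples."""
    return [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in pixels]


def median_ms(fn, *args, repeats=20, warmup=3):
    """Median wall-clock time of fn(*args) in milliseconds."""
    for _ in range(warmup):          # let caches and allocators settle
        fn(*args)
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)


# Solid-red 250x250 test image, matching the CUDA example's input
pixels = [(255, 0, 0)] * (250 * 250)
print(f"{median_ms(gray_pixels, pixels):.2f} ms")
```

Using the median of several runs after a warm-up keeps one-off hiccups from skewing the result. When timing GPU code the same way, decide up front whether host-to-device copies count toward the measurement.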


🧭 When to Use What?

| Use Case | Best Format / Method |
| --- | --- |
| Web UI, icons | PNG / WebP Lossless |
| Photography | JPG / WebP Lossy |
| Transparent graphics | PNG / WebP |
| ML input pipelines | PNG / BMP (exact pixels) |
| GPU image filters | Apple Core Image / CUDA |

🧠 Final Thoughts

  • All images are just matrices

  • Choosing between lossy and lossless depends on your use case

  • Use the GPU (Apple Silicon / CUDA) for performance-heavy image processing

  • Use PNG/WebP when transparency or precision matters

