Today I Learned: Huffman Coding in Python

kasumbi philkasumbi phil
2 min read

Today was an exciting day in my data structures and algorithms journey—I finally learned and implemented Huffman Coding! 🚀


🔍 What is Huffman Coding?

Huffman Coding is a greedy algorithm used for lossless data compression. It assigns shorter binary codes to more frequent characters and longer codes to less frequent ones, reducing the overall space required to store data.

It’s used in real-world applications like:

  • ZIP file compression

  • JPEG and MP3 encoding

  • Network data compression


🧱 How It Works (Step-by-Step)

  1. Frequency Count
    Count how many times each character appears in the input string.

  2. Min-Heap (Priority Queue)
    Store characters with their frequencies in a heap (min-priority queue).

  3. Build the Huffman Tree
    Combine the two lowest-frequency nodes into a new node. Repeat this until only one node (the root) remains.

  4. Generate Huffman Codes
    Traverse the tree:

    • Go left → add "0"

    • Go right → add "1"
      Until you reach a character (leaf node), then assign the code.

  5. Encode the Input
    Replace each character with its code.


📦 Why Use Huffman Coding?

  • Reduces file size

  • No information is lost (lossless)

  • Efficient and fast

  • Based on character frequency → adaptive to different data


🧪 What I Tried

I implemented Huffman Coding in Python using heapq for the priority queue. It helped me understand tree-building and recursive code generation. Here’s the full code I wrote and ran successfully:

import heapq

# Input string
input = "aaabbbcddccddd"

# Step 1: Count character frequencies
freq = {}
for char in input:
    if char in freq:
        freq[char] += 1
    else:
        freq[char] = 1

# Step 2: Build a priority queue (min-heap)
heap = [[count, char] for char, count in freq.items()]
heapq.heapify(heap)

# Step 3: Build the Huffman tree
while len(heap) > 1:
    left = heapq.heappop(heap)
    right = heapq.heappop(heap)
    combined_frequency = left[0] + right[0]
    new_node = [combined_frequency, [left, right]]
    heapq.heappush(heap, new_node)

# Step 4: Generate Huffman codes
def generate_codes(node, code="", codes={}):
    if isinstance(node[1], str):  # It's a character
        codes[node[1]] = code
        return codes
    generate_codes(node[1][0], code + "0", codes)
    generate_codes(node[1][1], code + "1", codes)
    return codes

# Root of the tree
huffman_tree = heap[0]
codes = generate_codes(huffman_tree)

# Step 5: Encode the input string
def encode_string(input_string, codes):
    encoded_string = ""
    for char in input_string:
        encoded_string += codes[char]
    return encoded_string

encoded_output = encode_string(input, codes)

# Display results
print("Character Frequencies:", freq)
print("Huffman Codes:", codes)
print("Encoded String:", encoded_output)

💡 My Takeaway

Understanding Huffman Coding really made me appreciate how smart compression algorithms can be. The fact that it adapts to frequency and builds an efficient binary tree blew my mind!

0
Subscribe to my newsletter

Read articles from kasumbi phil directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

kasumbi phil
kasumbi phil