Today I Learned: Huffman Coding in Python

Today was an exciting day in my data structures and algorithms journey—I finally learned and implemented Huffman Coding! 🚀
🔍 What is Huffman Coding?
Huffman Coding is a greedy algorithm used for lossless data compression. It assigns shorter binary codes to more frequent characters and longer codes to less frequent ones, reducing the overall space required to store data.
It’s used in real-world applications like:
ZIP file compression
JPEG and MP3 encoding
Network data compression
🧱 How It Works (Step-by-Step)
Frequency Count
Count how many times each character appears in the input string.Min-Heap (Priority Queue)
Store characters with their frequencies in a heap (min-priority queue).Build the Huffman Tree
Combine the two lowest-frequency nodes into a new node. Repeat this until only one node (the root) remains.Generate Huffman Codes
Traverse the tree:Go left → add "0"
Go right → add "1"
Until you reach a character (leaf node), then assign the code.
Encode the Input
Replace each character with its code.
📦 Why Use Huffman Coding?
Reduces file size
No information is lost (lossless)
Efficient and fast
Based on character frequency → adaptive to different data
🧪 What I Tried
I implemented Huffman Coding in Python using heapq
for the priority queue. It helped me understand tree-building and recursive code generation. Here’s the full code I wrote and ran successfully:
import heapq
# Input string
input = "aaabbbcddccddd"
# Step 1: Count character frequencies
freq = {}
for char in input:
if char in freq:
freq[char] += 1
else:
freq[char] = 1
# Step 2: Build a priority queue (min-heap)
heap = [[count, char] for char, count in freq.items()]
heapq.heapify(heap)
# Step 3: Build the Huffman tree
while len(heap) > 1:
left = heapq.heappop(heap)
right = heapq.heappop(heap)
combined_frequency = left[0] + right[0]
new_node = [combined_frequency, [left, right]]
heapq.heappush(heap, new_node)
# Step 4: Generate Huffman codes
def generate_codes(node, code="", codes={}):
if isinstance(node[1], str): # It's a character
codes[node[1]] = code
return codes
generate_codes(node[1][0], code + "0", codes)
generate_codes(node[1][1], code + "1", codes)
return codes
# Root of the tree
huffman_tree = heap[0]
codes = generate_codes(huffman_tree)
# Step 5: Encode the input string
def encode_string(input_string, codes):
encoded_string = ""
for char in input_string:
encoded_string += codes[char]
return encoded_string
encoded_output = encode_string(input, codes)
# Display results
print("Character Frequencies:", freq)
print("Huffman Codes:", codes)
print("Encoded String:", encoded_output)
💡 My Takeaway
Understanding Huffman Coding really made me appreciate how smart compression algorithms can be. The fact that it adapts to frequency and builds an efficient binary tree blew my mind!
Subscribe to my newsletter
Read articles from kasumbi phil directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by
