Mastering String Encoding and Decoding Techniques

Real-Life Example: Sending a Secret List

Imagine you are working on a messaging app like WhatsApp or Discord. You want to send a list of messages (strings) like:

["hello", "world", "I", "am", "awesome"]

But there’s a problem: networks don’t send lists directly. They only send strings (ultimately, bytes)!

So, how can we turn this into a single string, send it, and rebuild the original list on the other side?
That’s exactly what the “Encode and Decode a String Array” problem is about.

Problem Statement

Design an algorithm to encode a list of strings to a single string.
Then, design the decoding function that reconstructs the list from that single string.

Example:

Input: ["leet", "code", "is", "cool"]
Encoded: "4#leet4#code2#is4#cool"
Decoded: ["leet", "code", "is", "cool"]

Why This Problem Matters

This problem shows up in interviews at Meta (Facebook), Google, Amazon, and other companies because it tests:

  • Your understanding of string manipulation

  • Your ability to design custom formats

  • Your skill at thinking about edge cases and decoding

It’s also extremely practical for real-world systems that serialize and deserialize data, for example when:

  • Sending data over APIs or sockets

  • Storing strings in databases

  • Building chat apps or text processing tools

Naive Approach (and Why It Fails)

You might think: “Why not just join the strings with a comma?”

','.join(["Hello", "World"]) -> "Hello,World"

But what if a string contains a comma?

["Hello", "wor,ld"] -> "Hello,wor,ld" ❌ Confusing

We can’t tell how to split it back!
So… we need a robust, unambiguous way to encode and decode.
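A quick way to see the failure is to run the comma version end to end. The helper names naive_encode and naive_decode are hypothetical, used only for this demonstration:

```python
# Naive approach (hypothetical helpers): join with a comma, split on a comma.
def naive_encode(strs):
    return ",".join(strs)

def naive_decode(s):
    return s.split(",")

original = ["Hello", "wor,ld"]
encoded = naive_encode(original)
decoded = naive_decode(encoded)

print(encoded)              # Hello,wor,ld
print(decoded)              # ['Hello', 'wor', 'ld']
print(decoded == original)  # False: the comma inside "wor,ld" was treated as a separator

# An empty list breaks too: "".split(",") returns [''], not [].
print(naive_decode(naive_encode([])))  # ['']
```

Two different inputs can also produce the same encoded text (["a,b"] and ["a", "b"] both become "a,b"), so no decoder could tell them apart.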

Smart Approach: Length-Prefix Encoding

We’ll encode each string as:

<length>#<string>

So ["hello", "world"] becomes:

5#hello5#world

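We use # as a delimiter because it can never appear inside the length, which is only digits. During decoding, we read the digits up to the first #, skip the #, and then read exactly that many characters. Since we never search for # inside the string itself, a # (or any other character) in the data can’t confuse the decoder.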
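Here is the format applied to single strings, including one that contains #. The helpers encode_one and read_one are hypothetical names for this sketch; read_one uses Python’s str.index to find the # that ends a length prefix:

```python
def encode_one(s):
    # Hypothetical helper: frame a single string as <length>#<string>.
    return f"{len(s)}#{s}"

def read_one(encoded, i):
    # Hypothetical helper: read one framed string starting at index i.
    # The digits before the first '#' are the length; after that, the next
    # `length` characters are taken verbatim, even if they contain '#'.
    j = encoded.index("#", i)
    length = int(encoded[i:j])
    start = j + 1
    return encoded[start:start + length], start + length

framed = encode_one("a#b") + encode_one("12")
print(framed)  # 3#a#b2#12

first, pos = read_one(framed, 0)
second, pos = read_one(framed, pos)
print(first, second, pos)  # a#b 12 9
```

The # inside "a#b" is never examined as a delimiter, because the decoder already knows it must take 3 characters. Likewise, the digits in "12" are read as data, not as a length.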

Step-by-Step Breakdown

Encode Function

  1. For each string:

    • Get its length.

    • Append the length, then #, then the string itself.

  2. Combine everything into one long string.

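     def encode(strs):
         parts = []
         for s in strs:
             # Each string becomes <length>#<string>.
             parts.append(str(len(s)) + "#" + s)
         # Join once at the end instead of repeatedly concatenating.
         return "".join(parts)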

Decode Function

  1. Loop through the string.

  2. Find the length by scanning until you see #.

  3. Read exactly that many characters after the #.

  4. Repeat until done.

Here’s the decoder in Python:

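def decode(s):
    res = []
    i = 0  # start of the current length prefix
    while i < len(s):
        # Scan forward from i to the '#' that ends the length prefix.
        j = i
        while s[j] != "#":
            j += 1
        length = int(s[i:j])
        # The string is the next `length` characters after the '#'.
        res.append(s[j+1:j+1+length])
        # Jump to the start of the next length prefix.
        i = j + 1 + length
    return res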
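To see how i and j move, here is a traced variant of the same decoding logic. The name decode_with_trace is hypothetical, and it swaps the inner scanning loop for str.index, which finds the same #:

```python
def decode_with_trace(s):
    # Hypothetical traced variant: same logic as decode(), but it also
    # records (i, j, length, piece) for every string it reads.
    res, steps = [], []
    i = 0
    while i < len(s):
        j = s.index("#", i)          # position of the '#' ending the length
        length = int(s[i:j])         # digits between i and j
        piece = s[j + 1:j + 1 + length]
        steps.append((i, j, length, piece))
        res.append(piece)
        i = j + 1 + length           # jump past this string
    return res, steps

result, steps = decode_with_trace("5#hello5#world")
for i, j, length, piece in steps:
    print(f"i={i} j={j} length={length} piece={piece!r}")
# i=0 j=1 length=5 piece='hello'
# i=7 j=8 length=5 piece='world'
print(result)  # ['hello', 'world']
```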

Test Example

input_data = ["hello", "world", "python"]
encoded = encode(input_data)
print(encoded)  # Output: 5#hello5#world6#python

decoded = decode(encoded)
print(decoded)  # Output: ['hello', 'world', 'python']

It works even if the strings contain special characters, punctuation, or the # character itself.
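A round-trip check over some tricky inputs makes that claim concrete. The two functions are restated compactly here so the snippet runs on its own (using str.index to find the #, which is equivalent to the manual scan); the test strings are just illustrative choices:

```python
def encode(strs):
    # <length>#<string> for every string, joined into one string.
    return "".join(f"{len(s)}#{s}" for s in strs)

def decode(s):
    res, i = [], 0
    while i < len(s):
        j = s.index("#", i)              # '#' that ends the length prefix
        length = int(s[i:j])
        res.append(s[j + 1:j + 1 + length])
        i = j + 1 + length
    return res

# Empty string, a lone '#', digits mixed with '#', a comma,
# non-ASCII characters, and a newline.
tricky = ["", "#", "12#34", "wor,ld", "héllo 👋", "a\nb"]

assert decode(encode(tricky)) == tricky
assert decode(encode([])) == []          # an empty list round-trips too
print("round trip OK")
```

Note that an empty string still gets a prefix ("0#"), which is why it survives the round trip while the comma version loses it.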

Time and Space Complexity

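Operation    Time Complexity    Space Complexity
Encoding     O(N)               O(N)
Decoding     O(N)               O(N)

Where N is the total number of characters in all strings combined, plus the few characters of length prefix and # added per string. Space is O(N) because the output (the encoded string or the decoded list) holds every character once.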
Why is this important?
→ It’s linear and fast, which matters when you process millions of strings.
→ No extra libraries. No tricky edge cases. Clean and reliable.
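To make “linear” concrete: the encoded string is the original characters plus a small overhead per string (the digits of its length and one #). This sketch computes that size without building the string; encoded_length is a hypothetical helper:

```python
def encoded_length(strs):
    # Hypothetical helper: size of the encoded output without building it.
    # Each string costs its own characters, plus the digits of its length,
    # plus one '#'.
    return sum(len(s) + len(str(len(s))) + 1 for s in strs)

words = ["hello", "world", "python"]
n = sum(len(w) for w in words)     # total characters: 5 + 5 + 6 = 16
print(n, encoded_length(words))    # 16 22
```

The 22 matches len("5#hello5#world6#python"): 16 characters of data plus 2 characters of overhead for each of the 3 strings.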

Key Takeaways

  • Avoid a naive .join() when the delimiter might appear inside your strings.

  • Use length-prefix encoding to build reliable encoders/decoders.

  • It trains you to think like a systems engineer: What can go wrong? How do you prevent it?
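Applying that question to the decoder itself: if the encoded string arrives truncated or corrupted, the decode() above either crashes with an IndexError (no # found) or silently returns a shortened string (the slice runs past the end). Below is a hardened sketch; decode_strict and its error messages are hypothetical, and rejecting bad input with ValueError is one reasonable design choice, not the only one:

```python
def decode_strict(s):
    # Hypothetical hardened decoder: rejects malformed input instead of
    # crashing with IndexError or silently returning a truncated string.
    res, i = [], 0
    while i < len(s):
        j = s.find("#", i)
        if j == -1:
            raise ValueError(f"missing '#' after position {i}")
        prefix = s[i:j]
        # Only ASCII digits 0-9 are valid; this also rejects an empty prefix.
        if not (prefix.isascii() and prefix.isdigit()):
            raise ValueError(f"invalid length prefix {prefix!r} at position {i}")
        length = int(prefix)
        start, end = j + 1, j + 1 + length
        if end > len(s):
            raise ValueError(f"string at position {start} is truncated")
        res.append(s[start:end])
        i = end
    return res

print(decode_strict("5#hello5#world"))  # ['hello', 'world']

for bad in ["5#hel", "abc", "x#y"]:
    try:
        decode_strict(bad)
    except ValueError as err:
        print("rejected:", err)
```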


Written by

Sam Anirudh Malarvannan