🔡 How Unicode Characters Become Bytes in Go: From Code Points to UTF-8


Ever wondered how Telugu text, emoji, or even a simple letter like 'A' is stored in Go?
It all comes down to Unicode and UTF-8 — and Go makes working with them surprisingly clean.
Let’s peel back the layers of abstraction and see what really happens under the hood when you write a character in Go.
Unicode vs UTF-8 vs Go Types
Concept | What It Is |
Unicode | A universal set of character codes (code points) — one number per symbol |
UTF-8 | A way to store those code points using 1–4 bytes |
Go | Uses rune to store a Unicode code point, and []byte to store its UTF-8 bytes |
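To make the table concrete, here is a minimal sketch that contrasts the two views of the same string: its length in bytes and its length in code points. The helper name describe is hypothetical and exists only for this illustration; len and utf8.RuneCountInString are standard Go.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe is a hypothetical helper: it returns the number of UTF-8 bytes
// in s and the number of Unicode code points those bytes decode to.
func describe(s string) (byteLen, runeCount int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	for _, s := range []string{"A", "త", "😊"} {
		b, r := describe(s)
		// []rune(s) is the code point view, []byte(s) is the UTF-8 view.
		fmt.Printf("%q: %d byte(s), %d code point(s), runes=%U, bytes=% X\n",
			s, b, r, []rune(s), []byte(s))
	}
}
```

Each of the three strings holds a single code point, but they take 1, 3, and 4 bytes respectively, which is why len(s) is not a character count.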
Let's Take a Character: త
This is a Telugu letter, pronounced “ta”.
Step 1: Unicode
Every character has a unique code point in the Unicode spec.
r := 'త'
fmt.Printf("Unicode: U+%04X\n", r) // Output: U+0C24
So 'త' has the Unicode code point U+0C24, which is 3108 in decimal.
Step 2: Convert to UTF-8
The Unicode code point is abstract — we need a way to store it in memory. That’s where UTF-8 comes in.
UTF-8 stores U+0C24 as:
Bytes: [224, 176, 164]
Hex: [0xE0, 0xB0, 0xA4]
Binary: 11100000 10110000 10100100
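UTF-8 is a variable-length encoding: code points up to U+007F take 1 byte, U+0080–U+07FF take 2, U+0800–U+FFFF take 3, and U+10000–U+10FFFF take 4. Since త (U+0C24) is in the U+0800–U+FFFF range, it uses 3 bytes.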
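If you want to see where those three bytes come from, the sketch below packs the 16 bits of the code point into the three-byte layout 1110xxxx 10xxxxxx 10xxxxxx by hand and compares the result with the standard library. The function encode3 is a hypothetical helper written for this post, not part of Go; it assumes the input is in the U+0800–U+FFFF range and does no validation.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encode3 is an illustrative helper (not a standard function). It packs a
// code point in U+0800..U+FFFF into the layout 1110xxxx 10xxxxxx 10xxxxxx.
// It assumes the caller passes a value in that range and does not reject
// surrogates or out-of-range input.
func encode3(r rune) [3]byte {
	return [3]byte{
		0xE0 | byte(r>>12),         // lead byte: marker 1110 + top 4 bits
		0x80 | (byte(r>>6) & 0x3F), // continuation: marker 10 + middle 6 bits
		0x80 | (byte(r) & 0x3F),    // continuation: marker 10 + low 6 bits
	}
}

func main() {
	r := 'త' // U+0C24 = 0000 1100 0010 0100

	manual := encode3(r)

	// The standard library's encoder, for comparison.
	buf := make([]byte, utf8.UTFMax)
	n := utf8.EncodeRune(buf, r)

	fmt.Printf("manual: % X\n", manual[:])
	fmt.Printf("stdlib: % X\n", buf[:n])
}
```

For U+0C24 the top 4 bits are 0000, the middle 6 are 110000, and the low 6 are 100100, which gives E0 B0 A4 in both lines. In real code you would reach for utf8.EncodeRune or a plain []byte(string(r)) conversion rather than shifting bits yourself.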
Step 3: See It in Go
Here's a complete Go program that visualises this transformation:
package main

import (
	"fmt"
)

// printUTF8Encoding prints the code point and the UTF-8 bytes of every
// character in s.
func printUTF8Encoding(s string) {
	fmt.Printf("Input: %q\n\n", s)
	// Count characters separately: the index that range yields over a
	// string is a byte offset, not a character position.
	n := 0
	for _, r := range s {
		n++
		utf8Bytes := []byte(string(r)) // UTF-8 encoding of this one rune
		binaryUnicode := fmt.Sprintf("%016b", r)
		fmt.Printf("Character #%d: %q\n", n, r)
		fmt.Printf("→ Unicode: U+%04X (decimal: %d)\n", r, r)
		fmt.Printf("→ Binary (Unicode): %s\n", insertEvery4Bits(binaryUnicode))
		fmt.Printf("→ UTF-8 Bytes: %v\n", utf8Bytes)
		fmt.Print("→ Hex: ")
		for _, b := range utf8Bytes {
			fmt.Printf("0x%X ", b)
		}
		fmt.Print("\n→ Binary: ")
		for _, b := range utf8Bytes {
			fmt.Printf("%08b ", b)
		}
		fmt.Println("\n---")
	}
}

// insertEvery4Bits splits a binary string into groups of 4 digits,
// counting from the right.
func insertEvery4Bits(s string) string {
	out := ""
	for i, c := range s {
		if i > 0 && (len(s)-i)%4 == 0 {
			out += " "
		}
		out += string(c)
	}
	return out
}

func main() {
	printUTF8Encoding("త")
}
🧪 Output:
Input: "త"

Character #1: 'త'
→ Unicode: U+0C24 (decimal: 3108)
→ Binary (Unicode): 0000 1100 0010 0100
→ UTF-8 Bytes: [224 176 164]
→ Hex: 0xE0 0xB0 0xA4
→ Binary: 11100000 10110000 10100100
---
🔍 Summary
Layer | Value |
Unicode | U+0C24 (decimal: 3108) |
UTF-8 Bytes | [224, 176, 164] |
Go rune | int32 → 3108 |
Go []byte | UTF-8 bytes |
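The table also reads in the other direction. As a small sketch, the bytes [224, 176, 164] can be decoded back into the rune 3108 with utf8.DecodeRune from the standard library; decodeFirst is just a hypothetical wrapper name used here for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decodeFirst is an illustrative wrapper: it returns the first code point
// stored in b and the number of bytes that code point occupied.
func decodeFirst(b []byte) (rune, int) {
	return utf8.DecodeRune(b)
}

func main() {
	b := []byte{224, 176, 164} // UTF-8 bytes from the summary table
	r, size := decodeFirst(b)
	fmt.Printf("%v -> %q U+%04X (decimal: %d), %d bytes\n", b, r, r, r, size)

	// rune is an alias for int32, so this assignment needs no conversion.
	var asInt32 int32 = r
	fmt.Println(asInt32)
}
```

The last two lines show the "Go rune | int32" row in practice: a rune value can be assigned to an int32 variable directly because they are the same type.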
✨ Try More!
Pass any string into the function, such as "😊", "hi", "తెలుగు", or "你好", and you'll see how Go handles it.
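For a quicker overview than the full printout, this sketch lists how many bytes each code point in those sample strings needs. The helper widths is a hypothetical name introduced here; utf8.RuneLen is the standard library call doing the work. It assumes the input is valid UTF-8, since invalid bytes decode to U+FFFD and would be reported as 3 bytes wide.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// widths is an illustrative helper: it returns the UTF-8 byte length of
// each code point in s. It assumes s is valid UTF-8.
func widths(s string) []int {
	var out []int
	for _, r := range s {
		out = append(out, utf8.RuneLen(r))
	}
	return out
}

func main() {
	for _, s := range []string{"😊", "hi", "తెలుగు", "你好"} {
		fmt.Printf("%q: %d bytes total, per code point: %v\n", s, len(s), widths(s))
	}
}
```

ASCII letters come out as 1 byte each, the Telugu and Chinese code points as 3 bytes each, and the emoji as 4.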
Written by

Dushyanth
A Full Stack Developer with a knack for creating engaging web experiences. Currently tinkering with GO.