Building an Interpreter in Go - Chapter 1: Understanding the Basics

Akshay Thakur

Overview

Starting the journey of building an interpreter is like learning to be a translator between two worlds - the world of human-readable code and the world of machine instructions. This first chapter lays the foundation by explaining what an interpreter is, how it differs from a compiler, and the basic components we'll be building.

Key Concepts Learned

What is an Interpreter?

An interpreter is a program that directly executes instructions written in a programming language without requiring them to be compiled into machine language first. Think of it as a real-time translator who reads your code line by line and performs the actions immediately.

Components of an Interpreter

  1. Lexer (Tokenizer)

    • Breaks down source code into tokens

    • Example: let x = 5; becomes [LET, IDENT("x"), ASSIGN, INT("5"), SEMICOLON]

    • Acts as the "eyes" of our interpreter

  2. Parser

    • Transforms tokens into an Abstract Syntax Tree (AST)

    • Understands the grammar of our language

    • Catches syntax errors

  3. Abstract Syntax Tree (AST)

    • Tree representation of the code's structure

    • Each node represents an operation or value

    • Example:

        Assignment
        ├── Identifier(x)
        └── Value(5)
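The parser that builds such trees comes in a later chapter, but as a rough preview, nodes like the ones above could be modeled in Go along these lines. This is a hypothetical sketch - the names Assignment, Identifier, and IntegerLiteral are illustrative, not this project's final AST types:

package ast

import "fmt"

// Node is the interface every AST node would satisfy.
type Node interface {
    String() string
}

// Identifier is a leaf node like x.
type Identifier struct {
    Name string
}

func (i Identifier) String() string { return i.Name }

// IntegerLiteral is a leaf node like 5.
type IntegerLiteral struct {
    Value int64
}

func (n IntegerLiteral) String() string { return fmt.Sprintf("%d", n.Value) }

// Assignment is the root of the tree above: one identifier child
// and one value child.
type Assignment struct {
    Target Identifier
    Value  IntegerLiteral
}

func (a Assignment) String() string {
    return fmt.Sprintf("let %s = %s;", a.Target.String(), a.Value.String())
}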

Code Implementation Highlights

1. Token Structure in Go

package token

// TokenType is a string so debugging output stays human-readable.
type TokenType string

// Token pairs a token's type with the literal text it was read from.
type Token struct {
    Type    TokenType
    Literal string
}

// keywords maps reserved words to their token types. Note that "int" is
// deliberately not in here: INT is the type the lexer gives to integer
// literals like 5, not a keyword.
var keywords = map[string]TokenType{
    "fn":  FUNCTION,
    "let": LET,
}

// LookupIdent returns the keyword token type if ident is reserved,
// and IDENT otherwise.
func LookupIdent(ident string) TokenType {
    if tok, ok := keywords[ident]; ok {
        return tok
    }
    return IDENT
}

const (
    // Special tokens
    ILLEGAL = "ILLEGAL"
    EOF     = "EOF"

    // Identifiers and literals
    IDENT = "IDENT"
    INT   = "INT"

    // Operators
    ASSIGN = "="
    PLUS   = "+"

    // Delimiters
    COMMA     = ","
    SEMICOLON = ";"

    LPAREN = "("
    RPAREN = ")"
    LBRACE = "{"
    RBRACE = "}"

    // Keywords
    FUNCTION = "fn"
    LET      = "let"
)
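A quick way to sanity-check LookupIdent is a throwaway snippet like the one below, reusing the import path from the test in the next section:

package main

import (
    "fmt"

    "interpreter/src/monkey/token"
)

func main() {
    fmt.Println(token.LookupIdent("fn"))   // fn  -> the FUNCTION token type
    fmt.Println(token.LookupIdent("let"))  // let -> the LET token type
    fmt.Println(token.LookupIdent("five")) // IDENT, since "five" is not a keyword
}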

2. First Test Case

package lexer 

import (
    "testing"

    "interpreter/src/monkey/token"
)

func TestNextToken(t *testing.T) { 
    input := `
        let five = 5;

        let ten = 10;

        let add = fn(x, y) {
            x + y;
        };

        let result = add(five, ten);

    `

    tests := []struct {
        expectedType    token.TokenType
        expectedLiteral string
    }{

        {token.LET, "let"},
        {token.IDENT, "five"},
        {token.ASSIGN, "="},
        {token.INT, "5"},
        {token.SEMICOLON, ";"},
        {token.LET, "let"},
        {token.IDENT, "ten"},
        {token.ASSIGN, "="},
        {token.INT, "10"},
        {token.SEMICOLON, ";"},
        {token.LET, "let"},
        {token.IDENT, "add"},
        {token.ASSIGN, "="},
        {token.FUNCTION, "fn"},
        {token.LPAREN, "("},
        {token.IDENT, "x"},
        {token.COMMA, ","},
        {token.IDENT, "y"},
        {token.RPAREN, ")"},
        {token.LBRACE, "{"},
        {token.IDENT, "x"},
        {token.PLUS, "+"},
        {token.IDENT, "y"},
        {token.SEMICOLON, ";"},
        {token.RBRACE, "}"},
        {token.SEMICOLON, ";"},
        {token.LET, "let"},
        {token.IDENT, "result"},
        {token.ASSIGN, "="},
        {token.IDENT, "add"},
        {token.LPAREN, "("},
        {token.IDENT, "five"},
        {token.COMMA, ","},
        {token.IDENT, "ten"},
        {token.RPAREN, ")"},
        {token.SEMICOLON, ";"},
        {token.EOF, ""},
    }

    l := New(input)

    for i, tt := range tests {
        tok := l.NextToken()

        if tok.Type != tt.expectedType {
            t.Fatalf("tests[%d] - tokentype wrong. expected=%q got=%q", i, tt.expectedType, tok.Type)
        }

        if tok.Literal != tt.expectedLiteral {
            t.Fatalf("tests[%d] - tokenliteral wrong. expected=%q got=%q", i, tt.expectedLiteral, tok.Literal)
        }
    }
}
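This test won't even compile until the lexer package in the next section exists; once it does, Go's standard tooling runs it. The package path here is assumed from the import paths above:

go test ./src/monkey/lexer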

3. Lexer Implementation

package lexer

import (
    "interpreter/src/monkey/token"
)

// Lexer walks the input one byte at a time (so ASCII only, for now).
type Lexer struct {
    input        string
    position     int  // index of the character currently in ch
    readPosition int  // index of the next character to read
    ch           byte // current character; 0 means end of input
}

func New(input string) *Lexer {
    l := &Lexer{input: input}
    l.readChar() // prime ch, position, and readPosition
    return l
}

// readChar advances by one character, setting ch to 0 (NUL) once the
// input is exhausted.
func (l *Lexer) readChar() {
    if l.readPosition >= len(l.input) {
        l.ch = 0
    } else {
        l.ch = l.input[l.readPosition]
    }

    l.position = l.readPosition
    l.readPosition += 1
}

func (l *Lexer) NextToken() token.Token {
    var tok token.Token

    l.skipWhitespace() // tokens never contain whitespace, so consume it up front

    switch l.ch {
    case '=':
        tok = newToken(token.ASSIGN, l.ch)
    case '+':
        tok = newToken(token.PLUS, l.ch)
    case '(':
        tok = newToken(token.LPAREN, l.ch)
    case ')':
        tok = newToken(token.RPAREN, l.ch)
    case '{':
        tok = newToken(token.LBRACE, l.ch)
    case '}':
        tok = newToken(token.RBRACE, l.ch)
    case ',':
        tok = newToken(token.COMMA, l.ch)
    case ';':
        tok = newToken(token.SEMICOLON, l.ch)
    case 0:
        tok.Literal = ""
        tok.Type = token.EOF
    default:
        if isLetter(l.ch) {
            tok.Literal = l.readIdentifier()
            tok.Type = token.LookupIdent(tok.Literal)
            return tok // readIdentifier already advanced past the token
        } else if isDigit(l.ch) {
            tok.Type = token.INT
            tok.Literal = l.readNumber()
            return tok // readNumber already advanced past the token
        } else {
            tok = newToken(token.ILLEGAL, l.ch)
        }
    }

    l.readChar()
    return tok
}

// skipWhitespace consumes spaces, tabs, newlines, and carriage returns
// between tokens.
func (l *Lexer) skipWhitespace() {
    for l.ch == ' ' || l.ch == '\t' || l.ch == '\n' || l.ch == '\r' {
        l.readChar()
    }
}

// readIdentifier reads a run of letters/underscores starting at the
// current position and returns the slice of input it covers.
func (l *Lexer) readIdentifier() string {
    position := l.position

    for isLetter(l.ch) {
        l.readChar()
    }

    return l.input[position:l.position]
}

// readNumber reads a run of digits and returns the slice it covers.
func (l *Lexer) readNumber() string {
    position := l.position
    for isDigit(l.ch) {
        l.readChar()
    }
    return l.input[position:l.position]
}

// isLetter counts underscores as letters so identifiers like foo_bar work.
func isLetter(ch byte) bool {
    return 'a' <= ch && ch <= 'z' || 'A' <= ch && ch <= 'Z' || ch == '_'
}

func isDigit(ch byte) bool {
    return '0' <= ch && ch <= '9'
}

func newToken(tokenType token.TokenType, ch byte) token.Token {
    return token.Token{Type: tokenType, Literal: string(ch)}
}
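To watch the lexer at work outside the test, a small throwaway main does the job. This is a hypothetical file; the module layout is assumed from the import paths above:

package main

import (
    "fmt"

    "interpreter/src/monkey/lexer"
    "interpreter/src/monkey/token"
)

func main() {
    l := lexer.New("let five = 5;")

    // Pull tokens until the lexer signals end of input.
    for tok := l.NextToken(); tok.Type != token.EOF; tok = l.NextToken() {
        fmt.Printf("%-12q %q\n", tok.Type, tok.Literal)
    }
}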

Challenges Faced

  1. Understanding Lexical Analysis

    • Challenge: Grasping how to break down source code into meaningful tokens

    • Solution: Created visual diagrams of the tokenization process

    • Learning: The importance of handling edge cases in lexical analysis

  2. Go's Type System

    • Challenge: Adapting to Go's strict typing after working with dynamic languages

    • Solution: Studied Go's type system documentation and examples

    • Learning: How Go's type system helps catch errors at compile time

Go Language Features Learned

  • Custom type definitions using type

  • Constants and iota (this chapter's token types use plain string constants; see the sketch after this list)

  • Struct composition

  • Go's testing framework

  • String manipulation in Go
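The string-valued token types above trade a little memory for readable test output. For reference, here is what the iota-based alternative could look like - a hypothetical sketch of the approach this chapter deliberately does not use:

package token

// TokenType as an integer enum via iota. Strings were chosen instead
// so a failing test prints "IDENT" rather than an opaque number.
type TokenType int

const (
    ILLEGAL TokenType = iota // 0
    EOF                      // 1
    IDENT                    // 2
    INT                      // 3
)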

Project Progress

  • [✓] Understanding Interpreters vs Compilers

  • [✓] Basic Lexer Implementation

  • [✓] Token Structure

  • [✓] Handles whitespace, identifiers, keywords, integers, and single-character operators/delimiters

  • [ ] Parser Implementation

  • [ ] AST Construction

Next Steps

  1. Implement more token types

  2. Handle comments (whitespace is already handled)

  3. Begin parser implementation

  4. Study AST structures

Personal Notes

Time spent on this chapter: 6 hours
Initial confidence level: Beginner
Current confidence level: Intermediate

Key Insights

  1. The lexer is simpler than I initially thought - it's just pattern matching

  2. Test-Driven Development is incredibly helpful when building an interpreter

  3. Go's strict typing helps catch potential issues early

Tips for Others

  • Draw out the token structure before coding

  • Write tests for edge cases first

  • Keep the lexer simple - resist the urge to make it too clever
