Building an Interpreter in Go - Chapter 1: Understanding the Basics

Akshay Thakur

Overview

Starting the journey of building an interpreter is like learning to be a translator between two worlds - the world of human-readable code and the world of machine instructions. This first chapter lays the foundation by explaining what an interpreter is, how it differs from a compiler, and the basic components we'll be building.

Key Concepts Learned

What is an Interpreter?

An interpreter is a program that directly executes instructions written in a programming language without requiring them to be compiled into machine language first. Think of it as a real-time translator who reads your code line by line and performs the actions immediately.

Components of an Interpreter

  1. Lexer (Tokenizer)

    • Breaks down source code into tokens

    • Example: let x = 5; becomes [LET, IDENT("x"), ASSIGN, INT("5"), SEMICOLON]

    • Acts as the "eyes" of our interpreter

  2. Parser

    • Transforms tokens into an Abstract Syntax Tree (AST)

    • Understands the grammar of our language

    • Catches syntax errors

  3. Abstract Syntax Tree (AST)

    • Tree representation of the code's structure

    • Each node represents an operation or value

    • Example:

        Assignment
        ├── Identifier(x)
        └── Value(5)
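The parser that builds such trees comes in a later chapter, but as a rough preview, nodes like the ones above could be modeled in Go along these lines. This is a hypothetical sketch - the names Assignment, Identifier, and IntegerLiteral are illustrative, not this project's final AST types:

package ast

import "fmt"

// Node is the interface every AST node would satisfy.
type Node interface {
    String() string
}

// Identifier is a leaf node like x.
type Identifier struct {
    Name string
}

func (i Identifier) String() string { return i.Name }

// IntegerLiteral is a leaf node like 5.
type IntegerLiteral struct {
    Value int64
}

func (n IntegerLiteral) String() string { return fmt.Sprintf("%d", n.Value) }

// Assignment is the root of the tree above: one identifier child
// and one value child.
type Assignment struct {
    Target Identifier
    Value  IntegerLiteral
}

func (a Assignment) String() string {
    return fmt.Sprintf("let %s = %s;", a.Target.String(), a.Value.String())
}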

Code Implementation Highlights

1. Token Structure in Go

package token

// TokenType is a string so debugging output stays human-readable.
type TokenType string

// Token pairs a token's type with the literal text it was read from.
type Token struct {
    Type    TokenType
    Literal string
}

// keywords maps reserved words to their token types. Note that "int" is
// deliberately not in here: INT is the type the lexer gives to integer
// literals like 5, not a keyword.
var keywords = map[string]TokenType{
    "fn":  FUNCTION,
    "let": LET,
}

// LookupIdent returns the keyword token type if ident is reserved,
// and IDENT otherwise.
func LookupIdent(ident string) TokenType {
    if tok, ok := keywords[ident]; ok {
        return tok
    }
    return IDENT
}

const (
    // Special tokens
    ILLEGAL = "ILLEGAL"
    EOF     = "EOF"

    // Identifiers and literals
    IDENT = "IDENT"
    INT   = "INT"

    // Operators
    ASSIGN = "="
    PLUS   = "+"

    // Delimiters
    COMMA     = ","
    SEMICOLON = ";"

    LPAREN = "("
    RPAREN = ")"
    LBRACE = "{"
    RBRACE = "}"

    // Keywords
    FUNCTION = "fn"
    LET      = "let"
)
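A quick way to sanity-check LookupIdent is a throwaway snippet like the one below, reusing the import path from the test in the next section:

package main

import (
    "fmt"

    "interpreter/src/monkey/token"
)

func main() {
    fmt.Println(token.LookupIdent("fn"))   // fn  -> the FUNCTION token type
    fmt.Println(token.LookupIdent("let"))  // let -> the LET token type
    fmt.Println(token.LookupIdent("five")) // IDENT, since "five" is not a keyword
}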

2. First Test Case

package lexer 

import (
    "testing"

    "interpreter/src/monkey/token"
)

func TestNextToken(t *testing.T) { 
    input := `
        let five = 5;

        let ten = 10;

        let add = fn(x, y) {
            x + y;
        };

        let result = add(five, ten);

    `

    tests := []struct {
        expectedType    token.TokenType
        expectedLiteral string
    }{

        {token.LET, "let"},
        {token.IDENT, "five"},
        {token.ASSIGN, "="},
        {token.INT, "5"},
        {token.SEMICOLON, ";"},
        {token.LET, "let"},
        {token.IDENT, "ten"},
        {token.ASSIGN, "="},
        {token.INT, "10"},
        {token.SEMICOLON, ";"},
        {token.LET, "let"},
        {token.IDENT, "add"},
        {token.ASSIGN, "="},
        {token.FUNCTION, "fn"},
        {token.LPAREN, "("},
        {token.IDENT, "x"},
        {token.COMMA, ","},
        {token.IDENT, "y"},
        {token.RPAREN, ")"},
        {token.LBRACE, "{"},
        {token.IDENT, "x"},
        {token.PLUS, "+"},
        {token.IDENT, "y"},
        {token.SEMICOLON, ";"},
        {token.RBRACE, "}"},
        {token.SEMICOLON, ";"},
        {token.LET, "let"},
        {token.IDENT, "result"},
        {token.ASSIGN, "="},
        {token.IDENT, "add"},
        {token.LPAREN, "("},
        {token.IDENT, "five"},
        {token.COMMA, ","},
        {token.IDENT, "ten"},
        {token.RPAREN, ")"},
        {token.SEMICOLON, ";"},
        {token.EOF, ""},
    }

    l := New(input)

    for i, tt := range tests {
        tok := l.NextToken()

        if tok.Type != tt.expectedType {
            t.Fatalf("tests[%d] - tokentype wrong. expected=%q got=%q", i, tt.expectedType, tok.Type)
        }

        if tok.Literal != tt.expectedLiteral {
            t.Fatalf("tests[%d] - tokenliteral wrong. expected=%q got=%q", i, tt.expectedLiteral, tok.Literal)
        }
    }
}
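This test won't even compile until the lexer package in the next section exists; once it does, Go's standard tooling runs it. The package path here is assumed from the import paths above:

go test ./src/monkey/lexer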

3. Lexer Implementation

package lexer

import (
    "interpreter/src/monkey/token"
)

// Lexer walks the input one byte at a time (so ASCII only, for now).
type Lexer struct {
    input        string
    position     int  // index of the character currently in ch
    readPosition int  // index of the next character to read
    ch           byte // current character; 0 means end of input
}

func New(input string) *Lexer {
    l := &Lexer{input: input}
    l.readChar() // prime ch, position, and readPosition
    return l
}

// readChar advances by one character, setting ch to 0 (NUL) once the
// input is exhausted.
func (l *Lexer) readChar() {
    if l.readPosition >= len(l.input) {
        l.ch = 0
    } else {
        l.ch = l.input[l.readPosition]
    }

    l.position = l.readPosition
    l.readPosition += 1
}

func (l *Lexer) NextToken() token.Token {
    var tok token.Token

    l.skipWhitespace() // tokens never contain whitespace, so consume it up front

    switch l.ch {
    case '=':
        tok = newToken(token.ASSIGN, l.ch)
    case '+':
        tok = newToken(token.PLUS, l.ch)
    case '(':
        tok = newToken(token.LPAREN, l.ch)
    case ')':
        tok = newToken(token.RPAREN, l.ch)
    case '{':
        tok = newToken(token.LBRACE, l.ch)
    case '}':
        tok = newToken(token.RBRACE, l.ch)
    case ',':
        tok = newToken(token.COMMA, l.ch)
    case ';':
        tok = newToken(token.SEMICOLON, l.ch)
    case 0:
        tok.Literal = ""
        tok.Type = token.EOF
    default:
        if isLetter(l.ch) {
            tok.Literal = l.readIdentifier()
            tok.Type = token.LookupIdent(tok.Literal)
            return tok // readIdentifier already advanced past the token
        } else if isDigit(l.ch) {
            tok.Type = token.INT
            tok.Literal = l.readNumber()
            return tok // readNumber already advanced past the token
        } else {
            tok = newToken(token.ILLEGAL, l.ch)
        }
    }

    l.readChar()
    return tok
}

// skipWhitespace consumes spaces, tabs, newlines, and carriage returns
// between tokens.
func (l *Lexer) skipWhitespace() {
    for l.ch == ' ' || l.ch == '\t' || l.ch == '\n' || l.ch == '\r' {
        l.readChar()
    }
}

// readIdentifier reads a run of letters/underscores starting at the
// current position and returns the slice of input it covers.
func (l *Lexer) readIdentifier() string {
    position := l.position

    for isLetter(l.ch) {
        l.readChar()
    }

    return l.input[position:l.position]
}

// readNumber reads a run of digits and returns the slice it covers.
func (l *Lexer) readNumber() string {
    position := l.position
    for isDigit(l.ch) {
        l.readChar()
    }
    return l.input[position:l.position]
}

// isLetter counts underscores as letters so identifiers like foo_bar work.
func isLetter(ch byte) bool {
    return 'a' <= ch && ch <= 'z' || 'A' <= ch && ch <= 'Z' || ch == '_'
}

func isDigit(ch byte) bool {
    return '0' <= ch && ch <= '9'
}

func newToken(tokenType token.TokenType, ch byte) token.Token {
    return token.Token{Type: tokenType, Literal: string(ch)}
}
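To watch the lexer at work outside the test, a small throwaway main does the job. This is a hypothetical file; the module layout is assumed from the import paths above:

package main

import (
    "fmt"

    "interpreter/src/monkey/lexer"
    "interpreter/src/monkey/token"
)

func main() {
    l := lexer.New("let five = 5;")

    // Pull tokens until the lexer signals end of input.
    for tok := l.NextToken(); tok.Type != token.EOF; tok = l.NextToken() {
        fmt.Printf("%-12q %q\n", tok.Type, tok.Literal)
    }
}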

Challenges Faced

  1. Understanding Lexical Analysis

    • Challenge: Grasping how to break down source code into meaningful tokens

    • Solution: Created visual diagrams of the tokenization process

    • Learning: The importance of handling edge cases in lexical analysis

  2. Go's Type System

    • Challenge: Adapting to Go's strict typing after working with dynamic languages

    • Solution: Studied Go's type system documentation and examples

    • Learning: How Go's type system helps catch errors at compile time

Go Language Features Learned

  • Custom type definitions using type

  • Constants and iota (this chapter's token types use plain string constants; see the sketch after this list)

  • Struct composition

  • Go's testing framework

  • String manipulation in Go
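The string-valued token types above trade a little memory for readable test output. For reference, here is what the iota-based alternative could look like - a hypothetical sketch of the approach this chapter deliberately does not use:

package token

// TokenType as an integer enum via iota. Strings were chosen instead
// so a failing test prints "IDENT" rather than an opaque number.
type TokenType int

const (
    ILLEGAL TokenType = iota // 0
    EOF                      // 1
    IDENT                    // 2
    INT                      // 3
)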

Project Progress

  • [✓] Understanding Interpreters vs Compilers

  • [✓] Basic Lexer Implementation

  • [✓] Token Structure

  • [✓] Handles whitespace, identifiers, keywords, integers, and single-character operators/delimiters

  • [ ] Parser Implementation

  • [ ] AST Construction

Next Steps

  1. Implement more token types

  2. Handle comments (whitespace is already handled)

  3. Begin parser implementation

  4. Study AST structures

Personal Notes

Time spent on this chapter: 6 hours
Initial confidence level: Beginner
Current confidence level: Intermediate

Key Insights

  1. The lexer is simpler than I initially thought - it's just pattern matching

  2. Test-Driven Development is incredibly helpful when building an interpreter

  3. Go's strict typing helps catch potential issues early

Tips for Others

  • Draw out the token structure before coding

  • Write tests for edge cases first

  • Keep the lexer simple - resist the urge to make it too clever
