Building a Custom Compiler or Transpiler with Node.js

Pradeepto SarkarPradeepto Sarkar
23 min read

Introduction

In the world of software development, compilers and transpilers are essential tools that allow developers to leverage new programming languages, optimize code, and ensure cross-platform compatibility. Despite their critical role, building a compiler or transpiler is often viewed as a daunting task reserved for experts. This blog post aims to demystify the process by guiding you through the creation of a custom compiler or transpiler using Node.js. We will cover each step in detail, provide practical examples, and delve into advanced topics to give you a comprehensive understanding of the process.

Understanding the Basics

What is a Compiler?

A compiler is a program that translates code written in one programming language (the source language) into another language (the target language), typically machine code. Unlike interpreters, which execute code line by line, compilers analyze and convert the entire source code before execution. Compilers also perform optimizations to improve the performance and efficiency of the generated code. Examples of well-known compilers include GCC (GNU Compiler Collection) and LLVM.

What is a Transpiler?

A transpiler, or source-to-source compiler, converts code from one high-level programming language to another. Transpilers are often used to transform code to a version compatible with different environments or to implement features not natively supported. Popular transpilers include Babel for JavaScript and the TypeScript compiler.

Key Concepts in Compilation and Transpilation

Lexical Analysis

Lexical analysis, or tokenization, is the first phase of compilation. It converts the source code into tokens, which are meaningful character sequences representing language constructs like keywords, operators, identifiers, and literals. This phase involves scanning the source code and grouping characters into lexemes that correspond to the language's vocabulary.

Syntax Analysis (Parsing)

Parsing involves analyzing the token sequence to ensure it adheres to the language's grammatical rules. The parser generates an Abstract Syntax Tree (AST), a tree representation of the source code's structure. The AST is a hierarchical structure that represents the syntax of the source code in a way that makes it easier to perform further analysis and transformations.

Semantic Analysis

Semantic analysis verifies that the code makes sense logically. This phase includes type checking, ensuring variables are declared before use, and other context-specific checks. It involves traversing the AST and checking for semantic errors, such as type mismatches and scope violations.

Code Generation

This phase translates the AST into the target language, which could be machine code, bytecode, or another high-level language. Code generation involves converting the high-level representation of the source code into a form that can be executed by the target platform.

Optimization (Optional)

Optimization improves the performance and efficiency of the generated code. This step is optional but can significantly enhance the output. Optimizations can be performed at various stages of the compilation process, including during code generation and in a separate optimization phase.

Setting Up Your Node.js Environment

Required Tools and Libraries

  • Node.js: Ensure you have Node.js installed.

  • npm: Node Package Manager to manage dependencies.

  • Libraries:

    • Esprima for JavaScript parsing.

    • Estraverse for traversing and transforming ASTs.

    • Source-map for generating source maps.

    • Chalk for pretty-printing errors and output.

Initial Project Setup

  1. Create a New Project:

     shCopy codemkdir custom-compiler
     cd custom-compiler
     npm init -y
    
  2. Install Dependencies:

     shCopy codenpm install esprima estraverse source-map chalk
    
  3. Project Structure:

     cssCopy codecustom-compiler/
     ├── src/
     │   ├── lexer.js
     │   ├── parser.js
     │   ├── semantic.js
     │   ├── generator.js
     │   ├── optimizer.js
     │   ├── compiler.js
     ├── tests/
     │   └── test.js
     ├── examples/
     │   └── example.code
     ├── package.json
     └── README.md
    

    Building the Lexical Analyzer

    Token Definitions

    Define the tokens for your language. For simplicity, let's assume we're building a compiler for a simple arithmetic language. We'll define tokens for keywords, operators, identifiers, literals, and punctuation.

    1. Token Types:

       jsCopy codeconst TOKEN_TYPES = {
         KEYWORD: 'Keyword',
         IDENTIFIER: 'Identifier',
         NUMBER: 'Numeric',
         STRING: 'String',
         OPERATOR: 'Operator',
         PUNCTUATION: 'Punctuation',
       };
      
       const KEYWORDS = ['let', 'const', 'if', 'else', 'while', 'return'];
       const OPERATORS = ['+', '-', '*', '/', '=', '==', '!=', '<', '>', '<=', '>='];
       const PUNCTUATIONS = [';', ',', '(', ')', '{', '}'];
      

Writing the Lexer

  1. Create Lexer: src/lexer.js

     jsCopy codeconst esprima = require('esprima');
     const chalk = require('chalk');
    
     function tokenize(code) {
       try {
         return esprima.tokenize(code, { loc: true });
       } catch (error) {
         console.error(chalk.red('Lexical Error:'), error.message);
         throw error;
       }
     }
    
     module.exports = { tokenize };
    
  2. Example Usage:

     jsCopy codeconst { tokenize } = require('./lexer');
     const code = 'let x = 42;';
     console.log(tokenize(code));
    

Building the Parser

Defining Grammar

For our arithmetic language, we'll handle basic arithmetic expressions, variable assignments, conditional statements, and loops.

  1. Grammar Rules:

     sqlCopy codeprogram       → statement* EOF ;
     statement     → exprStmt | ifStmt | whileStmt | block ;
     exprStmt      → expression ";" ;
     ifStmt        → "if" "(" expression ")" statement ("else" statement)? ;
     whileStmt     → "while" "(" expression ")" statement ;
     block         → "{" statement* "}" ;
     expression    → assignment ;
     assignment    → IDENTIFIER "=" assignment | equality ;
     equality      → comparison ( ( "!=" | "==" ) comparison )* ;
     comparison    → addition ( ( ">" | ">=" | "<" | "<=" ) addition )* ;
     addition      → multiplication ( ( "-" | "+" ) multiplication )* ;
     multiplication→ unary ( ( "/" | "*" ) unary )* ;
     unary         → ( "!" | "-" ) unary | primary ;
     primary       → NUMBER | STRING | IDENTIFIER | "(" expression ")" ;
    

Writing the Parser

  1. Create Parser: src/parser.js

     jsCopy codeconst esprima = require('esprima');
     const estraverse = require('estraverse');
     const chalk = require('chalk');
    
     function parse(code) {
       try {
         const ast = esprima.parseScript(code, { loc: true });
         return ast;
       } catch (error) {
         console.error(chalk.red('Syntax Error:'), error.message);
         throw error;
       }
     }
    
     function traverseAst(ast, visitors) {
       estraverse.traverse(ast, {
         enter: visitors.enter || (() => {}),
         leave: visitors.leave || (() => {}),
       });
     }
    
     module.exports = { parse, traverseAst };
    
  2. Example Usage:

     jsCopy codeconst { parse } = require('./parser');
     const code = 'let x = 42;';
     console.log(JSON.stringify(parse(code), null, 2));
    

Implementing Semantic Analysis

Type Checking and Scoping

We'll ensure variables are declared before use and handle basic type checking. Our semantic analysis will traverse the AST and perform context-specific checks.

  1. Create Semantic Analyzer: src/semantic.js

     jsCopy codeconst { traverseAst } = require('./parser');
     const chalk = require('chalk');
    
     function analyze(ast) {
       const declaredVariables = new Set();
    
       traverseAst(ast, {
         enter(node) {
           if (node.type === 'VariableDeclaration') {
             node.declarations.forEach(decl => declaredVariables.add(decl.id.name));
           } else if (node.type === 'Identifier' && node.parent.type !== 'VariableDeclarator') {
             if (!declaredVariables.has(node.name)) {
               console.error(chalk.red(`Undeclared variable ${node.name} at line ${node.loc.start.line}`));
               throw new Error(`Undeclared variable ${node.name}`);
             }
           }
         },
       });
     }
    
     module.exports = { analyze };
    
  2. Example Usage:

     jsCopy codeconst { parse } = require('./parser');
     const { analyze } = require('./semantic');
     const code = 'let x = 42; y = x + 1;';
     const ast = parse(code);
     analyze(ast); // Throws an error for undeclared variable 'y'
    

Generating Target Code

Code Generation Techniques

We'll generate JavaScript code from our AST. The code generator will traverse the AST and produce equivalent JavaScript code.

Writing the Code Generator

  1. Create Code Generator: src/generator.js

     jsCopy codeconst escodegen = require('escodegen');
     const chalk = require('chalk');
    
     function generate(ast) {
       try {
         return escodegen.generate(ast);
       } catch (error) {
         console.error(chalk.red('Code Generation Error:'), error.message);
         throw error;
       }
     }
    
     module.exports = { generate };
    
  2. Example Usage:

     jsCopy codeconst { parse } = require('./parser');
     const { analyze } = require('./semantic');
     const { generate } = require('./generator');
     const code = 'let x = 42;';
     const ast = parse(code);
     analyze(ast);
     const output = generate(ast);
     console.log(output); // Outputs the JavaScript code
    

Optimization Techniques

Optimization Overview

Optimization techniques improve the performance and efficiency of the generated code. Common optimization techniques include:

  • Constant Folding: Evaluating constant expressions at compile time.

  • Dead Code Elimination: Removing code that will never be executed.

  • Inlining Functions: Replacing function calls with the function's body to reduce overhead.

Implementing Basic Optimizations

  1. Create Optimizer: src/optimizer.js

     jsCopy codeconst { traverseAst } = require('./parser');
    
     function optimize(ast) {
       traverseAst(ast, {
         leave(node) {
           // Constant Folding
           if (node.type === 'BinaryExpression' && node.left.type === 'Literal' && node.right.type === 'Literal') {
             switch (node.operator) {
               case '+':
                 node.type = 'Literal';
                 node.value = node.left.value + node.right.value;
                 delete node.left;
                 delete node.right;
                 break;
               case '-':
                 node.type = 'Literal';
                 node.value = node.left.value - node.right.value;
                 delete node.left;
                 delete node.right;
                 break;
               case '*':
                 node.type = 'Literal';
                 node.value = node.left.value * node.right.value;
                 delete node.left;
                 delete node.right;
                 break;
               case '/':
                 node.type = 'Literal';
                 node.value = node.left.value / node.right.value;
                 delete node.left;
                 delete node.right;
                 break;
             }
           }
    
           // Dead Code Elimination
           if (node.type === 'IfStatement' && node.test.type === 'Literal') {
             if (node.test.value) {
               return node.consequent;
             } else if (node.alternate) {
               return node.alternate;
             } else {
               return null;
             }
           }
         },
       });
     }
    
     module.exports = { optimize };
    
  2. Example Usage:

     jsCopy codeconst { parse } = require('./parser');
     const { analyze } = require('./semantic');
     const { optimize } = require('./optimizer');
     const { generate } = require('./generator');
     const code = 'let x = 2 + 3; if (false) { x = 10; }';
     const ast = parse(code);
     analyze(ast);
     optimize(ast);
     const output = generate(ast);
     console.log(output); // Outputs optimized JavaScript code
    

Putting It All Together

Full Example

  1. Create Compiler: src/compiler.js

     jsCopy codeconst { tokenize } = require('./lexer');
     const { parse } = require('./parser');
     const { analyze } = require('./semantic');
     const { optimize } = require('./optimizer');
     const { generate } = require('./generator');
    
     function compile(code) {
       const tokens = tokenize(code);
       const ast = parse(code);
       analyze(ast);
       optimize(ast);
       const output = generate(ast);
       return output;
     }
    
     module.exports = { compile };
    
  2. Example Usage:

     jsCopy codeconst { compile } = require('./compiler');
     const code = 'let x = 2 + 3; if (false) { x = 10; }';
     const output = compile(code);
     console.log(output); // Outputs optimized JavaScript code
    

Testing Your Compiler/Transpiler

  1. Create Tests: tests/test.js

     jsCopy codeconst { compile } = require('../src/compiler');
     const assert = require('assert');
    
     const code1 = 'let x = 42;';
     const output1 = compile(code1);
     assert.strictEqual(output1, code1);
    
     const code2 = 'let y = 2 + 3; if (false) { y = 10; }';
     const expectedOutput2 = 'let y = 5;';
     const output2 = compile(code2);
     assert.strictEqual(output2, expectedOutput2);
    
     console.log('All tests passed!');
    
  2. Run Tests:

     shCopy codenode tests/test.js
    

Advanced Topics and Optimization

Advanced Optimization Techniques

Beyond basic optimizations, advanced techniques can further improve the performance and efficiency of the generated code. These techniques often require more sophisticated analysis and transformations:

  • Loop Unrolling: Expanding loops to reduce the overhead of loop control.

  • Function Inlining: Replacing function calls with the function body to eliminate call overhead.

  • Constant Propagation: Replacing variables that have constant values with their constant values.

Implementing Advanced Optimizations

  1. Loop Unrolling:

     jsCopy codefunction unrollLoops(ast) {
       traverseAst(ast, {
         leave(node) {
           if (node.type === 'ForStatement' && node.test.type === 'BinaryExpression' && node.test.operator === '<') {
             const iterations = node.test.right.value;
             if (iterations <= 4) { // Example threshold
               let unrolledBody = [];
               for (let i = 0; i < iterations; i++) {
                 const bodyClone = JSON.parse(JSON.stringify(node.body));
                 unrolledBody = unrolledBody.concat(bodyClone.body);
               }
               node.body.body = unrolledBody;
               node.test.value = false; // Disable the original loop
             }
           }
         },
       });
     }
    
  2. Function Inlining:

     jsCopy codefunction inlineFunctions(ast) {
       const functionDeclarations = {};
    
       traverseAst(ast, {
         enter(node) {
           if (node.type === 'FunctionDeclaration') {
             functionDeclarations[node.id.name] = node;
           }
         },
       });
    
       traverseAst(ast, {
         enter(node, parent) {
           if (node.type === 'CallExpression' && functionDeclarations[node.callee.name]) {
             const func = functionDeclarations[node.callee.name];
             const funcBody = JSON.parse(JSON.stringify(func.body.body));
             const params = func.params.map(param => param.name);
             const args = node.arguments;
    
             funcBody.forEach(statement => {
               traverseAst(statement, {
                 enter(childNode) {
                   if (childNode.type === 'Identifier' && params.includes(childNode.name)) {
                     const argIndex = params.indexOf(childNode.name);
                     Object.assign(childNode, args[argIndex]);
                   }
                 },
               });
             });
    
             parent.body = parent.body.reduce((acc, stmt) => {
               if (stmt === node) {
                 acc = acc.concat(funcBody);
               } else {
                 acc.push(stmt);
               }
               return acc;
             }, []);
           }
         },
       });
     }
    
  3. Constant Propagation:

     jsCopy codefunction propagateConstants(ast) {
       const constants = {};
    
       traverseAst(ast, {
         enter(node) {
           if (node.type === 'VariableDeclarator' && node.init.type === 'Literal') {
             constants[node.id.name] = node.init.value;
           }
         },
       });
    
       traverseAst(ast, {
         enter(node) {
           if (node.type === 'Identifier' && constants.hasOwnProperty(node.name)) {
             node.type = 'Literal';
             node.value = constants[node.name];
           }
         },
       });
     }
    

Integrating Advanced Optimizations

To incorporate these advanced optimizations into our compiler, we need to update our optimizer.js file:

  1. Update Optimizer: src/optimizer.js

     jsCopy codeconst { traverseAst } = require('./parser');
    
     function optimize(ast) {
       traverseAst(ast, {
         leave(node) {
           // Basic Optimization: Constant Folding
           if (node.type === 'BinaryExpression' && node.left.type === 'Literal' && node.right.type === 'Literal') {
             switch (node.operator) {
               case '+':
                 node.type = 'Literal';
                 node.value = node.left.value + node.right.value;
                 delete node.left;
                 delete node.right;
                 break;
               case '-':
                 node.type = 'Literal';
                 node.value = node.left.value - node.right.value;
                 delete node.left;
                 delete node.right;
                 break;
               case '*':
                 node.type = 'Literal';
                 node.value = node.left.value * node.right.value;
                 delete node.left;
                 delete node.right;
                 break;
               case '/':
                 node.type = 'Literal';
                 node.value = node.left.value / node.right.value;
                 delete node.left;
                 delete node.right;
                 break;
             }
           }
    
           // Dead Code Elimination
           if (node.type === 'IfStatement' && node.test.type === 'Literal') {
             if (node.test.value) {
               return node.consequent;
             } else if (node.alternate) {
               return node.alternate;
             } else {
               return null;
             }
           }
         },
       });
    
       // Apply Advanced Optimizations
       unrollLoops(ast);
       inlineFunctions(ast);
       propagateConstants(ast);
     }
    
     module.exports = { optimize };
    
  2. Example Usage with Advanced Optimizations:

     jsCopy codeconst { parse } = require('./parser');
     const { analyze } = require('./semantic');
     const { optimize } = require('./optimizer');
     const { generate } = require('./generator');
     const code = `
       function add(a, b) {
         return a + b;
       }
       let result = add(2, 3);
       for (let i = 0; i < 3; i++) {
         result += i;
       }
     `;
     const ast = parse(code);
     analyze(ast);
     optimize(ast);
     const output = generate(ast);
     console.log(output); // Outputs optimized JavaScript code
    

Extending the Compiler

To support more complex features, consider extending the compiler to handle additional syntax and semantics:

  • Functions: Add support for function declarations, calls, and returns.

  • Control Flow: Implement support for more complex control flow constructs, such as switch statements and for loops.

  • Data Types: Add support for different data types, such as arrays, objects, and custom types.

Supporting Functions

To extend the compiler to handle functions, we need to update the grammar, parser, semantic analyzer, and code generator to support function declarations, calls, and returns.

  1. Update Grammar:

     jsCopy codeconst GRAMMAR = `
       program       → declaration* EOF ;
       declaration   → varDecl | funDecl | statement ;
       varDecl       → "let" IDENTIFIER ( "=" expression )? ";" ;
       funDecl       → "function" IDENTIFIER "(" parameters? ")" block ;
       parameters    → IDENTIFIER ( "," IDENTIFIER )* ;
       statement     → exprStmt | ifStmt | whileStmt | returnStmt | block ;
       exprStmt      → expression ";" ;
       ifStmt        → "if" "(" expression ")" statement ("else" statement)? ;
       whileStmt     → "while" "(" expression ")" statement ;
       returnStmt    → "return" expression? ";" ;
       block         → "{" declaration* "}" ;
       expression    → assignment ;
       assignment    → IDENTIFIER "=" assignment | equality ;
       equality      → comparison ( ( "!=" | "==" ) comparison )* ;
       comparison    → addition ( ( ">" | ">=" | "<" | "<=" ) addition )* ;
       addition      → multiplication ( ( "-" | "+" ) multiplication )* ;
       multiplication→ unary ( ( "/" | "*" ) unary )* ;
       unary         → ( "!" | "-" ) unary | primary ;
       primary       → NUMBER | STRING | IDENTIFIER | "(" expression ")" | call ;
       call          → IDENTIFIER "(" arguments? ")" ;
       arguments     → expression ( "," expression )* ;
     `;
    
  2. Update Parser: src/parser.js

     jsCopy codefunction parseFunctionDeclaration() {
       const node = {
         type: 'FunctionDeclaration',
         id: { type: 'Identifier', name: '' },
         params: [],
         body: { type: 'BlockStatement', body: [] }
       };
       consume('function');
       node.id.name = consume('IDENTIFIER').value;
       consume('(');
       if (peek().type !== ')') {
         do {
           node.params.push({
             type: 'Identifier',
             name: consume('IDENTIFIER').value
           });
         } while (match(','));
       }
       consume(')');
       node.body = parseBlock();
       return node;
     }
    
     function parseCallExpression(callee) {
       const node = {
         type: 'CallExpression',
         callee: callee,
         arguments: []
       };
       consume('(');
       if (peek().type !== ')') {
         do {
           node.arguments.push(parseExpression());
         } while (match(','));
       }
       consume(')');
       return node;
     }
    
     function parseExpression() {
       // Extend parsePrimary to handle function calls
       const expr = parsePrimary();
       if (match('(')) {
         return parseCallExpression(expr);
       }
       return expr;
     }
    
     function parsePrimary() {
       if (match('NUMBER', 'STRING')) {
         return {
           type: 'Literal',
           value: previous().value
         };
       }
       if (match('IDENTIFIER')) {
         return {
           type: 'Identifier',
           name: previous().value
         };
       }
       if (match('(')) {
         const expr = parseExpression();
         consume(')');
         return expr;
       }
       throw new Error(`Unexpected token: ${peek().type}`);
     }
    
     function parseBlock() {
       const node = { type: 'BlockStatement', body: [] };
       consume('{');
       while (!match('}')) {
         node.body.push(parseDeclaration());
       }
       return node;
     }
    
     function parseDeclaration() {
       if (match('let')) return parseVariableDeclaration();
       if (match('function')) return parseFunctionDeclaration();
       return parseStatement();
     }
    
     module.exports = { parse };
    
  3. Update Semantic Analysis: src/semantic.js

     jsCopy codefunction analyze(ast) {
       const scopes = [{}];
       const declaredVariables = new Set();
    
       function enterScope() {
         scopes.push({});
       }
    
       function leaveScope() {
         scopes.pop();
       }
    
       traverseAst(ast, {
         enter(node) {
           if (node.type === 'FunctionDeclaration') {
             declaredVariables.add(node.id.name);
             enterScope();
             node.params.forEach(param => scopes[scopes.length - 1][param.name] = true);
           } else if (node.type === 'VariableDeclaration') {
             node.declarations.forEach(decl => scopes[scopes.length - 1][decl.id.name] = true);
           } else if (node.type === 'Identifier' && node.parent.type !== 'VariableDeclarator') {
             if (!scopes.some(scope => scope[node.name])) {
               console.error(`Undeclared variable ${node.name}`);
               throw new Error(`Undeclared variable ${node.name}`);
             }
           }
         },
         leave(node) {
           if (node.type === 'FunctionDeclaration') {
             leaveScope();
           }
         },
       });
     }
    
     module.exports = { analyze };
    
  4. Update Code Generation: src/generator.js

     jsCopy codefunction generateFunctionDeclaration(node) {
       let code = `function ${node.id.name}(${node.params.map(p => p.name).join(', ')}) `;
       code += generate(node.body);
       return code;
     }
    
     function generateCallExpression(node) {
       const callee = generate(node.callee);
       const args = node.arguments.map(arg => generate(arg)).join(', ');
       return `${callee}(${args})`;
     }
    
     function generate(node) {
       switch (node.type) {
         case 'Program':
           return node.body.map(generate).join('\n');
         case 'VariableDeclaration':
           return `let ${node.declarations.map(decl => generate(decl)).join(', ')};`;
         case 'VariableDeclarator':
           return `${node.id.name}${node.init ? ' = ' + generate(node.init) : ''}`;
         case 'FunctionDeclaration':
           return generateFunctionDeclaration(node);
         case 'BlockStatement':
           return `{ ${node.body.map(generate).join(' ')} }`;
         case 'ReturnStatement':
           return `return ${node.argument ? generate(node.argument) : ''};`;
         case 'IfStatement':
           return `if (${generate(node.test)}) ${generate(node.consequent)}${node.alternate ? ' else ' + generate(node.alternate) : ''}`;
         case 'BinaryExpression':
           return `${generate(node.left)} ${node.operator} ${generate(node.right)}`;
         case 'Literal':
           return JSON.stringify(node.value);
         case 'Identifier':
           return node.name;
         case 'CallExpression':
           return generateCallExpression(node);
         default:
           throw new Error(`Unknown node type: ${node.type}`);
       }
     }
    
     module.exports = { generate };
    
  5. Example Usage with Functions:

     jsCopy codeconst { compile } = require('./compiler');
     const code = `
       function add(a, b) {
         return a + b;
       }
       let result = add(2, 3);
     `;
     const output = compile(code);
     console.log(output); // Outputs JavaScript code with function support
    

Supporting Complex Control Flow

To extend the compiler to support more complex control flow constructs, such as switch statements and for loops, follow similar steps: update the grammar, parser, semantic analyzer, and code generator.

  1. Update Grammar:

     jsCopy codeconst GRAMMAR = `
       program       → declaration* EOF ;
       declaration   → varDecl | funDecl | statement ;
       varDecl       → "let" IDENTIFIER ( "=" expression )? ";" ;
       funDecl       → "function" IDENTIFIER "(" parameters? ")" block ;
       parameters    → IDENTIFIER ( "," IDENTIFIER )* ;
       statement     → exprStmt | ifStmt | whileStmt | forStmt | switchStmt | returnStmt | block ;
       exprStmt      → expression ";" ;
       ifStmt        → "if" "(" expression ")" statement ("else" statement)? ;
       whileStmt     → "while" "(" expression ")" statement ;
       forStmt       → "for" "(" exprStmt expression? ";" expression? ")" statement ;
       switchStmt    → "switch" "(" expression ")" "{" caseClauses "}" ;
       caseClauses   → caseClause* ( "default" ":" statement* )? ;
       caseClause    → "case" expression ":" statement* ;
       returnStmt    → "return" expression? ";" ;
       block         → "{" declaration* "}" ;
       expression    → assignment ;
       assignment    → IDENTIFIER "=" assignment | equality ;
       equality      → comparison ( ( "!=" | "==" ) comparison )* ;
       comparison    → addition ( ( ">" | ">=" | "<" | "<=" ) addition )* ;
       addition      → multiplication ( ( "-" | "+" ) multiplication )* ;
       multiplication→ unary ( ( "/" | "*" ) unary )* ;
       unary         → ( "!" | "-" ) unary | primary ;
       primary       → NUMBER | STRING | IDENTIFIER | "(" expression ")" | call ;
       call          → IDENTIFIER "(" arguments? ")" ;
       arguments     → expression ( "," expression )* ;
     `;
    
  2. Update Parser: src/parser.js

     jsCopy codefunction parseSwitchStatement() {
       const node = {
         type: 'SwitchStatement',
         discriminant: null,
         cases: []
       };
       consume('switch');
       consume('(');
       node.discriminant = parseExpression();
       consume(')');
       consume('{');
       while (!match('}')) {
         node.cases.push(parseCaseClause());
       }
       return node;
     }
    
     function parseCaseClause() {
       const node = {
         type: 'SwitchCase',
         test: null,
         consequent: []
       };
       if (match('case')) {
         node.test = parseExpression();
       } else {
         consume('default');
       }
       consume(':');
       while (!match('case') && !match('default') && !match('}')) {
         node.consequent.push(parseStatement());
       }
       return node;
     }
    
     function parseForStatement() {
       const node = {
         type: 'ForStatement',
         init: null,
         test: null,
         update: null,
         body: null
       };
       consume('for');
       consume('(');
       node.init = parseExpressionStatement();
       if (!match(';')) {
         node.test = parseExpression();
       }
       consume(';');
       if (!match(')')) {
         node.update = parseExpression();
       }
       consume(')');
       node.body = parseStatement();
       return node;
     }
    
     function parseDeclaration() {
       if (match('let')) return parseVariableDeclaration();
       if (match('function')) return parseFunctionDeclaration();
       if (match('switch')) return parseSwitchStatement();
       if (match('for')) return parseForStatement();
       return parseStatement();
     }
    
     module.exports = { parse };
    
  3. Update Semantic Analysis: src/semantic.js

     jsCopy codefunction analyze(ast) {
       const scopes = [{}];
       const declaredVariables = new Set();
    
       function enterScope() {
         scopes.push({});
       }
    
       function leaveScope() {
         scopes.pop();
       }
    
       traverseAst(ast, {
         enter(node) {
           if (node.type === 'FunctionDeclaration') {
             declaredVariables.add(node.id.name);
             enterScope();
             node.params.forEach(param => scopes[scopes.length - 1][param.name] = true);
           } else if (node.type === 'VariableDeclaration') {
             node.declarations.forEach(decl => scopes[scopes.length - 1][decl.id.name] = true);
           } else if (node.type === 'Identifier' && node.parent.type !== 'VariableDeclarator') {
             if (!scopes.some(scope => scope[node.name])) {
               console.error(`Undeclared variable ${node.name}`);
               throw new Error(`Undeclared variable ${node.name}`);
             }
           }
         },
         leave(node) {
           if (node.type === 'FunctionDeclaration') {
             leaveScope();
           }
         },
       });
     }
    
     module.exports = { analyze };
    
  4. Update Code Generation: src/generator.js

     jsCopy codefunction generateSwitchStatement(node) {
       let code = `switch (${generate(node.discriminant)}) {`;
       node.cases.forEach(c => {
         code += c.test ? `case ${generate(c.test)}:` : 'default:';
         c.consequent.forEach(stmt => {
           code += generate(stmt);
         });
       });
       code += '}';
       return code;
     }
    
     function generateForStatement(node) {
       let code = `for (${generate(node.init)} ${node.test ? generate(node.test) : ''}; ${node.update ? generate(node.update) : ''}) `;
       code += generate(node.body);
       return code;
     }
    
     function generate(node) {
       switch (node.type) {
         case 'Program':
           return node.body.map(generate).join('\n');
         case 'VariableDeclaration':
           return `let ${node.declarations.map(decl => generate(decl)).join(', ')};`;
         case 'VariableDeclarator':
           return `${node.id.name}${node.init ? ' = ' + generate(node.init) : ''}`;
         case 'FunctionDeclaration':
           return generateFunctionDeclaration(node);
         case 'BlockStatement':
           return `{ ${node.body.map(generate).join(' ')} }`;
         case 'ReturnStatement':
           return `return ${node.argument ? generate(node.argument) : ''};`;
         case 'IfStatement':
           return `if (${generate(node.test)}) ${generate(node.consequent)}${node.alternate ? ' else ' + generate(node.alternate) : ''}`;
         case 'BinaryExpression':
           return `${generate(node.left)} ${node.operator} ${generate(node.right)}`;
         case 'Literal':
           return JSON.stringify(node.value);
         case 'Identifier':
           return node.name;
         case 'CallExpression':
           return generateCallExpression(node);
         case 'SwitchStatement':
           return generateSwitchStatement(node);
         case 'ForStatement':
           return generateForStatement(node);
         default:
           throw new Error(`Unknown node type: ${node.type}`);
       }
     }
    
     module.exports = { generate };
    
  5. Example Usage with Control Flow:

     jsCopy codeconst { compile } = require('./compiler');
     const code = `
       let result = 0;
       switch (result) {
         case 0:
           result = 1;
           break;
         default:
           result = -1;
       }
    
       for (let i = 0; i < 3; i++) {
         result += i;
       }
     `;
     const output = compile(code);
     console.log(output); // Outputs JavaScript code with switch and for loop support
    

Adding Support for New Data Types

To support arrays, objects, and custom types, extend the grammar, parser, semantic analyzer, and code generator accordingly.

  1. Update Grammar:

     jsCopy codeconst GRAMMAR = `
       program       → declaration* EOF ;
       declaration   → varDecl | funDecl | statement ;
       varDecl       → "let" IDENTIFIER ( "=" expression )? ";" ;
       funDecl       → "function" IDENTIFIER "(" parameters? ")" block ;
       parameters    → IDENTIFIER ( "," IDENTIFIER )* ;
       statement     → exprStmt | ifStmt | whileStmt | forStmt | switchStmt | returnStmt | block ;
       exprStmt      → expression ";" ;
       ifStmt        → "if" "(" expression ")" statement ("else" statement)? ;
       whileStmt     → "while" "(" expression ")" statement ;
       forStmt       → "for" "(" exprStmt expression? ";" expression? ")" statement ;
       switchStmt    → "switch" "(" expression ")" "{" caseClauses "}" ;
       caseClauses   → caseClause* ( "default" ":" statement* )? ;
       caseClause    → "case" expression ":" statement* ;
       returnStmt    → "return" expression? ";" ;
       block         → "{" declaration* "}" ;
       expression    → assignment ;
       assignment    → IDENTIFIER "=" assignment | equality ;
       equality      → comparison ( ( "!=" | "==" ) comparison )* ;
       comparison    → addition ( ( ">" | ">=" | "<" | "<=" ) addition )* ;
       addition      → multiplication ( ( "-" | "+" ) multiplication )* ;
       multiplication→ unary ( ( "/" | "*" ) unary )* ;
       unary         → ( "!" | "-" ) unary | primary ;
       primary       → NUMBER | STRING | IDENTIFIER | "(" expression ")" | call | array | object ;
       call          → IDENTIFIER "(" arguments? ")" ;
       arguments     → expression ( "," expression )* ;
       array         → "[" elements? "]" ;
       elements      → expression ( "," expression )* ;
       object        → "{" properties? "}" ;
       properties    → property ( "," property )* ;
       property      → IDENTIFIER ":" expression ;
     `;
    
  2. Update Parser: src/parser.js

     jsCopy codefunction parseArray() {
       const node = { type: 'ArrayExpression', elements: [] };
       consume('[');
       if (!match(']')) {
         do {
           node.elements.push(parseExpression());
         } while (match(','));
       }
       consume(']');
       return node;
     }
    
     function parseObject() {
       const node = { type: 'ObjectExpression', properties: [] };
       consume('{');
       if (!match('}')) {
         do {
           node.properties.push(parseProperty());
         } while (match(','));
       }
       consume('}');
       return node;
     }
    
     function parseProperty() {
       const node = { type: 'Property', key: null, value: null };
       node.key = consume('IDENTIFIER');
       consume(':');
       node.value = parseExpression();
       return node;
     }
    
     function parsePrimary() {
       if (match('NUMBER', 'STRING')) {
         return {
           type: 'Literal',
           value: previous().value
         };
       }
       if (match('IDENTIFIER')) {
         return {
           type: 'Identifier',
           name: previous().value
         };
       }
       if (match('(')) {
         const expr = parseExpression();
         consume(')');
         return expr;
       }
       if (match('[')) {
         return parseArray();
       }
       if (match('{')) {
         return parseObject();
       }
       throw new Error(`Unexpected token: ${peek().type}`);
     }
    
     module.exports = { parse };
    
  3. Update Semantic Analysis: src/semantic.js

     jsCopy codefunction analyze(ast) {
       const scopes = [{}];
       const declaredVariables = new Set();
    
       function enterScope() {
         scopes.push({});
       }
    
       function leaveScope() {
         scopes.pop();
       }
    
       traverseAst(ast, {
         enter(node) {
           if (node.type === 'FunctionDeclaration') {
             declaredVariables.add(node.id.name);
             enterScope();
             node.params.forEach(param => scopes[scopes.length - 1][param.name] = true);
           } else if (node.type === 'VariableDeclaration') {
             node.declarations.forEach(decl => scopes[scopes.length - 1][decl.id.name] = true);
           } else if (node.type === 'Identifier' && node.parent.type !== 'VariableDeclarator') {
             if (!scopes.some(scope => scope[node.name])) {
               console.error(`Undeclared variable ${node.name}`);
               throw new Error(`Undeclared variable ${node.name}`);
             }
           }
         },
         leave(node) {
           if (node.type === 'FunctionDeclaration') {
             leaveScope();
           }
         },
       });
     }
    
     module.exports = { analyze };
    
  4. Update Code Generation: src/generator.js

     jsCopy codefunction generateArrayExpression(node) {
       return `[${node.elements.map(generate).join(', ')}]`;
     }
    
     function generateObjectExpression(node) {
       const properties = node.properties.map(prop => `${prop.key.name}: ${generate(prop.value)}`).join(', ');
       return `{${properties}}`;
     }
    
     function generate(node) {
       switch (node.type) {
         case 'Program':
           return node.body.map(generate).join('\n');
         case 'VariableDeclaration':
           return `let ${node.declarations.map(decl => generate(decl)).join(', ')};`;
         case 'VariableDeclarator':
           return `${node.id.name}${node.init ? ' = ' + generate(node.init) : ''}`;
         case 'FunctionDeclaration':
           return generateFunctionDeclaration(node);
         case 'BlockStatement':
           return `{ ${node.body.map(generate).join(' ')} }`;
         case 'ReturnStatement':
           return `return ${node.argument ? generate(node.argument) : ''};`;
         case 'IfStatement':
           return `if (${generate(node.test)}) ${generate(node.consequent)}${node.alternate ? ' else ' + generate(node.alternate) : ''}`;
         case 'BinaryExpression':
           return `${generate(node.left)} ${node.operator} ${generate(node.right)}`;
         case 'Literal':
           return JSON.stringify(node.value);
         case 'Identifier':
           return node.name;
         case 'CallExpression':
           return generateCallExpression(node);
         case 'SwitchStatement':
           return generateSwitchStatement(node);
         case 'ForStatement':
           return generateForStatement(node);
         case 'ArrayExpression':
           return generateArrayExpression(node);
         case 'ObjectExpression':
           return generateObjectExpression(node);
         default:
           throw new Error(`Unknown node type: ${node.type}`);
       }
     }
    
     module.exports = { generate };
    
  5. Example Usage with Arrays and Objects:

     jsCopy codeconst { compile } = require('./compiler');
     const code = `
       let arr = [1, 2, 3];
       let obj = { a: 1, b: 2 };
       arr.push(4);
       obj.c = 3;
     `;
     const output = compile(code);
     console.log(output); // Outputs JavaScript code with array and object support
    

With these extensions, the compiler now supports functions, advanced control flow constructs, and new data types, significantly enhancing its capabilities and practical applications. Each extension builds upon the previous implementation, demonstrating how to iteratively and modularly enhance a compiler's functionality.

Conclusion

We’ve covered the fundamental steps in building a custom compiler or transpiler using Node.js, delving into both basic and advanced optimization techniques. By understanding and implementing these techniques, you can significantly enhance the performance and efficiency of the generated code.

Creating a compiler is not only a technically enriching experience but also a way to deepen your appreciation for the tools and languages we use daily. Whether you’re optimizing your compiler for performance, extending it to support new language features, or simply refining your understanding, the skills you’ve acquired will serve you well across various domains in software development.

Remember, the field of compiler construction is vast and continually evolving. As you continue to experiment and learn, you’ll discover new techniques and approaches that can further enhance your projects. Don’t hesitate to dive into advanced topics, join online communities, and contribute to open-source projects. The world of compilers and transpilers is as rewarding as it is challenging, and with persistence and curiosity, you’ll continue to grow as a developer.

Happy coding, and may your compilers be ever efficient and your code transformations ever elegant!

References and Further Reading

  • Books:

    • "Compilers: Principles, Techniques, and Tools" by Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman.

    • "Modern Compiler Implementation in Java" by Andrew W. Appel.

  • Articles and Documentation:

  • Tutorials and Courses:

    • "Crafting Interpreters" by Bob Nystrom (Online book with detailed explanation of building a language interpreter).

    • "The Super Tiny Compiler" by Jamie Kyle (A step-by-step guide to building a compiler in JavaScript).

    • "Compiler Design" by GeeksforGeeks (Comprehensive tutorials on compiler design concepts).

Online Communities and Forums

  • Stack Overflow: A great place to ask questions and find answers related to compiler design and Node.js.

  • Reddit: Subreddits like r/Compilers and r/javascript can be useful for discussions and getting feedback on your projects.

  • GitHub: Explore open-source compiler and transpiler projects to see real-world implementations and contribute to them.

7
Subscribe to my newsletter

Read articles from Pradeepto Sarkar directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Pradeepto Sarkar
Pradeepto Sarkar

An undergraduate engineering student pursuing computer science currently in my final year. I am an avid competitive programmer and a full stack software developer covering both front-end and back-end of the product development process. The skills I am proficient in: Full Stack Web Development using MERN (MongoDB, Express, React and Node.js) Problem solving using C++ Good knowledge of open source Other domains that I have dabbled into: Blockchain development A little bit of Machine Learning Cross-platform mobile app development