Chapter 5: Grammar of the language

Sawez FaisalSawez Faisal
4 min read

Understanding Grammars in Programming Languages

What is a Grammar?

In the world of programming languages, a grammar is a set of rules that defines the structure of valid statements within the language. It's the backbone of how we write and interpret code, serving several crucial purposes:

  1. It allows developers to understand how to write valid code

  2. It guides the implementation of parsers and compilers

  3. It helps in detecting and reporting syntax errors

A well-defined grammar is essential for creating a consistent and understandable programming language.

Components of a Grammar

A typical grammar consists of several key components:

  1. Terminal Symbols: These are the basic building blocks of the language, such as keywords, operators, and literals.

  2. Non-terminal Symbols: These represent higher-level constructs that can be broken down into terminal symbols or other non-terminals.

  3. Production Rules: These define how non-terminal symbols can be expanded into sequences of terminal and non-terminal symbols.

EBNF: A Common Notation for Grammars

Extended Backus-Naur Form (EBNF) is a popular notation used to express programming language grammars. It provides a concise and readable way to define the syntax rules of a language.

Key elements of EBNF notation include:

  • = means "is defined as"

  • | means "or"

  • [ ] encloses optional elements

  • { } encloses elements that can be repeated zero or more times

  • ( ) is used for grouping

  • Terminal symbols are often enclosed in quotes

Addressing Ambiguity in Grammars

One of the challenges in designing a grammar is avoiding ambiguity. A grammar is ambiguous if it allows for multiple valid interpretations of the same code. This can lead to unexpected behavior and makes it difficult for both humans and compilers to reason about the code.

Common sources of ambiguity include:

  1. Operator Precedence: When it's unclear which operators should be applied first in an expression.

  2. Dangling Else: In if-else statements when it's not clear which 'if' an 'else' belongs to.

  3. Left Recursion: When a non-terminal is defined in terms of itself as the leftmost symbol.

Here is a snippet of my grammar :

program = {function_declaration | function_definition | variable_declaration},
    main_function, {function_declaration | function_definition}, EOF;

main_function = "main", "(", ")", block;

function = "function", identifier, "(", [parameter_list], ")", block;
function_declaration = function, ";";
function_definition = function, block;

block = "{", {statement}, "}";

statement = variable_declaration | assignment | if_statement | while_loop |
            for_loop | function_call | expression_statement | return_statement |
            "error";

variable_declaration = type, identifier, [ "=", expression ], ";";

assignment = identifier, ("=" | "+=" | "-=" | "*=" | "/=" | "%="), expression,
    ";";

if_statement = "if", condition, block, {"else-if", condition, block},
    [ "else", block ];

while_loop = "while", condition, block;

for_loop = "for", "(", [type], identifier, "=", start, ",", identifier,
    comparison_operator, end, ",", identifier,
    ("+=" | "-=" | "*=" | "/=" | "%="), number ")", block;

function_call = identifier, "(", [argument_list], ")"

Resolving Ambiguity: The Case of Operator Precedence

Let's look at how a well-designed grammar can resolve ambiguity, using operator precedence as an example. Consider this expression:

expression = logic_or ;
logic_or = logic_and, {"||", logic_and} ;
logic_and = equality, {"&&", equality} ;
equality = relational, {("==" | "!="), relational} ;
relational = additive, {("<" | ">" | "<=" | ">="), additive} ;
additive = multiplicative, {("+" | "-"), multiplicative} ;
multiplicative = unary, {("*" | "/" | "%"), unary} ;
unary = {("!" | "-")}, primary ;
primary = NUMBER | IDENTIFIER | "(", expression, ")" ;

This structure ensures that operators with higher precedence (like *) are nested more deeply in the parse tree than operators with lower precedence (like + or ||). It also naturally handles left-associativity for binary operators.

Conclusion

Understanding grammars is crucial for anyone involved in language design or compiler construction. A well-designed grammar not only defines the syntax of a language but also plays a significant role in making the language intuitive and unambiguous for developers to use.

As programming languages continue to evolve, so too will the grammars that define them. Whether you're designing a new domain-specific language or just curious about how your favorite programming language works under the hood, a solid grasp of grammar concepts will serve you well.

0
Subscribe to my newsletter

Read articles from Sawez Faisal directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Sawez Faisal
Sawez Faisal

New to the field and eager to learn how complex systems work smoothly. From building compilers to scalable systems, I’m solving problems as they come—whatever the domain—and sharing all the highs, lows, and lessons along the way!