Java Tokens

Dilip Patel

Java tokens are the smallest elements of a Java program that are meaningful to the compiler. They include keywords, identifiers, literals, operators, and separators. During compilation, Java code is broken down into these tokens through a process called tokenization, which involves lexical analysis, ignoring whitespace and comments, error detection, and symbol table creation. Keywords are reserved words with specific meanings, identifiers are names for program elements, literals are constant values, operators perform operations on operands, and separators define code structure. Tokenization is crucial for understanding and executing Java code.

What are Java Tokens?

Tokens are the smallest elements of a Java program that are meaningful to the compiler. When a Java program is compiled, the compiler parses the text and extracts individual tokens.

Java has five main types of tokens:

  • Keywords

  • Identifiers

  • Literals

  • Operators

  • Separators.

During compilation, Java code is broken down into these token types.

Example:

Token Type     Examples
Keywords       public, class, if, else
Identifiers    myVariable, calculateSum
Literals       42, 3.14, "Hello"
Operators      +, -, *, /, =
Separators     (, ), {, }, ;

Java Compilation: Tokenization

When a Java program is compiled, the process involves several steps, one of which is parsing the text and extracting individual tokens.

Tokenization breaks down source code into meaningful units.

  1. Lexical Analysis: This is the first phase of the compilation process. The source code is read by the compiler, and the sequence of characters is converted into a sequence of tokens. Tokens are the smallest units of meaningful data for the compiler, such as keywords, identifiers, literals, operators, and separators.

  2. Tokenization: Tokenization is the work the lexical analyzer performs. The compiler scans the source code from left to right and, at each step, takes the longest sequence of characters that forms a valid token. Each token is a string of characters that forms one syntactic unit. For example, in the statement int a = 5;, the tokens are int, a, =, 5, and ;.

  3. Ignoring Whitespace and Comments: Whitespace (spaces, tabs, newlines) and comments separate tokens but are not tokens themselves, so the compiler discards them once the tokens are identified. They do not affect the execution of the program; their purpose is to enhance readability and provide explanations within the code.

  4. Error Detection: During this phase, the compiler also checks for lexical errors, such as invalid characters or malformed tokens. If any errors are found, the compiler generates error messages to help the programmer identify and correct the issues.

  5. Symbol Table Creation: As tokens are identified, the compiler may also create a symbol table, which is a data structure used to store information about identifiers (such as variable names, function names, etc.) and their attributes (such as data type, scope, etc.).

  6. Output of Tokens: The result of the lexical analysis phase is a stream of tokens that are passed on to the next phase of compilation, which is syntax analysis or parsing. This phase uses the tokens to build a syntactic structure, often represented as a parse tree or abstract syntax tree (AST).
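The steps above can be sketched with a small regular-expression tokenizer. This is only an illustration under simplifying assumptions, not how the Java compiler is implemented: the class name SimpleTokenizer, its tokenize method, and the reduced set of token patterns are all hypothetical, and only a handful of operators and literal forms are recognized.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified tokenizer for illustration only.
class SimpleTokenizer {
    // Alternatives are tried left to right: whitespace and comments (skipped),
    // then words, numbers, string literals, and a small set of symbols.
    private static final Pattern TOKEN = Pattern.compile(
        "(?<skip>\\s+|//[^\\n]*|/\\*.*?\\*/)"
        + "|(?<word>[A-Za-z_$][A-Za-z0-9_$]*)"
        + "|(?<number>\\d+(\\.\\d+)?)"
        + "|(?<string>\"[^\"\\n]*\")"
        + "|(?<symbol>==|!=|<=|>=|&&|\\|\\||[-+*/%=<>!(){}\\[\\];,.])",
        Pattern.DOTALL);

    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        int pos = 0;
        while (pos < source.length()) {
            m.region(pos, source.length());
            if (!m.lookingAt()) {
                // Lexical error: no token pattern matches at this position.
                throw new IllegalArgumentException(
                    "Unexpected character at index " + pos + ": '" + source.charAt(pos) + "'");
            }
            if (m.group("skip") == null) {
                tokens.add(m.group()); // keep real tokens, drop whitespace and comments
            }
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("int a = 5; // declare a")); // prints [int, a, =, 5, ;]
    }
}
```

In this sketch, tokenize("int a=5;") and tokenize("int   a = 5 ;") return the same five tokens, which shows why extra whitespace does not change the program. A character that matches no pattern, such as #, raises an exception, standing in for the compiler's lexical error message.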

Keywords

Keywords in Java are reserved words that have specific meanings and purposes within the language. They are integral to the syntax and structure of Java programs and cannot be used for naming variables, methods, classes, or any other identifiers. This restriction ensures that the compiler can correctly interpret the code.

For example, the keyword class is used to define a new class in Java.

public class Example {
    public static void main(String[] args) {
        int number = 10; // 'int' is a keyword used to declare an integer variable
        if (number > 5) { // 'if' is a keyword used to start a conditional statement
            System.out.println("Number is greater than 5");
        }
    }
}

In this example:

  • public, class, static, void, int, and if are all keywords. String, Example, main, and number are not; they are identifiers.

  • public is used to specify the access level of the class and method.

  • class is used to declare a class.

  • int is used to declare an integer variable.

  • if is used to start a conditional statement.
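One way to check whether a word is reserved is the standard javax.lang.model.SourceVersion.isKeyword method. The sketch below wraps it in a hypothetical helper, isReserved, inside a hypothetical class KeywordCheck; note that the library method also reports the literals true, false, and null as reserved.

```java
import javax.lang.model.SourceVersion;

// Hypothetical helper class for illustration.
class KeywordCheck {
    // Returns true for Java keywords (and for the literals true, false, null).
    static boolean isReserved(String word) {
        return SourceVersion.isKeyword(word);
    }

    public static void main(String[] args) {
        String[] words = {"public", "class", "int", "if", "String", "number"};
        for (String word : words) {
            System.out.println(word + " -> " + isReserved(word));
        }
    }
}
```

Running this reports true for public, class, int, and if, and false for String and number, matching the split between keywords and identifiers in the example above.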

Identifiers

Identifiers in Java are names given to elements such as variables, methods, classes, and interfaces. They must adhere to specific rules:

  1. Allowed Characters: Identifiers can include letters (A-Z, a-z, and other Unicode letters), digits (0-9), dollar signs ($), and underscores (_).

  2. Starting Character: They must begin with a letter (A-Z, a-z), a dollar sign ($), or an underscore (_). They cannot start with a digit.

  3. Case Sensitivity: Identifiers are case-sensitive, meaning myVariable and MyVariable are considered different identifiers.

  4. Keywords Restriction: Identifiers cannot be Java keywords like int, class, or if, nor the literals true, false, and null.

  5. Length: There is no limit to the length of an identifier.

Examples:

  • Valid Identifiers: myVariable, _myVariable, $myVariable, myVariable1

  • Invalid Identifiers: 1myVariable (starts with a digit), my Variable (contains a space), my-Variable (contains a hyphen), int (a keyword)
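The rules above can be checked in code. The following is a minimal sketch, assuming a hypothetical class IdentifierCheck with a hypothetical method isValidIdentifier; it relies on the standard Character.isJavaIdentifierStart, Character.isJavaIdentifierPart, and SourceVersion.isKeyword methods. It inspects one char at a time, so it does not handle characters outside the Basic Multilingual Plane.

```java
import javax.lang.model.SourceVersion;

// Hypothetical validator for illustration.
class IdentifierCheck {
    static boolean isValidIdentifier(String name) {
        // Rule 2: must start with a letter, $ or _ (never a digit).
        if (name.isEmpty() || !Character.isJavaIdentifierStart(name.charAt(0))) {
            return false;
        }
        // Rule 1: every remaining character must be a letter, digit, $ or _.
        for (int i = 1; i < name.length(); i++) {
            if (!Character.isJavaIdentifierPart(name.charAt(i))) {
                return false;
            }
        }
        // Rule 4: must not be a keyword (or true, false, null).
        return !SourceVersion.isKeyword(name);
    }

    public static void main(String[] args) {
        System.out.println(isValidIdentifier("myVariable1")); // prints true
        System.out.println(isValidIdentifier("1myVariable")); // prints false
        System.out.println(isValidIdentifier("my-Variable")); // prints false
        System.out.println(isValidIdentifier("int"));         // prints false
    }
}
```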

Literals

Literals in Java are constant values written directly in the code, most often to initialize a variable or to appear as an operand in an expression. They represent fixed values and can be of various types:

  1. Integer Literals: These are whole numbers without fractional parts. They can be specified in different number systems:

    • Decimal: int decimalNumber = 42;

    • Octal: int octalNumber = 052; (starts with 0)

    • Hexadecimal: int hexNumber = 0x2A; (starts with 0x)

  2. Floating-point Literals: These represent numbers with fractional parts and can be written in decimal or exponent form:

    • Decimal: double decimalFloat = 3.14;

    • Exponent: double exponentFloat = 2.5e3; (equivalent to 2.5 × 10³)

  3. Boolean Literals: These represent true or false values:

    • boolean isJavaFun = true;

    • boolean isFishTasty = false;

  4. Character Literals: These are single characters enclosed in single quotes, including escape sequences for special characters:

    • char letterA = 'A';

    • char newLine = '\n'; (escape sequence for newline)

  5. String Literals: These are sequences of characters enclosed in double quotes:

    • String greeting = "Hello, World!";

    • String emptyString = ""; (an empty string)
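To see that the different notations are only different spellings of the same values, the literals above can be compared at run time. This is a small sketch; the class name LiteralForms and its helper methods are hypothetical.

```java
// Hypothetical demo class for illustration.
class LiteralForms {
    // The same value written in decimal, octal, and hexadecimal.
    static int[] integerForms() {
        return new int[] {42, 052, 0x2A};
    }

    // Exponent form: 2.5 x 10^3.
    static double exponentForm() {
        return 2.5e3;
    }

    public static void main(String[] args) {
        for (int value : integerForms()) {
            System.out.println(value);          // prints 42 each time
        }
        System.out.println(exponentForm());     // prints 2500.0
        System.out.println((int) '\n');         // prints 10, the code of the newline character
        System.out.println("".length());        // prints 0
    }
}
```

The leading 0 of an octal literal is easy to overlook: 052 is 42, not 52.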

Operators

Operators are symbols that specify operations to be performed on operands. They can be arithmetic (e.g., +, -, *, /), relational (e.g., <, >, ==, !=), logical (e.g., &&, ||), bitwise, etc. Operators are crucial for performing calculations and making decisions in a program.

  1. Arithmetic Operators: These operators perform basic mathematical operations.

    • + (Addition): int sum = 5 + 3; // sum is 8

    • - (Subtraction): int difference = 5 - 3; // difference is 2

    • * (Multiplication): int product = 5 * 3; // product is 15

    • / (Division): int quotient = 6 / 3; // quotient is 2

    • % (Modulus): int remainder = 5 % 3; // remainder is 2

  2. Relational Operators: These operators compare two values.

    • < (Less than): boolean result = 5 < 3; // result is false

    • > (Greater than): boolean result = 5 > 3; // result is true

    • == (Equal to): boolean result = 5 == 3; // result is false

    • != (Not equal to): boolean result = 5 != 3; // result is true

  3. Logical Operators: These operators are used to combine multiple boolean expressions.

    • && (Logical AND): boolean result = (5 > 3) && (3 > 1); // result is true

    • || (Logical OR): boolean result = (5 > 3) || (3 < 1); // result is true

    • ! (Logical NOT): boolean result = !(5 > 3); // result is false

  4. Bitwise Operators: These operators perform operations on bits.

    • & (Bitwise AND): int result = 5 & 3; // result is 1

    • | (Bitwise OR): int result = 5 | 3; // result is 7

    • ^ (Bitwise XOR): int result = 5 ^ 3; // result is 6

    • ~ (Bitwise NOT): int result = ~5; // result is -6

    • << (Left shift): int result = 5 << 1; // result is 10

    • >> (Right shift): int result = 5 >> 1; // result is 2
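The bitwise results are easier to follow when the operands are written in binary: 5 is 0101 and 3 is 0011. The sketch below prints the low four bits of each result; the class BitwiseDemo and its bits helper are hypothetical names used only for this illustration.

```java
// Hypothetical demo class for illustration.
class BitwiseDemo {
    // Formats the low four bits of a value as a zero-padded binary string.
    static String bits(int value) {
        String binary = Integer.toBinaryString(value & 0xF);
        return String.format("%4s", binary).replace(' ', '0');
    }

    public static void main(String[] args) {
        System.out.println(bits(5));       // prints 0101
        System.out.println(bits(3));       // prints 0011
        System.out.println(bits(5 & 3));   // prints 0001 (1): bits set in both
        System.out.println(bits(5 | 3));   // prints 0111 (7): bits set in either
        System.out.println(bits(5 ^ 3));   // prints 0110 (6): bits set in exactly one
        System.out.println(bits(5 << 1));  // prints 1010 (10): shifted left by one
        System.out.println(bits(5 >> 1));  // prints 0010 (2): shifted right by one
        System.out.println(~5);            // prints -6: every bit of the 32-bit int is flipped
    }
}
```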

Separators (Punctuators)

Separators are characters that separate elements within the program code. They help in defining the structure of the code. Common separators include parentheses (), braces {}, brackets [], semicolon ;, and comma ,. The semicolon terminates statements in Java.

  1. Parentheses (): Used to group expressions and parameters in method calls.

    • Example: System.out.println("Hello, World!"); - The parentheses enclose the argument for the println method.

  2. Braces {}: Define blocks of code, such as the body of a class, method, or loop.

    • Example:

        public class MyClass {
            public void myMethod() {
                // Code block
            }
        }
      
    • The braces enclose the code block for the class and method.

  3. Brackets []: Used for array declarations and accessing array elements.

    • Example: int[] numbers = {1, 2, 3}; - The brackets indicate that numbers is an array.

  4. Semicolon ;: Terminates statements in Java.

    • Example: int a = 5; - The semicolon marks the end of the statement.

  5. Comma ,: Separates multiple items in a list, such as in variable declarations or method parameters.

    • Example: int a = 1, b = 2, c = 3; - The commas separate the variable declarations.
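As a combined illustration, the short sketch below uses all five separators in one method, with a comment pointing at each. The class SeparatorDemo and its sum method are hypothetical names.

```java
// Hypothetical demo class for illustration.
class SeparatorDemo {
    static int sum(int[] values) {       // () enclose the parameter list; [] mark an array type
        int total = 0, i = 0;            // , separates the two declarations; ; ends the statement
        while (i < values.length) {      // {} delimit the loop body
            total += values[i];          // [] index into the array
            i++;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[] {1, 2, 3})); // prints 6
    }
}
```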

Summary

  1. Keywords: Keywords are reserved words in Java that have predefined meanings. They are always written in lowercase and cannot be used as names for variables, methods, classes, or any other identifiers. Examples include int, class, and void.

  2. Identifiers: Identifiers are tokens that represent names assigned to variables, methods, and classes. They must follow certain rules: they can include uppercase and lowercase letters, digits, dollar signs, and underscores; they must begin with a letter, dollar sign, or underscore; they are case-sensitive; they cannot be keywords; and they can be of any length. Examples of invalid identifiers include names with spaces, punctuation, or starting with a number.

  3. Literals: Literals are constant values written directly in the code. They can be of various types:

    • Integer Literals: Whole numbers without fractional parts, which can be specified in decimal, octal, or hexadecimal format.

    • Floating-point Literals: Numbers with fractional parts, written in decimal or exponent form.

    • Boolean Literals: Represent true or false values.

    • Character Literals: Single characters enclosed in single quotes, including escape sequences for special characters.

    • String Literals: Sequences of zero or more characters enclosed in double quotes.

  4. Punctuators (Separators): Punctuators are used to group code elements and inform the compiler about the structure of the code. The most common punctuator in Java is the semicolon, which terminates statements.

  5. Operators: Operators are symbols that specify operations to be performed on operands. They can be arithmetic, relational, logical, bitwise, etc. Operators are crucial for performing calculations and making decisions in a program.

Conclusion

  • Tokens are the fundamental elements of a Java program.

  • Java is case-sensitive, meaning identifiers like myVariable and MyVariable are different.

  • Whitespace (spaces, tabs, newlines) separates tokens but doesn't affect program execution.

  • Tokenization is the process of breaking down source code into valid tokens.

Written by Dilip Patel, Software Developer