Java Tokens
Java tokens are the smallest elements of a Java program that are meaningful to the compiler. They include keywords, identifiers, literals, operators, and separators. During compilation, Java code is broken down into these tokens through a process called tokenization, which involves lexical analysis, ignoring whitespace and comments, error detection, and symbol table creation. Keywords are reserved words with specific meanings, identifiers are names for program elements, literals are constant values, operators perform operations on operands, and separators define code structure. Tokenization is crucial for understanding and executing Java code.
What are Java Tokens?
Tokens are the smallest elements of a Java program that are meaningful to the compiler. When a Java program is compiled, the compiler parses the text and extracts individual tokens.
Java has five main types of tokens:
Keywords
Identifiers
Literals
Operators
Separators
During compilation, Java code is broken down into these token types.
Example:
Token Type | Examples
Keywords | public, class, if, else
Identifiers | myVariable, calculateSum
Literals | 42, 3.14, "Hello"
Operators | +, -, *, /, =
Separators | (, ), {, }, ;
Java Compilation: Tokenization
When a Java program is compiled, the process involves several steps, one of which is parsing the text and extracting individual tokens.
Tokenization breaks down source code into meaningful units.
Lexical Analysis: This is the first phase of the compilation process. The source code is read by the compiler, and the sequence of characters is converted into a sequence of tokens. Tokens are the smallest units of meaningful data for the compiler, such as keywords, identifiers, literals, operators, and separators.
Tokenization: During tokenization, the compiler scans the source code from left to right, breaking it down into tokens. Each token represents a string of characters that form a syntactic unit. For example, in the statement int a = 5; the tokens would be int, a, =, 5, and ;.
Ignoring Whitespace and Comments: While parsing, the compiler ignores whitespace (spaces, tabs, newlines) and comments, as they do not affect the execution of the program. Their primary purpose is to enhance readability and provide explanations within the code.
Error Detection: During this phase, the compiler also checks for lexical errors, such as invalid characters or malformed tokens. If any errors are found, the compiler generates error messages to help the programmer identify and correct the issues.
Symbol Table Creation: As tokens are identified, the compiler may also create a symbol table, which is a data structure used to store information about identifiers (such as variable names, method names, and class names) and their attributes (such as data type and scope).
Output of Tokens: The result of the lexical analysis phase is a stream of tokens that are passed on to the next phase of compilation, which is syntax analysis or parsing. This phase uses the tokens to build a syntactic structure, often represented as a parse tree or abstract syntax tree (AST).
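The steps above can be sketched with a toy lexer. This is a minimal illustration, not how javac is implemented: the class name ToyLexer, the tiny keyword set, and the regular expression are all assumptions made for this example, and only a small part of Java's lexical grammar is covered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy lexer for illustration only; a real Java scanner handles far more cases.
class ToyLexer {
    // Deliberately small keyword subset (an assumption of this sketch).
    private static final Set<String> KEYWORDS = Set.of("int", "if", "else", "class", "public");

    // One named group per category. Whitespace and // comments are matched so they can be skipped.
    private static final Pattern TOKEN = Pattern.compile(
        "(?<skip>\\s+|//[^\\n]*)"
        + "|(?<word>[A-Za-z_$][A-Za-z0-9_$]*)"
        + "|(?<literal>\\d+(?:\\.\\d+)?|\"[^\"\\n]*\")"
        + "|(?<separator>[(){}\\[\\];,.])"
        + "|(?<operator>==|!=|<=|>=|&&|\\|\\||[-+*/%=<>!])");

    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        int pos = 0;
        while (pos < source.length()) {
            // Scan left to right: the next token must start exactly at pos.
            m.region(pos, source.length());
            if (!m.lookingAt()) {
                // Error detection: no token category accepts this character.
                throw new IllegalArgumentException(
                    "Lexical error at index " + pos + ": '" + source.charAt(pos) + "'");
            }
            pos = m.end();
            if (m.group("skip") != null) {
                continue; // whitespace and comments produce no token
            }
            String text = m.group();
            if (m.group("word") != null) {
                tokens.add((KEYWORDS.contains(text) ? "KEYWORD(" : "IDENTIFIER(") + text + ")");
            } else if (m.group("literal") != null) {
                tokens.add("LITERAL(" + text + ")");
            } else if (m.group("separator") != null) {
                tokens.add("SEPARATOR(" + text + ")");
            } else {
                tokens.add("OPERATOR(" + text + ")");
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [KEYWORD(int), IDENTIFIER(a), OPERATOR(=), LITERAL(5), SEPARATOR(;)]
        System.out.println(tokenize("int a = 5; // the comment is ignored"));
    }
}
```

The resulting list plays the role of the token stream that would be handed to the parser. An input such as int a = #; makes the sketch throw, which mirrors the error detection step.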
Keywords
Keywords in Java are reserved words that have specific meanings and purposes within the language. They are integral to the syntax and structure of Java programs and cannot be used for naming variables, methods, classes, or any other identifiers. This restriction ensures that the compiler can correctly interpret the code.
For example, the keyword class is used to define a new class in Java.
public class Example {
    public static void main(String[] args) {
        int number = 10; // 'int' is a keyword used to declare an integer variable
        if (number > 5) { // 'if' is a keyword used to start a conditional statement
            System.out.println("Number is greater than 5");
        }
    }
}
In this example:
public, class, static, void, int, and if are all keywords.
public is used to specify the access level of the class and method.
class is used to declare a class.
int is used to declare an integer variable.
if is used to start a conditional statement.
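One way to check whether a word is reserved is the standard javax.lang.model.SourceVersion.isKeyword method, which also reports true for the literals true, false, and null. The sketch below is only an illustration; the class name KeywordCheck and the helper isReserved are made up for this example.

```java
import javax.lang.model.SourceVersion;

// Illustrative helper; the class and method names are hypothetical.
class KeywordCheck {
    // Delegates to the JDK's own notion of a reserved word.
    static boolean isReserved(String word) {
        return SourceVersion.isKeyword(word);
    }

    public static void main(String[] args) {
        String[] words = {"public", "class", "static", "void", "int", "if", "number", "Example"};
        for (String word : words) {
            // The first six words are reserved; "number" and "Example" are not.
            System.out.println(word + " -> " + isReserved(word));
        }
    }
}
```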
Identifiers
Identifiers in Java are names given to elements such as variables, methods, classes, and interfaces. They must adhere to specific rules:
Allowed Characters: Identifiers can include uppercase and lowercase letters (A-Z, a-z), digits (0-9), dollar signs ($), and underscores (_).
Starting Character: They must begin with a letter (A-Z, a-z), a dollar sign ($), or an underscore (_). They cannot start with a digit.
Case Sensitivity: Identifiers are case-sensitive, meaning myVariable and MyVariable are considered different identifiers.
Keywords Restriction: Identifiers cannot be Java keywords like int, class, or if. The literals true, false, and null cannot be used as identifiers either.
Length: There is no limit to the length of an identifier.
Examples:
Valid Identifiers: myVariable, _myVariable, $myVariable, myVariable1
Invalid Identifiers: 1myVariable (starts with a digit), my Variable (contains a space), my-Variable (contains a hyphen), int (a keyword)
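The rules above can be expressed as a small check. This is a sketch under stated assumptions: the class IdentifierCheck and its RESERVED set are hypothetical, and the set is intentionally partial (the full list of reserved words is in the Java Language Specification). Character.isJavaIdentifierStart and Character.isJavaIdentifierPart are standard library methods.

```java
import java.util.Set;

// Illustrative validator; names are hypothetical and the reserved-word set is partial.
class IdentifierCheck {
    private static final Set<String> RESERVED = Set.of(
        "int", "class", "if", "else", "public", "static", "void", "true", "false", "null");

    static boolean isValidIdentifier(String name) {
        if (name == null || name.isEmpty() || RESERVED.contains(name)) {
            return false; // empty names and reserved words are rejected
        }
        if (!Character.isJavaIdentifierStart(name.charAt(0))) {
            return false; // must start with a letter, $ or _, never a digit
        }
        for (int i = 1; i < name.length(); i++) {
            if (!Character.isJavaIdentifierPart(name.charAt(i))) {
                return false; // spaces, hyphens and similar characters are not allowed
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] candidates = {"myVariable", "_myVariable", "$myVariable", "myVariable1",
                               "1myVariable", "my Variable", "my-Variable", "int"};
        for (String candidate : candidates) {
            // The first four are accepted, the last four are rejected.
            System.out.println(candidate + " -> " + isValidIdentifier(candidate));
        }
    }
}
```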
Literals
Literals in Java are constant values that are directly written in the code and assigned to variables. They represent fixed values and can be of various types:
Integer Literals: These are whole numbers without fractional parts. They can be specified in different number systems:
Decimal: int decimalNumber = 42;
Octal: int octalNumber = 052; (starts with 0)
Hexadecimal: int hexNumber = 0x2A; (starts with 0x)
Floating-point Literals: These represent numbers with fractional parts and can be written in decimal or exponent form:
Decimal: double decimalFloat = 3.14;
Exponent: double exponentFloat = 2.5e3; (equivalent to 2.5 × 10³)
Boolean Literals: These represent true or false values:
boolean isJavaFun = true;
boolean isFishTasty = false;
Character Literals: These are single characters enclosed in single quotes, including escape sequences for special characters:
char letterA = 'A';
char newLine = '\n'; (escape sequence for newline)
String Literals: These are sequences of characters enclosed in double quotes:
String greeting = "Hello, World!";
String emptyString = ""; (an empty string)
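A short sketch can confirm that the three integer notations above denote the same value and show what the exponent form evaluates to. The class name LiteralDemo and its constant names are made up for this illustration.

```java
// Illustrative only; the class and constant names are hypothetical.
class LiteralDemo {
    static final int DECIMAL = 42;
    static final int OCTAL = 052;        // leading 0 means base 8: 5*8 + 2 = 42
    static final int HEX = 0x2A;         // leading 0x means base 16: 2*16 + 10 = 42
    static final double EXPONENT = 2.5e3; // 2.5 * 10^3

    public static void main(String[] args) {
        System.out.println(DECIMAL == OCTAL && OCTAL == HEX); // true
        System.out.println(EXPONENT);                         // 2500.0
        char newLine = '\n';
        System.out.println((int) newLine);                    // 10, the code of the newline character
        System.out.println("".length());                      // 0, the empty string has no characters
    }
}
```

The octal form is easy to write by accident: a leading zero added for alignment changes the value, so 052 is 42, not 52.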
Operators
Operators are symbols that specify operations to be performed on operands. They can be arithmetic (e.g., +, -, *, /), relational (e.g., <, >, ==, !=), logical (e.g., &&, ||), bitwise, etc. Operators are crucial for performing calculations and making decisions in a program.
Arithmetic Operators: These operators perform basic mathematical operations.
+ (Addition): int sum = 5 + 3; // sum is 8
- (Subtraction): int difference = 5 - 3; // difference is 2
* (Multiplication): int product = 5 * 3; // product is 15
/ (Division): int quotient = 6 / 3; // quotient is 2
% (Modulus): int remainder = 5 % 3; // remainder is 2
Relational Operators: These operators compare two values.
< (Less than): boolean result = 5 < 3; // result is false
> (Greater than): boolean result = 5 > 3; // result is true
== (Equal to): boolean result = 5 == 3; // result is false
!= (Not equal to): boolean result = 5 != 3; // result is true
Logical Operators: These operators are used to combine multiple boolean expressions.
&& (Logical AND): boolean result = (5 > 3) && (3 > 1); // result is true
|| (Logical OR): boolean result = (5 > 3) || (3 < 1); // result is true
! (Logical NOT): boolean result = !(5 > 3); // result is false
Bitwise Operators: These operators perform operations on bits.
& (Bitwise AND): int result = 5 & 3; // result is 1
| (Bitwise OR): int result = 5 | 3; // result is 7
^ (Bitwise XOR): int result = 5 ^ 3; // result is 6
~ (Bitwise NOT): int result = ~5; // result is -6
<< (Left shift): int result = 5 << 1; // result is 10
>> (Right shift): int result = 5 >> 1; // result is 2
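The bitwise results are easier to follow when the operands are written in binary. The sketch below prints the low four bits of each value; the class BitwiseDemo and the helper bits4 are hypothetical names used only for this illustration.

```java
// Illustrative only; the class and helper names are hypothetical.
class BitwiseDemo {
    // Returns the low 4 bits of value as a zero-padded binary string.
    static String bits4(int value) {
        String s = Integer.toBinaryString(value & 0xF);
        return String.format("%4s", s).replace(' ', '0');
    }

    public static void main(String[] args) {
        int a = 5, b = 3;
        System.out.println("a      = " + bits4(a));      // 0101
        System.out.println("b      = " + bits4(b));      // 0011
        System.out.println("a & b  = " + bits4(a & b));  // 0001 (1): bits set in both
        System.out.println("a | b  = " + bits4(a | b));  // 0111 (7): bits set in either
        System.out.println("a ^ b  = " + bits4(a ^ b));  // 0110 (6): bits set in exactly one
        System.out.println("~a     = " + bits4(~a));     // 1010 in the low 4 bits; the full int is -6
        System.out.println("a << 1 = " + bits4(a << 1)); // 1010 (10): bits move one place left
        System.out.println("a >> 1 = " + bits4(a >> 1)); // 0010 (2): bits move one place right
    }
}
```

Bitwise NOT flips all 32 bits of an int, including the sign bit, which is why ~5 is -6 rather than a small positive number.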
Separators (Punctuators)
Separators are characters that separate elements within the program code. They help in defining the structure of the code. Common separators include parentheses (), braces {}, brackets [], semicolon ;, and comma ,. The semicolon terminates statements in Java.
Parentheses (): Used to group expressions and parameters in method calls.
Example: System.out.println("Hello, World!");
The parentheses enclose the argument for the println method.
Braces {}: Define blocks of code, such as the body of a class, method, or loop.
Example:
public class MyClass {
    public void myMethod() {
        // Code block
    }
}
The braces enclose the code block for the class and method.
Brackets []: Used for array declarations and accessing array elements.
Example: int[] numbers = {1, 2, 3};
The brackets indicate that numbers is an array.
Semicolon ;: Terminates statements in Java.
Example: int a = 5;
The semicolon marks the end of the statement.
Comma ,: Separates multiple items in a list, such as in variable declarations or method parameters.
Example: int a = 1, b = 2, c = 3;
The commas separate the variable declarations.
Summary
Keywords: Keywords are reserved words in Java that have predefined meanings. They are always written in lowercase and cannot be used as names for variables, methods, classes, or any other identifiers. Examples include int, class, void, etc.
Identifiers: Identifiers are tokens that represent names assigned to variables, methods, and classes. They must follow certain rules: they can include uppercase and lowercase letters, digits, dollar signs, and underscores; they must begin with a letter, dollar sign, or underscore; they are case-sensitive; they cannot be keywords; and they can be of any length. Examples of invalid identifiers include names that contain spaces or hyphens and names that start with a digit.
Literals: Literals are constant values assigned to variables. They can be of various types:
Integer Literals: Whole numbers without fractional parts, which can be specified in decimal, octal, or hexadecimal format.
Floating-point Literals: Numbers with fractional parts, written in decimal or exponent form.
Boolean Literals: Represent true or false values.
Character Literals: Single characters enclosed in single quotes, including escape sequences for special characters.
String Literals: Multiple characters enclosed in double quotes.
Punctuators (Separators): Punctuators are used to group code elements and inform the compiler about the structure of the code. The most common punctuator in Java is the semicolon, which terminates statements.
Operators: Operators are symbols that specify operations to be performed on operands. They can be arithmetic, relational, boolean, bitwise, etc. Operators are crucial for performing calculations and making decisions in a program.
Conclusion
Tokens are the fundamental elements of a Java program.
Java is case-sensitive, meaning identifiers like myVariable and MyVariable are different.
Whitespace (spaces, tabs, newlines) separates tokens but doesn't affect program execution.
Tokenization is the process of breaking down source code into valid tokens.
Written by
Dilip Patel
Software Developer