Unlock the Power of Trie in Java:Insertion,Searching,Word Break & more

In this chapter, we dive into one of the most versatile and efficient data structures used in various string manipulation problems – the Trie. We'll cover the basics of what a Trie is, how it works, and walk through its common applications like insertion, searching, and solving real-world problems such as word break and prefix issues. This chapter will deepen your understanding of how Tries can be applied in coding, and we'll also cover more advanced Trie-based problems.

Introduction to Tries

A **Trie(**pronounced as “try”), also known as a prefix tree or a retrieval tree, is a tree-like data structure that is used to store and manage strings efficiently. Each node in a Trie represents a single character, and multiple strings with common prefixes share the same initial nodes. This structure allows for efficient searching, insertion, and prefix-based operations.

For example, consider the words "there," "the," and "their." All these words share the common prefix "the." In a Trie, this prefix would only be stored once, and the branching would occur after the shared prefix, saving memory and speeding up retrieval. Similarly, if we want to store the words "apple" and "app," we would first store "app," and then extend it by adding the remaining part of "apple" (i.e., "le") rather than storing the entire word again.

A Trie is highly efficient for solving problems related to prefix matching and searching because it organizes data in such a way that common prefixes are stored together, reducing redundancy.

Common use cases for Trie include:

Auto-completion in search engines.
Spell checking.
Efficient storage and retrieval of words.

What is a Trie?

A Trie is a special type of K-ary tree (also known as a prefix tree or retrieval tree) used to efficiently store and retrieve strings. In a K-ary tree, each node can have up to K children, meaning it can point to multiple characters or nodes, unlike a binary tree which can only have up to two children (left and right).

Tries are particularly useful in scenarios where we work with prefixes and need to store large numbers of strings. The key idea of a Trie is to share prefixes among different words, reducing redundancy and improving efficiency in operations like searching, insertion, and auto-completion.

Comparison Between K-ary Tree and Binary Tree

A binary tree (Bi-ary tree) has at most two children (left and right), while a K-ary tree (in the case of a Trie) can have up to K children, where K is typically the number of possible characters (e.g., 26 for lowercase English letters).

Binary Tree (Bi-ary Tree) Example:

A balanced Binary Search Tree (BST), like an AVL tree, has a time complexity of O(log n) for searching because it maintains a balanced structure with a logarithmic height.
An unbalanced BST can degenerate into a linked list, resulting in a time complexity of O(n) for searching since the height of the tree can become as large as the number of nodes.

Visualization of a Binary Search Tree (Balanced vs. Unbalanced)

Balanced BST (AVL Tree):

        15
       /  \
      10   20
     / \   / \
    8  12 17  25

In this tree, the height difference between the left and right subtrees is at most 1, ensuring that the tree remains balanced, allowing for O(log n) search time.

Unbalanced BST:

This tree is unbalanced, resembling a linked list, with a time complexity of O(n) for search operations.

Visualizing a Trie

Consider the words: ["the", "a", "there", "their", "any", "thee"].

Start with an empty root node.
Add each word, sharing common prefixes when possible.

Trie Structure:

        (root)
        /   \
      a     t
     /      |
    n       h
   /        |
  y         e 
          / \  \
         r   e  e
        /     \
       e       i
                  \
                   r

Time Complexity in Trie vs. Binary Search Tree

Trie: The time complexity for search operations in a Trie is proportional to the length of the word, O(L), where L is the length of the word. Since all words with common prefixes share the same path, this can be very efficient when the height of the Trie is short.

Insertion

A Trie functions as a K-ary tree, where each node can have multiple children, unlike a binary tree, which has a maximum of two children. For a Trie, the maximum number of children is typically 26, corresponding to the lowercase letters of the English alphabet. This flexibility enables the Trie to represent various combinations of letters without redundancy, optimizing both space and search operations.

Node Implementation:
- The core structure of a Trie is the node, which can be implemented as follows:
```
  class Node {
      Node[] children = new Node[26]; // Represents 26 letters (a-z)
      boolean endOfWord = false; // Indicates if this node marks the end of a valid word
  }
```
- In this implementation, each Node contains an array of child nodes for each letter of the alphabet. The endOfWord boolean flag marks the completion of a word at that node, enabling efficient identification of valid words in the Trie.
Children Array:
- The children array in each node provides direct indexing based on character values. For example, to store the character 'c', it would be placed at index 2 (since 'a' is 0, 'b' is 1, etc.). This array structure minimizes the need for additional data structures, reducing overhead and allowing for rapid access to child nodes, which speeds up insertion and search operations.
End of Word Flag:
- The endOfWord flag is a crucial feature of each node. It serves to indicate whether a sequence of characters forming a word ends at the current node. When inserting the word "the," as each character is processed, the endOfWord flag at the node representing 'e' is set to true. This differentiation allows the Trie to accommodate prefixes and complete words without ambiguity, ensuring accurate search results.
Initialization:
- The constructor for the node initializes the children array to ensure all entries start as null. This is done to prepare each node for potential future insertions:
```
  Node() {
      for (int i = 0; i < 26; i++) {
          children[i] = null; // Initialize each child to null
      }
  }
```
- This initialization is critical because it prevents the program from encountering null pointer exceptions when accessing child nodes during insertion or search operations.
Root Node:
- The root of the Trie serves as the entry point for all operations and is typically initialized as follows:
```
  public static Node root = new Node(); // Root of the Trie
```
- The root node does not represent any character itself; it merely serves as a starting point from which all strings in the Trie can be accessed. This allows for a uniform entry point for inserting and searching words, regardless of their content.

iv) Insertion in Trie

To insert a word into a Trie, we start from the root node and move along the corresponding child nodes according to each character in the word. If the node for a character does not exist, we create a new node. Once the final character is inserted, we mark the isEndOfWord as true.

Java Code:

javaCopy codepublic void insert(String word) {
    TrieNode current = root;
    for (int i = 0; i < word.length(); i++) {
        int index = word.charAt(i) - 'a'; // Calculate index for each character
        if (current.children[index] == null) {
            current.children[index] = new TrieNode();
        }
        current = current.children[index];
    }
    current.isEndOfWord = true;
}

v) Searching in Trie

To search for a word in a Trie, we navigate through the child nodes for each character of the word. If at any point, the node corresponding to a character is null, we return false. If we reach the end of the word and the isEndOfWord flag is set to true, the word exists in the Trie.

Java Code:

javaCopy codepublic boolean search(String word) {
    TrieNode current = root;
    for (int i = 0; i < word.length(); i++) {
        int index = word.charAt(i) - 'a';
        if (current.children[index] == null) {
            return false; // Word doesn't exist
        }
        current = current.children[index];
    }
    return current.isEndOfWord;
}

vi) Word Break Problem Using Trie

The Word Break Problem involves checking if a given string can be segmented into a sequence of valid words using a dictionary. We can solve this problem efficiently using a Trie by inserting all the dictionary words and then checking if the input string can be broken down by searching prefixes in the Trie.

Approach:

Insert all dictionary words into the Trie.
For the input string, try to find prefixes using the Trie.
Recursively check if the remaining part of the string can be segmented.

vii) Prefix Problem in Trie

Given a prefix, the task is to find if any word in the Trie starts with that prefix. This can be done by navigating through the child nodes for each character in the prefix. If the traversal completes successfully, the prefix exists.

Java Code:

javaCopy codepublic boolean startsWith(String prefix) {
    TrieNode current = root;
    for (int i = 0; i < prefix.length(); i++) {
        int index = prefix.charAt(i) - 'a';
        if (current.children[index] == null) {
            return false;
        }
        current = current.children[index];
    }
    return true;
}

viii) StartsWith Problem

Similar to the prefix problem, we can use the startsWith method to find all words that begin with a specific prefix in the Trie.

ix) Unique Substring Problem

The problem of finding the number of unique substrings in a string can be solved efficiently using Trie. Each insertion into the Trie from different starting points in the string allows us to count unique substrings by counting the unique nodes created.

x) Longest Word with All Prefixes

To find the longest word in a list of words such that every prefix of the word is also present in the list, we can use Trie to store all the words and then check each word if all its prefixes are present in the Trie. The longest word that satisfies this condition is the result.

Conclusion

In this chapter, we explored the Trie data structure in detail, starting from basic operations like insertion and search to solving advanced problems like word break, prefix search, and longest word with all prefixes. Tries provide an efficient way to handle many string-related problems, making them invaluable in algorithmic challenges. As we continue, we'll delve deeper into solving more complex problems by leveraging this powerful data structure.

Follow my journey through the Java with DSA series as we explore more exciting concepts!

Related Posts:

You might be interested in exploring more detailed topics in my ongoing series:

Full Stack Java Development: A comprehensive guide to becoming a full stack Java developer.

Connect with me:

Your feedback and engagement are invaluable. Feel free to reach out with questions, comments, or suggestions. Happy coding!

Rohit Gawande
Full Stack Java Developer | Blogger | Coding Enthusiast

Chapter 34: Trie in Java - A Comprehensive Guide (Continuing Java With DSA Series)

Table of contents

Introduction to Tries

What is a Trie?

Comparison Between K-ary Tree and Binary Tree

Binary Tree (Bi-ary Tree) Example:

Visualization of a Binary Search Tree (Balanced vs. Unbalanced)

Visualizing a Trie

Time Complexity in Trie vs. Binary Search Tree

Insertion

iv) Insertion in Trie

v) Searching in Trie

vi) Word Break Problem Using Trie

vii) Prefix Problem in Trie

viii) StartsWith Problem

ix) Unique Substring Problem

x) Longest Word with All Prefixes

Conclusion

Subscribe to my newsletter

Rohit Gawande

Rohit Gawande