Java Regex 101 - Mastering the Basics

Pratik M

Regular expressions, also known as regex or regexp, are essential tools for pattern matching and text manipulation in programming. In Java, the java.util.regex API provides full support for regular expressions. Mastering regex basics is key to leveraging its power for tasks like data validation, parsing, replacing text, etc.

Let's start with a quick regex refresher.

Regex Basics

A regex consists of a pattern to match against input text. It can contain character classes, quantifiers, anchors, groups, and other elements. For example:

\d{3}-\d{3}-\d{4}

This regex matches US phone numbers like 123-456-7890. It uses \d to match a digit, and {3} or {4} to repeat the preceding element exactly that many times.

Common Elements

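  • . matches any character except line terminators (any character at all in DOTALL mode)

  • \d matches a digit

  • \w matches a word character (letter, digit, or underscore)

  • \s matches a whitespace character

  • [abc] matches a, b, or c

  • [^abc] matches any character except a, b, or c

  • {3} matches exactly 3 instances of the preceding element

  • + matches 1 or more

  • * matches 0 or more

  • ? matches 0 or 1; placed after another quantifier (as in +? or *?), it makes that quantifier lazy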
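To make a few of these elements concrete, here is a minimal sketch that contrasts a greedy quantifier with its lazy form and tries a negated character class. The class name ElementsDemo and the helper firstMatch are hypothetical names chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ElementsDemo {

    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "<b>bold</b>";

        // Greedy: .+ grabs as much as it can, so the match runs to the last '>'
        System.out.println(firstMatch("<.+>", html));            // <b>bold</b>

        // Lazy: .+? stops at the first '>' that lets the pattern succeed
        System.out.println(firstMatch("<.+?>", html));           // <b>

        // Negated class: one or more characters that are not a, b, or c
        System.out.println(firstMatch("[^abc]+", "abcxyzabc"));  // xyz

        // \w+ matches a run of word characters and stops at the space
        System.out.println(firstMatch("\\w+", "hello world"));   // hello
    }
}
```

Note the doubled backslash in "\\w+": the Java string literal needs it so that the regex engine receives \w.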

Anchors

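  • ^ matches the start of the input (the start of each line in MULTILINE mode)

  • $ matches the end of the input, or just before a final line terminator (the end of each line in MULTILINE mode)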

  • \b matches word boundary
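The anchors can be tried out with a small counting helper. This is only a sketch: AnchorsDemo and countMatches are hypothetical names, and the sample text is made up for the demonstration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorsDemo {

    // Counts how many times the pattern is found in the input.
    static int countMatches(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "cat\ncatalog\ncat";

        // Default mode: ^ only matches at the very start of the input
        System.out.println(countMatches(Pattern.compile("^cat"), text));                     // 1

        // MULTILINE: ^ also matches right after each line terminator
        System.out.println(countMatches(Pattern.compile("^cat", Pattern.MULTILINE), text));  // 3

        // \b on both sides keeps "catalog" from counting as the word "cat"
        System.out.println(countMatches(Pattern.compile("\\bcat\\b"), text));                // 2
    }
}
```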

Groups

  • ( ) groups sub-patterns

  • (?: ) non-capturing group

  • | matches alternate patterns
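Here is a short sketch of how the three group elements interact. The file-name pattern and the names GroupsDemo and baseName are assumptions made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupsDemo {

    // (?:jpg|png) groups the alternation without capturing it,
    // so (\w+) is the only numbered group in this pattern.
    private static final Pattern IMAGE = Pattern.compile("(\\w+)\\.(?:jpg|png)");

    // Returns the name before the extension, or null if the input is not a jpg/png name.
    static String baseName(String fileName) {
        Matcher matcher = IMAGE.matcher(fileName);
        return matcher.matches() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        Matcher matcher = IMAGE.matcher("photo.png");
        if (matcher.matches()) {
            System.out.println(matcher.group());       // photo.png (the whole match)
            System.out.println(matcher.group(1));      // photo
            System.out.println(matcher.groupCount());  // 1
        }
        System.out.println(baseName("notes.txt"));     // null
    }
}
```

A non-capturing group is handy when you only need the parentheses for grouping and do not want to shift the numbering of the groups you actually read.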

Regex in Java

The java.util.regex API provides the Pattern and Matcher classes for regex support.

To use regex in Java:

  1. Compile the pattern into a Pattern instance

  2. Create a Matcher from the Pattern for the input text

  3. Use Matcher methods to perform matches

For example:

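import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Backslashes are doubled because this is a Java string literal
String patternString = "\\d{3}-\\d{3}-\\d{4}";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher("123-456-7890");

// matches() succeeds only if the entire input fits the pattern
if (matcher.matches()) {
  System.out.println("Valid phone number");
} else {
  System.out.println("Invalid phone number");
}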

This compiles the regex, creates a matcher for the input text, and prints whether the phone number is valid.

Matcher Methods

The Matcher class provides various useful methods:

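  • matches() - Checks whether the entire input matches the pattern

  • find() - Finds the next match anywhere in the input

  • group() - Returns the text of the current match; group(n) returns the nth captured group

  • replaceAll() - Replaces every match in the input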

For example:

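Matcher matcher = pattern.matcher("123-456-7890");

// Check whether the entire input matches
boolean matchFound = matcher.matches();

// Rewind first: find() would otherwise resume after the previous match
matcher.reset();

// Find the next match anywhere in the input
matchFound = matcher.find();

// Get the matched text (group() throws IllegalStateException without a successful match)
String matchedText = matchFound ? matcher.group() : null;

// Replace every match (replaceAll() resets the matcher itself)
String newString = matcher.replaceAll("X");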

So the matcher allows detecting, extracting, and replacing matches.
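Because each call to find() continues where the previous match ended, a while loop is a common way to collect every match in a longer text. The sketch below reuses the phone pattern from earlier; the class name FindAllDemo, the method findPhones, and the sample sentence are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAllDemo {

    private static final Pattern PHONE = Pattern.compile("\\d{3}-\\d{3}-\\d{4}");

    // Collects every phone-like substring in the input, in order of appearance.
    static List<String> findPhones(String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = PHONE.matcher(input);
        while (matcher.find()) {          // each call resumes after the previous match
            result.add(matcher.group());  // group() is the text of the current match
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Call 123-456-7890 or 987-654-3210 after 5pm.";
        System.out.println(findPhones(text)); // [123-456-7890, 987-654-3210]

        // matches() needs the whole input to fit, so it fails on the full sentence
        System.out.println(PHONE.matcher(text).matches()); // false
    }
}
```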

Regex Examples

Let's look at some examples to master regex basics in Java.

1. Validate Email

To check if the input is a valid email address:

String emailRegex = "^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,6}$";

String email = "john@example.com";

Pattern pattern = Pattern.compile(emailRegex);
Matcher matcher = pattern.matcher(email);

if(matcher.matches()) {
  System.out.println("Valid email");
} else {
  System.out.println("Invalid email"); 
}

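This uses character classes, anchors, and quantifiers to define a simplified email format. It is a practical check, not a complete implementation of the email address standards.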
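If you validate many addresses, it can help to compile the pattern once and reuse it. The following is a minimal sketch under that assumption; EmailValidator and isValid are hypothetical names, and the regex is the same one used above.

```java
import java.util.regex.Pattern;

class EmailValidator {

    // Compiled once and shared; Pattern instances are immutable and thread-safe.
    private static final Pattern EMAIL =
            Pattern.compile("^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,6}$");

    // Returns true if the whole string fits the simplified email format.
    static boolean isValid(String email) {
        return email != null && EMAIL.matcher(email).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("john@example.com"));         // true
        System.out.println(isValid("john@example"));             // false (no dot and TLD)
        System.out.println(isValid("@example.com"));             // false (empty local part)
        System.out.println(isValid("user@example.technology"));  // false (TLD longer than 6 letters)
    }
}
```

The last case shows a limit of the {2,6} quantifier: longer top-level domains are rejected, so you may want to widen that range for real-world input.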

2. Extract URL Parts

To extract the protocol, domain, and path from a URL:

String urlRegex = "^(https?)://([^/]+)([^?#]*)(\\?[^#]*)?(#.*)?$";

String url = "https://www.example.com/path1/path2?foo=bar"; 

Pattern pattern = Pattern.compile(urlRegex);
Matcher matcher = pattern.matcher(url);

if(matcher.find()) {
  String protocol = matcher.group(1); 
  String domain = matcher.group(2);
  String path = matcher.group(3);

  System.out.println("Protocol: " + protocol);
  System.out.println("Domain: " + domain);
  System.out.println("Path: " + path);
}

This extracts different parts of the URL using grouped subpatterns.
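The same pattern also captures the query string (group 4) and the fragment (group 5), but both groups are optional, and group(n) returns null when a group did not take part in the match. Here is a sketch of handling that case; UrlParts and query are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlParts {

    private static final Pattern URL =
            Pattern.compile("^(https?)://([^/]+)([^?#]*)(\\?[^#]*)?(#.*)?$");

    // Returns the query string without the leading '?',
    // "" when the URL has no query, or null when the URL does not match at all.
    static String query(String url) {
        Matcher matcher = URL.matcher(url);
        if (!matcher.matches()) {
            return null;
        }
        String rawQuery = matcher.group(4);  // null if the optional group was skipped
        return rawQuery == null ? "" : rawQuery.substring(1);
    }

    public static void main(String[] args) {
        System.out.println(query("https://www.example.com/path1/path2?foo=bar")); // foo=bar
        System.out.println(query("https://www.example.com/path1#top").isEmpty()); // true
        System.out.println(query("ftp://www.example.com/file"));                  // null
    }
}
```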

3. Format Text

To insert a newline before each run of uppercase letters:

String formatRegex = "([A-Z]{3,})";
String text = "IDCodeDATAValue";

Pattern pattern = Pattern.compile(formatRegex);
Matcher matcher = pattern.matcher(text);  

// Use replaceAll to reformat text
String formatted = matcher.replaceAll("\n$1"); 

System.out.println(formatted);

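This inserts a newline before every run of three or more uppercase letters, so the output is an empty line followed by IDCode and DATAValue on separate lines. Note that each run also includes the capital that starts the next word (IDC and DATAV). The $1 in the replacement string refers to the text captured by the first group.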
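Group references become more useful when a pattern has several groups, because the replacement can reorder them. The sketch below is an illustration only: ReplaceDemo, breakBeforeCaps, reorderDate, and the date format are assumptions made up for this example.

```java
import java.util.regex.Pattern;

class ReplaceDemo {

    private static final Pattern CAPS_RUN = Pattern.compile("([A-Z]{3,})");
    private static final Pattern ISO_DATE = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");

    // Same transformation as the example above: newline before each run of 3+ capitals.
    static String breakBeforeCaps(String text) {
        return CAPS_RUN.matcher(text).replaceAll("\n$1");
    }

    // Rewrites yyyy-MM-dd as dd/MM/yyyy by reordering the three captured groups.
    static String reorderDate(String text) {
        return ISO_DATE.matcher(text).replaceAll("$3/$2/$1");
    }

    public static void main(String[] args) {
        System.out.println(breakBeforeCaps("IDCodeDATAValue"));
        System.out.println(reorderDate("Released on 2024-01-15.")); // Released on 15/01/2024.
    }
}
```

If the replacement text comes from user input, Matcher.quoteReplacement can be used so that any $ or backslash in it is treated literally.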

4. Split String

To split a string on multiple delimiters:

String text = "foo,bar|baz|qux";

String delimRegex = "[,\\|]+"; 

Pattern pattern = Pattern.compile(delimRegex);
String[] parts = pattern.split(text);

// Arrays requires: import java.util.Arrays;
System.out.println(Arrays.toString(parts)); // [foo, bar, baz, qux]

This splits the string on a comma, pipe, or a mix of both.
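The + quantifier is what lets a run of delimiters count as a single separator. Here is a small sketch of the difference, using a made-up input with adjacent delimiters; SplitDemo and splitOn are hypothetical names.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitDemo {

    // Splits text around every match of the given regex.
    static String[] splitOn(String regex, String text) {
        return Pattern.compile(regex).split(text);
    }

    public static void main(String[] args) {
        String text = "foo,|bar||baz";

        // With +, adjacent delimiters collapse into one separator
        System.out.println(Arrays.toString(splitOn("[,|]+", text))); // [foo, bar, baz]

        // Without +, each delimiter splits on its own, leaving empty strings
        System.out.println(Arrays.toString(splitOn("[,|]", text)));  // [foo, , bar, , baz]
    }
}
```

Inside a character class the pipe is a literal character, so [,|] and [,\\|] behave the same way here.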

Conclusion

This covers the basics of regular expressions in Java, including regex syntax, java.util.regex API, common methods, and practical examples. Key highlights:

  • Regex provides powerful pattern-matching capabilities

  • Pattern and Matcher classes enable full regex support in Java

  • Anchors, character classes, groups, and quantifiers form basic building blocks

  • Matcher methods like find(), group() and replaceAll() allow robust text processing

  • Regex is useful for validation, parsing text, reformatting, and more

Mastering regex fundamentals unlocks the capability to find, extract, replace, and validate text efficiently. Hope you enjoyed this quick regex primer! Let me know if you have any other regex topics you would like me to cover.
