Build up your confidence with Regex: 5 Techniques to make it STICK

My experience with Regex, and how it's a TIME-SAVER

Regular expressions (Regex) look intimidating due to their complex set of characters and symbols.

However, once you learn the basics, Regex becomes much simpler.

Take my story, for instance. I recently faced a problem in developing LiveAPI, where I had to take a codebase and extract the files that had API definitions.

Since there were many frameworks, I needed a solution that could match certain patterns in the code and filter out files with API definitions.

For instance, when working with a Flask codebase, I needed to locate files with API route definitions like this:

@app.route('/api/resource', methods=['GET'])
def get_resource():
    # Implementation here
    pass

or

@app.post('/api/resource')
def create_resource():
    # Implementation here
    pass

I didn't have a solid understanding of how regular expressions worked when I first ran into this problem.

So, I gradually learned the necessary techniques until I could write regular expressions with ease.

This enabled me to design a solution for extracting the files that had these API definitions and also saved me considerable time compared to manual searching.

Let's see how we can start with Regex, and slowly move towards how I solved the problem in detail.

Make your step towards learning Regex: The Essentials

Before going into the techniques for Regex, we need a solid understanding of what regular expressions are and the principles behind them, so that we can approach them logically.

Regex is short for Regular Expression. It helps to match, find or manage text.

A regular expression is a string of characters that expresses a search pattern. It is especially useful for finding or replacing words in text.

Additionally, we can test whether a text complies with the rules we set.
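
As a rough illustration of these three uses (finding, replacing, and testing), here is a minimal sketch using Python's built-in re module. The sample sentence and variable names are made up for this example and are not from the LiveAPI project.

```python
import re

text = "The cat sat next to another cat."

# Find: locate the first occurrence of the pattern "cat".
first_index = re.search(r"cat", text).start()

# Find all: collect every occurrence of the pattern.
all_matches = re.findall(r"cat", text)

# Replace: swap every match for another word.
replaced = re.sub(r"cat", "dog", text)

# Test: check whether a whole string complies with a rule
# (here the rule is simply "the string is exactly cat").
is_cat = re.fullmatch(r"cat", "cat") is not None
is_cats = re.fullmatch(r"cat", "cats") is not None

print(first_index, all_matches, replaced, is_cat, is_cats)
```

re.search looks for the pattern anywhere in the text, while re.fullmatch only succeeds when the entire string fits the pattern, which is why it suits the "does this text comply" kind of check.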

Now let's go through each concept one by one.

Basic Matchers and Characters

  • Direct Matching
    • For this one, just input the characters you want to match, and you are done
    • Example: To match "cat" in a string, just use cat.
  • The full stop .
    • The period . matches any single character, including special characters and spaces (in most Regex engines it does not match a newline by default)
      • Example: c.t will match "cat", "cut", "c t", and even "c$t".
    • Exception: The . is a special character in regular expressions, so to match an actual period, you must escape it using a backslash (\.).
      • Example: c\.t will match only "c.t" and not "cat" or "cut".
  • Character Sets []
    • If a position in a word can hold any one of several characters, we list those characters in square brackets
    • Example:
      • I want something that can match "cat", "cet", "cit", "cot", and "cut".
      • The common letters here are c and t
      • The letters in between are different, a,e,i,o,u
      • So the Regex required will be c[aeiou]t
  • Negated Character sets [^]
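    • If you want to exclude some characters from a particular position, write them inside [^]
    • Example:
      • I do not want the words "cat", "cet", "cit", "cot", and "cut" to match
      • So the Regex required will be c[^aeiou]t, which matches c, then any one character that is not a, e, i, o, or u, then t (for example "cbt" or "c1t")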
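
To try the basic matchers above, here is a small sketch in Python. The matches helper and the word list are hypothetical names introduced only for this example; the helper uses re.fullmatch so that a word counts only when the whole word fits the pattern.

```python
import re

def matches(pattern, word):
    """Return True when the whole word fits the pattern."""
    return re.fullmatch(pattern, word) is not None

# Direct matching: the pattern is just the characters themselves.
direct = matches(r"cat", "cat")

# The full stop matches any single character, even "$" or a space.
dot_symbol = matches(r"c.t", "c$t")
dot_space = matches(r"c.t", "c t")

# An escaped full stop matches only a literal period.
escaped_period = matches(r"c\.t", "c.t")
escaped_letter = matches(r"c\.t", "cat")

words = ["cat", "cut", "cbt", "c1t"]

# Character set: the middle character must be one of a, e, i, o, u.
in_set = [w for w in words if matches(r"c[aeiou]t", w)]

# Negated character set: the middle character must NOT be one of them.
not_in_set = [w for w in words if matches(r"c[^aeiou]t", w)]

print(in_set)      # ['cat', 'cut']
print(not_in_set)  # ['cbt', 'c1t']
```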

Ranges and Repetition

  • Letter Ranges [A-Z]
    • If you want to find letters in a certain range, write the starting letter and the ending letter separated by a dash, like [a-z] or [g-r]
    • Example: [a-z] matches any lowercase letter from a to z. [g-r] matches any lowercase letter from g to r, so h would match, but s would not.
  • Number Range [0-9]
    • If you want to find digits in a certain range, write the starting digit and the ending digit separated by a dash, like [0-9]
    • Example: [0-7] matches any single digit from 0 to 7, so 5 would match but 9 would not.
  • Asterisk *
    • We put an asterisk * after a character to indicate that the character may be absent or may repeat any number of times
    • Example: go*gle matches "ggle", "gogle", "google", "gooogle", etc.
    • So the character o here can appear 0 or more times
  • Plus Sign +
    • The + sign indicates that a character can occur one or more times
    • Example: go+gle matches "gogle", "google", "gooogle", but not "ggle".
    • So, here the character o appears 1 or more times
  • Question Mark ?
    • To indicate that a character is optional, we use the question mark ?
    • Example: colou?r matches both "color" and "colour".
    • Here the character u is optional.
  • Curly Braces {}
    • Use curly braces {n} to specify the exact number of times a character should occur.
    • Example: a{3} matches "aaa" exactly. "aa" is too short, and "aaaa" does not match as a whole, although a search would still find "aaa" inside it.
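
The ranges and repetition symbols above can be checked with a short Python sketch. As before, the matches helper and the sample word lists are assumptions made for illustration; re.fullmatch is used so that each word must fit the pattern from start to end.

```python
import re

def matches(pattern, word):
    """Return True when the whole word fits the pattern."""
    return re.fullmatch(pattern, word) is not None

# Ranges: a single character between the two endpoints (inclusive).
h_in_range = matches(r"[g-r]", "h")
s_in_range = matches(r"[g-r]", "s")
five_in_range = matches(r"[0-7]", "5")
nine_in_range = matches(r"[0-7]", "9")

candidates = ["ggle", "gogle", "google", "gooogle"]

# Asterisk: the "o" may appear zero or more times.
star = [w for w in candidates if matches(r"go*gle", w)]

# Plus sign: the "o" must appear at least once.
plus = [w for w in candidates if matches(r"go+gle", w)]

# Question mark: the "u" is optional.
optional = [w for w in ["color", "colour"] if matches(r"colou?r", w)]

# Curly braces: exactly three "a" characters as a whole word.
exact = [w for w in ["aa", "aaa", "aaaa"] if matches(r"a{3}", w)]

# A search (not a full match) still finds "aaa" inside "aaaa".
found_inside = re.search(r"a{3}", "aaaa").group()

print(star, plus, optional, exact, found_inside)
```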

Grouping and Alternation

Read the rest of the article on journal

Written by

Rijul Rajesh T P