Regular Expression - Matching a URL
A regex, which is short for regular expression, is a sequence of characters that defines a specific search pattern. When included in code or search algorithms, regular expressions can be used to find certain patterns of characters within a string, or to find and replace a character or sequence of characters within a string. They are also frequently used to validate input. In this article we'll see how to use a regex to match a URL.
/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/
Let's break down the components of this regular expression used to match and validate URLs.
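Before going piece by piece, here is a minimal sketch of the pattern in use, assuming a JavaScript environment such as Node.js or a browser console. The helper name isUrl is made up for this example and is not part of any library.

```javascript
// The pattern from the article, stored as a regex literal.
const urlPattern = /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/;

// isUrl is a hypothetical helper name chosen for this sketch.
function isUrl(text) {
  // test() returns true when the whole string fits the pattern.
  return urlPattern.test(text);
}

console.log(isUrl("https://www.example.com/page1.html")); // true
console.log(isUrl("example.org"));                        // true
console.log(isUrl("not a url"));                          // false
```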
🌻 Regex Components
◾ Anchors
The URL matching regex starts with the ^ (caret) symbol and ends with the $ (dollar) symbol. These anchors require the entire string to match the pattern between them, so the regex matches a complete URL and not just part of a longer string.
The caret anchor indicates that the match must begin at the start of the string. It is important to note that the regular expression is case-sensitive: as written, the protocol, domain, and top-level domain parts accept only lowercase letters unless the case-insensitive flag (i) is added.
The dollar sign anchor indicates that the match must finish at the end of the string.
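To see what the anchors and case sensitivity do in practice, the sketch below compares the article's pattern with a copy that has ^ and $ removed. The unanchored copy and the variable names are assumptions made only for this comparison.

```javascript
const anchored = /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/;
// Same pattern with ^ and $ removed, for comparison only.
const unanchored = /(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?/;

const text = "visit example.com!";
console.log(anchored.test(text));   // false: the whole string is not a URL
console.log(unanchored.test(text)); // true: "example.com" is found inside it

// Rebuild the anchored pattern with the i flag to ignore case.
const caseInsensitive = new RegExp(anchored.source, "i");
console.log(anchored.test("HTTPS://EXAMPLE.COM"));        // false
console.log(caseInsensitive.test("HTTPS://EXAMPLE.COM")); // true
```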
◾ Quantifiers
Quantifiers specify how many times the preceding part of your regular expression may be repeated:
* (zero or more occurrences)
+ (one or more occurrences)
? (zero or one occurrence)
{} (a specific number or range of occurrences)
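The four quantifiers can be tried out on a tiny pattern. This is a small sketch with made-up patterns (an "a", some number of "b" characters, then a "c"), not part of the URL regex itself.

```javascript
const zeroOrMore = /^ab*c$/;     // any number of "b", including none
const oneOrMore = /^ab+c$/;      // at least one "b"
const zeroOrOne = /^ab?c$/;      // no "b" or exactly one "b"
const twoToThree = /^ab{2,3}c$/; // two or three "b"

console.log(zeroOrMore.test("ac"));   // true
console.log(oneOrMore.test("ac"));    // false
console.log(zeroOrOne.test("abbc"));  // false
console.log(twoToThree.test("abbc")); // true
```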
◾ Grouping Constructs
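The parentheses () in regular expressions group parts of a pattern together, so that a quantifier applies to the whole group rather than to a single character. Each group also captures the text it matched, which lets you read the pieces of a match separately.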
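A short sketch of both effects of parentheses, using small made-up patterns: the first pair shows how a group changes what a quantifier applies to, and the last lines show reading a captured group.

```javascript
const grouped = /^(ab)+$/; // + repeats the whole group "ab"
const ungrouped = /^ab+$/; // + repeats only the "b"

console.log(grouped.test("abab"));   // true
console.log(ungrouped.test("abab")); // false
console.log(ungrouped.test("abbb")); // true

// A group also captures what it matched; index 1 is the first group.
const match = "https://".match(/^(https?):\/\/$/);
console.log(match[1]); // https
```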
◾ Bracket Expressions
Bracket expressions [] match any one of a set of characters specified within square brackets. For example, [abc] matches any single character that is either a, b, or c.
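Here is a brief sketch of bracket expressions in isolation. The second pattern borrows the character set used for the domain name later in the article; the anchors and variable names are added only for this example.

```javascript
const abc = /^[abc]$/;            // exactly one character: a, b, or c
const hostChars = /^[\da-z.-]+$/; // one or more digits, lowercase letters, dots, or hyphens

console.log(abc.test("b"));  // true
console.log(abc.test("d"));  // false
console.log(abc.test("ab")); // false: the set matches a single character

console.log(hostChars.test("www.example-1")); // true
console.log(hostChars.test("WWW"));           // false: uppercase is not in the set
```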
🌻 Matching a URL
/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/
🟡 Protocol
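This part of our regex, https?:\/\/, matches the protocol of a URL, such as "http://" or "https://". The ? makes the 's' character optional, allowing URLs with both HTTP and HTTPS protocols to match. In the full pattern the whole protocol group is followed by another ?, so a URL without a protocol also matches.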
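The protocol fragment can be checked on its own. In this sketch the fragment is wrapped in ^ and $ so it is tested in isolation; that wrapping and the variable name are assumptions for the example.

```javascript
// Protocol fragment from the article, anchored so it is tested alone.
const protocol = /^https?:\/\/$/;

console.log(protocol.test("http://"));   // true
console.log(protocol.test("https://"));  // true
console.log(protocol.test("ftp://"));    // false
console.log(protocol.test("httpss://")); // false: ? allows at most one "s"
```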
🟡 Domain Name
This section, [\da-z.-]+, matches the domain name in the URL. It allows one or more lowercase letters, digits, hyphens, and dots. For www.example.com it matches:
www.example
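Because the domain set also allows dots, it is worth seeing where the full pattern splits the host between the domain group and the top-level domain group. The sketch below reads capture groups 2 and 3; splitHost is a hypothetical helper name used only here.

```javascript
const urlPattern = /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/;

// splitHost is a hypothetical helper: it returns the text captured by
// group 2 (domain name) and group 3 (top-level domain), or null.
function splitHost(url) {
  const match = url.match(urlPattern);
  if (match === null) return null;
  return { domain: match[2], tld: match[3] };
}

console.log(splitHost("www.example.com")); // { domain: 'www.example', tld: 'com' }
console.log(splitHost("example.co.uk"));   // { domain: 'example.co', tld: 'uk' }
```

The domain group is greedy, so it takes as much of the host as it can and leaves only the last dot-separated part for the top-level domain group.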
🟡 Top-Level Domain
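This section, [a-z.]{2,6}, matches the top-level domain part of the URL. It typically covers domain extensions like .com, .net, .org, etc. The {2,6} specifies that it consists of 2 to 6 characters, each a lowercase letter or a dot; the dot in front of the extension is matched separately by the \. just before this section.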
.com
.org
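The length limit is easiest to see when the top-level domain fragment is tested by itself. In this sketch the fragment is anchored with ^ and $ for isolation, which is an assumption of the example and not how it appears inside the full pattern.

```javascript
// Top-level domain fragment, anchored so it is tested alone.
const tld = /^[a-z.]{2,6}$/;

console.log(tld.test("com"));         // true
console.log(tld.test("co.uk"));       // true: 5 characters, the dot is allowed
console.log(tld.test("c"));           // false: shorter than 2 characters
console.log(tld.test("photography")); // false: longer than 6 characters
```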
🟡 Path and File Name
This segment, [/\w .-]*, matches the path and file name portion of the URL, which represents the specific location or resource on the web server. It allows forward slashes, word characters (letters, digits, and underscores), spaces, dots, and hyphens.
/page1.html
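A quick sketch of what the path character set accepts and rejects, again with the fragment anchored for isolation (an assumption of this example). Note that characters such as ? and = are not in the set, so a path with a query string does not fit it.

```javascript
// Path fragment from the article, anchored so it is tested alone.
const path = /^[\/\w .-]*$/;

console.log(path.test("/page1.html"));       // true
console.log(path.test("/docs/my file.txt")); // true: spaces are allowed
console.log(path.test(""));                  // true: * also allows an empty path
console.log(path.test("/search?q=regex"));   // false: "?" and "=" are not in the set
```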
_________________________________________
🌻 Regex101 is a popular online tool that can help you test, debug, and learn regular expressions (regex).
Conclusion
Regular expressions offer a powerful way to search, manipulate, and validate text data in programming. With practice, regex can become an indispensable part of your programming toolkit.
Written by
Yousra Kamal
Hi I'm Yousra, a clinical pharmacist and a software developer in the making. I'm self-taught, exploring development world with heavy focus on frontend applications.