ECMAScript TC39 - Regex Escape

nidhinkumar

The ECMAScript TC39 committee met last week in Tokyo to review new language proposals and advance them through the five maturity stages that every proposal must pass before it is officially adopted into the language.

  • STAGE 0 - Strawman Idea

  • STAGE 1 - Proposal Concept

  • STAGE 2 - Draft Specs

  • STAGE 3 - Candidate Implementation

  • STAGE 4 - Finished

Today we will look at the RegExp Modifiers proposal, which has reached Stage 4, and at the related RegExp.escape proposal.

Most regular expression engines let a pattern control a subset of its flags from inside the expression itself, a capability commonly used by parsers, syntax highlighters, and other tools. The flags typically covered are:

  • i IGNORE CASE

  • m MULTILINE

  • s SINGLE LINE (dot also matches line breaks)

  • x EXTENDED MODE

Modifiers are especially helpful where executable code cannot be evaluated to build the pattern, such as in a JSON configuration file.
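As a rough illustration of what a modifier looks like, the sketch below assumes the `(?i:...)` group syntax from the modifiers proposal and an engine that may or may not support it yet. The helper name `compileWithModifiers` is hypothetical. The pattern is built from a string, as it would be if it came from a configuration file, so an unsupported engine reports a catchable SyntaxError instead of failing at parse time.

```javascript
// Hypothetical helper: returns a RegExp, or null when the engine rejects
// the pattern (for example because inline modifiers are not supported).
function compileWithModifiers(source) {
  try {
    return new RegExp(source);
  } catch (err) {
    if (err instanceof SyntaxError) return null;
    throw err;
  }
}

// Only the "hello" group is case-insensitive; " World" stays case-sensitive.
const modifierRe = compileWithModifiers("^(?i:hello) World$");

if (modifierRe === null) {
  console.log("inline modifiers are not supported in this engine");
} else {
  console.log(modifierRe.test("HELLO World")); // group ignores case
  console.log(modifierRe.test("HELLO world")); // lowercase "world" is outside the group
}
```

Because the flag is scoped to the group, the rest of the pattern keeps its normal behaviour, which is the point of controlling only a subset of flags.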

Regex Escape

We often want to build a regular expression out of a string without treating the string's special characters as regular expression tokens. For example, if we want to replace all occurrences of the string let text = "Hello.", which we got from the user, we might be tempted to write ourLongText.replace(new RegExp(text, "g"), replacement) for some replacement string. However, this would match . against any character rather than against a literal dot.
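The pitfall is easy to reproduce. In this small sketch the sample text and the replacement word "Bye" are made up for illustration, and the user input is hard-coded:

```javascript
const text = "Hello."; // stands in for a string received from the user
const sampleText = "Hello. Hello! Hello?";

// The dot in "Hello." is treated as "any character", not as a literal dot.
const naive = new RegExp(text, "g");
const naiveResult = sampleText.replace(naive, "Bye");

console.log(naiveResult); // prints: Bye Bye Bye
```

All three greetings are replaced, although only the first one actually ends with a dot.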

While some other languages provide an unescape method, the proposal's authors chose to defer discussion of it to a later point, mainly because no evidence of people asking for it has been found (whereas RegExp.escape is commonly requested).

With RegExp.escape function

The proposal adds a static RegExp.escape function that escapes a string so that it can be used safely inside a regular expression:

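const ourLongText = "Hello. Hello! Hello?"; // sample text to search (illustrative)
const str = prompt("Please enter a string");
const escaped = RegExp.escape(str);
const re = new RegExp(escaped, "g"); // special tokens in str are now matched literally
console.log(ourLongText.replace(re, "")); // removes every literal occurrence of str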
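Not every engine ships RegExp.escape yet, so a fallback can be useful. The sketch below is one possible approach, not the proposal's algorithm: the helper name `escapeForRegExp` is hypothetical, and the hand-written branch escapes only the regex syntax characters, which is a simplification of what the native function does.

```javascript
// Hypothetical helper: prefers the native function when it exists,
// otherwise escapes the regex syntax characters by hand.
function escapeForRegExp(input) {
  if (typeof RegExp.escape === "function") {
    return RegExp.escape(input);
  }
  // "\\$&" inserts a backslash in front of each matched character.
  return input.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const userInput = "Hello."; // stands in for the prompt() result
const haystack = "Hello. Hello! Hello?";

const literalRe = new RegExp(escapeForRegExp(userInput), "g");
const replaced = haystack.replace(literalRe, "Bye");

console.log(replaced); // prints: Bye Hello! Hello?
```

This time only the greeting that really ends with a dot is replaced.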

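Note the double backslashes in the example string contents, which render as a single backslash. In the proposal's current algorithm, a leading ASCII letter or digit, whitespace, and some punctuation such as : are emitted as hex escapes (for example \x54 for T and \x20 for a space) rather than with a bare backslash.

RegExp.escape("The Quick Brown Fox"); // "\\x54he\\x20Quick\\x20Brown\\x20Fox"
RegExp.escape("Buy it. use it. break it. fix it."); // "\\x42uy\\x20it\\.\\x20use\\x20it\\.\\x20break\\x20it\\.\\x20fix\\x20it\\."
RegExp.escape("(*.*)"); // "\\(\\*\\.\\*\\)"
RegExp.escape("。^・ェ・^。"); // "。\\^・ェ・\\^。"
RegExp.escape("😊 *_* +_+ ... 👍"); // "😊\\x20\\*_\\*\\x20\\+_\\+\\x20\\.\\.\\.\\x20👍"
RegExp.escape("\\d \\D (?:)"); // "\\\\d\\x20\\\\D\\x20\\(\\?\\x3a\\)"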
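The remark about double backslashes can be checked directly. This short sketch uses only standard string and RegExp behaviour, with the escaped form of "(*.*)" written out by hand instead of calling RegExp.escape:

```javascript
// In source code each backslash is doubled; the string itself holds one.
const escapedLiteral = "\\(\\*\\.\\*\\)";

console.log(escapedLiteral);        // prints: \(\*\.\*\)
console.log(escapedLiteral.length); // prints: 10

// Anchored so that the whole input must equal the original text.
const exact = new RegExp("^" + escapedLiteral + "$");

console.log(exact.test("(*.*)")); // prints: true
console.log(exact.test("(a.b)")); // prints: false
```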

To learn more about regex escaping, see the proposal's GitHub repository.

Note: These features are not in the standard yet, and some of them might never be adopted.
