Regular expression patterns [[:upper:]] vs [A-Z]

Note: the comparisons in this article also apply to the [[:lower:]] vs [a-z] regexp patterns.

Oracle regular expressions (regexp) support both [[:upper:]] and [A-Z] to find uppercase letters. At first glance they appear to be the same. Even regex101.com describes them as equivalent:

[[:upper:]]: Matches uppercase letters. Equivalent to [A-Z]. The double square brackets is not a typo, POSIX notation demands it.

There is a slight difference between the two. [A-Z] only matches the 26 letters of the English alphabet, whereas [[:upper:]] also matches uppercase letters with diacritics, such as Ä or Ê, E with a circumflex (or, as we learned in French class, "e avec un chapeau"). The following example highlights the difference using the demo Oracle emp table:

-- Change the "A" in MARTIN to an A with an umlaut (Ä)
update emp
set ename = 'MÄRTIN'
where empno = 7654;


-- [A-Z]
select *
from emp
where 1=1
    and empno = 7654
    and regexp_like(ename, '^[A-Z]+$')
;

-- Returns
/*
No data found
*/

-- [[:upper:]]
select ename
from emp
where 1=1
    and empno = 7654
    and regexp_like(ename, '^[[:upper:]]+$')
;

-- Returns:
/*
ENAME  
------ 
MÄRTIN 
*/


-- Look at the underlying byte values
select ename, dump(ename)
from emp
where empno = 7654
;

/*
ENAME  DUMP(ENAME)                         
------ ----------------------------------- 
MÄRTIN Typ=1 Len=7: 77,195,132,82,84,73,78 
*/

-- The second character (Ä) is stored as bytes 195,132, which fall outside the ASCII A-Z range (65-90)

-- Reset
update emp
set ename = 'MARTIN'
where empno = 7654
;

As you can see, the results differ: [[:upper:]] matched the special character. The following description from this Stack Overflow post highlights the difference:

[A-Z] matches only an ASCII uppercase letter, that is, a letter from A through Z. There are other, non-ASCII uppercase letters (e.g., in languages other than English).
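
If you would rather not update the emp table, the same comparison can be sketched against inline values. This is a minimal example, not a definitive test: the sample strings and the column aliases ascii_range and posix_upper are made up for illustration, and the results may vary with your database character set and NLS_SORT setting.

```sql
-- Hypothetical sample values; no table changes required
with samples (val) as (
    select 'MARTIN' from dual union all
    select 'MÄRTIN' from dual union all
    select 'Martin' from dual
)
select val,
    -- Y when every character falls in the A-Z range
    case when regexp_like(val, '^[A-Z]+$') then 'Y' else 'N' end as ascii_range,
    -- Y when every character is classed as an uppercase letter
    case when regexp_like(val, '^[[:upper:]]+$') then 'Y' else 'N' end as posix_upper
from samples;
```

Based on the emp example above, the MÄRTIN row should be the one where the two columns disagree.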

If you use regular expressions in your code, do not change every [A-Z] to [[:upper:]] without consideration. There are times when you may want to keep [A-Z] in place (for example, lookup codes). I tend to use [[:upper:]] when dealing with user-entered fields, where it makes sense.
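
As a rough sketch of that split, the example below keeps [A-Z] for a code column and uses [[:upper:]] for a name entered by users. The table demo_status_codes, its columns, and the constraint name are hypothetical and only serve as an illustration; the second query runs against the demo emp table used earlier.

```sql
-- Hypothetical lookup table (not part of the demo schema)
create table demo_status_codes (
    code        varchar2(20) not null,
    description varchar2(100),
    -- Codes are deliberately limited to the A-Z range and underscores
    constraint demo_status_codes_ck
        check (regexp_like(code, '^[A-Z_]+$'))
);

-- User-entered names: accept any uppercase letter, accented or not
select ename
from emp
where regexp_like(ename, '^[[:upper:]]+$');
```

With the constraint in place, a code such as 'ÄCTIVE' should be rejected on insert, while the name query still returns names like MÄRTIN.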

Written by

Martin Giffy D'Souza