Regular expression patterns [[:upper:]] vs [A-Z]
Note: the comparisons in this article also apply to the [[:lower:]] vs [a-z] regexp patterns.
Oracle regular expressions (regexp) support both [[:upper:]] and [A-Z] to find uppercase letters. At first glance they appear to be the same. Even regex101.com defines them as the same:
[[:upper:]]: Matches uppercase letters. Equivalent to [A-Z]. The double square brackets is not a typo, POSIX notation demands it.
There is a slight difference between the two. [A-Z] only deals with the 26 letters of the English alphabet, whereas [[:upper:]] also matches uppercase letters outside that set, such as Ê (E with a circumflex accent or, as we learned in French class, "e avec un chapeau"). The following example highlights the difference using the demo Oracle emp table:
-- Change the "A" in MARTIN to "Ä" (A with an umlaut)
update emp
set ename = 'MÄRTIN'
where empno = 7654;
-- [A-Z]
select *
from emp
where 1=1
and empno = 7654
and regexp_like(ename, '^[A-Z]+$')
;
-- Returns
/*
No data found
*/
-- [[:upper:]]
select ename
from emp
where 1=1
and empno = 7654
and regexp_like(ename, '^[[:upper:]]+$')
;
-- Returns:
/*
ENAME
------
MÄRTIN
*/
-- Look at the underlying bytes
select ename, dump(ename)
from emp
where empno = 7654
;
/*
ENAME DUMP(ENAME)
------ -----------------------------------
MÄRTIN Typ=1 Len=7: 77,195,132,82,84,73,78
*/
-- The second character, Ä, is stored as two bytes (195,132 in UTF-8), outside the ASCII range for A-Z (65-90)
-- Reset
update emp
set ename = 'MARTIN'
where empno = 7654
;
As you can see, the results differ: [[:upper:]] matched the accented character, while [A-Z] did not. The following description from this Stack Overflow post highlights the difference:
[A-Z] matches only an ASCII uppercase letter, that is, a letter from A through Z. There are other, non-ASCII uppercase letters (e.g., in languages other than English).
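To compare the two patterns without modifying the emp table, a side-by-side query against dual is one option. This is only a sketch: the sample values are made up for illustration, and it assumes the session uses the same settings as the example above (how a range such as [A-Z] behaves in Oracle can vary with the NLS_SORT setting). It also covers the [[:lower:]] vs [a-z] case mentioned in the note at the top.

```sql
-- Hypothetical sample values; these are not rows from the emp table
with samples as (
  select 'MARTIN' as val from dual union all
  select 'MÄRTIN' as val from dual union all
  select 'martin' as val from dual union all
  select 'märtin' as val from dual
)
select
  val,
  -- Y when every character is matched by the ASCII range
  case when regexp_like(val, '^[A-Z]+$') then 'Y' else 'N' end as az_range,
  -- Y when every character is matched by the POSIX class
  case when regexp_like(val, '^[[:upper:]]+$') then 'Y' else 'N' end as upper_class,
  case when regexp_like(val, '^[a-z]+$') then 'Y' else 'N' end as az_lower_range,
  case when regexp_like(val, '^[[:lower:]]+$') then 'Y' else 'N' end as lower_class
from samples;
```

Rows where the range column and the class column disagree are the values that would be treated differently if one pattern were swapped for the other.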
If you use regular expressions in your code, do not change every [A-Z] to [[:upper:]] without consideration. There are times when you may want to keep [A-Z] in place (for example, lookup codes). I tend to use [[:upper:]] when dealing with user-entered fields, when it makes sense.
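As a rough illustration of that split, the sketch below keeps [A-Z] for a lookup code and uses [[:upper:]] for a user-entered name. The demo_lookup_codes table, its column, and the constraint name are hypothetical and exist only for this example; the second query reuses the demo emp table from above.

```sql
-- Hypothetical lookup table: codes are restricted to the ASCII letters A-Z
create table demo_lookup_codes (
  code varchar2(10) not null,
  constraint demo_lookup_codes_ck1
    check (regexp_like(code, '^[A-Z]+$'))
);

-- User-entered names: allow any uppercase letter, including accented ones
select ename
from emp
where regexp_like(ename, '^[[:upper:]]+$');
```

With this setup a code containing an accented letter would be rejected by the constraint, while a name such as MÄRTIN would still be returned by the query.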
Written by