5 minutes · January 8, 2014

Regular expressions in PHP

PHP has three sets of functions for working with regular expressions. Programmers do not use these powerful functions often, because creating patterns seems difficult. It is also hard to find a basic, simple regular expression tutorial on a single web page. So I would like to collect the information here and make it easy and interesting to learn. I will also include some very useful regular expressions at the end of this post.

About regular expressions

A regular expression is a pattern that can match various strings. Regular expressions were popularized by Unix text tools such as ed and grep, where they made string operations easier. They are really useful in PHP programming because they can replace a lot of hand-written string-handling code. A simple example is validating an email address or a phone number.

Different sets of regular expressions

PHP provides three main sets of regular expression functions.
1.    preg – All the preg functions require the regular expression in Perl syntax, which means the pattern must be enclosed in delimiters, most commonly slashes (/pattern/). If you want to include a slash (/) inside such a pattern, you should escape it with a backslash (\/). That is the core idea of using Perl syntax in preg. (I will be using preg functions in my examples.)
2.    ereg – The ereg functions take the regular expression as a plain string in POSIX extended syntax, without delimiters. Note that they are deprecated as of PHP 5.3 and were removed in PHP 7.0, so new code should use preg.
3.    mb_ereg – These are very similar to the ereg functions, but where ereg treats a string as a series of 8-bit characters, mb_ereg can work with multibyte characters.
Enough theory, let’s get practical.
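To make the delimiter rule concrete, here is a minimal sketch. The sample path and the variable names are my own assumptions for illustration. The point is only that a slash inside a slash-delimited pattern needs a backslash, while picking another delimiter such as # avoids the escaping.

```php
<?php
// Hypothetical sample input, used only for illustration.
$path = 'docs/2014/regex.html';

// With "/" as the delimiter, every literal slash in the pattern needs a backslash.
$withSlashes = preg_match('/^docs\/\d{4}\//', $path);

// With "#" as the delimiter, the same pattern needs no escaping for slashes.
$withHashes = preg_match('#^docs/\d{4}/#', $path);

var_dump($withSlashes, $withHashes); // both are int(1)
```

Both calls test the same thing, so the choice of delimiter is mostly a matter of readability.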

Operators and purposes

Operator – Purpose
. (period) – Match any single character (except a newline by default)
^ (caret) – Match at the beginning of a line or string
$ (dollar sign) – Match at the end of a line or string
A – Match an uppercase letter A
a – Match a lowercase letter a
| – OR operator
\d – Match any single digit
\D – Match any single non-digit character
\w – Match any single word character (letter, digit, or underscore)
[A-Z] – Match any uppercase letter from A to Z
[^A-Z] – Match any character except uppercase A to Z
[0-9] – Match any digit from 0 to 9
[^0-9] – Match any character except the digits 0 to 9
X? – Match zero or one capital letter X
X* – Match zero or more capital Xes
X+ – Match one or more capital Xes
X{n} – Match exactly n capital Xes (e.g. A{2})
X{n,m} – Match at least n and no more than m capital Xes; if you omit m (X{n,}), the expression matches at least n Xes

Basic syntax explanation

I will explain the operators and syntax that are used most often.

The use of “^” and “$” (start with and end with)

“^Aaa” – matches any string that starts with “Aaa”;

“aa test$” – matches a string that ends with the substring “aa test”;

“^abc$” – matches a string that both starts and ends with “abc”, which can only be “abc” itself;

“abcd” – matches a string that has the text “abcd” anywhere in it.
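Here is a small sketch of the anchors in action with preg_match(), which returns 1 on a match and 0 otherwise. The test strings are made up for illustration.

```php
<?php
// Hypothetical test strings, chosen only to illustrate the anchors.
$startsWithAaa = preg_match('/^Aaa/', 'Aaa is first');        // 1: "Aaa" is at the start
$endsWithTest  = preg_match('/aa test$/', 'this is aa test'); // 1: "aa test" is at the end
$exactlyAbc    = preg_match('/^abc$/', 'abcd');               // 0: extra "d" after "abc"
$containsAbcd  = preg_match('/abcd/', 'xxabcdxx');            // 1: "abcd" appears somewhere

var_dump($startsWithAaa, $endsWithTest, $exactlyAbc, $containsAbcd);
```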

The use of “*”, “+” and “?” (zero or more, one or more, and zero or one)

“ab*” – matches a string that has an ‘a’ followed by zero or more b’s (“a”, “ab”, “abbb”, etc.);

“ab+” – matches a string that has an ‘a’ followed by one or more b’s (“ab”, “abbb”, etc.);

“ab?” – matches an ‘a’ followed by zero or one ‘b’;

“a?b+$” – matches an optional ‘a’ followed by one or more b’s at the end of a string.
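A quick sketch of the three quantifiers follows. I have wrapped the first three patterns in ^ and $ (my own addition) so that each call tests the whole string. Without the anchors, “ab*” would also match inside longer strings.

```php
<?php
// Keys describe the check; values are the preg_match() results (1 or 0).
$results = [
    'ab* on "a"'     => preg_match('/^ab*$/', 'a'),   // 1: zero b's is fine
    'ab+ on "a"'     => preg_match('/^ab+$/', 'a'),   // 0: at least one b is required
    'ab? on "abb"'   => preg_match('/^ab?$/', 'abb'), // 0: at most one b is allowed
    'a?b+$ on "bbb"' => preg_match('/a?b+$/', 'bbb'), // 1: the leading a is optional
];

print_r($results);
```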

Specify range of number of occurrences

“ab{2}” –  matches a string that has an a followed by exactly two b’s (“abb”);

“ab{2,}” –  there are at least two b’s (“abb”, “abbbb”, etc.);

“ab{3,5}” – from three to five b’s (“abbb”, “abbbb”, or “abbbbb”).

Note: the first value of a range must always be specified (i.e. “{0,2}”, not “{,2}”).
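The ranges can be checked against a list of candidates with preg_grep(), which returns only the entries that match. The candidate list is an assumption of mine, and the ^ and $ anchors are added so the count of b’s is tested for the whole string.

```php
<?php
// Hypothetical candidates: an "a" followed by 1, 2, 3 and 6 b's.
$candidates = ['ab', 'abb', 'abbb', 'abbbbbb'];

// preg_grep() keeps the matching entries; array_values() renumbers the keys from 0.
$exactlyTwo  = array_values(preg_grep('/^ab{2}$/', $candidates));   // ['abb']
$atLeastTwo  = array_values(preg_grep('/^ab{2,}$/', $candidates));  // ['abb', 'abbb', 'abbbbbb']
$threeToFive = array_values(preg_grep('/^ab{3,5}$/', $candidates)); // ['abbb']

print_r($exactlyTwo);
print_r($atLeastTwo);
print_r($threeToFive);
```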

Specify range of occurrences of a sequence

“a(bc)*” –  matches a string that has an a followed by zero or more copies of the sequence “bc”;

“a(bc){1,5}” – an ‘a’ followed by one to five occurrences of “bc”.
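Parentheses make the quantifier apply to the whole sequence instead of a single character. The following is a sketch with made-up inputs, again anchored with ^ and $ so the entire string is tested.

```php
<?php
// Build "a" followed by six copies of "bc" for the upper-bound check.
$sixCopies = 'a' . str_repeat('bc', 6);

$zeroCopies = preg_match('/^a(bc)*$/', 'a');            // 1: zero copies are allowed
$twoCopies  = preg_match('/^a(bc)*$/', 'abcbc');        // 1: two complete copies
$brokenCopy = preg_match('/^a(bc)*$/', 'abcb');         // 0: the last "bc" is incomplete
$needsOne   = preg_match('/^a(bc){1,5}$/', 'a');        // 0: at least one "bc" is required
$tooMany    = preg_match('/^a(bc){1,5}$/', $sixCopies); // 0: six copies exceed the maximum

var_dump($zeroCopies, $twoCopies, $brokenCopy, $needsOne, $tooMany);
```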

Using the OR (|) operator

“euro|dollar” – matches a string that has either “euro” or “dollar” in it;

“(b|cd)ef” –  a string that has either “bef” or “cdef”;

“(a|b)*c” – a string that has any mix of a’s and b’s (possibly none at all) followed by a ‘c’.
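A short sketch of alternation follows. The inputs are my own examples, and the grouped patterns are anchored with ^ and $ so that stray characters make the match fail.

```php
<?php
$currency  = preg_match('/euro|dollar/', 'price in dollars'); // 1: "dollar" is found
$cdef      = preg_match('/^(b|cd)ef$/', 'cdef');              // 1: "cd" branch
$bcdef     = preg_match('/^(b|cd)ef$/', 'bcdef');             // 0: "b" and "cd" are alternatives, not a sequence
$mixed     = preg_match('/^(a|b)*c$/', 'abbac');              // 1: any mix of a's and b's, then c
$justC     = preg_match('/^(a|b)*c$/', 'c');                  // 1: the group may repeat zero times
$otherChar = preg_match('/^(a|b)*c$/', 'abdc');               // 0: "d" is neither a nor b

var_dump($currency, $cdef, $bcdef, $mixed, $justC, $otherChar);
```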

Using the period (.) operator

“a.[0-9]” – matches a string that has an ‘a’ followed by any one character and then a digit;

“^.{3}$” –  a string with exactly 3 characters.
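The period stands for exactly one character, so it cannot be skipped. Here is a minimal sketch with sample strings of my own choosing:

```php
<?php
$anyThenDigit = preg_match('/a.[0-9]/', 'ax7'); // 1: "x" fills the period, "7" is the digit
$tooShort     = preg_match('/a.[0-9]/', 'a7');  // 0: the period consumes "7", no digit is left
$threeChars   = preg_match('/^.{3}$/', 'abc');  // 1: exactly three characters
$fourChars    = preg_match('/^.{3}$/', 'abcd'); // 0: one character too many

var_dump($anyThenDigit, $tooShort, $threeChars, $fourChars);
```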

Bracket (“[]”) expressions

They specify which characters are allowed in a single position of a string:

“[ab]” –  matches a string that has either an a or a b (that’s the same as “a|b”);

“^[a-zA-Z]” –  a string that starts with a letter;

“[0-9]%” –  a string that has a single digit before a percent sign;

“,[a-zA-Z0-9]$” – a string that ends in a comma followed by an alphanumeric character.

Inside a bracket expression, a leading “^” negates the list, i.e. the expression matches any character that is NOT in the specified list.

“[^a-zA-Z0-9]” – matches a string that contains at least one character outside the specified ranges.
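The bracket expressions above can be tried out like this. It is only a sketch, and the input strings are assumptions for illustration.

```php
<?php
$startsWithLetter = preg_match('/^[a-zA-Z]/', 'nine');       // 1: first character is a letter
$startsWithDigit  = preg_match('/^[a-zA-Z]/', '9lives');     // 0: first character is a digit
$percent          = preg_match('/[0-9]%/', 'up 5% today');   // 1: "5%" is found
$commaThenAlnum   = preg_match('/,[a-zA-Z0-9]$/', 'a,b');    // 1: ends in ",b"
$onlyAlnum        = preg_match('/[^a-zA-Z0-9]/', 'abc123');  // 0: nothing outside the ranges
$hasSpace         = preg_match('/[^a-zA-Z0-9]/', 'abc 123'); // 1: the space is outside the ranges

var_dump($startsWithLetter, $startsWithDigit, $percent, $commaThenAlnum, $onlyAlnum, $hasSpace);
```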

Some useful regular expression patterns

i. Regular expression pattern to replace or match the special characters in a string. It is really helpful when you want to strip special characters from file names or check whether a string contains any.

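pattern : “%[^a-zA-Z0-9]%” (the % signs are the delimiters)

code sample:

<?php
$string = '$%abcd*.06';
// Replace every character that is not a letter or a digit with an underscore.
echo preg_replace('%[^a-zA-Z0-9]%', '_', $string);
//output :  __abcd__06
?>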

Also, if you want to exempt some special characters from replacement, just include them inside the brackets after the “^” symbol.

pattern : “%[^a-zA-Z0-9.$]%”

code sample:

<?php
$string = '$%abcd*.06';
echo preg_replace('%[^a-zA-Z0-9.$]%', '_', $string);
//output :  $_abcd_.06
?>
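Since renaming files was the motivation, here is a sketch of how the same idea could be wrapped in a helper. The function name sanitizeFilename, the choice to keep dots and hyphens, and the sample file name are all assumptions of mine. I also added a + so that a run of special characters collapses into a single underscore.

```php
<?php
// Hypothetical helper: replaces each run of disallowed characters with one underscore.
function sanitizeFilename(string $name): string
{
    // Letters, digits, dots and hyphens are kept; everything else is replaced.
    // preg_replace() returns null on error, so fall back to the original name.
    return preg_replace('%[^a-zA-Z0-9.-]+%', '_', $name) ?? $name;
}

echo sanitizeFilename('my report (final)#2.pdf'); // my_report_final_2.pdf
```

Whether to keep the dot depends on whether you want to preserve the file extension. Adjust the bracket list to suit your own rules.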

ii. Regular expression pattern to match a valid email address

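pattern : “/^[a-zA-Z0-9._-]+@[a-zA-Z0-9-]+\.[a-zA-Z.]{2,5}$/i”

code sample :

<?php
$email = 'user@example.com'; // placeholder address
// preg_match with the i (case-insensitive) modifier replaces eregi, which was removed in PHP 7.0.
echo preg_match('/^[a-zA-Z0-9._-]+@[a-zA-Z0-9-]+\.[a-zA-Z.]{2,5}$/i', $email);
//output :  1
?>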
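Below is a sketch of the email pattern written for preg_match(), with the dot before the top-level domain escaped so that it matches a literal dot. The function name looksLikeEmail and the sample addresses are hypothetical. The last lines use PHP’s built-in filter_var() with FILTER_VALIDATE_EMAIL for comparison, because this simple pattern rejects some valid addresses, such as those with a subdomain.

```php
<?php
// Hypothetical helper wrapping the email pattern from this post.
function looksLikeEmail(string $candidate): bool
{
    $pattern = '/^[a-zA-Z0-9._-]+@[a-zA-Z0-9-]+\.[a-zA-Z.]{2,5}$/';
    return preg_match($pattern, $candidate) === 1;
}

var_dump(looksLikeEmail('user@example.com'));      // bool(true)
var_dump(looksLikeEmail('user@example'));          // bool(false): no dot after the domain
var_dump(looksLikeEmail('user@mail.example.com')); // bool(false): the subdomain is not covered

// The built-in validator accepts the subdomain address.
$builtIn = filter_var('user@mail.example.com', FILTER_VALIDATE_EMAIL) !== false;
var_dump($builtIn); // bool(true)
```

For real validation the built-in filter is probably the safer choice. The pattern is mainly useful for learning how the pieces fit together.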

I will add some more commonly used patterns soon.

Useful links

http://www.ibm.com/developerworks/library/os-php-regex1/

http://in3.php.net/manual/en/book.pcre.php

Greetings! I'm Aneesh Sreedharan, CEO of 2Hats Logic Solutions. At 2Hats Logic Solutions, we are dedicated to providing technical expertise and resolving your concerns in the world of technology. Our blog page serves as a resource where we share insights and experiences, offering valuable perspectives on your queries.