Regular expressions in PHP

Checking whether a string matches a given pattern is done with regular expressions (regex). Their full syntax is one of the most complex parts of PHP, but for most tasks the basic character classes and quantifiers are enough.

Description of regular expressions

Regular expressions are used to find sequences of characters that fit a given template. They work only with strings, so data stored in another form must first be converted to a string.

Avoid overusing regex in parts of the code where ordinary string functions can do the job, because those functions are noticeably faster. An abundance of regular expressions also makes a script harder to read, which matters most when you return to the program long after writing it and the code has no comments.

 

A regex is written as a quoted string: the opening quotation mark is followed by a delimiter, then the rule used for the check, then the same delimiter again and an optional list of flags. Any character can serve as the delimiter except a backslash, letters, digits, and whitespace. The most common choice is a slash; a tilde or a hash is also often used.
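As a minimal sketch of the delimiter rule above, the same pattern can be written with a slash, a tilde, or a hash. The sample string and variable names are illustrative assumptions.

```php
<?php
// Layout of every pattern: delimiter, rule, same delimiter, optional flags.
$subject = 'Path: /usr/local/bin';

$withSlash = preg_match('/\/usr\/local/i', $subject); // slashes in the rule must be escaped
$withTilde = preg_match('~/usr/local~i', $subject);   // no escaping needed
$withHash  = preg_match('#/usr/local#i', $subject);   // same rule, hash delimiter

var_dump($withSlash, $withTilde, $withHash); // each call returns int(1) here
```

Choosing a delimiter that does not occur in the rule itself keeps the pattern free of extra backslashes.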

 

Patterns are case sensitive by default. Pay attention also to the different behavior of single and double quotation marks: text in single quotes is taken "as is", while in double quotes variable names are replaced with their values.
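The effect of quoting and of case sensitivity can be sketched as follows. The variable $word and the sample strings are assumptions made for illustration.

```php
<?php
$word = 'world';

$single = '/$word/'; // kept literally; "$" is then the end-of-string anchor
$double = "/$word/"; // the variable is substituted, giving /world/

$literal      = preg_match($single, 'Hello, world');    // 0: nothing can follow the end of the string
$interpolated = preg_match($double, 'Hello, world');    // 1
$wrongCase    = preg_match('/World/', 'Hello, world');  // 0: the case differs
$anyCase      = preg_match('/World/i', 'Hello, world'); // 1: the i flag ignores case
```

Single quotes are usually the safer default for patterns, since "$" and backslashes are then passed to the regex engine unchanged.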

Related Features

Regular expressions serve as patterns for a family of functions. Their processing speed has improved from one PHP version to the next, but they remain slower than plain string functions. Even so, moderate use will not cause significant problems for a program or for the user experience, that is, how fast the site feels to visitors.

The list of main functions is presented as follows:

  • preg_match() – searches the string for a match. Returns 1 when a match is found, 0 when nothing fits the pattern, and false on error;
  • preg_match_all() – works on the same principle as preg_match(), but returns the number of occurrences found;
  • preg_replace() – finds the fragments of a string that fit the pattern and replaces them with the given text;
  • there is no separate case-insensitive variant of preg_replace(); add the i flag to the pattern instead;
  • preg_split() – splits the string into parts according to the pattern;
  • preg_grep() – works similarly to preg_match(), but takes an array and returns the elements that match;
  • preg_quote() – helps to compose a regex by adding the "\" character before special characters.
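The three functions from this list that have no example later in the article can be sketched briefly. The input string and variable names are illustrative assumptions.

```php
<?php
$text = 'apple, banana;cherry';

// preg_split(): cut on a comma or semicolon followed by optional whitespace.
$parts = preg_split('/[,;]\s*/', $text);
// ['apple', 'banana', 'cherry']

// preg_grep(): keep only the array elements that match the pattern.
$withAn = preg_grep('/an/', $parts);
// [1 => 'banana'] (the original keys are preserved)

// preg_quote(): escape special characters so that arbitrary text is matched literally.
// The second argument names the delimiter, which is escaped as well.
$needle = preg_quote('1+1', '/');
$found  = preg_match('/' . $needle . '/', 'so 1+1 is two'); // 1
```

Without preg_quote(), the "+" in the search text would be read as a quantifier rather than a plus sign.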

The deprecated ereg() and eregi() functions still appear on some reference sites, even though the official PHP website has warned against them since 5.3.0 and they were removed completely in 7.0.0. Their use is not recommended, and on current versions it is not possible at all. Both are replaced by preg_match(). split() and spliti() met the same fate; use preg_split() instead.
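A hedged sketch of how such legacy calls might be migrated; the old calls are shown only in comments because they no longer exist, and the sample data is assumed.

```php
<?php
// Legacy, removed in PHP 7.0.0:  eregi('^hello', $input)
// Replacement: add delimiters and the i flag for case-insensitive matching.
$input = 'Hello, world';
$found = preg_match('/^hello/i', $input) === 1; // true

// Legacy, removed in PHP 7.0.0:  split('[,;]', $csv)
// Replacement: the same rule wrapped in delimiters.
$csv    = 'a,b;c';
$fields = preg_split('/[,;]/', $csv); // ['a', 'b', 'c']
```

The main change is that preg_* patterns need delimiters, and case-insensitivity becomes a flag instead of a separate function.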

Modifiers

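This topic applies more to functions than to regex, but deserves attention because of the removal of eregi() mentioned above. Its replacement, preg_match(), has no case-insensitive twin: case-insensitive matching is requested with the i pattern flag (known inside the PCRE library as PCRE_CASELESS). The functions themselves also accept flag constants as an argument. For example, PREG_OFFSET_CAPTURE makes preg_match() return the position of each match together with the matched text: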

<?php
$subject = "abcxyz";
$pattern = '/^xyz/';
// substr() cuts off "abc", so "xyz" starts at offset 0 of the searched string.
preg_match($pattern, substr($subject, 3), $matches, PREG_OFFSET_CAPTURE);
print_r($matches); # Result: Array ( [0] => Array ( [0] => xyz [1] => 0 ) )

preg_split() accepts flag constants in the same way.

Flag constants pass additional parameters to the functions that support them, which spares the language many near-identical functions, each with its own syntax to memorize.
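As a small sketch of a flag constant changing a function's behavior, preg_split() can be told to drop empty pieces with PREG_SPLIT_NO_EMPTY. The sample line is an assumption.

```php
<?php
$line = 'one,,two,three,';

// Default behavior: empty pieces between and after the commas are kept.
$plain = preg_split('/,/', $line);
// ['one', '', 'two', 'three', '']

// -1 means "no limit"; the fourth argument carries the flag constant.
$noEmpty = preg_split('/,/', $line, -1, PREG_SPLIT_NO_EMPTY);
// ['one', 'two', 'three']
```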

Flags

Regular expressions have their own modifiers, called flags. They are written after the closing delimiter, or inline in parentheses in the form (?i), and include the following:

  • i – case-insensitive search;
  • m – multi-line processing: ^ and $ match at the start and end of every line;
  • u – the pattern and the string are treated as UTF-8, which is needed for text beyond the English alphabet, for example Cyrillic;
  • U – inverts "greed": quantifiers match as little as possible instead of as much as possible;
  • s – the dot also matches a line break;
  • x – spaces in the pattern are ignored; a space that must be matched has to be escaped with a backslash.

To switch a flag off inside the pattern, put a minus before its letter in the inline form, for example (?-i).
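The flags above can be sketched one by one. All sample strings are assumptions, and the u example assumes the source file is saved as UTF-8.

```php
<?php
// i: letter case is ignored.
$i = preg_match('/HELLO/i', 'hello'); // 1

// m: ^ and $ apply to every line, so all three lines match.
$m = preg_match_all('/^\w+$/m', "one\ntwo\nthree"); // 3

// s: the dot also matches a line break.
$withoutS = preg_match('/a.b/', "a\nb");  // 0
$withS    = preg_match('/a.b/s', "a\nb"); // 1

// u: the string is read as UTF-8 characters, not bytes.
$u = preg_match('/^.{6}$/u', 'привет'); // 1: six characters

// U: quantifiers take as little as possible.
preg_match('/<.+>/', '<b>bold</b>', $greedy); // $greedy[0] is "<b>bold</b>"
preg_match('/<.+>/U', '<b>bold</b>', $lazy);  // $lazy[0] is "<b>"

// x: spaces inside the pattern are ignored.
$x = preg_match('/ \d{3} - \d{4} /x', '555-1234'); // 1

// Inline form: (?i) switches the flag on, (?-i) switches it off again.
$inline = preg_match('/(?i)abc(?-i)DEF/', 'ABCDEF'); // 1
$strict = preg_match('/(?i)abc(?-i)DEF/', 'ABCdef'); // 0
```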

Metacharacters

In expressions, characters are divided into those that are matched literally and those that have a special meaning when building a pattern. The latter are called metacharacters and play the following roles:

  • . – any single character except a line break;
  • ^ – the beginning of the line;
  • $ – the end of the line;
  • * – zero or more occurrences of the preceding element;
  • + – the element before the plus occurs at least once;
  • \ – treats the following metacharacter as an ordinary character;
  • a-z – lowercase letters (as a range inside square brackets);
  • A-Z – capital letters (as a range inside square brackets);
  • 0-9 – digits from 0 to 9 (as a range inside square brackets);
  • [...] – a class of listed characters, any one of which may match;
  • | – a separator between alternatives: the pattern matches if either side matches;
  • \d – any digit;
  • \s – whitespace: spaces, tabs (Tab key), and line breaks;
  • \b – a word boundary, that is, the beginning or end of a word.
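A short sketch of several of these metacharacters at work on one string; the string and variable names are assumptions.

```php
<?php
$s = 'Order 42 shipped to Kyiv.';

$starts = preg_match('/^Order/', $s);        // 1: the string begins with "Order"
$ends   = preg_match('/Kyiv\.$/', $s);       // 1: "\." is a literal dot at the end
preg_match('/\d+/', $s, $digits);            // $digits[0] is "42"
preg_match('/[A-Z][a-z]+/', $s, $word);      // $word[0] is "Order"
preg_match('/cat|dog/', 'hot dog', $either); // $either[0] is "dog"
$part   = preg_match('/\bship\b/', $s);      // 0: "ship" is only part of "shipped"
$whole  = preg_match('/\bshipped\b/', $s);   // 1: boundaries on both sides
$space  = preg_match('/to\sKyiv/', $s);      // 1: "\s" matches the space
```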

Repetition

You can repeat the same metacharacter several times in a row, for example \d\d to match two digits in succession (the count depends on how many times \d is written), but the pattern is simpler with quantifiers, which are responsible for repetition:

  • x+ – one or more x;
  • x* – zero or more x;
  • x? – one or zero x;
  • x{5} – exactly five x;
  • x{5,8} – from five to eight x;
  • x{5,} – five or more x.
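The quantifiers can be checked with a few anchored patterns; this is a sketch with assumed sample values.

```php
<?php
$zeroOrOne  = preg_match('/^colou?r$/', 'color');      // 1: "u" is absent
$oneAllowed = preg_match('/^colou?r$/', 'colour');     // 1: "u" occurs once
$zeroOrMore = preg_match('/^ab*c$/', 'ac');            // 1: zero "b" is fine for *
$oneOrMore  = preg_match('/^ab+c$/', 'ac');            // 0: + needs at least one "b"
$exactly    = preg_match('/^\d{5}$/', '12345');        // 1: exactly five digits
$tooShort   = preg_match('/^\d{5}$/', '1234');         // 0: only four digits
$inRange    = preg_match('/^\d{5,8}$/', '123456');     // 1: six is within 5..8
$tooLong    = preg_match('/^\d{5,8}$/', '123456789');  // 0: nine is above 8
$openEnded  = preg_match('/^\d{5,}$/', '123456789');   // 1: five or more
```

The ^ and $ anchors matter here: without them a longer string would still contain a shorter run that satisfies the quantifier.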

Groups

By default, a quantifier affects only the last element before it. To repeat a longer sequence you could put a quantifier after every letter, but parentheses are more convenient: enclose the letters in them and put the quantifier after the closing parenthesis to apply it to several characters at once.
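The difference a group makes can be sketched as follows. The second half relies on a fact not stated above: parentheses also capture the text they match into the $matches array.

```php
<?php
// Without a group the + applies only to "b": a, then one or more b.
$ungrouped = preg_match('/^ab+$/', 'abab');   // 0
// With a group the + repeats the whole "ab" sequence.
$grouped   = preg_match('/^(ab)+$/', 'abab'); // 1

// Groups also capture what they matched (an addition to the text above).
preg_match('/(\d{4})-(\d{2})/', 'Released 2024-05', $m);
// $m[0] is "2024-05", $m[1] is "2024", $m[2] is "05"
```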

Examples

preg_match();
<?php
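// The pattern must be a quoted string; the i flag makes the search case-insensitive.
if (preg_match('/world/i', "Hello, world")) {
    echo "Occurrence found"; # This branch runs
} else {
    echo "Occurrence not found";
}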

The function checks whether a string contains a fragment that fits the specified pattern, and it supports various flags. For plain substring search there is an alternative that does not need regex and is therefore faster: strpos().

preg_match_all();
<?php
$string = "Sphinx of black quartz, judge my vow!";
$pattern = "/o/i";
echo preg_match_all($pattern, $string); # Result: 2

preg_match_all() performs a global search: it finds every occurrence of the pattern rather than only checking that at least one is present. The result returned is the number of occurrences, which is zero if there are no matches.

preg_replace();
<?php
$pattern = "/Sphinx/i";
$new = "Anubis";
$original = "Sphinx of black quartz, judge my vow!";
echo preg_replace($pattern, $new, $original); # Result: Anubis of black quartz, judge my vow!

To replace the part of a string that fits the pattern with something else, use preg_replace().

Tips

  • Language updates tend to reduce the need for regular expressions, and overusing them in code is not recommended. Scripts that lean heavily on regex run more slowly, and impatient visitors may simply close the page, which hurts the bounce rate in analytics services. For example, Yandex.Metrica records a visit as a "bounce" when the user leaves the page quickly.
  • A clear example of this trend is the FILTER_VALIDATE_EMAIL filter, which checks whether an entered e-mail address is valid. It replaced a long regular expression that also had to be edited by hand whenever e-mail standards changed.
  • If you need to find a piece of text "as is", it is better to use functions that do not involve regex: strpos() and strstr() can stand in for preg_match(). This improves loading speed, which matters for SEO and can be optimized further with other measures, including front-end ones.
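The last two tips can be sketched without any regex. The addresses and strings are made-up examples, and filter_var() is the standard function that applies FILTER_VALIDATE_EMAIL.

```php
<?php
// E-mail validation through the filter extension instead of a hand-written pattern.
// filter_var() returns the value itself when it is valid and false otherwise.
$isValid   = filter_var('user@example.com', FILTER_VALIDATE_EMAIL) !== false; // true
$isInvalid = filter_var('not-an-email', FILTER_VALIDATE_EMAIL) !== false;     // false

// Literal substring search instead of preg_match().
$text     = 'Hello, world';
$position = strpos($text, 'world');  // 7
$hasWorld = $position !== false;     // strict comparison, since position 0 is falsy
$tail     = strstr($text, 'world');  // "world": the rest of the string from the match on
```

The strict comparison with false is the important detail: a match at the very start of the string returns 0, which a loose check would mistake for "not found".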