PHP - Optimization of PHP programs

This article uses simple, self-evident examples to show several ways to optimize any existing program without changing its algorithms. The recommendations are simple enough that you could write a program to apply them all automatically (although you would first have to write a parser for PHP code).


Take $variables out of "text strings" - acceleration 25-40%

The speed of the same assignment (or of echo/print for output) depends heavily on whether the variables are placed inside the quotation marks or not. In the first and second variants, spaces have been added so that all three snippets contain the same amount of code to parse.
  1. {$x="test".$test;    }
  2. {$x="test $test";    }
  3. {$x="test";$x.=$test;}

The variable $test contains the string "1234567890".
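
The three variants above can be timed with a small harness. This is only a minimal sketch: the helper time_it() and the repeat count are my own assumptions, not the original test sources, and the closure call adds the same overhead to every variant. Results on a current PHP version may differ from the figures quoted in this article.

```php
<?php
// Hypothetical helper: runs $fn $repeat times and returns elapsed seconds.
function time_it(callable $fn, int $repeat = 1000000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $repeat; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

$test = "1234567890";

// The three variants from the list above, wrapped in closures.
$variants = [
    'concatenation' => function () use ($test) { $x = "test" . $test; return $x; },
    'interpolation' => function () use ($test) { $x = "test $test"; return $x; },
    'append'        => function () use ($test) { $x = "test"; $x .= $test; return $x; },
];

foreach ($variants as $name => $fn) {
    printf("%-14s %.4f s\n", $name, time_it($fn));
}
```

Run it several times and compare the ratios rather than the absolute numbers.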

 

So never write $a="$b": it slows down that line of the program by 40%.

However, in a long line with a lot of text and variables, the difference in speed shrinks, because the total cost of parsing becomes much larger than the difference in efficiency between the commands. Still, why not speed up the program's assignment lines by almost a quarter with such a simple technique?

  1. {$x="test ".$test." test ".$test." test ".$test;                    }
  2. {$x="test $test test $test test $test";                             }
  3. {$x="test ";$x.=$test;$x.=" test ";$x.=$test;$x.=" test ";$x.=$test;}

 


Short variable names of no more than 7 characters - acceleration 15%

 

How does the length of variable names affect the speed of the program? With very long names, the effect is obviously strong. With short names, however, things are not so simple:
{$x=1;}
{$x2=1;}
{$x03=1;}
{$x004=1;}
{$x0005=1;}
{$x00006=1;}
{$x000007=1;}
{$x0000008=1;}
{$x000000010=1;}
{$x00000000012=1;}
{$x0000000000014=1;}
{$x000000000000016=1;}
{$x0000000000000000000000000000032=1;}
 
gives a predictable result:

 

Variable names of 32 characters can slow down the program by almost half.

But if you pad every "$x=1; ..." line with spaces (" ") so that all lines have the same length, the result is this:

{$x=1;                                   }
{$x2=1;                                  }
{$x03=1;                                 }
{$x004=1;                                }
{$x0005=1;                               }
{$x00006=1;                              }
{$x000007=1;                             }
{$x0000008=1;                            }
{$x000000010=1;                          }
{$x00000000012=1;                        }
{$x0000000000014=1;                      }
{$x000000000000016=1;                    }
{$x0000000000000000000000000000032=1;    }

I don't know how to explain the result for the single-character variable (2% slower than the fastest) ... The tests probably have a large margin of error. I suggest that someone run the test for an hour (the test sources are at the bottom of the page).

One thing is clear: with variable names of 8 or more characters, performance drops sharply, by up to 15%! And a great many statements contain variable names. There is another, less abrupt jump at names of 16 or more characters. Otherwise the dependence is quite linear: the longer the name, the slower the code.

Conclusion: do not use variable names of 8 or more characters, and you will gain (or rather, save) 15% of the speed.
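
A rough way to repeat the padded test is sketched below. It assumes that each snippet is re-parsed on every run through eval(), so that parsing cost is part of the measurement; the original harness is not shown here, so time_snippet() and the repeat count are hypothetical.

```php
<?php
// Assumption: eval() re-parses the snippet on every iteration, so the
// measured time includes parsing as well as execution.
function time_snippet(string $code, int $repeat = 200000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $repeat; $i++) {
        eval($code);
    }
    return microtime(true) - $start;
}

// A few of the names from the listings above.
$names = [
    'x',
    'x000007',
    'x0000008',
    'x000000000000016',
    'x0000000000000000000000000000032',
];

// Pad every snippet with spaces to one common length, as in the second listing.
$width = strlen('$' . end($names) . '=1;') + 4;

foreach ($names as $name) {
    $code = str_pad('$' . $name . '=1;', $width);
    printf("%-34s %.4f s\n", '$' . $name, time_snippet($code));
}
```

Dropping the str_pad() call gives the unpadded version of the test.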

 


Are arrays in PHP slow? Or rather, when are they? Acceleration 40%.

 

They are not slow. I read somewhere that associative arrays in PHP are supposedly terribly slow. The test is simple, of course, but there is no big difference between contiguous indexed (1), plain indexed (2) and associative (3) arrays (the index 0000 is converted to 0; it is a number, not a string). And "non-associative non-contiguous arrays" are clearly not slow either.
  1. {$test[0000]=1;$test[0001]=1;$test[0002]=1;$test[0003]=1;$test[0004]=1;     }
  2. {$test[1000]=1;$test[1001]=1;$test[1002]=1;$test[1003]=1;$test[1004]=1;     }
  3. {$test["aa"]=1;$test["bb"]=1;$test["cc"]=1;$test["dd"]=1;$test["ee"]=1;     }
  4. {$test[aa]=1;  $test[bb]=1;  $test[cc]=1;  $test[dd]=1;  $test[ee]=1;       }
  5. {$test[0][0]=1;$test[0][1]=1;$test[0][2]=1;$test[0][3]=1;$test[0][4]=1;     }
  6. {$test[2][1]=1;$test[3][8]=1;$test[4][9]=1;$test[33][99]=1;$test[123][99]=1;}
  7. {$test[a][b]=1;$test[x][y]=1;$test[d][c]=1;$test[a][s]=1;$test[b][n]=1;     }

There is little to add here; the results speak for themselves. Accessing an element of a one-dimensional associative array by a name without quotation marks is a third slower than the same example with quotation marks. In a two-dimensional array, the program runs as much as 2.5 times slower! After a test like this, like it or not, you will give up the convenience of accessing array elements by name without quotation marks in every program.
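
One likely reason for the gap: an unquoted name is first looked up as a constant, and older PHP versions fell back to treating it as a string only when no such constant existed. In PHP 8 there is no fallback at all, and an undefined constant raises an error. The sketch below (the constant bb and its value are made up for illustration) shows how an unquoted key can silently become a different key:

```php
<?php
$test = [];

// Quoted key: a plain string literal, no constant lookup involved.
$test["aa"] = 1;

// Unquoted key: PHP treats bb as a constant name. The constant is defined
// here on purpose (hypothetical value) to show that the key changes.
define('bb', 'other');
$test[bb] = 2;

// The array now has the keys "aa" and "other"; there is no key "bb".
print_r(array_keys($test));
```

So quoting the keys is both faster in these tests and safer.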

 


Take multidimensional arrays out of "text strings" - acceleration of 25-30%. One-dimensional arrays can stay.

Using multidimensional arrays inside strings noticeably reduces speed. Because of the multiple dimensions, the variable has to be enclosed in a pair of curly braces.
  1. {$x="test ".$myarray["name"]["second"][1]." test";        }
  2. {$x="test {$myarray[name][second][1]} test";              }
  3. {$x="test ";$x.=$myarray["name"]["second"][1];$x.=" test";}

The same example with the 3-dimensional array, but with the element accessed by numeric index:

  1. {$x="test ".$myarray[3][2][0]." test";        }
  2. {$x="test {$myarray[3][2][0]} test";          }
  3. {$x="test ";$x.=$myarray[3][2][0];$x.=" test";}

The difference between variants 1 and 2 is very small. This shows that the cost of using quotation marks inefficiently is small compared with the cost of accessing the arrays (see the tests in the first chapter).

  1. {$x="test".$myarray["test"]." test";}
  2. {$x="test$myarray[test]test";      }
  3. {$x="test{$myarray[test]}test";    }

Based on all three tests, the conclusion is unequivocal: do not use curly braces to mark the boundaries of a multidimensional array element inside a string. Doing so reduces speed considerably, by 25-30% (in the third variant, simply adding the braces cut the speed by a quarter). Yet a multidimensional element cannot be used inside a string without braces. Therefore, the only way not to lose 30% of the speed is to take multidimensional arrays out of the strings.

The same test for a one-dimensional array:

  1. {$x="test".$myarray["test"]." test".$myarray["test"]." test".$myarray["test"]; }
  2. {$x="test$myarray[test]testtest$myarray[test]testtest$myarray[test]test";      }
  3. {$x="test{$myarray[test]}testtest{$myarray[test]}testtest{$myarray[test]}test";}

Comparing the last two tests, it is obvious that one-dimensional arrays need not be taken out of strings: the loss is only 3-4% (whereas for simple variables the losses are 25-40%!).
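
For reference, here is a small sketch of the syntaxes compared above, written so that it also runs on current PHP versions. The array contents are hypothetical. Note that inside curly braces the keys are quoted here, while the simple in-string form of a one-dimensional element takes an unquoted key.

```php
<?php
// Hypothetical data for illustration only.
$myarray = ['name' => ['second' => [1 => 'value']], 'test' => 'flat'];

// Multidimensional element: concatenation outside the string...
$a = "test " . $myarray["name"]["second"][1] . " test";
// ...versus curly braces inside the string (keys quoted inside the braces).
$b = "test {$myarray['name']['second'][1]} test";

// One-dimensional element: concatenation versus the simple in-string syntax.
$c = "test" . $myarray["test"] . " test";
$d = "test$myarray[test] test";

// Both pairs produce identical strings; only the speed differs.
var_dump($a === $b, $c === $d);
```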

 


Regular expressions: PHP(POSIX) vs Perl.

 

PHP supports both POSIX (ereg*) and Perl-compatible (preg*) regular expressions (their differences are described at php.spb.ru/regular_expression.html). Which is faster?

I want to warn Perl lovers in advance not to rejoice: Perl regexes may be better than PHP's own, but nothing and no one stops you from using Perl regexes in PHP! That is probably why they were built into PHP, whose own are painfully slow... :-)

So, the simplest test: searching for a simple expression in a text that consists of this article repeated many times (the resulting $text variable is 3 MB in size).

The test makes only one call, because regular expression engines have a built-in cache for compilation results. That is, an expression is compiled before it is first run, and repeated expressions are not compiled again. This is a feature of regular expressions. Different programming languages can store different numbers of compiled expressions (in the order in which they were called in the program). In this test, caching has no effect, because the function is called only once.

  1. {eregi("MaC+iB", $text);}
  2. {preg_match("/MaC+IV/im",$text);}
  1. {eregi("(ma[a-zа-я]{1,20})",$text);}
  2. {preg_match("/(ma[a-zа-я]{1,20})/im",$text);}

An example with another expression and a 30 MB text (again, repetitions of the article you are reading):

  1. {eregi("(ma[a-zа-я]{1,20})",$text);}
  2. {preg_match("/(ma[a-zа-я]{1,20})/im",$text);}

I tried five more expressions, and the trend did not change. Speeds vary, but Perl outperforms POSIX by at least half. That is enough to bury PHP's POSIX regular expression functions. Every one of them has a Perl-compatible counterpart (all of them built into PHP).

Next, a very illustrative example on the same article (enlarged to 28 MB). The example searches the text for an e-mail address. Because regular expressions are "greedy", the leftmost and longest address will be found.

This example will upset Perl lovers. It is nice to upset them :-)

  1. {eregi("([a-z_-]+@([a-z][a-z-]*\.)+([a-z]{2}|com|mil|org|net|gov|edu|arpa|info|biz))",$text);}
  2. {preg_match("/([a-z_-]+@([a-z][a-z-]*\.)+([a-z]{2}|com|mil|org|net|gov|edu|arpa|info|biz))/im",$text);}

It is difficult to draw a conclusion from one test, but apparently the more complex the regular expression, the more POSIX lags behind Perl.

And now the same example, except that the article (enlarged to 28 MB) contains NOT a SINGLE "@" character (I deliberately made a copy of the article and removed them):

  1. {eregi("([a-z_-]+@([a-z][a-z-]*\.)+([a-z]{2}|com|mil|org|net|gov|edu|arpa|info|biz))",$text,$ok); echo $ok[1];}
  2. {preg_match("/([a-z_-]+@([a-z][a-z-]*\.)+([a-z]{2}|com|mil|org|net|gov|edu|arpa|info|biz))/im",$text,$ok); echo $ok[1];}

What do we see? Nothing in this world is perfect. Of course, this is a very unoptimized expression for finding e-mail addresses, but still, everyone who shouted at me in the forum that "ereg sucks" can relax after this and similar examples. Sometimes a text really does not contain a single "@" (a "dog", as the character is called in Russian slang) :-)

So, the conclusion about speed, with examples, was given above. The conclusion is unambiguous: use Perl-compatible regular expressions. At the beginning of the chapter, I mentioned the caching of compiled expressions. If the same expression is executed repeatedly in a program, performance may differ not just several times over, but by a factor of 10, 100 or 1000!

The following example is called 200 times in a row on a 250 KB text:

  1. {eregi("MaC+iB", $text);}
  2. {preg_match("/MaC+IV/im",$text);}

Everyone knows what a cache is. Apparently, the cache is exactly where PHP has problems... By the way, as an experiment, disable the processor cache in your computer's BIOS and try to boot Windows 2000... You will wait forever! (I think they are called L1 and L2: two different caches, first-level and second-level, for code and data, one of which can be disabled.)
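
The repeated-call test can be sketched as follows. Since eregi() was removed in PHP 7.0, only the preg_match() side can be run on a current version. The filler text and the pattern are hypothetical stand-ins for the article text and the expression used above.

```php
<?php
// Hypothetical stand-in for the article text: roughly 250 KB of filler.
$text = str_repeat("Optimization of PHP programs: arrays, strings, loops.\n", 5000);

// Illustrative pattern, matched case-insensitively like the examples above.
$pattern = '/arr+ays/im';

$start = microtime(true);
$hits = 0;
for ($i = 0; $i < 200; $i++) {
    // The same pattern string is passed on every call, so a compiled copy
    // can be reused from the cache instead of being compiled again.
    $hits += preg_match($pattern, $text);
}
$elapsed = microtime(true) - $start;

printf("%d matching calls in %.4f s\n", $hits, $elapsed);
```

To see the effect of caching, compare this with a loop that builds a slightly different pattern on each iteration.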

 


Loops: for, foreach, while, count/sizeof() - acceleration 15%-30%

 

At the start of the program, an array $test of 100,000 integers is created. Each of the following examples is then run once: the array is traversed with three different kinds of loop, and some work is done in each iteration. The loop body cannot be left empty, because that would not be a realistic test.
  1. {$x=0; foreach($test as $n)                          { $x=sprintf("test%08d",$n);                }}
  2. {$x=0; for ($it=0; $it<100000; $it++)                { $x=sprintf("test%08d",$test[$it]);        }}
  3. {$x=0; $it=0; while($it<100000)                      { $x=sprintf("test%08d",$test[$it]); $it++; }}
  4. {$x=0; for ($it=0; $it<count($test); $it++)          { $x=sprintf("test%08d",$test[$it]);        }}
  5. {$x=0; $it=0; while($it<count($test))                { $x=sprintf("test%08d",$test[$it]); $it++; }}
  6. {$x=0; $co=count($test); for ($it=0; $it<$co; $it++) { $x=sprintf("test%08d",$test[$it]);        }}
  7. {$x=0; $co=count($test); $it=0; while($it<$co)       { $x=sprintf("test%08d",$test[$it]); $it++; }}

Why sprintf and not a real echo? Echo cannot be used, because its output buffering (output to the browser or console) would distort the measurement.

Now to the point. The indisputable conclusion is that foreach slows things down considerably, while there is no big difference between for and while. (In a bare test with an empty for/while/foreach {..} body, foreach is 30% slower.) This is not surprising, since foreach is said to make a copy of the array, which takes a lot of time (although this is only a rumor).

The conclusion about count() is less obvious, because the percentage slowdown relative to the fastest variant depends heavily on what the loop body does... I took a loop with a light load: a pass over the huge $test array plus formatting with the sprintf function. As you can see, the variants with count() and with the $co replacement differ in speed by 10% (ignore the variant with the constant 100,000; in practice the number of elements cannot be known in advance).

Conclusion about non-associative arrays: 1) foreach significantly slows down the work; 2) using count() in the condition of a simple loop costs 10%. In complex loops, however, the loss from the extra count() calls will be completely unnoticeable, so the situation is not clear-cut.
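
The count() comparison, written out as a minimal sketch. The array is built with range() here as an assumption about the test data, and the timing code is mine, not the original test source.

```php
<?php
// Assumed test data: 100,000 integers.
$test = range(1, 100000);

// Variant A: count() is evaluated on every iteration.
$start = microtime(true);
for ($it = 0; $it < count($test); $it++) {
    $x = sprintf("test%08d", $test[$it]);
}
$withCount = microtime(true) - $start;

// Variant B: the size is stored once in $co before the loop.
$start = microtime(true);
$co = count($test);
for ($it = 0; $it < $co; $it++) {
    $y = sprintf("test%08d", $test[$it]);
}
$withVariable = microtime(true) - $start;

printf("count() in condition: %.4f s\n", $withCount);
printf("size in variable:     %.4f s\n", $withVariable);
```

The heavier the loop body, the smaller the share of time taken by the repeated count() calls.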

Comparison of count() and sizeof().

Judging by the manual, these are aliases. This is stated on the pages of the functions themselves and on the additional page "Appendix => Aliases list". What do we see on an array of 100,000 elements?

  1. {$x=0; for ($it=0; $it<count($test); $it++)  { $x=sprintf("test%08d",$test[$it]);}}
  2. {$x=0; for ($it=0; $it<sizeof($test); $it++) { $x=sprintf("test%08d",$test[$it]);}}

The tests may have some error... but the result is the same: count() is noticeably slower than sizeof()! Hmm, I would add a note to the manual entry: "The sizeof() function is an alias for count(), but the latter is much slower!"

If the array has fewer than 65,000 (64K) elements, the two functions are practically indistinguishable in speed. The conclusion here is simple: switch to sizeof() as a faster alias of count(). It pays off on huge arrays.

Associative Arrays: Testing Different Iteration Methods

The same problem appears here: different methods are efficient on arrays of different sizes, but foreach is best of all!

An array of 200 elements and 1000 repetitions of the program:

  1. {$x=0; foreach($test as $k=>$v) { $x=sprintf("%s=>%s\n",$k,$v);                                                           }}
  2. {$x=0; reset($test); while (list($k, $v) = each($test)) { $x=sprintf("%s=>%s\n",$k,$v);                                   }}
  3. {$x=0; $k=array_keys($test); $co=sizeof($k); for ($it=0; $it<$co; $it++) { $x=sprintf("%s=>%s\n",$k[$it],$test[$k[$it]]); }}
  4. {$x=0; reset($test); while ($k=key($test)) { $x=sprintf("%s=>%s\n",$k,current($test)); next($test);                       }}

Same thing, but an array of 5000 elements and 200 repetitions:

 

Again the same, but an array of 100,000 elements and no repetitions:

 

Other tests with empty loops also show the advantage of foreach.

Summary:

  • sizeof() is better than count()
  • in loops, it is better still to replace sizeof() with a variable
  • for and while are almost indistinguishable
  • to iterate over simple indexed arrays, use for or while
  • to iterate over associative arrays, use foreach
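
For readers on current PHP versions: each() was removed in PHP 8.0, so variant 2 above no longer runs there, and variant 4 stops early if a key evaluates to false (0 or an empty string). The sketch below, with made-up sample data, shows variants 1 and 3 side by side and checks that they visit the same pairs in the same order.

```php
<?php
// Hypothetical sample data: 200 string keys.
$test = [];
for ($i = 0; $i < 200; $i++) {
    $test["key$i"] = $i;
}

// Variant 1: foreach with key and value.
$viaForeach = [];
foreach ($test as $k => $v) {
    $viaForeach[] = sprintf("%s=>%s\n", $k, $v);
}

// Variant 3: array_keys() plus an indexed for loop.
$viaKeys = [];
$keys = array_keys($test);
$co = sizeof($keys);
for ($it = 0; $it < $co; $it++) {
    $viaKeys[] = sprintf("%s=>%s\n", $keys[$it], $test[$keys[$it]]);
}

// Both variants produce the same lines in the same order.
var_dump($viaForeach === $viaKeys);
```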

 


Reading a file with file() is faster than fopen + loop - acceleration 40%

 

To read a 1 MB file (100,000 lines of 10 bytes) into an array $x, there are two options: read the file with file(), or use the traditional fopen/fgets method. Of course, the speed may vary for files of different sizes and contents. But in this example, the statistics are as follows: file("1Mb_file.txt") works 40% faster than:
 $f=fopen("1Mb_file.txt","r") or die(1);
   while($x[]=fgets($f,1000));
   fclose($f);
 

Similar variants

$f=fopen("1Mb_file.txt","r") or die(1);
   while($s=fgets($f,1000)) $x[]=$s;
   fclose($f);

 

 
or
 
$f=fopen("1Mb_file.txt","r") or die(1);
   while(!feof($f)) $x[]=fgets($f,1000);
   fclose($f);

work even more slowly (in the second case, the extra feof() call significantly reduces the speed). The same test on a 15 MB file (100,000 lines of 150 bytes) shows a difference of 50% in favor of file(). The test was run in such a way as to exclude background swapping caused by the earlier commands that create and read such large files. The same thing cannot be measured on very small files of 1-2 KB, because the read operation cannot be repeated within one test: repeated reads will be cached ...
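
A minimal sketch of the comparison, using a small temporary file so that it can be run anywhere. The file size and contents are assumptions; to reproduce the measurements, use the larger files described above and wrap each variant in a timer.

```php
<?php
// Hypothetical test file: 1,000 lines of 10 bytes (9 characters plus "\n").
$path = tempnam(sys_get_temp_dir(), 'opt');
file_put_contents($path, str_repeat("123456789\n", 1000));

// Variant A: read everything with a single file() call.
$a = file($path);

// Variant B: fopen + fgets loop. The strict comparison with false avoids
// stopping early on a line that evaluates to false.
$b = [];
$f = fopen($path, "r") or die(1);
while (($s = fgets($f, 1000)) !== false) {
    $b[] = $s;
}
fclose($f);

unlink($path);

// Both variants yield the same array of lines, newlines included.
var_dump($a === $b, count($a));
```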