Friday, August 10, 2012

Testing strings in PHP

Continuing from my last post on testing ints, I was curious about strings as well. Strings are a bit easier to test in PHP than ints, so I'll skip some of the back story and get to the good stuff.

As far as I can tell, there are two valid ways to test whether a variable is a string: is_string($var) and (string)$var === $var. is_string is certainly easier to remember, but is it as fast?
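To see the two checks side by side, here is a minimal sketch. The wrapper name is_string_via_cast() and the sample values are my own picks for illustration, not part of the benchmark:

```php
<?php
// Hypothetical wrapper name; the body is the cast comparison described above.
function is_string_via_cast($value)
{
    return (string)$value === $value;
}

// Illustrative scalar values; arrays and objects are left out on purpose.
$samples = array('123', 123, 12.3, '', null, true, false);

foreach ($samples as $value) {
    printf("%-6s is_string: %-5s cast: %s\n",
        var_export($value, true),
        var_export(is_string($value), true),
        var_export(is_string_via_cast($value), true));
}
```

For these values the two checks agree on every row. Arrays are omitted because, on recent PHP versions, casting one to a string makes PHP emit an "Array to string conversion" diagnostic, which is_string() never does.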

The Testing:

I used a simple test strategy: create a set of both valid and invalid values, loop through them 100,000 times, and see which check takes the least time. Here is my code:

<?php
require_once('../class/Timer.inc');
$timer = new Timer(); // this is just a simple timer object

$testArr = array(123,'123',12.3,'12.3','1e9','0115',0115,0xFF,'0xFF','a',array(),array(5),array('b'),'',null,true,false);
$loops = 100000;

echo '<h2>Testing is_string($var)</h2>';
$timer->reset();
for($i=0;$i<$loops;$i++){
    foreach($testArr as $v){
        test_is_string($v);
    }
}
echo $timer->elapsed();

echo '<h2>Testing (string)$var === $var</h2>';
$timer->reset();
for($i=0;$i<$loops;$i++){
    foreach($testArr as $v){
        test_typecast($v);
    }
}
echo $timer->elapsed();

function test_is_string($var){
    return (is_string($var));
}
function test_typecast($var){
    return ((string)$var === $var);
}
?>
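The Timer class lives in a file that isn't shown here. If you want to run the script yourself, something like the following should do as a stand-in. This is an assumption on my part: it only mirrors the two methods the script calls, reset() and elapsed(), and builds them on microtime(true).

```php
<?php
// Hypothetical stand-in for the Timer class loaded from Timer.inc.
// The real class is not shown, so this only mirrors the two methods used.
class Timer
{
    private $start;

    public function __construct()
    {
        $this->reset();
    }

    // Restart the clock.
    public function reset()
    {
        $this->start = microtime(true);
    }

    // Seconds (as a float) since construction or the last reset().
    public function elapsed()
    {
        return microtime(true) - $this->start;
    }
}
```

With this in place of the require_once line, the rest of the script runs unchanged.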

The Conclusion:

test_is_string: 0.82858300209045
test_typecast: 0.75825190544128

I was pleasantly surprised by this result: (string)$var === $var turned out to be faster than is_string($var), by roughly 8% in this run.

Testing for Ints in PHP

The Problems:

The biggest issue with this kind of test is that PHP is loosely typed, such that 123 == '123' even though the second value is a string. That said, a simple is_int('123') returns false, because '123' is obviously a string. But what happens when you are dealing with JSON from an external source, where you cannot always control how the content comes over? Often we'll see something like ["123"], which is completely valid JSON (as opposed to the more accurate [123]). In those cases, is_int would fail as well. So what options do we have?
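A small sketch of the JSON case, using json_decode() on two hand-written payloads (the payloads are made up for illustration):

```php
<?php
// Two payloads carrying the same number, once quoted and once not.
$quoted   = json_decode('["123"]');
$unquoted = json_decode('[123]');

var_dump(is_int($quoted[0]));          // bool(false): the value arrives as the string '123'
var_dump(is_int($unquoted[0]));        // bool(true)
var_dump($quoted[0] == $unquoted[0]);  // bool(true): loose comparison ignores the type
```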

is_int('123'); fails, so that won't do.

$int = (int)'123'; works, but $int = (int)array('123'); returns 1, as does $int = (int)true;... so we can't just rely on type casting.
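Here is what those casts produce. The last line, with 'abc', is an extra case I added to show that a cast never signals bad input on its own:

```php
<?php
var_dump((int)'123');         // int(123)
var_dump((int)array('123'));  // int(1): any non-empty array casts to 1
var_dump((int)true);          // int(1)
var_dump((int)'abc');         // int(0): a non-numeric string casts to 0
```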

is_numeric('123'); works, and is_numeric(array('123')) correctly returns false, but is_numeric('123.45'); is true, so that by itself won't work.

(is_numeric('123') && '123'==(int)'123') does work in each case...

(is_int('123') || ctype_digit('123')) also works as expected. (ctype_digit() returns true if the value is a string that contains only digits.)
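Wrapped in a function, that check looks like this. The name is_int_like() is made up for the example; the expression inside is the one above:

```php
<?php
// Hypothetical helper name; the expression is the is_int/ctype_digit check.
function is_int_like($value)
{
    // is_int() runs first, so ctype_digit() only decides for non-integers.
    return is_int($value) || ctype_digit($value);
}

var_dump(is_int_like(123));     // bool(true), via is_int()
var_dump(is_int_like('123'));   // bool(true), via ctype_digit()
var_dump(is_int_like('12.3'));  // bool(false): '.' is not a digit
var_dump(is_int_like(''));      // bool(false): an empty string has no digits
```

The order matters: putting is_int() first means real integers never reach ctype_digit(), which is meant for strings.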

Just for kicks, I also decided to try it with regex: preg_match('/^\d+$/','123'); works in all cases as expected.

So now that we have three candidates (I tested each with a barrage of both valid and invalid options, all three passed as expected), which of them is the most performant?
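One caveat worth checking before picking a winner: the three candidates passed my test set, but they need not agree on inputs outside it. A rough sketch with a few extra strings (the function names and the inputs '-5' and '12.0' are additions for illustration, not part of the benchmark):

```php
<?php
// Illustrative names; the bodies are the three candidate expressions.
function via_is_numeric($v)  { return is_numeric($v) && (int)$v == $v; }
function via_ctype_digit($v) { return is_int($v) || ctype_digit($v); }
function via_preg_match($v)  { return preg_match('/^\d+$/', $v) === 1; }

// String inputs only, as an assumption about what an external feed might send.
$inputs = array('123', '-5', '12.0', '12.3', 'abc');

foreach ($inputs as $input) {
    printf("%-6s is_numeric: %-5s ctype_digit: %-5s preg_match: %s\n",
        var_export($input, true),
        var_export(via_is_numeric($input), true),
        var_export(via_ctype_digit($input), true),
        var_export(via_preg_match($input), true));
}
```

The is_numeric version accepts '-5' and '12.0', while the other two reject both. Whether that is a bug or a feature depends on what you want to count as an int.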

The Testing:


I used a simple test strategy: create a set of both valid and invalid values, loop through them 100,000 times, and see which check takes the least time. Here is my code:

<?php
require_once('../class/Timer.inc');
$testArr = array(123, '123', 12.3, '12.3', 0115, '0115', 0xFF, '0xFF', 'a', array(), array(5), array('b'), '', null, true, false);
$timer = new Timer();//this is just a simple timer object
$loops = 100000;

echo '<h2>Testing is_numeric($int) && (int)$int == $int</h2>';
$timer->reset();
for($i=0;$i<$loops;$i++){
    foreach($testArr as $v){
        test_is_numeric($v);
    }
}
echo $timer->elapsed();

echo '<h2>Testing is_int($int) || ctype_digit($int)</h2>';
$timer->reset();
for($i=0;$i<$loops;$i++){
    foreach($testArr as $v){
        test_ctype_digit($v);
    }
}
echo $timer->elapsed();

echo '<h2>Testing preg_match(\'/^\d+$/\',$int)</h2>';
$timer->reset();
for($i=0;$i<$loops;$i++){
    foreach($testArr as $v){
        test_preg_match($v);
    }
}
echo $timer->elapsed();

function test_is_numeric($int){
    return (is_numeric($int) && (int)$int == $int);
}
function test_ctype_digit($int){
    return (is_int($int) || ctype_digit($int));
}
function test_preg_match($int){
    return preg_match('/^\d+$/',$int);
}
?>
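The three timing loops above are identical apart from the function they call, so they could be folded into one helper. A sketch, assuming microtime(true) in place of the Timer object; bench() is a hypothetical name, and the value list is trimmed to ints and strings to keep the example free of type warnings on newer PHP versions:

```php
<?php
// Hypothetical helper: call $fn on every value, $loops times, and time it.
function bench($fn, array $values, $loops)
{
    $start = microtime(true);
    for ($i = 0; $i < $loops; $i++) {
        foreach ($values as $v) {
            $fn($v);
        }
    }
    return microtime(true) - $start; // elapsed seconds as a float
}

function test_is_numeric($int)  { return is_numeric($int) && (int)$int == $int; }
function test_ctype_digit($int) { return is_int($int) || ctype_digit($int); }

// Ints and strings only; a subset of the original test array.
$values = array(123, '123', '12.3', '0115', 'a', '');

foreach (array('test_is_numeric', 'test_ctype_digit') as $name) {
    printf("%s: %.4f\n", $name, bench($name, $values, 100000));
}
```

Timings from this will differ from the numbers below, since both the value list and the machine are different.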

The Conclusion:

test_is_numeric: 0.94352388381958
test_ctype_digit: 1.0512418746948
test_preg_match: 6.5737879276276

The winner by a narrow margin: (is_numeric($int) && (int)$int == $int), about 10% faster than the ctype_digit version in this run. Oh, and stay away from regex in this case at all costs: the regex version took more than six times as long as either of the other options.