Wednesday, February 17, 2010

PHP: References To Array Elements Are Risky

References to array elements can bite, and not only when you iterate by reference in a foreach loop. It seems that creating a reference to an array element replaces that element itself with a reference. If you then copy the array and change that element in the copy, you overwrite the value in the original!

All of the code presented here was tested on Mac (PHP 5.2.11), Linux (PHP 5.2.6-1+lenny4) and Windows XP (PHP 5.3.0), using a cross-platform PHP testing lab on a Mac, based on VirtualBox.

Changing copy affects original array

Sounds impossible? I agree. I couldn't believe it myself. Nevertheless here is the proof:
Example 1
$a = array('one', 'two', 'three', 'four'); 
$a2 = &$a[2]; 
 
$b = $a; 
$b[1] = 'two again'; 
$b[2] = 'reference bites'; 
 
var_dump($a, $b);
The above example will output:
array(4) {
  [0]=>
  string(3) "one"
  [1]=>
  string(3) "two"
  [2]=>
  &string(15) "reference bites"
  [3]=>
  string(4) "four"
}
array(4) {
  [0]=>
  string(3) "one"
  [1]=>
  string(9) "two again"
  [2]=>
  &string(15) "reference bites"
  [3]=>
  string(4) "four"
}

Changing copy of a copy affects original array too

If you look at the dump carefully, you will see that $a[2] and $b[2] are displayed as references (note the & in front of string). That would mean the element has been replaced with a reference. And if you copy a reference, you just get what? The same reference, right? So any copy of $a would contain that reference. Taking it further, any copy of a copy of $a would contain that same reference. Let's check it:
Example 2
$a = array('one', 'two', 'three', 'four'); 
$a2 = &$a[2]; 
 
$b = $a; 
$c = $b; 
$c[2] = 'references bites more'; 
$b[1] = 'two again'; 
$a[0] = 'one more'; 
var_dump($a, $b, $c);
The above example will output:
array(4) {
  [0]=>
  string(8) "one more"
  [1]=>
  string(3) "two"
  [2]=>
  &string(21) "references bites more"
  [3]=>
  string(4) "four"
}
array(4) {
  [0]=>
  string(3) "one"
  [1]=>
  string(9) "two again"
  [2]=>
  &string(21) "references bites more"
  [3]=>
  string(4) "four"
}
array(4) {
  [0]=>
  string(3) "one"
  [1]=>
  string(3) "two"
  [2]=>
  &string(21) "references bites more"
  [3]=>
  string(4) "four"
}
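One way to get a copy that does not share the referenced element is to rebuild the array element by element. The following is only a minimal sketch: the helper name copy_without_references is hypothetical (not a PHP built-in), and it works on the top level only, so nested arrays that contain references of their own would need extra handling.

```php
<?php
// Hypothetical helper: build a new array by assigning each element by value.
// Assignment by value stores the current value, not the reference itself.
function copy_without_references(array $source)
{
    $copy = array();
    foreach ($source as $key => $value) {
        $copy[$key] = $value;
    }
    return $copy;
}

$a = array('one', 'two', 'three', 'four');
$a2 = &$a[2];                       // $a[2] is now shared with $a2

$b = copy_without_references($a);   // instead of a plain $b = $a
$b[2] = 'changed in copy only';

var_dump($a[2], $b[2]);
```

With a plain $b = $a the write to $b[2] would also show up in $a[2] and $a2; here it should stay inside $b.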

Tricky "foreach" with reference explained

As you can see, the only element affected is the one referenced. That can lead to serious problems: working on a copy of an array with scalar elements is not as safe as one might think. However, it also explains one of the biggest pitfalls of iterating over arrays in PHP: foreach with a reference. The following is rather common knowledge:
Example 3
$a = array('one', 'two', 'three', 'four'); 
foreach ($a as &$v) {} 
foreach ($a as $v) {} 
var_dump($a);
The above example will output:
array(4) {
  [0]=>
  string(3) "one"
  [1]=>
  string(3) "two"
  [2]=>
  string(5) "three"
  [3]=>
  &string(5) "three"
}
But what was the explanation for that, again? It is quite clear that $v keeps a reference to $a[3] after the first foreach loop has finished. So what values does $v (effectively $a[3]) get within the next foreach loop? Let's see:
Example 4
$a = array('one', 'two', 'three', 'four'); 
foreach ($a as &$v) {} 
foreach ($a as $v) { 
 var_dump($a[3]); 
}
The above example will output:
string(3) "one"
string(3) "two"
string(5) "three"
string(5) "three"
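The output above can be reproduced by hand. The sketch below is a simplified model, not a description of how the engine implements foreach: it sets up the binding the first loop leaves behind and then performs the assignments of the second loop one by one. The $seen array is only there to record what $a[3] holds after each step.

```php
<?php
$a = array('one', 'two', 'three', 'four');

// State left behind by: foreach ($a as &$v) {}
$v = &$a[3];

// Roughly what: foreach ($a as $v) {} does, one assignment per iteration.
$seen = array();
for ($i = 0; $i < 4; $i++) {
    $v = $a[$i];      // $v is still bound to $a[3], so this writes into $a[3]
    $seen[] = $a[3];  // the value Example 4 dumps on each pass
}

var_dump($seen);
```

On the last pass the assignment is effectively $a[3] = $a[3], and by then $a[3] already holds 'three'.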
Well, it's quite obvious. Or is it? As the foreach manual page states: "unless the array is referenced, foreach operates on a copy of the specified array and not the array itself". I would assume that "array is referenced" means that the array elements are referenced, like this: foreach ($a as &$v) {}. That is not the case with the second loop, so it should work on a copy. Let's look at what exactly would happen if the second foreach loop really worked on a separate copy, one made before any reference existed:
Example 5
$a = array('one', 'two', 'three', 'four'); 
$b = $a; 
foreach ($a as &$v) {} 
foreach ($b as $v) {} 
var_dump($a);
The above example will output:
array(4) {
  [0]=>
  string(3) "one"
  [1]=>
  string(3) "two"
  [2]=>
  string(5) "three"
  [3]=>
  &string(4) "four"
}
I imagine you expected this result. Beware of references to array elements, and thanks for reading. I hope you found this article useful. Please don't hesitate to leave a comment.

9 comments:

  1. Using the latest version of PHP, instead of doing the $b = $a from Example 5, I was able to work around the issue by changing $v in the second foreach loop to a different variable. It seems that reusing the variable that served as the reference variable is what does not work as expected.

  2. Another interesting bit: if you reference an array element that is later unset() or popped, the reference retains its value:

    $a = array(1,2,3,4);
    $b = &$a[2]; // b = 3

    unset($a[2]);

    print $b; // prints 3

  3. References to array elements really confuse me. I am still not clear whether this is a bug in PHP and, if so, whether it has been fixed, because only a few articles talk about it.

    If we unset all the references to an array element except the element itself, the element goes back to normal. This is weird. I don't know how it is implemented.

    ';

    unset($b);
    unset($c);

    var_dump($a);
    echo '
    ';


    output:
    array(2) { [0]=> &int(10) [1]=> int(1) }
    array(2) { [0]=> int(10) [1]=> int(1) }
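The effect described in this comment can also be checked by behaviour rather than by reading the dump. The following is a sketch with made-up variable names, not the commenter's original code: the first half copies an array while a reference to one of its elements is still alive, the second half copies it after every extra reference has been unset.

```php
<?php
// Case 1: the extra reference is still alive when the array is copied.
$shared = array(10, 1);
$ref = &$shared[0];
$sharedCopy = $shared;
$sharedCopy[0] = 99;     // goes through the shared reference

// Case 2: all extra references are unset before the array is copied.
$a = array(10, 1);
$b = &$a[0];
$c = &$a[0];
unset($b);
unset($c);               // only $a[0] itself still holds the value
$copy = $a;
$copy[0] = 99;           // expected to change the copy only

var_dump($shared[0], $a[0], $copy[0]);
```

In case 1 the original element should follow the copy; in case 2 it should keep its value of 10.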

  4. I think the "Tricky "foreach" with reference explained" part is a misunderstanding. The point is not whether the second loop operates on the original array or a copy -- it doesn't change the array, so original-or-copy doesn't matter.

    The point is that the second loop comes after the first loop, and specifically after the symbol "v" has been bound to the last array element. Try renaming the loop variable in the second loop and it will go away. The second loop doesn't use a local variable as the loop variable, it actually uses the last array cell as the loop variable, because the name "v" is still bound to it. Unsetting "v" after the first loop is another way to make it go away, and actually I recommend to *always* unset the loop variable after a "foreach with reference" so you don't accidentally trip over this.
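Both remedies mentioned in this comment can be sketched as follows. The variable names $item and $other are illustrative only.

```php
<?php
// Remedy 1: unset the loop variable after iterating by reference.
$a = array('one', 'two', 'three', 'four');
foreach ($a as &$v) {}
unset($v);                 // $v is no longer bound to $a[3]
foreach ($a as $v) {}      // $v is a fresh, ordinary variable here

// Remedy 2: use a different variable name in the second loop.
$b = array('one', 'two', 'three', 'four');
foreach ($b as &$item) {}
foreach ($b as $other) {}  // $item is still bound to $b[3], but nothing writes to it

var_dump($a, $b);
```

Unsetting is the more robust of the two, since with the rename a later assignment to $item would still write into $b[3].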

  5. WRT your first observation, the PHP manual says that "references do not work like C pointers and more like Unix hardlinks". Actually, they work like neither of the two.

    Very good source: http://derickrethans.nl/talks/phparch-php-variables-article.pdf

    PHP has two kinds of variables, and a variable can switch between the two:

    - non-reference variables. New variables are created like this. Assigning them to another variable by value increases an internal reference count, it doesn't actually copy the value. Write access copies the variable if the refcount is >1, otherwise it modifies the variable. Assigning the variable to another variable by reference is treated as write access; this makes sure that the implicit reference in by-value assignment doesn't become explicit, but it also is the kind of "assignment affects the source" behavior you described. Yes, it affects the source -- it is a write access that triggers copy-on-write AND it turns the source variable into a reference variable.

    - reference variables. There are at least two names for this variable, and they are explicit references, i.e. no copy-on-write. If the refcount drops to 1, the variable becomes a non-reference variable again. The latter effect makes unset() cancel the effect in your example.

  6. A final piece of information (sorry for multi-posting, but I'm writing this while discovering it myself):

    When an array is actually copied (for example, when a non-reference -- i.e. copy-on-write -- array with a refcount >1 is modified), then its element variables are treated in a way that is neither "assignment by value" nor "assignment by reference". Instead, the new array just uses the same variables as the original, and the refcount of each variable is increased by one. (In contrast: Assignment by value would copy reference variable elements, while assignment by reference would copy non-reference variable elements with a refcount >1, so this behavior is neither of the two).

    This also explains a more common observation made about arrays: That assigning arrays by value seemingly copies "normal" (non-reference) elements (actually it does copy-on-write), and shares elements that are references to other variables. Your observation is new in that sharing is also triggered by making a reference to a "normal" element, since it turns it into a reference variable.
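The "more common observation" can be illustrated with a small sketch: an array that holds one normal element and one element that is a reference to another variable. The variable names are made up for the illustration.

```php
<?php
$x = 'shared';
$a = array('plain', &$x);    // element 0 is a normal value, element 1 references $x

$b = $a;                     // assignment by value
$b[0] = 'changed plain';     // copy-on-write: only $b[0] changes
$b[1] = 'changed shared';    // written through the reference: $a[1] and $x change too

var_dump($a, $b, $x);
```

The article's examples show the same sharing, except that the reference is created afterwards, by taking a reference to an element that started out as a normal value.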

  7. can somebody post this issue on stackoverflow.com?? i wanna hear what community will say about this

  8. You have provided an nice article, Thank you very much for this one. And i hope this will be useful for many people.. and i am waiting for your next post keep on updating these kinds of knowledgeable things...

  9. Such a great articles in my carrier, It's wonderful commands like easiest understand words of knowledge in information's.