PHP : The fall of `array`


array is a very common type in PHP, but it is often overused.
I'll try to explain why you (almost) never need to type-hint array anymore!

Table of Contents

  1. The problems
  2. First steps
  3. Using Countable
  4. Using ArrayAccess
  5. Using Traversable
    5.1. The hard way (implementing Iterator)
    5.2. The easy way (implementing IteratorAggregate)
    5.3. But what do I gain ?
  6. Conclusion

The problems

  1. Business logic
    Let's first talk about why we need a replacement for array.
    When declaring a simple parameter like array $products, we often need to apply some domain logic to it: groupByCategories, filterNonUnavailable, calculateTotalPrice, ... But where should all of this logic live? I often see it repeated in multiple places in the code, or put in helper classes with inconsistent naming and structure.
  2. Memory consumption
    Arrays are known to be memory intensive in PHP (Nikita Popov has tweeted about this).
    Additionally, an array cannot be generated on the fly: all of its elements have to exist in memory at once (unlike a Generator, which we will discuss later).
  3. Interface Segregation Principle (ISP)
    An array always provides three different features at once: counting (Countable), iteration (Traversable) and offset access (ArrayAccess), which you may not all need.
  4. Keys must be string or int
    When using an array you MUST use either a string or an int as key (⚠️ floats and integer-like numeric strings such as '8' are automatically converted to int).
    Sometimes you may want to use a DateTimeImmutable, or any other object, as a key. You may even want duplicated keys!
  5. Pass by value
    Arrays have value semantics: when you pass an array to a method or a function, the callee works on its own copy and its changes never reach the caller. PHP uses copy-on-write, so the array is only physically duplicated once the callee modifies it. Objects, however, are passed by handle: the caller and the callee share the same instance.
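
To make the last two points concrete, here is a minimal sketch. The function names (addToArray, addToObject, priceHistory) and the sample values are made up for illustration; only built-in PHP features are used.

```php
<?php
declare(strict_types=1);

// 1. Integer-like string keys are cast to int, so both writes hit the same slot.
$stock = [];
$stock['8'] = 'written with a string key';
$stock[8] = 'written with an int key';
$stockKeys = array_keys($stock); // a single int key

// 2. Arrays have value semantics: the callee changes its own copy only.
function addToArray(array $items): void
{
    $items[] = 'added';
}

// Objects are passed by handle: the callee changes the caller's instance.
function addToObject(ArrayObject $items): void
{
    $items[] = 'added';
}

$array = ['initial'];
addToArray($array);

$object = new ArrayObject(['initial']);
addToObject($object);

// 3. A generator may yield any key type, including objects and duplicates.
function priceHistory(): Generator
{
    $day = new DateTimeImmutable('2024-01-01');
    yield $day => 1000;
    yield $day => 1200; // same key again, both entries are kept
}

$entries = 0;
foreach (priceHistory() as $date => $price) {
    ++$entries;
}

var_dump($stockKeys, count($array), count($object), $entries);
```

The array still holds one element after the call, the ArrayObject holds two, and the generator delivers both entries even though they share the same object key.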

First steps

If you've made it this far, you might be afraid of all the refactoring this may take! Let me reassure you first, and then explain how we can improve from there. Here is a simple class that behaves almost exactly like an array:

/** @extends ArrayIterator<string, Product> */
class ProductCollection extends ArrayIterator
{
    /**
     * @param array<string, Product> $products
     */
    public function __construct(array $products)
    {
        parent::__construct($products);
    }
}

Now you can type-hint ProductCollection $products everywhere without changing anything else. It can be used much like an array, and PhpStorm will know that $productCollection['someId'] returns a Product instance.

EDIT: The object IS NOT natively compatible with array_* functions. You'll need to call iterator_to_array($productCollection) before passing it to an array_* function, or add a method to your object that works on the internal array.
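
Both options could look roughly like the sketch below. The Product class and the names() method are assumptions made for this example; iterator_to_array() and ArrayIterator::getArrayCopy() are built into PHP.

```php
<?php
declare(strict_types=1);

// Hypothetical Product class, shaped like the one used in this article's examples.
final class Product
{
    public function __construct(
        public readonly string $id,
        public readonly string $name,
    ) {
    }
}

/** @extends ArrayIterator<string, Product> */
class ProductCollection extends ArrayIterator
{
    /**
     * Hypothetical helper that keeps the array_* call inside the collection.
     *
     * @return array<string, string> product names indexed by product id
     */
    public function names(): array
    {
        // getArrayCopy() returns a copy of the array wrapped by ArrayIterator.
        return array_map(
            static fn (Product $product): string => $product->name,
            $this->getArrayCopy(),
        );
    }
}

$products = new ProductCollection([
    '#123' => new Product(id: '#123', name: 't-shirt'),
    '#456' => new Product(id: '#456', name: 'headband'),
]);

// Option 1: convert to a plain array at the call site.
$ids = array_keys(iterator_to_array($products));

// Option 2: call a method that works on the internal array.
$names = $products->names();

var_dump($ids, $names);
```

Option 2 tends to age better: callers never need to know that an array sits behind the collection.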


Seems like enough, right? Well, I did say first steps. Now let's see how and why we can improve a bit further.

Using Countable

This one is very lightweight and is often combined with the other two. If you only need to know how many items your collection holds, you can simply declare that your class is Countable, like so:

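class ProductCollection implements Countable
{
    /**
     * @param array<string, Product> $products
     */
    public function __construct(private readonly array $products)
    {
    }

    public function count(): int
    {
        return count($this->products);
    }
}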
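
As a quick usage sketch (the Product class below is a stand-in assumed for this example), the built-in count() function delegates to the object's count() method:

```php
<?php
declare(strict_types=1);

// Hypothetical Product class used only to fill the collection.
final class Product
{
    public function __construct(
        public readonly string $id,
        public readonly string $name,
    ) {
    }
}

class ProductCollection implements Countable
{
    /** @param array<string, Product> $products */
    public function __construct(private readonly array $products)
    {
    }

    public function count(): int
    {
        return count($this->products);
    }
}

$collection = new ProductCollection([
    '#123' => new Product(id: '#123', name: 't-shirt'),
    '#456' => new Product(id: '#456', name: 'headband'),
]);

// count() on a Countable object calls ProductCollection::count().
$total = count($collection);
$countable = is_countable($collection);

var_dump($total, $countable);
```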

This class on its own is not very useful. Let's dive into the other two.

Using ArrayAccess

Let's say that now you want to provide the same syntax as $productCollection[$productId] to access data BUT you want to provide an immutable-like array.

Here is an example:

/** @implements ArrayAccess<string, Product> */
class FrozenProductCollection implements ArrayAccess
{
    /**
     * @param array<string, Product> $products
     */
    public function __construct(private readonly array $products)
    {
    }

    public function offsetExists(mixed $offset): bool
    {
        return array_key_exists($offset, $this->products);
    }

    public function offsetGet(mixed $offset): mixed
    {
        return $this->products[$offset];
    }

    public function offsetSet(mixed $offset, mixed $value): void
    {
        throw new Exception('Cannot add a product to a FrozenProductCollection');
    }

    public function offsetUnset(mixed $offset): void
    {
        throw new Exception('Cannot remove a product from a FrozenProductCollection');
    }
}

With this you can access data but you cannot remove or add new products to the Collection.

  • offsetExists(mixed $offset): bool : returns true if an entry exists at position $offset (used by isset, empty, ??, ...).
  • offsetGet(mixed $offset): mixed : returns the entry associated with the offset.
  • offsetSet(mixed $offset, mixed $value): void : adds or overrides an entry at position $offset.
  • offsetUnset(mixed $offset): void : removes the entry at position $offset.

Note that all these methods use mixed for $offset. So yes, offsetGet could look up a Product from a ProductId object, a ProductAlias, or whatever else fits your domain.
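
Here is a minimal sketch of that idea. The ProductId value object, the shape of Product and the exception choices are assumptions made for this example, not something prescribed by PHP or by the classes above.

```php
<?php
declare(strict_types=1);

// Hypothetical value object used as the offset.
final class ProductId
{
    public function __construct(public readonly string $value)
    {
    }
}

// Hypothetical Product holding its own ProductId.
final class Product
{
    public function __construct(
        public readonly ProductId $id,
        public readonly string $name,
    ) {
    }
}

/** @implements ArrayAccess<ProductId, Product> */
final class ProductCollection implements ArrayAccess
{
    /** @var array<string, Product> indexed by the raw id string */
    private array $products = [];

    public function __construct(Product ...$products)
    {
        foreach ($products as $product) {
            $this->products[$product->id->value] = $product;
        }
    }

    public function offsetExists(mixed $offset): bool
    {
        return $offset instanceof ProductId
            && array_key_exists($offset->value, $this->products);
    }

    public function offsetGet(mixed $offset): Product
    {
        if (!$this->offsetExists($offset)) {
            throw new OutOfBoundsException('Unknown product');
        }

        return $this->products[$offset->value];
    }

    public function offsetSet(mixed $offset, mixed $value): void
    {
        throw new LogicException('This sketch is read-only');
    }

    public function offsetUnset(mixed $offset): void
    {
        throw new LogicException('This sketch is read-only');
    }
}

$collection = new ProductCollection(
    new Product(new ProductId('#123'), 't-shirt'),
);

// Any ProductId carrying the same value finds the product.
$name = $collection[new ProductId('#123')]->name;
$missing = isset($collection[new ProductId('#999')]);

var_dump($name, $missing);
```

Lookups compare the wrapped string rather than object identity, so two distinct ProductId instances with the same value resolve to the same product.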

You could even combine this with the basic class, like so:

/** @extends ArrayIterator<string, Product> */
class ProductCollection extends ArrayIterator
{
    /**
     * @param array<string, Product> $products
     */
    public function __construct(array $products)
    {
        parent::__construct($products);
    }
}

/** @implements ArrayAccess<string, Product> */
class FrozenProductCollection implements ArrayAccess
{
    public function __construct(
        private readonly ProductCollection $products,
    ) {
    }

    public function offsetExists(mixed $offset): bool
    {
        // array_key_exists() only accepts arrays, so ask the wrapped collection instead.
        return $this->products->offsetExists($offset);
    }

    public function offsetGet(mixed $offset): mixed
    {
        return $this->products[$offset];
    }

    public function offsetSet(mixed $offset, mixed $value): void
    {
        throw new Exception('Cannot add a product to a FrozenProductCollection');
    }

    public function offsetUnset(mixed $offset): void
    {
        throw new Exception('Cannot remove a product from a FrozenProductCollection');
    }
}

$productCollection = new FrozenProductCollection(new ProductCollection([
    '#123' => new Product(
        id: '#123',
        name: 't-shirt',
    ),
]));

$productCollection['#456'] = new Product(
    id: '#456',
    name: 'headband',
); // This is now forbidden

Decorator for the win 🎉 !

Using Traversable

PHP's Traversable interface marks an object that can be iterated over with foreach. Arrays can be iterated too, although an array is not an instance of Traversable (the iterable type covers both). Let's look at how to implement it. You cannot implement Traversable directly (it is reserved for PHP internals), but you can implement either Iterator or IteratorAggregate.

The hard way (implementing Iterator)

/** @implements Iterator<string, Product> */
class ProductCollection implements Iterator
{
    private int $cursorPosition = 0;
    
    /** @var array<string> */
    private readonly array $keys;
    
    /** @var array<Product> */
    private readonly array $products;

    /**
      * @param array<string, Product> $products
      */
    public function __construct(
        array $products
    ) {
        $this->keys = array_keys($products);
        $this->products = array_values($products);
    }
    
    public function rewind(): void
    {
        $this->cursorPosition = 0;
    }
    
    public function current(): Product
    {
        return $this->products[$this->cursorPosition];
    }
    
    public function key(): string
    {
        return $this->keys[$this->cursorPosition];
    }
    
    public function next(): void
    {
        ++$this->cursorPosition;
    }
    
    public function valid(): bool
    {
        return isset($this->products[$this->cursorPosition]);
    }
}

You can see that this is a lot of code for something that does not yet provide anything better than an array. Although this code looks the same for every collection, a hand-written Iterator is a good way to fetch values lazily, for example over PDO results, much like a Generator would. So how do we avoid this much repeated code?

The easy way (implementing IteratorAggregate)

Take a look at this class :

/** @implements IteratorAggregate<string, Product> */
class ProductCollection implements IteratorAggregate
{
    /**
      * @param array<string, Product> $products
      */
    public function __construct(
        private readonly array $products
    ) {
    }

    /** @return ArrayIterator<string, Product> */
    public function getIterator(): ArrayIterator
    {
        return new ArrayIterator($this->products);
    }
}

Both of these examples produce the same result: you can iterate over a ProductCollection instance like so:

$productCollection = new ProductCollection([
    '#123' => new Product(
        id: '#123',
        name: 't-shirt',
    ),
]);

foreach ($productCollection as $product) {
    echo $product->id; // will print '#123'
}

But what do I gain ?

First of all, you now have a typed collection that you can explicitly type-hint anywhere. The collection itself holds the information about what it iterates over, whereas with array you had to document in phpdoc, everywhere, what the array contains.

You also have a class, so you can add any methods that you deem useful in this context.
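
For instance, the domain logic mentioned in the first section could live on the collection itself. The sketch below is only an illustration: the priceInCents and available properties and the onlyAvailable() and totalPriceInCents() methods are assumptions made for this example.

```php
<?php
declare(strict_types=1);

// Hypothetical Product with a price and an availability flag.
final class Product
{
    public function __construct(
        public readonly string $id,
        public readonly string $name,
        public readonly int $priceInCents,
        public readonly bool $available,
    ) {
    }
}

/** @implements IteratorAggregate<string, Product> */
final class ProductCollection implements IteratorAggregate
{
    /** @param array<string, Product> $products */
    public function __construct(private readonly array $products)
    {
    }

    /** @return ArrayIterator<string, Product> */
    public function getIterator(): ArrayIterator
    {
        return new ArrayIterator($this->products);
    }

    // Returns a new collection and leaves the current one untouched.
    public function onlyAvailable(): self
    {
        return new self(array_filter(
            $this->products,
            static fn (Product $product): bool => $product->available,
        ));
    }

    public function totalPriceInCents(): int
    {
        return array_sum(array_map(
            static fn (Product $product): int => $product->priceInCents,
            $this->products,
        ));
    }
}

$products = new ProductCollection([
    '#123' => new Product('#123', 't-shirt', 1500, true),
    '#456' => new Product('#456', 'headband', 500, false),
]);

$total = $products->onlyAvailable()->totalPriceInCents();

var_dump($total);
```

Every caller now filters and sums the same way, instead of each re-implementing the logic on a raw array.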

At this point there is no memory improvement yet, because we are still storing an array in our class. Let's improve this.

/** @implements IteratorAggregate<string, Product> */
class ProductCollection implements IteratorAggregate
{
    /**
      * @param iterable<string, Product> $products
      */
    public function __construct(
        private readonly iterable $products
    ) {
    }

    /** @return Generator<string, Product> */
    public function getIterator(): Generator
    {
        yield from $this->products;
    }
}

⚠️ Beware that $products might be a non-rewindable iterator such as a Generator, in which case you can iterate over the collection only once.
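
One possible workaround, sketched here under assumptions of my own (the RewindableProductCollection name and the closure-based constructor do not come from any library), is to store a factory that creates a fresh iterable on every iteration:

```php
<?php
declare(strict_types=1);

// Hypothetical Product class, shaped like the one used in this article's examples.
final class Product
{
    public function __construct(
        public readonly string $id,
        public readonly string $name,
    ) {
    }
}

/** @implements IteratorAggregate<string, Product> */
final class RewindableProductCollection implements IteratorAggregate
{
    /** @param Closure(): iterable<string, Product> $source called once per iteration */
    public function __construct(private readonly Closure $source)
    {
    }

    /** @return Generator<string, Product> */
    public function getIterator(): Generator
    {
        // Each foreach gets a brand new iterable from the factory.
        yield from ($this->source)();
    }
}

$collection = new RewindableProductCollection(static function (): Generator {
    yield '#123' => new Product(id: '#123', name: 't-shirt');
    yield '#456' => new Product(id: '#456', name: 'headband');
});

$firstPass = array_keys(iterator_to_array($collection));
$secondPass = array_keys(iterator_to_array($collection));

var_dump($firstPass, $secondPass);
```

The trade-off is that the source is re-run on every pass, which for a database-backed source means running the query again.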

Now you can build the collection from either an array or an object. That object can be a generator (created using yield). Here is an example using Doctrine that iterates over every row without loading all of them into memory at once:

class ProductRepository extends ServiceEntityRepository
{
    public function __construct(ManagerRegistry $registry)
    {
        parent::__construct($registry, Product::class);
    }
    
    public function streamAll(): ProductCollection
    {
        $qb = $this->createQueryBuilder('product');
        
        return new ProductCollection($qb->getQuery()->toIterable());
    }
}

There is a lot more to say about iterators. Depending on the feedback on this post, I might go deeper in a follow-up article.


Conclusion

Using classes for what used to be provided by array allows you to:

  • more easily extend your code
  • keep business logic together
  • typehint precisely what you inject in a method or class
  • keep your static analysers happy!