PHP : The fall of `array`
`array` is a very common type in PHP, but it is often overused.
I'll try to explain why you (almost) don't need to type hint `array` anymore!
Table of Contents
- The problems
- First steps
- Using `Countable`
- Using `ArrayAccess`
- Using `Traversable`
  5.1. The hard way (implementing `Iterator`)
  5.2. The easy way (implementing `IteratorAggregate`)
  5.3. But what do I gain?
- Conclusion
The problems
- Business logic
Let's first talk about why we need a replacement for `array`.
When declaring a simple parameter like `array $products`, we often need to apply some domain logic to it: `groupByCategories`, `filterNonUnavailable`, `calculateTotalPrice`, ... But where should all of this logic live? I often see it repeated in multiple places in the code, or in helper classes with inconsistent naming or structure.
- Memory consumption
Arrays are known to be memory intensive in PHP (see the tweet from Nikita Popov below).
Additionally, an array cannot be generated on the fly (unlike `Generator`s, which we will discuss later).
- ISP (Interface Segregation Principle)
An array always bundles the 3 features that objects expose separately through `Countable`, `Traversable` and `ArrayAccess`: counting, iteration and offset access. You may not always need all of them.
- Keys must be `string` or `int`
When using an array you MUST use either a string or an int as key (⚠️ `float`s are automatically converted to `int`, and so are numeric `string`s such as `'1'`).
Sometimes you may want a `DateTimeImmutable`, or any other object, as a key. You may even want duplicate keys!
- Pass by value
Arrays have value semantics: when you pass one to a method or a function, the callee gets its own copy. PHP delays the actual duplication until one side writes to the array (copy-on-write), but a modified array can still end up existing twice in memory. Objects, however, are passed by handle: the callee works on the same instance.
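The last two points can be checked with a few lines of plain PHP. This is only a small sketch: the variable and function names are made up for the illustration, and `ArrayObject` stands in for "any object".

```php
<?php

// Key casting: several distinct-looking keys collapse into the same int key.
$prices = [];
$prices['1'] = 'numeric string'; // stored under int key 1
$prices[1] = 'int';              // overwrites the entry above
$prices[true] = 'bool';          // true is cast to int 1, overwrites again

var_dump(array_keys($prices)); // only one key is left: int(1)

// Arrays have value semantics: the callee works on its own copy.
function addToArray(array $items): void
{
    $items[] = 'new'; // the copy is made here, on the first write
}

// Objects are passed by handle: the callee mutates the same instance.
function addToObject(ArrayObject $items): void
{
    $items[] = 'new';
}

$array = ['old'];
$object = new ArrayObject(['old']);

addToArray($array);
addToObject($object);

echo count($array), PHP_EOL;  // 1
echo count($object), PHP_EOL; // 2
```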
First steps
If you've made it this far, you might be afraid of all the refactoring it may take! Let me reassure you first, and then explain how we can improve from there. Here is a simple class that behaves much like `array`:
/** @extends ArrayIterator<string, Product> */
class ProductCollection extends ArrayIterator
{
    /**
     * @param array<string, Product> $products
     */
    public function __construct(array $products)
    {
        parent::__construct($products);
    }
}
Now you can typehint `ProductCollection $products` everywhere with very few other changes. It can be used in much the same way as an `array`, and PhpStorm will know that `$productCollection['someId']` returns a `Product` instance.
EDIT: The object is NOT natively compatible with `array_*` functions. You'll need to call `iterator_to_array($productCollection)` before passing it to an `array_*` function. Either that, or create a method on your object that works on the internal value.
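Both options could look roughly like this. The `Product` class and the `names()` method are hypothetical, introduced only for the illustration; `getArrayCopy()` and `iterator_to_array()` are standard PHP.

```php
<?php

// Hypothetical minimal Product, only for this illustration.
final class Product
{
    public function __construct(
        public readonly string $id,
        public readonly string $name,
    ) {
    }
}

/** @extends ArrayIterator<string, Product> */
class ProductCollection extends ArrayIterator
{
    /** @return list<string> */
    public function names(): array
    {
        // getArrayCopy() is ArrayIterator's built-in way to get the plain array.
        return array_values(array_map(
            static fn (Product $product): string => $product->name,
            $this->getArrayCopy(),
        ));
    }
}

$productCollection = new ProductCollection([
    '#123' => new Product(id: '#123', name: 't-shirt'),
    '#456' => new Product(id: '#456', name: 'headband'),
]);

// Option 1: convert at the call site, then use any array_* function.
$ids = array_keys(iterator_to_array($productCollection));

// Option 2: let the collection expose a dedicated method.
$names = $productCollection->names();
```

The second option keeps the `array_*` call in one place, which is usually easier to maintain than converting at every call site.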
Seems like enough, right? Well, I did say first step. Now let's see how and why we can improve a bit further.
Using Countable
This one is very light and is often used in combination with the other two. If you only need to know how many items your collection holds, you can simply declare that your class is `Countable`, like so:
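class ProductCollection implements Countable
{
    /**
     * @param array<string, Product> $products
     */
    public function __construct(private readonly array $products)
    {
    }

    public function count(): int
    {
        return count($this->products);
    }
}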
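To see how a `Countable` object is consumed, here is a minimal, self-contained sketch. `NameCollection` is a hypothetical stand-in that holds plain strings so the example runs on its own.

```php
<?php

// Hypothetical stand-alone collection holding plain strings.
final class NameCollection implements Countable
{
    /** @param list<string> $names */
    public function __construct(private readonly array $names)
    {
    }

    public function count(): int
    {
        return count($this->names);
    }
}

$names = new NameCollection(['t-shirt', 'headband']);

echo count($names), PHP_EOL;   // 2, count() delegates to NameCollection::count()
echo $names->count(), PHP_EOL; // 2, same method called directly
```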
This class on its own is not very useful. Let's dive into the other two.
Using ArrayAccess
Let's say that you now want to provide the same syntax as `$productCollection[$productId]` to access data, BUT you want to provide an immutable-like array.
Here is an example:
/** @implements ArrayAccess<string, Product> */
class FrozenProductCollection implements ArrayAccess
{
    /**
     * @param array<string, Product> $products
     */
    public function __construct(private readonly array $products)
    {
    }

    public function offsetExists(mixed $offset): bool
    {
        return array_key_exists($offset, $this->products);
    }

    public function offsetGet(mixed $offset): mixed
    {
        return $this->products[$offset];
    }

    public function offsetSet(mixed $offset, mixed $value): void
    {
        throw new Exception('Cannot add a product to a FrozenProductCollection');
    }

    public function offsetUnset(mixed $offset): void
    {
        throw new Exception('Cannot remove a product from a FrozenProductCollection');
    }
}
With this you can access data, but you cannot remove or add products to the collection.
- `offsetExists(mixed $offset): bool`: returns true if an entry exists at position `$offset` (used by `isset`, `??`, ...)
- `offsetGet(mixed $offset): mixed`: returns the entry associated with the offset.
- `offsetSet(mixed $offset, mixed $value): void`: adds / overrides an entry at position `$offset`.
- `offsetUnset(mixed $offset): void`: removes the entry at position `$offset`.
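The mapping between array syntax and these four methods can be sketched as follows. `FrozenNames` is a hypothetical, simplified sibling of `FrozenProductCollection` that stores strings so the example is self-contained; it throws `LogicException` rather than a bare `Exception`, which is just one possible choice.

```php
<?php

/** Hypothetical read-only map of product names. */
final class FrozenNames implements ArrayAccess
{
    /** @param array<string, string> $names */
    public function __construct(private readonly array $names)
    {
    }

    public function offsetExists(mixed $offset): bool
    {
        return array_key_exists($offset, $this->names);
    }

    public function offsetGet(mixed $offset): mixed
    {
        return $this->names[$offset];
    }

    public function offsetSet(mixed $offset, mixed $value): void
    {
        throw new LogicException('Cannot add an entry to FrozenNames');
    }

    public function offsetUnset(mixed $offset): void
    {
        throw new LogicException('Cannot remove an entry from FrozenNames');
    }
}

$names = new FrozenNames(['#123' => 't-shirt']);

$known = isset($names['#123']);          // calls offsetExists()
$name = $names['#123'];                  // calls offsetGet()
$fallback = $names['#456'] ?? 'unknown'; // offsetExists() is false, so the default is used

try {
    $names['#456'] = 'headband';         // calls offsetSet(), which throws
    $writeRejected = false;
} catch (LogicException $e) {
    $writeRejected = true;
}

try {
    unset($names['#123']);               // calls offsetUnset(), which throws
    $unsetRejected = false;
} catch (LogicException $e) {
    $unsetRejected = true;
}
```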
Note that all these methods use `mixed` for `$offset`. So yes, you could have some logic to `offsetGet` a `Product` by a given `ProductId` object, or by a `ProductAlias`, or whatever you want.
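As a rough sketch of that idea, the collection below accepts a `ProductId` object as offset and indexes its internal array by the id's string value. `ProductId`, `Product` and `ProductsById` are all hypothetical names introduced for this example.

```php
<?php

// Hypothetical value objects, introduced only for this sketch.
final class ProductId
{
    public function __construct(public readonly string $value)
    {
    }
}

final class Product
{
    public function __construct(
        public readonly ProductId $id,
        public readonly string $name,
    ) {
    }
}

/** @implements ArrayAccess<ProductId, Product> */
final class ProductsById implements ArrayAccess
{
    /** @var array<string, Product> internal storage keyed by the id's string value */
    private array $products = [];

    public function offsetExists(mixed $offset): bool
    {
        return $offset instanceof ProductId
            && array_key_exists($offset->value, $this->products);
    }

    public function offsetGet(mixed $offset): mixed
    {
        if (!$this->offsetExists($offset)) {
            throw new OutOfBoundsException('Unknown product id');
        }

        return $this->products[$offset->value];
    }

    public function offsetSet(mixed $offset, mixed $value): void
    {
        if (!$offset instanceof ProductId || !$value instanceof Product) {
            throw new InvalidArgumentException('Expected ProductId => Product');
        }

        $this->products[$offset->value] = $value;
    }

    public function offsetUnset(mixed $offset): void
    {
        if ($offset instanceof ProductId) {
            unset($this->products[$offset->value]);
        }
    }
}

$id = new ProductId('#123');
$products = new ProductsById();
$products[$id] = new Product($id, 't-shirt');

// A different ProductId instance with the same value finds the same product.
echo $products[new ProductId('#123')]->name, PHP_EOL; // t-shirt
```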
You could even combine this with the basic class like so :
/** @extends ArrayIterator<string, Product> */
class ProductCollection extends ArrayIterator
{
    /**
     * @param array<string, Product> $products
     */
    public function __construct(array $products)
    {
        parent::__construct($products);
    }
}

/** @implements ArrayAccess<string, Product> */
class FrozenProductCollection implements ArrayAccess
{
    public function __construct(
        private readonly ProductCollection $products,
    ) {
    }

    public function offsetExists(mixed $offset): bool
    {
        // array_key_exists() only accepts arrays in PHP 8, so delegate to the inner collection
        return $this->products->offsetExists($offset);
    }

    public function offsetGet(mixed $offset): mixed
    {
        return $this->products[$offset];
    }

    public function offsetSet(mixed $offset, mixed $value): void
    {
        throw new Exception('Cannot add a product to a FrozenProductCollection');
    }

    public function offsetUnset(mixed $offset): void
    {
        throw new Exception('Cannot remove a product from a FrozenProductCollection');
    }
}

$productCollection = new FrozenProductCollection(new ProductCollection([
    '#123' => new Product(
        id: '#123',
        name: 't-shirt',
    ),
]));

$productCollection['#456'] = new Product(
    id: '#456',
    name: 'headband',
); // This is now forbidden: offsetSet() throws an Exception
Decorator for the win 🎉 !
Using Traversable
PHP's `Traversable` interface marks an object that can be iterated over (with `foreach`). An `array` can be iterated the same way (both match the `iterable` type), although it is not itself a `Traversable`, and objects can be too! Let's look at how to implement it. You cannot implement `Traversable` directly (it is reserved for PHP internals), but you can use either `Iterator` or `IteratorAggregate`.
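Strictly speaking, an array is iterable but is not an instance of `Traversable`, which matters as soon as you write type hints. A quick check, where `countItems()` is a hypothetical helper written only for this illustration:

```php
<?php

$array = ['a', 'b'];
$iterator = new ArrayIterator($array);

var_dump($array instanceof Traversable);    // bool(false): an array is not an object
var_dump($iterator instanceof Traversable); // bool(true)
var_dump(is_iterable($array));              // bool(true)
var_dump(is_iterable($iterator));           // bool(true)

/** The iterable type accepts arrays and Traversable objects alike. */
function countItems(iterable $items): int
{
    $count = 0;
    foreach ($items as $_) {
        ++$count;
    }

    return $count;
}

echo countItems($array), PHP_EOL;    // 2
echo countItems($iterator), PHP_EOL; // 2
```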
The hard way (implementing `Iterator`)
/** @implements Iterator<string, Product> */
class ProductCollection implements Iterator
{
    private int $cursorPosition = 0;

    /** @var array<string> */
    private readonly array $keys;

    /** @var array<Product> */
    private readonly array $products;

    /**
     * @param array<string, Product> $products
     */
    public function __construct(
        array $products
    ) {
        $this->keys = array_keys($products);
        $this->products = array_values($products);
    }

    public function rewind(): void
    {
        $this->cursorPosition = 0;
    }

    public function current(): Product
    {
        return $this->products[$this->cursorPosition];
    }

    public function key(): string
    {
        return $this->keys[$this->cursorPosition];
    }

    public function next(): void
    {
        ++$this->cursorPosition;
    }

    public function valid(): bool
    {
        return isset($this->products[$this->cursorPosition]);
    }
}
You can see that this is a lot of code for something that does not yet provide anything better than `array`. Although this code may look the same across any collection, it is actually a good way of implementing the PDO Generator technique. So how do we avoid this much repeated code?
The easy way (implementing `IteratorAggregate`)
Take a look at this class:
/** @implements IteratorAggregate<string, Product> */
class ProductCollection implements IteratorAggregate
{
    /**
     * @param array<string, Product> $products
     */
    public function __construct(
        private readonly array $products
    ) {
    }

    /** @return ArrayIterator<string, Product> */
    public function getIterator(): ArrayIterator
    {
        return new ArrayIterator($this->products);
    }
}
Both of these examples produce the same result: you can iterate over a `ProductCollection` instance like so:
$productCollection = new ProductCollection([
    '#123' => new Product(
        id: '#123',
        name: 't-shirt',
    ),
]);

foreach ($productCollection as $product) {
    echo $product->id; // will print '#123'
}
But what do I gain ?
First of all, you now have a typed collection that you can explicitly typehint anywhere. The collection itself holds the information about what it iterates over, whereas with `array` you had to describe its contents in phpdoc everywhere.
You also have a class, so you can add any methods that you deem useful in this context.
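For instance, the domain logic mentioned at the start of the article could live directly on the collection. The sketch below is only one possible shape: the `priceInCents` and `available` properties and the `filterAvailable()` method are assumptions made for the illustration.

```php
<?php

// Hypothetical Product with a price in cents, only for this illustration.
final class Product
{
    public function __construct(
        public readonly string $id,
        public readonly int $priceInCents,
        public readonly bool $available = true,
    ) {
    }
}

/** @implements IteratorAggregate<string, Product> */
final class ProductCollection implements IteratorAggregate
{
    /** @param array<string, Product> $products */
    public function __construct(private readonly array $products)
    {
    }

    /** @return ArrayIterator<string, Product> */
    public function getIterator(): ArrayIterator
    {
        return new ArrayIterator($this->products);
    }

    /** Returns a new collection and leaves the current one untouched. */
    public function filterAvailable(): self
    {
        return new self(array_filter(
            $this->products,
            static fn (Product $product): bool => $product->available,
        ));
    }

    public function calculateTotalPrice(): int
    {
        return array_sum(array_map(
            static fn (Product $product): int => $product->priceInCents,
            $this->products,
        ));
    }
}

$products = new ProductCollection([
    '#123' => new Product('#123', 1500),
    '#456' => new Product('#456', 500, available: false),
]);

echo $products->calculateTotalPrice(), PHP_EOL;                    // 2000
echo $products->filterAvailable()->calculateTotalPrice(), PHP_EOL; // 1500
```

Callers no longer need to know how the total is computed, and the logic is no longer duplicated in helper classes.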
At this point you do not get any memory improvement, because we are still using an array inside our class. Let's improve this.
/** @implements IteratorAggregate<string, Product> */
class ProductCollection implements IteratorAggregate
{
    /**
     * @param iterable<string, Product> $products
     */
    public function __construct(
        private readonly iterable $products
    ) {
    }

    /** @return Generator<string, Product> */
    public function getIterator(): Generator
    {
        yield from $this->products;
    }
}
⚠️ Beware that `$this->products` might be a non-rewindable generator, meaning that you can iterate over it only once.
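If you need to iterate several times, one possible workaround (not the only one) is to store a factory that creates a fresh iterable on every call, instead of storing the iterable itself. `LazyNameCollection` is a hypothetical class that holds strings to keep the sketch short.

```php
<?php

/**
 * Hypothetical variant: stores a factory instead of the iterable itself,
 * so every foreach starts from a fresh generator.
 *
 * @implements IteratorAggregate<string, string>
 */
final class LazyNameCollection implements IteratorAggregate
{
    /** @param Closure(): iterable<string, string> $factory */
    public function __construct(private readonly Closure $factory)
    {
    }

    /** @return Generator<string, string> */
    public function getIterator(): Generator
    {
        // Calling the factory here creates a new source for each iteration.
        yield from ($this->factory)();
    }
}

$names = new LazyNameCollection(static function (): Generator {
    yield '#123' => 't-shirt';
    yield '#456' => 'headband';
});

$firstPass = iterator_to_array($names);
$secondPass = iterator_to_array($names); // the factory runs again
```

The trade-off is that the underlying work (a query, a file read) is repeated on each pass.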
Now you can use it with either an object or an array. This object can be a generator (created using `yield`). Here is an example using Doctrine that iterates over every row without loading all of them into memory at once:
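use Doctrine\Bundle\DoctrineBundle\Repository\ServiceEntityRepository;
use Doctrine\Persistence\ManagerRegistry;

class ProductRepository extends ServiceEntityRepository
{
    public function __construct(ManagerRegistry $registry)
    {
        parent::__construct($registry, Product::class);
    }

    public function streamAll(): ProductCollection
    {
        $qb = $this->createQueryBuilder('product');

        // toIterable() hydrates rows one at a time instead of building a full array
        return new ProductCollection($qb->getQuery()->toIterable());
    }
}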
There is a lot more to say about `Iterator`s. Depending on the feedback on this post, I might go deeper in a follow-up article.
Conclusion
Using classes for what used to be provided by `array` allows you to:
- more easily extend your code
- keep business logic together
- typehint precisely what you inject in a method or class
- keep your static analysers happy !