IEnumerable, It's Not a Collection.
IEnumerable<T>
has become one of the most popular ways not only to manipulate collections, but to pass them around as a read only structure. The power of LINQ and the safety of read-only immutable structures make it an excellent choice when modeling your domain.
But with great power, comes great responsibility, and IEnumerable<T>
has it's fair share of quirks to trip up unsuspecting developers.
It's an Enumerable, not a Collection
While we often use IEnumerable<T>
as a way to pass around collections, especially if we aren't picky about the underlying data structure. Maybe it's an T[]
, maybe it's a List<T>
. We don't know and we (allegedly) don't care.
But there are lots of objects that implement the IEnumerable<T>
interface and they aren't collections, and thus don't behave how we might expect them to. Often times developers will be unaware that these concrete types are in their codebases, assuming that whatever IEnumerable
object they are passing around represents an actual collection. Even if you've always specified a list or array, something as simple as using LINQ can open the door to some of these other types.
These types can get us in to trouble by assuming more than the IEnumerable<T>
interface promises.
Some Enumerables are lazy
Consider the following code:
public class Batter
{
public string Name { get; }
public bool IsAtBat { get; private set; }
public bool IsOnDeck { get; private set; }
public Batter(string name)
{
Name = name;
IsOnDeck = false;
}
public void MarkAsOnDeck()
{
IsOnDeck = true;
}
public void MarkAsAtBat()
{
IsOnDeck = false;
IsAtBat = true;
}
}
public class BattingOrder
{
public IEnumerable<Batter> BattingOrder { get; }
public BattingOrder(IEnumerable<string> battingOrderNames)
{
BattingOrder = CreateBatters(battingOrderNames);
BattingOrder.First().MarkAsAtBat();
BattingOrder.Skip(1).First().MarkAsOnDeck();
}
public IEnumerable<Batter> CreateBatters(IEnumerable<string> battingOrderNames)
=> battingOrderNames.Select(name => new Batter(name));
public Batter GetCurrentBatter()
=> BattingOrder.First(batter => batter.IsAtBat);
}
This code contains a bug, but not in any of the domain logic that's expressed.
The first Batter
in the batting order should be at bat. The second Batter
in the batting order should be on deck.
But when GetCurrentBatter()
is called, the function fails. It can't find a batter that is currently at bat.
How can that be? We created the IEnumerable<Batter>
instance, took the first one and marked it as at bat, and took the second one and marked it as on deck.
The key is right here
public IEnumerable<Batter> CreateBatters(IEnumerable<string> battingOrderNames)
=> battingOrderNames.Select(name => new Batter(name));
In this method we use the Select()
method to convert an enumerable of string
, to an enumerable of Batter
, and that's just it. We have an enumerable of batters which makes no guarantees about being a collection. In this specific case, the returned enumerable is lazy. Meaning every time we iterate through it (or call some specific LINQ methods on it) we're going to re-execute the mapping, which means we're constantly getting new references to Batter
objects.
When we call this method here
public BattingOrder(IEnumerable<string> battingOrderNames)
{
BattingOrder = CreateBatters(battingOrderNames);
BattingOrder.First().MarkAsAtBat();
BattingOrder.Skip(1).First().MarkAsOnDeck();
}
We're creating a lazy enumerable, then enumerating it twice. We're grabbing the first enumeration, then the second time, we're creating a new enumerable that skips the first enumeration, and then enumerating it to grab the second one, but remember each time we enumerate the BattingOrder
enumerable, we create new instances of Batter
objects. Meaning the "first" Batter
we skip over in the last line, isn't the same Batter
instance we marked as at bat on the previous line.
So finally when we call this method
public Batter GetCurrentBatter()
=> BattingOrder.First(batter => batter.IsAtBat);
We're enumerating the enumerable again, creating a whole new set of Batter
instances, none of which have been marked as at bat. This seemingly straight forward and readable code, contains a very hard to spot bug, that causes it to always fail.
What to do instead
What's the solution to avoiding these kinds of errors? If we know we're going to mutate an object (such as marking a Batter
as at bat), then we must necessarily have a constant reference to the object being mutated. To ensure that any enumerable we obtain will preserve these references, it's a good idea to copy the enumerable into a private collection, which we know will preserve instances. In our case, we need just a single method call to fix our error.
public BattingOrder(IEnumerable<string> battingOrderNames)
{
//Copy enumerable into a list
//to preserve references
BattingOrder = CreateBatters(battingOrderNames).ToList();
BattingOrder.First().MarkAsAtBat();
BattingOrder.Skip(1).First().MarkAsOnDeck();
}
With this change, even though the BattingOrder
variable is still types as an IEnumerable<Batter>
we know that the object itself is an instance of List<Batter>
which means we can be confident that calling .First()
will always return the same object instance, that is, the first item in the list.
If you get passed an IEnumerable<T>
as a parameter, it's almost always a good idea to call .ToList()
or .ToArray()
on it before assigning to variable, property or field. Not only does this ensure our reference isn't mutated by someone else, we also know that any mutations we might perform will persist.
Subscribe to my newsletter
Read articles from Sam Ferree directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by
Sam Ferree
Sam Ferree
Software Architect Specializing in the C# and ASP.NET Core But I've been lucky enough to dabble in Ruby, Python, Node.js, JAVA, C/C++, and more.