Initializing Lists in C#

Posted by : on (Updated: )

Category : C#

Introduction

A common data structure in C#, lists, many times are preferred over arrays because of their dynamic resizing. When we create an array, we have to tell the compiler, the number of elements it will contain, either explicitly by creating a new array like this

int[] bar = new int[4];

or implicitly by populating it at its construction like this:

int[] foo = {1, 2};

on the other hand, a list, doesn’t have a default number of elements at construction. It can dynamically grow or shrink, according to our needs. We can create lists like this:

List<int> fooList = new List<int>(){1 ,2 };
List<int> barList = new List<int>();

but we can add or remove elements as we please and the size of the list will change according to our needs.

UPDATE: From .NET 8, we can now initialize a list, span or array like this:

List<int> foo = [1,2,3];
char[] foo2 = ['a','v','q'];

Array elements in C# occupy a contiguous block in memory, so we can have fast access to those items, but so do lists. How is this possible? How the elements in a list can a occupy a contiguous space in memory, when the compiler doesn’t know at creation the amount of memory space it will need, as the list can change its size dynamically due to the addition and removal of elements at runtime?

The List<T> Class

The List<T> Class is actually a wrapper over an array. When we create a list, the compiler creates an array that occupies a contiguous space in memory. The size of this array varies, depending on our List. If we specify during the list construction a size, that will be the size of the array. If we don’t, then an array with capacity of 0 will be created and that capacity will become 4 the moment we add our first element in the list.

Whenever we try to add a new element to the list, the size of the array is being checked. If we are at the limit and try to add one more, then a new array will be created that is double the size of the previous array. The elements of the first array will be copied to the second, then the new item will be added and the previous array, won’t be referenced anymore and will be eligible for garbage collection.

Best practices and performance considerations

Value types vs Reference types

If our List holds value type elements then the data is stored in a contiguous space in memory, but if our List’s elements are reference types then the list has a contiguous list of references. This happens because a list can hold many references to the same instance, so the class instances cannot be contiguous.

Comparison with the LinkedList<T> class

The LinkedList class is not backed by an array. A linked list is composed by nodes that besides the data, they hold information about the next node in the list(single linked) and the previous node in the list(doubly linked). In C# the LinkedList<T> class is a doubly linked list. Usually the List<T> class is more convenient to use. The LinkedList looks more like a chain and can be more performant in adding/removing elements before/after a particular node, but random accessing elements is a lot slower because it doesn’t have an indexer.

Also the LinkedList probably will take more memory because it needs to hold the references to the nodes that are before/after each node element.(Exceptions always exist and each particular case must be tested individually, for example a List can occupy more space in memory because it can be backed by an array that many of its elements are unused.)

LinkedList is more efficient usually when accessing sequential data (still exceptions apply and each case must be tested, for example check this number crunching article if you are interested in a low level analysis of how microprocessors implement caching of memory and the locality of reference)

There are other differences between the two in the performance/memory needs of each that depend not only in the operation we execute but also in the implementation of the LinkedList class in C# (for example the number of elements is cached so count is constant time in both) but this article is about the List<T> class, so i won’t go in detail here.

As a general advice, when in doubt use the List<T> class.

Comparison with the ArrayList Class

The List<T> class is always better. The ArrayList class stores object references, in contrast the List<T> gives us type safety and if our elements are value types, we don’t have to deal with boxing issues because the List class will create an array of that value type. If we need a List that will hold our own value types then these types should implement the IEquatable<T> interface to avoid boxing in equality comparisons. (If the IEquatable<T> interface is not implemented the Contains method will call the Object.Equals(Object) method that will box our value type)

Initializing the List with a size during construction

We can initialize the List with a size during construction, like this:

List<int> foo = new List<int>(420);

This will create a List with an initial size of 420. That means that the backing array of the List will have an initial size of 420. If we have knowledge during compile time about the maximum number of elements that our list can have, then initializing the list like this can have performance and memory space benefits, because the list won’t have to create a new array every time the backing array has been filled. For example if our List during runtime had to hold 430 elements, the above list would create an array of 840 elements the moment we tried to add the 421th element, then copy the 420 elements of the previous backing array to the new array, add the 421th element and occupying memory for 410 element more than we need. Worse than that, the previous backing array would now be eligible for collection by the GC.

Even when we have a List that holds reference types, that benefit is still important. Let’s suppose that we have a List that holds reference types in a 64bit machine. That means that each element would be 8 bytes in the backing array. If we don’t initialize our List with a number at the constructor, a backing array of 4 elements will be created. The moment we add the 5th element a new array with a size of 8 will be created and the previous array will be eligible for collection. That is 4*8 = 32 bytes. When we add the 9th element the same thing will happen and now 64 bytes have to be collected and so on.

That gets even worse when our List is getting larger. After our List is over 85.000 bytes then it is considered a large object by the garbage collector and is stored in a special place in the heap called The Large Object Heap. These objects get only collected during a gen 2 collection but even worse they don’t get compacted by the GC. Gen 2 collection doesn’t happen often, so the previous backing array will occupy memory for a long time and because the Large Object Heap doesn’t get compacted, every time a new backing array will be created, it won’t “fit” in any previous space, even if a gen 2 collection has taken place. That will leave our memory fragmented and if this keeps happening we will have an OutOfMemoryException.

By initializing our List we avoid problems in performance with the GC and have more efficient use of the machine’s memory.

Getting a Span<T> from a List<T>

In an array we have the AsSpan method to get a Span from an array. I won’t cover span here because this article is already getting long, but i want to address why such a method doesn’t exist in List<T>.

In reality there is such a method. It is called CollectionsMarshal.AsSpan<T>(List<T>) Method, but we have to be careful when we use it, because in contrast with an array, a list doesn’t always exist at the same space in memory during runtime as we saw previously. This method will show invalid data when we add or remove elements from a list, so it should only be used if we are sure that elements aren’t added or removed from our list during the use of our span.

Conclusion

I think that covers how lists are implemented in C# and why initializing them during construction with a size, if that is possible, is a good idea. If i have forgotten anything or some clarification is needed, don’t hesitate to use the comments section, or contact me directly via the contact form or email, also if you don’t want to miss any of the new articles, you can always subscribe to my newsletter or the RSS feed.


Follow me: