Improving .NET Performance: Allocating Small Objects on the Stack in .NET 9 and Beyond

Category : C#

Introduction

Until recently, the rule that value types could exist on either the stack or the heap while reference types always resided on the heap was a fundamental principle in C#. This was one of the core rules of object management: objects were created and allocated on the heap. Because of this, developers working on performance-critical scenarios had to be mindful of how and when objects were created.

The performance impact wasn’t limited to the Garbage Collector (GC) itself. Each time the GC ran, it had to scan more objects to determine which were eligible for collection and then spend additional time collecting them. Another crucial factor was the speed difference between accessing stack and heap memory. Objects on the heap can be accessed from anywhere, whereas the stack follows a strict Last In, First Out (LIFO) structure, allowing only pushing new items onto the stack or popping the top item off.

This rule, that objects live on the heap, was not exclusive to C#. It was enforced by the Common Language Runtime (CLR), which was designed so that objects were always placed on the heap. While some independent developers had proposed ways to allow the CLR to allocate objects on the stack under specific conditions as early as 2015, and had even built prototypes of the feature, it wasn’t until .NET 9 that the CLR officially introduced this capability.

The real challenge is that for an object to be allocated on the stack, its lifetime must not exceed that of the method that created it. This check needs to be performed at runtime when the Intermediate Language (IL) is just-in-time (JIT) compiled, which can introduce performance concerns. The intricacies of how tiered JIT compilation works and how profiling determines whether objects escape their method are beyond the scope of this discussion. However, what’s important is that this process is now happening, and as of .NET 9, small objects can be allocated on the stack, leading to significant performance benefits.

Microsoft has taken an extremely conservative approach to this feature. While early prototypes suggested new keywords to indicate that an object created within a method would not escape it, in .NET 9, objects are stored on the stack only when they are boxed value types.

Looking ahead to .NET 10, there is a preview feature that extends this functionality to arrays of value types, suggesting slow but steady progress. However, in this post, I will focus solely on the boxing of value types, as I prefer not to discuss preview features before they are officially released, given that many aspects may change, or the feature might not make it to the final version (e.g., semi-auto properties).

Microsoft Example

Let’s start with the example Microsoft gives for this feature in Object stack allocation for boxes:

static bool Compare(object? x, object? y)
{
    if ((x == null) || (y == null))
    {
        return x == y;
    }

    return x.Equals(y);
}

Up until .NET 8, if x and y were value types, such as the integers 3 and 4, they would be boxed into object, and those box instances would be allocated on the heap. In .NET 9, they are still boxed, but now they are stored on the stack instead. This is possible because, in these cases, the runtime can determine that the lifetimes of these object instances do not extend beyond the method that creates them.

Boxing of value types still occurs, but it no longer carries the performance penalties traditionally associated with it. Since accessing the stack is significantly faster than accessing the heap, and the Garbage Collector (GC) does not need to track or clean up these objects, performance is improved.
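To make it concrete, the boxing in this example happens at the call site: passing an int where an object is expected creates a box per argument. A minimal call sketch (my own, not part of the Microsoft sample):

// Each int argument is implicitly boxed into an object here.
// Per the behavior described above, .NET 8 places these boxes on the heap,
// while .NET 9 can keep them on the stack because they never outlive the call.
bool equal = Compare(3, 4); // false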

Testing Different Implementations

Let’s explore different scenarios to see when this new stack-based object allocation occurs, when it does not, and the limitations imposed by JIT compilation profiling concerning an object’s lifetime.

If we profile the previous code in .NET 8, we observe an allocation of 48 bytes, 24 bytes for each object, which is expected. However, profiling the same code in .NET 9 reveals no allocations, as the objects are now stored on the stack. But if we modify the implementation so that an object’s lifetime extends beyond the method’s scope, like this:

static bool Compare(object? a, object? b)
{
    if (a == null || b == null)
        return a == b;

    tempStatic = a;  // a escapes: it is stored in a field and outlives the method

    return a.Equals(b);
}

Then we will see that the runtime correctly recognizes this and allocates a on the heap: the method above allocates 24 bytes for the a box, while b is still placed on the stack.
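If you want to check these allocation numbers yourself without a full profiler, a rough sketch (my own) is to measure allocated bytes around a call to the original Compare method. Whether the boxes are actually elided depends on the calling code being compiled with optimizations (tier 1) and on Compare being inlined, so treat this only as an approximation:

// Warm up so the calling code gets promoted to optimized (tier-1) code.
for (int i = 0; i < 1_000_000; i++) Compare(3, 4);

long before = GC.GetAllocatedBytesForCurrentThread();
Compare(3, 4); // boxes 3 and 4
long after = GC.GetAllocatedBytesForCurrentThread();

// Roughly 48 bytes on .NET 8, 0 on .NET 9 (matching the numbers above),
// although the exact result depends on tiering and inlining.
Console.WriteLine($"Allocated: {after - before} bytes");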

Let’s see two different methods:

bool NextFiveDivisionBy9(object? testObject)
{
    switch (testObject)
    {
        case int i:
            for (int j = 1; j <= 5; j++)
            {
                if ((i + j) % 9 == 0) return true;
            }
            break;
        case uint u:
            for (int j = 1; j <= 5; j++)
            {
                if ((u + j) % 9 == 0) return true;
            }
            break;
    }
    return false;
}

and:

bool NextFiveDivisionBy9Plain(object testObject)
{

    var integer = (int)testObject;
      
    if ((integer + 1) % 9 == 0) return true;
    if ((integer + 2) % 9 == 0) return true;
    if ((integer + 3) % 9 == 0) return true;
    if ((integer + 4) % 9 == 0) return true;
    if ((integer + 5) % 9 == 0) return true;
         
    return false;
}

Both of these will allocate memory. Even though the boxed object passed as the parameter testObject does not escape the method, the runtime does not seem to detect this.

Let’s try with a simpler method:

bool Example(object? testObject)
{
    var result = (int)(testObject ?? 0) == 1;
    return result;
}

This won’t allocate in .NET 9, but the same method made static will:

static bool ExampleStatic(object? testObject)
{
    var result = (int)(testObject ?? 0) == 1;
    return result;
}

Why does this happen when the IL generated for both methods is identical? This is likely related to how the runtime determines which methods to inline and, consequently, whether an object escapes the method. If we modify our static method like this:

[MethodImpl(MethodImplOptions.AggressiveInlining)] // from System.Runtime.CompilerServices
static bool ExampleStatic(object? testObject)
{
    var result = (int)(testObject ?? 0) == 1;
    return result;
}

then in .NET 9, testObject will not be allocated on the heap.

Detection Inside Loops and Multiple Unboxing

The runtime also struggles to determine whether objects should be created on the stack when unboxing or loops are involved. For example, consider the following:

bool Example(object? testObject)
{
    bool result = false;

    for (int i = 0; i < 2; i++)
    {
        result = (int)(testObject ?? 0) == 1;
    }
      
    return result;
}

In this case, testObject will be allocated on the heap. The same allocation (24 bytes) happens if we write it like this:

bool Example(object? testObject)
{
    bool result = false;
      
    result = (int)(testObject ?? 0) == 1;
    result = (int)(testObject ?? 0) == 1;
      
    return result;
}

But when we have a method that doesn’t unbox, like this:

bool ExampleWithoutUnbox(object? o)
{
    if (o == null) return false;
    if (o == newObject) return false; // newObject: an object field defined elsewhere, used only for reference comparison

    return false;
}

Then, multiple uses of the boxed parameter will not allocate memory. However, this was also true in previous .NET versions since the underlying type is not utilized. Nevertheless, a for loop presents challenges for the JIT compiler in both .NET 8 and .NET 9, as it still results in allocations in both cases:

bool ExampleWithoutUnbox(object? o)
{
    if (o == null) return false;

    for (int i = 0; i < 2; i++)
    {
        if (o == newObject) return false;
    }
      
    return false;
}

whereas this version doesn’t allocate in .NET 9, but does allocate in .NET 8:

bool ExampleWithoutUnbox(object? o)
{
    if (o == null) return false;
      
    if (o == newObject) return false;
    if (o == newObject) return false;
      
    return false;
}

The issue with detection inside loops occurs not only within the implementation itself but also in the calling code. For example, if we call the previous method like this:

for (int i = 0; i < 2; i++)
{
   ExampleWithoutUnbox(1);
}

it will allocate 48 bytes, but if we call it like this:

ExampleWithoutUnbox(1);
ExampleWithoutUnbox(1);

there won’t be any allocations.
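If the call site genuinely needs a loop, one workaround (my own suggestion, not something from the article) is to box the value once before the loop, so at most a single box is created, whether or not the runtime manages to keep that one box on the stack:

object boxed = 1; // box once, outside the loop

for (int i = 0; i < 2; i++)
{
    ExampleWithoutUnbox(boxed); // no new box per iteration
}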

Example: Struct Implementing an Interface

In a previous post, How to avoid boxing structs that implement interfaces in C#, I discussed how we can prevent the allocation of a struct that implements an interface by using generics.

With this new feature, we can also avoid allocations for structs implementing interfaces without relying on generics. However, this approach is more limited, since the runtime cannot always accurately determine whether the boxed struct escapes the method.

Let’s consider the following interface and struct:

internal interface IIntNumber
{
   int GetInt();
}

internal struct IntNumber : IIntNumber
{
   private int _value;
   
   public IntNumber(int value) => _value = value;
   public int GetInt() => _value;
}

and the following method:

int GetNumberFromStruct(IIntNumber number)
{
    int n = 0;
    n = number.GetInt();

    return n;
}

When we call it for the struct field private readonly IntNumber _structWithNumber = new IntNumber(42); like this:

GetNumberFromStruct(_structWithNumber);
GetNumberFromStruct(_structWithNumber);

we won’t have any allocation on the heap, but calling it like this will:

for (int i = 0; i < 2; i++)
    GetNumberFromStruct(_structWithNumber);

The issue with loop detection by the runtime extends beyond the calling code and also affects the method’s implementation.

Implementing the method like this will prevent allocations when boxing structs:

int GetNumberFromStruct(IIntNumber number)
{
    int n = 0;

    n = number.GetInt();
    n = number.GetInt();

    return n;
}

But this implementation will allocate 24 bytes for the boxed IIntNumber object:

int GetNumberFromStruct(IIntNumber number)
{
    int n = 0;
      
    for (int i = 0; i < 2; i++)
        n = number.GetInt();

    return n;
}

Of course, any implementation that allows the object to outlive its method will also allocate:

int GetNumberFromStruct(IIntNumber number)
{
    int n = 0;
      
    n = number.GetInt();
    tempIntNumber = number; // number escapes: it is stored in a field
      
    return n;
}

Therefore, using generics remains a safer way to avoid allocations of structs that implement an interface, compared to relying on the runtime to store them on the stack when they are boxed.
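For reference, here is a minimal sketch of that generics-based approach (the method name is mine), using the IIntNumber interface above. Because the struct is passed as T with an interface constraint, no boxing happens at all, regardless of loops or escape analysis:

int GetNumberFromStructGeneric<T>(T number) where T : IIntNumber
{
    int n = 0;

    // Safe to loop: number stays a struct, it is never boxed.
    for (int i = 0; i < 2; i++)
        n = number.GetInt();

    return n;
}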

Arrays of Value Types Are Next

In .NET 10, which is still in early preview, this feature has been extended to include the allocation of arrays of value types on the stack. For example:

static void Sum()
{
    int[] numbers = {1, 2, 3};
    int sum = 0;

    for (int i = 0; i < numbers.Length; i++)
    {
        sum += numbers[i];
    }

    Console.WriteLine(sum);
}

The numbers array here will not be allocated on the heap, as the JIT compiler can safely deduce that its lifetime will not exceed the method’s, so the allocation will occur on the stack.

Benchmarking

Creating benchmarks for the above scenarios is challenging, as the issues with loops prevent us from wrapping our methods in measurement loops that produce meaningful results. In any case, a single execution happens quite quickly. Additionally, benchmarking across two different .NET versions can skew the results because of other optimizations, particularly in how for loops are executed: in .NET 9, for loops can execute in reverse, making them slightly faster. That said, here’s a simple benchmark of two method implementations, both run on .NET 9. The first allocates on the heap, while the second does not:

int GetNumberFromStructWithFor(IIntNumber number)
{
    int n = 0;

    for (int i = 0; i < 2; i++)
    {
        n = number.GetInt();
    }
      
    return n;
}

int GetNumberFromStruct(IIntNumber number)
{
    int n = 0;
      
    n = number.GetInt();
    n = number.GetInt();
      
    return n;
}
| Method                     | Mean      | Error     | StdDev    | Median    | Gen0   | Allocated |
|--------------------------- |----------:|----------:|----------:|----------:|-------:|----------:|
| GetNumberFromStruct        | 0.0085 ns | 0.0119 ns | 0.0112 ns | 0.0035 ns |      - |         - |
| GetNumberFromStructWithFor | 7.3438 ns | 0.1814 ns | 0.3362 ns | 7.4284 ns | 0.0076 |      24 B |
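These numbers come from a BenchmarkDotNet run with the memory diagnoser enabled. A minimal harness sketch (class and member names are mine, assuming the IIntNumber and IntNumber types above) might look like this:

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser]
public class BoxingBenchmarks
{
    private readonly IntNumber _structWithNumber = new IntNumber(42);

    [Benchmark]
    public int GetNumberFromStruct() => WithoutFor(_structWithNumber);

    [Benchmark]
    public int GetNumberFromStructWithFor() => WithFor(_structWithNumber);

    // Same bodies as the two methods above; the struct is boxed into
    // IIntNumber at each call site.
    private int WithoutFor(IIntNumber number)
    {
        int n = 0;
        n = number.GetInt();
        n = number.GetInt();
        return n;
    }

    private int WithFor(IIntNumber number)
    {
        int n = 0;
        for (int i = 0; i < 2; i++)
            n = number.GetInt();
        return n;
    }
}

// Entry point, e.g. in Program.cs:
// BenchmarkRunner.Run<BoxingBenchmarks>();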


This is a significant difference, not only in execution time, as accessing the stack is much faster than accessing the heap, but also because it reduces the workload for the Garbage Collector. With this new feature, the GC has a smaller graph of objects to scan and fewer objects to collect.

Conclusion

Any performance improvement in .NET is welcome. The benchmark above shows significant gains from allocating small objects on the stack rather than the heap. However, in most real-world applications, few use cases will see a noticeable difference in single-digit nanoseconds. Still, the most notable benefit is the reduced workload for the Garbage Collector.

What’s more important to take away from this post are two key points:

1) The old interview answer that value types can exist on both the stack and the heap, while reference types always exist on the heap, is no longer true. Under specific circumstances, both value and reference types can exist in either memory area. This challenges some long-standing assumptions often seen in boilerplate interviews.

2) More seriously, this post highlights the importance of profiling and benchmarking code in situations that reflect your specific use case. Boxing/unboxing, the stack and the heap, value types and reference types, and other such considerations are implementation details of the compiler, runtime, BCL, and .NET itself. Theory alone should not be used to reason about the performance of code; what was true a year ago may not be true today. With so many layers between the hardware and our C# code, layers that frequently change and interact in ways that are nearly impossible to predict, optimizations can alter performance at any time.

The only reliable way to test performance is through benchmarking, and only then can we reason about what is happening based on the results.

Thank you for reading, and if you have any questions or comments you can use the comments section, or contact me directly via the contact form or by email. Also, if you don’t want to miss any of the new blog posts, you can subscribe to my newsletter or the RSS feed.

