Apr 17, 2017 7:46:44 AM
.NET Exception Handling — System.OutOfMemoryException
A look into the System.OutOfMemoryException. See where the System.OutOfMemoryException resides within the .NET exception hierarchy.
Taking the next glorious step down the shining path of our .NET Exception Handling series, today we’ll be looking over the amazing System.OutOfMemoryException. As the name implies, the System.OutOfMemoryException typically occurs when the common language runtime (CLR) is unable to allocate enough memory that would be necessary to perform the current operation.
We’ll spend this article seeing exactly where the System.OutOfMemoryException resides within the .NET exception hierarchy, while also examining a trio of possible causes that could present a System.OutOfMemoryException in your own code. Let the adventure begin!
The Technical Rundown
- All .NET exceptions are derived classes of the System.Exception base class, or derived from another inherited class therein.
- System.SystemException is inherited from the System.Exception class.
- System.OutOfMemoryException is inherited from the System.SystemException class.
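The chain listed above can be checked with reflection. This is only a minimal sketch; the class and method names (HierarchyDemo, GetInheritanceChain) are made up for illustration and are not part of the article's sample project.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, named for illustration only.
public static class HierarchyDemo
{
    // Walks up the BaseType chain and returns the full type names,
    // most derived type first.
    public static List<string> GetInheritanceChain(Type type)
    {
        var chain = new List<string>();
        for (Type current = type; current != null; current = current.BaseType)
        {
            chain.Add(current.FullName);
        }
        return chain;
    }
}
```

Joining the result of HierarchyDemo.GetInheritanceChain(typeof(OutOfMemoryException)) with " -> " gives System.OutOfMemoryException -> System.SystemException -> System.Exception -> System.Object.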
When Should You Use It?
In spite of the name, the most likely cause of a System.OutOfMemoryException is not technically due to a lack of memory. Instead, a System.OutOfMemoryException can occur when attempting to increase the length of an instance of the StringBuilder class, beyond what is specified by its current MaxCapacity property.
To illustrate, here we have some simple code that generates a new StringBuilder instance called builder:
public static void StringBuilderExample()
{
    try
    {
        string firstName = "Bob";
        string lastName = "Smith";
        // Initialize with allocated length (MaxCapacity) equal to initial value length.
        StringBuilder builder = new StringBuilder(firstName.Length, firstName.Length);
        Logging.Log($"builder.MaxCapacity: {builder.MaxCapacity}");
        // Append initial value.
        builder.Append(firstName);
        // Attempt to insert additional value to builder already at MaxCapacity character count.
        builder.Insert(value: lastName,
                       index: firstName.Length - 1,
                       count: 1);
    }
    catch (System.OutOfMemoryException e)
    {
        Logging.Log(e, true);
    }
}
As indicated by the comments, we’re using a particular overload of the StringBuilder constructor, in this case the StringBuilder(Int32, Int32) overload, which defines the capacity and MaxCapacity property during initialization. In this case, both are set to 3, the length of our firstName string.
We then .Append that initial value to our builder, after which we attempt to .Insert our second value near the end of the existing string (at index 2). However, because we’ve already set the MaxCapacity value to 3, and we’ve appended 3 characters, we’ve used up all allocated memory for our StringBuilder instance. Thus, our .Insert attempt throws a System.OutOfMemoryException:
builder.MaxCapacity: 3
[EXPECTED] System.OutOfMemoryException: Insufficient memory to continue the execution of the program.
In this case, the issue is that we’ve told the CLR how much memory to allocate using the MaxCapacity property, which was assigned by using the StringBuilder(Int32, Int32) overload. The simplest solution is to use a different overload, one that doesn’t assign the MaxCapacity property. This will cause the default value to be set, which is Int32.MaxValue (i.e. roughly 2.15 billion).
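As a rough sketch of that fix, the same Append and Insert calls can be run against a builder created with the StringBuilder(Int32) overload, which leaves MaxCapacity at its default. The class and method names here (DefaultCapacityDemo, BuildName, DefaultMaxCapacity) are invented for this illustration.

```csharp
using System;
using System.Text;

// Hypothetical helper, named for illustration only.
public static class DefaultCapacityDemo
{
    // Same steps as the failing example, but the builder is created with the
    // StringBuilder(Int32) overload, so MaxCapacity keeps its default value.
    public static string BuildName(string firstName, string lastName)
    {
        StringBuilder builder = new StringBuilder(firstName.Length);
        builder.Append(firstName);
        // Insert one copy of lastName before the last character of firstName.
        builder.Insert(firstName.Length - 1, lastName, 1);
        return builder.ToString();
    }

    // Reports the MaxCapacity of a builder that only specified a starting capacity.
    public static int DefaultMaxCapacity()
    {
        return new StringBuilder(3).MaxCapacity;
    }
}
```

With these inputs, DefaultCapacityDemo.BuildName("Bob", "Smith") returns "BoSmithb" instead of throwing, and DefaultCapacityDemo.DefaultMaxCapacity() equals Int32.MaxValue.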
Another potential cause of a System.OutOfMemoryException
is, of course, actually running out of memory during execution. This could be due to repeatedly concatenating large strings, executing as a 32-bit process (which can only allocate a maximum of 2GB of memory), or attempting to retain massive data sets in memory during execution. We’ll use the latter issue in our example snippet below:
private static void LargeDataSetExample()
{
    Random random = new Random();
    List<Double> list = new List<Double>();
    int maximum = 200000000;
    int split = 10000000;
    try
    {
        for (int count = 1; count <= maximum; count++)
        {
            list.Add(random.NextDouble());
            if (count % split == 0)
            {
                Logging.Log($"Total item count: {count}.");
            }
        }
    }
    catch (System.OutOfMemoryException e)
    {
        Logging.Log(e, true);
    }
}
This code serves no real functional purpose, but instead just illustrates one possible way of manipulating a huge data set within memory, without using any form of chunking to reduce the allocated memory footprint of the application. In this case, we’re just looping some 200 million times and adding a random number to our list of Doubles every time. Every 10 million loops we also output our current total.
The result is that, eventually, the system cannot handle the amount of memory being used, so a System.OutOfMemoryException is thrown:
Total item count: 10000000.
Total item count: 20000000.
Total item count: 30000000.
Total item count: 40000000.
Total item count: 50000000.
Total item count: 60000000.
Total item count: 70000000.
Total item count: 80000000.
Total item count: 90000000.
Total item count: 100000000.
Total item count: 110000000.
Total item count: 120000000.
Total item count: 130000000.
[EXPECTED] System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
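For contrast, here is one way the same kind of loop could avoid holding every value in memory, assuming the goal is an aggregate (an average here) rather than the full list. This is a sketch, not the article's code; StreamingDemo, RandomDoubles and Average are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, named for illustration only.
public static class StreamingDemo
{
    // Lazily yields 'count' random doubles; nothing is retained between iterations.
    public static IEnumerable<double> RandomDoubles(int count, int seed)
    {
        Random random = new Random(seed);
        for (int i = 0; i < count; i++)
        {
            yield return random.NextDouble();
        }
    }

    // Consumes the sequence one value at a time and keeps only a running total.
    public static double Average(IEnumerable<double> values)
    {
        double sum = 0;
        long n = 0;
        foreach (double value in values)
        {
            sum += value;
            n++;
        }
        return n == 0 ? 0 : sum / n;
    }
}
```

A call such as StreamingDemo.Average(StreamingDemo.RandomDoubles(200000000, 1)) still performs 200 million iterations, but the memory footprint stays flat because only the running sum and count are kept.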
The final snippet we’ll look at today is taken from the official documentation. However, this code isn’t producing a System.OutOfMemoryException due to a memory issue, as with our other examples. Instead, this snippet illustrates how System.OutOfMemoryExceptions should be properly handled:
public static void ThrowExample()
{
    try
    {
        // Outer block to handle any unexpected exceptions.
        try
        {
            string s = "This";
            s = s.Insert(2, "is ");

            // Throw an OutOfMemoryException exception.
            throw new System.OutOfMemoryException();
        }
        catch (ArgumentException)
        {
            Logging.Log("ArgumentException in String.Insert");
        }

        // Execute program logic.
    }
    catch (System.OutOfMemoryException e)
    {
        Logging.Log("Terminating application unexpectedly...");
        Environment.FailFast(String.Format("Out of Memory: {0}",
                                           e.Message));
    }
}
Since a System.OutOfMemoryException indicates a catastrophic error within the system, it’s recommended that any code catching a System.OutOfMemoryException call the Environment.FailFast method, which terminates the process and writes a message to the Windows Application event log. Sure enough, executing the snippet above generates a log entry in the Windows Application event log, which we can see using the Event Viewer application:
Application: Airbrake.OutOfMemoryException.exe
Framework Version: v4.0.30319
Description: The application requested process termination through System.Environment.FailFast(string message).
Message: Out of Memory: Insufficient memory to continue the execution of the program.
Stack:
at System.Environment.FailFast(System.String)
at Airbrake.OutOfMemoryException.Program.ThrowExample()
at Airbrake.OutOfMemoryException.Program.Main(System.String[])
You may want to read this: “Out Of Memory” Does Not Refer to Physical Memory, by Eric Lippert.
In short, and very simplified, “out of memory” does not really mean that the amount of available memory is too small. The most common reason is that within the current address space, there is no contiguous portion of memory that is large enough to serve the wanted allocation. If you have 100 blocks, each 4 MB large, that is not going to help you when you need one 5 MB block.
Key Points:
- The data storage that we call “process memory” is in my opinion best visualized as a massive file on disk.
- RAM can be seen as merely a performance optimization.
- The total amount of virtual memory your program consumes is really not hugely relevant to its performance.
- “Running out of RAM” seldom results in an “out of memory” error. Instead of an error, it results in bad performance because the full cost of the fact that storage is actually on disk suddenly becomes relevant.
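The “100 blocks of 4 MB” example can be written down as a toy model. This is only a simplified sketch of the idea, not how the CLR or the operating system tracks address space; FragmentationDemo and CanAllocate are hypothetical names.

```csharp
using System;
using System.Linq;

// Hypothetical toy model, named for illustration only.
public static class FragmentationDemo
{
    // Each entry is the size in MB of one free, contiguous region.
    public static bool CanAllocate(int[] freeRegionsMb, int requestMb)
    {
        // An allocation needs a single region that is large enough;
        // the sum of all free regions is irrelevant.
        return freeRegionsMb.Any(region => region >= requestMb);
    }
}
```

With int[] regions = Enumerable.Repeat(4, 100).ToArray(), the regions add up to 400 MB of free space, yet FragmentationDemo.CanAllocate(regions, 5) is false while FragmentationDemo.CanAllocate(regions, 4) is true.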
Each character requires 2 bytes (as a char in .NET is a UTF-16 code unit). So by the time you’ve reached 800 million characters, that’s 1.6GB of contiguous memory required [1]. Now when the StringBuilder needs to resize itself, it has to create another array of the new size (which I believe tries to double the capacity), which means trying to allocate a 3.2GB array.
I believe that the CLR (even on 64-bit systems) can’t allocate a single object of more than 2GB in size. (That certainly used to be the case.) My guess is that your StringBuilder is trying to double in size, and blowing that limit. You may be able to get a little higher by constructing the StringBuilder with a specific capacity; a capacity of around a billion may be feasible.
In the normal course of things this isn’t a problem, of course; even strings requiring hundreds of megs are rare.
[1] I believe the implementation of StringBuilder actually changed in .NET 4 to use fragments in some situations, but I don’t know the details. So it may not always need contiguous memory while still in builder form, but it would if you ever called ToString.
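The arithmetic in this answer is easy to reproduce. The sketch below assumes the 2GB figure means 2^31 bytes, which is my reading rather than something the answer states; StringBuilderMath and its methods are hypothetical names.

```csharp
// Hypothetical helper, named for illustration only.
public static class StringBuilderMath
{
    // Bytes needed to hold 'charCount' UTF-16 code units in one contiguous buffer.
    public static long BytesFor(long charCount)
    {
        return charCount * sizeof(char); // sizeof(char) is 2 in .NET
    }

    // True when a buffer would exceed 2 GB, taken here as 2^31 bytes (an assumption).
    public static bool ExceedsTwoGb(long bytes)
    {
        return bytes > 2L * 1024 * 1024 * 1024;
    }
}
```

BytesFor(800000000) is 1,600,000,000 bytes, which stays under the limit, while the doubled buffer, BytesFor(1600000000), is 3,200,000,000 bytes and does not.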
Check that you are building a 64-bit process and not a 32-bit one, which is the default compilation mode of Visual Studio. To do this, right-click your project, then go to Properties -> Build -> Platform target: x64. Like any 32-bit process, Visual Studio applications compiled as 32-bit have a virtual memory limit of 2GB.
64-bit processes do not have this limitation, as they use 64-bit pointers, so their theoretical maximum address space (the size of their virtual memory) is 16 exabytes (2^64 bytes). In reality, Windows x64 limits the virtual memory of processes to 8TB. The solution to the memory limit problem is then to compile in 64-bit.
However, the size of a single object in .NET is still limited to 2GB by default. You can create several arrays whose combined size is greater than 2GB, but by default you cannot create a single array bigger than 2GB. Fortunately, if you still want to create arrays bigger than 2GB, you can do so by adding the following to your app.config file:
<configuration>
  <runtime>
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
</configuration>
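To confirm which mode a process actually ends up running in after changing the platform target, the runtime can be asked directly. This is a small sketch; BitnessDemo and Describe are hypothetical names, while Environment.Is64BitProcess, Environment.Is64BitOperatingSystem and IntPtr.Size are standard members.

```csharp
using System;

// Hypothetical helper, named for illustration only.
public static class BitnessDemo
{
    // Summarizes the bitness of the current process and operating system.
    public static string Describe()
    {
        // IntPtr.Size is 4 in a 32-bit process and 8 in a 64-bit one.
        return string.Format(
            "64-bit process: {0}, 64-bit OS: {1}, pointer size: {2} bytes",
            Environment.Is64BitProcess,
            Environment.Is64BitOperatingSystem,
            IntPtr.Size);
    }
}
```

Printing BitnessDemo.Describe() at startup is a cheap way to catch a build that is still 32-bit on a 64-bit machine.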
Do it in chunks:
using System.IO;

const int chunkSize = 2 * 1024; // 2KB
var inputFiles = new[] { "file1.dat", "file2.dat", "file3.dat" };
using (var output = File.Create("output.dat"))
{
    foreach (var file in inputFiles)
    {
        using (var input = File.OpenRead(file))
        {
            // Only chunkSize bytes are held in memory at any time.
            var buffer = new byte[chunkSize];
            int bytesRead;
            while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, bytesRead);
            }
        }
    }
}
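The same loop can be pulled into a reusable method that works on any Stream, which also makes it easy to try out with in-memory streams. This is a sketch under that assumption; ChunkedCopy and Copy are hypothetical names.

```csharp
using System.IO;

// Hypothetical helper, named for illustration only.
public static class ChunkedCopy
{
    // Copies 'input' to 'output' through a fixed-size buffer
    // and returns the number of bytes copied.
    public static long Copy(Stream input, Stream output, int chunkSize)
    {
        byte[] buffer = new byte[chunkSize]; // allocated once, reused for every read
        long total = 0;
        int bytesRead;
        while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            output.Write(buffer, 0, bytesRead);
            total += bytesRead;
        }
        return total;
    }
}
```

The framework's own Stream.CopyTo(Stream, Int32) does much the same job; writing the loop by hand is mainly useful when each chunk needs extra processing on the way through.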
First, I would like to mention that what we are discussing here is not real permutations, but so-called n-tuples, or permutations with repetition (see Wikipedia).
Second, regarding the System.OutOfMemoryException when generating permutations, I think we all agree that the result should not be stored in a list, but provided as an enumerable, which allows applying filtering and consuming (eventually storing) only the ones of interest.
In that regard the LINQ solution provided by @juharr performs very well. So my goals are to minimize the intermediary memory allocations, including string concatenations, and also to end up with a more general and faster solution.
In order to do that, I need to make a hard design decision. The signature of the general function I’m talking about will look like this:
public static IEnumerable<T[]> RepeatingPermutations<T>(this T[] set, int N)
and the question is what array should be yielded. If we follow the recommendations, each result should be a separate array instance. However, since I want to minimize allocations, I’ve decided to break that rule and yield one and the same array instance, moving the responsibility for not modifying it, and for cloning it if necessary, to the caller. For instance, this allows the caller to perform no-cost filtering, or to implement the OP function on top of it like this:
public static IEnumerable<string> RepeatingPermutations(this string set, int N)
{
    return set.ToCharArray().RepeatingPermutations(N).Select(p => new string(p));
}
A few words about the algorithm. Instead of looking at the problem recursively, as some other answers do, I want to effectively implement the equivalent of something like this:
from e1 in set
from e2 in set
...
from eN in set
select new [] { e1, e2, .., eN }
Interestingly, I recently answered a combinations related question and realized that the algorithms are pretty much the same.
With all that being said, here is the function:
public static IEnumerable<T[]> RepeatingPermutations<T>(this T[] set, int N)
{
    var result = new T[N];
    var indices = new int[N];
    for (int pos = 0, index = 0; ;)
    {
        for (; pos < N; pos++, index = 0)
        {
            indices[pos] = index;
            result[pos] = set[index];
        }
        yield return result;
        do
        {
            if (pos == 0) yield break;
            index = indices[--pos] + 1;
        }
        while (index >= set.Length);
    }
}
I ran some tests by simply calling the different functions with N=2,3,..6 and simply iterating and counting. Here are the results on my machine:
A : N=2 Count= 676 Time=00:00:00.0000467 Memory= 29K
B1: N=2 Count= 676 Time=00:00:00.0000263 Memory= 16K
B2: N=2 Count= 676 Time=00:00:00.0000189 Memory= 8K
A : N=3 Count= 17,576 Time=00:00:00.0010107 Memory= 657K
B1: N=3 Count= 17,576 Time=00:00:00.0003673 Memory= 344K
B2: N=3 Count= 17,576 Time=00:00:00.0001415 Memory= 8K
A : N=4 Count= 456,976 Time=00:00:00.0184445 Memory= 2,472K
B1: N=4 Count= 456,976 Time=00:00:00.0096189 Memory= 2,520K
B2: N=4 Count= 456,976 Time=00:00:00.0033624 Memory= 8K
A : N=5 Count= 11,881,376 Time=00:00:00.4281349 Memory= 397K
B1: N=5 Count= 11,881,376 Time=00:00:00.2482835 Memory= 4,042K
B2: N=5 Count= 11,881,376 Time=00:00:00.0887759 Memory= 8K
A : N=6 Count= 308,915,776 Time=00:00:11.2697326 Memory= 1,688K
B1: N=6 Count= 308,915,776 Time=00:00:06.5638404 Memory= 1,024K
B2: N=6 Count= 308,915,776 Time=00:00:02.2674431 Memory= 8K
where
A — LINQ function from @juharr
B1 — my function with string
B2 — my function with char[]
As we can see, memory-wise both string functions are comparable. Performance-wise the LINQ function is only ~2 times slower, which is a pretty good result.
As expected in such a scenario, the non-allocating function significantly outperforms them both.
UPDATE: As requested in the comments, here is the sample usage of the above functions (note that they are extension methods and must be placed in a static class of your choice):
var charSet = Enumerable.Range('A', 'Z' - 'A' + 1).Select(c => (char)c).ToArray();
var charPermutations = charSet.RepeatingPermutations(3);
var stringSet = new string(charSet);
var stringPermutations = stringSet.RepeatingPermutations(3);
However, remember the design choice I’ve made: if you expand charPermutations inside the debugger, you’ll see one and the same value repeated (the last permutation). Consuming the whole result of the above call for char[] should look like this:
var charPermutationList = charSet.RepeatingPermutations(3)
    .Select(p => (char[])p.Clone()).ToList();
Actually a good addition to the two methods presented would be the following extension method:
public static IEnumerable<T[]> Clone<T>(this IEnumerable<T[]> source)
{
    return source.Select(item => (T[])item.Clone());
}
so the consuming call would be simply:
var charPermutationList = charSet.RepeatingPermutations(3).Clone().ToList();
When the .NET Framework was first released, many developers believed the introduction of the garbage collector meant never having to worry about memory management ever again. In fact, while the garbage collector is efficient in managing memory in a managed application, it’s still possible for an application’s design to cause memory problems.
One of the more common issues we see regarding memory involves System.OutOfMemoryExceptions. After years of helping developers troubleshoot OutOfMemoryExceptions, we’ve accumulated a short list of the more common causes of these exceptions. Before I go over that list, it’s important to first understand the cause of an OutOfMemoryException from a 30,000 foot view.
What Is an OutOfMemoryException?
A 32-bit operating system can address 4GB of virtual address space, regardless of the amount of physical memory that is installed in the box. Out of that, 2GB is reserved for the operating system (Kernel-mode memory) and 2GB is allocated to user-mode processes. The 2GB allocated for Kernel-mode memory is shared among all processes, but each process gets its own 2GB of user-mode address space. (This all assumes that you are not running with the /3gb switch enabled.)
When an application needs to use memory, it reserves a chunk of the virtual address space and then commits memory from that chunk. This is exactly what the .NET Framework’s garbage collector (GC) does when it needs memory to grow the managed heaps. When the GC needs a new segment for the small object heap (where objects smaller than 85K reside), it makes an allocation of 64MB. When it needs a new segment for the large object heap, it makes an allocation of 32MB. These large allocations must be satisfied from contiguous blocks of the 2GB of address space that the process has to work with. If the operating system is unable to satisfy the GC’s request for a contiguous block of memory, a System.OutOfMemoryException (OOM) occurs.
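The split between the small and large object heaps can be observed from code, since the GC reports large object heap allocations as belonging to its oldest generation. The sketch below assumes default GC settings (in particular the default 85,000-byte threshold); LargeObjectDemo and GenerationOfNewArray are hypothetical names.

```csharp
using System;

// Hypothetical helper, named for illustration only.
public static class LargeObjectDemo
{
    // Returns the GC generation reported for a freshly allocated byte array.
    public static int GenerationOfNewArray(int sizeInBytes)
    {
        byte[] data = new byte[sizeInBytes];
        return GC.GetGeneration(data);
    }
}
```

Under those assumptions I would expect GenerationOfNewArray(1000) to report generation 0 and GenerationOfNewArray(200000) to report GC.MaxGeneration, though the exact behavior depends on the runtime and its configuration.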
There are two reasons why you might see an OOM condition.
1. Your process is using a lot of memory (typically over 800MB).
2. The virtual address space is fragmented, reducing the likelihood that a large, contiguous allocation will succeed.
It’s also possible to see an OOM condition due to a combination of 1 and 2.
Let’s examine some of the common causes for each of these two reasons.
Common Causes of High Memory
When your worker process approaches 800MB in private bytes, your chances of seeing an OOM condition begin to increase simply because the chances of finding a large, contiguous piece of memory within the 2GB address space begin to decrease significantly. Therefore, you want to avoid these high memory conditions.
Let’s go over some of the more common causes of high memory that we see in developer support at Microsoft.
Large DataTables
DataTables are common in most ASP.NET applications. DataTables are made up of DataRows, DataColumns, and all of the data contained within each cell. Large DataTables can cause high memory due to the large number of objects that they create.
The most common cause of large DataTables is unfiltered data from a back-end data source. For example, if your site queries a database table containing hundreds of thousands of records and your design makes it possible to return all of those records, you’ll end up with a huge amount of memory consumed by the result set. The problem can be greatly exacerbated in a multi-user environment such as an ASP.NET application.
The easiest way to alleviate problems like this is to implement filtering so that the number of records you return is limited. If you are using a DataTable to populate a user-interface element such as a GridView control, use paging so that only a few records are returned at a time.
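The paging idea can be sketched independently of any data-access technology. In a real application the limiting should happen in the database query itself; the in-memory version below only illustrates the shape of the call. PagingDemo and GetPage are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, named for illustration only.
public static class PagingDemo
{
    // Returns one page of a sequence; pageIndex is zero-based.
    public static List<T> GetPage<T>(IEnumerable<T> source, int pageIndex, int pageSize)
    {
        if (pageIndex < 0) throw new ArgumentOutOfRangeException("pageIndex");
        if (pageSize <= 0) throw new ArgumentOutOfRangeException("pageSize");
        // Skip the earlier pages and materialize only the requested one.
        return source.Skip(pageIndex * pageSize).Take(pageSize).ToList();
    }
}
```

For example, PagingDemo.GetPage(Enumerable.Range(1, 100), 2, 10) returns the ten values 21 through 30, so only one page of rows is ever held for display.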
Storing Large Amounts of Data in Session or Application State
One of the primary considerations during application design is performance. Developers can come up with some ingenious ways to improve application performance, but sometimes at the expense of memory. For example, we’ve seen customers who stored entire database tables in Application state in order to avoid having to query SQL Server for the data! That might seem like a good idea at first glance, but the end result is an application that uses an extraordinary amount of memory.
If you need to store a lot of state data, consider whether using ASP.NET’s cache might be a better choice. Cache has the benefit of being scavenged when memory pressure increases so that you don’t end up in trouble as easily.
Running in Debug Mode
When you’re developing and debugging an application, you will typically run with the debug attribute in the web.config file set to true and your DLLs compiled in debug mode. However, before you deploy your application to test or to production, you should compile your components in release mode and set the debug attribute to false.
ASP.NET works differently on many levels when running in debug mode. In fact, when you are running in debug mode, the GC will allow your objects to remain alive longer (until the end of the scope) so you will always see higher memory usage when running in debug mode.
Another often unrealized side-effect of running in debug mode is that client scripts served via the webresource.axd and scriptresource.axd handlers will not be cached. That means that each client request will have to download any scripts (such as ASP.NET AJAX scripts) instead of taking advantage of client-side caching. This can lead to a substantial performance hit.
Running in debug mode can also cause problems with fragmentation. I’ll go into more detail on that later in this post. I’ll also show you how you can tell if an ASP.NET assembly was compiled with debug enabled.
Throwing a Lot of Exceptions
Exceptions are expensive when it comes to memory. When an exception is thrown, not only does the GC allocate memory for the exception itself, the message of the exception (a string), and the stack trace, but also memory needed to store any inner exceptions and the corresponding objects associated with that exception. If your application is throwing a lot of exceptions, you can end up with a high memory situation quite easily.
The easiest way to determine how many exceptions your application is throwing is to monitor the # of Exceps Thrown / sec counter in the .NET CLR Exceptions Performance Monitor object. If you are seeing a lot of exceptions being thrown, you need to find out what those exceptions are and stop them from occurring.
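One common way to stop exceptions from occurring is to replace exception-driven control flow with a method that reports failure through its return value. The sketch below contrasts int.Parse with int.TryParse; it is an illustration of the general idea, not something the original post prescribes, and ParseDemo and its methods are hypothetical names.

```csharp
using System;

// Hypothetical helper, named for illustration only.
public static class ParseDemo
{
    // Exception-based version: every bad input allocates and throws an exception.
    public static int ParseOrDefaultWithCatch(string text, int fallback)
    {
        try { return int.Parse(text); }
        catch (ArgumentNullException) { return fallback; }
        catch (FormatException) { return fallback; }
        catch (OverflowException) { return fallback; }
    }

    // Exception-free version: TryParse reports failure through its return value.
    public static int ParseOrDefault(string text, int fallback)
    {
        int value;
        return int.TryParse(text, out value) ? value : fallback;
    }
}
```

Both methods return the same results, but on a hot path that sees many malformed inputs only the first one shows up in the # of Exceps Thrown / sec counter.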
Regular Expression Matching of Very Large Strings
Regular expressions (often referred to as regex) represent a powerful way to parse and manipulate a string by matching a particular pattern within that string. However, if your string is very large (megabytes in size) and your regex has a large number of matches, you can end up in a high memory situation.
The RegExpInterpreter class uses an Int32 array to keep track of any matches for a regex and the positions of those matches. When the RegExpInterpreter needs to grow the Int32 array, it does so by doubling its size. If your use of regex creates a very large number of matches, you’ll likely see a substantial amount of memory used by these Int32 arrays.
What do I mean by “large number of matches”? Suppose you are running a regex against the HTML from a page that is several megabytes in size. (You might think that this isn’t a feasible scenario, but we have seen a customer do this with HTML code that was over 5MB!) Suppose also that the regex you are using against this HTML is as follows.
<body(.|\n)*</body>
This regex does the following:
- “<body” matches the literal characters “<body”.
- The parentheses tell the regex engine to match the regex within them and store the match as a back-reference.
- The dot (.) will match any single character that is not a line break.
- The “\n” will match any character that is a line break.
- The “*” repeats the regex in parentheses between zero and an unlimited number of times. It also indicates a greedy match, meaning that it will match as many times as possible within the string.
- “</body>” matches the literal characters “</body>”.
In other words, if you use this regex against the HTML code from a page, it will match the entire body of the page. It will also store that body as a back-reference. The result is a very large Int32 array.
Incidentally, this problem isn’t specific to our implementation of regex. This same type of problem will be encountered with any regex engine that is NFA-based. The solution to this problem is to rethink the architecture so as to avoid such large strings and large matches.
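To see the capture bookkeeping in a small case, the pattern can be run next to a variant that drops the group and uses RegexOptions.Singleline so that the dot also matches line breaks. This is a sketch of one possible alternative, not a measured fix and not the post's recommendation, which is to avoid such large inputs altogether. BodyRegexDemo and its methods are hypothetical names.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class BodyRegexDemo
{
    // Pattern from the text: the group records a capture for each character it consumes.
    public static Match MatchWithCapturingGroup(string html)
    {
        return Regex.Match(html, @"<body(.|\n)*</body>");
    }

    // Alternative sketch: Singleline lets '.' match line breaks, so no group is needed.
    public static Match MatchWithoutGroup(string html)
    {
        return Regex.Match(html, @"<body.*</body>", RegexOptions.Singleline);
    }
}
```

Both methods match the same text, but only the first one exposes a Groups[1].Captures collection that grows with the size of the body.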
Common Causes of Fragmentation
Fragmentation is problematic because it can cause allocations of contiguous memory to fail. Assume that you have only 100MB of free address space for a process (you’re almost certain to have much more than that in real life) and one 4KB DLL loaded into the middle of that address space as shown in Figure 1. In this scenario, an allocation that requires 64MB of contiguous free space will fail with an OOM exception.
Figure 1 – Fragmented Address Space
The following are common causes of fragmentation.
Running in Debug Mode
One of the features in ASP.NET that is designed to avoid fragmentation is a feature called batch compilation. When batch compilation is enabled, ASP.NET will dynamically compile each folder of your application into a single DLL when the application is JITted. If batch compilation is not enabled, each page and user control is compiled into a separate DLL that is then loaded into the address space for the process. Each of these DLLs is very small, but because they are loaded into a non-specific address in memory, they tend to get peppered all over the address space. The result is a radical decrease in the amount of contiguous free memory, and that leads to a much greater probability of running into an OOM condition.
When you deploy your application, you need to make sure that you set the debug attribute in the web.config file to false as follows.
<compilation debug="false" />
If you’d like to ensure that debug is disabled on your production server regardless of the setting in the web.config file, you can use the <deployment> element introduced in ASP.NET 2.0. This element should be set in the machine.config file as follows.
<configuration>
  <system.web>
    <deployment retail="true" />
  </system.web>
</configuration>
Adding this setting to your machine.config file will override the debug attribute in any web.config file on the server.
When debugging is enabled, ASP.NET will add a Debuggable attribute to the assembly. You can use .NET Reflector or ildasm.exe to examine an ASP.NET assembly and determine if it was compiled with the Debuggable attribute. If it was, debugging is enabled for the application.
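The same check can be done from code with reflection instead of a disassembler. Note one assumption: because optimized builds can also carry a DebuggableAttribute, this sketch looks at the attribute's IsJITOptimizerDisabled flag rather than its mere presence, which is my interpretation of “debug enabled” and not something the post specifies. DebugBuildCheck and IsDebugBuild are hypothetical names.

```csharp
using System;
using System.Diagnostics;
using System.Reflection;

// Hypothetical helper, named for illustration only.
public static class DebugBuildCheck
{
    // Returns true when the assembly carries a DebuggableAttribute
    // whose flags disable JIT optimizations, as a typical debug build does.
    public static bool IsDebugBuild(Assembly assembly)
    {
        object[] attributes = assembly.GetCustomAttributes(typeof(DebuggableAttribute), false);
        foreach (DebuggableAttribute attribute in attributes)
        {
            if (attribute.IsJITOptimizerDisabled) return true;
        }
        return false;
    }
}
```

Calling DebugBuildCheck.IsDebugBuild(Assembly.LoadFrom(path)) on a DLL from the Temporary ASP.NET Files folder would be one way to script the inspection described here, with path standing in for the actual file location.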
Figure 2 shows two ASP.NET assemblies from the Temporary ASP.NET Files folder opened in .NET Reflector. The top assembly is selected and you can see that the Debuggable attribute is highlighted in red. (In order to see the manifest information in the right pane, right-click the assembly and select Disassemble from the menu.) The application running this assembly is running in debug mode.
Figure 2 — .NET Reflector showing an assembly compiled with debug enabled.
Figure 3 shows .NET Reflector with the second assembly selected. Notice that this assembly doesn’t have a Debuggable attribute. Therefore, the application running this assembly is not running in debug mode.
Figure 3 — .NET Reflector showing an assembly compiled without debug enabled.
Generating Dynamic Assemblies
Another common cause of fragmentation is the creation of dynamic assemblies. Dynamic assemblies fragment the address space of the process for the same reason that running in debug mode does.
Instead of going into the details here on how this happens, I’ll point you to my colleague Tom Christian’s blog post on dynamic assemblies. Tom goes into detail on what can create dynamic assemblies and how to work around those issues.
Resources
The following resources are helpful when tracking down memory problems in your application.
Gathering Information for Troubleshooting
Tom Christian’s blog post on gathering information for troubleshooting an OOM condition will help you if you need to open a support incident with us. Read more from Tom in this post.
Post-mortem Debugging of Memory Issues
Tess Ferrandez is famous for her excellent blog on debugging ASP.NET applications. She’s accumulated quite a collection of excellent posts on memory issues that includes everything from common memory problems to case studies that include debugging walkthroughs with Windbg. You can find Tess’s 21 most popular blog posts in this post.
Using DebugDiag to Troubleshoot Managed Memory
Tess has also recently published a blog post that includes a DebugDiag script that she wrote for the purpose of troubleshooting managed memory problems. The great thing about using DebugDiag with this script is that you can simply point it to a dump file of your worker process and it will automatically tell you a wealth of information that can help you track down memory usage.
You can find out how to use Tess’s script and download a copy of it here.
Understanding GC
If you ever wanted to know how the .NET garbage collector works, Tess can help! She wrote a great blog post that includes links to other great GC resources, and you can read it here.
“I Am a Happy Janitor!”
Maoni, a developer on the Common Language Runtime team, wrote a blog post that explains how the garbage collector works using the colorful analogy of a janitor. Read Maoni’s enlightening post here.
Using GC Efficiently
Maoni also wrote an excellent series on using GC efficiently in order to prevent memory issues. You can read the series here.
Conclusion
I hope that this information will help you to identify memory problems in your ASP.NET application that can lead to OutOfMemoryExceptions. However, if you have exhausted these ideas and are still plagued with memory problems, contact us and open a support ticket. We’ll be happy to help you troubleshoot!
You’ve just created a Console app in the latest Visual Studio, and written some C# code that allocates a non-negligible quantity of memory, say 6 GB. The machine you’re developing on has a decent amount of RAM (16 GB) and is running 64-bit Windows 10.
You hit F5, but are surprised to find the debugger breaking into your code almost immediately and reporting a System.OutOfMemoryException.
What’s going on here? You’re not running some other memory-consuming app. 6 GB surely should have been within reach for your code. The question that this post sets out to answer thus becomes simply: “Why do I get a System.OutOfMemoryException when running my recently created C# app on an idle, 64-bit Windows machine with lots of RAM?”
TL;DR (small scroll bar => therefore I deduce a lot of text => I’ve got no time for that, and need the answer now): The default build settings for Visual Studio limit your app’s virtual address space to 4 GB. Go into your project’s Properties, go to Build, and choose Platform target as x64. Build your solution again and you’re done.
Not so fast! Tell me more about what goes on under the hood: Buckle up, since we’re going for a ride. First we’ll look at a simple example of code that consumes a lot of memory fast, then uncover interesting facts about our problem, hit a “Wait, what?” moment, learn the fundamentals of virtual memory, find the root cause of our problem, then finish with a series of Q&A.
The Sample Code
Let’s replicate the issue you’ve encountered first: the out-of-memory thing. We’ll pick a simple method of allocating lots of memory: creating several large int arrays. Let’s make each array contain 10 million int values. As for how many of these arrays there should be: our target for now is to replicate the initial scenario that started this blog post, that is, consuming 6 GB of memory, so we should choose the number of arrays accordingly.
What we need to know is how much memory an int takes. As it turns out, an int always takes 4 bytes. Thus, an array of 10 million int elements takes 40 million bytes of memory. This is the same on either a 32-bit platform or a 64-bit one. If we divide 6 GB (6,442,450,944 bytes) by 40 million bytes, we get roughly 161.06, which rounds up to 162. This should in theory be the number of 40-million-byte arrays required to fill 6 GB of memory.
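The division can be double-checked in a few lines. This sketch counts only the element data and ignores the small per-array object header; ArrayBudget and ArraysNeeded are hypothetical names.

```csharp
using System;

// Hypothetical helper, named for illustration only.
public static class ArrayBudget
{
    // Number of int arrays with 'elementsPerArray' elements needed to cover
    // 'targetBytes', counting element data only (array headers are ignored).
    public static long ArraysNeeded(long targetBytes, int elementsPerArray)
    {
        long bytesPerArray = (long)elementsPerArray * sizeof(int);
        return (targetBytes + bytesPerArray - 1) / bytesPerArray; // integer ceiling
    }
}
```

ArrayBudget.ArraysNeeded(6L * 1024 * 1024 * 1024, 10000000) evaluates to 162, matching the figure used in the sample code.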
Now that the numbers are clear, let’s write the code:
using System;

namespace LeakMemory
{
    class Program
    {
        static void Main(string[] args)
        {
            const int BlockSIZE = 10000000; // 10 million
            const int NoOfBlocks = 162;
            int[][] intArray = new int[NoOfBlocks][];
            Console.WriteLine("Press a key to start");
            Console.ReadLine();
            try
            {
                for (int k = 0; k < NoOfBlocks; k++)
                {
                    // Generate a large array of ints. This will end up on the heap
                    intArray[k] = new int[BlockSIZE];
                    Console.WriteLine("Allocated (but not touched) for array {0}: {1}", k, BlockSIZE);
                    // Sleep for 100 ms
                    System.Threading.Thread.Sleep(100);
                }
            }
            catch (Exception e)
            {
                Console.WriteLine(e.Message);
            }
            Console.WriteLine("done");
            Console.ReadLine();
            // Prevent the GC from destroying the objects created, by
            // keeping a reference to them
            Console.WriteLine(intArray.Length);
        }
    }
}
Aside from allocating the arrays themselves, most of the code is fluff that deals with writing output messages, waiting for a key to be pressed to get to the next section, or delaying the allocation of the subsequent array. However, all of this will come in handy when we analyze the memory usage in detail. We’re also catching any exception that might come up, and writing its message directly to the screen.
Something ain’t right
Let’s hit F5 and see how the sample code performs:
Not only does the code fail to complete, it doesn’t even make it to the 100th 10-million int array. The exception thrown is our familiar System.OutOfMemoryException. Visual Studio’s built-in profiling dashboard (top right) shows the memory used by our process getting close to 3 GB just as the exception hits.
Can I Get Some Light Over Here ?
Ok, we need to understand what goes on. Luckily, Visual Studio has a built-in memory profiler we can use right away. This will run the code once more, and allow us to take a snapshot after the exception is thrown, so that we understand where the usage goes:
Oddly enough, this time the code successfully allocates 67 arrays (it fails just after printing the message for array no. 66, counting from 0). When we first ran the code, it could only allocate 66 arrays.
Drilling down into the objects for which memory is allocated, we see the same number of arrays successfully allocated (67) as in the console output. Each array takes roughly 40 million bytes, as expected. But why did it allocate barely close to 3 GB (2.6 GB to be more precise, as the profiler shows above) when it was supposed to go up to 6 GB ?
Anyone Got A Bigger Light ?
Clearly we need a tool that can shed more light on the problem, and allow us to see the memory usage in better detail. Enter VMMap, which is “a process virtual and physical memory analysis utility. It shows a breakdown of a process’s committed virtual memory types as well as the amount of physical memory (working set) assigned by the operating system to those types“. The “committed” and “virtual memory” parts might sound scary for now, but nonetheless, the tool seems to tell where the memory goes from the operating system’s point of view, which should show significantly more than Visual Studio’s integrated memory profiler. Let’s give it a spin:
The Timeline… button allows going to a particular point in time from the process’ life to see the exact memory usage then. The resulting window – Timeline (Committed) – shows (1) a gradual upward trend, then (2) an approximately constant usage, followed by (3) a sudden drop to zero. You also briefly see the time of the selection being changed to a point where the memory usage line is pretty flat (within (2) described above, which happens after the exception was thrown, but before all the lines start dropping as part of (3)). Ignore the yellow/green/purple colors mixed in the chart for a second, and also note that when the time of the selection is changed, all the values in the main window change as well.
Back in the main window, it’s a lot like a Christmas tree, with multiple colors and lots of columns with funny names, but let’s leave that aside for a moment, and only focus on the Size column in the top left. Actually, let’s take a closer look at only 2 values there: the first one – Total – which represents the total memory consumed, and the last one – Free – representing the total free memory. Here they are highlighted:
Hmm, the total size in the figure represents about 3.7 GB. That’s significantly larger than the 2.6 GB value we’ve got from Visual Studio’s Memory Profiler.
But look at the value for free space – that’s almost 300 MB of memory. This should have been more than enough for allocating 7 more of our 10 million int arrays with no problem.
How about we sum the 2 values – the total size and the free space ? The number is exactly 4 GB. Intriguing. This seems to suggest that the total memory our process gets is exactly 4 GB.
Wait, What ?
VMMap has a concise, to-the-point help file. If you look up what Total WS means, it says “The amount of physical memory assigned to the type or region“. So the value at the intersection of the Total WS column and the Total row will tell us exactly how much physical memory the process is taking at its peak usage, right after the out-of-memory exception is thrown:
The value is… 12,356 KB. In other words about 12 MB. So VMMap is telling us that our program, which was supposed to allocate 6 GB of memory (but fails somewhere midway by throwing an exception) is only capable of allocating 12 MB of RAM ? But that’s not even the size of one array of 10 million int, and we know for sure we’ve allocated not one, but 67 of them successfully ! What kind of sorcery is this ?
A Trip Down Memory Lane
Before moving on, you need to know that there’s a very special book, called “Windows Internals“, that analyses in depth how Windows works under the hood. It’s been around since 1992, back when Windows NT roamed the Earth. The current 7th edition handles Windows 10 and Windows Server 2016. The chapter describing memory management alone is 182 pages long. Extensive references to the contents found there will be made next.
Back to our issue at hand, we have to start with some basic facts about how programs access memory. Specifically, in our small C# example, the resulting process is never allocating chunks of physical memory directly. Windows itself doesn’t hand out to the process any address for physical RAM. Instead, the concept of virtual memory is used.
Let’s see how “Windows Internals” defines this notion:
Windows implements a virtual memory system based on a flat (linear) address space that provides each process with the illusion of having its own large, private address space. Virtual memory provides a logical view of memory that might not correspond to its physical layout.
Windows Internals 7th Edition – Chapter 1 “Concepts and tools”
Let’s visualize this:
So our process gets handed out a range of “fake”, virtual addresses. Windows works together with the CPU to translate – or map – these virtual addresses to the place where they actually point – either the physical RAM or the disk.
In Figure 7, the green chunks are in use by the process, and point to a “backing” medium (RAM or disk), while the orange chunks are free.
Note something of interest: contiguous virtual memory chunks can point to non-contiguous chunks in physical memory. These chunks are called pages, and they are usually 4 KB in size.
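To get a feel for the numbers involved, here is a minimal sketch that counts how many 4 KB pages a given amount of memory spans. It assumes the usual 4 KB page size mentioned above (the real value can be read from Environment.SystemPageSize); the class and method names are invented for this example.

```csharp
using System;

static class PageCountSketch
{
    public const long PageSize = 4096; // assumed small-page size, in bytes

    // Minimum number of pages needed to hold the given number of bytes.
    public static long PagesFor(long bytes)
    {
        return (bytes + PageSize - 1) / PageSize;
    }

    static void Main()
    {
        // A 4 GB virtual address space, cut into 4 KB pages.
        Console.WriteLine(PagesFor(4L * 1024 * 1024 * 1024)); // 1048576
        // One array of 10 million ints (40,000,000 bytes of element data).
        Console.WriteLine(PagesFor(40000000));                // 9766
    }
}
```

So each of our arrays needs close to ten thousand pages, and the count is a minimum: an array that doesn’t start exactly on a page boundary can spill into one extra page.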
Guilty As Charged
Let’s keep in mind the goal we’ve set out in the beginning of this post – we want to find out why we’ve got an out-of-memory exception. Remember that we know from VMMap that the total virtual memory size allocated to our process is 4 GB.
In other words, Windows hands our process 4 GB of virtual address space, cut into pages, each 4 KB long. Initially all those pages will be “orange” – free, with no data written to them. Once we start allocating our int arrays, some of the pages will start turning “green”.
Note that there’s a sort of dual reality going on. From the process’ point of view, it’s writing and allocating the int arrays in either the “orange” boxes or “green” boxes that haven’t yet filled up; it knows nothing about where such a box is really stored in the back. The reality however, which Windows knows too well, is that there’s no data stored in either the “green” or “orange” boxes in Figure 7, only simple mappings that lead to the data itself – stored in RAM or on the disk.
Since there’s really no compression at play here, there won’t really be a way to fit those 6 GB of data into just 4 GB. Eventually we’ll exhaust even the last available free page. You can’t just place 6 eggs into an egg carton that can only accommodate 4. We just have to accept that the exception raised is a natural thing, given the circumstances.
So The World Is A Small Place ?
“Are you saying that every process out there only gets access to 4 GB of memory ?(!)” I rightfully hear you asking.
Let’s take a look at the default configuration options used by Visual Studio for a C# console app:
Note the highlighted values. To simplify for the sake of our discussion, this combo (Any CPU as Platform target plus Prefer 32-bit) will get us 2 things:
- Visual Studio will compile the source code to an .exe file that will run as a 32-bit process when started, regardless of whether the underlying system is 32-bit or 64-bit Windows.
- The Large Address Aware flag will be set in the resulting .exe file, which essentially tells Windows that the process can handle more than 2 GB of virtual address space.
These 2 points combine on a 64-bit Windows so that the process is granted via the Wow64 mechanism its maximum allocable space given its 32-bit constraint – that is of 2^32 bytes, or exactly 4 GB.
If the code is compiled specifically for 64-bit systems – e.g. by simply unticking the Prefer 32-bit option back in Figure 8 – the process, when run on a 64-bit machine, will suddenly get access to 128 TB of virtual address space.
An important point to remember: the values presented above for a 64-bit system, namely 4 GB (for a 32-bit process that is large address aware) and 128 TB (for a 64-bit process), are the maximum virtual address space sizes currently available on a Windows 10 box. A system can have only 2 GB of physical memory, yet a process will still be able to address 4 GB of address space; how that address space is backed when actually needed – say 700 MB in physical RAM and the rest on disk – is up to the underlying operating system. Conversely, having 6 GB (or 7/10/20/50 GB) of physical memory won’t help a 32-bit large address aware process get more than 4 GB of virtual address space.
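A quick way to see which of these cases applies to your own build is to ask the runtime. The sketch below relies on Environment.Is64BitProcess, Environment.Is64BitOperatingSystem and IntPtr.Size, which are available from .NET Framework 4.0 onwards; the Describe helper and its wording are made up for illustration.

```csharp
using System;

static class BitnessSketch
{
    // Turns the two flags into a human-readable description.
    public static string Describe(bool is64BitProcess, bool is64BitOs)
    {
        if (is64BitProcess)
            return "64-bit process";
        return is64BitOs
            ? "32-bit process on 64-bit Windows (WOW64)"
            : "32-bit process on 32-bit Windows";
    }

    static void Main()
    {
        Console.WriteLine(Describe(Environment.Is64BitProcess,
                                   Environment.Is64BitOperatingSystem));
        // 4 in a 32-bit process, 8 in a 64-bit one.
        Console.WriteLine("Pointer size: {0} bytes", IntPtr.Size);
    }
}
```

With the default Any CPU plus Prefer 32-bit combo on 64-bit Windows you’d expect the WOW64 line and a pointer size of 4; after switching the Platform target to x64, a 64-bit process and a pointer size of 8.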
So 1 mystery down, 2 more to go…
Bits and Pieces
Remember those roughly 300 MB of free space in Figure 5 back when the out-of-memory exception was thrown ? Why is the exception raised when there’s still some space remaining ?
Let’s look first at how .NET actually reserves memory for an array. As this older Microsoft article puts it: “The contents of an array are stored in contiguous memory“.
But where in memory are these arrays actually placed ? Every object ends up in one of 2 places – the stack or the heap. We just need to figure out which. Luckily, “C# in Depth” (Third Edition) by Jon Skeet has the answer, all within a couple of pages:
Array types are reference types, even if the element type is a value type (so int[] is still a reference type, even though int is a value type)
C# in Depth (Third Edition), Jon Skeet
[…]an instance of a reference type is always created on the heap.
C# in Depth (Third Edition), Jon Skeet
The thing is that there are 2 types of heaps that a process can allocate: unmanaged and managed. Which kind is used by ours ? “Writing High-Performance .NET Code” (2nd Edition) by Ben Watson has the answer:
The CLR allocates all managed .NET objects on the managed heap, also called the GC heap, because the objects on it are subject to garbage collection.
“Writing High-Performance .NET Code” (2nd Edition), Ben Watson
If the words “managed heap” look familiar, it’s because VMMap has a dedicated category just for it in the memory types it’s showing.
Now let’s look at what happens in the last seconds of our process’ lifetime, shortly before the exception is thrown. We’ll use the “Address Space Fragmentation” window, which displays the various types of memory in use and their distribution within the process’ address space. Ignore the colors in the “Address Space Fragmentation” window to the right for now, but keep an eye out for the free space. We’ll also do one more thing: sort the free space blocks in descending order.
We can see the free space gradually filling up. The allocations are all contiguous, just like the theory quoted before said they would be. So we don’t see, for example, the free space around the “violet” data being filled, since there’s no large enough “gap” to accommodate it. Yet in the end we’re still left with 2 free blocks, each in excess of 40 million bytes, which should accept at least 2 more int arrays. You can see each of them highlighted, and their space clearly indicated on the fragmentation window towards the end of the animation.
The thing is that so far we’ve made an assumption – that each array will occupy only the space required to hold the actual data, that is 4 bytes per int x 10 million int elements = 40 million bytes. But let’s see what each block actually looks like in the virtual address space. We’ll go to a point in time midway – when we know sufficient data has already been allocated – filter for only the “Managed Heap” category, and sort the blocks by size in descending order:
It turns out that each block is 49,152 KB in size – or 48 MB – and is composed of 2 sub-blocks: one of 39,068 KB and another of 10,084 KB. The first value – 39,068 KB, or 40,005,632 bytes – is really close to our expected 40,000,000 bytes, with only 5,632 bytes to spare, which suggests this is where our int elements are stored. The second value seems to indicate some sort of overhead. Note several such sub-blocks being highlighted in the fragmentation view. Note also that for each 48 MB block, the 2 sub-blocks it contains are contiguous.
What this means is that there has to be a free space “gap” big enough to accommodate 49,152 KB in order to successfully allocate another array of int elements. But if you look back at Figure 9, you’ll see that we’ve just run out of luck – the largest free space block only has 41,408 KB. The system no longer has contiguous free memory space to use for one more subsequent allocation, and – despite having several hundred MB of free space made up from small “pieces” – throws an out-of-memory exception.
So it wasn’t the fact that we’ve exhausted our 4 GB virtual memory limit that threw the out-of-memory exception, but the inability to find a large enough block of free space.
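The difference between “total free space” and “largest contiguous free block” can be sketched with a toy model. Only the 41,408 KB and 49,152 KB figures come from the measurements above; the rest of the free list is hypothetical, made up so that the total lands around 300 MB, and the class name is likewise invented.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class FragmentationSketch
{
    // A request for contiguous space succeeds only if one single free block
    // is large enough; the sum of all free blocks is irrelevant.
    public static bool CanAllocate(IEnumerable<long> freeBlocksKB, long requestKB)
    {
        return freeBlocksKB.Any(block => block >= requestKB);
    }

    static void Main()
    {
        // Hypothetical free list, in KB. The largest block matches the article.
        var free = new List<long> { 41408, 40200 };
        for (int i = 0; i < 10; i++)
            free.Add(22000);

        Console.WriteLine("Total free: {0:N0} KB", free.Sum());
        Console.WriteLine("Largest free block: {0:N0} KB", free.Max());
        // One more array needs a 49,152 KB contiguous block.
        Console.WriteLine("Can allocate 49,152 KB: {0}", CanAllocate(free, 49152));
    }
}
```

Even with about 300 MB free in total, the check fails, because no single block reaches 49,152 KB. This is a deliberately simplified picture: the real allocator’s bookkeeping is more involved than a flat list of sizes.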
This leaves us with one more question to answer.
Your Reservation Is Now Confirmed
Remember the ludicrously low amount of physical memory actually used – 12,356 KB – back in Figure 6 ? How come it’s so low ?
We briefly touched on this issue in the last paragraph of “So The World Is A Small Place ?” by saying that some of the virtual address space can be backed by physical memory, or can be paged out to disk.
There are 4 kinds of memory pages:
Pages in a process virtual address space are either free, reserved, committed, or shareable.
Windows Internals 7th Edition – Chapter 5 “Memory Management”
When we’re allocating each int array, what’s happening under the hood (through .NET and the underlying operating system) is that memory for that array is committed. Committing in this context involves the OS performing the following:
1. Setting aside virtual address space within our process large enough for it to be able to address the area being allocated
2. Providing a guarantee for the memory requested
For (1) this is relatively straightforward – the virtual address space is marked accordingly in structures called process VADs (virtual address descriptors). For (2), the OS needs to ensure that the memory requested is readily available to the process whenever it will need it in the future.
Note that neither of the two conditions demands providing the details of all the memory locations upfront. Guaranteeing that, say, 12,000 memory pages will be readily available when requested is very different from finding a particular spot for each of those 12,000 individual pages in a backing medium – be it physical RAM or one of the paging files on disk. The latter is a lot of work.
And the OS takes the easy way out – it just guarantees that the memory will be available when needed. It will do this by ensuring the commit limit – which is the sum of the size of RAM plus the current size of the paging files – is enough to honor all the requests for virtual address space that the OS has agreed to so far.
So if 3 processes commit memory – the first one 400 MB, the second 200 MB and the third 300 MB – the system must ensure that somewhere either in RAM or in the paging files there is enough space to hold at least 900 MB, that can be used to store the data if those processes might be accessing the data in the future.
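The bookkeeping described here boils down to a running total compared against a limit. The sketch below is a toy model only: the 1,024 MB commit limit (512 MB of RAM plus a 512 MB paging file) is a hypothetical figure, the names are invented, and the real memory manager also has to deal with things like paging files that can grow.

```csharp
using System;
using System.Linq;

static class CommitSketch
{
    // A new commit request is honored only if the total committed so far,
    // plus the request, still fits under the commit limit.
    public static bool CanCommit(long commitLimitMB, long[] existingCommitsMB, long requestMB)
    {
        return existingCommitsMB.Sum() + requestMB <= commitLimitMB;
    }

    static void Main()
    {
        long[] commits = { 400, 200, 300 };   // the 3 processes from the text
        long commitLimitMB = 512 + 512;       // hypothetical RAM + paging file

        Console.WriteLine(commits.Sum());                          // 900
        Console.WriteLine(CanCommit(commitLimitMB, commits, 100)); // True
        Console.WriteLine(CanCommit(commitLimitMB, commits, 200)); // False
    }
}
```

Note that nothing in this model says where any page actually lives; it only keeps the promise honest, which is exactly the cheap part of the job.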
The OS is literally being lazy. And this is actually the name of the trick employed: lazy-evaluation technique. More from “Windows Internals“:
For example, a page of private committed memory does not actually occupy either a physical page of RAM or the equivalent page file space until it’s been referenced at least once.
Why is this so ? Because:
When a thread commits a large region of virtual memory […], the memory manager could immediately construct the page tables required to access the entire range of allocated memory. But what if some of that range is never accessed? Creating page tables for the entire range would be a wasted effort.
And if you think back to our code, it’s simply allocating int arrays; it doesn’t write to any of the elements. We never asked to store any values in the arrays, so the OS was lazy enough to not go about building the structures – called PTEs (Page Table Entries) – that would have linked the virtual address space within our process to physical pages stored in RAM.
But what does the term working set stand for back in Figure 6 ?
A subset of virtual pages resident in physical memory is called a working set.
Yet we never got to the point where we demanded the actual virtual pages, therefore the system never built the PTE structures that would have linked those virtual pages to physical ones in the RAM, which resulted in our process having a close-to-nothing working set, as can be clearly seen in Figure 6.
Is It All Lies ?
But what if we were to actually “touch” the data that we’re allocating ? According to what we’ve seen above, this would have to trigger the creation of virtual pages mapped to RAM. Writing a value to every int element in the arrays we’re spawning should do the trick.
However there’s one shortcut we can take. Remember that an int element takes 4 bytes, and that a page is 4 KB in size – or 4096 bytes, enough for 1024 int elements. We also know that the array will be allocated as contiguous memory. Therefore, we don’t really need to touch every single element of the array, but only every 1024th element. This is just enough to cause each page to be created and brought into the working set. So let’s slightly modify the for block that’s allocating the arrays in our code:
for (int k = 0; k < NoOfBlocks; k++)
{
    // Generate a large array of ints. This will end up on the heap
    intArray[k] = new int[BlockSIZE];
    //Console.WriteLine("Allocated (but not touched) for array {0}: {1} elements", k, BlockSIZE);
    // Touch one element per 4 KB page (1024 ints x 4 bytes = 4096 bytes)
    for (int i = 0; i < BlockSIZE; i += 1024)
    {
        intArray[k][i] = 0;
    }
    // BlockSIZE is a count of int elements, not bytes
    Console.WriteLine("Allocated (and touched) for array {0}: {1} elements", k, BlockSIZE);
    // Sleep for 100 ms
    System.Threading.Thread.Sleep(100);
}
Let’s see the result after running this code:
The values are almost identical this time, meaning pages were created and our data currently sits in the physical RAM.
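If you’d rather watch the effect from inside the process than through VMMap, the sketch below prints the private bytes and the working set before and after touching a single array. It uses Process.GetCurrentProcess() with its PrivateMemorySize64 and WorkingSet64 properties; the class and helper names are made up, the 4 KB page size is assumed, and the exact numbers will vary with the runtime version and machine.

```csharp
using System;
using System.Diagnostics;

static class WorkingSetSketch
{
    const int BlockSize = 10000000;              // 10 million ints, as in the article
    const int IntsPerPage = 4096 / sizeof(int);  // 1024, assuming 4 KB pages

    // Writes to one element per page-sized stride; returns the number of writes.
    public static int TouchEveryPage(int[] block)
    {
        int touches = 0;
        for (int i = 0; i < block.Length; i += IntsPerPage)
        {
            block[i] = 0;
            touches++;
        }
        return touches;
    }

    static void Report(string label)
    {
        // A fresh Process object gives a fresh snapshot of the counters.
        using (Process p = Process.GetCurrentProcess())
        {
            Console.WriteLine("{0}: private bytes = {1:N0} KB, working set = {2:N0} KB",
                label, p.PrivateMemorySize64 / 1024, p.WorkingSet64 / 1024);
        }
    }

    static void Main()
    {
        Report("start");
        int[] block = new int[BlockSize];
        Report("allocated, not touched");
        int touches = TouchEveryPage(block);
        Report("touched " + touches + " elements");
        GC.KeepAlive(block); // keep the array reachable until the last report
    }
}
```

On a setup that behaves like the one in this post, private bytes should jump right after the allocation, while the working set should only catch up after the touching loop.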
Q & A
Q: You mentioned in one of the sections that pages are usually 4 KB in size. When do they have a different size, and what are those sizes ?
A: There are small (4 KB), large (2 MB) and – as of Windows 10 version 1607 x64 – huge pages (1 GB). For more details look in the “Large and small pages” section close to the beginning of chapter 5 in “Windows Internals, Part 1” (7th Edition).
Q: Why use this virtual memory concept in the first place ? It just seems to insert an unneeded level of indirection. Why not just write to RAM physical addresses directly ?
A: Microsoft itself lists 3 arguments in favor of virtual memory here. The page also has some nice diagrams, and it’s concise.
Q: You mentioned that on a 64-bit Windows, 64-bit compiler generated code will result in a process that can address up to 128 TB of virtual address space. However if I compute 2^64 I get a lot more than 128 TB. How come ?
A: A quote from Windows Internals:
Sixty-four bits of address space is 2 to the 64th power, or 16 EB (where 1 EB equals 1,024 PB, or 1,048,576 TB), but current 64-bit hardware limits this to smaller values.
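To put the numbers side by side: 128 TB is 2 to the 47th power bytes, so the limit mentioned in the article corresponds to 47 usable address bits rather than the full 64. The small sketch below just does the powers-of-two arithmetic; the class and method names are made up for illustration.

```csharp
using System;

static class AddressSpaceSketch
{
    // Size, in TB (2^40 bytes), of an address space using the given number of
    // address bits. Limited to 40..64 bits so the result fits in a ulong.
    public static ulong TerabytesForBits(int bits)
    {
        if (bits < 40 || bits > 64)
            throw new ArgumentOutOfRangeException("bits");
        return 1UL << (bits - 40);
    }

    static void Main()
    {
        Console.WriteLine(TerabytesForBits(47)); // 128
        Console.WriteLine(TerabytesForBits(64)); // 16777216 (that is, 16 EB)
        Console.WriteLine(1UL << 32);            // 4294967296 bytes = 4 GB
    }
}
```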
Q: But AWE could be used from Pentium Pro times to allocate 64 GB of RAM.
A: Remember that the virtual address space is limited to 4 GB for a large-address aware, 32-bit process running on 64-bit Windows. A *lot* of physical memory could be mapped using the (by comparison, relatively small) virtual address space. In effect, the virtual address space is used as a “window” into the large physical memory.
Q: What if we allocate larger int blocks, say 12 million elements each instead of 10 million ? Would the overhead increase proportionally ?
A: No. There are certain block sizes that seem to be used by the Large Object Heap. When allocating 12 million elements, the overall size of the block is still 49,152 KB, with a “band” of only 2,272 KB of reserved memory. When allocating 13 million elements, the overall size of the block goes up to 65,536 KB, with 14,748 KB of reserved space for each:
Q: What’s causing the overhead seen in the question above, as well as within the article ?
A: At this time (4/21/2019) I don’t have the answer. I do believe the low-fragmentation heap, which .NET is using under the hood for its heap implementation, holds the secret to this.
Q: Does the contiguous data found within each virtual page map to correspondingly contiguous data within the physical memory pages ? Or to rephrase, are various consecutive virtual space addresses within the same virtual page pointing to spread-out locations within a physical page, or even multiple physical pages ?
A: They are always contiguous. Refer to chapter 5 of “Windows Internals, Part 1” (7th Edition), which details how, during address translation, the CPU copies the low-order 12 bits of every virtual address unchanged to form the offset within a physical page. This means the order is the same within both the virtual page and the physical one. Note how RamMap shows the correspondence of physical-to-virtual addresses on a 4 KB boundary, or exactly the size of a regular page.
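The split between “which page” and “where inside the page” is plain bit arithmetic. The sketch below assumes 4 KB pages, hence 12 offset bits; the sample address is arbitrary and the names are invented. It only shows how an address decomposes, not the actual page table walk that turns a virtual page number into a physical one.

```csharp
using System;

static class AddressSplitSketch
{
    const int OffsetBits = 12; // 2^12 = 4096 bytes per page

    // Everything above the low 12 bits identifies the virtual page.
    public static ulong PageNumber(ulong virtualAddress)
    {
        return virtualAddress >> OffsetBits;
    }

    // The low 12 bits are the byte offset inside the page; translation
    // leaves them unchanged.
    public static ulong PageOffset(ulong virtualAddress)
    {
        return virtualAddress & ((1UL << OffsetBits) - 1);
    }

    static void Main()
    {
        ulong va = 0x7FF612345ABCUL; // arbitrary example address
        Console.WriteLine("Page number: 0x{0:X}", PageNumber(va)); // 0x7FF612345
        Console.WriteLine("Page offset: 0x{0:X}", PageOffset(va)); // 0xABC
    }
}
```

Two addresses with the same page number and offsets that differ by 1 therefore end up 1 byte apart in the same physical page, whatever physical page that turns out to be.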
Q: In all the animation and figures I’m seeing a yellow chunk of space, alongside the green one for “Managed Heap”. This yellow one is labeled “Private Data”, and it’s quite large in size. What’s up with that ?
A: There’s a bug in the current version of VMMap, whereby the 32-bit version – needed to analyze the 32-bit executable for the int allocator code – incorrectly classifies virtual addresses pointing to .NET allocated data above 2 GB as private data, instead of managed heap. You’ll also see that the working set for all int arrays classified as such appears to be nothing – when in reality this is not the case. I’m currently (4/21/2019) in contact with Mark Russinovich (the author of VMMap) to see how this can be fixed. The bug however doesn’t exist in the 64-bit version of VMMap, and all the allocations will correctly show up as ‘Managed Heap’.
Q: I’d like to understand more about the PTE structures. Where can I find more information ?
A: Look inside chapter 5 (“Memory Management“) within “Windows Internals, Part 1” (7th Edition). There’s an “Address Translation” section that goes into all the details, complete with diagrams.
Q: Your article is hard to follow and I can’t really understand much. Can you recommend some sources that do a better job than you at explaining these concepts ?
A: Eric Lippert has a very good blog post here. There’s also a very nice presentation by Mark Russinovich here which handles a lot of topics about memory (including a 2nd presentation, also 1+ hours long). Though both sources are quite dated, being several years old, the concepts are very much current.
Q: Where can I find more info about the Platform Target setting in Visual Studio ?
A: The previous post on this very blog describes that in detail. You can start reading from this section.
Q: I’ve tried duplicating your VMMap experiment, but sometimes I’m seeing that the largest free block available is in excess of 100,000 KB. This is more than double the size of an int array, which should take around 49,000 KB (39,000 KB + 10,000 KB reserve), so there should’ve been space for at least one subsequent allocation. What’s going on ?
A: I don’t have a thorough answer for this right now (4/21/2019). I’ve noticed this myself. My only suspicion is that something extra goes on behind the scenes, aside from the simple allocation for the int array, such as the .NET allocation mechanism going after some extra blocks of memory.
Q: I heard an int takes twice as much space on a 64-bit system. You’re stating in this article that it’s 4 bytes on either 32-bit/64-bit. You’re wrong !
A: Don’t confuse an int value with an IntPtr, which represents a pointer and whose size is 4 bytes on a 32-bit platform and 8 bytes on a 64-bit one. A pointer to an int contains that int variable’s address, but what’s found at that address is the int value itself, which is still 4 bytes.
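This is easy to verify on your own machine. The snippet below is a minimal sketch (the class name is made up) that prints both sizes; build it once as x86 and once as x64 and compare.

```csharp
using System;

static class SizeSketch
{
    static void Main()
    {
        // Fixed by the language: an int is always 32 bits wide.
        Console.WriteLine("sizeof(int) = {0}", sizeof(int));
        // Depends on the process: 4 in a 32-bit process, 8 in a 64-bit one.
        Console.WriteLine("IntPtr.Size = {0}", IntPtr.Size);
    }
}
```

Only the second line changes between the two builds.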
You hit F5, but are struck to find the debugger breaking into your code almost immediately and showing:
What’s going on here ? You’re not running some other memory-consuming app. 6 GB surely should have been in reach for your code. The question that this post will set out to answer thus becomes simply: “Why do I get a System.OutOfMemoryException when running my recently created C# app on an idle, 64-bit Windows machine with lots of RAM ?“.
TL;DR (small scroll bar => therefore I deduct a lot of text => I’ve got no time for that, and need the answer now): The default build settings for Visual Studio limit your app’s virtual address space to 4 GB. Go into your project’s Properties, go to Build, and choose Platform target as x64
. Build your solution again and you’re done.
Not so fast ! Tell me more about what goes on under the hood: Buckle up, since we’re going for a ride. First we’ll look at a simple example of code that consumes a lot of memory fast, then uncover interesting facts about our problem, hit a “Wait, what ?” moment, learn the fundamentals of virtual memory, find the root cause of our problem then finish with a series of Q&A.
The Sample Code
Let’s replicate the issue you’ve encountered first – the out-of-memory thing. We’ll pick a simple method of allocating lots of memory – creating several large int
arrays. Let’s make each array contain 10 million int
values. As for how many of these arrays should be: our target for now is to replicate the initial scenario that started this blog post – that is consuming 6 GB of memory – so we should choose the number of arrays accordingly.
What we need to know is how much an int
takes in memory. As it turns out, an int
will always take 4 bytes of memory. Thus, an array of 10 million int
elements would take 40 million bytes of memory. This will actually be the same on either a 32-bit platform or a 64-bit one. If we divide the 6 GB (6.442.450.944 bytes) to 40 million bytes, we’ll get roughly 162. This should be in theory the number of 40 mil arrays required to fill 6 GB of memory.
Now that the numbers are clear, let’s write the code:
using System; namespace LeakMemory { class Program { static void Main(string[] args) { const int BlockSIZE = 10000000; // 10 million const int NoOfBlocks = 162; int[][] intArray = new int[NoOfBlocks][]; Console.WriteLine("Press a key to start"); Console.ReadLine(); try { for (int k = 0; k < NoOfBlocks; k++) { // Generate a large array of ints. This will end up on the heap intArray[k] = new int[BlockSIZE]; Console.WriteLine("Allocated (but not touched) for array {0}: {1}", k, BlockSIZE); // Sleep for 100 ms System.Threading.Thread.Sleep(100); } } catch (Exception e) { Console.WriteLine(e.Message); } Console.WriteLine("done"); Console.ReadLine(); // Prevent the GC from destroying the objects created, by // keeping a reference to them Console.WriteLine(intArray.Length); } } }
Aside from allocating the arrays themselves, most of the code is fluff, and deals with writing output messages, waiting for a key to be pressed to get to the next section or delaying allocating the subsequent array. However, this all will come in handy when we’ll analyze the memory usage in detail. We’re also catching any exception that might come up, and write it on the screen directly.
Something ain’t right
Let’s hit F5 and see how the sample code performs:
Not only that it doesn’t complete successfully, but the code doesn’t even make it till the 100th 10-million int
array. The exception thrown is our familiar System.OutOfMemoryException
. Visual Studio’s built-in profiling dashboard (top right) shows the memory used by our process going close to 3 GB – just as the exception hits.
Can I Get Some Light Over Here ?
Ok, we need to understand what goes on. Luckily, Visual Studio has a built-in memory profiler we can use right away. This will run the code once more, and allow us to take a snapshot after the exception is thrown, so that we understand where the usage goes:
Oddly enough, this time the code can successfully allocate 67 arrays (the code fails just after displaying error for the 0-based array no 66). When we first ran the code, it could only allocate 66 arrays.
Drilling down into the objects for which memory is allocated, we see the same number of arrays successfully allocated (67) as in the console output. Each array takes roughly 40 million bytes, as expected. But why only allocate barely close to 3 GB – to be more precise 2.6 GB, as the profiler shows above -, when it was supposed to go up to 6 GB ?
Anyone Got A Bigger Light ?
Clearly we need a tool that can shed more light on the problem, and allow us to see the memory usage in better detail. Enter VMMap, which is “a process virtual and physical memory analysis utility. It shows a breakdown of a process’s committed virtual memory types as well as the amount of physical memory (working set) assigned by the operating system to those types“. The “committed” and “virtual memory” parts might sound scary for now, but nonetheless, the tool seems to tell where the memory goes from the operating system’s point of view, which should show significantly more than Visual Studio’s integrated memory profiler. Let’s give it a spin:
The Timeline… button allows going to a particular point in time from the process’ life to see the exact memory usage then. The resulting window – Timeline (Committed) – shows (1) a gradual upward trend, then (2) an approximately constant usage, followed by (3) a sudden drop to zero. You also briefly see the time of the selection being changed to a point where the memory usage line is pretty flat (within (2) described above, which happens after the exception was thrown, but before all the lines start dropping as part of (3)). Ignore the yellow/green/purple colors mixed in the chart for a second, and also note that when the time of the selection is changed, all the values in the main window change as well.
Back in the main window, it’s a lot like a Christmas tree, with multiple colors and lots of columns with funny names, but let’s leave that aside for a moment and focus only on the Size column in the top left. Actually, let’s take a closer look at only 2 values there: the first one, Total, which represents the total memory consumed, and the last one, Free, representing the total free memory. Here they are highlighted:
Hmm, the total size in the figure represents about 3.7 GB. That’s significantly larger than the 2.6 GB value we’ve got from Visual Studio’s Memory Profiler.
But look at the value for free space – that’s almost 300 MB of memory. This should have been more than enough for allocating 7 more of our 10 million int arrays with no problem.
How about we sum the 2 values – the total size and the free space ? The number is exactly 4 GB. Intriguing. This seems to suggest that the total memory our process gets is exactly 4 GB.
Wait, What ?
VMMap has a concise, to-the-point help. If you look up what Total WS means, it says “The amount of physical memory assigned to the type or region“. So the value at the intersection of the Total WS column and the Total row will tell us exactly how much physical memory the process is taking at its peak usage, right after the out-of-memory exception is thrown:
The value is… 12,356 KB. In other words about 12 MB. So VMMap is telling us that our program, which was supposed to allocate 6 GB of memory (but fails somewhere midway by throwing an exception) is only capable of allocating 12 MB of RAM ? But that’s not even the size of one array of 10 million int, and we know for sure we’ve allocated not one, but 67 of them successfully ! What kind of sorcery is this ?
A Trip Down Memory Lane
Before moving on, you need to know that there’s a very special book, called “Windows Internals“, that analyses in depth how Windows works under the hood. It’s been around since 1992, back when Windows NT roamed the Earth. The current 7th edition handles Windows 10 and Windows Server 2016. The chapter describing memory management alone is 182 pages long. Extensive references to the contents found there will be made next.
Back to our issue at hand, we have to start with some basic facts about how programs access memory. Specifically, in our small C# example, the resulting process is never allocating chunks of physical memory directly. Windows itself doesn’t hand out to the process any address for physical RAM. Instead, the concept of virtual memory is used.
Let’s see how “Windows Internals” defines this notion:
Windows implements a virtual memory system based on a flat (linear) address space that provides each process with the illusion of having its own large, private address space. Virtual memory provides a logical view of memory that might not correspond to its physical layout.
Windows Internals 7th Edition – Chapter 1 “Concepts and tools”
Let’s visualize this:
So our process gets handed out a range of “fake”, virtual addresses. Windows works together with the CPU to translate – or map – these virtual addresses to the place where they actually point – either the physical RAM or the disk.
In figure 7, the green chunks are in use by the process, and point to a “backing” medium (RAM or disk), while the orange chunks are free.
Note something of interest: contiguous virtual memory chunks can point to non-contiguous chunks in physical memory. These chunks are called pages, and they are usually 4 KB in size.
Guilty As Charged
Let’s keep in mind the goal we’ve set out in the beginning of this post – we want to find out why we’ve got an out-of-memory exception. Remember that we know from VMMap that the total virtual memory size allocated to our process is 4 GB.
In other words, our process gets handed by Windows 4 GB of memory space, cut into pages, each 4 KB long. Initially all those pages will be “orange” – free, with no data written to them. Once we start allocating our int arrays, some of the pages will start turning “green”.
Note that there’s a sort of dual reality going on. From the process’ point of view, it’s writing and allocating the int arrays in either the “orange” boxes or “green” boxes that haven’t yet filled up; it knows nothing about where such a box is really stored in the back. The reality however, which Windows knows too well, is that there’s no data stored in either the “green” or “orange” boxes in figure 7, only simple mappings that lead to the data itself – stored in RAM or on the disk.
Since there’s really no compression at play here, there won’t really be a way to fit those 6 GB of data into just 4 GB. Eventually we’ll exhaust even the last available free page. You can’t just place 6 eggs into an egg carton that can only accommodate 4. We just have to accept that the exception raised is a natural thing, given the circumstances.
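To put some numbers on the egg carton, here is a minimal sketch of the arithmetic. It assumes the 4 KB page size and the 10-million-element int arrays used throughout this post, and it ignores the small per-array object header as well as everything else that lives in the address space (the runtime, loaded DLLs, stacks), so the result is an upper bound only. The class and method names are made up for illustration.

```csharp
using System;

public static class AddressSpaceBudget
{
    public const long FourGb = 4L * 1024 * 1024 * 1024;      // 2^32 bytes of virtual address space
    public const int PageSize = 4096;                        // assumed small-page size (4 KB)
    public const long ArrayBytes = 10000000L * sizeof(int);  // payload of one array: 40,000,000 bytes

    // Number of pages needed to cover the given number of bytes, rounding up.
    public static long PagesFor(long bytes)
    {
        return (bytes + PageSize - 1) / PageSize;
    }

    // Upper bound on how many whole arrays fit in 4 GB in the ideal case.
    public static long MaxArrays()
    {
        return FourGb / ArrayBytes;
    }

    public static void Main()
    {
        Console.WriteLine("Pages in 4 GB:        {0:N0}", PagesFor(FourGb));
        Console.WriteLine("Pages per array:      {0:N0}", PagesFor(ArrayBytes));
        Console.WriteLine("Arrays that fit, max: {0:N0}", MaxArrays());
    }
}
```

Even this ideal bound is well short of the number of arrays needed to reach 6 GB, so the exception is unavoidable. The 67 arrays we actually got are lower still, which is what the following sections dig into.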
So The World Is A Small Place ?
“Are you saying that every process out there only gets access to 4 GB of memory ?(!)” I rightfully hear you asking.
Let’s take a look at the default configuration options used by Visual Studio for a C# console app:
Note the highlighted values. To simplify for the sake of our discussion, this combo (Any CPU as Platform target plus Prefer 32-bit) will get us 2 things:
- Visual Studio will compile the source code to an .exe file that will run as a 32-bit process when started, regardless of whether the underlying system is 32-bit or 64-bit Windows.
- The Large Address Aware flag will be set in the resulting .exe file, which essentially tells Windows that the process can successfully handle more than 2 GB of virtual address space.
These 2 points combine on a 64-bit Windows so that the process is granted, via the WoW64 mechanism, the maximum space it can address given its 32-bit constraint: 2^32 bytes, or exactly 4 GB.
If the code is compiled specifically for 64-bit systems – e.g. by simply unticking the Prefer 32-bit option back in figure 8 – the process, when run on a 64-bit machine, will suddenly get access to 128 TB of virtual address space.
An important point to remember: the values presented above for a 64-bit system, namely 4 GB (for a 32-bit process that is large address aware) and 128 TB (for a 64-bit process), are the maximum virtual address space ranges currently addressable on a Windows 10 box. A system can have only 2 GB of physical memory, yet that doesn’t change the fact that a process will be able to address 4 GB of address space; how that address space is distributed when actually needed – e.g. 700 MB in physical RAM and the rest on disk – is up to the underlying operating system. Conversely, having 6 GB (or 7/10/20/50 GB) of physical memory won’t help a 32-bit large address aware process get more than 4 GB of virtual address space.
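If you’re unsure which of these cases your own build ends up in, the process can simply report it. The sketch below relies on Environment.Is64BitProcess and Environment.Is64BitOperatingSystem, which are standard .NET properties; the BitnessReport class and its Describe helper are made-up names for illustration.

```csharp
using System;

public static class BitnessReport
{
    // Maps the two flags to the scenarios discussed above.
    public static string Describe(bool is64BitProcess, bool is64BitOs)
    {
        if (is64BitProcess)
        {
            return "64-bit process on 64-bit Windows";
        }
        return is64BitOs
            ? "32-bit process on 64-bit Windows (WoW64)"
            : "32-bit process on 32-bit Windows";
    }

    public static void Main()
    {
        // With Any CPU + Prefer 32-bit on a 64-bit machine, this is expected
        // to report the WoW64 case; with the option unticked, the 64-bit case.
        Console.WriteLine(Describe(Environment.Is64BitProcess,
                                   Environment.Is64BitOperatingSystem));
    }
}
```

Running it once with Prefer 32-bit ticked and once without is a quick way to confirm which address space limit applies before reaching for VMMap.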
So 1 mystery down, 2 more to go…
Bits and Pieces
Remember those 300+ MB of free space in Figure 5 back when the out-of-memory exception was thrown ? Why is the exception raised when there’s still some space remaining ?
Let’s look first at how .NET actually reserves memory for an array. As this older Microsoft article puts it: “The contents of an array are stored in contiguous memory“.
But where in memory are these arrays actually placed ? Every object ends up in one of 2 places – the stack or the heap. We just need to figure out which. Luckily, “C# in Depth” (Third Edition) by Jon Skeet has the answer, all within a couple of pages:
Array types are reference types, even if the element type is a value type (so int[] is still a reference type, even though int is a value type)
C# in Depth (Third Edition), Jon Skeet
[…]an instance of a reference type is always created on the heap.
C# in Depth (Third Edition), Jon Skeet
The thing is that there are 2 types of heaps that a process can allocate: unmanaged and managed. Which kind is used by ours ? “Writing High-Performance .NET Code” (2nd Edition) by Ben Watson has the answer:
The CLR allocates all managed .NET objects on the managed heap, also called the GC heap, because the objects on it are subject to garbage collection.
“Writing High-Performance .NET Code” (2nd Edition), Ben Watson
If the words “managed heap” look familiar, it’s because VMMap has a dedicated category just for it in the memory types it’s showing.
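The reference-type claim is easy to check from code. This is a small sketch, not anything taken from the books quoted above: it asks the runtime whether int and int[] are value types, then shows that assigning an array to a second variable copies only the reference. The ArrayKind class name is made up.

```csharp
using System;

public static class ArrayKind
{
    // True when the given type is a reference type (i.e. not a value type).
    public static bool IsReferenceType(Type type)
    {
        return !type.IsValueType;
    }

    public static void Main()
    {
        Console.WriteLine("int is a reference type:   {0}", IsReferenceType(typeof(int)));
        Console.WriteLine("int[] is a reference type: {0}", IsReferenceType(typeof(int[])));

        int[] first = new int[3];
        int[] second = first;   // copies the reference, not the elements
        second[0] = 42;         // the write goes to the single array on the heap

        Console.WriteLine("first[0] after writing through second: {0}", first[0]);
        Console.WriteLine("Same object: {0}", object.ReferenceEquals(first, second));
    }
}
```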
Now let’s look at what happens in the last seconds of our process’ lifetime, shortly before the exception is thrown. We’ll use the “Address Space Fragmentation” window, which displays the various types of memory in use and their distribution within the process’ address space. Ignore the colors in the “Address Space Fragmentation” window to the right for now, but keep an eye out for the free space. We’ll also do one more thing: sort the free space blocks in descending order.
We can see the free space gradually filling up. The allocations are all contiguous, just like the theory quoted before said they would be. So we don’t see, for example, the free space around the “violet” data being filled, since there’s no large enough “gap” to accommodate it. Yet in the end we’re still left with 2 free blocks, each in excess of 40 mil bytes, which should accept at least 2 more int arrays. You can see each of them highlighted, and their space clearly indicated on the fragmentation window towards the end of the animation.
The thing is that so far we’ve made an assumption: that each array will occupy just the space required to hold the actual data, that is 4 bytes (per int) x 10 million (int elements) = 40 mil bytes. But let’s see what each block actually looks like in the virtual address space. We’ll go to a point in time midway – when we know sufficient data has already been allocated – filter for only the “Managed Heap” category, and sort the blocks by size in descending order:
It turns out that each block is 49,152 KB in size (48 MB) and is composed of 2 sub-blocks: one of 39,068 KB and another of 10,084 KB. The first value, 39,068 KB or 40,005,632 bytes, is really close to our expected 40,000,000 bytes, with only 5,632 bytes to spare, which suggests this is where our int elements are stored. The second value seems to indicate some sort of overhead. Note several such sub-blocks being highlighted in the fragmentation view. Note also that for each 48 MB block, the 2 sub-blocks it contains are contiguous.
What this means is that there has to be a free space “gap” big enough to accommodate 49,152 KB in order to successfully allocate another array of int elements. But if you look back at Figure 9, you’ll see that we’ve just run out of luck – the largest free space block only has 41,408 KB. The system no longer has contiguous free memory space to use for one more subsequent allocation, and – despite having several hundred MB of free space made up from small “pieces” – throws an out-of-memory exception.
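The numbers above can be double-checked with a few lines of arithmetic. The sketch below hard-codes the KB values read from VMMap in this post (they will differ on another machine or runtime); the BlockMath class and its member names are made up for illustration.

```csharp
using System;

public static class BlockMath
{
    public const long BytesPerKb = 1024;

    // Values read from VMMap in this post, in KB.
    public const long DataKb = 39068;         // sub-block that appears to hold the int elements
    public const long OverheadKb = 10084;     // the second sub-block
    public const long LargestFreeKb = 41408;  // largest free block when the exception is thrown

    public const long ArrayBytes = 10000000L * sizeof(int);

    public static long BlockKb()
    {
        return DataKb + OverheadKb;
    }

    // A contiguous free region can host an allocation only if it is at least as large.
    public static bool Fits(long freeKb, long neededKb)
    {
        return freeKb >= neededKb;
    }

    public static void Main()
    {
        Console.WriteLine("Whole block: {0:N0} KB ({1} MB)", BlockKb(), BlockKb() / BytesPerKb);
        Console.WriteLine("Spare bytes in data sub-block: {0:N0}", DataKb * BytesPerKb - ArrayBytes);
        Console.WriteLine("Largest gap holds the raw 40 mil bytes: {0}",
            LargestFreeKb * BytesPerKb >= ArrayBytes);
        Console.WriteLine("Largest gap holds a whole block: {0}", Fits(LargestFreeKb, BlockKb()));
    }
}
```

The last two lines capture the whole story: the largest gap is big enough for the raw array data, but not for the full block that actually gets requested.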
So it wasn’t the fact that we’ve exhausted our 4 GB virtual memory limit that threw the out-of-memory exception, but the inability to find a large enough block of free space.
This leaves us with one more question to answer.
Your Reservation Is Now Confirmed
Remember the ludicrously low number of actual used physical memory of 12,356 KB back in Figure 6 ? How come it’s so low ?
We briefly touched on this issue in the last paragraph of “So The World Is A Small Place ?” by saying that some of the virtual address space can be backed by physical memory, while some can be paged out to disk.
There are 4 kinds of memory pages:
Pages in a process virtual address space are either free, reserved, committed, or shareable.
Windows Internals 7th Edition – Chapter 5 “Memory Management”
When we’re allocating each int array, what’s happening under the hood (through .NET and the underlying operating system) is that memory for that array is committed. Committing in this context involves the OS performing the following:
1. Setting aside virtual address space within our process large enough for it to be able to address the area being allocated
2. Providing a guarantee for the memory requested
Step (1) is relatively straightforward – the virtual address space is marked accordingly in structures called process VADs (virtual address descriptors). For step (2), the OS needs to ensure that the memory requested is readily available to the process whenever it will need it in the future.
Note that neither of the two conditions demands providing the details of all the memory locations upfront. Giving out a guarantee that – say 12,000 memory pages – will be readily available when requested is very different than finding a particular spot for each of those 12,000 individual pages in a backing medium – be it physical RAM or one of the paging files on disk. The latter is a lot of work.
And the OS takes the easy way out – it just guarantees that the memory will be available when needed. It will do this by ensuring the commit limit – which is the sum of the size of RAM plus the current size of the paging files – is enough to honor all the requests for virtual address space that the OS has agreed to so far.
So if 3 processes commit memory – the first one 400 MB, the second 200 MB and the third 300 MB – the system must ensure that somewhere either in RAM or in the paging files there is enough space to hold at least 900 MB, that can be used to store the data if those processes might be accessing the data in the future.
The OS is literally being lazy. And this is actually the name of the trick employed: lazy-evaluation technique. More from “Windows Internals“:
For example, a page of private committed memory does not actually occupy either a physical page of RAM or the equivalent page file space until it’s been referenced at least once.
Why is this so ? Because:
When a thread commits a large region of virtual memory […], the memory manager could immediately construct the page tables required to access the entire range of allocated memory. But what if some of that range is never accessed? Creating page tables for the entire range would be a wasted effort.
And if you think back to our code, it’s simply allocating int arrays; it doesn’t write to any of the elements. We never asked to store any values in the arrays, so the OS was lazy enough not to go about building the structures, called PTEs (Page Table Entries), that would have linked the virtual address space within our process to physical pages in RAM.
But what does the term working set stand for back in Figure 6 ?
A subset of virtual pages resident in physical memory is called a working set.
Yet we never got to the point where we demanded the actual virtual pages, therefore the system never built the PTE structures that would have linked those virtual pages to physical ones in the RAM, which resulted in our process having a close-to-nothing working set, as can be clearly seen in Figure 6.
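The same gap between committed memory and working set can be observed from inside the process, without VMMap. The following is only a sketch: it uses the standard Process.PrivateMemorySize64 and Process.WorkingSet64 properties, but the exact numbers depend on the runtime version, the GC settings and the OS, so no particular output is promised. The WorkingSetProbe class and its Snapshot helper are made-up names.

```csharp
using System;
using System.Diagnostics;

public static class WorkingSetProbe
{
    // Returns { private (committed) bytes, working set bytes } for the current process.
    public static long[] Snapshot()
    {
        using (Process current = Process.GetCurrentProcess())
        {
            current.Refresh();
            return new[] { current.PrivateMemorySize64, current.WorkingSet64 };
        }
    }

    public static void Main()
    {
        long[] before = Snapshot();

        // Allocated but never written to, just like in our test program.
        int[] block = new int[10000000];

        long[] after = Snapshot();

        Console.WriteLine("Private bytes grew by: {0:N0}", after[0] - before[0]);
        Console.WriteLine("Working set grew by:   {0:N0}", after[1] - before[1]);

        // Keep the array reachable so the GC cannot reclaim it before we measure.
        GC.KeepAlive(block);
    }
}
```

If the lazy behavior described above applies on your setup, the first delta should be in the region of the array’s size, while the second should be far smaller.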
Is It All Lies ?
But what if we were to actually “touch” the data that we’re allocating ? According to what we’ve seen above, this would have to trigger the creation of virtual pages mapped to RAM. Writing a value to every int element in the arrays we’re spawning should do the trick.
However there’s one shortcut we can take. Remember that an int element takes 4 bytes, and that a page is 4 KB in size – or 4096 bytes. We also know that the array will be allocated as contiguous memory. Therefore, we don’t really need to touch every single element of the array, but only every 1024th element. This is just enough to demand that each page be created and brought into the working set. So let’s slightly modify the for block that’s allocating the arrays in our code:
for (int k = 0; k < NoOfBlocks; k++)
{
    // Generate a large array of ints. This will end up on the heap
    intArray[k] = new int[BlockSIZE];
    //Console.WriteLine("Allocated (but not touched) for array {0}: {1} bytes", k, BlockSIZE);
    // Touch one element per 4 KB page (1024 ints x 4 bytes each)
    for (int i = 0; i < BlockSIZE; i += 1024)
    {
        intArray[k][i] = 0;
    }
    Console.WriteLine("Allocated (and touched) for array {0}: {1} bytes", k, BlockSIZE);
    // Sleep for 100 ms
    System.Threading.Thread.Sleep(100);
}
Let’s see the result after running this code:
The values are almost identical this time, meaning pages were created and our data currently sits in the physical RAM.
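The hard-coded 1024 stride is tied to the 4 KB page size. A slightly more general sketch derives the stride from the page size instead; Environment.SystemPageSize is a standard .NET property, while the PageToucher class and its methods are made-up names, not part of the original test program.

```csharp
using System;

public static class PageToucher
{
    // Number of int elements that fit in one page of the given size.
    public static int ElementsPerPage(int pageSizeBytes)
    {
        return pageSizeBytes / sizeof(int);
    }

    // Writes to one element per page-sized stride and returns the number of writes.
    public static int TouchEveryPage(int[] array, int pageSizeBytes)
    {
        int stride = ElementsPerPage(pageSizeBytes);
        int touched = 0;
        for (int i = 0; i < array.Length; i += stride)
        {
            array[i] = 0;
            touched++;
        }
        return touched;
    }

    public static void Main()
    {
        int pageSize = Environment.SystemPageSize;
        int[] block = new int[10000000];

        int touched = TouchEveryPage(block, pageSize);
        Console.WriteLine("Page size: {0} bytes, stride: {1} elements, writes: {2:N0}",
            pageSize, ElementsPerPage(pageSize), touched);
    }
}
```

One caveat: the array’s data doesn’t necessarily start on a page boundary, so the very last page it spills into may not get touched by this loop. For the purpose of comparing working set against committed memory, that single page is negligible.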
Q & A
Q: You mentioned in one of the sections that the pages are usually 4 KB in size. When do they have a different size, and what are those sizes ?
A: There are small (4 KB), large (2 MB) and – as of Windows 10 version 1607 x64 – huge pages (1 GB). For more details look in the “Large and small pages” section close to the beginning of chapter 5 in “Windows Internals, Part 1” (7th Edition).
Q: Why use this virtual memory concept in the first place ? It just seems to insert an unneeded level of indirection. Why not just write to RAM physical addresses directly ?
A: Microsoft itself lists 3 arguments in favor of virtual memory here. The page also has some nice diagrams, and it’s concise.
Q: You mentioned that on a 64-bit Windows, 64-bit compiler generated code will result in a process that can address up to 128 TB of virtual address space. However if I compute 2^64 I get a lot more than 128 TB. How come ?
A: A quote from Windows Internals:
Sixty-four bits of address space is 2 to the 64th power, or 16 EB (where 1 EB equals 1,024 PB, or 1,048,576 TB), but current 64-bit hardware limits this to smaller values.
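To see how far apart the two figures are, we can do the arithmetic in units of TB (2^40 bytes), which keeps everything within a plain long. This is just a worked calculation; the AddressSpaceMath class and the TerabytesForBits helper are made-up names.

```csharp
using System;

public static class AddressSpaceMath
{
    // Size, in TB (2^40 bytes), of an address space that uses the given number of address bits.
    public static long TerabytesForBits(int bits)
    {
        if (bits < 40 || bits > 102)
        {
            throw new ArgumentOutOfRangeException("bits", "Expected a value between 40 and 102.");
        }
        return 1L << (bits - 40);
    }

    public static void Main()
    {
        long full = TerabytesForBits(64);   // the theoretical 16 EB
        long actual = TerabytesForBits(47); // 128 TB is what 47 address bits can span

        Console.WriteLine("64 bits: {0:N0} TB", full);
        Console.WriteLine("47 bits: {0:N0} TB", actual);
        Console.WriteLine("Ratio:   {0:N0}", full / actual);
    }
}
```

In other words, 128 TB corresponds to 47 bits’ worth of addresses, a tiny slice of what 64 bits could theoretically cover.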
Q: But AWE could be used from Pentium Pro times to allocate 64 GB of RAM.
A: Remember that the virtual address space is limited to 4 GB for a large-address aware, 32-bit process running on 64-bit Windows. A *lot* of physical memory could be mapped using the (by comparison, relatively small) virtual address space. In effect, the virtual address space is used as a “window” into the large physical memory.
Q: What if we allocate larger int blocks, going from 10 mil to say 12 mil elements each ? Would the overhead increase proportionally ?
A: No. There are certain block sizes that seem to be used by the Large Object Heap. When allocating 12 mil elements, the overall size of the block is still 49,152 KB, with a “band” of only 2,272 KB of reserved memory. When allocating 13 mil elements, the overall size of the block goes up to 65,536 KB, with 14,748 KB of reserved space for each:
Q: What’s causing the overhead seen in the question above, as well as within the article ?
A: At this time (4/21/2019) I don’t have the answer. I do believe the low-fragmentation heap, which .NET is using under the hood for its heap implementation, holds the secret to this.
Q: Does the contiguous data found within each virtual page map to correspondingly contiguous data within the physical memory pages ? Or to rephrase, are various consecutive virtual space addresses within the same virtual page pointing to spread-out locations within a physical page, or even multiple physical pages ?
A: They are always contiguous. Refer to “Windows Internals, Part 1” (7th Edition) to chapter 5, where it’s detailed how in the process of address translation the CPU copies the last 12 bits in every virtual address to reference the offset in a physical page. This means the order is the same within both the virtual page as well as the physical one. Note how RamMap shows the correspondence of physical-to-virtual addresses on a 4 KB boundary, or exactly the size of a regular page.
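As a rough illustration of the 12-bit split for regular 4 KB pages: the low 12 bits of a virtual address are the offset within the page and are carried over unchanged, while the remaining bits identify the virtual page that gets translated. The address used below is arbitrary and hypothetical, and the AddressSplit class is a made-up name.

```csharp
using System;

public static class AddressSplit
{
    public const int OffsetBits = 12;                        // 2^12 = 4096 bytes per small page
    public const ulong OffsetMask = (1UL << OffsetBits) - 1; // 0xFFF

    // The part of the address that selects the virtual page.
    public static ulong PageNumber(ulong virtualAddress)
    {
        return virtualAddress >> OffsetBits;
    }

    // The part of the address that is copied as-is into the physical address.
    public static ulong Offset(ulong virtualAddress)
    {
        return virtualAddress & OffsetMask;
    }

    public static void Main()
    {
        ulong address = 0x12345ABCUL;   // hypothetical virtual address
        ulong next = address + 1;       // the very next byte

        Console.WriteLine("0x{0:X}: page 0x{1:X}, offset 0x{2:X}", address, PageNumber(address), Offset(address));
        Console.WriteLine("0x{0:X}: page 0x{1:X}, offset 0x{2:X}", next, PageNumber(next), Offset(next));
    }
}
```

Two neighboring addresses in the same virtual page share the page number and differ only in the offset, which is why they end up as neighbors in the same physical page too.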
Q: In all the animation and figures I’m seeing a yellow chunk of space, alongside the green one for “Managed Heap”. This yellow one is labeled “Private Data”, and it’s quite large in size. What’s up with that ?
A: There’s a bug in the current version of VMMap, whereby the 32-bit version – needed to analyze the 32-bit executable for the int allocator code – incorrectly classifies virtual addresses pointing to .NET allocated data above 2 GB as private data, instead of managed heap. You’ll also see that the working set for all int arrays classified as such appears to be nothing – when in reality this is not the case. I’m currently (4/21/2019) in contact with Mark Russinovich (the author of VMMap) to see how this can be fixed. The bug however doesn’t exist in the 64-bit version of VMMap, and all the allocations will correctly show up as ‘Managed Heap’.
Q: I’d like to understand more about the PTE structures. Where can I find more information ?
A: Look inside chapter 5 (“Memory Management“) within “Windows Internals, Part 1” (7th Edition). There’s an “Address Translation” section that goes into all the details, complete with diagrams.
Q: Your article is hard to follow and I can’t really understand much. Can you recommend some sources that do a better job than you at explaining these concepts ?
A: Eric Lippert has a very good blog post here. There’s also a very nice presentation by Mark Russinovich here which handles a lot of topics about memory (including a 2nd presentation, also 1+ hours long). Though both sources are quite dated, being several years old, the concepts are very much current.
Q: Where can I find more info about the Platform Target setting in Visual Studio ?
A: The previous post on this very blog describes that in detail. You can start reading from this section.
Q: I’ve tried duplicating your VMMap experiment, but sometimes I’m seeing that the largest free block available is in excess of 100 MB. This is more than double the size of an int array, which should take around 49 MB (39 MB + 10 MB reserve), so there should’ve been space for at least one subsequent allocation. What’s going on ?
A: I don’t have a thorough answer for this right now (4/21/2019). I’ve noticed this myself. My only suspicion is that something extra goes on behind the scenes, aside from the simple allocation for the int array, such as the .NET allocation mechanism going after some extra blocks of memory.
Q: I heard an int takes twice as much space on a 64-bit system. You’re stating in this article that it’s 4 bytes on both 32-bit and 64-bit. You’re wrong !
A: Don’t confuse an IntPtr, which represents a pointer and whose size is 4 bytes on a 32-bit platform and 8 bytes on a 64-bit one, with an int value. A pointer to an int contains that int variable’s address, but what’s found at that address is the int value itself, and that is always 4 bytes.
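A quick way to convince yourself is to print both sizes from the same program, built once as 32-bit and once as 64-bit. The sketch below only uses sizeof(int), IntPtr.Size and Environment.Is64BitProcess, all standard in .NET; the SizeCheck class name is made up.

```csharp
using System;

public static class SizeCheck
{
    // Pointer size expected for the given process bitness.
    public static int ExpectedPointerSize(bool is64BitProcess)
    {
        return is64BitProcess ? 8 : 4;
    }

    public static void Main()
    {
        Console.WriteLine("64-bit process: {0}", Environment.Is64BitProcess);
        Console.WriteLine("sizeof(int):    {0} bytes", sizeof(int));   // the same in both builds
        Console.WriteLine("IntPtr.Size:    {0} bytes", IntPtr.Size);   // follows the process bitness
    }
}
```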