What happens from the moment you launch a .NET Core application, until the very first managed instruction is executed ? How do we get from the simple act of running an .exe to having the managed code executing as part of a .NET Core application ?
This is what this post sets up to explain in detail, dealing with:
You’re using Azure Functions, and need to decide on the types to use in your C# code in order to implement whatever you’re trying to achieve. In today’s world of powerful cloud computing with sometimes cheap or downright free resources, is it worth pondering over the best type to use ? After all, you get 400,000 GB-s for free per month as part of the consumption plan. And the code within this Azure Function will be just a few dozen lines, tops. Would it matter selecting an efficient data structure for your implementation ?
Let’s get specific. Say storing 10 mil numbers is needed each minute as an interim step for further processing. Do you go with ArrayList or List<int> ? Your 2 options below:
Oh, and you get to pay for the area under the graph of each function. How much does that come to ? Read on.
Last time we saw that the code that added random numbers to the ArrayList instance was spending 75% of its total time (1,141.5 ms) just doing GC. Computing the other 25%, representing the actual work, gives out about 285 ms. But how can this be, since the time the List<T> code takes – including its own GC activity – is 223 ms ? Is the List<T> implementation really that much efficient, that even with GC activity, it manages to complete faster that the actual time spent doing work in the case of the ArrayList one ? Are generics just that magical ?
In the first part of this 3-article series, we’ve found to our astonishment that code based on ArrayList performs a whole lot worse than the same one using List<T>.
In order to understand why this is so, we’ll break up the code in smaller parts, see what these involve under the hood and how much time each takes. We should then be able to pinpoint the operation(s) that have the major contribution to the difference in performance. We’ll do this for both structures, and start with ArrayList in this post.
Consider consuming a series of bytes and storing them in memory for future processing. Let’s take the code snippet below:
const int noNumbers = 10000000; // 10 mil
ArrayList numbers = new ArrayList();
Random random = new Random(1); // use the same seed as to make
// benchmarking consistent
for (int i = 0; i < noNumbers; i++)
int currentNumber = random.Next(10); // generate a non-negative
// random number less than 10
numbers.Add(currentNumber); // BOXING occurs here
Is this code good from a performance standpoint ? Not really, it’s actually quite appalling. Take a look at the running times for the ArrayList snippet above and, for the same exact code, but which uses List instead:
This makes the ArrayList code more than 5 times slower than the List one.
Boxing happens in the first case, but not in the second. This mechanism explains most of the performance difference seen above. Intrigued ? Read on.
Ever wondered how many methods with a specific signature are contained within an assembly ? Say, how many methods that take at least one parameter of type object are in all the types marked as public within the mscorlib.dll assembly.
(TL;DR: I’m in a real hurry, and need the code to do this really quick ! Where is it ? You’ll find it towards the end of the article, here)
mscorlib.dll is a somewhat challenging example, since this assembly contains all the core types (Byte, Int32, String, etc) plus many more, each with scores of methods. In fact, the types within are so frequently used that the C# compiler will automatically reference this assembly when building your app, unless specifically instructed not to do so (via the /nostdlib switch).
You’ve just created a Console app in the latest Visual Studio, and wrote some C# code that allocates some non-negligible quantity of memory, say 6 GB. The machine you’re developing has a decent amount of RAM – 16GB – and it’s running 64-bit Windows 10.
You hit F5, but are struck to find the debugger breaking into your code almost immediately and showing:
What’s going on here ? You’re not running some other memory-consuming app. 6 GB surely should have been in reach for your code. The question that this post will set out to answer thus becomes simply: “Why do I get a System.OutOfMemoryException when running my recently created C# app on an idle, 64-bit Windows machine with lots of RAM ?“.
TL;DR (small scroll bar => therefore I deduct a lot of text => I’ve got no time for that, and need the answer now): The default build settings for Visual Studio limit your app’s virtual address space to 4 GB. Go into your project’s Properties, go to Build, and choose Platform target as x64. Build your solution again and you’re done.
Not so fast ! Tell me more about what goes on under the hood: Buckle up, since we’re going for a ride. First we’ll look at a simple example of code that consumes a lot of memory fast, then uncover interesting facts about our problem, hit a “Wait, what ?” moment, learn the fundamentals of virtual memory, find the root cause of our problem then finish with a series of Q&A.