List vs ArrayList in Azure Functions

You’re using Azure Functions and need to decide which types to use in your C# code. In today’s world of powerful cloud computing, where resources are sometimes cheap or downright free, is it worth pondering over the best type to use? After all, you get 400,000 GB-s for free per month as part of the consumption plan, and the code within this Azure Function will be just a few dozen lines, tops. Does selecting an efficient data structure for your implementation really matter?

Let’s get specific. Say you need to store 10 mil numbers each minute as an interim step for further processing. Do you go with ArrayList or List<int>? Your 2 options are below:

Figure 1 – Function execution units for both the ArrayList and the List<int> functions against 10 mil elements, running once each minute

Oh, and you get to pay for the area under the graph of each function. How much does that come to? Read on.

We’ll briefly look at the code used in both functions as it shows up in Azure Functions, see how the cost is computed, measure the time it takes to run the code, discuss the memory usage, and finish by analyzing a few scenarios together with their resulting costs. Note that only the consumption plan is considered.

The Code

We’ll be using some very simple C# code to add 10 mil random int values to an ArrayList and to a List<int>, each part of the respective Azure Function:

using System;
using System.Collections;

public static void Run(TimerInfo myTimer, ILogger log)
{
    // Non-generic collection: every int added is boxed into its own heap object
    ArrayList numbers = new ArrayList();
    // Fixed seed, so each run generates the same sequence
    Random random = new Random(1);
    int noNumbers = 10000000;
    for (int i = 0; i < noNumbers; i++)
    {
        numbers.Add(random.Next(10));
    }

    log.LogInformation($"Created an ArrayList of {numbers.Count} elements");
    log.LogInformation($"C# Timer trigger function executed at: {DateTime.Now}");
}

using System;
using System.Collections.Generic;

public static void Run(TimerInfo myTimer, ILogger log)
{
    // Generic collection: ints are stored inline in the backing array, no boxing
    List<int> numbers = new List<int>();
    // Fixed seed, so each run generates the same sequence
    Random random = new Random(1);
    int noNumbers = 10000000;
    for (int i = 0; i < noNumbers; i++)
    {
        numbers.Add(random.Next(10));
    }

    log.LogInformation($"Created a List of {numbers.Count} elements");
    log.LogInformation($"C# Timer trigger function executed at: {DateTime.Now}");
}

Each Azure Function is linked to a trigger set to fire every minute:

Figure 2 – The time trigger used for both functions

The Cost: A Fortune Or Mere Pennies?

An Azure Function’s cost depends on 2 things: a notion called “execution time” and the number of times the function is executed. The execution time is a measure of resource consumption; as per the documentation, “observed resource consumption is calculated by multiplying average memory size in gigabytes by the time in milliseconds it takes to execute the function”.

The chart in figure 1 simply shows the memory used by each function over time as it ran. Summing the area under the graph yields the number of execution units.

The units you’re seeing in the chart are expressed in MB-ms. The prices on the official Microsoft page, however, are per GB-s. To get from MB-ms to GB-s, simply divide by 1,024,000 (1,024 for going from MB to GB x 1,000 for going from ms to s). So the value 205.86M in figure 1 comes down to roughly 200 GB-s per hour.

Per month, we’ll get about 144,000 GB-s (the value above, x24 hours/day x 30 average days/month). This is well inside the free grant of 400,000 GB-s per month.

For our example, since we’re only running each function once per minute, which comes to 43,200 executions per month, we’re nowhere near the free grant of 1 mil executions per month.
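As a sanity check, the conversion from MB-ms to GB-s and the monthly totals can be reproduced in a few lines of C#. This is only a sketch: the class and method names are made up for illustration, and the inputs are the values quoted above (205.86M MB-ms per hour, one execution per minute, a 30-day month).

```csharp
using System;

public static class FreeGrantCheck
{
    // 1,024 MB per GB x 1,000 ms per s, as described in the text.
    public const double MbMsPerGbS = 1024.0 * 1000.0;

    // Converts execution units from MB-ms to GB-s.
    public static double ToGbSeconds(double mbMs) => mbMs / MbMsPerGbS;

    // Number of executions in a month for a timer firing a fixed number of times per minute.
    public static int ExecutionsPerMonth(int perMinute = 1, int days = 30) => perMinute * 60 * 24 * days;

    public static void Main()
    {
        double hourlyMbMs = 205.86e6;               // value read off figure 1
        double hourlyGbS = ToGbSeconds(hourlyMbMs);
        double monthlyGbS = hourlyGbS * 24 * 30;    // 24 hours/day x 30 days/month

        Console.WriteLine($"Per hour:  {hourlyGbS:F1} GB-s");
        Console.WriteLine($"Per month: {monthlyGbS:F0} GB-s (free grant: 400,000)");
        Console.WriteLine($"Executions per month: {ExecutionsPerMonth()} (free grant: 1,000,000)");
    }
}
```

Feeding in the unrounded hourly value gives a monthly total slightly above the rounded 144,000 GB-s, still well inside the free grant.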

So on its own, running our one function doesn’t incur any costs.

The Unforgiving Minute

But how about the performance of the 2 Azure Functions? Adding the 10 mil elements will most likely not be an end in itself, so as an intermediary step we’re interested in handing off the result as soon as possible to the next step, perhaps another Azure Function. How long does it take for the functions to complete? ArrayList first, followed by List<int>:

Figure 3 – Time to add 10 mil int values to an ArrayList in Azure Functions
Figure 4 – Time to add 10 mil int values to a List<int> in Azure Functions

In the center graph, you can see that it takes 10 seconds on average to run the ArrayList code once. The List<int> code fares much better, at about 1 second per run. Even so, the times are far from what one sees on a regular Intel i7-powered laptop running the “classical” .NET Framework. The analysis on such a host (the 3-part series can be found here 1 / 2 / 3) yielded around 1.1 seconds for the ArrayList method and around 200 ms for the List<int> one. As such, we’ve gone from a difference of a factor of about 5 (in the controlled, isolated BenchmarkDotNet environment) to a factor of 10 in Azure Functions on the consumption plan, while using the same code. If choosing the right type was important before, now it is even more so.

Additionally, note that for both Azure Functions there are occasional spikes that sometimes push the time to almost double the average. If you have sensitive timeouts, whereby the downstream steps need the result in a specific timeframe, then choosing the “wrong” type could break your workflow.
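If you want a rough feel for the gap on your own machine before deploying anything, a single Stopwatch measurement is enough. The console program below is a sketch, not the setup used for the numbers in this article: it is far cruder than BenchmarkDotNet, the class and method names are made up for illustration, and the timings will vary with hardware, runtime and bitness.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;

public static class AddTiming
{
    // Same loop as the ArrayList function: each int is boxed on Add.
    public static int FillArrayList(int count)
    {
        var numbers = new ArrayList();
        var random = new Random(1);
        for (int i = 0; i < count; i++)
        {
            numbers.Add(random.Next(10));
        }
        return numbers.Count;
    }

    // Same loop as the List<int> function: ints are stored inline, no boxing.
    public static int FillList(int count)
    {
        var numbers = new List<int>();
        var random = new Random(1);
        for (int i = 0; i < count; i++)
        {
            numbers.Add(random.Next(10));
        }
        return numbers.Count;
    }

    // Times a single run of the given fill method.
    public static TimeSpan Measure(Func<int, int> fill, int count)
    {
        var stopwatch = Stopwatch.StartNew();
        fill(count);
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    public static void Main()
    {
        const int count = 10_000_000;
        TimeSpan arrayListTime = Measure(FillArrayList, count);
        TimeSpan listTime = Measure(FillList, count);

        Console.WriteLine($"ArrayList: {arrayListTime.TotalMilliseconds:F0} ms");
        Console.WriteLine($"List<int>: {listTime.TotalMilliseconds:F0} ms");
    }
}
```

A single run like this includes JIT and GC noise, so treat the output as an order-of-magnitude indication only.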

Memory Usage

The nice thing is that there’s an excellent official pricing calculator for Azure Functions. As one of its parameters is a field called “Memory size”, you might reasonably think that this refers to how much memory the C# code within the Azure Function uses during its lifetime. And for our 2 samples that each add 10 mil elements, we already know the exact values since we’ve computed them here: 128 MB for the List<int> and about 242 MB for the ArrayList.

Note that the memory values above are for 32-bit code. How do we know that the Azure Function we’re using is itself running 32-bit code as well? Because the corresponding default setting tells it to:

Figure 5 – Default platform setting for an Azure Function based on .NET Core

Yet this C# code can’t really run just by itself. The code is compiled, and then a host process needs to run the resulting executable. And a process will use virtual memory for other things aside from the types declared within the user code, such as its image (the executable itself mapped in memory), mapped files and shareable data for the various DLLs it might be using (and other file types as well), a stack for each of its threads, and its own heap.

Here’s what the virtual memory usage looks like for the process running our ArrayList code, against the standard .NET Framework:

Figure 6 – VMMap against the ArrayList function adding 10 mil elements

The data allocated on the heap by our code, which includes the ArrayList and all the boxed int objects its elements point to, falls under the “Private Data” category. Normally it should show up in the “Managed Heap” section, but VMMap has a history of not displaying the managed heap correctly (SO question here). The value itself, highlighted in green and translating to roughly 284 MB, is significantly larger than the space required by the ArrayList and its boxed ints alone, as various reserved blocks are involved as well. As for the overall virtual memory used by the process, as expected it’s even larger (red highlight).

Now according to Microsoft, you’re charged based on the process’ private bytes, as described here, not on the overall virtual memory footprint of the process. However, if one looks at either the Function App’s “private bytes” or the App Insights’ “process private bytes” (which are about the same), the difference is just too big. Here’s what both “private bytes” metrics show in the case of the ArrayList function which, as seen above, should consume around 242 MB during its 10-mil-element-adding lifetime:

Figure 7 – Private bytes metrics against ArrayList function, while adding 10 mil elements

That’s more than 100 MB above what one would expect. The closest match for the values depicted here (~355 MB) among anything displayed previously is the total virtual memory size of the process, as seen in VMMap in figure 6 (~373 MB).

Granted, Azure Functions run on the .NET Core runtime, whereas the VMMap results were obtained on a machine running .NET Framework. However, one would not expect the memory footprint to go up with .NET Core; if anything, quite the contrary. Also, since GCs occur while the code runs, and because VMMap doesn’t collect data very often (only about every 1.5 s, as seen in the trace), its “Private” section might miss some of the difference up to 242 MB.

Let’s see what happens if we feed the observed private bytes size into the computation used for the execution units:

355 MB x 10,000 ms/run = 3,550,000 MB-ms each minute
3,550,000 MB-ms x 60 (once each minute for an hour) = 213,000,000 MB-ms

The latter value is quite close to the average execution units observed in figure 1 (205.86M), so it’s pretty certain that the private bytes metric is the one used in the cost calculation. This in turn raises the question: why is so much private memory used in Azure Functions?
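The same estimate can be written as a tiny helper, which makes it easy to plug in other memory sizes or run times. This is a sketch with made-up names; the inputs are the observed ~355 MB of private bytes, the ~10 s average run time and the 205.86M MB-ms value from figure 1.

```csharp
using System;

public static class ExecutionUnits
{
    // MB-ms consumed by one run: memory in MB times duration in ms.
    public static long PerRun(long memoryMb, long durationMs) => memoryMb * durationMs;

    // MB-ms consumed in an hour for a given number of runs.
    public static long PerHour(long memoryMb, long durationMs, int runsPerHour) =>
        PerRun(memoryMb, durationMs) * runsPerHour;

    public static void Main()
    {
        long estimated = PerHour(355, 10_000, 60);   // 355 MB, 10 s per run, once a minute
        double observed = 205.86e6;                  // value read off figure 1
        double gapPercent = (estimated - observed) / observed * 100;

        Console.WriteLine($"Estimated: {estimated:N0} MB-ms per hour");
        Console.WriteLine($"Gap versus figure 1: {gapPercent:F1}%");
    }
}
```

The estimate lands within about 3.5% of the charted value, which is reasonable given that both the memory and the run time are rounded averages.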

Getting in touch with Microsoft Support revealed a bigger picture. The code doesn’t run within the standard .NET Core host for a console app, but underneath an IIS process (w3wp.exe). It needs this because Azure Functions rely on Azure WebJobs (detailed here). As such, it’s the IIS process’ private bytes that users are charged for. What matters is not the overall memory allocated, but how much w3wp.exe happens to consume at any particular time while the function is running.

The amount of memory consumed just by our data towards the end (when the ArrayList is almost filled) is around 180 MB. The ArrayList is implemented on top of an object[] array that is replaced by one of double the length whenever the number of elements outgrows the current one. The last such internal array has a length of 16,777,216, and since each object reference is 4 bytes (32-bit platform), it consumes 67,108,864 bytes (exactly 64 MB). The boxed ints themselves take 10 mil x 12 bytes = 120,000,000 bytes (~114 MB), yielding a total of ~178 MB.

The internal data that w3wp.exe itself allocates comes on top of these ~178 MB, yielding the value seen back in figure 1.
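The estimate for the data itself can be reproduced with a short calculation. The sketch below uses made-up names and rests on a few assumptions: the backing array starts at a capacity of 4 and doubles whenever it is full, a reference takes 4 bytes and a boxed int 12 bytes (32-bit platform), and object headers for the array itself are ignored.

```csharp
using System;

public static class ArrayListFootprint
{
    // Capacity of the last backing array, assuming it starts at
    // initialCapacity and doubles whenever it is full.
    public static long FinalCapacity(long elements, long initialCapacity = 4)
    {
        long capacity = initialCapacity;
        while (capacity < elements)
        {
            capacity *= 2;
        }
        return capacity;
    }

    // Bytes taken by the references in the last backing array.
    public static long BackingArrayBytes(long elements, int referenceSize = 4) =>
        FinalCapacity(elements) * referenceSize;

    // Bytes taken by the boxed ints the references point to.
    public static long BoxedIntBytes(long elements, int boxedSize = 12) =>
        elements * boxedSize;

    public static void Main()
    {
        const long elements = 10_000_000;
        const double bytesPerMb = 1024.0 * 1024.0;   // 1 MB taken as 1,048,576 bytes

        long arrayBytes = BackingArrayBytes(elements);
        long boxedBytes = BoxedIntBytes(elements);

        Console.WriteLine($"Final capacity: {FinalCapacity(elements):N0} references");
        Console.WriteLine($"Backing array:  {arrayBytes / bytesPerMb:F1} MB");
        Console.WriteLine($"Boxed ints:     {boxedBytes / bytesPerMb:F1} MB");
        Console.WriteLine($"Total:          {(arrayBytes + boxedBytes) / bytesPerMb:F1} MB");
    }
}
```

Changing the element count to 20,000,000 shows how the next doubling of the backing array pushes the footprint up in a step rather than linearly.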

There’s still a GitHub issue currently open here for some of the nitty-gritty details. I’ll update the article once new info is available.

In a final twist, according to the official pricing article, “memory used by a function is measured by rounding up to the nearest 128 MB”. So the 355 MB would have to be rounded up to 384 MB (the next 128 MB multiple) before being multiplied as in the 2 operations above, yielding an even larger MB-ms value for an hour, farther away from the observed one. The mystery remains, for now.
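The rounding rule quoted above is easy to express in code. This is a sketch with made-up names, assuming memory is given in whole MB and always rounded up to the next multiple of 128 MB.

```csharp
using System;

public static class BillingMemory
{
    public const int StepMb = 128;

    // Rounds the observed memory up to the next multiple of 128 MB.
    public static int RoundUp(int observedMb) =>
        (observedMb + StepMb - 1) / StepMb * StepMb;

    public static void Main()
    {
        int billedMb = RoundUp(355);                 // observed private bytes, in MB
        long perHour = (long)billedMb * 10_000 * 60; // 10 s per run, once a minute

        Console.WriteLine($"Billed memory: {billedMb} MB");
        Console.WriteLine($"Execution units per hour: {perHour:N0} MB-ms");
    }
}
```

With 384 MB the hourly figure comes to 230,400,000 MB-ms, noticeably above both the 213,000,000 MB-ms estimate and the value charted in figure 1.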

Pushing the Buttons

For now let’s assume a 384 MB memory usage for the ArrayList function, running against the 10 mil elements. How about running more functions at the same time?

Well, running 3 functions, each just like the ArrayList one described so far, will get you into paying territory:

Figure 8 – Cost incurred while running 3 ArrayList functions concurrently, each adding 10 mil elements once a minute

Even if we ran 100 such ArrayList functions, the cost after one month would not be overly high:

Figure 9 – Cost incurred while running 100 ArrayList functions concurrently, each adding 10 mil elements once a minute
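The calculator’s arithmetic for the scenarios above can be approximated by hand. The sketch below assumes consumption-plan rates of $0.000016 per GB-s and $0.20 per million executions (these rates are an assumption on my part, not taken from the figures, so check the current pricing page), together with the monthly free grants of 400,000 GB-s and 1 mil executions mentioned earlier. The class and method names are made up for illustration, and the results may not match the calculator screenshots exactly.

```csharp
using System;

public static class CostEstimate
{
    // Assumed consumption-plan rates; verify against the current pricing page.
    public const double PricePerGbS = 0.000016;
    public const double PricePerMillionExecutions = 0.20;

    // Monthly free grants, as quoted in the article.
    public const double FreeGbS = 400_000;
    public const double FreeExecutions = 1_000_000;

    // Estimated monthly cost in USD for a number of identical functions.
    public static double MonthlyCost(int functions, int memoryMb, double secondsPerRun, int runsPerMonth)
    {
        double executions = (double)functions * runsPerMonth;
        double gbSeconds = executions * (memoryMb / 1024.0) * secondsPerRun;

        // Only the consumption above the free grants is billed.
        double billableGbS = Math.Max(0, gbSeconds - FreeGbS);
        double billableExecutions = Math.Max(0, executions - FreeExecutions);

        return billableGbS * PricePerGbS
             + billableExecutions / 1_000_000 * PricePerMillionExecutions;
    }

    public static void Main()
    {
        const int runsPerMonth = 60 * 24 * 30;   // once a minute, 30-day month
        foreach (int functions in new[] { 1, 3, 100 })
        {
            double cost = MonthlyCost(functions, 384, 10, runsPerMonth);
            Console.WriteLine($"{functions,3} function(s): ${cost:F2} per month");
        }
    }
}
```

Under these assumptions a single function stays free, 3 functions come to a bit over a dollar a month, and 100 functions to roughly $250.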

But why stop at a mere 10 mil elements? Let’s go higher, but stay within the 1.5 GB of memory that an Azure Function is currently limited to. Let’s raise the number of elements to 20 mil and see how the ArrayList function performs over a few dozen minutes:

Figure 10 – Performance of the ArrayList function, while operating against 20 mil elements

Notice that as the number of elements increases, so do the memory used and the time it takes for the code to run. The closest multiple of 128 MB to the observed memory consumption is 512 MB. Let’s update the numbers in the calculator accordingly, keeping the executions for 100 functions in parallel:

Figure 11 – Cost incurred while running 100 ArrayList functions concurrently, each adding 20 mil elements once a minute

Now for the List<int> function running against the same 20 mil elements:

Figure 12 – Performance of the List<int> function, while operating against 20 mil elements

The private bytes metrics weren’t quite consistent, with one of them actually missing most of the time. The average “process private bytes” value was chosen, and this was fed into the calculator, again keeping the executions to 100 functions in parallel:

Figure 13 – Cost incurred while running 100 List<int> functions concurrently, each adding 20 mil elements once a minute

Replacing the ArrayList with List<int> this time results in savings of $700.
