Views: 626
Number of votes: 4
Average rating:

Two bugs in ASPNET that break Cache memory management

TL;DR: A bug in ASPNET memory monitoring prevents the cache monitoring / scavenging tread from running if the memory usage in the host at the time of application startup happens to be in a specific range in relation to the configured cache thresholds. Another bug causes the same behavior if no cache threshold is explicitly configured (falling back to default). This can further cause problems in Azure Web Apps because it triggers the Proactive Auto Heal feature, repeatedly recycling the application for no reason. The workaround is to set the privateBytesPollTime option to less than 30 seconds and also provide a value for percentagePhysicalMemoryUsedLimit (90 % or less to not conflict with Proactive Auto Heal).

Intro and previous investigations

In my previous blog post I started to investigate a problem with the ASPNET HTTP runtime cache and Azure Proactive Auto Heal and the buggy behavior of the configured memory usage threshold. I found an apparent correlation between certain settings for percentagePhysicalMemoryUsedLimit and the application failing to keep the cache memory usage under control, causing Proactive Auto Heal to recycle the application. But I also acknowledged that the behavior was intermittent, and I now know why.

A quick overview of ASPNET memory management internals

ASPNET (this refers to .net Framework 4.7.1) uses the AspNetMemoryMonitor class to monitor the application's memory usage. Internally it uses the LowPhysicalMemoryMonitor class to periodically compare the current memory pressure to the configured (or default) thresholds, and trigger cache scavenging/trimming if needed. Or, at least it is supposed to do this...

Bug 1: Periodic checks fail to start under certain conditions

For the periodic memory checks, LowPhysicalMemoryMonitor uses a System.Threading.Timer instance that is initialized like this in its constructor:

this._timer = new Timer(new TimerCallback(this.MonitorThread), (object) null, -1, this._currentPollInterval);

Note the third argument, -1, which means that the timer is initialized with a due date (next time it will trigger the callback) "infinitely" far off in the future. So essentially the timer is disabled from the start.

The framework will then call the LowPhysicalMemoryMonitor.Start method which calls AdjustTimer(false). AdjustTimer is actually called within the timer callback, to adjust the time for the next callback according to the current memory pressure. The callback will run more often if the memory pressure is high. So AdjustTimer is designed to do more than just start the timer for the Start method. It looks like this (reflected code):

internal void AdjustTimer(bool disable = false)
{
    lock (this._timerLock)
    {
        if (this._timer == null)
            return;
        if (disable)
            this._timer.Change(-1, -1);
        else if (this.PressureLast >= this.PressureHigh)
        {
            if (this._currentPollInterval <= 5000)
                return;
            this._currentPollInterval = 5000;
            this._timer.Change(this._currentPollInterval, this._currentPollInterval);
        }
        else if (this.PressureLast > this.PressureLow / 2)
        {
            int num = Math.Min(LowPhysicalMemoryMonitor.s_pollInterval, 30000);
            if (this._currentPollInterval == num)
                return;
            this._currentPollInterval = num;
            this._timer.Change(this._currentPollInterval, this._currentPollInterval);
        }
        else
        {
            if (this._currentPollInterval == CacheSizeMonitor.PollInterval)
                return;
            this._currentPollInterval = CacheSizeMonitor.PollInterval;
            this._timer.Change(this._currentPollInterval, this._currentPollInterval);
        }
    }
}

PressureLast is the last observed memory pressure (% physical memory used).
PressureHigh is the configured (or default) percentagePhysicalMemoryUsedLimit.
PressureLow is a value calculated from PressureHigh, lower than PressureHigh (generally it is PressureHigh - 9).
s_pollInterval is a default timer interval that is initialized to the value of the privateBytesPollTime setting, converted to milliseconds. The default for privateBytesPollTime is 2 minutes.
_currentPollInterval is the timer interval to use, which is also set to the Timer using the Change method. This field is initialized to 30 seconds.

Now, assume AdjustTimer is called by Start to start the timer. The fields will have the values described above, and most importantly, PressureLast is some value it has actually measured. So it can be anything. It can fulfill PressureHigh > PressureLast > PressureLow / 2, going into the second else if. That will get the minimum of 30000 and s_pollInterval, which by default is 120000, so 30000 is used. Then it compares 30000 to _currentPollInterval and does an early exit if they are the same, which they are because that is what _currentPollInterval was initialized with! This means the timer is never activated, AdjustTimer is never called again, and the LowPhysicalMemoryMonitor and the cache scavenging handled by it will remain inactive for the lifetime of the application.

How does this explain the apparent correlation with percentagePhysicalMemoryUsedLimit I described in the previous blog post? Because values in the range identified in that post are likely to arrange PressureLow and PressureHigh in such a way that a typical PressureLast value at application startup would match the scenario described above. At low values for percentagePhysicalMemoryUsedLimit it is much more likely to end up in the PressureLast >= PressureHigh scenario, and at high values it is more likely to end up in the PressureLast <= PressureLow / 2 scenario.

Bug 2: Periodic checks are stopped if default memory threshold is used

Let's take a look at LowPhysicalMemoryMonitor.ReadConfig, which is called from the constructor:

private void ReadConfig(CacheSection cacheSection)
{
  int val2 = cacheSection != null ? cacheSection.PercentagePhysicalMemoryUsedLimit : 0;
  if (val2 == 0)
    return;
  this._pressureHigh = Math.Max(3, val2);
  this._pressureLow = Math.Max(1, this._pressureHigh - 9);
  LowPhysicalMemoryMonitor.s_pollInterval = (int) Math.Min(cacheSection.PrivateBytesPollTime.TotalMilliseconds, (double) int.MaxValue);
}

Note the first line. The default for percentagePhysicalMemoryUsedLimit if it is omitted from config is actually 0. So unless you set that setting, you will end up in the early exit in line 3. The most important implication of that is that the last line never runs, so s_pollInterval retains its default value of 0.

Now say that the Bug 1 was not triggered and the timer is now running, periodically calling into AdjustTimer. Then take look again at the PressureLast > PressureLow / 2 case. Because s_pollInterval is now 0, that will be used as the new interval set to the timer. The call chain from Timer.Change ends up in System.Threading.TimerQueue.UpdateTimer where the interval (period) is used like this:

timer.m_period = (period == 0) ? Timeout.UnsignedInfinite : period;

So passing 0 disables the timer. And again we are in a situation where the memory monitoring and cache scavenging will be off for the lifetime of the application.

Workaround

Bug 1 can be avoided by specifying a privateBytesPollTime setting of less than 30 seconds, forcing the first call to AdjustTimer to adjust the timer and thereby starting it, even in the PressureLast > PressureLow / 2 scenario.
Bug 2 can be avoided simply by specifying a setting for percentagePhysicalMemoryUsedLimit other than the default of 0.

  Please login to post a comment