[Thread was being aborted] for long running scheduled jobs

Vote:
 

We use scheduled jobs to handle many tasks on our platform (Catalog indexing, product imports, site maps, etc.). Some of these scheduled jobs run for a long time (up to 5-6 hours). 

The last few months, these jobs fail with "Thread was being aborted" in the Episerver Scheduled jobs UI. They are scheduled to run every night, and I would say that they fail in about 8 out of 10 runs.

I have done the following things to try and solve this:

  • Deactivate CPU and memory limit on the CMS and Commerce application pools.
    • In advanced settings set CPU limit to 0 and Private Memory limit to 0.
  • Set start mode of all app pools to "AlwaysRunning"
  • Verified that no app pool recycles is made during the run of the job.
  • Verified that the server does not run out of CPU or Memory.
  • Set up a script using curl to request the site every five minutes.

Any tips on how to resolve this issue is greatly appreciated!

#228804
Oct 02, 2020 9:18
Tomas Hensrud Gulla - Oct 02, 2020 11:09
Everything I was going to suggest, you have already done.
Vote:
 

Would it be possible that you look into the log and obtain the full stacktrace of the error?

#228813
Oct 02, 2020 12:23
Vote:
 

Thanks for helping out!

Which log would I need to look into in order find this error?

We have a dedicated log file for the catalog indexing job. In this log, there's an abrupt stop to all log messages at the time of the "thread was being aborted" message.

I have also looked through our main log file (Defined as ErrorFileLogAppender in our log4net config). I cannot find a related stack trace in this log, but I do find this message:

2020-09-29 00:30:13,159 [387] ERROR EPiServer.MarketingAutomationIntegration.InitializationModule: There is no Marketing Automation provider package installed. Halt MA InitializationERROR EPiServer.MarketingAutomationIntegration.InitializationModule: There is no Marketing Automation provider package installed. Halt MA Initialization

I don't think this message is related the problem, but since this message is logged by a InitializationModule - this probably indicates an application restart?

From the history of the failing scheduled job, it seems that the thread was aborted at 12:28 (Which is only to minutes from the log entry above)

9/29/2020 12:28:54 AM FAILED Thread was being aborted.

Could this indicate that the jobs are aborted because of an unexpected shutdown of the application?

#228814
Edited, Oct 02, 2020 12:37
Vote:
 

Strange. That looks like an error message that should only be thrown during startup. Did your site restart during that time?

Could you wrap your job into a big try catch and try to log the error there?

#228817
Oct 02, 2020 12:51
Vote:
 

Yes - this message should only be thrown during startup. Where can I monitor for site restarts?

I'll try wrapping the job in a try catch and see if I can get the error.

It's worth mentioning that we experience this issue with multiple scheduled jobs.

In another scheduled job (run time when successful ~1 hour) we get this error in the Scheduled job history tab:

10/2/2020 1:55:50 PM FAILED Exception in ecf_CatalogEntryByCode: [Thread was being aborted.]

There's no error logged here either.

#228818
Edited, Oct 02, 2020 13:11
Vote:
 

It seems that the job is already wrapped in a try catch, so the error should be logged:

        public override string Execute()
        {
            if (!Config.CanBeExecutedNow)
            {
                var message = string.Format("Job can't be executed at this time. {0}", Config.Time);
                Log.Error(message);
                return message;
            }

            var lockTaken = false;
            Monitor.TryEnter(SyncObject, ref lockTaken);
            if (!lockTaken)
            {
                return "Job was already running.";
            }

            try
            {
                LogInfo("Started");

                var result = new StringBuilder();
                foreach (var applicationRow in AppContext.Current.GetApplicationDto().Application)
                {
                    var sw = Stopwatch.StartNew();
                    IndexApplication(applicationRow.Name);
                    sw.Stop();
                    result.AppendLine(string.Format("Indexing of {0} completed in {1}", applicationRow.Name, sw.Elapsed.Friendly()));
                }

                LogInfo("Ended");

                return result.ToString();
            }
            catch (Exception e) 
            {
                Log.Error(e);
                return e.Message;
            }
            finally
            {
                Monitor.Exit(SyncObject);
            }
        }

Maybe the solution is in figuring out why these restarts happen multiple times every day - the error

 ERROR EPiServer.MarketingAutomationIntegration.InitializationModule

happens multiple times every day.

How would I proceed to debug the unexpected restarts?

#228820
Oct 02, 2020 13:51
Vote:
 

https://weblogs.asp.net/scottgu/433194 perhaps this could help.

Also you can return the e.ToString() instead of e.Message 

#228821
Oct 02, 2020 13:57
Henrik S - Oct 02, 2020 14:20
Thanks! I'll try that. :)
Vote:
 

I was able to log the application restart using the approach you linked, and the following entry was logged to the event log:

The following information was included with the event: 



_shutDownMessage=BuildResult change, cache key=cindexingservice.svc.1f180bcd
HostingEnvironment initiated shutdown
HostingEnvironment caused shutdown

_shutDownStack=   at System.Environment.GetStackTrace(Exception e, Boolean needFileInfo)
   at System.Environment.get_StackTrace()
   at System.Web.Hosting.HostingEnvironment.InitiateShutdownInternal()
   at System.Web.Hosting.HostingEnvironment.InitiateShutdownWithoutDemand()
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
   at System.Threading.ThreadPoolWorkQueue.Dispatch()

It seems related to the issue experienced by other users also: https://world.episerver.com/forum/developer-forum/Problems-and-bugs/Thread-Container/2019/8/application-restart-loop-because-of-buildresult-change-cache-keycindexingservice-svc-1f180bcd

#228845
Oct 03, 2020 13:04
Vote:
 

We were able to solve this issue by following the instructions in this post.

Our issues was not caused by request timeouts or unexpected app pool recycles, which often seem to be the cause of these kinds of problems.

Logging the application restarts as proposed by Quan M. was key in further debugging.

Thanks!

#229454
Oct 15, 2020 7:45
* You are NOT allowed to include any hyperlinks in the post because your account hasn't associated to your company. User profile should be updated.