Last updated: Sep 21 2015

In Episerver CMS, you can track a website's broken links using the Link Validator scheduled job. The job reads the links stored in tblContentSoftLink, performs a HEAD request against each one, and saves the link's status back to tblContentSoftLink. The result of the validation job is available in the Report Center > Link Status report.
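
To illustrate what checking a link with a HEAD request involves, here is a minimal Python sketch. It is not Episerver's implementation; the function names and the timeout are hypothetical, and the user agent string is borrowed from the configuration example on this page.

```python
import urllib.error
import urllib.request

USER_AGENT = "EPiServer LinkValidator"  # value taken from the sample configuration


def build_head_request(url, user_agent=USER_AGENT):
    """Build a HEAD request, which asks for status and headers but no body."""
    return urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": user_agent}
    )


def head_status(url, timeout=10.0):
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(build_head_request(url), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered with an error status such as 404
    except (urllib.error.URLError, OSError):
        return None  # DNS failure, refused connection, timeout, and so on
```

Note that urlopen follows redirects, so this sketch reports the status of the final target rather than the first response.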

The scheduled job first gets a batch of up to 1000 links from tblContentSoftLink. A batch contains only links that are unchecked or that were last checked before the job started. The job uses the date the link was last checked and the recheck interval to determine whether a link should be checked again.
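
The selection rule can be sketched roughly as follows. This is an illustration under assumptions, not the actual database query: the names is_due and next_batch are hypothetical, the recheck interval is assumed to be measured from the last check date, and the sketch ignores whether the link was previously found broken.

```python
from datetime import datetime, timedelta

BATCH_SIZE = 1000  # batch size stated in the text


def is_due(last_checked, job_start, recheck_interval):
    """Return True if a link should be included in the next batch."""
    if last_checked is None:
        return True  # never checked
    if last_checked >= job_start:
        return False  # already handled during this run
    # Assumption: the recheck interval counts from the last check date.
    return job_start - last_checked >= recheck_interval


def next_batch(links, job_start, recheck_interval, size=BATCH_SIZE):
    """links is an iterable of (url, last_checked) pairs; returns up to size URLs."""
    due = [url for url, last in links if is_due(last, job_start, recheck_interval)]
    return due[:size]


start = datetime(2015, 9, 21, 12, 0)
sample = [
    ("http://a.example/", None),                        # never checked
    ("http://b.example/", start - timedelta(days=45)),  # interval elapsed
    ("http://c.example/", start - timedelta(days=2)),   # checked recently
]
batch = next_batch(sample, start, timedelta(days=30))
# batch == ["http://a.example/", "http://b.example/"]
```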

Each link in a batch is checked using a HEAD request if the server's robots.txt allows it. No host is checked more than once every five seconds. If a link points to a host that was checked within the last five seconds, the job waits five seconds and then checks the link.
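
The per-host delay could be modeled along these lines. This is a sketch only: the HostThrottle class is hypothetical, and it follows the wording above literally by waiting a full five seconds rather than only the remaining time. The clock and sleep functions are injectable so the behavior can be exercised without real delays.

```python
import time
from urllib.parse import urlsplit

MIN_HOST_INTERVAL = 5.0  # seconds between checks of the same host, from the text


class HostThrottle:
    """Remembers when each host was last checked and delays repeat checks."""

    def __init__(self, clock=time.monotonic, sleep=time.sleep):
        self._clock = clock
        self._sleep = sleep
        self._last_hit = {}  # host name -> time of the most recent check

    def wait_for(self, url):
        """Block if the URL's host was checked less than five seconds ago."""
        host = urlsplit(url).hostname or ""
        last = self._last_hit.get(host)
        if last is not None and self._clock() - last < MIN_HOST_INTERVAL:
            self._sleep(MIN_HOST_INTERVAL)  # full wait, as the text describes
        self._last_hit[host] = self._clock()
```

A caller would invoke wait_for(url) immediately before sending each request, so links on different hosts proceed without delay while links on the same host are spaced out.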

The job saves the link's status, link check date, and HTTP status code (if possible) to tblContentSoftLink. The job also records when a link was first found broken. After the first batch of links is checked, a new batch is fetched from the database. The job continues until it can get no more unchecked links from the database, or until the job's run time exceeds the value set in maximumRunTime. The job also stops if the number of consecutive errors on external links exceeds the value set in externalLinkErrorThreshold, because this may indicate a network problem on the site server.
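
The overall control flow, with its three stop conditions, might look like the sketch below. All names are hypothetical. Two details are assumptions rather than documented behavior: the run time is checked before each link, and only a successful external check resets the consecutive error counter.

```python
import time


def run_link_validation(fetch_batch, check_link, save_result,
                        max_run_time, error_threshold, clock=time.monotonic):
    """Process batches until one of the stop conditions is met.

    fetch_batch() returns a list of unchecked links (empty when none remain).
    check_link(link) returns a pair (ok, is_external).
    Returns a short string naming the reason the run ended.
    """
    started = clock()
    consecutive_external_errors = 0
    while True:
        batch = fetch_batch()
        if not batch:
            return "no more links"
        for link in batch:
            # Assumption: the run time limit is evaluated before each link.
            if clock() - started > max_run_time:
                return "maximum run time exceeded"
            ok, is_external = check_link(link)
            save_result(link, ok)
            if is_external and not ok:
                consecutive_external_errors += 1
                if consecutive_external_errors > error_threshold:
                    return "too many consecutive external errors"
            elif is_external:
                # Assumption: a working external link resets the counter.
                consecutive_external_errors = 0
```

In this sketch, fetch_batch must stop returning links that save_result has already recorded; otherwise the loop would only end through the run time limit or the error threshold.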

 

Configuring the Link Validator

No settings are required, but you can add them to customize the link validation job's behavior. Add the <linkValidator> node as a child of the <episerver> node in the web.config file. Example:

XML
<linkValidator
  externalLinkErrorThreshold="10"
  maximumRunTime="4:00:00"
  recheckInterval="30.00:00:00"
  userAgent="EPiServer LinkValidator"
  proxyAddress="http://myproxy.mysite.com"
  proxyUser="myUserName"
  proxyPassword="secretPassword"
  proxyDomain=".mysite.com"
  internalLinkValidation="Api">
  <excludePatterns>
    <add regex=".*doc"/>
    <add regex=".*pdf"/>
  </excludePatterns>
</linkValidator>
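
The maximumRunTime and recheckInterval values in the example appear to follow the .NET TimeSpan text format, [d.]hh:mm:ss. Under that assumption, "4:00:00" means four hours and "30.00:00:00" means thirty days. The Python sketch below reads this simplified form to make the values concrete; parse_timespan is a hypothetical helper and does not cover every variant that .NET accepts, such as fractional seconds.

```python
import re
from datetime import timedelta

# Simplified form of the .NET TimeSpan text format: [d.]hh:mm:ss
_TIMESPAN = re.compile(r"^(?:(\d+)\.)?(\d{1,2}):(\d{2}):(\d{2})$")


def parse_timespan(text):
    """Convert a string such as '30.00:00:00' into a timedelta."""
    match = _TIMESPAN.match(text.strip())
    if match is None:
        raise ValueError(f"unsupported time span: {text!r}")
    # The day group is optional, so it is None when absent.
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


max_run_time = parse_timespan("4:00:00")          # timedelta(hours=4)
recheck_interval = parse_timespan("30.00:00:00")  # timedelta(days=30)
```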

Use these settings to configure the behavior of the link validation job.

  • externalLinkErrorThreshold. If the number of consecutive errors on external links exceeds this value, the job aborts.
  • maximumRunTime. The maximum time the scheduled job is allowed to run.
  • recheckInterval. A link that was validated as working is only rechecked after the configured time span elapses.
  • userAgent. The user agent string to use when validating a link. 
  • proxyAddress. Web proxy address to use when validating links. 
  • proxyUser. Web proxy user for authenticating proxy connection. 
  • proxyPassword. Web proxy password to authenticate the proxy connection. 
  • proxyDomain. Web proxy domain to authenticate the proxy connection. 
  • internalLinkValidation. How the link validator handles internal links. Possible values:
        - Off. Internal links are ignored.
        - Api. The internal API is used to validate that the referenced page exists. [default]
        - Request. Internal links are checked the same way as external links, using a HEAD request.
  • excludePatterns. A list of patterns for links that the link validation job skips. Use the regex attribute to specify which links to skip.
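
To show how exclude patterns such as those in the example might behave, here is a small Python sketch. It assumes the patterns are applied as an unanchored search anywhere in the URL, similar to .NET's Regex.IsMatch; the documentation above does not state this, and Python and .NET regex syntax differ in some details. The is_excluded helper is hypothetical.

```python
import re

EXCLUDE_PATTERNS = [r".*doc", r".*pdf"]  # patterns from the sample configuration


def is_excluded(url, patterns=EXCLUDE_PATTERNS):
    """Return True if any pattern matches somewhere in the URL."""
    # Assumption: unanchored search, not a full-string match.
    return any(re.search(pattern, url) for pattern in patterns)


is_excluded("http://example.com/files/report.pdf")  # True
is_excluded("http://example.com/about")             # False
is_excluded("http://example.com/docs/start")        # True: the URL contains "doc"
```

If the unanchored assumption holds, a pattern like .*doc also skips any URL that merely contains "doc". A stricter pattern such as \.pdf$ would limit the match to URLs that end in that extension.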

Known limitations

Except for pages, the link validator does not handle private resources. This includes documents and images stored on a local file system that does not allow anonymous access. If you use forms authentication, links to these resources are not validated and do not appear in the link report. If you use basic or Windows authentication, links to these resources are reported with a 401 (Unauthorized) status in the link report. This may be the case for an intranet site with Windows authentication enabled and anonymous access disabled.

Related topic

  • Configuration describes syntax used in the description of the configuration elements.
