Validate links
Track broken links for a website using the link validator scheduled job.
In Optimizely Content Management System (CMS), you can track broken links for a website using the link validator scheduled job. The Link Validation scheduled job checks the links in tblContentSoftLink, performs a HEAD request against each one, and saves each link's status back to tblContentSoftLink. The result of the validation job is available as a report called Link Validation report in Report Center.
The scheduled job first gets a batch of up to 1000 links from tblContentSoftLink. The batch contains only unchecked links or links that were last checked before the job started. The job uses the date the link was last checked and the re-check interval to determine if the link should be checked again.
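The selection rule above can be sketched as a small predicate. This is only an illustration, not the job's actual implementation: IsDueForCheck is a hypothetical helper, and measuring the re-check interval from the job's start time is an assumption.

```csharp
using System;

// Hypothetical helper (not part of the CMS API): decides whether a link
// belongs in the next batch. lastChecked is null for a link that has never
// been checked.
bool IsDueForCheck(DateTime? lastChecked, DateTime jobStarted, TimeSpan recheckInterval)
{
    if (lastChecked is null)
    {
        return true; // unchecked links are always included
    }

    // Assumption: the link must have been checked before this run started,
    // and the re-check interval is measured from the job's start time.
    return lastChecked.Value < jobStarted
        && jobStarted - lastChecked.Value >= recheckInterval;
}

var started = new DateTime(2024, 1, 1, 12, 0, 0, DateTimeKind.Utc);
var interval = TimeSpan.FromHours(7);

Console.WriteLine(IsDueForCheck(null, started, interval));                 // True
Console.WriteLine(IsDueForCheck(started.AddHours(-1), started, interval)); // False
Console.WriteLine(IsDueForCheck(started.AddHours(-8), started, interval)); // True
```

With the seven-hour RecheckInterval used in this example, a link checked one hour before the job started is skipped, while a link checked eight hours earlier is validated again.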
Each link in the batch is checked using a HEAD request, if the server's robots.txt allows it. No host is checked more than once every five seconds. If a link exists on a host that was checked in the last five seconds, the job waits five seconds and then checks the link.
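The per-host throttling described above could look roughly like the following. This is a minimal sketch under stated assumptions: DelayBeforeRequest and the lastRequestByHost dictionary are hypothetical names, and the sketch covers neither the robots.txt check nor the HEAD request itself.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the CMS API): returns how long to wait
// before requesting a link, given when each host was last requested.
TimeSpan DelayBeforeRequest(
    Dictionary<string, DateTime> lastRequestByHost,
    Uri link,
    DateTime now,
    TimeSpan minimumInterval)
{
    if (lastRequestByHost.TryGetValue(link.Host, out var lastRequest)
        && now - lastRequest < minimumInterval)
    {
        // Following the description above, wait the full interval rather
        // than only the remaining part of it.
        return minimumInterval;
    }

    return TimeSpan.Zero; // host not requested recently, no wait needed
}

var lastRequestByHost = new Dictionary<string, DateTime>(StringComparer.OrdinalIgnoreCase);
var now = new DateTime(2024, 1, 1, 12, 0, 0, DateTimeKind.Utc);
var minimumInterval = TimeSpan.FromSeconds(5);

// example.com was requested two seconds ago; example.org has not been requested.
lastRequestByHost["example.com"] = now.AddSeconds(-2);

Console.WriteLine(DelayBeforeRequest(lastRequestByHost, new Uri("https://example.com/a"), now, minimumInterval)); // 00:00:05
Console.WriteLine(DelayBeforeRequest(lastRequestByHost, new Uri("https://example.org/b"), now, minimumInterval)); // 00:00:00
```

The five-second value here mirrors ExternalLinkMinimumRequestInterval in the configuration example; a caller would wait for the returned delay, send the request, and then record the request time for that host.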
The job saves the status of the link and the date the link was checked, and includes the HTTP status code if possible, to tblContentSoftLink. The job saves information about when a link was first found broken. After the first batch of links is checked, a new batch is fetched from the database.
The job continues until it cannot get any more unchecked links from the database, or the job's runtime has exceeded the value set in MaximumRunTime. The job also stops if a large number of consecutive errors are found on external links, in case there is a general network problem with the server running the site.
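The three stop conditions can be summarized in one check that runs after each batch. This is a sketch only: ShouldStop and errorThreshold are hypothetical (the documentation does not state how many consecutive errors count as "a large number"), and treating a MaximumRunTime of TimeSpan.Zero as "no limit" is an assumption.

```csharp
using System;

// Hypothetical helper (not part of the CMS API): decides whether the job
// should stop instead of processing another batch.
bool ShouldStop(
    int fetchedLinkCount,
    TimeSpan elapsed,
    TimeSpan maximumRunTime,
    int consecutiveExternalErrors,
    int errorThreshold)
{
    // 1. The database returned no more links to check.
    if (fetchedLinkCount == 0)
    {
        return true;
    }

    // 2. The run time limit was exceeded.
    //    Assumption: TimeSpan.Zero means that no limit is set.
    if (maximumRunTime > TimeSpan.Zero && elapsed > maximumRunTime)
    {
        return true;
    }

    // 3. Many consecutive external errors suggest a general network problem.
    //    errorThreshold is an illustrative value chosen by the caller.
    return consecutiveExternalErrors >= errorThreshold;
}

// A full batch was fetched, but the job has run for 90 minutes with a one-hour limit.
Console.WriteLine(ShouldStop(1000, TimeSpan.FromMinutes(90), TimeSpan.FromHours(1), 0, 25)); // True

// Same run time with no limit configured and no errors: keep going.
Console.WriteLine(ShouldStop(1000, TimeSpan.FromMinutes(90), TimeSpan.Zero, 0, 25)); // False
```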
Configure link validation
None of the settings are required, but you can use them to customize the behavior of the link validation job. You can configure the job programmatically or with configuration files, as described in Configure CMS. Here is an example:
services.Configure<LinkValidatorOptions>(options =>
{
    options.ExternalLinkMinimumRequestInterval = TimeSpan.FromSeconds(5);
    options.MaximumRunTime = TimeSpan.Zero;
    options.RecheckInterval = TimeSpan.FromHours(7);
    options.UserAgent = "EPiServer Link Checker";
    options.ProxyAddress = new Uri("http://myproxy.mysite.com");
    options.ProxyUser = "myUserName";
    options.ProxyPassword = "secretPassword";
    options.ProxyDomain = ".mysite.com";
    options.InternalLinkValidation = ValidationType.Api;
    options.ExcludePatterns.Add(".*doc");
    options.ExcludePatterns.Add(".*pdf");
});
The available settings are described at Link Validation.
Known limitations
The link validator does not handle private resources, with the exception of pages. Private resources include documents and images stored on a local file system that does not allow anonymous access. If you use forms authentication, links to these resources are not validated and do not appear in the link report. If you use basic or Windows authentication, links to these resources result in a 401 (Unauthorized) status in the link report. This may be the case for an intranet site with Windows authentication and anonymous access disabled.