Area: Episerver Search & Navigation
Applies to versions: 12 and higher

Connectors


This topic introduces the predefined search connectors in Episerver Search & Navigation (formerly Episerver Find) and their configuration options. Search connectors let you index content from external sources and integrate the resulting search hits on your website.

Including external content

Episerver Search & Navigation supports these predefined search connector types to include external content: Crawler and RSS/Atom.

Content indexed through a connector is stored as the type EPiServer.Find.Framework.WebContent:

public class WebContent
{
    public String SearchTitle;
    public String SearchHitUrl;
    public String SearchText;
    public String SearchSummary;
    public Dictionary<string, IndexValue> SearchMetaData { get; set; }
}

Fine-tuning crawling and indexing

From the Episerver Search & Navigation administrative interface, you can fine-tune indexing by excluding internet media types, and by including or excluding parts of a website from crawling and indexing. See the Episerver User Guide.

Excluding media types

When excluding media types, use the standard internet media type (MIME type) names, such as text/css. See also: Media Types. The following media types are excluded by default when indexing:

  • text/css
  • text/javascript
  • text/ecmascript
  • application/x-pointplus
  • application/x-javascript
  • application/javascript
  • application/ecmascript
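
To illustrate what such an exclusion amounts to, here is a minimal sketch that checks a Content-Type header value against the default list above. The class and method names are hypothetical and not part of the Episerver API; the case-insensitive comparison and the stripping of parameters such as charset are assumptions about reasonable behavior, not a description of how the crawler is implemented.

```csharp
using System;
using System.Collections.Generic;

public static class MediaTypeFilter
{
    // Default exclusions copied from the list above.
    // Assumption: names are compared case-insensitively.
    private static readonly HashSet<string> ExcludedByDefault =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "text/css", "text/javascript", "text/ecmascript",
            "application/x-pointplus", "application/x-javascript",
            "application/javascript", "application/ecmascript"
        };

    // Returns true when the Content-Type value names an excluded media type.
    public static bool IsExcluded(string contentType)
    {
        if (string.IsNullOrWhiteSpace(contentType)) return false;

        // Drop parameters such as "; charset=utf-8" before comparing.
        string mediaType = contentType.Split(';')[0].Trim();
        return ExcludedByDefault.Contains(mediaType);
    }

    public static void Main()
    {
        Console.WriteLine(IsExcluded("text/css; charset=utf-8")); // True
        Console.WriteLine(IsExcluded("text/html"));               // False
    }
}
```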

Excluding query strings

You can exclude any query string parameter. A typical use is to exclude known tracking parameters. For example, in the URL http://www.episerver.se/cms/innehallshantering?utm_source=google, you can exclude utm_source to prevent the unintentional incrementing of a campaign counter.

Common exclusions of this type:

  • PHPSESSID
  • SESSIONID
  • JSESSIONID
  • ASPSESSIONID
  • sid
  • zenid

Note: All strings are case-sensitive. Do not include wildcards or whitespace.
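
The effect of a query string exclusion can be sketched as follows. This is an illustration only: QueryStringFilter and RemoveParameters are hypothetical names, and the sketch assumes that excluded parameters are simply dropped from the URL before it is requested. It compares names with an ordinal, case-sensitive comparison to reflect the note above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QueryStringFilter
{
    // Removes every query parameter whose name is in the excluded set.
    // Simplification: a trailing "#fragment" is not treated separately.
    public static string RemoveParameters(string url, ISet<string> excluded)
    {
        int queryStart = url.IndexOf('?');
        if (queryStart < 0) return url; // no query string, nothing to do

        string basePart = url.Substring(0, queryStart);
        string query = url.Substring(queryStart + 1);

        string[] kept = query
            .Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(pair => !excluded.Contains(pair.Split('=')[0]))
            .ToArray();

        return kept.Length == 0 ? basePart : basePart + "?" + string.Join("&", kept);
    }

    public static void Main()
    {
        // Ordinal comparison keeps the match case-sensitive.
        var excluded = new HashSet<string>(StringComparer.Ordinal) { "utm_source", "PHPSESSID" };

        Console.WriteLine(RemoveParameters(
            "http://www.episerver.se/cms/innehallshantering?utm_source=google", excluded));
        // http://www.episerver.se/cms/innehallshantering
    }
}
```

Because the comparison is case-sensitive, a parameter written as UTM_SOURCE would be kept by this sketch unless it is listed separately.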

Patterns and globbing

Globbing expands a non-specific name that contains wildcard characters into the set of specific names that match it. All exclusion fields support common glob patterns. The crawler connector uses patterns similar to those in robots.txt.

Pattern                                         Example             Corresponding regex
'*'                                             */abc/              .*/abc/.*
                                                /root/              .*://.*/root/.*
'?'                                             */???/              .*/.../.*
'{', '}', ','                                   {abc,def}           .*(abc|def).*
'[', ']', '!', ','                              [0-9,xyz][!abc]     .*[0-9xyz][^abc].*
','                                             abc,def             .*abc,def.*
'\'                                             \*\?\,\{\}\[\]\\    .*\*\?\,\{\}\[\]\\.*
'.', '(', ')', '+', '|', '^', '$', '@', '%'     .()+|^$@%           .*\.\(\)\+\|\^\$\@\%.*
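
The translation in the table can be approximated with a short converter. The sketch below is inferred from the table rows only and is not the crawler's actual implementation; GlobPattern and ToRegex are hypothetical names. Two assumptions are worth stating: a pattern that starts with "/" is anchored after the scheme and host (as in the /root/ row), and characters such as ',', '@', and '%' are emitted without a backslash, which yields a regex that is equivalent to the one in the table but not always character-for-character identical. Malformed patterns, such as an unclosed '[', are not validated.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class GlobPattern
{
    // Regex metacharacters that need a backslash to be matched literally.
    private const string Meta = @"\.()+|^$*?{}[]";

    private static string Literal(char c)
    {
        return Meta.IndexOf(c) >= 0 ? "\\" + c : c.ToString();
    }

    // Translates a glob pattern into a regex string, following the table above.
    public static string ToRegex(string glob)
    {
        var body = new StringBuilder();
        bool inBraces = false;
        int classStart = -1; // index of the '[' that opened the current class, or -1

        for (int i = 0; i < glob.Length; i++)
        {
            char c = glob[i];
            if (c == '\\' && i + 1 < glob.Length)
            {
                body.Append(Literal(glob[++i])); // escaped character, matched literally
            }
            else if (classStart >= 0)
            {
                if (c == ']') { classStart = -1; body.Append(']'); }
                else if (c == '!' && i == classStart + 1) body.Append('^'); // negated class
                else if (c == '-') body.Append('-');                        // range, as in 0-9
                else if (c != ',') body.Append(Literal(c));                 // commas only separate items
            }
            else if (c == '[') { classStart = i; body.Append('['); }
            else if (c == '{') { inBraces = true; body.Append('('); }
            else if (c == '}' && inBraces) { inBraces = false; body.Append(')'); }
            else if (c == ',' && inBraces) body.Append('|'); // alternatives inside braces
            else if (c == '*') body.Append(".*");
            else if (c == '?') body.Append('.');
            else body.Append(Literal(c));
        }

        string b = body.ToString();
        // Assumption from the table: a leading "/" means "any scheme and host first".
        string prefix = glob.StartsWith("/", StringComparison.Ordinal) ? ".*://.*"
                      : b.StartsWith(".*", StringComparison.Ordinal) ? "" : ".*";
        string suffix = b.EndsWith(".*", StringComparison.Ordinal) ? "" : ".*";
        return prefix + b + suffix;
    }

    public static void Main()
    {
        Console.WriteLine(ToRegex("/root/"));    // .*://.*/root/.*
        Console.WriteLine(ToRegex("{abc,def}")); // .*(abc|def).*

        // example.com is a placeholder host used only for illustration.
        Console.WriteLine(Regex.IsMatch("http://example.com/root/page", ToRegex("/root/"))); // True
    }
}
```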

Include patterns

Parameter name: included_crawl_patterns. The value can be a single globbing pattern as a string or an array of globbing patterns.
Default: Seed base URLs.

Exclude patterns

Parameter name: excluded_crawl_patterns. The value can be a single globbing pattern as a string or an array of globbing patterns. Exclude patterns override include patterns.
Default: '.{avi,bmp,css,gif,gz,ico,jpeg,jpg,js,m4v,mid,mov,mp2,mp3,mp4,mpeg,png,ram,rar,rm,smil,swf,tif,tiff,wav,wma,wmv,zip}'
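
The precedence rule, where an exclude pattern overrides an include pattern, can be sketched as below. CrawlDecision and ShouldCrawl are hypothetical names, and the sketch assumes that a URL matching no include pattern is not crawled. To stay self-contained, it takes regex strings that were translated by hand from the globs /docs/ and .{gif,zip} according to the pattern table earlier in this topic; example.com is a placeholder host.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class CrawlDecision
{
    // Decides whether a URL should be crawled. Exclusion is checked first,
    // so an exclude pattern always wins over an include pattern.
    public static bool ShouldCrawl(
        string url,
        IEnumerable<string> includeRegexes,
        IEnumerable<string> excludeRegexes)
    {
        if (excludeRegexes.Any(p => Regex.IsMatch(url, p))) return false;
        return includeRegexes.Any(p => Regex.IsMatch(url, p));
    }

    public static void Main()
    {
        // Hand-translated equivalents of the globs "/docs/" and ".{gif,zip}".
        var include = new[] { ".*://.*/docs/.*" };
        var exclude = new[] { @".*\.(gif|zip).*" };

        Console.WriteLine(ShouldCrawl("http://example.com/docs/intro", include, exclude));    // True
        Console.WriteLine(ShouldCrawl("http://example.com/docs/logo.gif", include, exclude)); // False
        Console.WriteLine(ShouldCrawl("http://example.com/blog/post", include, exclude));     // False
    }
}
```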

No index patterns

Parameter name: excluded_index_patterns. The value can be a single globbing pattern as a string or an array of globbing patterns.


Last updated: Oct 31, 2016
