Area: Optimizely Search & Navigation
ARCHIVED This content is retired and no longer maintained. See the latest version here.

Connectors

This topic introduces the predefined search connectors in Episerver Find and their configuration options. Search connectors let you index content from other sources and integrate the resulting search hits into your website's search results.

Including external content

Episerver Find supports these predefined search connector types to include external content: Crawler and RSS/Atom.

Content indexed through connectors is represented by the EPiServer.Find.Framework.WebContent class.

C#
public class WebContent
{
    public String SearchTitle;
    public String SearchHitUrl;
    public String SearchText;
    public String SearchSummary;
    public Dictionary<string, IndexValue> SearchMetaData { get; set; }
}

Fine-tuning crawling and indexing

From the Episerver Find administrative interface, you can fine-tune indexing by excluding internet media types and by including or excluding parts of a website from crawling and indexing. See the Episerver User Guide.

Excluding media types

When excluding media types, use the standard internet media type names (for example, text/css). See also: Media Types. The following media types are excluded by default when indexing:

  • text/css
  • text/javascript
  • text/ecmascript
  • application/x-pointplus
  • application/x-javascript
  • application/javascript
  • application/ecmascript
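The effect of this list can be sketched as a simple membership check. The following Python snippet is an illustration only: the function name is hypothetical, and how the crawler itself compares media types (for example, whether it ignores parameters such as charset) is an assumption, not documented behavior.

```python
# Default excluded media types, copied from the list above.
DEFAULT_EXCLUDED_MEDIA_TYPES = frozenset({
    "text/css",
    "text/javascript",
    "text/ecmascript",
    "application/x-pointplus",
    "application/x-javascript",
    "application/javascript",
    "application/ecmascript",
})


def is_excluded_media_type(content_type_header,
                           excluded=DEFAULT_EXCLUDED_MEDIA_TYPES):
    """Return True if a Content-Type header names an excluded media type."""
    # Assumption: parameters such as "; charset=utf-8" are ignored and the
    # comparison is case-insensitive, as media type names are by convention.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return media_type in excluded
```

Under these assumptions, a stylesheet served as "text/css; charset=utf-8" would be skipped, while a page served as "text/html" would be indexed.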

Excluding query strings

You can exclude any query string parameter. A typical use is to exclude known tracking parameters. For example, in the URL http://www.episerver.se/cms/innehallshantering?utm_source=google, you can exclude utm_source so that crawling does not unintentionally increment a campaign counter.

Common exclusions of this type:

  • PHPSESSID
  • SESSIONID
  • JSESSIONID
  • ASPSESSIONID
  • sid
  • zenid

Note: All strings are case-sensitive. Do not include wildcards or whitespace.
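As a rough illustration of what excluding a query string parameter means, the Python sketch below removes the named parameters from a URL before it is requested. The function name is hypothetical, and the assumption that the crawler strips the parameter (rather than skipping the URL altogether) is mine. Names are compared exactly, matching the case-sensitivity note above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_query_parameters(url, excluded_names):
    """Return the URL without any query parameter named in excluded_names."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        # Exact, case-sensitive comparison: no wildcards, no trimming.
        if name not in excluded_names
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

With excluded_names set to {"utm_source", "PHPSESSID"}, the example URL above loses its utm_source parameter, while a parameter written as UTM_SOURCE would be kept because the comparison is case-sensitive.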

Patterns and globbing

Globbing expands a pattern that contains wildcard characters into the set of specific names that match it. All exclusion fields support glob patterns. The crawler connector uses patterns similar to those in robots.txt.

Pattern                                      Example           Corresponding regex
'*'                                          */abc/            .*/abc/.*
                                             /root/            .*://.*/root/.*
'?'                                          */???/            .*/.../.*
'{', '}', ','                                {abc,def}         .*(abc|def).*
'[', ']', '!', ','                           [0-9,xyz][!abc]   .*[0-9xyz][^abc].*
','                                          abc,def           .*abc,def.*
'\'                                          \*\?\,\{\}\[\]\\  .*\*\?\,\{\}\[\]\\.*
'.', '(', ')', '+', '|', '^', '$', '@', '%'  .()+|^$@%         .*\.\+\|\^\$\@\%.*
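The mappings in the table can be approximated with a small translator. The Python sketch below is inferred from the table rows only and is not the crawler's actual implementation. The function name is hypothetical, and the prefix rule (a leading '/' becomes '.*://.*', any other pattern is wrapped in '.*') is an assumption based on the '/root/' row. Unlike the regex column of the last row, this sketch also escapes '(' and ')'.

```python
import re

# Characters the table lists as regex specials that are matched literally.
REGEX_SPECIALS = set(".()+|^$@%")


def glob_to_regex(pattern):
    """Translate a crawler-style glob into a regex string (illustrative)."""
    tokens = []
    in_braces = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # A backslash keeps the next character literal.
            tokens.append("\\" + pattern[i + 1])
            i += 2
            continue
        if ch == "[":
            end = pattern.find("]", i + 1)
            if end != -1:
                inner = pattern[i + 1:end]
                negated = inner.startswith("!")
                # Commas only separate members, so drop them.
                members = (inner[1:] if negated else inner).replace(",", "")
                tokens.append("[" + ("^" if negated else "") + members + "]")
                i = end + 1
                continue
        if ch == "*":
            tokens.append(".*")
        elif ch == "?":
            tokens.append(".")
        elif ch == "{":
            in_braces = True
            tokens.append("(")
        elif ch == "}" and in_braces:
            in_braces = False
            tokens.append(")")
        elif ch == "," and in_braces:
            tokens.append("|")
        elif ch in REGEX_SPECIALS or ch in "[]{}\\":
            tokens.append("\\" + ch)
        else:
            tokens.append(ch)
        i += 1
    # Assumed wrapping rules, inferred from the table rows.
    if pattern.startswith("/"):
        prefix = ".*://.*"
    elif tokens and tokens[0] == ".*":
        prefix = ""
    else:
        prefix = ".*"
    suffix = "" if tokens and tokens[-1] == ".*" else ".*"
    return prefix + "".join(tokens) + suffix
```

A URL can then be tested with re.fullmatch(glob_to_regex(pattern), url). Because the resulting regex is wrapped in '.*', a pattern matches anywhere in the URL rather than only at the start.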

Include patterns

Parameter name: 'included_crawl_patterns'. The value can be a single globbing pattern as a string or an array of globbing patterns.
Default: Seed base URLs.

Exclude patterns

Parameter name: 'excluded_crawl_patterns'. The value can be a single globbing pattern as a string or an array of globbing patterns. Exclude patterns override include patterns.
Default: '.{avi,bmp,css,gif,gz,ico,jpeg,jpg,js,m4v,mid,mov,mp2,mp3,mp4,mpeg,png,ram,rar,rm,smil,swf,tif,tiff,wav,wma,wmv,zip}'
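The precedence between the two parameters can be sketched as follows. This Python snippet is a simplified illustration, not the crawler's code: the function names are hypothetical, the small translator handles only '*', '?', and '{a,b}', and the '.*' wrapping follows the regex table above. One consequence of that wrapping is that '.js' in the default pattern would also match a URL containing '.json'; whether the real crawler behaves this way is not stated in the source.

```python
import re

# Default exclude pattern, copied from the documentation above.
DEFAULT_EXCLUDED_CRAWL_PATTERNS = (
    ".{avi,bmp,css,gif,gz,ico,jpeg,jpg,js,m4v,mid,mov,mp2,mp3,mp4,mpeg,"
    "png,ram,rar,rm,smil,swf,tif,tiff,wav,wma,wmv,zip}"
)


def as_pattern_list(value):
    """Accept either a single pattern string or an array of patterns."""
    return [value] if isinstance(value, str) else list(value)


def simple_glob_to_regex(pattern):
    """Handle only '*', '?', and '{a,b}'; other characters are literal."""
    parts, in_braces = [], False
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        elif ch == "{":
            in_braces = True
            parts.append("(")
        elif ch == "}" and in_braces:
            in_braces = False
            parts.append(")")
        elif ch == "," and in_braces:
            parts.append("|")
        else:
            parts.append(re.escape(ch))
    return ".*" + "".join(parts) + ".*"


def matches_any(url, patterns):
    return any(re.fullmatch(simple_glob_to_regex(p), url)
               for p in as_pattern_list(patterns))


def should_crawl(url, included, excluded=DEFAULT_EXCLUDED_CRAWL_PATTERNS):
    """Exclude patterns are checked first, so they override include patterns."""
    if matches_any(url, excluded):
        return False
    return matches_any(url, included)
```

For a seed of http://example.com/, a page such as http://example.com/news/ would be crawled, while http://example.com/logo.png would be skipped by the default exclude pattern even though it matches the include pattern.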

No index patterns

Parameter name: 'excluded_index_patterns'. The value can be a single globbing pattern as a string or an array of globbing patterns.

Last updated: Nov 16, 2015
