Connectors

Include external content

Optimizely Search & Navigation supports these predefined search connector types to include external content: Crawler and RSS/Atom.

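Content indexed with connectors is represented by the EPiServer.Find.Framework.WebContent class:

using System.Collections.Generic;

public class WebContent
{
    // Title of the indexed item.
    public string SearchTitle;
    // URL that a search hit points to.
    public string SearchHitUrl;
    // Text content of the indexed item.
    public string SearchText;
    // Summary of the indexed item.
    public string SearchSummary;
    // Additional metadata stored with the indexed item.
    public Dictionary<string, IndexValue> SearchMetaData { get; set; }
}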

Exclude media types

From the Optimizely Search & Navigation administrative interface, you can fine-tune indexing by excluding internet media types and by including or excluding parts of a website from crawling and indexing. See Add connectors in the Optimizely User Guide.

When excluding media types, follow the standard method of classifying internet file types. See also: Media Types. The following media types are excluded by default when indexing:

text/css
text/javascript
text/ecmascript
application/x-pointplus
application/x-javascript
application/javascript
application/ecmascript
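
As an illustration of how a list like the one above could be applied, the following minimal C# sketch checks a Content-Type header value against the default exclusions. The names MediaTypeFilter and IsExcluded are hypothetical and not part of the Optimizely API, and the case-insensitive comparison is an assumption; the connector's actual implementation is not shown in this document.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only (not an Optimizely API).
public static class MediaTypeFilter
{
    // Media types excluded by default, as listed above.
    // Assumption: media types are compared case-insensitively.
    private static readonly HashSet<string> Excluded =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "text/css",
            "text/javascript",
            "text/ecmascript",
            "application/x-pointplus",
            "application/x-javascript",
            "application/javascript",
            "application/ecmascript"
        };

    // Returns true when the Content-Type header names an excluded media type.
    public static bool IsExcluded(string contentTypeHeader)
    {
        if (string.IsNullOrWhiteSpace(contentTypeHeader))
        {
            return false;
        }

        // Drop parameters such as "; charset=utf-8" before comparing.
        string mediaType = contentTypeHeader.Split(';')[0].Trim();
        return Excluded.Contains(mediaType);
    }
}
```

With this sketch, MediaTypeFilter.IsExcluded("text/css; charset=utf-8") returns true, while MediaTypeFilter.IsExcluded("text/html") returns false.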

Exclude query strings

You can exclude any query string parameter. A typical use case is excluding known tracking parameters. For example, in the URL http://www.episerver.se/cms/innehallshantering?utm_source=google, you can exclude utm_source to prevent the unintentional incrementing of a campaign counter.

Common exclusions of this type:

PHPSESSID
SESSIONID
JSESSIONID
ASPSESSIONID
sid
zenid

📘 Note
All strings are case-sensitive. Do not include wildcards or whitespace.
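
To show the effect of such an exclusion, here is a minimal C# sketch that removes excluded parameters from a URL before it is requested. The names QueryStringFilter and Strip are hypothetical, and this is only one way the behavior could be modeled; it is not the connector's own code. Parameter names are matched exactly and case-sensitively, in line with the note above.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, for illustration only (not an Optimizely API).
public static class QueryStringFilter
{
    // Removes every query parameter whose name is in 'excluded'.
    // Names are compared case-sensitively, without wildcards.
    public static string Strip(string url, ISet<string> excluded)
    {
        // Set the fragment (#...) aside so it is not treated as part of the query.
        int hash = url.IndexOf('#');
        string fragment = hash >= 0 ? url.Substring(hash) : "";
        string withoutFragment = hash >= 0 ? url.Substring(0, hash) : url;

        int q = withoutFragment.IndexOf('?');
        if (q < 0)
        {
            return url; // no query string, nothing to remove
        }

        string path = withoutFragment.Substring(0, q);
        string[] kept = withoutFragment.Substring(q + 1)
            .Split('&')
            .Where(pair => pair.Length > 0 && !excluded.Contains(pair.Split('=')[0]))
            .ToArray();

        string query = kept.Length > 0 ? "?" + string.Join("&", kept) : "";
        return path + query + fragment;
    }
}
```

For example, calling Strip on http://www.episerver.se/cms/innehallshantering?utm_source=google with the set { "utm_source" } returns http://www.episerver.se/cms/innehallshantering. Because the default HashSet<string> comparer is ordinal, excluding sid does not remove a parameter named SID.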

Patterns and globbing

Globbing lets you expand a non-specific file name containing a wildcard character into a set of specific file names for storage on a computer, server, or network. All excluded fields support common glob patterns. The crawler connector uses patterns similar to those in robots.txt.

Pattern	Example	Corresponding regex
'*'	*/abc/* */root/*	.*/abc/.* .*://.*/root/.*
'?'	*/???/*	.*/.../.*
'{', '}', ','	{abc,def}	.*(abc|def).*
'[', ']', '!', ','	[0-9,xyz][!abc]	.*[0-9xyz][^abc].*
','	abc,def	.*abc,def.*
'\'	\*\?\,\{\}\[\]\\	.*\*\?,\{\}\[\]\\.*
'.', '(', ')', '+', '|', '^', '$', '@', '%'	.()+|^$@%	.*\.\(\)\+\|\^\$\@\%.*
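
The mapping in the table above can be approximated with a small translator. The following C# sketch is an assumption about how such a translation could work, based only on the table rows; GlobSketch and ToRegex are hypothetical names, not part of the connector. It relies on the standard Regex.Escape method, which leaves characters such as '@', '%', ']' and '}' unescaped, so the output can differ textually from the table while matching the same strings.

```csharp
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical glob-to-regex translator, for illustration only.
public static class GlobSketch
{
    public static string ToRegex(string glob)
    {
        var sb = new StringBuilder();
        AppendWildcard(sb); // the table suggests patterns match anywhere in the URL
        bool inBraces = false, inBrackets = false;

        for (int i = 0; i < glob.Length; i++)
        {
            char c = glob[i];
            if (c == '\\' && i + 1 < glob.Length)
            {
                // Backslash: take the next character literally.
                i++;
                sb.Append(Regex.Escape(glob[i].ToString()));
            }
            else if (inBrackets)
            {
                if (c == ']') { inBrackets = false; sb.Append(']'); }
                else if (c == '!' && glob[i - 1] == '[') sb.Append('^'); // negation
                else if (c != ',') sb.Append(c); // commas only separate items
            }
            else if (c == '[') { inBrackets = true; sb.Append('['); }
            else if (c == '{') { inBraces = true; sb.Append('('); }
            else if (c == '}' && inBraces) { inBraces = false; sb.Append(')'); }
            else if (c == ',' && inBraces) sb.Append('|'); // alternatives
            else if (c == '*') AppendWildcard(sb);
            else if (c == '?') sb.Append('.');
            else sb.Append(Regex.Escape(c.ToString())); // literal character
        }

        AppendWildcard(sb);
        return sb.ToString();
    }

    // Appends ".*" unless the expression already ends with it.
    private static void AppendWildcard(StringBuilder sb)
    {
        int n = sb.Length;
        bool endsWithWildcard = n >= 2 && sb[n - 2] == '.' && sb[n - 1] == '*';
        if (!endsWithWildcard) sb.Append(".*");
    }
}
```

Under these assumptions, GlobSketch.ToRegex("{abc,def}") yields .*(abc|def).* and GlobSketch.ToRegex("[0-9,xyz][!abc]") yields .*[0-9xyz][^abc].*, in line with the corresponding table rows.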

Include patterns

Parameter name: included_crawl_patterns. It can be a single globbing pattern as a string or an array of globbing patterns.
Default: Seed base URLs.

Exclude patterns

Parameter name: excluded_crawl_patterns. It can be a single globbing pattern as a string or an array of globbing patterns. Exclude patterns override include patterns.
Default: '.{avi,bmp,css,gif,gz,ico,jpeg,jpg,js,m4v,mid,mov,mp2,mp3,mp4,mpeg,png,ram,rar,rm,smil,swf,tif,tiff,wav,wma,wmv,zip}'
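
The precedence rule (exclude patterns override include patterns) can be sketched as follows. This C# example is a simplified model rather than the connector's code: CrawlDecision and ShouldCrawl are hypothetical names, the patterns are assumed to have already been translated to regular expressions, and the requirement that a URL match at least one include pattern is an assumption.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical decision helper, for illustration only.
public static class CrawlDecision
{
    // Patterns are given as regular expressions (globs already translated).
    public static bool ShouldCrawl(string url, string[] includeRegexes, string[] excludeRegexes)
    {
        // Anchor each pattern so it has to match the whole URL.
        bool Matches(string pattern) => Regex.IsMatch(url, "^(?:" + pattern + ")$");

        // An exclude match wins, even if an include pattern also matches.
        if (excludeRegexes.Any(Matches))
        {
            return false;
        }

        // Assumption: a URL is crawled only when some include pattern matches.
        return includeRegexes.Any(Matches);
    }
}
```

For instance, with the hypothetical include pattern .*://www\.example\.com/.* and the exclude pattern .*\.(css|js|png).*, the URL http://www.example.com/news/ is crawled, while http://www.example.com/site.css is not.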

No index patterns

Parameter name: excluded_index_patterns. It can be a single globbing pattern as a string or an array of globbing patterns.
