Indexing

Introduction

This document describes indexing in an integration with EPiServer CMS 7.5+. Given that we have referenced the EPiServer.Find.Cms assembly in our EPiServer CMS project, published content will be indexed automatically. Content is also reindexed, or deleted from the index, when it is saved, moved or deleted. Each language version is indexed as a separate document.

Indexing Module

The indexing module is an IInitializableModule that handles all indexing triggered by DataFactory events. Whenever content is saved, published, moved or deleted, the module sends an index request to the ContentIndexer.Instance object, which then handles the actual indexing.

ContentIndexer.Instance

The ContentIndexer.Instance singleton, located in the EPiServer.Find.Cms namespace, adds support for indexing IContent and UnifiedFile objects. It allows for re-indexing the entire PageTree as well as specific language branches and individual content items and files. When an IContent object is indexed, all of its page files are also indexed.

Invisible mode

One core feature of the ContentIndexer is its ability to work in invisible mode when indexing objects passed by the IndexingModule. In invisible mode, all indexing is handled in a separate thread and not in the DataFactory event thread. This way indexing won't delay the DataFactory event thread, and therefore won't delay the save/publish action. This is the default behavior and can be overridden by setting ContentIndexer.Instance.Invisible to false.
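
As a minimal sketch of the override mentioned above, the line below turns invisible mode off. The Invisible property and the EPiServer.Find.Cms namespace are taken from the text; running it during application startup (for example in the Application_Start method in global.asax) is an assumption, not something this section prescribes.

```csharp
//using EPiServer.Find.Cms;

// Assumption: executed once during application startup.
// With Invisible set to false, indexing no longer runs in a separate thread,
// so a save/publish action waits for the index request to complete.
ContentIndexer.Instance.Invisible = false;
```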

Conventions

The ContentIndexer.Instance supports a set of conventions for tweaking how indexing is executed. Examples of such conventions are controlling which pages are indexed (described below) and dependencies between pages.

Customizing pages to be indexed

It is possible to control which content should be indexed by passing a verification expression to the ShouldIndex convention. By default all published content is indexed.

For example, if we do not want to index a page type such as the LoginPageType, we can pass a verification expression that evaluates to false for the LoginPageType to the ShouldIndex convention. This is preferably done during application startup, such as in the Application_Start method in global.asax.

//using EPiServer.Find.Cms.Conventions;

ContentIndexer.Instance.Conventions
  .ForInstancesOf<LoginPageType>()
  .ShouldIndex(x => false);

To override the default setting, add a convention for PageData and add the appropriate verification expression.

//using EPiServer.Find.Cms.Conventions;
ContentIndexer.Instance.Conventions
  .ForInstancesOf<PageData>()
  .ShouldIndex(x => true);

Excluding a property from being indexed can be done by either using the JsonIgnore attribute or adding a convention for it.

//using EPiServer.Find.Cms.Conventions;
ContentIndexer.Instance.Conventions
  .ForInstancesOf<PageData>()
  .ExcludeField(x => x.ACL);

[JsonIgnore]
public DateInterval Interval { get; set; }

File indexing

Files based on IContentMedia are indexed by default when they have any of the following MIME types:

"text/plain"
"application/pdf"
"application/postscript"
"application/msword"
"application/vnd.openxmlformats-officedocument.wordprocessingml.document"
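
To check in advance whether a given file would fall under the default list above, a small lookup can be handy. The sketch below is a hypothetical helper, not part of the EPiServer Find API: the class name DefaultIndexedMimeTypes and the case-insensitive comparison are assumptions made for illustration, and only the five MIME type strings come from the list above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; it only mirrors the list in this document
// and does not query or configure EPiServer Find.
public static class DefaultIndexedMimeTypes
{
    // The MIME types listed above. Case-insensitive matching is an assumption.
    private static readonly HashSet<string> Types =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "text/plain",
            "application/pdf",
            "application/postscript",
            "application/msword",
            "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
        };

    // Returns true when mimeType is one of the listed types.
    public static bool Contains(string mimeType)
    {
        return mimeType != null && Types.Contains(mimeType.Trim());
    }
}
```

For instance, DefaultIndexedMimeTypes.Contains("application/pdf") returns true, while DefaultIndexedMimeTypes.Contains("image/png") returns false.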

Changing the name or namespaces of page types

When you change the name or namespace of your page types, there will be a mismatch between the types already in your index and your new page types. This might cause errors when querying, as the API cannot resolve the right page type from what is reported by the index. To solve this, reindex all pages using the scheduled plugin so that your new page types are reflected in the index.
