Area: Episerver Search & Navigation
Applies to versions: 12 and higher

Removing HTML tags

This topic describes how to remove HTML tags from content before it is indexed in Episerver Search & Navigation (formerly Episerver Find), so that HTML markup does not appear in search results.

How it works

When content to be indexed contains HTML tags, you should usually remove the tags before indexing. Otherwise, the markup is indexed together with the text and returned in search results.

Examples

Example of removing HTML tags from a specific property by using the RemoveHtmlTagsWhenIndexing attribute, found in the EPiServer.Find.Json namespace:

using EPiServer.Find.Json;

public class WithStringProperty
{
    // Indexed as-is.
    public string Title { get; set; }

    // HTML tags are removed from this property's value before it is indexed.
    [RemoveHtmlTagsWhenIndexing]
    public string Content { get; set; }
}

You can also customize the client conventions to remove HTML tags from all string fields of all indexed objects:

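// 'client' is the IClient instance used for indexing.
// Strip HTML from every string field of every indexed object.
client.Conventions.ForInstancesOf<object>()
    .FieldsOfType<string>().StripHtml();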
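
To make the effect of such a convention concrete, here is a plain .NET sketch. It is not part of Search & Navigation and does not reflect its internal implementation. The hypothetical GetIndexableStrings helper visits every public string property of any object, removes tags with a simple regular expression, and HTML-decodes the result. The object itself is left unchanged, on the assumption that a convention affects only the indexed representation, not your objects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Reflection;
using System.Text.RegularExpressions;

// Hypothetical illustration only; this is not how Search & Navigation applies conventions.
public static class StripAllStringsSketch
{
    // Naive tag matcher: everything from '<' to the next '>'.
    private static readonly Regex TagPattern = new Regex("<[^>]*>");

    // Removes tags, then decodes entities such as &amp; or &#228;.
    private static string Strip(string html) =>
        html == null ? null : WebUtility.HtmlDecode(TagPattern.Replace(html, string.Empty));

    // Returns the cleaned value of every public, non-indexer string property,
    // keyed by property name. The instance itself is not modified.
    public static IDictionary<string, string> GetIndexableStrings(object instance)
    {
        if (instance == null)
        {
            throw new ArgumentNullException(nameof(instance));
        }

        return instance.GetType()
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.PropertyType == typeof(string)
                        && p.CanRead
                        && p.GetIndexParameters().Length == 0)
            .ToDictionary(p => p.Name, p => Strip((string)p.GetValue(instance)));
    }
}
```

The regular expression is deliberately simple. It does not handle comments, script blocks, or a '>' inside an attribute value, which a real HTML parser would.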

To remove HTML tags from a specific field when indexing a specific type, use the ForType and Field methods:

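// Strip HTML only from the Content field of BlogPost objects
// (BlogPost stands for one of your own indexed types).
client.Conventions.ForType<BlogPost>()
    .Field(x => x.Content).StripHtml();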
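
The Field(x => x.Content) call selects a single property through a lambda expression. As an illustration only, the sketch below shows how such an expression can be resolved to a PropertyInfo with System.Linq.Expressions, so that only that property's value is cleaned. GetIndexableValue and its class are hypothetical and are not the Search & Navigation API. The helper strips tags with a naive regular expression and HTML-decodes the value.

```csharp
using System;
using System.Linq.Expressions;
using System.Net;
using System.Reflection;
using System.Text.RegularExpressions;

// Hypothetical sketch: how a lambda such as x => x.Content can identify one property.
public static class StripOneFieldSketch
{
    // Naive tag matcher: everything from '<' to the next '>'.
    private static readonly Regex TagPattern = new Regex("<[^>]*>");

    public static string GetIndexableValue<T>(T instance, Expression<Func<T, string>> field)
    {
        // x => x.Content is represented as a MemberExpression whose Member is the Content property.
        if (!(field.Body is MemberExpression member) || !(member.Member is PropertyInfo property))
        {
            throw new ArgumentException("Expected a property access such as x => x.Content.", nameof(field));
        }

        // Only the selected property is read and cleaned; other properties are ignored.
        var raw = (string)property.GetValue(instance);
        return raw == null ? null : WebUtility.HtmlDecode(TagPattern.Replace(raw, string.Empty));
    }
}
```

Passing an expression instead of a property name string lets the compiler check that the property exists on the type.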

The StripHtml method also HTML-decodes the text. The goal is to index the text that users see when viewing the page, so that they can find that content by searching for it.

For example, the Swedish text Jag gillar äpplen ("I like apples") can be stored as Jag gillar &#228;pplen. StripHtml decodes it back to Jag gillar äpplen when indexing, so a user can find the text with a query such as äpplen.
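
The strip-then-decode order can be illustrated with standard .NET APIs (Regex.Replace and WebUtility.HtmlDecode). ToIndexableText is a hypothetical helper, not the StripHtml implementation, and its regular expression only approximates what an HTML parser would do.

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper approximating "strip tags, then decode";
// not the Search & Navigation implementation.
public static class HtmlTextSketch
{
    // Matches anything from '<' to the next '>', for example <p> or <a href="...">.
    private static readonly Regex TagPattern = new Regex("<[^>]*>", RegexOptions.Compiled);

    public static string ToIndexableText(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return html;
        }

        // 1. Remove the markup so that tags are not indexed.
        string withoutTags = TagPattern.Replace(html, " ");

        // 2. Decode entities such as &#228; so the indexed text matches what users see.
        string decoded = WebUtility.HtmlDecode(withoutTags);

        // 3. Collapse the extra whitespace left where tags were removed.
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }
}
```

With this sketch, both "Jag gillar &#228;pplen" and "<p>Jag gillar <strong>&#228;pplen</strong></p>" become "Jag gillar äpplen". Stripping before decoding matters. Encoded text such as &lt;b&gt; is meant to be shown as literal characters, and decoding first would turn it into a tag that is then removed. Replacing each tag with a space keeps words in adjacent block elements apart, at the cost of splitting a word that contains inline markup.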

Last updated: Oct 31, 2016
