Replace tags with a space


It seems the indexer replaces tags with an empty string. Is it possible to configure it to replace them with a space instead?

<p><em>CEO</em><br><a href="">Kalle Banan</a></p>

This gets indexed as "CEOKalle Banan", but I would like it to be "CEO Kalle Banan".

May 16, 2013 8:37


Yes, you can change the StripHtml convention by replacing the default convention for strings and XhtmlStrings:

// MyHtmlStripMethod() is a placeholder for your own stripping method
SearchClient.Instance.Conventions.ForInstancesOf<XhtmlString>()
    .Field(x => x.AsViewedByAnonymous())
    .ConvertBeforeSerializing(x => x.MyHtmlStripMethod());



May 16, 2013 9:09

I was a bit too quick on the trigger with the "solved" link.

I use the following code, but the string passed to ReplaceHtmlTagsWithSpace() has already had all HTML tags stripped:

				.ConvertBeforeSerializing(x => x.ReplaceHtmlTagsWithSpace());
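The thread never shows the body of ReplaceHtmlTagsWithSpace(). A minimal sketch of what such a string extension could look like, assuming a regex-based approach (the class name HtmlIndexingExtensions is hypothetical): every tag becomes a space, then runs of whitespace are collapsed so that "CEO" and "Kalle Banan" end up separated by exactly one space.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical implementation: the thread never shows the real ReplaceHtmlTagsWithSpace().
public static class HtmlIndexingExtensions
{
    // Matches any tag such as <p>, </em>, <br> or <a href="">.
    private static readonly Regex TagPattern = new Regex("<[^>]*>", RegexOptions.Compiled);

    // Matches runs of whitespace, including the spaces left by adjacent tags.
    private static readonly Regex WhitespacePattern = new Regex(@"\s+", RegexOptions.Compiled);

    public static string ReplaceHtmlTagsWithSpace(this string html)
    {
        // Nothing to clean up; return null or empty input unchanged.
        if (string.IsNullOrEmpty(html))
        {
            return html;
        }

        // Turn every tag into a space so text from neighbouring elements stays apart.
        string withSpaces = TagPattern.Replace(html, " ");

        // Collapse repeated whitespace into one space and trim both ends.
        return WhitespacePattern.Replace(withSpaces, " ").Trim();
    }
}

public static class Program
{
    public static void Main()
    {
        string html = "<p><em>CEO</em><br><a href=\"\">Kalle Banan</a></p>";
        // prints CEO Kalle Banan
        Console.WriteLine(html.ReplaceHtmlTagsWithSpace());
    }
}
```

A regular expression is not an HTML parser: this sketch does not decode entities such as &amp;, and a literal > inside an attribute value would end a tag match early.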
Edited, May 16, 2013 10:36

I'll look into that. For now, you should be able to work around this by registering your conventions before the CMS does. If you do this in Application_Start in Global.asax, your convention should be executed before the CMS's.

May 16, 2013 11:32

Setting up the convention in Application_Start makes no difference: the HTML tags have already been stripped by the time my breakpoint inside ReplaceHtmlTagsWithSpace() is hit. (I'm using EPiServer.Find.)

May 16, 2013 13:20

OK, then we need to use the final trick: reset the convention for the properties that have this issue by excluding and then including them again:

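// Reset the convention for PageName by excluding and re-including the field
client.Conventions.ForInstancesOf<PageData>().ExcludeField(x => x.PageName);
client.Conventions.ForInstancesOf<PageData>().IncludeField(x => x.PageName);
// Then attach the custom conversion (MyExtension() is a placeholder for your own method)
client.Conventions.ForInstancesOf<PageData>()
       .Field(x => x.PageName)
       .ConvertBeforeSerializing(x => x.MyExtension());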

It's a little messy, but it will solve your issue without affecting any querying code (as it would have if we had extended all properties with a pre-stripped HTML version).

May 16, 2013 14:50

Yep, that solved it! It would be nice though if it were possible to do it using the other solution you posted... :)

May 16, 2013 15:11

A possibly better way of doing this is to "overwrite" the PageName used by the JSON serializer with the help of the JsonPropertyAttribute, like so:

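// Serialized under the name "PageName", so this value is indexed in place of the built-in PageName
[JsonProperty(PropertyName = "PageName")]
public string PageNameForIndexing
{
    get { return PageName.ReplaceHtmlTagsWithSpace(); }
}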

This approach also makes it possible to combine other page properties into the value indexed as PageName, which was my requirement.
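The JsonProperty trick depends on the JSON library that EPiServer Find uses, so the sketch below is only an analogy: it swaps in .NET's built-in System.Text.Json (assuming .NET 5 or later) and its JsonPropertyName attribute in place of JsonProperty, and uses a hypothetical TeamMemberPage class instead of PageData. A computed property is serialized under the name "PageName" and combines another property (the hypothetical Role) with the page name. Unlike the setup in this thread, System.Text.Json throws an InvalidOperationException when two serialized members share a JSON name, so the raw properties are marked [JsonIgnore].

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;
using System.Text.RegularExpressions;

// Hypothetical stand-in for a CMS page type (not EPiServer's PageData).
public class TeamMemberPage
{
    // Raw values are left out of the JSON; only the combined value is emitted.
    [JsonIgnore]
    public string PageName { get; set; } = "";

    [JsonIgnore]
    public string Role { get; set; } = "";

    // Emitted under the JSON name "PageName", taking the place of the raw property.
    [JsonPropertyName("PageName")]
    public string PageNameForIndexing
    {
        get
        {
            // Combine another property with PageName, then turn tags into spaces.
            string combined = Role + " " + PageName;
            string withSpaces = Regex.Replace(combined, "<[^>]*>", " ");
            return Regex.Replace(withSpaces, @"\s+", " ").Trim();
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var page = new TeamMemberPage { Role = "<em>CEO</em>", PageName = "Kalle Banan" };
        // prints {"PageName":"CEO Kalle Banan"}
        Console.WriteLine(JsonSerializer.Serialize(page));
    }
}
```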

Edited, Nov 17, 2014 12:55