Removing HTML tags
Describes how to remove HTML tags from objects before they are indexed in Optimizely Search & Navigation, so that HTML markup does not appear in search results.
In most situations where content to be indexed contains HTML tags, remove the tags before indexing. Otherwise, search results can contain raw HTML markup.
The following example removes HTML tags from a specific property using the RemoveHtmlTagsWhenIndexing attribute, found in the EPiServer.Find.Json namespace:
using EPiServer.Find.Json;

public class WithStringProperty
{
    // Indexed as-is.
    public string Title { get; set; }

    // HTML tags are removed from this property before indexing.
    [RemoveHtmlTagsWhenIndexing]
    public string Content { get; set; }
}
You can also customize the client conventions to remove HTML tags from all string fields:
client.Conventions.ForInstancesOf<object>()
    .FieldsOfType<string>().StripHtml();
To remove HTML tags from a specific field when indexing a particular type, use the ForType and Field methods:
client.Conventions.ForType<BlogPost>()
    .Field(x => x.Content).StripHtml();
The StripHtml method also performs HTML decoding. The goal is to index the text that users see when viewing the page, so that they can find that content.
For example, Optimizely Search & Navigation stores the Swedish text Jag gillar äpplen as Jag gillar &auml;pplen and decodes it back when indexing. A user can then find the text using a query like äpplen.
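To make the combined effect of tag removal and HTML decoding concrete, here is a minimal stand-alone sketch in plain .NET. It is not the EPiServer.Find implementation of StripHtml, whose internals are not shown here. The class name HtmlStripSketch, the method StripAndDecode, and the regex-based tag matching are all assumptions made for illustration only.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Illustrative only: NOT the EPiServer.Find implementation of StripHtml.
// HtmlStripSketch and StripAndDecode are hypothetical names.
public static class HtmlStripSketch
{
    // Naive tag matcher: anything between '<' and the next '>'.
    // Adequate for simple markup; it is not a full HTML parser.
    private static readonly Regex Tag = new Regex("<[^>]+>", RegexOptions.Compiled);

    public static string StripAndDecode(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return html;
        }

        // Step 1: remove the markup itself.
        string withoutTags = Tag.Replace(html, string.Empty);

        // Step 2: decode entities such as &auml; into the characters users see.
        return WebUtility.HtmlDecode(withoutTags).Trim();
    }

    public static void Main()
    {
        // Yields the string: Jag gillar äpplen
        Console.WriteLine(StripAndDecode("<p>Jag gillar &auml;pplen</p>"));
    }
}
```

In this sketch the tags are removed before decoding, so text that was entered as encoded angle brackets (for example &lt;b&gt;) survives as literal characters instead of being treated as markup. That ordering is a choice made for the sketch, not a documented behavior of the library.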