.NET Indexing

Explains what indexing in .NET is, what happens when objects are indexed, and how to modify indexing, for example with regard to the availability and identity of indexed documents.

Indexing sends an object to the Optimizely Search & Navigation service for storage and analysis so that it can be retrieved in search results. If a document of the same type and ID already exists, it is overwritten. The .NET client API supports indexing any .NET object.

Because objects are serialized upon indexing, the only restriction on them is that they must be serializable. Objects that contain circular references, for example, cannot be serialized. The API supports customizing how an object is serialized and indexed. You can use this flexibility to implement functionality such as indexing the return value of extension methods.

Mappings in Optimizely Search & Navigation

When you index data into the Search & Navigation service, each property on your object is mapped to fields in the data storage engine. Each property produces several fields in the search engine so that its data can be searched using different tokenization models. Mappings are the metadata for these fields, stored within the search engine. The larger the mappings grow, the more load is put on the search engine, which can cause high latency or, in the worst case, unavailability.

You should limit the number of properties and fields in your objects indexed in the Optimizely Search & Navigation service.

Avoid large dictionaries, and dictionaries with dynamic data that changes constantly, because mappings in the search engine are never deleted and grow continually. The search engine uses RAM and disk space to store those fields because it creates data structures to search and aggregate them, which gets costly with many fields and degrades performance.
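
As a rough illustration of this advice, the sketch below contrasts a class that exposes a dictionary with dynamic keys against a shape in which every entry shares the same two properties. All type names (ProductWithDictionary, AttributeEntry, ProductWithEntries, ProductMapper) are hypothetical and not part of the Optimizely client. It is an assumption, worth verifying against your own index, that the fixed key/value shape keeps the mappings from growing with each new key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape: each distinct dictionary key may become its own set of fields.
public class ProductWithDictionary
{
    public string Name { get; set; }
    public Dictionary<string, string> Attributes { get; set; } = new Dictionary<string, string>();
}

// Hypothetical alternative: every attribute uses the same two properties.
public class AttributeEntry
{
    public string Key { get; set; }
    public string Value { get; set; }
}

public class ProductWithEntries
{
    public string Name { get; set; }
    public List<AttributeEntry> Attributes { get; set; } = new List<AttributeEntry>();
}

public static class ProductMapper
{
    // Converts the dictionary-based shape into the fixed key/value shape before indexing.
    public static ProductWithEntries ToEntries(ProductWithDictionary source)
    {
        return new ProductWithEntries
        {
            Name = source.Name,
            Attributes = source.Attributes
                .Select(pair => new AttributeEntry { Key = pair.Key, Value = pair.Value })
                .ToList()
        };
    }
}
```

The object returned by ToEntries would then be the one passed to the indexing call instead of the dictionary-based original.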

Index using local or service queue

By default, the indexing process uses a local queue on the site when you save, publish, move, or delete content. It stores a reference to the content in the queue together with the operation, and a separate thread pulls items from the local queue every 5 seconds for indexing. This makes indexing more efficient by reducing the number of requests from the site to the service.

[New in Optimizely Search & Navigation version 13.4.2]

Version 13.4.2 introduced a service queue that you can use instead of, or together with, the local queue. The service queue is turned off by default. It speeds up the site's indexing job because index, bulk, and delete requests return as soon as the items are put into the queue. The indexing processor then indexes the items in the order they arrive. Optimizely Search & Navigation prioritizes search performance over indexing time, so indexing may be delayed during peak load.

To enable the service queue, open the web.config or app.config file and set the disableServiceQueue attribute to "false" on the episerver.find element, as in this example.

<episerver.find serviceUrl="http://..." defaultIndex="myindex" disableServiceQueue="false"/>

You can turn off the local queue in a similar way, using the useLocalQueue attribute.

<episerver.find serviceUrl="http://..." defaultIndex="myindex" useLocalQueue="false"/>

Index objects

Indexing occurs with the Index method, exposed by the IClient interface. If you have an instance of a Client and an object to index, you can index using this code.

IClient client = Client.CreateFromConfig(); // or a client injected into the method
var blogPost = new BlogPost(); // an instance of an arbitrary class

client.Index(blogPost);

You can index several objects in a batch.

var blogPost = new BlogPost(); // an instance of an arbitrary class
var article = new Article(); // an instance of another arbitrary class

// Index by supplying the objects as params
client.Index(blogPost, article);

// Index by supplying an IEnumerable
var listOfObjects = new List<object> {
  blogPost,
  article
};
client.Index(listOfObjects);

When an object is indexed, an instance of the IndexResult class is returned. Use that class to verify that the indexing was successful and retrieve the document ID.

var result = client.Index(blogPost);
bool successful = result.Ok;
string id = result.Id;

Time delay

After an object is indexed, it is instantly available for retrieval with the Client Get method. However, the index must be refreshed before the object appears in search results, which happens automatically every second. If an object must be searchable immediately, modify the index command to tell the service to refresh the index. Only do this if really necessary (and preferably only while testing or debugging) because it can negatively affect performance.

client.Index(blogPost, x => x.Refresh = true);

Identity

Unless specified, the service automatically assigns an ID to an indexed document. To select an ID explicitly, modify the command or annotate a property on the indexed class with the Id attribute. In either case, the ID's value must be compatible with the DocumentId type.

//Specifying the id by modifying the command
client.Index(blogPost, x => x.Id = 42);

//Specifying that a property should be used as id
public class BlogPost {
  [Id]
  public int Id {
    get;
    set;
  }
}

You can also modify the Client class conventions to use a specific property or method as the ID for all instances of a type without changing the actual class.

client.Conventions.ForInstancesOf<Product>()
    .IdIs(x => x.Key);

Ignore properties

To exclude individual properties in a class from being indexed, annotate them with the JsonIgnore attribute. You can also exclude properties without modifying their classes by using Client class conventions.

public class BlogPost {
  [JsonIgnore]
  public int SomethingInternal {
    get;
    set;
  }
}
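
The convention-based alternative mentioned above could look roughly like the following sketch. It assumes that the client conventions expose an ExcludeField method that takes a property expression, so check it against the client version you use. The BlogPost class and the BlogPostExclusionSetup helper are hypothetical.

```csharp
using EPiServer.Find;

// Hypothetical class; note that it carries no JsonIgnore attribute.
public class BlogPost
{
    public string Title { get; set; }
    public int SomethingInternal { get; set; }
}

public static class BlogPostExclusionSetup
{
    // Registers the exclusion once, for example during site initialization.
    public static void Configure(IClient client)
    {
        // Assumption: ExcludeField is available on the type conventions.
        client.Conventions.ForInstancesOf<BlogPost>()
            .ExcludeField(x => x.SomethingInternal);
    }
}
```

This approach suits classes you cannot or do not want to annotate, such as types from a shared library.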

Customize type indexing

There are several ways to customize how a type is serialized and indexed. You can exclude properties, remove HTML tags in string properties, and include return values of methods to use later when searching or filtering.
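
As one example of these customizations, the sketch below includes the return value of an extension method in the indexed document. It assumes that the client conventions expose an IncludeField method that takes a method-call expression. The BlogPost class, the TitleAndBody extension method, and the BlogPostInclusionSetup helper are hypothetical.

```csharp
using EPiServer.Find;

// Hypothetical class to index.
public class BlogPost
{
    public string Title { get; set; }
    public string Body { get; set; }
}

public static class BlogPostExtensions
{
    // Hypothetical extension method whose return value should be searchable.
    public static string TitleAndBody(this BlogPost post)
    {
        return post.Title + " " + post.Body;
    }
}

public static class BlogPostInclusionSetup
{
    public static void Configure(IClient client)
    {
        // Assumption: IncludeField is available on the type conventions.
        client.Conventions.ForInstancesOf<BlogPost>()
            .IncludeField(x => x.TitleAndBody());
    }
}
```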

Update a single field

You can update a single field if you have the indexed item's ID.

client.Update<BlogPost>(Id).Field(x => x.PublishDate, newTime).Execute();

Limit the depth of ContentAreas to be indexed

You can modify a JSON contract to limit the maximum depth of ContentAreas to index. If your site architecture features a complex structure of nested ContentAreas, using the limit should improve indexing and searching performance.

SearchClient.Instance.Conventions.ForInstancesOf<ContentArea>().ModifyContract(x => x.Converter = new MaxDepthContentAreaConverter(1));

Size of index requests

When performing index requests, you should not exceed the maximum request size (which is 50 MB by default).

Note: Maximum size refers to the base64-encoded file size, which means that the maximum is effectively 37 MB.
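
The 37 MB figure follows from base64 arithmetic: every 3 raw bytes become 4 encoded characters, so a 50 MB encoded limit leaves roughly 50 × 3/4 = 37.5 MB for the raw file. Below is a minimal sketch of that calculation. Base64Size is a hypothetical helper that is not part of the Optimizely client, and the sketch assumes 1 MB = 1024 × 1024 bytes.

```csharp
using System;

public static class Base64Size
{
    // Length of the padded base64 text produced for rawBytes bytes.
    public static long EncodedLength(long rawBytes)
    {
        return 4 * ((rawBytes + 2) / 3);
    }

    // Largest raw payload whose base64 form fits within maxEncodedBytes.
    public static long MaxRawBytes(long maxEncodedBytes)
    {
        return (maxEncodedBytes / 4) * 3;
    }
}
```

For a 50 MB limit, MaxRawBytes(50L * 1024 * 1024) gives 39,321,600 bytes, which is 37.5 MB before any other request overhead is counted.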

If a batch exceeds the maximum and the Optimizely Search & Navigation service rejects it, the Optimizely Search & Navigation client downsizes the batch and retries. In some cases, you can improve performance by limiting batches to a size smaller than the maximum.
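
When you index your own objects directly through the client, one way to keep requests small is to split the collection into fixed-size batches yourself. The following is a minimal sketch. BatchHelper and its InBatches method are hypothetical names, not part of the Optimizely client, and a suitable batch size depends on how large your documents are.

```csharp
using System;
using System.Collections.Generic;

public static class BatchHelper
{
    // Lazily splits items into consecutive batches of at most batchSize items.
    // Because this is an iterator, the argument check runs on first enumeration.
    public static IEnumerable<List<T>> InBatches<T>(IEnumerable<T> items, int batchSize)
    {
        if (batchSize <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(batchSize));
        }

        var batch = new List<T>(batchSize);
        foreach (var item in items)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }

        // Return the final, partially filled batch, if any.
        if (batch.Count > 0)
        {
            yield return batch;
        }
    }
}
```

Each batch can then be passed to the IEnumerable overload of client.Index shown earlier, one request per batch.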

You can implement code that adjusts batch sizes. Specifically, you can control ContentBatchSize (for content) and MediaBatchSize (for event-driven indexing), as illustrated below. With the Find indexing job, only ContentBatchSize applies.

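using EPiServer.Find.Cms;
using EPiServer.Find.Cms.Module;
using EPiServer.Framework;
using EPiServer.Framework.Initialization;

[InitializableModule]
[ModuleDependency(typeof(IndexingModule))]
public class IndexingConventions : IInitializableModule {
  public void Initialize(InitializationEngine context) {
    ContentIndexer.Instance.ContentBatchSize = 50;
    ContentIndexer.Instance.MediaBatchSize = 1;
  }

  // Required by IInitializableModule; nothing to clean up here.
  public void Uninitialize(InitializationEngine context) { }
}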

Together with the batch size settings, the IsFileSizeLimitReached method illustrated below (which you can use in a convention) serves two goals:

  • Adjust the batch size.
  • Avoid attempts to index files that exceed the maximum.

// Option 1: the media object is indexed without its attachment.
ContentIndexer.Instance.Conventions.ForInstancesOf<MyMediaData>().IndexAttachment(x => !IsFileSizeLimitReached(x));

// Option 2: the media object is not indexed at all.
ContentIndexer.Instance.Conventions.ForInstancesOf<MyMediaData>().ShouldIndex(x => !IsFileSizeLimitReached(x));

private static bool IsFileSizeLimitReached(IBinaryStorable binaryContent) {
  // 37,000 KB approximates the effective limit after base64 encoding.
  const int limitKb = 37000;
  try {
    // Read the bytes from either an Azure blob or a file blob.
    var blobBytes = (binaryContent.BinaryData as AzureBlob)?.ReadAllBytes() ??
      (binaryContent.BinaryData as FileBlob)?.ReadAllBytes();
    if (blobBytes == null) {
      return false;
    }
    // Compare the size in whole kilobytes against the limit.
    return blobBytes.Length / 1024 >= limitKb;
  } catch (Exception) {
    // If the size cannot be determined, do not block indexing.
    return false;
  }
}