.NET Indexing
Explains what indexing in .NET is, what happens when objects are indexed, and how to modify indexing, for example with regard to the availability and identity of indexed documents.
Indexing is the process of sending an object to the Optimizely Search & Navigation service for storage and analysis, so it can be retrieved as search results. If a document of the same type and ID already exists, it is overwritten. The .NET client API supports the indexing of any .NET object.
Because objects are serialized when they are indexed, the only requirement is that an object be serializable. If an object cannot be serialized, typically because of circular references, the API lets you customize how it is serialized and indexed. You can also use this flexibility to add functionality, such as indexing the return values of extension methods.
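To see why circular references are the typical obstacle, the following minimal C# sketch serializes a hypothetical Author/Post pair that point at each other. It uses System.Text.Json (.NET Core 3.0 or later) rather than the serializer the Search & Navigation client uses, so treat it only as an illustration of the general problem: a serializer that follows the references never reaches the end of the graph and gives up with an error.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical types: an author lists their posts, and each post points
// back to its author, so the object graph contains a cycle.
public class Author
{
    public string Name { get; set; } = "";
    public List<Post> Posts { get; set; } = new List<Post>();
}

public class Post
{
    public string Title { get; set; } = "";
    public Author Author { get; set; }
}

public static class CycleDemo
{
    // Returns true when serialization throws a JsonException, which is what
    // System.Text.Json does by default when the graph contains a cycle.
    public static bool FailsToSerialize(object graph)
    {
        try
        {
            JsonSerializer.Serialize(graph);
            return false;
        }
        catch (JsonException)
        {
            // Default settings report "A possible object cycle was detected".
            return true;
        }
    }

    // Builds a post and an author that reference each other.
    public static Post BuildCycle()
    {
        var author = new Author { Name = "Ann" };
        var post = new Post { Title = "Hello", Author = author };
        author.Posts.Add(post);
        return post;
    }
}
```

A post without an author serializes normally; only the cyclic graph fails.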
Index using local or service queue
The indexing process uses a local queue on the site by default. The local queue is used when content is saved, published, moved, or deleted. A reference to the content is stored in the queue together with its operation, and a separate thread pulls items from the queue every 5 seconds and indexes them. This makes indexing more efficient by reducing the number of requests from the site to the service.
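As a rough illustration of why batching through a local queue reduces traffic, here is a hypothetical C# sketch; it is not the client's actual implementation. LocalIndexQueue, QueueItem, and IndexOperation are invented names, content references are plain strings, and the sendBatch callback stands in for a single request to the service. Each drain sends everything queued since the previous drain as one batch.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

// Hypothetical operation kinds, mirroring the events named above.
public enum IndexOperation { Save, Publish, Move, Delete }

// One queued entry: a reference to the content plus the operation to apply.
public sealed class QueueItem
{
    public QueueItem(string reference, IndexOperation operation)
    {
        Reference = reference;
        Operation = operation;
    }

    public string Reference { get; }
    public IndexOperation Operation { get; }
}

public sealed class LocalIndexQueue : IDisposable
{
    private readonly ConcurrentQueue<QueueItem> _queue = new ConcurrentQueue<QueueItem>();
    private readonly Action<IReadOnlyList<QueueItem>> _sendBatch;
    private Timer _timer;

    // sendBatch stands in for one request to the search service.
    public LocalIndexQueue(Action<IReadOnlyList<QueueItem>> sendBatch)
    {
        _sendBatch = sendBatch;
    }

    // Called from save/publish/move/delete handlers; only records the work.
    public void Enqueue(string reference, IndexOperation operation)
    {
        _queue.Enqueue(new QueueItem(reference, operation));
    }

    // Drains the queue on a background timer at a fixed interval.
    public void Start(TimeSpan interval)
    {
        _timer = new Timer(_ => DrainOnce(), null, interval, interval);
    }

    // Sends everything queued so far as a single batch; returns the batch size.
    public int DrainOnce()
    {
        var batch = new List<QueueItem>();
        while (_queue.TryDequeue(out var item))
        {
            batch.Add(item);
        }
        if (batch.Count > 0)
        {
            _sendBatch(batch);
        }
        return batch.Count;
    }

    public void Dispose()
    {
        _timer?.Dispose();
    }
}
```

Calling Start(TimeSpan.FromSeconds(5)) would mimic the five-second interval described above: three edits made within one interval leave the site as one request instead of three. The sketch does not merge repeated operations on the same item, and timer callbacks can overlap if sending a batch takes longer than the interval.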
[New in Optimizely Search & Navigation version 13.4.2]
Version 13.4.2 introduced a service queue, which can be used instead of, or together with, the local queue. The service queue is disabled by default but is easy to enable. It speeds up the indexing job on the site, because indexing, bulk, and delete requests return as soon as the items have been put in the queue. The indexing processor then indexes the content in the order it arrives. Search performance is prioritized over indexing time, so indexing may be delayed during load peaks.
To enable the service queue, open the web.config or app.config file and add the disableServiceQueue="false" attribute to the episerver.find element, as in this example.
<episerver.find serviceUrl="http://..." defaultIndex="myindex" disableServiceQueue="false" />
You can disable the local queue in a similar way, using the useLocalQueue attribute.
<episerver.find serviceUrl="http://..." defaultIndex="myindex" useLocalQueue="false" />
Index objects
Indexing is done via the Index method, exposed by the IClient interface. If you have an instance of a Client and an object to index, you can index it using this code.
IClient client = SearchClient.Instance; // A client retrieved from config, or one injected into the method
BlogPost blogPost = new BlogPost(); // An instance of an arbitrary class
client.Index(blogPost);
You can index several objects in a batch.
BlogPost blogPost = new BlogPost(); // An instance of an arbitrary class
Article article = new Article(); // An instance of another arbitrary class
//Indexing supplying objects as params
client.Index(blogPost, article);
var listOfObjects = new List<object>
{
blogPost,
article
};
//Indexing supplying IEnumerable
client.Index(listOfObjects);
Once an object is indexed, an instance of the IndexResult class is returned. Use it to verify that indexing succeeded and to retrieve the document ID.
var result = client.Index(blogPost);
bool successful = result.Ok;
string id = result.Id;
Time delay
After an object is indexed, it is instantly available for retrieval via the Client class's Get method. However, before the object is returned in search results, the index must be refreshed. This happens automatically every second. If it is crucial that an object be available immediately, modify the command so that it tells the service to refresh the index. Only do this if really necessary (preferably only while testing or debugging), because it can negatively affect performance.
client.Index(blogPost, x => x.Refresh = true);
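If you only need to wait for a document in an automated test, polling until it appears is one alternative to forcing a refresh on every index call. The helper below is a hypothetical, library-independent C# sketch: IndexWait and WaitUntil are invented names, and the condition you pass in would typically run a search and check whether the new document is returned.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class IndexWait
{
    // Evaluates condition repeatedly, sleeping pollInterval between attempts,
    // until it returns true (result: true) or timeout elapses (result: false).
    public static bool WaitUntil(Func<bool> condition, TimeSpan timeout, TimeSpan pollInterval)
    {
        var stopwatch = Stopwatch.StartNew();
        while (true)
        {
            if (condition())
            {
                return true;
            }
            if (stopwatch.Elapsed >= timeout)
            {
                return false;
            }
            Thread.Sleep(pollInterval);
        }
    }
}
```

Because the index refreshes automatically every second, a timeout of a few seconds with a poll interval of a few hundred milliseconds is usually enough in such a test.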
Identity
Unless you specify an ID, the service automatically assigns one to an indexed document. To specify an ID explicitly, either modify the command or annotate a property of the indexed class with the Id attribute. In both cases, the ID's value must be compatible with the DocumentID type.
//Specifying the id by modifying the command
client.Index(blogPost, x => x.Id = 42);
//Specifying that a property should be used as id
public class BlogPost
{
[Id]
public int Id { get; set; }
}
You can also modify the Client class conventions to use a specific property or method as the ID for all instances of a type, without modifying the class itself.
client.Conventions.ForInstancesOf<Product>()
.IdIs(x => x.Key);
Ignore properties
To exclude individual properties of a class from indexing, annotate them with the JsonIgnore attribute. You can also exclude properties without modifying their classes by using Client class conventions.
public class BlogPost
{
[JsonIgnore]
public int SomethingInternal { get; set; }
}
Customize type indexing
There are several ways to customize how a type is serialized and indexed. You can exclude properties, remove HTML tags from string properties, and include the return values of methods so they can be used later when searching or filtering.
Update a single field
You can update a single field if you have the indexed item's ID.
client.Update<BlogPost>(Id).Field(x => x.PublishDate, newTime).Execute();
Limit the depth of ContentAreas to be indexed
You can modify a JSON contract to limit the maximum depth of ContentAreas to index. If your site architecture features a complex structure of nested ContentAreas, using the limit should improve the performance of indexing and searching.
SearchClient.Instance.Conventions.ForInstancesOf<ContentArea>().ModifyContract(x => x.Converter = new MaxDepthContentAreaConverter(1));
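To make "maximum depth" concrete, the following C# sketch walks a hypothetical tree of blocks, where each block can hold its own nested list of blocks, and stops descending once a given depth is reached. It is only a conceptual model: Block and DepthLimitedFlattener are invented names, and the sketch does not describe how MaxDepthContentAreaConverter is implemented or exactly how it counts depth.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for a block that may contain a nested content area.
public class Block
{
    public string Name { get; set; } = "";
    public List<Block> Children { get; set; } = new List<Block>();
}

public static class DepthLimitedFlattener
{
    // Collects block names, descending at most maxDepth levels of nesting.
    // With maxDepth = 1, only the top-level blocks are included.
    public static List<string> Collect(IEnumerable<Block> area, int maxDepth)
    {
        var names = new List<string>();
        Walk(area, 1, maxDepth, names);
        return names;
    }

    private static void Walk(IEnumerable<Block> area, int depth, int maxDepth, List<string> names)
    {
        if (depth > maxDepth)
        {
            // Everything below this level is left out.
            return;
        }
        foreach (var block in area)
        {
            names.Add(block.Name);
            Walk(block.Children, depth + 1, maxDepth, names);
        }
    }
}
```

Each extra level of depth adds every nested block at that level, which is why a low limit can noticeably shrink what is indexed on sites with deeply nested ContentAreas.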
Size of index requests
When performing index requests, you should not exceed the maximum request size (by default, 50 MB).
Note
Maximum size refers to the base64-encoded size of the file, which means that the effective maximum for the raw file is about 37 MB.
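The 37 MB figure follows from how base64 works: every 3 bytes of raw data become 4 characters, so an encoded limit of L bytes leaves room for roughly 3L/4 bytes of raw data. The C# sketch below (Base64Budget is a hypothetical helper) does that arithmetic. It assumes 1 MB = 1,048,576 bytes, in which case a 50 MB limit corresponds to 39,321,600 raw bytes, or 37.5 MB.

```csharp
public static class Base64Budget
{
    // Base64 turns every 3 input bytes into 4 output characters (padding the
    // last group), so n raw bytes encode to 4 * ceil(n / 3) characters.
    public static long EncodedLength(long rawBytes)
    {
        return 4 * ((rawBytes + 2) / 3);
    }

    // Largest raw size whose base64 encoding still fits in maxEncodedBytes.
    public static long MaxRawBytes(long maxEncodedBytes)
    {
        return maxEncodedBytes / 4 * 3;
    }
}
```

For example, Base64Budget.MaxRawBytes(50L * 1024 * 1024) returns 39321600. In practice the usable size is slightly lower, because the request also carries the document's other fields.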
If a batch exceeds the maximum and is rejected by the Optimizely Search & Navigation service, the Optimizely Search & Navigation client reduces the batch size and retries. In some cases, you can improve performance by limiting batches to a size below the maximum.
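The client's exact downsizing strategy is not described here. The following hypothetical C# sketch assumes a simple halve-and-retry approach to show the idea: a rejected batch is split in two and each half is retried, and an item that is rejected even on its own is reported instead of being retried forever. BatchSender, trySend, and onRejected are invented names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BatchSender
{
    // Sends items in batches of at most batchSize. trySend returns false when
    // the receiver rejects a batch (for example, because it is too large).
    public static void Send<T>(IReadOnlyList<T> items, int batchSize,
        Func<IReadOnlyList<T>, bool> trySend, Action<T> onRejected)
    {
        if (batchSize < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(batchSize));
        }
        for (int i = 0; i < items.Count; i += batchSize)
        {
            SendOrSplit(items.Skip(i).Take(batchSize).ToList(), trySend, onRejected);
        }
    }

    // Tries to send a batch; on rejection, halves it and retries each half.
    private static void SendOrSplit<T>(List<T> batch,
        Func<IReadOnlyList<T>, bool> trySend, Action<T> onRejected)
    {
        if (trySend(batch))
        {
            return;
        }
        if (batch.Count == 1)
        {
            // A single item that is still too large cannot be fixed by downsizing.
            onRejected(batch[0]);
            return;
        }
        int half = batch.Count / 2;
        SendOrSplit(batch.Take(half).ToList(), trySend, onRejected);
        SendOrSplit(batch.Skip(half).ToList(), trySend, onRejected);
    }
}
```

The single-item case is why downsizing alone cannot help with one file that exceeds the maximum; such a file has to be skipped or indexed without its attachment. Every rejected attempt also costs a round trip, which is why starting with batches below the maximum can be faster.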
You can implement code that adjusts batch sizes. Specifically, you can control ContentBatchSize (for content) and MediaBatchSize (for event-driven indexing), as illustrated below. With the Find indexing job, only ContentBatchSize applies.
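using EPiServer.Find.Cms;
using EPiServer.Find.Cms.Module;
using EPiServer.Framework;
using EPiServer.Framework.Initialization;
[InitializableModule]
[ModuleDependency(typeof(IndexingModule))]
public class IndexingConventions : IInitializableModule
{
    public void Initialize(InitializationEngine context)
    {
        // Smaller batches mean smaller index requests
        ContentIndexer.Instance.ContentBatchSize = 50;
        ContentIndexer.Instance.MediaBatchSize = 1;
    }
    // Required by IInitializableModule; nothing to clean up here
    public void Uninitialize(InitializationEngine context)
    {
    }
}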
The IsFileSizeLimitReached method illustrated below, which could be used in a convention, has two goals:
- Adjust the batch size
- Avoid attempts to index files that exceed the maximum
// The media object will be indexed without attachment or...
ContentIndexer.Instance.Conventions.ForInstancesOf<MyMediaData>().IndexAttachment(x => !IsFileSizeLimitReached(x));
or
// ...the media object won't be indexed.
ContentIndexer.Instance.Conventions.ForInstancesOf<MyMediaData>().ShouldIndex(x => !IsFileSizeLimitReached(x));
private static bool IsFileSizeLimitReached(IBinaryStorable binaryContent)
{
    // Limit in kilobytes: roughly 37 MB, the effective maximum once
    // base64 encoding is taken into account
    const int limitKb = 37000;
    try
    {
        // Read the blob's bytes (only Azure and file-system blobs are handled here)
        var blobBytes = (binaryContent.BinaryData as AzureBlob)?.ReadAllBytes() ??
                        (binaryContent.BinaryData as FileBlob)?.ReadAllBytes();
        if (blobBytes == null)
        {
            // No binary data or an unhandled blob type: do not block indexing
            return false;
        }
        // Convert bytes to kilobytes and compare with the limit
        return blobBytes.Length / 1024 >= limitKb;
    }
    catch (Exception)
    {
        // If the blob cannot be read, fall back to normal indexing
        return false;
    }
}