With Optimizely Search & Navigation, you can index external attachments in different formats, such as Word and PDF.
To index attachments using the .NET API, create an instance of a class that has a property of type `Attachment` (found in the `EPiServer.Find` namespace). The `Attachment` class constructor has a single parameter of type `Func<FileStream>`. Another class, `FileAttachment` (also in the `EPiServer.Find` namespace), requires a file path as a constructor parameter.
For example, you create a class named `Document`.
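A minimal sketch of such a class could look like the following. The shape shown here (one metadata property called `Name` and one property of type `Attachment`) is an assumption based on the surrounding description, not a required layout.

```csharp
using EPiServer.Find;

// Illustrative type: some metadata plus the file content to index.
public class Document
{
    // Metadata that is indexed alongside the attachment.
    public string Name { get; set; }

    // The attachment itself; Attachment is found in the EPiServer.Find namespace.
    public Attachment Attachment { get; set; }
}
```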
You can index an instance of the `Document` class to index a Word document along with some metadata (`Name` in this example).
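The indexing step might be sketched as follows. `AttachmentIndexing`, `IndexWordDocument`, and the `pathToWordDocument` parameter are hypothetical names introduced for illustration, and `Document` is assumed to be the class described above, with a `Name` property and an `Attachment` property.

```csharp
using EPiServer.Find;

public static class AttachmentIndexing
{
    // Indexes one Word document, together with a Name value, using an existing client.
    public static void IndexWordDocument(IClient client, string pathToWordDocument)
    {
        var document = new Document
        {
            Name = "A document",
            // FileAttachment takes the file path as its constructor parameter.
            Attachment = new FileAttachment(pathToWordDocument)
        };

        // Sends the object, including the attachment, to the index.
        client.Index(document);
    }
}
```

Passing the client in as a parameter keeps the sketch independent of how your site creates or resolves its `IClient` instance.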
You can search the indexed Word document. For example, if it contains "Banana," the result variable below would contain a hit.
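A search along these lines could look like the sketch below. `AttachmentSearch` and `FindBanana` are hypothetical names, and `Document` is again assumed to be the class described earlier.

```csharp
using EPiServer.Find;

public static class AttachmentSearch
{
    // Runs a free-text query for "Banana" against indexed Document instances.
    public static SearchResults<Document> FindBanana(IClient client)
    {
        var result = client.Search<Document>()
            .For("Banana")
            .GetResult();

        // If the indexed Word document contains "Banana", result holds a hit for it.
        return result;
    }
}
```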
A REST API issue causes an exception the first time an instance of a type with an `Attachment` property (`Document` in this example) is indexed. This happens only the first time. After that, indexing works as expected.
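If that first failure is a problem in your setup, one possible workaround is to retry the call once. The sketch below is an assumption rather than documented guidance: `FirstIndexRetry` is a hypothetical helper, and it catches every exception type only because the type thrown is not named here.

```csharp
using System;

public static class FirstIndexRetry
{
    // Runs the given action; if it throws, runs it exactly one more time.
    // A failure on the second attempt is not caught and reaches the caller.
    public static void Run(Action indexAction)
    {
        try
        {
            indexAction();
        }
        catch (Exception)
        {
            indexAction();
        }
    }
}
```

For example, an indexing call could be wrapped as `FirstIndexRetry.Run(() => client.Index(document));`, where `client` and `document` are your own instances.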
## Improve search relevancy of attachments
By default, search relevancy for text inside an attachment is imperfect. This is because attachments are indexed in the default language, which might not match the document's content. (Optimizely Content Management System (CMS) content, in contrast, is indexed using all enabled languages to improve search relevancy.)
Also, when browsing Optimizely Search & Navigation's explore view of an attachment, the attachment text is not readable, because it is indexed using the base64 representation of itself.
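To illustrate why base64 content is not readable, the following standalone sketch encodes a short string. It uses only the .NET base class library and is not part of the Search & Navigation API; `Base64Demo` is a hypothetical name.

```csharp
using System;
using System.Text;

public static class Base64Demo
{
    // Returns the base64 representation of the UTF-8 bytes of the given text.
    public static string Encode(string text) =>
        Convert.ToBase64String(Encoding.UTF8.GetBytes(text));

    public static void Main()
    {
        // The word is still present in the data, but no longer as readable text.
        Console.WriteLine(Encode("Banana")); // prints "QmFuYW5h"
    }
}
```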
To improve the search relevancy of text attachments, use the `IAttachmentHelper` interface, which enables developers to implement their own parsing of attachments. Out of the box, Optimizely provides an implementation of `IAttachmentHelper` that uses Microsoft `IFilter` functionality. For this to work, the correct `IFilters` need to be installed on the client.
### Implement the IAttachmentHelper
1. Install the _EPiServer.Find.Cms.AttachmentFilter_ NuGet package.
2. Determine which attachment file types you want to support (for example, PDF and Microsoft Word). Each file type has a corresponding filter. The list of file types and filters is below.
3. Download and install the selected filters.
4. Add some supported file attachments under your site's **media** folder.
5. Log into your website and browse to **Find > Overview > Explore**.
6. Find the attachments and verify that their content is stored as readable text.
### Supported file formats
`IFilters` and Optimizely Search & Navigation can parse the following file types.
_adw, ai, doc, docm, docx, dwg, eps, gif, html, htm, jpeg, jpg, mm, msg, odt, ods, odp, odi, one, otf, otp, pdf, png, ppt, pptm, pptx, ps, rar, sda, sdg, sdm, sfs, sgf, smf, std, sti, stw, svg, sxd, sxi, txt, vdx, vsd, vor, vss, vst, vsx, vtx, wma, wmv, xls, xlsb, xlsm, xlsx, xml, zip_
For many file types, more than one filter is available. See below for common file types and filters, and additional filter sources.
Adobe has a PDF `IFilter`, although it does not work in all environments. If your environment is not supported, try the PDF-XChange Viewer from Tracker Software.
### Microsoft Office 2010 filter packs
- Legacy Office Filter (97-2003; .doc, .ppt, .xls)
- Metro Office Filter (2007; .docx, .pptx, .xlsx)
- Open Document Format Filter
## Related topics
- [Elastic search limitations](🔗)
- See [Apache Tika documentation](🔗) for a list of supported formats.
- [IFilterShop for more filter formats](🔗)
- [Download Microsoft Office 2010 filter packs](🔗)
- [Download Adobe PDF IFilter](🔗)
- [Download PDF-XChange Viewer from Tracker Software](🔗)
- Blog post: [Get related hits on attached documents for pages in Episerver Search & Navigation](🔗)