Improve search relevancy for attachments

By default, search relevancy for text inside an attachment is limited because Search & Navigation indexes attachments in the default language, which might not match the language of the document's content. CMS content, in contrast, is indexed using the enabled languages, which improves search relevancy.

Also, when you browse an attachment in Search & Navigation's explore view, the attachment text is not readable because it is indexed as its base64 representation.
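To illustrate why base64-encoded text is hard to read or match, here is a minimal Python sketch. It is not part of Search & Navigation; the sample text and the `contains_term` helper are made up for illustration, and a real search engine does far more than a substring check.

```python
import base64

# Hypothetical attachment text; any readable string behaves the same way.
attachment_text = "Quarterly sales report for the Nordic region"

# Base64 re-encodes the bytes using only letters, digits, '+', '/' and '='.
encoded = base64.b64encode(attachment_text.encode("utf-8")).decode("ascii")


def contains_term(haystack: str, term: str) -> bool:
    """Case-insensitive substring check, standing in for a naive text match."""
    return term.lower() in haystack.lower()


print(encoded)
print(contains_term(attachment_text, "sales"))  # True
print(contains_term(encoded, "sales"))          # False
```

The encoded string still carries the full content (decoding it restores the original text), but the words themselves no longer appear in it, which is why the value is unreadable in the explore view.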

To improve the search relevancy of text attachments, use the IAttachmentHelper interface, which lets developers implement their own parsing of attachments. Out of the box, Optimizely provides an implementation of IAttachmentHelper that uses Microsoft IFilter functionality. For this to work, the correct IFilters must be installed on the client.

Using the EPiServer.Find.Cms.AttachmentFilter package is recommended because indexing attachments as readable text improves the quality of your search.

Use the default implementation of IAttachmentHelper

  1. Install the EPiServer.Find.Cms.AttachmentFilter NuGet package.
  2. Determine which attachment file types you want to support (for example, PDF and Microsoft Word). Each file type has a corresponding filter. The file types and filters are listed under Supported file formats below.
  3. Download and install the selected filters.
  4. Restart the site.
  5. Add some supported file attachments to your site.
  6. Log into your website and go to Find > Overview > Explore.
  7. Find the attachments and verify their content is stored as readable text under SearchAttachmentText$$String.
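The verification in the last step can be sketched in Python. This assumes you have copied an indexed document out of the explore view as JSON and loaded it into a dictionary; the dictionary shape and both helper names are hypothetical, and only the field name `SearchAttachmentText$$String` comes from the steps above. The base64 check is a heuristic, not a guarantee.

```python
import base64
import binascii

# Field name as shown in the explore view.
FIELD = "SearchAttachmentText$$String"


def looks_like_base64(value: str) -> bool:
    """Heuristic: a whitespace-free value that decodes as strict base64."""
    stripped = value.strip()
    if not stripped or any(ch.isspace() for ch in stripped):
        return False
    try:
        base64.b64decode(stripped, validate=True)
    except (binascii.Error, ValueError):
        return False
    return True


def attachment_text_status(document: dict) -> str:
    """Classify the attachment text of one indexed document (assumed shape)."""
    value = document.get(FIELD)
    if not value:
        return "missing"
    return "base64" if looks_like_base64(value) else "readable"


# Hypothetical document as it might look after the filters are installed.
print(attachment_text_status({FIELD: "Quarterly sales report"}))  # readable
```

Because a short single word such as "test" is also valid base64, treat a "base64" result as a prompt to look at the value, not as proof that parsing failed.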

Supported file formats

Using IFilters with Search & Navigation, you can parse the file types below.

File types: adw, ai, doc, docm, docx, dwg, eps, gif, html, htm, jpeg, jpg, mm, msg, odt, ods, odp, odi, one, otf, otp, pdf, png, ppt, pptm, pptx, ps, rar, sda, sdg, sdm, sfs, sgf, smf, std, sti, stw, svg, sxd, sxi, txt, vdx, vsd, vor, vss, vst, vsx, vtx, wma, wmv, xls, xlsb, xlsm, xlsx, xml, zip
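When deciding which uploads are worth checking, a small lookup against the list above can help. The following Python sketch is illustrative only; `is_parsable` is a hypothetical helper, and a match only means the file type appears in the list, not that a matching filter is installed.

```python
from pathlib import PurePath

# Extensions copied from the supported file types list above.
SUPPORTED_EXTENSIONS = frozenset(
    """
    adw ai doc docm docx dwg eps gif html htm jpeg jpg mm msg odt ods odp odi
    one otf otp pdf png ppt pptm pptx ps rar sda sdg sdm sfs sgf smf std sti
    stw svg sxd sxi txt vdx vsd vor vss vst vsx vtx wma wmv xls xlsb xlsm xlsx
    xml zip
    """.split()
)


def is_parsable(filename: str) -> bool:
    """Return True if the file's last extension is in the supported list."""
    extension = PurePath(filename).suffix.lower().lstrip(".")
    return extension in SUPPORTED_EXTENSIONS


print(is_parsable("Annual-Report.PDF"))  # True
print(is_parsable("notes.md"))           # False
```

Only the final extension is considered, so a name such as `archive.tar.gz` is treated as a `gz` file and is not matched.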

For many file types, more than one filter is available. For example, you can find more filters on IFilterShop.

Some common file types and their filters are listed below.

PDF

Adobe has the PDF IFilter, although it does not work in all environments.

Microsoft Office 2010 filter packs

Microsoft's filter pack includes the filters below.

  • Legacy Office Filter (97-2003; .doc, .ppt, .xls)
  • Metro Office Filter (2007; .docx, .pptx, .xlsx)
  • Zip Filter
  • OneNote Filter
  • Visio Filter
  • Publisher Filter
  • Open Document Format Filter
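To plan which filters to install for a given set of file types, the extension-to-filter pairs named in this section can be put into a small lookup. This Python sketch is a planning aid under stated assumptions: it maps only the extensions that this section explicitly ties to a filter, the filter labels are shorthand for the names above, and `filters_to_install` is a hypothetical helper. Extensions for the other filters in the pack are not listed here, so they are reported as unmapped and not guessed.

```python
# Only the pairs stated in this section; everything else is left unmapped.
FILTER_BY_EXTENSION = {
    "pdf": "PDF IFilter (Adobe)",
    "doc": "Legacy Office Filter",
    "ppt": "Legacy Office Filter",
    "xls": "Legacy Office Filter",
    "docx": "Metro Office Filter",
    "pptx": "Metro Office Filter",
    "xlsx": "Metro Office Filter",
}


def filters_to_install(extensions):
    """Return (sorted filter names, sorted extensions with no known filter)."""
    needed, unmapped = set(), set()
    for raw in extensions:
        extension = raw.lower().lstrip(".")
        filter_name = FILTER_BY_EXTENSION.get(extension)
        if filter_name:
            needed.add(filter_name)
        else:
            unmapped.add(extension)
    return sorted(needed), sorted(unmapped)


needed, unmapped = filters_to_install([".PDF", "docx", "xlsx", "vsd"])
print(needed)    # ['Metro Office Filter', 'PDF IFilter (Adobe)']
print(unmapped)  # ['vsd']
```

Anything returned as unmapped needs a manual check against the filter documentation before you decide what to download.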