File attachments
Describes supported file formats and explains how to work with search relevancy for attached files.
With Optimizely Search & Navigation, you can index external attachments in different formats, such as Word and PDF.
To index attachments using the .NET API, create an instance of a class that has a property of type Attachment (found in the EPiServer.Find namespace). The Attachment class constructor has a single parameter of type Func<FileStream>. Another class, FileAttachment (also in the EPiServer.Find namespace), requires a file path as a constructor parameter.
For example, you create a class named Document.
public class Document {
  public string Name { get; set; }
  public Attachment Attachment { get; set; }
}
You can index an instance of the Document class to index a Word document along with some metadata (Name in this example).
var path = "TestData/Memoirs.docx";
var document = new Document() {
  Name = "My memoirs",
  // No semicolon inside an object initializer; the statement ends after the closing brace.
  Attachment = new FileAttachment(path)
};
client.Index(document);
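The same document can also be built with the Attachment constructor instead of FileAttachment. The following is a minimal sketch, assuming the Func<FileStream> parameter described above is a factory that returns a newly opened stream for the file; the IndexFromStream helper and the AttachmentIndexingSketch class are hypothetical names used only for illustration.

```csharp
using System.IO;
using EPiServer.Find;

public class Document
{
    public string Name { get; set; }
    public Attachment Attachment { get; set; }
}

public static class AttachmentIndexingSketch
{
    // Hypothetical helper: indexes the file at 'path' through a stream factory.
    public static void IndexFromStream(IClient client, string path)
    {
        var document = new Document
        {
            Name = "My memoirs",
            // File.OpenRead returns a FileStream, so this lambda
            // satisfies a Func<FileStream> parameter.
            Attachment = new Attachment(() => File.OpenRead(path))
        };

        client.Index(document);
    }
}
```

A stream factory can be useful when the file is not available as a plain path at the point where the document object is created.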
You can search the indexed Word document. For example, if it contains "Banana," the result variable below would have a hit.
var result = client.Search<Document>()
  .For("Banana")
  .GetResult();
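To inspect what came back, the query can project each hit to its metadata. This is a sketch only, assuming the Document class from the example above, and assuming the client's Select projection and the TotalMatching property of the result behave as in other Search & Navigation queries; the PrintBananaHits helper and the SearchSketch class are hypothetical.

```csharp
using System;
using EPiServer.Find;

public static class SearchSketch
{
    // Hypothetical helper: prints the Name of every document matching "Banana".
    public static void PrintBananaHits(IClient client)
    {
        var result = client.Search<Document>()
            .For("Banana")
            // Project to the Name metadata so only that field is returned.
            .Select(d => d.Name)
            .GetResult();

        Console.WriteLine("Total matching documents: " + result.TotalMatching);

        foreach (var name in result)
        {
            Console.WriteLine(name);
        }
    }
}
```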
Note
A REST API issue causes an exception the first time an instance of a type with an Attachment property (Document in this example) is indexed. This only happens the first time; after that, indexing works as expected.
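One way to cope with the first-time exception is to retry the indexing call once. The sketch below is an illustration, not an Optimizely-provided API: FirstIndexRetry and RunWithOneRetry are hypothetical names, and because the source does not name the exception type, the sketch catches Exception broadly. Narrow the catch once you know the actual type.

```csharp
using System;

public static class FirstIndexRetry
{
    // Runs the action; if it throws, runs it exactly one more time.
    // A failure on the second attempt propagates to the caller.
    public static void RunWithOneRetry(Action indexAction)
    {
        if (indexAction == null) throw new ArgumentNullException(nameof(indexAction));

        try
        {
            indexAction();
        }
        catch (Exception)
        {
            // Assumption: the first failure is the known first-index issue.
            indexAction();
        }
    }
}
```

With the earlier example, the call would look like FirstIndexRetry.RunWithOneRetry(() => client.Index(document));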
Improve search relevancy of attachments
By default, search relevancy for text inside an attachment is imperfect because attachments are indexed in the default language, which might not match the language of the document's content. (In contrast, Optimizely Content Management System (CMS) content is indexed using enabled languages to improve search relevancy.)
Also, when you browse an attachment in the Optimizely Search & Navigation explore view, the attachment text is not readable because the attachment is indexed as its Base64 representation.
To improve the search relevancy of text attachments, use the IAttachmentHelper interface, which lets developers implement their own parsing of attachments. Optimizely provides an implementation of IAttachmentHelper that uses Microsoft IFilter functionality. For this to work, the correct IFilters need to be installed on the client.
Implement the IAttachmentHelper
- Install the EPiServer.Find.Cms.AttachmentFilter NuGet package.
- Determine which attachment file types you want to support (for example, PDF and Microsoft Word). Each file type has a corresponding filter. The list of file types and filters is below.
- Download and install the selected filters.
- Restart.
- Add some supported file attachments under your site's media folder.
- Log into your website and go to Find > Overview > Explore.
- Find the attachments and verify that their content is stored as readable text under SearchAttachmentText$$String.
Supported file formats
IFilters and Optimizely Search & Navigation can parse the following file types.
adw, ai, doc, docm, docx, dwg, eps, gif, html, htm, jpeg, jpg, mm, msg, odt, ods, odp, odi, one, otf, otp, pdf, png, ppt, pptm, pptx, ps, rar, sda, sdg, sdm, sfs, sgf, smf, std, sti, stw, svg, sxd, sxi, txt, vdx, vsd, vor, vss, vst, vsx, vtx, wma, wmv, xls, xlsb, xlsm, xlsx, xml, zip
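Before uploading or indexing a file, it can help to check its extension against this list. The sketch below is a self-contained illustration, not part of the Optimizely API: SupportedAttachmentTypes and IsSupported are hypothetical names. Being on the list does not guarantee readable text; that still depends on the matching IFilter being installed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SupportedAttachmentTypes
{
    // Extensions copied from the list above; lookups ignore case.
    private static readonly HashSet<string> Extensions = new HashSet<string>(
        new[]
        {
            "adw", "ai", "doc", "docm", "docx", "dwg", "eps", "gif", "html", "htm",
            "jpeg", "jpg", "mm", "msg", "odt", "ods", "odp", "odi", "one", "otf",
            "otp", "pdf", "png", "ppt", "pptm", "pptx", "ps", "rar", "sda", "sdg",
            "sdm", "sfs", "sgf", "smf", "std", "sti", "stw", "svg", "sxd", "sxi",
            "txt", "vdx", "vsd", "vor", "vss", "vst", "vsx", "vtx", "wma", "wmv",
            "xls", "xlsb", "xlsm", "xlsx", "xml", "zip"
        },
        StringComparer.OrdinalIgnoreCase);

    // Returns true when the file's extension appears in the list above.
    public static bool IsSupported(string path)
    {
        // Path.GetExtension returns ".docx" for "Memoirs.docx", so strip the dot.
        var extension = Path.GetExtension(path)?.TrimStart('.') ?? string.Empty;
        return Extensions.Contains(extension);
    }
}
```

For example, SupportedAttachmentTypes.IsSupported("TestData/Memoirs.docx") checks the docx extension used in the earlier indexing example.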
For many file types, more than one filter is available. See below for common file types and filters, and additional filter sources.
PDF
Adobe has a PDF IFilter, although it does not work in all environments. If your environment is not supported, try the PDF-XChange Viewer from Tracker Software.
Microsoft Office 2010 filter packs
- Legacy Office Filter (97-2003; .doc, .ppt, .xls)
- Metro Office Filter (2007; .docx, .pptx, .xlsx)
- Zip Filter
- OneNote Filter
- Visio Filter
- Publisher Filter
- Open Document Format Filter