Connect external sources to search

This topic describes the predefined search connectors in Optimizely Search & Navigation and their configuration options. Search connectors let you index content from other sources and integrate the search results for that content on your website.

A search connector is a component that facilitates the integration between Optimizely's search infrastructure and external data sources or systems. Search connectors enable the indexing and retrieval of content from various sources, making it accessible and searchable through the Optimizely platform.

Include external content

Optimizely Search & Navigation supports these predefined search connector types to include external content: Crawler and RSS/Atom.

Content indexed with connectors is represented by the EPiServer.Find.Framework.WebContent class.

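using System.Collections.Generic;

// Shape of the type that holds connector-indexed content.
// IndexValue is a type from the Optimizely Search & Navigation (EPiServer.Find) libraries.
public class WebContent {
  public string SearchTitle;
  public string SearchHitUrl;
  public string SearchText;
  public string SearchSummary;
  public Dictionary<string, IndexValue> SearchMetaData {
    get;
    set;
  }
}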

Exclude media types

From the Optimizely Search & Navigation administrative interface, you can fine-tune indexing by excluding internet media types and by including or excluding parts of a website from crawling and indexing. See Add connectors.

When excluding media types, use the standard internet media type names in type/subtype form, such as text/css. See also: Media Types. The following media types are excluded by default when indexing:

  • text/css
  • text/javascript
  • text/ecmascript
  • application/x-pointplus
  • application/x-javascript
  • application/javascript
  • application/ecmascript
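
As an illustration of how such a filter might behave, the following minimal C# sketch checks a Content-Type header value against the default exclusions listed above. The names MediaTypeFilter and IsExcluded are hypothetical and are not part of the Optimizely API. The case-insensitive comparison and the removal of parameters such as charset are assumptions about how a crawler could normalize the header, not documented connector behavior.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the Optimizely API.
public static class MediaTypeFilter
{
    // Default exclusions as listed above. The comparison ignores case
    // (an assumption; media type names are case-insensitive by specification).
    private static readonly HashSet<string> ExcludedByDefault =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "text/css",
            "text/javascript",
            "text/ecmascript",
            "application/x-pointplus",
            "application/x-javascript",
            "application/javascript",
            "application/ecmascript"
        };

    // Returns true when the Content-Type header names an excluded media type.
    // Parameters such as "; charset=utf-8" are ignored.
    public static bool IsExcluded(string contentTypeHeader)
    {
        if (string.IsNullOrWhiteSpace(contentTypeHeader))
        {
            return false;
        }

        string mediaType = contentTypeHeader.Split(';')[0].Trim();
        return ExcludedByDefault.Contains(mediaType);
    }
}
```

With this sketch, MediaTypeFilter.IsExcluded("text/css; charset=utf-8") returns true and MediaTypeFilter.IsExcluded("text/html") returns false.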

Exclude query strings

You can exclude any query string parameter from crawled URLs. A typical use case is excluding known tracking parameters. For example, in the URL http://www.episerver.se/cms/innehallshantering?utm_source=google, you can exclude utm_source to prevent the crawler from unintentionally incrementing a campaign counter.

Common exclusions of this type:

  • PHPSESSID
  • SESSIONID
  • JSESSIONID
  • ASPSESSIONID
  • sid
  • zenid

Note

All strings are case-sensitive. Do not include wildcards or whitespace.
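
To make the effect of a query string exclusion concrete, here is a minimal C# sketch that removes excluded parameters from a URL before it is requested. QueryStringFilter and RemoveParameters are hypothetical names used only for illustration; the connector's actual implementation is not documented here. Parameter names are compared case-sensitively, in line with the note above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the Optimizely API.
public static class QueryStringFilter
{
    // Removes the named parameters from the URL's query string.
    // Names are compared case-sensitively (ordinal comparison).
    public static string RemoveParameters(string url, IEnumerable<string> excludedNames)
    {
        var excluded = new HashSet<string>(excludedNames, StringComparer.Ordinal);

        // Set the fragment aside so it is not mistaken for part of the query.
        string fragment = string.Empty;
        int hashIndex = url.IndexOf('#');
        if (hashIndex >= 0)
        {
            fragment = url.Substring(hashIndex);
            url = url.Substring(0, hashIndex);
        }

        int queryIndex = url.IndexOf('?');
        if (queryIndex < 0)
        {
            return url + fragment;
        }

        string path = url.Substring(0, queryIndex);

        // Keep every name=value pair whose name is not in the exclusion set.
        IEnumerable<string> kept = url.Substring(queryIndex + 1)
            .Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(pair => !excluded.Contains(pair.Split('=')[0]));

        string query = string.Join("&", kept);
        return (query.Length > 0 ? path + "?" + query : path) + fragment;
    }
}
```

Calling RemoveParameters with the example URL above and the single exclusion utm_source yields http://www.episerver.se/cms/innehallshantering. An exclusion of sid leaves a parameter named SID untouched, because the comparison is case-sensitive.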

Patterns and globbing

Globbing expands a non-specific name that contains wildcard characters into the set of specific names that match it. Excluded fields support common glob patterns. The crawler connector uses patterns similar to those in robots.txt.

Pattern                                      Example             Corresponding regex
'*'                                          */abc/*             .*/abc/.*
                                             *://*/root/*        .*://.*/root/.*
'?'                                          */???/*             .*/.../.*
'{', '}', ','                                {abc,def}.*         (abc|def).*
'[', ']', '!', ','                           [0-9,xyz][!abc].*   [0-9xyz][^abc].*
','                                          abc,def.*           abc,def.*
'\'                                          \*\?\,\{\}\[\]\\.*  \*\?\,\{\}\[\]\\.*
'.', '(', ')', '+', '|', '^', '$', '@', '%'  .()+|^$@%.*         \.\(\)\+\|\^\$\@\%.*
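
The character mappings listed above can be sketched as a small glob-to-regex translator. This is an illustrative approximation under stated assumptions, not the connector's implementation: GlobPattern and ToRegex are hypothetical names, nested braces and wildcards inside brackets are not handled, and the result is returned without anchors.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the Optimizely API.
public static class GlobPattern
{
    // Characters that are literal in a glob but special in a regex.
    private const string RegexSpecials = ".()+|^$@%";

    // Translates a glob pattern into a regex string.
    public static string ToRegex(string glob)
    {
        var regex = new StringBuilder();
        bool inBraces = false;   // inside {a,b}
        bool inBrackets = false; // inside [a-z]

        for (int i = 0; i < glob.Length; i++)
        {
            char c = glob[i];
            switch (c)
            {
                case '\\':
                    // Escape character: emit the next character as a literal.
                    if (i + 1 < glob.Length)
                    {
                        regex.Append(Regex.Escape(glob[++i].ToString()));
                    }
                    else
                    {
                        regex.Append(@"\\");
                    }
                    break;
                case '*': regex.Append(".*"); break;
                case '?': regex.Append('.'); break;
                case '{': inBraces = true; regex.Append('('); break;
                case '}': inBraces = false; regex.Append(')'); break;
                case '[': inBrackets = true; regex.Append('['); break;
                case ']': inBrackets = false; regex.Append(']'); break;
                case '!': regex.Append(inBrackets ? '^' : '!'); break;
                case ',':
                    // Alternation inside braces, dropped inside brackets,
                    // a literal comma everywhere else.
                    if (inBraces) regex.Append('|');
                    else if (!inBrackets) regex.Append(',');
                    break;
                default:
                    if (RegexSpecials.IndexOf(c) >= 0) regex.Append('\\');
                    regex.Append(c);
                    break;
            }
        }

        return regex.ToString();
    }
}
```

To test a URL against a translated pattern, one option is to anchor the result, for example Regex.IsMatch(url, "^" + GlobPattern.ToRegex("*://*/root/*") + "$"). Whether the crawler matches the whole URL or only part of it is an assumption here.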

Include patterns

Parameter name: included_crawl_patterns. The value can be a single globbing pattern as a string or an array of globbing patterns.

Default – Seed base URLs.

Exclude patterns

Parameter name: excluded_crawl_patterns. The value can be a single globbing pattern as a string or an array of globbing patterns. Exclude patterns override include patterns.

Default –

.{avi,bmp,css,gif,gz,ico,jpeg,jpg,js,m4v,mid,mov,mp2,mp3,mp4,mpeg,png,ram,rar,rm,smil,swf,tif,tiff,wav,wma,wmv,zip}
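
The precedence rule, where an exclude pattern wins over an include pattern, can be sketched as follows. CrawlDecision and ShouldCrawl are hypothetical names. The sketch assumes the glob patterns have already been translated to regular expressions and that a pattern must match the whole URL; neither detail is confirmed by the documentation above.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the Optimizely API.
public static class CrawlDecision
{
    // Returns true when the URL matches at least one include pattern
    // and no exclude pattern. Exclude patterns are checked first.
    public static bool ShouldCrawl(
        string url,
        IEnumerable<string> includeRegexes,
        IEnumerable<string> excludeRegexes)
    {
        // Assumption: a pattern must match the entire URL.
        bool Matches(string pattern) =>
            Regex.IsMatch(url, "^(?:" + pattern + ")$");

        if (excludeRegexes.Any(Matches))
        {
            return false;
        }

        return includeRegexes.Any(Matches);
    }
}
```

For example, with the include pattern .*://example\.com/.* and the exclude pattern .*\.zip, the URL http://example.com/a.html is crawled, while http://example.com/a.zip is skipped even though it matches the include pattern.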

No index patterns

Parameter name: excluded_index_patterns. The value can be a single globbing pattern as a string or an array of globbing patterns.