Connect external sources to search

A search connector integrates Optimizely's search infrastructure with external data sources or systems. Connectors index content from those sources so that it can be searched and retrieved through the Optimizely platform.

Include external content

Optimizely Search & Navigation supports these predefined search connector types to include external content: Crawler and RSS/Atom.

Content indexed with connectors uses EPiServer.Find.Framework.WebContent.

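// Type used for content that is indexed through connectors.
public class WebContent {
  public string SearchTitle;    // Title of the search hit
  public string SearchHitUrl;   // URL of the search hit
  public string SearchText;     // Text of the indexed content
  public string SearchSummary;  // Summary of the indexed content
  // Additional metadata, keyed by name
  public Dictionary<string, IndexValue> SearchMetaData { get; set; }
}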

Exclude media types

From the Optimizely Search & Navigation administrative interface, you can fine-tune indexing by excluding internet media types and by including or excluding parts of a website from crawling and indexing. See Add connectors.

When excluding media types, follow the standard method of classifying internet file types. See also: Media Types. The following media types are excluded by default when indexing:

text/css
text/javascript
text/ecmascript
application/x-pointplus
application/x-javascript
application/javascript
application/ecmascript
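
To illustrate how a media type exclusion list like this one can be applied, here is a minimal sketch that checks a Content-Type header value against the default exclusions. The class name MediaTypeFilter and the method IsExcluded are hypothetical and are not part of the Optimizely API. The sketch assumes that only the type/subtype part of the header is compared and that the comparison ignores case, as is usual for media types.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of Optimizely Search & Navigation.
public static class MediaTypeFilter
{
    // Default exclusions as listed in this section.
    // Assumption: media type names are compared case-insensitively.
    private static readonly HashSet<string> ExcludedMediaTypes =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "text/css",
            "text/javascript",
            "text/ecmascript",
            "application/x-pointplus",
            "application/x-javascript",
            "application/javascript",
            "application/ecmascript",
        };

    // Returns true when the Content-Type header value names an excluded media type.
    public static bool IsExcluded(string contentTypeHeader)
    {
        if (string.IsNullOrWhiteSpace(contentTypeHeader))
        {
            return false;
        }

        // Drop parameters such as "; charset=utf-8" before comparing.
        int separator = contentTypeHeader.IndexOf(';');
        string mediaType = separator >= 0
            ? contentTypeHeader.Substring(0, separator)
            : contentTypeHeader;

        return ExcludedMediaTypes.Contains(mediaType.Trim());
    }
}
```

For example, MediaTypeFilter.IsExcluded("text/css; charset=utf-8") returns true, while MediaTypeFilter.IsExcluded("text/html") returns false.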

Exclude query strings

You can exclude any query string parameter. A common use case is excluding known tracking parameters. For example, in the URL http://www.episerver.se/cms/innehallshantering?utm_source=google, you can exclude utm_source so that crawling does not unintentionally increment a campaign counter.

Common exclusions of this type:

PHPSESSID
SESSIONID
JSESSIONID
ASPSESSIONID
sid
zenid

Note: All strings are case-sensitive. Do not include wildcards or whitespace.
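
The effect of excluding query string parameters can be sketched as follows. This is an illustration under stated assumptions, not the connector's actual implementation: the class QueryStringFilter and the method RemoveParameters are hypothetical names, and the sketch assumes that an excluded parameter is removed from the URL before it is requested. Parameter names are compared case-sensitively, in line with the note above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of Optimizely Search & Navigation.
public static class QueryStringFilter
{
    // Returns the URL without the query string parameters named in excludedNames.
    // Assumption: names are matched exactly (case-sensitive, no wildcards).
    public static string RemoveParameters(string url, ISet<string> excludedNames)
    {
        // Split off the fragment first so a '?' inside it is not mistaken for a query.
        int fragmentStart = url.IndexOf('#');
        string fragment = fragmentStart >= 0 ? url.Substring(fragmentStart) : "";
        string withoutFragment = fragmentStart >= 0 ? url.Substring(0, fragmentStart) : url;

        int queryStart = withoutFragment.IndexOf('?');
        if (queryStart < 0)
        {
            return url; // No query string, nothing to remove.
        }

        string path = withoutFragment.Substring(0, queryStart);

        // Keep every name=value pair whose name is not excluded.
        string[] kept = withoutFragment.Substring(queryStart + 1)
            .Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(pair => !excludedNames.Contains(pair.Split('=')[0]))
            .ToArray();

        string query = kept.Length > 0 ? "?" + string.Join("&", kept) : "";
        return path + query + fragment;
    }
}
```

With excludedNames set to new HashSet<string> { "utm_source" }, the URL http://www.episerver.se/cms/innehallshantering?utm_source=google becomes http://www.episerver.se/cms/innehallshantering. A parameter spelled UTM_SOURCE is left in place because the default string comparison of HashSet<string> is case-sensitive.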

Patterns and globbing

Globbing lets you expand a non-specific file name containing a wildcard character into a set of specific file names for storage on a computer, server, or network. Excluded fields support common glob patterns. The crawler connector uses patterns similar to those in robots.txt.

Pattern	Example	Corresponding regex
'*'	*/abc/* *://*/root/*	.*/abc/.* .*://.*/root/.*
'?'	*/???/*	.*/.../.*
'{', '}', ','	*{abc,def}*	.*(abc|def).*
'[', ']', '!', ','	*[0-9,xyz][!abc]*	.*[0-9xyz][^abc].*
','	*abc,def*	.*abc,def.*
'\'	*\*\?\,\{\}\[\]\\*	.*\*\?,\{\}\[\]\\.*
'.', '(', ')', '+', '|', '^', '$', '@', '%'	*.()+|^$@%*	.*\.\(\)\+\|\^\$@%.*
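
The correspondence between glob patterns and regular expressions can be sketched in code. The translator below is a minimal illustration, not the crawler's actual implementation: GlobSketch, ToRegex, and IsMatch are hypothetical names, and the sketch assumes that a pattern must match the whole URL. It does not validate input, so an unclosed '[' produces an invalid regular expression.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of Optimizely Search & Navigation.
public static class GlobSketch
{
    // Escapes one character so that it matches literally in a regex.
    // Regex.Escape leaves ']' and '}' unescaped, so they are handled explicitly.
    private static string Literal(char c) =>
        c == ']' || c == '}' ? "\\" + c : Regex.Escape(c.ToString());

    // Translates a glob pattern into a regex string.
    public static string ToRegex(string glob)
    {
        var regex = new StringBuilder();
        int braceDepth = 0;
        bool inBrackets = false;

        for (int i = 0; i < glob.Length; i++)
        {
            char c = glob[i];

            if (c == '\\' && i + 1 < glob.Length)
            {
                // A backslash makes the next character literal.
                regex.Append(Literal(glob[++i]));
            }
            else if (inBrackets)
            {
                if (c == ']') { inBrackets = false; regex.Append(']'); }
                else if (c == '!' && glob[i - 1] == '[') regex.Append('^'); // negation
                else if (c != ',') regex.Append(c); // commas only separate members
            }
            else if (c == '[') { inBrackets = true; regex.Append('['); }
            else if (c == '*') regex.Append(".*");
            else if (c == '?') regex.Append('.');
            else if (c == '{') { braceDepth++; regex.Append('('); }
            else if (c == '}' && braceDepth > 0) { braceDepth--; regex.Append(')'); }
            else if (c == ',' && braceDepth > 0) regex.Append('|'); // alternatives
            else regex.Append(Literal(c)); // everything else is literal
        }

        return regex.ToString();
    }

    // Assumption: the pattern has to match the entire input, not a substring.
    public static bool IsMatch(string glob, string input) =>
        Regex.IsMatch(input, "^(?:" + ToRegex(glob) + ")$");
}
```

For example, GlobSketch.ToRegex("*{abc,def}*") returns .*(abc|def).* and GlobSketch.IsMatch("*/abc/*", "http://example.com/abc/page") returns true.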

Include patterns

Parameter name: included_crawl_patterns. It can be a single globbing pattern as a string or an array of globbing patterns.

Default – Seed base URLs.

Exclude patterns

Parameter name: excluded_crawl_patterns. It can be a single globbing pattern as a string or an array of globbing patterns. Exclude patterns override include patterns.

Default –

*.{avi,bmp,css,gif,gz,ico,jpeg,jpg,js,m4v,mid,mov,mp2,mp3,mp4,mpeg,png,ram,rar,rm,smil,swf,tif,tiff,wav,wma,wmv,zip}
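
The rule that exclude patterns override include patterns can be sketched as a small decision function. This is an illustration under stated assumptions rather than the crawler's actual logic: CrawlDecision and ShouldCrawl are hypothetical names, the patterns are assumed to be already translated from globs into regular expressions, and a URL that matches no include pattern is assumed not to be crawled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of Optimizely Search & Navigation.
public static class CrawlDecision
{
    // Decides whether a URL would be crawled, given include and exclude patterns
    // that have already been translated from globs into regular expressions.
    public static bool ShouldCrawl(
        string url,
        IEnumerable<Regex> includePatterns,
        IEnumerable<Regex> excludePatterns)
    {
        // Exclude patterns are checked first because they override include patterns.
        if (excludePatterns.Any(pattern => pattern.IsMatch(url)))
        {
            return false;
        }

        // Assumption: a URL must match at least one include pattern to be crawled.
        return includePatterns.Any(pattern => pattern.IsMatch(url));
    }
}
```

As a usage example, with the include pattern ^https://example\.com/.*$ and the exclude pattern ^.*\.(css|js)$, the URL https://example.com/site.css is rejected even though it matches the include pattern, while https://example.com/page is accepted.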

No index patterns

Parameter name: excluded_index_patterns. It can be a single globbing pattern as a string or an array of globbing patterns.
