Connectors

This topic describes the predefined search connectors in Optimizely Search & Navigation and their configuration options. Search connectors let you index content from other sources and include that content in the search results on your website.

Including external content

Optimizely Search & Navigation supports these predefined search connector types to include external content: Crawler and RSS/Atom.

Content indexed with connectors is represented by the EPiServer.Find.Framework.WebContent class.

public class WebContent
{
    public String SearchTitle;
    public String SearchHitUrl;
    public String SearchText;
    public String SearchSummary;
    public Dictionary<string, IndexValue> SearchMetaData { get; set; }
}

Fine-tuning crawling and indexing

From the Optimizely Search & Navigation administrative interface, you can fine-tune indexing by excluding internet media types and by excluding or including parts of a website to be crawled and indexed. See the Optimizely User Guide.

Excluding media types

When excluding media types, follow the standard method of classifying internet file types. See also: Media Types. The following media types are excluded by default when indexing:

  • text/css
  • text/javascript
  • text/ecmascript
  • application/x-pointplus
  • application/x-javascript
  • application/javascript
  • application/ecmascript
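
To illustrate how such an exclusion list might be applied, here is a minimal sketch that checks a Content-Type header value against the default list above. The class and method names (MediaTypeFilter, IsExcluded) are hypothetical and not part of the Optimizely API; the sketch also assumes that header parameters such as charset are ignored and that media types are compared case-insensitively.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the Optimizely Search & Navigation API.
public static class MediaTypeFilter
{
    // Default exclusions copied from the list above.
    // Media type names are case-insensitive, so the set ignores case.
    private static readonly HashSet<string> ExcludedByDefault =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "text/css",
            "text/javascript",
            "text/ecmascript",
            "application/x-pointplus",
            "application/x-javascript",
            "application/javascript",
            "application/ecmascript",
        };

    // Returns true when the Content-Type header value names an excluded media type.
    public static bool IsExcluded(string contentTypeHeader)
    {
        if (string.IsNullOrWhiteSpace(contentTypeHeader))
        {
            return false;
        }

        // Drop parameters such as "; charset=utf-8" before comparing.
        int semicolon = contentTypeHeader.IndexOf(';');
        string mediaType = semicolon >= 0
            ? contentTypeHeader.Substring(0, semicolon)
            : contentTypeHeader;

        return ExcludedByDefault.Contains(mediaType.Trim());
    }
}
```

For example, MediaTypeFilter.IsExcluded("text/css; charset=utf-8") returns true, while MediaTypeFilter.IsExcluded("text/html") returns false.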

Excluding query strings

You can exclude any query string parameter. A typical use case is excluding known tracking parameters. For example, in the URL http://www.episerver.se/cms/innehallshantering?utm_source=google, you can exclude utm_source to prevent the unintentional incrementing of a campaign counter.

Common exclusions of this type:

  • PHPSESSID
  • SESSIONID
  • JSESSIONID
  • ASPSESSIONID
  • sid
  • zenid

Note: All strings are case-sensitive. Do not include wildcards or whitespace.
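
The effect of excluding query strings can be sketched as follows. This is an illustration only, not the connector's actual implementation: QueryStringCleaner and RemoveParameters are hypothetical names, and the sketch assumes that an excluded parameter is simply dropped from the URL, with names compared case-sensitively as described in the note.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the Optimizely Search & Navigation API.
public static class QueryStringCleaner
{
    // Removes every query parameter whose name is in excludedNames.
    // The default HashSet<string> comparer is ordinal, so matching is case-sensitive.
    public static string RemoveParameters(string url, ISet<string> excludedNames)
    {
        // Set the fragment aside first so a '?' inside it is not misread.
        int hash = url.IndexOf('#');
        string fragment = hash >= 0 ? url.Substring(hash) : "";
        string withoutFragment = hash >= 0 ? url.Substring(0, hash) : url;

        int queryStart = withoutFragment.IndexOf('?');
        if (queryStart < 0)
        {
            return url; // No query string, nothing to remove.
        }

        string basePart = withoutFragment.Substring(0, queryStart);
        string query = withoutFragment.Substring(queryStart + 1);

        // Keep only the name=value pairs whose name is not excluded.
        IEnumerable<string> kept = query
            .Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(pair => !excludedNames.Contains(pair.Split('=')[0]));

        string cleaned = string.Join("&", kept);
        return cleaned.Length == 0
            ? basePart + fragment
            : basePart + "?" + cleaned + fragment;
    }
}
```

With the exclusion set { "utm_source" }, the URL http://www.episerver.se/cms/innehallshantering?utm_source=google is reduced to http://www.episerver.se/cms/innehallshantering. A parameter spelled UTM_SOURCE is kept, because the comparison is case-sensitive.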

Patterns and globbing

Globbing lets you expand a non-specific file name containing a wildcard character into a set of specific file names for storage on a computer, server, or network. All excluded fields support common glob patterns. The crawler connector uses patterns similar to those in robots.txt.

Pattern                                       Example            Corresponding regex
'*'                                           */abc/             .*/abc/.*
                                              /root/             .*://.*/root/.*
'?'                                           */???/             .*/.../.*
'{', '}', ','                                 {abc,def}          .*(abc
'[', ']', '!', ','                            [0-9,xyz][!abc]    .*[0-9xyz][^abc].*
','                                           abc,def            .*abc,def.*
''                                            \*?,{}[]\          .*\*?,{}[]\.*
'.', '(', ')', '+', '', '^', '$', '@', '%'    .()+
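
The glob-to-regex mapping can be sketched in code. The following is a simplified illustration, not the crawler's actual translation: GlobPattern, ToRegex, and IsMatch are hypothetical names, the brace alternation rule ({a,b} becomes (a|b)) is an assumption, and the scheme and host prefix that the table shows for patterns starting with '/' is not reproduced. Instead of adding a leading and trailing .*, the sketch uses an unanchored match, which has the same effect.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the Optimizely Search & Navigation API.
public static class GlobPattern
{
    // Translates a glob into a regex fragment. Unbalanced '[' or '{'
    // produce an invalid regex; a fuller version would validate them.
    public static string ToRegex(string glob)
    {
        var regex = new StringBuilder();
        int braceDepth = 0;
        bool inBrackets = false;

        for (int i = 0; i < glob.Length; i++)
        {
            char c = glob[i];
            if (c == '\\' && i + 1 < glob.Length)
            {
                regex.Append(Literal(glob[++i])); // Backslash: next char is literal.
            }
            else if (inBrackets)
            {
                if (c == ']') { regex.Append(']'); inBrackets = false; }
                else if (c == ',') { /* Separator only; nothing to emit. */ }
                else if (c == '[' || c == '^') { regex.Append('\\').Append(c); }
                else { regex.Append(c); } // Includes '-' so ranges such as 0-9 work.
            }
            else if (c == '[')
            {
                regex.Append('[');
                inBrackets = true;
                // '!' right after '[' negates the set.
                if (i + 1 < glob.Length && glob[i + 1] == '!') { regex.Append('^'); i++; }
            }
            else if (c == '*') { regex.Append(".*"); }
            else if (c == '?') { regex.Append('.'); }
            else if (c == '{') { regex.Append('('); braceDepth++; }
            else if (c == '}' && braceDepth > 0) { regex.Append(')'); braceDepth--; }
            else if (c == ',' && braceDepth > 0) { regex.Append('|'); }
            else { regex.Append(Literal(c)); }
        }
        return regex.ToString();
    }

    // Unanchored match: equivalent to wrapping the regex in leading and trailing ".*".
    public static bool IsMatch(string glob, string url)
    {
        return Regex.IsMatch(url, ToRegex(glob));
    }

    // Regex.Escape leaves ']', '}' and '-' untouched; escape them as well so a
    // literal is safe both inside and outside a character class.
    private static string Literal(char c)
    {
        return (c == ']' || c == '}' || c == '-')
            ? "\\" + c
            : Regex.Escape(c.ToString());
    }
}
```

For example, GlobPattern.ToRegex("[0-9,xyz][!abc]") yields [0-9xyz][^abc], and GlobPattern.IsMatch("*/???/", "http://example.com/abc/page") returns true.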

Include patterns

Parameter name: included_crawl_patterns. Can be a single globbing pattern as a string or an array of globbing patterns.
Default: Seed base URLs.

Exclude patterns

Parameter name: excluded_crawl_patterns. Can be a single globbing pattern as a string or an array of globbing patterns. Exclude patterns override include patterns.
Default: '.{avi,bmp,css,gif,gz,ico,jpeg,jpg,js,m4v,mid,mov,mp2,mp3,mp4,mpeg,png,ram,rar,rm,smil,swf,tif,tiff,wav,wma,wmv,zip}'
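
The precedence between the two parameters can be sketched as follows. This is an assumption-laden illustration rather than the crawler's code: CrawlDecision and ShouldCrawl are hypothetical names, and the patterns are assumed to have already been translated from globs into regular expressions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the Optimizely Search & Navigation API.
public static class CrawlDecision
{
    // A URL is crawled only if no exclude pattern matches it
    // and at least one include pattern does.
    public static bool ShouldCrawl(
        string url,
        IEnumerable<string> includeRegexes,
        IEnumerable<string> excludeRegexes)
    {
        // Exclude patterns are checked first because they override include patterns.
        if (excludeRegexes.Any(pattern => Regex.IsMatch(url, pattern)))
        {
            return false;
        }

        return includeRegexes.Any(pattern => Regex.IsMatch(url, pattern));
    }
}
```

With an include pattern of ^http://example\.com/ and an exclude pattern of \.(css|js)$, the URL http://example.com/page is crawled, while http://example.com/site.css is skipped even though it also matches the include pattern.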

No index patterns

Parameter name: excluded_index_patterns. Can be a single globbing pattern as a string or an array of globbing patterns.
