Connectors
Describes the predefined search connectors in Optimizely Search & Navigation and their configuration options. Search connectors let you index content from other sources and integrate the search results for that content into your website.
Include external content
Optimizely Search & Navigation supports these predefined search connector types to include external content: Crawler and RSS/Atom.
Content indexed with connectors uses EPiServer.Find.Framework.WebContent.
public class WebContent {
    public String SearchTitle;
    public String SearchHitUrl;
    public String SearchText;
    public String SearchSummary;
    public Dictionary<string, IndexValue> SearchMetaData { get; set; }
}
Exclude media types
From the Optimizely Search & Navigation administrative interface, you can fine-tune indexing by excluding internet media types and by excluding or including the parts of a website to be crawled and indexed. See Add connectors in the Optimizely User Guide.
When excluding media types, follow the standard method of classifying internet file types. See also: Media Types. The following media types are excluded by default when indexing:
- text/css
- text/javascript
- text/ecmascript
- application/x-pointplus
- application/x-javascript
- application/javascript
- application/ecmascript
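As an illustration of how such an exclusion list might be applied, the following sketch checks a Content-Type header value against the default exclusions listed above. The class and method names are hypothetical and not part of the Optimizely API. The case-insensitive comparison and the stripping of parameters such as charset are assumptions based on how media types are generally written, not documented connector behavior.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Optimizely Search & Navigation.
public static class MediaTypeFilter
{
    // The default exclusions listed in this section.
    private static readonly HashSet<string> ExcludedByDefault =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "text/css",
            "text/javascript",
            "text/ecmascript",
            "application/x-pointplus",
            "application/x-javascript",
            "application/javascript",
            "application/ecmascript"
        };

    // Returns true when the Content-Type value names an excluded media type.
    public static bool IsExcluded(string contentType)
    {
        if (string.IsNullOrWhiteSpace(contentType))
        {
            return false;
        }

        // Assumption: parameters such as "; charset=utf-8" are ignored.
        int separator = contentType.IndexOf(';');
        string mediaType = separator >= 0
            ? contentType.Substring(0, separator)
            : contentType;

        return ExcludedByDefault.Contains(mediaType.Trim());
    }
}
```

For example, `MediaTypeFilter.IsExcluded("text/css; charset=utf-8")` returns true, while `MediaTypeFilter.IsExcluded("text/html")` returns false.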
Exclude query strings
You can exclude any query string parameter. A common use case is excluding known tracking parameters. For example, in the URL http://www.episerver.se/cms/innehallshantering?utm_source=google, you can exclude utm_source to prevent the unintentional incrementing of a campaign counter.
Common exclusions of this type:
- PHPSESSID
- SESSIONID
- JSESSIONID
- ASPSESSIONID
- sid
- zenid
Note
All strings are case-sensitive. Do not include wildcards or whitespace.
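The effect of a query string exclusion can be sketched as follows. This is a minimal illustration, not the connector's implementation: the class and method names are hypothetical, and the handling of fragments and of empty query strings is an assumption. Parameter names are compared case-sensitively, as the note above requires.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of Optimizely Search & Navigation.
public static class QueryStringCleaner
{
    // Removes every query parameter whose name is in the excluded set.
    public static string RemoveParameters(string url, ISet<string> excluded)
    {
        // Set the fragment aside so a '?' inside it is not treated as a query.
        int hash = url.IndexOf('#');
        string fragment = hash >= 0 ? url.Substring(hash) : "";
        string withoutFragment = hash >= 0 ? url.Substring(0, hash) : url;

        int queryStart = withoutFragment.IndexOf('?');
        if (queryStart < 0)
        {
            return url; // No query string, nothing to remove.
        }

        string basePart = withoutFragment.Substring(0, queryStart);
        string query = withoutFragment.Substring(queryStart + 1);

        // Keep only the name=value pairs whose name is not excluded.
        IEnumerable<string> kept = query
            .Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(pair => !excluded.Contains(pair.Split('=')[0]));

        string cleaned = string.Join("&", kept);
        return basePart + (cleaned.Length > 0 ? "?" + cleaned : "") + fragment;
    }
}
```

With `new HashSet<string>(StringComparer.Ordinal) { "utm_source" }` as the excluded set, the URL http://www.episerver.se/cms/innehallshantering?utm_source=google is reduced to http://www.episerver.se/cms/innehallshantering, while a parameter named UTM_SOURCE is left in place because the comparison is case-sensitive.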
Patterns and globbing
Globbing expands a non-specific name that contains wildcard characters into the set of specific names that match it. Excluded fields support common glob patterns. The crawler connector uses patterns similar to those in robots.txt.
Pattern | Example | Corresponding regex
---|---|---
'*' | */abc/ | .*/abc/.*
'*' | /root/ | .*://.*/root/.*
'?' | */???/ | .*/.../.*
'{', '}', ',' | {abc,def} | .*(abc|def).*
'[', ']', '!', ',' | [0-9,xyz][!abc] | .*[0-9xyz][^abc].*
',' | abc,def | .*abc,def.*
'\' | \\\*\?\,\{\}\[\]\\ | .*\\\*\?,\{\}\[\]\\.*
'.', '(', ')', '+', '|', '^', '$', '@', '%' | .()+|^$@% | .*\.\(\)\+\|\^\$\@\%.*
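To make the mapping in the table concrete, here is a rough sketch of a glob-to-regex conversion. It is an approximation written for illustration, not the crawler's actual code. The class name, the method names, the implicit `.*` wrapping, and the `.*://.*` prefix for patterns that begin with a slash are all assumptions inferred from the table rows.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Optimizely Search & Navigation.
public static class GlobSketch
{
    // Converts a glob into a regex string, following the table rows.
    public static string ToRegex(string glob)
    {
        var body = new StringBuilder();
        int braceDepth = 0;
        bool inBrackets = false;

        for (int i = 0; i < glob.Length; i++)
        {
            char c = glob[i];

            if (c == '\\' && i + 1 < glob.Length)
            {
                // A backslash makes the next character literal.
                body.Append(Literal(glob[++i]));
            }
            else if (inBrackets)
            {
                if (c == ']') { body.Append(']'); inBrackets = false; }
                else if (c != ',') { body.Append(Literal(c)); } // commas only separate members
            }
            else if (c == '*') { body.Append(".*"); }
            else if (c == '?') { body.Append('.'); }
            else if (c == '{') { body.Append('('); braceDepth++; }
            else if (c == '}' && braceDepth > 0) { body.Append(')'); braceDepth--; }
            else if (c == ',' && braceDepth > 0) { body.Append('|'); }
            else if (c == '[')
            {
                body.Append('[');
                inBrackets = true;
                // A leading '!' negates the character class.
                if (i + 1 < glob.Length && glob[i + 1] == '!') { body.Append('^'); i++; }
            }
            else { body.Append(Literal(c)); }
        }

        string regex = body.ToString();

        // Assumption: a leading slash anchors the pattern after the host part.
        if (glob.StartsWith("/", StringComparison.Ordinal)) { regex = ".*://.*" + regex; }
        else if (!regex.StartsWith(".*", StringComparison.Ordinal)) { regex = ".*" + regex; }

        if (!regex.EndsWith(".*", StringComparison.Ordinal)) { regex += ".*"; }
        return regex;
    }

    // Tests a URL against a glob by matching the whole URL.
    public static bool IsMatch(string url, string glob)
    {
        return Regex.IsMatch(url, "^(?:" + ToRegex(glob) + ")$");
    }

    // Regex.Escape leaves ']' and '}' alone, so escape them explicitly.
    private static string Literal(char c)
    {
        return c == ']' || c == '}' ? "\\" + c : Regex.Escape(c.ToString());
    }
}
```

Under these assumptions, `GlobSketch.ToRegex("{abc,def}")` yields `.*(abc|def).*` and `GlobSketch.ToRegex("[0-9,xyz][!abc]")` yields `.*[0-9xyz][^abc].*`. Characters that need no escaping in a .NET regex, such as '@' and '%', are emitted without a backslash, which is equivalent to the escaped form.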
Include patterns
Parameter name: included_crawl_patterns. It can be a single globbing pattern as a string or an array of globbing patterns.
Default: Seed base URLs.
Exclude patterns
Parameter name: excluded_crawl_patterns. It can be a single globbing pattern as a string or an array of globbing patterns. Overrides include patterns.
Default: '.{avi,bmp,css,gif,gz,ico,jpeg,jpg,js,m4v,mid,mov,mp2,mp3,mp4,mpeg,png,ram,rar,rm,smil,swf,tif,tiff,wav,wma,wmv,zip}'
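The precedence of exclude patterns over include patterns can be sketched as below. This is an illustrative assumption about the evaluation order, not the connector's code: the class and method names are hypothetical, and the patterns are passed as regular expressions that are assumed to have already been derived from the configured globs.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Optimizely Search & Navigation.
public static class CrawlFilter
{
    // Decides whether a URL is crawled. Exclusions are checked first,
    // so a URL matching both lists is not crawled.
    public static bool ShouldCrawl(
        string url,
        IEnumerable<string> includeRegexes,
        IEnumerable<string> excludeRegexes)
    {
        if (excludeRegexes.Any(pattern => MatchesWhole(url, pattern)))
        {
            return false; // Exclude overrides include.
        }

        return includeRegexes.Any(pattern => MatchesWhole(url, pattern));
    }

    // Requires the pattern to match the entire URL.
    private static bool MatchesWhole(string url, string pattern)
    {
        return Regex.IsMatch(url, "^(?:" + pattern + ")$");
    }
}
```

For example, with the include pattern `.*://www\.example\.com/.*` and the exclude pattern `.*\.(css|js|png).*`, the URL http://www.example.com/products is crawled, while http://www.example.com/styles/site.css is skipped even though it also matches the include pattern.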
No index patterns
Parameter name: excluded_index_patterns. It can be a single globbing pattern as a string or an array of globbing patterns.