Connect external sources to search
Describes the predefined search connectors in Optimizely Search & Navigation and their configuration options. Search connectors let you index content from other sources and integrate the search results for that content on your website.
A search connector is a component that facilitates the integration between Optimizely's search infrastructure and external data sources or systems. Search connectors enable the indexing and retrieval of content from various sources, making it accessible and searchable through the Optimizely platform.
Include external content
Optimizely Search & Navigation supports these predefined search connector types to include external content: Crawler and RSS/Atom.
Content indexed with connectors uses the EPiServer.Find.Framework.WebContent class:
    public class WebContent
    {
        public String SearchTitle;
        public String SearchHitUrl;
        public String SearchText;
        public String SearchSummary;
        public Dictionary<string, IndexValue> SearchMetaData { get; set; }
    }
Exclude media types
From the Optimizely Search & Navigation administrative interface, you can fine-tune indexing by excluding internet media types and by including or excluding parts of a website from crawling and indexing. See Add connectors.
When excluding media types, use the standard internet media type names (for example, text/css). See also: Media Types. The following media types are excluded by default when indexing:
- text/css
- text/javascript
- text/ecmascript
- application/x-pointplus
- application/x-javascript
- application/javascript
- application/ecmascript
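As an illustration of how such a filter behaves, the following minimal sketch checks a Content-Type header value against the default list above. `MediaTypeFilter` and `IsExcluded` are hypothetical names, not part of the Optimizely API. The case-insensitive comparison and the stripping of parameters such as `charset` are assumptions about reasonable behavior, not documented connector behavior.

```csharp
using System;
using System.Collections.Generic;

// Usage: parameters after ';' and letter case do not affect the decision (assumption).
Console.WriteLine(MediaTypeFilter.IsExcluded("text/css; charset=utf-8")); // True
Console.WriteLine(MediaTypeFilter.IsExcluded("Application/JavaScript"));  // True
Console.WriteLine(MediaTypeFilter.IsExcluded("text/html"));               // False

// Hypothetical helper for illustration only; not part of the Optimizely API.
static class MediaTypeFilter
{
    // The default exclusions listed in this section.
    static readonly HashSet<string> Excluded = new(StringComparer.OrdinalIgnoreCase)
    {
        "text/css",
        "text/javascript",
        "text/ecmascript",
        "application/x-pointplus",
        "application/x-javascript",
        "application/javascript",
        "application/ecmascript",
    };

    public static bool IsExcluded(string contentTypeHeader)
    {
        // Keep only "type/subtype"; drop parameters such as "; charset=utf-8".
        string mediaType = contentTypeHeader.Split(';')[0].Trim();
        return Excluded.Contains(mediaType);
    }
}
```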
Exclude query strings
You can exclude any query string parameter. A typical use case is excluding known tracking parameters. For example, in the URL http://www.episerver.se/cms/innehallshantering?utm_source=google, you can exclude utm_source to prevent the crawler from unintentionally incrementing a campaign counter.
Common exclusions of this type:
- PHPSESSID
- SESSIONID
- JSESSIONID
- ASPSESSIONID
- sid
- zenid
Note
All strings are case-sensitive. Do not include wildcards or whitespace.
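The effect of a query string exclusion can be sketched as follows. This is only an illustration of the idea, not the connector's implementation: `QueryStringFilter` and `Strip` are hypothetical names, and the `page` parameter in the sample URL is invented. Names are compared case-sensitively and without wildcards, as the note above requires.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Usage: exclusions are exact, case-sensitive parameter names.
var excluded = new HashSet<string>(StringComparer.Ordinal) { "utm_source", "sid" };

Console.WriteLine(QueryStringFilter.Strip(
    "http://www.episerver.se/cms/innehallshantering?utm_source=google&page=2", excluded));
// prints http://www.episerver.se/cms/innehallshantering?page=2

Console.WriteLine(QueryStringFilter.Strip("http://example.com/?SID=1", excluded));
// prints http://example.com/?SID=1 (case differs, so "SID" is kept)

// Hypothetical helper for illustration only; not part of the Optimizely API.
static class QueryStringFilter
{
    public static string Strip(string url, ISet<string> excludedNames)
    {
        // Set any fragment aside so it is preserved unchanged.
        int hash = url.IndexOf('#');
        string fragment = hash >= 0 ? url.Substring(hash) : "";
        string rest = hash >= 0 ? url.Substring(0, hash) : url;

        int q = rest.IndexOf('?');
        if (q < 0) return url; // no query string, nothing to strip

        string basePart = rest.Substring(0, q);
        // Keep every name=value pair whose name is not excluded.
        var kept = rest.Substring(q + 1)
            .Split('&', StringSplitOptions.RemoveEmptyEntries)
            .Where(pair => !excludedNames.Contains(pair.Split('=')[0]));

        string query = string.Join("&", kept);
        return (query.Length == 0 ? basePart : basePart + "?" + query) + fragment;
    }
}
```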
Patterns and globbing
Globbing expands a pattern that contains wildcard characters into the set of specific names that match it. The exclusion fields support common glob patterns. The crawler connector uses patterns similar to those in robots.txt.
Pattern | Example | Corresponding regex
---|---|---
`*` | `*/abc/`, `/root/` | `.*/abc/.*`, `.*://.*/root/.*`
`?` | `*/???/` | `.*/.../.*`
`{`, `}`, `,` | `{abc,def}` | `.*(abc\|def).*`
`[`, `]`, `!`, `,` | `[0-9,xyz][!abc]` | `.*[0-9xyz][^abc].*`
`,` | `abc,def` | `.*abc,def.*`
`\` | `\*\?\,\{\}\[\]\\` | `.*\*\?,\{\}\[\]\\.*`
`.`, `(`, `)`, `+`, `\|`, `^`, `$`, `@`, `%` | `.()+\|^$@%` | `.*\.\(\)\+\\|\^\$\@\%.*`
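The translation in the table can be approximated in a few lines. The following is a minimal sketch that reproduces the first rows of the table. It is not the crawler's actual implementation, and `GlobPattern`, `ToRegex`, and `IsMatch` are hypothetical names. It assumes that the resulting regex must match the whole URL and that a pattern starting with `/` is anchored after the host, as the `/root/` example suggests.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Usage: the printed regexes correspond to rows of the table above.
Console.WriteLine(GlobPattern.ToRegex("*/abc/"));          // .*/abc/.*
Console.WriteLine(GlobPattern.ToRegex("/root/"));          // .*://.*/root/.*
Console.WriteLine(GlobPattern.ToRegex("{abc,def}"));       // .*(abc|def).*
Console.WriteLine(GlobPattern.ToRegex("[0-9,xyz][!abc]")); // .*[0-9xyz][^abc].*
Console.WriteLine(GlobPattern.IsMatch("*/???/", "http://example.com/abc/"));   // True
Console.WriteLine(GlobPattern.IsMatch("/root/", "http://example.com/other/")); // False

// Hypothetical helper for illustration only; not part of the Optimizely API.
static class GlobPattern
{
    const string RegexSpecials = @"\.()+|^$*?{}[]";

    public static string ToRegex(string glob)
    {
        var body = new StringBuilder();
        bool inBraces = false, inBrackets = false, trailingStar = false;

        for (int i = 0; i < glob.Length; i++)
        {
            char c = glob[i];
            trailingStar = c == '*' && !inBrackets;

            if (c == '\\' && i + 1 < glob.Length)
            {
                body.Append(Literal(glob[++i])); // backslash makes the next character literal
            }
            else if (inBrackets)
            {
                if (c == ']') { inBrackets = false; body.Append(']'); }
                else if (c == '!' && glob[i - 1] == '[') body.Append('^'); // negated set
                else if (c != ',') body.Append(Literal(c)); // commas only separate set members
            }
            else switch (c)
            {
                case '*': body.Append(".*"); break;
                case '?': body.Append('.'); break;
                case '{': inBraces = true; body.Append('('); break;
                case '}' when inBraces: inBraces = false; body.Append(')'); break;
                case ',' when inBraces: body.Append('|'); break;
                case '[': inBrackets = true; body.Append('['); break;
                default: body.Append(Literal(c)); break;
            }
        }

        // Patterns match anywhere in the URL unless they already begin or end with '*'.
        string prefix = glob.StartsWith('/') ? ".*://.*" : glob.StartsWith('*') ? "" : ".*";
        return prefix + body + (trailingStar ? "" : ".*");
    }

    public static bool IsMatch(string glob, string url) =>
        Regex.IsMatch(url, "^(?:" + ToRegex(glob) + ")$");

    // Escape characters that have a special meaning in a regex.
    static string Literal(char c) =>
        RegexSpecials.IndexOf(c) >= 0 ? "\\" + c : c.ToString();
}
```

The sketch does not escape `@` and `%`, which the table lists as special characters, because they have no special meaning in a .NET regex.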
Include patterns
Parameter name: included_crawl_patterns. It can be a single globbing pattern as a string or an array of globbing patterns.
Default – Seed base URLs.
Exclude patterns
Parameter name: excluded_crawl_patterns. It can be a single globbing pattern as a string or an array of globbing patterns. Exclude patterns override include patterns.
Default – .{avi,bmp,css,gif,gz,ico,jpeg,jpg,js,m4v,mid,mov,mp2,mp3,mp4,mpeg,png,ram,rar,rm,smil,swf,tif,tiff,wav,wma,wmv,zip}
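The precedence between include and exclude patterns can be sketched as a simple predicate: a URL is crawled only if it matches at least one include pattern and no exclude pattern. The regexes below are translated by hand following the pattern table. The seed URL `http://example.com/` is hypothetical, the exclude list is shortened for readability, and `ShouldCrawl` is an illustrative name, not a connector API.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Include: everything under the (hypothetical) seed base URL.
var include = new[] { new Regex(@"^http://example\.com/.*$") };

// Exclude: a shortened version of the default list, i.e. the glob .{css,gif,jpg,js,png,zip}
var exclude = new[] { new Regex(@"^.*\.(css|gif|jpg|js|png|zip).*$") };

// Exclude patterns override include patterns.
bool ShouldCrawl(string url) =>
    include.Any(r => r.IsMatch(url)) && !exclude.Any(r => r.IsMatch(url));

Console.WriteLine(ShouldCrawl("http://example.com/news/"));        // True
Console.WriteLine(ShouldCrawl("http://example.com/img/logo.png")); // False: excluded extension
Console.WriteLine(ShouldCrawl("http://other.example.org/news/"));  // False: outside the seed URL
```

Because the translated regex ends in `.*`, an excluded extension matches anywhere in the URL. If the table's translation applies literally, `.js` would therefore also exclude a URL that contains `.json`.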
No index patterns
Parameter name: excluded_index_patterns. It can be a single globbing pattern as a string or an array of globbing patterns.