

## Include external content

Optimizely Search & Navigation supports these predefined search connector types to include external content: Crawler and RSS/Atom.

Content indexed with connectors uses `EPiServer.Find.Framework.WebContent`.



## Exclude media types

From the Optimizely Search & Navigation administrative interface, you can fine-tune indexing by excluding internet media types and by including or excluding parts of a website from crawling and indexing. See [Add connectors](🔗) in the Optimizely User Guide.

When excluding media types, follow the standard method of classifying internet file types. See also: [Media Types](🔗). The following media types are excluded by default when indexing:

  • text/css

  • text/javascript

  • text/ecmascript

  • application/x-pointplus

  • application/x-javascript

  • application/javascript

  • application/ecmascript
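As a rough illustration of what excluding by media type means, the sketch below checks a response's `Content-Type` value against the default list above. The function name and the normalization steps (dropping parameters such as `charset`, lowercasing) are assumptions made for this example; they are not taken from the connector itself.

```python
# Default exclusions, copied from the list above.
DEFAULT_EXCLUDED_MEDIA_TYPES = frozenset({
    "text/css",
    "text/javascript",
    "text/ecmascript",
    "application/x-pointplus",
    "application/x-javascript",
    "application/javascript",
    "application/ecmascript",
})


def is_excluded_media_type(content_type, excluded=DEFAULT_EXCLUDED_MEDIA_TYPES):
    """Return True if a Content-Type header value names an excluded media type.

    Hypothetical helper: the real connector may compare values differently.
    """
    # Drop parameters such as "; charset=utf-8" and normalize case,
    # because media type names are case-insensitive.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in excluded
```

For example, `is_excluded_media_type("text/css; charset=utf-8")` returns `True`, while `is_excluded_media_type("text/html")` returns `False`.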

## Exclude query strings

You can exclude any query string parameter. A typical use is to exclude known tracking parameters. For example, in the URL _[http://www.episerver.se/cms/innehallshantering?utm\_source=google](🔗)_, you can exclude _utm\_source_ so that crawling does not unintentionally increment a campaign counter.

Common exclusions of this type:

  • PHPSESSID

  • SESSIONID

  • JSESSIONID

  • ASPSESSIONID

  • sid

  • zenid

Note

All strings are case-sensitive. Do not include wildcards or whitespace.
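The effect of excluding a query string can be pictured as removing that parameter from a URL before the URL is requested. The following is a minimal sketch under that assumption; the function name and the sample exclusion set are hypothetical, and the connector's actual behavior may differ.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical exclusion set built from the examples in this section.
EXCLUDED_QUERY_STRINGS = {"utm_source", "PHPSESSID", "JSESSIONID", "sid"}


def strip_excluded_query_strings(url, excluded=EXCLUDED_QUERY_STRINGS):
    """Return the URL without the excluded query parameters.

    Names are compared exactly (case-sensitive, no wildcards),
    matching the note above.
    """
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in excluded
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

With this sketch, `http://www.episerver.se/cms/innehallshantering?utm_source=google` becomes `http://www.episerver.se/cms/innehallshantering`, while a parameter named `SID` is kept because it does not match `sid` exactly.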

## Patterns and globbing

_Globbing_ lets you expand a non-specific file name containing a wildcard character into a set of specific file names for storage on a computer, server, or network. All excluded fields support common glob patterns. The crawler connector uses patterns similar to those in _robots.txt_.

| Pattern | Example | Corresponding regex |
| --- | --- | --- |
| `*` | `*/abc/*` <br>`/root/*` <br>`*://*/root/*` | `.*/abc/.*` <br>`/root/.*` <br>`.*://.*/root/.*` |
| `?` | `*/???/*` | `.*/.../.*` |
| `{`, `}`, `,` | `{abc,def}*` | `(abc\|def).*` |
| `[`, `]`, `!`, `,` | `[0-9,xyz][!abc]*` | `[0-9xyz][^abc].*` |
| `,` | `abc,def*` | `abc,def.*` |
| `\` | `\\\*\?\,\{\}\[\]\\*` | `\\\*\?\,\{\}\[\]\\.*` |
| `.`, `(`, `)`, `+`, `\|`, `^`, `$`, `@`, `%` | `.()+\|^$@%*` | `\.\(\)\+\\|\^\$\@\%.*` |
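The glob-to-regex correspondence in the table above can be approximated in a few lines. The following is a minimal sketch, assuming Python 3.7 or later; `glob_to_regex` is a hypothetical helper written for this example, not part of the product, and the connector's real translation may handle edge cases (nesting, unterminated groups) differently. Python's `re.escape` leaves `@` and `%` unescaped because they are not special in Python regular expressions, so the output can differ cosmetically from the table while matching the same strings.

```python
import re


def glob_to_regex(pattern):
    """Translate a glob pattern into a regex string, following the table above."""
    out = []
    in_braces = False
    in_brackets = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # A backslash makes the next character a literal.
            out.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if in_brackets:
            if ch == "]":
                out.append("]")
                in_brackets = False
            elif ch != ",":
                # Commas only separate items inside [...], so they are dropped;
                # everything else (for example the range 0-9) is kept.
                out.append(ch)
        elif ch == "[":
            in_brackets = True
            out.append("[")
            if i + 1 < len(pattern) and pattern[i + 1] == "!":
                out.append("^")  # [!abc] negates the set
                i += 1
        elif ch == "{":
            in_braces = True
            out.append("(")
        elif ch == "}" and in_braces:
            in_braces = False
            out.append(")")
        elif ch == "," and in_braces:
            out.append("|")  # alternatives inside {...}
        elif ch == "*":
            out.append(".*")
        elif ch == "?":
            out.append(".")
        else:
            # Literal character; regex metacharacters such as . ( ) + | ^ $ are escaped.
            out.append(re.escape(ch))
        i += 1
    return "".join(out)
```

For instance, `glob_to_regex("{abc,def}*")` yields `(abc|def).*` and `glob_to_regex("[0-9,xyz][!abc]*")` yields `[0-9xyz][^abc].*`. Whether the connector matches a pattern against the whole URL or only part of it is not stated here; when experimenting with this sketch, `re.fullmatch` is one reasonable choice.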

## Include patterns

The parameter name is `included_crawl_patterns`. The value can be a single globbing pattern as a string or an array of globbing patterns. Default: the seed base URLs.

## Exclude patterns

The parameter name is `excluded_crawl_patterns`. The value can be a single globbing pattern as a string or an array of globbing patterns. Exclude patterns override include patterns. Default: '.{avi,bmp,css,gif,gz,ico,jpeg,jpg,js,m4v,mid,mov,mp2,mp3,mp4,mpeg,png,ram,rar,rm,smil,swf,tif,tiff,wav,wma,wmv,zip}'
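To illustrate how the include and exclude parameters could interact, the sketch below accepts either a single pattern string or a list of patterns and lets exclude patterns win over include patterns. It is only an approximation: matching is delegated to Python's `fnmatch` plus a simple, non-nested brace expansion, which is close to but not identical with the glob rules described earlier. The helper names and the `example.com` patterns are hypothetical.

```python
import re
from fnmatch import fnmatchcase
from itertools import product


def as_list(value):
    """Accept a single pattern string or a list of patterns."""
    return [value] if isinstance(value, str) else list(value)


def expand_braces(pattern):
    """Expand {a,b} groups into plain patterns (nesting is not supported)."""
    parts = re.split(r"\{([^{}]*)\}", pattern)
    # Odd indices hold the comma-separated alternatives of each group.
    choices = [p.split(",") if i % 2 else [p] for i, p in enumerate(parts)]
    return ["".join(combo) for combo in product(*choices)]


def matches_any(url, patterns):
    """Return True if the URL matches at least one pattern (case-sensitive)."""
    return any(
        fnmatchcase(url, expanded)
        for raw in as_list(patterns)
        for expanded in expand_braces(raw)
    )


def should_crawl(url, included_crawl_patterns, excluded_crawl_patterns):
    """Exclude patterns override include patterns."""
    if matches_any(url, excluded_crawl_patterns):
        return False
    return matches_any(url, included_crawl_patterns)
```

With `included_crawl_patterns="http://example.com/*"` and `excluded_crawl_patterns=["*.{css,js}", "*/private/*"]`, the URL `http://example.com/docs/page.html` is crawled, while `http://example.com/site.css` and `http://example.com/private/x` are skipped even though they match the include pattern.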

## No index patterns

The parameter name is `excluded_index_patterns`. The value can be a single globbing pattern as a string or an array of globbing patterns.