Stop words

Describes how to enable and use stop words.

🚧

Important

This feature is deprecated. For new implementations, rely on the default analyzers and relevance ranking in Optimizely Graph instead of maintaining a custom stop list.

Stop words are the words in a stop list (or negative dictionary) that are filtered out (stopped) before or after processing of natural language data (text) because they are insignificant. Use stop words to remove low-value tokens from full-text search results so that queries focus on meaningful terms and surface more relevant matches.

Besides filtering out unimportant words, you can use stop words to suppress terms that are considered noise from a business or societal perspective. The search engine then returns no matches for queries on those terms. Optimizely Graph does not use stop words by default, but you can configure them.

Prerequisites

Before you configure stop words, make sure you have the following:

  • An Optimizely Graph account that has not yet been provisioned and synchronized. Stop lists cannot be updated after provisioning.
  • An HMAC key and secret with permission to call the PUT <GATEWAY_URL>/resources/stopwords REST endpoint.
  • The list of stop words prepared as a text file with one entry per line, where no line exceeds 1,000 characters or bytes and the file contains no more than 50,000 entries.

Stop words with full-text search

This section describes how Optimizely Graph applies a stop list during indexing and querying so that you can predict which tokens are filtered and which queries return results.

The following English words are often considered stop words:

a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, such, that, the, their, then, there, these, they, this, to, was, will, with.

A stop word is usually a single word used as a filter to stop a token from being indexed.

For example, if a field has the value "the dog is at the park" and you use this stop list, the indexed tokens become ["dog", "park"], and you can only match on these two tokens when doing a full-text search using the contains or like operators.
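
The filtering in this example can be sketched in a few lines of Python. This is only an illustration of the idea, not how Optimizely Graph is implemented: the `index_tokens` function is hypothetical, and splitting on whitespace is an assumption about tokenization.

```python
# Stop list taken from the English example above.
STOP_WORDS = frozenset(
    "a an and are as at be but by for if in into is it no not of on or such "
    "that the their then there these they this to was will with".split()
)


def index_tokens(value, stop_words=STOP_WORDS):
    """Return the tokens that remain after stop-word filtering.

    Whitespace tokenization is an assumption made for illustration; the
    analyzer in Optimizely Graph may split text differently.
    """
    return [token for token in value.split() if token not in stop_words]


print(index_tokens("the dog is at the park"))  # ['dog', 'park']
```

Only the tokens that survive this step are available for matching, which is why a full-text query on "the" or "at" finds nothing in this field.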

❗️

Warning

Stop words are supported for searchable string fields but not for normal string, number, date, and Boolean fields.

For those unsupported field types, stop words are not applied, so queries that contain stop words still return results. Optimizely Graph supports only single-token stop words; multi-word stop words are not applied.

Store custom stop words

Store custom stop words to apply your own filtering rules at index time and query time, instead of relying on the default behavior of Optimizely Graph (no stop words). Store the stop words as a text file with one stop word per line.

🚧

Important

A line in the stop list cannot exceed 1,000 characters or bytes, and the maximum number of entries is 50,000.

Stop words are case-sensitive at index time and query time. For example, the is different from The. This is useful when you want to index and query The Guardian (the newspaper) in full while ignoring the in the guardian during full-text search.

The following is an example of a list of stop words. The query examples that follow reference the, Schwarzenegger, and amy; Bob is included to show that an additional entry can sit in the list without affecting the examples.

the
Schwarzenegger
amy
Bob
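
Because a stop list cannot be changed after provisioning, it may be worth checking the file against the documented limits before uploading it. The following is a minimal sketch; `validate_stop_list` is a hypothetical helper, and measuring bytes as UTF-8 is an assumption, since the limit is stated only as "characters or bytes".

```python
MAX_LINE_LENGTH = 1_000  # per-line limit from the note above
MAX_ENTRIES = 50_000     # limit on the number of entries


def validate_stop_list(text):
    """Return a list of problems found in a stop list; empty means it passed."""
    problems = []
    entries = text.splitlines()
    if len(entries) > MAX_ENTRIES:
        problems.append(f"{len(entries)} entries exceed the limit of {MAX_ENTRIES}")
    for number, entry in enumerate(entries, start=1):
        # The limit is given in "characters or bytes", so check both.
        # UTF-8 is an assumed encoding.
        if len(entry) > MAX_LINE_LENGTH or len(entry.encode("utf-8")) > MAX_LINE_LENGTH:
            problems.append(f"line {number}: longer than {MAX_LINE_LENGTH}")
        # Only single-token stop words are applied, so flag empty lines
        # and lines that contain whitespace-separated words.
        if len(entry.split()) != 1:
            problems.append(f"line {number}: expected exactly one token")
    return problems


stop_list = "the\nSchwarzenegger\namy\nBob\n"
print(validate_stop_list(stop_list))  # []
```

An entry such as "the guardian" would be reported, because multi-word stop words are not applied.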

Store stop words using the REST endpoint configured in the GraphQL gateway. It requires authorization using your HMAC key and secret.

  • PUT <GATEWAY_URL>/resources/stopwords with the following optional query string:
    • language_routing – Stores the custom stop words in the request body for a specific locale (default is standard, that is, no locale).

The body should contain stop words as previously described, or can be empty if you do not want to configure any stop words (the default behavior). When you do not use a query parameter with this endpoint, the custom stop list is applied to the NEUTRAL locale (index with no languages configured).
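
As a rough sketch, the request described above could be assembled with the Python standard library as follows. Several details are assumptions and are not confirmed by this page: the `build_stopwords_request` helper is hypothetical, the `text/plain` content type is a guess, the `en` locale value is illustrative, and the `Authorization` value must be computed from your HMAC key and secret according to the Optimizely Graph authentication documentation (it is passed in as an opaque string here).

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_stopwords_request(gateway_url, stop_words, authorization, language_routing=None):
    """Build (but do not send) the PUT request that stores a stop list."""
    url = gateway_url.rstrip("/") + "/resources/stopwords"
    if language_routing is not None:
        # Optional query string that targets a specific locale.
        url += "?" + urlencode({"language_routing": language_routing})
    # One stop word per line, as described earlier.
    body = "\n".join(stop_words).encode("utf-8")
    headers = {
        "Authorization": authorization,  # HMAC value computed elsewhere (not shown)
        "Content-Type": "text/plain",    # assumption, not taken from this page
    }
    return Request(url, data=body, headers=headers, method="PUT")


request = build_stopwords_request(
    "https://gateway.example.invalid",  # placeholder for <GATEWAY_URL>
    ["the", "Schwarzenegger", "amy", "Bob"],
    authorization="<HMAC authorization header value>",
    language_routing="en",
)
print(request.get_method(), request.full_url)
```

The sketch only constructs the request so that the URL and body can be inspected. To send it, pass the object to `urllib.request.urlopen`.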

After you store stop words, Optimizely Graph automatically applies them when synchronizing content and ignores them in queries.

❗️

Warning

You must store your stop words in Optimizely Graph before provisioning your account and synchronizing content. You cannot update stop lists after your account is provisioned. To update your stop list, you must do the following:

  1. Delete the account.
  2. Upload the updated stop list with the PUT endpoint as previously described.
  3. Create the account again.
  4. Synchronize content.

Query examples

The following examples show how a configured stop list affects results for searchable string fields and how case sensitivity changes the outcome. The examples reference a BiographyPage content type that exposes Name, Die, Born, and Language fields supplied by the demo schema (these are not built-in Optimizely Graph fields).

For full-text search with the contains and like operators on searchable string fields, Optimizely Graph supports only single-token stop words; multi-word stop words are not applied.

When Schwarzenegger is a stop word and occurs as Schwarzenegger (case-sensitive) in your content, the following query does not return any results.

{
  BiographyPage(where: { Name: { contains: "Schwarzenegger" } }) {
    items {
      Name
      Die
      Born
      Language {
        DisplayName
        Name
      }
      _score
    }
  }
}

However, if the name Amy Winehouse occurs in your content but amy (note the lowercase) is defined as a stop word, the following GraphQL query still returns a result, because the term Amy (note the uppercase) does not match the stop word and was therefore indexed.

{
  BiographyPage(where: { Name: { contains: "Amy" } }) {
    items {
      Name
      Die
      Born
      Language {
        DisplayName
        Name
      }
      _score
    }
  }
}

The following query uses the like operator instead. It is equivalent and returns the same result.

{
  BiographyPage(where: { Name: { like: "%Amy%" } }) {
    items {
      Name
      Die
      Born
      Language {
        DisplayName
        Name
      }
      _score
    }
  }
}

Both queries return the following result:

{
  "data": {
    "BiographyPage": {
      "items": [
        {
          "Name": "Amy Winehouse",
          "Die": "2011-07-23T00:00:00Z",
          "Born": "1983-11-14T00:00:00Z",
          "Language": {
            "DisplayName": "English",
            "Name": "en"
          },
          "_score": 1.6928279
        }
      ]
    }
  }
}

Stop words are processed case-sensitively. Therefore, the following query does not return any results, because the query term amy is a stop word and is ignored.

{
  BiographyPage(where: { Name: { contains: "amy" } }) {
    items {
      Name
      Die
      Born
      Language {
        DisplayName
        Name
      }
      _score
    }
  }
}
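
The three outcomes above can be reproduced with a small Python model of case-sensitive stop-word filtering. This is a simplified sketch and not the actual Optimizely Graph analyzer: the `tokens` and `contains` functions are hypothetical, the sample names are illustrative data, and matching is reduced to exact token comparison after whitespace splitting.

```python
# Stop list from the example earlier on this page.
STOP_WORDS = frozenset({"the", "Schwarzenegger", "amy", "Bob"})


def tokens(text):
    """Split on whitespace and drop stop words (case-sensitive comparison)."""
    return [t for t in text.split() if t not in STOP_WORDS]


def contains(names, query):
    """Return the names whose retained tokens include every retained query token."""
    query_tokens = tokens(query)
    if not query_tokens:
        return []  # every query term was a stop word, so nothing can match
    return [n for n in names if all(q in tokens(n) for q in query_tokens)]


# Illustrative sample data, not taken from a real index.
names = ["Amy Winehouse", "Arnold Schwarzenegger"]

print(contains(names, "Schwarzenegger"))  # []
print(contains(names, "Amy"))             # ['Amy Winehouse']
print(contains(names, "amy"))             # []
```

In this model, Schwarzenegger and amy are removed from the query itself, whereas Amy is not in the stop list, so it is both indexed and searchable.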