
Fuzzy search

Explanation and examples on how to support fuzzy search matching in Optimizely Graph.

Site visitors and other users may fail to find the content items they are looking for, and misspelled query terms are a common cause. To make sure that your site search still returns relevant results, Optimizely Graph supports fuzzy search.

Under the hood, Optimizely applies approximate string matching between the query terms and the terms occurring in the content items that you have synchronized to Optimizely Graph. For example, with fuzzy matching enabled, the query Arnodl Schwarzeneggerr returns content items that contain Arnold Schwarzenegger.

The Optimizely Graph query language supports fuzzy search for (searchable) string fields, including the _fulltext field. You enable it with the fuzzy operator option (similar to boost and synonym), which is a Boolean and defaults to false (disabled). It is supported for the following operators:

  • eq and notEq
  • in and notIn
  • contains
  • match

📘

Note

With the contains operator, the fuzzy option is applied to each word separately. Optimizely extracts words from a string value by splitting on non-word characters, meaning special characters like @ or - act as word separators.

For example, Sn@pdragon is split into two words, sn and pdragon. A fuzzy contains query for sn@pdragon does not retrieve results when the actual value is Snapdragon, because the distance between each of those words and snapdragon is too great.
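
The word splitting described in this note can be approximated with a short Python sketch. It is an illustration only, not Optimizely's actual tokenizer: the function name tokenize is hypothetical, the regular expression \W+ stands in for "non-word boundary", and the lowercasing is an assumption based on the lowercase words shown above.

```python
import re


def tokenize(value: str) -> list[str]:
    """Split a value on runs of non-word characters and lowercase each piece.

    Assumption: this only approximates the tokenization described in the
    note; the real analyzer in Optimizely Graph may differ.
    """
    return [word.lower() for word in re.split(r"\W+", value) if word]


print(tokenize("Sn@pdragon"))          # ['sn', 'pdragon']
print(tokenize("Snapdragon"))          # ['snapdragon']
print(tokenize("ferrari-purosangue"))  # ['ferrari', 'purosangue']
```

Because @ splits the query into sn and pdragon, each of those words is compared separately against snapdragon, which is why the fuzzy match fails.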

Optimizely Graph uses the Levenshtein distance to measure the distance between two strings. The maximum allowed edit distance is determined automatically from the length of the query term, as follows:

  • For contains, the edit distance is calculated per word. For eq and in (and their inverses, notEq and notIn), it is calculated for the whole value.
  • The allowed edit distance follows these heuristics:
    • If a word is between zero and two characters long, it must match exactly (distance of zero).
    • If a word is between three and five characters long, the allowed edit distance is one.
    • If a word is six or more characters long, the allowed edit distance is two.
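
The length-based heuristics above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the service's implementation: the function names are hypothetical, and the distance is the textbook Levenshtein distance (insertions, deletions, and substitutions each cost one), which may differ in detail from what Optimizely Graph runs internally.

```python
def levenshtein(a: str, b: str) -> int:
    """Textbook Levenshtein distance using a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def max_edit_distance(term: str) -> int:
    """Allowed edits by term length: 0-2 chars -> 0, 3-5 -> 1, 6+ -> 2."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


def within_fuzzy_distance(query_term: str, indexed_term: str) -> bool:
    """True if indexed_term is within the distance allowed for query_term."""
    return levenshtein(query_term, indexed_term) <= max_edit_distance(query_term)


print(levenshtein("purosanqe", "purosangue"))               # 2
print(within_fuzzy_distance("purosanqe", "purosangue"))     # True
print(levenshtein("pdragon", "snapdragon"))                 # 3
print(within_fuzzy_distance("pdragon", "snapdragon"))       # False
print(within_fuzzy_distance("sn", "snapdragon"))            # False
```

The sketch derives the allowed distance from the length of the query term, as described above, and compares lowercase terms only.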

📘

Fuzzy and synonyms

Fuzzy search is not applied to synonyms. Only values that match exactly (case-insensitively) are expanded by synonyms.

Examples

A misspelled name with eq matching returns the correct results when fuzzy: true is set. Without it, the query returns no results.

{
  BiographyPage(where: { Name: { eq: "Arnodl Schwarzeneggerr", fuzzy: true } }) {
    items {
      Name
    }
  }
}

Similarly, with the contains operator on a searchable string field, Optimizely Graph returns results for the following misspelled query.

{
  BiographyPage(where: { Name: { contains: "Swarzenegger", fuzzy: true } }) {
    items {
      Name
      Die
      Born
      Language {
        DisplayName
        Name
      }
    }
  }
}

The following contains query does not return results for content with the value Xiaomi Youpin Lydsto. Optimizely does a fuzzy match on each word, and the query word XiaomiYoupin is too far from every word in the value to match.

{
  TemporaryPage(
    locale: ALL
    where: { Product: { contains: "XiaomiYoupin Lydsto", fuzzy: true } }
    orderBy: { Product: ASC }
    limit: 100) {
    total
    items {
      Product
    }
  }
}

However, the following query returns results for content with the value ferrari-purosangue. The query word purosanqe matches purosangue because the distance between the two words is two. Fuzzy matching has no effect on ferrary when the content is synchronized to the English locale, because the system then applies stemming, so ferrary => ferrari.

{
  TemporaryPage(
    locale: ALL
    where: { Product: { contains: "ferrary-purosanqe", fuzzy: true } }
    orderBy: { Product: ASC }
    limit: 100) {
    total
    items {
      Product
    }
  }
}
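
The two contains examples above can be reproduced offline with a rough Python model of per-word fuzzy matching. Everything in it is an assumption made for illustration: the helper names are hypothetical, words are split on non-word characters and lowercased, the distance is plain Levenshtein, and every query word is required to match some word of the value (inferred from the Xiaomi example, where Lydsto alone is not enough). Stemming is deliberately left out, so ferrary matches ferrari here through edit distance rather than through the English stemmer.

```python
import re


def tokenize(value: str) -> list[str]:
    """Split on non-word characters and lowercase (approximation)."""
    return [w.lower() for w in re.split(r"\W+", value) if w]


def levenshtein(a: str, b: str) -> int:
    """Textbook Levenshtein distance with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def max_edit_distance(term: str) -> int:
    """0-2 chars -> 0 edits, 3-5 chars -> 1 edit, 6+ chars -> 2 edits."""
    return 0 if len(term) <= 2 else 1 if len(term) <= 5 else 2


def fuzzy_contains(query: str, value: str) -> bool:
    """Assumed semantics: each query word must fuzzily match a value word."""
    value_words = tokenize(value)
    return all(
        any(levenshtein(q, v) <= max_edit_distance(q) for v in value_words)
        for q in tokenize(query)
    )


print(fuzzy_contains("XiaomiYoupin Lydsto", "Xiaomi Youpin Lydsto"))  # False
print(fuzzy_contains("ferrary-purosanqe", "ferrari-purosangue"))      # True
print(fuzzy_contains("Swarzenegger", "Arnold Schwarzenegger"))        # True
```

In the first case xiaomiyoupin is six edits away from both xiaomi and youpin, well beyond the allowed two, so the whole query fails even though lydsto matches exactly.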