Fuzzy search

Explanation and examples on how to support fuzzy search matching in Optimizely Graph

Site visitors and other users may not always find the content items they are looking for. One common reason is misspelled query terms, since spelling mistakes are easy to make. To make sure that your site search can still return relevant results, Optimizely Graph supports fuzzy search.

Under the hood, Optimizely Graph applies approximate string matching between the query terms and the terms occurring in the content items that you have synchronized to Optimizely Graph. For example, when the query is Arnodl Schwarzeneggerr and fuzzy matching is enabled, the system returns content items that contain Arnold Schwarzenegger, which is clearly the result the user intended.

The Optimizely Graph query language supports fuzzy search for (searchable) string fields, including the _fulltext field. You enable it with the fuzzy operator option (similar to boost and synonym), which is of type Boolean. It defaults to false, so fuzzy matching is disabled unless you set it. It is supported for the following operators:

  • eq and notEq
  • in and notIn
  • contains
  • match

📘

Fuzzy with contains and special characters

With the contains operator, the fuzzy option is applied to each word separately. Words are extracted from a string value by splitting on non-word characters, which means that special characters like @ or - act as token separators. For example, Sn@pdragon is split into the two words sn and pdragon. A fuzzy contains query for sn@pdragon therefore does not retrieve a stored value of Snapdragon, because the distance between each of these words and Snapdragon is too great.
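The word extraction described above can be approximated with a few lines of Python. This is only an illustrative sketch: the function name `tokenize` is hypothetical, and it assumes that splitting on regex non-word characters plus lowercasing is close enough to what the service does, which the text above suggests but does not specify in detail.

```python
import re


def tokenize(value: str) -> list[str]:
    """Split a string into lowercase words on non-word characters (assumed behavior)."""
    # \w+ grabs runs of letters, digits and underscores; everything else separates words.
    return re.findall(r"\w+", value.lower())


print(tokenize("Sn@pdragon"))          # two words: 'sn' and 'pdragon'
print(tokenize("ferrari-purosangue"))  # two words: 'ferrari' and 'purosangue'
```

Because the @ character splits the query into two short words, neither of them is compared against Snapdragon as a whole.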

Optimizely Graph measures the distance between two strings with the Levenshtein distance. The maximum allowed edit distance is determined automatically from the length of the query term, as follows:

  • For contains the edit distance is determined by each word. For eq and in (and their inverses), the distance is calculated for the whole value.
  • The edit distance is calculated with the following heuristics:
    • If a word is between 0 and 2 characters long, it must match exactly (edit distance of 0)
    • If a word is between 3 and 5 characters long, the allowed edit distance is 1
    • If a word is 6 characters or longer, the allowed edit distance is 2
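The heuristics above can be sketched in Python as follows. The names `levenshtein` and `max_edits` are hypothetical helpers written for this illustration, not part of any Optimizely API, and the distance function is the textbook Levenshtein algorithm rather than the service's actual implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Textbook Levenshtein distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def max_edits(term: str) -> int:
    """Allowed edit distance for a query term, following the length rules above."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


query, stored = "swarzenegger", "schwarzenegger"
print(levenshtein(query, stored), max_edits(query))
```

In this sketch a swap of two adjacent characters counts as two substitutions. Whether the service counts such a transposition as a single edit is not stated here, so treat the function as an approximation.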

📘

Fuzzy and synonyms

Fuzzy search is not applied to synonyms. Synonym expansion only happens for values that match a synonym exactly (case-insensitive).

Examples

A misspelled name with eq matching returns the correct results when fuzzy: true is set. Without it, the query returns no results.

{
  BiographyPage(where: { Name: { eq: "Arnodl Schwarzeneggerr", fuzzy: true } }) {
    items {
      Name
    }
  }
}

The same applies to the contains operator on searchable string fields. Optimizely Graph returns results for the following query even though the name is misspelled.

{
  BiographyPage(where: { Name: { contains: "Swarzenegger", fuzzy: true } }) {
    items {
      Name
      Die
      Born
      Language {
        DisplayName
        Name
      }
    }
  }
}

The following contains query returns no results when the stored value is Xiaomi Youpin Lydsto. Fuzzy matching is applied to each word, and the distance from XiaomiYoupin to every word in the stored value is too great.

{
  TemporaryPage(
    locale: ALL
    where: { Product: { contains: "XiaomiYoupin Lydsto", fuzzy: true } }
    orderBy: { Product: ASC }
    limit: 100) {
    total
    items {
      Product
    }
  }
}

However, the following query returns results when the stored value is ferrari-purosangue. The query word purosanqe matches purosangue because the distance between the two words is 2. If the content is synchronized to the English locale, fuzzy matching has no effect on ferrary, because the system applies stemming in that case and reduces ferrary to ferrari.

{
  TemporaryPage(
    locale: ALL
    where: { Product: { contains: "ferrary-purosanqe", fuzzy: true } }
    orderBy: { Product: ASC }
    limit: 100) {
    total
    items {
      Product
    }
  }
}
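To see why the two contains examples above behave differently, the per-word comparison can be simulated in Python. This is a rough sketch under stated assumptions: `tokenize`, `levenshtein`, `max_edits` and `fuzzy_contains` are hypothetical helpers, the sketch assumes that every query word must match at least one word of the stored value, and it ignores stemming entirely.

```python
import re


def tokenize(value: str) -> list[str]:
    # Assumed tokenization: lowercase, split on non-word characters.
    return re.findall(r"\w+", value.lower())


def levenshtein(a: str, b: str) -> int:
    # Textbook Levenshtein distance (insert, delete, substitute cost 1 each).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def max_edits(term: str) -> int:
    # Length-based limits from the heuristics in this article.
    return 0 if len(term) <= 2 else 1 if len(term) <= 5 else 2


def fuzzy_contains(query: str, stored: str) -> bool:
    """True if every query word is within its allowed distance of some stored word."""
    stored_words = tokenize(stored)
    return all(
        any(levenshtein(q, s) <= max_edits(q) for s in stored_words)
        for q in tokenize(query)
    )


print(fuzzy_contains("XiaomiYoupin Lydsto", "Xiaomi Youpin Lydsto"))
print(fuzzy_contains("ferrary-purosanqe", "ferrari-purosangue"))
```

The first call reports no match, because xiaomiyoupin is at least six edits away from each stored word. The second reports a match, because ferrary and purosanqe are each within two edits of a stored word.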