Fuzzy search
Explanation of, and examples for, fuzzy search matching in Optimizely Graph.
Site visitors and other users sometimes cannot find the content items they are looking for. A common reason is a misspelled query term, because spelling mistakes are easy to make. Optimizely Graph supports fuzzy search so that your site search still returns relevant results in these cases.
Under the hood, Optimizely Graph applies approximate string matching to both the query terms and the terms in the content items that you have synchronized to Optimizely Graph. For example, if the query is Arnodl Schwarzeneggerr and fuzzy matching is enabled, the system returns content items that contain Arnold Schwarzenegger.
The Optimizely Graph query language supports fuzzy search for searchable string fields, including the _fulltext field. You enable it with the fuzzy operator option, which is a Boolean like boost and synonym. It defaults to false, so fuzzy search is disabled unless you turn it on. Fuzzy matching is supported for the following operators:
- eq and notEq
- in and notIn
- contains
- match
Note
With the contains operator, the fuzzy option is applied to each word separately. Optimizely Graph extracts words from a string value by splitting at non-word characters, so special characters such as @ or - act as word separators. For example, Sn@pdragon is split into two words: sn and pdragon. A fuzzy query for sn@pdragon therefore does not retrieve content whose actual value is Snapdragon, because each query word is too far from the words in the value.
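A minimal Python sketch can make the per-word behavior concrete. It rests on two assumptions. First, tokenization is approximated as a lowercase split on runs of non-word characters; Optimizely Graph's actual analyzer is not described here. Second, each query word is checked against the length-based limits that this article gives for fuzzy matching: exact match for up to two characters, one edit for three to five, and two edits for six or more. The helper names tokenize, levenshtein, max_edits, and word_matches are hypothetical.

```python
import re


def tokenize(value):
    """Assumed approximation: lowercase, then split on runs of non-word characters."""
    return [token for token in re.split(r"\W+", value.lower()) if token]


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def max_edits(term):
    """Allowed edit distance by query-term length: 0-2 -> 0, 3-5 -> 1, 6+ -> 2."""
    if len(term) <= 2:
        return 0
    return 1 if len(term) <= 5 else 2


def word_matches(query_word, value_words):
    """True if the query word is within its allowed distance of any value word."""
    return any(levenshtein(query_word, w) <= max_edits(query_word) for w in value_words)


query_words = tokenize("sn@pdragon")   # ['sn', 'pdragon']
value_words = tokenize("Snapdragon")   # ['snapdragon']
for word in query_words:
    print(word, word_matches(word, value_words))
# sn False       (two characters, so only an exact match counts)
# pdragon False  (distance 3 to 'snapdragon', but the limit is 2)
```

The @ turns one misspelled word into two short fragments. Neither fragment can absorb the missing letters within its own edit budget.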
Optimizely Graph measures the distance between two strings with the Levenshtein distance. This is the minimum number of single-character insertions, deletions, or substitutions needed to turn one string into the other. The maximum allowed edit distance is derived automatically from the length of the query term, as follows:
- For contains, the edit distance is determined for each word. For eq and in (and their inverses notEq and notIn), the distance is calculated over the whole value.
- The maximum edit distance follows these heuristics:
  - If the term is zero to two characters long, it must match exactly (edit distance of zero).
  - If the term is three to five characters long, the edit distance is one.
  - If the term is six or more characters long, the edit distance is two.
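The distance rules above can be sketched in Python. This is a minimal illustration, not Optimizely Graph's implementation. It uses the textbook dynamic-programming Levenshtein algorithm and assumes that comparisons are case-insensitive, which this article does not state for fuzzy matching. The function names levenshtein, max_edits, and fuzzy_match are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def max_edits(query_term):
    """Maximum allowed edit distance, derived from the query term's length."""
    n = len(query_term)
    if n <= 2:
        return 0   # short terms must match exactly
    if n <= 5:
        return 1
    return 2


def fuzzy_match(query_term, value_term):
    """Assumed case-insensitive: lowercase both sides, then compare distance to the limit."""
    query_term, value_term = query_term.lower(), value_term.lower()
    return levenshtein(query_term, value_term) <= max_edits(query_term)


print(levenshtein("swarzenegger", "schwarzenegger"))  # 2
print(fuzzy_match("Swarzenegger", "Schwarzenegger"))  # True: 12 characters allow 2 edits
print(fuzzy_match("cat", "cut"))                      # True: 3 characters allow 1 edit
print(fuzzy_match("ab", "ac"))                        # False: 2 characters allow 0 edits
```

The function keeps only two rows of the dynamic-programming table, so memory grows with the length of the second string rather than with the full matrix.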
Fuzzy and synonyms
Fuzzy matching is not combined with synonyms. Only values that match exactly (ignoring case) are expanded with synonyms.
Examples
With eq matching, a misspelled name returns the correct results when fuzzy: true is set. Without it, the query returns no results.
{
  BiographyPage(where: { Name: { eq: "Arnodl Schwarzeneggerr", fuzzy: true } }) {
    items {
      Name
    }
  }
}
Similarly, with the contains operator on searchable string fields, Optimizely Graph returns results for this query:
{
  BiographyPage(where: { Name: { contains: "Swarzenegger", fuzzy: true } }) {
    items {
      Name
      Die
      Born
      Language {
        DisplayName
        Name
      }
    }
  }
}
The following contains query does not return content with the value Xiaomi Youpin Lydsto. Optimizely Graph performs fuzzy matching on each word, and the query word XiaomiYoupin is too far from every word in the value to match.
{
  TemporaryPage(
    locale: ALL
    where: { Product: { contains: "XiaomiYoupin Lydsto", fuzzy: true } }
    orderBy: { Product: ASC }
    limit: 100
  ) {
    total
    items {
      Product
    }
  }
}
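The following Python sketch reproduces the reasoning behind this example. It assumes, as the example implies, that every query word must fuzzily match at least one word in the value. Tokenization is again approximated as a split on non-word characters. The helper names are hypothetical, and the code only models the distance rules described in this article, not Optimizely Graph itself.

```python
import re


def tokenize(value):
    """Assumed approximation: lowercase, then split on runs of non-word characters."""
    return [token for token in re.split(r"\W+", value.lower()) if token]


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (char_a != char_b)))
        previous = current
    return previous[-1]


def max_edits(term):
    """Allowed edit distance by term length: 0-2 -> 0, 3-5 -> 1, 6+ -> 2."""
    return 0 if len(term) <= 2 else 1 if len(term) <= 5 else 2


def contains_fuzzy(query, value):
    """Assumed semantics: each query word must be within its limit of some value word."""
    value_words = tokenize(value)
    return all(
        any(levenshtein(q, v) <= max_edits(q) for v in value_words)
        for q in tokenize(query)
    )


value = "Xiaomi Youpin Lydsto"
for word in tokenize(value):
    # Each distance is at least 6 (the length difference); the limit for a 12-character term is 2.
    print(word, levenshtein("xiaomiyoupin", word))
print(contains_fuzzy("XiaomiYoupin Lydsto", value))  # False
```

Joining two words removes the separator that tokenization relies on. The merged word then differs from each real word by roughly the length of the other word.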
However, the following query returns content with the value ferrari-purosangue, because the query word purosanqe is within an edit distance of two of purosangue. For ferrary, fuzzy matching makes no difference when the content is synchronized to the English locale. In that case the system applies stemming, so ferrary => ferrari.
{
  TemporaryPage(
    locale: ALL
    where: { Product: { contains: "ferrary-purosanqe", fuzzy: true } }
    orderBy: { Product: ASC }
    limit: 100
  ) {
    total
    items {
      Product
    }
  }
}
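The same per-word model can be checked against this example in Python. This is a minimal sketch under the same assumptions: tokenization is approximated with a split on non-word characters, and every query word must match some value word. It deliberately ignores stemming. As described above, stemming already maps ferrary to ferrari on English-locale content, so the fuzzy check matters mainly for purosanqe. The helper names are hypothetical.

```python
import re


def tokenize(value):
    """Assumed approximation: lowercase, then split on runs of non-word characters."""
    return [token for token in re.split(r"\W+", value.lower()) if token]


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (char_a != char_b)))
        previous = current
    return previous[-1]


def max_edits(term):
    """Allowed edit distance by term length: 0-2 -> 0, 3-5 -> 1, 6+ -> 2."""
    return 0 if len(term) <= 2 else 1 if len(term) <= 5 else 2


def contains_fuzzy(query, value):
    """Assumed semantics: each query word must be within its limit of some value word."""
    value_words = tokenize(value)
    return all(
        any(levenshtein(q, v) <= max_edits(q) for v in value_words)
        for q in tokenize(query)
    )


print(tokenize("ferrary-purosanqe"))                               # ['ferrary', 'purosanqe']
print(levenshtein("ferrary", "ferrari"))                           # 1
print(levenshtein("purosanqe", "purosangue"))                      # 2: substitute q->g, insert u
print(contains_fuzzy("ferrary-purosanqe", "ferrari-purosangue"))   # True
```

In contrast to the previous example, the hyphen here splits both the query and the value into matching word pairs. Each pair stays within the two-edit limit for words of six or more characters.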