Boosting
Describes why and how to use boosting of queries in Optimizely Graph.
Optimizely Graph offers access to your data with a search engine. Besides fast retrieval, relevance ranking is a crucial feature of any search engine: the most relevant results for a given query should appear at the top of the search engine result page (SERP), above the less relevant ones. Being able to tweak and tune relevance ranking can improve the effectiveness of searches. Web users expect search engines to be effective. A poorly configured site or ecommerce search hurts not only conversion rates but also engagement and general trust in the site.
Applications of boosting
Optimizely Graph offers relevance ranking out of the box and gives you customization with query boosting. There can be business logic or domain-specific logic that significantly improves the effectiveness of some queries. Matches in some fields may be more relevant than others.
Another application of boosting is implementing sponsored search. Given certain matching terms in the query, you may want to expand the query with the IDs of some documents (certain pages or products) so that these documents display at the top of the SERP.
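The query expansion for sponsored search could be sketched client-side as follows. This is only an illustration under stated assumptions: the SPONSORED mapping, the Id field name, and the boost weight of 100 are hypothetical and not part of Optimizely Graph. Only the _or, eq, and boost keys mirror the query syntax described under "Regular boost".

```python
from typing import Any

# Hypothetical mapping from trigger terms to sponsored document IDs.
SPONSORED: dict[str, list[str]] = {
    "laptop": ["product-123", "product-456"],
}


def expand_with_sponsored(
    query: str,
    base_predicate: dict[str, Any],
    id_field: str = "Id",  # hypothetical field name, adjust to your schema
    boost: int = 100,      # arbitrary illustrative weight
) -> dict[str, Any]:
    """Return a where clause that ORs the base predicate with boosted ID matches."""
    terms = query.lower().split()
    ids = [doc_id for term in terms for doc_id in SPONSORED.get(term, [])]
    if not ids:
        # No trigger term matched: leave the query untouched.
        return base_predicate
    sponsored = [{id_field: {"eq": doc_id, "boost": boost}} for doc_id in ids]
    return {"_or": [base_predicate, *sponsored]}


where = expand_with_sponsored(
    "cheap laptop", {"_fulltext": {"match": "cheap laptop"}}
)
```

The resulting dictionary has the same shape as the where argument in the queries on this page, so it could be serialized into a query or passed as a variable.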
How does it work?
Regular boost
In the where input argument of a GraphQL schema, you specify each predicate that you want to be boosted with the boost operator option. This is of type Int, but Optimizely Graph does not support negative boosting, so the value cannot be a negative number. If the value of a boost is null, then no boosting is applied.
By default, no boosting is applied in any predicate. Because boost is meant to influence the relevance ranking, you should project the relevance score with the _score field projection.
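The rules above (Int type, no negative values, null meaning no boost) can be checked before a query is sent. The helper below is a minimal sketch; boosted_predicate is a hypothetical name and not part of any Optimizely client library.

```python
from typing import Any, Optional


def boosted_predicate(
    operator: str, value: Any, boost: Optional[int] = None
) -> dict[str, Any]:
    """Build one predicate, attaching boost only when it is set and valid."""
    if boost is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(boost, bool) or not isinstance(boost, int):
            raise TypeError("boost must be an Int")
        if boost < 0:
            raise ValueError("negative boosting is not supported")
    predicate: dict[str, Any] = {operator: value}
    if boost is not None:
        predicate["boost"] = boost  # None (null) means no boosting
    return predicate


# Match on two content types and boost only the second one.
where = {
    "_or": [
        {"ContentType": boosted_predicate("eq", "Content")},
        {"ContentType": boosted_predicate("eq", "Media", boost=10)},
    ]
}
```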
The following example shows a query that matches on content types and boosts the Media types to the top of the results.
{
  Content(
    locale: en,
    limit: 5,
    where: {
      _or: [
        {
          ContentType: {
            eq: "Content"
          }
        },
        {
          ContentType: {
            eq: "Media",
            boost: 10
          }
        }
      ]
    }
  ) {
    items {
      ContentType
      _score
    }
    total
  }
}
Response
The Media content types are boosted to the top.
{
  "data": {
    "Content": {
      "items": [
        {
          "ContentType": [
            "Image",
            "Media",
            "ImageFile",
            "Content"
          ],
          "_score": 149.00687
        },
        {
          "ContentType": [
            "Image",
            "Media",
            "ImageFile",
            "Content"
          ],
          "_score": 149.00687
        },
        {
          "ContentType": [
            "Image",
            "Media",
            "ImageFile",
            "Content"
          ],
          "_score": 149.00687
        },
        {
          "ContentType": [
            "Page",
            "StandardPage",
            "Content"
          ],
          "_score": 0.05333282
        },
        {
          "ContentType": [
            "Page",
            "StandardPage",
            "Content"
          ],
          "_score": 0.05333282
        }
      ],
      "total": 50
    }
  },
  "extensions": {
    "correlationId": "83b98f1c-b3d8-4719-a3ec-b484504c1f4e"
  }
}
Boosting of datetime fields with a Gaussian function
You can prioritize recent results by applying a greater boost using a Gaussian function. Example use cases include:
- Giving pages that are more recently published a higher score.
- Ranking people higher who are younger or older relative to a point in time.
The Gaussian function is called a decay function because the greater the distance from the origin, the lower the weight becomes, which produces a decay curve. In Optimizely Graph's query language, you can add decay to a field used in the where input types. It takes three parameters:
- origin – The date, optionally with a time, where the top of the curve (central point) should start. Default – now().
- scale – The rate of decay in days. Default – 1,000.
- rate – Defines how documents are scored at the distance given a rate of decay. Default – 0.5.
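To build intuition for how origin, scale, and rate interact, the sketch below computes a Gaussian decay weight in Python. It assumes the widely used form of this function, in which the weight is 1.0 at the origin and drops to rate at a distance of scale days. This page does not state the exact formula Optimizely Graph uses, so treat gaussian_decay as a hypothetical illustration and not as the service's implementation.

```python
from datetime import datetime, timedelta


def gaussian_decay(
    value: datetime,
    origin: datetime,
    scale_days: float = 1000.0,
    rate: float = 0.5,
) -> float:
    """Weight in (0, 1]: 1.0 at the origin, `rate` at `scale_days` away."""
    if scale_days <= 0 or not 0 < rate < 1:
        raise ValueError("scale_days must be > 0 and rate must be in (0, 1)")
    distance_days = abs((value - origin).total_seconds()) / 86400.0
    # Equivalent to exp(-d^2 / (2 * sigma^2)) with
    # sigma^2 = -scale^2 / (2 * ln(rate)).
    return rate ** ((distance_days / scale_days) ** 2)


origin = datetime(1990, 1, 1)
at_origin = gaussian_decay(origin, origin)                         # top of the curve
at_scale = gaussian_decay(origin + timedelta(days=1000), origin)   # equals rate
far_away = gaussian_decay(origin + timedelta(days=2000), origin)   # lower still
```

Under this assumption, a larger scale flattens the curve (dates far from the origin keep more weight), while a smaller rate makes the weight fall off faster.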
Example
{
  BiographyPage(
    orderBy: { _ranking: SEMANTIC }
    where: { Born: { decay: { origin: "1990-01-01", scale: 10000, rate: 0.3 } } }
  ) {
    total
    items {
      _score
      Born
    }
  }
}
Influencing scores with numeric values
You can influence the score based on number fields of type Float and Int. This is useful for dynamic metadata with dimensions like "number of clicks", "number of upvotes", or "number in inventory", or for business logic like prioritization of pages. You usually do not want to order purely by these dimensions, but to combine them with other ranking dimensions like the similarity of query and content, time with a decay function (see above), the language of the query, or the location of the user. The result of the ranking calculations is represented by the _score field. A mixture of these dimensions makes it possible to implement personalized content delivery (search). There is no one-size-fits-all approach to tweaking and tuning the ranking on your data and with your queries, so test carefully, for example with an offline test using a set of queries.
You can influence the score in Optimizely Graph by adding the factor operator in a number field. It has two properties that can be set:
- value (float) – The factor that is multiplied with the number in the field. The default is 1.0, and it cannot be a negative number.
- modifier – One of the following enumeration values:
  - NONE (default): Do not apply any multiplier to the field value.
  - SQUARE: Square the field value (multiply it by itself).
  - SQRT: Take the square root of the field value.
  - LOG: Add 1 to the field value and take the natural logarithm.
  - RECIPROCAL: Reciprocate the field value, same as 1/x where x is the field's value.
If a field contains an array with multiple numbers, the lowest number is used.
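The modifiers and the lowest-number rule can be sketched in Python as follows. One point is an assumption: this page does not say whether the modifier is applied before or after multiplying by value, so the sketch applies the modifier to the product, as some search engines do. How the result is combined with the rest of the _score is also not modeled. factor_term is a hypothetical helper name.

```python
import math
from typing import Sequence, Union

Number = Union[int, float]

# One function per modifier enumeration value.
MODIFIERS = {
    "NONE": lambda x: x,
    "SQUARE": lambda x: x * x,
    "SQRT": math.sqrt,
    "LOG": math.log1p,              # natural logarithm of (1 + x)
    "RECIPROCAL": lambda x: 1.0 / x,
}


def factor_term(
    field: Union[Number, Sequence[Number]],
    value: float = 1.0,
    modifier: str = "NONE",
) -> float:
    """Score contribution of a number field under the stated assumptions."""
    if value < 0:
        raise ValueError("value cannot be a negative number")
    # For an array of numbers, the lowest number is used.
    number = min(field) if isinstance(field, (list, tuple)) else field
    # Assumed order: multiply first, then apply the modifier.
    return MODIFIERS[modifier](value * number)


clicks_term = factor_term([9, 4], modifier="SQRT")  # uses 4, the lowest number
```

Dampening modifiers such as SQRT and LOG keep very large counts from dominating the ranking, which is typically why they are preferred over NONE for click or view counters.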
Example
This GraphQL query shows how you can influence the score by the number of clicks.
{
  BiographyPage(
    locale: en
    orderBy: { _ranking: SEMANTIC }
    where: {
      _fulltext: { match: "female actor newton amy" }
      NumClicks: { gt: 1, factor: { value: 10, modifier: SQRT } }
    }
  ) {
    total
    items {
      _score
      _fulltext
      NumClicks
    }
  }
}
It returns the following results. The value of NumClicks affects the _score in combination with semantic search: the item with the most clicks is ranked at the top, but the click count does not completely override the relevance scoring, as the subsequent items show.
{
  "data": {
    "BiographyPage": {
      "total": 4,
      "items": [
        {
          "_score": 235.25735,
          "_fulltext": [
            "Alan Turing"
          ],
          "NumClicks": 5435
        },
        {
          "_score": 230.35423,
          "_fulltext": [
            "Test CharacterName",
            "This is a quote content",
            "Amy Winehouse",
            "5",
            "1",
            "10"
          ],
          "NumClicks": 1001
        },
        {
          "_score": 112.84823,
          "_fulltext": [
            "Marie Curie"
          ],
          "NumClicks": 1234
        },
        {
          "_score": 104.785904,
          "_fulltext": [
            "Arnold Schwarzenegger",
            "3",
            "7",
            "12"
          ],
          "NumClicks": 999
        }
      ]
    }
  }
}