Boosting

Optimizely Graph offers access to your data through a search engine. Besides fast retrieval, relevance ranking is a crucial feature of search engines: the most relevant results for a given query should appear at the top of the search engine result page (SERP), above less relevant ones. Customizing and tuning relevance ranking can improve the effectiveness of searches. Web users expect search engines to be effective. A poorly configured site or ecommerce search hurts not only conversion rates but also engagement and general trust in the site.

Applications of boosting

Optimizely Graph offers relevance ranking out of the box and lets you customize it with query boosting. Business logic or domain-specific logic can significantly improve the effectiveness of some queries. For example, matches in some fields may be more relevant than matches in others.

Another application of boosting is implementing sponsored search. When the query contains certain matching terms, you can expand it with the IDs of documents (certain pages or products) so that these documents display at the top of the SERP.

How does it work?

Regular boost

In the where input argument of the GraphQL schema, you add the boost option to each predicate that you want to boost. The boost value is of type Int. Optimizely Graph does not support negative boosting, so the value cannot be a negative number. If the value of boost is null, no boosting is applied.

By default, no boosting is applied to any predicate. Because boosting is meant to influence the relevance ranking, you should project the relevance score with the _score field so you can see its effect.

The following example matches on content types and boosts the media types to the top of the results.

{
  Content(
    locale: en,
    limit: 5,
    where: {
      _or: [
        {
          ContentType: {
            eq: "Content"
          }
        },
        {
          ContentType: {
            eq: "Media", 
            boost: 10
          }
        }
      ]
    }
  ) {
    items {
      ContentType
      _score
    }
    total
  }
}

Response

The media content types are boosted to the top.

{
  "data": {
    "Content": {
      "items": [
        {
          "ContentType": [
            "Image",
            "Media",
            "ImageFile",
            "Content"
          ],
          "_score": 149.00687
        },
        {
          "ContentType": [
            "Image",
            "Media",
            "ImageFile",
            "Content"
          ],
          "_score": 149.00687
        },
        {
          "ContentType": [
            "Image",
            "Media",
            "ImageFile",
            "Content"
          ],
          "_score": 149.00687
        },
        {
          "ContentType": [
            "Page",
            "StandardPage",
            "Content"
          ],
          "_score": 0.05333282
        },
        {
          "ContentType": [
            "Page",
            "StandardPage",
            "Content"
          ],
          "_score": 0.05333282
        }
      ],
      "total": 50
    }
  },
  "extensions": {
    "correlationId": "83b98f1c-b3d8-4719-a3ec-b484504c1f4e"
  }
}
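
When boosts come from configuration or business rules, it can help to generate the where clause in code. The following is a minimal Python sketch, not part of Optimizely Graph: the helper names boosted_predicate and or_clause are hypothetical, and the sketch only renders the text of a where clause like the one in the query above. It applies the rules described earlier: boost is an integer, cannot be negative, and is left out when it is null (None).

```python
import json
from typing import List, Optional


def boosted_predicate(field: str, value: str, boost: Optional[int] = None) -> str:
    """Render one `eq` predicate, adding `boost` only when it is set."""
    # json.dumps is used here only to quote and escape the string value.
    parts = [f"eq: {json.dumps(value)}"]
    if boost is not None:
        if isinstance(boost, bool) or not isinstance(boost, int):
            raise TypeError("boost must be an integer")
        if boost < 0:
            raise ValueError("negative boosting is not supported")
        parts.append(f"boost: {boost}")
    body = ", ".join(parts)
    return f"{{ {field}: {{ {body} }} }}"


def or_clause(predicates: List[str]) -> str:
    """Combine rendered predicates into an `_or` where clause."""
    return "{ _or: [" + ", ".join(predicates) + "] }"


# Hypothetical usage: match all content, but boost media items.
where = or_clause([
    boosted_predicate("ContentType", "Content"),
    boosted_predicate("ContentType", "Media", boost=10),
])
print(where)
```

Validating the boost before sending the query catches negative values early instead of relying on an error from the service.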

Boosting of datetime fields with a Gaussian function

You can prioritize recent results by applying a greater boost using a Gaussian function. Example use cases include:

Giving pages that were published more recently a higher score.
Ranking people higher depending on how close their date of birth is to a given point in time.

A Gaussian function used this way is called a decay function because the weight becomes lower as the distance from the origin grows, which traces a decay curve. In Optimizely Graph's query language, you can add decay to a field used in the where input types. It takes three parameters:

origin – The date, optionally with a time, at which the top of the curve (the central point) lies.
- Default – now()
scale – The distance from the origin, in days, that sets how quickly the score decays.
- Default – 1,000.
rate – Defines how documents are scored at the distance given by scale.
- Default – 0.5.
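
To build intuition for how these parameters interact, here is a minimal Python sketch of a Gaussian decay weight. It assumes the formulation commonly used by search engines such as Elasticsearch, in which a document exactly scale days from origin receives a weight equal to rate. The source does not confirm that Optimizely Graph uses this exact formula, and the function name gaussian_decay is hypothetical.

```python
import math
from datetime import date, timedelta


def gaussian_decay(value: date, origin: date,
                   scale: float = 1000.0, rate: float = 0.5) -> float:
    """Weight between 0 and 1: 1.0 at origin, `rate` at `scale` days away."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    if not 0.0 < rate < 1.0:
        raise ValueError("rate must be between 0 and 1 (exclusive)")
    distance_days = abs((value - origin).days)
    # Assumed formula: pick the variance so the weight equals `rate`
    # when the distance equals `scale`.
    variance = -(scale ** 2) / (2.0 * math.log(rate))
    return math.exp(-(distance_days ** 2) / (2.0 * variance))


origin = date(1990, 1, 1)
for days in (0, 5000, 10000, 20000):
    weight = gaussian_decay(origin + timedelta(days=days), origin,
                            scale=10000, rate=0.3)
    print(days, round(weight, 4))
```

Under this assumption, the weight is symmetric around origin, so dates before and after it decay in the same way.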

Example

{
  BiographyPage(
    orderBy: { _ranking: SEMANTIC }
    where: { Born: { decay: { origin: "1990-01-01", scale: 10000, rate: 0.3 } } }
  ) {
    total
    items {
      _score
      Born
    }
  }
}

Influencing scores with numeric values

You can influence the score based on number fields of type Float and Int. This is useful for dynamic metadata with dimensions like "number of clicks", "number of upvotes", or "number in inventory", or for business logic like prioritization of pages. Rather than ordering purely by these dimensions, you combine them with other ranking dimensions, such as the similarity of query and content, time with a decay function (see above), the language of the query, or the location of the user. The _score field represents the result of the ranking calculations. A mixture of these dimensions makes it possible to implement personalized content delivery (search). There is no one-size-fits-all approach to tuning the ranking on your data and with your queries, so test carefully, for example with an offline test using a set of queries.

You can influence the score in Optimizely Graph by adding the factor operator to a number field. It has two properties that can be set:

value (float), the factor by which the number in the field is multiplied. The default is 1.0, and it cannot be a negative number.
modifier, which has the following enumeration:
1. NONE (default): Do not apply any multiplier to the field value.
2. SQUARE: Square the field value (multiply it by itself).
3. SQRT: Take the square root of the field value.
4. LOG: Add 1 to the field value and take the natural logarithm.
5. RECIPROCAL: Reciprocate the field value, same as 1/x where x is the field's value.

If a field consists of an array with multiple numbers, the lowest number is used.
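
The modifiers can be illustrated with a small Python sketch. It rests on two assumptions that the text above does not settle: the factor value is multiplied with the field value before the modifier is applied, and the way the result is combined with the rest of the relevance score is not modeled. The names Modifier and factor_contribution are hypothetical.

```python
import math
from enum import Enum
from typing import Sequence, Union

Number = Union[int, float]


class Modifier(Enum):
    NONE = "NONE"
    SQUARE = "SQUARE"
    SQRT = "SQRT"
    LOG = "LOG"
    RECIPROCAL = "RECIPROCAL"


def factor_contribution(field_value: Union[Number, Sequence[Number]],
                        value: float = 1.0,
                        modifier: Modifier = Modifier.NONE) -> float:
    """Apply the factor and then the modifier to a numeric field value."""
    if value < 0:
        raise ValueError("value cannot be a negative number")
    # For an array of numbers, the lowest number is used.
    number = min(field_value) if isinstance(field_value, (list, tuple)) else field_value
    scaled = value * number  # assumed order: multiply first, then modify
    if modifier is Modifier.SQUARE:
        return float(scaled * scaled)
    if modifier is Modifier.SQRT:
        return math.sqrt(scaled)
    if modifier is Modifier.LOG:
        return math.log1p(scaled)  # natural logarithm of (1 + scaled)
    if modifier is Modifier.RECIPROCAL:
        return 1.0 / scaled
    return float(scaled)


for clicks in (999, 1001, 1234, 5435):
    print(clicks, factor_contribution(clicks, value=10, modifier=Modifier.SQRT))
```

Modifiers such as SQRT and LOG dampen large values, which keeps a single very popular item from dominating every other ranking dimension.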

Example

This GraphQL query shows how you can influence the score by the number of clicks.

{
  BiographyPage(
    locale: en
    orderBy: { _ranking: SEMANTIC }
    where: {
      _fulltext: { match: "female actor newton amy" }
      NumClicks: { gt: 1, factor: { value: 10, modifier: SQRT } }
    }
  ) {
    total
    items {
      _score
      _fulltext
      NumClicks
    }
  }
}

The query returns the following results. They show the impact of the value of NumClicks on the _score in combination with semantic search: the item with the most clicks is ranked at the top, but the clicks do not totally override the weight of the relevance scoring, as the subsequent items show.

{
  "data": {
    "BiographyPage": {
      "total": 4,
      "items": [
        {
          "_score": 235.25735,
          "_fulltext": [
            "Alan Turing"
          ],
          "NumClicks": 5435
        },
        {
          "_score": 230.35423,
          "_fulltext": [
            "Test CharacterName",
            "This is a quote content",
            "Amy Winehouse",
            "5",
            "1",
            "10"
          ],
          "NumClicks": 1001
        },
        {
          "_score": 112.84823,
          "_fulltext": [
            "Marie Curie"
          ],
          "NumClicks": 1234
        },
        {
          "_score": 104.785904,
          "_fulltext": [
            "Arnold Schwarzenegger",
            "3",
            "7",
            "12"
          ],
          "NumClicks": 999
        }
      ]
    }
  }
}