
Prevent indexing of PII data

This topic describes how to filter out personally identifiable information (PII) so that it is not indexed in Optimizely Search & Navigation.

Filtering PII is an important part of managing GDPR compliance.

Use IGDPRConventions and ITrackSanitizerPatternRepository to add the filtering.

Conventions

IGDPRConventions has the following methods:

  • Set the patterns used to remove GDPR data from a search query: public virtual void SetGDPRPatterns(List gdprPatterns)
  • Get the GDPR patterns to be removed from a search query: public virtual IEnumerable GetGDPRPatterns()
  • Delete the GDPR data in the search query that matches the patterns: public string RemoveGDPRDataInQuery(string queryStringQuery)

ITrackSanitizerPatternRepository

ITrackSanitizerPatternRepository has the following methods:

  • Add patterns to remove PII data from search queries
    • Add a single pattern: public string Add(TrackSanitizerPattern pattern)
    • Add multiple patterns: public void Add(IEnumerable<TrackSanitizerPattern> patterns)
  • Update patterns to remove PII data from search queries
    • Update a single pattern: public string Update(TrackSanitizerPattern pattern)
    • Update multiple patterns: public bool Update(IEnumerable<TrackSanitizerPattern> patterns)
  • Get patterns to remove PII data from search queries
    • Get all patterns: public IEnumerable<TrackSanitizerPattern> GetAll()
    • Get a pattern by Id: public TrackSanitizerPattern Get(string patternId)
  • Delete patterns used to remove PII data from search queries
    • Delete a pattern by Id: public void Delete(string patternId)
    • Delete all patterns: public void DeleteAll()

Example

Patterns can be plain text, wildcard, or regex. Here are some example filters:

  • Plain text (for example, a full name) – “John Smith”, “Steven” …
  • Wildcard (keywords that contain an email address) – “*@gmail.com”, “*@yahoo.com” …
  • Regex string – “\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*” …

The following sample registers several patterns, runs a tracked search, and then queries the GDPR data:

using System.Collections.Generic;

public class Sample
{
    protected IClient _client;
    protected IStatisticsClient _statisticsClient;
    protected ITrackSanitizerPatternRepository _trackSanitizerRepository;

    public Sample(IClient client)
    {
        _client = client;
        _statisticsClient = client.Statistics();
        _trackSanitizerRepository = client.TrackSanitizer().TrackSaniziterRepository;
    }

    public void Test()
    {
        // Add sanitizer patterns: two plain-text terms, three wildcards, and one regex.
        _trackSanitizerRepository.Add(new List<TrackSanitizerPattern>
        {
            new TrackSanitizerPattern
            {
                PatternString = "admin",
                PatternType = TrackSanitizerFilterType.PlainText
            },
            new TrackSanitizerPattern
            {
                PatternString = "email",
                PatternType = TrackSanitizerFilterType.PlainText
            },
            new TrackSanitizerPattern
            {
                PatternString = "*@mail.com",
                PatternType = TrackSanitizerFilterType.Wildcard
            },
            new TrackSanitizerPattern
            {
                PatternString = "1#1",
                PatternType = TrackSanitizerFilterType.Wildcard
            },
            new TrackSanitizerPattern
            {
                PatternString = "c[a-e]ll",
                PatternType = TrackSanitizerFilterType.Wildcard
            },
            new TrackSanitizerPattern
            {
                PatternString = @"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*",
                PatternType = TrackSanitizerFilterType.Regex
            }
        });

        // Run a tracked search.
        // "user@mail.com" is a placeholder address that matches the "*@mail.com" pattern.
        var result = _client
            .UnifiedSearchFor("user@mail.com")
            .StatisticsTrack()
            .GetResult();

        // Try to get GDPR data for an exact term that matches a sanitizer pattern.
        var response = _statisticsClient.GetGDPR("user@mail.com", x => { });
    }
}

The _statisticsClient.GetGDPR() API supports only exact-term search because of limitations in the statistics indexes.
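To see which query strings the Regex pattern from the sample would match, you can test the expression on its own with the standard .NET Regex class. This is a minimal sketch that uses only System.Text.RegularExpressions; the class name EmailPatternDemo and the helper ContainsEmail are hypothetical, and the sketch shows only what the expression matches, not how Search & Navigation applies the pattern internally.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailPatternDemo
{
    // Same expression as the Regex pattern in the sample above.
    private const string EmailPattern =
        @"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*";

    // Hypothetical helper: returns true if the query contains
    // something that looks like an email address.
    public static bool ContainsEmail(string query)
    {
        return Regex.IsMatch(query, EmailPattern);
    }

    public static void Main()
    {
        Console.WriteLine(ContainsEmail("contact john.smith@example.com")); // True
        Console.WriteLine(ContainsEmail("alloy meeting rooms"));            // False
    }
}
```

Checking a pattern this way before you add it to the repository can help you avoid patterns that are too broad and filter out legitimate search terms.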

Install and verify

The steps below describe how to implement and verify the PII filtering.

Install packages

  1. In Visual Studio, set the default project to Templates.Alloy.

  2. Install the following NuGet packages (use the “-pre” option to get the latest development package).

    • Find.Cms
    • Find.Statistics
  3. Open the Alloy web.config file and update the following entries: 

    • In the <episerver.find> tag
      • serviceUrl
      • defaultIndex
    • In the <episerver.find.ui> tag 
      • clientSideResourceBaseUrl
    • In the <appSettings> tag
      • add an item with key episerver:Find.TrackingSanitizerEnabled and value true
  4. Access Admin Mode and add a GDPR test page.
    a. Go to CMS > Admin > Content Type tab > Page Types > [Specialized] Start Page > Settings.
    b. Click Available Page Types, select [Specialized] Find GDPR API Demo Page, and click Save.
  5. Go to CMS Edit > navigation panel > Pages tab > Start branch of the tree structure.
  6. Create a GDPR Search page and publish it.
  7. Return to CMS > Admin view.
  8. Under Scheduled jobs, click Optimizely Find Content Indexing Job and start that job manually.
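
The web.config entries from step 3 might look like the following fragment. This is a sketch, not a complete configuration file: all values are placeholders, it assumes that serviceUrl, defaultIndex, and clientSideResourceBaseUrl are attributes of their respective elements, and it omits the section declarations and other settings that the Alloy site already contains.

```xml
<configuration>
  <!-- Placeholder values: use the service URL and index name of your own index. -->
  <episerver.find serviceUrl="https://your-service-url/your-private-key/"
                  defaultIndex="your_index_name" />

  <!-- Placeholder value: use the resource base URL for your installed version. -->
  <episerver.find.ui clientSideResourceBaseUrl="https://your-resource-base-url/" />

  <appSettings>
    <!-- Turns on the tracking sanitizer so that the PII patterns are applied. -->
    <add key="episerver:Find.TrackingSanitizerEnabled" value="true" />
  </appSettings>
</configuration>
```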

Verify

In these steps, you perform a search, delete the GDPR-related data, and add a filtering pattern to prevent it from being indexed.

  1. Open the GDPR Demo page created in the previous steps. Clear the GDPR pattern settings to verify that the tracking function runs well.
  2. Go to the search page and execute a search with some keywords.
  3. Go to the GDPR Demo page and review the displayed data.
  4. Delete the existing GDPR data and set patterns to prevent it.
  5. Search again and recheck for the GDPR data. This should now be filtered out.