Preventing indexing of PII data
This topic describes how to filter out personally identifiable information (PII) so that it is not indexed in Optimizely Search & Navigation.
Filtering PII is an important part of managing GDPR compliance.
How it works
You add the filtering through IGDPRConventions and ITrackSanitizerPatternRepository.
Conventions
IGDPRConventions has these methods.
Description | Sample |
---|---|
Set patterns to remove GDPR data from a search query. | public virtual void SetGDPRPatterns(List<GDPRPattern> gdprPatterns) |
Get the GDPR patterns to be removed from a search query. | public virtual IEnumerable<GDPRPattern> GetGDPRPatterns() |
Delete the GDPR data in the search query that matches the patterns. | public string RemoveGDPRDataInQuery(string queryStringQuery) |
ITrackSanitizerPatternRepository
The ITrackSanitizerPatternRepository has these methods.
Method description | Sample |
---|---|
Add patterns to remove PII data from search query | |
Add single pattern | public string Add(TrackSanitizerPattern pattern) |
Add multiple patterns | public void Add(IEnumerable<TrackSanitizerPattern> patterns) |
Update patterns to remove PII data from search query | |
Update single pattern | public string Update(TrackSanitizerPattern pattern) |
Update multiple patterns | public bool Update(IEnumerable<TrackSanitizerPattern> patterns) |
Get patterns to remove PII in search query | |
Get all patterns | public IEnumerable<TrackSanitizerPattern> GetAll() |
Get pattern by Id | public TrackSanitizerPattern Get(string patternId) |
Delete PII data in the search query that matched the patterns | |
Delete pattern by Id | public void Delete(string patternId) |
Delete all patterns | public void DeleteAll() |
Example
The patterns support plain text, wildcards, and regular expressions. Here are some example filters.
- Full name: “John Smith”, “Steven” …
- Keyword containing an email address: “*@gmail.com”, “*@yahoo.com” …
- Regex string: “\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*” …
public class Sample
{
    protected IClient _client;
    protected IStatisticsClient _statisticsClient;
    protected ITrackSanitizerPatternRepository _trackSanitizerRepository;

    public Sample(IClient client, IStatisticsClient statisticsClient)
    {
        _client = client;
        _statisticsClient = statisticsClient;
        _trackSanitizerRepository = new DefaultTrackSanitizerRepository(_client);
    }

    public void Run()
    {
        // Register the sanitizer patterns: plain text, wildcard, and regex.
        _trackSanitizerRepository.Add(new List<TrackSanitizerPattern>
        {
            new TrackSanitizerPattern
            {
                PatternString = "admin",
                PatternType = TrackSanitizerFilterType.PlainText
            },
            new TrackSanitizerPattern
            {
                PatternString = "email",
                PatternType = TrackSanitizerFilterType.PlainText
            },
            new TrackSanitizerPattern
            {
                PatternString = "*@mail.com",
                PatternType = TrackSanitizerFilterType.Wildcard
            },
            new TrackSanitizerPattern
            {
                PatternString = "1#1",
                PatternType = TrackSanitizerFilterType.Wildcard
            },
            new TrackSanitizerPattern
            {
                PatternString = "c[a-e]ll",
                PatternType = TrackSanitizerFilterType.Wildcard
            },
            new TrackSanitizerPattern
            {
                PatternString = @"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*",
                PatternType = TrackSanitizerFilterType.Regex
            }
        });

        // Run a tracked search whose query contains terms that match the patterns.
        var result = _client
            .UnifiedSearchFor("email admin [email protected] [email protected] [email protected] ball bell bill 121 131 141 call cell")
            .StatisticsTrack()
            .GetResult();

        // Try to fetch GDPR data for a keyword that matches a sanitizer pattern.
        var response = _statisticsClient.GetGDPR("@mail.com", x => { });
    }
}
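To make the three pattern types more concrete, here is a minimal, self-contained sketch of how such patterns could be matched against the terms of a query. It is not the Optimizely implementation. The names `FilterType`, `QuerySanitizerSketch`, `ToTermRegex`, and `Sanitize` are hypothetical, and the wildcard syntax (`*` for any run of characters, `?` for one character, `#` for one digit, `[a-e]` for a character range) is an assumption inferred from the sample patterns above. It also assumes that a matching term is dropped as a whole, which the real service may handle differently.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical stand-in for illustration only; not an Optimizely type.
public enum FilterType { PlainText, Wildcard, Regex }

public static class QuerySanitizerSketch
{
    // Builds a regex that must match one whole whitespace-separated term.
    public static Regex ToTermRegex(string pattern, FilterType type)
    {
        string body;
        switch (type)
        {
            case FilterType.PlainText:
                // Treat every character literally.
                body = Regex.Escape(pattern);
                break;
            case FilterType.Wildcard:
                // Assumed wildcard syntax: * ? # and [a-e] ranges.
                // Regex.Escape turns them into \* \? \# \[ so they can be swapped back.
                body = Regex.Escape(pattern)
                    .Replace(@"\*", @"\S*")
                    .Replace(@"\?", @"\S")
                    .Replace(@"\#", @"\d")
                    .Replace(@"\[", "[");
                break;
            default:
                // Regex patterns are used as written.
                body = pattern;
                break;
        }
        return new Regex("^(?:" + body + ")$");
    }

    // Returns the query with every term that matches any pattern removed.
    public static string Sanitize(
        string query,
        params (string Pattern, FilterType Type)[] patterns)
    {
        var regexes = patterns
            .Select(p => ToTermRegex(p.Pattern, p.Type))
            .ToList();

        var kept = query
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            .Where(term => !regexes.Any(r => r.IsMatch(term)));

        return string.Join(" ", kept);
    }
}
```

Under these assumptions, calling `QuerySanitizerSketch.Sanitize("email admin ball bell bill 121 131 141 call cell", ("admin", FilterType.PlainText), ("email", FilterType.PlainText), ("1#1", FilterType.Wildcard), ("c[a-e]ll", FilterType.Wildcard))` returns `ball bell bill`: the plain-text patterns remove `email` and `admin`, `1#1` removes the three numbers, and `c[a-e]ll` removes `call` and `cell` but leaves the terms that start with `b`.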
Installation and verification
The steps below describe how to implement and verify the PII filtering.
Prerequisites
- CMS Alloy sample site (for CMS 11 and Commerce 13) installed from the Visual Studio Extension. See also Installing Optimizely .NET5 for CMS 12 and Commerce 14.
- Optimizely Search & Navigation service URL and default index name, for example http://es-api-test01.episerver.com/<PRIVATE_KEY>.
- Optimizely Search & Navigation client-side resource base URL, for example https://dl.episerver.net/13.2.0.
- Optimizely Search & Navigation 13.2.0
Installation
- In Visual Studio, set the default project to Templates.Alloy.
- Install the following NuGet packages (use the “-pre” option to get the latest development package).
  - Find.Cms
  - Find.Statistics
- Open the Alloy web.config file and update the following entries:
  - In the <episerver.find> tag
    - serviceUrl
    - defaultIndex
  - In the <episerver.find.ui> tag
    - clientSideResourceBaseUrl
- Access Admin Mode and add a GDPR test page.
  a. Go to CMS > **Admin Mode** > **Content Type** tab > **Page Types** > **[Specialized] Start Page** > **Settings**.
  b. Click **Available Page Types**, check **[Specialized] Find GDPR API Demo Page**, and click **Save**.
  c. Go to CMS **Edit** > navigation pane > **Pages** tab > **Start** branch of the tree structure.
  d. Create a GDPR Search page and publish it.
  e. Return to CMS > **Admin Mode**.
  f. Under **Scheduled jobs**, click **Optimizely Find Content Indexing Job** and start that job manually.
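For the web.config step above, the updated entries might look like the following sketch. It assumes that the three settings are attributes on the two elements. The URLs reuse the examples from the prerequisites, and `YOUR_INDEX_NAME` is a placeholder, so substitute the values for your own service.

```xml
<!-- Sketch only: the attribute layout is assumed, and the values are placeholders. -->
<episerver.find
    serviceUrl="http://es-api-test01.episerver.com/<PRIVATE_KEY>"
    defaultIndex="YOUR_INDEX_NAME" />

<episerver.find.ui
    clientSideResourceBaseUrl="https://dl.episerver.net/13.2.0" />
```

Replace `<PRIVATE_KEY>` with your actual key before saving. Left as written, the angle brackets inside the attribute value are not valid XML.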
Verification
In these steps we perform a search, delete the GDPR-related data, and add a filtering pattern to prevent it from being indexed.
- Open the GDPR Demo page created in the previous steps. Clear the GDPR pattern settings to verify that tracking works.
- Go to the search page and run a search with some keywords.
- Go to the GDPR Demo page and review the displayed data.
- Delete the existing GDPR data and set patterns to prevent it from being tracked again.
- Search again and recheck the GDPR data. The matching terms should now be filtered out.