Area: Episerver Search & Navigation
Applies to versions: 13.2 and higher

Preventing indexing of PII data


This topic describes how to filter out personally identifiable information (PII) so that it is not indexed in Episerver Search & Navigation (formerly Episerver Find). Filtering out PII is an important part of managing GDPR compliance.

How it works

Filtering is added through two interfaces: IGDPRConventions and ITrackSanitizerPatternRepository.

Conventions

IGDPRConventions has these methods. Each description is followed by the method signature.

  • Set the patterns used to remove GDPR data from a search query.
    public virtual void SetGDPRPatterns(List<GDPRPattern> gdprPatterns)
  • Get the GDPR patterns to be removed from a search query.
    public virtual IEnumerable<GDPRPattern> GetGDPRPatterns()
  • Delete the GDPR data in the search query that matches the patterns.
    public string RemoveGDPRDataInQuery(string queryStringQuery)

ITrackSanitizerPatternRepository

The ITrackSanitizerPatternRepository has these methods. Each description is followed by the method signature.

  • Add patterns to remove PII data from a search query
    • Add a single pattern
      public string Add(TrackSanitizerPattern pattern)
    • Add multiple patterns
      public void Add(IEnumerable<TrackSanitizerPattern> patterns)
  • Update patterns to remove PII data from a search query
    • Update a single pattern
      public string Update(TrackSanitizerPattern pattern)
    • Update multiple patterns
      public bool Update(IEnumerable<TrackSanitizerPattern> patterns)
  • Get patterns to remove PII data from a search query
    • Get all patterns
      public IEnumerable<TrackSanitizerPattern> GetAll()
    • Get a pattern by Id
      public TrackSanitizerPattern Get(string patternId)
  • Delete patterns used to remove PII data from a search query
    • Delete a pattern by Id
      public void Delete(string patternId)
    • Delete all patterns
      public void DeleteAll()

Example

Patterns can be plain text, wildcards, or regular expressions. Here are some example filters.

  • Full name: “John Smith”, “Steven” …
  • Keyword containing an email address: “*@gmail.com”, “*@yahoo.com” …
  • Regex string: “\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*” …
using System.Collections.Generic;
using EPiServer.Find;
// Also import the Find namespaces that define IStatisticsClient and the
// TrackSanitizer* types in your package version.

public class Sample
  {
    protected IClient _client;
    protected IStatisticsClient _statisticsClient;
    protected ITrackSanitizerPatternRepository _trackSanitizerRepository;

    // The statistics client is passed in so that it is not null when Run() uses it.
    public Sample(IClient client, IStatisticsClient statisticsClient)
      {
        _client = client;
        _statisticsClient = statisticsClient;
        _trackSanitizerRepository = new DefaultTrackSanitizerRepository(_client);
      }

    public void Run()
      {
        // Add the sanitizer patterns: plain text, wildcard, and regex.
        _trackSanitizerRepository.Add(new List<TrackSanitizerPattern>
          {
            new TrackSanitizerPattern
              {
                PatternString = "admin",
                PatternType = TrackSanitizerFilterType.PlainText
              },
            new TrackSanitizerPattern
              {
                PatternString = "email",
                PatternType = TrackSanitizerFilterType.PlainText
              },
            new TrackSanitizerPattern
              {
                PatternString = "*@mail.com",
                PatternType = TrackSanitizerFilterType.Wildcard
              },
            new TrackSanitizerPattern
              {
                PatternString = "1#1",
                PatternType = TrackSanitizerFilterType.Wildcard
              },
            new TrackSanitizerPattern
              {
                PatternString = "c[a-e]ll",
                PatternType = TrackSanitizerFilterType.Wildcard
              },
            new TrackSanitizerPattern
              {
                PatternString = @"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*",
                PatternType = TrackSanitizerFilterType.Regex
              }
          });

        // Run a tracked search whose query contains terms that match the patterns.
        var result = _client
          .UnifiedSearchFor(@"email admin admin@mail.com admin1@mail.com 
            admin2@mail.com ball bell bill 121 131 141 call cell")
          .StatisticsTrack()
          .GetResult();

        // Try to get GDPR data for a keyword that matches a sanitizer pattern.
        var response = _statisticsClient.GetGDPR("@mail.com", x => { });
      }
  }
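
To make the three pattern types more concrete, the following standalone C# sketch shows one way such patterns could be applied to a query string. It is not the Episerver implementation. SketchPattern, SketchFilterType, and QuerySanitizerSketch are hypothetical names. The wildcard syntax (* for any run of characters, ? for one character, # for one digit, [a-e] for a character range) is an assumption inferred from the sample patterns, and matching whole whitespace-separated tokens is a simplification.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical stand-ins for illustration only; these are NOT the Episerver types.
public enum SketchFilterType { PlainText, Wildcard, Regex }

public sealed class SketchPattern
{
    public string PatternString { get; set; }
    public SketchFilterType PatternType { get; set; }
}

public static class QuerySanitizerSketch
{
    // Assumption: '*' = any run of characters, '?' = one character,
    // '#' = one digit, '[a-e]' = character range.
    public static string WildcardToRegex(string wildcard)
    {
        var sb = new StringBuilder("^");
        for (int i = 0; i < wildcard.Length; i++)
        {
            char c = wildcard[i];
            if (c == '*') sb.Append(".*");
            else if (c == '?') sb.Append('.');
            else if (c == '#') sb.Append(@"\d");
            else if (c == '[')
            {
                int close = wildcard.IndexOf(']', i + 1);
                if (close < 0) { sb.Append(@"\["); continue; } // unmatched '[' is literal
                sb.Append(wildcard.Substring(i, close - i + 1)); // copy the set as-is
                i = close;
            }
            else sb.Append(Regex.Escape(c.ToString()));
        }
        return sb.Append('$').ToString();
    }

    private static bool Matches(SketchPattern p, string token)
    {
        switch (p.PatternType)
        {
            case SketchFilterType.PlainText:
                return string.Equals(token, p.PatternString, StringComparison.OrdinalIgnoreCase);
            case SketchFilterType.Wildcard:
                return Regex.IsMatch(token, WildcardToRegex(p.PatternString), RegexOptions.IgnoreCase);
            default:
                return Regex.IsMatch(token, p.PatternString);
        }
    }

    // Drops every whitespace-separated token that matches at least one pattern.
    public static string Sanitize(string query, IEnumerable<SketchPattern> patterns)
    {
        var list = patterns.ToList();
        var kept = query
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            .Where(token => !list.Any(p => Matches(p, token)));
        return string.Join(" ", kept);
    }
}
```

Under these assumptions, passing the six patterns and the query string from the sample to Sanitize leaves only "ball bell bill". The behavior of the product itself may differ, for example in how it tokenizes the query or treats case.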

Installation and verification

The steps below describe how to set up and verify PII filtering.

Prerequisites

Installation

  1. In Visual Studio, set the default project to Templates.Alloy.
  2. Install the following NuGet packages (use the “-pre” option to get the latest development package).
    • Find.Cms
    • Find.Statistics
  3. Open the Alloy web.config file and update the following entries:
    • In the <episerver.find> tag
      • serviceUrl
      • defaultIndex
    • In the <episerver.find.ui> tag
      • clientSideResourceBaseUrl
  4. Access Admin Mode and add a GDPR test page.
    1. Go to CMS > Admin Mode > Content Type tab > Page Types > [Specialized] Start Page > Settings.

      [Image: page types screen]

    2. Click Available Page Types, select [Specialized] Find GDPR API Demo Page, and click Save.

    3. Go to CMS Edit > navigation pane > Pages tab > Start branch of the tree structure.
    4. Create a GDPR Search page and publish it.
    5. Return to CMS > Admin Mode.
    6. Under Scheduled jobs, click Episerver Find Content Indexing Job and start the job manually.

Verification

In these steps we perform a search, delete the GDPR-related data, and add a filtering pattern to prevent it from being indexed.

  1. Open the GDPR Demo page created in the previous steps. Clear the GDPR pattern settings so that you can verify that tracking works.

    [Image: demo page]

  2. Go to the search page and run a search with some keywords.

    [Image: search page]

  3. Go to the GDPR Demo page and review the displayed data.

  4. Delete the existing GDPR data and set patterns to prevent it from being indexed again.

  5. Search again and recheck for the GDPR data. It should now be filtered out.



Last updated: Jun 17, 2019
