Episerver Find Wildcard Queries and Best Bets | Episerver Developer Community

Episerver Find Wildcard Queries and Best Bets

I have used the approach detailed in Joel Abrahamsson's 2012 blog post, Wildcard Queries with Episerver Find, for quite a while. The Episerver Find built-in WildcardQuery has some important advantages. Notably, it provides a means to boost results that have wildcard search hits against a specific field or set of fields. But, in practice, wildcards are only one piece in the puzzle of constructing a good search experience for the user. 

The purpose of this blog post is to address some of the challenges that come up when using WildcardQuery: 

  • Best Bets
  • Multiple Fields
  • Multiple Words
  • Apostrophes

Getting Started

The code block below is the base query that we'll be working with. For the uninitiated, I've taken Joel's extension method and made one key update: asterisks are added to the query string within the method itself.  

public static ITypeSearch<T> WildcardSearch<T>(this ITypeSearch<T> search,
    string query, Expression<Func<T, string>> fieldSelector, double? boost = null)
{
    // Lower-case the input, then wrap it in asterisks so it matches partial words.
    query = query?.ToLowerInvariant();
    query = WrapInAsterisks(query);

    // Resolve the name of the analyzed index field behind the selected property.
    var fieldName = search.Client.Conventions
        .FieldNameConvention
        .GetFieldNameForAnalyzed(fieldSelector);

    var wildcardQuery = new WildcardQuery(fieldName, query)
    {
        Boost = boost
    };

    return new Search<T, WildcardQuery>(search, context =>
    {
        if (context.RequestBody.Query != null)
        {
            // A query already exists (for example from For()), so OR it with the wildcard query.
            var boolQuery = new BoolQuery();
            boolQuery.Should.Add(context.RequestBody.Query);
            boolQuery.Should.Add(wildcardQuery);
            boolQuery.MinimumNumberShouldMatch = 1;
            context.RequestBody.Query = boolQuery;
        }
        else
        {
            // No existing query, so the wildcard query stands alone.
            context.RequestBody.Query = wildcardQuery;
        }
    });
}

public static string WrapInAsterisks(string input)
{
    return string.IsNullOrWhiteSpace(input) ? "*" : $"*{input.Trim().Trim('*')}*";
}

In Joel's version, the consuming code added the asterisks. Here, if the query "viol" is passed, the method converts it to "*viol*" itself, which matches both "violin" and "viola".
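
To see how WrapInAsterisks() treats a few typical inputs, here is a minimal console sketch. The class name WrapInAsterisksDemo is made up for illustration, and the method body is copied from above so that the example compiles without any Episerver references.

```csharp
using System;

public static class WrapInAsterisksDemo
{
    // Copied from the extension class above so this sketch is self-contained.
    public static string WrapInAsterisks(string input)
    {
        return string.IsNullOrWhiteSpace(input) ? "*" : $"*{input.Trim().Trim('*')}*";
    }

    public static void Main()
    {
        Console.WriteLine(WrapInAsterisks("viol"));     // *viol*
        Console.WriteLine(WrapInAsterisks("  viol  ")); // *viol*   (surrounding whitespace is trimmed)
        Console.WriteLine(WrapInAsterisks("*viol*"));   // *viol*   (existing outer asterisks are not doubled)
        Console.WriteLine(WrapInAsterisks("vi*ol"));    // *vi*ol*  (inner asterisks are kept)
        Console.WriteLine(WrapInAsterisks(null));       // *        (empty input matches everything)
    }
}
```

Because outer asterisks are trimmed before wrapping, consuming code written for Joel's original version, which passes "*viol*", should keep producing the same pattern.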

This extension method can be called as follows: 

string query = "viol";
double pageNameBoost = 1.5;
var result = SearchClient.Instance.Search<PageData>()
    .WildcardSearch(query, x => x.PageName, pageNameBoost)
    .GetPagesResult();

Best Bets

One of the challenges of using wildcards is getting them to work with Episerver Find's Best Bets. Because wildcard queries use query strings with asterisks, best bets do not match. Consider the following example...

Say we have defined a Best Bet with the phrases "violin", "viola", and "viol" that points to a music teacher profile page: "Chen, L.", our primary music teacher for violins and violas. Whenever a user searches for "viol", the Best Bet is found, and the "Chen, L." teacher profile page appears at the top of the results.

But our site requirements also state that search should support partial word matches, which leads us to use the WildcardSearch() method defined above.

This is a problem because Best Bets are not wildcard enabled. Best Bet lookup doesn't treat an asterisk any differently than, say, an "a" or a "3". So when our WildcardSearch() method passes the phrase "*viol*" to Find, the string doesn't match on any Best Bet, and the "Chen, L." teacher profile page does not (necessarily) appear at the top of the results.

Note that the Find admin UI does not permit special characters, so even if we wanted to add a best bet for "*viol*" -- not that we should -- the system wouldn't allow it.

Fortunately, Best Bets can be added by chaining a plain vanilla For() to the search object. In our consuming code: 

string query = "viol";
double pageNameBoost = 1.5;
var result = SearchClient.Instance.Search<PageData>()
    .For(query)
    .InField(x => x.PageName)
    .ApplyBestBets()
    .WildcardSearch(query, x => x.PageName, pageNameBoost)
    .GetPagesResult();

Although repetitive, this works because WildcardSearch() ORs the query generated by For() with the WildcardQuery it uses under the hood. That is the purpose of BoolQuery and this line: 

boolQuery.Should.Add(context.RequestBody.Query);

InField() ensures that we only search against the field we are passing to WildcardSearch(), and avoid false positives from searching against the built-in All field.

We can tighten up reusability by putting these additional chains into another extension method:

public static ITypeSearch<T> ForWithWildcards<T>(this ITypeSearch<T> search,
    string query, Expression<Func<T, string>> fieldSelector, double? boost = null)
{
    return search
        .For(query)
        .InField(fieldSelector)
        .ApplyBestBets()
        .WildcardSearch(query, fieldSelector, boost);
}

Which would be called by the following code: 

string query = "viol";
double pageNameBoost = 1.5;
var result = SearchClient.Instance.Search<PageData>()
    .ForWithWildcards(query, x => x.PageName, pageNameBoost)
    .GetPagesResult();

I like to keep WildcardSearch() separate from ForWithWildcards() for situations where I need to provide my own sort order instead of sorting by score. Since Best Bets are irrelevant without score, I can spare Find the load of processing the QueryStringQuery created in For().

Side note: When the requirements call for Best Bets to appear at the top of a custom sorted set of results, you can retrieve Best Bets from BestBetRepository. BestBetRepository lives in the EPiServer.Find.Framework.BestBets namespace, and can be injected (or service located) into your consuming service.

Multiple Fields

With some minor refactoring, ForWithWildcards() and WildcardSearch() can accept multiple fields. In C# 7, System.ValueTuple -- which you can install from NuGet -- makes this a trivial effort:

public static ITypeSearch<T> ForWithWildcards<T>(this ITypeSearch<T> search,
    string query, params (Expression<Func<T, string>>, double?)[] fieldSelectors)
{
    return search
            .For(query)
            .InFields(fieldSelectors.Select(x => x.Item1).ToArray())
            .ApplyBestBets()
            .WildcardSearch(query, fieldSelectors);
}

public static ITypeSearch<T> WildcardSearch<T>(this ITypeSearch<T> search,
    string query, params (Expression<Func<T, string>>, double?)[] fieldSelectors)
{
    query = query?.ToLowerInvariant();
    query = WrapInAsterisks(query);

    var wildcardQueries = new List<WildcardQuery>();

    foreach (var fieldSelector in fieldSelectors)
    {
        string fieldName = search.Client.Conventions
            .FieldNameConvention
            .GetFieldNameForAnalyzed(fieldSelector.Item1);

        wildcardQueries.Add(new WildcardQuery(fieldName, query)
        {
            Boost = fieldSelector.Item2
        });
    }

    return new Search<T, WildcardQuery>(search, context =>
    {
        var boolQuery = new BoolQuery();

        if (context.RequestBody.Query != null)
        {
            boolQuery.Should.Add(context.RequestBody.Query);
        }

        foreach (var wildcardQuery in wildcardQueries)
        {
            boolQuery.Should.Add(wildcardQuery);
        }

        boolQuery.MinimumNumberShouldMatch = 1;
        context.RequestBody.Query = boolQuery;
    });
}

The calling code would then look something like this (depending on which fields you want to search against): 

var result = SearchClient.Instance.Search<PageData>()
    .ForWithWildcards("viol",
        (x => x.PageName, 1.5),
        (x => x.SearchText(), null))
    .GetPagesResult();

ValueTuple can, of course, be replaced with your own strongly typed class, but I have used it here for brevity.
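
For anyone who prefers the strongly typed route, here is a minimal sketch of what such a class could look like. WildcardField<T> and DemoPage are hypothetical names invented for this example. DemoPage only stands in for a real content type so that the sketch compiles without Episerver references.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical replacement for the (selector, boost) tuple; the name is illustrative only.
public sealed class WildcardField<T>
{
    public WildcardField(Expression<Func<T, string>> selector, double? boost = null)
    {
        Selector = selector ?? throw new ArgumentNullException(nameof(selector));
        Boost = boost;
    }

    // The property (or method) to run the wildcard query against.
    public Expression<Func<T, string>> Selector { get; }

    // Optional boost for hits in this field; null means no boost.
    public double? Boost { get; }
}

// Stand-in content type (an assumption) so the sketch runs on its own.
public class DemoPage
{
    public string PageName { get; set; }
}

public static class WildcardFieldDemo
{
    public static void Main()
    {
        var fields = new[]
        {
            new WildcardField<DemoPage>(x => x.PageName, 1.5),
            new WildcardField<DemoPage>(x => x.PageName) // no boost
        };

        foreach (var field in fields)
        {
            Console.WriteLine($"{field.Selector.Body} boosted: {field.Boost.HasValue}");
        }
    }
}
```

With a class like this, the extension methods would take params WildcardField<T>[] and read Selector and Boost where the tuple version reads Item1 and Item2.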

Multiple Words and Apostrophes

In our example above, we used the query string "viol", which WildcardSearch() mutates into "*viol*". But what if the user searches for, say, "viol lessons"? In the code above, this will become "*viol lessons*", which will not match against "violin" or "viola".

I like to solve this problem by splitting the query string on spaces into an array and then ORing a separate WildcardQuery per word. This is done in our WildcardSearch():

var words = query.Split(new [] { " " }, StringSplitOptions.RemoveEmptyEntries)
    .Select(WrapInAsterisks)
    .ToList();

...

foreach (var word in words)
{
    wildcardQueries.Add(new WildcardQuery(fieldName, word)
    {
        Boost = fieldSelector.Item2
    });
}

Another challenge is presented by apostrophes. The Find (Elasticsearch) standard analyzer interprets apostrophes as whitespace, so the phrase "Chen's" is indexed as "Chen s". This works with both plurals -- thanks to stemming -- and possessives, but causes trouble with other words that contain apostrophes.

For example, the name "O'Reilly Books" is indexed as "O Reilly Books". This presents a pattern matching issue for our WildcardSearch() -- and Find in general -- because the code above will mutate "O'Reilly Books" into "*o'reilly*" and "*books*", which Find will then interpret as "*o reilly*" and "*books*". If the user searches for "O'Reilly", then "O'Whatever" will also appear in the result list.

To address this scenario, I like to convert apostrophes into asterisks. "O'Reilly Books" becomes "*o*reilly*" and "*books*" (note that there are no spaces in "*o*reilly*"). Searches for "O'Reilly Books" do match "O'Reilly", do not match "O'Whatever", and don't interfere with plurals or possessives.

query = query.ToLowerInvariant().Replace('\'', '*');
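
The string handling can be tried in isolation from Find. The sketch below chains the lower-casing, the apostrophe replacement, the split, and WrapInAsterisks() into one helper. ToWildcardTerms and WildcardTermsDemo are hypothetical names used only for this example, and nothing here talks to the index.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WildcardTermsDemo
{
    // Copied from the extension class above so this sketch is self-contained.
    public static string WrapInAsterisks(string input)
    {
        return string.IsNullOrWhiteSpace(input) ? "*" : $"*{input.Trim().Trim('*')}*";
    }

    // Hypothetical helper: turns a raw query into one wildcard pattern per word.
    public static List<string> ToWildcardTerms(string query)
    {
        if (string.IsNullOrWhiteSpace(query))
            return new List<string>();

        return query
            .ToLowerInvariant()
            .Replace('\'', '*') // apostrophes become wildcards
            .Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries)
            .Select(WrapInAsterisks)
            .ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" | ", ToWildcardTerms("viol lessons")));   // *viol* | *lessons*
        Console.WriteLine(string.Join(" | ", ToWildcardTerms("O'Reilly Books"))); // *o*reilly* | *books*
        Console.WriteLine(string.Join(" | ", ToWildcardTerms("Chen's")));         // *chen*s*
    }
}
```

Each resulting pattern would then go into its own WildcardQuery per field.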

With multiple words and apostrophes accounted for, the final extension method code is the following: 

public static ITypeSearch<T> ForWithWildcards<T>(this ITypeSearch<T> search,
    string query, params (Expression<Func<T, string>>, double?)[] fieldSelectors)
{
    return search
            .For(query)
            .InFields(fieldSelectors.Select(x => x.Item1).ToArray())
            .ApplyBestBets()
            .WildcardSearch(query, fieldSelectors);
}

public static ITypeSearch<T> WildcardSearch<T>(this ITypeSearch<T> search,
    string query, params (Expression<Func<T, string>>, double?)[] fieldSelectors)
{
    // Nothing to search for, so leave the search untouched.
    if (string.IsNullOrWhiteSpace(query))
        return search;

    // Lower-case the input and turn apostrophes into wildcards.
    query = query.ToLowerInvariant().Replace('\'', '*');

    // One wildcard pattern per word, for example "*viol*" and "*lessons*".
    var words = query.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries)
        .Select(WrapInAsterisks)
        .ToList();

    var wildcardQueries = new List<WildcardQuery>();

    // Build one WildcardQuery per field and word combination.
    foreach (var fieldSelector in fieldSelectors)
    {
        string fieldName = search.Client.Conventions
            .FieldNameConvention
            .GetFieldNameForAnalyzed(fieldSelector.Item1);

        foreach (var word in words)
        {
            wildcardQueries.Add(new WildcardQuery(fieldName, word)
            {
                Boost = fieldSelector.Item2
            });
        }
    }

    return new Search<T, WildcardQuery>(search, context =>
    {
        var boolQuery = new BoolQuery();

        // Keep any existing query (for example from For()) as one of the OR clauses.
        if (context.RequestBody.Query != null)
        {
            boolQuery.Should.Add(context.RequestBody.Query);
        }

        foreach (var wildcardQuery in wildcardQueries)
        {
            boolQuery.Should.Add(wildcardQuery);
        }

        // At least one of the OR clauses has to match.
        boolQuery.MinimumNumberShouldMatch = 1;
        context.RequestBody.Query = boolQuery;
    });
}

public static string WrapInAsterisks(string input)
{
    return string.IsNullOrWhiteSpace(input) ? "*" : $"*{input.Trim().Trim('*')}*";
}

Enjoy!
