Removing extra results caused by article words in the search query

I'm finding that my free text search returns a lot of extra results when the search phrase includes an article. The articles in English are "the", "a" and "an".

Doing a search for the book "hunger games" produces 12,575 results.

Doing a search for "the hunger games" produces 414,524 results.

While the majority of the top products are the same, there are some differences in the results when the word "the" is included.

So far the only way I can see to trim this is to remove any article words from the query before executing the search. Is there a way to have Find do this through code or configuration? I don't want to switch from OR to AND for the search, as that would restrict the results too much.
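To illustrate why the article inflates the count, here is a toy C# sketch of OR matching over a handful of in-memory titles: with OR, a document matches if it shares any one term with the query, so a word as common as "the" pulls in unrelated documents. This is a simplification, not how Find or Elasticsearch actually analyzes and scores text, and the class name `OrMatchDemo` and the sample titles are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy illustration only (hypothetical class): a naive OR matcher over
// in-memory strings. It is NOT how Episerver Find / Elasticsearch matches
// or scores documents; it only shows why one common term widens an OR query.
public static class OrMatchDemo
{
    // Splits text into lower-case terms on whitespace.
    private static string[] Terms(string text) =>
        text.ToLowerInvariant().Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

    // Counts documents that share at least one term with the query (OR semantics).
    public static int CountOrMatches(IEnumerable<string> docs, string query)
    {
        var queryTerms = new HashSet<string>(Terms(query));
        return docs.Count(doc => Terms(doc).Any(queryTerms.Contains));
    }
}
```

With the titles "The Hunger Games", "Hunger Games Catching Fire", "The Great Gatsby", "The Hobbit" and "A Tale of Two Cities", `CountOrMatches(docs, "hunger games")` returns 2, while `CountOrMatches(docs, "the hunger games")` returns 4, because "The Great Gatsby" and "The Hobbit" now match on "the" alone.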

The search part of my query looks like this:

querySearch = _findClient.Search<BaseProduct>(Language.English)
    .For(query)
    .InField(x => x.DisplayName, 1.25)
    .InField(x => x.ByLine)
    .InField(x => x.Description)

    // Filters applied

    .GetResult();
#183414 Oct 12, 2017 18:21
  • Thanks for the reply, Bob. In the meantime I will keep stripping out these words with a regex, though it's not an ideal solution.

    Do you know whether Episerver plans to pick up this feature?

    #183434 Oct 13, 2017 10:31
  • Member since: 2008

    Hey Janaka

    The MoreLikeThis query has a StopWords method. Perhaps you could look at how it's implemented and reuse that implementation for a standard search?

    David

    #183453 Oct 13, 2017 22:43
  • Hi David

    Thanks for that. I took a look through this and I can see that the stop words applied to a MoreLikeQuery are sent in the JSON. However, this seems to be quite tightly coupled to the MoreLike query. I tried creating my own query type, but it didn't seem to have any effect on the main typed search.

    It looks like this is too internal to the standard typed search to extend at this point, though my experience here is limited. It's probably best left to the Find team.

    In the end I was able to come up with a solution that works.

    I created an extension method that removes the stop words from the query with a regex, and then I run my standard typed search on the result:

    querySearch.For(query.RemoveStopWords())
                        .InField(x => x.DisplayName)
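    The extension method itself wasn't posted, so here is a minimal sketch of what a regex-based RemoveStopWords() could look like. The class name, the articles-only stop-word list and the fall-back to the original text are all assumptions, not the code actually used above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: a sketch of a RemoveStopWords() string extension.
// Assumption: the stop-word list is just the English articles; extend as needed.
public static class StopWordExtensions
{
    // Matches "the", "a" or "an" as whole words (case-insensitive),
    // together with any whitespace that follows them.
    private static readonly Regex Articles = new Regex(
        @"\b(?:the|a|an)\b\s*",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static string RemoveStopWords(this string query)
    {
        if (string.IsNullOrWhiteSpace(query))
            return query;

        var stripped = Articles.Replace(query, string.Empty).Trim();

        // If the query was made up only of stop words (e.g. "the"),
        // fall back to the original text rather than searching for nothing.
        return stripped.Length > 0 ? stripped : query;
    }
}
```

    For example, "the hunger games".RemoveStopWords() returns "hunger games", and a query of just "the" is passed through unchanged so the search isn't empty. One trade-off of a fixed list is that it also strips articles that matter, such as the "A" in a title like "Plan A".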


    I then make sure that the full, original search term is the one that gets tracked:

    querySearch.Track(new[] {query});

    #183471 Oct 16, 2017 11:19