This topic explains how to build functionality to search for related objects using the MoreLike method in Episerver Find.
How it works
Use the MoreLike method to find documents whose text content is "like" a given string. This functionality is typically used for, but not limited to, finding related documents/objects.
A simple example can look like this:
searchResult = client.Search<BlogPost>()
    .MoreLike("guitar")
    .GetResult();
After invoking the MoreLike method, you can customize the search query with a number of methods. For instance, if the index does not contain many documents with similar content, you probably want to lower the minimum document frequency. This is the number of documents a word must occur in to be considered; words that occur in fewer documents are ignored. The default is five.
searchResult = client.Search<BlogPost>()
    .MoreLike("guitar")
    .MinimumDocumentFrequency(1)
    .GetResult();
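To make the effect of the minimum document frequency concrete, here is a minimal, self-contained sketch of the idea. It does not call Episerver Find and is not how the service is implemented; the class DocumentFrequencySketch, its methods, and the simple tokenizer are hypothetical names invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration only; not part of Episerver Find.
public static class DocumentFrequencySketch
{
    // Counts, for each word, how many documents contain it at least once.
    public static Dictionary<string, int> CountDocumentFrequencies(IEnumerable<string> documents)
    {
        var frequencies = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        foreach (var document in documents)
        {
            // Distinct, so a word repeated within one document is counted once.
            foreach (var word in Tokenize(document).Distinct(StringComparer.OrdinalIgnoreCase))
            {
                frequencies.TryGetValue(word, out var count);
                frequencies[word] = count + 1;
            }
        }
        return frequencies;
    }

    // Keeps the words of the "like" text that occur in at least
    // minimumDocumentFrequency documents; all other words are ignored.
    public static List<string> UsableTerms(
        string likeText,
        IDictionary<string, int> frequencies,
        int minimumDocumentFrequency)
    {
        return Tokenize(likeText)
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .Where(word => frequencies.TryGetValue(word, out var count)
                           && count >= minimumDocumentFrequency)
            .ToList();
    }

    // Naive tokenizer: splits on spaces and basic punctuation.
    private static IEnumerable<string> Tokenize(string text) =>
        text.Split(new[] { ' ', '.', ',', ';', ':', '!', '?' },
                   StringSplitOptions.RemoveEmptyEntries);
}
```

With only two documents that both mention "guitar", UsableTerms returns no terms for a threshold of 5 and returns "guitar" for a threshold of 1, which is why a small index usually needs a lower value.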
A full list of extension methods for customizing the query follows below. Before looking at those, consider an example of finding documents "related" to a given document. Assuming two BlogPosts with similar content are indexed, you can search for documents similar to the first and expect to get the second back, using a query such as this:
var firstBlogPost = // Some indexed blog post about guitars
var secondBlogPost = // Another blog post about guitars
searchResult = client.Search<BlogPost>()
    .MoreLike(firstBlogPost.Content)
    .MinimumDocumentFrequency(1)
    .Filter(x => !x.Id.Match(firstBlogPost.Id))
    .GetResult();
Note: When you issue these types of queries, cache the results, because they are unlikely to change often. Even if they do change, a delay of a few minutes usually does not matter.
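One way to apply this advice is a small time-based cache in front of the query. The following is a sketch under stated assumptions, not an Episerver Find feature: TimedCache and its members are hypothetical names, and the load delegate stands in for whatever code runs the MoreLike query.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical helper, not part of Episerver Find:
// keeps each loaded value for a fixed time span.
public class TimedCache<TKey, TValue>
{
    private readonly ConcurrentDictionary<TKey, (TValue Value, DateTime ExpiresAt)> _entries
        = new ConcurrentDictionary<TKey, (TValue Value, DateTime ExpiresAt)>();
    private readonly TimeSpan _timeToLive;
    private readonly Func<DateTime> _clock;

    // The clock parameter exists so that expiry can be tested; it defaults to UTC now.
    public TimedCache(TimeSpan timeToLive, Func<DateTime> clock = null)
    {
        _timeToLive = timeToLive;
        _clock = clock ?? (() => DateTime.UtcNow);
    }

    // Returns the cached value while it is fresh; otherwise calls load
    // (for example, a method that runs the MoreLike query) and stores the result.
    // Two concurrent callers may both invoke load; the last result wins.
    public TValue GetOrAdd(TKey key, Func<TKey, TValue> load)
    {
        var now = _clock();
        if (_entries.TryGetValue(key, out var entry) && entry.ExpiresAt > now)
        {
            return entry.Value;
        }

        var value = load(key);
        _entries[key] = (value, now + _timeToLive);
        return value;
    }
}
```

Keying the cache on the source document's ID with a time to live of a few minutes means repeated page views reuse one query result instead of querying the index each time.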
Because the nature of content can differ greatly between indexes and types, it is often a good idea to experiment with the available settings after invoking the MoreLike method. Below is a list of methods that can be called to customize the query. See also the Elasticsearch guide.
MinimumDocumentFrequency: Words that occur in fewer than this many documents are ignored. Default is 5.
MaximumDocumentFrequency: Words that occur in more than this many documents are ignored. Default is unbounded.
PercentTermsToMatch: The percentage of terms that a document must match. Default is 30 (percent).
MinimumTermFrequency: Terms that occur fewer than this many times in the source document are ignored. Default is 2.
MinimumWordLength: Words shorter than this length are ignored. Default is 0.
MaximumWordLength: Words longer than this length are ignored. Default is unbounded (0).
MaximumQueryTerms: The maximum number of query terms included in any generated query. Default is 25.
StopWords: A list of words that are considered "uninteresting" and are ignored.
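The term-level settings above (term frequency, word length, stop words, and the cap on query terms) can be pictured as a filter pipeline over the source text. The sketch below is a simplified, self-contained illustration and not the actual implementation: TermSelectionSketch and its parameter names are hypothetical, the defaults are the ones listed above, and ranking by raw term frequency is an assumption made to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration only; not part of Episerver Find.
public static class TermSelectionSketch
{
    public static List<string> SelectTerms(
        string sourceText,
        int minimumTermFrequency = 2,
        int minimumWordLength = 0,
        int maximumWordLength = 0,   // 0 means unbounded
        int maximumQueryTerms = 25,
        ISet<string> stopWords = null)
    {
        stopWords = stopWords ?? new HashSet<string>();

        return sourceText
            .ToLowerInvariant()
            .Split(new[] { ' ', '.', ',', ';', ':', '!', '?' },
                   StringSplitOptions.RemoveEmptyEntries)
            // Drop words that are too short or too long.
            .Where(word => word.Length >= minimumWordLength)
            .Where(word => maximumWordLength == 0 || word.Length <= maximumWordLength)
            // Drop "uninteresting" words.
            .Where(word => !stopWords.Contains(word))
            // Drop terms that are too rare in the source text.
            .GroupBy(word => word)
            .Where(group => group.Count() >= minimumTermFrequency)
            // Assumption: rank by frequency, then alphabetically for stable output.
            .OrderByDescending(group => group.Count())
            .ThenBy(group => group.Key, StringComparer.Ordinal)
            // Cap the number of terms that go into the generated query.
            .Take(maximumQueryTerms)
            .Select(group => group.Key)
            .ToList();
    }
}
```

For a short source text, the default minimum term frequency of 2 can remove almost every term, so lowering it is often the first setting to try when a query returns too few results.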