Episerver Find - Performance, Reloading Custom Data and GetContentResult vs GetResult

In this post I'll cover a few things about Episerver Find and Elasticsearch, and how some of Episerver's underlying code behaves when working with Find. Once broken down, this should help you understand how Episerver reloads data from Find under the covers.

I'll run through some common scenarios, such as creating performant indexed objects and indexing custom data, and then break down the best Find methods to use for searching that data.

Find Overview

The Episerver Find framework and packages sit on top of the powerful, scalable Elasticsearch platform https://www.elastic.co/, which allows data and models to be indexed in Episerver Find. For Episerver there are currently two solutions built on it:

  • Episerver Find - Episerver's native wrapper around Elasticsearch. It ships as a NuGet package that lets developers index Episerver content and retrieve it through the APIs described here https://world.episerver.com/documentation/developer-guides/find/. It also comes with a great dashboard for viewing statistics and indexing content, as well as configuring best bets, boosting and other such features.
  • Vulcan - A custom open source solution for indexing content in Elasticsearch https://github.com/TCB-Internet-Solutions/vulcan. It comes with many auto-indexing features similar to Find, a dashboard for viewing content and a great API.

Both of these solutions take content in Episerver and index it as serialized objects within Elasticsearch. The aim of this approach is that querying Elasticsearch and returning records should be super fast, with the data sourced from Elasticsearch itself. This is where things can get tricky without an understanding of Find's internals.

Reloading Custom Data

The searches we need to run against models indexed in Episerver Find don't always need only the core indexed properties. Find therefore lets you index custom fields, as described here https://world.episerver.com/documentation/developer-guides/find/NET-Client-API/Customizing-serialization/Including-fields/

This allows you to add your own properties to what Find indexes, even complicated nested objects. Once these are indexed you can query the data with the Find API, which is very powerful and supports nested faceted searches.

The same documentation states that you can handle reloading this data back from the index:

client.Conventions.ForInstancesOf<BlogPost>()
    .IncludeField(
        x => x.GetSomeField(), 
        (x, value) => x.SetSomeField(value));

This makes sense: if you index data through an extension method or property, you need a way of reloading it.
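As a minimal sketch of what the model side of that convention could look like, the class below pairs a getter that supplies the value to index with a setter that accepts the stored value back. Only the method names GetSomeField and SetSomeField come from the documentation example; the Title and Body properties and the summary logic are assumptions made purely for illustration.

```csharp
using System;

// Hypothetical model: only the GetSomeField/SetSomeField names come from the
// documentation example. Title, Body and the summary logic are illustrative.
public class BlogPost
{
    // Holds the value handed back by the index via SetSomeField.
    private string _summary;

    public string Title { get; set; }

    public string Body { get; set; }

    // Getter side of IncludeField: the value to store in the index.
    // Falls back to computing it when nothing has been reloaded yet.
    public string GetSomeField()
    {
        return _summary ?? BuildSummary();
    }

    // Setter side of IncludeField: receives the stored value on reload.
    public void SetSomeField(string value)
    {
        _summary = value;
    }

    // Stand-in for an expensive calculation we would rather not repeat.
    private string BuildSummary()
    {
        var body = Body ?? string.Empty;
        return body.Length <= 20 ? body : body.Substring(0, 20);
    }
}
```

The point of the pairing is that the expensive calculation only needs to run at index time, provided the setter is actually invoked when results come back.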

However, as the next section shows, you need to understand the search methods to be able to use this.

GetContentResult vs GetResult

Two of the main methods for querying data from Find are GetResult and GetContentResult, and it's very important to understand the difference between them.

GetContentResult

As you can see in the .NET guide https://world.episerver.com/documentation/developer-guides/find/NET-Client-API/searching/ there is a small note at the top of the page saying that if you are searching for objects that implement IContent you should use GetContentResult, and many of the examples use this method.

If you use this method it works exactly as we'd expect: it gets the content back. However, if you are trying to index and reload custom data as above, you'll find that the SetSomeField() method from the example is never called and the data is never set. So why is this?

If you look at the internal implementation, there's a key bit of code which is very interesting:

      SearchResults<ContentInLanguageReference> result = search1.GetResult<ContentInLanguageReference>();
      IContentRepository instance = ServiceLocator.Current.GetInstance<IContentRepository>();
      Dictionary<ContentInLanguageReference, IContent> dictionary = new Dictionary<ContentInLanguageReference, IContent>();
      foreach (IGrouping<string, ContentInLanguageReference> source in result.GroupBy<ContentInLanguageReference, string>((Func<ContentInLanguageReference, string>) (x => x.Language)))
      {
        foreach (IContent content in instance.GetItems((IEnumerable<ContentReference>) source.Select<ContentInLanguageReference, ContentReference>((Func<ContentInLanguageReference, ContentReference>) (x => x.ContentLink)).ToList<ContentReference>(), (LoaderOptions) new LanguageSelector(source.Key)))
          dictionary[new ContentInLanguageReference(content)] = content;
      }

Essentially, this method works with Find in the following steps:

  • Query against the Find instance (supports querying against custom data).
  • The code gets the data back from Find as a collection of language/content references.
  • The code then uses the IContentRepository to reload the data directly from the database/cache.

This essentially means the conventions we set up to reload the data from Find using SetSomeField() will never run, because the API reloads every item through the content repository. This to me seems like a large flaw in how this works: we've seen complex objects take a long time to reload their custom index data.
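From the calling side, those steps are hidden behind a single call. The following is a rough sketch rather than a definitive implementation: it assumes the Alloy ArticlePage type is in scope, and the ArticleSearch class and PrintArticles method are hypothetical names used only for illustration.

```csharp
using System;
using EPiServer.Find;
using EPiServer.Find.Cms;
using EPiServer.Find.Framework;

// Hypothetical helper; ArticlePage is assumed to be the Alloy page type.
public static class ArticleSearch
{
    public static void PrintArticles(string query)
    {
        // Step 1: the query itself runs against the Find index.
        var results = SearchClient.Instance.Search<ArticlePage>()
            .For(query)
            .Take(10)
            .GetContentResult();

        // Steps 2 and 3 happen inside GetContentResult: each item we
        // enumerate here was loaded through IContentRepository rather
        // than deserialized from the index.
        foreach (var page in results)
        {
            Console.WriteLine(page.Name);
        }
    }
}
```

Because the pages come from the content repository, any property whose value was meant to be restored from the index has to be worked out again when it is read.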

GetResult

Here is the real hero of the API. The GetResult method works in a completely different way from GetContentResult above:

  • Query against the Find instance (supports querying against custom data).
  • Deserializes the data back from Find directly into the custom object.

If we now go back to our custom data loading with SetSomeField(), this works exactly as expected and we can reload any complex data object directly from Find.

However, as mentioned above, you need to use GetContentResult for anything IContent. This is enforced to the point where, if you call GetResult on a search for a type that implements IContent, it throws an error telling you to use GetContentResult.

Solution

Obviously performance is key, and when indexing large datasets with complex data you really don't want everything reloading all the time. A key goal of using Elasticsearch is to deserialize the data straight back so that performance is as optimal as possible. So what's the solution?

1. Using Projections

Find supports projections out of the box, which lets you project the results returned from Episerver Find directly into a model or an anonymous type. This is great as it allows us to use GetResult() without the IContent restriction. https://world.episerver.com/documentation/developer-guides/find/NET-Client-API/searching/Projections/

The one caveat I've found is that it doesn't play 100% nicely when it comes to reloading the data if you are using extensions and the IncludeField() method as shown above. What does work is to lazy load the data in the model itself, for example looking at Alloy again:

    /// <summary>
    /// Used primarily for publishing news articles on the website
    /// </summary>
    [SiteContentType(
        GroupName = Global.GroupNames.News,
        GUID = "AEECADF2-3E89-4117-ADEB-F8D43565D2F4")]
    [SiteImageUrl(Global.StaticGraphicsFolderPath + "page-type-thumbnail-article.png")]
    public class ArticlePage : StandardPage
    {
        private string _findTestField;

        public string FindTestField
        {
            get => string.IsNullOrWhiteSpace(_findTestField) ? Name : _findTestField;
            set => _findTestField = value;
        }
    }

This takes care of indexing the item: FindTestField is indexed in the Find index with the value we want. Then, to get the data back as a projection:

var articlePage = SearchClient.Instance.Search<ArticlePage>()
    .Select(a => new
    {
        a.Name,
        a.FindTestField
    })
    .GetResult();

You will now find that your indexed property comes through on the projected object.

2. Creating a Custom Model

Creating a custom model can really save the day here. All you need to do is create a C# class for the content you want to search, populate it and then index it using the Index method. Here is an example of indexing a custom object (note this only indexes on initialization; in a real-world scenario you'd likely index differently, as described below):

    [ModuleDependency(typeof(IndexingModule))]
    public class FindInitialization : IInitializableModule
    {
        /// <inheritdoc />
        public void Initialize(InitializationEngine context)
        {
            SearchClient.Instance.Conventions
                .ForInstancesOf<Person>();

            var people = GetPeople();

            foreach (var person in people)
            {
                SearchClient.Instance.Index(person);
            }
        }

        /// <inheritdoc />
        public void Uninitialize(InitializationEngine context)
        {
        }

        private IEnumerable<Person> GetPeople()
        {
            return new List<Person>
            {
                new Person(new Guid("455469A8-654F-4541-96F4-6E958D8B7231"), "Scott", "Reed", 34),
                new Person(new Guid("BCF021FA-6790-40FD-B4C5-A1574F125CA8"), "Steven", "Galton", 25),
                new Person(new Guid("2DF87D73-2527-4247-950A-3BDEE966C114"), "David", "Harlow", 28)
            };
        }
    }

    public class Person
    {
        private IEnumerable<TestData> _testData;

        [Id]
        public Guid Id { get; set; }

        public string SearchTitle { get; set; }

        public IEnumerable<TestData> TestData
        {
            get => _testData ?? GetTestData();
            set => _testData = value;
        }

        public Person(Guid id, string firstname, string lastname, int age)
        {
            Id = id;
            Firstname = firstname;
            Lastname = lastname;
            Age = age;
            SearchTitle = $"{firstname} {lastname} {age}";
        }

        public string Firstname { get; set; }

        public string Lastname { get; set; }

        public int Age { get; set; }

        public IEnumerable<TestData> GetTestData()
        {
            return GetTestDataModels();
        }

        private static IEnumerable<TestData> GetTestDataModels()
        {
            return new List<TestData>
            {
                new TestData
                {
                    Test1 = "Test",
                    Reference = new ContentReference(5)
                },
                new TestData
                {
                    Test1 = "Test 2",
                    Reference = new ContentReference(6)
                },
                new TestData
                {
                    Test1 = "Test 3",
                    Reference = new ContentReference(7)
                },
                new TestData
                {
                    Test1 = "Test 4",
                    Reference = new ContentReference(9)
                }
            };
        }
    }

    public class TestData
    {
        public string Test1 { get; set; }

        public ContentReference Reference { get; set; }
    }

Here you can see that it's pretty easy to index custom data, and once indexed you can use the GetResult method. Note two key properties for a custom indexed object:

  • Id - This can be any type but needs the [Id] attribute. This is what Find uses to know whether it's an update or a create.
  • SearchTitle - This is used in the Find admin area when viewing the data. It lets you give your objects a custom name so you can identify them more easily in the data viewer.
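Once the Person objects are in the index, querying them with GetResult might look something like the sketch below. It assumes the Person class from the example above is in scope; the PersonSearch class, the search term and the age filter are hypothetical choices for illustration only.

```csharp
using System;
using EPiServer.Find;
using EPiServer.Find.Framework;

// Hypothetical helper; Person is the custom model indexed above.
public static class PersonSearch
{
    public static void PrintMatches()
    {
        var results = SearchClient.Instance.Search<Person>()
            .For("Scott")                        // free-text query
            .Filter(p => p.Age.GreaterThan(30))  // typed filter on an indexed property
            .Take(10)
            .GetResult();

        Console.WriteLine(results.TotalMatching);

        // Each Person is deserialized straight from the index,
        // with no round trip through the content repository.
        foreach (var person in results)
        {
            Console.WriteLine(person.SearchTitle);
        }
    }
}
```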

Steps To Index An IContent Item via a Custom Model

  1. Create a model that contains all of the data you need to search against and reload.
  2. Hook into the content events (saving, publishing, deleting, moving) and index your custom model whenever the IContent item it relates to changes, so the model is always in sync with the content.
  3. Use the GetResult method for quick and fast searching :-)
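The steps above can be sketched roughly as follows. This is only an outline under stated assumptions: ArticleSearchModel and ArticleSearchModelSync are hypothetical names, ArticlePage is assumed to be the Alloy page type, and only the published event is handled here. Saving, deleting and moving would need equivalent handlers.

```csharp
using System;
using EPiServer;
using EPiServer.Core;
using EPiServer.Find;
using EPiServer.Find.Framework;
using EPiServer.Framework;
using EPiServer.Framework.Initialization;
using EPiServer.ServiceLocation;

// Hypothetical search model (step 1): holds only what we search and reload.
public class ArticleSearchModel
{
    [Id]
    public Guid Id { get; set; }

    public string SearchTitle { get; set; }
}

// Hypothetical module (step 2): keeps the model in sync with the content.
[ModuleDependency(typeof(EPiServer.Web.InitializationModule))]
public class ArticleSearchModelSync : IInitializableModule
{
    public void Initialize(InitializationEngine context)
    {
        var events = ServiceLocator.Current.GetInstance<IContentEvents>();
        events.PublishedContent += OnPublished;
    }

    public void Uninitialize(InitializationEngine context)
    {
        var events = ServiceLocator.Current.GetInstance<IContentEvents>();
        events.PublishedContent -= OnPublished;
    }

    private static void OnPublished(object sender, ContentEventArgs e)
    {
        // Only react to the content type the search model mirrors.
        if (e.Content is ArticlePage page)
        {
            SearchClient.Instance.Index(new ArticleSearchModel
            {
                Id = page.ContentGuid,   // stable id, so re-publishing updates the same document
                SearchTitle = page.Name
            });
        }
    }
}
```

Using the content GUID as the [Id] means re-publishing a page overwrites the existing document instead of creating a duplicate.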

Note: remember to leave your core IContent object in the index if you are using Find for the CMS search as well, so admins can still use the core search features in the pages, commerce and assets panes.

With this method you can add complex data to your Find models, query against it using nested queries and have all of the data reload. In the project we worked on, moving from GetContentResult to GetResult with a custom model cut our search loading times from 10+ seconds to 0.5-1 second.

Thanks to Karl from Episerver who helped look into some of this. We have an R&D ticket open to try and solve some of these issues in the framework.

Thanks all, hope you have enjoyed.

Mar 19, 2019

Son Do
( By Son Do, 3/20/2019 8:37:53 AM)

Great finding (Y) This is a very good workaround.

I would prefer GetResult() to GetContentResult() because GetResult is lower level (in the EPiServer.Find package) and returns the "raw" indexed document, which we can work with directly.

The disadvantage of GetResult() is that the results aren't the original CMS content, so we have to work around that as described. The advantage is good performance.

( 3/20/2019 10:22:29 AM)

Thanks, yes this relies on the index always being right up to date, but it usually should be if it's a primary source of info. It's definitely been an amazing win for performance. Janaka Fernando also suggested that projects could do the same thing without custom models, but I quite like having full control via a custom indexing model.

( 3/20/2019 1:28:13 PM)

Projections rather, have updated the article to reflect
