Views: 1288
Number of votes: 7
Average rating:

Using KQL to list popular content from Profile Store

Shortly after Profile Store was released I wrote a post about how we can use it to do more than just process customer-centric data for display in insight and, instead, look at more event-centric data like popular site content. While the solution did the job, there was plenty of room for improvement. As I said in the post, “In an ideal world, we could do the aggregation as part of our query but unfortunately this isn’t supported right now”. Well, following the recent announcements about KQL support, this should now be possible so I thought it would be a good time to revisit that post and see how we could implement this “popular content” functionality today.

Introducing KQL

For those not familiar with KQL, it stands for Kusto Query Language and you’re most likely to have seen (or even used) it when interrogating data in Application Insights in Azure. There’s a brief overview of using KQL with profile store in the Profile Store documentation and a bit more detail in this post from Dmytro Duk. I won’t cover-off the specifics of KQL here as there are other resources out there for that purpose but it’s worth noting that it’s a huge step up from the previous filter-based queries we could perform against Profile Store data. Where previously we could only filter and sort the results returned, KQL gives us the ability to project, aggregate, join and transform that data. You can even apply machine learning algorithms for clustering data if you’re that way inclined, but for our purposes, we’ll stick with the basics.

Tracking

For the reasons mentioned in the original post, we’ll track our page views using the PageViewTrackingAttribute in EPiServer.Tracking.PageView which records epiPageView events in Profile Store which look like this:

{
    "TrackId": null,
    "DeviceId": "91c08036-5335-4762-9660-d17c5677fba9",
    "EventType": "epiPageView",
    "EventTime": "2019-11-29T13:57:13.9670478Z",
    "Value": "Viewed Start",
    "Scope": "463470c3-3eca-41d3-8b12-3f7f92f62d34",
    "CountryCode": "Localhost",
    "PageUri": "http://localhost:59422/",
    "PageTitle": null,
    "RemoteAddress": "::1",
    "Payload": {
        "epi": {
            "contentGuid": "bd437cef-41bd-4ebc-8805-0c20fcf4edcf",
            "language": "en",
            "siteId": "463470c3-3eca-41d3-8b12-3f7f92f62d34",
            "ancestors": [
                "43f936c9-9b23-4ea3-97b2-61c538ad07c9"
            ],
            "recommendationClick": null
        }
    },
    "User": {
        "Name": "admin",
        "Email": "email@address.blah",
        "Info": {}
    },
    "SessionId": "a7751741-6936-4f89-81ac-e50dd980ab13"
}

We can search on any of the properties shown in the JSON structure above but there’s a problem. In the original post we had the ability to filter by content type (e.g. return a list of popular content of type ArticlePage) but content type isn’t recorded in the epiPageView event so how did that work? Well, in the original post, the data was processed in a scheduled job (as it would have been too slow and resource-intensive to do on-the-fly) and stored in the dynamic data store so, at that point we could augment the data with, for example, the page type. Our searching was then done against the DDS rather than Profile Store.

In this instance I want to avoid using scheduled jobs and DDS queries if at all possible and try to condense everything we need into a KQL query, so we’re going to have to store the content type as part of that tracking request. We can do that by creating a tracking data interceptor which allows us to modify the page tracking data before it’s sent to Profile Store. This involves creating a class which implements ITrackingDataInterceptor, registering it as an instance of ITrackingDataInterceptor and swapping out the standard payload for one which contains an additional field called typeId which will hold our content type. This is done in the intercept method like this:

[ServiceConfiguration(ServiceType = typeof(ITrackingDataInterceptor), Lifecycle = ServiceInstanceScope.Singleton)]
public class ContentTypeTrackingInterceptor : ITrackingDataInterceptor
{
    private IContentLoader _contentLoader;
    private IContentTypeRepository _contentTypeRepository;

    public int SortOrder => 100;

    public ContentTypeTrackingInterceptor(IContentLoader contentLoader, IContentTypeRepository contentTypeRepository)
    {
        _contentLoader = contentLoader;
        _contentTypeRepository = contentTypeRepository;
    }

    public void Intercept<TPayload>(TrackingData<TPayload> trackingData)
    {
        if (trackingData == null || trackingData.Payload == null)
        {
            return;
        }
        if (!(trackingData.Payload is EPiServer.Tracking.PageView.EpiPageViewWrapper payload))
        {
            return;
        }

        // Create replacement Epi payload object
        var pageView = new EpiPageViewWithType(payload.Epi);
        var page = _contentLoader.Get<IContent>(payload.Epi.ContentGuid);
        pageView.TypeId = page.ContentTypeID;
        payload.Epi = pageView;
    }
}

If we run the site now and click around a bit, the recorded events should now have our Payload.epi.typeId field like this:

{
    "TrackId": null,
    "DeviceId": "91c08036-5335-4762-9660-d17c5677fba9",
    "EventType": "epiPageView",
    "EventTime": "2019-11-29T13:57:13.9670478Z",
    "Value": "Viewed Start",
    "Scope": "463470c3-3eca-41d3-8b12-3f7f92f62d34",
    "CountryCode": "Localhost",
    "PageUri": "http://localhost:59422/",
    "PageTitle": null,
    "RemoteAddress": "::1",
    "Payload": {
        "epi": {
            "typeId": 23,
            "contentGuid": "bd437cef-41bd-4ebc-8805-0c20fcf4edcf",
            "language": "en",
            "siteId": "463470c3-3eca-41d3-8b12-3f7f92f62d34",
            "ancestors": [
                "43f936c9-9b23-4ea3-97b2-61c538ad07c9"
            ],
            "recommendationClick": null
        }
    },
    "User": {
        "Name": "admin",
        "Email": "email@address.blah",
        "Info": {}
    },
    "SessionId": "a7751741-6936-4f89-81ac-e50dd980ab13"
}

KQL query

So now we’ve got the data in the correct format, let’s look at how we can query that data to retrieve popular content. To run a KQL query against Profile Store, first we need to wrap our query in a query object containing the KQL query and the scope to run it against like this:

{ 
    "Query": "...", 
    "Scope": "..." 
}

N.B. Unless you’ve explicitly set it to something else, the value of ‘Scope’ will be the GUID of the site you’re tracking which can be accessed as SiteDefinition.Current.Id.

This object needs to be POSTed as the body of a request to:
https://[your profilestore host name here]/api/v2.0/TrackEvents/preview

You’ll also need to set the following headers:
Authorization: epi-single [your token here]
Content-Type: application/json-patch+json

In our original post we started by querying all epiPageView events in the last 24 hours which we could do quite easily with a query like this:

Events
| where EventTime between (ago(24h) .. now())
and EventType == 'epiPageView'

If we use that query though, we’d still need to manually aggregate the data in a scheduled job as we did before, so we need to be more selective in what we return and we need to retrieve aggregated data rather than a raw listing of events. To do that we can add some additional predicates to the where clause then use KQL’s 'summarize' operator  alongside the 'top' operator to return us the guids for the top n content items of a given type, in a given language, under a given ancestor, ordered by number of views. Putting that together, we get a query like this:

Events
| where EventTime between (ago(24h) .. now())
    and EventType == 'epiPageView'
    and Payload.epi.language == 'en'
    and Payload.epi.ancestors contains('bd437cef-41bd-4ebc-8805-0c20fcf4edcf')
    and Payload.epi.typeId == 15
| summarize Count = count() by Payload.epi.contentGuid
| top 5 by Count desc

But there’s a caveat - data returned from a KQL query to Profile Store is returned in a fixed structure (the epiPageView JSON structure shown above) and when we call summarize, this cuts down the data returned to just include the count (if we assign it a name) and the values used to group the data. In our case we’re grouping the data by the page guid (Payload.epi.contentGuid) but, as that’s not a top-level variable name, we can’t retrieve it and get a 500 error. The workaround is to map the variable to one of the top-level property names. In my case I’ve chosen to use ‘Value’ so our slightly tweaked KQL query looks like this:

Events
| where EventTime between (ago(24h) .. now())
    and EventType == 'epiPageView'
    and Payload.epi.language == 'en'
    and Payload.epi.ancestors contains('bd437cef-41bd-4ebc-8805-0c20fcf4edcf')
    and Payload.epi.typeId == 15
| summarize Count = count() by Value = tostring(Payload.epi.contentGuid)
| top 5 by Count desc

Which will return us a list of content GUIDs as shown below which can then be used to load in the content items associated with each GUID.

{
    "items": [
        {
            "Value": "74964f63-98c9-4d05-8069-5f8221e0c6ad"
        },
        {
            "Value": "59f81174-6502-4c18-9e71-19cd7a6f2980"
        },
        {
            "Value": "6f3891ee-c7a1-4fc5-a12c-95222f05b537"
        },
        {
            "Value": "5f280886-1e3a-4ae0-a283-6b7c180abc82"
        },
        {
            "Value": "e99088f3-2394-49c0-8205-2e3a1aecc8f7"
        }
    ],
    "count": 5
}

Putting it all together

All that remains is to wrap that all in some code to substitute in the appropriate values into the query, make the requests and process the response, giving us two very similar methods to the “GetPopularPages” methods in the previous post but with one important difference. As we’re making the queries in real-time, we don’t have to have a fixed window for what’s classed as recent so, if articles attract thousands of views an hour but FAQs only attract a few per day, we could set the window for a recent view of an article to be 6 hours but an FAQ to be 7 days vastly improving the reusability of our block type.

Putting it all together, our code looks like this:

public class ProfileStoreHelper
{
    //Settings
    private string _apiRootUrl = ConfigurationManager.AppSettings["episerver:profiles.ProfileApiBaseUrl"];
    private string _appKey = ConfigurationManager.AppSettings["episerver:profiles.ProfileApiSubscriptionKey"];
    private string _eventUrl = "/api/v2.0/TrackEvents/preview";
    private string _scope = ConfigurationManager.AppSettings["episerver:profiles.Scope"] ?? SiteDefinition.Current.Id.ToString();

    private IContentLoader _contentLoader;
    private IContentTypeRepository _contentTypeRepository;

    public ProfileStoreHelper(IContentTypeRepository contentTypeRepository = null, IContentLoader contentLoader = null)
    {
        _contentTypeRepository = contentTypeRepository ?? ServiceLocator.Current.GetInstance<IContentTypeRepository>();
        _contentLoader = contentLoader ?? ServiceLocator.Current.GetInstance<IContentLoader>();
    }

    /// <summary>
    /// Get pages of a given type
    /// </summary>
    public IEnumerable<T> GetPopularPages<T>(ContentReference ancestor, string lang, int resultCount = 5, int recentHours = 24) where T : PageData
    {
        var contentTypeId = _contentTypeRepository.Load<T>().ID;
        var ancestorGuid = _contentLoader.Get<IContent>(ancestor).ContentGuid;
        var hits = GetRecentContentResponse(ancestorGuid, lang, resultCount, recentHours, contentTypeId);
        return hits?.Items?.Select(x => _contentLoader.Get<T>(x.Value)) ?? Enumerable.Empty<T>();
    }

    /// <summary>
    /// Get all popular content regardless of type
    /// </summary>
    public IEnumerable<IContent> GetPopularPages(ContentReference ancestor, string lang, int resultCount = 5, int recentHours = 24)
    {
        var ancestorGuid = _contentLoader.Get<IContent>(ancestor).ContentGuid;
        var hits = GetRecentContentResponse(ancestorGuid, lang, resultCount, recentHours);
        return hits?.Items?.Select(x => _contentLoader.Get<IContent>(x.Value)) ?? Enumerable.Empty<IContent>();
    }

    /// <summary>
    /// Make request to profile store API
    /// </summary>
    private RecentContentResponse GetRecentContentResponse(Guid ancestorGuid, string lang, int resultCount = 5, int recentHours = 24, int typeId = 0)
    {
        var requestBody = $"{{\"Query\": \"{GenerateKQLQuery(ancestorGuid, lang, resultCount, recentHours, typeId)}\", \"Scope\": \"{_scope}\" }}";
        var req = new RestRequest(_eventUrl, Method.POST);
        req.AddHeader("Authorization", $"epi-single {_appKey}");
        req.AddParameter("application/json-patch+json", requestBody, ParameterType.RequestBody);
        req.RequestFormat = DataFormat.Json;
        req.AddBody(requestBody);
        var client = new RestClient(_apiRootUrl);
        var getEventResponse = client.Execute(req);
        return JsonConvert.DeserializeObject<RecentContentResponse>(getEventResponse.Content);
    }

    /// <summary>
    /// Construct KQL query
    /// </summary>
    private string GenerateKQLQuery(Guid ancestorGuid, string lang, int resultCount = 5, int recentHours = 24, int typeId = 0)
    {
        var kqlQueryObj = @"Events 
            | where EventTime between(ago({0}h) .. now()) 
                and EventType == 'epiPageView' 
                and Payload.epi.language == '{1}' 
                and Payload.epi.ancestors contains('{2}') 
                {3}
            | summarize Count = count() by Value = tostring(Payload.epi.contentGuid)
            | top {4} by Count desc";
        //Only add type restriction if a type has been specified
        var typeQuery = typeId > 0 ? $"and Payload.epi.typeId == {typeId}" : string.Empty;
        return string.Format(kqlQueryObj, recentHours, lang, ancestorGuid.ToString(), typeQuery, resultCount);
    }
}

public class RecentContentResponse
{
    public int Count { get; set; }
    public RecentContentResponseItem[] Items { get; set; }
}

public class RecentContentResponseItem
{
    public Guid Value { get; set; }
}

Which we can then call from a block controller to give us something like this

As you can see, by taking advantage of the recent API updates in Profile Store we’ve managed to remove our reliance on scheduled jobs and the DDS, and cut down the amount of code we need to get useful data while improving the relevance of the results we return.

As with the previous post, I’ve added the code (including the block) to a Gist on GitHub but do bear in mind that this has been created as a proof-of-concept rather than a battle-hardened, production-ready feature so use it with caution.

Dec 04, 2019

David Knipe
( By David Knipe, 12/4/2019 12:01:27 PM)

 Nice write up, thanks for sharing! 

Scott Reed
( By Scott Reed, 12/4/2019 12:55:20 PM)

Sadly not got to use this yet but maybe one day. Night work Paul

Ravindra S. Rathore
( By Ravindra S. Rathore, 12/4/2019 6:45:56 PM)

Great post, Paul! Thanks for sharing this with us

Praful Jangid
( By Praful Jangid, 12/5/2019 3:03:22 AM)

Good job Paul. Hope soon to try this by my own hands.

@Scott seems like you are working late nights :D 

Nicola Ayan
( By Nicola Ayan, 12/5/2019 1:07:48 PM)

Ahh nice work Paul, thanks for sharing. Will give this a go.

Paul Gruffydd
( By Paul Gruffydd, 12/6/2019 10:13:14 AM)

Thanks all.

Please login to comment.