Include more data on a PageData object

Member since: 2007

Hi,


First, I'm not using PageTypeBuilder in this project, but I might add it for some specific page types.


A lot of the web pages are collections of other pages, e.g. teaser pages and, more specifically in this case, tabs.

Right now I'm using the PageIndexer and setting some conventions, like this:

PageIndexer.Instance.Conventions.ForInstancesOf<PageData>()
    .ShouldIndex(page =>
        PageTypesToIndex.Contains(page.PageTypeID) &&
        page["ExcludePageInSearch"] == null);

But how do I include more data for the pages, like tabs and teasers? The client class has IncludeField, so I tried that:

IClient client = Client.CreateFromConfig();

client.Conventions.ForInstancesOf<PageData>()
    .IncludeField(page => page.GetExtraContent());

GetExtraContent() is an extension method that loops through all tab pages and builds a single string from their content. But the content this method returns is not searchable, and I can't see it in 'Explore Index' either.


How can I add more content to a PageData so it's searchable?

 

#63632 Nov 22, 2012 16:02
  • If you are doing it exactly as in your example, I think the problem is that the second snippet sets the conventions on a new client created with Client.CreateFromConfig() rather than on the SearchClient singleton, so they are not there when the pages get indexed.

    SearchClient.Instance.Conventions.ForInstancesOf<PageData>()
                    .IncludeField(page => page.GetExtraContent());


    #63634 Nov 22, 2012 16:10
  • Member since: 2007

    Wow I feel stupid now :) Of course I have to use the same instance. Now it works.

    How should I format the content that I'm passing in? How does the SearchText property work? Should I remove all HTML? Is there a helper for removing unnecessary content, like tables and headings?

    #63636 Nov 22, 2012 16:47
  • There is a helper extension method for string called StripHtml that you can run; it should remove the tags for you.

    #63637 Edited, Nov 22, 2012 16:50
  • Regarding SearchText(): it loops through all properties and filters out those that aren't marked as searchable on the page type. It then sorts the properties, placing those of type string first. Finally, it concatenates what remains. In other words, it provides a decent default search text, which can be especially useful for non-PTB and non-v7 sites.

    #63643 Nov 22, 2012 22:26
  • Member since: 2007

    That was my guess too. Is it possible to hook in some code and add extra content to SearchText? Or should I go with my own extension method?

    How does relevancy work? Is a term more relevant the earlier it appears in SearchText? If so, it would be nice to be able to put headings, titles, keywords and so on first in the string.

    #63644 Nov 22, 2012 23:08
  • The default SearchText method isn't extensible in the sense that you can hook into it. However, you can easily replace it. If you're using PTB or CMS 7, the easiest way is to add a string property (non-EPiServer) named SearchText to your page types, which will then take precedence. You can also exclude the method and then include your own. In both cases you can choose to "extend" the default SearchText method by invoking it and adding what it returns to your own content before returning the value from your property or method.

    Regarding relevancy, all text in SearchText is treated equally. If you want to boost some specific text, you can include it in a separate field and search in that field as well (using .InField(...)). Doing that alone probably boosts that text already, but the InField method also lets you give that field a specific boost should you want to.

    #63666 Nov 25, 2012 23:11
  • Member since: 2007

    Joel: Overriding SearchText worked out great!

    First I had to exclude the default SearchText method and then include mine. I added one parameter to the method signature so I could call mine.

    SearchClient.Instance.Conventions.ForInstancesOf<PageData>()
        .ExcludeField(page => page.SearchText()) // Exclude the default SearchText
        .IncludeField(page => page.SearchHitTypeName())
        .IncludeField(page => page.SearchHitUrl())
        .IncludeField(page => page.SearchPublishDate())
        .IncludeField(page => page.SearchText(true)) // Include our extended SearchText
        .IncludeField(page => page.SearchTitle())
        .IncludeField(page => page.SearchTypeName())
        .IncludeField(page => page.SearchUpdateDate());

    // The extra bool parameter only distinguishes this overload from the default SearchText()
    public static string SearchText(this PageData page, bool extended)
    {
        StringBuilder content = new StringBuilder();
    
        content.AppendLine(page.SearchText());
    
        // Custom content
    
        return content.ToString().StripHtml();
    }

        

    #63800 Nov 29, 2012 17:09
  • Member since: 2009

    Does this approach with IncludeField and ExcludeField still work in Find v12? I can't seem to figure it out with the ContentIndexer.

    #170782 Oct 30, 2016 17:37
  • Member since: 2009

    Okay, figured out it should be:

    // Requires: using EPiServer.Find.ClientConventions;
    EPiServer.Find.Framework.SearchClient.Instance.Conventions
        .ForInstancesOf<PageData>()
        .ExcludeField(x => x.ACL);

    Matt

    #170789 Oct 31, 2016 0:45