EPiServerSearchMeta

Over the last few years, I’ve done four implementations of the Google Mini search appliance. This is a piece of hardware (a 1U rack mount) that acts as a search crawler and engine.

It crawls your Web site (or whatever else you point it at) 24 hours a day. You can throw queries at it via a REST interface and get results back as XML. You can also transform the XML on the device itself and use it to present results directly to the end user, but this is awkward and requires you to duplicate your interface on another machine, which is never fun.

The device is quite good for text-heavy search, and retails for $2,995, making it a cheap solution for a lot of situations.

The Mini can do fairly granular searching of META tags (see the search protocol reference). Over the years, we’ve figured out that you should stack as many META tags as possible in your pages, because you never know what you’re going to want to search on. If, for instance, your client wants to isolate a search to just news articles, then it’s helpful to have a META tag in there with the type of content (alternatively, you could create a distinct collection in the device, but maintaining these can be tedious).

For another CMS, we developed a control that dumped all sorts of META to the HEAD tag of the page.  We refined this over the years to only run for the Mini, since it got to the point where it was computationally expensive to find and return all this information, and we only needed it for the Mini (we didn’t need it for public search engines, for instance).

For our first EPiServer/Mini integration, we adapted the control a bit, but the functionality is roughly the same – it dumps all sorts of information to META tags, including any properties you might specify.

Register it like this:

<%@ Register TagPrefix="Blend" Namespace="Blend.EPiServer.Controls" Assembly="[insert your assembly name here]" %>

Then put the control in the HEAD tag like this:

<Blend:EPiServerSearchMeta TagNameFormat="MySite.EPiServer.{0}" UserAgentString="gsa" QuerystringCode="OpenSesame" Properties="Title,Summary" runat="server" />

It will only run when the currently executing page is of type TemplatePage (so, only for EPiServer templates that have a content object attached).

The control outputs the following information:

  • The page ID
  • The page type ID
  • The page type name
  • The page name
  • The parent page ID
  • The parent page type ID
  • The parent page type name
  • Every page ID from the current page’s parent back to the start page (in multiple META tags)
  • The depth of the page (the start page is 0, top level pages are 1, etc.)

It looks like this:

<meta name="MySite.EPiServer.PageID" content="9" />
<meta name="MySite.EPiServer.PageTypeID" content="7" />
<meta name="MySite.EPiServer.PageTypeName" content="NewsArticle" />
<meta name="MySite.EPiServer.PageName" content="Deane Saves the World" />
<meta name="MySite.EPiServer.ParentPageID" content="8" />
<meta name="MySite.EPiServer.ParentTypeID" content="5" />
<meta name="MySite.EPiServer.ParentTypeName" content="NewsArchive" />
<meta name="MySite.EPiServer.AncestorID" content="8" />
<meta name="MySite.EPiServer.AncestorID" content="7" />
<meta name="MySite.EPiServer.AncestorID" content="3" />
<meta name="MySite.EPiServer.PageDepth" content="3" />
<meta name="MySite.EPiServer.Category" content="7" />
<meta name="MySite.EPiServer.Category" content="9" />
<meta name="MySite.EPiServer.Category" content="13" />
<meta name="MySite.EPiServer.Category" content="15" />
<meta name="MySite.EPiServer.Category" content="16" />
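
The AncestorID and PageDepth values in this sample can be derived by walking the parent chain from the current page back to the start page. Here is a minimal Python sketch of that walk. It is an illustration only, not the control's actual C# code; the `PARENTS` table and the `ancestor_ids` function are hypothetical names, with page IDs chosen to match the sample output.

```python
# Hypothetical page tree: page ID -> parent page ID (None marks the start page).
PARENTS = {3: None, 7: 3, 8: 7, 9: 8}


def ancestor_ids(page_id, parents):
    """Return ancestor IDs from the immediate parent back to the start page."""
    ancestors = []
    current = parents[page_id]
    while current is not None:
        ancestors.append(current)
        current = parents[current]
    return ancestors


ancestors = ancestor_ids(9, PARENTS)
depth = len(ancestors)  # the start page has no ancestors, so its depth is 0
print(ancestors, depth)
```

Writing one tag per ancestor, rather than a single delimited list, presumably lets a query match any page that sits somewhere beneath a given branch.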

There are a few control attributes…

TagNameFormat is the format of the “name” attribute of the resulting META tag.  So, in the above example, the Page Type ID of the content will output as:

<meta name="MySite.EPiServer.PageTypeID" content="7" />
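
To make the substitution concrete, here is a small Python sketch of how a format such as "MySite.EPiServer.{0}" could be expanded into a tag. The `meta_tag` function is a hypothetical name, and the HTML escaping is my own assumption; the post does not say whether the control escapes values.

```python
from html import escape


def meta_tag(name_format, key, value):
    """Build one META tag; name_format uses {0} as the placeholder for key."""
    name = name_format.replace("{0}", key)
    # Escaping is an assumption added for safety, not documented behavior.
    return '<meta name="{}" content="{}" />'.format(
        escape(name, quote=True), escape(str(value), quote=True)
    )


print(meta_tag("MySite.EPiServer.{0}", "PageTypeID", 7))
```

Prefixing every name with a site-specific string keeps these tags from colliding with any other META already on the page.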

Properties is a comma-delimited list of properties you want to dump to META.  Be careful here, obviously – the entire text of the content object is unnecessary and potentially problematic.  The control will simply call ToWebString() on all of them, so make sure this outputs what you want.  Also, if the property is a Category selection, the control will split the IDs up under separate tags.
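
As a rough illustration of the behavior described for Properties, the Python sketch below splits the comma-delimited list and breaks a category selection into one entry per ID. Everything here is hypothetical: the `property_values` function, the dictionary standing in for a page, and the assumption that a category selection renders as a comma-separated string of IDs.

```python
def property_values(properties_attr, page, category_props):
    """Return (property name, content) pairs, one pair per category ID."""
    pairs = []
    for name in (p.strip() for p in properties_attr.split(",")):
        if not name or name not in page:
            continue  # skip blanks and properties this page does not have
        value = page[name]
        if name in category_props:
            # Assumed: a category selection renders as comma-separated IDs.
            ids = [part.strip() for part in value.split(",")]
            pairs.extend((name, cat_id) for cat_id in ids if cat_id)
        else:
            pairs.append((name, value))
    return pairs


# Hypothetical page: property name -> the string the property renders to.
page = {"Title": "Deane Saves the World", "Category": "7,9,13"}
for name, content in property_values("Title,Summary,Category", page, {"Category"}):
    print(name, content)
```

Each pair would then become its own META tag, which is why the sample output repeats the Category tag once per ID.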

UserAgentString is used to identify the crawler. Enter a value that appears only in your crawler’s user agent string – “gsa” works well for the Mini. If the control finds this string in the request’s user agent, it will execute; otherwise, it will exit without doing anything.

QuerystringCode is a secret code you can use to debug the control.  If this value is found in a querystring argument called “show_meta,” the control will always execute (regardless of the user agent string). This is useful for debugging, so you can see the META it outputs.
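
Taken together, UserAgentString and QuerystringCode act as a gate in front of the control. The Python sketch below shows one way that decision could look. The `should_emit` function is a hypothetical name, the case-insensitive substring match is an assumption (the post does not say how the comparison is done), and the user agent strings are examples only.

```python
def should_emit(user_agent, query, ua_token, secret):
    """Return True when the META tags should be written for this request."""
    # Debug override: ?show_meta=<secret> forces output for any user agent.
    if secret and query.get("show_meta") == secret:
        return True
    # Assumed: a case-insensitive substring match on the User-Agent header.
    return bool(ua_token) and ua_token.lower() in (user_agent or "").lower()


print(should_emit("gsa-crawler", {}, "gsa", "OpenSesame"))
print(should_emit("Mozilla/5.0", {"show_meta": "OpenSesame"}, "gsa", "OpenSesame"))
print(should_emit("Mozilla/5.0", {}, "gsa", "OpenSesame"))
```

With the attributes from the example above, appending ?show_meta=OpenSesame to a page URL in an ordinary browser would show the tags exactly as the crawler sees them.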

Get the Code (.zip file, containing a single .cs file)

Sep 01, 2009

Guest
(By Guest, 9/21/2010 12:32:37 PM)

Awesome stuff Deane. Nice to see you blogging
/ Jacob Khan

jwilliams
(By jwilliams, 9/21/2010 12:32:37 PM)

Nice article. Have you ever had the Google Mini indexing a document surfaced on the web via EPiServer SharePoint Connect?

Guest
(By Guest, 9/21/2010 12:32:37 PM)

Joel: I have not, sorry.
/ Deane

jwilliams
(By jwilliams, 9/21/2010 12:32:37 PM)

no worries, thanks anyway.
