Magnus von Wachenfeldt
Feb 12, 2010

Making EPiServer and Search Engines Work Together

With the dawn of the search appliance, everybody wants to rack up an external box in their server hall, leaving the gritty details of indexing and searching to companies that specialize in doing it best.

We often forget how picky these magic boxes are. They cannot think for themselves, and they thrive on being given exact directions on how to index your information correctly. That is where we fail more often than not, many times without even knowing that we failed.

The Problem

When you delete or move pages in EPiServer, no information is stored about that specific page’s history. A request for the old URL gets either 200 OK or 404 Not Found, with no regard for whether the page has actually been moved or deleted. This confuses search engines, clutters your index and breaks links from external sources whenever you restructure your page tree.

By attaching to some of the events exposed by EPiServer’s DataFactory, I will try to fix this problem with a custom module, so let’s implement IHttpModule and get running.

Wiring Up

First, let us take a look at initializing the module.

    public void Init(HttpApplication context)
    {
        SetupEvents();

        context.EndRequest += new EventHandler(context_EndRequest);
    }

In SetupEvents() we need to listen to the relevant events.

    private Type _urlMapperType = typeof(SimpleSqlUrlMapper);
    private static readonly object _lockObject = new object();
    private static volatile bool _initialized = false;

    // Creates a new mapper instance on every access.
    private IUrlMapper Mapper
    {
        get
        {
            return Activator.CreateInstance(_urlMapperType) as IUrlMapper;
        }
    }

    private void SetupEvents()
    {
        if (_initialized)
        {
            return;
        }

        // lock always releases exactly what it acquired. Calling Monitor.Exit
        // after a failed Monitor.TryEnter would throw.
        lock (_lockObject)
        {
            // Check again: another thread may have finished while we waited.
            if (_initialized)
            {
                return;
            }

            this.Mapper.GenerateSchema();

            EPiServer.DataFactory.Instance.DeletedPage +=
                new EPiServer.PageEventHandler(Instance_DeletedPage);

            EPiServer.DataFactory.Instance.MovedPage +=
                new EPiServer.PageEventHandler(Instance_MovedPage);

            EPiServer.DataFactory.Instance.PublishedPage +=
                new EPiServer.PageEventHandler(Instance_PublishedPage);

            _initialized = true;
        }
    }

Locking here is very important: Init runs once for every HttpApplication instance in the pool, and without the lock a race condition could hook up the events multiple times. Inside the lock I generate the schema for the SimpleSqlUrlMapper, and a static boolean then tells us whether the module has already been initialized.

IUrlMapper is a simple interface with just a couple of methods. I was thinking about splitting responsibility for some things into another interface, but in the end I wanted something that’s just simple to set up.

    public interface IUrlMapper
    {
        void GenerateSchema();
        void Add301MovedPermanentlyStatus(string oldUrl, string newUrl);
        void Add410GoneStatus(string url);
        void RemoveMappings(string url);
        bool Exists(string url);
        string GetRedirectUrl(string oldUrl);
        HttpStatusCode? GetStatusCode(string url);
    }
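For readers who want to experiment without a database, here is a minimal in-memory sketch of the interface. It is not the SimpleSqlUrlMapper used by the module: InMemoryUrlMapper is a hypothetical name, and it assumes that RemoveMappings simply drops the entry stored for the given URL. The interface is repeated only so that the block compiles on its own. The storage is static because the Mapper property creates a new mapper instance on every access.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

public interface IUrlMapper
{
    void GenerateSchema();
    void Add301MovedPermanentlyStatus(string oldUrl, string newUrl);
    void Add410GoneStatus(string url);
    void RemoveMappings(string url);
    bool Exists(string url);
    string GetRedirectUrl(string oldUrl);
    HttpStatusCode? GetStatusCode(string url);
}

// Hypothetical in-memory mapper, for illustration and tests only.
public class InMemoryUrlMapper : IUrlMapper
{
    // Key: old URL. Value: redirect target, or null meaning "410 Gone".
    private static readonly Dictionary<string, string> _mappings =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
    private static readonly object _sync = new object();

    public void GenerateSchema()
    {
        // Nothing to create when everything lives in memory.
    }

    public void Add301MovedPermanentlyStatus(string oldUrl, string newUrl)
    {
        lock (_sync) { _mappings[oldUrl] = newUrl; }
    }

    public void Add410GoneStatus(string url)
    {
        lock (_sync) { _mappings[url] = null; }
    }

    // Assumption: removes only the entry keyed by this URL.
    public void RemoveMappings(string url)
    {
        lock (_sync) { _mappings.Remove(url); }
    }

    public bool Exists(string url)
    {
        lock (_sync) { return _mappings.ContainsKey(url); }
    }

    public string GetRedirectUrl(string oldUrl)
    {
        lock (_sync)
        {
            string target;
            return _mappings.TryGetValue(oldUrl, out target) ? target : null;
        }
    }

    public HttpStatusCode? GetStatusCode(string url)
    {
        lock (_sync)
        {
            string target;
            if (!_mappings.TryGetValue(url, out target))
            {
                return null; // unknown URL: leave the 404 alone
            }
            return target != null ? HttpStatusCode.MovedPermanently : HttpStatusCode.Gone;
        }
    }
}
```

Since the mappings vanish when the application restarts, something like this is only useful for trying the module out or for unit tests.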

Now, let us see what is hidden in context_EndRequest().

    private void context_EndRequest(object sender, EventArgs e)
    {
        HttpApplication app = sender as HttpApplication;
        if (app != null && app.Context.Response.StatusCode == (int)HttpStatusCode.NotFound)
        {
            HttpStatusCode? statusCode =
                this.Mapper.GetStatusCode(app.Context.Request.Url.AbsolutePath);

            if (statusCode != null)
            {
                app.Context.Server.ClearError();
                app.Context.Response.Clear();
                app.Context.Response.StatusCode = (int)statusCode.Value;

                if (statusCode == HttpStatusCode.MovedPermanently)
                {
                    app.Context.Response.AddHeader("Location",
                        this.Mapper.GetRedirectUrl(app.Context.Request.Url.AbsolutePath));
                }
            }
        }
    }

We need to attach to EndRequest, since the Error event will not work under IIS7 in Integrated Mode. The code is pretty self-explanatory: if we get a 404, we check whether we have mapped this URL and, if so, respond with the correct status code.

I will not go into the implementation details of the UrlMapper, because that is not really relevant to the problem. Instead, let’s see what happens with this module set up!

Shuffling Pages Around

Let’s create a new page and fiddle around to see what happens!

[Image: new page]

Now, let’s move it to “News”.

[Image: moved page once]

And we end up with this in our table.

[Image: table moved once]

That is all fine and dandy. Now let’s see if our magic HTTP module works by browsing to http://acme/en/test-page/!

[Image: redirect first page]

Hooray! We got a 301 and the browser redirected us. Let us move the page around a little more to see what happens.

[Image: moved page twice]

This should make both /en/test-page/ and /en/News/test-page/ tell us that this page now resides under /en/Events/test-page/.

[Image: table moved twice]

…but let’s say you decided you wanted it under the start page after all.

[Image: moved page to start position]

That means we have to remove the old mapping for the URL the page has returned to, in addition to making the other locations point to the new one.

[Image: table moved page to start position]
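The handler behind these moves is not shown in this post. As a rough sketch of the bookkeeping that would produce the table states described above, assume a plain dictionary stands in for the table; MoveTracker and RecordMove are hypothetical names, not part of the module.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical bookkeeping for page moves; not the module's actual code.
public static class MoveTracker
{
    // Key: old URL. Value: URL where the page lives now.
    public static readonly Dictionary<string, string> Redirects =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

    public static void RecordMove(string oldUrl, string newUrl)
    {
        if (string.Equals(oldUrl, newUrl, StringComparison.OrdinalIgnoreCase))
        {
            return; // nothing moved
        }

        // 1. The page now lives at newUrl, so newUrl must not redirect anywhere.
        Redirects.Remove(newUrl);

        // 2. Re-point every earlier location of the page at the new URL.
        //    ToList() takes a snapshot so the dictionary can be changed afterwards.
        List<string> stale = Redirects
            .Where(pair => string.Equals(pair.Value, oldUrl, StringComparison.OrdinalIgnoreCase))
            .Select(pair => pair.Key)
            .ToList();
        foreach (string url in stale)
        {
            Redirects[url] = newUrl;
        }

        // 3. The location that was just vacated redirects as well.
        Redirects[oldUrl] = newUrl;
    }
}
```

Replaying the three moves from the walkthrough (start page to News, News to Events, Events back to the start page) leaves /en/News/test-page/ and /en/Events/test-page/ pointing at /en/test-page/, and no entry for /en/test-page/ itself. The sketch is not thread-safe; a real handler would need locking or a database transaction.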

Now, let’s say you’re sick of this page and its infidelity with the page structure and you decide to remove it.

[Image: moved page to waste basket]

That should teach it. Let’s see what’s in the table now.

[Image: table moved page to waste basket]

Let’s open up Firefox and open one of the URLs.

[Image: 410 gone]

Mission accomplished! But what happens if we create another test-page under the root now?

[Image: new page]

The mapping should be gone. Let’s look.

[Image: mapping gone]

It is.
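The delete and re-create steps above could be backed by bookkeeping along these lines. This is only a guess at the behaviour, since the handlers are not shown in this post: GoneTracker, RecordDelete and RecordPublish are hypothetical names, and a null value is assumed to stand for 410 Gone.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical bookkeeping for deleted and re-created pages.
public static class GoneTracker
{
    // Key: URL. Value: redirect target, or null meaning "410 Gone".
    public static readonly Dictionary<string, string> Mappings =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

    public static void RecordDelete(string deletedUrl)
    {
        // Every URL that redirected to the deleted page is gone as well.
        List<string> affected = Mappings
            .Where(pair => string.Equals(pair.Value, deletedUrl, StringComparison.OrdinalIgnoreCase))
            .Select(pair => pair.Key)
            .ToList();
        affected.Add(deletedUrl);

        foreach (string url in affected)
        {
            Mappings[url] = null;
        }
    }

    public static void RecordPublish(string url)
    {
        // A live page answers at this URL again, so drop any stale mapping.
        Mappings.Remove(url);
    }
}
```

With this approach, publishing a new page only clears the mapping for its own URL. The older locations of the deleted page keep answering 410 Gone, which may or may not be what you want.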

 

Let’s see if we can change the URL segment of the page too.

[Image: change url segment]

The table should be updated now.

[Image: table change url segment]

 

Disclaimer

This is the result of a late-night hack, so it is probably riddled with bugs. However, it is a good start towards building EPiServer sites with better support for search engines. There are a few known bugs, or things I just neglected for now due to sleep deprivation, such as:

* Weird behaviour when moving pages out of waste basket
* No support for mapping descendants of moved pages yet

Go ahead and try it out, and please give me a shout if you make any improvements or just want to toss ideas at me.

Install the module by putting it in your bin folder and adding this to the modules section of your web.config:
<add name="StatusCodeModule" type="Nansen.StatusCodes.Modules.StatusCodeModule, Nansen.StatusCodes" />
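
Which section counts as the modules section depends on how the site is hosted. As a sketch, assuming a standard ASP.NET web.config: IIS7 in Integrated Mode reads modules from system.webServer, while IIS6 and IIS7 in Classic Mode read them from system.web.

```xml
<!-- IIS7 in Integrated Mode -->
<system.webServer>
  <modules>
    <add name="StatusCodeModule" type="Nansen.StatusCodes.Modules.StatusCodeModule, Nansen.StatusCodes" />
  </modules>
</system.webServer>

<!-- IIS6, or IIS7 in Classic Mode -->
<system.web>
  <httpModules>
    <add name="StatusCodeModule" type="Nansen.StatusCodes.Modules.StatusCodeModule, Nansen.StatusCodes" />
  </httpModules>
</system.web>
```

If your web.config already contains these sections, add only the add element to the existing one.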

Comments

Sep 21, 2010 10:33 AM

Awesome!

Sep 21, 2010 10:33 AM

Combining the logic above with the 404 Handler from EPiCode on CodeResort would be a great idea. (https://www.coderesort.com/p/epicode/wiki/404Handler).

Sep 21, 2010 10:33 AM

I like the idea. I'm definitely making a module based on this. Nice work!
I see you have switched company since I saw you last as well, good luck you crazy bastard. :)

/ Daniel Ovaska

Sep 21, 2010 10:33 AM

Hey there. :) Thank you for the kind words!

There is an update at http://world.episerver.com/Blogs/Magnus-von-Wachenfeldt/Dates/2010/2/Update-to-the-Url-Mapping-Module/ with bugfixes and stuff. It's going to be used on a very big site, so I'll ask them for updates when they're done...
