
How to create a robots.txt handler for a multi-site EPiServer project

My team and I had the opportunity to start working on a new multi-site project using EPiServer 11.20.0 and DXP, and it's been a very exciting journey so far. We are planning to release 4 websites over the next few months, each with different templates, behaviors and routes.

While we are busy defining the different templates that each website will use, we also noticed that some components could be created once and be available for each website without a need to "override" their default behavior. I am talking about the following components:

  • Robots.txt file
  • Sitemap (xml) file

While the sitemap component is also worth an article on its own, today I want to explore the idea of a single entrypoint per website to generate a robots.txt in EPiServer.

It might sound controversial, but I do love being a lazy developer, and I love working with such developers. They don't write that much code. They might spend time looking for plugins or examples online that have been battle-tested, but in the end it's a methodology that pays dividends over time.

So I started looking for plugins online. But we need to have rules. Without rules there's chaos.

So here's my set of rules:

  • The plugin / package must not be a beta.
  • The plugin / package must be up to date.
  • The plugin can be open-source but it has to come from a trusted source and must be maintained properly.
    There's a possibility to add it from a package manager or to copy the parts that we need.
  • If the plugin is not open-source then there must be a contract involving maintenance.
    It's often better to pay a license to make sure there's support instead of adding some dlls with little transparency. We do not like ticking bombs.
  • For this scenario, the plugin / package must be working in a multi-site environment.

Unfortunately, I couldn't find what I was looking for 😭 so I began to think of a way to put the pieces together. 

Before starting our coding journey, let's list our requirements:

  • We want a unique "/robots.txt" endpoint available for each website we are hosting, returning a different result based on the website we are visiting.
  • We want to be able to edit the content of our robots.txt file inside the CMS - and in one place only (per website).
  • We want to write as little code as possible.

The first step was to enable MVC attribute routing in our EPiServer project:

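using System.Web.Mvc;
using System.Web.Routing;
using EPiServer.Framework;
using EPiServer.Framework.Initialization;

[InitializableModule]
[ModuleDependency(typeof(EPiServer.Web.InitializationModule))]
public class AttributeRoutingInitializationModule : IInitializableModule
{
    public void Initialize(InitializationEngine context)
    {
        // Register every action decorated with [Route] in the MVC route table.
        var routes = RouteTable.Routes;

        routes.MapMvcAttributeRoutes();
    }

    public void Uninitialize(InitializationEngine context)
    {
        // Nothing to clean up.
    }
}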

This code will allow us to bind an MVC action to a specific route. In our scenario, I want the "/robots.txt" route to become an endpoint for my project.

The second step was to set up an interface that exposes the content of our robots.txt file:

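using System.ComponentModel.DataAnnotations;
using EPiServer.Core;
using EPiServer.Web;

public interface ISitePage
{
    string RobotsTxtContent { get; set; }
}

// The content type registration attribute is not shown in this snippet.
public class MyFreshWebsiteSitePage : PageData, ISitePage
{
    // Rendered as a multi-line textarea in edit mode.
    [UIHint(UIHint.Textarea)]
    public virtual string RobotsTxtContent { get; set; }
}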

Each of my websites will have a site page that is the first ancestor of all its content. It looks like the right place to store a property that is unique for each website. I also want to edit this property inside a textarea, which is what the [UIHint(UIHint.Textarea)] attribute is for.

We are almost there. The final step is to set up our MVC endpoint:

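using System.Text;
using System.Web.Mvc;
using EPiServer;
using EPiServer.Core;

// The controller name is illustrative; IContentLoader is supplied through constructor dependency injection.
public class RobotsTxtController : Controller
{
    private readonly IContentLoader _contentLoader;

    public RobotsTxtController(IContentLoader contentLoader)
    {
        _contentLoader = contentLoader;
    }

    [Route("robots.txt")]
    [HttpGet]
    public ActionResult RenderRobotsTxt()
    {
        // Get<T> requires T to implement IContentData, so load the start page as IContent
        // and cast it; the cast yields null when the start page is not an ISitePage.
        var site = _contentLoader.Get<IContent>(ContentReference.StartPage) as ISitePage;
        if (site == null)
            return new HttpNotFoundResult();

        var robotsContent = site.RobotsTxtContent;

        if (string.IsNullOrWhiteSpace(robotsContent))
            return new HttpNotFoundResult();

        return Content(robotsContent, "text/plain", Encoding.UTF8);
    }
}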

Before we wrap up and go home, let's analyse the code we have here. For the sake of my example I made one assumption in order to find the page containing the robots.txt content:

  • The site page of each website is its start page. It's easy to find it using ContentReference.StartPage, which resolves to the start page of the site being visited, and we know that this page contains the property we are looking for.

It is recommended to allow only a specific group of people to edit this property as it is a very sensitive one. We do not want Google to decide that one of our websites should no longer be indexed. I would also recommend adding a "robots.txt" validator to make sure the syntax is correct.
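
As a rough idea of what such a validator could look like, here is a minimal sketch in plain C#. The RobotsTxtValidator name and the list of accepted directives are assumptions made for this example; nothing here is part of EPiServer, and the check is deliberately much simpler than a full robots.txt parser.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not an EPiServer API): a small line-by-line syntax check.
public static class RobotsTxtValidator
{
    // Directives accepted by this sketch; adjust the set to the crawlers you care about.
    private static readonly HashSet<string> KnownDirectives =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "User-agent", "Disallow", "Allow", "Sitemap", "Crawl-delay"
        };

    // Returns one message per problematic line; an empty list means nothing was flagged.
    public static IList<string> Validate(string content)
    {
        var errors = new List<string>();
        var seenUserAgent = false;
        var lines = (content ?? string.Empty).Split('\n');

        for (var i = 0; i < lines.Length; i++)
        {
            // Strip trailing comments and surrounding whitespace (including '\r').
            var line = lines[i];
            var hash = line.IndexOf('#');
            if (hash >= 0) line = line.Substring(0, hash);
            line = line.Trim();
            if (line.Length == 0) continue;

            var colon = line.IndexOf(':');
            if (colon <= 0)
            {
                errors.Add("Line " + (i + 1) + ": expected \"Directive: value\".");
                continue;
            }

            var directive = line.Substring(0, colon).Trim();
            if (!KnownDirectives.Contains(directive))
            {
                errors.Add("Line " + (i + 1) + ": unknown directive \"" + directive + "\".");
                continue;
            }

            var isUserAgent = directive.Equals("User-agent", StringComparison.OrdinalIgnoreCase);
            var isRule = directive.Equals("Allow", StringComparison.OrdinalIgnoreCase)
                         || directive.Equals("Disallow", StringComparison.OrdinalIgnoreCase);

            if (isUserAgent)
                seenUserAgent = true;
            else if (isRule && !seenUserAgent)
                errors.Add("Line " + (i + 1) + ": rule appears before any User-agent line.");
        }

        return errors;
    }
}
```

A helper like this could be called from whatever validation hook the project already uses, so that an editor gets the messages before the page is published.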

Update: I replaced the GetRoot and Url.GetLeftPart calls with a call to the cache using ContentReference.StartPage as the reference. Credits to Stefan Holm Olsen.

From there we can locate the site page, find the content of our robots.txt and send it as plain text! Hurrayyyyyy 🥳🥳

Please feel free to provide feedback & comments in the comment section & don't forget to like the article if it was useful 🤩

Ps: While this solution will work for our project, we are currently considering the idea of having a more generic solution inside a NuGet package where we would "resolve" the path to the Robots.txt content property inside the CMS.

It could be some configuration in the code, a [RobotsTxtContent] attribute to decorate the property, etc. If you have a clever implementation in mind we would love to hear more about it in the comment section 😊
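
To make the attribute idea a bit more concrete, here is a minimal sketch of how the lookup could work with plain reflection. Everything in it is hypothetical: RobotsTxtContentAttribute, RobotsTxtContentResolver and DemoSitePage are names invented for this example, and a real package would probably want to cache the reflected property per type.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical marker attribute: decorate the string property holding the robots.txt text.
[AttributeUsage(AttributeTargets.Property, AllowMultiple = false)]
public sealed class RobotsTxtContentAttribute : Attribute { }

public static class RobotsTxtContentResolver
{
    // Returns the value of the first readable string property marked with
    // [RobotsTxtContent], or null when the object has no such property.
    public static string Resolve(object page)
    {
        if (page == null) return null;

        var property = page.GetType()
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .FirstOrDefault(p => p.PropertyType == typeof(string)
                                 && p.CanRead
                                 // Attribute.IsDefined walks the inheritance chain for properties,
                                 // which matters if the runtime type overrides the property.
                                 && Attribute.IsDefined(p, typeof(RobotsTxtContentAttribute), true));

        return property == null ? null : property.GetValue(page) as string;
    }
}

// Illustration only: a plain class standing in for a CMS page type.
public class DemoSitePage
{
    public string Title { get; set; }

    [RobotsTxtContent]
    public virtual string RobotsTxtContent { get; set; }
}
```

With something like this, the controller would no longer need to know about ISitePage: it would load the start page and pass it to RobotsTxtContentResolver.Resolve.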

Oct 17, 2020

David Knipe
( By David Knipe, 10/17/2020 6:02:28 PM)

Hey thanks for sharing the post! I'm curious to hear if you researched the POSSIBLE robots handler as part of the journey and if so what part of your criteria it didn't fit?

https://github.com/markeverard/POSSIBLE.RobotsTxtHandler 

https://nuget.episerver.com/package/?id=POSSIBLE.RobotsTxtHandler 

Stefan Holm Olsen
( By Stefan Holm Olsen, 10/17/2020 6:28:26 PM)

Good idea putting the robots.txt content inside the CMS.

To simplify things, and since you are probably going to store the text on the start page, you could try replacing:

var site = _contentLoader.GetRootSite(content) as ISitePage;

with this

var site = _contentLoader.Get<ISitePage>(ContentReference.StartPage);

Then Episerver will give you the relevant start page instance, based on the requested domain.

Giuliano Dore
( By Giuliano Dore, 10/17/2020 7:17:25 PM)

Hey David, POSSIBLE looked like an excellent option but I couldn't pick it: I wanted the content inside the CMS (instead of a custom section) and I also noticed that the latest commit was ~3 years ago, which is a bit of a deal breaker for me.

Hi Stefan, thank you so much for this suggestion ! I am updating the code and the article as we speak 😄

valdis
( By valdis, 10/18/2020 5:03:40 PM)

Mark just makes mature enough packages that do not need any commits after 1.0 release ;))

Mark Everard
( By Mark Everard, 10/20/2020 1:37:32 PM)

Thanks @Valdis :)

The key is to pick a simple domain and then you can reach maturity quickly. Don't pick anything complex like Localization. Painful.

Seriously though, @Giuliano raises an interesting point when it comes to choosing an open source solution. How do you identify something that's reached maturity and suits most use cases well, compared to something that's unmaintained? Last commit doesn't really help then.

I suspect though, that there'll need to be some work on the package to make it work in the Delivery Core world.

valdis
( By valdis, 10/20/2020 1:42:09 PM)

Rotate version every month and package will look "fresh" :) But yeah - choosing package mature vs abandoned one is tough.

Regarding Delivery Core - this is true most probably for every package author out there..

David Knipe
( By David Knipe, 10/20/2020 2:24:21 PM)

Well you could say that the POSSIBLE Robots handler has been around a long time as it started life as EpiRobots which was built in 2011. Mark just did the hard work and continued to maintain it after the original author stopped (ahem)... 

https://archive.codeplex.com/?p=epirobots 

https://github.com/davidknipe/epirobots 

Once delivery core comes there will be a load of new work to maintain packages but that's where Valdis is way ahead of the curve on that one :)!

Mark Everard
( By Mark Everard, 10/20/2020 2:28:42 PM)

Now I'm really getting found out :)

Quickest way to reach package maturity.

  1. Pick a simple domain
  2. Take over an existing package that pretty much did everything anyway

"Always give your tasks to a lazy man. They'll be sure to find the easiest way."

valdis
( By valdis, 10/20/2020 4:25:28 PM)

Mark just made it POSSIBLE! :)

valdis
( By valdis, 10/20/2020 4:26:06 PM)

But Giuliano, sorry for spamming your post comments :)

Mark Everard
( By Mark Everard, 10/20/2020 4:42:01 PM)

Me too! But that's part of being in a community right. Welcome Giuliano!

Giuliano Dore
( By Giuliano Dore, 10/20/2020 5:37:43 PM)

 That thread was a lot of fun. Thanks everyone 😁 What is Delivery Core ?

Mark Everard
( By Mark Everard, 10/21/2020 12:24:34 PM)

Delivery Core is Episerver's rearchitecture / next-gen tech stack allowing us to deliver CMS sites using .net core, and so opening up hosting on Linux and all of the other performance improvements in .net core.

It's a clever and quite significant rearchitecture as it decouples presentation and templates from the CMS management site, meaning you would have two applications to deliver an Episerver site.

  1. The Delivery site, which hosts the web front end / templates (running on .net core)
  2. The management site which is the Episerver editor etc and will still run on .Net framework

More is explained in the link below. This decoupled approach is agnostic of your preferred tech stack and so supports the move towards using other presentation frameworks such as React / Blazor etc. A PHP frontend for an Episerver site? Why not?

https://world.episerver.com/blogs/martin-ottosen/dates/2019/12/asp-net-core-beta-program/

AB
( By AB, 10/22/2020 9:14:40 AM)

Great post Giuliano.
