Views: 11531
Number of votes: 6
Average rating:

SQL replication and EPiServer CMS

I’ve been spending some time setting up an EPiServer CMS site using SQL replication and will in this post go through some basics, but lets start with the reasons for choosing replication in your architecture. The two obvious reasons are security and scalability. Security because you can have a master database that has zero surface to the internet, it just pushes changes to the replica. Scalability because you have have the visitors and editors on different database servers, you can also have multiple database servers for visitors. The main reason for staying away from replication is complexity, do understand that this setup will require a lot more configuration and maintenance than a “normal” setup, but in some cases the benefits are worth it.

This configuration was done using EPiServer CMS 5 R2 SP2, Microsoft Windows Server 2008, Microsoft SQL Server 2008.

Note: An alternative to SQL replication is EPiServer CMS Mirroring that runs on a higher abstraction level by publishing pages using the API, which means you have more control over what gets mirrored in the application but less control over dependencies such as custom tables (without writing code).

The installation

I will not describe in detail how you install and configure SQL replication since that would probably constitute a 100 page book, I am focusing on the configuration related to CMS. The initial approach is to install two copies of your site on two different servers using two different database servers with the replication feature installed. You now have two identical copies of the site. You also need to figure out how to share uploaded files by for example have a shared disk/SAN or use file synchronization.

Preparing the database

There are some changes needed on the schema for “object store”, a technology that is being used by for example XForms, permanent links and versioned files.

Make these changes:

  • tblIndex* tables needs primary key on fkSchemaItemId+fkItemId
  • tblRelation needs primary key on FromId,ToId

The publication

The main area of configuration when you setup replication will be the publication on the source database which describes what objects in the database that should be published.

image

I’ve included everything but..

These tables are used for syncing group membership for logged in users so I’ve excluded them:

  • tblWindowsUser
  • tblWindowsGroup
  • tblWindowsRoles

tblMasterSlave is used to store master/slave licenses so it should be excluded unless you plan to have the master and slave on different sides of the replication (which I honestly don’t know if its supported anyway).

Scheduled jobs will not work correctly when replicated so I’ve excluded them (you either don’t have any scheduled jobs on the replica or have a separate configuration):

  • tblScheduledItem
  • tblScheduledItemLog

These tables can normally be excluded, keep them if you are uncertain:

  • tblPageSoftlink (links to other pages in the editor for example)
  • tblPlugIn (if you don't have plug-in settings that must be propagated)
  • tblTask (tasks are normally only used by editors)

If you use replication your destinations are mostly read-only which means that replicating versions make no sense unless you call an API that require the existence of versions. For this to work in EPiServer CMS 5 R2 SP2 you need a database hot fix available in the downloads sections since two stored procedures has that dependency. I suggest that keep these tables in your initial testing and then remove them when you have everything working:

  • tblWorkCategory
  • tblWorkPage
  • tblWorkProperty

Also exclude these since they are “state”, this also brings up another question regarding the membership provider. You cannot have the master for the membership provider on the source database since the replicas will need to update these tables:

  • tblTempKeyword (used as a temp table while indexing keywords)
  • Aspnet_* (everything asp.net related)
  • CompletedScope (Workflow)
  • InstanceState (Workflow)

I also decided to remove these stored procedures from the destination and from the publication just to make sure there was no way to publish any content, this is of course optional as well and only a security measure.

  • editDelete*
  • editSave*
  • editPublish*
  • netPageTypeCopy
  • netPageTypeDelete*
  • netPageTypeSave
  • netPageDefinitionConvert*
  • netPageDefinitionDelete
  • netPageDefinitionMove*
  • netPageDefinitionSave
  • netPageDefinitionTypeDelete
  • netPageDefinitionTypeSave

 

Cache expiration

EPiServer has an event management system that is more known as “Remote Events”. This system provides a mechanism for distributing events within an EPiServer CMS site, between EPiServer CMS sites on the same physical server, and between EPiServer CMS 5 sites on separate servers.

This communication is mainly used to expire items in the cache but some other feature take advantage of this platform, for example EPiServer CMS 6 which support stoppable scheduled jobs which use this channel to tell all servers to terminate the job if they are running it.

The technology behind the actual communication channel is WCF and our implementation supports both UPD broadcasts and TCP/IP direct channels. If you want to get into the details have a look at the document Event Management System Specification.

image

The problem here is that SQL replication is not instant, under normal operations it takes a few seconds for the changes in the database to appear on the remote server but the events are by design instant. The solution to this problem is to have events written to the database instead and have them replicated over to the target and then have the target site site poll that table for new events.

 

image

I’ve written a plug-in for the event system that does just that, see the next section on how to install it.

Installation of “Database Events”

The actual binaries can be found in the download sections.

  1. Run the Database.sql script to install the required table tblRemoteEvents (make sure you include it in the publication)
  2. Install the DLL's in the bin-directory on both sites:
        EPiServer.Events.Database.dll
        StructureMap.dll
  3. Add this block to the <modules>-section (or httpModules if you are on IIS6) on all sites involved to trigger the initialization:
        <add name="DatabaseEventsModule" type="EPiServer.Events.Database.RemoteEventsDatabaseModule, EPiServer.Events.Database" preCondition="managedHandler"></add>
  4. Set enableRemoteEvents + enableEvents to true in site settings to enable the remote events system
  5. Important: Make sure there are no end points in web.config defined for service EventReplication, it is possible to combine database events with the WCF events but a site cannot receive events from both sources. It is very important that you verify this because otherwise the same event will be delivered twice causing “unexpected behaviors”.
  6. If you have a single site on the source database you can disable the subscriber to events since there is nothing to read anyway:
  7. <appSettings>
        <add key="EPiServer.Events.Database.DisableSubscriber" value="true"/>
    </appSettings>
  8. On the sites sitting on the replica you should disable the publisher since they are only read-only and you cannot write to the events table. If you have a solution that require the sites on the replica to distribute events between each other you could use the standard WCF approach to enable those events as well to be distributed between the sites as long as you don’t connect a site sitting on the source database (I can’t think of a situation were you might need this right now but it should be possible if needed).
  9. <appSettings> 
        <add key="EPiServer.Events.Database.DisablePublisher" value="true"/>
    </appSettings>
  10. Test the functionality by publishing some changes to a page and see it appear on the other site within 20 seconds.

Troubleshooting

As always you enable logging in EPiServer to track problems, there are logging in this plug-in as well. All events are written to tblRemoteEvents, you should see rows in that table as events fire away.

Testing

I’ve done some testing on the setup by stressing the system using the following measures (at the same time of course):

  • Content Generator generating 50-100 pages per second
  • Applied load using VS Load Test on the web site for the target database, averaged on 40% CPU.
  • Manually created pages, deleted pages, emptied wastebasket etc
  • Verified that cache expirations worked (new pages appeared within 20 seconds etc)

Overall the performance of replication was good, I did not see more than a few seconds delay before content reached the target. Since replication sets some restrictions on what the target system can do with the database you need to test the functionality that you will be using to make sure it works okay (you can’t have inserts on the target on replicated tables).

image 

 

Download

Disclaimer

SQL replication is not officially supported by EPiServer which basically means that we do not make any tests to make sure every possible scenario works from version to version, this is something you need to do yourself and it may limit you to always stay on the latest version. We will fix any bugs or problems related to replication if possible to make sure it is an option if want to tweak your setup. We will change and add new features to the database that you need to handle when upgrading (determine what should be replicated and what should not). If you make any changes to our schema you need to make sure you remove those before upgrading to a new version since our scripts only knows about how to upgrade our version of the schema, not yours.

Feedback

Please, let me if you got any feedback or I left out something important. If you think that replication is very important scenario you should say so in the comments, it may help convincing the product owners ;-)

    (By Guest , 21 September 2010 10:32, Permanent link)

    Great work Per! I'm really looking forward to try it out.
    / Tom Stenius

    (By Nicklas Israelsson , 21 September 2010 10:32, Permanent link)

    Thanks for sharing, Per! In developer support we've gotten the question regarding database replication a few times and unfortunately we always had to say that people are on their own when configuring this. That's still the case but at least now we'll have some documentation that verifies that it's possible. Nice work!

    (By Steve Celius , 21 September 2010 10:32, Permanent link)

    Nice job as always Per!
    This is an important feature. I have had three cases just the last month where replication has been an option, but as it is not officially supported and not easy to set up and maintain, I cannot recommend it.

    I have great hope for the new mirroring implementation though, and if it works as well as it looks on paper, I think that mirroring should be preferred before SQL Server replication.

    (By Guest , 21 September 2010 10:32, Permanent link)


    Hi

    We have implemented this solution on a test environment with an editorial server pointing to DB1 and 2 load balanced web servers pointing to DB2. The two databases are kept in sync with SQL replication. This solution has worked well so far. However we have implemented this on our live system and the tblRemoteEvents seems to replicate before content in some instances and thus the content miss the trigger.

    The epi server log reports

    2010-05-26 13:44:06.1632010-05-26 14:48:44,214 [17] EPiServer.Events.Remote.RemoteEventsManager.ReceiveEvent(:0) - WARNING:POSSIBLE DENIAL OF SERVICE ATTACK!! Verification data did not match for the following: raiserId:d8e1eca9-3364-4306-ae7e-284e06aaa382 siteId:unknown verificationData:{246:x2}{102:x2}{8:x2}{95:x2}{167:x2}{16:x2}{148:x2}{12:x2}{194:x2}{187:x2}{195:x2}{131:x2}{95:x2}{230:x2}{112:x2}{192:x2}{240:x2}{210:x2}{157:x2}{176:x2}{124:x2}{44:x2}{247:x2}{239:x2}{89:x2}{26:x2}{4:x2}{137:x2}{90:x2}{88:x2}{129:x2}{58:x2} eventId:4e755664-8fd9-4906-88ca-476842076f98 param:36308621-3947-43fe-8267-7d324199044a

    Has any one experienced this issue? were still investigating

    Thanks

    Rob
    / Rob

    (By Per Bjurström , 21 September 2010 10:32, Permanent link)

    The error suggest that data could not be verified. Are all the servers using the same siteId? I see that its defined as the default "unknown" in this instance.

  Please login to post a comment