Our client came to us with a problem: they wanted a semantic website that modelled their business, allowed users to locate unstructured content and was efficiently integrated into a content management system.
To take a brief step back, it is worth defining the semantic web and the concept behind it, although I am sure most readers are familiar with it. The term was coined by Tim Berners-Lee to describe a way of allowing computers to understand the meaning, or semantics, of the information contained in the World Wide Web. By adding metadata to content, external agents and automated software can classify that content more intelligently and create logical groupings of information, giving users extra insight into their search that a normal keyword search might not provide.
The out-of-the-box EPiServer search allows for generic searching but, like most search solutions, it doesn’t provide enhanced search based on semantic groupings. Our client suggested using Smartlogic’s software suite, known as Semaphore, to bring semantic web concepts to the content-managed website. As EPiServer supports in-depth customisation of the CMS, it seemed that the two products could work together to provide the rich functionality we required while remaining easy for editors to use.
Semaphore itself is not a complete search solution. Rather, it is a middleware product set that provides the mechanisms to define and store taxonomical/ontological information, to classify documents automatically and to enhance search based on that information. We therefore still needed a search engine to provide our search results, so we opted for a Google Search Appliance (GSA) box to provide the rich search indexing that Google offers.
The goals of the project were as follows:
Firstly, here’s a short overview of the major components of Semaphore (you can find more details about the software on Smartlogic’s website):
No integration between Semaphore and EPiServer had been built before, so the challenge for Rufus Leonard was to integrate these services seamlessly into EPiServer. The integration had to make it easy for users to tag their content with terms from the ontology and to classify their documents automatically using the classification server. We created a fully configurable Rufus Leonard Smartlogic solution that sits as a layer between EPiServer and Semaphore and integrates the two. It uses an n-tier architecture to provide services to an EPiServer solution, allowing developers to query those services and extract the data they require. The solution provides the following layers to support the integration:
By referencing these projects and configuring them through web.config, developers can easily add the services and plug-ins required to start classifying content and returning data to display in their front ends.
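The following is a minimal sketch of configuration-driven plug-in loading. It assumes two `appSettings` keys, `Semaphore.ClassificationServerUrl` and `Semaphore.EnabledPlugins`, plus a plug-in registry. These names are hypothetical and do not come from the actual Rufus Leonard solution. The sketch is written in Python rather than the .NET the solution runs on. It only shows how services can be switched on through configuration rather than through code changes.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

# Hypothetical web.config fragment; the key names are illustrative only.
SAMPLE_CONFIG = """
<configuration>
  <appSettings>
    <add key="Semaphore.ClassificationServerUrl" value="http://classifier.example/cs" />
    <add key="Semaphore.EnabledPlugins" value="Classify,PostToGsa" />
  </appSettings>
</configuration>
"""


def read_app_settings(xml_text: str) -> Dict[str, str]:
    """Collect <appSettings><add key=... value=.../> pairs into a dict."""
    root = ET.fromstring(xml_text.strip())
    settings: Dict[str, str] = {}
    for node in root.findall("./appSettings/add"):
        settings[node.get("key", "")] = node.get("value", "")
    return settings


# Hypothetical registry: plug-in name -> factory that receives the settings.
PLUGIN_REGISTRY: Dict[str, Callable[[Dict[str, str]], str]] = {
    "Classify": lambda cfg: f"classifier -> {cfg['Semaphore.ClassificationServerUrl']}",
    "PostToGsa": lambda cfg: "gsa feed poster",
    "RelatedContent": lambda cfg: "related-content widget",
}


def load_plugins(settings: Dict[str, str]) -> List[str]:
    """Instantiate only the plug-ins listed in Semaphore.EnabledPlugins."""
    raw = settings.get("Semaphore.EnabledPlugins", "")
    names = [n.strip() for n in raw.split(",") if n.strip()]
    return [PLUGIN_REGISTRY[name](settings) for name in names]
```

Because the registry is keyed by name, adding `RelatedContent` to the comma-separated setting would enable that plug-in without touching the calling code.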
The plug-ins allow users to classify documents and post the results to the GSA box as an XML document so that the content is indexed. By injecting Dublin Core meta tags into the XML document, we can weight tags by their relevancy. For example, when we classify a document, Semaphore may rate one term as more relevant than another. We can then inject the more relevant term multiple times into the Dublin Core meta tags, so that Google interprets it as more significant and indexes the page and its associated terms accordingly. As well as posting the XML to the GSA box, it is also possible to output the Dublin Core meta tags on the front end and have the GSA box crawl the website, which achieves the same result. All of this can be configured through the GSA front end.
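The sketch below shows relevance-weighted tagging under three assumptions. Classification scores fall between 0 and 1. Scores map linearly to a repeat count, which is our assumption and not Semaphore's documented behaviour. The feed is modelled on the GSA feed XML (`gsafeed`, `header`, `group`, `record`, `metadata`, `content`). Element names are worth checking against the GSA Feeds Protocol documentation, which also requires an XML declaration and DOCTYPE that `ElementTree` does not emit. The `DC.subject` meta name follows the usual convention for Dublin Core in HTML.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def repeat_count(score: float, max_repeats: int = 3) -> int:
    """Map a relevance score in [0, 1] to how many times a term is emitted.

    Assumption: a linear mapping with a floor of one repetition.
    """
    score = min(max(score, 0.0), 1.0)
    return max(1, round(score * max_repeats))


def build_feed(url: str, body: str, term_scores: Dict[str, float],
               datasource: str = "epi_content") -> str:
    """Build a GSA-style content feed with relevance-weighted DC.subject tags."""
    feed = ET.Element("gsafeed")
    header = ET.SubElement(feed, "header")
    ET.SubElement(header, "datasource").text = datasource  # hypothetical name
    ET.SubElement(header, "feedtype").text = "incremental"
    group = ET.SubElement(feed, "group")
    record = ET.SubElement(group, "record", url=url, mimetype="text/html")
    metadata = ET.SubElement(record, "metadata")
    # Most relevant terms first; each repeated according to its score.
    for term, score in sorted(term_scores.items(), key=lambda kv: -kv[1]):
        for _ in range(repeat_count(score)):
            ET.SubElement(metadata, "meta", name="DC.subject", content=term)
    ET.SubElement(record, "content").text = body
    return ET.tostring(feed, encoding="unicode")
```

Scores of 0.9 and 0.2 produce three `DC.subject` entries for the first term and one for the second. The same `repeat_count` could drive the crawl-based alternative by rendering one `<meta name="DC.subject" content="...">` tag per repetition in the page head.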
The process for classifying documents through the custom EPiServer plug-in and posting the XML to Google has been refined to be simple and intuitive for content editors. The process is shown below:
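The classify-then-post flow behind the editor's action might be orchestrated roughly as follows. This is a minimal sketch. `classify` and `post_feed` are hypothetical stand-ins for the classification-server call and the GSA feed upload, and the 0.4 acceptance threshold is an illustrative assumption. Both calls are injected, so the flow can be exercised without either service running.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PublishResult:
    """Outcome reported back to the editor (hypothetical shape)."""
    accepted_terms: Dict[str, float]
    posted: bool
    messages: List[str] = field(default_factory=list)


def classify_and_publish(
    page_url: str,
    page_text: str,
    classify: Callable[[str], Dict[str, float]],
    post_feed: Callable[[str, Dict[str, float]], bool],
    threshold: float = 0.4,
) -> PublishResult:
    """Classify a page, keep confident terms, then post them for indexing.

    `classify` stands in for the classification server and `post_feed`
    for sending the XML feed to the GSA; both are assumptions.
    """
    scores = classify(page_text)
    # Drop low-confidence suggestions before anything is indexed.
    accepted = {t: s for t, s in scores.items() if s >= threshold}
    if not accepted:
        return PublishResult({}, False, ["No terms met the threshold; nothing posted."])
    ok = post_feed(page_url, accepted)
    msg = "Feed posted." if ok else "Feed post failed; retry later."
    return PublishResult(accepted, ok, [msg])
```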
Once the GSA indexes are created, we can call services in the service layer of the Rufus Leonard Smartlogic solution to return meaningful results, bringing back content that would not otherwise have been found in a standard search. We use these results not only when a user queries the site, but also to direct users to other content, either by suggesting related pages or by populating content areas with pages that might be of interest. Although these pages may not have been tagged with the same term, semantic search lets us point users to meaningfully related content as defined by the business’s ontology.
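Simple ontology-based query expansion shows how pages without the searched term can still be reached. This is a minimal in-memory sketch, assuming a hypothetical ontology fragment (`ONTOLOGY`) and a map from page URLs to their tags. In the solution described here, the ontology lives in Semaphore and ranking comes from the GSA, so the code illustrates only the principle.

```python
from typing import Dict, List, Set

# Hypothetical ontology fragment: each term maps to its linked terms.
ONTOLOGY: Dict[str, Set[str]] = {
    "renewable energy": {"solar power", "wind power", "energy policy"},
    "solar power": {"renewable energy", "photovoltaics"},
    "wind power": {"renewable energy"},
}


def expand(term: str, ontology: Dict[str, Set[str]], depth: int = 1) -> Set[str]:
    """Return the term plus every term reachable within `depth` ontology hops."""
    seen = {term}
    frontier = {term}
    for _ in range(depth):
        nxt: Set[str] = set()
        for t in frontier:
            nxt |= ontology.get(t, set())
        frontier = nxt - seen  # only visit terms not already collected
        seen |= frontier
    return seen


def related_pages(term: str, page_tags: Dict[str, Set[str]],
                  ontology: Dict[str, Set[str]], depth: int = 1) -> List[str]:
    """Rank pages by how many of their tags fall inside the expanded term set."""
    wanted = expand(term, ontology, depth)
    scored = [(len(tags & wanted), url) for url, tags in page_tags.items()]
    # Higher overlap first; URL as a stable tie-breaker; drop pages with no overlap.
    return [url for hits, url in sorted(scored, key=lambda x: (-x[0], x[1])) if hits]
```

Consider pages tagged `{"solar power"}` (`/solar`), `{"finance"}` (`/finance`) and `{"wind power", "energy policy"}` (`/wind`). A search for `renewable energy` returns `["/wind", "/solar"]`, even though neither page carries the searched term itself.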
Currently the solution is compatible only with EPiServer 6, although it can be retrofitted to earlier versions if required. The working solution took several months to implement, and much of that time went into building the underlying infrastructure with little to demo. It was therefore great when everything came together and we could see a fully functional example of the semantic web in action.
If you would like to hear more about the Rufus Leonard Semaphore integration project, or talk to us about EPiServer in general, please contact us: http://www.rufusleonard.com/london/