How do I reset EPiServer's search index?

Member since: 2008

My SearchDataSource control is not returning pages that have been added recently.  I'm assuming that it's stopped indexing, for some reason.

I'm just a little confused about the search architecture -- EPiServer has a "CMS Indexing Service," but my review of the API and the DLLs tells me that EPiServer is using Microsoft Indexing Service (if so, I can't figure out what catalog they're using...).  Additionally, there's all sorts of Lucene integration floating around.  The whole thing is a little black-box-ish.

So, in the end, what actually maintains the search index in EPiServer?  And how do I get it to start indexing again?  Is there some way to "reset" it?

#40816 Jun 17, 2010 18:47
  • Member since: 2007

    Try restarting the EPiServer indexing service. You should easily find it under Services. If you check the VPP folders, there should be an indexing folder.

    There's a setting in web.config for the delay before a newly created page gets indexed.

    But I think restarting the service should do it. I've had to reset it several times in different environments.

    /Per

    #40819 Jun 17, 2010 21:51
  • Member since: 2008

    Per:

    Thanks, but that hasn't really solved it.  Search works great on pages we created long ago, but the newer the page, the less likely it is to be in the index.  It's like the indexer stopped indexing at some point in time.

    I'd really like some more insight on how this thing works behind the scenes.  I used Reflector to dig through the SearchDataSource control, and I see that there's a method in there that calls IndexServerSearch, which actually uses Microsoft Indexing Server to run a search on...something.  I checked and I still only have the System and Web catalogs, and I never set anything else up, so I have no idea what index it's searching.

    And, on top of all this, I have no idea how Lucene fits into all this.

    Deane

    #40842 Jun 18, 2010 18:16
  • Member since: 1999

    A quick overview:

    * The versioned VPP (files) is using the EPiServer Indexing Service which uses Lucene for the index.

    * The native VPP (files) is using Microsoft Indexing Service and is the "classic" implementation and not enabled by default. Not dependent on the EPiServer Indexing Service.

    * Searching for pages is using a custom search implementation stored as keywords in the database (see tblKeyword, tblPageKeyword). It listens for events and is not dependent on any indexing service. Implemented in EPiServer.LazyIndexer.

    We are looking into consolidating this for a future version.

    #40855 Jun 21, 2010 11:14
  • Member since: 2008

    So Microsoft Indexing Services is essentially deprecated by default?

    To sum up --

    (1) Lucene indexes binary files, and (2) a custom SQL implementation indexes pages.

    #40867 Edited, Jun 21, 2010 14:53
  • Member since: 1999

    You are correct. We still support native file systems using MS Indexing, but by default we don't use it since you don't get permanent links and versioning if you go that route.

    #40871 Jun 21, 2010 15:23
  • Member since: 2008

    Thanks, Per.  I have a ticket open with support on this, and it's gotten really odd.  In particular, since the page indexing is event-based, not service-based, it couldn't have just "stopped indexing."  Also, support has had me run SQL queries on the keywords in the database, and the results are not consistent.

    I'll report back here with the solution.  I appreciate the background info -- that helped clear up some questions for me.

    #40872 Jun 21, 2010 15:45
  • Member since: 2003

    One workaround is to delete the contents of tblKeyword and tblPageKeyword, and then run the following code:

     

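    // Requires: using System.Collections;
    // Collect the ID of every page in the site.
    ArrayList array = new ArrayList();
    IList pages = new EPiServer.DataAccess.PageListDB().ListAll();
    foreach (EPiServer.Core.PageReference page in pages)
        array.Add(page.ID);

    // Hand all the IDs to the indexing job and run it.
    IndexPageJob job = new IndexPageJob((int[])array.ToArray(typeof(int)));
    job.Execute();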

    This will start the re-indexing process. It is time-consuming, and you only need to start it once.

    #40894 Edited, Jun 22, 2010 8:09
  • Member since: 2008

    Mari:

    Thanks for this code.  I've wrapped a Scheduled Job around it, and I'm running it now.
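
    A wrapper along these lines might look like the following. This is a minimal sketch, assuming the CMS 5/6 scheduled job convention of a class marked with [ScheduledPlugIn] that exposes a static Execute() method returning a status string. The class name and display name are hypothetical, and the using directive for IndexPageJob is left out because the snippet above does not state its namespace.

```csharp
using System.Collections;
using EPiServer.PlugIn;

// Hypothetical job class; the name and DisplayName are illustrative only.
// Assumption: IndexPageJob resolves in your project exactly as it does in the
// snippet above (add the matching using directive for your assemblies).
[ScheduledPlugIn(DisplayName = "Rebuild page keyword index (sketch)")]
public class RebuildPageIndexJob
{
    // Scheduled jobs of this style expose a static Execute() method;
    // the returned string is reported as the job's status message.
    public static string Execute()
    {
        // Collect the ID of every page, as in the posted snippet.
        ArrayList ids = new ArrayList();
        IList pages = new EPiServer.DataAccess.PageListDB().ListAll();
        foreach (EPiServer.Core.PageReference page in pages)
            ids.Add(page.ID);

        // Pass the IDs to the indexing job and run it.
        IndexPageJob job = new IndexPageJob((int[])ids.ToArray(typeof(int)));
        job.Execute();

        return "Submitted " + ids.Count + " pages for re-indexing.";
    }
}
```

    Running it as a scheduled job rather than from a page keeps the long-running call out of a web request, and the job only needs to be started manually once.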

    One question, though -- Reflector tells me that IndexPageJob just passes the IDs to LazyIndexer which queues them up.  What process actually clears the queue?  Is this done in the Web process, or is it the EPiServer CMS Indexing Service?

    Deane

    #40904 Jun 22, 2010 17:14
  • Member since: 1999

    LazyIndexer has a timer that checks the queue every minute (that's the "lazy" part, which gives better performance when a lot of pages are being published). It is done in the web process.

    The IndexPageJob is used internally when the application is being shut down to make sure we don't lose unprocessed pages in the queue; that's why it looks a bit strange and just queues up pages.

    You could also call LazyIndexer.IndexPage(pageID) to force an instant re-index of a page (no queues or timers involved).
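
    To illustrate the pattern only, here is a minimal self-contained sketch of a queue drained by a timer, with a direct path that bypasses it. This is not the LazyIndexer source: every name below (LazyQueueSketch, Enqueue, IndexNow, Flush, PendingCount) is hypothetical, and the indexing work is represented by a delegate.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in for the queue-plus-timer pattern; not EPiServer code.
public class LazyQueueSketch : IDisposable
{
    private readonly Queue<int> _pending = new Queue<int>();
    private readonly object _lock = new object();
    private readonly Action<int> _indexPage;
    private readonly Timer _timer;

    public LazyQueueSketch(Action<int> indexPage, TimeSpan interval)
    {
        _indexPage = indexPage;
        // Drain the queue on a fixed interval (one minute in the description above).
        _timer = new Timer(delegate { Flush(); }, null, interval, interval);
    }

    // Cheap call made when a page is published: only remembers the ID.
    public void Enqueue(int pageId)
    {
        lock (_lock) { _pending.Enqueue(pageId); }
    }

    // Immediate path: indexes the page right away, no queue or timer involved.
    public void IndexNow(int pageId)
    {
        _indexPage(pageId);
    }

    // Indexes everything queued so far and returns how many IDs were processed.
    public int Flush()
    {
        int[] batch;
        lock (_lock)
        {
            batch = _pending.ToArray();
            _pending.Clear();
        }
        foreach (int id in batch)
            _indexPage(id);
        return batch.Length;
    }

    public int PendingCount
    {
        get { lock (_lock) { return _pending.Count; } }
    }

    // Graceful shutdown: stop the timer, then process whatever is still queued.
    public void Dispose()
    {
        _timer.Dispose();
        Flush();
    }
}

public static class Program
{
    public static void Main()
    {
        List<int> indexed = new List<int>();
        using (LazyQueueSketch queue = new LazyQueueSketch(indexed.Add, TimeSpan.FromMinutes(1)))
        {
            queue.Enqueue(10);
            queue.Enqueue(11);
            queue.IndexNow(42);                // bypasses the queue
            Console.WriteLine(indexed.Count);  // prints 1
            queue.Flush();                     // what the timer would do a minute later
            Console.WriteLine(indexed.Count);  // prints 3
        }
    }
}
```

    Note that the queue in this sketch lives only in memory, so anything still pending is processed only if Dispose (the graceful shutdown path) actually runs.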

    #40906 Jun 22, 2010 17:30
  • Member since: 2008

    Per:

    You make an interesting point there -- what happens if there are 1,000 pages in the LazyIndexer queue, and the process suddenly goes away?  I don't see any persistence layer anywhere, so it strikes me that these pages just wouldn't be indexed.

    If the app is shut down gracefully, you might be able to do something, but if it's reset suddenly, I think you end up with holes in the index. (And, I don't think there's a clean way around this, either.)

    Deane

    #40907 Jun 22, 2010 18:25