PDF files and EPiServer indexing.

Jan Saare
Member since: 2002
 

Hi

According to "Microsoft Indexing Service" (23-09-2008), the EPiServer Indexing Service takes care of indexing the versioning UFS. I can't get it to index PDF files; it probably fails to find and bind to the PDF IFilter. Microsoft Indexing Service itself works: when I place the same PDF file in a native UFS, a Microsoft Indexing Service search with a SearchDataSource finds it.

Is the EPiServer Indexing Service supposed to bind to a PDF IFilter? It does seem to index .doc and .xls files. I have recreated the index folders in the VPP folders.

Does the IFilter perhaps have to be present when EPiServer is installed?

Anyone?

#29106 Apr 06, 2009 16:18
  • Per Bjurström
    Member since: 1999
     

    The EPiServer Indexing Service should find any IFilter when indexing files. The IFilter must be available when the document is being indexed, but it does not need to be present when EPiServer is installed, since filters are discovered at runtime when documents are indexed. Which version of EPiServer CMS are you using?

     

     

    #29115 Apr 07, 2009 12:11
  • Jan Saare
    Member since: 2002
     

    Hi

    I'm using CMS 5 R2 SP1 on Windows 2003 R2, installed from scratch. I first installed the PDF IFilter from Acrobat Reader 9 (IFilter v 6.0) and have also tried v 5.0.

    The IndexingService process loads the filter at startup (checked with Filemon), but nothing gets indexed. What triggers the indexing process to start: EPiServer, or a file system event watcher? Is there a way to get logging from the EPiServer Indexing Service?

    #29119 Apr 07, 2009 14:38
  • Per Bjurström
    Member since: 1999
     

    The service queries the database every minute for files that are not yet indexed (for example, new or changed files uploaded in edit mode). If the complete index is deleted from disk, it will reindex everything from scratch.

    Stop the Windows service, then you can start the indexing service exe from a command prompt with "EPiServer.IndexingService.exe DEBUG" to see what it actually does. (That will print all log4net messages to the console).
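
    The polling behaviour described in this reply can be pictured roughly as follows. This is only a sketch of the selection rule, not EPiServer's code: `FileItem`, `items_to_index` and `poll_once` are hypothetical names, and the database and the index are replaced by a plain list and a flag.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class FileItem:
    """Hypothetical stand-in for a file record tracked in the database."""
    name: str
    indexed: bool


def items_to_index(items: Iterable[FileItem], index_on_disk: bool) -> List[FileItem]:
    """Pick the files that one polling pass should index.

    If the index is missing on disk, every file is picked again;
    otherwise only files not yet marked as indexed are picked.
    """
    if not index_on_disk:
        return list(items)
    return [item for item in items if not item.indexed]


def poll_once(items: Iterable[FileItem], index_on_disk: bool,
              index_file: Callable[[FileItem], None]) -> int:
    """Run one pass and return the number of files handed to index_file."""
    pending = items_to_index(items, index_on_disk)
    for item in pending:
        index_file(item)
        item.indexed = True  # assumption: success is recorded per file
    return len(pending)
```

    A real service would call something like `poll_once` about once a minute; the interval is the only detail taken from the reply above.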

    #29122 Apr 07, 2009 18:14
  • Jan Saare
    Member since: 2002
     

    Thanks. I uploaded the good old "Drift av EPiServer 4" PDF (244 kB), and this is what the log says:

    Scanning configuration 2 for changes
    Deleting item medlemmar 2008.pdf from index
    Deleted 0 item(s) from index
    Deleting item Drift av EpiServer 4.pdf from index
    Deleted 0 item(s) from index
    Create index document for item Drift av EpiServer 4.pdf
    Exception creating index for item Drift av EpiServer 4.pdf - Capacity exceeds maximum capacity.
    Parameter name: capacity
    Failed to create index document for item Drift av EpiServer 4.pdf
    Closing index
    Going to sleep

    The line "Deleting item medlemmar 2008.pdf from index" is repeated on every run (every minute) in all three index configurations.

    /Janne 
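
    When the DEBUG output gets long, a small script can pull out the items that failed. The sketch below is an illustration only: the two message patterns are copied from the log quoted in this post, `failed_items` is a hypothetical helper, and it assumes each message sits on a single line (output wrapped by the console window has to be rejoined first).

```python
import re
from typing import Dict, List

# Message formats as they appear in the console output quoted in this thread.
EXCEPTION_RE = re.compile(
    r"^Exception creating index for item (?P<item>.+?) - (?P<error>.+)$"
)
FAILED_RE = re.compile(r"^Failed to create index document for item (?P<item>.+)$")


def failed_items(log_lines: List[str]) -> Dict[str, str]:
    """Map each item that failed to index to the last error logged for it."""
    errors: Dict[str, str] = {}
    failures: Dict[str, str] = {}
    for raw in log_lines:
        line = raw.strip()
        match = EXCEPTION_RE.match(line)
        if match:
            # Remember the error text until the matching "Failed" line arrives.
            errors[match.group("item")] = match.group("error")
            continue
        match = FAILED_RE.match(line)
        if match:
            item = match.group("item")
            failures[item] = errors.get(item, "unknown error")
    return failures
```

    A file name that itself contains " - " would be split at the first occurrence, so treat the result as a hint rather than an exact parse.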

     

    #29124 Apr 08, 2009 8:50
  • Per Bjurström
    Member since: 1999
     

    I think this is a problem that was fixed after SP1, but we have not been able to reproduce it. Basically, we do not allocate enough memory for some IFilter chunking implementations. Try uninstalling the latest IFilter and installing an older version instead.

    If you can't get it to work, I think you need to file a support case so that we can dig deeper into this. We have had several PDF-related cases in the last month, so a reasonable guess is that the problem is version related, since this has worked in the past.
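
    To illustrate the kind of allocation problem mentioned in this reply, here is a generic sketch of reading a chunk of text without assuming that it fits in one pre-sized buffer. It is not EPiServer's code and does not use the real IFilter interface: `FakeChunk`, `get_text` and `read_chunk` are hypothetical names made up for the example.

```python
from typing import List


class FakeChunk:
    """Hypothetical stand-in for one chunk of text exposed by a filter."""

    def __init__(self, text: str) -> None:
        self._text = text
        self._pos = 0

    def get_text(self, max_chars: int) -> str:
        """Return up to max_chars of the remaining text, '' when exhausted."""
        piece = self._text[self._pos:self._pos + max_chars]
        self._pos += len(piece)
        return piece


def read_chunk(chunk: FakeChunk, buffer_size: int = 4096) -> str:
    """Read a whole chunk by asking for text until nothing is left.

    The buffer size only limits how much is fetched per call, so a chunk
    larger than the buffer is collected piece by piece instead of failing.
    """
    if buffer_size <= 0:
        raise ValueError("buffer_size must be positive")
    pieces: List[str] = []
    while True:
        piece = chunk.get_text(buffer_size)
        if not piece:
            break
        pieces.append(piece)
    return "".join(pieces)
```

    The point of the loop is that the total length is never fixed in advance, which matters if different filter versions hand back text in differently sized pieces.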

     

    #29137 Apr 08, 2009 15:53
  • Arnold Macauley
    Member since: 2005
     

    Hi Jan,

    Did you get anywhere when you tried installing a previous version of the IFilter? We are experiencing similar problems when using IFilter v 6.0.

     

    Thanks

    #32867 Sep 22, 2009 12:01