Text in PDF body indexed but not found when searching

I have the following problem when searching for a term which only occurs in the text of a PDF file.

The text in PDF files is indexed, i.e. "SearchAttachmentText$$string" is filled with the text of the PDF. When I search from the Find > Overview interface for a term which only occurs in SearchAttachmentText, the item containing the term is found.

When I search for the same term from the Find > Configure > Boosting interface, the item is not found. It is also not found when searching from the website front-end.

If I search for the item based on words in the title, the item is found, and the term I initially searched for is visible in the excerpt.

Can anybody indicate what might be the cause of this problem?

#188436 Feb 23, 2018 11:02
  • Erik Lagerholm
    Member since: 2016
     

    Are you using the typed search API? In that case you might need a call to InAllField() to allow for hits in documents. My understanding is quite limited, but I think the reason is that the index puts arbitrary text fields in something called the All field. https://world.episerver.com/documentation/Items/Developers-Guide/EPiServer-Find/8/DotNET-Client-API/Searching/Free-text-search/

    #188489 Edited, Feb 26, 2018 7:29
  •  

    Hello Erik,

    Thanks for your reply. No, I'm not using the typed search API in this case. The strange thing is that searching for PDFs in my local development environment with a demo account works. Comparing the indexed result for my test document across both environments showed no significant differences, and in both environments "SearchAttachmentText$$string" is filled.

    What is also odd is that in the Test and Acceptance environments "SearchAttachmentText$$string" is not filled; installing the Adobe PDF IFilter doesn't really seem to work there.

    #188571 Feb 27, 2018 9:35
  •  

    Eventually this was solved. I (re-)installed the Adobe PDF IFilter and restarted the server. The "SearchAttachmentText$$string" field disappeared from the index and "SearchAttachment$$attachment" appeared, containing unreadable data. Nevertheless, this resulted in correct search results.

    #188943 Mar 07, 2018 13:07