Programmatically search documents

Hi all,

Using the template site with the relevant PDF IFilter installed, I am able to search the contents of PDF files. I notice that there is a 'UrlToDocument' property on the 'Document' page type which is set to be searchable ('

#50211
Apr 18, 2011 14:38

Hi!

When you mark a property as searchable, the content of that property is indexed. In the UrlToDocument case, that means the link itself is indexed, not the content of the document it points to. So that setting shouldn't have any impact on what you are trying to do.

However, what you want to achieve shouldn't be that hard. In a standard installation, all files added to the default directories ('Documents', 'Global Files' and 'Page Files') should be indexed by the EPiServer Indexing Service. You would then use the UnifiedSearchQuery class to search for those files.

Regards

Per Gunsarfs
EPiServer Development Team

#50213
Apr 18, 2011 15:24

Hi Per,

Thank you for your response, it was very helpful. To search inside the uploaded PDF files, do I have to do anything else, or is installing the PDF IFilter enough?

I currently have the following:

string[] virtualRoots = { "/Global/", "/Documents/", "/Documents/Guides", "/PageFiles/" };

foreach (string virtualDirectoryPath in virtualRoots)
{
    // Resolve each virtual root to its unified directory.
    UnifiedDirectory dir = (UnifiedDirectory)HostingEnvironment.VirtualPathProvider.GetDirectory(virtualDirectoryPath);

    // Free-text search for the word "consumer" within that directory.
    UnifiedSearchQuery query = new UnifiedSearchQuery();
    query.FreeTextQuery = "consumer";

    UnifiedSearchHitCollection hits = dir.Search(query);
}

I have some PDF files inside the 'Documents/Guides' directory with the text 'consumer' in their content. However, a free text search as in the above code doesn't yield any results.

Any help would be greatly appreciated.

Thank you

#50236
Apr 19, 2011 15:20

Your code looks good, and as far as I know you shouldn't need to do anything other than install an IFilter to get it to work with PDF files. The IFilters are used when indexing the content of the files, and shouldn't have any effect on how you perform the search.

Regards

Per Gunsarfs
EPiServer Development Team

#50237
Apr 19, 2011 16:04

Hello Per,

do you know if there's any way to connect the UnifiedFile objects (from the UnifiedSearchHitCollection hits) to the EPiServer page that they reside in? I'm looking for a PageID or PageReference.

Regards

//Patrik

#54272
Oct 07, 2011 11:15

Hi

I haven't actually tried it, but I wonder whether something like the code below would work. That assumes, of course, that the file actually resides in a page folder.

int pageFolderId;
UnifiedDirectory.TryParsePageFolder(unifiedFile.VirtualPath, out pageFolderId);
PageReference myPage = DataFactory.Instance.ResolvePageFolder(pageFolderId);

Regards

Per Gunsarfs
EPiServer Development Team

#54282
Oct 07, 2011 15:51
This topic was created over six months ago and has been resolved. If you have a similar question, please create a new topic and refer to this one.