FindPagesWithCriteria exclude pages with unpublished parents

We have our own indexer that indexes all pages on our site. We fetch the pages with FindPagesWithCriteria and use multiple criteria to speed up the indexing.
We have noticed that pages with an unpublished parent are included in the result, and we don't want those pages.
Is there any way to exclude pages that have an unpublished ancestor anywhere up the page tree?

#54348
Oct 11, 2011 10:13

I don't think so, not in the query itself. You will have to post-filter them somehow.

But if you start crawling up the page tree for every page, you will probably pull most pages on the site into the cache (cache poisoning), which is not good if you run this crawl often.

One approach might be to find all unpublished pages, get their descendant pages (DataFactory.GetDescendents, which will not pull pages into the cache), and intersect this with your current result set to find the pages you should exclude.

#54349
Oct 11, 2011 10:23

Hi Magnus

Too bad there isn't a built-in way to do this.

Do you have any tips for converting the result of DataFactory.Instance.GetDescendents to a PageDataCollection?

#54354
Oct 11, 2011 12:54

Well, PageDataCollection has a constructor that takes an IEnumerable<PageData>, which you can get by projecting the collection of references, something like

// descendents contains the result from GetDescendents (Select requires "using System.Linq;")

PageDataCollection pages = new PageDataCollection(descendents.Select(link => DataFactory.Instance.GetPage(link)));

But the point is that, in this case, you shouldn't do that unless you have to, because it will pull all the pages into the cache for no reason. Instead, just compare the PageReferences from GetDescendents with the PageReferences in your collection of PageData objects (through PageData.PageLink).
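
To illustrate the reference comparison, here is a minimal sketch rather than a definitive implementation. The class UnpublishedBranchFilter and its method are hypothetical helpers, not part of EPiServer, and the sketch assumes you already have the IDs of the unpublished pages. It works on plain integer page IDs and takes delegates for the lookups, so no page has to be loaded just to be excluded.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not an EPiServer type): removes every page that is
// an unpublished page itself or sits anywhere below one in the tree.
public static class UnpublishedBranchFilter
{
    public static List<TPage> ExcludeUnpublishedBranches<TPage>(
        IEnumerable<TPage> results,                    // e.g. the FindPagesWithCriteria result
        IEnumerable<int> unpublishedIds,               // IDs of the unpublished pages (assumed known)
        Func<int, IEnumerable<int>> getDescendentIds,  // returns the IDs of all descendants of a page
        Func<TPage, int> getPageId)                    // extracts the page ID from a result item
    {
        // Collect the IDs of all unpublished pages and everything below them.
        var excluded = new HashSet<int>();
        foreach (int id in unpublishedIds)
        {
            excluded.Add(id);
            excluded.UnionWith(getDescendentIds(id));
        }

        // Keep only the results whose ID is not in the excluded set.
        return results.Where(page => !excluded.Contains(getPageId(page))).ToList();
    }
}
```

Against EPiServer, getDescendentIds could be something like id => DataFactory.Instance.GetDescendents(new PageReference(id)).Select(r => r.ID), and getPageId could be page => page.PageLink.ID. Comparing on the ID alone is an assumption made here to sidestep differences in page version between the two sets of references.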

#54355
Oct 11, 2011 13:23

Thank you for the help Magnus.

Got it working now.

#54356
Oct 11, 2011 13:49
This topic was created over six months ago and has been resolved. If you have a similar question, please create a new topic and refer to this one.