Adding a secondary node - entry relation takes a long time for some categories

We have some categories with tens of thousands of products in them for organizational purposes. Adding products to one of the categories (as a non-primary relation) can take a long time (> 1 min) when using the Commerce Shell.

I see that the RelationStore REST API in the Commerce Shell package seems to be creating writeable clones of all the relations for a parent before modifying any relation for the parent. Perhaps that is an expensive operation to do with tens of thousands of relations?

If this is correct, is there a maximum number or recommended limit of products/variations (Catalog Entries) that should belong to a category (Node)? 

If that's not a correct assessment of the catalog REST API's behavior, is there some other step I can take to troubleshoot?

#186851
Jan 05, 2018 21:27

I should note: We're on EPi Commerce 11.5.0

#186852
Jan 05, 2018 22:16

I remember seeing a recommendation to structure 100*100*100 if you have 1M products, but I think I remember that from an old Commerce cert exam :D

#186853
Jan 05, 2018 22:20

Thanks, Joel. I didn't remember that, so maybe it's time to get recertified. =)

Belonging to one of the large categories is an important part of our business rules, including how discounts are applied. Perhaps we could restructure the large categories by subcategorizing products by their name's first letter. But I think our stakeholders would like to keep them as-is. And, even with subcategorization, there could still be some massive subcategories.

#186854
Jan 05, 2018 22:58

They would still belong to your 4 main categories, but you'd need subcategories beneath them. So you can keep that part of the business rules; you might have to change some technical relation rules, e.g. saying that the ancestor category is the one being displayed.

What I'm trying to say is that 100*100*100 doesn't necessarily mean you need to have 100 top-level categories; rather, it means that you should spread out your products across as many categories as you can. So going 4 * 25 * 100 * 100 would be one way, I suppose.
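
As a quick sanity check of that arithmetic, here is a minimal sketch. The level sizes are only the ones quoted in this thread, not an official limit, and `leaf_capacity` is a hypothetical helper name.

```python
from math import prod

def leaf_capacity(fanout_per_level):
    """Number of entries a tree can hold when each level has the given fan-out."""
    return prod(fanout_per_level)

# 100 top-level categories, 100 subcategories each, 100 products in each leaf.
three_level = leaf_capacity([100, 100, 100])

# 4 top-level categories kept for business rules, then spread out underneath.
four_level = leaf_capacity([4, 25, 100, 100])

print(three_level, four_level)
```

Both layouts hold the same number of products while keeping any single node at 100 direct children or fewer.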

#186855
Edited, Jan 05, 2018 23:05

You don't see the same slowness when adding other relations, e.g. to smaller categories? If you do, I suggest checking/rebuilding your indexes. If you don't, this could be an interesting point to investigate for us. Do you have the possibility to capture some profiling data for this operation to get some more info about where all this time is spent? SQL trace could also be interesting, if you can find that this operation causes massive SQL updates. Ideally it should update only the new relation, but we have had bugs before where other relations are affected, causing more updates than necessary.

#186856
Jan 05, 2018 23:07

If you are unable to enable SQL Profiler, at least check the table [NodeEntryRelation] in the commerce database for any relations with a recent Modified datetime. More specifically, see if your insert has triggered an update of a large set of relations (potentially updating the SortOrder of everything in the same node).

#186914
Jan 08, 2018 16:36

@Erik Norberg - Thanks for the suggestion. I don't see the NodeEntryRelation table modified for more than the specific node-entry relation that was intended to be assigned. Therefore, I think my conjecture that relations were being spuriously updated is incorrect.

@Magnus Rahl - I do have some sql profiler traces, but I was having a hard time sorting out updates to the secondary node relations. I'll try rebuilding indexes again and see if the performance improves. If not, then I'll try to isolate the sql for the relation update call and report back.

The other categories are slow, but the update relation request duration appears to increase for nodes that have more relations to begin with.

#186921
Jan 08, 2018 20:21

Keep an eye out for calls to:

ecf_catalog_getchildrenentries

We had an issue recently where it got called an extraordinary number of times. The response time for each call wasn't too bad (<100ms), but since it was called so many times it accounted for 80-90% of the database server's CPU load.

We will work on optimizing our code but in the end I can't rule out that we might have to add a new index to improve performance of that stored procedure.

#187012
Jan 10, 2018 10:11

A grey area optimization 

http://vimvq1987.com/2017/09/episerver-commerce-catalog-performance-optimization-part-4/

It's not officially supported, so use it at your own risk :)

#187016
Jan 10, 2018 11:43

Ah, thanks for reminding me Quan, have read it before but didn't recall it was the same sp.

Will work on why we are calling it in the first place then. :)

#187019
Jan 10, 2018 12:46

Sorry for the late response.

I think the slow response when adding a product to a node is due to catalog update events being raised for relations that don't change. Update events are generated for all entries that have an existing relation to the node involved in the newly added relation.

My Findings

Here's a screenshot of a timeline trace from dotTrace covering the relation update. (I've filtered the view to highlight code executed during HTTP requests.)

https://www.screencast.com/t/EmcS815L3PU *

In the trace are requests to:

1) GET the svg asset for drawing animation,
2) POST to update the relation store (the actual update work)
3) GET the content structure for the node involved in the relation update, and
4) some GET requests, after the addition of the new relation is complete, for assets and more catalog structure.

The call to update the relation takes 33 seconds to complete. This category only has 531 entries in the NodeEntryRelation table. This seems like a long time to update all these relations. (I'm reindexing once more now to see if getting the content is slow and out-of-date indexes are the problem)

Here's the payload for the POST to update the relation:

{"id":"1073742035_73362_1","contentTypeIdentifier":"bbscatalyst.pagetypes.commerce.eolnodecontent","groupName":null,"name":"Valentine's Day","path":"Catalog Root\\EOLCatalog\\Designs\\Valentine's Day","code":"Valentines-Day_1","quantity":1.0,"sortOrder":800,"source":"73362_2072519_CatalogContent","target":"1073742035__CatalogContent","type":1,"typeString":"Category","typeIdentifier":"episerver.commerce.catalog.linking.relation","requestMode":0}

Here are the relations I logged when adding the product to a different node. 

What is retrieved from RelationRepository.GetChildren<Relation>(parent):

1073741979<-97157, ro=False, so=0
1073741979<-113813, ro=False, so=0
1073741979<-113814, ro=False, so=0
1073741979<-113816, ro=False, so=0
1073741979<-113817, ro=False, so=0
....
1073741979<-147745, ro=False, so=300
1073741979<-106461, ro=False, so=400
1073741979<-147049, ro=False, so=800
1073741979<-30300, ro=False, so=900

What is passed to RelationRepository.UpdateRelations(list):

1073741979<-97157, ro=False, so=0
1073741979<-113813, ro=False, so=0
1073741979<-113814, ro=False, so=0
1073741979<-113816, ro=False, so=0
1073741979<-113817, ro=False, so=0
...
1073741979<-147745, ro=False, so=300
1073741979<-106461, ro=False, so=400
1073741979<-147049, ro=False, so=800
1073741979<-30300, ro=False, so=900
1073741979<-73362_2072519, ro=False, so=1000   // New relation with workid

ro is read-only; so is sort order. The node id is on the left of the "<-", the entry id on the right.

Diffing the list of node-entry-relations retrieved and the list to be updated, the only relation that's changed is the one added by the POST. It seems to me like Episerver.Commerce.Shell.Rest.RelationStore is retrieving all the relations of the Node and submitting them to be checked for updates.
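
The diff described above can be sketched as follows. This is a minimal, self-contained illustration, not the RelationStore or CatalogRelationManager implementation: the `Relation` tuple and `changed_relations` function are hypothetical names, and the sample rows are a subset of the values logged above.

```python
from typing import NamedTuple

class Relation(NamedTuple):
    node_id: int
    entry_id: str      # kept as text because the new relation carries a work id suffix
    sort_order: int

def changed_relations(before, after):
    """Return the relations in `after` that are new or differ from `before`."""
    existing = {(r.node_id, r.entry_id): r for r in before}
    return [r for r in after if existing.get((r.node_id, r.entry_id)) != r]

# What was retrieved for the parent node (subset of the logged rows).
before = [
    Relation(1073741979, "97157", 0),
    Relation(1073741979, "147745", 300),
    Relation(1073741979, "30300", 900),
]
# What was submitted for update: the same rows plus the new relation.
after = before + [Relation(1073741979, "73362_2072519", 1000)]

to_save = changed_relations(before, after)
print(to_save)
```

Only the relation returned by the diff would need to be persisted or announced in an event; the untouched rows could be skipped.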

Here's a screenshot of the sql calls made during the request to update the relation by time taken. Note that dotTrace had too many sql events, so this list definitely omits some, and may omit some that are significant. 

https://www.screencast.com/t/gK8wZjw6QxBo

I have some SQL traces that may mix up the relation update with find indexing. If it's important, I can try to get a clean SQL trace of the calls being made during the relation update only.

Also of note: the Find indexer starts retrieving content to update about 10s after the call to update the relation is POSTed. I grabbed the contents of tblFindIndexQueue after a few minutes of adding to the large category. There were about 3.5k entries. All the rows were catalog entries, and all were children of the node that the relation was added to.

Is this a bug?

I'd really rather that the update events be prevented when the relation is not modified.

It seems that once Mediachase.Commerce.Catalog.Manager.CatalogRelationManager gets the list of relations that might be updated, it has to retrieve everything from the db to compare it to the list of possibly updated ids. This seems like a long operation, and it's done piecemeal instead of retrieving all the data in one SQL call. It also seems to trigger Find reindexing on everything, so perhaps it thinks there's a change even after it compares the data to update with the data in the db.

If CatalogRelationManager can't detect updates without retrieving all the data, perhaps RelationStore can prevent triggering updates for the relations that are not changing due to the POST to add the relation?

Our own code

We have a catalog event listener reacting to updates. We clear the frontend cache for products and relations that are updated, and retrieve and reindex the prices for updated variations, too. This doesn't seem to cause any frontend slowness.

However, when our code is not commented out, it seems like the call to update a relation takes even longer. Perhaps it's because we retrieve prices. Perhaps we're also listening to spurious updates, or listening to too many types of events. Perhaps we should be more granular about distinguishing between price updates and relation updates. But, in the end, we trigger a Find reindex for anything that we get a catalog event for.

addenda:

  * Longer explanations of what the numbers in the drawing are.

  • A: Using the admin UI, I add a product to a secondary category. This is the thread with the post
  • B: The call tree shows the amount of time spent in each method. Shown is the thread that is processing the call to the relation store 
  • C: Before the call to the relation store, this is the call to get the svg to draw the drag animation
  • D: More than 85% of the time spent is under EPiServer.Find.Commerce.CatalogContent.IndexContentsIfNeeded
  • E: The thread to do find indexing starts approx 10s after the request is made
  • F: I'm redacting urls to protect our test site.

(A screenshot showing the few requests just after the call to update the relation completes: https://www.screencast.com/t/zJyxLE35X8c. You can see the end of the post to update the relation, another call to get the relations for entry being updated, calls to get the content structure for the primary category and the newly updated one, and thumbnails for the products listed in the catalog flyout)

#187204
Edited, Jan 16, 2018 0:29

Thank you for the detailed post!

RelationStore could probably be more clever and only create writeable clones as needed (i.e. when reordering items it will need to update more items than the one moved). However, creating the clone is a cheap operation, and the items without actual changes are ignored when compiling the data to store to the database and to include in the ordinary .net events raised locally. This also goes for updating entries, nodes etc as well.

However, there is a bug in remote events for relations specifically, where it doesn't check the change state of the data before including it in the remote event, meaning it will include the unmodified entities as well. I'm pretty sure this is what you are seeing. And I'm thinking the reason it makes it so slow is not so much broadcasting the massive event data, but what happens as a reaction to that event data - find starting to reindex tons of content as well as your custom code loading prices etc.

The good news is this should be fixed in Commerce 11.7.0 that is currently in QA and will hopefully be released next week.

Slightly off-topic tip:

You normally don't need to listen to remote events to clear items from the cache. Cache clears are already distributed to all servers so there are a couple of approaches that are probably simpler:

  1. Listen to the ordinary .net events (e.g. in CatalogEventHandler or EventContext) and use ISynchronizedObjectInstanceCache to evict the keys you know are related to the change described by the event data. This will then be automatically replicated to all instances.
  2. Use cache dependencies when inserting your custom data to the cache. The ISynchronizedObjectInstanceCache Insert and ReadThrough methods take a CacheEvictionPolicy object which you can set up with a set of "master keys" which you can get from the MasterCacheKeys class. These master keys will be evicted when there is a change to the corresponding object, and if your object depends on that key it will be evicted too, automatically across all instances. Example:
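// The cache instance is normally supplied by the container (e.g. constructor injection).
ISynchronizedObjectInstanceCache cache;
cache.Insert("MyKey", myObject,
    new CacheEvictionPolicy(null,
        new[]
        {
            // Master keys that "MyKey" depends on.
            MasterCacheKeys.GetNodeRelationKeyForEntry(1),
            MasterCacheKeys.GetEntryKey(1),
            MasterCacheKeys.RootKey
        }));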

This will make sure "MyKey" is evicted from the cache whenever there is a change in a node-entry relation involving the entry with id 1, or if there is a change to entry 1 itself. I recommend always throwing the RootKey in there too, it is used by some rare events like the catalog language setup changing, to nuke all objects.
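
To illustrate the dependency mechanism itself, here is a toy, single-process sketch. It is not the Episerver cache and does not replicate anything across servers; the `DependencyCache` class and the key strings such as "entry:1" are hypothetical stand-ins for the values returned by MasterCacheKeys.

```python
class DependencyCache:
    """Toy in-memory cache whose entries can depend on shared 'master keys'."""

    def __init__(self):
        self._values = {}
        self._dependents = {}  # master key -> set of cache keys that depend on it

    def insert(self, key, value, master_keys=()):
        self._values[key] = value
        for master in master_keys:
            self._dependents.setdefault(master, set()).add(key)

    def get(self, key):
        return self._values.get(key)

    def evict_master(self, master):
        """Evict every entry that was inserted with a dependency on `master`."""
        for key in self._dependents.pop(master, set()):
            self._values.pop(key, None)

cache = DependencyCache()
cache.insert("MyKey", {"hits": 3},
             master_keys=["relation-for-entry:1", "entry:1", "root"])
cache.insert("OtherKey", "unrelated", master_keys=["entry:2", "root"])

# A change to entry 1 evicts only the items that declared a dependency on it.
cache.evict_master("entry:1")
```

The caller never has to track which cached items are affected by a change; declaring the dependency at insert time is enough.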

#187212
Jan 16, 2018 7:17

@Drew: this is the kind of post I want to see more of on World forums. Very nice post, detailed, well formatted, and I have no doubt you put quite some effort into debugging and researching. Keep up the good work!

#187221
Jan 16, 2018 12:27

@Quan

Thanks, It's hard to be clear when describing a complex system, so I'm glad it came through that way.

@Magnus

Thanks for your detailed response. My further testing shows that it is indeed the remote events that are causing the latency when adding relations.

I had my doubts, because everything seemed to be happening on the same thread as the POST to add the relation. But, after turning off the remote event client and server for that site (I commented out the endpoint elements in the web.config system.serviceModel), the relation update returns within a second or two, even for the categories with 80k relations.

We'll look forward to the release of Commerce 11.7.0 and perhaps turn off remote events in the meantime.

With respect to which events to listen to:

I believe I misstated what we are doing. We are indeed listening to the .NET events as described in Pro EPiServer Commerce Chs. 9 & 10 (thanks, @Quan, it's been a really helpful book. Looking forward to the cookbook!).

commerceProductUpdatedEvent = Event.Get(CatalogEventBroadcaster.CommerceProductUpdated);
commerceProductUpdatedEvent.Raised += CatalogEventUpdated;

keyEventBroadcaster = Event.Get(CatalogKeyEventBroadcaster.CatalogKeyEventGuid);
keyEventBroadcaster.Raised += PriceUpdatedEvent;

We then filter for relation, content, and association updates, and price updates.

We are not caching content. We cache find results and typeahead results, but the former are perhaps overkill. Find is pretty quick at giving us our data. Our strategy was that, if a relation gets updated, then we need to clear all Find results: if a product is added to a category, our find results for that category need to have the product; if content gets updated, then we need to clear Find results as the facets/keyword could be different. But, since we haven't seen any performance problems since we switched our category/search listing to only using find data (as opposed to retrieving content from the db to generate result view models) we may not need to cache these results at all. Typeahead triggers queries more frequently, though, so we still think caching those is a good idea.

For price updates, I believe we still need to submit the update to find for reindex and to clear the find cache to get the facets correct. 

#187240
Jan 16, 2018 17:57

@Magnus

Is the change-state check for remote events fixed in 11.7.0?

https://world.episerver.com/documentation/Release-Notes/?versionFilter=11.7.0&packageFilter=EPiServer.Commerce&typeFilter=All

Is it this "CatalogEventBroadcaster message size exceeding 256 KB" bug?

https://world.episerver.com/documentation/Release-Notes/ReleaseNote/?releaseNoteId=COM-6051

#187419
Jan 22, 2018 20:39

Yes, 11.7.0 has the fix you are looking for and it was done as part of the fix for COM-6051.

Regarding clearing the cache, the approach I suggested with the cache dependencies works even if you cache objects other than content (in fact, you should never cache content objects yourself). So instead of caching your typeahead result and removing it when you get a change event for entry A or relation B, cache it with dependencies on the master keys for Entry A and Relation B and it will be automatically evicted from cache when there is a change in either of these objects.

#187420
Jan 22, 2018 20:46

@Magnus,

Thanks a lot! We'll try doing that.

#187421
Jan 22, 2018 20:49

Update 198, which includes Commerce 11.7.0, is now available. 

#187424
Jan 22, 2018 21:50

@Bob

Thanks, upgrading now.

#187425
Jan 22, 2018 21:52

I've upgraded our site in our test environment, but still need to implement a new caching/cache invalidation strategy. The relation update is still taking an inordinate amount of time.

#187471
Jan 23, 2018 21:32

I'm sorry to hear that. To make sure you see the same behavior as me regarding the events, could you enable a debug logger for the CatalogEventBroadcaster in EPiServerLog.config, like so:

<logger name="Mediachase.Commerce.Catalog.Events.CatalogEventBroadcaster">
  <level value="Debug" />
  <appender-ref ref="debugLogAppender" />
</logger>

(debugLogAppender is some appender you define)

When you link the entry to a secondary node you should basically see two lines added to the log: one for the RelationUpdating event (which can contain a huge number of IDs since it includes the unmodified rows in the dataset) and one for the RelationUpdated event, which should really only contain the IDs of the node and entry involved in the link. Only the RelationUpdatED event is listened to by the Find indexer, so you should see less reindexing after the fix as well.

#187473
Jan 24, 2018 7:13

@Magnus

I do indeed see 2 lines added to the log from CatalogEventBroadcaster. These are for adding a secondary node-entry relationship, where the node has relatively few relations:

2018-01-24 12:16:25,408 DEBUG [508] Mediachase.Commerce.Catalog.Events.CatalogEventBroadcaster.RaiseEvent - Event type 'RelationUpdating' for Catalogs [ 2 ] Nodes [ 211 ] Entries [ 7329, 7705, 7911, 8217, 8921, 9151, 10685, 10739, 11201, 11203 ... and 522 more ]

2018-01-24 12:16:32,690 DEBUG [508] Mediachase.Commerce.Catalog.Events.CatalogEventBroadcaster.RaiseEvent - Event type 'RelationUpdated' for Catalogs [ 2 ] Nodes [ 211 ] Entries [ 46570 ]

As you say, the post-update event (RelationUpdated in this case) does only have the single updated relation. We are indeed listening to the post-update events. We are just being too general in what needs to be checked for consistency based on the relation event.
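
To compare the size of the two events without reading the id lists by eye, a small parser can be run over the debug log. This is a sketch that assumes the line format shown in the two log lines above; `LINE` and `summarize` are hypothetical names.

```python
import re

# Matches the message part of a CatalogEventBroadcaster debug line as logged above.
LINE = re.compile(
    r"Event type '(?P<type>\w+)' for Catalogs \[(?P<catalogs>[^\]]*)\] "
    r"Nodes \[(?P<nodes>[^\]]*)\] Entries \[(?P<entries>[^\]]*)\]"
)

def summarize(line):
    """Return (event type, number of entry ids) for one debug line, or None."""
    m = LINE.search(line)
    if m is None:
        return None
    entries = m.group("entries")
    # Ids printed explicitly, before any "... and N more" suffix.
    listed = re.findall(r"\b\d+\b", entries.split("...")[0])
    more = re.search(r"\.\.\. and (\d+) more", entries)
    return m.group("type"), len(listed) + (int(more.group(1)) if more else 0)

updating = ("RaiseEvent - Event type 'RelationUpdating' for Catalogs [ 2 ] Nodes [ 211 ] "
            "Entries [ 7329, 7705, 7911, 8217, 8921, 9151, 10685, 10739, 11201, 11203 "
            "... and 522 more ]")
updated = ("RaiseEvent - Event type 'RelationUpdated' for Catalogs [ 2 ] Nodes [ 211 ] "
           "Entries [ 46570 ]")

print(summarize(updating), summarize(updated))
```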

Here's a trace of what I'm seeing when adding to a large category. We're obviously triggering reindexing for too much catalog content when the relation is updated.

https://www.screencast.com/t/jVzJJb53

I've proposed to our stakeholders that I tear down all our caching, see if performance is acceptable, and then add back caching where needed. We've recently tried some APM tools, so hopefully that will help us only target the pain points for users.

The more important issue at the moment is the Find reindexing for catalog events for consistency (e.g. variation price, entry associations), as our Find results are dependent on those, but it's where we're not being selective enough. I need to invest a little time in analyzing why our logic is wrong and how to do it better. However, I'm waiting for direction from stakeholders to figure out the first place to invest time.

#187508
Jan 24, 2018 19:37

From that trace it clearly looks like you are reacting to the UpdatING event. Are you sure you have wired it up correctly and that you are reacting only when the CatalogContentUpdateEventArgs.EventType is an UpdatED event?
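
The filtering being asked about can be sketched like this. It is a language-neutral illustration, not Episerver code: events are modelled as plain (type name, entry ids) pairs, the type names are taken from the log lines earlier in the thread, and `should_reindex` and `ids_to_reindex` are hypothetical helpers.

```python
POST_CHANGE_SUFFIXES = ("Updated", "Deleted")

def should_reindex(event_type: str) -> bool:
    """True only for post-change events; pre-change '...Updating' events are ignored."""
    return event_type.endswith(POST_CHANGE_SUFFIXES)

def ids_to_reindex(events):
    """Collect entry ids from post-change events only, without duplicates."""
    ids = set()
    for event_type, entry_ids in events:
        if should_reindex(event_type):
            ids.update(entry_ids)
    return sorted(ids)

events = [
    ("RelationUpdating", [7329, 7705, 7911]),  # large pre-change event, skipped
    ("RelationUpdated", [46570]),              # post-change event, handled
]
result = ids_to_reindex(events)
print(result)
```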

Some more observations:

It looks like the ReIndexByEntryIds method is calling straight into the CatalogEntryManager.GetCatalogEntry method. You shouldn't call the Manager class methods directly, you should use ICatalogSystem or all your calls will be uncached. Also, we recommend not using the Entry model as it is very heavy to load. It is better to load the IContent models, as you also seem to do. So what are you doing with the Entry?

Finally, I wonder if this isn't all unnecessary. If I understand correctly you are using Find, and if you have the EPiServer.Find.Commerce package installed, evented reindexing of entries for node/entry/relation/association as well as price and inventory changes is already wired up for you. No need to index anything yourself.

#187511
Edited, Jan 24, 2018 20:22

@Magnus

Yes, you are correct that we are using Find. We are also using ServiceAPI in addition to the admin to update catalog prices, relations, and associations. (Though we have only observed the slowdown when updating relations in the admin and not the ServiceAPI)

IIRC, the reason we wanted to listen to events (CatalogEventBroadcaster and CatalogKeyEventBroadcaster) was that we were worried that the Find client would not get event notifications from ServiceAPI calls to update price, relation, or association. We thought that reindex would be a cheap operation, so reindexing when it wasn't necessary wouldn't cause harm. Is this a mistaken concern? Do IPriceService and the Find client listen to events emitted from ServiceAPI when ServiceAPI updates prices, relations, or associations?

You are right that we are listening to all CatalogEvents and reindexing based on the catalog keys in the events. I'm now restricting reindexing to Relation/Association/Entry-Updated/Deleted events. I'll test with that and report back.

#187515
Jan 24, 2018 22:43

I'm not following the entire discussion (TL;DR), but to your latest questions: ServiceAPI uses IPriceService internally to update prices, and the price service is a publisher, not a listener, of the price events.

And yes, Find Commerce (not Find) listens to all the events (CatalogEventBroadcaster), including price and inventory changes, and reindexes the related contents. (Find by default only listens to content-level events.) However, you are able to change the default behavior and tell Find Commerce to stop listening to certain events. So you can basically install Find.Commerce and let it do the job, unless you have some special requirements.

#187517
Jan 24, 2018 23:26

@Quan

Thanks for your response.

My concern with IPriceService is: when I call IPriceService.GetCatalogEntryPrices, I don't want to get an out-of-date price. The scenario that I thought could lead to that was: ServiceAPI updates a price on another server. But, it sounds to me like you're saying that IPriceService listens to all catalogkey events, and that catalogkey events are emitted by ServiceAPI when any price changes.

What do you mean by "... and reindex the related contents"? Will Find Commerce reindex a Product if the Variations related to it through CatalogEntryRelation change or the price of any of the Variations? Will Find Commerce reindex a product if a new CatalogEntryAssociation is added in which the product is either the source or the target? I need to update what's in Find for a Product if any of these related/associated entries change.

@Magnus

Here's another trace of the beginning of a call to add an entry to a large category. This is after I filtered the events we listen to to only Association/CatalogEntry/Relation-Updated/Deleted Events. The XHR to the RelationStore took 12 min to complete.

https://www.screencast.com/t/jcBHzb2h0u

#187518
Jan 25, 2018 0:19

All the built-in services like the content repository, the default price and inventory services etc. automatically take care of cache evictions for any updated or dependent entity, and this is synchronized across all the instances in the server farm (provided remote events are set up correctly and working). So no need to handle any of that. For custom objects put into cache, I again recommend using the cache dependencies over handling events whenever possible.

When it comes to the automatic reindexing for content in Find.Commerce, "related content" refers for example to reindexing all entries in a node if the node itself changes (because the index documents for the entries depend on data from the parent node), reindexing both ends of a relation or association etc. Price changes also trigger reindexing of the affected entries. Changes to Variation data don't reindex the related product (or vice versa) IIRC, but if that is a problem it is something we could look into and at least add extension points for.

The latest trace is very different from the previous one. In this case it seems most of the time is spent loading relations either from cache or from the database. I'm a bit surprised how much that costs, how many entries are in that node? Anyway I think it shows the Catalog UI / RelationStore wasn't designed with nodes that large in mind. We might be able to optimize it so it doesn't have to load all the relations in the target node, but there is no quickfix.

#187526
Jan 25, 2018 7:53

Rahl already answered most of your questions. I just want to add that (the default implementation of) IPriceService does not listen to anything. It only fires events when the prices are changed (and saved). As Rahl said, the cache is handled by the famous `ISynchronizedObjectInstanceCache`, so IPriceService only removes the cache key, and ISynchronizedObjectInstanceCache will handle the rest on other servers.

#187530
Jan 25, 2018 8:54

@Magnus 

Ok, thanks for the explanation of the entities covered by the synchronized cache. We'll change our custom cache keys to use EPiServer's MasterCacheKeys as you describe here:

https://world.episerver.com/forum/developer-forum/Episerver-Commerce/Thread-Container/2018/1/adding-a-secondary-node---entry-relation-takes-a-long-time-for-some-categories/?pageIndex=2#187212

It would be helpful to have something in Find.Commerce that would allow us to define a custom invalidation/reindex strategy. A master-/content-specific Find key similar to the master-/content-specific cache keys would allow the maximum flexibility. But perhaps providing an extension point that allows us to compute additional content that must be reindexed when a particular piece of content must be reindexed would be easier for us site developers to use. Perhaps my thread here has convinced you that EPi devs shouldn't be working with the event system since we can screw it up pretty easily. :) For now, we'll continue to use the event system to detect which product needs to be reindexed in Find.

I'm about to go into an all-day meeting, but later tonight or tomorrow I'll do another test with all of our custom event code commented out to see what the performance is when adding to another large category. The one from the last trace is a Node with 22k relations in the NER table.

#187550
Jan 25, 2018 18:11

I have created a bug for the excessive loading of relations when adding a new relation. I spotted one thing which might be reasonably easy to fix. Remains to be seen if that holds and how much boost it will give.

Find.Commerce reindexing extension points is a good idea. I'm not sure what possibilities there are already, there might be some considering it is not an unusual scenario to add extension methods with custom indexing for CMS pages as well. I'll try to find out.

If you can provide an example of something you include on the index documents today that has a dependency in such a way that it doesn't get reindexed correctly, that would certainly help sell that feature to the product team.

#187562
Jan 26, 2018 12:12

@Magnus

First of all, thank you for all the time you've put in to responding to my messages, helping me diagnose what's going on, and creating the bug report.

To confirm my conjecture that our custom code is not affecting the behavior of the admin (at least, since we're now filtering events for UpdatED) I commented out the registration of our Catalog event listeners and ran one more trace. The performance is similar to the slow performance I saw before (12min for a node w/ 20k products). This trace is in the middle of the XHR being processed:

https://www.screencast.com/t/m8kdSJUlmXl4

The trace shows there were two other threads running to retrieve content structure, perhaps to populate the "Catalog" list in the right flyout.

https://www.screencast.com/t/m8kdSJUlmXl4

https://www.screencast.com/t/auwzoU3LDNHs

Why we think we need to reindex "manually"

The site is a digital+physical products ecommerce site. Digital products are available in several file formats. We want end-users to be able to filter products by file format. We also offer collections of digital products. Epi packages and bundles didn't quite fit our business requirements. So, these collections are regular products whose constituent products are defined by an entry association with a specific name. (The collection product still has its own variations with their own unique prices.) When a constituent product's file formats change, we want the collection product file formats updated and the collection to be reindexed in Find. Since it's at the other end of an association, we listen for updates to catalog entries, figure out if the updated entry is a part of the collection, and then ask Find to reindex the collection product in addition to the updated constituent.

We also found that price updates scheduled for the future weren't triggering Find reindexing when the time lapsed for the new price. So, we created a scheduled job to trigger every few minutes, pull price records that had changed since the last scheduled job run, and reindex the products related to the variations with the updated price. We're just using a custom sp to retrieve prices that have expired since the last run of the scheduled job. We understand direct db access isn't supported.
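
The selection logic of such a job can be sketched in memory as follows. This is an assumption-laden illustration rather than our stored procedure: the `Price` tuple, its field names, and `codes_needing_reindex` are hypothetical, and a real job would read the price rows from the price service or database.

```python
from datetime import datetime
from typing import NamedTuple, Optional

class Price(NamedTuple):
    entry_code: str
    valid_from: datetime
    valid_until: Optional[datetime]  # None means the price is open-ended

def codes_needing_reindex(prices, last_run, now):
    """Entry codes whose price started or expired in the window (last_run, now]."""
    def crossed(moment):
        return moment is not None and last_run < moment <= now
    return sorted({p.entry_code for p in prices
                   if crossed(p.valid_from) or crossed(p.valid_until)})

prices = [
    Price("A", datetime(2018, 1, 26, 10, 0), None),                        # starts in window
    Price("B", datetime(2018, 1, 1), datetime(2018, 1, 26, 10, 3)),        # expires in window
    Price("C", datetime(2018, 1, 1), None),                                # unchanged
]
stale = codes_needing_reindex(
    prices,
    last_run=datetime(2018, 1, 26, 9, 58),
    now=datetime(2018, 1, 26, 10, 3),
)
print(stale)
```

Using a half-open window that ends at the current run time means consecutive runs neither miss nor double-count a boundary, as long as each run stores its own `now` as the next `last_run`.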

http://vimvq1987.com/2017/08/reindex-obsolete-prices-episerver-commerce/

We also were concerned that ServiceAPI updates to prices wouldn't trigger a reindex or invalidate the synchronized cache, but I think that we are mistaken in that, according to @QuanMai's response:

https://world.episerver.com/forum/developer-forum/Episerver-Commerce/Thread-Container/2018/1/adding-a-secondary-node---entry-relation-takes-a-long-time-for-some-categories/?pageIndex=3#

#187570
Jan 26, 2018 19:42

Thank you for the detailed reports :) I would say all of us in the dev teams are very committed to improving our products, and I think many of us find possible performance improvements particularly engaging. And to get traces with a report without even asking is simply fantastic.

After looking more closely into this I agree that the main issue is probably not the events (even though the fixes on our side prompted by this discussion are still valuable). The problem is that it is trying a bit too hard to get the sort order absolutely correct in relation to what you see when you drag-and-drop sort items in the category view with the default view of variants grouped under products. Though it can be argued that it is logically correct and not a problem on small categories, the added logic is very heavy on large categories (basically fetching relations for each and every entry in the category to analyze the product-variant grouping) and completely unnecessary for any scenario I can come up with. So in a future bugfix (COM-6360) I have removed that logic together with some other optimizations so it should be significantly faster in your case.

One of the two additional traces is actually the same as the main one posting the new relation. But the other one, like you say, seems to be listing the entries in the category to show in the UI. We have done some optimizations in that area to allow for large categories, for example it will automatically fall into a simplified mode where it displays a flat listing of products and variants (rather than nesting the variants under the products).

For the custom indexing you indeed seem to have some cases not covered by the default behavior. I suggest you keep your approach of listening to the events and adding products to the indexing queue based on your custom logic. But I also suggest you avoid adding items that are already indexed, e.g. the product/variant itself whose id is included in the remote event data. I think the indexing queue has some buffer interval and de-duplication, but to avoid indexing the same thing twice I suggest you single out only the extra items you need to index and leave the rest to the framework. I think that will be a good enough solution and we'll think about whether adding a specific extension point for this in FindCommerce is necessary.
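
To illustrate the "single out only the extra items" idea, here is a minimal sketch that works on plain ids. ExtraIndexTargets and its parameters are hypothetical names, not part of FindCommerce, and it assumes the ids carried by the event are the ones the framework already indexes.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ExtraIndexTargets
{
    // relatedIds: everything the custom logic considers affected.
    // idsInEvent: ids included in the event data (assumed to be indexed by the framework).
    // Returns the related ids that are not in the event, each at most once,
    // in the order they first appear in relatedIds.
    public static List<int> Select(IEnumerable<int> relatedIds, IEnumerable<int> idsInEvent)
    {
        // Enumerable.Except is a set difference, so duplicates are dropped as well.
        return relatedIds.Except(idsInEvent).ToList();
    }
}
```

For example, with related ids 10, 11, 11, 12 and event ids 10, only 11 and 12 would be queued.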

The price thing sounds like something we might want to pull into the framework. I'll bring it up for discussion.

#187642
Edited, Jan 30, 2018 15:32

@MagnusRahl

Sorry, here's the other trace I meant to post instead of the duplicate of the RelationStore call:

https://www.screencast.com/t/3joYospLMUeL

I like posting traces for performance problems, because otherwise I'm just waving my hands and complaining. I'm glad they were helpful in evaluating this question.

Thanks for fixing the bug so quickly. We look forward to its release. While we're waiting, what should our strategy be? Update exclusively through the ServiceAPI? Add rows to the NodeEntryRelation table directly and wait for the cache to be invalidated? We could probably reduce the cache times in ecf.catalog.config on the frontend in the interim. Is this the place to do it? Do these represent 15min cache timeouts?

<Catalog autoConfigure="true">
  <Cache enabled="true" collectionTimeout="0:15:0" entryTimeout="0:15:0" nodeTimeout="0:15:0" schemaTimeout="0:15:0" />
...
</Catalog>
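
If those attributes are read as .NET TimeSpan strings (an assumption; the thread doesn't confirm how ecf.catalog.config is parsed), a quick check of the format shows that "0:15:0" means 0 hours, 15 minutes, 0 seconds. CacheTimeoutCheck is just an illustrative name.

```csharp
using System;
using System.Globalization;

public static class CacheTimeoutCheck
{
    // Parses an "h:m:s" style value and returns its length in minutes.
    // Assumes the config value follows the standard TimeSpan format.
    public static double TimeoutMinutes(string value)
    {
        return TimeSpan.Parse(value, CultureInfo.InvariantCulture).TotalMinutes;
    }
}
```

Under that assumption, a 5 minute timeout would be written as "0:5:0".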

Thanks also for the suggestions for indexing. We will be more careful about computing what actually needs to be manually reindexed.

Custom reindexing calculations do seem to be a special implementation case to me; you support it, but most sites don't use it. Perhaps writing custom event listeners is an appropriate task for devs who want to accomplish indexing of associated content.

I understand that designing an API is a balancing act. You want to provide powerful features that are easy to reason about and use. Implementors are always under pressure to implement features in the end-user application in as cost-effective a way as possible. We're excited that the performance benefits of a tool like Find allow us to do that. But, we tend to make assumptions about how heavily we can lean on those tools to accomplish our goals. Sometimes we take claims made about one portion of the tool (e.g. query performance) and assume that the same claims hold for another portion (e.g. indexing performance). So, to ensure correctness and save time, we make the framework responsible for something we could actually do ourselves if we took a little more time and did some integration testing. At least, that's where we found ourselves on this app.

#187646
Jan 30, 2018 18:04

You can add the extra relation through the Service API or by using IRelationRepository from code. Then you will also get all the events and cache evictions that you get when doing it from the UI. But you won't get the long delay from fetching all those relations, because that is part of the UI layer. Any direct manipulation of the database is unsupported.

Adding the relation through IRelationRepository is very simple:

relationRepository.Update(new NodeEntryRelation() { Child = productContentReference, Parent = targetCategoryContentReference });

This will add a non-primary relation with SortOrder 0, i.e. "at the top" (although which relation is actually at the top when there are many relations with SortOrder 0 is undefined).

#187679
Jan 31, 2018 14:55

The fix for COM-6360 will be in the next release, Commerce 11.8.1.

#187935
Feb 06, 2018 20:03

@MagnusRahl,

Thanks for the update. I'll post back when we have evaluated it in test.

#187936
Feb 06, 2018 20:11

@MagnusRahl,

We've installed EPiServer Commerce 11.8.1, and I can confirm that this fixes the problems we saw. Updates that add an entry to a large node now take a couple of seconds to complete.

Thanks again for your help.

-Drew

#188322
Feb 19, 2018 22:23

@Drew happy to hear we finally got it working better!

#188414
Feb 22, 2018 15:03