Performance of saving catalog data via Content Repository

Hi,

I'm working on importing quite a lot of catalog data (300 nodes and approximately 40,000 variants).

I'm using strongly typed models, and the content repository to save the entries:

var variant = ContentRepository.GetDefault&lt;VariationContent&gt;(parent);
Mapper(ref variant, objectToMapFrom);
ContentRepository.Save(variant, SaveAction.Publish, AccessLevel.NoAccess);

Each save takes roughly 250-500 milliseconds (sometimes more), and with 40,000 variants it will take at least 3 hours to import the whole catalogue.
This was just variants. I'm going to add prices and stock as well.

I have seen (in earlier versions of Episerver Commerce) that if we had remote events turned on, the save was extremely slow, because we needed to publish the events to other servers (and if a server was down or sleeping, the events were simply lost).

So the question: is there any way to make the save faster than this?

I would really like to see method like this:

ContentRepository.Save(IEnumerable&lt;IContent&gt; contents);



#181603
Edited, Aug 25, 2017 9:37

Hi,

Currently there is no way to do that at the Commerce level, because IContentRepository comes from CMS Core and it simply does not support batch saving.

Is it possible to use the CatalogImport functionality instead?

#181606
Aug 25, 2017 10:19

As you suggest, a batch save method would be easier for us to optimize. The reason the built-in XML catalog import outperforms the content repository is in large part because of the batching capabilities of the lower-level APIs that the import code uses.

I suggest you add some kind of change tracking to your external data source, or logic to compare the data of each variant to the existing data in the content repository, so you only have to update whenever there is a change. This won't help the initial import, but I assume you will be updating the content from your external data source periodically, and that not every item will have changed on every import.
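A minimal sketch of the compare-before-save idea: serialize each incoming variant's relevant fields to a deterministic string, hash it, and skip the save when the hash matches. All names here are illustrative, and the hash store would need to be persisted somewhere (e.g. DDS or a custom table):

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

public static class ChangeDetection
{
    // Computes a stable fingerprint for an incoming variant. The caller is
    // responsible for serializing the fields that matter (e.g. "code|name|color")
    // in a deterministic order.
    public static string Fingerprint(string serializedVariant)
    {
        using (var sha = SHA256.Create())
        {
            var bytes = sha.ComputeHash(Encoding.UTF8.GetBytes(serializedVariant));
            return Convert.ToHexString(bytes); // .NET 5+; use BitConverter on older frameworks
        }
    }

    // Returns true when the variant needs saving, and records the new hash.
    public static bool NeedsSave(IDictionary<string, string> storedHashes,
                                 string variantCode,
                                 string serializedVariant)
    {
        var hash = Fingerprint(serializedVariant);
        if (storedHashes.TryGetValue(variantCode, out var existing) && existing == hash)
        {
            return false; // unchanged - skip the expensive ContentRepository.Save
        }
        storedHashes[variantCode] = hash;
        return true;
    }
}
```

With this in place, the import loop only calls Save for variants where NeedsSave returns true.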

Adding batching to IContentRepository has been discussed before, but I will bring it up for discussion again. For the time being, though, the catalog import or direct use of the lower-level APIs (ICatalogSystem, MetaObject) has the potential to be a lot faster, but is also a lot less pleasant to work with code-wise.

#181607
Aug 25, 2017 10:26

Hi,

Quan: Yes, I may have to look at it instead.

Magnus: Good that you'll have a look at it. I suggest that batch reading (ContentRepository.Get&lt;Variant&gt;(IEnumerable&lt;ContentReference&gt; contentReferences)) should also be taken into consideration. I have had use for it several times.

Magnus/Quan: Have you done any performance comparison between using the ICatalogSystem and IContentRepository? How much time (gut feeling is fine) do you think the import will take using the ICatalogSystem instead?


Generally, when writing to the database, I would have looked at SqlBulkCopy in the lower-level APIs to insert/update data. I have also had good experiences batching the SQL statements we send to the server, like this:

command.Execute("exec sp_Name @v1 = 'Test', @v2 = 'Test2'; exec sp_Name @v1 = 'Test3', @v2 = 'Test4'; exec sp_Name @v1 = 'Test5', @v2 = 'Test6'; exec sp_Name @v1 = 'Test7', @v2 = 'Test2';");

Some other things we have done to improve database write performance:
- disable all indexes before writing
- don't use transactions
- use "shadow" tables that we write to and switch when we are finished (exec sp_rename @objname = 'TableName', @newname = 'TableName_Shadow2'; exec sp_rename @objname = 'TableName_Shadow', @newname = 'TableName'; exec sp_rename @objname = 'TableName_Shadow2', @newname = 'TableName_Shadow')
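As a sketch of the SqlBulkCopy idea above (the staging table name is made up for illustration, and writing directly to the database bypasses the Commerce APIs, so use with care):

```csharp
using System.Data;
using System.Data.SqlClient; // Microsoft.Data.SqlClient on newer stacks

public static class BulkImport
{
    // Streams a whole DataTable of rows to the server in batches,
    // instead of issuing one INSERT round trip per row.
    public static void WriteVariants(string connectionString, DataTable variants)
    {
        using (var bulk = new SqlBulkCopy(connectionString))
        {
            bulk.DestinationTableName = "dbo.VariantStaging"; // hypothetical staging table
            bulk.BatchSize = 5000;
            bulk.BulkCopyTimeout = 0; // disable the timeout for large imports
            bulk.WriteToServer(variants);
        }
    }
}
```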

#181609
Aug 25, 2017 11:09

Hi,

We would suggest avoiding direct SQL manipulation, because there are things (cache, versions) which need to be handled.

We haven't done such testing, but speaking from experience, I'd say that when you have a few hundred contents to be updated, IContentRepository works fine. When you have thousands or more, then batch-processing approaches like CatalogEntryDto/MetaObject or CatalogImport (which uses CatalogEntryDto/MetaObject internally) should be the faster option. They are usually fast enough that you don't really have to care about raw SQL or custom optimization. On my machine I could import a 100k-entry catalog in under 10 minutes. Of course it varies a lot with the catalog and the hardware configuration, but that might give you some idea.

#181612
Aug 25, 2017 11:14

Thanks Quan!

The structure of the data means I need a PropertyList on the strongly typed object. Do you have any idea how to manage that through the CatalogEntryDto?

/Thomas

#181614
Aug 25, 2017 12:06

By default the backing type of PropertyList&lt;T&gt; is MetaDataType.LongString. If you have your own PropertyList&lt;T&gt; class, just call ToString() on the instance and then assign it to the metafield. (I'll need to double-check to be absolutely sure, but that should be it.)
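If it helps, a minimal sketch of what that could look like. This is based on Quan's description and not verified: it assumes PropertyList&lt;T&gt;.ToString() returns the serialized form, that the MetaObject exposes a string indexer, and the "MySizes" field name is made up:

```csharp
// Sketch only - assumes the target metafield "MySizes" is defined as
// MetaDataType.LongString and that ToString() yields the serialized list.
var sizes = new PropertyList<string> { List = new List<string> { "S", "M", "L" } };
metaObject["MySizes"] = sizes.ToString(); // MetaObject string indexer assumed
```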

#181616
Aug 25, 2017 12:35

Thanks.

I see that the ICatalogSystem does not have a save method for multiple CatalogEntries, only for single ones. But I suppose it is faster anyway; I'll do a test.

If I am going to use the legacy catalog import (as you say it is here: http://world.episerver.com/documentation/developer-guides/commerce/catalogs/low-level-apis/), in what time frame do you think we would need to rewrite the import if we use it?

#181618
Aug 25, 2017 13:04

The CatalogEntryDto can hold multiple entries at once (it's a DataSet, by the way), and you can save multiple MetaObjects at once as well.

We generally recommend using the content APIs, but there are areas - such as this - where ICatalogSystem/MetaObject works better. We don't plan to obsolete them anytime soon (at least not until batch saving is fully supported, and that will take time), so I don't think you'll have to worry about that for at least the next one or two major versions.

#181621
Aug 25, 2017 13:09

Regarding batch saving of content: I checked, and we have tried to implement it before, but backed out because of the great complexity it gave rise to. It is not just saving content; it is adjusting save actions and statuses, making decisions about creating versions, and raising pre- and post-events where a third party could make adjustments or cancel actions, just to mention a few things. We might still take another stab at it at some point, but I wouldn't hold my breath.

#181651
Aug 25, 2017 19:24

Thanks, Magnus

I guess I'll look at an implementation where I use the ICatalogSystem/MetaObject functionality, but use the strongly typed objects to create the DTOs. It should be quite easy to use reflection on the objects and create generic methods for creating DTOs. Then I'll have "the best of both worlds".
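The reflection part could start from something like this minimal sketch (plain .NET, names illustrative): read each public property of the strongly typed model into a field/value dictionary that a generic DTO-building method can then consume.

```csharp
using System.Collections.Generic;
using System.Reflection;

public static class DtoMapper
{
    // Reads all public readable properties of a strongly typed model into a
    // name/value dictionary, so one generic method can feed a CatalogEntryDto
    // or MetaObject without writing a mapping method per content type.
    public static Dictionary<string, object> ToFieldValues(object model)
    {
        var result = new Dictionary<string, object>();
        foreach (PropertyInfo prop in model.GetType()
                     .GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            // Skip indexers; they have no single value to map.
            if (prop.CanRead && prop.GetIndexParameters().Length == 0)
            {
                result[prop.Name] = prop.GetValue(model);
            }
        }
        return result;
    }
}
```

In a real import you would probably cache the PropertyInfo lookups per type, since reflection per item adds up over 40,000 variants.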


In the future I would also really like the opportunity to create a completely custom catalog provider (as you have with payment/shipping/warehouse providers). Then we could read catalog data directly from PIM systems and other external systems. (Just wishful thinking on my side... :))

/ Thomas

#181676
Aug 28, 2017 12:25

@magnus / @quan

I noticed that you add a lot to the ApplicationLog when saving catalog entries. I tried to disable the logging in ecf.catalog.config by setting this XML element to empty:

&lt;Events /&gt; - but it still writes a lot of log entries. It seems like this is costing a lot of performance as well. Is there any other way to disable logging?

#181702
Aug 28, 2017 15:51
This topic was created over six months ago and has been resolved. If you have a similar question, please create a new topic and refer to this one.