Try our conversational search powered by Generative AI!

Arthur Vander Voort
Feb 18, 2023
  791
(0 votes)

Optimizely PIM - Data Cleansing in PIM: Save Time by Importing Messy Data

In my previous post, I talked pretty extensively about why setting up data governance is both an obstacle and essential to getting data into the PIM. Another common obstacle is trying to perfect your data prior to importing it to the PIM. You can spend a lot of time in an Excel spreadsheet, trying to find and replace a bunch of bad values across dozens or hundreds of columns. While this is a valid approach, Optimizley PIM also has functionality that you can leverage to cleanse your data directly in the application. This article outlines an approach to speed up your implementations without sacrificing data quality.

MessyDataSample.png

Before we get into it, here are a few caveats:

  • This approach is specifically for what we commonly call 'attribute' data – product data that has a defined list of values. Think of size, color, guage, width, length, material, finish, wattage, and so on.
  • Depending on the number of properties and property values, this can still be a time consuming process. 
  • This approach will not help you clean up data for unique product properties like product description or product title.
  • This approach is only recommended during implementation. You must re-establish strong data governance after cleansing your data to get value from this exercise.

Set up your properties to allow bad data

Properties that ony allow a defined list of values typically make up the bulk of product data. Normally, when data is imported, if a product has a value that is not on the list for this property, the data will not be imported, and the bad data will be isolated. The key to this approach is to intentionally set your properties in a way that allows bad data to be ingested. You must set all of your properties that use a defined list of values to allow ad hoc values. With ad hoc values enabled, no data governance is enforced, and any value will be imported and added to the property's value list.

If you already have all of your properties created, you can easily toggle them to allow ad hoc values:

  1. Filter the property list for properties that use a dropdown control type.
  2. Export properties and choose the option to only included the filtered list.
  3. In the exported file, set 'Allow Adhoc Values' to 'yes' for every row.
  4. Import the properties to the PIM.
SetAdhocs.gif

Import your messy product data

With the properties configured, you can now import your product data. Data governance will not prevent your messy data from being imported for these properties, which is exactly what we want.

  1. Go to imports.
  2. Select your product data file.
  3. Map your data and proceed through the import wizard.
  4. Initiate the product import.

Review & cleanse your data

With the import complete, all of the data, good and bad, has been imported and the property value lists have been updated. This is where the work starts. We are going to review the values for each property and correct any bad ones. To do this, we will filter for the properties we configured to allow ad hoc values and then work our way through the list.

  1. On the property list, apply a filter for properties that use a dropdown control type.
  2. Edit the first property.
  3. Go to the values tab.
  4. Review the list of values for bad data.
  5. For any bad value, click the edit icon and enter the preferred value (for instance, if you want '33 inches to' be '33 in', update this).
  6. When you have cleaned up all the bad values, toggle 'allow ad hoc values' to false and save the property.
  7. Repeat until you are through all properties.
PropertyValueCleanup.gif

When the data was initially imported, any bad values were added to the property list and saved to products that had those values. When you edit the property value and save the property, we will propagate the changes to all products that had the value. This lets you remove duplicate (e.g. 24" & 24 inches) and erroneous values while automatically updating the product data.

Re-establish your data governance to maintain quality

Now that all of our property value lists have been reviewed and the product data is cleansed,  we need to ensure that we don't waste this effort by allowing bad data back into the PIM. In the previous step, we recommended toggling the ad hoc value option off after you finished each property, but make sure you did not miss any. To do this, we can use the export/import approach again similar to when we initially enabled the ad hoc values.

  1. Filter the property list for properties that use a dropdown control type.
  2. Export properties and choose the option to only included the filtered list.
  3. In the exported file, set 'enable ad hoc' to 'no' for every row.
  4. Import the properties to PIM.
Feb 18, 2023

Comments

Please login to comment.
Latest blogs
A day in the life of an Optimizely Developer - Enabling Opti ID within your application

Hello and welcome to another instalment of A Day In The Life Of An Optimizely developer, in this blog post I will provide details on Optimizely's...

Graham Carr | May 9, 2024

How to add a custom property in Optimizely Graph

In the Optimizely CMS content can be synchronized to the Optimizely Graph service for it then to be exposed by the GraphQL API. In some cases, you...

Ynze | May 9, 2024 | Syndicated blog

New Security Improvement released for Optimizely CMS 11

A new security improvement has been released for Optimizely CMS 11. You should update now!

Tomas Hensrud Gulla | May 7, 2024 | Syndicated blog

Azure AI Language – Key Phrase Extraction in Optimizely CMS

In this article, I demonstrate how the key phrase extraction feature, offered by the Azure AI Language service, can be used to generate a list of k...

Anil Patel | May 7, 2024 | Syndicated blog