Online sales are booming. Even before the COVID-19 pandemic, shopping malls were in decline as more consumers and manufacturers turned to the digital world. But analyzing product performance online is very different from doing so in the physical world. Listings often carry no universal product code (UPC), the same product is often listed under different names, and even products with the same name may be different. Each retailer has its own rules, leaving manufacturers scrambling to determine their “share of shelf,” which product reviews to count, what the true rating is for a particular product, how they are performing against the competition, how to optimize their product offering on a particular ecommerce channel, and more. In short, it’s a jungle out there.
The challenge of extracting ecommerce market intelligence from websites
Ecommerce sites behave very differently from brick-and-mortar stores. With so many open marketplaces, such as Amazon, Walmart, eBay and Target, and with low barriers to entry for new merchants, the amount of inventory that is uploaded and sold online is staggering. For buyers, finding the product you want and comparing it across different channels, or even within the same channel, can be a real head-scratcher.
Imagine searching for that perfect pink lipstick and getting back a page of similar-looking listings. Are these the same lipsticks? If not, how are they different, and what’s the best deal for my needs?
But it’s not only the buyers who are confused. Consider brands that use multiple third-party distributors and are trying to track how their products and their competitors’ products are performing online.
Extracting ecommerce market intelligence with advanced analytics
The ecommerce giants are not in a position to solve this issue, as it is in their interest to attract more merchants and expand their reach. As long as there is a low barrier to entry for new merchants who bring in more inventory and create more competition, buyers will continue to purchase and, as a result, tracking and understanding conversion rates for product groupings will continue to be difficult.
The reality is that standardizing listings by forcing sellers to comply with naming conventions, provide universal identifiers such as UPCs, and map inventory to the correct category is not an attractive proposition for the marketplaces. It is up to brands to figure it out on their own. One of the newest techniques at their disposal is product clustering, a natural language processing (NLP) technology built into the Skai advanced analytics platform.
The product clustering capability groups together all the different listings of a product into one view, across all merchants and distribution channels. This consolidated view gives a single understanding of how a product is performing: for example, how many reviews it has generated and from which channels, what consumers are saying about it, and what its average rating is. Moreover, it becomes possible to benchmark a product against an entire portfolio and compare it against the competition.
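To make the consolidated view concrete, here is a minimal sketch of rolling several listings of one product up into a single summary. The field names (channel, rating, reviews), the channel labels, and the numbers are all made up for illustration, and weighting the average rating by review count is an assumption rather than a description of how Skai computes it.

```python
from collections import defaultdict


def summarize_cluster(listings):
    """Roll up every listing in one product cluster into a single view."""
    reviews_by_channel = defaultdict(int)
    weighted_rating_sum = 0.0
    total_reviews = 0
    for listing in listings:
        reviews_by_channel[listing["channel"]] += listing["reviews"]
        # Assumption: each listing's rating counts in proportion to its reviews.
        weighted_rating_sum += listing["rating"] * listing["reviews"]
        total_reviews += listing["reviews"]
    average = weighted_rating_sum / total_reviews if total_reviews else None
    return {
        "total_reviews": total_reviews,
        "reviews_by_channel": dict(reviews_by_channel),
        "average_rating": average,
    }


# Hypothetical listings of the same lipstick on two marketplaces.
listings = [
    {"channel": "marketplace_a", "rating": 4.0, "reviews": 100},
    {"channel": "marketplace_b", "rating": 5.0, "reviews": 300},
    {"channel": "marketplace_a", "rating": 3.0, "reviews": 0},
]
summary = summarize_cluster(listings)
```

A plain mean of the three ratings would give 4.0 here, while the review-weighted figure is 4.75, which is one reason the "true rating" question is harder than it looks.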
Because not every brand defines product clusters in the same way, it is also necessary for advanced analytics platforms to allow flexibility in how the groupings are recognized. Consider two different flavors of your favorite potato chips brand. They come in the same bag, the same size, they’re even sold next to each other on the shelf. Are they unique on their own or should they be clustered as one product line? And what if the same flavor comes in different bag sizes? Or sold individually, in a pack of 3, or a pack of 6? The truth is always in the eye of the beholder.
The secret sauce: NLP breakthroughs for extracting market intelligence from ecommerce websites
Applying a quality product clustering solution on top of hundreds of thousands of products from multiple ecommerce channels is not a trivial task. To achieve product clustering with a high degree of accuracy, Skai employs a unique combination of processes that utilize patented NLP technologies, highly scalable autoML capabilities, and machine learning algorithms for brand refinement.
It starts with extracting deep knowledge about each product ingested into the platform and structuring that knowledge into proprietary data models. This includes identifying the product’s solution type, its key features and benefits, and many other product attributes, and then normalizing and refining all of the identified brand values to ensure consistent naming across the entire data set.
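As a rough illustration of the brand normalization step, the sketch below maps messy brand strings to one canonical name. It swaps in a simple lookup table for the machine learning refinement described above, and the brand names, the CANONICAL_BRANDS table, and both function names are hypothetical.

```python
import re

# Hypothetical alias table: normalized spelling -> canonical brand name.
CANONICAL_BRANDS = {
    "acme": "Acme",
    "acme beauty": "Acme",
    "acme cosmetics inc": "Acme",
}


def brand_key(raw):
    """Lowercase the brand and collapse punctuation and spacing."""
    return re.sub(r"[^a-z0-9]+", " ", raw.lower()).strip()


def normalize_brand(raw):
    """Return the canonical brand if known, else the trimmed original."""
    return CANONICAL_BRANDS.get(brand_key(raw), raw.strip())


brands = [normalize_brand(b) for b in ["ACME-Beauty", "  Acme Cosmetics, Inc. ", "Zenith"]]
```

Unknown brands pass through untouched, so a new brand simply shows up as its own value until someone adds it to the table.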
After the data is organized, it gets segmented. This step sets the criteria a product must meet to belong to a given segment. For example, only products that have the same brand name will be part of a specific segment. The segment can be narrowed further by including only products of the same solution type, flavor, color, size, and so on.
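Segmentation can be pictured as grouping products by a configurable set of attributes. The sketch below is one way to do that, assuming each product is a dictionary with fields such as brand, flavor, and size; those field names, the sample products, and the segment function are illustrative only.

```python
from collections import defaultdict


def segment(products, keys=("brand",)):
    """Group products that share the same values for the chosen attributes."""
    segments = defaultdict(list)
    for product in products:
        segments[tuple(product.get(k) for k in keys)].append(product)
    return dict(segments)


# Hypothetical potato chip listings.
products = [
    {"brand": "Acme", "flavor": "bbq", "size": "8oz"},
    {"brand": "Acme", "flavor": "sea salt", "size": "8oz"},
    {"brand": "Zenith", "flavor": "bbq", "size": "8oz"},
]
by_brand = segment(products)
by_brand_and_flavor = segment(products, keys=("brand", "flavor"))
```

Changing the keys is how the potato chip question gets answered differently for different brands: segmenting on brand alone keeps both Acme flavors together, while adding flavor splits them apart.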
Next, dictionaries are generated, curated, and applied to the data, which effectively cleans the titles of keyword stuffing and other irrelevant terms. These two steps, segmentation and cleaning, are carried out by expert analysts with the help of autoML capabilities; no additional coding work is required to create these configurations.
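Dictionary-based cleaning can be sketched in a few lines. The noise dictionary below is a tiny hypothetical example, not a curated one, and clean_title is an illustrative name.

```python
import re

# Hypothetical dictionary of marketing terms that carry no product information.
NOISE_TERMS = {"new", "best", "seller", "free", "shipping", "sale", "hot"}


def clean_title(title, noise_terms=NOISE_TERMS):
    """Lowercase the title, drop punctuation, and remove dictionary terms."""
    tokens = re.findall(r"[a-z0-9]+", title.lower())
    return " ".join(tok for tok in tokens if tok not in noise_terms)


cleaned = clean_title("NEW!! Matte Lipstick - Rose Pink *FREE SHIPPING* Best Seller")
# cleaned == "matte lipstick rose pink"
```

In practice the dictionary would likely differ by category, since a word that is noise for lipstick may be a meaningful attribute somewhere else.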
Finally, after the data has been organized, segmented, and cleaned, a K-means algorithm scans the full data set and determines the optimal clustering arrangement per segment. The resulting clusters are manually evaluated for precision (homogeneity of the cluster) and recall (completeness of the cluster). This approach has been shown to yield greater than 95% accuracy, which is well above the industry standard of around 78%.
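For readers who want to see the mechanics, here is a small, self-contained sketch of K-means over cleaned titles in one segment, followed by a precision and recall check against hand-assigned labels. Several things here are assumptions and not Skai's method: titles are represented as bag-of-words counts, the number of clusters k is passed in by hand, the starting centroids are picked with a simple farthest-first rule so the run is repeatable, and precision and recall are measured over pairs of listings. The sample titles and analyst labels are invented.

```python
from itertools import combinations


def vectorize(titles):
    """Turn each cleaned title into a bag-of-words count vector."""
    vocab = sorted({tok for title in titles for tok in title.split()})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for title in titles:
        vec = [0.0] * len(vocab)
        for tok in title.split():
            vec[index[tok]] += 1.0
        vectors.append(vec)
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iter=50):
    """Plain K-means; returns one cluster label per input vector."""
    # Assumption: deterministic farthest-first seeding instead of random starts.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(farthest))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def pairwise_precision_recall(predicted, truth):
    """Precision and recall over pairs of listings placed in the same cluster."""
    same_pred = same_true = both = 0
    for i, j in combinations(range(len(predicted)), 2):
        p = predicted[i] == predicted[j]
        t = truth[i] == truth[j]
        same_pred += p
        same_true += t
        both += p and t
    precision = both / same_pred if same_pred else 1.0
    recall = both / same_true if same_true else 1.0
    return precision, recall


# Hypothetical cleaned titles from one brand segment.
titles = [
    "matte lipstick rose pink",
    "rose pink matte lipstick",
    "matte lipstick pink",
    "lip balm coral gloss",
    "coral lip balm",
]
labels = kmeans(vectorize(titles), k=2)
analyst_labels = ["lipstick", "lipstick", "lipstick", "balm", "balm"]
precision, recall = pairwise_precision_recall(labels, analyst_labels)
```

On this toy segment the three lipstick titles and the two lip balm titles land in separate clusters. Low precision would mean different products were lumped together, while low recall would mean one product was split across clusters.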
————————————–
*This blog post originally appeared on Signals-Analytics.com. Kenshoo acquired Signals-Analytics in December 2020. Read the press release.