Data Taxonomy Best Practices (Gain Actionable Insights)

The human brain is primed to detect patterns and organize experiences into taxonomies. In fact, human intelligence is based upon our ability to group experiences into categories and concepts; when we experience something new, we’re able to respond intelligently and appropriately by instantly identifying how the new experience fits into the categories we’ve already learned.

Perhaps because our brains work this way naturally, it’s instinctive to apply taxonomic structures to complex concepts in the external world to make them easier to understand. Carolus Linnaeus, an 18th-century Swedish scientist, invented the Linnaean classification system of the natural world that we still use today (albeit in an updated form).

In 1869, the Russian scientist Dmitri Mendeleev invented the Periodic Table to simplify conversations around the building blocks of our existence. Websites, grocery stores, libraries, and many other well-organized digital and real-life spaces use taxonomy structures to arrange content and objects in logical, easy-to-find ways.

Organizing a grocery store is one thing. Organizing an ever-growing collection of raw data is quite another. Although classification is key to making sense of and manipulating data, few data analytics companies are able to apply consistent, useful taxonomies to big data in a way that produces transformative insights.

Why do we need data taxonomies?

Digital data creation is growing at an exponential rate. Every day, Facebook users post more than 250 million photos to the platform. Every second, Instagram users upload 1,000 photos. About 10% of all consumers write online product reviews for the products they buy, and more than 30,000 new CPG products are launched each year.

Many of our experiences—as both people and organizations—have moved online, and we’ve left digital footprints everywhere. Brands that are able to harness the meaning behind all of that data can generate valuable insights into their consumers, competitors, and the overall marketplace. Those insights drive optimizations in marketing, messaging, product development, and more. But it’s not an easy thing to do, for three reasons:

80-90% of data is unstructured. Because data is generated and stored in a variety of formats and locations, there’s no single “owner” and no simple way to search or format all of it. The problem only gets harder to solve as data grows exponentially every day.
Human communications are rarely straightforward. We’re clever with language. Hyperbole, jokes, sarcasm, and double entendres are common in a range of data types, especially social media posts and product reviews. Even when these data sources are structured and searchable, it’s hard to correctly tease out the nuance inherent in our language.
Few unstructured data types speak the same language. While structured data types often use unique identifiers (such as a SKU) to connect with other types, unstructured data types have no such commonalities. At a basic level, there may be multiple names (with multiple spellings) that describe the same thing; for example, consumers may misspell the word “avocado” or refer to it as “avo” in product reviews, while scientists and researchers may prefer the scientific name “Persea americana”. Additional complexity is introduced when terms develop double meanings that make it difficult to extract context. If a skincare brand were searching for consumer sentiment in “avocado face mask” reviews, how many results today would actually be about cloth face masks with an avocado theme, a totally irrelevant yet identically named product?
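To make the "multiple names for the same thing" problem concrete, here is a minimal sketch of mapping spelling variants onto one shared taxonomy value. The synonym table, the misspelling, and the function name are all hypothetical and chosen only for illustration; a production system would rely on a far richer language model than a lookup table.

```python
import re

# Hypothetical synonym table: every variant maps to one canonical taxonomy value.
CANONICAL = {
    "avocado": "avocado",
    "avo": "avocado",
    "avacado": "avocado",           # assumed common misspelling
    "persea americana": "avocado",  # scientific name
}

def tag_terms(text: str) -> set[str]:
    """Return the canonical taxonomy values mentioned in the text."""
    lowered = text.lower()
    found = set()
    for variant, canonical in CANONICAL.items():
        # Word boundaries keep "avo" from matching inside words like "avoid".
        if re.search(rf"\b{re.escape(variant)}\b", lowered):
            found.add(canonical)
    return found
```

With this table, a review that says "avo", a paper that says "Persea americana", and a misspelled post all receive the same tag. Note that a lookup like this does nothing for the double-meaning problem: it would tag a cloth "avocado face mask" exactly the same way as a skincare one, which is why context matters.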

The key to effective taxonomies

All data analytics companies can structure external data into taxonomies. But effective taxonomies, those that can provide specific, transformative insights, have an essential element: highly granular, highly relevant taxonomy values that are consistent across all data sources and data points. If every data point can be tagged using the same parameters that are important to the business, then every data point can be connected to every other.

Classifying data into effective taxonomies is a game-changer for brands that want to make data-driven decisions. Taxonomies connect and organize all data across an organization, making it possible for data software platforms to quickly and easily search for information, extract sentiment, and generate meaningful visuals. Most interestingly, taxonomies allow brands to connect both market chatter and the voice of the consumer with their products, revealing a brand’s strengths, gaps, and opportunities.

How effective taxonomies organize data

Imagine data points as individual books. Unstructured data is akin to haphazard piles of books everywhere and no card catalog to guide you to the information you want. A so-so taxonomy might organize the big pile of books into a few different stacks based on genre. If you were hoping to sort your books by the author’s last name or publication date, you’d be out of luck.

An effective taxonomy is like a giant bookcase, with books organized onto shelves and tagged with thousands of specific attribute identifiers like genre, number of pages, publication year, author name, and any other tags that are important to your bookstore. The books can be easily reshuffled by identifier because every book, from comic books to scientific journals, uses the same taxonomy values, or Key Intelligence Parameters (KIPs).

Connecting two or more KIPs can generate trend predictions. For instance, to see if romance novels are becoming more popular over time, you could plot sales figures for books in the “romance” category. If your identifiers are appropriately specific, you can even roll some of them up into broader classifications for a more zoomed-out view of a category. Perhaps you know that a segment of your audience enjoys both biographies and academic texts; combining the two into a “non-fiction” category allows you to include more data sets for more accurate analytical results.
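The bookstore example can be sketched in a few lines. The sales records, genre names, and roll-up mapping below are invented for illustration; the point is only that a trend view and a zoomed-out view both fall out of the same tags once every record uses them consistently.

```python
from collections import defaultdict

# Hypothetical records: every sale is tagged with the same attributes.
SALES = [
    {"genre": "romance", "year": 2019, "units": 120},
    {"genre": "romance", "year": 2020, "units": 150},
    {"genre": "romance", "year": 2021, "units": 210},
    {"genre": "biography", "year": 2021, "units": 80},
    {"genre": "academic", "year": 2021, "units": 40},
]

# Hypothetical roll-up: specific genres grouped under a broader label.
ROLLUP = {"biography": "non-fiction", "academic": "non-fiction"}

def units_by_year(sales, genre):
    """Combine two tags (genre and year) into a simple trend series."""
    totals = defaultdict(int)
    for row in sales:
        if row["genre"] == genre:
            totals[row["year"]] += row["units"]
    return dict(sorted(totals.items()))

def units_by_rollup(sales, rollup):
    """Total units per broad label; genres without a roll-up keep their own name."""
    totals = defaultdict(int)
    for row in sales:
        label = rollup.get(row["genre"], row["genre"])
        totals[label] += row["units"]
    return dict(totals)
```

Calling units_by_year(SALES, "romance") gives the year-by-year series you would plot, while units_by_rollup(SALES, ROLLUP) merges biographies and academic texts into a single "non-fiction" total.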

Taxonomies Broke the Mold

A platform that configures the most effective taxonomies possible uses three techniques:

Using super granular KIPs. Every feature, benefit, ingredient, trend, SKU, competitor, and detail that’s important to a brand is included in its KIPs. Getting the specific KIPs right is the hardest part; once those are set, it’s easy to group several KIPs together under bigger themes to examine different angles on a trend or attribute.
Creating a shared language for all data types. It’s common for data analytics companies to structure all the unstructured data types they analyze. But many stop there, and structuring alone is not enough. Instead, all data should be both structured and normalized before building taxonomies; that second step is rare but essential for getting the most out of your data. Normalizing all data types with a Natural Language Processing (NLP) engine using the same taxonomy values effectively translates them all into the same language, making it possible to see how KIPs naturally affect or interact with one another in all corners of the real world.
Taking a flexible approach. Every data source has a different level of language specificity. Patents and scientific research papers are often the most specific in their language. By contrast, consumers will rarely start a social media discussion using the SKU of the product being discussed; instead, those conversations tend to be conducted at the product type level. The Skai platform has the flexibility required to ensure all use cases are represented in the data outputs. Rolling several taxonomy values together under a single umbrella will capture both the most specific mentions of a term and broader conversations about the same thing. So a CPG brand that’s interested in tracking trends in organic products can combine terms like “organic”, “100% natural”, and “no preservatives” under the same “organic” umbrella to fully capture all the ways that consumers talk about organic and organic-adjacent products.

There is another type of flexibility, too: the ability to add new taxonomy values quickly, easily, and on demand for an early-stage view on what’s trending or happening in the world. New terms—like “biodiverse”, in the organic food world—and new developments in the world at large—like the COVID-19 pandemic—are simple to add to existing taxonomy values since all of our data is normalized.
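Both kinds of flexibility can be illustrated with a small sketch. This is not how the Skai platform is implemented; the umbrella dictionary and function names are hypothetical, and the matching is deliberately naive (plain substring checks on lowercased text).

```python
# Hypothetical umbrella: one broad label backed by several specific terms.
UMBRELLAS = {
    "organic": {"organic", "100% natural", "no preservatives"},
}

def umbrellas_for(text, umbrellas):
    """Return every umbrella label with at least one term present in the text."""
    lowered = text.lower()
    return {
        label
        for label, terms in umbrellas.items()
        if any(term in lowered for term in terms)
    }

def add_term(umbrellas, label, term):
    """Add a new term on demand, creating the umbrella if it does not exist yet."""
    umbrellas.setdefault(label, set()).add(term.lower())
```

A post mentioning "no preservatives" lands under "organic" even though it never uses that word. After add_term(UMBRELLAS, "organic", "biodiverse"), posts about biodiverse farming are picked up too, with no change to the rest of the data.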

How Skai configures data taxonomies

When creating taxonomy values for a category that’s new to Skai, we take two different paths to ensure total coverage. The first is a top-down approach. Every vertical already has some taxonomy values in common use, such as product filters, attributes that appear in product descriptions, and categories and subcategories in various ecommerce channels. We start with these familiar taxonomies. Then we add a bottom-up approach, running huge data sets through an NLP engine to surface meaningful keywords with a high recurrence. This helps us identify keywords that are harder to spot at scale using the top-down method.
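As a rough illustration of the bottom-up step, the sketch below counts recurring words across a set of documents and keeps only those not already covered by the top-down list. It assumes simple word counting stands in for the NLP engine, which in practice would do far more (phrases, lemmatization, context); the known values, stopword list, and threshold are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical top-down values already in common use for the category.
TOP_DOWN = {"organic", "vegan"}

# Small, assumed stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "and", "is", "it", "this", "so", "of", "but"}

def surface_candidates(docs, known, min_count=2):
    """Return words that recur across documents but are not yet taxonomy values."""
    counts = Counter()
    for doc in docs:
        # Count each word once per document so one long review cannot dominate.
        words = set(re.findall(r"[a-z]+", doc.lower()))
        counts.update(words - STOPWORDS - known)
    return sorted(word for word, count in counts.items() if count >= min_count)
```

Run over a handful of snack reviews, this would surface words such as "crunchy" or "recyclable" that never appeared in any product filter but keep turning up in what consumers actually write.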

Once the taxonomy values are identified, we create custom combinations of values to reflect mega and micro trends. Combining values in this way presents new views on large market trends that affect several categories as well as very specific connections between, say, product attributes and perceived benefits for a particular product line.

PepsiCo, for example, used the Skai platform to track mega and micro trends across their entire portfolio of products to reveal new product development opportunities they had never seen before.

Brands like PepsiCo turn to Skai because nobody else can provide the level of granular insights that we can. And that’s all thanks to our taxonomies! Read the full case study.

To find out how Skai could help your brand do more with data, contact us for a demo.