One thing I’ve learned from my past year in data science is that it is an iterative process – identify high-impact questions, preprocess data, build a model, interpret results, repeat. Each phase in this cycle is crucial to success, but let’s focus on “repeat” and what it means. Ask yourself: What do the results tell us? Is this what we would expect? Are the results actionable? Can we improve them? Identify absolutely everything about this analysis that makes sense, what needs to be clarified through a separate approach, and what needs to be changed. Then make adjustments and continue.
Let’s look at an example. Company A sells running gear. It now wants to start using analytics in its business (as it should!) to gain more insight into which shoes it should be buying. It continually orders the same four brands: Asics, Nike, Brooks, and Saucony. The following is a sample table depicting sales for the past three years.
You can determine from the numbers that Asics and Nike consistently have more sales than Brooks or Saucony. You might even have identified each brand’s peak year – great – but all this really tells you is which brand logo you most often saw walking out your door. What do these numbers mean? We are now at the “repeat” phase and need to take the analysis one step further. We want data that tells us something; we want actionable results. To get these answers, we partition our numbers into the categories by which we order the shoes: brand, style, sex, color, width, and size. A single row might look like this: Asics – style A – Women – Pink – Wide – Size 8.
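To make the partitioning step concrete, here is a minimal sketch in Python using only the standard library. The Sale record, its field names, and every number in it are hypothetical stand-ins, not Company A’s real data. The illustrative helper totals_by sums units for each combination of the fields you name, so the same function gives both the coarse brand-level view from the sample table and the full brand, style, sex, color, width, and size breakdown.

```python
from collections import defaultdict
from typing import NamedTuple


class Sale(NamedTuple):
    # Hypothetical sales line; field names are illustrative assumptions.
    year: int      # e.g. 1, 2, 3 for the three years in the sample table
    brand: str
    style: str
    sex: str
    color: str
    width: str
    size: float
    units: int


def totals_by(sales, *fields):
    """Sum units sold for every combination of the named fields."""
    totals = defaultdict(int)
    for s in sales:
        # Build the grouping key from the requested fields, in order.
        key = tuple(getattr(s, f) for f in fields)
        totals[key] += s.units
    return dict(totals)


# Made-up records for illustration only.
sales = [
    Sale(1, "Asics", "A", "Women", "Pink", "Wide", 8.0, 12),
    Sale(1, "Asics", "A", "Women", "Pink", "Regular", 8.0, 5),
    Sale(1, "Saucony", "B", "Women", "Yellow", "Regular", 7.5, 9),
]

# First pass: brand totals only, like the sample table.
print(totals_by(sales, "brand"))  # {('Asics',): 17, ('Saucony',): 9}

# Second pass: the full ordering key.
for key, units in totals_by(sales, "brand", "style", "sex", "color", "width", "size").items():
    print(key, units)
```

Adding "year" to the field list, as in totals_by(sales, "brand", "year"), would recover the per-brand, per-year view that the peak-year observation relies on.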
The first table led us to believe that Brooks and Saucony are less popular shoes. By dividing the brands into smaller segments, we discover that we carry half as many Brooks styles – no wonder we sell fewer, we stock fewer! This undercuts our initial inference about this brand. Next we move on to Saucony and notice that style B – women – yellow sells extremely well at our store. Compare it with the other Saucony shoes and you will notice that it is the only brightly colored shoe we offer from this brand. But we thought Saucony was just less popular? Is the success of this color just a coincidence? Summing sales by color, we see that brightly colored shoes consistently blow all other colors out of the water. This trend does not mean you should immediately stop ordering anything other than a blinding neon spectrum of shoes; instead, let these numbers point to areas of opportunity, such as adding another Saucony style in one of the colors proven to be a hit with your customers. There may even be specific colors popular among women or men that will let you make still more informed decisions for your business.
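The two checks in this paragraph, adjusting for how many styles each brand carries and summing sales by color, can be sketched the same way. Again, the records below are invented for illustration, and units_per_style and totals_by_color are hypothetical helper names, not part of any library.

```python
from collections import defaultdict
from typing import NamedTuple


class Sale(NamedTuple):
    # Hypothetical, simplified sales line: brand, style, color, units sold.
    brand: str
    style: str
    color: str
    units: int


def units_per_style(sales):
    """Average units sold per distinct style offered, for each brand."""
    units = defaultdict(int)
    styles = defaultdict(set)
    for s in sales:
        units[s.brand] += s.units
        styles[s.brand].add(s.style)
    return {brand: units[brand] / len(styles[brand]) for brand in units}


def totals_by_color(sales):
    """Total units sold for each color, across all brands and styles."""
    totals = defaultdict(int)
    for s in sales:
        totals[s.color] += s.units
    return dict(totals)


# Made-up records: Asics carries four styles, Brooks only two.
sales = [
    Sale("Asics", "A", "Black", 40),
    Sale("Asics", "B", "White", 30),
    Sale("Asics", "C", "Pink", 50),
    Sale("Asics", "D", "Gray", 40),
    Sale("Brooks", "A", "Black", 35),
    Sale("Brooks", "B", "Orange", 45),
]

print(units_per_style(sales))  # {'Asics': 40.0, 'Brooks': 40.0}
print(totals_by_color(sales))  # {'Black': 75, 'White': 30, 'Pink': 50, 'Gray': 40, 'Orange': 45}
```

In this made-up data Asics sells twice as many units as Brooks overall, yet both brands average 40 units per style, which is exactly the kind of result that should send you back to the “repeat” step before concluding that one brand is less popular.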
This example, although a very basic representation of the iterative process, demonstrates how important it is to understand the story that the results are telling and how you can use this information. Identify areas of weakness in the story, hit the “repeat” button, improve your output, and experience the power of data science.