The Secret to Knowing the Unknowable: Predictive Analytics
Updated: Jun 14, 2019
There are two branches of analytics that are used widely to describe the ways we can summarize large amounts of data. The first way simply describes the data. We call this descriptive analytics.
Descriptive analytics has a wonderful use. It helps summarize the data into readable chunks so we can understand what happened in the big pile of messy data. With descriptive statistics, we can begin to make decisions because we can analyze what we know.
But what happens when we want to know something we haven’t seen yet (or are not able to ask directly because gathering that data would be impossible or too costly?) Questions like…
Will my product sales be higher next year than they are this year?
What kind of product will my customers buy tomorrow?
What is the best way to respond to a brand crisis?
How much should I spend on advertising next year?
The 2nd form of analytics is much more interesting and, in my opinion, useful. It’s called predictive analytics.
By definition, predictive analytics is inferring something we do not know (our prediction) from something we know (data we’ve collected). The reason to use predictive analytics is when the cost to acquire some information is too much. Either the question is impossible to ask or it would take us too long to acquire that information.
Predictive analytics is using data we have available to build a model that allows us to predict data that doesn't exist yet.
With predictive analytics, I can make better decisions because I have a model that helps me understand something I didn’t know before.
To be clear, predictive analytics cannot reduce all risk. It’s impossible to know what WILL happen. The goal of predictive analytics is to understand what MIGHT happen and all of the caveats that went into the analysis.
Let’s talk about the two types of predictive analytics.
1.) Extrapolate - Time Series Forecasting
This version of predictive analytics is relatively straightforward. Most decision makers are familiar with it and how to use it. In time series forecasting, I want to know what may happen in the future given the trends of the past.
This trend line, which is typically a time series model, summaries the past trajectory of data. The magic happens when you take that model summary and use it to extrapolate future time where data doesn’t exist.
In the case of extrapolation, the reason we do not have data is that it doesn’t exist yet! We cannot measure future events (at least not yet — that I know of). Knowing the future is the realm of science fiction and fantasy — looking at you Bran Stark (Game of Thrones).
2.) Non-Temporal Predictive Analytics
If you’re a follower of my blog, then you already know of an example of non-temporal predictive analytics — where I used a model of news preference to predict which data science articles I want to read for the day.
Read more here: MachinaNova — News Reco Engine
In the case of MachinaNova, I was personalizing my daily news experience. In this application, I collected data on my preferences for data science articles. Based on this past preference, I built a natural language model that predicts new articles that would appeal to me based on the new article’s content (words, topics, etc.). Sounds cool, right? You can read more on how that was accomplished in my blog linked above.
The idea of non-temporal predictive analytics as the model is irrelevant. We could build a model in MANY different ways. The main idea is to make sure you have a way to understand its accuracy. In the case of MachinaNova, we understood the accuracy of the model by splitting our training data set and using some of that data to predict data the model hadn’t seen yet. Since we understood actual article preference, we could compare the model’s output versus actual results to understand how good the model is.
This is ridiculously important. Anyone can model anything based on what they think they know about the world. Successful predictive models are those that can remain accurate with data the model was not trained with. A model that falls apart outside of its training data set is completely useless — or, worse, will give you inaccurate predictions.
In the case of non-temporal predictive analytics, the reason we do not know something is that data is really difficult or impossible to ask. Could you imagine if Amazon had to ask you what products you like every time you visited their website? Non-temporal predictive analytics to the rescue!
In this post, we explored predictive analytics—using data we have to understand what MIGHT happen in the future. We also know how it’s different than descriptive analytics—which is using data to summarize what happened in the past.
The main advantage of having a predictive model is that we can extrapolate the model to areas where data doesn’t exist and make predictions. This is extremely useful because knowing everything past, present and future—is impossible. At least in the real world.