First, whenever we spoke with machine learning experts (data scientists focused on training and
testing predictive models) about the most difficult part of their job, they said again and again,
“the data is a mess.” Initially, taking that statement literally, we imagined it referred to well-
known issues with data — missing values or a lack of coherence across databases. But as we dug
deeper, we realized the problem was slightly different. In its rawest form, even clean data is too
overwhelming and complex to be understood at first glance, even by experts. It has too many
tables and fields and is often collected at a very high granularity (for example, online clickstreams
generate new data with every click, and sensor data is collected at 125 observations per second).
Machine learning experts are used to working with data that’s already been aggregated into
useful variables, such as the number of website visits by a user, rather than a table of every action
the user has ever taken on the site.
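To make the contrast concrete, here is a minimal sketch of that aggregation step, assuming a hypothetical clickstream of (user, timestamp, action) rows. The field names, the sample rows, and the `aggregate_by_user` helper are illustrative assumptions, not taken from any particular system.

```python
from collections import defaultdict

# Hypothetical raw clickstream: one row per user action (user_id, timestamp, action).
raw_events = [
    ("u1", "2017-03-01T09:00:00", "visit"),
    ("u1", "2017-03-01T09:00:05", "click"),
    ("u1", "2017-03-02T14:30:00", "visit"),
    ("u2", "2017-03-01T11:15:00", "visit"),
    ("u2", "2017-03-01T11:15:10", "click"),
    ("u2", "2017-03-01T11:15:20", "click"),
]


def aggregate_by_user(events):
    """Collapse event-level rows into one row of summary variables per user."""
    features = defaultdict(lambda: {"visits": 0, "clicks": 0})
    for user_id, _timestamp, action in events:
        if action == "visit":
            features[user_id]["visits"] += 1
        elif action == "click":
            features[user_id]["clicks"] += 1
    return dict(features)


# One summary row per user: the form modelers are used to receiving.
user_features = aggregate_by_user(raw_events)
print(user_features)
```

The raw table grows with every click, while the aggregated table has one row per user. Deciding which summary variables to compute is part of the formulation work discussed in the rest of this piece.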
At the same time, we often heard business experts complain that “we have a lot of data and we are
not doing anything with it.” Further investigation revealed that this was not strictly correct
either. Instead, this frustration stems from two problems. For one thing, because it takes so long
to understand, formulate, and process data for a machine learning problem, machine learning
experts tend to focus on the later parts of the pipeline (trying different models, or tuning a
model’s hyperparameters once a problem is formulated) rather than on formulating new predictive
questions for different business problems. As a result, while business experts keep coming up with
problems, machine learning experts cannot always keep up.
For another, machine learning experts often don’t build their work around the final objective:
deriving business value. In most cases, predictive models are meant to improve efficiency,
increase revenue, or reduce costs. But the people actually working on the models rarely ask, “What
value does this predictive model provide, and how can we measure it?” Asking this question about
the value proposition often leads to a change in the original problem formulation, and it is often
more useful than tweaking later stages of the process. At a recent panel filled with machine
learning enthusiasts, I polled the audience of about 150 people, asking, “How many of you have
built a machine learning model?” Roughly one-third raised their hands. Next, I asked, “How many of
you have deployed and/or used this model to generate value, and evaluated it?” No one raised a
hand.