by Tim Lindeman | July 31st 2018
Predictive analytics is a topic generating great hype and great hope in healthcare and other industries. As this area of data science matures, it is important to remember that predictive analytics is not defined by one technology or technique, although it can be roughly divided into two approaches: pattern recognition and simulation.
Pattern recognition is the most common approach and the foundation of much-hyped machine learning and artificial intelligence. Simulation is a more human-centered alternative for understanding business problems, predicting future trends, and recommending optimum decisions. In this post, I explain the essentials of simulation and highlight three of its advantages.
Pattern recognition vs. simulation
Pattern recognition is inherently data-centric. You throw a bunch of data at an algorithm, and the algorithm finds patterns in that data and extrapolates them into future trends. This is the backbone of data mining, machine learning, and AI. Other things being equal, the larger the data set, the greater the accuracy of the predictions. Therefore, big data is highly desired.
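To make the data-centric idea concrete, here is a minimal sketch of the simplest possible pattern recognizer: a least-squares trend line fitted to past values and extended forward. The function names `fit_trend` and `extrapolate` and the sample figures are illustrative assumptions, not anything from a particular product, and real machine learning systems use far richer models.

```python
def fit_trend(values):
    """Fit an ordinary least-squares line through (0, v0), (1, v1), ...

    Needs at least two data points. Returns (slope, intercept).
    """
    n = len(values)
    mean_t = (n - 1) / 2
    mean_v = sum(values) / n
    covariance = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    variance = sum((t - mean_t) ** 2 for t in range(n))
    slope = covariance / variance
    intercept = mean_v - slope * mean_t
    return slope, intercept


def extrapolate(values, steps_ahead):
    """Extend the fitted line past the last observation."""
    slope, intercept = fit_trend(values)
    last_t = len(values) - 1
    return [intercept + slope * (last_t + k) for k in range(1, steps_ahead + 1)]


# Hypothetical monthly sales history, made up for illustration.
history = [10.0, 12.0, 14.0, 16.0]
forecast = extrapolate(history, steps_ahead=2)
```

Notice that nothing in this code knows why sales are rising. It only knows that they have been.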
Simulation, in contrast, is model-centric. You start by using human knowledge of cause and effect to create a model of the system in which the problem operates. You then connect the data you have available with that model to obtain a future projection. For example, to predict future sales, you would model the key causal factors, such as sales staff experience, product quality, and various market factors, along with how they all relate to one another. Other things being equal, the greater the expertise of the humans involved, the greater the accuracy of the predictions.
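The sales example can be sketched as a tiny causal model stepped forward in time. Everything here is an assumption made for illustration: the `SalesModel` parameters, their values, and the specific relationships (reps gain experience each month, demand grows steadily) stand in for what domain experts would actually supply.

```python
from dataclasses import dataclass


@dataclass
class SalesModel:
    # Hypothetical expert-supplied causal parameters (not from any real system).
    base_sales_per_rep: float = 10.0  # monthly units for a fully experienced rep
    product_quality: float = 0.8      # 0..1 multiplier on closed deals
    market_growth: float = 0.01       # fractional demand growth per month
    learning_rate: float = 0.1        # share of the experience gap closed per month


def simulate_sales(model, reps, experience, months):
    """Project monthly unit sales by stepping the causal model forward."""
    demand_index = 1.0
    projection = []
    for _ in range(months):
        sales = (reps * model.base_sales_per_rep * experience
                 * model.product_quality * demand_index)
        projection.append(sales)
        # Causal updates: reps learn on the job, and the market expands.
        experience += model.learning_rate * (1.0 - experience)
        demand_index *= 1.0 + model.market_growth
    return projection


projection = simulate_sales(SalesModel(), reps=5, experience=0.5, months=12)
```

Available data would be used to calibrate parameters such as `learning_rate`, while the structure of the model comes from human knowledge of how the business works.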
The fundamental difference between the two approaches is that pattern recognition relies on correlation, while simulation relies on human knowledge of causation.
Advantages of simulation
1. Simulation integrates signals missing in the data
Often, key causal factors are not present in your data. For example, soft factors, such as time pressure, morale, and reputation, can have a significant effect on desired outcomes, but are rarely captured by information systems. In simulation, everything that is known about the missing factors can be included in the model, and unknown factors can be estimated. The resulting projections will take these factors into consideration and quantify the degree of uncertainty.
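One common way to estimate an unknown factor and quantify the resulting uncertainty is Monte Carlo sampling: run the model many times with the factor drawn from a plausible range and report the spread of outcomes. The sketch below assumes a hypothetical rule in which morale scales realized capacity, and an expert-guessed morale range of 0.7 to 1.0. Both are invented for illustration.

```python
import random
import statistics


def project_output(capacity, morale):
    """Hypothetical causal rule: morale scales how much capacity is realized."""
    return capacity * morale


def monte_carlo_projection(capacity, morale_low, morale_high, runs=10_000, seed=0):
    """Sample the unmeasured morale factor and summarize the outcome spread."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    outcomes = sorted(
        project_output(capacity, rng.uniform(morale_low, morale_high))
        for _ in range(runs)
    )
    return {
        "p05": outcomes[runs * 5 // 100],
        "median": statistics.median(outcomes),
        "p95": outcomes[runs * 95 // 100],
    }


summary = monte_carlo_projection(capacity=100.0, morale_low=0.7, morale_high=1.0)
```

The gap between `p05` and `p95` is a simple expression of how much the missing factor could move the projection.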
2. Simulation has relatively low data acquisition and processing costs
In contrast to pattern recognition, which relies on large volumes of high-quality data, simulation uses the data that is available and supplements it with knowledge. In addition, simulation does not need to gather all the data that “might be related” to the problem and then search it for meaningful correlations. The causes of the problem are already built into the model. Therefore, simulation often has a less time-consuming and less costly data acquisition stage.
3. Simulation predictions are highly reliable
One of the challenges with pattern recognition is that correlation does not always reflect causality. Often data will contain correlations that appear to reflect causal relationships but do not. Such spurious correlations, which are common in big data analysis, lead to failed predictions. Simulation starts with expert understanding of cause and effect, which is grounded in scientific knowledge, and produces reliable results. Simulation also employs a model testing and adjustment phase that improves both predictive accuracy and our understanding of cause and effect.
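A small synthetic experiment shows how a spurious correlation arises. In this sketch, two measured variables `a` and `b` are both driven by a hidden common cause, so they correlate strongly even though neither affects the other. When `a` is set independently (an intervention), the correlation disappears. The variable names, noise levels, and sample size are all made-up assumptions.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def sample_pairs(n, intervene, seed=0):
    """Return n (a, b) samples; both follow a hidden driver unless we intervene on a."""
    rng = random.Random(seed)
    a_vals, b_vals = [], []
    for _ in range(n):
        hidden = rng.gauss(0.0, 1.0)  # unmeasured common cause
        if intervene:
            a = rng.gauss(0.0, 1.0)   # a is set without regard to the hidden cause
        else:
            a = hidden + rng.gauss(0.0, 0.3)
        b = hidden + rng.gauss(0.0, 0.3)  # b never depends on a
        a_vals.append(a)
        b_vals.append(b)
    return a_vals, b_vals


observed = pearson(*sample_pairs(5000, intervene=False))
intervened = pearson(*sample_pairs(5000, intervene=True))
```

With these settings, `observed` should come out strongly positive while `intervened` should sit near zero. A pattern recognizer trained only on the observed data would wrongly expect that changing `a` moves `b`.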
With such clear advantages, you might wonder why simulation has been getting so little attention during this wave of hype around predictive analytics. Up to now, simulation’s biggest proponents have been academics and specialized consulting firms who have implemented applications in a broad range of industries. Dimensional Insight recognizes the predictive power of simulation and is exploring possible healthcare applications in partnership with Ventana Systems, a company with deep MIT roots and more than 30 years of experience delivering predictive solutions.