Classical statistics versus statistical learning

You might be one of the many people who learned about t-tests, ANOVA, p-values and null hypothesis testing during their studies at university. These research tools belong to “classical statistics”, which dates back almost 100 years. These statistical methods were designed in a time of mechanical calculators, when mathematics was a pure paper-and-pencil matter and when analyzing data from small samples of a few individuals was the norm.

Classical statistics versus statistical learning: collision of thought styles | ©Bimbim @shutterstock.com

Things have changed dramatically since then. Computers have permeated almost every aspect of human life. Computation and working-memory resources have become ever more affordable and accessible, to the point of being taken for granted. Not only has the hardware that we use to solve dedicated tasks evolved, but so have the properties of the datasets that we process with it. The internet and other abundant means of information storage have enabled the accumulation of ever bigger and more complex data from ever more individuals. What has not changed so much, however, is the broad use of classical statistical methods in many domains. This appears to be especially true for research in academia.

“How classical statistics and statistical learning collide but provide complementary methodological approaches.”

How come? It turns out that classical statistics enjoys almost unchallenged use by academics (at least in biology, medicine, psychology and sociology). In stark contrast, many areas of industry have happily migrated to an alternative statistical framework: “statistical learning”. This statistical regime is more naturally adopted by people from computer science, physics, and engineering, typically without formal training in statistics or mathematics. Statistical learning tries to get by with the fewest assumptions possible, to let the data speak for themselves, and to treat statistical certainty and computational cost within the same process. Classical statistics, on the other hand, tries to make a set of explicit assumptions, derive analytical solutions by mathematical proofs, and then use these to estimate pre-specified models. As an oversimplified intuition, statistical learning aims at deriving flexible and partly unknown models from the data, while classical statistics tests human-specified models based on data. Moreover, statistical learning is more directly dedicated to the prediction of the future, whereas classical statistics is concerned with generalizing from a sample to the general population.

Importantly, the diverging historical origins, conceptual foundations, and analysis goals are currently leading to a clash of thought styles in practice. Indeed, the boundaries and relationships between statistical learning and classical statistics do not currently appear to be rigorously defined.
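To make the contrast between testing and prediction concrete, here is a minimal sketch on simulated data that uses only the Python standard library. Everything in it is an illustrative assumption rather than part of any real study: the group sizes, the small effect size, and the function names `welch_z_test` and `cross_validated_accuracy`. The first function asks the classical question (do the two groups differ in the population?), using a large-sample normal approximation to a two-sample test. The second asks the statistical-learning question (how well can group membership be predicted for unseen individuals?), using k-fold cross-validation of a deliberately simple threshold rule.

```python
import random
from statistics import NormalDist, mean, variance


def welch_z_test(a, b):
    """Classical inference: test the null hypothesis of equal group means.

    Uses the Welch statistic with a normal approximation for the p-value,
    which is only reasonable for large samples.
    """
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def cross_validated_accuracy(a, b, k=5, seed=0):
    """Statistical learning: estimate out-of-sample prediction accuracy.

    The "model" is a threshold halfway between the two group means,
    fitted on the training folds and scored on the held-out fold.
    """
    data = [(x, 0) for x in a] + [(x, 1) for x in b]
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]
    correct = 0
    for i, test_fold in enumerate(folds):
        train = [pt for j, fold in enumerate(folds) if j != i for pt in fold]
        mean0 = mean(x for x, y in train if y == 0)
        mean1 = mean(x for x, y in train if y == 1)
        threshold = (mean0 + mean1) / 2
        for x, y in test_fold:
            # Predict the group whose training mean lies on the same side.
            predicted = int((x > threshold) == (mean1 > mean0))
            correct += predicted == y
    return correct / len(data)


# Simulated data (assumption): two large groups with a small mean difference.
rng = random.Random(42)
group_a = [rng.gauss(0.0, 1.0) for _ in range(1000)]
group_b = [rng.gauss(0.2, 1.0) for _ in range(1000)]

z, p = welch_z_test(group_a, group_b)
acc = cross_validated_accuracy(group_a, group_b)
print(f"z = {z:.2f}, p = {p:.4f}, cross-validated accuracy = {acc:.2f}")
```

With a small true effect and large samples, a test like this will often come out statistically significant while predictions for single individuals stay close to chance: for two equal-variance normal groups whose means differ by 0.2 standard deviations, even the best possible threshold classifies only about 54% of individuals correctly. The two frameworks answer different questions about the same data.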

Conclusion

One thing is certain, however: there is not one statistical world; several coexist. It is the available data and the question at hand that together determine which statistical framework is most appropriate.

Learn more:

My column series on Machine Learning in Nature Methods: [1] [2] [3]
An opinion piece on the potential future of deep learning in biomedicine with John Ioannidis
Inference in the age of big data in Neuroscience and Medicine
Machine learning for precision psychiatry
My talk (slides): 10 reasons why precision psychiatry will not be based on classical null-hypothesis testing
Statistical Modeling: The Two Cultures by Leo Breiman
Classical Statistics and Statistical Learning in Imaging Neuroscience
Frontiers in Massive Data Analysis (2013)
