10 Modern Statistical Concepts Discovered by Data Scientists
Here’s the list:
- Clustering using tagging or indexation methods, allowing you to cluster text (articles, websites) much faster than any traditional statistical technique, with a scalable algorithm that is very easy to implement (sketch 1 below)
- Bucketization – the science and art of identifying the right homogeneous data buckets (millions of buckets among billions of observations) to provide highly localized (or segment-targeted) predictions, or to smooth regression parameters across similar buckets, with strong statistical significance. It is equivalent to joint (not sequential) binning in multiple dimensions, which is a combinatorial optimization problem. While decision trees also produce some bucketization, the data science approach is more robust, simple, scalable, and model-free. It does not directly produce decision trees, and it leads to easy interpretation (in a fraud detection problem, each data bucket corresponds to a specific type of fraud). A related problem is bucket clustering, via standard hierarchical clustering techniques (sketch 2 below)
- Random number generation, a 3,000-year-old problem that has benefited from data science advances: for instance, using the digits of irrational numbers such as Pi or SQRT(2), produced with very fast algorithms, to simulate randomness (sketch 3 below)
- Model-free confidence intervals, getting rid of p-values, hypothesis testing, asymptotic analysis, errors due to poor model fitting or outliers, and a host of obscure, old-fashioned statistical concepts (sketch 4 below)
- Variable / feature selection and data reduction without L2-based, model-based techniques such as PCA, which are potentially numerically unstable, sensitive to outliers, and difficult to interpret (sketch 5 below)
- Hidden decision trees, a hybrid technique combining a form of averaged decision trees with Jackknife regression; more accurate, and far easier to code, implement, and interpret than either logistic regression or traditional decision trees, and not subject to over-fitting, unlike its statistical ancestors (sketch 6 below)
- Jackknife regression, a universal, simplified regression technique that is easy to code and to integrate into black-box analytical products. Traditional statistical science offers hundreds of regression techniques; nobody but statisticians knows which one to use, and when, which is obviously a nightmare in production environments (sketch 7 below)
- Predictive power and other synthetic metrics designed for robustness rather than for mathematical elegance
- Identification of true signal in data subject to the curse of big data, where spurious correlations arise purely by chance once huge numbers of variables are compared (sketch 8 below)
- New data visualization techniques – in particular, using data videos to display insights (sketch 9 below)
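
The sketches below are minimal, hedged illustrations of the ideas above, written in plain Python; all function names, thresholds, and toy datasets are assumptions made for the examples, not implementations taken from the referenced techniques.

Sketch 1: tag-based text clustering via an inverted keyword index. Documents sharing a sufficiently rare keyword are merged with a union-find structure, so the whole procedure is two linear passes plus near-constant-time merges; no pairwise distance matrix is ever built, which is what makes the approach scale.

```python
# Hypothetical sketch: fast text clustering via an inverted keyword index.
# One pass builds the index, one pass merges documents that share a keyword.
from collections import defaultdict

STOPWORDS = {"the", "a", "of", "and", "to", "in", "for", "on", "with"}

def tokenize(text):
    return [w for w in text.lower().split() if w not in STOPWORDS]

def cluster_by_keywords(docs, max_doc_freq=0.5):
    # Inverted index: keyword -> list of document ids.
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for word in set(tokenize(text)):
            index[word].append(doc_id)

    # Union-find over documents, with path halving.
    parent = list(range(len(docs)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    def union(i, j):
        parent[find(i)] = find(j)

    # Merge documents sharing a keyword, ignoring near-ubiquitous words.
    for word, doc_ids in index.items():
        if len(doc_ids) <= max_doc_freq * len(docs):
            for other in doc_ids[1:]:
                union(doc_ids[0], other)

    clusters = defaultdict(list)
    for doc_id in range(len(docs)):
        clusters[find(doc_id)].append(doc_id)
    return list(clusters.values())

docs = [
    "python machine learning tutorial",
    "deep learning with python",
    "stock market crash analysis",
    "market volatility and stocks",
]
print(cluster_by_keywords(docs))   # [[0, 1], [2, 3]]
```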
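Sketch 2: bucketization as joint (not sequential) binning. Observations are keyed by the tuple of their binned feature values; dense buckets receive a localized prediction, and sparse buckets fall back to a smoothed estimate (here simply the global mean, standing in for smoothing across similar buckets). The cut points and min_count threshold are assumptions for the example.

```python
# Hypothetical sketch: joint multi-dimensional binning into data buckets.
from collections import defaultdict

def bucketize(rows, targets, bins_per_feature):
    """rows: feature tuples; bins_per_feature: sorted cut points per feature."""
    def bin_value(x, cuts):
        return sum(x >= c for c in cuts)   # index of the bin containing x

    buckets = defaultdict(list)
    for row, y in zip(rows, targets):
        key = tuple(bin_value(x, cuts) for x, cuts in zip(row, bins_per_feature))
        buckets[key].append(y)
    return buckets

def bucket_predictions(buckets, min_count=5):
    all_y = [y for ys in buckets.values() for y in ys]
    global_mean = sum(all_y) / len(all_y)
    preds = {}
    for key, ys in buckets.items():
        if len(ys) >= min_count:
            preds[key] = sum(ys) / len(ys)   # localized prediction
        else:
            preds[key] = global_mean         # smoothing for sparse buckets
    return preds

rows = [(23, 1200), (25, 1300), (24, 900), (55, 4000), (60, 4200), (58, 100)]
targets = [0, 0, 1, 1, 1, 0]
buckets = bucketize(rows, targets, bins_per_feature=[[30, 50], [1000, 3000]])
print(bucket_predictions(buckets, min_count=2))
```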
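Sketch 3: randomness from the digits of an irrational number. The fractional binary digits of SQRT(2) are computed exactly with integer arithmetic (math.isqrt) and packed into uniform deviates. This is a toy generator illustrating the idea, not a vetted one, and the very fast digit algorithms mentioned above are not reproduced here.

```python
# Hypothetical sketch: pseudo-random bits from the binary digits of SQRT(2).
import math

def sqrt2_bits(n_bits):
    # floor(sqrt(2) * 2**n_bits); dropping the leading integer bit leaves
    # the first n_bits fractional binary digits of sqrt(2).
    x = math.isqrt(2 << (2 * n_bits))
    return [(x >> (n_bits - 1 - i)) & 1 for i in range(n_bits)]

def bits_to_uniform(bits, word=32):
    # Pack consecutive 32-bit words into floats in [0, 1).
    out = []
    for i in range(0, len(bits) - word + 1, word):
        v = 0
        for b in bits[i:i + word]:
            v = (v << 1) | b
        out.append(v / 2 ** word)
    return out

u = bits_to_uniform(sqrt2_bits(3200))
print(f"{len(u)} deviates, mean = {sum(u) / len(u):.4f}")  # close to 0.5
```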
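Sketch 4: a model-free confidence interval, here implemented as a plain percentile bootstrap: the interval comes straight from the empirical percentiles of resampled estimates, with no p-value, no distributional model, and no asymptotics. The resample count and the 90% level are arbitrary choices for the example.

```python
# Hypothetical sketch: percentile-based, model-free confidence interval.
import random

def model_free_ci(data, stat=lambda xs: sum(xs) / len(xs),
                  n_resamples=1000, level=0.90, seed=42):
    rng = random.Random(seed)
    # Estimate the statistic on many resampled versions of the data.
    estimates = sorted(
        stat([rng.choice(data) for _ in range(len(data))])
        for _ in range(n_resamples)
    )
    # Read the interval directly off the empirical percentiles.
    lo = estimates[int((1 - level) / 2 * n_resamples)]
    hi = estimates[int((1 + level) / 2 * n_resamples) - 1]
    return lo, hi

data = [5.1, 4.8, 5.5, 5.0, 4.9, 5.3, 5.2, 4.7, 9.0]  # one outlier
print(model_free_ci(data))
```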
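Sketch 5: feature selection without PCA, using a rank-based (Spearman-style) correlation against the target. Ranks are largely insensitive to outliers, and the selected variables keep their original names and meaning, so interpretation stays straightforward. Tie handling is deliberately crude to keep the sketch short.

```python
# Hypothetical sketch: outlier-resistant feature selection via rank correlation.
def ranks(xs):
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    for rank, i in enumerate(order):   # ties broken arbitrarily, for brevity
        r[i] = float(rank)
    return r

def spearman(xs, ys):
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)

def select_features(features, target, k=2):
    scored = sorted(features.items(),
                    key=lambda kv: abs(spearman(kv[1], target)),
                    reverse=True)
    return [name for name, _ in scored[:k]]

features = {
    "income": [30, 45, 50, 80, 120, 9999],   # outlier barely matters in ranks
    "age":    [22, 35, 41, 52, 60, 33],
    "noise":  [5, 1, 4, 2, 3, 6],
}
target = [0, 1, 1, 2, 3, 3]
print(select_features(features, target))   # ['income', 'age']
```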
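Sketch 6: the flavor of hidden decision trees, as far as the description above allows one to reconstruct it. Frequent feature-value combinations ("nodes") are scored from a lookup table of node averages, which behaves like a collection of small, averaged trees; everything else falls back to a regression-style score (here just the global mean, standing in for the Jackknife regression step). The min_node_size threshold is an assumption.

```python
# Hypothetical sketch: node lookup table plus fallback, in the spirit of
# hidden decision trees. Not the canonical algorithm.
from collections import defaultdict

def fit_hdt(rows, targets, min_node_size=2):
    nodes = defaultdict(list)
    for row, y in zip(rows, targets):
        nodes[tuple(row)].append(y)
    # Dense nodes: direct averaged score, like a table of small trees.
    table = {key: sum(ys) / len(ys)
             for key, ys in nodes.items() if len(ys) >= min_node_size}
    fallback = sum(targets) / len(targets)  # stand-in for the regression step
    return table, fallback

def predict_hdt(model, row):
    table, fallback = model
    return table.get(tuple(row), fallback)

rows = [("US", "mobile"), ("US", "mobile"), ("UK", "desktop"),
        ("UK", "desktop"), ("FR", "tablet")]
targets = [1, 1, 0, 0, 1]
model = fit_hdt(rows, targets)
print(predict_hdt(model, ("US", "mobile")))   # scored by its node: 1.0
print(predict_hdt(model, ("DE", "mobile")))   # unseen node: fallback 0.6
```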
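Sketch 7: one plausible reading of a simplified, jackknife-style regression. Predictors are standardized, each coefficient is taken proportional to the predictor's correlation with the response (so there is no matrix inversion anywhere), and a single univariate fit rescales the combined score. This is a sketch under those assumptions, not the canonical recipe.

```python
# Hypothetical sketch: correlation-weighted regression, no matrix inversion.
def mean(xs): return sum(xs) / len(xs)

def std(xs):
    m = mean(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

def corr(xs, ys):
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return num / (len(xs) * std(xs) * std(ys))

def fit_jackknife(columns, y):
    stats = [(mean(col), std(col)) for col in columns]
    # Per-feature weights straight from correlations; no (X'X)^-1 anywhere.
    weights = [corr(col, y) for col in columns]
    raw = [sum(w * (x - m) / s
               for w, (m, s), x in zip(weights, stats, row))
           for row in zip(*columns)]
    # One univariate fit of y on the combined score.
    b = corr(raw, y) * std(y) / std(raw)
    a = mean(y) - b * mean(raw)
    return weights, a, b

columns = [[1, 2, 3, 4, 5], [2, 1, 4, 3, 6]]   # two predictors
y = [3.1, 3.9, 7.2, 7.8, 11.0]
print(fit_jackknife(columns, y))
```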
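Sketch 8: the curse of big data in action. With only 20 observations and 2,000 purely random variables, an exhaustive search reliably finds a strong-looking correlation; any claim of true signal has to beat this pure-noise baseline.

```python
# Hypothetical sketch: spurious correlations arise by chance at scale.
import random

rng = random.Random(1)
n_obs, n_vars = 20, 2000
target = [rng.gauss(0, 1) for _ in range(n_obs)]

def corr(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    dx = sum((x - mx) ** 2 for x in xs) ** 0.5
    dy = sum((y - my) ** 2 for y in ys) ** 0.5
    return num / (dx * dy)

best = max(
    abs(corr([rng.gauss(0, 1) for _ in range(n_obs)], target))
    for _ in range(n_vars)
)
print(f"best |correlation| among {n_vars} pure-noise variables: {best:.3f}")
# Typically around 0.7: strong-looking, yet entirely spurious.
```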
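Sketch 9: a minimal data video, one rendered frame per time step, so that a drifting distribution becomes visible over time. The frames are plain PNG files that any external tool (for example ffmpeg) can stitch into a video; matplotlib is assumed to be installed, and the drifting-mean data is invented for the example.

```python
# Hypothetical sketch: render per-timestep frames for a "data video".
import random
import matplotlib
matplotlib.use("Agg")            # render off-screen, no display needed
import matplotlib.pyplot as plt

rng = random.Random(0)
for frame in range(30):
    # A distribution whose mean drifts over time: the insight the video shows.
    sample = [rng.gauss(frame / 10, 1.0) for _ in range(500)]
    plt.figure(figsize=(5, 3))
    plt.hist(sample, bins=30, range=(-4, 7), color="steelblue")
    plt.title(f"t = {frame}")
    plt.ylim(0, 80)
    plt.savefig(f"frame_{frame:03d}.png")
    plt.close()
print("wrote 30 frames; stitch with: ffmpeg -i frame_%03d.png video.mp4")
```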