While Model Trains

Read data blog posts.
Carefully handpicked.
Presented 3 at a time.

Data Cleaning IS Analysis, Not Grunt Work

Randy Au

"The act of cleaning data is the act of preferentially transforming data so that your chosen analysis algorithm produces interpretable results. That is also the act of data analysis."


Why Correlation Usually ≠ Causation

Gwern

"Despite this admonition, people are overconfident in claiming correlations to support favored causal interpretations and are surprised by the results of randomized experiments, suggesting that they are biased & systematically underestimate the prevalence of confounds / common-causation."

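As a small illustration of the common-causation point (this simulation is not from Gwern's post), a hidden variable z can drive both x and y so that they correlate strongly even though neither causes the other. The sketch below assumes simple additive Gaussian noise, with all names chosen for illustration. Setting x independently of z, as a randomized experiment would, makes the correlation vanish.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(1)
n = 50_000

# Hypothetical confounder z causes both x and y; x has no effect on y.
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 1) for zi in z]
y = [zi + random.gauss(0, 1) for zi in z]
r_observed = pearson(x, y)  # theory: Cov(x, y) / sqrt(Var x * Var y) = 1 / 2

# "Randomized experiment": assign x independently of z, with the same spread.
x_randomized = [random.gauss(0, math.sqrt(2)) for _ in range(n)]
r_randomized = pearson(x_randomized, y)  # theory: 0

print(round(r_observed, 2), round(r_randomized, 2))
```

Observationally, x and y correlate at about 0.5; once x is randomized, the correlation drops to roughly zero, because the whole association ran through z.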

Variance after scaling and summing: One of the most useful facts from statistics

Chris Said

"What do R², laboratory error analysis, ensemble learning, meta-analysis, and financial portfolio risk all have in common? The answer is that they all depend on a fundamental principle of statistics that is not as widely known as it should be. Once this principle is understood, a lot of stuff starts to make more sense."

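The title points to the standard identity for a scaled sum of two random variables: Var(aX + bY) = a²·Var(X) + b²·Var(Y) + 2ab·Cov(X, Y). The sketch below checks it numerically on simulated data. The constants a = 2 and b = -3 and the way y is generated from x are arbitrary choices for illustration. Because the same 1/n normalization is used throughout, the identity holds exactly for the sample, up to floating-point error.

```python
import random

def var(xs):
    """Population variance (divides by n)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cov(xs, ys):
    """Population covariance (divides by n)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def var_of_scaled_sum(a, b, var_x, var_y, cov_xy):
    """Var(aX + bY) from the component variances and covariance."""
    return a * a * var_x + b * b * var_y + 2 * a * b * cov_xy

random.seed(0)
n = 100_000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [0.5 * x + random.gauss(0, 1) for x in xs]  # correlated with xs

a, b = 2.0, -3.0
direct = var([a * x + b * y for x, y in zip(xs, ys)])
formula = var_of_scaled_sum(a, b, var(xs), var(ys), cov(xs, ys))
print(direct, formula)  # the two values agree; both are near 9.25
```

The covariance term is what makes the identity useful: with a and b of opposite sign and positively correlated inputs, it reduces the variance, which is the logic behind hedging a portfolio. When X and Y are independent it drops out, leaving Var(X + Y) = Var(X) + Var(Y).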