While Model Trains

Read data blog posts.
Carefully handpicked.
Presented 3 at a time.

Variance after scaling and summing: One of the most useful facts from statistics

Chris Said

"What do R2, laboratory error analysis, ensemble learning, meta-analysis, and financial portfolio risk all have in common? The answer is that they all depend on a fundamental principle of statistics that is not as widely known as it should be. Once this principle is understood, a lot of stuff starts to make more sense."

Read it!
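
The blurb above does not state the principle itself. Going by the post's title, it is presumably the standard identity Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y). Below is a minimal sketch that checks this identity numerically on simulated data. The helper names, the coefficients, and the data-generating process are all illustrative assumptions, not taken from the post.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Sample covariance with an (n - 1) denominator."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def var(xs):
    """Sample variance, i.e. the covariance of a sample with itself."""
    return cov(xs, xs)

def var_of_scaled_sum(a, b, xs, ys):
    """Right-hand side: a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y)."""
    return a ** 2 * var(xs) + b ** 2 * var(ys) + 2 * a * b * cov(xs, ys)

# Simulated data (an assumption for illustration): ys is correlated with xs.
rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(1000)]
ys = [0.5 * x + rng.gauss(0, 1) for x in xs]

a, b = 2.0, -3.0
direct = var([a * x + b * y for x, y in zip(xs, ys)])  # variance of aX + bY
formula = var_of_scaled_sum(a, b, xs, ys)              # identity's right-hand side
print(direct, formula)
```

The two printed numbers should agree up to floating-point error, because the identity holds exactly for sample variances and covariances computed with the same denominator. When X and Y are uncorrelated the covariance term vanishes, which is why independent errors add in quadrature and why averaging n independent estimates with equal variance divides that variance by n.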

The Four Jobs of the Data Scientist

Roger Peng

For each "Data Analytic Iteration," you need to embody the roles of a scientist, statistician, systems engineer, and politician.

Read it!

Prediction intervals for Random Forests

Ando Saabas

Prediction intervals are commonly used for linear models but are underused for random forests. Because a random forest can estimate a full conditional distribution rather than only the conditional mean, prediction intervals are relatively straightforward to compute in this context.

Read it!
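
One common way to act on the idea in the blurb above is to treat the individual trees' predictions for a single input as an empirical distribution and read the interval off its percentiles. The sketch below assumes you already have one prediction per tree. With scikit-learn, these could be collected by calling `predict` on each tree in a fitted `RandomForestRegressor`'s `estimators_` list. The function names and the numbers are hypothetical, and this is a heuristic rather than an interval with guaranteed coverage.

```python
def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of a sorted list."""
    if not sorted_vals:
        raise ValueError("need at least one value")
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

def prediction_interval(tree_preds, coverage=90):
    """Percentile interval from per-tree predictions for ONE input point.

    tree_preds: iterable with one predicted value per tree.
    coverage:   nominal central coverage in percent (e.g. 90 -> 5th..95th).
    """
    vals = sorted(tree_preds)
    tail = (100 - coverage) / 2
    return percentile(vals, tail), percentile(vals, 100 - tail)

# Made-up per-tree predictions from a hypothetical forest of 101 trees.
tree_preds = [float(v) for v in range(1, 102)]
lower, upper = prediction_interval(tree_preds, coverage=90)
print(lower, upper)
```

It is worth checking on held-out data how often the true value actually falls inside the interval, since the nominal coverage is not guaranteed to match the observed one.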