While Model Trains

Read data blog posts.
Carefully handpicked.
Presented 3 at a time.

The Four Jobs of the Data Scientist

Roger Peng

In each "Data Analytic Iteration," you need to play four roles: scientist, statistician, systems engineer, and politician.

Read it!

How much data should you allocate to training and validation?

Francesco Pochetti

When someone asks why you chose an 80% training and 20% validation split, "that's what Andrew Ng said" is a weak answer. This post offers a better one.

Read it!

Variance after scaling and summing: One of the most useful facts from statistics

Chris Said

"What do R², laboratory error analysis, ensemble learning, meta-analysis, and financial portfolio risk all have in common? The answer is that they all depend on a fundamental principle of statistics that is not as widely known as it should be. Once this principle is understood, a lot of stuff starts to make more sense."

Read it!
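
The title of the last post points at the variance of a scaled sum. As a rough illustration (not taken from the post itself), the standard identity Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y) can be checked numerically. The sketch below uses only the Python standard library; the sample size, the coefficients a and b, and the way y is made to correlate with x are all arbitrary assumptions chosen for the demonstration.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    # Population covariance (divides by n, not n - 1).
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def var(xs):
    # Variance is the covariance of a variable with itself.
    return cov(xs, xs)


rng = random.Random(0)  # fixed seed so the run is repeatable
x = [rng.gauss(0, 2) for _ in range(10_000)]
# y is deliberately correlated with x, so the covariance term matters.
y = [0.5 * xi + rng.gauss(0, 1) for xi in x]

a, b = 3.0, -2.0  # arbitrary scaling factors

# Left side: variance of the scaled sum, computed directly.
lhs = var([a * xi + b * yi for xi, yi in zip(x, y)])
# Right side: the same quantity assembled from the separate pieces.
rhs = a**2 * var(x) + b**2 * var(y) + 2 * a * b * cov(x, y)

print(f"direct:   {lhs:.6f}")
print(f"identity: {rhs:.6f}")
```

The two printed values should agree up to floating-point rounding, because the identity holds exactly for sample moments as well as for the underlying distributions. Dropping the covariance term gives the familiar special case for independent variables.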