While Model Trains

Read data blog posts.
Carefully handpicked.
Presented 3 at a time.

Do data scientists spend 80% of their time cleaning data? Turns out, no?

Leigh Dodds

"Data scientists do a whole range of different types of task. If you arbitrarily label some of these as analysis and others not, then you can make them add up to 80%."

Read it!

Are Pop Lyrics Getting More Repetitive?

Colin Morris

A fascinating visual essay that uses Lempel-Ziv compression (the family of algorithms behind GIFs, PNGs, and most archive formats) to test whether pop songs are becoming more repetitive.

Read it!
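
To get a feel for the idea behind the pop-lyrics essay, here is a minimal sketch that scores repetitiveness by how well a text compresses. It uses Python's zlib (DEFLATE, which combines LZ77 with Huffman coding) as a stand-in, so it is not necessarily the essay's exact method. The function name `compression_ratio` and both sample texts are made up for illustration.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Compressed size divided by original size; lower means more repetitive."""
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("text must not be empty")
    # Level 9 asks zlib for its strongest compression setting.
    return len(zlib.compress(raw, 9)) / len(raw)


# Hypothetical samples: a chorus-like loop versus ordinary prose.
repetitive = "la la la la la la la la\n" * 20
varied = (
    "Yesterday the quiet harbor filled with fog, and by morning every "
    "fishing boat had vanished beyond the jagged, unfamiliar coastline "
    "while gulls wheeled overhead."
)

print(f"repetitive: {compression_ratio(repetitive):.2f}")
print(f"varied:     {compression_ratio(varied):.2f}")
```

The looping sample should score far lower than the prose, because a Lempel-Ziv style compressor replaces each repeated phrase with a short back-reference to its earlier occurrence.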

Understanding the beta distribution (using baseball statistics)

David Robinson

"The beta distribution is best for representing a probabilistic distribution of probabilities: the case where we don’t know what a probability is in advance, but we have some reasonable guesses."

Read it!
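
The "reasonable guesses" idea from the beta distribution post can be sketched as a prior that gets updated with observed hits and misses. This is a rough illustration using the standard conjugate update for a beta prior with binomial data. The class name `Beta`, the prior parameters (81, 219), and the season of 100 hits in 300 at-bats are all assumptions chosen for the example, not data from any real player.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beta:
    """Beta(alpha, beta) belief about an unknown probability (hypothetical helper)."""
    alpha: float
    beta: float

    def mean(self) -> float:
        # Expected value of a beta distribution.
        return self.alpha / (self.alpha + self.beta)

    def update(self, hits: int, at_bats: int) -> "Beta":
        # Conjugate update: add successes to alpha and failures to beta.
        if not 0 <= hits <= at_bats:
            raise ValueError("hits must be between 0 and at_bats")
        return Beta(self.alpha + hits, self.beta + (at_bats - hits))


# Assumed prior: a batting average believed to be around 0.27.
prior = Beta(81, 219)
# Assumed observations: 100 hits in 300 at-bats.
posterior = prior.update(hits=100, at_bats=300)

print(round(prior.mean(), 3))      # 0.27
print(round(posterior.mean(), 3))  # 0.302
```

The estimate moves from the prior guess toward the observed average of about 0.333, but only part of the way, since the prior carries as much weight as 300 earlier at-bats.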