Flicker Fusion

The problems with using Twitter as a model for the general population are simple. You don’t have to be a pollster to understand that searching for tweets that match some keywords hardly constitutes proper probabilistic sampling. We might display a map that shows colors mentioned by Americans on Twitter, but nobody would say this is an accurate map of favorite colors for each region of the USA. Naturally, most graphics play it safe and say overtly that they are only representations of Twitter and are not meant to provide deeper insight beyond that into the general population.

However, the distinction is lost on a lot of readers. I think many of us find these graphics so appealing because we see ourselves reflected in our data streams.

—Jake Harris, one of the smartest news nerds in the biz, on the perils of polling Twitter.