Twitter polls annoy me

Obligatory disclaimer: I'm not a statistician or a sociologist. My qualifications come from being a physics undergraduate, where I live and breathe error analysis in the lab, and also from reading around the subject. Finally, I use Twitter far too much; it's really just morbid fascination at this point. Oh well.

Last year, Twitter introduced polls as a thing you can add to your tweets. This was more likely than not done to poke Twitter into being vaguely profitable, since it's stagnant as hell.

Unfortunately, polls are only as good as the people making them and Twitter is a wretched hive. There have been some really good polls not used as polls - for example, a wacky choose-your-own-adventure romp which started with swiping left or right on a famous politician's Tinder profile - but Sturgeon's Law applies and most polls are...well...crappy.

"Why are Twitter polls so bad?" you might ask. Turns out that addressing the how instead of the why is far more useful (and also that Twitter polls can be surprisingly enlightening, but I'll address that later).

Twitter polls are not a good data collection tool if you want a sample reflecting the general population (this is really, really important when trying to collect data...well...about a general population). On the surface, this seems a little strange: Twitter polls made by people who don't lock their accounts are in theory visible to anyone on the planet, discounting censorware. Theoretically you'd be able to get a very large sample from a public poll - possibly larger than the samples used by outfits like YouGov (who use representative samples of about 2000 people per poll).

In practice, when was the last time you combed someone's account specifically to look for a poll, and happened to do that within days of the poll's creation? More likely you voted in a poll because someone you follow created it or retweeted it from someone else, and because you happened to be online when it showed up on your timeline.

Welcome to selection bias. Selection bias is when certain groups are over- or under-represented in your sample compared to their distribution in your target population (in my example, the general population - but if your target population were men and women over 50 and somehow you ended up with far more women in your sample than men, that's also selection bias). It's a complete pain when trying to assess how useful polls are (and to whom), and Twitter is rife with it.

I already mentioned time as an example of selection bias in Twitter polls - your sample will be drawn from people who were online and saw your poll during the time it was active. Language is another fairly obvious one. Who you follow and who the people you're following happen to follow (try saying that quickly) are incredibly important, as this will determine the polls you participate in. Not just that - when you follow people, you generally have common interests, or share political opinions, or might already know each other in person... There are many, many more examples, but the point is this: the audience for a Twitter poll isn't random. It's very heavily self-selected. It's hardly reflective of the general population - only of the people answering those polls.
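To put some made-up numbers on that, here's a minimal Python sketch. Everything in it is an assumption for illustration: the population size, the 20% "cluster", the support rates and the 10x visibility weight are invented, not measured from Twitter. It compares a small, uniformly random sample of 2,000 people with a much bigger sample in which one like-minded cluster is far more likely to see the poll.

```python
import random


def build_population(rng, size=200_000):
    """Invented population: 20% sit in a like-minded cluster that mostly says yes."""
    population = []
    for _ in range(size):
        in_cluster = rng.random() < 0.2
        says_yes = rng.random() < (0.8 if in_cluster else 0.3)
        population.append((in_cluster, says_yes))
    return population


def share_yes(sample):
    """Fraction of a sample answering yes."""
    return sum(yes for _, yes in sample) / len(sample)


def run(seed=1):
    rng = random.Random(seed)
    population = build_population(rng)
    truth = share_yes(population)

    # Small sample where everyone is equally likely to be asked.
    small_random = rng.sample(population, 2_000)

    # Big sample where cluster members are 10x more likely to see the poll.
    # (random.choices draws with replacement, which is fine for a sketch.)
    weights = [10 if in_cluster else 1 for in_cluster, _ in population]
    big_biased = rng.choices(population, weights=weights, k=20_000)

    return truth, share_yes(small_random), share_yes(big_biased)


if __name__ == "__main__":
    truth, small_random, big_biased = run()
    print(f"true share:          {truth:.3f}")
    print(f"random sample (2k):  {small_random:.3f}")
    print(f"biased sample (20k): {big_biased:.3f}")
```

With these invented numbers the true "yes" share sits at around 40%. The small random sample should land close to it, while the big skewed sample should land well above it. A bigger sample doesn't rescue you from a biased one.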

There are also problems in poll design. Twitter is pretty limiting in only allowing you 4 options, but whatever - decent data collection is not really the goal of these polls. Many won't include a "don't know/none of the above" option. I know a lot of people see this as a "weak" option, but some people are genuinely dissatisfied with the choices or agnostic about the issue. Leaving out these folks can heavily skew your sample.
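Here's a rough bit of arithmetic in Python showing how that skew can happen. The function name, its parameters and the 40/30/30 split are all hypothetical - I've made them up to illustrate the point, not taken them from any real poll.

```python
def poll_without_dont_know(yes, no, unsure, unsure_vote_rate, unsure_yes_share):
    """Reported (yes, no) shares when "don't know" isn't offered.

    yes, no, unsure: how many people genuinely hold each position.
    unsure_vote_rate: fraction of unsure people who vote anyway (assumed).
    unsure_yes_share: of those, the fraction who pick yes (assumed).
    """
    forced = unsure * unsure_vote_rate
    yes_votes = yes + forced * unsure_yes_share
    no_votes = no + forced * (1 - unsure_yes_share)
    total = yes_votes + no_votes
    return yes_votes / total, no_votes / total


if __name__ == "__main__":
    # Invented split: 40 yes, 30 no, 30 genuinely unsure.
    # Case 1: the unsure people just scroll past the poll.
    print(poll_without_dont_know(40, 30, 30, 0.0, 0.5))
    # Case 2: the unsure people all vote, and all grudgingly pick "no".
    print(poll_without_dont_know(40, 30, 30, 1.0, 0.0))
```

In the first case the poll reports roughly 57% "yes", even though only 40% of everyone actually holds that view. In the second, the same people produce a 40/60 split. Same population, very different headline.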

I'm going to quickly acknowledge the people who create polls with loaded questions or responses clearly biased in favour of one side or another. For data gathering, this is an utterly shitty method, both professionally and ethically (people who like to skew statistics push all my buttons). However, I'd wager that even people who have never bothered to read a single article on statistics know that they're not going to get good, unbiased data out of these polls; that's not what they want. They want to get themselves into a self-righteous circlejerk. It's a way of signalling affinity to a group and displaying virtue in a really perverse way.

I've now written about 700 words on why Twitter polls are bad. This is naive - in many ways they're actually pretty good.

Hang on, hang on - I've spent over 700 words slating Twitter polls and now I want to defend them?! What kind of indecisive bullshit is this?!

Twitter polls aren't necessarily representative of the entire population and are usually poorly designed, so for gauging the response of the general population, they're pretty rubbish. I have laboured this point. However, if you want to know about the people making and spreading the polls, they're great. You could make diagrams of poll-makers and the people who vote in or retweet the polls to work out the structure of a network and how it might relate to other networks. If you have other demographic information such as age or gender, that could augment a model.

(If you think this is creepy, a lot of this is publicly available information. It's not hidden away.)
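As a very crude sketch of the network idea, here's some plain Python that takes (poll-maker, participant) pairs and works out which accounts end up in the same clump. The account names and the interaction list are invented, and treating "voted in or retweeted" as a simple undirected link is my assumption - a real analysis would want something far less blunt than connected components.

```python
from collections import defaultdict, deque


def build_graph(interactions):
    """Turn (poll_maker, participant) pairs into an undirected graph."""
    graph = defaultdict(set)
    for maker, participant in interactions:
        graph[maker].add(participant)
        graph[participant].add(maker)
    return graph


def clusters(graph):
    """Group accounts that are linked, directly or indirectly, by poll activity."""
    seen = set()
    groups = []
    for start in graph:
        if start in seen:
            continue
        # Breadth-first search outwards from an account we haven't visited yet.
        queue = deque([start])
        seen.add(start)
        group = set()
        while queue:
            node = queue.popleft()
            group.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        groups.append(group)
    return groups


if __name__ == "__main__":
    # Hypothetical accounts: who voted in or retweeted whose poll.
    interactions = [
        ("alice", "bob"),
        ("bob", "carol"),
        ("dave", "erin"),
    ]
    for group in clusters(build_graph(interactions)):
        print(sorted(group))
```

Here alice, bob and carol form one clump and dave and erin another, with no poll activity linking the two. Counting the links between clumps, rather than just spotting them, would be the next step for measuring how much groups interact.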

Twitter might not be great for telling you what the general population thinks, but it's definitely great for telling you how different groups clump together - and how much they interact with each other. That might not be the data poll-makers were trying to get - but it's important nonetheless.