Disease outbreaks trackable with Twitter, study says

About 15 percent of tweets can be accurately connected to state-level location data or better
Most of that data is parsed from users’ public profiles
Such volume means Twitter could be used as an early-warning system to monitor spread of diseases

This flu season you’ve probably seen a number of friends on social media talking about symptoms.

New research from Brigham Young University says such posts on Twitter could actually be helpful to health officials looking for a head start on outbreaks.

The study sampled 24 million tweets from 10 million unique users. The researchers determined that accurate location information is available for about 15 percent of tweets (gathered from user profiles and tweets that contain GPS data). That’s likely a critical mass for an early-warning system that could monitor terms like “fever,” “flu” and “coughing” in a city or state.

“One of the things this paper shows is that the distribution of tweets is about the same as the distribution of the population so we get a good representation of the country,” said BYU professor Christophe Giraud-Carrier. “That’s another nice validity point especially if you’re going to look at things like diseases spreading.”

Professor Giraud-Carrier (@ChristopheGC) and his computer science students at BYU report their findings in a recent issue of the Journal of Medical Internet Research.

The researchers found far less data than they expected from Twitter’s feature that lets users tag tweets with a location. Just 2 percent of tweets contained GPS info. That’s a much lower rate than what Twitter users report in surveys.

“There is this disconnect that’s well known between what you think you are doing and what you are actually doing,” Giraud-Carrier said.

Location info can more often be found and parsed from user profiles. Of course, some people use that location field for a joke, such as “Somewhere in my imagination” or “a cube world in Minecraft.” However, the researchers confirmed that this user-supplied data was accurate 88 percent of the time. Besides the jokes, a portion of the inaccuracies arises from people tweeting while they travel.

The net result is that public health officials could capture state-level info or better for 15 percent of tweets. That bodes well for the viability of a Twitter-based disease monitoring system to augment the confirmed data from sentinel clinics.

“The first step is to look for posts about symptoms tied to actual location indicators and start to plot points on a map,” said Scott Burton, a graduate student and lead author of the study. “You could also look to see if people are talking about actual diagnoses versus self-reported symptoms, such as ‘The doctor says I have the flu.’”
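
As a rough illustration of that first step, the short Python sketch below counts symptom-related tweets per state. It is not the researchers' code. The term list reuses the examples quoted in this article, and the tweet format (a dict with "text" and "state" keys, where "state" is None when no usable location was found) is a hypothetical stand-in for whatever a real collection pipeline would produce.

```python
import re
from collections import Counter

# Symptom terms taken from the examples in the article; a real system
# would likely need a much broader, validated vocabulary (assumption).
SYMPTOM_TERMS = ("fever", "flu", "coughing")
SYMPTOM_RE = re.compile(r"\b(" + "|".join(SYMPTOM_TERMS) + r")\b", re.IGNORECASE)


def count_symptom_tweets_by_state(tweets):
    """Count tweets that mention a symptom term, grouped by state.

    `tweets` is an iterable of dicts with hypothetical keys "text" and
    "state". Tweets whose state is None (no usable location) are skipped.
    """
    counts = Counter()
    for tweet in tweets:
        state = tweet.get("state")
        if state is None:
            continue  # no state-level location available for this tweet
        if SYMPTOM_RE.search(tweet["text"]):
            counts[state] += 1
    return counts


# Made-up sample data, for illustration only.
sample = [
    {"text": "Home sick with the flu again", "state": "UT"},
    {"text": "Coughing all night, ugh", "state": "UT"},
    {"text": "Great game tonight!", "state": "UT"},
    {"text": "I think I have a fever", "state": None},
    {"text": "The doctor says I have the flu", "state": "CA"},
]
counts = count_symptom_tweets_by_state(sample)
print(counts)
```

Per-state counts like these could then be plotted on a map or compared with a baseline for the same week in earlier years. Telling confirmed diagnoses apart from self-reported symptoms, as Burton suggests, would take more than keyword matching.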

The computer scientists collaborated with two BYU health science professors on the project. Professor Josh West says speed is the main advantage Twitter gives to health officials.

“If people from a particular area are reporting similar symptoms on Twitter, public health officials could put out a warning to providers to gear up for something,” West said. “Under conditions like that, it could be very useful.”

BYU undergraduate Kesler Tanner is a co-author on the study. He wrote the code to obtain the data from Twitter. When he graduates in April, he’ll be headed off to graduate school to earn a Ph.D.

Earlier this year, this same group of researchers published a study showing that most exercise apps are based on bad info.

