Twitter and other social media platforms can serve as powerful tools to help predict outbreaks of both infectious and non-infectious diseases and should be viewed as more than just a breeding ground for misinformation.
This was recently confirmed in work by Gina Debogovich, senior director at UnitedHealth Group, and Dr. Danita Kiser (PhD) at Optum, which they discussed at a session during last week’s Healthcare Information and Management Systems Society (HIMSS) meeting in Orlando, FL.
Their assessment of several million US tweets in the early stages of the COVID-19 pandemic showed that information contained in tweets about COVID-19 was 7-10 days ahead of public case data.
The work of Debogovich and Dr. Kiser was based on the hypothesis that “social media conversations may contain insights into COVID prevalence and may be a leading indicator for cases and hospitalization.” Debogovich said Twitter was chosen as the platform to evaluate because the metadata attached to tweets often includes the geographic location from which the tweet was sent.
In their study, natural language processing techniques were used to identify COVID-19-related tweets and classify them into different categories. Statistical analysis and machine learning were then used to determine whether the tweets were leading indicators of COVID-19 spread in a community.
In their initial work, more than 15,000 geo-located tweets, each containing either an address or the latitude and longitude of the tweeter, were hand-classified into 7 primary categories and further divided by proximity or no proximity.
The categories used were:
- Confirmed (the tweet stated the subject had or believed they had COVID-19)
- Showing symptoms (the tweet indicated the subject had symptoms of COVID-19)
- Perished (subject had died as a result of COVID-19)
- Quarantine (subject was in quarantine)
- News (usually about a news article related to COVID-19)
- Hoax (message contained misinformation)
Tweets were further categorized by whether they contained location data or not.
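To make the classification step concrete, here is a minimal sketch of how a tweet could be assigned to one of the categories listed above. It is not the method used in the study, which relied on hand-labeled tweets and natural language processing techniques that the presentation summary does not detail. The keyword patterns, the `CATEGORY_PATTERNS` table and the `classify_tweet` function are all hypothetical, and a simple first-match rule stands in for a trained model.

```python
import re

# Hypothetical keyword patterns, one per category named in the article.
# These are illustrative assumptions, not the rules used in the study.
# Order matters: the first pattern that matches decides the category.
CATEGORY_PATTERNS = {
    "confirmed": re.compile(r"\b(tested positive|i have covid|diagnosed with covid)\b"),
    "showing_symptoms": re.compile(r"\b(fever|cough|short of breath|lost my sense of (taste|smell))\b"),
    "perished": re.compile(r"\b(died|passed away)\b"),
    "quarantine": re.compile(r"\b(quarantin\w*|self-isolat\w*)\b"),
    "news": re.compile(r"\b(according to|reports?|breaking)\b|https?://"),
    "hoax": re.compile(r"\b(hoax|plandemic|fake virus)\b"),
}


def classify_tweet(text):
    """Return the first matching category name, or None if nothing matches."""
    lowered = text.lower()
    for category, pattern in CATEGORY_PATTERNS.items():
        if pattern.search(lowered):
            return category
    return None
```

Calling `classify_tweet("Day 5 of quarantine and going stir crazy")` would return `"quarantine"` under these assumed patterns. A production system would more plausibly train a text classifier on the hand-labeled tweets, since keyword rules miss sarcasm, negation and slang.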
Having developed the categories, Debogovich and Dr. Kiser then assessed 100 million tweets posted from February 2020 to February 2021. They found that in the first phases of the pandemic, public case data lagged tweets by 7-10 days on average. However, this lead shrank to 2 days in the second wave of the pandemic.
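One common way to estimate how far one daily series runs ahead of another is to shift the series against each other and find the offset with the highest correlation. The sketch below illustrates that idea only. The article does not say which statistical method the researchers used, and the function names `pearson` and `best_lead_days`, the 14-day search window and the assumption of two equal-length daily count lists are all choices made here for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def best_lead_days(tweet_counts, case_counts, max_lag=14):
    """Return (lag, correlation) for the lag at which tweets best match later cases.

    Assumes both lists hold one value per day and cover the same dates.
    """
    best_lag, best_corr = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Pair tweets on day t with reported cases on day t + lag.
        tweets = tweet_counts[: len(tweet_counts) - lag]
        cases = case_counts[lag:]
        if len(tweets) < 3:
            break  # too few overlapping days to correlate
        corr = pearson(tweets, cases)
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag, best_corr
```

If the case curve is simply the tweet curve delayed by a week, `best_lead_days` returns a lag of 7. Real data is noisier, and a lead that changes between waves, as reported here, would only show up if the calculation were run separately on each period.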
As a result of these findings, Debogovich and Dr. Kiser concluded that Twitter data could be useful for predicting future COVID-19 cases, but that its accuracy depended on the dynamics of the pandemic. Tweets were most useful during periods when cases were rising.
Wastewater analysis and other tools are helpful in predicting infectious disease outbreaks, but digital surveillance could be more effective in predicting spikes in symptoms, said Debogovich.
The study confirms early research done during Twitter’s infancy in which researchers showed how tweets could be used to predict outbreaks of influenza and other diseases. During the presentation, Debogovich said the rapid analysis of the huge amounts of data available on social media platforms remains underutilized for research and public health purposes. Mining data from social media is “hard work” and complex but could be the next big thing in predicting disease outbreaks, she concluded.