Document Type

Thesis

First Faculty Advisor

Suhong Li

Second Faculty Advisor

Gao Niu

Keywords

machine learning; COVID-19; misinformation; topic modeling

Publisher

Bryant University

Rights Management

CC-BY-NC-ND

Abstract

Many studies in recent years have attempted to uncover the impacts of the COVID-19 pandemic. One of the largest areas of concern is misinformation: COVID-19 is a novel virus that many people report on, even when they are unqualified to do so. This study aims to predict whether a Tweet can be classified as misinformation, and then to analyze the differences between Tweets that the model labels as misinformation and those it does not. Machine learning models are created and validated using the CovidMis20 dataset as a training set. The dataset to be labeled contains COVID-19 Tweets that one of the authors has collected since March 2020, currently more than 1 billion Tweets in total. Specifically, this study focuses on the United States, India, and the United Kingdom. Through various analysis techniques performed on the Tweets (word clouds, top hashtags, top users tagged, and topic modeling), this study investigates whether a machine learning model can be effective at predicting misinformation at a global scale. There was some evidence that the model is effective for the United States and the United Kingdom. For the United States, hydroxychloroquine was among the top terms in predicted misinformation, which is a very promising result. For the United Kingdom, Boris Johnson dominated the predicted misinformation dataset. All machine learning methods and analyses for this project were completed in an HPC environment through access granted by the National Science Foundation's CAREERS Cyberteam grant and the University of Rhode Island.
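
To make the "top hashtags" and "top users tagged" analyses mentioned in the abstract concrete, the following is a minimal sketch of how such counts could be computed. It is not the thesis's actual code: the function name `top_tokens`, the regular expressions, and the three sample Tweets are all hypothetical and invented for illustration, and only the Python standard library is assumed.

```python
import re
from collections import Counter

# Hypothetical patterns: a hashtag or mention is taken to be '#' or '@'
# followed by word characters. Real Tweet parsing may need stricter rules.
HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@(\w+)")


def top_tokens(tweets, pattern, n=3):
    """Count case-insensitive matches of `pattern` and return the n most common."""
    counts = Counter()
    for text in tweets:
        counts.update(match.lower() for match in pattern.findall(text))
    return counts.most_common(n)


# Made-up sample Tweets, used only to exercise the function.
sample_tweets = [
    "Stay safe everyone #COVID19 #StayHome",
    "New guidance from @WHO today #COVID19",
    "@WHO briefing at noon #covid19 #masks",
]

top_hashtags = top_tokens(sample_tweets, HASHTAG_RE)
top_mentions = top_tokens(sample_tweets, MENTION_RE)
print(top_hashtags)
print(top_mentions)
```

Running the same function separately over the Tweets predicted as misinformation and those predicted as not misinformation would give the kind of side-by-side comparison the abstract describes. Lowercasing before counting is a design choice here so that `#COVID19` and `#covid19` are treated as the same hashtag.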

Included in

Data Science Commons
