Document Type

Thesis

First Faculty Advisor

Suhong Li

Keywords

COVID-19; fake news; misinformation; machine learning

Publisher

Bryant University

Rights Management

CC-BY-NC-ND

Abstract

The aim of this project is to use a machine learning model to identify COVID-19 fake news on Twitter and to analyze the fake news tweets for common trends. Because misinformation is widespread online, the study examines how machine learning can be used to distinguish true information from false. Prior research on fake news detection, modeling, and analysis was reviewed to become familiar with current approaches to predicting and analyzing COVID-19 fake news on Twitter. In this study, over one billion tweets were collected between March 2020 and October 2021 via the Twitter API using #covid and COVID-19 as keywords and then analyzed. Specific keywords were applied to further filter the data. The training data set was retrieved from the University of Pennsylvania's COVID-19 healthcare misinformation dataset (CoAID), which included 137,799 tweets that were manually classified as 'Real' or 'Fake'. A binary logistic regression model was used to classify the COVID-19 tweets; it labeled approximately 92% of the tweets as containing accurate information regarding COVID-19 and the remaining 8% as containing false information about the coronavirus. Among English-language tweets, posts with false information on COVID-19 were commonly related to politics, vaccinations, and social distancing, regardless of location.

Included in

Data Science Commons
