Sentiment Analysis of Tweets

Christian Micco

CoPIs:
Daniel Garcia, Matthew Lane

College:
The Dorothy and George Hennings College of Science, Mathematics, and Technology

Major:
Computer Science

Faculty Research Advisor(s):
Ching-yu Huang

Abstract:
This project will use machine learning to capture emotion values from three large datasets of tweets. The first dataset contains hate speech and cyberbullying tweets, the second is a more general collection of tweets, and the third consists of tweets using the Covid-19 hashtag. The hate speech dataset includes an ID and the Tweet as attributes; the general dataset includes the ID, the Tweet, and a label; and the Covid-19 dataset includes 10 attributes, among them the Tweet and a unique Username. Not all of the Covid-19 attributes are likely to be used, since some are unrelated to the sentiment of the tweet. The goal of this project is to predict the sentiment of tweets accurately, with a target of roughly 80% accuracy unless processing times become prohibitive. The data mining process used is sentiment analysis, which quantifies emotions such as rage or joy as binary values: negative (1) or positive (0). Deep learning and natural language processing techniques will be applied using a pretrained model such as BERT. An imported tokenizer will split each tweet into “tokens,” the word and subword units that the model processes in order to infer the sentiment of the whole phrase or sentence. The machine learning code will be written in Python, and the Kean University Obi2 database will be used for data storage and retrieval through MySQL.
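To show what “tokens” means for a BERT-style model, the sketch below implements WordPiece-style greedy longest-match-first splitting, the subword scheme BERT uses. It is not the project's code: the project presumably imports a library tokenizer (for example, a BERT tokenizer from Hugging Face Transformers), and the toy vocabulary `TOY_VOCAB`, the helper names `wordpiece_tokenize` and `tokenize`, and the example sentence are all hypothetical. It also simplifies BERT's pre-tokenization step to lowercasing and whitespace splitting, with no punctuation or accent handling.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Split one lowercase word into subword tokens, longest vocabulary match first."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, shrinking it until it is in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # pieces after the first carry a "##" prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no valid segmentation: the whole word maps to the unknown token
        tokens.append(piece)
        start = end
    return tokens


def tokenize(text, vocab):
    """Simplified pre-tokenization (lowercase + whitespace split), then WordPiece per word."""
    out = []
    for word in text.lower().split():
        out.extend(wordpiece_tokenize(word, vocab))
    return out


# Hypothetical toy vocabulary; a real BERT vocabulary has roughly 30,000 entries.
TOY_VOCAB = {"i", "love", "hate", "this", "un", "##believ", "##able", "vaccine", "##s"}

print(tokenize("I love this unbelievable vaccines", TOY_VOCAB))
# → ['i', 'love', 'this', 'un', '##believ', '##able', 'vaccine', '##s']
```

A word the model has never seen, such as “unbelievable” here, is still represented by known pieces rather than discarded. In a full pipeline these tokens would then be mapped to vocabulary IDs and fed to the pretrained model, whose classification output is read as the negative (1) or positive (0) label.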
