Leveraging NLP Techniques to Summarize Reviews
College:
The Dorothy and George Hennings College of Science, Mathematics, and Technology
Major:
Computer Science
Faculty Research Advisor(s):
Daehan Kwak
Abstract:
This research project addresses the issue of information overload associated with
e-commerce platforms, specifically the volume of customer reviews. The goal is to simplify review
analysis using Natural Language Processing (NLP) techniques. The project uses the Yelp dataset,
chosen for its substantial size: 5.3 GB covering 6.9 million reviews, 150 thousand restaurants,
and 2 million users. Through the use
of Python and NLTK libraries for text manipulation and sentiment analysis, the study
preprocesses review text to extract insights. TextBlob and VADER were used for sentiment
analysis, and text manipulation relied on stemming, lemmatization, and tokenization together with
the regex, collections, and pandas libraries. In the future, the project aims to become
fully autonomous through more advanced techniques such as semantic analysis and
topic modeling. By alleviating information overload, this research aims to simplify the review
analysis process and increase productivity, accuracy, and efficiency by providing concise and
detailed summaries.
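
As a rough illustration of the text-manipulation step described above, the following minimal sketch uses only Python's standard re and collections modules to tokenize reviews and count the most frequent terms. It is not the project's actual pipeline: the function names, the tiny stopword list, and the sample reviews are all hypothetical, and the regex tokenizer stands in for the NLTK tokenizers, stemmers, and lemmatizers the abstract mentions.

```python
import re
from collections import Counter

# Hypothetical mini stopword list for illustration only; a fuller list
# (for example, one shipped with NLTK) would be used in practice.
STOPWORDS = {"the", "a", "an", "and", "was", "is", "it", "but", "to", "of"}


def tokenize(text):
    """Lowercase the text and return runs of letters/apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def top_terms(reviews, n=3):
    """Return the n most frequent non-stopword tokens across all reviews."""
    counts = Counter()
    for review in reviews:
        counts.update(t for t in tokenize(review) if t not in STOPWORDS)
    return counts.most_common(n)


# Made-up reviews, not taken from the Yelp dataset.
reviews = [
    "The pizza was great and the service was fast.",
    "Great pizza, but the service was slow.",
    "Slow service. The pizza is still great!",
]

print(top_terms(reviews))
```

Frequent terms such as these could serve as a first, crude summary of what reviewers talk about; sentiment scores from a tool like TextBlob or VADER would then indicate whether those terms are mentioned favorably.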