Airline Reviews Sentiment Analysis
CONCEPT
The "Airline Reviews Sentiment Analysis" project examines customer satisfaction in the airline industry using a refined dataset of Airline Quality reviews from Kaggle. After cleansing, the dataset was explored with exploratory data analysis (EDA), which revealed insights and correlations, and was then scored with two sentiment analyzers, VADER and TextBlob. Findings include sentiment fluctuations over time, the identification of airlines with consistently lower scores, and targeted recommendations for improvement. Critical aspects of the flight experience were examined closely to guide these suggestions. Welch's t-test was used to check the statistical robustness of the results, which underline how data can transform the customer experience, inform business decisions, and raise service quality in the airline industry.
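Welch's t-test compares the means of two groups without assuming that the groups have equal variances. The post does not say which groups were compared, so the sketch below simply takes two hypothetical lists of scores (for example, sentiment scores for two airlines). It computes the t statistic and the Welch-Satterthwaite degrees of freedom using only the standard library. The function name `welch_t_test` is illustrative and is not taken from the project.

```python
from math import sqrt
from statistics import mean, variance


def welch_t_test(a, b):
    """Return (t, df) for Welch's unequal-variance t-test.

    a, b: sequences of scores for two groups, each with at least 2 values.
    Hypothetical helper for illustration; it does not compute a p-value.
    """
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    # Squared standard error of each group mean (sample variance, n - 1 denominator).
    se1 = variance(a) / n1
    se2 = variance(b) / n2
    t = (mean(a) - mean(b)) / sqrt(se1 + se2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


if __name__ == "__main__":
    group_a = [1, 2, 3, 4]  # toy scores, not project data
    group_b = [2, 4, 6, 8]
    print(welch_t_test(group_a, group_b))
```

In practice, `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same Welch statistic and also returns a p-value.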
DATA
The data comes from Airline Quality, a platform where travelers share their experiences. The dataset is hosted on Kaggle and was assembled from pages web-scraped with BeautifulSoup. Legal and privacy compliance was confirmed, so the dataset raises no concerns. It is available in CSV format.
The raw data has 23,172 rows (23,171 observations). Data preparation replaced missing text entries with 'N/A' and missing numeric values with rounded column means. Columns irrelevant to the analysis were dropped, and the dataset was filtered to airlines with 100 reviews.
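The cleaning steps above can be sketched in pandas. This is a minimal illustration under stated assumptions. The airline column name (`airline_name`), the list of dropped columns, and the function name `clean_reviews` are all hypothetical. The text says the data was filtered to airlines "with 100 reviews", and the sketch assumes this means at least 100.

```python
import pandas as pd


def clean_reviews(df, airline_col="airline_name", drop_cols=(), min_reviews=100):
    """Hypothetical sketch of the cleaning steps described above."""
    # Drop columns judged irrelevant to the analysis (returns a new DataFrame).
    df = df.drop(columns=list(drop_cols))
    # Treat numeric columns as numbers and every other column as text.
    num_cols = df.select_dtypes(include="number").columns
    text_cols = df.columns.difference(num_cols)
    # Missing text -> 'N/A'; missing numbers -> that column's mean, rounded.
    df[text_cols] = df[text_cols].fillna("N/A")
    df[num_cols] = df[num_cols].fillna(df[num_cols].mean().round())
    # Keep only airlines with at least `min_reviews` reviews (assumed reading).
    counts = df[airline_col].value_counts()
    keep = counts[counts >= min_reviews].index
    return df[df[airline_col].isin(keep)].reset_index(drop=True)
```

Usage would look like `clean_reviews(pd.read_csv(path), drop_cols=[...])`. Because the imputation runs before the airline filter, the column means come from the full dataset, which matches the order in which the steps are described.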
After cleaning and transformation, the final dataset comprises 2,589 rows (2,588 observations) across 19 variables. It is ready for analysis aimed at helping airlines improve customer satisfaction.
APPROACH
In the "Airline Reviews Sentiment Analysis" project, the sentiment analysis began with rigorous text preprocessing. Special characters, emoticons, and irrelevant punctuation were removed or standardized to reduce noise and keep the focus on meaningful content.
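The post does not list the exact preprocessing rules, so the function below is only a minimal sketch of the kind of cleanup it describes, built with Python's `re` and `unicodedata` modules. The function name `preprocess` and the specific set of characters it keeps are assumptions. Apostrophes are kept so that negations such as "didn't" survive.

```python
import re
import unicodedata


def preprocess(text: str) -> str:
    """Hypothetical cleanup: drop emoji/special characters, tidy punctuation."""
    # Normalize compatibility characters (e.g. full-width letters) to standard forms.
    text = unicodedata.normalize("NFKC", text)
    # Standardize curly apostrophes so negations like "didn't" are kept intact.
    text = text.replace("\u2019", "'")
    # Replace anything that is not a word character, whitespace or basic punctuation.
    text = re.sub(r"[^\w\s.,!?']", " ", text)
    # Collapse runs of the same punctuation mark ("!!!" -> "!").
    text = re.sub(r"([!?.,])\1+", r"\1", text)
    # Collapse repeated whitespace.
    return re.sub(r"\s+", " ", text).strip()
```

This sketch keeps case and single exclamation marks on purpose. VADER reads punctuation such as "!", capitalization, and many emoticons as intensity cues, so stripping more aggressively can shift its scores.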
Algorithms and Methodologies
For sentiment analysis, two contrasting techniques were implemented. The primary emphasis was on the Valence Aware Dictionary and Sentiment Reasoner (VADER), and TextBlob served as the second method. VADER is a lexicon- and rule-based tool that was chosen for its strong performance on social media text. It produces polarity and intensity scores. Scores for negativity, neutrality, and positivity, together with an overall compound score, were obtained with the SentimentIntensityAnalyzer class in Python's NLTK.
A moderate positive correlation (Pearson coefficient of about 0.554) was found between the VADER and TextBlob sentiment scores. The differences are attributed to how each tool handles linguistic nuance, context, sarcasm, and negation. In practice, although VADER and TextBlob often agree, they are not interchangeable, particularly when it comes to capturing the subtleties of sentiment in airline reviews.
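The Pearson coefficient reported above measures how linearly the two tools' scores move together across the same reviews. It is the covariance of the paired scores divided by the product of their standard deviations. The function below is a minimal standard-library sketch of that formula. The toy score lists in the usage block are invented for illustration and are not the project's data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    # Sum of co-deviations (covariance numerator) and per-series spread.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    vader = [0.8, -0.6, 0.1, 0.9]      # hypothetical VADER compound scores
    textblob = [0.5, -0.2, 0.3, 0.4]   # hypothetical TextBlob polarity scores
    print(round(pearson_r(vader, textblob), 3))
```

With a DataFrame, `df["vader"].corr(df["textblob"])` computes the same quantity, because pandas uses Pearson by default. `scipy.stats.pearsonr` also returns a p-value.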
This approach laid the groundwork for meaningful insights into passenger satisfaction. Applying both algorithms gave a nuanced picture of passenger sentiment, from which the project derived actionable recommendations for improving the overall quality of air travel.