
Twitter Sentiment Analysis using machine learning

Twitter Sentiment Analysis

Problem Statement

Twitter is a popular social networking website where members create and interact with messages known as “tweets”. Tweets serve as a means for individuals to express their thoughts or feelings about different subjects. Various parties, such as consumers and marketers, have performed sentiment analysis on tweets to gather insights into products or to conduct market analysis. Furthermore, recent advances in machine learning algorithms allow us to improve the accuracy of sentiment analysis predictions.

In this report, we attempt to conduct sentiment analysis on “tweets” using various machine learning algorithms. We classify the polarity of each tweet as either positive or negative. If a tweet has both positive and negative elements, the more dominant sentiment should be picked as the final label.

We use a dataset from Kaggle that was crawled and labeled positive/negative. The data contains emoticons, usernames and hashtags, which must be processed and converted into a standard form. We also need to extract useful features from the text, such as unigrams and bigrams, which serve as a representation of the “tweet”. We then use various machine learning algorithms to conduct sentiment analysis on the extracted features.
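As a rough illustration of this feature representation, the sketch below builds a bag of unigram and bigram counts for a single tweet using only the Python standard library. It assumes the tweet has already been preprocessed into lower-case, whitespace-separated tokens; the function names `extract_ngrams` and `tweet_features` are hypothetical and not taken from the original project.

```python
from collections import Counter


def extract_ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tweet_features(tweet: str) -> Counter:
    """Bag of unigram and bigram counts for one already-preprocessed tweet.

    Assumes the tweet is lower-cased and cleaned, so splitting on
    whitespace is enough to tokenize it.
    """
    tokens = tweet.split()
    features = Counter(extract_ngrams(tokens, 1))  # unigrams, e.g. ("love",)
    features.update(extract_ngrams(tokens, 2))     # bigrams, e.g. ("i", "love")
    return features


if __name__ == "__main__":
    feats = tweet_features("i love this phone i love it")
    print(feats[("i", "love")])  # 2
    print(feats[("love",)])      # 2
```

In a full pipeline these counts might also be restricted to the most frequent n-grams in the training set and turned into fixed-length vectors before being passed to a classifier.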

However, relying on individual models did not yield high accuracy, so we pick the top few models to build a model ensemble. Ensembling is a meta-learning technique in which different classifiers are combined in order to improve prediction accuracy. Finally, we report our experimental results and findings.
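The post does not state how the top models are combined. One common option, shown here purely as an assumed sketch, is hard majority voting over the binary labels (1 = positive, 0 = negative). The `majority_vote` function and the toy models below are hypothetical placeholders for trained classifiers.

```python
from collections import Counter
from typing import Callable, Sequence

# Assumption: a "model" is any callable mapping a tweet to 1 (positive) or 0 (negative).
Model = Callable[[str], int]


def majority_vote(models: Sequence[Model], tweet: str) -> int:
    """Return the label predicted by most models (hard voting).

    Ties go to the positive class here; using an odd number of models avoids them.
    """
    votes = Counter(model(tweet) for model in models)
    return 1 if votes[1] >= votes[0] else 0


if __name__ == "__main__":
    # Hypothetical stand-ins for trained classifiers.
    def always_positive(tweet: str) -> int:
        return 1

    def always_negative(tweet: str) -> int:
        return 0

    def keyword_model(tweet: str) -> int:
        return 1 if "love" in tweet else 0

    models = [always_positive, always_negative, keyword_model]
    print(majority_vote(models, "i love it"))  # 1
    print(majority_vote(models, "i hate it"))  # 0
```

Hard voting only needs each model's predicted label; an alternative would be averaging predicted probabilities, which requires every model to output them.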

2 Data Description

The data is given as comma-separated values (CSV) files containing tweets and their corresponding sentiments. The training dataset is a CSV file of the form tweet_id,sentiment,tweet, where tweet_id is a unique integer identifying the tweet, sentiment is either 1 (positive) or 0 (negative), and tweet is the tweet text enclosed in "". Similarly, the test dataset is a CSV file of the form tweet_id,tweet. Words and emoticons contribute to predicting the sentiment, but URLs and references to people do not.

Therefore, URLs and references can be ignored. The text also contains misspelled words, extra punctuation, and words with many repeated letters, so the tweets have to be preprocessed to standardize the dataset.
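A minimal sketch of loading and preprocessing the CSV files, assuming Python's standard `csv` and `re` modules and files without a header row. The specific rules here (dropping URLs and @username mentions, keeping the hashtag word, mapping a few emoticons to the placeholder tokens `EMO_POS`/`EMO_NEG`, and shrinking runs of three or more repeated letters to two) are illustrative assumptions, not the exact rules used in the original work; `preprocess` and `load_tweets` are hypothetical names.

```python
import csv
import io
import re
from typing import Iterable, List, Optional, Tuple

# Illustrative patterns (assumptions, not the original project's rules).
URL_RE = re.compile(r"(?:https?://|www\.)\S+")   # URLs are ignored
USER_RE = re.compile(r"@\w+")                    # @username references are ignored
HASHTAG_RE = re.compile(r"#(\w+)")               # keep the word, drop the '#'
POS_EMO_RE = re.compile(r"[:=]-?(?:\)|D\b)")     # :) :-) =) :D
NEG_EMO_RE = re.compile(r"[:=]-?\(")             # :( :-( =(
REPEAT_RE = re.compile(r"([a-z])\1{2,}")         # 3+ identical letters in a row
PUNCT_RE = re.compile(r"[^\w\s]")                # anything not a word char or whitespace


def preprocess(tweet: str) -> str:
    """Convert a raw tweet into a normalized, space-separated string."""
    text = URL_RE.sub(" ", tweet)
    text = USER_RE.sub(" ", text)
    text = HASHTAG_RE.sub(r"\1", text)
    # Replace emoticons before lower-casing (":D" vs ":d") and before stripping punctuation.
    text = POS_EMO_RE.sub(" EMO_POS ", text)
    text = NEG_EMO_RE.sub(" EMO_NEG ", text)
    text = text.lower()
    text = REPEAT_RE.sub(r"\1\1", text)          # "sooooo" -> "soo"
    text = PUNCT_RE.sub(" ", text)
    return " ".join(text.split())                # collapse runs of whitespace


Row = Tuple[int, Optional[int], str]


def load_tweets(lines: Iterable[str], has_label: bool = True) -> List[Row]:
    """Parse tweet_id,sentiment,tweet rows (training) or tweet_id,tweet rows (test)."""
    rows: List[Row] = []
    for record in csv.reader(lines):  # csv handles the "..." quoting around each tweet
        if not record:                # skip blank lines
            continue
        if has_label:
            tweet_id, sentiment, text = record
            rows.append((int(tweet_id), int(sentiment), preprocess(text)))
        else:
            tweet_id, text = record
            rows.append((int(tweet_id), None, preprocess(text)))
    return rows


if __name__ == "__main__":
    sample = io.StringIO(
        '1,1,"Sooooo happy with my new phone :D @john http://t.co/xyz #blessed"\n'
        '2,0,"I hate Mondays :("\n'
    )
    for row in load_tweets(sample):
        print(row)
```

Emoticons are replaced before punctuation is stripped; otherwise `:)` and `:(` would be discarded along with the other symbols. Because each tweet is wrapped in double quotes, `csv.reader` also copes with commas inside the tweet text.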

SS InfoTech software services, academic projects and paper publication services


Author : ss infotech
