Code for this project can be found HERE
My first notebook introduces the problem space for this project by asking the question: How can we predict the sentiment of tweets directed at a professional sports team? In this case, I am using tweets that mention the key phrase ‘Dallas Mavericks’.
To answer this, I used a dataset collected by Alex Huggler, who gathered all tweets that mentioned ‘Dallas Mavericks’ during the 2021-22 basketball season. The data is hosted on Kaggle and can be found HERE.
I used the Pandas library to organize and manipulate the data. In doing so, I:
The result was a somewhat imbalanced dataset: approximately 40% of the tweets were Positive, 40% were Neutral, and only 20% were Negative.
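A class split like this can be checked in Pandas with `value_counts(normalize=True)`, which returns each label's share of the rows. The sketch below is illustrative only: it uses a tiny hypothetical DataFrame with a hypothetical `sentiment` column whose labels mimic the roughly 40/40/20 split above, not the actual Kaggle data.

```python
import pandas as pd

# Hypothetical toy data (NOT the real tweets): 4 Positive, 4 Neutral, 2 Negative,
# chosen only to mirror the approximate 40/40/20 split described above.
# The column name "sentiment" is an assumed placeholder.
df = pd.DataFrame({
    "sentiment": ["Positive"] * 4 + ["Neutral"] * 4 + ["Negative"] * 2
})

# Share of rows per class (values sum to 1.0)
class_share = df["sentiment"].value_counts(normalize=True)
print(class_share.round(2))

# Ratio of the largest class share to the smallest: a quick imbalance summary
imbalance_ratio = class_share.max() / class_share.min()
print(f"Majority/minority ratio: {imbalance_ratio:.1f}")
```

On the real dataset, the same two lines applied to its sentiment label column would show how strongly the Negative class is under-represented relative to the other two.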
With the dataset cleaned, the next step was to apply Natural Language Processing and feature engineering to prepare it for modelling. This was carried out in Part 2 of this project.