What do songs talk about? What is the sentiment of their lyrics? Does it change with the genre or the artist? This project tries to shed some light on these and other questions.
I created this project for my Mid-Bootcamp task at IronHack.
It came with some requirements and restrictions:
1. Collect the data myself (downloading ready-made datasets was not allowed).
2. The dataset should have between 30 and 100 observations (rows) and 5 to 10 features (columns).
3. I could enrich the dataset with information obtained by methods other than manual typing (for example, web scraping).
4. Complete one analysis that answers the questions I set out to solve with this project, supplemented with some hypotheses.
My project
My questions were about the lyrics of the songs that we listen to every day:
1. Do the lyrics have an overall positive sentiment?
2. Are women's lyrics more positive than men's?
3. Are pop lyrics better than hip hop lyrics?
My solution
Python
- Creation of the dataset using Python:
I decided to collect information from Spotify Charts because I wanted to analyze lyrics globally, so I chose the top artists of week 47 of 2022. I typed the artists from that chart into a file and used Last.fm to add more information by hand, such as gender, main genre and whether the artist is a band.
Then I completed the dataset by retrieving the 10 most popular songs of every artist through the Spotify API.
After that, I used the lyricsgenius library to connect to Genius and download the lyrics of 5 of each artist's 10 most popular Spotify songs (I started from 10 because a song title on Spotify does not always have a matching title on Genius).
At this point, I also created a function with Selenium to get the connection token, in case the token used to connect to Genius changes.
Right now the token is stored in a file (secrets.txt), but it could be obtained without storing it.
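The "5 lyrics out of 10 songs" step can be sketched roughly as follows. This is only an illustration under assumptions, not the code from the repository: `clean_title`, `pick_lyrics` and the suffix pattern are hypothetical names and rules, and `fetch_lyrics` stands for any lookup that returns the lyrics or `None` (in the project it would be a wrapper around lyricsgenius, for example around `Genius.search_song`). Here a small dictionary plays the role of Genius so the example runs on its own.

```python
import re

# Suffixes that a Spotify title may carry and a Genius title may not.
# This pattern is an illustrative guess, not the project's actual rule.
_SUFFIX = re.compile(r"\s*(\(.*?\)|\[.*?\]|\s-\s.*)$")


def clean_title(title):
    """Drop trailing '(...)', '[...]' and ' - ...' suffixes from a song title."""
    cleaned = title
    previous = None
    while previous != cleaned:
        previous = cleaned
        cleaned = _SUFFIX.sub("", cleaned).strip()
    # If stripping removed everything, fall back to the original title.
    return cleaned or title.strip()


def pick_lyrics(titles, fetch_lyrics, wanted=5):
    """Return up to `wanted` (title, lyrics) pairs, trying titles in the given order."""
    found = []
    for title in titles:
        lyrics = fetch_lyrics(clean_title(title))  # hypothetical lookup, may return None
        if lyrics:
            found.append((title, lyrics))
        if len(found) == wanted:
            break
    return found


# Stand-in for Genius: maps a cleaned title to placeholder lyrics.
fake_genius = {"Unholy": "lyrics a", "Anti-Hero": "lyrics b"}
top = ["Unholy (feat. Kim Petras)", "Missing Song", "Anti-Hero - Acoustic"]
print(pick_lyrics(top, fake_genius.get, wanted=2))
```

Passing the lookup in as a function keeps the selection logic independent of the Genius connection, so it can be tried without a token.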
- Sentiment Analysis:
Once I had downloaded all the information, I needed to preprocess it for the sentiment analysis (using the Flair library) and the natural language processing (using NLTK).
- Translations:
Since the songs were in different languages, I decided to translate everything into English to simplify the analysis. For this, I used the fastText library with its pre-trained model to detect the language of the lyrics, and then translated them into English with the translators library, falling back to Google Translator when that translation fails.
- Top words:
I calculated the top words with NLTK functions and manually extended its stop word list with more words that were not interesting for my analysis.
- Wordcloud:
I also created a function to generate a word cloud with the wordcloud library and PIL, using different shapes that I downloaded.
- Hypothesis analysis:
In another Jupyter notebook, I developed the hypothesis analysis and prepared the dataset for the visual analysis with Tableau.
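As an illustration of how a question such as "are women's lyrics more positive than men's?" can be checked, here is a minimal sketch of a one-sided permutation test written with the standard library only. It is an assumption for the sake of example: this write-up does not say which statistical test the notebook uses, the function name `permutation_test` is hypothetical, the scores below are invented, and encoding each song as a single signed sentiment score between -1 and 1 is also an assumption.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_resamples=5000, seed=0):
    """One-sided permutation test: is mean(group_a) greater than mean(group_b)?

    Returns the observed difference in means and an estimated p-value.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 0
    for _ in range(n_resamples):
        # Shuffle the labels: under the null hypothesis the grouping does not matter.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if diff >= observed:
            at_least_as_large += 1
    # Add-one correction so the estimated p-value is never exactly 0.
    p_value = (at_least_as_large + 1) / (n_resamples + 1)
    return observed, p_value


# Invented signed sentiment scores, one per song (not project data).
scores_a = [0.9, 0.8, 0.85, 0.95, 0.9]
scores_b = [0.1, 0.2, 0.15, 0.05, 0.1]
difference, p_value = permutation_test(scores_a, scores_b)
print(f"difference in means: {difference:.2f}, p-value: {p_value:.4f}")
```

A permutation test is convenient for a dataset of 30 to 100 rows because it makes no assumption about the distribution of the scores. A small p-value suggests that the observed difference would be unlikely if the grouping were irrelevant.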
Tableau
I used Tableau Public to create the presentation and the visual analysis of the project, both of which can be found in my Tableau.
Flask
The project includes a Flask demo that lets you search for a song by an artist and shows its lyrics along with the top words translated into English, the sentiment analysis and a word cloud.
To summarize, here are the links for my project:
- Visualizations: My Tableau
- Source code (including the Flask app): My Github
- Presentation video: My Youtube