The Happiest, Saddest, Most Energetic and Most Popular Persian Singers on Spotify

Introduction

I am a music lover, and as with my other hobbies, I am really interested in applying data science methods to it. A few months ago I participated in the third week of the TidyTuesday project, where I made a map of Spotify songs based on their audio features and a dimensionality reduction algorithm called UMAP. Since then I have been using Spotify’s Web API to collect data, and recently I decided to look at some of my favorite Iranian artists and their songs on Spotify. Persian music spans many genres, and while pop and rap are very popular among the younger generation, I like the traditional style more. Nevertheless, I have always been curious to understand how different traditional music and pop music are from each other. These are a few questions that I would like to answer:

  1. How much do audio features differ among top Persian singers?
  2. What are the most danceable and least danceable Persian songs?
  3. Who is the most popular Persian singer and what is the most popular song?

library(kableExtra)
library(tidyverse)
library(googlesheets4)
library(tidymodels)
library(gghighlight)
library(hrbrthemes)
library(ggthemes)
library(ggrepel)
library(ggalt)
library(extrafont)
library(ggtext)
library(ggforce)
library(cowplot)

Data Collection

I manually compiled a list of Persian singers and collected information about their available songs on Spotify using spotifyr, an R package that gives us access to Spotify’s API. This process was cumbersome because I did not always get what I was looking for. For instance, the search sometimes returned songs that belonged to an unrelated artist. For each singer, we can only retrieve the 10 most popular songs, which means that the rest of the songs have no popularity score. Nevertheless, I collected several types of information about more than 10,000 songs (of course, some of them are duplicates in Spotify’s playlists or just remixes).

songs_audio_plus_pop <- read_csv('https://raw.githubusercontent.com/mcnakhaee/datasets/master/Persian_Songs_Spotify.csv')

head(songs_audio_plus_pop) %>% 
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
track_id | poet | lyrics | lyrics source | disc_number | duration_ms | explicit | track_name | track_name_farsi | artist_name | artist_name_farsi | popularity | track_number | album_href | album_id | album_name | album_release_date | album_total_tracks | album_release_year | track_href | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | time_signature | key_name | mode_name | key_mode | artist_id | lyrics_1 | poet_1 | lyric source | genre
31iPeC6I0AiRW8InOxNKzm | NA | NA | NA | 1 | 446880 | FALSE | Ghazale Taze | NA | Salar Aghili | سالار عقیلی | NA | 1 | NA | 6GcmAWrnnMb2BuVriPhBLa | Va Eshgh Amad | 2020-02-03 | NA | 2020 | https://api.spotify.com/v1/tracks/31iPeC6I0AiRW8InOxNKzm | 0.437 | 0.390 | 0 | -7.170 | 0 | 0.0299 | 0.839 | 3.51e-05 | 0.1360 | 0.330 | 131.913 | 3 | C | minor | C minor | NA | NA | NA | NA | NA
4Fi46ha8teWYTwk0b8fNPi | NA | NA | NA | 1 | 851920 | FALSE | Ayeeneye Hosn | NA | Salar Aghili | سالار عقیلی | NA | 2 | NA | 6GcmAWrnnMb2BuVriPhBLa | Va Eshgh Amad | 2020-02-03 | NA | 2020 | https://api.spotify.com/v1/tracks/4Fi46ha8teWYTwk0b8fNPi | 0.379 | 0.146 | 5 | -10.008 | 1 | 0.0414 | 0.970 | 3.60e-04 | 0.0812 | 0.346 | 105.634 | 4 | F | major | F major | NA | NA | NA | NA | NA
0lQAe6EslKA7CUsS7SCW6Q | NA | NA | NA | 1 | 293160 | FALSE | Tarke Eshgh | NA | Salar Aghili | سالار عقیلی | NA | 3 | NA | 6GcmAWrnnMb2BuVriPhBLa | Va Eshgh Amad | 2020-02-03 | NA | 2020 | https://api.spotify.com/v1/tracks/0lQAe6EslKA7CUsS7SCW6Q | 0.437 | 0.453 | 5 | -5.392 | 0 | 0.0349 | 0.664 | 2.07e-03 | 0.1100 | 0.501 | 94.651 | 5 | F | minor | F minor | NA | NA | NA | NA | NA
6dAFmJdVsKk5ksCpGqnKgO | NA | NA | NA | 1 | 648720 | FALSE | Moghbacheye Bade Foroosh | NA | Salar Aghili | سالار عقیلی | NA | 4 | NA | 6GcmAWrnnMb2BuVriPhBLa | Va Eshgh Amad | 2020-02-03 | NA | 2020 | https://api.spotify.com/v1/tracks/6dAFmJdVsKk5ksCpGqnKgO | 0.488 | 0.138 | 2 | -12.287 | 0 | 0.0451 | 0.915 | 6.58e-03 | 0.2120 | 0.445 | 110.967 | 5 | D | minor | D minor | NA | NA | NA | NA | NA
4VSDJGyEdSMB8UL4fDSCvv | NA | NA | NA | 1 | 273480 | FALSE | Bigharar | NA | Salar Aghili | سالار عقیلی | NA | 5 | NA | 6GcmAWrnnMb2BuVriPhBLa | Va Eshgh Amad | 2020-02-03 | NA | 2020 | https://api.spotify.com/v1/tracks/4VSDJGyEdSMB8UL4fDSCvv | 0.301 | 0.443 | 0 | -5.702 | 0 | 0.0334 | 0.657 | 8.50e-06 | 0.1200 | 0.410 | 148.053 | 1 | C | minor | C minor | NA | NA | NA | NA | NA
1tqsOZ3fGtMXL0r2ySBpvA | NA | NA | NA | 1 | 260754 | FALSE | Negar | NA | Salar Aghili | سالار عقیلی | NA | 1 | NA | 09Hepb4NioQ6sO87tsDyiz | Negar | 2019-10-30 | NA | 2019 | https://api.spotify.com/v1/tracks/1tqsOZ3fGtMXL0r2ySBpvA | 0.577 | 0.366 | 0 | -6.668 | 0 | 0.0368 | 0.834 | 3.90e-06 | 0.1110 | 0.367 | 77.453 | 3 | C | minor | C minor | NA | NA | NA | NA | NA
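
Because only each singer’s top tracks come with a popularity score, it can be useful to check how many songs per artist actually have one. The following is a minimal sketch on made-up toy data: the artist names and values are hypothetical, and it only assumes that the columns are called artist_name and popularity, as in the table above.

```r
library(dplyr)

# Toy data with made-up values (not taken from the Spotify dataset)
toy_songs <- tibble(
  artist_name = c("Artist A", "Artist A", "Artist A", "Artist B", "Artist B"),
  popularity  = c(55, 40, NA, NA, NA)
)

# Count all songs and the songs that have a popularity score, per artist
popularity_coverage <- toy_songs %>%
  group_by(artist_name) %>%
  summarise(
    n_songs           = n(),
    n_with_popularity = sum(!is.na(popularity)),
    .groups = "drop"
  )

popularity_coverage
```

Running the same summary on songs_audio_plus_pop would show which singers have any scored songs at all before they are filtered later on.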

Overall Song Features

Aside from variables such as the album that a song belongs to and its release date, Spotify’s API gives us several features that capture different audio characteristics of a song.

You can see a full list of these features in this link. However, I’m only interested in some of these features such as:

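  • valence measures the happiness (musical positiveness) of a song, on a scale from 0 to 1.
  • energy measures how intense and active a song feels, on a scale from 0 to 1.
  • tempo measures the speed of a song in beats per minute (BPM).
  • loudness measures the overall loudness of a song in decibels (dB).
  • acousticness is a confidence score from 0 to 1 that the track is acoustic.
  • instrumentalness predicts whether a track contains no vocals; values closer to 1 suggest an instrumental track.
  • danceability describes how suitable a song is for dancing, on a scale from 0 to 1.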

This wonderful visualization inspired me to create a similar plot for some of the most well-known Persian singers and see how their audio features differ from each other.

artists <-
  c('Sirvan Khosravi', 'Hesameddin Seraj', 'Rastak', 'Shahram Nazeri', 'Hossein Alizadeh', 'Reza Sadeghi', 'Alireza Eftekhari', 'Mohammadreza Shajarian',
    'Salar Aghili', 'Morteza Pashaei', 'Alireza Ghorbani', 'Homayoun Shajarian', 'Mohsen Yeganeh', 'Moein', 'Farzad Farzin',
    'Babak Jahanbakhsh', 'Ehsan Khajeh Amiri', 'Siavash Ghomayshi', 'Xaniar Khosravi', 'Tohi', 'Mohsen Chavoshi', 'Amir Tataloo',
    'Hamed Homayoun', 'Kayhan Kalhor')

I’ll plot the average, the minimum and the maximum value of each feature for each singer. That gives us a good picture of how different their audio characteristics are from each other. However, we must make the right adjustments to the dataset before visualizing it:

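  1. We need to transform the original dataset into a long format, which can be done with pivot_longer from the tidyr package.

  2. We should rescale each audio feature to a common range, because the features are measured on different scales (for example, tempo is in beats per minute and loudness is in decibels); otherwise the plot wouldn’t make any sense.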
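
To make the two adjustments concrete, here is a minimal sketch on a toy tibble. The song names and values are made up for illustration, and the target range of 0 to 7 is simply the one used later in this post. It assumes the dplyr, tidyr and scales packages are installed.

```r
library(dplyr)
library(tidyr)

# Toy data with made-up values (not taken from the Spotify dataset)
toy_songs <- tibble(
  track_name = c("song_a", "song_b", "song_c"),
  valence    = c(0.2, 0.5, 0.8),   # already between 0 and 1
  tempo      = c(80, 120, 160),    # beats per minute
  loudness   = c(-12, -8, -4)      # decibels
)

features <- c("valence", "tempo", "loudness")

toy_long <- toy_songs %>%
  # Rescale every feature to the same 0-7 range
  mutate(across(all_of(features), ~ scales::rescale(.x, to = c(0, 7)))) %>%
  # Reshape to long format: one row per song and feature
  pivot_longer(cols = all_of(features), names_to = "metric", values_to = "value")

toy_long
```

After rescaling, the smallest value of every feature maps to 0 and the largest to 7, so features as different as tempo and valence can share one axis.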

order <- c(
  "valence",
  "energy",
  "tempo",
  "loudness",
  "acousticness",
  "instrumentalness",
  "danceability"
)

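scaled_features_long <- songs_audio_plus_pop %>%
  # Rescale every feature to a common 0-7 range (computed over the whole dataset)
  mutate(across(all_of(order), ~ scales::rescale(.x, to = c(0, 7)))) %>%
  # Keep only songs that have a popularity score
  filter(!is.na(popularity)) %>%
  filter(artist_name %in% artists) %>%
  mutate(artist_name = factor(artist_name)) %>%
  # Reshape to one row per song and feature; all_of(order) also covers
  # instrumentalness, which has its own label on the plot's x axis
  pivot_longer(
    cols = all_of(order),
    names_to = 'metric',
    values_to = 'value')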

Now we can visualize the results for each artist. As I mentioned before, I’ll compare artists by the minimum (red), the average (orange) and the maximum (yellow) values of each audio feature in their songs.

ggplot() +
  # Average of each audio feature per artist
  geom_polygon(
    data = scaled_features_long %>%
      group_by(artist_name, metric) %>%
      summarise(value = mean(value), .groups = 'drop') %>%
      arrange(factor(metric, levels = order)),
    aes(x = metric, y = value, group = artist_name),
    alpha = .54,
    size = 1.5,
    show.legend = TRUE,
    fill = '#FF1654'
  ) +
  # Maximum of each audio feature per artist
  geom_polygon(
    data = scaled_features_long %>%
      group_by(artist_name, metric) %>%
      summarise(value = max(value), .groups = 'drop') %>%
      arrange(factor(metric, levels = order)),
    aes(x = metric, y = value, group = artist_name),
    alpha = .44,
    size = 1.5,
    show.legend = TRUE,
    fill = '#FFE066'
  ) +
  # Minimum of each audio feature per artist
  geom_polygon(
    data = scaled_features_long %>%
      group_by(artist_name, metric) %>%
      summarise(value = min(value), .groups = 'drop') %>%
      arrange(factor(metric, levels = order)),
    aes(x = metric, y = value, group = artist_name),
    alpha = .84,
    size = 1.5,
    show.legend = TRUE,
    fill = '#EF476F'
  ) +
  scale_x_discrete(
    limits = order,
    labels = c(
      "Happy",
      "Energy",
      "Fast",
      "Loud",
      "Acoustic",
      "Instrumental",
      "Danceable"
    )
  ) +
  coord_polar(clip = 'off') +
  theme_minimal() +
  labs(title = "Persian Singers and Their Audio Characteristics",
       caption = 'Source: Spotify \n Visualization: mcnakhaee') +
  ylim(0, 8) +
  facet_wrap( ~ artist_name, ncol = 4) +
  theme(
    axis.title = element_blank(),
    axis.ticks = element_blank(),
    axis.text.y = element_blank(),
    axis.text.x = element_text(
      family =  'Montserrat',
      size = 13.5,
      margin = ggplot2::margin(30, 0, 20, 0)
    ),
    plot.caption = element_text(
      family = 'Montserrat',
      margin = ggplot2::margin(30, 0, 20, 0),
      size = 11,
      color = 'grey80'
    ) ,
    text = element_text(family =  'Montserrat'),
    strip.text = element_text(family =  'Montserrat', size = 18),
    strip.text.x = element_text(margin = ggplot2::margin(1, 1, 1, 1, "cm")),
    panel.spacing = unit(3.5, "lines"),
    panel.grid = element_blank(),
    plot.title = element_text(
      family = 'Montserrat',
      hjust = .5,
      margin = ggplot2::margin(30, 0, 20, 0),
      size = 32,
      color = 'gray10'
    )
  )
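
The plot above computes the mean, the maximum and the minimum in three separate pipelines. As a rough sketch, the same three summaries can also be computed in one pass, which makes it easy to inspect the numbers behind each polygon. The toy data below is made up, and the column names value_min, value_mean and value_max are hypothetical names chosen for this example.

```r
library(dplyr)

# Toy long-format data with made-up values
toy_long <- tibble(
  artist_name = rep(c("Artist A", "Artist B"), each = 4),
  metric      = rep(c("valence", "energy"), times = 4),
  value       = c(1, 2, 3, 6, 2, 4, 4, 6)
)

# One row per artist and feature, with all three summaries side by side
feature_summary <- toy_long %>%
  group_by(artist_name, metric) %>%
  summarise(
    value_min  = min(value),
    value_mean = mean(value),
    value_max  = max(value),
    .groups = "drop"
  )

feature_summary
```

Each row of feature_summary then corresponds to one corner of an artist’s three polygons.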

Looking more closely at each audio feature

My first plot is informative, but it only gives us an overall picture of the audio features. I would like a more detailed view of the singers and the audio features of each of their songs. For this reason, I will also make a separate plot for each audio feature, where every song and its corresponding feature value are shown. I will also mark a few popular songs from each artist with a separate color on these plots.

# Set a custom theme for our plots
theme_set(theme_void() +
  theme(
    text = element_text(family =  'Montserrat'),
    axis.text.x = element_text(
      family = 'Montserrat',
      margin = ggplot2::margin(30, 0, 20, 0),
      color = 'gray80',
      size = 18
    ),
    axis.text.y = element_text(
      family = 'Montserrat',
      margin = ggplot2::margin(30, 0, 20, 20),
      color = 'gray80',
      size = 20
    ),
    axis.title.x = element_text(
      family = 'Montserrat',
      margin = ggplot2::margin(30, 0, 20, 0),
      size = 22,
      color = 'gray80'
    ),
    plot.title = element_text(
      family = 'Montserrat',
      hjust = .5,
      margin = ggplot2::margin(40, 0, 40, 0),
      size = 35,
      color = 'gray80'
    ),
    plot.caption = element_text(family ='Montserrat',
                                  margin = ggplot2::margin(30, 0, 20, 20),
                                      size = 20,
                                  color = 'gray70') ,
    legend.position = 'none',
    plot.background = element_rect(fill = "#516869")
  ))

Here again, I’ll reshape the dataset to make it ready for visualization.

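songs_audio_plus_pop_jitter <- songs_audio_plus_pop %>% 
  filter(artist_name %in% artists) %>% 
  # Only an artist's top tracks have a popularity score
  mutate(is_popular = !is.na(popularity)) %>%
  # Drop duplicated tracks of the same artist
  distinct(artist_name, track_name, .keep_all = TRUE) %>% 
  # Point size and transparency, with fixed fallbacks for songs without a score
  mutate(is_popular_size = if_else(!is.na(popularity), popularity, 25),
         is_popular_alpha = if_else(!is.na(popularity), 0.8, 0.5)) %>% 
  mutate(track_name = str_wrap(track_name, width = 15)) %>% 
  # Label only popular, non-explicit tracks that have a Farsi name and a short title
  mutate(popular_track_name = if_else(!is.na(track_name_farsi) & !is.na(popularity) & nchar(track_name) < 20 & !explicit, track_name, ''))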
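
To see what this preparation does, here is a small sketch on made-up toy data. It keeps only the two core ideas, dropping duplicated tracks and labelling songs that have a popularity score; the label column is a hypothetical simplification of popular_track_name.

```r
library(dplyr)

# Toy data with made-up values: "Song 1" appears twice for the same artist
toy_songs <- tibble(
  artist_name = c("Artist A", "Artist A", "Artist A", "Artist B"),
  track_name  = c("Song 1", "Song 1", "Song 2", "Song 3"),
  popularity  = c(40, 40, NA, NA)
)

toy_flagged <- toy_songs %>%
  # Keep the first row of every artist/track combination
  distinct(artist_name, track_name, .keep_all = TRUE) %>%
  mutate(
    # A song counts as popular when it has a popularity score
    is_popular = !is.na(popularity),
    # Only popular songs get a text label; the rest get an empty string
    label = if_else(is_popular, track_name, "")
  )

toy_flagged
```

The empty-string labels matter later on: the text layer draws nothing for them, so only the popular songs are annotated.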

Happiness

songs_audio_plus_pop_jitter %>%
  ggplot(aes(x = artist_name, y = valence)) +
  geom_jitter(
    aes(
      color = is_popular,
      size = is_popular_size,
      alpha = is_popular_alpha
    ),
    size = 6,
    width = 0.2
  ) +
  geom_text_repel(
    aes(label = popular_track_name , x = artist_name , y = valence),
    family = 'Montserrat',
    color = 'gray99',
    size = 5,
    force = 0.6,
    max.iter = 2000,
    box.padding = 0.4,
    point.padding = 0.6,
    min.segment.length = 0.15,
    nudge_y      = 0.001,
    hjust = 0.5,
    segment.alpha = 0.6,
    segment.size = 0.6
  ) +
  stat_summary(
    fun = mean,
    geom = 'point',
    color = '#FF9F1C',
    size = 5,
    aes(group = artist_name)) +
  scale_color_manual(values = c('#FFD166', '#EF476F')) +
  scale_y_continuous(sec.axis = dup_axis()) +
  coord_flip()

Energy

songs_audio_plus_pop_jitter %>%
  ggplot(aes(x = artist_name, y = energy)) +
  geom_jitter(
    aes(
      color = is_popular,
      size = is_popular_size,
      alpha = is_popular_alpha
    ),
    size = 6,
    width = 0.2
  ) +
  geom_text_repel(
    aes(label = popular_track_name , x = artist_name , y = energy),
    family = 'Montserrat',
    color = 'gray90',
    size = 6,
    force = 0.6,
    max.iter = 2000,
    box.padding = 0.4,
    point.padding = 0.6,
    min.segment.length = 0.15,
    nudge_y      = 0.001,
    hjust = 0.5,
    segment.alpha = 0.6,
    segment.size = 0.6
  ) +
  stat_summary(
    fun = mean,
    geom = 'point',
    color = '#FF9F1C',
    size = 5,
    aes(group = artist_name)
  ) +
  scale_color_manual(values = c('#EF476F', '#EF476F')) +
  scale_y_continuous(sec.axis = dup_axis()) +
  coord_flip() 

Acousticness

songs_audio_plus_pop_jitter %>%
  ggplot(aes(x = artist_name, y = acousticness)) +
  geom_jitter(
    aes(
      color = is_popular,
      size = is_popular_size,
      alpha = is_popular_alpha
    ),
    size = 6,
    width = 0.2
  ) +
  geom_text_repel(
    aes(label = popular_track_name , x = artist_name , y = acousticness),
    family = 'Montserrat',
    color = 'gray90',
    size = 6,
    force = 0.6,
    max.iter = 2000,
    box.padding = 0.4,
    point.padding = 0.6,
    min.segment.length = 0.15,
    nudge_y      = 0.001,
    hjust = 0.5,
    segment.alpha = 0.6,
    segment.size = 0.6
  ) +
  stat_summary(
    fun = mean,
    geom = 'point',
    color = '#FF9F1C',
    size = 5,
    aes(group = artist_name)
  ) +
  scale_color_manual(values = c('#118AB2', '#06D6A0')) +
  scale_y_continuous(sec.axis = dup_axis()) +
  coord_flip() 

Danceability

songs_audio_plus_pop_jitter %>%
  ggplot(aes(x = artist_name, y = danceability)) +
  geom_jitter(
    aes(
      color = is_popular,
      size = is_popular_size,
      alpha = is_popular_alpha
    ),
    size = 6,
    width = 0.2
  ) +
  geom_text_repel(
    aes(label = popular_track_name , x = artist_name , y = danceability),
    family = 'Montserrat',
    color = 'gray90',
    size = 6,
    force = 0.6,
    max.iter = 2000,
    box.padding = 0.4,
    point.padding = 0.6,
    min.segment.length = 0.15,
    nudge_y      = 0.001,
    hjust = 0.5,
    segment.alpha = 0.6,
    segment.size = 0.6
  ) +
  stat_summary(
    fun = mean,
    geom = 'point',
    color = '#FF9F1C',
    size = 5,
    aes(group = artist_name)
  ) +
  scale_color_manual(values = c('#A5668B', '#EF476F')) +
  scale_y_continuous(sec.axis = dup_axis()) +
  coord_flip()
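
The plot shows where each song falls, but the most and least danceable songs can also be listed directly. A minimal sketch, assuming a data frame with track_name and danceability columns; the toy values below are made up.

```r
library(dplyr)

# Toy data with made-up values (not taken from the Spotify dataset)
toy_songs <- tibble(
  track_name   = c("Song 1", "Song 2", "Song 3", "Song 4"),
  danceability = c(0.31, 0.82, 0.55, 0.12)
)

# The two songs with the highest danceability, highest first
most_danceable <- toy_songs %>% slice_max(danceability, n = 2)

# The two songs with the lowest danceability, lowest first
least_danceable <- toy_songs %>% slice_min(danceability, n = 2)

most_danceable
least_danceable
```

The same two calls could be applied to songs_audio_plus_pop_jitter, optionally after group_by(artist_name) to get the extremes per singer.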

Loudness

songs_audio_plus_pop_jitter %>%
  ggplot(aes(x = artist_name, y = loudness)) +
  geom_jitter(
    aes(
      color = is_popular,
      size = is_popular_size,
      alpha = is_popular_alpha
    ),
    size = 6,
    width = 0.2
  ) +
  geom_text_repel(
    aes(label = popular_track_name , x = artist_name , y = loudness),
    family = 'Montserrat',
    color = 'gray90',
    size = 6,
    force = 0.6,
    max.iter = 2000,
    box.padding = 0.4,
    point.padding = 0.6,
    min.segment.length = 0.15,
    nudge_y      = 0.001,
    hjust = 0.5,
    segment.alpha = 0.6,
    segment.size = 0.6
  ) +
  stat_summary(
    fun = mean,
    geom = 'point',
    color = '#FF9F1C',
    size = 5,
    aes(group = artist_name)
  ) +
  scale_color_manual(values = c('#06D6A0', '#EF476F')) +
  scale_y_continuous(sec.axis = dup_axis()) +
  coord_flip() 
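
The five plots above repeat the same code and differ only in the y variable and the colors. One way to reduce the repetition is a small helper function. The sketch below is a simplified, hypothetical version: plot_feature is not part of the original post, it omits the text labels and the custom theme, and it assumes the data has artist_name and is_popular columns.

```r
library(ggplot2)

# Hypothetical helper: jitter plot of one audio feature per artist
plot_feature <- function(data, feature, colors = c("#FFD166", "#EF476F")) {
  ggplot(data, aes(x = artist_name, y = .data[[feature]])) +
    # One point per song, colored by whether it has a popularity score
    geom_jitter(aes(color = is_popular), size = 6, width = 0.2, alpha = 0.6) +
    # Orange point marking each artist's mean value
    stat_summary(
      fun = mean,
      geom = "point",
      color = "#FF9F1C",
      size = 5,
      aes(group = artist_name)
    ) +
    scale_color_manual(values = colors) +
    coord_flip()
}

# Toy data with made-up values
toy_songs <- data.frame(
  artist_name = c("Artist A", "Artist A", "Artist B", "Artist B"),
  valence     = c(0.2, 0.6, 0.4, 0.9),
  is_popular  = c(TRUE, FALSE, FALSE, TRUE)
)

p <- plot_feature(toy_songs, "valence")
p
```

With such a helper, a call like plot_feature(songs_audio_plus_pop_jitter, "energy") would stand in for one of the blocks above, with the text labels added as an extra layer if needed.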

Muhammad Chenariyan Nakhaee
Machine Learning Researcher

I am Muhammad, a data scientist and machine learning enthusiast.
