Approach

To develop a classifier that can distinguish music genres, we prepared data for three genres: Country, Rap and Jazz. We generated data for 400 songs per genre and used it to train the classifiers. From this data we built three datasets and trained one model on each: one on (Country, Rap), another on (Country, Jazz), and a third on (Country, Rap, Jazz). For the attributes, we applied a text mining technique, TF-IDF, to the lyrics to extract the features (the important keywords) of each song. TF-IDF weighs each keyword in a song and assigns it an importance based on how frequently it appears in that song. It also down-weights keywords that appear frequently but in fact have low relevance to that song, by taking into account how often the keyword appears in the other songs. After obtaining the TF-IDF matrix of all the song lyrics, we tried several different classifiers, including Naïve Bayes, Logistic Regression, and a Multilayer Perceptron (MLP) using scikit-learn, and a DNN using Keras. For validation, we used 10-fold cross-validation.
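As a rough illustration of the TF-IDF weighting described above, the sketch below computes the weights by hand in plain Python. It is not the code used in this project: the function name `tfidf`, the whitespace tokenizer, the unsmoothed formula tf × log(N / df), and the toy lyrics are all assumptions made for the example. Library implementations such as scikit-learn's `TfidfVectorizer` use a smoothed and normalized variant by default, so their exact numbers differ.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document (hypothetical helper).

    tf  = occurrences of the term in the document / tokens in the document
    idf = log(N / df), where df is the number of documents containing the term
    """
    # Assumption: lowercase and split on whitespace; no stemming or stop words.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many songs does each term appear at least once?
    df = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy lyrics invented for illustration only (not taken from the datasets).
lyrics = [
    "love my truck love my road",
    "love the money money money",
    "love the blue night",
]

weights = tfidf(lyrics)

# Print the terms of the second song, highest weight first.
for term, weight in sorted(weights[1].items(), key=lambda kv: -kv[1]):
    print(f"{term:>6}  {weight:.3f}")
```

In this toy corpus "love" appears in every song, so its idf is log(3/3) = 0 and it receives no weight, while "money", which is frequent in one song and absent from the others, receives the largest weight in that song. Each song's weight dictionary corresponds to one row of the TF-IDF matrix that is passed to the classifiers.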

Detailed implementation: