
Result

 

 

Naïve Bayes Classifier

The naïve Bayes classifier is a simple and efficient linear classifier. Its probabilistic model is based on Bayes' theorem together with the assumption that every word is conditionally independent of the others given the class variable. Naïve Bayes can perform well under this assumption, especially for small sample sizes. In our case, however, the words in a song's lyrics are clearly not independent of one another: the words in a sentence relate to each other to form a meaningful whole.

 

Because the independence assumption is violated, NB gives relatively low accuracy compared to the other classifiers. We also noticed that NB performed particularly badly on the (Jazz, Rap) and (Jazz, Country) datasets. Printing out the misclassified test examples, we found that on the (Jazz, Rap) dataset nearly all errors were Jazz songs mislabeled as Rap, and on the (Jazz, Country) dataset nearly all errors were Jazz songs mislabeled as Country. No other classifier showed this pattern. The reason may be that the vocabulary of Jazz lyrics is much smaller than that of Rap or Country, so many words from Rap or Country songs never appear in Jazz lyrics; each such unseen word contributes only a tiny smoothed probability, driving the product of word likelihoods for the Jazz class down. This would explain why Jazz songs are consistently misclassified into the other genre.
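As a sketch, a multinomial naive Bayes lyric classifier of the kind described might be set up as below. The lyric snippets and genre labels are toy illustrations, not the project's data; the point is how unseen words receive only a small Laplace-smoothed probability.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy lyric snippets and genre labels -- illustrative only,
# not the project's actual dataset
lyrics = [
    "smooth saxophone midnight blue moon",
    "trumpet swing blue night moon",
    "money cars street hustle flow",
    "street flow beat hustle rhymes",
]
genres = ["Jazz", "Jazz", "Rap", "Rap"]

# Bag-of-words counts feed multinomial naive Bayes, which multiplies
# per-word likelihoods under the conditional-independence assumption;
# words unseen in a class contribute only a small smoothed probability
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(lyrics, genres)

print(model.predict(["saxophone under the moon"])[0])  # -> Jazz
```

Words that never occur in one class shrink that class's likelihood product, which is the effect described above for Jazz.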


Logistic Regression
 
 

Logistic regression predicts P(Y|X) directly from the training data. In contrast to naïve Bayes, it is a discriminative classifier and does not need the assumption that the features are conditionally independent given the class variable. Its weights are fit by gradient-based optimization. As a result, LR achieves better accuracy than NB in our experiments.
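A minimal sketch of such a classifier on the same toy data (note that scikit-learn's LogisticRegression defaults to the lbfgs solver, a gradient-based quasi-Newton method, rather than plain gradient descent):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy lyric snippets and genre labels -- illustrative only
lyrics = [
    "smooth saxophone midnight blue moon",
    "trumpet swing blue night moon",
    "money cars street hustle flow",
    "street flow beat hustle rhymes",
]
genres = ["Jazz", "Jazz", "Rap", "Rap"]

# Logistic regression models P(Y|X) directly, so no conditional
# independence assumption on the word features is required
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(lyrics, genres)

print(model.predict(["street hustle beat"])[0])  # -> Rap
```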

 

 


SGD Classifier (SVM)
 

The SGD classifier implements a linear classifier trained with stochastic gradient descent (SGD). In our case we use the default hinge loss, which makes the classifier a linear support vector machine (SVM).
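Again as a toy-data sketch: with scikit-learn's SGDClassifier, the default loss="hinge" gives a linear SVM whose weights are updated from one example (or a small batch) at a time.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

# Toy lyric snippets and genre labels -- illustrative only
lyrics = [
    "smooth saxophone midnight blue moon",
    "trumpet swing blue night moon",
    "money cars street hustle flow",
    "street flow beat hustle rhymes",
]
genres = ["Jazz", "Jazz", "Rap", "Rap"]

# loss="hinge" (the default) trains a linear SVM by stochastic
# gradient descent; random_state fixes the shuffling for repeatability
model = make_pipeline(
    CountVectorizer(),
    SGDClassifier(loss="hinge", random_state=0),
)
model.fit(lyrics, genres)

print(model.predict(["trumpet swing night"])[0])  # -> Jazz
```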

 

 

 


Multilayer Perceptron


 

A multilayer perceptron (MLP) consists of an input layer, at least one hidden layer, and an output layer, with a non-linear activation applied at each hidden layer. It is trained with backpropagation, updating the weights iteratively. With properly tuned parameters, it can learn non-linear models well.
 
We tried MLPClassifier in scikit-learn and the Sequential model in Keras to build different DNN models and compared their results.

 


MLPClassifier:


We add three hidden layers, each with 10 units and a tanh activation. The result is slightly better than the LR classifier.
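The setup described might look like this in scikit-learn. The 10-unit layer sizes and tanh activation follow the text; the toy data and other settings are illustrative.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# Toy lyric snippets and genre labels -- illustrative only
lyrics = [
    "smooth saxophone midnight blue moon",
    "trumpet swing blue night moon",
    "money cars street hustle flow",
    "street flow beat hustle rhymes",
]
genres = ["Jazz", "Jazz", "Rap", "Rap"]

# Three hidden layers of 10 tanh units each, as in the text; the
# weights are updated iteratively by backpropagation (adam by default)
model = make_pipeline(
    CountVectorizer(),
    MLPClassifier(hidden_layer_sizes=(10, 10, 10),
                  activation="tanh",
                  max_iter=2000,
                  random_state=0),
)
model.fit(lyrics, genres)

print(model.score(lyrics, genres))  # training accuracy
```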


DNN in Keras:


The neural network we built with the Keras Sequential model also consists of three hidden layers with a tanh activation on each. Unlike the previous model, we apply a softmax activation on the output layer, which yields better accuracy. So far, this multi-class softmax MLP gives the best result among all the classifiers.
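A sketch of such a Keras model, assuming a bag-of-words input of 5,000 features and two genre classes (both sizes are illustrative, not the project's actual dimensions):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Assumed sizes -- illustrative only
num_features = 5000  # bag-of-words vocabulary size
num_genres = 2       # number of genre classes

# Three tanh hidden layers mirror the MLPClassifier setup; the softmax
# output layer turns the final activations into genre probabilities
model = keras.Sequential([
    layers.Input(shape=(num_features,)),
    layers.Dense(10, activation="tanh"),
    layers.Dense(10, activation="tanh"),
    layers.Dense(10, activation="tanh"),
    layers.Dense(num_genres, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# The softmax outputs sum to 1 across the genre classes
probs = model.predict(np.zeros((1, num_features)), verbose=0)
```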

 
