Human Emotions Based on Facial Expression Using Deep CNN
Abstract: In this project, we developed a deep convolutional neural network model for facial expression recognition. Each facial expression is classified into one of the six expressions considered in this project. The network consists of convolution and ReLU layers, followed by fully connected layers and a softmax output. To reduce overfitting, we used dropout and local response normalization. To recognize emotions in real time, we capture frames from live video and send each frame to the model, which outputs the predicted emotion. The model achieves a test accuracy of 60.1%.
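The pipeline summarized above (convolution, ReLU, a fully connected layer, then softmax over the six emotions) can be illustrated with a toy forward pass. This is a minimal, dependency-free sketch under stated assumptions: a single hand-picked 3 × 3 filter, a 4 × 4 input in place of a 48 × 48 face, hand-set rather than trained weights, no dropout or normalization, and hypothetical function names (`conv2d_valid`, `relu`, `dense`, `softmax`, `predict_emotion`). It is not the project's actual model, whose layer sizes are not given here.

```python
import math
from typing import List

Matrix = List[List[float]]

# The six classes from the project's goal; this label order is an assumption.
EMOTIONS = ["Angry", "Fear", "Happy", "Sad", "Surprise", "Neutral"]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """2-D 'valid' convolution (cross-correlation, as CNN layers compute it)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(x: Matrix) -> Matrix:
    """Element-wise max(0, x)."""
    return [[max(0.0, v) for v in row] for row in x]


def dense(features: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Fully connected layer: one weight row per output unit."""
    return [sum(w * f for w, f in zip(row, features)) + b for row, b in zip(weights, bias)]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the largest logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_emotion(image: Matrix, kernel: Matrix,
                    weights: List[List[float]], bias: List[float]) -> str:
    """Conv -> ReLU -> flatten -> fully connected -> softmax -> most probable label."""
    feature_map = relu(conv2d_valid(image, kernel))
    flat = [v for row in feature_map for v in row]  # flatten before the FC layer
    probs = softmax(dense(flat, weights, bias))
    return EMOTIONS[probs.index(max(probs))]


if __name__ == "__main__":
    image = [[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6]]
    kernel = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]  # picks the centre pixel of each window
    weights = [[0.0] * 4 for _ in EMOTIONS]      # 2x2 feature map -> 4 inputs per unit
    weights[EMOTIONS.index("Happy")] = [1.0] * 4  # hand-picked, not trained
    bias = [0.0] * len(EMOTIONS)
    print(predict_emotion(image, kernel, weights, bias))  # Happy
```

In a real model the filters and fully connected weights are learned, and many filters are stacked per layer; the sketch only shows how data flows from pixels to a single predicted emotion.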
I. Introduction
Humans communicate with each other in several ways, such as speech, gestures, and emotions. Emotions can be recognized in many ways, for example from body language, from the tone of speech, or by brain mapping. Understanding another person's emotions can be challenging. However, the most common and convenient way to understand human emotion is to examine facial expressions. Facial expressions provide cues about emotional response, regulate interpersonal behavior, and communicate aspects of psychopathology. We propose and develop a neural network model that identifies human emotions from facial expressions. The input to the system is an image of a person, and the network predicts that person's facial expression. Applications include surveillance and behavioral classification by law enforcement, and automatically capturing a photo when a person smiles [1].
II. Goal
To give an artificial neural network the ability to interpret human facial expressions, that is, to recognize one of six categories of human emotion (Angry, Fear, Happy, Sad, Surprise, Neutral) [2].
III. Literature Survey/Related Work
In this section, we survey previous studies and related work on image classification and facial expression recognition.
A. ImageNet classification with deep convolutional neural networks [3]:
This landmark paper on image classification by Krizhevsky, Sutskever and Hinton trained a neural network with 5 convolutional, 3 max pooling, and 3 fully connected layers on 1.2 million images from the ImageNet LSVRC-2010 contest. It achieved a top-1 error rate of 37.5% on the test set, the best result reported at that time. The paper demonstrated the capability of CNNs on real-world image classification problems and popularized the use of convolutions with max pooling, along with techniques such as dropout to reduce overfitting.
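Dropout, mentioned above and also used in this project, randomly zeroes activations during training so that the network cannot rely on any single unit. The sketch below shows the common "inverted dropout" variant, swapped in for clarity: it scales the surviving activations by 1/(1 − p) during training so that inference needs no change, whereas Krizhevsky et al. instead kept all neurons at test time and multiplied their outputs by 0.5. The function name and signature are hypothetical.

```python
import random
from typing import List, Optional


def dropout(activations: List[float], p: float, training: bool,
            rng: Optional[random.Random] = None) -> List[float]:
    """Inverted dropout: during training, zero each activation with probability p
    and scale the survivors by 1/(1 - p); at inference, return the input unchanged."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - p
    # rng.random() >= p holds with probability 1 - p, so each unit is kept with that probability.
    return [a / keep if rng.random() >= p else 0.0 for a in activations]


if __name__ == "__main__":
    out = dropout([1.0] * 10, p=0.5, training=True, rng=random.Random(0))
    print(out)  # each entry is either 0.0 or 2.0
```

Because the kept values are rescaled, the expected value of each output equals its input, which is why the same weights can be used unchanged at test time.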
B. Facial expression recognition using local transitional pattern on Gabor filtered facial images [4]:
This work on emotion classification uses the Cohn-Kanade (CK) database, Gabor filtering for image processing, and a Support Vector Machine (SVM) classifier. The reported recognition accuracies are high, ranging from 88% for anger to 100% for surprise. A major disadvantage of the approach is that it requires very precise pre-processing, so that every image adheres to a strict format before being passed to the classifier. This is a problem in real-world applications, where images will not always adhere to that format.
C. Recognizing semantic features in faces using deep learning [5]:
A recent thesis by Gudi on emotion recognition describes a deep neural network that can recognize age, race, emotion, and gender from pictures of human faces, using the Facial Expression Recognition Challenge (FERC-2013) dataset. A network consisting of 3 convolutional layers and 1 fully connected layer achieved an average accuracy of 66.56% on emotion classification, close to previous results published on the same dataset.
IV. Dataset Evaluation
Neural networks need large amounts of data for training, and the choice of training images strongly affects the performance of the model. We therefore need a dataset that is both high in quality and large in quantity. Several datasets are available for emotion recognition research, ranging from a few hundred high-resolution photos to thousands of small, low-resolution images. The main dataset used for both training and testing is the one provided for the Facial Expression Recognition Challenge (FERC-2013) [6].
A. FERC-2013 (Facial Expression Recognition Challenge):
The FERC-2013 dataset contains 28,709 training images and 7,178 test images (split into public and private test sets), each a 48 × 48 pixel grayscale image. The data is provided as a .csv file with three columns: the emotion label of the image, its pixel values, and an indicator of whether the image belongs to the training, public test, or private test set. We used the public test set as validation data and the private test set as test data.

Each image must be assigned to one of seven classes representing different facial emotions: 0=Anger, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise, and 6=Neutral. The large size of the dataset should benefit the robustness of the model. As the following figure shows, there are few ‘Disgust’ images, whereas there are more than 8,000 ‘Happy’ samples. We observed that the accuracy of detecting a particular emotion depends on the amount of training data for that emotion. We therefore merged the ‘Disgust’ data into the ‘Angry’ class, since both represent similar emotions.
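The loading, splitting, and label-merging steps described above can be sketched with the Python standard library. This is a minimal sketch under assumptions: the column names `emotion`, `pixels`, and `Usage`, and the usage values `Training`, `PublicTest`, and `PrivateTest`, follow the publicly distributed FER-2013 CSV and are not stated in this paper; the function and constant names are hypothetical. The label mapping folds Disgust into Anger and shifts the remaining labels down so that the six classes appear in the order listed in Section II.

```python
import csv
import io
from collections import Counter
from typing import Dict, List, Tuple

# Assumed usage-column values (public FER-2013 CSV layout) -> our split names.
SPLIT_NAMES = {"Training": "train", "PublicTest": "val", "PrivateTest": "test"}

# Original labels: 0=Anger 1=Disgust 2=Fear 3=Happy 4=Sad 5=Surprise 6=Neutral.
# Disgust (1) is merged into Anger (0); the other labels shift down by one.
MERGE_DISGUST = {0: 0, 1: 0, 2: 1, 3: 2, 4: 3, 5: 4, 6: 5}
CLASSES = ["Angry", "Fear", "Happy", "Sad", "Surprise", "Neutral"]

Sample = Tuple[List[int], int]  # (flattened pixel values, merged label)


def load_fer_csv(text: str, side: int = 48) -> Dict[str, List[Sample]]:
    """Parse the CSV text, merge Disgust into Anger, and split rows by the usage column."""
    splits: Dict[str, List[Sample]] = {"train": [], "val": [], "test": []}
    for row in csv.DictReader(io.StringIO(text)):
        pixels = [int(p) for p in row["pixels"].split()]  # space-separated grayscale values
        if len(pixels) != side * side:
            raise ValueError(f"expected {side * side} pixels, got {len(pixels)}")
        label = MERGE_DISGUST[int(row["emotion"])]
        splits[SPLIT_NAMES[row["Usage"]]].append((pixels, label))
    return splits


def class_counts(samples: List[Sample]) -> Dict[str, int]:
    """Number of samples per merged class, useful for inspecting class imbalance."""
    counts = Counter(label for _, label in samples)
    return {name: counts.get(i, 0) for i, name in enumerate(CLASSES)}


if __name__ == "__main__":
    # Tiny 2x2 "images" stand in for the real 48x48 data.
    demo = "emotion,pixels,Usage\n1,0 1 2 3,Training\n3,4 5 6 7,Training\n6,8 9 10 11,PublicTest\n"
    splits = load_fer_csv(demo, side=2)
    print(class_counts(splits["train"]))
```

Applied to the full file (its file name, for example fer2013.csv, is an assumption), the training split should contain the 28,709 images stated above, and `class_counts` on that split exposes the class imbalance that motivated the merge.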
