Human Emotions based on Facial Expression using Deep CNN








In this project we have developed a deep convolutional neural network (CNN) model for facial expression recognition. Every facial expression is classified into one of the six expressions considered in this project. We implemented convolution-ReLU-fully connected layers followed by softmax. To reduce overfitting, we used dropout and local response normalization. To recognize emotions in real time, we capture live images from video frames and send each image to the model, which outputs the predicted emotion. The test accuracy obtained is 60.1%.
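The last stage of the pipeline described above, where the scores of the final fully connected layer pass through softmax and the most probable class is reported as the emotion, can be sketched in plain Python. This is a minimal illustration rather than the project's code: the function names are hypothetical, and the assumption that the six output units follow the class order listed in Section II is ours.

```python
import math

# Six output classes. ASSUMPTION: the network's output units follow the
# order listed in Section II (Disgust already merged into Angry).
EMOTIONS = ["Angry", "Fear", "Happy", "Sad", "Surprise", "Neutral"]


def softmax(logits):
    """Turn raw scores from the last fully connected layer into probabilities."""
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_emotion(logits):
    """Return the most probable emotion label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return EMOTIONS[best], probs[best]


if __name__ == "__main__":
    # Hypothetical scores for one face image; the largest score is at index 2.
    label, p = predict_emotion([0.2, -1.0, 2.5, 0.1, 0.3, 0.9])
    print(label, round(p, 3))
```

Subtracting the maximum logit leaves the softmax result unchanged but prevents `math.exp` from overflowing on large scores.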


I. Introduction

Humans communicate with each other through several channels, such as speech, gestures and emotions. Human emotions can be recognized in many ways, for example from body language, from the tone of speech or by brain mapping. Understanding another person's emotions is challenging, but the most common and convenient way to do so is by examining their facial expression. Facial expressions provide cues about emotional response, regulate interpersonal behavior, and communicate aspects of psychopathology. We have proposed and developed a neural network model that identifies human emotions from facial expressions. The input to the system is an image of a person; the network then predicts the facial expression. Possible applications include surveillance, behavioural classification in law enforcement, and automatically capturing a photo when a person smiles [1].

II. Goal

The goal is to give an artificial neural network the capability to interpret human facial expressions, that is, to recognize one of six categories of human emotion (Angry, Fear, Happy, Sad, Surprise, Neutral) [2].

III. Literature Survey/Related Work

In this section we survey some previous studies and related work done on image classification.

A. ImageNet Classification with Deep Convolutional Neural Networks [3]:

This landmark paper on image classification by Krizhevsky, Sutskever and Hinton trained a neural network with 5 convolutional, 3 max-pooling and 3 fully connected layers on the 1.2 million training images of the ImageNet LSVRC-2010 contest. The network obtained a top-1 error rate of 37.5% on the test set, the best reported at that time. The paper demonstrated the capability of CNNs on real-world image classification problems and popularized the use of convolutions together with max pooling, as well as overfitting-reduction techniques such as dropout.

B. Facial Expression Recognition Using Local Transitional Pattern on Gabor Filtered Facial Images [4]:

This work on emotion classification uses the Cohn-Kanade (CK) database, Gabor filtering for image processing, and a Support Vector Machine (SVM) classifier. The reported emotion recognition accuracies are high, ranging from 88% for anger to 100% for surprise. A major disadvantage of the approach is that it requires very precise pre-processing of the data, so that every image adheres to a strict format before being passed to the classifier. This is a problem in real-world applications, where images will not always adhere to that format.

C. Recognizing Semantic Features in Faces Using Deep Learning [5]:

A recent thesis by Gudi on emotion recognition describes a deep neural network capable of recognizing age, race, emotion and gender from pictures of human faces. The dataset of the Facial Expression Recognition Challenge (FERC-2013) is used. A network consisting of 3 convolutional layers and 1 fully connected layer obtained an average accuracy of 66.56% on emotion classification, which is close to previously published results on the same dataset.

                                                                           
          
IV. Dataset Evaluation

Neural networks need large amounts of training data, and the choice of training images largely determines the performance of the model. We therefore need a dataset that is both high in quality and large in quantity. Several datasets are available for emotion recognition research, ranging from a few hundred high-resolution photos to thousands of small, low-resolution images. The main dataset used for both training and testing is the one provided in the Facial Expression Recognition Challenge (FERC-2013) [6].

A. FERC (Facial Expression Recognition Challenge):

The FERC dataset contains 28,709 training images and 7,178 test images (public and private), each a 48 x 48 pixel grayscale image. The data is stored in a .csv file with three columns: the first is the emotion label of the image, the second contains its pixel values, and the third indicates whether the image belongs to the training, public test or private test set. We chose to use the public test set as validation data and the private test set as test data. Each image belongs to one of seven classes representing different facial emotions: 0=Anger, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise and 6=Neutral.
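Reading this CSV and splitting it into training, validation and test data can be sketched with Python's standard `csv` module. This is an illustrative sketch, not the project's loader. It assumes the header row is `emotion,pixels,Usage`, that the pixel column holds 48 x 48 = 2304 space-separated values, and that the usage column takes the values `Training`, `PublicTest` and `PrivateTest`; the function name `load_fer_csv` is hypothetical.

```python
import csv
import io

IMG_SIZE = 48

# ASSUMED usage-column values, mapped to the splits described above:
# public test -> validation, private test -> test.
SPLIT_NAMES = {"Training": "train", "PublicTest": "val", "PrivateTest": "test"}


def load_fer_csv(f):
    """Parse a FER-style CSV file object into {split: [(label, image), ...]}.

    Each image is returned as 48 rows of 48 integer pixel values (0-255).
    ASSUMES the header is: emotion,pixels,Usage
    """
    splits = {"train": [], "val": [], "test": []}
    for row in csv.DictReader(f):
        pixels = [int(p) for p in row["pixels"].split()]
        if len(pixels) != IMG_SIZE * IMG_SIZE:
            raise ValueError(f"expected {IMG_SIZE * IMG_SIZE} pixels, got {len(pixels)}")
        # Reshape the flat pixel list into a 48 x 48 grid, row by row.
        image = [pixels[r * IMG_SIZE:(r + 1) * IMG_SIZE] for r in range(IMG_SIZE)]
        splits[SPLIT_NAMES[row["Usage"]]].append((int(row["emotion"]), image))
    return splits


if __name__ == "__main__":
    # Tiny in-memory example with one blank image per split.
    blank = " ".join(["0"] * (IMG_SIZE * IMG_SIZE))
    text = ("emotion,pixels,Usage\n"
            f"3,{blank},Training\n0,{blank},PublicTest\n6,{blank},PrivateTest\n")
    data = load_fer_csv(io.StringIO(text))
    print({name: len(items) for name, items in data.items()})
```

With the real dataset, the same function would be called on an opened file, for example `load_fer_csv(open(path, newline=""))`, where `path` points to the downloaded CSV.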


The large size of the dataset benefits the robustness of the model. As observed in the following figure, however, there are very few images labelled 'Disgust', whereas there are more than 8,000 'Happy' samples. We realised that the accuracy of detecting a particular emotion depends on the amount of training data for that emotion, so we decided to merge the 'Disgust' data with the 'Angry' data, as both represent similar emotions.
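The merge can be expressed as a simple relabelling step. The sketch below folds label 1 (Disgust) into label 0 (Angry) and, as an assumption not stated in the text, shifts the remaining labels down so that the six classes are numbered 0 to 5 in the order listed in Section II. `MERGE_MAP`, `merge_labels` and `class_counts` are hypothetical names.

```python
# Original FERC labels: 0=Anger, 1=Disgust, 2=Fear, 3=Happy, 4=Sad,
# 5=Surprise, 6=Neutral. Disgust (1) is folded into Anger (0).
# ASSUMPTION: the remaining labels are renumbered to a contiguous 0..5.
MERGE_MAP = {0: 0, 1: 0, 2: 1, 3: 2, 4: 3, 5: 4, 6: 5}
NEW_NAMES = ["Angry", "Fear", "Happy", "Sad", "Surprise", "Neutral"]


def merge_labels(samples):
    """Relabel (label, image) pairs from the 7 original classes to 6."""
    return [(MERGE_MAP[label], image) for label, image in samples]


def class_counts(samples, num_classes=6):
    """Count samples per class, e.g. to check the class balance after merging."""
    counts = [0] * num_classes
    for label, _ in samples:
        counts[label] += 1
    return counts


if __name__ == "__main__":
    # Placeholder strings stand in for image data.
    samples = [(0, "img0"), (1, "img1"), (3, "img2"), (6, "img3")]
    merged = merge_labels(samples)
    print([NEW_NAMES[label] for label, _ in merged])
    print(class_counts(merged))
```

Because the mapping is a plain dictionary, it can be applied to the training, validation and test splits alike, keeping their labels consistent.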

V. System Architecture

Our proposed system architecture consists of two main modules: an Image Manipulation Module and a Neural Network Module. The image is first preprocessed using OpenCV. The preprocessed image is then fed into the convolutional neural network layers, which output the predicted emotion. The system architecture is shown below:
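As a rough illustration of what the Image Manipulation Module has to produce, the following stdlib-only sketch turns a colour video frame into a 48 x 48 grayscale image scaled to [0, 1], matching the format of the training data. The exact preprocessing steps are not specified here, so these steps are assumptions. In OpenCV the first two would typically be `cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)` and `cv2.resize(gray, (48, 48))`. For simplicity the sketch uses nearest-neighbour resizing, whereas `cv2.resize` defaults to bilinear interpolation. All function names are hypothetical.

```python
IMG_SIZE = 48


def to_grayscale(frame_bgr):
    """Convert an H x W frame of (B, G, R) tuples to grayscale values.

    Uses the BT.601 luma weights (0.299 R + 0.587 G + 0.114 B), the same
    weights OpenCV applies for COLOR_BGR2GRAY.
    """
    return [[0.114 * b + 0.587 * g + 0.299 * r for (b, g, r) in row]
            for row in frame_bgr]


def resize_nearest(gray, size=IMG_SIZE):
    """Nearest-neighbour resize of a 2-D list of values to size x size."""
    h, w = len(gray), len(gray[0])
    return [[gray[(y * h) // size][(x * w) // size] for x in range(size)]
            for y in range(size)]


def preprocess(frame_bgr):
    """Frame -> 48 x 48 grayscale image with values scaled to [0, 1]."""
    small = resize_nearest(to_grayscale(frame_bgr))
    return [[v / 255.0 for v in row] for row in small]


if __name__ == "__main__":
    # A synthetic 64 x 96 all-white frame stands in for a captured video frame.
    frame = [[(255, 255, 255)] * 96 for _ in range(64)]
    image = preprocess(frame)
    print(len(image), len(image[0]), round(image[0][0], 3))
```

In the real-time setting, each captured frame would pass through this kind of step before being handed to the network for prediction.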
