Human Emotion Recognition Project

Engine/Framework:

Python 2.7.9

 

Coding & Scripting Languages:

Python

 

Target Platform:
Windows PC

 

Development Time:

1 Week

 

Completed On:
2018

Tools Used: 

Anaconda, Notepad++

Libraries Used: 

OpenCV, Boost-Python, NumPy, Dlib

Introduction

The aim of this project is to create an application that can detect which emotion, among joy, anger, sadness, disgust, fear and surprise, a face is experiencing. There are various third-party tools that help achieve basic emotion recognition, so in addition this project will examine whether such a simple program could be used for a commercial or non-commercial purpose. 
 

Methodology

The first step in this project was acquiring a database of faces and editing it to be suitable for use. After some searching, this project was granted a license to use the CK+ dataset. With some quick Python scripts that used the dataset's emotion labels (which can be found in the application folder), the images were categorised per emotion, converted to grayscale and cropped so that only the face remained. Some manual pruning occurred in the neutral folder, as there were many repeats of the same person; these were deleted by hand in case they influenced the classifier towards recognising a person rather than the emotion being displayed. This produced a close to ideal database. 
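As an illustration of that preprocessing step, the sketch below converts each sorted image to grayscale and crops it to the detected face region. The folder layout, emotion names and the OpenCV Haar cascade file are assumptions for the example rather than the project's exact scripts.

import cv2
import glob
import os

EMOTIONS = ["neutral", "anger", "disgust", "fear", "joy", "sadness", "surprise"]
face_cascade = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")  # assumed cascade file

def crop_face(image_path, out_path, size=(350, 350)):
    """Convert an image to grayscale and save only the first detected face, resized to a common size."""
    gray = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return False
    x, y, w, h = faces[0]
    cv2.imwrite(out_path, cv2.resize(gray[y:y + h, x:x + w], size))
    return True

for emotion in EMOTIONS:
    if not os.path.exists("dataset/%s" % emotion):
        os.makedirs("dataset/%s" % emotion)
    # "sorted/<emotion>/*.png" is an assumed layout produced by the labelling step
    for i, path in enumerate(glob.glob("sorted/%s/*.png" % emotion)):
        crop_face(path, "dataset/%s/%d.png" % (emotion, i))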
 
There are three versions implemented in the application; once run, the user is presented with a choice of which version to use. The first implementation uses the Fisher Face method, which builds an approximation of the average face for each emotion from all the known data for that emotion. It then uses these master faces to make a simple comparison between the input image and each Fisher face. This can work quite well for emotion recognition, as the expressions do differ from one another, but it can miss subtle changes and is therefore not very accurate. 
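A minimal sketch of how such a Fisher Face classifier can be trained and queried is shown below, assuming the OpenCV 2.4.x Python bindings used with Python 2.7 (in OpenCV 3+ the equivalent call lives in the cv2.face module). The helper names are illustrative, not the project's exact code.

import cv2
import numpy as np

def train_fisher(training_images, training_labels):
    """training_images: equal-sized grayscale face arrays; training_labels: integer emotion ids."""
    model = cv2.createFisherFaceRecognizer()
    model.train(training_images, np.asarray(training_labels))
    return model

def predict_fisher(model, face_image):
    """Return the predicted emotion id and the distance to its Fisher face."""
    label, confidence = model.predict(face_image)
    return label, confidence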
 
The second uses a prebuilt face detector to populate an image of a face with 68 landmarks. Using these, it reduces each image to an array of floats or vectors, making comparison much easier. Due to the number of landmarks, this method can prove better at capturing subtle details without being significantly slower, at least at the size of this database. 
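The sketch below shows one way the landmark version can reduce a face to such an array of floats, assuming Dlib's standard 68-point shape predictor file is available. Normalising the points around the face centre is an illustrative choice, not necessarily the exact scheme used here.

import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
# Dlib's standard 68-point model, assumed to be present next to the script
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def landmark_features(gray_image):
    """Reduce a grayscale face image to a flat vector built from its 68 landmarks."""
    faces = detector(gray_image, 1)
    if len(faces) == 0:
        return None
    shape = predictor(gray_image, faces[0])
    points = np.array([[p.x, p.y] for p in shape.parts()], dtype=np.float64)
    offsets = points - points.mean(axis=0)   # make the vector independent of face position
    return offsets.flatten()                 # 136 floats per face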
 
The third classifier also uses the Fisher Face method, but it is made to retain its training information, detect a face through a webcam and determine which emotion that face is experiencing. It was trained in a variety of environments with differing lighting and backgrounds. 
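A minimal webcam loop for this third version might look as follows, reusing a trained Fisher Face model and the Haar cascade from the earlier sketches; the window title and prediction flow are illustrative assumptions.

import cv2

def run_webcam(model, face_cascade, emotions, size=(350, 350)):
    """Show camera footage until 'Q' is pressed, then predict the emotion of the first detected face."""
    capture = cv2.VideoCapture(0)
    while True:
        ret, frame = capture.read()
        if not ret:
            break
        cv2.imshow("Emotion recognition - press Q to predict", frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            faces = face_cascade.detectMultiScale(gray, 1.1, 5)
            if len(faces) > 0:
                x, y, w, h = faces[0]
                face = cv2.resize(gray[y:y + h, x:x + w], size)
                label, distance = model.predict(face)
                print "Predicted emotion: %s" % emotions[label]
            break
    capture.release()
    cv2.destroyAllWindows()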
 
The main tool used in this project is OpenCV, an open-source library focused on computer vision and machine learning. Unlike alternatives such as MATLAB, OpenCV's focus on computer vision means that not only does it have all the algorithms needed for this project, but they also perform well, something quite important when it comes to large databases or real-time video processing. Finally, its support for many languages meant that different implementation avenues could be explored. After a C++ implementation and a C# implementation (running OpenCV inside the game engine Unity), it was decided to use Python for the final project.  
 
Anaconda was used as the main Python distribution, as it provided many helpful tools such as a superior command prompt and needed Python add-ons such as Numerical Python (NumPy) and matplotlib. Additionally, Dlib and Boost-Python were used to further extend the language. Visual Studio was used to compile the third-party libraries. 
 
The actual development was done in Notepad++, as it was lightweight enough to be quick while also providing certain features such as indentation markers and error highlighting. For running the files, the GUI version of Python 2.7 was used, as it also offered some limited error handling. 
 

 
Results

The Fisher Face version achieved an average score of 93.4% on the 355 images of the personal database. This average was found over 100 runs, with the lowest accuracy reported being 85% and the highest 97%. However, once presented with the CK+ dataset and its 296 images, that average fell to 73%. This average was also found over 100 runs, with the lowest accuracy reported being 60% and the highest 83%. By removing the emotions that were least represented (fear and sadness) the average accuracy rose to 81.9%, but this no longer met the requirements of the submission. 
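For context, averages over 100 runs of this kind can be produced with a loop along the following lines; the 80/20 random split per run is an assumption for the example, as the exact split used in the project is not stated here.

import random

def evaluate(images, labels, train_fn, predict_fn, runs=100, train_fraction=0.8):
    """predict_fn(model, image) is expected to return just the predicted emotion id."""
    scores = []
    for _ in range(runs):
        pairs = zip(images, labels)
        random.shuffle(pairs)
        cut = int(len(pairs) * train_fraction)
        train, test = pairs[:cut], pairs[cut:]
        model = train_fn([im for im, _ in train], [lb for _, lb in train])
        correct = sum(1 for im, lb in test if predict_fn(model, im) == lb)
        scores.append(100.0 * correct / len(test))
    return sum(scores) / len(scores), min(scores), max(scores)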
 
The Landmarks version fared a bit better, achieving an average score of 80% on the 296 images of the CK+ dataset. This average was found over 100 runs, with the lowest accuracy reported being 74% and the highest 91%. This was a significantly better performance than the Fisher Face equivalent. It also improved when the lower-performing emotions were removed, but the difference was minimal, from 1% to 3%. 
 
The real-time application shows the camera footage to the user and asks them to press 'Q' for it to make a prediction. It has been trained to work with a single face, and while it showed high accuracy (96%) at first, when the location and appearance of the person changed, accuracy went down to 65%. With further training in a variety of locations it could become quite a robust solution. 
 
When images sourced from Google (not retained due to copyright) were used, the performance of all three versions dropped drastically, with 30%-40% being the norm. This shows how much performance depended on the quality of the dataset. 
 

Conclusions

Computer vision remains a big challenge to be tackled. While this project was able to achieve a relatively high accuracy, that was only possible due to the high quality of the datasets and the powerful tools that were available. Within the tight constraints of this project, it appears that the Landmark method outperforms the Fisher Face method by an average of 15% in accuracy. When tested with random images, both solutions had low accuracy, 30%-40%, which, while still above the random chance of 16%, could not be taken further in an application. 
 
It is this project's conclusion that simple facial emotion recognition software is not viable for real-life usage. While accuracy was high under ideal circumstances, that will not be the case outside of a testing ground. It would require more advanced technology, or a dedicated development team using more complex implementations, to achieve results that could be relied on for a production goal.

References

Kumar, V., Agarwal, A. and Mittal, K. (2018). Introduction to Emotion Recognition for Digital Images. Available at: https://hal.inria.fr/inria-00561918/PDF/TutorialIntroduction_to_Emotion_Recognition_for_Digital_Images.pdf 
 
Van Gent, P. (2016). Computer Vision. Available at: http://www.paulvangent.com/category/computer-vision/ 
 
Gage, J. (2018). Introduction to Emotion Recognition. Available at: https://blog.algorithmia.com/introduction-to-emotion-recognition/ 
 
OpenCV (2018). OpenCV Tutorials. Available at: https://docs.opencv.org/master/d9/df8/tutorial_root.html 


Lucey, P., Cohn, J. F., Kanade, T., Saragih, J., Ambadar, Z. and Matthews, I. (2010). The Extended Cohn-Kanade Dataset (CK+): A complete expression dataset for action unit and emotion-specified expression. Proceedings of the Third International Workshop on CVPR for Human Communicative Behavior Analysis. Available at: http://www.dataonthemind.org/node/1617 

Kanade, T., Cohn, J. F. and Tian, Y. (2000). Comprehensive database for facial expression analysis. Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition (FG'00), Grenoble, France, 46-53.