Face Recognition – The essential part of “Face ID”

When we meet a person, the first thing we notice is their face. The human face plays an important role in daily life whenever we interact and communicate with others. Unlike other biometrics such as fingerprints, identifying a person by their face can be a contactless process. We can easily capture face images of a person from a distance and recognize them without any direct interaction. It is therefore natural to use the human face as the key when building a Face Recognition system.



For most of the last ten years, Face Recognition was a popular research area mainly within computer vision. However, with the rapid development of deep learning techniques in recent years, Face Recognition has become a broader AI topic, and more and more people are interested in the field. Many companies, such as Google, Microsoft and Amazon, have developed their own Face Recognition tools and applications. In late 2017, Apple also introduced the iPhone X with Face ID, a Face Recognition system designed to replace the fingerprint-scanning Touch ID for unlocking the phone.


What can Face Recognition be used for?

  • automated border control for arrivals and departures at airports
  • access control systems for companies
  • criminal surveillance systems for governments
  • transaction authentication for consumers
  • unlocking systems for phones or computers


How does Face Recognition work?

A Face Recognition system can be divided into three parts:

  • Face Detection: find where the face is in the image
  • Face Representation: encode the facial features of a face image
  • Face Classification: determine which person it is

Face Detection

Face Detection locates the face in the image and determines its size. It is essentially an object-class detection problem in which the class is the human face. In classical computer vision, a set of features is first extracted from the image, and classifiers or localizers are then run in a sliding window over the whole image to find candidate bounding boxes, which is time-consuming and complex. With deep learning, object detection can be accomplished by a single neural network that maps image pixels directly to bounding box coordinates and class probabilities, with the benefits of end-to-end training and real-time prediction. We use YOLO, an open-source real-time object detection system, for Face Detection in our Face Recognition pipeline.
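To give a feel for why the sliding-window approach is costly, here is a minimal sketch that only counts how many windows a classifier would have to evaluate on one frame. The frame size, window sizes and stride are hypothetical values chosen for illustration; they are not taken from our pipeline.

```python
def sliding_windows(image_w, image_h, win, stride):
    """Yield (x, y, w, h) boxes for a square window slid across the image."""
    for y in range(0, image_h - win + 1, stride):
        for x in range(0, image_w - win + 1, stride):
            yield (x, y, win, win)


def count_windows(image_w, image_h, window_sizes, stride):
    """Total number of classifier evaluations across all window sizes."""
    return sum(
        sum(1 for _ in sliding_windows(image_w, image_h, win, stride))
        for win in window_sizes
    )


if __name__ == "__main__":
    # Assumed settings: a 640x480 frame, three window sizes, 8-pixel stride.
    total = count_windows(640, 480, window_sizes=[64, 128, 256], stride=8)
    print("classifier evaluations per frame:", total)
```

With these assumed settings the loop visits 8,215 windows for a single frame, each of which would need its own classifier call. A single-network detector replaces all of those calls with one forward pass.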


Face Representation

To compare two faces, computing the distance between two face images pixel by pixel is impractical because it takes too much computing time and resources. Instead, we extract facial features to represent each face image.

“The distance between your eyes and ears” and “The size of your nose and mouth”…

Facial features like these give us an easy measurement for deciding whether two unknown faces belong to the same person. In the past, Eigenfaces and genetic algorithms were used to help discover such features. With newer deep learning techniques, a deep neural network projects each face image onto a 128-dimensional unit hypersphere and generates a feature vector for each image.
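Once every face is a 128-dimensional vector, comparing two faces reduces to measuring the distance between two vectors. The sketch below shows one common way to do that with plain Python. The function names, the toy vectors and the 0.6 threshold are assumptions for illustration, not values from our system.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so it lies on the unit hypersphere."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [v / norm for v in vec]


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def same_person(emb_a, emb_b, threshold=0.6):
    """Return True when two embeddings are closer than the threshold.

    The default threshold is an assumed, illustrative value; it should be
    tuned on real data for whichever embedding model is used.
    """
    dist = euclidean_distance(l2_normalize(emb_a), l2_normalize(emb_b))
    return dist < threshold


if __name__ == "__main__":
    # Toy 128-dimensional vectors standing in for real face embeddings.
    face_a = [1.0] * 128
    face_b = [1.0] * 127 + [1.2]          # almost identical to face_a
    face_c = [1.0] * 64 + [-1.0] * 64     # very different from face_a
    print(same_person(face_a, face_b))
    print(same_person(face_a, face_c))
```

Comparing two 128-number vectors is far cheaper than comparing two images pixel by pixel, which is what makes this representation practical for video.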

For transforming face images into face representations, OpenFace and dlib are two commonly used models for generating feature vectors. We ran experiments on both models and found that the dlib representation is more consistent across frames for the same person, and that dlib outperformed OpenFace in our accuracy test. As a result, we chose dlib as our face representation model.


Each vertical slice represents the face representation of a specific person from one image frame. The x-axis is the timestamp of each video frame. These results show that the dlib model does a better job of producing consistent image-to-representation transformations for the same person's face across frames.


Face Classification

By gathering the face representations of each person into a face database, we can train a classifier to identify each person. To stabilize the final classification results, we introduce a “weighted moving average” into our system: classification results from previous frames are taken into account when determining the current result. We found that this mechanism smooths the final classification results and achieves better accuracy in our tests than classifying from a single image.
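The smoothing idea can be sketched as follows. This is a minimal illustration, not our production code: the class name, the window size and the linearly increasing weights (newest frame weighted most) are all assumptions, and the per-frame scores are made-up numbers.

```python
from collections import deque


class WeightedMovingAverage:
    """Smooth per-frame class scores over a short window of frames.

    Assumed weighting: the oldest frame in the window gets weight 1 and
    the newest gets the largest weight.
    """

    def __init__(self, window=5):
        if window < 1:
            raise ValueError("window must be at least 1")
        self.history = deque(maxlen=window)

    def update(self, scores):
        """Add one frame's {label: score} dict; return (best_label, smoothed)."""
        self.history.append(dict(scores))
        smoothed = {}
        total_weight = 0.0
        for weight, frame_scores in enumerate(self.history, start=1):
            total_weight += weight
            for label, score in frame_scores.items():
                smoothed[label] = smoothed.get(label, 0.0) + weight * score
        for label in smoothed:
            smoothed[label] /= total_weight
        best = max(smoothed, key=smoothed.get)
        return best, smoothed


if __name__ == "__main__":
    # Hypothetical classifier outputs; the third frame is a noisy one.
    frames = [
        {"alice": 0.8, "bob": 0.2},
        {"alice": 0.7, "bob": 0.3},
        {"alice": 0.4, "bob": 0.6},
        {"alice": 0.9, "bob": 0.1},
    ]
    wma = WeightedMovingAverage(window=3)
    for frame in frames:
        label, _ = wma.update(frame)
        print(label)
```

In this toy sequence the third frame alone would be classified as "bob", but the smoothed result stays on "alice" because the earlier frames still carry weight. A larger window smooths more at the cost of reacting more slowly when the person in front of the camera really does change.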


Feature image by Ars Electronica / CC BY
