Once a person (an identity) walks in front of the camera, the system runs several face recognition steps. Strictly speaking, these are separate processes rather than sequential phases, but for easier understanding we describe them as phases:
1. Head and face detection: The system marks the head and the face with a green circle and a blue dashed square. A sighting is triggered once the initial face detection is done. Head detection serves as the basis for people counting and heat maps, while face detection and sighting feed the complete face recognition process. This first phase runs on a dedicated local PC.
2. Landmark detection and face alignment: In the second phase, the system places three dots on the detected face, around the eyes and nose, to signal that it has located the face and knows its orientation as a 2D surface in a 360-degree space. Face alignment matters because the system performs face recognition only on a frontal face, and alignment can compensate only for in-plane rotation and scaling; it normalizes the detected face for the subsequent steps. In other words, in this phase the system scans the face and sets up the parameters used for recognition. The AI processing in this phase still runs on a dedicated local server.
3. Identification: The third phase consists of embedding calculation and attributes prediction. This is where the central embedder converts the biometric data collected during the sighting into a vector of 512 numbers. Each time an identity shows up in front of a camera, embedding calculation and attributes prediction are performed for every detection of sufficient quality within the sighting, so if the subject appears 10 times, the embedder creates a new vector each time. These predictions are aggregated per sighting to get a better estimate of the embedding vector and the attributes, and the sighting's central embedding vector is then used in the identification process. The central embedding is computed on a dedicated local server and afterward sent to a cloud database, so all vectors ever created are stored on a dedicated cloud. Once the vector reaches the cloud database, it is compared with all existing vectors there to check whether any of them has a similar value. If an existing vector is similar, the new vector joins it in the collection of embedding vectors that represents one identity. The system judges similarity against thresholds set beforehand: slightly similar vectors are listed as similar identities, and if the new vector differs from every existing vector, the system creates a new identity.
4. Age and gender prediction: In the last phase of this cycle, the system estimates the identity's age and gender based on the face analysis. Age and gender prediction is performed by a neural network (a model) trained specifically for this purpose, which means it is executed independently.
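
The matching logic described in phase 3 can be sketched roughly as follows. This is a minimal illustration, not the system's actual implementation: the source does not say how per-detection embeddings are aggregated or which similarity measure is used, so averaging and cosine similarity are assumptions here. The names `aggregate`, `Gallery`, `identify`, and both threshold values are hypothetical, and the in-memory `Gallery` only stands in for the cloud database. The code works for vectors of any length, including the 512-number vectors mentioned above.

```python
import math
from dataclasses import dataclass, field

# Hypothetical values: the text only says thresholds are set beforehand.
MATCH_THRESHOLD = 0.8    # at or above: treated as the same identity
SIMILAR_THRESHOLD = 0.5  # at or above: listed as a "similar identity"


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def aggregate(detections):
    """Average the per-detection embeddings of one sighting (assumed method)."""
    if not detections:
        raise ValueError("a sighting needs at least one detection")
    dim = len(detections[0])
    mean = [sum(d[i] for d in detections) / len(detections) for i in range(dim)]
    return normalize(mean)


def cosine(a, b):
    """Cosine similarity (assumed measure) between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


@dataclass
class Identity:
    identity_id: int
    vectors: list = field(default_factory=list)


class Gallery:
    """In-memory stand-in for the cloud vector database."""

    def __init__(self):
        self.identities = []

    def identify(self, sighting_vec):
        """Return (identity, similar_ids, is_new) for one sighting vector."""
        best, best_score = None, -1.0
        scores = {}
        for ident in self.identities:
            # Score an identity by its closest stored vector.
            score = max(cosine(sighting_vec, v) for v in ident.vectors)
            scores[ident.identity_id] = score
            if score > best_score:
                best, best_score = ident, score

        if best is not None and best_score >= MATCH_THRESHOLD:
            # Similar enough: the new vector joins the identity's collection.
            best.vectors.append(sighting_vec)
            target, is_new = best, False
        else:
            # Different from everything stored: create a new identity.
            target = Identity(len(self.identities) + 1, [sighting_vec])
            self.identities.append(target)
            is_new = True

        similar_ids = [
            i for i, s in scores.items()
            if s >= SIMILAR_THRESHOLD and i != target.identity_id
        ]
        return target, similar_ids, is_new
```

A sighting would then be processed by calling `aggregate` on its per-detection vectors and passing the result to `Gallery.identify`. Scoring each identity by its closest stored vector is one possible design choice; comparing against a per-identity mean vector would be an equally plausible alternative.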