How We Built a Real-Time Feedback-Assisted Auto Face Capture in React
Capturing a photo that meets strict criteria can be tricky: users need to keep their face aligned, the lighting has to be right, and nothing should obstruct the face. Recently, I had the opportunity to work on an auto face capture feature that guides users in real time while they take the photo.
The feature automatically captures a photo once all conditions are met, eliminating the need for manual intervention. It combines the MediaPipe Face Landmarker machine learning model with a secondary model that detects facial attributes the Face Landmarker cannot, all integrated into a React-based UI. In this blog, we'll mainly dive into how MediaPipe Face Landmarker can be used to process frames from a video stream and return near real-time results.
Overview
The auto face capture is designed to provide real-time feedback while the user is in front of the camera, ensuring their photo meets all criteria before it’s captured. Here’s a high-level overview of what this feature does:
Face Detection: Detects facial landmarks (eyes, nose, mouth, etc.) and facial attributes.
Validation: Checks for conditions such as proper lighting, face alignment, distance from the camera, and whether the face is covered.
Auto Capture: Once the face meets all the required conditions, the system automatically captures the frame after a short countdown.
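To make the real-time feedback concrete: each processed frame yields a status, and the UI maps that status to a message for the user. Here is a minimal sketch of such a mapping; the message wording is illustrative and not taken from the original project.

type CaptureStatus = 'TOO_DARK' | 'TOO_BRIGHT' | 'MULTIPLE_FACE' | 'GOOD_PHOTO';

// Illustrative user-facing feedback for each capture status
const FEEDBACK_MESSAGES: Record<CaptureStatus, string> = {
  TOO_DARK: 'The room is too dark, move to a brighter spot',
  TOO_BRIGHT: 'Too much light, avoid direct light on your face',
  MULTIPLE_FACE: 'Only one face should be in the frame',
  GOOD_PHOTO: 'Hold still, capturing in a moment…',
};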
Tools Used
MediaPipe Face Landmarker ML Model: This model identifies facial landmarks, providing the x, y, z coordinates of key points on the face.
Ref - https://ai.google.dev/edge/mediapipe/solutions/vision/face_landmarker
React: for rendering the UI
Step-by-Step Implementation with React
Let's break down how this is implemented, step by step.
Creating Face Landmarker instance
First, we need to install Google's @mediapipe/tasks-vision package, which provides the face landmark detection APIs. Once installed, we can initialize the Face Landmarker instance, which also downloads the model binary (face_landmarker.task).
import { FaceLandmarker, FilesetResolver } from '@mediapipe/tasks-vision';

export const createFaceLandmarker = async () => {
  // Load the WASM fileset required by the vision tasks
  const filesetResolver = await FilesetResolver.forVisionTasks(
    'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision/wasm',
  );
  const faceLandmarker = await FaceLandmarker.createFromOptions(
    filesetResolver,
    {
      baseOptions: {
        modelAssetPath: `https://storage.googleapis.com/mediapipe-models/face_landmarker/face_landmarker/float16/1/face_landmarker.task`,
        delegate: 'CPU', // or 'GPU', check if GPU is available and set accordingly
      },
      outputFaceBlendshapes: true,
      runningMode: 'IMAGE',
      numFaces: 50,
    },
  );
  return faceLandmarker;
};
We will run this model in IMAGE mode, since each video frame is drawn onto a canvas and passed to the model as a static image.
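The delegate option in the snippet above is hard-coded to 'CPU', with a comment suggesting you check for GPU availability. One rough heuristic for that check (my own, not an official MediaPipe API) is to probe for a WebGL2 context and fall back to CPU when it is unavailable:

// Rough heuristic: the GPU delegate typically relies on WebGL2,
// so fall back to CPU when a WebGL2 context cannot be created.
const getPreferredDelegate = (): 'GPU' | 'CPU' => {
  try {
    const canvas = document.createElement('canvas');
    return canvas.getContext('webgl2') ? 'GPU' : 'CPU';
  } catch {
    return 'CPU';
  }
};

// Usage inside createFaceLandmarker:
// baseOptions: { modelAssetPath, delegate: getPreferredDelegate() },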
Setting up the Video Stream
Next, we access the device camera and stream the video feed into an HTML video element. We do this with the browser's navigator.mediaDevices.getUserMedia API and React's useRef to manage the video element.
const videoRef = useRef<HTMLVideoElement | null>(null);

useEffect(() => {
  if (navigator.mediaDevices.getUserMedia) {
    navigator.mediaDevices.getUserMedia({ video: true })
      .then((stream) => {
        if (videoRef.current) {
          videoRef.current.srcObject = stream;
        }
      })
      .catch((error) => {
        console.error("Camera access error: ", error);
      });
  }
}, []);
videoRef: a reference to the video element where the video stream is displayed. This is essential for accessing and controlling the video feed within the React component.
useEffect: this hook ensures that camera access is requested and the stream is attached to the video element as soon as the component mounts.
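One thing worth adding, not shown in the snippet above, is cleanup: stopping the camera tracks when the component unmounts so the camera isn't left running. A minimal sketch that extends the same effect:

useEffect(() => {
  let stream: MediaStream | null = null;

  navigator.mediaDevices.getUserMedia({ video: true })
    .then((s) => {
      stream = s;
      if (videoRef.current) {
        videoRef.current.srcObject = s;
      }
    })
    .catch((error) => console.error('Camera access error: ', error));

  // Stop all tracks on unmount so the camera is released
  return () => {
    stream?.getTracks().forEach((track) => track.stop());
  };
}, []);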
Processing Video Frames Using Canvas and ML Models
Once the video stream is active, we need to process each frame and run it through the machine learning models. We use an HTML <canvas> element (controlled via a React useRef) to capture and process the video frames in real time.

import { FaceLandmarkerResult } from '@mediapipe/tasks-vision';

const canvasRef = useRef<HTMLCanvasElement | null>(null);
const isModelRunningRef = useRef(false);
const [captureStatus, setCaptureStatus] = useState('');

const validateFrame = (
  faceLandmarkerResult?: FaceLandmarkerResult,
  canvas?: HTMLCanvasElement,
  modelResult?: unknown, // result of the secondary model, used by the other checks
) => {
  const { isTooBright, isTooDark } = isTooDarkOrTooBright(canvas);
  if (isTooDark) {
    return 'TOO_DARK';
  }
  if (isTooBright) {
    return 'TOO_BRIGHT';
  }
  if (isMultipleFaces(faceLandmarkerResult)) {
    return 'MULTIPLE_FACE';
  }
  // ...all other checks can be added here.
  return 'GOOD_PHOTO';
};

const runModel = (canvas, faceLandmarker) => {
  // Run the models on a new frame only if processing of the previous frame is complete.
  // Note that this means some frames are skipped and never processed.
  if (isModelRunningRef.current === true) return;
  isModelRunningRef.current = true;

  // Process the frame using Face Landmarker
  const faceLandmarks = faceLandmarker.detect(canvas);

  // Process the frame using the internal (secondary) ML model
  const modelResult = runTFLiteModel(canvas);

  // Validate the frame
  const captureStatus = validateFrame(faceLandmarks, canvas, modelResult);
  setCaptureStatus(captureStatus); // use this state to show feedback on the UI

  if (captureStatus !== 'GOOD_PHOTO') {
    stopCapture();
    isModelRunningRef.current = false;
    return;
  }

  // captureStatus is GOOD_PHOTO, start the capture
  startCapture();
  isModelRunningRef.current = false;
};

const processFrame = (faceLandmarker) => {
  const canvas = canvasRef.current;
  const video = videoRef.current;
  if (canvas && video) {
    const context = canvas.getContext('2d')!;
    canvas.width = video.videoWidth;
    canvas.height = video.videoHeight;

    // Draw the current video frame onto the canvas
    context.drawImage(video, 0, 0, canvas.width, canvas.height);
    runModel(canvas, faceLandmarker);

    // Continue processing frames recursively
    window.requestAnimationFrame(() => processFrame(faceLandmarker));
  }
};

navigator.mediaDevices
  .getUserMedia(constraints)
  .then((stream) => {
    streamRef.current = stream;
    if (videoRef.current == null) return;
    videoRef.current.srcObject = stream;
    videoRef.current.play();
    // Start processing frames on the `loadeddata` event of the video element.
    videoRef.current.addEventListener('loadeddata', () =>
      processFrame(faceLandmarker),
    );
  });
canvasRef: a reference to the canvas element onto which each frame from the video is drawn.
processFrame: this function is called recursively via window.requestAnimationFrame, which means it runs after each repaint done by the browser. This has its own advantage: for instance, if the tab is not active, processFrame is not called.
startCapture(): starts the countdown and handles whatever is needed once the countdown begins.
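For completeness, here's roughly what the JSX could look like; this is a sketch, and the styling as well as the decision to keep the canvas hidden are my own assumptions. The video element shows the live preview, while the canvas is used only for frame processing.

return (
  <div>
    {/* Live camera preview */}
    <video ref={videoRef} autoPlay playsInline muted />

    {/* Off-screen canvas used only for frame processing */}
    <canvas ref={canvasRef} style={{ display: 'none' }} />

    {/* Real-time feedback driven by captureStatus */}
    <p>{captureStatus}</p>
  </div>
);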
Validating the frame
To ensure the captured photo meets all the necessary criteria, we validate each frame by running various checks. Here are the utility functions used for validation:
Lighting Validation (isTooDarkOrTooBright)
This utility checks whether the lighting is too dark or too bright, based on the average brightness of all pixels in the frame.

const TOO_DARK_THRESHOLD = 60;
const TOO_BRIGHT_THRESHOLD = 200;

// This function converts each pixel to grayscale and returns the average over all pixels,
// so the final value is between 0 (darkest) and 255 (brightest).
const getFrameBrightness = (canvas: HTMLCanvasElement) => {
  const ctx = canvas.getContext('2d');
  if (!ctx) return;
  let colorSum = 0;
  const imageData = ctx.getImageData(0, 0, canvas.width, canvas.height);
  const data = imageData.data;
  let r, g, b, avg;
  for (let x = 0, len = data.length; x < len; x += 4) {
    r = data[x];
    g = data[x + 1];
    b = data[x + 2];
    avg = Math.floor((r + g + b) / 3);
    colorSum += avg;
  }
  // value between 0 - 255
  const brightness = Math.floor(colorSum / (canvas.width * canvas.height));
  return brightness;
};

const isTooDarkOrTooBright = (canvas: HTMLCanvasElement) => {
  const brightness = getFrameBrightness(canvas);
  let isTooDark = false;
  let isTooBright = false;
  if (brightness == null) {
    return { isTooBright, isTooDark };
  }
  if (brightness < TOO_DARK_THRESHOLD) {
    isTooDark = true;
  } else if (brightness > TOO_BRIGHT_THRESHOLD) {
    isTooBright = true;
  }
  return { isTooBright, isTooDark };
};
Checking for Multiple Faces (isMultipleFaces)
The result returned by the Face Landmarker can be passed to this utility; if landmarks for more than one face are present, it returns true.

import { FaceLandmarkerResult } from '@mediapipe/tasks-vision';

export const isMultipleFaces = (faceLandmarkerResult?: FaceLandmarkerResult) => {
  if (faceLandmarkerResult && faceLandmarkerResult.faceLandmarks.length > 1) {
    return true;
  }
  return false;
};
Face Cutoff Detection (isFaceCutOffScreen)
This function checks whether any of the face landmarks fall outside the boundaries of the image (canvas). Since the x and y coordinates in the Face Landmarker result are normalized, we convert them to actual pixel coordinates by multiplying with the frame width and height.

import { NormalizedLandmark } from '@mediapipe/tasks-vision';

function isFaceCutOffScreen(
  faceLandmarks: NormalizedLandmark[],
  imgW: number,
  imgH: number,
): boolean {
  for (const landmark of faceLandmarks) {
    const x = Math.round(landmark.x * imgW);
    const y = Math.round(landmark.y * imgH);
    // A landmark on or beyond the frame border means the face is cut off
    if (x <= 0 || x >= imgW || y <= 0 || y >= imgH) {
      return true;
    }
  }
  return false;
}
Face Distance Detection (isFaceTooClose, isFaceTooFar)
These functions determine whether the face is too close to or too far from the camera by measuring the pixel distance between the eyes.

import { NormalizedLandmark } from '@mediapipe/tasks-vision';

// Calculate the Euclidean distance between two points
const getDistance = (point1: number[], point2: number[]): number => {
  const [x1, y1] = point1;
  const [x2, y2] = point2;
  return Math.sqrt(Math.pow(x2 - x1, 2) + Math.pow(y2 - y1, 2));
};

const FACE_TOO_CLOSE_THRESHOLD = 370;
const FACE_TOO_FAR_THRESHOLD = 300;

function isFaceTooFar(
  landmark: NormalizedLandmark[],
  imgW: number,
  imgH: number,
  threshold: number = FACE_TOO_FAR_THRESHOLD,
): boolean {
  // Landmarks 33 and 263 are eye-corner landmarks in the MediaPipe face mesh
  const leftEye = [landmark[33].x * imgW, landmark[33].y * imgH];
  const rightEye = [landmark[263].x * imgW, landmark[263].y * imgH];
  // Calculate the distance between the eyes
  const eyeDistance = getDistance(leftEye, rightEye);
  return eyeDistance < threshold;
}

function isFaceTooClose(
  landmark: NormalizedLandmark[],
  imgW: number,
  imgH: number,
  threshold: number = FACE_TOO_CLOSE_THRESHOLD,
): boolean {
  const leftEye = [landmark[33].x * imgW, landmark[33].y * imgH];
  const rightEye = [landmark[263].x * imgW, landmark[263].y * imgH];
  // Calculate the distance between the eyes
  const eyeDistance = getDistance(leftEye, rightEye);
  return eyeDistance > threshold;
}
Is the Face Centered?
These functions check whether the face is positioned too far to the left, right, up, or down in the frame. This is done by checking the leftmost, rightmost, topmost, and bottommost landmark points and comparing them against thresholds.

// Note: these thresholds are in pixels and depend on the frame resolution.
const FACE_TOO_RIGHT_THRESHOLD = 500;
const FACE_TOO_LEFT_THRESHOLD = 600;
const FACE_TOO_FAR_UP_THRESHOLD = 150;
const FACE_TOO_FAR_DOWN_THRESHOLD = 450;

function isFaceTooFarLeft(
  landmark: NormalizedLandmark[],
  imgWidth: number,
  thresholdRatio: number = FACE_TOO_LEFT_THRESHOLD,
): boolean {
  const leftmostX = Math.min(
    landmark[1].x * imgWidth,
    landmark[263].x * imgWidth,
  );
  return leftmostX > thresholdRatio;
}

function isFaceTooFarRight(
  landmark: NormalizedLandmark[],
  imgWidth: number,
  thresholdRatio: number = FACE_TOO_RIGHT_THRESHOLD,
): boolean {
  const rightmostX = Math.max(
    landmark[1].x * imgWidth,
    landmark[263].x * imgWidth,
  );
  return rightmostX < thresholdRatio;
}

function isFaceTooFarUp(
  landmark: NormalizedLandmark[],
  imgHeight: number,
  thresholdRatio: number = FACE_TOO_FAR_UP_THRESHOLD,
): boolean {
  const topmostY = landmark[10].y * imgHeight;
  return topmostY < thresholdRatio;
}

function isFaceTooFarDown(
  landmark: NormalizedLandmark[],
  imgHeight: number,
  thresholdRatio: number = FACE_TOO_FAR_DOWN_THRESHOLD,
): boolean {
  const bottommostY = landmark[10].y * imgHeight;
  return bottommostY > thresholdRatio;
}
Are Eyes Closed?
Fortunately, the Face Landmarker also returns face blendshapes, which describe facial attributes such as whether the eyes are closed or where the person is looking. We can use two of these attributes (eyeBlinkLeft and eyeBlinkRight) to check whether the eyes are closed.
For more such attributes, refer to this codepen - https://codepen.io/mediapipe-preview/pen/OJBVQJm

import { FaceLandmarkerResult } from '@mediapipe/tasks-vision';

const isEyesClosed = (faceLandmarkResult: FaceLandmarkerResult) => {
  // Pick the eyeBlinkLeft and eyeBlinkRight blendshape scores
  const result = faceLandmarkResult?.faceBlendshapes?.[0]?.categories
    ?.filter(
      (category: any) =>
        category.categoryName === 'eyeBlinkLeft' ||
        category.categoryName === 'eyeBlinkRight',
    )
    ?.map((category: any) => category.score);
  if (!result) return false;
  // Either eye is considered closed when its blink score crosses 0.5
  return result[0] > 0.5 || result[1] > 0.5;
};
Detecting Head Orientation
To check whether the user is looking up, down, left, or right, we can calculate the yaw and pitch angles of the head. One way to compute these angles is with the OpenCV library, which involves some fairly complex calculations on the landmark points. You can check it out here - https://medium.com/@susanne.thierfelder/head-pose-estimation-with-mediapipe-and-opencv-in-javascript-c87980df3acb
I did not want to add OpenCV as a dependency just for this use case, so I found an alternative approach that does a decent job. You can read more about it here - https://medium.com/@sshadmand/a-simple-and-efficient-face-direction-detection-in-react-e02cd9d547e5
Here's how I implemented it:
import { FaceLandmarkerResult, NormalizedLandmark } from '@mediapipe/tasks-vision';

// PITCH_UP_THRESHOLD, PITCH_DOWN_THRESHOLD, YAW_LEFT_THRESHOLD and YAW_RIGHT_THRESHOLD
// are tunable angle thresholds (in degrees) defined elsewhere in the project.

const getAngleBetweenLines = (
  midpoint: NormalizedLandmark,
  point1: NormalizedLandmark,
  point2: NormalizedLandmark,
) => {
  const vector1 = { x: point1.x - midpoint.x, y: point1.y - midpoint.y };
  const vector2 = { x: point2.x - midpoint.x, y: point2.y - midpoint.y };

  // Calculate the dot product of the two vectors
  const dotProduct = vector1.x * vector2.x + vector1.y * vector2.y;

  // Calculate the magnitudes of the vectors
  const magnitude1 = Math.sqrt(vector1.x * vector1.x + vector1.y * vector1.y);
  const magnitude2 = Math.sqrt(vector2.x * vector2.x + vector2.y * vector2.y);

  // Calculate the cosine of the angle between the two vectors
  const cosineTheta = dotProduct / (magnitude1 * magnitude2);

  // Use the arccosine function to get the angle in radians
  const angleInRadians = Math.acos(cosineTheta);

  // Convert the angle to degrees
  const angleInDegrees = (angleInRadians * 180) / Math.PI;
  return angleInDegrees;
};

const calculateDirection = (faceLandmarkerResult: FaceLandmarkerResult) => {
  const landmarks = faceLandmarkerResult.faceLandmarks[0];

  // Nose tip, leftmost and rightmost points of the nose
  if (!landmarks?.[1] || !landmarks?.[279] || !landmarks?.[49])
    return {
      isLookingDown: false,
      isLookingLeft: false,
      isLookingRight: false,
      isLookingUp: false,
    };
  const noseTip = { ...landmarks[1] };
  const leftNose = { ...landmarks[279] };
  const rightNose = { ...landmarks[49] };

  // The midsection of the nose acts as the base of the perpendicular
  const midpoint: NormalizedLandmark = {
    x: (leftNose.x + rightNose.x) / 2,
    y: (leftNose.y + rightNose.y) / 2,
    z: (leftNose.z + rightNose.z) / 2,
    visibility: 0,
  };
  const perpendicularUp: NormalizedLandmark = {
    x: midpoint.x,
    y: midpoint.y - 50,
    z: midpoint.z,
    visibility: 0,
  };

  // Calculate the angles
  const pitch = getAngleBetweenLines(midpoint, noseTip, perpendicularUp);
  const yaw = getAngleBetweenLines(midpoint, rightNose, noseTip);

  const isLookingUp = pitch < PITCH_UP_THRESHOLD;
  const isLookingDown = pitch > PITCH_DOWN_THRESHOLD;
  const isLookingLeft = yaw > YAW_LEFT_THRESHOLD;
  const isLookingRight = yaw < YAW_RIGHT_THRESHOLD;

  return { isLookingDown, isLookingLeft, isLookingRight, isLookingUp };
};
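Putting the checks together: here is a rough sketch (my own, not a verbatim excerpt from the project) of how the earlier validateFrame could wire up all of these utilities, returning the first failing status so the UI can show the most relevant feedback. The status strings beyond the ones used earlier and the ordering are my own choices, and the directional checks from calculateDirection are folded into a single FACE_NOT_STRAIGHT status for brevity.

const validateFrame = (
  faceLandmarkerResult?: FaceLandmarkerResult,
  canvas?: HTMLCanvasElement,
) => {
  if (!canvas) return 'NO_FRAME';

  // 1. Lighting
  const { isTooBright, isTooDark } = isTooDarkOrTooBright(canvas);
  if (isTooDark) return 'TOO_DARK';
  if (isTooBright) return 'TOO_BRIGHT';

  // 2. Face presence and count
  const landmarks = faceLandmarkerResult?.faceLandmarks?.[0];
  if (!landmarks) return 'NO_FACE';
  if (isMultipleFaces(faceLandmarkerResult)) return 'MULTIPLE_FACE';

  const { width, height } = canvas;

  // 3. Framing: cut-off, distance and centering
  if (isFaceCutOffScreen(landmarks, width, height)) return 'FACE_CUT_OFF';
  if (isFaceTooFar(landmarks, width, height)) return 'TOO_FAR';
  if (isFaceTooClose(landmarks, width, height)) return 'TOO_CLOSE';
  if (
    isFaceTooFarLeft(landmarks, width) ||
    isFaceTooFarRight(landmarks, width) ||
    isFaceTooFarUp(landmarks, height) ||
    isFaceTooFarDown(landmarks, height)
  ) {
    return 'NOT_CENTERED';
  }

  // 4. Head pose and eyes
  const { isLookingUp, isLookingDown, isLookingLeft, isLookingRight } =
    calculateDirection(faceLandmarkerResult!);
  if (isLookingUp || isLookingDown || isLookingLeft || isLookingRight) {
    return 'FACE_NOT_STRAIGHT';
  }
  if (isEyesClosed(faceLandmarkerResult!)) return 'EYES_CLOSED';

  return 'GOOD_PHOTO';
};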
Face Capture and Final Confirmation
Once all validations pass and the frame is deemed valid, a countdown starts, and the frame is captured automatically.
A useCountdown hook can be implemented from scratch or consumed from an external package. I used the usehooks-ts package since I did not want to reinvent the wheel, and it handles the nitty-gritty details of the hook's implementation.

import { useCountdown } from 'usehooks-ts';

const isCapturingRef = useRef(false);
const [photo, setPhoto] = useState<Blob | null>(null);

const [count, { startCountdown, stopCountdown, resetCountdown }] =
  useCountdown({
    countStart: 3,
    countStop: 1,
    intervalMs: 1000,
  });

const startCapture = () => {
  startCountdown();
};

const stopCapture = () => {
  stopCountdown();
  resetCountdown();
};

const onImageCapture = () => {
  if (canvasRef && canvasRef.current) {
    const context = canvasRef.current.getContext('2d');
    if (context) {
      // Convert the canvas to a blob and store the photo in state
      canvasRef.current.toBlob((b) => setPhoto(b), 'image/jpeg', 0.9);
    }
  }
};

useEffect(() => {
  if (count === 1) {
    onImageCapture();
  }
}, [count]);
Finally, we have the captured photo stored in the React state photo, which can be consumed as needed. It can be shown to the user for confirmation and then sent to upstream services.
Useful trick
To get an image URL from a blob, you can simply use URL.createObjectURL(photo). This returns a string that can be passed to the src attribute of an img tag.
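For example, a minimal sketch of a preview component; the component name and the revocation logic are my own additions, not part of the original project.

import { useEffect, useState } from 'react';

// Hypothetical preview component for the captured photo blob
const PhotoPreview = ({ photo }: { photo: Blob }) => {
  const [url, setUrl] = useState('');

  useEffect(() => {
    const objectUrl = URL.createObjectURL(photo);
    setUrl(objectUrl);
    // Release the object URL when the component unmounts or the photo changes
    return () => URL.revokeObjectURL(objectUrl);
  }, [photo]);

  return url ? <img src={url} alt="Captured photo" /> : null;
};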
Fine-Tuning Thresholds
While the checks above work well out of the box, they are highly customizable. You can adjust the thresholds for brightness, face distance, centering, and so on to suit your use case.
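One way to keep these knobs in one place (a sketch of my own, not how the original project organizes them) is a single config object whose values mirror the constants used earlier in this post, passed into the utilities as overrides.

// Centralised thresholds; values mirror the constants defined earlier
const CAPTURE_THRESHOLDS = {
  brightness: { tooDark: 60, tooBright: 200 },
  eyeDistancePx: { tooClose: 370, tooFar: 300 },
  centeringPx: {
    tooLeft: 600,
    tooRight: 500,
    tooFarUp: 150,
    tooFarDown: 450,
  },
} as const;

// Example: pass an override into one of the utilities from earlier
// (assuming `landmarks` and `canvas` are in scope, as in processFrame above)
const tooFar = isFaceTooFar(
  landmarks,
  canvas.width,
  canvas.height,
  CAPTURE_THRESHOLDS.eyeDistancePx.tooFar,
);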
Performance Optimization
Since the models run continuously, processing one frame after another, they can overwhelm the main thread, leaving the UI frozen and degrading the user experience. To avoid this, we can run the models asynchronously. Especially for time-consuming operations like face detection, asynchronous execution helps keep the interface responsive.
So I wrote a wrapper that turns a synchronous call into a promise (deferring it with setTimeout), and used it to run the Face Landmarker.
function asyncWrapper<T>(syncFunction: () => T): Promise<T> {
  return new Promise((resolve, reject) => {
    // Defer the synchronous work to a macrotask so the current frame can finish rendering
    setTimeout(() => {
      try {
        const result = syncFunction();
        resolve(result);
      } catch (error) {
        reject(error);
      }
    }, 0);
  });
}
const runModel = async () => {
  //...
  const faceLandmarks = await asyncWrapper(() => faceLandmarker.detect(canvas));
  //...
};
I hope this is helpful to you! If you have any questions or need further assistance, don't hesitate to reach out to me.