Hands on MediaPipe and OpenCV
While learning about computer vision, I came across OpenCV, a library for real-time computer vision applications. It is a cross-platform library originally developed by Intel. OpenCV is written in C++, but it also has bindings for other languages, including Python, JavaScript and MATLAB.
Let us start with an interesting question. Have you ever wondered what is behind the Google Assistant on your phone? Well, that is MediaPipe.
MediaPipe is a cross-platform framework for building machine learning pipelines that process time-series data such as video and audio. It provides a suite of libraries and tools for applying machine learning and artificial intelligence techniques in applications. It includes cross-platform APIs and libraries, pre-trained models, and MediaPipe Studio for working with the framework.
Let us look at one application of MediaPipe: hand landmark and movement detection. MediaPipe provides a hand and finger tracking solution. Making this hand perception functionality available to the wider research and development community should lead to creative use cases, new applications and new research avenues.
A small amount of code is enough for palm and hand landmark detection. The solution works in two stages. First, a palm detection model locates the hand in the full image and returns a bounding box around it; rigid shapes such as palms and fists are easier to detect than hands with articulated fingers. Then a hand landmark model performs keypoint localization of 21 3D hand-knuckle coordinates inside the detected hand region via regression.
With these landmarks, one can locate any part of the hand.
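To make the 21 keypoints concrete, here is a small sketch of how they are indexed. The enum below is a stand-in that mirrors the names and order of mp_hands.HandLandmark so that it runs without MediaPipe installed; in real code you would use mp_hands.HandLandmark directly. The helper finger_landmarks is hypothetical and only there for illustration.

```python
from enum import IntEnum


class HandLandmark(IntEnum):
    """Stand-in mirroring mp_hands.HandLandmark (assumption: same names and order)."""
    WRIST = 0
    THUMB_CMC = 1
    THUMB_MCP = 2
    THUMB_IP = 3
    THUMB_TIP = 4
    INDEX_FINGER_MCP = 5
    INDEX_FINGER_PIP = 6
    INDEX_FINGER_DIP = 7
    INDEX_FINGER_TIP = 8
    MIDDLE_FINGER_MCP = 9
    MIDDLE_FINGER_PIP = 10
    MIDDLE_FINGER_DIP = 11
    MIDDLE_FINGER_TIP = 12
    RING_FINGER_MCP = 13
    RING_FINGER_PIP = 14
    RING_FINGER_DIP = 15
    RING_FINGER_TIP = 16
    PINKY_MCP = 17
    PINKY_PIP = 18
    PINKY_DIP = 19
    PINKY_TIP = 20


def finger_landmarks(prefix):
    """Return the landmarks of one finger, ordered from the base joint to the tip."""
    return [lm for lm in HandLandmark if lm.name.startswith(prefix)]


# Print the joints of the index finger with their indices.
for lm in finger_landmarks('INDEX_FINGER'):
    print(int(lm), lm.name)
```

Each finger contributes four landmarks and the wrist adds one more, which gives the 21 points in total.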
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

# Paths of the images to process (fill in your own files).
IMAGE_FILES = []

with mp_hands.Hands(
        static_image_mode=True,
        max_num_hands=2,
        min_detection_confidence=0.5) as hands:
    for file in IMAGE_FILES:
        # Read the image and flip it around the y-axis for correct handedness output.
        image = cv2.flip(cv2.imread(file), 1)
        # Convert the BGR image to RGB before processing.
        results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
        print('Handedness:', results.multi_handedness)
        if not results.multi_hand_landmarks:
            continue
        image_height, image_width, _ = image.shape
        annotated_image = image.copy()
        for hand_landmarks in results.multi_hand_landmarks:
            print('hand_landmarks:', hand_landmarks)
            tip = hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP]
            print(
                'Index finger tip coordinates: ('
                f'{tip.x * image_width}, {tip.y * image_height})'
            )
This prints the pixel coordinates of the index fingertip in the 2D image.
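The multiplication by image_width and image_height in the code above works because MediaPipe reports each landmark's x and y as values normalized to the image size. Here is a minimal sketch of that conversion as a standalone function. The name to_pixel and the clamping step are my own additions (assumptions, not part of MediaPipe); clamping is useful if a landmark is estimated slightly outside the frame.

```python
def to_pixel(x_norm, y_norm, image_width, image_height):
    """Map normalized landmark coordinates to integer pixel coordinates."""
    # Scale by the image size, then clamp so the point stays on the image.
    px = min(max(int(x_norm * image_width), 0), image_width - 1)
    py = min(max(int(y_norm * image_height), 0), image_height - 1)
    return px, py


# A landmark in the horizontal centre, a quarter of the way down a 640x480 frame.
print(to_pixel(0.5, 0.25, 640, 480))
```

Integer pixel coordinates like these are what OpenCV drawing functions such as cv2.circle expect.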
To draw the hand annotations as shown in the video above, one can use the following code:
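# Requires: import cv2 and import mediapipe as mp
mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles

# Draw the hand annotations on the image.
# Here `image` is the RGB frame that was passed to hands.process();
# skip the colour conversion if your image is already in BGR.
image.flags.writeable = True
image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
if results.multi_hand_landmarks:
    for hand_landmarks in results.multi_hand_landmarks:
        mp_drawing.draw_landmarks(
            image,
            hand_landmarks,
            mp_hands.HAND_CONNECTIONS,
            mp_drawing_styles.get_default_hand_landmarks_style(),
            mp_drawing_styles.get_default_hand_connections_style())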
OpenCV can be used to capture frames from the webcam:
cap = cv2.VideoCapture(0)
Here cv2 is the OpenCV Python module. VideoCapture(0) opens the default camera, so the incoming video frames can be fed to the application.
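Once the camera is open, frames are usually read in a loop. Below is a minimal sketch of such a loop. The helper read_frames is hypothetical (it is not part of OpenCV); it only assumes that the capture object has read() and release() methods, which cv2.VideoCapture provides, so the sketch itself runs without a camera.

```python
def read_frames(cap, max_frames=None):
    """Yield frames from a cv2.VideoCapture-like object until reading fails."""
    count = 0
    try:
        while max_frames is None or count < max_frames:
            success, frame = cap.read()
            if not success:
                # A failed read usually means the camera is unavailable
                # or the stream has ended.
                break
            yield frame
            count += 1
    finally:
        # Always free the camera, even if the caller stops early.
        cap.release()


# Intended use with OpenCV (not executed here):
#   for frame in read_frames(cv2.VideoCapture(0)):
#       results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
```

Each frame arrives in BGR order, so it should be converted to RGB before it is passed to hands.process(), just as in the static image example.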
MediaPipe offers solutions for many tasks, including:
Object detection
Image classification
Image segmentation
Interactive segmentation
Gesture recognition
Hand landmark detection
Image embedding
Face detection
Face landmark detection
Pose landmark detection
With these, I would like to conclude my article.
Hope you guys liked it
Thank you
Akhil Soni
To connect with me: https://www.linkedin.com/in/akhil-soni-9827181a1/
Written by
Akhil Soni
I am an ML enthusiast who is passionate about development and interested in programming and problem solving.