Mediapipe: Fingers counting in Python w/o GPU

Dhruv Pandey
Published in Analytics Vidhya
4 min read · Apr 15, 2021

Two popular techniques for counting fingers in an image are training a CNN and using contours with a convex hull. I have tried both, and in this section I describe the challenges I faced with each (with links to the work I did).

Experiments and challenges

  1. CNN Approach: The model achieves good training and validation accuracy, and the final plot also looks good. But when it comes to detection in real-life images, the model fails badly. I tried tuning the hyperparameters, data augmentation, transfer learning, learning rate decay, and changes to the model architecture, but saw no improvement. The main reason the model fails on real-life images is that the training and testing images are very similar and over-simplified, so the model overfits to them and does not generalize. Check my attempts 1 and 2 (feel free to suggest any changes that could help).
  2. Contours and Convex Hull: This approach performs considerably better than the first. Detections are quick and confident. The only challenge is that you have to take care of the background: it does not work with a cluttered background.

After all this investigation, I came across the Hands module of the Mediapipe library, which performed surprisingly well and had none of the challenges I described above. It is also very easy to implement and needs no GPU. This article is a continuation of my previous article about Mediapipe. I strongly recommend going through it before starting this one.

Small tuning to HandDetector class

Before starting to work on the finger counter, you need to add a small piece of code to your existing Hand detector class that you wrote in the last article. You can check the complete source code to find out where exactly to put this change.

if results.multi_handedness:
    label = results.multi_handedness[handNumber].classification[0].label  # label tells if the hand is left or right
    # account for inversion in webcams
    if label == "Left":
        label = "Right"
    elif label == "Right":
        label = "Left"

Earlier we were only sending the id and the x and y coordinates to the calling function. Now we also send a string indicating whether the hand is left or right. We also need to account for the lateral inversion that happens when you read the image from the webcam, which is why the label is swapped.
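The label swap can also be isolated in a small helper, which makes it easy to check on its own. This is only a minimal sketch: `mirror_handedness` is a hypothetical name that is not part of the HandDetector class, and it assumes the label strings are exactly "Left" and "Right", as in the snippet above.

```python
def mirror_handedness(label: str) -> str:
    """Swap "Left" and "Right"; return any other value unchanged."""
    swapped = {"Left": "Right", "Right": "Left"}
    return swapped.get(label, label)


print(mirror_handedness("Left"))   # Right
print(mirror_handedness("Right"))  # Left
```

Using a lookup instead of an if/elif chain keeps the swap symmetric and leaves unexpected labels untouched.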

Actual finger counter work

  1. Import the libraries and do the standard initialization (all details are discussed in the first article).
from handDetector import HandDetector
import cv2

handDetector = HandDetector(min_detection_confidence=0.7)
webcamFeed = cv2.VideoCapture(0)

  2. The main logic goes here:

while True:
    status, image = webcamFeed.read()
    if not status:  # stop if no frame could be read from the webcam
        break
    handLandmarks = handDetector.findHandLandMarks(image=image, draw=True)
    count = 0

    if len(handLandmarks) != 0:
        # each landmark is [id, x, y, label]; image y grows downward
        # we get the y coordinate of each fingertip and check if it lies above the middle landmark of that finger
        # details: https://google.github.io/mediapipe/solutions/hands

        if handLandmarks[4][3] == "Right" and handLandmarks[4][1] > handLandmarks[3][1]:  # Right Thumb
            count = count + 1
        elif handLandmarks[4][3] == "Left" and handLandmarks[4][1] < handLandmarks[3][1]:  # Left Thumb
            count = count + 1
        if handLandmarks[8][2] < handLandmarks[6][2]:  # Index finger
            count = count + 1
        if handLandmarks[12][2] < handLandmarks[10][2]:  # Middle finger
            count = count + 1
        if handLandmarks[16][2] < handLandmarks[14][2]:  # Ring finger
            count = count + 1
        if handLandmarks[20][2] < handLandmarks[18][2]:  # Little finger
            count = count + 1

    # draw the count on the frame and show it
    cv2.putText(image, str(count), (45, 375), cv2.FONT_HERSHEY_SIMPLEX, 5, (255, 0, 0), 25)
    cv2.imshow("Finger counter", image)
    cv2.waitKey(1)

We read the webcam frame and initialize a variable count, which will hold the final finger count.

Have a look at this image from Mediapipe’s hand module:

The image is taken from Mediapipe’s official website. Please check here to view full working details.

The logic we use here is that if the tip of a finger lies below the middle landmark of that finger, the finger is closed. Keep in mind that image y coordinates grow downward, so "below" means a larger y value and "above" means a smaller one.

For example: if the y coordinate of landmark 8 is less than the y coordinate of landmark 6, the index fingertip is above its middle landmark. This means the index finger is open, so we count it.

if handLandmarks[8][2] < handLandmarks[6][2]:
    count = count + 1
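The same rule applies to the other three fingers, so it can be written once and tested without a webcam. The sketch below assumes each landmark is a list of the form [id, x, y, label], matching the indexing used in this article. `FINGER_JOINTS` and `count_open_fingers` are hypothetical names used only for illustration.

```python
# (tip, middle landmark) ids for the index, middle, ring and little finger.
FINGER_JOINTS = [(8, 6), (12, 10), (16, 14), (20, 18)]


def count_open_fingers(landmarks):
    """Count non-thumb fingers whose tip lies above its middle landmark."""
    # Image y grows downward, so "above" means a smaller y value.
    return sum(
        1
        for tip, joint in FINGER_JOINTS
        if landmarks[tip][2] < landmarks[joint][2]
    )


# Synthetic hand: 21 landmarks, all at y=200, so no finger counts as open.
hand = [[i, 0, 200, "Right"] for i in range(21)]
hand[8][2] = 120   # raise the index fingertip
hand[12][2] = 130  # raise the middle fingertip
print(count_open_fingers(hand))  # 2
```

Feeding the function hand-made coordinates like this is a quick way to confirm the direction of the comparison before trying it on real frames.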

A special case is the thumb. If you make a fist, your thumb does not close vertically like the other fingers; it closes horizontally. So the thumb does not follow the same rule as the other fingers. Instead, you have to put the condition on the x coordinate, and the direction of the comparison depends on whether the hand is left or right (try it yourself; the images are just for reference).

Thumb position for “0”
Thumb position for “1”
if handLandmarks[4][3] == "Right" and handLandmarks[4][1] > handLandmarks[3][1]:  # Right Thumb
    count = count + 1
elif handLandmarks[4][3] == "Left" and handLandmarks[4][1] < handLandmarks[3][1]:  # Left Thumb
    count = count + 1
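The thumb condition can be checked in isolation in the same way. This is a sketch under the same assumption that each landmark is [id, x, y, label]; `thumb_is_open` is a hypothetical helper name, and the comparison directions are copied from the snippet above.

```python
def thumb_is_open(landmarks):
    """Apply the x-coordinate rule to landmark 4 (thumb tip) and landmark 3."""
    label = landmarks[4][3]
    tip_x = landmarks[4][1]
    joint_x = landmarks[3][1]
    if label == "Right":
        return tip_x > joint_x
    if label == "Left":
        return tip_x < joint_x
    return False  # unknown label: do not count the thumb


# Synthetic right hand with every landmark at x=50.
hand = [[i, 50, 200, "Right"] for i in range(21)]
hand[4][1] = 80  # move the thumb tip to a larger x than landmark 3
print(thumb_is_open(hand))  # True
```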

In the final part, you do some routine work to display the count on your frame.

This is pretty much all the code. When you run it, you will see the count drawn on the frame. You will also notice that detection is fast, without any lag.

Final words

See how simple it was to implement a finger counter. The idea behind this article is not to persuade you to stop using deep learning and switch to third-party libraries. Instead, before starting work on any problem, spend some time researching the available options and decide whether it is worth doing everything from scratch or building on top of the available resources.

Everything else is up to your will and creativity. You can find the code for this project here.

Dhruv Pandey
Analytics Vidhya

A machine learning and computer vision enthusiast working as a web developer in Finland.