Researchers at Google have developed a technology that enables high-fidelity hand and finger tracking using machine learning. Whereas current approaches to hand perception rely primarily on powerful desktop environments for inference, Google's new method achieves real-time performance on a mobile phone and even scales to multiple hands. The technology is built from several machine learning models working together: a palm detector that operates on the full image, a hand landmark model that operates on the image region defined by the palm detector and returns 3D hand keypoints, and a gesture recogniser that classifies the previously computed keypoint configuration into a discrete set of gestures. Google has made the new technology available to the research and development community. While the company will continue to improve the underlying mechanisms for more robust and stable hand tracking, it hopes that other researchers will come up with creative use cases, including in sign language understanding.
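
As a rough illustration of how the pipeline fits together, here is a minimal Python sketch. It assumes the models as shipped in Google's MediaPipe framework and its later Python Solutions API (the article does not name a concrete library); the gesture rule at the end is a deliberately simplified stand-in for the actual recogniser, and `hand.jpg` is a hypothetical input file.

```python
# Minimal sketch of the three-stage pipeline: palm detection and hand
# landmark estimation run inside MediaPipe's Hands solution; a toy
# rule-based "gesture" step then interprets the 21 keypoints.
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

hands = mp_hands.Hands(
    static_image_mode=True,       # run the palm detector on every image
    max_num_hands=2,              # the multi-hand scaling mentioned above
    min_detection_confidence=0.5,
)

image = cv2.imread("hand.jpg")    # hypothetical input file
results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

# MediaPipe landmark indices: fingertips and the PIP joint below each tip.
FINGER_TIPS = {"index": 8, "middle": 12, "ring": 16, "pinky": 20}
FINGER_PIPS = {"index": 6, "middle": 10, "ring": 14, "pinky": 18}

for hand_landmarks in results.multi_hand_landmarks or []:
    lm = hand_landmarks.landmark  # 21 3D keypoints (x, y normalised; z relative depth)
    # Toy gesture rule: a finger counts as extended when its tip lies
    # above its PIP joint in image coordinates (y grows downward).
    extended = [
        name for name in FINGER_TIPS
        if lm[FINGER_TIPS[name]].y < lm[FINGER_PIPS[name]].y
    ]
    print(f"extended fingers: {extended}")

hands.close()
```

In video mode (`static_image_mode=False`), the palm detector runs only when tracking is lost and the landmark model otherwise works from the previous frame's hand region, which is part of what makes real-time performance on a phone feasible.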
