The Kinect is a hardware sensor that restored the vision community's belief in active vision systems. Too smart to be merely passive, the Kinect combines stereo vision with active vision.
So now we have a depth image, as shown on the right, and not only a color image. Since active vision is used, the result is bound to be better than passive stereo vision. But what are the range and the precision of the Kinect?
Thanks to the UCL Department of Civil, Environmental and Geomatic Engineering, we have a study of depth resolution versus distance.
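The headline behavior such studies report is that depth error grows roughly quadratically with range. Here is a minimal numerical sketch of that trend; the quadratic model and the coefficient below are illustrative assumptions (a commonly cited approximation for structured-light sensors), not values taken from the UCL study itself:

```python
def depth_resolution_mm(z_m, k=2.73):
    """Approximate depth error (mm) at range z_m (meters).

    Assumes error ~ k * z^2, i.e. it grows quadratically with range.
    The coefficient k is an assumed illustrative value.
    """
    return k * z_m ** 2

for z in (1.0, 2.0, 3.0, 4.0, 5.0):
    print(f"{z:.0f} m -> ~{depth_resolution_mm(z):.1f} mm")
```

Under this model, millimeter-level precision at one meter degrades to several centimeters at five meters, which is why Kinect games keep players in a fairly narrow band in front of the TV.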
- The first step is to extract a user out of this depth data: at the University of Texas at Austin's Department of Electrical and Computer Engineering, researchers combine different techniques to detect the human.
- The second step is to use decision forests to learn what a person looks like, either learning the body part by part, or guessing where the closest joint is with regression.
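To give a feel for the first step, here is a minimal sketch of extracting a user from a depth image. This is not the UT Austin method (which combines several techniques); it is the simplest possible assumption, namely that the user is the closest valid thing to the camera:

```python
import numpy as np

def extract_user(depth, band_mm=400):
    """Return a boolean mask of pixels near the closest valid depth.

    Assumes the user is the nearest object; band_mm is an assumed
    tolerance for the user's front-to-back extent.
    """
    valid = depth > 0                    # 0 means "no reading" on the Kinect
    nearest = depth[valid].min()         # closest valid point, assumed on the user
    return valid & (depth <= nearest + band_mm)

# Toy 4x4 depth image (millimeters): user at ~1200 mm, wall at ~3000 mm.
depth = np.array([[3000, 3000, 3000, 3000],
                  [3000, 1200, 1250, 3000],
                  [3000, 1210,    0, 3000],
                  [3000, 3000, 3000, 3000]])
mask = extract_user(depth)
print(mask.sum())  # -> 3 (the three user pixels; the 0 reading is ignored)
```

Real pipelines add connected-component analysis and temporal tracking on top of a threshold like this, but depth alone already makes segmentation far easier than it is in color images.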
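And for the second step, here is a toy sketch of the "part by part" idea: a random forest that classifies each pixel into a body part from simple depth-comparison features. Everything here (the features, the labels, the synthetic data) is an illustrative assumption, not the actual Kinect training setup, which learns from huge sets of labeled depth images:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic training set: each pixel is described by two depth
# differences (mm) between probe pixels around it. The cluster
# centers below are invented for illustration.
n = 300
head = rng.normal([0, 150], 20, (n, 2))    # background drops off below the head
torso = rng.normal([0, 0], 20, (n, 2))     # flat neighborhood
hand = rng.normal([200, 200], 20, (n, 2))  # hand sticks out in front of the body

X = np.vstack([head, torso, hand])
y = np.array(["head"] * n + ["torso"] * n + ["hand"] * n)

forest = RandomForestClassifier(n_estimators=30, random_state=0).fit(X, y)
print(forest.predict([[0, 140], [5, -5], [190, 210]]))  # -> ['head' 'torso' 'hand']
```

The forest votes a part label per pixel; joint positions then come from aggregating those per-pixel votes, or directly by regression as mentioned above.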
The rest of the story is that today, a bunch of players are jumping around in their living rooms in front of their TVs. Tomorrow, we might not necessarily be jumping for gaming, but making hand signs to control the things around us as a Natural User Interface.
The key point of these kinds of applications goes beyond the Kinect sensor itself and relies on machine learning for whatever we want to recognize: hands, objects…
I dedicate this first blog post to the Kinect because this one product outpaces many years of expected future research in human posture recognition:
- Researchers hoped to do it in real time; the Kinect does it faster than real time.
- They hoped to recognize specific postures; the Kinect recognizes general-purpose ones.
Since the 70s, the robotics community has promised us that very soon we will have home robots that clean our tables and do household tasks. Yet intelligent robots have so far delivered only indirect advances to other fields, such as multidimensional path planning and navigation (they can move) and precise control (they can act), and even then only when they are certain of what is around them; they are still too young to be trusted with autonomous decisions.
Embedded vision research goes hand in hand with robotics research: vision researchers promise roboticists that very soon a robot will be able to perceive all of its environment, and why not tell the difference between a needle and a paperclip. Based on these assumptions, roboticists build simulations (or scenarios) and promise everything that becomes possible once vision fulfills its promises.
After the first real applications of license plate reading, which only got us fined more often, the next generation of vision applications is finally here. Active vision, scene flow processing, and machine learning might help us perceive, and why not augment, reality, hopefully without making it worse.