The Kinect IR pattern is not very dense, so the patterns from several Kinects may overlap without major interference.
Recall that the pattern is a fixed (not blinking) infrared laser structured-light pattern. It is structured in the sense that it is like an image already stored in the hardware’s memory, and the disparity between the projected and the observed pattern allows the sensor to compute the distance.
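To make this concrete, here is a minimal sketch of the standard stereo triangulation behind the disparity-to-depth step. The focal length and baseline below are commonly cited approximations for the Kinect, not official specifications:

```python
import numpy as np

# Approximate Kinect parameters (commonly cited values, not official specs)
FOCAL_LENGTH_PX = 580.0  # IR camera focal length, in pixels
BASELINE_M = 0.075       # projector-to-camera baseline, in meters

def disparity_to_depth(disparity_px):
    """Standard triangulation: depth is inversely proportional to the
    disparity between the projected and the observed pattern."""
    disparity_px = np.asarray(disparity_px, dtype=float)
    # Pixels where the pattern could not be matched get no depth
    return np.where(disparity_px > 0,
                    FOCAL_LENGTH_PX * BASELINE_M / disparity_px,
                    0.0)

# With these parameters, a disparity of 43.5 px corresponds to ~1 m
print(disparity_to_depth([43.5, 21.75]))  # ~[1.0, 2.0] meters
```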
There are two kinds of interference in a multi-Kinect setup:
In the middle of the left image, we can notice more noise than in the right image. In fact, another Kinect is facing the first one. In one case, we can assume that, luckily, no stray beams reached the first sensor. In the second case, the sensor was moved slightly, which caused some IR beams to interfere with it.
Here, in the right image, a second sensor’s projector has been activated in the same area sensed by the first one.
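A simple way to quantify this interference is to count the pixels where the sensor failed to match its pattern and reports no depth. A minimal sketch, assuming invalid pixels are encoded as 0, the way most Kinect drivers deliver them:

```python
import numpy as np

def invalid_ratio(depth_map):
    """Fraction of pixels with no valid depth reading. Assumes invalid
    pixels are encoded as 0. A jump in this ratio when a second
    projector is switched on hints at interference, though holes also
    come from occlusions and reflective surfaces."""
    depth_map = np.asarray(depth_map)
    return np.count_nonzero(depth_map == 0) / depth_map.size

# Compare frames grabbed with one and with two active projectors
# (grab_depth_frame() is a hypothetical capture helper):
# print(invalid_ratio(grab_depth_frame()))
```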
So what are the ways out of this trap? Are we limited to one sensor for good depth-sensing accuracy? Possible ways out:
- It can be as simple as orienting the sensors in different directions.
- Sensors scanning the same object from both sides could be placed higher than the object and tilted slightly downward, so that the projector of one does not reach the other sensor.
- Another open idea is to use a single shared pattern with multiple sensors. An infrared-textured surface already helps classic stereo vision, and one could try to localize the pattern itself, but that seems a bit complex.
The Kinect is a hardware sensor that gave the vision community back its belief in active vision systems. Too smart to be passive, the Kinect combines stereo vision with active vision.
So now we have a depth image, as shown below, and not only a color image. Since active vision is used, the result is certainly better than with passive stereo vision. But what are the range and the precision of the Kinect?
Thanks to the UCL Department of Civil, Environmental and Geomatic Engineering, we have a depth resolution vs. distance study.
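The main takeaway of such studies is that depth resolution degrades roughly quadratically with distance, which follows directly from the triangulation formula: differentiating z = f·b/d gives Δz ≈ z²·Δd/(f·b). A small sketch, reusing the approximate parameters from above and assuming the often-quoted 1/8-pixel disparity quantization step:

```python
FOCAL_LENGTH_PX = 580.0    # approximate value, as above
BASELINE_M = 0.075         # approximate value, as above
DISPARITY_STEP_PX = 0.125  # often-quoted 1/8-pixel quantization (assumption)

def depth_resolution_m(z_m):
    """Smallest distinguishable depth step at distance z:
    dz = z^2 * dd / (f * b), i.e. quadratic growth with distance."""
    return z_m ** 2 * DISPARITY_STEP_PX / (FOCAL_LENGTH_PX * BASELINE_M)

for z in [0.8, 2.0, 3.5, 5.0]:
    print(f"{z:.1f} m -> ~{depth_resolution_m(z) * 1000:.0f} mm steps")
```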
- The first step now is to extract a user out of this depth data: at the University of Texas at Austin, Department of Electrical and Computer Engineering, they combine different techniques to detect the human.
- The second step is to use decision forests to learn what a person looks like, learning it part by part, or guessing where the closest joint is with regression; a rough sketch of both steps follows below.
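To give a feel for both steps, here is a sketch: a naive depth-threshold segmentation standing in for the user-extraction techniques mentioned above (not their actual method), followed by the kind of depth-comparison feature that the decision forests in the Kinect pipeline are reported to split on, with offsets scaled by 1/depth so the response is depth-invariant:

```python
import numpy as np

def extract_user(depth_m, band_m=1.2):
    """Naive user segmentation: keep pixels within band_m of the closest
    valid point. Real detectors combine far more cues than this."""
    valid = depth_m > 0
    nearest = depth_m[valid].min()
    return valid & (depth_m < nearest + band_m)

def depth_difference_feature(depth_m, x, y, u, v):
    """Depth-comparison feature in the style of the Kinect body-part
    classifier: compare the depth at two offsets around pixel (x, y).
    Offsets u and v are scaled by 1/depth so that the feature responds
    the same way whether the person stands near or far. Assumes (x, y)
    lies on the user (depth > 0); out-of-image probes are clipped here
    for simplicity."""
    h, w = depth_m.shape
    d = depth_m[y, x]

    def probe(offset):
        ox, oy = offset
        px = int(np.clip(x + ox / d, 0, w - 1))
        py = int(np.clip(y + oy / d, 0, h - 1))
        return depth_m[py, px]

    return probe(u) - probe(v)

# A forest of such features, each paired with a learned threshold,
# votes per pixel for a body part (head, hand, torso, ...), and joints
# are then estimated from the labeled pixels.
```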
The rest of the story is that today, a bunch of players are jumping around their living rooms in front of their TVs. Tomorrow, we might not necessarily be jumping for gaming, but making hand signs to control the things around us as a Natural User Interface.
The key to this kind of application goes beyond the Kinect sensor and relies on machine learning for whatever we want to recognize: hands, objects…
I dedicate this first blog post to the Kinect because this one product leapfrogs many years of expected future research in human posture recognition:
- They hoped to do it in real time; the Kinect did it in super real time.
- They hoped for recognition of specific postures; the Kinect recognizes general-purpose ones.
Since the 70s, the robotics community has been promising us that very soon we will have home robots that clean our tables and handle household tasks. Yet so far, intelligent robots only bring indirect advances to other fields, like multidimensional path planning and navigation (they can move) and precise control (they can act), and only when they are certain of what is around them; they are still too young to be trusted with autonomous decisions.
Embedded vision research goes side by side with robotics research: vision researchers promise roboticists that very soon a robot will be able to perceive all of its environment, and why not tell a needle from a paperclip. Based on these assumptions, roboticists build simulations (or scenarios) and promise everything that will become possible once vision fulfills its promises.
After the first real-world applications of license plate reading, which mostly earned us more fines, the next generation of vision applications is finally here. Active vision, scene flow processing, and machine learning might help us perceive, and why not augment, reality, hopefully without making it worse.