Eric Hunsberger, Victor Reyes Osorio, Jeff Orchard, Bryan P Tripp
The most accurate stereo disparity algorithms take dozens or hundreds of seconds to process a single frame. This timescale is impractical for many applications. However, high accuracy is often not needed throughout the scene. Here, we investigate a “foveation” approach (in which some parts of an image are processed more intensively than others) in the context of modern stereo algorithms. We consider two scenarios: disparity estimation with a convolutional network in a robotic grasping context, and disparity estimation with a Markov random field in a navigation context. In each case, combining fast and slow methods in different parts of the scene improves frame rates while maintaining accuracy in the most task-relevant areas. We also demonstrate a simple and broadly applicable utility function for choosing foveal regions, which combines image and task information. Finally, we characterize the benefits of defining multiple individually placed small foveae per image, rather than a single large fovea. We find little benefit, which supports the use of hardware foveae of fixed size and shape. More generally, our results reaffirm that foveation is a practical way to combine speed with task-relevant accuracy. Foveae are present in the most complex biological vision systems, suggesting that they may become more important in artificial vision systems as those systems grow more complex.
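The idea of combining fast and slow methods can be illustrated with a minimal sketch. It is not the implementation used in this work. It assumes that the utility of a candidate fovea is the sum, over its pixels, of an image-derived term multiplied by a task-relevance weight, and that the slow method is available as a callable that returns a disparity patch for a given window. The names `choose_fovea`, `foveate`, `image_cost`, `task_weight`, and `slow_method` are hypothetical and chosen only for illustration.

```python
def window_sum(grid, top, left, height, width):
    """Sum the values of grid inside one rectangular window."""
    return sum(grid[r][c]
               for r in range(top, top + height)
               for c in range(left, left + width))


def choose_fovea(image_cost, task_weight, fovea_h, fovea_w):
    """Return (top, left) of the window with the highest summed utility.

    Assumption: per-pixel utility is image_cost * task_weight. This is an
    illustrative stand-in, not the utility function defined in the paper.
    """
    rows, cols = len(image_cost), len(image_cost[0])
    utility = [[image_cost[r][c] * task_weight[r][c] for c in range(cols)]
               for r in range(rows)]
    best_value, best_pos = None, None
    # Exhaustive search over all window positions that fit in the image.
    for top in range(rows - fovea_h + 1):
        for left in range(cols - fovea_w + 1):
            value = window_sum(utility, top, left, fovea_h, fovea_w)
            if best_value is None or value > best_value:
                best_value, best_pos = value, (top, left)
    return best_pos


def foveate(fast_disparity, slow_method, fovea, fovea_h, fovea_w):
    """Overwrite the foveal window of a fast disparity map with slow results.

    slow_method(top, left, height, width) is a hypothetical callable that
    returns a height x width disparity patch from the accurate method.
    """
    top, left = fovea
    combined = [row[:] for row in fast_disparity]  # leave the input untouched
    patch = slow_method(top, left, fovea_h, fovea_w)
    for r in range(fovea_h):
        for c in range(fovea_w):
            combined[top + r][left + c] = patch[r][c]
    return combined
```

In this sketch the slow method runs only inside the chosen window, so its cost scales with the fovea area rather than the full frame. The exhaustive search is adequate for small grids; a summed-area table would be a natural replacement at full image resolution.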