We introduce a multi-scale framework for low-level vision, where the goal is estimating physical scene values from image data---such as depth from stereo image pairs. The framework uses a dense, overlapping set of image regions at multiple scales and a ``local model,'' such as a slanted-plane model for stereo disparity, that is expected to be valid piecewise across the visual field. Estimation is cast as optimization over a dichotomous mixture of variables, simultaneously determining which regions are inliers with respect to the local model (binary variables) and the correct co-ordinates in the local model space for each inlying region (continuous variables). When the regions are organized into a multi-scale hierarchy, optimization can occur in an efficient and parallel architecture, where distributed computational units iteratively perform calculations and share information through sparse connections between parents and children. The framework performs well on a standard benchmark for binocular stereo, and it produces a distributional scene representation that is appropriate for combining with higher-level reasoning and other low-level cues.
We develop a framework for extracting a concise representation of the shape information available from diffuse shading in a small image patch. This produces a mid-level scene descriptor, comprised of local shape distributions that are inferred separately at every image patch across multiple scales. The framework is based on a quadratic representation of local shape that, in the absence of noise, has guarantees on recovering accurate local shape and lighting. And when noise is present, the inferred local shape distributions provide useful shape information without over-committing to any particular image explanation. These local shape distributions naturally encode the fact that some smooth diffuse regions are more informative than others, and they enable efficient and robust reconstruction of object-scale shape. Experimental results show that this approach to surface reconstruction compares well against the state-of-art on both synthetic images and captured photographs.
Recent attempts to fabricate surfaces with custom reflectance functions boast impressive angular resolution, yet their spatial resolution is limited. In this paper we present a method to construct spatially varying reflectance at a high resolution of up to 220dpi, orders of magnitude greater than previous attempts, albeit with a lower angular resolution. The resolution of previous approaches is limited by the machining, but more fundamentally, by the geometric optics model on which they are built. Beyond a certain scale geometric optics models break down and wave effects must be taken into account. We present an analysis of incoherent reflectance based on wave optics and gain important insights into reflectance design. We further suggest and demonstrate a practical method, which takes into account the limitations of existing micro-fabrication techniques such as photolithography to design and fabricate a range of reflection effects, based on wave interference.
[SIGGRAPH 2013] [Slides] [Poster] [Video] [Project page]
To produce images that are suitable for display, tone-mapping is widely used in digital cameras to map linear color measurements into narrow gamuts with limited dynamic range. This introduces non-linear distortion that must be undone, through a radiometric calibration process, before computer vision systems can analyze such photographs radiometrically. This paper considers the inherent uncertainty of undoing the effects of tone-mapping. We observe that this uncertainty varies substantially across color space, making some pixels more reliable than others. We introduce a model for this uncertainty and a method for fitting it to a given camera or imaging pipeline. Once fit, the model provides for each pixel in a tone-mapped digital photograph a probability distribution over linear scene colors that could have induced it. We demonstrate how these distributions can be useful for visual inference by incorporating them into estimation algorithms for a representative set of vision tasks.
[PAMI 2014] [CVPR 2012] [Slides] [Poster] [Project page]
We present a blind estimation algorithm for multi-input and multi-output (MIMO) systems with sparse common support. Key to the proposed algorithm is a matrix generalization of the classical annihilating filter technique, which allows us to estimate the nonlinear parameters of the channels through an efficient and noniterative procedure. An attractive property of the proposed algorithm is that it only needs the sensor measurements at a narrow frequency band. By exploiting this feature, we can derive efficient sub-Nyquist sampling schemes which significantly reduce the number of samples that need to be retained at each sensor. Numerical simulations verify the accuracy of the proposed estimation algorithm and its robustness in the presence of noise.
[ICASSP 2012] [Poster]