End-to-End Policy Learning for Active Visual Categorization
Jayaraman, Dinesh, et al.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2019, 41(7): 1601-1614
Visual recognition systems mounted on autonomous moving agents face the challenge of unconstrained data, but simultaneously have the opportunity to improve their performance by moving to acquire new views at test time. In this work, we first show how a recurrent neural network-based system may be trained to perform end-to-end learning of motion policies suited for this "active recognition" setting. Further, we hypothesize that active vision requires an agent to have the capacity to reason about the effects of its motions on its view of the world. To verify this hypothesis, we attempt to induce this capacity in our active recognition pipeline, by simultaneously learning to forecast the effects of the agent's motions on its internal representation of the environment conditional on all past views. Results across three challenging datasets confirm both that our end-to-end system successfully learns meaningful policies for active category recognition, and that "learning to look ahead" further boosts recognition performance.
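The pipeline described above can be sketched structurally: a recurrent state aggregates successive views, a policy head proposes the next motion, and a look-ahead head forecasts the effect of that motion on the internal representation. The sketch below is a minimal structural illustration with random weights and an assumed feature dimensionality, not the paper's architecture or training procedure.

```python
import math, random

random.seed(0)
D = 8  # illustrative feature/state dimensionality (an assumption)

def rand_mat(n, m, scale=0.3):
    return [[random.uniform(-scale, scale) for _ in range(m)] for _ in range(n)]

def linear(w, x):
    # Matrix-vector product for an n x D weight list and a length-D vector.
    return [sum(row[j] * x[j] for j in range(len(x))) for row in w]

W_in, W_rec, W_act, W_look = (rand_mat(D, D) for _ in range(4))

def step(h, view_feat, motion):
    # Fuse the new view with the motion that produced it, then update the
    # recurrent state that aggregates evidence across glimpses.
    x = [v + m for v, m in zip(view_feat, motion)]
    h = [math.tanh(a + b) for a, b in zip(linear(W_in, x), linear(W_rec, h))]
    next_motion = linear(W_act, h)   # policy head: which view to take next
    lookahead = linear(W_look, h)    # look-ahead head: forecast next state
    return h, next_motion, lookahead

h, motion = [0.0] * D, [0.0] * D
for _ in range(3):  # three glimpses of the scene
    view = [random.uniform(-1.0, 1.0) for _ in range(D)]
    h, motion, lookahead = step(h, view, motion)
```

In training, the look-ahead head would be supervised to match the state actually reached after executing the motion, which is the "learning to look ahead" auxiliary task the abstract refers to.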
GMS: Grid-Based Motion Statistics for Fast, Ultra-robust Feature Correspondence
Bian, Jia-Wang, et al.
INTERNATIONAL JOURNAL OF COMPUTER VISION, 2020, 128(6): 1580-1593
Feature matching aims at generating correspondences across images and is widely used in many computer vision tasks. Although considerable progress has been made on feature descriptors and on fast matching for initial correspondence hypotheses, selecting good correspondences from them is still challenging and critical to the overall performance. More importantly, existing methods often take a long computational time, limiting their use in real-time applications. This paper attempts to separate true correspondences from false ones at high speed. We term the proposed method Grid-based Motion Statistics (GMS); it incorporates the smoothness constraint into a statistical framework for separation and uses a grid-based implementation for fast calculation. GMS is robust to various challenging image changes, including changes in viewpoint, scale, and rotation. It is also fast, taking only 1 or 2 ms in a single CPU thread even when 50K correspondences are processed, which has important implications for real-time applications. Moreover, we show that incorporating GMS into the classic feature matching and epipolar geometry estimation pipeline can significantly boost the overall performance. Finally, we integrate GMS into the well-known ORB-SLAM system for monocular initialization, resulting in a significant improvement.
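The core statistic can be illustrated in a few lines: matches are binned by their (source cell, target cell) pair, and a match is accepted when its 3x3 cell-pair neighborhood contains enough other matches moving consistently. The sketch below is a simplified variant (unit-normalized coordinates, a single grid, and a neighborhood-total threshold in place of the paper's exact statistic); `gms_filter` and its parameters are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import defaultdict

def gms_filter(matches, grid=20, alpha=6.0):
    """Keep matches supported by consistent motion in their grid neighborhood.

    matches: list of ((x1, y1), (x2, y2)) point pairs with coordinates
    normalized to [0, 1) in each image.
    """
    def cell(p):
        return (min(int(p[0] * grid), grid - 1),
                min(int(p[1] * grid), grid - 1))

    pair_count = defaultdict(int)   # matches per (source cell, target cell)
    src_count = defaultdict(int)    # matches per source cell
    for p, q in matches:
        pair_count[(cell(p), cell(q))] += 1
        src_count[cell(p)] += 1

    kept = []
    for p, q in matches:
        (ax, ay), (bx, by) = cell(p), cell(q)
        support = 0  # matches whose cells move by the same offset (smoothness)
        n = 0        # all matches leaving the 3x3 source neighborhood
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                support += pair_count[((ax + dx, ay + dy), (bx + dx, by + dy))]
                n += src_count[(ax + dx, ay + dy)]
        # Exclude the match itself; accept when support beats a sqrt(n) bar.
        if support - 1 > alpha * math.sqrt(max(n, 1)):
            kept.append((p, q))
    return kept
```

Because true matches in a neighborhood move together while false ones scatter across cell pairs, the support count separates them sharply, and the grid binning keeps the whole pass linear in the number of matches.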
Minimal navigation solution for a swarm of tiny flying robots to explore an unknown environment
McGuire, K. N., et al.
SCIENCE ROBOTICS, 2019, 4(35)
Swarms of tiny flying robots hold great potential for exploring unknown, indoor environments. Their small size allows them to move in narrow spaces, and their light weight makes them safe for operating around humans. Until now, this task has been out of reach due to the lack of adequate navigation strategies. The absence of external infrastructure implies that any positioning attempts must be performed by the robots themselves. State-of-the-art solutions, such as simultaneous localization and mapping, are still too resource demanding. This article presents the swarm gradient bug algorithm (SGBA), a minimal navigation solution that allows a swarm of tiny flying robots to autonomously explore an unknown environment and subsequently come back to the departure point. SGBA maximizes coverage by having robots travel in different directions away from the departure point. The robots navigate the environment and deal with static obstacles on the fly by means of visual odometry and wall-following behaviors. Moreover, they communicate with each other to avoid collisions and maximize search efficiency. To come back to the departure point, the robots perform a gradient search toward a home beacon. We studied the collective aspects of SGBA, demonstrating that it allows a group of 33-g commercial off-the-shelf quadrotors to successfully explore a real-world environment. The application potential is illustrated by a proof-of-concept search-and-rescue mission in which the robots captured images to find "victims" in an office environment. The developed algorithms generalize to other robot types and lay the basis for tackling other similarly complex missions with robot swarms in the future.
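The homing phase can be pictured with a toy stand-in for the beacon gradient search: signal strength falls off with distance, and the robot greedily picks, among a few candidate headings, the step that most increases the measured signal. This is a hill-climbing simplification under an assumed log-distance path-loss model; the real SGBA combines the gradient search with wall following and inter-robot coordination, which are omitted here.

```python
import math

def rssi(pos, beacon):
    # Illustrative log-distance path-loss model (dB); larger is stronger.
    return -20.0 * math.log10(max(math.dist(pos, beacon), 1e-6))

def home_step(pos, beacon, step=0.1):
    # Evaluate 12 candidate headings and take the strongest-signal step.
    headings = (math.radians(30 * k) for k in range(12))
    best = max(((pos[0] + step * math.cos(a), pos[1] + step * math.sin(a))
                for a in headings),
               key=lambda c: rssi(c, beacon))
    # Stay put once no candidate improves the signal (effectively home).
    return best if rssi(best, beacon) > rssi(pos, beacon) else pos
```

Repeated application of `home_step` walks the robot down the distance gradient until it sits within roughly one step length of the beacon, which is the minimal behavior the gradient search needs.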
Active Perception for Foreground Segmentation: An RGB-D Data-Based Background Modeling Method
Sun, Yuxiang, et al.
IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2019, 16(4): 1596-1609
Foreground moving object segmentation is a fundamental problem in many computer vision applications. As a solution for foreground segmentation, background modeling has been intensively studied over the past years, and many effective algorithms have been developed. However, accurate foreground segmentation is still a difficult problem. Currently, most algorithms work solely within the color space, in which segmentation performance is prone to degradation by a multitude of challenges, such as illumination changes, shadows, automatic camera adjustments, and color camouflage. RGB-D cameras are active visual sensors that provide depth measurements along with color images. In this paper, we present a novel background modeling method that uses both the color and depth information from an RGB-D camera. The proposed method is evaluated on a public RGB-D data set. Various experiments confirm that our method achieves superior performance compared with existing well-known methods. Note to Practitioners-This paper investigates background modeling for foreground segmentation with active perception. Recent RGB-D cameras that leverage active perception technology have advanced many computer vision algorithms. In this paper, we develop a background modeling method that achieves superior performance by using an RGB-D camera instead of a color camera. Owing to the use of active sensing, the proposed method is robust to common challenges. Our method could be used to improve existing infrastructures, such as visual surveillance systems for parking spaces. Moreover, the simple design of our method allows it to be easily deployed on various computing platforms, which facilitates many practical applications that usually require embedded computing devices. However, our method does not yet run in real time. We believe it can be further accelerated using parallel programming techniques to meet the real-time requirement.
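The benefit of fusing color and depth can be shown with a deliberately simple per-pixel model, which is not the paper's actual method: a running background mean in each channel with separate thresholds, where a pixel is foreground when it deviates in either channel. The class name, learning rate, and thresholds below are illustrative assumptions.

```python
class RGBDBackgroundPixel:
    """Toy per-pixel background model fusing color (0-255) and depth (meters)."""

    def __init__(self, color, depth, lr=0.05, color_thr=30.0, depth_thr=0.15):
        self.color = list(color)   # running mean of (r, g, b)
        self.depth = depth         # running mean of depth
        self.lr, self.color_thr, self.depth_thr = lr, color_thr, depth_thr

    def update(self, color, depth):
        """Classify the new observation, then adapt the background model."""
        color_dev = max(abs(c - m) for c, m in zip(color, self.color))
        depth_dev = abs(depth - self.depth)
        # Depth is unaffected by shadows and illumination changes, while
        # color catches objects standing at the background's depth
        # (and depth catches color camouflage).
        foreground = color_dev > self.color_thr or depth_dev > self.depth_thr
        if not foreground:  # absorb only background observations
            self.color = [m + self.lr * (c - m)
                          for c, m in zip(color, self.color)]
            self.depth += self.lr * (depth - self.depth)
        return foreground
```

For example, a person wearing clothing the same color as the wall behind them leaves the color channel unchanged but shifts the depth reading, so the depth test still flags the pixel as foreground.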